In scientific research, the careful documentation and citation of data play a pivotal role in fostering transparency, reproducibility, and the advancement of knowledge. Researchers rely on many datasets, drawn from various sources and disciplines, to underpin their findings. As the volume and complexity of available data continue to expand, systematic and standardized data citation becomes increasingly important.

Datasets, like scholarly articles, require appropriate acknowledgment, but that acknowledgment must fit within the maximum number of citations a journal allows. A notable example is Nature, which imposes a restriction of up to 50 references in the main text, balancing the need for comprehensive literature coverage with practical page constraints. This limit helps manage the flow of information within a published article, but it presents a challenge when researchers seek to credit and reference the extensive datasets that underpin their findings.

In this context, data citations become an essential facet of scholarly communication. Properly crediting datasets not only honors the contributions of data creators but also facilitates the reproducibility and validation of research. This creates the need for curated dataset collections: groupings that allow researchers to systematically cite and reference datasets without encroaching on the limited space allocated for citations in scholarly articles.

Central to this solution is the top-level Digital Object Identifier (DOI) associated with the overarching dataset collection. By citing a single DOI for the entire collection, researchers can credit every dataset it contains, simplifying attribution while remaining within journal citation limits. This top-level DOI serves as a gateway, allowing scholars to access and explore the individual DOIs associated with the specific datasets in the collection. For more information on data citations and DOIs, see Data Citations.

Ocean Networks Canada (ONC) curates dataset collections for users upon request. More information on this process can be found below.

ONC defines a dataset as one deployment of one device. A dataset collection (also known as an aggregation or reliquary) is a series of datasets that have been logically arranged around a similar concept or set of parameters. A dataset can belong to more than one collection, depending on the use case.
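The relationship between datasets, collections, and DOIs described above can be sketched as a small data model. This is only an illustration, not ONC's actual schema or API: the class names, field names, helper function, and DOI strings below are all hypothetical placeholders chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """One deployment of one device (field names are hypothetical)."""
    device: str
    deployment: str
    doi: str  # placeholder DOI, not a real identifier


@dataclass
class Collection:
    """Datasets grouped around a shared concept, citable via one top-level DOI."""
    title: str
    doi: str  # placeholder top-level DOI
    datasets: list[Dataset] = field(default_factory=list)

    def member_dois(self) -> list[str]:
        """Return the individual dataset DOIs reachable from this collection."""
        return [d.doi for d in self.datasets]


def collections_containing(dataset: Dataset, collections: list[Collection]) -> list[str]:
    """Return the titles of every collection that includes the given dataset."""
    return [c.title for c in collections if dataset in c.datasets]


# Two hypothetical datasets, each one deployment of one device.
ctd = Dataset("CTD-A", "deployment-1", "10.0000/example.dataset.1")
adcp = Dataset("ADCP-B", "deployment-1", "10.0000/example.dataset.2")

# The same dataset (ctd) appears in two collections built for different use cases.
by_site = Collection("Example site", "10.0000/example.collection.site", [ctd, adcp])
by_instrument = Collection("Example CTDs", "10.0000/example.collection.ctd", [ctd])
```

In this sketch, an article would cite only `by_site.doi`, and a reader would follow that single reference to the list returned by `by_site.member_dois()`.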

A dataset collection can include cases such as:


A collection and its associated landing page consist of several pieces of metadata, including:


Current status of Data Collection implementation at ONC:


What to do when a user wants to request a collection?