In the ever-evolving landscape of scientific research, the meticulous documentation and citation of data play a pivotal role in fostering transparency, reproducibility, and the advancement of knowledge. Researchers rely on a multitude of datasets to underpin their findings, drawn from various sources and disciplines. As the volume and complexity of available data continue to expand, the need for systematic and standardized data citations becomes increasingly paramount.

Scholarly articles follow citation guidelines that keep reference lists clear and concise, and datasets must be acknowledged within the same limits on the number of citations a journal allows. For example, Nature restricts the main text to up to 50 references, balancing comprehensive literature coverage against practical page constraints. This limit helps manage the flow of information within a published article, but it presents a challenge when researchers seek to credit and reference the extensive datasets that underpin their findings.

In this context, the concept of data citations gains prominence as an essential facet of scholarly communication. Properly crediting datasets not only honors the contributions of data creators but also facilitates the reproducibility and validation of research. This introduces the need for curated dataset collections – repositories that allow researchers to systematically cite and reference datasets without encroaching on the limited space allocated for citations in scholarly articles. 

Central to this solution is the top-level Digital Object Identifier (DOI) associated with the overarching dataset collection. By citing a single DOI for the entire collection, researchers can credit every dataset housed within it, simplifying attribution while staying within journal citation limits. This top-level DOI serves as a gateway, allowing scholars to access and explore the individual DOIs of the specific datasets contained in the collection. For more information on data citations and DOIs, see Data Citations.

Ocean Networks Canada provides the service of curating dataset collections to users upon request. More information on this process can be found below. 

ONC defines a dataset as one deployment of one device. A dataset can belong to more than one collection, depending on the use case. A data collection (aka aggregation, reliquary) is a series of datasets that have been logically arranged around a similar concept or set of parameters.
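The definitions above can be sketched as a small data model. This is only an illustration, assuming hypothetical class names and made-up device and deployment identifiers; it is not how ONC stores datasets or collections internally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """One deployment of one device (identifiers below are made up)."""
    device_id: str
    deployment_id: str


@dataclass
class Collection:
    """Datasets grouped around a shared concept or set of parameters."""
    title: str
    datasets: set = field(default_factory=set)


def collections_containing(dataset, collections):
    """Return the titles of every collection that includes the dataset."""
    return [c.title for c in collections if dataset in c.datasets]


# The same device deployed twice yields two distinct datasets.
ctd_first = Dataset("ctd-A", "deployment-1")
ctd_second = Dataset("ctd-A", "deployment-2")
adcp_first = Dataset("adcp-B", "deployment-1")

# A dataset may appear in more than one collection, depending on the use case.
by_site = Collection("Example site, all instruments",
                     {ctd_first, ctd_second, adcp_first})
by_paper = Collection("Example publication", {ctd_first, adcp_first})
```

Here `ctd_first` belongs to both collections, while `ctd_second` belongs only to the site-wide one.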

A dataset collection can include cases such as:


A Collection and its associated landing page consist of several pieces of metadata, including:

  • The Title is composed of several pieces of information describing the dataset collection: location, the type of data and related device categories, and the date range 
  • The DOI is the persistent identifier for the entire dataset collection
  • The Abstract provides the user with a high level description of the collection
  • The Creator specifies the author/organization responsible for creating the dataset collection
  • The Date Created is the date that the collection was created
  • The Funding References section includes any/all funding organizations that contributed to the data collection and archival
  • The Publisher section specifies the organization that made the data available to the public
  • The Resource Type will always be Collection, specifying that the collection includes multiple datasets
  • The Rights section points towards ONC’s Data Usage Policy, helping users understand the constraints and licensing of the associated dataset
  • The Formats section provides a list of the types of file formats the data are available in
  • The Geolocations provide the user with a bounding box of where the data were collected
  • The Contributors section lists the organization(s) and roles involved with data collection and archival
  • The Component Datasets list all of the datasets included within the collection, as well as their respective DOIs
  • The Version History section provides details about any updates to the collection after it was initially assigned its DOI, e.g., reprocessing of the data. The reason for the new version and the date it took place are also provided
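As a rough illustration, the metadata fields listed above could be held in a record like the one below. The field names, the bounding-box ordering, and every value (including the DOIs, which use a placeholder prefix) are assumptions made for this sketch; they do not reflect ONC's actual schema or any real collection.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CollectionRecord:
    """Hypothetical container mirroring the landing page metadata."""
    title: str
    doi: str
    abstract: str
    creator: str
    date_created: str
    publisher: str
    rights: str
    resource_type: str = "Collection"  # always "Collection" per the list above
    funding_references: list = field(default_factory=list)
    formats: list = field(default_factory=list)
    geolocation_bbox: tuple = ()  # assumed order: (west, south, east, north)
    contributors: list = field(default_factory=list)
    component_datasets: dict = field(default_factory=dict)  # title -> DOI
    version_history: list = field(default_factory=list)


def empty_fields(record):
    """List the metadata fields that have not been filled in yet."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# All values are placeholders, not a real ONC collection.
record = CollectionRecord(
    title="Example location, example device category, 2020 to 2021",
    doi="10.5555/example-collection",
    abstract="High-level description of the collection.",
    creator="Example Organization",
    date_created="2024-01-01",
    publisher="Example Organization",
    rights="See the publisher's data usage policy.",
    formats=["CSV", "NetCDF"],
    geolocation_bbox=(-130.0, 47.0, -125.0, 49.0),
    contributors=["Example Organization (data collector)"],
    component_datasets={
        "Example dataset 1": "10.5555/example-dataset-1",
        "Example dataset 2": "10.5555/example-dataset-2",
    },
)
```

Citing `record.doi` takes one reference slot regardless of how many entries `component_datasets` holds, and `empty_fields(record)` shows which parts of the landing page still need input.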


Current status of Data Collection implementation at ONC:


What to do when a user wants to request a collection?

  • The user should contact the ONC Data Stewardship team at datacitations@oceannetworks.ca 
    • Please contact us as soon as you know which datasets you will need to cite. This gives us time to manually create the requested collection to support your publication
    • Please note that the process to curate a dataset collection is iterative and requires user input