Use Case: Citing very large volume datasets that are too large for current repositories #17

@hsu000001

Use Case Title: Citation of very large volume datasets that are too large for current repositories

  • Contributors: Leslie Hsu @hsu000001
  • label: citing data

Goals and Summary

An investigator runs experiments in which the main raw data type is high-resolution images and video. Raw data amounts to about 4 TB per experiment. Even after processing, about 1 TB per experiment is still needed to provide a dataset that would allow the results to be reproduced. Current data repositories usually do not offer this much storage, so it is very hard to obtain a citable DOI for such a large dataset. Dataset DOIs are usually not assigned unless the "trusted" allocating agent has possession of the data resource, so that the DOI will not point to a resource that moves or is changed.

Why is it important and to whom?

  • Important to investigators who produce such large datasets and are told by funding agencies to make their data available to the public.
  • Important to data repositories that serve communities producing such large, long-tail datasets.

Why hasn’t it been solved yet?

  • Storage of large datasets is expensive.

Actionable Outcomes

Any guidelines, best practices, or other solutions found during the workshop will be disseminated through the EarthCube Research Coordination Network SEN (Sediment Experimentalist Network) and shared with the several investigators who have asked me about this for large image/video datasets.

Additional Information and Links
