Use Case Title: Citation of very large volume datasets that are too large for current repositories
- Contributors: Leslie Hsu @hsu000001
- label: citing data
Goals and Summary
An investigator runs experiments in which the main raw data are high-resolution images and videos. The raw data total about 4 TB per experiment. Even the processed dataset, reduced to what is needed to reproduce the results, is still about 1 TB. Current data repositories usually do not offer this much storage, so it is very hard to obtain a citable DOI for such a large dataset. Dataset DOIs are usually not assigned unless the "trusted" allocating agent has possession of the data resource, so that the DOI will not point to a resource that moves or is changed.
Why is it important and to whom?
- Important to investigators who produce such large datasets and are required by funding agencies to make their data publicly available.
- Important to data repositories that serve communities producing such large, long-tail datasets.
Why hasn’t it been solved yet?
- Storage of large datasets is expensive.
Actionable Outcomes
If guidelines, best practices, or another solution emerge from the workshop, they will be disseminated through the EarthCube Research Coordination Network SEN (Sediment Experimentalist Network) and shared with the several investigators who have asked me about this for large image/video datasets.