The speed with which COVID-19 research is being produced and shared only increases the need for good documentation and description. You may need to share the results of preliminary trials or unprocessed datasets, which fellow researchers may find difficult to interpret without contextual information. To ensure your data are independently understandable, please consider providing:
- a codebook and/or a README file,
- a description of the methods you used to collect and/or process the data,
- any associated code, scripts, or syntax files you used to process or analyze your data, and
- a data use agreement if any data will be restricted.
You can also download this document as a PDF from the Portage Zenodo Community.
- Create a README File
- Include Recommended Supplementary Materials
- Add Repository Metadata
- Specify Data Use Agreements
Create a README File
When creating a codebook and/or README file, please consider the following:
- Include a point of contact.
- List any restrictions on secondary use of your data.
- For quantitative datasets, define all variables and allowable values. When applicable, include units of measure, and define the code you used for missing or null values.
- Include a brief description of your study, the methods you used to collect your data, and any steps you took to anonymize or otherwise process your data.
- If you removed variables from your raw dataset to create a public use copy for archiving, include a list of the variables that were removed so the changes made to your raw dataset are transparent. You might also choose to provide summary statistics for any variable that was removed.
- List the names of any equipment or instruments used to collect the data, and any software or statistical packages used to process them. If possible, include the software version numbers.
- If your file formats are not plain text, include a recommendation for software that can be used to view or analyze the files.
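A codebook can also be kept in a machine-readable form alongside the README, so that variable definitions, units, allowable values, and the missing-value code can be checked against the data before deposit. The sketch below uses only the Python standard library and is illustrative rather than a standard format: the variable names (`age_years`, `test_result`), the missing-value code `-99`, and the `validate_row` and `codebook_text` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MISSING_CODE = -99  # hypothetical code used for missing or null values


@dataclass
class Variable:
    """One codebook entry: name, definition, unit, and allowable values."""
    name: str
    definition: str
    unit: Optional[str] = None
    allowed: Optional[set] = None        # allowable categorical values, if any
    value_range: Optional[tuple] = None  # (min, max) for numeric values, if any


# Hypothetical codebook entries for illustration only.
CODEBOOK = [
    Variable("age_years", "Participant age at enrolment", unit="years",
             value_range=(0, 120)),
    Variable("test_result", "PCR test outcome (0 = negative, 1 = positive)",
             allowed={0, 1}),
]


def validate_row(row: dict, codebook=CODEBOOK) -> list:
    """Return a list of problems found in one record; an empty list means it passes."""
    problems = []
    for var in codebook:
        if var.name not in row:
            problems.append(f"{var.name}: variable missing from record")
            continue
        value = row[var.name]
        if value == MISSING_CODE:
            continue  # the documented missing-value code is always acceptable
        if var.allowed is not None and value not in var.allowed:
            problems.append(f"{var.name}: {value!r} not in allowed values")
        if var.value_range is not None:
            low, high = var.value_range
            if not (low <= value <= high):
                problems.append(f"{var.name}: {value!r} outside {low}-{high}")
    return problems


def codebook_text(codebook=CODEBOOK) -> str:
    """Render the codebook as plain text suitable for pasting into a README."""
    lines = []
    for var in codebook:
        unit = f" ({var.unit})" if var.unit else ""
        lines.append(f"{var.name}{unit}: {var.definition}")
    lines.append(f"Missing or null values are coded as {MISSING_CODE}.")
    return "\n".join(lines)
```

For example, `validate_row({"age_years": 130})` reports both the out-of-range age and the absent `test_result` variable. Keeping the codebook in one place and generating the README text from it helps keep the documentation and the data consistent.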
Further guidance is available in UBC’s Quick Guide: Creating a README for your dataset and Cornell University’s Guide to writing “readme” style metadata. Cornell has also published a README template that you are free to download and modify for your own dataset.
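The README item on removed variables can be handled by a small script that writes the public use copy and records what was dropped at the same time, so the list of removed variables and any summary statistics match the file actually deposited. This is a minimal sketch assuming the raw data fit in memory as a list of dictionaries, with `None` marking missing values; the column names and the `make_public_copy` helper are hypothetical.

```python
import statistics


def make_public_copy(rows, drop):
    """Return (public_rows, report) for a list of record dictionaries.

    public_rows: copies of the records without the variables listed in `drop`.
    report: for each dropped variable, counts of recorded and missing (None)
    values, plus mean and median if every recorded value is numeric, or the
    number of distinct values otherwise.
    """
    drop = list(drop)
    public_rows = [{k: v for k, v in row.items() if k not in drop} for row in rows]
    report = {}
    for name in drop:
        values = [row.get(name) for row in rows]
        recorded = [v for v in values if v is not None]
        summary = {"n": len(recorded), "missing": len(values) - len(recorded)}
        if recorded and all(isinstance(v, (int, float)) for v in recorded):
            summary["mean"] = statistics.mean(recorded)
            summary["median"] = statistics.median(recorded)
        elif recorded:
            summary["distinct"] = len(set(recorded))
        report[name] = summary
    return public_rows, report


# Hypothetical raw records for illustration only.
raw = [
    {"id": 1, "postal_code": "K1A", "age_exact": 34, "test_result": 1},
    {"id": 2, "postal_code": "M5V", "age_exact": 41, "test_result": 0},
    {"id": 3, "postal_code": "K1A", "age_exact": None, "test_result": 1},
]
public, removed = make_public_copy(raw, ["postal_code", "age_exact"])
```

The `removed` report can then be copied into the README's list of removed variables; whether summary statistics are appropriate for a given variable is a judgment call that depends on its disclosure risk.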
Include Recommended Supplementary Materials
Please consider depositing the following documents as well, or link to them from your codebook or README if they have been deposited elsewhere:
- A copy of your Data Management Plan.
- An unsigned copy of the consent form you provided to study participants. This may be archived alongside your data, or you may provide a copy to the data repository if your data will be screened for disclosure risk before they are archived.
- Supplemental materials that provide context to your dataset, for example, study protocols, clinical study reports, and statistical analysis plans.
- Links to any published article(s) or related resources that report on the results of your analyses. If you deposit your data before publishing, you may be able to add links to your publication(s) later.
Add Repository Metadata
Another way to provide context to your data is to add rich metadata when you deposit it. Some disciplinary repositories require you to follow specific metadata standards, but many repositories require only basic information in order to publish, such as a title, a contact author, and a license. To increase the findability and usability of your data, plan to include a descriptive title and keywords. A thorough description will also help other researchers understand your data, and is especially important if access to your dataset is restricted. It may help to think of the dataset description as you would an article abstract. The description may include high-level information such as the data collection methodology, the steps you took to process or analyze the data, the results of your analyses, and other potential uses for the data.
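Before depositing, it can help to draft the metadata as a structured record and check that the basic fields are filled in. In the sketch below, the field names loosely mirror the basic information mentioned above (title, contact, license, keywords, description) and do not follow any particular repository's schema, so they would need to be mapped to your repository's own deposit form. The `missing_fields` helper and all example values are hypothetical.

```python
import json

# Hypothetical set of fields treated as required for this sketch.
REQUIRED = ("title", "contact", "license", "keywords", "description")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in a metadata record."""
    return [f for f in REQUIRED if not record.get(f)]


# Illustrative record only; replace every value with your own study details.
record = {
    "title": "Example COVID-19 symptom survey, anonymized public use copy",
    "contact": "data-contact@example.org",
    "license": "CC-BY-4.0",
    "keywords": ["COVID-19", "survey", "symptoms"],
    "description": (
        "Written like an abstract: how the data were collected, how they were "
        "processed or anonymized, what the analyses found, and other potential uses."
    ),
}

problems = missing_fields(record)
draft = json.dumps(record, indent=2)  # keep a copy of the draft next to the README
```

Checking for empty fields, and not just absent ones, catches placeholders such as an empty keyword list left over from a template.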
Specify Data Use Agreements