Subject Guides: Research Data Services: Documentation

Documentation Overview

Good data documentation practices facilitate reproducibility and reuse. Many of the recommendations provided below build upon established scholarly practice. As research workflows grow more complicated, new tools and techniques have emerged to help researchers create the metadata necessary to understand and manipulate datasets.

General principles:

It is easier to document work while it is in progress than to reconstruct it afterwards.
Ask: would a peer or colleague have enough information to make sense of the data?
More documentation is ideal, but some is better than none.
Ask: if you looked back through your data five years from now, could you still interpret it?

File-level Metadata

File-level metadata describes each individual file in the dataset. This information can be written up in an accompanying document, or it can be embedded in the file itself. For example, a PDF or Word document can carry a record of its "creator", and an image stored in TIFF or JPEG format frequently has embedded information about the date it was taken, the type of camera used, and even the GPS coordinates where the photo was shot.

Either embedded metadata or metadata kept in a separate file can be a good solution. The most important thing is to ensure that a file and its metadata do not get separated as files are copied, moved, and processed. Before finalizing a workflow, test whether metadata reliably follows the files as they pass through each stage of processing.
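One way to test whether metadata follows the files is to check, after each processing stage, that every data file still has its metadata file beside it. The sketch below assumes a hypothetical convention in which the metadata for data.csv is kept in data.csv.meta.json; the suffix and the function name are illustrative and not part of any standard.

```python
from pathlib import Path

# Hypothetical convention: metadata for "x.csv" lives in "x.csv.meta.json".
SIDECAR_SUFFIX = ".meta.json"


def find_orphans(directory):
    """Return (data files without a sidecar, sidecars without a data file)."""
    root = Path(directory)
    files = [p for p in root.rglob("*") if p.is_file()]

    # Split the directory contents into sidecar files and data files.
    sidecars = {p for p in files if p.name.endswith(SIDECAR_SUFFIX)}
    data_files = [p for p in files if p not in sidecars]

    # The sidecar path each data file is expected to have next to it.
    expected = {p.with_name(p.name + SIDECAR_SUFFIX) for p in data_files}

    missing_sidecar = sorted(
        p for p in data_files
        if p.with_name(p.name + SIDECAR_SUFFIX) not in sidecars
    )
    missing_data = sorted(s for s in sidecars if s not in expected)
    return missing_sidecar, missing_data
```

Running find_orphans on the output folder of each stage and confirming that both lists are empty gives a quick signal that no file has been separated from its metadata. It does not check embedded metadata, which would need a format-specific reader.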

Important note: Metadata about a file that is visible in one program may not be readable by another. Additionally, some programs make it appear as though metadata is embedded in the file when it is actually internal to the program's database.

Tips for creating file-level metadata:

Filenames are metadata – related files should have a consistent, descriptive naming scheme.
For each file, record who created it, the date of creation, and the date it was last updated if applicable.
What program created the file, and how should it be read? (For instance, if the file is a script, what environment is needed to run it?)
For tabular data, is there a key for the column headings? How are missing values denoted?
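The tips above can be captured in a small machine-readable record for each file. The following is a minimal sketch, not a prescribed format: the function name, the field names, and the "UNDOCUMENTED" placeholder are all assumptions made for illustration. The creator and creation date are passed in by the researcher because a filesystem does not record them reliably across platforms.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path, creator, created, software,
                  column_key=None, missing_value_code=None):
    """Build a file-level metadata record (hypothetical field names)."""
    p = Path(path)
    stat = p.stat()
    record = {
        "filename": p.name,
        "creator": creator,
        "created": created,  # supplied by the researcher, e.g. "2024-03-01"
        "last_modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "size_bytes": stat.st_size,
        # A checksum lets a later reader confirm the file is unchanged.
        # Reads the whole file into memory, so suited to modest file sizes.
        "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),
        "software": software,  # what created the file / how to read it
    }

    # For tabular data, pair each column heading with its description
    # and record how missing values are denoted.
    if p.suffix.lower() == ".csv":
        with p.open(newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])
        key = column_key or {}
        record["columns"] = {
            name: key.get(name, "UNDOCUMENTED") for name in header
        }
        record["missing_value_code"] = missing_value_code
    return record
```

The returned dictionary can be written out with json.dumps and stored next to the file it describes. Columns that come back as "UNDOCUMENTED" show where the key for the column headings is still incomplete.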

Dataset Metadata

Dataset-level metadata has two functions: to help people find the data, and to help them use it once they have found it.

Researchers and other subject matter experts are in the best position to create comprehensive metadata on methodology, workflows, and analyses so that their peers can evaluate and potentially reuse the dataset. One way to do this is via a plaintext (.txt) Readme file. A Readme template is provided at the bottom of this section.
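A plaintext Readme can also be started from a short script, so that the file listing is always current and unfinished sections are easy to spot. This is a rough sketch only: the section names below are hypothetical, chosen to echo the topics mentioned above (methodology, workflows, analyses), and they are not the guide's own Readme template.

```python
from pathlib import Path

# Hypothetical section names; adapt them to the template you actually use.
SECTIONS = ("Title", "Creators", "Methodology", "Workflow", "Analysis")


def build_readme(sections, data_dir):
    """Return plaintext Readme content for the dataset in data_dir."""
    lines = []
    for name in SECTIONS:
        lines.append(name.upper())
        # Flag missing sections instead of silently omitting them.
        lines.append(sections.get(name, "").strip() or "[TO BE COMPLETED]")
        lines.append("")

    # List every file with its size so readers can tell what is included.
    lines.append("FILES")
    root = Path(data_dir)
    for p in sorted(root.rglob("*")):
        if p.is_file():
            rel = p.relative_to(root).as_posix()
            lines.append(f"{rel}  ({p.stat().st_size} bytes)")
    return "\n".join(lines) + "\n"
```

Saving the returned text as a .txt file inside the dataset folder keeps the documentation together with the data. The descriptive sections still have to be written by someone who knows the research; the script only supplies the skeleton and the file list.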

Datasets placed into repositories or archives will have additional metadata that allows for indexing and searching. The quality of this kind of metadata depends strongly on standardization, and many disciplines have developed metadata standards specifically for datasets (examples include the Data Documentation Initiative, DDI, for the social sciences and the Ecological Metadata Language, EML, for ecology). Software tools can help researchers create this kind of metadata, but consultation with a librarian is recommended prior to publishing or archiving datasets.

Important note: As with file-level metadata, the most important consideration is that all relevant documentation remains with the dataset wherever and however it is stored.