Good data documentation practices facilitate reproducibility and reuse. Many of the recommendations provided below build upon established scholarly practice. As research workflows grow more complicated, new tools and techniques have emerged to help researchers create the metadata necessary to understand and manipulate datasets.
General principles:
File-level metadata describes each individual file in the dataset. This information can be written up in an accompanying document, or it can be embedded in the file itself. For example, a PDF or Word document can carry a record of its "creator", and an image stored in TIFF or JPEG format frequently has embedded information about the date it was taken, the type of camera used, and even the GPS coordinates where the photo was shot.
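As an illustration, a short script can list what is embedded in an image file. The sketch below assumes Python with the Pillow library installed and uses a placeholder file name; your own files and tools may differ.

    # A minimal sketch of listing embedded EXIF metadata, assuming the Pillow
    # library is installed ("pip install Pillow"); "photo.jpg" is a placeholder.
    from PIL import Image
    from PIL.ExifTags import TAGS

    with Image.open("photo.jpg") as img:
        exif = img.getexif()
        for tag_id, value in exif.items():
            # Translate numeric EXIF tag IDs into readable names where known
            name = TAGS.get(tag_id, tag_id)
            print(f"{name}: {value}")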
Either embedded metadata or metadata kept in a separate file can be a good solution. The most important thing is to ensure that a file and its metadata do not get separated as files are copied, moved, and processed. Before finalizing a workflow, test whether metadata reliably follows the files as they pass through each stage of processing.
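One way to run such a test is sketched below, again assuming Python and Pillow, with placeholder file names. It compares the EXIF tags of an original image against a derivative produced by a single processing step; any tags missing from the derivative were dropped along the way.

    # A sketch of checking whether embedded metadata survives a processing step.
    # "original.jpg" and the resize-and-save step are stand-ins for whatever
    # tools your own workflow uses.
    from PIL import Image

    with Image.open("original.jpg") as img:
        before = dict(img.getexif())
        # One processing step: make and save a half-size derivative copy.
        img.resize((img.width // 2, img.height // 2)).save("derived.jpg")

    with Image.open("derived.jpg") as derived:
        after = dict(derived.getexif())

    lost = set(before) - set(after)
    if lost:
        print(f"{len(lost)} EXIF tags did not survive processing")
    else:
        print("All EXIF tags survived processing")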
Important Note: Metadata about a file that is visible in one program may not be readable by another. Additionally, some programs make it appear as though metadata is embedded in the file when it is actually stored in the program's own internal database.
Tips for creating file-level metadata:
In addition to file-level metadata, metadata that describes the dataset as a whole serves two functions: to help people find the data, and to help them use it once they have found it.
Researchers and other subject matter experts are in the best position to create comprehensive metadata on methodology, workflows, and analyses so that their peers can evaluate and potentially reuse the dataset. One way to do this is via a plaintext (.txt) Readme file. A Readme template is provided at the bottom of this section.
Datasets placed into repositories or archives will have additional metadata that allows for indexing and searching. The quality of this kind of metadata depends heavily on standardization, and many disciplines have developed metadata standards specifically for datasets (examples include DDI for the social sciences and EML for ecology). Software tools can help researchers create this kind of metadata, but consultation with a librarian is recommended prior to publishing or archiving datasets.
Important Note: As with file-level metadata, the most important consideration is that all relevant documentation remains with the dataset wherever and however it is stored.
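For example, dataset-level documentation can be saved as a plain file that travels alongside the data. The Python sketch below writes an illustrative record as JSON; the field names and values are placeholders rather than a formal standard such as DDI or EML, which define their own required elements.

    # A minimal sketch of a machine-readable dataset-level record saved next to
    # the data. Field names and values here are illustrative placeholders only;
    # a repository or a standard such as DDI or EML defines its own elements.
    import json

    record = {
        "title": "Example dataset title",
        "creators": ["Surname, Given Name"],
        "description": "What the dataset contains and how it was produced.",
        "keywords": ["keyword one", "keyword two"],
        "methodology": "Summary of collection methods and processing steps.",
        "license": "CC-BY-4.0",
    }

    with open("dataset_metadata.json", "w") as f:
        json.dump(record, f, indent=2)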