Getting started with research data

Before you begin collecting or producing data, it is advisable to plan how the data will be managed. A Data Management Plan (DMP) is often required if you want to receive a grant, and a DMP that is updated and followed up regularly will support you throughout the implementation of the project.

DMPonline at Halmstad University

Use Halmstad University's instance of DMPonline (https://dmp.hh.se) to kickstart your DMP! For more information, see the help page.

What data should I manage?

Data can be qualitative (e.g., text interviews, images and videos, audio recordings) or quantitative (e.g., tabular data, structured databases). Are there any ethical aspects (including personal data) or intellectual property rights issues with your data that you need to address?

As part of your Research Data Management (RDM) you should manage any data and code, as well as documentation about them, that are created or used as part of a research project. This might include:

Quantitative and qualitative data
Raw or processed data
Notes
Laboratory or research notebooks
Codebooks
Code or software used to run data analyses
Data workflows or pipelines
Metadata (documentation describing the data)

As a rule of thumb

You should know the location of all data produced by or used in your research project and it should be annotated sufficiently so that others can understand and reproduce your work, and possibly re-use your data in future studies.

What should I document about my research data?

What information do others need in order to use your data and to understand or replicate your work? The answer to that question, together with the type of data you have, determines whether you need to document some or all of the items below.

Research Project Documentation

Rationale and context for data collection
Data collection methods
Structure and organisation of data files
Data sources used
Strategies for data validation and quality assurance
Analytical steps and pipelines (if any) used to process data
Information on data confidentiality, access and use conditions (sensitive data, such as personal data)
Archiving

Dataset documentation

Variable names and descriptions (for quantitative data)
Explanation of codes and classification schemes used
Algorithms used to transform data (including code)
File format (including version) for any software used

Read more about the importance of Research data management for both institutions and researchers.

Perrier, L., Blondal, E., Ayala, A. P., Dearborn, D., Kenny, T., & Pluye, P. (2017). Research data management in academic institutions: A scoping review. PLOS ONE, 12(5), e0178261. https://doi.org/10.1371/journal.pone.0178261

Classify information

Depending on the type of data, for instance personal data or other sensitive data, you might need to rethink how you plan to manage and store your data.

There are a couple of documents dedicated to the handling (Riktlinjer för hantering av information) and classification (Rutinbeskrivning för informationsklassning) of information at Halmstad University. There is also a page, Personal data processing, on the staff web where you can read more about handling of personal data.

Storing data

At Halmstad University, Sunet Drive is used as a dedicated storage solution that meets the requirements for storing all types of research data. Find out more here.

If you are not used to the procedure of classifying your data, or have other questions regarding data security, please contact the Data Protection Unit at Halmstad University (dataskydd@hh.se).

Organising data

It will be easier to find and keep track of data files, even after a long time, if the file names are sensible and the folder structures are well organised. Delete data and files that are not needed and will not be archived. Separate work in progress or drafts from completed work. Make sure you back up your original data.

File structuring

Think carefully about how best to structure files in folders so that files and versions are easy to locate and organise. When you collaborate with others, an orderly structure is even more important.

If your workplace already has established ways to structure folders, use the same method.

Consider the best hierarchy for files, deciding whether a deep or shallow hierarchy is preferable. But use a hierarchical structure!

Example folder structure

In a typical example structure, data and documentation files are held in separate folders. Data files are organised first by data type and then by research activity. Documentation files are likewise organised by type of documentation and then by research activity.
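A structure of this kind can be sketched in a few lines of Python. This is only an illustration: the folder names (data, documentation, audio, tabular, methodology and so on) and the function create_structure are hypothetical and not a standard prescribed by this guide, so adapt them to your own project.

```python
from pathlib import Path
import tempfile

# Hypothetical layout (illustrative names only):
# top-level folder -> type of data/documentation -> research activity
LAYOUT = {
    "data": {
        "audio": ["interviews", "focus_groups"],
        "tabular": ["survey"],
    },
    "documentation": {
        "methodology": ["interviews", "survey"],
        "information_sheets": ["interviews", "survey"],
    },
}


def create_structure(root: Path, layout: dict) -> list[Path]:
    """Create the nested folders under root and return the leaf folders."""
    created = []
    for top, types in layout.items():
        for kind, activities in types.items():
            for activity in activities:
                folder = root / top / kind / activity
                # exist_ok=True makes the function safe to run repeatedly
                folder.mkdir(parents=True, exist_ok=True)
                created.append(folder)
    return created


if __name__ == "__main__":
    # A temporary directory is used here so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_structure(Path(tmp) / "my_project", LAYOUT):
            print(folder.relative_to(tmp))
```

Keeping the layout in one dictionary means collaborators can agree on the structure once and recreate it identically on every machine.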

File naming

A file name should be seen as the principal identifier for a file. Good file naming conventions give clues to the content, status and version of a file, identify it uniquely, and help in classifying and sorting files. File names that reflect the file content also make files easier to search for and discover. In collaborative research, it is vital to keep track of changes and edits to files via the file name. File names should be independent of the location of the file on a computer.

Software is available that can help with naming files. Bulk renaming of files can be done with the Bulk Rename Utility in Windows, or with software such as Ant Renamer, Rename-IT or Renamer (macOS).

Best practice is to:

create sensible, meaningful but brief names
use file names to classify types of files
avoid using spaces, dots and special characters (& or ? or !)
use hyphens (-) or underscores (_) to separate elements in a file name
avoid very long file names
include versioning within file names where appropriate, e.g. _v1, _v2
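The rules above can be checked automatically. The following is a minimal sketch, not an official tool: the function name check_file_name and the 50-character limit are assumptions (the list only says to avoid very long names), and the check treats anything other than unaccented letters, digits, hyphens and underscores as a special character.

```python
import re

# Assumed limit; the guidance above only says "avoid very long file names".
MAX_LENGTH = 50


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    # One dot is allowed: the one separating the name from its extension.
    if name.count(".") > 1:
        problems.append("contains dots besides the extension")
    # Anything other than letters, digits, _ - . or space counts as special.
    if re.search(r"[^A-Za-z0-9_.\- ]", name):
        problems.append("contains special characters")
    if len(name) > MAX_LENGTH:
        problems.append("is longer than %d characters" % MAX_LENGTH)
    return problems


if __name__ == "__main__":
    for candidate in ["interview_p01_v2.txt", "final version!.docx"]:
        print(candidate, "->", check_file_name(candidate) or "ok")
```

A check like this could be run over a project folder before files are shared or archived, so that naming problems are caught early.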

Even though computers add basic information and properties to a file, such as file type, date and time of creation and modification, this is not reliable data management. This type of metadata should instead be added to the file name.
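One way to carry this metadata in the file name is to build names from a fixed set of elements. The sketch below is an assumption about how that might look: the function build_file_name, the order of the elements and the YYYYMMDD date format are illustrative choices, not a convention required by this guide.

```python
from datetime import date


def build_file_name(project: str, content: str, created: date,
                    version: int, extension: str) -> str:
    """Combine descriptive elements, date and version into one file name.

    Elements are separated by underscores; the date is written as
    YYYYMMDD so that names sort chronologically.
    """
    stamp = created.strftime("%Y%m%d")
    return f"{project}_{content}_{stamp}_v{version}.{extension}"


if __name__ == "__main__":
    print(build_file_name("healthstudy", "interview-p01",
                          date(2024, 3, 5), 2, "txt"))
    # prints healthstudy_interview-p01_20240305_v2.txt
```

Because the date and version travel with the name, they survive copying, e-mailing and moving between storage systems, which file system timestamps may not.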

This article largely builds upon information from the UK Data Service.