Getting started with research data
What data should I manage?
Data can be qualitative (e.g., interview transcripts, images, videos and audio recordings) or quantitative (e.g., tabular data and structured databases). Are there any ethical issues (including personal data) or intellectual property rights issues with your data that you need to address?
As part of Research Data Management (RDM), you should manage all data and code created or used in a research project, together with the documentation that describes them. This might include:
- Quantitative and qualitative data
- Raw or processed data
- Laboratory or research notebooks
- Code or software used to run data analyses
- Data workflows or pipelines
- Metadata (documentation describing the data)
What should I document about my research data?
What information would others need in order to understand and use your data, or to replicate your work? The answer, together with the type of data you have, determines which of the items below you need to document.
Research Project Documentation
- Rationale and context for data collection
- Data collection methods
- Structure and organisation of data files
- Data sources used
- Strategies for data validation and quality assurance
- Analytical steps and pipelines (if any) used to process data
- Information on data confidentiality, access and use conditions (sensitive data, such as personal data)
- Variable names and descriptions (for quantitative data)
- Explanation of codes and classification schemes used
- Algorithms used to transform data (including code)
- File formats and the software (including version) used
Read more about the importance of research data management for both institutions and researchers:
Perrier L, Blondal E, Ayala AP, Dearborn D, Kenny T, et al. (2017) Research data management in academic institutions: A scoping review. PLOS ONE 12(5): e0178261. https://doi.org/10.1371/journal.pone.0178261
If your data include personal data or other sensitive data, you may need to reconsider how you plan to manage and store them.
Halmstad University has two documents dedicated to information: one on handling it, Riktlinjer för hantering av information (Guidelines for handling information), and one on classifying it, Rutinbeskrivning för informationsklassning (Procedure for information classification). The staff web also has a page, Guidelines for processing information, with more about handling personal data and a guide to help you identify the right storage solution for your needs.
At Halmstad University, Sunet Drive is the dedicated storage solution for research data and meets the requirements for storing all types of research data.
If you are unfamiliar with classifying data, or have other questions about data security, please contact the Data Protection Unit at Halmstad University (email@example.com).
Data files are easier to find and keep track of, even after a long time, if file names are sensible and folder structures are well organised. Delete data and files that are not needed and will not be archived. Keep work in progress and drafts separate from completed work. Make sure you back up your original data.
Think carefully about how best to structure files in folders so that files and their versions are easy to locate and organise. When you collaborate with others, an orderly structure is even more important.
If your workplace already has established ways to structure folders, use the same method.
Decide whether a deep or a shallow hierarchy suits your files best, but do use a hierarchical structure.
Example folder structure
In this example, data files and documentation files are held in separate folders. Data files are further organised by data type and then by research activity. Documentation files are likewise organised by type of documentation and then by research activity.
A file name should be treated as the principal identifier of a file, so a good file naming convention can give clues to a file's content, status and version. A good name identifies a file uniquely and helps in classifying and sorting files, and names that reflect the content make files easier to search for and discover. In collaborative research, it is vital to keep track of changes and edits to files through their names. File names should also be independent of where the file is stored on a computer.
Software is available to help with naming files. You can rename files in bulk with Bulk Rename Utility on Windows, or with software such as Ant Renamer, Rename-IT or Renamer (macOS).
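As a rough illustration of this kind of layout, the Python sketch below builds a comparable hierarchy with the standard library's `pathlib`. Every folder name in it (`project-name`, `quantitative`, `survey` and so on) is a hypothetical placeholder, and `create_structure` is an illustrative helper, not part of any tool mentioned in this guide.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: all folder names are illustrative placeholders.
# Data and documentation are kept apart; each is split by type, then by
# research activity.
LAYOUT = {
    "data": {
        "quantitative": ["survey", "experiment"],
        "qualitative": ["interviews", "focus-groups"],
    },
    "documentation": {
        "methods": ["survey", "interviews"],
        "codebooks": ["survey", "experiment"],
    },
}


def create_structure(root: Path, layout: dict) -> list:
    """Create root/<top>/<type>/<activity> folders and return them sorted."""
    created = []
    for top, types in layout.items():
        for kind, activities in types.items():
            for activity in activities:
                folder = root / top / kind / activity
                # parents=True creates the intermediate levels; exist_ok=True
                # makes re-running the function harmless.
                folder.mkdir(parents=True, exist_ok=True)
                created.append(folder)
    return sorted(created)


if __name__ == "__main__":
    # Build the structure in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "project-name"
        for folder in create_structure(root, LAYOUT):
            print(folder.relative_to(root).as_posix())
```

Keeping the layout in one dictionary means a research group can agree on the structure once and recreate it identically for every new project.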
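If you prefer a scripted approach, the dedicated tools above can be approximated in a few lines of Python. The sketch below is an assumption-laden illustration, not a replacement for those tools: the helpers `clean_name` and `bulk_rename` are hypothetical, and the cleaning rules (turn spaces and extra dots into underscores, drop other special characters, keep the last extension) are one possible reading of the best practices that follow. It only previews changes unless `dry_run=False` is passed.

```python
import re
import tempfile
from pathlib import Path


def clean_name(name: str) -> str:
    """Return `name` with spaces, extra dots and special characters removed.

    The last extension is kept. Runs of whitespace and dots in the rest of
    the name become a single underscore; any other character that is not an
    ASCII letter, digit, hyphen or underscore is dropped.
    """
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = name, ""
    stem = re.sub(r"[\s.]+", "_", stem)
    stem = re.sub(r"[^A-Za-z0-9_-]", "", stem)
    stem = re.sub(r"_+", "_", stem).strip("_")
    return f"{stem}.{ext}" if ext else stem


def bulk_rename(folder: Path, dry_run: bool = True) -> list:
    """Plan (and optionally apply) clean_name to every file in `folder`.

    Subfolders are ignored. With dry_run=True (the default) nothing is
    renamed; the (old, new) pairs are returned so they can be reviewed.
    """
    plan = []
    for path in sorted(folder.iterdir()):
        if path.is_file():
            new = clean_name(path.name)
            if new != path.name:
                plan.append((path.name, new))
    targets = [new for _, new in plan]
    if len(set(targets)) != len(targets):
        raise ValueError("two files would end up with the same name")
    if not dry_run:
        for old, new in plan:
            target = folder / new
            if target.exists():  # never overwrite an existing file
                raise FileExistsError(f"{new} already exists")
            (folder / old).rename(target)
    return plan


if __name__ == "__main__":
    # Demonstrate on throw-away sample files in a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for name in ["Interview notes 03 & summary!.docx", "survey_results_v1.csv"]:
            (folder / name).write_text("sample")
        for old, new in bulk_rename(folder):
            print(f"{old} -> {new}")
```

Previewing first is deliberate: renaming shared files silently can break links and scripts that colleagues rely on.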
Best practice is to:
- create sensible, meaningful but brief names
- use file names to classify types of files
- avoid using spaces, dots and special characters (& or ? or !)
- use hyphens (-) or underscores (_) to separate elements in a file name
- avoid very long file names
- include versioning within file names where appropriate, e.g. _v1, _v2
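The checklist above can also be turned into an automatic check. The Python sketch below is a minimal, hypothetical `check_file_name` helper that reports which of these practices a name breaks. Two details are assumptions, not rules from this guide: the 40-character limit (the guidance only says to avoid very long names) and treating any character outside ASCII letters, digits, hyphens and underscores as special, which also flags letters such as å, ä and ö.

```python
import re

# Assumed limit: the guidance only says "avoid very long file names".
MAX_LENGTH = 40
# Assumed version pattern, following the _v1, _v2 examples.
VERSION = re.compile(r"_v\d+$")


def check_file_name(name: str, require_version: bool = False) -> list:
    """Return a list of problems with `name`; an empty list means it passes."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension
        stem, ext = name, ""
    problems = []
    if " " in stem:
        problems.append("contains spaces")
    if "." in stem:
        problems.append("contains dots outside the extension")
    # Anything other than ASCII letters, digits, '_', '-' (spaces and dots
    # are reported above) counts as a special character here.
    if re.search(r"[^A-Za-z0-9_\-\s.]", stem):
        problems.append("contains special characters")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if require_version and not VERSION.search(stem):
        problems.append("no version suffix such as _v1")
    return problems


if __name__ == "__main__":
    for candidate in ["interview-03_transcript_v2.docx", "final data?.xlsx"]:
        print(candidate, check_file_name(candidate) or "OK")
```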
Computers automatically record basic properties of a file, such as its file type and the date and time of creation and modification, but relying on these is not reliable data management, because such properties can change when files are copied or moved between systems. Add this type of metadata to the file name instead.
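One way to put dates and versions into the file name itself is sketched below in Python. The pattern `YYYY-MM-DD_project_description_vN.ext` and the helper `build_file_name` are illustrative assumptions, not a convention prescribed by this guide: the ISO 8601 date (YYYY-MM-DD) comes first so that an alphabetical listing is also chronological, underscores separate the elements, and hyphens join words within an element.

```python
import re
from datetime import date
from typing import Optional


def _slug(text: str) -> str:
    """Lower-case `text` and join its words with hyphens, dropping other characters."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def build_file_name(project: str, description: str, version: int,
                    extension: str, day: Optional[date] = None) -> str:
    """Return a name of the form YYYY-MM-DD_project_description_vN.ext."""
    if version < 1:
        raise ValueError("version numbers start at 1")
    if day is None:
        day = date.today()
    parts = [day.isoformat(), _slug(project), _slug(description), f"v{version}"]
    return "_".join(parts) + "." + extension.lstrip(".").lower()


if __name__ == "__main__":
    # Hypothetical project and description, for illustration only.
    print(build_file_name("Coastal", "interview 03", 2, "docx", date(2023, 5, 14)))
    # prints 2023-05-14_coastal_interview-03_v2.docx
```

Note that plain version numbers sort alphabetically, so _v10 would be listed before _v2; if you expect many versions, zero-padding such as _v01 keeps them in order.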
This article largely builds upon information from the UK Data Service.