What should you consider when publishing research data?
1. Agree on data ownership in time
Data may only be published or accessed with permission from the owner of the data. If data is collected in a project, make written agreements on ownership with all parties, preferably before the collection begins. If you use data collected by third parties, make sure you have the right to publish it before the data is transferred, and ask for a written agreement on this.
Without written agreements, ownership is difficult to determine. You may omit agreements only if you carry out every phase of the research project on your own, without supervision.
2. Plan where to publish your data already when planning the research project
Does your research funder recommend some data archives (check the funder’s guidelines & Sherpa/Juliet)? How about the journal where you publish your results (check the author guidelines & FairSharing.org)?
It is a good idea to publish your data where other researchers in your field publish theirs. Ask your colleagues for tips, look for data citations in publications in your field, or use data search services to see where similar data has been published (e.g. Re3Data.org, DataMed).
If your research project produces several types of data, store each data type in the location best suited to it instead of storing everything in a single archive. For example, some archives are better suited to survey data and others to imaging data. Also remember to include a description of where the various parts of the dataset can be found.
Some archives will publish any data while others follow strict quality criteria. A good data archive provides a permanent identifier for the data (such as DOI or URN), making it easy to cite data correctly in publications. Also, see what kind of support services and access management the archive provides.
3. Check the data archive’s requirements for documentation, metadata and file format
To ensure the data can be utilized further, high-quality data archives require certain formats and documentation for the datasets. Adequate documentation needs to be provided with the data to ensure that it is interpreted correctly and the research results are verifiable. In addition, pay attention to file formats and prefer commonly used ones.
Documentation methods should be planned at the beginning of the project, before data collection starts. Documenting the data at the end of the project is difficult and can take a lot of time. Investing time in documentation during the project will therefore save time in the publishing phase.
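As an illustration of documenting data during the project, the following minimal Python sketch keeps a small descriptive record next to the data files and checks that nothing essential is missing. The field names in REQUIRED_FIELDS, the function name missing_fields and all the values are hypothetical; each data archive defines its own metadata requirements, so check those first.

```python
import json

# Hypothetical minimal set of descriptive fields (an assumption for this
# sketch); a real data archive defines its own required metadata.
REQUIRED_FIELDS = ("title", "creators", "description", "license", "file_formats")


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Illustrative values only, not taken from any real dataset.
record = {
    "title": "Example survey dataset",
    "creators": ["Researcher, Example"],
    "description": "Responses to a hypothetical questionnaire.",
    "license": "CC BY 4.0",
    "file_formats": ["csv"],
}

# Fields that still need to be filled in before publishing.
gaps = missing_fields(record)

# Serialise the record so it can be saved alongside the data files.
metadata_json = json.dumps(record, indent=2, ensure_ascii=False)
```

Running a check like this whenever the data changes makes gaps in the documentation visible early, instead of at the publishing phase.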
4. Remember the privacy of the research subjects
If your data contains sensitive or confidential information - such as sensitive personal information - you must determine whether the data can be anonymised or whether only the metadata of the dataset can be published. See the Finnish Social Science Data Archive for more information about identifiable data and anonymisation. The support services of your organisation can provide help and advice on data protection.
5. Choose a suitable license for your data
You can use Creative Commons licenses, among others, to specify the conditions for further use of your data. Publishing under a CC license means granting users certain rights under conditions that you specify. The MIT and GNU GPL licenses are suitable for algorithms, code and software related to the data.
For example, data can be stored in The Language Bank of Finland and the Finnish Social Science Data Archive either under an open license or with limited access (e.g. if the data contains personal information). Check the instructions of the respective data archive and/or contact the archive at an early stage to make sure that your data can be further utilized.
Mari Elisa Kuusniemi, Katri Larmo and Siiri Fuchs work as information specialists in Data Support at the library of the University of Helsinki.
This work is licensed under a Creative Commons Attribution 4.0 International license.