Do you use social media data in your research?

23.10.2019

Using material collected from social media entails a number of legal obligations and complex questions of research ethics.

Social media data may be readily or “freely” available, but that does not eliminate the need for careful legal and ethical consideration. On the contrary, using such data entails substantial legal obligations and complex questions of research ethics.

At present, much depends on the researcher’s own judgement. There are currently no widely accepted codes of practice or recommendations covering social media research. The Ethical Decision-Making and Internet Research (2012) report provides guidelines on internet research more broadly, but its recommendations are general in nature.

Social media can be defined in several ways, but every definition comes down to the same core. There are many different ways to use social media in research. As part of the research process, it acts as a channel for recruiting subjects and as a source of various types of data. Social media is researched, for example, using observational study and interactive study methods, as well as surveys and interviews. Social media is also employed for scientific communication between researchers in the same field and with the public more widely.

It is a good idea to begin planning research by considering the legal perspectives associated with social media material. On the one hand, this ensures the legality of the research. On the other hand, it provides a basis on which to evaluate the adequacy and sufficiency of the legally required measures from an ethical perspective. It is often said that the law sets out the ethical baseline. For activities to be at all morally acceptable, they must take into consideration the applicable national and international laws. In practice, this type of approach has been prominent in the research funding application process of the Horizon 2020 Framework Programme from the European Union.

However, the relationship between law and ethics is more complex. Laws can be assessed from an ethical standpoint, and they can also be unethical. Conversely, the actions required by law may, at best, ensure the realisation of fundamental ethical principles to an adequate extent. Such principles include, for example, avoiding harm, respecting autonomy or the right of self-determination, respecting privacy and respecting human dignity.

At the very least, laws can significantly contribute to the realisation of these principles, but additional actions are often necessary. By way of example, this applies to the following three perspectives related to social media research material.

Take note of contractual and copyright constraints and obligations

Researchers must carefully familiarise themselves with the terms of use of the online platform in question. Many platforms, such as Twitter, enable scientific research and offer interface tools for collecting data. However, they also prohibit automated data collection, known as web scraping. This does not apply to every platform – the terms of use in relation to research can vary. The rapid development of platforms, opacity of the terms of use and occasional changes to conditions present further challenges in this regard.

When researchers collect and use social media data, they are considered third parties. From a legal perspective, two contractual relationships can be distinguished: one between the social media platform provider and the user who produces the content, and another between the platform provider and the researcher.

Social media data may include copyrighted material. Insofar as collecting material can be considered in legal terms to be reproducing a work, the copyright perspectives already apply to the material collection phase (this is atypical). Copyright issues are most often relevant to research publications, the further use of material and the potential opening of material.

The threshold of originality is surpassed when the work is the outcome of original independent creative activity. For example, photographs taken by research subjects are normally considered original works, but individual tweets have also been legally considered original works.

Copyrighted materials can be used under the exemption for teaching and research use. However, according to Kopiosto, which manages licensing in Finland, this does not generally apply to social media data. In such cases, the copyright holder’s consent is required for the data to be used for research purposes.

If the material is to be published, it is advisable to request the consent of any people who appear in photographs, audio or video. In principle, if a research publication includes direct quotes that have not previously been published, the cited party’s permission is required.

Ensure data protection and respect privacy

Personal data, such as online identifiers, is almost always associated with social media data. In such cases, the collected material is subject to data protection regulations. In the context of social media, it is often difficult or, in some cases, impossible to anonymise data. Conventional methods, such as removing identifiers and coarsening information, are often ineffective.

On Twitter, it is possible to enter just part of the text into the search field and find the original tweet, accompanied by the user profile, with little difficulty. Anonymisation may also conflict with the platform’s terms of use. On Twitter, the user handle must be shown alongside direct quotes. Tweets should be published in full, and they should not be substantially altered.

There must always be a legal basis for processing personal data. For scientific research, this is most often (i) a task carried out in the public interest or in the exercise of official authority or (ii) consent. The processing of special categories of personal data (sensitive data) is prohibited in principle. Such data includes, for example, details on ethnic origin, political opinions and health.

However, there are exceptions to this prohibition. For social media research, the basis for processing data is usually (i) explicit consent, (ii) information manifestly made public or (iii) archiving, or research or statistical purposes in the public interest. This requires a data protection statement, risk/impact assessment and safeguards. First of all, it is advisable to double-check the research institute’s data protection policy.

In terms of ethics, a wider point of view is also required in order to protect privacy. The relevant perspective is the platform user’s own view of how personal the material is. Did the user manifestly make the material public? It is often said of social media that the context of publication and the “objective” of the material largely determine its sensitivity.

The privacy settings on platforms often change, and it is highly likely that many users accept the changes without truly looking into them. In the worst case, this could result in a user failing to notice if material intended to be private has become publicly available.

Irrespective of the platform user’s perspective, some of the produced material may also include information about others which should be considered private.

Evaluate the need for consent, how consent is obtained and the necessary measures

Is express consent required for social media research? It is not traditionally needed for published information and material from registers and archives. On social media, drawing a line between what is public and what is private is more difficult than normal. Clear cases of private material include closed, password-protected groups.

On many platforms, when users accept the terms of use, they are approving the use of the materials they produce for purposes including scientific research. This may be adequate consent from a legal perspective, but it is often insufficient from the perspective of research ethics.

For example, what happens if the user later deletes their account? Should this also be considered as withdrawing from the research? Can the data collected until that point still be used? Equally, just because a person uses a platform that enables scientific research in its terms of use, it cannot be inferred that free informed consent has been provided. Free informed consent is also not the same thing as consent forming the legal basis for processing personal data or the licence to copy legally protected work and make it available to the public.

Free informed consent is an essential ethical tool for safeguarding the right of self-determination in particular, and also for ensuring the avoidance of harm. It demands genuinely voluntary participation in the research, the opportunity to withdraw from the study at any time without incurring any consequences, and sufficient understanding of and information about the research. The last of these includes, for example, information on the risks and benefits of participation and the opportunity to ask questions.

But is this consent necessary? In practice, many factors make it more difficult to obtain for social media research. Complicating factors may include a very large number of research subjects, anonymous conversations, subsequently deleted accounts and generally outdated material.

The Finnish National Board on Research Integrity’s guidelines entitled The ethical principles of research with human participants and ethical review in the human sciences in Finland (2019) do not mention social media. However, they do define the research settings that require prior ethical assessment.

If consent is not requested (and the material is not entirely public), the researcher must apply for a statement from their institute of higher education, research institution or local Human Sciences Ethics Committee. The statement cannot be requested once the research has begun, and failure to obtain a statement may, in the worst case, constitute a violation of the responsible conduct of research. In the context of social media research, it is advisable to contact the Committee secretary in advance.

Even if free informed consent is not obtained, various measures can be taken to ensure that some of its components can be realised. One way of increasing the autonomy of research subjects is to keep basic information about the research up to date on the research project’s website. To some extent, this can ensure the fulfilment of the general duty to inform the research subject (for which the law recognises more exceptions than ethics), as well as their ability to withdraw from the study if desired.

In a certain sense, the three perspectives outlined above lead the researcher to perhaps the most pressing ethical principle: consideration of how to avoid causing harm to research subjects. Do the research subjects belong to vulnerable groups, such as minors or those with limited capacity? Is the research topic sensitive? Could the research results reinforce preconceptions, or stigmatise or pigeonhole individuals or groups? Could the research cause long-term mental harm to the subjects? Does the research entail safety threats for the subjects or their relatives, or for researchers? And so on.

If any of the foregoing questions can be answered in the affirmative, please find out whether you need a statement from the Human Sciences Ethics Committee before you begin your research. Preconditions for receiving a positive statement include taking appropriate account of contractual matters, copyrights and data protection.

Marko Ahteensuu, a Docent in practical philosophy, works as an acting university lecturer in research ethics at Tampere University.
