DML Hub Data Sharing Policy and Procedures

February 21, 2011

    I. Introduction

The Digital Media and Learning Initiative (“DML Initiative”) is a funding initiative of the John D. and Catherine T. MacArthur Foundation that aims to determine how digital media are changing the way people—especially young people—learn, play, socialize and participate in civic life. As part of the DML Initiative, the Foundation has funded the DML Hub, a research project at the University of California Humanities Research Institute. The purpose of the DML Hub is to develop shared policies, infrastructures, and resources to facilitate research collaboration and outreach for the DML field.

The DML Hub has determined that, in order to further the work of building the DML field, it is important to establish this policy governing the sharing of data by Foundation grantees and access to that data by third parties who may wish to use it. This policy is consistent with and in furtherance of the Foundation’s Intellectual Property Policy, which calls for the broad dissemination of work product and the sharing of data created with Foundation funds.

    II. Rationale for the Policy

The overarching principle of the DML Hub’s data sharing policy rests on the belief that data should be shared as openly as possible, to the extent that sharing is consistent with the protocols and consent structures approved by grantees’ Institutional Review Boards (“IRBs”) and with the privacy of research subjects, and does not unfairly compromise the rights of Research Leads1 to analyze and publish their research material. Sharing informs the construction of a more robust knowledge base and fosters a culture of collaboration that ultimately benefits the learning of all participating researchers in the DML field. To realize these benefits, we must build a base of mutual respect and trust among participating researchers and embark on this effort in a spirit of experimentation and open communication, so that we can continue to iterate on and improve our policies and norms. Through this process, our work can provide models and leadership in defining a set of practices that promote sharing and collaborative work in the service of field-building.

Data sharing will also promote the objectives of the DML Initiative, including the development of an evidence base for how learning is distributed across both formal and informal settings and mediated by networked social media. Pursuing this research agenda requires synthesizing findings across diverse settings and institutions, such as schools, homes, networked media, online sites, youth peer cultures, and afterschool settings. The DML Initiative offers an opportunity to develop a new model of networked and distributed knowledge production aimed at building this robust empirical evidence base to inform research, policy, and practice in highly complex societies.

A networked knowledge base of this type requires collaborative analysis and sharing of data among researchers tackling different dimensions of the problems surrounding new media and learning. The DML Initiative offers a unique opportunity to conduct collaborative analyses across broad data sets documenting learning by diverse young people in diverse settings. Although it is common to analyze quantitative data across diverse settings and demographics, it is rare for qualitative data to be analyzed at this breadth and with this range of diversity. Sharing multiple forms of data across these dimensions will elevate the research capacity of all participating projects and researchers by bringing multiple vantage points to each body of research and by opening up opportunities for collaboration. Further, by making protocols, instruments, and, to the extent possible, data available to the public, an open data sharing process will enable other researchers to extend the utility and influence of indicators, findings, and frameworks, supporting the building of the DML field writ large.

In addition to the formal data sharing policies and infrastructures described below, we expect to create ongoing occasions for researchers at all academic levels to discuss emerging codes, themes, analyses, and publication plans. The goal of the data sharing process is to provide an infrastructure for collaboration, but productive collaboration will succeed only in a context of ongoing dialogue and mutual respect for diverse intellectual contributions. When, in the course of sharing data and analyses, researchers find themselves working in closely related domains, they should make every effort to credit the contributions of fellow researchers through joint publications and appropriate attribution, including attribution to the source of any data used.

    III. Commitment to Share Data

To provide consistency across the various research projects within the DML Initiative, the mechanics of data sharing will be handled primarily at the level of the Principal Investigator (“PI”). Therefore, every DML Initiative PI must agree to this policy and, pursuant to it, to share any data collected in connection with empirical work funded by the Foundation’s DML Initiative. In addition, pursuant to this policy, the Foundation’s Intellectual Property Policy, and any associated grant letter, each DML Initiative PI must require all other grant personnel to agree to this policy as part of their participation in any funded project.

    IV. How Data Will Be Shared

As a general matter, data will be shared through requests from one DML Initiative PI to another. When an Initiative researcher who is not a PI wishes to have access to another researcher’s data, she must ask her own PI to make the request on her behalf to the PI who oversees the project where the data was collected. Subject to the limited exceptions detailed below, access will then be granted to the requesting PI, who will be responsible for sharing the data with the requesting researcher and for attending to any requirements or concerns regarding proliferation, storage, access, and security. PIs requesting data should also list the name, institution, and role of everyone who will have access to the shared data, and should agree not to grant additional access to the data without permission from the PI providing it. This helps ensure tight controls on access and strong accountability between researchers, while still allowing broad sharing throughout the Initiative. PIs who receive data sharing requests should also notify the Research Lead responsible for the requested data of both the request and the decision.

    V. Limited Exceptions

As noted above, the presumption is that data will be shared as widely and as often as possible via DML Initiative PIs. However, we recognize that there may be a limited number of legitimate reasons to postpone sharing or even to withhold data. To acknowledge these concerns and provide a method for addressing them, we have developed a list of pre-approved “check box” exceptions to the policy that PIs can invoke, when appropriate, in response to a data-sharing request.2 These exceptions include: (1) revealing the data would violate the researcher’s IRB protocol;3 (2) the data has not been sufficiently verified to share; (3) sharing the data would create unacceptable risks to the research subjects;4 (4) sharing the data would be illegal; and (5) sharing the data would create a conflict of interest or be otherwise unethical.5

    VI. How Data Can Be Used

When data is first shared, the requesting researcher may access it only for review; it may not be analyzed for purposes of publication. When requesters want to use the data for publication in more than a de minimis fashion, they must seek permission from the data owner. Accordingly, we have developed a three-tiered approach to the use of shared data for publication:

De minimis use of data: the data requester may proceed without seeking permission from the data owner but must attribute any use of the data to the owner in any publication.6
Marginal use of data for publication: the data requester must contact the data owner for permission to use the data for publication.
Substantial use of data as the basis for publication: the data requester must contact the data owner for permission and offer to co-author the publication with the data owner.7

    VII. Process For Addressing Complaints or Disputes

If any participating researchers have a complaint about how these policies have been developed or carried out, they should first try to resolve the issue in consultation with the relevant PIs. If the PIs cannot resolve the issue, or if one of the PIs is implicated in the complaint, the researcher should contact Mizuko Ito or David Theo Goldberg at the DML Hub. The DML Hub has established procedures for mediating data sharing disputes, including the option for the researcher requesting the data to participate in the mediation process.

    VIII. Sharing with non-DML Initiative Researchers

As noted above, this data-sharing policy applies to all researchers funded as part of the DML Initiative. However, we recognize that these researchers collaborate across many fields and disciplines, and that there are many beneficial opportunities to share data outside MacArthur-funded projects. To accommodate this and build out our data-sharing capacity while still applying appropriate control and accountability mechanisms, the DML Hub has established a process for approving PIs of empirical research conducted outside the Initiative so that they can request DML Initiative data (and vice versa), subject to two conditions: (1) they agree to enter into a reciprocal data-sharing agreement that mirrors this policy; and (2) they have been vetted within the DML Initiative, with any DML Initiative PI able to veto a new entrant for any reasonable reason.

    IX. Specific Data Sharing Practices

Below, we provide additional detail on how to approach and prepare particular types of data for sharing under this policy.

              a. Protocols, Instruments, and Codes

All IRB protocols for research conducted under the DML Initiative must be designed to allow for sharing and analysis across institutions as described in this policy. Research Leads must prepare individual protocols at their home institutions, naming the Hub as a supporting institution. Sharing must comply with the individual IRB protocol requirements for data privacy, including the stripping of personal identifiers (names, social security numbers, phone numbers, etc.) when sharing beyond the immediate research team. Names of schools, programs, and cities/towns can be changed at the Research Lead’s discretion, although we encourage Leads to coordinate this process within the research team to maintain clarity. In addition, all DML Initiative IRB protocols themselves (along with any associated IRB correspondence or minutes) must be made available for sharing pursuant to this policy.

Notwithstanding the need for separate items and protocols keyed to the needs of a specific survey or case study, all DML Initiative researchers should make their best efforts to develop a shared set of interview and survey protocols across the DML Initiative. In addition, all research instruments must be made available for sharing as soon as they are developed. Network-wide protocols will be released to the public under a CC-BY license,8 and every effort will be made to release them as soon as they are developed, or after the data has been collected if that is necessary for the creation and validation of measures. The instruments associated with specific case studies or assessments will be released to the public at the individual researcher’s discretion.

DML Initiative researchers should also seek to develop a set of shared codes and, during analysis, code for the shared codes as well as for the codes specific to their own study. The shared codes will be circulated back across the DML Initiative to allow for iterative and collaborative analysis, and released publicly to give other research projects the opportunity to build on the categories that we develop in our work.

               b. Quantitative Data Sharing

For quantitative data, the expectation is that data will be shared pursuant to this policy and with the public after the participating researchers have had the opportunity to conduct their first complete analysis of the data.

The DML Initiative PI on the dataset will have the right of first analysis and publication, though other DML Initiative PIs can request the right to analyze and publish on the data before public release. The data will be released publicly when the report of the main findings has been accepted for publication. The expectation is that public release will happen within 30 months of data collection, but extensions to this timeline can be considered in extenuating circumstances.

Data for public release should be free of identifiers that would permit linkages to individual research participants and variables that could lead to deductive disclosure of the identity of individual subjects.

               c. Qualitative Data Sharing

For qualitative data, the expectation is that data will be shared pursuant to this policy and in particular, that field notes and interview transcripts will be available to share as soon as they are completed.

However, qualitative data will not be made public except in the context of analysis and publication. Work-in-progress and raw data may be made public only by the Research Lead in charge of a specific case.9 In addition, the Research Lead on a specific case study retains the sole prerogative to publish work that describes that case study. It is imperative that DML Initiative researchers stay in close communication when shared data collected by a given Research Lead is requested for publication purposes. Transparency at this stage is important so that there are no surprises on either side when people are preparing for publication.

When data has been shared with a requester for publication, the expectation is that the requester and the data owner will stay in ongoing communication about the requester’s findings and analysis as part of the collaborative endeavor, both to vet analyses in progress and to ensure that the data is being used appropriately. Additionally, all research publications and presentations should be made available for comment for a period of at least two weeks before they are published or presented.

              d. Attribution

When quantitative data are used in a publication, the study and lead investigators should be noted (e.g., Smith, year the data was collected or made public), and where the data has been posted publicly, a link to it and the name of the study should be provided. In addition, authors should cite, where possible, any research publications by collaborators that speak to the issue at hand.

When qualitative data are used in a way that a specific case, individual, or observation from a specific case is identifiable, the Research Lead and the individual case study should be attributed. For example, in the case of the Digital Youth Project, when drawing from shared data, for each quote and example, we noted the name of the Research Lead and the study (e.g. Perkel, MySpace Profile Production), and we included an appendix that listed the participating researchers and the full study titles for each research project (see Ito et al. 2009).

              e. Orphan Data

In the case where a researcher is leaving the project and does not plan to continue to conduct analysis on the data that they contributed to collecting, she should ensure that her material has been archived with the project, and should assign the rights to the data and to any analysis of it to the PI for the project.

If a researcher becomes unreachable, the PI on the research project will assume the right to make decisions about the use of the data.


1 A Research Lead has primary responsibility for a particular case study. The Research Lead may be the sole fieldworker on a case or may lead a group of fieldworkers working collaboratively on a single case. For the purposes of this data sharing policy, all fieldwork efforts should define and name the work as distinct case studies, each with a clear Research Lead.
2 This is similar to Open Records and Freedom of Information Act (FOIA) requests, where, upon request, information held by the government must be made public unless the government asserts one of a series of set exemptions, such as the protection of classified national security information.
3 Note that checking this box would require the researcher to attempt either to de-identify the data so it can be shared or to amend their IRB protocol to include the requesters as collaborators, so that there would be IRB approval to share.
4 Examples include risks to physical safety or privacy risks.
5 We expect to add other exceptions through conversation with the Foundation and DML Initiative researchers as research and sharing practices continue to develop.
6 An example of de minimis use might be aggregate observations about the data without inclusion of specific case studies.
7 Note that data owners can deny permission for use in publication for reasons beyond the limited set of exceptions for data sharing; for example, if the group that collected the data is publishing on the same specific point and wants to wait until its own publication has been accepted before allowing the data to be published a second time.
9 To help facilitate this, the DML Hub will provide a space for DML Initiative projects to list relevant information, such as the contact information for the PI, the starting and ending dates of any data collection, a short description of how the project fits within the DML Initiative, and a list of any researchers who are publicly involved in the project and their roles.