By Elliot Schrage, Vice President for Special Projects, and Chaya Nayak, Strategic Initiatives Manager
Building on the effort we launched last spring to promote independent research on social media’s role in elections, today our partners at Social Science One and the Social Science Research Council (SSRC) announced the first researchers who will gain access to privacy-protected Facebook data. More than 60 researchers from 30 academic institutions across 11 countries were chosen through a competitive peer review process organized by the SSRC. You can find the full list of research grants awarded and other details in today’s announcements from the SSRC and Social Science One. To ensure the independence of the research and the researchers, Facebook did not play any role in the selection of the individuals or their projects and will have no role in directing the findings or conclusions of the research.
We hope this initiative will deepen public understanding of the role social media plays in elections and democracy and help Facebook and other companies improve their products and practices. Over the past two years, we have made significant improvements in how we monitor for and take action against abuse on our platform. We know we can’t do this work alone, and much of the progress we have made is due to significant support from external partners, including governments, civil society groups, NGOs, other private sector companies and academics. This initiative will deepen our work with universities around the world as we continue to improve our ability to address current threats and anticipate new ones.
In support of this effort, over the past several months, we’ve begun building a first-of-its-kind data sharing infrastructure to provide researchers access to Facebook data in a secure manner that protects people’s privacy. We’ve consulted with some of the country’s leading external privacy advisors and the Social Science One privacy committee for recommendations on how best to ensure the privacy of the data sets shared, and we have rigorously tested our infrastructure to make sure it is secure. These steps include building a process to remove personally identifiable information from the data set and allowing researchers to access the data set only through a secure portal that uses two-factor authentication and a VPN. In addition to building a custom infrastructure, we’re also testing the application of differential privacy, which adds statistical noise to raw data sets so that individuals can’t be re-identified, without undermining the reliability of the results. It also limits the number of queries a researcher can run, so that the system cannot be repeatedly queried to circumvent privacy measures. We hope that this testing will lead to other benefits by letting us unlock more data sets for the research community safely and securely.
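To illustrate the two ideas described above, noise added to query results and a cap on the number of queries, here is a minimal Python sketch. It is not Facebook’s implementation: the `PrivateCounter` class, the Laplace noise mechanism, and the `epsilon_per_query` and `max_queries` parameters are all assumptions chosen for illustration.

```python
import random


class PrivateCounter:
    """Hypothetical example: noisy count queries under a fixed query budget."""

    def __init__(self, records, epsilon_per_query=0.5, max_queries=3, seed=None):
        self._records = list(records)
        self._epsilon = epsilon_per_query   # assumed privacy parameter per query
        self._remaining = max_queries       # assumed cap on queries per researcher
        self._rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two independent exponential draws with mean
        # `scale` follows a Laplace distribution centered at zero.
        return self._rng.expovariate(1 / scale) - self._rng.expovariate(1 / scale)

    def count(self, predicate):
        """Return a noisy count of records matching `predicate`."""
        if self._remaining <= 0:
            raise RuntimeError("query budget exhausted")
        self._remaining -= 1
        true_count = sum(1 for record in self._records if predicate(record))
        # Adding or removing one person changes a count by at most 1,
        # so the noise scale is 1 / epsilon.
        return true_count + self._laplace(1 / self._epsilon)


# Usage: two queries are allowed; a third would raise RuntimeError.
shares = [{"country": "US"}] * 120 + [{"country": "BR"}] * 80
counter = PrivateCounter(shares, epsilon_per_query=0.5, max_queries=2, seed=0)
noisy_us = counter.count(lambda r: r["country"] == "US")
print(f"Noisy count of US shares: {noisy_us:.1f}")
```

In this sketch, each answer is close to the true count but not exact, and the query cap prevents a researcher from averaging away the noise by asking the same question many times.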
We understand many stakeholders are eager for data to be made available as quickly as possible. While we remain committed to advancing this important initiative, Facebook is also committed to taking the time necessary to incorporate the highest privacy protections and build a data infrastructure that provides data in a secure manner. With these safeguards in place, selected researchers will gain access to the following data:
- CrowdTangle: CrowdTangle allows researchers to track the popularity of news items and other public posts across social media platforms. The CrowdTangle API will allow researchers to access public Facebook and Instagram data, which includes posts from public pages, public groups and verified profiles. Beginning today, we are providing the researchers selected in this initial round of grants, as well as Social Science One commission members, access to this API.
- Ad Library API: The Ad Library API provides data on ads related to politics or issues on Facebook in the US, UK, Brazil, India, Ukraine, Israel and the EU. Beginning today, researchers have access to the API. Facebook and Social Science One are also working to provide feedback on the API to help make it more useful for research purposes.
- Facebook URLs Data Set: The URL data set will be aggregated and anonymized to prevent researchers from identifying any individual Facebook users. This data set includes URLs that have been shared on Facebook, with public privacy settings, by at least 100 unique Facebook users on average. For each URL, it includes the link, the total number of shares, a text summary of the content within the URL, engagement statistics such as the top country where the URL was shared, and information about fact-checking ratings from our third-party fact-checking partners. More details on what is contained in this data set can be found in the URL Codebook. Before getting access to this data set, researchers must attend a training session we are leading in June about these data and our research tool. Over the coming months, we will continue to explore ways to expand the scope of the data we make available to researchers in line with our commitment to privacy.
We want to thank Social Science One and SSRC, as well as the many experts in the academic and privacy communities who contributed, for their hard work and ongoing investment in supporting this unprecedented partnership. We also appreciate the commitment from leading foundations funding this research, including the John and Laura Arnold Foundation, the Democracy Fund, the William and Flora Hewlett Foundation, the John S. and James L. Knight Foundation, the Charles Koch Foundation, the Omidyar Network, the Sloan Foundation and the Children’s Investment Fund Foundation.