Artificial intelligence/Bellagio 2024


On February 19–23, 2024, a group of 21 Wikimedians, academics, and practitioners met at the Rockefeller Foundation’s Bellagio Center to draft an initial research agenda on the implications of artificial intelligence (AI) for the knowledge commons. We aimed to focus attention (and therefore resources) on the vital questions volunteer contributors have raised, including both the promise of AI systems for the open Internet and their risks and negative impacts.

We are optimistic that the use of machine learning and other AI approaches can improve the efficiency of operations on these platforms and the work of volunteers, and can support efforts to reach a new generation of readers and contributors. At the same time, we are concerned about the potential negative impact of the use of AI on the motivations of contributors as well as the misuse of these technologies to discourage volunteers and disrupt their work in the peer-produced knowledge commons ecosystem.

Below, we publish our initial thinking on potential research directions that may eventually become a shared research agenda. Our hope is that many researchers across industry, government, and nonprofit organizations will adopt the final research agenda to help support and guide their own research. By focusing research efforts on topics that benefit the knowledge commons and help volunteers, we aim to inform and guide product development, public policy, and public opinion about the future direction of AI-related technology.

A note on AI ethics

The development, evaluation, deployment, and measurement of AI tools raise many ethical concerns, both in general and in the context of the knowledge commons. Articulating these risks and developing principles and guidelines to shape research in this area is both a significant undertaking and a critically important aspect of every part of the research agenda outlined here. Efforts to develop these principles and guidelines should proceed in parallel with the research itself. Researchers engaged in any aspect of the work described here have a responsibility to consider the harms and impacts of their research. As ethical principles and guidelines are developed, they should be used to critically assess and shape all the work outlined below. In turn, we hope that the results of this work will deepen our understanding of ethical research.

Research areas

This section is currently a draft.

This is a summary of potential research areas that the research agenda may eventually pursue. It represents some initial brainstorming and work that we are sharing here to gather early feedback and direction from Wikimedians, other knowledge commons communities, and researchers, with the aim of publishing a more stable agenda in March or April 2024.

The four potential research areas are:

Characterizing and monitoring use of AI in the knowledge commons over time

Knowledge commons platforms are one of the greatest success stories of the Internet. As the latest wave of automation sweeps through people’s digital work and lives, there are concerns about the disruption this may cause for knowledge equity around the world, for the communities of volunteers engaged in these initiatives, and for the integrity of the knowledge they help create.

Robust current research on the extent of these changes is lacking. This lack of data makes it difficult for these communities (and their broader ecosystem of partners, supporters, and collaborators) to address current and potential harms or to make the most of the new capabilities of foundational and frontier models. Used wisely, these models hold promise for addressing ongoing knowledge commons challenges such as community growth, contributors’ experience, and content quality.

Proposed research

  • Current and future uses of AI. AI did not start with the launch of ChatGPT in November 2022. Many AI tools are already deployed in knowledge commons communities, and popular knowledge commons platforms like Wikipedia have employed machine learning tools for more than a decade. However, our understanding of how actively these tools are used and how they can be improved is limited, especially when it comes to newer generative capabilities. We also lack an understanding of whether contributors find such capabilities helpful, and of what measures are needed to empower all contributors to use them. We need to explore how AI could lead to new ways for people to contribute, including people who are, for a variety of reasons, not currently part of these communities. Our “State of AI in the knowledge commons” research agenda could include:
    • A review of currently deployed systems, including (where available) quantitative and qualitative evidence of use and impact.
    • A survey of contributors’ experiences with and opinions of AI assistants, as well as broader issues such as their perceptions of how knowledge commons are used in the development of AI models and applications.
    • A hub documenting the AI assistants in use, how they work, and what they are for, including datasets, related resources, and ways for the community to provide feedback and contribute to their further development.
  • Contributors’ motivations. To attract new contributors and help make knowledge commons communities sustainable in the face of ongoing challenges, it is essential to deepen our understanding of the reasons people do or do not contribute. To have real impact, this research will need to be mindful of the diversity of existing and prospective contributors across the world, including countries and demographics that are currently underrepresented. Our assumption, supported by some evidence from platforms such as GitHub and StackOverflow, is that the mainstream availability of tools such as ChatGPT could fundamentally change both levels of participation and contribution practices, with mixed effects. This means we will first need to revisit and refresh the existing frameworks used to study community motivations so that they account for the impact of AI assistants, informed by ongoing research in responsible AI, along with an up-to-date account of contribution profiles. This work would then inform community discussions through workshops and other established engagement channels.


Develop AI tools for the knowledge commons

There are many places where AI can be used to improve knowledge commons processes or outputs. Research can aid in the development of new techniques and tools to do so. These tools can broadly be classified into two groups:

  1. Tools focused on content contribution that make contributing easier or more effective. This is important because maintaining the commons is simply too much work for too few people. AI can help boost productivity and content quality.
  2. Tools focused on content consumption that improve the user experience, for example by making content easier to discover.

The proposed research areas below focus on the Wikimedia ecosystem, but we hope they can serve as inspiration for other knowledge commons projects as well.

Empower knowledge commons communities in the AI era

Recent advances in AI would not be possible without the communities that build knowledge commons. Knowledge commons are widely recognized as some of the most important datasets on which AI systems are developed and trained. However, knowledge commons communities have little to no influence over how these AI systems are built and used. This is a particular challenge when these systems begin to affect knowledge commons communities (e.g., by threatening to reduce participation in them) or when they violate core values of those communities (e.g., citation and provenance). As knowledge commons-based AI systems grow in prominence across society, there is growing demand for new mechanisms to ensure that knowledge commons communities have an influence in the AI ecosystem that is commensurate with the value they create for it. These mechanisms must be powerful enough to effect change, while remaining compatible with the openness principles that are core to many knowledge commons communities.

Proposed research

Some of the research directions include, but are certainly not limited to:

  • Revisiting licensing: New AI systems use data in ways that were difficult to predict when the current family of open content licenses was developed, or when communities and people decided to attach these licenses to their content. This has led to growing interest in new types of licenses (or new mechanisms to express and enforce community preferences) that, for instance, empower communities to express preferences about the use of their content in AI training/RAG (e.g., explicit opt-in or consent); a hypothetical sketch of such a machine-readable preference signal appears after this list. Do new types of licenses need to be developed to support community values and norms around the use of their content in AI systems? There are also questions as to whether such licenses would be enforceable (either legally or practically) and whether licensing is the right mechanism for influencing the behavior of downstream technologies. What other legal, normative, and policy strategies might complement or replace licensing in this regard?
  • Collective action across knowledge communities: Existing research suggests that collective action is essential for content producers to successfully influence the AI ecosystem. What types of community structures and tools are needed to facilitate collective action across knowledge commons communities with respect to the behavior of AI models and systems? Would a knowledge commons coalition be a sufficient institution? What would such a coalition look like? What types of shared infrastructure would be needed to facilitate finding consensus across the leaders and members of multiple commons communities? Can we adapt ideas from works councils to help influence the developers of AI systems on questions where knowledge commons communities are key stakeholders?
  • Supporting open AI systems: While knowledge commons communities often will not, and often cannot, restrict the use of their knowledge to specific AI systems regardless of those systems’ behavior, they can pay special attention to AI systems whose behavior better matches their values, especially transparency and openness more generally. What can knowledge commons communities do to best support these types of AI systems? For instance, are there knowledge creation drives that could differentially benefit these systems? How can knowledge commons communities best contribute to the larger public data repositories that are being developed? Should these public repositories have usage rules that reflect the values of the communities that contributed to them, and how would those rules be enforced?
  • Additional influence mechanisms: What other mechanisms for influence do knowledge communities have? For instance, are there normative standards that could be included in professional conduct guides, and how might those standards be enforced? For communities that want to restrict the use of their knowledge by a certain set of actors, what are all the ways they can do so?
  • Knowledge commons communities as representatives of the broader truth infrastructure of the web: How can knowledge commons communities use their leverage to advocate for the needs of other parts of the web’s truth infrastructure (e.g., journalism), on which they often rely? How can knowledge commons communities partner with institutions that create knowledge in different ways toward shared goals, and what are those goals?
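To ground the licensing discussion above, here is a minimal sketch, in Python, of what a machine-readable preference signal might look like. Everything in it is a hypothetical illustration: the field names, values, and well-known path are assumptions made for discussion, not an existing standard, a proposal from the symposium, or a Wikimedia policy.

  # Hypothetical sketch: a machine-readable declaration of a community's
  # preferences about AI use of its content. All field names and values
  # are illustrative assumptions, not an existing standard.
  import json

  preferences = {
      "license": "CC-BY-SA-4.0",  # the existing content license
      "ai_training": "opt-in",    # e.g., "opt-in", "opt-out", "allowed"
      "retrieval_augmentation": "allowed-with-attribution",
      "attribution_required": True,
      "policy_url": "https://example.org/ai-preferences",  # placeholder URL
  }

  # A crawler or model builder could, in principle, fetch a file like this
  # from a well-known path (e.g., /.well-known/ai-preferences.json,
  # hypothetical) before deciding whether and how to use the content.
  print(json.dumps(preferences, indent=2))

Whether AI developers would honor such a signal, and how it could be enforced legally or practically, are precisely the open questions raised in the licensing bullet above.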

Participants in the 2024 Bellagio symposium

(Listed in alphabetical order)

Get involved

Questions and comments on the proposed research agenda are encouraged on the talk page.