Beyond spatial bias: understanding the colonial legacies and contemporary social forces shaping biodiversity data

Authors: Hilary Faxon & Millie Chapman

How do historical and contemporary social forces shape what we know about nature? This question, while explored for decades in environmental social science, has become more urgent and complex in the age of AI and Big Data.

Social, political, and historical drivers of data collection, management, and publishing are not merely a reflection of “bias.” Ecologists and conservation scientists have long acknowledged the taxonomic, geographic, and socioeconomic unevenness of biodiversity data, often proposing statistical strategies to detect and correct for data gaps while calling for reallocation of research effort to understudied species and places. While important, these strategies – mainly focusing on established and quantitative approaches to addressing bias – are not sufficient to understand the social and political factors that drive and perpetuate unevenness in data control, processing, and use, as well as their consequences.

In a recent article in Environmental Research Letters, we bring together quantitative and qualitative approaches to map colonial and postcolonial patterns in the distribution and management of global biodiversity data. We explore how and why these patterns persist, even in the wake of open-access repositories, global investment, and new data modalities. Focusing on the Global Biodiversity Information Facility (GBIF), an open-access repository of over 2 billion species observations, we draw on data analysis and interviews with data publishers and platform managers to identify and explain social patterns in biodiversity data. We identify three social forces shaping contemporary trends in biodiversity data: (1) colonial legacies of collecting, (2) infrastructures of international development, and (3) contemporary data cultures.

Colonial geographies structure not only data distributions, but also data ownership and control, even long after a country’s independence (see Figure 1). GBIF staff and data publishers were quick to recognize how colonial legacies of collecting, as well as enduring economic and scientific inequalities, shape ongoing collection. One told us, “it is very obvious in GBIF if you look where the specimens sit and where they come from, it is true: it is basically a map of the colonies.”

Figure Caption: (A) Colonial legacies still shape who publishes biodiversity data, both from the colonial period and after independence. In some cases, colonial ties are even more entrenched in recent data – for example, in Germany and Spain. The maps and bar charts show the number of observations from previous colonies published by the colonizer countries. (B) However, these patterns vary widely across former colonies. For example, over 30% of post-independence biodiversity records in the Democratic Republic of Congo are still published by its former colonizer, Belgium. But in Rwanda, citizen science and ecotourism have led to more data being published by other countries, like the United States.

Among biodiversity data initiatives, GBIF is uniquely tied to the architecture of international development. The facility was founded in 2001 following calls from the Organisation for Economic Co-operation and Development (OECD), a group of high-income countries organized in the 1960s to promote global trade and free market democracy, to create an international mechanism to make biodiversity data globally accessible. A quarter century later, observations and publishers remain concentrated in OECD countries, even as GBIF staff work to facilitate the “data flow,” including with targeted technical and financial support to underrepresented regions.

Our visits to the offices of GBIF data publishers illustrated the resource disparities that shape global science and knowledge. In Northern Europe, we visited facilities where hundreds of research staff experimented with cutting-edge automation and AI for digitizing specimens; in Southeast Asia, we spent time sweating in un-air-conditioned offices in a monsoon climate, where thousands of insect specimens housed in recycled boxes with handwritten labels waited to be photographed and entered into Excel.

Interviewees also highlighted the issue of data culture, specifically resistance to sharing data. One GBIF staff member explained the challenge: “We had a lot of convincing to do… people sit on their data like ducks on their eggs!” The benefits of open data are not always obvious, particularly in authoritarian countries or places concerned with national sovereignty. One interviewee was enthusiastic about open biodiversity data, but explained that founding a national GBIF node would be a tough sell; the government was reluctant to pay a membership fee to fund infrastructure just to “give the data away for free.” More popular, they explained, was an online portal that linked to a digital reference collection unconnected to GBIF, which aimed at the virtual repatriation of specimens held abroad in the name of preserving national heritage.

As new data repositories and algorithms increasingly shape environmental knowledge and conservation decision-making, new sorts of mixed-methods collaborations can help uncover patterns and drivers. This project emerged from a longer collaboration exploring the ethics and practices of algorithmic conservation, and was sparked by Millie’s curiosity about the social forces that might be shaping a dataset that has increasingly become the global standard for macroecological modeling and conservation decision-making. Combining Millie’s knowledge and analysis of the GBIF dataset with Hilary’s expertise in qualitative methods and experience exploring the social life of environmental monitoring in Southeast Asia, we experimented with different ways of understanding the social forces that shape GBIF. We wrote the first draft of this paper for a Conservation Data Justice workshop held in 2024, where we were grateful for interdisciplinary perspectives, encouragement, and feedback. Throughout the research and writing process – even during final revisions – we experimented with ways of iterating and integrating the quantitative and qualitative analysis.

This work has led to new research questions about how environmental data publishers, curators, and users think of labor and value amidst rapid technological change and escalating ecological crisis. We are also curious about alternative types of organizations and data infrastructures – such as for-profit start-up companies and decentralized technologies such as blockchain – that are embracing biodiversity data. New technologies and institutions bring new empirical objects of investigation, but the core concerns of conservation data justice remain.