Navigating Trust in Academic Research: The Rise of Data Availability Statements – Part I

23rd August 2023


In an era of miscommunication and escalating pressures on academic researchers, the credibility and trustworthiness on which scholarship rests are under the microscope like never before. In this blog series, we venture into the realm of research transparency, focusing first on the rise of Data Availability Statements. We explore which research powerhouses are leading the charge in providing these critical transparency markers, and the trends behind this data. Dive in as we uncover the influences and paradoxes of the academic trust landscape.


Introduction

In academic communication, research integrity has emerged as a pivotal concern in recent years. It is a keystone principle that spans all disciplines, cultures, and geopolitical divides. Good practices around data availability, ethical declarations, funder acknowledgement, detailed author contributions, and conflict of interest disclosures are not mere administrative tasks; they serve as guiding beacons, offering reassurance that research is conducted with honesty and transparency. We call these Trust Markers. They enable researchers to establish and nurture a strong foundation of trust with the public, industry, and funders.

Welcome to the first part of our blog series, “Navigating Trust in Research”. Inspired by recommendations from the Hong Kong Principles for assessing researchers (Moher et al.) and the Singapore Statement on Research Integrity, we set out to explore research integrity using Trust Marker data from Digital Science’s Dimensions database. This data is an abundant source of insights, with the potential to reshape our perspectives on scholarly contributions, fortify trust in today’s research landscape, and help recognise, reward and encourage best practice on a global scale.

Our inaugural instalment focuses on what can be regarded as the most critical component of a research output: the underlying data. We examine the growth in Data Availability Statements (DAS) and the trends that point to commendable practice in data availability and transparency. In an era that places a premium on trust and ethical research, DAS emerge as pillars of credibility. They support the path to open science, equipping scholars, policymakers and other stakeholders with the tools to scrutinise and validate methodologies and to challenge or support findings, ultimately strengthening the foundations of knowledge sharing.

When researchers choose to openly share their methodologies and data, they make a strong statement about their commitment to transparency and accountability. This action communicates to the public that they welcome scrutiny and have confidence in the thoroughness and ethical foundations of their work. Such openness can help to build public trust and establish researchers as dependable contributors to the knowledge pool – a role that is incredibly valuable in our current era of misinformation.

Dimensions Research Integrity Data

Trust Markers (explicit statements in a paper, such as funding acknowledgements, data availability, conflict of interest statements, author contributions, and ethical approval) are the hallmarks of transparent and reproducible scientific research. The Dimensions Research Integrity (DRI) dataset uses AI models to recognise these Trust Markers in scientific publications. The resulting dataset records the presence or absence of each Trust Marker across 33 million research articles, conference proceedings, books, chapters, and preprints published since 2010. This data provides invaluable insights into authorship, reproducibility and transparency.

Methodology 

In conducting this analysis, we rely on Digital Science’s Dimensions database and the Dimensions Research Integrity dataset. We use Dimensions on Google BigQuery to extract data on research output and the underlying funders, and to assign countries based on the organisation affiliated with each author. Additional country data from the World Bank dataset on Google BigQuery is also integrated for further analysis. Research output for this study excludes books, chapters and monographs.

To keep the analysis focused, we limited the data to research outputs from 2017 to 2022 and grouped them at the country level. After ranking countries by research output, we selected those that cumulatively account for the top 70% of output over this period as the underlying dataset.
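The country selection rule can be read as a cumulative-share cut-off. The following is a minimal sketch, not the authors’ code: the country names and output counts are hypothetical, and it assumes that the country whose output pushes the running total past 70% is included in the selection.

```python
def select_top_share(output_by_country, share=0.70):
    """Rank countries by research output (largest first) and keep adding them
    until their cumulative output reaches `share` of the total.
    Assumption: the country that crosses the threshold is included."""
    total = sum(output_by_country.values())
    threshold = share * total
    ranked = sorted(output_by_country.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0
    for country, count in ranked:
        if running >= threshold:
            break  # the cut-off has already been reached
        selected.append(country)
        running += count
    return selected, running / total


# Hypothetical output counts, for illustration only.
toy_output = {"A": 500, "B": 300, "C": 120, "D": 50, "E": 30}
countries, covered = select_top_share(toy_output)
print(countries, covered)  # ['A', 'B'] 0.8
```

Because the cut-off is applied to whole countries, the selected group covers slightly more or less than exactly 70%, which is why the selected countries "roughly" account for that share.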

The Dimensions Research Integrity (DRI) data has been used to assess the proportion of research output carrying each Trust Marker. DRI data covers 68% of the research output we extracted for 2017 to 2022; publications not covered by the DRI dataset have been omitted from the analysis.
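The coverage filter and the DAS proportion can be sketched as two ratios over the same set of publications. This is an illustrative sketch only: the `Publication` record and its `in_dri` and `has_das` flags are hypothetical stand-ins, not fields from the Dimensions or DRI schema, and the four toy publications are invented.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    pub_id: str
    in_dri: bool   # hypothetical flag: the publication is covered by the DRI dataset
    has_das: bool  # hypothetical flag: a DAS was detected (only meaningful if in_dri)


def dri_coverage_and_das_share(pubs):
    """Return (share of publications covered by DRI, share of covered ones with a DAS).
    Publications outside the DRI dataset are dropped before the DAS share is computed."""
    if not pubs:
        return 0.0, 0.0
    covered = [p for p in pubs if p.in_dri]
    coverage = len(covered) / len(pubs)
    das_share = sum(p.has_das for p in covered) / len(covered) if covered else 0.0
    return coverage, das_share


# Hypothetical publications, for illustration only.
toy = [
    Publication("p1", in_dri=True, has_das=True),
    Publication("p2", in_dri=True, has_das=False),
    Publication("p3", in_dri=True, has_das=False),
    Publication("p4", in_dri=False, has_das=False),
]
print(dri_coverage_and_das_share(toy))  # coverage 0.75, DAS share 1/3
```

Excluding uncovered publications from the denominator avoids counting them as "no DAS" when the DRI data simply has no information about them.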

For the purpose of this blog, we focus on the provision of DAS and zoom in on a subset of data on the quality of the statement (e.g. whether repository data was provided and/or the URL of the repository).

This allows us to gain valuable insights into the prevalence and quality of DAS, an essential aspect of research integrity and transparency.

Importance of the Data Availability Statement (DAS)

A Data Availability Statement (DAS) is a crucial component of a scientific article, describing how accessible the research data is. A DAS should state how the data supporting the study’s findings can be accessed.

Many publishers and funders offer guidance on composing a DAS in line with their own policies, but not all make it compulsory. Incorporating a DAS can, however, enhance the credibility and validity of research findings by making the study behind the publication more transparent. It also increases the potential for the associated data to be cited and built upon, helping studies become reproducible and minimising unnecessary duplication of effort. This is of particular importance for capacity building: researchers from emerging economies or marginalised communities can access existing datasets, which reduces the need for duplicative data collection and makes research more cost-effective. The value of a DAS became acutely evident during the COVID-19 pandemic, when “just-in-time” data became paramount and immediate access to trustworthy, verifiable datasets catalysed swift global research collaborations.

A DAS can serve another useful role: detailing the rationale for any restrictions on data accessibility, such as the need to safeguard personal information. 

Findings

The 14 countries selected for the analysis (the United States, China, the United Kingdom, Germany, India, Japan, Russia, Brazil, Italy, Indonesia, Canada, France, Spain, and Australia) account for roughly 70% of the world’s research output from 2017 to 2022, according to Dimensions data. The aggregated trend across these 14 countries should therefore give a reasonable picture of the overall trend in DAS practices. These research superpowers are also likely to have a profound influence in shaping and driving DAS practices around the world.

The proportion of publications containing a DAS grew dramatically over the five years from 2017 to 2022 (Figure 1). In 2017, only about one in every 25 publications (around 4%) contained a data availability statement. By 2022, nearly one in every three publications did, with the most notable growth taking place in the last three years.

Figure 1: Proportion of publications containing DAS from 2017 to 2022 for the country cluster responsible for 70% of the world’s research output. Source: Dimensions.

Taking a closer look at the selected countries, Australia, the UK and Canada consistently outperformed the cluster average, which reached roughly 30% by 2022 (Figure 2). Japan closely tracked the average, while India and Russia moved up along a ‘catch-up’ curve. China started from a low base of 3% in 2017 but rose quickly, and in 2022 it had the highest proportion of publications containing a DAS in the benchmarking group. The US, by comparison, was above average in 2017 but gradually fell below the average as other countries improved at a faster pace.

Figure 2: Proportion of publications containing DAS from 2017 to 2022: selected countries benchmarked against the country cluster responsible for 70% of the world’s research output. Source: Dimensions.

Field of research, data sharing requirements imposed by funding agencies, and journal-specific publishing guidelines are known factors influencing DAS practices. Looking specifically at funder metadata, the proportion of publications containing a DAS is higher for those with known funders (i.e. those whose underlying funder can be determined) than for those without (Figure 3). The gap has widened since 2020, coinciding with growing calls in recent years to proactively publish and share data generated by publicly funded research. The EU Open Data Directive (formerly the Public Sector Information (PSI) Directive), for example, came into force in 2019 and requires valuable public data to be re-usable, including data generated by research-performing organisations and research funding agencies.

Figure 3: Proportion of publications containing DAS from 2017 to 2022: a comparison between publications with and without known funders. Source: Dimensions.
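A comparison like the one in Figure 3 amounts to grouping publications by year and by whether a funder is known, then computing the DAS rate in each group. The sketch below illustrates that grouping under stated assumptions: the `(year, has_known_funder, has_das)` tuples are a hypothetical input format and the records are invented, so the numbers say nothing about the real data.

```python
from collections import defaultdict


def das_rate_by_funder(records):
    """records: iterable of (year, has_known_funder, has_das) tuples (hypothetical format).
    Returns {(year, has_known_funder): proportion of publications with a DAS}."""
    tallies = defaultdict(lambda: [0, 0])  # key -> [publications with DAS, all publications]
    for year, funded, has_das in records:
        tally = tallies[(year, funded)]
        tally[0] += int(has_das)
        tally[1] += 1
    return {key: with_das / total for key, (with_das, total) in tallies.items()}


def funder_gap(rates, year):
    """DAS rate for publications with known funders minus the rate for those without."""
    return rates[(year, True)] - rates[(year, False)]


# Hypothetical records, for illustration only.
toy = [
    (2020, True, True), (2020, True, False), (2020, False, False), (2020, False, False),
    (2021, True, True), (2021, True, True), (2021, True, False), (2021, False, False),
]
rates = das_rate_by_funder(toy)
for year in (2020, 2021):
    print(year, round(funder_gap(rates, year), 3))
```

Adding a country field to the grouping key would give a per-country comparison in the style of Figure 4, assuming each publication has already been assigned to a country.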

At the country level, the availability of funder information is associated with varying effects on DAS practices (Figure 4). In countries such as Indonesia, Russia, and Brazil, the proportion of publications containing a DAS is substantially higher for those with known funders (i.e. 30-40% higher than those without known funders). It should be noted, however, that for countries such as Indonesia, publications with known funders represent a very small proportion of the national research output (2% in the case of Indonesia); Trust Markers such as DAS are absent from the vast majority of the country’s research output.

Within the country benchmarking group, China showed the smallest difference in DAS practices between those with and without known funders.

Figure 4: Proportion of publications containing DAS from 2017 to 2022: a comparison between publications with and without known funders for the country cluster responsible for 70% of the world’s research output. Source: Dimensions.

Drilling down to the DAS sub-categories, we see that as recently as 2022, most publications containing a DAS (63%) indicated that data is available only upon request (Figure 5). Only 18% pointed to online repositories. The rest declared that data is available within the paper (8%), in the supplementary file (8%), or not publicly available (3%). This resonates with Couture et al.’s 2018 study, which found that data could seldom be recovered even when a funder required it to be made publicly available.

This is worth further reflection, as a DAS should not be treated as a box-ticking exercise. Knowing which data repositories are commonly used in different fields could help researchers improve their DAS practices and make research data more transparent and discoverable.

Figure 5: DAS breakdown for the country cluster responsible for 70% of the world’s research output. Source: Dimensions.
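A breakdown like Figure 5 is the share of each sub-category among publications that contain a DAS. The sketch below shows that calculation, assuming each statement has already been classified into one label; the labels and counts are hypothetical and only loosely mirror the sub-categories named above.

```python
from collections import Counter


def das_breakdown(das_types):
    """Share of each DAS sub-category among publications that contain a DAS,
    ordered from most to least common."""
    counts = Counter(das_types)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.most_common()}


# Hypothetical classified statements, for illustration only.
toy = (["upon request"] * 6 + ["repository"] * 2
       + ["in paper"] * 1 + ["supplementary file"] * 1)
shares = das_breakdown(toy)
print(shares)
```

Because every DAS is assigned to exactly one label in this sketch, the shares sum to 1, just as the five percentages reported for 2022 sum to 100%.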

Conclusion

Evidence that research has undergone thorough scrutiny holds immense potential for bolstering research reputation and assuring external parties of its legitimacy. As we strive for a more reliable and credible academic landscape, Trust Markers become indispensable in establishing evidence of quality, safeguarding the integrity of research, and promoting a culture of transparency and accountability. Given the pivotal role that higher research integrity standards now play, the case for incorporating such data into benchmarking frameworks or research evaluation processes becomes even more compelling. In an era marked by a surge in misinformation, and by the evolving challenges and pressures researchers face while striving to make meaningful scholarly contributions, quality matters.

In our next instalment, we will delve further into the factors that can influence the quality, content and significance of a DAS. Have emerging research economies also seen growth in DAS practices? Is that growth associated with specific funding sources or collaboration patterns? We will explore the metadata behind these publications. Furthermore, we will look at how good practice in data availability and transparency might influence citation patterns over time.

Join us as we embark on our exploration into best practice, shedding light on a research landscape rooted in trust, collaboration, and a relentless pursuit of excellence.

Note: The authors have made the data associated with their post freely available. It can be found on Figshare here: https://doi.org/10.6084/m9.figshare.24018009
