The Initial Transformation

In this ongoing Investigation into Research Transformation, we seek to celebrate the art of change. How does change happen in research? What influences our behaviour? How do all of the different systems in research influence each other?

We begin our reflection on transformation with perhaps one of the least remarked upon, yet most pervasive, changes in research: the switch between initials and full first names in author records. As we will see, the shift from the formal to the familiar has been in flux from the start of scholarly publishing. Particularly over the last 80 years, however, we can trace the influence of countries, fields of research, publishers and journal submission technology, funders and scholarly knowledge graphs on author name behaviours. In more recent history, we can observe that the shift towards full names has also been gendered, particularly in medicine, with men shifting towards full names earlier than women.

Why does it matter? The increase in transparency afforded by authors' first names is not simply a curiosity. First names, in the ethnicities and genders that they suggest, provide an (albeit imperfect) high-level reflection of the diversity of experiences that are brought to research. It is just as important to see ourselves reflected in the outputs of the research careers that we choose to pursue as in the voices that represent us on panels at conferences. Framed this way, the progress towards the use of first names is part of the story of inclusion in research. The ‘Initial Transformation’ is also an initial problem.

Fortunately, the use of initials as part of author names has been in steady, if gradual, decline. The full details of “The Rise and Fall of the Initial Era” can be found in our recent paper on arXiv: https://arxiv.org/abs/2404.06500.

Below are six observations from the paper:

The transformation from initials to full first names is part of the broader transformation of the journal article as technology

The form of a research article is itself a technology used to encode the global norms of science. As a key building block of shared knowledge, the form of a research article must evolve slowly enough to allow the discoveries of the past to be understood today, yet flexibly enough to codify new patterns of behaviour (such as ORCiD researcher identifiers, funding statements, conflicts of interest, author contribution statements and other trust markers).

Over time, not only has the structure of the content of a research article evolved, the way that authors are represented has also changed. From 1945 through to 1980, we identify a period of name formalism (referring to authors by first initial and surname). This is the only period in the history of publishing where initials are used in preference to full first names. We call this period the ‘Initial Era’.

In the ‘Initial Era’, we suggest that accommodating a growing number of authors per paper on a constrained physical page size encouraged the formalism towards initials. From 1980, full names begin to be used more commonly than initials, marking the beginning of the ‘Modern Era’. Within the ‘Modern Era’, name formalism continues a gradual decline through to the 1990s. Between 1990 and 2003, a period of significant digital transformation in which the research article was recast as a digital object, name formalism drops steeply. After 2003, the decline in name formalism is less steep, but steadily trends toward zero.
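
To make the idea of a declining level of name formalism concrete, the sketch below computes the share of author mentions per year that use initials only. It is a minimal illustration, not the method used in the paper: the splitting heuristic, the function names and the sample records are all assumptions made up for this example.

```python
import re
from collections import defaultdict


def is_initials_only(given_name):
    """Heuristic (an assumption, not the paper's rule): a given name is
    'initials only' when every part, split on spaces, full stops or
    hyphens, is a single character."""
    parts = [p for p in re.split(r"[\s.\-]+", given_name) if p]
    return bool(parts) and all(len(p) == 1 for p in parts)


def formalism_by_year(records):
    """records: iterable of (year, given_name) pairs, one per author mention.
    Returns {year: share of mentions that use initials only}."""
    totals = defaultdict(int)
    initials = defaultdict(int)
    for year, given in records:
        totals[year] += 1
        if is_initials_only(given):
            initials[year] += 1
    return {year: initials[year] / totals[year] for year in sorted(totals)}


# Made-up records, for illustration only.
SAMPLE_RECORDS = [
    (1960, "J."), (1960, "M. L."), (1960, "Anna"),
    (2010, "Wei"), (2010, "S."), (2010, "Maria"), (2010, "John"),
]

print(formalism_by_year(SAMPLE_RECORDS))
```

A heuristic this simple misses unpunctuated initials such as "JR" and treats genuinely single-letter names as initials, so any real analysis would need more careful rules.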

The story of the Initial Transformation is one of different research cultures becoming homogenised

The US is the first country to shift towards the familiar, followed reasonably quickly by other western countries, with France perhaps holding out the longest. Slavic countries remain more formal for longer, but they too increasingly shift towards familiar names. At the bottom of the graph (see below), in green, are three countries in the Asia-Pacific region: Japan, South Korea and China. For these countries there is no concept of a first initial, and where names have been anglicised, full names were preferred.

The story of Initial Transformation highlights a discipline separation in research culture

How we name ourselves on papers has nothing to do with the type of research that we conduct, yet there are very clear differences between disciplines in the rate of shift away from name formalism. Research does not change at a single pace: local cultures can affect the pace of a change regardless of their relationship to the change itself.

Technology influenced our name formalism

The choice to use first names or initials has not always been a choice that resides with researchers themselves. Below we present an analysis of three journals that all went live with online journal systems in 1995-96. From the mid-1970s through to 1995, journals still mostly employed typesetting houses that set the style of the journal. Even before the onset of online submission systems, journal styles influenced whether first names appeared in full or as initials. From the mid-1970s these three journals take different approaches. Tetrahedron shifts away from a majority-initials approach, whereas The BMJ and the Journal of Biological Chemistry switch to typesetting that favours initials. As journals moved online from 1995, research articles began to be recast as discoverable landing pages. Here the Journal of Biological Chemistry switches all at once to a system that enforces full names, while The BMJ adopts a system that allows choice. In all cases where author choice is allowed, the trend away from formal names continues.

Changes in Infrastructure can affect how we understand the past as well as the present

Between 2003 and 2010, DOI infrastructure run by CrossRef was adopted by the majority of publishers. The CrossRef metadata schema included a separate field for given names. Critically, during this transition most journals chose to register their back catalogues as well, including full names where possible. We owe our ability to view full name data from the past to infrastructure changes in the first decade of the 2000s.
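
As a small illustration of what a separate given-name field makes possible, the sketch below reads author names from a record shaped like the JSON returned by the Crossref REST API, where each author entry carries `given` and `family` keys. The record itself is made up, and the helper name `author_names` is hypothetical.

```python
def author_names(work):
    """Return (given, family) pairs from a Crossref-style work record.
    Either value may be None when the record omits it."""
    return [(a.get("given"), a.get("family")) for a in work.get("author", [])]


# Made-up record for illustration; real records carry many more fields.
SAMPLE_WORK = {
    "author": [
        {"given": "A.", "family": "Example"},
        {"given": "Barbara", "family": "Sample"},
    ]
}

for given, family in author_names(SAMPLE_WORK):
    print(given, family)
```

Because the given name sits in its own field, a record can hold either an initial or a full name without any change to the structure, which is what allows full names to be supplied retrospectively.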

How were publishers able to communicate first names to the CrossRef DOI standard? At a layer below DOIs was another language for describing the digital structure of papers. The Journal Article Tag Suite (JATS XML), now a common standard used to describe the digital form of a journal article and to aid both the presentation and the preservation of digital content, was first released in 2003 and reflected over a decade of prior work in the industry to re-express the journal article as a digital object. Full names were also codified within this standard, and the requirement on publishers to preserve all digital content created an imperative to apply this standard (or at least compatible earlier versions) to their complete catalogues.
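
The way JATS codifies names can be sketched with a short fragment. In JATS, an author is a `contrib` element whose `name` holds separate `surname` and `given-names` elements. The fragment below is made up and heavily simplified, and the helper `jats_author_names` is hypothetical; real articles carry far more structure.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified JATS-style fragment for illustration only.
JATS_FRAGMENT = """
<contrib-group>
  <contrib contrib-type="author">
    <name><surname>Example</surname><given-names>Barbara</given-names></name>
  </contrib>
  <contrib contrib-type="author">
    <name><surname>Sample</surname><given-names>A.</given-names></name>
  </contrib>
  <contrib contrib-type="editor">
    <name><surname>Other</surname><given-names>C.</given-names></name>
  </contrib>
</contrib-group>
"""


def jats_author_names(xml_text):
    """Return (given-names, surname) pairs for contributors marked as authors."""
    root = ET.fromstring(xml_text)
    names = []
    for contrib in root.iter("contrib"):
        if contrib.get("contrib-type") != "author":
            continue  # skip editors and other contributor types
        names.append(
            (contrib.findtext("name/given-names"), contrib.findtext("name/surname"))
        )
    return names


print(jats_author_names(JATS_FRAGMENT))
```

The same structure accepts "A." as readily as "Barbara", so whether a full name survives depends on what the publisher chose to encode.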

Although the communication of first names seems to have occurred reasonably seamlessly to DOI metadata, the transition of first names to the scholarly knowledge graphs of the time was slower.

MedLine (and, by extension, PubMed) only began adding full names to its metadata records in 2002. Journals that relied on MedLine records for discovery (and chose not to implement DOIs) did not benefit from retrospective updates.

The difference in the adoption of first names between CrossRef and MedLine/PubMed also highlights a risk in adopting scholarly knowledge graphs as infrastructure. Scholarly knowledge graphs have their own infrastructure constraints, and make their own decisions about what is sustainable to present. Although enormously valuable, they are a point of disconnection from the sources of truth they present. We can see this split starkly if we look at publications from those journals that chose not to create DOIs for their articles, relying instead solely on the services provided by MedLine.

The shift to full names happened at different rates for men and women, and at least for publications associated with PubMed, technology influenced the practice

With the benefit of gender guessing technology, we note that progress towards first names has occurred at different rates for men and women. This is particularly stark for publications in PubMed.

Why is there a jump in 2002? As mentioned above, 2002 was the year in which users could start to interact with author first names, with PubMed and MedLine incorporating them into their search. Although we cannot draw a direct causal connection, it is tempting to argue that this subtle shift in a critical technology used by almost all medical researchers had a small but important impact on making research more inclusive. When we look at articles that have both a PubMed ID and a DOI, we can see that in 2002 the average number of first names associated with women on papers rose by 17%, compared with 13% for men. This jump is not present in publications that have not been indexed by PubMed.

For medical disciplines associated with papers in PubMed, there is also a distinct difference after 2002 in the rate of first name transformation for men and women. The rate of change for men is less than half that for women, rising only 5% in 20 years compared with 12%. For some disciplines, then, this raises a methodological challenge in gender studies: at least based on author records, changes in the participation rates of women in science must be disentangled from changes in the visibility of women in science.

Embracing Initial Transformation

Finally, the transition from initials to first names has happened slowly and without advocacy. Whilst this has been to our advantage in identifying some of the axes along which research transformation occurs, an argument could be made that, if first names give us (imperfect) access to the diversity of experiences that are brought to research, then the pace of change has not been fast enough. For instance, could more have been made of ORCiD to facilitate the shift to first names, so that older works identified by an initial-based moniker could be linked to newer works that use the same researcher's full first name?

The transformation away from name formalism does not, of course, stop at author bylines. Name formalism is also embraced in reference formats. It could be argued that even within a paper, this formalism suppresses the diversity signal in the research that we encounter. Reference styles were defined in a different era with physical space constraints. Is it time to reconsider these conventions?

Within contribution statements that use the CRediT taxonomy, initials are also commonly employed to refer to authors. This convention creates disambiguation issues when two authors share the same surname and first initials. Here too, as the digital structure of a paper continues to evolve, we should be careful not to unquestioningly embed the naming conventions of a different era into our evolving metadata standards.
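
The disambiguation problem in contribution statements is easy to demonstrate. The sketch below assumes that a statement refers to each author by the initials of their full name, and flags any initials shared by more than one author on the same paper. The function names and the author list are made up for illustration.

```python
from collections import defaultdict


def initials_of(given, family):
    """Build an initials moniker such as 'JS' from a given and family name."""
    parts = given.replace("-", " ").split() + family.replace("-", " ").split()
    return "".join(p[0].upper() for p in parts)


def ambiguous_initials(authors):
    """authors: list of (given, family) pairs.
    Returns {initials: [full names]} for initials used by more than one author."""
    groups = defaultdict(list)
    for given, family in authors:
        groups[initials_of(given, family)].append(f"{given} {family}")
    return {key: names for key, names in groups.items() if len(names) > 1}


# Made-up author list for illustration.
SAMPLE_AUTHORS = [("Jane", "Smith"), ("John", "Smith"), ("Maria", "Lopez")]

print(ambiguous_initials(SAMPLE_AUTHORS))
```

With these sample authors, a statement crediting "J.S." could refer to either Jane Smith or John Smith, whereas full names (or identifiers such as ORCiD) would leave no doubt.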