Today, Digital Science releases its first custom GPT on OpenAI’s ChatGPT platform – Dimensions Research GPT – in two forms: a free version based on Open Access content, and an enterprise version that draws on all the diverse content types in Dimensions, spanning publications, grants, patents and clinical trials. In keeping with our goal of being responsible about the AIs that we introduce, we explore below some of the steps that we’ve taken in its development, explain our key principles in developing these tools, and make the context of these tools clear for the community that we intend them to serve.
Any software development company has an implicit responsibility to the user communities that it serves. Typically, this commitment extends to being conscientious about how the software is developed: ensuring, to the greatest extent possible, that the software is secure, is free of bugs, and functions as described to the client would seem to be among the basic requirements.
The rise of AI should raise the value that systems can bring to users, but it also raises the bar in the relationship between developer and user, especially with large language models (LLMs). Users need to understand how the data that they submit to the system are being used, and they also need to understand the limitations of the responses that they receive. Developers need to understand and minimise biases in the tools they create, as well as understand complex concepts such as hallucination and work out how to educate users about how they should think about trusting different types of output from their software.
All these problems are magnified tenfold when it comes to supporting researchers or the broader research enterprise. The research system is so fundamental to how society functions and progresses that we cannot afford for new technologies to undermine the trust that we have in it.
At Digital Science we believe that research is the single most powerful tool that humanity possesses for the positive transformation of society and, as such, we have a responsibility to provide software that does not damage research. Although that sounds simple, it is tremendously difficult. In an era of paper mills and p-hacking, providing information tools that support research requires deeper thinking before releasing a product to users.
Beyond all the requirements that we have listed above, to support researchers and the research community, we believe that we need to:
- ensure that researchers understand what uses of the system are valid and which aren’t;
- sensitise users to the fact that this technology is in its early stage of development and that it cannot be completely trusted;
- provide users with the ability to contextualise the output that they get so that they don’t have to trust without verification;
- ensure that no group of researchers is disenfranchised or excluded from accessing this type of technology, whether artificially or through commercial approaches.
Many of these features have been built into the offering that we launch today: this blog attempts to address some of the points above; we are working to ensure equitable access by creating a free version; and we have made specific functionality choices to try to address our concerns with where this technology can lead. Overall, it is with some pride and much excitement that we launch Dimensions Research GPT today!
The Road to Dimensions Research GPT
Our free offering, Dimensions Research GPT, and its more powerful counterpart, Dimensions Research GPT Enterprise, are the result of a long period of testing and feedback from the community. We started developing this type of functionality in late 2022, and by summer 2023 it had reached a phase where we needed input from the sector to understand it better. Thus, in August 2023 we launched the Dimensions AI Assistant as a beta concept. We quickly learned that “question answering” can be challenging not just from a technical perspective (for example, providing a low-to-no-hallucination experience) but also in terms of providing users with an interface that continues to be engaging and that fuels curiosity.
In addition, we found that there is a certain “fuzziness” in querying through an LLM that doesn’t sit comfortably in an environment that involves highly structured data, such as Dimensions. That realisation led us to make certain design decisions that you’ll see informing the way that we develop both the products launched today and Dimensions in the future.
For better or worse, since the beginning of modern search in the mid-1990s we have become used to searching the web and seeing pages of search results – some of which are more relevant to our search, and some of which appear less relevant. With most LLMs, the information experience is different from a standard internet search: we ask a question and we get an answer. What’s more, we get an answer that typically does not equivocate or sound anything less than completely confident. It does not encourage us to read around a field or to notice interesting articles that might not be directly relevant – it focuses us on the answer rather than making us curious about all the things around the answer. Launching a tool that has those characteristics in a research context is not only potentially irresponsible but also dangerous. We have used that concern as a guiding principle for how we have built Dimensions Research GPT.
What is Dimensions Research GPT?
Dimensions Research GPT and Dimensions Research GPT Enterprise both bring together the language capabilities of OpenAI’s ChatGPT and the content across the different facets of Dimensions. In the case of Dimensions Research GPT, data related to research articles from the open access corpus contained in Dimensions are used to provide context for the user’s question and to help the user discover more. This free tool gives users the ability to interact with the world’s openly accessible scholarly content via an interface that ensures that answers refer back to the research that underlies them. This provides two important features: first, the ability to verify any assertions made by Dimensions Research GPT; and second, the ability to see references to a set of articles that may be relevant to the question, so that users continue to be inquisitive and read around a field. Basing this free tool on content that is free-to-read provides the greatest chance for equity and impact.
Dimensions Research GPT Enterprise runs the same engine and approach as Dimensions Research GPT, but it extends the scope of the content that it can access to include data from the full Dimensions database, covering 350 million records that include research articles, grant information, clinical trials and patents. It is a truly fascinating dataset to explore in this new way.
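For readers who want a concrete picture of what "answers that refer back to the underlying research" can mean, here is a minimal sketch of the general pattern: retrieve a few relevant records, then ask the model to answer only from those numbered sources. It is an illustration under stated assumptions, not a description of how Dimensions Research GPT is built. The `Article` record, the word-overlap scoring, the prompt wording and the two toy records are all hypothetical and chosen for brevity.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical, simplified record for an open access article."""
    ident: str
    title: str
    abstract: str


def score(question: str, article: Article) -> int:
    """Count distinct question words that also appear in the title or abstract."""
    q_words = set(question.lower().split())
    a_words = set((article.title + " " + article.abstract).lower().split())
    return len(q_words & a_words)


def retrieve(question: str, corpus: list[Article], k: int = 2) -> list[Article]:
    """Return up to k articles sharing at least one word with the question, best first."""
    ranked = sorted(corpus, key=lambda a: score(question, a), reverse=True)
    return [a for a in ranked if score(question, a) > 0][:k]


def build_prompt(question: str, sources: list[Article]) -> str:
    """Assemble a prompt that asks the model to cite the numbered sources."""
    lines = ["Answer using only the sources below and cite them as [n].", ""]
    for n, art in enumerate(sources, start=1):
        lines.append(f"[{n}] {art.title} ({art.ident}): {art.abstract}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Toy records, made up for illustration only.
corpus = [
    Article("example-1", "PT symmetry in optics", "PT symmetric lasers and sensors"),
    Article("example-2", "Soil microbiomes", "Bacterial diversity in farmland soil"),
]
question = "applications of PT symmetry"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)
```

Because every source in the prompt carries a number and an identifier, each citation in the resulting answer can be traced back to a specific record, which is what makes verification and further reading possible.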
Before we explore further what Dimensions Research GPT is, and the kinds of things that you can do with it, it is worth taking a moment to be clear about what it is not. Put simply, it is not intended for analytics. While many users are familiar with Dimensions as an analytics tool, Dimensions Research GPT is not a tool for asking evaluative or quantitative questions. Thus, asking Dimensions Research GPT to calculate your H-index or rank the people in your field by their attention will be a fruitless task. Similarly, the system is designed to help you explore knowledge, not people; hence, if you ask Dimensions Research GPT to summarise your own work, provide rankings, or tell you who the most prolific people are in your field, you will be disappointed. Many of these use cases, with the exception of those involving H-index (Digital Science is a signatory to DORA), are already covered by Dimensions Analytics.
An example of how to use Dimensions Research GPT
We’ve covered at a high level the principles behind building a tool like Dimensions Research GPT, and we’ve also explained what it is and is not, so now we really should show you how to think about using the tool.
Below, we show a brief conversation with Dimensions Research GPT about a research area known to one of the co-authors of this blog. We encourage readers to carry out the same queries in ChatGPT or Dimensions Research GPT Enterprise and compare the answers that they receive.
Our first prompt introduces the area of interest…
Summarise three of the most important, recent applications of PT-symmetric quantum theory to real-world technologies
The references link over to Dimensions to give full contextualised details of the articles and connect over to source versions so that you can read further. Maybe we’re not from the field and we want to understand that response in simpler terms. That might look like:
Rewrite your last response at the level which a high-school student can understand and highlight the potential for application in the real world
With this query, we’ve only scratched the surface of the base functionality that ChatGPT provides under Dimensions Research GPT, and of the open-ended possibilities that it implies.
Finally, we ask Dimensions Research GPT to speculate:
Please speculate on the potential applications of PT symmetry to medical device development, providing references to appropriate supporting literature
Again, the tool shows references that back up its speculations about these exciting potential advances.
We fully realise that this is not a panacea, but at the same time, we think that this approach is worthy of exploration and pursuit in a way that can help the research community benefit from new AI technologies responsibly. We’re sure that we won’t get everything right on the first attempt – but we aim to learn. On that note, we hope that you will be part of our experiment – please do tell us how you use this platform to inform and accelerate your own research. We’re sure that, like us, you’ll find that with this technology there are always possibilities.
If you want to try Dimensions Research GPT, you can do so as a ChatGPT Plus or Enterprise user, by going to your OpenAI/ChatGPT environment and looking for Dimensions Research GPT under Explore GPTs.