
Project
Enhancing UK Research Insight
Linking open research data to map people, projects and impact
IGL’s Data and Technology Unit worked with the Department for Science, Innovation and Technology to enhance administrative data from UK Research and Innovation, enabling advanced analysis of publicly funded research and innovation activities.
Administrative data is a valuable resource for supporting the evaluation of research and innovation (R&I) policies and informing future strategy.
UK Research and Innovation (UKRI), the UK’s public funder of R&I, collects data through the funding programmes and support it provides. When researchers and innovators apply for those programmes, data about them and their proposals is recorded. For those who receive funding, additional data is collected through self-reporting of their project outputs. This information, available via the Gateway to Research (GtR) platform, can be used to track the outcomes of public funding.
The possibilities for analysis and decision making are extended further when this administrative data is enriched and linked with additional data from external, open-access sources. One example is OpenAlex, a large, freely accessible database of publication metadata, which can be used to link funded researchers with their wider academic activities and to develop advanced scientometric measures. Other examples include open taxonomies of research topics and technology groups, which can be used to categorise project outputs and compare them on a common basis.
To this end, IGL and the Department for Science, Innovation and Technology (DSIT) collaborated on three micro-projects designed to enhance the UKRI administrative data that is openly available on the GtR platform.
Project 1: Taxonomising Research
To generate useful policy insights, it is important to analyse R&I activities according to research topics, technologies and other useful categorisations.
However, project information on the GtR platform is not systematically structured in this way, making large-scale analysis difficult.
To overcome this challenge at scale, this project uses semantic analysis and machine learning to tag UKRI-funded projects with topic labels by categorising them within one of three taxonomies (CWTS Topics, GOScience Technologies, and OpenAlex Concepts). The method we developed identifies the best topic tags based on project descriptions. It is generalisable and can be repurposed to categorise projects into any suitable taxonomy.
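As a rough illustration of the idea, the sketch below tags a project description with the closest label from a small taxonomy. It is not the project's method: it swaps in plain word-count cosine similarity for the semantic analysis and machine learning described above, and the taxonomy entries, keywords and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tag_project(description, taxonomy, top_k=1):
    """Return up to top_k taxonomy labels most similar to the description.

    Labels with zero similarity are never returned.
    """
    doc = Counter(tokenize(description))
    scored = [
        (cosine(doc, Counter(tokenize(keywords))), label)
        for label, keywords in taxonomy.items()
    ]
    scored.sort(reverse=True)
    return [label for score, label in scored[:top_k] if score > 0]


# Hypothetical taxonomy: label -> short keyword description.
TAXONOMY = {
    "Quantum computing": "quantum qubit computing entanglement algorithms",
    "Battery technology": "battery lithium energy storage electrode",
    "Genomics": "genome sequencing dna gene expression",
}

description = (
    "Developing solid-state lithium battery electrodes for grid energy storage."
)
print(tag_project(description, TAXONOMY))
```

Because the taxonomy is passed in as an argument, the same function can be pointed at a different set of labels, which mirrors the generalisability noted above. A real pipeline would likely replace the word counts with text embeddings so that related terms match even when the exact words differ.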
Project 2: Who’s Who
The journeys of researchers and the development of research communities are a valuable impact of research and innovation projects, alongside the publications and insights they produce.
But tracing individuals across different datasets is challenging, because many researchers do not have a single, unique identifier in GtR. This can obscure their full contribution and career path.
This project addresses this ambiguity through an automated pipeline that links authors in OpenAlex to their researcher entries in GtR. To ensure good matches, the process makes a probabilistic assessment that draws not just on names but also on contextual information, such as research interests and institutional affiliations.
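A minimal sketch of how name, affiliation and research-interest evidence might be combined into a single match probability is shown below. The record fields, weights and bias are assumptions chosen for illustration, not values from the project's pipeline, and a logistic score stands in for whatever probabilistic model the pipeline actually uses.

```python
import math
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Character-level similarity between two names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard(a, b):
    """Share of topics the two records have in common, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def match_probability(gtr_person, openalex_author,
                      weights=(6.0, 2.0, 2.0), bias=-6.0):
    """Combine the three signals with a logistic function.

    The weights and bias are illustrative; in practice they would be
    estimated from a set of hand-labelled matches.
    """
    same_institution = (
        gtr_person["institution"].lower()
        == openalex_author["institution"].lower()
    )
    features = (
        name_similarity(gtr_person["name"], openalex_author["name"]),
        1.0 if same_institution else 0.0,
        jaccard(gtr_person["topics"], openalex_author["topics"]),
    )
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical records: one GtR researcher and two OpenAlex candidates
# who share almost the same name.
gtr_person = {
    "name": "Jane A. Smith",
    "institution": "University of Leeds",
    "topics": ["genomics", "bioinformatics"],
}
candidate_same = {
    "name": "Jane Smith",
    "institution": "University of Leeds",
    "topics": ["genomics", "oncology"],
}
candidate_other = {
    "name": "Jane Smith",
    "institution": "University of Bath",
    "topics": ["medieval history"],
}

p_same = match_probability(gtr_person, candidate_same)
p_other = match_probability(gtr_person, candidate_other)
print(round(p_same, 2), round(p_other, 2))
```

The two candidates have identical names, so only the contextual signals separate them: the candidate with the shared institution and overlapping topics scores well above the other one.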
Project 3: Tracing Papers
Counting high-impact publications and citations has been the traditional way of assessing research impact. Today, more holistic ideas of impact are emerging.
Developing these richer metrics first requires a comprehensive dataset, which can be challenging to build when publication identifiers like DOIs are not always available in GtR.
This project overcomes this data gap by using automated search and text-matching techniques to link publications from GtR to OpenAlex, even in the absence of a DOI. The enhanced dataset is then used to create new metrics of research team impact, including contextual citation data and measures of interdisciplinarity and disruption.
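The sketch below illustrates one way a DOI-free link could be made: normalise titles, compare them with a fuzzy string match, and accept the best candidate only if it clears a threshold and the publication years are close. The threshold, year tolerance, record fields and identifiers are hypothetical, and the candidate list is hard-coded here where a real pipeline would retrieve candidates from a search step.

```python
import re
from difflib import SequenceMatcher


def normalise_title(title):
    """Lowercase, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def best_match(gtr_pub, candidates, threshold=0.9, year_tolerance=1):
    """Return the id of the best-matching candidate, or None.

    Candidates whose year differs by more than year_tolerance are skipped;
    the best remaining title match must reach the similarity threshold.
    """
    target = normalise_title(gtr_pub["title"])
    best_id, best_score = None, 0.0
    for cand in candidates:
        if abs(cand["year"] - gtr_pub["year"]) > year_tolerance:
            continue
        score = SequenceMatcher(
            None, target, normalise_title(cand["title"])
        ).ratio()
        if score > best_score:
            best_id, best_score = cand["id"], score
    return best_id if best_score >= threshold else None


# Hypothetical GtR publication without a DOI.
gtr_pub = {"title": "Deep Learning for Protein Folding: A Survey.", "year": 2021}

# Hypothetical search results; the ids are made up for illustration.
candidates = [
    {"id": "example-work-1",
     "title": "Deep learning for protein folding - a survey", "year": 2021},
    {"id": "example-work-2",
     "title": "Deep learning for image segmentation", "year": 2021},
    {"id": "example-work-3",
     "title": "Deep learning for protein folding: a survey", "year": 2010},
]

print(best_match(gtr_pub, candidates))
```

Returning None when nothing clears the threshold keeps uncertain links out of the dataset, which matters when the linked records feed into downstream impact metrics.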
These projects represent a foundational step towards an open, linked ecosystem of research and innovation data. The methods developed are highly adaptable and could be applied to other national research systems, contributing to a more interoperable global landscape.
Beyond the data, a core part of our mission was to build lasting analytical capability. We developed open-source codebases and delivered hands-on walkthrough sessions to our partners at DSIT. This approach ensures they can replicate and build upon these analyses independently, and have greater ownership over their data assets for future strategic use.
Data
Coming soon
Articles
Coming soon
Project team
David Ampudia, Senior Data Scientist
George Richardson, Head of Data Science and Technology
Partners
