Use of AI tools and other technologies by NICE and its partners
About the use of AI tools and other technologies by NICE and its partners, as presented at the NICE Joint Information Day on 1 May 2025.
NICE held its biannual Joint Information Day on 1 May. It’s billed as an opportunity for information professionals working at NICE or at organisations working with NICE to share practice and network.
Held at NICE's London offices, the event focused on the use of AI tools. It included presentations from NICE and Cochrane and lightning talks, and offered delegates a chance to network and discuss how they were using AI tools in practice.
Pall Jonsson, Programme Director for Data and Real World Evidence at NICE, presented an overview of AI within the organisation. He highlighted NICE's Statement of Intent for Artificial Intelligence, published last year, and its Position Statement for AI in Evidence Submissions, due out later this year.
The Statement of Intent sets out NICE’s intention to develop its approach to AI in three priority areas:
- AI-based methods to support evidence generation
- Evaluation of AI-based technologies
- AI to streamline its processes and increase efficiency and effectiveness
The Position Statement for AI in Evidence Submissions outlines what NICE expects when AI methods are used to generate and report evidence considered by its evaluation programmes. It will also indicate existing regulations, good practices, standards and guidelines to follow when using AI methods.
Pall offered three principles for use of AI in evidence generation:
- Any use of AI methods should augment, not replace, human involvement
- When AI is used, submitting organisations and authors should clearly declare its use, explain the choice of method and report how it was used
- Submitting organisations should ensure compliance with any licensing agreements
In 2025/26, NICE's work will include developing its AI strategy, developing external partnerships for knowledge exchange and upskilling, building scientific collaborations to understand the applications of AI approaches, and testing and piloting AI methods and systems, including the use of AI in systematic literature reviews. NICE will also assess the cyber risk of AI as a tool.
Dr Anna Noel-Storr, Head of Evidence Pipeline and Data Curation at Cochrane, spoke about her organisation’s use of AI in evidence synthesis.
She highlighted the core principles of evidence synthesis: rigour, transparency and replicability. Questions are becoming more complex, and it's hard to find what's needed without retrieving large amounts of noise.
AI has been around for a long time. Anna explored its development from text analytics in the 1950s to natural language processing in the 1980s, machine learning in the 2000s and generative AI now.
Technology can be innovative (doing the same things a bit better) or disruptive (doing things that make the old ways obsolete). Large language models (LLMs) offer the potential to recognise, summarise, translate, predict and generate text without any training or with only a few instructions. They may help structure questions, generate structured Boolean searches, translate those searches across databases and suggest search terms.
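Neither talk tied this to specific tooling, but as a flavour of the search-building idea, here is a minimal sketch, assuming the openai Python SDK and an illustrative model name, that asks an LLM to draft a Boolean search from a plain-language question:

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment; the model name is illustrative.
client = OpenAI()

question = "Does exercise reduce falls in older adults living at home?"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a search specialist. Draft a structured Boolean search "
                "in PubMed syntax for the user's question, grouping synonyms "
                "with OR and combining concepts with AND."
            ),
        },
        {"role": "user", "content": question},
    ],
)

# The draft strategy still needs review before use.
print(response.choices[0].message.content)
```

As the speakers stressed, any such draft is a starting point for an expert searcher to check and refine, not a finished strategy.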
With increased contextual understanding, LLMs show promise in screening. Evaluations report high precision and efficiency, but there are concerns about reproducibility and consistency. Some studies suggest that LLMs can exceed human performance in data extraction, but there are issues around hallucination (where the model generates misleading content) and data integrity.
Other challenges include:
- over-fitting (the model fits too closely to the training dataset and cannot generalise; see the sketch after this list)
- algorithmic bias (the model perpetuates biases present in its training set)
- black box predictions (no explanation of how the model arrived at its prediction)
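To make the first of these concrete, here is a toy sketch, using scikit-learn on synthetic data, of how over-fitting typically shows up: near-perfect accuracy on the training set alongside much weaker accuracy on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A small, noisy synthetic dataset that is easy for a deep tree to memorise.
X, y = make_classification(n_samples=200, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree fits the training data almost perfectly...
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(f"train accuracy: {model.score(X_train, y_train):.2f}")  # close to 1.00

# ...but generalises poorly to unseen data: the signature of over-fitting.
print(f"test accuracy:  {model.score(X_test, y_test):.2f}")
```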
Evaluation of AI tools is just beginning, and current studies mostly focus on ChatGPT. One study has shown that these tools have the potential to improve efficiency and accuracy in evidence synthesis and clinical practice guidelines.
Anna introduced RAISE (Responsible AI in Evidence Synthesis), which will be published shortly. It will include recommendations for practice, and for building, evaluating, selecting and using evidence synthesis tools.
Echoing Pall's principles, Anna stressed that AI should not replace human input. If you use AI in evidence synthesis, you need to follow legal and ethical standards, demonstrate that it will not compromise methodological rigour or the integrity of syntheses, and fully and transparently report any judgements it suggests.
Dr Su Golder at the University of York evaluated Elicit's ability to find studies for inclusion in systematic reviews. Her evaluation found that while sensitivity was consistently higher in traditional searches, precision was consistently higher in Elicit. Although Elicit found unique studies that met the inclusion criteria for the systematic review, Su's team concluded that it should be used as a supplementary tool rather than one used without any human input.
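Sensitivity and precision here carry their usual information retrieval meanings; the figures in the sketch below are invented purely to illustrate the trade-off the study describes, not Su's results.

```python
def sensitivity(relevant_found: int, relevant_total: int) -> float:
    """Share of all relevant studies that the search retrieved (recall)."""
    return relevant_found / relevant_total

def precision(relevant_found: int, records_retrieved: int) -> float:
    """Share of retrieved records that were actually relevant."""
    return relevant_found / records_retrieved

# Invented figures for a review with 20 includable studies.
# A broad traditional search: high sensitivity, but lots of noise to screen.
print(sensitivity(19, 20), precision(19, 4000))  # 0.95 and ~0.005

# An AI-assisted search: misses more studies, but far less noise per record.
print(sensitivity(14, 20), precision(14, 60))    # 0.70 and ~0.23
```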
The NICE Knowledge and Library Services team told us about their use of lens.org, which seeks to integrate scholarly and patent knowledge using open knowledge sets, including Microsoft Academic, Crossref, PubMed and OpenAlex. They also told us about their use of Power BI to produce new insights and create dashboards to aid understanding.
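lens.org aggregates these sets behind its own interface, but some of the underlying sources are openly queryable. As an illustration (not NICE's workflow, and with a made-up search term), here is a minimal sketch against OpenAlex's public REST API:

```python
import requests

# OpenAlex exposes scholarly metadata through a free, public REST API.
resp = requests.get(
    "https://api.openalex.org/works",
    params={"search": "artificial intelligence evidence synthesis", "per-page": 5},
    timeout=30,
)
resp.raise_for_status()

# Each result carries basic bibliographic fields such as title, year and DOI.
for work in resp.json()["results"]:
    print(work["publication_year"], work["display_name"], work.get("doi"))
```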
Cardiff University's Specialist Unit for Review Evidence (SURE) presented on the use of AI tools in their work. ChatGPT was able to identify additional search terms but was less good at identifying MeSH headings. This confirms the findings of the Current and Emerging Technology in KLS CoP Stay and Play on generative AI tools and searching. A comparison of Elicit and Undermind in finding studies for inclusion in a systematic review showed that both found some, but not all, of the studies included in the review.
Rayyan showed promise for quick screening for non-systematic reviews. The majority of included studies were identified within the top 100 suggested records. Tested against a completed systematic review, Rayyan identified the 14 target articles within 856 records.
A discussion session explored the impact of AI on participants' work. There had been some use of AI tools, mostly as described above, or for text and data mining or scoping searches. No one was training large language models on their own data. Copyright, reproducibility and duplication of effort were mentioned as issues affecting use of generative AI tools.
IQWiG in Germany is using R to adapt Shiny apps that help streamline information retrieval tasks such as textual analysis and building searches.
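IQWiG's apps are written in R; as a language-agnostic flavour of the kind of textual analysis involved, here is a minimal Python sketch that counts word frequencies across a set of invented abstracts to surface candidate search terms:

```python
import re
from collections import Counter

# Toy abstracts standing in for records retrieved by a scoping search.
abstracts = [
    "Machine learning models for screening randomised controlled trials.",
    "Automated screening of randomised trials using machine learning.",
    "Natural language processing to support systematic review screening.",
]

# A tiny stopword list; a real tool would use a fuller one.
STOPWORDS = {"for", "of", "to", "using", "the", "and", "a"}

def candidate_terms(texts: list[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Rank words by frequency across texts to suggest candidate search terms."""
    words = (w for text in texts for w in re.findall(r"[a-z]+", text.lower()))
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)

for term, freq in candidate_terms(abstracts):
    print(f"{term}: {freq}")
```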
Quite a lot of use is being made of AI tools to streamline the processes around screening evidence, refining searches and thematic analysis, but there is less use of generative AI tools, given current limitations and concerns. There is some exploration of training large language models on organisational data sets.
However, the standout message for me was the importance of the human in the loop. Knowledge and Library Services staff can play a part in giving people the skills to use AI tools effectively as well as to understand their limitations.
Generative AI tools are only as good as the data on which their large language model has been trained. If you need to validate the output from generative AI, you might as well write the summary yourself, especially as much useful data is hidden behind paywalls. While AI tools can aid the production of evidence summaries and synthesis, there is still the need for the skills of expert searchers and knowledge specialists.
To see the presentations, please sign up for, or sign in to, NICE Docs.