About the testing by expert searchers of EBSCO's natural language search.

EBSCO’s Natural Language Search (NLS) allows users to search EBSCO interfaces using conversational language rather than the keywords, phrases, Boolean operators and bracketing of a traditional search.

A search might be entered as a question: What are the benefits of diazepam over other treatments for anxiety? 

Natural Language Search uses AI to better parse the query into conventional search syntax using keywords, phrases and Boolean. By improving the parsing of the query, more contextual clues and users' intent can be included.   
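To illustrate what "conventional search syntax" means here, the short sketch below hand-builds a Boolean string for the example question above. It is not EBSCO's algorithm: the function name is hypothetical, and the concept groups and synonyms are assumptions chosen purely for illustration.

```python
def build_boolean_query(concept_groups):
    """Join synonyms with OR inside brackets, then join concepts with AND."""
    clauses = []
    for terms in concept_groups:
        # Quote multi-word terms so they are searched as a phrase.
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Hypothetical concept groups for the question:
# "What are the benefits of diazepam over other treatments for anxiety?"
concepts = [
    ["diazepam"],
    ["anxiety", "anxiety disorders"],
    ["treatment", "therapy"],
]

query = build_boolean_query(concepts)
print(query)
# (diazepam) AND (anxiety OR "anxiety disorders") AND (treatment OR therapy)
```

An expert searcher would normally build and refine a string like this by hand; NLS attempts to produce something similar automatically from the conversational question.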

Members of the Current and Emerging Technology in KLS Community of Practice tested NLS against their usual search approach for both basic and more complex queries in both the NHS Knowledge and Library Hub and EBSCOhost. 

Twenty testers assessed the relevancy of results from both search approaches. They were also asked to run a search query in conversational English in the standard search mode. Testers were asked to comment on differences they noticed between searches run in NLS and standard search mode, and on their overall impression of NLS search mode and its usefulness to their work.

Testers were divided into EBSCOhost or NHS Knowledge Hub groups.  Each used a slightly different testing script. There were 11 returns from the NHS Knowledge Hub group and 9 from the EBSCOhost group. 

Relevancy

For the basic and more complex search query in both NLS and usual search, testers were asked to rate the relevancy of the search results from 1 (not at all relevant) to 5 (very relevant).

The higher the average the better.

Relevancy of results for NLS and Boolean search in EBSCOhost

                                 Boolean search   NLS
Basic search                     3.67             3.25
Complex search                   3.11             2.44
NLS query in other search mode   1.71

Relevancy of results for NLS and Boolean search in NHS Knowledge Hub

                                 Boolean search   NLS
Basic search                     3.91             4.45
Complex search                   3.18             3.45
NLS query in other search mode   1.64

Testers were also asked to list the number of irrelevant results.

The lower the average the better.

Number of irrelevant results for NLS and Boolean search in EBSCOhost

                 Boolean search   NLS
Basic search     2.63             3.25
Complex search   3.22             5.56

Number of irrelevant results for NLS and Boolean search in the NHS Knowledge Hub

                 Boolean search   NLS
Basic search     3.10             0.78
Complex search   2.44             3.75

In the NHS Knowledge Hub, the relevance of results was higher, and the number of irrelevant results was lower, with NLS than with the usual search method for basic searches. For more complex searches, NLS gave more relevant results but also a higher number of irrelevant results.

In EBSCOhost, the relevance of results was higher, and the number of irrelevant results was lower, in a search using the usual approach.
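These comparisons can be checked with a small worked calculation. The sketch below takes the averages reported in the tables above and subtracts the usual (Boolean) search figure from the NLS figure in each case. The variable and function names are illustrative only.

```python
# Averages as reported in the tables: (usual/Boolean search, NLS).
relevancy = {
    ("EBSCOhost", "basic"): (3.67, 3.25),
    ("EBSCOhost", "complex"): (3.11, 2.44),
    ("NHS Knowledge Hub", "basic"): (3.91, 4.45),
    ("NHS Knowledge Hub", "complex"): (3.18, 3.45),
}
irrelevant = {
    ("EBSCOhost", "basic"): (2.63, 3.25),
    ("EBSCOhost", "complex"): (3.22, 5.56),
    ("NHS Knowledge Hub", "basic"): (3.10, 0.78),
    ("NHS Knowledge Hub", "complex"): (2.44, 3.75),
}


def nls_minus_boolean(table):
    """Return the NLS average minus the Boolean average for each row."""
    return {key: round(nls - boolean, 2) for key, (boolean, nls) in table.items()}


relevancy_diff = nls_minus_boolean(relevancy)
irrelevant_diff = nls_minus_boolean(irrelevant)

for platform, search_type in relevancy:
    key = (platform, search_type)
    print(
        f"{platform:18} {search_type:8} "
        f"relevancy {relevancy_diff[key]:+.2f}  "
        f"irrelevant results {irrelevant_diff[key]:+.2f}"
    )
```

A positive relevancy difference favours NLS, while a positive difference in irrelevant results counts against it. Only the basic search in the NHS Knowledge Hub comes out in favour of NLS on both measures.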

What differences do you notice between the natural language search and your usual search in basic and more complex search queries?

In the NHS Knowledge Hub, NLS returned more relevant results more quickly. The results were generally different to those found using the more usual approach to searching.

Testers reported that NLS prioritised evidence reviews even if these were sometimes quite old.  It was felt to be more focused in answering the question, perhaps better for specific rather than more sensitive searches.

In more complex searches, NLS retrieved more relevant results than the more usual approach but seemed to ignore additional concepts. It did not link the concepts together well. This may be explained by other testing, which revealed that NLS can handle a maximum of three concepts.

Using conversational language in a non-NLS mode resulted in poor results.  The main search term was the focus of the search leading to a massive increase in the number of irrelevant results.

In EBSCOhost, NLS tended to produce more results. It sometimes interpreted concepts well, but in other cases it misinterpreted them, and it had difficulty dealing with phrases.

One tester said: ‘I was hoping it would pick up on the semantic elements of the search better, using “semantic triples” maybe, but sadly not!’ Others found that NLS broadened the search or dealt well with the context and meaning of the query.

For more complex queries, it was much the same. The expectation was for NLS to bring up fewer, but more relevant, papers compared to a basic search, but this proved not to be the case.

Testers reported that modifying the query could return more relevant results, but it was felt that NLS should be able to cope with any search query, however well or poorly it is worded.

Looking at how the search was converted to the usual search syntax using the refine my search query option, NLS does not seem very smart or consistent in how it accomplishes this.

As an expert searcher, what’s your overall impression of Natural Language Search and how useful would you find it in your work? 

In the Hub, NLS generally worked well, brought back relevant results, and would be useful for basic searches. 

Testers felt NLS would work best in the Hub for really basic searches by novice searchers. It may help them think around a topic, identify search terms, or be used as a check to see if any results were missed.

Most expert searchers would not recommend using NLS for advanced searching. It struggles to identify the context for the search terms and to combine the search terms in a way that retrieves the most relevant results.

It is not good at retrieving precise or clinically relevant results compared to targeted keyword-based search methodologies.

There was concern about the reproducibility of searches.  Slightly different NLS searches for the same set of concepts produced different results. This could cause confusion for end-users as the results in one search may not be reproduced in other similar searches and may result in making literature searches take even longer (the FOMO effect). On the plus side, NLS may give users a different angle to their search query.

The ‘Show refined query’ option is useful and could be a benefit in constructing a good Boolean search string. However, the assumptions or inclusion/exclusion criteria made by the NLS aren’t entirely clear.

This feedback led to some recommendations, approved by the NHSE KLS Change Advisory Board at their June meeting.

  1. Allow the NLS as an option in Hub basic searches.  It is not particularly useful for more complex searches, but as most searches of the Hub are basic ones, it should improve the relevancy of the results it finds.  It may help users think around a topic, identify search terms or be used as a check to see if any results were missed.
  2. Do not allow the NLS as an option in EBSCOhost.  It does not always pick up the contextual clues or the user’s intent well.  It misses nuance and does not interpret phrases well.  It struggles with more than three concepts.  In addition, there is currently no possibility of easily refining NLS searches or combining them with more usual search query sets.  Instead, the user would need to start a new NLS search with a modified query.
  3. Send the feedback to EBSCO and ask them to work on the search algorithm to make it a more useful tool for more complex searches and for KLS expert searchers.

The feedback, including some suggestions for improving NLS, was sent to EBSCO.

EBSCO’s response

Overall, they accept our conclusion about the use of NLS for more general search or topic exploration.

EBSCO says that they intentionally took a conservative approach to the term expansion for the rewrite into a Boolean keyword search, and consider it as a baseline approach, which gives them the flexibility to explore improvements based on feedback.

They will review the issue about NLS being better at specificity than sensitivity and improve it without losing the current level of specificity.

They will also enhance NLS’s ability to identify and comprehend multiple concepts and address the problem of queries which end with a question mark not retrieving results. The latter is due to the question mark triggering different reasoning patterns, which affect the translation of the NLS search into a Boolean keyword one.

In response to our concern about reproducibility, they plan to review why a rewritten Boolean search that is re-run in proximity search mode can yield a significantly different record count from the one run in NLS.

EBSCO may implement an editing function for rewritten Boolean keyword searches once NLS can handle more complex queries. They will also make it clearer to users that a search is being run in NLS mode.

On our comment about being able to filter NLS searches, EBSCO told us that filters carry over between searches. Once a user selects a filter, the filter is persistent unless the lock icon is changed. The “Clear all” button at the bottom of the filter panel will remove a previously set filter, even if the lock is in the ‘on’ position.

Finally, EBSCO will review the issue where emailed results are not identical to the actual result list when NLS is selected.

NLS has now been switched on in the NHS Knowledge Hub.  The feedback has been well received by EBSCO and may lead to improvements to the product in the future.  Thanks to all the people who got involved with the testing of Natural Language Search.

Mr Richard Bridgen

KLS Development Manager, East of England and South

Knowledge and Library Services