
By Aaron Tay, Head, Data Services
You've explored the literature, and as we discussed in our article on uncovering missing synonyms, you have gathered a comprehensive set of keywords. You then used those to build a robust search string and, by following our guide on testing your Boolean search strategy, you've confirmed it can find your essential papers.
So, you're ready to go, right? Almost.
Before you run your search across multiple databases, how can you be confident that your strategy isn't too narrow, causing you to miss other important studies? Conversely, how do you know it isn't too broad, leaving you to wade through thousands of irrelevant results?
In a perfect world, your search would return every relevant paper on your topic (high recall) and nothing else (high precision). But in reality, every search strategy faces the fundamental precision-recall trade-off.
- Recall asks: Of all the relevant papers out there, what percentage did my search find?
- Precision asks: Of all the papers my search returned, what percentage are actually relevant?
The goal of a high-quality search strategy is to maximize recall, ensuring you don't miss crucial studies. However, striving for 100% recall often destroys your precision, forcing you to screen thousands of irrelevant results.
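To make the two metrics concrete, here is a minimal sketch with made-up numbers. In practice you never know the true number of relevant papers in the literature, so recall can only ever be estimated, but the arithmetic is this simple:

```python
# Made-up numbers for illustration only.
relevant_in_literature = 50   # hypothetical: every relevant paper that exists on the topic
retrieved = 400               # results returned by the search
relevant_retrieved = 40       # relevant papers among those results

recall = relevant_retrieved / relevant_in_literature   # 0.80 -> the search found 80% of relevant papers
precision = relevant_retrieved / retrieved             # 0.10 -> only 1 in 10 results is relevant

print(f"Recall: {recall:.0%}, Precision: {precision:.0%}")
```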
Even once you have decided on an acceptable trade-off, how do you tell whether your search strategy actually achieves it?
A 2025 article in Technological Forecasting & Social Change by Michael Gusenbauer and Sebastian P. Gauster outlines two essential quality checks that every researcher should perform. These checks help you validate both the recall and the precision of your keyword search, ensuring your literature review is built on a solid foundation.
The article, "How to search for literature in systematic reviews and meta-analyses: A comprehensive step-by-step guide," was written for systematic reviews and meta-analyses and offers far more detailed guidance for those formats, but here I adapt only the parts that apply to all literature reviews. Most importantly, unlike most search guidance, which is written for clinical or health sciences, this article focuses on evidence drawn from "management and adjacent social science disciplines."
Quality Check 1: Validate Your Search Against Your "Gold Standard" List
This first check is a crucial validation step that confirms your search string works as intended. As detailed in our previous post on testing your Boolean search strategy, the goal is to test whether your final search string can successfully identify all the core articles you discovered during your initial scoping search.
If your search string fails to find even one of these "gold standard" papers in a database that you know contains it, that's a red flag. It tells you that your keywords or syntax are not capturing the language used in that part of the literature, and you need to revisit your search string.
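If your database export includes DOIs, this check can also be scripted rather than done by eye. The sketch below assumes a hand-picked set of gold-standard DOIs and a CSV export of your search results with a "DOI" column; both the file name and the column name are placeholders for whatever your database actually produces:

```python
import csv

# Hypothetical inputs: your hand-picked "gold standard" DOIs and a CSV export
# of your search results. Adjust the file name and column name to your export.
gold_standard_dois = {
    "10.1000/example.001",
    "10.1000/example.002",
    "10.1000/example.003",
}

retrieved_dois = set()
with open("search_results.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        doi = (row.get("DOI") or "").strip().lower()
        if doi:
            retrieved_dois.add(doi)

missing = {d for d in gold_standard_dois if d.lower() not in retrieved_dois}
if missing:
    print("Red flag - gold standard papers NOT retrieved:")
    for doi in sorted(missing):
        print("  ", doi)
else:
    print("All gold standard papers were retrieved.")
```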
Quality Check 2: Using NNR to Assess Your Precision-Recall Balance
Once you've confirmed your search can find your known papers, you need to gauge its performance on the wider literature. For this, Gusenbauer and Gauster (2025) recommend looking at the Number Needed to Read (NNR) metric:
NNR = The total number of articles you screen or read / The number of relevant articles you find.
e.g., an NNR of 20 means you have to look through 20 papers to find one relevant article.
Notice that because Precision = The number of relevant articles you find / The total number of articles you read, NNR is simply the reciprocal of precision: the higher your precision, the lower your NNR.
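Put differently, once you know your precision, the NNR follows immediately. A minimal illustration with made-up numbers:

```python
# Illustrative numbers only: suppose you screen 100 records and find 5 relevant ones.
screened = 100
relevant = 5

precision = relevant / screened   # 0.05, i.e. 5% of screened records are relevant
nnr = screened / relevant         # 20.0, i.e. read 20 records per relevant one

assert abs(nnr - 1 / precision) < 1e-9  # NNR is exactly the reciprocal of precision
print(f"Precision: {precision:.0%}  ->  NNR: {nnr:.0f}")
```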
How to Estimate Your NNR in 4 Steps
Before you download thousands of results and begin the screening process, you should estimate your NNR. This simple test can save you dozens of hours.
- Run Your Search: Execute your final, validated search string in one of your primary databases (e.g., Scopus, Web of Science). Note the total number of results returned.
- Take a Random Sample: You don't need to screen everything to get an estimate. Just take a random sample of the results. The easiest way is to export the first 100 or 200 results into a spreadsheet or your reference manager. This is your test sample (see the note on sampling below).
- Screen the Sample: Carefully screen the titles and abstracts of your sample articles. Based on your inclusion and exclusion criteria, count how many of them are actually relevant to your review.
A note on sampling: how you take your sample is critical. Do not sort by "relevance." That creates a biased sample by clustering the most relevant papers at the top, giving you a falsely low NNR. If you cannot take a true random sample, sort your results by a neutral factor, such as publication date (newest first), and then take the first 100-200 results. This avoids the bias of relevancy ranking and gives you a much more realistic preview of your true workload.
- Calculate the Estimated NNR: Use the formula with your sample data:
Estimated NNR = (Number of articles in your random sample) / (Number of relevant articles found in the sample)
For example, if you screened a random sample of 200 articles and found 10 to be relevant, your estimated NNR is 200 / 10 = 20.
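If your database lets you export the full result set (or a large chunk of it), the sampling and the calculation can be done in a few lines. This is only a sketch: the file name, the sample size, and the final relevant count are placeholders you would replace with your own export and your own manual screening results.

```python
import csv
import random

SAMPLE_SIZE = 200  # adjust to however many records you can realistically screen

# Hypothetical export of your full result set; the file name and columns
# depend entirely on the database you exported from.
with open("search_results.csv", newline="", encoding="utf-8") as f:
    records = list(csv.DictReader(f))

random.seed(42)  # fixed seed so the sample can be reproduced
sample = random.sample(records, min(SAMPLE_SIZE, len(records)))

# Save the sample for manual title/abstract screening.
with open("sample_to_screen.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=sample[0].keys())
    writer.writeheader()
    writer.writerows(sample)

# After screening the sample by hand, record how many records were relevant.
relevant_in_sample = 10  # <- replace with your own count

estimated_nnr = len(sample) / relevant_in_sample
print(f"Estimated NNR: {estimated_nnr:.0f}")
```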
Interpreting Your Estimated NNR
Now you can use this estimate to diagnose your search strategy.
The High-Precision Trap (low estimated NNR)
It might seem ideal to have a low NNR. If you only need to screen five papers to find a relevant one (NNR = 5), your precision is high. But be careful: this is often a warning sign that your recall is dangerously low.
An extremely precise search is usually a highly restrictive one. You may have used too many specific keywords or linked too many concepts with "AND." While this eliminates irrelevant results, it almost certainly excludes many relevant papers that use slightly different terminology.
The authors found that 42% of management reviews had an NNR below 8, suggesting they had sacrificed essential recall for the convenience of high precision.
The Low-Precision Problem (NNR > 100)
This is a more obvious issue. If you have to screen 200 papers to find one relevant study (NNR=200), your precision is horrible. Your search terms are likely too broad or vague, and your workload will be overwhelming. In these cases, you may need to refine your search to improve its precision without significantly harming recall. Adding a well-chosen concept or using proximity operators can often help.
Finding the Sweet Spot: The Pragmatic Balance
There is no perfect NNR, but based on their empirical study of systematic reviews in management, the authors suggest a pragmatic range for most reviews is an NNR between 8 and 100.
Falling within this range suggests you have struck a reasonable compromise: your search is broad enough to achieve good recall, but still precise enough to make the screening process manageable.
By using these two quality checks, you can move beyond guesswork. You'll have validated your search's recall and strategically positioned it within the precision-recall trade-off, setting the stage for a truly systematic and robust review.
Beyond Keywords: What Comes Next?
But what happens when even the most carefully calibrated keyword search isn't enough? Some research topics have inconsistent terminology, are too new for standardized language, or are so interdisciplinary that keywords fail. For these "difficult-to-search" topics, relying only on keywords means you will miss essential research.
This is where more advanced techniques come in. Instead of matching words, you can follow the "conversation" in the literature by tracking which papers cite each other. Or you can leverage the power of AI to find conceptually similar papers, even if they don't share your exact keywords.
In our next piece, we will dive into the world of citation searching (backward and forward chasing) and semantic searching, showing you how to add these powerful tools to your research arsenal to ensure no stone is left unturned.