By Aaron Tay, Head, Data Services
The SMU community is no stranger to the latest AI academic search tools. We have subscriptions to top-tier tools like Undermind.ai, Scopus AI, Scite Assistant and more.
To these tools, we now add Consensus.ai on an extended trial until Aug 2026.
To get free access to a Consensus Pro account, just register with your SMU email, verify via your email, and you are done!
Want to know what's unique about Consensus and how it differs from the other AI academic search tools we have access to, such as Undermind.ai? Read on.
Basics of Consensus
At a basic level, Consensus.ai works similarly to many of the tools already mentioned. You enter your query in natural language; it queries its index, returns the top few results, and generates an answer with citations.
Similar to Undermind.ai and Elicit.com, it generally uses open sources, in particular OpenAlex and Semantic Scholar, which means it typically has abstracts of papers but full text only for open access papers.
One of the unique features of Consensus – and the feature that gives the product its name – is that for certain types of questions, where enough relevant papers are found, it can generate a "Consensus Meter", which tries to estimate the consensus across the set of papers on the same topic.

In the example above, you can see that for the query "Do certain personality traits influence career success?", you get a nice visualisation showing that 16 papers, or 94% of the relevant papers found, say "yes", with 6% (1 paper) saying the results are "mixed".
But can we trust the papers saying "yes"?
In the details, you can see, for the papers that say "yes", their average recency, journal quartile tier and citation count. Moreover, in the example above, you can see that of those 16 papers saying "yes", 3 are "tier one studies", defined as RCTs (randomized controlled trials).
In short, these are proxy signals that help you get a sense of the average quality of the papers found.
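To make the arithmetic behind the meter concrete, here is a minimal sketch of how such a breakdown could be computed, assuming each paper has already been labelled by an LLM with a stance and a study design. This is purely illustrative and not Consensus' actual code.

```python
from collections import Counter

# Purely illustrative: a toy re-creation of the Consensus Meter arithmetic,
# not Consensus' actual implementation. Assume each paper has already been
# classified with a stance ("yes", "possibly", "mixed", "no") and a design.
papers = (
    [{"stance": "yes", "design": "RCT"}] * 3              # the 3 tier-one "yes" studies
    + [{"stance": "yes", "design": "observational"}] * 13
    + [{"stance": "mixed", "design": "observational"}]
)

stance_counts = Counter(p["stance"] for p in papers)
total = len(papers)
for stance, count in stance_counts.items():
    print(f"{stance}: {count} ({count / total:.0%})")     # yes: 16 (94%), mixed: 1 (6%)

tier_one_yes = sum(1 for p in papers if p["stance"] == "yes" and p["design"] == "RCT")
print("tier-one 'yes' studies:", tier_one_yes)            # 3
```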
For example, in this deep search on “Does chess training causally improve academic performance in children?” the Consensus Meter shows that 41% (7 studies) say “yes,” 12% (2 studies) say “possibly,” 24% (4 studies) say “mixed,” and 24% (4 studies) say “no.” At first glance the results look close. However, none of the seven “yes” studies are tier-one (randomised controlled trials, systematic reviews, or meta-analyses), while the tier-one studies are among the more negative results. This suggests the higher-quality evidence tends to be negative, and the overall weight of evidence points to no causal effect.

You can of course ignore both the generated answer and the Consensus Meter and focus on the search results, which have three layouts – "Default", "Compact" and "Table" – with the "Table" layout being the most interesting.

Consensus not only lists the papers but also uses LLMs to extract information about "population", "methods", "results", "outcomes" and "sample size".
Unlike some other tools such as Elicit or SciSpace, you cannot create additional custom columns of information.
Many of the papers in the results are also tagged with information on whether they are RCTs (randomized controlled trials), highly cited, or published in "rigorous" or "very rigorous journals", to "help users better understand the quality of their search results". How is "rigorous journals" defined? It seems they are using SciScore, with "rigorous journals" being in the top 50% of journals and "very rigorous journals" in the top 10%.
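If that is indeed how it works, the tagging logic would be something like the following sketch, where the percentile thresholds (top 50% and top 10% by SciScore) are my reading of their description rather than a documented API.

```python
def journal_rigor_tag(sciscore_percentile: float) -> str | None:
    """Illustrative only: tag a journal from its SciScore percentile (0-100,
    higher = more rigorous), using the thresholds Consensus appears to use."""
    if sciscore_percentile >= 90:
        return "very rigorous journal"   # top 10% of journals
    if sciscore_percentile >= 50:
        return "rigorous journal"        # top 50% of journals
    return None                          # no tag

print(journal_rigor_tag(95))  # very rigorous journal
print(journal_rigor_tag(60))  # rigorous journal
print(journal_rigor_tag(30))  # None
```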
Another useful feature is that Consensus has LibKey support. You can go to Settings > Preferences and set your institution to Singapore Management University. This enables Consensus to "know" how to link to the full text of papers available via SMU subscriptions.
Settings of Consensus.ai

Like many academic AI search tools, it provides a setting that determines roughly how much "effort" it will put into the search and, in the case of Consensus, how many papers it will try to summarise.
"Quick" summarises a maximum of 10 papers, "Pro" a maximum of 20, and "Deep" a maximum of 50.
As I write this review, Consensus has announced, in collaboration with OpenAI, a "Scholar Agent" which is used in "Pro" and "Deep" modes. Under the hood, this means Consensus is now using GPT-5 and the Responses API, and the post even includes a flow diagram of how the planning agent, search agent and reading agent work together.
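Based purely on that flow diagram, a plan-search-read pipeline of this kind might look roughly like the sketch below. All the function names and bodies here are hypothetical placeholders of my own, not Consensus or OpenAI code; the per-mode paper limits are the ones mentioned above.

```python
# Hypothetical sketch of a plan -> search -> read pipeline, loosely following
# the Scholar Agent flow diagram. None of these functions are real Consensus
# or OpenAI APIs; they are stubs for illustration only.

MAX_PAPERS = {"quick": 10, "pro": 20, "deep": 50}  # summarisation limits per mode

def plan_queries(question: str) -> list[str]:
    # Planning agent: break the question into several focused search queries.
    return [question, f"{question} systematic review", f"{question} randomized trial"]

def run_search(query: str) -> list[dict]:
    # Search agent: query the paper index (e.g. OpenAlex/Semantic Scholar data)
    # and return candidate papers with abstracts. Stubbed out here.
    return []

def read_and_summarise(papers: list[dict], limit: int) -> str:
    # Reading agent: read the top `limit` papers and draft a cited answer.
    return f"Answer synthesised from {min(len(papers), limit)} papers."

def scholar_agent(question: str, mode: str = "pro") -> str:
    candidates: list[dict] = []
    for query in plan_queries(question):
        candidates.extend(run_search(query))
    return read_and_summarise(candidates, MAX_PAPERS[mode])

print(scholar_agent("Does chess training improve academic performance?", mode="deep"))
```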
Another somewhat unique feature of Consensus is that it provides a wide variety of pre-filters you can use to control the search. This is less common among current AI-powered search systems.
Besides the newly introduced "medical mode", which restricts the search to about 8 million papers and guidelines drawn from roughly 50,000 clinical guidelines and the top 1,000 medical journals worldwide, there are a variety of other pre-filters.

You can filter by some of the usual suspects such as publication year, SCImago Journal Rank, citation count, excluding preprints, etc.
Somewhat less commonly, you can filter by methodology. One way is by study design, so you can include only, for example, meta-analyses, RCTs or systematic reviews. Another way is by sample size. Note that these classifications are done by AI/LLMs and may not be accurate.
Lastly, you can filter by field of study (23 options) and country (236 options).
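To give a feel for what these pre-filters do, here is a sketch of applying similar conditions yourself over a local list of paper records. The field names are made up for illustration and are not a Consensus API.

```python
# Illustrative only: Consensus-style pre-filters applied to a local list of
# paper records. Field names are invented for this sketch.
papers = [
    {"title": "A", "year": 2021, "sjr_quartile": 1, "citations": 120,
     "preprint": False, "design": "RCT", "sample_size": 450},
    {"title": "B", "year": 2015, "sjr_quartile": 3, "citations": 8,
     "preprint": True, "design": "observational", "sample_size": 60},
]

filtered = [
    p for p in papers
    if p["year"] >= 2018                                                  # publication year
    and p["sjr_quartile"] <= 2                                            # SCImago Journal Rank
    and p["citations"] >= 10                                              # citation count
    and not p["preprint"]                                                 # exclude preprints
    and p["design"] in {"RCT", "meta-analysis", "systematic review"}      # study design
    and p["sample_size"] >= 100                                           # sample size
]

print([p["title"] for p in filtered])  # ['A']
```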
Consensus Deep Search – search process
I have written about the trend towards Deep Research with general tools like OpenAI/Gemini Deep Research as well as academic search implementations like Undermind.ai, Scopus Deep Research, ASTA summarize paper, and more.
Consensus' version of this is called Deep Search.
It is perhaps interesting to compare Consensus Deep Search with Undermind.ai (to which we also have a subscription).
Both Undermind and Consensus Deep Search take under 10 minutes, with Consensus being a bit faster.
One major difference is while Undermind will ask clarifying questions, Consensus Deep Search just searches straight away.
That said, some librarians dislike the fact that Undermind's search process is completely non-transparent.
In comparison, with Consensus Deep Search you get a PRISMA-like flow diagram of the literature search process.

In the example above, you can see the Deep Search ran 20 searches, yielding 1,004 hits. 43 more papers were identified using citation searching, for a total of 1,047 hits. After removing papers without abstracts and duplicates (-333), you are left with 714.
Another 141 papers with "low semantic relevance" are then removed, leaving 573 papers eligible for selection, and of these the top 50 are selected (probably after another rerank).
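The arithmetic behind that flow diagram is simple enough to write out as a quick sanity check, using the numbers from this example run.

```python
# Sanity check of the flow diagram numbers from this example Deep Search run.
search_hits = 1004                           # from 20 searches
citation_hits = 43                           # found via citation searching
identified = search_hits + citation_hits     # 1047 total hits
screened = identified - 333                  # minus papers without abstracts / duplicates -> 714
eligible = screened - 141                    # minus "low semantic relevance" papers -> 573
included = min(eligible, 50)                 # top 50 selected for the report

print(identified, screened, eligible, included)  # 1047 714 573 50
```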

The sidebar gives even more details, and you can click on the drop-down for each section to see which queries were run.

We even get to see the criteria used for citation searching.
Consensus Deep Search – output
Like many modern Deep Research tools, Consensus Deep Search provides a long-form report with many interesting visualisations. See the report here.
One particularly unique feature is that not only does the text include citations like most tools do, but the citations themselves are coloured green, yellow, orange or red, corresponding to the Consensus Meter classifications of "yes", "possibly", "mixed" and "no".
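The colour coding is essentially a mapping from Consensus Meter classification to colour, something like this illustrative sketch (not Consensus' actual code).

```python
# Illustrative mapping of Consensus Meter classifications to citation colours
# in the Deep Search report; the rendering function is a made-up placeholder.
STANCE_COLOUR = {
    "yes": "green",
    "possibly": "yellow",
    "mixed": "orange",
    "no": "red",
}

def colour_citation(paper_title: str, stance: str) -> str:
    return f"[{paper_title}] rendered in {STANCE_COLOUR[stance]}"

print(colour_citation("Example et al. 2020", "no"))  # rendered in red
```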

Besides the already mentioned Consensus Meter, Consensus Deep Search also offers a "Results Timeline", a "Top Contributors" table, a "Claims and Evidence Table", a "Research Gaps Matrix" and "Open Research Questions".




Some of these visualisations are similar to what you can also get via Undermind and some are different; which you prefer is a matter of taste.
Conclusion
Consensus is a very powerful AI-powered academic search tool, similar to Undermind.ai and searching roughly the same sources.
It differs from Undermind.ai in having more pre-filters and a more transparent search process. Its Deep Search has visualisations that are quite standard in this class of tools, but there are also more unique ones like the Claims and Evidence Table and the Consensus Meter.
You can see from the various details extracted and the filters available that Consensus has a clear STEM and medical slant. Still, the sources are as broad as those used by Undermind, so coverage shouldn't be an issue, though more testing is needed to ensure the retrieval algorithm and LLM are not tuned for medical topics only.
At the end of the day, both tools are excellent. For me personally, which one to use would come down to which is better at retrieving all the relevant papers on your topic, and this can only be determined with a lot of comparison testing in the area you care about.