scite, a new generation discovery citation index: an interview with scite co-founder and CEO Josh Nicholson

Learn more about scite with Josh Nicholson, CEO and co-founder of scite



Interview Questions by Aaron Tay, Lead, Data Services

SMU Libraries have recently subscribed to scite, a new generation discovery citation index that covers 900 million citation statements.

Similar to traditional citation indexes like Web of Science and Scopus, you can use it to find citation counts of papers, authors, journals and institutions. However, scite goes beyond that in two ways, allowing you to:

  • Search for expert analyses and opinions on any topic using the citation statement search mode
  • Check how research papers have been cited – whether the cite is a mentioning cite, supporting cite or contrasting cite

To benefit from SMU’s subscription, you can access scite directly via this link or go to the scite website and do a one-time registration with your SMU email. Interested in knowing more about scite?

Join our online training on 8 December 2021 at 9am.

In this post, we interview Josh Nicholson, CEO and co-founder of scite, on what scite aims to do and how it can be helpful to researchers.


Thank you, Josh, for agreeing to do this Q&A. Could you please introduce yourself briefly and describe the goals of scite?

My name is Josh Nicholson, and I am the co-founder and CEO of scite. We started scite in an effort to help make research more reliable after numerous studies came out documenting high rates of irreproducibility in cancer research, psychology, and other fields. Within my subfield of cancer research, looking at aneuploidy and chromosome mis-segregation, I knew that there were papers with hundreds of citations that had been shown to be irreproducible by multiple labs. This information was buried in the avalanche of citations, so it would be missed by most readers who were not intensely reading the literature in my field. The flip side is that I wouldn’t know this information about papers that were outside of my field.

We wanted to make it easy to see how a paper has been cited and to help researchers understand if it has been supported or contrasted by subsequent studies. Our aim has broadened a bit beyond reproducibility at this point, and I think we want to make research easier to understand, discover, and interpret. In short, we want to bring in the next generation of citations, moving beyond a superficial metric to something that is much richer in information and nuance.


I understand that scite allows you to search for papers and track citations to published works such as journal articles, but what would you say are the major differences between scite and other traditional citation indexes and discovery services like Google Scholar, Web of Science and Scopus?

Traditional citation tools are used much more for analytics by administrators and bibliometricians and much less for discovery and understanding. In general, researchers will use tools like Web of Science or Scopus when preparing their publication metrics for various review and evaluation purposes but will not use them to discover new articles in their field. Students will rarely, if ever, use such tools for their work, as they are overly complex and do not really help with day-to-day writing and interpretation. scite is primarily used by students at this point, as it helps provide important nuance and understanding about different papers, journals, researchers, and topics. This information can help students as they read various articles and as they write, whether that is an essay for a class, a dissertation, or a paper they are preparing to publish.

scite and Google Scholar are much closer. The key difference is that we show citation context, we classify citations as providing supporting or contrasting evidence, and we allow you to search not just based on title and abstract but to search the citation statements themselves.


What is the size of scite’s index and how does that compare to Web of Science?

We are roughly on par with all major citation tools, if not further ahead at this point. A comparison between scite, Web of Science and other citation tools can be seen here. In general, we have as many traditional citations as traditional citation indices like Web of Science or Scopus. The key difference is that we have over 925M citation statements extracted and classified from full-text scientific articles, while these tools effectively have none or only a few.

Comparison between scite, Scopus, Web of Science and Dimensions
Note. Adapted from Coverage and Comparison With Other Indexes, by scite. Copyright by scite.


scite provides a “citation statement” search mode. What is a citation statement and what are the advantages of using this search mode?

A citation statement or “citance” is the sentence extracted from the full-text article that includes the in-text citation. The citation statement shows how a citation was used by allowing you to easily read it. We also include the sentence before and after the citation statement, which we collectively call the citation context. Seeing the citation context can help readers easily see how an article or topic has been cited without having to open the full text of every article that cites it, saving readers a lot of time while providing them with extra information and nuance.

Example of a citation statement and citation context
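
As a rough illustration of the definition above, the following minimal Python sketch pulls a citation statement and its neighbouring sentences out of a passage. It is not scite's extraction pipeline: the names `CitationContext`, `split_sentences` and `citation_contexts` are hypothetical, the sentence splitter is deliberately naive, and the sample passage is invented for the example.

```python
import re
from typing import List, NamedTuple, Optional


class CitationContext(NamedTuple):
    """Hypothetical container: a citation statement plus its neighbours."""
    before: Optional[str]  # sentence preceding the citation statement, if any
    statement: str         # sentence containing the in-text citation
    after: Optional[str]   # sentence following the citation statement, if any


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def citation_contexts(text: str, marker: str) -> List[CitationContext]:
    """Return every sentence containing `marker`, with one sentence either side."""
    sentences = split_sentences(text)
    contexts = []
    for i, sentence in enumerate(sentences):
        if marker in sentence:
            before = sentences[i - 1] if i > 0 else None
            after = sentences[i + 1] if i + 1 < len(sentences) else None
            contexts.append(CitationContext(before, sentence, after))
    return contexts


# Invented passage, used only to exercise the sketch.
passage = (
    "Method A is widely used. "
    "Earlier work reported that it improves accuracy [12]. "
    "We could not reproduce this effect."
)

for ctx in citation_contexts(passage, "[12]"):
    print("Before:   ", ctx.before)
    print("Statement:", ctx.statement)
    print("After:    ", ctx.after)
```

A real system would need a proper sentence tokenizer and reference resolution; the point here is only that the citation context is the citing sentence together with the sentences immediately around it.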

The Citation Statement Search allows users to quickly and easily see what the research literature says about any topic. Really, citation statements are expert analyses, insights, and opinions extracted from academic publications. Thus, with our search you have nearly 1 billion expert thoughts on any topic, which can be used to make more informed decisions and to understand complex topics more easily.

Note. The difference between normal search and Citation Statement Search. From Citation Statement Search, by scite. Copyright by scite. Reprinted with permission.


What are “contrasting cites”? How are they determined and how can I use them for my research?

We classify citations as providing supporting or contrasting evidence using a deep learning model. This model has been trained on roughly 40,000 manually annotated citation statements. Filtering for supporting or contrasting citation statements allows users to easily see whether a study has been tested by others in the literature and whether those later studies provide supporting or contrasting evidence. While this gets a lot of attention because it is deep learning, I would also emphasise that filtering citations by where they appear in the citing paper (introduction versus discussion section) can be hugely beneficial and help provide important interpretations and understanding of a paper.

Interviewer note: For full technical details of scite, refer to the following publication: scite: A smart citation index that displays the context of citations and classifies their intent using deep learning.
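
The two kinds of filtering described above (by classification and by section of the citing paper) can be sketched in a few lines of Python. This is an illustration only: the `CitationStatement` record, its field names, and the sample DOIs and sentences are assumptions made up for the example, not scite's data model, and the deep learning classifier itself is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CitationStatement:
    """Hypothetical record for one already-classified citation statement."""
    citing_doi: str
    text: str
    classification: str  # "supporting", "contrasting" or "mentioning"
    section: str         # e.g. "introduction", "results", "discussion"


def filter_statements(
    statements: Iterable[CitationStatement],
    classification: Optional[str] = None,
    section: Optional[str] = None,
) -> List[CitationStatement]:
    """Keep statements that match every filter that was supplied."""
    return [
        s
        for s in statements
        if (classification is None or s.classification == classification)
        and (section is None or s.section == section)
    ]


def tally(statements: Iterable[CitationStatement]) -> Counter:
    """Count statements per classification."""
    return Counter(s.classification for s in statements)


# Invented sample data with placeholder DOIs.
statements = [
    CitationStatement("10.0000/example.1", "Our results agree with [5].", "supporting", "discussion"),
    CitationStatement("10.0000/example.2", "In contrast to [5], we saw no effect.", "contrasting", "discussion"),
    CitationStatement("10.0000/example.3", "This topic was studied in [5].", "mentioning", "introduction"),
]

contrasting_in_discussion = filter_statements(
    statements, classification="contrasting", section="discussion"
)
print(tally(statements))
print([s.citing_doi for s in contrasting_in_discussion])
```

Combining the two filters narrows a long citation list down to the statements most likely to report a direct test of the cited paper.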


What does the scite Reference Check do?

The scite Reference Check was built to help users check the references of their unpublished manuscript/essay/dissertation to see how others have cited the same sources, to see if any of the references have been contrasted in the literature or retracted, and to help researchers in general make sure their references are sound and that they are not missing any contextual information, like competing studies.

The scite Reference Check can be used by uploading a PDF or manuscript, either your own or one that you might be reviewing. Additionally, we have integrated the reference check into major submission systems, so we are working to make this a default service pre-publication. As you can see from our reference check Twitter bot, many papers continue to cite papers even after they have been retracted. We want to stop this and, more generally, help researchers make sure their writing is built on a strong foundation.
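
The idea behind a reference check can be sketched as a lookup of each reference against an index of known citation tallies and retraction flags. The Python below is a simplified illustration under stated assumptions: `ReferenceRecord`, `check_references`, the in-memory `index` and the placeholder DOIs are all hypothetical and do not describe how the scite Reference Check is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ReferenceRecord:
    """Hypothetical per-paper record: retraction flag and citation tallies."""
    retracted: bool
    supporting: int
    contrasting: int


def check_references(
    reference_ids: List[str], index: Dict[str, ReferenceRecord]
) -> Dict[str, List[str]]:
    """Flag references that are retracted, contrasted, or missing from the index."""
    report: Dict[str, List[str]] = {"retracted": [], "contrasted": [], "not_found": []}
    for ref in reference_ids:
        record = index.get(ref)
        if record is None:
            report["not_found"].append(ref)
            continue
        if record.retracted:
            report["retracted"].append(ref)
        if record.contrasting > 0:
            report["contrasted"].append(ref)
    return report


# Invented index and reference list with placeholder DOIs.
index = {
    "10.0000/ref.a": ReferenceRecord(retracted=False, supporting=12, contrasting=0),
    "10.0000/ref.b": ReferenceRecord(retracted=True, supporting=3, contrasting=4),
    "10.0000/ref.c": ReferenceRecord(retracted=False, supporting=1, contrasting=2),
}
manuscript_refs = ["10.0000/ref.a", "10.0000/ref.b", "10.0000/ref.c", "10.0000/ref.d"]

report = check_references(manuscript_refs, index)
print(report)
```

A flagged reference is not necessarily wrong to cite; the report simply points the author or reviewer to references that deserve a closer look.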


scite has an interesting dataset. Is it possible to obtain access to the dataset for a research study I have in mind?

We are happy to make our data available for research purposes. We make our data available via API, and we provide data dumps to researchers for larger analyses. There are some limitations, though, as we can’t sublicense or redistribute citation context from publishers. The citation counts and classifications, though, can be more or less freely distributed.


What are some features that we can expect to see from scite in the future?

We are working on “scite analyses” now, which will allow users to export a PDF report based on their search results or a dashboard. The information included in these analyses will depend upon the search but in general will show who has worked the most in that area or topic space, which are some of the most supported or influential studies, and what are some key viewpoints (citation statements). We want to help consultants better utilise our data to understand different topics and make presentations easier, thus making scite useful for more than just writing papers.