By Aaron Tay, Lead, Data Services
As we noted in “Bibliometric reviews in Business, Management & accounting and the tools used”, the top four bibliometric analysis tools that were mentioned in the abstract or keyword of Scopus articles in the area of Business, Management and Accounting and Social Sciences were:
- VOSviewer
- CiteSpace
- Harzing's Publish or Perish
- Bibliometrix/Biblioshiny
Of these four tools, VOSviewer, CiteSpace and Bibliometrix/Biblioshiny are designed to accept data from citation sources and produce visualisations of networks of papers, authors, journals, organisations and countries using various bibliometric techniques, e.g., co-citation and bibliographic coupling.
Harzing’s Publish or Perish stands out in that it does not provide any visualisation at all. Rather, it is known as a tool for retrieving and analysing citations for groups of papers. As such, it is often used alongside the other tools in bibliometric analyses or bibliometric reviews for performance analysis, while tools like VOSviewer and CiteSpace are used to uncover the conceptual, intellectual and social structure of a field.
Publish or Perish and Google Scholar
Publish or Perish was first released by Professor Anne-Wil Harzing in 2006. Designed to work with Google Scholar, it was one of the first tools to extract results from Google Scholar searches and generate citation metrics such as the h-index and g-index.
While Google Scholar is known as one of the biggest sources of academic data, it unfortunately does not provide a way to do bulk analysis of the citation data it holds. Unlike established citation indexes such as Web of Science or Scopus, Google Scholar has no built-in features in its web interface for bulk export of results and citation counts, nor is an API available.
As such, the only way to do such analysis is to scrape the data from Google Scholar. Over the years, Harzing’s Publish or Perish has become one of the most popular tools to do this.
Below is a basic search using the tool and a typical output.
On a successful search, you can export the results of your search from Google Scholar into standard formats like CSV, RIS, BibTeX and more.
You can also see citation metrics, such as the combined h-index and g-index of the papers you have found, and these metrics can be exported as well.
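To make these two metrics concrete, here is a minimal Python sketch of how the h-index and g-index can be computed from a list of citation counts. It follows the standard textbook definitions and caps the g-index at the number of papers; it is not taken from Publish or Perish itself, so the software's figures may differ in edge cases. The function names and the sample counts are made up for illustration.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations.

    Assumption: g is capped at the number of papers in the list.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


# Illustrative citation counts for six papers (not real data)
sample = [10, 8, 5, 4, 3, 0]
print(h_index(sample))  # 4
print(g_index(sample))  # 5
```

The g-index gives more weight to highly cited papers than the h-index, which is why it comes out higher for the same set of papers here.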
Limitations of extracting data from Google Scholar
While Publish or Perish works, it is important to note that Google Scholar generally discourages scraping of results. Harzing’s Publish or Perish has some features in place to mitigate this (such as caching of results and a hard limit of 1,000 results per pull), but there are limits to how much it can extract before Google Scholar notices that a lot of traffic is coming from a single IP address in a short period of time and triggers a CAPTCHA popup to slow you down (often well before the 1,000-result limit imposed by the software). You may be able to continue by completing the first few CAPTCHAs, but eventually access to Google Scholar from your computer's current IP address will be blocked for a period of time (typically a few hours).
This is an inherent problem with Google Scholar, and any similar tool will face the same problem.
Publish or Perish and other sources
While the tool is still mostly associated with Google Scholar, Publish or Perish has begun to support other search indexes beyond Google Scholar & Google Scholar profiles.
Other sources it now works with include:
- Crossref
- Scopus (Subscription needed, works with SMU Subscription)
- Web of Science (Subscription needed, works with SMU Subscription)
- Microsoft Academic (defunct as of Jan 2022 and replaced with OpenAlex - see our coverage)
- Semantic Scholar
Version 8.0 also brings in PubMed as a source.
While I suspect users are mostly using Publish or Perish with Google Scholar, the fact that you can run the same search in one interface across multiple sources to extract the same output can be useful if you are doing systematic reviews that require you to search across various sources with the same or equivalent search terms.
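If you do run the same search across several sources, you will usually need to remove duplicates afterwards. The following is a rough Python sketch of one way to do that, matching records on DOI first and on a normalised title otherwise. The field names `doi` and `title` and the sample records are hypothetical; adjust them to whatever columns your exported files actually contain.

```python
import re


def record_key(record):
    """Build a matching key: the DOI if present, otherwise a normalised title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = (record.get("title") or "").lower()
    # Collapse punctuation and whitespace so minor formatting differences do not matter
    title = re.sub(r"[^a-z0-9]+", " ", title).strip()
    return ("title", title)


def merge_sources(*result_sets):
    """Merge several lists of records, keeping the first copy of each paper."""
    merged = {}
    for records in result_sets:
        for record in records:
            merged.setdefault(record_key(record), record)
    return list(merged.values())


# Hypothetical records standing in for exports from two sources
source_a = [{"title": "A Study of X", "doi": "10.1000/ABC"}]
source_b = [
    {"title": "A study of X", "doi": "10.1000/abc"},
    {"title": "Another Paper", "doi": ""},
]
print(len(merge_sources(source_a, source_b)))  # 2
```

One limitation of this sketch: a paper that has a DOI in one source but none in another will not be matched, so some manual checking is still needed.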
Publish or Perish 8.0 new features
Besides support for PubMed as a source, 8.0 adds a whole lot of new features.
The ones I find most significant include:
Combined, these two features allow you to extract, in batches, all citations made to a set of papers.
Here’s an example of a recent extraction I did.
- Extract all papers from my Google Scholar citation profile
- Use the retrieve citing works function from these papers
- [Optional] If you have too many results (citing works) and Google Scholar times out, you may want to run the second step in batches, divided by publication year.
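To plan such batches ahead of time, something like the Python sketch below could help. It assumes the batching is by the publication year of your own papers, and it groups consecutive years so that the total citation count in each batch stays under a cap (set here to 1,000 to mirror the per-pull limit mentioned earlier). The field names `year` and `cites`, the function name and the sample figures are all hypothetical.

```python
from collections import defaultdict


def plan_year_batches(papers, max_citing=1000):
    """Group consecutive publication years into batches under a citation cap.

    A single year that exceeds the cap on its own still gets its own batch.
    """
    cites_by_year = defaultdict(int)
    for paper in papers:
        cites_by_year[paper["year"]] += paper["cites"]

    batches = []
    current_years, current_total = [], 0
    for year in sorted(cites_by_year):
        cites = cites_by_year[year]
        # Start a new batch if adding this year would exceed the cap
        if current_years and current_total + cites > max_citing:
            batches.append((current_years, current_total))
            current_years, current_total = [], 0
        current_years.append(year)
        current_total += cites
    if current_years:
        batches.append((current_years, current_total))
    return batches


# Illustrative figures only
papers = [
    {"year": 2015, "cites": 400},
    {"year": 2016, "cites": 300},
    {"year": 2017, "cites": 500},
    {"year": 2018, "cites": 1200},
    {"year": 2019, "cites": 100},
]
for years, total in plan_year_batches(papers):
    print(years, total)
```

The totals are only an upper bound on the number of citing works per batch, since one citing work may cite several of your papers. A year that exceeds the cap by itself, like 2018 in the sample, would need to be split further by hand.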
Other interesting 8.0 features include:
- download and export of abstracts
- Google Scholar - DOI extraction
- include/exclude stray citations and patents
While “abstracts” in Google Scholar refers to the search snippets you see in the search results, with other sources you can indeed extract abstracts. This can be useful if you are doing a systematic review or meta-analysis and intend to screen the results you get from this search later.
Conclusion
Harzing's Publish or Perish has been one of the more popular bibliometric analysis tools for the last 15 years. It has grown into a rich and powerful tool, and it is not possible to cover all its features here. Do try it, and if you have questions or need someone to walk you through the features, email me at aarontay@smu.edu.sg.