Harzing’s Publish or Perish tool 8.0


By Aaron Tay, Lead, Data Services

As we noted in “Bibliometric reviews in Business, Management & accounting and the tools used”, the four bibliometric analysis tools most often mentioned in the abstracts or keywords of Scopus articles in the areas of Business, Management and Accounting and Social Sciences were:

  • VOSviewer
  • CiteSpace
  • Harzing's Publish or Perish
  • Bibliometrix/Biblioshiny

Of these four tools, VOSviewer, CiteSpace and Bibliometrix/Biblioshiny are designed to take citation data as input and produce visualisations of networks of papers, authors, journals, organisations and countries, using bibliometric techniques such as co-citation analysis and bibliographic coupling.

Harzing’s Publish or Perish stands out in that it does not provide any visualisation at all. Rather, it is renowned as a tool for retrieving and analysing citations for groups of papers. As such, it is often used alongside the other tools in bibliometric analyses or bibliometric reviews for performance analysis, while tools like VOSviewer and CiteSpace are used to uncover the conceptual, intellectual and social structure of the field.

Publish or Perish and Google Scholar

The Publish or Perish software was first released by Professor Anne-Wil Harzing in 2006. Designed to work with Google Scholar, it was one of the first tools to extract results from Google Scholar searches and generate citation metrics such as the h-index and g-index.

While Google Scholar is known as one of the biggest sources of academic data, it unfortunately does not provide a way to do bulk analysis of the citation data it holds. Unlike established citation indexes like Web of Science or Scopus, Google Scholar does not provide any built-in features in its web interface for bulk export of results and citation counts, nor is an API available.

As such, the only way to do such analysis is to scrape the data from Google Scholar. Over the years, Harzing’s Publish or Perish has become one of the most popular tools to do this.

Below is a basic search using the tool and its typical output.

A sample search in Publish or Perish to extract results and research metrics from Google Scholar

After a successful search, you can export the results from Google Scholar in standard formats like CSV, RIS, BibTeX and more.

You can also see citation metrics for the set of papers you have found, such as their combined h-index and g-index, and these metrics can be exported as well.
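Once exported, the results can be processed outside the tool. The following is a minimal Python sketch of reading such a CSV export and ranking the papers by citation count. The column names used here ("Cites", "Authors", "Title", "Year") and the sample rows are assumptions for illustration, so check them against the header row of your own export.

```python
import csv
import io

# Hypothetical sample standing in for an exported file. The column names
# ("Cites", "Authors", "Title", "Year") are assumptions about the export layout.
SAMPLE_CSV = """Cites,Authors,Title,Year
120,"A Author","First paper",2015
45,"B Author, C Author","Second paper",2018
0,"D Author","Third paper",2022
"""


def load_results(handle):
    """Read exported rows, converting citation counts and years to integers."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "title": row["Title"],
            "authors": row["Authors"],
            # Some records may lack a year, so fall back to None.
            "year": int(row["Year"]) if row["Year"] else None,
            "cites": int(row["Cites"] or 0),
        })
    return rows


results = load_results(io.StringIO(SAMPLE_CSV))

# Rank the papers from most cited to least cited.
most_cited = sorted(results, key=lambda r: r["cites"], reverse=True)
for r in most_cited:
    print(r["year"], r["cites"], r["title"])
```

To read a real export instead of the sample, you would pass an open file to `load_results`, for example `open("results.csv", newline="", encoding="utf-8")`, where the file name is a placeholder.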

Citation metrics extracted from Publish or Perish
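To make the two headline metrics concrete, here is a short Python sketch of the standard definitions: the h-index is the largest h such that h papers have at least h citations each, and the g-index is the largest g such that the g most cited papers together have at least g² citations. This is an illustration of the definitions only, not the tool's own code, and it assumes the g-index is capped at the number of papers in the set.

```python
def h_index(cites):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(cites, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(cites):
    """Largest g such that the top g papers have at least g*g citations in total.

    Assumption: g is capped at the number of papers in the set.
    """
    ranked = sorted(cites, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for six papers.
citations = [10, 8, 5, 4, 3, 0]
print(h_index(citations))  # 4
print(g_index(citations))  # 5
```

In this example four papers have at least four citations each, so h is 4, while the top five papers have 30 citations in total (at least 25), so g is 5.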

Limitations of extraction of data from Google Scholar

While Publish or Perish works, it is important to note that Google Scholar generally discourages scraping of results. Harzing’s Publish or Perish has some features in place to mitigate this, such as caching of results and a hard limit of 1,000 results per pull. Even so, there is a limit to how much it can extract before Google Scholar notices a lot of traffic coming from a single IP address in a short period of time and triggers a CAPTCHA pop-up to slow you down (often well before the 1,000-result limit imposed by the software). You may be able to continue by completing the first few CAPTCHAs, but eventually access to Google Scholar from your computer’s current IP address will be blocked for a period of time (typically a few hours).

Google Scholar slows down web scraping from the Publish or Perish tool with a CAPTCHA

This problem is inherent to Google Scholar, and any similar tool will face the same problem.

Publish or Perish and other sources

While the tool is still mostly associated with Google Scholar, Publish or Perish has begun to support other search indexes beyond Google Scholar & Google Scholar profiles.

Other sources it now works with include:

and V8.0 brings in PubMed as a source.


While I suspect users are mostly using Publish or Perish with Google Scholar, the fact that you can run the same search in one interface across multiple sources to extract the same output can be useful if you are doing systematic reviews that require you to search across various sources with the same or equivalent search terms.
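When the same search is run across several sources, the exported result sets will overlap. As a rough sketch of how they might be combined before screening, the Python below merges records and removes duplicates, matching on DOI where one is present and on a normalised title otherwise. The field names (`source`, `title`, `doi`) and the sample records are hypothetical, and the matching rule is a simplifying assumption rather than anything the tool itself does.

```python
import re


def record_key(record):
    """Build a matching key: the DOI if present, else a normalised title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = (record.get("title") or "").lower()
    # Collapse punctuation and whitespace so small formatting differences match.
    return "title:" + re.sub(r"[^a-z0-9]+", " ", title).strip()


def merge_sources(*result_sets):
    """Merge result sets, keeping one record per key and noting its sources."""
    merged = {}
    for records in result_sets:
        for record in records:
            key = record_key(record)
            if key not in merged:
                merged[key] = dict(record, sources={record.get("source")})
            else:
                merged[key]["sources"].add(record.get("source"))
    return list(merged.values())


# Hypothetical records for the same paper retrieved from three sources.
google_scholar = [{"source": "Google Scholar", "title": "Open Access: A Review!", "doi": ""}]
scopus = [{"source": "Scopus", "title": "Open access: a review", "doi": "10.1000/XYZ123"}]
pubmed = [{"source": "PubMed", "title": "Open access - a review", "doi": "10.1000/xyz123"}]

merged = merge_sources(google_scholar, scopus, pubmed)
for record in merged:
    print(sorted(record["sources"]), record["title"])
```

Note the limitation this example exposes: the Scopus and PubMed records merge on their shared DOI, but the record without a DOI stays separate even though its title is the same. Matching on title for every record would catch it, at the cost of occasionally merging different papers with identical titles.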

Publish or Perish 8.0 new features

Besides support for PubMed as a source, version 8.0 adds many new features.

The ones I find most significant include:

Combined, these two features allow you to extract, in batches, all citations made to a set of papers.

Here’s an example of a recent extraction I did.

  1. Extract all papers from my Google Scholar citation profile
  2. Use the retrieve citing works function on these papers
  3. [Optional] If you have too many results (citing works) and Google Scholar times out, you may want to split step 2 into batches by publication year.
Retrieving citing works from a set of papers authored by me.

209 papers from Google Scholar that cite my works listed on my Google Scholar profile.
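If you do need to split the retrieval by publication year, as in the optional step 3, it helps to plan the year ranges up front. The small Python helper below is a hypothetical planning aid only: it does not control Publish or Perish, and the ranges it prints would still be entered into the tool by hand.

```python
def year_batches(first_year, last_year, size):
    """Split an inclusive year range into consecutive batches of `size` years."""
    if size < 1:
        raise ValueError("size must be at least 1")
    if first_year > last_year:
        raise ValueError("first_year must not be after last_year")
    batches = []
    start = first_year
    while start <= last_year:
        # The final batch may be shorter than `size`.
        end = min(start + size - 1, last_year)
        batches.append((start, end))
        start = end + 1
    return batches


# Example: papers published from 2010 to 2022, in five-year batches.
for start, end in year_batches(2010, 2022, 5):
    print(f"{start}-{end}")
```

For this example the batches are 2010-2014, 2015-2019 and 2020-2022. Smaller batches mean fewer results per pull but more separate retrievals.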

Other interesting 8.0 features include:

While “abstracts” in Google Scholar refers to the search snippets you see in the search results, with other sources you can extract the actual abstracts. This can be useful if you are doing a systematic review or meta-analysis and intend to screen the results of your search later.

Conclusion

Harzing's Publish or Perish has been one of the more popular bibliometric analysis tools for the last 15 years and has grown into a rich and powerful tool, so it is not possible to cover all its features here. Do try it, and if you have questions or need someone to walk you through the features, email me at aarontay@smu.edu.sg.