Supporting all your research needs - Analysis (Quantitative and Qualitative) support

25 Nov 2024

As of Aug 2024, following a library reorganization, SMU Libraries' Research & Data Services department, jointly headed by Yeo Pin Pin (Head, Research Services) and Aaron Tay (Head, Data Services), has been expanded to a team of 9 librarians with diverse expertise in supporting research. It is now charged with supporting SMU researchers' needs throughout the whole research work cycle.

Engagement activities

We engage with researchers, including postgraduates, faculty and research staff, through activities such as orientations and initiatives like the SMU Researcher Club (with activities jointly curated with SMU researchers). We also produce the bimonthly Research Radar newsletter series (this article!).

Research support services

Research support is now provided through these four main clusters of services:

Literature and Data Discovery
Analysis (Quantitative and Qualitative)
Writing, Publication & Digital preservation
Research visibility & assessment

In our last Research Radar piece, we discussed the first cluster of research support services, "Literature and Data Discovery". In this piece, we discuss our support for Analysis (Quantitative and Qualitative).

Analysis (Quantitative and Qualitative) research support - major services

Auto-Transcription of speech via Whisper
Thematic coding support with ATLAS.ti / NVIVO
Survey design and analysis with Qualtrics
Advice on anonymizing/de-identifying sensitive data
Basic support for clustering, topic modelling and classification tasks using popular packages like NLTK, spaCy, BERTopic etc.

Depending on the degree of support provided, acknowledgement or co-authorship may be requested.

Some common tools we provide and/or support

Management of Investment & Data Studio

The Research & Data Services department also manages the Investment & Data Studio at Li Ka Shing Library Level 3. The studio provides access to research software (see list) and finance-related terminals like Bloomberg, and is staffed to provide support.

Some past user stories

Scenario 1: A faculty member needed to group thousands of journal articles on a certain topic into distinct clusters.

Support provided: We helped extract the articles. Using the Python library BERTopic, we created clusters of topics for initial inspection and analysis (topic modeling). BERTopic uses transformer models to create document embeddings, reduces their dimensionality, clusters them, and then applies c-TF-IDF to generate keywords that make each cluster easy to interpret.
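To give a feel for the c-TF-IDF step, here is a minimal pure-Python sketch. It assumes the commonly documented form of the score, (term frequency within a cluster) x log(1 + A / term frequency across all clusters), where A is the average number of words per cluster. The function name, the whitespace tokenizer and the toy documents are ours for illustration only; BERTopic's own implementation works on a bag-of-words matrix and differs in detail.

```python
import math
from collections import Counter


def c_tf_idf_keywords(cluster_docs: dict[str, list[str]], top_n: int = 3) -> dict[str, list[str]]:
    """Return the top_n highest-scoring terms for each cluster (illustrative only)."""
    # Treat each cluster as one long document and count its terms.
    counts = {
        name: Counter(" ".join(docs).lower().split())
        for name, docs in cluster_docs.items()
    }
    # Frequency of each term across all clusters.
    total = Counter()
    for counter in counts.values():
        total.update(counter)
    # A: average number of words per cluster.
    avg_words = sum(total.values()) / len(counts)

    keywords = {}
    for name, counter in counts.items():
        n_words = sum(counter.values())
        scores = {
            term: (freq / n_words) * math.log(1 + avg_words / total[term])
            for term, freq in counter.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[name] = ranked[:top_n]
    return keywords


# Made-up toy clusters; "research" appears in both, so it is down-weighted.
clusters = {
    "open_access": ["open access research", "open access journals"],
    "data_services": ["research data services", "data services team"],
}
keywords = c_tf_idf_keywords(clusters)
```

Terms that are frequent inside one cluster but rare elsewhere rise to the top, which is what makes the resulting keyword lists readable as cluster labels.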

Because the library is very modular, we experimented with different clustering models (HDBSCAN, k-means) and different hyperparameters to control the number of clusters, as well as with different representations of the clusters identified.
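The modularity described above can be sketched roughly as follows. This is not the exact code we ran: the helper names (make_cluster_model, fit_topics) and the parameter values are hypothetical, and scikit-learn's KMeans is used here as the alternative to HDBSCAN on the assumption that it is a representative swap. BERTopic accepts a replacement clustering model through its hdbscan_model argument. The third-party imports sit inside the functions so that the heavy dependencies (bertopic, hdbscan, scikit-learn) load only when needed.

```python
def make_cluster_model(kind: str = "hdbscan", n_clusters: int = 20, min_cluster_size: int = 15):
    """Return a clustering model to plug into BERTopic (hypothetical helper)."""
    if kind == "hdbscan":
        # Density-based: finds the number of clusters itself and allows outliers.
        from hdbscan import HDBSCAN
        return HDBSCAN(
            min_cluster_size=min_cluster_size,  # larger values give fewer, bigger clusters
            metric="euclidean",
            cluster_selection_method="eom",
            prediction_data=True,
        )
    if kind == "kmeans":
        # Centroid-based: the number of clusters is fixed up front, no outlier group.
        from sklearn.cluster import KMeans
        return KMeans(n_clusters=n_clusters, random_state=42)
    raise ValueError(f"unknown clustering model: {kind!r}")


def fit_topics(docs: list[str], kind: str = "hdbscan", **params):
    """Fit BERTopic on docs using the chosen clustering model."""
    from bertopic import BERTopic
    topic_model = BERTopic(hdbscan_model=make_cluster_model(kind, **params))
    topics, _ = topic_model.fit_transform(docs)
    return topic_model, topics
```

A call such as fit_topics(abstracts, kind="kmeans", n_clusters=30), where abstracts is your own list of article texts, would force exactly 30 clusters, and topic_model.get_topic_info() then lists the clusters with their keywords for inspection.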

Scenario 2: An admin department running a competition needed help auto-transcribing hours of recorded audio.

Support provided: Using multi-speaker diarization combined with transcription using Whisper, we helped create a sample auto-transcription of the audio.
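One common way to combine the two outputs is to label each transcribed segment with the speaker whose turn overlaps it most. The sketch below shows only that merging step and is not necessarily how our sample was produced. It assumes transcript segments shaped like the "segments" entries returned by Whisper's transcribe (dicts with start, end and text) and speaker turns given as (start, end, speaker) tuples from whichever diarization tool is used. The function names and the sample data are made up for illustration.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the overlap between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list[dict], turns: list[tuple]) -> list[dict]:
    """Attach to each segment the speaker whose turn overlaps it the most."""
    labelled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for start, end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], start, end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labelled.append({**seg, "speaker": best_speaker})
    return labelled


# Made-up sample data standing in for real transcription and diarization output.
segments = [
    {"start": 0.0, "end": 4.0, "text": "Welcome to the final round."},
    {"start": 4.0, "end": 9.0, "text": "Thank you, happy to be here."},
]
turns = [(0.0, 4.2, "SPEAKER_00"), (4.2, 9.0, "SPEAKER_01")]

labelled = assign_speakers(segments, turns)
for seg in labelled:
    print(f'{seg["speaker"]}: {seg["text"]}')
```

Segments that overlap no speaker turn are marked UNKNOWN so they can be reviewed by hand.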

Scenario 3: A research staff member needed advice on how to remove sensitive data and Personally Identifiable Information (PII) before depositing a dataset (competition data) in a data repository.

Support provided: We met with the research staff member and advised on how to make the dataset less sensitive and less identifying. For example, we suggested that certain fields be excluded entirely from the deposited dataset, while other fields be grouped into broader categories to lower the risk of identification.
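The two suggestions, excluding fields and grouping fields, can be illustrated with a short sketch. The field names (name, email, age, score), the ten-year age bands and the sample records are all hypothetical and were not taken from the actual dataset. Which fields to drop or group always depends on the data at hand.

```python
def deidentify(records: list[dict], drop_fields: set, age_band: int = 10) -> list[dict]:
    """Drop direct identifiers and replace exact age with an age range."""
    cleaned = []
    for row in records:
        # Exclude fields that identify a person directly.
        out = {key: value for key, value in row.items() if key not in drop_fields}
        # Group exact ages into bands, e.g. 23 becomes "20-29".
        if "age" in out:
            low = (out.pop("age") // age_band) * age_band
            out["age_group"] = f"{low}-{low + age_band - 1}"
        cleaned.append(out)
    return cleaned


# Made-up records for illustration only.
records = [
    {"name": "Participant A", "email": "a@example.com", "age": 23, "score": 81},
    {"name": "Participant B", "email": "b@example.com", "age": 37, "score": 74},
]
deposit = deidentify(records, drop_fields={"name", "email"})
```

Here deposit keeps the scores and the age ranges but no names, emails or exact ages.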

These are, of course, just three use cases. We observe that the barrier to entry for using sophisticated deep learning models for NLP tasks has fallen greatly in recent months, and it is now relatively easy to identify and use near state-of-the-art models through resources and platforms like Hugging Face.

Conclusion

We are just beginning our journey to provide analysis support services and will continue to adapt and expand our offerings based on demand. Feel free to ask us for a consultation!

In the next Research Radar piece, we will focus on research support services around Writing, Publication & Digital preservation, which is a mix of new and existing services.