By Aaron Tay, Head, Data Services
As of August 2024, following a library reorganization, SMU Libraries' Research & Data Services department has expanded to a team of nine librarians with diverse expertise in supporting research. Jointly headed by Yeo Pin Pin (Head, Research Services) and Aaron Tay (Head, Data Services), the department is now charged with supporting SMU researchers' needs throughout the entire research work cycle.
Engagement activities
We engage with researchers, including postgraduates, faculty and research staff, through activities such as orientations and initiatives like the SMU Researcher Club (with activities jointly curated with SMU researchers). We also produce the bimonthly Research Radar newsletter series (this article!).
Research support services
Research support is now provided through four main clusters of services:
- Literature and Data Discovery
- Analysis (Quantitative and Qualitative)
- Writing, Publication & Digital preservation
- Research visibility & assessment
In our last Research Radar piece, we discussed the first cluster of research support services, "Literature and Data Discovery". In this piece, we turn to our support for Analysis (Quantitative and Qualitative).
Analysis (Quantitative and Qualitative) research support - major services
- Auto-Transcription of speech via Whisper
- Thematic coding support with ATLAS.ti / NVIVO
- Survey design and analysis with Qualtrics
- Advice on anonymizing/de-identifying sensitive data
- Basic support for clustering, topic modelling and classification tasks using popular packages like NLTK, spaCy, BERTopic etc.
Depending on the degree of support provided, acknowledgement or co-authorship may be requested.
Some common tools we provide and/or support
Management of Investment & Data Studio
The Research & Data Services department also manages the Investment & Data Studio at Li Ka Shing Library Level 3. The studio provides access to research software (see list) and finance-related terminals like Bloomberg, and is staffed to provide support.
Some past user stories
Scenario 1: A faculty member needed to group thousands of journal articles on a certain topic into distinct clusters.
Support provided: We helped extract the articles. Using the Python library BERTopic, we created clusters of topics for initial inspection and analysis (topic modeling). BERTopic leverages transformer models to create embeddings, performs dimensionality reduction, clusters the reduced embeddings, and then applies c-TF-IDF to generate keywords that make each cluster easy to interpret.
Because the library is very modular, we experimented with different clustering models (HDBSCAN, k-means), adjusted hyperparameters to control the number of clusters, and tried different representations of the clusters identified.
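To give a feel for the keyword step, here is a minimal, standard-library sketch of the c-TF-IDF idea: all documents in a cluster are pooled into one "class document", and each term is weighted by its frequency within the cluster times log(1 + A / f), where A is the average number of tokens per cluster and f is the term's frequency across all clusters. This is a simplified illustration and not BERTopic's own implementation; the function name `c_tf_idf` and the toy data are made up for this example.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_cluster, top_n=3):
    """Return the top_n highest-scoring terms for each cluster.

    docs_by_cluster maps a cluster label to a list of tokenised documents.
    """
    # Pool every document in a cluster into a single bag of term counts.
    cluster_counts = {
        label: Counter(token for doc in docs for token in doc)
        for label, docs in docs_by_cluster.items()
    }

    # Frequency of each term across all clusters combined.
    total_counts = Counter()
    for counts in cluster_counts.values():
        total_counts.update(counts)

    # Average number of tokens per cluster (the "A" in the formula).
    avg_tokens = sum(total_counts.values()) / len(cluster_counts)

    keywords = {}
    for label, counts in cluster_counts.items():
        n_tokens = sum(counts.values())
        scores = {
            term: (tf / n_tokens) * math.log(1 + avg_tokens / total_counts[term])
            for term, tf in counts.items()
        }
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[label] = ranked[:top_n]
    return keywords


# Toy data (hypothetical): two clusters of already-tokenised article titles.
toy_clusters = {
    "nlp": [["topic", "model", "text"], ["text", "embedding", "model"]],
    "finance": [["stock", "price", "model"], ["stock", "return", "risk"]],
}
keywords = c_tf_idf(toy_clusters)
```

In this toy data, "model" appears in both clusters, so its weight is pushed down and cluster-specific terms such as "text" and "stock" rise to the top. That is the property that makes the generated keywords useful as cluster labels.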
Scenario 2: An admin department running a competition needed help auto-transcribing hours of recorded audio.
Support provided: Combining multi-speaker diarization with Whisper transcription, we helped to create a sample auto-transcription of the audio.
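One common way to combine the two outputs is to label each transcript segment with the speaker whose diarization turn overlaps it the most. The sketch below illustrates only that merging step and is not necessarily the pipeline we used. It assumes transcript segments carry `start`, `end` and `text` fields and that the diarization tool yields turns with `start`, `end` and `speaker`. The function name `assign_speakers` and the sample data are hypothetical.

```python
def assign_speakers(transcript_segments, speaker_turns):
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labelled = []
    for seg in transcript_segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in speaker_turns:
            # Length of the time interval shared by the segment and the turn
            # (negative or zero when they do not overlap).
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best_speaker, best_overlap = turn["speaker"], overlap
        labelled.append({**seg, "speaker": best_speaker})
    return labelled


# Made-up sample data; times are in seconds.
transcript_segments = [
    {"start": 0.0, "end": 4.0, "text": "Welcome to the final round."},
    {"start": 4.0, "end": 9.5, "text": "Thank you, our team built a budgeting app."},
    {"start": 9.5, "end": 12.0, "text": "How did you test it?"},
]
speaker_turns = [
    {"start": 0.0, "end": 4.2, "speaker": "SPEAKER_00"},
    {"start": 4.2, "end": 9.3, "speaker": "SPEAKER_01"},
    {"start": 9.3, "end": 12.0, "speaker": "SPEAKER_00"},
]

labelled = assign_speakers(transcript_segments, speaker_turns)
for seg in labelled:
    print(f'[{seg["speaker"]}] {seg["text"]}')
```

Segments that overlap no speaker turn are labelled "UNKNOWN" rather than guessed, which makes them easy to spot during manual review of the transcript.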
Scenario 3: A research staff member needed advice on how to remove sensitive data and Personally Identifiable Information (PII) before depositing a dataset (competition data) in a data repository.
Support provided: We met with the research staff member and advised on how to make the dataset less sensitive and less identifying. For example, we suggested that certain fields be excluded entirely from the deposited dataset, while other fields be grouped into broader categories to lower the risk of identification.
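The two suggestions, dropping fields and grouping values, can be sketched in a few lines. The field names (`name`, `email`, `age`, `score`), the ten-year band width and the sample records below are all hypothetical, chosen only to illustrate the idea. They are not taken from the actual dataset.

```python
# Hypothetical direct identifiers to exclude from the deposited dataset.
DROP_FIELDS = {"name", "email"}


def age_band(age, width=10):
    """Generalise an exact age into a band such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record):
    """Drop direct identifiers and replace the exact age with an age band."""
    cleaned = {key: value for key, value in record.items() if key not in DROP_FIELDS}
    if "age" in cleaned:
        cleaned["age_band"] = age_band(cleaned.pop("age"))
    return cleaned


# Made-up sample records.
records = [
    {"name": "Participant A", "email": "a@example.org", "age": 23, "score": 88},
    {"name": "Participant B", "email": "b@example.org", "age": 37, "score": 74},
]
deidentified = [deidentify(record) for record in records]
```

Grouping lowers the risk of re-identification but does not remove it, so the remaining fields should still be reviewed in combination before deposit.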
These are, of course, just three use cases. We observe that the barrier to entry for using sophisticated deep learning models for NLP tasks has fallen greatly in recent months, and it is now relatively easy to identify and use near state-of-the-art models through resources and platforms like Hugging Face.
Conclusion
We are just beginning our journey to provide analysis support services and will continue to adapt and expand our offerings based on demand. Feel free to ask us for a consultation!
In the next Research Radar piece, we will focus on research support services around Writing, Publication & Digital preservation, which is a mix of new and existing services.