By Janna Xu, Senior Research Data Curator, Research & Data Services
Janna Xu Cong joined SMU Libraries as a Research Data Curator in 2026, bringing her experience in scholarly communication, library systems, and workflow automation. Her experience spans institutional repository management, bibliometrics support, digitization of special collections, and AI-assisted process improvement — work that earned her the LAS Passion Award (2024), the NIE Innovation Award (2024 & 2025) and the SUTD Best Cross-Functional Team Award (2021). She holds an MSc in Information Studies from NTU and a Graduate Diploma in System Analysis from NUS, and is a certified Project Management Professional (PMP)® and Alma & Primo Certified Administrator.
As a member of SMU's Data Stacks Initiative, Janna oversees the enTRUST platform and CLOAK anonymisation toolkit, enabling researchers to work securely with high-value and sensitive datasets.
In this Research Radar article, we catch up with her to learn more about her and her work.
Q: Hello Janna, thank you for taking the time to chat with us. Could you tell us about your role as a Research Data Curator?
Thank you for having me. As a Research Data Curator at SMU Libraries, my work is centred around the Data Stacks Initiative — a university-wide effort to build a secure, scalable data infrastructure that supports SMU's strategic research areas: ageing societies, sustainable cities, and resilient workforces.
The Data Stacks Initiative is designed to help researchers unlock the potential of high-value and sensitive datasets. Through this initiative, we enable researchers to securely access, link, and analyse data from government agencies and industry partners using the enTRUST platform, which is a government-secured cloud environment. This means researchers can work with rich, meshed datasets that were previously siloed, opening new possibilities for high-impact research.
In my role, I focus on four key areas that support researchers throughout their journey:
First, the data catalogue. I coordinate with internal stakeholders to establish and maintain a well-curated data catalogue aligned with SMU's strategic research areas. This catalogue helps researchers discover what datasets are available, understand their scope, and request access.
Second, data anonymisation with CLOAK. Many researchers work with data containing personal identifiers and other sensitive information. I provide training on the CLOAK anonymisation toolkit, helping researchers de-identify data responsibly while preserving its analytical value, so they can comply with privacy requirements without compromising their research.
Third, researcher onboarding. I assist with the onboarding process for researchers using enTRUST tools, such as Amazon WorkSpaces and JupyterLab — ensuring a seamless experience from the moment they gain access to the point they start their analysis.
Fourth, training and guides. I design and deliver training sessions and produce user guides and FAQs that empower researchers to confidently navigate the data catalogue and secure analysis environments. My goal is to build sustainable capacity so researchers can focus on their work, not on troubleshooting technical issues.
Q: What are some common challenges you see researchers face?
I'd like to highlight three challenges I often see. But don't worry, we're here to support you at every step of your research data management journey.
Discovering and Accessing High-Value Datasets
Many researchers know that rich datasets exist within government agencies or universities, but they aren't sure what's available, how to request access, or whether the data can be linked with their own. The SMU enTRUST Data Catalogue, which I help curate, addresses this by systematically documenting datasets aligned with our strategic research areas. Researchers can browse the catalogue, understand what datasets are available, and initiate access requests through a clear workflow.
Protecting Privacy While Preserving Analytical Value
Working with sensitive data requires strict adherence to privacy laws and guidelines such as Singapore's Personal Data Protection Act (PDPA). Researchers often struggle with how to de-identify data effectively without losing the nuance needed for meaningful analysis. This is where the CLOAK anonymisation toolkit comes in. CLOAK is built specifically to comply with Singapore government data privacy policies and guidelines. It ensures anonymisation, data protection, and policy compliance, including adherence to IM8 guidelines when handling sensitive, confidential data. We also provide training and consultations to help researchers navigate anonymisation concepts, identify direct and indirect identifiers, and balance privacy protection with data utility.
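To make the idea of indirect identifiers more concrete, here is a minimal sketch of a k-anonymity style check, one common way to reason about re-identification risk. It is not CLOAK's implementation. The records, column names, and helper functions are all hypothetical and for illustration only.

```python
from collections import Counter

# Hypothetical records with direct identifiers (name, NRIC) already removed.
# The columns below are illustrative and not taken from any real dataset.
records = [
    {"age_band": "30-39", "postal_sector": "18", "course": "Data Analytics"},
    {"age_band": "30-39", "postal_sector": "18", "course": "Finance"},
    {"age_band": "40-49", "postal_sector": "18", "course": "Data Analytics"},
    {"age_band": "40-49", "postal_sector": "18", "course": "Leadership"},
    {"age_band": "50-59", "postal_sector": "23", "course": "Finance"},
]

# Indirect (quasi-) identifiers: harmless alone, but identifying in combination.
QUASI_IDENTIFIERS = ["age_band", "postal_sector"]


def group_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def rows_below_k(rows, quasi_identifiers, k):
    """Return rows whose quasi-identifier combination appears fewer than k times."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [
        row for row in rows
        if sizes[tuple(row[q] for q in quasi_identifiers)] < k
    ]


sizes = group_sizes(records, QUASI_IDENTIFIERS)
print("Smallest group size:", min(sizes.values()))
print("Rows at risk for k=2:", rows_below_k(records, QUASI_IDENTIFIERS, 2))
```

In this toy data, one learner is the only person in their age band and postal sector, so that row could be singled out. Widening the age bands or dropping the postal sector would enlarge the groups at the cost of detail, which is the privacy and utility trade-off described above.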
Navigating Secure Analysis Environments
Tools like Amazon WorkSpaces and JupyterLab are powerful, but they come with a learning curve. To bridge this gap, we produce user guides, FAQs, and hands-on workshops that walk researchers through everything from first login to running their analyses. The goal is to make the technical infrastructure feel like a supportive tool rather than a barrier, so researchers can focus on their research questions. On top of that, SMU Libraries also offer dedicated support for computational analysis — whether it's data cleaning, advising on reproducible workflows, or connecting researchers with the right expertise. We're here to help at every step.
Q: What's one recent success story you can share?
We're still in the early stages of the Data Stacks Initiative, but I'm excited to share that we've just started onboarding our very first dataset, which comes from SMU Academy. This dataset captures rich information on adult learners and holds great potential for our faculty fellows in their research. The plan is to share this dataset through enTRUST, with the possibility of linking it to other government administrative data later on.
What's particularly encouraging is that this first use case is already showing how well enTRUST can support research beyond the scope of its predecessor, TRUST, which handles only medical data. One of the things I've been highlighting to researchers is that enTRUST can handle a wide range of research data, which opens up possibilities for many more disciplines across SMU. Researchers can request access to diverse datasets, and the platform gives them a clear, governed path to do so.
Currently, we're in the process of:
- Onboarding the SMUA data into the enTRUST catalogue — curating the metadata so the dataset is discoverable and well-documented for future researchers.
- Providing data curation support to ensure the dataset is structured, documented, and ready for secure sharing within enTRUST.
We'll also be looking into anonymisation advice using CLOAK as the project progresses, especially if the data needs to be linked with external administrative datasets.
This first use case is helping us build a template that will benefit many more researchers in the future, especially those working with sensitive data across SMU's strategic research areas.
Q: Is there anything exciting that you are working on now that you can share?
Yes, I'm currently exploring how to leverage the AI capabilities within CLOAK itself to make the data anonymisation process smoother for researchers.
CLOAK is actually quite sophisticated when it comes to AI-powered anonymisation. One feature I'm particularly excited about is its free-text anonymisation, which uses Large Language Models (LLMs) to detect and redact sensitive information from unstructured text: things like names, NRICs, addresses, or even custom entities that researchers define. What's impressive is that researchers can add their own custom entities using few-shot prompting, meaning they only need to provide 3–5 examples, and the LLM picks up the pattern from those examples without requiring extensive training data. The model runs securely within the Government on Commercial Cloud (GCC), so no data leaves the trusted environment.
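As a rough illustration of what few-shot prompting looks like in general, the sketch below assembles a prompt from a handful of worked examples. It does not reflect CLOAK's actual interface or prompt format. The function name, the `LEARNER_ID` entity, and the example sentences are all hypothetical, and no model is called.

```python
def build_few_shot_prompt(entity_label, examples, text):
    """Assemble a few-shot redaction prompt from (original, redacted) example pairs."""
    if not 3 <= len(examples) <= 5:
        raise ValueError("provide between 3 and 5 examples")
    # One instruction line, then each worked example, then the new text to redact.
    parts = [f"Replace every {entity_label} in the text with [{entity_label}]."]
    for original, redacted in examples:
        parts.append(f"Text: {original}\nRedacted: {redacted}")
    parts.append(f"Text: {text}\nRedacted:")
    return "\n\n".join(parts)


# Hypothetical custom entity: an internal learner ID such as "A1023".
examples = [
    ("Learner A1023 enrolled in March.", "Learner [LEARNER_ID] enrolled in March."),
    ("Please refund B2210 for the course.", "Please refund [LEARNER_ID] for the course."),
    ("C3391 completed the module.", "[LEARNER_ID] completed the module."),
]

prompt = build_few_shot_prompt("LEARNER_ID", examples, "D4402 asked about deferment.")
print(prompt)
```

The prompt ends with an open "Redacted:" line, so a model would be expected to continue with the redacted version of the new sentence, guided only by the three examples.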
Beyond free-text anonymisation, CLOAK also offers synthetic data generation using advanced deep learning and statistical models — it can create synthetic datasets that preserve the statistical properties of real data while protecting individual privacy. This is incredibly valuable for researchers who want to share data or develop analysis pipelines without exposing sensitive information.
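The idea of preserving statistical properties can be shown with a deliberately simple stand-in: fit a two-variable normal distribution to real values, then sample new rows from it. This is only a sketch of the general principle, not the deep learning or statistical models CLOAK uses, and a simple fit like this does not by itself guarantee privacy. The data values, column meanings, and function name are hypothetical.

```python
import math
import random


def sample_synthetic_pairs(pairs, n, seed=0):
    """Sample n synthetic (x, y) pairs from a bivariate normal fitted to the input."""
    m = len(pairs)
    mean_x = sum(x for x, _ in pairs) / m
    mean_y = sum(y for _, y in pairs) / m
    # Sample variances and covariance of the real data.
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / (m - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / (m - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (m - 1)
    # Cholesky factor of the 2x2 covariance matrix, so samples keep the correlation.
    l11 = math.sqrt(var_x)
    l21 = cov_xy / l11
    l22 = math.sqrt(max(0.0, var_y - l21 ** 2))
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        synthetic.append((mean_x + l11 * z1, mean_y + l21 * z1 + l22 * z2))
    return synthetic


# Hypothetical real data: (age, course hours) for six learners.
real = [(25, 40), (31, 35), (38, 52), (44, 48), (52, 60), (59, 55)]
synthetic = sample_synthetic_pairs(real, 5000)

real_mean_age = sum(x for x, _ in real) / len(real)
synthetic_mean_age = sum(x for x, _ in synthetic) / len(synthetic)
print("Real mean age:", real_mean_age)
print("Synthetic mean age:", round(synthetic_mean_age, 2))
```

None of the synthetic rows corresponds to a real learner, yet the averages and the relationship between age and course hours stay close to the originals, which is what makes such data useful for developing analysis pipelines.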
Q: Finally, how can readers get in touch with you if they need support or want to learn more?
Please feel free to reach out to me and ask any questions — if I cannot answer your question, I will direct you to the relevant team member! You can:
- Email me at jannaxu@smu.edu.sg
- Send me a Teams message
- Arrange a meeting to chat about anything related to your research and data.

Looking forward to meeting and chatting with you all!