Saptarshi Sengupta

PhD Student in Informatics

I'm a PhD student at The Pennsylvania State University working on domain-specific applications of LLMs, under the guidance of my advisor, Dr. Suhang Wang. Outside of work, I am an advocate for animal rights 🦙, enjoy reading 📖 (currently on an Alan Turing biography), cooking 🍲 and playing the guitar 🎸.

I'm actively seeking full-time positions as an applied research scientist/engineer. If you believe I might be a good fit, do reach out & I'd be happy to talk 😀

Research Interests

My research interests span various aspects of NLP such as QA, RAG, IR/Search, LLM agents, model interpretability and low-resource languages. Overall, I'm interested in applying language technologies to challenging edge cases that have either too much or too little data. Through my work, I aim to develop methods for tackling real-world problems that are easy to use and cost-effective.

Publications

Pre-Prints

BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions

Saptarshi Sengupta, Shuhua Yang, Paul Kwong Yu, Fali Wang, Suhang Wang

ToolDreamer: Instilling LLM Reasoning Into Tool Retrievers

Saptarshi Sengupta, Zhengyu Zhou, Jun Araki, Xingbo Wang, Bingqing Wang, Suhang Wang, Zhe Feng

MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification

Saptarshi Sengupta, Harsh Vashistha, Kristal Curtis, Akshay Mallipeddi, Abhinav Mathur, Joseph Ross, Liang Gou

Published Work

TOP-Training: Target-Oriented Pretraining for Medical Extractive Question Answering

Saptarshi Sengupta, Connor Heaton, Shreya Ghosh, Wenpeng Yin, Preslav Nakov, Suhang Wang

International Conference on Computational Linguistics (COLING), 2025

Exploring Language Model Generalization in Low-Resource Extractive QA

Saptarshi Sengupta, Wenpeng Yin, Preslav Nakov, Shreya Ghosh, Suhang Wang

International Conference on Computational Linguistics (COLING), 2025

Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings

Saptarshi Sengupta, Connor Heaton, Suhan Cui, Soumalya Sarkar, Prasenjit Mitra

IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024

Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla—A Low Resourced Language

Rajat Pandit, Saptarshi Sengupta, Sudip Kumar Naskar, Niladri Sekhar Dash, Mohini Mohan Sardar

Informatics journal, 2019

Word sense induction in Bengali using parallel corpora and distributional semantics

Saptarshi Sengupta, Rajat Pandit, Parag Mitra, Sudip Kumar Naskar, Mohini Mohan Sardar

Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 2019

Writing

Feed-Forward Neural Network From Scratch

I've always wanted to implement a simple FFNN from scratch just to see how the math works and really understand things at a deeper level. This is my attempt at creating something from an educational perspective, breaking the math down into small pieces to make it more accessible. Note: All of the code works, but some final illustrations are still to be added.

Experience

NLP and Large Language Model Intern

Robert Bosch LLC | May 2025 - August 2025

Researched tool retrieval for LLM agents working with a large number of tools. Proposed a new framework, ToolDreamer, for this task.

Machine Learning Applied Scientist Intern

Splunk | May 2024 - November 2024

Worked on synthetic data generation and LLM-agent trajectory verification for an internal AI assistant. The systems I developed were implemented using the AutoGen library.