
Saptarshi Sengupta

PhD Student in Informatics

I'm a PhD student at The Pennsylvania State University working on domain-specific applications of LLMs, under the guidance of my advisor, Dr. Suhang Wang. Outside of work, I am an advocate for animal rights 🦙, enjoy reading 📖 (currently on an Alan Turing biography), cooking 🍲 and playing the guitar 🎸.

Research Interests

My research interests span various aspects of NLP, including QA, RAG, IR/Search, LLM agents, model interpretability, and low-resource languages. Overall, I'm interested in applying language technologies to challenging edge cases that have either too much or too little data. Through my work, I aim to develop easy-to-use, cost-effective methods for tackling real-world problems. You can find my research timeline described in this Google Slide.

Publications

Pre-Prints

BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions

Saptarshi Sengupta, Shuhua Yang, Paul Kwong Yu, Fali Wang, Suhang Wang

ToolDreamer: Instilling LLM Reasoning Into Tool Retrievers (Just accepted to EACL 2026 Main!)

Saptarshi Sengupta, Zhengyu Zhou, Jun Araki, Xingbo Wang, Bingqing Wang, Suhang Wang, Zhe Feng

MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification

Saptarshi Sengupta, Harsh Vashistha, Kristal Curtis, Akshay Mallipeddi, Abhinav Mathur, Joseph Ross, Liang Gou

Published Work

TOP-Training: Target-Oriented Pretraining for Medical Extractive Question Answering

Saptarshi Sengupta, Connor Heaton, Shreya Ghosh, Wenpeng Yin, Preslav Nakov, Suhang Wang

International Conference on Computational Linguistics (COLING), 2025

Exploring Language Model Generalization in Low-Resource Extractive QA

Saptarshi Sengupta, Wenpeng Yin, Preslav Nakov, Shreya Ghosh, Suhang Wang

International Conference on Computational Linguistics (COLING), 2025

Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings

Saptarshi Sengupta, Connor Heaton, Suhan Cui, Soumalya Sarkar, Prasenjit Mitra

IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024

Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla—A Low Resourced Language

Rajat Pandit, Saptarshi Sengupta, Sudip Kumar Naskar, Niladri Sekhar Dash, Mohini Mohan Sardar

Informatics journal, 2019

Word Sense Induction in Bengali Using Parallel Corpora and Distributional Semantics

Saptarshi Sengupta, Rajat Pandit, Parag Mitra, Sudip Kumar Naskar, Mohini Mohan Sardar

Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 2019

Writing

Feed-Forward Neural Network From Scratch

I've always wanted to implement a simple FFNN from scratch just to see how the math works and really understand things at a deeper level. This is my attempt at creating something educational, breaking the math down into small pieces to make it more accessible. Note: All of the code works, but some final illustrations are still to be added.

Experience

NLP and Large Language Model Intern

Robert Bosch LLC | May 2025 - August 2025

Researched tool retrieval for LLM agents that must choose from a large number of tools. Proposed a new framework (ToolDreamer) to address this problem.

Machine Learning Applied Scientist Intern

Splunk | May 2024 - November 2024

Worked on synthetic data generation and LLM-agent trajectory verification for an internal AI assistant. The systems I developed were implemented using the AutoGen library.