Hi, I'm Sanjeepan
Machine Learning Engineer at H2O.ai. Building enterprise LLMs and AI systems with a research background in code generation and alignment.

About

I'm a Machine Learning Engineer specializing in enterprise AI systems and Large Language Models. Currently at H2O.ai as MLE II, I build and deploy production-scale LLM solutions. I recently completed my Master of Applied Science (MASc) in Computer Engineering at York University, where I researched alignment techniques (RLHF, DPO, RLAIF) for improving code generation in LLMs, with publications at A* conferences like FSE.

Work Experience

Skills

Python
PyTorch
TensorFlow
LLMs
RLHF / DPO
Hugging Face
Scikit-Learn
Java
TypeScript
Docker
PostgreSQL
AWS
GCP
FastAPI
Git
Linux
Redis
Projects

Things I've Built

From production ML systems to open-source research tools.

H2O LLM DataStudio

A no-code application and toolkit to streamline data preparation tasks for fine-tuning Large Language Models.

Python
H2O Wave
LLMs
Data Curation
Fine-tuning

ReCo — Semantic Segmentation

Contrastive learning framework for semantic segmentation that achieves strong results with minimal labeled data (5 examples per class). Supports Cityscapes and Pascal VOC.

PyTorch
DeepLabV3+
ResNet-101
Semi-supervised Learning
Publications & Talks

Research & Speaking

Published research in NLP and information extraction, and presented at IEEE events and community meetups.

Reward-Free Code Alignment from Pretrained or Fine-Tuned LLM: Unpacking the Trade-offs for Code Generation

FSE 2026 · Research Paper · Montreal, Canada

Investigated how preference alignment techniques (DPO, BoNBoN) benefit LLMs for code generation across five models, showing that alignment reduces the performance gap between pretrained and fine-tuned variants.

Party Extraction from Legal Contracts Using Contextualized Span Representations

RANLP 2023 · Long Paper

Developed a QA-based approach using RoBERTa to identify parties in legal contracts. Created an open-source legal party dataset of 1,000 documents.

Sentiment Analysis in Dravidian Code-Mixed YouTube Comments and Posts

FIRE 2021 · CEUR Workshop Proceedings

Message-level polarity classification for code-mixed Dravidian language content on social media.

Guest Speaker @ IEEE Summer School on Computational Intelligence

University of Jaffna, Sri Lanka

Speaker at the H2O.ai LLM Space and Tools session during an IEEE event, sharing insights on machine learning, AI, and H2O.ai's technologies.

Presentation at H2O.ai Sri Lanka Community Event

Colombo, Sri Lanka

Presented my latest work on signature detection and engaged with the ML community in Sri Lanka.

Contact

Get in Touch

Want to chat? Feel free to reach out on LinkedIn or send me an email. Always happy to connect.

GitHub
LinkedIn
Google Scholar