I am a fourth-year Software Engineering student at the University of Waterloo, specializing in Machine Learning and Artificial Intelligence.
My passion lies in training and optimizing large transformer-based multimodal models, with a focus on enhancing their efficiency,
interpretability, and real-world applicability.
Interests
- Transformer-based efficient model training and inference
- Multimodal LLMs and Vision LLMs
- Parameter-efficient model fine-tuning
- Transformer-based Embodied Autonomous Agents
- Tech, investing, personal growth, podcasts, and books
Skills
Languages: Python, C++, C, SQL, JavaScript, Java, Scala, Assembly
Technologies: PyTorch, JAX, TensorFlow, Pandas, Scikit-learn, MySQL, Docker, Kubernetes, GCP, AWS
ML Focus: Modeling and Training Transformer-Based LLMs and Vision LLMs, Parameter-Efficient Fine-Tuning of LLMs, Optimized Export and Inference of LLMs, Retrieval-Augmented Generation
Work Experience
Software Engineer Intern - NVIDIA
Santa Clara, CA | Oct 2024 - Dec 2024
- Designed and developed Tripy, a PyTorch-inspired machine learning framework powered by TensorRT, in C++ and Python, delivering customizable, high-performance inference pipelines
- Added core layers such as BatchNorm and Sequential, and extended model support to include vision models such as ResNet50
- Implemented an efficient intermediate representation (IR) caching mechanism within Tripy, accelerating model execution by 100x and optimizing throughput for GPT and ResNet benchmarks
Research Engineer Intern - Cohere AI
Toronto, ON | Jan 2024 - Aug 2024
- Designed a parameter-efficient fine-tuning framework for model-parallel distributed training and inference of LLMs on GPUs
- Implemented, trained, evaluated, and deployed Cohere's first Multimodal Vision LLM in PyTorch
- Led internal research to evaluate fine-tuning techniques for LLMs on customer data, comparing methods such as parameter-efficient supervised fine-tuning, direct preference optimization, and offline preference tuning
- Developed and dockerized a FastAPI + Uvicorn server for batch inference of fine-tuned LLMs and VLLMs, integrating it with evaluation libraries to enable large-scale evaluation of fine-tuned models
Machine Learning Engineering Intern - Cohere AI
Toronto, ON | May 2023 - Aug 2023
- Led the development of a comprehensive data-quality evaluation library to evaluate LLM quality in terms of human preference, grammar, spelling, and repetitiveness, using Python and Pandas
- Integrated the data-quality library to run on training datasets and 10,000+ items of daily API data, flagging and removing bad items
- Fine-tuned Cohere's Command model on collected and cleaned API data, resulting in a 70% performance boost, using PyTorch and Scikit-learn
Machine Learning Engineering Intern - Cohere AI
Toronto, ON | Sep 2022 - Dec 2022
- Implemented and benchmarked throughput-efficient LLM inference setups across JAX, PyTorch+TensorRT, and TensorFlow, running on CPUs, GPUs, and TPUs
- Leveraged mixed-precision inference, kernel fusion, data parallelism, and batch size tuning to reduce the inference latency of models on CPUs by 4x
- Conducted a self-initiated proof-of-concept project on inference of embedding models on TPUs using JAX and Haiku, resulting in a 2x throughput improvement and a 50% cost reduction compared to A100 GPUs
- Built a dockerized end-to-end model export pipeline in Python for handling export and deployment of billion-parameter JAX model checkpoints to TensorFlow, PyTorch, and Triton FasterTransformer serving solutions for production
Machine Learning Engineering Intern - Airy3D
Montreal, QC | Jan 2022 - Apr 2022
- Conducted data analysis, cleaning, validation, and feature engineering on 500k+ unstructured images
- Modeled, trained, and hyperparameter-tuned a custom end-to-end CNN built on top of ResNet in PyTorch, leveraging sensor-generated depth maps to improve state-of-the-art object tracking F-Score by 12%+
- Implemented a Gauss-Newton-based optimizer with Conjugate Gradient in PyTorch and Python, reducing object localization latency by 10x compared to standard gradient descent
- Wrote CUDA kernels for model sub-modules and deployed the model as a desktop application via C++ and ONNX
Software Developer Intern - Ford Motor Company of Canada
Oakville, ON | May 2021 - Aug 2021
- Developed automated test scripts using Python and the Slash library, expanding Wi-Fi testing coverage by 10%
- Improved overall runtime and test pass rate by 40% by debugging and fixing the software behind failed test suites
- Executed 1000+ automated test runs via Jenkins CI, identified root causes of failures, and reported defects
Software Developer Intern - Vancouver Community Network
Vancouver, BC | May 2020 - Aug 2020
- Redesigned client-side authentication system by implementing a SHA-512 hashing algorithm in C and Perl, minimizing system breaches and increasing identity validation speed by 80%
- Reduced local server dependency by building a RESTful API in Python to migrate user data to GSuite
- Saved the team 10+ hours a week by automating compromised-password detection using Node.js
Research Experience
Undergraduate Research Assistant - Stanford NLP
Remote | Apr 2023 - Present
- Conducting research on the DSPy framework and Retrieval-Augmented Generation (RAG) with LLMs (Llama 3, GPT-4) under the supervision of Omar Khattab
- Publication: Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels https://arxiv.org/abs/2406.11706
Undergraduate Research Assistant - University of Waterloo Data Systems Group
Waterloo, ON | Apr 2023 - Present
- Conducting research on Retrieval-Augmented Generation (RAG) with LLMs (Llama 3, GPT-4) using PyTorch and Hugging Face under the supervision of Ronak Pradeep and Prof. Jimmy Lin
Publications
Blog Posts
- Hacking Images with LLMs, Part I: Encoding, Alignment, Combining
Published on Sep 22, 2024 · 5 min read
With the rise of open-source vision transformer models, image understanding and multimodal LLMs have become essential features in AI. In this blog, we'll explore how to get LLMs to process images...
- Building Your Machine Learning Career: Essential Advice for Students and Practitioners
Published on Aug 31, 2024 · 3 min read
Over the past few years, countless students have asked me how to break into the world of machine learning. Since my first year of college…
- Top 5 Lessons I Learned from Analyzing 1000 YC Startups
Published on Mar 30, 2024 · 6 min read
As a person in tech, I often wondered what kind of companies get into Y Combinator, the Ivy League of startup accelerators.
- How to Create a Personal Website with Notion and Github in less than 10 minutes
Published on Dec 26, 2022 · 2 min read
Are you dreaming about making a quick personal website using the "elegant" notion formatting?