Faraz Khoubsirat

I am a fourth-year Software Engineering student at the University of Waterloo, specializing in Machine Learning and Artificial Intelligence. My passion lies in training and optimizing large transformer-based multimodal models, with a focus on enhancing their efficiency, interpretability, and real-world applicability.

Interests

Transformer-based efficient model training and inference
Multimodal LLMs and Vision LLMs
Parameter-efficient model fine-tuning
Transformer-based Embodied Autonomous Agents
Tech, investing, personal growth, podcasts, and books

Skills

Languages: Python, C++, C, SQL, JavaScript, Java, Scala, Assembly

Technologies: PyTorch, JAX, TensorFlow, Pandas, Scikit-learn, MySQL, Docker, Kubernetes, GCP, AWS

ML Focus: Modeling and Training Transformer-Based LLMs and Vision LLMs, Parameter-Efficient Fine-Tuning of LLMs, Optimized Export and Inference of LLMs, Retrieval-Augmented Generation

Work Experience

Software Engineer Intern - NVIDIA

Santa Clara, CA | Oct 2024 - Dec 2024

Designed and developed Tripy, a PyTorch-inspired machine learning framework powered by TensorRT, in C++ and Python, delivering customizable, high-performance inference pipelines
Added core layers such as BatchNorm and Sequential, and extended model support to include vision models such as ResNet50
Implemented an efficient IR (intermediate representation) caching mechanism within Tripy, accelerating model execution by 100x and optimizing throughput for GPT and ResNet benchmarks

Research Engineer Intern - Cohere AI

Toronto, ON | Jan 2024 - Aug 2024

Designed a parameter-efficient fine-tuning framework for model-parallel distributed training and inference of LLMs on GPUs
Implemented, trained, evaluated, and deployed Cohere's first Multimodal Vision LLM in PyTorch
Led internal research to evaluate fine-tuning techniques for LLMs on customer data, comparing methods such as parameter-efficient supervised fine-tuning, direct preference optimization, and offline preference tuning
Developed and dockerized a FastAPI + Uvicorn server for batch inference of fine-tuned LLMs and Vision LLMs, integrating it with evaluation libraries to enable large-scale evaluation of fine-tuned models

Machine Learning Engineering Intern - Cohere AI

Toronto, ON | May 2023 - Aug 2023

Led the development of a comprehensive data-quality evaluation library, built with Python and Pandas, to assess LLM quality in terms of human preference, grammar, spelling, and repetitiveness
Integrated the data-quality library to run on training datasets and 10,000+ daily API data points to flag and remove bad items
Fine-tuned Cohere's Command model on collected and cleaned API data using PyTorch and Scikit-learn, resulting in a 70% performance boost

Machine Learning Engineering Intern - Cohere AI

Toronto, ON | Sep 2022 - Dec 2023

Implemented and benchmarked throughput-efficient LLM inference setups in JAX, PyTorch + TensorRT, and TensorFlow running on CPUs, GPUs, and TPUs
Leveraged mixed-precision inference, kernel fusion, data parallelism, and batch size tuning to reduce the inference latency of models on CPUs by 4x
Conducted a self-initiated proof-of-concept project on inference of embedding models on TPUs using the JAX and Haiku frameworks, resulting in a 2x throughput improvement and a 50% cost reduction compared to A100 GPUs
Built a dockerized end-to-end model export pipeline in Python for exporting and deploying billion-parameter JAX model checkpoints to TensorFlow, PyTorch, and Triton FasterTransformer serving solutions for production

Machine Learning Engineering Intern - Airy3D

Montreal, QC | Jan 2022 - Apr 2022

Conducted data analysis, cleaning, validation, and feature engineering on 500k+ unstructured images
Modeled, trained, and tuned the hyperparameters of a custom end-to-end CNN built on top of ResNet in PyTorch, leveraging sensor-generated depth maps to improve the state-of-the-art object tracking F-score by 12%+
Implemented a Gauss-Newton-based optimizer with Conjugate Gradient in PyTorch and Python, reducing object localization latency by 10x compared to standard gradient descent
Wrote CUDA kernels for model sub-modules and deployed the model as a desktop application via C++ and ONNX

Software Developer Intern - Ford Motor Company of Canada

Oakville, ON | May 2021 - Aug 2021

Developed automated test scripts using Python and the Slash library, expanding Wi-Fi testing coverage by 10%
Improved overall runtime and test pass rate by 40% by debugging and fixing the software behind failing test suites
Executed 1000+ automated test runs via Jenkins CI, identified the root causes of failures, and reported defects

Software Developer Intern - Vancouver Community Network

Vancouver, BC | May 2020 - Aug 2020

Redesigned the client-side authentication system by implementing a SHA-512 hashing algorithm in C and Perl, minimizing system breaches and increasing identity validation speed by 80%
Reduced local server dependency by building a RESTful API in Python to migrate user data to GSuite
Saved the team 10+ hours a week by automating compromised password detection using Node.js

Research Experience

Undergraduate Research Assistant - Stanford NLP

Remote | Apr 2023 - Present

Conducting research on the DSPy framework and Retrieval-Augmented Generation (RAG) with LLMs (Llama 3, GPT-4) under the supervision of Omar Khattab
Publication: Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels https://arxiv.org/abs/2406.11706

Undergraduate Research Assistant - University of Waterloo Data Systems Group

Waterloo, ON | Apr 2023 - Present

Conducting research on Retrieval-Augmented Generation (RAG) with LLMs (Llama 3, GPT-4) using PyTorch and Hugging Face under the supervision of Ronak Pradeep and Prof. Jimmy Lin

Publications

Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels
https://arxiv.org/abs/2406.11706

Blog Posts

Hacking Images with LLMs, Part I: Encoding, Alignment, Combining
Published on Sep 22, 2024 · 5 min read
With the rise of open-source vision transformer models, image understanding and multimodal LLMs have become essential features in AI. In this blog, we'll explore how to get LLMs to process images...
Building Your Machine Learning Career: Essential Advice for Students and Practitioners
Published on Aug 31, 2024 · 3 min read
Over the past few years, countless students have asked me how to break into the world of machine learning. Since my first year of college…
Top 5 Lessons I Learned from Analyzing 1000 YC Startups
Published on Mar 30, 2024 · 6 min read
As a person in tech, I often wondered what kind of companies get into Y Combinator, the Ivy League of startup accelerators.
How to Create a Personal Website with Notion and Github in less than 10 minutes
Published on Dec 26, 2022 · 2 min read
Are you dreaming about making a quick personal website using the "elegant" notion formatting?
