ML Systems & Infrastructure

Sudipta Pathak, PhD

Open to Principal/Staff Roles · US Citizen - No Sponsorship Required

Building scalable ML platforms, distributed training systems, and AI infrastructure. Expert in MLOps, model optimization, and production-grade machine learning.

About Me

I am a Machine Learning Systems Engineer with deep expertise in building production-grade AI infrastructure. My work ranges from distributed training systems to edge deployment, with a focus on scalability, efficiency, and reliability.

Distributed Training

Large-scale model training across GPU clusters with optimized data parallelism and model parallelism strategies.

ML Infrastructure

Building robust ML platforms with Kubernetes, Kubeflow, and custom orchestration for production workloads.

Data Pipelines

High-throughput data processing pipelines with Apache Spark, Ray, and modern data lake architectures.

Cloud-Native ML

Multi-cloud deployments on AWS, GCP, and Azure with a focus on cost optimization and scalability.

Model Optimization

Quantization, pruning, and distillation techniques for deploying efficient models at the edge and in the cloud.

MLOps & Security

End-to-end ML lifecycle management with CI/CD, monitoring, and enterprise-grade security practices.

Writings

Technical blog posts, paper reviews, and tutorials on ML systems, infrastructure, and distributed computing.

AI-Assisted Coding
Feb 2026 · 11 min read

The Claude Code Field Manual: 13 Battle-Tested Workflows From the Tool's Creator

Most developers use Claude Code at 20% of its potential. These 13 workflows -- from parallel instances to feedback loops -- will change how you ship. Fork the repo, work through the exercises, and build muscle memory by end of day.

Text-to-SQL
Feb 2026 · 26 min read

NL2SQL Benchmark: Vanna AI vs WrenAI vs DB-GPT — 50 Queries, One Database, Three Approaches

I benchmarked three open-source NL2SQL approaches against the same DuckDB database with 50 gold-standard queries. Vanna AI's RAG approach led at 40% execution accuracy, but all three tools reveal hard truths about the state of text-to-SQL.

Text-to-SQL
Feb 2026 · 20 min read

Dissecting Open-Source NL2SQL: How Vanna, WrenAI, and DB-GPT Approach Text-to-SQL

A deep technical teardown of three leading open-source NL2SQL frameworks -- tracing the query path from natural language to SQL, comparing RAG vs semantic models vs multi-agent architectures, and identifying what works and what doesn't.

SGLearn
Feb 2025 · 14 min read

Debugging SGLang's NIXL PD Disaggregation: How a Missing CLI Argument Breaks Multi-Prefill Inference

A deep dive into root-causing a multi-prefill disaggregation bug in SGLang's NIXL backend -- from symptoms to a three-component failure spanning Rust, Python, and ZMQ.

AI-Assisted Coding
Feb 2025 · 9 min read

The Complete Opencode Guide: Tips, Tricks, and Best Practices

A comprehensive, evolving guide to using Opencode effectively for AI-powered coding. Learn tips, tricks, and workflows from daily use that will change how you develop.


ML Systems & Infrastructure Curriculum

A structured learning path covering production ML systems — from distributed training to LLM serving at scale.

4 modules · 1 lesson

Explore the Curriculum

Get in Touch

Whether you are looking for a senior ML systems engineer, need consulting on AI infrastructure, or want to collaborate on research, I would love to hear from you.