Module 8: Agents from Scratch — Lesson 1 of 1

Agents from Scratch: Series Overview

May 26, 2026·7 min read·Sudipta Pathak
Agents from Scratchagentsllmtool-useragmemoryplanningmulti-agentmcpproductionoverview

Agents from Scratch

Why another agent series

The agent tutorial landscape splits into two unhelpful piles. One pile is prompt cookbooks — "here's how to phrase your prompt so the agent does the right thing." The other is framework demos — "here's how to call LangGraph / CrewAI / Swarm to make a thing." Neither teaches you how agents actually work.

This series takes the third path: build every meaningful agent primitive from scratch, in the smallest amount of code that captures it, then scale it across machines.

By the end you should be able to read a multi-agent system's source on a Friday and have an opinion on its failure modes by Monday — because you'll have built each piece yourself.

The teaching philosophy

The same three rules from the Inference series apply here, adapted for agents.

1. Plain language first, formalism second

You can't reason about a planner you can only call. Every concept starts with a sentence you could say to a colleague at a coffee machine — "ReAct is a while loop that lets the model decide between thinking and acting at each step" — before any formalism shows up.

When the formalism arrives (state machines, protocols, schemas), we walk through it piece by piece, no skipped steps. If a message gets transformed before it's appended to context, we say exactly what changes and why.

2. Build it on one machine before you scale it

A single-process implementation is the unit test for your mental model. If you can't write a ReAct loop in 100 lines of plain Python, you have no business reasoning about durable distributed agent scheduling.

Every primitive that can be implemented in a single process is built there first — no framework, no abstractions you didn't write yourself. You'll see the naive version, the bug, the fix, and the production version. In that order.

3. Then take it distributed

After the single-process version works, we rewrite it for the realistic case: multiple agent workers behind a queue, durable state in Redis or Postgres, tracing across processes, scheduling under cost and rate-limit constraints. This is where most agent content drops off and where production actually lives.

You'll see the message protocols, the failure modes (deadlock, divergence, role drift), and how the patterns scale — or don't.


The roadmap

Ten parts, ~60 tutorials, sequenced so each one earns the next. Topics get linked here as they ship.

Part 1 — The agent loop

The foundation. Before frameworks, before tools — what is an agent?

  1. What makes a system "agentic" — autonomy, tools, feedback
  2. The minimal loop: perceive → reason → act → observe
  3. ReAct from scratch in <200 lines, no framework
  4. LLM-as-planner vs LLM-as-worker
  5. State machines vs flexible loops — when each wins

Part 2 — Tool use

The thing that makes an agent useful. The thing that makes it dangerous.

  1. Native function calling — what's actually in the API payload
  2. JSON-schema decoding and grammar-constrained outputs
  3. Parallel tool calls — async patterns, fan-out/fan-in
  4. Tool selection at scale — when you have 100+ tools (retrieval, hierarchies)
  5. Tool result feedback — error recovery, retries, idempotency
  6. Designing tools well — schema, side-effect taxonomy, reversibility

Part 3 — Memory

The agent's view of the world beyond the current context.

  1. Why the context window isn't enough — the agent memory hierarchy
  2. Conversation memory — buffer, summary, sliding window
  3. Vector memory — embeddings as a retrieval index
  4. Key-value & structured memory — when you don't need similarity
  5. Episodic vs semantic memory
  6. Memory compaction & eviction — the bounded-context problem

Part 4 — Retrieval-Augmented Generation

RAG deserves its own part. It's the single most common production agent pattern, and every interview asks about it. We build the whole stack, including the bits everyone hand-waves through.

  1. Why RAG — the parametric vs non-parametric memory split
  2. The core pipeline from scratch — load, chunk, embed, retrieve, generate
  3. Chunking strategies — fixed, semantic, recursive, document-aware
  4. Embedding models — what they encode, where they fail
  5. Vector DBs from the inside — HNSW, IVF, pgvector
  6. Hybrid search — BM25 + dense + reranking with cross-encoders
  7. Query transformation — HyDE, multi-query, step-back prompting
  8. Advanced retrieval — parent-child, contextual retrieval, ColBERT
  9. Agentic RAG — the agent decides what to retrieve, when, and how
  10. Self-RAG & Corrective RAG
  11. Graph RAG — knowledge graph + vector hybrid
  12. RAG evaluation — faithfulness, context precision/recall, RAGAs

Part 5 — Planning & reasoning

When one step isn't enough.

  1. Single-step vs multi-step planning
  2. Plan-and-execute
  3. Tree of Thoughts
  4. LATS — Language Agent Tree Search
  5. Self-reflection & critic loops
  6. Verifier agents — separating "do" from "check"

Part 6 — Context engineering

The 2025–26 hot topic. Treating the context window as a resource you have to budget.

  1. Context windows as a resource — budget like RAM
  2. Compaction strategies — summarization, eviction, layered context
  3. Subagent isolation — when to spin up a fresh context
  4. Cache-friendly prompts — structuring for prompt caching
  5. Context pollution — drift, recency bias, instruction decay

Part 7 — Multi-agent systems

Multiple agents, multiple failure modes.

  1. The orchestrator-worker pattern
  2. Debate & adversarial agents
  3. Role-based crews (CrewAI-style)
  4. Handoff protocols — OpenAI Swarm patterns
  5. A2A and the interop landscape
  6. Communication failure modes — deadlock, divergence, role drift

Part 8 — Build it from scratch

The frameworks everyone uses, taken apart.

  1. A minimal agent loop in 100 lines
  2. LangGraph internals — rebuild the executor
  3. AutoGen architecture
  4. OpenAI Agents SDK / Swarm internals
  5. Claude Agent SDK & Claude Code's loop
  6. MCP from scratch — protocol, custom server, custom client

Part 9 — Vertical agents

Where the patterns meet real product surfaces.

  1. Coding agents — Claude Code, Cursor, Aider patterns
  2. Browser & computer-use agents — OSWorld, Playwright loops
  3. Research agents — Deep Research patterns
  4. Data agents — NL2SQL, autonomous analytics
  5. Long-running & scheduled agents — cron, event-driven, durable execution

Part 10 — Production agents

This is the part where the systems work pays off. Distributed scheduling, observability, eval pipelines, cost control — agents as infrastructure, not as demos.

  1. Trajectory eval vs outcome eval; LLM-as-judge
  2. Benchmarks — SWE-bench, GAIA, WebArena, OSWorld
  3. Tracing & replay — Langfuse / custom OTEL pipelines
  4. Safety — prompt injection, sandboxing, HITL checkpoints
  5. Cost & latency engineering — model routing, parallelism, caching
  6. Distributed agent scheduling — Ray, queue-backed workers, statefulness
  7. A/B testing agents in production

How this series fits the bigger picture

This is one of four breadth tracks I'm building, with ML infrastructure as the depth bar — a T-shaped portfolio.

  • Pre-Training from Scratchplanned
  • Post-Training from Scratchplanned
  • Inference from Scratchshipping now
  • Agents from Scratchthis series

Every breadth track lands in the same place: a "Production X" part that uses the infra depth — distributed scheduling, observability, queue infrastructure, eval pipelines. That's the bridge back to the vertical bar of the T.


How to follow along

You can read these in order — they're sequenced for that — or jump to the part most relevant to what you're building. RAG (Part 4) and Production Agents (Part 10) are the parts most interview loops care about; if you're optimizing for that, start there and double back for foundations.

Code lives alongside the prose. Single-process versions run on anything with a CPU and an API key. Distributed sections assume access to a small cluster, a managed Redis/Postgres, and a vector DB — instructions for free-tier setups are included where they apply.

If you find an error, an explanation that didn't land, or a topic that's missing — open an issue or reach out. This series is meant to be lived in, not just published.