12 March 2026

Agentic Engineering 101

In 2005, a freestyle chess tournament was held online. Freestyle means anything goes - players can use chess engines, databases, teammates, whatever they like. Grandmasters entered, paired with the best software available. Then there was ZackS: two guys from New Hampshire, a database administrator and a football coach. Neither of them serious chess players. They ran three chess engines simultaneously and had built a process for knowing when to trust the computer and when to override it.

They beat the grandmasters. They beat Hydra, one of the most powerful chess computers in the world.

Garry Kasparov had been thinking about this kind of pairing for years: "A weak human player plus machine plus a better process is superior, not only to a very powerful machine, but most remarkably, to a strong human player plus machine plus an inferior process."

That's as true for AI software development in 2026 as it was for freestyle chess in 2005.

In February last year, Andrej Karpathy coined "vibe coding" - a term we're all a bit bored of by now, but an exciting idea: that you could fully give in to the AI, stop reading code, and just see what happened. To this day it's an incredibly addictive workflow, and especially empowering for people without traditional engineering backgrounds.

The challenge people tend to have with vibe coding is that the experience degrades over time. The more a codebase grows, the more the experience drifts toward frustration. Features that used to appear in minutes start requiring more coaxing. The AI contradicts things it built two sessions ago. Changes that should be simple break things they have no business breaking.

What's happening has a technical explanation that doesn't require a technical background to follow.

Every conversation you have with an AI coding tool - everything you've said, everything it's thought about, every file it's read - gets processed together as a body of text. The model can only work with what fits inside its context window - think of it as working memory. When GPT-3 launched in 2020, that window was 2,048 tokens - roughly 1,500 words. Developers treated it like a scarce resource, because it was.

Today, the average context window tends to sit around 200,000 tokens. Some models go to two million. This is largely what unlocked AI coding tools as a category - suddenly a model could hold an entire codebase in its head at once.

But expanding the window creates new problems.

Think of it like a glass of cordial. The cordial is your actual goal - the feature you want built, the problem you want solved. Every file the AI reads, every message in the conversation, every piece of context it processes before it starts doing any actual work: that's water going into the glass. The more water you add, the more diluted the cordial gets. Your goal is technically still in there. But the concentration - the model's grip on what actually matters - weakens with every pour.

This is where context engineering comes in: a set of habits for keeping the glass as concentrated as possible - minimising noise, and making sure the model spends its attention on the thing you actually care about.

The good news is that most of the hard parts have been quietly solved for you. Claude automatically compacts the conversation once it approaches 80% of the context window - summarising older parts of the session while keeping the active work fresh. It also explores your codebase on demand rather than loading everything at once, using tools like grep and glob to search surgically instead of reading entire directories into context. There's more going on under the hood, but the point is that serious AI coding tools are actively solving for this. You don't need to manage it yourself.

What you are responsible for is a handful of good habits to make the most of these systems. And the most important of them brings us back to ZackS. They won because they knew how to structure the work. When to calculate. When to trust the machine. When to step in.

That is the job description Karpathy had in mind when he recently declared vibe coding passé and introduced what comes next: agentic engineering.

His definition: "You are not writing the code directly 99% of the time. You are orchestrating agents who do, and acting as oversight."

Vibe coding is giving in to the vibes. Agentic engineering is the process you wrap around the model. And - the part that mirrors what Kasparov kept observing in freestyle chess - it's something you can learn and get measurably better at.

The rest of this is about how.

A quick word from our sponsor: Vibe Check Cloud. (The sponsor is us, and we don't know what we're selling because it's free and open-source.)

Before we get into the workflow, one tool worth knowing about if you're currently vibe coding and want a safety net while you build better habits.

Vibe Check is a free, open-source toolkit we built for catching the production risks that vibe coding tends to leave behind. It comes in two forms depending on where you are in your journey.

If you're not technical and using a web-based AI tool like Lovable, Replit, or Bolt - check out vibe-check.cloud, which gives you a personalised risk assessment and prompt packs you can paste straight into whatever you're building with. No code required.

If you're using Claude Code - the CLI plugin scans your actual codebase across security, payments, authentication, and 16 other domains, then generates specifications to fix what it finds.

Think of it as the checklist a senior engineer runs through before going live, now available to anyone at any stage of a build. If you're vibe coding right now, run it before you ship.

The rest of this article is written with trained engineers in mind - but in plain English, and if you're comfortable with tools like Claude Code you could absolutely apply these techniques yourself. We'll be using Claude as the reference point throughout, but the same habits apply whether you're using Cursor, Codex, or any other serious agentic coding tool.

The First Good Habit: Plan Before You Build

This is the one that makes the biggest difference and the one most people skip.

In Claude Code, hit Shift+Tab twice (or type /plan). This drops you into Plan Mode. Claude will think through the task and tell you what it's going to do before it touches a single file. Read it. Push back if something looks wrong. Adjust the scope. Agree on the approach. Then let it execute. We spend most of our time here. Ten minutes of back-and-forth on the plan is almost always cheaper than an hour of untangling an implementation that went in the wrong direction.

It's also your last cheap opportunity to steer. Once Claude is in execution mode, mid-session corrections add noise to the context and tend to produce messier code than a clean plan followed cleanly. Get the plan right first.

The Second Good Habit: Context Files

Here's something that will feel familiar if you've spent any time coding with AI.

You have a great session. Things get built. You come back the next day, start a new chat, and spend the first twenty minutes re-explaining your project to an AI that has absolutely no memory of yesterday. Your stack. Your conventions. The thing you specifically told it not to do last week that it's now doing again.

The fix is a single file.

CLAUDE.md is a markdown file that lives in the root of your project. Claude Code reads it automatically at the start of every session. If you're using Cursor, the equivalent is .cursorrules. Codex and most other serious AI coding tools have their own version of the same concept. Whatever's in it becomes the baseline context for everything that follows - your stack, your conventions, your preferences, the mistakes that have already been made so they don't get made again.

Boris Cherny, who created Claude Code at Anthropic, described his team's practice: every time Claude does something wrong, they add it as a new rule in CLAUDE.md. Every mistake becomes an instruction. The file gets checked into git so the whole team benefits.

That's the flywheel. And it's available to anyone on any project. A basic CLAUDE.md might look like this:

# Project: Customer Portal

## Stack
Next.js, TypeScript, Tailwind, Supabase for database and auth, deployed on Vercel.

## Conventions
- Components go in /components, named PascalCase
- Use Zod for all form validation
- Error messages should be human-readable — not "string must contain 8 characters" but "Password must be at least 8 characters long"
- Never use `any` in TypeScript

## Do not
- Modify any code without writing a plan first
- Use console.log in production code — use our logger utility
- Install new packages without flagging it first

## After every session
If you hit anything unexpected, or I have to correct your approach — reflect and suggest what should be added here.

The last section is the important one - it turns CLAUDE.md from a static document into something that grows with the project. At the end of every session, ask Claude what it learned. Add the useful bits. Commit it with the PR.

That's the "101" version - consider using Anthropic's official skill for this rather than jamming it into the CLAUDE.md file itself, but it'll do the job.

Start with something minimal. Add to it every time something goes wrong. Six months from now it'll be one of the most valuable files in your repository.

We keep our own CLAUDE.md lean, with modular references to more specific files like ARCHITECTURE.md or DATA-LAYER.md. The model knows they're there and pulls them in when relevant, rather than loading everything into every context window whether it's needed or not.

The Third Good Habit: Give It Tools

Out of the box, Claude Code knows how to read and write files, run terminal commands, and search a codebase. That's the tutorial level. The next step is giving it access to more.

MCP - Model Context Protocol - is a standardised way to give Claude Code access to external services. Think of it as a plugin system, but one that any tool can implement and any agent can use.

The practical value is that Claude stops needing you to copy and paste things into the conversation. Connect it to GitHub or Linear and it can pull the next ticket itself. Connect it to Slack and it can post a summary when it's done. Connect it to Sentry and it can read the error logs before it starts debugging.

The "gun to our head" must-have MCP for any coding session is Context7. Every model's training data has a cutoff, which means its knowledge of any given library is the version that existed when training ended. For fast-moving ecosystems - Next.js, React, Tailwind, whatever framework is on its fourth major version - this matters. Context7 pulls live documentation directly into the context window. Fewer hallucinated APIs. Fewer deprecated methods used with confidence. If you're building on anything that moves fast, install it.
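To make this concrete, MCP servers can be registered in a project-level `.mcp.json` file that Claude Code picks up automatically. A minimal sketch for Context7 - note the package name and launch command here are our assumptions, so check Context7's own install instructions for the current details:

```json
{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    }
  }
}
```

Checking this file into git means everyone who clones the repo gets the same connected tools without any per-machine setup.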

Skills are knowledge files Claude picks up automatically when relevant - you don't invoke them, Claude just reads them when the context calls for it. Define them once, check them into git, the whole team benefits.
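As a sketch, a skill is just a markdown file with a short frontmatter description that Claude uses to decide when to load it. A hypothetical example (the content is ours; we're assuming the `.claude/skills/<name>/SKILL.md` layout, so check the skills docs for the current format):

```markdown
---
name: error-messages
description: Writing user-facing error messages. Use when adding or editing validation or error-handling code.
---

# Error messages

- Say what went wrong and how the user can fix it.
- Never expose internal identifiers, stack traces, or library default strings.
- Match the tone of the existing messages in the codebase.
```

The description line does the heavy lifting: it's what Claude scans to decide whether the skill is relevant to the task at hand.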

Slash commands are the explicit counterpart - workflows invoked by name with /command. In our case a /ship command handles the commit-push-PR flow, a /reflect command reviews the session and suggests CLAUDE.md updates.
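Custom slash commands are markdown files in `.claude/commands/` - the filename becomes the command name. A minimal sketch of a `/ship` command (the steps here are our own illustration, not a prescribed format):

```markdown
Commit and raise a PR for the current work:

1. Run the test suite; stop and report if anything fails.
2. Stage the changes and write a clear, conventional commit message.
3. Push the branch and open a PR summarising what changed and why.
4. Link the PR to the ticket mentioned at the start of this session, if any.
```

Saved as `.claude/commands/ship.md`, this becomes `/ship` in any session, and checking it into git gives the whole team the same workflow.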

Subagents run specific tasks in their own context window, keeping your main session clean. A code-simplifier agent strips unnecessary complexity after a session ends. A verify-app agent closes the verification loop - more on that below.
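A subagent is defined the same way: a markdown file, here assumed to live in `.claude/agents/`, with frontmatter telling Claude when to delegate to it. A hedged sketch of a verify-app agent (field names and layout may differ by version - treat this as illustrative):

```markdown
---
name: verify-app
description: Verifies the app still works after a change. Use proactively at the end of a session.
tools: Bash, Read, Grep
---

You are a verification agent. Given a description of what "done" looks like:

1. Run the build and the test suite.
2. Exercise the changed behaviour directly where possible.
3. Report pass or fail, with the exact commands you ran and their output.
```

Because the subagent gets its own context window, all the test output and exploration it generates never dilutes your main session's glass.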

Hooks trigger commands automatically. A useful one: a PostToolUse hook that runs your code formatter on every file Claude edits. It handles the last 10% and means you never get a CI failure for a formatting issue.
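A sketch of that formatter hook in `.claude/settings.json`. Hooks receive a JSON payload on stdin describing the tool call; we're assuming here that the edited file's path arrives as `tool_input.file_path` and that Prettier is your formatter - verify both against the hooks documentation for your version:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs -r npx prettier --write"
          }
        ]
      }
    ]
  }
}
```

The matcher restricts the hook to file-writing tools, so reads and searches don't trigger a pointless format pass.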

The Fourth Good Habit: Make A Loop

Tie these habits together into a repeatable loop, then automate the manual steps one at a time.

  • Pull from your backlog. MCP if you have it, copy-paste if you don't. Clear task, known scope, before anything else.
  • Plan first. You know this one.
  • Execute. Stay present enough to catch anything obviously wrong, resist the urge to micromanage. You can use a /btw command in Claude to tell it stuff as it comes to mind without disturbing the current work.
  • Close the verification loop. Cherny: "probably the most important thing - give Claude a way to verify its work. If Claude has that feedback loop, it will 2-3x the quality of the final result." Tell Claude at the start of the session what done looks like and how to verify it. Ten seconds to write. Consistently better output. The invested version is a verify subagent that runs automatically at the end of every session.
  • Clean up. Run code-simplifier or ask Claude to remove anything unnecessarily complex before the PR.
  • Commit and raise the PR. /ship. Done.
  • Update CLAUDE.md. Ask Claude what it learned. Capture the useful bits, commit with the PR. Each iteration makes the next one slightly smoother.

The Part That Compounds

ZackS didn't set out to beat grandmasters. They set out to build a better process than everyone else in the room. The grandmasters had more chess knowledge. Hydra had more raw compute. ZackS had a system for deciding when to trust the engine and when to override it, refined across hundreds of positions.

Process compounds. Every mistake that becomes a CLAUDE.md rule, every workflow that becomes a slash command, every verification step that catches a bug before it reaches the PR - the gap between a disciplined setup and an undisciplined one widens over time.

Karpathy's framing of agentic engineering - "you are not writing the code directly 99% of the time, you are orchestrating agents who do, and acting as oversight" - is a job description, but it's also a skill. Orchestration is learnable. Oversight is learnable. The habits in this piece are a starting point, not a ceiling.