Codebase context engineering - Your codebase is the prompt
tldr: The hype tells you the next model release, the new shiny tool, is what will make you productive. From my experience, it's not — but your codebase is. Build solid foundations, iterate on your tools, do the boring work: this investment will pay off with any model and any AI tool you could choose.
Plus, and I can't stress this enough, you are in control of this process.
In my case, that approach is good for 50%+ productivity gains, with better code and broader test coverage than what I shipped before LLMs.
Human reviews are the bottleneck - at least for now
At the time of writing, a consensus is emerging: left alone, agents are not capable of writing maintainable code, and the wrong technical decisions they make compound into a flaky codebase with more bugs than their human counterparts would produce.
The resulting codebase will also often feel "off", which shows how much subtle choices matter in crafting a codebase that "feels good". This is important, as not all tasks can be handled by agents alone: humans will still have to get their hands dirty. If they can't get a grasp of the codebase, the time gained will evaporate quickly.
The models definitely still make mistakes and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side.
Andrej Karpathy
For now, thorough, human-made code reviews are still considered necessary. But code reviews are expensive and tiring (especially when reviewing LLM-written code), and human time is limited compared to the agents' productivity.
In short, reviews are becoming one of the biggest bottlenecks to the productivity boost provided by LLMs, and the key is to speed them up by any means without lowering their quality.
This can be done by spotting the critical details of the implementation fast, and by being able to trust that most tasks are executed exactly as we want, so we can skim over them and catch quirks quickly.
Sure, the statements above are going to change dramatically over the next few years, and some will probably age poorly. Agents will certainly get better at coding; better orchestration and lower token costs will allow for (hopefully good?) automatic reviews using multiple models; and tool and documentation discovery will improve.
But ensuring all your agents code "the same way" will remain relevant: human teams made of decent developers have had the same problem for decades, and I think agents are no different. An LLM-made Frankenstein codebase is no better than a human-made one.
And yes, agents can produce quality output, given the right tools
LLMs are by no means perfect, but I feel that a lot of the criticism around "vibe coding" is missing an essential point:
"Vibe coding" does not have to lead to bad, hard-to-maintain code if you provide a strong and adequate playground to your agents.
The tale that LLMs could only produce badly crafted code was reassuring for a while —probably good for our egos— but does not hold up anymore, at least for a lot of projects I've seen.
My recent experience has been the opposite: "AI assisted engineering" (in contrast to "vibe coding") can lead to well-crafted and documented code, as long as you do not give agents tasks beyond their capabilities (which evolve every week).
No doubt, LLMs can fall flat on complex tasks. But most of our (at least my) day to day work is much more mundane. As I've said before, once you start treating LLMs like polyglot junior devs who can't learn (poor long term memory, no real problem solving or vision) but can execute fast — and you start building a framework to work around those limitations — you stop sacrificing quality for velocity.
The term "quality" is pretty loose here, but think "clean code", separation of concerns, small components, avoiding mutations, readable code, complete error handling, a lot of taste, and much more. All the big decisions and small day-to-day details that make a "good" codebase ("good" varies based on your typical project).
Onboarding agents is like any onboarding
Imagine you are onboarding a new junior dev in your team.
What do you think impacts their productivity the most? What can you provide to ensure they start delivering right away? And what do you put in place to ensure that their output matches your expected quality bar, allowing you to skim over their Pull Requests as quickly as possible?
From my experience, the most important criteria are:
- a logically structured codebase, with files in the right place and with the right name. A codebase that feels easy and natural to explore;
- technologies and frameworks that fit the project, reduce cognitive load through typing and good APIs, and are well documented;
- concise yet complete, up-to-date documentation;
- clear processes and rules, with the right tooling to ensure those are respected;
- and lastly, a good number of examples (e.g. existing code) that show the expected code more fully than linter rules and documentation could.
And it goes without saying that clear expectations for their tasks (not too big, not too small, with precise acceptance criteria) are also mandatory, but that's another topic.
Usually, the new recruit will use a "copy and paste" approach first. Tasked with creating a new dashboard page? They check the documentation, the existing pages, the existing components and file structure, the existing API calls, and they mimic that.
Sure, they won't improve the codebase with this approach, but at least that ensures consistency and makes code reviews faster and more reliable, as the reviewer knows what to expect and can spot drifts faster.
Codebase context engineering
With agents, how you enforce the five criteria above will differ. Your first thought is probably to use Claude Skills or Cursor Rules, or even some kind of complex knowledge management system with multi-level prompting, etc.
I tried those, and failed. I came to the conclusion that nothing explains code better than code itself, so I decided to go the other way: bake as many rules into the code and its structure as possible.
This is called codebase context engineering (I did not coin the term), and instead of rules, it takes this form:
- The code structure is enforced by providing actual, well, code!
- The choice of technology is enforced by setting things up yourself
- The project comes with detailed documentation
- The automations and process (hooks, CI, rules and skills) are in place
- Examples are provided in the codebase too
Ultimately, what you want is a repository you can clone and start throwing agents at, knowing that they will (almost) always produce the same code quality.
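To make "baking rules into the code and its structure" a bit more tangible, here is a minimal sketch of a structure check. Everything in it is an assumption made for illustration: the `src/features/<feature>/...` layout, the naming rules, and the `StructureRule` and `findViolations` names are hypothetical and are not taken from our boilerplate.

```typescript
// Hypothetical structure rules, for illustration only.
// Adapt the folders and naming conventions to your own repository.
interface StructureRule {
  description: string;
  // True when the rule is relevant for this file path.
  appliesTo(path: string): boolean;
  // True when the file path respects the rule.
  isValid(path: string): boolean;
}

const rules: StructureRule[] = [
  {
    description:
      "components live in src/features/<feature>/components and use PascalCase",
    appliesTo: (path) => path.endsWith(".tsx"),
    isValid: (path) =>
      /^src\/features\/[a-z0-9-]+\/components\/[A-Z][A-Za-z0-9]*\.tsx$/.test(path),
  },
  {
    description:
      "API calls live in src/features/<feature>/api and use kebab-case",
    appliesTo: (path) => path.includes("/api/"),
    isValid: (path) =>
      /^src\/features\/[a-z0-9-]+\/api\/[a-z0-9-]+\.ts$/.test(path),
  },
];

// Returns one human-readable message per broken rule.
function findViolations(paths: string[]): string[] {
  const violations: string[] = [];
  for (const path of paths) {
    for (const rule of rules) {
      if (rule.appliesTo(path) && !rule.isValid(path)) {
        violations.push(`${path}: ${rule.description}`);
      }
    }
  }
  return violations;
}

// Example run: the first two paths follow the conventions, the last one does not.
const violations = findViolations([
  "src/features/dashboard/components/RevenueChart.tsx",
  "src/features/dashboard/api/get-revenue.ts",
  "src/utils/revenueChart.tsx",
]);
console.log(violations);
```

A check like this could be fed the list of changed files from a pre-commit hook or a CI step, so that a misplaced or misnamed file is flagged to the agent (or the human) before it ever reaches a reviewer.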
Concrete example — Building our boilerplate
Being the CEO at Lonestone — a web development company — provided me with the perfect use case.
My team and I had been needing a shared codebase for several years. We work for a multitude of clients, and while we automate more and more of the coding process, the complexity of those projects still requires major human involvement. We also don't do "vibe coding": our quality bar is high and our projects cannot become "AI blackboxes".
We had been using a common tech stack for most projects for years, but creating and maintaining a fully functional boilerplate is a costly initiative. As with many things, LLMs reduced that cost dramatically, allowing me to start working on this new tech stack in 2025.
In this article, I'll define a boilerplate as a "ready-to-go" codebase that can be cloned and adapted for your project. Boilerplates have been around for a long time, especially in the "web agency" world, where new projects need to be spun up every day.
Contrary to simply copying, pasting and cleaning up your previous project to start a new one, boilerplates are a bit more flexible and are usually meant to be shared with your team.
I want to stress that I did not only aim to improve agents' productivity: developers in my team often switch from one project to another, and the company starts new projects every month, so a good boilerplate would boost the team's productivity too.
And from my experience, what works well for humans works just as well for agents.
After a few months of work on my own, and another few months working on the boilerplate with the whole team, we open-sourced it here. I'll often use this codebase to illustrate my points, so feel free to take a look (and to use it for your own projects!).
More than just the right choice of technologies, we aim to respect the five criteria above. And it pays off: our productivity skyrocketed, without any measurable negative impact on code quality.
Up next — Solid foundations matter more than ever (coming soon)
The two foundations that quietly shape every agent interaction before you write a single rule: a tasteful structure and the right tech.
Then — Documentation and processes do the heavy lifting (coming soon)
The parts you actually have to write down: documentation your agents (and your future self) keep coming back to, and the rules, hooks and custom tools that make those decisions stick.
Finally — Show, don't tell — and know what not to delegate (coming soon)
The parts that resist documentation: working examples that distill your taste better than any guideline, and the strategic choices you should never delegate to an LLM in the first place.