Notes on seeking wisdom and crafting software

On code agents

Coding agents are now a part of my daily workflow, both at home and at work. I’m beginning to see a trade-off between building a lot of features and shipping them fast on one hand, and quality and craftsmanship on the other. Let’s consider two scenarios.

At home, I recently purchased a developer subscription for an open-weights LLM. The provider offers Claude- and OpenAI-compatible API endpoints and integrates nicely with Claude Code and similar tools. I was already running aider with OpenRouter and the Gemini free plans, so the migration was straightforward. I primarily used this for Rust, C#, and some TypeScript hobby projects.

The work projects were much more challenging. We’re trying to migrate ~50 FluentUI v8 controls to v9. They’re part of a UX platform used in multiple downstream monorepos. Given that wider use, this code must meet a higher quality bar.

Here’s what I learned.

Pick joy over quantity

I cancelled the developer subscription for home. The obvious advantage of shipping more features quickly was outweighed by the drudgery of fixing AI slop.

The open-weights LLM plan actually punched well above its price point, and it was a joy to use Claude Code, one of the better CLI apps I’ve used recently. I’d happily consider them again.

In my scenario, I do these open-source hobby projects for fun and learning. By design they are privacy focused: I don’t track DAU or MAU, and I don’t manage OKRs.

A few things went wrong with the subscription.

  1. Focus on maximizing quantity: I noticed myself picking up the agent at every possible chance to churn out some feature or other. If I pay for it, I’d rather use it all day. Heck, I even tried to parallelize my study time with an agent running in the background building something or other. Not healthy.
  2. Too noisy: As the quantity of code produced went up, quality took a hit. Even with all possible instruction tuning, including focused planning and iteration, the agent would lose context and violate basic principles. The end result: most of my time went to reviewing and asking the agent to fix the slop it introduced. This is something I’d have avoided with active coding.
  3. Zero learning: AI is a more knowledgeable dev than I am, especially in areas that are new to me. With passive coding, the features got built and the build and tests passed, but without any real design on my part. Yes, I do a planning/architecture phase before I ask the LLM to code. Still, the nuances matter; getting involved matters.

The lesson: just like slow travel, I will embrace slow code. Nobody cares if I ship 30 features on a side project; I’d rather do it on my terms.

Decades ago I was introduced to ReSharper at work. Back then it was a linter on steroids with excellent suggestions, including refactorings. I genuinely learned more C# and .NET best practices from those linter warnings and errors than I have from the LLMs today.

Maybe today’s AI agent is more capable and runs without me, but I’d prefer its dumber counterpart where learning and joy matter.

Where quantity and speed matter

All that said, slow, craft-focused coding is no longer a viable option at work.

There are OKRs everywhere to build and ship in days. Heck, we measure AI usage, and you may get a gentle note to embrace it more. I must admit that I’m in such a role, pushing for AI dev productivity in my organization. Sorry for that; either you embrace AI and fit in, or you’re out.

When we started on the FluentUI v8 to v9 migration, the prototype suggested that something like 80-90% of the AI-generated code was good, along with awesome test coverage. We claimed we could complete the entire migration with 3x less effort.

All of this was exciting until we tried a complex prototype! Suddenly the v9 code generated by AI looked v8-like, missing the new slots architecture and the style design tokens. Like AI-first developers, we added a few MCP servers, including one that provided access to the FluentUI v9 docs, threw in detailed prompts with meticulous example code snippets, and the AI was able to generate decent code.

But with poor design choices. At this point we reset our expectations.

AI can migrate a simple control, with poor design, in a day or two. It will then take 2x that time for the Human to align it with our design principles and improve the quality.

It is not delegate-and-forget.

The Human must understand the FluentUI v9 architecture and its design decisions to be able to guide the AI. This requires a learning curve and is a prerequisite for AI-driven migration.
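
To make that concrete, here’s a minimal sketch of the v9 idioms the AI kept missing. The makeStyles, tokens, and slot APIs are real @fluentui/react-components exports; the wrapper control itself is invented for illustration and is not one of our platform’s controls.

```tsx
// Hypothetical v9-style wrapper control, sketched only to show the idioms.
// The component and its props are made up; the FluentUI v9 APIs
// (makeStyles, tokens, the icon slot) are the real ones.
import * as React from 'react';
import { Button, makeStyles, tokens } from '@fluentui/react-components';
import { DeleteRegular } from '@fluentui/react-icons';

// v9 styling: makeStyles + design tokens instead of v8-style inline style
// objects or theme.palette lookups.
const useStyles = makeStyles({
  root: {
    backgroundColor: tokens.colorNeutralBackground1,
    color: tokens.colorNeutralForeground1,
  },
});

// v9 composition: customization flows through slots (here the `icon` slot)
// instead of v8 prop bags like `iconProps`.
export const DeleteButton: React.FC<{ onDelete: () => void }> = ({ onDelete }) => {
  const styles = useStyles();
  return (
    <Button appearance="subtle" className={styles.root} icon={<DeleteRegular />} onClick={onDelete}>
      Delete
    </Button>
  );
};
```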

We codified this into the following heuristic.

  1. Generate design docs for the UX component along with a v8-to-v9 migration note. Use AI if you will, but the Human Developer must attest to the design choices and quality.
  2. Develop a custom GitHub Copilot chat mode for migration, with access to specific tools only.
  3. Use the design doc and develop with AI step by step. At each step, ensure the code is shippable, similar to the red-green-refactor cycle. Keep building a mental model in your head as the AI generates code.
  4. The bar for PR reviews remains the same as before: unit tests, storybooks, a11y testing, and code that you’re proud of (see the story sketch after this list).
  5. AI PR reviews are okay, but a human PR review is required. Responses to human reviewer comments must come from the human PR author. We learned that AI responses to human PR comments were too verbose and missed the point, leading to unnecessary review cycles.
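
For the storybook part of that bar, here’s roughly the shape we ask for, as a hedged sketch: the story targets a plain v9 Button stand-in rather than one of our real controls, and the title and args are made up; the Storybook CSF types and the FluentUI imports are real.

```tsx
// Hypothetical story for a migrated control. Every migrated control ships with
// stories exercising its v9 surface (appearance, slots, disabled states) so
// reviewers and a11y tooling have something concrete to inspect.
import * as React from 'react';
import type { Meta, StoryObj } from '@storybook/react';
import { Button } from '@fluentui/react-components';
import { AddRegular } from '@fluentui/react-icons';

const meta: Meta<typeof Button> = {
  title: 'Migration/Button', // made-up title for illustration
  component: Button,
};
export default meta;

type Story = StoryObj<typeof Button>;

export const Primary: Story = {
  args: {
    appearance: 'primary',
    icon: <AddRegular />, // icon is a v9 slot
    children: 'Add item',
  },
};

export const Disabled: Story = {
  args: {
    ...Primary.args,
    disabled: true,
  },
};
```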

The good part of this heuristic is that our human developers are no longer flying blind. They are in control, their architectural skills grow because we have human design discussions on each PR, and code continues to ship fast enough.


Here are my key takeaways.

I’m going to optimize for the craft of building software where I get a chance. This includes slow and mindful coding.

In an industry setting where the incentives are on faster delivery and a higher quantity of features, I will not fly blind. I will focus on setting up a process that works for the project, empowering the human in the loop, and improving the craft as much as I can.

I’d love to learn from your experience. Please do drop me an email if you’d like to chat more about this!