How should we program differently to be better suited to Agentic Coding AI?
It makes sense to make programming stricter overall given the advent of AI, because AIs benefit so much from good feedback.
If you have had the pleasure of using Cursor, Windsurf, or other agent-directed coding AI, you’ve certainly seen the pattern by now: the AI doesn’t always get it right, but often it can see and fix the errors itself. Often you have to tell it “Hey, I’m getting an error when I run the program”; sometimes you have to give it more direct guidance: “I think there’s a bug in how you’re joining the results of the two API calls together.” But sometimes it just can’t figure it out, and that usually seems to be when there is no good feedback mechanism to give the AI hints about what might be going wrong. We’ve all seen it flail hopelessly and get nowhere.
It’s clear that the AI does better when it gets clear, repeatable errors it can use to debug the problem. Even esoteric compiler errors seem to beat no errors by a long shot. What are some ways we might enable the AI with additional information?
Firstly, you might consider the human as the one typically playing this role. You might imagine that as we get better at using the AI, we get better at providing the feedback it needs. At a minimum we are evaluating whether the code is good and giving feedback to the AI. It is clear, however, that the long-term goal will be to minimize the need for humans to insert themselves here, given the high time cost. It’s also true that as the AI gets better, humans will get worse at detecting its errors via inspection. It’s been like this with every kind of machine learning.
Second, tests are a possibility: if you can rigorously specify tests with the AI, then the AI will use those tests to fix errors and make the code run as specified. However, it takes a lot more time due to the extra surface area of those tests. This is totally worth it for production code, but it’s a drag if you’re trying to toss together some small tool or MVP to get a job done. An alternative is to use the AI to target each error with a test as it comes up, as TDD programmers often do with production code, but that still slows you down substantially. In the end I imagine the testing approach will work best with legacy code or APIs you can’t change, as you don’t have many other options there.
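To make the “target each error with a test” idea concrete, here is a minimal sketch of the kind of regression test the AI could be asked to write once a bug surfaces, using the two-API-call join from earlier as the example. The module and function (`./merge`, `mergeUsersWithOrders`) are hypothetical stand-ins, and I’m using Node’s built-in test runner only to keep the example self-contained.

```typescript
// Hypothetical regression test pinning down a bug the AI kept reintroducing:
// results from two API calls were being joined by array position instead of by id.
import { test } from "node:test";
import assert from "node:assert/strict";
import { mergeUsersWithOrders } from "./merge"; // hypothetical module under repair

test("orders are joined to users by id, not by array position", () => {
  const users = [{ id: "u2", name: "Bea" }, { id: "u1", name: "Al" }];
  const orders = [{ userId: "u1", total: 10 }, { userId: "u2", total: 25 }];

  const merged = mergeUsersWithOrders(users, orders);

  assert.deepEqual(merged, [
    { id: "u2", name: "Bea", total: 25 },
    { id: "u1", name: "Al", total: 10 },
  ]);
});
```

Once a test like this exists, the AI has a repeatable failing signal to iterate against instead of a vague “I think the join is wrong.”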
Third, if you can control it, you likely want to make sure your API ecosystem gives good errors, or even wrap your APIs so they do. The biggest problem I’ve run into thus far involved an API that accepted any parameters and silently ignored the ones it wasn’t expecting. There are good reasons we build APIs like this today: forward compatibility, resilience to tools appending common parameters, and decoupling deployments. Potentially it’s time to rethink this and at least provide warnings, or let the client decide how strict it wants to be. A reasonable compromise might be to develop with the AI in strict mode, then turn it off for the prod deploy. As it is now, I expect the less well-trodden APIs to be a major stumbling block for coding AI. Potentially this will be sidestepped by MCP enabling the AI to get exact API definitions, but it remains to be seen whether that will matter.
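As a rough sketch of what that strict/lenient split could look like, here is a parameter check wrapped around an existing endpoint. The parameter names and the environment-based toggle are assumptions made up for illustration, not a prescription.

```typescript
// Sketch of "strict mode" parameter checking around an existing API handler.
// KNOWN_PARAMS and the query string below are invented for the example.
const KNOWN_PARAMS = new Set(["customerId", "status", "limit"]);

function checkParams(params: URLSearchParams, strict: boolean): void {
  const unknown = [...params.keys()].filter((k) => !KNOWN_PARAMS.has(k));
  if (unknown.length === 0) return;

  const message = `Unknown parameters: ${unknown.join(", ")}`;
  if (strict) {
    // During AI-assisted development: fail loudly so the agent gets a clear signal.
    throw new Error(message);
  }
  // In production: keep the old lenient behaviour, but at least leave a trail.
  console.warn(message);
}

// Flip strictness per environment: strict while developing, lenient in prod.
const STRICT = process.env.NODE_ENV !== "production";
checkParams(new URLSearchParams("customerId=42&cusomerId=7"), STRICT);
// strict mode -> throws "Unknown parameters: cusomerId", which the AI can act on
```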
Fourth, you might expect a good compiler can help you here. The more the type system itself can provide boundaries to the AI, the more that AI can iterate with guardrails toward correctness. This was the sales pitch of functional programming languages back when humans used to write code. I’ve said it so many times myself up on stage at conferences, and I still believe it: “Make Illegal States Unrepresentable”. What does that mean? Build a domain model that is always valid, typically enforced by type signatures and checks on inputs at the boundaries. You then pattern match on all of the various possible states of that model as exhaustively as you can, and you can be *nearly* sure the program will never be in an unexpected state. There are some caveats, but I’ll save those for another time.
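Here is a small TypeScript sketch of the idea, with an invented payment domain. The point is that the type admits no half-settled, no-receipt in-between state, and the compiler, rather than the programmer or the AI, is what guarantees every state gets handled.

```typescript
// A domain model where illegal states simply cannot be constructed:
// a settled payment always has a receipt, a failed one always has a reason.
type Payment =
  | { kind: "pending"; amountCents: number }
  | { kind: "settled"; amountCents: number; receiptId: string }
  | { kind: "failed"; amountCents: number; reason: string };

function describe(p: Payment): string {
  // Exhaustive pattern match: if a new variant is added to Payment and not
  // handled here, the `never` assignment below becomes a compile error the AI can see.
  switch (p.kind) {
    case "pending":
      return `Awaiting ${p.amountCents} cents`;
    case "settled":
      return `Paid, receipt ${p.receiptId}`;
    case "failed":
      return `Failed: ${p.reason}`;
    default: {
      const unreachable: never = p;
      return unreachable;
    }
  }
}
```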
I think many in the Functional Programming community imagine this fourth way being quite a powerful approach. I’m hopeful for it as well, because I find the correctness quite beautiful, intellectually pleasant, and less taxing to reason about. It may even be the case that if an AI could see all of the intermediate types, it could detect entirely new classes of error early, or even performance issues. However, all but the most popular languages are currently constrained by the fact that models tend to be much better at the most popular programming languages.
I think the key takeaway is that the dynamics in play here closely mimic the ones we had when humans were writing code: correctness vs. popularity, but the population has shifted. There’s only a small number of models, and they are mostly good at the same things, just to different degrees. We will need to take steps to change this if we want to give the fourth option a serious go, and I plan to write about that another time.