Programming is hard (but not in the way you think)
Programming is hard, and a lot of people just don’t get it. The nature of our work is that we rarely do the same thing in the same way twice; the whole point is to automate a process so that others, often non-programmers, can repeat it later. We solve problems that have no existing solution – after all, if one existed, we’d just download it!
Programming is also hard because many processes are ill-defined. For example, if someone gives us a “simple” request for a tool to reconcile two data sets, that’s not enough. They also need to tell us what each data set represents, what date ranges it encompasses, how often it is updated, what it is used for, and myriad other details that usually only emerge after extensive back-and-forth.
Humans are smart, so when they describe to a programmer how a process should work, they omit what they assume are the obvious steps. But computers are dumb; they need you to describe… every… obvious… step… in… excruciating… detail. A huge part of programming is breaking down a process conceptually into smaller and smaller pieces until it can be coded. For an experienced programmer, the coding is the easy part; it’s the conceptual work, the collaboration, the design, the architecture, and the debugging that can make us tear our hair out – but can also be deeply satisfying when we succeed.
Many attempts to make it easier have failed
As someone who’s been developing software for over 20 years, I’ve seen a long line of attempts at making it easier, many of dubious value. Of course, today we benefit from a lineage of programming tools that have brought ubiquitous benefits to the industry: compilers, virtual machines, debuggers, IDEs, base libraries and frameworks, APIs, package managers, and so forth. These advancements have made our work unquestionably more productive and pleasant.
But there have also been innumerable attempts to dumb down (or “democratize,” if you prefer) the job to make it more accessible and reduce the cost of hiring specialized, highly skilled workers: UML designers, low-code/no-code frameworks, visual query builders, and many more quixotic projects. These attempts have almost universally failed, because they miss the point – coding is not the hard part.
For example, to use a visual query builder well, you need to understand the underlying data set you’re querying, its performance characteristics, and its domain-specific semantics. You also need to understand basic relational database concepts like joins, foreign keys, and indexes. Once you understand those things, the visual query builder becomes a poor substitute for the underlying, textual Structured Query Language (SQL) itself – which is no surprise, as SQL was designed around relational algebra and is purpose-built for exactly this job.
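To make that concrete, here is a minimal sketch in Python with SQLite. The customers/orders schema is hypothetical, invented purely for illustration; the point is that knowing which tables to join, on which keys, and why is the real work, whether you type the SQL yourself or click it together in a builder.

```python
import sqlite3

# Hypothetical schema: two tables linked by a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL NOT NULL
    );
    -- An index on the foreign key keeps the join from scanning every order.
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 30.0), (2, 1, 12.5), (3, 2, 99.0)])

# The join a visual builder would happily generate for you; deciding that you
# need it, and on which keys, is the part no builder can decide for you.
rows = conn.execute("""
    SELECT c.name, SUM(o.total) AS lifetime_value
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY lifetime_value DESC
""").fetchall()
print(rows)  # [('Bob', 99.0), ('Alice', 42.5)]
```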
It’s as if someone looked at the landscaping profession, which requires a certain degree of physical strength, and mused, “I bet we could get more landscaping done for less money if we developed a pill that made anyone strong enough to be a landscaper.” Of course, this completely misses the point that the ability to lift a wheelbarrow is not the most important part of being a landscaper. Landscapers quickly acquire sufficient physical strength just by doing landscaping – it’s not even the hardest part of the job – just as people who write good database queries learn about SQL and databases in the course of their work. And the pill, like the SQL produced by visual query builders, probably has some bad side effects.
But the explosion of remarkably impressive AI that began with generative adversarial networks (GANs) and large language models (LLMs) in the 2010s and culminated in ChatGPT and similar interactive tools in the 2020s has brought a new kind of tool to the desktop of programmers: AI coding assistants. These tools promise to dramatically improve productivity by responding to natural-language prompts with working code and accurate explanations. They even suggest auto-completion of code while you are typing!
Are these tools a net gain for productivity, code quality, and programmer satisfaction? Some studies (mostly funded by the vendors of the tools) would have us believe so. As the manager of a small team of developers, I wanted to evaluate them for myself, so I picked what seemed to be the two most popular tools today: GitHub Copilot and Codeium.
My experience with Codeium and GitHub Copilot
Our use case for software development at Webnames.ca is perhaps quite different from the tests typically thrown at AI coding assistants in studies. We are not solving abstract problems, as in a programming contest. We are not a startup building greenfield foundational and plumbing code. We are not stringing together microservices based on a bunch of examples we found on Stack Overflow or ChatGPT. Instead, we are maintaining and expanding the capabilities of a system of over one million lines of code that has evolved across 20 years and integrates with dozens of other systems, both locally and remotely. (I think our use case is actually quite common in the industry; it’s just not the sexy kind of Bay Area-startup programming that you see trumpeted by the big FAANG companies.)
Our type of software development requires deep knowledge of existing business processes, background tasks, architecture, and private, third-party APIs. Over about a week, I happened to be doing more coding than I usually do, and it was work on a new feature (ironically, our new ChatGPT-based, AI-driven domain search suggestions), so it should have been a favorable test for an AI assistant.
Both Copilot and Codeium combine a predictive auto-complete feature (essentially IntelliSense on steroids) with an interactive chat feature. The interactive chat can already be done almost as well in a separate window/app with ChatGPT or any other LLM. You can even paste in snippets of your code to ask it questions. Unfortunately, I haven’t found it very useful for answering programming questions, except for one specific use case: translating code from one language to another. So I didn’t see huge value in that feature.
The auto-completion, however, is potentially extremely valuable, depending on how well it works. With Copilot, there is a helpful log available in Visual Studio under View > Output > Show output from > GitHub Copilot, and from that log you can count how many times you accepted a completion vs. rejected/dismissed it. My acceptance rate turned out to be about 6.5% (23 accepted suggestions out of 351 offers). But is that good?
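If you want to run the same tally on your own log, the arithmetic is trivial; here is a quick sketch using the counts I reported above (tallied by hand from the log, not computed by it):

```python
# Counts tallied by hand from the Copilot output log (see above).
accepted, offered = 23, 351
print(f"acceptance rate: {accepted / offered:.2%}")                    # 6.55%, i.e. roughly 6.5%
print(f"suggestions read per accepted one: {offered / accepted:.0f}")  # about 15
```

That second number is the one to keep in mind for the next point: at my rate, I was reading roughly fifteen suggestions for every one I kept.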
Well, each time you are offered a completion, you need to read the suggested code and decide whether it will work in your program. But it is well established in software engineering that it’s harder to read code than to write it. (By the way, that fantastic article was written by Joel Spolsky, co-founder of Stack Overflow, and I highly recommend his blog archives; even 20 years later, their wisdom remains relevant.) So even if over 50% of the AI suggestions were correct, it could still result in slowing down your overall coding speed.
My acceptance rate of 6.5% strongly suggests that I’m spending more time reading and dismissing auto-completions than I would spend without using completions at all. I have seen studies showing an average acceptance rate of 30%, and I suspect those participants skew heavily toward less experienced developers doing greenfield and/or boilerplate development, rather than our use case. But even at 30% acceptance, I feel there would be a net loss of productivity (see the studies linked below under Further Reading).
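To make that feeling concrete, here is a rough back-of-envelope model. The per-suggestion reading cost and the time saved by an accepted suggestion are numbers I made up purely for illustration, not measurements; the point is that the break-even acceptance rate is the ratio of the two, and it is easy for that ratio to sit well above what I actually experienced.

```python
def net_seconds_saved(acceptance_rate: float,
                      read_cost_s: float,
                      saved_per_accept_s: float,
                      offers: int = 100) -> float:
    """Net time saved over `offers` suggestions.

    Every offered suggestion has to be read and judged; only the accepted
    ones pay anything back. All parameters are illustrative assumptions.
    """
    accepted = offers * acceptance_rate
    return accepted * saved_per_accept_s - offers * read_cost_s

# Illustrative assumptions: ~10 s to read and judge each suggestion,
# ~25 s of typing saved by each accepted one.
for rate in (0.065, 0.30, 0.50):
    net = net_seconds_saved(rate, read_cost_s=10, saved_per_accept_s=25)
    print(f"{rate:.1%} acceptance -> net {net:+.0f} s per 100 suggestions")

# Break-even is read_cost_s / saved_per_accept_s = 10 / 25 = 40% under these
# assumptions, so both my 6.5% and the reported 30% average come out negative.
```

Change the assumed costs and the break-even point moves, but the shape of the trade-off stays the same: every suggestion taxes your attention, and only the accepted ones pay anything back.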
The more I reflect on my experience with AI coding assistance, the more I feel like someone promised to make me a better landscaper, and then made me 6.5% stronger at lifting wheelbarrows, as long as I read a poem before each lift.
Sadly, I have to conclude that Copilot did not significantly improve my productivity. If anything, it decreased it by interrupting me with low-quality solutions. (My tests with Codeium left me with a similar impression.)
The future of AI coding assistance
Don’t misunderstand me. I’m not claiming that AI coding assistants can never improve productivity. (I don’t want to be the guy who claimed cars would never replace horses.) Like any technology, AI coding assistants will improve over time. (Even medieval Europe didn’t really regress in technology, and technological progress today is far faster.) If the quality of suggestions given by AI coding assistants eventually increases enough, it’s conceivable that using them will represent a net productivity gain.
I’m sure GitHub Copilot and Codeium will both improve, and I plan to revisit them in a year or two. I have also heard some good things about Claude Sonnet and Tabnine and hope to evaluate those at some point.
Here it may be useful to step back and remind ourselves what a real assistant is. Usually, an assistant is a person capable of doing most of the tasks of the person they are assisting, but with less knowledge, experience, and skill. For example, in a pinch, a dental assistant could probably do much of what their dentist does, just not as well. In principle, an assistant who apprenticed under a skilled practitioner long enough could take over the job; after all, that’s how apprenticeships worked for hundreds of years.
To be a true assistant to programmers, the tool must be able to ask questions, reason, and creatively solve problems like a real programmer. Doing this requires general intelligence. Now, AIs already have more knowledge than any human, but they don’t have that intelligence, reasoning, and creativity, and there’s good reason to believe that LLMs specifically – currently the most popular type of AI – never will.
That doesn’t mean Artificial General Intelligence (AGI) is impossible; our brains are not magic, they’re just machines, and in principle there’s nothing stopping a sufficiently advanced machine with the right software from achieving (and likely quickly surpassing) human intelligence. Such a machine would undoubtedly make a tremendous AI coding assistant. But at that point, it will no longer be an assistant; rather, the AI itself could do the whole job – in fact, it could do literally any job a human could do, because it would think like a human, only faster.
What comes next? Only time will tell.
Further reading
Devs gaining little (if anything) from AI coding assistants
Is GitHub Copilot For Everyone?
Another Report Weighs In on GitHub Copilot Dev Productivity: 👎 – Visual Studio Magazine
Measuring GitHub Copilot’s Impact on Productivity – Communications of the ACM