AI is Not Yet Good at Debugging

Many people fear that artificial intelligence will take over their jobs, but for programmers that is not happening anytime soon. Microsoft Research reports that the technology is still bad at debugging code.

While the software industry is among the first to experiment with artificial intelligence, for example by building small apps with the help of GitHub Copilot or a large language model, Microsoft Research writes that AI is not yet good at the job that takes up the most time for many programmers: debugging.

With Debug Gym, Microsoft has built an environment in which AI models can try to debug existing code. The models are given tools that were not part of their training and must gradually learn to work with them.

AI models that have not been ‘trained’ in Debug Gym are more likely to be bad at debugging, Microsoft writes. Even models that have learned to work with many debugging tools in Debug Gym are still nowhere near as good as an experienced programmer.

The research is interesting because most companies that use LLMs and other AI models skip the step of teaching those models to work with debugging tools. The results are unlikely to be good: even with Debug Gym, the success rate is only 48.4 percent.

Microsoft even indicates that it sees a future in which AI suggests improvements to code, which an experienced programmer then has to approve. The idea that AI agents can completely replace expensive IT employees at many companies still seems far away.
