Google’s Big Sleep team announced that its large language model (LLM) agent discovered a previously unknown, exploitable bug in the popular open-source database engine SQLite, which was fixed before it reached an official release. It is believed to be the first public example of artificial intelligence (AI) uncovering a “real-world vulnerability” outside of a test sandbox.
“Fortunately, we found this issue before it appeared in an official release, so SQLite users were not impacted,” the Big Sleep team announced in a blog post. “We believe this is the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software.”
The team further suggested that the Big Sleep agent and similar LLM agents could offer tremendous defensive potential, including the ability to find vulnerabilities in software before it is released. That could leave “no scope” for attackers to compete, as potential bugs are fixed before a hacker can even exploit them.
Big Sleep is an evolution of Project Zero’s Naptime project, which was announced in June. The framework enables LLMs to autonomously perform basic vulnerability research by giving them the tools to test software for potential flaws in a human-like workflow: a code browser, a debugger, a reporter tool, and a sandboxed environment that can run Python scripts and record their outputs.
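To make the idea concrete, below is a minimal, hypothetical Python sketch of what such a tool-driven research loop might look like. The tool names, the ask_llm() call, and the loop structure are illustrative assumptions only; they are not Google’s actual Big Sleep implementation.

```python
# Hypothetical sketch of a tool-driven vulnerability-research loop, loosely
# modeled on the workflow described above. The tool names and ask_llm() are
# placeholders, NOT Big Sleep's real implementation.
import subprocess
import sys
import tempfile


def browse_code(path: str) -> str:
    """'Code browser' tool: return source text so the model can inspect it."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def run_python_sandboxed(script: str) -> str:
    """'Sandbox' tool: run a model-generated Python test case, capture output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
    proc = subprocess.run(
        [sys.executable, f.name], capture_output=True, text=True, timeout=30
    )
    return proc.stdout + proc.stderr


def report_finding(summary: str) -> None:
    """'Reporter' tool: record a suspected vulnerability for human triage."""
    print(f"[FINDING] {summary}")


def ask_llm(prompt: str) -> dict:
    """Placeholder for a call to whatever model decides the next tool action."""
    raise NotImplementedError("wire this to a model of your choice")


def research_loop(target_file: str, max_steps: int = 10) -> None:
    """Feed code to the model, let it run experiments, and report findings."""
    context = browse_code(target_file)
    for _ in range(max_steps):
        action = ask_llm(f"Given this code and output, propose a next step:\n{context}")
        if action["tool"] == "run_python":
            context += "\n" + run_python_sandboxed(action["input"])
        elif action["tool"] == "report":
            report_finding(action["input"])
            return
```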
LLM Agents of Good
There have been concerns that LLMs could all too easily be employed in nefarious or at least dubious ways, such as creating deepfake videos or generating content that spreads misinformation and disinformation. LLM agents like Big Sleep, however, point to a more positive application of AI.
“Using AI to study source code is truly one of the hot areas right now. Google’s discovery of an exploitable defect in SQLite illustrates the cybersecurity impact. In this case, it is ‘good guys’ finding a vulnerability before it became available to some malicious actor – AI improved our security,” said Dr. Jim Purtilo, associate professor of computer science at the University of Maryland.
“Given the vast amount of code being created, AI should be a powerful tool to help find vulnerabilities in new and existing code,” added Maili Gorantla, chief scientist at AI cybersecurity provider AppSOC.
Not only can AI handle the more tedious and mundane aspects of reviewing code, it can do so far faster than any human. That speed could prove extremely valuable, given how a single small defect, like the faulty update behind this past summer’s CrowdStrike incident, can disrupt systems around the world.
“The challenge of keeping up with vast numbers of threats and vulnerabilities has grown beyond human scale and requires automation with intelligent decision-making just to keep up,” Gorantla told ClearanceJobs. “But AI will benefit both sides in the security arms race with vastly more unique threats being AI-generated. It’s also critical that we secure the AI tools themselves. Smart attackers will both use AI offensively and try to corrupt or poison the AI tools being used for defense.”
That could be part of the broader story about how AI techniques are improving security.
“The tools are being used to translate and improve legacy source code, removing old issues along the way, and also to flag gaps in operations which might have left critical sections exposed to malicious actors,” Purtilo told ClearanceJobs.
However, the discovery also serves as proof of concept that LLM agents could be used for nefarious purposes.
“We need to assume that ‘bad guys’ are using AI to search for defects too though,” Purtilo continued. “The question is who finds those defects faster? Any cutting-edge development shop should be employing LLMs to screen code before deployment as a matter of routine.”
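As a purely illustrative example of what “routine” LLM screening before deployment might look like, the sketch below collects the changes headed for production and asks a model to flag likely security defects. The review_with_llm() function is an assumed placeholder for whatever model or service a team actually uses, not a real API.

```python
# Hypothetical pre-deployment screening step: gather the pending diff and ask
# an LLM to flag likely security defects. review_with_llm() is a placeholder.
import subprocess
import sys


def changed_diff(base: str = "origin/main") -> str:
    """Collect the diff that is about to be deployed."""
    return subprocess.run(
        ["git", "diff", base, "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout


def review_with_llm(diff: str) -> list[str]:
    """Placeholder: send the diff to an LLM and return any flagged concerns."""
    raise NotImplementedError("connect to your organization's model")


def main() -> int:
    findings = review_with_llm(changed_diff())
    for finding in findings:
        print(f"[LLM-REVIEW] {finding}")
    # A non-zero exit code lets a CI pipeline block the deployment for review.
    return 1 if findings else 0


if __name__ == "__main__":
    sys.exit(main())
```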