Novel security risks from generated code. Insecurity as a form of AI safety.


Security problems in application development can be subtle. Generative ML for code introduces novel ones.

First, code generation reduces developer understanding. Even a developer with a deep understanding of their system is likely to miss vulnerabilities. With generation, the developer no longer has to pay attention to the meaning of default choices, and their mastery of the code suffers.

Second, greater ease of generation will reduce the need for libraries. Libraries remain stable over time, allowing security problems to be detected and fixed, and fixes to accumulate. Generated code is new every time.

Third, the only way to accumulate fixes to standard patterns is to retrain the generators. The challenge is to prevent them from recreating code containing the same security problems again and again. Generators are notoriously complex and opaque. Their complexity is fundamental. Retraining generators is a non-linear process.


The nature of writing using generative ML is to put a lower burden on the typing stage and a higher burden on the editing stage. For example, asking ChatGPT to write a haiku about rain in Spain may lead to one about snow in Honduras. The human using the AI must carefully read the output for semantic errors.

Generative ML is an excellent tool for some software engineering, which often requires code that is similar but not identical. In this paradigm an engineer crafts a program by describing components in enough detail for the AI to generate compliant code. The engineer reads the generated code for semantic errors and, if they find any, improves the description to correct them.

The concept maps naturally onto the Read–Eval–Print Loop (REPL) in languages such as Python, where the interpreter:

  1. (R)eads in code entered by the developer
  2. (E)xecutes the code
  3. (P)rints the results, allowing the developer to identify problems
  4. (L)oops back to step 1 after the developer has modified their code to fix the problems identified in step 3.
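The loop above can be sketched in a few lines of Python. Here `repl_step` is a hypothetical helper, not a real library function: it covers the Read, Execute, and Print steps, with the Loop supplied by the caller feeding in revised code after reviewing the previous output.

```python
import io
import contextlib

def repl_step(source: str) -> str:
    """(R)ead source, (E)xecute it, and return what it (P)rinted.

    The (L)oop is the caller resubmitting corrected code after
    inspecting the output of the previous step.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})  # execute in a fresh, empty namespace
    return buffer.getvalue()

print(repl_step("print(2 + 2)"))  # → 4
```

A chatbot-enhanced IDE swaps the human typist out of the Read step, but the Print-and-inspect step stays firmly with the human.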

IDEs like Jupyter and Visual Studio, and new IDEs like e2b, are already being enhanced to support chatbot-style generation.


Step 3 in the REPL loop - examining the results of the program to evaluate correctness - has to be enhanced. This is the moment when the developer acts as an editor. They need new tools to see and understand the impact of generated code. For example, there should be quick access to a tool to visualize the abstract syntax tree (AST), so that the developer can see when their intention doesn't match reality.
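Python's standard-library `ast` module already gets partway there. A minimal sketch, assuming the snippet being inspected is a hypothetical piece of generated code:

```python
import ast

# A hypothetical line of generated code the developer wants to inspect.
generated = "total = sum(x * 2 for x in range(10))"

# ast.parse builds the abstract syntax tree; ast.dump renders it in a
# readable form, so the developer-editor can compare their intention
# against what was actually generated.
tree = ast.parse(generated)
print(ast.dump(tree, indent=2))
```

An IDE could surface this view with one keystroke, turning "read the generated code" into "read the generated structure."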

Code generators need to embrace the use of libraries so that security problems can be iterated on.

Developer-editors must learn how to write requirements that lead generators to create secure code. They must ask generators to use trustworthy libraries.

Developer-editors must learn how to evaluate the security of generated code. They must be trained to see unsafe patterns.
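One concrete example of an unsafe pattern worth training an eye for: generated code that assembles SQL by string interpolation rather than parameterization. The table and function names below are hypothetical, chosen only to make the contrast runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user_unsafe(name: str):
    # Unsafe pattern: interpolating input into SQL invites injection.
    return conn.execute(
        f"SELECT name FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name: str):
    # Safe pattern: a parameterized query keeps data out of the SQL text.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)).fetchall()

# A classic injection payload leaks every row from the unsafe version
# and nothing from the safe one.
payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # → [('alice',)]
print(find_user_safe(payload))    # → []
```

Both versions are plausible generator output; only a developer-editor who knows the pattern will reject the first.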

Code generators must learn how to perform security testing on their output. They must be able to fuzz their code, check for unsafe patterns, and prefer lower-risk patterns.
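The unsafe-pattern check, at least, can be mechanical. A toy sketch of a checker a generator could run on its own Python output, using a small and entirely hypothetical deny-list of dangerous calls:

```python
import ast

# Hypothetical deny-list; a real one would be far larger.
UNSAFE_CALLS = {"eval", "exec", "pickle.loads"}

def unsafe_calls(source: str) -> list[str]:
    """Return the deny-listed call names found in `source`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            name = func.id  # bare call like eval(...)
        elif isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
            name = f"{func.value.id}.{func.attr}"  # module call like pickle.loads(...)
        else:
            continue
        if name in UNSAFE_CALLS:
            found.append(name)
    return found

print(unsafe_calls("import pickle\ndata = pickle.loads(blob)"))  # → ['pickle.loads']
print(unsafe_calls("result = json.loads(blob)"))                 # → []
```

A generator that ran such a check before emitting code could regenerate flagged output instead of handing the problem to the human editor.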

Security testing must also be incorporated at the training stage, so that insecure code is not used as an exemplar. This is a form of AI safety.
