Darius Baruo
Mar 18, 2026 17:55
OpenAI explains why Codex Security uses AI constraint reasoning instead of traditional static analysis, aiming to cut false positives in code security scanning.

OpenAI has published a technical deep-dive explaining why its Codex Security tool deliberately avoids traditional static application security testing (SAST), instead using AI-driven constraint reasoning to find vulnerabilities that conventional scanners miss.

The March 17, 2026 blog post arrives as the SAST market, valued at $554 million in 2025 and projected to hit $1.5 billion by 2030, faces growing questions about its effectiveness against sophisticated attack vectors.
The Core Problem with Traditional SAST

OpenAI's argument centers on a fundamental limitation: SAST tools excel at tracking data flow from untrusted inputs to sensitive outputs, but they struggle to determine whether security checks actually work.

"There's a big difference between 'the code calls a sanitizer' and 'the system is safe,'" the company wrote.

The post cites CVE-2024-29041, an Express.js open redirect vulnerability, as a real-world example. Traditional SAST might trace the dataflow easily enough. The actual bug? Malformed URLs bypassed allowlist implementations because validation ran before URL decoding, a subtle ordering problem that source-to-sink analysis couldn't catch.
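The ordering bug is easy to sketch. The Python example below is illustrative only, not the actual Express.js code: the allowlist policy (relative paths only, no protocol-relative URLs) and the payload are hypothetical, but they show why checking an encoded URL and decoding it afterward defeats the check.

```python
from urllib.parse import unquote

def redirect_unsafe(target: str) -> str:
    # BUG (validate-then-decode): the check runs on the still-encoded URL.
    # Policy: allow only same-site relative paths ("/..."), and reject
    # protocol-relative URLs ("//host").
    if not target.startswith("/") or target.startswith("//"):
        raise ValueError("blocked")
    return unquote(target)  # decoding AFTER the check re-opens the hole

def redirect_safe(target: str) -> str:
    # Fix (decode-then-validate): check what the browser will actually see.
    decoded = unquote(target)
    if not decoded.startswith("/") or decoded.startswith("//"):
        raise ValueError("blocked")
    return decoded

# "/%2Fevil.com" decodes to "//evil.com", a protocol-relative redirect
# to an attacker's host. It slips past the buggy check untouched.
print(redirect_unsafe("/%2Fevil.com"))
```

A dataflow-only scanner sees a validation step on the path from input to redirect and marks it safe; the flaw lives in the order of operations, not in any missing sanitizer call.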
How Codex Security Works Differently

Rather than importing a SAST report and triaging findings, Codex Security starts from the repository itself, examining architecture, trust boundaries, and intended behavior before validating what it finds.

The system employs several techniques:
Full repository context analysis, reading code paths the way a human security researcher would. The AI doesn't automatically trust comments; adding "// this is not a bug" above vulnerable code won't fool it.

Micro-fuzzer generation for isolated code slices, testing transformation pipelines around single inputs.

Constraint reasoning across transformations using z3-solver when needed, particularly useful for integer overflow bugs on non-standard architectures.

Sandboxed execution to distinguish "could be a problem" from "is a problem" with actual proof-of-concept exploits.
Why Not Use Both?

OpenAI addressed the obvious question: why not seed the AI with SAST findings and reason deeper from there?

Three failure modes, according to the company. First, premature narrowing: a SAST report biases the system toward areas already examined, potentially missing entire bug classes. Second, implicit assumptions about sanitization and trust boundaries that are hard to unwind when wrong. Third, evaluation difficulty: separating what the agent discovered independently from what it inherited makes measuring improvement nearly impossible.
Competitive Landscape Heating Up

The announcement comes amid intensifying competition in AI-powered code security. Just one day later, on March 18, Korean security firm Theori launched Xint Code, its own AI platform targeting vulnerability detection in large codebases. The timing suggests a race to define how AI transforms application security.

OpenAI was careful not to dismiss SAST entirely. "SAST tools can be excellent at what they're designed for: enforcing secure coding standards, catching straightforward source-to-sink issues, and detecting known patterns at scale," the post stated.

But for finding the bugs that cost security teams the most time (workflow bypasses, authorization gaps, state-related vulnerabilities), OpenAI is betting that starting fresh with AI reasoning beats building on top of traditional tooling.
Documentation for Codex Security is available at developers.openai.com/codex/security/.
Image source: Shutterstock
