45% of AI-Generated Code Ships with Security Flaws. Here's What That Means for You.

A 2022 Stanford study found that developers who used an AI coding assistant built on OpenAI's Codex model wrote significantly less secure code than those who worked without one, and were more likely to believe their code was secure. Since then, LLM coding assistants have gotten substantially more capable - and substantially more popular. Google has said publicly that more than a quarter of its new code is now generated by AI. That's not a fringe phenomenon anymore. It's the default workflow for a generation of developers.

So when researchers at Veracode analyzed a large corpus of AI-generated code last year and found that roughly 45% contained at least one security flaw, it should have been front-page news in engineering circles. Instead, most teams treated it as a known risk and moved on. The productivity gains were too good to give up. That's understandable - and it's also a problem.

What the Flaws Actually Look Like

The vulnerabilities that AI coding tools produce aren't usually exotic. They're classic, well-documented issues that good developers know to avoid. SQL injection still shows up constantly. An LLM will generate a database query that concatenates user input directly into a string because that pattern exists everywhere in its training data - StackOverflow answers from 2011, tutorial blog posts, legacy codebases. The model learned from imperfect examples and reproduces those imperfections at scale.
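
To make the concatenation pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the function names (make_db, find_user_unsafe, find_user_safe) are invented for illustration and are not taken from any real codebase.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users (name, email) VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is spliced into the SQL text itself,
    # so a crafted name can change the meaning of the query.
    query = "SELECT name, email FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the driver passes the value separately from the SQL,
    # so it is always treated as data, never as query syntax.
    return conn.execute(
        "SELECT name, email FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "nobody' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # matches every row in the table
    print(find_user_safe(conn, payload))    # matches nothing
```

Both functions behave identically for ordinary names like "alice", which is exactly why the unsafe version survives testing.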

Hardcoded secrets are another big one. Ask an LLM to write a function that connects to an S3 bucket and there's a real chance it fills in the credentials inline - because that's what countless examples in its training data showed. It doesn't understand that the correct approach is an environment variable or a secrets manager. It pattern-matches to what it's seen. If the developer doesn't catch it in review, those credentials end up in git history. Sometimes permanently.
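
The environment-variable approach can be sketched in a few lines. The variable names STORAGE_ACCESS_KEY_ID and STORAGE_SECRET_ACCESS_KEY, the MissingSecretError class, and the load_storage_credentials function are all hypothetical; a real project would use whatever names its cloud SDK or secrets manager expects.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


# Hypothetical variable names, chosen for this example only.
REQUIRED_VARS = ("STORAGE_ACCESS_KEY_ID", "STORAGE_SECRET_ACCESS_KEY")


def load_storage_credentials(env=None) -> dict:
    """Read credentials from the environment instead of the source file."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Fail loudly at startup rather than falling back to an inline default.
        raise MissingSecretError(
            "missing required environment variables: " + ", ".join(missing)
        )
    return {
        "access_key_id": env["STORAGE_ACCESS_KEY_ID"],
        "secret_access_key": env["STORAGE_SECRET_ACCESS_KEY"],
    }
```

The design choice that matters is the hard failure: there is no default value in the code, so nothing secret can end up in a commit.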

Insecure defaults are subtler and arguably more dangerous. AI-generated Express.js servers often skip security-header middleware such as Helmet. Generated CORS configurations frequently use * as the allowed origin. Password hashing shows up as a single pass of MD5 or SHA-1 when the developer asked for "password hashing" without specifying the algorithm. None of these are bugs in the traditional sense - the code works fine in testing. They only matter when an attacker gets involved.
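
The password-hashing case is easy to show side by side. This sketch uses scrypt from Python's standard library purely to stay self-contained; it stands in for a dedicated password-hashing function such as bcrypt or argon2, which need third-party packages. The cost parameters and function names are assumptions for illustration, not tuned recommendations.

```python
import hashlib
import hmac
import os

# Assumed cost parameters for illustration; tune them for your own hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32, "maxmem": 64 * 1024 * 1024}


def weak_md5_hash(password: str) -> str:
    # The insecure default: fast, unsalted, identical for identical passwords.
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using a random salt and a slow, memory-hard KDF."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)
```

Both versions pass a login test. Only the second one makes a stolen password table expensive to crack.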

Why Vibe Coding Makes This Worse

"Vibe coding" - the practice of describing what you want in plain English and accepting whatever the AI produces with minimal review - has become a genuine phenomenon. Pieter Levels built a browser-based flight simulator with Cursor that reportedly hit $50K MRR. Indie hackers are shipping functional products in days that would have taken months before. The productivity story is real.

The security story is also real. When you're moving fast and accepting AI output without reading it carefully, you don't just inherit functional code - you inherit whatever security assumptions the model made. And those assumptions are often wrong. The model doesn't know your threat model. It doesn't know that your app will handle financial data, or medical records, or that your users will try to break it. It's optimizing for code that works, not code that's safe.

The developers who suffer most from this are often the ones who needed AI assistance in the first place - early-stage founders with limited security experience who are building quickly and don't have a security engineer to review their PRs. The irony is brutal: the tool that makes it possible for a non-expert to ship software also silently embeds vulnerabilities that a non-expert won't recognize.

SAST: The Obvious Answer That Most Teams Skip

Static Application Security Testing - SAST - is the practice of analyzing source code for vulnerabilities without executing it. Tools like Semgrep, Checkmarx, and Snyk Code scan your codebase for known vulnerability patterns and flag them before they reach production. This is not new technology. SAST has been around for decades. But adoption among smaller engineering teams remains stubbornly low, because the traditional objection was always: "we don't have time to deal with all the false positives."
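
To show what "analyzing source code without executing it" means, here is a toy check built on Python's ast module. It is not how Semgrep or any commercial tool is implemented; it is a deliberately small sketch that flags one pattern, a call to a method named execute whose first argument is an f-string or a string built with + or %. The class and function names are invented for this example.

```python
import ast


def _is_dynamic_string(node: ast.AST) -> bool:
    """True for f-strings with placeholders or strings built with + or %."""
    if isinstance(node, ast.JoinedStr):
        return any(isinstance(part, ast.FormattedValue) for part in node.values)
    if isinstance(node, ast.BinOp):
        return isinstance(node.op, (ast.Add, ast.Mod))
    return False


class SqlConcatFinder(ast.NodeVisitor):
    """Collects (line, message) pairs for suspicious execute() calls."""

    def __init__(self) -> None:
        self.findings = []

    def visit_Call(self, node: ast.Call) -> None:
        func = node.func
        is_execute = isinstance(func, ast.Attribute) and func.attr == "execute"
        if is_execute and node.args and _is_dynamic_string(node.args[0]):
            self.findings.append(
                (node.lineno, "dynamically built SQL passed to execute()")
            )
        self.generic_visit(node)  # keep walking nested calls


def scan_source(source: str) -> list:
    """Parse the source text (never run it) and return the findings."""
    finder = SqlConcatFinder()
    finder.visit(ast.parse(source))
    return finder.findings


SAMPLE = '''
def lookup(cursor, name):
    cursor.execute("SELECT * FROM users WHERE name = '" + name + "'")
    cursor.execute(f"SELECT * FROM users WHERE name = '{name}'")
    cursor.execute("SELECT * FROM users WHERE name = ?", (name,))
'''

if __name__ == "__main__":
    for line, message in scan_source(SAMPLE):
        print(f"line {line}: {message}")
```

This toy version misses a query that is assembled in a variable first and executed later. Tracking data across statements like that is a large part of what production scanners add.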

That objection used to have some merit. Older SAST tools were notoriously noisy. A scan would return hundreds of findings and developers would learn to ignore all of them. In security, an alert that nobody looks at is worse than no alert at all - it creates a false sense of coverage while doing nothing.

Modern SAST is different. Semgrep with a well-tuned rule set can achieve false positive rates below 10%. More importantly, the rules can be customized to your stack - if you're building on FastAPI with SQLAlchemy, you can write rules that understand your specific ORM patterns and catch injection vulnerabilities that generic tools would miss. That's the version of SAST that actually works in practice.

What Good Looks Like in Practice

The teams that handle this well aren't the ones that ban AI coding tools - they're the ones that treat AI output like third-party dependencies. You don't ship npm packages without checking them. You shouldn't ship AI-generated code without running it through a scanner.

The practical implementation is a CI/CD gate. Every pull request runs SAST before it can be merged. Critical and high findings block the merge. Medium findings generate a comment in the PR that the developer has to acknowledge. This doesn't slow teams down much - a Semgrep scan on a typical repo takes under two minutes. What it does is ensure that the developer at least sees the issue before it lands in main.
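
The gate policy itself is only a few lines of logic. The sketch below assumes the scanner's output has already been normalized into dictionaries with severity, rule, path, and line keys. That shape is hypothetical and does not match any particular tool's report format, so a real pipeline would need an adapter in front of it.

```python
BLOCKING_SEVERITIES = {"critical", "high"}
ACK_SEVERITIES = {"medium"}


def evaluate_findings(findings: list) -> dict:
    """Split findings into merge blockers and ones that need acknowledgement."""
    blocking = [f for f in findings if f["severity"].lower() in BLOCKING_SEVERITIES]
    needs_ack = [f for f in findings if f["severity"].lower() in ACK_SEVERITIES]
    return {
        "allow_merge": not blocking,
        "blocking": blocking,
        "needs_ack": needs_ack,
    }


def format_pr_comment(finding: dict) -> str:
    """Render one finding as a line a developer can act on in the PR."""
    return (
        f"[{finding['severity'].upper()}] {finding['rule']} "
        f"at {finding['path']}:{finding['line']}"
    )


def gate_exit_code(findings: list) -> int:
    """Return 0 when the merge may proceed, 1 when it must be blocked."""
    decision = evaluate_findings(findings)
    for finding in decision["blocking"] + decision["needs_ack"]:
        print(format_pr_comment(finding))
    return 0 if decision["allow_merge"] else 1
```

A CI job would pass the return value of gate_exit_code to sys.exit so that a nonzero status fails the required check, while the printed lines become the PR comment.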

There's also a prompt engineering angle that's underexplored. If you add security requirements to your AI instructions - "use parameterized queries for all database operations," "never hardcode credentials," "use bcrypt or argon2 for password hashing" - the models follow them reasonably well. They're not perfect, but you cut the error rate meaningfully. The combination of better prompts plus automated scanning is far more effective than either alone.
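
One low-effort way to apply this is to keep the security requirements in one place and prepend them to every request. The sketch below only builds the instruction text; the function name and the layout of the prompt are assumptions, and how the text reaches the model depends on the tool you use.

```python
# The three requirements quoted above, kept in one place so every prompt gets them.
SECURITY_REQUIREMENTS = (
    "Use parameterized queries for all database operations.",
    "Never hardcode credentials.",
    "Use bcrypt or argon2 for password hashing.",
)


def build_instructions(task: str, requirements=SECURITY_REQUIREMENTS) -> str:
    """Prefix a coding task with a fixed list of security requirements."""
    rules = "\n".join(f"- {rule}" for rule in requirements)
    return (
        "Security requirements (apply to all generated code):\n"
        f"{rules}\n\n"
        f"Task:\n{task.strip()}"
    )


if __name__ == "__main__":
    print(build_instructions("Write a login endpoint for a FastAPI app."))
```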

One more thing worth saying explicitly: none of this requires slowing down. The developers who are most productive with AI coding tools tend to also be the most disciplined about their guardrails, because they've learned through experience that debugging a security incident is a lot slower than preventing one. The goal isn't to add friction - it's to make the feedback loop tight enough that vulnerabilities get caught in the minute after they're written, not the year after they're exploited.

AI-assisted development is here to stay. The companies that figure out how to use it safely will ship faster and more securely than those who either reject it outright or use it naively. That's the genuine opportunity right now - not getting rid of your Cursor subscription, but adding the layer of automated security review that makes it safe to trust what it produces.
