CTFs Are Not Dead, They’re Just Growing Up

TL;DR#

There’s a growing narrative that Capture The Flag (CTF) competitions are “dead” because of modern LLMs. The argument is simple: if a model can solve challenges faster than humans can, what’s the point?

I think that’s the wrong conclusion.

What’s actually happening is something more interesting. CTFs are splitting into two very different worlds, and both still matter.


The “LLM killed CTFs” take#

The concern is not baseless.

Recent work has shown that LLMs can already solve a non-trivial portion of CTF-style problems. For example, LLMs have demonstrated the ability to solve binary exploitation and web challenges with tool augmentation, especially when prompts include structured hints or intermediate feedback (Fang et al., 2024). Similarly, AutoCTF-style agents can autonomously chain reasoning and tools to solve tasks end-to-end (Zhang et al., 2024).

Even OpenAI and Anthropic have shown that models can perform multi-step reasoning and tool use in security-relevant contexts, including reverse engineering and vulnerability discovery (OpenAI, 2024; Anthropic, 2024).

So yes, if your challenge is:

  • a known vuln pattern
  • a standard crypto primitive misuse
  • or a simple reversing puzzle

then a strong model plus a bit of scaffolding can absolutely solve it.
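Take “standard crypto primitive misuse” as a concrete case. Spotting AES-ECB from repeated ciphertext blocks is exactly the kind of pattern a model reproduces instantly, because the detection logic is a textbook few lines. A minimal sketch (the ciphertext here is a stand-in with a deliberately repeated block, not real AES output):

```python
from collections import Counter

def looks_like_ecb(ciphertext: bytes, block_size: int = 16) -> bool:
    """ECB encrypts identical plaintext blocks to identical ciphertext
    blocks, so any repeated block is a strong tell."""
    blocks = [ciphertext[i:i + block_size]
              for i in range(0, len(ciphertext), block_size)]
    return any(count > 1 for count in Counter(blocks).values())

# Simulated ciphertext: first and third 16-byte blocks are identical.
ct = bytes(range(16)) + bytes(16) + bytes(range(16))
print(looks_like_ecb(ct))  # prints True
```

If a challenge reduces to logic this well-trodden, an LLM solving it is unsurprising.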

But that says more about the challenge than about the death of CTFs.


Low-level CTFs were never about the leaderboard#

Let’s be honest for a second.

Beginner and intermediate CTFs were never about “who is the smartest hacker alive”. They were about learning.

If someone uses an LLM to solve a basic buffer overflow challenge and takes first place, that’s fine, but they learned nothing.

Meanwhile, someone else who:

  • steps through the binary in GDB
  • understands stack layout
  • writes the exploit manually

is actually building skill.
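The payoff of that manual route is that the exploit becomes a few lines you fully understand. A sketch of the final step, under hypothetical assumptions (a 64-byte buffer and a `win()` function at `0x401196` recovered from GDB on x86-64; both values are made up for illustration):

```python
import struct

# Hypothetical stack layout recovered in GDB: 64-byte buffer,
# then the saved base pointer, then the saved return address.
BUF_SIZE = 64
SAVED_RBP = 8
RET_ADDR = 0x401196  # hypothetical address of a win() function

payload = b"A" * (BUF_SIZE + SAVED_RBP)   # fill buffer + saved rbp
payload += struct.pack("<Q", RET_ADDR)    # overwrite return address

assert len(payload) == 80
```

Every constant here maps to something you verified in the debugger, which is the whole point of doing it by hand.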

This aligns with long-standing educational research: active problem-solving leads to deeper understanding than passive solution consumption (Chi et al., 1989).

So nothing is lost here.

And honestly, if your goal is just to win a leaderboard in a non-top-tier CTF by throwing LLM prompts at it, it’s worth asking what that win even means. If the competition can be mostly automated, it no longer meaningfully validates skill, and winning it does not say much. The leaderboard only has value when the underlying challenges demand real expertise.

If your goal is to learn, CTFs still work exactly the same. The leaderboard has always been a bad proxy for understanding anyway.


The real shift: high-end CTFs are becoming research problems#

This is where things get interesting.

Top-tier CTFs like DEF CON Finals, PlaidCTF, or Google CTF have already been moving in this direction for years. Now LLMs are accelerating that trend.

Modern high-end challenges increasingly require:

  • novel exploitation techniques
  • deep understanding of mitigations
  • chaining multiple domains (crypto + reversing + systems)
  • or even discovering unintended behaviors

These are not easily solvable by current LLMs alone.

Why?

Because LLMs are still heavily bounded by training data and pattern generalization (Bubeck et al., 2023). When a challenge requires:

  • reasoning about something new
  • exploring an unknown attack surface
  • or forming hypotheses and testing them iteratively

the human is still in the loop.

Even in autonomous agent research, models struggle with long-horizon planning and exploration in unfamiliar domains (Xi et al., 2023).

And that’s exactly what high-end CTFs are becoming: mini research problems.


LLMs as tools, not replacements#

What’s actually emerging is a new workflow.

Instead of replacing participants, LLMs are becoming:

  • fast documentation readers
  • boilerplate generators
  • idea expanders
  • sanity checkers

This is similar to how compilers and debuggers changed programming: they didn’t eliminate programmers; they raised the floor.

There’s already evidence that human + AI collaboration outperforms either alone in complex tasks (Khan et al., 2024).

So in a CTF context:

  • The LLM helps you move faster
  • But you still need to know what you’re doing

Otherwise you just prompt blindly and hope.


This might actually be good for security research#

Here’s the part I find most exciting.

If low-tier challenges become trivial for AI, and mid-tier ones become semi-automatable, then:

The only way to keep CTFs interesting is to push them toward novel techniques.

That means:

  • more original vulnerabilities
  • more creative primitives
  • more cross-domain challenges
  • more “this shouldn’t work but it does” situations

In other words, closer to real-world security research.

And that has a side effect:

It incentivizes participants to become actual researchers.

This aligns with how fields evolve under tooling pressure. For example, automation in software engineering shifted focus toward higher-level design and architecture problems (Brooks, 1987).

CTFs may follow the same path.


The skill gap will widen#

One thing that will happen is divergence.

There will be:

  1. People who rely heavily on LLMs and plateau early
  2. People who use LLMs as leverage and go deeper

This is consistent with studies showing that tools amplify existing skill differences rather than equalize them (Brynjolfsson & McAfee, 2014).

So instead of “AI democratizing CTFs”, we might actually see:

  • faster beginners
  • but much stronger experts

So… are CTFs dead?#

No.

They’re just changing shape.

  • Beginner CTFs: still great for learning, leaderboard matters less than ever
  • Mid-tier CTFs: partially automatable, good for practicing workflows
  • Top-tier CTFs: increasingly research-driven, still human-dominated

And if anything, the high end is becoming more interesting, not less.


Final thoughts#

If your goal is to win a leaderboard by throwing prompts at an LLM, especially in a CTF that is not top-tier, then you might be optimizing for something that has very little real value.

But if your goal is to:

  • understand systems
  • break assumptions
  • discover new techniques

then nothing has changed.

You’ll still need to think.

And at the top level, you’ll need to think harder than ever.


References#

  • Fang et al., LLMs for Autonomous CTF Solving, 2024
  • Zhang et al., AutoCTF Agents, 2024
  • OpenAI, GPT-4 Technical Report, 2024
  • Anthropic, Claude Research, 2024
  • Chi et al., Self-explanations improve understanding, 1989
  • Bubeck et al., Sparks of AGI, 2023
  • Xi et al., Limitations of LLM Planning, 2023
  • Khan et al., Human-AI Collaboration, 2024
  • Brooks, No Silver Bullet, 1987
  • Brynjolfsson & McAfee, The Second Machine Age, 2014
http://caverav.cl/posts/ctfs-not-dead/ctfs-not-dead/
Author: Camilo Vera
Published: 2026-04-11
License: CC BY-NC-SA 4.0