The Bottleneck Moved. Your Metrics Probably Didn't.

GitQuick


AI coding tools boosted commits by 180%, but releases only rose 30%. Recent engineering debates and real PR data from major open-source organizations point to the same problem: code review is becoming the new delivery constraint.

I keep coming back to one question:

How long do code reviews actually take?

Some teams may have an answer. Not many have data. GitHub shows you events: opened, commented, approved, merged. It does not show whether your review process is healthy. No median latency. No reviewer load. No “this PR sat for four days and nobody looked at it.” You usually find out when someone complains in Slack.

That gap is why I built GitQuick.

And it is exactly what the last month of engineering discourse has been pointing at, just with better vocabulary.

Photo by Kevin Ku on Unsplash

What changed in the last 30 days

This is not a slow-burn trend anymore. It has become one of the central engineering conversations.

Stack Overflow’s engineering blog put it plainly on June 18: when output doubles but review capacity does not, something has to give. The bottleneck moves from writing code to review and judgment.

A CEPR/NBER study of 100,000+ GitHub developers published on June 21 put numbers behind the same pattern:

Autonomous coding agents: +180% commit activity
The same developers working on about 50% more projects
Actual releases: about 30% more

Writing code got faster. Shipping did not keep up. The weak link is downstream: review, integration, testing, and release.

r/EngineeringManagers circulated the same pattern from the field: a 741% increase in lines of code translating to roughly 20% more releases. More code in the pipe. The same human judgment at the end.

On Hacker News, “When I reject AI code even if it works” resonated for the same reason: the issue was not that the code failed tests. The issue was that reviewers could not trust what they were looking at.

A highly upvoted comment on an r/cscareerquestions thread said it without dressing it up:

“I haven’t written a line of code in 6 months. I have read a lot of incorrect code and poor implementations produced by Claude Opus…”

Madrona’s survey of 49 engineering leaders asked where bottlenecks have shifted. 57% said code-review queue time. Second place: spec clarity. Not “we need faster models.”

And Codacy’s write-up cites data showing that agentic AI PRs sit in the review queue 5.3× longer than unassisted ones. The queue is not growing because reviewers are lazy. It is growing because the input rate changed and most teams are not measuring the constraint.

Developers are not becoming reviewers by choice. The job changed under them.

This should not surprise anyone who works in this field. Developers generally enjoy writing code more than reading it. Review has always been one of the slowest steps in the pipeline. AI made it more visible by increasing the rate at which code arrives for review.

The part barely anyone measures

Everyone feels this. Almost nobody has the numbers.

GitHub will tell you:

This PR was opened
Someone commented
Someone approved
It merged

GitHub will not tell you:

How long PRs wait before anyone looks
Whether your P90 is hiding a week of pain behind a four-hour median
What percentage of merges skip review entirely
Whether one reviewer is carrying 40% of the load
Whether PRs are getting bigger while everyone insists they are not
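The last item is mostly bookkeeping once you have the numbers. Here is a minimal Python sketch. It assumes you already have each PR's creation time plus its `additions` and `deletions` counts; GitHub's REST API reports both on the single-pull-request endpoint. The tuple input shape and the function name are illustrative, not part of any library:

```python
import statistics
from collections import defaultdict
from datetime import datetime


def weekly_median_size(
    prs: list[tuple[datetime, int, int]],
) -> dict[tuple[int, int], float]:
    """Median changed lines (additions + deletions) per ISO (year, week).

    Each input tuple is (created_at, additions, deletions) for one PR.
    """
    by_week: dict[tuple[int, int], list[int]] = defaultdict(list)
    for created_at, additions, deletions in prs:
        year, week, _ = created_at.isocalendar()
        by_week[(year, week)].append(additions + deletions)
    # Sort by week so the trend reads oldest to newest.
    return {wk: statistics.median(sizes) for wk, sizes in sorted(by_week.items())}
```

A rising weekly median is the signal. The median resists being skewed by one giant generated-lockfile PR.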

Anthropic’s own engineering blog describes the Claude Code team dealing with exactly this: “Verification, code review, and security took their place” as the new constraints.

Even the people building the tools hit the wall.

What the data looks like when you actually measure it

I pointed GitQuick at ten public GitHub organizations to stress-test the pipeline. Same metrics, same time window, very different review cultures. Here are six of them.

Live data from git-quick.dev/showcase, June 2026.

Org	PRs analyzed	Median time to merge	P90 time to merge	Merged PRs/week	Merged without review
Microsoft	37,758	3.9h	92h	4,484	8.9%
Google	7,152	7.1h	119h	704	52.1%
AWS	3,397	13.5h	190h	315	9.5%
Kubernetes	2,033	12.2h	201h	191	38.3%
Anthropic	1,897	3.2h	47h	165	0.1%
Netflix	1,666	3.4h	186h	33	52.4%

One pattern is clear, and it maps directly onto what engineers have been saying online.

The median lies politely

Microsoft’s median merge time is under four hours. That sounds like a very fast machine.

But the P90 is 92 hours: one in ten PRs waits almost four days or longer to merge.

Across these six organizations, the slowest 10% of PRs take at least 14× to 24× as long to merge as the median PR, and about 55× at Netflix. That is the long tail most dashboards hide.
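The tail-to-median ratio is easy to recompute from the table. This quick Python check uses only the published median and P90 figures. The dictionary is transcribed by hand from the showcase table above:

```python
# (median hours, P90 hours) per org, copied from the showcase table.
MERGE_TIMES: dict[str, tuple[float, float]] = {
    "Microsoft": (3.9, 92),
    "Google": (7.1, 119),
    "AWS": (13.5, 190),
    "Kubernetes": (12.2, 201),
    "Anthropic": (3.2, 47),
    "Netflix": (3.4, 186),
}


def tail_ratios(times: dict[str, tuple[float, float]]) -> dict[str, float]:
    """P90 divided by median for each org, rounded to one decimal place."""
    return {org: round(p90 / median, 1) for org, (median, p90) in times.items()}


if __name__ == "__main__":
    # Print orgs from the tightest tail to the longest.
    for org, ratio in sorted(tail_ratios(MERGE_TIMES).items(), key=lambda kv: kv[1]):
        print(f"{org:<11} P90 is {ratio:>5}x the median")
```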

This is the pattern HN has discussed for years: when reviews take a week, it is rarely a week of eyes-on-code. It is usually a week of waiting for someone to start.

The CEPR study’s gap between +180% commit activity and +30% releases makes more sense when you see this shape in the data.

What can you do about it?

You do not need a full process overhaul. Three moves address most of what the data and the last month of discussion keep pointing at.

1. Measure the queue, not the vibe

GitHub will not tell you if a review is healthy.

Pull latency by stage:

Time to first review
Time to approval
Time to merge

Then report median and P90 together.
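Here is a minimal Python sketch of per-stage latency. Each stage is measured from when the PR was opened, and the report gives median and P90 side by side. The `PullRequest` record and the stage names are hypothetical. With GitHub's REST API, `created_at` and `merged_at` come from the pull request, and review times would be the earliest `submitted_at` among its reviews. P90 here uses the nearest-rank definition. Other percentile methods give slightly different numbers on small samples.

```python
import statistics
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PullRequest:
    # Hypothetical record shape; None means the PR never reached that stage.
    created_at: datetime
    first_review_at: Optional[datetime]
    approved_at: Optional[datetime]
    merged_at: Optional[datetime]


# Report name -> PullRequest attribute holding that stage's timestamp.
STAGES = {
    "time_to_first_review": "first_review_at",
    "time_to_approval": "approved_at",
    "time_to_merge": "merged_at",
}


def p90(values: list[float]) -> float:
    """Nearest-rank 90th percentile: the smallest value >= 90% of the data."""
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) in integer math
    return ordered[rank - 1]


def stage_latency(prs: list[PullRequest]) -> dict[str, dict[str, float]]:
    """Median and P90 hours from PR creation to each stage.

    A PR that never reached a stage is skipped for that stage only.
    """
    report = {}
    for name, attr in STAGES.items():
        hours = []
        for pr in prs:
            reached_at = getattr(pr, attr)
            if reached_at is not None:
                hours.append((reached_at - pr.created_at).total_seconds() / 3600)
        if hours:
            report[name] = {"median": statistics.median(hours), "p90": p90(hours)}
    return report


if __name__ == "__main__":
    base = datetime(2026, 6, 1)
    # Ten PRs first reviewed 1..10 hours after opening, plus one never reviewed.
    prs = [
        PullRequest(base, base + timedelta(hours=h),
                    base + timedelta(hours=h + 1), base + timedelta(hours=h + 2))
        for h in range(1, 11)
    ]
    prs.append(PullRequest(base, None, None, None))
    for stage, stats in stage_latency(prs).items():
        print(f"{stage}: median {stats['median']:.1f}h, P90 {stats['p90']:.1f}h")
```

On that toy data, time to first review comes out at a 5.5h median and a 9.0h P90. The never-reviewed PR drops out of every stage rather than being counted as zero.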

A good median with a bad P90 means your happy path is fine and your long tail is eating people alive.

Add merge-without-review rate and reviewer load to the same view. Those two numbers explain much of the Netflix-vs-Anthropic gap in the table above: fast with loose control, or fast with a tight gate.

Either can be intentional. Neither should be invisible.
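Merge-without-review rate is a single ratio. The tricky part is deciding what counts as a review. This sketch assumes a PR counts as reviewed only if someone other than its author submitted a review. The `MergedPR` record is hypothetical; GitHub's per-PR reviews endpoint is one place to get reviewer logins:

```python
from dataclasses import dataclass, field


@dataclass
class MergedPR:
    # Hypothetical record: who opened the PR and everyone who reviewed it.
    author: str
    reviewers: list[str] = field(default_factory=list)


def merged_without_review_rate(prs: list[MergedPR]) -> float:
    """Share of merged PRs with no review from anyone other than the author."""
    if not prs:
        return 0.0
    # all() is True for an empty reviewer list, so "no reviews" counts too.
    unreviewed = sum(
        1 for pr in prs if all(reviewer == pr.author for reviewer in pr.reviewers)
    )
    return unreviewed / len(prs)


if __name__ == "__main__":
    prs = [
        MergedPR("ana", ["ben"]),
        MergedPR("ben"),                  # merged with no review at all
        MergedPR("cy", ["cy"]),           # only the author left a review
        MergedPR("dee", ["ana", "ben"]),
    ]
    print(f"{merged_without_review_rate(prs):.0%}")  # 50%
```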

2. Fix time-to-first-touch before asking anyone to “review faster”

When reviews take a week, it is usually not a week of reading diffs. It is a week of waiting for someone to start.

Agentic PRs already sit in the queue 5.3× longer than unassisted ones. The input rate changed; review habits did not.

Cap open PR WIP per developer so the queue stops growing.

Alert on PRs with no first review after a few hours.

Block dedicated review time in the calendar the same way you block standup.
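The first two of those moves can run as a scheduled check. This is a minimal Python sketch. The `OpenPR` record is hypothetical, and the 4-hour threshold and WIP cap of 3 are placeholder values standing in for "a few hours" and "a cap". Tune both to your team. Timestamps are assumed to be timezone-aware UTC, which is how GitHub reports them.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class OpenPR:
    # Hypothetical record for one open, non-draft pull request.
    number: int
    author: str
    created_at: datetime
    first_review_at: Optional[datetime] = None


def awaiting_first_review(
    prs: list[OpenPR], now: datetime, threshold: timedelta = timedelta(hours=4)
) -> list[int]:
    """Numbers of PRs that have waited at least `threshold` with no review."""
    return [
        pr.number
        for pr in prs
        if pr.first_review_at is None and now - pr.created_at >= threshold
    ]


def over_wip_cap(prs: list[OpenPR], cap: int = 3) -> dict[str, int]:
    """Authors holding more than `cap` open PRs, with their current count."""
    open_counts = Counter(pr.author for pr in prs)
    return {author: n for author, n in open_counts.items() if n > cap}


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    queue = [
        OpenPR(101, "ana", now - timedelta(hours=9)),
        OpenPR(102, "ana", now - timedelta(hours=1)),
        OpenPR(103, "ben", now - timedelta(hours=12), now - timedelta(hours=11)),
    ]
    print("needs a reviewer:", awaiting_first_review(queue, now))  # [101]
    print("over WIP cap:", over_wip_cap(queue, cap=1))  # {'ana': 2}
```

Where the alert goes is a separate choice: a Slack message, a PR comment, or a review request to a rotating owner.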

Pressuring people to approve faster is how you get rubber stamps. Shrinking time-to-first-touch is how you unblock the pipeline without lowering the bar.

3. Automate the mechanical layer and reserve humans for judgment

AI did not invent review bottlenecks. It increased the volume hitting them.

The useful split is simple:

CI and bots catch lint, tests, dependency issues, formatting, and obvious failures.
Humans judge architecture, business logic, maintainability, and whether they trust the change, especially on AI-generated diffs they did not write.

Run linters and tests before requesting review.

Optionally run AI review in a fresh context, not with the same agent that wrote the code.

By the time a teammate opens the PR, they should be answering “is this the right approach?” not “why is there an unused import on line 7?”
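"Run linters and tests before requesting review" can be a tiny gate script. Here is a minimal Python sketch. It runs each check in order and reports the failures. A missing tool counts as a failure, so a broken setup never waves a PR through. The function name is illustrative, and the commands you pass in are your project's own:

```python
import subprocess


def run_checks(commands: list[list[str]]) -> list[str]:
    """Run each check command in order; return the ones that did not pass.

    A command that exits non-zero, or whose executable cannot be found,
    counts as a failed check.
    """
    failed = []
    for cmd in commands:
        try:
            passed = subprocess.run(cmd).returncode == 0
        except FileNotFoundError:
            passed = False  # tool not installed: treat as a failed check
        if not passed:
            failed.append(" ".join(cmd))
    return failed
```

For a Python project that might be `run_checks([["ruff", "check", "."], ["pytest", "-q"]])`. Request review only when the returned list is empty. The same gate fits equally well as a pre-push hook or a required CI job.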

What I would actually do this week

If I were an EM reading the CEPR paper and staring at a growing PR queue, I would not start by buying another AI coding seat.

I would answer three questions with data:

What is our P90 time to first review? Not median. Median tells you the happy path. P90 tells you who is blocked.
What percentage of PRs merge without a recorded review? Maybe that is fine for docs and dependency bumps. Maybe it is not. Either way, it should be a number on a dashboard, not a surprise in a retro.
Is the review load concentrated? The queue grows quietly when two people review 60% of everything. That does not always show up in standup. It shows up in burnout.
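The concentration question reduces to one ratio: what share of all reviews comes from the busiest few people. This is a minimal Python sketch. The function name and the default of two reviewers are illustrative, echoing the "two people review 60%" example rather than any standard threshold:

```python
from collections import Counter


def top_reviewer_share(review_logins: list[str], top_n: int = 2) -> float:
    """Fraction of all reviews submitted by the `top_n` busiest reviewers.

    Pass one login per review (or per reviewed PR, if you dedupe first).
    """
    if not review_logins:
        return 0.0
    busiest = Counter(review_logins).most_common(top_n)
    return sum(count for _, count in busiest) / len(review_logins)


if __name__ == "__main__":
    reviews = ["ana"] * 4 + ["ben"] * 2 + ["cy", "dee", "eve", "fay"]
    print(f"top two reviewers: {top_reviewer_share(reviews):.0%} of reviews")  # 60%
```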

Those are the metrics GitQuick surfaces, plus rule-based signals that explain what is wrong, how bad it is, and what to try next.

Not another wall of charts you will ignore by Thursday.

See it before you connect anything

The part I am most proud of is the Showcase.

Before you install a GitHub App or grant organization access, you can browse real analysis on Microsoft, AWS, Google Cloud, Kubernetes, Anthropic, Netflix, and more.

Same pipeline as private organizations. Public GitHub metadata only.

Pick an organization you already have opinions about. See if the numbers match your intuition.

They usually do. And that is what makes the product more credible than a landing-page screenshot.

Sign in with GitHub using read-only OAuth, and you get reviewer load breakdowns and side-by-side organization comparison. Still no app install required for the public datasets.

Try it on your org

The conversation this month is not “should we use AI to code?”

It is:

Can our review pipeline absorb what AI is producing?

Most teams do not know the answer yet.

The data is probably worse than you think, and more fixable than you expect.

Browse the Showcase →

Try GitQuick on your org →

Showcase metrics are derived from public GitHub metadata only. They reflect review-process signals, not code quality or internal engineering culture. GitQuick is not affiliated with or endorsed by the organizations featured.