Developer Productivity: You Can't Measure It with One Number

McKinsey published "Yes, you can measure software developer productivity" in August 2023 and the industry lost its mind. The backlash from engineers, engineering leaders, and productivity researchers was immediate and sustained. Kent Beck, co-author of the Agile Manifesto, called the framework "actively dangerous." Gergely Orosz, author of The Pragmatic Engineer, co-wrote a detailed two-part rebuttal with Beck. Dan North called it "the McKinsey 'Developers are like bricklayers' article."

The backlash was not about whether productivity can be measured. It was about what McKinsey proposed measuring and the perverse incentives those measurements create.

I manage 35-50 engineers across multiple client projects. I measure team performance constantly. But I do not use a single number. Nobody should.

Why Single-Number Metrics Fail

Lines of code

The most obvious bad metric and the easiest to dismantle. A developer who writes 1,000 lines of code per day is not 10x more productive than one who writes 100. They might be 10x less productive — writing verbose, poorly structured code that a better engineer would express in 100 lines.

The best engineering work is often subtractive. Deleting 500 lines of dead code. Replacing a 200-line function with a 30-line library call. Refactoring a module so the next feature requires 50 lines instead of 500. By the lines-of-code metric, these improvements look like negative productivity.

Nobody serious uses lines of code anymore. But the instinct behind it — measuring output volume — persists in more sophisticated disguises.

Story points completed

Story points were designed for estimation, not measurement. They are team-relative, not absolute. A team that estimates generously completes more points. A team that estimates conservatively completes fewer. Comparing point velocity between teams, or using it as a productivity metric for individuals, produces exactly the wrong incentives: inflate estimates, cherry-pick easy tickets, avoid complex work that takes longer than the points suggest.

Ron Jeffries, who is often credited with inventing story points, has explicitly said they should not be used for productivity measurement. "Story points are for planning, not for management," he wrote. The industry adopted his tool and used it for the opposite of its intended purpose.

Pull requests per week

Counting PRs per week incentivizes small PRs. Small PRs are generally good for code review. But a developer who splits a feature into 10 tiny PRs to improve their metrics is not more productive than one who submits 2 well-scoped PRs that accomplish the same work. They are gaming the metric and sending their teammates five times as many review requests.

DORA metrics (misapplied)

Google's DORA metrics — deployment frequency, lead time, change failure rate, time to restore — are excellent team-level indicators of software delivery performance. They are not individual productivity metrics. A team that deploys 50 times per week has a healthy pipeline. An individual who creates 50 deployments per week might be introducing instability.

DORA explicitly measures at the team level. The researchers who developed the metrics warn against applying them to individuals.
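To make "team level" concrete, here is a minimal Python sketch that computes three of the four DORA metrics from one team's deployment log (time to restore is left out because it needs incident data). The Deployment record and its field names are hypothetical, not a schema DORA defines. Note that nothing in the calculation attributes a deploy to a person.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    # Hypothetical record shape; adapt it to whatever your pipeline logs.
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it reached production
    failed: bool            # whether it caused a failure needing remediation


def team_dora_summary(deploys: list[Deployment], weeks: float) -> dict:
    """Summarize delivery performance for one team over a period of `weeks`."""
    if not deploys or weeks <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_seconds = [
        (d.deployed_at - d.committed_at).total_seconds() for d in deploys
    ]
    return {
        "deploys_per_week": len(deploys) / weeks,
        "median_lead_time_hours": median(lead_seconds) / 3600,
        "change_failure_rate": sum(d.failed for d in deploys) / len(deploys),
    }


# Usage with made-up data: four deploys by one team over two weeks.
start = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deployment(start, start + timedelta(hours=h), failed)
    for h, failed in [(2, False), (4, True), (6, False), (8, False)]
]
summary = team_dora_summary(deploys, weeks=2)
```

The record deliberately has no author field. Adding one is exactly the step that turns a pipeline health indicator into an individual scoreboard.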

What Actually Measures Productivity

Outcome over output

The question is not "how much did the engineer produce?" It is "what business outcome did the engineering produce?"

RiseMD: 20X ROI from $160K in marketing spend. That is a measurable outcome from the platform we built. The number of story points, PRs, or deploys that produced it is irrelevant. The business outcome is the metric.

Greek House: went from releases every few months to same-day deploys, which enabled Inc. 5000 growth and eventually an acquisition. The outcome was business growth unlocked by engineering capability. No single-number productivity metric captures that.

Ripe: acquired by Hungry after 5 years of development. The acquirer's engineers could read, understand, and extend the codebase. That code quality — not the volume of code — is what made the exit possible.

Delivery against commitment

Can the team deliver what it committed to in the sprint? Not story points. Not velocity. The actual working software that was planned, built, tested, and shipped.

A team that commits to 5 features and delivers 5 is performing well. A team that commits to 10 and delivers 6 is performing poorly, even if it completed 120 story points along the way. The commitment-to-delivery ratio is a better signal than any volume metric.

We track this on every client engagement. Our sprint review shows: what was planned, what was delivered, and what slipped. The ratio tells us whether the team is healthy, overcommitted, or struggling. No single number. A conversation about the gap between plan and reality.
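The planned/delivered/slipped view can be sketched in a few lines of Python. This is an illustrative sketch, not our actual tooling; the function name, the dictionary keys, and the feature IDs are made up for the example.

```python
def sprint_review(planned: set[str], delivered: set[str]) -> dict:
    """Compare what the team committed to with what actually shipped."""
    shipped = planned & delivered
    return {
        "delivered": sorted(shipped),
        "slipped": sorted(planned - delivered),
        "unplanned": sorted(delivered - planned),
        # Ratio of committed items that shipped; None if nothing was planned.
        "delivery_ratio": len(shipped) / len(planned) if planned else None,
    }


# The two teams from the example above: 5 of 5 versus 6 of 10.
team_a = sprint_review(
    planned={"A1", "A2", "A3", "A4", "A5"},
    delivered={"A1", "A2", "A3", "A4", "A5"},
)
team_b = sprint_review(
    planned={f"B{i}" for i in range(1, 11)},
    delivered={f"B{i}" for i in range(1, 7)},
)
```

Story points never enter the calculation. The "slipped" list is the useful part: it is the agenda for the conversation, not a score.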

Quality signals over time

Defect rate, change failure rate, and time-to-resolve are quality signals that correlate with sustainable productivity. A team that ships fast but creates bugs is not productive — they are generating rework that consumes future productivity.

Track defect rate per sprint. If it is trending up, the team is cutting corners (possibly under pressure from unrealistic deadlines or from burnout). If it is trending down, the team's practices are improving. The trend matters more than any single data point.
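One simple way to look at the trend rather than a single data point is to fit a straight line through the per-sprint defect counts and read its slope. The sketch below does that with an ordinary least-squares fit in plain Python. It assumes evenly spaced sprints, and the defect counts in the usage lines are invented.

```python
def defect_trend(defects_per_sprint: list[float]) -> float:
    """Least-squares slope of defect counts across consecutive sprints.

    A positive slope means defects are rising sprint over sprint,
    a negative slope means they are falling.
    """
    n = len(defects_per_sprint)
    if n < 2:
        raise ValueError("need at least two sprints to see a trend")
    mean_x = (n - 1) / 2  # sprints are indexed 0, 1, ..., n-1
    mean_y = sum(defects_per_sprint) / n
    numerator = sum(
        (x - mean_x) * (y - mean_y) for x, y in enumerate(defects_per_sprint)
    )
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


rising = defect_trend([4, 6, 8, 10])   # two more defects each sprint
falling = defect_trend([9, 7, 8, 5])   # noisy, but heading down
```

A noisy series like the second one is the reason to fit a line: sprint three looks like a regression on its own, but the overall direction is still downward.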

Developer experience surveys

Ask the developers. "On a scale of 1-5, how productive do you feel this sprint?" "What blocked you?" "What would make you more productive?" Self-reported productivity is often a better signal of actual output than any external activity metric. Research involving Microsoft Research, including the SPACE framework, treats satisfaction and perceived productivity as core dimensions of productivity rather than as soft extras.

DX (Developer Experience) is an emerging field precisely because the research shows that developer satisfaction, perceived productivity, and actual delivery are strongly correlated. Happy developers who feel productive are productive. Unhappy developers who feel blocked are not. The survey is cheaper and more accurate than any dashboard of computed metrics.

The McKinsey Problem

McKinsey's framework proposed measuring individual developers on "inner loop" and "outer loop" activities, contribution analysis, and talent capability assessments. The framework is internally consistent. The problem is the incentive structure it creates.

When you measure inner-loop speed (how fast a developer writes and tests code), developers optimize for speed over quality. When you measure outer-loop throughput (how fast code moves through review and deployment), developers pressure reviewers to approve faster. When you measure contribution relative to peers, developers compete instead of collaborate.

The engineering teams that perform best — Google's Project Aristotle data confirms this — are teams with high psychological safety, where members help each other, share knowledge freely, and are not afraid to admit mistakes. Individual productivity measurement undermines psychological safety by creating competitive dynamics that punish collaboration.

How We Measure at EltexSoft

We do not measure individual developer productivity. We measure team delivery against client commitments.

Sprint delivery ratio. What percentage of committed work was delivered? Target: 85-95%. Below 80% consistently means the team is overcommitting or blocked. Above 95% consistently means the team is undercommitting (sandbagging).
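The thresholds above translate into a small check over recent sprints. In this sketch, "consistently" is interpreted as "every one of the last three sprints"; that window is an assumption for illustration, as are the function name and the returned labels. Ratios between 80% and 85% fall below target without crossing the 80% line, so the sketch treats them as no consistent signal.

```python
def assess_delivery(ratios: list[float], window: int = 3) -> str:
    """Classify a team from its most recent sprint delivery ratios (0.0-1.0)."""
    recent = ratios[-window:]
    if len(recent) < window:
        return "not enough data"
    if all(r < 0.80 for r in recent):
        return "overcommitting or blocked"
    if all(r > 0.95 for r in recent):
        return "possibly undercommitting"
    # Mixed results, or ratios in or near the 85-95% target band.
    return "no consistent signal"


struggling = assess_delivery([0.90, 0.75, 0.70, 0.78])
sandbagging = assess_delivery([1.00, 0.97, 1.00])
one_bad_sprint = assess_delivery([0.90, 0.70, 0.92])
```

Requiring every sprint in the window to cross the line keeps one bad sprint from triggering an alarm, which matches the point that the trend matters more than a single data point.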

Client satisfaction. Does the client feel the team is performing? This is subjective. It is also the metric that determines whether the engagement continues. A team with perfect velocity metrics but an unhappy client is failing.

Code quality trends. Technical debt trajectory (improving or declining), defect rate trend, test coverage trend, deployment confidence. These are lagging indicators of practices, not leading indicators of productivity. But they reveal whether the team is building sustainably.

Retention. Our engineers stay for years. High retention is a proxy for healthy engineering culture, interesting work, and sustainable pace. The teams that retain people are the teams that perform well. Attrition is a productivity metric — the most honest one.

Business outcomes. Ultimately, the question is whether the engineering produced business value. HeyTutor grew from a one-page spec to a marketplace with 10,000+ tutors and LAUSD as a client. Nautical Commerce raised $30M and was acquired. Greek House made Inc. 5000 and was acquired. These outcomes are not captured by story points, PRs per week, or lines of code. They are captured by whether the engineering was good enough to make the business succeed.

You cannot measure developer productivity with a single number. You can measure team health, delivery consistency, quality trends, and business outcomes. The aggregate of these signals tells you more than any McKinsey framework ever will.


Last updated September 29, 2024
