
84% of Developers Use AI Tools. Only 29% Trust the Output.

84% of developers use AI tools. Only 29% trust them. METR's study: AI made experienced developers 19% slower. The productivity paradox.

Dennis Vorobyov
Founder & CEO
August 3, 2025 · 7 min read

Stack Overflow's 2025 Developer Survey found that 84% of developers use or plan to use AI tools such as GitHub Copilot, ChatGPT, Claude, and Cursor. Adoption is near-universal.

The same survey found that only 29% trust the output.

That gap (84% usage, 29% trust) is the most important number in software engineering right now. Developers are using tools they do not trust. They are generating code they review with suspicion. They are faster at producing text that might be wrong.

I run an engineering studio. Our engineers use AI tools. I use AI tools. The question is not whether to use them. The question is whether the productivity gains are real or illusory.

The Productivity Paradox

GitHub's claim: Developers using Copilot complete tasks 55% faster.

METR's finding (2025, randomized controlled trial): Experienced open-source developers using AI tools took 19% longer on real-world tasks in their own repositories. Not faster. Slower.

How can both be true?

GitHub measured an isolated, self-contained coding task: in its controlled experiment, developers implemented an HTTP server in JavaScript. This is the "write a function that does X" kind of exercise, where the AI has seen millions of similar examples. The developer accepts the suggestion, maybe modifies it, and the task is done faster.

METR measured real work: fixing bugs, implementing features, and refactoring code in production open-source repositories. These tasks require understanding the existing codebase, navigating complex dependencies, respecting architectural patterns, and testing against real-world edge cases. The AI suggestions were plausible but often wrong in ways that required time to discover and correct.

The paradox: AI makes simple tasks faster and complex tasks slower. Most production engineering work is complex.
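
To see how the two figures can combine, here is a back-of-the-envelope sketch in Python. It makes assumptions the studies themselves do not: it reads "55% faster" as simple tasks taking 55% less time, reads "19% slower" as complex tasks taking 19% more time, and applies each figure to a whole category of work. The `simple_share` values are hypothetical mixes, not measurements.

```python
# Assumed time multipliers (illustrative readings of the two headline numbers).
SIMPLE_MULTIPLIER = 0.45   # "55% faster" read as 55% less time
COMPLEX_MULTIPLIER = 1.19  # "19% slower" read as 19% more time


def net_time_multiplier(simple_share: float) -> float:
    """Total time with AI relative to without, for a given mix of work.

    simple_share is the fraction of baseline time spent on simple tasks.
    """
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    complex_share = 1.0 - simple_share
    return simple_share * SIMPLE_MULTIPLIER + complex_share * COMPLEX_MULTIPLIER


def break_even_share() -> float:
    """Share of simple work at which AI neither saves nor costs time."""
    return (COMPLEX_MULTIPLIER - 1.0) / (COMPLEX_MULTIPLIER - SIMPLE_MULTIPLIER)


for share in (0.10, 0.20, 0.50):  # hypothetical mixes of simple work
    print(f"{share:.0%} simple work -> {net_time_multiplier(share):.3f}x baseline time")
print(f"break-even at {break_even_share():.1%} simple work")
```

Under these assumptions, a team only comes out ahead if more than roughly a quarter of its baseline hours go to simple tasks. Below that, the slowdown on complex work outweighs the speedup on simple work.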

Where AI Tools Help

Based on what our team actually uses them for:

Boilerplate generation. Database migrations, CRUD endpoints, test scaffolds, configuration files. The code that follows a known pattern and has no domain-specific complexity. AI handles this well.

Documentation. Generating docstrings, README sections, and API documentation from existing code. AI reads the code and describes what it does. This saves time and produces reasonable first drafts.

Exploring unfamiliar APIs. "Show me how to use Stripe Connect for marketplace payment splitting" produces a useful starting point. Not production code, but a map of the API surface that accelerates the developer's understanding.

Refactoring patterns. "Convert this class component to a functional component with hooks" or "rewrite this callback chain as async/await." Mechanical transformations where the pattern is well-established.

Test generation. Given a function, generate unit tests covering the obvious cases. The developer still needs to add edge cases and domain-specific tests, but the scaffolding saves time.
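
As a rough illustration of the test generation case, here is a minimal Python sketch. The function `apply_discount` and its rules are hypothetical, invented for this example. The first test class is the kind of obvious-case scaffold a tool tends to draft quickly; the second holds the edge cases a developer would still add by hand.

```python
import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price after a percentage discount, in whole cents.

    Hypothetical function for illustration; not from any real codebase.
    """
    if price_cents < 0:
        raise ValueError("price_cents must be non-negative")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Integer arithmetic avoids float rounding surprises; rounds down.
    return price_cents * (100 - percent) // 100


class ScaffoldTests(unittest.TestCase):
    """Obvious cases: the kind of scaffold an AI tool drafts quickly."""

    def test_no_discount(self):
        self.assertEqual(apply_discount(1000, 0), 1000)

    def test_half_off(self):
        self.assertEqual(apply_discount(1000, 50), 500)


class EdgeCaseTests(unittest.TestCase):
    """Edge cases the developer still has to think of and add."""

    def test_full_discount(self):
        self.assertEqual(apply_discount(1000, 100), 0)

    def test_rounds_down_to_whole_cents(self):
        # 999 * 67 // 100 = 66933 // 100 = 669
        self.assertEqual(apply_discount(999, 33), 669)

    def test_rejects_percent_over_100(self):
        with self.assertRaises(ValueError):
            apply_discount(1000, 101)

    def test_rejects_negative_price(self):
        with self.assertRaises(ValueError):
            apply_discount(-1, 10)
```

Saved as a module, this can be run with `python -m unittest`. The scaffold saves typing, but the rounding and validation tests are where the domain knowledge lives.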

Where AI Tools Hurt

Architecture decisions. AI suggests what it has seen most often in training data. That is not always (or even usually) the right architecture for your system. An AI will suggest a microservices architecture for a 3-person startup because that is what appears in most architecture blog posts. The right answer for a 3-person startup is a monolith.

Security-sensitive code. Authentication flows, encryption implementations, access control logic. AI suggestions frequently contain subtle security flaws: missing input validation, incorrect token handling, race conditions in authorization checks. These flaws are not obvious. They pass cursory review. They are exploitable in production.

Complex business logic. The matching algorithm for HeyTutor considers 100+ metrics. No AI tool can generate this from a prompt. It requires understanding the business, the users, the edge cases, and the failure modes. AI-generated code for complex business logic looks plausible and is almost always wrong in ways that only surface with real users.

Legacy codebase understanding. METR's finding makes sense here. AI tools do not understand why the code is the way it is. They suggest changes based on what "good code" looks like in general, without understanding the specific constraints, workarounds, and business decisions embedded in the legacy codebase. Following the AI suggestion can introduce bugs because the suggestion ignores context that never made it into the prompt.
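
One concrete instance of the "incorrect token handling" flaw mentioned under security-sensitive code is comparing secrets with `==`. The sketch below is illustrative only: the function names and the token value are hypothetical, and it is not taken from any actual AI output. Both functions return the same answers, which is exactly why the weaker one passes cursory review.

```python
import hmac


def token_matches_naive(provided: str, expected: str) -> bool:
    # Looks correct, but == can return as soon as the first differing
    # character is found, so response timing may leak how much matched.
    return provided == expected


def token_matches_constant_time(provided: str, expected: str) -> bool:
    # hmac.compare_digest avoids content-based short-circuiting.
    # Encode to bytes so non-ASCII input is accepted as well.
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))


expected = "s3cr3t-token"  # hypothetical value for the example
for candidate in ("s3cr3t-token", "s3cr3t-tokeX", ""):
    # Functionally identical results; the difference is only in timing behavior.
    assert token_matches_naive(candidate, expected) == token_matches_constant_time(
        candidate, expected
    )
```

No unit test distinguishes the two versions, so catching the first one depends on a reviewer who knows to look for it.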

How Our Team Uses AI

We use AI tools as drafting assistants, not as co-authors.

Every AI-generated code suggestion goes through the same code review process as human-written code. People are responsible for correctness, not the AI. If a developer submits a PR with AI-generated code that has a bug, the developer is accountable, not the tool.

We do not use AI for security-critical code. Authentication, encryption, and access control are written by senior engineers and reviewed by a second senior engineer. No AI in the loop.
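
A boundary like this is easier to keep when it is checked mechanically. The following is a minimal sketch, assuming a team marks AI-assisted changes in some way and keeps security-critical code under known paths. The path patterns, the `ai_assisted` flag, and the function name are all hypothetical, not a description of our tooling.

```python
from fnmatch import fnmatch

# Hypothetical path patterns for code that must be written without AI.
NO_AI_PATTERNS = ("src/auth/*", "src/crypto/*", "src/access_control/*")


def files_violating_policy(changed_files: list[str], ai_assisted: bool) -> list[str]:
    """Return changed files that fall in a no-AI zone of an AI-assisted change."""
    if not ai_assisted:
        return []
    return [
        path
        for path in changed_files
        if any(fnmatch(path, pattern) for pattern in NO_AI_PATTERNS)
    ]


# Hypothetical change set for the example.
changed = ["src/auth/login.py", "src/billing/invoice.py", "README.md"]
print(files_violating_policy(changed, ai_assisted=True))
```

A check like this could run in CI and flag the PR for a second senior reviewer. It does not replace the review itself.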

We do not trust AI for architecture. Architecture decisions come from engineers who have built similar systems before. PropertyRate's Kohana-to-Laravel migration was designed by an engineer with 15 years of experience. No AI tool could have designed that migration plan because it required understanding the specific constraints of a 50-state appraisal management platform.

We do use AI for the productivity wins that are real: boilerplate, documentation, test scaffolding, and API exploration. These are genuine time savers that let our engineers focus on the hard problems.

The 29% Question

Why do only 29% of developers trust AI output? Because many of the other 71% have been burned. They accepted a suggestion that looked correct, shipped it, and discovered the bug in production. Or they spent 30 minutes debugging an AI suggestion that would have taken 10 minutes to write from scratch.

The trust deficit is not ignorance. It is experience.

The teams that use AI effectively are teams with strong code review practices, clear boundaries on where AI can and cannot be used, and senior engineers who can spot when the AI suggestion is subtly wrong. The teams that use AI poorly are teams that treat it as a replacement for engineering judgment.

84% adoption and 29% trust is not a contradiction. It is a realistic assessment of a tool that is useful within limits and dangerous beyond them. The engineers know this. The marketing departments of AI companies do not.

Last updated August 3, 2025
