AI coding tools are software systems powered by large language models (LLMs) that assist software engineers by generating code, explaining existing code, writing tests, translating between languages, and increasingly taking autonomous actions across codebases. Since GitHub Copilot launched publicly in 2022, these tools have moved from novelty to near-ubiquity among professional developers -- and the discourse around them has occupied two opposite poles. One pole insists that AI will make programmers obsolete within a decade. The other insists that AI is a glorified autocomplete that cannot reason about systems and will always need a human checking its work. Both positions are wrong in the same way: they mistake the current state of a rapidly moving technology for a permanent condition.

What the evidence actually shows is more nuanced and more interesting. AI coding tools are meaningfully accelerating certain categories of software development work. They are not, as of 2025, capable of replacing the judgment that distinguishes a good software engineer from a code-generating process. But the distribution of value within software engineering is shifting -- and the engineers who understand where that shift is heading are positioning themselves very differently from those who are dismissing or ignoring it.

This article reviews the research that exists -- from GitHub's controlled studies to McKinsey's productivity analyses to MIT's work on AI and task distribution -- and provides a clear-eyed account of what is actually changing, what is not, and what skills software engineers need to navigate the next decade of their careers.

"AI is not replacing software engineers. It is replacing the parts of software engineering that software engineers least wanted to do. The question is what that leaves, and whether engineers are prepared for it." -- Thomas Dohmke, CEO of GitHub, at GitHub Universe 2023


Key Definitions

Large Language Model (LLM): The underlying technology in most AI coding tools. LLMs are neural networks trained on vast corpora of text and code, allowing them to generate plausible code completions, explain existing code, translate between languages, and produce boilerplate. They do not "understand" code in the way humans do; they predict statistically likely outputs given inputs. Understanding this distinction -- generation versus comprehension -- is essential to using these tools effectively.
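The generation-versus-comprehension distinction can be made concrete with a toy model. The sketch below (plain Python, invented for illustration, no real LLM involved) builds a bigram table from a tiny "corpus" and completes a token by emitting the most frequent continuation it has seen. Real LLMs use deep neural networks over enormous corpora, but the core operation has the same shape: next-token prediction, not understanding.

```python
from collections import Counter, defaultdict

# A tiny "training corpus" of code tokens.
corpus = "for i in range ( n ) : total += i".split()

# Count which token follows which -- a bigram frequency table.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def complete(token: str) -> str:
    # Emit the statistically most likely next token seen in "training".
    # No model of what the code means -- only of what tends to come next.
    return following[token].most_common(1)[0][0]
```

Asking `complete("in")` yields `"range"` not because the model knows what iteration is, but because that continuation was most frequent in its data, which is the sense in which LLM output is prediction rather than comprehension.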

GitHub Copilot: Microsoft/GitHub's AI coding assistant, integrated directly into IDEs including VS Code, JetBrains products, and Neovim. Copilot uses GPT-4-class models to suggest code completions and generate entire functions from natural language descriptions. As of early 2025, Copilot has over 1.8 million paying subscribers and is the most widely adopted AI coding tool.

Cursor: An AI-first IDE built on VS Code that integrates LLM capabilities more deeply than Copilot, allowing multi-file editing, codebase-aware context windows, and agentic operations that can modify multiple files in a single operation. Cursor saw rapid adoption among professional engineers in 2024, particularly those working on complex, multi-file refactoring tasks.

Agentic AI: AI systems that take sequences of actions -- running code, editing files, calling APIs, reading error messages and adjusting -- to complete a goal, rather than simply generating a single code suggestion. Devin, released in 2024 by Cognition AI, was marketed as the first "AI software engineer" and demonstrated agentic capabilities including writing code, running tests, and debugging failures autonomously. Subsequent independent benchmarking revealed significant limitations, but the direction of development is clear: AI coding tools are moving from suggestion to execution.

Prompt Engineering (Development Context): The practice of crafting effective instructions and context for AI coding tools to produce higher-quality outputs. In a development context, this includes providing architectural context, specifying constraints and edge cases, decomposing problems into AI-tractable units, and iterating on outputs rather than accepting first drafts.


The Productivity Research: What Studies Actually Found

Study | Year | Finding | Key Caveat
GitHub Copilot Controlled Study | 2023 | 55% faster on a specific task | Task was simple and well-defined; best-case scenario
McKinsey GenAI Report | 2023 | 20-45% productivity boost estimate | "Augmented," not "automated" -- tasks done faster, not removed
MIT/NBER (Noy & Zhang) | 2023 | AI compressed performance variance | Focused on writing tasks; generalization to coding remains open
Stack Overflow Developer Survey | 2024 | 76% of developers use or plan to use AI tools | Satisfaction varied significantly by task type
Google DeepMind AlphaCode 2 | 2024 | Competitive coding at the 85th percentile | Competitive programming differs fundamentally from production engineering
Pearce et al. (NYU) | 2022 | ~40% of AI-generated programs in security-relevant scenarios contained vulnerabilities | Specific to authentication and cryptography code
Brynjolfsson, Li, Raymond (NBER) | 2023 | 14% productivity increase for customer support agents | Largest gains for the least experienced workers
METR/Anthropic Study | 2025 | No statistically significant speedup for experienced OSS developers | Experienced developers on familiar codebases

The table reveals something important: the measured productivity gains vary enormously depending on the task, the developer's experience level, and how "productivity" is defined. There is no single number that captures "how much AI helps" because the answer depends entirely on what the engineer is doing.


What the Research Shows About Productivity

GitHub's Controlled Study (2023)

GitHub's most-cited study placed 95 developers in controlled conditions where half used Copilot and half did not, tasking them with implementing an HTTP server in JavaScript. The Copilot group completed the task in a median of 1 hour 11 minutes versus 2 hours 41 minutes for the control group -- a 55 percent speed advantage.

The critical caveat is the task design. Implementing a small, well-defined server is precisely the kind of task where LLMs perform best: the problem has a clear specification, the solution is well-represented in training data, and the output is easily verifiable. Arvind Narayanan of Princeton, in his AI Snake Oil newsletter, noted that the study measured a best-case scenario, not a representative sample of software engineering work. The tasks that consume most of a senior engineer's time -- debugging distributed systems, designing for maintainability, navigating ambiguous requirements, managing technical debt -- were not represented.

GitHub's accompanying survey data showed that 88 percent of developers reported feeling more productive with Copilot, 74 percent said they could focus on more satisfying work, and 77 percent said it helped them spend less time searching for documentation. These subjective reports are meaningful -- they suggest AI tools are reducing the friction of routine work even when the measured productivity gains are task-dependent.

McKinsey Global Institute (2023)

McKinsey's report The Economic Potential of Generative AI estimated that 20-45 percent of software developer time could be augmented by generative AI tools, primarily through code generation, documentation, and test writing. They were careful to distinguish "augmented" from "automated" -- the analysis did not suggest these tasks would disappear, but that they could be completed faster with AI assistance.

The report estimated that widespread adoption could increase global software engineering productivity by roughly 25-35 percent in the medium term. Critically, they noted that productivity gains tend to accrue faster to experienced engineers who can evaluate AI output critically than to novices who may not catch errors in generated code -- a finding with significant implications for how organizations think about training and career development.

MIT/NBER Working Paper on AI and Task Distribution (2023)

Shakked Noy and Whitney Zhang at MIT studied how AI tools affected the distribution of performance across workers on professional writing tasks. Their central finding was that AI tools compressed performance variance -- lower-performing workers benefited substantially more than higher-performing ones. The gap between the 25th percentile and 75th percentile performer narrowed significantly when AI tools were available.

If this finding generalizes to software engineering -- and subsequent research by Brynjolfsson, Li, and Raymond (2023) on customer support agents found similar compression -- it suggests that AI tools will reduce the performance advantage of moderately skilled engineers over less skilled ones, potentially changing hiring calculus. Exceptional engineers, whose judgment, system-level thinking, and debugging ability remain largely outside AI's current capability, may see their relative value increase.

The METR/Anthropic Study (2025)

A particularly important counterpoint emerged from a study conducted by METR (Model Evaluation and Threat Research) in partnership with Anthropic (2025), which measured the impact of AI coding tools on experienced open-source software developers working on their own familiar codebases. Contrary to expectations, the study found no statistically significant speedup -- and in some cases, AI tools slightly slowed developers down, possibly because time spent reviewing, prompting, and correcting AI output offset the time saved on generation.

This result does not contradict the GitHub study; it contextualizes it. The GitHub study measured performance on a well-defined task with clear specifications. The METR study measured performance on real-world tasks in familiar codebases where the developer already knew the codebase intimately and where the tasks were often ambiguous, multi-step, and context-dependent. The implication is that AI productivity gains are task-type dependent and experience-level dependent -- a more nuanced picture than either the optimists or skeptics present.


What AI Tools Do Well

Boilerplate and Scaffolding

AI coding tools dramatically accelerate the generation of repetitive code patterns: CRUD endpoints, form validation logic, unit test scaffolding, configuration files, data transformation functions, and API client code. Engineers who previously spent 30-40 percent of their time writing this kind of code can now generate first drafts in seconds and invest their time in reviewing, adjusting, and integrating.

The leverage is particularly high for engineers working in unfamiliar frameworks or domains. AI can generate correct-looking code for an unfamiliar API or framework, which the engineer can review and validate rather than spending time reading documentation from scratch. This reduces the ramp-up cost of working with new technology, which has strategic implications for team composition and project staffing.

Documentation and Code Explanation

One of the highest-value but most underutilized capabilities of AI coding tools is code explanation. Given a complex function or module, LLMs can produce accurate natural-language explanations that accelerate onboarding, code review, and debugging. The Stack Overflow Developer Survey 2024 found documentation and code explanation to be the most-valued AI use cases among developers who use AI tools regularly.

Legacy codebases with poor documentation benefit particularly from this capability. An engineer who can ask "what does this 800-line function actually do?" and receive a coherent, structured explanation is substantially more productive when navigating unfamiliar code. Organizations with large legacy codebases -- banks, insurance companies, government agencies -- represent a massive addressable market for this specific capability.

Test Generation

Writing unit tests is often the most tedious part of disciplined software development, and AI tools can generate test cases from function signatures and docstrings with reasonable accuracy. However, engineers consistently report that AI-generated tests tend to cover happy paths better than edge cases and may miss the boundary conditions that matter most. The most effective approach combines AI-generated test scaffolding with human-written edge case tests -- using AI to handle the volume and human judgment to handle the subtlety.
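That division of labor can be illustrated with a small, invented example (the `parse_price` function and its tests are hypothetical, not drawn from any study): AI-generated scaffolding covers the obvious inputs, and the human adds the boundary cases the scaffold tends to miss.

```python
def parse_price(text: str) -> float:
    """Parse a price string like '$1,234.56' into a float."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    return float(cleaned)

# AI-generated scaffolding typically covers the happy paths:
assert parse_price("$10.00") == 10.0
assert parse_price("$1,234.56") == 1234.56

# Human-written edge cases probe the boundaries the scaffold missed:
assert parse_price("  $0.99  ") == 0.99   # surrounding whitespace
try:
    parse_price("")                        # empty input should fail loudly
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```

The pattern generalizes: let the tool produce test volume quickly, then ask which inputs the generated suite never exercises -- empty values, negatives, malformed strings, boundary lengths -- and write those by hand.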

Language and Framework Translation

AI tools are effective at translating code between languages (Python to Go, JavaScript to TypeScript) and adapting code from one framework to another. This has meaningfully reduced the friction of maintaining polyglot codebases and accelerated migration projects. Amazon reported using AI tools extensively in its large-scale internal Java version upgrades, claiming the effort saved thousands of developer-hours across the project.


What AI Tools Do Poorly

System Design and Architecture

AI cannot replace the judgment required to design systems that must be maintainable, scalable, and appropriate to the organizational context. Good system design requires understanding tradeoffs that depend on non-technical factors: team size and skill distribution, deployment environment, future product roadmap, existing technical debt, and the fallibility of the humans who will operate and maintain the system. LLMs can suggest architectures, but the suggestions are typically textbook-appropriate without being situation-appropriate.

The specific failure mode: AI architecture suggestions optimize for correctness in isolation, not for the team's actual constraints. A suggestion to use Kubernetes for a 3-person team shipping a single service reflects what is architecturally defensible in a blog post, not what is practically appropriate for a small team with limited operations bandwidth. The judgment to choose the simpler, less elegant, more maintainable solution is precisely the kind of contextual reasoning that AI tools lack.

Debugging Emergent System Behavior

When a distributed system behaves unexpectedly under production load, the debugging process requires understanding causality across systems, interpreting metrics and logs that were not designed to be machine-readable, forming and testing hypotheses under pressure, and making judgment calls about acceptable risk. AI tools can assist with parts of this process -- suggesting possible causes, explaining error messages, searching documentation -- but the core investigative judgment remains human. Charity Majors, CTO of Honeycomb and a prominent voice on observability, has argued that production debugging requires a form of abductive reasoning -- inference to the best explanation from incomplete evidence -- that current AI systems do not perform reliably.

Security and Adversarial Thinking

Research by Hammond Pearce and colleagues at NYU (2022), which generated code with GitHub Copilot across scenarios drawn from common vulnerability classes, found that roughly 40 percent of the generated programs contained security vulnerabilities, particularly in code involving authentication, cryptography, and input handling. The AI tools generated syntactically correct code that was semantically insecure -- code that would pass review by someone not specifically looking for security issues.

This finding has serious implications. The engineers most at risk of shipping insecure AI-generated code are those who lack the security knowledge to recognize insecure patterns -- precisely the people who benefit most from AI assistance on other dimensions. The OWASP Top 10 and CWE Top 25 vulnerability categories represent the minimum security knowledge that every engineer using AI tools should internalize.

Novel Problem-Solving

LLMs generate plausible outputs based on patterns in training data. When the problem is genuinely novel -- a system failure mode with no prior art, an optimization problem in a specialized domain, a design challenge at a scale not previously documented -- LLMs produce confident-sounding but unreliable outputs. The failure is not obvious: an LLM responding to a genuinely novel problem sounds exactly as confident as one solving a well-documented problem. The engineer who recognizes the problem as novel can discount the AI's answer; the engineer who does not may ship a plausible but incorrect solution.


Are Junior Engineering Roles Disappearing?

Entry-level engineering hiring fell substantially at major tech companies between 2022 and 2024. Amazon, Google, Meta, and Microsoft all reduced or paused new graduate programs during this period. Revelio Labs (2024), which tracks workforce data, reported that junior engineering job postings dropped approximately 30 percent from their 2022 peak across the technology sector.

Attributing this entirely to AI is incorrect. The 2022-2024 contraction was driven primarily by rising interest rates, advertising market weakness, and the unwinding of pandemic-era growth assumptions. The same companies continued investing heavily in AI-focused products that require substantial engineering talent.

The more accurate concern is structural: if AI tools allow a team of 10 senior engineers to produce the output previously requiring 15 engineers (including junior contributors performing well-specified tasks), the equilibrium team size changes. Junior engineers have historically been partially valued for executing well-specified implementation tasks -- precisely the category where AI tools show the largest gains.

The counterargument, supported by economic history of previous automation waves, is that productivity gains tend to expand the scope of what gets built rather than shrink team sizes. Economist David Autor (MIT, 2024) has argued that AI, like previous general-purpose technologies, will create new categories of work faster than it eliminates existing ones -- though the transition period creates real displacement for specific workers in specific roles. The software engineering profession has been subject to this dynamic for decades: better tools have not historically reduced employment, though they have continuously shifted which skills command premium compensation.

The honest assessment for aspiring engineers: the junior engineering role is not disappearing, but its composition is changing. Entry-level engineers who can only write code to specification -- without understanding why the specification exists, whether it is correct, and how the code fits into a larger system -- face genuine competitive pressure from AI tools. Entry-level engineers who combine code fluency with system thinking, security awareness, and the ability to evaluate AI output critically remain highly valuable.


New Skills for the AI Era

Critical Evaluation of AI Output

The most immediately valuable skill for engineers working with AI tools is the ability to critically evaluate what AI generates. This requires strong foundational knowledge -- you cannot catch subtle bugs in AI-generated cryptography code if you do not understand cryptography. It also requires a specific skeptical stance toward AI output: treating it as a draft from a capable but unreliable collaborator, not a correct answer. Engineers who review AI output with the same rigor they apply to code from a junior team member consistently produce better results than those who accept AI suggestions uncritically.
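A canonical illustration of the kind of subtle bug such review catches (the snippet is invented for this example): a Python draft that reads correctly, passes a single happy-path check, and still shares one mutable default list across every call.

```python
# Plausible AI draft: reads fine, works once in a quick manual check.
def add_tag(item, tags=[]):            # BUG: the default list is created once...
    tags.append(item)                  # ...and mutated by every call that omits `tags`
    return tags

first = add_tag("a")                   # returns ["a"]
second = add_tag("b")                  # returns ["a", "b"] -- and `first` is the
                                       # same object, so it now reads ["a", "b"] too

# What a skeptical review would ship instead:
def add_tag_reviewed(item, tags=None):
    tags = [] if tags is None else list(tags)   # fresh list; never mutate the caller's
    tags.append(item)
    return tags
```

The draft is exactly the failure class that matters here: nothing about it looks wrong, and a reviewer who treats AI output as presumptively correct will merge it.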

Specification and Problem Decomposition

Working effectively with AI coding tools requires skill at specifying what you want. This is not simply "writing good prompts" -- it is the ability to decompose problems into units that AI can handle, provide sufficient context for the AI to generate useful output, recognize when a problem requires human judgment before AI execution, and iterate on AI outputs rather than regenerating from scratch. The engineers who get the most from AI tools are those who spend more time on specification and less time on generation.
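One way to practice this discipline (the task and function below are invented for illustration): write the specification first -- inputs, outputs, constraints, edge cases -- then evaluate whatever the tool generates against that specification rather than against intuition. A sketch of what the engineer writes before prompting, alongside an implementation that the spec makes checkable:

```python
# Specification written BEFORE invoking an AI tool. It decomposes the
# problem, names constraints, and lists edge cases -- the evaluation
# criteria for whatever gets generated.
SPEC = """
Function: dedupe_events(events)
Input: list of dicts with keys 'id' (str) and 'ts' (int, epoch seconds)
Output: new list keeping only the latest event per id, sorted by ts ascending
Constraints:
  - for equal timestamps, keep the later occurrence in input order
  - must not mutate the input list
Edge cases: empty list; duplicate ids with equal ts
"""

def dedupe_events(events):
    """Keep only the latest event per id; see SPEC for constraints."""
    latest = {}
    for e in events:                   # later occurrences win ties, per the spec (>=)
        cur = latest.get(e["id"])
        if cur is None or e["ts"] >= cur["ts"]:
            latest[e["id"]] = e
    return sorted(latest.values(), key=lambda e: e["ts"])
```

Each line of the spec maps to something testable, which is the point: a specification this explicit lets the engineer verify generated code mechanically instead of eyeballing it.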

System Design and Architecture

As AI absorbs more of the code implementation layer, the premium on system design judgment increases. The engineer who can determine what should be built, how it should be structured, and what tradeoffs are acceptable becomes more valuable as the gap between specification and implementation narrows. This is a skill that takes years to develop and that AI tools currently cannot replicate, making it the most durable career investment in software engineering.

Security Review

Given the documented tendency of AI-generated code to introduce security vulnerabilities, security review skills are increasingly important for all engineers, not just security specialists. Understanding common vulnerability patterns (OWASP Top 10, CWE Top 25) and being able to identify them in AI-generated code is a practical skill that matters right now. Organizations should consider requiring security training for all engineers who use AI coding tools, not just those in security-focused roles.

Domain Expertise

AI tools are most dangerous when applied to domains the engineer does not understand, because the engineer cannot evaluate whether the generated code is correct. Domain expertise -- deep understanding of the business context, regulatory requirements, and operational constraints of the systems you build -- becomes more valuable as AI makes it easier to generate code quickly and harder to verify that the code does what the business actually needs.


The Agentic Future: From Suggestion to Execution

The gap between "AI that suggests code" and "AI that completes tasks autonomously" is narrowing rapidly. Agentic AI systems -- tools that can read a task description, write code, run tests, interpret error messages, debug failures, and iterate toward a solution -- represent the next phase of AI's impact on software engineering.

Early agentic systems like Devin (Cognition AI, 2024) demonstrated the concept but with significant limitations: independent evaluation on the SWE-bench benchmark found that Devin resolved only about 14 percent of real-world GitHub issues autonomously, far below the marketing claims. However, the trajectory is clear: subsequent models and systems have steadily improved on these benchmarks, and the capabilities that seemed impressive in 2023 are baseline in 2025.

The engineers who understand what agentic tools can and cannot be trusted to do will be significantly more effective at directing them. This requires developing a mental model of AI capability that is neither dismissive nor credulous -- understanding the specific conditions under which AI agents succeed (well-specified tasks, good test coverage, familiar codebases) and the conditions under which they fail (ambiguous requirements, novel architectures, security-sensitive code).


Practical Takeaways

Adopt AI coding tools actively, not reluctantly. Engineers who dismiss these tools are not protecting their jobs; they are reducing their productivity relative to peers who are using them. The competitive advantage is not in avoiding AI but in using it more effectively than others.

Invest in the skills AI is not reaching. These include system design, security, complex debugging, domain expertise, and the organizational judgment that staff-level engineering requires -- skills that were valuable before AI and are becoming more valuable as the skill distribution shifts.

Monitor agentic developments carefully. The transition from suggestion to autonomous execution changes the engineer's role more fundamentally than code completion did. The engineers who develop mental models of what agentic tools can be trusted to do -- and what requires human oversight -- will be the most effective at leveraging the next generation of tools.

Strengthen your fundamentals, not just your tool proficiency. AI tools change rapidly; fundamental knowledge of algorithms, data structures, system design, networking, and security does not. The engineer who deeply understands how systems work will navigate any tool transition effectively. The engineer who knows only how to prompt will be displaced by the next tool.


References and Further Reading

  1. GitHub. (2023). Research: Quantifying GitHub Copilot's Impact on Developer Productivity and Happiness. https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/
  2. McKinsey Global Institute. (2023). The Economic Potential of Generative AI: The Next Productivity Frontier. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier
  3. Noy, S., & Zhang, W. (2023). Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence. Science, 381(6654), 187-192.
  4. Pearce, H., et al. (2022). Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions. IEEE Symposium on Security and Privacy (S&P), 2022.
  5. Brynjolfsson, E., Li, D., & Raymond, L. (2023). Generative AI at Work. NBER Working Paper No. 31161.
  6. METR. (2025). Measuring the Impact of AI Coding Tools on Developer Productivity. metr.org
  7. Stack Overflow. (2024). Developer Survey 2024. https://survey.stackoverflow.co/2024
  8. Narayanan, A. (2023). AI Snake Oil: What Artificial Intelligence Can Do, What It Can't, and How to Tell the Difference. Princeton University Press.
  9. Autor, D. (2024). Applying AI to Rebuild Middle Class Jobs. NBER Working Paper No. 32140.
  10. Cognition AI. (2024). Introducing Devin, the First AI Software Engineer. https://www.cognition.ai/blog/introducing-devin
  11. Google DeepMind. (2024). AlphaCode 2: Competitive Programming with Large Language Models. deepmind.google
  12. OECD. (2023). OECD Employment Outlook 2023: Artificial Intelligence and the Labour Market. https://www.oecd.org/employment-outlook/2023/
  13. Revelio Labs. (2024). Tech Sector Workforce Trends. reveliolabs.com
  14. Dohmke, T. (2023). Keynote address at GitHub Universe 2023, San Francisco.

Frequently Asked Questions

Is AI going to replace software engineers?

Research consensus as of 2025 is that AI tools augment rather than replace software engineers -- they accelerate boilerplate and documentation but cannot replace system design judgment, complex debugging, or security evaluation. The role is changing, not disappearing.

How much faster do developers work with AI coding tools?

GitHub's controlled study found 55% faster completion on a specific well-defined task; McKinsey estimated a 20-45% productivity boost across software development tasks. Gains are largest for routine code generation and smallest for architecture and debugging.

Are junior software engineering jobs disappearing because of AI?

Entry-level hiring slowed in 2023-2024 primarily due to broader tech sector contraction, not AI alone. The more accurate concern is that AI raises expected output per engineer, which may slow headcount growth rather than eliminate junior roles.

What new skills do software engineers need in the AI era?

Critical evaluation of AI-generated code (including security vulnerabilities), prompt engineering for development contexts, system design judgment, and domain expertise for specifying what should be built -- not just how to build it.

Which AI coding tools are software engineers actually using?

GitHub Copilot remains the most widely adopted, with over 1.8 million paying subscribers as of early 2025; Cursor has grown rapidly; Amazon CodeWhisperer, Tabnine, and Codeium are also widely used. ChatGPT and Claude are frequently used for code explanation outside the IDE.