The discourse around AI and software engineering has occupied two opposite poles since GitHub Copilot launched publicly in 2022. One pole insists that AI will make programmers obsolete within a decade. The other insists that AI is a glorified autocomplete that cannot reason about systems and will always need a human checking its work. Both positions are wrong in the same way: they mistake the current state of a rapidly moving technology for a permanent condition.

What the evidence actually shows is more nuanced and more interesting. AI coding tools are meaningfully accelerating certain categories of software development work. They are not, in 2025, capable of replacing the judgment that distinguishes a good software engineer from a code-generating process. But the distribution of value within software engineering is shifting -- and the engineers who understand where that shift is going are positioning themselves very differently from those who are dismissing or ignoring it.

This article reviews the research that exists -- from GitHub's controlled studies to McKinsey's productivity analyses to MIT's work on AI and task distribution -- and tries to give a clear-eyed account of what is actually changing, what is not, and what skills software engineers need to navigate the next decade of their careers.

"AI is not replacing software engineers. It is replacing the parts of software engineering that software engineers least wanted to do. The question is what that leaves, and whether engineers are prepared for it." -- Thomas Dohmke, CEO of GitHub, at GitHub Universe 2023


Key Definitions

Large Language Model (LLM): The underlying technology in most AI coding tools. LLMs are trained on vast corpora of text and code, allowing them to generate plausible code completions, explain existing code, translate between languages, and produce boilerplate. They do not 'understand' code in the way humans do; they predict likely outputs given inputs.

GitHub Copilot: Microsoft/GitHub's AI coding assistant, integrated directly into IDEs including VS Code and JetBrains products. Copilot uses GPT-4-class models to suggest code completions and generate entire functions from natural language descriptions.

Cursor: An AI-first IDE built on VS Code that integrates LLM capabilities more deeply than Copilot, allowing multi-file editing, codebase-aware context, and agentic operations. Widely adopted among professional engineers in 2024.

Agentic AI: AI systems that take sequences of actions -- running code, editing files, calling APIs -- to complete a goal, rather than simply generating a single code suggestion. Devin, released in 2024 by Cognition AI, was marketed as the first 'AI software engineer' and demonstrated some agentic capabilities, though with significant limitations.

Prompt Engineering (Development Context): The practice of crafting effective instructions and context for AI coding tools to produce higher-quality outputs. Includes techniques like providing context about codebase architecture, specifying constraints and edge cases, and iterating on AI outputs rather than accepting first drafts.


AI Productivity Research: What Studies Actually Found

Study | Finding | Key Caveat
--- | --- | ---
GitHub Copilot Study (2023) | 55% faster on a specific task | Task was simple and well-defined; a best-case scenario
McKinsey GenAI Report (2023) | 20-45% productivity boost estimate | 'Augmented', not 'automated' -- tasks done faster, not removed
MIT/NBER (Noy & Zhang, 2023) | AI compressed performance variance | Generalisation to coding remains an open question
Stack Overflow Survey (2024) | 62% of developers use AI tools | Satisfaction varied significantly by task type
Google DeepMind AlphaCode 2 (2024) | Competitive coding at the 85th percentile | Competitive programming differs from production engineering
Pearce et al. (2022) | Roughly 40% of AI-generated programs in security-relevant scenarios were vulnerable | Auth/crypto/input-handling scenarios specifically; not all code types

What the Research Shows About Productivity

GitHub's Controlled Study (2023)

GitHub's most-cited study on Copilot productivity recruited 95 developers, gave half of them Copilot, and tasked all of them with implementing an HTTP server in JavaScript under controlled conditions. The Copilot group completed the task in a median of 1 hour 11 minutes versus 2 hours 41 minutes for the control group -- a 55% speed advantage.

The critical caveat is the task design. Implementing a small, well-defined server is precisely the kind of task where LLMs perform best: the problem has a clear specification, the solution is well-represented in training data, and the output is easily verifiable. Researchers including Arvind Narayanan of Princeton noted that the study measured a best-case scenario, not a representative sample of software engineering work.

GitHub's survey data from the same period showed 88% of developers reported feeling more productive with Copilot, 74% said they could focus on more satisfying work, and 77% said it helped them spend less time searching for documentation and examples.

McKinsey Global Institute (2023)

McKinsey's 2023 report 'The Economic Potential of Generative AI' estimated that 20-45% of software developer time could be augmented by generative AI tools, primarily through code generation, documentation, and test writing. They were careful to distinguish 'augmented' from 'automated' -- the analysis did not suggest these tasks would be removed from engineering, but rather that they could be completed faster.

The report estimated that widespread adoption of AI coding tools could increase global software engineering productivity by roughly 25-35% in the medium term. They also noted that productivity gains tend to accrue faster to experienced engineers who can evaluate AI output critically than to novices who may not catch errors in AI-generated code.

MIT NBER Working Paper on AI and Task Distribution (2023)

In an NBER working paper, MIT researchers Shakked Noy and Whitney Zhang studied how AI tools affected the distribution of performance across workers on writing tasks. They found that AI tools compressed performance variance -- lower-performing workers benefited more than higher-performing ones, and the gap between the 25th- and 75th-percentile performer narrowed significantly.

If this finding generalises to software engineering, it suggests that AI tools will reduce the performance advantage of moderately skilled engineers over less skilled ones, potentially changing hiring calculus. Exceptional engineers, whose judgment, system-level thinking, and debugging ability remain largely outside AI's current capability, may see their relative value increase.


What AI Tools Do Well

Boilerplate and Scaffolding

AI coding tools dramatically accelerate the generation of repetitive code patterns: CRUD endpoints, form validation logic, unit test scaffolding, configuration files, data transformation functions. Engineers who previously spent 30-40% of their time writing this kind of code can now generate first drafts in seconds and spend their time reviewing, adjusting, and integrating.

The leverage is particularly high for engineers working in new frameworks or unfamiliar domains -- AI can generate correct-looking code for an unfamiliar API or framework, which the engineer can then review and validate rather than spending time reading documentation from scratch.

Documentation and Code Explanation

One of the highest-value but least-utilised capabilities of AI coding tools is code explanation. Given a complex function, LLMs can produce accurate natural-language explanations that accelerate onboarding, code review, and debugging. The Stack Overflow Developer Survey 2024 identified documentation and code explanation as among the most-valued AI use cases for developers who use AI tools regularly.

Legacy codebases with poor documentation benefit particularly from this capability. An engineer who can ask 'what does this 800-line function actually do?' and receive a coherent explanation is substantially more productive when navigating unfamiliar code.

Test Generation

Writing unit tests is often the most tedious part of disciplined software development. AI tools can generate test cases from function signatures and docstrings with reasonable accuracy, though engineers note that AI-generated tests tend to cover happy paths better than edge cases, and may miss the tests that matter most.
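The gap between happy-path coverage and the tests that matter can be illustrated with a small sketch. The function `parse_price` and all the test cases below are hypothetical, invented for illustration; the point is the contrast between the tests an assistant generates readily and the ones a reviewing engineer must add:

```python
# Hypothetical example: a small parser and the tests around it.
# AI-generated tests typically cover the happy path; the edge-case
# checks at the bottom are the kind a human reviewer must add.

def parse_price(text: str) -> float:
    """Parse a price string like '$1,234.56' into a float."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    if not cleaned:
        raise ValueError("empty price string")
    value = float(cleaned)
    if value < 0:
        raise ValueError("price cannot be negative")
    return value

# Happy-path tests -- the kind AI assistants generate readily:
assert parse_price("$19.99") == 19.99
assert parse_price("1,234.50") == 1234.50

def raises_value_error(text: str) -> bool:
    """Helper: return True if parse_price rejects the input."""
    try:
        parse_price(text)
    except ValueError:
        return True
    return False

# Edge-case tests -- the ones that are often missed:
assert raises_value_error("")     # empty input
assert raises_value_error("$-5")  # negative price
```

The edge cases here are trivial to spot once written down; the practical skill is noticing that the AI's generated suite never exercised them.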

Language and Framework Translation

AI tools are effective at translating code between languages (Python to Go, JavaScript to TypeScript) and adapting code from one framework to another. This has meaningfully reduced the friction of maintaining polyglot codebases and accelerated migration projects.


What AI Tools Do Poorly

System Design and Architecture

AI cannot replace the judgment required to design systems that must be maintainable, scalable, and appropriate to the organisational context. Good system design requires understanding tradeoffs that depend on non-technical factors: team size, deployment environment, future roadmap, existing technical debt, and the fallibility of the humans who will operate and maintain the system. LLMs can suggest architectures, but the suggestions are often textbook-appropriate without being situation-appropriate.

The specific failure mode: AI architecture suggestions optimise for correctness in isolation, not for the team's actual constraints. A suggestion to use Kubernetes for a 3-person team shipping a single service reflects what is architecturally defensible, not what is practically appropriate.

Debugging Emergent System Behaviour

When a distributed system behaves unexpectedly under production load, the debugging process requires understanding causality across systems, interpreting metrics and logs that were not designed to be machine-readable, forming and testing hypotheses under pressure, and making judgment calls about acceptable risk. AI tools can assist with parts of this process, but the core investigative judgment remains human.

Security and Adversarial Thinking

Pearce et al.'s 'Asleep at the Keyboard?' study (IEEE S&P 2022) assessed the security of Copilot-generated code and found that roughly 40% of the programs produced in security-relevant scenarios contained vulnerabilities, particularly in code involving authentication, cryptography, and input handling. The AI tools generated syntactically correct code that was semantically insecure -- code that would pass review by someone not looking specifically for security issues.

This finding has significant implications for production engineering. The engineers most at risk of shipping insecure AI-generated code are those who lack the security knowledge to recognise insecure patterns -- precisely the people who benefit most from AI assistance on other dimensions.
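The 'syntactically correct but semantically insecure' pattern can be made concrete with a deliberately vulnerable (and entirely hypothetical) login query. String interpolation into SQL works perfectly for benign input -- which is exactly why it survives casual review -- while a classic injection string bypasses it:

```python
import sqlite3

# Throwaway in-memory database with one user (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def login_insecure(name: str, password: str) -> bool:
    # Vulnerable pattern: user input interpolated directly into SQL.
    # Correct for benign input, so it passes a casual review.
    query = f"SELECT 1 FROM users WHERE name = '{name}' AND password = '{password}'"
    return conn.execute(query).fetchone() is not None

def login_safe(name: str, password: str) -> bool:
    # Parameterised query: input is bound as data, never parsed as SQL.
    query = "SELECT 1 FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone() is not None

# Both behave identically for normal input...
assert login_insecure("alice", "s3cret")
assert login_safe("alice", "s3cret")

# ...but an injection string bypasses the insecure version.
attack = "' OR '1'='1"
assert login_insecure("alice", attack)   # authentication bypassed
assert not login_safe("alice", attack)   # safely rejected
```

A reviewer not looking for injection sees two functions that pass the same functional tests; only the adversarial test distinguishes them.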

Novel Problem-Solving

LLMs are systems that generate plausible outputs based on patterns in training data. When the problem is genuinely novel -- a system failure mode with no prior art, an optimisation problem in a specialised domain, a design challenge at a scale that has not been previously documented -- LLMs produce confident-sounding but unreliable outputs.

The failure is not obvious. An LLM responding to a genuinely novel problem sounds as confident as one solving a well-documented problem. The engineer who knows the problem is novel can discount the AI's confident answer; the engineer who does not may not.


Are Junior Engineering Roles Disappearing?

Entry-level engineering hiring fell substantially at major tech companies between 2022 and 2024. Amazon, Google, Meta, and Microsoft all reduced or paused new graduate programmes during this period, citing macroeconomic conditions and a recalibration after over-hiring during the 2020-2021 pandemic tech boom.

Attributing this entirely to AI is incorrect. The 2022-2024 contraction was driven primarily by rising interest rates, advertising market weakness, and the unwinding of pandemic-era growth assumptions. Many of the same companies continued building AI-focused products that require engineering talent.

The more accurate concern is structural: if AI tools allow a team of 10 senior engineers to produce the output previously requiring 15 engineers (including junior contributors), the equilibrium team size changes. Junior engineers have historically been partially valued for executing well-specified tasks -- precisely the category where AI tools show the largest gains.

The counterargument, supported by economic history of previous automation waves, is that productivity gains tend to expand the overall scope of what gets built rather than shrink team sizes. If AI doubles the productivity of senior engineers, companies may build more products rather than employ fewer engineers. The software engineering profession has been subject to this dynamic for decades -- better tools have not, historically, reduced employment, though they have continuously shifted which skills are valued.


New Skills for the AI Era

Critical Evaluation of AI Output

The most immediately valuable skill for engineers working with AI tools is the ability to critically evaluate what AI generates. This requires strong foundational knowledge -- you cannot catch subtle bugs in AI-generated cryptography code if you do not understand cryptography. It also requires a specific sceptical stance toward AI output: treating it as a draft from a capable but unreliable junior colleague, not a correct answer.

Specification and Prompt Engineering

Working effectively with AI coding tools requires skill at specifying what you want. This is not simply 'writing good prompts' -- it is the ability to decompose problems into units that AI can handle, provide sufficient context for the AI to generate useful output, and recognise when a problem requires human judgment before AI execution.
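One way to make that decomposition concrete is a prompt that front-loads the context an assistant cannot infer. Everything specific below -- the stack, the constraints, the edge cases -- is invented for illustration; the structure is the point: context, constraints, and edge cases stated before the task itself.

```python
# Hypothetical prompt template: the specifics are invented for
# illustration; the structure (context -> constraints -> edge
# cases -> task) is what the text above describes.

def build_prompt(task: str, context: list[str], constraints: list[str],
                 edge_cases: list[str]) -> str:
    """Assemble a code-generation prompt that states context,
    constraints, and edge cases before the task itself."""
    sections = [
        "## Context",
        *[f"- {c}" for c in context],
        "## Constraints",
        *[f"- {c}" for c in constraints],
        "## Edge cases to handle",
        *[f"- {e}" for e in edge_cases],
        "## Task",
        task,
    ]
    return "\n".join(sections)

prompt = build_prompt(
    task="Write a function that paginates results from the orders API.",
    context=["Python 3.11 service", "httpx for HTTP calls",
             "responses use cursor-based pagination"],
    constraints=["no new dependencies", "must time out after 10s total"],
    edge_cases=["empty result set", "cursor expires mid-iteration"],
)
print(prompt)
```

The template is less important than the habit it enforces: if you cannot fill in the constraints and edge-case sections, the problem is not yet specified well enough for AI execution.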

System Design and Architecture

As AI absorbs more of the code implementation layer, the premium on system design judgment increases. The engineer who can determine what should be built, how it should be structured, and what tradeoffs are acceptable becomes more valuable as the gap between specification and implementation narrows.

Security Review

Given the documented tendency of AI-generated code to introduce security vulnerabilities, security review skills are increasingly important for all engineers, not just security specialists. Understanding common vulnerability patterns (OWASP Top 10, CWE Top 25) and being able to identify them in AI-generated code is a practical skill that matters right now, across all roles.


Practical Takeaways

Adopt AI coding tools actively, not reluctantly. Engineers who dismiss these tools are not protecting their jobs; they are reducing their productivity relative to peers who are using them. The skill is in using them effectively, not in avoiding them.

Invest in the skills that AI is not yet reaching: system design, security, complex debugging, and the organisational judgment that staff-level engineering requires. These skills were valuable before AI and are becoming more valuable as the skill distribution shifts.

Monitor agentic AI developments carefully. The gap between 'AI that suggests code' and 'AI that completes tasks autonomously' is narrowing, and the second category changes the engineer's role more fundamentally than the first. The engineers who understand what agentic tools can and cannot be trusted to do will be significantly more effective at directing them.


References

  1. GitHub. (2023). Research: Quantifying GitHub Copilot's Impact on Developer Productivity and Happiness. github.blog
  2. McKinsey Global Institute. (2023). The Economic Potential of Generative AI: The Next Productivity Frontier. mckinsey.com
  3. Noy, S., & Zhang, W. (2023). Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence. MIT/NBER Working Paper.
  4. Pearce, H., et al. (2022). Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions. IEEE Symposium on Security and Privacy 2022.
  5. Stack Overflow. (2024). Developer Survey 2024. survey.stackoverflow.co/2024
  6. Dohmke, T. (2023). Keynote address at GitHub Universe 2023. San Francisco.
  7. OECD. (2023). OECD Employment Outlook 2023: Artificial Intelligence and the Labour Market. oecd.org
  8. Brynjolfsson, E., Li, D., & Raymond, L. (2023). Generative AI at Work. NBER Working Paper No. 31161.
  9. Cognition AI. (2024). Introducing Devin, the First AI Software Engineer. cognition-labs.com
  10. Google DeepMind. (2024). AlphaCode 2: Competitive Programming with LLMs. deepmind.google
  11. Narayanan, A. (2023). AI Snake Oil Newsletter: What AI Can and Cannot Do. aisnakeoil.com
  12. IDC. (2024). Worldwide Developer and Source Code Management Software Forecast 2024-2028. idc.com

Frequently Asked Questions

Is AI going to replace software engineers?

Research consensus as of 2025 is that AI tools augment rather than replace software engineers -- they accelerate boilerplate and documentation but cannot replace system design judgment, complex debugging, or security evaluation. The role is changing, not disappearing.

How much faster do developers work with AI coding tools?

GitHub's controlled study found 55% faster completion on a specific well-defined task; McKinsey estimated a 20-45% productivity boost across software development tasks. Gains are largest for routine code generation and smallest for architecture and debugging.

Are junior software engineering jobs disappearing because of AI?

Entry-level hiring slowed in 2023-2024 primarily due to broader tech sector contraction, not AI alone. The more accurate concern is that AI raises expected output per engineer, which may slow headcount growth rather than eliminate junior roles.

What new skills do software engineers need in the AI era?

Critical evaluation of AI-generated code (including security vulnerabilities), prompt engineering for development contexts, system design judgment, and domain expertise for specifying what should be built -- not just how to build it.

Which AI coding tools are software engineers actually using?

GitHub Copilot remains the most widely adopted with over 1.8 million paying subscribers as of early 2024; Cursor has grown rapidly; Amazon CodeWhisperer, Tabnine, and Codeium are also widely used. ChatGPT and Claude are frequently used for code explanation outside the IDE.