Before DevOps, software organizations typically operated under a division that seemed logical but created serious dysfunction: development teams wrote code, and operations teams ran it in production. Developers were incentivized to ship new features quickly; operations teams were incentivized to maintain stability and avoid changes that might cause outages. The result was predictable: developers threw code over the wall to operations, who received it reluctantly, deployed it infrequently, and blamed developers when things broke. Developers blamed operations for moving too slowly. Releases were stressful, infrequent events that required elaborate coordination and often caused production incidents. The wall between development and operations was organizational, not technical, but it produced real costs in delivery speed, quality, and the wellbeing of everyone involved.

Patrick Debois organized the first DevOpsDays conference in Ghent, Belgium in October 2009, giving a name to a set of ideas that a community of practitioners was already developing. The name was deliberately unifying: development and operations were not separate disciplines with competing interests but facets of a single activity -- delivering working software to users reliably and repeatedly. The insights Debois drew on came from multiple directions: Agile software development, which had already challenged waterfall models of software delivery; the lean manufacturing movement and the Toyota Production System, which provided frameworks for thinking about waste, flow, and continuous improvement; and the practical experience of engineers at companies including Flickr and Amazon who had demonstrated that frequent deployments and operational stability were not in tension.

In the decade and a half since, DevOps has moved from a provocative idea at a small conference in Belgium to the organizing principle of software delivery at organizations ranging from startups to governments and Fortune 500 companies. It has spawned a substantial body of research (the DORA metrics and the 'Accelerate' book by Nicole Forsgren, Jez Humble, and Gene Kim are the most rigorous and influential), a vast commercial ecosystem of tools, and a set of practices -- continuous integration, continuous delivery, infrastructure as code, observability -- that have become standard expectations in software engineering. It has also generated confusion, hype, and misapplication sufficient to fill several books of its own.

"DevOps is not a goal, but a never-ending process of continual improvement." -- Jez Humble


Key Definitions

DevOps: A cultural, philosophical, and technical movement that unifies software development (Dev) and IT operations (Ops) by breaking down the organizational silos between them, enabling faster, more reliable software delivery through automation, collaboration, and shared responsibility.

Continuous integration (CI): The practice of merging code changes into a shared repository frequently, with each merge triggering automated builds and tests to detect integration problems quickly.

Continuous delivery (CD): The practice of ensuring that code is always in a deployable state, with every successful CI pipeline producing an artifact that can be deployed to production at any time.

DORA metrics: The four key metrics identified by the DevOps Research and Assessment team (deployment frequency, change lead time, change failure rate, time to restore service) that measure software delivery performance and predict organizational outcomes.

Platform engineering: The practice of building and maintaining an internal developer platform that provides development teams with self-service access to infrastructure and tooling, reducing cognitive load and enabling DevOps at scale.


Origins and Historical Context

The Wall Between Dev and Ops

The separation of development and operations has deep roots in both organizational theory and the practical history of computing. In the mainframe era, 'operations' was a specialized discipline concerned with scheduling batch jobs, managing tape libraries, and maintaining physical hardware -- work that was genuinely distinct from programming. As software systems became more complex, the separation persisted and deepened: operations teams became responsible for production environments while development teams worked in isolation from production realities.

By the 2000s, this separation was creating serious problems in organizations building web-scale software. Development teams working in Agile sprints were producing code continuously, but operations teams were deploying it quarterly, at best. The coordination overhead of each release -- change advisory boards, release notes, deployment runbooks, post-deployment monitoring rotations -- was consuming enormous engineering effort. And because releases were infrequent, each one contained large batches of changes that were difficult to diagnose when they caused problems.

The Agile movement had addressed the development side of this equation by introducing iterative delivery, collaboration, and continuous feedback. But because Agile practices typically stopped at the boundary of the development team, the deployment and operations side was left largely unchanged. The DevOps insight was that the same principles -- small batches, fast feedback, continuous improvement, shared responsibility -- needed to extend through the entire value stream from code commit to production.

John Allspaw and the Flickr Model

One of the most influential moments in the early DevOps movement was John Allspaw and Paul Hammond's presentation '10+ Deploys Per Day: Dev and Ops Cooperation at Flickr' at the Velocity conference in June 2009. Flickr, the photo-sharing platform acquired by Yahoo in 2005, was deploying code to production ten or more times per day -- a figure that seemed almost reckless to engineers accustomed to quarterly release cycles.

Allspaw and Hammond's presentation argued that this deployment frequency was not only possible but desirable: small, frequent deployments meant that each deployment contained fewer changes, making problems easier to diagnose and fix; developers received faster feedback on their code in production; and the operational risk of each individual deployment was lower precisely because it was smaller. The key enabler was not technology but culture: development and operations teams operated as a single team with shared goals and shared accountability, not as separate organizations with competing incentives.

This presentation was influential on Patrick Debois and on the broader community that gathered at the first DevOpsDays, and it became a touchstone for the argument that high deployment frequency and high reliability were not in tension.


The Three Ways

Gene Kim, Jez Humble, Patrick Debois, and John Willis presented the Three Ways framework -- introduced earlier in the novel 'The Phoenix Project' (2013) -- in 'The DevOps Handbook' (2016) as a conceptual organizing principle for DevOps practices.

The First Way: Flow

The First Way is about optimizing the flow of work through the entire value stream, from business hypothesis to production deployment. The focus is on making work visible, identifying and eliminating bottlenecks, reducing batch sizes, and never passing defects downstream. The practical applications include continuous integration (which eliminates the integration bottleneck), deployment pipelines (which automate the path from commit to production), and limiting work in progress (which surfaces bottlenecks and prevents the partial work that creates waste).

The Lean manufacturing concept of value stream mapping -- tracing the complete path that a unit of work takes from inception to delivery, identifying all the waiting, handoffs, and inefficiencies -- is directly applicable to software delivery. Teams that map their deployment pipeline often discover that the actual work of building, testing, and deploying code takes hours, but the total elapsed time from commit to production is days or weeks, most of which is waiting: for reviews, for approvals, for environment availability.
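The wait-versus-work imbalance that value stream mapping exposes can be made concrete with a small calculation. The sketch below uses hypothetical stage names and illustrative numbers, not data from any real team; the point is the flow-efficiency ratio, which is often in the low single digits.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    active_hours: float   # time spent actually working on the item
    waiting_hours: float  # time the item sits in a queue around this stage

# Hypothetical pipeline stages with illustrative numbers.
stages = [
    Stage("code review",        active_hours=1.0, waiting_hours=30.0),
    Stage("build and test",     active_hours=0.5, waiting_hours=2.0),
    Stage("staging deployment", active_hours=0.5, waiting_hours=20.0),
    Stage("approval",           active_hours=0.2, waiting_hours=70.0),
    Stage("production deploy",  active_hours=0.3, waiting_hours=1.5),
]

active = sum(s.active_hours for s in stages)
waiting = sum(s.waiting_hours for s in stages)
lead_time = active + waiting

# Flow efficiency: the fraction of elapsed time that is value-adding work.
flow_efficiency = active / lead_time

print(f"lead time: {lead_time:.1f}h, flow efficiency: {flow_efficiency:.1%}")
```

With these example numbers, 2.5 hours of actual work stretches into 126 hours of elapsed lead time, which is exactly the pattern the text describes: most of the delay is waiting, not working.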

The Second Way: Feedback

The Second Way is about creating fast, continuous feedback loops at every stage of the value stream. Automated testing provides feedback on whether code changes break existing functionality within minutes of a commit. Monitoring and alerting provide feedback on whether a production deployment has caused problems. Post-incident reviews provide feedback on what failure modes were not anticipated. Customer feedback provides information on whether features are actually meeting user needs.

The principle underlying the Second Way is that problems should be discovered as close as possible in time and space to where they are introduced, when they are cheapest and easiest to fix. A test that catches a bug seconds after it is introduced costs minutes to fix; the same bug discovered in production six months later may cost days. The investment in fast, comprehensive automated testing is justified by this asymmetry.

The Third Way: Continuous Learning and Experimentation

The Third Way is about creating a culture of continuous learning and experimentation. This includes blameless post-incident reviews (which generate learning from failures without creating incentives to hide problems), allocating time for improvement and experimentation (which creates space for teams to address technical debt and explore new approaches), and tolerating the necessary failure that learning requires.

Netflix's chaos engineering practice -- deliberately injecting failures into production systems to discover and fix weaknesses before they cause unplanned outages -- is an example of the Third Way applied at scale. The practice, pioneered at Netflix with the Chaos Monkey tool, was built on the recognition that production failures are inevitable and that discovering them proactively (when a team is prepared to respond) is preferable to discovering them unexpectedly.
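The core mechanic of Chaos-Monkey-style testing is simple to sketch: pick a random instance and terminate it, forcing the system to prove it tolerates the loss. The code below is a toy illustration, not Netflix's implementation; the instance names and the stubbed terminate action are invented.

```python
import random

def chaos_round(instances, terminate, rng=random):
    """Terminate one randomly chosen instance and return the survivors.

    The value is not the termination itself but forcing the system to
    demonstrate it tolerates the loss of any single instance.
    """
    victim = rng.choice(instances)
    terminate(victim)
    return [i for i in instances if i != victim]

# Example with a stubbed 'terminate' action that just records the victim.
killed = []
survivors = chaos_round(["web-1", "web-2", "web-3"], killed.append,
                        rng=random.Random(7))
```

A real practice wraps this in safeguards: opt-in per service, business-hours-only schedules, and an on-call team ready to respond.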


The DORA Research

Accelerate and Its Findings

Nicole Forsgren, Jez Humble, and Gene Kim's 'Accelerate: The Science of Lean Software and DevOps' (2018) presented findings from four years of annual State of DevOps surveys conducted with DORA (DevOps Research and Assessment). The research involved data from thousands of software delivery practitioners worldwide and used rigorous statistical methods (including structural equation modeling) to identify the capabilities that drive software delivery performance.

The central finding was that high-performing technology organizations -- those that deployed frequently, with short lead times, low failure rates, and fast recovery -- also outperformed their peers on organizational outcomes: revenue growth, market share, profitability, and customer satisfaction. Software delivery performance was not in tension with business performance; it was a significant driver of it.

The research also identified 24 capabilities that statistically predict high software delivery performance, organized into technical practices (continuous delivery, architecture, version control, test automation), process practices (team experimentation, flow management, work in process limits, visibility of work), cultural practices (generative organizational culture, learning culture, transformational leadership), and lean product management practices.

The Four DORA Metrics

Metric                    What It Measures                                 Elite Performance              High Performance
Deployment frequency      How often code is deployed to production         On-demand (multiple per day)   Weekly to monthly
Change lead time          Time from commit to production deployment        Less than one hour             One day to one week
Change failure rate       Percentage of deployments causing incidents      0-5%                           5-10%
Time to restore service   How long to recover from a production incident   Less than one hour             Less than one day

Measuring Improvement

The DORA metrics provide a practical framework for teams to measure their own delivery performance and track improvement over time. Deployment frequency can be measured directly from deployment tooling logs. Change lead time can be measured from commit timestamps to deployment timestamps. Change failure rate can be measured by tagging production incidents with the deployment that caused them and calculating the ratio. Time to restore service can be measured from incident creation to incident resolution.
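The measurements described above reduce to straightforward arithmetic over deployment and incident records. The sketch below uses hypothetical records with illustrative field names (`commit_at`, `caused_incident`, and so on), shaped like data a team might pull from its deployment tooling and incident tracker.

```python
from datetime import datetime, timedelta

# Hypothetical deployment and incident records; field names are illustrative.
deploys = [
    {"commit_at": datetime(2024, 3, 1, 9, 0),
     "deployed_at": datetime(2024, 3, 1, 10, 30), "caused_incident": False},
    {"commit_at": datetime(2024, 3, 1, 13, 0),
     "deployed_at": datetime(2024, 3, 1, 14, 0), "caused_incident": True},
    {"commit_at": datetime(2024, 3, 2, 9, 0),
     "deployed_at": datetime(2024, 3, 2, 9, 45), "caused_incident": False},
    {"commit_at": datetime(2024, 3, 3, 11, 0),
     "deployed_at": datetime(2024, 3, 3, 12, 0), "caused_incident": False},
]
incidents = [
    {"opened_at": datetime(2024, 3, 1, 14, 10),
     "resolved_at": datetime(2024, 3, 1, 14, 50)},
]
period_days = 3

# Deployment frequency: deploys per day over the measurement window.
deployment_frequency = len(deploys) / period_days

# Change lead time: commit timestamp to deployment timestamp (mean here;
# DORA reporting typically looks at the distribution or a median).
lead_times = [d["deployed_at"] - d["commit_at"] for d in deploys]
mean_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Change failure rate: fraction of deployments tagged as causing an incident.
change_failure_rate = sum(d["caused_incident"] for d in deploys) / len(deploys)

# Time to restore service: incident open to incident resolution.
restore_times = [i["resolved_at"] - i["opened_at"] for i in incidents]
mean_time_to_restore = sum(restore_times, timedelta()) / len(restore_times)
```

With these sample records the team deploys about 1.3 times per day, has a mean lead time just over an hour, a 25% change failure rate, and a 40-minute restore time.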

DORA publishes annual benchmarks that allow teams to compare their performance against industry norms, categorizing organizations as elite, high, medium, or low performers. The benchmarks have shifted significantly over the decade of research: what was considered elite performance in 2014 (daily deployments, week-long lead times) is considered medium performance by 2023 standards, reflecting genuine industry improvement.


DevSecOps and the Security Integration

The integration of security into DevOps pipelines has become one of the most important evolutions of the model. Traditional security practices -- penetration testing and security review gates before major releases -- were incompatible with the high-velocity deployment model that DevOps enabled. If a team is deploying dozens of times per day, a security review that takes two weeks for each release becomes an immediate bottleneck.

DevSecOps solves this by automating security checks throughout the CI/CD pipeline. Static application security testing (SAST) tools analyze source code for security vulnerabilities as part of every build. Software composition analysis (SCA) checks dependencies against vulnerability databases and alerts on newly discovered vulnerabilities in packages the application uses. Container image scanning checks base images and installed packages against known CVEs before deployment. Infrastructure as code scanning checks Terraform, CloudFormation, or Kubernetes manifests for misconfigurations before they are applied.
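The SCA step in particular amounts to checking a dependency manifest against a database of known-vulnerable versions and failing the pipeline on a match. The sketch below is a minimal illustration of that gate; the package names and vulnerability entries are invented, not real advisories, and real tools like Snyk or OWASP Dependency-Check consult live databases.

```python
# Hypothetical vulnerability database: package name -> versions known bad.
VULN_DB = {
    "leftpad": {"1.0.0", "1.0.1"},
    "fastjson": {"2.3.1"},
}

def scan_dependencies(manifest, vuln_db=VULN_DB):
    """Return the (package, version) pairs that match known vulnerabilities.

    'manifest' maps package name to pinned version, as a lockfile would.
    """
    return [(pkg, ver) for pkg, ver in manifest.items()
            if ver in vuln_db.get(pkg, set())]

def security_gate(manifest):
    """Fail the pipeline stage if any vulnerable dependency is found."""
    findings = scan_dependencies(manifest)
    return {"passed": not findings, "findings": findings}

result = security_gate({"leftpad": "1.0.1", "requests": "2.31.0"})
```

Running the gate on every build is what makes the check continuous: a newly disclosed vulnerability in a pinned dependency fails the next pipeline run rather than waiting for a quarterly review.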

The Shift Left Security principle -- addressing security concerns as early as possible, ideally during development rather than pre-production or post-deployment -- reflects the same logic as the broader DevOps principle that late discovery of problems is expensive. A vulnerability discovered in code review is cheap to fix; the same vulnerability discovered after it has been exploited in production may be catastrophic.


Platform Engineering as DevOps at Scale

The DevOps model works well for small teams and simple environments but encounters scaling challenges in large organizations with many teams and complex infrastructure. When every team is responsible for their own Kubernetes clusters, CI/CD pipelines, observability stacks, and security tooling, the cognitive overhead can become overwhelming, and the inconsistency between teams makes organization-wide visibility and governance difficult.

Platform engineering addresses these challenges by centralizing infrastructure and tooling concerns in a dedicated team whose customers are the development teams. The platform team builds and maintains an internal developer platform (IDP) -- a curated, self-service abstraction layer that provides development teams with the capabilities they need without requiring deep infrastructure expertise. Provisioning a new service, setting up a CI/CD pipeline, deploying to a testing environment, and accessing logs and metrics all happen through the platform, with sensible defaults enforced by the platform rather than requiring each team to figure them out independently.
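The "sensible defaults enforced by the platform" idea can be sketched as a merge of a team's request over platform-owned defaults, with security-critical settings enforced rather than overridable. All names below (`provision_service`, the spec fields) are illustrative, not a real platform API.

```python
# Platform-owned defaults applied to every provisioned service.
PLATFORM_DEFAULTS = {
    "replicas": 2,
    "cpu_limit": "500m",
    "log_retention_days": 30,
    "tls": True,
}

def provision_service(name, team, overrides=None):
    """Merge a team's requested overrides onto platform defaults.

    Teams may tune capacity settings, but security-critical defaults
    (TLS here) are enforced by the platform, not left to each team.
    """
    spec = dict(PLATFORM_DEFAULTS)
    spec.update(overrides or {})
    spec["tls"] = True  # enforced: overrides cannot disable TLS
    return {"service": name, "owner": team, "spec": spec}

svc = provision_service("payments-api", "payments",
                        overrides={"replicas": 3, "tls": False})
```

Note that the attempted `tls: False` override is silently corrected: the team gets self-service provisioning while the platform guarantees the guardrail.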

Platform teams should treat internal development teams as customers with product requirements, applying product thinking to the development of internal tooling. Humanitec's Platform Orchestrator, HashiCorp's Waypoint, and the open-source Backstage developer portal from Spotify are among the tools that support platform engineering practices, though the most important factor is not the specific tooling but the organizational model: a dedicated team with a product mindset building infrastructure that serves developer teams.


Common Misconceptions

DevOps is frequently misunderstood as a job title (a DevOps engineer who does what operations engineers used to do, but with more automation), as a tool set (CI/CD pipelines, Kubernetes, infrastructure as code), or as something that can be bought from a vendor. None of these captures the core idea.

The most important misconception is that DevOps is primarily about tools and automation. The automation is essential but secondary; the primary change is organizational and cultural: breaking down the wall between development and operations through shared goals, shared metrics, shared ownership of production systems, and collaboration across the entire value stream. Teams that invest heavily in CI/CD tooling without changing their organizational structure and incentives often find that they have automated a dysfunctional process rather than improved it.

DevOps also does not mean that there are no specialists. Organizations need engineers with deep expertise in networking, security, database administration, and infrastructure. What changes is the relationship between these specialists and the development teams: instead of operating as gatekeepers through whom all changes must pass, they operate as enablers who build platforms, automation, and self-service capabilities that allow development teams to move faster safely.


References

  1. Kim, G., Humble, J., Debois, P., & Willis, J. (2016). The DevOps Handbook. IT Revolution Press.
  2. Forsgren, N., Humble, J., & Kim, G. (2018). Accelerate: The Science of Lean Software and DevOps. IT Revolution Press.
  3. Allspaw, J., & Hammond, P. (2009). 10+ deploys per day: Dev and ops cooperation at Flickr. Velocity 2009 conference presentation.
  4. Humble, J., & Farley, D. (2010). Continuous Delivery. Addison-Wesley.
  5. Kim, G. (2019). The Unicorn Project. IT Revolution Press.
  6. Kim, G., Behr, K., & Spafford, G. (2013). The Phoenix Project. IT Revolution Press.
  7. DORA. (2023). State of DevOps Report 2023. dora.dev.
  8. Womack, J. P., & Jones, D. T. (1996). Lean Thinking. Simon & Schuster.
  9. Nygard, M. T. (2018). Release It! (2nd ed.). Pragmatic Bookshelf.
  10. Bass, L., Weber, I., & Zhu, L. (2015). DevOps: A Software Architect's Perspective. Addison-Wesley.
  11. Skelton, M., & Pais, M. (2019). Team Topologies. IT Revolution Press.
  12. Richardson, C. (2018). Microservices Patterns. Manning Publications.

Frequently Asked Questions

Who invented DevOps and where did it come from?

The term 'DevOps' is generally credited to Patrick Debois, a Belgian software consultant and developer, who coined it in 2009 for the first DevOpsDays conference he organized in Ghent, Belgium. Debois had been frustrated by the organizational dysfunction he observed between development teams (who wanted to ship new features quickly) and operations teams (who prioritized stability and were reluctant to deploy frequently). His insight was that the conflict was cultural and organizational, not technical, and that the solution required changes to how teams were structured and incentivized and to how they communicated.

The intellectual foundations of DevOps drew on several earlier threads. The Agile Manifesto (2001) had already challenged the waterfall model of software development, emphasizing iterative delivery and collaboration. The Lean manufacturing movement, and especially the Toyota Production System, provided concepts (waste elimination, continuous improvement, value stream mapping) that influenced the DevOps approach to software delivery. John Allspaw and Paul Hammond's 2009 Velocity conference presentation '10+ Deploys Per Day: Dev and Ops Cooperation at Flickr' provided a compelling real-world demonstration that development and operations could collaborate to deploy software dozens of times daily while maintaining stability.

The DevOps movement was subsequently shaped by practitioners including Gene Kim, Jez Humble, Patrick Debois, and John Willis, whose 'The DevOps Handbook' (2016) systematized the practices, and by the 'State of DevOps Report,' which since 2013 has provided annual survey data on DevOps adoption and its relationship to organizational performance.

What are the DORA metrics?

The DORA metrics (named for the DevOps Research and Assessment team, founded by Nicole Forsgren, Jez Humble, and Gene Kim) are four key metrics that the 'Accelerate' research (published in book form in 2018) identified as measuring software delivery performance and predicting organizational outcomes.

Deployment frequency measures how often a team deploys code to production. High-performing teams deploy on demand, multiple times per day; low performers deploy less than once per month. Change lead time measures the time from code commit to production deployment. High performers achieve less than one hour; low performers take more than six months. Change failure rate measures the percentage of deployments that cause a production incident requiring remediation. High performers have failure rates below 5%; low performers have rates of 46-60%. Time to restore service measures how long it takes to recover from a production incident. High performers recover in less than one hour; low performers take more than six months.

The DORA research found that these four metrics cluster together: teams that deploy frequently also have short lead times, low failure rates, and fast recovery. They also found that high performance on these metrics is strongly associated with broader organizational outcomes -- commercial performance, market share, profitability, and employee wellbeing. A fifth metric, reliability (the degree to which services meet their user availability and performance targets), was added in later research.

The DORA metrics have become widely used as benchmarks for software delivery improvement, providing teams and organizations with concrete, measurable targets rather than vague aspirations to 'do DevOps.'

What is a CI/CD pipeline?

A CI/CD pipeline is the automated sequence of steps that code goes through from a developer's commit to production deployment. CI (continuous integration) and CD (continuous delivery or deployment) are distinct practices that are often combined.

Continuous integration is the practice of merging code changes into a shared repository frequently (multiple times per day) and running automated tests on each merge. The goal is to detect integration problems -- conflicts between different developers' changes, regressions introduced by new code -- as quickly as possible, when they are cheapest to fix. A CI pipeline typically includes: source control trigger (a push or pull request starts the pipeline), build (compile the code or build the container image), static analysis and linting (code quality checks), unit tests (fast, isolated tests of individual functions), integration tests (tests that verify components work together), and security scanning (checking for known vulnerabilities in dependencies).

Continuous delivery extends CI by ensuring that the code is always in a deployable state. Every successful CI run produces an artifact (a container image, a deployment package) that could be deployed to production at any time, though the actual deployment to production still requires human approval. Continuous deployment goes further: every successful pipeline run automatically deploys to production without manual approval.

Popular CI/CD platforms include GitHub Actions, GitLab CI/CD, CircleCI, Jenkins, and TeamCity. Cloud providers offer their own services (AWS CodePipeline, Google Cloud Build, Azure Pipelines). The specific tools matter less than the underlying practice: fast feedback, frequent integration, and automated quality gates that prevent broken code from reaching production.
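The gating behavior of such a pipeline -- stages run in order, and the first failure stops the run -- can be sketched as a toy runner. The stage names mirror typical CI steps; the checks themselves are stand-in lambdas over an invented `change` record, not real build or test logic.

```python
def run_pipeline(change, stages):
    """Run stages in order; return (succeeded, log of stage results).

    Each stage is a (name, check) pair where check(change) -> bool.
    The pipeline stops at the first failing gate, so broken code
    never reaches the later (and more expensive) stages.
    """
    log = []
    for name, check in stages:
        ok = check(change)
        log.append((name, ok))
        if not ok:
            return False, log
    return True, log

# Stand-in quality gates keyed on fields of a hypothetical change record.
stages = [
    ("build",      lambda c: c["compiles"]),
    ("lint",       lambda c: c["style_ok"]),
    ("unit tests", lambda c: c["tests_pass"]),
    ("security",   lambda c: not c["vulnerable_deps"]),
]

ok, log = run_pipeline(
    {"compiles": True, "style_ok": True, "tests_pass": False,
     "vulnerable_deps": []},
    stages,
)
```

Here the failing unit tests stop the run before the security stage ever executes, which is the cheap-feedback property the answer describes.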

What is the difference between DevOps and DevSecOps?

DevSecOps (development, security, and operations) extends the DevOps model by integrating security practices throughout the software development lifecycle rather than treating security as a separate phase or team that reviews code only before release.

In traditional software development, security was often addressed in a 'security review' gate near the end of the development cycle, before production deployment. This created bottlenecks (security teams became a constraint on release velocity) and made security problems expensive to fix (late discovery means extensive rework). DevSecOps applies the same principle to security that DevOps applied to operations: shift it left, meaning address it as early as possible in the development process.

Practical DevSecOps involves: integrating static application security testing (SAST) tools into CI pipelines to detect vulnerabilities in source code; adding software composition analysis (SCA) to identify vulnerable dependencies (tools like Snyk, Dependabot, or OWASP Dependency-Check); running container image scanning against known vulnerability databases before deployment; implementing infrastructure as code security scanning (tools like Checkov or tfsec); and establishing dynamic application security testing (DAST) against running environments.

The cultural shift is as important as the tooling: security knowledge needs to be distributed to development teams rather than concentrated in a separate security team. Developer security training, security champion programs in which developers build security expertise, and security requirements embedded in definition-of-done criteria all support this. The CISA (Cybersecurity and Infrastructure Security Agency) Secure by Design principles and the NIST Secure Software Development Framework provide governance frameworks for DevSecOps programs.

What is platform engineering and how does it relate to DevOps?

Platform engineering emerged in the late 2010s and early 2020s as organizations scaled their DevOps practices and recognized a new bottleneck: every development team was spending significant time on infrastructure, toolchain, and operational concerns that were not directly related to building their applications. The DevOps principle of 'you build it, you run it' was creating cognitive overload, with developers expected to be experts in Kubernetes, observability, CI/CD configuration, security tooling, and database management simultaneously with their application development work.

Platform engineering addresses this by creating an internal developer platform (IDP) -- a curated, self-service abstraction layer over infrastructure and tooling that development teams can use without deep expertise in the underlying systems. The platform engineering team builds and maintains the platform (Kubernetes clusters, CI/CD templates, observability stacks, service catalogs, secrets management) so that development teams can provision environments, deploy services, and access observability with minimal friction.

Backstage, the developer portal open-sourced by Spotify in 2020 and donated to the CNCF, has become a widely adopted foundation for internal developer portals. It provides a catalog of services, documentation, CI/CD status, ownership information, and self-service infrastructure provisioning in a single interface.

The relationship to DevOps is evolutionary: platform engineering does not abandon DevOps principles but operationalizes them at scale. The goal remains fast, reliable software delivery with high developer autonomy; the mechanism is a high-quality platform that makes the DevOps path the path of least resistance. Gartner predicted in 2023 that 80% of large software engineering organizations would have platform engineering teams by 2026.