Before DevOps, software organizations typically operated under a division that seemed logical but created serious dysfunction: development teams wrote code, and operations teams ran it in production. Developers were incentivized to ship new features quickly; operations teams were incentivized to maintain stability and avoid changes that might cause outages. The result was predictable: developers threw code over the wall to operations, who received it reluctantly, deployed it infrequently, and blamed developers when things broke. Developers blamed operations for moving too slowly. Releases were stressful, infrequent events that required elaborate coordination and often caused production incidents. The wall between development and operations was organizational, not technical, but it produced real costs in delivery speed, quality, and the wellbeing of everyone involved.
Patrick Debois organized the first DevOpsDays conference in Ghent, Belgium in October 2009, giving a name to a set of ideas that a community of practitioners was already developing. The name was deliberately unifying: development and operations were not separate disciplines with competing interests but facets of a single activity -- delivering working software to users reliably and repeatedly. The insights Debois drew on came from multiple directions: Agile software development, which had already challenged waterfall models of software delivery; the lean manufacturing movement and the Toyota Production System, which provided frameworks for thinking about waste, flow, and continuous improvement; and the practical experience of engineers at companies including Flickr and Amazon who had demonstrated that frequent deployments and operational stability were not in tension.
In the decade and a half since, DevOps has moved from a provocative idea at a small conference in Belgium to the organizing principle of software delivery at organizations ranging from startups to governments and Fortune 500 companies. It has spawned a substantial body of research (the DORA metrics and the Accelerate book by Nicole Forsgren, Jez Humble, and Gene Kim are the most rigorous and influential), a vast commercial ecosystem of tools, and a set of practices -- continuous integration, continuous delivery, infrastructure as code, observability -- that have become standard expectations in software engineering. It has also generated confusion, hype, and misapplication sufficient to fill several books of its own.
"DevOps is not a goal, but a never-ending process of continual improvement." -- Jez Humble
Key Definitions
DevOps: A cultural, philosophical, and technical movement that unifies software development (Dev) and IT operations (Ops) by breaking down the organizational silos between them, enabling faster, more reliable software delivery through automation, collaboration, and shared responsibility.
Continuous integration (CI): The practice of merging code changes into a shared repository frequently, with each merge triggering automated builds and tests to detect integration problems quickly.
Continuous delivery (CD): The practice of ensuring that code is always in a deployable state, with every successful CI pipeline producing an artifact that can be deployed to production at any time.
Continuous deployment: An extension of continuous delivery in which every successful pipeline run results in an automatic deployment to production, without human approval. Continuous delivery, by contrast, keeps the code deployable at all times but retains a human decision about when to deploy.
DORA metrics: The four key metrics identified by the DevOps Research and Assessment team (deployment frequency, change lead time, change failure rate, time to restore service) that measure software delivery performance and predict organizational outcomes.
Value stream: The complete sequence of activities required to deliver value to the end user, from business hypothesis through production deployment and user feedback. DevOps practices are applied across the entire value stream, not just within individual teams.
Platform engineering: The practice of building and maintaining an internal developer platform that provides development teams with self-service access to infrastructure and tooling, reducing cognitive load and enabling DevOps at scale.
Observability: The degree to which the internal state of a system can be inferred from its external outputs -- metrics, logs, and traces. Highly observable systems enable engineers to understand and debug production behavior without needing to reproduce problems locally.
Origins and Historical Context
The Wall Between Dev and Ops
The separation of development and operations has deep roots in both organizational theory and the practical history of computing. In the mainframe era, "operations" was a specialized discipline concerned with scheduling batch jobs, managing tape libraries, and maintaining physical hardware -- work that was genuinely distinct from programming. As software systems became more complex, the separation persisted and deepened: operations teams became responsible for production environments while development teams worked in isolation from production realities.
By the 2000s, this separation was creating serious problems in organizations building web-scale software. Development teams working in Agile sprints were producing code continuously, but operations teams were deploying it quarterly, at best. The coordination overhead of each release -- change advisory boards, release notes, deployment runbooks, post-deployment monitoring rotations -- was consuming enormous engineering effort. And because releases were infrequent, each one contained large batches of changes that were difficult to diagnose when they caused problems.
The Agile movement had addressed the development side of this equation by introducing iterative delivery, collaboration, and continuous feedback. But because Agile practices stopped at the boundary of the development team, the deployment and operations side remained largely unchanged. The DevOps insight was that the same principles -- small batches, fast feedback, continuous improvement, shared responsibility -- needed to extend through the entire value stream from code commit to production.
The Toyota Production System Connection
The intellectual roots of DevOps reach back further than the Agile movement, to the Toyota Production System (TPS) and the lean manufacturing philosophy that emerged from it. The TPS, developed by Taiichi Ohno and Shigeo Shingo at Toyota in the decades after World War II, introduced concepts including just-in-time production (eliminating inventory waste by producing only what is needed, when it is needed), andon cords (empowering production-line workers to stop the line and address quality problems immediately), and kaizen (continuous incremental improvement as a permanent organizational practice).
James Womack and Daniel Jones documented these ideas for Western audiences in The Machine That Changed the World (1990) and Lean Thinking (1996), coining the term "lean manufacturing." The application of lean principles to software delivery -- eliminating waste, reducing batch size, amplifying feedback, and building in quality -- is the conceptual foundation on which DevOps rests. Gene Kim has described the DevOps movement as "the application of lean and agile principles to IT value streams."
John Allspaw and the Flickr Model
One of the most influential moments in the early DevOps movement was John Allspaw and Paul Hammond's presentation "10+ Deploys Per Day: Dev and Ops Cooperation at Flickr" at the Velocity conference in June 2009. Flickr, the photo sharing platform that had recently been acquired by Yahoo, was deploying code to production ten or more times per day -- a figure that seemed almost reckless to engineers accustomed to quarterly release cycles.
Allspaw and Hammond's presentation argued that this deployment frequency was not only possible but desirable: small, frequent deployments meant that each deployment contained fewer changes, making problems easier to diagnose and fix; developers received faster feedback on their code in production; and the operational risk of each individual deployment was lower precisely because it was smaller. The key enabler was not technology but culture: development and operations teams operated as a single team with shared goals and shared accountability, not as separate organizations with competing incentives.
This presentation was influential on Patrick Debois and on the broader community that gathered at the first DevOpsDays, and it became a touchstone for the argument that high deployment frequency and high reliability were not in tension.
The Three Ways
Gene Kim introduced the Three Ways in The Phoenix Project (2013), and with Jez Humble, Patrick Debois, and John Willis developed the framework further in The DevOps Handbook (2016) as a conceptual organizing principle for DevOps practices. The Three Ways draw explicitly on systems thinking and lean manufacturing concepts, translated into the context of software delivery.
The First Way: Flow
The First Way is about optimizing the flow of work through the entire value stream, from business hypothesis to production deployment. The focus is on making work visible, identifying and eliminating bottlenecks, reducing batch sizes, and avoiding work being passed back upstream. The practical applications include continuous integration (which eliminates the integration bottleneck), deployment pipelines (which automate the path from commit to production), and limiting work in progress (which surfaces bottlenecks and prevents the partial work that creates waste).
The Lean manufacturing concept of value stream mapping -- tracing the complete path that a unit of work takes from inception to delivery, identifying all the waiting, handoffs, and inefficiencies -- is directly applicable to software delivery. Teams that map their deployment pipeline often discover that the actual work of building, testing, and deploying code takes hours, but the total elapsed time from commit to production is days or weeks, most of which is waiting: for reviews, for approvals, for environment availability.
A common finding from value stream mapping exercises in software organizations is that value-adding activities represent only 5-20% of total lead time. The remaining 80-95% is waiting -- in queues, for approvals, for environment provisioning, for sign-offs from distant teams. Eliminating this wait is the primary opportunity that DevOps practices address.
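The arithmetic behind this finding is easy to reproduce. The sketch below, using illustrative (invented) stage timings, computes the value-add ratio for a single change moving through review, build, approval, and deployment:

```python
from datetime import timedelta

# Illustrative (invented) timings for one change: (stage, active work, waiting).
stages = [
    ("code review",          timedelta(hours=1),    timedelta(days=2)),
    ("CI build and tests",   timedelta(hours=2),    timedelta(0)),
    ("change approval",      timedelta(minutes=30), timedelta(days=3)),
    ("deployment",           timedelta(hours=1),    timedelta(days=1)),
]

def value_add_ratio(stages):
    """Fraction of total lead time spent on value-adding work."""
    work = sum((w for _, w, _ in stages), timedelta(0))
    total = sum((w + q for _, w, q in stages), timedelta(0))
    return work / total
```

For these numbers the ratio is about 3%: four and a half hours of active work inside a lead time of just over six days. Almost all of the improvement opportunity lies in the waiting columns, not in doing the work faster.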
The Second Way: Feedback
The Second Way is about creating fast, continuous feedback loops at every stage of the value stream. Automated testing provides feedback on whether code changes break existing functionality within minutes of a commit. Monitoring and alerting provide feedback on whether a production deployment has caused problems. Post-incident reviews provide feedback on what failure modes were not anticipated. Customer feedback provides information on whether features are actually meeting user needs.
The principle underlying the Second Way is that problems should be discovered as close as possible in time and space to where they are introduced, when they are cheapest and easiest to fix. A test that catches a bug seconds after it is introduced costs minutes to fix; the same bug discovered in production six months later may cost days. The investment in fast, comprehensive automated testing is justified by this asymmetry.
Observability is a central enabling capability for the Second Way in production environments. A team that cannot understand what is happening in its production system cannot receive meaningful feedback from it. Observability requires three types of telemetry:
- Metrics: Numerical measurements over time (request rate, error rate, latency, resource utilization)
- Logs: Timestamped records of discrete events (requests, errors, state transitions)
- Traces: Distributed traces that follow a request across multiple services, enabling diagnosis of latency and failure in microservice architectures
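The three telemetry types can be illustrated with minimal record shapes. The helpers below are a simplified sketch (real systems emit these through libraries such as OpenTelemetry, and the field names here are invented); the key structural point is that all spans belonging to one request share a trace_id and chain together via parent ids:

```python
import time
import uuid

def metric(name, value, **labels):
    """A numerical measurement at a point in time."""
    return {"type": "metric", "name": name, "value": value,
            "labels": labels, "ts": time.time()}

def log(level, message, **fields):
    """A timestamped record of a discrete event."""
    return {"type": "log", "level": level, "message": message,
            "fields": fields, "ts": time.time()}

def span(name, trace_id=None, parent_id=None):
    """One hop of a distributed trace; children inherit the trace_id."""
    return {"type": "span", "name": name,
            "trace_id": trace_id or uuid.uuid4().hex,
            "span_id": uuid.uuid4().hex,
            "parent_id": parent_id}

# A request crossing two services shares a single trace_id:
root = span("api-gateway")
child = span("payments-service",
             trace_id=root["trace_id"], parent_id=root["span_id"])
```

Correlating the three signal types (for example, by attaching the trace_id to logs and exemplar metrics) is what lets an engineer move from "error rate is up" to the specific failing request path.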
The Third Way: Continuous Learning and Experimentation
The Third Way is about creating a culture of continuous learning and experimentation. This includes blameless post-incident reviews (which generate learning from failures without creating incentives to hide problems), allocating time for improvement and experimentation (which creates space for teams to address technical debt and explore new approaches), and tolerating the necessary failure that learning requires.
Blameless post-mortems, formalized by John Allspaw at Etsy, are based on the recognition that complex systems fail in complex ways that are rarely attributable to a single person's mistake. A blameless post-mortem focuses on understanding the systemic conditions that made the failure possible and identifying changes to the system that reduce the likelihood of recurrence, rather than on identifying and punishing the individual who "caused" the incident. This approach produces better learning outcomes and creates an environment in which engineers are willing to be honest about what happened.
Netflix's chaos engineering practice -- deliberately injecting failures into production systems to discover and fix weaknesses before they cause unplanned outages -- is an example of the Third Way applied at scale. The practice, pioneered by Netflix engineers with the Chaos Monkey tool (and expanded into the broader Simian Army), was built on the recognition that production failures are inevitable and that discovering them proactively (when a team is prepared to respond) is preferable to discovering them unexpectedly. Chaos engineering has since been formalized as a discipline by practitioners and codified in resources from the Chaos Engineering Community and CNCF.
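In miniature, a chaos experiment wraps a dependency call with a controlled probability of injected failure and verifies that callers degrade gracefully. The sketch below is illustrative only -- the function names are hypothetical, not Netflix's actual tooling:

```python
import random

def chaos_call(func, failure_rate=0.1, rng=random):
    """Wrap a service call so it fails randomly, Chaos-Monkey style.
    Running callers against the wrapped version exposes missing
    failure handling before a real outage does."""
    def wrapped(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")
        return func(*args, **kwargs)
    return wrapped

def resilient_fetch(call, retries=3, fallback="cached-response"):
    """Caller-side defense: retry a few times, then degrade gracefully."""
    for _ in range(retries):
        try:
            return call()
        except ConnectionError:
            continue
    return fallback
```

Setting failure_rate to 1.0 in a test environment is the degenerate but useful case: it proves the fallback path actually works, rather than assuming it does.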
The DORA Research
Accelerate and Its Findings
Nicole Forsgren, Jez Humble, and Gene Kim's Accelerate: The Science of Lean Software and DevOps (2018) presented findings from four years of annual State of DevOps surveys conducted with DORA (DevOps Research and Assessment). The research involved data from thousands of software delivery practitioners worldwide and used rigorous statistical methods (including structural equation modeling) to identify the capabilities that drive software delivery performance.
The central finding was striking and commercially important: high-performing technology organizations -- those that deployed frequently, with short lead times, low failure rates, and fast recovery -- also outperformed their peers on organizational outcomes: revenue growth, market share, profitability, and customer satisfaction. Software delivery performance was not in tension with business performance; it was a significant driver of it. Organizations did not have to choose between moving fast and being stable; the research showed that speed and stability rise and fall together.
"High performers are twice as likely to exceed profitability, market share, and productivity goals." -- Forsgren, Humble, & Kim, Accelerate (2018). The gap has only widened since: DORA's Accelerate State of DevOps Report (2021) found elite performers deploying 973 times more frequently than low performers, with 6,570 times faster lead time from commit to deploy.
The research also identified 24 capabilities that statistically predict high software delivery performance, organized into:
- Technical practices: Continuous delivery, architecture, version control, test automation, trunk-based development, database change management
- Process practices: Team experimentation, flow management, work in process limits, visibility of work, working in small batches
- Cultural practices: Generative organizational culture, learning culture, transformational leadership
- Lean product management practices: Working in small batches, making flow visible, gathering and implementing customer feedback
The Four DORA Metrics
| Metric | What It Measures | Elite Performance | High Performance | Low Performance |
|---|---|---|---|---|
| Deployment frequency | How often code is deployed to production | On-demand (multiple per day) | Weekly to monthly | Monthly or less |
| Change lead time | Time from commit to production deployment | Less than one hour | One day to one week | One month to six months |
| Change failure rate | Percentage of deployments causing incidents | 0-5% | 5-10% | 16-30% |
| Time to restore service | How long to recover from a production incident | Less than one hour | Less than one day | One week to one month |
Source: DORA State of DevOps Reports (performance bands are approximate; the thresholds have shifted from year to year as the industry has improved)
Measuring Improvement
The DORA metrics provide a practical framework for teams to measure their own delivery performance and track improvement over time. Deployment frequency can be measured directly from deployment tooling logs. Change lead time can be measured from commit timestamps to deployment timestamps. Change failure rate can be measured by tagging production incidents with the deployment that caused them and calculating the ratio. Time to restore service can be measured from incident creation to incident resolution.
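The measurements described above reduce to a few lines of code once deployments and incidents carry timestamps. The sketch below assumes a simple, hypothetical record format (real implementations would pull these fields from CI/CD logs and an incident tracker):

```python
from datetime import datetime, timedelta
from statistics import median

def dora_metrics(deployments, incidents, window_days=30):
    """Compute the four DORA metrics from raw records.
    deployments: dicts with 'commit_at' and 'deployed_at' datetimes.
    incidents: dicts with 'opened_at', 'resolved_at', and 'caused_by'
    (index of the offending deployment, or None)."""
    freq = len(deployments) / window_days  # deployments per day
    lead = median(d["deployed_at"] - d["commit_at"] for d in deployments)
    failing = {i["caused_by"] for i in incidents if i["caused_by"] is not None}
    cfr = len(failing) / len(deployments)  # change failure rate
    mttr = (median(i["resolved_at"] - i["opened_at"] for i in incidents)
            if incidents else None)
    return {"deployment_frequency_per_day": freq,
            "median_lead_time": lead,
            "change_failure_rate": cfr,
            "median_time_to_restore": mttr}
```

Medians are used rather than means because lead times and recovery times are heavily skewed distributions; a single week-long outage should not dominate the picture.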
DORA publishes annual benchmarks that allow teams to compare their performance against industry norms, categorizing organizations as elite, high, medium, or low performers. The benchmarks have shifted significantly over the decade of research: what was considered elite performance in 2014 (daily deployments, week-long lead times) is considered medium performance by 2023 standards, reflecting genuine industry improvement.
The 2021 State of DevOps Report introduced a fifth metric: reliability, measured as meeting SLO (Service Level Objective) targets, reflecting the recognition that speed metrics alone are incomplete without stability metrics alongside them.
CI/CD Pipelines: The Technical Backbone
Continuous integration and continuous delivery are the technical practices most directly associated with DevOps. They are implemented through CI/CD pipelines -- automated workflows that take source code changes through build, test, security scanning, artifact publishing, and deployment stages without human intervention at each step.
A representative CI/CD pipeline for a containerized application might include:
- Source control trigger: Developer pushes code or opens a pull request; pipeline starts automatically
- Build: Source code compiled, container image built, tagged with commit hash
- Unit tests: Fast tests run against the built artifact, typically completing in under five minutes
- Static analysis and security scanning: SAST tools scan source code; SCA tools check dependencies for vulnerabilities; IaC scanning validates infrastructure code
- Integration tests: Tests run against a deployed instance of the application in a test environment
- Image scanning: Container image scanned for CVEs before being pushed to a registry
- Artifact publishing: Approved image pushed to container registry
- Deployment to staging: Automated deployment to staging environment with smoke tests
- Deployment to production: Manual approval gate (for continuous delivery) or automatic (for continuous deployment)
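Abstractly, a pipeline is an ordered list of gates where the first failure stops promotion, and the only structural difference between continuous delivery and continuous deployment is whether a human approval sits in front of production. A toy sketch of that control flow (function names invented, not any real platform's API):

```python
def run_pipeline(stages, approve_production=None):
    """Run pipeline stages in order; stop at the first failure.
    stages: list of (name, callable-returning-bool).
    approve_production: None for continuous deployment (auto-promote),
    or a callable returning True/False for a continuous-delivery gate."""
    for name, stage in stages:
        if not stage():
            return f"failed: {name}"
    if approve_production is not None and not approve_production():
        return "awaiting approval"
    return "deployed"

# Example: a pipeline whose stages all pass.
stages = [
    ("build",          lambda: True),
    ("unit tests",     lambda: True),
    ("security scan",  lambda: True),
    ("staging deploy", lambda: True),
]
```

Real platforms add parallelism, caching, and retry semantics, but the stop-at-first-failure sequencing and the optional final approval gate are the essential shape.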
Popular CI/CD platforms include GitHub Actions, GitLab CI/CD, CircleCI, Jenkins, and cloud-provider-native options like AWS CodePipeline and Google Cloud Build. The choice of platform matters less than the quality and coverage of the pipeline it runs.
Trunk-Based Development
The DORA research identified trunk-based development as one of the strongest technical predictors of high delivery performance. Trunk-based development means that developers integrate their changes into the main branch (trunk) at least once per day, rather than working on long-lived feature branches that diverge significantly before being merged.
The benefits are significant: frequent integration prevents the accumulation of merge conflicts that make integration painful; developers receive feedback from CI tests on a daily basis rather than waiting until a feature branch is "done"; and the main branch remains in a deployable state at all times, enabling continuous delivery. Feature flags (runtime switches that control whether a feature is visible to users) allow large features to be deployed progressively without long feature branches.
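A common implementation of feature flags is a deterministic percentage rollout: hash the flag and user together into a bucket, and enable the feature for buckets below the rollout threshold, so each user's experience stays stable as the percentage grows. A minimal sketch (names and flag keys invented for illustration):

```python
import hashlib

def flag_enabled(flag, user_id, rollout_percent):
    """Deterministic percentage rollout: the same (flag, user) pair
    always lands in the same bucket 0-99, so raising rollout_percent
    only ever adds users, never flips existing ones off."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent

def render_page(user_id):
    """Deployed code carries the unfinished feature behind the flag."""
    if flag_enabled("new-checkout", user_id, rollout_percent=10):
        return "new checkout"
    return "old checkout"
```

Because the check is a pure function of the flag name and user id, no per-user state needs to be stored, and the merge into trunk is decoupled from the release to users.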
DevSecOps and the Security Integration
The integration of security into DevOps pipelines has become one of the most important evolutions of the model. Traditional security practices -- penetration testing and security review gates before major releases -- were incompatible with the high-velocity deployment model that DevOps enabled. If a team is deploying dozens of times per day, a security review that takes two weeks for each release becomes an immediate bottleneck.
DevSecOps solves this by automating security checks throughout the CI/CD pipeline:
- Static Application Security Testing (SAST): Analyzes source code for security vulnerabilities as part of every build -- buffer overflows, injection vulnerabilities, insecure cryptography
- Software Composition Analysis (SCA): Checks dependencies against vulnerability databases (NVD, OSV) and alerts on newly discovered vulnerabilities in packages the application uses
- Container image scanning: Checks base images and installed packages against known CVEs before deployment
- Infrastructure as Code scanning: Checks Terraform, CloudFormation, or Kubernetes manifests for misconfigurations before they are applied
- Dynamic Application Security Testing (DAST): Runs automated tests against a deployed application to find runtime vulnerabilities
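At its core, an SCA check is a join between the project's pinned dependencies and a vulnerability feed, failing the build when any finding appears. The sketch below uses a hand-rolled advisory list for illustration; real tools query databases such as OSV or the NVD and understand version ranges rather than exact matches:

```python
def audit_dependencies(dependencies, advisories):
    """Cross-check pinned dependencies against a vulnerability feed.
    dependencies: {package: version}.
    advisories: dicts with 'package', 'affected_versions' (a set), 'id'.
    Returns a list of findings suitable for failing a CI job."""
    findings = []
    for pkg, version in dependencies.items():
        for adv in advisories:
            if adv["package"] == pkg and version in adv["affected_versions"]:
                findings.append({"package": pkg, "version": version,
                                 "advisory": adv["id"]})
    return findings

# Illustrative inputs; the advisory mirrors a real requests CVE.
deps = {"requests": "2.19.0", "flask": "2.3.2"}
advisories = [{"package": "requests",
               "affected_versions": {"2.19.0", "2.19.1"},
               "id": "CVE-2018-18074"}]
```

In a pipeline, a non-empty findings list would exit non-zero, turning "someone should check the dependencies" into an automatic gate on every build.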
The Shift Left Security principle -- addressing security concerns as early as possible, ideally during development rather than pre-production or post-deployment -- reflects the same logic as the broader DevOps principle that late discovery of problems is expensive. IBM's Cost of a Data Breach report (2023) found that the average cost of a data breach was $4.45 million, and that breaches identified and contained in under 200 days cost $1.02 million less than those taking longer -- quantifying the financial value of fast detection.
Platform Engineering as DevOps at Scale
The DevOps model works well for small teams and simple environments but encounters scaling challenges in large organizations with many teams and complex infrastructure. When every team is responsible for their own Kubernetes clusters, CI/CD pipelines, observability stacks, and security tooling, the cognitive overhead can become overwhelming, and the inconsistency between teams makes organization-wide visibility and governance difficult.
Platform engineering addresses these challenges by centralizing infrastructure and tooling concerns in a dedicated team whose customers are the development teams. The platform team builds and maintains an Internal Developer Platform (IDP) -- a curated, self-service abstraction layer that provides development teams with the capabilities they need without requiring deep infrastructure expertise.
The key insight is that platform engineering applies product thinking to internal tooling. The platform team treats development teams as customers with product requirements, measures developer experience (how long it takes to onboard a new service, how often developers are blocked by platform limitations), and iterates on the platform based on feedback. This is distinct from the traditional "infrastructure team" model in which requests were queued and fulfilled on a support-ticket basis.
"Platform engineering is DevOps done at organizational scale. Instead of asking every team to become infrastructure experts, you build a product that encodes infrastructure expertise and delivers it as a service." -- Manuel Pais, co-author of Team Topologies (2019)
Humanitec's Platform Orchestrator, HashiCorp's Waypoint, and the open-source Backstage developer portal from Spotify are among the tools that support platform engineering practices. Backstage, released as open source in 2020 and donated to the CNCF, provides a unified developer portal where teams can discover services, provision infrastructure, view documentation, and track deployments -- creating a single pane of glass for the developer experience.
According to Gartner, by 2026, 80% of large software engineering organizations will establish platform engineering teams as internal providers of reusable services, tools, and processes. The figure reflects both the growing complexity of cloud-native infrastructure and the recognition that developer productivity is a strategic asset.
SRE: DevOps in Practice at Google Scale
Site Reliability Engineering (SRE), developed at Google starting around 2003, is a specific implementation of DevOps principles with a strong engineering and quantitative orientation. Google's SRE book (Beyer et al., 2016) describes SRE as "what happens when you ask a software engineer to design an operations function."
Key SRE concepts have influenced DevOps practice broadly:
- Service Level Objectives (SLOs): Explicit targets for service reliability (e.g., 99.9% of requests will succeed) agreed between service teams and their customers. SLOs provide a quantitative basis for reliability conversations.
- Error budgets: The acceptable amount of unreliability (1 - SLO target). If a service has spent its error budget for the month, new feature deployments pause until reliability improves -- making reliability a shared priority between development and operations.
- Toil reduction: SRE teams are expected to spend no more than 50% of their time on operational toil (manual, repetitive work that does not improve the system). The other 50% must be spent on engineering work that reduces future toil.
- Blameless post-mortems: Systematic analysis of incidents to improve system design and operational practices.
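Error-budget accounting is simple arithmetic over an SLO. A sketch for a request-based SLO (the return format is invented for illustration):

```python
def error_budget(slo, total_requests, failed_requests):
    """Error-budget accounting for a request-based SLO.
    slo: target success fraction, e.g. 0.999 for 'three nines'."""
    allowed_failures = (1 - slo) * total_requests
    remaining = allowed_failures - failed_requests
    return {"allowed_failures": allowed_failures,
            "remaining": remaining,
            "budget_exhausted": remaining < 0}

# A 99.9% SLO over 1,000,000 requests allows roughly 1,000 failures;
# 1,200 failures means the budget is spent and releases should pause.
status = error_budget(0.999, 1_000_000, 1_200)
```

The value of the mechanism is less the arithmetic than the agreement behind it: when the budget is spent, the priority shift from features to reliability has been negotiated in advance rather than argued about during the incident.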
The SRE model has been influential on DevOps practice even outside Google. The concepts of SLOs, error budgets, and toil have been adopted widely, and the SRE book remains one of the most-cited works in DevOps literature.
Measuring DevOps Maturity
Organizations frequently want to assess where they are in a DevOps journey. Several maturity models have been developed:
| Capability Area | Level 1 (Initial) | Level 3 (Defined) | Level 5 (Optimizing) |
|---|---|---|---|
| Deployment frequency | Monthly or less | Weekly | Multiple per day |
| Testing | Manual, limited coverage | Automated, 70%+ coverage | Continuous, AI-assisted |
| Infrastructure | Manual, click-ops | IaC, partially automated | Fully declarative, GitOps |
| Monitoring | Reactive, manual alerting | Metrics and logs, alerting | Full observability, SLOs |
| Security | Post-deployment reviews | Automated scanning in CI/CD | Policy as code, zero trust |
| Culture | Siloed, blame culture | Some collaboration | Blameless, learning organization |
The DORA maturity assessment (available at dora.dev) provides a validated self-assessment based on the research. Teams answering questions about their deployment practices, incident response, and culture receive a performance profile and recommendations for the highest-impact improvements.
Common Misconceptions
DevOps is frequently misunderstood as a job title (a DevOps engineer who does what operations engineers used to do, but with more automation), as a tool set (CI/CD pipelines, Kubernetes, infrastructure as code), or as something that can be bought from a vendor. None of these captures the core idea.
The most important misconception is that DevOps is primarily about tools and automation. The automation is essential but secondary; the primary change is organizational and cultural: breaking down the wall between development and operations through shared goals, shared metrics, shared ownership of production systems, and collaboration across the entire value stream. Teams that invest heavily in CI/CD tooling without changing their organizational structure and incentives often find that they have automated a dysfunctional process rather than improved it.
DevOps also does not mean that there are no specialists. Organizations need engineers with deep expertise in networking, security, database administration, and infrastructure. What changes is the relationship between these specialists and the development teams: instead of operating as gatekeepers through whom all changes must pass, they operate as enablers who build platforms, automation, and self-service capabilities that allow development teams to move faster safely.
A third misconception is that DevOps is only for technology companies. The DORA research has found strong correlations between DevOps capabilities and business outcomes across industries including finance, healthcare, retail, and government. Organizations in regulated industries have implemented DevOps practices while maintaining compliance with SOX, HIPAA, PCI-DSS, and other regulatory frameworks -- often by encoding compliance requirements as automated checks in CI/CD pipelines rather than as manual review gates.
Getting Started with DevOps
For organizations beginning a DevOps journey, the DORA research provides guidance on where to focus first. The capabilities with the highest impact on delivery performance, according to the research, are:
- Version control for all code and configuration: The foundational practice on which everything else builds
- Trunk-based development: Reducing batch size and integration delay
- Test automation: Building the fast feedback loop that makes continuous deployment safe
- Deployment automation: Eliminating manual steps from the deployment process
- Monitoring and observability: Creating the feedback loop from production to development
- Loosely coupled architecture: Enabling teams to deploy independently without coordinating with other teams
The sequence matters. It is difficult to benefit from deployment automation without test automation to verify that automated deployments are safe. It is difficult to benefit from frequent deployments without the architectural decoupling that allows teams to deploy their service without synchronizing with other teams.
The organizational dimension is equally important. Forsgren et al. identify generative organizational culture -- characterized by high information flow, shared risks, bridging between teams, and learning from failure -- as one of the strongest predictors of software delivery performance. Technical practices implemented within a blame culture or highly siloed organization will not produce the outcomes the research describes.
References
- Kim, G., Humble, J., Debois, P., & Willis, J. (2016). The DevOps Handbook. IT Revolution Press.
- Forsgren, N., Humble, J., & Kim, G. (2018). Accelerate: The Science of Lean Software and DevOps. IT Revolution Press.
- Allspaw, J., & Hammond, P. (2009). 10+ deploys per day: Dev and ops cooperation at Flickr. Velocity 2009 conference presentation.
- Humble, J., & Farley, D. (2010). Continuous Delivery. Addison-Wesley.
- Kim, G. (2019). The Unicorn Project. IT Revolution Press.
- Kim, G., Behr, K., & Spafford, G. (2013). The Phoenix Project. IT Revolution Press.
- DORA. (2023). State of DevOps Report 2023. dora.dev.
- Womack, J. P., & Jones, D. T. (1996). Lean Thinking. Simon & Schuster.
- Nygard, M. T. (2018). Release It! (2nd ed.). Pragmatic Bookshelf.
- Bass, L., Weber, I., & Zhu, L. (2015). DevOps: A Software Architect's Perspective. Addison-Wesley.
- Skelton, M., & Pais, M. (2019). Team Topologies. IT Revolution Press.
- Richardson, C. (2018). Microservices Patterns. Manning Publications.
- Beyer, B., Jones, C., Petoff, J., & Murphy, N. R. (Eds.). (2016). Site Reliability Engineering: How Google Runs Production Systems. O'Reilly Media.
- Allspaw, J. (2012). Blameless post-mortems and a just culture. Etsy Code as Craft blog.
- IBM Security. (2023). Cost of a Data Breach Report 2023. ibm.com/security.
- Gartner. (2023). Innovation Insight for Platform Engineering. gartner.com.
- CNCF. (2023). Annual Survey 2023. Cloud Native Computing Foundation. cncf.io.
- Ohno, T. (1988). Toyota Production System: Beyond Large-Scale Production. Productivity Press.
Frequently Asked Questions
Who invented DevOps and where did it come from?
The term 'DevOps' is generally credited to Patrick Debois, a Belgian software consultant and developer, who coined it in 2009 for the first DevOpsDays conference he organized in Ghent, Belgium. Debois had been frustrated by the organizational dysfunction he observed between development teams (who wanted to ship new features quickly) and operations teams (who prioritized stability and were reluctant to deploy frequently). His insight was that the conflict was cultural and organizational, not technical, and that the solution required changes to how teams were structured and incentivized, and to how they communicated.

The intellectual foundations of DevOps drew on several earlier threads. The Agile Manifesto (2001) had already challenged the waterfall model of software development, emphasizing iterative delivery and collaboration. The Lean manufacturing movement, and especially the Toyota Production System, provided concepts (waste elimination, continuous improvement, value stream mapping) that influenced the DevOps approach to software delivery. John Allspaw and Paul Hammond's 2009 Velocity conference presentation '10+ Deploys Per Day: Dev and Ops Cooperation at Flickr' provided a compelling real-world demonstration that development and operations could collaborate to deploy software dozens of times daily while maintaining stability.

The DevOps movement was subsequently shaped by practitioners including Gene Kim, Jez Humble, Patrick Debois, and John Willis, whose 'The DevOps Handbook' (2016) systematized the practices, and by the 'State of DevOps Report,' which since 2013 has provided annual survey data on DevOps adoption and its relationship to organizational performance.
What are the DORA metrics?
The DORA metrics (named for the DevOps Research and Assessment team, founded by Nicole Forsgren, Jez Humble, and Gene Kim) are four key metrics that the 'Accelerate' research (published in book form in 2018) identified as measuring software delivery performance and predicting organizational outcomes.

Deployment frequency measures how often a team deploys code to production. High-performing teams deploy on demand, multiple times per day; low performers deploy less than once per month. Change lead time measures the time from code commit to production deployment. High performers achieve less than one hour; low performers take more than six months. Change failure rate measures the percentage of deployments that cause a production incident requiring remediation. High performers have failure rates below 5%; low performers between 46% and 60%. Time to restore service measures how long it takes to recover from a production incident. High performers recover in less than one hour; low performers take more than six months.

The DORA research found that these four metrics cluster together: teams that deploy frequently also have short lead times, low failure rates, and fast recovery. It also found that high performance on these metrics is strongly associated with broader organizational outcomes -- commercial performance, market share, profitability, and employee wellbeing. A fifth metric, reliability (the degree to which services meet their user availability and performance targets), was added in later research.

The DORA metrics have become widely used as benchmarks for software delivery improvement, providing teams and organizations with concrete, measurable targets rather than vague aspirations to 'do DevOps.'
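The four metrics can be computed directly from deployment records. The sketch below, using invented example data, shows one way to derive them from a log of commits, deployments, and incidents; real tooling would pull these events from the CI/CD system and incident tracker.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records over a two-day window: commit time,
# deploy time, whether the deployment caused an incident, and (if so)
# when service was restored.
deployments = [
    {"committed": datetime(2024, 1, 1, 9, 0),  "deployed": datetime(2024, 1, 1, 9, 45),
     "failed": False, "restored": None},
    {"committed": datetime(2024, 1, 1, 11, 0), "deployed": datetime(2024, 1, 1, 11, 30),
     "failed": True,  "restored": datetime(2024, 1, 1, 12, 10)},
    {"committed": datetime(2024, 1, 2, 10, 0), "deployed": datetime(2024, 1, 2, 10, 20),
     "failed": False, "restored": None},
    {"committed": datetime(2024, 1, 2, 15, 0), "deployed": datetime(2024, 1, 2, 15, 25),
     "failed": False, "restored": None},
]
observation_days = 2

# Deployment frequency: deployments per day over the observation window.
deployment_frequency = len(deployments) / observation_days

# Change lead time: median time from commit to production deployment.
lead_times = sorted(d["deployed"] - d["committed"] for d in deployments)
median_lead_time = lead_times[len(lead_times) // 2]

# Change failure rate: share of deployments that caused an incident.
failures = [d for d in deployments if d["failed"]]
change_failure_rate = len(failures) / len(deployments)

# Time to restore service: median time from failed deploy to recovery.
restore_times = sorted(d["restored"] - d["deployed"] for d in failures)
median_restore = restore_times[len(restore_times) // 2]

print(deployment_frequency)  # 2.0 deploys per day
print(median_lead_time)      # 0:30:00
```

The choice of median (rather than mean) for lead time and restore time follows common practice in the DORA surveys, where a few outliers would otherwise dominate the numbers.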
What is a CI/CD pipeline?
A CI/CD pipeline is the automated sequence of steps that code goes through from a developer's commit to production deployment. CI (continuous integration) and CD (continuous delivery or deployment) are distinct practices that are often combined.

Continuous integration is the practice of merging code changes into a shared repository frequently (multiple times per day) and running automated tests on each merge. The goal is to detect integration problems -- conflicts between different developers' changes, regressions introduced by new code -- as quickly as possible, when they are cheapest to fix. A CI pipeline typically includes: source control trigger (a push or pull request starts the pipeline), build (compile the code or build the container image), static analysis and linting (code quality checks), unit tests (fast, isolated tests of individual functions), integration tests (tests that verify components work together), and security scanning (checking for known vulnerabilities in dependencies).

Continuous delivery extends CI by ensuring that the code is always in a deployable state. Every successful CI run produces an artifact (a container image, a deployment package) that could be deployed to production at any time, though the actual deployment to production still requires human approval. Continuous deployment goes further: every successful pipeline run automatically deploys to production without manual approval.

Popular CI/CD platforms include GitHub Actions, GitLab CI/CD, CircleCI, Jenkins, and TeamCity. Cloud providers offer their own services (AWS CodePipeline, Google Cloud Build, Azure Pipelines). The specific tools matter less than the underlying practice: fast feedback, frequent integration, and automated quality gates that prevent broken code from reaching production.
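The essential structure of those quality gates -- ordered stages, each of which must pass before the next runs -- can be sketched independently of any particular CI platform. The stage names and checks below are illustrative only; real pipelines express this same fail-fast logic in their platform's configuration format.

```python
# A minimal sketch of CI quality gates: stages run in order, and the
# pipeline stops at the first failure, so a broken change never reaches
# the later (and more expensive) stages or produces a deployable artifact.

def run_pipeline(change, stages):
    """Run stages in order; return (succeeded, names_of_stages_executed)."""
    executed = []
    for name, check in stages:
        executed.append(name)
        if not check(change):
            return False, executed  # fail fast: remaining stages are skipped
    return True, executed

# Illustrative stage list mirroring a typical CI sequence.
STAGES = [
    ("build",             lambda c: c["compiles"]),
    ("lint",              lambda c: c["style_ok"]),
    ("unit-tests",        lambda c: c["unit_tests_pass"]),
    ("integration-tests", lambda c: c["integration_tests_pass"]),
    ("security-scan",     lambda c: not c["known_vulnerabilities"]),
]

good = {"compiles": True, "style_ok": True, "unit_tests_pass": True,
        "integration_tests_pass": True, "known_vulnerabilities": False}
bad = dict(good, unit_tests_pass=False)

ok, ran = run_pipeline(good, STAGES)          # passes all five stages
failed, ran_bad = run_pipeline(bad, STAGES)   # stops at unit-tests
```

The fail-fast ordering is deliberate: cheap, fast checks (build, lint) run first so that most broken changes are rejected in seconds, and the slower integration and security stages only run on changes that have already cleared the basics.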
What is the difference between DevOps and DevSecOps?
DevSecOps (development, security, and operations) extends the DevOps model by integrating security practices throughout the software development lifecycle rather than treating security as a separate phase or team that reviews code only before release.

In traditional software development, security was often addressed in a 'security review' gate near the end of the development cycle, before production deployment. This created bottlenecks (security teams became a constraint on release velocity) and made security problems expensive to fix (late discovery means extensive rework). DevSecOps applies the same principle to security that DevOps applied to operations: shift it left, meaning address it as early as possible in the development process.

Practical DevSecOps involves: integrating static application security testing (SAST) tools into CI pipelines to detect vulnerabilities in source code; adding software composition analysis (SCA) to identify vulnerable dependencies (tools like Snyk, Dependabot, or OWASP Dependency-Check); running container image scanning against known vulnerability databases before deployment; implementing infrastructure as code security scanning (tools like Checkov or tfsec); and establishing dynamic application security testing (DAST) against running environments.

The cultural shift is as important as the tooling: security knowledge needs to be distributed to development teams rather than concentrated in a separate security team. Developer security training, 'champion' programs where developers develop security expertise, and security requirements embedded in definition-of-done criteria all support this. The CISA (Cybersecurity and Infrastructure Security Agency) Secure by Design principles and the NIST Secure Software Development Framework provide governance frameworks for DevSecOps programs.
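The core mechanism of software composition analysis is simple to state: compare each pinned dependency version against an advisory database and flag anything older than the fix. The sketch below illustrates that comparison with an entirely invented advisory feed and package names; real tools such as Snyk, Dependabot, or OWASP Dependency-Check consume curated vulnerability databases and handle far messier version schemes.

```python
# A minimal sketch of software composition analysis (SCA).
# Advisory data maps a package name to a list of
# (first_fixed_version, advisory_id) pairs -- all values here are
# hypothetical, for illustration only.
ADVISORIES = {
    "examplelib": [((2, 5, 0), "EX-2023-001")],
    "webtoolkit": [((1, 0, 4), "WT-2022-017")],
}

def parse(version):
    """Turn a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def scan(dependencies):
    """Return (package, version, advisory_id) for each vulnerable pin."""
    findings = []
    for package, version in dependencies.items():
        for fixed_in, advisory_id in ADVISORIES.get(package, []):
            if parse(version) < fixed_in:  # older than the fix: vulnerable
                findings.append((package, version, advisory_id))
    return findings

pinned = {"examplelib": "2.4.1", "webtoolkit": "1.0.4", "othermod": "0.9"}
findings = scan(pinned)
# examplelib 2.4.1 predates the 2.5.0 fix and is flagged;
# webtoolkit 1.0.4 is exactly the fixed version, so it passes.
```

Running a check like this as a CI stage (and failing the build on any finding) is what 'shifting security left' looks like in practice: the vulnerable dependency is caught at commit time, not in a pre-release review.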
What is platform engineering and how does it relate to DevOps?
Platform engineering emerged in the late 2010s and early 2020s as organizations scaled their DevOps practices and recognized a new bottleneck: every development team was spending significant time on infrastructure, toolchain, and operational concerns that were not directly related to building their applications. The DevOps principle of 'you build it, you run it' was creating cognitive overload, with developers expected to be experts in Kubernetes, observability, CI/CD configuration, security tooling, and database management simultaneously with their application development work.

Platform engineering addresses this by creating an internal developer platform (IDP) -- a curated, self-service abstraction layer over infrastructure and tooling that development teams can use without deep expertise in the underlying systems. The platform engineering team builds and maintains the platform (Kubernetes clusters, CI/CD templates, observability stacks, service catalogs, secrets management) so that development teams can provision environments, deploy services, and access observability with minimal friction.

Backstage, the developer portal open-sourced by Spotify in 2020 and donated to the CNCF, has become a widely adopted foundation for internal developer portals. It provides a catalog of services, documentation, CI/CD status, ownership information, and self-service infrastructure provisioning in a single interface.

The relationship to DevOps is evolutionary: platform engineering does not abandon DevOps principles but operationalizes them at scale. The goal remains fast, reliable software delivery with high developer autonomy; the mechanism is a high-quality platform that makes the DevOps path the path of least resistance. Gartner predicted in 2023 that 80% of large software engineering organizations would have platform engineering teams by 2026.
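The self-service abstraction at the heart of an IDP can be sketched as a narrow interface: the development team supplies only application-level intent (service name, owning team, criticality tier), and the platform layer merges in curated infrastructure defaults and policy. Every name, default, and policy in this sketch is hypothetical; it shows the shape of the abstraction, not any real platform's API.

```python
# A minimal sketch of IDP-style self-service provisioning: developers
# state intent, the platform supplies infrastructure detail and policy.
# All cluster names, template names, and defaults are invented.
PLATFORM_DEFAULTS = {
    "cluster": "prod-k8s-eu-west",          # platform-managed Kubernetes
    "ci_template": "standard-pipeline-v3",  # curated CI/CD template
    "metrics": True,                        # observability wired in by default
    "log_retention_days": 30,
}

def provision_service(name, team, tier="standard"):
    """Merge developer intent with platform defaults into a deploy spec."""
    spec = dict(PLATFORM_DEFAULTS)
    spec.update({"service": name, "owner": team})
    # Replica count is platform policy tied to the tier, not a per-team
    # decision -- one way the platform makes the good path the easy path.
    spec["replicas"] = 3 if tier == "critical" else 1
    return spec

spec = provision_service("checkout", team="payments", tier="critical")
```

The point of the narrow interface is exactly the cognitive-load reduction described above: the payments team never names a cluster or writes a pipeline, yet the resulting spec carries the platform team's operational decisions.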