How Software Is Actually Built: The Development Process Explained
In 1968, the NATO Science Committee convened a conference in Garmisch, Germany, to address what attendees called the "software crisis." Programs were delivered late, over budget, and unreliable. Large projects routinely failed entirely. The conference produced a report that coined the term "software engineering" and proposed applying the discipline of established engineering fields -- civil, mechanical, electrical -- to software development.
The analogy was appealing. Engineers designed bridges and buildings using proven mathematics, produced detailed blueprints, and then construction proceeded from those plans. Why could software not work the same way? Define requirements comprehensively, design the system completely, then build it.
The next several decades demonstrated why this analogy was fundamentally flawed. Software is not like a bridge. Requirements for a bridge do not change while the bridge is being built. Users of a bridge do not discover mid-construction that they actually want the bridge to go somewhere else. The bridge does not need to be compatible with every bridge that existed before it. And when you finish building a bridge, you do not then discover that your customers use it in ways you never anticipated, requiring you to modify the bridge continuously forever.
Software is different in kind, not just in degree. Understanding how software is actually built -- as opposed to how early theorists hoped it could be built -- requires understanding those differences.
Why the Blueprint Model Fails
The waterfall model, formalized in a 1970 paper by Winston Royce (who actually intended it as a cautionary illustration of a flawed approach, not a recommendation), organized software development as a sequence of phases: requirements, design, implementation, testing, deployment. Each phase completed before the next began. Requirements were locked. Design preceded all coding.
The model failed for predictable reasons:
Requirements are discovered, not defined. Users and stakeholders cannot accurately specify what they want before they have seen something concrete. They can tell you their problems. They cannot tell you the solution. When shown a working prototype, they immediately see things they want changed that they could not have articulated from a requirements document.
Technical constraints emerge during implementation. The elegant architecture designed on a whiteboard encounters real-world limits: the third-party API does not work as documented, the database query that looked efficient in theory is slow with real data volumes, the mobile platform has restrictions that invalidate an assumption made during design.
Markets change faster than development cycles. A product designed to specifications written eighteen months ago may be solving a problem that competitors solved six months ago, or addressing a market that has moved on.
The Standish Group's CHAOS Report, which surveys thousands of software projects annually, has consistently found that projects following waterfall-style processes fail -- delivering late, running over budget, or shipping incomplete functionality -- at higher rates than projects using iterative approaches. The 2020 report found that 19% of waterfall projects succeeded, compared with 42% of agile projects.
The Iterative Reality
Almost all successful software is built iteratively: build something small, put it in front of users, learn from the experience, build the next small thing. The iteration loop replaces the blueprint.
This was not always an accepted approach. In the 1990s, iterative development was considered an admission that you could not plan properly. The Agile Manifesto of 2001, signed by seventeen software developers who had developed different iterative methodologies independently, made the philosophical break explicit. The manifesto's twelve principles include: "Welcome changing requirements, even late in development. Agile processes harness change for the customer's competitive advantage." And: "Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale."
The manifesto was not describing a new invention. It was naming and defending a practice that effective development teams had been using for decades against critics who insisted that proper software development required comprehensive upfront planning.
Example: Amazon's evolution from an online bookstore to a global logistics and cloud computing company illustrates iterative development at organizational scale. Amazon's first website in 1995 sold books. The company did not design a "Books, DVDs, Electronics, Cloud Services, Logistics, Advertising" company in 1994. Each capability was added iteratively based on what was learned from the previous one. The Amazon Web Services team, which launched in 2006, emerged partly from Amazon's realization that it had built internal infrastructure capabilities worth selling. The opportunity was discovered through building, not planned in advance.
The Problem Understanding Stage
The Most Expensive Mistake: Building the Wrong Thing
The 2020 Standish Group CHAOS report found that 35% of software features are never used by anyone, and another 45% are rarely used. If these numbers are accurate -- and they are consistent across multiple independent studies -- then roughly 80% of development effort produces features that add minimal value to users.
The economic implication is striking. If a software team could eliminate the 80% of work that produces minimal value and focus entirely on the 20% that matters, they would be five times as productive without anyone working any harder or any faster. The bottleneck is not execution speed; it is problem selection.
This is why the stage of understanding the problem before writing code is not optional overhead -- it is the primary leverage point in the development process.
Techniques for Problem Understanding
User interviews investigate actual behavior rather than assumed behavior. The cardinal rule of user research: do not ask hypothetical questions. "Would you use a feature that did X?" generates useless answers -- people say yes to hypothetical features because they are trying to be helpful. Instead, investigate what users actually do: "Walk me through what you did the last time you needed to accomplish X." Current behavior is reality; stated preferences are speculation.
Prototyping before coding tests the design before it is expensive to change. A clickable prototype in Figma or Sketch takes hours to create and can be tested with users before any code is written. When a prototype test reveals that users misunderstand the design, that discovery costs nothing to address. The same discovery after six weeks of development costs six weeks of rework.
Data analysis reveals patterns in existing behavior that interviews might miss. If users of an existing product consistently abandon a particular workflow step, the data reveals this regardless of what users say in interviews. Combining behavioral data with interview insights produces more reliable understanding than either alone.
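The abandoned-workflow pattern mentioned above is straightforward to surface from event counts. A minimal sketch, with hypothetical step names and counts standing in for real analytics data:

```python
# Locate the workflow step where users abandon.
# Step names and counts are hypothetical illustration data.
funnel = [
    ("open_checkout", 1000),
    ("enter_address", 820),
    ("enter_payment", 790),
    ("confirm_order", 410),  # large drop-off worth investigating
]

def drop_off_rates(steps):
    """Return (step_name, fraction of users lost) for each transition."""
    rates = []
    for (_, prev), (name, count) in zip(steps, steps[1:]):
        rates.append((name, 1 - count / prev))
    return rates

for name, lost in drop_off_rates(funnel):
    print(f"{name}: {lost:.0%} of users lost at this step")
```

A 48% drop at the payment confirmation step is a concrete question to bring into user interviews, which is exactly the combination of behavioral data and interview insight the text describes.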
Minimum viable products test the core value proposition with the smallest possible implementation. The MVP concept, popularized by Eric Ries in The Lean Startup, is not about building a bad product -- it is about identifying the minimum that demonstrates value and testing that before investing in the rest. Dropbox's MVP was a three-minute video explaining what the product would do, before the product existed. The video grew the beta waiting list from 5,000 to 75,000 signups overnight, validating demand before a single line of the eventual product was written.
Translating Understanding Into Requirements
Once the problem is understood, requirements must be expressed in a form that enables development. The format matters less than the content.
User stories express requirements from the user's perspective: "As a [type of user], I want to [do something] so that [I achieve some goal]." The format forces requirements to be stated in terms of user need rather than implementation specification. "As a customer, I want to see my order history so I can track previous purchases" is a user story. "Implement an API endpoint that returns order records for a given user ID" is not -- it specifies implementation before the user need is clear.
Acceptance criteria define when a requirement is satisfied: the specific conditions that must be true for the story to be considered complete. Clear acceptance criteria prevent the endless disagreement about whether a feature is "done" -- the criteria either pass or they do not.
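The order history story above could carry acceptance criteria in the common Given/When/Then (Gherkin) form. The scenarios below are an illustrative sketch, not taken from any real specification:

```gherkin
Feature: Order history
  # User story: As a customer, I want to see my order history
  # so that I can track previous purchases.

  Scenario: Customer with past orders views their history
    Given a signed-in customer with three completed orders
    When they open the order history page
    Then they see all three orders, newest first
    And each order shows its date, items, and total

  Scenario: Customer with no orders views their history
    Given a signed-in customer with no completed orders
    When they open the order history page
    Then they see a message that no orders exist yet
```

Each scenario either passes or it does not, which is what makes "done" a binary question rather than a negotiation.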
Architecture and Technology Decisions
Making Technology Choices That Will Not Be Regretted
Technology decisions made early constrain the project for years. Choosing a programming language, database, cloud provider, or architectural pattern is not easily reversed. These choices deserve serious consideration -- but not so much consideration that the project never starts.
The most common mistake in early architecture decisions is designing for scale that does not yet exist. A startup with 100 users building a microservices architecture capable of serving 10 million users is optimizing for a problem they do not have while creating the problem of unnecessary complexity. The engineering complexity of microservices is real and immediate; the scaling benefits are theoretical and distant.
Start with the simplest architecture that works. A monolith -- a single deployable application -- is appropriate for most new projects. It is easier to build, easier to deploy, easier to debug, and easier to understand than a distributed system. If the monolith later needs to be decomposed into services, that decomposition can be done with knowledge of which boundaries matter, informed by the actual usage patterns of a real product. Premature decomposition imposes the costs of distribution before the benefits materialize.
The same principle applies to database selection, infrastructure complexity, and framework choice. Complexity is a form of technical debt taken on at the start of the project. It should be justified by requirements that exist now, not by anticipated future scale.
Example: Stack Overflow, which as of 2022 served 1.5 billion page views per month, ran primarily on a small number of physical servers rather than a large cloud-native distributed system. The company's engineering team wrote extensively about this architecture -- they could achieve their scale with a monolith and careful performance optimization rather than distributed systems complexity. The lesson is not that distributed systems are never appropriate; it is that architectural simplicity is valuable and should be preserved as long as it remains viable.
The Architecture Decision Record
Significant architecture decisions should be documented. The Architecture Decision Record (ADR) format, introduced by Michael Nygard, captures:
- The decision that was made
- The context that made it necessary
- The options that were considered
- The rationale for the chosen option
- The consequences expected
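A hypothetical ADR in this format might read as follows (the project, options, and dates are invented for illustration):

```markdown
# ADR 007: Use PostgreSQL as the primary datastore

Status: Accepted, 2024-03-12

Context: Order and payment records need transactional guarantees,
and the team has operational experience with relational databases.

Options considered: PostgreSQL, MySQL, MongoDB.

Decision: PostgreSQL, for its transactional guarantees, JSON support,
and the team's existing operational familiarity.

Consequences: We take on schema migration management; document-style
data will live in JSONB columns rather than a separate document store.
```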
ADRs serve two purposes. First, they force the decision-making process to be explicit, which tends to improve decision quality. Second, they provide the future team with the reasoning behind decisions that may otherwise appear arbitrary. "Why is this service written in Go when everything else is Python?" has a better answer if the decision to use Go was documented at the time it was made.
The Daily Practice of Development
What Developers Actually Do
The popular image of software development is typing code. The reality is more varied. Research on developer time allocation consistently finds that writing new code represents a minority of actual working time.
Robert C. Martin, author of Clean Code, estimated that the ratio of time spent reading code to time spent writing it is well over 10:1. Before writing anything, a developer must understand the existing system that the new code will integrate with. This reading time is not waste -- it is the foundation that makes the writing correct.
A realistic breakdown of a professional developer's time:
- Reading and understanding existing code: 25-30%
- Writing new code: 20-25%
- Debugging and investigating issues: 15-20%
- Code review (reviewing others' code): 10-15%
- Meetings and communication: 10-15%
- Documentation and administrative tasks: 5-10%
The implication is that code quality and readability are not aesthetic concerns -- they are economic ones. Code that is difficult to read requires more time to understand before modification. Across an engineering organization of 50 people, the difference between readable and unreadable code accumulates into thousands of hours of lost productivity per year.
The Task Lifecycle
A single task -- fixing a bug, implementing a feature, improving performance -- moves through a consistent lifecycle regardless of the team's specific process:
Discovery and specification: The task is identified and described. What is the expected behavior? What is the actual behavior? What constraints apply? What are the acceptance criteria?
Estimation: How long will this take? Estimation is systematically difficult because development work involves uncertainty that is hard to quantify. The most reliable approach is breaking work into small pieces (ideally under two days) before estimating -- small pieces are more predictable than large ones.
Implementation planning: Before writing code, the developer understands the scope of change. What existing code is relevant? What approach will be taken? What tests are needed?
Development: Writing code, writing tests, verifying locally that the implementation is correct.
Code review: Submitting the change for review by teammates, addressing feedback, revising as needed.
Integration and testing: Merging the change into the shared codebase, verifying that automated tests pass, deploying to staging for manual verification when appropriate.
Deployment: Moving the change to production, monitoring for problems.
The lifecycle for a small bug fix might compress into a single morning. A significant feature might take weeks, with the implementation broken into smaller pieces that each move through the cycle independently.
Testing: Verifying That Software Works
The Cost of Finding Bugs Late
The cost to fix a software defect increases by roughly an order of magnitude at each stage of development. A bug caught by the developer while writing code might take five minutes to fix. The same bug caught in code review might take thirty minutes. Caught in QA testing, an hour including regression verification. Caught in production, it might require emergency deployment, rollback, customer communication, and post-incident review -- representing hours or days of multiple people's time, plus potential business impact.
These numbers, which originate in Barry Boehm's 1981 research and have been replicated many times since, make an economic case for investing in practices that catch defects early: automated testing, code review, static analysis, and type systems.
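The escalation compounds because each stage multiplies the previous cost. A toy calculation makes this concrete (the stage names and the 5-minute base cost are illustrative, not Boehm's measured figures):

```python
# Toy model of defect-fix cost escalating ~10x per stage.
# Base cost and stage names are illustrative, not empirical figures.
STAGES = ["coding", "code review", "QA testing", "production"]
BASE_COST_MINUTES = 5  # cost if caught immediately while coding

def fix_cost(stage: str) -> int:
    """Estimated minutes to fix a defect caught at the given stage."""
    return BASE_COST_MINUTES * 10 ** STAGES.index(stage)

for stage in STAGES:
    print(f"caught in {stage}: ~{fix_cost(stage)} minutes")
```

Under this model a production defect costs ~5,000 minutes -- on the order of ten working days of combined effort -- against five minutes at the keyboard, which is the economic argument for the early-detection practices listed above.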
The Testing Pyramid
The testing pyramid is a framework for thinking about the right mix of test types:
Unit tests (many, fast): Test individual functions or classes in isolation, with dependencies replaced by test doubles. Run in milliseconds. Catch logic errors at the point of creation. A comprehensive unit test suite gives the developer immediate feedback while writing code.
Integration tests (moderate, medium speed): Test how components work together -- the database layer with the service layer, the service with an external API. Run in seconds to minutes. Catch interface mismatches and integration assumptions.
End-to-end tests (few, slow): Test complete user workflows through the full application stack. Run in minutes. Catch system-level issues that do not appear in isolated component tests.
The pyramid shape reflects the recommended distribution: many unit tests, fewer integration tests, even fewer end-to-end tests. This distribution produces a fast, reliable test suite. Teams that invert the pyramid -- heavy reliance on end-to-end tests -- have slow, flaky test suites that do not provide the fast feedback needed during development.
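At the base of the pyramid, a unit test exercises one function in isolation and replaces its collaborators with test doubles. A minimal sketch using Python's unittest (the tax-calculation function and the rate lookup are hypothetical):

```python
import unittest
from unittest.mock import Mock

def total_with_tax(fetch_rate, subtotal, region):
    """Apply the tax rate for `region` to `subtotal`.

    `fetch_rate` is injected so a test can replace the real
    rate lookup (a database call, say) with a test double."""
    return round(subtotal * (1 + fetch_rate(region)), 2)

class TotalWithTaxTest(unittest.TestCase):
    def test_applies_regional_rate(self):
        fake_rates = Mock(return_value=0.08)  # stands in for the real lookup
        self.assertEqual(total_with_tax(fake_rates, 100.0, "CA"), 108.0)
        fake_rates.assert_called_once_with("CA")  # verify the collaboration

# Run with: python -m unittest <this file>
```

Because the dependency is injected, the test runs in milliseconds with no database -- which is what allows a suite of thousands of such tests to give feedback while the developer is still typing.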
Test-Driven Development
Test-Driven Development (TDD), popularized by Kent Beck, reverses the typical sequence: write the test first, then write the code that makes the test pass.
The cycle is:
- Write a failing test that describes the behavior you want
- Write the minimum code that makes the test pass
- Refactor the code while keeping the tests passing
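One pass through the cycle, using a hypothetical `slugify` helper: the test is written first (and would fail, since the function does not yet exist), then the minimum implementation turns it green:

```python
import re

# Step 1 (red): the test is written before the implementation exists.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaces  everywhere ") == "spaces-everywhere"

# Step 2 (green): the minimum implementation that passes the test.
def slugify(title: str) -> str:
    """Lowercase, strip punctuation, join words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)

test_slugify()  # passes; step 3 is refactoring with the test as a safety net
```

The test doubles as a specification: it states what `slugify` must do before any decision about how it does it.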
TDD has two effects that defenders cite. First, it forces the developer to think clearly about what the code should do before writing the code -- the test is a specification. Second, it produces code that is inherently testable, because the code was written specifically to be testable from the start. Code that was written first and tested afterward is often difficult to test without significant refactoring.
Critics of TDD note that it is slower in the short term and requires discipline to maintain. Proponents argue that the investment returns multiple times through reduced debugging and safer refactoring.
Deployment: Getting Code to Users
The Gap Between Development and Production
"It works on my machine" is a cliche for a reason. Development environments differ from production environments in ways that consistently reveal unexpected bugs: different operating system versions, different library versions, different data volumes, different network conditions, different concurrent user loads.
Modern deployment practice minimizes these differences through containerization. Docker packages an application with all its dependencies into a container image that runs identically on the developer's laptop, in the CI pipeline, in staging, and in production. The environment is defined as code rather than installed manually, eliminating an entire category of environment-specific bugs.
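A container image is defined by a short build file. A minimal Dockerfile sketch for a hypothetical Python service (the base image tag, paths, and module name are illustrative assumptions):

```dockerfile
# Pin the exact base image so every environment uses the same runtime.
FROM python:3.12-slim

WORKDIR /app

# Install dependencies first so this layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# The same image runs on a laptop, in CI, in staging, and in production.
CMD ["python", "-m", "myapp.server"]
```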
The Path to Production
A production deployment typically follows:
Automated build: The application source code is compiled, assets are built, and a deployment artifact is created -- a Docker image, an executable binary, a ZIP archive.
Automated testing: The CI pipeline runs the full test suite against the built artifact. Security scanning checks for known vulnerabilities. Linting verifies code quality standards.
Staging deployment: The artifact deploys to a staging environment that mirrors production. Automated smoke tests verify core functionality.
Production deployment: The artifact deploys to production. Modern deployment strategies (canary, blue-green, rolling) minimize the risk of deployment failures affecting all users simultaneously.
Post-deployment monitoring: Error tracking, performance monitoring, and alerting watch for regressions introduced by the deployment. Problems that appear in the first minutes after deployment are caught before affecting most users.
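The stages above map naturally onto a pipeline definition. A sketch in GitHub Actions-style YAML, with hypothetical job names, image tags, and a deploy script that is assumed to exist:

```yaml
name: build-test-deploy
on:
  push:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build artifact
        run: docker build -t myapp:${{ github.sha }} .
      - name: Run test suite against the built artifact
        run: docker run myapp:${{ github.sha }} python -m pytest

  deploy-staging:
    needs: build-and-test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to staging and run smoke tests
        run: ./scripts/deploy.sh staging ${{ github.sha }}  # hypothetical script
```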
Example: Etsy's deployment practice, described in detail by their engineering team in multiple talks and blog posts, includes a practice called "feature flags" that enables deploying code to production without making features visible to users. Developers at Etsy could deploy to production dozens of times per day because each deployment was a small, isolated change, and any new features could be disabled if problems appeared. The practice reduced deployment risk while increasing deployment frequency -- a counterintuitive result that demonstrates the value of small, incremental change.
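A feature flag is, at its simplest, a configuration lookup consulted at the decision point: the code ships in the deployment, and the flag controls who sees it. A minimal sketch with illustrative flag names and a percentage rollout (real systems typically use a flag service rather than an in-process dict):

```python
import hashlib

# Flags live in config or a flag service; deploying code does not flip them.
FLAGS = {
    "new_checkout": {"enabled": True, "rollout_percent": 10},
    "dark_mode": {"enabled": False, "rollout_percent": 0},
}

def is_enabled(flag: str, user_id: str) -> bool:
    """Deterministically bucket each user so rollout is stable per user."""
    cfg = FLAGS.get(flag)
    if not cfg or not cfg["enabled"]:
        return False
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < cfg["rollout_percent"]

# At the call site: the new code path is deployed but gated.
def checkout_page(user_id: str) -> str:
    if is_enabled("new_checkout", user_id):
        return "new checkout flow"
    return "old checkout flow"
```

Turning a problematic feature off becomes a configuration change rather than an emergency deployment, which is what makes many small deployments safe.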
The Maintenance Reality
Software Is Never Finished
A deployed application does not enter a stable state. It requires continuous attention:
Security vulnerabilities are discovered in dependencies constantly. A library that was safe when installed may have a known vulnerability a week later. Unpatched vulnerabilities are exploited. Security maintenance is not optional.
Performance characteristics change as data volumes grow and usage patterns evolve. A database query that runs in 50ms with 10,000 records may run in 30 seconds with 10 million records. Performance problems that did not exist at launch appear over time.
Dependencies become outdated. Programming languages release new versions. Frameworks add capabilities that make old patterns obsolete. Cloud services deprecate APIs. Dependency updates are ongoing work, not a one-time event.
User needs evolve. User research reveals new pain points. Competitors add features that users expect the product to match. Business strategy changes. The product must evolve with these needs.
The Stripe Developer Coefficient report of 2018 estimated that developers spend 42% of their working time on maintenance of existing systems rather than building new capabilities. This number is not a failure of planning -- it is the nature of living software in a changing environment.
Technical Debt Accumulation and Repayment
Every software system accumulates technical debt over time: the accumulated cost of previous expedient decisions. Code written under deadline pressure that was correct but not clean. Architecture that worked at one scale but requires redesign at another. Tests that were skipped. Documentation that was deferred.
Debt is not inherently problematic. Taking on technical debt deliberately -- choosing a known shortcut with a plan to address it later -- is a legitimate engineering trade-off. The problem is unacknowledged debt that accumulates invisibly until the codebase becomes difficult to change.
High-performing teams manage technical debt as a first-class concern: tracking it explicitly, allocating time for repayment (often 20-30% of sprint capacity), and addressing it proactively before it reaches critical levels. The alternative -- treating debt as invisible until it becomes a crisis -- produces the rewrite cycles that consume years of effort in mature engineering organizations.
The connection between development workflow practices, technical debt management, and team productivity is explored further in development workflows and developer productivity.
The Human Dimension
Communication as Engineering Infrastructure
Software is built by people in coordination with other people. The quality of communication -- between developers, between developers and product managers, between the engineering team and users -- determines outcomes as much as technical skill.
A development team that communicates poorly builds the wrong things correctly. Features that are technically excellent solve problems that users do not have. Architectural decisions that are sound in isolation create integration problems because they were made without sufficient coordination.
The practices that enable good communication are not soft: written specifications, documented decisions, code review comments that explain reasoning, retrospectives that surface and address process problems. These are engineering practices with measurable impact on outcomes.
Psychological safety -- the team member's belief that they can raise concerns without punishment -- determines whether problems are identified and addressed before they become expensive. Teams where developers feel safe saying "I think we are building the wrong thing" or "I am not sure this is the right approach" discover problems early. Teams where these concerns are suppressed discover them in production.
Onboarding and Knowledge Transfer
The efficiency with which new team members become productive is a measurable organizational capability. A team where a new developer requires six months to become independently productive loses six months of engineering capacity per hire. A team with clear documentation, clean code, good test coverage, and deliberate onboarding processes may achieve that same productivity in six weeks.
The investment in making knowledge transferable -- through documentation, code clarity, architectural decision records, and mentoring -- pays continuous returns. It also reduces the organizational risk of knowledge concentration in individuals who may leave.
What Separates Teams That Ship from Those That Struggle
Research from DORA (DevOps Research and Assessment) has identified specific practices that correlate with elite software delivery performance. Teams in the top tier of delivery performance:
Deploy to production multiple times per day, with lead times under one hour. This requires comprehensive automated testing, streamlined deployment pipelines, and organizational trust in the process.
Have change failure rates below 15% and restore service from failures in under one hour. This requires monitoring, rollback capabilities, and incident response practices.
Practice trunk-based development with short-lived feature branches. Long-lived branches are a risk factor for integration problems and delayed feedback.
The research finding that most challenges expectations: deployment frequency and stability are not in tension. The teams that deploy most frequently have the fewest failures. Frequent small deployments are safer than infrequent large ones -- they are easier to understand, easier to roll back, and produce faster feedback when problems occur.
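Two of the DORA metrics are simple ratios over a deployment log, which makes them easy to start measuring. A sketch with illustrative data:

```python
from datetime import date

# Illustrative deployment log: (date, succeeded) pairs.
deployments = [
    (date(2024, 5, 1), True),
    (date(2024, 5, 1), True),
    (date(2024, 5, 2), False),  # this deployment triggered an incident
    (date(2024, 5, 2), True),
    (date(2024, 5, 3), True),
]

# Deployment frequency: deployments per day over the observed window.
days = (max(d for d, _ in deployments) - min(d for d, _ in deployments)).days + 1
frequency = len(deployments) / days

# Change failure rate: fraction of deployments that caused a failure.
failure_rate = sum(1 for _, ok in deployments if not ok) / len(deployments)

print(f"deployment frequency: {frequency:.1f}/day")
print(f"change failure rate: {failure_rate:.0%}")
```

The remaining two metrics, lead time for changes and time to restore service, are durations rather than ratios, but come from equally ordinary records: commit timestamps and incident logs.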
Software development at its best is a continuous learning process: build, observe, learn, adapt. The teams that have internalized this rhythm -- and built the organizational practices that support it -- consistently outperform those that treat it as a linear project with a beginning and an end.
References
- Ries, Eric. The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses. Crown Business, 2011. https://theleanstartup.com/
- Forsgren, Nicole, Humble, Jez, and Kim, Gene. Accelerate: The Science of Lean Software and DevOps. IT Revolution Press, 2018. https://itrevolution.com/accelerate-book/
- Standish Group. "CHAOS 2020: Beyond Infinity." Standish Group, 2020.
- Martin, Robert C. Clean Code: A Handbook of Agile Software Craftsmanship. Prentice Hall, 2008.
- Beck, Kent. Test Driven Development: By Example. Addison-Wesley, 2002.
- Boehm, Barry. Software Engineering Economics. Prentice Hall, 1981.
- Royce, Winston. "Managing the Development of Large Software Systems." Proceedings of IEEE Wescon, 1970.
- Kim, Gene, Behr, Kevin, and Spafford, George. The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win. IT Revolution Press, 2013. https://itrevolution.com/the-phoenix-project/
- Nygard, Michael T. Release It!: Design and Deploy Production-Ready Software. Pragmatic Bookshelf, 2018. https://pragprog.com/titles/mnee2/release-it-second-edition/
- Stripe. "The Developer Coefficient." stripe.com, 2018. https://stripe.com/reports/developer-coefficient-2018
- Netflix Technology Blog. "Full Cycle Developers at Netflix." netflixtechblog.com. https://netflixtechblog.com/full-cycle-developers-at-netflix-a08c31f83249