Project Metrics Explained: Measuring What Matters
In 2008, Flickr was one of the most popular photo-sharing platforms on the internet. Their engineering dashboard tracked a metric the team was proud of: lines of code committed per week. Engineers competed informally to top the leaderboard. Gradually, the codebase became bloated with unnecessary features, redundant implementations, and code written to boost the metric rather than to solve user problems. Meanwhile, a small startup called Instagram launched in 2010 with a tiny codebase, focused on a single metric -- daily active users -- and within 18 months had more users than Flickr had ever achieved.
The metrics you choose to track shape the behavior of your team more powerfully than any mission statement, incentive program, or motivational speech. Choose the wrong metrics and your team will optimize for the wrong outcomes -- enthusiastically and efficiently marching in the wrong direction.
The Metric Hierarchy: Leading vs. Lagging Indicators
The most important distinction in project metrics is between leading indicators (predictive signals that suggest future outcomes) and lagging indicators (historical measures that confirm what already happened).
Lagging Indicators (What Already Happened)
- Project completed on time (yes/no)
- Total budget consumed
- Features delivered per release
- Defect count after deployment
- Customer satisfaction score post-launch
The problem with lagging indicators: By the time they signal trouble, it is too late to intervene. Discovering that you are over budget at the end of the project is useless information. Discovering that customer satisfaction dropped after launch means the damage is done.
Leading Indicators (What Will Likely Happen)
- Burn rate vs. planned spending (are we consuming budget faster than expected?)
- Velocity trends (is delivery speed increasing, stable, or declining?)
- Blocker age (how long do obstacles sit before resolution?)
- Team morale trends (are energy and enthusiasm increasing or declining?)
- Requirements stability (how frequently is scope changing?)
- Estimation accuracy (are our estimates improving or degrading?)
Leading indicators enable intervention. If burn rate is exceeding plan in month 2 of a 12-month project, you can adjust course with 10 months of runway. If velocity is declining steadily over three sprints, you can investigate root causes before delivery dates are missed.
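One way to turn a velocity trend into an actionable signal is to fit a simple least-squares slope over recent sprints: a clearly negative slope is the cue to investigate. This is an illustrative sketch, not a standard formula from the text; the function name and sample numbers are assumptions.

```python
def velocity_slope(velocities):
    """Least-squares slope of velocity across consecutive sprints.

    Assumes at least two sprints of data. A persistently negative
    slope is the signal to investigate root causes before dates slip.
    """
    n = len(velocities)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(velocities) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, velocities))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Three sprints of steadily declining velocity (hypothetical data)
print(velocity_slope([42, 38, 33]))  # negative slope: investigate now
print(velocity_slope([10, 10, 10]))  # zero slope: stable delivery
```

The point of fitting a slope rather than eyeballing two numbers is that it smooths over single-sprint noise while still catching a sustained decline early.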
Example: Google uses a concept called "launch readiness reviews" that incorporate leading indicators -- not just "is the feature built?" but "what does the error rate trend look like?" and "what do early user testing results suggest?" A feature that is technically complete but showing concerning error rate trends will not pass the readiness review.
The Metrics That Actually Indicate Project Health
Delivery Against Commitments
What it measures: Of the work the team committed to delivering in a given period (sprint, milestone, quarter), what percentage was actually delivered?
Why it matters: This metric reveals both planning accuracy and execution reliability. Consistent 80-90% delivery suggests good estimation and strong execution. Consistently under 60% suggests either chronic overcommitment or execution problems. Consistently 100% suggests the team is sandbagging -- committing to less than their capacity to ensure they always hit targets.
The nuance: Measuring delivery against commitments requires honest commitments. If the team is pressured to commit to unrealistic targets, the metric becomes meaningless.
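The computation itself is simple set arithmetic: only items in the original commitment count toward the percentage, which keeps mid-period additions from inflating the number. A minimal sketch (the ticket IDs and helper name are illustrative assumptions):

```python
def delivery_rate(committed, delivered):
    """Percentage of originally committed items actually delivered.

    Work added mid-period is deliberately excluded, so the metric
    measures the commitment, not total activity.
    """
    committed = set(committed)
    done = committed & set(delivered)
    return 100.0 * len(done) / len(committed)

# Hypothetical sprint: 10 committed stories, 8 of them delivered,
# plus one unplanned item that should not count toward the rate
committed = [f"STORY-{i}" for i in range(1, 11)]
delivered = committed[:8] + ["STORY-99"]
print(delivery_rate(committed, delivered))  # 80.0
```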
Cycle Time
What it measures: How long it takes a work item to move from "started" to "done." Not calendar time on the backlog -- actual working time from when someone begins to when it is delivered.
Why it matters: Cycle time is one of the purest measures of process efficiency. If your average cycle time is increasing, something is wrong -- more dependencies, more complexity, more blockers, more rework. Decreasing cycle time suggests improving process health.
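Measured this way, cycle time is just the mean elapsed time between each item's start and completion timestamps. A minimal sketch, assuming you can export those two timestamps per work item from your tracker:

```python
from datetime import datetime

def average_cycle_time(items):
    """Mean time in days from 'started' to 'done' per work item.

    `items` is a list of (started, done) datetime pairs. Backlog wait
    time before work begins is deliberately excluded.
    """
    deltas = [(done - started).total_seconds() for started, done in items]
    return sum(deltas) / len(deltas) / 86400  # seconds -> days

# Two hypothetical work items: 3 days and 5 days of working time
items = [
    (datetime(2024, 3, 1), datetime(2024, 3, 4)),
    (datetime(2024, 3, 2), datetime(2024, 3, 7)),
]
print(average_cycle_time(items))  # 4.0
```

Tracking this average per sprint (rather than per item) is what surfaces the upward drift the text warns about.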
Example: Etsy famously focused on reducing deployment cycle time from once per week in 2009 to 50+ deployments per day by 2013. Shorter cycle times meant faster feedback, smaller changes (less risk per deployment), and quicker response to customer needs. The metric drove behavior that improved the entire engineering operation.
Burn Rate vs. Budget
What it measures: The rate at which the project consumes its budget compared to the planned spend rate.
Why it matters: If you are spending 120% of planned rate, you will run out of budget before the project completes. If you are spending 70% of planned rate, you may have over-allocated resources or the project may be behind schedule (people are not working on it as planned).
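The metric reduces to a single ratio of actual spend to planned spend at the current point in time. A sketch with hypothetical figures (the 12-month, $600k project and its month-2 plan are invented for illustration):

```python
def burn_rate_ratio(spent_to_date, planned_to_date):
    """Actual spend as a fraction of planned spend to date.

    Above 1.0: budget runs out before the project completes at this
    pace. Well below 1.0: possible under-staffing or schedule slip.
    """
    return spent_to_date / planned_to_date

# Month 2 of a hypothetical 12-month, $600k project:
# the plan says $100k should be spent by now, but $120k is gone
ratio = burn_rate_ratio(spent_to_date=120_000, planned_to_date=100_000)
print(f"{ratio:.0%} of planned rate")  # 120% -- intervene with 10 months left
```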
Blocker Age
What it measures: How long blockers (dependencies, decisions needed, resources unavailable) remain unresolved.
Why it matters: Aging blockers indicate coordination failure. If the average blocker sits for 5 days before resolution, that is 5 days of potential delay per blocked work item. Tracking blocker age creates accountability for removing obstacles quickly.
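Computing the metric only needs the date each open blocker was raised. A minimal sketch (dates and the 5-day figure are illustrative, chosen to match the example above):

```python
from datetime import date

def average_blocker_age(opened_dates, today):
    """Mean age in days of currently unresolved blockers."""
    ages = [(today - opened).days for opened in opened_dates]
    return sum(ages) / len(ages)

# Three hypothetical open blockers, aged 9, 4, and 2 days
open_blockers = [date(2024, 5, 1), date(2024, 5, 6), date(2024, 5, 8)]
print(average_blocker_age(open_blockers, today=date(2024, 5, 10)))  # 5.0
```

Reviewing this number in standups makes obstacle removal visible work rather than background noise.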
Defect Rate and Rework Percentage
What it measures: How many defects are found per unit of delivered work, and what percentage of effort goes to fixing problems rather than building new capabilities.
Why it matters: Rising defect rates and increasing rework signal quality degradation -- the team is shipping faster but creating problems that consume future capacity. This is the leading indicator of technical debt accumulation.
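Both numbers are simple ratios, but they answer different questions: defect rate normalizes by output, rework percentage normalizes by effort. A sketch with invented quarterly figures:

```python
def defect_rate(defects_found, items_delivered):
    """Defects found per delivered work item over a period."""
    return defects_found / items_delivered

def rework_percentage(rework_hours, total_hours):
    """Share of total effort spent fixing problems, not building."""
    return 100.0 * rework_hours / total_hours

# Hypothetical quarter: 18 defects across 60 delivered stories,
# 120 of 480 engineering hours spent on fixes
print(defect_rate(18, 60))           # 0.3 defects per story
print(rework_percentage(120, 480))   # 25.0 percent of capacity on rework
```

It is the trend in these ratios across periods, not any single value, that signals technical debt accumulating.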
Team Health Metrics
What it measures: Working hours trends, voluntary turnover, satisfaction survey results, and qualitative feedback from team members.
Why it matters: Team health is the ultimate leading indicator. Declining morale predicts declining performance weeks or months in advance. Rising working hours predict burnout. Turnover of key team members can cripple a project.
Why Dashboards Fail
Problem 1: Measuring What Is Easy, Not What Is Useful
Most dashboards display metrics that are easy to capture automatically: lines of code, number of commits, story points completed, tickets closed. These are activity metrics -- they measure busyness, not effectiveness.
The dashboard should answer one question: "Are we on track to deliver value?" If the metrics on the dashboard do not answer that question, they are decoration.
Problem 2: Manual Update Burden
If maintaining the dashboard requires someone to spend hours each week copying data between systems, updating spreadsheets, and compiling reports, the dashboard will be abandoned as soon as that person gets busy -- which is exactly when the dashboard would be most valuable.
Effective dashboards are automated. They pull data from the tools where work actually happens (project tracking, source control, deployment pipelines) and display it without manual intervention.
Problem 3: Too Many Metrics
A dashboard with 30 metrics is a dashboard with zero insights. Nobody can process that much information at a glance. Five to seven metrics is the maximum for a useful dashboard. Choose them carefully, and be willing to change them as the project evolves.
Example: Basecamp co-founder Jason Fried has argued against dashboards entirely, preferring periodic written narratives about project health. His reasoning: "Numbers without context are worse than useless -- they create false precision." While this extreme position is not practical for all teams, the underlying point is valid: metrics without context mislead.
Problem 4: No Connection to Decisions
The test of a useful metric: if this number goes up or down, would it change a decision you make?
If velocity drops 20%, do you investigate? Do you adjust scope? Do you reallocate resources? If yes, velocity is a useful metric. If the answer is "we'd note it but wouldn't do anything differently," the metric is not driving decisions and should not be on the dashboard.
Tracking Risk Without Creating Alarm Fatigue
Risk tracking is essential but easily overdone. A risk register with 150 items creates the same paralysis as no risk tracking at all.
The Prioritization Framework
Score each risk on two dimensions:
- Probability: How likely is this to occur? (Low / Medium / High)
- Impact: If it occurs, how bad is it? (Low / Medium / High)
Focus active management on the high-probability, high-impact quadrant. Monitor the low-probability, high-impact quadrant (black swans). Accept or ignore the low-probability, low-impact quadrant.
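The triage above can be expressed as a tiny classifier over the two scores. This is one possible reading of the framework -- treating Medium as high enough to warrant active management is an assumption, as are the bucket names:

```python
LEVELS = {"Low": 1, "Medium": 2, "High": 3}

def risk_quadrant(probability, impact):
    """Map a (probability, impact) pair to a management bucket.

    Assumption: Medium scores on both axes are enough to trigger
    active management; adjust the thresholds to taste.
    """
    p, i = LEVELS[probability], LEVELS[impact]
    if p >= 2 and i >= 2:
        return "actively manage"
    if i == 3:
        return "monitor (black swan)"
    return "accept"

print(risk_quadrant("High", "High"))  # actively manage
print(risk_quadrant("Low", "High"))   # monitor (black swan)
print(risk_quadrant("Low", "Low"))    # accept
```

Running every item in a 150-entry register through a rule like this is what shrinks the list you actively discuss to the handful that matter.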
Reporting That Informs Without Overwhelming
- In status reports: Highlight only risks that are critical, changing status, or requiring decisions. Not every risk needs to be mentioned every week.
- In the risk register: Maintain the complete list for reference, but do not force stakeholders to review it entirely each cycle.
- Trends matter more than states: "This risk is increasing because the vendor missed their second consecutive milestone" is more informative than "Vendor risk: HIGH."
Metrics for Uncertain Projects
When requirements are unclear or evolving, traditional progress metrics (percent complete, features delivered) break down because the target is moving.
Alternative Progress Indicators
Validated learning: How many key assumptions have been tested? How many hypotheses confirmed or invalidated?
Working software in user hands: Tangible artifacts that users can interact with, even if incomplete.
Decision velocity: How quickly are ambiguities being resolved and decisions being made?
Scope stability rate: Not expecting zero changes, but tracking whether the rate of change is decreasing (indicating convergence) or increasing (indicating chaos).
Confidence levels: Track not just what is done but what the team is confident about versus still uncertain about.
Example: At Amazon, Jeff Bezos reportedly asked teams to track "number of experiments run" as a key metric for new product development. The reasoning: in uncertain domains, the team that runs the most experiments learns the fastest, and learning is the true measure of progress.
The Danger of Gameable Metrics
Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."
Every metric can be gamed:
- Story points completed can be inflated by assigning more points to the same work
- Lines of code can be increased by writing verbose, duplicated code
- Tickets closed can be boosted by splitting work into smaller tickets
- On-time delivery can be achieved by reducing scope at the last minute
- Zero defects can be maintained by not testing thoroughly enough to find defects
The defense against gaming is layered metrics. No single metric tells the full story. Combine outcome metrics (customer satisfaction, business value) with process metrics (cycle time, defect rate) and input metrics (team health, investment allocation). Gaming one metric will create visible distortion in the others.
For a deeper treatment of measurement pitfalls, see "KPIs Explained Without Buzzwords."
Building a Metrics Practice
Start Small
Begin with three metrics:
- One delivery metric (delivery against commitments or cycle time)
- One quality metric (defect rate or rework percentage)
- One health metric (team satisfaction or blocker resolution time)
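A three-metric starting point can be as small as one function that flags whichever metric crosses a threshold. The thresholds below are illustrative starting points drawn from the ranges discussed earlier, not standards:

```python
def dashboard_snapshot(delivery_pct, rework_pct, blocker_days):
    """Minimal three-metric check: one delivery, one quality, one
    health metric. Threshold values are illustrative assumptions.
    """
    flags = []
    if delivery_pct < 60:
        flags.append("delivery: chronic overcommitment or execution issues")
    if rework_pct > 30:
        flags.append("quality: rework is consuming future capacity")
    if blocker_days > 5:
        flags.append("health: blockers are ageing too long")
    return flags or ["no flags -- review the trends anyway"]

# A hypothetical healthy week versus a troubled one
print(dashboard_snapshot(delivery_pct=85, rework_pct=12, blocker_days=2))
print(dashboard_snapshot(delivery_pct=50, rework_pct=40, blocker_days=7))
```

Even this crude rule set satisfies the decision test above: every flag maps directly to an investigation you would actually start.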
Automate Collection
If you cannot automate a metric, reconsider whether it is worth tracking. Manual metrics create maintenance burden and human error.
Review Regularly
Dedicate 10 minutes of your weekly team meeting to reviewing metrics trends. Not just the numbers -- the trends and the stories behind them.
Evolve the Metrics
As the project progresses, different metrics become relevant. Early-stage projects benefit from learning and experimentation metrics. Mid-stage projects benefit from delivery and coordination metrics. Late-stage projects benefit from quality and readiness metrics.
Never Punish Based on Metrics Alone
When metrics are used punitively, teams game them. When metrics are used diagnostically -- "this trend suggests we have a problem; let's understand it" -- teams keep them honest.
References
Anderson, D. J. "Kanban: Successful Evolutionary Change for Your Technology Business." Blue Hole Press, 2010.
Cohn, M. "Agile Estimating and Planning." Prentice Hall, 2005.
Doerr, J. "Measure What Matters: How Google, Bono, and the Gates Foundation Rock the World with OKRs." Portfolio, 2018.
Etsy. "Engineering at Etsy." Code as Craft Blog, 2013. https://www.etsy.com/codeascraft
Forsgren, N., Humble, J., & Kim, G. "Accelerate: The Science of Lean Software and DevOps." IT Revolution, 2018.
Goldratt, E. M. "The Goal: A Process of Ongoing Improvement." North River Press, 1984.
Goodhart, C. A. E. "Problems of Monetary Management: The U.K. Experience." Papers in Monetary Economics, Reserve Bank of Australia, 1975.
Hubbard, D. W. "How to Measure Anything: Finding the Value of Intangibles in Business." Wiley, 2014.
Reinertsen, D. G. "The Principles of Product Development Flow." Celeritas Publishing, 2009.
Ries, E. "The Lean Startup." Crown Business, 2011.