Few workplace rituals are as consistently dreaded, by managers and employees alike, as the annual performance review. Employees approach them with anxiety about judgment and compensation. Managers dread conversations that feel scripted, ratings that feel arbitrary, and forms that feel designed for compliance rather than growth. HR professionals spend significant time administering the process. And at the end of it, research suggests the results are mixed at best: some employees receive genuinely valuable feedback, many receive feedback so vague as to be unactionable, and some leave the conversation more confused about what is expected of them than they were going in.
This is not a universal indictment of structured performance conversations — they serve real functions that do not happen reliably without some formal mechanism. It is an indictment of how most performance reviews are designed and conducted. The problems are well-documented in the industrial-organizational psychology literature: rating scales that produce systematically biased results, cognitive biases that distort managers' recollections of performance, the structural tension between using the same conversation for developmental feedback and administrative decisions, and the one-year cadence that makes feedback too distant from the behavior it concerns.
Understanding these problems does not require accepting that performance reviews are hopeless. It requires understanding them well enough to either improve them within their structural constraints or advocate for approaches that work better. This article examines the research, maps the specific biases that distort reviews, explains the SBI feedback model and why its structure matters, addresses the continuous feedback versus annual review debate, and provides practical guidance for giving reviews that people find useful rather than performative.
"Most performance reviews are not really about performance. They are an HR compliance exercise dressed as a development conversation, and everyone involved knows it." — Common practitioner observation among organizational effectiveness consultants
Key Definitions
Performance appraisal: A formal, structured process for evaluating an employee's performance over a defined period, typically involving ratings, written assessments, and a conversation between manager and employee.
SBI model: Situation, Behavior, Impact — a feedback framework developed by the Center for Creative Leadership that structures feedback around observable behavior in a specific context and its documented effects.
Recency bias: The tendency to weight recent events disproportionately when evaluating performance over a longer period, skewing review accuracy toward the final weeks or months rather than the full review period.
Leniency bias: The systematic tendency of managers to assign higher performance ratings than are warranted, clustering ratings toward the high end of scales and avoiding negative evaluations.
Forced ranking: A performance management system that requires managers to distribute ratings according to a predetermined curve, placing fixed percentages of employees into each rating category.
What Research Says About Annual Reviews
The Evidence Base
Performance appraisal is among the most studied topics in industrial-organizational psychology. Herman Aguinis, a leading researcher in this area, has published extensively on what makes performance management systems effective versus counterproductive. His 2011 paper with Hee Joo and Ryan Gottfredson identified a consistent finding: performance appraisal systems vary enormously in effectiveness, with many producing outcomes that are worse than no formal system at all.
The core problem identified in the research is what Marcus Buckingham and Ashley Goodall, writing about Deloitte's performance management redesign, called the 'idiosyncratic rater effect': performance ratings tell us more about the rater than about the person being rated. Research cited by Buckingham and others found that up to 62% of the variance in performance ratings is attributable to characteristics of the rater — their tendency toward leniency or harshness, their relationship with the ratee, their own performance standards — rather than to objective performance differences. This means that annual performance ratings measure managerial perception with a large idiosyncratic bias rather than employee performance with high fidelity.
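The idiosyncratic rater effect is a claim about variance decomposition, which a small simulation can make concrete. The sketch below uses invented numbers, not data from the cited research: it generates ratings where the rater's personal offset contributes more variance than the ratee's true performance, then recovers each share from the ratings alone.

```python
import random
import statistics

random.seed(42)

# Illustrative simulation (invented parameters, not the cited research):
# each rating = rater's idiosyncratic offset + ratee's true performance + noise.
# Rater variance is deliberately set larger than ratee variance.
N_RATERS, N_RATEES = 50, 50
rater_effect = {r: random.gauss(0, 1.0) for r in range(N_RATERS)}   # leniency/harshness
ratee_effect = {e: random.gauss(0, 0.6) for e in range(N_RATEES)}   # true performance

ratings = []
for r in range(N_RATERS):
    for e in range(N_RATEES):
        ratings.append((r, e, rater_effect[r] + ratee_effect[e] + random.gauss(0, 0.3)))

def share_of_variance(index):
    """Variance of group means (grouped by rater or ratee) over total variance."""
    groups = {}
    for row in ratings:
        groups.setdefault(row[index], []).append(row[2])
    group_means = [statistics.mean(vals) for vals in groups.values()]
    total = statistics.pvariance([row[2] for row in ratings])
    return statistics.pvariance(group_means) / total

rater_share = share_of_variance(0)
ratee_share = share_of_variance(1)
print(f"rater share of rating variance: {rater_share:.0%}")
print(f"ratee share of rating variance: {ratee_share:.0%}")
```

Under these assumed parameters, most of the spread in ratings traces back to who did the rating, which is the structure the idiosyncratic rater effect describes.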
The Developmental vs. Administrative Tension
A fundamental structural problem with performance reviews is that they are typically asked to serve two distinct purposes simultaneously: administrative decisions (who gets what raise, who is promoted, who is on a performance improvement plan) and developmental conversations (how can this person grow, what are they doing well, what should change).
These two purposes pull in opposite directions. When an employee knows that what they say in the review affects their salary, they are inclined toward impression management rather than honest self-assessment. When a manager knows their ratings will be reviewed for calibration and that they will need to justify low ratings, they are inclined toward leniency rather than accurate assessment. The presence of stakes suppresses the honest exchange that developmental feedback requires.
Several major organizations have attempted to separate these functions. Adobe's redesign in 2012, led by Donna Morris, then the company's senior HR executive, eliminated the annual rating and replaced it with regular check-ins that explicitly excluded salary conversation from most feedback sessions. The company tracked voluntary turnover in the following years and found significant reductions, which Morris attributed partly to the change in feedback culture. Deloitte redesigned its performance management system around 2015 to eliminate ratings entirely in favor of questions about future-oriented decisions ('would I give this person the best assignment available?') that are easier to answer accurately.
Common Biases in Performance Reviews
| Bias | Description | Effect on Rating | How to Counter |
|---|---|---|---|
| Recency bias | Over-weights events near the review date | Ignores earlier performance; unfairly rewards/punishes recent behavior | Keep running notes throughout the year |
| Leniency bias | Rates employees higher than warranted | Rating inflation; managers avoid difficult conversations | Calibration sessions; forced distribution (cautiously) |
| Halo effect | One strong trait inflates overall rating | Masks real development areas | Evaluate each dimension separately |
| Similarity bias | Favors employees who resemble the manager | Disadvantages diverse employees | Structured criteria; 360 feedback |
| Idiosyncratic rater effect | Up to 62% of rating variance reflects the rater, not the ratee | Ratings measure manager perception, not performance | Combine multiple raters; focus on observable behaviors |
Cognitive Biases That Distort Performance Reviews
Recency Bias
Recency bias is probably the most practically significant and common bias in performance evaluation. When a manager sits down to write an annual review, they are relying primarily on memory of twelve months of performance. Memory is not a neutral archive — it is reconstructive and weighted toward recent events.
The result is that the employee's performance in the last month or two before the review period closes carries disproportionate weight relative to their performance in January through October. An employee who had a consistently strong year but made a visible mistake in November may receive a review that fails to credit the full year's performance. An employee who struggled for most of the year but had two successful project deliveries in the review-writing window may receive an inflated review.
The defense is mundane but effective: keeping contemporaneous notes. A manager who documents significant observations, accomplishments, and concerns throughout the year — in brief notes after significant events, in a running document updated monthly — has a much more accurate basis for review than one relying on end-of-period recall. This practice is not difficult; it is simply not habitual for most managers.
Leniency Bias
Leniency bias is the systematic overrating of performance relative to actual performance quality. In corporate rating distributions, the median rating is almost never 'meets expectations' — it is typically somewhere between 'meets' and 'exceeds' expectations, reflecting the tendency of managers to rate generously to avoid conflict, maintain relationships, and minimize administrative hassle from low-rating appeals.
A 2016 study by CEB (now Gartner) found that the majority of employees rated in corporate performance systems receive above-average ratings, which is mathematically impossible if ratings accurately reflect a normal distribution of performance. Leniency bias emerges because rating someone poorly is genuinely uncomfortable: it may damage the relationship, invite pushback and disagreement, require documentation and justification, and be perceived as unfair even if accurate.
Leniency bias makes the performance rating system unreliable for its intended purposes. If 85% of employees consistently receive 'exceeds expectations,' the rating provides no information about who is actually performing at a high level and makes compensation differentiation based on ratings essentially arbitrary.
Halo and Horn Effects
The halo effect occurs when a manager's positive overall impression of an employee causes them to rate all dimensions of performance more favorably, including dimensions the employee does not actually perform well on. The horn effect is the inverse: a negative overall impression causes systematic underrating across all dimensions.
These effects are well-documented in the broader psychology of judgment. A manager who likes a particular employee — because of personality fit, shared communication style, or the employee's visibility on successful projects — may rate that employee generously even on dimensions where they need improvement. Conversely, a manager who has had interpersonal friction with an employee may underrate them even on dimensions where their performance is genuinely strong.
Affinity Bias
Research on performance evaluation consistently finds racial and gender disparities that cannot be fully explained by performance differences. Research by Castilla and Benard found evidence of a 'paradox of meritocracy' — organizations that most strongly emphasize merit-based evaluation sometimes show larger gender pay gaps, because the emphasis on merit creates the impression that biases have been eliminated, reducing vigilance. Performance ratings are a significant transmission mechanism for organizational inequality.
The SBI Feedback Model
Why Structure Matters
Most feedback that managers give is too vague to be actionable, too global to be received without defensiveness, or both. 'Your communication needs work' tells an employee nothing about what specifically to change and invites the response 'I disagree' rather than 'I understand.' 'You're a strong performer but need to be more strategic' is similarly global: the employee may hear validation and discount the second part, or hear criticism without any basis for knowing what strategic means in their specific context.
The SBI model, developed and extensively used by the Center for Creative Leadership (CCL), provides a template that produces feedback that is specific, observable, and non-judgmental in its structure. It does not guarantee comfortable conversations, but it significantly reduces the likelihood that feedback will be dismissed because it is too vague or received defensively because it sounds like a character judgment.
Applying the Model
Situation: describe the specific context. 'In the client presentation on October 14th' is a situation. 'In your recent presentations' is not — it is too general to anchor a specific behavior. The situation creates a shared reference point that both manager and employee can agree on factually.
Behavior: describe what was observable. 'You did not make eye contact with the client during your section of the presentation and read directly from your slides' is a behavior — it is specific and observable. 'You seemed uncomfortable and unprepared' is an interpretation — it assigns mental state and judgment to observable actions. Feedback on behavior is easier to receive because it does not require the employee to accept a judgment about who they are.
Impact: describe the effect of the behavior. 'As a result, the client asked several questions that suggested they had not followed the presentation, and after the meeting they emailed asking for a written summary of the points you covered.' The impact grounds the feedback in consequences rather than in the manager's personal preferences. It answers the employee's implicit question: 'why does this matter?'
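The three components above can be sketched as a simple template that forces each piece to be filled in and kept separate. The class and field names below are my own illustration, not anything published by CCL.

```python
from dataclasses import dataclass

@dataclass
class SBIFeedback:
    """Illustrative SBI template (names are mine, not CCL's).

    Each component is required, which prevents the common failure mode of
    jumping straight to a global judgment with no situation or behavior.
    """
    situation: str  # specific, shared context, e.g. a dated meeting
    behavior: str   # observable action, not an interpretation of mental state
    impact: str     # documented effect -- answers 'why does this matter?'

    def render(self) -> str:
        return f"{self.situation}, {self.behavior}. {self.impact}"

note = SBIFeedback(
    situation="In the client presentation on October 14th",
    behavior="you read directly from your slides and did not make eye contact with the client",
    impact="As a result, the client emailed afterward asking for a written summary of your points",
)
print(note.render())
```

The value of the structure is what it disallows: there is no field for 'you seemed unprepared', so interpretations and character judgments have nowhere to go.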
Completing the Loop
SBI is a model for structuring the delivery of feedback, not for the entire feedback conversation. Effective feedback conversations typically continue beyond the SBI structure: the manager pauses after stating impact and invites response ('what was your experience of that section of the presentation?'), which may reveal information the manager was not aware of (the employee was given the wrong materials, there was a family emergency the prior night, the section was assigned at the last minute). The SBI structure opens a conversation; it does not close one.
Continuous Feedback vs. Annual Reviews
The Case for Continuous Feedback
The argument for continuous feedback rather than annual reviews is fundamentally about timing. Feedback is most useful when it is proximate to the behavior it concerns: when the employee still remembers the context clearly, when the behavior is still within reach of immediate change, and when the emotional stakes are lower because no single conversation determines annual compensation.
Research on learning and skill development is consistent: feedback that follows behavior quickly and specifically is more effective than delayed, aggregated feedback. A sports coaching analogy is useful: a coach who waits until the end of a season to tell a player they are not planting their feet correctly before a shot provides feedback that is too late to affect the hundreds of opportunities for practice that occurred during the season.
Adobe's experience after eliminating annual reviews is the most frequently cited organizational case study. The company replaced annual appraisals with regular check-ins — informal, manager-initiated conversations expected to happen at least quarterly — covering three areas: expectations, feedback, and growth and development. Managers reported spending less total time on performance management despite the increased conversation frequency, and the company tracked significant decreases in voluntary turnover.
The Case for Structured Annual Conversations
The strongest argument for retaining some form of structured annual conversation is that important developmental and career discussions are unlikely to happen consistently without a formal mechanism requiring them. In the absence of a scheduled, required conversation, managers who are uncomfortable with development discussions will delay or avoid them indefinitely.
A structured annual (or semi-annual) review provides a forcing function: both parties prepare for and show up to a conversation about long-term performance trends, career trajectory, and development priorities that would not organically occur in the flow of daily work. The developmental conversation does not need to include ratings or determine compensation to serve this function.
The emerging practitioner consensus is a hybrid: retain structured semi-annual or annual conversations focused on development and career trajectory, while providing ongoing informal feedback continuously throughout the year. The structured conversations are stripped of numerical ratings where possible, or ratings are separated from the developmental conversation by time and context, to reduce the administrative-defensive dynamic.
Making Performance Reviews Actually Useful
Preparation as the Core Practice
The single most reliable predictor of review quality is preparation — specifically, the manager's preparation. Reviews conducted without contemporaneous notes, without specific examples of both strong performance and development areas, without attention to the full review period, and without a considered view of what the employee most needs to hear are consistently rated as less useful by employees than those where specific, documented examples are provided.
Preparation for a genuinely useful review includes: reviewing your notes and documentation from the full review period, identifying two to three specific observations in each of the strong performance and development area categories, formulating those observations in SBI terms, and thinking about what the employee's career goals are and how the review conversation can connect to them.
Asking Good Questions
Performance reviews are conversations, not monologues. Managers who do most of the talking in a review miss significant opportunities to gather information that would improve the feedback, understand the employee's perspective, and surface concerns the employee has not raised through other channels.
Useful opening questions include: 'What do you feel best about from this past year?' 'What do you feel you could have done differently?' 'What has been most frustrating or demotivating?' 'What would make the next year more productive and satisfying for you?' These questions serve multiple functions: they give the employee agency in the conversation, they surface information the manager does not have, and they set a tone of mutual inquiry rather than unilateral judgment.
Handling Disagreement
Employees sometimes disagree with the feedback they receive in performance reviews — about ratings, about characterizations of specific events, or about the significance of particular incidents. Disagreement in a review is not inherently a problem; it is information. A manager who responds to disagreement by defending their position and closing down the conversation learns less and damages trust. A manager who responds with curiosity — 'tell me more about how you saw that situation' — may discover that the employee has relevant context the manager lacked, or may understand more clearly what the employee heard versus what was intended.
Agreement is not the goal of a performance review. Understanding is. The employee may leave still disagreeing with a rating while feeling heard, having received specific feedback they understand, and with a clear sense of what expectations look like going forward. That is a successful review.
Practical Takeaways
Great performance reviews are not primarily about the format, the rating scale, or the frequency — they are about the quality of the manager's knowledge of the employee's work, the honesty and care with which feedback is delivered, and the genuine developmental intent behind the conversation.
The most actionable starting points: keep contemporaneous notes throughout the review period, practice the SBI structure for specific feedback before entering review conversations, separate ratings and compensation discussions from developmental conversations where possible, and treat the review as the beginning of a development conversation rather than the end-of-year verdict.
For organizations evaluating their performance management systems: the research suggests that the frequency and formality matter less than the quality of manager preparation, the specificity of feedback, and whether employees leave reviews with a clear understanding of expectations and a sense that someone is genuinely invested in their development.
References
- Aguinis, H., Joo, H., & Gottfredson, R. K. (2011). 'Why we hate performance management — and why we should love it.' Business Horizons, 54(6), 503-507.
- Buckingham, M., & Goodall, A. (2015). 'Reinventing performance management.' Harvard Business Review, April 2015.
- Rock, D., Davis, J., & Jones, B. (2014). 'Kill your performance ratings.' Strategy+Business, August 2014.
- Center for Creative Leadership. (2022). Feedback That Works: How to Build and Deliver Your Message. CCL Press.
- Cunningham, L. (2015). 'In big move, Accenture will get rid of annual performance reviews and rankings.' The Washington Post, July 21.
- Castilla, E. J., & Benard, S. (2010). 'The paradox of meritocracy in organizations.' Administrative Science Quarterly, 55(4), 543-576.
- Morris, D. (2012). 'Adobe breaks free from annual performance reviews.' Adobe Life Blog, February 13.
- CEB (Gartner). (2016). The Real Impact of Eliminating Performance Ratings. CEB Corporate Leadership Council.
- Pulakos, E. D. (2004). Performance Management: A Roadmap for Developing, Implementing and Evaluating Performance Management Systems. SHRM Foundation.
- Ilgen, D. R., & Davis, C. A. (2000). 'Bearing bad news: Reactions to negative performance feedback.' Applied Psychology: An International Review, 49(3), 550-565.
- Longenecker, C. O., Sims, H. P., & Gioia, D. A. (1987). 'Behind the mask: The politics of employee appraisal.' The Academy of Management Executive, 1(3), 183-193.
- Grote, D. (2011). How to Be Good at Performance Appraisals: Simple, Effective, Done Right. Harvard Business Review Press.
Frequently Asked Questions
What does research say about whether annual performance reviews work?
The research on annual performance reviews is decidedly mixed. A widely cited 2011 paper by Aguinis, Joo, and Gottfredson found that performance appraisal is one of the most researched topics in industrial-organizational psychology, but its effectiveness is highly variable depending on design and implementation quality. The core problem is that annual reviews conflate two distinct purposes that are poorly served by the same conversation: administrative decisions (compensation, promotion) and developmental feedback. When employees know a review determines their salary, they are less open to honest development conversations. Companies including Adobe, Deloitte, and Accenture have eliminated or significantly redesigned traditional annual reviews, with generally positive results.
What is the SBI feedback model?
SBI stands for Situation, Behavior, Impact — a structured feedback framework developed by the Center for Creative Leadership (CCL). The model provides a template for feedback that is specific, observable, and non-judgmental. Situation: describe the specific context when the behavior occurred ('In Tuesday's client meeting'). Behavior: describe the observable behavior, not an interpretation or character judgment ('you interrupted the client twice while they were describing their concern'). Impact: describe the effect of that behavior on you, the team, or the work ('which made it harder for us to fully understand what they needed, and I noticed the client seemed frustrated'). SBI replaces vague or global character assessments ('you have communication issues') with concrete, actionable observations that the recipient can hear without becoming defensive.
What is recency bias in performance reviews?
Recency bias is the tendency for managers to weight recent events disproportionately when evaluating performance over a longer period. In an annual review covering twelve months, an employee's performance in October and November often dominates the manager's mental model more than performance in January through August. This means that an employee who had a difficult year but finished strongly may receive a better review than they deserve, while an employee who had a strong year but struggled in the final months may receive a worse review. The practical defense against recency bias is keeping contemporaneous notes throughout the review period — documenting significant events, accomplishments, and development areas as they occur rather than relying on memory at review time.
What is leniency bias and why does it matter?
Leniency bias is the systematic tendency for managers to rate employees more favorably than their actual performance warrants — clustering ratings at the high end of rating scales and avoiding low ratings. Studies consistently find that in typical corporate rating distributions, most employees receive above-average ratings, which is mathematically impossible if ratings are accurate. Leniency bias occurs because giving low ratings is uncomfortable, can damage relationships, and may invite complaints and appeals. It makes performance rating systems unreliable for their intended purposes (identifying genuinely high and low performers, making fair compensation decisions) and can mask performance problems until they become severe. Forced ranking systems were developed partly to counter leniency bias, but created their own significant problems.
Should performance reviews be continuous or annual?
The research and practitioner consensus has shifted substantially toward continuous feedback rather than reliance on annual reviews alone. Adobe eliminated its annual review in 2012 and replaced it with regular 'check-ins' — informal, manager-initiated conversations about priorities, feedback, and growth. They reported significant reductions in voluntary turnover following the change. The argument for continuous feedback is that it is more timely (behavior is corrected or reinforced close to when it occurs), less high-stakes (no one conversation carries so much weight), and more natural (it integrates feedback into the flow of work rather than making it a formal event). Most practitioners recommend retaining some structured annual or semi-annual conversation to ensure development and career discussions happen, while conducting ongoing informal feedback continuously.