Few workplace rituals are as consistently dreaded, by managers and employees alike, as the annual performance review. Employees approach them with anxiety about judgment and compensation. Managers dread the conversations that feel scripted, the ratings that feel arbitrary, and the forms that feel designed for compliance rather than growth. HR professionals spend significant time managing the process. And at the end of it, research suggests that the result is often only ambiguously useful: some employees receive genuinely valuable feedback, many receive feedback so vague as to be unactionable, and some leave the conversation more confused about what is expected of them than they were going in.
This is not a universal indictment of structured performance conversations — they serve real functions that do not happen reliably without some formal mechanism. It is an indictment of how most performance reviews are designed and conducted. The problems are well-documented in the industrial-organizational psychology literature: rating scales that produce systematically biased results, cognitive biases that distort managers' recollections of performance, the structural tension between using the same conversation for developmental feedback and administrative decisions, and the one-year cadence that makes feedback too distant from the behavior it concerns.
Understanding these problems does not require accepting that performance reviews are hopeless. It requires understanding them well enough to either improve them within their structural constraints or advocate for approaches that work better. This article examines the research, maps the specific biases that distort reviews, explains the SBI feedback model and why its structure matters, addresses the continuous feedback versus annual review debate, and provides practical guidance for giving reviews that people find useful rather than performative.
"Most performance reviews are not really about performance. They are an HR compliance exercise dressed as a development conversation, and everyone involved knows it." — Common practitioner observation among organizational effectiveness consultants
Key Definitions
Performance appraisal: A formal, structured process for evaluating an employee's performance over a defined period, typically involving ratings, written assessments, and a conversation between manager and employee.
SBI model: Situation, Behavior, Impact — a feedback framework developed by the Center for Creative Leadership that structures feedback around observable behavior in a specific context and its documented effects.
Recency bias: The tendency to weight recent events disproportionately when evaluating performance over a longer period, skewing review accuracy toward the final weeks or months rather than the full review period.
Leniency bias: The systematic tendency of managers to assign higher performance ratings than are warranted, clustering ratings toward the high end of scales and avoiding negative evaluations.
Forced ranking: A performance management system that requires managers to distribute ratings according to a predetermined curve, placing fixed percentages of employees into each rating category. Associated most famously with GE under Jack Welch, and later abandoned or modified by most major adopters due to its documented adverse effects on collaboration and trust.
Idiosyncratic rater effect: The documented finding that performance ratings reflect characteristics of the rater — their personal tendencies toward leniency or harshness, their relationship with the ratee — at least as much as they reflect actual performance differences.
Calibration: A process in which multiple managers discuss and align their ratings, typically before ratings are finalized, to reduce individual rater biases and ensure that rating standards are applied consistently across teams.
What Research Says About Annual Reviews
The Evidence Base
Performance appraisal is among the most studied topics in industrial-organizational psychology. Herman Aguinis, a leading researcher in this area, has published extensively on what makes performance management systems effective versus counterproductive. His 2011 paper with Harry Joo and Ryan Gottfredson in Business Horizons identified a consistent finding: performance appraisal systems vary enormously in effectiveness, with many producing outcomes that are worse than no formal system at all (Aguinis, Joo & Gottfredson, 2011).
The core problem identified in the research is what Marcus Buckingham and Deloitte's Ashley Goodall popularized as the idiosyncratic rater effect: performance ratings tell us more about the rater than about the person being rated. Research summarized by Buckingham and Goodall in Harvard Business Review (2015) found that up to 62% of the variance in performance ratings is attributable to characteristics of the rater — their tendency toward leniency or harshness, their relationship with the ratee, their own performance standards — rather than to objective performance differences. In other words, annual performance ratings are a high-fidelity measure of managerial perception, not of employee performance.
A comprehensive review by Landy and Farr (1980), which examined several decades of performance rating research, reached a conclusion that has been repeatedly confirmed in subsequent research: the relationship between supervisor ratings and objective performance measures is surprisingly weak. The correlation between supervisor ratings and independently validated performance measures typically falls in the range of r = 0.20 to r = 0.35 — meaningful, but far smaller than the organizational weight placed on annual ratings would suggest is warranted.
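The idiosyncratic rater effect can be made concrete with a toy simulation. The sketch below is purely illustrative: every parameter (the spread of rater leniency, of true performance, of measurement noise) is an assumption chosen for demonstration, not an estimate from the studies above. It shows how easily rater-to-rater differences can come to dominate the variance in ratings.

```python
import random
import statistics

random.seed(42)  # deterministic, so the run is reproducible

# Toy model of the idiosyncratic rater effect. Every parameter below is an
# illustrative assumption, not an estimate from the published research:
#   rating = true performance + rater-specific offset + noise
N_RATERS, N_RATEES = 50, 50

true_perf = [random.gauss(0, 0.8) for _ in range(N_RATEES)]   # genuine differences
rater_bias = [random.gauss(0, 1.5) for _ in range(N_RATERS)]  # leniency/harshness

pooled = [
    true_perf[e] + rater_bias[r] + random.gauss(0, 0.5)       # measurement noise
    for r in range(N_RATERS)
    for e in range(N_RATEES)
]

# Rough variance decomposition: what share of the spread in ratings comes
# from the rater versus from the person being rated?
total_var = statistics.pvariance(pooled)
rater_share = statistics.pvariance(rater_bias) / total_var
perf_share = statistics.pvariance(true_perf) / total_var

print(f"variance share from the rater:      {rater_share:.0%}")
print(f"variance share from performance:    {perf_share:.0%}")
```

Under these made-up parameters, the rater accounts for the larger share of rating variance, which is the qualitative pattern the idiosyncratic-rater research describes; the exact percentages depend entirely on the assumed spreads.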
The Scope of the Problem in Organizations
The practical consequences of these research findings are visible in survey data. A 2019 Gallup survey found that only 14% of employees strongly agreed that their performance reviews inspired them to improve. Research by CEB (now Gartner) found that nearly 90% of HR leaders rated their performance management systems as mediocre or worse. And a survey by the Society for Human Resource Management (SHRM) found that 90% of HR professionals thought their performance review systems failed to yield accurate information.
These numbers reflect a systemic dysfunction, not isolated poor execution. The architecture of the typical annual review — concentrated at year-end, combining administrative and developmental purposes, relying on manager memory, rating individuals on broad global scales — produces predictably poor results almost regardless of how carefully it is administered.
The Developmental vs. Administrative Tension
A fundamental structural problem with performance reviews is that they are typically asked to serve two distinct purposes simultaneously: administrative decisions (who gets what raise, who is promoted, who is on a performance improvement plan) and developmental conversations (how can this person grow, what are they doing well, what should change).
These two purposes pull in opposite directions. When an employee knows that what they say in the review affects their salary, they are inclined toward impression management rather than honest self-assessment. When a manager knows their ratings will be reviewed for calibration and that they will need to justify low ratings, they are inclined toward leniency rather than accurate assessment. The presence of stakes suppresses the honest exchange that developmental feedback requires.
Several major organizations have attempted to separate these functions. Adobe's redesign in 2012, led by Donna Morris (then Chief Talent Officer), eliminated the annual rating and replaced it with regular check-ins that explicitly excluded salary conversation from most feedback sessions. The company tracked voluntary turnover in the following years and found a 30% reduction, which Morris attributed partly to the change in feedback culture (Morris, 2012). Deloitte redesigned its performance management system around 2015 to eliminate ratings entirely in favor of questions about future-oriented decisions ('would I give this person the best assignment available?') that are easier to answer accurately. Accenture made a similar shift in the same year, with roughly 330,000 employees affected (Cunningham, 2015).
Common Biases in Performance Reviews
| Bias | Description | Effect on Rating | How to Counter |
|---|---|---|---|
| Recency bias | Over-weights events near the review date | Ignores earlier performance; unfairly rewards/punishes recent behavior | Keep running notes throughout the year |
| Leniency bias | Rates employees higher than warranted | Rating inflation; managers avoid difficult conversations | Calibration sessions; forced distribution (cautiously) |
| Halo effect | One strong trait inflates overall rating | Masks real development areas | Evaluate each dimension separately |
| Horn effect | One negative impression deflates overall rating | Underestimates genuine strengths | Structured criteria; separate dimensional ratings |
| Similarity bias | Favors employees who resemble the manager | Disadvantages diverse employees | Structured criteria; 360 feedback |
| Idiosyncratic rater effect | Up to 62% of rating variance reflects the rater, not the ratee | Ratings measure manager perception, not performance | Combine multiple raters; focus on observable behaviors |
| Attribution error | Attributes outcomes to the person, not circumstances | Ignores context and systemic factors | Ask: what circumstances shaped this outcome? |
Cognitive Biases That Distort Performance Reviews
Recency Bias
Recency bias is probably the most practically significant and common bias in performance evaluation. When a manager sits down to write an annual review, they are relying primarily on memory of twelve months of performance. Memory is not a neutral archive — it is reconstructive and weighted toward recent events.
The result is that the employee's performance in the last month or two before the review period closes carries disproportionate weight relative to their performance in January through October. An employee who had a consistently strong year but made a visible mistake in November may receive a review that fails to credit the full year's performance. An employee who struggled for most of the year but had two successful project deliveries in the review-writing window may receive an inflated review.
The psychological mechanism underlying recency bias in this context is availability: events that are more easily recalled feel more representative of the whole. Kahneman's research on cognitive heuristics (Kahneman, 2011) explains why this is a structural feature of human judgment rather than a correctable attitude — the brain is not designed to accurately weight distributed events over long time horizons.
The defense is mundane but effective: keeping contemporaneous notes. A manager who documents significant observations, accomplishments, and concerns throughout the year — in brief notes after significant events, in a running document updated monthly — has a much more accurate basis for review than one relying on end-of-period recall. This practice is not common and not difficult; it is simply not habitual.
Leniency Bias
Leniency bias is the systematic overrating of performance relative to actual performance quality. In corporate rating distributions, the median rating is almost never 'meets expectations' — it is typically somewhere between 'meets' and 'exceeds' expectations, reflecting the tendency of managers to rate generously to avoid conflict, maintain relationships, and minimize administrative hassle from low-rating appeals.
A 2016 study by CEB (now Gartner) found that the majority of employees rated in corporate performance systems receive above-average ratings, which is mathematically impossible if ratings accurately reflect a normal distribution of performance. Leniency bias emerges because rating someone poorly is genuinely uncomfortable: it may damage the relationship, invite pushback and disagreement, require documentation and justification, and be perceived as unfair even if accurate.
Leniency bias makes the performance rating system unreliable for its intended purposes. If 85% of employees consistently receive 'exceeds expectations,' the rating provides no information about who is actually performing at a high level and makes compensation differentiation based on ratings essentially arbitrary.
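One way to see the cost of this inflation is in information terms: a rating category that nearly everyone receives conveys almost no information about who is performing well. The Shannon-entropy sketch below makes the point; both distributions are invented for illustration and are not drawn from the CEB data.

```python
import math

def entropy_bits(probs):
    """Shannon entropy: the average information (in bits) one rating conveys."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative distributions over five rating categories (not real survey data).
inflated = [0.01, 0.02, 0.07, 0.85, 0.05]  # leniency: most people 'exceed'
spread   = [0.05, 0.20, 0.50, 0.20, 0.05]  # differentiated ratings

print(f"inflated distribution: {entropy_bits(inflated):.2f} bits per rating")
print(f"spread distribution:   {entropy_bits(spread):.2f} bits per rating")
```

The inflated distribution carries well under half the information of the differentiated one: when most ratings are the same, learning someone's rating tells you very little, which is exactly why compensation differentiation built on inflated ratings becomes arbitrary.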
Research by Longenecker, Sims, and Gioia (1987), published in the Academy of Management Executive, investigated the political dimensions of performance appraisal through interviews with executives and found that deliberate inflation of ratings was widely practiced and openly acknowledged — managers rated generously to avoid conflict, to motivate employees, and to protect their own teams in organizational resource allocation. The research was among the first to document formally what practitioners had long known: performance ratings are social acts as much as measurement acts.
Halo and Horn Effects
The halo effect occurs when a manager's positive overall impression of an employee causes them to rate all dimensions of performance more favorably, including dimensions the employee does not actually perform well on. The horn effect is the inverse: a negative overall impression causes systematic underrating across all dimensions.
These effects are well-documented in the broader psychology of judgment. A manager who likes a particular employee — because of personality fit, shared communication style, or the employee's visibility on successful projects — may rate that employee generously even on dimensions where they need improvement. Conversely, a manager who has had interpersonal friction with an employee may underrate them even on dimensions where their performance is genuinely strong.
Research by Thorndike (1920), who first described the halo effect in the context of military officer ratings, found that correlations between officers' ratings on entirely separate dimensions (physical appearance, intelligence, character) were far higher than could be explained by genuine trait correlations — indicating that a general favorable or unfavorable impression was contaminating all dimensional ratings. The effect has been replicated in performance appraisal contexts across dozens of subsequent studies.
Affinity and Similarity Bias
Research on performance evaluation consistently finds racial and gender disparities that cannot be fully explained by performance differences. Experimental studies by Castilla and Benard (2010) found evidence of a 'paradox of meritocracy': organizations that most strongly emphasize merit-based evaluation sometimes show larger gender pay gaps, because the emphasis on merit creates the impression that biases have been eliminated, reducing vigilance.
Similarity bias — the tendency to rate more favorably those who resemble oneself in background, communication style, or demographic characteristics — is one of the mechanisms through which performance evaluation perpetuates rather than corrects organizational inequality. A manager who has more in common with some direct reports than others will, without deliberate counter-measures, tend to give those direct reports the benefit of the doubt in ambiguous evaluative situations.
The structural response to similarity bias in performance evaluation includes: using structured evaluation criteria rather than global impressions, incorporating 360-degree feedback from multiple raters with different relationships to the ratee, and conducting calibration sessions where managers discuss borderline ratings and surface the reasoning behind them.
The SBI Feedback Model
Why Structure Matters
Most feedback that managers give is too vague to be actionable, too global to be received without defensiveness, or both. 'Your communication needs work' tells an employee nothing about what specifically to change and invites the response 'I disagree' rather than 'I understand.' 'You're a strong performer but need to be more strategic' is similarly global: the employee may hear validation and discount the second part, or hear criticism without any basis for knowing what strategic means in their specific context.
The SBI model, developed and extensively used by the Center for Creative Leadership (CCL), provides a template that produces feedback that is specific, observable, and non-judgmental in its structure (Center for Creative Leadership, 2022). It does not guarantee comfortable conversations, but it significantly reduces the likelihood that feedback will be dismissed because it is too vague or received defensively because it sounds like a character judgment.
Research on feedback specificity consistently finds that specific behavioral feedback is more likely to result in behavior change than global evaluative feedback. A meta-analysis by Kluger and DeNisi (1996), published in Psychological Bulletin and considered the most rigorous review of feedback effectiveness research, found that feedback reduced performance in over one-third of cases — primarily when feedback focused on the person's self-image rather than on specific, changeable behaviors. The SBI structure is essentially a mechanism for keeping feedback in the behavior-and-impact domain rather than sliding into person-level judgment.
Applying the Model
Situation: describe the specific context. 'In the client presentation on October 14th' is a situation. 'In your recent presentations' is not — it is too general to anchor a specific behavior. The situation creates a shared reference point that both manager and employee can agree on factually. This shared factual anchor prevents the feedback conversation from degenerating into a dispute about whether the event being described actually occurred.
Behavior: describe what was observable. 'You did not make eye contact with the client during your section of the presentation and read directly from your slides' is a behavior — it is specific and observable. 'You seemed uncomfortable and unprepared' is an interpretation — it assigns mental state and judgment to observable actions. Feedback on behavior is easier to receive because it does not require the employee to accept a judgment about who they are. It only requires them to engage with a description of what they did.
Impact: describe the effect of the behavior. 'As a result, the client asked several questions that suggested they had not followed the presentation, and after the meeting they emailed asking for a written summary of the points you covered.' The impact grounds the feedback in consequences rather than in the manager's personal preferences. It answers the employee's implicit question: 'why does this matter?' and connects specific behavior to organizational outcomes in a way that makes the feedback feel relevant rather than arbitrary.
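For managers who keep contemporaneous notes, the three SBI components also serve as a note-taking template. The sketch below is hypothetical: the class and field names are invented for illustration and are not part of the CCL model itself.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical note-keeping sketch: the class and field names are invented
# for illustration and are not an official CCL artifact.
@dataclass
class SBINote:
    observed_on: date
    situation: str  # specific, shared context, e.g. a named meeting and date
    behavior: str   # observable actions only, no inferred mental states
    impact: str     # documented effects, not the manager's preferences

    def is_specific(self) -> bool:
        """Crude completeness check: every SBI component is filled in."""
        return all(s.strip() for s in (self.situation, self.behavior, self.impact))

note = SBINote(
    observed_on=date(2024, 10, 14),
    situation="Client presentation on October 14th",
    behavior="Read directly from slides; no eye contact during own section",
    impact="Client emailed afterward asking for a written summary",
)
print(note.is_specific())  # True
```

Writing notes in this shape throughout the year makes end-of-period reviews a matter of retrieval rather than reconstruction, which is the same defense against recency bias discussed above.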
SBI in Practice: An Extended Example
Consider a common management situation: a direct report interrupted a colleague repeatedly in a team meeting, and two people mentioned it to the manager afterward.
Without SBI: "You can be pretty dominant in meetings. You need to be a better listener."
With SBI: "In yesterday's product strategy meeting [situation], I noticed you spoke over Michael three times while he was mid-sentence, and after the third time he stopped contributing to the discussion [behavior]. Two team members mentioned after the meeting that they felt Michael's input had been cut off before he could make his point [impact]. I wanted to raise it because I think we're missing some of Michael's perspective, and I'm not sure you were aware of how it landed."
The second version is more work to construct. It is also incomparably more useful. The employee knows exactly what happened, when it happened, what the observable behavior was, and what effect it had — and receives the feedback in a context that signals the manager's constructive intent rather than a personality criticism.
Completing the Loop
SBI is a model for structuring the delivery of feedback, not for the entire feedback conversation. Effective feedback conversations typically continue beyond the SBI structure: the manager pauses after stating impact and invites response ('what was your experience of that section of the presentation?'), which may reveal information the manager was not aware of (the employee was given the wrong materials, there was a family emergency the prior night, the section was assigned at the last minute). The SBI structure opens a conversation; it does not close one.
The full feedback conversation following SBI ideally includes: the manager's SBI-structured observation, the employee's response and perspective, collaborative exploration of what led to the behavior, discussion of what different behavior would look like, and agreement on what the employee will try differently going forward. This takes more time than a unilateral pronouncement. It is also significantly more likely to result in behavior change.
Continuous Feedback vs. Annual Reviews
The Case for Continuous Feedback
The argument for continuous feedback rather than annual reviews is fundamentally about timing. Feedback is most useful when it is proximate to the behavior it concerns: when the employee still remembers the context clearly, when the behavior is still within reach of immediate change, and when the emotional stakes are lower because no single conversation determines annual compensation.
Research on learning and skill development is consistent on this point. Kluger and DeNisi's 1996 meta-analysis found that immediate, specific feedback was substantially more effective than delayed, aggregated feedback on measures of skill acquisition and performance improvement. A sports coaching analogy is useful: a coach who waits until the end of a season to tell a player they are not planting their feet correctly before a shot provides feedback that is too late to affect the hundreds of opportunities for practice that occurred during the season.
Adobe's experience after eliminating annual reviews is the most frequently cited organizational case study. The company replaced annual appraisals with regular check-ins — informal, manager-initiated conversations expected to happen at least quarterly — covering three areas: expectations, feedback, and growth and development. Managers reported spending less total time on performance management despite the increased conversation frequency, and the company tracked a 30% decrease in voluntary turnover over the period following the change (Morris, 2012).
Microsoft's performance management redesign is another well-documented case. The company abandoned its notorious "stack ranking" system — a form of forced ranking that required managers to identify fixed percentages of their teams as top, average, and low performers — in late 2013, shortly before Satya Nadella became CEO, and moved toward a model emphasizing continuous feedback and growth. Stack ranking had been widely blamed for destroying collaboration at Microsoft (employees had an incentive to sabotage peers who might rank above them) and for creating a culture of risk avoidance. The shift to continuous feedback was accompanied by broader cultural changes around growth mindset, influenced by Carol Dweck's work. Microsoft's employee satisfaction scores improved significantly in the years following the change, and the company's stock performance during Nadella's tenure has been used as circumstantial evidence for the transformation's effectiveness.
The Case for Structured Annual Conversations
The strongest argument for retaining some form of structured annual conversation is that important developmental and career discussions are unlikely to happen consistently without a formal mechanism requiring them. In the absence of a scheduled, required conversation, managers who are uncomfortable with development discussions will delay or avoid them indefinitely.
A structured annual (or semi-annual) review provides a forcing function: both parties prepare for and show up to a conversation about long-term performance trends, career trajectory, and development priorities that would not organically occur in the flow of daily work. The developmental conversation does not need to include ratings or determine compensation to serve this function.
Ilgen and Davis (2000), reviewing research on how employees receive negative performance feedback, found that formal, structured contexts for feedback delivery were associated with better retention and integration of difficult feedback — employees were more likely to accept and act on critical feedback when it was delivered in a context that was explicitly designated for evaluation and development, rather than informally in the flow of work. Structure signals seriousness; seriousness signals investment.
The emerging practitioner consensus is a hybrid: retain structured semi-annual or annual conversations focused on development and career trajectory, while providing ongoing informal feedback continuously throughout the year. The structured conversations are stripped of numerical ratings where possible, or ratings are separated from the developmental conversation by time and context, to reduce the administrative-defensive dynamic.
Designing Better Performance Conversations: A Framework
Based on the research literature and practitioner evidence, the following framework for performance conversations addresses the most significant documented failure modes.
| Element | Poor Practice | Better Practice | Research Basis |
|---|---|---|---|
| Timing | Annual only, at year-end | Semi-annual structured + ongoing informal | Kluger & DeNisi (1996); Adobe case study |
| Content | Global ratings + compensation in one conversation | Separate development from administrative decisions where possible | Buckingham & Goodall (2015) |
| Preparation | Manager writes review from memory | Manager keeps running notes throughout year | Recency bias research |
| Feedback structure | Vague global impressions | SBI-structured specific observations | CCL (2022); Kluger & DeNisi (1996) |
| Employee voice | Manager monologue with employee response | Manager asks before telling; invites self-assessment | Pulakos (2004) |
| Bias mitigation | Individual manager impression | Structured criteria + calibration sessions | Castilla & Benard (2010) |
| Follow-up | Review ends the conversation | Review initiates development plan with follow-up | Grote (2011) |
Making Performance Reviews Actually Useful
Preparation as the Core Practice
The single most reliable predictor of review quality is preparation — specifically, the manager's preparation. Reviews conducted without contemporaneous notes, without specific examples of both strong performance and development areas, without attention to the full review period, and without a considered view of what the employee most needs to hear are consistently rated as less useful by employees than those where specific, documented examples are provided.
Research on the effectiveness of performance appraisals, reviewed by Pulakos (2004), consistently identifies manager preparation as the variable most correlated with review usefulness as perceived by employees. Preparation is not complicated: it means keeping a running record throughout the year, reviewing it before the conversation, identifying two or three specific observations in SBI terms for each of the strong performance and development area categories, and thinking about what the employee's career goals are and how the review conversation can connect to them.
A practical approach many managers find useful is keeping a shared document — accessible to both manager and direct report — where significant observations, project completions, client feedback, and developmental conversations are noted throughout the year. This serves double duty: it counters recency bias by creating a full-year record, and it ensures the direct report is not surprised by the content of their review, because the inputs have been visible throughout the year.
Asking Good Questions
Performance reviews are conversations, not monologues. Managers who do most of the talking in a review miss significant opportunities to gather information that would improve the feedback, understand the employee's perspective, and surface concerns the employee has not raised through other channels.
Useful opening questions include: 'What do you feel best about from this past year?' 'What do you feel you could have done differently?' 'What has been most frustrating or demotivating?' 'What would make the next year more productive and satisfying for you?' These questions serve multiple functions: they give the employee agency in the conversation, they surface information the manager does not have, and they set a tone of mutual inquiry rather than unilateral judgment.
The self-assessment question — asking the employee to rate their own performance before the manager shares theirs — serves a specific function beyond information gathering. Research by Ilgen and Davis (2000) found that employees who had the opportunity to provide self-assessments before receiving manager feedback were more likely to accept and act on critical feedback than those who heard the manager's assessment first and were then asked to respond. The sequencing shifts the employee from defensive posture to reflective posture, which changes the entire texture of the conversation.
Handling Disagreement
Employees sometimes disagree with the feedback they receive in performance reviews — about ratings, about characterizations of specific events, or about the significance of particular incidents. Disagreement in a review is not inherently a problem; it is information. A manager who responds to disagreement by defending their position and closing down the conversation learns less and damages trust. A manager who responds with curiosity — 'tell me more about how you saw that situation' — may discover that the employee has relevant context the manager lacked, or may understand more clearly what the employee heard versus what was intended.
Agreement is not the goal of a performance review. Understanding is. The employee may leave still disagreeing with a rating while feeling heard, having received specific feedback they understand, and with a clear sense of what expectations look like going forward. That is a successful review.
A specific technique for handling rating disagreements: rather than debating the rating itself (a largely unproductive conversation, since ratings are inherently subjective), redirect to the underlying behavior evidence. 'I understand we see the rating differently. Let me describe the specific observations that informed it, and I'd like to hear how you see those same situations.' Grounding the conversation in specific behaviors, using something like the SBI structure, reduces the degree to which disagreement feels like a personal judgment and increases the degree to which it can be constructively explored.
The Development Plan: Making Reviews Forward-Looking
One of the most common failure modes of annual reviews is that they are entirely retrospective — focused on evaluating past performance — without any genuine forward-looking developmental component. This is the wrong design. The past is fixed; the only leverage is in what happens next.
An effective performance review allocates meaningful time to the question: 'What do we want to see different, better, or more of in the coming year, and what is the plan for getting there?' This is not the same as listing developmental goals in a form — it is a genuine conversation about what the employee is working toward, what skills they want to develop, what experiences would be most useful, and what the manager can do to create those experiences.
Research by DeRue and Wellman (2009) on how leaders develop through experience found that three conditions predicted whether challenging experiences actually produced learning and growth: the developmental challenge of the experience, the learner's orientation toward learning (whether they approached it with a growth rather than performance mindset), and the availability of feedback. The performance review is the natural setting for addressing the third condition — explicitly naming what feedback will be available and how growth will be supported in the coming year.
Practical Takeaways
Great performance reviews are not primarily about the format, the rating scale, or the frequency — they are about the quality of the manager's knowledge of the employee's work, the honesty and care with which feedback is delivered, and the genuine developmental intent behind the conversation.
The most actionable starting points: keep contemporaneous notes throughout the review period, practice the SBI structure for specific feedback before entering review conversations, separate ratings and compensation discussions from developmental conversations where possible, ask before telling, and treat the review as the beginning of a development conversation rather than the end-of-year verdict.
For organizations evaluating their performance management systems: the research suggests that the frequency and formality matter less than the quality of manager preparation, the specificity of feedback, and whether employees leave reviews with a clear understanding of expectations and a sense that someone is genuinely invested in their development. The shift that most consistently produces better outcomes — across Adobe, Deloitte, Microsoft, Accenture, and the research literature — is separating the developmental conversation from the administrative decision, reducing or eliminating global numerical ratings, and moving toward more frequent, less formal, more specific feedback throughout the year.
References
- Aguinis, H., Joo, H., & Gottfredson, R. K. (2011). 'Why we hate performance management — and why we should love it.' Business Horizons, 54(6), 503-507.
- Buckingham, M., & Goodall, A. (2015). 'Reinventing performance management.' Harvard Business Review, April 2015.
- Castilla, E. J., & Benard, S. (2010). 'The paradox of meritocracy in organizations.' Administrative Science Quarterly, 55(4), 543-576.
- CEB (Gartner). (2016). The Real Impact of Eliminating Performance Ratings. CEB Corporate Leadership Council.
- Center for Creative Leadership. (2022). Feedback That Works: How to Build and Deliver Your Message. CCL Press.
- Cunningham, L. (2015). 'In big move, Accenture will get rid of annual performance reviews and rankings.' The Washington Post, July 21.
- DeRue, D. S., & Wellman, N. (2009). 'Developing leaders via experience: The role of developmental challenge, learning orientation, and feedback availability.' Journal of Applied Psychology, 94(4), 859-875.
- Gallup. (2019). Performance Management: The Employee Experience. Gallup Inc.
- Grote, D. (2011). How to Be Good at Performance Appraisals: Simple, Effective, Done Right. Harvard Business Review Press.
- Ilgen, D. R., & Davis, C. A. (2000). 'Bearing bad news: Reactions to negative performance feedback.' Applied Psychology: An International Review, 49(3), 550-565.
- Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
- Kluger, A. N., & DeNisi, A. (1996). 'The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory.' Psychological Bulletin, 119(2), 254-284.
- Landy, F. J., & Farr, J. L. (1980). 'Performance rating.' Psychological Bulletin, 87(1), 72-107.
- Longenecker, C. O., Sims, H. P., & Gioia, D. A. (1987). 'Behind the mask: The politics of employee appraisal.' The Academy of Management Executive, 1(3), 183-193.
- Morris, D. (2012). 'Adobe breaks free from annual performance reviews.' Adobe Life Blog, February 13.
- Pulakos, E. D. (2004). Performance Management: A Roadmap for Developing, Implementing and Evaluating Performance Management Systems. SHRM Foundation.
- Rock, D., Davis, J., & Jones, B. (2014). 'Kill your performance ratings.' Strategy+Business, August 2014.
- Thorndike, E. L. (1920). 'A constant error in psychological ratings.' Journal of Applied Psychology, 4(1), 25-29.
Frequently Asked Questions
What does research say about whether annual performance reviews work?
The research on annual performance reviews is decidedly mixed. A widely cited 2011 article by Aguinis, Joo, and Gottfredson noted that performance appraisal is one of the most researched topics in industrial-organizational psychology, yet its effectiveness is highly variable depending on design and implementation quality. The core problem is that annual reviews conflate two distinct purposes that are poorly served by the same conversation: administrative decisions (compensation, promotion) and developmental feedback. When employees know a review determines their salary, they are less open to honest development conversations. Companies including Adobe, Deloitte, and Accenture have eliminated or significantly redesigned traditional annual reviews, with generally positive results.
What is the SBI feedback model?
SBI stands for Situation, Behavior, Impact — a structured feedback framework developed by the Center for Creative Leadership (CCL). The model provides a template for feedback that is specific, observable, and non-judgmental. Situation: describe the specific context when the behavior occurred ('In Tuesday's client meeting'). Behavior: describe the observable behavior, not an interpretation or character judgment ('you interrupted the client twice while they were describing their concern'). Impact: describe the effect of that behavior on you, the team, or the work ('which made it harder for us to fully understand what they needed, and I noticed the client seemed frustrated'). SBI replaces vague or global character assessments ('you have communication issues') with concrete, actionable observations that the recipient can hear without becoming defensive.
What is recency bias in performance reviews?
Recency bias is the tendency for managers to weight recent events disproportionately when evaluating performance over a longer period. In an annual review covering twelve months, an employee's performance in October and November often dominates the manager's mental model more than performance in January through August. This means that an employee who had a difficult year but finished strongly may receive a better review than they deserve, while an employee who had a strong year but struggled in the final months may receive a worse review. The practical defense against recency bias is keeping contemporaneous notes throughout the review period — documenting significant events, accomplishments, and development areas as they occur rather than relying on memory at review time.
What is leniency bias and why does it matter?
Leniency bias is the systematic tendency for managers to rate employees more favorably than their actual performance warrants — clustering ratings at the high end of rating scales and avoiding low ratings. Studies consistently find that in typical corporate rating distributions, most employees receive above-average ratings, a distribution that cannot be accurate if ratings are meant to reflect relative standing. Leniency bias occurs because giving low ratings is uncomfortable, can damage relationships, and may invite complaints and appeals. It makes performance rating systems unreliable for their intended purposes (identifying genuinely high and low performers, making fair compensation decisions) and can mask performance problems until they become severe. Forced ranking systems were developed partly to counter leniency bias, but created their own significant problems.
Should performance reviews be continuous or annual?
The research and practitioner consensus has shifted substantially toward continuous feedback rather than reliance on annual reviews alone. Adobe eliminated its annual review in 2012 and replaced it with regular 'check-ins' — informal, manager-initiated conversations about priorities, feedback, and growth. The company reported significant reductions in voluntary turnover following the change. The argument for continuous feedback is that it is more timely (behavior is corrected or reinforced close to when it occurs), less high-stakes (no single conversation carries so much weight), and more natural (it integrates feedback into the flow of work rather than making it a formal event). Most practitioners recommend retaining some structured annual or semi-annual conversation to ensure development and career discussions happen, while conducting ongoing informal feedback continuously.