4 ways to automate staff performance reviews

Liam Jones

Founder, Pilla App

Date Modified

20 May 2026

I'm Liam Jones, founder of Pilla and a qualified management consultant. I've helped hundreds of businesses set up workflows, and in this article I'm going to show you four real examples of how to set up your performance reviews. I'll start from the simplest and then add some more powerful options. You can open up each template in our workflow builder playground as a starting point and experiment for yourself. If you have any suggestions or you need some help, you can email me directly.


#1 - Two-rating review

Who it's for: Single-site managers running quarterly or annual reviews with a small team. For example, an owner with 5-15 direct reports, or a single-venue manager handling annual reviews themselves.

Available on: Basic.

What it is: The simplest structured performance conversation: two ratings (responsibilities and behaviours), specific evidence for each, and three development goals for the period ahead. CIPD's research on review effectiveness shows the two things that matter most are ratings that mean the same thing across managers, and concrete evidence. Without them, reviews collapse into either a gut-feel "good job all round" or a long write-up nobody reads. The two-rating structure forces both.

In practice: A single-site pub owner runs annual reviews for 12 direct reports, spread across the year. The canvas sits on their phone. For each review they type the team member's name and the review period, give a 1-5 rating on role responsibilities, and write the evidence ("ran the December rota with zero gaps"). Then they give a 1-5 rating on team behaviours, write the evidence ("calm during the December rush, talked the kitchen down twice when service backed up"), and set three development goals for the next review period. The whole conversation takes 40-60 minutes. Because the canvas holds the structure, the conversation can focus on the substance.

Why it works: The rating scale is defined up front: 3 = doing the role, 4 = doing it consistently well, 5 = taking on more. The evidence fields ask for specific examples with dates and outcomes, not adjectives. Development goals are capped at three, and the team member has to own them.

Steps included:

  • 4 text inputs (team member's name, review period, did-well evidence, fell-short evidence)
  • 2 rating scales (responsibilities, behaviours, each 1 to 5)
  • 1 checklist (development goals for next period, 3 items)

When to upgrade:

  1. Different managers run reviews differently and a 4 means different things to different reviewers
  2. You want each rating tied to a photo of the team member's actual work, not just the manager's memory
  3. HR or head office wants reviews to feed a meeting where ratings are compared
  4. The team member is in a capability or PIP (performance improvement plan) process and the records need to stand up at a tribunal

#2 - Performance review with anchors

Who it's for: Multi-site businesses with 3+ reviewers, organisations where ratings have been creeping upward, or teams that hold a review-comparison meeting after the individual reviews.

Available on: Standard.

What it is: The two-rating review plus guidance panels next to each rating and evidence field, explaining exactly what each rating means and what good evidence looks like. The anchors come from CIPD's guidance on keeping ratings consistent: 3 means doing the role to standard, 4 means doing it consistently well over a long stretch, 5 means taking on more than the role asks for. With anchors, a 4 from one manager equals a 4 from another, and the comparison meeting becomes a discussion about evidence instead of working out what each manager meant.

In practice: Take a 14-site casual dining group with 14 site managers, each running annual reviews for 8-15 team members. Before anchors, head office found that the share of top (5) ratings ranged from 12% at one site to 71% at another. Either some sites had genuinely outstanding teams, or some managers were over-rating. The panels tackle this at the point of rating. The "Anchor the rating in their actual job description" panel above the responsibilities rating ties the score to what the role is meant to deliver. The "Specific examples with dates and outcomes" panel above the evidence field stops vague claims being treated as evidence. The "How they show up, not what they produce" panel above the behaviours rating keeps that score about conduct rather than output. The "Real examples, said face-to-face" panel above the fell-short evidence field stops vague impressions driving low ratings.
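Head office's check is simple arithmetic. The sketch below is a minimal Python illustration, assuming ratings can be exported as (site, rating) pairs for one rating question. The export format, the site names and values, and the 25-point tolerance are all hypothetical. It computes each site's share of 5 ratings and flags sites that sit far from the group median.

```python
from collections import defaultdict
from statistics import median

# Hypothetical export: (site, rating) pairs for one rating question, ratings 1-5.
# Site names and values are illustrative only, not real data.
reviews = [
    ("Site A", 5), ("Site A", 4), ("Site A", 3), ("Site A", 3),
    ("Site B", 5), ("Site B", 5), ("Site B", 5), ("Site B", 4),
    ("Site C", 4), ("Site C", 3), ("Site C", 5), ("Site C", 3),
]

def top_rating_share(rows, top=5):
    """Return {site: fraction of that site's ratings equal to `top`}."""
    counts = defaultdict(lambda: [0, 0])  # site -> [top_count, total]
    for site, rating in rows:
        counts[site][1] += 1
        if rating == top:
            counts[site][0] += 1
    return {site: hits / total for site, (hits, total) in counts.items()}

def flag_outliers(shares, tolerance=0.25):
    """Flag sites whose top-rating share is more than `tolerance` away from the median.

    The tolerance is an assumed threshold, not a recommended standard.
    """
    mid = median(shares.values())
    return sorted(site for site, share in shares.items() if abs(share - mid) > tolerance)

shares = top_rating_share(reviews)
print(shares)                 # {'Site A': 0.25, 'Site B': 0.75, 'Site C': 0.25}
print(flag_outliers(shares))  # ['Site B']
```

A flagged site is a prompt to look at the evidence, not proof of over-rating: as the example above notes, it may simply have a strong team.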

What it adds to the previous template:

  1. Different managers' 4s become the same 4
  2. Evidence becomes specific and dated, not just adjectives
  3. The behaviours rating stops being a score for being nice
  4. The comparison meeting weighs evidence, not each manager's own read

Why it works: The rating anchors sit next to each score, so a 4 means the same thing whether the most generous or the strictest manager gives it. Because the scale is spelled out at the moment of rating, reviewers arrive at the comparison meeting already agreeing on what each number means.

Steps included:

  • 4 text inputs (team member's name, review period, did-well evidence, fell-short evidence)
  • 2 rating scales (responsibilities, behaviours, each 1 to 5)
  • 1 checklist (development goals for next period, 3 items)
  • 5 written guidance panels (rating anchors and evidence rules)

When to upgrade: When you want each rating tied to a photo of the team member's actual work (Performance Review #3), or when records need to stand up at a tribunal (Performance Review #4).

#3 - Performance review with annotated evidence

Who it's for: Teams that want each rating tied to something solid (a real piece of the team member's work) rather than just the manager's memory and a typed note. Mid-to-large businesses with HR-led review cycles, or anywhere ratings need to hold up beyond the room.

Available on: Standard.

What it is: The anchored review plus an annotated photo of the team member's actual work from the period: a rota they ran, a photo of their venue on a busy night, a customer feedback note they handled. The reviewer marks up the photo to show what it proves. It ties each rating to real delivery you can see, instead of resting on the manager's recollection.

In practice: Take a 30-site retail group with HR-led comparison meetings. A manager rating a supervisor on responsibilities photographs the December rota the supervisor ran with zero gaps and annotates it to highlight the full coverage. At the comparison meeting, the rating is no longer one manager's word against another's. The evidence is on screen, tied to the score.

What it adds to the previous template:

  1. An annotated photo of real work from the period, tying the rating to delivery you can see
  2. Ratings backed by an artefact, not just the manager's memory of the period
  3. A comparison meeting that weighs real evidence, not competing recollections

Why it works: A rating anchored to a real work artefact is far harder to inflate or dispute than one resting on memory. The annotated photo shows what the score is actually based on, captured from the period itself.

Steps included:

  • 4 text inputs (team member's name, review period, did-well evidence, fell-short evidence)
  • 2 rating scales (responsibilities, behaviours, each 1 to 5)
  • 1 checklist (development goals for next period, 3 items)
  • 5 written guidance panels (rating anchors and evidence rules)
  • 1 annotated photo (work outputs from the period)

When to upgrade: When records need to be signed by both sides to be defensible against a tribunal challenge or capability dispute (Performance Review #4).

#4 - Performance review with artefacts and signatures

Who it's for: HR-led organisations that run capability processes, multi-site groups that have faced unfair-dismissal claims before, and anywhere a tribunal might later look at whether the review was genuinely fair.

Available on: Standard.

What it is: The annotated-evidence review plus a signature from each side: the manager and the team member. The two signatures turn the canvas from a manager's record into a record both sides signed, which is the kind of evidence of a fair process that tribunals tend to look for.

In practice: Take a 50-site care provider with quarterly performance reviews across 800+ team members. Their HR team handles roughly 6-12 capability cases a year, any of which might be challenged at a tribunal. Before evidence capture, the review record was a manager's tick-list with no supporting evidence and no sign-off from the team member. After evidence capture, the canvas holds a photo of the team member's actual work (a rota, a care plan, a feedback note), a manager signature committing to the rating and goals, and a team-member signature confirming they agree the review happened as recorded. If the team member later claims they were never told what they needed to improve, that signature on the timestamped canvas is the answer. If they claim the rating was unfair, the work photo shows what the rating was based on.

What it adds to the previous template:

  1. Manager signature: a clear commitment to the rating and development goals, which holds up against later challenges
  2. Team-member signature: proof the team member saw and agreed the review happened as recorded, which is key for capability cases and tribunals
  3. A complete, tribunal-ready record: the annotated work photo and both signatures on one timestamped canvas

Why it works: The work photo ties the rating to real delivery, and the two signatures make the review a record both sides signed. Captured at the time of the review, that combination gives a tribunal evidence of a fair process, rather than a manager's word with nothing to back it.

Steps included:

  • 4 text inputs (team member's name, review period, did-well evidence, fell-short evidence)
  • 2 rating scales (responsibilities, behaviours, each 1 to 5)
  • 1 checklist (development goals for next period, 3 items)
  • 5 written guidance panels (rating anchors and evidence rules)
  • 1 annotated photo (work outputs from the period)
  • 2 signatures (manager sign-off, team-member sign-off)
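Each template is the previous one plus a few extra steps. The Python sketch below models that layering, assuming a simple (step type, count) representation taken from the "Steps included" lists. It is a hypothetical model for illustration, not Pilla's actual template format.

```python
# Hypothetical model of the four templates as cumulative step lists.
# Counts mirror the "Steps included" lists; this is not Pilla's real schema.
BASE = [
    ("text input", 4),    # name, review period, did-well and fell-short evidence
    ("rating scale", 2),  # responsibilities and behaviours, each 1 to 5
    ("checklist", 1),     # three development goals
]

# Steps each version adds on top of the one before it.
ADDITIONS = {
    1: [],
    2: [("guidance panel", 5)],   # rating anchors and evidence rules
    3: [("annotated photo", 1)],  # work outputs from the period
    4: [("signature", 2)],        # manager and team-member sign-off
}

def steps_for(version):
    """Return the combined step list for a template version (1-4)."""
    if version not in ADDITIONS:
        raise ValueError(f"unknown version: {version}")
    steps = list(BASE)
    for v in range(1, version + 1):
        steps.extend(ADDITIONS[v])
    return steps

def step_count(version):
    """Total number of individual steps in a version."""
    return sum(n for _, n in steps_for(version))

print([step_count(v) for v in (1, 2, 3, 4)])  # [7, 12, 13, 15]
```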

When to upgrade: When you want AI to gather the work evidence on its own (Poppi pulling completed canvases from the review period), flag a review that's overdue, or post the development goals to a follow-up system. Those versions are coming in the next update to this post.

How to pick the right version

You don't need to know our product to choose. Just answer three questions about how your reviews actually run. Each one moves you up a rung.

Is it just you reviewing, or several managers?

If you run every review, a 4 means whatever you mean by it. The moment several managers review, a 4 from one has to mean the same as a 4 from another, or the scores don't compare. If it's just you, #1 is enough. If several managers review and you worry their ratings don't line up, start at #2, where the anchors next to each score pin down what each number means.

Do you need evidence behind the rating?

A written review tells you the rating and the manager's note. A photo of the team member's actual work ties the rating to something you can see. If typed evidence is enough, stay at #1 or #2. If you want the rating backed by a real artefact, #3 adds an annotated photo of their work.

Do you need proof, or is a record enough?

A written review tells you what was said. Proof is something you can put in front of a tribunal if a capability case is ever challenged. If a photo-backed record is enough, stop at #3. If you need to show your working with sign-off, #4 adds signatures from both sides.
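The three questions work as a simple ladder. The Python sketch below encodes it; the function and parameter names are hypothetical. Letting a "yes" to a later question override a "no" to an earlier one is an assumption, based on each template including everything in the one before it.

```python
def pick_version(several_reviewers: bool, want_work_artefact: bool, need_signed_record: bool) -> int:
    """Map the three questions in this section to a template number (1-4).

    Each "yes" moves you up a rung. Because each template is assumed to include
    the one before it, the highest rung you need wins.
    """
    version = 1  # just you reviewing, typed evidence, no sign-off
    if several_reviewers:
        version = 2  # anchors so a 4 means the same 4 across managers
    if want_work_artefact:
        version = 3  # annotated photo of real work
    if need_signed_record:
        version = 4  # manager and team-member signatures
    return version

print(pick_version(several_reviewers=True, want_work_artefact=False, need_signed_record=False))  # 2
```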

Related workflows:

  • One-to-ones - the weekly routine that feeds the bigger review
  • Onboarding - the first-week structure that informs the first review
  • Interviews - the structured hiring conversation that started the relationship

Conclusion

A performance review is only useful when ratings mean the same thing across managers and the evidence is solid. When ratings don't line up, they tend to creep upward. When ratings rest on memory alone, they're easy to dispute. Performance Review #3 ties every rating to a real work artefact, and #4 adds a record both sides signed.

Five more versions that bring AI into the review are coming in the next refresh. In those versions, Poppi will gather the work evidence on its own by pulling completed canvases from the review period. It will also be able to flag a review that's overdue, post development goals to a follow-up system, decide whether to escalate borderline ratings, and set next quarter's focus based on the spread of ratings. Those need more review time and will land separately.

Build your own performance review canvas on Pilla. The Basic plan unlocks Performance Review #1 today.