The Six Signs Your UXR Team Is Actually Healthy

Photo by Anna Pelzer / Unsplash

There is absolutely no shortage of frameworks that will rank your research team on a five-point scale and tell you you're a three on the way to a four. Throw them out. A research unit is healthy when its output survives a reorg, a tooling shift, and the loss of the person who built it. Everything else is mood.

Consider one team. Morale was fine. Stakeholders said nice things in the survey. The repository had four hundred studies in it. Then the team was dissolved, and almost nothing broke, because almost nothing it produced had been carrying structural weight.

I keep going back to that. Or to that kind of case, since I've now seen versions of it three times.

The metrics that measure nothing

When teams want to claim they are healthy, they reach for the same five metrics. Each measures something real, just not health.

  • Headcount measures budget tolerance, not value. Six researchers who produce decks nobody reopens are less healthy than one who changed a roadmap. Or two who changed two roadmaps. The point holds.
  • "A seat at the table" measures proximity, not influence. Plenty of researchers have a seat and spend it nodding.
  • Stakeholder satisfaction is the worst of the bunch, because it quietly rewards telling people what they already decided. The fastest route to a glowing score is to confirm the PM's prior and call it a study. High satisfaction with zero impact looks like a paradox until you sit in the meetings, where it quickly stops looking like one.
  • Study volume measures activity. A unit that runs forty studies a year and a unit that runs eight can have an identical effect on the business. The busier one is usually the sicker one, because motion got mistaken for work.
  • The repository is the one people are proudest of, and it is mostly a graveyard. Four hundred studies that nobody can find when they actually need them is closer to storage than to an asset, no matter how good the tagging looks in screenshots.

What the unit is actually for

A research unit exists to convert uncertainty into knowledge that someone acts on. That is the whole job. Not insights, not decks, not bringing the user's voice into the room. Reduced uncertainty that changes a decision.

Which gives you the only impact test that matters, and it is an uncomfortable one. If the org would have made the same call, with the same confidence, at the same time, in the same way, you produced nothing that quarter. The deck can be beautiful. The method can be airtight. If the decision was identical to the counterfactual, the work was decorative.

The counterfactual is the hard part, because nobody can run the actual experiment. You have to estimate it, and your estimate is shaped by the same biases that made you optimistic about your own impact in the first place. I think most of us, myself included, grade our own work generously here. The PM said it changed her thinking, the design got revised, the launch hit its number. Sounds like impact. It could just as easily have been the path she was already on, with a better-decorated week. I don't have a good fix for that. I just try to flinch a little harder when I'm telling myself a flattering story.

I keep coming back to this test for teams I worry about. Most fail it without realizing it. The ones that pass have some combination of six traits, though I have yet to find a team with all six.

1) It runs at more than one speed

Sick units have a single tempo, and it is almost always the slow one. Every question becomes a six-week study, because six weeks is the only mode the team knows how to run. So the fast questions go unanswered, and someone less rigorous answers them instead, usually in a Slack thread.

Healthy units match the mode to the stakes. A cheap reversible decision gets a cheap fast answer (micro research, a day, sometimes an afternoon). The middle gets a sprint. Deep work is rationed for the questions that actually earn it.

Most of what people call research skill is really mode selection, even though nobody names it that. The slow teams have one mode and run it on everything, which is why they keep losing the fast questions before they get to answer them.
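Mode selection is, at bottom, a decision rule, so here is a minimal Python sketch of one. The three-level `stakes` rating, the reversibility flag, and the exact mapping are my assumptions about how the rule above might be encoded. A real team would draw its own lines.

```python
from enum import Enum


class Mode(Enum):
    # The three tempos described above.
    MICRO = "micro research (an afternoon to a day)"
    SPRINT = "sprint"
    DEEP = "deep study"


# Hypothetical rating scale for how costly a wrong call would be.
STAKES = ("low", "medium", "high")


def choose_mode(stakes: str, reversible: bool) -> Mode:
    """Match the research mode to the decision (illustrative rule only)."""
    if stakes not in STAKES:
        raise ValueError(f"stakes must be one of {STAKES}, got {stakes!r}")
    # A cheap, reversible decision gets a cheap, fast answer.
    if stakes == "low" and reversible:
        return Mode.MICRO
    # Deep work is rationed for expensive decisions that are hard to undo.
    if stakes == "high" and not reversible:
        return Mode.DEEP
    # Everything in the middle gets a sprint.
    return Mode.SPRINT
```

A single-tempo team behaves like a version of `choose_mode` that ignores both arguments and always returns `Mode.DEEP`.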

2) Its knowledge compounds instead of rotting

One test exposes most repositories: can someone find what you already know about a topic before they commission a new study on it? The answer is almost always no.

If it is no, every study you have ever run is depreciating in a folder, and your team rediscovers its own findings every eighteen months at full price.

I have watched the same usability finding get rediscovered three times by three different researchers on the same product in under two years. Same finding, same readout format, nobody aware that the previous two existed. The repository had all three.

Compounding requires structure. Studies have to connect to something larger than themselves, a Frame that holds what the unit knows about a problem space and how confident it is in each piece. Without that you have a pile of PDFs with a search bar bolted to the front, and the search bar does not work, because it never does.
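To make "connect to something larger" concrete, here is a minimal Python sketch of a Frame. The `Frame` and `Finding` classes, the confidence labels, and the merge rule are hypothetical illustrations, not a description of any particular tool. The sketch shows two behaviors. A rediscovered finding adds evidence to the existing entry instead of becoming a new document. The "what do we already know?" check is a single call.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    # One claim the unit currently holds about a problem space.
    statement: str
    confidence: str  # hypothetical labels: "low", "medium", "high"
    study_ids: list[str] = field(default_factory=list)  # supporting studies


@dataclass
class Frame:
    # Hypothetical container for what the unit knows about one problem space.
    problem_space: str
    findings: list[Finding] = field(default_factory=list)

    def add(self, finding: Finding) -> None:
        # If the same statement already exists, attach the new study as
        # extra evidence instead of storing a duplicate finding.
        for existing in self.findings:
            if existing.statement == finding.statement:
                existing.study_ids.extend(finding.study_ids)
                return
        self.findings.append(finding)

    def known_about(self, keyword: str) -> list[Finding]:
        # The pre-commissioning check: findings whose text mentions the keyword.
        kw = keyword.lower()
        return [f for f in self.findings if kw in f.statement.lower()]
```

Exact-text matching is the crudest possible merge rule and is used here only to keep the sketch short. In practice, recognizing that a new finding is an old one is the hard part.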

3) Its output is legible to machines

This one matters more this year than last, and it will matter more next year.

If your knowledge lives as unstructured decks, any agentic system pointed at it will happily ingest the lot and flatten the difference between a finding backed by forty interviews and an opinion someone typed into a slide once. The machine cannot tell them apart, because the format never recorded which was which.

Hallucination is the failure mode everyone worries about. The flatter, weirder problem is false equivalence. A replicated pattern, a one-off quote, a stakeholder hunch, and a midnight slide-note all become text. If the unit never encoded evidence quality, the model has no reason to treat them differently.

A healthy unit structures its claims so that provenance and confidence survive automation. Where the claim came from, how strong it is, what it would take to overturn it. Do that and your knowledge gets more useful as the tooling improves. Skip it and you are one ingestion job away from being indistinguishable from the company group chat.
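One way to make provenance and confidence survive ingestion is to store each claim as a structured record instead of slide text. The Python sketch below assumes a hypothetical four-tier evidence scale that mirrors the examples above. It also assumes a `Claim` record whose fields follow the three questions in the previous paragraph. The field names and tiers are mine, not a standard.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Evidence(str, Enum):
    # Hypothetical evidence tiers, strongest first; a real unit defines its own.
    REPLICATED = "replicated_pattern"
    SINGLE_STUDY = "single_study"
    ANECDOTE = "one_off_quote"
    HUNCH = "stakeholder_hunch"


@dataclass
class Claim:
    text: str
    source: str          # where it came from: study ID, interview set, person
    evidence: Evidence   # how strong it is
    sample_size: int     # participants behind it; 0 for a hunch
    overturned_by: str   # what result would make the unit drop this claim

    def to_json(self) -> str:
        # asdict() keeps every field, so provenance and confidence travel
        # with the text instead of being flattened away.
        record = asdict(self)
        record["evidence"] = self.evidence.value
        return json.dumps(record)
```

A downstream agent that reads `evidence` can filter on it, for example `[c for c in claims if c.evidence is Evidence.REPLICATED]`, instead of treating a hunch and a replicated pattern as the same kind of text.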

4) It survives removal

The founder leaves. The platform changes. Headcount drops. A new VP shows up who thinks research is decoration. What still works?

If the unit collapses the moment its founder leaves or a new tool shows up, it was propped up. Propping looks like strength from the outside, which is most of the problem. From the inside it is usually one person quietly carrying what the system should carry, until they get tired or promoted or laid off.

I'm not entirely sure how many teams pass this test. Possibly fewer than would claim to. Probably fewer than believe they would.

5) It is funded like infrastructure

How an org views its research function shows up in the budget line. Headcount tells you the org will tolerate research. An operations budget tells you the org has decided research is load-bearing.

A team that argues for a recruiting credit every time it runs a study is on the tolerated side, no matter how clever its findings are. Healthy units have funded operations: ResOps headcount that absorbs the load, tooling budgets that survive the quarterly cost review, recruiting capacity that does not require begging across three Slack channels, vendor relationships the unit owns rather than borrows from marketing.

I once watched a Staff UXR spend four hours of a Tuesday morning on a procurement form for an incentive disbursement. Not a complicated form. Wrong person filling it out. The org was paying for that time at a staff salary, because the org had decided ResOps was not a real line item.

The diagnostic is one question: what share of a senior researcher's week goes to scheduling, legal review, panel management, formatting, and procurement chases? If the answer is more than fifteen percent, the unit cannot run at speed no matter how badly it wants to. The salary is being spent on work that should belong to a system.
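The diagnostic is simple arithmetic. Here is a minimal Python sketch, assuming you have one week of hours broken down by activity. The activity labels are hypothetical stand-ins for the categories listed above. The 0.15 threshold comes from the text.

```python
# Hypothetical activity labels standing in for the overhead categories above.
OVERHEAD = {"scheduling", "legal_review", "panel_management", "formatting", "procurement"}
THRESHOLD = 0.15  # the fifteen-percent line


def overhead_share(hours_by_activity: dict[str, float]) -> float:
    """Fraction of logged hours spent on operational overhead."""
    total = sum(hours_by_activity.values())
    if total == 0:
        return 0.0
    overhead = sum(h for activity, h in hours_by_activity.items() if activity in OVERHEAD)
    return overhead / total


def over_threshold(hours_by_activity: dict[str, float]) -> bool:
    return overhead_share(hours_by_activity) > THRESHOLD
```

Take a hypothetical 40-hour week that includes the four-hour procurement form, three hours of scheduling, and an hour of formatting. That gives 8 / 40 = 0.20, well over the line.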

6) It can say no, and it can say "we don't know"

Calibration is a health marker almost nobody lists.

A sick unit hedges everything into mush so it can never be caught wrong. Findings arrive wrapped in enough qualifiers that they commit to nothing, which feels safe and is useless, because a finding that cannot be wrong cannot be useful either.

What gets called calibrated hedging is, more often than not, plausible deniability with a research budget.

A healthy unit tells you the confidence is low when it is low, names the price of raising it, and says no to questions not worth answering. "We don't know, finding out costs three weeks, and this decision is not worth three weeks" is one of the healthiest sentences a research team can produce. Most teams cannot say it. Saying it requires being secure enough to decline work.

The split

Look at enough research teams and a smaller version of the K-shaped split I wrote about before shows up inside individual companies. Two kinds of units, getting easier to tell apart.

One kind is infrastructure. Its knowledge compounds, its output is legible, its findings are coupled to decisions, and it would survive most of what an org can throw at it.

The other kind is decoration. It runs studies, ships decks, posts the highlights, and produces a steady stream of artifacts that change nothing and get reopened by no one.

Most teams have pieces of both. Pressure tells you which one is real.

I don't think health is really a rung on a maturity ladder. It's which of those two you actually are, when nobody is watching the dashboard.

A healthy unit could lose half its people and its founder and still be the first thing the org reaches for a year later. An unhealthy one needs all of it intact just to stay visible.

I don't have a clean number for how many teams are in the first category. Fewer than would claim to be. The gap surfaces at the wrong moment, which is the only moment that ever asks the question.

🎯 Subscribe for more of this. Frequency depends on how often a topic refuses to leave my head. The line is long.