Simulation Design

What Makes a Leadership Simulation Feel Real

Rachel Kim

The most common complaint we heard about leadership simulations in the first year of building Coachvyne was this: "It felt like a test, not like a real situation." Leaders knew they were being evaluated, so they performed rather than led. They gave textbook answers. They avoided the ambiguity that real leadership conversations actually require. The simulation produced data, but it wasn't data about how they actually lead — it was data about how they answer leadership questions when they know someone is watching.

That problem is not unique to our platform. It's structural to the simulation category. And solving it, or getting close to solving it, requires understanding what actually makes a simulation feel real — not from a technology standpoint, but from a cognitive and emotional one.

We've spent significant time studying this, both in the design of our own scenarios and in the session data we've accumulated. Here's what we've learned about the four variables that most determine whether a leader engages authentically with a simulation.

Variable 1: Scenario specificity — the details that create cognitive commitment

Generic scenarios produce generic responses. When we say "you're a VP of Sales having a difficult conversation with your CFO about forecast accuracy," the leader has very little to hold onto. There's no texture. No prior relationship history. No institutional context. No stakes they recognize.

Contrast that with: "You're the VP of Sales at a B2B software company that missed its Q3 number by 22%. The CFO has seen three consecutive quarters of forecast variance. She's heading into the board meeting in 72 hours. Your controller has flagged that your largest deal — the one anchoring Q4 — hasn't had a contract signature in 11 days." Now there's something to engage with. The leader has context that makes decisions feel consequential.

Specificity creates what behavioral researchers call cognitive commitment — the mental state where a person is reasoning about a real situation rather than pattern-matching to an abstract category. The quality of response in a cognitively committed state is much more diagnostic of actual leadership capability than the response in a test-taking state.

The design implication is that building good scenarios takes significant investment. Shallow scenarios are fast to build and produce poor data. High-specificity scenarios take 10–15x longer to design, but the session data they generate is actually useful for identifying development gaps.

Variable 2: Character consistency — the other party has to be unpredictable in a believable way

One of the fastest ways a simulation loses fidelity is when the other party in the scenario responds to leader input in ways that don't feel coherent. If the simulated CFO backs down immediately when the VP pushes back, the leader recognizes that the simulation doesn't have real stakes. Real counterparts don't capitulate that fast. Real counterparts have their own logic, their own institutional pressures, and their own emotional state in the conversation.

Building a counterpart character that responds consistently (not predictably, but consistently) is one of the harder design problems in simulation. The character needs a point of view that it defends under pressure, but it can't be so rigid that the leader feels they're talking to a wall. It needs to be persuadable, though not immediately. It needs to escalate when appropriate, and to change register when the emotional temperature of the conversation shifts.
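To make "persuadable, but not immediately" concrete, here is a minimal sketch of a counterpart whose stance changes only after pressure accumulates. Everything in it is an assumption made for illustration: the `Counterpart` class, the move labels ("strong", "weak", "defensive"), and the `resistance` and `patience` thresholds are hypothetical, not a description of how any production character model works.

```python
from dataclasses import dataclass


@dataclass
class Counterpart:
    # Hypothetical thresholds, chosen only for illustration.
    resistance: int = 3   # strong arguments needed before the character concedes
    patience: int = 2     # weak or defensive replies tolerated before escalating
    stance: str = "skeptical"

    def respond(self, move: str) -> str:
        """Update the stance from one leader move and return the new stance."""
        if move not in ("strong", "weak", "defensive"):
            raise ValueError(f"unknown move: {move}")
        if self.stance in ("persuaded", "escalated"):
            return self.stance  # treated as terminal in this sketch
        if move == "strong":
            self.resistance -= 1
            if self.resistance <= 0:
                self.stance = "persuaded"
        else:
            self.patience -= 1
            if self.patience <= 0:
                self.stance = "escalated"
        return self.stance
```

The point of the design is that no single reply flips the character: one good argument leaves the CFO skeptical, and only a sustained pattern moves her in either direction.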

When we review sessions where leaders report the highest sense of realism, the common thread is almost always the counterpart character's behavior under pressure. When the CFO stays skeptical even after a reasonable response, and asks a follow-up that reveals she has information the VP didn't expect — that's when leaders lean forward. That's when the simulation starts to feel like something that matters.

Variable 3: Consequence structure — decisions need to have downstream effects

Most HR training simulations are single-frame: you make a choice, you get feedback, the session ends. That structure produces something closer to a self-assessment quiz than a leadership simulation. Real leadership conversations have sequences. What you say in minute three affects what becomes possible in minute eight. A concession you make early forecloses options later. A framing you establish in the opening shapes what the other person is willing to hear after that.

For a simulation to feel real, decisions made early in the conversation have to shape the trajectory of what comes later. If a VP handles the opening minutes of a board challenge poorly, becoming defensive rather than owning the miss, the board members in the scenario should become harder to manage, not easier. The simulation should have memory. It should know what happened before and make the current moment harder or easier depending on how the leader handled the earlier exchanges.

We're not saying that every simulation needs to be a 20-branch decision tree — that level of complexity becomes impossible to score meaningfully. But even a three-point branching structure, where the opening exchange sets one of three tone registers for the rest of the session, produces meaningfully more realistic engagement than a linear scenario.
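A three-point branching structure of this kind can be sketched in a few lines. The example below is an assumption-laden illustration rather than an actual implementation: the register names, the opening labels, and the follow-up lines are all hypothetical, and classifying a leader's opening into one of the three labels is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ToneRegister(Enum):
    COOPERATIVE = "cooperative"
    GUARDED = "guarded"
    ADVERSARIAL = "adversarial"


# Hypothetical labels for how the leader handled the opening exchange.
OPENING_TO_REGISTER = {
    "owned_the_miss": ToneRegister.COOPERATIVE,
    "neutral": ToneRegister.GUARDED,
    "defensive": ToneRegister.ADVERSARIAL,
}

# Illustrative follow-up lines; real scenario text would be authored by hand.
FOLLOW_UPS = {
    ToneRegister.COOPERATIVE: "What do you need from me before the board meeting?",
    ToneRegister.GUARDED: "Walk me through why this quarter is different.",
    ToneRegister.ADVERSARIAL: "I have heard that before. Why should I believe it now?",
}


@dataclass
class Session:
    register: Optional[ToneRegister] = None
    history: List[str] = field(default_factory=list)

    def record_opening(self, opening_label: str) -> None:
        """Set the tone register once, based on the opening exchange."""
        if self.register is not None:
            raise ValueError("the opening has already set the register")
        self.register = OPENING_TO_REGISTER[opening_label]
        self.history.append(opening_label)

    def next_prompt(self) -> str:
        """Return the counterpart's next line for the register set earlier."""
        if self.register is None:
            raise ValueError("record the opening first")
        return FOLLOW_UPS[self.register]
```

Because the register is set once and then read by every later turn, the session carries memory of the opening without needing a full decision tree.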

Variable 4: Psychological safety — the condition everything else depends on

Here's the variable that often gets underweighted in simulation design: a leader will not engage authentically with a simulation they believe will be used against them. This isn't irrational. In many organizations, "development assessment" is indistinguishable from "performance review." Leaders have learned, often through direct experience, that showing a gap is a career risk, not a development opportunity.

If that belief is present when a leader enters a simulation, all the scenario specificity and character consistency in the world will not produce authentic behavior. The leader will perform the correct answer rather than actually leading. The simulation data will reflect their knowledge of leadership best practices, not their actual behavioral patterns under stress.

This means the psychological framing of the simulation is at least as important as the simulation's technical quality. The framing needs to be explicit and credible: the session output is for your development, not for your manager's performance file. The score is a starting point for a coaching conversation, not a judgment. And critically, that framing has to be backed by organizational behavior — if managers start referencing simulation scores in performance discussions, the signal is received quickly and the authenticity of future sessions degrades.

The best simulation programs we've seen have an explicit protocol about who can see session-level scores. In most cases, individual dimension scores are available only to the leader and their coach. Aggregate patterns — like "this cohort has a consistent gap in Conflict Navigation" — are available to the L&D team. But individual scores don't go to the leader's manager except through the leader's own choice. That separation is what makes authentic engagement possible.
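The visibility protocol described above can be expressed as a small access rule. This is a sketch under stated assumptions: the `Role` values, the `SessionScores` record, and the `shared_with_manager` flag are hypothetical names, and the sketch does not check that a viewer is the specific leader or coach tied to the session, which a real system would need to do.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean
from typing import Dict, List


class Role(Enum):
    LEADER = "leader"
    COACH = "coach"
    LND = "l_and_d"
    MANAGER = "manager"


@dataclass
class SessionScores:
    leader_id: str
    dimension_scores: Dict[str, float]
    shared_with_manager: bool = False  # assumed to be set only by the leader


def can_view_individual(role: Role, scores: SessionScores) -> bool:
    """Individual scores go to the leader and coach; a manager only by the leader's choice."""
    if role in (Role.LEADER, Role.COACH):
        return True
    if role is Role.MANAGER:
        return scores.shared_with_manager
    return False  # L&D works from aggregates only


def cohort_aggregate(sessions: List[SessionScores], dimension: str) -> float:
    """Mean score for one dimension across a cohort, with no leader identifiers attached."""
    return mean(s.dimension_scores[dimension] for s in sessions)
```

Keeping the manager's access behind a flag that defaults to off mirrors the separation in the protocol: sharing is something the leader opts into, not something the system grants.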

The design tension that never fully resolves

There's a fundamental tension in simulation design that practitioners don't talk about enough: the features that make a simulation feel most real also make it hardest to score consistently. The more a scenario branches, the more a character responds unpredictably, the more context-specific the stakes — the harder it becomes to apply a consistent scoring rubric across sessions.

We've had to make deliberate choices about where to sit on that spectrum. We prioritize fidelity in the character behavior and consequence structure, and we accept that this means our scoring requires more nuanced interpretation than a scenario with a clean rubric. A Conflict Navigation score of 6.2 in a session where the counterpart escalated hard in the opening minute means something different from a 6.2 in a session where the counterpart was cooperative throughout. We flag those context differences in the session output.
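One way to picture context-flagged output is a score record that carries its session context with it. The sketch below is illustrative only: the `DimensionScore` class, the free-text flags, and the `comparable` helper are assumptions, not the actual session output format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DimensionScore:
    dimension: str
    score: float
    context_flags: List[str] = field(default_factory=list)  # hypothetical free-text flags

    def summary(self) -> str:
        """Render the score together with any context a reader needs to interpret it."""
        base = f"{self.dimension}: {self.score:.1f}"
        if not self.context_flags:
            return base
        return f"{base} (context: {', '.join(self.context_flags)})"


def comparable(a: DimensionScore, b: DimensionScore) -> bool:
    """Treat two scores as directly comparable only if dimension and context match."""
    return a.dimension == b.dimension and set(a.context_flags) == set(b.context_flags)


hard = DimensionScore("Conflict Navigation", 6.2, ["counterpart escalated in opening minute"])
easy = DimensionScore("Conflict Navigation", 6.2, ["counterpart cooperative throughout"])
print(hard.summary())
print(comparable(hard, easy))  # same number, different context
```

The two records hold the same 6.2, but the attached flags signal to a coach that the numbers should not be read side by side without interpretation.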

The alternative — simpler scenarios that are easier to score — produces cleaner numbers that are less meaningful. We'd rather have a score that requires interpretation than a score that looks precise but doesn't reflect real leadership behavior under real pressure.

That's the choice every simulation designer has to make. The answer depends on what you're optimizing for: data volume or data quality. In a development context, quality wins.