The First-Prompt Test — Builders, Not Priests

The interview pack from "Always Two", Chapter 4. Find builders: people who shape AI into a partner.

The test (run it in ~1 hour)

Give the candidate a real problem and an AI tool, then watch how they work with it.

Phase 1 — no AI (10 min). Pose the problem; watch their raw thinking and clarifying questions.
Phase 2 — with AI (45 min). Hand them the AI assistant and let them build/solve. Observe the process, not just the result.

"Let's build a small to-do app. You have an hour, I'll be nearby. I can see where you lean on AI, where you don't, and how you handle time pressure." — the ideal interview (Ch4)

Ready-to-run scenarios

Pick one matched to the role and difficulty. Each lists what a builder does versus a priest, so you can run it cold.

Programmerwarmup

Fix a flaky test

This test passes locally but fails ~1 in 5 times in CI. Here's the AI assistant — figure out why and fix it.

Builder move: Asks what changed / what the test is really asserting, reproduces the flake before trusting the AI's first guess, fixes the root cause and adds a guard so it can't silently rot again.

Priest move: Pastes the failure into the AI, applies the first 'add a retry' suggestion verbatim, calls it done without reproducing — green once is 'fixed'.

Programmercore

Implement a small rate limiter

Build a rate limiter for an API endpoint. You have the AI assistant — I care more about how you work with it than the final code.

Builder move: Pushes back when the AI's token-bucket misses the burst case, names the fixed-window vs sliding-window trade-off out loud, writes a test for the concurrent path before declaring it done.

Priest move: Accepts the first implementation the AI emits, can't say why one algorithm over another, ships it untested because 'the AI wrote it'.

Programmerstretch

CSV importer with messy rows

Write an importer for this CSV. Heads up: real exports have malformed rows, missing columns, and bad encodings. Use the AI however you like.

Builder move: Starts with the simplest importer that works, then asks how to handle the malformed rows (skip + report vs fail-fast) and explains the cost of each; corrects the AI when it silently drops bad rows.

Priest move: Takes the AI's happy-path parser as complete, doesn't probe the malformed-row behaviour, and treats 'it ran on the sample' as 'it works'.

Product personwarmup

Triage a chunk of user feedback

Here are 40 raw support messages and an AI tool. Tell me the top 3 things we should act on and why.

Builder move: Asks who the users are and what decision this feeds, challenges the AI's tidy summary against the raw quotes, and explains why the top 3 beat the runners-up.

Priest move: Lets the AI cluster and rank, reads back its summary as fact without checking it against the messages, no 'why these three'.

Product personcore

Draft a short product spec

Draft a one-page spec for this feature using the AI. I'll watch how you scope it, not just what you write.

Builder move: Opens with who it's for and what problem it solves, cuts scope to a first slice that ships, and names the trade-off of what got cut; uses the AI to draft, not to decide.

Priest move: Has the AI generate a maximal spec, keeps every feature, can't say what to drop or why, and presents the AI's framing as the requirement.

Shared (either role)core

Build a tiny app end-to-end

Build a tiny working to-do app in the hour — any stack. You have the AI. I want to see where you lean on it and where you don't.

Builder move: Gets a minimal version working first, refines prompts when the AI overshoots, and verifies the core flow actually works before adding polish.

Priest move: Asks the AI for the whole app at once, pastes large blocks without understanding them, and never runs the thing end-to-end under time pressure.

What good looks like (worked example)

A real builder session, scored by this kit — the numbers are computed, not claimed.

First, why do we need this — what problem are we protecting against?

That's wrong — it misses the burst case. Let me rephrase the prompt to be more specific.

Let's start simple with a fixed window. The downside is the boundary spike, but it's easier to reason about.

I'd write a test for the concurrent path before I call it done.

Builder lean 100/100 — Strong builder signal — argues with the AI, owns quality, thinks in trade-offs.

Argues with / challenges the AI — “That's wrong — it misses the burst case. Let me rephrase the prompt to be more specific.”
Explains trade-offs (both sides) — “Let's start simple with a fixed window. The downside is the boundary spike, but it's easier to reason about.”
Asks WHY before what/how — “First, why do we need this — what problem are we protecting against?”
Iteratively refines the prompt — “That's wrong — it misses the burst case. Let me rephrase the prompt to be more specific.”
Simplifies / breaks down complexity — “Let's start simple with a fixed window. The downside is the boundary spike, but it's easier to reason about.”
Tests what they ship / owns quality — “I'd write a test for the concurrent path before I call it done.”

…and what to avoid (priest worked example)

The same task, a priest-leaning candidate — same scorer, the other pole.

The AI said this works, so it must be right. Looks good, I'll just use that.

I only do frontend, I won't touch the backend part.

Writing tests isn't my job — QA will catch it.

Anyone who uses that framework is wrong; you should always use mine.

Builder lean 0/100 — Priest-leaning — watch for blind AI acceptance, lane-guarding, or zealotry.

Blindly accepts AI output — “The AI said this works, so it must be right. Looks good, I'll just use that.”
Narrow specialist who won't step out of their lane — “I only do frontend, I won't touch the backend part.”
"Not my job" / ownership aversion — “Writing tests isn't my job — QA will catch it.”
Tool / framework zealot — “Anyone who uses that framework is wrong; you should always use mine.”

What to watch for

Builder signals

Argues with / challenges the AI argues-with-ai · w3
Iteratively refines the prompt refines-prompt · w2
Persistence/grit to steer to a useful answer shows-grit · w2
Explains trade-offs (both sides) explains-tradeoffs · w3
Asks WHY before what/how asks-why · w2
Simplifies / breaks down complexity simplifies · w2
Curiosity across domains cross-domain-curiosity · w1
Tests what they ship / owns quality tests-what-they-ship · w2

Anti-signals

Blindly accepts AI output blind-ai-acceptance · w3
Narrow specialist who won't step out of their lane wont-leave-lane · w3
Tool / framework zealot tool-zealot · w2
"Not my job" / ownership aversion not-my-job · w3

Scorecard

For each signal you observe, capture the candidate's verbatim quote as evidence, then run:

interview score transcript.txt   # offline, or AI-assisted if ANTHROPIC_API_KEY is set

Decision-support, not a verdict. The output is a builder-lean score + cited evidence. A human makes the hire call — the book is explicit: humans decide. This pack never tells you who to hire.