
What Domain Expertise Looks Like in an AI Workflow — A Worked Example

"Domain expertise" is one of those phrases that sounds nice and explains nothing. So here's a real walk-through of one production workflow — the Follow-Up Agent we built for Khalas Kitchen — including the 47 decisions made before a single prompt was written. The decisions are the work.

Two pieces of preface. The first: this is a 200-cover restaurant in Burnaby, family-owned, with two managers and a four-month-old online ordering pipeline. The numbers we'll mention are real (rounded to protect specifics). The second: this is the third post in a series. The earlier ones cover the why, and the point at which self-built versions stall; this one is the how.

The brief (one sentence)

"Reach out to customers who started an online order but didn't complete it, before they forget."

That's a five-minute prompt. It's also where a self-built version usually starts and stops — the prompt produces a perfectly fine "hey, you left items in your cart" message, the owner sees it work for a couple of test cases, and ships.

The production version is what the prompt becomes after the work below.

Decisions made before the first prompt

Who counts as "abandoned"?

  • Items added to cart but no checkout started? (Too noisy — most people browse menus mid-shift.)
  • Checkout started but not completed? (Better, but excludes drop-offs at delivery details.)
  • Got to the payment step? (Best signal, but smaller volume.)
  • Decision: tier them. Three different cohorts get three different message styles.

How long after they leave?

  • 5 minutes? (Feels stalkery; people are still deciding.)
  • 30 minutes? (Sweet spot for "I got distracted".)
  • 2 hours? (Useful for "I started ordering from work, stopped at the meeting, want to finish at home".)
  • Next day? (Mostly noise — they ordered elsewhere.)
  • Decision: 30 minutes for top-tier (payment-step), 2 hours for mid-tier, never for low-tier. Cohorts get different cadences.
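
The tiering and cadence decisions above are plain rules, so they can be written down before any AI is involved. Here is a minimal sketch of how that might look. The names (`Cohort`, `classify`, `follow_up_delay`) and the two boolean inputs are assumptions for illustration, not the actual Khalas Kitchen code.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional


class Cohort(Enum):
    PAYMENT_STEP = "top"      # reached the payment step
    CHECKOUT_STARTED = "mid"  # started checkout, stopped before payment
    CART_ONLY = "low"         # added items, never started checkout


# Delays from the decision above; None means "never message this cohort".
FOLLOW_UP_DELAY = {
    Cohort.PAYMENT_STEP: timedelta(minutes=30),
    Cohort.CHECKOUT_STARTED: timedelta(hours=2),
    Cohort.CART_ONLY: None,
}


def classify(checkout_started: bool, reached_payment: bool) -> Cohort:
    """Assign the strongest cohort the session qualifies for."""
    if reached_payment:
        return Cohort.PAYMENT_STEP
    if checkout_started:
        return Cohort.CHECKOUT_STARTED
    return Cohort.CART_ONLY


def follow_up_delay(cohort: Cohort) -> Optional[timedelta]:
    """How long to wait before following up, or None to stay silent."""
    return FOLLOW_UP_DELAY[cohort]
```

Keeping the delays in one table means a change of cadence is a one-line edit that anyone on the team can review.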

What channel?

  • Email — fine, but who reads email in the 30-minute window?
  • SMS — high open rate, but the customer didn't sign a marketing consent.
  • Push notification — only if they have the app, which most don't.
  • Decision: email by default; SMS only if they explicitly opted in during checkout. Different copy per channel because what works in email reads as pushy over SMS.
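
The channel decision is also a rule, and a short one. The sketch below assumes two hypothetical inputs (whether we hold an email address, and whether the customer ticked an SMS opt-in at checkout) and made-up template keys. It only shows that consent gates the channel and that each channel gets its own copy.

```python
from typing import List, Tuple

# Hypothetical template keys: separate copy per channel, because what
# works in email can read as pushy over SMS.
CHANNEL_TEMPLATE = {
    "email": "follow_up_email",
    "sms": "follow_up_sms",
}


def allowed_channels(has_email: bool, sms_opt_in: bool) -> List[str]:
    """Email is the default; SMS is allowed only with explicit opt-in."""
    channels = []
    if has_email:
        channels.append("email")
    if sms_opt_in:
        channels.append("sms")
    return channels


def plan_messages(has_email: bool, sms_opt_in: bool) -> List[Tuple[str, str]]:
    """Pair each allowed channel with the copy written for it."""
    return [(ch, CHANNEL_TEMPLATE[ch]) for ch in allowed_channels(has_email, sms_opt_in)]
```

Whether an opted-in customer gets SMS instead of email, or both, is a separate decision that this sketch deliberately leaves open.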

Who do we NEVER message?

  • Customers who ordered something else within the last 24 hours (they didn't abandon, they pivoted).
  • Customers who've been messaged twice this week (diminishing returns + annoyance).
  • Anyone who clicked an unsubscribe in the last 6 months.
  • Orders flagged as suspicious by the fraud check.
  • The owner's own family (mom orders all the time; she doesn't need follow-up emails).
  • Decision: these are exclusion rules, applied before the AI is even invoked.
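
Because the exclusions run before the AI is invoked, they fit in a single function that returns a reason, or nothing. The sketch below is illustrative only: the `Customer` fields are assumed names, "6 months" is approximated as 183 days, and the internal list stands in for the owner's family.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Customer:
    last_order_at: Optional[datetime] = None
    messages_this_week: int = 0
    last_unsubscribe_at: Optional[datetime] = None
    order_fraud_flagged: bool = False
    on_internal_list: bool = False  # e.g. the owner's family


def exclusion_reason(c: Customer, now: datetime) -> Optional[str]:
    """Return why this customer must not be messaged, or None if eligible."""
    if c.last_order_at and now - c.last_order_at <= timedelta(hours=24):
        return "ordered_recently"  # they pivoted, they didn't abandon
    if c.messages_this_week >= 2:
        return "message_cap"
    # Assumption: "6 months" approximated as 183 days.
    if c.last_unsubscribe_at and now - c.last_unsubscribe_at <= timedelta(days=183):
        return "unsubscribed"
    if c.order_fraud_flagged:
        return "fraud_flag"
    if c.on_internal_list:
        return "internal_list"
    return None
```

Returning the reason rather than a bare boolean makes it possible to log why someone was skipped, which helps when a manager asks why a customer never got a follow-up.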

What do we say?

  • "You forgot something" — implicates them; doesn't work for the "I changed my mind" cohort.
  • "Anything wrong with our service?" — too defensive; admits a problem before there is one.
  • "Want a discount to come back?" — trains the customer to abandon-then-wait-for-discount; eats margin permanently.
  • "Here's the order, in case you'd like to finish it" — neutral, helpful, low-pressure. Winner.
  • For the highest-confidence cohort: include a one-tap link that pre-fills the cart at checkout. (This required a back-end build — but doubled the conversion of the message.)

What does the AI actually decide?

  • Which cohort the customer falls into (rule-based — no AI needed).
  • Whether to include any item-specific copy (AI: looks at the cart and writes a one-line note that's accurate to the order without sounding like a robot reciting items).
  • Whether to mention the item is currently popular, in-season, or coming off the menu (AI: pulls signals from the kitchen system, weaves it in if relevant).
  • Tone: warmer for first-time customers; a bit more familiar for repeat ones.
  • What the AI doesn't decide: who gets messaged, when, or whether to message at all. Those are deterministic rules.

What happens when the AI is unsure?

  • Below confidence threshold → fall back to a hand-written template. Don't ship a message you wouldn't be proud of.
  • If the customer's previous interaction was a complaint → escalate to the manager, do not auto-message.
  • If the cart contains an item the AI has never seen before → use the deterministic template, log it for review.
  • Decision: the system has three escape hatches before bad output reaches a customer.
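
The three escape hatches can be sketched as one function that sits between the AI's draft and the send step. Everything here is an assumption for illustration: the 0.8 threshold, the `Outcome` shape, the idea that the model reports a confidence score, and the placeholder template text.

```python
from typing import List, NamedTuple, Optional, Set

# Placeholder for the hand-written fallback template.
TEMPLATE = "Here's the order, in case you'd like to finish it."


class Outcome(NamedTuple):
    action: str             # "send_ai", "send_template", or "escalate"
    text: Optional[str]     # message body, or None when escalating
    log_for_review: bool    # surface on the manager's review list


def resolve(
    draft: str,
    confidence: float,
    last_was_complaint: bool,
    cart_items: List[str],
    known_items: Set[str],
    threshold: float = 0.8,  # assumed value, tune per deployment
) -> Outcome:
    # Hatch 1: a recent complaint goes to a human, never an auto-message.
    if last_was_complaint:
        return Outcome("escalate", None, True)
    # Hatch 2: unseen menu item, so use the template and log it for review.
    if any(item not in known_items for item in cart_items):
        return Outcome("send_template", TEMPLATE, True)
    # Hatch 3: low confidence falls back to the hand-written template.
    if confidence < threshold:
        return Outcome("send_template", TEMPLATE, False)
    return Outcome("send_ai", draft, False)
```

The order of the checks matters: the complaint check runs first so that a confident, well-written draft still never reaches a customer who is already unhappy.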

How do we know it's working?

  • Recovery rate (orders completed within 24h of the message vs. baseline).
  • Unsubscribe rate (annoyance signal — must stay flat, not drift up).
  • Reply-back rate ("hey, this isn't what I wanted" — caught early prevents a public review).
  • Manager dashboard: any message that triggered an escalation → flagged for review the next morning.
  • Decision: these metrics are wired in before launch, not after the first complaint.
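
The first two metrics are simple ratios, which is part of why they can be wired in before launch. A minimal sketch, assuming you can count messaged sessions and a comparable unmessaged baseline (the function and parameter names are illustrative):

```python
def rate(numerator: int, denominator: int) -> float:
    """Safe ratio: 0.0 when there is nothing to divide by."""
    return numerator / denominator if denominator else 0.0


def recovery_lift(
    messaged_completed: int,
    messaged_total: int,
    baseline_completed: int,
    baseline_total: int,
) -> float:
    """Recovery rate within 24h of the message, minus the baseline rate."""
    return rate(messaged_completed, messaged_total) - rate(baseline_completed, baseline_total)


def unsubscribe_drift(this_period: float, previous_period: float) -> float:
    """Positive drift is the annoyance signal to watch for."""
    return this_period - previous_period
```

The baseline is the part that takes thought: without an unmessaged comparison group, a recovery rate on its own says little about whether the message caused the order.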

The result, and what wasn't AI

Once all of that is decided, writing the prompt takes thirty minutes. The prompt itself is mostly: "Given this cart, this customer history, and this tone (variable: warm/familiar), write a 2-3 sentence note. Don't push a discount. Stay under 30 words. If the cart contains [X edge case], use template B instead."
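
To make that concrete, here is a rough sketch of assembling such a prompt and checking the result against the stated constraints. The function names and the exact prompt layout are assumptions, and the "template B" edge case is left out because its trigger isn't specified here.

```python
from typing import List


def pick_tone(previous_orders: int) -> str:
    """Warm for first-time customers, familiar for repeat ones."""
    return "familiar" if previous_orders > 0 else "warm"


def build_prompt(cart_items: List[str], previous_orders: int) -> str:
    """Fill the prompt with the cart, the history, and the chosen tone."""
    tone = pick_tone(previous_orders)
    return (
        f"Cart: {', '.join(cart_items)}\n"
        f"Previous orders: {previous_orders}\n"
        f"Tone: {tone}\n"
        "Write a 2-3 sentence note. Don't push a discount. "
        "Stay under 30 words."
    )


def passes_checks(note: str, max_words: int = 30) -> bool:
    """Reject empty notes, notes at or over the word limit, or discount talk."""
    word_count = len(note.split())
    return 0 < word_count < max_words and "discount" not in note.lower()
```

The check after generation is the useful half: a prompt can ask for "under 30 words", but only a validation step can guarantee it before the note is sent.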

Of the 47 decisions above, exactly two are made by the AI per message: the one-line item note, and the tone calibration. Everything else is rules, exclusions, escalation logic, and the operational structure that makes the system durable. The boring 90% is what makes the visible 10% stop being a demo and start being a system.

What this work earns

Numbers from the first 60 days, after launch:

None of those numbers are about the AI. They're about the workflow. The AI is the executor of decisions someone made deliberately.

The pattern generalises. Whether it's plumbers (which leads to escalate, which to text back), accountants (which prospects to nurture, when), or salons (which clients to ask for reviews and when) — the work is in the 30-50 decisions made before the first prompt. The AI part is small and replaceable. The decisions are not.

What you'd take back to a self-built attempt

If you're building your own AI workflow, the most useful thing you could do today isn't to tweak your prompt. It's to write down the equivalent of the eight decision-lists above for your own use case. Specifically:

  1. Who counts as a candidate for the workflow? (Cohorts, tiers, signals.)
  2. What's the timing? (How fresh is the signal; when does it go stale.)
  3. Channel logic. (What's appropriate; what consent is required.)
  4. Exclusion rules. (Who never gets touched, automatically.)
  5. Outcome menu. (What set of responses is acceptable.)
  6. What's the AI's actual job. (Usually narrower than you think.)
  7. What happens when the AI is unsure. (Three escape hatches, minimum.)
  8. How do you know it's working. (Metrics wired in before launch.)

Each one feels like a conversation with yourself. None of them feel like AI work. All of them are necessary, and most of them are the parts that distinguish a demo from a system.


If you'd like one of these designed for your business

20-minute call. Tell us your one-sentence brief. We'll walk through the 30+ decisions that have to land before a prompt is written, and tell you whether it's the kind of work you can do yourself or where outside help shortens the path.
