SmartDuke Technologies
All essaysOperations

AI safety in production: a checklist that actually ships.

Safety isn't a content filter you add at the end. It's an architecture. These six layers are non-negotiable before any AI product touches real users.

By SmartDuke Team··9 min
Circuit board representing a hardened production AI system
In brief

AI safety in production isn't a content filter — it's an architecture. Required before launch: typed input contracts at the system boundary, output verification on shape and semantics, explicit refusal paths when confidence falls, designed human-handoff for high-stakes cases, audit logs for every action, and adversarial red-team testing. Bolt-on filters on top of an unsafe architecture are not enough.

The first incident in any AI product is always the same shape: an input no one tested for, a response no one expected, and a downstream effect no one designed against. The team adds a filter, calls it a guardrail, ships it, and moves on. Six months later there's a stack of bolt-on filters fighting each other and the product is harder to reason about than it was at day one.

The teams that don't fall into this loop treat safety as architecture. Six layers, every time, before the first user touches the system.

01 — Typed input contracts.

Don't pass arbitrary user text into the model. Define an input contract — type, length, language, allowed intents, max tokens — and enforce it at the system boundary. Reject early with a structured error your UI can render gracefully. Inputs that don't match the contract never reach the model.

02 — Output verification on shape and semantics.

Every model output gets two checks. Structural: does it parse into the expected schema? Semantic: does it pass the rubric — citation present, refusal correct, no PII leak, no out-of-scope content? Failed checks trigger explicit retry, fallback, or refusal — not silent retries that hide the failure.

Detailed view of an electronic board representing layered safety architecture

03 — Explicit refusal paths.

When confidence drops below threshold, the system has a designed response — refuse with a useful message, fall back to a simpler model, or escalate to a different path. Refusals are first-class, instrumented, and tracked. Silent uncertainty is the most dangerous failure mode.

04 — Designed human handoff for high-stakes cases.

If an answer affects someone's visa, money, medical decision, or legal status, the system must know that and route accordingly. Identify high-stakes paths in design, not after the first incident report. Build the handoff UI before the agent is live, not after.

05 — Audit logs for every action.

Every model call, tool invocation, and downstream effect is logged with the inputs, the outputs, the score, and the user identity. Not for compliance theater — for incident response. When something goes wrong, you need to be able to reconstruct exactly what happened in under five minutes.

Workspace with security audit checklist and monitoring tools

06 — Adversarial red-team testing before launch.

Spend a structured day attacking your own product. Prompt injection. Jailbreaks. Out-of-scope queries. Edge cases your test set doesn't cover. The findings go into the eval suite as new test cases. Skip this and you'll learn the hard way, on a Tuesday, from a screenshot on Twitter.

Safety is the operating loop, not the launch checklist. Re-run the checklist whenever you change the model, the prompt, the tools, or the data sources. Anything that changes the inputs or outputs changes the safety surface.

What you don't need.

You don't need a content-moderation API for most products. You don't need every off-the-shelf safety wrapper. You don't need to ship 17 vendor SDKs to call yourself safety-conscious. You need the six layers above, designed in, tested, instrumented. Most other tooling is window dressing on a foundation that's either solid or not.

Next essay
Engineering · 10 min

Schema.org markup for AI engines: what actually works in 2026.

Start a project

Have an AI product
that needs to ship?

Tell us where you are — early concept, broken prototype, or scaling something that already works. We'll come back within 24 hours with a take and a quote.