
Guardrails

Guardrails are foundational to building safe, predictable, and verifiably correct AI agents. This document provides a technical blueprint for developers designing guardrails for domain-restricted, API-driven autonomous agents, using the ACME Sales Analytics Agent definition as a canonical example.


1. Objective of Guardrails

The primary purpose of guardrails is to ensure that the agent:

  1. Operates strictly within a predefined domain (e.g., a specific API).
  2. Executes autonomously without user clarification.
  3. Rejects unsupported or risky tasks deterministically.
  4. Avoids hallucinations, data fabrication, and speculative reasoning.
  5. Maintains safety, privacy, and operational correctness.

In this model, guardrails serve as the contract between the agent and the environment. They specify what the agent can, cannot, and must do—across all phases of planning and execution.


2. Guardrail Structure

The example specification defines guardrails as a list of policies (sketched in code after this list), each with:

  • id – A stable, unique identifier for referencing the policy.

  • phase – The lifecycle stages where the rule applies (if omitted, the rule applies to all phases):

      • context: interpreting the user request

      • plan: building the chain of actions or API calls

  • description – A normative, prescriptive rule the agent must follow.
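
To make this concrete, here is a minimal sketch of how such a policy list might be held in memory. The GuardrailPolicy class and the example entries are illustrative assumptions, not part of the ACME definition:

```python
from dataclasses import dataclass

# Hypothetical in-memory form of one guardrail policy; the field
# names (id, phases, description) mirror the spec above.
@dataclass(frozen=True)
class GuardrailPolicy:
    id: str
    description: str
    phases: tuple[str, ...] = ()  # empty -> applies to all phases

    def applies_to(self, phase: str) -> bool:
        return not self.phases or phase in self.phases

# Illustrative entries; the descriptions are examples, not the
# actual ACME policy text.
POLICIES = [
    GuardrailPolicy(
        id="scope.limits",
        description="The agent MUST only call the /query and /fields endpoints.",
        phases=("plan",),
    ),
    GuardrailPolicy(
        id="no.clarification.allowed",
        description="The agent MUST NOT ask the user for clarification.",
    ),
]

plan_rules = [p for p in POLICIES if p.applies_to("plan")]  # both apply
```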

Why phases matter

Guardrails often need to enforce different behavior depending on what the agent is doing. For example:

  • During context understanding, the agent should validate entity references.
  • During planning, the agent should prevent invalid API request construction.

This ensures guardrails are not general suggestions—they are precise constraints tied to specific reasoning stages.


3. Key Guardrail Categories (with Examples)

Below is a breakdown of the major categories in the provided specification and why they matter when designing robust agents.


3.1 Scope Limitation

Policy: scope.limits

Defines the strictly allowed operations. The agent may only:

  • Use the explicitly supported API endpoints (/query, /fields)
  • Operate on documented entities
  • Perform allowed Retrieve/Compute actions

Why it matters: Scope drift is a leading cause of hallucinations and invalid agent behavior. Constraining the agent prevents:

  • Unauthorized API usage
  • Implicit reasoning beyond declared capabilities
  • Unexpected out-of-domain outputs

Best Practice: Write the allowed domain as positive rules (what is allowed), not just negative ones.
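
For instance, a positive scope check might look like the sketch below, assuming the two endpoints named above and hypothetical lowercase action names:

```python
# Positive rules: anything not explicitly listed is out of scope.
ALLOWED_ENDPOINTS = {"/query", "/fields"}
ALLOWED_ACTIONS = {"retrieve", "compute"}  # assumed action names

def check_scope(endpoint: str, action: str) -> None:
    """Raise if a planned step leaves the declared domain."""
    if endpoint not in ALLOWED_ENDPOINTS:
        raise PermissionError(
            f"Endpoint {endpoint!r} is out of scope (policy scope.limits)")
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(
            f"Action {action!r} is out of scope (policy scope.limits)")

check_scope("/query", "retrieve")      # passes silently
# check_scope("/orders", "retrieve")   # would raise PermissionError
```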


3.2 Explicit Rejection of Unsupported Tasks

Policy: reject.out.of.scope

Mandates deterministic rejection of unsupported request types, including:

  • Creative writing
  • Opinion-based questions
  • Data creation, mutation, deletion
  • Usage of unknown fields or entities
  • Predictions not computable through the API
  • Requests involving external systems

Why it matters: This category ensures the agent fails safely instead of improvising.

Best Practice: As sketched below, rejection rules must be:

  • Deterministic
  • Verbose and explicit
  • Applied before planning, not after errors occur
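
One way to satisfy all three properties is a fixed classifier that runs before the planner is ever invoked. The keyword markers below are a toy stand-in for the agent's real task classifier:

```python
from typing import Optional

# Unsupported categories from reject.out.of.scope. The keyword
# markers are illustrative placeholders, not a real classifier.
UNSUPPORTED = {
    "creative writing": ("write a poem", "tell a story"),
    "data mutation": ("create", "update", "delete"),
    "external systems": ("send an email", "post to"),
}

def classify_unsupported(request: str) -> Optional[str]:
    text = request.lower()
    for category, markers in UNSUPPORTED.items():
        if any(marker in text for marker in markers):
            return category
    return None

def handle(request: str) -> str:
    category = classify_unsupported(request)
    if category is not None:
        # Deterministic rejection; the planner never runs.
        return f"Rejected: {category} is not supported (policy reject.out.of.scope)."
    return plan(request)

def plan(request: str) -> str:  # stub for the real planning phase
    return f"Planning: {request}"

print(handle("Please delete last month's orders"))
```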

3.3 Partial Assignment Handling

Policy: isolate.unsupported.instructions

When a user mixes supported and unsupported tasks:

  1. Identify supported parts
  2. Plan only for those
  3. Reject unsupported parts individually

Why it matters: This prevents:

  • Silent ignoring of instructions
  • Overgeneralization
  • Unintended multi-step inference

Best Practice: Always separate the user request into a validated task graph, as sketched below:

  • Supported → plan
  • Unsupported → reject
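
A sketch of that split, assuming the request has already been decomposed into individual tasks; the is_supported predicate stands in for the scope and schema checks described above:

```python
def split_tasks(tasks, is_supported):
    """Partition tasks so nothing is silently dropped."""
    supported = [t for t in tasks if is_supported(t)]
    unsupported = [t for t in tasks if not is_supported(t)]
    return supported, unsupported

tasks = ["total revenue by region", "write a marketing email"]
supported, unsupported = split_tasks(tasks, lambda t: "revenue" in t)  # toy predicate

for t in unsupported:  # each unsupported part is rejected individually
    print(f"Rejected: {t!r} (policy isolate.unsupported.instructions)")
# Only `supported` proceeds to the planning phase.
```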

3.4 Entity and Schema Validation

Policy: enforce.entity.mapping

Ensures that every requested operation maps to:

  • A known entity
  • A valid field
  • A supported operator

Why it matters: Incorrect schema usage is among the most common agent failure modes. An agent must treat the schema like a strict type system.

Best practice: Implement entity mapping as a compile-time check in the planning phase.
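
A minimal sketch of that check, with a toy schema standing in for the metadata a real agent would load from the /fields endpoint:

```python
# Toy schema; a real agent would build this from /fields responses.
SCHEMA = {
    "order": {
        "fields": {"id", "region", "amount"},
        "operators": {"eq", "gt", "lt", "sum"},
    },
}

def validate_step(entity: str, field: str, operator: str) -> None:
    """Treat the schema as a type system: unknown names fail hard."""
    spec = SCHEMA.get(entity)
    if spec is None:
        raise ValueError(f"Unknown entity {entity!r} (policy enforce.entity.mapping)")
    if field not in spec["fields"]:
        raise ValueError(f"Unknown field {entity}.{field} (policy enforce.entity.mapping)")
    if operator not in spec["operators"]:
        raise ValueError(f"Unsupported operator {operator!r} (policy enforce.entity.mapping)")

validate_step("order", "amount", "sum")  # passes
```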


3.5 Hallucination Prevention

Policy: prevent.hallucination

Prohibits:

  • Fabricating fields, data, relationships, or operators
  • Filling missing details with guesses
  • Producing synthetic data

Why it matters: Hallucination control is the cornerstone of reliable autonomous agents.

Best practice: Treat missing information as a hard failure, not something to infer.
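
In code, that rule reduces to a resolver that never invents a value; the function and parameter names here are illustrative:

```python
def resolve_param(name: str, provided: dict, schema_defaults: dict):
    """A value comes from the request or the schema; never from a guess."""
    if name in provided:
        return provided[name]
    if name in schema_defaults:  # documented defaults only (see 3.6)
        return schema_defaults[name]
    raise ValueError(
        f"Missing {name!r} and no safe default; refusing to infer "
        "(policy prevent.hallucination)")

resolve_param("limit", provided={}, schema_defaults={"limit": 100})  # -> 100
```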


3.6 Safe Defaults

Policy: safe.defaults

Allows use of safe internal defaults only when:

  • They are part of the API schema
  • They do not require assumptions

Why it matters: Some tasks are fully determined without user input (e.g., default limits or orderings). Others introduce hidden assumptions.

Best practice: Defaults must be documented and consistent. If they cannot be applied safely → reject.
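
A small sketch of that rule; the default values below are assumptions standing in for the real API schema defaults:

```python
# Documented, consistent defaults; values here are assumptions.
SCHEMA_DEFAULTS = {"limit": 100, "sort": "desc"}

def apply_defaults(query: dict, required: set) -> dict:
    filled = {**SCHEMA_DEFAULTS, **query}  # explicit values win
    gaps = required - filled.keys()
    if gaps:
        # No safe default exists -> reject rather than assume.
        raise ValueError(f"Cannot default {sorted(gaps)} (policy safe.defaults)")
    return filled

print(apply_defaults({"limit": 10}, required={"limit", "sort"}))
# -> {'limit': 10, 'sort': 'desc'}
```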


3.7 No User Clarification

Policy: no.clarification.allowed

The agent must not request:

  • Additional details
  • Rephrasing
  • Missing context

If insufficient information exists → reject.

Why it matters: Autonomous agents are often embedded in automated workflows where user interaction is not guaranteed.

Best practice: Treat all ambiguity as fatal, not recoverable.


3.8 Data Privacy

Policy: data.privacy

Prevents inference or fabrication of personal data not present in the schema.

Why it matters: Agents can exploit learned patterns to guess values such as a customer’s name, address, or preferences. Schema boundaries prevent that.

Best practice: Require strict adherence to the OpenAPI schema as the sole source of truth.
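
One concrete enforcement point is to filter every outgoing record against the schema-declared fields, so a value the schema does not define can never be surfaced. A sketch, with an assumed field list:

```python
# Fields the OpenAPI schema actually declares; the set is an
# assumption for illustration.
SCHEMA_FIELDS = {"order_id", "region", "amount"}

def redact_unknown(record: dict) -> dict:
    """Drop any key not backed by the schema (policy data.privacy)."""
    return {k: v for k, v in record.items() if k in SCHEMA_FIELDS}

print(redact_unknown({"order_id": 7, "customer_name": "a guess"}))
# -> {'order_id': 7}
```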


4. Designing Effective Guardrails: Best Practices

4.1 Guardrails must be explicit and prescriptive

Avoid vague or interpretive language. Use MUST, MUST NOT, and MAY (RFC 2119 terminology).

4.2 Separate rules by failure modes

Each policy should address one failure type only. This improves logging, debugging, and enforcement.

4.3 Tie guardrails to agent phases

Agents behave differently depending on:

  • Understanding context
  • Planning
  • Executing API calls

Binding rules to phases prevents misapplication.

4.4 Avoid overlapping or conflicting rules

Every behavior the agent should exhibit needs to be:

  • Declared once
  • Unambiguous
  • Testable

4.5 Use schema-driven reasoning

The schema is the source of truth. The agent must:

  • Validate all fields against it
  • Construct queries strictly using supported types
  • Never extrapolate

4.6 Provide deterministic rejection patterns

Rejections should follow a consistent template, as sketched below:

  • Identify the unsupported instruction
  • Cite the violated policy
  • Avoid partial or ambiguous answers
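
A sketch of one such template; the exact message format is a suggestion, not part of the spec:

```python
def rejection_message(instruction: str, policy_id: str, reason: str) -> str:
    """Every rejection has the same shape: instruction, policy, reason."""
    return (f"Cannot perform: {instruction!r}. "
            f"Violated policy: {policy_id}. "
            f"Reason: {reason}.")

print(rejection_message(
    "predict next quarter's revenue",
    "reject.out.of.scope",
    "predictions are not computable through the /query endpoint",
))
```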

4.7 Build testing suites around guardrails

As the sketch below illustrates for unit tests, each policy should be convertible into:

  • Unit tests
  • Integration tests
  • Spec-based simulations
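
For example, each policy id can drive a table-driven unit test; agent_handle and the cases below are placeholders for the real agent entry point and its expected rejections:

```python
import unittest

def agent_handle(request: str) -> str:
    """Placeholder for the real agent entry point."""
    if any(marker in request for marker in ("poem", "delete")):
        return "Rejected (policy reject.out.of.scope)."
    return "Planned."

# One (request, policy id expected in the response) case per guardrail.
CASES = [
    ("write a poem about sales", "reject.out.of.scope"),
    ("delete all orders", "reject.out.of.scope"),
]

class GuardrailTests(unittest.TestCase):
    def test_policies_are_enforced(self):
        for request, policy_id in CASES:
            with self.subTest(policy=policy_id):
                self.assertIn(policy_id, agent_handle(request))

if __name__ == "__main__":
    unittest.main()
```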

5. Summary

Guardrails are the backbone of reliable autonomous agents. Using the ACME Sales Analytics guardrail structure as a reference, developers should:

  • Define strict domain boundaries
  • Enforce schema compliance
  • Reject ambiguity deterministically
  • Prevent hallucinations
  • Maintain user and data safety
  • Tie constraints to specific phases of agent reasoning

A well-designed guardrail system ensures that your AI agent behaves as a predictable, compliant, and trustworthy component in larger enterprise systems.
