πŸ“‹ Collaborative Draft β€” Working Model v2.0

The Operating System
for Human + AI Teams

NexusAOS is a framework for businesses where humans and AI agents work together as one coordinated team. Not another chatbot. An operating system for intelligent work.

This framework is a collaborative draft, not a finished product. We welcome challenges, additions, and real-world stress tests.

It answers six questions: What AI agents do we need? What does each one do? How do they work together? How do humans stay in control? How do we know it's working? How do we get better over time?

Most businesses deploy AI and get disappointed. Not because the AI failed β€” because it succeeded at the wrong thing.

Klarna's AI handled 2.3 million customer conversations and saved $60M. Then their CEO went on Bloomberg to explain why he was hiring humans back. The AI optimized for speed and cost β€” and destroyed customer trust in the process.

NexusAOS is the framework that prevents this.

Section 1

The Universal Loop ⚠️ WORKING MODEL

Everything in NexusAOS runs on one four-step cycle. Every layer of the system, from annual strategy down to a five-second task, follows the same pattern.

πŸ‘ Sense (notice what's happening) β†’ 🧠 Decide (figure out the right move) β†’ ⚑ Act (take action) β†’ πŸ“š Learn (remember what worked)

Think of how a great employee operates. They notice what's happening (Sense), figure out the right move (Decide), take action (Act), and remember what worked for next time (Learn).
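The four-step cycle can be sketched as a minimal agent loop. This is an illustrative sketch, not a NexusAOS implementation; every class and method name here is an assumption.

```python
# Minimal sketch of the Sense -> Decide -> Act -> Learn cycle.
# All names are illustrative assumptions, not framework APIs.

class LoopAgent:
    def __init__(self):
        self.memory = []  # what worked before (Learn)

    def sense(self, environment: dict) -> str:
        # Notice what's happening
        return environment.get("signal")

    def decide(self, observation: str) -> str:
        # Figure out the right move, informed by past lessons
        if observation in self.memory:
            return f"repeat known response to {observation}"
        return f"handle new case: {observation}"

    def act(self, decision: str) -> dict:
        # Take action; here the "action" is just returned as a result record
        return {"action": decision, "ok": True}

    def learn(self, observation: str, result: dict) -> None:
        # Remember what worked for next time
        if result["ok"]:
            self.memory.append(observation)

    def run_once(self, environment: dict) -> dict:
        obs = self.sense(environment)
        decision = self.decide(obs)
        result = self.act(decision)
        self.learn(obs, result)
        return result
```

On the second pass over the same signal, the agent recognizes it and reuses what worked, which is the whole point of the Learn step.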

The Five Loops β€” Five Cadences, One Pattern

1

Nexus Loop β€” Annual

The company-wide vision: what are we building toward? Humans make this decision.

2

Strategy Loop β€” Quarterly

Which priorities matter this quarter? Which agents are assigned to what?

3

Operations Loop β€” Weekly

How's it going? What needs tuning?

4

Execution Loop β€” Continuous 24/7

The agents actually doing the work, around the clock.

5

Guardian Loop β€” Always-On

The safety system. Watching everything, all the time.

Think of it like a military command structure. The General sets the annual strategy (Nexus). Colonels assign quarterly missions (Strategy). Captains run weekly debriefs (Operations). Soldiers execute on the ground 24/7 (Execution). And military police monitor for anything going wrong β€” all the time (Guardian).

Section 2

The Agent Brief ⚠️ WORKING MODEL

Every AI agent has a one-page spec. It's the job description and performance review β€” for an AI employee.

You wouldn't hire a human without explaining their role. You don't deploy an agent without an Agent Brief.

🎯

Mission

What does this agent do? One clear purpose.

πŸ“₯

Inputs & Outputs

What does it receive? What does it produce?

πŸ”—

Connections

Who does it hand off to? What's upstream?

πŸ›‘οΈ

Guardrails

What can it never do? Hard limits on behavior.

πŸ”‘

Authority Level

How much freedom does this agent have?

πŸ“Š

Success Metrics

How do we know it's performing well?
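As a sketch, the six fields above might be captured in a simple record. The field names and the sample brief are assumptions for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field

# Sketch of an Agent Brief record; field names are illustrative.
@dataclass
class AgentBrief:
    mission: str                   # one clear purpose
    inputs: list[str]              # what the agent receives
    outputs: list[str]             # what it produces
    connections: list[str]         # upstream/downstream handoffs
    guardrails: list[str]          # hard limits on behavior
    authority: str = "restricted"  # every agent starts restricted
    success_metrics: list[str] = field(default_factory=list)

brief = AgentBrief(
    mission="Triage inbound support email",
    inputs=["raw email"],
    outputs=["category label", "priority score"],
    connections=["routing agent (downstream)"],
    guardrails=["never reply to the customer directly"],
    success_metrics=["triage accuracy", "time to route"],
)
```

Making `restricted` the default mirrors the rule below: day one for every agent.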

Authority Levels β€” Agents Earn Trust Over Time

Restricted

Agent can only recommend. Human must approve everything. Day one for every agent.

Advised

Agent acts, but flags anything unusual for human review.

Supervised

Agent runs independently. Human checks outputs periodically.

Autonomous

Agent runs without oversight β€” reserved for proven, low-stakes workflows only.

It's like a new employee vs. a 10-year veteran. Day one: every decision gets signed off. Year ten: they run their domain. NexusAOS builds AI systems the same way.
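One way to sketch "agents earn trust over time" is a promotion rule keyed to a track record. The levels come from the list above; the thresholds and the incident-resets-to-restricted rule are invented for illustration.

```python
# Sketch: promote an agent one authority level after a clean track record.
# Levels are from the framework; thresholds and reset rule are assumptions.
LEVELS = ["restricted", "advised", "supervised", "autonomous"]

def next_authority(current: str, clean_runs: int, incidents: int,
                   promotion_threshold: int = 100) -> str:
    """Return the authority level an agent qualifies for."""
    if incidents > 0:
        return "restricted"  # assumed policy: any incident resets trust to day one
    idx = LEVELS.index(current)
    if clean_runs >= promotion_threshold and idx < len(LEVELS) - 1:
        return LEVELS[idx + 1]
    return current
```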

Section 3

How Agents Stay Sharp ⚠️ WORKING MODEL

Agents don't carry memories between jobs. They receive a briefing, do the work, file their notes, and reset. Every time.

πŸ’§

Hydration

Before each task, the agent receives a "briefing packet" β€” all the context it needs to do the job well.

πŸ“¦

Dehydration

When finished, the agent files its work to storage. Then it resets completely. Clean slate.

Think of a surgeon. Before each operation, they review the patient's chart (hydration). After, they write their notes (dehydration). They don't carry every patient's history in their head β€” they access it when they need it.

Why This Matters

  • If an agent breaks, you replace it without losing any work
  • Every output is documented and traceable
  • Nothing exists only in someone's head
  • Agents are scalable, replaceable, and debuggable

Section 4

The Lean Execution Layer ⚠️ WORKING MODEL

Toyota spent 40 years solving the same problem we face: consistent, high-quality output from complex multi-step processes where small errors compound. We applied their principles to AI agents.

πŸ”΄ Jidoka β€” Never Pass a Defect Downstream

Toyota: any worker can stop the entire production line the moment a defect is found. In AI: every workflow step has a Verification Gate. If an agent produces bad output, the workflow stops. The error is logged. A human is alerted.

In medicine, you never skip a lab result just because the surgeon is waiting. You stop, check, and only proceed when it's clear.

A 10-step contract review workflow. Step 3 extracts key dates. If Step 3 produces a format error, the entire chain stops before Step 4 uses those dates to generate wrong compliance notifications.
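A Verification Gate can be sketched as a check between steps that halts the chain on bad output. The date-format check follows the contract-review example above; the function names are assumptions.

```python
import re

class GateFailure(Exception):
    """Raised when a step's output fails its Verification Gate."""

def date_gate(output: str) -> str:
    # Step 3's gate: dates must be ISO-formatted before Step 4 uses them.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", output):
        # In a real system this would also log the error and alert a human.
        raise GateFailure(f"bad date format: {output!r}")
    return output

def run_pipeline(steps, gates, data):
    """Run steps in order; a gate raising stops the chain immediately."""
    for step, gate in zip(steps, gates):
        data = step(data)
        if gate is not None:
            data = gate(data)  # workflow stops here if the gate raises
    return data
```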

πŸ“‹ Kanban β€” Pull, Don't Push

Toyota: workers pull work when they have capacity. In AI: agents pull tasks when ready. You set a maximum number of agents working at once (WIP limit). No spawning 50 agents when 5 will do.

A restaurant kitchen. The expeditor doesn't shout 20 orders at once β€” tickets arrive and are pulled when the cook has a station open.

A content generation pipeline. Instead of launching 100 drafts simultaneously and crashing the API, you set a WIP limit of 10. Ten run, finish, ten more begin. Controlled, bounded, cost-predictable.
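The WIP-limited pipeline above can be sketched with a bounded worker pool, where the pool size is the WIP limit. The draft function is a stand-in for a real generation call.

```python
from concurrent.futures import ThreadPoolExecutor

def draft_article(topic: str) -> str:
    # Stand-in for a real (slow, costly) generation call
    return f"draft: {topic}"

def run_with_wip_limit(topics: list[str], wip_limit: int = 10) -> list[str]:
    # At most `wip_limit` agents run at once; the rest wait in the queue
    # and are pulled as capacity frees up.
    with ThreadPoolExecutor(max_workers=wip_limit) as pool:
        return list(pool.map(draft_article, topics))

results = run_with_wip_limit([f"topic-{i}" for i in range(100)], wip_limit=10)
```

The queue does the "pull": a worker takes the next task only when it finishes its current one, so concurrency and cost stay bounded no matter how long the backlog is.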

πŸ”¬ Kaizen β€” Earn the Right to Change

Continuous improvement through small, tested changes. You never modify a running process to "try something." Every prompt change, model swap, or workflow modification goes through a controlled test cycle.

A pharmaceutical trial. You don't roll out a new drug because someone thinks it might work better. You test, measure, compare. NexusAOS treats agent improvements the same way.

πŸ—ΊοΈ Value Stream Mapping β€” Map Before You Build

Before writing a single line of agent code, draw out every step of the workflow. For each step: what's the input? Output? Where could it go wrong? Is this step necessary? Only then, build.

An architect draws blueprints before pouring concrete. NexusAOS draws workflow maps before deploying agents.

πŸ—‘οΈ Muda β€” Eliminate Waste

Toyota identified seven categories of waste. All seven translate directly to AI:

Overproduction

Agent generates a 10-page report when 1 paragraph was needed

Waiting

Agent blocked 20 minutes for human approval on a low-stakes decision

Overprocessing

Using an expensive model for a task a smaller model handles fine

Defects

One hallucination at Step 2 compounds through Steps 3–10

Inventory

500 tasks queued for 2 agents

Motion

Passing entire conversation histories when only the last 3 messages matter

Transportation

A workflow with 12 agent hops when 6 would do

Section 5

The New Vocabulary ⚠️ WORKING MODEL

Terms NexusAOS invented because the concepts are genuinely new.

πŸ“¦ Context Budget

Every AI agent has a limit on how much information it can hold in mind at once. A Context Budget is the plan for how to use that space wisely.

Packing for a trip. You have a carry-on. You don't pack everything β€” you pack what you actually need. Context Budget is the packing list for an agent's mind.

A research agent gets 100K tokens of context. 40K for the brief, 30K for source documents, 20K for prior conversation, 10K reserved for output. No overflow, no truncation of critical information.
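The 100K split above can be sketched as an explicit budget that trims each section to its allocation. The numbers come from the example; the function and key names are assumptions.

```python
# Sketch: an explicit context budget for the research-agent example.
# Allocation names and the packing rule are illustrative assumptions.
BUDGET = {"brief": 40_000, "sources": 30_000,
          "history": 20_000, "output_reserve": 10_000}
TOTAL = 100_000

def pack_context(sections: dict[str, int]) -> dict[str, int]:
    """Trim each section to its allocation; fail loudly if the plan overflows."""
    assert sum(BUDGET.values()) <= TOTAL, "budget plan exceeds the context window"
    return {name: min(tokens, BUDGET[name]) for name, tokens in sections.items()}

packed = pack_context({"brief": 38_000, "sources": 55_000,
                       "history": 12_000, "output_reserve": 10_000})
# oversized sources are trimmed to their 30K allocation; nothing else is touched
```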

πŸ”— Agent Lineage

A record of which agent did what, with which model, at which step β€” all the way through a workflow.

A chain of custody in evidence handling. You can trace any piece of evidence back to exactly who touched it and when.

A client report generated by 8 agents. Something's wrong with the financial projections. Agent Lineage tells you: Step 4, Agent B, GPT-4o-mini, at 14:22 UTC. You fix Step 4. Everything else stands.
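A lineage record can be sketched as an append-only log with one entry per step. The fields are modeled on the example above; the function names are assumptions.

```python
from datetime import datetime, timezone

lineage: list[dict] = []  # append-only: one entry per workflow step

def record_step(step: int, agent: str, model: str, output_id: str) -> None:
    lineage.append({
        "step": step, "agent": agent, "model": model,
        "output_id": output_id,
        "at": datetime.now(timezone.utc).isoformat(),
    })

def who_produced(output_id: str) -> dict:
    """Trace any output back to the step, agent, and model that made it."""
    return next(e for e in lineage if e["output_id"] == output_id)

record_step(3, "Agent A", "small-model", "dates-v1")
record_step(4, "Agent B", "gpt-4o-mini", "projections-v1")
culprit = who_produced("projections-v1")  # step 4, Agent B
```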

🧬 Workflow Genome

The complete blueprint for how a workflow runs. Every step, every rule, every model, every gate β€” versioned, stored, auditable.

A recipe in a restaurant. Not a vague description β€” the exact recipe with measurements, temperatures, and techniques. When a chef leaves, the recipe stays. When a new chef arrives, they follow the same recipe.

πŸš€ Spawn Point

A planned moment in a workflow where one agent creates specialized sub-agents to run parallel work.

A project manager who calls in specialists. At certain milestones, they bring in a lawyer, a designer, and a developer simultaneously. Each does their part, reports back.

A market research workflow reaches the "competitive analysis" spawn point and creates five sub-agents β€” one per competitor β€” running simultaneously. All five report back. The orchestrator synthesizes.
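The competitive-analysis spawn point above might look like this. The analysis function is a stand-in for a real sub-agent, and the competitor names are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_competitor(name: str) -> dict:
    # Stand-in for a specialized sub-agent's work
    return {"competitor": name, "summary": f"analysis of {name}"}

def competitive_analysis_spawn_point(competitors: list[str]) -> list[dict]:
    # One sub-agent per competitor, all running in parallel;
    # the orchestrator collects every report before synthesizing.
    with ThreadPoolExecutor(max_workers=len(competitors)) as pool:
        return list(pool.map(analyze_competitor, competitors))

reports = competitive_analysis_spawn_point(
    ["Acme", "Globex", "Initech", "Umbrella", "Hooli"])
```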

πŸ”­ Memory Horizon

How far back into the workflow's history an agent can "see" when it starts a task.

A doctor reviewing a patient's medical history. They don't read every record from birth. They read the last 5 years, plus any flagged critical notes. That's their memory horizon.
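Mirroring the doctor analogy, a memory horizon can be sketched as a slice of recent history plus anything flagged critical. Field names here are assumptions.

```python
def visible_history(records: list[dict], horizon: int = 5) -> list[dict]:
    """Return the last `horizon` records plus any older records
    flagged critical, without duplicates (assumes horizon >= 1)."""
    recent = records[-horizon:]
    flagged = [r for r in records[:-horizon] if r.get("critical")]
    return flagged + recent

# 20 records; only record 2 carries a critical flag
history = [{"id": i, "critical": (i == 2)} for i in range(20)]
view = visible_history(history, horizon=5)
# record 2 (flagged) plus records 15-19
```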

Section 6

The Three Engineering Disciplines ⚠️ WORKING MODEL

Most AI failures aren't technical. They're conceptual. There are three things you must get right. Most organizations get only the first one.

1. Prompt Engineering

What the AI Does

You write instructions telling the AI what task to perform. This is where everyone starts.

The limit: You can't write instructions for every situation. When an edge case appears, the AI does its best β€” which may be wrong.

2. Context Engineering

What the AI Knows

You design the information environment the AI works in β€” data, history, knowledge it can access.

The limit: An AI with perfect instructions but no context is like a brilliant consultant who has to relearn your business from scratch every meeting.

3. Intent Engineering

What the AI Wants

You encode the organization's goals, values, and decision rules directly into the system. When the AI hits a situation no instruction covers, it knows what matters most.

This is what 95% of companies miss.

The Klarna Lesson

Klarna told its AI what to do (handle inquiries). Told it what to know (customer data, policies). Never told it what to want. So it optimized for speed and cost β€” because those were measurable. Trust, empathy, relationship quality weren't. The AI destroyed them.

MIT: 95% of enterprise AI deployments fail to deliver measurable impact. Not because AI doesn't work. Because it succeeds at the wrong thing.

The Intent Schema

For every agent that interacts with customers, makes decisions, or handles anything that matters β€” an Intent Schema is required.

Intent Schema Fields

  • Primary Objective: What are we actually trying to achieve? Not the task β€” the purpose of the task.
  • Priority Hierarchy: When goals conflict, which one wins? Always explicit β€” never let the AI guess.
  • Protected Outcomes: What can this agent never compromise, even for efficiency?
  • Unmeasurables: What matters but can't be tracked by a computer? Route these to humans.
  • Decision Boundaries: What is this agent explicitly NOT allowed to do?
  • Escalation Trigger: When must this agent call a human, even if it's not sure why?

What Klarna's Intent Schema Should Have Been

  • Primary Objective: Long-term customer retention through trust
  • Priority Hierarchy: 1. Relationship quality β†’ 2. Accurate resolution β†’ 3. Efficiency
  • Protected Outcomes: Customer dignity, empathy in every interaction
  • Unmeasurables: Relationship quality β†’ always route to human when ambiguous
  • Decision Boundaries: Never sacrifice empathy for speed, even when scripts permit it

The AI would have made completely different decisions.

Intent Drift Detection

The Guardian Loop watches not just for system failures β€” but for behavioral drift. Is an agent making decisions that technically comply with instructions but violate the spirit of its Intent Schema? This is flagged, reviewed by humans, and corrected.

Section 7

Model Selection ⚠️ WORKING MODEL

The question isn't "what's the best AI?" β€” it's "what's the minimum capability that reliably produces the right output for this step?"

Task Type            | Example                        | Right Model
Classify / Extract   | "Is this email a complaint?"   | Small / Fast (cheap)
Draft / Generate     | "Write a first-pass proposal"  | Mid-tier
Reason / Synthesize  | "Analyze these 50 contracts"   | Large reasoning model
Verify / Check       | "Is this output correct?"      | Independent mid-tier
Brainstorm / Create  | "What are 20 new approaches?"  | Large, creative
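The table above can be sketched as a routing function. The tier names are placeholders, not real model identifiers.

```python
# Sketch of task-type -> model-tier routing from the table.
# Tier names are placeholders, not real model identifiers.
ROUTING = {
    "classify":   "small-fast",
    "extract":    "small-fast",
    "draft":      "mid-tier",
    "reason":     "large-reasoning",
    "verify":     "independent-mid-tier",
    "brainstorm": "large-creative",
}

def pick_model(task_type: str) -> str:
    """Choose the minimum capability that reliably does the job."""
    try:
        return ROUTING[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type}") from None
```

Failing loudly on an unknown task type is deliberate: a silent default would quietly route work to the wrong tier.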

The Verification Rule

The agent checking work must be a different model from the agent that produced it. You can't grade your own test.
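A minimal sketch of enforcing the rule at configuration time, before any work runs. Model names are placeholders.

```python
def check_verification_pair(producer_model: str, verifier_model: str) -> None:
    """Refuse a workflow where an agent would grade its own work."""
    if producer_model == verifier_model:
        raise ValueError("verifier must be a different model from the producer")

check_verification_pair("mid-tier-a", "mid-tier-b")  # ok: independent models
```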

Section 8

The Human Roles ⚠️ WORKING MODEL

NexusAOS is a human + AI framework. Humans aren't optional β€” they're integral.

πŸ‘€

Agent Owner

Maintains the Agent Brief and Intent Schema for assigned agents. Responsible when an agent misbehaves.

πŸ—οΈ

Workflow Architect

Designs the Workflow Genomes. Draws the maps. Defines the Spawn Points.

πŸ“‹

Operations Lead

Runs the Weekly Operations Loop. Reviews performance, resolves escalations.

πŸ›‘οΈ

Guardian Operator

Monitors the Guardian Loop. Receives alerts. Investigates behavioral drift.

In a small company: one person plays multiple roles. In a large one: these are dedicated functions.

Section 9

Getting Started ⚠️ WORKING MODEL

You don't implement all of this on day one. NexusAOS scales with your maturity.

Level 1 β€” First Agent

One Agent Brief. One workflow. One human watching. No Genome needed. Just: what does it do, what does it produce, who checks it.

Level 2 β€” First Workflow

Multiple agents in a sequence. Add Verification Gates. Add a basic Workflow Genome. Define authority levels. Weekly review.

Level 3 β€” First Team

Multiple workflows. Guardian Loop monitoring. Kaizen Protocol for improvements. Intent Schemas for any customer-facing agent.

Level 4 β€” Operating System

Full five-loop architecture. All vocabulary in use. Cost tracking. Agent Lineage. Drift detection.

Level 5 β€” Autonomous Operations

High-performing agents earn Autonomous authority. Human role shifts from operator to architect. The system improves itself through Kaizen.

Open Questions β€” For Collaboration

  • Should all agents need an Intent Schema, or only supervised/client-facing ones?
  • How frequently should Kaizen cycles run by default?
  • Should model selection be locked in the Genome or dynamic?
  • How do we detect intent drift quantitatively?