The Operating System
for Human + AI Teams
NexusAOS is a framework for businesses where humans and AI agents work together as one coordinated team. Not another chatbot. An operating system for intelligent work.
This framework is a collaborative draft, not a finished product. We welcome challenges, additions, and real-world stress tests.
It answers six questions: What AI agents do we need? What does each one do? How do they work together? How do humans stay in control? How do we know it's working? How do we get better over time?
Most businesses deploy AI and get disappointed. Not because the AI failed, but because it succeeded at the wrong thing.
Klarna's AI handled 2.3 million customer conversations and saved $60M. Then their CEO went on Bloomberg to explain why he was hiring humans back. The AI optimized for speed and cost, and destroyed customer trust in the process.
NexusAOS is the framework that prevents this.
The Universal Loop – WORKING MODEL
Everything in NexusAOS runs on one four-step cycle. Every loop, from the annual strategy cycle to a five-second task, follows the same pattern.
1. Notice what's happening
2. Figure out the right move
3. Take action
4. Remember what worked
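The four steps above can be sketched as a single function. Everything here (the `observe`/`decide`/`act` names, the toy agent) is illustrative, not NexusAOS API:

```python
# A minimal sketch of the Universal Loop. All names are illustrative.

def run_loop(agent, signal, memory):
    """One pass through the cycle: notice, decide, act, remember."""
    observation = agent["observe"](signal)           # 1. Notice what's happening
    decision = agent["decide"](observation, memory)  # 2. Figure out the right move
    result = agent["act"](decision)                  # 3. Take action
    memory.append((decision, result))                # 4. Remember what worked
    return result

# Toy agent: routes a number to "halve" or "double" based on its size.
toy_agent = {
    "observe": lambda x: "high" if x > 10 else "low",
    "decide": lambda obs, mem: "halve" if obs == "high" else "double",
    "act": lambda d: f"did:{d}",  # a real agent would perform the work here
}

memory = []
run_loop(toy_agent, 42, memory)  # memory now records what worked
```

The same shape serves a five-loop hierarchy: only the cadence and the contents of `memory` change.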
The Five Loops – Five Cadences, One Pattern
Nexus Loop – Annual
The company-wide vision: what are we building toward? Humans make this decision.
Strategy Loop – Quarterly
Which priorities matter this quarter? Which agents are assigned to what?
Operations Loop – Weekly
How's it going? What needs tuning?
Execution Loop – Continuous, 24/7
The agents actually doing the work, around the clock.
Guardian Loop – Always-On
The safety system. Watching everything, all the time.
The Agent Brief – WORKING MODEL
Every AI agent has a one-page spec. It's the job description and performance review for an AI employee.
Mission
What does this agent do? One clear purpose.
Inputs & Outputs
What does it receive? What does it produce?
Connections
Who does it hand off to? What's upstream?
Guardrails
What can it never do? Hard limits on behavior.
Authority Level
How much freedom does this agent have?
Success Metrics
How do we know it's performing well?
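The six fields above map directly onto a data structure. This is a hedged sketch: the field names mirror the brief, but the types and example values are assumptions:

```python
from dataclasses import dataclass, field

# The six Agent Brief fields as a data structure. Types and example
# values are assumptions for illustration.

@dataclass
class AgentBrief:
    mission: str                    # one clear purpose
    inputs: list[str]               # what it receives
    outputs: list[str]              # what it produces
    connections: list[str]          # upstream sources and handoffs
    guardrails: list[str]           # hard limits on behavior
    authority_level: int            # 1 = recommend-only ... 4 = autonomous
    success_metrics: dict[str, float] = field(default_factory=dict)

brief = AgentBrief(
    mission="Triage inbound support email",
    inputs=["raw email"],
    outputs=["category label", "priority"],
    connections=["hands off to: resolution agent"],
    guardrails=["never issue refunds", "never contact customers directly"],
    authority_level=1,  # every agent starts at level 1
)
```

Keeping the brief as structured data rather than prose makes it versionable and machine-checkable.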
Authority Levels – Agents Earn Trust Over Time
1. The agent can only recommend; a human must approve everything. Day one for every agent.
2. The agent acts, but flags anything unusual for human review.
3. The agent runs independently; a human checks outputs periodically.
4. The agent runs without oversight. Reserved for proven, low-stakes workflows only.
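The four levels reduce to a small approval gate. The level names below are descriptive labels invented for this sketch; only "Autonomous" is a term the framework itself uses:

```python
from enum import IntEnum

# The four authority levels as an approval gate. Level names are
# illustrative labels, not official NexusAOS terms.

class Authority(IntEnum):
    RECOMMEND_ONLY = 1   # human must approve everything (day one)
    ACT_AND_FLAG = 2     # acts, but flags anything unusual
    INDEPENDENT = 3      # runs alone; humans check outputs periodically
    AUTONOMOUS = 4       # no oversight; proven, low-stakes workflows only

def needs_human_approval(level: Authority, unusual: bool) -> bool:
    """Must a human sign off before this action ships?"""
    if level == Authority.RECOMMEND_ONLY:
        return True
    return level == Authority.ACT_AND_FLAG and unusual
```

Promoting an agent then becomes a one-line, auditable change to its Agent Brief.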
How Agents Stay Sharp – WORKING MODEL
Agents don't carry memories between jobs. They receive a briefing, do the work, file their notes, and reset. Every time.
Hydration
Before each task, the agent receives a "briefing packet": all the context it needs to do the job well.
Dehydration
When finished, the agent files its work to storage. Then it resets completely. Clean slate.
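The hydrate, work, dehydrate cycle can be sketched in a few lines. `storage` stands in for whatever persistence layer you use; the briefing keys are assumptions:

```python
# Sketch of the stateless agent lifecycle: hydrate -> work -> dehydrate.

def run_task(task, storage, do_work):
    # Hydration: assemble the briefing packet from persistent storage.
    briefing = {
        "task": task,
        "history": list(storage.get("history", [])),
        "policies": list(storage.get("policies", [])),
    }
    # The agent works from the briefing alone, with no hidden state.
    output = do_work(briefing)
    # Dehydration: file the work, then reset. The agent keeps nothing.
    storage.setdefault("history", []).append({"task": task, "output": output})
    return output

storage = {"policies": ["be concise"]}
run_task("summarize Q3", storage, lambda b: f"summary of {b['task']}")
run_task("summarize Q4", storage, lambda b: f"summary of {b['task']}")
# storage["history"] now holds both filed outputs; the agent retained nothing.
```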
Why This Matters
- If an agent breaks, you replace it without losing any work
- Every output is documented and traceable
- Nothing exists only in someone's head
- Agents are scalable, replaceable, and debuggable
The Lean Execution Layer – WORKING MODEL
Toyota spent 40 years solving the same problem we face: getting consistent, high-quality output from complex multi-step processes where small errors compound. We applied their principles to AI agents.
Jidoka – Never Pass a Defect Downstream
Toyota: any worker can stop the entire production line the moment a defect is found. In AI: every workflow step has a Verification Gate. If an agent produces bad output, the workflow stops. The error is logged. A human is alerted.
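A Verification Gate is a few lines of control flow. `produce` and `check` are placeholders for a real agent step and its acceptance test:

```python
# Minimal sketch of a Verification Gate between workflow steps.

class GateFailure(Exception):
    """Raised to stop the line the moment a defect is found."""

def gated_step(name, produce, check, log, alert_human):
    output = produce()
    if not check(output):
        log.append(f"defect at step '{name}'")  # the error is logged
        alert_human(name, output)               # a human is alerted
        raise GateFailure(name)                 # the workflow stops here
    return output                               # only verified output moves on

log, alerts = [], []
try:
    gated_step("draft", lambda: "", lambda out: len(out) > 0,
               log, lambda step, out: alerts.append(step))
except GateFailure:
    pass  # downstream steps never see the defective output
```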
Kanban – Pull, Don't Push
Toyota: workers pull work when they have capacity. In AI: agents pull tasks when ready. You set a maximum number of agents working at once (WIP limit). No spawning 50 agents when 5 will do.
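One way to realize a WIP limit, sketched here as an assumption rather than a prescribed mechanism, is a bounded worker pool: idle workers pull the next task, and no more than `wip_limit` agents run at once however long the queue gets:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: a WIP limit as a bounded worker pool.

def run_with_wip_limit(tasks, wip_limit, run_agent):
    with ThreadPoolExecutor(max_workers=wip_limit) as pool:
        return list(pool.map(run_agent, tasks))  # results keep task order

# 10 queued tasks, but never more than 3 agents working at once.
results = run_with_wip_limit(range(10), wip_limit=3, run_agent=lambda t: t * 2)
```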
Kaizen – Earn the Right to Change
Continuous improvement through small, tested changes. You never modify a running process to "try something." Every prompt change, model swap, or workflow modification goes through a controlled test cycle.
Value Stream Mapping – Map Before You Build
Before writing a single line of agent code, draw out every step of the workflow. For each step: what's the input? Output? Where could it go wrong? Is this step necessary? Only then, build.
Muda – Eliminate Waste
Toyota identified seven categories of waste. All seven translate directly to AI:
- Overproduction: an agent generates a 10-page report when one paragraph was needed
- Waiting: an agent blocked 20 minutes for human approval on a low-stakes decision
- Overprocessing: using an expensive model for a task a smaller model handles fine
- Defects: one hallucination at Step 2 compounds through Steps 3–10
- Inventory: 500 tasks queued for 2 agents
- Transportation: passing entire conversation histories when only the last 3 messages matter
- Motion: a workflow with 12 agent hops when 6 would do
The New Vocabulary – WORKING MODEL
Terms NexusAOS invented because the concepts are genuinely new.
Context Budget
Every AI agent has a limit on how much information it can hold in mind at once. A Context Budget is the plan for how to use that space wisely.
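A Context Budget might look like the sketch below. The category names, token counts, and the 4-characters-per-token estimate are all assumptions for illustration:

```python
# Sketch of a Context Budget: a fixed window split deliberately
# across context categories (all values are illustrative).

BUDGET = {                   # tokens reserved per category
    "instructions": 1_000,
    "task_input": 4_000,
    "history": 2_000,
    "reference_docs": 8_000,
}

def rough_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer

def fit_to_budget(category: str, chunks: list[str]) -> list[str]:
    """Keep the newest chunks that fit inside the category's budget."""
    kept, used = [], 0
    for chunk in reversed(chunks):        # walk newest-first
        cost = rough_tokens(chunk)
        if used + cost > BUDGET[category]:
            break
        kept.append(chunk)
        used += cost
    return list(reversed(kept))           # restore chronological order
```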
Agent Lineage
A record of which agent did what, with which model, at which step, all the way through a workflow.
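In its simplest form, lineage is an append-only log with one record per step. The field names and storage URIs below are illustrative:

```python
# Sketch of an Agent Lineage: one append-only record per workflow step.

lineage: list[dict] = []

def record_step(workflow_id: str, step: int, agent: str, model: str,
                output_ref: str) -> None:
    lineage.append({
        "workflow": workflow_id,
        "step": step,
        "agent": agent,
        "model": model,
        "output": output_ref,  # pointer to the filed output, not the blob
    })

record_step("wf-42", 1, "triage-agent", "small-fast-v1", "store://wf-42/triage")
record_step("wf-42", 2, "draft-agent", "mid-tier-v2", "store://wf-42/draft")
```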
Workflow Genome
The complete blueprint for how a workflow runs: every step, every rule, every model, every gate, versioned, stored, auditable.
Spawn Point
A planned moment in a workflow where one agent creates specialized sub-agents to run parallel work.
Memory Horizon
How far back into the workflow's history an agent can "see" when it starts a task.
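A Memory Horizon can be as simple as a window over the workflow's history. The sketch counts the horizon in steps; time- or token-based horizons work the same way:

```python
# Sketch: a Memory Horizon as a window over workflow history.

def visible_history(history: list[dict], horizon: int) -> list[dict]:
    """Return only the events this agent is allowed to 'see'."""
    return history[-horizon:] if horizon > 0 else []

history = [{"step": i} for i in range(10)]
recent = visible_history(history, horizon=3)  # the last three steps only
```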
The Three Engineering Disciplines – WORKING MODEL
Most AI failures aren't technical. They're conceptual. There are three things you must get right. Most organizations get only the first one.
1. Prompt Engineering
What the AI Does
You write instructions telling the AI what task to perform. This is where everyone starts.
The limit: You can't write instructions for every situation. When an edge case appears, the AI does its best β which may be wrong.
2. Context Engineering
What the AI Knows
You design the information environment the AI works in β data, history, knowledge it can access.
The limit: An AI with perfect instructions but no context is like a brilliant consultant who has to relearn your business from scratch every meeting.
3. Intent Engineering
What the AI Wants
You encode the organization's goals, values, and decision rules directly into the system. When the AI hits a situation no instruction covers, it knows what matters most.
This is what 95% of companies miss.
The Klarna Lesson
Klarna told its AI what to do (handle inquiries). Told it what to know (customer data, policies). Never told it what to want. So it optimized for speed and cost, because those were measurable. Trust, empathy, relationship quality weren't. The AI destroyed them.
MIT: 95% of enterprise AI deployments fail to deliver measurable impact. Not because AI doesn't work. Because it succeeds at the wrong thing.
The Intent Schema
For every agent that interacts with customers, makes decisions, or handles anything that matters, an Intent Schema is required.
Intent Schema Fields
- Primary Objective: What are we actually trying to achieve? Not the task β the purpose of the task.
- Priority Hierarchy: When goals conflict, which one wins? Always explicit; never let the AI guess.
- Protected Outcomes: What can this agent never compromise, even for efficiency?
- Unmeasurables: What matters but can't be tracked by a computer? Route these to humans.
- Decision Boundaries: What is this agent explicitly NOT allowed to do?
- Escalation Trigger: When must this agent call a human, even if it's not sure why?
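The six fields above translate directly into a data structure, with a `resolve()` helper showing how a priority hierarchy settles a goal conflict. The field types, helper, and example values are assumptions, not prescribed by NexusAOS:

```python
from dataclasses import dataclass, field

# The six Intent Schema fields as a data structure (types are assumptions).

@dataclass
class IntentSchema:
    primary_objective: str
    priority_hierarchy: list[str]       # first entry wins a conflict
    protected_outcomes: list[str]       # never compromised, even for efficiency
    unmeasurables: list[str]            # matters but untrackable; route to humans
    decision_boundaries: list[str]      # explicit "never do" rules
    escalation_triggers: list[str] = field(default_factory=list)

    def resolve(self, goal_a: str, goal_b: str) -> str:
        """When two goals conflict, the higher-ranked goal wins."""
        order = self.priority_hierarchy
        return goal_a if order.index(goal_a) < order.index(goal_b) else goal_b

schema = IntentSchema(
    primary_objective="Resolve issues so customers would recommend us",
    priority_hierarchy=["trust", "accuracy", "efficiency"],
    protected_outcomes=["customer dignity"],
    unmeasurables=["relationship quality"],
    decision_boundaries=["never promise unapproved refunds"],
    escalation_triggers=["customer expresses distress"],
)
schema.resolve("efficiency", "trust")  # the hierarchy, not the model, decides
```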
What Klarna's Intent Schema Should Have Been
- Primary Objective: Long-term customer retention through trust
- Priority Hierarchy: 1. Relationship quality, 2. Accurate resolution, 3. Efficiency
- Protected Outcomes: Customer dignity, empathy in every interaction
- Unmeasurables: Relationship quality; always route to a human when ambiguous
- Decision Boundaries: Never sacrifice empathy for speed, even when scripts permit it
The AI would have made completely different decisions.
Intent Drift Detection
The Guardian Loop watches not just for system failures but for behavioral drift. Is an agent making decisions that technically comply with instructions but violate the spirit of its Intent Schema? This is flagged, reviewed by humans, and corrected.
Model Selection – WORKING MODEL
The question isn't "what's the best AI?" It's "what's the minimum capability that reliably produces the right output for this step?"
| Task Type | Example | Right Model |
|---|---|---|
| Classify / Extract | "Is this email a complaint?" | Small / Fast (cheap) |
| Draft / Generate | "Write a first-pass proposal" | Mid-tier |
| Reason / Synthesize | "Analyze these 50 contracts" | Large reasoning model |
| Verify / Check | "Is this output correct?" | Independent mid-tier |
| Brainstorm / Create | "What are 20 new approaches?" | Large, creative |
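The table reduces to a routing function. The tier names below are placeholders for whatever models you actually run, not NexusAOS requirements:

```python
# The model-selection table as a routing function (tier names are placeholders).

ROUTES = {
    "classify": "small-fast",
    "extract": "small-fast",
    "draft": "mid-tier",
    "generate": "mid-tier",
    "reason": "large-reasoning",
    "synthesize": "large-reasoning",
    "verify": "mid-tier-independent",
    "brainstorm": "large-creative",
}

def pick_model(task_type: str) -> str:
    """Minimum capability that reliably produces the right output."""
    return ROUTES[task_type]  # raises KeyError for unmapped task types
```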
The Verification Rule
The agent checking work must be a different model from the agent that produced it. You can't grade your own test.
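The rule is a one-line check at assignment time. Model names here are placeholders:

```python
# Sketch of the Verification Rule: never let a model grade its own work.

def assign_verifier(producer_model: str, candidates: list[str]) -> str:
    """Pick the first checker that is not the producing model."""
    for model in candidates:
        if model != producer_model:
            return model
    raise ValueError("no independent verifier available")

verifier = assign_verifier("mid-tier-a", ["mid-tier-a", "mid-tier-b"])
```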
The Human Roles – WORKING MODEL
NexusAOS is a human + AI framework. Humans aren't optional; they're integral.
Agent Owner
Maintains the Agent Brief and Intent Schema for assigned agents. Responsible when an agent misbehaves.
Workflow Architect
Designs the Workflow Genomes. Draws the maps. Defines the Spawn Points.
Operations Lead
Runs the Weekly Operations Loop. Reviews performance, resolves escalations.
Guardian Operator
Monitors the Guardian Loop. Receives alerts. Investigates behavioral drift.
In a small company: one person plays multiple roles. In a large one: these are dedicated functions.
Getting Started – WORKING MODEL
You don't implement all of this on day one. NexusAOS scales with your maturity.
Level 1 – First Agent
One Agent Brief. One workflow. One human watching. No Genome needed. Just: what does it do, what does it produce, who checks it.
Level 2 – First Workflow
Multiple agents in a sequence. Add Verification Gates. Add a basic Workflow Genome. Define authority levels. Weekly review.
Level 3 – First Team
Multiple workflows. Guardian Loop monitoring. Kaizen Protocol for improvements. Intent Schemas for any customer-facing agent.
Level 4 – Operating System
Full five-loop architecture. All vocabulary in use. Cost tracking. Agent Lineage. Drift detection.
Level 5 – Autonomous Operations
High-performing agents earn Autonomous authority. Human role shifts from operator to architect. The system improves itself through Kaizen.
Open Questions – For Collaboration
- Should all agents need an Intent Schema, or only supervised/client-facing ones?
- How frequently should Kaizen cycles run by default?
- Should model selection be locked in the Genome or dynamic?
- How do we detect intent drift quantitatively?