Cortex-Agent-Framework

Getting Started

A step-by-step guide to building, configuring, and deploying AI agents with Cortex.

What is Cortex?

Cortex is a Python library that gives your application an AI agent capable of decomposing complex requests into parallel tasks, calling external tools, and synthesising results — all driven by a single cortex.yaml configuration file.

You don’t run Cortex on its own. You wrap it in your application — a web API, a CLI tool, a background worker, or an MCP server — and call its run_session() method whenever you need an AI-powered response.

Your Application
  └── CortexFramework("cortex.yaml")
        ├── Decomposes request into task graph
        ├── Fans out tasks in parallel to MCP tool servers
        ├── Synthesises results
        ├── Validates response quality
        └── Streams events back to your app via event_queue

Quick Start (5 minutes)

1. Install

Cortex is a Python package and requires Python 3.11 or newer. Install it into a virtual environment so its dependencies stay isolated from your system Python.

Create and activate a virtual environment:

python3 -m venv .venv

# Activate it — run this in every new shell you work in:
source .venv/bin/activate          # macOS / Linux
# .venv\Scripts\activate           # Windows (PowerShell / cmd)

Your prompt now shows (.venv). To leave the environment later, run deactivate.

Install Cortex into the activated environment:

# From PyPI
pip install cortex-agent-framework

# Or from source
git clone <repo-url>
cd cortex-agent-framework
pip install -e .

Verify the install:

cortex --help     # lists: setup, dev, dry-run, publish, spec, replay, delta,
                  # migrate, ants, config-ui, run, chat, sessions, blueprints,
                  # mcps, stats, config, providers, storage

2. Hello World (no external tools)

Before wiring up MCP tool servers, you can run a fully working agent using only the LLM. Create cortex.yaml:

agent:
  name: HelloAgent
  description: A minimal Cortex agent that only uses LLM synthesis

llm_access:
  default:
    provider: anthropic
    model: claude-sonnet-4-5
    api_key_env_var: ANTHROPIC_API_KEY
    max_tokens: 2048

task_types:
  - name: answer
    description: Answer a user question directly
    output_format: md
    capability_hint: llm_synthesis

storage:
  base_path: ./cortex_storage

Then:

export ANTHROPIC_API_KEY=sk-ant-...
cortex dry-run "Explain gradient descent in two sentences"
cortex dev

No MCP servers, no Docker, no extra setup — this is the smallest config that runs. Add tool_servers and more task_types once this works.

Tip: web_search is a built-in capability — it works out of the box via DuckDuckGo with no API key. Add a web_search task type and Cortex will search the web for live information automatically. You can upgrade to Brave Search or any MCP web-search server later by adding a tool_servers entry with the same capability.

3. Run the Setup Wizard

cortex setup

This opens an interactive browser-based wizard at http://localhost:7799 that walks you through:

Step	What you configure
Agent Identity	Name, description, interaction mode (interactive / rpc)
LLM Provider	Model, API key env var — cloud (Anthropic, OpenAI, Gemini, Grok, Mistral, DeepSeek, Bedrock, Azure) or local runtime (Ollama / LM Studio / vLLM, with a Gemma 4 quickstart)
Tool Servers	MCP integrations for external capabilities
Task Types	What your agent can do (web search, code execution, document generation, etc.)
Storage & Persistence	Backend (Memory / SQLite / Redis), retention, encryption
Adaptive Behaviour	Capability Scout, wave-gate validation, learning engine, blueprints
Runtime & Delivery	Session timeouts, concurrency limits, chat UI settings, Ant Colony
Publish Mode	How you want to deploy (Docker, Python package, MCP server, Chat UI)

The wizard saves a validated cortex.yaml in your project root.

Re-running the wizard: If cortex.yaml already exists, the wizard loads your current settings. Fields that could break existing data (agent name, storage backend with data) are locked. Everything else is editable.

4. Validate

cortex dry-run "Summarise the latest news on quantum computing"

This validates your config and compiles the task graph without making any LLM calls. Use it to catch config errors before spending API credits.

Expected output on success:

✓ Config loaded: cortex.yaml
✓ LLM provider reachable: anthropic / claude-sonnet-4-5
✓ Task graph compiled: 2 tasks, max depth 2
  ├─ web_research        (capability: web_search)
  └─ analysis            (depends on: web_research)
✓ No cycles detected
✓ Dry run complete — 0 LLM calls made

If any tool server is unreachable or a depends_on points at a missing task, dry-run fails here instead of mid-session.
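The two graph checks named above (a depends_on that points at a missing task, and cycle detection) can be sketched with the standard library alone. This is an illustration, not Cortex's validator: check_task_graph is a hypothetical helper, and the list-of-dicts input is assumed to mirror the task_types entries in cortex.yaml.

```python
from graphlib import CycleError, TopologicalSorter


def check_task_graph(task_types: list[dict]) -> list[str]:
    """Return human-readable problems found in a task_types-style list.

    Hypothetical helper for illustration; not part of the Cortex API.
    """
    names = {t["name"] for t in task_types}
    problems = []

    # 1. Every depends_on entry must name a declared task.
    for t in task_types:
        for dep in t.get("depends_on", []):
            if dep not in names:
                problems.append(f"{t['name']}: depends on missing task '{dep}'")

    # 2. The graph must be acyclic (only edges between declared tasks count).
    graph = {
        t["name"]: [d for d in t.get("depends_on", []) if d in names]
        for t in task_types
    }
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as exc:
        # The second element of args holds the nodes that form the cycle.
        problems.append("cycle detected: " + " -> ".join(exc.args[1]))

    return problems
```

An empty list means the graph passed both checks; anything else is worth fixing before a real session spends API credits.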

5. Run in Dev Mode

cortex dev --watch

Starts Cortex with hot-reload — edit cortex.yaml and changes apply instantly without restarting. You’ll see:

[cortex] Initialising framework from cortex.yaml
[cortex] LLM: anthropic claude-sonnet-4-5
[cortex] Tool servers: brave_search (sse), filesystem (stdio)
[cortex] Watching cortex.yaml for changes...
[cortex] Ready. Send requests via framework.run_session() or the HTTP adapter.

How to Use Cortex in Your Application

The core integration is always the same three calls (construct, initialise, run a session), made from inside an async function:

import asyncio

from cortex.framework import CortexFramework

framework = CortexFramework("cortex.yaml")
await framework.initialize()

result = await framework.run_session(
    user_id="user_123",
    request="Analyse Q3 revenue trends",
    event_queue=asyncio.Queue(),   # receives streaming events
)
print(result.response)

What changes is what wraps those calls. Below are the common ways developers use Cortex, each with a worked example.

Code-First Agents — `CortexBuilder`

A cortex.yaml file is optional. You can build the entire agent in Python with CortexBuilder and pass the result straight to the framework. This is the path for developers who prefer code to config, want their agent definition in the same repo as the rest of their app, or want code nodes — plain Python functions wired in as graph nodes, LangGraph-style.

The builder

from cortex import CortexBuilder, CortexFramework

agent = CortexBuilder("ResearchAgent", "Searches the web and writes reports")
agent.llm("anthropic", model="claude-sonnet-4-5", api_key_env="ANTHROPIC_API_KEY")
agent.storage(base_path="./cortex_storage")
agent.tool_server("brave", url="http://localhost:9000/sse",
                  capability_hints=["web_search"])

# An LLM-routed task type — same as a `task_types:` entry in cortex.yaml.
agent.task("web_research", capability="web_search", output="md")
agent.task("write_report", capability="document_generation",
           depends_on=["web_research"])

framework = CortexFramework(config=agent.build())   # no YAML file
await framework.initialize()
result = await framework.run_session("user_1", "Research vector DB benchmarks")

Every builder method returns self, so calls chain. .build() returns a validated CortexConfig; CortexFramework(config=...) consumes it directly.

Method	Purpose
`.llm(provider, model=, api_key_env=)`	Set the `default` LLM provider (required)
`.provider(key, provider, ...)`	Register a named provider for per-task routing
`.storage(base_path=, ...)`	Storage location and options
`.tool_server(name, url= / command=, ...)`	Register an MCP tool server (SSE or stdio)
`.task(name, capability=, depends_on=, ...)`	Add an LLM-routed task type
`.node(...)`	Register a Python code node (decorator — see below)
`.validation(threshold=, enabled=)`	Tune the Validation Agent
`.execution_mode("planned" / "static")`	Force the execution mode
`.configure(**sections)`	Escape hatch — merge any raw config section
`.build()`	Validate and return the `CortexConfig`

Code nodes — `@agent.node`

A code node is a Python function used as a graph node. Decorate it with @agent.node and it joins the DAG:

agent = CortexBuilder("Pipeline", "fetch → summarise → publish")
agent.llm("anthropic", model="claude-sonnet-4-5", api_key_env="ANTHROPIC_API_KEY")

@agent.node()
async def fetch(ctx):
    return await ctx.call_tool("brave", "search", query=ctx.request)

@agent.node(depends_on=["fetch"])
async def summarise(ctx):
    return await ctx.llm(f"Summarise concisely:\n{ctx.deps['fetch']}")

@agent.node(depends_on=["summarise"])
def publish(ctx):                      # sync functions work too
    return f"PUBLISHED: {ctx.deps['summarise']}"

framework = CortexFramework(config=agent.build())
await framework.initialize()
result = await framework.run_session("user_1", "latest on RAG benchmarks")

print(result.response)                  # synthesised answer
print(result.node_outputs["summarise"]) # raw output of one node

The decorator works bare (@agent.node) or parameterised (@agent.node(depends_on=[...], name=..., output=..., timeout=...)). The function name becomes the node name unless you pass name=.

The `TaskContext`

Each node receives one argument — a TaskContext (ctx) — wiring it into the running session:

Attribute / method	What it gives you
`ctx.request`	The original user request for the session
`ctx.deps`	`dict` of upstream node outputs, keyed by node name
`await ctx.llm(prompt, system=, max_tokens=, provider=)`	One-shot LLM completion via the agent’s providers
`await ctx.call_tool(server, tool, **params)`	Invoke an MCP tool on a configured tool server
`ctx.task_name`, `ctx.session_id`, `ctx.user_id`	Identity / bookkeeping
`ctx.instruction`, `ctx.input_refs`, `ctx.context_hints`	Lower-level task metadata

A node returns its output as a str, a (str, format) tuple, a dict/list (serialised to JSON), or None.
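The return-value rule above can be read as a small normalisation step. The sketch below is an assumption-laden illustration, not Cortex's code: normalise_output is a hypothetical name, the "text" default format and the empty string for None are guesses made only so the example is complete.

```python
import json


def normalise_output(value, default_format: str = "text") -> tuple[str, str]:
    """Map a node's return value to a (content, format) pair.

    Hypothetical helper; the default format and the handling of None
    are assumptions, not documented Cortex behaviour.
    """
    if value is None:
        return "", default_format
    if isinstance(value, str):
        return value, default_format
    if (
        isinstance(value, tuple)
        and len(value) == 2
        and all(isinstance(part, str) for part in value)
    ):
        return value  # already a (str, format) pair
    if isinstance(value, (dict, list)):
        return json.dumps(value), "json"  # structured data is serialised to JSON
    raise TypeError(f"unsupported node output type: {type(value).__name__}")
```

For example, a node that returns ("# Title", "md") keeps its explicit format, while a node that returns {"a": 1} is serialised to a JSON string.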

Static vs. planned execution

Registering any code node flips the agent to static execution (execution_mode="static"):

planned (default) — the decomposition LLM generates the task graph at runtime from your task_types. Flexible; one LLM call to plan.
static — the graph you declared is the plan. It runs verbatim in dependency order with no decomposition, intent-gate, or capability-scout LLM calls. Deterministic and cheaper per run.

Either way you keep the rest of Cortex: parallel fan-out/fan-in waves, the wave validation gate, retries, streaming events, session persistence, and the final synthesis + validation. Static mode just skips the planner.
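To make "waves" concrete, here is a minimal standard-library sketch of fan-out/fan-in scheduling over a depends_on graph. It is not Cortex's scheduler: plan_waves, run_waves, and the run_task callback are hypothetical names, and the real engine also applies the validation gate and retries mentioned above.

```python
import asyncio
from graphlib import TopologicalSorter


def plan_waves(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all its deps done."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


async def run_waves(depends_on, run_task):
    """Run each wave concurrently (fan-out), then move on (fan-in).

    run_task(name, outputs) is a caller-supplied coroutine function;
    outputs holds the results of all earlier waves, keyed by task name.
    """
    outputs = {}
    for wave in plan_waves(depends_on):
        results = await asyncio.gather(*(run_task(name, outputs) for name in wave))
        outputs.update(zip(wave, results))
    return outputs
```

With the graph {"research": [], "review_code": [], "write_report": ["research", "review_code"]}, plan_waves yields two waves: the two independent tasks first, then write_report.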

You can also run a static DAG built only from .task() (LLM-routed) nodes by calling .execution_mode("static") explicitly. .task() and .node() can be mixed freely in one agent.

When to use which

Use…	When
`cortex.yaml`	Config should be diffable / reviewed separately; non-developers tune the agent; you want the wizard and CLI
`CortexBuilder` + `.task()`	You prefer code, but still want the LLM to plan the graph per request
`CortexBuilder` + `.node()`	You want deterministic control of the graph and to run real Python at each step (LangGraph-style)

Caller Identity: `user_id` and `principal`

Every call to run_session() carries an identity — who initiated the request. Cortex uses this identity for storage isolation, audit logs, session ownership, and per-user concurrency caps.

There are two parameters:

Parameter	Required	Purpose
`user_id`	Yes	Storage namespace key. Sessions, history, and per-user data are isolated under this id.
`principal`	No	Rich identity object (`Principal`) that captures type (human/system/agent) and the delegation chain when one agent calls another. Auto-built from `user_id` if omitted.

For most applications you only ever pass user_id — the framework constructs a human-user Principal for you. The principal= parameter exists for two cases: system/autonomous agents and agent-to-agent delegation.

Case 1: Human user (most common)

await framework.run_session(
    user_id="user_123",
    request="Analyse Q3 revenue trends",
    event_queue=asyncio.Queue(),
)

Cortex internally builds Principal.from_user_id("user_123"). No code changes needed for existing apps.

Case 2: System / autonomous agent

For background jobs, schedulers, cron triggers, or any non-human initiator:

from cortex.identity import Principal

scheduler = Principal.system("nightly-report")   # → principal_id = "system:nightly-report"

await framework.run_session(
    user_id="system:nightly-report",   # storage key — matches principal_id
    request="Generate the nightly KPI report",
    event_queue=asyncio.Queue(),
    principal=scheduler,
)

System principal ids must follow "system:<name>" (alphanumeric, dash, underscore). Audit logs will record principal_type=system so you can distinguish autonomous runs from human-driven ones.

Case 3: Agent-to-agent delegation

When your agent calls Cortex on behalf of an end user — e.g. a sales bot that uses Cortex to research a prospect — pass an agent principal that records the original user:

from cortex.identity import Principal

user = Principal.from_user_id("user_123")
bot  = Principal.agent("sales-bot", delegated_by=user)

await framework.run_session(
    user_id="user_123",        # origin user — storage stays under their namespace
    request="Research competitor pricing for ACME Corp",
    event_queue=asyncio.Queue(),
    principal=bot,             # carries the chain ["user_123"] → "agent:sales-bot"
)

Delegation chains compose — if sales-bot then delegates to a pricing-engine agent, build a third principal with delegated_by=bot and the chain becomes ["user_123", "agent:sales-bot"]. Audit logs preserve the full provenance.

Key rules

user_id should always be the storage namespace. For delegated calls, that’s the originating user — never the agent’s id. This keeps all hops in a delegation chain in one storage space, which means session history, blueprints, and learned task types are attributed to the human who started the work.
principal_id for system/agent principals must match <type>:<name>. Cortex enforces this at construction time.
You can resume a delegated session by passing resume_session_id= along with the same user_id — ownership is checked against the origin user, not the agent.
Principal is immutable. Build a new one for each delegation hop rather than mutating an existing one.
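The rules above (ids of the form <type>:<name>, validation at construction time, immutability, one new object per hop) can be illustrated with a frozen dataclass. SketchPrincipal below is a stand-in written for this guide, not cortex.identity.Principal; its field names, and the assumption that agent names follow the same character rule as system names, are hypothetical.

```python
import re
from dataclasses import dataclass

_NAME = r"[A-Za-z0-9_-]+"  # alphanumeric, dash, underscore


@dataclass(frozen=True)
class SketchPrincipal:
    """Illustrative stand-in, NOT cortex.identity.Principal."""

    principal_id: str
    principal_type: str = "human"   # "human" | "system" | "agent"
    chain: tuple[str, ...] = ()     # ids of every earlier hop, oldest first

    def __post_init__(self):
        # Non-human ids must look like "<type>:<name>"; checked at construction.
        if self.principal_type != "human" and not re.fullmatch(
            rf"{self.principal_type}:{_NAME}", self.principal_id
        ):
            raise ValueError(f"bad principal id: {self.principal_id!r}")

    @classmethod
    def from_user_id(cls, user_id: str) -> "SketchPrincipal":
        return cls(user_id)

    @classmethod
    def system(cls, name: str) -> "SketchPrincipal":
        return cls(f"system:{name}", "system")

    @classmethod
    def agent(cls, name: str, delegated_by: "SketchPrincipal") -> "SketchPrincipal":
        # Each hop builds a new object whose chain extends the delegator's chain.
        return cls(
            f"agent:{name}",
            "agent",
            delegated_by.chain + (delegated_by.principal_id,),
        )
```

Chaining user to sales-bot to pricing-engine with this sketch gives the third principal the chain ("user_123", "agent:sales-bot"), matching the delegation example above.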

Usage 1: Conversational Chat UI

Best for: Customer-facing apps, internal tools, support agents, dashboards with AI chat.

User ──► Browser ──► Your API (FastAPI) ──► CortexFramework
              ◄── SSE stream ◄─────────────── event_queue

Step 1 — Build the API layer:

# app.py
import asyncio
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from cortex.framework import CortexFramework
from cortex.streaming.status_events import (
    EventType, StatusEvent, ResultEvent, ClarificationEvent,
)

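app = FastAPI()
framework = CortexFramework("cortex.yaml")

# Strong references to in-flight session tasks. The event loop keeps only weak
# references to tasks, so an unreferenced task can be garbage-collected mid-run.
session_tasks: set[asyncio.Task] = set()

@app.on_event("startup")
async def startup():
    await framework.initialize()

@app.on_event("shutdown")
async def shutdown():
    await framework.shutdown()

@app.post("/chat")
async def chat(body: dict):
    queue = asyncio.Queue()
    task = asyncio.create_task(
        framework.run_session(
            user_id=body["user_id"],
            request=body["message"],
            event_queue=queue,
        )
    )
    session_tasks.add(task)
    task.add_done_callback(session_tasks.discard)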

    async def stream():
        while True:
            event = await queue.get()
            payload = {
                "type": event.event_type.value,
                "session_id": event.session_id,
            }
            if isinstance(event, ResultEvent):
                payload["content"] = event.content
                payload["partial"] = event.partial
            elif isinstance(event, StatusEvent):
                payload["message"] = event.message
            elif isinstance(event, ClarificationEvent):
                payload["question"] = event.question
                payload["clarification_id"] = event.clarification_id
                payload["options"] = event.options
            yield f"data: {json.dumps(payload)}\n\n"
            if event.event_type in (EventType.SESSION_END, EventType.ERROR):
                break

    return StreamingResponse(stream(), media_type="text/event-stream")

# Handle clarification answers (intent-gate or HITL task prompts)
@app.post("/clarify")
async def clarify(body: dict):
    resolved = framework.resolve_task_clarification(
        body["clarification_id"], body["answer"]
    )
    return {"resolved": resolved}

# List a user's timed-out sessions that can be resumed
@app.get("/sessions/{user_id}/resumable")
async def resumable(user_id: str):
    return await framework.get_resumable_sessions(user_id)
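One thing to watch in the /chat endpoint above: if the session coroutine raises before it emits a terminal event, a loop that only waits on queue.get() never finishes. A possible guard, sketched with asyncio alone, is a done-callback that pushes a sentinel when the producer exits. The names here are hypothetical, and events are plain dicts with a "type" key purely to keep the sketch self-contained; make_session stands in for a call to framework.run_session.

```python
import asyncio

_DONE = object()                      # sentinel: the producer task has exited
TERMINAL = {"session_end", "error"}   # event types that end the stream


async def stream_events(make_session):
    """Yield events until a terminal event arrives or the producer exits.

    make_session(queue) must return the coroutine that fills the queue.
    """
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(make_session(queue))

    def _on_done(t: asyncio.Task) -> None:
        # Surface a crash as an error event, then always unblock the reader.
        if not t.cancelled() and t.exception() is not None:
            queue.put_nowait({"type": "error", "message": str(t.exception())})
        queue.put_nowait(_DONE)

    task.add_done_callback(_on_done)
    try:
        while True:
            event = await queue.get()
            if event is _DONE:
                return
            yield event
            if event["type"] in TERMINAL:
                return
    finally:
        task.cancel()  # no-op if the producer has already finished
```

Whether to cancel the session when the reader stops (as the finally block does here) is a design choice; an app that wants sessions to keep running after a client disconnects would drop that line and track the task elsewhere.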

Step 2 — Build a frontend that:

Sends messages to POST /chat
Reads the SSE stream for real-time events
Handles CLARIFICATION events (show follow-up questions, post answers to /clarify)
Renders RESULT events as the agent’s response

Event lifecycle in the UI:

Event	What the UI does
`session_start`	Show “thinking…” indicator
`intent_classified`	Show chat vs task routing decision
`task_blueprint`	Render the full DAG (waves, dependencies) before execution
`task_start`	Show progress (“Searching web…”, “Analysing data…”)
`task_tool_call`	Show which MCP or built-in tool is being invoked
`task_complete`	Update progress bar
`workspace_event`	Show file read/write/execution in workspace
`status`	Display status messages
`clarification`	Render a follow-up question with options
`result` (partial)	Stream text into the chat bubble
`result` (final)	Display complete response
`file_output`	Show download link for agent-produced file
`session_token_usage`	Display cumulative token counters
`error`	Show error state
`session_end`	Re-enable input
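On the client side, the stream produced by the /chat endpoint is a sequence of single-line "data: {json}" records separated by blank lines. A minimal parser for that shape might look like the sketch below. It assumes each event fits on one data line (true for the json.dumps output above) and that the payload's "type" field carries the event names from the table; parse_sse is a hypothetical helper, not part of Cortex.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield JSON payloads from 'data:' lines and stop after a terminal event."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = json.loads(line[len("data:"):].strip())
        yield payload
        if payload.get("type") in ("session_end", "error"):
            return  # nothing after a terminal event belongs to this session
```

A UI loop would iterate over parse_sse(response_lines) and branch on payload["type"], for instance appending result payloads to the chat bubble and re-enabling input once the generator finishes.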

Usage 2: MCP Server (Agent-to-Agent)

Best for: Agent composition, building specialised sub-agents, multi-agent architectures.

Parent Agent ──► MCP protocol ──► Your Agent (as tool server)
                                     └── CortexFramework

This is the most powerful pattern. Your agent becomes a tool that other agents can call.

Step 1 — Build a specialised agent with its own cortex.yaml:

# research-agent/cortex.yaml
agent:
  name: ResearchAgent
  description: Searches the web and summarises findings

llm_access:
  default:
    provider: anthropic
    model: claude-sonnet-4-6
    api_key_env_var: ANTHROPIC_API_KEY
    max_tokens: 4096

# Optional: add a tool server for richer web search.
# Without this, Cortex falls back to built-in DuckDuckGo automatically.
# tool_servers:
#   brave_search:
#     transport: stdio
#     command: npx
#     args: ["-y", "@modelcontextprotocol/server-brave-search"]
#     env:
#       BRAVE_API_KEY: ${BRAVE_API_KEY}

task_types:
  - name: web_research
    description: Search the web for current information
    output_format: md
    capability_hint: web_search   # uses built-in DuckDuckGo if no tool server configured

storage:
  base_path: ./research_storage

Step 2 — Publish it as an MCP server:

cortex publish mcp --port 8081
# MCP server running at http://localhost:8081/mcp

Step 3 — Connect it from a parent agent’s config. You need both a tool_servers entry (so the parent can reach the child) and a task_types entry that references it (so the decomposer knows it exists):

# parent-agent/cortex.yaml
agent:
  name: OrchestratorAgent
  description: Delegates research, code review, and writing to specialised sub-agents

llm_access:
  default:
    provider: anthropic
    model: claude-sonnet-4-6
    api_key_env_var: ANTHROPIC_API_KEY

tool_servers:
  research:
    url: http://localhost:8081/mcp   # MCP endpoint — not /sse
    transport: sse
  code_review:
    url: http://localhost:8082/mcp
    transport: sse
  writing:
    url: http://localhost:8083/mcp
    transport: sse

task_types:
  - name: research
    description: Delegate web research to the ResearchAgent sub-agent
    output_format: md
    capability_hint: web_search         # planning hint (web_search → research server)

  - name: review_code
    description: Delegate code review to the CodeReviewAgent sub-agent
    output_format: md
    capability_hint: auto               # no hint — let the planner decide

  - name: write_report
    description: Generate a final written report from research + review inputs
    output_format: md
    capability_hint: document_generation
    depends_on: [research, review_code] # fan-in: waits for both

Step 4 — Drive it:

result = await framework.run_session(
    user_id="dev_1",
    request="Research competitor pricing for vector DBs, review our benchmark code, and write a report",
    event_queue=asyncio.Queue(),
)

The parent decomposes the request into three tasks, fans out research and review_code in parallel to the two MCP sub-agents, waits for both, then runs write_report to synthesise. Each sub-agent is independently deployable and has its own cortex.yaml.

Common pitfall: adding a tool_server without a matching task_type is a silent no-op — the decomposer only generates tasks for the types you’ve declared, and a tool server is only reached while executing a task. If a sub-agent never gets called, make sure a task_type exists whose work would actually need that server’s capability.

Composing agent hierarchies:

                    ┌── ResearchAgent (port 8081)
                    │     └── brave_search tool
OrchestratorAgent ──┼── CodeReviewAgent (port 8082)
                    │     └── github tools
                    └── WritingAgent (port 8083)
                          └── doc generation tools

Each agent is independent, has its own config, and can be deployed/scaled separately.

Usage 3: CLI Tool

Best for: Developer tools, ops automation, data pipelines, scripts.

Terminal ──► Your CLI (Click/Typer) ──► CortexFramework

# cli_agent.py
import asyncio
import click
from cortex.framework import CortexFramework
from cortex.streaming.status_events import EventType, ResultEvent, StatusEvent

@click.command()
@click.argument("request")
@click.option("--config", default="cortex.yaml")
def run(request, config):
    """Run a one-shot agent request from the command line."""
    asyncio.run(_run(request, config))

async def _run(request, config):
    fw = CortexFramework(config)
    await fw.initialize()
    q = asyncio.Queue()

    # Print events as they arrive
    async def print_events():
        while True:
            event = await q.get()
            if isinstance(event, StatusEvent):
                click.echo(f"  [{event.event_type.value}] {event.message}")
            elif isinstance(event, ResultEvent) and not event.partial:
                click.echo(f"\n{event.content}")
            if event.event_type in (EventType.SESSION_END, EventType.ERROR):
                break

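    event_task = asyncio.create_task(print_events())
    try:
        await fw.run_session("cli_user", request, q)
        await event_task
    finally:
        event_task.cancel()   # no-op if the printer already finished
        await fw.shutdown()   # always release resources, even on error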

if __name__ == "__main__":
    run()

python cli_agent.py "Analyse the error logs from the last 24 hours"

Usage 4: Background Worker

Best for: Batch processing, scheduled jobs, email triage, automated report generation.

Job Queue (Celery / SQS / Redis) ──► Worker ──► CortexFramework
                                   ◄── Result stored to DB

# worker.py
import asyncio
from cortex.framework import CortexFramework

framework = CortexFramework("cortex.yaml")

async def init():
    await framework.initialize()

async def process_job(job: dict) -> str:
    """Called by your job queue for each incoming job."""
    q = asyncio.Queue()
    result = await framework.run_session(
        user_id=job["user_id"],
        request=job["prompt"],
        event_queue=q,
    )
    return result.response

# Example: process a batch of documents
async def batch_process(documents: list):
    await init()
    for doc in documents:
        summary = await process_job({
            "user_id": "batch_worker",
            "prompt": f"Summarise this document:\n\n{doc['content']}",
        })
        save_to_database(doc["id"], summary)  # your own persistence function, not part of Cortex
    await framework.shutdown()
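The batch loop above processes one document at a time. If throughput matters, jobs can run concurrently under a cap. The sketch below uses only asyncio; process_batch is a hypothetical helper and process_one stands in for process_job. The default limit of 3 is chosen to match the max_concurrent_sessions_per_user value shown in this guide's annotated config, on the assumption that a batch submitted under a single user_id is subject to that per-user cap.

```python
import asyncio


async def process_batch(items, process_one, limit: int = 3):
    """Run process_one(item) for every item, at most `limit` at a time.

    Results are returned in the same order as the input items.
    """
    gate = asyncio.Semaphore(limit)

    async def _guarded(item):
        async with gate:  # wait here while `limit` jobs are already running
            return await process_one(item)

    return await asyncio.gather(*(_guarded(item) for item in items))
```

In the worker above this would replace the for loop, for example by passing the documents list and a small coroutine that builds the job dict and calls process_job.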

Usage 5: Embedded in an Existing Application

Best for: Adding AI capabilities to an app that already exists (Django, Flask, FastAPI).

# Inside your existing Django view or FastAPI route
import asyncio

from cortex.framework import CortexFramework

framework = CortexFramework("cortex.yaml")

# Call this once at app startup
# await framework.initialize()

async def handle_support_ticket(ticket):
    q = asyncio.Queue()
    result = await framework.run_session(
        user_id=ticket.author_id,
        request=f"Triage and categorise this support ticket:\n\n{ticket.body}",
        event_queue=q,
    )
    ticket.ai_triage = result.response
    ticket.ai_score = result.validation_report.composite_score
    ticket.save()

No new app to build. Cortex is just another dependency.

Running Multiple Cortex Agents on One Machine

Cortex is designed for multi-agent composition — you can run any number of agents side-by-side on one machine. There’s no “one Cortex per host” limit; the defaults (filename cortex.yaml, wizard port 7799, MCP port 8080, storage ./cortex_storage) just need to be overridden per agent.

What’s shared vs. per-agent

Thing	Default	How to override per agent
Config file	`./cortex.yaml` in CWD	Every CLI command takes `--config PATH`, or set `CORTEX_CONFIG` env var
Wizard port	`7799`	`cortex setup --port 7800`
MCP publish port	`8080`	`cortex publish mcp --port 8081`
Storage base_path	`./cortex_storage`	Set `storage.base_path` in each `cortex.yaml`
SQLite DB path	`./cortex_storage/cortex.db`	Set `sqlite.path` in each `cortex.yaml`

Recommended directory layout

Give each agent its own directory with its own config and storage. Never run two agents from the same directory.

~/agents/
├── research-agent/
│   ├── cortex.yaml           # MCP port 8081, storage ./storage
│   └── storage/
├── code-review-agent/
│   ├── cortex.yaml           # MCP port 8082, storage ./storage
│   └── storage/
└── orchestrator/
    ├── cortex.yaml           # references 8081 + 8082 as tool_servers
    └── storage/

Step-by-step: build a 3-agent mesh

Step 1 — Create the research sub-agent:

mkdir -p ~/agents/research-agent && cd ~/agents/research-agent
cortex setup --port 7799        # wizard configures this one

In the wizard, set:

Agent name: ResearchAgent
Storage base_path: ./storage
SQLite path: ./storage/cortex.db
Add your web-search MCP tool server + a web_research task type

Step 2 — Create the code-review sub-agent (use a different wizard port so you can run both wizards in parallel if needed):

mkdir -p ~/agents/code-review-agent && cd ~/agents/code-review-agent
cortex setup --port 7800

In the wizard, set:

Agent name: CodeReviewAgent
Storage base_path: ./storage
Add a GitHub/filesystem MCP tool server + a review_code task type

Step 3 — Create the orchestrator that fans out to both:

mkdir -p ~/agents/orchestrator && cd ~/agents/orchestrator
cortex setup --port 7801

Edit ~/agents/orchestrator/cortex.yaml so tool_servers references the two sub-agents (both running as MCP servers) and task_types has entries that route to them — see the MCP example in Usage 2 above for the exact shape.

tool_servers:
  research:
    url: http://localhost:8081/mcp   # MCP endpoint, as in Usage 2 (not /sse)
    transport: sse
  code_review:
    url: http://localhost:8082/mcp
    transport: sse

Step 4 — Run all three in separate terminals (or as systemd/supervisor/pm2 units):

# Terminal 1
cd ~/agents/research-agent    && cortex publish mcp --port 8081

# Terminal 2
cd ~/agents/code-review-agent && cortex publish mcp --port 8082

# Terminal 3
cd ~/agents/orchestrator      && cortex dev

Step 5 — Drive the orchestrator from your app (or another cortex dev REPL):

result = await framework.run_session(
    user_id="dev_1",
    request="Research the latest vector DB benchmarks and review our benchmark script at ./bench.py",
    event_queue=asyncio.Queue(),
)

The orchestrator decomposes the request, fans out to both sub-agents in parallel over MCP, and synthesises the combined result.

Things to watch out for

Never share a SQLite file between running agents. SQLite locks the DB file, so two agents pointing at the same sqlite.path will intermittently fail writes. Give each agent its own sqlite.path under its own storage.base_path.
Redis is safe to share across agents if you want centralised storage — but use a different key prefix per agent in the redis config block so sessions don’t collide.
Don’t run two agents from the same directory. Both would load the same cortex.yaml, write to the same storage, and fight over the same ports. Always cd into the agent’s own folder (or pass --config /abs/path/cortex.yaml explicitly).
Wizard is one-at-a-time per port. If you’re configuring multiple agents, use cortex setup --port 7800, --port 7801, etc., so wizards don’t collide.
Pick a port allocation scheme up front. A simple convention like wizard 7799 + N and MCP 8080 + N keeps the mesh readable. Write the mapping into each agent’s cortex.yaml comments so it’s discoverable.
Avoid circular tool_server references. Agent A referencing Agent B as a tool server which references A back will deadlock decomposition. Keep the call graph a DAG.
Kill orphaned MCP servers before restarting. cortex publish mcp binds the port until the process exits, so if a previous run is still up, the next one fails with "address already in use". Run lsof -i :8081 to find the PID.
Set CORTEX_CONFIG in long-lived shells if you work on one specific agent a lot: export CORTEX_CONFIG=~/agents/research-agent/cortex.yaml. Then cortex dev / cortex dry-run from anywhere will target it without --config.
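The "wizard 7799 + N and MCP 8080 + N" convention from the list above is simple enough to generate rather than track by hand. A small sketch follows; allocate_ports is a hypothetical helper, and the two base values are the defaults named in this guide.

```python
WIZARD_BASE = 7799  # default wizard port named in this guide
MCP_BASE = 8080     # default MCP publish port named in this guide


def allocate_ports(agent_names, start=0):
    """Assign each agent a wizard port and an MCP port by its index N."""
    if len(set(agent_names)) != len(agent_names):
        raise ValueError("agent names must be unique")
    return {
        name: {"wizard": WIZARD_BASE + n, "mcp": MCP_BASE + n}
        for n, name in enumerate(agent_names, start=start)
    }
```

Calling allocate_ports(["research-agent", "code-review-agent"], start=1) puts the two sub-agents on MCP ports 8081 and 8082, the same ports used in the mesh walkthrough.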

Deployment

Option A: Docker

cortex publish docker --tag my-agent:latest
docker build -f Dockerfile.cortex -t my-agent:latest .
docker run -p 8080:8080 --env-file .env my-agent:latest

Option B: Python Package

cortex publish package --output-dir dist
# Distribute the .whl file
pip install dist/*.whl
cortex dev --config cortex.yaml

Option C: MCP Server

cortex publish mcp --port 8080
# Other agents connect via:
#   tool_servers:
#     my_agent:
#       url: http://host:8080/mcp
#       transport: sse

Automatically runs the agent with CORTEX_INTERACTION_MODE=rpc so MCP callers never hang on an interactive clarification.

Option D: Built-in Chat UI

cortex publish ui --port 8090
# → http://localhost:8090

Serves a single-page chat frontend backed by your agent. Streams status and results over SSE, supports file uploads (validated against file_input MIME / size limits), and persists per-user threads through the existing History Store. Configure title, host, port, and auth (none / token / basic) in the ui: block of cortex.yaml or via the wizard’s Chat UI step. See Deployment → Chat UI.

Configuration at a Glance

Everything lives in cortex.yaml. Here is a fully annotated example:

# ── Agent identity ──
agent:
  name: MyAgent
  description: A helpful AI assistant
  time:
    default_max_wait_seconds: 120     # Session timeout
    default_task_timeout_seconds: 40  # Per-task timeout
  concurrency:
    max_concurrent_sessions: 50       # Global session cap
    max_concurrent_sessions_per_user: 3
    max_parallel_tasks: 5             # Parallel tasks per session
    max_tasks_per_session: 20

# ── LLM provider ──
llm_access:
  default:
    provider: anthropic               # anthropic | openai | gemini | grok | mistral | deepseek | bedrock | azure_ai
    model: claude-sonnet-4-20250514
    api_key_env_var: ANTHROPIC_API_KEY
    max_tokens: 4096
    temperature: 1.0

# ── External tools via MCP ──
tool_servers:
  brave_search:
    transport: sse
    url: http://localhost:8051/sse
  filesystem:
    transport: stdio
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp/workspace"]

# ── Task types ──
task_types:
  - name: web_research
    description: Search the web for current information
    output_format: md                  # text | md | json | file | html | csv | code
    capability_hint: web_search        # auto | llm_synthesis | web_search | bash | code_exec | document_generation | image_generation
    timeout_seconds: 60

  - name: analysis
    description: Analyse data and produce structured insights
    output_format: json
    capability_hint: llm_synthesis
    depends_on: [web_research]         # Runs after web_research completes

# ── Storage ──
storage:
  base_path: ./cortex_storage

sqlite:                                # Or redis for distributed deployments
  enabled: true
  path: ./cortex_storage/cortex.db
  wal_mode: true

# ── Optional features ──
validation:
  threshold: 0.75                      # Min quality score (floor: 0.60)

history:
  enabled: true
  retention_days: 90

learning:
  enabled: true                        # Autonomic learning — signal-gated, no consent prompt
  validation_threshold: 0.75           # Min composite validation score to learn
  complexity_threshold: 0.6            # Min TaskComplexityScorer score to stage ad-hoc tasks
  auto_apply_delta: true               # Auto-promote deltas once confidence accumulates
  auto_apply_min_confidence: medium    # ≥ 3 distinct principals

workspace_bash:
  enabled: true                        # Workspace-scoped file/command execution with HITL
  hitl_enabled: true                   # Cannot be disabled — enforced at runtime
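
Several values in the annotated example relate to each other, and it can help to check them before startup. The sketch below is a hypothetical helper, not part of Cortex: it takes the parsed config as a plain dict (for instance the output of a YAML loader) and reports a validation threshold below the 0.60 floor quoted above, a depends_on entry that names an unknown task type, and a per-task timeout larger than the session timeout. The last rule is an assumption about sensible values rather than a documented constraint.

```python
from typing import Any

VALIDATION_FLOOR = 0.60  # floor quoted in the annotated example


def check_config(cfg: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed cortex.yaml dict."""
    problems: list[str] = []

    # validation.threshold must not drop below the documented floor
    threshold = cfg.get("validation", {}).get("threshold")
    if threshold is not None and threshold < VALIDATION_FLOOR:
        problems.append(f"validation.threshold {threshold} is below {VALIDATION_FLOOR}")

    # every depends_on entry should name a declared task type
    task_types = cfg.get("task_types", [])
    known = {t["name"] for t in task_types}
    for t in task_types:
        for dep in t.get("depends_on", []):
            if dep not in known:
                problems.append(f"task type {t['name']!r} depends on unknown {dep!r}")

    # assumed rule: a single task should not outlive the whole session
    time_cfg = cfg.get("agent", {}).get("time", {})
    session = time_cfg.get("default_max_wait_seconds")
    per_task = time_cfg.get("default_task_timeout_seconds")
    if session is not None and per_task is not None and per_task > session:
        problems.append("per-task timeout exceeds session timeout")

    return problems
```

An empty list means none of these three checks fired; it says nothing about whether Cortex itself would accept the file, which is what cortex dry-run and cortex migrate are for.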

CLI Reference

Command	What it does
`cortex setup`	Interactive browser wizard to generate `cortex.yaml`
`cortex config-ui`	Browser-based Config Studio to inspect/edit all framework config at `localhost:7801`
`cortex dev --watch`	Dev mode with hot-reload on config changes
`cortex dry-run "query"`	Validate config and task graph without LLM calls
`cortex publish docker`	Generate `Dockerfile.cortex` (pass `--with-ui` for a chat-UI image)
`cortex publish package`	Build a distributable `.whl`
`cortex publish mcp --port 8080`	Expose agent as an MCP tool server (auto-sets `CORTEX_INTERACTION_MODE=rpc`)
`cortex publish ui --port 8090`	Serve the built-in chat UI
`cortex spec --format json`	Generate capability manifest
`cortex replay SESSION_ID --user-id USER_ID`	Replay a historical session
`cortex delta review`	Review auto-discovered task type proposals
`cortex delta apply --min-confidence high`	Apply confirmed proposals to config
`cortex delta rollback`	Restore previous config from backup
`cortex migrate`	Validate config schema compatibility
`cortex ants list / hatch / stop / status`	Manage the Ant Colony — self-spawning specialist agents

Session Result

Every call to run_session() returns a SessionResult:

result = await framework.run_session(user_id, request, event_queue)

result.session_id          # Unique session identifier
result.response            # Final synthesised response (string)
result.validation_report   # Quality scores (intent_match, completeness, coherence)
result.task_completion     # Which tasks succeeded/failed/timed out
result.token_usage         # Token counts by role (decomposition, execution, synthesis, validation)
result.duration_seconds    # Wall-clock time
result.node_outputs        # dict {node_name: raw_output} — read individual task/node results
result.error               # Error message if session failed (None on success)

event_queue is optional — omit it when you only need the returned SessionResult and don’t consume streaming events:

result = await framework.run_session("user_1", "Summarise this")  # no queue
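
A caller will usually branch on result.error before using the response. The helper below is a sketch that relies only on fields listed above; the function name and message format are illustrative, and it accepts any object exposing those attributes, so a stand-in is used here in place of a real SessionResult.

```python
from types import SimpleNamespace


def describe_result(result) -> str:
    """Turn a SessionResult-like object into a one-line summary."""
    if result.error is not None:
        # error is documented as None on success, so anything else is a failure
        return f"session {result.session_id} failed: {result.error}"
    return f"session {result.session_id} finished in {result.duration_seconds:.1f}s"


# Stand-in for illustration; a real SessionResult comes from run_session().
demo = SimpleNamespace(session_id="s-1", error=None, duration_seconds=2.34, response="...")
print(describe_result(demo))
```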

Streaming Events

Cortex streams events through the event_queue as work progresses. The listing below covers four event types:

from cortex.streaming.status_events import StatusEvent, ResultEvent, ClarificationEvent

# StatusEvent — progress updates
event.message       # "Executing task: web_research"
event.session_id
event.event_type    # EventType.STATUS | TASK_START | TASK_COMPLETE | SESSION_START | SESSION_END | ERROR

# ResultEvent — agent response (partial or final)
event.content       # The response text
event.partial       # True if streaming, False when complete
event.validation_score

# ClarificationEvent — agent needs more information (intent gate, HITL task prompts)
event.question          # "Which time period should I analyse?"
event.clarification_id  # Pass back to resolve_task_clarification()
event.options           # ["Last 7 days", "Last 30 days", "Last quarter"]

# LearningEvent — autonomic learning gate decision (end of session)
event.action            # "staged" | "applied" | "blueprint_updated" | "skipped_*"
event.complexity_score  # TaskComplexityScorer output, or None if gate was skipped
event.validation_score  # Composite validation score, or None
event.staged_tasks      # Task names staged into cortex_delta/pending.yaml
event.applied_tasks     # Task names auto-promoted into cortex.yaml
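
A consumer typically reads the queue in a loop and dispatches on the event class. The sketch below makes several assumptions that the text above does not confirm: that event_queue is an asyncio.Queue, that a ResultEvent with partial set to False marks the end of the stream, and that the final event carries the full response text. The three dataclasses are stand-ins so the example runs without Cortex installed; in an application you would import the real classes from cortex.streaming.status_events instead. Events of any other type, such as LearningEvent, are ignored here.

```python
import asyncio
from dataclasses import dataclass, field


# Stand-ins mirroring the documented attributes (not the real Cortex classes).
@dataclass
class StatusEvent:
    message: str


@dataclass
class ResultEvent:
    content: str
    partial: bool = False


@dataclass
class ClarificationEvent:
    question: str
    clarification_id: str
    options: list[str] = field(default_factory=list)


async def consume(queue: asyncio.Queue) -> str:
    """Print progress until the final ResultEvent arrives, then return its text."""
    while True:
        event = await queue.get()
        if isinstance(event, StatusEvent):
            print(f"[status] {event.message}")
        elif isinstance(event, ClarificationEvent):
            # A real app would ask the user and pass clarification_id back.
            print(f"[clarify {event.clarification_id}] {event.question} {event.options}")
        elif isinstance(event, ResultEvent):
            if event.partial:
                print(f"[partial] {event.content}")
            else:
                return event.content  # assumed end-of-stream marker
```

In practice this coroutine would run concurrently with run_session(), for example via asyncio.gather, so that events are drained while the session is still working.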

Supported LLM Providers

Provider	Config value	Default env var	Example models
Anthropic	`anthropic`	`ANTHROPIC_API_KEY`	claude-sonnet-4, claude-opus-4, claude-haiku-4.5
OpenAI	`openai`	`OPENAI_API_KEY`	gpt-4o, gpt-4o-mini, o3-mini
Google Gemini	`gemini`	`GEMINI_API_KEY`	gemini-2.5-pro, gemini-2.5-flash
xAI Grok	`grok`	`XAI_API_KEY`	grok-3, grok-3-mini
Mistral	`mistral`	`MISTRAL_API_KEY`	mistral-large-latest
DeepSeek	`deepseek`	`DEEPSEEK_API_KEY`	deepseek-chat, deepseek-reasoner
AWS Bedrock	`bedrock`	`AWS_ACCESS_KEY_ID`	anthropic.claude-sonnet-4-*
Azure AI	`azure_ai`	`AZURE_API_KEY`	claude-sonnet-4 (via Azure)
Anthropic Proxy	`anthropic_compatible`	`ANTHROPIC_API_KEY`	Any (set `base_url`)
Local Runtime	`local`	`LOCAL_LLM_API_KEY` (optional)	`gemma4:e4b`, `gemma4:26b`, or any Ollama / LM Studio / vLLM tag
Custom	`custom`	—	Provide `function` dotted path

Architecture

User Request
     │
     ▼
[Primary Agent]  ──── decomposes into task graph ────►  [Task A]  [Task B]
                                                             │         │
                                                        [MCP Agent] [MCP Agent]
                                                             │         │
                                                        (tool calls) (tool calls)
                                                             │         │
                                                        [Task C depends on A + B]
                                                             │
                                                        [MCP Agent]
                                                             │
                                                    [Primary Agent synthesises]
                                                             │
                                                    [Validation Agent scores]
                                                             │
                                                    [Learning Engine observes]
                                                             │
                                                       Final Response

Component	Role
Primary Agent	Decomposes requests into a task graph, synthesises final response
Generic MCP Agent	Executes individual tasks with access to MCP tool servers
Task Graph Compiler	Validates dependencies, detects cycles, computes execution order
Capability Scout	Pre-decomposition tool discovery so the agent knows what’s available
Validation Agent	Scores responses on intent match, completeness, and coherence
Learning Engine	Signal-gated autonomic evolution — stages deltas and refines blueprints at session end
WorkspaceBash	Workspace-scoped file read/write and command execution with mandatory HITL before any mutating operation
Session Manager	Enforces concurrency limits and per-user caps; resumes sessions after timeout
Signal Registry	Coordinates async completion across parallel tasks
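
The Task Graph Compiler's job (validate dependencies, detect cycles, compute execution order) can be illustrated with Python's standard graphlib module. This is a sketch of the general technique, not Cortex's implementation; the function name and the task names are illustrative. Each returned batch contains tasks whose dependencies are all satisfied, so the tasks within a batch could run in parallel.

```python
from graphlib import CycleError, TopologicalSorter


def execution_batches(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into batches; every task's dependencies sit in earlier batches."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependants
    return batches


# Mirrors the diagram above: Task C depends on A and B.
graph = {"task_a": [], "task_b": [], "task_c": ["task_a", "task_b"]}
print(execution_batches(graph))

try:
    execution_batches({"x": ["y"], "y": ["x"]})
except CycleError as exc:
    print("cycle detected:", exc.args[1])
```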

Usage Mode Summary

Mode	Who calls it	Best for
Chat UI	Human via browser	Customer-facing apps, dashboards, internal tools
MCP Server	Another agent via MCP	Agent composition, specialised sub-agents
CLI Tool	Developer in terminal	Dev tools, ops automation, scripts
Background Worker	Job queue (Celery/SQS)	Batch processing, scheduled reports
Embedded Library	Your existing app	Adding AI to Django/Flask/FastAPI apps
Docker Microservice	Other services via HTTP	Production, cloud, CI/CD
Python Package	End users via pip	Distributing pre-configured agents

Testing

# Unit tests (no API key required)
pytest tests/ -v -k "not integration"

# Integration tests (requires API key)
ANTHROPIC_API_KEY=sk-... pytest tests/ -v

# With coverage
pytest tests/ --cov=cortex --cov-report=html

Cortex ships with test utilities:

from cortex.testing import MockLLMClient, make_test_config

cfg = make_test_config(agent_name="TestAgent", task_types=["web_search", "summarise"])
mock_llm = MockLLMClient(responses={"default": "Mock response"})

Environment Variables

Variable	Description
`ANTHROPIC_API_KEY`	Anthropic provider API key
`OPENAI_API_KEY`	OpenAI provider API key
`GEMINI_API_KEY`	Google Gemini provider API key
`XAI_API_KEY`	xAI Grok provider API key
`MISTRAL_API_KEY`	Mistral AI provider API key
`DEEPSEEK_API_KEY`	DeepSeek provider API key
`AWS_DEFAULT_REGION`	AWS region for Bedrock
`AZURE_AI_API_KEY`	Azure AI provider API key
`LOCAL_LLM_API_KEY`	Optional auth for the local provider (Ollama / LM Studio / vLLM)
`CORTEX_CONFIG`	Override default config path
`CORTEX_LOG_LEVEL`	Logging level (DEBUG, INFO, WARNING, ERROR)
`CORTEX_INTERACTION_MODE`	Override `agent.interaction_mode` — `interactive` or `rpc`. Set automatically by `cortex publish mcp`.
`CORTEX_HITL_URL`	Set automatically on ant subprocesses so WorkspaceBash HITL prompts relay to the parent session. Not set manually in normal use.

License

MIT License. See LICENSE for details.
