A step-by-step guide to building, configuring, and deploying AI agents with Cortex.
Cortex is a Python library that gives your application an AI agent capable of decomposing complex requests into parallel tasks, calling external tools, and synthesising results — all driven by a single cortex.yaml configuration file.
You don’t run Cortex on its own. You wrap it in your application — a web API, a CLI tool, a background worker, or an MCP server — and call its run_session() method whenever you need an AI-powered response.
Your Application
└── CortexFramework("cortex.yaml")
    ├── Decomposes request into task graph
    ├── Fans out tasks in parallel to MCP tool servers
    ├── Synthesises results
    ├── Validates response quality
    └── Streams events back to your app via event_queue
Cortex is a Python package and requires Python 3.11 or newer. Install it into a virtual environment so its dependencies stay isolated from your system Python.
Create and activate a virtual environment:
python3 -m venv .venv
# Activate it — run this in every new shell you work in:
source .venv/bin/activate # macOS / Linux
# .venv\Scripts\activate # Windows (PowerShell / cmd)
Your prompt now shows (.venv). To leave the environment later, run deactivate.
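If you want to confirm both prerequisites from Python rather than by reading the prompt, a small check like the following works. It is a sketch: `check_environment` is a hypothetical helper, not part of Cortex, and it only relies on the standard `sys` module.

```python
import sys

MIN_PYTHON = (3, 11)  # minimum version stated above


def check_environment(version=None, prefix=None, base_prefix=None):
    """Return a list of problems; an empty list means the environment looks fine."""
    version = tuple(version or sys.version_info[:2])
    prefix = prefix or sys.prefix
    base_prefix = base_prefix or sys.base_prefix
    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} is older than 3.11")
    # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
    if prefix == base_prefix:
        problems.append("no virtual environment is active")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

The optional arguments exist only so the function can be exercised without changing the interpreter; call it with no arguments in normal use.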
Install Cortex into the activated environment:
# From PyPI
pip install cortex-agent-framework
# Or from source
git clone <repo-url>
cd cortex-agent-framework
pip install -e .
Verify the install:
cortex --help # lists: setup, dev, dry-run, publish, spec, replay, delta,
# migrate, ants, config-ui, run, chat, sessions, blueprints,
# mcps, stats, config, providers, storage
Before wiring up MCP tool servers, you can run a fully working agent using only the LLM. Create cortex.yaml:
agent:
  name: HelloAgent
  description: A minimal Cortex agent that only uses LLM synthesis
llm_access:
  default:
    provider: anthropic
    model: claude-sonnet-4-5
    api_key_env_var: ANTHROPIC_API_KEY
    max_tokens: 2048
task_types:
  - name: answer
    description: Answer a user question directly
    output_format: md
    capability_hint: llm_synthesis
storage:
  base_path: ./cortex_storage
Then:
export ANTHROPIC_API_KEY=sk-ant-...
cortex dry-run "Explain gradient descent in two sentences"
cortex dev
No MCP servers, no Docker, no extra setup — this is the smallest config that runs. Add tool_servers and more task_types once this works.
Tip:
web_search is a built-in capability: it works out of the box via DuckDuckGo with no API key. Add a web_search task type and Cortex will search the web for live information automatically. You can upgrade to Brave Search or any MCP web-search server later by adding a tool_servers entry with the same capability.
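The most common failure with the minimal config is a missing section or an unset API key. The sketch below shows a pre-flight check over an already-parsed config dict. `preflight` and `EXPECTED_SECTIONS` are hypothetical names, and treating all four sections as required is an assumption drawn from the minimal example, not Cortex's actual schema rule (cortex dry-run remains the authoritative check).

```python
import os

# Top-level sections used by the minimal example. Assumed required for this sketch.
EXPECTED_SECTIONS = ("agent", "llm_access", "task_types", "storage")


def preflight(config, environ=os.environ):
    """Return human-readable problems found in a parsed config dict."""
    problems = [
        f"missing section: {name}" for name in EXPECTED_SECTIONS if name not in config
    ]
    key_var = config.get("llm_access", {}).get("default", {}).get("api_key_env_var")
    if key_var and not environ.get(key_var):
        problems.append(f"environment variable {key_var} is not set")
    return problems


minimal = {
    "agent": {"name": "HelloAgent"},
    "llm_access": {
        "default": {"provider": "anthropic", "api_key_env_var": "ANTHROPIC_API_KEY"}
    },
    "task_types": [{"name": "answer", "capability_hint": "llm_synthesis"}],
    "storage": {"base_path": "./cortex_storage"},
}
```

Calling `preflight(minimal)` in a shell where the key is exported returns an empty list.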
cortex setup
This opens an interactive browser-based wizard at http://localhost:7799 that walks you through:
| Step | What you configure |
|---|---|
| Agent Identity | Name, description, interaction mode (interactive / rpc) |
| LLM Provider | Model, API key env var — cloud (Anthropic, OpenAI, Gemini, Grok, Mistral, DeepSeek, Bedrock, Azure) or local runtime (Ollama / LM Studio / vLLM, with a Gemma 4 quickstart) |
| Tool Servers | MCP integrations for external capabilities |
| Task Types | What your agent can do (web search, code execution, document generation, etc.) |
| Storage & Persistence | Backend (Memory / SQLite / Redis), retention, encryption |
| Adaptive Behaviour | Capability Scout, wave-gate validation, learning engine, blueprints |
| Runtime & Delivery | Session timeouts, concurrency limits, chat UI settings, Ant Colony |
| Publish Mode | How you want to deploy (Docker, Python package, MCP server, Chat UI) |
The wizard saves a validated cortex.yaml in your project root.
Re-running the wizard: If cortex.yaml already exists, the wizard loads your current settings. Fields that could break existing data (agent name, storage backend with data) are locked. Everything else is editable.
cortex dry-run "Summarise the latest news on quantum computing"
This validates your config and compiles the task graph without making any LLM calls. Use it to catch config errors before spending API credits.
Expected output on success:
✓ Config loaded: cortex.yaml
✓ LLM provider reachable: anthropic / claude-sonnet-4-5
✓ Task graph compiled: 2 tasks, max depth 2
├─ web_research (capability: web_search)
└─ analysis (depends on: web_research)
✓ No cycles detected
✓ Dry run complete — 0 LLM calls made
If any tool server is unreachable or a depends_on points at a missing task, dry-run fails here instead of mid-session.
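To make the two graph checks concrete (a depends_on that points at a missing task, and a cycle), here is a minimal sketch using the standard library's `graphlib`. It illustrates the idea only; `validate_graph` is a hypothetical function and this is not how Cortex's compiler is necessarily implemented.

```python
from graphlib import CycleError, TopologicalSorter


def validate_graph(tasks):
    """tasks maps each task name to the list of task names it depends on.

    Returns one valid execution order, or raises ValueError describing the problem.
    """
    for name, deps in tasks.items():
        missing = [dep for dep in deps if dep not in tasks]
        if missing:
            raise ValueError(f"{name} depends on missing task(s): {missing}")
    try:
        # static_order() yields every task after all of its dependencies.
        return list(TopologicalSorter(tasks).static_order())
    except CycleError as exc:
        raise ValueError(f"cycle detected: {exc.args[1]}") from exc


order = validate_graph({"web_research": [], "analysis": ["web_research"]})
```

For the two-task graph from the expected output, `order` places web_research before analysis.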
cortex dev --watch
Starts Cortex with hot-reload — edit cortex.yaml and changes apply instantly without restarting. You’ll see:
[cortex] Initialising framework from cortex.yaml
[cortex] LLM: anthropic claude-sonnet-4-5
[cortex] Tool servers: brave_search (sse), filesystem (stdio)
[cortex] Watching cortex.yaml for changes...
[cortex] Ready. Send requests via framework.run_session() or the HTTP adapter.
The core integration is always the same three calls: construct the framework, initialise it, and run a session:
import asyncio

from cortex.framework import CortexFramework

# Run this inside an async function; top-level await is shown for brevity.
framework = CortexFramework("cortex.yaml")
await framework.initialize()
result = await framework.run_session(
    user_id="user_123",
    request="Analyse Q3 revenue trends",
    event_queue=asyncio.Queue(),  # receives streaming events
)
print(result.response)
What changes is what wraps those lines. Below are all the ways developers use Cortex, with complete working examples.
CortexBuilder
A cortex.yaml file is optional. You can build the entire agent in Python with
CortexBuilder and pass the result straight to the framework. This is the
path for developers who prefer code to config, want their agent definition in
the same repo as the rest of their app, or want code nodes — plain Python
functions wired in as graph nodes, LangGraph-style.
from cortex import CortexBuilder, CortexFramework

agent = CortexBuilder("ResearchAgent", "Searches the web and writes reports")
agent.llm("anthropic", model="claude-sonnet-4-5", api_key_env="ANTHROPIC_API_KEY")
agent.storage(base_path="./cortex_storage")
agent.tool_server("brave", url="http://localhost:9000/sse",
                  capability_hints=["web_search"])

# An LLM-routed task type, same as a `task_types:` entry in cortex.yaml.
agent.task("web_research", capability="web_search", output="md")
agent.task("write_report", capability="document_generation",
           depends_on=["web_research"])

framework = CortexFramework(config=agent.build())  # no YAML file
await framework.initialize()
result = await framework.run_session("user_1", "Research vector DB benchmarks")
Every builder method returns self, so calls chain. .build() returns a
validated CortexConfig; CortexFramework(config=...) consumes it directly.
| Method | Purpose |
|---|---|
| .llm(provider, model=, api_key_env=) | Set the default LLM provider (required) |
| .provider(key, provider, ...) | Register a named provider for per-task routing |
| .storage(base_path=, ...) | Storage location and options |
| .tool_server(name, url= / command=, ...) | Register an MCP tool server (SSE or stdio) |
| .task(name, capability=, depends_on=, ...) | Add an LLM-routed task type |
| .node(...) | Register a Python code node (decorator, see below) |
| .validation(threshold=, enabled=) | Tune the Validation Agent |
| .execution_mode("planned" / "static") | Force the execution mode |
| .configure(**sections) | Escape hatch: merge any raw config section |
| .build() | Validate and return the CortexConfig |
@agent.node
A code node is a Python function used as a graph node. Decorate it with
@agent.node and it joins the DAG:
agent = CortexBuilder("Pipeline", "fetch → summarise → publish")
agent.llm("anthropic", model="claude-sonnet-4-5", api_key_env="ANTHROPIC_API_KEY")

@agent.node()
async def fetch(ctx):
    return await ctx.call_tool("brave", "search", query=ctx.request)

@agent.node(depends_on=["fetch"])
async def summarise(ctx):
    return await ctx.llm(f"Summarise concisely:\n{ctx.deps['fetch']}")

@agent.node(depends_on=["summarise"])
def publish(ctx):  # sync functions work too
    return f"PUBLISHED: {ctx.deps['summarise']}"

framework = CortexFramework(config=agent.build())
await framework.initialize()
result = await framework.run_session("user_1", "latest on RAG benchmarks")
print(result.response)                   # synthesised answer
print(result.node_outputs["summarise"])  # raw output of one node
The decorator works bare (@agent.node) or parameterised
(@agent.node(depends_on=[...], name=..., output=..., timeout=...)). The
function name becomes the node name unless you pass name=.
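If you are curious how one decorator can accept both the bare and the parameterised form, the usual Python pattern is sketched below. `NodeRegistry` is a hypothetical stand-in written for this illustration; it is not the CortexBuilder class and makes no claim about how Cortex implements @agent.node.

```python
import functools


class NodeRegistry:
    """Hypothetical stand-in for a builder that collects code nodes."""

    def __init__(self):
        self.nodes = {}

    def node(self, func=None, *, name=None, depends_on=()):
        # Bare use (@registry.node): Python passes the function directly.
        if callable(func):
            return self._register(func, name=name, depends_on=depends_on)
        # Parameterised use (@registry.node(...)): return the real decorator.
        return functools.partial(self._register, name=name, depends_on=depends_on)

    def _register(self, func, name=None, depends_on=()):
        # The function name is the node name unless name= was given.
        self.nodes[name or func.__name__] = {
            "func": func,
            "depends_on": list(depends_on),
        }
        return func


registry = NodeRegistry()


@registry.node
def fetch(ctx):
    return "raw"


@registry.node(depends_on=["fetch"], name="summary")
def summarise(ctx):
    return "short"
```

Both functions stay callable as ordinary Python functions after registration, which keeps them easy to unit-test.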
TaskContext
Each node receives one argument, a TaskContext (ctx), wiring it into the
running session:
| Attribute / method | What it gives you |
|---|---|
| ctx.request | The original user request for the session |
| ctx.deps | dict of upstream node outputs, keyed by node name |
| await ctx.llm(prompt, system=, max_tokens=, provider=) | One-shot LLM completion via the agent’s providers |
| await ctx.call_tool(server, tool, **params) | Invoke an MCP tool on a configured tool server |
| ctx.task_name, ctx.session_id, ctx.user_id | Identity / bookkeeping |
| ctx.instruction, ctx.input_refs, ctx.context_hints | Lower-level task metadata |
A node returns its output as a str, a (str, format) tuple, a dict/list
(serialised to JSON), or None.
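The four accepted return shapes can be pictured as one normalisation step. The sketch below is an illustration under assumptions: `normalise_output` is a hypothetical helper, and the "text" default format and the empty string for None are choices made for this example, not documented Cortex behaviour.

```python
import json


def normalise_output(value, default_format="text"):
    """Map a node's return value to a (content, format) pair."""
    if value is None:
        return ("", default_format)            # assumption: None means "no content"
    if isinstance(value, str):
        return (value, default_format)         # assumption: plain strings use the default
    if (
        isinstance(value, tuple)
        and len(value) == 2
        and all(isinstance(part, str) for part in value)
    ):
        return value                           # already a (str, format) pair
    if isinstance(value, (dict, list)):
        return (json.dumps(value), "json")     # dict/list are serialised to JSON
    raise TypeError(f"unsupported node output: {type(value).__name__}")
```

Returning a `(str, format)` tuple is the way to be explicit when the default would be wrong, for example `("# Title", "md")`.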
Registering any code node flips the agent to static execution
(execution_mode="static"):
- planned (default): the decomposition LLM generates the task graph at runtime from your task_types. Flexible; one LLM call to plan.
- static: the graph you declared is the plan. It runs verbatim in dependency order with no decomposition, intent-gate, or capability-scout LLM calls. Deterministic and cheaper per run.
Either way you keep the rest of Cortex: parallel fan-out/fan-in waves, the wave validation gate, retries, streaming events, session persistence, and the final synthesis + validation. Static mode just skips the planner.
You can also run a static DAG built only from .task() (LLM-routed) nodes —
call .execution_mode("static") explicitly. And .task() and .node() mix
freely in one agent.
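"Runs in dependency order" with "parallel fan-out/fan-in waves" can be sketched in a few lines of asyncio. This is a conceptual illustration only: `run_in_waves` and the three demo nodes are hypothetical, and the real scheduler also handles validation gates, retries, and events that are omitted here.

```python
import asyncio
from graphlib import TopologicalSorter


async def run_in_waves(nodes, deps):
    """nodes: name -> async callable taking a dict of upstream outputs.
    deps: name -> list of upstream names. Returns (outputs, waves)."""
    sorter = TopologicalSorter({name: deps.get(name, []) for name in nodes})
    sorter.prepare()  # raises CycleError if the declared graph has a cycle
    outputs, waves = {}, []
    while sorter.is_active():
        wave = sorted(sorter.get_ready())  # every node whose inputs are complete
        results = await asyncio.gather(
            *(nodes[n]({d: outputs[d] for d in deps.get(n, [])}) for n in wave)
        )
        for name, result in zip(wave, results):
            outputs[name] = result
            sorter.done(name)
        waves.append(wave)
    return outputs, waves


async def fetch_a(_upstream):
    return "facts"


async def fetch_b(_upstream):
    return "notes"


async def merge(upstream):
    return f"{upstream['fetch_a']} + {upstream['fetch_b']}"


demo_nodes = {"fetch_a": fetch_a, "fetch_b": fetch_b, "merge": merge}
demo_deps = {"merge": ["fetch_a", "fetch_b"]}
demo_outputs, demo_waves = asyncio.run(run_in_waves(demo_nodes, demo_deps))
```

The two independent nodes form the first wave and run concurrently; `merge` waits for both, which is the fan-in.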
| Use… | When |
|---|---|
| cortex.yaml | Config should be diffable / reviewed separately; non-developers tune the agent; you want the wizard and CLI |
| CortexBuilder + .task() | You prefer code, but still want the LLM to plan the graph per request |
| CortexBuilder + .node() | You want deterministic control of the graph and to run real Python at each step (LangGraph-style) |
user_id and principal
Every call to run_session() carries an identity: who initiated the request. Cortex uses this identity for storage isolation, audit logs, session ownership, and per-user concurrency caps.
There are two parameters:
| Parameter | Required | Purpose |
|---|---|---|
| user_id | Yes | Storage namespace key. Sessions, history, and per-user data are isolated under this id. |
| principal | No | Rich identity object (Principal) that captures type (human/system/agent) and the delegation chain when one agent calls another. Auto-built from user_id if omitted. |
For most applications you only ever pass user_id — the framework constructs a human-user Principal for you. The principal= parameter exists for two cases: system/autonomous agents and agent-to-agent delegation.
await framework.run_session(
    user_id="user_123",
    request="Analyse Q3 revenue trends",
    event_queue=asyncio.Queue(),
)
Cortex internally builds Principal.from_user_id("user_123"). No code changes needed for existing apps.
For background jobs, schedulers, cron triggers, or any non-human initiator:
from cortex.identity import Principal

scheduler = Principal.system("nightly-report")  # → principal_id = "system:nightly-report"
await framework.run_session(
    user_id="system:nightly-report",  # storage key, matches principal_id
    request="Generate the nightly KPI report",
    event_queue=asyncio.Queue(),
    principal=scheduler,
)
System principal ids must follow "system:<name>" (alphanumeric, dash, underscore). Audit logs will record principal_type=system so you can distinguish autonomous runs from human-driven ones.
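If you build system ids from user input or job names, it can help to check them before constructing the principal. The pattern below is inferred from the rule just stated; `SYSTEM_ID` and `is_valid_system_id` are hypothetical names, and the check inside Cortex may differ in detail.

```python
import re

# "system:" followed by one or more alphanumeric, dash, or underscore characters.
SYSTEM_ID = re.compile(r"system:[A-Za-z0-9_-]+")


def is_valid_system_id(principal_id):
    """True when the whole string matches the system:<name> shape."""
    return SYSTEM_ID.fullmatch(principal_id) is not None
```

Using `fullmatch` rather than `match` rejects ids with trailing characters such as spaces.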
When your agent calls Cortex on behalf of an end user — e.g. a sales bot that uses Cortex to research a prospect — pass an agent principal that records the original user:
from cortex.identity import Principal

user = Principal.from_user_id("user_123")
bot = Principal.agent("sales-bot", delegated_by=user)
await framework.run_session(
    user_id="user_123",  # origin user, so storage stays under their namespace
    request="Research competitor pricing for ACME Corp",
    event_queue=asyncio.Queue(),
    principal=bot,       # carries the chain ["user_123"] → "agent:sales-bot"
)
Delegation chains compose — if sales-bot then delegates to a pricing-engine agent, build a third principal with delegated_by=bot and the chain becomes ["user_123", "agent:sales-bot"]. Audit logs preserve the full provenance.
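The way chains grow hop by hop is easiest to see with a toy model. `SketchPrincipal` below is a hypothetical stand-in written for this illustration (a frozen dataclass); it mirrors the behaviour described above but is not the cortex.identity.Principal class, whose fields may be named differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchPrincipal:
    principal_id: str
    principal_type: str            # "human", "system" or "agent"
    chain: tuple[str, ...] = ()    # ids of every upstream principal, origin first

    @classmethod
    def human(cls, user_id):
        return cls(user_id, "human")

    @classmethod
    def agent(cls, name, delegated_by):
        # A new hop copies the delegator's chain and appends the delegator itself.
        return cls(
            f"agent:{name}",
            "agent",
            delegated_by.chain + (delegated_by.principal_id,),
        )

    @property
    def origin(self):
        """The id that should be used as the storage namespace."""
        return self.chain[0] if self.chain else self.principal_id


user = SketchPrincipal.human("user_123")
bot = SketchPrincipal.agent("sales-bot", delegated_by=user)
pricing = SketchPrincipal.agent("pricing-engine", delegated_by=bot)
```

Here `pricing.chain` holds the user followed by the sales bot, and `pricing.origin` is still the original user, which is why user_id stays constant across hops.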
- user_id should always be the storage namespace. For delegated calls, that’s the originating user, never the agent’s id. This keeps all hops in a delegation chain in one storage space, which means session history, blueprints, and learned task types are attributed to the human who started the work.
- principal_id for system/agent principals must match <type>:<name>. Cortex enforces this at construction time.
- When resuming a session, pass resume_session_id= along with the same user_id; ownership is checked against the origin user, not the agent.
- Principal is immutable. Build a new one for each delegation hop rather than mutating an existing one.
Best for: Customer-facing apps, internal tools, support agents, dashboards with AI chat.
User ──► Browser ──► Your API (FastAPI) ──► CortexFramework
                 ◄── SSE stream ◄─────────────── event_queue
Step 1 — Build the API layer:
# app.py
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

from cortex.framework import CortexFramework
from cortex.streaming.status_events import (
    EventType, StatusEvent, ResultEvent, ClarificationEvent,
)

app = FastAPI()
framework = CortexFramework("cortex.yaml")

@app.on_event("startup")
async def startup():
    await framework.initialize()

@app.on_event("shutdown")
async def shutdown():
    await framework.shutdown()

@app.post("/chat")
async def chat(body: dict):
    queue = asyncio.Queue()
    asyncio.create_task(
        framework.run_session(
            user_id=body["user_id"],
            request=body["message"],
            event_queue=queue,
        )
    )

    async def stream():
        while True:
            event = await queue.get()
            payload = {
                "type": event.event_type.value,
                "session_id": event.session_id,
            }
            if isinstance(event, ResultEvent):
                payload["content"] = event.content
                payload["partial"] = event.partial
            elif isinstance(event, StatusEvent):
                payload["message"] = event.message
            elif isinstance(event, ClarificationEvent):
                payload["question"] = event.question
                payload["clarification_id"] = event.clarification_id
                payload["options"] = event.options
            yield f"data: {json.dumps(payload)}\n\n"
            if event.event_type in (EventType.SESSION_END, EventType.ERROR):
                break

    return StreamingResponse(stream(), media_type="text/event-stream")

# Handle clarification answers (intent-gate or HITL task prompts)
@app.post("/clarify")
async def clarify(body: dict):
    resolved = framework.resolve_task_clarification(
        body["clarification_id"], body["answer"]
    )
    return {"resolved": resolved}

# Resume timed-out sessions
@app.get("/sessions/{user_id}/resumable")
async def resumable(user_id: str):
    return await framework.get_resumable_sessions(user_id)
Step 2 — Build a frontend that:
- Sends messages to POST /chat
- Handles CLARIFICATION events (show follow-up questions, post answers to /clarify)
- Renders RESULT events as the agent’s response
Event lifecycle in the UI:
| Event | What the UI does |
|---|---|
| session_start | Show “thinking…” indicator |
| intent_classified | Show chat vs task routing decision |
| task_blueprint | Render the full DAG (waves, dependencies) before execution |
| task_start | Show progress (“Searching web…”, “Analysing data…”) |
| task_tool_call | Show which MCP or built-in tool is being invoked |
| task_complete | Update progress bar |
| workspace_event | Show file read/write/execution in workspace |
| status | Display status messages |
| clarification | Render a follow-up question with options |
| result (partial) | Stream text into the chat bubble |
| result (final) | Display complete response |
| file_output | Show download link for agent-produced file |
| session_token_usage | Display cumulative token counters |
| error | Show error state |
| session_end | Re-enable input |
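On the receiving side, each frame produced by the /chat endpoint is a `data: {json}` line followed by a blank line. The sketch below decodes such frames and picks a UI reaction for a few of the events in the table. It is written in Python to match the rest of this guide (a browser client would do the same in JavaScript); `parse_sse` and `ui_action` are hypothetical helpers, and the lowercase type strings are assumed to match the event names in the table.

```python
import json


def parse_sse(chunk):
    """Decode a block of 'data: {...}' frames separated by blank lines."""
    events = []
    for frame in chunk.split("\n\n"):
        data_lines = [
            line[5:].lstrip() for line in frame.splitlines() if line.startswith("data:")
        ]
        if data_lines:
            events.append(json.loads("\n".join(data_lines)))
    return events


def ui_action(event):
    """Choose a UI reaction for one decoded event (a subset of the table)."""
    kind = event["type"]
    if kind == "result":
        return "append_text" if event.get("partial") else "show_final"
    reactions = {
        "session_start": "show_thinking",
        "clarification": "ask_user",
        "error": "show_error",
        "session_end": "enable_input",
    }
    return reactions.get(kind, "show_status")


raw = (
    'data: {"type": "session_start", "session_id": "s1"}\n\n'
    'data: {"type": "result", "session_id": "s1", "content": "Hi", "partial": true}\n\n'
    'data: {"type": "session_end", "session_id": "s1"}\n\n'
)
actions = [ui_action(event) for event in parse_sse(raw)]
```

Anything not listed in `reactions` falls through to a generic status display, so new event types do not break the client.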
Best for: Agent composition, building specialised sub-agents, multi-agent architectures.
Parent Agent ──► MCP protocol ──► Your Agent (as tool server)
                                      └── CortexFramework
This is the most powerful pattern. Your agent becomes a tool that other agents can call.
Step 1 — Build a specialised agent with its own cortex.yaml:
# research-agent/cortex.yaml
agent:
  name: ResearchAgent
  description: Searches the web and summarises findings
llm_access:
  default:
    provider: anthropic
    model: claude-sonnet-4-6
    api_key_env_var: ANTHROPIC_API_KEY
    max_tokens: 4096
# Optional: add a tool server for richer web search.
# Without this, Cortex falls back to built-in DuckDuckGo automatically.
# tool_servers:
#   brave_search:
#     transport: stdio
#     command: npx
#     args: ["-y", "@modelcontextprotocol/server-brave-search"]
#     env:
#       BRAVE_API_KEY: ${BRAVE_API_KEY}
task_types:
  - name: web_research
    description: Search the web for current information
    output_format: md
    capability_hint: web_search  # uses built-in DuckDuckGo if no tool server configured
storage:
  base_path: ./research_storage
Step 2 — Publish it as an MCP server:
cortex publish mcp --port 8081
# MCP server running at http://localhost:8081/mcp
Step 3 — Connect it from a parent agent’s config. You need both a tool_servers entry (so the parent can reach the child) and a task_types entry that references it (so the decomposer knows it exists):
# parent-agent/cortex.yaml
agent:
  name: OrchestratorAgent
  description: Delegates research, code review, and writing to specialised sub-agents
llm_access:
  default:
    provider: anthropic
    model: claude-sonnet-4-6
    api_key_env_var: ANTHROPIC_API_KEY
tool_servers:
  research:
    url: http://localhost:8081/mcp  # MCP endpoint, not /sse
    transport: sse
  code_review:
    url: http://localhost:8082/mcp
    transport: sse
  writing:
    url: http://localhost:8083/mcp
    transport: sse
task_types:
  - name: research
    description: Delegate web research to the ResearchAgent sub-agent
    output_format: md
    capability_hint: web_search  # planning hint (web_search → research server)
  - name: review_code
    description: Delegate code review to the CodeReviewAgent sub-agent
    output_format: md
    capability_hint: auto  # no hint: let the planner decide
  - name: write_report
    description: Generate a final written report from research + review inputs
    output_format: md
    capability_hint: document_generation
    depends_on: [research, review_code]  # fan-in: waits for both
Step 4 — Drive it:
result = await framework.run_session(
    user_id="dev_1",
    request="Research competitor pricing for vector DBs, review our benchmark code, and write a report",
    event_queue=asyncio.Queue(),
)
The parent decomposes the request into three tasks, fans out research and review_code in parallel to the two MCP sub-agents, waits for both, then runs write_report to synthesise. Each sub-agent is independently deployable and has its own cortex.yaml.
Common pitfall: adding a tool_server without a matching task_type is a silent no-op. The decomposer only generates tasks for the types you’ve declared, and a tool server is only reached while executing a task. If a sub-agent never gets called, make sure a task_type exists whose work would actually need that server’s capability.
Composing agent hierarchies:
                    ┌── ResearchAgent (port 8081)
                    │     └── brave_search tool
OrchestratorAgent ──┼── CodeReviewAgent (port 8082)
                    │     └── github tools
                    └── WritingAgent (port 8083)
                          └── doc generation tools
Each agent is independent, has its own config, and can be deployed/scaled separately.
Best for: Developer tools, ops automation, data pipelines, scripts.
Terminal ──► Your CLI (Click/Typer) ──► CortexFramework
# cli_agent.py
import asyncio

import click

from cortex.framework import CortexFramework
from cortex.streaming.status_events import EventType, ResultEvent, StatusEvent

@click.command()
@click.argument("request")
@click.option("--config", default="cortex.yaml")
def run(request, config):
    """Run a one-shot agent request from the command line."""
    asyncio.run(_run(request, config))

async def _run(request, config):
    fw = CortexFramework(config)
    await fw.initialize()
    q = asyncio.Queue()

    # Print events as they arrive
    async def print_events():
        while True:
            event = await q.get()
            if isinstance(event, StatusEvent):
                click.echo(f" [{event.event_type.value}] {event.message}")
            elif isinstance(event, ResultEvent) and not event.partial:
                click.echo(f"\n{event.content}")
            if event.event_type in (EventType.SESSION_END, EventType.ERROR):
                break

    event_task = asyncio.create_task(print_events())
    result = await fw.run_session("cli_user", request, q)
    await event_task
    await fw.shutdown()

if __name__ == "__main__":
    run()
python cli_agent.py "Analyse the error logs from the last 24 hours"
Best for: Batch processing, scheduled jobs, email triage, automated report generation.
Job Queue (Celery / SQS / Redis) ──► Worker ──► CortexFramework
                                     ◄── Result stored to DB
# worker.py
import asyncio

from cortex.framework import CortexFramework

framework = CortexFramework("cortex.yaml")

async def init():
    await framework.initialize()

async def process_job(job: dict) -> str:
    """Called by your job queue for each incoming job."""
    q = asyncio.Queue()
    result = await framework.run_session(
        user_id=job["user_id"],
        request=job["prompt"],
        event_queue=q,
    )
    return result.response

# Example: process a batch of documents
async def batch_process(documents: list):
    await init()
    for doc in documents:
        summary = await process_job({
            "user_id": "batch_worker",
            "prompt": f"Summarise this document:\n\n{doc['content']}",
        })
        # save_to_database is your own persistence function; it is not part of Cortex.
        save_to_database(doc["id"], summary)
    await framework.shutdown()
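The batch loop above handles one document at a time. If throughput matters, one option is to overlap sessions while capping how many run at once, for example to stay within a per-user concurrency limit. The sketch below shows that pattern with `asyncio.Semaphore`; `process_batch` is a hypothetical helper, and `fake_session` is a stand-in for the `process_job` coroutine so the example runs without Cortex.

```python
import asyncio


async def process_batch(items, handler, limit=3):
    """Run handler(item) for every item, at most `limit` at a time, keeping input order."""
    gate = asyncio.Semaphore(limit)

    async def guarded(item):
        async with gate:  # waits here while `limit` handlers are already running
            return await handler(item)

    return await asyncio.gather(*(guarded(item) for item in items))


# Stand-in for process_job(); it records how many calls overlap.
running = 0
peak = 0


async def fake_session(doc):
    global running, peak
    running += 1
    peak = max(peak, running)
    await asyncio.sleep(0.01)  # simulate the time a session takes
    running -= 1
    return f"summary of {doc}"


summaries = asyncio.run(process_batch(["a", "b", "c", "d", "e"], fake_session, limit=2))
```

In a real worker you would pass `process_job` as the handler and choose `limit` no higher than the concurrency caps in your config.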
Best for: Adding AI capabilities to an app that already exists (Django, Flask, FastAPI).
# Inside your existing Django view or FastAPI route
import asyncio

from cortex.framework import CortexFramework

framework = CortexFramework("cortex.yaml")
# Call this once at app startup
# await framework.initialize()

async def handle_support_ticket(ticket):
    q = asyncio.Queue()
    result = await framework.run_session(
        user_id=ticket.author_id,
        request=f"Triage and categorise this support ticket:\n\n{ticket.body}",
        event_queue=q,
    )
    ticket.ai_triage = result.response
    ticket.ai_score = result.validation_report.composite_score
    ticket.save()
No new app to build. Cortex is just another dependency.
Cortex is designed for multi-agent composition — you can run any number of agents side-by-side on one machine. There’s no “one Cortex per host” limit; the defaults (filename cortex.yaml, wizard port 7799, MCP port 8080, storage ./cortex_storage) just need to be overridden per agent.
| Thing | Default | How to override per agent |
|---|---|---|
| Config file | ./cortex.yaml in CWD | Every CLI command takes --config PATH, or set CORTEX_CONFIG env var |
| Wizard port | 7799 | cortex setup --port 7800 |
| MCP publish port | 8080 | cortex publish mcp --port 8081 |
| Storage base_path | ./cortex_storage | Set storage.base_path in each cortex.yaml |
| SQLite DB path | ./cortex_storage/cortex.db | Set sqlite.path in each cortex.yaml |
Give each agent its own directory with its own config and storage. Never run two agents from the same directory.
~/agents/
├── research-agent/
│   ├── cortex.yaml   # MCP port 8081, storage ./storage
│   └── storage/
├── code-review-agent/
│   ├── cortex.yaml   # MCP port 8082, storage ./storage
│   └── storage/
└── orchestrator/
    ├── cortex.yaml   # references 8081 + 8082 as tool_servers
    └── storage/
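With several agents on one machine, the thing to avoid is two of them sharing a port or a storage path. A small script can catch that before anything starts. The sketch below is illustrative: `find_collisions`, the `mcp_port` / `storage` keys, and the third agent are hypothetical, and you would fill the mapping by hand or from your own tooling.

```python
from collections import defaultdict


def find_collisions(agents):
    """agents maps an agent name to its settings dict.

    Returns {(setting, value): [agent names]} for every value used by two or more agents.
    """
    owners = defaultdict(list)
    for agent, settings in agents.items():
        for key, value in settings.items():
            owners[(key, value)].append(agent)
    return {pair: names for pair, names in owners.items() if len(names) > 1}


mesh = {
    "research-agent": {"mcp_port": 8081, "storage": "~/agents/research-agent/storage"},
    "code-review-agent": {"mcp_port": 8082, "storage": "~/agents/code-review-agent/storage"},
    # Deliberate mistake for the demo: this agent reuses port 8082.
    "writing-agent": {"mcp_port": 8082, "storage": "~/agents/writing-agent/storage"},
}
collisions = find_collisions(mesh)
```

Use absolute or home-relative paths in the mapping; every agent's own config says ./storage, so comparing the relative form would report false collisions.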
Step 1 — Create the research sub-agent:
mkdir -p ~/agents/research-agent && cd ~/agents/research-agent
cortex setup --port 7799 # wizard configures this one
In the wizard, set:
- Agent name: ResearchAgent
- Storage base_path: ./storage
- SQLite path: ./storage/cortex.db
- Task types: a web_research task type
Step 2 — Create the code-review sub-agent (use a different wizard port so you can run both wizards in parallel if needed):
mkdir -p ~/agents/code-review-agent && cd ~/agents/code-review-agent
cortex setup --port 7800
- Agent name: CodeReviewAgent
- Storage base_path: ./storage
- Task types: a review_code task type
Step 3 — Create the orchestrator that fans out to both:
mkdir -p ~/agents/orchestrator && cd ~/agents/orchestrator
cortex setup --port 7801
Edit ~/agents/orchestrator/cortex.yaml so tool_servers references the two sub-agents (both running as MCP servers) and task_types has entries that route to them — see the MCP example in Usage 2 above for the exact shape.
tool_servers:
  research:
    url: http://localhost:8081/mcp
    transport: sse
  code_review:
    url: http://localhost:8082/mcp
    transport: sse
Step 4 — Run all three in separate terminals (or as systemd/supervisor/pm2 units):
# Terminal 1
cd ~/agents/research-agent && cortex publish mcp --port 8081
# Terminal 2
cd ~/agents/code-review-agent && cortex publish mcp --port 8082
# Terminal 3
cd ~/agents/orchestrator && cortex dev
Step 5 — Drive the orchestrator from your app (or another cortex dev REPL):
result = await framework.run_session(
    user_id="dev_1",
    request="Research the latest vector DB benchmarks and review our benchmark script at ./bench.py",
    event_queue=asyncio.Queue(),
)
The orchestrator decomposes the request, fans out to both sub-agents in parallel over MCP, and synthesises the combined result.
- Two agents sharing one sqlite.path will intermittently fail writes. Give each agent its own sqlite.path under its own storage.base_path.
- With Redis, give each agent its own settings in its redis config block so sessions don’t collide.
- Two agents launched from the same directory load the same cortex.yaml, write to the same storage, and fight over the same ports. Always cd into the agent’s own folder (or pass --config /abs/path/cortex.yaml explicitly).
- Use cortex setup --port 7800, --port 7801, etc., so wizards don’t collide.
- A convention such as wizard 7799 + N and MCP 8080 + N keeps the mesh readable. Write the mapping into each agent’s cortex.yaml comments so it’s discoverable.
- cortex publish mcp binds the port until the process exits; if a previous run is still up, the next one will fail with address already in use. Run lsof -i :8081 to find the PID.
- Set CORTEX_CONFIG in long-lived shells if you work on one specific agent a lot: export CORTEX_CONFIG=~/agents/research-agent/cortex.yaml. Then cortex dev / cortex dry-run from anywhere will target it without --config.
cortex publish docker --tag my-agent:latest
docker build -f Dockerfile.cortex -t my-agent:latest .
docker run -p 8080:8080 --env-file .env my-agent:latest
cortex publish package --output-dir dist
# Distribute the .whl file
pip install dist/*.whl
cortex dev --config cortex.yaml
cortex publish mcp --port 8080
# Other agents connect via:
# tool_servers:
#   my_agent:
#     url: http://host:8080/mcp
#     transport: sse
cortex publish mcp automatically runs the agent with CORTEX_INTERACTION_MODE=rpc so MCP callers never hang on an interactive clarification.
cortex publish ui --port 8090
# → http://localhost:8090
Serves a single-page chat frontend backed by your agent. Streams status and results over SSE, supports file uploads (validated against file_input MIME / size limits), and persists per-user threads through the existing History Store. Configure title, host, port, and auth (none / token / basic) in the ui: block of cortex.yaml or via the wizard’s Chat UI step. See Deployment → Chat UI.
Everything lives in cortex.yaml. Here is a fully annotated example:
# ── Agent identity ──
agent:
name: MyAgent
description: A helpful AI assistant
time:
default_max_wait_seconds: 120 # Session timeout
default_task_timeout_seconds: 40 # Per-task timeout
concurrency:
max_concurrent_sessions: 50 # Global session cap
max_concurrent_sessions_per_user: 3
max_parallel_tasks: 5 # Parallel tasks per session
max_tasks_per_session: 20
# ── LLM provider ──
llm_access:
default:
provider: anthropic # anthropic | openai | gemini | grok | mistral | deepseek | bedrock | azure_ai
model: claude-sonnet-4-20250514
api_key_env_var: ANTHROPIC_API_KEY
max_tokens: 4096
temperature: 1.0
# ── External tools via MCP ──
tool_servers:
brave_search:
transport: sse
url: http://localhost:8051/sse
filesystem:
transport: stdio
command: npx
args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp/workspace"]
# ── Task types ──
task_types:
- name: web_research
description: Search the web for current information
output_format: md # text | md | json | file | html | csv | code
capability_hint: web_search # auto | llm_synthesis | web_search | bash | code_exec | document_generation | image_generation
timeout_seconds: 60
- name: analysis
description: Analyse data and produce structured insights
output_format: json
capability_hint: llm_synthesis
depends_on: [web_research] # Runs after web_research completes
# ── Storage ──
storage:
base_path: ./cortex_storage
sqlite: # Or redis for distributed deployments
enabled: true
path: ./cortex_storage/cortex.db
wal_mode: true
# ── Optional features ──
validation:
threshold: 0.75 # Min quality score (floor: 0.60)
history:
enabled: true
retention_days: 90
learning:
enabled: true # Autonomic learning — signal-gated, no consent prompt
validation_threshold: 0.75 # Min composite validation score to learn
complexity_threshold: 0.6 # Min TaskComplexityScorer score to stage ad-hoc tasks
auto_apply_delta: true # Auto-promote deltas once confidence accumulates
auto_apply_min_confidence: medium # ≥ 3 distinct principals
workspace_bash:
enabled: true # Workspace-scoped file/command execution with HITL
hitl_enabled: true # Cannot be disabled — enforced at runtime
| Command | What it does |
|---|---|
| cortex setup | Interactive browser wizard to generate cortex.yaml |
| cortex config-ui | Browser-based Config Studio to inspect/edit all framework config at localhost:7801 |
| cortex dev --watch | Dev mode with hot-reload on config changes |
| cortex dry-run "query" | Validate config and task graph without LLM calls |
| cortex publish docker | Generate Dockerfile.cortex (pass --with-ui for a chat-UI image) |
| cortex publish package | Build a distributable .whl |
| cortex publish mcp --port 8080 | Expose agent as an MCP tool server (auto-sets CORTEX_INTERACTION_MODE=rpc) |
| cortex publish ui --port 8090 | Serve the built-in chat UI |
| cortex spec --format json | Generate capability manifest |
| cortex replay SESSION_ID --user-id USER_ID | Replay a historical session |
| cortex delta review | Review auto-discovered task type proposals |
| cortex delta apply --min-confidence high | Apply confirmed proposals to config |
| cortex delta rollback | Restore previous config from backup |
| cortex migrate | Validate config schema compatibility |
| cortex ants list / hatch / stop / status | Manage the Ant Colony (self-spawning specialist agents) |
Every call to run_session() returns a SessionResult:
result = await framework.run_session(user_id, request, event_queue)
result.session_id # Unique session identifier
result.response # Final synthesised response (string)
result.validation_report # Quality scores (intent_match, completeness, coherence)
result.task_completion # Which tasks succeeded/failed/timed out
result.token_usage # Token counts by role (decomposition, execution, synthesis, validation)
result.duration_seconds # Wall-clock time
result.node_outputs # dict {node_name: raw_output} — read individual task/node results
result.error # Error message if session failed (None on success)
event_queue is optional — omit it when you only need the returned
SessionResult and don’t consume streaming events:
result = await framework.run_session("user_1", "Summarise this") # no queue
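A common next step is to branch on the result: did the session fail, and if not, is the quality score high enough to use the answer unreviewed? The sketch below shows one way to do that. `FakeResult` is a hypothetical stand-in that flattens the score into a single `composite_score` field (on the real object it is read as result.validation_report.composite_score), and the 0.75 cut-off is just an example value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResult:
    """Stand-in carrying only the fields this sketch needs; not the real SessionResult."""
    response: str
    error: Optional[str] = None
    composite_score: Optional[float] = None


def triage(result, threshold=0.75):
    """Classify a finished session as 'failed', 'needs_review', or 'ok'."""
    if result.error:
        return "failed"
    if result.composite_score is not None and result.composite_score < threshold:
        return "needs_review"
    return "ok"
```

Checking `error` first matters: a failed session may still carry a partial response that should not be shown as a normal answer.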
Cortex streams events through the event_queue as work progresses. The main event types are:
from cortex.streaming.status_events import StatusEvent, ResultEvent, ClarificationEvent
# StatusEvent — progress updates
event.message # "Executing task: web_research"
event.session_id
event.event_type # EventType.STATUS | TASK_START | TASK_COMPLETE | SESSION_START | SESSION_END | ERROR
# ResultEvent — agent response (partial or final)
event.content # The response text
event.partial # True if streaming, False when complete
event.validation_score
# ClarificationEvent — agent needs more information (intent gate, HITL task prompts)
event.question # "Which time period should I analyse?"
event.clarification_id # Pass back to resolve_task_clarification()
event.options # ["Last 7 days", "Last 30 days", "Last quarter"]
# LearningEvent — autonomic learning gate decision (end of session)
event.action # "staged" | "applied" | "blueprint_updated" | "skipped_*"
event.complexity_score # TaskComplexityScorer output, or None if gate was skipped
event.validation_score # Composite validation score, or None
event.staged_tasks # Task names staged into cortex_delta/pending.yaml
event.applied_tasks # Task names auto-promoted into cortex.yaml
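A consumer for these events might look like the sketch below. It assumes event_queue behaves like an asyncio.Queue and that the producer pushes a None sentinel when the session is over; both are assumptions of this example, not documented Cortex behaviour. The three dataclasses are hypothetical stand-ins for the classes in cortex.streaming.status_events, reduced to the attributes shown above, so the snippet runs on its own.

```python
import asyncio
from dataclasses import dataclass, field


# Hypothetical stand-ins for the real event classes (subset of attributes).
@dataclass
class StatusEvent:
    message: str


@dataclass
class ResultEvent:
    content: str
    partial: bool = False


@dataclass
class ClarificationEvent:
    question: str
    clarification_id: str
    options: list = field(default_factory=list)


async def drain(queue: asyncio.Queue) -> list:
    """Consume events until a None sentinel (assumed by this sketch) arrives."""
    log = []
    while True:
        event = await queue.get()
        if event is None:
            break
        if isinstance(event, StatusEvent):
            log.append(f"status: {event.message}")
        elif isinstance(event, ResultEvent):
            kind = "partial" if event.partial else "final"
            log.append(f"{kind}: {event.content}")
        elif isinstance(event, ClarificationEvent):
            log.append(f"clarify[{event.clarification_id}]: {event.question}")
    return log


async def demo() -> list:
    """Feed a fixed sequence of events through the consumer."""
    queue: asyncio.Queue = asyncio.Queue()
    for ev in (
        StatusEvent("Executing task: web_research"),
        ResultEvent("Draft", partial=True),
        ClarificationEvent("Which time period should I analyse?", "c-1"),
        ResultEvent("Final answer"),
        None,
    ):
        queue.put_nowait(ev)
    return await drain(queue)


events_log = asyncio.run(demo())
print("\n".join(events_log))
```

In a real application the ClarificationEvent branch would show the question to the user and pass clarification_id back to resolve_task_clarification(); its signature is not shown in this guide, so the sketch only logs the event.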
| Provider | Config value | Default env var | Example models |
|---|---|---|---|
| Anthropic | anthropic | ANTHROPIC_API_KEY | claude-sonnet-4, claude-opus-4, claude-haiku-4.5 |
| OpenAI | openai | OPENAI_API_KEY | gpt-4o, gpt-4o-mini, o3-mini |
| Google Gemini | gemini | GEMINI_API_KEY | gemini-2.5-pro, gemini-2.5-flash |
| xAI Grok | grok | XAI_API_KEY | grok-3, grok-3-mini |
| Mistral | mistral | MISTRAL_API_KEY | mistral-large-latest |
| DeepSeek | deepseek | DEEPSEEK_API_KEY | deepseek-chat, deepseek-reasoner |
| AWS Bedrock | bedrock | AWS_ACCESS_KEY_ID | anthropic.claude-sonnet-4-* |
| Azure AI | azure_ai | AZURE_API_KEY | claude-sonnet-4 (via Azure) |
| Anthropic Proxy | anthropic_compatible | ANTHROPIC_API_KEY | Any (set base_url) |
| Local Runtime | local | LOCAL_LLM_API_KEY (optional) | gemma4:e4b, gemma4:26b, or any Ollama / LM Studio / vLLM tag |
| Custom | custom | — | Provide function dotted path |
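A small preflight check can catch a missing key before the first session starts. The sketch below copies the provider-to-variable mapping from the table above; missing_credential() is a hypothetical helper written for this guide and is not part of Cortex. It treats local (optional key) and custom (no default variable) as never missing.

```python
import os

# Default environment variable per provider, copied from the provider table.
DEFAULT_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "grok": "XAI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "bedrock": "AWS_ACCESS_KEY_ID",
    "azure_ai": "AZURE_API_KEY",
    "anthropic_compatible": "ANTHROPIC_API_KEY",
}


def missing_credential(provider, env=None):
    """Return the name of the unset variable for `provider`, or None.

    Providers without a required default variable (local, custom) and
    unknown provider names return None.
    """
    env = os.environ if env is None else env
    var = DEFAULT_ENV_VARS.get(provider)
    if var is None:
        return None
    return None if env.get(var) else var


# Passing an explicit dict keeps the example independent of the real environment.
print(missing_credential("openai", env={}))
print(missing_credential("openai", env={"OPENAI_API_KEY": "sk-test"}))
```

If your cortex.yaml sets api_key_env_var to a non-default name, check that name instead of the table default.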
User Request
      │
      ▼
[Primary Agent] ──── decomposes into task graph ────► [Task A]        [Task B]
                                                          │               │
                                                     [MCP Agent]     [MCP Agent]
                                                          │               │
                                                    (tool calls)    (tool calls)
                                                          │               │
                                                      [Task C depends on A + B]
                                                                  │
                                                             [MCP Agent]
                                                                  │
                                                     [Primary Agent synthesises]
                                                                  │
                                                      [Validation Agent scores]
                                                                  │
                                                     [Learning Engine observes]
                                                                  │
                                                           Final Response
| Component | Role |
|---|---|
| Primary Agent | Decomposes requests into a task graph, synthesises final response |
| Generic MCP Agent | Executes individual tasks with access to MCP tool servers |
| Task Graph Compiler | Validates dependencies, detects cycles, computes execution order |
| Capability Scout | Pre-decomposition tool discovery so the agent knows what’s available |
| Validation Agent | Scores responses on intent match, completeness, and coherence |
| Learning Engine | Signal-gated autonomic evolution — stages deltas and refines blueprints at session end |
| WorkspaceBash | Workspace-scoped file read/write and command execution with mandatory HITL before any mutating operation |
| Session Manager | Concurrency limits, per-user caps, session resume after timeout |
| Signal Registry | Coordinates async completion across parallel tasks |
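To make the Task Graph Compiler row concrete, the sketch below orders the A, B, C graph from the diagram using Python's standard graphlib module. It illustrates the general technique (topological sorting with cycle detection) and is not Cortex's actual compiler; execution_stages() and the task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def execution_stages(dependencies):
    """Group tasks into stages; every task in a stage has all dependencies met,
    so the tasks of one stage could run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Each key maps a task to the set of tasks it depends on.
graph = {"task_a": set(), "task_b": set(), "task_c": {"task_a", "task_b"}}
stages = execution_stages(graph)
print(stages)

# A graph where x and y depend on each other is rejected.
try:
    execution_stages({"x": {"y"}, "y": {"x"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
print(cycle_detected)
```

Here task_a and task_b land in the first stage and task_c in the second, matching the fan-out and join shown in the diagram.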
| Mode | Who calls it | Best for |
|---|---|---|
| Chat UI | Human via browser | Customer-facing apps, dashboards, internal tools |
| MCP Server | Another agent via MCP | Agent composition, specialised sub-agents |
| CLI Tool | Developer in terminal | Dev tools, ops automation, scripts |
| Background Worker | Job queue (Celery/SQS) | Batch processing, scheduled reports |
| Embedded Library | Your existing app | Adding AI to Django/Flask/FastAPI apps |
| Docker Microservice | Other services via HTTP | Production, cloud, CI/CD |
| Python Package | End users via pip | Distributing pre-configured agents |
# Unit tests (no API key required)
pytest tests/ -v -k "not integration"
# Integration tests (requires API key)
ANTHROPIC_API_KEY=sk-... pytest tests/ -v
# With coverage
pytest tests/ --cov=cortex --cov-report=html
Cortex ships with test utilities:
from cortex.testing import MockLLMClient, make_test_config
cfg = make_test_config(agent_name="TestAgent", task_types=["web_search", "summarise"])
mock_llm = MockLLMClient(responses={"default": "Mock response"})
| Variable | Description |
|---|---|
| ANTHROPIC_API_KEY | Anthropic provider API key |
| OPENAI_API_KEY | OpenAI provider API key |
| GEMINI_API_KEY | Google Gemini provider API key |
| XAI_API_KEY | xAI Grok provider API key |
| MISTRAL_API_KEY | Mistral AI provider API key |
| DEEPSEEK_API_KEY | DeepSeek provider API key |
| AWS_DEFAULT_REGION | AWS region for Bedrock |
| AZURE_AI_API_KEY | Azure AI provider API key |
| LOCAL_LLM_API_KEY | Optional auth for the local provider (Ollama / LM Studio / vLLM) |
| CORTEX_CONFIG | Override default config path |
| CORTEX_LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) |
| CORTEX_INTERACTION_MODE | Override agent.interaction_mode (interactive or rpc). Set automatically by cortex publish mcp. |
| CORTEX_HITL_URL | Set automatically on ant subprocesses so WorkspaceBash HITL prompts relay to the parent session. Not set manually in normal use. |
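If your wrapper application needs the same overrides, it can read them itself. The sketch below resolves CORTEX_CONFIG and CORTEX_LOG_LEVEL from the environment; read_settings() is a hypothetical helper, and the fallbacks used here (cortex.yaml and INFO) are assumptions of this example rather than documented Cortex defaults.

```python
import logging
import os

VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


def read_settings(env=None):
    """Resolve the config path and log level from environment variables.

    The fallback values are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    level = env.get("CORTEX_LOG_LEVEL", "INFO").upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"Unsupported CORTEX_LOG_LEVEL: {level!r}")
    return {
        "config_path": env.get("CORTEX_CONFIG", "cortex.yaml"),
        "log_level": getattr(logging, level),  # e.g. logging.DEBUG
    }


# An explicit dict keeps the example independent of the real environment.
settings = read_settings({"CORTEX_CONFIG": "/etc/agent/cortex.yaml",
                          "CORTEX_LOG_LEVEL": "debug"})
print(settings["config_path"], logging.getLevelName(settings["log_level"]))
```

The resolved path can then be passed to CortexFramework(...) and the level to logging.basicConfig(level=...).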
MIT License. See LICENSE for details.