this page runs it live — Python 3 · Elixir · WasmGC

Run agent-written Python as a function call

The sandbox is a function call.

pyex is a Python 3 interpreter written in Elixir, built as an execution substrate for agent loops. Sandboxed code never touches a Python runtime, a process, or your filesystem — it reaches an interpreter that sees only the capabilities you pass in. Tenant boot: microseconds. Container cold start: seconds.

{:pyex, "~> 0.1"}

# the whole pitch: the model emits Python, your tools are host functions,
# isolation is deny-by-default — and it's all one function call.
tools = %{
  "search" => {:builtin, fn [q] -> MyApp.Search.run(q) end},
  "fetch"  => {:builtin, fn [url] -> MyApp.HTTP.get(url) end}
}

{:ok, result, ctx} = Pyex.run(code_the_model_wrote,
  modules: %{"agent" => %{"tools" => tools}},
  filesystem: %{"notes.md" => scratchpad},
  limits: [timeout: 5_000, max_memory_bytes: 50_000_000])

live: this very interpreter, compiled to WasmGC, in this page's Worker

fresh sandbox per request · deterministic step budget

why code mode needs a different shape

Ten tool calls become one program.
Now who runs the program?

Tool-calling agents pay one model round-trip per action. Code mode collapses ten tool calls into one program — but now you're executing untrusted, model-written code on every step. The industry answer is a VM per step, which reintroduces the latency you were removing and puts an RPC boundary between the agent's code and every tool it calls. pyex's answer: interpret the Python yourself, in-process — a step costs microseconds, and a tool call is a host function dispatch. No IPC, no marshalling, no path from Python source to an OS process.
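
To make the collapse concrete, here is a minimal sketch of the kind of single program a model might emit in code mode. The `search` and `fetch` functions are hypothetical stand-ins for host tools, stubbed locally so the sketch runs anywhere; under pyex they would be the host functions you pass in.

```python
# Hypothetical stand-ins for host tools; a real host would inject these.
def search(query):
    return [f"https://example.com/{query}/{i}" for i in range(3)]

def fetch(url):
    return f"contents of {url}"

def research(query):
    # One program: one search plus three fetches, with no model
    # round-trip between any of the four tool calls.
    urls = search(query)
    pages = [fetch(u) for u in urls]
    return {"query": query, "pages": pages}

report = research("beam")
print(len(report["pages"]))
```

In a tool-calling loop, each of those four calls would have cost a separate model round-trip.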

Operation	Sandbox service / microVM	pyex
Start a run	seconds cold, or a warm pool to manage	~200 µs, no pool
Call a tool	HTTP/RPC round-trip	Elixir function dispatch
Keep agent state	serialize + ship	a value on your heap
Per-tenant cost	a VM	a struct

the trust boundary is a diff you can read

Everything the program can see
is an argument.

open() writes to the map you passed in. requests.get hits your allowlist. There is no os.exec to call, because the interpreter never implements one. And a static analyzer walks the compiled BEAM code on every CI run and fails the build if anything under lib/pyex references File, Port, System.cmd, spawn, or the host environment. The sandbox guarantee is a CI gate, not a code-review promise.

# Deny by default. Every effect is a capability you chose to hand in.
Pyex.run(source,
  filesystem: %{"data.json" => json},           # open() sees only this
  network: [%{allowed_url_prefix: "https://api.example.com/"}],
  env: %{"API_KEY" => key},                     # injected, never in source
  limits: [timeout: 5_000, max_memory_bytes: 50_000_000])
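
As an illustration of what a URL prefix allowlist means in practice, here is a minimal sketch of the check. The `is_allowed` helper and `ALLOWED` list are hypothetical names for this example, not pyex's implementation, which may normalize URLs or apply further rules.

```python
# Illustrative only: a prefix allowlist check, not pyex's actual code.
ALLOWED = ["https://api.example.com/"]

def is_allowed(url, allowed_prefixes):
    """Deny by default: a URL passes only if it starts with an allowed prefix."""
    return any(url.startswith(prefix) for prefix in allowed_prefixes)

print(is_allowed("https://api.example.com/v1/items", ALLOWED))
print(is_allowed("https://api.example.com.evil.test/", ALLOWED))
```

The trailing slash in the prefix matters in this sketch: without it, a lookalike host such as api.example.com.evil.test would match.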

And every run returns an unforgeable capability ledger — an OpenTelemetry span tree of every file, URL, and store the program touched, even when it crashed. Preview effects before they happen: copy-on-write overlays stage open(...).write and store.put for review, then commit/1 applies exactly the run you approved — deterministic under a seed, so there is no time-of-check/time-of-use gap.
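
The staging idea can be sketched in a few lines. The `Overlay` class below is a hypothetical illustration of copy-on-write over a plain mapping, not pyex's overlay or its commit/1; it only shows why the base stays untouched until a reviewer approves.

```python
class Overlay:
    """Stage writes on top of a base mapping without mutating it."""

    def __init__(self, base):
        self.base = base
        self.staged = {}

    def read(self, path):
        # Staged writes shadow the base, so the guest sees its own edits.
        if path in self.staged:
            return self.staged[path]
        return self.base[path]

    def write(self, path, data):
        self.staged[path] = data

    def pending(self):
        # What a reviewer would inspect before approving.
        return dict(self.staged)

    def commit(self):
        # Apply exactly the staged writes, returning a new mapping.
        return {**self.base, **self.staged}

base = {"notes.md": "old"}
fs = Overlay(base)
fs.write("notes.md", "new")
fs.write("out.txt", "hello")
print(fs.pending())
```

Because commit applies the recorded writes rather than re-running the program, what gets applied is what was reviewed.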

the loop itself is sandboxed

Most sandboxes run the tool code.
pyex runs the controller.

# The model wrote this. It runs up to 10 steps without a single
# network hop between the code and the tools.
import json
from agent import call_model, tools

state = {"steps": []}
for _ in range(10):
    decision = call_model(state)
    if decision["action"] == "stop":
        break
    result = tools[decision["tool"]](*decision["args"])
    state["steps"].append({"tool": decision["tool"], "result": result})
print(json.dumps(state))

Generators are continuations, so a step can pause and resume without owning a process. asyncio.gather interleaves like CPython. Retries, planners, eval harnesses — the loop logic the model emits just runs. See examples/research_agent.py for the runnable proof.
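
A minimal sketch of the generator-as-continuation idea, in plain Python: each `yield` suspends the agent at a tool request, and the driver resumes it with the result. The `TOOLS` table and `drive` function are hypothetical names for this example, and nothing here depends on pyex.

```python
def agent():
    # Each yield hands a tool request to the host and suspends here.
    hits = yield ("search", "beam scheduler")
    page = yield ("fetch", hits[0])
    return f"summary of {page}"

# Hypothetical host-side tools.
TOOLS = {
    "search": lambda q: [f"https://example.com/{q.replace(' ', '-')}"],
    "fetch": lambda url: f"<page {url}>",
}

def drive(gen):
    """Resume the generator with each tool result until it returns."""
    try:
        request = next(gen)
        while True:
            name, arg = request
            request = gen.send(TOOLS[name](arg))
    except StopIteration as done:
        return done.value

result = drive(agent())
print(result)
```

Between a `yield` and the matching `send`, the suspended step is just a value the host holds; no thread or process is parked waiting for the tool.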

numbers, reproducibly

The command is the marketing.

Workload	p50	p99
FizzBuzz (100 iterations)	182 µs	238 µs
Algorithms suite (~150 LOC: sieve + sort + fib + stats)	1.67 ms	2.04 ms
FastAPI cold boot	221 µs	302 µs
FastAPI route — list + Jinja2 render	108 µs	166 µs
FastAPI route — 404	9 µs	19 µs

mix run bench/readme_bench.exs

The honest tradeoff: 10–100× slower than CPython for pure CPU work — and it doesn't matter, because agent steps are dominated by tool I/O, JSON shaping, and routing. Compute budgets exclude I/O time: an agent waiting on a slow tool isn't killed for it; an infinite loop is.
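
One way to picture a budget that charges compute but not I/O is a driver that counts steps instead of wall-clock time. This is a toy sketch under that assumption: the `run` driver, the `("step",)` and `("io", fn)` operations, and `BudgetExceeded` are all hypothetical, and pyex's real accounting is not shown in this page.

```python
import time

class BudgetExceeded(Exception):
    pass

def run(program, max_steps):
    """Drive a generator; charge compute steps, never I/O waits."""
    steps = 0
    for op in program:
        if op[0] == "io":
            op[1]()      # host performs the I/O; not charged
            continue
        steps += 1       # one unit of guest compute
        if steps > max_steps:
            raise BudgetExceeded(f"exceeded {max_steps} steps")
    return steps

def slow_tool_then_work():
    yield ("io", lambda: time.sleep(0.05))
    for _ in range(10):
        yield ("step",)

def infinite_loop():
    while True:
        yield ("step",)

print(run(slow_tool_then_work(), max_steps=100))
```

The slow tool call finishes within budget however long it sleeps, while `infinite_loop` is stopped at the same step count on every run.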

multi-tenancy

A tenant is a value.

A booted app is a struct on your heap. 100,000 tenants is a benchmark file (bench/multitenant_scaling_bench.exs), not a capacity-planning meeting. Storage multitenancy is an object boundary, not a tenant_id filter someone forgets.

{:ok, app}       = Pyex.Lambda.boot(model_generated_fastapi_source)
{:ok, resp, app} = Pyex.Lambda.handle(app, %{method: "GET", path: "/hello/world"})
# boot once, handle many; state threads through; tenants serialize like any value
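
The same shape can be sketched in Python to show what "a tenant is a value" buys you. The `App` dataclass and `handle` function below are hypothetical and only mirror the boot-once, handle-many pattern; they are not the Pyex.Lambda API.

```python
import json
from dataclasses import asdict, dataclass, replace

@dataclass(frozen=True)
class App:
    tenant: str
    hits: int = 0

def handle(app, request):
    """Pure handler: returns a response and the next app value."""
    next_app = replace(app, hits=app.hits + 1)
    name = request["path"].rsplit("/", 1)[-1]
    response = {"status": 200, "body": f"hello {name} (hit {next_app.hits})"}
    return response, next_app

app = App(tenant="acme")
resp, app = handle(app, {"method": "GET", "path": "/hello/world"})
resp, app = handle(app, {"method": "GET", "path": "/hello/world"})

# A tenant is plain data, so it serializes and restores like any value.
snapshot = json.dumps(asdict(app))
restored = App(**json.loads(snapshot))
print(resp["body"], restored == app)
```

Since state only moves through return values, two tenants cannot share state unless the host hands them the same value.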

trust, itemized

How we know it works.

Differentially fuzzed against CPython — outputs and exception types must match.
Byte-for-byte repr conformance suite, plus fixture programs checked against CPython ground truth.
5,073 IBM dectest vectors pass for decimal.
Property tests assert malformed input never crashes the host — it returns a Python error.
Dialyzer-clean, with @spec on the public surface; the banned-call tracer fails CI if the sandbox boundary regresses.
Real workloads as end-to-end tests: a webhook handler, a DCF model, an SSR blog, a research agent.
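
For readers unfamiliar with the first item, this is roughly what a differential check looks like: run the same input through a reference and a candidate, and compare both the value and the exception type. The functions below are hypothetical toys, not pyex's fuzzing harness.

```python
import random

def outcome(fn, arg):
    """Capture either the value or the exception type."""
    try:
        return ("ok", fn(arg))
    except Exception as exc:
        return ("raise", type(exc).__name__)

def differential(reference, candidate, inputs):
    """Return every input where the two implementations disagree."""
    return [x for x in inputs if outcome(reference, x) != outcome(candidate, x)]

def reference_div(x):
    return 100 // x

def buggy_div(x):
    # Deliberate bug: swallows division by zero instead of raising.
    return 0 if x == 0 else 100 // x

rng = random.Random(0)
inputs = list(range(-5, 6)) + [rng.randint(-1000, 1000) for _ in range(50)]
mismatches = differential(reference_div, buggy_div, inputs)
print(sorted(set(mismatches)))
```

Comparing exception types as well as outputs is what catches the bug here: the candidate returns a value where the reference raises.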

defense in depth

Three layers, each named.

pyex stops the 99% cooperatively: step, memory, and output budgets with clean Python errors. The BEAM stops the rest unconditionally when you run each guest in a monitored process with a GC-enforced max_heap_size and a wall-clock kill (examples/sandbox_server.exs is the copy-paste). A microVM around the whole node stops the adversary. One ops property worth quoting: the guest can't move your 5xx rate, because verdicts (ok / error / timeout / OOM) are body fields and HTTP status describes only your service.
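
The verdict-as-body-field property can be sketched as a small envelope. Everything here is a hypothetical illustration of the pattern (the `run_guest` and `http_response` names, the exception-to-verdict mapping, the field names), not the code in examples/sandbox_server.exs.

```python
def run_guest(fn, max_output=1000):
    """Map every guest outcome to a verdict field; never raise to the host."""
    try:
        out = fn()
    except TimeoutError:
        return {"verdict": "timeout"}
    except MemoryError:
        return {"verdict": "oom"}
    except Exception as exc:
        return {"verdict": "error", "error": type(exc).__name__}
    return {"verdict": "ok", "output": str(out)[:max_output]}

def http_response(fn):
    # The status describes the service, so a guest failure is still a 200.
    return 200, run_guest(fn)

def too_slow():
    raise TimeoutError()

print(http_response(lambda: 1 / 0))
print(http_response(too_slow))
```

With this shape, a spike in guest errors shows up in verdict counts, while 5xx stays reserved for failures of the service itself.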

What it isn't

pyex is a hardened library, not a microVM. Against a sophisticated adversary it composes with stronger isolation rather than replacing it. It runs the Python that agents actually write (json, re, asyncio, pydantic, requests, fastapi, partial pandas), not all of CPython. And it's an interpreter: pure CPU work runs 10–100× slower than CPython, which agent workloads don't notice. Naming our own boundary is the point.

Your agent writes Python.
Run it on your heap.

{:pyex, "~> 0.1"}

