Deploy inference to GPU clusters, TEE nodes, and major cloud providers — all through one async Python interface. No lock-in. No rewriting.
Axon is to edge compute what httpx is to HTTP — one client, any backend.
pip install axon
Your existing OpenAI SDK code works unchanged. Swap the base_url and api_key, and Axon automatically routes requests to the cheapest available provider across io.net, Akash, Acurast, AWS, and more.
- openai Python package, LangChain, LlamaIndex, DSPy
- httpx — compatible with FastAPI, Django Async, Starlette

```python
import asyncio

from axon import AxonClient
from axon.types import DeploymentConfig

async def main():
    async with AxonClient(
        provider="ionet",
        secret_key="your-key",
    ) as client:
        # Deploy AI workload to edge GPU
        deployment = await client.deploy(
            DeploymentConfig(
                name="my-inference-worker",
                entry_point="worker.py",
                memory_mb=4096,
                replicas=2,
            )
        )

        # Stream results from the worker
        client.on_message(lambda msg: print(f"Result: {msg.payload}"))

        await client.send(
            deployment.id,
            {"prompt": "Summarise this article..."},
        )

asyncio.run(main())
```
```python
import os

from openai import AsyncOpenAI

# ── Before: OpenAI ──────────────────────────
client = AsyncOpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
)

# ── After: Axon edge routing ─────────────────
# Change just these two lines — nothing else
client = AsyncOpenAI(
    base_url="http://localhost:8787/v1",
    api_key=os.getenv("AXON_SECRET_KEY"),
)

# Your existing code works unchanged ↓
response = await client.chat.completions.create(
    model="llama-3-8b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
async for chunk in response:
    print(chunk.choices[0].delta.content, end="")
```
Live provider health
All 10 providers operational — updated every 5 minutes.
Deploy to GPU clusters or major cloud platforms without changing your code. Axon routes to the fastest, cheapest available option.
Edge & Private compute
Cloud Providers Live
One pip install, one config, one async client — regardless of which provider you deploy to.
The core install covers every edge and cloud provider that needs only httpx. Add optional extras for the AWS, GCP, or Azure SDKs.
Copy .env.example to .env and fill in your provider keys. Run axon auth to validate them.
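For illustration only — the variable names below are hypothetical placeholders; your generated .env.example lists the exact keys each provider expects:

```shell
# .env — illustrative placeholder names, not the real key list
AXON_SECRET_KEY=your-axon-key
IONET_API_KEY=your-ionet-key
AWS_ACCESS_KEY_ID=your-aws-id
AWS_SECRET_ACCESS_KEY=your-aws-secret
```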
Point Axon at your Python or Node.js entry point. It handles bundling, upload, and registration automatically.
Add multiple providers and let the router pick the fastest or cheapest automatically — with circuit-breaker failover built in.
```shell
$ pip install axon

# Optional cloud SDK extras
$ pip install "axon[aws]"
$ pip install "axon[gcp]"
$ pip install "axon[azure]"

# Or everything at once
$ pip install "axon[all]"

# Initialise a new project
$ axon init my-worker

# Validate credentials
$ axon auth

# Deploy to ionet
$ axon deploy --provider ionet

# Check deployment status
$ axon status
```
```python
import asyncio

from axon.router import AxonRouter
from axon.types import DeploymentConfig, RoutingStrategy

async def main():
    config = DeploymentConfig(
        name="inference-worker",
        entry_point="worker.py",
        memory_mb=2048,
    )

    router = AxonRouter(
        providers=["ionet", "akash", "aws"],
        secret_key="your-axon-key",
        strategy=RoutingStrategy.LATENCY,
    )

    # Connects all providers concurrently,
    # tolerates individual failures
    async with router:
        estimates = await router.estimate_all(config)
        for e in sorted(estimates, key=lambda e: e.usd_estimate):
            print(f"{e.provider}: ${e.usd_estimate:.4f}/hr")

        # Deploy — router picks the best provider
        deployment = await router.deploy(config)
        print(f"Deployed on {deployment.provider}")

asyncio.run(main())
```
The AxonRouter connects to every provider concurrently, runs a background health loop, and routes deployments based on your chosen strategy — with automatic circuit-breaker failover.
estimate() calls before each deployment.
Circuit breaker states
```python
from axon.router import AxonRouter, CircuitBreaker
from axon.types import DeploymentConfig, RoutingStrategy

config = DeploymentConfig(
    name="inference-worker",
    entry_point="worker.py",
    memory_mb=2048,
)

# Custom circuit breaker — lower threshold for critical workloads
router = AxonRouter(
    providers=["ionet", "akash", "aws", "fly"],
    secret_key="your-key",
    strategy=RoutingStrategy.FAILOVER,
    health_check_interval=30.0,
)

async with router:
    # Inspect circuit state per provider
    for name, slot in router._slots.items():
        cb = slot.circuit
        print(f"{name}: {cb.state.value} "
              f"(failures: {cb.failure_count})")

    # estimate_all() fetches cost from every live provider
    estimates = await router.estimate_all(config)

    # Router auto-skips OPEN circuits
    deployment = await router.deploy(config)

    # health() returns ProviderHealth with latency_ms
    health = await router.health()
    for h in health:
        print(f"{h.provider}: {h.latency_ms:.0f}ms")
```
Built with Typer and Rich — coloured output, spinners, and interactive prompts out of the box. Install the extras for the full interactive experience.
axon.json, .env.example, and entry point
--provider and --config flags
pip install "axon[cli]"
Everything you need to route inference at scale — security, observability, and resilience built in.
Built on httpx and asyncio throughout. All providers are async context managers — connect, deploy, send, and disconnect without blocking your event loop.
All endpoint and IPFS URLs are validated against a private IP regex before any request is made — blocking 169.254.x.x, 10.x.x.x, 172.16–31.x.x, 192.168.x.x, and localhost.
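As a sketch of the idea — not Axon's actual implementation, which the text says is regex-based — the same private-range filtering can be expressed with the standard library's ipaddress module; the function name here is ours:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs whose host resolves to loopback, link-local, or private ranges."""
    host = urlparse(url).hostname
    if host is None or host == "localhost":
        return False
    try:
        # Resolve hostnames so DNS can't smuggle in a private address
        addr = socket.getaddrinfo(host, None)[0][4][0]
        ip = ipaddress.ip_address(addr)
    except (socket.gaierror, ValueError):
        return False
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)

print(is_safe_url("http://169.254.169.254/meta-data"))  # False — link-local
print(is_safe_url("http://10.0.0.5:8080/deploy"))       # False — private range
```

Resolving before checking matters: a regex on the URL string alone would pass a public hostname that resolves to 169.254.169.254.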
Environment variables ending in _KEY, _SECRET, _TOKEN, _PASSWORD, or _MNEMONIC are automatically stripped before any value reaches a cloud runtime.
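A minimal sketch of that scrubbing rule — the suffix list comes from the text above; the function name is ours:

```python
SENSITIVE_SUFFIXES = ("_KEY", "_SECRET", "_TOKEN", "_PASSWORD", "_MNEMONIC")

def scrub_env(env: dict[str, str]) -> dict[str, str]:
    """Drop any variable whose name ends in a sensitive suffix."""
    return {
        name: value
        for name, value in env.items()
        if not name.upper().endswith(SENSITIVE_SUFFIXES)
    }

safe = scrub_env({
    "MODEL_NAME": "llama-3-8b",
    "AXON_SECRET_KEY": "sk-...",
    "DB_PASSWORD": "hunter2",
})
# Only MODEL_NAME survives the scrub
```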
Per-provider circuit breakers with configurable failure thresholds and recovery timeouts. Unhealthy providers are automatically skipped and retried — no cascading failures.
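The state machine behind that failover is the standard one: CLOSED while healthy, OPEN after a run of consecutive failures, HALF_OPEN once the recovery timeout elapses. A minimal sketch of the pattern — not Axon's internal class:

```python
import enum
import time

class State(enum.Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.opened_at = 0.0
        self._state = State.CLOSED

    @property
    def state(self) -> State:
        # OPEN decays to HALF_OPEN once the recovery timeout elapses,
        # letting one probe request through
        if (self._state is State.OPEN
                and time.monotonic() - self.opened_at >= self.recovery_timeout):
            self._state = State.HALF_OPEN
        return self._state

    def record_success(self) -> None:
        self.failure_count = 0
        self._state = State.CLOSED

    def record_failure(self) -> None:
        self.failure_count += 1
        if self.failure_count >= self.failure_threshold:
            self._state = State.OPEN
            self.opened_at = time.monotonic()

    def allows_request(self) -> bool:
        return self.state is not State.OPEN
```

An OPEN circuit is simply skipped by the router, so one flapping provider never stalls deployments to the healthy ones.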
All config, deployment, cost, and health objects are fully typed Pydantic v2 models. IDE completion, runtime validation, and JSON serialisation all included.
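For instance, a deployment config modeled this way validates at construction time and serialises straight to JSON — field names follow the examples above, but these particular constraints are illustrative:

```python
from pydantic import BaseModel, Field, ValidationError

class DeploymentConfig(BaseModel):
    name: str = Field(min_length=1)
    entry_point: str
    memory_mb: int = Field(default=512, ge=128)
    replicas: int = Field(default=1, ge=1)

config = DeploymentConfig(name="inference-worker", entry_point="worker.py", memory_mb=2048)
print(config.model_dump_json())

try:
    # Rejected at construction — memory_mb below the floor
    DeploymentConfig(name="bad", entry_point="worker.py", memory_mb=1)
except ValidationError as err:
    print(f"{err.error_count()} validation error")
```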
Implement IAxonProvider ABC and register in PROVIDER_REGISTRY. Any custom backend — private cloud, on-prem, exotic hardware — slots in automatically.
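In outline — the ABC and registry names come from the text, but the exact method set is our assumption — a custom provider subclasses the ABC and registers itself:

```python
from abc import ABC, abstractmethod

# Simplified stand-ins for axon's IAxonProvider and PROVIDER_REGISTRY
PROVIDER_REGISTRY: dict[str, type["IAxonProvider"]] = {}

class IAxonProvider(ABC):
    @abstractmethod
    async def connect(self) -> None: ...

    @abstractmethod
    async def deploy(self, config) -> str: ...

    @abstractmethod
    async def disconnect(self) -> None: ...

class OnPremProvider(IAxonProvider):
    """Hypothetical backend for a private GPU rack."""

    async def connect(self) -> None:
        print("connected to on-prem cluster")

    async def deploy(self, config) -> str:
        return "onprem-deployment-1"

    async def disconnect(self) -> None:
        pass

# Registration makes the backend selectable as provider="onprem"
PROVIDER_REGISTRY["onprem"] = OnPremProvider
```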
Call estimate() before deploying to get a USD breakdown per provider. Compare across all 10 providers in one estimate_all() call.
Install axon[inference] for a FastAPI server exposing /v1/models and /v1/chat/completions — drop-in replacement for OpenAI's endpoint.
The @axonsdk monorepo brings the same provider-agnostic interface to the JavaScript ecosystem — with packages for Node.js, CLI, OpenAI-compatible inference, and React Native mobile.
- @axonsdk/sdk — core client for Node.js and edge runtimes
- @axonsdk/inference — OpenAI-compatible endpoint for Express / Next.js
- @axonsdk/mobile — React Native hooks for iOS & Android
- @axonsdk/cli — same axon commands for JS projects

```javascript
import { AxonClient } from '@axonsdk/sdk';

const client = new AxonClient({
  provider: 'ionet',
  secretKey: process.env.AXON_SECRET_KEY,
});

await client.connect();

client.onMessage((msg) => {
  console.log('Result:', msg.payload.result);
});

await client.send('worker-id', {
  prompt: 'Summarise this article…',
});

await client.disconnect();
```
One pip install. Any provider. Zero lock-in. MIT licensed and fully open source.
pip install axon