The fastest inference API for APAC.
50+ models. Sub-100ms TTFT from Singapore. Data stays in-region.
Built with teams shipping AI in APAC
```python
# Before: US-routed, 300ms+ latency from APAC
import os

from openai import OpenAI

# After: change one line. Sub-100ms from Singapore.
client = OpenAI(
    base_url="https://api.brightnode.cloud/v1",
    api_key=os.environ["BRIGHTNODE_API_KEY"],
)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello"}],
)
```

Compact proof from Singapore
TTFT and latency are shown side-by-side with reliability metrics. Full reproducible methodology lives on `/performance`.
| Metric | Global | Brightnode |
|---|---|---|
| End-to-end latency p50 | 733ms | 199ms |
| TTFT p50 (streaming) | 840ms | 94ms |
| End-to-end latency p95 | 1,643ms | 318ms |
| TTFT p95 (streaming) | 1,393ms | 142ms |
8.9x faster first token where users feel it.
This panel translates benchmark numbers into user-facing impact: faster response perception, lower p95 spikes, and stronger reliability under real traffic patterns.
- TTFT p50: from 840ms to 94ms (8.9x faster)
- TTFT p95: from 1,393ms to 142ms (9.8x faster)
- Global baseline error rate observed at 3.3%
Benchmarked from Singapore against global router baselines. Full reproducible methodology is published on the performance page.
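The TTFT figures above can be reproduced with a simple streaming timer: start a clock, request a streamed completion, and stop at the first chunk. A minimal sketch; the `measure_ttft` helper and the stand-in generator are illustrative, not part of any SDK — in practice you would pass the iterator returned by `client.chat.completions.create(..., stream=True)`:

```python
import time


def measure_ttft(stream):
    """Return (seconds to first chunk, the first chunk itself).

    `stream` is any iterator of chunks, e.g. the OpenAI SDK's
    streaming response. Only the wait for the first item is timed.
    """
    start = time.perf_counter()
    first = next(stream)
    return time.perf_counter() - start, first


# Stand-in generator simulating a 50ms time-to-first-token;
# replace with a real streaming API call.
def fake_stream():
    time.sleep(0.05)
    yield "Hello"


ttft, first_chunk = measure_ttft(fake_stream())
print(f"TTFT: {ttft * 1000:.0f}ms, first chunk: {first_chunk!r}")
```

For p50/p95 numbers, run the measurement repeatedly and take percentiles over the collected samples.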
Popular production models across APAC. One API. Data stays in-region.
Claude Sonnet, Llama, Qwen, DeepSeek, and more with current per-1M pricing, context, region, and latency in one view.
| Model | Provider | Input / 1M | Output / 1M | Context (tokens) | APAC regions | Latency (SG/TYO/SYD) | Residency |
|---|---|---|---|---|---|---|---|
| Claude Sonnet 4 | Anthropic | $3 | $15 | 200,000 | Singapore, Sydney, Tokyo, Thailand, Malaysia, Jakarta, New Zealand, Seoul, Taiwan, Mumbai | 83ms / 98ms / 74ms | in-region |
| Claude Haiku 4.5 | Anthropic | $1 | $5 | 200,000 | Singapore, Jakarta, Malaysia, Thailand, Tokyo, Seoul, Taiwan, Mumbai, Sydney, New Zealand | 59ms / 71ms / 61ms | in-region |
| Llama 3.3 70B Instruct | Meta | $0.22 | $0.50 | 131,072 | Singapore | 27ms / 41ms / 34ms | in-region |
| DeepSeek V3 | DeepSeek | $0.60 | $1.74 | 163,840 | Jakarta, Singapore, Malaysia, Thailand, Tokyo, Seoul, Taiwan, Mumbai, Sydney, New Zealand | 40ms / 58ms / 51ms | in-region |
| Qwen3 32B | Qwen | $0.10 | $1.20 | 131,072 | Singapore | 31ms / 45ms / 39ms | in-region |
| Mistral Nemo | Mistral | $0.15 | $0.15 | 131,072 | Singapore | 38ms / 54ms / 47ms | in-region |
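Per-1M pricing makes bill estimation a simple linear calculation. A minimal sketch using the Claude Haiku 4.5 row above ($1 input / $5 output per 1M tokens); the monthly traffic figures are hypothetical:

```python
def estimate_cost(input_tokens, output_tokens, in_per_m, out_per_m):
    # Per-1M-token pricing: cost scales linearly with token counts.
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m


# Hypothetical month: 2M input tokens, 0.5M output tokens
# at Claude Haiku 4.5 rates ($1 / $5 per 1M).
monthly = estimate_cost(2_000_000, 500_000, 1.00, 5.00)
print(f"${monthly:.2f}")  # $4.50
```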
Embedded playground preview
Demo mode works instantly. Live beta mode can run a real request when you provide an API key.
Run a prompt to preview streamed output.
Demo mode uses representative text. Live beta mode performs a direct request via the public API proxy.
```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.brightnode.cloud/v1",
    api_key=os.environ["BRIGHTNODE_API_KEY"],
)
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    stream=True,
    messages=[{"role": "user", "content": "Hello APAC"}],
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.brightnode.cloud/v1",
  apiKey: process.env.BRIGHTNODE_API_KEY,
});
const completion = await client.chat.completions.create({
  model: "meta-llama/Llama-3.3-70B-Instruct",
  messages: [{ role: "user", content: "Hello APAC" }],
});
console.log(completion.choices[0].message.content);
```

Choose the path that matches your stage.
Router for serving, managed models for fast launch, and Workspaces for pre-production development.
Inference Router
Route API traffic to APAC-first inference paths with OpenAI-compatible requests and clear model controls.
Dedicated Endpoints
Deploy reserved APAC inference capacity for custom checkpoints, LoRA workflows, and enterprise traffic.
GPU Workspaces
Fine-tune and evaluate in APAC, then deploy to Brightnode Inference through the same product workflow.
Join the First 100 Startup Teams
Early access for approved startups. We're onboarding teams across Singapore, Jakarta, and Bangkok.
