2026.05.24: by hermes + gpt-5.5

LLM API Families, LiteLLM, Codex Responses API, and Compiled-Language Gateway Alternatives

Why “completions API” is confusing

“Completion” is used in at least three ways:

  1. Generic generation concept: a model completes/generates output.
  2. Legacy OpenAI Completions API: POST /v1/completions, prompt string in, text string out.
  3. OpenAI Chat Completions API: POST /v1/chat/completions, role-tagged messages in, assistant message out.

So always distinguish:

/v1/completions       = legacy prompt completions
/v1/chat/completions  = chat completions
/v1/responses         = new agent-oriented Responses API

Mainstream LLM API families

FamilyEndpoint examplesInput shapeAgent suitabilityStatus
Legacy prompt completions/v1/completions, Ollama /api/generatesingle prompt stringPoorLegacy/simple generation
Chat Completions/v1/chat/completionsmessages[] with system/user/assistant/tool rolesGoodVery common compatibility target
Messages APIAnthropic /v1/messagesprovider-specific messages[]; separate system promptGoodAnthropic-style modern API
Responses APIOpenAI /v1/responsesstructured input[] / output itemsBestOpenAI’s new direction, used by new Codex
Gemini nativemodels/{model}:generateContent, :streamGenerateContentcontents[] with parts[]GoodGoogle-native API
Bedrock ConverseConverse, ConverseStreamAWS unified message/tool schemaGoodAWS cross-model API
Provider-specific invokeBedrock InvokeModel, custom JSON APIsarbitrary provider schemaVariesLow-level escape hatch
Local runtime APIsOllama /api/chat, vLLM/llama.cpp OpenAI-compatible serversvariesVariesCommon for local models

Compiled-language alternatives to LiteLLM

ProjectLanguageClosest to LiteLLM?Main focusNotes for Codex /v1/responses
GoModelGoVery closeLightweight multi-provider AI gateway (~17 MB image, semantic caching, usage tracking)OpenAI-compatible passthrough; verify Responses API and streaming/tool parity
BifrostGoVery closeHigh-performance multi-provider AI gatewayPromising; verify Responses API and streaming/tool parity
Moon BridgeGoClose for Codex/Responses use cases, narrower as a general LiteLLM cloneOpenAI Responses API front door that routes/translates to Anthropic Messages, Google Gemini, OpenAI Chat Completions, or OpenAI Responses upstreamsStrong candidate when the decisive requirement is /v1/responses: it is explicitly built around Codex CLI + Responses streaming/tool-call translation; verify provider breadth and production maturity
BarbacaneRustCloseSpec-driven API gateway with bidirectional AI + MCP supportOpenAPI-spec-as-config; ai-proxy plugin supports Chat Completions + stateless Responses API; verify streaming/tool parity
OmniRouteNode.js/TypeScriptCloseFree self-hosted AI gateway (160+ providers, 13 routing strategies, semantic cache, MCP)MIT license; npm/Docker/Electron/Termux deploy; verify Responses API support
HecateGoCloseLocal-first AI runtime console + gateway for cloud/local modelsCoding-agent console, task approvals, OpenTelemetry; early stage (16 stars); verify maturity
Traceloop HubRustCloseLLM gateway + OpenTelemetry observabilityDocs emphasize chat/completions; verify Responses API
Envoy AI GatewayGo control plane + Envoy/C++ data planeClose, infra-nativeKubernetes/Envoy LLM gatewayGood infra story; verify Responses transform support
AISIXRustClose but youngRust AI gateway / LLM proxyPromising; verify maturity and Responses API
LocalAIGo + native backendsPartialLocal model serving with OpenAI-compatible APIUseful for local inference, not a pure multi-provider gateway
Kong AI GatewayNginx/OpenResty/Lua ecosystem, Go componentsPartial/enterpriseAPI gateway with AI pluginsStrong gateway features, not a simple LiteLLM clone
Apache APISIX AI GatewayOpenResty/Lua + etcd ecosystemPartial/infraAPI gateway with AI pluginsInfra-grade, less automatic provider normalization
agentgatewayRustPartialAI-native proxy for LLM/MCP/agent trafficInteresting for agent/MCP traffic; verify LLM provider breadth

Recommendation

If the goal is "LiteLLM but compiled-language and lower overhead," evaluate in this order:

  1. GoModel — smallest footprint (~17 MB), semantic caching, usage tracking, fully OSS.
  2. Bifrost — closest conceptual Go replacement; production-grade with MCP support.
  3. Moon Bridge — best fit among the listed Go projects when the immediate target is Codex CLI over /v1/responses; less broad than LiteLLM/GoModel/Bifrost as a general multi-provider gateway, but its architecture is explicitly Responses-first.
  4. Barbacane — Rust-native, spec-driven (OpenAPI-as-config), bidirectional AI + MCP gateway; best if you already use OpenAPI specs and want AI traffic governed same as other HTTP traffic.
  5. OmniRoute — MIT-licensed, 160+ providers, 13 routing strategies, semantic cache, MCP; TypeScript-based but self-hostable via npm/Docker.
  6. Hecate — local-first Go runtime with coding-agent console; early stage but interesting for agent workloads.
  7. Traceloop Hub — Rust gateway with observability.
  8. AISIX — Rust, promising but younger.
  9. Envoy AI Gateway — best if running Kubernetes/Envoy and wanting infra-grade control.

If the goal is specifically new Codex compatibility, the decisive requirement is:

Does the gateway expose /v1/responses and correctly translate streaming/tool-call events?

A fast Go/Rust gateway that only supports /v1/chat/completions will not by itself solve the new Codex problem.