2026.05.24: by hermes + gpt-5.5

LLM API Families, LiteLLM, Codex Responses API, and Compiled-Language Gateway Alternatives

Why “completions API” is confusing

“Completion” is used in at least three ways:

Generic generation concept: a model completes/generates output.
Legacy OpenAI Completions API: POST /v1/completions, prompt string in, text string out.
OpenAI Chat Completions API: POST /v1/chat/completions, role-tagged messages in, assistant message out.

So always distinguish:

/v1/completions       = legacy prompt completions
/v1/chat/completions  = chat completions
/v1/responses         = new agent-oriented Responses API

Mainstream LLM API families

Family	Endpoint examples	Input shape	Agent suitability	Status
Legacy prompt completions	`/v1/completions`, Ollama `/api/generate`	single prompt string	Poor	Legacy/simple generation
Chat Completions	`/v1/chat/completions`	`messages[]` with `system/user/assistant/tool` roles	Good	Very common compatibility target
Messages API	Anthropic `/v1/messages`	provider-specific `messages[]`; separate system prompt	Good	Anthropic-style modern API
Responses API	OpenAI `/v1/responses`	structured `input[]` / output items	Best	OpenAI’s new direction, used by new Codex
Gemini native	`models/{model}:generateContent`, `:streamGenerateContent`	`contents[]` with `parts[]`	Good	Google-native API
Bedrock Converse	`Converse`, `ConverseStream`	AWS unified message/tool schema	Good	AWS cross-model API
Provider-specific invoke	Bedrock `InvokeModel`, custom JSON APIs	arbitrary provider schema	Varies	Low-level escape hatch
Local runtime APIs	Ollama `/api/chat`, vLLM/llama.cpp OpenAI-compatible servers	varies	Varies	Common for local models

Compiled-language alternatives to LiteLLM

Project	Language	Closest to LiteLLM?	Main focus	Notes for Codex `/v1/responses`
GoModel	Go	Very close	Lightweight multi-provider AI gateway (~17 MB image, semantic caching, usage tracking)	OpenAI-compatible passthrough; verify Responses API and streaming/tool parity
Bifrost	Go	Very close	High-performance multi-provider AI gateway	Promising; verify Responses API and streaming/tool parity
Moon Bridge	Go	Close for Codex/Responses use cases, narrower as a general LiteLLM clone	OpenAI Responses API front door that routes/translates to Anthropic Messages, Google Gemini, OpenAI Chat Completions, or OpenAI Responses upstreams	Strong candidate when the decisive requirement is `/v1/responses`: it is explicitly built around Codex CLI + Responses streaming/tool-call translation; verify provider breadth and production maturity
Barbacane	Rust	Close	Spec-driven API gateway with bidirectional AI + MCP support	OpenAPI-spec-as-config; `ai-proxy` plugin supports Chat Completions + stateless Responses API; verify streaming/tool parity
OmniRoute	Node.js/TypeScript	Close	Free self-hosted AI gateway (160+ providers, 13 routing strategies, semantic cache, MCP)	MIT license; npm/Docker/Electron/Termux deploy; verify Responses API support
Hecate	Go	Close	Local-first AI runtime console + gateway for cloud/local models	Coding-agent console, task approvals, OpenTelemetry; early stage (16 stars); verify maturity
Traceloop Hub	Rust	Close	LLM gateway + OpenTelemetry observability	Docs emphasize chat/completions; verify Responses API
Envoy AI Gateway	Go control plane + Envoy/C++ data plane	Close, infra-native	Kubernetes/Envoy LLM gateway	Good infra story; verify Responses transform support
AISIX	Rust	Close but young	Rust AI gateway / LLM proxy	Promising; verify maturity and Responses API
LocalAI	Go + native backends	Partial	Local model serving with OpenAI-compatible API	Useful for local inference, not a pure multi-provider gateway
Kong AI Gateway	Nginx/OpenResty/Lua ecosystem, Go components	Partial/enterprise	API gateway with AI plugins	Strong gateway features, not a simple LiteLLM clone
Apache APISIX AI Gateway	OpenResty/Lua + etcd ecosystem	Partial/infra	API gateway with AI plugins	Infra-grade, less automatic provider normalization
agentgateway	Rust	Partial	AI-native proxy for LLM/MCP/agent traffic	Interesting for agent/MCP traffic; verify LLM provider breadth

Recommendation

If the goal is "LiteLLM but compiled-language and lower overhead," evaluate in this order:

GoModel — smallest footprint (~17 MB), semantic caching, usage tracking, fully OSS.
Bifrost — closest conceptual Go replacement; production-grade with MCP support.
Moon Bridge — best fit among the listed Go projects when the immediate target is Codex CLI over /v1/responses; less broad than LiteLLM/GoModel/Bifrost as a general multi-provider gateway, but its architecture is explicitly Responses-first.
Barbacane — Rust-native, spec-driven (OpenAPI-as-config), bidirectional AI + MCP gateway; best if you already use OpenAPI specs and want AI traffic governed same as other HTTP traffic.
OmniRoute — MIT-licensed, 160+ providers, 13 routing strategies, semantic cache, MCP; TypeScript-based but self-hostable via npm/Docker.
Hecate — local-first Go runtime with coding-agent console; early stage but interesting for agent workloads.
Traceloop Hub — Rust gateway with observability.
AISIX — Rust, promising but younger.
Envoy AI Gateway — best if running Kubernetes/Envoy and wanting infra-grade control.

If the goal is specifically new Codex compatibility, the decisive requirement is:

Does the gateway expose /v1/responses and correctly translate streaming/tool-call events?

A fast Go/Rust gateway that only supports /v1/chat/completions will not by itself solve the new Codex problem.

Xieby1's Nix/NixOS Config

LLM API Families, LiteLLM, Codex Responses API, and Compiled-Language Gateway Alternatives

Why “completions API” is confusing

Mainstream LLM API families

Compiled-language alternatives to LiteLLM

Recommendation