An authenticated API that queues long-running inference jobs against the GX10 box (vLLM), handles model switching, and returns results by polling or webhook callback.
- **Auth:** every request carries `Authorization: Bearer <key>`.
- **Results:** poll `/v1/jobs/:id` or receive a webhook callback.
- **Models:** `gpt-oss-120b` · `llama4-scout` (32K context) · `llama4-scout-long` (256K context).

A full API reference ships with this service as `API.md`; drop it into an AI agent and it can drive the API end-to-end.