An authenticated API that queues long-running inference jobs against the GX10 box (vLLM), handles model switching, and returns results by polling or webhook callback.
- **Auth:** every request carries `Authorization: Bearer <key>`.
- **Results:** poll `/v1/jobs/:id` or receive a webhook callback.
- **Models:** `gpt-oss-120b` · `llama4-scout` (32K context) · `llama4-scout-long` (256K context).

A full API reference ships with this service as `API.md`; drop it into an AI agent and it can drive the API end-to-end.