
Responses

Use the Responses endpoint for newer OpenAI-compatible generation flows, including agent-style input, multimodal content, and tool-oriented applications.


Endpoint

Create a response using the Velrix OpenAI-compatible base URL.

Method

Create a model response.

POST /v1/responses

Base URL

OpenAI-compatible Velrix endpoint.

https://api.velrix.ai/v1

Authentication

Headers

Velrix follows OpenAI-compatible bearer-token authentication for this route.

Authorization

Scoped Velrix API key.

Bearer $VELRIX_API_KEY

Content-Type

Requests and responses use JSON.

application/json
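
As a minimal sketch of the method, base URL, and headers described above, the following Python prepares the POST request using only the standard library. The helper names build_request and send are hypothetical and are not part of any Velrix SDK.

```python
import json
import urllib.request

# Base URL taken from this page.
BASE_URL = "https://api.velrix.ai/v1"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare (without sending) a POST to /v1/responses."""
    return urllib.request.Request(
        url=f"{BASE_URL}/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the prepared request and decode the JSON reply."""
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

In practice the key would typically be read from the VELRIX_API_KEY environment variable rather than hard-coded, and send would be called only once the request has been built.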

Schema

Body parameters

These are the primary fields most Responses clients need. Velrix forwards compatible fields to the selected upstream route when supported.

model

Required

Model ID to route to. Use gpt-5.4 for policy routing, or pass a catalog model ID to pin a specific model.

input

Required

Text, image, file, or structured input items for the model to process. The official endpoint accepts either a plain text string or an array of typed input items.

instructions

Optional

System or developer-style instructions inserted into the model context for this response.

previous_response_id

Optional

Link this request to a previous response when building multi-turn stateful flows.

tools / tool_choice

Optional

Configure hosted tools or function-like tools, and control whether tool use is automatic, required, or constrained.

stream

Optional

Return incremental server-sent events as output is generated.

background

Optional

Run the response asynchronously in the background when supported by the selected route.

truncation

Optional

Control how the request handles context-window overflow, for example by truncating automatically or by disabling truncation.

reasoning

Optional

Configure reasoning behavior for models that expose reasoning controls.

text

Optional

Configure text response formatting, including structured output settings when supported.

metadata

Optional

Attach key-value metadata for filtering, tracing, or operational bookkeeping.
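
The fields above can be assembled into a request body along these lines. This is a sketch under stated assumptions: build_body is a hypothetical helper, it covers only a subset of the listed fields, and the response ID in the usage note is a placeholder rather than a real identifier.

```python
from typing import Any, Optional


def build_body(
    model: str,
    input_: Any,
    *,
    instructions: Optional[str] = None,
    previous_response_id: Optional[str] = None,
    stream: Optional[bool] = None,
    metadata: Optional[dict] = None,
) -> dict:
    """Build a /v1/responses body, leaving out optional fields that are unset."""
    if not model:
        raise ValueError("model is required")
    if input_ is None:
        raise ValueError("input is required")

    body = {"model": model, "input": input_}
    optional = {
        "instructions": instructions,
        "previous_response_id": previous_response_id,
        "stream": stream,
        "metadata": metadata,
    }
    # Only include optional fields the caller actually set.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

For a multi-turn flow, a follow-up call might pass the id returned by the earlier response, for example build_body("gpt-5.4", "Shorten it.", previous_response_id="resp_example").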

Request

Request example

Send a model and input. Velrix applies the same key scoping, routing policy, and telemetry as other gateway endpoints.

Shell

curl https://api.velrix.ai/v1/responses \
  -H "Authorization: Bearer $VELRIX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "instructions": "Write concise operational guidance.",
    "input": "Draft a release checklist for the gateway.",
    "metadata": {
      "service": "dashboard-docs"
    }
  }'

Structured input

curl https://api.velrix.ai/v1/responses \
  -H "Authorization: Bearer $VELRIX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "input": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_text",
            "text": "List the rollout risks."
          }
        ]
      }
    ],
    "stream": true
  }'
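
The typed input array in the structured example can also be built programmatically. The sketch below reproduces that body in Python; user_message is a hypothetical helper, and only the input_text part type shown on this page is used.

```python
import json


def user_message(*texts: str) -> dict:
    """Wrap one or more text segments as a typed user input item."""
    return {
        "role": "user",
        "content": [{"type": "input_text", "text": text} for text in texts],
    }


# Same body as the structured input example above.
body = {
    "model": "gpt-5.4",
    "input": [user_message("List the rollout risks.")],
    "stream": True,
}

# Serialized form, ready to send as the request payload.
payload = json.dumps(body)
```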

Response

Response shape

Every call returns a single unified response object. Simple clients can read output_text directly when it is present, while advanced clients iterate over the items in output.

JSON

{
  "id": "id",
  "created_at": 0,
  "error": {
    "code": "server_error",
    "message": "message"
  },
  "incomplete_details": {
    "reason": "max_output_tokens"
  },
  "instructions": "string",
  "metadata": {
    "foo": "string"
  },
  "model": "gpt-5.1",
  "object": "response",
  "output": [
    {
      "id": "id",
      "content": [
        {
          "annotations": [
            {
              "file_id": "file_id",
              "filename": "filename",
              "index": 0,
              "type": "file_citation"
            }
          ],
          "logprobs": [
            {
              "token": "token",
              "bytes": [
                0
              ],
              "logprob": 0,
              "top_logprobs": [
                {
                  "token": "token",
                  "bytes": [
                    0
                  ],
                  "logprob": 0
                }
              ]
            }
          ],
          "text": "text",
          "type": "output_text"
        }
      ],
      "role": "assistant",
      "status": "in_progress",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1,
  "tool_choice": "none",
  "tools": [
    {
      "name": "name",
      "parameters": {
        "foo": "bar"
      },
      "strict": true,
      "type": "function",
      "description": "description"
    }
  ],
  "top_p": 1,
  "background": true,
  "completed_at": 0,
  "conversation": {
    "id": "id"
  },
  "max_output_tokens": 0,
  "max_tool_calls": 0,
  "output_text": "output_text",
  "previous_response_id": "previous_response_id",
  "prompt": {
    "id": "id",
    "variables": {
      "foo": "string"
    },
    "version": "version"
  },
  "prompt_cache_key": "prompt-cache-key-1234",
  "prompt_cache_retention": "in-memory",
  "reasoning": {
    "effort": "none",
    "generate_summary": "auto",
    "summary": "auto"
  },
  "safety_identifier": "safety-identifier-1234",
  "service_tier": "auto",
  "status": "completed",
  "text": {
    "format": {
      "type": "text"
    },
    "verbosity": "low"
  },
  "top_logprobs": 0,
  "truncation": "auto",
  "usage": {
    "input_tokens": 0,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 0,
    "output_tokens_details": {
      "reasoning_tokens": 0
    },
    "total_tokens": 0
  },
  "user": "user-1234"
}
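
One way a client might read text out of this object is sketched below. It assumes, based on the sample above, that output_text may be present at the top level and that message items carry output_text content parts; extract_text is a hypothetical helper name.

```python
def extract_text(response: dict) -> str:
    """Return the generated text from a response object."""
    error = response.get("error")
    if error:
        raise RuntimeError(f"{error.get('code')}: {error.get('message')}")

    # Fast path: use the aggregated text when the route provides it.
    text = response.get("output_text")
    if isinstance(text, str) and text:
        return text

    # Fallback: concatenate output_text parts from message items.
    parts = []
    for item in response.get("output") or []:
        if item.get("type") != "message":
            continue
        for part in item.get("content") or []:
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)
```

Checking the error field first keeps a failed response from being mistaken for an empty answer.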

Operations

Workflow notes

Use Responses when your framework prefers a unified response object instead of chat completion choices.

Agent-friendly shape

Keep tool and multimodal workflows on a response-oriented API while still routing through Velrix.

Same routing controls

Use automatic routing or pinned catalog models exactly as you would with chat completions.

Streaming

Use stream: true for incremental server-sent events when the selected route supports streaming.
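
A rough sketch of consuming such a stream follows. It parses server-sent event lines into JSON payloads and joins the text deltas. The event type response.output_text.delta and its delta field are taken from OpenAI's streaming event names and are an assumption here, since this page does not list the events a Velrix route emits; both function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each server-sent event."""
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format allows one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_lines:
            # A blank line terminates the current event.
            payload = "\n".join(data_lines)
            data_lines = []
            if payload == "[DONE]":
                # Sentinel used by some OpenAI-compatible servers; tolerated here.
                return
            yield json.loads(payload)
        # "event:", "id:", and comment lines are ignored in this sketch.


def collect_text(events: Iterable[dict]) -> str:
    """Join text deltas (event type assumed from OpenAI's event names)."""
    return "".join(
        event.get("delta", "")
        for event in events
        if event.get("type") == "response.output_text.delta"
    )
```

The lines argument can be any iterable of decoded text lines, such as the lines of an open HTTP response body decoded as UTF-8.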

Tools

Configure tools for agent workflows. Tool availability depends on the selected upstream route.
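
To make the tool configuration concrete, here is a minimal sketch of a request body that declares one function tool. The tool fields mirror the tools entry in the response sample on this page. The get_deploy_status tool, the JSON Schema used for parameters, and the "auto" value for tool_choice are illustrative assumptions drawn from OpenAI conventions, not confirmed Velrix behavior.

```python
def function_tool(name: str, description: str, parameters: dict, strict: bool = True) -> dict:
    """Describe a function tool using the fields shown in the response sample."""
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": parameters,
        "strict": strict,
    }


# Hypothetical tool; the parameters object is assumed to be a JSON Schema.
deploy_status_tool = function_tool(
    name="get_deploy_status",
    description="Look up the rollout status of a gateway release.",
    parameters={
        "type": "object",
        "properties": {"release": {"type": "string"}},
        "required": ["release"],
        "additionalProperties": False,
    },
)

body = {
    "model": "gpt-5.4",
    "input": "Is release 2.3 fully rolled out?",
    "tools": [deploy_status_tool],
    "tool_choice": "auto",  # assumed value meaning the model decides when to call tools
}
```

Because tool availability depends on the selected upstream route, a client may want to handle a rejection of the tools field rather than assume every route accepts it.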