Responses
Use the Responses endpoint for newer OpenAI-compatible generation flows, including agent-style input, multimodal content, and tool-oriented applications.
Endpoint
Create a response using the Velrix OpenAI-compatible base URL.
Method
Create a model response.
POST /v1/responses
Base URL
OpenAI-compatible Velrix endpoint.
https://api.velrix.ai/v1
Authentication
Headers
Velrix follows OpenAI-compatible bearer-token authentication for this route.
Authorization
Scoped Velrix API key.
Bearer $VELRIX_API_KEY
Content-Type
Requests and responses use JSON.
application/json
Schema
Body parameters
These are the primary fields most Responses clients need. Velrix forwards compatible fields to the selected upstream route when supported.
model (Required)
Model ID to route. Use gpt-5.4 for policy routing or a catalog model ID.
input (Required)
Text, image, file, or structured input items for the model to process. The official endpoint accepts either a plain text string or an array of typed input items.
instructions (Optional)
System or developer-style instructions inserted into the model context for this response.
previous_response_id (Optional)
Link this request to a previous response when building multi-turn stateful flows.
tools / tool_choice (Optional)
Configure hosted tools or function-like tools, and control whether tool use is automatic, required, or constrained.
stream (Optional)
Return incremental server-sent events as output is generated.
background (Optional)
Run the response asynchronously in the background when supported by the selected route.
truncation (Optional)
Control how the request should handle context-window overflow, such as automatic truncation or disabled truncation.
reasoning (Optional)
Configure reasoning behavior for models that expose reasoning controls.
text (Optional)
Configure text response formatting, including structured output settings when supported.
metadata (Optional)
Attach key-value metadata for filtering, tracing, or operational bookkeeping.
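The parameters above can be assembled into a request as in the following minimal sketch. It uses only the Python standard library. The helper names `build_body` and `build_request` are hypothetical and are not part of any Velrix SDK, and the request is constructed but not sent.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.velrix.ai/v1"  # base URL documented on this page


def build_body(model, input_items, **optional):
    """Return a JSON-ready body; optional fields set to None are dropped.

    Hypothetical helper for illustration, not an official client function.
    """
    if not model:
        raise ValueError("model is required")
    if input_items is None or input_items == "" or input_items == []:
        raise ValueError("input is required")
    body = {"model": model, "input": input_items}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


def build_request(body, api_key):
    """Wrap the body in a POST request for /v1/responses (not sent here)."""
    return urllib.request.Request(
        f"{BASE_URL}/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = build_body(
    "gpt-5.4",
    "Draft a release checklist for the gateway.",
    instructions="Write concise operational guidance.",
    metadata={"service": "dashboard-docs"},
    previous_response_id=None,  # dropped from the body because it is None
)
# "test-key" is a placeholder used only when VELRIX_API_KEY is not set.
request = build_request(body, os.environ.get("VELRIX_API_KEY", "test-key"))
# To send it: urllib.request.urlopen(request)
```

Dropping unset optional fields keeps the payload limited to what the caller actually chose, which matters because Velrix forwards compatible fields to the upstream route only when they are supported.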
Request
Request example
Send a model and input. Velrix applies the same key scoping, routing policy, and telemetry as other gateway endpoints.
curl https://api.velrix.ai/v1/responses \
  -H "Authorization: Bearer $VELRIX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "instructions": "Write concise operational guidance.",
    "input": "Draft a release checklist for the gateway.",
    "metadata": {
      "service": "dashboard-docs"
    }
  }'

curl https://api.velrix.ai/v1/responses \
  -H "Authorization: Bearer $VELRIX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "input": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_text",
            "text": "List the rollout risks."
          }
        ]
      }
    ],
    "stream": true
  }'
Response
Response shape
Responses use a unified response object. Clients often read output text directly when available, while advanced clients inspect output items.
{
  "id": "id",
  "created_at": 0,
  "error": {
    "code": "server_error",
    "message": "message"
  },
  "incomplete_details": {
    "reason": "max_output_tokens"
  },
  "instructions": "string",
  "metadata": {
    "foo": "string"
  },
  "model": "gpt-5.1",
  "object": "response",
  "output": [
    {
      "id": "id",
      "content": [
        {
          "annotations": [
            {
              "file_id": "file_id",
              "filename": "filename",
              "index": 0,
              "type": "file_citation"
            }
          ],
          "logprobs": [
            {
              "token": "token",
              "bytes": [
                0
              ],
              "logprob": 0,
              "top_logprobs": [
                {
                  "token": "token",
                  "bytes": [
                    0
                  ],
                  "logprob": 0
                }
              ]
            }
          ],
          "text": "text",
          "type": "output_text"
        }
      ],
      "role": "assistant",
      "status": "in_progress",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1,
  "tool_choice": "none",
  "tools": [
    {
      "name": "name",
      "parameters": {
        "foo": "bar"
      },
      "strict": true,
      "type": "function",
      "description": "description"
    }
  ],
  "top_p": 1,
  "background": true,
  "completed_at": 0,
  "conversation": {
    "id": "id"
  },
  "max_output_tokens": 0,
  "max_tool_calls": 0,
  "output_text": "output_text",
  "previous_response_id": "previous_response_id",
  "prompt": {
    "id": "id",
    "variables": {
      "foo": "string"
    },
    "version": "version"
  },
  "prompt_cache_key": "prompt-cache-key-1234",
  "prompt_cache_retention": "in-memory",
  "reasoning": {
    "effort": "none",
    "generate_summary": "auto",
    "summary": "auto"
  },
  "safety_identifier": "safety-identifier-1234",
  "service_tier": "auto",
  "status": "completed",
  "text": {
    "format": {
      "type": "text"
    },
    "verbosity": "low"
  },
  "top_logprobs": 0,
  "truncation": "auto",
  "usage": {
    "input_tokens": 0,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens": 0,
    "output_tokens_details": {
      "reasoning_tokens": 0
    },
    "total_tokens": 0
  },
  "user": "user-1234"
}
Operations
Workflow notes
Use Responses when your framework prefers a unified response object instead of chat completion choices.
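As a rough illustration of reading the unified response object, the sketch below prefers the top-level `output_text` field and falls back to walking the `output` items. The function name `response_text` is hypothetical, and the fallback assumes message items carry `output_text` content parts as in the response shape on this page.

```python
def response_text(response):
    """Return the generated text from a parsed response object (a dict).

    Hypothetical helper: uses the top-level output_text when present,
    otherwise joins the output_text parts found in message items.
    """
    text = response.get("output_text")
    if text:
        return text
    parts = []
    for item in response.get("output") or []:
        if item.get("type") != "message":
            continue  # skip non-message output items such as tool calls
        for part in item.get("content") or []:
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)


sample = {
    "object": "response",
    "status": "completed",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Check the rollout plan."}
            ],
        }
    ],
}
print(response_text(sample))  # prints "Check the rollout plan."
```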
Agent-friendly shape
Keep tool and multimodal workflows on a response-oriented API while still routing through Velrix.
Same routing controls
Use automatic routing or pinned catalog models exactly as you would with chat completions.
Streaming
Use stream: true for incremental server-sent events when the selected route supports streaming.
Tools
Configure tools for agent workflows. Tool availability depends on the selected upstream route.
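For the streaming note above, a client has to split the server-sent event stream into events and decode each `data:` payload. The sketch below does that over any iterable of text lines. It assumes the selected route relays OpenAI-style event types such as `response.output_text.delta` with a `delta` field; that is an assumption about the upstream, not something this page confirms, and the helper names are hypothetical.

```python
import json


def _decode(data_lines):
    """Join the data lines of one event and parse them, or return None."""
    payload = "\n".join(data_lines)
    if not payload or payload == "[DONE]":  # sentinel sent by some compatible servers
        return None
    return json.loads(payload)


def iter_sse_events(lines):
    """Yield one parsed JSON payload per server-sent event."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format allows one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        elif line == "":
            # A blank line terminates the current event.
            event = _decode(data_lines)
            data_lines = []
            if event is not None:
                yield event
        # "event:", "id:", and comment lines are ignored in this sketch.
    event = _decode(data_lines)  # flush a final event with no trailing blank line
    if event is not None:
        yield event


def collect_text(events):
    """Concatenate text deltas (assumed event type: response.output_text.delta)."""
    return "".join(
        event.get("delta", "")
        for event in events
        if event.get("type") == "response.output_text.delta"
    )


sample_stream = [
    "event: response.output_text.delta\n",
    'data: {"type": "response.output_text.delta", "delta": "Risk 1: "}\n',
    "\n",
    'data: {"type": "response.output_text.delta", "delta": "config drift"}\n',
    "\n",
    'data: {"type": "response.completed"}\n',
    "\n",
]
print(collect_text(iter_sse_events(sample_stream)))  # prints "Risk 1: config drift"
```

Because the parser takes plain lines, it can be exercised with a list as shown or fed from an HTTP response body decoded line by line.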