Create async chat completion

Submits a chat completion request for asynchronous execution. Returns a job ID immediately with HTTP 202. Poll the corresponding GET endpoint with the job ID to retrieve the result. Streaming is not supported for async requests.
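
The submit-then-poll flow described above can be sketched as follows. This is a minimal illustration, not the gateway's official client. `BASE_URL`, the helper names, and the polling path (`/v1/async/chat/completions/<job id>`) are assumptions; check the GET endpoint's own reference for the exact path. The only status value relied on is `pending`, which appears in the example response.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: replace with your gateway URL
ASYNC_PATH = "/v1/async/chat/completions"


def http_send(method, path, token, body=None):
    """Send one JSON request and return (status_code, parsed_body)."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, json.loads(resp.read())


def submit_and_wait(payload, token, send=http_send, interval=1.0, max_polls=60):
    """Submit an async job, then poll until it is no longer pending."""
    status, job = send("POST", ASYNC_PATH, token, payload)
    if status != 202:
        raise RuntimeError(f"expected HTTP 202, got {status}")
    job_id = job["id"]
    for _ in range(max_polls):
        # Assumption: the job is fetched at <ASYNC_PATH>/<job id>.
        _, job = send("GET", f"{ASYNC_PATH}/{job_id}", token)
        if job.get("status") != "pending":
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")
```

The `send` parameter is injectable so the polling logic can be exercised without a running gateway.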

POST /v1/async/chat/completions
Authorization: Bearer <token>

Bearer token authentication. Use your MPilot virtual-key JWT or admin JWT. Virtual keys (prefixed with sk-bf-) can also be passed here.

In: header

Header Parameters

x-bf-async-job-result-ttl? integer

Time-to-live in seconds for the job result after completion. Defaults to 3600 (1 hour). After expiry, the job result is automatically cleaned up.

Default: 3600
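
As a rough illustration of the TTL header, the sketch below builds request headers with a custom TTL and computes how long a job result remains retrievable. The helper names are hypothetical. It assumes the `expires_at` field of the job object marks when the result is cleaned up and uses the ISO 8601 UTC form shown in the example response.

```python
from datetime import datetime, timezone

TTL_HEADER = "x-bf-async-job-result-ttl"
DEFAULT_TTL_SECONDS = 3600  # documented default (1 hour)


def ttl_headers(token, ttl_seconds=DEFAULT_TTL_SECONDS):
    """Build request headers that set the job result TTL in seconds."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        TTL_HEADER: str(int(ttl_seconds)),
    }


def seconds_until_expiry(job, now=None):
    """Seconds left before expires_at (negative if it has already passed)."""
    now = now or datetime.now(timezone.utc)
    # fromisoformat on older Python versions rejects a trailing "Z".
    expires_at = datetime.fromisoformat(job["expires_at"].replace("Z", "+00:00"))
    return (expires_at - now).total_seconds()
```

A client that polls slowly may want a TTL comfortably larger than its polling interval, so the result is still available when it is fetched.
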
Request Body

model string

Model in provider/model format (e.g., openai/gpt-4)

messages array<ChatMessage>

List of messages in the conversation

fallbacks? array<string>

Fallback models in provider/model format

stream? boolean

Whether to stream the response. Streaming is not supported for async requests, so leave this unset or false.

frequency_penalty? number
Range: -2 <= value <= 2
logit_bias? object

Empty Object

logprobs? boolean
max_completion_tokens? integer
metadata? object

Empty Object

modalities? array<string>
parallel_tool_calls? boolean
presence_penalty? number
Range: -2 <= value <= 2
prompt_cache_key? string
reasoning? object
response_format? object

Format for the response

Empty Object

safety_identifier? string
service_tier? string
stream_options? object
store? boolean
temperature? number
Range: 0 <= value <= 2
tool_choice? string | object
tools? array<object>
seed? integer

Deterministic sampling seed

top_p? number

Nucleus sampling parameter

Range: 0 <= value <= 1
top_logprobs? integer

Number of most likely tokens to return at each position

Range: 0 <= value <= 20
stop? string | array<string>

Up to 4 sequences where the API will stop generating tokens

prediction? object

Predicted output content for the model to reference (OpenAI only). Can reduce latency.

prompt_cache_retention? string

Prompt cache retention policy

Value in: "in-memory" | "24h"
web_search_options? object

Web search options for chat completions (OpenAI only)

truncation? string
user? string
verbosity? string
Value in: "low" | "medium" | "high"
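
The documented ranges and allowed values can be checked on the client before a job is submitted. The sketch below is one possible pre-flight check, not the gateway's own validation logic. `validate_request` is a hypothetical helper, and the limits are copied from the parameter list above.

```python
# Numeric ranges and allowed values as listed in the parameter reference.
RANGES = {
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "top_logprobs": (0, 20),
}
ENUMS = {
    "prompt_cache_retention": {"in-memory", "24h"},
    "verbosity": {"low", "medium", "high"},
}


def validate_request(body):
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    for field in ("model", "messages"):
        if field not in body:
            problems.append(f"{field} is required")
    # model and every fallback should look like "provider/model".
    for name in [body.get("model", "")] + list(body.get("fallbacks", [])):
        provider, sep, model = name.partition("/")
        if name and not (provider and sep and model):
            problems.append(f"{name!r} is not in provider/model format")
    for field, (low, high) in RANGES.items():
        value = body.get(field)
        if value is not None and not low <= value <= high:
            problems.append(f"{field} must be between {low} and {high}")
    for field, allowed in ENUMS.items():
        value = body.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")
    stop = body.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")
    if body.get("stream"):
        problems.append("stream is not supported for async requests")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.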

Example Request

curl -X POST "https://{AI_GATEWAY_URL}/v1/async/chat/completions" \
  -H "Authorization: Bearer <token>" \
  -H "x-bf-async-job-result-ttl: 3600" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "Hello"
      }
    ]
  }'

Response Body

202 (job accepted)

{
  "id": "string",
  "status": "pending",
  "expires_at": "2019-08-24T14:15:22Z",
  "created_at": "2019-08-24T14:15:22Z",
  "completed_at": "2019-08-24T14:15:22Z",
  "status_code": 0,
  "result": null,
  "error": {
    "event_id": "string",
    "type": "string",
    "is_bifrost_error": true,
    "status_code": 0,
    "error": {
      "type": "string",
      "code": "string",
      "message": "string",
      "param": "string",
      "event_id": "string"
    },
    "extra_fields": {
      "provider": "openai",
      "model_requested": "string",
      "request_type": "string"
    }
  }
}
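
The job object above nests the provider error under `error.error`, with provider details under `error.extra_fields`. The sketch below shows one way to summarize a polled job. `describe_job` is a hypothetical helper, and it assumes only the field names visible in the example response.

```python
def describe_job(job):
    """Return a one-line, human-readable summary of a job object."""
    if job.get("status") == "pending":
        return f"job {job['id']} is pending"
    error = job.get("error")
    if error:
        # The human-readable message lives one level down, under error.error.
        inner = error.get("error") or {}
        provider = (error.get("extra_fields") or {}).get("provider", "unknown")
        message = inner.get("message", "no message")
        return f"job {job['id']} failed ({provider}): {message}"
    return f"job {job['id']} finished with status {job.get('status')}"
```
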
Error response

{
  "event_id": "string",
  "type": "string",
  "is_bifrost_error": true,
  "status_code": 0,
  "error": {
    "type": "string",
    "code": "string",
    "message": "string",
    "param": "string",
    "event_id": "string"
  },
  "extra_fields": {
    "provider": "openai",
    "model_requested": "string",
    "request_type": "string"
  }
}