Create async chat completion

Submits a chat completion request for asynchronous execution. Returns a job ID immediately with HTTP 202. Poll the corresponding GET endpoint with the job ID to retrieve the result. Streaming is not supported for async requests.
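The submit-then-poll flow described above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an official client: BASE_URL is a placeholder, the polling path (the submit path plus the job ID) is a guess because this page does not spell out the GET endpoint, and "pending" is the only status value shown here, so any other status is treated as finished. The transport and sleep parameters are hypothetical hooks that let the function run without a network.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Assumption: placeholder base URL; replace with your gateway URL.
BASE_URL = "https://gateway.example.com"
SUBMIT_PATH = "/v1/async/chat/completions"

Transport = Callable[[str, str, dict, Optional[dict]], dict]


def http_transport(method: str, url: str, headers: dict, body: Optional[dict]) -> dict:
    """Send one JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def submit_and_wait(
    token: str,
    payload: dict,
    transport: Transport = http_transport,
    interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Submit an async chat completion, then poll until the job leaves 'pending'."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    job = transport("POST", BASE_URL + SUBMIT_PATH, headers, payload)
    job_id = job["id"]

    polls = 0
    # Assumption: any status other than "pending" means the job has finished.
    while job.get("status") == "pending":
        if polls >= max_polls:
            raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")
        sleep(interval)
        # Assumption: the GET endpoint is the submit path followed by the job ID.
        job = transport("GET", f"{BASE_URL}{SUBMIT_PATH}/{job_id}", headers, None)
        polls += 1
    return job
```

Because results are cleaned up after the TTL expires, a caller would normally keep the polling window shorter than the TTL it requested.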

Authorization

Authorization: Bearer <token>

Bearer token authentication. Use your MPilot virtual-key JWT or admin JWT. Virtual keys (prefixed with sk-bf-) can also be passed here.

In: header

Header Parameters

x-bf-async-job-result-ttl? integer

Time-to-live in seconds for the job result after completion. Defaults to 3600 (1 hour). After expiry, the job result is automatically cleaned up.

Default: 3600
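As a rough illustration of the TTL header, the sketch below builds the header value and estimates when a result would be cleaned up. The helper names are hypothetical, the rejection of non-positive values is an assumption (this page only gives the type and default), and the link between this estimate and the expires_at field in the response is likewise assumed rather than documented here.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL_SECONDS = 3600  # documented default (1 hour)


def ttl_header(ttl_seconds: int = DEFAULT_TTL_SECONDS) -> dict:
    """Build the x-bf-async-job-result-ttl header for a request."""
    # Assumption: only positive whole seconds make sense for a TTL.
    if isinstance(ttl_seconds, bool) or not isinstance(ttl_seconds, int) or ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be a positive integer")
    return {"x-bf-async-job-result-ttl": str(ttl_seconds)}


def estimated_expiry(completed_at: datetime, ttl_seconds: int = DEFAULT_TTL_SECONDS) -> datetime:
    """Estimate when the result is cleaned up: completion time plus the TTL."""
    return completed_at + timedelta(seconds=ttl_seconds)


if __name__ == "__main__":
    done = datetime(2019, 8, 24, 14, 15, 22, tzinfo=timezone.utc)
    print(ttl_header(600))
    print(estimated_expiry(done, 600).isoformat())
```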

Request Body

model string

Model in provider/model format (e.g., openai/gpt-4)

messages array<ChatMessage>

List of messages in the conversation

fallbacks? array<string>

Fallback models in provider/model format

stream? boolean

Whether to stream the response. Streaming is not supported for async requests.

frequency_penalty? number

Range: -2 <= value <= 2

logit_bias? object

Empty Object

logprobs? boolean

max_completion_tokens? integer

metadata? object

Empty Object

modalities? array<string>

parallel_tool_calls? boolean

presence_penalty? number

Range: -2 <= value <= 2

prompt_cache_key? string

reasoning? object

response_format? object

Format for the response

Empty Object

safety_identifier? string

service_tier? string

stream_options? object

store? boolean

temperature? number

Range: 0 <= value <= 2

tool_choice? string | object

tools? array<object>

seed? integer

Deterministic sampling seed

top_p? number

Nucleus sampling parameter

Range: 0 <= value <= 1

top_logprobs? integer

Number of most likely tokens to return at each position

Range: 0 <= value <= 20

stop? string | array<string>

Up to 4 sequences where the API will stop generating tokens

prediction? object

Predicted output content for the model to reference (OpenAI only). Can reduce latency.

prompt_cache_retention? string

Prompt cache retention policy

Value in: "in-memory" | "24h"

web_search_options? object

Web search options for chat completions (OpenAI only)

truncation? string

user? string

verbosity? string

Value in: "low" | "medium" | "high"
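The constraints listed for the request body can be checked on the client before submitting. The following is a minimal sketch, assuming a hypothetical validate_request helper; it covers only the rules stated on this page (provider/model format, numeric ranges, allowed values, the stop limit, and the note that streaming is not supported for async requests) and says nothing about what the gateway itself enforces.

```python
# Numeric ranges and allowed values as listed in the parameter reference.
RANGES = {
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "top_logprobs": (0, 20),
}
ENUMS = {
    "prompt_cache_retention": ("in-memory", "24h"),
    "verbosity": ("low", "medium", "high"),
}


def is_provider_model(value) -> bool:
    """True when value looks like 'provider/model' with both parts non-empty."""
    if not isinstance(value, str):
        return False
    provider, sep, name = value.partition("/")
    return bool(sep and provider and name)


def validate_request(body: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not is_provider_model(body.get("model")):
        problems.append("model must use provider/model format")
    if not isinstance(body.get("messages"), list):
        problems.append("messages must be an array")
    for fallback in body.get("fallbacks") or []:
        if not is_provider_model(fallback):
            problems.append(f"fallback {fallback!r} must use provider/model format")
    if body.get("stream"):
        problems.append("stream is not supported for async requests")
    for name, (low, high) in RANGES.items():
        value = body.get(name)
        if value is None:
            continue
        if not isinstance(value, (int, float)) or not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")
    stop = body.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")
    for name, allowed in ENUMS.items():
        value = body.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name} must be one of {', '.join(allowed)}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.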

Example request

curl -X POST "{AI_GATEWAY_URL}/v1/async/chat/completions" \
  -H "Authorization: Bearer <token>" \
  -H "x-bf-async-job-result-ttl: 3600" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "Hello"
      }
    ]
  }'

Response Body


202

{
  "id": "string",
  "status": "pending",
  "expires_at": "2019-08-24T14:15:22Z",
  "created_at": "2019-08-24T14:15:22Z",
  "completed_at": "2019-08-24T14:15:22Z",
  "status_code": 0,
  "result": null,
  "error": {
    "event_id": "string",
    "type": "string",
    "is_bifrost_error": true,
    "status_code": 0,
    "error": {
      "type": "string",
      "code": "string",
      "message": "string",
      "param": "string",
      "event_id": "string"
    },
    "extra_fields": {
      "provider": "openai",
      "model_requested": "string",
      "request_type": "string"
    }
  }
}
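A client might load the 202 job object into a typed structure before acting on it. The sketch below assumes a hypothetical AsyncJob dataclass that mirrors the top-level fields of the example above; which fields may be null or absent, and which status values exist beyond "pending", are not stated on this page and are treated as assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as 2019-08-24T14:15:22Z, or return None."""
    if not value:
        return None
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class AsyncJob:
    id: str
    status: str
    created_at: Optional[datetime] = None
    expires_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    status_code: Optional[int] = None
    result: Any = None
    error: Optional[dict] = None

    @property
    def is_pending(self) -> bool:
        # "pending" is the only status value shown in the example response.
        return self.status == "pending"


def parse_job(payload: dict) -> AsyncJob:
    """Convert the decoded JSON job object into an AsyncJob."""
    return AsyncJob(
        id=payload["id"],
        status=payload["status"],
        created_at=parse_timestamp(payload.get("created_at")),
        expires_at=parse_timestamp(payload.get("expires_at")),
        completed_at=parse_timestamp(payload.get("completed_at")),
        status_code=payload.get("status_code"),
        result=payload.get("result"),
        error=payload.get("error"),
    )
```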

400

{
  "event_id": "string",
  "type": "string",
  "is_bifrost_error": true,
  "status_code": 0,
  "error": {
    "type": "string",
    "code": "string",
    "message": "string",
    "param": "string",
    "event_id": "string"
  },
  "extra_fields": {
    "provider": "openai",
    "model_requested": "string",
    "request_type": "string"
  }
}

500

{
  "event_id": "string",
  "type": "string",
  "is_bifrost_error": true,
  "status_code": 0,
  "error": {
    "type": "string",
    "code": "string",
    "message": "string",
    "param": "string",
    "event_id": "string"
  },
  "extra_fields": {
    "provider": "openai",
    "model_requested": "string",
    "request_type": "string"
  }
}
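The error envelope shown for the 400 and 500 responses nests the readable message one level down. A minimal sketch of flattening it into a single log line follows; describe_error is a hypothetical helper, and it assumes that any field in the envelope may be missing.

```python
def describe_error(envelope: dict) -> str:
    """Summarize an error envelope as one line of text."""
    inner = envelope.get("error") or {}
    extra = envelope.get("extra_fields") or {}

    status = envelope.get("status_code")
    prefix = f"HTTP {status}" if status else "error"
    text = f"{prefix}: {inner.get('message') or 'unknown error'}"

    code = inner.get("code")
    if code:
        text += f" (code={code})"

    provider = extra.get("provider")
    model = extra.get("model_requested")
    if provider or model:
        text += f" [{provider}/{model}]"
    return text


if __name__ == "__main__":
    sample = {
        "status_code": 400,
        "error": {"message": "bad model", "code": "invalid_request"},
        "extra_fields": {"provider": "openai", "model_requested": "gpt-4"},
    }
    print(describe_error(sample))
```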

