Overview

Use FinOps as a drop-in replacement for the OpenAI API, with full compatibility and enhanced features.

FinOps provides complete OpenAI API compatibility through protocol adaptation. The integration handles request transformation, response normalization, and error mapping between OpenAI's API specification and FinOps's internal processing pipeline.

This integration lets you use FinOps features such as governance, load balancing, semantic caching, and multi-provider support while preserving your existing OpenAI SDK-based architecture.

Endpoint: /openai

Setup

Python

import openai

# Configure client to use FinOps
client = openai.OpenAI(
 base_url="{AI_GATEWAY_URL}/openai",
 api_key="dummy-key" # Keys handled by FinOps
)

# Make requests as usual
response = client.chat.completions.create(
 model="gpt-4o-mini",
 messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

JavaScript

import OpenAI from "openai";

// Configure client to use FinOps
const openai = new OpenAI({
 baseURL: "{AI_GATEWAY_URL}/openai",
 apiKey: "dummy-key", // Keys handled by FinOps
});

// Make requests as usual
const response = await openai.chat.completions.create({
 model: "gpt-4o-mini",
 messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);
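
The `{AI_GATEWAY_URL}` placeholder in the snippets above has to be replaced with the address of your own gateway. A minimal sketch of deriving the base URL at runtime is shown below. The environment variable name `FINOPS_GATEWAY_URL`, the helper `finops_base_url`, and the fallback address are all assumptions made for illustration; only the `/openai` suffix comes from this page.

```python
import os


def finops_base_url(gateway_url: str) -> str:
    """Return the OpenAI-compatible base URL for a gateway address."""
    # Drop trailing slashes so the result never contains "//openai".
    return gateway_url.rstrip("/") + "/openai"


# FINOPS_GATEWAY_URL is a hypothetical variable name, and the fallback
# address is a placeholder for a locally running gateway.
gateway_url = os.environ.get("FINOPS_GATEWAY_URL", "http://localhost:8080")
base_url = finops_base_url(gateway_url)
print(base_url)
```

The resulting string can be passed as `base_url` to `openai.OpenAI(...)` in place of the hard-coded value.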

Provider/Model Usage Examples

Use multiple providers through the same OpenAI SDK format by prefixing model names with the provider:

Python

import openai

client = openai.OpenAI(
 base_url="{AI_GATEWAY_URL}/openai",
 api_key="dummy-key"
)

# OpenAI models (default)
openai_response = client.chat.completions.create(
 model="gpt-4o-mini",
 messages=[{"role": "user", "content": "Hello from OpenAI!"}]
)

# Anthropic models via OpenAI SDK format
anthropic_response = client.chat.completions.create(
 model="anthropic/claude-3-sonnet-20240229",
 messages=[{"role": "user", "content": "Hello from Claude!"}]
)

# Google Vertex models via OpenAI SDK format
vertex_response = client.chat.completions.create(
 model="vertex/gemini-pro",
 messages=[{"role": "user", "content": "Hello from Gemini!"}]
)

# Azure models
azure_response = client.chat.completions.create(
 model="azure/gpt-4o",
 messages=[{"role": "user", "content": "Hello from Azure!"}]
)

# Local Ollama models
ollama_response = client.chat.completions.create(
 model="ollama/llama3.1:8b",
 messages=[{"role": "user", "content": "Hello from Ollama!"}]
)

JavaScript

import OpenAI from "openai";

const openai = new OpenAI({
 baseURL: "{AI_GATEWAY_URL}/openai",
 apiKey: "dummy-key",
});

// OpenAI models (default)
const openaiResponse = await openai.chat.completions.create({
 model: "gpt-4o-mini",
 messages: [{ role: "user", content: "Hello from OpenAI!" }],
});

// Anthropic models via OpenAI SDK format
const anthropicResponse = await openai.chat.completions.create({
 model: "anthropic/claude-3-sonnet-20240229",
 messages: [{ role: "user", content: "Hello from Claude!" }],
});

// Google Vertex models via OpenAI SDK format
const vertexResponse = await openai.chat.completions.create({
 model: "vertex/gemini-pro",
 messages: [{ role: "user", content: "Hello from Gemini!" }],
});

// Azure models
const azureResponse = await openai.chat.completions.create({
 model: "azure/gpt-4o",
 messages: [{ role: "user", content: "Hello from Azure!" }],
});

// Local Ollama models
const ollamaResponse = await openai.chat.completions.create({
 model: "ollama/llama3.1:8b",
 messages: [{ role: "user", content: "Hello from Ollama!" }],
});
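
The naming convention used in the examples above can be illustrated with a small client-side helper, written here in Python. This is only a sketch of how the `provider/model` strings read, assuming that a name without a prefix refers to an OpenAI model (as the "default" comments above suggest). `ModelRef` and `split_model` are hypothetical names, and the gateway's own parsing logic is not shown on this page.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider: str
    model: str


def split_model(name: str, default_provider: str = "openai") -> ModelRef:
    """Split "provider/model" on the first slash only."""
    provider, sep, model = name.partition("/")
    if not sep:
        # No prefix: the examples above treat this as an OpenAI model.
        return ModelRef(default_provider, name)
    return ModelRef(provider, model)


for name in ["gpt-4o-mini", "anthropic/claude-3-sonnet-20240229", "ollama/llama3.1:8b"]:
    ref = split_model(name)
    print(f"{name} -> provider={ref.provider}, model={ref.model}")
```

Splitting on the first slash only keeps model identifiers such as `llama3.1:8b` intact.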

Adding Custom Headers

Pass custom headers required by FinOps plugins (like governance, telemetry, etc.):

Python

import openai

client = openai.OpenAI(
    base_url="{AI_GATEWAY_URL}/openai",
    api_key="dummy-key",
    default_headers={
        "x-bf-vk": "vk_12345",  # Virtual key for governance
    }
)

response = client.chat.completions.create(
 model="gpt-4o-mini",
 messages=[{"role": "user", "content": "Hello with custom headers!"}]
)

JavaScript

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "{AI_GATEWAY_URL}/openai",
  apiKey: "dummy-key",
  defaultHeaders: {
    "x-bf-vk": "vk_12345", // Virtual key for governance
  },
});

const response = await openai.chat.completions.create({
 model: "gpt-4o-mini",
 messages: [{ role: "user", content: "Hello with custom headers!" }],
});
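
To make it clearer what a custom header looks like on the wire, the sketch below builds (but does not send) roughly the HTTP request that the OpenAI SDK would issue for the example above, using only the Python standard library. The gateway address is a placeholder and `build_chat_request` is a hypothetical helper. The `/chat/completions` path is what the SDK appends to its base URL, and the `Authorization` header carries whatever `api_key` the client was given.

```python
import json
import urllib.request


def build_chat_request(base_url: str, virtual_key: str, prompt: str) -> urllib.request.Request:
    """Build a chat completion request carrying the x-bf-vk header."""
    body = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer dummy-key",
            "x-bf-vk": virtual_key,  # Virtual key for governance
        },
        method="POST",
    )


# Placeholder address; the request is only constructed, never sent.
req = build_chat_request("http://localhost:8080/openai", "vk_12345", "Hello with custom headers!")
headers = {key.lower(): value for key, value in req.header_items()}
print(req.get_method(), req.full_url)
print(headers["x-bf-vk"])
```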

Async Inference

Submit inference requests asynchronously and poll for results later using the x-bf-async header. This is useful for long-running requests where you don't want to hold a connection open. See Async Inference for full details.

Async inference requires a Logs Store to be configured and is not compatible with streaming.

Chat Completions

Python

import openai
import time

client = openai.OpenAI(
 base_url="{AI_GATEWAY_URL}/openai",
 api_key="dummy-key"
)

# Submit async request
initial = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    extra_headers={"x-bf-async": "true"}
)

# If choices are present, the request completed synchronously
if initial.choices:
    print(initial.choices[0].message.content)
else:
    # Poll until completed
    while True:
        time.sleep(2)
        poll = client.chat.completions.create(
            model="openai/gpt-4o-mini",
            messages=[{"role": "user", "content": "Tell me a short story."}],
            extra_headers={"x-bf-async-id": initial.id}
        )
        if poll.choices:
            print(poll.choices[0].message.content)
            break

JavaScript

import OpenAI from "openai";

const openai = new OpenAI({
 baseURL: "{AI_GATEWAY_URL}/openai",
 apiKey: "dummy-key",
});

// Submit async request
const initial = await openai.chat.completions.create(
  {
    model: "openai/gpt-4o-mini",
    messages: [{ role: "user", content: "Tell me a short story." }],
  },
  { headers: { "x-bf-async": "true" } }
);

// If choices are present, the request completed synchronously
if (initial.choices?.length > 0) {
  console.log(initial.choices[0].message.content);
} else {
  // Poll until completed
  while (true) {
    await new Promise((r) => setTimeout(r, 2000));
    const poll = await openai.chat.completions.create(
      {
        model: "openai/gpt-4o-mini",
        messages: [{ role: "user", content: "Tell me a short story." }],
      },
      { headers: { "x-bf-async-id": initial.id } }
    );
    if (poll.choices?.length > 0) {
      console.log(poll.choices[0].message.content);
      break;
    }
  }
}

Responses API

Python

import openai
import time

client = openai.OpenAI(
 base_url="{AI_GATEWAY_URL}/openai",
 api_key="dummy-key"
)

# Submit async request
initial = client.responses.create(
    model="openai/gpt-4o-mini",
    input="Tell me a short story.",
    extra_headers={"x-bf-async": "true"}
)

# If status is "completed", the request completed synchronously
if initial.status == "completed":
    print(initial.output_text)
else:
    # Poll until completed
    while True:
        time.sleep(2)
        poll = client.responses.create(
            model="openai/gpt-4o-mini",
            input="Tell me a short story.",
            extra_headers={"x-bf-async-id": initial.id}
        )
        if poll.status == "completed":
            print(poll.output_text)
            break

JavaScript

import OpenAI from "openai";

const openai = new OpenAI({
 baseURL: "{AI_GATEWAY_URL}/openai",
 apiKey: "dummy-key",
});

// Submit async request
const initial = await openai.responses.create(
  { model: "openai/gpt-4o-mini", input: "Tell me a short story." },
  { headers: { "x-bf-async": "true" } }
);

// If status is "completed", the request completed synchronously
if (initial.status === "completed") {
  console.log(initial.output_text);
} else {
  // Poll until completed
  while (true) {
    await new Promise((r) => setTimeout(r, 2000));
    const poll = await openai.responses.create(
      { model: "openai/gpt-4o-mini", input: "Tell me a short story." },
      { headers: { "x-bf-async-id": initial.id } }
    );
    if (poll.status === "completed") {
      console.log(poll.output_text);
      break;
    }
  }
}
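
The polling loops above run until the job completes, which may be undesirable if a job never finishes. One possible variant, sketched here in Python with only the standard library, bounds the wait with a deadline. `poll_until`, its parameters, and the 120-second default are assumptions made for this sketch and are not part of FinOps or the OpenAI SDK.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], Optional[T]],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it returns a non-None value or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"async job not finished after {timeout_s} seconds")
        sleep(interval_s)


# Stand-in for a real poll call: reports completion on the third attempt.
attempts = {"count": 0}


def fake_fetch() -> Optional[str]:
    attempts["count"] += 1
    return "story text" if attempts["count"] >= 3 else None


print(poll_until(fake_fetch, timeout_s=5, interval_s=0.01))
```

With the Responses API example, `fetch` would issue the `x-bf-async-id` request and return the response when its `status` is `"completed"`, or `None` otherwise.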

Async Headers

Header	Description
`x-bf-async: true`	Submit the request as an async job. Returns immediately with a job ID.
`x-bf-async-id: <job-id>`	Poll for results of a previously submitted async job.
`x-bf-async-job-result-ttl: <seconds>`	Override the default result TTL (default: 3600s).
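
As a rough illustration of how these headers combine, the Python sketch below builds the header dictionaries for the submit and poll steps. The helper names are hypothetical, and it is an assumption of this sketch that the TTL override is sent together with the initial `x-bf-async: true` request and is written as a whole number of seconds.

```python
from typing import Dict, Optional


def submit_headers(result_ttl_s: Optional[int] = None) -> Dict[str, str]:
    """Headers for submitting a request as an async job."""
    headers = {"x-bf-async": "true"}
    if result_ttl_s is not None:
        # Assumption: the TTL override accompanies the submit request.
        headers["x-bf-async-job-result-ttl"] = str(result_ttl_s)
    return headers


def poll_headers(job_id: str) -> Dict[str, str]:
    """Headers for polling a previously submitted async job."""
    return {"x-bf-async-id": job_id}


print(submit_headers(result_ttl_s=600))
print(poll_headers("job-123"))  # "job-123" is a made-up job ID
```

Either dictionary can be passed as `extra_headers` to the Python SDK calls shown earlier.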

Supported Features

The OpenAI integration supports every feature that is available in both the OpenAI SDK and FinOps core. If both support a feature, it works through the integration without additional changes.

Next Steps

Files and Batch API - File uploads and batch processing
Anthropic SDK - Claude integration patterns
Google GenAI SDK - Gemini integration patterns
Configuration - FinOps setup and configuration
Core Features - Advanced FinOps capabilities
