AWS Bedrock SDK

Overview

Use FinOps as a Bedrock-compatible gateway for the Converse and Invoke APIs, with FinOps features on top.

FinOps exposes a Bedrock-compatible endpoint for the Converse and Invoke APIs through protocol adaptation. The integration transforms incoming requests, normalizes responses, and maps errors between the AWS Bedrock API specification and FinOps's internal processing pipeline.

You keep your existing Bedrock SDK-based architecture and gain FinOps features such as governance, load balancing, semantic caching, and multi-provider support.

Endpoint: /bedrock
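
The snippets on this page write the endpoint as {AI_GATEWAY_URL}/bedrock, where {AI_GATEWAY_URL} stands for your gateway's base URL. A minimal sketch of filling it in from the environment follows. The AI_GATEWAY_URL environment variable name and the localhost fallback are assumptions made for illustration, not documented FinOps conventions.

```python
import os


def bedrock_endpoint(base_url: str) -> str:
    """Return the Bedrock-compatible endpoint for a gateway base URL."""
    # Strip trailing slashes so the result never contains "//bedrock".
    return base_url.rstrip("/") + "/bedrock"


# Hypothetical variable name and fallback, for illustration only.
endpoint_url = bedrock_endpoint(
    os.environ.get("AI_GATEWAY_URL", "http://localhost:8080")
)
```

The resulting string can be passed as endpoint_url to boto3.client in place of the literal placeholder.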

Setup

Python

import boto3

# Configure the boto3 Bedrock client to use FinOps.
# Note: When using FinOps keys, dummy credentials are required
# because boto3 needs credentials to sign requests, even though
# FinOps will use its own configured keys.
client = boto3.client(
    service_name="bedrock-runtime",
    endpoint_url="{AI_GATEWAY_URL}/bedrock",
    region_name="us-west-2",
    aws_access_key_id="finops-dummy-key",  # Required when using FinOps keys
    aws_secret_access_key="finops-dummy-secret",  # Required when using FinOps keys
)

# Make requests as usual
response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "Hello!"}],
        }
    ],
)

print(response)
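
Printing the whole response is handy for a first check, but most callers only want the generated text. The sketch below assumes the gateway returns the standard Bedrock Converse response shape (output.message.content as a list of blocks); this page says responses are normalized but does not show one, so treat the layout as an assumption. The converse_text helper and the sample dictionary are hypothetical.

```python
def converse_text(response: dict) -> str:
    """Join the text blocks of a Converse-style response into one string."""
    blocks = response.get("output", {}).get("message", {}).get("content", [])
    # Non-text blocks (for example toolUse) carry no "text" key and are skipped.
    return "".join(block["text"] for block in blocks if "text" in block)


# Stand-in response with the Converse shape, used instead of a live call.
sample_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [{"text": "Hello! "}, {"text": "How can I help?"}],
        }
    },
    "stopReason": "end_turn",
    "usage": {"inputTokens": 9, "outputTokens": 7, "totalTokens": 16},
}

print(converse_text(sample_response))
```

With a live client, the same helper would be called as converse_text(response).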

Provider/Model Usage Examples

Because Bedrock itself is a multi-provider platform, you can use any Bedrock-supported model ID and still route through FinOps. FinOps will handle governance, observability, and other cross-cutting concerns.

import boto3
import json

client = boto3.client(
    service_name="bedrock-runtime",
    endpoint_url="{AI_GATEWAY_URL}/bedrock",
    region_name="us-west-2",
    aws_access_key_id="finops-dummy-key",
    aws_secret_access_key="finops-dummy-secret",
)

# Anthropic via Bedrock (Converse API)
anthropic_response = client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello from Claude!"}]}],
)

# Mistral via Bedrock (Converse API)
mistral_response = client.converse(
    modelId="mistral.mistral-large-2407-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello from Mistral!"}]}],
)

# Mistral via Bedrock (Invoke API); the body uses Mistral's native request format
mistral_invoke_response = client.invoke_model(
    modelId="mistral.mistral-large-2407-v1:0",
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "prompt": "Say hello from Mistral using Invoke API.",
        "max_tokens": 50,
        "temperature": 0.7,
    }),
)
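
Unlike Converse, the Invoke API returns a model-specific payload inside a file-like body that has to be read and decoded. The sketch below assumes the gateway passes through Bedrock's native Mistral text format (an "outputs" list whose items carry "text"); the helper names and the fake response are hypothetical and stand in for a live call.

```python
import io
import json


def read_invoke_body(response: dict) -> dict:
    """Decode the JSON payload carried by an invoke_model-style response."""
    # response["body"] is file-like; read() returns the raw bytes once.
    return json.loads(response["body"].read())


def mistral_text(payload: dict) -> str:
    """Return the first completion from a Mistral-format payload, or ''."""
    outputs = payload.get("outputs", [])
    return outputs[0].get("text", "") if outputs else ""


# Stand-in for the object returned by client.invoke_model(...).
fake_invoke_response = {
    "body": io.BytesIO(
        json.dumps(
            {"outputs": [{"text": "Hello from Mistral!", "stop_reason": "stop"}]}
        ).encode("utf-8")
    )
}

payload = read_invoke_body(fake_invoke_response)
print(mistral_text(payload))
```

The body can only be read once, so keep the decoded payload if you need it again.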

Adding Custom Headers

Pass custom headers required by FinOps plugins (like governance, telemetry, etc.) using boto3's event system:

Python

import boto3

def add_bifrost_headers(request, **kwargs):
    """Add custom FinOps headers to the request before signing."""
    request.headers.add_header("x-bf-vk", "vk_12345")  # Virtual key for governance
    request.headers.add_header("x-bf-env", "production")  # Environment tag

client = boto3.client(
    service_name="bedrock-runtime",
    endpoint_url="{AI_GATEWAY_URL}/bedrock",
    region_name="us-west-2",
    aws_access_key_id="finops-dummy-key",
    aws_secret_access_key="finops-dummy-secret",
)

# Register the header injection for all Bedrock API calls
client.meta.events.register_first(
    "before-sign.bedrock-runtime.*",
    add_bifrost_headers,
)

# Now make requests with custom headers
response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello with custom headers!"}]}],
)

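Note: The before-sign event fires before the request is signed, and register_first runs this handler ahead of any other handlers registered for that event. The event name format is before-sign.<service-name>.<operation-name>. The * wildcard in the example covers every bedrock-runtime operation; to limit header injection to specific operations, register each operation name explicitly instead (for example, before-sign.bedrock-runtime.Converse or before-sign.bedrock-runtime.InvokeModel).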
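
When different clients need different header values (for example, one virtual key per tenant), a small factory avoids hard-coding them in the handler. This is a sketch, not part of the FinOps or boto3 API: make_header_injector is a hypothetical helper, and the fake request below only mimics the Message-like headers object that the add_header call in the example relies on. Deleting a header before setting it is a defensive choice so that a repeated call replaces the value instead of appending a duplicate.

```python
from email.message import Message
from types import SimpleNamespace


def make_header_injector(headers: dict):
    """Build a before-sign handler that sets the given headers."""

    def inject(request, **kwargs):
        for name, value in headers.items():
            # Deleting a missing header is a no-op; deleting an existing one
            # ensures the assignment below does not create a duplicate.
            del request.headers[name]
            request.headers[name] = value

    return inject


inject = make_header_injector({"x-bf-vk": "vk_12345", "x-bf-env": "staging"})

# Local check with a stand-in request object (no network, no boto3 needed).
fake_request = SimpleNamespace(headers=Message())
inject(fake_request)
inject(fake_request)  # calling twice must not duplicate headers
print(fake_request.headers.items())
```

With a real client, the handler would be registered the same way as in the example: client.meta.events.register_first("before-sign.bedrock-runtime.*", inject).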


Streaming Examples

Converse Stream

Use converse_stream for chat-based streaming with a unified interface across models.

import boto3

client = boto3.client(
    service_name="bedrock-runtime",
    endpoint_url="{AI_GATEWAY_URL}/bedrock",
    region_name="us-west-2",
    aws_access_key_id="finops-dummy-key",
    aws_secret_access_key="finops-dummy-secret",
)

response = client.converse_stream(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "Tell me a story about a brave knight."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.5},
)

print("Response:")
for chunk in response["stream"]:
    # Only text deltas are printed; other event types are skipped
    if "contentBlockDelta" in chunk:
        text = chunk["contentBlockDelta"]["delta"].get("text", "")
        print(text, end="", flush=True)
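
The loop above prints text as it arrives. If you also need the full text, the stop reason, and token usage once the stream ends, the events can be folded into a single result. The sketch assumes the gateway emits the standard Bedrock ConverseStream event types (contentBlockDelta, messageStop, metadata); collect_converse_stream and the fake event list are hypothetical.

```python
def collect_converse_stream(events):
    """Accumulate text, stop reason, and usage from ConverseStream events."""
    parts, stop_reason, usage = [], None, {}
    for event in events:
        if "contentBlockDelta" in event:
            delta = event["contentBlockDelta"]["delta"]
            # Tool-use deltas have no "text" key, so guard before reading it.
            if "text" in delta:
                parts.append(delta["text"])
        elif "messageStop" in event:
            stop_reason = event["messageStop"].get("stopReason")
        elif "metadata" in event:
            usage = event["metadata"].get("usage", {})
    return "".join(parts), stop_reason, usage


# Stand-in for response["stream"], shaped like ConverseStream events.
fake_events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"contentBlockIndex": 0, "delta": {"text": "Once "}}},
    {"contentBlockDelta": {"contentBlockIndex": 0, "delta": {"text": "upon a time"}}},
    {"contentBlockStop": {"contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
    {"metadata": {"usage": {"inputTokens": 12, "outputTokens": 4, "totalTokens": 16}}},
]

text, stop_reason, usage = collect_converse_stream(fake_events)
```

With a live call, pass response["stream"] in place of fake_events.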

Invoke Stream

Use invoke_model_with_response_stream for model-specific streaming payloads.

import boto3
import json

client = boto3.client(
    service_name="bedrock-runtime",
    endpoint_url="{AI_GATEWAY_URL}/bedrock",
    region_name="us-west-2",
    aws_access_key_id="finops-dummy-key",
    aws_secret_access_key="finops-dummy-secret",
)

# Example for Claude 3 (Messages API format)
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Write a haiku about coding."}
    ],
})

response = client.invoke_model_with_response_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=body,
    contentType="application/json",
    accept="application/json",
)

print("Response:")
for event in response.get("body"):
    if "chunk" in event:
        chunk = event["chunk"]
        if "bytes" in chunk:
            # The chunk bytes contain the model-specific JSON response
            result = json.loads(chunk["bytes"].decode("utf-8"))

            # Extract content based on model (e.g., Claude)
            if "delta" in result and "text" in result["delta"]:
                print(result["delta"]["text"], end="", flush=True)
            elif "completion" in result:
                print(result["completion"], end="", flush=True)
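
For Claude models, each decoded chunk is a Messages API streaming event with a "type" field, so the parsing can dispatch on that type instead of probing for keys. The sketch below assumes the gateway forwards Anthropic's native event types (content_block_delta with a text_delta, and message_delta carrying stop_reason); collect_anthropic_stream and the fake events are hypothetical.

```python
import json


def collect_anthropic_stream(body_events):
    """Accumulate text and stop reason from Claude Messages stream chunks."""
    parts, stop_reason = [], None
    for event in body_events:
        chunk = event.get("chunk")
        if not chunk or "bytes" not in chunk:
            continue  # skip non-chunk events
        payload = json.loads(chunk["bytes"])
        kind = payload.get("type")
        if kind == "content_block_delta":
            delta = payload.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
        elif kind == "message_delta":
            stop_reason = payload.get("delta", {}).get("stop_reason")
    return "".join(parts), stop_reason


def _chunk(payload: dict) -> dict:
    """Wrap a payload the way the response stream delivers it."""
    return {"chunk": {"bytes": json.dumps(payload).encode("utf-8")}}


# Stand-in for response["body"].
fake_body = [
    _chunk({"type": "message_start", "message": {"role": "assistant"}}),
    _chunk({"type": "content_block_delta", "index": 0,
            "delta": {"type": "text_delta", "text": "Code flows "}}),
    _chunk({"type": "content_block_delta", "index": 0,
            "delta": {"type": "text_delta", "text": "like water"}}),
    _chunk({"type": "message_delta", "delta": {"stop_reason": "end_turn"}}),
]

haiku, reason = collect_anthropic_stream(fake_body)
```

Other model families use different chunk layouts, so a helper like this is specific to the Claude Messages format.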

Supported Features

The Bedrock integration currently supports:

  • Converse API (/bedrock/model/{modelId}/converse) for text/chat-style workloads
  • Invoke API (/bedrock/model/{modelId}/invoke) for model-specific text completion workloads
  • Streaming via converse_stream and invoke_model_with_response_stream
  • Tools via toolConfig, toolUse, and toolResult inside Converse requests
  • Image and multimodal responses where supported by the underlying Bedrock model
  • All FinOps core features that apply to these flows (governance, load balancing, semantic cache, observability, etc.)
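
To make the tools bullet concrete, here is a sketch of the three pieces involved in a Converse tool call: the toolConfig sent with the request, the toolUse block the model returns, and the toolResult message sent back. The field layout follows the Bedrock Converse API; the get_weather tool, its schema, and the helper names are hypothetical, and the assistant message is a stand-in rather than a real model response.

```python
def weather_tool_config() -> dict:
    """Return a toolConfig describing one hypothetical tool."""
    return {
        "tools": [{
            "toolSpec": {
                "name": "get_weather",
                "description": "Look up the forecast for a city.",
                "inputSchema": {"json": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                }},
            }
        }]
    }


def get_weather(city: str) -> dict:
    """Hypothetical tool implementation returning canned data."""
    return {"city": city, "forecast": "sunny"}


def tool_result_message(assistant_message: dict, handlers: dict) -> dict:
    """Run each requested tool and build the follow-up user message."""
    results = []
    for block in assistant_message.get("content", []):
        if "toolUse" not in block:
            continue
        call = block["toolUse"]
        output = handlers[call["name"]](**call["input"])
        results.append({"toolResult": {
            "toolUseId": call["toolUseId"],
            "content": [{"json": output}],
            "status": "success",
        }})
    return {"role": "user", "content": results}


# Stand-in for response["output"]["message"] when stopReason is "tool_use".
assistant_message = {"role": "assistant", "content": [
    {"text": "Let me check."},
    {"toolUse": {"toolUseId": "tool-1", "name": "get_weather",
                 "input": {"city": "Oslo"}}},
]}

follow_up = tool_result_message(assistant_message, {"get_weather": get_weather})
```

In a live exchange, you would pass toolConfig=weather_tool_config() to client.converse, and when the response's stopReason is "tool_use", append the assistant message and follow_up to messages before calling converse again.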

Next Steps