Simulator

Test prompts against virtual keys and inspect live budget, rate limit, and routing behavior.

The Simulator lets you send real inference requests through the FinOps AI Gateway and see governance take effect immediately — without writing code or deploying a client application.

Figure: FinOps Simulator showing the conversation panel and the governance hierarchy.

When to use it

  • Validate budgets before production traffic hits a cap
  • Confirm routing rules redirect to the expected model
  • Reproduce rate-limit and model-block errors safely
  • Demo FinOps behavior to stakeholders

Layout

The page has two panels:

Panel                        | Purpose
Conversation (left)          | Chat-style prompt/response testing
Governance hierarchy (right) | Live budget, limit, and routing state for the selected virtual key

Step-by-step

  1. Open FinOps Config → Simulator.
  2. On the right, select a key from the Virtual key dropdown, which lists the active keys from User Keys.
  3. Review the Governance hierarchy cards — each shows Budget, Limits, and Routing for that scope.
  4. Choose a model (provider / model) in the compose bar.
  5. Type a prompt or pick a Suggested prompt preset, then press Enter or click the send button.
  6. Inspect the response:
    • Routed banner: appears when the requested model differs from the model actually used; hover to see the rule name and CEL expression
    • Error alerts: budget exceeded, rate limited, or model/provider blocked
    • Details (chart icon): tokens, cost, latency, and routing metadata
  7. Use Load history to pull prior turns from gateway logs for this key.
  8. Use Refresh to reload models and governance without clearing the chat.
  9. Use Clear to reset the on-screen conversation only.
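The Details view reports tokens, cost, and routing metadata, but this page does not say how cost is derived. A common scheme, assumed here, is separate per-million-token prices for input and output tokens; the prices below are made up, and `request_cost` and `was_routed` are hypothetical helpers, not part of the gateway.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


def request_cost(usage: Usage, input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate cost from per-million-token prices (an assumed pricing scheme)."""
    return (usage.input_tokens * input_price_per_m
            + usage.output_tokens * output_price_per_m) / 1_000_000


def was_routed(requested_model: str, used_model: str) -> bool:
    """The Routed banner corresponds to the requested model differing from the model used."""
    return requested_model != used_model


# Made-up prices: $2.50 per 1M input tokens, $10.00 per 1M output tokens.
cost = request_cost(Usage(input_tokens=1200, output_tokens=300), 2.50, 10.00)
print(f"${cost:.4f}")  # $0.0060
print(was_routed("openai/gpt-4o", "openai/gpt-4o-mini"))  # True
```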

Conversation controls

Control           | Action
Load history      | Fetches paginated inference logs for the selected virtual key
Refresh           | Reloads the governance hierarchy and available models
Clear             | Clears the in-session chat display
Suggested prompts | Nine FinOps-themed presets (cost drivers, budget alerts, routing policy, etc.)
Query params      | Key=value pairs sent with the request so routing rules can match on metadata
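Query params are key=value pairs that routing rules can match on. The sketch below shows one way a test harness might parse them into a metadata map and check a rule against it. `parse_query_params` is hypothetical, and `rule_matches` is a plain-Python stand-in for a CEL expression; the variable names the gateway actually exposes to CEL are not documented here.

```python
def parse_query_params(pairs: list[str]) -> dict[str, str]:
    """Turn ["team=search", "env=staging"] into {"team": "search", "env": "staging"}."""
    params: dict[str, str] = {}
    for pair in pairs:
        # partition splits on the first "=" only, so values may contain "="
        key, sep, value = pair.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key] = value.strip()  # a later duplicate key overwrites an earlier one
    return params


def rule_matches(params: dict[str, str]) -> bool:
    """Stand-in for a hypothetical CEL rule such as: params.env == "staging"."""
    return params.get("env") == "staging"


params = parse_query_params(["team=search", "env=staging"])
print(params)                # {'team': 'search', 'env': 'staging'}
print(rule_matches(params))  # True
```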

Governance hierarchy

Cards appear in evaluation order from most specific to global:

Hierarchy: Virtual key scope → Model scope → Organization chain → Global scope
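One reading of "evaluation order" is that a request must clear every scope, and the first exhausted scope, checked from most specific to global, is the one reported. The sketch below models that reading; it is an assumption, as are the scope names and the rule that a scope with no configured budget never blocks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    name: str
    budget_used: float
    budget_max: Optional[float]  # None: no budget configured at this scope


def first_exhausted_scope(scopes: list[Scope]) -> Optional[str]:
    """Check scopes in order (most specific first); return the first exhausted one, or None."""
    for scope in scopes:
        if scope.budget_max is not None and scope.budget_used >= scope.budget_max:
            return scope.name
    return None


# Hypothetical chain: virtual key -> model -> organization -> global
chain = [
    Scope("virtual key: eng-uk-test", 12.0, 50.0),
    Scope("model: openai/gpt-4o", 80.0, 100.0),
    Scope("org: EMEA", 500.0, 500.0),
    Scope("global", 900.0, None),
]
print(first_exhausted_scope(chain))  # org: EMEA
```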

Each card shows:

  • Budget: $used / $max · N% used, color-coded green under 70%, amber at 70–89%, and red at 90% or more
  • Limits: request and/or token rate-limit usage, or "None"
  • Routing: rule names; click a rule to see its status, CEL expression, target provider/model, and fallbacks
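The percentage and color bands on each Budget line can be reproduced as follows. The band edges come from the text above; treating "amber 70–89%" as "at least 70% and below 90%" and truncating the displayed percentage are assumptions of this sketch.

```python
def budget_color(pct: float) -> str:
    """Green under 70%, amber from 70% up to (not including) 90%, red at 90% or more."""
    if pct < 70:
        return "green"
    if pct < 90:
        return "amber"
    return "red"


def budget_label(used: float, maximum: float) -> str:
    """Format a Budget line as '$used / $max · N% used (color)'."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    pct = used * 100 / maximum
    return f"${used:,.2f} / ${maximum:,.2f} · {int(pct)}% used ({budget_color(pct)})"


print(budget_label(35, 50))  # $35.00 / $50.00 · 70% used (amber)
```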

The breadcrumb above the cards (e.g. Engineering (UK) → UK Entity → EMEA → tokensystem) starts at the virtual key's Governance Scope Organization and walks up the organization tree to the root.
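Building the breadcrumb is a walk from the key's Governance Scope Organization up to the root. Assuming each organization stores a reference to its parent (modeled here by a hypothetical `PARENT` map), the walk looks like this:

```python
from typing import Optional

# Hypothetical organization tree: child -> parent (None marks the root)
PARENT: dict[str, Optional[str]] = {
    "Engineering (UK)": "UK Entity",
    "UK Entity": "EMEA",
    "EMEA": "tokensystem",
    "tokensystem": None,
}


def breadcrumb(start: str, parent: dict[str, Optional[str]]) -> str:
    """Follow parent links from `start` to the root and join the names with arrows."""
    path: list[str] = []
    seen: set[str] = set()
    node: Optional[str] = start
    while node is not None:
        if node in seen:  # guard against a misconfigured, cyclic tree
            raise ValueError(f"cycle detected at {node!r}")
        seen.add(node)
        path.append(node)
        node = parent.get(node)  # unknown names are treated as roots
    return " → ".join(path)


print(breadcrumb("Engineering (UK)", PARENT))
# Engineering (UK) → UK Entity → EMEA → tokensystem
```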

Budget and limit counters refresh automatically every second while a virtual key is selected.
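The one-second refresh amounts to a polling loop that runs while a key is selected. In this sketch `fetch`, `is_selected`, and `on_update` are hypothetical callbacks standing in for the Simulator's internals, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def poll_counters(fetch: Callable[[], dict],
                  is_selected: Callable[[], bool],
                  on_update: Callable[[dict], None],
                  interval_s: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Fetch counters every `interval_s` seconds while a key is selected; return the refresh count."""
    refreshes = 0
    while is_selected():
        on_update(fetch())
        refreshes += 1
        sleep(interval_s)
    return refreshes
```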

Auto-fire

Toggle Auto-fire in the right panel to send a random suggested prompt every second. This is useful for stress-testing rate limits and watching budget bars move under sustained load.

Auto-fire sends real gateway requests and consumes budget. Use only in non-production tenants or with test keys.
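For scripted load outside the UI, auto-fire can be approximated as below. `send` is a hypothetical client call that returns None on success or an alert name such as "Rate Limited"; stopping at the first error is a choice of this sketch, not documented Simulator behavior.

```python
import random
import time
from typing import Callable, Optional


def auto_fire(prompts: list[str],
              send: Callable[[str], Optional[str]],
              max_requests: int,
              interval_s: float = 1.0,
              sleep: Callable[[float], None] = time.sleep,
              rng: Optional[random.Random] = None) -> list[tuple[str, Optional[str]]]:
    """Send a random prompt every `interval_s` seconds; stop at `max_requests` or the first error."""
    if rng is None:
        rng = random.Random()
    results: list[tuple[str, Optional[str]]] = []
    for _ in range(max_requests):
        prompt = rng.choice(prompts)
        error = send(prompt)  # None on success, otherwise an alert name
        results.append((prompt, error))
        if error is not None:
            break
        sleep(interval_s)
    return results
```

Run it only against test keys, for the same budget reasons as the toggle itself.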

Error types you may see

Alert                                          | Meaning
Budget Exceeded                                | A budget at virtual key, model, organization, or global scope blocked the request
Rate Limited / Token Limited / Request Limited | A request or token rate-limit window was exhausted
Provider Blocked / Model Blocked               | The model or provider is not allowed for this virtual key
Virtual Key Blocked                            | The key is inactive or otherwise denied
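This page does not say which windowing algorithm the gateway uses for request and token limits. As an illustration only, a fixed-window counter shows what "window exhausted" means: once a window's quota is spent, requests fail until the next window starts. `FixedWindowLimiter` is hypothetical; `units` is 1 for a request limit or a token count for a token limit.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """Allow at most `limit` units per `window_s`-second fixed window (illustrative only)."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.window_start = clock()
        self.used = 0

    def try_acquire(self, units: int = 1) -> bool:
        now = self.clock()
        if now - self.window_start >= self.window_s:
            # The current window has elapsed: start a new one and reset usage.
            self.window_start = now
            self.used = 0
        if self.used + units > self.limit:
            return False  # window exhausted: the request would be rejected
        self.used += units
        return True
```

A sliding-window or token-bucket limiter would smooth out the bursts that a fixed window allows around window boundaries.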