
API Reference

Chat Completions

The primary endpoint. Compatible with the OpenAI Chat Completions API.

POST /api/v1/chat/completions

Request Body

model (required)
string
Model ID, e.g. "openai/gpt-4o" or "anthropic/claude-3.5-sonnet"
messages (required)
Message[]
Array of message objects with role and content
stream
boolean
If true, returns a Server-Sent Events stream. Default: false
temperature
number
Sampling temperature 0–2. Higher = more random. Default: 1
max_tokens
integer
Maximum number of tokens to generate
max_completion_tokens
integer
Alias for max_tokens (OpenAI o-series compatible). Both are accepted; whichever is provided takes effect
top_p
number
Nucleus sampling probability mass. Default: 1
top_k
integer
Limit token choices to top-K. 0 = disabled (consider all). Default: 0
frequency_penalty
number
Penalize repeated tokens. Range: [-2, 2]. Default: 0
presence_penalty
number
Penalize tokens based on presence. Range: [-2, 2]. Default: 0
repetition_penalty
number
Reduce token repetition from input. Range: (0, 2]. Default: 1
min_p
number
Minimum probability relative to the top token. Range: [0, 1]. Default: 0
top_a
number
Dynamic top-P based on highest-probability token. Range: [0, 1]. Default: 0
seed
integer
Integer seed for deterministic sampling. Determinism is not guaranteed for all models
n
integer
Number of completions to generate. Default: 1
user
string
End-user identifier for monitoring and abuse detection. Has no effect on billing
stop
string | string[]
Stop sequences — generation halts when encountered
logit_bias
object
Map token IDs to bias values [-100, 100] added before sampling
logprobs
boolean
Return log probabilities of each output token
top_logprobs
integer
Number of most-likely tokens to return per position (requires logprobs: true). Range: 0–20
tools
Tool[]
List of tools (functions) the model may call
tool_choice
string | object
Controls tool use: "auto", "none", "required", or a specific tool
parallel_tool_calls
boolean
Enable parallel function calling when tools are provided. Default: true
response_format
object
Force structured JSON output. See Structured Output section
transforms
string[]
Message transforms to apply, e.g. ["middle-out"]. If omitted, middle-out is applied automatically on models with a context window of 8k tokens or less
models
string[]
Fallback model list — BazaarLink tries each in order if primary fails
route
string
Set to "fallback" to enable waterfall routing through the models array
provider
object
Provider routing preferences — order, only, ignore, sort, allow_fallbacks
debug
object
Debug options. Set echo_upstream_body: true to receive the transformed upstream request body as the first SSE chunk (streaming only)

Request Schema (TypeScript)

typescript
type Request = {
  // Required
  model: string;                    // "provider/model-name"
  messages: Message[];

  // Common
  stream?: boolean;                 // Default: false
  temperature?: number;             // Range: [0, 2], default: 1
  max_tokens?: number;              // Range: [1, context_length)
  max_completion_tokens?: number;   // Alias for max_tokens
  n?: number;                       // Default: 1
  seed?: number;                    // Integer; deterministic sampling
  stop?: string | string[];
  user?: string;                    // End-user identifier

  // Sampling
  top_p?: number;                   // Range: (0, 1]
  top_k?: number;                   // Integer; default: 0 (disabled)
  frequency_penalty?: number;       // Range: [-2, 2]
  presence_penalty?: number;        // Range: [-2, 2]
  repetition_penalty?: number;      // Range: (0, 2], default: 1
  min_p?: number;                   // Range: [0, 1]
  top_a?: number;                   // Range: [0, 1]

  // Logprobs
  logit_bias?: Record<number, number>;  // Token ID → bias [-100, 100]
  logprobs?: boolean;
  top_logprobs?: number;            // Range: [0, 20], requires logprobs: true

  // Tools & output
  tools?: Tool[];
  tool_choice?: ToolChoice;
  parallel_tool_calls?: boolean;    // Default: true
  response_format?: ResponseFormat;

  // BazaarLink-only
  transforms?: string[];            // e.g. ["middle-out"]
  models?: string[];                // Fallback model list
  route?: "fallback";
  provider?: ProviderPreferences;
  debug?: {
    echo_upstream_body?: boolean;   // Streaming only
  };
};

type Message =
  | { role: "system" | "user" | "assistant"; content: string | ContentPart[] }
  | { role: "tool"; content: string; tool_call_id: string };

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string; detail?: string } };

type Tool = {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters: object;  // JSON Schema
  };
};

type ToolChoice =
  | "none" | "auto" | "required"
  | { type: "function"; function: { name: string } };

type ResponseFormat =
  | { type: "json_object" }
  | { type: "json_schema"; json_schema: { name: string; strict?: boolean; schema: object } };

type ProviderPreferences = {
  order?: string[];
  only?: string[];
  ignore?: string[];
  allow_fallbacks?: boolean;
  sort?: "price" | "latency" | "throughput";
};
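The ranges in the Request Body table can be checked client-side before a request is sent, which catches typos without spending a round trip. Below is a minimal sketch; `validate_chat_request` and `RANGES` are hypothetical names (not part of any BazaarLink SDK), the bounds are copied from the table above, and the server remains the final authority on what it accepts.

```python
from __future__ import annotations

from typing import Any

# (low, high, low_inclusive, high_inclusive), copied from the Request Body table
RANGES: dict[str, tuple[float, float, bool, bool]] = {
    "temperature": (0, 2, True, True),
    "top_p": (0, 1, False, True),
    "frequency_penalty": (-2, 2, True, True),
    "presence_penalty": (-2, 2, True, True),
    "repetition_penalty": (0, 2, False, True),
    "min_p": (0, 1, True, True),
    "top_a": (0, 1, True, True),
    "top_logprobs": (0, 20, True, True),
}


def validate_chat_request(body: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means these local checks passed."""
    problems: list[str] = []
    model = body.get("model")
    if not isinstance(model, str) or "/" not in model:
        problems.append('model must look like "provider/model-name"')
    if not body.get("messages"):
        problems.append("messages must be a non-empty array")
    for name, (low, high, low_inc, high_inc) in RANGES.items():
        if name not in body:
            continue
        value = body[name]
        ok_low = value >= low if low_inc else value > low
        ok_high = value <= high if high_inc else value < high
        if not (ok_low and ok_high):
            problems.append(f"{name}={value} is outside the documented range")
    if "top_logprobs" in body and not body.get("logprobs"):
        problems.append("top_logprobs requires logprobs: true")
    for token_id, bias in (body.get("logit_bias") or {}).items():
        if not -100 <= bias <= 100:
            problems.append(f"logit_bias[{token_id}]={bias} is outside [-100, 100]")
    return problems
```

Collecting all problems rather than raising on the first one makes it easier to report every mistake in a request body at once.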

Example Request

bash
curl https://bazaarlink.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $BAZAARLINK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4.1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in one paragraph."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'

Response

json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1740000000,
  "model": "openai/gpt-4.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing leverages quantum mechanics..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 74,
    "total_tokens": 102,
    "cost": 0.0006480,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}

Response Schema (TypeScript)

typescript
type Response = {
  id: string;
  object: "chat.completion" | "chat.completion.chunk";
  created: number;                 // Unix timestamp
  model: string;
  choices: (NonStreamingChoice | StreamingChoice)[];
  usage?: ResponseUsage;
  cost?: number;                   // Total cost in USD
};

type NonStreamingChoice = {
  index: number;
  finish_reason: "stop" | "length" | "tool_calls" | "content_filter" | null;
  native_finish_reason: string | null;  // Provider's original finish reason
  message: {
    role: "assistant";
    content: string | null;
    tool_calls?: ToolCall[];
  };
};

type StreamingChoice = {
  index: number;
  finish_reason: string | null;
  native_finish_reason: string | null;  // Provider's original finish reason
  delta: {
    role?: string;
    content?: string | null;
    tool_calls?: ToolCall[];
  };
};

type ResponseUsage = {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  cost: number;                      // Total cost for this request in USD
  prompt_tokens_details?: {
    cached_tokens: number;           // Tokens served from prompt cache (reduced cost)
    cache_write_tokens?: number;     // Tokens written to cache in this request
    audio_tokens?: number;
  };
  completion_tokens_details?: {
    reasoning_tokens?: number;       // Thinking/reasoning tokens (e.g. o3, Qwen3, DeepSeek R1)
    image_tokens?: number;
  };
};

type ToolCall = {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
};
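With stream: true, each chunk carries a StreamingChoice whose delta holds an incremental piece of the reply. A minimal sketch of folding those deltas back into one message follows. It assumes OpenAI-style SSE framing (each chunk on a `data: {...}` line, with an optional `data: [DONE]` terminator), which this page does not spell out; `accumulate_stream` is a hypothetical helper and does not merge tool_calls deltas.

```python
from __future__ import annotations

import json
from typing import Any, Iterable


def accumulate_stream(lines: Iterable[str]) -> dict[str, Any]:
    """Fold streamed chat.completion.chunk lines into one result for choice index 0."""
    parts: list[str] = []
    finish_reason = None
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style terminator
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # keep the last chunk that carries usage
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return {"content": "".join(parts), "finish_reason": finish_reason, "usage": usage}
```

With the requests library, something like `accumulate_stream(line.decode("utf-8") for line in res.iter_lines())` should feed it the decoded lines of a streaming response.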

Image Generation

BazaarLink offers two image-generation paths. (A) /v1/chat/completions with modalities: ["image"] is the native path: it supports SSE streaming and mixed text+image output, and is recommended for new integrations. (B) /v1/images/generations accepts the OpenAI DALL·E-compatible request shape but responds with an SSE event stream, which is required so that slow models do not hit the 100 s Cloudflare origin timeout. Both paths emit the same SSE event protocol, so choosing an endpoint is purely a matter of which request shape you prefer.

Breaking change in v0.200.0

/api/v1/images/generations now returns text/event-stream (SSE) instead of synchronous JSON. This is required to support models whose generation time exceeds 100 s (the Cloudflare origin timeout). Existing integrations, including those that only use fast models, must switch to consuming SSE; the migration snippets below show a short wrapper that restores synchronous semantics on the client side.

A. /v1/chat/completions (native, recommended)

POST /api/v1/chat/completions

The canonical streaming path. Recommended for any new integration.

bash
curl -N https://bazaarlink.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $BL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4-image-2",
    "messages": [{"role":"user","content":"a red cat on a sofa"}],
    "modalities": ["image","text"],
    "stream": true
  }'

B. /v1/images/generations (DALL·E-compatible)

POST /api/v1/images/generations

OpenAI DALL·E request shape. Returns an SSE event stream regardless of model.

model (required)
string
Model id, e.g. google/gemini-2.5-flash-image
prompt (required)
string
Text prompt
size
string
Output size (auto-mapped)
n
integer
Number of images (default 1)
bash
curl -N https://bazaarlink.ai/api/v1/images/generations \
  -H "Authorization: Bearer $BL_API_KEY" \
  -H "Accept: text/event-stream" \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-5.4-image-2","prompt":"a red cat on a sofa"}'

SSE event protocol

Both endpoints emit the same event types. The migration wrappers below also handle an error event, whose data carries a reason field:

text
event: heartbeat        # every 60 s; keeps the connection from idling out behind Cloudflare
data: {}

event: image            # upstream URL, fastest path
data: {"index": 0, "url": "https://upstream/a.png"}

event: image-cached     # BazaarLink Redis-backed proxy URL (1 hr TTL)
data: {"index": 0, "url": "https://bazaarlink.ai/api/v1/images/proxy/<token>"}

event: usage            # final cost / token count
data: {"promptTokens": 12, "completionTokens": 7080, "cost": 0.226, "durationMs": 163400, "imageCount": 1}

event: done
data: {}
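For clients that cannot add an SSE dependency, the event protocol above can be parsed with the standard library alone. This is a minimal sketch of an SSE line parser (blank line ends an event, `:` starts a comment, multiple `data:` lines are joined) plus a fold that collects the proxy URLs. `parse_sse` and `image_urls` are hypothetical helpers, and the `#` annotations in the listing above are treated as explanatory text rather than part of the wire format.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, decoded_json_data) for each complete SSE event."""
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates the current event
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # SSE comment line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data_lines.append(value)
        # other fields (id, retry) are ignored in this sketch


def image_urls(events: Iterable[tuple[str, dict]]) -> list[str]:
    """Collect image-cached proxy URLs until done; raise on an error event."""
    urls: list[str] = []
    for name, data in events:
        if name == "image-cached":
            urls.append(data["url"])
        elif name == "error":
            raise RuntimeError(data.get("reason", "image generation failed"))
        elif name == "done":
            break
    return urls
```

With requests and `stream=True`, `image_urls(parse_sse(line.decode("utf-8") for line in res.iter_lines()))` should work, since `iter_lines()` yields empty lines for the blank separators.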

Migration: sync wrapper for clients that need it

If your integration genuinely needs sync semantics (synchronous scripts, environments without SSE-capable clients), drop in either snippet below to consume the SSE stream and return a DALL-E-shaped object.

Python (requires pip install requests sseclient-py):

python
import json

import requests
import sseclient


def generate_image_sync(model, prompt, api_key, host="https://bazaarlink.ai"):
    """Consume the image SSE stream and return a DALL-E-shaped {"data": [...]} dict."""
    res = requests.post(
        f"{host}/api/v1/images/generations",
        headers={"Authorization": f"Bearer {api_key}", "Accept": "text/event-stream"},
        json={"model": model, "prompt": prompt},
        stream=True,
    )
    res.raise_for_status()  # fail fast on HTTP error statuses before reading the stream
    images, usage = [], None
    for ev in sseclient.SSEClient(res).events():
        data = json.loads(ev.data) if ev.data else {}
        if ev.event == "image-cached":
            # Use the BazaarLink proxy URL (1 hr TTL) rather than the upstream "image" URL
            images.append({"url": data["url"]})
        elif ev.event == "usage":
            usage = data
        elif ev.event == "error":
            raise RuntimeError(data.get("reason", "image generation failed"))
        elif ev.event == "done":
            break
    return {"data": images, "usage": usage}

Node.js (built-in fetch, no deps):

javascript
async function generateImageSync(model, prompt, apiKey, host = "https://bazaarlink.ai") {
  const res = await fetch(`${host}/api/v1/images/generations`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, Accept: "text/event-stream", "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  const reader = res.body.getReader();
  const dec = new TextDecoder();
  let buf = "", images = [], usage = null;
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    // Normalize CRLF so the event regex also matches "\r\n"-delimited streams
    buf = (buf + dec.decode(value, { stream: true })).replace(/\r\n/g, "\n");
    let m;
    // Each complete event looks like "event: <name>\ndata: <json>\n\n"
    while ((m = buf.match(/event: (\S+)\ndata: (.+?)\n\n/))) {
      const [, event, data] = m;
      buf = buf.slice(m.index + m[0].length);
      const d = JSON.parse(data);
      if (event === "image-cached") images.push({ url: d.url });
      else if (event === "usage") usage = d;
      else if (event === "error") throw new Error(d.reason);
      else if (event === "done") { await reader.cancel(); return { data: images, usage }; }
    }
  }
  return { data: images, usage };
}


Video Generation

Video generation uses an asynchronous three-step flow: submit, poll, fetch content. Because generation takes 30 s to 5 min, which does not fit the synchronous request/response shape of chat completions, BazaarLink exposes it through a dedicated /api/v1/videos endpoint built on a job-ID pattern: submitting returns a vjob_* ID, you poll its status, and you fetch the video bytes once it completes. Calling a video model via /chat/completions or /images/generations returns 400 (code: wrong_endpoint_for_video). Billing settles against the actual usage.cost once the job's status reaches completed.

1. Submit job (returns vjob_xxx immediately)

POST /api/v1/videos
model (required)
string
Model id, e.g. alibaba/wan-2.7
prompt (required)
string
Text prompt
duration
integer
Duration in seconds (model-dependent)
resolution
string
Resolution — 480p / 720p / 1080p etc.
generate_audio
boolean
Generate audio track (true/false)
bash
curl https://bazaarlink.ai/api/v1/videos \
  -H "Authorization: Bearer $BL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "alibaba/wan-2.7",
    "prompt": "a bird flying over mountains",
    "duration": 3,
    "resolution": "720p",
    "generate_audio": false
  }'
# → 202 { "id": "vjob_xxx", "status": "pending" }
Note
On submit, BazaarLink reserves credits for the worst case (worst-case price × duration × multiplier); when the job completes, the actual upstream `usage.cost` is charged and the difference is settled.

2. Poll status

GET /api/v1/videos/{id}
bash
curl -H "Authorization: Bearer $BL_API_KEY" \
  https://bazaarlink.ai/api/v1/videos/vjob_xxx
Note
Only GETs spaced at least 8 s apart actually query the upstream provider (this avoids upstream rate limits), so poll at intervals of 8 s or longer.
Note
When status=failed, the full reserve amount is refunded to the user.
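The polling step can be wrapped in a small loop that honors the 8 s spacing and stops on a terminal status. This is a minimal sketch using only the standard library; `fetch_status` and `wait_for_video` are hypothetical helpers, the `BL_API_KEY` environment variable mirrors the curl examples, and it assumes the status response is JSON with a `status` field whose terminal values are completed and failed (anything else is treated as still running).

```python
from __future__ import annotations

import json
import os
import time
import urllib.request
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def fetch_status(job_id: str) -> dict:
    """GET /api/v1/videos/{id}; assumes the JSON body carries a "status" field."""
    req = urllib.request.Request(
        f"https://bazaarlink.ai/api/v1/videos/{job_id}",
        headers={"Authorization": f"Bearer {os.environ['BL_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_video(
    job_id: str,
    get_status: Callable[[str], dict] = fetch_status,
    *,
    interval: float = 8.0,   # polls closer than 8 s do not reach the upstream
    timeout: float = 600.0,  # generation is documented at 30 s to 5 min; leave headroom
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the job reaches a terminal status, then return the last status body."""
    deadline = clock() + timeout
    while True:
        job = get_status(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() + interval > deadline:
            raise TimeoutError(f"{job_id} still {job.get('status')!r} after {timeout} s")
        sleep(interval)
```

Injecting `get_status`, `sleep`, and `clock` keeps the loop testable without network access or real waiting. Once it returns with status completed, fetch the MP4 from /content as in step 3.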

3. Fetch video content (MP4)

GET /api/v1/videos/{id}/content
bash
curl -H "Authorization: Bearer $BL_API_KEY" \
  -o output.mp4 \
  https://bazaarlink.ai/api/v1/videos/vjob_xxx/content


Audio Inputs

BazaarLink supports two audio paths: inline audio in chat messages for multimodal models, and a dedicated transcription endpoint for speech-to-text.

Path 1 — Chat completions (multimodal input)

Base64 only
Audio data must be **raw base64**. Do NOT include a data URI prefix (`data:audio/...;base64,`): data URIs are used by other content types such as `image_url`, but `input_audio` expects the bare base64 string in its `data` field.

Supported Formats

WAV, MP3, AIFF, AAC, OGG (Opus / Vorbis), FLAC, M4A, PCM16 (raw), PCM24 (raw)
python
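import base64

from openai import OpenAI

client = OpenAI(base_url="https://bazaarlink.ai/api/v1", api_key="sk-bl-YOUR_API_KEY")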

with open("audio.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()
    # audio_b64 is a raw base64 string — do NOT add data: prefix

response = client.chat.completions.create(
    model="openai/gpt-4o-audio-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe this audio:"},
            {
                "type": "input_audio",
                "input_audio": {
                    "data": audio_b64,   # raw base64, no "data:audio/...;base64," prefix
                    "format": "wav",     # wav, mp3, flac, ogg, m4a, aac
                },
            },
        ],
    }],
)
print(response.choices[0].message.content)
Compatibility
Not all models support all audio formats — check the model page for supported modalities. Billing is based on token usage proportional to audio duration.
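Building the `input_audio` part by hand makes it easy to slip in a data URI prefix or a mismatched format. A minimal sketch of a helper that derives `format` from the file extension and always emits bare base64 follows; `audio_part` and `to_raw_base64` are hypothetical names, and the `"aiff"` format string is an assumption based on the Supported Formats list (the other names follow the comment in the example above).

```python
from __future__ import annotations

import base64
from pathlib import Path

# File extension -> input_audio "format" value ("aiff" is an assumption)
AUDIO_FORMATS = {
    ".wav": "wav", ".mp3": "mp3", ".flac": "flac", ".ogg": "ogg",
    ".m4a": "m4a", ".aac": "aac", ".aiff": "aiff",
}


def to_raw_base64(value: str) -> str:
    """Drop a data:...;base64, prefix if present; input_audio wants the bare string."""
    if value.startswith("data:") and "," in value:
        return value.split(",", 1)[1]
    return value


def audio_part(filename: str, data: bytes) -> dict:
    """Build an input_audio content part from raw file bytes."""
    ext = Path(filename).suffix.lower()
    if ext not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio extension: {ext or filename}")
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(data).decode("ascii"),  # raw base64, no prefix
            "format": AUDIO_FORMATS[ext],
        },
    }
```

Usage: `{"role": "user", "content": [{"type": "text", "text": "Transcribe this audio:"}, audio_part("audio.wav", Path("audio.wav").read_bytes())]}`.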

Path 2 — Transcription endpoint

POST /v1/audio/transcriptions is a dedicated speech-to-text endpoint compatible with the OpenAI Whisper API. It works as a drop-in replacement: point your existing OpenAI SDK client at BazaarLink's base URL and keep the same calls.

| Model | Billing | Description |
|---|---|---|
| openai/whisper-1 | Duration (seconds) | Classic Whisper: reliable, low cost, broad format support |
| openai/gpt-4o-transcribe | Tokens | GPT-4o-powered: higher accuracy for noisy audio and accents |
python
from openai import OpenAI

client = OpenAI(
    base_url="https://bazaarlink.ai/api/v1",
    api_key="YOUR_API_KEY",
)

with open("audio.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-1",   # or "openai/gpt-4o-transcribe"
        file=f,
    )

print(transcript.text)
Billing
whisper-1 bills by audio duration (seconds). gpt-4o-transcribe bills by tokens. Both return the exact upstream cost in the usage.cost field.

Text-to-Speech (TTS)

Synthesize natural speech from text via POST /v1/audio/speech. Returns binary audio (mp3/opus/aac/flac/wav/pcm depending on model and response_format).

Binary response, not JSON
The response body is raw audio bytes — save with --output in curl, or call .arrayBuffer() in fetch. Usage is recorded server-side; query /v1/usage for spend.

TTS Models

| Model | Billing | Notes |
|---|---|---|
| openai/gpt-4o-mini-tts-2025-12-15 | $0.60 / 1M chars | Cost-efficient, 11 voices, accepts natural-language instructions for tone/pace/style. |
| openai/tts-1 | $15 / 1M chars | Original OpenAI TTS: faster, lower fidelity. |
| openai/tts-1-hd | $30 / 1M chars | Higher-fidelity OpenAI TTS. |
| mistralai/voxtral-mini-tts-2603 | $16 / 1M chars | Mistral's voice model, with a different voice character. |

Voices (gpt-4o-mini-tts)

alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer, verse

Voice availability varies by model; check the model card. For a natural-sounding Mandarin female voice, pick coral, nova, sage, or shimmer.

Response Formats

mp3 (default), opus, aac, flac, wav, pcm

Each model accepts a subset of formats. mp3 and pcm are the safest defaults.

python
from openai import OpenAI

client = OpenAI(
    base_url="https://bazaarlink.ai/api/v1",
    api_key="sk-bl-YOUR_API_KEY",
)

with client.audio.speech.with_streaming_response.create(
    model="openai/gpt-4o-mini-tts-2025-12-15",
    voice="coral",
    input="Hello from BazaarLink.",
    response_format="mp3",
) as response:
    response.stream_to_file("hello.mp3")

Steering with instructions

openai/gpt-4o-mini-tts accepts an optional instructions string in natural language to direct voice tone, pace, and style.

Less is more
For natural-sounding voice, leave instructions empty and pick a suitable voice. Heavy direction (e.g. "cute, melodic, singing-style") can cause the model to over-emote.

Limits

Input text is limited to 4096 characters per request. For longer text, split into multiple calls.
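Splitting at sentence boundaries keeps each call under the 4096-character limit without cutting words mid-sentence. A minimal sketch, assuming greedy packing of whole sentences (ASCII or CJK terminal punctuation) with a hard cut only for a single sentence longer than the limit; `split_for_tts` and `speech_payloads` are hypothetical helpers, and the request fields mirror the Python example above plus the optional instructions string.

```python
from __future__ import annotations

import re

MAX_TTS_CHARS = 4096  # documented per-request input limit

# A "sentence" is text up to and including terminal punctuation plus trailing
# whitespace; the final piece may lack punctuation (matched up to end of string).
_SENTENCE = re.compile(r".*?(?:[.!?。！？]+\s*|\Z)", re.S)


def split_for_tts(text: str, limit: int = MAX_TTS_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    chunks: list[str] = []
    current = ""
    for piece in filter(None, _SENTENCE.findall(text)):
        while len(piece) > limit:  # one sentence longer than the limit: hard cut
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:limit])
            piece = piece[limit:]
        if len(current) + len(piece) <= limit:
            current += piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]


def speech_payloads(text: str, voice: str = "coral", instructions: str | None = None) -> list[dict]:
    """One /audio/speech request body per chunk."""
    bodies = []
    for chunk in split_for_tts(text):
        body = {
            "model": "openai/gpt-4o-mini-tts-2025-12-15",
            "voice": voice,
            "input": chunk,
            "response_format": "mp3",
        }
        if instructions:
            body["instructions"] = instructions
        bodies.append(body)
    return bodies
```

Keeping the same voice (and instructions, if any) across chunks helps the concatenated audio sound consistent.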

Video Inputs

Send video content for models to analyze visuals, generate descriptions, or answer questions about scenes and events. Supported via Gemini models today.

Supported Formats

MP4 (H.264), MPEG, MOV, WebM

Provider Variations

  • Google AI (gemini-*): Supports YouTube URLs and Google Cloud Storage (gs://) URIs. File size limit: ~1 GB / up to 1 hour (Gemini 1.5 Pro); ~50 MB / 5 min (Gemini 1.5 Flash, 2.0 Flash).
  • Vertex AI (gemini-* via vertex): Supports base64-encoded video data and GCS URIs. Suited for private or enterprise storage.
  • Other providers: MP4, MPEG, MOV, WebM via URL or base64 (model-dependent).
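Whether a video is sent by URL or embedded as base64, the content part has the same `video_url` shape; only the URL changes. A minimal sketch of a helper covering both cases, with MIME types for the formats listed above; `video_part` is a hypothetical name, and which URL schemes (YouTube, gs://) actually work depends on the provider as described above.

```python
from __future__ import annotations

import base64
from pathlib import Path

# Standard MIME types for the supported container formats
VIDEO_MIME = {
    ".mp4": "video/mp4",
    ".mpeg": "video/mpeg",
    ".mpg": "video/mpeg",
    ".mov": "video/quicktime",
    ".webm": "video/webm",
}


def video_part(source: str, data: bytes | None = None) -> dict:
    """Return a video_url content part.

    With data=None, `source` is passed through as a URL. With bytes, `source`
    is treated as a filename and the bytes are embedded as a base64 data URI.
    """
    if data is None:
        return {"type": "video_url", "video_url": {"url": source}}
    mime = VIDEO_MIME.get(Path(source).suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported video extension: {source}")
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "video_url", "video_url": {"url": f"data:{mime};base64,{encoded}"}}
```

Base64 inflates the payload by roughly a third, so for large files a URL is usually the lighter option where the provider accepts one.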
python
import base64

from openai import OpenAI

client = OpenAI(base_url="https://bazaarlink.ai/api/v1", api_key="sk-bl-YOUR_API_KEY")

# Video analysis via Gemini (Google AI, direct video URL)
response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "video_url",
                "video_url": {"url": "https://example.com/video.mp4"},
            },
            {"type": "text", "text": "What is happening in this video?"},
        ],
    }],
)
print(response.choices[0].message.content)

# Via base64: embed a local file as a data URI
with open("video.mp4", "rb") as f:
    vid_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "video_url",
                "video_url": {"url": f"data:video/mp4;base64,{vid_b64}"},
            },
            {"type": "text", "text": "Describe the key scenes."},
        ],
    }],
)
print(response.choices[0].message.content)

Best Practices

  • Trim to only the necessary segments — shorter videos reduce cost and latency.
  • Use 720p resolution or lower — higher resolution rarely improves model understanding.
  • Compress with H.264 codec for widest compatibility.
  • For long videos, provide a text description of the relevant time range.

PDF Inputs

Send PDF documents directly in messages for models to analyze, summarize, or answer questions about. BazaarLink passes file content through to upstream providers that support it.

Supported Formats

  • PDF documents (text, images, tables, scanned)
  • Base64-encoded data URL (`data:application/pdf;base64,...`)
  • Multi-page documents
  • Password-free PDFs only
python
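import base64

from openai import OpenAI

client = OpenAI(base_url="https://bazaarlink.ai/api/v1", api_key="sk-bl-YOUR_API_KEY")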

with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "file",
                "file": {
                    "filename": "document.pdf",
                    "file_data": f"data:application/pdf;base64,{pdf_data}",
                },
            },
            {"type": "text", "text": "Summarize this document."},
        ],
    }],
)

Processing Engines

| Engine | Pricing | Description |
|---|---|---|
| native | Model cost | Forwards PDF bytes directly. Requires model-native PDF support (Claude, Gemini). |
| pdf-text | Free | Extracts embedded text. Best for text-only PDFs with embedded fonts. Fast, zero extra cost. |
| mistral-ocr | Paid (per page) | OCR extraction that works on scanned PDFs and image-heavy documents. Higher accuracy. |

Selecting a Processing Engine

Pass a `plugins` array to select the PDF parsing engine. The parsed content is automatically injected into the model context — works with any model, not just PDF-native ones.

python
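import base64

from openai import OpenAI

client = OpenAI(base_url="https://bazaarlink.ai/api/v1", api_key="sk-bl-YOUR_API_KEY")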

with open("document.pdf", "rb") as f:
    pdf_data = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="openai/gpt-4o",        # any model — parser injects text into context
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "file",
                "file": {
                    "filename": "report.pdf",
                    "file_data": f"data:application/pdf;base64,{pdf_data}",
                },
            },
            {"type": "text", "text": "Summarize this document."},
        ],
    }],
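    # plugins is not a named argument of the OpenAI SDK's create(); send it via extra_body
    extra_body={
        "plugins": [{
            "id": "file-parser",
            "pdf": {"engine": "mistral-ocr"},  # or "pdf-text" (free), "native"
        }],
    },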
)
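The engine table suggests a simple decision rule: free text extraction when the PDF has embedded text, and something that can read page images when it does not. The sketch below encodes one possible cost-first policy; it is an assumption, not BazaarLink's own routing. `choose_pdf_engine`, `file_parser_plugins`, and `pdf_file_part` are hypothetical helpers that build the same structures as the request above.

```python
from __future__ import annotations

import base64

PDF_ENGINES = {"native", "pdf-text", "mistral-ocr"}


def choose_pdf_engine(model_reads_pdf: bool, scanned: bool) -> str:
    """One possible cost-first policy for picking a parsing engine."""
    if not scanned:
        return "pdf-text"  # embedded text is enough: free and fast
    # Scanned or image-heavy: let a PDF-native model (Claude, Gemini) read it,
    # otherwise fall back to paid OCR.
    return "native" if model_reads_pdf else "mistral-ocr"


def file_parser_plugins(engine: str) -> list[dict]:
    """Build the plugins value used in the request above."""
    if engine not in PDF_ENGINES:
        raise ValueError(f"unknown PDF engine: {engine}")
    return [{"id": "file-parser", "pdf": {"engine": engine}}]


def pdf_file_part(filename: str, data: bytes) -> dict:
    """Build a file content part carrying the PDF as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "file",
        "file": {"filename": filename, "file_data": f"data:application/pdf;base64,{encoded}"},
    }
```

With the OpenAI SDK, pass the result as `extra_body={"plugins": file_parser_plugins(engine)}`.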

Responses API

An OpenAI Responses API-compatible endpoint for stateless multi-turn conversations, tool calling, and multimodal inputs. It suits agents and frameworks built on versions of the OpenAI Python SDK that provide client.responses.create().

POST /api/v1/responses
Note
Accepts the same authentication and model routing as Chat Completions.

Request Body

model (required)
string
Model ID, e.g. "openai/gpt-4o" or "anthropic/claude-3.5-sonnet"
input (required)
string | Item[]
User input — a plain string (single message) or an array of input items for multi-turn / multimodal conversations.
instructions
string
System-level instructions, equivalent to a system message. Must be re-sent on every request.
stream
boolean
If true, returns Responses API SSE stream events. Event types: response.created, response.output_text.delta, response.completed.
max_output_tokens
integer
Maximum number of output tokens to generate (includes reasoning tokens for o-series models).
temperature
number
Sampling temperature 0–2. Higher = more random. Default: 1
top_p
number
Nucleus sampling probability mass. Default: 1
tools
Tool[]
Tool (function) definitions, in the same JSON Schema format as Chat Completions. Built-in tools (web_search_preview, file_search, computer_use_preview) are not supported.
tool_choice
string | object
Controls tool use: "auto", "none", "required", or a specific tool
parallel_tool_calls
boolean
Enable parallel function calling when tools are provided. Default: true
response_format
object
Force structured JSON output. See Structured Output section
models
string[]
Fallback model list — BazaarLink tries each in order if primary fails
transforms
string[]
Message transforms to apply, e.g. ["middle-out"]. If omitted, middle-out is applied automatically on models with a context window of 8k tokens or less
previous_response_id
string
Not supported in this implementation. Use stateless mode: pass the full conversation history in the input array instead.
provider
object
Provider routing preferences — order, only, ignore, sort, allow_fallbacks

Request Schema (TypeScript)

typescript
type ResponsesRequest = {
  model: string;                    // "provider/model-name"
  input: string | InputItem[];      // string or multi-turn array

  // Optional
  instructions?: string;            // System-level message
  stream?: boolean;                 // Default: false
  max_output_tokens?: number;
  temperature?: number;             // Range: [0, 2], default: 1
  top_p?: number;                   // Default: 1
  tools?: Tool[];
  tool_choice?: "auto" | "none" | "required" | object;
  parallel_tool_calls?: boolean;    // Default: true
  response_format?: ResponseFormat; // Same as Chat Completions
  models?: string[];                // Fallback model list
  transforms?: string[];            // e.g. ["middle-out"]
  previous_response_id?: string;    // Not supported; use full input array
  provider?: ProviderPreferences;   // Same as Chat Completions
};

type InputItem =
  | { type?: "message"; role: "user" | "assistant" | "system" | "developer"; content: string | ContentBlock[] }
  | { type: "function_call_output"; call_id: string; output: string }   // tool result
  | { type: "function_call"; call_id: string; name: string; arguments: string };

type ContentBlock =
  | { type: "input_text"; text: string }
  | { type: "input_image"; image_url: string; detail?: "auto" | "low" | "high" };

Example Request

bash
curl https://bazaarlink.ai/api/v1/responses \
  -H "Authorization: Bearer $BAZAARLINK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "instructions": "You are a helpful assistant.",
    "input": "What is the capital of Taiwan?"
  }'

Response Format

typescript
// Non-streaming response object
type ResponsesResponse = {
  id: string;             // "resp_..."
  object: "response";
  created_at: number;
  completed_at: number;
  status: "completed" | "failed" | "incomplete";
  model: string;
  output: OutputItem[];
  usage: {
    input_tokens: number;   // equivalent to prompt_tokens
    output_tokens: number;  // equivalent to completion_tokens
    total_tokens: number;
    cost?: number;          // actual cost in credits
  } | null;
  error: null | { code: string; message: string };
};

type OutputItem =
  | {
      type: "message";
      id: string;
      role: "assistant";
      status: "completed";
      content: Array<{ type: "output_text"; text: string; annotations: [] }>;
    }
  | { type: "function_call"; id: string; call_id: string; name: string; arguments: string; status: "completed" };
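Because output can contain function_call items as well as a message, `output[0]` is only the text reply when the first item happens to be a message. A minimal sketch of reading a non-streaming response defensively, working on the JSON shape above (for an OpenAI SDK response object, `response.model_dump()` gives a comparable dict); `output_text` and `function_calls` are hypothetical helper names.

```python
from __future__ import annotations


def output_text(response: dict) -> str:
    """Concatenate every output_text block from message items, in order."""
    parts: list[str] = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip function_call items
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block["text"])
    return "".join(parts)


def function_calls(response: dict) -> list[dict]:
    """Return the function_call items the model asked you to execute."""
    return [item for item in response.get("output", []) if item.get("type") == "function_call"]
```

Each returned call's `call_id` is what a follow-up `function_call_output` input item must reference.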

Migrating from Chat Completions

Replace messages with input (string or array), use instructions instead of a system-role message, and read output[0].content[0].text instead of choices[0].message.content.

python
# Chat Completions (before)
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user",   "content": "Hello"},
    ]
)
text = response.choices[0].message.content

# Responses API (after)
response = client.responses.create(
    model="openai/gpt-4o-mini",
    instructions="You are helpful.",
    input="Hello"
)
text = response.output[0].content[0].text
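For longer histories, including tool calls, the same migration can be done mechanically: system messages become instructions, tool results become function_call_output items, and assistant tool calls become function_call items, as in the InputItem type above. A minimal sketch; `chat_to_responses` is a hypothetical helper that handles string content only (image parts would still need converting to input_image blocks).

```python
from __future__ import annotations


def chat_to_responses(messages: list[dict]) -> dict:
    """Map Chat Completions messages onto Responses API instructions + input items."""
    instructions: list[str] = []
    items: list[dict] = []
    for m in messages:
        role = m["role"]
        if role == "system":
            instructions.append(m["content"])
        elif role == "tool":
            # A tool result becomes function_call_output keyed by the original call id
            items.append({"type": "function_call_output", "call_id": m["tool_call_id"], "output": m["content"]})
        else:
            if m.get("content"):
                items.append({"type": "message", "role": role, "content": m["content"]})
            # Assistant tool calls become function_call items with the same call id
            for tc in m.get("tool_calls") or []:
                items.append({
                    "type": "function_call",
                    "call_id": tc["id"],
                    "name": tc["function"]["name"],
                    "arguments": tc["function"]["arguments"],
                })
    body: dict = {"input": items}
    if instructions:
        body["instructions"] = "\n\n".join(instructions)
    return body
```

Because the endpoint is stateless, the full converted history (plus `instructions`) is sent on every request.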

Limitations

  • previous_response_id is accepted but ignored — use stateless mode (full input array).
  • Built-in tools (web_search_preview, file_search, computer_use_preview) are not supported.
  • background: true (async execution) is not supported.

Messages (Anthropic)

Anthropic-compatible Messages API for the Anthropic (Claude) SDK. Use it exactly as you would Anthropic's API; only the base URL and the API key change.

POST /api/v1/messages
Note
Accepts Bearer token or x-api-key header (Anthropic SDK compatibility). Max body size: 10 MB.

Request Body

model (required)
string
Model ID, e.g. "openai/gpt-4o" or "anthropic/claude-3.5-sonnet"
max_tokens (required)
integer
Maximum number of tokens to generate (positive integer).
messages (required)
Message[]
Array of conversation messages (non-empty).
system
string
Optional system prompt.
stream
boolean
If true, returns a Server-Sent Events stream. Default: false
temperature
number
Sampling temperature 0–2. Higher = more random. Default: 1
top_p
number
Nucleus sampling probability mass. Default: 1
top_k
integer
Limit token choices to top-K. 0 = disabled (consider all). Default: 0
stop_sequences
string[]
Custom stop sequence strings.
tools
Tool[]
List of tools (functions) the model may call
tool_choice
string | object
Controls tool use: "auto", "none", or specific tool
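Code written against Chat Completions usually keeps the system prompt inside messages, while this endpoint takes it as a separate system field and requires a positive max_tokens. A minimal sketch of converting such a list, assuming plain-string content; `to_messages_request` is a hypothetical helper, and tool messages are rejected rather than translated.

```python
from __future__ import annotations


def to_messages_request(model: str, messages: list[dict], max_tokens: int = 1024) -> dict:
    """Convert OpenAI-style messages into a /api/v1/messages request body."""
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    system_parts: list[str] = []
    convo: list[dict] = []
    for m in messages:
        if m["role"] == "system":
            system_parts.append(m["content"])  # Messages API takes system separately
        elif m["role"] in ("user", "assistant"):
            convo.append({"role": m["role"], "content": m["content"]})
        else:
            raise ValueError(f"role {m['role']!r} needs manual conversion")
    if not convo:
        raise ValueError("messages must contain at least one user/assistant turn")
    body = {"model": model, "max_tokens": max_tokens, "messages": convo}
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    return body
```

Checking these constraints locally avoids a round trip that would otherwise end in a 400 validation error.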

Example Request

python
from anthropic import Anthropic

client = Anthropic(
    base_url="https://bazaarlink.ai/api",  # the SDK appends /v1/messages itself
    api_key="sk-bl-YOUR_KEY",
)

response = client.messages.create(
    model="anthropic/claude-opus-4",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.content[0].text)

Response

json
{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "model": "anthropic/claude-opus-4",
  "content": [
    { "type": "text", "text": "Hello! How can I help you today?" }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 10,
    "output_tokens": 12,
    "cache_read_input_tokens": 0,
    "cache_creation_input_tokens": 0,
    "bz_cost": 0.00042
  }
}

Errors: 400 (validation), 402 (insufficient credits), 429 (rate limit), 502 (upstream error / missing key), 503 (server restart).

Models

List all available models with pricing and capability information. No authentication required for this endpoint.

GET /api/v1/models
bash
curl https://bazaarlink.ai/api/v1/models

Response

json
{
  "data": [
    {
      "id": "openai/gpt-4.1",
      "name": "GPT 4.1",
      "context_length": 1047576,
      "modality": "text+image+file->text",
      "pricing": {
        "prompt": "2.00",
        "completion": "8.00"
      }
    }
  ]
}
typescript
// /v1/models — Response Schema
type ModelsResponse = {
  data: Model[];
};

type Model = {
  id: string;                    // Model ID (e.g. "openai/gpt-4.1")
  name: string;                  // Human-readable name
  context_length: number | null; // Max context window in tokens
  modality: string | null;       // e.g. "text->text", "text+image->text"
  pricing: {
    prompt: string;              // Input price per 1M tokens (USD)
    completion: string;          // Output price per 1M tokens (USD)
  };
  description?: string | null;   // Model description
  top_provider?: {
    max_completion_tokens?: number;
  };
  supported_parameters?: string[]; // e.g. ["tools", "response_format", "reasoning"]
};
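Since pricing is quoted in USD per 1M tokens, a request's cost can be estimated from its token counts. A minimal sketch; `estimate_cost` and `output_modalities` are hypothetical helpers working on one entry of the data array. With the openai/gpt-4.1 prices above, 28 prompt and 74 completion tokens give 0.000648, matching the usage.cost in the Chat Completions example response; actual usage.cost can still differ (for example with cached or reasoning tokens), so treat this as an estimate.

```python
from __future__ import annotations


def estimate_cost(model: dict, prompt_tokens: int, completion_tokens: int) -> float:
    """Rough USD cost from /v1/models pricing strings (USD per 1M tokens)."""
    pricing = model["pricing"]
    return (
        prompt_tokens * float(pricing["prompt"])
        + completion_tokens * float(pricing["completion"])
    ) / 1_000_000


def output_modalities(model: dict) -> set[str]:
    """Parse the output side of a modality string such as "text+image+file->text+image"."""
    modality = model.get("modality") or ""
    _, arrow, outputs = modality.partition("->")
    return set(outputs.split("+")) if arrow else set()
```

For example, filtering the models list with `"image" in output_modalities(m)` selects models that can return images.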

Available Models (228)

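These are the models currently available on BazaarLink, loaded dynamically from our database. Each entry shows the model ID, context length, price in USD per 1M input/output tokens, and modality: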

Qwen
qwen/qwen3-vl-32b-instruct · 262K ctx · $0.10/$0.42 · text+image->text
qwen/qwen3-vl-30b-a3b-instruct · 262K ctx · $0.13/$0.52 · text+image->text
qwen/qwen3-30b-a3b-thinking-2507 · 131K ctx · $0.08/$0.40 · text->text
qwen/qwen3.5-397b-a17b · 262K ctx · $0.39/$2.34 · text+image+video->text
qwen/qwen3-coder-plus · 1000K ctx · $0.65/$3.25 · text->text
qwen/qwen3-next-80b-a3b-instruct · 262K ctx · $0.09/$1.10 · text->text
qwen/qwen3-235b-a22b-thinking-2507 · 262K ctx · $0.15/$1.50 · text->text
qwen/qwen3.5-35b-a3b · 262K ctx · $0.14/$1.00 · text+image+video->text
qwen/qwen3-vl-8b-thinking · 256K ctx · $0.12/$1.36 · text+image->text
qwen/qwen-plus · 1000K ctx · $0.26/$0.78 · text->text
qwen/qwen3-235b-a22b · 131K ctx · $0.46/$1.82 · text->text
qwen/qwen3.5-122b-a10b · 262K ctx · $0.26/$2.08 · text+image+video->text
qwen/qwen3-coder-next · 262K ctx · $0.11/$0.80 · text->text
qwen/qwen3-vl-235b-a22b-thinking · 131K ctx · $0.26/$2.60 · text+image->text
qwen/qwen-plus-2025-07-28 · 1000K ctx · $0.26/$0.78 · text->text
qwen/qwen3.5-27b · 262K ctx · $0.20/$1.56 · text+image+video->text
qwen/qwen-2.5-7b-instruct · 131K ctx · $0.04/$0.10 · text->text
qwen/qwen3-14b · 132K ctx · $0.10/$0.24 · text->text
qwen/qwen3-max-thinking · 262K ctx · $0.78/$3.90 · text->text
qwen/qwen-plus-2025-07-28:thinking · 1000K ctx · $0.26/$0.78 · text->text
qwen/qwen3-vl-30b-a3b-thinking · 131K ctx · $0.13/$1.56 · text+image->text
qwen/qwen3-coder-flash · 1000K ctx · $0.20/$0.97 · text->text
qwen/qwen3-max · 262K ctx · $0.78/$3.90 · text->text
qwen/qwen3.5-plus-02-15 · 1000K ctx · $0.26/$1.56 · text+image+video->text
qwen/qwen-2.5-72b-instruct · 131K ctx · $0.36/$0.40 · text->text
qwen/qwen2.5-vl-72b-instruct · 131K ctx · $0.25/$0.75 · text+image->text
qwen/qwen-2.5-coder-32b-instruct · 128K ctx · $0.66/$1.00 · text->text
qwen/qwen3-8b · 131K ctx · $0.05/$0.40 · text->text
qwen/qwen3-vl-8b-instruct · 256K ctx · $0.08/$0.50 · text+image->text
qwen/qwen3-235b-a22b-2507 · 262K ctx · $0.07/$0.10 · text->text
qwen/qwen3-coder · 1049K ctx · $0.22/$1.80 · text->text
qwen/qwen3.5-flash-02-23 · 1000K ctx · $0.07/$0.26 · text+image+video->text
qwen/qwen3-next-80b-a3b-thinking · 262K ctx · $0.10/$0.78 · text->text
qwen/qwen3-vl-235b-a22b-instruct · 262K ctx · $0.20/$0.88 · text+image->text
qwen/qwen3-coder-30b-a3b-instruct · 160K ctx · $0.07/$0.27 · text->text
qwen/qwen3-30b-a3b-instruct-2507 · 262K ctx · $0.09/$0.30 · text->text
qwen/qwen3-32b · 131K ctx · $0.08/$0.28 · text->text
OpenAI
openai/gpt-3.5-turbo · 16K ctx · $0.50/$1.50 · text->text
openai/gpt-5.3-chat · 128K ctx · $1.75/$14.00 · text+image+file->text
openai/gpt-4o-2024-08-06 · 128K ctx · $2.50/$10.00 · text+image+file->text
openai/gpt-3.5-turbo-16k · 16K ctx · $3.00/$4.00 · text->text
openai/gpt-oss-120b · 131K ctx · $0.04/$0.18 · text->text
openai/gpt-4o-mini-search-preview · 128K ctx · $0.15/$0.60 · text->text
openai/gpt-4-0314 · 8K ctx · $30.00/$60.00 · text->text
openai/gpt-4o-mini-2024-07-18 · 128K ctx · $0.15/$0.60 · text+image+file->text
openai/gpt-4o-2024-11-20 · 128K ctx · $2.50/$10.00 · text+image+file->text
openai/gpt-oss-safeguard-20b · 131K ctx · $0.07/$0.30 · text->text
openai/gpt-5.4-image-2 · 272K ctx · $8.00/$15.00 · text+image+file->text+image
openai/gpt-3.5-turbo-instruct · 4K ctx · $1.50/$2.00 · text->text
openai/gpt-4o · 128K ctx · $2.50/$10.00 · text+image+file->text
openai/gpt-4 · 8K ctx · $30.00/$60.00 · text->text
openai/gpt-3.5-turbo-0613 · 4K ctx · $1.00/$2.00 · text->text
openai/gpt-4o-search-preview · 128K ctx · $2.50/$10.00 · text->text
openai/gpt-4-turbo-preview · 128K ctx · $10.00/$30.00 · text->text
openai/gpt-5.3-codex · 400K ctx · $1.75/$14.00 · text+image+file->text
openai/gpt-4-turbo · 128K ctx · $10.00/$30.00 · text+image->text
openai/gpt-4o-mini · 128K ctx · $0.15/$0.60 · text+image+file->text
openai/gpt-oss-20b · 131K ctx · $0.03/$0.14 · text->text
openai/gpt-4o-2024-05-13 · 128K ctx · $5.00/$15.00 · text+image+file->text
openai/gpt-4o-mini-tts-2025-12-15 · 4K ctx · $0.60/$0.00 · text->speech
Mistral
mistralai/mistral-large · 128K ctx · $2.00/$6.00 · text+file->text
mistralai/mistral-nemo · 131K ctx · $0.02/$0.03 · text->text
mistralai/codestral-2508 · 256K ctx · $0.30/$0.90 · text+file->text
mistralai/mistral-small-3.1-24b-instruct · 128K ctx · $0.35/$0.56 · text+image->text
mistralai/devstral-medium · 131K ctx · $0.40/$2.00 · text+file->text
mistralai/ministral-8b-2512 · 262K ctx · $0.15/$0.15 · text+image->text
mistralai/mistral-small-3.2-24b-instruct · 128K ctx · $0.07/$0.20 · text+image->text
mistralai/mistral-large-2407 · 131K ctx · $2.00/$6.00 · text+file->text
mistralai/mistral-large-2512 · 262K ctx · $0.50/$1.50 · text+image+file->text
mistralai/devstral-small · 131K ctx · $0.10/$0.30 · text+file->text
mistralai/mistral-large-2411 · 131K ctx · $2.00/$6.00 · text+file->text
mistralai/mistral-small-24b-instruct-2501 · 33K ctx · $0.05/$0.08 · text->text
mistralai/ministral-14b-2512 · 262K ctx · $0.20/$0.20 · text+image->text
mistralai/ministral-3b-2512 · 131K ctx · $0.10/$0.10 · text+image->text
mistralai/voxtral-small-24b-2507 · 32K ctx · $0.10/$0.30 · text+file+audio->text
mistralai/mistral-saba · 33K ctx · $0.20/$0.60 · text+file->text
mistralai/devstral-2512 · 262K ctx · $0.40/$2.00 · text+file->text
mistralai/mistral-medium-3 · 131K ctx · $0.40/$2.00 · text+image+file->text
Google
google/gemini-2.5-flash-lite-preview-09-2025 · 1049K ctx · $0.10/$0.40 · text+image+file+audio+video->text
google/gemini-3.1-pro-preview · 1049K ctx · $2.00/$12.00 · text+image+file+audio+video->text
google/gemini-3.1-pro-preview-customtools · 1049K ctx · $2.00/$12.00 · text+image+file+audio+video->text
google/gemma-2-27b-it · 8K ctx · $0.65/$0.65 · text->text
google/gemini-2.5-flash-image · 33K ctx · $0.30/$2.50 · text+image->text+image
google/gemini-3.1-flash-image-preview · 131K ctx · $0.50/$3.00 · text+image->text+image
google/gemini-2.5-pro · 1049K ctx · $1.25/$10.00 · text+image+file+audio+video->text
google/gemma-3-4b-it · 131K ctx · $0.04/$0.08 · text+image->text
google/gemma-3-27b-it · 131K ctx · $0.08/$0.16 · text+image->text
google/gemini-3-pro-image-preview · 66K ctx · $2.00/$12.00 · text+image->text+image
google/gemini-3.1-flash-lite-preview · 1049K ctx · $0.25/$1.50 · text+image+file+audio+video->text
google/gemini-3-flash-preview · 1049K ctx · $0.50/$3.00 · text+image+file+audio+video->text
google/gemma-3-12b-it · 131K ctx · $0.04/$0.13 · text+image->text
google/gemini-2.5-pro-preview · 1049K ctx · $1.25/$10.00 · text+image+file+audio->text
google/gemini-2.5-pro-preview-05-06 · 1049K ctx · $1.25/$10.00 · text+image+file+audio+video->text
google/gemini-2.5-flash-lite · 1049K ctx · $0.10/$0.40 · text+image+file+audio+video->text
google/gemini-2.5-flash · 1049K ctx · $0.30/$2.50 · text+image+file+audio+video->text
Meta
meta-llama/llama-4-maverick · 1049K ctx · $0.15/$0.60 · text+image->text
meta-llama/llama-4-scout · 10000K ctx · $0.08/$0.30 · text+image->text
meta-llama/llama-3.3-70b-instruct · 131K ctx · $0.10/$0.32 · text->text
meta-llama/llama-3-70b-instruct · 8K ctx · $0.51/$0.74 · text->text
meta-llama/llama-3.1-70b-instruct · 131K ctx · $0.40/$0.40 · text->text
meta-llama/llama-guard-3-8b · 131K ctx · $0.48/$0.03 · text->text
meta-llama/llama-3.1-8b-instruct · 131K ctx · $0.02/$0.05 · text->text
meta-llama/llama-3-8b-instruct · 8K ctx · $0.04/$0.04 · text->text
meta-llama/llama-3.2-1b-instruct · 131K ctx · $0.03/$0.20 · text->text
meta-llama/llama-3.2-11b-vision-instruct · 131K ctx · $0.24/$0.24 · text+image->text
meta-llama/llama-guard-4-12b · 164K ctx · $0.18/$0.18 · text+image->text
meta-llama/llama-3.2-3b-instruct · 131K ctx · $0.05/$0.34 · text->text
Anthropic
anthropic/claude-sonnet-4.6 · 1000K ctx · $3.00/$15.00 · text+image+file->text
anthropic/claude-3-haiku · 200K ctx · $0.25/$1.25 · text+image->text
anthropic/claude-opus-4.7 · 1000K ctx · $5.00/$25.00 · text+image+file->text
anthropic/claude-3.5-haiku · 200K ctx · $0.80/$4.00 · text+image->text
anthropic/claude-sonnet-4 · 1000K ctx · $3.00/$15.00 · text+image+file->text
anthropic/claude-opus-4.5 · 200K ctx · $5.00/$25.00 · text+image+file->text
anthropic/claude-sonnet-4.5 · 1000K ctx · $3.00/$15.00 · text+image+file->text
anthropic/claude-opus-4.1 · 200K ctx · $15.00/$75.00 · text+image+file->text
anthropic/claude-haiku-4.5 · 200K ctx · $1.00/$5.00 · text+image+file->text
anthropic/claude-opus-4.6 · 1000K ctx · $5.00/$25.00 · text+image+file->text
anthropic/claude-opus-4 · 200K ctx · $15.00/$75.00 · text+image+file->text
DeepSeek
deepseek/deepseek-v3.2-exp · 164K ctx · $0.27/$0.41 · text->text
deepseek/deepseek-v3.2-speciale · 164K ctx · $0.29/$0.43 · text->text
deepseek/deepseek-r1-0528 · 164K ctx · $0.50/$2.15 · text->text
deepseek/deepseek-r1 · 164K ctx · $0.70/$2.50 · text->text
deepseek/deepseek-chat-v3.1 · 164K ctx · $0.21/$0.79 · text->text
deepseek/deepseek-chat-v3-0324 · 164K ctx · $0.20/$0.77 · text->text
deepseek/deepseek-chat · 131K ctx · $0.23/$0.91 · text->text
deepseek/deepseek-r1-distill-llama-70b · 131K ctx · $0.70/$0.80 · text->text
deepseek/deepseek-r1-distill-qwen-32b · 128K ctx · $0.29/$0.29 · text->text
deepseek/deepseek-v3.2 · 131K ctx · $0.25/$0.38 · text->text
deepseek/deepseek-v3.1-terminus · 164K ctx · $0.27/$0.95 · text->text
z-ai
z-ai/glm-4.7 · 203K ctx · $0.40/$1.75 · text->text
z-ai/glm-5 · 203K ctx · $0.60/$1.92 · text->text
z-ai/glm-4.5-air:free · 131K ctx · $0.00/$0.00 · text->text
z-ai/glm-4-32b · 128K ctx · $0.10/$0.10 · text->text
z-ai/glm-4.6v · 131K ctx · $0.30/$0.90 · text+image+video->text
z-ai/glm-4.5v · 66K ctx · $0.60/$1.80 · text+image->text
z-ai/glm-4.5 · 131K ctx · $0.60/$2.20 · text->text
z-ai/glm-4.7-flash · 203K ctx · $0.06/$0.40 · text->text
z-ai/glm-4.5-air · 131K ctx · $0.13/$0.85 · text->text
z-ai/glm-4.6 · 203K ctx · $0.43/$1.74 · text->text
nvidia
nvidia/nemotron-3-nano-30b-a3b:free · 256K ctx · $0.00/$0.00 · text->text
nvidia/nemotron-nano-12b-v2-vl:free · 128K ctx · $0.00/$0.00 · text+image+video->text
nvidia/nemotron-nano-9b-v2 · 131K ctx · $0.04/$0.16 · text->text
nvidia/nemotron-nano-9b-v2:free · 128K ctx · $0.00/$0.00 · text->text
nvidia/nemotron-3-nano-30b-a3b · 262K ctx · $0.05/$0.20 · text->text
nvidia/llama-3.3-nemotron-super-49b-v1.5 · 131K ctx · $0.10/$0.40 · text->text
minimax
minimax/minimax-m2-her · 66K ctx · $0.30/$1.20 · text->text
minimax/minimax-m2.5 · 205K ctx · $0.15/$1.15 · text->text
minimax/minimax-m2 · 205K ctx · $0.26/$1.00 · text->text
minimax/minimax-01 · 1000K ctx · $0.20/$1.10 · text+image->text
minimax/minimax-m2.1 · 205K ctx · $0.29/$0.95 · text->text
minimax/minimax-m1 · 1000K ctx · $0.40/$2.20 · text->text
baidu
baidu/ernie-4.5-vl-28b-a3b · 131K ctx · $0.14/$0.56 · text+image->text
baidu/ernie-4.5-300b-a47b · 131K ctx · $0.28/$1.10 · text->text
baidu/ernie-4.5-vl-424b-a47b · 131K ctx · $0.42/$1.25 · text+image->text
baidu/ernie-4.5-21b-a3b · 131K ctx · $0.07/$0.28 · text->text
baidu/ernie-4.5-21b-a3b-thinking · 131K ctx · $0.07/$0.28 · text->text
amazon
amazon/nova-premier-v1 · 1000K ctx · $2.50/$12.50 · text+image->text
amazon/nova-micro-v1 · 128K ctx · $0.04/$0.14 · text->text
amazon/nova-pro-v1 · 300K ctx · $0.80/$3.20 · text+image->text
amazon/nova-2-lite-v1 · 1000K ctx · $0.30/$2.50 · text+image+file+video->text
amazon/nova-lite-v1 · 300K ctx · $0.06/$0.24 · text+image->text
perplexity
perplexity/sonar · 127K ctx · $1.00/$1.00 · text+image->text
perplexity/sonar-reasoning-pro · 128K ctx · $2.00/$8.00 · text+image->text
perplexity/sonar-pro · 200K ctx · $3.00/$15.00 · text+image->text
perplexity/sonar-pro-search · 200K ctx · $3.00/$15.00 · text+image->text
perplexity/sonar-deep-research · 128K ctx · $2.00/$8.00 · text->text
aion-labs
aion-labs/aion-2.0 · 131K ctx · $0.80/$1.60 · text->text
aion-labs/aion-1.0 · 131K ctx · $4.00/$8.00 · text->text
aion-labs/aion-rp-llama-3.1-8b · 33K ctx · $0.80/$1.60 · text->text
aion-labs/aion-1.0-mini · 131K ctx · $0.70/$1.40 · text->text
bytedance-seed
bytedance-seed/seed-2.0-mini · 262K ctx · $0.10/$0.40 · text+image+video->text
bytedance-seed/seed-1.6-flash · 262K ctx · $0.07/$0.30 · text+image+video->text
bytedance-seed/seedream-4.5 · 4K ctx · $0.00/$0.00 · image+text->image
bytedance-seed/seed-1.6 · 262K ctx · $0.25/$2.00 · text+image+video->text
thedrummer
thedrummer/cydonia-24b-v4.1 · 131K ctx · $0.30/$0.50 · text->text
thedrummer/unslopnemo-12b · 33K ctx · $0.40/$0.40 · text->text
thedrummer/rocinante-12b · 33K ctx · $0.17/$0.43 · text->text
thedrummer/skyfall-36b-v2 · 33K ctx · $0.55/$0.80 · text->text
moonshotai
moonshotai/kimi-k2 · 131K ctx · $0.57/$2.30 · text->text
moonshotai/kimi-k2.5 · 262K ctx · $0.40/$1.90 · text+image->text
moonshotai/kimi-k2-thinking · 262K ctx · $0.60/$2.50 · text->text
moonshotai/kimi-k2-0905 · 262K ctx · $0.60/$2.50 · text->text
sao10k
sao10k/l3.3-euryale-70b · 131K ctx · $0.65/$0.75 · text->text
sao10k/l3.1-euryale-70b · 131K ctx · $0.85/$0.85 · text->text
sao10k/l3.1-70b-hanami-x1 · 16K ctx · $3.00/$3.00 · text->text
sao10k/l3-lunaris-8b · 8K ctx · $0.04/$0.05 · text->text
Cohere
cohere/command-r-plus-08-2024 · 128K ctx · $2.50/$10.00 · text->text
cohere/command-a · 256K ctx · $2.50/$10.00 · text->text
cohere/command-r-08-2024 · 128K ctx · $0.15/$0.60 · text->text
cohere/command-r7b-12-2024 · 128K ctx · $0.04/$0.15 · text->text
Nous
nousresearch/hermes-4-70b · 131K ctx · $0.13/$0.40 · text->text
nousresearch/hermes-3-llama-3.1-70b · 131K ctx · $0.30/$0.30 · text->text
nousresearch/hermes-3-llama-3.1-405b · 131K ctx · $1.00/$1.00 · text->text
nousresearch/hermes-4-405b · 131K ctx · $1.00/$3.00 · text->text
bytedance
bytedance/ui-tars-1.5-7b · 128K ctx · $0.10/$0.20 · text+image->text
bytedance/seedance-2.0 · $0.00/$0.00 · text+image->video
bytedance/seedance-2.0-fast · $0.00/$0.00 · text+image->video
liquid
liquid/lfm-2-24b-a2b · 128K ctx · $0.03/$0.12 · text->text
liquid/lfm-2.5-1.2b-instruct:free · 33K ctx · $0.00/$0.00 · text->text
liquid/lfm-2.5-1.2b-thinking:free · 33K ctx · $0.00/$0.00 · text->text
morph
morph/morph-v3-large · 262K ctx · $0.90/$1.90 · text->text
morph/morph-v3-fast · 82K ctx · $0.80/$1.20 · text->text
inflection
inflection/inflection-3-productivity · 8K ctx · $2.50/$10.00 · text->text
inflection/inflection-3-pi · 8K ctx · $2.50/$10.00 · text->text
openrouter
openrouter/free · 200K ctx · $0.00/$0.00 · text+image->text
openrouter/bodybuilder · 128K ctx · $-1000000.00/$-1000000.00 · text->text
stepfun
stepfun/step-3.5-flash · 262K ctx · $0.09/$0.30 · text->text
arcee-ai
arcee-ai/trinity-mini · 131K ctx · $0.04/$0.15 · text->text
undi95
undi95/remm-slerp-l2-13b · 6K ctx · $0.45/$0.65 · text->text
gryphe
gryphe/mythomax-l2-13b · 4K ctx · $0.06/$0.06 · text->text
ibm-granite
ibm-granite/granite-4.0-h-micro · 131K ctx · $0.02/$0.11 · text->text
switchpoint
switchpoint/router · 131K ctx · $0.85/$3.40 · text->text
essentialai
essentialai/rnj-1-instruct · 33K ctx · $0.15/$0.15 · text->text
tencent
tencent/hunyuan-a13b-instruct · 131K ctx · $0.14/$0.57 · text->text
writer
writer/palmyra-x5 · 1040K ctx · $0.60/$6.00 · text->text
allenai
allenai/olmo-3-32b-think · 66K ctx · $0.15/$0.50 · text->text
nex-agi
nex-agi/deepseek-v3.1-nex-n1 · 131K ctx · $0.14/$0.50 · text->text
xiaomi
xiaomi/mimo-v2-flash · 262K ctx · $0.10/$0.30 · text->text
alfredpros
alfredpros/codellama-7b-instruct-solidity · 4K ctx · $0.80/$1.20 · text->text
prime-intellect
prime-intellect/intellect-3 · 131K ctx · $0.20/$1.10 · text->text
mancer
mancer/weaver · 8K ctx · $0.75/$1.00 · text->text
upstage
upstage/solar-pro-3 · 128K ctx · $0.15/$0.60 · text->text
anthracite-org
anthracite-org/magnum-v4-72b · 33K ctx · $3.00/$5.00 · text->text
ai21
ai21/jamba-large-1.7 · 256K ctx · $2.00/$8.00 · text->text
Microsoft
microsoft/wizardlm-2-8x22b · 66K ctx · $0.62/$0.62 · text->text
black-forest-labs
black-forest-labs/flux.2-max · 47K ctx · $0.00/$0.00 · text+image->image
deepcogito
deepcogito/cogito-v2.1-671b · 128K ctx · $1.25/$1.25 · text->text
relace
relace/relace-search · 256K ctx · $1.00/$3.00 · text->text

Browse all models on the Models page.

Streaming

Set stream: true to receive a Server-Sent Events (SSE) stream. Each event contains a chunk of the response.

python
from openai import OpenAI

client = OpenAI(
    base_url="https://bazaarlink.ai/api/v1",
    api_key="sk-bl-YOUR_API_KEY",
)

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Count to 10 slowly."}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

SSE Format

bash
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"},"index":0}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{},"finish_reason":"stop","index":0}],"usage":{"prompt_tokens":10,"completion_tokens":4,"total_tokens":14}}

data: [DONE]
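Usage in streaming
When streaming, usage data arrives in the last data chunk before the [DONE] message. Depending on the upstream provider, that chunk may carry an empty choices array or, as in the example above, the final delta with its finish_reason.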
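If you read the stream without the OpenAI SDK (for example, from raw HTTP lines), the following minimal sketch parses the SSE format shown above. It assumes each event is a single `data:` line, skips blank lines and SSE comment lines, and takes `usage` from whichever chunk carries it, so it works whether or not that chunk has an empty `choices` array.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join delta.content pieces and capture usage from SSE 'data:' lines."""
    text_parts = []
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank event separators and SSE comments (lines starting with ':')
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):  # may be empty in a usage-only chunk
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage


SAMPLE_STREAM = """\
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"},"index":0}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{},"finish_reason":"stop","index":0}],"usage":{"prompt_tokens":10,"completion_tokens":4,"total_tokens":14}}

data: [DONE]
"""

text, usage = collect_stream(SAMPLE_STREAM.splitlines())
print(text)
print(usage)
```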

Cursor IDE Integration

Use BazaarLink as Cursor's OpenAI Override URL. Drop-in setup with auto-conversion of Responses API bodies, tool-format normalization, and the bz- prefix convention for Claude models.

Quick setup

In Cursor, open Settings → Models, then:

  1. Set Override OpenAI Base URL to https://bazaarlink.ai/v1
  2. Set Override OpenAI API Key to your sk-bl-... BazaarLink key
  3. Add the model name you want — see below for Claude (the bz- prefix).
Backward compatibility
The legacy URL https://bazaarlink.ai/v1/cursor still works — it's now a thin re-export of /v1/chat/completions. New setups should use /v1 directly.

The bz- prefix (for Claude models)

Cursor's client-side validation reroutes any model name starting with claude- through Cursor's own Anthropic integration, bypassing your Override URL. To force Cursor to send the request to BazaarLink, prefix the model name with bz-. The server strips the prefix and resolves the rest via the alias map.

Type in Cursor → Resolves to
bz-claude-sonnet-4.6 → anthropic/claude-sonnet-4.6
bz-claude-opus-4-7 → anthropic/claude-opus-4-7
gpt-4o → openai/gpt-4o
gemini-2.5-flash → google/gemini-2.5-flash

The dot-vs-hyphen variants are normalized: bz-claude-sonnet-4.6 and bz-claude-sonnet-4-6 both resolve to the same model.
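The resolution rule can be sketched as follows. The alias map here is a hypothetical stand-in built from the table above (BazaarLink's real map lives server-side), and the sketch assumes normalization works by comparing names with dots replaced by hyphens.

```python
from typing import Optional

# Hypothetical alias map keyed by normalized names (dots replaced by hyphens).
ALIAS_MAP = {
    "claude-sonnet-4-6": "anthropic/claude-sonnet-4.6",
    "claude-opus-4-7": "anthropic/claude-opus-4-7",
    "gpt-4o": "openai/gpt-4o",
    "gemini-2-5-flash": "google/gemini-2.5-flash",
}


def normalize_key(name: str) -> str:
    """Treat '4.6' and '4-6' as the same spelling."""
    return name.replace(".", "-")


def resolve_cursor_model(name: str) -> Optional[str]:
    """Strip the bz- prefix, then look the remaining name up in the alias map."""
    if name.startswith("bz-"):
        name = name[len("bz-"):]
    return ALIAS_MAP.get(normalize_key(name))


print(resolve_cursor_model("bz-claude-sonnet-4-6"))
```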

CURSOR_MODEL_MAP env var (operator override)

For self-hosted BazaarLink deployments, set this env var to remap arbitrary Cursor-side model names to canonical OpenRouter ids:

bash
CURSOR_MODEL_MAP=gpt-claude-sonnet:anthropic/claude-sonnet-4.6,gpt-opus:anthropic/claude-opus-4-7

Now gpt-claude-sonnet typed in Cursor maps to anthropic/claude-sonnet-4.6 server-side. Useful when you want Cursor to think a model is GPT-family (so it routes through the Override URL) while you actually serve Claude.
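A minimal parser for the comma-separated `name:target` format shown above. Splitting each entry on its first colon only is an assumption about how the server reads the value; it keeps targets such as `z-ai/glm-4.5-air:free` intact. In a deployment the input string would come from `os.environ["CURSOR_MODEL_MAP"]`.

```python
from typing import Dict


def parse_model_map(value: str) -> Dict[str, str]:
    """Parse 'cursor-name:provider/model,other:provider/model' into a dict."""
    mapping: Dict[str, str] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas
        # Split on the first ':' only so ':free'-style suffixes survive.
        cursor_name, sep, target = entry.partition(":")
        if not sep or not cursor_name.strip() or not target.strip():
            raise ValueError(f"malformed CURSOR_MODEL_MAP entry: {entry!r}")
        mapping[cursor_name.strip()] = target.strip()
    return mapping


example = "gpt-claude-sonnet:anthropic/claude-sonnet-4.6,gpt-opus:anthropic/claude-opus-4-7"
print(parse_model_map(example))
```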

What happens automatically

When a request hits /api/v1/chat/completions, BazaarLink applies these compatibility transforms transparently — you don't need to do anything client-side:

  • Auto-detects Responses API bodies — if the body has input instead of messages, it's converted to Chat Completions shape (Cursor sends Responses API format for GPT-family models).
  • Wraps flat tool definitions — Cursor Agent sends { name, description, parameters } without a function wrapper. We add the wrapper so Anthropic doesn't reject the request with Tool '' not found in provided tools.
  • Coerces malformed tool_choice — Cursor sends { type: "auto" } (object form, no function). The OpenAI spec requires the string form for auto/none/required, so we coerce.
  • Strips OpenAI-only fields when routing to non-OpenAI providers — parallel_tool_calls, logprobs, top_logprobs, logit_bias, service_tier, user are removed before forwarding (Anthropic returns 400 otherwise).
  • Maps max_output_tokens → max_tokens and strips Responses-API-only fields (previous_response_id, truncation, background, store). The reasoning field is preserved for Chat-Completions-native bodies (it's a valid OpenRouter passthrough).
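To make the transforms above concrete, here is a rough sketch of them as one Python function. It is not BazaarLink's implementation: the `provider` label, the handling of list-valued `input`, and applying the Responses-only cleanup only when `input` was present are assumptions.

```python
import copy
from typing import Any, Dict

OPENAI_ONLY = ("parallel_tool_calls", "logprobs", "top_logprobs", "logit_bias", "service_tier", "user")
RESPONSES_ONLY = ("previous_response_id", "truncation", "background", "store")
SHORTHAND_CHOICES = ("auto", "none", "required")


def normalize_body(body: Dict[str, Any], provider: str) -> Dict[str, Any]:
    """Apply the compatibility transforms to a copy of an incoming request body."""
    out = copy.deepcopy(body)

    # Responses API body: `input` instead of `messages`.
    if "input" in out and "messages" not in out:
        raw = out.pop("input")
        # Assumption: a string is one user turn; a list already holds role/content items.
        out["messages"] = [{"role": "user", "content": raw}] if isinstance(raw, str) else raw
        if "max_output_tokens" in out:
            out["max_tokens"] = out.pop("max_output_tokens")
        for field in RESPONSES_ONLY:
            out.pop(field, None)

    # Flat tools {name, description, parameters} get the `function` wrapper.
    if out.get("tools"):
        out["tools"] = [
            tool if "function" in tool
            else {"type": "function", "function": {k: v for k, v in tool.items() if k != "type"}}
            for tool in out["tools"]
        ]

    # {"type": "auto"} becomes the string form "auto" (same for none/required).
    choice = out.get("tool_choice")
    if isinstance(choice, dict) and "function" not in choice and choice.get("type") in SHORTHAND_CHOICES:
        out["tool_choice"] = choice["type"]

    # OpenAI-only fields are dropped for every other provider; `reasoning` is left untouched.
    if provider != "openai":
        for field in OPENAI_ONLY:
            out.pop(field, None)
    return out


cursor_body = {
    "input": "List the files in src/",
    "tools": [{"type": "function", "name": "Shell", "description": "Run a command",
               "parameters": {"type": "object", "properties": {}}}],
    "tool_choice": {"type": "auto"},
    "parallel_tool_calls": True,
}
print(normalize_body(cursor_body, provider="anthropic"))
```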

Cursor Agent mode

Tool calling works through the standard Chat Completions tool-call flow. Cursor sends tools (Shell, Read, Write, Grep, etc.) with tool_choice: "auto"; BazaarLink forwards to your chosen provider, which decides whether to call a tool. Tool calls return as standard OpenAI tool_calls deltas; Cursor executes locally and continues the conversation. Works the same regardless of whether you pick gpt-4o (native OpenAI) or bz-claude-sonnet-4.6 (Anthropic via OpenRouter).

Debugging upstream rejects
If you see provider 4xx errors, check the admin Provider Health panel. Every 4xx response is persisted with the full upstream error body and a summary of the request body we forwarded — click any 🔴 row to expand the JSON.

Embeddings

Generate text embeddings compatible with the OpenAI Embeddings API.

POST/api/v1/embeddings
Note
Not all upstream providers support embeddings. If your configured provider does not support the requested model, BazaarLink automatically fails over to the next available provider.
python
from openai import OpenAI

client = OpenAI(
    base_url="https://bazaarlink.ai/api/v1",
    api_key="sk-bl-YOUR_API_KEY",
)

response = client.embeddings.create(
    model="openai/text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog",
)

print(response.data[0].embedding)  # 1536-dimensional vector

Parameters

Sampling parameters shape the token generation process. BazaarLink passes supported parameters to the upstream provider; unsupported parameters are silently ignored.

Sampling Parameters

temperature
number
Sampling temperature 0–2. Higher = more random. Default: 1
top_p
number
Nucleus sampling probability mass. Default: 1
top_k
integer
Limit token choices to top-K. 0 = disabled (consider all). Default: 0
frequency_penalty
number
Penalize repeated tokens. Range: [-2, 2]. Default: 0
presence_penalty
number
Penalize tokens based on presence. Range: [-2, 2]. Default: 0
repetition_penalty
number
Reduce token repetition from input. Range: (0, 2]. Default: 1
min_p
number
Minimum probability relative to the top token. Range: [0, 1]. Default: 0
top_a
number
Dynamic top-P based on highest-probability token. Range: [0, 1]. Default: 0
seed
integer
Integer seed for deterministic sampling. Not guaranteed for all models
max_tokens
integer
Maximum number of tokens to generate
n
integer
Number of completions to generate. Default: 1
logit_bias
object
Map token IDs to bias values [-100, 100] added before sampling
logprobs
boolean
Return log probabilities of each output token
top_logprobs
integer
Number of most-likely tokens to return per position (requires logprobs: true). Range: 0–20
response_format
object
Force structured JSON output. See Structured Output section
stop
string | string[]
Stop sequences — generation halts when encountered
tools
Tool[]
List of tools (functions) the model may call
tool_choice
string | object
Controls tool use: "auto", "none", or specific tool
parallel_tool_calls
boolean
Enable parallel function calling when tools are provided. Default: true

BazaarLink-only Parameters

transforms
string[]
Message transforms to apply, e.g. ["middle-out"]. Omit to auto-apply on ≤8k-context models
models
string[]
Fallback model list — BazaarLink tries each in order if primary fails
route
string
Set to "fallback" to enable waterfall routing through the models array
provider
object
Provider routing preferences — order, only, ignore, sort, allow_fallbacks
debug
object
Debug options. echo_upstream_body: true returns transformed request body as first SSE chunk (streaming only)
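The official OpenAI Python SDK only accepts the parameters it knows as keyword arguments to `chat.completions.create()`, so passing `top_k` or `models` directly raises a `TypeError`. Such parameters go in `extra_body`, which the SDK merges into the JSON request. The helper below is a hypothetical convenience for splitting a flat parameter dict; `SDK_KWARGS` is a hand-picked subset of the SDK's arguments, not an exhaustive list.

```python
from typing import Any, Dict, Tuple

# Parameters from the lists above that chat.completions.create() accepts as keywords.
SDK_KWARGS = {
    "model", "messages", "stream", "temperature", "max_tokens", "max_completion_tokens",
    "top_p", "frequency_penalty", "presence_penalty", "seed", "n", "user", "stop",
    "logit_bias", "logprobs", "top_logprobs", "response_format", "tools",
    "tool_choice", "parallel_tool_calls",
}


def split_params(params: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a request into SDK keyword arguments and an extra_body dict."""
    sdk = {k: v for k, v in params.items() if k in SDK_KWARGS}
    extra = {k: v for k, v in params.items() if k not in SDK_KWARGS}
    return sdk, extra


params = {
    "model": "openai/gpt-4.1",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "top_k": 40,  # sampling parameter outside the OpenAI spec
    "models": ["anthropic/claude-sonnet-4.6", "google/gemini-2.5-flash"],
    "route": "fallback",
}
sdk_kwargs, extra_body = split_params(params)
print(sdk_kwargs)
print(extra_body)
```

With a client configured as in the earlier examples, the call becomes `client.chat.completions.create(**sdk_kwargs, extra_body=extra_body)`.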

Tool Calling

Tool calling (also known as function calling) lets models invoke external functions you define. The model decides when to call a tool and generates structured arguments — your code executes the function and returns results to continue the conversation.

Supported Models

Most frontier models support tool calling. To check a specific model, look for "tools" in its supported_parameters list in the /v1/models response.
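Given the parsed JSON from /v1/models, filtering by `supported_parameters` takes a few lines. The sample response below is hypothetical (only `openai/gpt-4.1` appears in these docs). Models that omit `supported_parameters` are excluded here, although a missing list does not necessarily mean a model lacks tool support.

```python
from typing import Any, Dict, List


def models_supporting(models_response: Dict[str, Any], parameter: str) -> List[str]:
    """Return IDs of models whose supported_parameters list includes `parameter`."""
    return [
        model["id"]
        for model in models_response.get("data", [])
        if parameter in (model.get("supported_parameters") or [])
    ]


# Hypothetical, trimmed /v1/models payload
sample = {
    "data": [
        {"id": "openai/gpt-4.1", "supported_parameters": ["tools", "response_format"]},
        {"id": "example/no-tools-model", "supported_parameters": ["temperature"]},
        {"id": "example/unlisted-model"},
    ]
}
print(models_supporting(sample, "tools"))
```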

Defining Tools

Each tool is a JSON object describing a function the model can call. The parameters field uses JSON Schema.

namerequired
string
Function name (a-z, A-Z, 0-9, underscores, dashes)
descriptionrequired
string
Clear description of when and how the function should be used
parametersrequired
object
JSON Schema object defining function parameters
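A small hypothetical helper (not part of any SDK) that builds the wrapped tool entry and checks the constraints listed above before a request is sent. Requiring `parameters` to be an object schema follows the table's description; other provider-side checks, such as name length limits, are not covered.

```python
import re
from typing import Any, Dict

# Letters, digits, underscores, and dashes, as listed for the name field.
TOOL_NAME = re.compile(r"[A-Za-z0-9_-]+")


def make_tool(name: str, description: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Build a {"type": "function", "function": {...}} entry after basic checks."""
    if not TOOL_NAME.fullmatch(name):
        raise ValueError(f"invalid tool name: {name!r}")
    if not description.strip():
        raise ValueError("description must not be empty")
    if parameters.get("type") != "object":
        raise ValueError("parameters must be a JSON Schema object schema")
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": parameters},
    }


get_weather = make_tool(
    "get_weather",
    "Get current weather for a city",
    {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
)
print(get_weather)
```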

tool_choice Options

Value · Behavior
"auto" · Model decides whether to call a tool (default)
"none" · Model will not call any tool
"required" · Model must call at least one tool
{"type": "function", "function": {"name": "get_weather"}} · Model must call the specified function

Complete Flow

Tool calling is a multi-turn process: (1) send the request with tools → (2) model returns tool_calls → (3) execute functions → (4) send results back → (5) model generates final response.

python
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://bazaarlink.ai/api/v1",
    api_key="sk-bl-YOUR_API_KEY",
)

# Step 1: Define tools and send request
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="openai/gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather in Taipei?"}],
    tools=tools,
    tool_choice="auto",
)

# Step 2: Check for tool calls
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)

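    # Step 3: Execute your function with the parsed args, for example
    # result = get_weather(**args)  (get_weather being your own implementation)
    # A hard-coded result stands in for that call here:
    result = {"temperature": 28, "unit": "celsius", "condition": "Partly cloudy"}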

    # Step 4: Send result back
    final = client.chat.completions.create(
        model="openai/gpt-4.1",
        messages=[
            {"role": "user", "content": "What's the weather in Taipei?"},
            message,
            {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)},
        ],
        tools=tools,
    )

    # Step 5: Get final response
    print(final.choices[0].message.content)
    # "The weather in Taipei is 28°C and partly cloudy."

Parallel Tool Calls

Some models can call multiple tools in a single response. Handle each tool call and return all results:

python
# Model may return multiple tool_calls
if message.tool_calls:
    messages = [
        {"role": "user", "content": "Weather and time in Tokyo?"},
        message,
    ]

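    for tool_call in message.tool_calls:
        # Execute each function (get_time must also be declared in `tools`)
        if tool_call.function.name == "get_weather":
            result = {"temperature": 22, "condition": "Clear"}
        elif tool_call.function.name == "get_time":
            result = {"time": "2026-02-23T15:30:00+09:00"}
        else:
            # Answer every tool_call_id, even for an unknown tool
            result = {"error": f"Unknown tool: {tool_call.function.name}"}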

        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result),
        })

    # Send all results back at once
    final = client.chat.completions.create(
        model="openai/gpt-4.1",
        messages=messages,
        tools=tools,
    )
    print(final.choices[0].message.content)

Structured Output

Force the model to return valid JSON matching a schema. This is essential for building reliable applications that parse model outputs programmatically.

Method 1: response_format (JSON Schema)

Pass a response_format object with type "json_schema" to enforce strict JSON Schema compliance:

typerequired
string
Must be "json_schema"
json_schema.namerequired
string
A name for the schema (used for caching)
json_schema.strict
boolean
When true, guarantees exact schema compliance
json_schema.schemarequired
object
The JSON Schema definition
python
response = client.chat.completions.create(
    model="openai/gpt-4.1",
    messages=[{"role": "user", "content": "Review the movie Inception"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "movie_review",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "rating": {"type": "integer", "description": "Rating 1-10"},
                    "summary": {"type": "string"},
                    "pros": {"type": "array", "items": {"type": "string"}},
                    "cons": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["title", "rating", "summary", "pros", "cons"],
                "additionalProperties": False,
            },
        },
    },
)

import json
review = json.loads(response.choices[0].message.content)
print(review["title"])    # "Inception"
print(review["rating"])   # 9

Tips

  • Use clear, descriptive property names — the model uses them as context.
  • Add descriptions to schema properties to guide the model.
  • Set strict: true for guaranteed schema compliance (may increase latency slightly).
  • Keep schemas simple — deeply nested schemas may reduce output quality.
  • Test with different models — some handle complex schemas better than others.
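When a model does not enforce strict mode, it helps to validate the parsed output before using it. The sketch below checks only top-level required keys and basic types for a schema shaped like the movie_review example; for full JSON Schema validation, a dedicated library such as `jsonschema` is the safer choice.

```python
import json
from typing import Any, Dict

# Python types for the JSON Schema "type" keywords used in the movie_review schema.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "array": list, "object": dict, "boolean": bool}


def parse_structured(content: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Parse model output and check required keys and top-level types."""
    data = json.loads(content)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    for key, prop in schema.get("properties", {}).items():
        expected = prop.get("type")
        if key not in data or not isinstance(expected, str) or expected not in JSON_TYPES:
            continue  # skip absent keys and union/unknown types in this sketch
        value = data[key]
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value, bool) and expected in ("integer", "number"):
            raise ValueError(f"{key!r} should be {expected}, got a boolean")
        if not isinstance(value, JSON_TYPES[expected]):
            raise ValueError(f"{key!r} should be {expected}")
    return data


schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "rating": {"type": "integer"}},
    "required": ["title", "rating"],
}
print(parse_structured('{"title": "Inception", "rating": 9}', schema))
```

With the earlier request, pass `response.choices[0].message.content` and the `schema` object from `response_format["json_schema"]`.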

Assistant Prefill

Guide the model to respond in a specific way by including a partial assistant message at the end of your messages array. The model will continue from where you left off.

python
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.6",
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is"},
    ],
)

# Model continues: " Paris, known for the Eiffel Tower..."
print(response.choices[0].message.content)
How it works
BazaarLink passes messages directly to the upstream provider, so prefill works only with models that continue a partial assistant turn, such as Anthropic Claude. Other models may treat the partial message as a completed turn, so test before relying on it.
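Because the response holds only the continuation (as the comment in the example suggests), you may need to stitch the prefix back on. A sketch of two hypothetical helpers; trimming trailing whitespace from the prefix reflects Anthropic's API, which rejects a final assistant message that ends in whitespace.

```python
from typing import Dict, List


def prefill_messages(question: str, prefix: str) -> List[Dict[str, str]]:
    """Build a messages array that ends with a partial assistant turn."""
    return [
        {"role": "user", "content": question},
        # Anthropic rejects a final assistant message ending in whitespace.
        {"role": "assistant", "content": prefix.rstrip()},
    ]


def full_answer(messages: List[Dict[str, str]], continuation: str) -> str:
    """Prepend the prefill (the last message) to the model's continuation."""
    return messages[-1]["content"] + continuation


msgs = prefill_messages("What is the capital of France?", "The capital of France is ")
print(full_answer(msgs, " Paris."))
```

With a real request, pass `messages=msgs` and then call `full_answer(msgs, response.choices[0].message.content)`.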

Model Routing

BazaarLink uses the provider/model-name format to route requests to the correct upstream provider. This gives you access to all major models through a single API endpoint.

Model ID Format

bash
{provider}/{model-name}

# Examples:
openai/gpt-4.1
anthropic/claude-sonnet-4.6
google/gemini-2.5-flash
deepseek/deepseek-chat
meta-llama/llama-4-maverick

Routing Priority

When you send a request, BazaarLink resolves the upstream provider in this order:

  1. Exact match — looks for a model route matching the full model ID
  2. Provider wildcard — falls back to provider/* routes (e.g. openai/*)
  3. Global wildcard — falls back to * wildcard routes
  4. Default provider — uses the provider key marked as default
  5. Environment fallback — uses the configured API key as last resort
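The lookup order can be sketched as below. The `routes` mapping (route pattern to provider key name) and the key names are hypothetical; BazaarLink's actual route storage is not described here.

```python
from typing import Dict, Optional


def resolve_provider_key(
    model_id: str,
    routes: Dict[str, str],
    default_key: Optional[str] = None,
    env_key: Optional[str] = None,
) -> Optional[str]:
    """Return the provider key name for model_id, following the documented order."""
    provider = model_id.split("/", 1)[0]
    # 1-3: exact match, then provider wildcard, then global wildcard
    for pattern in (model_id, f"{provider}/*", "*"):
        if pattern in routes:
            return routes[pattern]
    # 4: default provider key; 5: environment fallback
    return default_key if default_key else env_key


routes = {"openai/gpt-4.1": "openai-primary", "openai/*": "openai-secondary", "*": "catch-all"}
print(resolve_provider_key("openai/gpt-4o", routes))
```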

Browse all available models on the Models page.

Auto Router

The Auto Router analyzes your request and selects the most appropriate model automatically — no need to manually specify a model. Two modes are available:

  • auto (paid mode): Routes to premium models for best quality. Billed at the resolved model's pricing.
  • auto:free (free mode): Routes to cost-efficient free models. No credits required, subject to rate limits.

How to Use

Set the model to "auto" (paid) or "auto:free" (free) to enable automatic routing:

python
from openai import OpenAI

client = OpenAI(
    base_url="https://bazaarlink.ai/api/v1",
    api_key="sk-bl-YOUR_API_KEY",
)

# Paid auto routing — uses premium models, charges credits
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
print(f"Model used: {response.model}")  # e.g. anthropic/claude-opus-4.6

# Free auto routing — uses free models, no credits needed
response = client.chat.completions.create(
    model="auto:free",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(f"Model used: {response.model}")  # e.g. openai/gpt-5-nano
print(f"Cost: {getattr(response.usage, 'cost', None)}")   # 0

How It Works

  • Task classification: your prompt is analyzed to determine the task type
  • Model selection: the best model for that task type is selected from the corresponding pool (paid or free)
  • Request forwarding: the request is forwarded to the selected model transparently
  • Response tracking: the resolved model is returned in the response body and X-Auto-Resolved-Model header

Classification Logic

The router computes weighted scores across all categories simultaneously (English + Chinese keywords + structural features) and picks the highest-scoring category:

Rule type · Category · Signal · Example
Hard rule · Tool Calling · Request contains tools parameter · Any request with function calling
Hard rule · Trivial · Very short prompt with no signals · "Hi", "OK", "Thanks"
Weighted · Math / Reasoning · Math/logic keywords (1.3x boost) · "Prove that √2 is irrational"
Weighted · Coding · Code keywords, file extensions, code blocks · "Write a Python sort function"
Weighted · Deep Analysis · Analysis/comparison keywords (1.1x boost) · "Analyze the impact of AI on employment"
Weighted · Creative · Creative/narrative keywords (1.1x boost) · "Write a short story about time travel"
Weighted · Instruction · Tutorial/setup/install keywords · "How to set up a Node.js project"
Fallback · Complex Chat · ≥6 messages + long prompt (no keyword hit) · Multi-turn conversation with context
Fallback · Simple QA · Short prompt (no keyword hit) · "What is the capital of France?"
Fallback · General · Default fallback · Everything else
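A simplified sketch of the scoring scheme in the table. The boost factors come from the table; the keyword lists, the English-only substring matching, and the length thresholds are assumptions chosen for illustration, not the router's real values.

```python
from typing import Any, Dict, List

# Hypothetical English-only keyword lists (the real router also uses Chinese keywords).
KEYWORDS: Dict[str, List[str]] = {
    "math_reasoning": ["prove", "theorem", "equation", "calculate"],
    "coding": ["python", "function", "code", ".py", ".js"],
    "deep_analysis": ["analyze", "compare", "impact"],
    "creative": ["story", "poem", "narrative"],
    "instruction": ["how to", "set up", "install", "tutorial"],
}
BOOSTS = {"math_reasoning": 1.3, "deep_analysis": 1.1, "creative": 1.1}  # from the table
TRIVIAL_MAX_CHARS = 12   # assumption
LONG_PROMPT_CHARS = 500  # assumption


def classify(body: Dict[str, Any]) -> str:
    # Hard rule: any request with tools is a tool-calling request.
    if body.get("tools"):
        return "tool_calling"
    messages = body["messages"]
    prompt = str(messages[-1].get("content", ""))  # assumes plain-string content
    text = prompt.lower()
    # Weighted: score every category at once, then take the highest.
    scores = {
        category: sum(text.count(kw) for kw in words) * BOOSTS.get(category, 1.0)
        for category, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] > 0:
        return best
    # Trivial only applies when nothing matched ("no signals"), so it is checked here.
    if len(prompt) <= TRIVIAL_MAX_CHARS:
        return "trivial"
    if len(messages) >= 6 and len(prompt) >= LONG_PROMPT_CHARS:
        return "complex_chat"
    if len(prompt) < LONG_PROMPT_CHARS:
        return "simple_qa"
    return "general"


print(classify({"messages": [{"role": "user", "content": "How to set up a Node.js project"}]}))
```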

auto vs auto:free

Both modes use the same classification logic. The difference is the model pool:

  • auto — selects from premium models (GPT-5, Claude Opus, Gemini Pro, etc.). Credits are charged at the resolved model's rate.
  • auto:free — selects from free-tier models (DeepSeek, Gemini Flash Lite, GPT-5 Nano, etc.). No credits deducted. RPM and daily request limits apply.
  • Fallback: if auto:free hits a rate limit and the user has credits and fallback enabled, the request is automatically re-routed as a paid auto request.
Response Header
When model="auto" or "auto:free" is used, the response includes an X-Auto-Resolved-Model header showing the actual model selected. The response body's model field also reflects the resolved model.

Use Cases

  • General-purpose apps: when you don't know what types of prompts users will send
  • Cost optimization: use auto:free for development/testing, auto for production
  • Quality optimization: ensure complex prompts are routed to capable models
  • Free-tier products: offer AI features to users without requiring them to top up

Limitations and Compatibility

  • Requires messages format (not raw prompt strings)
  • Streaming is supported for both auto and auto:free
  • All standard BazaarLink features (tool calling, response_format, etc.) work with the selected model
  • auto:free has RPM and daily request limits that vary by user tier
Auto Router
Auto Router is live. Set model to "auto" (paid) or "auto:free" (free) to enable it. The resolved model is returned in the X-Auto-Resolved-Model response header and the response body's model field.

Fallbacks

When a provider experiences an outage or returns an error, BazaarLink can automatically retry your request with alternative models. This ensures high availability without any code changes on your side.

Fully Implemented
The `body.models[]` fallback list is fully supported. Requests are automatically retried across the fallback chain in order until one succeeds or all models are exhausted.

How It Works

  1. Your request is sent to the primary model.
  2. If the primary model fails (5xx error, timeout, or rate limit), BazaarLink automatically retries with the next model in the list.
  3. This continues until a model succeeds or all models have been tried.
  4. The response includes a header indicating which model actually served the request.
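The retry loop in steps 1 to 3 can be sketched in Python as follows. This is an illustration, not BazaarLink's code: `UpstreamError`, `complete_with_fallback`, and the injected `call` function are hypothetical stand-ins for a real HTTP call, and treating errors outside the step 2 triggers (such as a 400) as non-retryable is an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple


class UpstreamError(Exception):
    """Hypothetical upstream failure; status is None for a timeout."""

    def __init__(self, status: Optional[int]):
        super().__init__(f"upstream failed (status={status})")
        self.status = status


def is_retryable(err: UpstreamError) -> bool:
    # Step 2 triggers: timeout, rate limit (429), or any 5xx error.
    return err.status is None or err.status == 429 or err.status >= 500


def complete_with_fallback(
    models: Sequence[str], call: Callable[[str], str]
) -> Tuple[str, str]:
    """Try each model in order and return (model_that_served, result)."""
    last_error: Optional[UpstreamError] = None
    for model in models:
        try:
            return model, call(model)
        except UpstreamError as err:
            if not is_retryable(err):
                raise  # e.g. a 400 would fail the same way on every model
            last_error = err  # retryable: move on to the next model
    raise RuntimeError("all models in the fallback chain failed") from last_error
```

Returning the model name alongside the result mirrors step 4, where the response reports which model actually served the request.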

Best Practices

  • Order models from preferred to least preferred — the first model is always tried first.
  • Mix providers for maximum resilience (e.g. OpenAI → Anthropic → Google).
  • Use models with similar capabilities to ensure consistent results.
  • Set reasonable timeouts to avoid long waits before fallback triggers.
  • Monitor X-Fallback-Used header to track provider reliability.

Provider Selection

Control which providers are used when routing your requests. Include a provider object in your request body to customize routing behaviour. BazaarLink applies preferences locally or passes them through natively depending on upstream support.

Multi-Provider
Provider preferences are fully supported. BazaarLink applies order, only, ignore, sort, and allow_fallbacks locally, and passes advanced parameters to supported upstreams.

Supported Fields

| Field | Type | Description |
|---|---|---|
| order | string[] | Provider slugs to try in order |
| allow_fallbacks | boolean | Allow providers outside order/only as fallbacks (default: true) |
| only | string[] | Only allow these provider slugs |
| ignore | string[] | Skip these provider slugs |
| sort | string or object | "price", "throughput", or "latency" (or { by, partition }) |
| quantizations | string[] | Filter by quantization level |
| data_collection | string | "allow" or "deny" |
| require_parameters | boolean | Only use providers supporting all params |
| max_price | object | { prompt, completion }: maximum price in $ per million tokens |
| zdr | boolean | Only route to Zero Data Retention endpoints |
| enforce_distillable_text | boolean | Only route to models allowing text distillation |
| preferred_min_throughput | number or object | Preferred minimum throughput (tokens/sec) |
| preferred_max_latency | number or object | Preferred maximum latency (seconds) |

Ordering Providers

Use the order field to specify which providers to try first. Providers not in the list are used as fallbacks (unless allow_fallbacks is false).

json
{
  "model": "meta-llama/llama-4-maverick",
  "messages": [{"role": "user", "content": "Hello"}],
  "provider": {
    "order": ["together", "fireworks"]
  }
}

Filtering Providers

Use only to whitelist specific providers, or ignore to blacklist them. These filters are applied before ordering.

json
// Only use specific providers
{
  "model": "openai/gpt-4o",
  "provider": { "only": ["openai"] }
}

// Skip a provider
{
  "model": "openai/gpt-4o",
  "provider": { "ignore": ["deepinfra"] }
}

Sorting by Price

Set sort to 'price' to automatically route to the cheapest available provider. Throughput and latency sorting are also passed through to supported upstreams.

json
{
  "model": "meta-llama/llama-4-maverick",
  "messages": [{"role": "user", "content": "Hello"}],
  "provider": {
    "sort": "price"
  }
}
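A rough Python model of sort: "price" combined with the max_price cap from the Supported Fields table follows. The `Endpoint` records and their prices are made up, `rank_by_price` is a hypothetical helper, and ranking by the sum of prompt and completion price is an assumption; the exact price metric BazaarLink sorts on is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Endpoint:
    provider: str
    prompt_price: float      # $ per million input tokens (hypothetical)
    completion_price: float  # $ per million output tokens (hypothetical)


def rank_by_price(
    endpoints: List[Endpoint], max_price: Optional[Dict[str, float]] = None
) -> List[Endpoint]:
    """Drop endpoints above max_price, then order cheapest first."""
    cap = max_price or {}

    def within_cap(e: Endpoint) -> bool:
        return (e.prompt_price <= cap.get("prompt", float("inf"))
                and e.completion_price <= cap.get("completion", float("inf")))

    eligible = [e for e in endpoints if within_cap(e)]
    # Assumed metric: prompt + completion price; ties broken by provider name.
    return sorted(eligible, key=lambda e: (e.prompt_price + e.completion_price, e.provider))
```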

Disabling Fallbacks

Set allow_fallbacks to false to restrict routing strictly to your ordered or whitelisted providers. If none are available, the request will fail rather than falling back.

json
{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "Hello"}],
  "provider": {
    "order": ["openai", "azure"],
    "allow_fallbacks": false
  }
}
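The interplay of only, ignore, order, and allow_fallbacks can be summarized as a small function. This sketch encodes one reading of the rules above (filters first, then ordering, then optional fallbacks); it is not BazaarLink's routing code, `candidate_providers` is a hypothetical helper, and treating only/ignore as hard filters even when allow_fallbacks is true is an assumption.

```python
from typing import List, Optional, Sequence


def candidate_providers(
    available: Sequence[str],
    order: Optional[Sequence[str]] = None,
    only: Optional[Sequence[str]] = None,
    ignore: Optional[Sequence[str]] = None,
    allow_fallbacks: bool = True,
) -> List[str]:
    """Return provider slugs in the order they would be tried (illustrative)."""
    # 1. Filters (only / ignore) are applied before ordering.
    pool = [
        p for p in available
        if (only is None or p in only) and p not in (ignore or ())
    ]
    # 2. Providers named in `order` come first, in the order given.
    preferred = [p for p in (order or ()) if p in pool]
    # 3. Everything else is a fallback, unless fallbacks are disabled.
    rest = [p for p in pool if p not in preferred]
    if order and not allow_fallbacks:
        return preferred
    return preferred + rest
```

With allow_fallbacks=False and none of the ordered providers available, the list comes back empty, which corresponds to the request failing rather than falling back.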

Advanced Parameters

Advanced provider parameters (including quantizations, data_collection, require_parameters, max_price, etc.) are passed through to supported upstream providers for native handling.

json
// Advanced provider selection example
{
  "model": "deepseek/deepseek-v3.2",
  "messages": [{"role": "user", "content": "Hello"}],
  "provider": {
    "order": ["deepinfra", "together"],
    "sort": "throughput",
    "quantizations": ["fp8"],
    "data_collection": "deny",
    "require_parameters": true,
    "allow_fallbacks": true
  }
}

Model Variants

Append a suffix to any model ID to change routing behaviour. BazaarLink supports 7 variant types.

Multi-Provider
Model variants are now supported. Append a suffix like :free, :nitro, or :floor to any model ID. BazaarLink passes the suffix through natively or handles variant routing locally depending on upstream support.

Variant Types

There are two categories of variants: Independent Model IDs (the suffixed model is a distinct endpoint) and Routing Shortcuts (the suffix changes how BazaarLink selects a provider without altering the model itself).

Independent Model IDs

These variants exist as separate models with their own pricing and capabilities. BazaarLink tries the full model ID (with suffix) first, then falls back to the base model.

| Suffix | Description | Example |
|---|---|---|
| :free | Free-tier version (rate-limited) | deepseek/deepseek-r1:free |
| :extended | Extended context window | anthropic/claude-sonnet-4.5:extended |
| :thinking | Extended reasoning / chain-of-thought | deepseek/deepseek-r1:thinking |
| :exacto | Curated providers for tool-calling accuracy | moonshotai/kimi-k2-0905:exacto |

Routing Shortcuts

These suffixes modify provider selection without changing the model identity. The suffix is stripped before matching routes.

| Suffix | Equivalent | Behaviour |
|---|---|---|
| :nitro | provider.sort="throughput" | Prioritise highest-throughput providers |
| :floor | provider.sort="price" | Sort candidates by price ascending (cheapest first) |
| :online | plugins: { web: {} } | Enable real-time web search |

Multi-Provider Behaviour

For upstreams that support variants, suffixes are passed through as-is. For direct providers (e.g., direct OpenAI, Fireworks), the suffix is stripped and BazaarLink handles routing locally.
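When BazaarLink handles variants locally, the two categories behave differently: independent IDs are tried with the suffix first and then without it, while routing shortcuts are stripped and translated into request fields. Here is a minimal Python sketch of that split, assuming the suffix is always the text after the last colon; `resolve_variant` and the shapes it returns are hypothetical.

```python
import copy
from typing import Dict, List, Tuple

INDEPENDENT_VARIANTS = {"free", "extended", "thinking", "exacto"}
# Request fields each routing shortcut stands for (per the Routing Shortcuts table).
ROUTING_SHORTCUTS: Dict[str, dict] = {
    "nitro": {"provider": {"sort": "throughput"}},
    "floor": {"provider": {"sort": "price"}},
    "online": {"plugins": {"web": {}}},
}


def resolve_variant(model_id: str) -> Tuple[List[str], dict]:
    """Return (model IDs to try in order, extra request fields)."""
    base, sep, suffix = model_id.rpartition(":")
    if sep and suffix in INDEPENDENT_VARIANTS:
        # Distinct endpoint: try the suffixed ID first, then the base model.
        return [model_id, base], {}
    if sep and suffix in ROUTING_SHORTCUTS:
        # Shortcut: strip the suffix and apply the equivalent settings.
        return [base], copy.deepcopy(ROUTING_SHORTCUTS[suffix])
    return [model_id], {}
```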

Examples

json
// Independent variant — use free tier
{
  "model": "deepseek/deepseek-r1:free",
  "messages": [{"role": "user", "content": "Hello"}]
}

// Routing shortcut — cheapest provider first
{
  "model": "meta-llama/llama-4-maverick:floor",
  "messages": [{"role": "user", "content": "Hello"}]
}

// Routing shortcut — highest throughput
{
  "model": "openai/gpt-4o:nitro",
  "messages": [{"role": "user", "content": "Hello"}]
}

// Web search
{
  "model": "openai/gpt-4o:online",
  "messages": [{"role": "user", "content": "What happened today?"}]
}

Message Transforms

Automatically transform messages to fit within model context limits. When your messages exceed a model's context window, the middle-out transform condenses the conversation by removing messages from the middle.

Auto
Models with a context window of 8,192 tokens or less apply middle-out automatically by default. To opt out, pass `transforms: []`. To enable for any model, pass `transforms: ["middle-out"]`.

Usage

json
// Enable middle-out on any model
{
  "model": "openai/gpt-4.1",
  "transforms": ["middle-out"],
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    ... // long conversation — middle will be trimmed to fit context
  ]
}

// Disable auto-trimming for small-context models
{ "transforms": [] }

Transform Types

| Transform | Description |
|---|---|
| middle-out | Removes messages from the middle first, preserving the beginning (system prompt, context) and end (recent messages) |

Default Behavior

Models with ≤8k context have middle-out enabled automatically. For larger context models, opt in explicitly. Anthropic Claude models also automatically enforce the 1,000-message limit regardless of the transforms setting.
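A minimal Python sketch of the middle-out idea: repeatedly drop the message in the middle until the conversation fits, never touching the first message (typically the system prompt) or the last one (the latest turn). The `middle_out` helper is hypothetical, and the 4-characters-per-token estimate is a crude assumption; a real implementation would count tokens with the model's tokenizer. The optional `max_messages` cap stands in for limits such as the 1,000-message one mentioned above.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def estimate_tokens(message: Message) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(message["content"]) // 4)


def middle_out(
    messages: List[Message],
    max_tokens: int,
    max_messages: Optional[int] = None,
    count: Callable[[Message], int] = estimate_tokens,
) -> List[Message]:
    """Drop middle messages until both limits are met (first and last are kept)."""
    kept = list(messages)

    def too_big() -> bool:
        over_tokens = sum(count(m) for m in kept) > max_tokens
        over_count = max_messages is not None and len(kept) > max_messages
        return over_tokens or over_count

    while len(kept) > 2 and too_big():
        del kept[len(kept) // 2]  # index is never 0 or the last position when len >= 3
    return kept
```

If the first and last messages alone exceed the budget, this sketch returns them anyway; a real router would then have to reject or otherwise truncate the request.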

Zero Completion Insurance

Provides billing protection for requests that completely fail to connect. If a stream never starts, you won't be charged.

Beta
Coverage is partially implemented. Streams that fail mid-way still incur a 10% minimum charge; empty completions (0 output tokens) are still billed for input tokens.

Covered Cases

  • Upstream refused connection / returned empty body — full refund
  • Stream failed before starting — full refund

Not Covered

  • Stream failed mid-way after starting: 10% minimum charge applied
  • Model returned 0 output tokens (empty content): input tokens are still billed

Guardrails

Add safety guardrails to your API requests to filter harmful content, enforce compliance policies, and protect your application.

Beta
Guardrails are a planned feature. Currently, content filtering is handled by each upstream model provider's built-in safety systems.

Planned Features

| Guardrail | Description |
|---|---|
| Content filtering | Block harmful, toxic, or inappropriate content in inputs and outputs |
| PII detection | Detect and redact personally identifiable information |
| Topic restriction | Restrict model responses to approved topics only |
| Output validation | Validate model outputs against custom rules before returning |

Current Behavior

All upstream providers have their own content safety systems. Model responses that trigger content filters will return with finish_reason: "content_filter". Custom guardrail configuration will be available in a future update.
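Because a filtered response still arrives as a normal completion, the signal to check is finish_reason. Here is a small Python sketch over a plain-dict response in the OpenAI chat completion shape; `ContentFiltered` and `extract_text` are hypothetical names.

```python
from typing import Any, Dict


class ContentFiltered(Exception):
    """Raised when the upstream safety system stopped the completion."""


def extract_text(response: Dict[str, Any]) -> str:
    choice = response["choices"][0]
    if choice.get("finish_reason") == "content_filter":
        # Content may be empty or partial; surface it as an error instead.
        raise ContentFiltered("response was blocked by the provider's content filter")
    return choice["message"].get("content") or ""
```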

Zero Data Retention

BazaarLink does not store your message content by default, which makes it suitable for applications that process sensitive data. This page describes how your data is handled.

Current Data Handling

  • Message content: not stored by default, discarded from memory after processing
  • Billing metadata: token counts, timestamps, model IDs
  • Usage logs: request statistics only, no message content
  • Upstream forwarding: messages forwarded to upstream providers — subject to their privacy policies

Prompt Caching

Prompt caching reuses previously computed prompt tokens, significantly reducing cost and latency — especially for applications with large, repeated system prompts.

Note
BazaarLink automatically tracks cache savings and reflects them in billing. The `cached_tokens` field in the response shows actual cache hits; `cacheDiscount` shows the amount saved on that request.

How It Works

Caching is handled automatically by each model provider — no extra configuration needed. BazaarLink transparently proxies cache parameters and reports results in usage responses. For models that support caching, repeated prompt prefixes are typically charged at 10–50% of the normal input rate.
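To see what a discounted cached rate means for the input side of a bill, here is a back-of-the-envelope Python helper. The $3 per million price and 10% cached rate in the example are illustrative assumptions; actual cached rates are set per provider and model, and `input_cost` is a hypothetical helper, not part of any SDK.

```python
def input_cost(
    prompt_tokens: int,
    cached_tokens: int,
    price_per_million: float,
    cached_rate: float,
) -> float:
    """Input cost in dollars when cached tokens bill at a fraction of the normal rate."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    billable = uncached + cached_tokens * cached_rate
    return billable * price_per_million / 1_000_000


# 10,000 prompt tokens, 8,000 served from cache, $3 per million, cached at 10% of the rate
with_cache = input_cost(10_000, 8_000, 3.0, cached_rate=0.10)
without_cache = input_cost(10_000, 0, 3.0, cached_rate=0.10)
print(f"with cache: ${with_cache:.4f}, without: ${without_cache:.4f}")
```

Here the cached request bills 2,000 full-price tokens plus 8,000 tokens at 10%, the equivalent of 2,800 full-price tokens instead of 10,000.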

python
response = client.chat.completions.create(
    model="anthropic/claude-3-7-sonnet",
    messages=[
        {"role": "system", "content": "You are an expert..."},  # Long system prompt cached
        {"role": "user", "content": "Question here"},
    ],
)

# Check cache hits in the response usage
usage = response.usage
details = usage.prompt_tokens_details  # may be None if no cache data is reported
cached = (details.cached_tokens or 0) if details else 0
print(f"Prompt tokens: {usage.prompt_tokens}")
print(f"Cached tokens: {cached}")
if usage.prompt_tokens:
    print(f"Cache hit rate: {cached / usage.prompt_tokens * 100:.1f}%")

Reasoning Tokens

Reasoning models (e.g., DeepSeek R1, the o1 series) think internally before producing their final answer. These internal tokens are called reasoning tokens; they are billed as completion tokens and itemized separately in usage and billing.

Note
BazaarLink reports reasoning tokens in `usage.completion_tokens_details.reasoning_tokens` and shows them separately in billing.

Reading Reasoning Tokens from Responses

python
response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Solve: if f(x) = x^2 + 3x, what is f(5)?"}],
)

# Read reasoning tokens from usage
usage = response.usage
print(f"Completion tokens: {usage.completion_tokens}")
details = usage.completion_tokens_details  # may be None for some providers
if details is not None and details.reasoning_tokens is not None:
    print(f"Reasoning tokens: {details.reasoning_tokens}")
    # completion_tokens includes reasoning tokens; the remainder is the visible answer
    print(f"Answer tokens: {usage.completion_tokens - details.reasoning_tokens}")
typescript
const response = await client.chat.completions.create({
  model: "openai/o3-mini",
  messages: [{ role: "user", content: "Prove that sqrt(2) is irrational." }],
  // @ts-ignore - BazaarLink extension
  reasoning_effort: "high",  // low | medium | high
});

const usage = response.usage;
console.log("Reasoning tokens:", usage?.completion_tokens_details?.reasoning_tokens);

Thinking Mode Control

Some models support toggling their "thinking" mode. Thinking mode generates internal reasoning tokens before producing the final answer, improving quality at the cost of more tokens.

| Model family | Parameter | Default |
|---|---|---|
| qwen3-* | enable_thinking: boolean | false (platform default) |
| openai/o1, o3, o4-mini | reasoning_effort: "low" / "medium" / "high" | medium |
| deepseek/deepseek-r1 | None (thinking cannot be disabled) | Always enabled |

python
# Qwen3: explicitly enable thinking mode
response = client.chat.completions.create(
    model="qwen/qwen3-32b",
    messages=[{"role": "user", "content": "Prove the Pythagorean theorem"}],
    extra_body={"enable_thinking": True},  # opt-in to thinking
)

# usage.completion_tokens_details.reasoning_tokens shows thinking token count

Unified reasoning Object (New Format)

BazaarLink also supports the unified reasoning object, which works across all model families with a single consistent API:

| Field | Values | Applies to / notes |
|---|---|---|
| reasoning.effort | "xhigh" / "high" / "medium" / "low" / "none" | OpenAI o-series, Grok |
| reasoning.max_tokens | integer (thinking budget in tokens) | Anthropic Claude, Gemini |
| reasoning.exclude | boolean | Hides thinking from the response (the model still reasons) |

typescript
// Claude extended thinking — specify thinking budget in tokens
const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5",
  messages: [{ role: "user", content: "Prove the Pythagorean theorem" }],
  // @ts-ignore - BazaarLink extension
  reasoning: { max_tokens: 5000 },
});

// OpenAI o3 — specify effort level
const response2 = await client.chat.completions.create({
  model: "openai/o3",
  messages: [{ role: "user", content: "Solve this math problem..." }],
  // @ts-ignore - BazaarLink extension
  reasoning: { effort: "high" },
});

// Hide thinking content from response (model still thinks)
const response3 = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4-5",
  messages: [{ role: "user", content: "What is 2+2?" }],
  // @ts-ignore - BazaarLink extension
  reasoning: { max_tokens: 2000, exclude: true },
});
Pricing
Thinking tokens are billed as completion tokens. Some providers charge a higher rate for thinking mode — Qwen3 costs 2x standard pricing when thinking is active. BazaarLink defaults Qwen3 to enable_thinking=false to avoid unexpected costs.
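Because thinking tokens bill as completion tokens, the cost of thinking mode scales with the hidden reasoning, not just the visible answer. The Python illustration below uses a made-up $1 per million output price and made-up token counts; applying the 2x multiplier to the output rate alone is an assumption about how the Qwen3 surcharge is computed, and `completion_cost` is a hypothetical helper.

```python
def completion_cost(
    answer_tokens: int,
    reasoning_tokens: int,
    output_price_per_million: float,
    thinking_multiplier: float = 1.0,
) -> float:
    """Dollar cost of the completion side; reasoning tokens bill as completion tokens."""
    billed_tokens = answer_tokens + reasoning_tokens
    return billed_tokens * output_price_per_million * thinking_multiplier / 1_000_000


# Hypothetical request: 200 answer tokens, 800 thinking tokens, $1 per million output tokens
thinking_off = completion_cost(200, 0, 1.0)
thinking_on = completion_cost(200, 800, 1.0, thinking_multiplier=2.0)  # assumed 2x surcharge
print(f"off: ${thinking_off:.6f}  on: ${thinking_on:.6f}")
```

In this made-up case, thinking makes the completion 10 times more expensive (1,000 billed tokens at double the rate versus 200), which is the kind of surprise the enable_thinking=false default is meant to avoid.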

Latency & Performance

Optimizing AI API response latency is critical for user experience. Below are the key factors that affect latency in the BazaarLink architecture and best practices for optimization.

Note
BazaarLink logs `latencyMs` (end-to-end latency) and `throughput` (tokens/sec) for every request, visible in your usage logs.

Factors That Affect Latency

  • Model size: larger models (70B+) are generally slower to generate
  • Provider load: varies across providers and time of day
  • Token count: longer outputs take longer to generate, and a higher max_tokens permits longer outputs
  • Streaming vs. non-streaming: stream: true delivers the first token faster
  • Context length: very long contexts increase pre-processing time

Optimization Tips

  • Prefer streaming (stream: true) to improve perceived latency
  • Use the :nitro variant to select high-throughput providers
  • Choose smaller models (flash/mini/haiku) for latency-sensitive scenarios
  • Use provider.sort: "latency" to automatically select the lowest-latency provider
  • Enable prompt caching to reduce latency for repeated requests
python
import time

# Measure time to first token with streaming
start = time.perf_counter()
first_token_time = None

stream = client.chat.completions.create(
    model="google/gemini-2.5-flash",  # Fast model
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content and first_token_time is None:
        first_token_time = time.perf_counter() - start

if first_token_time is not None:
    print(f"Time to first token: {first_token_time:.3f}s")

# Check latency in usage logs via /api/v1/usage
# Each log entry includes: latency_ms, throughput (tokens/sec)
python
# Use provider.sort for automatic latency optimization
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "provider": {
            "sort": "latency",  # Always pick lowest-latency provider
        }
    },
)

Uptime Optimization

BazaarLink maximizes API availability through multiple layers: automatic failover, circuit breakers, and provider health monitoring.

Note
BazaarLink tracks availability for all upstream providers. When a provider's error rate exceeds a threshold, the circuit breaker automatically triggers and routes requests to the next available provider.

Availability Mechanisms

  • Circuit breaker: auto-detects and isolates failing providers
  • Automatic failover: seamlessly switches to a backup provider — no code changes needed
  • Provider health monitoring: continuously tracks error rates and latency per provider
  • Retry logic: transient errors (5xx) are automatically retried

Circuit Breaker

python
# BazaarLink handles failover automatically — no code changes needed.
# Configure fallback models for maximum resilience:

response = client.chat.completions.create(
    model="openai/gpt-4o",       # Primary model
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "models": [              # Fallback chain
            "openai/gpt-4o",
            "anthropic/claude-3.5-sonnet",
            "google/gemini-2.5-flash",
        ],
        "route": "fallback",     # Enable fallback routing
    },
)

# Check if failover was used (in usage logs)
# "is_failover": true indicates the primary provider was bypassed
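The circuit breaker listed under Availability Mechanisms is a standard resilience pattern. The Python sketch below shows the general idea, not BazaarLink's implementation: the error-rate threshold, window size, minimum call count, and cooldown are hypothetical values, and `CircuitBreaker` and `pick_provider` are illustrative names. After the cooldown this simplified breaker resets fully instead of running a separate half-open probe phase.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Optional, Sequence


class CircuitBreaker:
    """Opens when the error rate over recent calls exceeds a threshold."""

    def __init__(
        self,
        threshold: float = 0.5,    # hypothetical: open above 50% errors
        window: int = 20,          # hypothetical: judge the last 20 calls
        min_calls: int = 5,        # avoid opening on one or two failures
        cooldown_s: float = 30.0,  # hypothetical: how long to stay open
        clock: Callable[[], float] = time.monotonic,
    ):
        self.threshold = threshold
        self.min_calls = min_calls
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.results: Deque[bool] = deque(maxlen=window)  # True means the call failed
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown over: reset and let traffic through again.
            self.opened_at = None
            self.results.clear()
            return True
        return False

    def record(self, failed: bool) -> None:
        self.results.append(failed)
        if len(self.results) >= self.min_calls:
            error_rate = sum(self.results) / len(self.results)
            if error_rate > self.threshold:
                self.opened_at = self.clock()


def pick_provider(providers: Sequence[str], breakers: Dict[str, CircuitBreaker]) -> Optional[str]:
    """First provider in priority order whose breaker is closed."""
    for name in providers:
        if breakers[name].allow_request():
            return name
    return None
```

Injecting `clock` keeps the breaker testable without real waiting; in production the default `time.monotonic` avoids surprises from wall-clock adjustments.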
bash
# Check provider health (admin only)
curl https://bazaarlink.ai/api/admin/provider-health \
  -H "Authorization: Bearer sk-bl-ADMIN_KEY"

Response

json
{
  "providers": [
    {
      "id": "provider-1",
      "name": "Anthropic",
      "status": "healthy",
      "error_rate": 0.002,
      "avg_latency_ms": 145,
      "circuit_open": false
    }
  ]
}

Credits

Query the current credit balance and lifetime API usage (OpenRouter-compatible format).

GET/api/v1/credits
Auth
Requires Bearer token (standard API key sk-bl-...).

Example Request

bash
curl https://bazaarlink.ai/api/v1/credits \
  -H "Authorization: Bearer sk-bl-YOUR_KEY"

Response

json
{
  "data": {
    "total_credits": 100.00,
    "total_usage": 12.34
  }
}

Errors: 401 (missing/invalid key), 403 (suspended user).

Generation Details

Retrieve detailed statistics for a single completion by its generation ID: the id field of a chat/completions response, or the x-bz-gen-id header of a streaming response.

GET/api/v1/generation?id=<generation-id>
Auth
Requires Bearer token (standard API key). Required query param: id.

Example Request

bash
curl "https://bazaarlink.ai/api/v1/generation?id=gen_abc123" \
  -H "Authorization: Bearer sk-bl-YOUR_KEY"

Response

json
{
  "data": {
    "id": "gen_xyz...",
    "model": "openai/gpt-4.1",
    "provider": "openai",
    "created_at": "2026-04-20T10:00:00.000Z",
    "app_name": "MyApp",
    "finish_reason": "stop",
    "status": 200,
    "duration_ms": 1234,
    "first_token_ms": 234,
    "throughput": 45.6,
    "usage": {
      "prompt_tokens": 100,
      "completion_tokens": 200,
      "total_tokens": 300,
      "prompt_tokens_details": { "cached_tokens": 50 },
      "completion_tokens_details": { "reasoning_tokens": 30 },
      "cost": 0.00123
    },
    "cost_breakdown": {
      "subtotal": 0.00123,
      "cache_discount": 0.00015,
      "total": 0.00108
    }
  }
}

Errors: 400 (missing id), 401 (auth), 404 (generation not found).

API Key Info

Query the current API key's rate-limit tier and aggregated usage counters (OpenRouter-compatible format).

GET/api/v1/key
Auth
Requires Bearer token (standard API key).

Response

json
{
  "data": {
    "label": "Production Key",
    "limit": null,
    "limit_remaining": 100.00,
    "is_free_tier": false,
    "usage": 12.34,
    "usage_daily": 0.12,
    "usage_weekly": 0.45,
    "usage_monthly": 1.23,
    "requests": 1234,
    "requests_daily": 10,
    "requests_weekly": 50,
    "requests_monthly": 200,
    "rate_limit": { "rpm": 600, "tier": "paid" }
  }
}
Note
is_free_tier = true when credit balance < $10. Windows are in UTC: daily = current day, weekly = Mon–Sun, monthly = 1st–EOM.

Errors: 401 (auth), 404 (user missing — rare).

AI Points Balance

Query current daily AI points balance and next reset time (separate from credit-based rate limiting).

GET/api/v1/points/balance
Auth
Requires Bearer token (standard API key).

Response

json
{
  "dailyLimit": 100,
  "dailyUsed": 23,
  "dailyRemaining": 77,
  "resetAt": "2026-04-21T16:00:00.000Z",
  "plan": { "slug": "starter", "dailyPoints": 100 }
}
Note
resetAt is the next UTC+8 midnight (in ISO UTC format). plan is null if the user has no active subscription.

Errors: 401 (auth), 403 (suspended).
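resetAt can be reproduced client-side, for example to show a countdown without calling the endpoint again. This Python sketch uses only the standard library; `next_points_reset` and `parse_reset_at` are hypothetical helpers that encode the rule in the note above (next midnight in UTC+8, expressed in UTC).

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))


def next_points_reset(now: datetime) -> datetime:
    """Next midnight in UTC+8 as an aware UTC datetime (`now` must be timezone-aware)."""
    local = now.astimezone(UTC_PLUS_8)
    tomorrow = local.date() + timedelta(days=1)
    midnight = datetime(tomorrow.year, tomorrow.month, tomorrow.day, tzinfo=UTC_PLUS_8)
    return midnight.astimezone(timezone.utc)


def parse_reset_at(value: str) -> datetime:
    # fromisoformat() before Python 3.11 does not accept a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))
```

The example value "2026-04-21T16:00:00.000Z" is midnight at the start of 22 April in UTC+8, which is what `next_points_reset` returns for any time on 21 April in UTC+8.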

Subscription Plans

Public endpoint listing all active subscription plans with pricing, rate limits, daily points, and included models.

GET/api/v1/plans

No authentication required for this endpoint.

Response

json
[
  {
    "slug": "starter",
    "priceUsd": 29,
    "rpmLimit": 600,
    "tokenLimit": null,
    "tokenLimitPeriod": "month",
    "dailyPoints": 100,
    "newAccountDailyPoints": 50,
    "newAccountCooldownHrs": 24,
    "models": [
      { "modelId": "openai/gpt-4.1", "basePoints": 1 }
    ]
  }
]

Cache: s-maxage=300, stale-while-revalidate=60.

Tier Brackets

Public reference for model pricing tiers and per-million-token caps.

GET/api/v1/tier-brackets

No authentication required for this endpoint.

Response

json
[
  { "pts": 1,  "label": "Budget",    "maxIn": 0.30, "maxOut": 1.00 },
  { "pts": 2,  "label": "Mid",       "maxIn": 0.50, "maxOut": 1.50 },
  { "pts": 3,  "label": "Mid",       "maxIn": 1.20, "maxOut": 2.50 },
  { "pts": 5,  "label": "Premium",   "maxIn": 1.00, "maxOut": 3.50 },
  { "pts": 8,  "label": "Premium",   "maxIn": 2.00, "maxOut": 6.00 },
  { "pts": 15, "label": "Flagship",  "maxIn": 3.00, "maxOut": 12.00 },
  { "pts": 25, "label": "Flagship",  "maxIn": null, "maxOut": null }
]
Note
null values indicate no cap. Cache: s-maxage=300.
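This page does not say how a model is assigned to a bracket. One plausible reading, shown here purely as an assumption, is that a model lands in the lowest-point bracket whose caps cover both its input and output price per million tokens. `points_for` is hypothetical; the bracket rows are copied from the response above.

```python
from typing import List, Optional, Tuple

# (pts, maxIn, maxOut) copied from the example response; None means no cap.
BRACKETS: List[Tuple[int, Optional[float], Optional[float]]] = [
    (1, 0.30, 1.00),
    (2, 0.50, 1.50),
    (3, 1.20, 2.50),
    (5, 1.00, 3.50),
    (8, 2.00, 6.00),
    (15, 3.00, 12.00),
    (25, None, None),
]


def points_for(input_price: float, output_price: float) -> int:
    """Lowest bracket whose caps cover both prices ($ per million tokens). Assumed rule."""
    for pts, max_in, max_out in sorted(BRACKETS, key=lambda row: row[0]):
        fits_in = max_in is None or input_price <= max_in
        fits_out = max_out is None or output_price <= max_out
        if fits_in and fits_out:
            return pts
    raise ValueError("no bracket covers these prices")
```

Because the 3-point row caps input at $1.20 while the 5-point row caps it at $1.00, a model priced at $1.10 in and $3.00 out skips the 5-point bracket under this rule and lands at 8 points.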

Agent Registration

Self-service registration for AI agents (bots, autonomous systems). Returns an API key with trial credits and a claim token for account upgrade.

POST/api/v1/agents/register
Rate Limit
No authentication required, but IP-limited to 1 registration per 24 hours.

Request Body

namerequired
string
Agent name (non-empty, trimmed, max 100 chars).
description
string
Optional agent description.
referral_code
string
Optional referral code.

Example Request

bash
curl -X POST https://bazaarlink.ai/api/v1/agents/register \
  -H "content-type: application/json" \
  -d '{
    "name": "My Agent",
    "description": "Autonomous research bot"
  }'

Response

json
{
  "api_key": "sk-bl-xxxxx...",
  "credits": 0.10,
  "credits_usd": "$0.1000",
  "claim_token": "abc...xyz",
  "claim_expires": "2026-04-27T10:00:00.000Z",
  "upgrade_url": "https://bazaarlink.ai/claim?token=...",
  "referral_code": "aBcDeFgH",
  "free_model": "auto:free",
  "message": "Welcome to BazaarLink!...",
  "referral_message": "Share referral link:...",
  "base_url": "https://bazaarlink.ai/api/v1",
  "docs": "https://bazaarlink.ai/llms.txt"
}

Errors: 400 (invalid body / missing name), 429 (rate limit — 1/IP/24h), 500 (internal).

Error Codes

BazaarLink uses standard HTTP status codes. Error responses follow the OpenAI format:

json
{
  "error": {
    "message": "Invalid or disabled API key.",
    "type": "invalid_request_error",
    "code": 401
  }
}

| Code | Name | Description |
|---|---|---|
| 400 | Bad Request | Malformed request, empty messages array, or missing required fields |
| 401 | Unauthorized | API key is missing, invalid, or disabled |
| 402 | Payment Required | Insufficient account credits, per-key spend limit reached, or monthly/weekly budget cap exceeded |
| 403 | Forbidden | Account is suspended or does not have permission |
| 413 | Payload Too Large | Request body exceeds 10 MB; reduce content size or split the request |
| 429 | Too Many Requests | Rate limit exceeded; check the Retry-After header before retrying |
| 500 | Server Error | Internal BazaarLink error |
| 502 | Bad Gateway | All upstream providers failed; failover was attempted |
| 503 | Service Unavailable | No upstream provider is configured for this model; contact the admin |

Handling Errors

python
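from openai import OpenAI, APIConnectionError, APIStatusError, RateLimitError

client = OpenAI(
    base_url="https://bazaarlink.ai/api/v1",
    api_key="sk-bl-YOUR_API_KEY",
)

try:
    response = client.chat.completions.create(
        model="openai/gpt-4.1",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except RateLimitError:
    # 429: check the Retry-After header before retrying
    print("Rate limited; waiting before retry...")
except APIConnectionError as e:
    # Network failure or timeout: no HTTP status code is available
    print(f"Connection error: {e}")
except APIStatusError as e: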
    print(f"API error {e.status_code}: {e.message}")
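Using the status-code table, a client can decide per error whether to retry and how long to wait. A minimal Python sketch follows: treating 429, 500, and 502 as retryable and 503 as non-retryable follows the table's descriptions (503 here means no provider is configured, which waiting will not fix), but the backoff base, cap, and full-jitter strategy are assumptions, and `retry_delay` is a hypothetical helper. It looks up the header in lower case, which works with case-insensitive header containers such as httpx's; a plain dict must use lower-case keys.

```python
import random
from typing import Mapping, Optional

RETRYABLE_STATUSES = {429, 500, 502}


def retry_delay(
    status: int,
    attempt: int,
    headers: Mapping[str, str],
    base: float = 1.0,
    cap: float = 30.0,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to give up."""
    if status not in RETRYABLE_STATUSES:
        return None  # 400/401/402/403/413/503 need a fix, not a retry
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # server-specified wait in seconds
        except ValueError:
            pass  # HTTP-date form of Retry-After is not handled in this sketch
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

With the openai SDK, `e.status_code` and `e.response.headers` on an `APIStatusError` supply the first and third arguments.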

Streaming Error Formats

Errors that occur before any tokens are streamed return a standard HTTP error response with a JSON body.

Errors that occur mid-stream are sent as SSE events with finish_reason: "error". Read the error field on the choice object; it sits alongside delta, not inside it.

typescript
// Error chunk sent mid-stream (finish_reason: "error")
type MidStreamError = {
  choices: [
    {
      index: 0;
      finish_reason: "error";
      delta: { content: "" };
      native_finish_reason: null;
      error: {
        code: number;
        message: string;
        metadata?: {
          provider_name?: string;
          raw?: unknown;
        };
      };
    }
  ];
};
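A client that reads the raw stream can turn that error chunk into an exception. The following minimal Python sketch works over SSE lines and assumes OpenAI-style framing, with `data: {json}` lines terminated by `data: [DONE]`; `iter_content` and `MidStreamError` are hypothetical names.

```python
import json
from typing import Iterable, Iterator, Optional


class MidStreamError(Exception):
    def __init__(self, code: Optional[int], message: str, provider: Optional[str] = None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.provider = provider


def iter_content(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas; raise MidStreamError on a finish_reason "error" chunk."""
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # blank separators, comments, and keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("finish_reason") == "error":
                err = choice.get("error") or {}
                metadata = err.get("metadata") or {}
                raise MidStreamError(err.get("code"), err.get("message", ""), metadata.get("provider_name"))
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Text yielded before the error has usually already reached the user, so the caller has to decide whether to keep the partial answer or retry the whole request.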

Debugging

Use debug.echo_upstream_body: true to inspect the exact request body sent to the upstream provider. The transformed request is returned as the first SSE chunk. For development / debugging only — do not use in production.

json
// Request with debug enabled (streaming only)
{
  "model": "openai/gpt-4.1",
  "messages": [{ "role": "user", "content": "Hello" }],
  "stream": true,
  "debug": { "echo_upstream_body": true }
}