Question 1

What is AI API relay detection?

Accepted Answer

AI API relay detection is an automated test suite that verifies whether an OpenAI-compatible API endpoint honestly executes your requests. BazaarLink Probe sends standardised probes to detect model swapping, token padding, system prompt injection, secret exfiltration, and 50 other security risks, outputting a 0–100 score.

Question 2

How can I tell if a relay is swapping models?

Accepted Answer

The most reliable method is model fingerprinting: send questions only a specific model can answer correctly (e.g. knowledge cutoff date, specific capability tests), then compare the response against the expected model. BazaarLink Probe includes these probes and automatically flags swap risks.

Question 3

What is token padding?

Accepted Answer

Token padding (token inflation) means a relay reports higher prompt_tokens or completion_tokens in the API usage field than actually consumed, causing you to overpay. Minor inflation (5–15%) is hard to spot. BazaarLink Probe detects it by comparing precisely known token counts against reported values.

Question 4

What API latency is considered normal?

Accepted Answer

TTFT (Time to First Token) under 500ms is generally healthy; over 2 seconds suggests performance issues or extra processing layers. Large models like GPT-4o average 300–800ms TTFT. BazaarLink Probe's latency test benchmarks your endpoint against baseline values.

Question 5

How is BazaarLink Probe different from a ping test?

Accepted Answer

Ping only measures network connectivity (ICMP packets). BazaarLink Probe is a full application-layer (L7) test that sends real LLM requests to verify model identity, token counts, refusal behaviour, stream format, and system prompt injection — 50 indicators ping cannot cover.

Question 6

How can I use the detection results to choose a provider?

Accepted Answer

Enter each provider's API endpoint into BazaarLink Probe and compare scores and risk flags. A higher score (closer to 100) with no red flags indicates a more trustworthy endpoint. Focus on three key indicators: model authenticity (no swapping), token accuracy (no padding), and latency performance (TTFT).

Model	Quick	Full
gpt-4	$0.870	$1.260
claude-opus-4.7	$0.235	$0.375
claude-opus-4.6	$0.235	$0.375
claude-sonnet-4.6	$0.141	$0.225
gpt-5.4	$0.133	$0.215
gpt-5.2 / 5.3-codex	$0.114	$0.189
gemini-3.1-pro	$0.106	$0.172
gpt-4o	$0.102	$0.160
claude-haiku-4.5	$0.047	$0.075
gpt-5.4-mini	$0.040	$0.065
glm-5.1	$0.035	$0.054
glm-5	$0.026	$0.040
gpt-3.5-turbo	$0.018	$0.026

LLM API Relay Quality Check

Attacks this tool detects

Response Tampering

Conditional Injection

Secret Scanning

Frequently Asked Questions