What is AI API relay detection?
AI API relay detection is an automated test suite that verifies whether an OpenAI-compatible API endpoint faithfully executes your requests. BazaarLink Probe sends standardised probes to detect model swapping, token padding, system prompt injection, secret exfiltration, and other security risks (50 indicators in total), then outputs a score from 0 to 100.
How can I tell if a relay is swapping models?
The most reliable method is model fingerprinting: send prompts whose answers distinguish the claimed model from others (for example, questions about its knowledge cutoff date or targeted capability tests), then compare the responses against what the claimed model is expected to return. BazaarLink Probe includes these probes and automatically flags swap risks.
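The fingerprinting idea can be sketched as "ask a set of distinguishing questions, count how many replies match the claimed model's expected answers." This is a minimal illustration, not BazaarLink Probe's implementation: the `Probe` record, the `ask` callable (assumed to wrap one chat request to the relay and return the reply text), substring matching, and the 0.8 threshold are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Probe:
    prompt: str               # distinguishing question (placeholder content)
    expected_substring: str   # text the claimed model is expected to include


def fingerprint_match_rate(ask: Callable[[str], str], probes: list[Probe]) -> float:
    """Fraction of probes whose reply contains the expected text (case-insensitive)."""
    if not probes:
        raise ValueError("need at least one probe")
    hits = 0
    for probe in probes:
        reply = ask(probe.prompt)  # `ask` is a hypothetical wrapper around the relay
        if probe.expected_substring.lower() in reply.lower():
            hits += 1
    return hits / len(probes)


def flag_swap(match_rate: float, threshold: float = 0.8) -> bool:
    """Flag a possible model swap when too few probes match (threshold is an assumption)."""
    return match_rate < threshold
```

Substring matching is deliberately crude: it misses correct answers that are phrased differently, so probes work best when the expected answer is short and unambiguous.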
What is token padding?
Token padding (token inflation) means a relay reports higher prompt_tokens or completion_tokens values in the API response's usage field than were actually consumed, so you overpay. Minor inflation (5–15%) is hard to spot manually. BazaarLink Probe detects it by sending requests with precisely known token counts and comparing those counts against the reported values.
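Once you know how many tokens a request really contains, the comparison is simple arithmetic. The sketch below assumes you already have those expected counts (for example from a reference tokenizer for the claimed model, including any chat-format overhead); `check_usage` and its 2% default tolerance are hypothetical choices for illustration, not BazaarLink Probe's actual thresholds.

```python
def inflation_ratio(reported: int, expected: int) -> float:
    """Relative over-reporting: 0.10 means 10% more tokens billed than expected."""
    if expected <= 0:
        raise ValueError("expected token count must be positive")
    return (reported - expected) / expected


def check_usage(usage: dict, expected_prompt: int, expected_completion: int,
                tolerance: float = 0.02) -> dict:
    """Compare an OpenAI-style `usage` object against known token counts.

    `tolerance` (an assumption) leaves room for small tokenizer or
    chat-template differences before a field is flagged as padded.
    """
    report = {}
    for name, expected in (("prompt_tokens", expected_prompt),
                           ("completion_tokens", expected_completion)):
        ratio = inflation_ratio(usage[name], expected)
        report[name] = {"ratio": ratio, "padded": ratio > tolerance}
    return report
```

For example, a relay reporting 110 prompt tokens for a prompt known to contain 100 gives a ratio of 0.10, which this check flags even though it falls in the 5–15% band that is hard to notice on a bill.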
What API latency is considered normal?
TTFT (Time to First Token) under 500 ms is generally considered healthy; over 2 seconds suggests performance problems or extra processing layers between you and the model. Large models such as GPT-4o typically average 300–800 ms TTFT. BazaarLink Probe's latency test benchmarks your endpoint against baseline values.
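TTFT can be measured as the elapsed time from issuing a streaming request until the first non-empty chunk arrives. In this sketch, `open_stream` is a hypothetical callable that sends the request and returns an iterator of streamed text chunks; the 500 ms and 2 s cut-offs come from the answer above, while the "borderline" label for the range in between is an assumption.

```python
import time
from typing import Callable, Iterable


def measure_ttft_ms(open_stream: Callable[[], Iterable[str]]) -> float:
    """Milliseconds from issuing the request until the first non-empty chunk."""
    start = time.perf_counter()
    for chunk in open_stream():  # hypothetical: sends the request and yields chunks
        if chunk:  # skip empty keep-alive or role-only chunks
            return (time.perf_counter() - start) * 1000
    raise RuntimeError("stream ended before any content arrived")


def classify_ttft(ttft_ms: float) -> str:
    """Bucket a TTFT using the 500 ms / 2 s rules of thumb."""
    if ttft_ms < 500:
        return "healthy"
    if ttft_ms <= 2000:
        return "borderline"  # assumption: this range is not named in the FAQ
    return "slow"
```

Starting the timer before the request is sent means the measurement includes connection setup and any extra relay processing, which is exactly what the 2-second warning is meant to catch.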
How is BazaarLink Probe different from a ping test?
Ping only checks network reachability and round-trip time using ICMP packets. BazaarLink Probe is a full application-layer (L7) test: it sends real LLM requests to verify model identity, token counts, refusal behaviour, streaming format, and system prompt injection, covering 50 indicators that ping cannot.
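Stream format is one application-layer property that ping cannot see. The sketch below checks lines of an OpenAI Chat Completions style server-sent event (SSE) stream, where each event is a `data: ` line carrying a `chat.completion.chunk` JSON object and the stream ends with `data: [DONE]`. It assumes the response body has already been split into lines and is a simplified check, not BazaarLink Probe's actual stream test.

```python
import json


def validate_sse_stream(lines: list[str]) -> list[str]:
    """Return problems found in an OpenAI-style SSE stream (empty list = OK)."""
    problems = []
    saw_done = False
    for i, line in enumerate(lines):
        if not line.strip() or line.startswith(":"):
            continue  # blank lines end events; ':' lines are SSE comments/keep-alives
        if not line.startswith("data: "):
            problems.append(f"line {i}: missing 'data: ' prefix")
            continue
        if saw_done:
            problems.append(f"line {i}: data after [DONE]")
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            saw_done = True
            continue
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            problems.append(f"line {i}: invalid JSON")
            continue
        if not isinstance(chunk, dict) or chunk.get("object") != "chat.completion.chunk":
            problems.append(f"line {i}: not a chat.completion.chunk object")
        elif not isinstance(chunk.get("choices"), list):
            problems.append(f"line {i}: missing 'choices' list")
    if not saw_done:
        problems.append("stream did not end with 'data: [DONE]'")
    return problems
```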
How can I use the detection results to choose a provider?
Enter each provider's API endpoint into BazaarLink Probe and compare the resulting scores and risk flags. A higher score (closer to 100) with no red flags indicates a more trustworthy endpoint. Focus on three key indicators: model authenticity (no swapping), token accuracy (no padding), and latency (TTFT).
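The selection rule above can be sketched as a filter-then-sort over per-endpoint results: drop anything with a red flag, then rank by score. The `ProbeResult` record, its field names, the example URLs, and the tie-break on lower TTFT are illustrative assumptions, not BazaarLink Probe's export format.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeResult:
    endpoint: str
    score: int                                           # overall 0–100 score
    red_flags: list[str] = field(default_factory=list)   # e.g. "model_swap" (hypothetical)
    ttft_ms: float = float("inf")                        # measured time to first token


def shortlist(results: list[ProbeResult]) -> list[ProbeResult]:
    """Keep endpoints with no red flags; highest score first, lower TTFT breaks ties."""
    candidates = [r for r in results if not r.red_flags]
    return sorted(candidates, key=lambda r: (-r.score, r.ttft_ms))
```

Filtering before sorting means a single red flag disqualifies an endpoint regardless of its score, which matches the advice to prefer high scores only when no red flags are present.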