Build with Canary in minutes
Transparent, evals-first AI monitoring. Framework-agnostic — works with OpenAI, Anthropic, Cohere, or any LLM. Three API calls to catch hallucinations before your users do.
Install
Add the Canary SDK to your project:
```shell
# npm
npm install canary-ai

# yarn
yarn add canary-ai

# or use the REST API directly — no SDK required
```
Connect
Initialize the client with your API key. Get yours at canary.ai/signup — takes 30 seconds.
```typescript
import { Canary } from 'canary-ai';

// Initialize with your API key
const canary = new Canary({
  apiKey: 'can_your_key_here',
});

// Register your LLM provider (one-time setup)
const monitor = await canary.connect({
  provider: 'openai',
  model: 'gpt-4o',
  name: 'production',
});

// monitor.id — use this to evaluate responses
```
Framework-agnostic. Works with any LLM provider — OpenAI, Anthropic, Cohere, Mistral, or a self-hosted model. Just pass the provider name you want to track.
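Swapping providers only changes the payload passed to connect(). A minimal sketch of that config shape, assuming the same fields as the example above (the Anthropic model id and the "staging" name are illustrative):

```typescript
// Same fields as the connect() example above; only the values change
// per provider. Values here are illustrative.
interface MonitorConfig {
  provider: string; // e.g. 'openai', 'anthropic', 'cohere', 'mistral'
  model?: string;
  name?: string;
}

const anthropicMonitor: MonitorConfig = {
  provider: 'anthropic',
  model: 'claude-sonnet-4', // illustrative model id
  name: 'staging',
};

// Usage: await canary.connect(anthropicMonitor);
```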
Evaluate
Call evaluate() after every LLM response. Canary runs dual-layer scoring — heuristics catch obvious failures instantly, then the AI layer scores coherence, factuality, and relevance.
```typescript
// Wrap your LLM call
const llmResponse = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: userInput }],
});
const output = llmResponse.choices[0].message.content;

// Evaluate the response
const result = await canary.evaluate({
  input: userInput,
  output: output,
});

// Act on the result
if (!result.passed) {
  console.warn('Quality issue detected:', result.flags);
  // fallback logic here
}
```
A passing response:

```json
{
  "passed": true,
  "score": 94,
  "flags": [],
  "breakdown": {
    "coherence": 97,
    "factuality": 91,
    "relevance": 95
  }
}
```
When a hallucination is detected:
```json
{
  "passed": false,
  "score": 19,
  "flags": ["likely_hallucination", "overconfidence"],
  "breakdown": {
    "coherence": 45,
    "factuality": 8,
    "relevance": 22
  }
}
```
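One way to act on the flags array is a small router that blocks auto-fail flags and retries soft ones. The flag names are the ones documented on this page; the serve/retry/block actions are illustrative, not part of the SDK:

```typescript
// Route a response based on evaluation flags.
// Flag names are from the Canary docs; the actions are our own convention.
type Flag =
  | 'likely_hallucination'
  | 'identity_leak'
  | 'overconfidence'
  | 'repetition'
  | 'refusal';

function actionFor(flags: Flag[]): 'serve' | 'retry' | 'block' {
  // Auto-fail flags: never show the response to the user.
  if (flags.includes('likely_hallucination') || flags.includes('identity_leak')) {
    return 'block';
  }
  // Soft issues: regenerate or fall back to a canned answer.
  if (flags.length > 0) {
    return 'retry';
  }
  return 'serve';
}
```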
Authentication
All API requests require an API key passed in the X-API-Key header. Keys are prefixed with can_.
```shell
curl -X POST https://canary.polsia.app/api/monitor/evaluate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: can_your_key_here" \
  -d '{
    "input": "What is 2+2?",
    "output": "4"
  }'
```
REST API Reference
Register a new LLM monitor. Returns an API key for evaluations.
| Parameter | Type | Required | Description |
|---|---|---|---|
| provider | string | required | LLM provider (e.g., openai, anthropic, cohere) |
| model | string | optional | Model name (e.g., gpt-4o). Defaults to "default". |
| name | string | optional | Friendly name for this monitor (e.g., production) |
POST /api/monitor/evaluate — Evaluate an LLM response. Requires the X-API-Key header.
| Parameter | Type | Required | Description |
|---|---|---|---|
| input | string | required | The user prompt or input sent to the LLM |
| output | string | required | The LLM's response to evaluate |
| context | string | optional | System prompt or context provided to the LLM |
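To include the optional context field, here is a raw fetch sketch against the evaluate endpoint shown in the curl example above (the prompt text and the helper name evaluateRaw are illustrative):

```typescript
// Request body with the optional context field.
// The endpoint URL matches the curl example; the content is illustrative.
const body = {
  input: 'What is our refund policy?',
  output: 'Refunds are available within 30 days of purchase.',
  context: 'You are a support bot for Acme. Refund window: 30 days.',
};

async function evaluateRaw(apiKey: string) {
  const res = await fetch('https://canary.polsia.app/api/monitor/evaluate', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': apiKey,
    },
    body: JSON.stringify(body),
  });
  return res.json();
}
```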
Get quality metrics for your monitor. Returns rolling score, hallucination count, and recent alerts. Requires X-API-Key header.
Response Format
All evaluation responses include:
- passed — boolean, overall pass/fail
- score — 0–100 quality score
- flags — array of detected issues (e.g., likely_hallucination, identity_leak, repetition, overconfidence, refusal)
- breakdown — per-dimension scores (coherence, factuality, relevance)
Threshold: Responses with score ≥ 70 pass by default. Any flagged anomaly (likely_hallucination, identity_leak) auto-fails regardless of score.
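The response shape and the default pass rule above can be restated in TypeScript. The interface names are ours, not exported by the SDK:

```typescript
// Shape of every evaluation response, per the docs above.
interface Breakdown {
  coherence: number;
  factuality: number;
  relevance: number;
}

interface EvaluationResult {
  passed: boolean;
  score: number; // 0-100
  flags: string[];
  breakdown: Breakdown;
}

// Flags that auto-fail regardless of score.
const AUTO_FAIL = ['likely_hallucination', 'identity_leak'];

// The documented default rule: score >= 70 passes, unless an
// auto-fail flag is present.
function defaultPassRule(score: number, flags: string[]): boolean {
  if (flags.some((f) => AUTO_FAIL.includes(f))) return false;
  return score >= 70;
}
```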
Error Codes
| Status | Meaning |
|---|---|
| 400 | Missing required parameters |
| 401 | Invalid or missing API key |
| 429 | Rate limit exceeded |
| 500 | Internal evaluation error |
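A client-side sketch of handling these codes. The retry policy is an assumption, not part of the API contract:

```typescript
// Treat rate limits and internal errors as transient; everything else
// as a caller bug. This retry policy is our own convention.
function shouldRetry(status: number): boolean {
  return status === 429 || status === 500;
}

// Human-readable messages, mirroring the error-code table above.
function describeError(status: number): string {
  switch (status) {
    case 400: return 'Missing required parameters';
    case 401: return 'Invalid or missing API key';
    case 429: return 'Rate limit exceeded';
    case 500: return 'Internal evaluation error';
    default: return `Unexpected status ${status}`;
  }
}
```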
Ready to stop babysitting your LLM?
Free to start. No credit card. API key in 30 seconds.
Get your API key →