Why SIF
The Salient Intelligence Format (SIF) exists because the formats commonly used to deliver structured security intelligence to AI agents spend many of their tokens on structure and filler. We measured that waste and built a format that minimizes it.
The Research
We benchmarked identical security intelligence encoded in different formats, measuring token count and AI parse quality:
| Format | Tokens (same data) | Overhead | AI Parse Quality |
|---|---|---|---|
| Markdown tables | 412 | 51% structural waste | Excellent |
| YAML | 398 | 50% overhead | Excellent |
| Natural prose | 287 | 15% filler words | Excellent |
| JSON | 253 | 1% structural | Excellent |
| SIF | 183 | <5% structural | Excellent (with schema header) |
SIF uses 36% fewer tokens than prose and about 56% fewer than markdown tables (183 vs. 412) for identical information. The schema header costs ~60 tokens once; after that, every line uses the compressed encoding.
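The savings figures can be recomputed directly from the token counts in the table. The snippet below is a minimal sketch of that arithmetic; the dictionary and the function name `savings_vs` are illustrative and not part of SIF or its tooling.

```python
# Token counts copied from the benchmark table above.
TOKENS = {
    "Markdown tables": 412,
    "YAML": 398,
    "Natural prose": 287,
    "JSON": 253,
    "SIF": 183,
}


def savings_vs(baseline: str, candidate: str = "SIF") -> float:
    """Return the percentage of tokens the candidate saves relative to the baseline."""
    base = TOKENS[baseline]
    return 100.0 * (base - TOKENS[candidate]) / base


# Print SIF's relative savings against every other format in the table.
for name in TOKENS:
    if name != "SIF":
        print(f"SIF vs {name}: {savings_vs(name):.1f}% fewer tokens")
```

Running the same calculation on your own data is a quick way to check whether the savings hold for your mix of fields.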
Context Rot
Anthropic's own research on context utilization shows that AI models degrade in recall and reasoning as context windows fill. When your security posture data competes for space with the actual task (incident analysis, exercise facilitation, threat assessment), format efficiency becomes a capability multiplier.
The math is simple
If your compiled twin takes 3,000 tokens in markdown but 800 tokens in SIF, you have 2,200 more tokens for reasoning, task context, and output quality. At scale, this is the difference between an AI that "knows" your organization and one that forgets half of it.
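As a rough sketch of that budget arithmetic: the window size below is a hypothetical value chosen only for illustration, and `context_budget` is not a SIF API. The freed amount does not depend on the window size, only on the difference between the two encodings.

```python
def context_budget(window: int, twin_tokens: int) -> int:
    """Tokens left for reasoning, task context, and output after loading the twin."""
    if twin_tokens > window:
        raise ValueError("compiled twin does not fit in the context window")
    return window - twin_tokens


WINDOW = 8_000  # hypothetical context window size (assumption, not from the source)

markdown_left = context_budget(WINDOW, 3_000)  # twin compiled to markdown
sif_left = context_budget(WINDOW, 800)         # same twin compiled to SIF

# Difference between the two budgets: the tokens freed by switching format.
print(sif_left - markdown_left)
```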
Five White Spaces
SIF fills gaps that no existing format addresses:
- Security-domain compression: NIST CSF function codes, severity levels, and confidence tiers are first-class citizens, not embedded in prose
- Temporal encoding: trends (improving/stable/declining) and trajectory are native, not derived
- Contradiction representation: `{declared:X actual:Y}` captures the gap between policy and reality
- Confidence hierarchy: `V>O>D>U>X` (verified > observed > declared > uncertain > contradicted) is structural, not annotative
- Tiered compilation: same source data produces executive (~150 tokens), standard (~800), and full (~3K) views
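To make the confidence hierarchy and the contradiction notation concrete, here is a small sketch of how a consumer might model them. The class names `Confidence` and `Gap` are hypothetical. Only the `V>O>D>U>X` ordering and the `{declared:X actual:Y}` shape come from the list above; the rest of the SIF grammar is not assumed.

```python
from dataclasses import dataclass
from enum import IntEnum


class Confidence(IntEnum):
    """Confidence tiers ordered so that V > O > D > U > X."""
    X = 0  # contradicted
    U = 1  # uncertain
    D = 2  # declared
    O = 3  # observed
    V = 4  # verified


@dataclass(frozen=True)
class Gap:
    """A policy-versus-reality gap (hypothetical helper, not a SIF API)."""
    declared: str
    actual: str

    @property
    def contradicted(self) -> bool:
        # A gap exists only when the declared and actual values differ.
        return self.declared != self.actual

    def encode(self) -> str:
        # Render in the {declared:X actual:Y} shape described in the list.
        return f"{{declared:{self.declared} actual:{self.actual}}}"


# Because the tiers are integers, sorting strongest-first is just a reverse sort.
ranked = sorted([Confidence.D, Confidence.X, Confidence.V], reverse=True)
print([c.name for c in ranked])
print(Gap("enforced", "partial").encode())
```

Treating the tiers as an ordered type rather than free-text annotations lets a consumer compare or filter by confidence without parsing prose.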
Format Comparison
| Capability | JSON | YAML | Markdown | STIX/OCSF | SIF |
|---|---|---|---|---|---|
| Token efficiency | Good | Poor | Poor | Poor | Best |
| AI parseability | Excellent | Excellent | Excellent | Poor | Excellent |
| Security-domain native | No | No | No | Partial | Yes |
| Confidence levels | Manual | Manual | Manual | No | Native |
| Trend encoding | No | No | No | No | Native |
| Contradiction capture | No | No | No | No | Native |
| Tiered detail levels | No | No | No | No | Native |
| Human readable | Poor | Good | Excellent | Poor | Readable with schema |
When to Use SIF
SIF is the right choice when:
- An AI agent needs organizational security context in its prompt
- Context window space is constrained (always)
- You need multiple detail levels from the same data
- Confidence and contradiction tracking matters
- The consumer is a machine, not a human
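When context space is tight and several detail levels are available, one plausible policy is to pick the most detailed tier that still fits the remaining budget. The sketch below assumes the approximate tier sizes quoted earlier (~150, ~800, ~3K tokens); `pick_tier` is a hypothetical helper, not part of the SIF compiler.

```python
from typing import Optional

# Approximate tier sizes quoted in this document, in tokens.
TIER_TOKENS = {"executive": 150, "standard": 800, "full": 3000}


def pick_tier(available_tokens: int) -> Optional[str]:
    """Return the most detailed tier that fits the budget, or None if none fits."""
    fitting = [
        (size, name)
        for name, size in TIER_TOKENS.items()
        if size <= available_tokens
    ]
    # max() compares on size first, so the largest fitting tier wins.
    return max(fitting)[1] if fitting else None


print(pick_tier(1_000))
```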
For human consumption, the same compiler produces markdown or PDF. SIF is not a replacement for human-readable formats — it is a purpose-built machine format that coexists with them.