Collapse Index Labs

Collapse Index CLI


Standard metrics often hide fragility. Collapse Index (CI) detects hidden brittleness under stress, providing sealed, audit-grade diagnostics for safer, more robust AI deployment.


📑 The Hidden Signal

Standard metrics mask fragility.

Complex systems often appear reliable under standard evaluation conditions (high accuracy) yet degrade sharply when exposed to benign stress.

Collapse Index (CI) captures these abrupt instabilities. While accuracy remains flat, CI spikes, revealing the hidden risk before it reaches production.

[Figure: ROC curves (True Positive Rate vs. False Positive Rate). Conf: AUC 0.50; CI: AUC 0.99.]
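The AUC values in the figure summarize how well each score separates inputs that fail under stress from those that stay stable. As a toy sketch (synthetic labels and scores, not the data behind the figure), ROC AUC equals the probability that a randomly chosen failing input outscores a randomly chosen stable one, with ties counted as half. A flat score therefore sits at 0.50:

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise (Mann-Whitney) comparison.

    scores: higher means "more likely to fail"
    labels: 1 for inputs that actually failed under stress, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one failing and one stable input")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Synthetic example: a constant score carries no signal, while a score that
# ranks every failure above every stable input separates them perfectly.
labels = [0, 0, 0, 1, 1]
flat_confidence = [0.9, 0.9, 0.9, 0.9, 0.9]
instability = [0.05, 0.10, 0.20, 0.70, 0.85]
print(roc_auc(flat_confidence, labels))  # 0.5
print(roc_auc(instability, labels))      # 1.0
```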

Full Self-Driving

High accuracy on highways, but brittle failure under specific lighting changes.

Credit Approval

Silent denials triggered by mere rounding errors in applicant data.

LLMs

Hallucinating facts when prompts are rephrased slightly.


πŸ“ Design Principles

Rigorous by design.

CI is governed by three core principles ensuring it serves as a reliable diagnostic dimension.

Boundedness

Scores are normalized to [0, 1], ensuring interpretability across different domains and model types.

Lightweight Stressors

Uses benign perturbations (paraphrases, pixel shifts) that preserve semantics while still putting the model under realistic, everyday stress.

Reproducibility

Each run produces sealed bundles with logs, hashes, and traces, enabling verification without disclosing internals.
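Reproducibility can be illustrated with a content manifest. This is a minimal sketch that assumes a bundle is simply a directory of files (logs, plots, traces). Each file gets a SHA-256 digest, and one digest over the canonical JSON listing lets a third party confirm that nothing changed without learning how the files were produced. `build_manifest` and its output layout are hypothetical, not the CI tool's actual bundle format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 so large logs need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(bundle_dir: Path) -> dict:
    """Hypothetical manifest: per-file digests plus one digest over the whole listing."""
    files = {
        str(p.relative_to(bundle_dir)): sha256_file(p)
        for p in sorted(bundle_dir.rglob("*"))
        if p.is_file()
    }
    # Canonical JSON (sorted keys, fixed separators) makes the bundle digest reproducible.
    canonical = json.dumps(files, sort_keys=True, separators=(",", ":")).encode()
    return {"files": files, "bundle_sha256": hashlib.sha256(canonical).hexdigest()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "collapse_log.csv").write_text("row,output,error\n0,A,\n")
        (root / "plots").mkdir()
        (root / "plots" / "ci_hist.txt").write_text("placeholder plot data\n")
        print(json.dumps(build_manifest(root), indent=2))
```

Re-running `build_manifest` on an untouched bundle yields the same `bundle_sha256`; editing any file changes it.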


🔬 Diagnostic Protocol

From input to sealed artifact.

The framework follows a strict pipeline to ensure audit-grade results.

01

Perturbation

Apply domain-appropriate lightweight transformations to inputs.

02

Signals

Derive bounded diagnostic signals representing instability.

03

Collapse Log

Record row-level diagnostics, outputs, and error states.

04

Aggregation

Combine signals into a single bounded score in [0, 1].

05

Artifact Bundle

Export sealed outputs (logs, plots, SHA-256 hashes) for verification.
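The five steps can be sketched end to end in plain Python. This toy rests on explicit assumptions: the model is a keyword classifier, the perturbations are an uppercase change and a character swap, the instability signal is the fraction of perturbed variants whose output differs from the original (a common flip-rate measure used here as a hypothetical stand-in for CI's own signals), and aggregation is a simple mean. `flip_rate` and `run_protocol` are illustrative names, not part of the Collapse Index CLI.

```python
from __future__ import annotations

import hashlib
import json
from statistics import mean
from typing import Callable, Sequence

# Hypothetical stand-ins: the flip-rate signal and mean aggregation only
# illustrate the data flow; they are not the published CI signals.
Model = Callable[[str], str]
Perturb = Callable[[str], str]


def flip_rate(model: Model, x: str, perturbations: Sequence[Perturb]) -> tuple[float, list[str]]:
    """Steps 01-02: perturb one input; the share of changed outputs lies in [0, 1]."""
    base = model(x)
    outputs = [model(perturb(x)) for perturb in perturbations]
    flips = sum(out != base for out in outputs)
    return flips / len(perturbations), outputs


def run_protocol(model: Model, inputs: Sequence[str], perturbations: Sequence[Perturb]) -> dict:
    log = []
    for row, x in enumerate(inputs):
        try:
            signal, outputs = flip_rate(model, x, perturbations)
            log.append({"row": row, "signal": signal, "outputs": outputs, "error": None})
        except Exception as exc:  # Step 03: record error states instead of dropping rows
            log.append({"row": row, "signal": None, "outputs": [], "error": repr(exc)})
    signals = [r["signal"] for r in log if r["signal"] is not None]
    # Step 04: the mean of values in [0, 1] stays in [0, 1]; None if every row failed
    score = mean(signals) if signals else None
    # Step 05: hash a canonical serialization so the result can be verified later
    payload = json.dumps({"log": log, "score": score}, sort_keys=True).encode()
    return {"log": log, "score": score, "sha256": hashlib.sha256(payload).hexdigest()}


def toy_model(text: str) -> str:
    return "positive" if "good" in text.lower() else "negative"


perturbations = [str.upper, lambda s: s.replace("good", "g00d")]
result = run_protocol(toy_model, ["good movie", "bad movie"], perturbations)
print(result["score"])  # 0.25
```

The character swap flips "good movie" (signal 0.5) but leaves "bad movie" alone (signal 0.0), so the aggregate is 0.25 even though the unperturbed predictions are both correct.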


πŸ“ Positioning CI

How CI Stacks Up.

CI uniquely combines stress-based probing with audit-aligned outputs.

| Method | Stress-based | Lightweight | Audit-aligned | Modality-agnostic |
|---|---|---|---|---|
| Collapse Index (CI) | ✓ | ✓ | ✓ | ✓ |
| HELM | ✗ | ✗ | ✓ | ✗ |
| Calibration | ✗ | ✓ | ✗ | ✓ |
| OOD Detection | - | ✓ | ✗ | ✓ |

* CI does not replace these methods; it complements them by detecting instability that standard benchmarks often miss.


📚 Publications

Foundational Research.

The Collapse Index framework is openly published and DOI-indexed, with full reproducibility and scrutiny from domain experts across multiple disciplines.

Framework Paper

Core methodology, theoretical bounds, and design principles of the Collapse Index.

Read on Zenodo →

Supernova Paper

CI applied to astrophysical transient detection in synthetic supernova light curves.

Read on Zenodo →

ESA Telemetry Paper

First real-world operational validation on ESA satellite telemetry data.

Read on Zenodo →

CrackTest Paper

CI applied to LLM robustness testing using morphology-aligned perturbations.

Read on Zenodo →

✈️ Pilots & Services

Get Started with CI.

Ready to detect hidden AI brittleness? Explore our services and datasets.

Synthetic Datasets

Access curated stress-test packs (CISD) designed specifically for CI workflows.

Learn more →

Professional Evals

Full-service brittleness analysis. Send us your predictions, get enterprise-grade CI reports.

View Services →

Enterprise Support

Custom integration & team training on the Collapse Index methodology.

Contact Us →

🚨 Behavioral Drift Detection

CI Judge™ LLM Drift Detector

Detect behavioral drift in production AI systems with statistically rigorous LLM-as-a-Judge.

🎯 CI Judge

Binary drift detection with bias-corrected probabilities using confusion matrix inversion. Catches hostile refusals, quality degradation, and content shifts.

Features: 95% confidence intervals, ZDR (Zero Data Retention), risk classification
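Confusion-matrix inversion is easiest to see in the binary case. As a hedged sketch using the standard Rogan-Gladen correction (not necessarily CI Judge's exact estimator): if the judge flags true drift with sensitivity Se and flags clean responses at a false-positive rate of 1 - Sp, the observed flag rate is q = Se·p + (1 - Sp)(1 - p), so the true drift rate is p = (q + Sp - 1) / (Se + Sp - 1). The function name and the numbers below are illustrative.

```python
def corrected_drift_rate(observed_flag_rate: float, sensitivity: float, specificity: float) -> float:
    """Invert a 2x2 judge confusion matrix (Rogan-Gladen estimator).

    observed = sensitivity * p + (1 - specificity) * (1 - p), solved for p.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("judge must beat chance (sensitivity + specificity > 1)")
    p = (observed_flag_rate + specificity - 1.0) / denom
    return min(1.0, max(0.0, p))  # clip sampling noise back into [0, 1]


# Hypothetical numbers: the judge flags 30% of responses; on calibration data it
# catches 90% of true drift cases and wrongly flags 10% of clean ones.
print(round(corrected_drift_rate(0.30, sensitivity=0.90, specificity=0.90), 4))  # 0.25
```

Here the raw 30% flag rate overstates drift: a tenth of the clean traffic is flagged by mistake, and correcting for that leaves an estimated true drift rate of 25%.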

πŸ“Š Statistical Calibration

Bootstrap-based bias correction using calibration datasets. Removes LLM judge bias through empirical confusion matrix estimation.

Output: Bias-corrected probabilities, 95% confidence intervals, calibration quality reports
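Bootstrap bias correction can be sketched in the same spirit, assuming a labeled calibration set of (judge_flag, true_label) pairs and a batch of judge flags on production traffic. Each resample re-estimates the judge's sensitivity and specificity and re-applies the Rogan-Gladen inversion of its confusion matrix. The 2.5th and 97.5th percentiles of the resampled estimates then form a 95% interval. The percentile bootstrap and all function names are illustrative choices, not the documented CI Judge procedure, and the data are synthetic.

```python
import random
from statistics import mean


def rogan_gladen(observed, se, sp):
    """Invert the judge's 2x2 confusion matrix; None if the judge is no better than chance."""
    denom = se + sp - 1.0
    if denom <= 0:
        return None
    return min(1.0, max(0.0, (observed + sp - 1.0) / denom))


def estimate(calibration, flags):
    """calibration: (judge_flag, true_label) pairs; flags: judge flags on production traffic."""
    on_drift = [f for f, y in calibration if y == 1]
    on_clean = [f for f, y in calibration if y == 0]
    if not on_drift or not on_clean:
        return None
    se = mean(on_drift)          # sensitivity: P(flag | drift)
    sp = 1.0 - mean(on_clean)    # specificity: P(no flag | no drift)
    return rogan_gladen(mean(flags), se, sp)


def bootstrap_corrected_rate(calibration, flags, n_boot=2000, seed=0):
    """Point estimate plus a percentile 95% interval from resampling both datasets."""
    rng = random.Random(seed)
    point = estimate(calibration, flags)
    draws = []
    for _ in range(n_boot):
        est = estimate(
            rng.choices(calibration, k=len(calibration)),
            rng.choices(flags, k=len(flags)),
        )
        if est is not None:
            draws.append(est)
    if not draws:
        raise ValueError("no usable bootstrap resamples")
    draws.sort()
    lo = draws[int(0.025 * (len(draws) - 1))]
    hi = draws[int(0.975 * (len(draws) - 1))]
    return point, (lo, hi)


# Synthetic data: the judge is right 90% of the time on both classes
# and flags 30 of 100 production responses.
calibration = [(1, 1)] * 45 + [(0, 1)] * 5 + [(1, 0)] * 5 + [(0, 0)] * 45
flags = [1] * 30 + [0] * 70
point, (lo, hi) = bootstrap_corrected_rate(calibration, flags)
print(f"corrected drift rate {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Resampling the calibration set as well as the production flags matters: uncertainty in the judge's own error rates widens the interval, which a production-only bootstrap would miss.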

πŸ”Œ REST API

RESTful API with OpenRouter/OpenAI/Anthropic support. FastAPI backend with async processing, rate limiting, and enterprise-grade logging.

Status: ALPHA v0.1.0

Coming soon on GitHub →

🧩 Semantic Stability Wrapper

CIWrap™: Reduce Hallucinations with Semantic Wrappers.

Prevent LLM drift at generation time with semantic contracts. Validated on 9 real-world examples: CIWrap earned 8-9/10 LLM-judge scores vs 6-7/10 for the raw LLM.

πŸ’° Similar Cost, Better Quality

CIWrap costs $0.0010-0.0023 per call (similar to a raw LLM call) but scores 8-9/10 with the LLM judge vs 6-7/10. It prevents scope creep without a price premium.

Real Data: Tested on GPT-4o-mini via OpenRouter across 9 examples

βš–οΈ LLM Judge Validated

Across 9 real-world examples, CIWrap consistently scored 8-9/10 vs 6-7/10 for the raw IDE Wrapper; the LLM judge itself rated CIWrap better at preventing drift.

Tested: Pricing pages, hero sections, forms, navigation, dialogs, grids, CTAs, dashboards, footers

🎯 6 Usage Modes

Pre-calibrated semantic contracts for UX→Code, Spec→Code, RAG, Agents, JSON, and CYOA. Multi-layered drift prevention through morphology monitoring, inheritance tracking, and confidence-based regeneration.
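Confidence-based regeneration, one of the drift-prevention layers listed above, follows a generic pattern: generate a candidate, score it against the contract, and retry until a confidence threshold is met or the retry budget runs out. In this minimal sketch, `generate`, `score`, the 0.8 threshold, and the three-attempt budget are hypothetical placeholders rather than CIWrap parameters. The toy scorer simply penalizes extra markup as a crude proxy for scope creep.

```python
from typing import Callable, Tuple


def generate_with_regeneration(
    prompt: str,
    generate: Callable[[str, int], str],
    score: Callable[[str, str], float],
    threshold: float = 0.8,
    max_attempts: int = 3,
) -> Tuple[str, float, int]:
    """Return (output, confidence, attempts_used).

    Stops at the first output whose confidence clears the threshold;
    otherwise returns the best-scoring candidate seen within the budget.
    """
    best_output, best_score = "", float("-inf")
    for attempt in range(1, max_attempts + 1):
        candidate = generate(prompt, attempt)
        conf = score(prompt, candidate)
        if conf > best_score:
            best_output, best_score = candidate, conf
        if conf >= threshold:
            return candidate, conf, attempt
    return best_output, best_score, max_attempts


# Toy stand-ins: attempt 1 adds an unrequested element, attempt 2 stays in scope.
drafts = {1: "<button>Sign Up</button><div>Newsletter</div>", 2: "<button>Sign Up</button>"}
out, conf, tries = generate_with_regeneration(
    "Blue button with 'Sign Up' text",
    generate=lambda p, i: drafts.get(i, drafts[2]),
    score=lambda p, c: 1.0 if c.count("<") == 2 else 0.5,  # one element = two tags
)
print(out, conf, tries)  # <button>Sign Up</button> 1.0 2
```

Returning the best candidate when the budget runs out keeps the loop's cost bounded at `max_attempts` calls.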

Foundation: CrackTest morphology-aligned perturbation testing (Kwon 2025)

πŸ“Š Real Cost Comparison

| Approach | Cost per call (USD) | Judge score |
|---|---|---|
| IDE Wrapper (raw) | $0.0010-0.0025 | 6-7/10 |
| CIWrap | $0.0010-0.0023 | 8-9/10 |

Real Data: 9 test examples (pricing, hero, forms, nav, dialogs, grids, CTAs, dashboards, footers)
Pricing: Calculated from GPT-4o-mini on OpenRouter
Key Insight: Similar cost per call, but CIWrap prevents scope creep β†’ fewer re-generations β†’ lower total cost
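The "fewer re-generations" argument is simple arithmetic. Assume each call's output is accepted independently with probability a and rejected outputs are regenerated until one is accepted. The number of calls is then geometric with mean 1/a, so the expected cost per accepted output is c/a. The acceptance rates below are hypothetical, not measured; the per-call costs are the upper ends of the ranges in the cost comparison table.

```python
def expected_cost_per_accepted_output(cost_per_call: float, acceptance_rate: float) -> float:
    """Geometric retries: expected number of calls = 1 / acceptance_rate."""
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return cost_per_call / acceptance_rate


# Upper ends of the per-call cost ranges; acceptance rates are illustrative guesses.
raw = expected_cost_per_accepted_output(0.0025, acceptance_rate=0.5)
wrapped = expected_cost_per_accepted_output(0.0023, acceptance_rate=0.9)
print(f"raw: ${raw:.4f}  wrapped: ${wrapped:.4f}")  # raw: $0.0050  wrapped: $0.0026
```

With similar per-call prices, the total cost is driven almost entirely by how often an output has to be regenerated.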

πŸš€ Quick Start

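```python
import asyncio

from ciwrap import CIWrapper

# Placeholder: your own calibration examples
examples = [...]


async def main():
    wrapper = CIWrapper(
        model="gpt-4o-mini",
        usage_mode="ux_to_code",
    )

    # Optional: calibrate the semantic contract with your examples
    wrapper.calibrate(examples)

    # generate() is awaited, so it must run inside an async function
    code = await wrapper.generate(
        "Blue button with 'Sign Up' text"
    )
    print(code)


asyncio.run(main())
```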
Coming Soon on GitHub →

πŸ”¬ Research Partnerships

Collaborate with Us.

Are you a research lab, university, or institution interested in advancing Collapse Index methodology? We welcome collaborations on cross-domain validation, theoretical extensions, and safety-critical applications.

Contact for Research Partnerships β†’