1. SAE Toolkit
(The core of the product: training, auto-interpretation, steering, feature search.)
Pages:
- Training SAEs
  - Endpoint: /v1/sae/train
  - Parameters: layers, dict_size, sparsity, warm_start_sae_id (optional).
  - Best practices: which layers to pick, how large a probe set to use.
  - Metrics: explained variance, dead-feature rate, selectivity.
- Feature Explorer
  - Endpoint: /v1/sae/features
  - Search features by ID, label, or activation.
  - Example: “Feature 471 → negative sentiment in credit risk.”
- Labeling Features
  - How to add human labels (/v1/sae/label).
  - Guidelines for good labels (neither too broad nor too narrow).
- Steering Playground
  - Endpoint: /v1/sae/steer/preview
  - Adjust feature scales and compare generations before and after.
  - Example: turn down a hallucination-trigger feature and rerun the answer.
- Feature Drift (optional; links to Alignment Guard)
2. Trading Signals
Pages:
- Signals API
  - Endpoint details (/v1/trading/signals).
  - Example: BTC-USD daily signals.
- Backtesting
  - How to run a backtest (/v1/trading/backtests).
  - Sample equity curve and Sharpe ratio output.
- Feature Correlations
  - How to read the correlation heatmap.
  - Use cases: “Volume vs Price Uptrend,” “Sentiment vs Returns.”
- Example Workflow
  - Python notebook: backtest BTC signals, visualize the results, export to CSV.
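A minimal sketch of the notebook's export step, using only Python's standard csv module. The column names (timestamp, signal, equity) and the two rows are hypothetical placeholders, not the documented shape of a backtest result.

```python
import csv
import io

def write_rows_csv(fh, rows, fieldnames):
    """Write a list of dicts (e.g. per-bar backtest rows) as CSV to an open text handle."""
    writer = csv.DictWriter(fh, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Hypothetical rows; in a notebook these would come from the backtest response.
rows = [
    {"timestamp": "2024-01-01", "signal": "buy", "equity": 1.0},
    {"timestamp": "2024-01-02", "signal": "hold", "equity": 1.02},
]
buf = io.StringIO()
write_rows_csv(buf, rows, ["timestamp", "signal", "equity"])
print(buf.getvalue())
```

To write a file instead, pass a handle from open("btc_backtest.csv", "w", newline=""); the csv module documentation recommends newline="" so the writer controls line endings.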
3. Audit & Assurance
Pages:
- Audit Runs
  - How to trigger an audit (/v1/audit/runs).
  - Choosing regulators (BIS/SEC/ECB) via the regs parameter.
- Decision Traces
  - What traces show (layer, feature, weight).
  - Example: loan decision model → why it rejected an application.
- Before vs After Visualization
  - Black-box output vs explained output, side by side.
- Exportable Reports
  - Endpoint: /v1/audit/reports/{audit_id}?format=pdf|json
  - Sample report screenshots.
- Compliance Guidance
  - How reports support regulatory review.
4. Alignment Guard (Fine-Tuning Check)
Pages:
- Alignment Audits
  - Endpoint: /v1/alignment/audits
  - Inputs: base_model_id, finetuned_model_id, probe_set_id, layers, sae_mode.
  - Outputs: drift heatmap, risky features, alignment score.
- Metrics Explained
  - Drift Score (Δ activation).
  - Risk Feature Rate.
  - Alignment Correlation Index.
  - Hallucination Disagreement %.
- Certificates
  - Downloading the PDF/JSON certificate.
  - Example use: “model alignment verified post fine-tuning.”
- Case Study
  - How drift detection flagged a misaligned fine-tune.
5. Red-Team & Harmful Feature Removal
Pages:
- Red-Team Runs
  - Endpoint: /v1/redteam/runs
  - Choose strategies: prompt generation, jailbreak, bias, misinformation, PII.
- Feature Steering Red-Team
  - Endpoint: /v1/redteam/feature-steer
  - Example: boosting harmful features to expose failures.
- Mitigation Tools
  - Endpoint: /v1/mitigations/apply
  - Types: steer, prune, router, fine-tune penalty.
  - Example: clamp harmful features → lower share of unsafe outputs.
- Failure Mode Catalog
  - Endpoint: /v1/failure-modes
  - View discovered failure modes and their linked features.
- Red-Team Dashboard
  - Reports: broken prompts, harm categories, severity index.
API Spec - Examples
Auth
POST /v1/auth/token → { access_token, expires_in }
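Because tokens expire after expires_in, a client usually caches the token and refreshes it shortly before expiry. A minimal sketch, assuming expires_in is a lifetime in seconds (the spec does not say) and leaving the actual HTTP call to a caller-supplied function; TokenCache and fetch_token are hypothetical names.

```python
import time

class TokenCache:
    """Caches the token from POST /v1/auth/token and refreshes it shortly before expiry."""

    def __init__(self, fetch_token, refresh_margin=30.0, clock=time.monotonic):
        # fetch_token() must perform the POST and return the parsed JSON body,
        # i.e. {"access_token": ..., "expires_in": ...}.
        self._fetch = fetch_token
        self._margin = refresh_margin
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            body = self._fetch()
            self._token = body["access_token"]
            # Assumption: expires_in is a lifetime in seconds.
            self._expires_at = now + float(body["expires_in"])
        return self._token

# Offline demo with a fake fetcher and a controllable clock.
calls = []
def fake_fetch():
    calls.append(1)
    return {"access_token": f"tok{len(calls)}", "expires_in": 3600}

t = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: t[0])
first = cache.get()   # fetches tok1
t[0] = 3000.0
second = cache.get()  # still valid, reuses tok1
t[0] = 3580.0
third = cache.get()   # within 30 s of expiry, fetches tok2
```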
Common & Catalog
GET /v1/models → list deployed base and fine-tuned models
GET /v1/models/{model_id} → model detail
GET /v1/checkpoints?model_id=... → training/fine-tune checkpoints
POST /v1/datasets/probes (JSON or file) → register a probe set (benign, safety, finance)
GET /v1/datasets/probes → list probe sets
POST /v1/uploads (multipart) → upload documents or CSVs (e.g., finance data)
GET /v1/exports/{export_id} → signed download link
POST /v1/webhooks → subscribe to job status events (job_succeeded, job_failed)
1. Interpretability-as-a-Service (SAE)
POST /v1/sae/train
  body: { model_id, layers:[...], target_streams:["resid_pre","mlp_out"], dict_size, sparsity, epochs, probe_set_id, warm_start_sae_id? }
  → { sae_job_id }
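A minimal sketch of building the training request with Python's standard library. The host, the Bearer authorization scheme, and every body value are assumptions for illustration; the spec only lists the field names. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; substitute your deployment's URL

def build_request(method, path, token, body=None):
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")  # assumed auth scheme
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req

# All values below are hypothetical; warm_start_sae_id is optional and omitted.
train_body = {
    "model_id": "my-model",
    "layers": [6, 12],
    "target_streams": ["resid_pre", "mlp_out"],
    "dict_size": 16384,
    "sparsity": 0.01,
    "epochs": 3,
    "probe_set_id": "my-probe-set",
}
req = build_request("POST", "/v1/sae/train", "ACCESS_TOKEN", train_body)
# Sending it: json.load(urllib.request.urlopen(req)) would yield {"sae_job_id": ...}
```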
GET /v1/sae/jobs/{sae_job_id} → status, metrics (explained_var, dead_rate)
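The two job metrics are commonly defined as below: explained variance as one minus the ratio of reconstruction error to total variance of the activations, and dead rate as the share of features that never fire on a sample. This is a sketch of those standard definitions; the service may compute them differently (for example, over a different token sample or activation threshold).

```python
def explained_variance(xs, x_hats):
    """1 - SS_res / SS_tot over all tokens and dimensions.
    xs: original activations, x_hats: SAE reconstructions (lists of equal-length vectors)."""
    n, d = len(xs), len(xs[0])
    means = [sum(x[j] for x in xs) / n for j in range(d)]
    ss_res = sum((x[j] - xh[j]) ** 2 for x, xh in zip(xs, x_hats) for j in range(d))
    ss_tot = sum((x[j] - means[j]) ** 2 for x in xs for j in range(d))
    return 1.0 - ss_res / ss_tot

def dead_rate(feature_acts):
    """Fraction of SAE features that never activate (> 0) on the sample.
    feature_acts: one feature-activation vector per token."""
    n_features = len(feature_acts[0])
    alive = [any(row[i] > 0 for row in feature_acts) for i in range(n_features)]
    return 1.0 - sum(alive) / n_features
```

A perfect reconstruction scores 1.0; reconstructing every token as the mean activation scores 0.0.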
GET /v1/sae/models?model_id=... → list trained SAEs
POST /v1/sae/label
  body: { sae_id, feature_id, label, tags[], notes }
  → { feature_id, label }
GET /v1/sae/features?sae_id=...&q=...&risk=true|false → feature cards (id, label, examples, selectivity)
POST /v1/sae/steer/preview
  body: { model_id, sae_id, edits:[{feature_id, scale}], prompts[] }
  → side-by-side generations + activation deltas
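One common way to apply an edit such as {feature_id, scale} is to rescale that feature's SAE activation and shift the original activation by the resulting change along the feature's decoder direction. The sketch assumes a standard ReLU SAE and that scale is a multiplier (0 ablates, values above 1 amplify); neither is confirmed by the spec. Matrices are plain nested lists to keep it dependency-free.

```python
def encode(x, W_enc, b_enc, b_dec):
    """ReLU SAE encoder: f = ReLU(W_enc @ (x - b_dec) + b_enc). W_enc has one row per feature."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [sum(w * c for w, c in zip(row, centered)) + b for row, b in zip(W_enc, b_enc)]
    return [max(0.0, p) for p in pre]

def steer(x, edits, W_enc, b_enc, W_dec, b_dec):
    """Apply {feature_id: scale} edits to activation x.
    W_dec[i] is feature i's decoder direction. Only the change (scale - 1) * f_i along
    that direction is added, so whatever the SAE fails to reconstruct in x is preserved."""
    f = encode(x, W_enc, b_enc, b_dec)
    out = list(x)
    for fid, scale in edits.items():
        delta = (scale - 1.0) * f[fid]
        out = [o + delta * d for o, d in zip(out, W_dec[fid])]
    return out

# Toy 2-feature SAE with identity weights (illustrative only).
I2 = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
x = [2.0, 3.0]
ablated = steer(x, {0: 0.0}, I2, zeros, I2, zeros)    # feature 0 switched off
amplified = steer(x, {1: 2.0}, I2, zeros, I2, zeros)  # feature 1 doubled
```

Editing x by a delta, rather than replacing it with the SAE reconstruction, keeps the preview's "before" and "after" generations differing only in the edited features.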
2. Trading Signals
GET /v1/trading/signals?symbol=BTC-USD&window=1d&features=...&model_id=...
  → { timestamp, signal: "buy|sell|hold", score, rationale, feature_contrib[] }[]
POST /v1/trading/backtests
  body: { model_id, symbols[], start, end, features[], target, split: {train, test}, metrics[] }
  → { backtest_id }
GET /v1/trading/backtests/{backtest_id} → curves, metrics (Sharpe, MDD, hit-rate), trades, feature importance
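The backtest metrics can be reproduced locally from a signal series and per-bar returns. A minimal sketch under stated assumptions: "buy" means long, "sell" means short, "hold" keeps the previous position, the signal at bar t is traded over the return of bar t+1, Sharpe uses a zero risk-free rate and 365 periods per year for daily crypto bars, hit-rate counts bars with a nonzero position, and transaction costs are ignored. It needs at least three bars.

```python
import math

# Assumption: "buy" = long, "sell" = short, "hold" keeps the previous position.
POSITION = {"buy": 1, "sell": -1}

def positions_from_signals(signals):
    pos, out = 0, []
    for s in signals:
        pos = POSITION.get(s, pos)
        out.append(pos)
    return out

def backtest(signals, returns, periods_per_year=365):
    """signals[t] sets the position held over returns[t + 1], so there is no look-ahead."""
    pos = positions_from_signals(signals)
    strat = [p * r for p, r in zip(pos[:-1], returns[1:])]
    equity, peak, mdd = [1.0], 1.0, 0.0
    for r in strat:
        equity.append(equity[-1] * (1 + r))
        peak = max(peak, equity[-1])
        mdd = max(mdd, 1 - equity[-1] / peak)  # drawdown as a fraction of the running peak
    mean = sum(strat) / len(strat)
    sd = math.sqrt(sum((r - mean) ** 2 for r in strat) / (len(strat) - 1))
    sharpe = mean / sd * math.sqrt(periods_per_year) if sd > 0 else float("nan")
    active = [r for p, r in zip(pos[:-1], strat) if p != 0]
    hit_rate = sum(r > 0 for r in active) / len(active) if active else float("nan")
    return {"equity": equity, "sharpe": sharpe, "max_drawdown": mdd, "hit_rate": hit_rate}

# Toy example with made-up numbers: four daily bars.
res = backtest(["buy", "hold", "sell", "hold"], [0.0, 0.10, -0.05, 0.02])
```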
GET /v1/trading/feature-corr?symbols[]=...&features[]=...&model_id=... → correlation matrix + p-values
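A sketch of how such a correlation matrix is built from the Pearson coefficient, in plain Python; the series names and values are hypothetical. It does not compute the p-values the endpoint returns (scipy.stats.pearsonr returns both the coefficient and a p-value if SciPy is available), and a constant series makes the coefficient undefined (division by zero here).

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def corr_matrix(series):
    """Return (names, matrix) with matrix[i][j] = pearson(series[names[i]], series[names[j]])."""
    names = list(series)
    return names, [[pearson(series[a], series[b]) for b in names] for a in names]

# Hypothetical toy series, for illustration only.
names, m = corr_matrix({
    "volume": [1.0, 2.0, 3.0, 4.0],
    "price_change": [0.5, 1.0, 1.5, 2.0],
    "sentiment": [4.0, 3.0, 2.0, 1.0],
})
```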
3. Audit & Assurance
POST /v1/audit/runs
  body: { model_id, probe_set_id, regs: ["BIS","SEC","ECB"], views:["before_after","trace"], outputs:["pdf","json"] }
  → { audit_id }
GET /v1/audit/runs/{audit_id} → status, key findings, before vs after visuals (URIs), trace snapshots
GET /v1/audit/reports/{audit_id}?format=pdf|json → downloadable report
GET /v1/audit/traces?model_id=...&layer=...&token_range=... → decision trace frames (layer/feature/weight)
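To answer a question like "why was this loan application rejected?", trace frames can be aggregated per feature. A minimal sketch, assuming each frame is a dict with the layer, feature, and signed weight fields named above, and that summing weights across tokens is meaningful for your model; the example frames are made up.

```python
from collections import defaultdict

def top_contributors(frames, k=5):
    """Sum trace weights per (layer, feature) and return the k largest by magnitude."""
    totals = defaultdict(float)
    for fr in frames:
        totals[(fr["layer"], fr["feature"])] += fr["weight"]
    ranked = sorted(totals.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:k]

# Made-up frames for illustration.
frames = [
    {"layer": 12, "feature": 471, "weight": 0.8},
    {"layer": 12, "feature": 471, "weight": 0.4},
    {"layer": 8, "feature": 90, "weight": -0.5},
    {"layer": 8, "feature": 33, "weight": 0.1},
]
top = top_contributors(frames, k=2)
```

Ranking by absolute value keeps features that pushed against the decision (negative weights) visible alongside those that drove it.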
4. Fine-Tuning Alignment Guard (post-training)
POST /v1/alignment/audits
  body: { base_model_id, finetuned_model_id, probe_set_id, layers:[...], sae_mode:"train_new|project_base", metrics:["drift","risk_corr","hallucination","attention_health","tuned_lens"] }
  → { alignment_audit_id }
GET /v1/alignment/audits/{alignment_audit_id}
  → { score, drift_heatmap_uri, risky_features[], examples[], metrics:{KL, CKA, probe_drift, outlier_pct, disagreement_pct} }
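The exact formulas behind these metrics are not specified. As a sketch under that caveat, three simple quantities are shown: per-feature drift as the absolute change in mean SAE activation over the probe set (assuming both models are encoded with the same SAE, as in project_base mode), KL divergence between two next-token distributions, and disagreement as the percentage of probes where the models' answers differ. CKA and the other metrics are not covered.

```python
import math

def feature_drift(base_acts, ft_acts):
    """Per-feature |mean activation (fine-tuned) - mean activation (base)|.
    Both inputs: per-token feature-activation vectors from the same SAE."""
    n_feat = len(base_acts[0])

    def means(acts):
        return [sum(row[i] for row in acts) / len(acts) for i in range(n_feat)]

    return [abs(f - b) for b, f in zip(means(base_acts), means(ft_acts))]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def disagreement_pct(base_answers, ft_answers):
    """Percentage of probes where the two models give different answers."""
    diff = sum(a != b for a, b in zip(base_answers, ft_answers))
    return 100.0 * diff / len(base_answers)
```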
GET /v1/alignment/certificates/{alignment_audit_id}?format=pdf|json → signed certificate/report
5. Red-Team & Harmful Feature Removal
POST /v1/redteam/runs
  body: { model_id, probe_set_id?, strategies:["prompt_gen","jailbreak","bias","pii","misinfo"], rounds: number }
  → { redteam_id }
GET /v1/redteam/runs/{redteam_id} → findings, prompts that broke guardrails, category scores
POST /v1/redteam/feature-steer
  body: { model_id, sae_id, edits:[{feature_id, scale}], attack_prompts[] }
  → harm rate before/after, sample outputs
POST /v1/mitigations/apply
  body: { model_id, type:"steer|prune|router|finetune_penalty", params:{...} }
  → { mitigation_id }
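At the feature level, pruning and clamping can be pictured as edits to an SAE feature vector before it is decoded back into the model. A minimal sketch of that idea plus the harm-rate comparison reported by feature-steer runs; the function names and the clamp semantics (cap at max_value) are assumptions, and the service's actual steer/prune implementations are not specified.

```python
def mitigate(features, harmful_ids, mode="clamp", max_value=0.0):
    """Return a copy of an SAE feature vector with harmful features neutralized.
    mode="prune": zero them out; mode="clamp": cap them at max_value."""
    out = list(features)
    for i in harmful_ids:
        if mode == "prune":
            out[i] = 0.0
        elif mode == "clamp":
            out[i] = min(out[i], max_value)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out

def harm_rate(flags):
    """Share of outputs flagged unsafe (flags: list of bools)."""
    return sum(flags) / len(flags)

# Hypothetical before/after flags from a batch of attack prompts.
before = harm_rate([True, True, False, True])
after = harm_rate([False, True, False, False])
```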
GET /v1/mitigations/{mitigation_id} → status, diff metrics
GET /v1/failure-modes?model_id=... → catalog (mode, triggers, linked features, severity)
Standard responses & jobs
- All long-running operations return { job_id }; poll GET /v1/jobs/{job_id} for status.
- Every list endpoint supports ?page&limit&sort&filter=...
- Errors: { error:{ code, message, details? } }
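A minimal sketch of these three conventions in plain Python. It assumes job status values "queued" and "running" mean the job is still in progress, list responses carry their records under an "items" key, and pages are numbered from 1; none of this is stated in the spec. get_json is a hypothetical caller-supplied function that performs an authenticated GET and returns the parsed JSON body, which keeps the sketch testable offline.

```python
import time

class ApiError(Exception):
    """Raised for responses using the { error: { code, message, details? } } envelope."""
    def __init__(self, code, message, details=None):
        super().__init__(f"{code}: {message}")
        self.code, self.message, self.details = code, message, details

def check(body):
    """Return body unchanged, or raise ApiError if it carries an error envelope."""
    err = body.get("error") if isinstance(body, dict) else None
    if err:
        raise ApiError(err["code"], err["message"], err.get("details"))
    return body

def poll_job(get_json, job_id, interval=1.0, max_interval=30.0, timeout=600.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll GET /v1/jobs/{job_id} with exponential backoff until it stops running."""
    deadline = clock() + timeout
    while True:
        job = check(get_json(f"/v1/jobs/{job_id}"))
        if job.get("status") not in ("queued", "running"):  # assumed in-progress states
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout} s")
        sleep(interval)
        interval = min(interval * 2, max_interval)

def iter_pages(get_json, path, limit=100):
    """Yield every item from a list endpoint, one ?page&limit request at a time."""
    sep = "&" if "?" in path else "?"
    page = 1  # assumption: pages are 1-based
    while True:
        body = check(get_json(f"{path}{sep}page={page}&limit={limit}"))
        items = body.get("items", [])  # assumption: records live under "items"
        yield from items
        if len(items) < limit:  # a short page means there is nothing left
            return
        page += 1
```

Injecting sleep and clock lets the backoff be tested without waiting; in production the defaults use the real clock.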