Senior Judgment Compiler · the moat
Senior judgment → deterministic trust system.
Type a rule the way a staff engineer would say it in standup. VibeOps compiles it into a trust policy, a runtime check, an eval case, a model route, an exception rule, a certificate section, and a historical-replay benchmark. An internal team can write a Claude skill; they cannot easily write the system around it.
Senior rule (natural language)
Walk through the canned artifacts. The live compiler runs inside your VPC after onboarding.
Policy
Eval
Check
Route
Cert
Replay
What you get back
One sentence in. Six artifacts out. Every artifact is versioned, audited, and re-runs against your last 90 days of history so you see — before you ship the rule — what would have changed.
Trust policy · Versioned. Auditable.
Eval case · Permanent in CI.
GitHub check · Pre-merge gate.
Model route · Per-tier.
Cert section · Surfaces in every PR.
Replay benchmark · Backtested on 90d.
Compiled artifacts
Each tab shows one artifact the rule above generates. All six are produced together and stay in sync.
Trust policy
policy IntegrationRetryChange:
  applies_when:
    workflow = integration
    files_touch = api_client | connector
    diff_contains = retry | timeout | backoff
  require:
    idempotency_test = true
    timeout_behavior_evidence = true
    contract_test = true
  route:
    basic_policy_checks → slm
    integration_reasoning → frontier_model
    ambiguity → human_owner (@connectors-team)
  decision:
    all_evidence_present → candidate_for_auto_approval
    else → escalate
Eval case
eval IntegrationRetryChange.v1:
  goldset: 42 historical PRs (87 retry-touching)
  predicates: idempotency_evidence_detector,
              timeout_behavior_detector,
              contract_test_presence
  scoring: macro_f1
  baseline: raw_claude → recall 0.74, fp 0.31
  target: claude+pack → recall 0.88, fp 0.18
  failure_modes:
    - "retry without idempotency proof"
    - "timeout unspecified on 5xx storm"
    - "contract test absent on payload schema"
GitHub check
vibeops/trust · integration-retry-change · required
✓ idempotency_test detected
✗ timeout_behavior_evidence missing
✗ contract_test missing
· awaiting @connectors-team approval
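The check result above can be read as the policy applied to a single pull request. The sketch below is one hedged way to express that in Python. It is not VibeOps code: the `PullRequest` shape, the function names, and the `not_applicable` result are hypothetical, and it assumes the three `applies_when` conditions are combined with AND and matched by simple substring search.

```python
from dataclasses import dataclass, field

# Values copied from the applies_when and require clauses of the policy.
DIFF_KEYWORDS = ("retry", "timeout", "backoff")
PATH_HINTS = ("api_client", "connector")
REQUIRED_EVIDENCE = ("idempotency_test", "timeout_behavior_evidence", "contract_test")


@dataclass
class PullRequest:  # hypothetical shape, not a VibeOps type
    workflow: str
    files: list[str]
    diff: str
    evidence: set[str] = field(default_factory=set)


def applies(pr: PullRequest) -> bool:
    """True when all three applies_when conditions hold (AND is an assumption)."""
    touches = any(hint in path for path in pr.files for hint in PATH_HINTS)
    mentions = any(word in pr.diff.lower() for word in DIFF_KEYWORDS)
    return pr.workflow == "integration" and touches and mentions


def evaluate(pr: PullRequest) -> tuple[str, list[str]]:
    """Return (decision, missing evidence) for one pull request."""
    if not applies(pr):
        return "not_applicable", []
    missing = [name for name in REQUIRED_EVIDENCE if name not in pr.evidence]
    decision = "candidate_for_auto_approval" if not missing else "escalate"
    return decision, missing


# Mirrors the check output shown above: one proof present, two missing.
pr = PullRequest(
    workflow="integration",
    files=["src/connector/http.py"],
    diff="+    retry(max_attempts=3)",
    evidence={"idempotency_test"},
)
print(evaluate(pr))
```

With only `idempotency_test` present, the sketch escalates and lists the two missing proofs, matching the two ✗ lines in the check.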
Model route
1. SLM (Kimi-32B)
   Surface checks: naming, ownership, files-touched policy. $0.003 / PR.
2. Frontier (Claude Opus 4.7)
   Contract reasoning over retry semantics + idempotency contract. $0.42 / PR.
3. Human owner
   Only when contract proof remains ambiguous. ~12% of triggered PRs.
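The three tiers amount to a lookup from task type to reviewer. Here is a minimal Python sketch of that lookup, using the task labels from the policy's route block and the per-PR costs quoted above. The names `Tier`, `route`, and `model_cost` are hypothetical, and sending unknown tasks to a human is an assumption rather than documented behavior.

```python
from enum import Enum


class Tier(Enum):
    SLM = "slm"
    FRONTIER = "frontier_model"
    HUMAN = "human_owner"


# Task labels taken from the route block of the trust policy.
ROUTES = {
    "basic_policy_checks": Tier.SLM,
    "integration_reasoning": Tier.FRONTIER,
    "ambiguity": Tier.HUMAN,
}

# Per-PR model costs quoted above; no cost is quoted for the human tier.
COST_USD = {Tier.SLM: 0.003, Tier.FRONTIER: 0.42}


def route(task: str) -> Tier:
    """Look up the tier for a task; unknown tasks go to a human (assumption)."""
    return ROUTES.get(task, Tier.HUMAN)


def model_cost(tasks: list[str]) -> float:
    """Sum the quoted model cost over the distinct tiers a PR passes through."""
    tiers = {route(task) for task in tasks}
    return sum(COST_USD.get(tier, 0.0) for tier in tiers)


print(route("integration_reasoning"))
print(round(model_cost(["basic_policy_checks", "integration_reasoning"]), 3))
```

Keeping the routes in a plain table means a tier change is a data edit, not a code change.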
Trust Certificate section
Required proof · integration retry change
- Idempotency test
- Timeout behavior evidence
- Contract test
Auto-renders in every certificate where this policy applies.
Replay benchmark · 90 days
Before rule: 74% recall · 31% fp
After rule: 88% recall · 18% fp
Net catches: +14 contract gaps surfaced
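For readers who want to see how figures like these are derived, the sketch below computes recall and a false-positive rate from labeled replay results. It assumes "fp" means false positives divided by all PRs that had no real gap; the page does not define the term. The function name and the eight-PR sample are made up for illustration and are not VibeOps data.

```python
def recall_and_fp_rate(labels: list[bool], flags: list[bool]) -> tuple[float, float]:
    """Compute (recall, false-positive rate) for a replayed set of PRs.

    labels[i] is True when PR i really had a contract gap;
    flags[i] is True when the rule flagged PR i.
    """
    if len(labels) != len(flags):
        raise ValueError("labels and flags must have the same length")
    pairs = list(zip(labels, flags))
    tp = sum(1 for label, flag in pairs if label and flag)
    fn = sum(1 for label, flag in pairs if label and not flag)
    fp = sum(1 for label, flag in pairs if not label and flag)
    tn = sum(1 for label, flag in pairs if not label and not flag)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fp_rate


# Toy replay: four PRs with a real gap, four without.
labels = [True, True, True, True, False, False, False, False]
flags = [True, True, True, False, True, False, False, False]
print(recall_and_fp_rate(labels, flags))  # → (0.75, 0.25)
```

Running the same function over the history before and after a rule is enabled would give the two rows of a before/after comparison.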
Failure Atlas · what compiles into Trust Packs
VibeOps has accumulated patterns of how agentic engineering fails. Each pattern feeds the relevant Trust Pack's evals. Private customer data stays private; generalized patterns improve every deployment.
Missing backward compatibility → Contract harness
Fake test coverage (AI-written) → Mutation eval
Missing rollback path → Migration Pack rollback gate
Flag boundary leak → Boundary classifier
Duplicate abstraction → Repo graph + semantic match
Permission boundary ambiguity → AppSec policy harness
Trust Pack Library · the network effect
Each pack ships pre-built policies, evals, model routes, and certificate sections — built from 420+ replayed PRs across customer deployments. Activate one for the workflow that matters most.
Integration Change Pack · 420 PRs · 36 evals · available
Frontend Critical Flow Pack · 312 PRs · 24 evals · available
Migration Pack · 188 PRs · 18 evals · available
Feature Flag Rollout Pack · 154 PRs · 14 evals · available
Auth/RBAC Pack · 142 PRs · 22 evals · available
Core Infra Pack · 96 PRs · 16 evals · preview