Senior Judgment Compiler · the moat

Senior judgment → deterministic trust system.

Type a rule the way a staff engineer would say it in standup. VibeOps compiles it into a trust policy, a runtime check, an eval case, a model route, an exception rule, a certificate section, and a historical-replay benchmark. An internal team can write a Claude skill; they cannot easily write the system around it.

Senior rule (natural language)

Walk through the canned artifacts. The live compiler runs inside your VPC after onboarding.

Policy · Eval · Check · Route · Cert · Replay

What you get back

One sentence in. Six artifacts out. Every artifact is versioned, audited, and re-runs against your last 90 days of history so you see — before you ship the rule — what would have changed.

Trust policy · Versioned. Auditable.

Eval case · Permanent in CI.

GitHub check · Pre-merge gate.

Model route · Per-tier.

Cert section · Surfaces in every PR.

Replay benchmark · Backtested on 90d.

Compiled artifacts

Each tab shows one artifact the rule above generates. All six are produced together and stay in sync.

Trust policy

Compiled

policy IntegrationRetryChange:
  applies_when:
    workflow      = integration
    files_touch   = api_client | connector
    diff_contains = retry | timeout | backoff
  require:
    idempotency_test           = true
    timeout_behavior_evidence  = true
    contract_test              = true
  route:
    basic_policy_checks  → slm
    integration_reasoning → frontier_model
    ambiguity             → human_owner (@connectors-team)
  decision:
    all_evidence_present → candidate_for_auto_approval
    else                 → escalate
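
To make the policy's semantics concrete, here is a minimal Python sketch of how the `applies_when`, `require`, and `decision` blocks could be evaluated against a pull request. It is an illustration, not the VibeOps runtime: the `PullRequest` class, the substring matching, and the reading of `|` as "any of" with the three `applies_when` lines ANDed together are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory form of the IntegrationRetryChange policy above.
TRIGGER_PATHS = ("api_client", "connector")        # files_touch
TRIGGER_TERMS = ("retry", "timeout", "backoff")    # diff_contains
REQUIRED_EVIDENCE = (
    "idempotency_test",
    "timeout_behavior_evidence",
    "contract_test",
)


@dataclass
class PullRequest:
    """Assumed shape of a PR record; not a VibeOps type."""
    workflow: str
    files: list[str]
    diff: str
    evidence: set[str] = field(default_factory=set)


def applies(pr: PullRequest) -> bool:
    """True when all three applies_when conditions hold (assumed AND)."""
    touches = any(path in f for f in pr.files for path in TRIGGER_PATHS)
    mentions = any(term in pr.diff.lower() for term in TRIGGER_TERMS)
    return pr.workflow == "integration" and touches and mentions


def decide(pr: PullRequest) -> str:
    """Mirror the decision block: all evidence present, else escalate."""
    if not applies(pr):
        return "not_applicable"
    missing = [e for e in REQUIRED_EVIDENCE if e not in pr.evidence]
    return "escalate" if missing else "candidate_for_auto_approval"
```

A PR that touches `api_client`, mentions `retry` in its diff, and carries only an idempotency test would come back as `escalate` under these assumptions.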

Eval case

Compiled

eval IntegrationRetryChange.v1:
  goldset:    42 historical PRs (87 retry-touching)
  predicates: idempotency_evidence_detector,
              timeout_behavior_detector,
              contract_test_presence
  scoring:    macro_f1
  baseline:   raw_claude  → recall 0.74, fp 0.31
  target:     claude+pack → recall 0.88, fp 0.18
  failure_modes:
    - "retry without idempotency proof"
    - "timeout unspecified on 5xx storm"
    - "contract test absent on payload schema"

GitHub check

Compiled

vibeops/trust · integration-retry-change · required

✓ idempotency_test detected

✗ timeout_behavior_evidence missing

✗ contract_test missing

· awaiting @connectors-team approval
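
A check summary like the one above can be derived mechanically from the policy's `require` block. The following Python sketch shows one way to do it; `render_check` is a hypothetical helper, it does not call the GitHub API, and the exact wording VibeOps emits may differ.

```python
REQUIRED = ["idempotency_test", "timeout_behavior_evidence", "contract_test"]


def render_check(
    evidence: dict[str, bool], owner: str = "@connectors-team"
) -> tuple[bool, list[str]]:
    """Return (passed, summary lines) for the required-evidence gate."""
    lines = []
    for name in REQUIRED:
        if evidence.get(name, False):
            lines.append(f"✓ {name} detected")
        else:
            lines.append(f"✗ {name} missing")
    passed = all(evidence.get(name, False) for name in REQUIRED)
    if not passed:
        # Assumed behavior: an incomplete proof set waits on the human owner.
        lines.append(f"· awaiting {owner} approval")
    return passed, lines
```

Calling `render_check({"idempotency_test": True})` yields a failing gate with one detected item, two missing items, and the pending-approval line.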

Model route

Compiled

SLM (Kimi-32B)

Surface checks: naming, ownership, files-touched policy. $0.003 / PR.

Frontier (Claude Opus 4.7)

Contract reasoning over retry semantics + idempotency contract. $0.42 / PR.

Human owner

Only when contract proof remains ambiguous. ~12% of triggered PRs.
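
The three tiers can be read as a small routing table plus a cost model. The Python sketch below uses the per-PR prices and the ~12% escalation rate quoted above; the assumption that every triggered PR runs both the SLM and the frontier tier is mine, and no human-review cost is modeled because none is given.

```python
# Routing table taken from the policy's route block.
ROUTES = {
    "basic_policy_checks": "slm",
    "integration_reasoning": "frontier_model",
    "ambiguity": "human_owner",
}

# Per-PR model prices quoted above, in USD.
COST_PER_PR = {"slm": 0.003, "frontier_model": 0.42}
HUMAN_ESCALATION_RATE = 0.12  # "~12% of triggered PRs"


def route(task_kind: str) -> str:
    """Look up which tier handles a given kind of work."""
    return ROUTES[task_kind]


def estimate(triggered_prs: int) -> dict[str, float]:
    """Estimate spend, assuming both model tiers run on every triggered PR."""
    model_cost = triggered_prs * sum(COST_PER_PR.values())
    return {
        "model_cost_usd": round(model_cost, 2),
        "expected_human_reviews": round(triggered_prs * HUMAN_ESCALATION_RATE),
    }
```

Under these assumptions, 100 triggered PRs cost about $42.30 in model calls and send roughly 12 PRs to the human owner.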

Trust Certificate section

Compiled

Required proof · integration retry change

· Idempotency test
· Timeout behavior evidence
· Contract test

Auto-renders in every certificate where this policy applies.

Replay benchmark · 90 days

Compiled

Before rule · 74% recall · 31% fp

After rule · 88% recall · 18% fp

Net catches · +14 contract gaps surfaced
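
A replay of this kind amounts to scoring the same historical PRs twice, once without the rule and once with it. The Python sketch below shows that comparison under stated assumptions: `ReplayedPR` is a hypothetical record type, "fp" is read as the false-positive rate FP / (FP + TN), and "net catches" is read as the change in true positives. The source does not define either term precisely.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ReplayedPR:
    """Assumed record for one historical PR in the replay window."""
    has_gap: bool          # ground truth: did the PR have a contract gap?
    flagged_before: bool   # flagged without the rule
    flagged_after: bool    # flagged with the rule


def rates(
    records: list[ReplayedPR], flagged: Callable[[ReplayedPR], bool]
) -> tuple[float, float, int]:
    """Return (recall, false-positive rate, true positives)."""
    tp = sum(r.has_gap and flagged(r) for r in records)
    fn = sum(r.has_gap and not flagged(r) for r in records)
    fp = sum(not r.has_gap and flagged(r) for r in records)
    tn = sum(not r.has_gap and not flagged(r) for r in records)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return recall, fp_rate, tp


def replay(records: list[ReplayedPR]) -> dict:
    """Compare detection quality before and after the rule."""
    r0, f0, tp0 = rates(records, lambda r: r.flagged_before)
    r1, f1, tp1 = rates(records, lambda r: r.flagged_after)
    return {"before": (r0, f0), "after": (r1, f1), "net_catches": tp1 - tp0}
```

Running `replay` over the last 90 days of labeled PRs would produce the three figures shown in this panel, provided ground-truth labels exist for each PR.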

Failure Atlas · what compiles into Trust Packs

VibeOps has accumulated patterns of how agentic engineering fails. Each pattern feeds the relevant Trust Pack's evals. Private customer data stays private; generalized patterns improve every deployment.

Missing backward compatibility → Contract harness

Fake test coverage (AI-written) → Mutation eval

Missing rollback path → Migration Pack rollback gate

Flag boundary leak → Boundary classifier

Duplicate abstraction → Repo graph + semantic match

Permission boundary ambiguity → AppSec policy harness

Trust Pack Library · the network effect

Each pack ships pre-built policies, evals, model routes, and certificate sections — built from 420+ replayed PRs across customer deployments. Activate one for the workflow that matters most.

Integration Change Pack · 420 PRs · 36 evals · available

Frontend Critical Flow Pack · 312 PRs · 24 evals · available

Migration Pack · 188 PRs · 18 evals · available

Feature Flag Rollout Pack · 154 PRs · 14 evals · available

Auth/RBAC Pack · 142 PRs · 22 evals · available

Core Infra Pack · 96 PRs · 16 evals · preview