VibeOps · audit · tldraw/tldraw · 2026-05-31

We ran the multi-agent miner + verifier against 12 months of tldraw PRs and produced 8 verified findings a senior engineer would actually want.

Each finding ships with a file path, an evidence quote from the actual code at the PR's parent commit, a one-line fix, and a citation chain to a specific past PR or detected convention. No LLM commentary. No noise. The verifier reads every cited file before any comment leaves the pipeline.

PRs analyzed: 1,638
1,079 train · 270 test · 289 bot/docs excluded

AUTO-route precision: 75.0%
When the pipeline says auto-route, it's right 3 of 4 times. +16 pts over deterministic baseline.

Verified findings: 8
From 108 hypotheses. 7.4% verifier pass rate - only the survivors ship.

Verifier pass rate: 7.4%
Only findings that survive a code-grounded re-read make it to the PR. The rest are filtered out before any comment is posted.

Median review latency: ~5 min
Per PR, end-to-end. Measured across the last 7 days of real customer reviews. Most PRs return comments before the human reviewer opens the page.

Judgment → Policy → Autonomy
These judgments compile into 26 deterministic policies. 72% of past PRs would auto-merge by the captured rules.
See the policy wall
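
The headline numbers can be rechecked by hand. The short sketch below redoes that arithmetic using only figures quoted on this page (the PR counts, the hypothesis counts, and the AUTO precision of pipelines A and D from the replay table); the variable names are ours, chosen for illustration.

```python
# Figures copied from the summary cards and the replay table on this page.
prs_total = 1638
train, test, excluded = 1079, 270, 289
hypotheses, verified = 108, 8
precision_with_verifier = 75.0  # pipeline D, AUTO precision (%)
precision_baseline = 58.6       # pipeline A, AUTO precision (%)

# The three buckets should account for every analyzed PR.
assert train + test + excluded == prs_total

# Share of hypotheses that survive the verifier, in percent.
pass_rate = round(100 * verified / hypotheses, 1)

# Precision lift over the structural baseline, in percentage points.
lift_pts = round(precision_with_verifier - precision_baseline, 1)

# Share of the non-excluded PRs that land in the training split, in percent.
train_share = round(100 * train / (train + test), 1)

print(pass_rate, lift_pts, train_share)  # prints: 7.4 16.4 80.0
```

The "+16 pts" on the precision card is the 16.4-point difference rounded down, and the 1,079 / 270 split matches the 80/20 methodology.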
How it works

Six agents, one verifier, citation-bound output

Every comment that ships is a hypothesis that survived inspection. The verifier reads the actual cited file at the PR's parent commit and dismisses hypotheses that pattern-match training data but don't apply to this code.

1. Intent. Reads the diff and infers what is changing, its scope, and the surface area at risk.
2. Historical Context. Pulls the most relevant past PRs and incidents from your own repo to ground the review.
3. Convention Miner. Detects implicit patterns your team already follows on similar files.
4. Concern Patterns. Surfaces the recurring concerns reviewers actually raise on this code path.
5. Reasoner. Combines structural signals and context into routing decisions and candidate findings.
6. Verifier (closes the loop). Opens every cited file at the PR's parent commit and confirms each finding before it ships. The noise filter.
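
The verifier step can be pictured as a filter over candidate findings. The sketch below is a simplified illustration, not the production pipeline: the `Hypothesis` type, the `holds` predicate (a stand-in for the code-grounded re-read), and the toy repo contents are all hypothetical. The only external call is `git show <commit>:<path>`, which prints a file as it existed at a given commit.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# (commit, path) -> file content, or None when the file does not exist there.
ReadFile = Callable[[str, str], Optional[str]]


@dataclass(frozen=True)
class Hypothesis:
    path: str                     # file the concern points at
    claim: str                    # the candidate review comment
    citation: str                 # past PR or convention it came from
    holds: Callable[[str], bool]  # hypothetical stand-in for the re-read


def read_file_from_git(commit: str, path: str) -> Optional[str]:
    """Read one file as it existed at `commit`, using `git show`."""
    result = subprocess.run(
        ["git", "show", f"{commit}:{path}"],
        capture_output=True, text=True,
    )
    return result.stdout if result.returncode == 0 else None


def verify(hypotheses: Iterable[Hypothesis], parent_commit: str,
           read_file: ReadFile = read_file_from_git) -> list:
    """Keep only hypotheses that carry a citation and hold against the file."""
    survivors = []
    for h in hypotheses:
        if not h.citation:
            continue  # citation-bound: no provenance, no comment
        content = read_file(parent_commit, h.path)
        if content is None:
            continue  # cited file is absent at the parent commit
        if h.holds(content):
            survivors.append(h)
    return survivors


# Toy data, invented for illustration only.
fake_repo = {("abc123", "src/manager.test.ts"): "vi.mock('../../Editor')\n"}
candidates = [
    Hypothesis("src/manager.test.ts", "No test builds a real Editor",
               "convention: integration tests for managers",
               lambda src: "createTestEditor" not in src),
    Hypothesis("src/manager.test.ts", "Mocking is missing entirely",
               "convention: mock the Editor",
               lambda src: "vi.mock" not in src),
    Hypothesis("src/gone.ts", "Unused export", "hypothetical PR",
               lambda src: True),
]
kept = verify(candidates, "abc123", lambda c, p: fake_repo.get((c, p)))
print([h.claim for h in kept])  # prints: ['No test builds a real Editor']
```

Passing the file reader in as a parameter keeps the filter testable without a git checkout. In a real run the default `read_file_from_git` would be pointed at the PR's parent commit.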
Replay test results

Four pipelines, same 12 months of PRs

Same train/test temporal split, same ground truth (clean / followup-needed / reverted), same metrics. The deterministic baseline is the floor. Adding LLM mining + a verifier is the moat.

| Pipeline | AUTO precision | AUTO % | Followup recall | What it adds |
| --- | --- | --- | --- | --- |
| A · structural baseline | 58.6% | 11% | 88% | File-graph and churn signals only. The floor any rule-based tool reaches. |
| B · + intent | 71.4% | 12% | 93% | Adds semantic intent and historical context awareness. |
| C · full multi-agent | 69.2% | 33% | 82% | Adds convention and recurring-concern mining for richer context. |
| D · with verifier | 75.0% | 13% | 88% | Reads each cited file before any comment is posted. Only verified findings ship. |

Methodology: PRs are sorted by date. The earliest 80% are used to learn the team's patterns, and the most recent 20% are held out for evaluation. Ground truth comes from each PR's actual outcome after merge (clean ship vs. needed follow-up vs. reverted). The same harness and metrics run on your own repo.
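
A minimal sketch of this replay setup follows. The page does not spell out the metric definitions, so the ones here are assumptions: AUTO precision is taken as the share of auto-routed test PRs that shipped clean, AUTO % as the share of test PRs that were auto-routed, and followup recall as the share of non-clean test PRs that were not auto-routed. The `PR` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PR:
    number: int
    day: date          # the date used for ordering (assumed field)
    outcome: str       # "clean", "followup", or "reverted"
    routed_auto: bool  # did the pipeline auto-route this PR?


def temporal_split(prs, train_fraction=0.8):
    """Sort by date; the earliest slice trains, the latest slice tests."""
    ordered = sorted(prs, key=lambda pr: pr.day)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def auto_precision(test_prs) -> Optional[float]:
    """Assumed definition: clean PRs among those routed AUTO."""
    auto = [pr for pr in test_prs if pr.routed_auto]
    if not auto:
        return None
    return sum(pr.outcome == "clean" for pr in auto) / len(auto)


def auto_share(test_prs) -> Optional[float]:
    """Assumed definition: fraction of test PRs routed AUTO."""
    if not test_prs:
        return None
    return sum(pr.routed_auto for pr in test_prs) / len(test_prs)


def followup_recall(test_prs) -> Optional[float]:
    """Assumed definition: non-clean PRs that were kept out of AUTO."""
    risky = [pr for pr in test_prs if pr.outcome != "clean"]
    if not risky:
        return None
    return sum(not pr.routed_auto for pr in risky) / len(risky)


# Toy data, invented for illustration only.
toy = [
    PR(n, date(2025, 1, n), "clean" if n % 3 else "followup", n % 2 == 0)
    for n in range(1, 11)
]
train_prs, test_prs = temporal_split(toy)
print(len(train_prs), len(test_prs))
print(auto_precision(test_prs), followup_recall(test_prs))
```

Because the split is by date rather than random, every held-out PR is later than every PR the patterns were learned from, which is what makes the replay comparable to live use.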

The findings

Eight verified comments, every one auditable

These are the actual comments produced by the pipeline. Each carries the file path the verifier inspected, the evidence quote from the code at the PR's parent commit, the suggested fix, and the chain of past PRs/conventions the concern came from.

Finding 01 · Verified
PR #8322
Missing snapshot tests for new shape types
on apps/examples/e2e/tests/export-mermaid-snapshots.spec.ts
Verifier evidence
The diff adds custom-shape-mermaids/ with FlowchartShapeUtil and mapNodeToRenderSpec, but export-mermaid-snapshots.spec.ts only receives a cosmetic import-order change - no snapshot test case for the new custom shape variant is added anywhere in the diff.
Suggested fix
Add at least one snapshot test case in export-mermaid-snapshots.spec.ts (or a new spec file) that renders the custom-shape-mermaids example using FlowchartShapeUtil and asserts a screenshot snapshot.
Citation chain
Org convention from PRs #8301 and #8285 (snapshot tests for new diagram types).
Finding 02 · Verified
PR #8421
Test coverage gap for Editor integration
on packages/editor/src/lib/editor/managers/FontManager/FontManager.test.ts:1
Verifier evidence
FontManager.test.ts only uses vi.mock('../../Editor') with a hand-rolled stub; no test constructs a real Editor, registers a custom font, changes themes, or calls dispose to verify cleanup.
Suggested fix
Add an integration test (using createTestEditor or similar) that registers a custom TLThemeFont, calls updateTheme, and confirms the font face is loaded and removed on editor.dispose().
Citation chain
Past PRs in /editor/managers/ that added Editor-integration tests for parallel managers.
Finding 03 · Verified
PR #8421
Backward compatibility / missing optional contract
on packages/editor/src/lib/editor/Editor.ts
Verifier evidence
No test file is added in the diff verifying that a theme object without a `fonts` key renders correctly, and no backward-compatibility documentation or JSDoc comment for the `fonts` field being optional is visible anywhere in the diff.
Suggested fix
Add a unit/integration test that passes a theme without a `fonts` key to `<Tldraw>` and asserts correct rendering; also add a JSDoc comment on the `fonts` property in `TLTheme` marking it optional with its default.
Citation chain
Public API contract for TLTheme - prior optional-field PRs (#8225) followed this pattern.
Finding 04 · Verified
PR #8224
Missing tests for new telemetry sampling logic
on apps/dotcom/client/src/hooks/usePerformanceTracking.ts
Verifier evidence
The PR modifies only `usePerformanceTracking.ts`, adding memory sampling with setInterval, visibilitychange handler, and page-change subscriptions - no test file (*.test.ts / *.spec.ts) is present anywhere in the diff or supporting files.
Suggested fix
Add `usePerformanceTracking.test.ts` using `jest.useFakeTimers()` and a mock `performance.memory` object to cover: interval sampling, visibility-hidden flush, graceful no-op when `performance.memory` is absent, and full teardown (clearInterval + removeEventListener).
Citation chain
Convention: hooks introducing intervals/subscriptions are tested under fake-timers (PR #8112, #8186).
Finding 05 · Verified
PR #8780
Security hardening - credentials persistence
on .github/workflows/npm-publish.yml
Verifier evidence
None of the three source workflows (publish-branch.yml, publish-canary.yml, publish-manual.yml) contain 'persist-credentials: false' on their actions/checkout steps, and the new npm-publish.yml is not shown in the diff to confirm it was added.
Suggested fix
Add 'persist-credentials: false' under 'with:' for every actions/checkout step in .github/workflows/npm-publish.yml.
Citation chain
Convention from #8512 (security hardening on release workflows). Linked OpenSSF guidance.
Finding 06 · Verified
PR #8954
Revert root cause undocumented
on packages/tldraw/src/lib/ui/hooks/useTranslation/useTranslation.tsx:47
Verifier evidence
No comment or documentation in the diff explains why the PR was previously reverted or what regression was fixed; the only observable changes are @internal→@public visibility and console.warn→warnOnce.
Suggested fix
Add a PR description section (or inline code comment in TldrawUiTranslationProvider) documenting the original failure mode and how the warnOnce/API-visibility change addresses it.
Citation chain
Prior revert chain on this file (#8909, #8917) - convention is post-revert PRs document the original failure.
Finding 07 · Verified
PR #8681
Cochange blindspot - CHANGELOGs not updated
on internal/scripts/lib/publishing.ts:164
Verifier evidence
The diff modifies only internal/scripts/lib/publishing.ts; no CHANGELOG.md files in packages/ are touched, despite the fix explicitly addressing broken tarballs shipping 'workspace:*' literals to consumers.
Suggested fix
Add a patch-level CHANGELOG entry to affected packages (e.g., store, sync-core, utils) noting that prior releases may have contained literal 'workspace:*' dependency strings and that consumers should upgrade to the newly published version.
Citation chain
Convention: behavior-affecting publish fixes ship with CHANGELOG entry on every affected package.
Finding 08 · Verified
PR #8681
Correctness - workspace:* rewrite not verified in CI
on internal/scripts/lib/publishing.ts:164
Verifier evidence
No test, dry-run step, or CI verification step exists in the diff or publishing.ts to confirm `workspace:*` specifiers are rewritten in the tarball before publishing.
Suggested fix
Add a pre-publish check (e.g., `yarn pack` + `tar -xOf *.tgz package/package.json | grep workspace`) or a dry-run step in CI/scripts to assert no `workspace:` strings remain in packed package.json.
Citation chain
SUPER-burned-this-before pattern: any publish-time string rewrite needs a CI guard. See #8255 retro.
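
Finding 08's suggested guard (pack, then look for leftover `workspace:` strings in `package/package.json`) can also be written as a small script. The sketch below is one possible shape, not tldraw's actual tooling: the function names and the demo manifests are hypothetical, and it assumes the packed tarball is gzip-compressed with the manifest at `package/package.json`, as in the shell command quoted in the finding.

```python
import io
import json
import tarfile

# Dependency sections of package.json that can carry version specifiers.
DEP_FIELDS = (
    "dependencies",
    "devDependencies",
    "peerDependencies",
    "optionalDependencies",
)


def workspace_specifiers(tarball: bytes) -> list:
    """Return 'field.name' for every dependency still using 'workspace:'."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        manifest = json.load(tar.extractfile("package/package.json"))
    leftovers = []
    for field in DEP_FIELDS:
        for name, spec in manifest.get(field, {}).items():
            if str(spec).startswith("workspace:"):
                leftovers.append(f"{field}.{name}")
    return leftovers


def make_tarball(manifest: dict) -> bytes:
    """Build an in-memory tarball for the demo (stands in for a real pack)."""
    data = json.dumps(manifest).encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo("package/package.json")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Demo manifests, invented for illustration only.
clean = make_tarball({"name": "example-lib",
                      "dependencies": {"example-dep": "1.2.3"}})
dirty = make_tarball({"name": "example-lib",
                      "dependencies": {"example-dep": "workspace:*"}})

print(workspace_specifiers(clean))  # prints: []
print(workspace_specifiers(dirty))  # prints: ['dependencies.example-dep']
```

In CI, a non-empty result would fail the job before publishing. Parsing the manifest rather than grepping the raw file avoids false positives from the word "workspace" appearing in other fields such as `description`.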
How we tested

Three guarantees that hold on any repo

The replay above runs on a public open-source repo, so you can verify it yourself. The same harness runs on your repo with the same guarantees.

  • Time-correct. Every test PR is evaluated using only the data the team had before that PR opened. No leakage, no peeking ahead.
  • Code-grounded. The verifier opens every cited file at the PR's parent commit and confirms the concern is real before any comment ships. Pattern-matching alone is never enough.
  • Citation-bound. Every finding names a specific past PR or detected convention. You can click through and check the source. No findings without provenance.
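
The time-correct and citation-bound guarantees combine into one check: a finding may only cite PRs the team had already seen when the PR under review was opened. The sketch below illustrates that rule under stated assumptions. The `HistoricalPR` type, its timestamps, and the choice of "merged before the target opened" as the visibility cutoff are ours, not taken from the harness.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HistoricalPR:
    number: int
    opened_at: datetime
    merged_at: datetime


def visible_history(target: HistoricalPR, all_prs) -> list:
    """PRs whose outcome was already known when `target` was opened."""
    return [
        pr for pr in all_prs
        if pr.number != target.number and pr.merged_at < target.opened_at
    ]


def citations_are_time_correct(cited_numbers, target, all_prs) -> bool:
    """True when there is a citation and every cited PR is in the history."""
    allowed = {pr.number for pr in visible_history(target, all_prs)}
    return bool(cited_numbers) and set(cited_numbers) <= allowed


# Toy timeline, invented for illustration only.
prs = [
    HistoricalPR(1, datetime(2025, 1, 1), datetime(2025, 1, 3)),
    HistoricalPR(2, datetime(2025, 1, 5), datetime(2025, 1, 9)),
    HistoricalPR(3, datetime(2025, 1, 7), datetime(2025, 1, 8)),
]
target = prs[2]  # opened on Jan 7, before PR 2 merged

print(citations_are_time_correct([1], target, prs))     # prints: True
print(citations_are_time_correct([1, 2], target, prs))  # prints: False
print(citations_are_time_correct([], target, prs))      # prints: False
```

An empty citation list fails the check on purpose, mirroring the "no findings without provenance" rule.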
Want this on your repo?

Drop our GitHub bot on one repo for a 30-day pilot. The Day-1 audit of your last 12 months of PRs lands within the first week.

We deploy the multi-agent miner + verifier inside your VPC. Reads from GitHub, your incident system, and your team's Slack #incidents channel if connected. Code never leaves your network. Free pilot - if our verified comments don't change a single PR cycle in 30 days, you walk and we take the learnings.