VibeOps · audit · tldraw/tldraw · 2026-05-31

We ran the multi-agent miner + verifier against 12 months of tldraw PRs and produced 8 verified findings a senior engineer would actually want.

Each finding ships with a file path, an evidence quote from the actual code at the PR's parent commit, a one-line fix, and a citation chain to a specific past PR or detected convention. No LLM commentary. No noise. The verifier reads every cited file before any comment leaves the pipeline.

PRs analyzed

1,638

1,079 train · 270 test · 289 bot/docs excluded

AUTO-route precision

75.0%

When the pipeline says auto-route, it's right 3 of 4 times. +16 pts over deterministic baseline.

Verified findings

8

From 108 hypotheses. 7.4% verifier pass rate: only the survivors ship.

Verifier pass rate

7.4%

Only findings that survive a code-grounded re-read make it to the PR. The rest are filtered out before any comment is posted.
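The pass rate follows directly from the two counts reported in this audit, 8 verified findings out of 108 hypotheses. A quick check of that arithmetic, using only those numbers:

```python
# Counts reported in this audit.
hypotheses = 108
verified = 8

pass_rate = verified / hypotheses
print(f"{pass_rate:.1%}")  # prints 7.4%
```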

Median review latency

~5 min

Per PR, end-to-end. Measured across the last 7 days of real customer reviews. Most PRs return comments before the human reviewer opens the page.

Judgment → Policy → Autonomy

These judgments compile into 26 deterministic policies. Under the captured rules, 72% of past PRs would auto-merge.

How it works

Six agents, one verifier, citation-bound output

Every comment that ships is a hypothesis that survived inspection. The verifier reads the actual file involved, at the PR's parent commit, and dismisses hypotheses that pattern-match training data but don't apply to this code.

1

Intent

Reads the diff and infers what is changing, its scope, and the surface area at risk.

2

Historical Context

Pulls the most relevant past PRs and incidents from your own repo to ground the review.

3

Convention Miner

Detects implicit patterns your team already follows on similar files.

4

Concern Patterns

Surfaces the recurring concerns reviewers actually raise on this code path.

5

Reasoner

Combines structural signals and context into routing decisions and candidate findings.

6

Verifier · CLOSES THE LOOP

Opens every cited file at the PR's parent commit and confirms each finding before it ships. The noise filter.
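As a rough illustration of the verifier's role as a filter, here is a minimal sketch. It is not the pipeline's implementation: the `Hypothesis` fields, the `verify` function, the file names, and the PR numbers are all hypothetical, and a literal substring match stands in for whatever code-grounded re-read the real verifier performs. It assumes the parent commit is available as a mapping from file path to file contents.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    title: str
    file_path: str       # file the concern is about
    evidence_quote: str  # text claimed to appear in that file
    citations: list = field(default_factory=list)  # past PRs or conventions


def verify(hypotheses, parent_snapshot):
    """Keep a hypothesis only if its file exists at the parent commit,
    its evidence quote is really in that file, and it has a citation."""
    shipped = []
    for h in hypotheses:
        source = parent_snapshot.get(h.file_path)
        if source is None:
            continue  # cited file does not exist at the parent commit
        if h.evidence_quote not in source:
            continue  # quote is not grounded in the actual code
        if not h.citations:
            continue  # no provenance, so it never ships
        shipped.append(h)
    return shipped


# Toy snapshot and candidates (all names and PR numbers are made up).
snapshot = {"src/timer.ts": "const id = setInterval(sample, 1000)\n"}
candidates = [
    Hypothesis("Interval never cleared", "src/timer.ts",
               "setInterval(sample, 1000)", ["#101"]),
    Hypothesis("Unbounded retry loop", "src/timer.ts", "while (true)", ["#102"]),
    Hypothesis("Missing null check", "src/gone.ts", "value.length", ["#103"]),
    Hypothesis("Style nit", "src/timer.ts", "const id", []),
]
shipped = verify(candidates, snapshot)
print([h.title for h in shipped])  # prints ['Interval never cleared']
```

Three of the four toy candidates are dropped, each for a different reason: an ungrounded quote, a missing file, and a missing citation.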

Replay test results

Four pipelines, same 12 months of PRs

Same train/test temporal split, same ground truth (clean / followup-needed / reverted), same metrics. The deterministic baseline is the floor. Adding LLM mining + a verifier is the moat.

Pipeline	AUTO precision	AUTO %	Followup recall	What it adds
A · structural baseline	58.6%	11%	88%	File-graph and churn signals only. The floor any rule-based tool reaches.
B · + intent	71.4%	12%	93%	Adds semantic intent and historical context awareness.
C · full multi-agent	69.2%	33%	82%	Adds convention and recurring-concern mining for richer context.
D · with verifier	75.0%	13%	88%	Reads each cited file before any comment is posted. Only verified findings ship.

Methodology: PRs are sorted by date. The earliest 80% are used to learn the team's patterns, and the latest 20% are held out for evaluation. Ground truth comes from each PR's actual outcome at merge (clean ship vs. needed follow-up vs. reverted). The same harness and metrics run on your own repo.
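The split and the three table metrics can be sketched as below. The reported counts are consistent with an 80/20 cut: 1,079 + 270 = 1,349 evaluated PRs, and 80% of 1,349 is 1,079.2. Everything else here is an assumption made for illustration: the `PR` fields, the toy data, and in particular the metric definitions (AUTO precision as the share of auto-routed PRs that shipped clean, AUTO % as the share of test PRs auto-routed, follow-up recall as the share of non-clean PRs that were not auto-routed). The harness may define them differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PR:
    number: int
    merged_on: date
    outcome: str               # "clean", "followup-needed", or "reverted"
    auto_routed: bool = False  # hypothetical field: the routing decision


def temporal_split(prs, train_fraction=0.8):
    """Sort by date, then cut: older PRs train, newer PRs test."""
    ordered = sorted(prs, key=lambda pr: pr.merged_on)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def score(test_prs):
    """Metric definitions are assumptions, not the harness's own."""
    auto = [pr for pr in test_prs if pr.auto_routed]
    risky = [pr for pr in test_prs if pr.outcome != "clean"]
    clean_auto = sum(pr.outcome == "clean" for pr in auto)
    caught = sum(not pr.auto_routed for pr in risky)
    return {
        "auto_precision": clean_auto / len(auto) if auto else 0.0,
        "auto_share": len(auto) / len(test_prs),
        "followup_recall": caught / len(risky) if risky else 0.0,
    }


# Toy data: 20 PRs, one per day, deliberately passed in newest-first.
outcomes = ["clean"] * 18 + ["followup-needed", "reverted"]
prs = [
    PR(number=n, merged_on=date(2026, 1, n), outcome=outcomes[n - 1],
       auto_routed=n in (17, 18, 20))
    for n in range(20, 0, -1)
]

train, test = temporal_split(prs)
metrics = score(test)
print(len(train), len(test))  # prints 16 4
print(metrics)
```

Because the cut is made after sorting by date, every training PR predates every test PR, which is what keeps the evaluation free of look-ahead.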

The findings

Eight verified comments, every one auditable

These are the actual comments produced by the pipeline. Each carries the file path the verifier inspected, the evidence quote from the code at the PR's parent commit, the suggested fix, and the chain of past PRs/conventions the concern came from.

Finding 01 · Verified

PR #8322

Missing snapshot tests for new shape types

on apps/examples/e2e/tests/export-mermaid-snapshots.spec.ts

Verifier evidence

The diff adds custom-shape-mermaids/ with FlowchartShapeUtil and mapNodeToRenderSpec, but export-mermaid-snapshots.spec.ts only receives a cosmetic import-order change - no snapshot test case for the new custom shape variant is added anywhere in the diff.

Suggested fix

Add at least one snapshot test case in export-mermaid-snapshots.spec.ts (or a new spec file) that renders the custom-shape-mermaids example using FlowchartShapeUtil and asserts a screenshot snapshot.

Citation chain

Org convention from PRs #8301 and #8285 (snapshot tests for new diagram types).

Finding 02 · Verified

PR #8421

Test coverage gap for Editor integration

on packages/editor/src/lib/editor/managers/FontManager/FontManager.test.ts:1

Verifier evidence

FontManager.test.ts only uses vi.mock('../../Editor') with a hand-rolled stub; no test constructs a real Editor, registers a custom font, changes themes, or calls dispose to verify cleanup.

Suggested fix

Add an integration test (using createTestEditor or similar) that registers a custom TLThemeFont, calls updateTheme, and confirms the font face is loaded and removed on editor.dispose().

Citation chain

Past PRs in /editor/managers/ that added Editor-integration tests for parallel managers.

Finding 03 · Verified

PR #8421

Backward compatibility / missing optional contract

on packages/editor/src/lib/editor/Editor.ts

Verifier evidence

No test file is added in the diff verifying that a theme object without a `fonts` key renders correctly, and no backward-compatibility documentation or JSDoc comment for the `fonts` field being optional is visible anywhere in the diff.

Suggested fix

Add a unit/integration test that passes a theme without a `fonts` key to `<Tldraw>` and asserts correct rendering; also add a JSDoc comment on the `fonts` property in `TLTheme` marking it optional with its default.

Citation chain

Public API contract for TLTheme - prior optional-field PRs (#8225) followed this pattern.

Finding 04 · Verified

PR #8224

Missing tests for new telemetry sampling logic

on apps/dotcom/client/src/hooks/usePerformanceTracking.ts

Verifier evidence

The PR modifies only `usePerformanceTracking.ts`, adding memory sampling with setInterval, visibilitychange handler, and page-change subscriptions - no test file (*.test.ts / *.spec.ts) is present anywhere in the diff or supporting files.

Suggested fix

Add `usePerformanceTracking.test.ts` using `jest.useFakeTimers()` and a mock `performance.memory` object to cover: interval sampling, visibility-hidden flush, graceful no-op when `performance.memory` is absent, and full teardown (clearInterval + removeEventListener).

Citation chain

Convention: hooks introducing intervals/subscriptions are tested under fake-timers (PR #8112, #8186).

Finding 05 · Verified

PR #8780

Security hardening - credentials persistence

on .github/workflows/npm-publish.yml

Verifier evidence

None of the three source workflows (publish-branch.yml, publish-canary.yml, publish-manual.yml) contain 'persist-credentials: false' on their actions/checkout steps, and the new npm-publish.yml is not shown in the diff to confirm it was added.

Suggested fix

Add 'persist-credentials: false' under 'with:' for every actions/checkout step in .github/workflows/npm-publish.yml.

Citation chain

Convention from #8512 (security hardening on release workflows). Linked OpenSSF guidance.
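The fix suggested in Finding 05 can be guarded mechanically. The sketch below assumes the workflow file has already been parsed into a Python dict (for example with a YAML loader) and flags every `actions/checkout` step that does not set `persist-credentials` to false. The function name and the example workflow are hypothetical, not the contents of the real npm-publish.yml.

```python
def checkouts_missing_flag(workflow: dict) -> list:
    """Return 'job/steps[i]' labels for actions/checkout steps that do
    not set persist-credentials to false."""
    missing = []
    for job_name, job in workflow.get("jobs", {}).items():
        for index, step in enumerate(job.get("steps", [])):
            if not step.get("uses", "").startswith("actions/checkout@"):
                continue
            options = step.get("with") or {}
            value = options.get("persist-credentials")
            # Accept both the YAML boolean false and the string "false".
            if str(value).lower() != "false":
                missing.append(f"{job_name}/steps[{index}]")
    return missing


# Hypothetical workflow, already parsed from YAML.
example_workflow = {
    "jobs": {
        "publish": {
            "steps": [
                {"uses": "actions/checkout@v4"},
                {"run": "yarn install"},
            ]
        },
        "docs": {
            "steps": [
                {"uses": "actions/checkout@v4",
                 "with": {"persist-credentials": False}},
            ]
        },
    }
}
print(checkouts_missing_flag(example_workflow))  # prints ['publish/steps[0]']
```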

Finding 06 · Verified

PR #8954

Revert root cause undocumented

on packages/tldraw/src/lib/ui/hooks/useTranslation/useTranslation.tsx:47

Verifier evidence

No comment or documentation in the diff explains why the PR was previously reverted or what regression was fixed; the only observable changes are @internal→@public visibility and console.warn→warnOnce.

Suggested fix

Add a PR description section (or inline code comment in TldrawUiTranslationProvider) documenting the original failure mode and how the warnOnce/API-visibility change addresses it.

Citation chain

Prior revert chain on this file (#8909, #8917) - convention is post-revert PRs document the original failure.

Finding 07 · Verified

PR #8681

Cochange blindspot - CHANGELOGs not updated

on internal/scripts/lib/publishing.ts:164

Verifier evidence

The diff modifies only internal/scripts/lib/publishing.ts; no CHANGELOG.md files in packages/ are touched, despite the fix explicitly addressing broken tarballs shipping 'workspace:*' literals to consumers.

Suggested fix

Add a patch-level CHANGELOG entry to affected packages (e.g., store, sync-core, utils) noting that prior releases may have contained literal 'workspace:*' dependency strings and that consumers should upgrade to the newly published version.

Citation chain

Convention: behavior-affecting publish fixes ship with CHANGELOG entry on every affected package.

Finding 08 · Verified

PR #8681

Correctness - workspace:* rewrite not verified in CI

on internal/scripts/lib/publishing.ts:164

Verifier evidence

No test, dry-run step, or CI verification step exists in the diff or publishing.ts to confirm `workspace:*` specifiers are rewritten in the tarball before publishing.

Suggested fix

Add a pre-publish check (e.g., `yarn pack` + `tar -xOf *.tgz package/package.json | grep workspace`) or a dry-run step in CI/scripts to assert no `workspace:` strings remain in packed package.json.

Citation chain

SUPER-burned-this-before pattern: any publish-time string rewrite needs a CI guard. See #8255 retro.
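One way to implement the pre-publish guard from Finding 08 is to inspect the packed tarball itself. The sketch below opens a gzipped npm tarball, reads `package/package.json`, and reports any dependency whose version spec still starts with `workspace:`. It is an illustration only: the function names and the `@example/...` packages are hypothetical, the tarball is built in memory rather than by `yarn pack`, and nothing here is taken from publishing.ts.

```python
import io
import json
import tarfile

DEP_FIELDS = ("dependencies", "devDependencies",
              "peerDependencies", "optionalDependencies")


def workspace_specifiers(tarball_bytes: bytes) -> list:
    """Return 'field.name=spec' for every dependency in the packed
    package.json whose spec still starts with 'workspace:'."""
    with tarfile.open(fileobj=io.BytesIO(tarball_bytes), mode="r:gz") as tar:
        manifest = json.load(tar.extractfile("package/package.json"))
    leftovers = []
    for dep_field in DEP_FIELDS:
        for name, spec in manifest.get(dep_field, {}).items():
            if spec.startswith("workspace:"):
                leftovers.append(f"{dep_field}.{name}={spec}")
    return leftovers


def make_tarball(manifest: dict) -> bytes:
    """Build a minimal in-memory tarball holding only package.json."""
    payload = json.dumps(manifest).encode()
    info = tarfile.TarInfo("package/package.json")
    info.size = len(payload)
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


# Hypothetical packages: one broken tarball, one correctly rewritten.
broken = make_tarball({"name": "@example/store",
                       "dependencies": {"@example/utils": "workspace:*"}})
fixed = make_tarball({"name": "@example/store",
                      "dependencies": {"@example/utils": "1.2.3"}})

print(workspace_specifiers(broken))  # prints ['dependencies.@example/utils=workspace:*']
print(workspace_specifiers(fixed))   # prints []
```

In CI, a non-empty result would fail the job before anything is published.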

How we tested

Three guarantees that hold on any repo

The replay above runs on a public open-source repo, so you can verify it yourself. The same harness runs on your repo with the same guarantees.

Time-correct. Every test PR is evaluated using only the data the team had before that PR opened. No leakage, no peeking ahead.
Code-grounded. The verifier opens every cited file at the PR's parent commit and confirms the concern is real before any comment ships. Pattern-matching alone is never enough.
Citation-bound. Every finding names a specific past PR or detected convention. You can click through and check the source. No findings without provenance.
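The time-correct and citation-bound guarantees combine into a simple rule: a finding on a test PR may only cite PRs that had already merged before that PR opened. A minimal sketch of the rule, with hypothetical record fields and toy dates:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PastPR:
    number: int
    opened_at: datetime
    merged_at: datetime


def visible_history(target, all_prs):
    """PRs that had already merged before `target` was opened."""
    return [pr for pr in all_prs if pr.merged_at < target.opened_at]


def citations_are_time_correct(target, cited_numbers, all_prs):
    """True only if every cited PR was visible when `target` opened."""
    known = {pr.number for pr in visible_history(target, all_prs)}
    return all(number in known for number in cited_numbers)


# Toy history: PR 2 opened before PR 3 but merged after PR 3 opened.
history = [
    PastPR(1, datetime(2026, 1, 1), datetime(2026, 1, 2)),
    PastPR(2, datetime(2026, 1, 3), datetime(2026, 1, 10)),
    PastPR(3, datetime(2026, 1, 5), datetime(2026, 1, 6)),
]
target = history[2]  # PR 3, opened on Jan 5

print([pr.number for pr in visible_history(target, history)])  # prints [1]
print(citations_are_time_correct(target, [1], history))        # prints True
print(citations_are_time_correct(target, [2], history))        # prints False
```

Citing PR 2 fails the check because its outcome was not yet known when PR 3 opened, which is the kind of leakage the time-correct guarantee rules out.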

Want this on your repo?

Drop our GitHub bot on one repo. 30-day pilot. Day-1 audit on your last 12 months of PRs in the first week.

We deploy the multi-agent miner + verifier inside your VPC. It reads from GitHub, your incident system, and, if connected, your team's Slack #incidents channel. Code never leaves your network. The pilot is free: if our verified comments don't change a single PR cycle in 30 days, you walk and we take the learnings.
