AI-Powered Code Review Pipeline with Risk-Based Gates
Ryan Carson · Feb 19, 2026
You want one loop:
- The coding agent writes code
- The repo enforces risk-aware checks before merge
- A code review agent validates the PR
- Evidence (tests + browser + review) is machine-verifiable
- Findings turn into repeatable harness cases
The specific review agent can be
,
, CodeQL + policy logic, custom LLM review, or another service. The control-plane pattern stays the same.
I took inspiration from this helpful blog post by
Ryan Carson
@ryancarson
I've been grinding with Codex (on Extra High) through setting up our repo for Harness Engineering. The goal is to have Codex write and review 100% of the code. Getting much closer.
Your contract should define:
- risk tiers by path
- required checks by tier
- docs drift rules for control-plane changes
- evidence requirements for UI/critical flows
```json
{
  "version": "1",
  "riskTierRules": {
    "high": [
      "app/api/legal-chat/**",
      "lib/tools/**",
      "db/schema.ts"
    ],
    "low": ["**"]
  },
  "mergePolicy": {
    "high": {
      "requiredChecks": [
        "risk-policy-gate",
        "harness-smoke",
        "Browser Evidence",
        "CI Pipeline"
      ]
    },
    "low": {
      "requiredChecks": ["risk-policy-gate", "CI Pipeline"]
    }
  }
}
```
Why it matters: it removes ambiguity and prevents silent drift between scripts, workflow files, and policy docs.
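To make the contract concrete, here is a minimal sketch of how a gate script might turn changed paths into a risk tier and a list of required checks. Everything in it is an assumption for illustration: the `RiskContract` type, the tier precedence in `TIER_ORDER`, the hand-rolled `globToRegExp` matcher (a real repo would more likely use a glob library), and the choice to fail closed to `high` for unmatched paths.

```typescript
// Hypothetical TypeScript shape for the contract shown above.
interface RiskContract {
  riskTierRules: Record<string, string[]>;
  mergePolicy: Record<string, { requiredChecks: string[] }>;
}

// Assumption: tiers are listed from most to least severe, and the first match wins.
const TIER_ORDER: string[] = ["high", "low"];

// Minimal glob matcher for this sketch: "**" crosses "/" boundaries, "*" does not.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // placeholder so the next step skips "**"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp("^" + pattern + "$");
}

function tierForFile(contract: RiskContract, file: string): string {
  for (const tier of TIER_ORDER) {
    const globs = contract.riskTierRules[tier] ?? [];
    if (globs.some((g) => globToRegExp(g).test(file))) return tier;
  }
  return "high"; // assumption: unmatched paths fail closed to the strictest tier
}

function requiredChecksForPr(contract: RiskContract, changedFiles: string[]): string[] {
  const tiers = changedFiles.map((f) => tierForFile(contract, f));
  // The PR takes the most severe tier touched by any changed file.
  const prTier = TIER_ORDER.find((t) => tiers.indexOf(t) !== -1) ?? "low";
  return contract.mergePolicy[prTier]?.requiredChecks ?? [];
}

// Abbreviated copy of the contract above, used only to exercise the sketch.
const exampleContract: RiskContract = {
  riskTierRules: {
    high: ["app/api/legal-chat/**", "lib/tools/**", "db/schema.ts"],
    low: ["**"],
  },
  mergePolicy: {
    high: { requiredChecks: ["risk-policy-gate", "harness-smoke", "Browser Evidence", "CI Pipeline"] },
    low: { requiredChecks: ["risk-policy-gate", "CI Pipeline"] },
  },
};

console.log(requiredChecksForPr(exampleContract, ["README.md", "lib/tools/search.ts"]));
```

Because both the gate and any docs generator would read the same contract object, a change to a tier or a required check only has to happen in one place.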
A reliable pattern is:
- run risk-policy-gate first
- verify deterministic policy + review-agent state
- only then start test/build/security fanout jobs
This avoids wasting CI minutes on PR heads that are already blocked by policy or unresolved review findings.
```typescript
// Preflight gate: deterministic policy checks run first, review-agent state second.
const requiredChecks = computeRequiredChecks(changedFiles, riskTier);
await assertDocsDriftRules(changedFiles);
await assertRequiredChecksSuccessful(requiredChecks);

// Only wait on the review agent when the changed paths and tier call for it.
if (needsCodeReviewAgent(changedFiles, riskTier)) {
  await waitForCodeReviewCompletion({ headSha, timeoutMinutes: 20 });
  await assertNoActionableFindingsForHead(headSha);
}
```
The biggest practical lesson from real PR loops: treat review state as valid only when it matches the current PR head commit.
- wait for the review check run on headSha
- ignore stale summary comments tied to older SHAs
- fail if the latest review run is non-success or times out
- require reruns after each synchronize/push
- clear stale gate failures by rerunning the policy gate on the same head
If you skip this, you can merge a PR using stale “clean” evidence.
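A rough sketch of the current-head rule as a pure function, so it can be unit tested apart from any API client. The `ReviewCheckRun` shape and the three-way `ReviewVerdict` are hypothetical simplifications, and the sketch assumes timestamps are ISO 8601 strings in UTC so they sort lexicographically.

```typescript
// Hypothetical, simplified check-run record; real APIs return more fields.
interface ReviewCheckRun {
  headSha: string;
  status: "queued" | "in_progress" | "completed";
  conclusion: "success" | "failure" | "neutral" | "timed_out" | null;
  completedAt: string | null; // ISO 8601 in UTC (assumption)
}

type ReviewVerdict = "clean" | "pending" | "blocked";

function reviewVerdictForHead(runs: ReviewCheckRun[], headSha: string): ReviewVerdict {
  // Runs for older SHAs are stale evidence and are ignored entirely.
  const current = runs.filter((r) => r.headSha === headSha);
  if (current.length === 0) return "pending";

  // A rerun that is still queued or in progress supersedes earlier results.
  if (current.some((r) => r.status !== "completed")) return "pending";

  // Only the most recently completed run for this head counts.
  const latest = current.reduce((a, b) =>
    (a.completedAt ?? "") >= (b.completedAt ?? "") ? a : b,
  );
  return latest.conclusion === "success" ? "clean" : "blocked";
}
```

The gate would merge only on `clean`, keep waiting (up to its timeout) on `pending`, and fail on `blocked`. A clean result attached to an older SHA never reaches the verdict at all.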
When multiple workflows can request reruns, duplicate bot comments and race conditions appear.
Use exactly one workflow as the canonical rerun requester, and dedupe its requests by marker + sha:<head>.
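```typescript
// Hidden marker plus head SHA identifies one rerun request per commit.
const marker = '<!-- review-agent-auto-rerun -->';
const trigger = `sha:${headSha}`;

const alreadyRequested = comments.some((c) =>
  c.body.includes(marker) && c.body.includes(trigger),
);

// Post only if no earlier comment already requested a rerun for this head.
if (!alreadyRequested) {
  postComment(`${marker}\n@review-agent please re-review\n${trigger}`);
}
```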
If review findings are actionable, trigger a coding agent to:
- read review context
- patch code
- run focused local validation
- push fix commit to the same PR branch
Then let PR synchronize trigger the normal rerun path. Keep this deterministic:
- pin model + effort for reproducibility
- skip stale comments not matching current head
- never bypass policy gates
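The determinism rules above could be sketched as a small request builder. The `ReviewFinding` and `RemediationRequest` shapes and the values in `PINNED_AGENT_SETTINGS` are placeholders I made up for illustration, not the settings of any real agent.

```typescript
// Hypothetical finding record extracted from the review agent's output.
interface ReviewFinding {
  id: string;
  headSha: string; // commit the reviewer was looking at
  actionable: boolean;
}

// Hypothetical payload handed to the remediation agent.
interface RemediationRequest {
  headSha: string;
  model: string;
  effort: string;
  findingIds: string[];
}

// Placeholder values: pin whatever model and effort your agent supports.
const PINNED_AGENT_SETTINGS = { model: "pinned-model-id", effort: "high" } as const;

function buildRemediationRequest(
  findings: ReviewFinding[],
  headSha: string,
): RemediationRequest | null {
  // Skip stale findings: only those raised against the current head count.
  const current = findings.filter((f) => f.actionable && f.headSha === headSha);
  if (current.length === 0) return null; // nothing to remediate for this head

  return {
    headSha,
    model: PINNED_AGENT_SETTINGS.model,
    effort: PINNED_AGENT_SETTINGS.effort,
    findingIds: current.map((f) => f.id),
  };
}
```

Note that the builder has no option to skip or override a gate. The agent's only output is a commit on the PR branch, which goes back through the same policy path as any other push.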
A useful quality-of-life step:
- after a clean current-head rerun
- auto-resolve unresolved threads where all comments are from the review bot
- never auto-resolve human-participated threads
Then rerun policy gate so required-conversation-resolution reflects the new state.
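One way to express the bot-only rule is a filter over thread data fetched elsewhere. The `ReviewThread` shape and the `botLogin` parameter are assumptions for this sketch; how the threads are fetched and resolved depends on your hosting platform's API.

```typescript
// Hypothetical, simplified view of a PR review thread.
interface ReviewThread {
  id: string;
  isResolved: boolean;
  commentAuthors: string[]; // login of each comment's author, in order
}

function threadsSafeToAutoResolve(
  threads: ReviewThread[],
  botLogin: string,
  currentHeadIsClean: boolean,
): string[] {
  // Precondition from the text: only act after a clean current-head rerun.
  if (!currentHeadIsClean) return [];

  return threads
    .filter((t) => !t.isResolved)
    // Any human comment in the thread disqualifies it from auto-resolution.
    .filter((t) => t.commentAuthors.length > 0 && t.commentAuthors.every((a) => a === botLogin))
    .map((t) => t.id);
}
```

Returning thread IDs rather than resolving inline keeps the decision testable and leaves the actual API call to the single cleanup workflow.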
For UI or user-flow changes, require evidence manifests and assertions in CI (not just screenshots in PR text):
- required flows exist
- expected entrypoint was used
- expected account identity is present for logged-in flows
- artifacts are fresh and valid
```bash
npm run harness:ui:capture-browser-evidence
npm run harness:ui:verify-browser-evidence
```
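What a verify step might assert, as a minimal sketch. The manifest format here (`FlowEvidence`, `FlowRequirement`) is hypothetical and not the format our scripts actually use, and the sketch interprets "fresh" as "captured against the current head SHA", which is an assumption.

```typescript
// Hypothetical manifest entry written by the capture step.
interface FlowEvidence {
  flow: string;
  entrypoint: string;
  account: string | null; // identity observed in the browser session, if any
  headSha: string;        // commit the evidence was captured against
}

// Hypothetical requirement derived from the contract.
interface FlowRequirement {
  flow: string;
  entrypoint: string;
  expectedAccount: string | null; // null for logged-out flows
}

function verifyBrowserEvidence(
  manifest: FlowEvidence[],
  required: FlowRequirement[],
  headSha: string,
): string[] {
  const errors: string[] = [];
  for (const req of required) {
    const ev = manifest.find((e) => e.flow === req.flow);
    if (!ev) {
      errors.push("missing flow: " + req.flow);
      continue;
    }
    if (ev.entrypoint !== req.entrypoint) {
      errors.push("wrong entrypoint for " + req.flow);
    }
    if (req.expectedAccount !== null && ev.account !== req.expectedAccount) {
      errors.push("wrong or missing account for " + req.flow);
    }
    if (ev.headSha !== headSha) {
      errors.push("stale evidence for " + req.flow);
    }
  }
  return errors; // an empty list means the evidence passes
}
```

Collecting every error instead of throwing on the first one gives the coding agent a complete list to fix in a single pass.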
```plaintext
production regression -> harness gap issue -> case added -> SLA tracked
```
This keeps fixes from becoming one-off patches and grows long-term coverage.
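The "SLA tracked" step can be as simple as a scheduled check over open gap records. This is a sketch under assumptions: the `HarnessGap` shape, the `slaDays` parameter, and the rule that a gap breaches when no harness case lands within the window are all illustrative choices.

```typescript
// Hypothetical record linking a production regression to its harness case.
interface HarnessGap {
  incidentId: string;
  openedAt: string;           // ISO 8601, when the gap issue was filed
  caseAddedAt: string | null; // ISO 8601, null until a harness case lands
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function gapsBreachingSla(gaps: HarnessGap[], now: Date, slaDays: number): string[] {
  return gaps
    .filter((g) => {
      // Open gaps are measured up to "now"; closed gaps up to when the case landed.
      const end = g.caseAddedAt ? Date.parse(g.caseAddedAt) : now.getTime();
      return end - Date.parse(g.openedAt) > slaDays * MS_PER_DAY;
    })
    .map((g) => g.incidentId);
}
```

A weekly metrics job could report the returned IDs so that gaps without a harness case stay visible.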
The most important lessons were:
- Deterministic ordering matters: preflight gate must complete before CI fanout.
- Current-head SHA matching is non-negotiable.
- Review rerun requests need one canonical writer.
- Review summary parsing should treat vulnerability language and weak-confidence summaries as actionable.
- Auto-resolving bot-only threads reduces friction, but only after clean current-head evidence.
- A remediation agent can shorten loop time significantly if guardrails stay strict.
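For the summary-parsing lesson, a conservative classifier might look like the sketch below. The pattern lists are illustrative guesses and not the phrases any particular review agent emits, so tune them against real summaries from your reviewer.

```typescript
// Illustrative patterns only (assumption); extend from real reviewer output.
const VULNERABILITY_PATTERNS: RegExp[] = [
  /vulnerab/i,
  /injection/i,
  /\bxss\b/i,
  /security (issue|risk|flaw)/i,
];

const WEAK_CONFIDENCE_PATTERNS: RegExp[] = [
  /low confidence/i,
  /not (fully )?confident/i,
  /unable to verify/i,
  /could not (fully )?review/i,
];

function summaryIsActionable(summary: string): boolean {
  // Either kind of signal blocks the gate; an all-clear needs neither present.
  return VULNERABILITY_PATTERNS.concat(WEAK_CONFIDENCE_PATTERNS).some((p) => p.test(summary));
}
```

This deliberately fails closed: a summary like "No vulnerabilities found" still matches and gets flagged. That false positive costs one extra look, whereas a missed vulnerability mention could let an unsafe merge through.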
General pattern terms:
- code review agent
- remediation agent
- risk policy gate
One concrete implementation (ours):
- code review agent: Greptile
- remediation agent: Codex Action
- canonical rerun workflow: greptile-rerun.yml
- stale-thread cleanup workflow: greptile-auto-resolve-threads.yml
- preflight policy workflow: risk-policy-gate.yml
If you use a different reviewer, keep the same control-plane semantics and swap integration points.
```bash
npm run typecheck
npm test
npm run build:ci
npm run harness:legal-chat:smoke
npm run harness:ui:pre-pr
npm run harness:risk-tier
npm run harness:weekly-metrics
```
- Put risk + merge policy into one contract.
- Enforce preflight gate before expensive CI.
- Require clean code-review-agent state for current head SHA.
- If findings exist, remediate in-branch and rerun deterministically.
- Auto-resolve only bot-only stale threads after clean rerun.
- Require browser evidence for UI/flow changes.
- Convert incidents into harness cases and track loop SLOs.
That gives you a repo where agents can implement, validate, and be reviewed with deterministic, auditable standards.