AI-Powered Code Review Pipeline with Risk-Based Gates

Ryan Carson · Feb 19, 2026

You want one loop:

  1. The coding agent writes code

  2. The repo enforces risk-aware checks before merge

  3. A code review agent validates the PR

  4. Evidence (tests + browser + review) is machine-verifiable

  5. Findings turn into repeatable harness cases

The specific review agent can be CodeQL + policy logic, custom LLM review, or another service. The control-plane pattern stays the same.

I took inspiration from this helpful blog post by

Ryan Carson

@ryancarson

I've been grinding with Codex (on Extra High) through setting up our repo for Harness Engineering. The goal is to have Codex write and review 100% of the code. Getting much closer.

Your contract should define:

  • risk tiers by path

  • required checks by tier

  • docs drift rules for control-plane changes

  • evidence requirements for UI/critical flows

json

{
  "version": "1",
  "riskTierRules": {
    "high": [
      "app/api/legal-chat/**",
      "lib/tools/**",
      "db/schema.ts"
    ],
    "low": ["**"]
  },
  "mergePolicy": {
    "high": {
      "requiredChecks": [
        "risk-policy-gate",
        "harness-smoke",
        "Browser Evidence",
        "CI Pipeline"
      ]
    },
    "low": {
      "requiredChecks": ["risk-policy-gate", "CI Pipeline"]
    }
  }
}

Why it matters: it removes ambiguity and prevents silent drift between scripts, workflow files, and policy docs.
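
As a rough sketch of how a gate script might consume this contract, the snippet below classifies a PR by its changed files and looks up the required checks. The type names, `globToRegExp`, `riskTierFor`, and `requiredChecksFor` are hypothetical, the glob handling is deliberately simplified, and it assumes that a single high-risk path makes the whole PR high risk.

```typescript
// Hypothetical contract shape mirroring the JSON contract; not a published schema.
type RiskTier = "high" | "low";

interface RiskContract {
  riskTierRules: Record<RiskTier, string[]>;
  mergePolicy: Record<RiskTier, { requiredChecks: string[] }>;
}

const contract: RiskContract = {
  riskTierRules: {
    high: ["app/api/legal-chat/**", "lib/tools/**", "db/schema.ts"],
    low: ["**"],
  },
  mergePolicy: {
    high: {
      requiredChecks: ["risk-policy-gate", "harness-smoke", "Browser Evidence", "CI Pipeline"],
    },
    low: { requiredChecks: ["risk-policy-gate", "CI Pipeline"] },
  },
};

// Simplified glob matcher (assumption): "**" matches anything, including "/";
// "*" matches within a single path segment. A real gate may prefer a glob library.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // protect "**" while single "*" is rewritten
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${pattern}$`);
}

// Assumption: the highest matching tier wins, so one high-risk file makes the PR high risk.
function riskTierFor(changedFiles: string[], c: RiskContract): RiskTier {
  const isHigh = changedFiles.some((file) =>
    c.riskTierRules.high.some((glob) => globToRegExp(glob).test(file)),
  );
  return isHigh ? "high" : "low";
}

function requiredChecksFor(changedFiles: string[], c: RiskContract): string[] {
  return c.mergePolicy[riskTierFor(changedFiles, c)].requiredChecks;
}
```

Because the script reads tiers and checks from the contract instead of hardcoding them, a policy change is a one-file diff.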

A reliable pattern is:

  1. run risk-policy-gate first

  2. verify deterministic policy + review-agent state

  3. only then start test/build/security fanout jobs

This avoids wasting CI minutes on PR heads that are already blocked by policy or unresolved review findings.

typescript

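// Preflight gate: runs before any test/build/security fanout job.
// The helper functions are not shown here.
const requiredChecks = computeRequiredChecks(changedFiles, riskTier);
await assertDocsDriftRules(changedFiles);
await assertRequiredChecksSuccessful(requiredChecks);

// Review evidence only counts when it was produced for the current head SHA.
if (needsCodeReviewAgent(changedFiles, riskTier)) {
  await waitForCodeReviewCompletion({ headSha, timeoutMinutes: 20 });
  await assertNoActionableFindingsForHead(headSha);
}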

Current-head SHA discipline was the biggest practical lesson from real PR loops.

Treat review state as valid only when it matches the current PR head commit:

  • wait for the review check run on headSha

  • ignore stale summary comments tied to older SHAs

  • fail if the latest review run is non-success or times out

  • require reruns after each synchronize/push

  • clear stale gate failures by rerunning policy gate on the same head

If you skip this, you can merge a PR using stale “clean” evidence.
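
One way to make these rules concrete is a pure function that reduces a list of review check runs to a verdict for the current head. This is a minimal sketch: `ReviewCheckRun` is a hypothetical, simplified shape (real provider payloads carry more fields), and timeout handling is assumed to live in the caller, which fails the gate if the verdict stays `pending` too long.

```typescript
// Hypothetical, simplified view of a review check run.
interface ReviewCheckRun {
  headSha: string;
  status: "queued" | "in_progress" | "completed";
  conclusion: "success" | "failure" | "neutral" | "timed_out" | null;
  completedAt: string | null; // ISO 8601 timestamp, null until completed
}

type ReviewVerdict = "pass" | "fail" | "pending";

function reviewVerdictForHead(runs: ReviewCheckRun[], headSha: string): ReviewVerdict {
  // Runs for older SHAs are stale evidence and are ignored entirely.
  const current = runs.filter((run) => run.headSha === headSha);
  if (current.length === 0) return "pending"; // no review has started for this head

  // A rerun that is still in flight supersedes earlier results on the same head.
  if (current.some((run) => run.status !== "completed")) return "pending";

  // Only the most recently completed run on this head decides the outcome.
  const latest = current.reduce((a, b) =>
    Date.parse(b.completedAt ?? "") > Date.parse(a.completedAt ?? "") ? b : a,
  );
  return latest.conclusion === "success" ? "pass" : "fail";
}
```

Keeping the decision in a pure function makes the stale-evidence cases straightforward to cover with unit tests.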

When multiple workflows can request reruns, duplicate bot comments and race conditions appear.

Use exactly one workflow as the canonical rerun requester, and dedupe requests by marker + sha:<head>.

typescript

// A hidden marker plus the head SHA makes each rerun request idempotent per commit.
const marker = '<!-- review-agent-auto-rerun -->';
const trigger = `sha:${headSha}`;
const alreadyRequested = comments.some((c) =>
  c.body.includes(marker) && c.body.includes(trigger),
);

// Post at most one rerun request per head SHA.
if (!alreadyRequested) {
  await postComment(`${marker}\n@review-agent please re-review\n${trigger}`);
}

If review findings are actionable, trigger a coding agent to:

  1. read review context

  2. patch code

  3. run focused local validation

  4. push fix commit to the same PR branch

Then let PR synchronize trigger the normal rerun path. Keep this deterministic:

  • pin model + effort for reproducibility

  • skip stale comments not matching current head

  • never bypass policy gates

A useful quality-of-life step:

  • after a clean current-head rerun

  • auto-resolve unresolved threads where all comments are from the review bot

  • never auto-resolve human-participated threads

Then rerun policy gate so required-conversation-resolution reflects the new state.
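
The selection rule can be sketched as a small filter. The `ReviewThread` shape and the `threadsSafeToAutoResolve` name are hypothetical; the sketch assumes thread data has already been fetched from the hosting provider and that the caller has separately established that the current head is clean.

```typescript
// Hypothetical, simplified thread shape; real review-thread payloads carry more fields.
interface ReviewThread {
  id: string;
  isResolved: boolean;
  commentAuthors: string[]; // logins of everyone who commented in the thread
}

function threadsSafeToAutoResolve(
  threads: ReviewThread[],
  reviewBotLogin: string,
  currentHeadIsClean: boolean,
): string[] {
  // Without clean evidence for the current head, resolve nothing.
  if (!currentHeadIsClean) return [];

  return threads
    .filter((thread) => !thread.isResolved)
    // Any human participation keeps the thread open for a human to close.
    .filter(
      (thread) =>
        thread.commentAuthors.length > 0 &&
        thread.commentAuthors.every((author) => author === reviewBotLogin),
    )
    .map((thread) => thread.id);
}
```

Returning only thread IDs keeps the decision separate from the API call that actually resolves each thread.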

For UI or user-flow changes, require evidence manifests and assertions in CI (not just screenshots in PR text):

  • required flows exist

  • expected entrypoint was used

  • expected account identity is present for logged-in flows

  • artifacts are fresh and valid

bash

npm run harness:ui:capture-browser-evidence
npm run harness:ui:verify-browser-evidence
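
What the verify step asserts is specific to each repo. As a hedged illustration, the sketch below checks a manifest against the four conditions listed above. The manifest fields, `FlowRequirement`, and `verifyBrowserEvidence` are hypothetical names, and "fresh" is assumed to mean captured within a configurable maximum age.

```typescript
// Hypothetical manifest entry written by the capture step; field names are illustrative.
interface FlowEvidence {
  flow: string;
  entrypoint: string;
  accountId: string | null; // null for logged-out flows
  capturedAt: string; // ISO 8601 timestamp
}

// Hypothetical requirement; expectedAccountId is null when the flow needs no login.
interface FlowRequirement {
  flow: string;
  entrypoint: string;
  expectedAccountId: string | null;
}

function verifyBrowserEvidence(
  manifest: FlowEvidence[],
  required: FlowRequirement[],
  now: Date,
  maxAgeMs: number,
): string[] {
  const errors: string[] = [];
  for (const req of required) {
    const evidence = manifest.find((entry) => entry.flow === req.flow);
    if (!evidence) {
      errors.push(`${req.flow}: no evidence captured`);
      continue;
    }
    if (evidence.entrypoint !== req.entrypoint) {
      errors.push(`${req.flow}: expected entrypoint ${req.entrypoint}, got ${evidence.entrypoint}`);
    }
    if (req.expectedAccountId !== null && evidence.accountId !== req.expectedAccountId) {
      errors.push(`${req.flow}: expected account ${req.expectedAccountId}`);
    }
    // Reject unparseable, future-dated, or too-old captures.
    const ageMs = now.getTime() - Date.parse(evidence.capturedAt);
    if (Number.isNaN(ageMs) || ageMs < 0 || ageMs > maxAgeMs) {
      errors.push(`${req.flow}: evidence is stale or has an invalid timestamp`);
    }
  }
  return errors; // an empty list means the evidence passes
}
```

Collecting every error rather than throwing on the first lets one CI run report all evidence gaps.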

plaintext

production regression -> harness gap issue -> case added -> SLA tracked

This keeps fixes from becoming one-off patches and grows long-term coverage.

The most important lessons were:

  1. Deterministic ordering matters: preflight gate must complete before CI fanout.

  2. Current-head SHA matching is non-negotiable.

  3. Review rerun requests need one canonical writer.

  4. Review summary parsing should treat vulnerability language and weak-confidence summaries as actionable.

  5. Auto-resolving bot-only threads reduces friction, but only after clean current-head evidence.

  6. A remediation agent can shorten loop time significantly if guardrails stay strict.
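
Lesson 4 can be illustrated with a deliberately conservative classifier. The pattern lists and the `actionableReasons` name are hypothetical and purely illustrative; real patterns would need tuning against the summaries your reviewer actually produces.

```typescript
// Illustrative patterns only (assumption); not taken from any reviewer's documented output format.
const VULNERABILITY_PATTERNS: RegExp[] = [
  /vulnerab/i,
  /injection/i,
  /\bxss\b/i,
  /security (issue|risk|flaw)/i,
];

const WEAK_CONFIDENCE_PATTERNS: RegExp[] = [
  /low confidence/i,
  /not (sure|certain)/i,
  /(unable to|could not|couldn't) (verify|review)/i,
];

// Returns the reasons a summary should block the gate; an empty list means no match.
function actionableReasons(summary: string): string[] {
  const reasons: string[] = [];
  if (VULNERABILITY_PATTERNS.some((pattern) => pattern.test(summary))) {
    reasons.push("vulnerability-language");
  }
  if (WEAK_CONFIDENCE_PATTERNS.some((pattern) => pattern.test(summary))) {
    reasons.push("weak-confidence");
  }
  return reasons;
}
```

Plain keyword matching will also flag negated phrasing such as "no vulnerabilities found". That errs toward blocking, so weigh it against the noise it adds.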

General pattern terms:

  • code review agent

  • remediation agent

  • risk policy gate

One concrete implementation (ours):

  • code review agent: Greptile

  • remediation agent: Codex Action

  • canonical rerun workflow: greptile-rerun.yml

  • stale-thread cleanup workflow: greptile-auto-resolve-threads.yml

  • preflight policy workflow: risk-policy-gate.yml

If you use a different reviewer, keep the same control-plane semantics and swap integration points.

bash

npm run typecheck
npm test
npm run build:ci
npm run harness:legal-chat:smoke
npm run harness:ui:pre-pr
npm run harness:risk-tier
npm run harness:weekly-metrics

  1. Put risk + merge policy into one contract.

  2. Enforce preflight gate before expensive CI.

  3. Require clean code-review-agent state for current head SHA.

  4. If findings exist, remediate in-branch and rerun deterministically.

  5. Auto-resolve only bot-only stale threads after clean rerun.

  6. Require browser evidence for UI/flow changes.

  7. Convert incidents into harness cases and track loop SLOs.

That gives you a repo where agents can implement, validate, and be reviewed with deterministic, auditable standards.