Second Brain

You want one loop:

The coding agent writes code
The repo enforces risk-aware checks before merge
A code review agent validates the PR
Evidence (tests + browser + review) is machine-verifiable
Findings turn into repeatable harness cases

The specific review agent can be CodeQL + policy logic, custom LLM review, or another service. The control-plane pattern stays the same.

I took inspiration from this helpful blog post by Ryan Carson (@ryancarson):

"I've been grinding with Codex (on Extra High) through setting up our repo for Harness Engineering. The goal is to have Codex write and review 100% of the code. Getting much closer."

Your contract should define:

risk tiers by path
required checks by tier
docs drift rules for control-plane changes
evidence requirements for UI/critical flows

```json
{
  "version": "1",
  "riskTierRules": {
    "high": [
      "app/api/legal-chat/**",
      "lib/tools/**",
      "db/schema.ts"
    ],
    "low": ["**"]
  },
  "mergePolicy": {
    "high": {
      "requiredChecks": [
        "risk-policy-gate",
        "harness-smoke",
        "Browser Evidence",
        "CI Pipeline"
      ]
    },
    "low": {
      "requiredChecks": ["risk-policy-gate", "CI Pipeline"]
    }
  }
}
```

Why it matters: it removes ambiguity and prevents silent drift between scripts, workflow files, and policy docs.
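To make the contract concrete, here is a minimal sketch of how a gate script might resolve a risk tier and its required checks from the JSON above. The type names, `globToRegExp`, `computeRiskTier`, and `requiredChecksFor` are hypothetical and not taken from the original repo, and the glob matcher is deliberately simplified. A real implementation would probably use an established glob library.

```typescript
// Hypothetical types mirroring the JSON contract above.
type RiskTier = "high" | "low";

interface RiskContract {
  riskTierRules: Record<RiskTier, string[]>;
  mergePolicy: Record<RiskTier, { requiredChecks: string[] }>;
}

const contract: RiskContract = {
  riskTierRules: {
    high: ["app/api/legal-chat/**", "lib/tools/**", "db/schema.ts"],
    low: ["**"],
  },
  mergePolicy: {
    high: {
      requiredChecks: ["risk-policy-gate", "harness-smoke", "Browser Evidence", "CI Pipeline"],
    },
    low: { requiredChecks: ["risk-policy-gate", "CI Pipeline"] },
  },
};

// Simplified glob support (an assumption of this sketch):
// "**" matches any characters including "/", "*" matches within one path segment.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // temporary placeholder so "*" handling skips "**"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${pattern}$`);
}

// Tiers are checked from highest to lowest because "low" uses a catch-all pattern.
const TIER_ORDER: RiskTier[] = ["high", "low"];

function computeRiskTier(changedFiles: string[], c: RiskContract): RiskTier {
  for (const tier of TIER_ORDER) {
    const patterns = c.riskTierRules[tier].map(globToRegExp);
    if (changedFiles.some((file) => patterns.some((p) => p.test(file)))) {
      return tier;
    }
  }
  return "low";
}

function requiredChecksFor(changedFiles: string[], c: RiskContract): string[] {
  return c.mergePolicy[computeRiskTier(changedFiles, c)].requiredChecks;
}
```

One high-risk path is enough to put the whole PR in the high tier, which is why the tiers are evaluated in order rather than per file.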

A reliable pattern is:

run risk-policy-gate first
verify deterministic policy + review-agent state
only then start test/build/security fanout jobs

This avoids wasting CI minutes on PR heads that are already blocked by policy or unresolved review findings.

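```typescript
// changedFiles, riskTier, and headSha come from the PR event context;
// the helpers below are repo-specific and not shown here.

// 1. Deterministic policy checks run first.
const requiredChecks = computeRequiredChecks(changedFiles, riskTier);
await assertDocsDriftRules(changedFiles);
await assertRequiredChecksSuccessful(requiredChecks);

// 2. Review-agent state is checked against the current head only.
if (needsCodeReviewAgent(changedFiles, riskTier)) {
  await waitForCodeReviewCompletion({ headSha, timeoutMinutes: 20 });
  await assertNoActionableFindingsForHead(headSha);
}
```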

The biggest practical lesson from real PR loops: treat review state as valid only when it matches the current PR head commit:

wait for the review check run on headSha
ignore stale summary comments tied to older SHAs
fail if the latest review run is non-success or times out
require reruns after each synchronize/push
clear stale gate failures by rerunning policy gate on the same head

If you skip this, you can merge a PR using stale “clean” evidence.
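The head-matching rule above can be sketched as a pure function. The `ReviewRun` shape and `evaluateReviewForHead` are assumptions for illustration (loosely modeled on a CI check run), not the original repo's code or any specific API.

```typescript
// Hypothetical shape for a review check run; field names are assumptions.
interface ReviewRun {
  headSha: string;
  status: "queued" | "in_progress" | "completed";
  conclusion: "success" | "failure" | "neutral" | "timed_out" | null;
  startedAt: string; // ISO 8601 timestamp
}

type ReviewVerdict = { ok: true } | { ok: false; reason: string };

function evaluateReviewForHead(runs: ReviewRun[], headSha: string): ReviewVerdict {
  // Ignore everything tied to older SHAs, however clean it looks.
  const current = runs.filter((run) => run.headSha === headSha);
  if (current.length === 0) {
    return { ok: false, reason: "no review run for current head" };
  }

  // Only the most recent run on this head counts.
  const latest = current.reduce((a, b) =>
    Date.parse(b.startedAt) > Date.parse(a.startedAt) ? b : a,
  );

  if (latest.status !== "completed") {
    return { ok: false, reason: "latest review run has not completed" };
  }
  if (latest.conclusion !== "success") {
    return { ok: false, reason: `latest review run concluded: ${latest.conclusion}` };
  }
  return { ok: true };
}
```

A gate built this way fails closed: a missing, pending, or non-success run on the current head blocks the merge, and a green run on an older SHA is never consulted.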

When multiple workflows can request reruns, duplicate bot comments and race conditions appear.

Use exactly one workflow as canonical rerun requester and dedupe by marker + sha:<head>.

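```typescript
// A hidden marker plus the head SHA identifies a rerun request for this exact commit.
const marker = '<!-- review-agent-auto-rerun -->';
const trigger = `sha:${headSha}`;
const alreadyRequested = comments.some((c) =>
  c.body.includes(marker) && c.body.includes(trigger),
);

// Post at most one rerun request per head SHA.
if (!alreadyRequested) {
  await postComment(`${marker}\n@review-agent please re-review\n${trigger}`);
}
```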

If review findings are actionable, trigger a coding agent to:

read review context
patch code
run focused local validation
push fix commit to the same PR branch

Then let PR synchronize trigger the normal rerun path. Keep this deterministic:

pin model + effort for reproducibility
skip stale comments not matching current head
never bypass policy gates
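As a rough illustration of the determinism rules above, the sketch below builds a remediation request from pinned settings and current-head findings only. `Finding`, `RemediationRequest`, `PINNED`, and `buildRemediationRequest` are hypothetical names, and the pinned values are placeholders rather than real model identifiers.

```typescript
// Hypothetical shapes; not taken from the original repo.
interface Finding {
  id: string;
  headSha: string;
  actionable: boolean;
  body: string;
}

interface RemediationRequest {
  model: string;
  effort: string;
  headSha: string;
  findings: Finding[];
}

// Placeholder values: pin whatever model and effort your agent actually uses.
const PINNED = { model: "pinned-model-id", effort: "high" } as const;

function buildRemediationRequest(
  findings: Finding[],
  headSha: string,
): RemediationRequest | null {
  // Skip stale findings that do not match the current head.
  const current = findings.filter((f) => f.headSha === headSha && f.actionable);
  if (current.length === 0) {
    return null; // nothing to remediate, so do not trigger the agent
  }
  return { ...PINNED, headSha, findings: current };
}
```

Note that this function only decides what the agent is asked to fix. The resulting commit still goes through the same policy gate as any other push.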

A useful quality-of-life step:

after a clean current-head rerun
auto-resolve unresolved threads where all comments are from the review bot
never auto-resolve human-participated threads

Then rerun policy gate so required-conversation-resolution reflects the new state.
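The thread-selection rule can be expressed as a small filter. This is a sketch under assumptions: `ReviewThread`, `threadsToAutoResolve`, and the bot login used in the usage line are hypothetical, and actually resolving the threads would go through your code host's API, which is not shown.

```typescript
// Hypothetical thread shape; field names are assumptions.
interface ReviewThread {
  id: string;
  isResolved: boolean;
  commentAuthors: string[]; // login of every comment author in the thread
}

function threadsToAutoResolve(
  threads: ReviewThread[],
  botLogin: string,
  currentHeadIsClean: boolean,
): string[] {
  // Never resolve anything without a clean rerun on the current head.
  if (!currentHeadIsClean) {
    return [];
  }
  return threads
    .filter(
      (t) =>
        !t.isResolved &&
        t.commentAuthors.length > 0 &&
        // A single human comment keeps the thread open.
        t.commentAuthors.every((author) => author === botLogin),
    )
    .map((t) => t.id);
}

// Usage with a made-up bot login:
// threadsToAutoResolve(threads, "review-bot[bot]", true)
```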

For UI or user-flow changes, require evidence manifests and assertions in CI (not just screenshots in PR text):

required flows exist
expected entrypoint was used
expected account identity is present for logged-in flows
artifacts are fresh and valid

```bash
npm run harness:ui:capture-browser-evidence
npm run harness:ui:verify-browser-evidence
```
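What a verify step might assert can be sketched as follows. The manifest format, `FlowEvidence`, `FlowRequirement`, and `verifyEvidence` are all assumptions for illustration. The original scripts' actual manifest schema is not shown in this post.

```typescript
// Hypothetical manifest entry written by the capture step.
interface FlowEvidence {
  flow: string;
  entrypoint: string;
  accountId?: string;
  capturedAt: string; // ISO 8601 timestamp
}

// Hypothetical requirement declared per flow.
interface FlowRequirement {
  flow: string;
  entrypoint: string;
  requiresLogin: boolean;
  expectedAccountId?: string;
}

function verifyEvidence(
  manifest: FlowEvidence[],
  required: FlowRequirement[],
  now: Date,
  maxAgeMinutes: number,
): string[] {
  const errors: string[] = [];
  for (const req of required) {
    const evidence = manifest.find((e) => e.flow === req.flow);
    if (!evidence) {
      errors.push(`missing flow: ${req.flow}`);
      continue;
    }
    if (evidence.entrypoint !== req.entrypoint) {
      errors.push(`wrong entrypoint for ${req.flow}: ${evidence.entrypoint}`);
    }
    if (
      req.requiresLogin &&
      (!evidence.accountId || evidence.accountId !== req.expectedAccountId)
    ) {
      errors.push(`unexpected account identity for ${req.flow}`);
    }
    // Unparseable or future timestamps yield NaN or negative ages and count as stale.
    const ageMinutes = (now.getTime() - Date.parse(evidence.capturedAt)) / 60_000;
    if (!(ageMinutes >= 0 && ageMinutes <= maxAgeMinutes)) {
      errors.push(`stale or invalid evidence for ${req.flow}`);
    }
  }
  return errors;
}
```

Returning a list of errors rather than throwing on the first one lets CI report every missing or stale flow in a single run.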

Convert every production regression into a harness case:

```plaintext
production regression -> harness gap issue -> case added -> SLA tracked
```

This keeps fixes from becoming one-off patches and grows long-term coverage.

The most important lessons were:

Deterministic ordering matters: preflight gate must complete before CI fanout.
Current-head SHA matching is non-negotiable.
Review rerun requests need one canonical writer.
Review summary parsing should treat vulnerability language and weak-confidence summaries as actionable.
Auto-resolving bot-only threads reduces friction, but only after clean current-head evidence.
A remediation agent can shorten loop time significantly if guardrails stay strict.
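The summary-parsing lesson above could look something like the sketch below. The pattern lists and `isActionableSummary` are hypothetical examples, not the phrases the original implementation matches. Tune them to the wording your review agent actually produces.

```typescript
// Example patterns only; these lists are assumptions, not the original rules.
const VULNERABILITY_PATTERNS: RegExp[] = [
  /vulnerab/i,
  /injection/i,
  /\bxss\b/i,
  /security (issue|risk)/i,
];

const WEAK_CONFIDENCE_PATTERNS: RegExp[] = [
  /low confidence/i,
  /not (sure|certain)/i,
  /could not (verify|review)/i,
  /\bunable to\b/i,
];

// A summary is actionable if it mentions a vulnerability or signals weak confidence.
function isActionableSummary(summary: string): boolean {
  return [...VULNERABILITY_PATTERNS, ...WEAK_CONFIDENCE_PATTERNS].some((pattern) =>
    pattern.test(summary),
  );
}
```

Plain keyword matching like this errs toward false positives (a summary saying "no vulnerabilities found" would still be flagged), which is the conservative failure mode for a merge gate.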

General pattern terms:

code review agent
remediation agent
risk policy gate

One concrete implementation (ours):

code review agent: Greptile
remediation agent: Codex Action
canonical rerun workflow: greptile-rerun.yml
stale-thread cleanup workflow: greptile-auto-resolve-threads.yml
preflight policy workflow: risk-policy-gate.yml

If you use a different reviewer, keep the same control-plane semantics and swap integration points.

Example command set:

```bash
npm run typecheck
npm test
npm run build:ci
npm run harness:legal-chat:smoke
npm run harness:ui:pre-pr
npm run harness:risk-tier
npm run harness:weekly-metrics
```

To adopt this pattern:

Put risk + merge policy into one contract.
Enforce preflight gate before expensive CI.
Require clean code-review-agent state for current head SHA.
If findings exist, remediate in-branch and rerun deterministically.
Auto-resolve only bot-only stale threads after clean rerun.
Require browser evidence for UI/flow changes.
Convert incidents into harness cases and track loop SLOs.

That gives you a repo where agents can implement, validate, and be reviewed with deterministic, auditable standards.