Problem Context

Code review is the highest-leverage quality practice in software engineering — and the biggest bottleneck. Senior engineers spend hours reviewing PRs, catching the same patterns repeatedly: missing error handling, inconsistent naming, forgotten null checks, security anti-patterns.

AI code review tools promise to automate the routine catches, freeing human reviewers for design and architecture discussions. But most implementations either generate noise (flagging style issues nobody cares about) or miss substance (ignoring real bugs while complimenting your variable names). This experiment builds one that actually helps.

🤔 Sound familiar?
  • Your senior engineers spend 30%+ of their time on code reviews, mostly catching the same patterns
  • You've tried AI review tools but they generate too much noise — the team started ignoring them
  • You want AI to catch the routine issues so human reviewers can focus on design and correctness
  • You need a code review bot that knows your team's specific standards, not generic best practices

This article builds a custom AI code review bot from scratch and shares three months of real-world results.

Concept Explanation

An effective AI code review system has three phases: diff analysis (understanding what changed and why), contextualized review (evaluating changes against team standards and codebase patterns), and targeted feedback (commenting only when there's something actionable to say).


      flowchart TD
          PR["Pull Request\nOpened"] --> D["Parse Diff\n(changed files + hunks)"]
          D --> C["Gather Context\n(related files, standards)"]
          C --> R["LLM Review\n(per-file analysis)"]
          R --> F{"Actionable\nfindings?"}
          F -->|"Yes"| P["Post Comments\n(inline on PR)"]
          F -->|"No"| S["Silent Pass\n(no noise)"]
      
          style PR fill:#4f46e5,color:#fff,stroke:#4338ca
          style C fill:#059669,color:#fff,stroke:#047857
          style R fill:#7c3aed,color:#fff,stroke:#6d28d9
          style S fill:#6b7280,color:#fff,stroke:#4b5563
      

Design Principles

Signal over noise: A review bot that comments on every PR with generic advice gets ignored within a week. Only comment when there's a genuine issue — a bug, a security concern, a violation of team standards. Silence is better than noise.

Context matters: Reviewing a diff without understanding the surrounding code produces shallow feedback. The bot needs to read related files, understand interfaces being implemented, and know team conventions.

Actionable feedback: "Consider improving error handling" is useless. "This catch block swallows the exception silently — add logging or rethrow. See our standard CS-0034." is actionable.

Implementation

Step 1: GitHub Webhook Handler

app.MapPost("/api/webhook", async (
    HttpRequest request,
    IReviewService reviewer) =>
{
    var payload = await request.ReadFromJsonAsync<GitHubWebhookPayload>();

    if (payload?.Action is not "opened" and not "synchronize")
        return Results.Ok();

    // Queue review — don't block the webhook
    await reviewer.QueueReviewAsync(
        payload.Repository.FullName,
        payload.PullRequest.Number);

    return Results.Accepted();
});
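
QueueReviewAsync is referenced above but not shown. One way to implement it is an in-process queue built on System.Threading.Channels — a minimal sketch, assuming the ReviewQueue type name and tuple shape (both illustrative, not from the original):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// In-process queue so the webhook can return 202 immediately; a durable
// broker is the upgrade path if queued reviews must survive restarts.
public sealed class ReviewQueue
{
    private readonly Channel<(string Repo, int PrNumber)> _channel =
        Channel.CreateUnbounded<(string Repo, int PrNumber)>();

    // Called from the webhook handler; completes as soon as the item is queued.
    public ValueTask QueueReviewAsync(string repo, int prNumber) =>
        _channel.Writer.WriteAsync((repo, prNumber));

    // Consumed by a background worker that runs the actual review.
    public IAsyncEnumerable<(string Repo, int PrNumber)> ReadAllAsync(
        CancellationToken ct = default) =>
        _channel.Reader.ReadAllAsync(ct);
}
```

A hosted BackgroundService would then drain the queue: `await foreach (var (repo, pr) in queue.ReadAllAsync(ct)) await reviewer.ReviewPullRequest(repo, pr);`.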
      

Step 2: Diff Analysis and Context Gathering

public class ReviewService
{
    public async Task<ReviewResult> ReviewPullRequest(
        string repo, int prNumber)
    {
        // Get the diff
        var diff = await _github.GetPullRequestDiffAsync(repo, prNumber);
        var files = DiffParser.Parse(diff);

        // Filter to reviewable files
        var reviewable = files
            .Where(f => ReviewableExtensions.Contains(Path.GetExtension(f.Path)))
            .Where(f => !f.Path.StartsWith("test/")) // Don't review tests
            .ToList();

        var findings = new List<ReviewFinding>();

        foreach (var file in reviewable)
        {
            // Get full file content for context
            var fullContent = await _github.GetFileContentAsync(
                repo, file.Path, prNumber);

            // Get related files (interfaces, base classes)
            var related = await FindRelatedFiles(repo, file.Path, prNumber);

            var fileFindings = await ReviewFile(file, fullContent, related);
            findings.AddRange(fileFindings);
        }

        return new ReviewResult { Findings = findings };
    }
}
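
DiffParser.Parse is used above but never shown. A minimal sketch that splits a unified diff into per-file entries — real diffs have more edge cases (renames, binary files, mode changes), so treat this as a starting point. The DiffFile shape matches how the rest of the article uses it (.Path and .Diff):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record DiffFile(string Path, string Diff);

public static class DiffParser
{
    // Splits a unified diff on "diff --git" boundaries and extracts the
    // new-side path from the "+++ b/..." line of each file section.
    public static List<DiffFile> Parse(string diff)
    {
        var files = new List<DiffFile>();
        var sections = diff.Split("diff --git ",
            StringSplitOptions.RemoveEmptyEntries);

        foreach (var section in sections)
        {
            var lines = section.Split('\n');
            var pathLine = lines.FirstOrDefault(l => l.StartsWith("+++ "));
            if (pathLine is null) continue; // e.g. binary file section

            // "+++ b/src/Foo.cs" -> "src/Foo.cs"; "+++ /dev/null" means deleted
            var path = pathLine[4..].TrimEnd();
            if (path == "/dev/null") continue;
            if (path.StartsWith("b/")) path = path[2..];

            files.Add(new DiffFile(path, section));
        }
        return files;
    }
}
```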
      

Step 3: LLM Review with Team Standards

private async Task<List<ReviewFinding>> ReviewFile(
    DiffFile file, string fullContent, List<RelatedFile> related)
{
    var relatedContext = string.Join("\n\n",
        related.Select(r => $"// {r.Path}\n{r.Content}"));

    // $$""" so the literal JSON braces below don't need escaping;
    // interpolations use {{...}} instead
    var prompt = $$"""
        Review this code change. Our team standards:
        - All public methods must validate parameters
        - Async methods must use CancellationToken
        - Exceptions must be logged before rethrowing
        - No string concatenation in SQL queries
        - All HTTP calls must use IHttpClientFactory

        ONLY comment if you find:
        1. A bug or logical error
        2. A security vulnerability
        3. A violation of the standards above
        4. A performance issue in a hot path

        DO NOT comment on: style, naming preferences, or minor
        refactoring suggestions.

        Changed file ({{file.Path}}):
        ```
        {{file.Diff}}
        ```

        Full file for context:
        ```
        {{fullContent}}
        ```

        Related files:
        {{relatedContext}}

        Return a JSON object {"findings": [...]} — an empty array
        if no issues. Each finding: {"line": number,
        "severity": "error"|"warning",
        "message": "specific actionable feedback",
        "standard": "CS-XXXX if applicable"}
        """;

    var response = await _chatClient.CompleteChatAsync(
        [new UserChatMessage(prompt)],
        new ChatCompletionOptions
        {
            // JSON mode guarantees a JSON *object*, so the prompt asks
            // for {"findings": [...]} rather than a bare array
            ResponseFormat = ChatResponseFormat.CreateJsonObjectFormat(),
            Temperature = 0.2f
        });

    using var doc = JsonDocument.Parse(response.Value.Content[0].Text);
    return doc.RootElement.TryGetProperty("findings", out var arr)
        ? JsonSerializer.Deserialize<List<ReviewFinding>>(arr.GetRawText()) ?? []
        : [];
}
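
ReviewFinding itself is never defined in the article. A shape consistent with the prompt's schema and the comment-posting code in Step 4 — the property names and attributes here are assumptions:

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed class ReviewFinding
{
    [JsonPropertyName("line")] public int Line { get; set; }
    [JsonPropertyName("severity")] public string Severity { get; set; } = "warning";
    [JsonPropertyName("message")] public string Message { get; set; } = "";
    [JsonPropertyName("standard")] public string? Standard { get; set; }

    // Set after parsing, from the file being reviewed — the model only
    // sees one file at a time, so it never reports the path itself.
    [JsonIgnore] public string FilePath { get; set; } = "";
}
```

The JsonPropertyName attributes let the default serializer map the model's lowercase keys without a case-insensitive options object.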
      

Step 4: Post Inline Comments

public async Task PostFindings(
    string repo, int prNumber, List<ReviewFinding> findings)
{
    if (findings.Count == 0) return; // Silent pass

    var comments = findings.Select(f => new PullRequestReviewComment
    {
        Path = f.FilePath,
        Line = f.Line,
        Body = FormatComment(f)
    }).ToList();

    await _github.CreatePullRequestReviewAsync(repo, prNumber, new
    {
        body = $"AI Review: {findings.Count} finding(s)",
        @event = "COMMENT", // Never APPROVE or REQUEST_CHANGES
        comments
    });
}

private string FormatComment(ReviewFinding f)
{
    var severity = f.Severity == "error" ? "🔴" : "🟡";
    var standard = f.Standard != null ? $" ({f.Standard})" : "";
    return $"{severity} {f.Message}{standard}";
}
      

Three-Month Results

| Metric                          | Result      |
|---------------------------------|-------------|
| PRs reviewed                    | 847         |
| PRs with findings               | 312 (37%)   |
| Total findings                  | 489         |
| True positives (confirmed bugs) | 201 (41%)   |
| Useful warnings                 | 198 (40%)   |
| False positives                 | 90 (18%)    |
| Average review time             | 45 seconds  |
| Team satisfaction (survey)      | 4.1/5       |
      

Pitfalls

⚠️ Common Mistakes

1. Commenting on every PR

A bot that always has something to say trains the team to ignore it. The most important feature is knowing when to stay quiet. If there are no actionable findings, post nothing. A 37% comment rate is healthy — it means 63% of PRs passed without issues.

2. Using APPROVE or REQUEST_CHANGES

The AI bot should never block or approve PRs — only comment. Using REQUEST_CHANGES blocks merges on false positives. Using APPROVE gives false confidence. Always use COMMENT and let humans make the final call.

3. Reviewing without full file context

Reviewing only the diff misses critical context — is this method implementing an interface? Is the class already handling the concern elsewhere? Always send the full file and related interfaces alongside the diff.
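
FindRelatedFiles (used in Step 2) is also not shown. A minimal heuristic, sketched here, pulls base-type names out of the class declaration and maps them to candidate file names to fetch — crude, but it catches the common "implements an interface" case. The regex and the one-type-per-file naming convention are assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RelatedFileLocator
{
    // Matches "class Foo : IBar, BaseBaz" and captures the base-type list.
    private static readonly Regex BaseTypes = new(
        @"\bclass\s+\w+(?:<[^>]*>)?\s*:\s*([^\r\n{]+)",
        RegexOptions.Compiled);

    // Returns candidate file names like "IBar.cs" to fetch from the repo.
    public static List<string> CandidateFiles(string fileContent)
    {
        var match = BaseTypes.Match(fileContent);
        if (!match.Success) return [];

        return match.Groups[1].Value
            .Split(',')
            .Select(t => t.Trim())
            .Select(t => t.Contains('<') ? t[..t.IndexOf('<')] : t) // drop generic args
            .Where(t => t.Length > 0)
            .Select(t => $"{t}.cs")
            .ToList();
    }
}
```

A repository code search API is the sturdier alternative when the naming convention doesn't hold.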

4. Generic standards instead of team standards

"Consider using dependency injection" is generic advice that wastes everyone's time. Encode your team's actual standards — specific rule IDs, specific patterns to catch. The bot should review like a team member who knows your codebase.

Practical Takeaways

✅ Key Lessons
  • Silence is a feature. Only comment when there's something actionable. High noise → team ignores the bot → zero value. Target a 30-40% comment rate.
  • Encode your team's actual standards. The bot should review against your specific rules, not generic best practices. Include rule IDs and concrete examples in the prompt.
  • Never approve or request changes. The bot comments. Humans decide. This prevents false positives from blocking merges and false negatives from bypassing review.
  • Context is everything. Full file, related interfaces, and team standards go into every review. Diff-only review produces shallow, often wrong feedback.
  • Track false positive rate relentlessly. If false positives exceed 25%, the team will stop reading comments. Review and tune the prompt monthly based on which comments were acted on vs. dismissed.
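
The monthly tuning loop in the last bullet needs a concrete metric. A sketch, assuming each posted comment is later labeled acted-on or dismissed (a reaction emoji or resolved-thread status both work as the signal; the type names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum CommentOutcome { ActedOn, Dismissed, Unresolved }

public static class ReviewMetrics
{
    // Dismissed / (ActedOn + Dismissed); unresolved comments are excluded
    // because they carry no signal yet. Returns 0 when nothing is labeled.
    public static double FalsePositiveRate(
        IReadOnlyList<CommentOutcome> outcomes)
    {
        var actedOn = outcomes.Count(o => o == CommentOutcome.ActedOn);
        var dismissed = outcomes.Count(o => o == CommentOutcome.Dismissed);
        var labeled = actedOn + dismissed;
        return labeled == 0 ? 0 : (double)dismissed / labeled;
    }
}
```

When the rate drifts above the 25% threshold, tighten the "ONLY comment if" list in the prompt or raise the severity bar before posting.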