Debugging an AI agent integration: A production tale

By Rich Martinez

Debugging an AI Agent Integration: A Production War Story 🪖😒

The Problem

I had just built an AI-powered newsletter system. The idea was simple: when I publish a blog post in StoryChief, an AI agent analyzes it, drafts a newsletter, and presents it in my admin dashboard for one-click approval.

The code looked perfect. The architecture was sound. But when I published a test post, nothing happened.

This is the story of how I debugged a multi-worker, multi-database, AI-powered system in production.

The Setup

The system has three moving parts:

  1. StoryChief Webhook Receiver (Cloudflare Worker) - Receives webhooks when I publish
  2. Newsletter Agent (Cloudflare AI) - Analyzes posts and generates drafts
  3. Astro Admin Dashboard (Cloudflare Worker) - Displays drafts for approval

The flow should be:

StoryChief → Webhook → Agent → D1 Database → Admin Dashboard

But the admin dashboard was empty.

Bug #1: The Missing Table

First, I checked the logs. The error was immediate:

D1_ERROR: no such table: broadcast_drafts

The mistake: I created the migration file (002_broadcast_drafts.sql) but never ran it against production. It only existed in my local database.

The fix:

npx wrangler d1 migrations apply prod-db --remote

Lesson learned: Local migrations don't automatically sync to production. Always run migrations against remote databases explicitly.
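One way to catch this class of mistake is to compare the migration files in the repo against the names the remote database reports as applied. The snippet below is a minimal sketch of that comparison only. The helper name pendingMigrations and the file 001_init.sql are hypothetical, and how you obtain the applied list (for example from the output of wrangler d1 migrations list with --remote) is left as an assumption.

```typescript
// Hypothetical helper: returns the local .sql migration files that the
// remote database has not applied yet. Names are compared as plain strings.
function pendingMigrations(localFiles: string[], appliedNames: string[]): string[] {
  const applied = new Set(appliedNames);
  return localFiles
    .filter((name) => name.endsWith(".sql") && !applied.has(name))
    .sort();
}

// Example: the situation from Bug #1. "001_init.sql" is a made-up earlier migration.
const pending = pendingMigrations(
  ["001_init.sql", "002_broadcast_drafts.sql"],
  ["001_init.sql"],
);

// A deploy script could refuse to continue while this message is non-empty.
const pendingMessage =
  pending.length > 0 ? `Unapplied migrations: ${pending.join(", ")}` : "";
```

Running a check like this before each deploy turns a runtime "no such table" error into a failed deploy step.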

Bug #2: The Wrong Endpoint

After fixing the database, I published another post. Still nothing. But this time, the logs showed a different error:

Unexpected token 'e', "error code: 522" is not valid JSON

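Error 522 is Cloudflare's "Connection timed out" error. Its body is the plain-text string error code: 522, which is why parsing the response as JSON threw. The webhook receiver was trying to call an API endpoint that didn't exist.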
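The JSON parse failure hid the real problem for a moment, so it can help to check the status code before parsing. Here is a minimal sketch of that idea. The names ParsedBody and parseJsonBody are hypothetical and not part of the original webhook receiver.

```typescript
// Result of trying to read a JSON response: either parsed data or a
// description of what went wrong, including the upstream status code.
type ParsedBody<T> =
  | { ok: true; data: T }
  | { ok: false; status: number; detail: string };

// Hypothetical helper: only attempts JSON.parse on 2xx responses, and never throws.
function parseJsonBody<T>(status: number, bodyText: string): ParsedBody<T> {
  if (status < 200 || status >= 300) {
    return { ok: false, status, detail: bodyText.slice(0, 200) };
  }
  try {
    return { ok: true, data: JSON.parse(bodyText) as T };
  } catch {
    return { ok: false, status, detail: `non-JSON body: ${bodyText.slice(0, 200)}` };
  }
}

// Usage inside a worker (sketch):
//   const res = await fetch(url, init);
//   const parsed = parseJsonBody<{ success: boolean }>(res.status, await res.text());
const failed = parseJsonBody(522, "error code: 522");
const succeeded = parseJsonBody<{ success: boolean }>(201, '{"success":true}');
```

With this shape, the log line for the failed call carries the status (522) and the raw body instead of an "Unexpected token" message.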

I looked at the webhook code:

await fetch("https://richsd.com/api/admin/drafts", {
  method: "POST",
  // ...
});

The mistake: /api/admin/drafts only supported GET (list drafts) and DELETE (reject drafts). There was no POST handler to create drafts.

The fix: Create a new endpoint:

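// src/pages/api/admin/drafts/create.ts
import type { APIRoute } from "astro";

// Creates a newsletter draft row from the JSON body sent by the webhook receiver.
export const POST: APIRoute = async ({ request, locals }) => {
  const { post_slug, subject_line, summary, a2ui_payload } = await request.json();

  // Bound parameters keep the values out of the SQL string itself.
  await locals.runtime.env.DB.prepare(`
    INSERT INTO broadcast_drafts (post_slug, subject_line, summary, a2ui_payload)
    VALUES (?, ?, ?, ?)
  `).bind(post_slug, subject_line, summary, a2ui_payload).run();

  return new Response(JSON.stringify({ success: true }), {
    status: 201,
    headers: { "Content-Type": "application/json" },
  });
};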

Lesson learned: RESTful API design matters. An endpoint only supports the HTTP methods you explicitly handle, so check which handlers exist before calling it with a new method.
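An explicit 405 response for unhandled methods would have made this bug easier to spot. The sketch below builds such a response as a plain object. The names HANDLED_METHODS, Rejection, and rejectUnhandledMethod are hypothetical, and wiring the result into an actual route handler is left out.

```typescript
// The methods the original /api/admin/drafts endpoint handled.
const HANDLED_METHODS = ["GET", "DELETE"];

// Plain description of a response, so the sketch has no runtime dependencies.
interface Rejection {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: returns null when the method is handled, otherwise
// a 405 description with an Allow header listing the supported methods.
function rejectUnhandledMethod(method: string, handled: string[]): Rejection | null {
  if (handled.includes(method.toUpperCase())) return null;
  return {
    status: 405,
    headers: { "Content-Type": "application/json", Allow: handled.join(", ") },
    body: JSON.stringify({ error: `Method ${method} not allowed` }),
  };
}

// In a handler this could become:
//   new Response(rejection.body, { status: rejection.status, headers: rejection.headers })
const rejection = rejectUnhandledMethod("POST", HANDLED_METHODS);
```

A caller that receives 405 plus an Allow header knows immediately that the route exists but the method does not.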

Bug #3: The Missing API Key

Even after fixing the endpoint, the agent wasn't generating drafts. The webhook was calling the endpoint, but the AI wasn't running.

I checked the webhook receiver code and found this:

const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
  messages: [/* ... */]
});

It was using Cloudflare's AI binding, but I had originally planned to use Google's Gemini API. The GEMINI_API_KEY environment variable was never set.

The fix:

  1. Add the secret to both workers:

     npx wrangler secret put GEMINI_API_KEY

  2. Update env.d.ts to include the type:

     interface Env {
       GEMINI_API_KEY: string;
       // ...
     }

Lesson learned: Environment variables don't magically propagate. Each Cloudflare Worker needs its own secrets, even if they're in the same project.
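A small startup check can surface a missing secret on the first request instead of deep inside the AI call. This is a minimal sketch. The helper name missingSecrets is hypothetical, and the env object is modeled here as a plain record rather than a real Worker binding.

```typescript
// Hypothetical helper: returns the names of required secrets that are
// absent, not strings, or blank in the given env object.
function missingSecrets(
  env: Record<string, unknown>,
  required: readonly string[],
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Example: a worker deployed without the secret vs. one that has it.
// "example-value" is a placeholder, not a real key.
const missing = missingSecrets({}, ["GEMINI_API_KEY"]);
const noneMissing = missingSecrets({ GEMINI_API_KEY: "example-value" }, ["GEMINI_API_KEY"]);
```

Each worker would run its own check with its own list of required names, since secrets are set per worker.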

Bug #4: The Mock Data Confusion

When I finally got the UI working, I saw this:

Review Newsletter Draft
**Subject:** TEST: Agent-Driven Approvals
This is a test summary generated by the agent mockup.

Wait, that's not AI-generated—that's the mock data I inserted during development!

The mistake: I had run a test migration (999_mock_data.sql) that inserted fake data into the production database.

The fix:

npx wrangler d1 execute prod-db --remote --command="DELETE FROM broadcast_drafts WHERE post_slug = 'a2ui-test';"

Lesson learned: Never commit test data to production migrations. Use separate files for development fixtures.
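One cheap guard is to scan the migration file names for fixture-like names before applying anything to a remote database. The sketch below assumes a naming convention in which fixture files contain "mock", "seed", or "fixture". That convention and the helper name fixtureLikeFiles are hypothetical, chosen only because they would have flagged 999_mock_data.sql.

```typescript
// Assumed convention: fixture files contain one of these words in their name.
const FIXTURE_PATTERN = /(mock|seed|fixture)/i;

// Hypothetical helper: returns the file names that look like development fixtures.
function fixtureLikeFiles(files: string[]): string[] {
  return files.filter((name) => FIXTURE_PATTERN.test(name));
}

// Example with the two files named in this post.
const suspicious = fixtureLikeFiles(["002_broadcast_drafts.sql", "999_mock_data.sql"]);
```

A deploy script could stop and ask for confirmation whenever the returned list is non-empty.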

The AI Angle

This debugging session was interesting because I was pair-programming with an AI (Google's Gemini). Here's what it got right and wrong:

What the AI Got Right

  1. Immediate diagnosis: When I said "nothing is happening," it immediately asked to check the D1 database and logs.
  2. Systematic debugging: It worked through the stack methodically: database → API → environment variables.
  3. Code generation: It created the missing /api/admin/drafts/create endpoint on the first try.

What I Had to Guide

  1. Context switching: The AI didn't realize the webhook receiver was a separate Cloudflare Worker project outside the current workspace.
  2. Production vs. Local: It assumed migrations would sync automatically between environments.
  3. API design: It initially suggested modifying the existing /api/admin/drafts endpoint instead of creating a new one.

Key Takeaways

For Developers Building Multi-Worker Systems

  1. Explicit is better than implicit: Don't assume environment variables, migrations, or configurations will propagate between workers.
  2. Log everything: The console.error statements in my webhook receiver saved hours of debugging.
  3. Test the integration, not just the units: Each component worked perfectly in isolation. The bugs only appeared when they talked to each other.
  4. Use proper HTTP status codes: The 522 error immediately told me it was a timeout, not a logic error.

For AI Pair Programmers

  1. Be specific about your environment: "I have two separate Cloudflare Workers" is more helpful than "my webhook isn't working."
  2. Share error messages verbatim: The exact error text (D1_ERROR: no such table) led to instant diagnosis.
  3. Confirm each fix before moving on: I tested each change in production before proceeding to the next bug.

What's Next?

The system is now working end-to-end:

  1. I publish a blog post in StoryChief
  2. The webhook receiver calls Cloudflare AI to generate a draft
  3. The draft appears in my admin dashboard
  4. I click "Approve & Send" and 500+ subscribers get the newsletter

Zero manual copy-pasting. Zero context switching. Just pure, automated bliss.


Tech Stack: Astro, Cloudflare Workers, D1, Cloudflare AI, Resend
Bugs Fixed: 4
Time to Debug: ~30 minutes (with AI assistance)
Time Saved Per Newsletter: ~15 minutes

If you're building agentic systems, remember: the code is the easy part. The hard part is making all the pieces talk to each other in production.