Debugging an AI agent integration: A production tale
The Problem
I had just built an AI-powered newsletter system. The idea was simple: when I publish a blog post in StoryChief, an AI agent analyzes it, drafts a newsletter, and presents it in my admin dashboard for one-click approval.
The code looked perfect. The architecture was sound. But when I published a test post, nothing happened.
This is the story of how I debugged a multi-worker, multi-database, AI-powered system in production.
The Setup
The system has three moving parts:
- StoryChief Webhook Receiver (Cloudflare Worker) - Receives webhooks when I publish
- Newsletter Agent (Cloudflare AI) - Analyzes posts and generates drafts
- Astro Admin Dashboard (Cloudflare Worker) - Displays drafts for approval
The flow should be:
StoryChief → Webhook → Agent → D1 Database → Admin Dashboard
But the admin dashboard was empty.
Bug #1: The Missing Table
First, I checked the logs. The error was immediate:
D1_ERROR: no such table: broadcast_drafts
The mistake: I created the migration file (002_broadcast_drafts.sql) but never ran it against production. It only existed in my local database.
The fix:
npx wrangler d1 migrations apply prod-db --remote
Lesson learned: Local migrations don't automatically sync to production. Always run migrations against remote databases explicitly.
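One way to catch a missing migration before a webhook does is to compare the tables the code expects against SQLite's catalog. The sketch below assumes a D1-style binding. `D1Like`, `missingTables`, and `findMissingTables` are hypothetical names, and the interface models only the `prepare(...).all()` calls used here.

```typescript
// Minimal shape of the D1 calls used below (an assumption; the real binding
// type is D1Database from @cloudflare/workers-types).
interface D1Like {
  prepare(sql: string): { all<T>(): Promise<{ results: T[] }> };
}

// Pure helper: which required tables are absent from the list of present ones?
function missingTables(required: string[], present: string[]): string[] {
  const have = new Set(present);
  return required.filter((name) => !have.has(name));
}

// Read table names from SQLite's catalog and compare them with what the code expects.
async function findMissingTables(db: D1Like, required: string[]): Promise<string[]> {
  const { results } = await db
    .prepare("SELECT name FROM sqlite_master WHERE type = 'table'")
    .all<{ name: string }>();
  return missingTables(required, results.map((row) => row.name));
}
```

Called from a health-check route with `["broadcast_drafts"]`, a check like this would report the missing table without needing to publish a test post.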
Bug #2: The Wrong Endpoint
After fixing the database, I published another post. Still nothing. But this time, the logs showed a different error:
Unexpected token 'e', "error code: 522" is not valid JSON
Error 522 is Cloudflare's "Connection timed out" error. The webhook receiver was calling an API endpoint that didn't exist, then trying to parse the plain-text error body as JSON.
I looked at the webhook code:
await fetch("https://richsd.com/api/admin/drafts", {
  method: "POST",
  // ...
});
The mistake: /api/admin/drafts only supported GET (list drafts) and DELETE (reject drafts). There was no POST handler to create drafts.
The fix: Create a new endpoint:
// src/pages/api/admin/drafts/create.ts
import type { APIRoute } from "astro";

// Create a newsletter draft from the fields the webhook receiver sends.
export const POST: APIRoute = async ({ request, locals }) => {
  const { post_slug, subject_line, summary, a2ui_payload } = await request.json();
  await locals.runtime.env.DB.prepare(`
    INSERT INTO broadcast_drafts (post_slug, subject_line, summary, a2ui_payload)
    VALUES (?, ?, ?, ?)
  `).bind(post_slug, subject_line, summary, a2ui_payload).run();
  return new Response(JSON.stringify({ success: true }), {
    status: 201,
    headers: { "Content-Type": "application/json" },
  });
};
Lesson learned: RESTful API design matters. An endpoint only accepts the HTTP methods it explicitly handles, so confirm a handler exists for the method you are calling.
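The `Unexpected token 'e'` message in this bug came from parsing a plain-text error body as JSON. Here is a minimal sketch of a guard the webhook receiver could apply before parsing. `ResponseLike`, `interpretBody`, and `readJson` are hypothetical names, not part of the original code.

```typescript
// Result of reading an HTTP response: parsed data, or a readable error.
type JsonResult =
  | { ok: true; data: unknown }
  | { ok: false; status: number; error: string };

// Only the parts of a fetch Response this sketch needs.
interface ResponseLike {
  status: number;
  text(): Promise<string>;
}

// Decide what a status/body pair means without ever throwing on bad JSON.
function interpretBody(status: number, body: string): JsonResult {
  if (status < 200 || status >= 300) {
    // Surface the raw body (e.g. "error code: 522") instead of parsing it.
    return { ok: false, status, error: body.slice(0, 200) };
  }
  try {
    return { ok: true, data: JSON.parse(body) };
  } catch {
    return { ok: false, status, error: `invalid JSON: ${body.slice(0, 200)}` };
  }
}

// Read the body as text first, then interpret it.
async function readJson(res: ResponseLike): Promise<JsonResult> {
  return interpretBody(res.status, await res.text());
}
```

With a guard like this, the log line would carry the status code and the raw body rather than a JSON parser error.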
Bug #3: The Missing API Key
Even after fixing the endpoint, the agent wasn't generating drafts. The webhook was calling the endpoint, but the AI wasn't running.
I checked the webhook receiver code and found this:
const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
  messages: [/* ... */]
});
It was using Cloudflare's AI binding, but I had originally planned to use Google's Gemini API. The GEMINI_API_KEY environment variable was never set.
The fix:
- Add the secret to both workers:
npx wrangler secret put GEMINI_API_KEY
- Update env.d.ts to include the type:
interface Env {
  GEMINI_API_KEY: string;
  // ...
}
Lesson learned: Environment variables don't magically propagate. Each Cloudflare Worker needs its own secrets, even if they're in the same project.
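A missing secret is easier to spot when the worker fails fast with a clear message. The sketch below is one possible check, run at the top of a request handler. `missingSecrets` and `assertSecrets` are hypothetical helpers, and treating an empty string as missing is an assumption.

```typescript
// Return the names of required variables that are missing or empty in env.
function missingSecrets(env: Record<string, unknown>, required: string[]): string[] {
  return required.filter((key) => {
    const value = env[key];
    return typeof value !== "string" || value.length === 0;
  });
}

// Throw one descriptive error up front instead of failing deep inside a request.
function assertSecrets(env: Record<string, unknown>, required: string[]): void {
  const missing = missingSecrets(env, required);
  if (missing.length > 0) {
    throw new Error(`Missing secrets: ${missing.join(", ")}`);
  }
}
```

Calling `assertSecrets(env, ["GEMINI_API_KEY"])` in each worker makes an unset key show up in the logs by name.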
Bug #4: The Mock Data Confusion
When I finally got the UI working, I saw this:
Review Newsletter Draft
**Subject:** TEST: Agent-Driven Approvals
This is a test summary generated by the agent mockup.
Wait, that's not AI-generated—that's the mock data I inserted during development!
The mistake: I had run a test migration (999_mock_data.sql) that inserted fake data into the production database.
The fix:
npx wrangler d1 execute prod-db --remote --command="DELETE FROM broadcast_drafts WHERE post_slug = 'a2ui-test';"
Lesson learned: Never commit test data to production migrations. Use separate files for development fixtures.
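One way to enforce this is a pre-deploy check that flags fixture-looking files in the migrations directory before anything is applied remotely. This is a sketch only: the name pattern and `fixtureMigrations` are assumptions for illustration, and the only file name taken from this post is `999_mock_data.sql`.

```typescript
// File-name fragments treated as fixtures (an assumed convention; adjust as needed).
const FIXTURE_PATTERN = /(mock|fixture|seed)/i;

// List the migration files that look like development fixtures.
function fixtureMigrations(files: string[]): string[] {
  return files.filter((file) => FIXTURE_PATTERN.test(file));
}
```

A deploy script could refuse to run the remote migration step whenever `fixtureMigrations` returns a non-empty list.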
The AI Angle
This debugging session was interesting because I was pair-programming with an AI (Google's Gemini). Here's what it got right and wrong:
What the AI Got Right
- Immediate diagnosis: When I said "nothing is happening," it immediately asked to check the D1 database and logs.
- Systematic debugging: It worked through the stack methodically: database → API → environment variables.
- Code generation: It created the missing /api/admin/drafts/create endpoint on the first try.
What I Had to Guide
- Context switching: The AI didn't realize the webhook receiver was a separate Cloudflare Worker project outside the current workspace.
- Production vs. Local: It assumed migrations would sync automatically between environments.
- API design: It initially suggested modifying the existing /api/admin/drafts endpoint instead of creating a new one.
Key Takeaways
For Developers Building Multi-Worker Systems
- Explicit is better than implicit: Don't assume environment variables, migrations, or configurations will propagate between workers.
- Log everything: The console.error statements in my webhook receiver saved hours of debugging.
- Test the integration, not just the units: Each component worked perfectly in isolation. The bugs only appeared when they talked to each other.
- Use proper HTTP status codes: The 522 error immediately told me it was a timeout, not a logic error.
For AI Pair Programmers
- Be specific about your environment: "I have two separate Cloudflare Workers" is more helpful than "my webhook isn't working."
- Share error messages verbatim: The exact error text (D1_ERROR: no such table) led to instant diagnosis.
- Confirm each fix before moving on: I tested each change in production before proceeding to the next bug.
What's Next?
The system is now working end-to-end:
- I publish a blog post in StoryChief
- The webhook receiver calls Cloudflare AI to generate a draft
- The draft appears in my admin dashboard
- I click "Approve & Send" and 500+ subscribers get the newsletter
Zero manual copy-pasting. Zero context switching. Just pure, automated bliss.
Tech Stack: Astro, Cloudflare Workers, D1, Cloudflare AI, Resend
Bugs Fixed: 4
Time to Debug: ~30 minutes (with AI assistance)
Time Saved Per Newsletter: ~15 minutes
If you're building agentic systems, remember: the code is the easy part. The hard part is making all the pieces talk to each other in production.