
Writing.

Notes from building production AI systems for paying customers: what transfers, what breaks, and the unglamorous engineering underneath the headline numbers.

01

The agent that audited my own contracts

Pointing an LLM at the operational risk no human has the hours to read.

FXVPS is a multi-tenant SaaS I run solo. Over the years, customer-facing copy accumulates (onboarding emails, refund replies, affiliate terms, SLA language), written by different hands, at different times, against different versions of the policy. Individually each message is fine. Collectively, they drift out of agreement with each other and with the actual terms. Nobody re-reads years of email to check.

The problem

Contradictory commitments are a quiet liability. If one email promises a refund window the policy does not offer, or an SLA the infrastructure cannot meet, you have created an obligation you did not intend, and you will not find out until a customer holds you to it.

What I built

An LLM content-audit pipeline that ingests customer-facing emails, extracts every explicit commitment (refunds, affiliate terms, retention, SLA), and cross-checks them against the canonical policy and against each other. It flags contradictions with the specific sentences and a suggested correction, so the output is a worklist, not a vibe.
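A minimal sketch of the cross-check step in Go, assuming the LLM extraction has already turned each email into structured records. The `Commitment` and `Finding` types, the kind names, and `Audit` are hypothetical illustrations, not the pipeline's actual code. It runs both checks described above: each email against the canonical policy, and emails against each other for terms the policy does not pin down.

```go
package main

import (
	"fmt"
	"sort"
)

// Commitment is one explicit promise extracted from an email (hypothetical shape;
// the LLM extraction that produces it is assumed, not shown).
type Commitment struct {
	EmailID  string // which email the promise came from
	Kind     string // e.g. "refund_window_days" (hypothetical kind names)
	Value    string // normalised value, e.g. "14"
	Sentence string // exact sentence, so a human can review the finding
}

// Finding is one worklist item.
type Finding struct {
	Kind      string
	EmailID   string // empty for email-vs-email disagreements
	Sentence  string
	Issue     string
	Suggested string // policy value to correct to, when one exists
}

// Audit flags commitments that contradict the policy, plus kinds the policy
// does not cover on which emails disagree with each other.
func Audit(policy map[string]string, commitments []Commitment) []Finding {
	var findings []Finding
	values := map[string]map[string]bool{} // kind -> distinct values seen

	for _, c := range commitments {
		if values[c.Kind] == nil {
			values[c.Kind] = map[string]bool{}
		}
		values[c.Kind][c.Value] = true

		// Check 1: email vs canonical policy.
		if want, ok := policy[c.Kind]; ok && c.Value != want {
			findings = append(findings, Finding{
				Kind: c.Kind, EmailID: c.EmailID, Sentence: c.Sentence,
				Issue:     fmt.Sprintf("says %s, policy says %s", c.Value, want),
				Suggested: want,
			})
		}
	}

	// Check 2: email vs email, for kinds the policy does not cover.
	var kinds []string
	for k := range values {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds) // deterministic worklist order
	for _, k := range kinds {
		if _, covered := policy[k]; covered || len(values[k]) < 2 {
			continue
		}
		var vs []string
		for v := range values[k] {
			vs = append(vs, v)
		}
		sort.Strings(vs)
		findings = append(findings, Finding{Kind: k, Issue: fmt.Sprintf("emails disagree: %v", vs)})
	}
	return findings
}

func main() {
	policy := map[string]string{"refund_window_days": "14"}
	emails := []Commitment{
		{"e1", "refund_window_days", "14", "Refunds are available within 14 days."},
		{"e2", "refund_window_days", "30", "You can request a refund within 30 days."},
		{"e3", "affiliate_payout_days", "30", "Commissions are paid after 30 days."},
		{"e4", "affiliate_payout_days", "45", "Commissions are paid 45 days after referral."},
	}
	for _, f := range Audit(policy, emails) {
		fmt.Printf("%+v\n", f)
	}
}
```

Keeping the exact sentence on every finding is what makes the output reviewable line by line rather than a summary you have to trust.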

The run

Across 126 emails it surfaced four real contradictions: refund, affiliate, retention, and SLA. I reviewed each, shipped corrections to 38 active customers with zero failures, then wrapped the audit in a monthly cron hook so the corpus is re-checked every month instead of drifting unnoticed for years.

The takeaway

Agents are not only for the headline product features. The highest-ROI place to point them is often the unglamorous operational risk no human has the hours to read. Pick a corpus that is too big to police manually and too consequential to ignore. That is where an agent pays for itself in week one.

Multi-tenant Go/PHP backend, Anthropic Claude, scheduled hooks.

  • Agents
  • Content audit
  • Anthropic Claude
02

$540k of phantom data, and the real lesson underneath it

Most data problems are really undeclared source-of-truth problems.

Traders on FXVPS run automated strategies across MetaTrader 4 and MetaTrader 5. Their journal needs one trustworthy view of performance, but the trade history arrives from two platforms that each behave as if they are the source of truth.

The problem

Ingest both naively and you get duplicates, double-counted volume, and balance/credit events that masquerade as P&L. The numbers look plausible, which is the dangerous part: a reconciliation error does not announce itself, it just quietly makes your reporting wrong.

What I found

A multi-platform reconciliation pass turned up $540k of phantom data across 7 accounts (volume and positions that did not correspond to anything real) plus $4k of genuine accounting drift hiding underneath the noise.

What I built

Cross-platform ingestion with platform-aware duplicate detection and balance/credit filtering, plus an explicit decision about the source of truth: every account's P&L is re-anchored to the broker, not to either platform's local view. Migrations were ordered index-first so the job fit inside the host's boot window.
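A simplified sketch of the ingestion filter in Go. It assumes each row has already been normalised into a hypothetical `Record`, with `Kind` mapped from each platform's own type codes during parsing, and that both platforms expose a shared broker-side identifier (`BrokerID`, also hypothetical) for the same trade. Those assumptions stand in for the real platform-aware logic, which is not shown here.

```go
package main

import "fmt"

// Platform identifies where a history row came from.
type Platform string

const (
	MT4 Platform = "MT4"
	MT5 Platform = "MT5"
)

// Record is a hypothetical normalised trade-history row.
type Record struct {
	Platform Platform
	Account  string
	BrokerID string  // assumed shared identifier for the same trade on both platforms
	Kind     string  // "trade", "balance" or "credit" (mapped from platform codes)
	Volume   float64 // lots
	Profit   float64
}

// Clean drops balance/credit events and keeps only the first copy of each
// trade, regardless of which platform reported it.
func Clean(records []Record) []Record {
	seen := map[string]bool{}
	var out []Record
	for _, r := range records {
		if r.Kind != "trade" {
			continue // deposits and credit would otherwise masquerade as P&L
		}
		key := r.Account + "|" + r.BrokerID
		if seen[key] {
			continue // same trade already ingested from the other platform
		}
		seen[key] = true
		out = append(out, r)
	}
	return out
}

func main() {
	rows := []Record{
		{MT4, "acc1", "T100", "trade", 1.0, 50},
		{MT5, "acc1", "T100", "trade", 1.0, 50}, // same trade, seen again via MT5
		{MT4, "acc1", "", "balance", 0, 1000},   // deposit, not P&L
		{MT5, "acc1", "T101", "trade", 0.5, -20},
	}
	var vol, pnl float64
	for _, r := range Clean(rows) {
		vol += r.Volume
		pnl += r.Profit
	}
	fmt.Printf("volume=%.1f lots, pnl=%.2f\n", vol, pnl) // volume=1.5 lots, pnl=30.00
}
```

Ingested naively, the same four rows would report 2.5 lots and 1,080 of "P&L". If the platforms only expose their own ticket numbers, deriving the shared key becomes the genuinely platform-aware part of the job.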

The takeaway

Most data problems are really undeclared source-of-truth problems. The clever part is not the dedup maths, it is deciding, on paper, which system is allowed to be right when two disagree. Name the source of truth first; the reconciliation logic follows from it.
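One way to make "name the source of truth first" concrete is to write the decision down as a constant and derive everything else from it. A minimal sketch in Go, assuming per-account P&L figures in integer cents keyed by system name; `SourceOfTruth`, `Reanchor`, and the system names are hypothetical. The broker figure wins, and every other system's disagreement is reported as drift rather than averaged away.

```go
package main

import (
	"errors"
	"fmt"
)

// SourceOfTruth is the on-paper decision: when systems disagree, the broker wins.
const SourceOfTruth = "broker"

// ErrNoTruth means the authoritative figure is missing, so nothing is
// re-anchored (stopping is safer than guessing).
var ErrNoTruth = errors.New("source of truth missing for account")

// Reanchor takes one account's P&L in cents as reported by each system and
// returns the authoritative figure plus every other system's drift from it.
func Reanchor(reported map[string]int64) (int64, map[string]int64, error) {
	truth, ok := reported[SourceOfTruth]
	if !ok {
		return 0, nil, ErrNoTruth
	}
	drift := map[string]int64{}
	for src, v := range reported {
		if src != SourceOfTruth && v != truth {
			drift[src] = v - truth // positive: that system overstates P&L
		}
	}
	return truth, drift, nil
}

func main() {
	pnl, drift, err := Reanchor(map[string]int64{
		"broker": 125000, // cents
		"mt4":    125000,
		"mt5":    131000,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(pnl, drift) // 125000 map[mt5:6000]
}
```

Note how little logic is left once the decision is made: the function is a lookup and a subtraction.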

Go, PHP, MT4/MT5, Railway, multi-tenant data pipeline.

  • Reconciliation
  • Data integrity
  • MT4/MT5
03

Eighteen months of running agents in production: what actually transfers

Agents are leverage on the parts of engineering nobody enjoys.

For a year and a half I have operated a custom multi-agent ecosystem (Claude Code skills, sub-agent definitions, MCP integrations) across eight live projects. Not demos; systems with paying customers attached.

The claim I would push back on

"Agents replace engineers." In practice it is something duller and more useful: agents are leverage on the parts of engineering nobody enjoys, the audits, the reconciliations, the migrations, the checks that should happen and do not.

What has actually moved the needle

  • Tool surface design. An agent is only as good as the tools you hand it. Most of the work is API and interface design (what the tool exposes, what it refuses, what it returns), not prompt wording.
  • Sub-agent delegation. One agent that tries to do everything is reliably worse than three that each do one thing well, with clear handoffs between them.
  • Eval-driven iteration. If you cannot measure the output, you are not engineering, you are guessing with extra steps. Evals are the difference between a demo and a system.
  • Context engineering. What you leave out of the context matters as much as what you put in. Most "the model got confused" problems are context problems.
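Of these, eval-driven iteration is the easiest to show in a few lines. A minimal eval harness sketch in Go: the `Case`, `Result`, and `Run` names are hypothetical, and `agent` is a stand-in for whatever calls the model. The fake agent in `main` exists only so the harness runs offline.

```go
package main

import "fmt"

// Case is one eval example: an input plus a predicate on the output, so a
// check can be exact-match, substring, or a parsed-JSON assertion.
type Case struct {
	Name  string
	Input string
	Check func(output string) bool
}

// Result summarises one run so runs can be compared across prompt, tool or
// context changes.
type Result struct {
	Passed, Total int
	Failed        []string
}

// Run executes every case against the system under test.
func Run(agent func(string) string, cases []Case) Result {
	r := Result{Total: len(cases)}
	for _, c := range cases {
		if c.Check(agent(c.Input)) {
			r.Passed++
		} else {
			r.Failed = append(r.Failed, c.Name)
		}
	}
	return r
}

func main() {
	// Fake agent for offline runs; swap in a real model call.
	fake := func(in string) string {
		if in == "refund window?" {
			return "14 days"
		}
		return "unknown"
	}
	cases := []Case{
		{"refund", "refund window?", func(o string) bool { return o == "14 days" }},
		{"sla", "uptime SLA?", func(o string) bool { return o == "99.9%" }},
	}
	r := Run(fake, cases)
	fmt.Printf("%d/%d passed, failing: %v\n", r.Passed, r.Total, r.Failed) // 1/2 passed, failing: [sla]
}
```

The named failure list matters more than the score: it tells you which change broke which behaviour.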

Why it transfers

The orchestration layer is the same whether it lives in your IDE or in production. The patterns I rely on in Claude Code map directly onto standalone deployments via the Anthropic SDK and MCP: same tool design, same delegation, same eval loop.

The takeaway

The discipline that wins is the old one: design the failure modes before the happy path. The tools just got dramatically better at executing it.

Anthropic Claude, Claude Code, MCP, sub-agent orchestration, eval harnesses.

  • Agent orchestration
  • MCP
  • Evals
// let's talk

Have a system worth getting right?

main@eaymon.com

or read the full case studies · find me on LinkedIn