Sweeping i18n leaks with four parallel AI agents — from 300 candidates down to 60 real bugs

For any app past a certain size that’s gone bilingual, the question “how much hardcoded Japanese is still hiding in our repo?” never quite goes away. A naive grep for [ぁ-んァ-ヶ一-龯] returns thousands of hits, and the vast majority are inside translation tables, already-branched code, or comments. The real leaks are buried.

For one cleanup pass we attacked this with four parallel AI investigation agents plus AST-based false-positive filtering. The result: ~300 candidates detected → ~60 real leaks → cleaned up across five rounds. This post walks through the flow and the most interesting bug it uncovered — paying English users had been getting Japanese email from the Stripe webhook for months.

Why a plain grep isn’t enough

A repository-wide grep returns thousands of hits, but the contents fall into four bins: translation tables / already branched by lang == 'en' / comments and docstrings / real leaks. The first three are harmless. Only the last shows Japanese to English users. The trouble is that grep can’t separate them, and the volume is too high for a human to triage one by one.
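To make the four bins concrete, here is a minimal sketch of the same scan in Python. The sample text and the `jp_hit_lines` name are made up for illustration; only the character classes come from the grep above. Every line matches, and nothing in the match says which bin it belongs to.

```python
import re

# Same character classes as the grep: hiragana, katakana, common kanji.
JP_CHARS = re.compile(r"[\u3041-\u3093\u30a1-\u30f6\u4e00-\u9faf]")

# Illustrative sample only: one line per bin.
SAMPLE = '''\
LABELS = {"ja": "保存", "en": "Save"}
msg = "Done" if lang == "en" else "完了"
# 設定を読み込む
print("ライセンスが無効です")
'''

def jp_hit_lines(text):
    """Return 1-based line numbers that contain any Japanese character."""
    return [i for i, line in enumerate(text.splitlines(), start=1)
            if JP_CHARS.search(line)]

# Translation table, branched code, comment, and real leak all match alike.
print(jp_hit_lines(SAMPLE))
```

Only the fourth line would ever reach an English user, but a character-level match has no way to know that.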

Four parallel agents for “wide and shallow” detection

The approach: launch AI investigation agents in parallel with each one assigned a different surface area.

[Agent 1] templates/*.html + lang/*.json    — data-i18n attribute gaps
[Agent 2] server/wpmm-license/*.php          — license API
[Agent 3] server/wpmm-web/*.php               — landing-page API
[Agent 4] core/*.py + tools/*.py             — desktop app code

Each agent gets the same prompt template — “enumerate user-facing JP hardcodes, decide as best you can whether each is already branched” — and runs independently. Parallelism keeps wall-clock time below a single-agent run, and having four perspectives on the same kind of problem improves coverage.
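The fan-out itself can be sketched with the standard library. This is an illustration under assumptions, not our actual harness: `run_agent` is a hypothetical stand-in for whatever call launches an agent, and the shape of the returned candidates is invented. Only the surface areas and the gist of the prompt come from the setup above.

```python
from concurrent.futures import ThreadPoolExecutor

# Surface areas from the assignment above; one agent per entry.
SURFACES = {
    "agent-1": ["templates/*.html", "lang/*.json"],
    "agent-2": ["server/wpmm-license/*.php"],
    "agent-3": ["server/wpmm-web/*.php"],
    "agent-4": ["core/*.py", "tools/*.py"],
}

PROMPT = ("Enumerate user-facing JP hardcodes in {globs}; decide as best "
          "you can whether each is already branched.")

def run_agent(name, globs):
    """Hypothetical stand-in for a real agent call; returns candidate dicts."""
    prompt = PROMPT.format(globs=", ".join(globs))
    # A real implementation would send `prompt` to an agent and parse its report.
    return [{"agent": name, "glob": g, "prompt": prompt} for g in globs]

def run_all(surfaces):
    """Run every agent concurrently and merge the reports into one list."""
    with ThreadPoolExecutor(max_workers=len(surfaces)) as pool:
        futures = [pool.submit(run_agent, n, g) for n, g in surfaces.items()]
        return [item for f in futures for item in f.result()]

merged = run_all(SURFACES)
print(len(merged))
```

Threads are enough here because each agent call would spend its time waiting on I/O rather than on local computation.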

The merged report came in around 300 candidates. Still noisy.

Filter false positives with the AST

Hidden in those 300 were heavy false-positive clusters:

Location	Count	Why it’s a false positive
`templates/tos.html`	63	`tosJa` / `tosEn` blocks both exist; `switchLang` toggles them
`core/report_generator.py`	141	All inside `if lang == 'en'` branches or `_JA / _EN` variant maps

Going through 200 items by hand wasn’t realistic. Instead, we wrote a Python script using the ast module to mechanically decide “does this function have a lang branch around the JP literal?” A sketch:

import ast

def has_lang_branch(func_node):
    """Does this function use `lang` in a conditional?"""
    for node in ast.walk(func_node):
        if isinstance(node, ast.If):
            for sub in ast.walk(node.test):
                if isinstance(sub, ast.Name) and sub.id == 'lang':
                    return True
    return False

def has_jp_literal(func_node):
    """Any Constant string node containing Japanese characters?"""
    for node in ast.walk(func_node):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            # U+3040..U+9FFF: hiragana, katakana, and CJK unified ideographs
            if any('\u3040' <= c <= '\u9fff' for c in node.value):
                return True
    return False

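# Collect every function definition in the file under inspection
with open('core/report_generator.py', encoding='utf-8') as fh:
    tree = ast.parse(fh.read())
functions = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]

# A real leak = has JP literal AND no lang branch
real_leaks = [f for f in functions
              if has_jp_literal(f) and not has_lang_branch(f)]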

Running this against the 141 in report_generator.py gave essentially zero real leaks (the one residual hit was a docstring false positive). The 63 in tos.html were also fully cleared by checking DOM structure + the presence of switchLang.
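The residual docstring hit could probably be removed mechanically as well. The following is a sketch of one way to do it, not the script we ran: it skips any string constant that is the first statement of a function or class body. The helper names and the sample source are hypothetical.

```python
import ast

def is_japanese(text):
    """True if any character falls in U+3040..U+9FFF."""
    return any('\u3040' <= c <= '\u9fff' for c in text)

def docstring_node_ids(func_node):
    """Collect ids of Constant nodes that act as docstrings inside func_node."""
    found = set()
    for node in ast.walk(func_node):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            body = node.body
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                found.add(id(body[0].value))
    return found

def has_jp_literal_excluding_docstrings(func_node):
    """Like has_jp_literal, but ignores docstrings."""
    skip = docstring_node_ids(func_node)
    for node in ast.walk(func_node):
        if (isinstance(node, ast.Constant) and isinstance(node.value, str)
                and id(node) not in skip and is_japanese(node.value)):
            return True
    return False

# Illustrative input: one function with only a JP docstring, one real leak.
SOURCE = '''
def documented():
    """日本語の説明だけ"""
    return "ok"

def leaking():
    return "保存しました"
'''

tree = ast.parse(SOURCE)
funcs = {n.name: n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
print(has_jp_literal_excluding_docstrings(funcs["documented"]))  # False
print(has_jp_literal_excluding_docstrings(funcs["leaking"]))     # True
```

A docstring that is actually shown to users (for example as CLI help text) would be missed by this variant, so it trades one kind of error for another.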

Net: about 60 real leaks, finally a tractable pile.

The 60 real leaks — and the worst one

Among those 60 was the one with the largest impact: all four Stripe-webhook emails (purchase complete, renewal, payment failed, plan change) were hardcoded in Japanese. Paying English users had been getting purchase confirmations, failure notices, and everything else in Japanese. It is the kind of bug that quietly persists until someone goes looking for it.

The fix was a one-function language inference from the Stripe event:

/** Infer display language from Stripe event currency. */
function lang_from_currency(string $currency): string {
    $en_currencies = ['usd'];
    return in_array(strtolower($currency), $en_currencies, true) ? 'en' : 'ja';
}

This $lang is then passed into send_license_email / send_payment_failed_email / send_plan_changed_email / send_renewal_email. Each of them branches the subject and body on it and switches mb_language() between 'uni' and 'Japanese', so English subjects are MIME-encoded as UTF-8 Base64 instead of ISO-2022-JP. The subject encoding is a small detail but a real one: with mb_language('Japanese'), English subjects were being MIME-encoded as ISO-2022-JP, which can raise spam scores on Gmail and Outlook.

On the license API side, we consolidated all language detection into one helper:

// server/wpmm-license/lib/i18n_helpers.php
function resolve_request_lang(?array $body = null): string {
    if (isset($body['language']) && in_array($body['language'], ['ja','en'], true)) {
        return $body['language'];
    }
    // Accept-Language fallback
    if (preg_match('/^en\b/i', $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? '')) {
        return 'en';
    }
    return 'ja';
}

validate.php / release_machine.php / webhook.php / verify_email.php now all require_once this file and call resolve_request_lang() instead of rolling their own detection. An English plan-name table (PLAN_NAMES_EN) lives in the same file, so plan_name($code, $lang) becomes the single source of truth for plan names.

The remaining real leaks were similar in shape: core/license.py, core/key_perms.py, the desktop launchers (_launcher.sh / .ps1), and the landing-page APIs (checkout.php / chat.php / rate.php). All got the same treatment — extract a small helper, branch on language, route everything through one entry point.
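For the Python side, the "small helper, one entry point" shape might look roughly like this. It is a sketch with hypothetical names: `MESSAGES`, `normalize_lang`, `t`, and the message keys are invented for illustration and are not the actual contents of core/license.py. The 'ja' fallback mirrors the PHP helper above.

```python
# Hypothetical message table; keys and strings are illustrative only.
MESSAGES = {
    "license_invalid": {
        "ja": "ライセンスが無効です",
        "en": "The license is invalid.",
    },
    "license_expired": {
        "ja": "ライセンスの有効期限が切れています",
        "en": "The license has expired.",
    },
}

SUPPORTED = ("ja", "en")
DEFAULT_LANG = "ja"  # same fallback as resolve_request_lang()

def normalize_lang(lang):
    """Reduce values like 'en-US' or 'EN_us' to a supported two-letter code."""
    code = (lang or "").lower().replace("_", "-").split("-")[0]
    return code if code in SUPPORTED else DEFAULT_LANG

def t(key, lang):
    """Single entry point for user-facing strings."""
    entry = MESSAGES[key]  # a KeyError here surfaces a missing key early
    return entry[normalize_lang(lang)]

print(t("license_invalid", "en-US"))
```

Because every caller has to pass a language to `t`, a new call site cannot show a string without making the language choice explicit.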

Closing — agents + AST as a “wide → narrow” pipeline

Three principles worth keeping from this round:

Parallel investigation agents are well suited to wide repository-scale tasks. When the work is “find every X across the codebase,” splitting the surface area across multiple agents in parallel covers more ground than a single sequential pass, and the independent perspectives reduce blind spots.
Don’t trust agent reports as-is; verify with the AST. Agent judgments include false positives, and “already branched” calls are exactly where they’re least reliable. Inserting an AST-based mechanical filter as a second pass cuts the noise dramatically (roughly 200 false positives removed in our case).
Consolidate fixes into helpers, not patches. Once you’ve found the leaks, routing them through a single helper (lang_from_currency, resolve_request_lang, plan_name) means a newly added API naturally goes through the same path. The “uh, I forgot to branch” failure mode becomes structurally harder.

The fear “how much Japanese is still hardcoded in our repo?” doesn’t fully go away — but with a parallel-agents + AST pipeline in your toolkit, you can at least quantify it on demand instead of carrying it as a vague anxiety.