Renaming a site without losing its data — separating display name from a stable identifier

A client asks you to rename a site from acme-staging to the production name acme. The moment you rename it in the app, the DB backups, screenshots, and thumbnails you had been collecting all appear to disappear.

The files are still on disk, but the new directory is empty. The data hasn’t carried over as “the same site.” It’s a trap you can fall into on day one, and we did — with our original design.

Here’s how we redesigned things so renames don’t orphan data.

Why the data appears to disappear — the site name was the key

The original design of WP Maintenance Manager decided file locations based on the site name.

backups/
  acme-staging/        ← DB backups for site "acme-staging"
    backup_20260101_120000.sql

screenshots/
  acme-staging/        ← Screenshots for the same site
    home_pre.png

After renaming acme-staging → acme, a new empty directory backups/acme/ gets created and starts from zero. The old directory is still there, but the app treats it as “some other site’s stale data” and doesn’t surface it.

Site names are natural candidates for labels, but in practice they get renamed all the time. Cleaning up client-name typos, promoting staging to production, renaming on a re-org — the reasons to rename are endless.

The fix — give every site an immutable `site_id`

Every site now carries a _id in the form site_xxxxxxxxxxxx (the prefix site_ followed by the first 12 hex characters of a random UUID), and every file location keys off that _id instead of the site name.

# core/site_id_utils.py
import uuid

def generate_site_id():
    # "site_" + the first 12 hex chars of a random UUID4
    return f"site_{uuid.uuid4().hex[:12]}"

_id is assigned once and never changes. Even after the site is renamed, its files stay in the same directory, backups/site_a1b2c3d4e5f6/, and the app keeps using the existing contents.

It’s a classic two-layer design: the display name (site name) is separate from the internal identifier (_id).
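To make the two layers concrete, here is a minimal sketch of path resolution keyed by _id. The backup_dir helper, the BACKUP_ROOT constant, and the dict shape of a site record are assumptions for illustration, not the app's actual code.

```python
from pathlib import Path

BACKUP_ROOT = Path("backups")  # hypothetical root directory


def backup_dir(site: dict, root: Path = BACKUP_ROOT) -> Path:
    """Return the backup directory for a site, keyed by its immutable _id."""
    return root / site["_id"]


# Assumed record shape: an immutable "_id" plus a free-form display "name".
site = {"_id": "site_a1b2c3d4e5f6", "name": "acme-staging"}
before = backup_dir(site)

site["name"] = "acme"  # a rename touches only the display name
after = backup_dir(site)
```

Because the display name never enters the path, before and after point at the same directory.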

A migration that doesn’t break existing data

The hardest part was handling existing users whose data was already keyed by site name.

ensure_site_ids() is an idempotent migration:

Auto-generates and assigns _id only to sites that don’t have one
Leaves sites that already have a _id untouched
Uses FileLock + tempfile + os.replace() for atomic writes, so a crash mid-write won’t corrupt anything

It runs at app startup and at the entry points of site-related APIs (three paths in total). The user doesn’t have to do anything — IDs are silently assigned in the background.
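A rough sketch of what such an idempotent pass could look like, assuming the sites are stored as a JSON list of objects in a single file (the storage format is not described here, so treat it as hypothetical). To stay within the standard library, this version leaves out the cross-process FileLock and shows only the tempfile + os.replace() part.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path


def generate_site_id() -> str:
    return f"site_{uuid.uuid4().hex[:12]}"


def ensure_site_ids(sites_path: Path) -> bool:
    """Assign _id to every site lacking one. Returns True if the file was rewritten."""
    sites = json.loads(sites_path.read_text(encoding="utf-8"))
    changed = False
    for site in sites:
        if not site.get("_id"):  # sites that already have an _id are left untouched
            site["_id"] = generate_site_id()
            changed = True
    if not changed:
        return False  # second and later runs are no-ops

    # Write to a temp file in the same directory, then swap it in atomically,
    # so a crash mid-write leaves the original file intact.
    fd, tmp_name = tempfile.mkstemp(dir=sites_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(sites, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, sites_path)
    except BaseException:
        os.unlink(tmp_name)  # clean up the temp file if anything failed
        raise
    return True
```

The temp file is created next to the target on purpose: os.replace() is only atomic when source and destination are on the same filesystem.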

The file-side migration follows the same pattern. On first launch, if backups/<site_name>/ exists, rename it to backups/<site_id>/ (but if the new-format directory already exists, leave both alone). Idempotent.
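The directory rule fits in a few lines. This is a sketch under assumptions: the function name migrate_site_dir and its signature are made up for illustration.

```python
from pathlib import Path


def migrate_site_dir(root: Path, site_name: str, site_id: str) -> bool:
    """Rename root/<site_name>/ to root/<site_id>/ once. Returns True if renamed."""
    old_dir = root / site_name
    new_dir = root / site_id
    if not old_dir.is_dir():
        return False  # nothing to migrate, or already migrated
    if new_dir.exists():
        return False  # both exist: leave both alone rather than guess at a merge
    old_dir.rename(new_dir)
    return True
```

Running it a second time hits the first guard and does nothing, which is what makes it safe to call on every launch.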

Tying logs to sites — strict + compat hybrid matching

Log entries also carry a site_id now. But existing log entries don’t have one, because they were written before site_id was introduced.

The UI scoping feature (filter logs for a specific site) is implemented as a hybrid:

New logs (with site_id) → match by strict equality
Old logs (without site_id) → fall back to site_name compat matching

The result: logs from before and after a rename appear together in the same scope. The user never feels like “past history disappeared.”
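A minimal sketch of the hybrid match. The entry fields site_id and site_name follow the description above; the function name and the compat_names parameter (the set of names the caller is willing to accept for legacy entries) are assumptions, since how compat matching resolves names is not spelled out here.

```python
def log_matches_site(entry: dict, site: dict, compat_names: set) -> bool:
    """Decide whether a log entry belongs to a site.

    Entries with a site_id match by strict equality only.
    Entries without one fall back to matching on site_name.
    """
    entry_id = entry.get("site_id")
    if entry_id:
        return entry_id == site["_id"]  # strict: the name is ignored entirely
    return entry.get("site_name") in compat_names  # compat: legacy entries


site = {"_id": "site_a1b2c3d4e5f6", "name": "acme"}
logs = [
    {"site_id": "site_a1b2c3d4e5f6", "site_name": "acme", "msg": "new entry"},
    {"site_name": "acme-staging", "msg": "legacy entry"},
    {"site_id": "site_ffffffffffff", "site_name": "acme", "msg": "other site"},
]
scoped = [e for e in logs if log_matches_site(e, site, {"acme", "acme-staging"})]
```

Note that the third entry is excluded even though its name matches: once an entry has a site_id, the name no longer counts.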

A post-release blunder

For honesty: shortly after release, we shipped a bug. The is_valid_site_id validation function had a regex that only matched the new-generation format, and rejected some legitimate existing IDs.

# the broken version
import re

SITE_ID_RE = re.compile(r'^site_[0-9a-f]{12}$')  # exactly 12 hex

A few longer ID formats — leftovers from the migration’s earlier iterations — got rejected outright, and the symptom was “every site has disappeared.” The lesson is mundane but real: fully audit existing data formats before tightening validation. Adding validation after the fact is exactly where these regressions hide.
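One way to run that audit is to test a candidate pattern against every ID already in storage before shipping it. The sketch below is illustrative only: audit_ids is a hypothetical helper, and the longer sample ID is made up, since the actual legacy formats are not listed here.

```python
import re


def audit_ids(existing_ids, pattern):
    """Return the existing IDs that a candidate validation pattern would reject."""
    return [sid for sid in existing_ids if not pattern.fullmatch(sid)]


candidate = re.compile(r"site_[0-9a-f]{12}")  # the strict 12-hex rule
existing = [
    "site_a1b2c3d4e5f6",      # new-generation format
    "site_a1b2c3d4e5f6a7b8",  # made-up example of a longer legacy ID
]
rejected = audit_ids(existing, candidate)
```

A non-empty rejected list is the signal to loosen the pattern or migrate those IDs first, before the validator goes live.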

Takeaway — separating stable identifier from display name

Separating “the name displayed to humans” from “the immutable identifier” is a classic software-design pattern, but introducing it after the product is already in production is expensive. The idempotent migration, the edit-vs-duplicate ownership split, the backward-compatible validation — drop any one of these and existing user data evaporates.

Since separating site name (display) from site_id (immutable), clients can have their site names corrected, staging promoted to production, or org-rename refactoring done — all while keeping every byte of historical data tied to the same site. Designing your file locations to trust the display name 100% on day one closes that door before you even reach for it. That’s the retrospective on this one.