Three pitfalls in a dashboard cache lifetime — boot-time restore, TTL, and partial invalidation

We added a cache-first design to the cross-site updates dashboard, then wired the site-list badge to read from the same cache. Both shipped without problems.

But once it was in real use, we ran into three distinct pitfalls around the cache’s “lifetime” in quick succession: “the badges all disappear on refresh,” “the 7-day TTL is too short,” and “running maintenance on one site clears all the badges.” Each is a small spec decision in isolation, but from the user’s side all three feel like the same symptom: “badges don’t stick around the way I expect.” This post walks through each fix and what we got wrong.

Pitfall 1 — All badges disappear on refresh

The first report was blunt: “Is it intentional that the pending-plugin badges all vanish when I refresh?”

It wasn’t intentional; it was a bug. _updatesDashState was designed as a localStorage-backed persistent cache, but it was only restored from localStorage at the moment the dashboard was opened.

// Old: restored only when the dashboard opens
function openUpdatesDashboard() {
  _loadUpdatesDashStateFromLocalStorage();   // ← the only place this got called
  // ... render ...
}

That left a gap on every page load:

App boot: in-memory _updatesDashState sits at its initial empty value
Site-list render: _getPendingPluginCountForSite sees the empty array, returns null
Result: every badge hidden

Restoration only happened when the user explicitly opened the dashboard. The original design assumed “the cache belongs to the dashboard,” but a later consumer (the site-list badge) renders before the dashboard is ever opened, and that path wasn’t accounted for.
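To make the failure path concrete, here is a minimal sketch of what a badge reader like _getPendingPluginCountForSite might look like. The state shape ({ sites: [{ site_id, plugins }], total_pending_count, loadedAt }) follows the fields used elsewhere in this post; the function body, the explicit state parameter, and returning null for a site missing from the cache are assumptions for illustration.

```javascript
// Hypothetical sketch of the badge reader. State is passed in explicitly
// so the logic can be exercised without a browser.
function getPendingPluginCountForSite(state, siteId) {
  // No cached sites at all: return null so the caller hides the badge.
  // Before the boot-time restore fix, every site took this branch.
  if (!state || !Array.isArray(state.sites) || state.sites.length === 0) {
    return null;
  }
  const site = state.sites.find(s => s.site_id === siteId);
  // Site not in the cache: count unknown, so also hide the badge.
  return site ? site.plugins.length : null;
}

// In-memory value at app boot, before anything has been restored.
const emptyState = { sites: [], total_pending_count: 0, loadedAt: 0 };
console.log(getPendingPluginCountForSite(emptyState, 'site-a')); // null

// Value after a successful restore from localStorage.
const restoredState = {
  sites: [{ site_id: 'site-a', plugins: ['plugin-1', 'plugin-2'] }],
  total_pending_count: 2,
  loadedAt: Date.now(),
};
console.log(getPendingPluginCountForSite(restoredState, 'site-a')); // 2
```

The reader itself is correct in both cases; what differs is only whether the state was restored before the first render.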

The fix was a single call in the DOMContentLoaded handler:

document.addEventListener('DOMContentLoaded', () => {
  _completedSiteTimestamps = _loadCompletedSites();
  try {
    _loadUpdatesDashStateFromLocalStorage();  // ← restore at boot, always
  } catch (e) {
    // Don't block startup if localStorage is unavailable
  }
  // ... rest of init
});

The try/catch is there because in some browser environments (certain private-browsing modes, or storage blocked by the user’s settings) accessing localStorage throws. Restoration is a nice-to-have: if it fails, the app should still start.
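A minimal sketch of a defensive restore, assuming the cache is stored as JSON under a single key. The key name 'updatesDashState', the getStorage parameter (a function such as () => window.localStorage), and the fall-back-to-empty behavior are assumptions, not the project’s actual code. Passing a function rather than the storage object keeps the property access itself inside the try, and the same catch also covers JSON.parse throwing on a corrupted entry.

```javascript
// Hypothetical storage key; the real key name is not given in the post.
const DASH_STATE_KEY = 'updatesDashState';

function emptyDashState() {
  return { sites: [], total_pending_count: 0, loadedAt: 0 };
}

// getStorage is called inside the try block, because in some browsers
// merely touching window.localStorage throws.
function restoreDashState(getStorage) {
  try {
    const raw = getStorage().getItem(DASH_STATE_KEY);
    if (raw === null) return emptyDashState();   // nothing saved yet
    const parsed = JSON.parse(raw);              // throws on corrupted data
    if (!parsed || !Array.isArray(parsed.sites)) return emptyDashState();
    return parsed;
  } catch (e) {
    // Storage unavailable or unreadable: start empty, never block boot.
    return emptyDashState();
  }
}

// In the browser this would run once at boot, e.g.:
// document.addEventListener('DOMContentLoaded', () => {
//   _updatesDashState = restoreDashState(() => window.localStorage);
// });
```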

Lesson 1 — Tie cache restoration to the app’s lifecycle, not its consumer

Putting restoration at “the first screen that uses the cache” was the mistake. If another screen might consume the same cache earlier, restoration belongs at app boot, once. Adding more readers later doesn’t require touching the restoration logic.

Pitfall 2 — A 7-day TTL was too short

The next report: “Does the pending-plugin display in the list disappear after 7 days?”

The 7-day figure was a gut-feel guess made at implementation time. We had assumed users would open the dashboard regularly. Real usage looks more like “glance at the dashboard occasionally, then work site by site, referring to the badges.” Under that pattern, the cache expires before the next dashboard visit.

The client’s request was clear: “I want it visible until the next check.” So we bumped the TTL to 30 days:

const _DASH_CACHE_MAX_AGE_MS = 30 * 24 * 60 * 60 * 1000;  // 7→30 days
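For reference, 30 × 24 × 60 × 60 × 1000 = 2,592,000,000 ms. A minimal sketch of how such a constant might be applied when deciding whether a cached state is still usable; the helper name isDashCacheExpired, the injectable clock, and treating a missing loadedAt as expired are assumptions, and the post doesn’t say whether the real check runs on restore or on read.

```javascript
const DASH_CACHE_MAX_AGE_MS = 30 * 24 * 60 * 60 * 1000; // 2,592,000,000 ms

// Hypothetical helper: true when the cache was never fetched or is older
// than the max age. `now` is injectable so the check is easy to test.
function isDashCacheExpired(loadedAt, now = Date.now()) {
  if (!loadedAt) return true; // no fetch time recorded
  return now - loadedAt > DASH_CACHE_MAX_AGE_MS;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const nowMs = Date.UTC(2024, 0, 31);
// 8 days old: expired under the old 7-day TTL, still valid under 30 days.
console.log(isDashCacheExpired(nowMs - 8 * DAY_MS, nowMs));  // false
console.log(isDashCacheExpired(nowMs - 31 * DAY_MS, nowMs)); // true
```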

But 30 days is long, so without showing how old the visible number is, users risk treating stale data as fresh. We added an age suffix to the tooltip:

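function _formatPendingPluginCountAgeSuffix() {
  const loaded = _updatesDashState.loadedAt;
  if (!loaded) return '';  // never fetched: no suffix
  const days = _daysSinceTimestamp(loaded);
  // Single quotes on purpose: '${days}' is not interpolated here; it is a
  // placeholder the helper replaces with the "N days ago" text.
  return _formatDaysAgoText(days, '(last fetched: ${days})');
  // → e.g. "(last fetched: 5 days ago)"
}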

_formatDaysAgoText is the helper from the elapsed-days badge, reused as-is. Keeping the “N days ago” phrasing in one place makes small reuses like this easy.
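The post doesn’t show the two helpers. A minimal sketch of what they might look like, assuming _formatDaysAgoText substitutes a literal '${days}' placeholder in its template with an “N days ago” phrase. The underscore-free names, flooring to whole days, and the “today” wording for zero days are all assumptions for illustration.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical: whole days elapsed since a millisecond timestamp (floored).
function daysSinceTimestamp(ts, now = Date.now()) {
  return Math.max(0, Math.floor((now - ts) / MS_PER_DAY));
}

// Hypothetical: replaces the literal placeholder '${days}' in the template
// with human-readable elapsed-time text.
function formatDaysAgoText(days, template) {
  const text = days === 0 ? 'today' : days === 1 ? '1 day ago' : `${days} days ago`;
  return template.replace('${days}', text);
}

// Same shape as the tooltip helper above, with state and clock passed in.
function formatPendingPluginCountAgeSuffix(loadedAt, now = Date.now()) {
  if (!loadedAt) return '';
  const days = daysSinceTimestamp(loadedAt, now);
  return formatDaysAgoText(days, '(last fetched: ${days})'); // literal placeholder
}

const t0 = Date.UTC(2024, 0, 1);
console.log(formatPendingPluginCountAgeSuffix(t0, t0 + 5 * MS_PER_DAY));
// → (last fetched: 5 days ago)
```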

Lesson 2 — Pair longer TTLs with explicit freshness

Stretching a TTL increases the risk of staring at stale data. Rather than cutting it shorter to compensate, show the user how stale and let them decide. 30 days + “5 days ago” is more informative than a hard 7-day cutoff.

Pitfall 3 — One site’s maintenance wiped every badge

The third pitfall surfaced in real usage: “I ran maintenance on one site, but the badges for other sites are gone too. Is that right?”

The old behavior was “clear the entire dashboard cache on a successful maintenance run”:

// Old: wipe everything
function _invalidateUpdatesDashCache() {
  _updatesDashState = { sites: [], total_pending_count: 0, loadedAt: 0 };
  _saveUpdatesDashStateToLocalStorage();
}

This was the same trap we hit in the streaming-log story: if a state clear isn’t scoped to the execution, it sweeps up unrelated state. Maintaining one site shouldn’t blow away the other nine sites’ badges; that’s worse than not clearing at all.

The fix takes a list of executed site IDs and does a partial invalidation:

function _invalidatePendingPluginCacheForSiteIds(siteIds) {
  if (!Array.isArray(siteIds) || siteIds.length === 0) return;
  const idSet = new Set(siteIds);
  _updatesDashState.sites = _updatesDashState.sites.filter(
    s => !idSet.has(s.site_id)
  );
  _updatesDashState.total_pending_count =
    _updatesDashState.sites.reduce((sum, s) => sum + s.plugins.length, 0);
  _saveUpdatesDashStateToLocalStorage();
  // Leave loadedAt alone — this is a partial update, the "fetch time" is still valid
}
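To check the behavior in isolation, here is the same filtering logic rewritten as a pure function that returns a new state instead of mutating a global and writing to localStorage. The pure-function form, the name invalidateSites, and the sample data are illustrative, not from the codebase.

```javascript
// Pure variant of the partial invalidation above.
function invalidateSites(state, siteIds) {
  if (!Array.isArray(siteIds) || siteIds.length === 0) return state;
  const idSet = new Set(siteIds);
  const sites = state.sites.filter(s => !idSet.has(s.site_id));
  return {
    ...state, // keeps loadedAt: the remaining entries are as fresh as before
    sites,
    total_pending_count: sites.reduce((sum, s) => sum + s.plugins.length, 0),
  };
}

const before = {
  sites: [
    { site_id: 1, plugins: ['p1', 'p2'] },
    { site_id: 2, plugins: ['p3'] },
    { site_id: 3, plugins: [] },
  ],
  total_pending_count: 3,
  loadedAt: 1700000000000,
};
const after = invalidateSites(before, [1]);
console.log(after.sites.map(s => s.site_id)); // site IDs 2 and 3 remain
console.log(after.total_pending_count);       // 1
```

One detail worth watching: Set.prototype.has uses SameValueZero equality, so if the payload carries IDs as strings ('1') while the cache stores numbers (1), nothing is cleared. Normalizing ID types on both sides avoids that silent no-op.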

The caller extracts the executed site IDs from payload.sites and passes them in. When payload.sites === null (a full-site run), it falls back to the full clear. So we didn’t remove full-clear; we split it into an explicit “clear everything” path and a “clear only these” path.
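A sketch of that branching, assuming a payload whose sites field is either null (full run) or an array of objects carrying site_id. The payload shape, the handler name onMaintenanceSucceeded, and injecting the two invalidation functions as callbacks are hypothetical; in the app they would presumably be _invalidateUpdatesDashCache and _invalidatePendingPluginCacheForSiteIds.

```javascript
// Hypothetical handler for a successful maintenance run. The invalidation
// functions are passed in so the branching can be tested on its own.
function onMaintenanceSucceeded(payload, { invalidateAll, invalidateSiteIds }) {
  if (payload.sites === null) {
    // Full-site run: every cached entry is potentially stale.
    invalidateAll();
    return;
  }
  // Scoped run: clear only the sites that were actually executed.
  // A missing or non-array field yields an empty list, which the partial
  // invalidation treats as a no-op.
  const siteIds = Array.isArray(payload.sites)
    ? payload.sites.map(s => s.site_id)
    : [];
  invalidateSiteIds(siteIds);
}

// Usage with recording stubs:
const calls = [];
const handlers = {
  invalidateAll: () => calls.push('all'),
  invalidateSiteIds: ids => calls.push(ids.join(',')),
};
onMaintenanceSucceeded({ sites: null }, handlers);
onMaintenanceSucceeded({ sites: [{ site_id: 'a' }, { site_id: 'b' }] }, handlers);
console.log(calls); // one full clear, then a scoped clear of a and b
```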

We kept the old _invalidateUpdatesDashCache() rather than deleting it. A legitimate “wipe everything” caller may appear later (a manual “clear cache” button, say), and keeping that explicit option available is safer than removing it.

Lesson 3 — Always scope state-clears to the execution

This has the same shape as the running-site detection trap. When you write something like “on maintenance start, reset the cache,” check carefully that the implicit scope isn’t “everything.” If the execution scope varies, take it as an argument and limit the clear to that scope. It takes little effort and pays off significantly.

Closing — cache-lifetime design comes down to “your reader’s lifecycle” and “your clear’s scope”

The three pitfalls came from separate reports, but the underlying principles converge:

Restore caches at app boot, period. Tying restoration to the “first consumer” silently breaks when a second consumer is added. DOMContentLoaded plus a one-time restore guarded by try/catch is the safe default.
Pair long TTLs with explicit freshness signals. 30 days plus a “fetched N days ago” tooltip carries more information than a hard 7-day cutoff. Showing the staleness lets the user decide.
Scope every state-clear to the execution. Don’t default to “wipe all.” Take the execution-scope ID list as an argument and clear only that. Keep the explicit “wipe all” available as a separate function for cases that legitimately need it.

Caches are about speed, but once you’re in production, the lifetime design ends up determining the UX more than the speed gain does. Next time we add a cache, the three patterns above — boot-time restore, freshness visualization, scoped invalidation — go in as a template from the start.