
When paramiko’s defaults silently get your IP banned: the look_for_keys and allow_agent trap

One day a multi-site administrator reported a strange bug: “After running the app’s SSH connection test 2-3 times, my IP can’t reach SSH on that server for a long while.” The errors came back as Connection refused or Connection closed by .... The server wasn’t down, and SSH from a different IP worked fine. The source IP was being temporarily banned at the server.

Two external investigation reports gave the cause: server-side protection mechanisms (fail2ban, or PerSourcePenalties in OpenSSH 9.8+) detect bursts of authentication failures within a short window and temporarily ban the source IP. But the user had only clicked the test button 2-3 times, so why were failures “spiking”?

The answer turned out to be paramiko’s default behavior.

paramiko’s default — trying many keys per connection

paramiko.SSHClient.connect() defaults two options to True:

client.connect(
    'host',
    pkey=my_key,
    # The following are True by default:
    # look_for_keys=True,   # also try ~/.ssh/id_* files
    # allow_agent=True,     # also try ssh-agent registered keys
)

When the explicitly passed pkey fails, paramiko falls back through ssh-agent registered keys → ~/.ssh/id_* files → password auth in order. Convenient for developers with a single key. Disastrous for a multi-site administrator:

  • The SSH agent has multiple per-site keys registered
  • ~/.ssh/ holds several id_rsa / id_ed25519 files
  • A single connect call ends up trying 5-10 keys in sequence
  • That blows past the server’s MaxAuthTries (default 6) on a single connection

So what looked to the user like “one connection test” was being seen by the server as “a suspicious IP racking up 5-10 auth failures in a row.” Repeat that 2-3 times and the protection mechanism declares the IP “exceeded threshold” and bans it.
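To make the arithmetic concrete, here is a minimal sketch that estimates the worst-case number of keys one connect call could offer. It does not call paramiko; count_key_candidates is a hypothetical helper, and the list of file names is an assumption based on the identity files paramiko has historically probed (check the version you run).

```python
from pathlib import Path

# Assumption: basenames probed under ~/.ssh when look_for_keys=True.
# Verify this list against the paramiko version you actually use.
DEFAULT_KEY_NAMES = ("id_rsa", "id_dsa", "id_ecdsa", "id_ed25519")

MAX_AUTH_TRIES = 6  # OpenSSH sshd default


def count_key_candidates(ssh_dir, agent_key_count, explicit_pkey=True,
                         look_for_keys=True, allow_agent=True):
    """Worst-case number of keys offered on a single connection."""
    count = 1 if explicit_pkey else 0
    if allow_agent:
        # Every key registered in ssh-agent is a candidate.
        count += agent_key_count
    if look_for_keys:
        # Only default identity files that exist are tried.
        count += sum(
            1 for name in DEFAULT_KEY_NAMES if (Path(ssh_dir) / name).is_file()
        )
    return count
```

With five agent keys and two identity files on disk, the estimate is 1 + 5 + 2 = 8, already above the default MaxAuthTries of 6. With both flags set to False it drops to 1. This is an upper bound: a real connection stops at the first key the server accepts.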

The fix — look_for_keys=False and allow_agent=False

paramiko exposes options to scope key trial. We set them explicitly in connect_kwargs:

connect_kwargs = {
    'pkey': my_key,
    'look_for_keys': False,   # don't try ~/.ssh/id_*
    'allow_agent': False,     # don't try ssh-agent keys
}
client.connect('host', **connect_kwargs)

Now only the explicitly passed pkey is tried, so a failed connection costs exactly one authentication attempt. The MaxAuthTries-overrun path is gone.

Backward compatibility is preserved. Existing pkey / key_filename users see no change, and password-auth users are unaffected.
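One way to keep the two flags from being forgotten is to build connect_kwargs in a single place. The sketch below is an assumption about how that could look, not the app’s actual code; build_connect_kwargs is a hypothetical helper, while pkey, key_filename, password, look_for_keys and allow_agent are real paramiko connect() parameters.

```python
def build_connect_kwargs(pkey=None, key_filename=None, password=None):
    """Return connect kwargs that never fall back to agent or ~/.ssh keys."""
    kwargs = {
        "look_for_keys": False,  # don't try ~/.ssh/id_*
        "allow_agent": False,    # don't try ssh-agent keys
    }
    # Pass through only the credentials the caller actually supplied,
    # so pkey, key_filename and password users all keep working.
    if pkey is not None:
        kwargs["pkey"] = pkey
    if key_filename is not None:
        kwargs["key_filename"] = key_filename
    if password is not None:
        kwargs["password"] = password
    return kwargs
```

A call site would then read client.connect('host', **build_connect_kwargs(pkey=my_key)), and a password-only profile gets the same two flags without any extra work.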

Lesson from V12: fix the same bug in 10 places at once

This is where the principle from the csh / bash-syntax SSH command bug paid off: the moment you find this kind of bug, grep for the same pattern everywhere. A full sweep of Connection(...) calls found 10 sites with missing or empty connect_kwargs:

Location                                | Role
core/ssh_utils.py::get_ssh_connection   | Maintenance main path
save_server_profile WP-CLI auto-detect  | Profile save
test_ssh_profile                        | Connection test
discover_server_paths                   | Path discovery
test_wpcli                              | WP-CLI test
install_wpcli                           | WP-CLI install
diagnose_server                         | Server diagnosis
fetch_plugins                           | Plugin list fetch
fetch_pending_plugins_for_site          | Pending plugin fetch
save_site WP-CLI auto-detect            | Site save

All 10 got a comment explaining the IP-block prevention rationale. If we’d patched only one, the same bug would have come back through a different code path.
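A plain grep is enough for the sweep, but it also matches comments and strings. As an alternative, here is a minimal sketch that uses Python’s ast module to list Connection(...) calls that pass no connect_kwargs keyword at all. find_unscoped_connections is a hypothetical name, and the sketch only covers the “missing” case, not the “empty dict” case.

```python
import ast


def find_unscoped_connections(source):
    """Return line numbers of Connection(...) calls lacking connect_kwargs."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both Connection(...) and some_module.Connection(...).
        if isinstance(func, ast.Name):
            name = func.id
        else:
            name = getattr(func, "attr", None)
        if name != "Connection":
            continue
        # A **kwargs spread has kw.arg == None, so it is reported too;
        # such call sites need a manual look.
        if not any(kw.arg == "connect_kwargs" for kw in node.keywords):
            missing.append(node.lineno)
    return sorted(missing)
```

Running it over each source file with open(path).read() gives a list of line numbers to review by hand.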

We also removed a warning UI — automatic retries made it worse

In an earlier round we’d actually tried a different approach: diagnose private-key permissions before the connection test → show a warning + “Fix and connect” button. Well intentioned, but in this exact scenario it backfired completely.

The user-experienced sequence was:

  1. Warning: “Your SSH key permissions are loose. Want to fix them?”
  2. User clicks “Fix and connect” → internal chmod 600 → connection test auto-retries
  3. Retry hits the same multi-key trial → fails
  4. “Authentication failed” → try another key → fail again
  5. Failures spike → IP ban triggers

“Warn the user, auto-fix, auto-retry” turns out to be a UX pattern that amplifies failure counts when the root cause lies elsewhere. The auto-retry hides multiple attempts behind a single user click.
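The amplification is easy to put into numbers. The following is a rough model, assuming the server cuts each connection off at MaxAuthTries and that every auto-retry opens a fresh connection; total_auth_failures is a hypothetical helper, not part of the app.

```python
def total_auth_failures(clicks, keys_per_connection,
                        auto_retries_per_click=0, max_auth_tries=6):
    """Rough count of auth failures the server sees from one source IP."""
    # The server drops the connection once max_auth_tries is reached.
    per_connection = min(keys_per_connection, max_auth_tries)
    # Each click opens one connection plus one per automatic retry.
    connections = clicks * (1 + auto_retries_per_click)
    return connections * per_connection


print(total_auth_failures(3, 8))                            # 18
print(total_auth_failures(3, 8, auto_retries_per_click=1))  # 36
print(total_auth_failures(3, 1))                            # 3
```

Three clicks with eight candidate keys already produce 18 failures; a single hidden retry per click doubles that to 36, while the scoped single-key case stays at 3.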

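Worse, paramiko doesn’t apply the private-key permission check that the OpenSSH client performs (the one that refuses keys readable by other users; StrictModes is the related server-side check), so for this app’s purposes the key works fine even with loose permissions. The “helpful” preventive warning was actually overreach in our specific context.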

We removed the warning UI in the same round (-120 / +31 = net 89 lines deleted). The _diagnoseAndOfferFix() function shell is kept for backward-compatible signatures, but its body is now a no-op.

Regression defense — an AST test that forbids empty connect_kwargs

Same two-layer defense pattern as V12: a regression test that fails the build if the bug ever comes back. tests/test_ssh_connection_isolation.py ships with 6 tests:

# Sketch
import ast

def test_all_connect_kwargs_have_look_for_keys_false():
    """Verify that every connect_kwargs / ck initialization
       contains 'look_for_keys': False and 'allow_agent': False."""
    for file in [CORE_SSH_UTILS, SITE_MANAGER_WEB]:
        for assign in find_connect_kwargs_assignments(file):
            keys = extract_dict_keys(assign.value)
            assert keys.get('look_for_keys') is False, \
                f"{file}:{assign.lineno} missing look_for_keys=False"
            assert keys.get('allow_agent') is False, \
                f"{file}:{assign.lineno} missing allow_agent=False"

If someone later adds a new SSH API and leaves connect_kwargs = {}, the build fails. Reading assign.value directly from the AST keeps comment / docstring strings from causing false positives.
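The two helpers the sketch relies on are not shown above. Here is one possible implementation, offered as an assumption rather than the project’s real code: this variant of find_connect_kwargs_assignments takes source text instead of a file path, and it only understands dict literals, so a connect_kwargs built with dict(...) or mutated later would need extra handling.

```python
import ast

# Variable names treated as connect kwargs (taken from the docstring above).
TARGET_NAMES = {"connect_kwargs", "ck"}


def find_connect_kwargs_assignments(source):
    """Yield ast.Assign nodes that assign to connect_kwargs or ck."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and any(
            isinstance(target, ast.Name) and target.id in TARGET_NAMES
            for target in node.targets
        ):
            yield node


def extract_dict_keys(value):
    """Map string keys of a dict literal to their literal values.

    Non-literal values (names, calls) are recorded as Ellipsis, so
    `keys.get(...) is False` only passes for a literal False.
    """
    result = {}
    if not isinstance(value, ast.Dict):
        return result
    for key, val in zip(value.keys, value.values):
        # key is None for a **spread entry; skip anything not a str constant.
        if isinstance(key, ast.Constant) and isinstance(key.value, str):
            result[key.value] = val.value if isinstance(val, ast.Constant) else ...
    return result
```

Because ast.parse discards comments and never treats a string as an assignment, a commented-out connect_kwargs line cannot satisfy or trip the check.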

Closing — three principles

  1. Library defaults aren’t always “correct” for your environment. paramiko’s look_for_keys=True / allow_agent=True is reasonable fallback behavior for single-key users, but dangerous in multi-key environments. Re-read library docs through the lens of your actual deployment environment before trusting defaults.
  2. Warning UI + auto-retry can amplify the problem. “Warn the user, auto-fix, auto-retry” is well-intentioned, but when the root cause lies somewhere else, the auto-retry compounds the failure count. UX-layer “helpfulness” can become operational debt.
  3. Find a same-shaped bug? Grep + AST regression test on the spot. This is the third entry in this pattern. The /bin/sh -c wrap / _safe_run helper (V12) and now the connect_kwargs scoping share the same shape: a fix needs to apply across every SSH path in the codebase. Making cross-grep + AST regression test a standard step changes how confident you can be about “did we get them all?”

SSH-related code tends to be exactly where “library defaults break under our environment” and “the same pattern scatters across many call sites” collide. If you’re building a multi-site administrator tool on top of paramiko, defaulting look_for_keys and allow_agent to False from the start is the safer bet — that’s what this round taught us.