SSH Hardening Implementation - Nate's Portfolio

The Problem

My homelab infrastructure was reachable over a Tailscale VPN for remote administration, a common and reasonable pattern. However, every system used password-based SSH authentication, which created several security risks, summarized in the before/after comparison below:

Before (Vulnerable)

Password-based SSH on all systems
Unlimited authentication attempts
No automated threat response
Single factor (password only)
Compromised password = full access

After (Hardened)

Ed25519 key-based authentication
Password authentication disabled
fail2ban monitoring all access
Automatic IP banning on failures
Defense-in-depth architecture

The risk model: if my Tailscale account were ever compromised, an attacker would have direct network access to infrastructure. Password-based SSH would then allow brute force attacks against any exposed system.

The Solution

Security Architecture

I implemented a defense-in-depth approach with two complementary layers:

┌─────────────────────────────────────────────────────────────────┐
│                    SSH HARDENING ARCHITECTURE                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   LAYER 1: Authentication Hardening                             │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  Ed25519 Key-Based Authentication                       │   │
│   │  • Private key never leaves client device               │   │
│   │  • Public key deployed to all servers                   │   │
│   │  • Password authentication completely disabled          │   │
│   └─────────────────────────────────────────────────────────┘   │
│                              │                                   │
│                              ▼                                   │
│   LAYER 2: Intrusion Prevention                                 │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  fail2ban Active Monitoring                             │   │
│   │  • Watches /var/log/auth.log for failures               │   │
│   │  • 5 failed attempts → 10-minute IP ban                 │   │
│   │  • VPN network whitelisted to prevent self-lockout      │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                  │
├─────────────────────────────────────────────────────────────────┤
│   SYSTEMS PROTECTED                                              │
│   • 3 Proxmox Hypervisors (root with keys, prohibit-password)   │
│   • 3 Linux VMs (user account + sudo, root login disabled)      │
└─────────────────────────────────────────────────────────────────┘

Different Security Models by System Type

Enterprise environments treat hypervisors differently from application servers. I applied the same principle:

Proxmox Hypervisors

PermitRootLogin prohibit-password
PasswordAuthentication no
PubkeyAuthentication yes

Root access required for Proxmox management, but only via SSH keys.

Linux VMs

PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes

Root cannot SSH at all. User account with sudo for all admin tasks.
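
The two profiles above can also be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named check_sshd_profile and two made-up role labels (hypervisor, vm). It reads only the first uncommented occurrence of each directive in the file it is given and ignores Include drop-ins and Match blocks, so sshd -T remains the authoritative view of the effective settings.

```shell
#!/bin/sh
# Hypothetical helper: compare three sshd_config directives against a role profile.
# Usage: check_sshd_profile <config-file> <hypervisor|vm>
check_sshd_profile() {
    config_file=$1
    role=$2

    case "$role" in
        hypervisor) want_root="prohibit-password" ;;
        vm)         want_root="no" ;;
        *) echo "unknown role: $role" >&2; return 2 ;;
    esac

    status=0
    for pair in "PermitRootLogin:$want_root" \
                "PasswordAuthentication:no" \
                "PubkeyAuthentication:yes"; do
        key=${pair%%:*}
        want=${pair#*:}
        # Take the first uncommented occurrence; keyword match is case-insensitive.
        got=$(awk -v k="$key" 'tolower($1) == tolower(k) { print $2; exit }' "$config_file")
        if [ "$got" != "$want" ]; then
            echo "MISMATCH $key: expected '$want', found '${got:-unset}'"
            status=1
        fi
    done
    return $status
}

# Example: check_sshd_profile /etc/ssh/sshd_config vm
```

The function prints nothing and returns 0 when all three directives match the role, which makes it easy to call from a loop over hosts or a CI job.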

Implementation Details

Phase 1: SSH Key Deployment

Using my existing Ed25519 key pair (stored securely in a password manager), I deployed the public key to every system with ssh-copy-id:

# Deploy to each system, test immediately
ssh-copy-id root@proxmox-node
ssh root@proxmox-node  # Verify key auth works

ssh-copy-id user@linux-vm
ssh user@linux-vm      # Verify key auth works

Critical Safety Pattern: Always test SSH key authentication in a NEW terminal before closing your existing session. Keep the old session open as a safety net until the new one is verified working.
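
The "test in a new session" check can be made non-interactive. This is a sketch under the assumption that the targets are reachable by name; verify_key_auth is a hypothetical helper, while BatchMode, PreferredAuthentications, and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# Hypothetical helper: confirm that key-only login works for each target.
# BatchMode=yes makes ssh fail instead of prompting, so a password fallback
# cannot mask a broken key deployment.
verify_key_auth() {
    failed=0
    for target in "$@"; do
        if ssh -o BatchMode=yes \
               -o PreferredAuthentications=publickey \
               -o ConnectTimeout=5 \
               "$target" true; then
            echo "OK   $target"
        else
            echo "FAIL $target"
            failed=1
        fi
    done
    return $failed
}

# Example (host names taken from the commands above):
# verify_key_auth root@proxmox-node user@linux-vm
```

Running this from a second terminal before disabling passwords keeps the original session open as the fallback.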

Phase 2: Disable Password Authentication

With key authentication verified on all systems, I disabled password auth:

# Edit SSH config
sudo nano /etc/ssh/sshd_config

# Verify changes
grep -E "^PermitRootLogin|^PasswordAuthentication|^PubkeyAuthentication" /etc/ssh/sshd_config

# Restart SSH service
sudo systemctl restart ssh    # Ubuntu/Debian VMs
systemctl restart sshd        # Proxmox nodes
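
A syntax error in sshd_config can stop the daemon from coming back after a restart, so it may be worth validating first. The sketch below assumes it runs as root on the target; preflight_sshd is a hypothetical name, while sshd -t (test the configuration) and sshd -T (print the effective configuration with lowercase keywords) are standard OpenSSH flags.

```shell
#!/bin/sh
# Hypothetical helper: validate sshd_config, then show the effective values
# of the three directives changed in this phase. Run as root.
preflight_sshd() {
    if ! sshd -t; then
        echo "sshd_config has errors; not restarting" >&2
        return 1
    fi
    # sshd -T prints keywords in lowercase, one "keyword value" pair per line.
    sshd -T | grep -E '^(permitrootlogin|passwordauthentication|pubkeyauthentication) '
}

# Example: preflight_sshd && systemctl restart ssh
```

Unlike the grep against the file, sshd -T also reflects values coming from Include drop-ins, which is where a stray PasswordAuthentication yes can hide.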

Phase 3: Deploy fail2ban

I installed and configured fail2ban for automated intrusion prevention:

# Install
sudo apt update && sudo apt install fail2ban -y

# Create local config
sudo nano /etc/fail2ban/jail.local

Configuration applied:

[DEFAULT]
bantime = 10m
findtime = 10m
maxretry = 5
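# Loopback shown here; append the VPN range to whitelist it (Tailscale addresses come from 100.64.0.0/10)
ignoreip = 127.0.0.1/8 ::1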

[sshd]
enabled = true
port = ssh
logpath = /var/log/auth.log
backend = systemd

Why these settings:

10-minute bans: Slows attackers without permanent blocks (avoids self-lockout)
5 attempts: Balance between security and usability
VPN whitelist: Never ban legitimate remote access
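/var/log/auth.log: Standard auth log on Debian/Ubuntu/Proxmox when a syslog daemon is installed; with backend = systemd, fail2ban reads the journal instead and logpath is not consulted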
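
To make the interaction between maxretry and findtime concrete, here is a small sketch that replays a list of failure timestamps (in seconds) and reports whether a ban would trigger. It illustrates the sliding-window idea only and is not fail2ban's actual implementation; the function name would_ban is hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration of the maxretry/findtime window (not fail2ban's own code).
# Usage: would_ban <maxretry> <findtime-seconds> <ts1> <ts2> ...   (timestamps ascending)
would_ban() {
    maxretry=$1
    findtime=$2
    shift 2

    window=""    # failure timestamps still inside the window
    for ts in "$@"; do
        kept=""
        count=0
        for old in $window; do
            # Keep only failures that happened less than findtime before this one.
            if [ $((ts - old)) -lt "$findtime" ]; then
                kept="$kept $old"
                count=$((count + 1))
            fi
        done
        window="$kept $ts"
        count=$((count + 1))
        if [ "$count" -ge "$maxretry" ]; then
            echo "ban at t=$ts"
            return 0
        fi
    done
    echo "no ban"
}

would_ban 5 600 0 60 120 180 240     # prints "ban at t=240"
would_ban 5 600 0 200 400 600 800    # prints "no ban"
```

The second call shows why slow, spread-out guessing slips under these thresholds: there are never five failures inside any 600-second window.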

The Challenges

Challenge 1: fail2ban Couldn't Find Log File

What Went Wrong:

ERROR Failed during configuration: Have not found any log file for sshd jail

The Problem:

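The sshd jail had no log source it could read. On Debian, Ubuntu, and Proxmox, SSH authentication events land in the systemd journal, and also in /var/log/auth.log when a syslog daemon such as rsyslog is installed, so the jail has to be pointed at whichever of the two the host actually has.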

The Fix:

Created /etc/fail2ban/jail.local with explicit log path:

[sshd]
enabled = true
logpath = /var/log/auth.log
backend = systemd

Lesson Learned:

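Always check distro-specific log locations. RHEL-family systems log SSH authentication to /var/log/secure, while Debian-family systems use /var/log/auth.log or only the journal, so a jail definition that works on one may find nothing on the other.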
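
One way to avoid guessing is to look at what the host actually has before writing the jail. The helper below is a hypothetical sketch: it suggests logpath when the auth log is readable and backend = systemd otherwise, on the assumption that a host without the file logs to the systemd journal.

```shell
#!/bin/sh
# Hypothetical helper: suggest a fail2ban log source for the sshd jail.
# Prints a logpath line when the auth log is readable, otherwise a backend line
# (assumes a host without the file logs to the systemd journal).
suggest_sshd_log_source() {
    auth_log=${1:-/var/log/auth.log}
    if [ -r "$auth_log" ]; then
        echo "logpath = $auth_log"
    else
        echo "backend = systemd"
    fi
}

# Example: suggest_sshd_log_source
```

Only one of the two settings is in effect at a time: when backend = systemd is set, fail2ban reads the journal and does not consult logpath.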

Challenge 2: SSH Service Name Varies by Distro

What Went Wrong:

Failed to restart sshd.service: Unit sshd.service not found

The Problem:

Different distributions use different service names:

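Ubuntu/Debian: Service is called ssh
RHEL/CentOS: Service is called sshd
Proxmox: Debian-based, so the unit is ssh; the name sshd was also accepted on my nodes, most likely through a unit alias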

The Fix:

# On Ubuntu/Debian VMs
sudo systemctl restart ssh

# On Proxmox nodes  
systemctl restart sshd

Lesson Learned:

Know your target systems. This is exactly the kind of detail that matters when managing heterogeneous infrastructure.
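
The unit name can also be detected rather than remembered. This is a minimal sketch with a hypothetical helper, ssh_unit_name, that relies on systemctl cat exiting non-zero when a unit cannot be found.

```shell
#!/bin/sh
# Hypothetical helper: print the SSH unit name that exists on this host.
# systemctl cat exits non-zero when the unit cannot be found.
ssh_unit_name() {
    for unit in ssh sshd; do
        if systemctl cat "$unit.service" >/dev/null 2>&1; then
            echo "$unit"
            return 0
        fi
    done
    echo "no ssh or sshd unit found" >&2
    return 1
}

# Example: systemctl restart "$(ssh_unit_name)"
```

Checking ssh first means Debian-family hosts resolve to their native unit name even where an sshd alias is also present.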

Challenge 3: Proxmox Enterprise Repository 401 Errors

What Went Wrong:

E: Failed to fetch https://enterprise.proxmox.com/... 401 Unauthorized

The Problem:

Proxmox ships with the enterprise repository enabled by default. That repository requires a paid subscription, so apt update fails with 401 on an unlicensed node.

The Fix:

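# Disable the Proxmox VE enterprise repository
# (pve-enterprise.list on the hypervisors; pbs-enterprise.list is the Proxmox Backup Server equivalent)
mv /etc/apt/sources.list.d/pve-enterprise.list \
   /etc/apt/sources.list.d/pve-enterprise.list.disabled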

# Now apt update works
apt update && apt install fail2ban -y

Lesson Learned:

Proxmox defaults to enterprise repos. For homelab use, disable these or add the no-subscription repository.
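
When the same rename has to run on several nodes, an idempotent version is safer to repeat. The sketch below uses a hypothetical helper, disable_repo_file, that takes the repository file path as its argument and does nothing if the file was already renamed.

```shell
#!/bin/sh
# Hypothetical helper: disable an apt source file by renaming it, safely repeatable.
# Usage: disable_repo_file <path-to-.list-file>
disable_repo_file() {
    repo_file=$1
    if [ -e "$repo_file" ]; then
        mv "$repo_file" "$repo_file.disabled"
        echo "disabled $repo_file"
    elif [ -e "$repo_file.disabled" ]; then
        echo "already disabled: $repo_file"
    else
        echo "not found: $repo_file" >&2
        return 1
    fi
}
```

Renaming works because apt only reads files in sources.list.d that end in .list or .sources.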

Results

100% password attack surface eliminated - No system accepts password authentication
Zero downtime - Parallel testing methodology prevented any lockouts
Automated threat response - 5 failed attempts within 10 minutes trigger a 10-minute IP ban
SOC 2-aligned authentication - Key-only auth with passwords disabled supports the logical access controls a SOC 2 audit examines
Defense-in-depth - Authentication hardening + intrusion prevention layers
Proper role separation - Hypervisors vs VMs have appropriate security models

Verification

Confirmed hardening is working correctly:

# Test that password auth is disabled
$ ssh -o PubkeyAuthentication=no root@proxmox-node
Permission denied (publickey).

# Verify fail2ban is active
$ fail2ban-client status sshd
Status for the jail: sshd
|- Filter
|  |- Currently failed: 0
|  |- Total failed:     0
|  `- File list:        /var/log/auth.log
`- Actions
   |- Currently banned: 0
   |- Total banned:     0
   `- Banned IP list:
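
For repeated checks, the status output can be reduced to a single number. This is a sketch with a hypothetical helper, currently_banned, that reads fail2ban-client status output on stdin and assumes the "Currently banned" label shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `fail2ban-client status sshd` output on stdin
# and print the "Currently banned" count.
currently_banned() {
    awk -F: '/Currently banned/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Example: fail2ban-client status sshd | currently_banned
```

The result can feed a cron job or monitoring check that alerts when the count rises above zero.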

Interview Story (STAR Format)

Situation

"I had homelab infrastructure with 3 Proxmox hypervisors and 3 Linux VMs accessible via VPN. All systems used password-based SSH authentication, creating vulnerability to brute force attacks if the VPN were ever compromised."

Task

"Implement enterprise-grade SSH hardening across all 6 systems without service disruption. Required maintaining different security models—hypervisors need root access for management, while application VMs should use user accounts with sudo."

Action

"I implemented a defense-in-depth approach: First, deployed Ed25519 SSH keys to all systems using ssh-copy-id, testing each before proceeding. Then disabled password authentication with role-appropriate configs—'prohibit-password' for Proxmox root, 'no' for VM root login. Finally, deployed fail2ban on all systems with 10-minute bans after 5 failed attempts, whitelisting the VPN network to prevent self-lockout."

Result

"Achieved 100% elimination of password-based attack surface across all 6 systems in about 2 hours with zero downtime. Implemented SOC 2 compliant authentication and automated threat response. Created comprehensive documentation including emergency recovery procedures."

Skills Demonstrated

Technical Skills

SSH/OpenSSH
Ed25519 Cryptography
fail2ban
Linux System Administration
Debian/Ubuntu
Proxmox VE
systemd
Defense-in-Depth

Professional Skills

Risk Assessment
Change Management
Zero-Downtime Deployment
Technical Documentation
Security Compliance (SOC 2)
Troubleshooting