# HorizonBench.sh

MTU diagnostic and load test tool for Linux servers. Detects the machine role automatically, tests path MTU both idle and under real TCP load, and gives role-aware verdicts — so a VPS behind a WireGuard tunnel does not get the same warnings as a WireGuard router.

**Read-only / non-destructive.** The script makes no changes to the system. It only reads kernel state, sends ICMP and TCP probes, and writes a log file to `/tmp`. All temporary files are cleaned up on exit.

---

## Requirements

| Package | Debian | AlmaLinux | Required |
|---|---|---|---|
| `iproute2` / `iproute` (provides `ip` and `ss`) | `apt install iproute2` | `dnf install iproute` | Yes |
| `iputils-ping` / `iputils` | `apt install iputils-ping` | `dnf install iputils` | Yes |
| `ethtool` | `apt install ethtool` | `dnf install ethtool` | Optional |
| `iperf3` | `apt install iperf3` | `dnf install iperf3` | Optional (load test) |
| `wireguard-tools` | `apt install wireguard-tools` | `dnf install wireguard-tools` | Optional (WG peer detail) |
| `bird2` / `birdc` | `apt install bird2` | `dnf install bird` | Optional (BGP detail) |

---

## Usage

```bash
chmod +x HorizonBench.sh
sudo ./HorizonBench.sh [TARGET_IP] [INTERFACE] [OPTIONS]
```

### Arguments

| Argument | Default | Description |
|---|---|---|
| `TARGET_IP` | `8.8.8.8` | IP or hostname to probe with ICMP and to use as the iperf3 reference point |
| `INTERFACE` | auto-detected | Network interface to test. If omitted, an interactive TUI lets you pick one |

### Options

| Option | Description |
|---|---|
| `--expected-mtu N` | Tell the script the expected path MTU. Use this when the path intentionally has a reduced MTU due to a tunnel (e.g. WireGuard, PPPoE). All verdicts are then relative to this value instead of 1500. Accepted range: 576–9000 |
| `--no-load` | Skip the load test entirely. Generates zero extra traffic; useful on sensitive production machines |

### Examples

```bash
# Interactive TUI interface picker, probe 8.8.8.8
sudo ./HorizonBench.sh

# Specify target and interface directly
sudo ./HorizonBench.sh 1.2.3.4 eth0

# VPS behind a WireGuard tunnel with path MTU 1370
sudo ./HorizonBench.sh --expected-mtu 1370

# WireGuard router, skip load test
sudo ./HorizonBench.sh 8.8.8.8 wg0 --no-load

# Full explicit invocation
sudo ./HorizonBench.sh 1.2.3.4 ens18 --expected-mtu 1420
```

---

## What it tests

The script runs the following sections in order.

### Section 0 — Machine Role Detection

Inspects the machine before any MTU testing and determines what kind of node it is. This drives all subsequent verdicts — a VPS gets different warnings than a WireGuard router.

| Detected role | Conditions |
|---|---|
| `vps` | No WireGuard, no BIRD, `ip_forward=0` |
| `wg-client` | WireGuard present, `ip_forward=0` |
| `wg-router` | WireGuard present, `ip_forward=1` |
| `bird-router` | BIRD daemon running, no WireGuard |
| `wg-bird-router` | WireGuard and BIRD both present |
| `router` | `ip_forward=1`, no WireGuard or BIRD |

On a WireGuard router the script also reads:

- the MTU of all `wg*` interfaces, auto-setting `--expected-mtu` if it was not already specified
- the peer count per interface via `wg show`
- the routed **public** subnets from `allowed-ips` (RFC 1918, loopback, and /32 host routes are filtered out)

On a BIRD router the script reads active BGP/OSPF sessions via `birdc show protocols` and the route table count via `birdc show route count`.

### Section 1 — Interface & Kernel MTU Settings

Reads the MTU of the selected interface and all other interfaces, then checks kernel sysctls relevant to path MTU discovery:

- `net.ipv4.ip_no_pmtu_disc`
- `net.ipv4.tcp_mtu_probing`
- `net.ipv4.route.min_pmtu`
- `net.ipv4.ip_forward`
- `net.ipv4.conf.all.rp_filter`
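
These sysctls can also be inspected by hand. A minimal sketch of the check, assuming `sysctl` is available (the output format here is illustrative, not the script's exact output):

```bash
# Print the PMTUD-relevant sysctls this section inspects.
# `sysctl -n` prints only the value; keys missing on this kernel show "n/a".
show_pmtud_sysctls() {
  local key
  for key in net.ipv4.ip_no_pmtu_disc net.ipv4.tcp_mtu_probing \
             net.ipv4.route.min_pmtu net.ipv4.ip_forward \
             net.ipv4.conf.all.rp_filter; do
    printf '%-32s %s\n' "$key" "$(sysctl -n "$key" 2>/dev/null || echo n/a)"
  done
}
show_pmtud_sysctls
```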

### Section 2 — ICMP MTU Step-Down Probe

Sends ICMP packets with the Don't-Fragment bit set at decreasing sizes: 1500, 1492, 1480, 1472, 1450, 1420, 1400, 1300, 1200, 1000, 576, and finds the largest size that gets through. Because the step list has gaps of up to 100 bytes, this gives only a rough range; the exact value comes from Section 3.
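
The probe can be sketched as a plain loop, assuming Linux iputils `ping`, where `-M do` sets the DF bit and `-s` takes the ICMP payload size, i.e. the frame size minus 28 bytes of IP + ICMP headers:

```bash
# DF-bit probe for one frame size; split out so the transport can be swapped.
probe_ok() {  # $1 = target, $2 = frame size in bytes
  ping -c 1 -W 1 -M do -s $(( $2 - 28 )) "$1" >/dev/null 2>&1
}

# Step down through the size list; print the largest size that passes.
stepdown_mtu() {  # $1 = target
  local size
  for size in 1500 1492 1480 1472 1450 1420 1400 1300 1200 1000 576; do
    probe_ok "$1" "$size" && { echo "$size"; return 0; }
  done
  return 1
}
```

Because the list jumps in chunks, a real path MTU of 1370 reports as 1300 here; the binary search in Section 3 pins down the exact value.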

### Section 3 — Binary-Search Exact Path MTU

Binary-searches between 576 and 1500 to find the exact path MTU to the byte. With `--expected-mtu` set, the result is compared against that value with a ±10 byte tolerance.
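
The bisection can be sketched as follows (a simplification under stated assumptions: `probe_ok` is a DF-bit ping test as in Section 2, 576 is known to pass, and anything above 1500 is treated as failing):

```bash
# DF-bit probe; payload = frame size - 28 bytes of IP + ICMP headers.
probe_ok() {  # $1 = target, $2 = frame size in bytes
  ping -c 1 -W 1 -M do -s $(( $2 - 28 )) "$1" >/dev/null 2>&1
}

# Bisect a monotone predicate: lo always passes, hi always fails.
# Terminates with hi = lo + 1, so lo is the largest passing frame size.
exact_mtu() {  # $1 = target
  local lo=576 hi=1501 mid
  while (( hi - lo > 1 )); do
    mid=$(( (lo + hi) / 2 ))
    if probe_ok "$1" "$mid"; then lo=$mid; else hi=$mid; fi
  done
  echo "$lo"
}
```

Starting `hi` at 1501 (one past the largest candidate) lets the search return 1500 itself when the full Ethernet MTU passes.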

### Section 4 — TCP MSS Inspection

Reads the MSS values of all active TCP connections via `ss -tin` and shows their distribution. Also checks for TCPMSS clamping rules in the iptables mangle FORWARD chain.

Role-aware verdicts:

- **VPS / wg-client**: no FORWARD clamping warning (ip_forward is off, so forwarding never happens)
- **wg-router / wg-bird-router**: warns if FORWARD clamping is missing and shows the exact iptables command with the correct MSS value derived from the WireGuard interface MTU
- **bird-router / router**: warns if FORWARD clamping is missing
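
The MSS a clamp rule should set follows directly from the MTU: for IPv4, MSS = MTU - 40 (20-byte IP header + 20-byte TCP header). A sketch of the derivation and of the kind of rule the script suggests — the interface MTU here is illustrative, and the rule is echoed rather than applied:

```bash
# MSS for a given IPv4 MTU: subtract 20 bytes IP header + 20 bytes TCP header.
mss_for_mtu() { echo $(( $1 - 40 )); }

wg_mtu=1420                    # illustrative WireGuard interface MTU
mss=$(mss_for_mtu "$wg_mtu")   # 1420 - 40 = 1380

# The style of clamp rule suggested for a wg-router (printed, not executed):
echo "iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN" \
     "-j TCPMSS --set-mss $mss"
```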

### Section 5 — Interface Error Counters

Reads RX/TX byte, packet, error, and drop counters from `/proc/net/dev` and `ip -s link`. Also reads fragmentation-relevant stats from `ethtool -S` if available.
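
A sketch of how such counters can be pulled from `/proc/net/dev` (per-interface row: RX bytes, packets, errs, drop, ... followed by the TX columns; the colon can be glued to the interface name, hence the `sed` normalisation). Reading the file before and after a load period and diffing gives the delta:

```bash
# Sum RX errors + drops for one interface from /proc/net/dev-style input on stdin.
# After normalising the colon: $1=iface $2=rx_bytes $3=rx_packets $4=rx_errs $5=rx_drop
iface_rx_problems() {  # $1 = interface name
  sed 's/:/ /' | awk -v ifc="$1" '$1 == ifc { print $4 + $5 }'
}

# Usage sketch:
#   before=$(iface_rx_problems eth0 < /proc/net/dev)
#   ... run load ...
#   after=$(iface_rx_problems eth0 < /proc/net/dev)
#   delta=$(( after - before ))
```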

### Section 6a — Public iperf3 Server Selection

Finds the best public iperf3 server using a three-phase parallel ping pipeline:

**Phase 1** — Pings one representative per region (EU, NA, ASIA, OCE) in parallel, 3 pings each, eliminating distant regions immediately. Regions within 2× the best RTT are kept.

**Phase 2** — Pings all servers in the winning region(s), two at a time in parallel, 4 pings each, and identifies the best country by grouping the results.

**Phase 3** — Retests the top 3 servers of the best country with 10 pings each (serially) for an accurate final measurement, then picks the winner.

This approach covers 30 servers across EU/NL/DE/GB/FR/CH/SE/NA/ASIA/OCE and typically completes in about 30 seconds instead of the roughly 90 seconds a sequential ping of all servers would take.

If iperf3 is not installed, this section is skipped and the load test falls back to flood-ping.
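
Phase 1's elimination rule (keep every region within 2× the best RTT) is easy to sketch; the parallel pings themselves are just backgrounded `ping` jobs collected with `wait`. Assuming each measurement has been reduced to a `region avg_rtt_ms` line:

```bash
# Keep regions whose average RTT is within 2x the best observed RTT.
# Input on stdin: one "<region> <avg_rtt_ms>" line per region.
keep_close_regions() {
  sort -k2 -n | awk 'NR == 1 { best = $2 } $2 <= 2 * best { print $1 }'
}
```

For example, RTTs of 12 ms (EU), 95 ms (NA), 210 ms (ASIA) and 280 ms (OCE) keep only EU, since the cutoff is 24 ms.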

### Section 6b — MTU Under Load

Runs real TCP traffic to the selected public iperf3 server while simultaneously firing ICMP MTU probes at the target. This catches MTU issues that only appear under load.

**What runs:**

- `iperf3 -c <server> -P 4 -M 1460 -t 15` — 4 parallel TCP streams with MSS 1460 over the real network path
- concurrent ICMP step-down probes (DF bit set) at TARGET during the iperf3 run
- an RX/TX error counter delta, taken before vs. after the entire load period

**Retransmit interpretation:**

With `--expected-mtu` set and ICMP results clean at or below that MTU, retransmits are classified as TCP PMTUD warmup — the kernel opens connections at MSS 1460, the path reduces it to `expected_mtu - 40` in the first 1–2 seconds causing a burst of retransmits, then settles. This is normal and expected, so the script reports `[PASS]` with an explanation instead of a false `[FAIL]`.

Without `--expected-mtu` and with ICMP failures, retransmits are a genuine signal of an MTU problem.

Falls back to flood-ping (`ping -i 0.05 -M do -s 1472`) if iperf3 is not installed.

### Section 7 — Traffic Control (tc) Inspection

Reads the qdisc configuration and tc filters on the selected interface. Checks the minimum packet unit (mpu) value if set.

### Section 8 — Summary & Recommendations

Shows a role-specific fixes box. Only relevant commands are shown:

- **VPS / client**: PMTUD sysctl, plus a tracepath hint for contacting the hoster
- **WireGuard router**: exact `ip link set <iface> mtu` and `iptables TCPMSS --set-mss <value>` commands with values derived from the detected WireGuard MTU
- **BIRD / generic router**: MSS clamping and PMTUD commands

---

## Understanding the output

### Status prefixes

| Prefix | Meaning |
|---|---|
| `[PASS]` | Check passed, no action needed |
| `[WARN]` | Something worth investigating; may or may not be a problem |
| `[FAIL]` | Problem detected that likely affects connectivity |
| `[INFO]` | Informational, no verdict |
| `[SKIP]` | Section skipped (e.g. `--no-load`) |

### Common scenarios

**Everything passes, issues = 0**
The machine is healthy: path MTU is as expected, there are no errors or drops, and TCP MSS is correctly set.

**Path MTU reduced, no `--expected-mtu` set (VPS)**
```
[INFO] Path MTU=1370 — reduced by upstream infrastructure, use --expected-mtu 1370 if intentional
```
Rerun with `--expected-mtu 1370`. If everything then passes, the reduced MTU is set by your upstream router and is working correctly.

**Path MTU reduced, `--expected-mtu` set**
```
[PASS] Path MTU=1370 — matches expected tunnel MTU 1370 ✓
```
Everything is working as designed.

**High iperf3 retransmits with `--expected-mtu`, ICMP clean**
```
[INFO] iperf3 retransmits: 487 — ICMP path clean at/below expected MTU 1370
[PASS] Retransmit correlation: consistent with PMTUD warmup, not an MTU problem ✓
```
TCP PMTUD warmup. The kernel starts connections at MSS 1460, the path clamps it down in the first 1-2 seconds. Normal behaviour when `tcp_mtu_probing=0` and standard PMTUD is handling it.

**WARN: TCPMSS clamping missing on WireGuard router**
```
[WARN] No TCPMSS clamping rule in mangle FORWARD — required on WireGuard router
```
Add the clamping rule shown in the Section 8 fixes box. Use `--set-mss <value>` rather than `--clamp-mss-to-pmtu` to set an explicit value matching your WireGuard MTU minus 40.

**FAIL: RX/TX errors or drops under load**
```
[FAIL] New errors/drops during load — check MTU mismatch, ring buffer size, or NIC driver
```
Check `ethtool -g <iface>` for the ring buffer size and `ethtool -S <iface>` for driver-level drop counters.

---

## Log file

Every run writes a full timestamped log to `/tmp/horizonbench_YYYYMMDD_HHMMSS.log`. The path is shown at the start and end of every run.

---

## Common MTU values reference

| Scenario | Interface MTU | Path MTU | MSS |
|---|---|---|---|
| Standard Ethernet | 1500 | 1500 | 1460 |
| WireGuard over Ethernet | 1420 | 1420 | 1380 |
| PPPoE (DSL) | 1492 | 1492 | 1452 |
| VXLAN / GRE over Ethernet | 1450 | 1450 | 1410 |
| Custom tunnel (e.g. IPVEE) | varies | 1370 | 1330 |
| Jumbo frames | 9000 | 9000 | 8960 |

---

## Notes

- The script requires root (`sudo`) because reading some interface stats and sending ICMP probes with the DF bit set (`ping -M do`) needs elevated privileges on some systems.
- The iperf3 load test connects to public servers. Make sure outbound TCP on ports 5200–5210 is not firewalled on the machine you are testing.
- On machines running as WireGuard gateways for customer subnets, `--expected-mtu` is auto-detected from the WireGuard interface MTU. You can override it if needed.
- The script is safe to run on live production systems. It does not modify any kernel parameters, firewall rules, interface settings, or files outside of `/tmp`.