Some time ago, I encountered an issue on an Ubuntu server where SSH login became unusually slow or unstable.
System resources appeared normal, logs were clean, and nothing pointed to an obvious cause.
This post summarizes the analysis I performed back then.
It may be useful for anyone facing a similar situation.
1. Symptoms at the Time
- SSH login attempts were abnormally slow
- Occasional login failures
- Existing logged-in sessions worked normally
- CPU, memory, and disk usage were all normal
- Nothing unusual in auth.log
From the outside, it was a difficult issue to diagnose because the symptoms were vague and pointed in many directions.
2. First Suspicious Finding — Large Number of bash Processes
I started by checking how many bash processes were running on the server:
ps -ef | grep bash
Under normal conditions, the number of bash processes should roughly match the number of user sessions.
However, the server had more than 100 lingering bash processes.
Example:
deploy 31830 ... -bash SSH_TTY=/dev/pts/137
deploy 31888 ... -bash SSH_TTY=/dev/pts/114
deploy 31938 ... -bash SSH_TTY=/dev/pts/19
deploy 32403 ... -bash SSH_TTY=/dev/pts/38
deploy 32602 ... -bash SSH_TTY=/dev/pts/80
Key characteristics:
- All sessions had been created via SSH from an automation/deployment server
- Some sessions had been alive for days or even weeks
- Each session was holding a separate /dev/pts/* device
In short, SSH sessions were not being terminated cleanly, leaving behind orphaned bash processes.
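A quick way to confirm this state is to list bash processes sorted by age. This is a generic sketch; etimes is the elapsed time in seconds since each process started:

```shell
# Show the five oldest bash processes; etimes = seconds since the process started.
# The [b]ash bracket trick keeps grep from matching its own entry in the ps output.
ps -eo pid,etimes,cmd --sort=-etimes | grep '[b]ash' | head -5

# Total number of bash processes currently running
ps -eo cmd | grep -c '[b]ash' || true
```

On the affected server, that count was over 100, against only a handful of real user sessions.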
3. The Core Issue — Deletion of /dev/pts Does Not Mean the Kernel Released It
Running lsof | grep pts revealed entries like:
/dev/pts/137 (deleted)
Although the device file appears to be deleted, the important part is this:
Even if the device file disappears, the kernel still considers the PTY number to be allocated. It is not released.
PTY devices are limited kernel-managed resources.
If they are not properly returned, SSH exhibits the following behavior:
- The kernel waits while trying to allocate a new PTY → login delays
- At some point PTY allocation fails → SSH session creation fails
This issue becomes most visible during the initial PTY allocation stage inside sshd.
4. Why “Only About 100 Sessions” Can Still Cause Problems
Most Linux systems allow 1024–4096 PTYs by default.
So at first glance, “only 100 stale PTYs” may seem harmless.
(/proc/sys/kernel/pty/max holds the maximum number of PTYs the kernel can allocate.)
However, slow SSH login isn’t caused by running out of total PTY count.
It is caused by how the kernel internally manages PTY allocation.
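To see how close a system actually is to the hard limit, Linux exposes the current and maximum PTY counts as sysctl files (the paths below are standard on Linux):

```shell
# kernel.pty.nr  = number of PTYs currently allocated
# kernel.pty.max = maximum number the kernel will allocate
echo "PTYs in use: $(cat /proc/sys/kernel/pty/nr)"
echo "PTY maximum: $(cat /proc/sys/kernel/pty/max)"
```

Even when the in-use count is far below the maximum, the allocation path itself can still be slow, which is exactly the point of this section.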
✔ Reason 1: PTY allocation is synchronous
During SSH login, sshd must complete PTY allocation.
In this step, the kernel performs:
- PTY number lookup
- Validity checks
- Handling of “deleted but not released” PTYs
- Contention resolution
If hundreds of stale PTYs remain, the allocation process slows down significantly.
✔ Reason 2: The more stale PTYs, the higher the kernel scanning cost
Entries like /dev/pts/137 (deleted) force the kernel to repeatedly check “is this usable?”
This lookup is roughly O(N).
The more stale PTYs exist, the longer it takes.
Even if there are plenty of PTY numbers remaining, the allocation step becomes slow, and this stage becomes the bottleneck in SSH login.
This explains the issue:
SSH slowdown happens not because the system is out of PTYs, but because stale PTYs significantly delay the kernel’s PTY allocation process.
That is the core of the problem.
5. Root Cause — Automation Server Creating Unintended PTY Sessions
The automation server was running commands like:
ssh -t deploy@server "sh /opt/scripts/deploy.sh"
The main issue here is the -t option.
✔ -t forces interactive mode → always allocates a PTY
- sshd creates a PTY
- bash is launched
- Even for non-interactive automation tasks, a PTY is allocated unnecessarily
- If network interruptions or errors occur, bash can remain running → stale PTY
This pattern repeated over time and caused the accumulation of stale PTYs.
6. Resolution — Clean Up bash Processes + Fix SSH Execution Method
1) Clean Up Old bash Sessions
ps -eo pid,etimes,cmd --sort=etimes | \
grep '[b]ash' | awk '$2 > 43200 {print $1}' | xargs -r kill -9
(Example: killing sessions older than 12 hours)
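Before running the kill, it is worth doing a print-only pass with the same filter to review what would be removed (a sketch; the 43200-second threshold matches the example above):

```shell
# Dry run: print PID, age (seconds), and command for bash processes older
# than 12 hours, without killing anything. /[b]ash/ matches "bash" but not
# this awk command's own entry in the ps output.
ps -eo pid,etimes,cmd --sort=etimes | awk '/[b]ash/ && $2 > 43200'
```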
2) Modify Automation SSH Behavior — Avoid Creating PTYs
The ssh -T option means:
“Do not allocate a pseudo-terminal.”
Since automation tasks do not require interactive terminals, not allocating a PTY is the safest and cleanest approach.
✔ Recommended non-interactive SSH
ssh -T deploy@server "systemctl restart app"
✔ For multiple commands
ssh -T deploy@server << 'EOF'
cd /opt/app
git pull
./restart.sh
EOF
This method provides:
- No PTY creation
- No bash spawned
- No lingering sessions
- No stale PTYs
- No /dev/pts exhaustion
It is the most stable pattern for production automation.
3) Optional: Adjust sshd_config
ClientAliveInterval 60
ClientAliveCountMax 3
LoginGraceTime 20
MaxSessions 50
MaxStartups 10:30:200
These settings help the system terminate zombie sessions more aggressively.
7. Prevent Recurrence — Automatic Cleanup Script
#!/bin/bash
# Kill bash processes older than 12 hours (43200 seconds).
# [b]ash keeps grep from matching its own entry in the ps output.
ps -eo pid,etimes,cmd | grep '[b]ash' | awk '$2 > 43200 {print $1}' | xargs -r kill -9
Cron example:
0 * * * * /usr/local/sbin/pts_cleanup.sh
Conclusion
To summarize, the root cause was:
- The automation server was using ssh -t, which forcibly created PTYs
- Some sessions did not terminate cleanly, leaving behind orphaned bash processes
- Stale PTYs accumulated over time
- This caused significant delays during the kernel’s PTY allocation step in SSH login
PTY issues are not just about “how many are in use” but how efficiently the kernel can scan and allocate them.
If you encounter similar problems, check the following:
- Number of bash processes
- Presence of stale PTYs
- Whether automation scripts use ssh -t
- Prefer ssh -T for all non-interactive automation
These checks will help diagnose and prevent the issue effectively.
🛠 Last modified: 2025.11.17
ⓒ 2025 quirky guy's Blog. All rights reserved. Unauthorized copying or redistribution of the text and images is prohibited. When sharing, please include the original source link.
💡 Need Professional Support?
If you need deployment, optimization, or troubleshooting support for Zabbix, Kubernetes, or any other open-source infrastructure in your production environment, feel free to contact me anytime.
📧 Email: jikimy75@gmail.com
💼 Services: Deployment Support | Performance Tuning | Incident Analysis Consulting