Analysis of Slow SSH Login — /dev/pts Contention Caused by Accumulated bash Sessions

Some time ago, I encountered an issue on an Ubuntu server where SSH login became unusually slow or unstable.
System resources appeared normal, logs were clean, and nothing pointed to an obvious cause.
This post summarizes the analysis I performed back then.
It may be useful for anyone facing a similar situation.


1. Symptoms at the Time

  • SSH login attempts were abnormally slow
  • Occasional login failures
  • Existing logged-in sessions worked normally
  • CPU, memory, and disk usage were all normal
  • Nothing unusual in auth.log

From the outside, it was a difficult issue to diagnose because the symptoms were vague and pointed in many directions.


2. First Suspicious Finding — Large Number of bash Processes

I started by checking how many bash processes were running on the server:

ps -ef | grep bash

Under normal conditions, the number of bash processes should roughly match the number of user sessions.
However, the server had more than 100 lingering bash processes.

Example:

deploy   31830  ...  -bash SSH_TTY=/dev/pts/137
deploy   31888  ...  -bash SSH_TTY=/dev/pts/114
deploy   31938  ...  -bash SSH_TTY=/dev/pts/19
deploy   32403  ...  -bash SSH_TTY=/dev/pts/38
deploy   32602  ...  -bash SSH_TTY=/dev/pts/80

Key characteristics:

  • All sessions had been created via SSH from an automation/deployment server
  • Some sessions had been alive for days or even weeks
  • Each session was holding a separate /dev/pts/* device

In short, SSH sessions were not being terminated cleanly, leaving behind orphaned bash processes.
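A quick way to quantify the gap is to compare active logins against running bash processes. This is a rough heuristic sketch using standard `who` and `ps` output, not an exact accounting:

```shell
# Active login sessions vs. running bash shells.
# On a healthy host the two numbers are close; a large gap
# points to orphaned shells left behind by dead SSH sessions.
logins=$(who | wc -l)
shells=$(ps -eo comm= | grep -c '^bash$' || true)
echo "logins=$logins shells=$shells"
```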


3. The Core Issue — Deletion of /dev/pts Does Not Mean the Kernel Released It

Running lsof | grep pts revealed entries like:

/dev/pts/137 (deleted)

Although the device node looks deleted,
the important part is this:

The (deleted) marker means the node was unlinked while a process still holds it open. As long as that process keeps the PTY open, the kernel considers the PTY number allocated; it is not released.

PTY devices are limited kernel-managed resources.
If they are not properly returned, SSH exhibits the following behavior:

  • The kernel waits while trying to allocate a new PTY → login delays
  • At some point PTY allocation fails → SSH session creation fails

This issue becomes most visible during the initial PTY allocation stage inside sshd.
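The kernel exposes counters that make this state visible. A minimal check, assuming a standard Linux procfs layout (`pty/nr` is the number of PTYs currently allocated, `pty/max` the ceiling):

```shell
# PTYs currently allocated vs. the kernel's ceiling.
echo "allocated: $(cat /proc/sys/kernel/pty/nr)"
echo "maximum:   $(cat /proc/sys/kernel/pty/max)"
# Processes still holding a deleted pts node show up in lsof:
lsof 2>/dev/null | grep 'pts/.*(deleted)' || echo "no stale pts entries"
```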


4. Why “Only About 100 Sessions” Can Still Cause Problems

Most Linux systems allow 1024–4096 PTYs by default
(the ceiling is readable from /proc/sys/kernel/pty/max).
So at first glance, “only 100 stale PTYs” may seem harmless.

However, slow SSH login isn’t caused by running out of total PTY count.
It is caused by how the kernel internally manages PTY allocation.

Reason 1: PTY allocation is synchronous

During SSH login, sshd must complete PTY allocation.
In this step, the kernel performs:

  • PTY number lookup
  • Validity checks
  • Handling of “deleted but not released” PTYs
  • Contention resolution

If hundreds of stale PTYs remain, the allocation process slows down significantly.

Reason 2: The more stale PTYs, the higher the kernel scanning cost

Entries like /dev/pts/137 (deleted) force the kernel to repeatedly check “is this usable?”

This lookup is roughly O(N).
The more stale PTYs exist, the longer it takes.

Even if there are plenty of PTY numbers remaining,
the allocation step becomes slow, and this stage becomes the bottleneck in SSH login.

This explains the issue:

SSH slowdown happens not because the system is out of PTYs, but because stale PTYs significantly delay the kernel’s PTY allocation process.

That is the core of the problem.


5. Root Cause — Automation Server Creating Unintended PTY Sessions

The automation server was running commands like:

ssh -t deploy@server "sh /opt/scripts/deploy.sh"

The main issue here is the -t option.

-t forces interactive mode → always allocates a PTY

  • sshd creates a PTY
  • bash is launched
  • Even for non-interactive automation tasks, a PTY is allocated unnecessarily
  • If network interruptions or errors occur, bash can remain running → stale PTY

This pattern repeated over time and caused the accumulation of stale PTYs.
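The difference between the two modes is easy to observe with the remote tty command (`deploy@server` is the example host from above):

```shell
# With -t, sshd allocates a pseudo-terminal; with -T it does not:
#   ssh -t deploy@server tty   # prints something like /dev/pts/12
#   ssh -T deploy@server tty   # prints "not a tty"
# The same check works locally: when stdin is not a PTY, tty reports it.
tty < /dev/null   # -> not a tty
```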


6. Resolution — Clean Up bash Processes + Fix SSH Execution Method

1) Clean Up Old bash Sessions

ps -eo pid,etimes,comm | awk '$3 == "bash" && $2 > 43200 {print $1}' | xargs -r kill

(Example: terminating bash sessions older than 12 hours. Matching on the comm field avoids catching the pipeline's own grep process, and a plain kill (SIGTERM) lets the shells exit cleanly; fall back to kill -9 only for shells that ignore it. Be aware this also matches legitimate long-running interactive shells, so scope it to the automation user where possible.)
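The awk age filter can be sanity-checked against synthetic ps-style input before pointing it at live processes (the PIDs below are made up):

```shell
# Field 2 is etimes (seconds alive); only the 90000-second shell
# should survive the > 43200 (12 h) filter.
printf '31830 90000 bash\n31888 120 bash\n' | awk '$2 > 43200 {print $1}'
# -> 31830
```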


2) Modify Automation SSH Behavior — Avoid Creating PTYs

The ssh -T option means:

“Do not allocate a pseudo-terminal.”

Since automation tasks do not require interactive terminals,
not allocating a PTY is the safest and cleanest approach.

✔ Recommended non-interactive SSH

ssh -T deploy@server "systemctl restart app"

✔ For multiple commands

ssh -T deploy@server << 'EOF'
cd /opt/app
git pull
./restart.sh
EOF

This method provides:

  • No PTY is created
  • No interactive bash is left behind (the remote shell runs the command and exits)
  • No lingering sessions
  • No stale PTYs or /dev/pts exhaustion

It is the most stable pattern for production automation.
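For scripted use, a few standard OpenSSH client options make failures fast and visible instead of hanging. This is a sketch; the host and command are examples:

```shell
# BatchMode disables interactive prompts (no hanging on a password ask),
# ConnectTimeout bounds the connection attempt, and ServerAliveInterval
# detects a dead peer instead of waiting forever.
ssh -T \
    -o BatchMode=yes \
    -o ConnectTimeout=10 \
    -o ServerAliveInterval=30 \
    deploy@server "systemctl restart app"
```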


3) Optional: Adjust sshd_config

ClientAliveInterval 60
ClientAliveCountMax 3
LoginGraceTime 20
MaxSessions 50
MaxStartups 10:30:200

These settings help the system terminate zombie sessions more aggressively. For example, with ClientAliveInterval 60 and ClientAliveCountMax 3, sshd disconnects a client that stops responding after roughly 60 × 3 = 180 seconds.


7. Prevent Recurrence — Automatic Cleanup Script

#!/bin/bash
# Terminate bash sessions alive for more than 12 hours (43200 s).
# Matching on the comm field avoids catching the pipeline's own processes.
ps -eo pid,etimes,comm | awk '$3 == "bash" && $2 > 43200 {print $1}' | xargs -r kill

Cron example:

0 * * * * /usr/local/sbin/pts_cleanup.sh

Conclusion

To summarize, the root cause was:

  • The automation server was using ssh -t, which forcibly created PTYs
  • Some sessions did not terminate cleanly, leaving behind orphaned bash processes
  • Stale PTYs accumulated over time
  • This caused significant delays during the kernel’s PTY allocation step in SSH login

PTY issues are not just about “how many are in use” but how efficiently the kernel can scan and allocate them.

If you encounter similar problems, check the following:

  • Number of bash processes
  • Presence of stale PTYs
  • Whether automation scripts use ssh -t
  • Prefer ssh -T for all non-interactive automation

These checks will help diagnose and prevent the issue effectively.

🛠 Last updated: 2025.11.17

ⓒ 2025 엉뚱한 녀석의 블로그 [quirky guy's Blog]. All rights reserved. Unauthorized copying or redistribution of the text and images is prohibited. When sharing, please include the original source link.



💡 Need Professional Support?
If you need deployment, optimization, or troubleshooting support for Zabbix, Kubernetes, or any other open-source infrastructure in your production environment, feel free to contact me anytime.

📧 Email: jikimy75@gmail.com
💼 Services: Deployment Support | Performance Tuning | Incident Analysis Consulting