Opened 7 years ago

Closed 6 years ago

Last modified 5 years ago

#54193 closed defect (fixed)

squid3+launchd causes multiple runaway processes when run in daemon mode

Reported by: essandess (Steve Smith) Owned by: jmroot (Joshua Root)
Priority: Normal Milestone:
Component: ports Version: 2.4.1
Keywords: Cc:
Port: squid3

Description

sudo port load squid3

regularly and consistently causes multiple runaway squid process with 100 percent cpu. The problem is severe enough to crash a squid server. I observe this problem, as do others here: https://github.com/essandess/osxfortress/issues/5#issuecomment-292507167 and elsewhere.

This happens after a network interface is changed, or some other change in dis or the network. The only fix is to kill everything:

sudo port unload squid3
sudo killall -9 squid
sudo killall -9 '(squid-1)'

This problem is caused because the file /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper specifies that squid is launched in daemon mode, which confuses launchd.

This is a long standing issue between launchd and Macports squid. It is

The confirmed fix is to launch squid in non-daemon mode by adding the -N flag to the wrapper's Start() method.

Here is the corrected, working Squid.wrapper file:

#!/bin/sh
#
# MacPorts generated daemondo support script
#

#
# Init
#
prefix=/opt/local

#
# Start
#
Start()
{
	cd /opt/local/var/squid
	if [ ! -d "/opt/local/var/squid/cache/00" ]; then
	    /opt/local/sbin/squid -s -z
	fi
	/opt/local/sbin/squid -s -N
}

#
# Stop
#
Stop()
{
	cd /opt/local/var/squid
	/opt/local/sbin/squid -k shutdown
	while /opt/local/sbin/squid -k check; do
	    sleep 1
	done
}

#
# Restart
#
Restart()
{
	Stop
	Start
}

#
# Run
#
Run()
{
case $1 in
  start  ) Start   ;;
  stop   ) Stop    ;;
  restart) Restart ;;
  *      ) echo "$0: unknown argument: $1";;
esac
}

#
# Run a phase based on the selector
#
Run $1

Attachments (1)

Screen Shot 2017-05-21 at 6.07.42 PM.png (376.8 KB) - added by essandess (Steve Smith) 7 years ago.
Activity Monitor for Runaway Squid Processes

Download all attachments as: .zip

Change History (19)

comment:1 Changed 7 years ago by jmroot (Joshua Root)

We're not running squid directly from launchd, it's going via daemondo, the whole purpose of which is to interface between launchd and things running in daemon mode. So I'm not really sold on this analysis of what is going on, or convinced that this is the correct fix.

I don't doubt that you are seeing this issue, but I run squid myself (on a machine that undergoes netchange on occasion) and have never seen it. So it's not as simple as the current config being just plain wrong. Could be a bug in daemondo, maybe a race condition with the pidfile?

comment:2 Changed 7 years ago by jmroot (Joshua Root)

Priority: HighNormal

comment:3 Changed 7 years ago by essandess (Steve Smith)

I see this issue regularly on a variety of boxes and networks: Mac Pros, Minis, MBPs. One near-sure way to trigger it is to sleep a box through a VPN expiration.

Here's squid.conf: https://raw.githubusercontent.com/essandess/osxfortress/master/squid.conf

And specifying -N definitely fixes the issue.

What's your setting for http_port? Do you bind to a fixed IP address?

comment:4 Changed 7 years ago by mf2k (Frank Schima)

Cc: jmroot removed
Owner: set to jmroot
Status: newassigned

comment:5 in reply to:  3 Changed 7 years ago by jmroot (Joshua Root)

Replying to essandess:

And specifying -N definitely fixes the issue.

Works around it at least. Using -N has undesirable effects on privilege separation and SMP scalability. I'd like to understand and fix the actual bug.

What's your setting for http_port? Do you bind to a fixed IP address?

http_port 3128

comment:6 Changed 7 years ago by essandess (Steve Smith)

Please let me know if I can help with any diagnostics. This issue is both annoyingly consistent in that it would trigger every few days if there lots of network changes, but also annoyingly unpredictable. I was never able to nail down the precise issue or conditions. At some point I was speculating about loss of DNS service with mDNSRespnder as a trigger, and haven't ruled out that hypothesis.

comment:7 Changed 7 years ago by jmroot (Joshua Root)

When the issue occurs, does squid.pid contain a valid pid?

I'll try to find some time to do a stress test with many net changes in a VM or something. Very hard to diagnose what I can't reproduce.

comment:8 Changed 7 years ago by essandess (Steve Smith)

I believe I have an instance running now that exhibits the underlying issue.

I have set the -N flag in Squid.wrapper and do not see the runaway squid processes issue this is intended to fix; however, other problems exist that I believe are related.

Problem 1: The squid.pid file doesn't match the squid pid.

Problem 2: The squid process is not responding to -k signals.

Good thing: The running squid process is working fine as a proxy.

Here's what I see:

$ ps -el | grep squid | grep -v grep
    0  3621     1     4004   0  20  0  2506820    912 -      Ss                  0 ??         0:00.02 /opt/local/bin/daemondo --label=Squid --start-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper restart ; --restart-netchange --pid=fileauto --pidfile /opt/local/var/run/squid/squid.pid
  504  3626  3622     4104   0  20  0  2970120 473076 -      S                   0 ??        10:58.75 /opt/local/sbin/squid -s -N
  504  3639  3626     4004   0  20  0  2432784    212 -      Ss                  0 ??         0:05.76 (logfile-daemon) /opt/local/var/squid/logs/access.log
$ cat /opt/local/var/run/squid/squid.pid
74809

Error sending a signal to squid:

$ sudo squid -k rotate
squid: ERROR: Could not send signal 30 to process 74809: (3) No such process

Permissions check:

$ ls -l /opt/local/var/run/squid/squid.pid
-rw-r--r--  1 root  squid  6 May 20 21:21 /opt/local/var/run/squid/squid.pid
$ ls -ld /opt/local/var/run/squid
drwxr-xr-x  4 squid  squid  136 May 19 06:48 /opt/local/var/run/squid

(Is this correct? There's a difference between the permissions on squid.pid file and the directory in which it resides.)

Background info:

  • Squid is loaded using the command sudo port load squid
  • I have verified that the launchd plist is loaded:
$ sudo launchctl list | grep -i squid
-	0	org.squid-cache.squid-rotate.plist
3621	0	org.macports.Squid
  • The state of the current problem arose some time after a fresh reboot several days ago. The launchd plist was loaded at boot time.

Because this issue is running in a stable state (and I have a working proxy), I would be able to query any additional logs that you might need.

I've also launched squid without the -N flag on a separate box and am waiting for it to fail out with a bunch of runaway processes.

Last edited 7 years ago by essandess (Steve Smith) (previous) (diff)

comment:9 Changed 7 years ago by essandess (Steve Smith)

Here's the runaway 100 percent multiple squid processes failure mode when the -N flag is not used. Again, the pid file /opt/local/var/run/squid/squid.pid has an inconsistent pid

This is from a different box. It took somewhere between 24 and 36 hours for this issue to appear:

https://trac.macports.org/attachment/ticket/54193/Screen%20Shot%202017-05-21%20at%206.07.42%20PM.png

$ ps -el | grep squid | grep -v grep
    0 48655     1      104   0  20  0  2465472    628 -      Ss                  0 ??         0:00.00 /opt/local/sbin/squid -s
  503 48657 48655     4104   0  20  0  2633220 140332 -      S                   0 ??         0:12.53 (squid-1) -s
    0 48662     1     4004   0  20  0  2515008   2444 -      Ss                  0 ??         0:00.02 /opt/local/bin/daemondo --label=Squid --start-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper restart ; --restart-netchange --pid=fileauto --pidfile /opt/local/var/run/squid/squid.pid
    0 48665     1      104   0  20  0  2491072    676 -      Ss                  0 ??         0:00.04 /opt/local/sbin/squid -s
  503 48668 48657     4004   0  20  0  2440976    688 -      S                   0 ??         0:00.00 (logfile-daemon) /opt/local/var/squid/logs/access.log
    0 48699     1      104   0  20  0  2491072    736 -      Ss                  0 ??         0:00.03 /opt/local/sbin/squid -s
    0 48744     1      104   0  20  0  2473664    672 -      Ss                  0 ??         0:00.03 /opt/local/sbin/squid -s
    0 48792     1      104   0  20  0  2465472    668 -      Ss                  0 ??         0:00.02 /opt/local/sbin/squid -s
    0 48850     1      104   0  20  0  2474688    660 -      Ss                  0 ??         0:00.02 /opt/local/sbin/squid -s
    0 48913     1      104   0  20  0  2507456    696 -      Ss                  0 ??         0:00.02 /opt/local/sbin/squid -s
    0 48980     1      104   0  20  0  2473664    628 -      Ss                  0 ??         0:00.01 /opt/local/sbin/squid -s
    0 49041     1      104   0  20  0  2481856    620 -      Ss                  0 ??         0:00.01 /opt/local/sbin/squid -s
    0 49171     1      104   0  20  0  2491072    596 -      Ss                  0 ??         0:00.00 /opt/local/sbin/squid -s
  503 49252 48980     4104   0   0  0  2598404 132680 -      R                   0 ??         0:12.28 (squid-1) -s
  503 49254 48850     4104   0   0  0  2590212 123884 -      R                   0 ??         0:11.30 (squid-1) -s
  503 49256 49171     4104   0   0  0  2590212 124064 -      R                   0 ??         0:11.26 (squid-1) -s
  503 49262 48792     4104   0   0  0  2607620 116032 -      R                   0 ??         0:09.88 (squid-1) -s
  503 49264 48744     4104   0   0  0  2585092  84676 -      R                   0 ??         0:07.04 (squid-1) -s
  503 49266 48913     4104   0   0  0  2583044  83692 -      R                   0 ??         0:06.96 (squid-1) -s
  503 49268 48699     4104   0   0  0  2560516  83476 -      R                   0 ??         0:06.95 (squid-1) -s
  503 49270 48665     4104   0   0  0  2575876  83348 -      R                   0 ??         0:06.91 (squid-1) -s


$ cat /opt/local/var/run/squid/squid.pid
49292

$ ls -l /opt/local/var/run/squid/squid.pid
-rw-r--r--  1 root  squid  6 May 21 18:08 /opt/local/var/run/squid/squid.pid

$ ls -ld /opt/local/var/run/squid/
drwxr-xr-x  4 squid  squid  136 May 21 09:36 /opt/local/var/run/squid/

$ sudo squid -k rotate
squid: ERROR: Could not send signal 30 to process 49377: (3) No such process
Last edited 7 years ago by essandess (Steve Smith) (previous) (diff)

Changed 7 years ago by essandess (Steve Smith)

Activity Monitor for Runaway Squid Processes

comment:10 Changed 7 years ago by essandess (Steve Smith)

I've relaunched squid on both boxes (sudo port unload squid; sudo port load squid) and the PID file /opt/local/var/run/squid/squid.pid shows the correct pid.

The error must occur over the daemondo relaunch transition. If there's way of tracking or logging this please let me know and I'll keep an eye out for the event.

Last edited 7 years ago by essandess (Steve Smith) (previous) (diff)

comment:11 Changed 7 years ago by essandess (Steve Smith)

I also see the incorrect pid in the PID file after a fresh reboot.

Is this an issue that should be upstreamed to daemondo?

comment:12 in reply to:  11 Changed 7 years ago by jmroot (Joshua Root)

Replying to essandess:

Is this an issue that should be upstreamed to daemondo?

Possibly. However the upstream is still MacPorts.

I still haven't been able to reproduce this, unfortunately.

comment:13 Changed 7 years ago by jmroot (Joshua Root)

Could you try this and see if it helps?

  • net/squid3/Portfile

    diff --git a/net/squid3/Portfile b/net/squid3/Portfile
    index a670aab57f..dd601a991a 100644
    a b post-configure { 
    8585startupitem.create      yes
    8686startupitem.name        Squid
    8787startupitem.netchange   yes
    88 startupitem.pidfile     auto ${pidfile}
     88startupitem.pidfile     clean ${pidfile}
    8989startupitem.start \
    9090    "cd ${prefix}/var/squid" \
    9191    "if \[ ! -d \"${prefix}/var/squid/cache/00\" \]; then" \

comment:14 Changed 7 years ago by essandess (Steve Smith)

Sorry, using clean didn't work.

The issue persists after a full uninstall and rebuild with these steps:

port unload squid3
vi `port file squid3`
# auto to clean ${pidfile}
port uninstall squid3
port clean --dist squid3
port install squid3
port load squid3

It took less than a day for the problem to appear, and after changing the System Preferences>Network>Location.

Several runaway squid process were being launched, and the pid file had the wrong pid. This is what ps -ef | grep squid | grep -v grep looks like write after the runaway problem starts:

ps -ef | grep squid | grep -v grep
    0 16935     1   0  9:02PM ??         0:00.00 /opt/local/sbin/squid -s
    0 16956     1   0  9:02PM ??         0:00.00 /opt/local/sbin/squid -s
  504 17000 16935   0  9:02PM ??         0:06.86 (squid-1) -s
  504 17003 16956   0  9:03PM ??         0:00.12 (squid-1) -s
    0 83825     1   0  9:23AM ??         0:00.03 /opt/local/bin/daemondo --label=Squid --start-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper restart ; --restart-netchange --pid=fileclean --pidfile /opt/local/var/run/squid/squid.pid
    0 83832     1   0  9:23AM ??         0:00.00 /opt/local/sbin/squid -s
  504 83834 83832   0  9:23AM ??         1:59.40 (squid-1) -s
  504 83841 83834   0  9:23AM ??         0:01.07 (logfile-daemon) /opt/local/var/squid/logs/access.log

The only thing that works to stop the runaway squid processes is to add the {{{-N}}} flag as described above. The PID file eventually is inconsistent with the running squid process, which I believe causes all the problems.

comment:15 Changed 7 years ago by essandess (Steve Smith)

By the way, I have noticed a lag of a few seconds for squid to write the PID file:

$ sudo port load squid3
$ ps -ef | grep squid | grep -v grep
    0  2971     1   0  9:25PM ??         0:00.01 /opt/local/bin/daemondo --label=Squid --start-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.Squid/Squid.wrapper restart ; --restart-netchange --pid=fileclean --pidfile /opt/local/var/run/squid/squid.pid
  504  2976  2972   0  9:25PM ??         0:04.10 /opt/local/sbin/squid -s -N
$ cat /opt/local/var/run/squid/squid.pid
cat: /opt/local/var/run/squid/squid.pid: No such file or directory
$ cat /opt/local/var/run/squid/squid.pid
2976
# the correct PID file appears a few seconds later

Could this PID file lagginess be related to the underlying problem?

comment:16 Changed 6 years ago by essandess (Steve Smith)

Re: https://github.com/essandess/macOS-Fortress/issues/5#issuecomment-300650444 [Squid.wrapper with a -N flag.]

I’ve made a few changes that I believe mitigate the problem of squid running multiple processes at boot, but do not address the underlying issue.

const CFTimeInterval kChildDeathTimeout = 20.0;

# commands
SUDO=/usr/bin/sudo
PORT=/opt/local/bin/port
TAIL=/usr/bin/tail
LSOF=/usr/sbin/lsof
KILLALL=/usr/bin/killall

# squid
# make sure there aren't multiple squid jobs running
if ! [[ `$SUDO $LSOF -i ':3128' | $TAIL -1` && `ps -ef | grep "/opt/local/sbin/squid -s" | grep -v grep | wc -l` -eq 1 ]]; then
    $SUDO $PORT unload squid3
    $SUDO $KILLALL 'squid'
    $SUDO $KILLALL '(squid-1)'
    $SUDO $PORT load squid3
fi
  • Deprecated the modified Squid.wrapper file and use the default Macports daemondo settings.

This has helped on the boxes I run, but doesn’t address the underlying issue of daemondo launching multiple squid processes at boot or on a network change.

comment:17 Changed 6 years ago by jmroot (Joshua Root)

Resolution: fixed
Status: assignedclosed

comment:18 in reply to:  17 Changed 5 years ago by essandess (Steve Smith)

@jmroot,

I just wanted to thank you and confirm that the new port file for squid4 along with how squid4 handles its parent and child processes fixes this issue.

Replying to jmroot:

In 2176e957f8389cc830b9605da73ace4e2c3f8071/macports-ports (master):

New port: squid4

Fixes: #54193

Note: See TracTickets for help on using tickets.