Linux Daemons

Introduction to Daemons

These notes draw mainly on Section 13.3 of Advanced Programming in the UNIX Environment, 3rd edition.

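A daemon is a long-running process. It usually starts when the operating system boots and stops only when the system shuts down, although it can also be started by crond or from a user's terminal shell. A daemon has no controlling terminal, which is why we say it runs in the background; this is its most important property. A second important property is that a daemon must be isolated from the environment it was started in. Chapter 13 of Advanced Programming in the UNIX Environment gives some general design rules for writing one.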

Daemon Coding Rules

This section describes some coding rules that keep a daemon from interacting in unwanted ways with the environment it was started from.

Call umask to set the file mode creation mask to a known value, usually 0. The inherited file mode creation mask could be set to deny certain permissions. If the daemon process creates files, it may want to set specific permissions. For example, if it creates files with group-read and group-write enabled, a file mode creation mask that turns off either of these permissions would undo its efforts. On the other hand, if the daemon calls library functions that result in files being created, then it might make sense to set the file mode create mask to a more restrictive value (such as 007), since the library functions might not allow the caller to specify the permissions through an explicit argument.

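Call umask to set the file mode creation mask to a known value, usually 0. The mask that the daemon inherits from its parent may clear certain permission bits. If the daemon creates files, it may need them to have specific permissions. For example, if the daemon wants to create files that are group-readable and group-writable, an inherited mask that turns off either bit silently removes it: the file is still created, but without the permissions the daemon asked for. On the other hand, if the daemon calls library functions that create files, and those functions do not let the caller pass the permissions as an explicit argument, it makes sense to set the mask to a more restrictive value such as 007. (Note: the default umask is usually 022, which you can check with the umask command. With that mask, a file requested with mode 666 is created as 644 and a directory requested with mode 777 is created as 755, because the bits set in the mask are cleared from the requested mode. Calling umask(0) guarantees that no requested permission bit is masked off, but it is risky: files created with the usual 0666 request end up world-writable. It is typically used when the user who creates a file is not the user who later modifies it, for example a file that will later be modified by a web server running as a different user. In that case, setting the set-group-ID bit on the directory the web server writes to is a good alternative.)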
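
To make the effect of the mask concrete, here is a minimal Python 3 sketch. The helper name create_with_umask and the file name probe are made up for illustration; the sketch assumes a POSIX system with an ordinary temporary directory. It creates a file with a requested mode of 0666 under a given mask and returns the permission bits the file actually received.

```python
import os
import stat
import tempfile


def create_with_umask(mask, requested_mode=0o666):
    """Create a file under the given umask and return its permission bits."""
    old_mask = os.umask(mask)  # os.umask returns the previous mask
    try:
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "probe")  # hypothetical file name
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, requested_mode)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old_mask)  # restore the caller's mask


if __name__ == "__main__":
    for mask in (0o022, 0o007, 0o000):
        print("umask %03o -> mode %03o" % (mask, create_with_umask(mask)))
```

The mask never adds permissions; it only clears bits from whatever mode the creating call requested.
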
Call fork and have the parent exit. This does several things. First, if the daemon was started as a simple shell command, having the parent terminate makes the shell think that the command is done. Second, the child inherits the process group ID of the parent but gets a new process ID, so we’re guaranteed that the child is not a process group leader. This is a prerequisite for the call to setsid that is done next.

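Call fork to create a child and have the parent exit, which puts the daemon in the background. This serves two purposes. First, if the daemon was started as a simple shell command, the parent's exit makes the shell think the command has finished, so the shell returns to its prompt while the child keeps running. (Note: Ctrl+C or Delete at a terminal sends an interrupt signal to every process in the foreground process group. Once the parent exits, the child is adopted by init and continues in the background.) Second, the child inherits the parent's process group ID but gets a new process ID, which guarantees that the child is not a process group leader. That is a prerequisite for the setsid call in the next step. (Note: setsid can create a new session only if the calling process is not a process group leader.)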
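
The claim that a forked child is never a process group leader can be checked with a small Python 3 sketch. The function name child_is_group_leader is hypothetical, and unlike a real daemon the parent here stays alive so that it can read the child's answer from a pipe.

```python
import os


def child_is_group_leader():
    """Fork once and report whether the child leads its process group."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it has a new PID, but its process group ID is inherited.
        try:
            os.close(read_fd)
            leader = os.getpid() == os.getpgrp()
            os.write(write_fd, b"1" if leader else b"0")
        finally:
            os._exit(0)  # never fall back into the caller's code
    # Parent: a real daemon launcher would exit here instead of waiting.
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"
```

In an actual daemon the parent branch would simply call sys.exit(0) or os._exit(0) right after the fork.
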
Call setsid to create a new session. The three steps listed in Section 9.5 occur. The process (a) becomes the leader of a new session, (b) becomes the leader of a new process group, and (c) is disassociated from its controlling terminal.

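Call setsid to create a new session. The call does three things: (a) the process becomes the session leader of the new session; (b) the process becomes the leader of a new process group; (c) the process is detached from its controlling terminal. (Note: the fork in the previous step put the process in the background; this step then detaches it from its original process group, controlling terminal, and session.)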
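
A minimal Python 3 sketch of effects (a) and (b): after setsid, the child's process ID, process group ID, and session ID should all be the same number. The function name setsid_in_child is hypothetical, and the parent is kept alive only to collect the result through a pipe.

```python
import os


def setsid_in_child():
    """Fork, call setsid() in the child, and return the child's (pid, pgid, sid)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.setsid()  # allowed: the child is not a process group leader
            ids = "%d %d %d" % (os.getpid(), os.getpgrp(), os.getsid(0))
            os.write(write_fd, ids.encode("ascii"))
        finally:
            os._exit(0)
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return tuple(int(field) for field in data.split())
```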

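On System V-based systems, some people recommend calling fork again at this point and having the parent exit, so that the new child becomes the actual daemon. This guarantees that the daemon is not a session leader, which prevents it from acquiring a controlling terminal. Another way to keep the daemon from acquiring a controlling terminal is to specify the O_NOCTTY flag whenever it opens a terminal device. (Note: when open() is called on a terminal with O_NOCTTY set, that terminal does not become the controlling terminal of the process.)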
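
Both techniques can be sketched in Python 3 as follows. The function names double_fork_ids and open_without_ctty are made up for illustration. The first shows that the grandchild produced by fork, setsid, fork belongs to the new session without being its leader; the second simply adds O_NOCTTY to an open call.

```python
import os


def double_fork_ids():
    """Return (pid, sid) of a grandchild created by fork -> setsid -> fork."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.setsid()  # first child: leader of the new session
            if os.fork() == 0:
                # Grandchild: same session as the first child, but not its leader.
                ids = "%d %d" % (os.getpid(), os.getsid(0))
                os.write(write_fd, ids.encode("ascii"))
        finally:
            os._exit(0)  # both the first child and the grandchild stop here
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return tuple(int(field) for field in data.split())


def open_without_ctty(path):
    """Open path read/write without letting it become the controlling terminal."""
    return os.open(path, os.O_RDWR | os.O_NOCTTY)
```

A session ID equals the PID of the session leader, so a grandchild whose PID differs from its session ID cannot be the leader.
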
Change the current working directory to the root directory. The current working directory inherited from the parent could be on a mounted file system. Since daemons normally exist until the system is rebooted, if the daemon stays on a mounted file system, that file system cannot be unmounted.

Alternatively, some daemons might change the current working directory to a specific location where they will do all their work. For example, a line printer spooling daemon might change its working directory to its spool directory.

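Change the current working directory to the root directory. The working directory inherited from the parent may be on a mounted file system, and a daemon normally runs until the system is rebooted. As long as the daemon's working directory is on a mounted file system, that file system cannot be unmounted.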

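Alternatively, some daemons change the current working directory to a specific location and do all their work there. For example, a line printer spooling daemon typically changes its working directory to its spool directory.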
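
In Python 3 this step is a single call. The helper name enter_working_directory is hypothetical; a daemon with its own work area would pass that directory instead of the default "/".

```python
import os


def enter_working_directory(path="/"):
    """Switch to the daemon's working directory and return the resulting cwd."""
    os.chdir(path)
    return os.getcwd()
```
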
Unneeded file descriptors should be closed. This prevents the daemon from holding open any descriptors that it may have inherited from its parent (which could be a shell or some other process). We can use our open_max function (Figure 2.17) or the getrlimit function (Section 7.11) to determine the highest descriptor and close all descriptors up to that value.

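Close unneeded file descriptors. This keeps the daemon from holding open any descriptors inherited from its parent, which may be a shell or some other process. We can use the open_max function or the getrlimit function to determine the highest possible descriptor number and close every descriptor up to that value. (Note: descriptors left open consume system resources and can prevent the file systems that hold those files from being unmounted.)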
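
A minimal Python 3 sketch of this step, using resource.getrlimit in place of APUE's open_max. The names MAX_FD_GUESS, max_fd_limit, close_inherited_fds, and closes_descriptors_in_child are made up for illustration, and the cap of 4096 is an assumption chosen so that an unlimited or very large soft limit does not turn into a huge loop. The last function exercises the helper in a forked child, so the calling process keeps its own descriptors.

```python
import os
import resource

MAX_FD_GUESS = 4096  # assumed cap, used when the soft limit is unlimited or huge


def max_fd_limit():
    """Return an upper bound on descriptor numbers this process may hold."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return MAX_FD_GUESS
    return min(soft, MAX_FD_GUESS)


def close_inherited_fds(first=3):
    """Close every descriptor from `first` up to the limit, ignoring errors."""
    os.closerange(first, max_fd_limit())


def closes_descriptors_in_child():
    """Fork a child, open an extra descriptor there, and check it gets closed."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            extra = os.open(os.devnull, os.O_RDONLY)
            close_inherited_fds()
            try:
                os.fstat(extra)  # expected to fail: descriptor is closed
            except OSError:
                status = 0
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

Starting at descriptor 3 leaves standard input, output, and error alone, since they are handled separately.
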
Some daemons open file descriptors 0, 1, and 2 to /dev/null so that any library routines that try to read from standard input or write to standard output or standard error will have no effect. Since the daemon is not associated with a terminal device, there is nowhere for output to be displayed, nor is there anywhere to receive input from an interactive user. Even if the daemon was started from an interactive session, the daemon runs in the background, and the login session can terminate without affecting the daemon. If other users log in on the same terminal device, we wouldn’t want output from the daemon showing up on the terminal, and the users wouldn’t expect their input to be read by the daemon.

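Some daemons open file descriptors 0, 1, and 2 (standard input, standard output, and standard error) to /dev/null, so that any library routine that tries to read from standard input or write to standard output or standard error has no effect. Because the daemon is not associated with a terminal device, there is nowhere to display its output and no interactive user to supply input. Even if the daemon was started from an interactive session, it runs in the background, so the login session can end without affecting it. If another user later logs in on the same terminal, we do not want the daemon's output to appear on that terminal, and that user's input should not be read by the daemon.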
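
The redirection can be sketched in Python 3 with os.open and os.dup2. The function names redirect_stdio_to_devnull and stdio_is_devnull_in_child are hypothetical; the second one runs the redirection in a forked child and verifies that descriptors 0, 1, and 2 all refer to /dev/null afterwards. A real program would flush sys.stdout and sys.stderr before redirecting.

```python
import os


def redirect_stdio_to_devnull():
    """Point descriptors 0, 1, and 2 at /dev/null."""
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)  # the duplicates stay open; the original is not needed


def stdio_is_devnull_in_child():
    """Redirect in a forked child and report whether all three point at /dev/null."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            redirect_stdio_to_devnull()
            target = os.stat(os.devnull)
            if all(os.path.samestat(os.fstat(fd), target) for fd in (0, 1, 2)):
                status = 0
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```
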
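(Note: quoted from 《linux系统编程之进程（八）：守护进程详解及创建，daemon()使用》.) Handle the SIGCHLD signal. This step is optional, but some processes, server processes (daemons) in particular, spawn a child to handle each incoming request. If the parent does not wait for a child to finish, the child becomes a zombie and keeps occupying system resources. If the parent does wait, that adds work to the parent and hurts the server's concurrency. On Linux, you can simply set the disposition of SIGCHLD to SIG_IGN: signal(SIGCHLD, SIG_IGN). The kernel then does not create a zombie when a child terminates. This differs from BSD4, where the parent must explicitly wait for a child in order to release the zombie.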
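
The Linux behaviour can be observed from Python 3 with the signal module. This is a sketch that assumes Linux and assumes it is called from the main thread, as signal.signal requires; the function name child_left_no_zombie is hypothetical. With SIGCHLD ignored, waiting for a terminated child is expected to fail with ECHILD, because the kernel has already discarded it.

```python
import os
import signal


def child_left_no_zombie():
    """Ignore SIGCHLD, fork a short-lived child, then try to wait for it."""
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child terminates immediately
        try:
            os.waitpid(pid, 0)
        except ChildProcessError:
            return True  # ECHILD: nothing left to reap, so no zombie
        return False  # the child was still waitable
    finally:
        if previous is not None:
            signal.signal(signal.SIGCHLD, previous)  # restore the old disposition
```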

Implementing a Daemon in Python

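The following Python 2.x code is taken from "A simple unix/linux daemon in Python"; the author's article also offers a Python 3 compatible version for download. It implements a generic daemon base class: a subclass only needs to inherit from it and implement the run() method to do its work inside the daemon.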

#!/usr/bin/env python

import sys, os, time, atexit
from signal import SIGTERM

class Daemon:
	"""
	A generic daemon class.

	Usage: subclass the Daemon class and override the run() method
	"""
	def __init__(self, pidfile, stdin='/dev/null', stdout='/dev/null', stderr='/dev/null'):
		self.stdin = stdin
		self.stdout = stdout
		self.stderr = stderr
		self.pidfile = pidfile

	def daemonize(self):
		"""
		do the UNIX double-fork magic, see Stevens' "Advanced
		Programming in the UNIX Environment" for details (ISBN 0201563177)
		http://www.erlenstar.demon.co.uk/unix/faq_2.html#SEC16
		"""
		try:
			pid = os.fork()
			if pid > 0:
				# exit first parent
				sys.exit(0)
		except OSError, e:
			sys.stderr.write("fork #1 failed: %d (%s)\n" % (e.errno, e.strerror))
			sys.exit(1)

		# decouple from parent environment
		os.chdir("/")
		os.setsid()
		os.umask(0)

		# do second fork
		try:
			pid = os.fork()
			if pid > 0:
				# exit from second parent
				sys.exit(0)
		except OSError, e:
			sys.stderr.write("fork #2 failed: %d (%s)\n" % (e.errno, e.strerror))
			sys.exit(1)

		# redirect standard file descriptors
		sys.stdout.flush()
		sys.stderr.flush()
		si = file(self.stdin, 'r')
		so = file(self.stdout, 'a+')
		se = file(self.stderr, 'a+', 0)
		os.dup2(si.fileno(), sys.stdin.fileno())
		os.dup2(so.fileno(), sys.stdout.fileno())
		os.dup2(se.fileno(), sys.stderr.fileno())

		# write pidfile
		atexit.register(self.delpid)
		pid = str(os.getpid())
		file(self.pidfile,'w+').write("%s\n" % pid)

	def delpid(self):
		os.remove(self.pidfile)

	def start(self):
		"""
		Start the daemon
		"""
		# Check for a pidfile to see if the daemon already runs
		try:
			pf = file(self.pidfile,'r')
			pid = int(pf.read().strip())
			pf.close()
		except IOError:
			pid = None

		if pid:
			message = "pidfile %s already exists. Daemon already running?\n"
			sys.stderr.write(message % self.pidfile)
			sys.exit(1)

		# Start the daemon
		self.daemonize()
		self.run()

	def stop(self):
		"""
		Stop the daemon
		"""
		# Get the pid from the pidfile
		try:
			pf = file(self.pidfile,'r')
			pid = int(pf.read().strip())
			pf.close()
		except IOError:
			pid = None

		if not pid:
			message = "pidfile %s does not exist. Daemon not running?\n"
			sys.stderr.write(message % self.pidfile)
			return # not an error in a restart

		# Try killing the daemon process
		try:
			while 1:
				os.kill(pid, SIGTERM)
				time.sleep(0.1)
		except OSError, err:
			err = str(err)
			if err.find("No such process") > 0:
				if os.path.exists(self.pidfile):
					os.remove(self.pidfile)
			else:
				print str(err)
				sys.exit(1)

	def restart(self):
		"""
		Restart the daemon
		"""
		self.stop()
		self.start()

	def run(self):
		"""
		You should override this method when you subclass Daemon. It will be called after the process has been
		daemonized by start() or restart().
		"""

A subclass implements the run() method:

#!/usr/bin/env python

import sys, time
from daemon import Daemon

class MyDaemon(Daemon):
	def run(self):
		while True:
			time.sleep(1)

if __name__ == "__main__":
	daemon = MyDaemon('/tmp/daemon-example.pid')
	if len(sys.argv) == 2:
		if 'start' == sys.argv[1]:
			daemon.start()
		elif 'stop' == sys.argv[1]:
			daemon.stop()
		elif 'restart' == sys.argv[1]:
			daemon.restart()
		else:
			print "Unknown command"
			sys.exit(2)
		sys.exit(0)
	else:
		print "usage: %s start|stop|restart" % sys.argv[0]
		sys.exit(2)
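
The stop() method in the base class decides that the process is gone by searching the error text for "No such process". As a possible alternative, the following Python 3 sketch checks the error number instead. It is not part of the quoted code, and the function name pid_exists is made up for illustration. It relies on signal 0, which performs only the existence and permission checks and delivers nothing.

```python
import errno
import os


def pid_exists(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except OSError as err:
        if err.errno == errno.ESRCH:
            return False  # no such process
        if err.errno == errno.EPERM:
            return True  # it exists but belongs to another user
        raise
    return True
```

Comparing errno values avoids depending on the wording of the error message, which can vary with the platform and locale.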