  1. Supervisor: A System for Allowing the Control of Process State on UNIX
  2. History
  3. 7/3/2006: updated for version 2.0
  4. Changes
  5. 2.0: fundamental rewrite based on 1.0.6, use distutils (only) for
  6. installation, use ConfigParser rather than ZConfig, use HTTP for
  7. wire protocol, web interface.
  8. Introduction
  9. The supervisor is a client/server system that allows its users to
  10. control a number of processes on UNIX-like operating systems. It
  11. was inspired by the following:
  12. - It is often inconvenient to need to write "rc.d" scripts for
  13. every single process instance. rc.d scripts are a great
  14. lowest-common-denominator form of process
  15. initialization/autostart/management, but they can be painful to
  16. write and maintain. Additionally, rc.d scripts cannot
  17. automatically restart a crashed process and many programs do not
  18. restart themselves properly on a crash. Supervisord starts
  19. processes as its subprocesses, and can be configured to
  20. automatically restart them on a crash. It can also be configured
  21. to start processes automatically on its own invocation.
  22. - It's often difficult to get accurate up/down status on processes
  23. on UNIX. Pidfiles often lie. Supervisord starts processes as
  24. subprocesses, so it always knows the true up/down status of its
  25. children and can be queried conveniently for this data.
  26. - Users who need to control process state often need only to do
  27. that. They don't want or need full-blown shell access to the
  28. machine on which the processes are running. Supervisorctl allows
  29. a very limited form of access to the machine, essentially
  30. allowing users to see process status and control
  31. supervisord-controlled subprocesses by emitting "stop", "start",
  32. and "restart" commands from a simple shell or web UI.
  33. - Users often need to control processes on many machines.
  34. Supervisor provides a simple, secure, and uniform mechanism for
  35. interactively and automatically controlling processes on groups
  36. of machines.
  37. - Processes which listen on "low" TCP ports often need to be
  38. started and restarted as the root user (a UNIX misfeature). It's
  39. usually the case that it's perfectly fine to allow "normal"
  40. people to stop or restart such a process, but providing them with
  41. shell access is often impractical, and providing them with root
  42. access or sudo access is often impossible. It's also (rightly)
  43. difficult to explain to them why this problem exists. If
  44. supervisord is started as root, it is possible to allow "normal"
  45. users to control such processes without needing to explain the
  46. intricacies of the problem to them.
  47. - Processes often need to be started and stopped in groups,
  48. sometimes even in a "priority order". It's often difficult to
  49. explain to people how to do this. Supervisor allows you to
  50. assign priorities to processes, and allows users to emit commands
  51. via the supervisorctl client like "start all" and "restart all",
  52. which start them in the preassigned priority order.
  53. Supported Platforms
  54. Supervisor has been tested and is known to run on Linux (Fedora Core
  55. 5, Ubuntu 6), Mac OS X (10.4), Solaris (10 for Intel), and
  56. FreeBSD 6.1. It will likely work fine on most UNIX systems.
  57. Supervisor will not run at all under any version of Windows.
  58. Supervisor requires Python 2.3 or better.
  59. Installing
  60. Run "python setup.py install", then copy the "sample.conf" file to
  61. /etc/supervisord.conf and modify to your liking. If you'd rather
  62. not put the supervisord.conf file in /etc, you can place it anywhere
  63. and start supervisord and point it at the configuration file via the
  64. -c flag, e.g. "python supervisord.py -c /path/to/sample.conf" or, if
  65. you use the shell script named "supervisord", "supervisord -c
  66. /path/to/sample.conf".
  67. I make reference below to a "$BINDIR" when explaining how to run
  68. supervisord and supervisorctl. This is the "bindir" directory that
  69. your Python installation has been configured with. For example, for
  70. an installation of Python installed via "./configure
  71. --prefix=/usr/local/python; make; make install", $BINDIR would be
  72. "/usr/local/python/bin". Python interpreters on different platforms
  73. use different $BINDIRs. Look at the output of "setup.py install" if
  74. you can't figure out where yours is.
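If it is unclear where $BINDIR is, the interpreter can usually be asked directly. The sketch below uses the standard library's sysconfig module, which is newer than the Python 2.3 minimum stated above (it appeared in Python 2.7 and 3.2). It assumes the scripts were installed into the interpreter's default scripts directory rather than into a custom location chosen at install time. The helper names are made up for illustration.

```python
import os
import sysconfig


def guess_bindir():
    """Return the default scripts directory of the running interpreter."""
    return sysconfig.get_path("scripts")


def guess_script_path(name):
    """Return where a script such as 'supervisord' would live by default."""
    return os.path.join(guess_bindir(), name)


# Prints a candidate location; it does not check that the file exists.
print(guess_script_path("supervisord"))
```

If the printed path does not exist, the output of "setup.py install" remains the authoritative place to look.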
  75. Running Supervisord
  76. To start supervisord, run $BINDIR/supervisord. The resulting
  77. process will daemonize itself and detach from the terminal. It
  78. keeps an operations log at "/tmp/supervisor.log" by default.
  79. You can start supervisord in the foreground by passing the "-n" flag
  80. on its command line. This is useful to debug startup problems.
  81. To change the set of programs controlled by supervisord, edit the
  82. supervisord.conf file and kill -HUP or otherwise restart the
  83. supervisord process. This file has several example program
  84. definitions. Controlled programs should themselves not be daemons,
  85. as supervisord assumes it is responsible for daemonizing its
  86. subprocesses.
  87. Supervisord accepts a number of command-line overrides. Type
  88. 'supervisord -h' for an overview.
  89. Running Supervisorctl
  90. To start supervisorctl, run $BINDIR/supervisorctl. A shell will
  91. be presented that will allow you to control the processes that are
  92. currently managed by supervisord. Type "help" at the prompt to get
  93. information about the supported commands.
  94. supervisorctl runs a "one time" command when it is invoked with
  95. arguments on the command line. An example: "supervisorctl stop
  96. all". If arguments are present on the supervisorctl command line,
  97. the interactive shell is not started. Instead, the command is
  98. executed and supervisorctl exits.
  99. If supervisorctl is invoked in interactive mode against a
  100. supervisord that requires authentication, you will be asked for
  101. authentication credentials.
  102. Components
  103. Supervisord
  104. The server piece of the supervisor is named "supervisord". It is
  105. responsible for responding to commands from the client process as
  106. well as restarting crashed processes. It is meant to be run as
  107. the root user in most production setups. NOTE: see "Security
  108. Notes" at the end of this document for caveats!
  109. The server process uses a configuration file. This is typically
  110. located in "/etc/supervisord.conf". This configuration file is an
  111. a "Windows-INI" style config file. It is important to keep this
  112. file secure via proper filesystem permissions because it may
  113. contain unencrypted usernames and passwords.
  114. Supervisorctl
  115. The command-line client piece of the supervisor is named
  116. "supervisorctl". It provides a shell-like interface to the
  117. features provided by supervisord. From supervisorctl, a user can
  118. connect to different supervisord processes, get status on the
  119. subprocesses controlled by a supervisord, stop and start
  120. subprocesses of a supervisord, and get lists of running processes
  121. of a supervisord.
  122. The command-line client talks to the server across a UNIX domain
  123. socket or an Internet socket. The server can assert that the user
  124. of a client should present authentication credentials before it
  125. allows him to perform commands. The client process may use the
  126. same configuration file as the server; any configuration file with
  127. a [supervisorctl] section in it will work.
  128. Web Server
  129. A (sparse) web user interface with functionality comparable to
  130. supervisorctl may be accessed via a browser if you start
  131. supervisord against an internet socket. Visit the server URL
  132. (e.g. http://localhost:9001/) to view and control process status
  133. through the web interface.
  134. XML-RPC Interface
  135. The same HTTP server which serves the web UI serves up an XML-RPC
  136. interface that can be used to interrogate and control supervisor
  137. and the programs it runs. To use the XML-RPC interface, connect
  138. to supervisor's http port with any XML-RPC client library and run
  139. commands against it. An example of doing this using Python's
  140. xmlrpclib client library::
  141. import xmlrpclib
  142. server = xmlrpclib.Server('http://localhost:9001')
  143. Call methods against the supervisor and its subprocesses by using
  144. the 'supervisor' namespace::
  145. server.supervisor.getState()
  146. You can get a list of methods supported by supervisor's XML-RPC
  147. interface by using the XML-RPC 'system.listMethods' API::
  148. server.system.listMethods()
  149. You can see help on a method by using the 'system.methodHelp' API
  150. against the method::
  151. print server.system.methodHelp('supervisor.shutdown')
  152. Supervisor's XML-RPC interface also supports the nascent
  153. "XML-RPC multicall API":http://www.xmlrpc.com/discuss/msgReader$1208 .
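On Python 3 the xmlrpclib module is named xmlrpc.client. The sketch below shows the same kind of calls with that module, plus one way to batch several requests through the multicall API. It assumes a supervisord listening on http://localhost:9001; the helper names connect and batch_method_help are made up for illustration and are not part of supervisor.

```python
import xmlrpc.client


def connect(url="http://localhost:9001"):
    """Build a proxy object; no network traffic happens until a call is made."""
    return xmlrpc.client.ServerProxy(url)


def batch_method_help(server, method_names):
    """Fetch help text for several methods in one multicall round trip."""
    multicall = xmlrpc.client.MultiCall(server)
    for name in method_names:
        # Each call is only queued here, not sent.
        multicall.system.methodHelp(name)
    # Calling the MultiCall object sends a single 'system.multicall' request.
    return list(multicall())
```

Against a running server, batch_method_help(connect(), ['supervisor.getState', 'supervisor.shutdown']) would return one help string per method name.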
  154. Configuration File '[supervisord]' Section Settings
  155. The supervisord.conf configuration file contains a section named
  156. '[supervisord]' in which global settings for the supervisord process
  157. should be inserted. These are:
  158. 'http_port' -- Either a TCP host:port value (e.g. 127.0.0.1:9001)
  159. or a path to a UNIX domain socket (e.g. /tmp/supervisord.sock) on
  160. which supervisor will listen for HTTP/XML-RPC requests.
  161. Supervisorctl itself uses XML-RPC to communicate with supervisord
  162. over this port.
  163. 'sockchmod' -- Change the UNIX permission mode bits of the http_port
  164. UNIX domain socket to this value (ignored if using a TCP socket).
  165. Default: 0700.
  166. 'sockchown' -- Change the user and group of the socket file to this
  167. value. May be a username (e.g. chrism) or a username and group
  168. separated by a dot (e.g. chrism.wheel). Default: do not change.
  169. 'umask' -- The umask of the supervisord process. Default: 022.
  170. 'logfile' -- The path to the activity log of the supervisord process.
  171. 'logfile_maxbytes' -- The maximum number of bytes that may be
  172. consumed by the activity log file before it is rotated (suffix
  173. multipliers like "KB", "MB", and "GB" can be used in the value).
  174. Set this value to 0 to indicate an unlimited log size. Default:
  175. 50MB.
  176. 'logfile_backups' -- The number of backups to keep around resulting
  177. from activity log file rotation. Set this to 0 to indicate an
  178. unlimited number of backups. Default: 10.
  179. 'loglevel' -- The logging level, dictating what is written to the
  180. activity log. One of 'critical', 'error', 'warn', 'info', 'debug'
  181. or 'trace'. At log level 'trace', the supervisord log file will
  182. record the stderr/stdout output of its child processes, which is
  183. useful for debugging. Default: info.
  184. 'pidfile' -- The location in which supervisord keeps its pid file.
  185. 'nodaemon' -- If true, supervisord will start in the foreground
  186. instead of daemonizing. Default: false.
  187. 'minfds' -- The minimum number of file descriptors that must be
  188. available before supervisord will start successfully. Default:
  189. 1024.
  190. 'minprocs' -- The minimum number of process descriptors that must be
  191. available before supervisord will start successfully. Default: 200.
  192. 'nocleanup' -- prevent supervisord from clearing old "AUTO" log
  193. files at startup time. Default: false.
  194. 'http_username' -- the username required for authentication to our
  195. HTTP server. Default: none.
  196. 'http_password' -- the password required for authentication to our
  197. HTTP server. Default: none.
  198. 'childlogdir' -- the directory used for AUTO log files. Default:
  199. value of Python's tempfile.gettempdir().
  200. 'user' -- if supervisord is run as root, switch users to this UNIX
  201. user account before doing any meaningful processing. This value has
  202. no effect if supervisord is not run as root. Default: do not switch
  203. users.
  204. 'directory' -- When supervisord daemonizes, switch to this
  205. directory. Default: do not cd.
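To see how a '[supervisord]' section might be read, here is a minimal sketch using Python 3's configparser module (the Python 3 name for ConfigParser). It is not supervisord's own parsing code. The sample values are the defaults listed above, and the size multipliers are an assumption: this document does not say whether "KB" means 1000 or 1024 bytes, so 1024 is used here.

```python
import configparser

SAMPLE = """
[supervisord]
http_port = 127.0.0.1:9001
logfile = /tmp/supervisor.log
logfile_maxbytes = 50MB
logfile_backups = 10
loglevel = info
nodaemon = false
"""

# Assumption: suffixes are powers of 1024; the README does not specify this.
_SUFFIXES = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(value):
    """Convert a value such as '50MB' or '0' to a number of bytes."""
    text = value.strip().upper()
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return int(text[:-len(suffix)]) * factor
    return int(text)


def load_supervisord_section(text):
    """Parse the [supervisord] section and convert a few typed settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["supervisord"]
    return {
        "http_port": section["http_port"],
        "logfile_maxbytes": parse_size(section["logfile_maxbytes"]),
        "logfile_backups": section.getint("logfile_backups"),
        "nodaemon": section.getboolean("nodaemon"),
    }


settings = load_supervisord_section(SAMPLE)
```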
  206. Configuration File '[supervisorctl]' Section Settings
  207. The configuration file may contain settings for the supervisorctl
  208. interactive shell program. These options are listed below.
  209. 'serverurl' -- The URL that should be used to access the supervisord
  210. server, e.g. "http://localhost:9001". For UNIX domain sockets, use
  211. "unix:///absolute/path/to/file.sock".
  212. 'username' -- The username to pass to the supervisord server for use
  213. in authentication (should be same as 'http_username' in supervisord
  214. config). Optional.
  215. 'password' -- The password to pass to the supervisord server for use
  216. in authentication (should be the same as 'http_password' in
  217. supervisord config). Optional.
  218. 'prompt' -- String used as supervisorctl prompt. Default: supervisor.
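The two 'serverurl' forms can be told apart with the standard library's URL parser. The function below is only an illustration of the two formats described above; its name and return shape are made up, and it is not how supervisorctl itself handles the setting.

```python
from urllib.parse import urlparse


def parse_serverurl(serverurl):
    """Split a serverurl into ('unix', path) or ('tcp', host, port)."""
    parts = urlparse(serverurl)
    if parts.scheme == "unix":
        # unix:///absolute/path has an empty host and an absolute path.
        return ("unix", parts.path)
    if parts.scheme == "http":
        return ("tcp", parts.hostname, parts.port)
    raise ValueError("unsupported serverurl: %r" % (serverurl,))
```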
  219. Configuration File '[program:x]' Section Settings
  220. The .INI file must contain one or more 'program' sections in order
  221. for supervisord to know which programs it should start and control.
  222. A sample program section has the following structure, the options of
  223. which are described below it::
  224. [program:programname]
  225. command=/path/to/programname
  226. priority=1
  227. autostart=true
  228. autorestart=true
  229. startsecs=10
  230. startretries=3
  231. exitcodes=0,2
  232. stopsignal=TERM
  233. stopwaitsecs=10
  234. user=nobody
  235. log_stdout=true
  236. log_stderr=false
  237. logfile=/tmp/programname.log
  238. logfile_maxbytes=10MB
  239. logfile_backups=2
  240. '[program:programname]' -- the section header, required for each
  241. program. 'programname' is a descriptive name (arbitrary) used to
  242. describe the program being run.
  243. 'command' -- the command that will be run when this program is
  244. started. The command can be either absolute,
  245. e.g. ('/path/to/programname') or relative ('programname'). If it is
  246. relative, the PATH will be searched for the executable. Programs
  247. can accept arguments, e.g. ('/path/to/program foo bar'). The
  248. command line can use double quotes to group arguments with spaces
  249. in them to pass to the program, e.g. ('/path/to/program/name -p "foo
  250. bar"').
  251. 'priority' -- the relative priority of the program in the start and
  252. shutdown ordering. Lower priorities indicate programs that start
  253. first and shut down last at startup and when aggregate commands are
  254. used in various clients (e.g. "start all"/"stop all"). Higher
  255. priorities indicate programs that start last and shut down first.
  256. Default: 999.
  257. 'autostart' -- If true, this program will start automatically when
  258. supervisord is started. Default: true.
  259. 'autorestart' -- If true, when the program exits "unexpectedly",
  260. supervisor will restart it automatically. "unexpected" exits are
  261. those which happen when the program exits with an "unexpected" exit
  262. code (see 'exitcodes'). Default: true.
  263. 'startsecs' -- The total number of seconds which the program needs
  264. to stay running after a startup to consider the start successful.
  265. If the program does not stay up for this many seconds after it is
  266. started, even if it exits with an "expected" exit code, the startup
  267. will be considered a failure. Set to 0 to indicate that the program
  268. needn't stay running for any particular amount of time. Default: 1.
  269. 'startretries' -- The number of serial failure attempts that
  270. supervisord will allow when attempting to start the program before
  271. giving up and putting the process into the FATAL state. Default: 3.
  272. 'exitcodes' -- The list of 'expected' exit codes for this program.
  273. A program is considered 'failed' (and will be restarted, if
  274. autorestart is set true) if it exits with an exit code which is not
  275. in this list and a stop of the program has not been explicitly
  276. requested. Default: 0,2.
  277. 'stopsignal' -- The signal used to kill the program when a stop is
  278. requested. This can be any of TERM, HUP, INT, QUIT, KILL, USR1, or
  279. USR2. Default: TERM.
  280. 'stopwaitsecs' -- The number of seconds to wait for supervisord to
  281. receive a SIGCHLD for the program after the program has been sent a
  282. stopsignal. If this number of seconds elapses before supervisord
  283. receives a SIGCHLD for the process, supervisord will attempt to
  284. kill it with a final SIGKILL. Default: 10.
  285. 'user' -- If supervisord is running as root, this UNIX user account
  286. will be used as the account which runs the program. If supervisord
  287. is not running as root, this option has no effect. Default: do not
  288. switch users.
  289. 'log_stdout' -- Send process stdout output to the process logfile.
  290. Default: true.
  291. 'log_stderr' -- Send process stderr output to the process logfile.
  292. Default: false.
  293. 'logfile' -- Keep process output as determined by log_stdout and
  294. log_stderr in this file. NOTE: if both log_stderr and log_stdout
  295. are true, chunks of output from the process' stderr and stdout will
  296. be intermingled more or less randomly in the log. If 'logfile' is
  297. unset or set to 'AUTO', supervisor will automatically choose a file
  298. location. If this is set to 'NONE', supervisord will create no log
  299. file. AUTO log files and their backups will be deleted when
  300. supervisord restarts. Default: AUTO.
  301. 'logfile_maxbytes' -- The maximum number of bytes that may be
  302. consumed by the process log file before it is rotated (suffix
  303. multipliers like "KB", "MB", and "GB" can be used in the value).
  304. Set this value to 0 to indicate an unlimited log size. Default:
  305. 50MB.
  306. 'logfile_backups' -- The number of backups to keep around resulting
  307. from process log file rotation. Set this to 0 to indicate an
  308. unlimited number of backups. Default: 10.
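The interplay of 'startsecs', 'exitcodes' and 'autorestart' can be easier to follow as code. The two functions below restate the rules given above using the documented defaults. They are a reading of this document rather than supervisord's actual implementation, and the function names are made up.

```python
def start_succeeded(uptime_secs, startsecs=1):
    """A start counts as successful once the program stayed up for startsecs."""
    return uptime_secs >= startsecs


def should_restart(exit_code, stop_requested, autorestart=True, exitcodes=(0, 2)):
    """Restart only after an exit that was neither requested nor expected."""
    if stop_requested:
        # An explicit stop is never treated as a failure.
        return False
    failed = exit_code not in exitcodes
    return autorestart and failed
```

With the defaults, a program that exits with code 1 on its own would be restarted, while one that exits with code 0 or 2, or that was stopped on request, would not.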
  309. Examples of Program Configurations
  310. Postgres 8.1.4::
  311. [program:postgres]
  312. command=/path/to/postmaster
  313. ; we use the "fast" shutdown signal SIGINT
  314. stopsignal=INT
  315. Zope 2.8 instances and ZEO::
  316. [program:zeo]
  317. command=/path/to/runzeo
  318. priority=1
  319. [program:zope1]
  320. command=/path/to/instance/home/bin/runzope
  321. priority=2
  322. log_stderr=true
  323. [program:zope2]
  324. command=/path/to/another/instance/home/bin/runzope
  325. priority=2
  326. log_stderr=true
  327. OpenLDAP slapd::
  328. [program:slapd]
  329. command=/path/to/slapd -f /path/to/slapd.conf -h ldap://0.0.0.0:8888
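The Zope and ZEO example above relies on priorities to bring ZEO up before the Zope instances. A small sketch of the resulting order for "start all" and "stop all" follows. How ties between equal priorities are broken is not stated in this document; the sketch assumes configuration order.

```python
# Program name -> priority, taken from the Zope and ZEO example.
programs = {"zeo": 1, "zope1": 2, "zope2": 2}


def start_order(priorities):
    """Lower priorities start first (ties keep their original order)."""
    return sorted(priorities, key=lambda name: priorities[name])


def stop_order(priorities):
    """Programs shut down in the reverse of their start order."""
    return list(reversed(start_order(priorities)))


print(start_order(programs))
print(stop_order(programs))
```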
  330. Process States
  331. A process controlled by supervisord will be in one of the below
  332. states at any given time. You may see these state names in various
  333. user interface elements.
  334. STOPPED (0) -- The process has been stopped due to a stop request or
  335. has never been started.
  336. STARTING (10) -- The process is starting due to a start request.
  337. RUNNING (20) -- The process is running.
  338. BACKOFF (30) -- The process is waiting to restart after a nonfatal error.
  339. STOPPING (40) -- The process is stopping due to a stop request.
  340. EXITED (100) -- The process exited with an expected exit code.
  341. FATAL (200) -- The process could not be started successfully.
  342. UNKNOWN (1000) -- The process is in an unknown state (programming error).
  343. Processes progress through these states as per the following directed
  344. graph::
  345.         STOPPED
  346.          ^  |
  347.         /   |
  348.  STOPPING   |
  349.     ^       V
  350.     |    STARTING <-----> BACKOFF
  351.     |    /      \
  352.     |   V        V
  353.     \-- RUNNING  FATAL
  354.           |
  355.           V
  356.         EXITED
  357. A process is in the STOPPED state if it has been stopped
  358. administratively or if it has never been started.
  359. When an autorestarting process is in the BACKOFF state, it will be
  360. automatically restarted by supervisord. It will switch between
  361. STARTING and BACKOFF states until it becomes evident that it cannot
  362. be started because the number of startretries has exceeded the
  363. maximum, at which point it will transition to the FATAL state. Each
  364. start retry will take progressively more time.
  365. An autorestarted process will never be automatically restarted if
  366. it ends up in the FATAL state (it must be manually restarted from
  367. this state).
  368. A process transitions into the STOPPING state via an administrative
  369. stop request, and will then end up in the STOPPED state.
  370. A process that cannot be stopped successfully will stay in the
  371. STOPPING state forever. This situation should never be reached
  372. during normal operations as it implies that the process did not
  373. respond to a final SIGKILL, which is "impossible" under UNIX.
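The stop sequence described by 'stopsignal' and 'stopwaitsecs' can be sketched with Python 3's subprocess module. This is an illustration of the idea, not supervisord's code: the helper name is made up, and the child here is a throwaway process that merely sleeps.

```python
import signal
import subprocess
import sys


def stop_process(proc, stopsignal=signal.SIGTERM, stopwaitsecs=10):
    """Send stopsignal, wait up to stopwaitsecs, then fall back to SIGKILL."""
    proc.send_signal(stopsignal)
    try:
        return proc.wait(timeout=stopwaitsecs)
    except subprocess.TimeoutExpired:
        # The process ignored the stop signal; SIGKILL cannot be ignored.
        proc.kill()
        return proc.wait()


# Hypothetical child used only for demonstration: it just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_status = stop_process(child, stopwaitsecs=5)
# A negative value from subprocess means "terminated by that signal number".
print(exit_status)
```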
  374. Terminal states are "STOPPED", "FATAL", "EXITED", and "UNKNOWN".
  375. All other states are transitional.
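For client code that needs to reason about these states, the codes and the graph can be written down as data. The edges below are one reading of the directed graph above; manual restarts out of terminal states, which the graph does not draw, are left out. The names ProcessState, TERMINAL_STATES, TRANSITIONS and can_transition are made up for this sketch.

```python
from enum import IntEnum


class ProcessState(IntEnum):
    STOPPED = 0
    STARTING = 10
    RUNNING = 20
    BACKOFF = 30
    STOPPING = 40
    EXITED = 100
    FATAL = 200
    UNKNOWN = 1000


TERMINAL_STATES = frozenset({
    ProcessState.STOPPED,
    ProcessState.FATAL,
    ProcessState.EXITED,
    ProcessState.UNKNOWN,
})

# Edges as read from the state graph in this document.
TRANSITIONS = {
    ProcessState.STOPPED: {ProcessState.STARTING},
    ProcessState.STARTING: {
        ProcessState.RUNNING,
        ProcessState.BACKOFF,
        ProcessState.FATAL,
    },
    ProcessState.BACKOFF: {ProcessState.STARTING},
    ProcessState.RUNNING: {ProcessState.STOPPING, ProcessState.EXITED},
    ProcessState.STOPPING: {ProcessState.STOPPED},
}


def can_transition(current, target):
    """Return True if the graph has an edge from current to target."""
    return target in TRANSITIONS.get(current, set())
```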
  376. Signals
  377. Killing supervisord with SIGHUP will stop all processes, reload the
  378. configuration from the config file, and restart all processes.
  379. Killing supervisord with SIGUSR2 will close and reopen the
  380. supervisord activity log and child log files.
  381. Access Control
  382. The UNIX permissions on the socket effectively control who may send
  383. commands to the server. HTTP basic authentication provides access
  384. control for internet and UNIX domain sockets as necessary.
  385. Security Notes
  386. I have done my best to assure that use of a supervisord process
  387. running as root cannot lead to unintended privilege escalation, but
  388. caveat emptor. Particularly, it is not as paranoid as something
  389. like DJ Bernstein's "daemontools", inasmuch as "supervisord" allows
  390. for arbitrary path specifications in its configuration file to which
  391. data may be written. Allowing arbitrary path selections can create
  392. vulnerabilities from symlink attacks. Be careful when specifying
  393. paths in your configuration. Ensure that supervisord's
  394. configuration file cannot be read from or written to by unprivileged
  395. users and that all files installed by the supervisor package have
  396. "sane" file permission protection settings. Additionally, ensure
  397. that your PYTHONPATH is sane and that all Python standard library
  398. files have adequate file permission protections. Then, pray to the
  399. deity of your choice.
  400. Other Notes
  401. Some examples of shell scripts to start services under supervisor
  402. can be found "here":http://www.thedjbway.org/services.html. These
  403. examples are actually for daemontools but the premise is the same
  404. for supervisor.
  405. Some processes (like mysqld) ignore signals sent to the actual
  406. process/thread which is created by supervisord. Instead, a
  407. "special" thread/process is created by these kinds of programs which
  408. is responsible for handling signals. This is problematic, because
  409. supervisord can only kill a pid which it creates itself, not any
  410. child thread or process of the program it creates. Fortunately,
  411. these programs typically write a pidfile which is meant to be read
  412. in order to kill the process. As a workaround for this case, a
  413. special "pidproxy" program can handle startup of these kinds of
  414. processes. The pidproxy program is a small shim that starts a
  415. process, and upon the receipt of a signal, sends the signal to the
  416. pid provided in a pidfile. A sample supervisord configuration
  417. program entry for a pidproxy-enabled program is provided here::
  418. [program:mysql]
  419. command=/path/to/pidproxy /path/to/pidfile /path/to/mysqld_safe
  420. The pidproxy program is named 'pidproxy.py' and is in the
  421. distribution.
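The core idea behind such a shim, relaying a signal to whatever pid the pidfile names, fits in a few lines. The sketch below is not the pidproxy.py shipped in the distribution; the function names are made up, and a sleeping child process stands in for a program like mysqld.

```python
import os
import signal
import subprocess
import sys
import tempfile


def read_pid(pidfile):
    """Read the process id recorded in a pidfile."""
    with open(pidfile) as handle:
        return int(handle.read().strip())


def forward_signal(pidfile, signum):
    """Send signum to the pid recorded in pidfile and return that pid."""
    pid = read_pid(pidfile)
    os.kill(pid, signum)
    return pid


# Demonstration: a hypothetical sleeping child whose pid we record ourselves.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
with tempfile.NamedTemporaryFile("w", suffix=".pid", delete=False) as handle:
    handle.write(str(child.pid))
signalled_pid = forward_signal(handle.name, signal.SIGTERM)
exit_status = child.wait()
os.unlink(handle.name)
```

A real shim would also start the wrapped program and install signal handlers that call something like forward_signal when supervisord sends the stop signal.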
  422. FAQ
  423. My program never starts and supervisor doesn't indicate any error:
  424. Make sure the "x" bit is set on the executable file you're using in
  425. the command= line.
  426. How can I tell if my program is running under supervisor? Supervisor
  427. and its subprocesses share an environment variable
  428. "SUPERVISOR_ENABLED". When a process is run under supervisor, your
  429. program can check for the presence of this variable to determine
  430. whether it is running under supervisor (new in 2.0).
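A program written in Python could perform this check as follows. Only the presence of the variable is tested, since this document does not say what value it holds; the function name is made up.

```python
import os


def running_under_supervisor(environ=None):
    """Return True if the SUPERVISOR_ENABLED variable is present."""
    if environ is None:
        environ = os.environ
    return "SUPERVISOR_ENABLED" in environ
```

Passing a dictionary instead of relying on os.environ makes the check easy to exercise in tests.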
  431. Reporting Bugs
  432. Please report bugs at http://www.plope.com/software/collector .
  433. Author Information
  434. Chris McDonough (chrism@plope.com)
  435. http://www.plope.com