
  1. Supervisor: A System for Allowing the Control of Process State on UNIX
  2. History
  3. 7/3/2006: updated for version 2.0
  4. Changes
  5. 2.0: substantial rewrite based on 1.0.6, use ConfigParser rather
  6. than ZConfig, use HTTP for wire protocol, web interface.
  7. Introduction
  8. The supervisor is a client/server system that allows its users to
  9. control a number of processes on UNIX-like operating systems. It
  10. was inspired by the following:
  11. - It is often inconvenient to need to write "rc.d" scripts for
  12. every single process instance. rc.d scripts are a great
  13. lowest-common-denominator form of process
  14. initialization/autostart/management, but they can be painful to
  15. write and maintain. Additionally, rc.d scripts cannot
  16. automatically restart a crashed process and many programs do not
  17. restart themselves properly on a crash. Supervisord starts
  18. processes as its subprocesses, and can be configured to
  19. automatically restart them on a crash. It can also be configured
  20. to start processes automatically on its own invocation.
  21. - It's often difficult to get accurate up/down status on processes
  22. on UNIX. Pidfiles often lie. Supervisord starts processes as
  23. subprocesses, so it always knows the true up/down status of its
  24. children and can be queried conveniently for this data.
  25. - Users who need to control process state often need only to do
  26. that. They don't want or need full-blown shell access to the
  27. machine on which the processes are running. Supervisorctl allows
  28. a very limited form of access to the machine, essentially
  29. allowing users to see process status and control
  30. supervisord-controlled subprocesses by emitting "stop", "start",
  31. and "restart" commands from a simple shell.
  32. - Users often need to control processes on many machines.
  33. Supervisor provides a simple, secure, and uniform mechanism for
  34. interactively and automatically controlling processes on groups
  35. of machines.
  36. - Processes which listen on "low" TCP ports often need to be
  37. started and restarted as the root user (a UNIX misfeature). It's
  38. usually the case that it's perfectly fine to allow "normal"
  39. people to stop or restart such a process, but providing them with
  40. shell access is often impractical, and providing them with root
  41. access or sudo access is often impossible. It's also (rightly)
  42. difficult to explain to them why this problem exists. If
  43. supervisord is started as root, it is possible to allow "normal"
  44. users to control such processes without needing to explain the
  45. intricacies of the problem to them.
  46. - Processes often need to be started and stopped in groups,
  47. sometimes even in a "priority order". It's often difficult to
  48. explain to people how to do this. Supervisor allows you to
  49. assign priorities to processes, and allows users to emit commands
  50. via the supervisorctl client like "start all" and "restart all",
  51. which start them in the preassigned priority order.
  52. Supported Platforms
  53. Supervisor has been tested and is known to run on Linux (Fedora Core
  54. 5, Ubuntu 6), Mac OS X (10.4), and Solaris (10 for Intel). It has
  55. been reported to work on FreeBSD.
  56. Supervisor requires Python 2.3 or better.
  57. Installing
  58. Run "python setup.py install", then copy the "sample.conf" file to
  59. /etc/supervisord.conf and modify to your liking. If you'd rather
  60. not put the supervisord.conf file in /etc, you can place it anywhere
  61. and start supervisord and point it at the configuration file via the
  62. -c flag, e.g. "python supervisord.py -c /path/to/sample/conf".
  63. Running Supervisord
  64. To start supervisord, run $PYDIR/bin/supervisord. The resulting
  65. process will daemonize itself and detach from the terminal. It
  66. keeps an operations log at "/tmp/supervisor.log" by default.
  67. To change the set of programs controlled by supervisord, edit the
  68. supervisord.conf file and HUP or restart the supervisord process.
  69. This file has several example program definitions. Controlled
  70. programs should themselves not be daemons, as supervisord assumes it
  71. is responsible for daemonizing its subprocesses.
  72. Supervisord accepts a number of command-line overrides. Type
  73. 'supervisord -h' for an overview.
  74. Running Supervisorctl
  75. To start supervisorctl, run $PYDIR/bin/supervisorctl. A shell will
  76. be presented that will allow you to control the processes that are
  77. currently managed by supervisord. Type "help" at the prompt to get
  78. information about the supported commands.
  79. supervisorctl may be invoked with "one time" commands when invoked
  80. with arguments from a command line. An example: "supervisorctl stop
  81. all". If arguments are present on the supervisorctl command line,
  82. the interactive shell is not invoked. Instead, the command is
  83. executed and supervisorctl exits.
  84. If supervisorctl is invoked in interactive mode against a
  85. supervisord that requires authentication, you will be asked for
  86. authentication credentials.
  87. Components
  88. Supervisord
  89. The server piece of the supervisor is named "supervisord". It is
  90. responsible for responding to commands from the client process as
  91. well as restarting crashed processes. It is meant to be run as
  92. the root user in most production setups. NOTE: see "Security
  93. Notes" at the end of this document for caveats!
  94. The server process uses a configuration file. This is typically
  95. located in "/etc/supervisord.conf". This configuration file is a
  96. "Windows-INI" style config file. It is important to keep this
  97. file "secure" because it may contain unencrypted usernames and
  98. passwords.
  99. Supervisorctl
  100. The command-line client piece of the supervisor is named
  101. "supervisorctl". It provides a shell-like interface to the
  102. features provided by supervisord. From supervisorctl, a user can
  103. connect to different supervisord processes, get status on the
  104. subprocesses controlled by a supervisord, stop and start
  105. subprocesses of a supervisord, and get lists of running processes
  106. of a supervisord.
  107. The command-line client talks to the server across a UNIX domain
  108. socket or an Internet socket. The server can assert that the user
  109. of a client should present authentication credentials before it
  110. allows him to perform commands. The client process may use the
  111. same configuration file as the server (any configuration file with
  112. a [supervisorctl] section in it will work).
  113. Web Server
  114. A (sparse) web user interface with functionality comparable to
  115. supervisorctl may be accessed via a browser if you start
  116. supervisord against an internet socket. Visit the server URL
  117. (e.g. http://localhost:9001/) to view and control process status
  118. through the web interface.
  119. XML-RPC Interface
  120. The same HTTP server which serves the web UI serves up an XML-RPC
  121. interface that can be used to interrogate and control supervisor
  122. and the programs it runs. To use the XML-RPC interface, connect
  123. to supervisor's http port with any XML-RPC client library and run
  124. commands against it. An example of doing this using Python's
  125. xmlrpclib client library::
  126. import xmlrpclib
  127. server = xmlrpclib.Server('http://localhost:9001')
  128. Call methods against the supervisor and its subprocesses by using
  129. the 'supervisor' namespace::
  130. server.supervisor.getState()
  131. You can get a list of methods supported by supervisor's XML-RPC
  132. interface by using the XML-RPC 'system.listMethods' API::
  133. server.system.listMethods()
  134. You can see help on a method by using the 'system.methodHelp' API
  135. against the method::
  136. print server.system.methodHelp('supervisor.shutdown')
  137. Supervisor's XML-RPC interface also supports the nascent
  138. "XML-RPC multicall API":http://www.xmlrpc.com/discuss/msgReader$1208 .
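The examples above use Python 2's xmlrpclib. As a rough sketch for readers on Python 3, where the same client lives in the standard library module xmlrpc.client, the calls might look like the following. The helper names connect and supervisor_methods are hypothetical, and the URL is simply the example URL used above.

```python
import xmlrpc.client


def connect(url="http://localhost:9001"):
    """Build a proxy for supervisord's HTTP port (no request is sent yet)."""
    return xmlrpc.client.ServerProxy(url)


def supervisor_methods(server):
    """Return the method names that live in the 'supervisor' namespace."""
    return sorted(
        name for name in server.system.listMethods()
        if name.startswith("supervisor.")
    )


# Usage against a running supervisord (not executed here):
#   server = connect()
#   print(server.supervisor.getState())
#   for name in supervisor_methods(server):
#       print(name, "-", server.system.methodHelp(name))
```

The proxy only opens a connection when a method is called, so building it is cheap and cannot fail on its own.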
  139. Configuration File '[supervisord]' Section Settings
  140. The supervisord.conf configuration file contains a section named
  141. '[supervisord]' in which global settings for the supervisord process
  142. should be inserted. These are:
  143. 'http_port' -- Either a TCP host:port value (e.g. 127.0.0.1:9001)
  144. or a path to a UNIX domain socket (e.g. /tmp/supervisord.sock) on
  145. which supervisor will listen for HTTP/XML-RPC requests.
  146. Supervisorctl itself uses XML-RPC to communicate with supervisord
  147. over this port.
  148. 'sockchmod' -- Change the UNIX permission mode bits of the http_port
  149. UNIX domain socket to this value (ignored if using a TCP socket).
  150. Default: 0700.
  151. 'sockchown' -- Change the user and group of the socket file to this
  152. value. May be a username (e.g. chrism) or a username and group
  153. separated by a dot (e.g. chrism.wheel). Default: do not change.
  154. 'umask' -- The umask of the supervisord process. Default: 022.
  155. 'logfile' -- The path to the activity log of the supervisord process.
  156. 'logfile_maxbytes' -- The maximum number of bytes that may be
  157. consumed by the activity log file before it is rotated (suffix
  158. multipliers like "KB", "MB", and "GB" can be used in the value).
  159. Set this value to 0 to indicate an unlimited log size. Default:
  160. 50MB.
  161. 'logfile_backups' -- The number of backups to keep around resulting
  162. from activity log file rotation. Set this to 0 to indicate an
  163. unlimited number of backups. Default: 10.
  164. 'loglevel' -- The logging level, dictating what is written to the
  165. activity log. One of 'critical', 'error', 'warn', 'info', 'debug'
  166. or 'trace'. At log level 'trace', the supervisord log file will
  167. record the stderr/stdout output of its child processes, which is
  168. useful for debugging. Default: info.
  169. 'pidfile' -- The location in which supervisord keeps its pid file.
  170. 'nodaemon' -- If true, supervisord will start in the foreground
  171. instead of daemonizing. Default: false.
  172. 'minfds' -- The minimum number of file descriptors that must be
  173. available before supervisord will start successfully. Default:
  174. 1024.
  175. 'minprocs' -- The minimum number of process descriptors that must be
  176. available before supervisord will start successfully. Default: 200.
  177. 'nocleanup' -- prevent supervisord from clearing old "AUTO" log
  178. files at startup time. Default: false.
  179. 'http_username' -- the username required for authentication to our
  180. HTTP server. Default: none.
  181. 'http_password' -- the password required for authentication to our
  182. HTTP server. Default: none.
  183. 'childlogdir' -- the directory used for AUTO log files. Default:
  184. value of Python's tempfile.gettempdir().
  185. 'user' -- if supervisord is run as root, switch users to this UNIX
  186. user account before doing any meaningful processing. This value has
  187. no effect if supervisord is not run as root. Default: do not switch
  188. users.
  189. 'directory' -- When supervisord daemonizes, switch to this
  190. directory. Default: do not cd.
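To make the settings above concrete, here is a minimal sketch that reads a '[supervisord]' section with Python 3's configparser and converts a 'logfile_maxbytes' value with a suffix multiplier into bytes. The parse_bytes helper is hypothetical, and the choice of 1 KB = 1024 bytes is an assumption, since the text above does not define the multipliers. The sample values are the documented defaults and examples.

```python
import configparser

# Assumption: binary multiples (1 KB = 1024 bytes); the README does not say.
MULTIPLIERS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_bytes(value):
    """Convert a value such as '50MB' or '0' into a number of bytes."""
    text = value.strip().upper()
    for suffix, factor in MULTIPLIERS.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)


SAMPLE = """
[supervisord]
http_port = 127.0.0.1:9001
logfile = /tmp/supervisor.log
logfile_maxbytes = 50MB
logfile_backups = 10
loglevel = info
nodaemon = false
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)
section = config["supervisord"]

max_bytes = parse_bytes(section["logfile_maxbytes"])  # size limit before rotation
backups = section.getint("logfile_backups")           # rotated files to keep
nodaemon = section.getboolean("nodaemon")             # stay in the foreground?
```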
  191. Configuration File '[supervisorctl]' Section Settings
  192. The configuration file may contain settings for the supervisorctl
  193. interactive shell program. These options are listed below.
  194. 'serverurl' -- The URL that should be used to access the supervisord
  195. server, e.g. "http://localhost:9001". For UNIX domain sockets, use
  196. "unix:///absolute/path/to/file.sock".
  197. 'username' -- The username to pass to the supervisord server for use
  198. in authentication (should be same as 'http_username' in supervisord
  199. config). Optional.
  200. 'password' -- The password to pass to the supervisord server for use
  201. in authentication (should be the same as 'http_password' in
  202. supervisord config). Optional.
  203. 'prompt' -- String used as supervisorctl prompt. Default: supervisor.
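The two 'serverurl' forms can be told apart with the standard URL parser. This is only an illustrative sketch: parse_serverurl is a hypothetical helper, not part of supervisorctl, and it handles just the two forms described above.

```python
from urllib.parse import urlsplit


def parse_serverurl(serverurl):
    """Classify a serverurl as a TCP endpoint or a UNIX domain socket path."""
    parts = urlsplit(serverurl)
    if parts.scheme == "unix":
        # unix:///absolute/path/to/file.sock -> the path component
        return ("unix", parts.path)
    if parts.scheme == "http":
        return ("tcp", (parts.hostname, parts.port))
    raise ValueError("unsupported serverurl: %r" % serverurl)
```

For example, parse_serverurl("unix:///tmp/supervisord.sock") yields the socket path, while an http URL yields a host and port pair.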
  204. Configuration File '[program:x]' Section Settings
  205. The .INI file must contain one or more 'program' sections in order
  206. for supervisord to know which programs it should start and control.
  207. A sample program section has the following structure, the options of
  208. which are described below it::
  209. [program:programname]
  210. command=/path/to/programname
  211. priority=1
  212. autostart=true
  213. autorestart=true
  214. startsecs=10
  215. startretries=999
  216. exitcodes=0,2
  217. stopsignal=TERM
  218. stopwaitsecs=10
  219. user=nobody
  220. log_stdout=true
  221. log_stderr=false
  222. logfile=/tmp/programname.log
  223. logfile_maxbytes=10MB
  224. logfile_backups=2
  225. '[program:programname]' -- the section header, required for each
  226. program. 'programname' is a descriptive name (arbitrary) used to
  227. describe the program being run.
  228. 'command' -- the command that will be run when this program is
  229. started. The command can be either absolute,
  230. e.g. ('/path/to/programname') or relative ('programname'). If it is
  231. relative, the PATH will be searched for the executable. Programs
  232. can accept arguments, e.g. ('/path/to/program foo bar'). The
  233. command line can use double quotes to group arguments with spaces
  234. in them to pass to the program, e.g. ('/path/to/program/name -p "foo
  235. bar"').
  236. 'priority' -- the relative priority of the program in the start and
  237. shutdown ordering. Lower priorities indicate programs that start
  238. first and shut down last at startup and when aggregate commands are
  239. used in various clients (e.g. "start all"/"stop all"). Higher
  240. priorities indicate programs that start last and shut down first.
  241. Default: 999.
  242. 'autostart' -- If true, this program will start automatically when
  243. supervisord is started. Default: true.
  244. 'autorestart' -- If true, when the program exits "unexpectedly",
  245. supervisor will restart it automatically. "unexpected" exits are
  246. those which happen when the program exits with an "unexpected" exit
  247. code (see 'exitcodes'). Default: true.
  248. 'startsecs' -- The total number of seconds which the program needs
  249. to stay running after a startup to consider the start successful.
  250. If the program does not stay up for this many seconds after it is
  251. started, even if it exits with an "expected" exit code, the startup
  252. will be considered a failure. Set to 0 to indicate that the program
  253. needn't stay running for any particular amount of time. Default: 1.
  254. 'startretries' -- The number of serial failure attempts that
  255. supervisord will allow when attempting to start the program before
  256. giving up and putting the process into a FATAL state. Default: 3.
  257. 'exitcodes' -- The list of 'expected' exit codes for this program.
  258. A program is considered 'failed' (and will be restarted, if
  259. autorestart is set true) if it exits with an exit code which is not
  260. in this list and a stop of the program has not been explicitly
  261. requested. Default: 0,2.
  262. 'stopsignal' -- The signal used to kill the program when a stop is
  263. requested. This can be any of TERM, HUP, INT, QUIT, KILL, USR1, or
  264. USR2. Default: TERM.
  265. 'stopwaitsecs' -- The number of seconds to wait for the program to
  266. return a SIGCHLD to supervisord after the program has been sent a
  267. stopsignal. If this number of seconds elapses before supervisord
  268. receives a SIGCHLD from the process, supervisord will attempt to
  269. kill it with a final SIGKILL. Default: 10.
  270. 'user' -- If supervisord is running as root, this UNIX user account
  271. will be used as the account which runs the program. If supervisord
  272. is not running as root, this option has no effect. Default: do not
  273. switch users.
  274. 'log_stdout' -- Send process stdout output to the process logfile.
  275. Default: true.
  276. 'log_stderr' -- Send process stderr output to the process logfile.
  277. Default: false.
  278. 'logfile' -- Keep process output as determined by log_stdout and
  279. log_stderr in this file. NOTE: if both log_stderr and log_stdout
  280. are true, chunks of output from the process' stderr and stdout will
  281. be intermingled more or less randomly in the log. If 'logfile' is
  282. unset or set to 'AUTO', supervisor will automatically choose a file
  283. location. If this is set to 'NONE', supervisord will create no log
  284. file. AUTO log files and their backups will be deleted when
  285. supervisord restarts. Default: AUTO.
  286. 'logfile_maxbytes' -- The maximum number of bytes that may be
  287. consumed by the process log file before it is rotated (suffix
  288. multipliers like "KB", "MB", and "GB" can be used in the value).
  289. Set this value to 0 to indicate an unlimited log size. Default:
  290. 50MB.
  291. 'logfile_backups' -- The number of backups to keep around resulting
  292. from process log file rotation. Set this to 0 to indicate an
  293. unlimited number of backups. Default: 10.
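The interaction of 'exitcodes', 'startsecs', and 'autorestart' can be summarized as a small decision function. This is one reading of the descriptions above, not supervisord's actual code. The function name, its parameters, and the result labels are all hypothetical.

```python
def parse_exitcodes(value):
    """Turn an 'exitcodes' value such as '0,2' into a set of integers."""
    return {int(code) for code in value.split(",")}


def classify_exit(exit_code, uptime, exitcodes="0,2", startsecs=1,
                  autorestart=True, stop_requested=False):
    """Label a process exit according to the settings described above.

    uptime is the number of seconds the process stayed up after starting.
    """
    if stop_requested:
        # An explicitly requested stop is never treated as a failure.
        return "stopped"
    if uptime < startsecs:
        # Too short-lived: the start failed, even with an expected exit code.
        return "start_failed"
    if exit_code in parse_exitcodes(exitcodes):
        return "expected"
    # Unexpected exit code: restart only if autorestart is true.
    return "restart" if autorestart else "failed"
```

With the defaults, classify_exit(1, 30) gives "restart", and the same call with autorestart=False gives "failed".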
  294. Examples of Program Configurations
  295. Postgres 8.14::
  296. [program:postgres]
  297. command=/path/to/postmaster
  298. ; we use the "fast" shutdown signal SIGINT
  299. stopsignal=INT
  300. Zope 2.8 instances and ZEO::
  301. [program:zeo]
  302. command=/path/to/runzeo
  303. priority=1
  304. [program:zope1]
  305. command=/path/to/instance/home/bin/runzope
  306. priority=2
  307. log_stderr=true
  308. [program:zope2]
  309. command=/path/to/another/instance/home/bin/runzope
  310. priority=2
  311. log_stderr=true
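As a sketch of how the priorities in the Zope/ZEO example translate into a start and stop order, the following reads the same sections with Python 3's configparser and sorts them. The helper name is hypothetical, and keeping file order for programs with equal priority is an assumption made here, since the text does not say how ties are broken.

```python
import configparser

SAMPLE = """
[program:zeo]
command=/path/to/runzeo
priority=1

[program:zope1]
command=/path/to/instance/home/bin/runzope
priority=2
log_stderr=true

[program:zope2]
command=/path/to/another/instance/home/bin/runzope
priority=2
log_stderr=true
"""

DEFAULT_PRIORITY = 999  # documented default for 'priority'


def programs_by_priority(config):
    """Return (name, priority) pairs, lowest priority number first."""
    programs = []
    for section in config.sections():
        if section.startswith("program:"):
            name = section.split(":", 1)[1]
            priority = config.getint(section, "priority",
                                     fallback=DEFAULT_PRIORITY)
            programs.append((name, priority))
    # sorted() is stable, so equal priorities keep their file order (assumption)
    return sorted(programs, key=lambda item: item[1])


config = configparser.ConfigParser()
config.read_string(SAMPLE)
start_order = [name for name, _ in programs_by_priority(config)]
stop_order = list(reversed(start_order))  # shut down in the opposite order
```

Here ZEO starts before both Zope instances and is stopped after them.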
  312. OpenLDAP slapd::
  313. [program:slapd]
  314. command=/path/to/slapd -f /path/to/slapd.conf -h ldap://0.0.0.0:8888
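The 'command' values in these examples carry arguments, and the 'command' setting allows double quotes to group arguments that contain spaces. Python's shlex module follows similar quoting rules, so it is a convenient way to preview how such a line splits into an argument list. Whether supervisord uses shlex internally is not stated in this document, so treat this as an approximation.

```python
import shlex

# The quoted example from the description of 'command'
quoted = '/path/to/program/name -p "foo bar"'
quoted_argv = shlex.split(quoted)

# The slapd example above has no quoting, so it splits on whitespace
slapd = "/path/to/slapd -f /path/to/slapd.conf -h ldap://0.0.0.0:8888"
slapd_argv = shlex.split(slapd)
```

In the first case "foo bar" stays together as a single argument.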
  315. Process States
  316. A process controlled by supervisord will be in one of the below
  317. states at any given time. You may see these state names in various
  318. user interface elements.
  319. STOPPED (0) -- The process has been stopped due to a stop request or
  320. has never been started.
  321. STARTING (10) -- The process is starting due to a start request.
  322. RUNNING (20) -- The process is running.
  323. BACKOFF (30) -- The process is waiting to restart after a nonfatal error.
  324. STOPPING (40) -- The process is stopping due to a stop request.
  325. EXITED (100) -- The process exited with an expected exit code.
  326. FATAL (200) -- The process could not be started successfully.
  327. UNKNOWN (1000) -- The process is in an unknown state (programming error).
  328. Processes progress through these states as per the following directed
  329. graph::
  330.            STOPPED
  331.             ^  |
  332.            /   |
  333.     STOPPING   |
  334.        ^       V
  335.        |     STARTING <-----> BACKOFF
  336.        |      /    \
  337.        |     V      V
  338.        \-- RUNNING  FATAL
  339.              |
  340.              V
  341.            EXITED
  342. A process is in the STOPPED state if it has been stopped
  343. administratively or if it has never been started.
  344. When an autorestarting process is in the BACKOFF state, it will be
  345. automatically restarted by supervisord. It will switch between
  346. STARTING and BACKOFF states until it becomes evident that it cannot
  347. be started because the number of startretries has exceeded the
  348. maximum, at which point it will transition to the FATAL state. Each
  349. start retry will take progressively more time.
  350. An autorestarted process will never be automatically restarted if
  351. it ends up in the FATAL state (it must be manually restarted from
  352. this state).
  353. A process transitions into the STOPPING state via an administrative
  354. stop request, and will then end up in the STOPPED state.
  355. A process that cannot be stopped successfully will stay in the
  356. STOPPING state forever. This situation should never be reached
  357. during normal operations as it implies that the process did not
  358. respond to a final SIGKILL, which is "impossible" under UNIX.
  359. Terminal states are "STOPPED", "FATAL", "EXITED", and "UNKNOWN".
  360. All other states are transitional.
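The state codes and the graph above can be written down as a small table, which may be handy when interpreting state values in client code. This is a sketch: the names ProcessState, TRANSITIONS, is_terminal, and can_transition are hypothetical, and the transition table contains only the edges drawn in the graph (manual restarts out of terminal states are not modeled).

```python
from enum import IntEnum


class ProcessState(IntEnum):
    STOPPED = 0
    STARTING = 10
    RUNNING = 20
    BACKOFF = 30
    STOPPING = 40
    EXITED = 100
    FATAL = 200
    UNKNOWN = 1000


S = ProcessState

# Terminal states as listed in the text; all others are transitional.
TERMINAL_STATES = frozenset({S.STOPPED, S.FATAL, S.EXITED, S.UNKNOWN})

# Only the edges drawn in the directed graph.
TRANSITIONS = {
    S.STOPPED: {S.STARTING},
    S.STARTING: {S.RUNNING, S.BACKOFF, S.FATAL},
    S.BACKOFF: {S.STARTING},
    S.RUNNING: {S.STOPPING, S.EXITED},
    S.STOPPING: {S.STOPPED},
}


def is_terminal(state):
    """True if the state is one of the terminal states."""
    return state in TERMINAL_STATES


def can_transition(src, dst):
    """True if the graph has an edge from src to dst."""
    return dst in TRANSITIONS.get(src, set())
```

Because ProcessState is an IntEnum, a numeric code can be turned back into a name with ProcessState(code).name.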
  361. Signals
  362. Killing supervisord with SIGHUP will stop all processes, reload the
  363. configuration from the config file, and restart all processes.
  364. Killing supervisord with SIGUSR2 will rotate the supervisord and
  365. child log files.
  366. Access Control
  367. The UNIX permissions on the socket effectively control who may send
  368. commands to the server. HTTP basic authentication provides access
  369. control for internet and UNIX domain sockets as necessary.
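HTTP basic authentication only encodes the credentials; it does not encrypt them. The following sketch shows how a client would build the header value from a username and password. The function name is hypothetical and the credentials in the usage note are placeholders.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP 'Authorization' header for basic auth."""
    raw = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

Anyone who can observe the request can decode the value with base64, which is one more reason to prefer a UNIX domain socket with restrictive permissions or a loopback address where possible.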
  370. Security Notes
  371. I have done my best to assure that use of a supervisord process
  372. running as root cannot lead to unintended privilege escalation, but
  373. caveat emptor. Particularly, it is not as paranoid as something
  374. like DJ Bernstein's "daemontools", inasmuch as "supervisord" allows
  375. for arbitrary path specifications in its configuration file to which
  376. data may be written. Allowing arbitrary path selections can create
  377. vulnerabilities from symlink attacks. Be careful when specifying
  378. paths in your configuration. Ensure that supervisord's
  379. configuration file cannot be read from or written to by unprivileged
  380. users and that all files installed by the supervisor package have
  381. "sane" file permission protection settings. Additionally, ensure
  382. that your PYTHONPATH is sane and that all Python standard library
  383. files have adequate file permission protections. Then, pray to the
  384. deity of your choice.
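One of the checks suggested above, that the configuration file is not accessible to unprivileged users, can be automated. This is a minimal sketch with a hypothetical helper name. It only inspects the permission bits of the file itself, not its owner or its parent directories.

```python
import os
import stat


def is_private(path):
    """True if neither group nor others can read, write or execute path."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A file with mode 0600 passes this check, while one with mode 0644 does not.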
  385. Other Notes
  386. Some examples of shell scripts to start services under supervisor
  387. can be found "here":http://www.thedjbway.org/services.html. These
  388. examples are actually for daemontools but the premise is the same
  389. for supervisor.
  390. Some processes (like mysqld) ignore signals sent to the actual
  391. process/thread which is created by supervisord. Instead, a
  392. "special" thread/process is created by these kinds of programs which
  393. is responsible for handling signals. This is problematic, because
  394. supervisord can only kill a pid which it creates itself.
  395. Fortunately, these programs typically write a pidfile which is meant
  396. to be read in order to kill the process. As a workaround for
  397. this case, a special "pidproxy" program can handle startup of these
  398. kinds of processes. The pidproxy program is a small shim that
  399. starts a process, and upon the receipt of a signal, sends the signal
  400. to the pid provided in a pidfile. A sample supervisord
  401. configuration program entry for a pidproxy-enabled program is
  402. provided here::
  403. [program:mysql]
  404. command=/path/to/pidproxy /path/to/pidfile /path/to/mysqld_safe
  405. The pidproxy program is named 'pidproxy.py' and is in the
  406. distribution.
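To illustrate the idea behind such a shim, here is a rough Python 3 sketch. It is not the pidproxy.py shipped in the distribution. All function names are hypothetical, and the argument layout (pidfile first, then the command) simply mirrors the sample entry above.

```python
import os
import signal
import subprocess


def read_pid(pidfile):
    """Read the process id that the managed program wrote to its pidfile."""
    with open(pidfile) as f:
        return int(f.read().strip())


def make_forwarder(pidfile):
    """Return a signal handler that relays the signal to the pidfile's pid."""
    def forward(signum, frame):
        os.kill(read_pid(pidfile), signum)
    return forward


def main(argv):
    """argv: [prog, pidfile, command, arg, ...]"""
    pidfile, command = argv[1], argv[2:]
    handler = make_forwarder(pidfile)
    # SIGKILL cannot be caught, so it is not in this list.
    for signum in (signal.SIGTERM, signal.SIGHUP, signal.SIGINT,
                   signal.SIGQUIT, signal.SIGUSR1, signal.SIGUSR2):
        signal.signal(signum, handler)
    child = subprocess.Popen(command)
    return child.wait()
```

A real shim would also need to cope with a signal that arrives before the pidfile has been written; this sketch would raise an error in that case.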
  407. FAQ
  408. My program never starts and supervisor doesn't indicate any error:
  409. Make sure the "x" bit is set on the executable file named in the
  410. program's 'command' setting.
  411. How can I tell if my program is running under supervisor? Supervisor
  412. and its subprocesses share an environment variable
  413. "SUPERVISOR_ENABLED". When a process is run under supervisor, your
  414. program can check for the presence of this variable to determine
  415. whether it is running under supervisor (new in 2.0).
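A program written in Python could perform that check as follows. The helper name is hypothetical. Only the presence of the variable is tested, because its value is not specified here.

```python
import os


def running_under_supervisor(environ=None):
    """True if SUPERVISOR_ENABLED is present in the environment."""
    env = os.environ if environ is None else environ
    return "SUPERVISOR_ENABLED" in env
```

Passing a dictionary instead of relying on os.environ makes the check easy to exercise in tests.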
  416. Author Information
  417. Chris McDonough (chrism@plope.com)
  418. http://www.plope.com