From 4f48c8d58c7fc6227e7e291281d9d4723e89155f Mon Sep 17 00:00:00 2001 From: Graham Dumpleton Date: Sat, 23 Jan 2016 18:02:00 +1100 Subject: Rescue more documentation from Google Code site. --- docs/config-directives/WSGIAcceptMutex.rst | 21 - docs/config-directives/WSGIAccessScript.rst | 31 - docs/config-directives/WSGIApplicationGroup.rst | 106 -- docs/config-directives/WSGIAuthGroupScript.rst | 31 - docs/config-directives/WSGIAuthUserScript.rst | 40 - docs/config-directives/WSGICallableObject.rst | 29 - docs/config-directives/WSGICaseSensitivity.rst | 23 - docs/config-directives/WSGIDaemonProcess.rst | 276 ----- docs/config-directives/WSGIImportScript.rst | 42 - docs/config-directives/WSGILazyInitialization.rst | 73 -- docs/config-directives/WSGIPassAuthorization.rst | 24 - docs/config-directives/WSGIProcessGroup.rst | 65 - docs/config-directives/WSGIPythonEggs.rst | 18 - docs/config-directives/WSGIPythonHome.rst | 50 - docs/config-directives/WSGIPythonOptimize.rst | 49 - docs/config-directives/WSGIPythonPath.rst | 59 - docs/config-directives/WSGIRestrictEmbedded.rst | 17 - docs/config-directives/WSGIRestrictProcess.rst | 64 - docs/config-directives/WSGIRestrictSignal.rst | 46 - docs/config-directives/WSGIRestrictStdin.rst | 23 - docs/config-directives/WSGIRestrictStdout.rst | 29 - docs/config-directives/WSGIScriptAlias.rst | 66 -- docs/config-directives/WSGIScriptAliasMatch.rst | 33 - docs/config-directives/WSGIScriptReloading.rst | 14 - docs/config-directives/WSGISocketPrefix.rst | 43 - docs/configuration-directives/WSGIAcceptMutex.rst | 21 + docs/configuration-directives/WSGIAccessScript.rst | 31 + .../WSGIApplicationGroup.rst | 106 ++ .../WSGIAuthGroupScript.rst | 31 + .../WSGIAuthUserScript.rst | 40 + .../WSGICallableObject.rst | 29 + .../WSGICaseSensitivity.rst | 23 + .../configuration-directives/WSGIDaemonProcess.rst | 276 +++++ docs/configuration-directives/WSGIImportScript.rst | 42 + .../WSGILazyInitialization.rst | 73 ++ .../WSGIPassAuthorization.rst | 
24 + docs/configuration-directives/WSGIProcessGroup.rst | 65 + docs/configuration-directives/WSGIPythonEggs.rst | 18 + docs/configuration-directives/WSGIPythonHome.rst | 50 + .../WSGIPythonOptimize.rst | 49 + docs/configuration-directives/WSGIPythonPath.rst | 59 + .../WSGIRestrictEmbedded.rst | 17 + .../WSGIRestrictProcess.rst | 64 + .../WSGIRestrictSignal.rst | 46 + .../configuration-directives/WSGIRestrictStdin.rst | 23 + .../WSGIRestrictStdout.rst | 29 + docs/configuration-directives/WSGIScriptAlias.rst | 66 ++ .../WSGIScriptAliasMatch.rst | 33 + .../WSGIScriptReloading.rst | 14 + docs/configuration-directives/WSGISocketPrefix.rst | 43 + docs/configuration.rst | 50 +- docs/getting-started.rst | 2 +- docs/index.rst | 13 - docs/installation.rst | 12 +- docs/project-status.rst | 25 +- docs/troubleshooting.rst | 26 +- docs/user-guides.rst | 15 + docs/user-guides/application-issues.rst | 1251 ++++++++++++++++++++ docs/user-guides/assorted-tips-and-tricks.rst | 123 ++ docs/user-guides/checking-your-installation.rst | 670 +++++++++++ docs/user-guides/configuration-guidelines.rst | 4 +- docs/user-guides/configuration-issues.rst | 66 ++ docs/user-guides/debugging-techniques.rst | 1148 ++++++++++++++++++ docs/user-guides/file-wrapper-extension.rst | 217 ++++ docs/user-guides/frequently-asked-questions.rst | 243 ++++ docs/user-guides/installation-issues.rst | 509 ++++++++ docs/user-guides/issues-with-expat-library.rst | 225 ++++ docs/user-guides/issues-with-pickle-module.rst | 175 +++ docs/user-guides/processes-and-threading.rst | 495 ++++++++ docs/user-guides/quick-installation-guide.rst | 227 ++++ docs/user-guides/registering-cleanup-code.rst | 144 +++ docs/user-guides/reloading-source-code.rst | 483 ++++++++ docs/user-guides/virtual-environments.rst | 214 ++++ 73 files changed, 7522 insertions(+), 1359 deletions(-) delete mode 100644 docs/config-directives/WSGIAcceptMutex.rst delete mode 100644 docs/config-directives/WSGIAccessScript.rst delete mode 100644 
docs/config-directives/WSGIApplicationGroup.rst delete mode 100644 docs/config-directives/WSGIAuthGroupScript.rst delete mode 100644 docs/config-directives/WSGIAuthUserScript.rst delete mode 100644 docs/config-directives/WSGICallableObject.rst delete mode 100644 docs/config-directives/WSGICaseSensitivity.rst delete mode 100644 docs/config-directives/WSGIDaemonProcess.rst delete mode 100644 docs/config-directives/WSGIImportScript.rst delete mode 100644 docs/config-directives/WSGILazyInitialization.rst delete mode 100644 docs/config-directives/WSGIPassAuthorization.rst delete mode 100644 docs/config-directives/WSGIProcessGroup.rst delete mode 100644 docs/config-directives/WSGIPythonEggs.rst delete mode 100644 docs/config-directives/WSGIPythonHome.rst delete mode 100644 docs/config-directives/WSGIPythonOptimize.rst delete mode 100644 docs/config-directives/WSGIPythonPath.rst delete mode 100644 docs/config-directives/WSGIRestrictEmbedded.rst delete mode 100644 docs/config-directives/WSGIRestrictProcess.rst delete mode 100644 docs/config-directives/WSGIRestrictSignal.rst delete mode 100644 docs/config-directives/WSGIRestrictStdin.rst delete mode 100644 docs/config-directives/WSGIRestrictStdout.rst delete mode 100644 docs/config-directives/WSGIScriptAlias.rst delete mode 100644 docs/config-directives/WSGIScriptAliasMatch.rst delete mode 100644 docs/config-directives/WSGIScriptReloading.rst delete mode 100644 docs/config-directives/WSGISocketPrefix.rst create mode 100644 docs/configuration-directives/WSGIAcceptMutex.rst create mode 100644 docs/configuration-directives/WSGIAccessScript.rst create mode 100644 docs/configuration-directives/WSGIApplicationGroup.rst create mode 100644 docs/configuration-directives/WSGIAuthGroupScript.rst create mode 100644 docs/configuration-directives/WSGIAuthUserScript.rst create mode 100644 docs/configuration-directives/WSGICallableObject.rst create mode 100644 docs/configuration-directives/WSGICaseSensitivity.rst create mode 100644 
docs/configuration-directives/WSGIDaemonProcess.rst create mode 100644 docs/configuration-directives/WSGIImportScript.rst create mode 100644 docs/configuration-directives/WSGILazyInitialization.rst create mode 100644 docs/configuration-directives/WSGIPassAuthorization.rst create mode 100644 docs/configuration-directives/WSGIProcessGroup.rst create mode 100644 docs/configuration-directives/WSGIPythonEggs.rst create mode 100644 docs/configuration-directives/WSGIPythonHome.rst create mode 100644 docs/configuration-directives/WSGIPythonOptimize.rst create mode 100644 docs/configuration-directives/WSGIPythonPath.rst create mode 100644 docs/configuration-directives/WSGIRestrictEmbedded.rst create mode 100644 docs/configuration-directives/WSGIRestrictProcess.rst create mode 100644 docs/configuration-directives/WSGIRestrictSignal.rst create mode 100644 docs/configuration-directives/WSGIRestrictStdin.rst create mode 100644 docs/configuration-directives/WSGIRestrictStdout.rst create mode 100644 docs/configuration-directives/WSGIScriptAlias.rst create mode 100644 docs/configuration-directives/WSGIScriptAliasMatch.rst create mode 100644 docs/configuration-directives/WSGIScriptReloading.rst create mode 100644 docs/configuration-directives/WSGISocketPrefix.rst create mode 100644 docs/user-guides/application-issues.rst create mode 100644 docs/user-guides/assorted-tips-and-tricks.rst create mode 100644 docs/user-guides/checking-your-installation.rst create mode 100644 docs/user-guides/configuration-issues.rst create mode 100644 docs/user-guides/debugging-techniques.rst create mode 100644 docs/user-guides/file-wrapper-extension.rst create mode 100644 docs/user-guides/frequently-asked-questions.rst create mode 100644 docs/user-guides/installation-issues.rst create mode 100644 docs/user-guides/issues-with-expat-library.rst create mode 100644 docs/user-guides/issues-with-pickle-module.rst create mode 100644 docs/user-guides/processes-and-threading.rst create mode 100644 
docs/user-guides/quick-installation-guide.rst create mode 100644 docs/user-guides/registering-cleanup-code.rst create mode 100644 docs/user-guides/reloading-source-code.rst create mode 100644 docs/user-guides/virtual-environments.rst diff --git a/docs/config-directives/WSGIAcceptMutex.rst b/docs/config-directives/WSGIAcceptMutex.rst deleted file mode 100644 index 818d021..0000000 --- a/docs/config-directives/WSGIAcceptMutex.rst +++ /dev/null @@ -1,21 +0,0 @@ -=============== -WSGIAcceptMutex -=============== - -:Description: Specify type of accept mutex used by daemon processes. -:Syntax: ``WSGIAcceptMutex Default`` | *method* -:Default: ``WSGIAcceptMutex Default`` -:Context: server config - -The ``WSGIAcceptMutex`` directive sets the method that mod_wsgi will use to -serialize multiple daemon processes in a process group accepting requests -on a socket connection from the Apache child processes. If this directive -is not defined then the same type of mutex mechanism as used by Apache for -the main Apache child processes when accepting connections from a client -will be used. If set the method types are the same as for the Apache -`AcceptMutex`_ directive. - -Note that the ``WSGIAcceptMutex`` directive and corresponding features are -not available on Windows or when running Apache 1.3. - -.. _AcceptMutex: http://httpd.apache.org/docs/2.4/mod/mpm_common.html#acceptmutex diff --git a/docs/config-directives/WSGIAccessScript.rst b/docs/config-directives/WSGIAccessScript.rst deleted file mode 100644 index a361a1a..0000000 --- a/docs/config-directives/WSGIAccessScript.rst +++ /dev/null @@ -1,31 +0,0 @@ -================ -WSGIAccessScript -================ - -:Description: Specify script implementing host access controls. -:Syntax: ``WSGIAccessScript`` *path* [ *options* ] -:Context: directory, .htaccess -:Override: AuthConfig - -The ``WSGIAccessScript`` directive provides a mechanism for implementing -host access controls. 
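As a rough illustration of the host access mechanism, here is a minimal sketch of a script that `WSGIAccessScript` could point at. It assumes the `allow_access(environ, host)` entry point described in the mod_wsgi access control guide, where returning `True` allows the client host and `False` denies it. The `ALLOWED_HOSTS` set is hypothetical.

```python
# Hypothetical host access script for WSGIAccessScript.
# Assumes mod_wsgi calls allow_access(environ, host) with the client host
# and treats True as "allow" and False as "deny".

# Hypothetical set of hosts permitted to reach the protected resource.
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "::1"}


def allow_access(environ, host):
    """Return True if the client host may access the resource."""
    return host in ALLOWED_HOSTS
```

Because the script always runs in embedded mode, keep it cheap: it is consulted on every request to the protected location.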
- -More detailed information on using the ``WSGIAccessScript`` directive -can be found in :doc:`../user-guides/access-control-mechanisms`. - -The options which can be supplied to the ``WSGIAccessScript`` directive are: - -**application-group=name** - - Specifies the name of the application group within the specified - process for which the script file will be loaded. - - If the ``application-group`` option is not supplied, the special value - ``%{GLOBAL}`` which denotes that the script file be loaded within the - context of the first interpreter created by Python when it is - initialised will be used. Otherwise, will be loaded into the - interpreter for the specified application group. - -Note that the script always runs in processes associated with embedded -mode. It is not possible to delegate the script such that it is run within -context of a daemon process. diff --git a/docs/config-directives/WSGIApplicationGroup.rst b/docs/config-directives/WSGIApplicationGroup.rst deleted file mode 100644 index d08228a..0000000 --- a/docs/config-directives/WSGIApplicationGroup.rst +++ /dev/null @@ -1,106 +0,0 @@ -==================== -WSGIApplicationGroup -==================== - -:Description: Sets which application group WSGI application belongs to. -:Syntax: ``WSGIApplicationGroup name`` - ``WSGIApplicationGroup %{GLOBAL}`` - ``WSGIApplicationGroup %{SERVER}`` - ``WSGIApplicationGroup %{RESOURCE}`` - ``WSGIApplicationGroup %{ENV:variable}`` -:Default: ``WSGIApplicationGroup %{RESOURCE}`` -:Context: server config, virtual host, directory - -The ``WSGIApplicationGroup`` directive can be used to specify which -application group a WSGI application or set of WSGI applications belongs -to. All WSGI applications within the same application group will execute -within the context of the same Python sub interpreter of the process -handling the request. 
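To check which application group, and therefore which sub interpreter, actually handled a request, a small diagnostic WSGI application can echo the group name back. This sketch assumes that mod_wsgi exposes the resolved name in the `mod_wsgi.application_group` WSGI environ key; verify this against your mod_wsgi version. Outside mod_wsgi the key is absent, and a placeholder is returned instead.

```python
# Diagnostic WSGI application reporting the application group that handled
# the request. Assumes mod_wsgi sets environ["mod_wsgi.application_group"].


def application(environ, start_response):
    group = environ.get("mod_wsgi.application_group",
                        "<not running under mod_wsgi>")
    # An empty group name corresponds to %{GLOBAL}, the first interpreter.
    label = group if group != "" else "%{GLOBAL} (first interpreter)"
    body = ("application group: %s\n" % label).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```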
- -The argument to the ``WSGIApplicationGroup`` directive can be either one of four -special expanding variables or an explicit name of your own choosing. -The meanings of the special variables are: - -**%{GLOBAL}** - - The application group name will be set to the empty string. - - Any WSGI applications in the global application group will always be - executed within the context of the first interpreter created by Python - when it is initialised. Forcing a WSGI application to run within the - first interpreter can be necessary when a third-party C extension - module for Python has used the simplified threading API for - manipulation of the Python GIL and thus will not run correctly within - any additional sub interpreters created by Python. - -**%{SERVER}** - - The application group name will be set to the server hostname. If the - request arrived over a non-standard HTTP/HTTPS port, the port number - will be added as a suffix to the group name separated by a colon. - - For example, if the virtual host ``www.example.com`` is handling - requests on the standard HTTP port (80) and HTTPS port (443), a request - arriving on either port would see the application group name being set - to ``www.example.com``. If instead the virtual host was handling requests - on port 8080, then the application group name would be set to - ``www.example.com:8080``. - -**%{RESOURCE}** - - The application group name will be set to the server hostname and port - as for the ``%{SERVER}`` variable, to which the value of the WSGI environment - variable ``SCRIPT_NAME`` is appended, separated by a ``|`` (pipe) - character.
- - For example, if the virtual host ``www.example.com`` was handling - requests on port 8080 and the URL-path which mapped to the WSGI - application was:: - - http://www.example.com/wsgi-scripts/foo - - then the application group name would be set to:: - - www.example.com:8080|/wsgi-scripts/foo - - The effect of using the ``%{RESOURCE}`` variable expansion is for each - application on any server to be isolated from all others by being - mapped to its own Python sub interpreter. - -**%{ENV:variable}** - - The application group name will be set to the value of the named - environment variable. The environment variable is looked up via the - internal Apache notes and subprocess environment data structures and - (if not found there) via ``getenv()`` from the Apache server process. - -In an Apache configuration file, environment variables accessible using -the ``%{ENV}`` variable reference can be set up by using directives such as -`SetEnv`_ and `RewriteRule`_. - -For example, to group all WSGI scripts for a specific user when using -`mod_userdir`_ within the same application group, the following could be -used:: - - RewriteEngine On - RewriteCond %{REQUEST_URI} ^/~([^/]+) - RewriteRule . - [E=APPLICATION_GROUP:~%1] - - - Options ExecCGI - SetHandler wsgi-script - WSGIApplicationGroup %{ENV:APPLICATION_GROUP} - - -Note that in embedded mode or a multi-process daemon process group, there -will be an instance of the named sub interpreter in each process. Thus the -directive only ensures that the request is handled in the named sub interpreter -within the process that handles the request. If you need to ensure that -requests for a specific user always go back to the exact same sub interpreter, -then you will need to use a daemon process group with only a single process, -or implement a sticky session mechanism across a number of single process -daemon process groups. - -.. _SetEnv: http://httpd.apache.org/docs/2.2/mod/mod_env.html#setenv -.. 
_RewriteRule: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule -.. _mod_userdir: http://httpd.apache.org/docs/2.2/mod/mod_userdir.html diff --git a/docs/config-directives/WSGIAuthGroupScript.rst b/docs/config-directives/WSGIAuthGroupScript.rst deleted file mode 100644 index ad5a30a..0000000 --- a/docs/config-directives/WSGIAuthGroupScript.rst +++ /dev/null @@ -1,31 +0,0 @@ -=================== -WSGIAuthGroupScript -=================== - -:Description: Specify script implementing group authorisation. -:Syntax: ``WSGIAuthGroupScript`` *path* [ *options* ] -:Context: directory, .htaccess -:Override: AuthConfig - -The ``WSGIAuthGroupScript`` directive provides a mechanism for implementing -group authorisation using the Apache ``Require`` directive. - -More detailed information on using the ``WSGIAuthGroupScript`` directive -can be found in :doc:`../user-guides/access-control-mechanisms`. - -The options which can be supplied to the ``WSGIAuthGroupScript`` directive are: - -**application-group=name** - - Specifies the name of the application group within the specified - process for which the script file will be loaded. - - If the ``application-group`` option is not supplied, the special value - ``%{GLOBAL}`` which denotes that the script file be loaded within the - context of the first interpreter created by Python when it is - initialised will be used. Otherwise, will be loaded into the - interpreter for the specified application group. - -Note that the script always runs in processes associated with embedded -mode. It is not possible to delegate the script such that it is run within -context of a daemon process. 
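For illustration, a group authorisation script used with `WSGIAuthGroupScript` might look like the following minimal sketch. It assumes the `groups_for_user(environ, user)` entry point from the mod_wsgi access control guide. That function returns the names of the groups the already-authenticated user belongs to, and Apache then matches them against `Require group`. The `USER_GROUPS` mapping is hypothetical.

```python
# Hypothetical group authorisation script for WSGIAuthGroupScript.
# Assumes mod_wsgi calls groups_for_user(environ, user) after authentication
# and compares the returned names against the "Require group" directive.

# Hypothetical static mapping; a real script might query a database or LDAP.
USER_GROUPS = {
    "spy": ["secret-agents"],
    "admin": ["staff", "operators"],
}


def groups_for_user(environ, user):
    """Return the list of group names for user (empty if none)."""
    return list(USER_GROUPS.get(user, []))
```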
diff --git a/docs/config-directives/WSGIAuthUserScript.rst b/docs/config-directives/WSGIAuthUserScript.rst deleted file mode 100644 index b275504..0000000 --- a/docs/config-directives/WSGIAuthUserScript.rst +++ /dev/null @@ -1,40 +0,0 @@ -================== -WSGIAuthUserScript -================== - -:Description: Specify script implementing an authentication provider. -:Syntax: ``WSGIAuthUserScript`` *path* [ *options* ] -:Context: directory, .htaccess -:Override: AuthConfig - -The WSGIAuthUserScript directive can be used to specify a script which -implements an Apache authentication provider. - -Such an authentication provider can be used where you want Apache to worry -about the handshaking related to HTTP Basic and Digest authentication and -you only wish to deal with supplying the user credentials for authenticating -the user. - -If using at least Apache 2.2, other Apache modules implementing custom -authentication mechanisms can also make use of the authentication provider -if they are using the corresponding Apache C API for accessing them. - -More detailed information on using the WSGIAuthUserScript directive can be -found in :doc:`../user-guides/access-control-mechanisms`. - -The options which can be supplied to the WSGIAuthUserScript directive are: - -**application-group=name** - Specifies the name of the application group within the specified - process for which the script file will be loaded. - - If the 'application-group' option is not supplied, the special value - '%{GLOBAL}' which denotes that the script file be loaded within the - context of the first interpreter created by Python when it is - initialised will be used. Otherwise, will be loaded into the - interpreter for the specified application group. - -Note that the script always runs in processes associated with embedded -mode. It is not possible to delegate the script such that it is run within -context of a daemon process. 
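As a sketch of an authentication provider for HTTP Basic authentication, the script below assumes the `check_password(environ, user, password)` entry point described in the mod_wsgi access control guide. Under that assumption, it returns `True` for valid credentials, `False` for a known user with a wrong password, and `None` for a user this provider does not know. Digest authentication uses a different hook and is not covered here. The credential store is hypothetical, and the unsalted SHA-256 digest is for illustration only.

```python
# Hypothetical authentication provider script for WSGIAuthUserScript
# (HTTP Basic authentication only). Assumed return convention:
#   True  -> credentials valid
#   False -> user known but password wrong
#   None  -> user not known to this provider
import hashlib
import hmac

# Hypothetical credential store of SHA-256 hex digests. A real deployment
# should use a salted, deliberately slow hash such as hashlib.scrypt.
PASSWORD_DIGESTS = {
    "spy": hashlib.sha256(b"secret").hexdigest(),
}


def check_password(environ, user, password):
    expected = PASSWORD_DIGESTS.get(user)
    if expected is None:
        return None
    supplied = hashlib.sha256(password.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking timing information about the match.
    return hmac.compare_digest(supplied, expected)
```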
- diff --git a/docs/config-directives/WSGICallableObject.rst b/docs/config-directives/WSGICallableObject.rst deleted file mode 100644 index 6157079..0000000 --- a/docs/config-directives/WSGICallableObject.rst +++ /dev/null @@ -1,29 +0,0 @@ -================== -WSGICallableObject -================== - -:Description: Sets the name of the WSGI application callable. -:Syntax: ``WSGICallableObject`` *name* - ``WSGICallableObject %{ENV:variable}`` -:Default: ``WSGICallableObject application`` -:Context: server config, virtual host, directory, .htaccess -:Override: ``FileInfo`` - -The WSGICallableObject directive can be used to override the name of the -Python callable object in the script file which is used as the entry point -into the WSGI application. - -When ``%{ENV}`` is being used, the environment variable is looked up via the -internal Apache notes and subprocess environment data structures and (if -not found there) via ``getenv()`` from the Apache server process. - -In an Apache configuration file, environment variables accessible using -the ``%{ENV}`` variable reference can be set up by using directives such as -`SetEnv`_ and `RewriteRule`_. - -Note that the name of the callable object must be an object present at -global scope within the WSGI script file. It is not possible to use a dotted -path to refer to a sub object of a module imported by the WSGI script file. - -.. _SetEnv: http://httpd.apache.org/docs/2.2/mod/mod_env.html#setenv -.. _RewriteRule: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule diff --git a/docs/config-directives/WSGICaseSensitivity.rst b/docs/config-directives/WSGICaseSensitivity.rst deleted file mode 100644 index 13d1441..0000000 --- a/docs/config-directives/WSGICaseSensitivity.rst +++ /dev/null @@ -1,23 +0,0 @@ -=================== -WSGICaseSensitivity -=================== - -:Description: Define whether the file system is case sensitive.
-:Syntax: ``WSGICaseSensitivity On|Off`` -:Context: server config - -When mod_wsgi is used on the Windows and MacOS X platforms, it will assume -that the filesystem in use is case insensitive. This is necessary to ensure -that the module caching system works correctly and only one module is -retained in memory where paths with different case are used to identify the -same script file. On other platforms it will always be assumed that a case -sensitive file system is used. - -The WSGICaseSensitivity directive can be used explicitly to specify for a -particular WSGI application whether the file system the script file is -stored in is case sensitive or not, thus overriding the default for any -platform. A value of On indicates that the filesystem is case sensitive. - -Because it is set in the main server config it will apply to the whole -site. All paths therefore would need to be located in a filesystem with the -same case convention. diff --git a/docs/config-directives/WSGIDaemonProcess.rst b/docs/config-directives/WSGIDaemonProcess.rst deleted file mode 100644 index 1fddeed..0000000 --- a/docs/config-directives/WSGIDaemonProcess.rst +++ /dev/null @@ -1,276 +0,0 @@ -================= -WSGIDaemonProcess -================= - -:Description: Configure a distinct daemon process for running applications. -:Syntax: ``WSGIDaemonProcess`` *name* ``[`` *options* ``]`` -:Context: server config, virtual host - -The WSGIDaemonProcess directive can be used to specify that distinct daemon -processes should be created to which the running of WSGI applications can -be delegated. Where Apache has been started as the ``root`` user, the -daemon processes can be run as a user different to that which the Apache -child processes would normally be run as. - -When distinct daemon processes are enabled and used, the process is -dedicated to mod_wsgi and the only thing that the processes do is run the -WSGI applications assigned to that process group. 
Any other Apache modules, -such as PHP, or activities such as serving up static files, continue to be -run in the standard Apache child processes. - -Note that having denoted that daemon processes should be created by using -the WSGIDaemonProcess directive, the WSGIProcessGroup directive still needs -to be used to delegate specific WSGI applications to execute within those -daemon processes. - -Also note that the name of the daemon process group must be unique for the -whole server. That is, it is not possible to use the same daemon process -group name in different virtual hosts. - -Options which can be supplied to the WSGIDaemonProcess directive are: - -**user=name | user=#uid** - Defines the UNIX user *name* or numeric user *uid* of the user that - the daemon processes should be run as. If this option is not supplied, - the daemon processes will be run as the same user that Apache would - run child processes as, as defined by the `User`_ directive. - - Note that this option is ignored if Apache wasn't started as the root - user, in which case, regardless of the setting, the daemon processes - will be run as the user that Apache was started as. - - Also be aware that mod_wsgi will not allow you to run a daemon - process group as the root user due to the security risk of running - a web application as root. - -**group=name | group=#gid** - Defines the UNIX group *name* or numeric group *gid* of the primary - group that the daemon processes should be run as. If this option is not - supplied, the daemon processes will be run as the same group that Apache - would run child processes as, as defined by the `Group`_ directive. - - Note that this option is ignored if Apache wasn't started as the root - user, in which case, regardless of the setting, the daemon processes - will be run as the group that Apache was started as. - -**processes=num** - Defines the number of daemon processes that should be started in this - process group. 
If not defined then only one process will be run in this - process group. - - Note that if this option is defined as 'processes=1', then the WSGI - environment attribute called 'wsgi.multiprocess' will be set to be True - whereas not providing the option at all will result in the attribute - being set to be False. This distinction is to allow for where some form - of mapping mechanism might be used to distribute requests across - multiple process groups and thus in effect it is still a multiprocess - application. If you need to ensure that 'wsgi.multiprocess' is False so - that interactive debuggers will work, simply do not specify the - 'processes' option and allow the default single daemon process to be - created in the process group. - -**threads=num** - Defines the number of threads to be created to handle requests in each - daemon process within the process group. - - If this option is not defined then the default will be to create 15 - threads in each daemon process within the process group. - -**umask=0nnn** - Defines a value to be used for the umask of the daemon processes within - the process group. The value must be provided as an octal number. - - If this option is not defined then the umask of the user that Apache is - initially started as will be inherited by the process. Typically the - inherited umask would be '0022'. - -**home=directory** - Defines an absolute path of a directory which should be used as the - initial current working directory of the daemon processes within the - process group. - - If this option is not defined, in mod_wsgi 1.X the current working - directory of the Apache parent process will be inherited by the daemon - processes within the process group. Normally the current working directory - of the Apache parent process would be the root directory. In mod_wsgi 2.0+ - the initial current working directory will be set to be the home - directory of the user that the daemon process runs as. 
- -**python-path=directory | python-path=directory:directory** - List of colon-separated directories to add to the Python module search - path, i.e., ``sys.path``. - - Note that this is not strictly the same as having set the ``PYTHONPATH`` - environment variable when running normal command line Python. When this - option is used, the directories are added by calling - ``site.addsitedir()``. As well as adding the directory to - ``sys.path``, this function has the effect of opening and interpreting - any '.pth' files located in the specified directories. The option - therefore can be used to point at the ``site-packages`` directory - corresponding to a Python virtual environment created by a tool such as - ``virtualenv``, with any additional directories corresponding to - Python eggs within that directory also being automatically added to - ``sys.path``. - -**python-eggs=directory** - Directory to be used as the Python egg cache directory. This is - equivalent to having set the ``PYTHON_EGG_CACHE`` environment - variable. - - Note that the directory specified must exist and be writable by the - user that the daemon processes run as. - -**stack-size=nnn** - The amount of virtual memory in bytes to be allocated for the stack - corresponding to each thread created by mod_wsgi in a daemon process. - - This option would be used when running Linux in a VPS system which has - been configured with a quite low 'Memory Limit' in relation to the - 'Context RSS' and 'Max RSS Memory' limits. In particular, the default - stack size for threads under Linux of 8MB is quite excessive and could - for such a VPS result in the 'Memory Limit' being exceeded before the - RSS limits were exceeded. In this situation, the stack size should be - dropped down to be in the region of 512KB (524288 bytes). - -**maximum-requests=nnn** - Defines a limit on the number of requests a daemon process should - process before it is shut down and restarted.
Setting this to a non-zero - value has the benefit of limiting the amount of memory that a process - can consume through (accidental) memory leakage. - - If this option is not defined, or is defined to be 0, then the daemon - process will be persistent and will continue to service requests until - Apache itself is restarted or shut down. - -**inactivity-timeout=sss** - Defines the maximum number of seconds allowed to pass before the - daemon process is shut down and restarted when the daemon process has - entered an idle state. For the purposes of this option, being idle - means no new requests being received, or no attempts by current - requests to read request content or generate response content for the - defined period. - - This option exists to allow infrequently used applications running in - a daemon process to be restarted, thus allowing memory being used to - be reclaimed, with process size dropping back to the initial startup - size before any application had been loaded or requests processed. - -**deadlock-timeout=sss** - Defines the maximum number of seconds allowed to pass before the - daemon process is shut down and restarted after a potential deadlock on - the Python GIL has been detected. The default is 300 seconds. - - This option exists to combat the problem of a daemon process freezing - as the result of a rogue Python C extension module which doesn't - properly release the Python GIL when entering into a blocking or long- - running operation. - -**shutdown-timeout=sss** - Defines the maximum number of seconds allowed to pass when waiting - for a daemon process to gracefully shut down as a result of the maximum - number of requests or inactivity timeout being reached, or when a user- - initiated SIGINT signal is sent to a daemon process. When this timeout - has been reached the daemon process will be forced to exit even if - there are still active requests or it is still running Python exit - functions.
- - If this option is not defined, then the shutdown timeout will be set - to 5 seconds. Note that this option does not change the shutdown - timeout applied to daemon processes when Apache itself is being stopped - or restarted. That timeout value is defined internally to Apache as 3 - seconds and cannot be overridden. - -**display-name=value** - Defines a different name to show for the daemon process when using the - 'ps' command to list processes. If the value is '%{GROUP}' then the - name will be '(wsgi:group)' where 'group' is replaced with the name - of the daemon process group. - - Note that only as many characters of the supplied value can be displayed - as were originally taken up by 'argv0' of the executing process. Anything - in excess of this will be truncated. - - This feature may not work as described on all platforms. Typically it - also requires a 'ps' program with BSD heritage. Thus on Solaris UNIX - the '/usr/bin/ps' program doesn't work, but '/usr/ucb/ps' does. - -**receive-buffer-size=nnn** - Defines the UNIX socket buffer size for data being received by the - daemon process from the Apache child process. - - This option may need to be used to override small default values set by - certain operating systems and would help avoid the possibility of deadlock - between the Apache child process and the daemon process when a WSGI application - generates large responses but doesn't consume request content. In - general such deadlock problems would not arise with well-behaved WSGI - applications, but some spam bots attempting to post data to web sites - are known to trigger the problem. - - The maximum possible value that can be set for the buffer size is - operating system dependent and will need to be determined through trial - and error. - -**send-buffer-size=nnn** - Defines the UNIX socket buffer size for data being sent in the - direction from the daemon process back to the Apache child process.
-
-    This option may need to be used to override small default values set by
-    certain operating systems and would help avoid the possibility of deadlock
-    between the Apache child process and daemon process when a WSGI application
-    generates large responses but doesn't consume request content. In
-    general such deadlock problems would not arise with well-behaved WSGI
-    applications, but some spam bots attempting to post data to web sites
-    are known to trigger the problem.
-
-    The maximum possible value that can be set for the buffer size is
-    operating system dependent and will need to be determined through trial
-    and error.
-
-To delegate a particular WSGI application to run in a named set of daemon
-processes, the WSGIProcessGroup directive should be specified in the
-appropriate context for that application. If WSGIProcessGroup is not used,
-the application will be run within the standard Apache child processes.
-
-If the WSGIDaemonProcess directive is specified outside of all virtual
-host containers, any WSGI application can be delegated to be run within
-that daemon process group. If the WSGIDaemonProcess directive is specified
-within a virtual host container, only WSGI applications associated with
-virtual hosts with the same server name as that virtual host can be
-delegated to that set of daemon processes.
-
-When WSGIDaemonProcess is associated with a virtual host, the error log
-associated with that virtual host will be used for all Apache error log
-output from mod_wsgi rather than it appearing in the main Apache error log.
-
-For example, if a server is hosting two virtual hosts and it is desired
-that the WSGI applications related to each virtual host run in distinct
-processes of their own and as a user which is the owner of that virtual
-host, the following could be used::
-
-    <VirtualHost *:80>
-    ServerName www.site1.com
-    CustomLog logs/www.site1.com-access_log common
-    ErrorLog logs/www.site1.com-error_log
-
-    WSGIDaemonProcess www.site1.com user=joe group=joe processes=2 threads=25
-    WSGIProcessGroup www.site1.com
-
-    ...
-    </VirtualHost>
-
-    <VirtualHost *:80>
-    ServerName www.site2.com
-    CustomLog logs/www.site2.com-access_log common
-    ErrorLog logs/www.site2.com-error_log
-
-    WSGIDaemonProcess www.site2.com user=bob group=bob processes=2 threads=25
-    WSGIProcessGroup www.site2.com
-
-    ...
-    </VirtualHost>
-
-Note that the WSGIDaemonProcess directive and corresponding features are
-not available on Windows or when running Apache 1.3.
-
-.. _User: http://httpd.apache.org/docs/2.2/mod/mpm_common.html#user
-.. _Group: http://httpd.apache.org/docs/2.2/mod/mpm_common.html#group
diff --git a/docs/config-directives/WSGIImportScript.rst b/docs/config-directives/WSGIImportScript.rst
deleted file mode 100644
index 7705b79..0000000
--- a/docs/config-directives/WSGIImportScript.rst
+++ /dev/null
@@ -1,42 +0,0 @@
-================
-WSGIImportScript
-================
-
-:Description: Specify a script file to be loaded on process start.
-:Syntax: ``WSGIImportScript`` *path* ``[`` *options* ``]``
-:Context: server config
-
-The WSGIImportScript directive can be used to specify a script file to be
-loaded when a process starts. Options must be provided to indicate the name
-of the process group and the application group into which the script will
-be loaded.
-
-The options which must be supplied to the WSGIImportScript directive are:
-
-**process-group=name**
-    Specifies the name of the process group for which the script file will
-    be loaded.
-
-    The name of the process group can be set to the special value
-    '%{GLOBAL}' which denotes that the script file be loaded for the Apache
-    child processes. Any other value names the daemon process group
-    to use for mod_wsgi daemon mode.
-
-**application-group=name**
-    Specifies the name of the application group within the specified
-    process for which the script file will be loaded.
-
-    The name of the application group can be set to the special value
-    '%{GLOBAL}' which denotes that the script file be loaded within the
-    context of the first interpreter created by Python when it is
-    initialised. Otherwise, it will be loaded into the interpreter for the
-    specified application group.
-
-Because the script files are loaded prior to beginning to accept any
-requests, any delay in loading the script will not cause actual requests to
-be blocked. As such, the WSGIImportScript directive can be used to preload a
-WSGI application script file on process start so that it is ready when actual
-user requests arrive. Where there are multiple processes handling
-requests, this can reduce or eliminate the apparent stalling of an
-application when performing a restart of Apache or a daemon mode process
-group.
diff --git a/docs/config-directives/WSGILazyInitialization.rst b/docs/config-directives/WSGILazyInitialization.rst
deleted file mode 100644
index 3484427..0000000
--- a/docs/config-directives/WSGILazyInitialization.rst
+++ /dev/null
@@ -1,73 +0,0 @@
-======================
-WSGILazyInitialization
-======================
-
-:Description: Enable/disable lazy initialisation of Python.
-:Syntax: ``WSGILazyInitialization On|Off``
-:Default: ``WSGILazyInitialization On``
-:Context: server config
-
-The WSGILazyInitialization directive sets whether or not the Python
-interpreter is preinitialised within the Apache parent process or whether
-lazy initialisation is performed, and the Python interpreter only
-initialised in the Apache server processes or mod_wsgi daemon processes
-after they have forked from the Apache parent process.
-
-In versions of mod_wsgi prior to version 3.0 the Python interpreter was
-always preinitialised in the Apache parent process. This did mean that
-theoretically some benefit in memory usage could be derived from delayed
-copy-on-write semantics of memory inherited by child processes that was
-initialised in the parent. This memory wasn't significant, however, and was
-tempered by the fact that the Python interpreter when destroyed and then
-reinitialised in the Apache parent process on an Apache restart, would with
-some Python versions leak memory. This meant that if a server had many
-restarts performed, the Apache parent process and thus all forked child
-processes could grow in memory usage over time, eventually necessitating
-Apache be completely stopped and then restarted.
-
-This issue of memory leaks with the Python interpreter reached an extreme
-with Python 3.0, where, by design, various data structures would not be
-destroyed on the basis that they would be reused when the Python interpreter was
-reinitialised within the same process. The problem is that when an Apache
-restart is performed, mod_wsgi and the Python library are unloaded from
-memory, with the result that the references to that memory would be lost
-and so a real memory leak, of significant size and much worse than older
-versions of Python, would result.
-
-As a consequence, with mod_wsgi 3.0 and onwards, the Python interpreter is
-not initialised by default in the Apache parent process for any version of
-Python. This completely avoids the risk of cumulative memory leaks by the
-Python interpreter in the Apache parent process on a restart, albeit with
-the potential for a slight increase in child process memory sizes. If need be,
-the previous behaviour can be restored by setting the directive to
-'Off'.
-
-A further upside of using lazy initialisation is that if you are using
-daemon mode only, i.e., not using embedded mode, you can completely turn off
-initialisation of the Python interpreter within the main Apache server
-child processes. Unfortunately, because it isn't possible in the general case
-to know whether embedded mode will be needed or not, you will need to
-manually set the configuration to do this. This can be done by setting::
-
-    WSGIRestrictEmbedded On
-
-With restrictions on embedded mode enabled, any attempt to run a WSGI
-application in embedded mode will fail, so it will be necessary to ensure
-all WSGI applications are delegated to run in daemon mode. Although WSGI
-applications will be restricted from being run in embedded mode and the
-Python interpreter therefore not initialised, it will fall back to being
-initialised if you use any of the Python hooks for access control,
-authentication or authorisation providers, or WSGI application dispatch
-overrides.
-
-Note that if mod_python is being used in the same Apache installation,
-because mod_python takes precedence over mod_wsgi in initialising the
-Python interpreter, lazy initialisation cannot be done and so the Python
-interpreter will continue to be preinitialised in the Apache parent process
-regardless of the setting of WSGILazyInitialization. Use of mod_python will
-thus perpetuate the risk of memory leaks and growing memory use of the Apache
-processes. This is especially the case since mod_python doesn't even properly
-destroy the Python interpreter in the Apache parent process on a restart
-and so all memory associated with the Python interpreter is leaked and not
-just that caused by the Python interpreter when it is destroyed and doesn't
-clean up after itself.
diff --git a/docs/config-directives/WSGIPassAuthorization.rst b/docs/config-directives/WSGIPassAuthorization.rst
deleted file mode 100644
index c346c9e..0000000
--- a/docs/config-directives/WSGIPassAuthorization.rst
+++ /dev/null
@@ -1,24 +0,0 @@
-=====================
-WSGIPassAuthorization
-=====================
-
-:Description: Enable/Disable passing of authorisation headers.
-:Syntax: ``WSGIPassAuthorization On|Off``
-:Default: ``WSGIPassAuthorization Off``
-:Context: server config, virtual host, directory, .htaccess
-
-The WSGIPassAuthorization directive can be used to control whether HTTP
-authorisation headers are passed through to a WSGI application in the
-``HTTP_AUTHORIZATION`` variable of the WSGI application environment when
-the equivalent HTTP request headers are present. This option would need to
-be set to ``On`` if the WSGI application was to handle authorisation
-rather than Apache doing it.
-
-Authorisation headers are not passed through by default as doing so could
-leak information about passwords through to a WSGI application which should
-not be able to see them when Apache is performing authorisation. If Apache
-is performing authorisation, a WSGI application can still find out what
-type of authorisation scheme was used by checking the variable
-``AUTH_TYPE`` of the WSGI application environment. The login name of the
-authorised user can be determined by checking the variable
-``REMOTE_USER``.
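As an illustration of the environment variables described for WSGIPassAuthorization, here is a minimal sketch of a WSGI application, assuming Python 3 and HTTP Basic authentication. When ``WSGIPassAuthorization On`` is set it decodes the raw ``HTTP_AUTHORIZATION`` value itself; otherwise it falls back to the ``REMOTE_USER`` and ``AUTH_TYPE`` values Apache provides after doing the authorisation. The helper name ``parse_basic_authorization`` is hypothetical, and a real application would still have to verify the password.

```python
import base64


def parse_basic_authorization(environ):
    """Hypothetical helper: return (username, password) from a Basic
    HTTP_AUTHORIZATION value, or None if absent or malformed."""
    header = environ.get("HTTP_AUTHORIZATION")
    if not header:
        # Missing unless WSGIPassAuthorization is On and the client sent it.
        return None
    scheme, _, credentials = header.partition(" ")
    if scheme.lower() != "basic" or not credentials.strip():
        return None
    try:
        decoded = base64.b64decode(credentials.strip(), validate=True).decode("utf-8")
    except ValueError:
        # binascii.Error and UnicodeDecodeError are both ValueError subclasses.
        return None
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    return username, password


def application(environ, start_response):
    credentials = parse_basic_authorization(environ)
    if credentials is not None:
        # The application is handling authorisation itself.
        body = "application saw Basic credentials for %s\n" % credentials[0]
    elif environ.get("REMOTE_USER"):
        # Apache performed authorisation; only the login and scheme are visible.
        body = "Apache authorised %s using %s\n" % (
            environ["REMOTE_USER"], environ.get("AUTH_TYPE", "unknown"))
    else:
        body = "no authorisation information\n"
    data = body.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(data)))])
    return [data]
```

The password is split at the first colon only, since a Basic user-id cannot contain one but a password can.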
diff --git a/docs/config-directives/WSGIProcessGroup.rst b/docs/config-directives/WSGIProcessGroup.rst
deleted file mode 100644
index 3d5abe4..0000000
--- a/docs/config-directives/WSGIProcessGroup.rst
+++ /dev/null
@@ -1,65 +0,0 @@
-================
-WSGIProcessGroup
-================
-
-:Description: Sets which process group a WSGI application is assigned to.
-:Syntax: ``WSGIProcessGroup %{GLOBAL}|%{ENV:variable}|name``
-:Default: ``WSGIProcessGroup %{GLOBAL}``
-:Context: server config, virtual host, directory
-
-The WSGIProcessGroup directive can be used to specify which process group a
-WSGI application or set of WSGI applications will be executed in. All WSGI
-applications within the same process group will execute within the context
-of the same group of daemon processes.
-
-The argument to the WSGIProcessGroup directive can be either one of two special
-expanding variables or the actual name of a group of daemon processes set up
-using the WSGIDaemonProcess directive. The meanings of the special variables
-are:
-
-**%{GLOBAL}**
-    The process group name will be set to the empty string.
-
-    Any WSGI applications in the global process group will always be
-    executed within the context of the standard Apache child processes.
-    Such WSGI applications will incur the least runtime overhead; however,
-    they will share the same process space with other Apache modules such
-    as PHP, as well as the process being used to serve up static file
-    content. Running WSGI applications within the standard Apache child
-    processes will also mean the application will run as the user that
-    Apache would normally run as.
-
-**%{ENV:variable}**
-    The process group name will be set to the value of the named
-    environment variable. The environment variable is looked up via the
-    internal Apache notes and subprocess environment data structures and
-    (if not found there) via getenv() from the Apache server process.
-    The result must identify a named process group set up using the
-    WSGIDaemonProcess directive.
-
-In an Apache configuration file, environment variables accessible using
-the ``%{ENV}`` variable reference can be set up by using directives such as
-`SetEnv`_ and `RewriteRule`_.
-
-For example, to select which process group a specific WSGI application
-should execute within based on entries in a database file, the following
-could be used::
-
-    RewriteEngine On
-    RewriteMap wsgiprocmap dbm:/etc/httpd/wsgiprocmap.dbm
-    RewriteRule . - [E=PROCESS_GROUP:${wsgiprocmap:%{REQUEST_URI}}]
-
-    WSGIProcessGroup %{ENV:PROCESS_GROUP}
-
-When using the WSGIProcessGroup directive, only daemon process groups
-defined within virtual hosts with the same server name, or those defined at
-global scope outside of any virtual hosts, can be selected. It is not
-possible to select a daemon process group which is defined within a
-different virtual host. Which daemon process groups can be selected may be
-further restricted if the WSGIRestrictProcess directive has been used.
-
-Note that the WSGIProcessGroup directive and corresponding features are not
-available on Windows or when running Apache 1.3.
-
-.. _SetEnv: http://httpd.apache.org/docs/2.2/mod/mod_env.html#setenv
-.. _RewriteRule: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule
diff --git a/docs/config-directives/WSGIPythonEggs.rst b/docs/config-directives/WSGIPythonEggs.rst
deleted file mode 100644
index ad87df7..0000000
--- a/docs/config-directives/WSGIPythonEggs.rst
+++ /dev/null
@@ -1,18 +0,0 @@
-==============
-WSGIPythonEggs
-==============
-
-:Description: Directory to use for Python eggs cache.
-:Syntax: ``WSGIPythonEggs`` *directory*
-:Context: server config
-
-Used to specify the directory to be used as the Python eggs cache directory
-for all sub-interpreters created within embedded mode. This directive
-achieves the same effect as having set the ``PYTHON_EGG_CACHE``
-environment variable.
-
-Note that the directory specified must exist and be writable by the user
-that the Apache child processes run as. The directive only applies to
-mod_wsgi embedded mode. To set the Python eggs cache directory for mod_wsgi
-daemon processes, use the 'python-eggs' option to the WSGIDaemonProcess
-directive instead.
diff --git a/docs/config-directives/WSGIPythonHome.rst b/docs/config-directives/WSGIPythonHome.rst
deleted file mode 100644
index 1b3a3ae..0000000
--- a/docs/config-directives/WSGIPythonHome.rst
+++ /dev/null
@@ -1,50 +0,0 @@
-==============
-WSGIPythonHome
-==============
-
-:Description: Absolute path to Python prefix/exec_prefix directories.
-:Syntax: ``WSGIPythonHome`` *prefix|prefix:exec_prefix*
-:Context: server config
-
-Used to indicate to Python when it is initialised where its library files
-are installed. This should be defined where the Python executable is not in
-the ``PATH`` of the user that Apache runs as, or where a system has
-multiple versions of Python installed in different locations in the file
-system, especially different installations of the same major/minor version,
-and the installation that Apache finds in its ``PATH`` is not the desired
-one.
-
-This directive can also be used to indicate a Python virtual environment
-created using a tool such as ``virtualenv``, to be used for the whole of
-mod_wsgi.
-
-When this directive is used it should be given the prefix for the
-directories containing the platform independent and system dependent Python
-library files. The directories should be separated by a ':'. If the same
-directory is used for both, then only the one directory path needs to be
-supplied. Where the directories are the same, this can usually be
-determined by looking at the value of the ``sys.prefix`` variable for the
-version of Python being used.
-
-Note that the Python installation being referred to using this directive
-must be the same major/minor version of Python that mod_wsgi was compiled
-for. If you want to use a different major/minor version of
-Python than currently used, you must recompile mod_wsgi against the alternate
-version of Python.
-
-This directive is the same as having set the environment variable
-``PYTHONHOME`` in the environment of the user that Apache executes as. If
-this directive is used it will override any setting of ``PYTHONHOME`` in
-the environment of the user that Apache executes as.
-
-This directive will have no effect if mod_python is being loaded into Apache
-at the same time as mod_wsgi as mod_python will in that case be responsible
-for initialising Python.
-
-This directive is not available on Windows systems. Note that mod_wsgi 1.X
-will not actually reject this directive if listed in the configuration,
-but it will not do anything either. This is because on Windows
-systems Python ignores the ``PYTHONHOME`` environment variable and always
-seems to use the location of the Python DLL for determining where the
-library files are located.
-
diff --git a/docs/config-directives/WSGIPythonOptimize.rst b/docs/config-directives/WSGIPythonOptimize.rst
deleted file mode 100644
index d2db4a7..0000000
--- a/docs/config-directives/WSGIPythonOptimize.rst
+++ /dev/null
@@ -1,49 +0,0 @@
-==================
-WSGIPythonOptimize
-==================
-
-:Description: Enables basic Python optimisation features.
-:Syntax: ``WSGIPythonOptimize [0|1|2]``
-:Default: ``WSGIPythonOptimize 0``
-:Context: server config
-
-Sets the level of Python compiler optimisations. The default is '0' which
-means no optimisations are applied.
-
-Setting the optimisation level to '1' or above will have the effect of
-enabling basic Python optimisations and of changing the filename extension for
-compiled (bytecode) files from ``.pyc`` to ``.pyo``.
-
-On the Windows platform, an optimisation level of '0' apparently results in
-the same outcome as if the optimisation level had been set to '1'.
-
-When the optimisation level is set to '2', doc strings will not be
-generated and thus not retained. This may technically result in a smaller
-memory footprint if all ``.pyo`` files were compiled at this optimisation
-level, but may cause some Python packages which interrogate doc strings in
-some way to fail.
-
-Since the ``.pyo`` files already installed in your Python installation are
-unlikely to have been compiled with level '2' optimisation, the gain from using
-this level of optimisation will probably be negligible, if any. This is
-because potentially only your own application code will
-be compiled with this level of optimisation. This will be the case as the
-``.pyo`` files will already exist for modules in the standard Python
-library and they will be used as is, rather than being regenerated
-with a higher level of optimisation. Use of level '2'
-optimisation is therefore discouraged.
-
-This directive will have no effect if mod_python is being loaded into Apache
-at the same time as mod_wsgi as mod_python will in that case be responsible
-for initialising Python.
-
-Overall, if you do not understand what the normal 'python' executable '-O'
-option does, how the Python runtime changes its behaviour as a result, and
-you don't know exactly how your application would be affected by enabling
-this option, then do not use this option. In other words, stop trying to
-prematurely optimise the performance of your application through shortcuts.
-You will get much better performance gains by looking at the design of your
-application and eliminating bottlenecks within it and how it uses any
-database. So, put the gun down and back away; it will be better for all
-concerned.
-
diff --git a/docs/config-directives/WSGIPythonPath.rst b/docs/config-directives/WSGIPythonPath.rst
deleted file mode 100644
index d4c2ea6..0000000
--- a/docs/config-directives/WSGIPythonPath.rst
+++ /dev/null
@@ -1,59 +0,0 @@
-==============
-WSGIPythonPath
-==============
-
-:Description: Additional directories to search for Python modules.
-:Syntax: ``WSGIPythonPath`` *directory|directory-1:directory-2:...*
-:Context: server config
-
-Used to specify additional directories to search for Python modules. If
-multiple directories are specified they should be separated by a ':' if
-using a UNIX-like system, or ';' if using Windows. If any part of a
-directory path contains a space character, the complete argument string to
-WSGIPythonPath must be quoted.
-
-When using mod_wsgi version 1.X, this directive is the same as having set
-the environment variable ``PYTHONPATH`` in the environment of the user
-that Apache executes as. If this directive is used it will override any
-setting of ``PYTHONPATH`` in the environment of the user that Apache
-executes as. The end result is that the listed directories will be added
-to ``sys.path``.
-
-Note that in mod_wsgi version 1.X this applies to all Python
-sub-interpreters created, be they in the Apache child processes when embedded
-mode is used, or in distinct daemon processes when daemon mode is used. It
-is not possible to define this differently for mod_wsgi daemon processes.
-If additional directories need to be added to the module search path for a
-specific WSGI application it should be done within the WSGI application
-script itself.
-
-When using mod_wsgi version 2.0, this directive does not have the same
-effect as having set the environment variable ``PYTHONPATH``. In fact, if
-``PYTHONPATH`` is set in the environment of the user that Apache is
-started as, any directories so defined will still be added to
-``sys.path`` and they will not be overridden.
-
-The difference with this directive when using mod_wsgi 2.0 is that each
-directory listed will be added to the end of ``sys.path`` by calling
-``site.addsitedir()``. By using this function, as well as the directory
-being added to ``sys.path``, any '.pth' files located in the directories
-will be opened and processed. Thus, if the directories contain Python eggs,
-any associated directories corresponding to those Python eggs will in turn
-also be added automatically to ``sys.path``.
-
-Note, however, that when using mod_wsgi 2.0, this directive only sets up the
-additional Python module search directories for interpreters created in the
-Apache child processes where embedded mode is used. If directories need to
-be specified for interpreters running in daemon processes, the
-'python-path' option to the WSGIDaemonProcess directive corresponding to
-that daemon process should be used instead.
-
-In mod_wsgi version 2.0, because directories corresponding to Python eggs
-are automatically added to ``sys.path``, the directive can be used to
-point at the ``site-packages`` directory corresponding to a Python
-virtual environment created by a tool such as ``virtualenv``.
-
-For mod_wsgi 1.X, this directive will have no effect if mod_python is being
-loaded into Apache at the same time as mod_wsgi as mod_python will in that
-case be responsible for initialising Python.
-
diff --git a/docs/config-directives/WSGIRestrictEmbedded.rst b/docs/config-directives/WSGIRestrictEmbedded.rst
deleted file mode 100644
index e4c1b45..0000000
--- a/docs/config-directives/WSGIRestrictEmbedded.rst
+++ /dev/null
@@ -1,17 +0,0 @@
-====================
-WSGIRestrictEmbedded
-====================
-
-:Description: Enable restrictions on use of embedded mode.
-:Syntax: ``WSGIRestrictEmbedded On|Off``
-:Default: ``WSGIRestrictEmbedded Off``
-:Context: server config
-
-The WSGIRestrictEmbedded directive determines whether mod_wsgi embedded
-mode is enabled or not. If set to 'On' and the restriction on embedded mode
-is therefore enabled, any attempt to make a request against a WSGI
-application which hasn't been properly configured so as to be delegated to
-a daemon mode process will fail with an HTTP 500 Internal Server Error response.
-
-This option does not exist on Windows, on Apache 1.3, or in any other
-configuration where daemon mode is not available.
diff --git a/docs/config-directives/WSGIRestrictProcess.rst b/docs/config-directives/WSGIRestrictProcess.rst
deleted file mode 100644
index e5278f2..0000000
--- a/docs/config-directives/WSGIRestrictProcess.rst
+++ /dev/null
@@ -1,64 +0,0 @@
-===================
-WSGIRestrictProcess
-===================
-
-:Description: Restrict which daemon process groups can be selected.
-:Syntax: ``WSGIRestrictProcess`` *group-1 group-2 ...*
-:Syntax: WSGIRestrictProcess *group-1 group-2 ...*
-:Context: server config, virtual host, directory
-
-When using the WSGIProcessGroup directive, daemon process groups defined
-within virtual hosts with the same server name, or those defined at global
-scope outside of any virtual hosts, can be selected. It is not possible to
-select a daemon process group which is defined within a different virtual
-host.
-
-To further limit which of the available daemon process groups can be
-selected, the WSGIRestrictProcess directive can be used to list a
-restricted set of daemon process group names. This could be used, for
-example, where %{ENV} substitution is being used to allow the daemon process
-group to be selected from a .htaccess file for a specific user.
-
-The main Apache configuration for this scenario might be::
-
-    WSGIDaemonProcess default processes=2 threads=25
-
-    <VirtualHost *:80>
-    ServerName www.site.com
-
-    WSGIDaemonProcess bob:1 user=bob group=bob threads=25
-    WSGIDaemonProcess bob:2 user=bob group=bob threads=25
-    WSGIDaemonProcess bob:3 user=bob group=bob threads=25
-
-    WSGIDaemonProcess joe:1 user=joe group=joe threads=25
-    WSGIDaemonProcess joe:2 user=joe group=joe threads=25
-    WSGIDaemonProcess joe:3 user=joe group=joe threads=25
-
-    SetEnv PROCESS_GROUP default
-    WSGIProcessGroup %{ENV:PROCESS_GROUP}
-
-
-    Options ExecCGI
-    AllowOverride FileInfo
-    AddHandler wsgi-script .wsgi
-    WSGIRestrictProcess bob:1 bob:2 bob:3
-    SetEnv PROCESS_GROUP bob:1
-
-    </VirtualHost>
-
-The .htaccess file within the user's account could then delegate specific
-WSGI applications to different daemon process groups using the
-`SetEnv`_ directive::
-
-
-    SetEnv PROCESS_GROUP bob:2
-
-
-
-    SetEnv PROCESS_GROUP bob:3
-
-
-Note that the WSGIDaemonProcess directive and corresponding features are
-not available on Windows or when running Apache 1.3.
-
-.. _SetEnv: http://httpd.apache.org/docs/2.2/mod/mod_env.html#setenv
diff --git a/docs/config-directives/WSGIRestrictSignal.rst b/docs/config-directives/WSGIRestrictSignal.rst
deleted file mode 100644
index e565dfd..0000000
--- a/docs/config-directives/WSGIRestrictSignal.rst
+++ /dev/null
@@ -1,46 +0,0 @@
-==================
-WSGIRestrictSignal
-==================
-
-:Description: Enable restrictions on use of signal().
-:Syntax: ``WSGIRestrictSignal On|Off``
-:Default: ``WSGIRestrictSignal On``
-:Context: server config
-
-A well-behaved Python WSGI application should not in general register any
-signal handlers of its own using ``signal.signal()``. The reason for this
-is that the web server which is hosting a WSGI application will more than
-likely register signal handlers of its own. If a WSGI application were to
-override such signal handlers it could interfere with the operation of the
-web server, preventing actions such as server shutdown and restart.
-
-In the interests of promoting portability of WSGI applications, mod_wsgi
-restricts use of ``signal.signal()`` and will ensure that any attempts
-to register signal handlers are ignored. A warning notice will be output
-to the Apache error log indicating that this action has been taken.
-
-If for some reason there is a need for a WSGI application to register some
-special signal handler this behaviour can be turned off; however, an
-application should avoid the signals ``SIGTERM``, ``SIGINT``,
-``SIGHUP``, ``SIGWINCH`` and ``SIGUSR1`` as these are all used by
-Apache.
-
-Apache will ensure that the signal ``SIGPIPE`` is set to ``SIG_IGN``.
-If a WSGI application needs to override this, it must ensure that it is
-reset to ``SIG_IGN`` before any Apache code is run. In a multithreaded
-MPM this would be practically impossible to ensure so it is preferable that
-the handler for ``SIGPIPE`` also not be changed.
-
-Apache does not use ``SIGALRM``, but it is generally preferable that
-other techniques be used to achieve the same effect.
-
-Do note that if enabling the ability to register signal handlers, such a
-registration can only reliably be done from within code which is
-implemented as a side effect of importing a script file identified by the
-WSGIImportScript directive. This is because signal handlers can only be
-registered from the main Python interpreter thread, and request handlers
-when using embedded mode and a multithreaded Apache MPM would generally
-execute from secondary threads. Similarly, when using daemon mode, request
-handlers would be executed from secondary threads. Only code run as a side
-effect of WSGIImportScript is guaranteed to be executed in the main Python
-interpreter thread.
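A minimal sketch of the kind of script that could be named by WSGIImportScript once ``WSGIRestrictSignal Off`` is set, assuming Python 3 on a UNIX system. It installs a handler only when it is running in the main Python thread, and ``try_register_from_worker`` demonstrates that plain CPython rejects ``signal.signal()`` from a secondary thread with ``ValueError``, which is why request handlers cannot do this reliably. The choice of ``SIGUSR2`` and the handler body are assumptions made only because that signal is not in the list Apache is said to use; check nothing else in your installation relies on it.

```python
import signal
import sys
import threading


def on_sigusr2(signum, frame):
    # Hypothetical handler: note the signal on stderr, which under
    # mod_wsgi ends up in the Apache error log.
    print("received signal %d" % signum, file=sys.stderr)


def register_if_main_thread(signum, handler):
    """Install handler only when running in the main Python thread."""
    if threading.current_thread() is not threading.main_thread():
        return False
    signal.signal(signum, handler)
    return True


def try_register_from_worker(signum, handler):
    """Show what plain CPython does when signal.signal() runs off the main thread."""
    result = {}

    def worker():
        try:
            signal.signal(signum, handler)
            result["outcome"] = "registered"
        except ValueError:
            result["outcome"] = "ValueError"

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return result["outcome"]


# Runs at import time, as it would for a script named by WSGIImportScript.
if hasattr(signal, "SIGUSR2"):
    register_if_main_thread(signal.SIGUSR2, on_sigusr2)
```

Guarding on the current thread, rather than catching the exception, keeps the intent explicit: registration is only attempted where the text above says it can work.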
diff --git a/docs/config-directives/WSGIRestrictStdin.rst b/docs/config-directives/WSGIRestrictStdin.rst
deleted file mode 100644
index c5716b8..0000000
--- a/docs/config-directives/WSGIRestrictStdin.rst
+++ /dev/null
@@ -1,23 +0,0 @@
-=================
-WSGIRestrictStdin
-=================
-
-:Description: Enable restrictions on use of STDIN.
-:Syntax: ``WSGIRestrictStdin On|Off``
-:Default: ``WSGIRestrictStdin On``
-:Context: server config
-
-A well-behaved Python WSGI application should never attempt to read any
-input directly from ``sys.stdin``. This is because some ways of hosting WSGI
-applications, such as CGI, use standard input as the mechanism for receiving
-the content of a request from the web server. If a WSGI application were to
-directly read from ``sys.stdin`` it could interfere with the operation of
-the WSGI adapter and result in corruption of the input stream.
-
-In the interests of promoting portability of WSGI applications, mod_wsgi
-restricts access to ``sys.stdin`` and will raise an exception if an
-attempt is made to use ``sys.stdin`` explicitly.
-
-The only time that one might want to remove this restriction is if the Apache
-web server is being run in debug or single-process mode for the purposes of
-being able to run an interactive Python debugger such as ``pdb``.
diff --git a/docs/config-directives/WSGIRestrictStdout.rst b/docs/config-directives/WSGIRestrictStdout.rst
deleted file mode 100644
index ec8f56d..0000000
--- a/docs/config-directives/WSGIRestrictStdout.rst
+++ /dev/null
@@ -1,29 +0,0 @@
-==================
-WSGIRestrictStdout
-==================
-
-:Description: Enable restrictions on use of STDOUT.
-:Syntax: ``WSGIRestrictStdout On|Off``
-:Default: ``WSGIRestrictStdout On``
-:Context: server config
-
-A well-behaved Python WSGI application should never attempt to write any
-data directly to ``sys.stdout`` or use the ``print`` statement without
-directing it to an alternate file object. This is because some ways of hosting
-WSGI applications, such as CGI, use standard output as the mechanism for
-sending the content of a response back to the web server. If a WSGI
-application were to directly write to ``sys.stdout`` it could interfere
-with the operation of the WSGI adapter and result in corruption of the
-output stream.
-
-In the interests of promoting portability of WSGI applications, mod_wsgi
-restricts access to ``sys.stdout`` and will raise an exception if an
-attempt is made to use ``sys.stdout`` explicitly.
-
-The only time that one might want to remove this restriction is purely out
-of the convenience of being able to use the ``print`` statement during
-debugging of an application, or if some third-party module or WSGI
-application was erroneously using ``print`` when it shouldn't. If
-restrictions on using ``sys.stdout`` are removed, any data written to
-it will instead be sent through to ``sys.stderr`` and will appear in
-the Apache error log file.
diff --git a/docs/config-directives/WSGIScriptAlias.rst b/docs/config-directives/WSGIScriptAlias.rst
deleted file mode 100644
index e0d17d7..0000000
--- a/docs/config-directives/WSGIScriptAlias.rst
+++ /dev/null
@@ -1,66 +0,0 @@
-===============
-WSGIScriptAlias
-===============
-
-:Description: Maps a URL to a filesystem location and designates the target as a WSGI script.
-:Syntax: ``WSGIScriptAlias`` *URL-path file-path|directory-path*
-:Context: server config, virtual host
-
-The WSGIScriptAlias directive behaves in the same manner as the
-`Alias`_ directive, except that it additionally marks the target directory
-as containing WSGI scripts, or marks the specific *file-path* as a script,
-that should be processed by mod_wsgi's ``wsgi-script`` handler.
-
-Where the target is a *directory-path*, URLs with a case-sensitive
-(%-decoded) path beginning with *URL-path* will be mapped to scripts
-contained in the indicated directory.
-
-For example::
-
-    WSGIScriptAlias /wsgi-scripts/ /web/wsgi-scripts/
-
-A request for ``http://www.example.com/wsgi-scripts/name`` in this case
-would cause the server to run the WSGI application defined in
-``/web/wsgi-scripts/name``. This configuration is essentially equivalent
-to::
-
-    Alias /wsgi-scripts/ /web/wsgi-scripts/
-    <Location /wsgi-scripts>
-    SetHandler wsgi-script
-    Options +ExecCGI
-    </Location>
-
-Where the target is a *file-path*, URLs with a case-sensitive
-(%-decoded) path beginning with *URL-path* will be mapped to the script
-defined by the *file-path*.
-
-For example::
-
-    WSGIScriptAlias /name /web/wsgi-scripts/name
-
-A request for ``http://www.example.com/name`` in this case would cause the
-server to run the WSGI application defined in ``/web/wsgi-scripts/name``.
-
-If possible you should avoid placing WSGI scripts under the `DocumentRoot`_
-in order to avoid accidentally revealing their source code if the
-configuration is ever changed. The WSGIScriptAlias makes this easy by
-mapping a URL and designating the location of any WSGI scripts at the same
-time. If you do choose to place your WSGI scripts in a directory already
-accessible to clients, do not use WSGIScriptAlias. Instead, use
-`<Directory>`_, `SetHandler`_ and `Options`_ as in::
-
-    <Directory /usr/local/apache/htdocs/wsgi-scripts>
-    SetHandler wsgi-script
-    Options ExecCGI
-    </Directory>
-
-This is necessary since multiple *URL-paths* can map to the same filesystem
-location, potentially bypassing the WSGIScriptAlias and revealing the
-source code of the WSGI scripts if they are not restricted by a
-`<Directory>`_ section.
-
-.. _Alias: http://httpd.apache.org/docs/2.2/mod/mod_alias.html#alias
-.. _DocumentRoot: http://httpd.apache.org/docs/2.2/mod/core.html#documentroot
-.. _<Directory>: http://httpd.apache.org/docs/2.2/mod/core.html#directory
-.. _SetHandler: http://httpd.apache.org/docs/2.2/mod/core.html#sethandler
-.. _Options: http://httpd.apache.org/docs/2.2/mod/core.html#options
diff --git a/docs/config-directives/WSGIScriptAliasMatch.rst b/docs/config-directives/WSGIScriptAliasMatch.rst
deleted file mode 100644
index eb3f397..0000000
--- a/docs/config-directives/WSGIScriptAliasMatch.rst
+++ /dev/null
@@ -1,33 +0,0 @@
-====================
-WSGIScriptAliasMatch
-====================
-
-:Description: Maps a URL to a filesystem location and designates the target as a WSGI script.
-:Syntax: ``WSGIScriptAliasMatch`` *regex file-path|directory-path*
-:Context: server config, virtual host
-
-This directive is similar to the WSGIScriptAlias directive, but makes use
-of regular expressions, instead of simple prefix matching. The supplied
-regular expression is matched against the URL-path, and if it matches, the
-server will substitute any parenthesized matches into the given string and
-use it as a filename.
-
-For example, to map a URL to scripts contained within
-a directory where the script files use the ``.wsgi`` extension, but it
-is desired that the extension not appear in the URL, use::
-
-    WSGIScriptAliasMatch ^/wsgi-scripts/([^/]+) /web/wsgi-scripts/$1.wsgi
-
-Note that you should only use WSGIScriptAliasMatch if you know what you are
-doing. In most cases you should be using WSGIScriptAlias instead. If you
-use WSGIScriptAliasMatch and don't do things the correct way, then you risk
-modifying the value of SCRIPT_NAME as passed to the WSGI application and
-this can stuff things up badly causing URL mapping to not work correctly
-within the WSGI application or stuff up reconstruction of the full URL when
-doing redirects. This is because the substitution of the matched sub
-pattern from the left hand side back into the right hand side is often
-critical.
-
-If you think you need to use WSGIScriptAliasMatch, you probably don't
-really. If you really really think you need it, then check on the mod_wsgi
-mailing list about how to use it properly.
diff --git a/docs/config-directives/WSGIScriptReloading.rst b/docs/config-directives/WSGIScriptReloading.rst
deleted file mode 100644
index 7b0b38f..0000000
--- a/docs/config-directives/WSGIScriptReloading.rst
+++ /dev/null
@@ -1,14 +0,0 @@
-===================
-WSGIScriptReloading
-===================
-
-:Description: Enable/Disable detection of WSGI script file changes.
-:Syntax: ``WSGIScriptReloading On|Off``
-:Default: ``WSGIScriptReloading On``
-:Context: server config, virtual host, directory, .htaccess
-:Override: ``FileInfo``
-
-The WSGIScriptReloading directive can be used to control whether changes to
-WSGI script files trigger the reloading mechanism. By default script
-reloading is enabled and a change to the WSGI script file will trigger
-whichever reloading mechanism is appropriate to the mode being used.
diff --git a/docs/config-directives/WSGISocketPrefix.rst b/docs/config-directives/WSGISocketPrefix.rst
deleted file mode 100644
index dfcd707..0000000
--- a/docs/config-directives/WSGISocketPrefix.rst
+++ /dev/null
@@ -1,43 +0,0 @@
-================
-WSGISocketPrefix
-================
-
-:Description: Configure directory to use for daemon sockets.
-:Syntax: ``WSGISocketPrefix`` *prefix*
-:Context: server config
-
-Defines the directory and name prefix to be used for the UNIX domain
-sockets used by mod_wsgi to communicate between the Apache child processes
-and the daemon processes.
-
-If the directive is not defined, the sockets and any related mutex lock
-files will be placed in the standard Apache runtime directory. This is the
-same directory that the Apache log files would normally be placed.
-
-For some Linux distributions, restrictive permissions are placed on the
-standard Apache runtime directory such that the directory is not readable
-to others. This can cause problems with mod_wsgi because the user that the
-Apache child processes run as will subsequently not have the required
-permissions to access the directory to be able to connect to the sockets.
-
-When this occurs, a '503 Service Temporarily Unavailable' error response
-would be received by the client. To resolve the problem, the
-WSGISocketPrefix directive should be defined to point at an alternate
-location. The value may be a location relative to the Apache root directory,
-or an absolute path.
-
-On systems which restrict access to the standard Apache runtime directory,
-they normally provide an alternate directory for placing sockets and lock
-files used by Apache modules. This directory is usually called 'run' and
-to make use of this directory the WSGISocketPrefix directive would be set
-as follows::
-
-    WSGISocketPrefix run/wsgi
-
-Note, do not put the sockets in the system temporary working directory.
-That is, do not go making the prefix '/tmp/wsgi'. The directory should be
-one that is only writable by 'root' user, or if not starting Apache as
-'root', the user that Apache is started as.
-
-Note that the WSGISocketPrefix directive and corresponding features are not
-available on Windows or when running Apache 1.3.
diff --git a/docs/configuration-directives/WSGIAcceptMutex.rst b/docs/configuration-directives/WSGIAcceptMutex.rst
new file mode 100644
index 0000000..818d021
--- /dev/null
+++ b/docs/configuration-directives/WSGIAcceptMutex.rst
@@ -0,0 +1,21 @@
+===============
+WSGIAcceptMutex
+===============
+
+:Description: Specify type of accept mutex used by daemon processes.
+:Syntax: ``WSGIAcceptMutex Default`` | *method*
+:Default: ``WSGIAcceptMutex Default``
+:Context: server config
+
+The ``WSGIAcceptMutex`` directive sets the method that mod_wsgi will use to
+serialize multiple daemon processes in a process group accepting requests
+on a socket connection from the Apache child processes. If this directive
+is not defined then the same type of mutex mechanism as used by Apache for
+the main Apache child processes when accepting connections from a client
+will be used. If set, the method types are the same as for the Apache
+`AcceptMutex`_ directive.
+
+Note that the ``WSGIAcceptMutex`` directive and corresponding features are
+not available on Windows or when running Apache 1.3.
+
+.. _AcceptMutex: http://httpd.apache.org/docs/2.4/mod/mpm_common.html#acceptmutex
diff --git a/docs/configuration-directives/WSGIAccessScript.rst b/docs/configuration-directives/WSGIAccessScript.rst
new file mode 100644
index 0000000..a361a1a
--- /dev/null
+++ b/docs/configuration-directives/WSGIAccessScript.rst
@@ -0,0 +1,31 @@
+================
+WSGIAccessScript
+================
+
+:Description: Specify script implementing host access controls.
+:Syntax: ``WSGIAccessScript`` *path* [ *options* ]
+:Context: directory, .htaccess
+:Override: AuthConfig
+
+The ``WSGIAccessScript`` directive provides a mechanism for implementing
+host access controls.
+
+More detailed information on using the ``WSGIAccessScript`` directive
+can be found in :doc:`../user-guides/access-control-mechanisms`.
+
+The options which can be supplied to the ``WSGIAccessScript`` directive are:
+
+**application-group=name**
+
+    Specifies the name of the application group within the specified
+    process for which the script file will be loaded.
+
+    If the ``application-group`` option is not supplied, the special value
+    ``%{GLOBAL}`` is used, which denotes that the script file is loaded
+    within the context of the first interpreter created by Python when it
+    is initialised. Otherwise, the script is loaded into the interpreter
+    for the specified application group.
+
+Note that the script always runs in processes associated with embedded
+mode. It is not possible to delegate the script such that it is run within
+the context of a daemon process.
diff --git a/docs/configuration-directives/WSGIApplicationGroup.rst b/docs/configuration-directives/WSGIApplicationGroup.rst
new file mode 100644
index 0000000..d08228a
--- /dev/null
+++ b/docs/configuration-directives/WSGIApplicationGroup.rst
@@ -0,0 +1,106 @@
+====================
+WSGIApplicationGroup
+====================
+
+:Description: Sets which application group a WSGI application belongs to.
+:Syntax: ``WSGIApplicationGroup name``
+         ``WSGIApplicationGroup %{GLOBAL}``
+         ``WSGIApplicationGroup %{SERVER}``
+         ``WSGIApplicationGroup %{RESOURCE}``
+         ``WSGIApplicationGroup %{ENV:variable}``
+:Default: ``WSGIApplicationGroup %{RESOURCE}``
+:Context: server config, virtual host, directory
+
+The ``WSGIApplicationGroup`` directive can be used to specify which
+application group a WSGI application or set of WSGI applications belongs
+to. All WSGI applications within the same application group will execute
+within the context of the same Python sub interpreter of the process
+handling the request.
+
+The argument to the ``WSGIApplicationGroup`` directive can be one of four
+special expanding variables or an explicit name of your own choosing.
+The meanings of the special variables are:
+
+**%{GLOBAL}**
+
+    The application group name will be set to the empty string.
+
+    Any WSGI applications in the global application group will always be
+    executed within the context of the first interpreter created by Python
+    when it is initialised. Forcing a WSGI application to run within the
+    first interpreter can be necessary when a third party C extension
+    module for Python has used the simplified threading API for
+    manipulation of the Python GIL and thus will not run correctly within
+    any additional sub interpreters created by Python.
+
+**%{SERVER}**
+
+    The application group name will be set to the server hostname. If the
+    request arrived over a non standard HTTP/HTTPS port, the port number
+    will be added as a suffix to the group name separated by a colon.
+
+    For example, if the virtual host ``www.example.com`` is handling
+    requests on the standard HTTP port (80) and HTTPS port (443), a request
+    arriving on either port would see the application group name being set
+    to ``www.example.com``. If instead the virtual host was handling requests
+    on port 8080, then the application group name would be set to
+    ``www.example.com:8080``.
+
+**%{RESOURCE}**
+
+    The application group name will be set to the server hostname and port
+    as for the ``%{SERVER}`` variable, to which the value of the WSGI
+    environment variable ``SCRIPT_NAME`` is appended, separated by the ``|``
+    character.
+
+    For example, if the virtual host ``www.example.com`` was handling
+    requests on port 8080 and the URL which mapped to the WSGI
+    application was::
+
+        http://www.example.com:8080/wsgi-scripts/foo
+
+    then the application group name would be set to::
+
+        www.example.com:8080|/wsgi-scripts/foo
+
+    The effect of using the ``%{RESOURCE}`` variable expansion is for each
+    application on any server to be isolated from all others by being
+    mapped to its own Python sub interpreter.
+
+**%{ENV:variable}**
+
+    The application group name will be set to the value of the named
+    environment variable. The environment variable is looked up via the
+    internal Apache notes and subprocess environment data structures and
+    (if not found there) via ``getenv()`` from the Apache server process.
+
+In an Apache configuration file, environment variables accessible using
+the ``%{ENV}`` variable reference can be set up by using directives such as
+`SetEnv`_ and `RewriteRule`_.
+
+For example, when using `mod_userdir`_, to group all WSGI scripts for a
+specific user within the same application group, the following could be
+used::
+
+    RewriteEngine On
+    RewriteCond %{REQUEST_URI} ^/~([^/]+)
+    RewriteRule . - [E=APPLICATION_GROUP:~%1]
+
+    <Directory /home/*/public_html>
+    Options ExecCGI
+    SetHandler wsgi-script
+    WSGIApplicationGroup %{ENV:APPLICATION_GROUP}
+    </Directory>
+
+Note that in embedded mode or a multi process daemon process group, there
+will be an instance of the named sub interpreter in each process. Thus the
+directive only ensures that a request is handled in the named sub interpreter
+within the process that handles the request. If you need to ensure that
+requests for a specific user always go back to the exact same sub interpreter,
+then you will need to use a daemon process group with only a single process,
+or implement a sticky session mechanism across a number of single process
+daemon process groups.
+
+.. _SetEnv: http://httpd.apache.org/docs/2.2/mod/mod_env.html#setenv
+.. _RewriteRule: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule
+.. _mod_userdir: http://httpd.apache.org/docs/2.2/mod/mod_userdir.html
diff --git a/docs/configuration-directives/WSGIAuthGroupScript.rst b/docs/configuration-directives/WSGIAuthGroupScript.rst
new file mode 100644
index 0000000..ad5a30a
--- /dev/null
+++ b/docs/configuration-directives/WSGIAuthGroupScript.rst
@@ -0,0 +1,31 @@
+===================
+WSGIAuthGroupScript
+===================
+
+:Description: Specify script implementing group authorisation.
+:Syntax: ``WSGIAuthGroupScript`` *path* [ *options* ]
+:Context: directory, .htaccess
+:Override: AuthConfig
+
+The ``WSGIAuthGroupScript`` directive provides a mechanism for implementing
+group authorisation using the Apache ``Require`` directive.
+
+More detailed information on using the ``WSGIAuthGroupScript`` directive
+can be found in :doc:`../user-guides/access-control-mechanisms`.
+
+The options which can be supplied to the ``WSGIAuthGroupScript`` directive are:
+
+**application-group=name**
+
+    Specifies the name of the application group within the specified
+    process for which the script file will be loaded.
+
+    If the ``application-group`` option is not supplied, the special value
+    ``%{GLOBAL}`` is used, which denotes that the script file is loaded
+    within the context of the first interpreter created by Python when it
+    is initialised. Otherwise, the script is loaded into the interpreter
+    for the specified application group.
+
+Note that the script always runs in processes associated with embedded
+mode. It is not possible to delegate the script such that it is run within
+the context of a daemon process.
diff --git a/docs/configuration-directives/WSGIAuthUserScript.rst b/docs/configuration-directives/WSGIAuthUserScript.rst
new file mode 100644
index 0000000..b275504
--- /dev/null
+++ b/docs/configuration-directives/WSGIAuthUserScript.rst
@@ -0,0 +1,40 @@
+==================
+WSGIAuthUserScript
+==================
+
+:Description: Specify script implementing an authentication provider.
+:Syntax: ``WSGIAuthUserScript`` *path* [ *options* ]
+:Context: directory, .htaccess
+:Override: AuthConfig
+
+The WSGIAuthUserScript directive can be used to specify a script which
+implements an Apache authentication provider.
+
+Such an authentication provider can be used where you want Apache to handle
+the handshaking related to HTTP Basic and Digest authentication, and you
+only wish to deal with supplying the user credentials used to authenticate
+the user.
+
+If using at least Apache 2.2, other Apache modules implementing custom
+authentication mechanisms can also make use of the authentication provider
+if they are using the corresponding Apache C API for accessing them.
+
+More detailed information on using the WSGIAuthUserScript directive can be
+found in :doc:`../user-guides/access-control-mechanisms`.
+
+The options which can be supplied to the WSGIAuthUserScript directive are:
+
+**application-group=name**
+    Specifies the name of the application group within the specified
+    process for which the script file will be loaded.
+
+    If the ``application-group`` option is not supplied, the special value
+    ``%{GLOBAL}`` is used, which denotes that the script file is loaded
+    within the context of the first interpreter created by Python when it
+    is initialised. Otherwise, the script is loaded into the interpreter
+    for the specified application group.
+
+Note that the script always runs in processes associated with embedded
+mode. It is not possible to delegate the script such that it is run within
+the context of a daemon process.
+
diff --git a/docs/configuration-directives/WSGICallableObject.rst b/docs/configuration-directives/WSGICallableObject.rst
new file mode 100644
index 0000000..6157079
--- /dev/null
+++ b/docs/configuration-directives/WSGICallableObject.rst
@@ -0,0 +1,29 @@
+==================
+WSGICallableObject
+==================
+
+:Description: Sets the name of the WSGI application callable.
+:Syntax: ``WSGICallableObject`` *name*
+         ``WSGICallableObject %{ENV:variable}``
+:Default: ``WSGICallableObject application``
+:Context: server config, virtual host, directory, .htaccess
+:Override: ``FileInfo``
+
+The WSGICallableObject directive can be used to override the name of the
+Python callable object in the script file which is used as the entry point
+into the WSGI application.
+
+When ``%{ENV}`` is being used, the environment variable is looked up via the
+internal Apache notes and subprocess environment data structures and (if
+not found there) via ``getenv()`` from the Apache server process.
+
+In an Apache configuration file, environment variables accessible using
+the ``%{ENV}`` variable reference can be set up by using directives such as
+`SetEnv`_ and `RewriteRule`_.
+
+Note that the name of the callable object must be an object present at
+global scope within the WSGI script file. It is not possible to use a dotted
+path to refer to a sub object of a module imported by the WSGI script file.
+
+.. _SetEnv: http://httpd.apache.org/docs/2.2/mod/mod_env.html#setenv
+.. _RewriteRule: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule
diff --git a/docs/configuration-directives/WSGICaseSensitivity.rst b/docs/configuration-directives/WSGICaseSensitivity.rst
new file mode 100644
index 0000000..13d1441
--- /dev/null
+++ b/docs/configuration-directives/WSGICaseSensitivity.rst
@@ -0,0 +1,23 @@
+===================
+WSGICaseSensitivity
+===================
+
+:Description: Define whether file system is case sensitive.
+:Syntax: ``WSGICaseSensitivity On|Off``
+:Context: server config
+
+When mod_wsgi is used on the Windows and MacOS X platforms, it will assume
+that the filesystem in use is case insensitive. This is necessary to ensure
+that the module caching system works correctly and only one module is
+retained in memory where paths with different case are used to identify the
+same script file. On other platforms it will always be assumed that a case
+sensitive file system is used.
+
+The WSGICaseSensitivity directive can be used to explicitly specify whether
+the file system in which WSGI script files are stored is case sensitive,
+thus overriding the default for the platform. A value of ``On`` indicates
+that the filesystem is case sensitive.
+
+Because it is set in the main server config, it will apply to the whole
+site. All paths therefore would need to be located in a filesystem with the
+same case convention.
diff --git a/docs/configuration-directives/WSGIDaemonProcess.rst b/docs/configuration-directives/WSGIDaemonProcess.rst
new file mode 100644
index 0000000..1fddeed
--- /dev/null
+++ b/docs/configuration-directives/WSGIDaemonProcess.rst
@@ -0,0 +1,276 @@
+=================
+WSGIDaemonProcess
+=================
+
+:Description: Configure a distinct daemon process for running applications.
+:Syntax: ``WSGIDaemonProcess`` *name* ``[`` *options* ``]``
+:Context: server config, virtual host
+
+The WSGIDaemonProcess directive can be used to specify that distinct daemon
+processes should be created to which the running of WSGI applications can
+be delegated. Where Apache has been started as the ``root`` user, the
+daemon processes can be run as a user different from that which the Apache
+child processes would normally be run as.
+
+When distinct daemon processes are enabled and used, the processes are
+dedicated to mod_wsgi and the only thing that the processes do is run the
+WSGI applications assigned to that process group. Any other Apache modules
+such as PHP or activities such as serving up static files continue to be
+run in the standard Apache child processes.
+
+Note that having denoted that daemon processes should be created by using
+the WSGIDaemonProcess directive, the WSGIProcessGroup directive still needs
+to be used to delegate specific WSGI applications to execute within those
+daemon processes.
+
+Also note that the name of the daemon process group must be unique for the
+whole server. That is, it is not possible to use the same daemon process
+group name in different virtual hosts.
+
+Options which can be supplied to the WSGIDaemonProcess directive are:
+
+**user=name | user=#uid**
+    Defines the UNIX user *name* or numeric user *uid* of the user that
+    the daemon processes should be run as. If this option is not supplied,
+    the daemon processes will be run as the same user that Apache runs its
+    child processes as, which is defined by the `User`_ directive.
+
+    Note that this option is ignored if Apache wasn't started as the root
+    user, in which case no matter what the settings, the daemon processes
+    will be run as the user that Apache was started as.
+
+    Also be aware that mod_wsgi will not allow you to run a daemon
+    process group as the root user due to the security risk of running
+    a web application as root.
+
+**group=name | group=#gid**
+    Defines the UNIX group *name* or numeric group *gid* of the primary
+    group that the daemon processes should be run as. If this option is not
+    supplied, the daemon processes will be run as the same group that Apache
+    runs its child processes as, which is defined by the `Group`_ directive.
+
+    Note that this option is ignored if Apache wasn't started as the root
+    user, in which case no matter what the settings, the daemon processes
+    will be run as the group that Apache was started as.
+
+**processes=num**
+    Defines the number of daemon processes that should be started in this
+    process group. If not defined then only one process will be run in this
+    process group.
+
+    Note that if this option is defined as ``processes=1``, then the WSGI
+    environment attribute called ``wsgi.multiprocess`` will be set to True,
+    whereas not providing the option at all will result in the attribute
+    being set to False. This distinction allows for cases where some form
+    of mapping mechanism might be used to distribute requests across
+    multiple process groups and thus in effect it is still a multiprocess
+    application. If you need to ensure that ``wsgi.multiprocess`` is False so
+    that interactive debuggers will work, simply do not specify the
+    ``processes`` option and allow the default single daemon process to be
+    created in the process group.
+
+**threads=num**
+    Defines the number of threads to be created to handle requests in each
+    daemon process within the process group.
+
+    If this option is not defined then the default will be to create 15
+    threads in each daemon process within the process group.
+
+**umask=0nnn**
+    Defines a value to be used for the umask of the daemon processes within
+    the process group. The value must be provided as an octal number.
+
+    If this option is not defined then the umask of the user that Apache is
+    initially started as will be inherited by the process. Typically the
+    inherited umask would be '0022'.
+
+**home=directory**
+    Defines an absolute path of a directory which should be used as the
+    initial current working directory of the daemon processes within the
+    process group.
+
+    If this option is not defined, in mod_wsgi 1.X the current working
+    directory of the Apache parent process will be inherited by the daemon
+    processes within the process group. Normally the current working directory
+    of the Apache parent process would be the root directory. In mod_wsgi 2.0+
+    the initial current working directory will be set to be the home
+    directory of the user that the daemon process runs as.
+
+**python-path=directory | python-path=directory:directory**
+    List of colon separated directories to add to the Python module search
+    path, i.e., ``sys.path``.
+
+    Note that this is not strictly the same as having set the ``PYTHONPATH``
+    environment variable when running normal command line Python. When this
+    option is used, the directories are added by calling
+    ``site.addsitedir()``. As well as adding the directory to
+    ``sys.path`` this function has the effect of opening and interpreting
+    any '.pth' files located in the specified directories. The option
+    therefore can be used to point at the ``site-packages`` directory
+    corresponding to a Python virtual environment created by a tool such as
+    ``virtualenv``, with any additional directories corresponding to
+    Python eggs within that directory also being automatically added to
+    ``sys.path``.
+
+**python-eggs=directory**
+    Directory to be used as the Python egg cache directory. This is
+    equivalent to having set the ``PYTHON_EGG_CACHE`` environment
+    variable.
+
+    Note that the directory specified must exist and be writable by the
+    user that the daemon processes run as.
+
+**stack-size=nnn**
+    The amount of virtual memory in bytes to be allocated for the stack
+    corresponding to each thread created by mod_wsgi in a daemon process.
+
+    This option would be used when running Linux in a VPS system which has
+    been configured with a quite low 'Memory Limit' in relation to the
+    'Context RSS' and 'Max RSS Memory' limits. In particular, the default
+    stack size for threads under Linux is 8MB, which is quite excessive and
+    could for such a VPS result in the 'Memory Limit' being exceeded before
+    the RSS limits were exceeded. In this situation, the stack size should be
+    dropped down to be in the region of 512KB (524288 bytes).
+
+**maximum-requests=nnn**
+    Defines a limit on the number of requests a daemon process should
+    process before it is shut down and restarted. Setting this to a non zero
+    value has the benefit of limiting the amount of memory that a process
+    can consume by (accidental) memory leakage.
+
+    If this option is not defined, or is defined to be 0, then the daemon
+    process will be persistent and will continue to service requests until
+    Apache itself is restarted or shut down.
+
+**inactivity-timeout=sss**
+    Defines the maximum number of seconds allowed to pass before the
+    daemon process is shut down and restarted when the daemon process has
+    entered an idle state. For the purposes of this option, being idle
+    means no new requests being received, or no attempts by current
+    requests to read request content or generate response content for the
+    defined period.
+
+    This option exists to allow infrequently used applications running in
+    a daemon process to be restarted, thus allowing memory being used to
+    be reclaimed, with process size dropping back to the initial startup
+    size before any application had been loaded or requests processed.
+
+**deadlock-timeout=sss**
+    Defines the maximum number of seconds allowed to pass before the
+    daemon process is shut down and restarted after a potential deadlock on
+    the Python GIL has been detected. The default is 300 seconds.
+
+    This option exists to combat the problem of a daemon process freezing
+    as the result of a rogue Python C extension module which doesn't
+    properly release the Python GIL when entering into a blocking or long
+    running operation.
+
+**shutdown-timeout=sss**
+    Defines the maximum number of seconds allowed to pass when waiting
+    for a daemon process to shut down gracefully as a result of the maximum
+    number of requests or inactivity timeout being reached, or when a user
+    initiated SIGINT signal is sent to a daemon process. When this timeout
+    has been reached the daemon process will be forced to exit even if
+    there are still active requests or it is still running Python exit
+    functions.
+
+    If this option is not defined, then the shutdown timeout will be set
+    to 5 seconds. Note that this option does not change the shutdown
+    timeout applied to daemon processes when Apache itself is being stopped
+    or restarted. That timeout value is defined internally to Apache as 3
+    seconds and cannot be overridden.
+
+**display-name=value**
+    Defines a different name to show for the daemon process when using the
+    'ps' command to list processes. If the value is '%{GROUP}' then the
+    name will be '(wsgi:group)' where 'group' is replaced with the name
+    of the daemon process group.
+
+    Note that only as many characters of the supplied value can be displayed
+    as were originally taken up by 'argv0' of the executing process. Anything
+    in excess of this will be truncated.
+
+    This feature may not work as described on all platforms. Typically it
+    also requires a 'ps' program with BSD heritage. Thus on Solaris UNIX
+    the '/usr/bin/ps' program doesn't work, but '/usr/ucb/ps' does.
+
+**receive-buffer-size=nnn**
+    Defines the UNIX socket buffer size for data being received by the
+    daemon process from the Apache child process.
+ + This option may need to be used to override small default values set by + certain operating systems and would help avoid possibility of deadlock + between Apache child process and daemon process when WSGI application + generates large responses but doesn't consume request content. In + general such deadlock problems would not arise with well behaved WSGI + applications, but some spam bots attempting to post data to web sites + are known to trigger the problem. + + The maximum possible value that can be set for the buffer size is + operating system dependent and will need to be calculated through trial + and error. + +**send-buffer-size=nnn** + Defines the UNIX socket buffer size for data being sent in the + direction daemon process back to Apache child process. + + This option may need to be used to override small default values set by + certain operating systems and would help avoid possibility of deadlock + between Apache child process and daemon process when WSGI application + generates large responses but doesn't consume request content. In + general such deadlock problems would not arise with well behaved WSGI + applications, but some spam bots attempting to post data to web sites + are known to trigger the problem. + + The maximum possible value that can be set for the buffer size is + operating system dependent and will need to be calculated through trial + and error. + +To delegate a particular WSGI application to run in a named set of daemon +processes, the WSGIProcessGroup directive should be specified in +appropriate context for that application. If WSGIProcessGroup is not used, +the application will be run within the standard Apache child processes. + +If the WSGIDaemonProcess directive is specified outside of all virtual +host containers, any WSGI application can be delegated to be run within +that daemon process group. 
If the WSGIDaemonProcess directive is specified +within a virtual host container, only WSGI applications associated with +virtual hosts with the same server name as that virtual host can be +delegated to that set of daemon processes. + +When WSGIDaemonProcess is associated with a virtual host, the error log +associated with that virtual host will be used for all Apache error log +output from mod_wsgi rather than it appearing in the main Apache error log. + +For example, if a server is hosting two virtual hosts and it is desired +that the WSGI applications related to each virtual host run in distinct +processes of their own and as the user who owns that virtual +host, the following could be used:: + + + ServerName www.site1.com + CustomLog logs/www.site1.com-access_log common + ErrorLog logs/www.site1.com-error_log + + WSGIDaemonProcess www.site1.com user=joe group=joe processes=2 threads=25 + WSGIProcessGroup www.site1.com + + ... + + + + ServerName www.site2.com + CustomLog logs/www.site2.com-access_log common + ErrorLog logs/www.site2.com-error_log + + WSGIDaemonProcess www.site2.com user=bob group=bob processes=2 threads=25 + WSGIProcessGroup www.site2.com + + ... + + +Note that the WSGIDaemonProcess directive and corresponding features are +not available on Windows or when running Apache 1.3. + +.. _User: http://httpd.apache.org/docs/2.2/mod/mpm_common.html#user +.. _Group: http://httpd.apache.org/docs/2.2/mod/mpm_common.html#group diff --git a/docs/configuration-directives/WSGIImportScript.rst b/docs/configuration-directives/WSGIImportScript.rst new file mode 100644 index 0000000..7705b79 --- /dev/null +++ b/docs/configuration-directives/WSGIImportScript.rst @@ -0,0 +1,42 @@ +================ +WSGIImportScript +================ + +:Description: Specify a script file to be loaded on process start.
+:Syntax: ``WSGIImportScript`` *path* ``[`` *options* ``]`` +:Context: server config + +The WSGIImportScript directive can be used to specify a script file to be +loaded when a process starts. Options must be provided to indicate the name +of the process group and the application group into which the script will +be loaded. + +The options which must be supplied to the WSGIImportScript directive are: + +**process-group=name** + Specifies the name of the process group for which the script file will + be loaded. + + The name of the process group can be set to the special value + '%{GLOBAL}', which denotes that the script file be loaded for the Apache + child processes. Any other value names the mod_wsgi daemon process + group for which the script file will be loaded. + +**application-group=name** + Specifies the name of the application group within the specified + process for which the script file will be loaded. + + The name of the application group can be set to the special value + '%{GLOBAL}', which denotes that the script file be loaded within the + context of the first interpreter created by Python when it is + initialised. Otherwise, it will be loaded into the interpreter for the + specified application group. + +Because the script files are loaded prior to beginning to accept any +requests, any delay in loading the script will not cause actual requests to +be blocked. As such, the WSGIImportScript directive can be used to preload a WSGI +application script file on process start so that it is ready when actual +user requests arrive. Where there are multiple processes handling +requests, this can reduce or eliminate the apparent stalling of an +application when performing a restart of Apache or a daemon mode process +group.
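A minimal sketch of a WSGI script file that benefits from being preloaded this way. The daemon process group name ``myapp`` and the path ``/srv/myapp/myapp.wsgi`` are hypothetical, and the configuration in the comments assumes that WSGIScriptAlias, WSGIProcessGroup and WSGIApplicationGroup select the same process and application group as the import, so that requests reuse the module that was already loaded. Slow start-up work sits at module level, so it runs when the script is imported at process start rather than on the first request.

```python
# Hypothetical script file /srv/myapp/myapp.wsgi. A matching Apache
# configuration (all names illustrative) might look like:
#
#   WSGIDaemonProcess myapp processes=2 threads=15
#   WSGIImportScript /srv/myapp/myapp.wsgi process-group=myapp application-group=%{GLOBAL}
#   WSGIScriptAlias / /srv/myapp/myapp.wsgi
#   WSGIProcessGroup myapp
#   WSGIApplicationGroup %{GLOBAL}
import time


def _expensive_setup():
    # Stand-in for slow start-up work such as importing a framework,
    # reading configuration files or warming caches.
    return {"loaded_at": time.time()}


# Executed once, as a side effect of loading the script, before this
# process handles any request.
STATE = _expensive_setup()


def application(environ, start_response):
    body = b"ready\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```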
+ diff --git a/docs/configuration-directives/WSGILazyInitialization.rst b/docs/configuration-directives/WSGILazyInitialization.rst new file mode 100644 index 0000000..3484427 --- /dev/null +++ b/docs/configuration-directives/WSGILazyInitialization.rst @@ -0,0 +1,73 @@ +====================== +WSGILazyInitialization +====================== + +:Description: Enable/disable lazy initialisation of Python. +:Syntax: ``WSGILazyInitialization On|Off`` +:Default: ``WSGILazyInitialization On`` +:Context: server config + +The WSGILazyInitialization directive sets whether or not the Python +interpreter is preinitialised within the Apache parent process or whether +lazy initialisation is performed, and the Python interpreter only +initialised in the Apache server processes or mod_wsgi daemon processes +after they have forked from the Apache parent process. + +In versions of mod_wsgi prior to version 3.0, the Python interpreter was +always preinitialised in the Apache parent process. This did mean that +theoretically some benefit in memory usage could be derived from delayed +copy-on-write semantics of memory inherited by child processes that was +initialised in the parent. The memory involved wasn't significant, however, and any benefit was +tempered by the fact that the Python interpreter, when destroyed and then +reinitialised in the Apache parent process on an Apache restart, would with +some Python versions leak memory. This meant that if a server had many +restarts performed, the Apache parent process and thus all forked child +processes could grow in memory usage over time, eventually necessitating +Apache be completely stopped and then restarted. + +This issue of memory leaks with the Python interpreter reached an extreme +with Python 3.0, where, by design, various data structures would not be +destroyed on the basis that they would be reused when the Python interpreter was +reinitialised within the same process.
The problem is that when an Apache +restart is performed, mod_wsgi and the Python library are unloaded from +memory, with the result that the references to that memory would be lost +and so a real memory leak, of significant size and much worse than older +versions of Python, would result. + +As a consequence, with mod_wsgi 3.0 and onwards, the Python interpreter is +not initialised by default in the Apache parent process for any version of +Python. This completely avoids the risk of cumulative memory leaks by the +Python interpreter on a restart into the Apache parent process, albeit with +potential for a slight increase in child process memory sizes. If need be, +the previous behaviour can be restored by setting the directive to the +value 'Off'. + +A further upside of using lazy initialisation is that if you are using +daemon mode only, i.e., not using embedded mode, you can completely turn off +initialisation of the Python interpreter within the main Apache server +child processes. Unfortunately, because it isn't possible in the general case +to know whether embedded mode will be needed or not, you will need to +manually set the configuration to do this. This can be done by setting:: + + WSGIRestrictEmbedded On + +With restrictions on embedded mode enabled, any attempt to run a WSGI +application in embedded mode will fail, so it will be necessary to ensure +all WSGI applications are delegated to run in daemon mode. Although WSGI +applications will be restricted from being run in embedded mode and the +Python interpreter therefore not initialised, it will fall back to being +initialised if you use any of the Python hooks for access control, +authentication or authorisation providers, or WSGI application dispatch +overrides.
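When embedded mode is restricted, it can help to confirm that an application really is being delegated to a daemon process group. A minimal sketch, assuming the ``mod_wsgi.process_group`` key that mod_wsgi places in the WSGI environment, which holds the daemon process group name, or an empty string when running in embedded mode. The ``.get()`` default exists only so the function can also be exercised outside Apache.

```python
def application(environ, start_response):
    # mod_wsgi records the process group in the WSGI environment. An empty
    # string means the request is being handled in embedded mode.
    group = environ.get("mod_wsgi.process_group", "")
    if group:
        mode = "daemon mode (process group %r)" % group
    else:
        mode = "embedded mode"
    body = ("Running in %s\n" % mode).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```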
+ +Note that if mod_python is being used in the same Apache installation, +because mod_python takes precedence over mod_wsgi in initialising the +Python interpreter, lazy initialisation cannot be done and so the Python +interpreter will continue to be preinitialised in the Apache parent process +regardless of the setting of WSGILazyInitialization. Use of mod_python will +thus perpetuate the risk of memory leaks and growing memory use of the Apache +processes. This is especially the case since mod_python doesn't even properly +destroy the Python interpreter in the Apache parent process on a restart, +and so all memory associated with the Python interpreter is leaked and not +just that caused by the Python interpreter when it is destroyed and doesn't +clean up after itself. diff --git a/docs/configuration-directives/WSGIPassAuthorization.rst b/docs/configuration-directives/WSGIPassAuthorization.rst new file mode 100644 index 0000000..c346c9e --- /dev/null +++ b/docs/configuration-directives/WSGIPassAuthorization.rst @@ -0,0 +1,24 @@ +===================== +WSGIPassAuthorization +===================== + +:Description: Enable/Disable passing of authorisation headers. +:Syntax: ``WSGIPassAuthorization On|Off`` +:Default: ``WSGIPassAuthorization Off`` +:Context: server config, virtual host, directory, .htaccess + +The WSGIPassAuthorization directive can be used to control whether HTTP +authorisation headers are passed through to a WSGI application in the +``HTTP_AUTHORIZATION`` variable of the WSGI application environment when +the equivalent HTTP request headers are present. This option would need to +be set to ``On`` if the WSGI application were to handle authorisation +rather than Apache doing it. + +Authorisation headers are not passed through by default, as doing so could +leak information about passwords through to a WSGI application which should +not be able to see them when Apache is performing authorisation.
If Apache +is performing authorisation, a WSGI application can still find out what +type of authorisation scheme was used by checking the variable +``AUTH_TYPE`` of the WSGI application environment. The login name of the +authorised user can be determined by checking the variable +``REMOTE_USER``. diff --git a/docs/configuration-directives/WSGIProcessGroup.rst b/docs/configuration-directives/WSGIProcessGroup.rst new file mode 100644 index 0000000..3d5abe4 --- /dev/null +++ b/docs/configuration-directives/WSGIProcessGroup.rst @@ -0,0 +1,65 @@ +================ +WSGIProcessGroup +================ + +:Description: Sets which process group a WSGI application is assigned to. +:Syntax: ``WSGIProcessGroup %{GLOBAL}|%{ENV:variable}|name`` +:Default: ``WSGIProcessGroup %{GLOBAL}`` +:Context: server config, virtual host, directory + +The WSGIProcessGroup directive can be used to specify which process group a +WSGI application or set of WSGI applications will be executed in. All WSGI +applications within the same process group will execute within the context +of the same group of daemon processes. + +The argument to the WSGIProcessGroup directive can be either one of two special +expanding variables or the actual name of a group of daemon processes set up +using the WSGIDaemonProcess directive. The meanings of the special variables +are: + +**%{GLOBAL}** + The process group name will be set to the empty string. + + Any WSGI applications in the global process group will always be + executed within the context of the standard Apache child processes. + Such WSGI applications will incur the least runtime overhead; however, + they will share the same process space with other Apache modules such + as PHP, as well as the process being used to serve up static file + content. Running WSGI applications within the standard Apache child + processes will also mean the application will run as the user that + Apache would normally run as.
+ +**%{ENV:variable}** + The process group name will be set to the value of the named + environment variable. The environment variable is looked up via the + internal Apache notes and subprocess environment data structures and + (if not found there) via getenv() from the Apache server process. + The result must identify a named process group set up using the + WSGIDaemonProcess directive. + +In an Apache configuration file, environment variables accessible using +the ``%{ENV}`` variable reference can be set up using directives such as +`SetEnv`_ and `RewriteRule`_. + +For example, to select which process group a specific WSGI application +should execute within based on entries in a database file, the following +could be used:: + + RewriteEngine On + RewriteMap wsgiprocmap dbm:/etc/httpd/wsgiprocmap.dbm + RewriteRule . - [E=PROCESS_GROUP:${wsgiprocmap:%{REQUEST_URI}}] + + WSGIProcessGroup %{ENV:PROCESS_GROUP} + +When using the WSGIProcessGroup directive, only daemon process groups +defined within virtual hosts with the same server name, or those defined at +global scope outside of any virtual hosts, can be selected. It is not +possible to select a daemon process group which is defined within a +different virtual host. Which daemon process groups can be selected may be +further restricted if the WSGIRestrictProcess directive has been used. + +Note that the WSGIProcessGroup directive and corresponding features are not +available on Windows or when running Apache 1.3. + +.. _SetEnv: http://httpd.apache.org/docs/2.2/mod/mod_env.html#setenv +.. _RewriteRule: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriterule diff --git a/docs/configuration-directives/WSGIPythonEggs.rst b/docs/configuration-directives/WSGIPythonEggs.rst new file mode 100644 index 0000000..ad87df7 --- /dev/null +++ b/docs/configuration-directives/WSGIPythonEggs.rst @@ -0,0 +1,18 @@ +============== +WSGIPythonEggs +============== + +:Description: Directory to use for Python eggs cache.
+:Syntax: ``WSGIPythonEggs`` *directory* +:Context: server config + +Used to specify the directory to be used as the Python eggs cache directory +for all sub interpreters created within embedded mode. This directive +achieves the same effect as having set the ``PYTHON_EGG_CACHE`` +environment variable. + +Note that the directory specified must exist and be writable by the user +that the Apache child processes run as. The directive only applies to +mod_wsgi embedded mode. To set the Python eggs cache directory for mod_wsgi +daemon processes, use the 'python-eggs' option to the WSGIDaemonProcess +directive instead. diff --git a/docs/configuration-directives/WSGIPythonHome.rst b/docs/configuration-directives/WSGIPythonHome.rst new file mode 100644 index 0000000..1b3a3ae --- /dev/null +++ b/docs/configuration-directives/WSGIPythonHome.rst @@ -0,0 +1,50 @@ +============== +WSGIPythonHome +============== + +:Description: Absolute path to Python prefix/exec_prefix directories. +:Syntax: ``WSGIPythonHome`` *prefix|prefix:exec_prefix* +:Context: server config + +Used to indicate to Python when it is initialised where its library files +are installed. This should be defined where the Python executable is not in +the ``PATH`` of the user that Apache runs as, or where a system has +multiple versions of Python installed in different locations in the file +system, especially different installations of the same major/minor version, +and the installation that Apache finds in its ``PATH`` is not the desired +one. + +This directive can also be used to indicate a Python virtual environment +created using a tool such as ``virtualenv``, to be used for the whole of +mod_wsgi. + +When this directive is used, it should be supplied with the prefixes for the +directories containing the platform-independent and system-dependent Python +library files. The directories should be separated by a ':'. If the same +directory is used for both, then only the one directory path needs to be +supplied.
Where the directories are the same, this can usually be +determined by looking at the value of the ``sys.prefix`` variable for the +version of Python being used. + +Note that the Python installation being referred to using this directive +must be the same major/minor version of Python that mod_wsgi was compiled +for. If you want to use a different major/minor version of +Python than the one currently used, you must recompile mod_wsgi against the alternate +version of Python. + +This directive is the same as having set the environment variable +``PYTHONHOME`` in the environment of the user that Apache executes as. If +this directive is used, it will override any setting of ``PYTHONHOME`` in +the environment of the user that Apache executes as. + +This directive will have no effect if mod_python is being loaded into Apache +at the same time as mod_wsgi, as mod_python will in that case be responsible +for initialising Python. + +This directive is not available on Windows systems. Note that mod_wsgi 1.X +will not actually reject this directive if listed in the configuration; +however, it also will not do anything either. This is because on Windows +systems Python ignores the ``PYTHONHOME`` environment variable and always +seems to use the location of the Python DLL for determining where the +library files are located. + diff --git a/docs/configuration-directives/WSGIPythonOptimize.rst b/docs/configuration-directives/WSGIPythonOptimize.rst new file mode 100644 index 0000000..d2db4a7 --- /dev/null +++ b/docs/configuration-directives/WSGIPythonOptimize.rst @@ -0,0 +1,49 @@ +================== +WSGIPythonOptimize +================== + +:Description: Enables basic Python optimisation features. +:Syntax: ``WSGIPythonOptimize [0|1|2]`` +:Default: ``WSGIPythonOptimize 0`` +:Context: server config + +Sets the level of Python compiler optimisations. The default is '0', which +means no optimisations are applied.
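To see which optimisation level a running interpreter actually ended up with, application code can inspect ``sys.flags.optimize`` and the built-in ``__debug__`` constant. The function below is an illustrative sketch, not part of mod_wsgi: from level '1' upwards ``assert`` statements are stripped and ``__debug__`` is ``False``, and at level '2' docstrings are also discarded, so the function's own docstring becomes ``None``. On Python 3.5 and later, optimised bytecode is cached in ``.opt-1.pyc`` and ``.opt-2.pyc`` files rather than ``.pyo`` files.

```python
import sys


def optimisation_report():
    """Describe the optimisation settings of the running interpreter."""
    return {
        # 0, 1 or 2, mirroring 'python', 'python -O' and 'python -OO'.
        "level": sys.flags.optimize,
        # False from level 1 upwards, when assert statements are removed.
        "asserts_enabled": __debug__,
        # Docstrings are discarded at level 2, leaving __doc__ as None.
        "docstrings_kept": optimisation_report.__doc__ is not None,
    }


print(optimisation_report())
```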
+ +Setting the optimisation level to '1' or above will have the effect of +enabling basic Python optimisations and will change the filename extension for +compiled (bytecode) files from ``.pyc`` to ``.pyo``. + +On the Windows platform, an optimisation level of '0' apparently results in +the same outcome as if the optimisation level had been set to '1'. + +When the optimisation level is set to '2', doc strings will not be +generated and thus not retained. This may technically result in a smaller +memory footprint if all ``.pyo`` files were compiled at this optimisation +level, but may cause some Python packages which interrogate doc strings in +some way to fail. + +Since all the installed ``.pyo`` files in your Python installation are +not likely to be installed with level '2' optimisation, the gain from using +this level of optimisation will probably be negligible, if any. This is +because potentially only the Python code for your own application code will +be compiled with this level of optimisation. This will be the case as the +``.pyo`` files will already exist for modules in the standard Python +library and they will be used as-is, rather than them being regenerated +with a higher level of optimisation than they might be. Use of level '2' +optimisation is therefore discouraged. + +This directive will have no effect if mod_python is being loaded into Apache +at the same time as mod_wsgi, as mod_python will in that case be responsible +for initialising Python. + +Overall, if you do not understand what the normal 'python' executable '-O' +option does, how the Python runtime changes its behaviour as a result, and +you don't know exactly how your application would be affected by enabling +this option, then do not use this option. In other words, stop trying to +prematurely optimise the performance of your application through shortcuts.
+You will get much better performance gains by looking at the design of your +application and eliminating bottlenecks within it and in how it uses any +database. So, put the gun down and back away; it will be better for all +concerned. + diff --git a/docs/configuration-directives/WSGIPythonPath.rst b/docs/configuration-directives/WSGIPythonPath.rst new file mode 100644 index 0000000..d4c2ea6 --- /dev/null +++ b/docs/configuration-directives/WSGIPythonPath.rst @@ -0,0 +1,59 @@ +============== +WSGIPythonPath +============== + +:Description: Additional directories to search for Python modules. +:Syntax: ``WSGIPythonPath`` *directory|directory-1:directory-2:...* +:Context: server config + +Used to specify additional directories to search for Python modules. If +multiple directories are specified, they should be separated by a ':' if +using a UNIX-like system, or ';' if using Windows. If any part of a +directory path contains a space character, the complete argument string to +WSGIPythonPath must be quoted. + +When using mod_wsgi version 1.X, this directive is the same as having set +the environment variable ``PYTHONPATH`` in the environment of the user +that Apache executes as. If this directive is used, it will override any +setting of ``PYTHONPATH`` in the environment of the user that Apache +executes as. The end result is that the listed directories will be added +to ``sys.path``. + +Note that in mod_wsgi version 1.X this applies to all Python sub +interpreters created, be they in the Apache child processes when embedded +mode is used, or in distinct daemon processes when daemon mode is used. It +is not possible to define this differently for mod_wsgi daemon processes. +If additional directories need to be added to the module search path for a +specific WSGI application, it should be done within the WSGI application +script itself. + +When using mod_wsgi version 2.0, this directive does not have the same +effect as having set the environment variable ``PYTHONPATH``. In fact, if
In fact, if +``PYTHONPATH`` is set in the environment of the user that Apache is +started as, any directories so defined will still be added to +``sys.path`` and they will not be overridden. + +The difference with this directive when using mod_wsgi 2.0 is that each +directory listed will be added to the end of ``sys.path`` by calling +``site.addsitedir()``. By using this function, as well as the directory +being added to ``sys.path``, any '.pth' files located in the directories +will be opened and processed. Thus, if the directories contain Python eggs, +any associated directories corresponding to those Python eggs will in turn +also be added automatically to ``sys.path``. + +Note however that when using mod_wsgi 2.0, this directive only sets up the +additional Python module search directories for interpreters created in the +Apache child processes where embedded mode is used. If directories need to +be specified for interpreters running in daemon processes, the +'python-path' option to the WSGIDaemonProcess directive corresponding to +that daemon process should instead be used. + +In mod_wsgi version 2.0, because directories corresponding to Python eggs +are automatically added to ``sys.path``, the directive can be used to +point at the ``site-packages`` directory corresponding to a Python +virtual environment created by a tool such as ``virtualenv``. + +For mod_wsgi 1.X, this directive will have no affect if mod_python is being +loaded into Apache at the same time as mod_wsgi as mod_python will in that +case be responsible for initialising Python. + diff --git a/docs/configuration-directives/WSGIRestrictEmbedded.rst b/docs/configuration-directives/WSGIRestrictEmbedded.rst new file mode 100644 index 0000000..e4c1b45 --- /dev/null +++ b/docs/configuration-directives/WSGIRestrictEmbedded.rst @@ -0,0 +1,17 @@ +==================== +WSGIRestrictEmbedded +==================== + +:Description: Enable restrictions on use of embedded mode. 
+:Syntax: ``WSGIRestrictEmbedded On|Off`` +:Default: ``WSGIRestrictEmbedded Off`` +:Context: server config + +The WSGIRestrictEmbedded directive determines whether mod_wsgi embedded +mode is enabled or not. If set to 'On' and the restriction on embedded mode +is therefore enabled, any attempt to make a request against a WSGI +application which hasn't been properly configured so as to be delegated to +a daemon mode process will fail with an HTTP 500 (Internal Server Error) response. + +This option does not exist on Windows, on Apache 1.3, or in any other +configuration where daemon mode is not available. diff --git a/docs/configuration-directives/WSGIRestrictProcess.rst b/docs/configuration-directives/WSGIRestrictProcess.rst new file mode 100644 index 0000000..e5278f2 --- /dev/null +++ b/docs/configuration-directives/WSGIRestrictProcess.rst @@ -0,0 +1,64 @@ +=================== +WSGIRestrictProcess +=================== + +:Description: Restrict which daemon process groups can be selected. +:Syntax: ``WSGIRestrictProcess`` *group-1 group-2 ...* +:Context: server config, virtual host, directory + +When using the WSGIProcessGroup directive, daemon process groups defined +within virtual hosts with the same server name, or those defined at global +scope outside of any virtual hosts, can be selected. It is not possible to +select a daemon process group which is defined within a different virtual +host. + +To further limit which of the available daemon process groups can be +selected, the WSGIRestrictProcess directive can be used to list a +restricted set of daemon process group names. This could be used, for +example, where %{ENV} substitution is being used to allow the daemon process +group to be selected from a .htaccess file for a specific user.
+ +The main Apache configuration for this scenario might be:: + + WSGIDaemonProcess default processes=2 threads=25 + + + ServerName www.site.com + + WSGIDaemonProcess bob:1 user=bob group=bob threads=25 + WSGIDaemonProcess bob:2 user=bob group=bob threads=25 + WSGIDaemonProcess bob:3 user=bob group=bob threads=25 + + WSGIDaemonProcess joe:1 user=joe group=joe threads=25 + WSGIDaemonProcess joe:2 user=joe group=joe threads=25 + WSGIDaemonProcess joe:3 user=joe group=joe threads=25 + + SetEnv PROCESS_GROUP default + WSGIProcessGroup %{ENV:PROCESS_GROUP} + + + Options ExecCGI + AllowOverride FileInfo + AddHandler wsgi-script .wsgi + WSGIRestrictProcess bob:1 bob:2 bob:3 + SetEnv PROCESS_GROUP bob:1 + + + +The .htaccess file within the user's account could then delegate specific +WSGI applications to different daemon process groups using the +`SetEnv`_ directive:: + + + SetEnv PROCESS_GROUP bob:2 + + + + SetEnv PROCESS_GROUP bob:3 + + +Note that the WSGIDaemonProcess directive and corresponding features are +not available on Windows or when running Apache 1.3. + +.. _SetEnv: http://httpd.apache.org/docs/2.2/mod/mod_env.html#setenv diff --git a/docs/configuration-directives/WSGIRestrictSignal.rst b/docs/configuration-directives/WSGIRestrictSignal.rst new file mode 100644 index 0000000..e565dfd --- /dev/null +++ b/docs/configuration-directives/WSGIRestrictSignal.rst @@ -0,0 +1,46 @@ +================== +WSGIRestrictSignal +================== + +:Description: Enable restrictions on use of signal(). +:Syntax: ``WSGIRestrictSignal On|Off`` +:Default: ``WSGIRestrictSignal On`` +:Context: server config + +A well-behaved Python WSGI application should not in general register any +signal handlers of its own using ``signal.signal()``. The reason for this +is that the web server which is hosting a WSGI application will more than +likely register signal handlers of its own.
If a WSGI application were to +override such signal handlers, it could interfere with the operation of the +web server, preventing actions such as server shutdown and restart. + +In the interests of promoting portability of WSGI applications, mod_wsgi +restricts use of ``signal.signal()`` and will ensure that any attempts +to register signal handlers are ignored. A warning notice will be output +to the Apache error log indicating that this action has been taken. + +If for some reason there is a need for a WSGI application to register some +special signal handler, this behaviour can be turned off; however, an +application should avoid the signals ``SIGTERM``, ``SIGINT``, +``SIGHUP``, ``SIGWINCH`` and ``SIGUSR1`` as these are all used by +Apache. + +Apache will ensure that the signal ``SIGPIPE`` is set to ``SIG_IGN``. +If a WSGI application needs to override this, it must ensure that it is +reset to ``SIG_IGN`` before any Apache code is run. In a multithreaded +MPM this would be practically impossible to ensure, so it is preferable that +the handler for ``SIGPIPE`` also not be changed. + +Apache does not use ``SIGALRM``, but it is generally preferable that +other techniques be used to achieve the same effect. + +Do note that if the ability to register signal handlers is enabled, such a +registration can only reliably be done from within code which is +implemented as a side effect of importing a script file identified by the +WSGIImportScript directive. This is because signal handlers can only be +registered from the main Python interpreter thread, and request handlers +when using embedded mode and a multithreaded Apache MPM would generally +execute from secondary threads. Similarly, when using daemon mode, request +handlers would be executed from secondary threads. Only code run as a side +effect of WSGIImportScript is guaranteed to be executed in the main Python +interpreter thread.
diff --git a/docs/configuration-directives/WSGIRestrictStdin.rst b/docs/configuration-directives/WSGIRestrictStdin.rst new file mode 100644 index 0000000..c5716b8 --- /dev/null +++ b/docs/configuration-directives/WSGIRestrictStdin.rst @@ -0,0 +1,23 @@ +================= +WSGIRestrictStdin +================= + +:Description: Enable restrictions on use of STDIN. +:Syntax: ``WSGIRestrictStdin On|Off`` +:Default: ``WSGIRestrictStdin On`` +:Context: server config + +A well behaved Python WSGI application should never attempt to read any +input directly from ``sys.stdin``. This is because ways of hosting WSGI +applications such as CGI use standard input as the mechanism for receiving +the content of a request from the web server. If a WSGI application were to +directly read from ``sys.stdin`` it could interfere with the operation of +the WSGI adapter and result in corruption of the input stream. + +In the interests of promoting portability of WSGI applications, mod_wsgi +restricts access to ``sys.stdin`` and will raise an exception if an +attempt is made to use ``sys.stdin`` explicitly. + +The only time that one might want to remove this restriction is if the Apache +web server is being run in debug or single process mode for the purposes of +being able to run an interactive Python debugger such as ``pdb``. diff --git a/docs/configuration-directives/WSGIRestrictStdout.rst b/docs/configuration-directives/WSGIRestrictStdout.rst new file mode 100644 index 0000000..ec8f56d --- /dev/null +++ b/docs/configuration-directives/WSGIRestrictStdout.rst @@ -0,0 +1,29 @@ +================== +WSGIRestrictStdout +================== + +:Description: Enable restrictions on use of STDOUT. +:Syntax: ``WSGIRestrictStdout On|Off`` +:Default: ``WSGIRestrictStdout On`` +:Context: server config + +A well behaved Python WSGI application should never attempt to write any +data directly to ``sys.stdout`` or use the ``print`` statement without +directing it to an alternate file object. 
This is because ways of hosting +WSGI applications such as CGI use standard output as the mechanism for +sending the content of a response back to the web server. If a WSGI +application were to directly write to ``sys.stdout`` it could interfere +with the operation of the WSGI adapter and result in corruption of the +output stream. + +In the interests of promoting portability of WSGI applications, mod_wsgi +restricts access to ``sys.stdout`` and will raise an exception if an +attempt is made to use ``sys.stdout`` explicitly. + +The only time that one might want to remove this restriction is purely out +of convencience of being able to use the ``print`` statement during +debugging of an application, or if some third party module or WSGI +application was errornously using ``print`` when it shouldn't. If +restrictions on using ``sys.stdout`` are removed, any data written to +it will instead be sent through to ``sys.stderr`` and will appear in +the Apache error log file. diff --git a/docs/configuration-directives/WSGIScriptAlias.rst b/docs/configuration-directives/WSGIScriptAlias.rst new file mode 100644 index 0000000..e0d17d7 --- /dev/null +++ b/docs/configuration-directives/WSGIScriptAlias.rst @@ -0,0 +1,66 @@ +=============== +WSGIScriptAlias +=============== + +:Description: Maps a URL to a filesystem location and designates the target as a WSGI script. +:Syntax: ``WSGIScriptAlias`` *URL-path file-path|directory-path* +:Context: server config, virtual host + +The WSGIScriptAlias directive behaves in the same manner as the +`Alias`_ directive, except that it additionally marks the target directory +as containing WSGI scripts, or marks the specific *file-path* as a script, +that should be processed by mod_wsgi's ``wsgi-script`` handler. + +Where the target is a *directory-path*, URLs with a case-sensitive +(%-decoded) path beginning with *URL-path* will be mapped to scripts +contained in the indicated directory. 
+ +For example:: + + WSGIScriptAlias /wsgi-scripts/ /web/wsgi-scripts/ + +A request for ``http://www.example.com/wsgi-scripts/name`` in this case +would cause the server to run the WSGI application defined in +``/web/wsgi-scripts/name``. This configuration is essentially equivalent +to:: + + Alias /wsgi-scripts/ /web/wsgi-scripts/ + <Location /wsgi-scripts> + SetHandler wsgi-script + Options +ExecCGI + </Location> + +Where the target is a *file-path*, URLs with a case-sensitive +(%-decoded) path beginning with *URL-path* will be mapped to the script +defined by the *file-path*. + +For example:: + + WSGIScriptAlias /name /web/wsgi-scripts/name + +A request for ``http://www.example.com/name`` in this case would cause the +server to run the WSGI application defined in ``/web/wsgi-scripts/name``. + +If possible you should avoid placing WSGI scripts under the `DocumentRoot`_ +in order to avoid accidentally revealing their source code if the +configuration is ever changed. The WSGIScriptAlias makes this easy by +mapping a URL and designating the location of any WSGI scripts at the same +time. If you do choose to place your WSGI scripts in a directory already +accessible to clients, do not use WSGIScriptAlias. Instead, use +`<Directory>`_, `SetHandler`_ and `Options`_ as in:: + + + SetHandler wsgi-script + Options ExecCGI + + +This is necessary since multiple *URL-paths* can map to the same filesystem +location, potentially bypassing the WSGIScriptAlias and revealing the +source code of the WSGI scripts if they are not restricted by a +`<Directory>`_ section. + +.. _Alias: http://httpd.apache.org/docs/2.2/mod/mod_alias.html#alias +.. _DocumentRoot: http://httpd.apache.org/docs/2.2/mod/core.html#documentroot +.. _<Directory>: http://httpd.apache.org/docs/2.2/mod/core.html#directory +.. _SetHandler: http://httpd.apache.org/docs/2.2/mod/core.html#sethandler +.. 
_Options: http://httpd.apache.org/docs/2.2/mod/core.html#options diff --git a/docs/configuration-directives/WSGIScriptAliasMatch.rst b/docs/configuration-directives/WSGIScriptAliasMatch.rst new file mode 100644 index 0000000..eb3f397 --- /dev/null +++ b/docs/configuration-directives/WSGIScriptAliasMatch.rst @@ -0,0 +1,33 @@ +==================== +WSGIScriptAliasMatch +==================== + +:Description: Maps a URL to a filesystem location and designates the target as a WSGI script. +:Syntax: ``WSGIScriptAliasMatch`` *regex file-path|directory-path* +:Context: server config, virtual host + +This directive is similar to the WSGIScriptAlias directive, but makes use +of regular expressions instead of simple prefix matching. The supplied +regular expression is matched against the URL-path, and if it matches, the +server will substitute any parenthesized matches into the given string and +use it as a filename. + +For example, to map a URL to scripts contained within +a directory where the script files use the ``.wsgi`` extension, but it +is desired that the extension not appear in the URL, use:: + + WSGIScriptAliasMatch ^/wsgi-scripts/([^/]+) /web/wsgi-scripts/$1.wsgi + +Note that you should only use WSGIScriptAliasMatch if you know what you are +doing. In most cases you should be using WSGIScriptAlias instead. If you +use WSGIScriptAliasMatch and don't do things the correct way, then you risk +modifying the value of SCRIPT_NAME as passed to the WSGI application. This +can break things badly, causing URL mapping to not work correctly within +the WSGI application, or breaking reconstruction of the full URL when +doing redirects. This is because the substitution of the matched sub +pattern from the left hand side back into the right hand side is often +critical. + +If you think you need to use WSGIScriptAliasMatch, you probably don't. +If you are really sure you need it, then check on the mod_wsgi +mailing list about how to use it properly.
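To make the substitution step concrete, here is a small Python sketch that imitates the mapping in the example above using the standard ``re`` module. It is only an illustration and does not reproduce how Apache or mod_wsgi implement the directive. Apache writes the back-reference as ``$1`` and Python spells it ``\1``, and ``map_url`` is a hypothetical helper. The sketch reports which part of the URL the regex consumed and what was left over, which shows why a careless pattern can swallow more of the URL than just the script name.

```python
import re

def map_url(pattern, target, url_path):
    """Imitate a regex alias: return (script_file, remainder) or None.

    `target` uses Python's \\1 back-reference syntax in place of Apache's $1.
    `remainder` is the part of the URL the pattern did not consume.
    """
    match = re.match(pattern, url_path)
    if match is None:
        return None
    # expand() substitutes the captured groups into the target template.
    script_file = match.expand(target)
    remainder = url_path[match.end():]
    return script_file, remainder

# The pattern from the documentation stops at the first '/' after the name.
print(map_url(r"^/wsgi-scripts/([^/]+)", r"/web/wsgi-scripts/\1.wsgi",
              "/wsgi-scripts/name/extra/path"))

# A greedy pattern consumes the whole URL and yields a bogus file name.
print(map_url(r"^/wsgi-scripts/(.+)", r"/web/wsgi-scripts/\1.wsgi",
              "/wsgi-scripts/name/extra/path"))
```

With the first pattern, ``/extra/path`` is left over after the script name. The greedy ``(.+)`` variant consumes everything, so the whole URL is folded into the file name and nothing is left for the application to route on.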
diff --git a/docs/configuration-directives/WSGIScriptReloading.rst b/docs/configuration-directives/WSGIScriptReloading.rst new file mode 100644 index 0000000..7b0b38f --- /dev/null +++ b/docs/configuration-directives/WSGIScriptReloading.rst @@ -0,0 +1,14 @@ +=================== +WSGIScriptReloading +=================== + +:Description: Enable/Disable detection of WSGI script file changes. +:Syntax: ``WSGIScriptReloading On|Off`` +:Default: ``WSGIScriptReloading On`` +:Context: server config, virtual host, directory, .htaccess +:Override: ``FileInfo`` + +The WSGIScriptReloading directive can be used to control whether changes to +WSGI script files trigger the reloading mechanism. By default script +reloading is enabled and a change to the WSGI script file will trigger +whichever reloading mechanism is appropriate to the mode being used. diff --git a/docs/configuration-directives/WSGISocketPrefix.rst b/docs/configuration-directives/WSGISocketPrefix.rst new file mode 100644 index 0000000..dfcd707 --- /dev/null +++ b/docs/configuration-directives/WSGISocketPrefix.rst @@ -0,0 +1,43 @@ +================ +WSGISocketPrefix +================ + +:Description: Configure directory to use for daemon sockets. +:Syntax: ``WSGISocketPrefix`` *prefix* +:Context: server config + +Defines the directory and name prefix to be used for the UNIX domain +sockets used by mod_wsgi to communicate between the Apache child processes +and the daemon processes. + +If the directive is not defined, the sockets and any related mutex lock +files will be placed in the standard Apache runtime directory. This is the +same directory in which the Apache log files would normally be placed. + +For some Linux distributions, restrictive permissions are placed on the +standard Apache runtime directory such that the directory is not readable +to others.
This can cause problems with mod_wsgi because the user that the +Apache child processes run as will subsequently not have the required +permissions to access the directory to be able to connect to the sockets. + +When this occurs, a '503 Service Temporarily Unavailable' error response +will be received by the client. To resolve the problem, the +WSGISocketPrefix directive should be defined to point at an alternate +location. The value may be a location relative to the Apache root directory, +or an absolute path. + +Systems which restrict access to the standard Apache runtime directory +normally provide an alternate directory for placing sockets and lock +files used by Apache modules. This directory is usually called 'run' and +to make use of this directory the WSGISocketPrefix directive would be set +as follows:: + + WSGISocketPrefix run/wsgi + +Note, do not put the sockets in the system temporary working directory. +That is, do not set the prefix to '/tmp/wsgi'. The directory should be +one that is only writable by the 'root' user, or if not starting Apache as +'root', the user that Apache is started as. + +Note that the WSGISocketPrefix directive and corresponding features are not +available on Windows or when running Apache 1.3. diff --git a/docs/configuration.rst b/docs/configuration.rst index a89a2e9..5338dfd 100644 --- a/docs/configuration.rst +++ b/docs/configuration.rst @@ -5,28 +5,28 @@ Configuration .. 
toctree:: :maxdepth: 2 - config-directives/WSGIAcceptMutex - config-directives/WSGIAccessScript - config-directives/WSGIApplicationGroup - config-directives/WSGIAuthGroupScript - config-directives/WSGIAuthUserScript - config-directives/WSGICallableObject - config-directives/WSGICaseSensitivity - config-directives/WSGIDaemonProcess - config-directives/WSGIImportScript - config-directives/WSGILazyInitialization - config-directives/WSGIPassAuthorization - config-directives/WSGIProcessGroup - config-directives/WSGIPythonEggs - config-directives/WSGIPythonHome - config-directives/WSGIPythonOptimize - config-directives/WSGIPythonPath - config-directives/WSGIRestrictEmbedded - config-directives/WSGIRestrictProcess - config-directives/WSGIRestrictSignal - config-directives/WSGIRestrictStdin - config-directives/WSGIRestrictStdout - config-directives/WSGIScriptAlias - config-directives/WSGIScriptAliasMatch - config-directives/WSGIScriptReloading - config-directives/WSGISocketPrefix + configuration-directives/WSGIAcceptMutex + configuration-directives/WSGIAccessScript + configuration-directives/WSGIApplicationGroup + configuration-directives/WSGIAuthGroupScript + configuration-directives/WSGIAuthUserScript + configuration-directives/WSGICallableObject + configuration-directives/WSGICaseSensitivity + configuration-directives/WSGIDaemonProcess + configuration-directives/WSGIImportScript + configuration-directives/WSGILazyInitialization + configuration-directives/WSGIPassAuthorization + configuration-directives/WSGIProcessGroup + configuration-directives/WSGIPythonEggs + configuration-directives/WSGIPythonHome + configuration-directives/WSGIPythonOptimize + configuration-directives/WSGIPythonPath + configuration-directives/WSGIRestrictEmbedded + configuration-directives/WSGIRestrictProcess + configuration-directives/WSGIRestrictSignal + configuration-directives/WSGIRestrictStdin + configuration-directives/WSGIRestrictStdout + configuration-directives/WSGIScriptAlias + 
configuration-directives/WSGIScriptAliasMatch + configuration-directives/WSGIScriptReloading + configuration-directives/WSGISocketPrefix diff --git a/docs/getting-started.rst b/docs/getting-started.rst index 54e51cd..5fc7dd7 100644 --- a/docs/getting-started.rst +++ b/docs/getting-started.rst @@ -6,7 +6,7 @@ If starting out with mod_wsgi it is recommended you start out with a simple 'Hello World' type application. Do not attempt to use a Python web application dependent on a web framework -such as Django, Flask or Pyramid until you have got a basic 'Hello World!' +such as Django, Flask or Pyramid until you have got a basic 'Hello World' application running first. The simpler WSGI application will validate that your mod_wsgi installation is working okay and that you at least understand the basics of configuring Apache. diff --git a/docs/index.rst b/docs/index.rst index cc4830a..e8f7a2d 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -2,19 +2,6 @@ mod_wsgi ======== -.. note:: - - After much procrastination, but also a lack of time to do it anyway, - a last ditch effort is being made to get documentation for mod_wsgi - off the old Google Code site before it is archived and shutdown. - Chances are still that this will not happen and documentation will be - dumped here at the last minute in an unconverted state and so will - not be formatted properly, or will simply be in more of a mess than - it is now. Sorry, but if there were 48 hours in a day then maybe - something could be done about it, but there isn't, so you will just - have to be patient. For more details and links to the old - documentation, while it still exists, see :doc:`project-status`. - The mod_wsgi package implements a simple to use Apache module which can host any Python web application which supports the Python WSGI_ specification. 
The package can be installed in two different ways diff --git a/docs/installation.rst b/docs/installation.rst index 95830e3..6965e67 100644 --- a/docs/installation.rst +++ b/docs/installation.rst @@ -2,14 +2,6 @@ Installation ============ -.. warning :: - - Documentation linked here is actually located on the old Google Code - site and has not yet been transitioned to this site. Some documentation - on the old site may be out of date, especially anything related to - running mod_wsgi on Windows. For more details as to why and the links to - all the old documentation see :doc:`project-status`. - The mod_wsgi package can be installed form source code or may also be available as a pre built binary package as part of your Linux distribution. @@ -25,8 +17,8 @@ mod_wsgi from source code. For instructions on how to compile mod_wsgi from source code for UNIX like operating systems such as Linux and MacOS X see: -* `Installation Instructions `_ + * :doc:`user-guides/quick-installation-guide` If you are on Windows, you should instead use: -* `Installation on Windows `_ + * https://github.com/GrahamDumpleton/mod_wsgi/blob/develop/win32/README.rst diff --git a/docs/project-status.rst b/docs/project-status.rst index 810e4aa..08229f2 100644 --- a/docs/project-status.rst +++ b/docs/project-status.rst @@ -6,7 +6,8 @@ The mod_wsgi project is still being developed and maintained. The available time of the sole developer is however limited. As a result, progress may appear to be slow, or at times not visible at all. Usually it isn't that changes aren't being made, but that because documentation isn't being -updated you wouldn't know about the changes. +updated, except for the release notes, you wouldn't know about the +changes. A lot of the more recent changes are being made with the aim of making it a lot easier to deploy Apache with mod_wsgi in Docker based @@ -15,20 +16,8 @@ environments.
Changes included the ability to install mod_wsgi using provides a really simple way of starting up Apache and mod_wsgi from the command line with an automatically generated configuration. -Completely revised documentation will eventually be incorporated here, but -expect progress to be slow in that area. - -In the mean time keep referring to the older documentation located on -the Google Code site at: - - https://code.google.com/p/modwsgi/wiki/WhereToGetHelp - -The full documentation index on the Google Code site can be found at: - - http://code.google.com/p/modwsgi/w/list - -Documentation for the new ``mod_wsgi-express`` feature will not be found -on the Google Code site, but is documented in the PyPi entry for mod_wsgi -at: - - https://pypi.python.org/pypi/mod_wsgi +In general, the documentation is in a bit of a mess right now, partly +because it is somewhat outdated, but also because it had to be quickly +shifted to this new location from the old Google Code site just prior to +it being shut down. Some documentation was dropped when that move +occurred, so if you can't find something then ask on the mod_wsgi +mailing list for help. diff --git a/docs/troubleshooting.rst b/docs/troubleshooting.rst index b074639..45bc906 100644 --- a/docs/troubleshooting.rst +++ b/docs/troubleshooting.rst @@ -2,29 +2,19 @@ Troubleshooting =============== -.. warning :: - - Documentation linked here is actually located on the old Google Code - site and has not yet been transitioned to this site. Some documentation - on the old site may be out of date, especially anything related to - running mod_wsgi on Windows. For more details as to why and the links to - all the old documentation see :doc:`project-status`.
- If you are having problems getting mod_wsgi to start up or do what you want it to do, first off ensure that you read the following documents: -* `Installation Issues `_ -* `Configuration Issues `_ -* `Application Issues `_ - -If none of the common issues match up with the problem you are seeing and -are after other ideas, or you have the need to perform more low level -debugging, check out the following document: - -* `Developer Guidelines `_ +* :doc:`user-guides/installation-issues` +* :doc:`user-guides/configuration-issues` +* :doc:`user-guides/application-issues` You can also do some basic checking of your installation and configuration to validate that how it is setup is how you expect it to be. See the following document: -* `Checking Your Installation `_ +* :doc:`user-guides/checking-your-installation` + +If none of the common issues match up with the problem you are seeing and +are after other ideas, or you have the need to perform more low level +debugging, check out the :doc:`user-guides`. diff --git a/docs/user-guides.rst b/docs/user-guides.rst index d9f1b54..60e2c76 100644 --- a/docs/user-guides.rst +++ b/docs/user-guides.rst @@ -5,6 +5,21 @@ User Guides .. 
toctree:: :maxdepth: 1 + user-guides/quick-installation-guide user-guides/quick-configuration-guide user-guides/configuration-guidelines + user-guides/installation-issues + user-guides/configuration-issues + user-guides/application-issues + user-guides/frequently-asked-questions + user-guides/checking-your-installation + user-guides/debugging-techniques + user-guides/processes-and-threading + user-guides/reloading-source-code + user-guides/virtual-environments user-guides/access-control-mechanisms + user-guides/file-wrapper-extension + user-guides/registering-cleanup-code + user-guides/assorted-tips-and-tricks + user-guides/issues-with-pickle-module + user-guides/issues-with-expat-library diff --git a/docs/user-guides/application-issues.rst b/docs/user-guides/application-issues.rst new file mode 100644 index 0000000..69d9de7 --- /dev/null +++ b/docs/user-guides/application-issues.rst @@ -0,0 +1,1251 @@ +================== +Application Issues +================== + +Although installation and configuration of mod_wsgi may be successful, +there is a range of issues that can impact specific WSGI applications. +These problems can arise for various reasons, including conflicts between +an application and other Apache modules or non WSGI applications hosted by +Apache, a WSGI application not being portable, use of Python modules that +are not fully compatible with the way that mod_wsgi uses Python sub +interpreters, or dependence on a specific operating system execution +environment. + +The purpose of this document is to capture all the known problems that can +arise, including workarounds if available, related to the actual running +of a WSGI application. + +Note that the majority of these issues are not unique to mod_wsgi and would +also affect mod_python. This is because they arise from the fact +that the Python interpreter is being embedded within the Apache server +itself.
Unlike mod_python, in mod_wsgi there are ways of avoiding many of +the problems by using daemon mode. + +If you are having a problem which doesn't seem to be covered by this +document, also make sure you see :doc:`../user-guides/installation-issues` +and :doc:`../user-guides/configuration-issues`. + +Access Rights Of Apache User +---------------------------- + +For most Apache installations the web server is initially started up as +the root user. This is necessary as operating systems will block non root +applications from making use of Internet ports below 1024. A web server +responding to HTTP and HTTPS requests on the standard ports will need to +be able to acquire ports 80 and 443. + +Once the web server has acquired these ports and forked off child processes +to handle any requests, the user that the child processes run as will be +switched to a non privileged user. The actual name of this user varies from +one system to another with some commonly used names being 'apache', +'httpd', 'www', and 'wwwserv'. As well as the user being switched, the web +server will also normally switch to an alternate group. + +If running a WSGI application in embedded mode with mod_wsgi, the user and +group that the Apache child processes run as will be inherited by the +application. To determine which user and group would be used the main +Apache configuration files should be consulted. The particular +configuration directives which control this are ``User`` and ``Group``. +For example:: + + User www + Group www + +Because this user is non privileged and will generally be different to the +user that owns the files for a specific WSGI application, it is important +that such files and the directories which contain them are accessible to +others. If the files are not readable or the directories not searchable, +the web server will not be able to see or read the files and execution of +the WSGI application will fail at some point. 
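A quick way to find the file or directory that blocks the Apache user is to walk the path and inspect permission bits. The following Python sketch does this. It assumes the Apache user has no special relationship to the files and so checks only the "other" permission bits. Group membership, ACLs and SELinux policy are not considered, so an empty result is a hint rather than a guarantee. ``check_world_access`` is a hypothetical helper name.

```python
import os
import stat

def check_world_access(path):
    """List permission problems that would stop an unrelated user reading `path`.

    Only the "other" permission bits are inspected; group membership,
    ACLs and SELinux policy are ignored.
    """
    path = os.path.abspath(path)
    problems = []

    # Every directory from the parent up to "/" must be searchable
    # (execute bit set) by others, or the target cannot be reached.
    directory = os.path.dirname(path)
    while True:
        if not os.stat(directory).st_mode & stat.S_IXOTH:
            problems.append(f"directory not searchable by others: {directory}")
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            break
        directory = parent

    # The target itself must be readable by others.
    mode = os.stat(path).st_mode
    if not mode & stat.S_IROTH:
        problems.append(f"not readable by others: {path}")
    # A directory target also needs its search bit so files in it can be opened.
    if stat.S_ISDIR(mode) and not mode & stat.S_IXOTH:
        problems.append(f"directory not searchable by others: {path}")
    return problems
```

For example, ``check_world_access("/web/wsgi-scripts/name")`` (a path taken from the WSGIScriptAlias examples) returns one message per offending directory or file. An empty list means the "other" bits at least are not the obstacle.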
+ +As well as being able to read files, if a WSGI application needs to be able +to create or edit files, it will be necessary to create a special directory +which it can use to create files in and which is owned by the same user +that Apache is running as. Any files contained in the directory which it +needs to edit should also be owned by the user that Apache is run as, or +group privileges used in some way to ensure the application will have the +required access to update the file. + +One example of where access rights can be a problem in Python is with +Python eggs which need to be unpacked at runtime by a WSGI application. +This issue arises with Trac because of its ability for plugins to be +packaged as Python eggs. Pylons with its focus on being able to support +Python eggs in its deployment mechanism can also be affected. Because +of the growing reliance on Python eggs however, the issue could arise +for any WSGI application where you have installed Python eggs in their +zipped up form rather than their unpacked form. + +If your WSGI application is affected by this problem in relation to Python +eggs, you would generally see a Python exception similar to the following +occurring and being logged in the Apache error logs:: + + ExtractionError: Can't extract file(s) to egg cache + + The following error occurred while trying to extract file(s) to the + Python egg cache: + + [Errno 13] Permission denied: '/var/www/.python-eggs' + + The Python egg cache directory is currently set to: + + /var/www/.python-eggs + + Perhaps your account does not have write access to this directory? + You can change the cache directory by setting the PYTHON_EGG_CACHE + environment variable to point to an accessible directory. + +To avoid this particular problem you can set the 'PYTHON_EGG_CACHE' +environment variable at the start of the WSGI application script file.
The +environment variable should be set to a directory which is owned and/or +writable by the user that Apache runs as:: + + import os + os.environ['PYTHON_EGG_CACHE'] = '/usr/local/pylons/python-eggs' + +Alternatively, if using mod_wsgi 2.0, one could also use the WSGIPythonEggs +directive for applications running in embedded mode, or the 'python-eggs' +option to the WSGIDaemonProcess directive when using daemon mode. + +Note that you should refrain from ever using directories or files which +have been made writable by anyone as this could compromise security. Also +be aware that if hosting multiple applications under the same web server, +they will all run as the same user and so it will be possible for each to +both see and modify each other's files. If this is an issue, you should host +the applications on different web servers running as different users or on +different systems. Alternatively, any data required or updated by the +application should be hosted in a database with separate accounts for each +application. + +Issues related to access rights can in general be avoided if daemon mode +of mod_wsgi is used to run a WSGI application. This is because in daemon +mode the user and group that the processes run as can be overridden and set +to alternate values. Do however note additional issues related to the 'HOME' +environment variable as described below. + +Secure Variants Of UNIX +----------------------- + +In addition to the constraints imposed by Apache running as a distinct +user, some variants of UNIX have features whereby access privileges for a +specific user may be even further restricted. One example of such a system +is SELinux. In such a system, the user that Apache runs as is typically +restricted to only being able to access quite specific parts of the file +system as well as possibly other resources or operating system library +features.
+ +If running such a system you will need to change the configuration for the +security system to allow both mod_wsgi and your application to do what is +required. + +As an example, the extra security checks of such a system may present +problems if the version of Python you are using only provides a static +library and not a shared library. If you experience an error similar to:: + + Cannot load /etc/httpd/modules/mod_wsgi.so into server: \ + /etc/httpd/modules/mod_wsgi.so: cannot restore segment prot after reloc: \ + Permission denied + +you will either need to configure the security system appropriately to +allow memory relocations in static code to work, or you will need to +reinstall Python such that it provides a shared library +and rebuild mod_wsgi. Other issues around only having a static variant of +the Python library available are described in section 'Lack Of Python +Shared Library' of :doc:`../user-guides/installation-issues`. + +Even where a shared library is used, SELinux has also resulted in similar +memory related errors when loading C extension modules at run time for +Python:: + + ImportError: /opt/python2.6/lib/python2.6/lib-dynload/itertools.so: \ + failed to map segment from shared object: Permission denied + +All up, configuring SELinux is a bit of a black art and so you would be wise +to do your research. + +For some information about using mod_wsgi in a SELinux enabled environment +check out: + + * http://www.packtpub.com/article/selinux-secured-web-hosting-python-based-web-applications + * http://www.globalherald.net/jb01/weblog/21.html + * http://blog.endpoint.com/2010/02/selinux-httpd-modwsgi-26-rhel-centos-5.html + +Overall, if you don't have a specific need for SELinux, it is suggested +you consider disabling it if it gives you problems.
+ +Application Working Directory +----------------------------- + +When Apache is started it is typically run such that the current working +directory for the application is the root directory, although the actual +directory may vary dependent on the system or any extra security system in +place. + +Importantly, the current working directory will generally never have any +direct relationship to any specific WSGI application. As a result, an +application should never assume that it can use relative path names for +accessing the filesystem. All paths used should always be absolute path +names. + +An application should also never change the current working directory and +then assume that it can then use relative paths. This is because other +applications being hosted on the same web server may assume they can do the +same thing with the result that you can never be sure what the current +working directory may actually be. + +You should not even assume that it is safe to change the working directory +immediately prior to a specific operation, as use of multithreading can +mean that another application could change it even before you get to +perform the operation which depended on the current working directory +being the value you set it to. + +In the case of Python, if needing to use relative paths in order to make it +easier to relocate an application, one can determine the directory that a +specific code module is located in using ``os.path.dirname(__file__)``. A +full path name can then be constructed by using ``os.path.join()`` to +merge the relative path with the directory name where the module was +located. + +Another option is to take the directory part of the ``SCRIPT_FILENAME`` +variable from the WSGI environment as the base directory. The only other +alternative is to rely on a centralised configuration file so that all +absolute path names are at least defined in the one place. 
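Both approaches described above can be written as small helpers that never consult the current working directory. The Python sketch below shows one way to do this. The names ``module_relative`` and ``script_relative`` are hypothetical, and the example assumes, as stated above, that ``SCRIPT_FILENAME`` in the WSGI environment holds the absolute path of the WSGI script file.

```python
import os

def module_relative(module_file, *parts):
    """Absolute path for `parts`, relative to the directory holding `module_file`.

    Intended to be called as module_relative(__file__, "templates") from the
    module itself, so the result never depends on the working directory.
    """
    base = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(base, *parts)

def script_relative(environ, *parts):
    """Absolute path for `parts`, relative to the WSGI script file's directory.

    Uses the SCRIPT_FILENAME variable from the WSGI environment.
    """
    base = os.path.dirname(environ["SCRIPT_FILENAME"])
    return os.path.join(base, *parts)
```

For example, inside a request handler, ``script_relative(environ, "templates", "index.html")`` for a script at ``/web/wsgi-scripts/name`` yields ``/web/wsgi-scripts/templates/index.html``, whatever directory Apache happened to start in.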
+ +Although it is preferable that an application never make assumptions about +what the current working directory is, if for some reason the application +cannot be changed the daemon mode of mod_wsgi could be used. This works +because an initial current working directory for the process can be specified as +an option to the WSGIDaemonProcess directive used to configure the daemon +process. Because the working directory applies to the whole process +however, only the application requiring this working directory should be +delegated to run within the context of that daemon process. + +Application Environment Variables +--------------------------------- + +When Python sub interpreters are created, each has its own copy of any +modules which are loaded. They also each have their own copy of the set of +environment variables inherited by the process and found in ``os.environ``. + +Problems can arise with the use of ``os.environ`` though, due to the fact +that updates to ``os.environ`` are pushed back into the set of process +environment variables. This means that if the Python sub interpreter which +corresponds to another application group is created after ``os.environ`` +has been updated, the new value for that environment variable will be +inherited by the new Python sub interpreter. + +This would not generally be a problem where a WSGI application is +configured using a single mandatory environment variable, as the WSGI +application script file for each application instance would be required to +set it, thereby overriding any value inherited from another application +instance via the process environment variables. + +As an example, Django relies on the ``DJANGO_SETTINGS_MODULE`` environment +variable being set to be the name of the Python module containing Django's +configuration settings. So long as each WSGI script file sets this variable +all will be okay.
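Python sub interpreters cannot easily be demonstrated in isolation, so the sketch below uses a child process as a stand-in for "something created later that starts from the process environment". That substitution is an assumption made for illustration. mod_wsgi's sub interpreters live in the same process, but both read the same process-level environment when they start. The sketch shows that a plain assignment to ``os.environ`` reaches that shared environment. ``child_sees`` is a hypothetical helper name.

```python
import os
import subprocess
import sys

def child_sees(name):
    """Start a fresh Python process and return its value for environment variable `name`.

    The child starts from the process-level environment, so this shows
    whether a change made through os.environ reached that environment.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({name!r}, ''))"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Assigning (rather than relying on an inherited value) is what each WSGI
# script file should do, so that its own setting always wins.
os.environ["DJANGO_SETTINGS_MODULE"] = "site_one.settings"
print(child_sees("DJANGO_SETTINGS_MODULE"))  # site_one.settings
```

The same mechanism that makes the assignment visible to the child is what lets a value set by one application leak into a sub interpreter created after it. This is why each WSGI script file should set the variable explicitly instead of relying on whatever it inherits.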
+ +Where use of environment variables can be problematic though is where there +are multiple environment variables that can be set, with some being +optional, and where non overlapping sets of variables are used to configure +different modes. + +As an example, Trac can be configured to host a single project by setting the +``TRAC_ENV`` environment variable. Alternatively, Trac can be configured +to host a group of projects by setting the ``TRAC_ENV_PARENT_DIR`` +environment variable. If both variables are set at the same time, then +``TRAC_ENV`` takes precedence. + +Suppose that within the one process you have a Trac instance of each type in +different Python sub interpreters. If the one using ``TRAC_ENV`` loads +first, when the other is loaded it will inherit ``TRAC_ENV`` from the +first and that will override ``TRAC_ENV_PARENT_DIR``. The end result is +that both sites point at the same single project, rather than the first +being for the single project and the other being the group of projects. + +Because of this potential leakage of environment variables between Python +sub interpreters, it is preferable that WSGI applications not rely on +environment variables for configuration. + +A further reason that environment variables should not be used for +configuration is that it then becomes impossible to host two instances of +the same WSGI application component within the same Python sub interpreter +if each would require a different value be set for the same environment +variable. Note that this also applies to other means of hosting WSGI +applications besides mod_wsgi and is not mod_wsgi specific. + +As a consequence, because Django relies on the ``DJANGO_SETTINGS_MODULE`` +environment variable being set to be the name of the Python module +containing Django's configuration settings, it would be impossible to host +two Django instances in the same Python sub interpreter.
It is thus +important that, where multiple instances of Django need to be +run on the same web server, they run in separate Python sub +interpreters. + +As it stands the default behaviour of mod_wsgi is to run different WSGI +application scripts within the context of different Python sub +interpreters. As such, this limitation in Django does not present as an +immediate problem, however it should be kept in mind when attempting to +merge multiple WSGI applications into one application group under one +Python sub interpreter to try and limit memory use by avoiding duplicate +instances of modules in memory. + +The preferred way of configuring a WSGI application is for the application +to be a class instance which at the point of initialisation is provided +with its configuration data as an argument. Alternatively, or in +conjunction with this, configuration information can be passed through to +the WSGI application in the WSGI environment. Variables in the WSGI +environment could be set by a WSGI middleware component, or from the Apache +configuration files using the ``SetEnv`` directive. + +Configuring an application when it is first constructed, or by supplying +the configuration information through the WSGI environment variables, is +thus the only way to ensure that a WSGI application is portable between +different means of hosting WSGI applications. These problems can also be +avoided by using daemon mode of mod_wsgi and delegating each WSGI +application instance to a distinct daemon process group. + +Timezone and Locale Settings +---------------------------- + +More insidious than the problem of leakage of application environment +variable settings between sub interpreters is where an environment +variable is required by operating system libraries to set behaviour. + +This is a problem because applications running in different sub +interpreters could set the process environment variable to be different +values.
Rather than each seeing behaviour consistent with the setting they +used, all applications will see behaviour reflecting the setting as +determined by the last application to initialise itself. + +Process environment variables where this can be a problem are the 'TZ' +environment variable for setting the timezone, and the 'LANG', 'LC_CTYPE', +'LC_COLLATE', 'LC_TIME' and 'LC_MESSAGES' environment variables for setting +the locale and language. + +The result of this is that you cannot host multiple WSGI applications in +the same process, even if running in different sub interpreters, if they +require different settings for timezone, locale and/or language. + +In this situation you would have no choice but to use mod_wsgi daemon mode +and delegate applications requiring different settings to different daemon +process groups. Alternatively, completely separate instances of Apache +would need to be used. + +User HOME Environment Variable +------------------------------ + +If Apache is started automatically as 'root' when a machine is first booted, +it would inherit the 'HOME' environment variable setting of the 'root' +user. If, however, Apache is started by a non-privileged user via the 'sudo' +command, it would inherit the 'HOME' environment variable of the user who +started it, unless the '-H' option had been supplied to 'sudo'. In the case +of the '-H' option being supplied, the 'HOME' environment variable of the +'root' user would again be used. + +Because the value of the 'HOME' environment variable can vary based on how +Apache has been started, an application should not depend on the +'HOME' environment variable. + +Unfortunately, parts of the Python standard library do use the 'HOME' +environment variable as an authoritative source of information.
In +particular, the 'os.path.expanduser()' function gives precedence to the value of +the 'HOME' environment variable over the home directory as obtained from +the user password database entry:: + + if 'HOME' not in os.environ: + import pwd + userhome = pwd.getpwuid(os.getuid()).pw_dir + else: + userhome = os.environ['HOME'] + +That the 'os.path.expanduser()' function does this means it can yield incorrect +results. Since the 'setuptools' package uses 'os.path.expanduser()' on UNIX +systems to calculate where to store Python eggs, the location it tries to +use can change based on who started Apache and how. + +The only way to guarantee that the 'HOME' environment variable is set to a +sensible value is for it to be set explicitly at the start of the WSGI +script file before anything else is done:: + + import os, pwd + os.environ["HOME"] = pwd.getpwuid(os.getuid()).pw_dir + +In mod_wsgi 2.0, if using daemon mode, the value of the 'HOME' environment +variable will be automatically reset to correspond to the home directory of +the user that the daemon process is running as. This is not done for +embedded mode, however, because the Apache child processes are +shared with other Apache modules and it is not seen as appropriate that +mod_wsgi should be changing the same environment that is used by these +other unrelated modules. + +For some consistency in the environment inherited by applications running +in embedded mode, it is therefore recommended that, at a minimum, 'sudo -H' +always be used when restarting Apache from a non-root account. + +Application Global Variables +---------------------------- + +Because the Python sub interpreter which hosts a WSGI application is +retained in memory between requests, any global data is effectively +persistent and can be used to carry state forward from one request to the +next. On UNIX systems, however, Apache will normally use multiple processes +to handle requests and each such process will have its own global data.
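As a minimal sketch of the per-process nature of global data, the following WSGI application keeps a request counter in a module-level global. The counter survives between requests, but each Apache (or daemon) process has its own copy, so the value reported depends on which process handled the request. The entry point name ``application`` follows the usual mod_wsgi convention; the counter and lock are purely illustrative, and the code assumes Python 3.

```python
import os
import threading

# Module-level (global) state: it persists between requests, but only
# within the current process. Every Apache process has its own copy.
_lock = threading.Lock()  # a process may serve requests from several threads
_request_count = 0


def application(environ, start_response):
    global _request_count
    with _lock:
        _request_count += 1
        count = _request_count

    body = ("pid=%d requests-seen-by-this-process=%d\n"
            % (os.getpid(), count)).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Requesting this URL repeatedly while several Apache processes are running would show different process IDs, each with its own independent count; none of this state is visible to the other processes.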
+ +This means that although global data can be used, it can only be used +to cache data which can be safely reused within the context of that single +process. You cannot use global data as a means of holding information that +must be visible to any request handler no matter which process it runs in. + +If data must be visible to all request handlers across all Apache +processes, then it will be necessary to store the data in the filesystem +directly, or in a database. Alternatively, a shared in-memory cache such as +memcached can be used. + +Because your WSGI application can be spread across multiple processes, one +must also be very careful in respect of local caching mechanisms employed +by database connector objects. If such an adapter is quite aggressive in its +caching, it is possible that a specific process may end up with an out-of-date +view of data from a database where one of the other processes has +since changed the data. The result may be that requests handled in different +processes give different results. + +The problems described above can be alleviated to a degree by using daemon +mode of mod_wsgi and restricting the number of daemon processes in +the process group to one. This will ensure that all requests are serviced by the +same process. If the data is only held in memory, however, it would obviously +be lost when Apache is restarted or the daemon process is restarted due to +a maximum number of requests being reached. + +Writing To Standard Output +-------------------------- + +No WSGI application component which claims to be portable should write to +standard output. That is, an application should not use the Python ``print`` +statement without directing output to some alternate stream. An application +should also not write directly to ``sys.stdout``. + +This is necessary as an underlying WSGI adapter hosting the application +may use standard output as the means of communicating a response back to a +web server.
This technique is used, for example, when WSGI is hosted within a +CGI script. + +Ideally any WSGI adapter which uses ``sys.stdout`` in this way should +cache a reference to ``sys.stdout`` for its own use and then replace it +with a reference to ``sys.stderr``. There is, however, nothing in the WSGI +specification that requires this or recommends it, so one can't +rely on it being done. + +In order to highlight non-portable WSGI application components which write +to or use standard output in some way, mod_wsgi prior to version 3.0 +replaced ``sys.stdout`` with an object which would raise an exception when +any attempt was made to write to or make use of standard output:: + + IOError: sys.stdout access restricted by mod_wsgi + +If the WSGI application you are using fails due to use of standard output +being restricted and you cannot change the application or configure it +to behave differently, you have two options. The first option is to +replace ``sys.stdout`` with ``sys.stderr`` at the start of your WSGI +application script file:: + + import sys + sys.stdout = sys.stderr + +This will have the effect of directing any data written to standard output +to standard error. Such data sent to standard error is then directed through +the Apache logging system and will appear in the main Apache error log file. + +The second option is to remove the restriction on using standard output +imposed by mod_wsgi using a configuration directive:: + + WSGIRestrictStdout Off + +This configuration directive must appear at global scope within the Apache +configuration file outside of any VirtualHost container directives. It +will remove the restriction on using standard output from all Python sub +interpreters that mod_wsgi creates. There is no way using the configuration +directive to remove the restriction from only one Python sub interpreter.
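A portable alternative to either workaround is for the application to write its diagnostics to the error stream that the WSGI specification (PEP 3333) supplies in the environment under the ``wsgi.errors`` key, rather than to standard output. A minimal sketch, assuming Python 3; the greeting and message text are illustrative only:

```python
def application(environ, start_response):
    # wsgi.errors is the error stream supplied by the WSGI server (PEP 3333).
    # Under mod_wsgi its output ends up in the Apache error log.
    errors = environ["wsgi.errors"]
    print("handling request for %s" % environ.get("PATH_INFO", "/"),
          file=errors)
    errors.flush()

    body = b"Hello World!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Because ``wsgi.errors`` is part of the WSGI specification itself, the same code works unchanged under any compliant WSGI server, including CGI-based adapters that reserve standard output for the response.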
+ +When the restriction is not imposed, any data written to standard output +will also be directed through the Apache logging system and will appear in +the main Apache error log file. + +Ideally though, code should never use the 'print' statement without +redirecting the output to 'sys.stderr'. Thus, if the code can be changed, +it should be made to use something like:: + + import sys + + def function(): + print >> sys.stderr, "application debug" + ... + +Also note that code should ideally not be making assumptions about the +environment it is executing in, e.g., whether it is running in an +interactive mode, by asking whether standard output is a tty. With the +restriction in place, calling 'isatty()' will cause a similar error with mod_wsgi. If such +code is a library module, the code should be providing a way to +specifically flag that it is a non-interactive application and not use +magic to determine whether that is the case or not. + +For further information about options for logging error messages and other +debugging information from a WSGI application running under mod_wsgi, see +section 'Apache Error Log Files' of :doc:`../user-guides/debugging-techniques`. + +WSGI applications which are known to write data to standard output in their +default configuration are CherryPy and TurboGears. Some plugins for Trac +also have this problem. Thus one of the two techniques described above for +removing the restriction should be used in conjunction with these WSGI +applications. Alternatively, those applications will need to be configured +not to output log messages via standard output. + +Note that the restrictions on writing to stdout were removed in mod_wsgi +3.0 because it was found that people couldn't be bothered to fix their +code. Instead they just used the documented workarounds, thereby +propagating their non-portable WSGI application code. As such, since people +just didn't care, the idea of writing portable WSGI applications is no longer +being promoted.
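For applications that log through the standard ``logging`` module, configuring them not to log via standard output amounts to making sure their handlers write to ``sys.stderr``. The sketch below is one way to do that, assuming Python 3; the function name ``configure_logging``, the logger name and the format string are illustrative assumptions, not anything mod_wsgi requires.

```python
import logging
import sys


def configure_logging(name="myapp"):
    """Return a logger that writes to standard error, never standard output."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # Avoid stacking a second handler if this runs again in the same process.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    # Do not also pass records to root logger handlers that might use stdout.
    logger.propagate = False
    return logger
```

Calling ``configure_logging().info("started")`` from a WSGI script file would then send the message through standard error and hence to the Apache error log, regardless of whether the stdout restriction is in force.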
+ +Reading From Standard Input +--------------------------- + +No general-purpose WSGI application component which claims to be portable +should read from standard input. That is, an application should not read +from ``sys.stdin``, either directly or indirectly. + +This is necessary as an underlying WSGI adapter hosting the application may +use standard input as the means of receiving a request from a web server. +This technique is used, for example, when WSGI is hosted within a CGI script. + +Ideally any WSGI adapter which uses ``sys.stdin`` in this way should +cache a reference to ``sys.stdin`` for its own use and then replace it +with an instance of ``StringIO.StringIO`` wrapped around an empty string +such that reading from standard input would always give the impression that +there is no input data available. There is, however, nothing in the WSGI +specification that requires this or recommends it, so one can't +rely on it being done. + +In order to highlight non-portable WSGI application components which try +to read from or otherwise use standard input, mod_wsgi prior to version +3.0 replaced ``sys.stdin`` with an object which would raise an exception +when any attempt was made to read from standard input or otherwise +manipulate or reference the object:: + + IOError: sys.stdin access restricted by mod_wsgi + +This restriction on standard input will, however, prevent the use of +interactive debuggers for Python such as ``pdb``. It can also interfere +with Python modules which use the ``isatty()`` method of ``sys.stdin`` +to determine whether an application is being run within an interactive +session.
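The adapter behaviour recommended earlier in this section, for both standard input and standard output, can be sketched as a context manager that a CGI-style WSGI adapter might wrap around each call into the application. It keeps the real streams for the adapter's own use, gives the application an empty ``io.StringIO`` (the Python 3 counterpart of ``StringIO.StringIO``) as standard input, and points standard output at standard error. The helper name ``protected_std_streams`` is hypothetical; it is not part of mod_wsgi or the WSGI specification.

```python
import io
import sys
from contextlib import contextmanager


@contextmanager
def protected_std_streams():
    """Hypothetical adapter helper hiding the real stdin/stdout from an app."""
    real_stdin, real_stdout = sys.stdin, sys.stdout
    # The application sees no input data at all on standard input...
    sys.stdin = io.StringIO("")
    # ...and anything it prints goes to standard error instead of
    # corrupting the response the adapter writes to standard output.
    sys.stdout = sys.stderr
    try:
        # The adapter itself keeps using the real streams.
        yield real_stdin, real_stdout
    finally:
        sys.stdin, sys.stdout = real_stdin, real_stdout
```

Inside the ``with`` block an adapter would read the request body from the yielded real standard input (passing it to the application as ``wsgi.input``) and write the response to the yielded real standard output.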
+ +If you need to be able to run such debuggers or other code which +requires interactive input, the restriction on using standard input can be +removed using a configuration directive:: + + WSGIRestrictStdin Off + +This configuration directive must appear at global scope within the Apache +configuration file outside of any VirtualHost container directives. It +will remove the restriction on using standard input from all Python sub +interpreters that mod_wsgi creates. There is no way using the configuration +directive to remove the restriction from only one Python sub interpreter. + +Note, however, that removing the restriction serves no purpose unless you +also run the Apache web server in single-process debug mode. This is +because Apache normally makes use of multiple processes and would close +standard input to prevent any process trying to read from standard input. + +To run Apache in single-process debug mode and thus allow an interactive +Python debugger such as ``pdb`` to be used, your Apache instance should +be shut down and then the ``httpd`` program run explicitly:: + + httpd -X + +For more details on using interactive debuggers in the context of mod_wsgi, +see the documentation on :doc:`../user-guides/debugging-techniques`. + +Note that the restrictions on reading from stdin were removed in mod_wsgi +3.0 because it was found that people couldn't be bothered to fix their +code. Instead they just used the documented workarounds, thereby +propagating their non-portable WSGI application code. As such, since people +just didn't care, the idea of writing portable WSGI applications is no longer +being promoted. + +Registration Of Signal Handlers +------------------------------- + +Web servers upon which WSGI applications are hosted more often than not use +signals to control their operation. The Apache web server in particular +uses various signals to control its operation, including the signals +``SIGTERM``, ``SIGINT``, ``SIGHUP``, ``SIGWINCH`` and ``SIGUSR1``.
+ +If a WSGI application were to register its own signal handlers, it is +quite possible that they would interfere with the operation of the +underlying web server, preventing it from being shut down or restarted +properly. As a general rule, therefore, no WSGI application component should +attempt to register its own signal handlers. + +In order to actually enforce this, mod_wsgi will intercept all attempts +to register signal handlers and cause the registration to be ignored. +As a warning that this is being done, a message will be logged to the Apache +error log file of the form:: + + mod_wsgi (pid=123): Callback registration for signal 1 ignored. + +If there is some very good reason that this feature should be disabled and +signal handler registrations honoured, then the behaviour can be reversed +using a configuration directive:: + + WSGIRestrictSignal Off + +This configuration directive must appear at global scope within the Apache +configuration file outside of any VirtualHost container directives. It +will remove the restriction on signal handlers from all Python sub +interpreters that mod_wsgi creates. There is no way using the configuration +directive to remove the restriction from only one Python sub interpreter. + +WSGI applications which are known to register conflicting signal handlers +are CherryPy and TurboGears. If the ability to use signal handlers is +re-enabled when using these packages, it prevents the shutdown and restart +sequence of Apache from working properly and the main Apache process is +forced to explicitly terminate the Apache child processes rather than +waiting for them to perform an orderly shutdown. Similar issues will occur +when using features of mod_wsgi daemon mode to recycle processes when a set +number of requests has been reached or an inactivity timer has expired.
+ +Pickling of Python Objects +-------------------------- + +The script files that mod_wsgi uses as the entry point for a WSGI +application, although containing Python code, are not treated exactly the +same as a Python code module. This has implications when it comes to using +the 'pickle' module in conjunction with objects contained within the WSGI +application script file. + +In practice what this means is that neither function objects, class objects +nor instances of classes which are defined in a WSGI application script file +should be stored using the "pickle" module. + +In order to ensure that no strange problems at all are likely to occur, it +is suggested that only basic built-in Python types, i.e., scalars, tuples, +lists and dictionaries, be stored using the "pickle" module from a WSGI +application script file. That is, avoid any type of object which has user-defined +code associated with it. + +The technical reasons for the limitations in the use of the "pickle" module +in conjunction with WSGI application script files are further discussed in +the document :doc:`../user-guides/issues-with-pickle-module`. Note +that the limitations do not apply to standard Python modules and packages +imported from within a WSGI application script file from directories on the +standard Python module search path. + +Expat Shared Library Conflicts +------------------------------ + +One of the Python modules which come standard with Python is the 'pyexpat' +module. This contains a Python wrapper for the popular 'expat' library. So +as to avoid dependencies on third-party packages, the Python package actually +contains a copy of the 'expat' library source code and embeds it within the +'pyexpat' module. + +Prior to Python 2.5, however, there was no attempt to properly namespace the +public functions within the 'expat' library source code. The problem this +causes with mod_wsgi is that Apache itself also provides, and makes use of, +its own copy of the 'expat' library.
Because the Apache version of the +'expat' library is loaded first, it will always be used in preference to +the version contained within the Python 'pyexpat' module. + +As a result, if the 'pyexpat' module is loaded into a WSGI application and +the version of the 'expat' library included with Python is markedly +different in some way from the Apache version, it can cause Apache to crash +with a segmentation fault. It is thus important to ensure that Apache and +Python use a compatible version of the 'expat' library to avoid this +problem. + +For further technical discussion of this issue and how to determine which +version of the 'expat' library both Apache and Python use, see the document +:doc:`../user-guides/issues-with-expat-library`. + +MySQL Shared Library Conflicts +------------------------------ + +Shared library version conflicts can also occur with the MySQL client +libraries. In this case the conflict is usually between an Apache module +that uses MySQL directly, such as mod_auth_mysql or mod_dbd_mysql, or an +Apache module that indirectly uses MySQL, such as PHP, and the Python +'MySQLdb' module. The result of conflicting library versions can be Apache +crashing, or incorrect results being returned from the MySQL client +library for certain types of operations. + +To ascertain whether there is a conflict, you need to determine which versions +of the shared library each package is attempting to use. On Linux, this can be +done by running the 'ldd' command to list the library dependencies.
+This should be done on any Apache modules that are being loaded, any PHP +modules and the Python ``_mysql`` C extension module:: + + $ ldd /usr/lib/python2.3/site-packages/_mysql.so | grep mysql + libmysqlclient_r.so.15 => /usr/lib/libmysqlclient_r.so.15 (0xb7d52000) + + $ ldd /usr/lib/httpd/modules/mod_*.so | grep mysql + libmysqlclient.so.12 => /usr/lib/libmysqlclient.so.12 (0xb7f00000) + + $ ldd /usr/lib/php4/*.so | grep mysql + /usr/lib/php4/mysql.so: + libmysqlclient.so.10 => /usr/lib/mysql/libmysqlclient.so.10 (0xb7f6e000) + +If there is a difference in the version of the MySQL client library, or +one version is reentrant and the other isn't, you will need to recompile +one or both of the packages such that they use the same library. + +SSL Shared Library Conflicts +---------------------------- + +When Apache is built, if it cannot find an existing SSL library that it can +use or isn't told where one is that it should use, it will use an SSL +library which comes with the Apache source code. When this SSL code is +compiled, it will be statically linked into the actual Apache executable. To +determine whether the SSL code is static rather than dynamically loaded from a +shared library, the 'ldd' command can be used on Linux to list the library +dependencies. If an SSL library is listed, then the SSL code has not been statically +compiled into Apache:: + + $ ldd /usr/local/apache/bin/httpd | grep ssl + libssl.so.0.9.8 => /usr/lib/i686/cmov/libssl.so.0.9.8 (0xb79ab000) + +Where a Python module uses an SSL library, such as a database client +library with SSL support, it would typically obtain its SSL code from +a shared library. When, however, the SSL library functions have also been +compiled statically into Apache, they can conflict and interfere with those +from the SSL shared library being used by the Python module. Such conflicts +can cause core dumps, or simply make it appear that SSL support in either +Apache or the Python module is not working.
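The ``ldd`` comparisons above can be scripted. The sketch below runs ``ldd`` on a set of files and collects the resolved libraries whose lines contain a given substring (such as ``mysql`` or ``ssl``), so that differing library names, for example ``libmysqlclient.so.10`` versus ``libmysqlclient_r.so.15``, stand out. The function names are hypothetical; it assumes Python 3.7 or later and a Linux system where ``ldd`` prints lines of the form ``name => path (address)``.

```python
import subprocess


def parse_ldd(output, needle):
    """Return {library_name: resolved_path} for ldd lines containing needle."""
    found = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip lines without a resolution arrow (e.g. linux-vdso) or not matching.
        if "=>" not in line or needle not in line:
            continue
        name, _, rest = line.partition("=>")
        # rest looks like " /usr/lib/libfoo.so.1 (0xb7d52000)" or " not found"
        found[name.strip()] = rest.strip().split(" (")[0]
    return found


def libraries_for(paths, needle):
    """Run ldd on each path and collect the matching libraries per file."""
    results = {}
    for path in paths:
        proc = subprocess.run(["ldd", path], capture_output=True, text=True)
        results[path] = parse_ldd(proc.stdout, needle)
    return results


def distinct_names(results):
    """All distinct library names seen; more than one suggests a conflict."""
    names = set()
    for libs in results.values():
        names.update(libs)
    return sorted(names)
```

For example, ``distinct_names(libraries_for(paths, "mysql"))`` over the Apache modules, the PHP modules and ``_mysql.so`` returning more than one name would indicate the kind of mismatch shown in the ``ldd`` output above.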
+ +Python modules where this is known to cause a problem are any database +client modules which include support for connecting to the database using +an SSL connection, and the Python 'hashlib' module introduced in Python +2.5. + +In the case of the 'hashlib' module, it will fail to load the internal C +extension module called ``_hashlib`` because of the conflict. The failure to load +the ``_hashlib`` module is, however, not raised as an +exception; instead the code will fall back to attempting to load the +older ``_md5`` module. In Python 2.5, however, this older ``_md5`` +module is not generally compiled and so the following error will occur:: + + ImportError: No module named _md5 + +To resolve this problem, it would be necessary to rebuild Apache and use the +'--with-ssl' option to 'configure' to specify the location of the distinct +SSL library that is being used by the Python modules. + +Note that it has also been suggested that the ImportError above can also +be caused by the 'python-hashlib' package not being installed. This +might be the case on Linux systems where this module was separated from the +main Python package. + +Python MD5 Hash Module Conflict +------------------------------- + +Python provides, in the form of the 'md5' module, routines for calculating +MD5 message-digest fingerprint (checksum) values for arbitrary data. This +module is often used in Python web frameworks for generating cookie values +to be associated with client session information. + +If a WSGI application uses this module, it is, however, possible that a +conflict can arise if PHP is also being loaded into Apache. The end result +of the conflict will be that the 'md5' module in Python can give incorrect +results for hash values. For example, the same value may be returned no +matter what the input data is, or an incorrect or random value can be returned +even for the same data. In the worst-case scenario the process may crash.
+ +As might be expected, this can cause session-based login schemes, as +commonly employed by Python web frameworks such as Django, TurboGears or +Trac, to fail in strange ways. + +The underlying trigger for all these problems appears to be a clash between +the Python 'md5' module and the 'libmhash2' library used by the PHP 'mhash' +module, or possibly also other PHP modules relying on md5 routines for +cryptography, such as the LDAP module for PHP. + +This clash has come about because the md5 source code in Python was +replaced with an alternate version when it was packaged for Debian. This +version did not include, in the "md5.h" header file, the preprocessor +defines which rename the md5 functions with a namespace prefix specific to +Python:: + + #define MD5Init _PyDFSG_MD5Init + #define MD5Update _PyDFSG_MD5Update + #define MD5Final _PyDFSG_MD5Final + #define MD5Transform _PyDFSG_MD5Transform + + void MD5Init(struct MD5Context *context); + void MD5Update(struct MD5Context *context, md5byte const *buf, unsigned len); + void MD5Final(unsigned char digest[16], struct MD5Context *context); + +As a result, the symbols in the md5 module ended up being:: + + $ nm -D /usr/lib/python2.4/lib-dynload/md5.so | grep MD5 + 0000000000001b30 T MD5Final + 0000000000001380 T MD5Init + 00000000000013b0 T MD5Transform + 0000000000001c10 T MD5Update + +The symbols then clashed directly with the non-namespaced symbols present +in the 'libmhash2' library:: + + $ nm -D /usr/lib/libmhash.so.2 | grep MD5 + 00000000000069b0 T MD5Final + 0000000000006200 T MD5Init + 0000000000006230 T MD5Transform + 0000000000006a80 T MD5Update + +In Python 2.5, the md5 module is implemented in a different way and thus +this problem should only occur with older versions of Python. For those +older versions of Python, the only workaround for this problem at the +present time is to disable the loading of the 'mhash' module or other PHP +modules which use the 'libmhash2' library.
This will avoid the problem +with the Python 'md5' module; obviously, however, not loading these modules +into PHP may cause some PHP programs which rely on them to fail. + +Now that the actual cause of this problem has been identified, a patch has been +produced and is recorded in the Debian ticket: + + * http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=440272 + +It isn't known when an updated Debian package for Python may be produced. + +Python 'pysqlite' Symbol Conflict +--------------------------------- + +Certain versions of the 'pysqlite' module defined a global symbol 'cache_init'. +This symbol clashes with a similarly named symbol present in the Apache +mod_cache module. As a result of the clash, the two modules being loaded at +the same time can cause the Apache process to crash or the following Python +exception to be raised:: + + SystemError: NULL result without error in PyObject_Call + +This problem is mentioned in pysqlite ticket: + + * http://www.initd.org/tracker/pysqlite/ticket/174 + +and the release notes for version 2.3.3 of pysqlite: + + * http://www.initd.org/tracker/pysqlite/wiki/2.3.3_Changelog + +To avoid the problem, upgrade to pysqlite 2.3.3 or later. + +Python Simplified GIL State API +------------------------------- + +In an attempt to simplify management of thread state objects when coding C +extension modules for Python, Python 2.3 introduced the simplified API for +GIL state management. Unfortunately, this API will only work if the code is +running against the very first Python sub interpreter created when Python +is initialised. + +Because mod_wsgi by default assigns a Python sub interpreter to each WSGI +application based on the virtual host and application mount point, code +would normally never be executed within the context of the first Python sub +interpreter created; instead, a distinct Python sub interpreter would be +used.
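An application that depends on such an extension module can at least detect, and refuse to run, when it is not in the first interpreter. The sketch below assumes that mod_wsgi records the application group in the WSGI environment under the ``mod_wsgi.application_group`` key, using an empty string for the very first interpreter created when Python is initialised; verify this against the mod_wsgi version in use. The wrapper name ``require_main_interpreter`` is hypothetical.

```python
def require_main_interpreter(application):
    """Wrap a WSGI application so it refuses to run outside the first interpreter.

    Assumes mod_wsgi sets environ['mod_wsgi.application_group'], with an
    empty string meaning the first interpreter created by Python.
    """
    def wrapper(environ, start_response):
        group = environ.get("mod_wsgi.application_group")
        # Key absent: not running under mod_wsgi, so pass straight through.
        if group is not None and group != "":
            body = ("application group %r is not the main interpreter\n"
                    % group).encode("utf-8")
            start_response("500 Internal Server Error", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return application(environ, start_response)
    return wrapper
```

Failing loudly with a 500 response is arguably better than the alternative described above, where the extension module may deadlock or crash the whole process.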
+ +The consequence of attempting to use a C extension module for Python which +is implemented against the simplified API for GIL state management in +any sub interpreter besides the first is that the code is likely to +deadlock or crash the process. The only way around this issue is to ensure +that any WSGI application which makes use of C extension modules which use +this API only runs in the very first Python sub interpreter created when +Python is initialised. + +To force a specific WSGI application to be run within the very first Python +sub interpreter created when Python is initialised, the WSGIApplicationGroup +directive should be used and the group set to '%{GLOBAL}':: + + WSGIApplicationGroup %{GLOBAL} + +Extension modules for which this is known to be necessary are any which +have been developed using SWIG and for which the '-threads' option was +supplied to 'swig' when the bindings were generated. One example of this is +the 'dbxml' module, a Python wrapper for the Berkeley Database, previously +developed by Sleepycat Software, but now managed by Oracle. Another package +believed to have this problem in certain use cases is Xapian. + +There is also a bit of a question mark over the Python Subversion bindings. +This package also uses SWIG; however, only some versions appear +to require that the very first sub interpreter created when Python is +initialised be used. It is currently believed that this may have more to do +with coding problems than with the '-threads' option being passed to the +'swig' command when the bindings were generated. + +For all the affected packages described above, it is believed +that they will work when the application group is set to force the application +to run in the first interpreter created by Python. + +Another option for packages which use SWIG-generated bindings is not to use +the '-threads' option when 'swig' is used to generate the bindings.
This +will avoid any problems and allow the package to be used in any sub +interpreter. Do be aware, though, that by disabling thread support in SWIG +bindings, the GIL isn't released when C code is entered. The +consequence of this is that if the C code blocks, the whole Python +interpreter environment running in that process will be blocked, including +requests being handled within other threads in different sub interpreters. + +Reloading Python Interpreters +----------------------------- + +*Note: The "Interpreter" reload mechanism has been removed in mod_wsgi +version 2.0. This is because the problems with third-party modules didn't +make it a viable option. Its continued presence was simply complicating the +addition of new features. As an alternative, daemon mode of mod_wsgi should +be used and the "Process" reload mechanism added with mod_wsgi 2.0.* + +To make it possible to modify a WSGI application and have the whole +application reloaded without restarting the Apache web server, mod_wsgi +provides an interpreter reloading feature. This specific feature is enabled +using the WSGIReloadMechanism directive, setting it to the value +'Interpreter' instead of its default value of 'Module':: + + WSGIReloadMechanism Interpreter + +When this option is selected and script reloading is also enabled, then when the +WSGI application script file is modified, the next request which arrives +will result in the Python sub interpreter which is hosting that WSGI +application being destroyed. A new Python sub interpreter will then be +created and the WSGI application reloaded, including any changes made to +normal Python modules. + +For many WSGI applications this mechanism will generally work fine; however, +there are a few limitations on what is reloaded, and some Python C extension +modules can be incompatible with this feature.
+ +The first issue is that although Python code modules will be destroyed and +reloaded, because a C extension module is only loaded once and used across +all Python sub interpreters for the life of the process, any changes to a C +extension module will not be picked up. + +The second issue is that some C extension modules may cache references to +the Python interpreter object itself. Because there is no notification +mechanism for letting a C extension module know when a sub interpreter is +destroyed, it is possible that later on the C extension module may attempt +to access the now-destroyed Python interpreter. By this time the pointer +reference is likely a dangling reference to unused memory or some +completely different data, and attempting to access or use it will likely +cause the process to crash at some point. + +A third issue is that the C extension module may cache references to Python +objects in static variables but not actually increment the reference count +on the objects in respect of its own reference to the objects. When the +last Python sub interpreter to hold a reference to that Python object is +destroyed, the object itself would be destroyed but the static variable would be left +with a dangling pointer. If a new Python sub interpreter is then created +and the C extension module attempts to use that cached Python object, +accessing it or using it will likely cause the process to crash at some +point. + +A few examples of Python modules which exhibit one or more of these problems +are psycopg2, PyProtocols and lxml. In the case of PyProtocols, because this +module is used by TurboGears and sometimes used indirectly by Pylons +applications, the interpreter reloading mechanism cannot be +used with either of these packages. The problems with +PyProtocols appear to stem from its use of Pyrex-generated code. The lxml +package similarly uses Pyrex and is thus afflicted in the same way.
+
+In general it is probably inadvisable to use the interpreter reload
+mechanism with any WSGI application which uses large or complicated C
+extension modules. It would be recommended, for example, that the interpreter
+reload mechanism not be used with Trac because of its use of the Python
+Subversion bindings. One would also need to be cautious if using any Python
+database client, although some success has been seen when using simple
+database adapters such as pysqlite.
+
+Multiple Python Sub Interpreters
+--------------------------------
+
+In addition to the requirements imposed by the Python GIL, other issues can
+also arise with C extension modules when multiple Python sub interpreters
+are being used. Typically these problems arise where an extension module
+caches a Python object from the sub interpreter which is initially used to
+load the module and then passes that object to code executing within
+secondary sub interpreters.
+
+The prime example of where this would be a problem is where the code within
+the second sub interpreter attempts to execute a method of the Python
+object. When this occurs the result will be an attempt to execute Python
+code which doesn't belong to the current sub interpreter.
+
+One result of this will be that if the code being executed attempts to
+import additional modules, it will obtain those modules from the current sub
+interpreter rather than the interpreter the code belonged to. The result of
+this can be an unholy mixing of code and data owned by multiple sub
+interpreters, leading to potential chaos at some point.
+
+A more concrete outcome of such a mixing of code and data from multiple
+sub interpreters is where a file object from one sub interpreter is used
+within a different sub interpreter. In this sort of situation a Python
+exception will occur, as Python will detect in certain cases that the object
+didn't belong to that interpreter::
+
+    exceptions.IOError: file() constructor not accessible in restricted mode
+
+Problems with code being executed in restricted mode can also occur when
+the Python code and data marshalling features are used::
+
+    exceptions.RuntimeError: cannot unmarshal code objects in restricted execution mode
+
+A further case is where the cached object is a class object and that object
+is used to create instances of that type of object for different sub
+interpreters. As above, this can result in an unholy mixing of code and data
+from multiple sub interpreters, but at a more mundane level it may become
+evident through the 'isinstance()' function failing when used to check the
+object instances against the local type object for that sub interpreter.
+
+An example of a Python module which fails in this way is psycopg2, which
+caches a reference to the 'decimal.Decimal' type and uses it to create
+object instances for all sub interpreters. This particular problem in
+psycopg2 has been reported in psycopg2 ticket:
+
+    * http://www.initd.org/tracker/psycopg/ticket/192
+
+and has been fixed in the psycopg2 source code. It isn't known, however, which
+version of psycopg2 this fix may have been released with. Another package
+believed to have this problem in certain use cases is lxml.
+
+Because of the possibility that extension module writers have not written
+their code to take into consideration it being used from multiple sub
+interpreters, the safest approach is to force all WSGI applications to run
+within the same application group, with that preferably being the
+first interpreter instance created by Python.
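The 'isinstance()' failure described above can be imitated within a single interpreter, as a rough analogy only, by executing the same class definition twice so that two distinct type objects with the same name exist. This is not how sub interpreters are implemented. The names 'SOURCE', 'make_namespace' and 'Money' are purely illustrative.

```python
# Hypothetical analogy: each "interpreter" gets its own copy of a class
# created from the same source, much as each sub interpreter has its own
# copy of pure Python modules such as 'decimal'.
SOURCE = """
class Money:
    def __init__(self, amount):
        self.amount = amount
"""


def make_namespace():
    """Execute the class definition in a fresh, separate namespace."""
    namespace = {}
    exec(SOURCE, namespace)
    return namespace


first = make_namespace()   # stands in for the interpreter that loaded the extension
second = make_namespace()  # stands in for a later sub interpreter

# An object created from the type cached from the first namespace...
cached_instance = first["Money"](10)

# ...is not an instance of the type that code in the second namespace sees,
# even though both classes have the same name and the same source.
print(isinstance(cached_instance, first["Money"]))   # True
print(isinstance(cached_instance, second["Money"]))  # False
```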
+
+To force a specific WSGI application to be run within the very first Python
+sub interpreter created when Python is initialised, the WSGIApplicationGroup
+directive should be used and the group set to '%{GLOBAL}'::
+
+    WSGIApplicationGroup %{GLOBAL}
+
+If it is not feasible to force all WSGI applications to run in the same
+interpreter, then daemon mode of mod_wsgi should be used to assign
+different WSGI applications to their own daemon processes. Each would
+then be made to run in the first Python sub interpreter instance within
+its respective process.
+
+Memory Constrained VPS Systems
+------------------------------
+
+Virtual Private Server (VPS) systems typically have constraints
+imposed on them in regard to the amount of memory or other resources they
+are able to use. Various limits and related counts are described below:
+
+*Memory Limit*
+    Maximum virtual memory size a VPS/context can allocate.
+
+*Used Memory*
+    Virtual memory size used by a running VPS/context.
+
+*Max Total Memory*
+    Maximum virtual memory usage by VPS/context.
+
+*Context RSS Limit*
+    Maximum resident memory size a VPS/context can allocate. If the limit is exceeded, the VPS starts to use the host's swap.
+
+*Context RSS*
+    Resident memory size used by a running VPS/context.
+
+*Max RSS Memory*
+    Maximum resident memory usage by VPS/context.
+
+*Disk Limit*
+    Maximum disk space that can be used by VPS (calculated for the entire VPS file tree).
+
+*Used Disk Memory*
+    Disk space used by a VPS file tree.
+
+*Files Limit*
+    Maximum number of files that can be switched to a VPS/context.
+
+*Used Files*
+    Number of files used in a VPS/context.
+
+*TCP Sockets Limit*
+    Limit on the number of established connections in a virtual server.
+
+*Established Sockets*
+    Number of established connections in a virtual server.
+
+In respect of the limits, when the total virtual memory size used by the
+VPS exceeds the Memory Limit, processes can't allocate the required memory
+and will fail in unexpected ways. The general recommendation is that the
+Context RSS Limit be set to one third of the Memory Limit.
+
+Some VPS providers, however, appear to ignore such guidance, perhaps not
+understanding how virtual memory systems work, and set too restrictive a
+value on the Memory Limit of the VPS, to the extent that virtual memory use
+will exceed the Memory Limit even before actual memory use reaches Max RSS
+Memory, or perhaps even before it reaches the Context RSS Limit.
+
+This is especially a problem where the hosted operating system is Linux, as
+Linux uses a default per thread stack size which is excessive. When using
+the Apache worker MPM with multiple threads, or mod_wsgi daemon mode with
+multiple worker threads, the amount of virtual memory quickly adds up,
+causing the artificial Memory Limit to be exceeded.
+
+Under Linux the default process stack size is 8MB. Whereas other UNIX
+systems typically use a much smaller per thread stack size, of the order of
+512KB, Linux inherits the process stack size and also uses it as the per
+thread stack size.
+
+If you are running a VPS system and are having problems with the Memory
+Limit being exceeded by the amount of virtual memory set aside by all
+applications running in the VPS, it will be necessary to override the
+default per thread stack size as used by Linux.
+
+If you are using the Apache worker MPM, you will need to upgrade to Apache
+2.2 if you are not already running it. Having done that, you should then use
+the Apache directive ThreadStackSize to lower the per thread stack size
+for threads created by Apache for the Apache child processes::
+
+    ThreadStackSize 524288
+
+This should reduce the amount of virtual memory set aside by Apache for
+its child processes, and thus for any WSGI application running in embedded
+mode.
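To put rough numbers on this, the following sketch computes how much virtual memory is set aside just for thread stacks. The thread count of 25 is a hypothetical figure chosen only for illustration. The two stack sizes are the 8MB Linux default and the 512KB value discussed here. On UNIX systems the standard 'resource' module can also report the current process stack limit. This is the value 'ulimit -s' shows, although 'resource' reports it in bytes rather than KB.

```python
import resource

KB = 1024
MB = 1024 * KB


def reserved_stack_bytes(threads, stack_size):
    """Virtual memory set aside for the stacks of 'threads' threads."""
    return threads * stack_size


# Hypothetical process running 25 threads.
threads = 25
print(reserved_stack_bytes(threads, 8 * MB) // MB)   # 200 (MB) with an 8MB stack
print(reserved_stack_bytes(threads, 512 * KB) / MB)  # 12.5 (MB) with a 512KB stack

# Soft and hard limits for the process stack size, in bytes
# (resource.RLIM_INFINITY is returned where no limit is set).
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print(soft, hard)
```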
+
+If a WSGI application creates its own threads for performing background
+activities, it is also preferable that it override the stack size
+set aside for those threads. For that you will need to be using at least
+Python 2.5. The WSGI application should be amended to execute::
+
+    import threading
+    threading.stack_size(524288)
+
+If using mod_wsgi daemon mode, you will need to use mod_wsgi 2.0 and
+override the per thread stack size using the 'stack-size' option to the
+WSGIDaemonProcess directive::
+
+    WSGIDaemonProcess example stack-size=524288
+
+If you are unable to upgrade to Apache 2.2 and/or mod_wsgi 2.0, the only
+other option you have for affecting the amount of virtual memory set aside
+for the stack of each thread is to override the process stack size. If you are
+using a standard Apache distribution, this can be done by adding the
+following to the 'envvars' file for the Apache installation::
+
+    ulimit -s 512
+
+If using a customised Apache installation, such as on RedHat, the 'envvars'
+file may not exist. In this case you would need to add this into the actual
+startup script for Apache. For RedHat this is '/etc/sysconfig/httpd'.
+
+Note that although 512KB is given here as an example, you may in practice
+need to adjust this higher if you are using third party C extension modules
+for Python which allocate significant amounts of memory on the stack.
+
+OpenBSD And Thread Stack Size
+-----------------------------
+
+When using Linux, the excessive amount of virtual memory set aside for the
+stack of each thread can cause problems in memory constrained VPS systems.
+Under OpenBSD the opposite problem can occur, in that the default per thread
+stack size can be too small. In this situation the same mechanisms as used
+above for adjusting the amount of virtual memory set aside can be used, but
+in this case to increase the amount of memory to be greater than the
+default value.
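The following is a minimal sketch of applying a smaller stack to a single background thread with the standard library's 'threading.stack_size()'. The helper name 'start_background_thread' is hypothetical. The setting applies to every thread the 'threading' module creates after the call, not only to this one. For that reason the sketch restores the previous value once the thread has been started.

```python
import threading

STACK_SIZE = 524288  # 512KB, the example value used in this section


def start_background_thread(target, stack_size=STACK_SIZE):
    """Start 'target' in a daemon thread created with a smaller stack.

    threading.stack_size() sets the size used for threads created after
    the call and returns the previous setting, which is restored once the
    thread has been started.
    """
    previous = threading.stack_size(stack_size)
    try:
        thread = threading.Thread(target=target)
        thread.daemon = True
        thread.start()
    finally:
        threading.stack_size(previous)
    return thread


results = []
worker = start_background_thread(lambda: results.append("done"))
worker.join()
print(results)  # ['done']
```

Restoring the previous value avoids shrinking the stacks of threads created later by unrelated code.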
+
+Although it has been reported that the default per thread stack size on
+OpenBSD can be a problem, it isn't known what it defaults to, and thus
+whether it is too low, or whether it was just the user's specific
+application which was attempting to allocate too much memory from the
+stack.
+
+Python Oracle Wrappers
+----------------------
+
+When using the WSGIDaemonProcess directive, it is possible to use the
+'display-name' option to set the name of the process that will be
+displayed in output from BSD derived 'ps' programs and some other monitoring
+programs. This allows one to distinguish the WSGI daemon processes in a
+process group from the normal Apache 'httpd' processes.
+
+The mod_wsgi package accepts the magic string '%{GROUP}' as the value of the
+'display-name' option to the WSGIDaemonProcess directive to indicate that
+mod_wsgi should construct the name of the processes based on the name of
+the process group. Specifically, if you have::
+
+    WSGIDaemonProcess mygroup display-name=%{GROUP}
+
+then the name of the processes in that process group would be set to the
+value::
+
+    (wsgi:mygroup)
+
+This generally works fine, but causes a problem when the WSGI
+application makes use of the 'cx_Oracle' module for wrapping Oracle client
+libraries in Python. Specifically, Oracle client libraries can produce the
+error::
+
+    ORA-06413: Connection not open.
+
+This appears to be caused by the use of brackets, i.e., '()', in the name of
+the process. It is therefore recommended that you explicitly provide the
+name to use for the process and avoid these characters, and potentially any
+non-alphanumeric characters to be extra safe.
+
+This issue is briefly mentioned in:
+
+    * http://www.dba-oracle.com/t_ora_06413_connection_not_open.htm
+
+Non Blocking Module Imports
+---------------------------
+
+In Python 2.6 non blocking module imports were added as part of the Python
+C API in the form of the function PyImport_ImportModuleNoBlock(). This
+function was introduced to prevent deadlocks with module imports in certain
+circumstances. Unfortunately, for valid reasons or not, use of this
+function has been sprinkled through Python standard library modules as well
+as third party modules.
+
+Although the function may have been created to fix some underlying issue,
+its usage has caused a new set of problems for multithreaded programs which
+defer module importing until after threads have been created. With mod_wsgi
+this is actually the norm, as the default mode of operation is that code is
+lazily loaded only when the first request which requires it arrives.
+
+A classic example of the sort of problem that use of this function causes
+is the error::
+
+    ImportError: Failed to import _strptime because the import lock is held by another thread.
+
+This particular error occurs when 'time.strptime()' is called for the first
+time and it so happens that another thread is in the process of doing a
+module import and holds the global module import lock.
+
+It is believed that the fact this can happen indicates that Python is
+flawed in using PyImport_ImportModuleNoBlock(). Unfortunately, when
+this issue has been highlighted in the past, people seemed to think it was
+acceptable and that the only solution, rather than fixing the Python
+standard library, was to ensure that all module imports are done before any
+threads are created.
+
+This response is frankly crazy, and you can expect all manner of random
+problems related to this to crop up as more and more people start using the
+PyImport_ImportModuleNoBlock() function without realising that it is a
+really bad idea in the context of a multithreaded system.
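One way to avoid the '_strptime' error is to import that module explicitly while the WSGI script file is being loaded, before any request threads call 'time.strptime()'. This is a minimal sketch of that idea. The thread list merely stands in for concurrent requests and is not how mod_wsgi actually dispatches them. 'handle_request' is a hypothetical name.

```python
# Import eagerly at load time so that time.strptime() never has to import
# _strptime lazily while another thread holds the import lock.
import _strptime  # noqa: F401  (imported only for its side effect)
import threading
import time

results = []
lock = threading.Lock()


def handle_request():
    # Stand-in for application code run by a request handler thread.
    parsed = time.strptime("2010-03-08", "%Y-%m-%d")
    with lock:
        results.append(parsed.tm_year)


threads = [threading.Thread(target=handle_request) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.count(2010))  # 10
```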
+
+Although no hope is held out for the issue being fixed in Python, a problem
+report has still been lodged and can be found at:
+
+    * http://bugs.python.org/issue8098
+
+The only workaround for the problem is to ensure that all imports of
+modules in which the PyImport_ImportModuleNoBlock() function is
+used are done, explicitly or indirectly, when the WSGI script file is loaded.
+Thus, to get around the specific case above, add the following into the
+WSGI script file::
+
+    import _strptime
+
+There is nothing that can be done in mod_wsgi to fix this properly, as the
+set of modules that might have to be forcibly imported is unknown. Having
+a hack to import them just to avoid the problem would also result in
+unnecessary memory usage if the application didn't actually need them.
diff --git a/docs/user-guides/assorted-tips-and-tricks.rst b/docs/user-guides/assorted-tips-and-tricks.rst
new file mode 100644
index 0000000..6fb54b5
--- /dev/null
+++ b/docs/user-guides/assorted-tips-and-tricks.rst
@@ -0,0 +1,123 @@
+========================
+Assorted Tips And Tricks
+========================
+
+This document contains various tips and tricks related to using mod_wsgi
+which don't deserve a document of their own or which don't fit within other
+documentation.
+
+Determining If Running Under mod_wsgi
+-------------------------------------
+
+As a WSGI application developer you should always be striving to write
+portable WSGI applications. That is, you should not write your code so as
+to be dependent on the specific features of a specific WSGI hosting
+mechanism.
+
+This unfortunately is not always possible, especially when it comes to
+deployment, due to there being no one blessed way for exposing a WSGI
+application for hooking into WSGI hosting mechanisms. There may also be
+times when you might want to rely on a feature of a specific WSGI hosting
+mechanism which, although not part of the WSGI specification, allows you
+to do something you couldn't otherwise.
+
+That said, there are a few ways in which you can detect that your code is
+running under mod_wsgi. These fall under two categories. The first is
+a general mechanism for detecting if mod_wsgi is being used. The
+second covers additional ways to detect that mod_wsgi is being used when a
+request is being handled.
+
+The simplest way of detecting if mod_wsgi is being used is to import the
+'mod_wsgi' module. This is a special embedded module which is installed
+automatically by the Apache/mod_wsgi module into the set of imported modules,
+i.e., sys.modules. You can thus do::
+
+    try:
+        import mod_wsgi
+        # Put code here which should only run when mod_wsgi is being used.
+    except ImportError:
+        pass
+
+Do note, however, that although this is an embedded module added automatically,
+the way mod_wsgi has been implemented allows in the future for there to be
+a separate Python package/module distinct from the mod_wsgi.so file called
+'mod_wsgi' which might contain additional Python code to support use of
+mod_wsgi.
+
+If such a separate Python package/module were available, it would be
+automatically imported, and additional information set up by
+the Apache/mod_wsgi module would then be inserted into the global namespace
+of that Python package/module.
+
+The potential existence of this distinct Python package/module means that
+importing 'mod_wsgi' could one day actually succeed outside of code being
+run under the Apache/mod_wsgi module.
+
+A more correct test therefore is::
+
+    try:
+        from mod_wsgi import version
+        # Put code here which should only run when mod_wsgi is being used.
+    except ImportError:
+        pass
+
+This is different because the 'version' attribute will only be present when
+running under the Apache/mod_wsgi module, as that version relates to the
+version of mod_wsgi.so.
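The import check above can be wrapped in a small helper. This sketch catches only 'ImportError', so that unrelated errors are not silently hidden. The function name 'running_under_mod_wsgi' is hypothetical. Outside Apache/mod_wsgi it simply returns False.

```python
def running_under_mod_wsgi():
    """Return True if running under the Apache/mod_wsgi module.

    Importing 'version' rather than just 'mod_wsgi' guards against a
    separately installed 'mod_wsgi' Python package being importable
    outside of Apache.
    """
    try:
        from mod_wsgi import version  # noqa: F401
    except ImportError:
        return False
    return True


if running_under_mod_wsgi():
    print("Running under Apache/mod_wsgi")
else:
    print("Not running under Apache/mod_wsgi")
```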
+
+The above import check can be used anywhere, be that in the WSGI script file,
+or in your application code at either global scope or within the context of
+a specific function.
+
+In the specific case of the WSGI script file, although the above can be
+used, there is an alternate check that can be made. That is to check the
+value of the '__name__' attribute given to the WSGI script file when the
+code is loaded into the Python interpreter.
+
+The normal situation where one would check the value of '__name__' is when
+wanting to do something different when a Python code file is executed
+directly by the Python interpreter as opposed to being imported. For
+example::
+
+    if __name__ == '__main__':
+        ...
+
+In contrast, where a Python code file is imported, the '__name__' attribute
+would be the dotted path which would be used to import the code file.
+
+In the case of mod_wsgi, although WSGI script files are imported as if they
+are a module, because they could exist anywhere and not in locations on
+the Python module search path, they don't have a conventional dotted path
+name. Instead they have a magic name built from an MD5 hash of the path to the
+WSGI script file.
+
+So as to at least identify this as being related to mod_wsgi, it has the
+prefix '_mod_wsgi_'. This means a WSGI script file could use::
+
+    if __name__.startswith('_mod_wsgi_'):
+        ...
+
+if it needed to execute different code based on whether the WSGI script
+file was actually being loaded by the Apache/mod_wsgi module as opposed to
+being executed directly as a script by the command line Python interpreter.
+
+This latter technique obviously only works in the WSGI script file and not
+elsewhere.
+
+A final method that can be used within the context of the WSGI application
+handling the request is to interrogate the WSGI environ dictionary passed
+to the WSGI application. In this case code can look for the presence of
+the 'mod_wsgi.version' key within the WSGI environ dictionary::
+
+    def application(environ, start_response):
+        status = '200 OK'
+        if 'mod_wsgi.version' in environ:
+            output = b'Hello mod_wsgi!'
+        else:
+            output = b'Hello other WSGI hosting mechanism!'
+
+        response_headers = [('Content-type', 'text/plain'),
+                            ('Content-Length', str(len(output)))]
+        start_response(status, response_headers)
+
+        return [output]
diff --git a/docs/user-guides/checking-your-installation.rst b/docs/user-guides/checking-your-installation.rst
new file mode 100644
index 0000000..5b9c8ac
--- /dev/null
+++ b/docs/user-guides/checking-your-installation.rst
@@ -0,0 +1,670 @@
+==========================
+Checking Your Installation
+==========================
+
+When debugging mod_wsgi or a WSGI application, it is important to be able to
+understand how mod_wsgi has been installed, what Apache and/or Python it
+uses and how those systems have been configured, plus under what
+configuration the WSGI application is running.
+
+This document details various such checks that can be made. The primary
+purpose of providing this information is so that when people ask questions
+on the mod_wsgi mailing list, they can be directed here to perform certain
+checks as a way of collecting additional information needed to help debug
+their problem.
+
+If you are reading this document because you have been directed here from
+the mailing list, then ensure that you actually provide the full amount of
+detail obtained from the checks and do not leave out information. When you
+leave out information, guesses have to be made about your
+setup, which makes it harder to debug your problems.
+
+Apache Build Information
+------------------------
+
+Information related to what version of Apache is being used and how it is
+built is obtained in a number of ways. The primary means is from the
+Apache 'httpd' executable itself using command line options. The main such
+option is the '-V' option.
+
+On most systems the standard Apache executable supplied with the operating
+system is located at '/usr/sbin/httpd'. On MacOS X, for the operating system
+supplied Apache, the output from this is::
+
+    $ /usr/sbin/httpd -V
+    Server version: Apache/2.2.14 (Unix)
+    Server built:   Feb 10 2010 22:22:39
+    Server's Module Magic Number: 20051115:23
+    Server loaded:  APR 1.3.8, APR-Util 1.3.9
+    Compiled using: APR 1.3.8, APR-Util 1.3.9
+    Architecture:   64-bit
+    Server MPM:     Prefork
+      threaded:     no
+        forked:     yes (variable process count)
+    Server compiled with....
+     -D APACHE_MPM_DIR="server/mpm/prefork"
+     -D APR_HAS_SENDFILE
+     -D APR_HAS_MMAP
+     -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
+     -D APR_USE_FLOCK_SERIALIZE
+     -D APR_USE_PTHREAD_SERIALIZE
+     -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
+     -D APR_HAS_OTHER_CHILD
+     -D AP_HAVE_RELIABLE_PIPED_LOGS
+     -D DYNAMIC_MODULE_LIMIT=128
+     -D HTTPD_ROOT="/usr"
+     -D SUEXEC_BIN="/usr/bin/suexec"
+     -D DEFAULT_PIDLOG="/private/var/run/httpd.pid"
+     -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
+     -D DEFAULT_LOCKFILE="/private/var/run/accept.lock"
+     -D DEFAULT_ERRORLOG="logs/error_log"
+     -D AP_TYPES_CONFIG_FILE="/private/etc/apache2/mime.types"
+     -D SERVER_CONFIG_FILE="/private/etc/apache2/httpd.conf"
+
+The most important details here are:
+
+    * The version of Apache from the 'Server version' entry.
+    * The MPM which Apache has been compiled to use from the 'Server MPM' entry.
+
+Although this has a section which appears to indicate what preprocessor
+options the server was compiled with, it is a massaged list. What is often
+more useful are the actual arguments which were supplied to the 'configure'
+command when Apache was built.
+
+To determine this information you need to do the following.
+
+    * Work out where 'apxs2' or 'apxs' is installed.
+    * Open this file and find the setting for '$installbuilddir'.
+    * Open the 'config.nice' file in the directory specified as the build directory.
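The steps above for locating 'config.nice' can be sketched in Python. This assumes that the Perl 'apxs' script assigns the build directory on a line of the form 'my $installbuilddir = "...";'. That form may vary between Apache versions, so treat it as an assumption. The function names 'parse_installbuilddir' and 'find_config_nice' are hypothetical.

```python
import os
import re

# Assumed form of the assignment inside the Perl 'apxs' script, e.g.:
#   my $installbuilddir = "/usr/share/httpd/build";
INSTALLBUILDDIR_RE = re.compile(r'\$installbuilddir\s*=\s*"([^"]+)"')


def parse_installbuilddir(apxs_source):
    """Return the value assigned to $installbuilddir, or None if not found."""
    match = INSTALLBUILDDIR_RE.search(apxs_source)
    return match.group(1) if match else None


def find_config_nice(apxs_path):
    """Return the expected path of 'config.nice' for a given 'apxs' script.

    Returns None if no $installbuilddir assignment could be found.
    """
    with open(apxs_path) as f:
        build_dir = parse_installbuilddir(f.read())
    if build_dir is None:
        return None
    return os.path.join(build_dir, "config.nice")
```

For example, 'find_config_nice("/usr/sbin/apxs")' would be expected to return '/usr/share/httpd/build/config.nice' for the MacOS X layout described in this document. This assumes that 'apxs' lives at that path, which should be checked locally.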
+
+On MacOS X, for the operating system supplied Apache, this file is located at
+'/usr/share/httpd/build/config.nice'. The contents of the file are::
+
+    #! /bin/sh
+    #
+    # Created by configure
+
+    "/SourceCache/apache/apache-747.1/httpd/configure" \
+    "--prefix=/usr" \
+    "--enable-layout=Darwin" \
+    "--with-apr=/usr" \
+    "--with-apr-util=/usr" \
+    "--with-pcre=/usr/local/bin/pcre-config" \
+    "--enable-mods-shared=all" \
+    "--enable-ssl" \
+    "--enable-cache" \
+    "--enable-mem-cache" \
+    "--enable-proxy-balancer" \
+    "--enable-proxy" \
+    "--enable-proxy-http" \
+    "--enable-disk-cache" \
+    "$@"
+
+Not only does this indicate what features of Apache have been compiled in,
+it also indicates, by way of the '--enable-layout' option, what custom Apache
+installation layout has been used.
+
+Apache Modules Loaded
+---------------------
+
+Modules can be loaded into Apache statically, or can be loaded dynamically
+at run time based on Apache configuration files.
+
+If modules have been statically compiled into Apache, it would usually be
+evident from the 'configure' arguments used when Apache was built.
+To verify exactly what is compiled in statically, you can use the '-l'
+option to the Apache executable.
+
+On MacOS X, for the operating system supplied Apache, the output from
+running with the '-l' option is::
+
+    $ /usr/sbin/httpd -l
+    Compiled in modules:
+      core.c
+      prefork.c
+      http_core.c
+      mod_so.c
+
+This indicates that, apart from the core server and MPM modules, the only
+module that is compiled in statically is 'mod_so'. This is actually the
+Apache module that handles the task of dynamically loading other Apache
+modules.
+
+For a specific Apache configuration, you can determine what Apache modules
+will be loaded dynamically by using the '-M' option for the Apache executable.
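When the '-M' listing is long, it can help to check it programmatically. The sketch below parses that output into a dictionary, taking it as a string (for example, captured from running 'httpd -M'). It assumes each module line ends in '(static)' or '(shared)', as in the MacOS X listing in this section. The function name is hypothetical and the sample is an abbreviated form of that listing.

```python
def parse_loaded_modules(output):
    """Map module names from 'httpd -M' output to 'static' or 'shared'."""
    modules = {}
    for line in output.splitlines():
        line = line.strip()
        # Module lines look like: "wsgi_module (shared)"
        if line.endswith("(static)") or line.endswith("(shared)"):
            name, kind = line.rsplit(None, 1)
            modules[name] = kind.strip("()")
    return modules


SAMPLE = """Loaded Modules:
 core_module (static)
 so_module (static)
 wsgi_module (shared)
Syntax OK"""

modules = parse_loaded_modules(SAMPLE)
print(modules.get("wsgi_module"))  # shared
```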
+
+On MacOS X, for the operating system supplied Apache, the output from
+running with the '-M' option, where the only additional module added is
+mod_wsgi, is::
+
+    $ /usr/sbin/httpd -M
+    Loaded Modules:
+     core_module (static)
+     mpm_prefork_module (static)
+     http_module (static)
+     so_module (static)
+     authn_file_module (shared)
+     authn_dbm_module (shared)
+     authn_anon_module (shared)
+     authn_dbd_module (shared)
+     authn_default_module (shared)
+     authz_host_module (shared)
+     authz_groupfile_module (shared)
+     authz_user_module (shared)
+     authz_dbm_module (shared)
+     authz_owner_module (shared)
+     authz_default_module (shared)
+     auth_basic_module (shared)
+     auth_digest_module (shared)
+     cache_module (shared)
+     disk_cache_module (shared)
+     mem_cache_module (shared)
+     dbd_module (shared)
+     dumpio_module (shared)
+     ext_filter_module (shared)
+     include_module (shared)
+     filter_module (shared)
+     substitute_module (shared)
+     deflate_module (shared)
+     log_config_module (shared)
+     log_forensic_module (shared)
+     logio_module (shared)
+     env_module (shared)
+     mime_magic_module (shared)
+     cern_meta_module (shared)
+     expires_module (shared)
+     headers_module (shared)
+     ident_module (shared)
+     usertrack_module (shared)
+     setenvif_module (shared)
+     version_module (shared)
+     proxy_module (shared)
+     proxy_connect_module (shared)
+     proxy_ftp_module (shared)
+     proxy_http_module (shared)
+     proxy_ajp_module (shared)
+     proxy_balancer_module (shared)
+     ssl_module (shared)
+     mime_module (shared)
+     dav_module (shared)
+     status_module (shared)
+     autoindex_module (shared)
+     asis_module (shared)
+     info_module (shared)
+     cgi_module (shared)
+     dav_fs_module (shared)
+     vhost_alias_module (shared)
+     negotiation_module (shared)
+     dir_module (shared)
+     imagemap_module (shared)
+     actions_module (shared)
+     speling_module (shared)
+     userdir_module (shared)
+     alias_module (shared)
+     rewrite_module (shared)
+     bonjour_module (shared)
+     wsgi_module (shared)
+    Syntax OK
+
+The names are those which would be used with the LoadModule directive
+in the Apache configuration, not the name of the module file itself.
+
+The order in which modules are listed can be important in some cases where
+a module doesn't explicitly designate in what order a handler should be
+applied relative to other Apache modules.
+
+Global Accept Mutex
+-------------------
+
+Because Apache is a multi process server, it needs to use a global cross
+process mutex to control which of the Apache child processes gets the next
+chance to accept a connection from an HTTP client.
+
+This cross process mutex can be implemented using a variety of different
+mechanisms, and exactly which is used can vary based on the operating system.
+Which mechanism is used can also be overridden in the Apache configuration
+if absolutely required.
+
+A similar instance of a cross process mutex is also used for each mod_wsgi
+daemon process group to mediate which process in the daemon process group
+gets to accept the next request proxied to that daemon process group via the
+Apache child processes.
+
+The list of mechanisms which might be used to implement the cross process
+mutex is as follows:
+
+    * flock
+    * fcntl
+    * sysvsem
+    * posixsem
+    * pthread
+
+In the event that there are issues with communication between the Apache
+child processes and the mod_wsgi daemon processes in particular, it can be
+useful to know what mechanism is used to implement the cross process mutex.
+
+By default, the Apache configuration files would not specify a particular
+mechanism. Instead, the one used would be automatically selected by the
+underlying Apache runtime libraries, based on various build time and system
+checks about which is the preferred mechanism for a particular operating
+system.
+
+Which mechanism is used by default can be determined from the build
+information displayed by the '-V' option to the Apache executable described
+previously. The particular entries of interest are those with 'SERIALIZE'
+in the name of the macro.
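The default mechanism can be picked out of the '-V' output mechanically. This sketch assumes that the first 'APR_USE_..._SERIALIZE' entry is the one used by default, since the entries are used in order. The function name and the two sample strings are illustrative only.

```python
import re

# Matches entries such as "-D APR_USE_FLOCK_SERIALIZE" and captures "FLOCK".
SERIALIZE_RE = re.compile(r"-D APR_USE_(\w+?)_SERIALIZE\b")


def default_accept_mutex(output):
    """Return the mechanism named by the first APR_USE_*_SERIALIZE entry.

    'output' is the text printed by the Apache executable's '-V' option.
    Returns None if no such entry is present.
    """
    match = SERIALIZE_RE.search(output)
    return match.group(1).lower() if match else None


MACOS = """ -D APR_USE_FLOCK_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE"""
LINUX = """ -D APR_USE_SYSVSEM_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE"""

print(default_accept_mutex(MACOS))  # flock
print(default_accept_mutex(LINUX))  # sysvsem
```

Any AcceptMutex or WSGIAcceptMutex directive in the Apache configuration would still override whatever this reports.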
+
+On MacOS X, using the operating system supplied Apache, the entries of
+interest are::
+
+    -D APR_USE_FLOCK_SERIALIZE
+    -D APR_USE_PTHREAD_SERIALIZE
+
+As the entries are used in order, this indicates that Apache will by
+default use the 'flock' mechanism to implement the cross process mutex.
+
+In comparison, on a Linux system, the entries of interest may be::
+
+    -D APR_USE_SYSVSEM_SERIALIZE
+    -D APR_USE_PTHREAD_SERIALIZE
+
+which indicates that the 'sysvsem' mechanism is used instead.
+
+This mechanism is also what would be used by mod_wsgi by default for
+the cross process mutex for daemon process groups.
+
+This mechanism will be different if the AcceptMutex or WSGIAcceptMutex
+directives are used.
+
+If the AcceptMutex directive is defined in the Apache configuration file,
+then whatever mechanism is specified will be used instead for Apache child
+processes. Provided that Apache 2.2 or older is used, and WSGIAcceptMutex
+is not specified, the mechanism given by AcceptMutex will also be used
+by mod_wsgi daemon processes.
+
+In the case of Apache 2.4 and later, AcceptMutex will no longer override the
+default for mod_wsgi daemon process groups, and instead WSGIAcceptMutex must
+be specified separately if it needs to be overridden for both.
+
+Either way, you should check the Apache configuration files as to whether
+either the AcceptMutex or WSGIAcceptMutex directives are used, as they will
+override the defaults calculated above. Under normal circumstances neither
+should be set, as the default should always be used.
+
+If you want to override the default mechanism, which mechanisms are
+available will depend on the operating system being used. There are a
+couple of ways this can be determined.
+
+The first is to find the 'apr.h' header file from the Apache runtime library
+installation that Apache was compiled against. In that you will find entries
+similar to the 'USE' macros above. You will also find 'HAS' entries. In this
+case we are interested in the 'HAS' entries.
+
+On MacOS X, with the operating system supplied APR library, the entries in
+'apr.h' are::
+
+    #define APR_HAS_FLOCK_SERIALIZE           1
+    #define APR_HAS_SYSVSEM_SERIALIZE         1
+    #define APR_HAS_POSIXSEM_SERIALIZE        1
+    #define APR_HAS_FCNTL_SERIALIZE           1
+    #define APR_HAS_PROC_PTHREAD_SERIALIZE    0
+
+The available mechanisms are those defined to be '1'.
+
+Finding where the right 'apr.h' is located may be tricky, so an easier way
+is to trick Apache into generating an error message which lists the available
+mechanisms. To do this, add each of the following entries in turn to the
+Apache configuration files, at global scope::
+
+    AcceptMutex xxx
+
+and::
+
+    WSGIAcceptMutex xxx
+
+For each, run the Apache program executable with the '-t' option.
+
+On MacOS X, with the operating system supplied APR library, this yields::
+
+    $ /usr/sbin/httpd -t
+    Syntax error on line 501 of /private/etc/apache2/httpd.conf:
+    xxx is an invalid mutex mechanism; Valid accept mutexes for this platform \
+    and MPM are: default, flock, fcntl, sysvsem, posixsem.
+
+for AcceptMutex and, for WSGIAcceptMutex::
+
+    $ /usr/sbin/httpd -t
+    Syntax error on line 501 of /private/etc/apache2/httpd.conf:
+    Accept mutex lock mechanism 'xxx' is invalid. Valid accept mutex mechanisms \
+    for this platform are: default, flock, fcntl, sysvsem, posixsem.
+
+The list of available mechanisms should normally be the same in both cases.
+
+Using the value 'default' leaves the choice of mechanism up to the
+APR library.
+
+Python Shared Library
+---------------------
+
+When mod_wsgi is built, the 'mod_wsgi.so' file should be linked against
+Python via a shared library. If it isn't, and is instead linked against a
+static library, various issues can arise. These include additional memory
+usage, plus conflicts with mod_python if it is also loaded in the same Apache.
+
+To validate that 'mod_wsgi.so' is using a shared library for Python, use
+the 'ldd' command, which is available on most UNIX systems. For example::
+
+    $ ldd mod_wsgi.so
+        linux-vdso.so.1 => (0x00007fffeb3fe000)
+        libpython2.5.so.1.0 => /usr/local/lib/libpython2.5.so.1.0 (0x00002adebf94d000)
+        libpthread.so.0 => /lib/libpthread.so.0 (0x00002adebfcba000)
+        libdl.so.2 => /lib/libdl.so.2 (0x00002adebfed6000)
+        libutil.so.1 => /lib/libutil.so.1 (0x00002adec00da000)
+        libc.so.6 => /lib/libc.so.6 (0x00002adec02dd000)
+        libm.so.6 => /lib/libm.so.6 (0x00002adec0635000)
+        /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
+
+What you want to see is a reference to an instance of 'libpythonX.Y.so'.
+Normally the operating system shared library version suffix would be
+'1.0', although the exact suffix shouldn't really matter.
+
+This reference should refer to the actual Python shared library for your
+Python installation.
+
+Do note, though, that 'ldd' takes into consideration any local user
+setting of the 'LD_LIBRARY_PATH' environment variable. That is, 'ldd' will
+also search any directories listed in that environment variable for shared
+libraries.
+
+Although that environment variable may be defined in your user account, it
+will not normally be defined in the environment of the account that Apache
+starts up as. It is therefore important that you unset the
+'LD_LIBRARY_PATH' environment variable when running 'ldd'.
+
+If you run the check with and without 'LD_LIBRARY_PATH' set and find that
+without it a different Python shared library, or none at all, is found,
+then you will likely have a problem. If it is not found, Apache will fail
+to start. If it is found but belongs to a different installation to the
+one you want used, subtle problems could occur when C extension modules
+compiled against one installation are used with the Python library of
+another.
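The with-and-without comparison described above can be scripted. The following is a minimal Python 3 sketch and is not a tool shipped with mod_wsgi. It assumes a Linux system with 'ldd' on the 'PATH', and Python 3.7 or later for the 'capture_output' and 'text' arguments of 'subprocess.run'. 'ldd_libpython' runs 'ldd' on a module with 'LD_LIBRARY_PATH' either kept or removed from the environment. 'parse_libpython' picks out the 'libpython' entries so the two results can be compared. Both names are hypothetical.

```python
import os
import re
import subprocess

# Matches lines such as:
#   libpython2.5.so.1.0 => /usr/lib/libpython2.5.so.1.0 (0x00002adebf94d000)
#   libpython2.5.so.1.0 => not found
LIBPYTHON_RE = re.compile(
    r'^\s*(libpython\S+)\s+=>\s+(.+?)(?:\s+\(0x[0-9a-f]+\))?\s*$')


def parse_libpython(ldd_output):
    """Return (library, resolved location) pairs for libpython entries."""
    matches = []
    for line in ldd_output.splitlines():
        m = LIBPYTHON_RE.match(line)
        if m:
            matches.append((m.group(1), m.group(2)))
    return matches


def ldd_libpython(module_path, keep_ld_library_path=False):
    """Run ldd on module_path, by default without LD_LIBRARY_PATH set,
    which is closer to the environment Apache normally starts with."""
    env = dict(os.environ)
    if not keep_ld_library_path:
        env.pop('LD_LIBRARY_PATH', None)
    result = subprocess.run(['ldd', module_path], env=env,
                            capture_output=True, text=True, check=True)
    return parse_libpython(result.stdout)
```

If 'ldd_libpython(path, keep_ld_library_path=True)' and 'ldd_libpython(path)' return different locations, or the second reports 'not found', you have the problem described above.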
+
+For example, if 'LD_LIBRARY_PATH' contained the directory '/usr/local/lib'
+and you obtained the results above, but when you unset it the shared
+library was instead picked up from '/usr/lib', then you may end up with
+problems if that is a different installation. In this case you would see::
+
+    $ unset LD_LIBRARY_PATH
+    $ ldd mod_wsgi.so
+        linux-vdso.so.1 => (0x00007fffeb3fe000)
+        libpython2.5.so.1.0 => /usr/lib/libpython2.5.so.1.0 (0x00002adebf94d000)
+        libpthread.so.0 => /lib/libpthread.so.0 (0x00002adebfcba000)
+        libdl.so.2 => /lib/libdl.so.2 (0x00002adebfed6000)
+        libutil.so.1 => /lib/libutil.so.1 (0x00002adec00da000)
+        libc.so.6 => /lib/libc.so.6 (0x00002adec02dd000)
+        libm.so.6 => /lib/libm.so.6 (0x00002adec0635000)
+        /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
+
+Similarly, if it is not found at all, you would see::
+
+    $ unset LD_LIBRARY_PATH
+    $ ldd mod_wsgi.so
+        linux-vdso.so.1 => (0x00007fffeb3fe000)
+        libpython2.5.so.1.0 => not found
+        libpthread.so.0 => /lib/libpthread.so.0 (0x00002adebfcba000)
+        libdl.so.2 => /lib/libdl.so.2 (0x00002adebfed6000)
+        libutil.so.1 => /lib/libutil.so.1 (0x00002adec00da000)
+        libc.so.6 => /lib/libc.so.6 (0x00002adec02dd000)
+        libm.so.6 => /lib/libm.so.6 (0x00002adec0635000)
+        /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
+
+If you have this problem, you will need to do one of two things. Either set
+the 'LD_RUN_PATH' environment variable to include the directory containing
+the Python library when building mod_wsgi, or set 'LD_LIBRARY_PATH' in the
+startup file for Apache so that it is also set for Apache when run. For a
+standard Apache installation the latter would be done in the 'envvars' file
+in the same directory as the Apache program executable. For some Linux
+installations it would instead need to be done in the init scripts for
+Apache.
+
+Note that MacOS X neither uses 'LD_LIBRARY_PATH' nor provides 'ldd'.
+On MacOS X, instead of 'ldd' you can use 'otool -L'::
+
+    $ otool -L mod_wsgi.so
+    mod_wsgi.so:
+        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.2.0)
+        /System/Library/Frameworks/Python.framework/Versions/2.6/Python (compatibility version 2.6.0, current version 2.6.1)
+
+If you are using the standard MacOS X compilers and not using Fink or
+MacPorts, there generally should not be any issues with whether a shared
+library is used or not, as everything should just work.
+
+The only issue on MacOS X is that, for whatever reason, the location
+dependency for the shared library (framework) isn't always encoded into
+'mod_wsgi.so' correctly. This seems to vary with which Python installation
+was used and which MacOS X operating system version. In this case, if there
+are multiple installations of the same version of Python in different
+locations, the system installation may be found rather than your custom
+installation.
+
+In that situation you may need to use the '--disable-framework' option to
+the 'configure' script for mod_wsgi. This doesn't actually disable use of
+the framework. It instead changes how mod_wsgi links against Python, using
+a more traditional library style of linking rather than framework linking.
+This seems to resolve the problem in most cases.
+
+Python Installation In Use
+--------------------------
+
+The 'mod_wsgi.so' file may be finding a specific Python shared library, and
+that library may be from the correct installation. Even so, the Python
+library, when initialised, doesn't actually know where it came from. It
+therefore uses a series of checks to try to determine where the Python
+installation is actually located.
+
+This check has various subtleties and how it works varies depending on the
+platform used. At its simplest though, on most UNIX systems it will check
+all directories listed in the 'PATH' environment variable of the process.
+In each of those directories it will look for the 'python' program.
+When it finds such a file, it will then look for a corresponding 'lib'
+directory containing a valid Python installation for the same version of
+Python as is being run.
+
+When it finds such a directory, the home for the Python installation will
+be taken as being the parent directory of the directory containing the
+'python' program file found.
+
+This search depends on the 'PATH' environment variable, which is likely
+set to a minimal set of directories for the Apache user. If you are using
+a Python installation in a non-standard location, the search may
+therefore fail to find the location of that installation.
+
+The easiest way to validate which Python installation is being used is to
+use a test WSGI script to output the value of 'sys.prefix'::
+
+    import sys
+
+    def application(environ, start_response):
+        status = '200 OK'
+
+        output = ''
+        output += 'sys.version = %s\n' % repr(sys.version)
+        output += 'sys.prefix = %s\n' % repr(sys.prefix)
+
+        response_headers = [('Content-type', 'text/plain'),
+                            ('Content-Length', str(len(output)))]
+        start_response(status, response_headers)
+
+        return [output]
+
+For a standard Python installation on a Linux system, this would produce
+something like::
+
+    sys.version = '2.6.1 (r261:67515, Feb 11 2010, 00:51:29) \n[GCC 4.2.1 (Apple Inc. build 5646)]'
+    sys.prefix = '/usr'
+
+Thus, if you were expecting to pick up a separate Python installation
+located under '/usr/local' or elsewhere, this would be indicative of a
+problem.
+
+It can also be worthwhile to check that the Python module search path
+looks correct.
+This can be done by using a test WSGI script to output the
+value of 'sys.path'::
+
+    import sys
+
+    def application(environ, start_response):
+        status = '200 OK'
+        output = 'sys.path = %s' % repr(sys.path)
+
+        response_headers = [('Content-type', 'text/plain'),
+                            ('Content-Length', str(len(output)))]
+        start_response(status, response_headers)
+
+        return [output]
+
+In both cases, these test scripts should still run even if an incorrect
+location is being used for the Python installation, and even if there is
+no actual Python installation of the correct version under that root
+directory. This is because the 'sys' module is a builtin module, which can
+be satisfied by the Python library alone.
+
+When checking whether there is a Python installation underneath that root
+directory, the subdirectory to look for is 'lib/pythonX.Y', corresponding
+to the version of Python being used.
+
+If the calculated directory is wrong, then you will need to use the
+WSGIPythonHome directive to set the location to the correct value. The
+value to use is what 'sys.prefix' is set to when the correct Python is run
+from the command line and 'sys.prefix' is output::
+
+    >>> import sys
+    >>> print sys.prefix
+    /usr/local
+
+Thus, for the case where Python is installed under '/usr/local', you would
+use::
+
+    WSGIPythonHome /usr/local
+
+Embedded Or Daemon Mode
+-----------------------
+
+WSGI applications can run in either embedded mode or daemon mode. In the
+case of embedded mode, the WSGI application runs within the Apache child
+processes themselves. In the case of daemon mode, it runs within a
+separate set of processes managed by mod_wsgi.
+
+To determine which mode a WSGI application is running in, replace its
+WSGI script with the following test WSGI script::
+
+    import sys
+
+    def application(environ, start_response):
+        status = '200 OK'
+        output = 'mod_wsgi.process_group = %s' % repr(environ['mod_wsgi.process_group'])
+        response_headers = [('Content-type', 'text/plain'),
+                            ('Content-Length', str(len(output)))]
+        start_response(status, response_headers)
+
+        return [output]
+
+If the configuration is such that the WSGI application is running in
+embedded mode, then you will see::
+
+    mod_wsgi.process_group = ''
+
+This actually corresponds to the directive::
+
+    WSGIProcessGroup %{GLOBAL}
+
+having been used, or the same value having been given to the
+'process-group' option of WSGIScriptAlias. Do note, though, that these are
+also the defaults if not explicitly defined.
+
+If the WSGI application is actually running in daemon mode, then a
+non-empty string will instead be shown, corresponding to the name of the
+daemon process group used.
+
+Sub Interpreter Being Used
+--------------------------
+
+A WSGI application can be delegated to run in either embedded mode or
+daemon mode. Within whichever process it ends up running in, it can also
+be delegated to a specific Python sub interpreter.
+
+To determine which Python sub interpreter is being used within the process
+the WSGI application is running in, use the test WSGI script::
+
+    import sys
+
+    def application(environ, start_response):
+        status = '200 OK'
+        output = 'mod_wsgi.application_group = %s' % repr(environ['mod_wsgi.application_group'])
+
+        response_headers = [('Content-type', 'text/plain'),
+                            ('Content-Length', str(len(output)))]
+        start_response(status, response_headers)
+
+        return [output]
+
+If being run in the main interpreter, i.e., the first interpreter created
+by Python, this will output::
+
+    mod_wsgi.application_group = ''
+
+This actually corresponds to the directive::
+
+    WSGIApplicationGroup %{GLOBAL}
+
+having been used, or the same value having been given to the
+'application-group' option of WSGIScriptAlias.
+
+The default for these if not defined is actually '%{RESOURCE}'. This will
+be a value made up from the name of the virtual host or server, the port
+on which the connection was accepted and the mount point of the WSGI
+application. The port is, however, dropped where it is 80 or 443.
+
+An example of what you would expect to see is::
+
+    mod_wsgi.application_group = 'tests.example.com|/interpreter.wsgi'
+
+This corresponds to a server name of 'tests.example.com', with the
+connection received on either port 80 or 443, and where the WSGI
+application was mounted at the URL '/interpreter.wsgi'.
+
+Single Or Multi Threaded
+------------------------
+
+Apache supports a number of Multiprocessing Modules (MPMs) having
+different attributes. One such difference is whether a specific Apache
+child process uses multiple threads for handling requests or whether a
+single thread is instead used.
+
+When using daemon mode, how you configure a daemon process group will also
+dictate whether it is single or multithreaded. By default, if the number of
+threads is not explicitly specified for a daemon process group, it will be
+multithreaded.
+
+Whether a WSGI application is executing within a multithreaded environment
+is important to know. If it is, then you need to ensure that your own code
+and any framework you are using is also thread safe.
+
+A test WSGI script for validating whether a WSGI application is running in
+a multithreaded configuration is as follows::
+
+    import sys
+
+    def application(environ, start_response):
+        status = '200 OK'
+        output = 'wsgi.multithread = %s' % repr(environ['wsgi.multithread'])
+
+        response_headers = [('Content-type', 'text/plain'),
+                            ('Content-Length', str(len(output)))]
+        start_response(status, response_headers)
+
+        return [output]
+
+If multithreaded, this will yield::
+
+    wsgi.multithread = True
+
+This would usually be True in the following cases:
+
+* on Windows;
+* on UNIX, if running in embedded mode and the worker MPM is used by
+  Apache;
+* in daemon mode, if the number of threads is either not explicitly set or
+  explicitly set to a value other than '1'.
diff --git a/docs/user-guides/configuration-guidelines.rst b/docs/user-guides/configuration-guidelines.rst
index df992d4..b50662e 100644
--- a/docs/user-guides/configuration-guidelines.rst
+++ b/docs/user-guides/configuration-guidelines.rst
@@ -473,7 +473,7 @@ The argument to the WSGIApplicationGroup directive can in general be any
 unique name of your choosing, although there are also a number of special
 values which you can use as well. For further information about these
 special values see the more detailed documentation on the
-:doc:`../config-directives/WSGIApplicationGroup` directive. Two of the
+:doc:`../configuration-directives/WSGIApplicationGroup` directive. Two of the
 special values worth highlighting are:
 
 **%{GLOBAL}**
@@ -603,7 +603,7 @@ user explicitly signals it to restart.
 
 For further information about the options that can be supplied to the
 WSGIDaemonProcess directive see the more detailed documentation for
-:doc:`../config-directives/WSGIDaemonProcess`. A few of the options
+:doc:`../configuration-directives/WSGIDaemonProcess`. A few of the options
+which can be supplied to the WSGIDaemonProcess directive worth highlighting
 are:
 
diff --git a/docs/user-guides/configuration-issues.rst b/docs/user-guides/configuration-issues.rst
new file mode 100644
index 0000000..d1bc52b
--- /dev/null
+++ b/docs/user-guides/configuration-issues.rst
@@ -0,0 +1,66 @@
+====================
+Configuration Issues
+====================
+
+Many Linux distributions in particular do not structure an Apache
+installation in the default manner dictated by the original Apache code
+distributed by the Apache Software Foundation. This fact, and differences
+between operating systems and distributions, mean that the configuration
+for mod_wsgi may sometimes have to be tweaked.
+
+The purpose of this document is to capture all the known problems that can
+arise in respect of configuration.
+
+If you are having a problem which doesn't seem to be covered by this
+document, also make sure you see :doc:`../user-guides/installation-issues`
+and :doc:`../user-guides/application-issues`.
+
+Location Of UNIX Sockets
+------------------------
+
+When mod_wsgi is used in 'daemon' mode, UNIX sockets are used to
+communicate between the Apache child processes and the daemon processes
+which are to handle a request.
+
+These sockets and any related mutex lock files will be placed in the
+standard Apache runtime directory. This is the same directory in which the
+Apache log files would normally be placed.
+
+For some Linux distributions, restrictive permissions are placed on the
+standard Apache runtime directory such that the directory is not readable
+by others. This can cause problems with mod_wsgi because the user that the
+Apache child processes run as will then not have the permissions required
+to access the directory and connect to the sockets.
+
+When this occurs, a '503 Service Temporarily Unavailable' error response
+will be received by the client. The Apache error log file will show
+messages of the form::
+
+    (13)Permission denied: mod_wsgi (pid=26962): Unable to connect to WSGI \
+    daemon process '' on '/etc/httpd/logs/wsgi.26957.0.1.sock' \
+    after multiple attempts.
+
+To resolve the problem, the WSGISocketPrefix directive should be defined to
+point at an alternate location. The value may be a location relative to the
+Apache root directory (ServerRoot), or an absolute path.
+
+Systems which restrict access to the standard Apache runtime directory
+normally provide an alternate directory for placing sockets and lock files
+used by Apache modules. This directory is usually called 'run', and to make
+use of it the WSGISocketPrefix directive would be set as follows::
+
+    WSGISocketPrefix run/wsgi
+
+Although this directory may be present, be aware that some Linux
+distributions, notably RedHat, also lock down its permissions so that it is
+not readable by processes running as a non root user. In this situation you
+will be forced to use the operating system level '/var/run' directory
+rather than the HTTP specific directory::
+
+    WSGISocketPrefix /var/run/wsgi
+
+Note, do not put the sockets in the system temporary working directory.
+That is, do not make the prefix '/tmp/wsgi'. The directory should be one
+that is only writable by the 'root' user, or if not starting Apache as
+'root', the user that Apache is started as.
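You can check from Python whether a candidate directory meets the requirement above, using the Python 3 sketch below. It is a hypothetical helper, not anything provided by mod_wsgi, and assumes a UNIX system. 'socket_directory' resolves a WSGISocketPrefix value against an assumed ServerRoot and takes the directory part. 'socket_directory_is_safe' checks that the directory is owned by the expected user (root by default) and is not writable by group or others. A world-writable directory such as '/tmp' is therefore normally rejected.

```python
import os
import stat


def socket_directory(prefix, server_root):
    """Return the directory that a WSGISocketPrefix value places sockets in.

    Relative prefixes are resolved against server_root; os.path.join keeps
    an absolute prefix unchanged.
    """
    return os.path.dirname(os.path.join(server_root, prefix))


def socket_directory_is_safe(path, owner_uid=0):
    """Return True if path is a directory owned by owner_uid that neither
    group nor others can write to."""
    st = os.stat(path)
    if not stat.S_ISDIR(st.st_mode):
        return False
    if st.st_uid != owner_uid:
        return False
    return not (st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))
```

For example, 'socket_directory_is_safe(socket_directory("run/wsgi", "/etc/httpd"))' would check '/etc/httpd/run', where '/etc/httpd' is only an assumed ServerRoot. Pass 'owner_uid' explicitly if Apache is not started as 'root'.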
diff --git a/docs/user-guides/debugging-techniques.rst b/docs/user-guides/debugging-techniques.rst
new file mode 100644
index 0000000..368b795
--- /dev/null
+++ b/docs/user-guides/debugging-techniques.rst
@@ -0,0 +1,1148 @@
+====================
+Debugging Techniques
+====================
+
+Be it when initially setting up mod_wsgi for the first time, or later
+during development or use of your WSGI application, you are bound to get
+some sort of unexpected Python error. By default, all you are usually going
+to see as evidence of this is an HTTP 500 "Internal Server Error" page
+displayed in the browser window and little else.
+
+The purpose of this document is to explain where to look for more
+details on what caused the error, describe techniques one can use to have
+your WSGI application generate more useful debug information, as well as
+mechanisms that can be employed to interactively debug your application.
+
+Note that although this document is intended to deal with techniques which
+can be used when using mod_wsgi, many of the techniques are also directly
+transferable or adaptable to other web hosting mechanisms for WSGI
+applications.
+
+Apache Error Log Files
+----------------------
+
+When using mod_wsgi, unless you or the web framework you are using takes
+specific action to catch exceptions and present the details in an alternate
+manner, the only place that details of uncaught exceptions will be recorded
+is in the Apache error log files. The Apache error log files are therefore
+your prime source of information when things go wrong.
+
+Do note, though, that log messages generated by mod_wsgi are logged with
+various severity levels, and which ones will be output to the Apache error
+log files will depend on how Apache has been configured. The standard
+configuration for Apache has the LogLevel directive set to 'warn'.
+With this setting any important error messages will be output, but
+informational messages generated by mod_wsgi which can assist in working
+out what it is doing are not. Thus, if new to mod_wsgi or trying to debug a
+problem, it is worthwhile setting the Apache configuration to use the
+'info' log level instead::
+
+    LogLevel info
+
+If your Apache web server is only providing services for one host, it is
+likely that you will only have one error log file. If, however, the Apache
+web server is configured for multiple virtual hosts, then there may be
+multiple error log files, one for the main server host and an additional
+one for each virtual host. Such a virtual host specific error log, if one
+is being used, would have been configured by placing the Apache ErrorLog
+directive within the context of the VirtualHost container.
+
+Although your WSGI application may be hosted within a particular virtual
+host and that virtual host has its own error log file, some error and
+informational messages will still go to the main server host error log
+file. Thus you may still need to consult both error log files when using
+virtual hosts.
+
+Messages of note that will end up in the main server host error log file
+include notifications in regard to initialisation of Python and the
+creation and destruction of Python sub interpreters, plus any errors which
+occur when doing this.
+
+Messages of note that would end up in the virtual host error log file, if
+it exists, include details of uncaught Python exceptions which occur when
+the WSGI application script is being loaded, or when the WSGI application
+callable object is being executed.
+
+Messages that are logged by a WSGI application via the 'wsgi.errors' object
+passed through to the application in the WSGI environment are also logged.
+These will go to the virtual host error log file if it exists, or the main
+error log file if the virtual host is not set up with its own error log file.
+Thus, if you want to add debugging messages to your WSGI application code,
+you can use 'wsgi.errors' in conjunction with the 'print' statement as shown
+below::
+
+    def application(environ, start_response):
+        status = '200 OK'
+        output = 'Hello World!'
+
+        print >> environ['wsgi.errors'], "application debug #1"
+
+        response_headers = [('Content-type', 'text/plain'),
+                            ('Content-Length', str(len(output)))]
+        start_response(status, response_headers)
+
+        print >> environ['wsgi.errors'], "application debug #2"
+
+        return [output]
+
+If 'wsgi.errors' is not available to the code which needs to output log
+messages, then it should explicitly direct output from the 'print'
+statement to 'sys.stderr'::
+
+    import sys
+
+    def function():
+        print >> sys.stderr, "application debug #3"
+        ...
+
+If ``sys.stderr`` or ``sys.stdout`` is used directly, then these messages
+will end up in the main server host error log file rather than that for
+the virtual host. The exception is where the WSGI application is running
+in a daemon process specifically associated with a virtual host.
+
+Do be aware, though, that writing to ``sys.stdout`` is by default
+restricted in versions of mod_wsgi prior to 3.0 and will result in an
+exception occurring of the form::
+
+    IOError: sys.stdout access restricted by mod_wsgi
+
+This is because portable WSGI applications should not write to
+``sys.stdout`` or use the 'print' statement without specifying an
+alternate file object besides ``sys.stdout`` as the target.
+This restriction can be disabled for the whole server using the
+WSGIRestrictStdout directive, or by mapping ``sys.stdout`` to
+``sys.stderr`` at global scope within the WSGI application script
+file::
+
+    import sys
+    sys.stdout = sys.stderr
+
+In general, a WSGI application should always endeavour to only log messages
+via the 'wsgi.errors' object that is passed through to a WSGI application
+in the WSGI environment. This is the only way of logging messages for which
+there is some guarantee that they will end up in a log file you might have
+access to if using a shared server.
+
+An application shouldn't, however, cache 'wsgi.errors' and try to use it
+outside of the context of a request. If this is done, an exception will be
+raised indicating that the request has expired and the error log object
+is now invalid.
+
+That messages output via ``sys.stderr`` and ``sys.stdout`` end up in
+the Apache error logs at all is provided as a convenience, but there is no
+requirement in the WSGI specification that they are a valid means for a
+WSGI application to log messages.
+
+Displaying Request Environment
+------------------------------
+
+When a WSGI application is invoked, the request headers are passed as CGI
+variables in the WSGI request environment. The dictionary used for this
+also holds information about the WSGI execution environment and mod_wsgi.
+This includes mod_wsgi specific variables indicating the name of the
+process and application groups within which the WSGI application is
+executing.
+
+Knowing the values of the process and application group variables can be
+important when validating that your Apache configuration does what you
+intended, for example that your WSGI application is or is not running in
+daemon mode.
+
+A simple way of validating such details, or of getting access to any of
+the other WSGI request environment variables, is to substitute your
+existing WSGI application with one which echoes back the details to your
+browser. Such a task can be achieved with the following test application.
+The application could be extended as necessary to display other
+information as well, with process ID, user ID and group ID being shown as
+examples::
+
+    import cStringIO
+    import os
+
+    def application(environ, start_response):
+        headers = []
+        headers.append(('Content-Type', 'text/plain'))
+        write = start_response('200 OK', headers)
+
+        input = environ['wsgi.input']
+        output = cStringIO.StringIO()
+
+        print >> output, "PID: %s" % os.getpid()
+        print >> output, "UID: %s" % os.getuid()
+        print >> output, "GID: %s" % os.getgid()
+        print >> output
+
+        keys = environ.keys()
+        keys.sort()
+        for key in keys:
+            print >> output, '%s: %s' % (key, repr(environ[key]))
+        print >> output
+
+        # CONTENT_LENGTH may be empty or absent, so default to zero.
+        length = int(environ.get('CONTENT_LENGTH') or 0)
+        output.write(input.read(length))
+
+        return [output.getvalue()]
+
+For the case of the process group as recorded by the
+'mod_wsgi.process_group' variable in the WSGI request environment, if the
+value is an empty string then the WSGI application is running in embedded
+mode. For any other value it will be running in daemon mode, in the process
+group named by the variable's value.
+
+Note that by default WSGI applications run in embedded mode, which means
+within the Apache server child processes which accept the original requests.
+Daemon mode processes would only be used through appropriate use of the
+WSGIDaemonProcess and WSGIProcessGroup directives to delegate the WSGI
+application to a named daemon process group.
+
+For the case of the application group as recorded by the
+'mod_wsgi.application_group' variable in the WSGI request environment, if the
+value is an empty string then the WSGI application is running in the main
+Python interpreter.
+That is, the very first interpreter created when Python was initialised.
+For any other value it is running in the named Python sub interpreter.
+
+Note that by default WSGI applications would always run in a sub
+interpreter rather than the main interpreter. The name of this sub
+interpreter would be automatically constructed from the name of the server
+or virtual host, the URL mount point of the WSGI application and the number
+of the listener port when it is other than ports 80 or 443.
+
+To delegate a WSGI application to run in the main Python interpreter, the
+WSGIApplicationGroup directive would need to have been used with the value
+'%{GLOBAL}'. Although the value is '%{GLOBAL}', this translates to the
+empty string seen for the value of 'mod_wsgi.application_group' within the
+WSGI request environment.
+
+The WSGIApplicationGroup directive could also be used to designate a
+specific named sub interpreter rather than that selected automatically.
+
+For newcomers this can all be a bit confusing, which is where the test
+application comes in, as you can use it to validate that your WSGI
+application is running where you intended it to run.
+
+The set of WSGI request environment variables will also show the WSGI
+variables indicating whether the process is multithreaded and whether the
+process group is multiprocess or not. For a more complete explanation
+of what that means see the documentation of
+:doc:`../user-guides/processes-and-threading`.
+
+Tracking Request and Response
+-----------------------------
+
+Although one can use the above test application to display the request
+environment, it replaces your original WSGI application. Rather than
+replace your existing application, you can use a WSGI middleware wrapper
+application which logs the details to the Apache error log instead::
+
+    # Original WSGI application.
+
+    def application(environ, start_response):
+        ...
+
+    # Logging WSGI middleware.
+
+    import pprint
+
+    class LoggingMiddleware:
+
+        def __init__(self, application):
+            self.__application = application
+
+        def __call__(self, environ, start_response):
+            errors = environ['wsgi.errors']
+            pprint.pprint(('REQUEST', environ), stream=errors)
+
+            def _start_response(status, headers, *args):
+                pprint.pprint(('RESPONSE', status, headers), stream=errors)
+                return start_response(status, headers, *args)
+
+            return self.__application(environ, _start_response)
+
+    application = LoggingMiddleware(application)
+
+The output from the middleware would end up in the Apache error log for the
+virtual host or, if there is no virtual host specific error log file, in
+the main Apache error log file.
+
+For more complicated problems it may also be necessary to track both the
+request and response content. A more complicated middleware which can log
+these, as well as header information, to the file system is as follows::
+
+    # Original WSGI application.
+
+    def application(environ, start_response):
+        ...
+
+    # Logging WSGI middleware.
+
+    import threading
+    import pprint
+    import time
+    import os
+
+    class LoggingInstance:
+        def __init__(self, start_response, oheaders, ocontent):
+            self.__start_response = start_response
+            self.__oheaders = oheaders
+            self.__ocontent = ocontent
+
+        def __call__(self, status, headers, *args):
+            pprint.pprint((status, headers) + args, stream=self.__oheaders)
+            self.__oheaders.close()
+
+            self.__write = self.__start_response(status, headers, *args)
+            return self.write
+
+        def __iter__(self):
+            return self
+
+        def write(self, data):
+            self.__ocontent.write(data)
+            self.__ocontent.flush()
+            return self.__write(data)
+
+        def next(self):
+            data = self.__iterable.next()
+            self.__ocontent.write(data)
+            self.__ocontent.flush()
+            return data
+
+        def close(self):
+            # Close the object the application returned, not just the
+            # iterator obtained from it, as the WSGI specification requires.
+            if hasattr(self.__result, 'close'):
+                self.__result.close()
+            self.__ocontent.close()
+
+        def link(self, iterable):
+            self.__result = iterable
+            self.__iterable = iter(iterable)
+
+    class LoggingMiddleware:
+
+        def __init__(self, application, savedir):
+            self.__application = application
+            self.__savedir = savedir
+            self.__lock = threading.Lock()
+            self.__pid = os.getpid()
+            self.__count = 0
+
+        def __call__(self, environ, start_response):
+            self.__lock.acquire()
+            self.__count += 1
+            count = self.__count
+            self.__lock.release()
+
+            key = "%s-%s-%s" % (time.time(), self.__pid, count)
+
+            iheaders = os.path.join(self.__savedir, key + ".iheaders")
+            iheaders_fp = file(iheaders, 'w')
+
+            icontent = os.path.join(self.__savedir, key + ".icontent")
+            icontent_fp = file(icontent, 'w+b')
+
+            oheaders = os.path.join(self.__savedir, key + ".oheaders")
+            oheaders_fp = file(oheaders, 'w')
+
+            ocontent = os.path.join(self.__savedir, key + ".ocontent")
+            ocontent_fp = file(ocontent, 'w+b')
+
+            pprint.pprint(environ, stream=iheaders_fp)
+            iheaders_fp.close()
+
+            # CONTENT_LENGTH may be empty or absent, so default to zero.
+            length = int(environ.get('CONTENT_LENGTH') or 0)
+            input = environ['wsgi.input']
+            while length != 0:
+                data = input.read(min(4096, length))
+                if data:
+                    icontent_fp.write(data)
+                    length -= len(data)
+                else:
+                    length = 0
+            icontent_fp.flush()
+            icontent_fp.seek(0, os.SEEK_SET)
+            environ['wsgi.input'] = icontent_fp
+
+            iterable = LoggingInstance(start_response, oheaders_fp, ocontent_fp)
+            iterable.link(self.__application(environ, iterable))
+            return iterable
+
+    application = LoggingMiddleware(application, '/tmp/wsgi')
+
+For this middleware, the second argument to the constructor should be a
+preexisting directory. For each request four files will be saved. These
+correspond to input headers, input content, response status and headers,
+and response content.
+
+Poorly Performing Code
+----------------------
+
+The WSGI specification allows any iterable object to be returned as the
+response, so long as the iterable yields string values. This means that
+one can too easily return an object which satisfies this requirement but
+has some sort of performance related issue.
+
+The worst case of this is where, instead of returning a list containing
+strings, a single string is returned. The problem with a string is that
+when it is iterated over, a single character of the string is yielded each
+time. In other words, a single character is written back to the client on
+each loop, with a flush occurring in between to ensure that the character
+has actually been written and isn't just being buffered.
+
+Although for small strings a performance impact may not be noticed, if
+returning large strings the effect on request throughput could be quite
+significant.
+
+Another case which can cause problems is to return a file like object.
+Iterating over a file like object typically returns a single line of the
+file each time. If the file is a line oriented text file where each line
+is of a reasonable length, this may be okay, but if the file is a binary
+file there may not actually be line breaks within the file.
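The following Python 3 sketch shows the alternatives discussed in this section; it is not code from mod_wsgi itself. It serves a file using 'wsgi.file_wrapper' when the server provides it, which is an optional extension defined by the WSGI specification (PEP 3333). Otherwise it falls back to a small iterable that yields fixed-size blocks instead of lines. The block size of 8192 bytes and the names 'ChunkedFile' and 'make_file_application' are illustrative assumptions.

```python
import os

BLOCK_SIZE = 8192  # illustrative chunk size, not a mod_wsgi default


class ChunkedFile:
    """Iterate over a binary file in fixed-size blocks rather than lines."""

    def __init__(self, filelike, block_size=BLOCK_SIZE):
        self.filelike = filelike
        self.block_size = block_size

    def __iter__(self):
        return self

    def __next__(self):
        data = self.filelike.read(self.block_size)
        if not data:
            raise StopIteration
        return data

    def close(self):
        # WSGI servers call close() on the response iterable when done.
        self.filelike.close()


def make_file_application(path, block_size=BLOCK_SIZE):
    """Return a WSGI application that serves the file at path."""

    def application(environ, start_response):
        size = os.path.getsize(path)
        filelike = open(path, 'rb')
        start_response('200 OK', [
            ('Content-Type', 'application/octet-stream'),
            ('Content-Length', str(size)),
        ])
        file_wrapper = environ.get('wsgi.file_wrapper')
        if file_wrapper is not None:
            # Let the server send the file as efficiently as it can.
            return file_wrapper(filelike, block_size)
        return ChunkedFile(filelike, block_size)

    return application
```

Either way, memory use is bounded by the block size rather than by the length of the longest "line" in a binary file. Each chunk is also large enough to avoid the per-character flushing cost of returning a bare string.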
+
+For the case where the file contains many short lines, throughput would be
+affected much like in the case where a string is returned. For the case
+where the file is just binary data, the complete file may be read in on the
+first loop. If the file is large, this could cause a large transient spike
+in memory usage. Once that memory is allocated, it will then be retained by
+the process, albeit that it may be reused by the process at a later point.
+
+Because of these impacts on throughput and memory usage, both cases should
+be avoided. A string should be returned as a single element list. For a
+file like object, the 'wsgi.file_wrapper' extension should be used, or a
+wrapper which suitably breaks the response into chunks.
+
+In order to identify where code may be inadvertently returning such
+iterable types, the following code can be used::
+
+    import types
+
+    import cStringIO
+    import socket
+    import StringIO
+
+    # Response types which are iterable but perform badly when iterated.
+    BAD_ITERABLES = [
+        cStringIO.InputType,
+        socket.SocketType,
+        StringIO.StringIO,
+        types.FileType,
+        types.StringType,
+    ]
+
+    class ValidatingMiddleware:
+
+        def __init__(self, application):
+            self.__application = application
+
+        def __call__(self, environ, start_response):
+            errors = environ['wsgi.errors']
+
+            result = self.__application(environ, start_response)
+
+            # For old style class instances, check the actual class.
+            value = type(result)
+            if value == types.InstanceType:
+                value = result.__class__
+            if value in BAD_ITERABLES:
+                print >> errors, 'BAD ITERABLE RETURNED: ',
+                print >> errors, 'URL=%s ' % environ['REQUEST_URI'],
+                print >> errors, 'TYPE=%s' % value
+
+            return result
+
+    def application(environ, start_response):
+        ...
+
+    application = ValidatingMiddleware(application)
+
+Error Catching Middleware
+-------------------------
+
+Because mod_wsgi only logs details of uncaught exceptions to the Apache
+error log and returns a generic HTTP 500 "Internal Server Error" response,
+if you want the details of any exception to be displayed in the error
+page and be visible from the browser, you will need to use a WSGI error
+catching middleware component.
+
+One example of WSGI error catching middleware is the ErrorMiddleware class
+from Paste.
+
+  * http://www.pythonpaste.org
+
+This class can be configured not only to catch exceptions and present the
+details to the browser in an error page, but also to send the details of
+any errors in email to a designated recipient, or to log the details to an
+alternate log file.
+
+Being able to have error details sent by email would be useful in a
+production environment or where your application is running on a web
+hosting environment and the Apache error logs would not necessarily be
+closely monitored on a day to day basis. That particular feature should
+probably only be enabled once you have some confidence in the application
+though, otherwise you might end up getting inundated with emails.
+
+To use the error catching middleware from Paste you simply need to wrap
+your existing application with it such that it then becomes the top level
+application entry point::
+
+    def application(environ, start_response):
+        status = '200 OK'
+        output = 'Hello World!\n\n'
+
+        response_headers = [('Content-type', 'text/plain'),
+                            ('Content-Length', str(len(output)))]
+        start_response(status, response_headers)
+
+        return [output]
+
+    from paste.exceptions.errormiddleware import ErrorMiddleware
+    application = ErrorMiddleware(application, debug=True)
+
+In addition to displaying information about the Python exception that has
+occurred and the stack traceback, this middleware component will also
+output information about the WSGI environment such that you can see what
+was being passed to the WSGI application. This can be useful if the cause
+of any problem was unexpected values passed in the headers of the HTTP
+request.
+
+Note that error catching middleware is of absolutely no use for trying
+to capture and display in the browser any errors that occur at global scope
+within the WSGI application script when it is being imported. Details of
+any such errors occurring at this point will only be captured in the Apache
+error log files. As much as possible you should avoid performing
+complicated tasks when the WSGI application script file is being imported;
+instead, you should only trigger such actions the first time a request is
+received. By doing this you will be able to capture errors in such
+initialisation code with the error catching middleware.
+
+Also note that the debug mode, whereby details are displayed in the
+browser, should only be used during development and not in a production
+system. This is because the details displayed may be of use to anyone who
+wishes to compromise your site.
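For illustration, the core of what such middleware does can be sketched in a few lines of Python 3. This is not Paste's implementation and offers none of its email or environment reporting. The class name `SimpleErrorMiddleware` is hypothetical. It buffers the whole response so that errors raised while the body is being generated are also caught, which makes it suitable only for small responses. Following the advice above, it only sends the traceback to the browser when `debug` is true.

```python
import sys
import traceback


class SimpleErrorMiddleware:
    """Hypothetical minimal error catching middleware (not Paste's).

    Exceptions raised by the wrapped application, including while its
    response is being generated, are logged to wsgi.errors and turned
    into a 500 response. The traceback is only sent to the browser when
    debug is true.
    """

    def __init__(self, application, debug=False):
        self.application = application
        self.debug = debug

    def __call__(self, environ, start_response):
        try:
            result = self.application(environ, start_response)
            try:
                # Buffer the body so errors raised while iterating are caught too.
                return list(result)
            finally:
                if hasattr(result, 'close'):
                    result.close()
        except Exception:
            details = traceback.format_exc()
            environ['wsgi.errors'].write(details)
            text = details if self.debug else 'Internal Server Error\n'
            body = text.encode('utf-8')
            # Passing exc_info lets start_response replace any headers the
            # application set before failing; nothing has been sent yet.
            start_response('500 Internal Server Error',
                           [('Content-Type', 'text/plain; charset=utf-8'),
                            ('Content-Length', str(len(body)))],
                           sys.exc_info())
            return [body]
```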
+
+Python Interactive Debugger
+---------------------------
+
+Python debuggers such as the one implemented by the 'pdb' module can
+sometimes be useful in debugging Python applications, especially where
+there is a need to single step through code and analyse application state
+at each point. Using such debuggers in web applications is trickier than
+in normal applications though, and especially so with mod_wsgi.
+
+The problem with mod_wsgi is that the Apache web server can create multiple
+child processes to respond to requests. Partly because of this, but also
+just to prevent problems in general, Apache closes off standard input at
+startup. Thus there is no way to interact with the Python debugger module
+if it were used.
+
+Getting around this requires complete control of the Apache web server
+that you are using to host your WSGI application. In particular, it will be
+necessary to shut down the web server and then start up the 'httpd'
+process explicitly in single process debug mode, avoiding the 'apachectl'
+management application altogether::
+
+    $ apachectl stop
+    $ httpd -X
+
+If Apache is normally started as the 'root' user, this will also need to be
+run as the 'root' user, otherwise the Apache web server will not have the
+required permissions to write to its log directories etc.
+
+The result of starting the 'httpd' process in this way will be that the
+Apache web server will run everything in one process rather than using
+multiple processes. Further, it will not close off standard input, thus
+allowing the Python debugger to be used.
+
+Do note though that you cannot use mod_wsgi's ability to run your
+application in a daemon process when doing this. The WSGI application
+must be running within the main Apache process.
+
+To trigger the Python debugger for any call within your code, the following
+customised wrapper for the 'Pdb' class should be used::
+
+    class Debugger:
+
+        def __init__(self, object):
+            self.__object = object
+
+        def __call__(self, *args, **kwargs):
+            import pdb, sys
+            debugger = pdb.Pdb()
+            debugger.use_rawinput = 0
+            debugger.reset()
+            sys.settrace(debugger.trace_dispatch)
+
+            try:
+                return self.__object(*args, **kwargs)
+            finally:
+                debugger.quitting = 1
+                sys.settrace(None)
+
+This might for example be used to wrap the actual WSGI application callable
+object::
+
+    def application(environ, start_response):
+        status = '200 OK'
+        output = 'Hello World!\n\n'
+
+        response_headers = [('Content-type', 'text/plain'),
+                            ('Content-Length', str(len(output)))]
+        start_response(status, response_headers)
+
+        return [output]
+
+    application = Debugger(application)
+
+When a request is now received, the Python debugger will be triggered and
+you can interactively debug your application from the window in which you
+ran the 'httpd' process. For example::
+
+    > /usr/local/wsgi/scripts/hello.py(21)application()
+    -> status = '200 OK'
+
+    (Pdb) list
+     16             finally:
+     17                 debugger.quitting = 1
+     18                 sys.settrace(None)
+     19
+     20     def application(environ, start_response):
+     21  ->     status = '200 OK'
+     22         output = 'Hello World!\n\n'
+     23
+     24         response_headers = [('Content-type', 'text/plain'),
+     25                             ('Content-Length', str(len(output)))]
+     26         start_response(status, response_headers)
+
+    (Pdb) print start_response
+
+    (Pdb) cont
+
+To allow the request to complete, issue the 'cont' command. To abort the
+request, issue the 'quit' command. This will result in a 'BdbQuit'
+exception being raised and an HTTP 500 "Internal Server Error" response
+being returned to the client. To kill off the whole 'httpd' process, after
+having issued 'cont' or 'quit' to exit the debugger, interrupt the process
+using 'CTRL-C'.
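The standard library's `pdb.Pdb` also provides a `runcall()` method, inherited from `bdb.Bdb`, which performs essentially the same reset, trace and cleanup steps as the wrapper above. A Python 3 sketch built on it follows. The class name `RuncallDebugger` is hypothetical. The optional `stdin` and `stdout` arguments, which are passed straight to `pdb.Pdb`, exist only so the wrapper can be driven without a terminal. Be aware that `runcall()` swallows the 'BdbQuit' exception raised by 'quit' and returns None, so the sketch raises the exception again to keep the request-aborting behaviour described above.

```python
import bdb
import pdb


class RuncallDebugger:
    """Hypothetical variant of the Debugger wrapper built on Pdb.runcall()."""

    def __init__(self, wrapped, stdin=None, stdout=None):
        self.wrapped = wrapped
        # None means use the terminal that 'httpd -X' was started from.
        self.stdin = stdin
        self.stdout = stdout

    def __call__(self, *args, **kwargs):
        # nosigint=True leaves SIGINT alone so CTRL-C still stops 'httpd -X'.
        debugger = pdb.Pdb(stdin=self.stdin, stdout=self.stdout,
                           nosigint=True, readrc=False)
        debugger.use_rawinput = False
        result = debugger.runcall(self.wrapped, *args, **kwargs)
        if result is None:
            # runcall() swallows the BdbQuit raised by 'quit' and returns
            # None, which is never a valid WSGI response, so re-raise it.
            raise bdb.BdbQuit()
        return result
```

Used in place of the wrapper above, this would be `application = RuncallDebugger(application)`.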
+
+To see what commands the Python debugger accepts, issue the 'help' command
+and also consult the documentation for the 'pdb' module on the Python web
+site.
+
+Note that the Python debugger expects to be able to write to
+``sys.stdout`` to display information to the terminal. Thus if using a
+Python web framework which replaces ``sys.stdout``, such as web.py, you
+will not be able to use the Python debugger.
+
+Browser Based Debugger
+----------------------
+
+In order to use the Python debugger modules you need to have direct access
+to the host and the Apache web server that is running your WSGI application.
+If your only access to the system is via your web browser, use of the full
+Python debugger is impractical.
+
+One alternative to the Python debugger modules is an extension of the WSGI
+error catching middleware previously described: the EvalException class
+from Paste. It has the error catching abilities of the ErrorMiddleware
+class, but also allows some measure of interactive debugging and
+introspection through the web browser.
+
+As with any WSGI middleware component, using the class entails creating
+a wrapper around the application you wish to debug::
+
+    def application(environ, start_response):
+        status = '200 OK'
+        output = 'Hello World!\n\n'
+
+        response_headers = [('Content-type', 'text/plain'),
+                            ('Content-Length', str(len(output)))]
+        start_response(status, response_headers)
+
+        return [output]
+
+    from paste.evalexception.middleware import EvalException
+    application = EvalException(application)
+
+Like ErrorMiddleware, when an unexpected exception occurs a web page is
+presented which shows the location of the error along with the contents of
+the WSGI application environment. Where EvalException differs, however, is
+that it is possible to inspect the local variables residing within each
+stack frame down to where the error occurred. Further, it is possible to
+enter Python code which can be evaluated within the context of the selected
+stack frame in order to access data or call functions or methods of
+objects.
+
+For this all to work, subsequent requests back to the WSGI application
+must always be handled by the same process in which the error originally
+occurred. With mod_wsgi this presents a bit of a problem, as Apache can
+create and use multiple child processes to handle requests.
+
+Because of this requirement, if you want to use this browser based
+interactive debugger while running your application in embedded mode of
+mod_wsgi, you will need to configure Apache so that it only starts up one
+child process to handle requests and never creates any additional
+processes. The Apache configuration directives required to achieve this are
+as follows::
+
+    StartServers 1
+    ServerLimit 1
+
+The directives must be placed at global scope within the main Apache
+configuration files and will affect the whole Apache web server.
+
+If you are using the worker MPM on a UNIX system, restricting Apache to
+just a single process may not be an issue, at least during development. If
+however you are using the prefork MPM on a UNIX system, you may see issues
+if you are using an AJAX intensive page that relies on being able to
+execute parallel requests, as the Apache web server will only be able to
+handle one request at a time.
+
+If using Apache 2.X on a UNIX system, a better approach is to use daemon
+mode of mod_wsgi and delegate your application to run in a single daemon
+process. This process may be single or multithreaded as per any threading
+requirements of your application.
+
+Whichever configuration is used, the browser based interactive debugger
+should only be used on a development system and should never be deployed
+on a production system or in a web hosting environment. This is because
+the debugger will allow one to execute arbitrary Python code within the
+context of your application from a remote client.
+
+Debugging Crashes With GDB
+--------------------------
+
+In cases where Apache itself crashes for no apparent reason, the above
+techniques are not always particularly useful. This is especially the case
+where the crash occurs in non Python code outside of your WSGI application.
+
+The most common cause of Apache crashing, besides any still latent bugs
+that may exist in mod_wsgi, of which hopefully there aren't any, is shared
+library version mismatches. Another major cause of crashes is third party C
+extension modules for Python which are not compatible with being used in a
+Python sub interpreter which isn't the first interpreter created when
+Python is initialised, or modules which are not compatible with Python sub
+interpreters being destroyed and the module then being used in a new Python
+sub interpreter.
+
+Shared library version mismatches are known to occur, for example, between
+the version of the 'expat' library used by Apache and that embedded within
+the Python 'pyexpat' module. Another is between the version of the MySQL
+client libraries used by PHP and the Python MySQL module.
+
+Both of these can cause crashes where the different components are
+compiled and linked against different versions of the shared library for
+the packages in question. It is vitally important that all packages making
+use of a shared library were compiled against and use the same version of
+that shared library.
+
+Another problematic package is Subversion. In this case there can be
+conflicts between the version of Subversion libraries used by mod_dav_svn
+and the Python Subversion bindings. Certain versions of the Python
+Subversion modules also cause problems because they appear to be
+incompatible with use in a Python sub interpreter which isn't the first
+interpreter created when Python is initialised.
+
+For this latter issue, the sub interpreter problems can often be solved by
+forcing the WSGI application using the Python Subversion modules to run in
+the '%{GLOBAL}' application group. This solution often also resolves issues
+with SWIG generated bindings, especially where the '-thread' option was
+supplied to 'swig' when the bindings were generated.
+
+Whatever the reason, in some cases the only way to determine why Apache or
+Python is crashing is to use a C code debugger such as 'gdb'. Although it
+is possible to attach 'gdb' to a running process, the preferred method for
+using 'gdb' in conjunction with Apache is to run Apache in single process
+debug mode from within 'gdb'.
+
+To do this it is necessary to first shut down Apache. The 'gdb' debugger
+can then be started against the 'httpd' executable and the process started
+up from inside of 'gdb'::
+
+    $ /usr/local/apache/bin/apachectl stop
+    $ sudo gdb /usr/local/apache/bin/httpd
+    GNU gdb 6.1-20040303 (Apple version gdb-384) (Mon Mar 21 00:05:26 GMT 2005)
+    Copyright 2004 Free Software Foundation, Inc.
+    GDB is free software, covered by the GNU General Public License, and you are
+    welcome to change it and/or distribute copies of it under certain conditions.
+    Type "show copying" to see the conditions.
+    There is absolutely no warranty for GDB. Type "show warranty" for details.
+    This GDB was configured as "powerpc-apple-darwin"...Reading symbols for shared
+    libraries ........ done
+
+    (gdb) run -X
+    Starting program: /usr/local/apache/bin/httpd -X
+    Reading symbols for shared libraries .+++ done
+    Reading symbols for shared libraries ..................... done
+
+If Apache is normally started as the 'root' user, this will also need to be
+run as the 'root' user, otherwise the Apache web server will not have the
+required permissions to write to its log directories etc.
+
+If Apache was crashing on startup, you should immediately encounter the
+error; otherwise, use your web browser to access the URL which is causing
+the crash to occur. You can then start debugging why the crash is
+occurring.
+
+Note that you should ensure that you have not assigned your WSGI
+application to run in a mod_wsgi daemon process using the WSGIDaemonProcess
+and WSGIProcessGroup directives. This is because the above procedure will
+only catch crashes which occur when the application is running in embedded
+mode. If it turns out that the application only crashes when run in mod_wsgi
+daemon mode, an alternate method of using 'gdb' will be required.
+
+In this circumstance you should run Apache as normal, but ensure that you
+only create one mod_wsgi daemon process and have it use only a single
+thread::
+
+    WSGIDaemonProcess debug threads=1
+    WSGIProcessGroup debug
+
+If the daemon process is not running as a distinct user which lets you
+tell which process it is, you will also need to ensure that the Apache
+LogLevel directive has been set to 'info'. This ensures that information
+about daemon processes created by mod_wsgi is logged to the Apache error
+log. This is necessary, as you will need to consult the Apache error logs
+to determine the process ID of the daemon process that has been created for
+that daemon process group::
+
+    mod_wsgi (pid=666): Starting process 'debug' with threads=1.
+
+Knowing the process ID, you should then run 'gdb', telling it to attach
+directly to the daemon process::
+
+    $ sudo gdb /usr/local/apache/bin/httpd 666
+    GNU gdb 6.1-20040303 (Apple version gdb-384) (Mon Mar 21 00:05:26 GMT 2005)
+    Copyright 2004 Free Software Foundation, Inc.
+    GDB is free software, covered by the GNU General Public License, and you are
+    welcome to change it and/or distribute copies of it under certain conditions.
+    Type "show copying" to see the conditions.
+    There is absolutely no warranty for GDB. Type "show warranty" for details.
+    This GDB was configured as "powerpc-apple-darwin"...Reading symbols for shared
+    libraries ........ done
+
+    /Users/grahamd/666: No such file or directory.
+    Attaching to program: `/usr/local/apache/bin/httpd', process 666.
+    Reading symbols for shared libraries .+++..................... done
+    0x900c7060 in sigwait ()
+    (gdb) cont
+    Continuing.
+
+Once 'gdb' has been started and attached to the process, initiate the
+request with the URL that causes the application to crash.
+
+Attaching to the running daemon process can also be useful where a single
+request or the whole process appears to hang. In this case one can force a
+stack trace to be output for all running threads to try and determine what
+code is getting stuck. The appropriate gdb command in this instance is
+'thread apply all bt'::
+
+    $ sudo gdb /usr/local/apache-2.2/bin/httpd 666
+    GNU gdb 6.3.50-20050815 (Apple version gdb-477) (Sun Apr 30 20:06:22 GMT 2006)
+    Copyright 2004 Free Software Foundation, Inc.
+    GDB is free software, covered by the GNU General Public License, and you are
+    welcome to change it and/or distribute copies of it under certain conditions.
+    Type "show copying" to see the conditions.
+    There is absolutely no warranty for GDB. Type "show warranty" for details.
+    This GDB was configured as "powerpc-apple-darwin"...Reading symbols
+    for shared libraries ....... done
+
+    /Users/grahamd/666: No such file or directory.
+    Attaching to program: `/usr/local/apache/bin/httpd', process 666.
+    Reading symbols for shared libraries .+++..................... done
+    0x900c7060 in sigwait ()
+    (gdb) thread apply all bt
+
+    Thread 4 (process 666 thread 0xd03):
+    #0  0x9001f7ac in select ()
+    #1  0x004189b4 in apr_pollset_poll (pollset=0x1894650,
+        timeout=-1146117585187099488, num=0xf0182d98, descriptors=0xf0182d9c)
+        at poll/unix/select.c:363
+    #2  0x002a57f0 in wsgi_daemon_thread (thd=0x1889660, data=0x18895e8)
+        at mod_wsgi.c:6980
+    #3  0x9002bc28 in _pthread_body ()
+
+    Thread 3 (process 666 thread 0xc03):
+    #0  0x9001f7ac in select ()
+    #1  0x0041d224 in apr_sleep (t=1000000) at time/unix/time.c:246
+    #2  0x002a2b10 in wsgi_deadlock_thread (thd=0x0, data=0x2aee68) at
+        mod_wsgi.c:7119
+    #3  0x9002bc28 in _pthread_body ()
+
+    Thread 2 (process 666 thread 0xb03):
+    #0  0x9001f7ac in select ()
+    #1  0x0041d224 in apr_sleep (t=299970002) at time/unix/time.c:246
+    #2  0x002a2dec in wsgi_monitor_thread (thd=0x0, data=0x18890e8) at
+        mod_wsgi.c:7197
+    #3  0x9002bc28 in _pthread_body ()
+
+    Thread 1 (process 666 thread 0x203):
+    #0  0x900c7060 in sigwait ()
+    #1  0x0041ba9c in apr_signal_thread (signal_handler=0x2a29a0
+        ) at threadproc/unix/signals.c:383
+    #2  0x002a3728 in wsgi_start_process (p=0x1806418, daemon=0x18890e8)
+        at mod_wsgi.c:7311
+    #3  0x002a6a4c in wsgi_hook_init (pconf=0x1806418, ptemp=0x0,
+        plog=0xc8, s=0x18be8d4) at mod_wsgi.c:7716
+    #4  0x0000a5b0 in ap_run_post_config (pconf=0x1806418, plog=0x1844418,
+        ptemp=0x180e418, s=0x180da78) at config.c:91
+    #5  0x000033d4 in main (argc=3, argv=0xbffffa8c) at main.c:706
+
+When trying to debug such issues, it is suggested that the daemon process
+be made to run with only a single thread. This will reduce how many stack
+traces one needs to analyse.
+
+If you are running with multiple processes within the daemon process group
+and all requests are hanging, you will need to get a snapshot of what is
+happening in all processes in the daemon process group. Because doing this
+by hand will be tedious, it is better to automate it.
+
+To automate capturing the stack traces, first create a file called
+'gdb.cmds' which contains the following::
+
+    set pagination 0
+    thread apply all bt
+    detach
+    quit
+
+This can then be used in conjunction with 'gdb' to avoid needing to enter
+the commands manually. For example::
+
+    $ sudo gdb /usr/local/apache-2.2/bin/httpd -x gdb.cmds -p 666
+
+To automate this further and apply it to all processes in a daemon process
+group, first ensure that daemon processes are named in 'ps' output by using
+the 'display-name' option of the WSGIDaemonProcess directive.
+
+For example, to apply the default naming strategy implemented by mod_wsgi,
+use::
+
+    WSGIDaemonProcess xxx display-name=%{GROUP}
+
+In the output of a BSD derived 'ps' command, this will now show the process
+as being named '(wsgi:xxx)'::
+
+    $ ps -cxo command,pid | grep wsgi
+    (wsgi:xxx)         666
+
+Note that the name may be truncated, as the resultant name can be no longer
+than the original executable path for Apache. You may therefore like to
+name it explicitly::
+
+    WSGIDaemonProcess xxx display-name=(wsgi:xxx)
+
+Having named the processes in the daemon process group, we can now parse the
+output of 'ps' to identify the processes and apply the 'gdb' command script
+to each::
+
+    for pid in `ps -cxo command,pid | awk '{ if ($0 ~ /wsgi:xxx/ && $1 !~ /grep/) print $NF }'`; do sudo gdb -x gdb.cmds -p $pid; done
+
+Substitute the actual name given to the daemon process group using the
+'display-name' option in this command line. That is, change 'wsgi:xxx'
+appropriately.
+
+If you are having problems with processes in daemon process groups hanging,
+you might consider implementing a monitoring system which somehow
+automatically detects when the processes are no longer responding to
+requests and triggers this dump of the stack traces before restarting the
+daemon process group or Apache.
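For anyone who would rather script this in Python, for example as a starting point for the kind of monitoring system just mentioned, the same loop can be sketched in Python 3. The function names are hypothetical. The sketch assumes the BSD style `ps -cxo command,pid` output shown earlier and runs the same `sudo gdb -x gdb.cmds -p PID` command for each matching process.

```python
import subprocess


def matching_pids(ps_output, name):
    """Return the PIDs from 'command pid' lines whose command contains name."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[-1].isdigit():
            continue  # skip the header line and anything malformed
        command = ' '.join(fields[:-1])
        if name in command and not command.startswith('grep'):
            pids.append(int(fields[-1]))
    return pids


def dump_stack_traces(name, cmds='gdb.cmds'):
    """Run 'sudo gdb -x CMDS -p PID' for each process whose name matches."""
    output = subprocess.run(['ps', '-cxo', 'command,pid'],
                            capture_output=True, text=True, check=True).stdout
    for pid in matching_pids(output, name):
        subprocess.run(['sudo', 'gdb', '-x', cmds, '-p', str(pid)])
```

For example, `dump_stack_traces('wsgi:xxx')` would dump the stacks of every process named '(wsgi:xxx)'. Keeping the parsing in a separate function means it can be checked against captured `ps` output without running `gdb`.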
+
+Extracting Python Stack Traces
+------------------------------
+
+Using gdb to get stack traces as described above only gives you information
+about what is happening at the C code level. It will not tell you where
+execution was within the actual Python code. Your only clue is going to be
+where a call out was being made to some distinct C function in a C
+extension module for Python.
+
+One can get stack traces for Python code by using::
+
+    import sys
+    import traceback
+
+    def _stacktraces():
+        code = []
+        for threadId, stack in sys._current_frames().items():
+            code.append("\n# ThreadID: %s" % threadId)
+            for filename, lineno, name, line in traceback.extract_stack(stack):
+                code.append('File: "%s", line %d, in %s' % (filename,
+                        lineno, name))
+                if line:
+                    code.append("  %s" % (line.strip()))
+
+        for line in code:
+            print >> sys.stderr, line
+
+The caveat here obviously is that the process has to still be running. There
+is also the issue of how you trigger that function to dump stack traces for
+executing Python threads.
+
+If the problem is that some request handler threads are stuck, either
+blocked or in an infinite loop, and you want to know what they are doing,
+then so long as there are still some free handler threads and the
+application is still responding to requests, you could trigger the dump
+from a request handler mapped to a specific URL.
+
+This though depends on your application running within a single process,
+because as soon as you have multiple processes there is no guarantee that a
+request will go to the process you want to debug.
+
+A better method therefore is to have a perpetually running background thread
+which monitors for a specific file in the file system. When that file is
+created or its modification time changes, the background thread dumps the
+stack traces for the process.
+
+Sample code which takes this approach is included below. This code could be
+placed temporarily at the end of your WSGI script file if you know you are
+going to need it because of a recurring problem::
+
+    import os
+    import sys
+    import threading
+    import atexit
+    import Queue
+    import traceback
+
+    FILE = '/tmp/dump-stack-traces.txt'
+
+    _interval = 1.0
+
+    _running = False
+    _queue = Queue.Queue()
+    _lock = threading.Lock()
+
+    def _stacktraces():
+        code = []
+        for threadId, stack in sys._current_frames().items():
+            code.append("\n# ProcessId: %s" % os.getpid())
+            code.append("# ThreadID: %s" % threadId)
+            for filename, lineno, name, line in traceback.extract_stack(stack):
+                code.append('File: "%s", line %d, in %s' % (filename,
+                        lineno, name))
+                if line:
+                    code.append("  %s" % (line.strip()))
+
+        for line in code:
+            print >> sys.stderr, line
+
+    # Record the current modification time so that a file which already
+    # exists doesn't trigger a dump as soon as the monitor starts.
+    try:
+        mtime = os.path.getmtime(FILE)
+    except OSError:
+        mtime = None
+
+    def _monitor():
+        global mtime
+
+        while 1:
+            try:
+                current = os.path.getmtime(FILE)
+            except OSError:
+                current = None
+
+            # Dump the stack traces whenever the file is created, removed
+            # or touched.
+            if current != mtime:
+                mtime = current
+                _stacktraces()
+
+            # Go to sleep for the specified interval. An item arriving on
+            # the queue means the process is exiting, so stop monitoring.
+
+            try:
+                return _queue.get(timeout=_interval)
+            except Queue.Empty:
+                pass
+
+    _thread = threading.Thread(target=_monitor)
+    _thread.setDaemon(True)
+
+    def _exiting():
+        try:
+            _queue.put(True)
+        except:
+            pass
+        _thread.join()
+
+    atexit.register(_exiting)
+
+    def _start(interval=1.0):
+        global _interval
+        if interval < _interval:
+            _interval = interval
+
+        global _running
+        _lock.acquire()
+        if not _running:
+            prefix = 'monitor (pid=%d):' % os.getpid()
+            print >> sys.stderr, '%s Starting stack trace monitor.' % prefix
+            _running = True
+            _thread.start()
+        _lock.release()
+
+    _start()
+
+Once your WSGI script file has been loaded, touching the file
+'/tmp/dump-stack-traces.txt' will cause stack traces for active Python
+threads to be output to the Apache error log.
+
+Note that the sample code doesn't deal with the possibility that, with
+multiple processes for the same application, all processes may attempt to
+dump information at the same time. As such, output from multiple processes
+may be interleaved in the Apache error logs.
+
+You may therefore want to modify this code to write each dump to a distinct
+file in a dedicated directory, with the process ID and a date/time included
+in the name of the file. That way each dump will be kept separate.
+
+An example of what one might expect to see from the above code is as
+follows::
+
+    # ProcessId: 666
+    # ThreadID: 4352905216
+    File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 497, in __bootstrap
+      self.__bootstrap_inner()
+    File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
+      self.run()
+    File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 477, in run
+      self.__target(*self.__args, **self.__kwargs)
+    File: "/Library/WebServer/Sites/django-1/htdocs/project.wsgi", line 72, in _monitor
+      _stacktraces()
+    File: "/Library/WebServer/Sites/django-1/htdocs/project.wsgi", line 47, in _stacktraces
+      for filename, lineno, name, line in traceback.extract_stack(stack):
+
+    # ThreadID: 4322832384
+    File: "/Library/WebServer/Sites/django-1/htdocs/project.wsgi", line 21, in application
+      return _application(environ, start_response)
+    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/core/handlers/wsgi.py", line 245, in __call__
+      response = middleware_method(request, response)
+    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/contrib/sessions/middleware.py", line 36, in process_response
+      request.session.save()
+    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/contrib/sessions/backends/db.py", line 63, in save
+      obj.save(force_insert=must_create, using=using)
+    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/base.py", line 434, in save
+      self.save_base(using=using, force_insert=force_insert, force_update=force_update)
+    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/base.py", line 527, in save_base
+      result = manager._insert(values, return_id=update_pk, using=using)
+    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/manager.py", line 195, in _insert
+      return insert_query(self.model, values, **kwargs)
+    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/query.py", line 1479, in insert_query
+      return query.get_compiler(using=using).execute_sql(return_id)
+    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/sql/compiler.py", line 783, in execute_sql
+      cursor = super(SQLInsertCompiler, self).execute_sql(None)
+    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/sql/compiler.py", line 727, in execute_sql
+      cursor.execute(sql, params)
+    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/debug_toolbar/panels/sql.py", line 95, in execute
+      stacktrace = tidy_stacktrace(traceback.extract_stack())
+    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/debug_toolbar/panels/sql.py", line 40, in tidy_stacktrace
+      s_path = os.path.realpath(s[0])
+    File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/posixpath.py", line 355, in realpath
+      if islink(component):
+    File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/posixpath.py", line 132, in islink
+      st = os.lstat(path)
+
+Note that one of the displayed threads will be the thread which is
+dumping the stack traces. That stack trace can obviously be ignored.
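Because the dumping thread's own trace is of no interest, a variant can simply leave it out. The following Python 3 sketch uses a hypothetical function name, `format_stacktraces`. It compares each thread identifier against `threading.get_ident()` and returns the traces as a string rather than printing them, so the caller can decide whether to send them to `sys.stderr` or to a separate file per dump.

```python
import os
import sys
import threading
import traceback


def format_stacktraces(skip_current=True):
    """Return the Python stack of every thread in this process as text.

    The calling thread, which is only busy producing this dump, is
    left out when skip_current is true.
    """
    current = threading.get_ident()
    lines = ['# ProcessId: %s' % os.getpid()]
    for thread_id, frame in sys._current_frames().items():
        if skip_current and thread_id == current:
            continue
        lines.append('\n# ThreadID: %s' % thread_id)
        for filename, lineno, name, line in traceback.extract_stack(frame):
            lines.append('File: "%s", line %s, in %s' % (filename, lineno, name))
            if line:
                lines.append('  %s' % line.strip())
    return '\n'.join(lines)
```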
+
+One could extend the above recipe in more elaborate ways by using a WSGI
+middleware that captures details of each request from the WSGI environment,
+and then also dumping out the URL for the request being handled by each
+thread. This may help in working out whether problems are related to a
+specific URL.
diff --git a/docs/user-guides/file-wrapper-extension.rst b/docs/user-guides/file-wrapper-extension.rst
new file mode 100644
index 0000000..54d434b
--- /dev/null
+++ b/docs/user-guides/file-wrapper-extension.rst
@@ -0,0 +1,217 @@
+======================
+File Wrapper Extension
+======================
+
+The WSGI specification supports an optional feature that can be implemented
+by WSGI adapters for platform specific file handling.
+
+  * http://www.python.org/dev/peps/pep-0333/#optional-platform-specific-file-handling
+
+This allows a WSGI application to return a special object type which wraps
+a Python file like object. If that file like object satisfies certain
+conditions as dictated by a specific platform, the WSGI adapter is allowed
+to return the content of that file in an optimised manner.
+
+The intent of this is to provide better performance for serving up static
+file content than a pure Python WSGI application may itself be able to
+achieve.
+
+Do note however that for the best performance, static files should always
+be served by a web server. In the case of mod_wsgi this means by Apache
+itself rather than mod_wsgi or the WSGI application. Using the web server
+may not always be possible however, such as for files generated on demand.
+
+Example Of Wrapper Usage
+------------------------
+
+A WSGI adapter implementing this extension needs to supply a special
+callable object under the key 'wsgi.file_wrapper' in the 'environ'
+dictionary passed to the WSGI application.
+ +What this callable does will be specific to a WSGI adapter, but it must be +a callable that accepts one required positional parameter, and one optional +positional parameter. The first parameter is the file like object to be +sent, and the second parameter is an optional block size. If the block size +is not supplied, the WSGI adapter chooses whatever value is most +appropriate for the specific hosting mechanism. + +Whatever the WSGI adapter does, the result of the callable must be an +iterable object which can be used directly as the response from the WSGI +application or for passing into any WSGI middleware. Provided the response +content isn't consumed by any WSGI middleware and the iterable object gets +passed through to the WSGI adapter, the WSGI adapter should recognise the +special iterable object and trigger any special handling to return the +response in a more efficient way. + +Because support for this platform specific file handling is optional for +any WSGI adapter, user code should be written so as to be able to cope +with it not existing. + +Using the snippet described in the WSGI specification as a guide, the +WSGI application would be written as follows:: + + def application(environ, start_response): + status = '200 OK' + response_headers = [('Content-type', 'text/plain')] + start_response(status, response_headers) + + filelike = file('/usr/share/dict/words', 'rb') + block_size = 4096 + + if 'wsgi.file_wrapper' in environ: + return environ['wsgi.file_wrapper'](filelike, block_size) + else: + return iter(lambda: filelike.read(block_size), '') + +Note that the file must always be opened in binary mode. If this isn't done, +then on platforms which do CR/LF translation automatically, the translated +form of the content will be returned rather than the original content.
As well as +not being the original content, this can cause problems with calculated +content lengths if the WSGI application returns a 'Content-Length' response +header which has been generated from the actual file size rather than from +the translated content. + +Addition Of Content Length +-------------------------- + +The WSGI specification does not say anything specific about whether a WSGI +adapter should generate a 'Content-Length' response header when the +'wsgi.file_wrapper' extension is used and the WSGI application does not +return one itself. + +For mod_wsgi at least, if the WSGI application doesn't provide a +'Content-Length' response header, mod_wsgi will calculate the response +content length automatically as being from the current file position to the +end of the file. A 'Content-Length' header will then be added to the +response for that value. + +As far as is known, only mod_wsgi automatically supplies a 'Content-Length' +response header in this way. If consistent behaviour is required on all +platforms, the WSGI application should always calculate the length and add +the header itself. + +Existing Content Length +----------------------- + +Where a 'Content-Length' is specified by the WSGI application, mod_wsgi +will honour that content length. That is, mod_wsgi will only return as many +bytes of the file as specified by the 'Content-Length' header. + +This is not a requirement of the WSGI specification, but then this is one +area of the WSGI specification which is arguably broken.
This is evident in +the WSGI specification where it says: + + """transmission should begin at the current position within the "file" + at the time that transmission begins, and continue until the end is + reached""" + +If this interpretation is used, where a WSGI application supplies a +'Content-Length' header and the number of bytes listed is less than the +number of bytes remaining in the file from the current position, then more +bytes than specified by the 'Content-Length' header would be returned. + +Doing this would technically be in violation of the HTTP specification, +which requires that the number of bytes returned be the same as that +specified by the 'Content-Length' response header if supplied. + +Not only is this statement in the WSGI specification arguably wrong, the +example snippet of code which shows how to implement a fallback where +'wsgi.file_wrapper' is not present, i.e.:: + + if 'wsgi.file_wrapper' in environ: + return environ['wsgi.file_wrapper'](filelike, block_size) + else: + return iter(lambda: filelike.read(block_size), '') + +is also wrong. This is because it doesn't restrict the number of bytes +returned to that specified by 'Content-Length'. + +Although mod_wsgi would also discard any bytes in excess of the specified +'Content-Length' for normal iterable content, many other WSGI adapters are +not known to do this and would just pass back all content regardless. The +result of returning content in excess of the specified 'Content-Length' +would be the failure of subsequent requests where the connection is using +keep-alive and requests are being pipelined. + +This problem is also compounded by the WSGI specification not placing any +requirement on WSGI middleware to respect the 'Content-Length' response +header when processing response content. Thus WSGI middleware could also, +in general, generate incorrect response content by virtue of not honouring +the 'Content-Length' response header.
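A fallback that does honour 'Content-Length' has to stop reading once the promised number of bytes has been produced. The following is a minimal sketch of one way of doing that, written for Python 3. The class name 'LimitedFileIterator', the example file path and the policy of only handing the whole file to 'wsgi.file_wrapper' are all assumptions made for illustration. They are not part of mod_wsgi or the WSGI specification.

```python
import io
import os


class LimitedFileIterator:
    """Yield at most 'content_length' bytes from a file like object.

    Hypothetical helper, used as a fallback when 'wsgi.file_wrapper' is not
    available or when only part of a file is to be returned.
    """

    def __init__(self, filelike, content_length, block_size=4096):
        self.filelike = filelike
        self.remaining = content_length
        self.block_size = block_size

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        # Never ask for more than is still owed to the client.
        data = self.filelike.read(min(self.block_size, self.remaining))
        if not data:
            # File ended early; stop rather than yield empty blocks forever.
            raise StopIteration
        self.remaining -= len(data)
        return data

    def close(self):
        # WSGI servers call close() on the response iterable when done.
        self.filelike.close()


def application(environ, start_response):
    # Hypothetical example: return at most the first 1024 bytes of a file.
    filelike = open('/usr/share/dict/words', 'rb')
    size = os.fstat(filelike.fileno()).st_size
    length = min(size, 1024)

    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(length))])

    if length == size and 'wsgi.file_wrapper' in environ:
        # Whole file from position 0, so the wrapper cannot overrun the length.
        return environ['wsgi.file_wrapper'](filelike, 4096)
    return LimitedFileIterator(filelike, length)


if __name__ == '__main__':
    # Demonstration using an in-memory file rather than a real one.
    demo = LimitedFileIterator(io.BytesIO(b'abcdefghij'), 4, block_size=3)
    print(b''.join(demo))  # prints b'abcd'
```

Returning a partial range this way behaves the same on any WSGI adapter, at the cost of not using any platform specific optimisation for that response.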
+ +Overall, although mod_wsgi does the logical and right thing, if you need +to write code which is portable to other WSGI hosting mechanisms, you +should never produce a 'Content-Length' response header which lists a +number of bytes different to that which would be yielded from an iterable +object such as a file like object. Thus platform specific file handling +features cannot portably be used to return a range of bytes from a +file. + +Restrictions On Optimisations +----------------------------- + +Although mod_wsgi always supplies the 'wsgi.file_wrapper' callable object as +part of the WSGI 'environ' dictionary, optimised methods of returning the +file contents as the response are not always used. + +A general restriction is that the file like object must supply both +'fileno()' and 'tell()' methods. These are necessary in order to get access +to the underlying file descriptor and to determine the current position +within the file. + +The file descriptor is needed so as to be able to use the 'sendfile()' +function to return file contents in a more optimal manner. The 'tell()' +method is needed to be able to calculate the response 'Content-Length' and, +where the WSGI application supplies its own 'Content-Length' header, to +validate that there are sufficient bytes in the file. + +Because the 'sendfile()' function is used by Apache to return file contents +in a more optimal manner and because on Windows a Python file object only +provides a Windows file handle and not a file descriptor, no optimisations +are available on the Windows platform. + +The optimisations are also not available when using Apache 1.3. This is +because Apache 1.3 doesn't provide a content handler with access to a +mechanism for optimised sending of file contents. + +Finally, optimisations are not used where the WSGI application is running in +daemon mode.
This is currently disabled because some UNIX platforms do not +appear to support use of the 'sendfile()' function over UNIX sockets and only +support INET sockets. This situation may have changed with recent versions +of Linux at least, but this has yet to be investigated properly. + +Whether or not optimisations are supported, the mod_wsgi 'wsgi.file_wrapper' +extension generally still performs better than if a pure Python iterable +object were used to yield the file contents. + +Note that this all presumes that the iterable object returned by +'wsgi.file_wrapper' is actually passed back to mod_wsgi and is not consumed +by a WSGI middleware. For example, a WSGI middleware which compresses the +response content would consume the response content and return a +different iterable object containing the modified content. In this case +there is no chance for optimisations to be used for returning the file +contents. + +This problem isn't restricted to cases where the response content is +modified in some way; it also extends to any WSGI middleware that wants to +replace the 'close()' method to perform some cleanup actions at the end of +a request. + +This is because, in order to trigger the cleanup actions from the +'close()' method of the iterable object, the middleware has to replace the +existing iterable object with another which wraps the first, with the outer +providing its own 'close()' method. An example of a middleware which +replaces the 'close()' method in this way can be found in +:doc:`../user-guides/registering-cleanup-code`. + +It is thus quite easy for a WSGI application stack to inadvertently and +completely defeat any attempt to return file contents in an optimised way +using the 'wsgi.file_wrapper' extension of WSGI. As such, a real web server +should always be used instead where possible, whether that be a separate +web server, or in the case of mod_wsgi the underlying Apache web server.
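To make the problem concrete, the following Python 3 sketch shows a middleware which wraps the response iterable purely so that a cleanup callback runs from 'close()'. 'DemoFileWrapper' is a hypothetical pure Python stand-in for the object returned by 'wsgi.file_wrapper', since mod_wsgi's own object is implemented in C. The other names are likewise invented for illustration. The key point is that the object handed back to the server is no longer the wrapper the application returned, so an adapter looking for its own wrapper type would not recognise it.

```python
import io


class DemoFileWrapper:
    """Hypothetical stand-in for the object 'wsgi.file_wrapper' returns."""

    def __init__(self, filelike, block_size=8192):
        self.filelike = filelike
        self.block_size = block_size

    def __iter__(self):
        # Read fixed size blocks until read() returns an empty bytes object.
        return iter(lambda: self.filelike.read(self.block_size), b'')

    def close(self):
        self.filelike.close()


class CleanupIterable:
    """Wrap a response iterable so a callback runs when close() is called."""

    def __init__(self, iterable, callback):
        self.iterable = iterable
        self.callback = callback

    def __iter__(self):
        return iter(self.iterable)

    def close(self):
        try:
            # Preserve the original close() behaviour first.
            if hasattr(self.iterable, 'close'):
                self.iterable.close()
        finally:
            self.callback()


def cleanup_middleware(application, callback):
    def wrapper(environ, start_response):
        result = application(environ, start_response)
        # The server now receives a CleanupIterable, not the file wrapper.
        return CleanupIterable(result, callback)
    return wrapper


def demo_application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return DemoFileWrapper(io.BytesIO(b'hello world'))


if __name__ == '__main__':
    wrapped = cleanup_middleware(demo_application, lambda: print('cleanup'))
    result = wrapped({}, lambda status, headers: None)
    print(type(result).__name__)  # prints CleanupIterable
    print(b''.join(result))
    result.close()
```

The middleware is perfectly valid WSGI. It simply hides the file wrapper from the adapter, which is why such a stack silently falls back to unoptimised delivery.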
+ +Where necessary, features of web servers or proxies such as +'X-Accel-Redirect', 'X-Sendfile' or other special purpose headers could be +used. If using mod_wsgi daemon mode and using mod_wsgi version 3.0 or later, +the 'Location' response header can also be used. + diff --git a/docs/user-guides/frequently-asked-questions.rst b/docs/user-guides/frequently-asked-questions.rst new file mode 100644 index 0000000..73cd27e --- /dev/null +++ b/docs/user-guides/frequently-asked-questions.rst @@ -0,0 +1,243 @@ +========================== +Frequently Asked Questions +========================== + +Apache Process Crashes +---------------------- + +*Q*: Why, when the mod_wsgi module is initially being loaded by Apache, do +the Apache server processes crash with a 'segmentation fault'? + +*A*: This is nearly always caused by mod_python also being loaded by +Apache at the same time as mod_wsgi and the Python installation not +providing a shared library, or mod_python having originally been built +against a static Python library. This is especially a problem with older +Linux distributions before they started shipping with Python as a shared +library. + +Further information on these problems can be found in various sections of +:doc:`../user-guides/installation-issues`. + +*Q*: Why, when the first request is made against a WSGI application, does +the Apache server process handling the request crash with a 'segmentation +fault'? + +*A*: This is nearly always caused by a shared library version conflict. +That is, Apache or some Apache module is linked against a different version +of a library than that which is being used by a particular Python module +that the WSGI application makes use of. The most common culprits are the +expat and MySQL libraries, but it can also occur with other shared +libraries.
+ +Another cause of a process crash only upon the first request can be a third +party C extension module for Python which has not been implemented to +work within a secondary Python sub interpreter. The Python bindings for +Subversion are a particular example, with the Python module only working +correctly if the WSGI application is forced to run within the first +interpreter instance created by Python. + +Further information on these problems can be found in various sections of +:doc:`../user-guides/application-issues`. +The problems with the expat library are also covered in more detail in +:doc:`../user-guides/issues-with-expat-library`. + +*Q*: Why am I seeing the error message 'premature end of script headers' in +the Apache error logs? + +*A*: If using daemon mode, this is a symptom of the mod_wsgi daemon process +crashing when handling a request. You would probably also see the message +'segmentation fault'. See the answer to the question about 'segmentation +fault' above. + +This error message can also occur where you haven't configured Apache +correctly and your WSGI script file is being executed as a CGI script +instead. + +HTTP Error Responses +-------------------- + +*Q*: When I try to use mod_wsgi daemon mode I get the error response '503 +Service Temporarily Unavailable'. + +*A*: The standard Apache runtime directory has restricted access and the +Apache child process cannot access the daemon process sockets. You will +need to use the WSGISocketPrefix directive to specify an alternative +location for storing runtime files such as sockets. + +For further information see the section 'Location Of UNIX Sockets' of +:doc:`../user-guides/configuration-issues`. + +*Q*: I am getting an HTTP 500 error response and I can't find any error in +the Apache error logs.
+ +*A*: Some users of mod_wsgi 1.3/2.0 and older minor revisions have found +that mod_wsgi error messages are going missing, or ending up in the main +Apache error log file rather than a virtual host specific error log file. +Specifically, this occurs when the Apache ErrorLog directive is being +used inside of a VirtualHost container. + +It is not known exactly what operating system setup and/or Apache +configuration is the trigger for this problem. To avoid the problem, use +a newer version of mod_wsgi. + +HTTP Error Log Messages +----------------------- + +*Q*: Why do I get the error 'IOError: client connection closed' appearing +in the error logs? + +*A*: This occurs when the HTTP client making the request closes the +connection before the complete response for a request has been written. + +This can occur where a user force-reloads a web page before it has been +completely displayed. It can also occur when using benchmarking tools such +as 'ab', as they over-commit on the number of requests they make when +doing concurrent requests, killing off any extra requests once the required +number has been reached. + +In general this error message can be ignored. + +Application Reloading +--------------------- + +*Q*: Do I have to restart Apache every time I make a change to the Python +code for my WSGI application? + +*A*: If your WSGI application is contained totally within the WSGI script +file and it is that file that you are changing, then no, you don't. In this +case the WSGI script file will be automatically reloaded when a change is +made, provided that script reloading hasn't been disabled. + +If the code you are changing lies outside of the WSGI script file, then what +you need to do will depend on how mod_wsgi is being used. + +If embedded mode of mod_wsgi is being used, the only option is to restart +Apache.
You could set the Apache configuration directive MaxRequestsPerChild +to 1 to force a reload of the application on every request, but this is not +recommended because it will perform as badly as, or worse than, CGI and will +also affect the serving of static files and other applications being hosted +by the same Apache instance. + +If using daemon mode with a single process, you can send a SIGINT signal to +the daemon process using the 'kill' command, or have the application send +the signal to itself when a specific URL is triggered. + +If using daemon mode, with any number of processes, and the process reload +mechanism of mod_wsgi 2.0 has been enabled, then all you need to do is +touch the WSGI script file, thereby updating its modification time, and +the daemon processes will automatically shut down and restart the next time +they receive a request. + +Use of daemon mode and the process reload mechanism is the preferred +way of handling automatic reloading of code after changes. + +More details on how source code reloading works with mod_wsgi can be +found in :doc:`../user-guides/reloading-source-code`. + +*Q*: Why do requests against my application seem to take forever, but +then after a bit they all run much quicker? + +*A*: This is because mod_wsgi by default performs lazy loading of any +application. That is, an application is only loaded the first time that a +request arrives which targets that WSGI application. This means that those +initial requests will incur the overhead of loading all the application code +and performing any startup initialisation. + +This startup overhead can appear to be quite significant, especially if +using Apache prefork MPM and embedded mode. This is because the +startup cost is incurred for each process and with prefork MPM there are +typically a lot more processes than if using worker MPM or mod_wsgi +daemon mode.
Thus, as many requests as there are processes will run +slowly and everything will only run at full speed once all the code has +been loaded. + +Note that if recycling of Apache child processes or mod_wsgi daemon +processes after a set number of requests is enabled, or, for embedded mode, +Apache itself decides to reap any of the child processes, then you can +periodically see these delayed requests occurring. + +Some of the benchmarks for mod_wsgi which have been posted +do not take these startup costs into account and wrongly try to compare +the results to other systems such as fastcgi or proxy based systems where +the application code would be preloaded by default. As a result, mod_wsgi +is painted in a worse light than is warranted. If mod_wsgi is configured +correctly, the results would be better than those benchmarks show. + +For some cases, such as when WSGIScriptAlias is being used, it is actually +possible to preload the application code when the process first starts, +rather than when the first request arrives. To preload an application, see +the WSGIImportScript directive. + +By preloading the application code you would not normally see delays in +requests being handled. The only exception to this would be when running +a single process under mod_wsgi daemon mode and the process is being +restarted after a maximum number of requests has been handled, or explicitly +via one of the means to trigger reloading of application code. Delays here +can be avoided by running at least two processes in the daemon process +group. This is because when one process is restarting, the others can handle +the requests. + +Execution Environment +--------------------- + +*Q*: Why do I get the error 'IOError: sys.stdout access restricted by +mod_wsgi'? + +*A*: A portable WSGI application or application component should not +output anything to standard output. This is because some WSGI hosting +mechanisms use standard output to communicate with the web server.
If +a WSGI application outputs anything to standard output, that output could +thus be interleaved with the response sent back to the client. + +To promote portability of WSGI applications, mod_wsgi by default restricts +direct use of 'sys.stdout' and 'sys.stdin'. Because the 'print' statement +defaults to outputting text to 'sys.stdout', using 'print' for debugging +purposes can cause this error. + +For more details about this issue, including how applications should do +logging and how to disable this restriction, see the section 'Writing To +Standard Output' in :doc:`../user-guides/application-issues` and the section +'Apache Error Log Files' in :doc:`../user-guides/debugging-techniques`. + +*Q*: Can mod_wsgi be used with Python virtual environments created using +Ian Bicking's 'virtualenv' package? + +*A*: Yes. For more details see :doc:`../user-guides/virtual-environments`. + +Access Control Mechanisms +------------------------- + +*Q*: Why are client user credentials not being passed through to the WSGI +application in the 'HTTP_AUTHORIZATION' variable of the WSGI environment? + +*A*: User credentials are not passed by default as doing so is insecure and +could expose a user's password to WSGI applications which shouldn't be +permitted to see it. Such a situation might occur within a corporate +setting where HTTP authentication mechanisms were used to control access to +a corporate web server but it was possible for users to provide their own +web pages. The last thing a system administrator will want is normal users +being able to see other users' passwords. + +As a result, the passing of HTTP authentication credentials must be +explicitly enabled by the web server administrator. This can only be done +using directives placed in the main Apache configuration file. + +For further information see :doc:`../user-guides/access-control-mechanisms` +and the documentation for the WSGIPassAuthorization directive.
+ +*Q*: Is there a way of having a WSGI application provide user authentication +for resources outside of the application such as static files, CGI scripts +or even a distinct application? In other words, something akin to being able +to define access, authentication and authorisation handlers in mod_python? + +*A*: Provided you are using Apache 2.0 or later, version 2.0 of mod_wsgi +provides support for hooking into the Apache access, authentication and +authorisation handler phases. This doesn't allow full control of how the +Apache handler is implemented, but does allow control over how user +credentials are validated, determination of what groups a user is a member +of and whether specific hosts are allowed access. This is generally more +than sufficient and makes the task somewhat simpler than needing to +implement a full handler as in mod_python, as Apache and mod_wsgi do all +the hard work. + +For further information see :doc:`../user-guides/access-control-mechanisms`. diff --git a/docs/user-guides/installation-issues.rst b/docs/user-guides/installation-issues.rst new file mode 100644 index 0000000..51f05af --- /dev/null +++ b/docs/user-guides/installation-issues.rst @@ -0,0 +1,509 @@ +=================== +Installation Issues +=================== + +Although mod_wsgi is not a large package in itself, it depends on both +Apache and Python to get it compiled and installed. Because Apache and +Python are complicated systems in their own right, various problems can +come up during installation of mod_wsgi. These problems can arise for +various reasons, including an incomplete or suboptimal Python installation +or the presence of multiple Python versions. + +The purpose of this document is to capture all the known problems that can +arise regarding installation, including workarounds if available.
+ +If you are having a problem which doesn't seem to be covered by this +document, also make sure you check :doc:`../user-guides/configuration-issues` +and :doc:`../user-guides/application-issues`. + +Missing Python Header Files +--------------------------- + +In order to compile mod_wsgi from source code you must have installed the +full Python distribution, including header files. On a Linux distribution +where binary Python packages are split into a runtime package and a +developer package, the developer package is often not installed by default. +This means that you will be missing the header files required to compile +mod_wsgi from source code. Examples of the error messages you will see +if the developer package is not installed are:: + + mod_wsgi.c:113:20: error: Python.h: No such file or directory + mod_wsgi.c:114:21: error: compile.h: No such file or directory + mod_wsgi.c:115:18: error: node.h: No such file or directory + mod_wsgi.c:116:20: error: osdefs.h: No such file or directory + mod_wsgi.c:119:2: error: #error Sorry, mod_wsgi requires at least Python 2.3.0. + mod_wsgi.c:123:2: error: #error Sorry, mod_wsgi requires that Python supporting thread. + +To remedy the problem, install the developer package for Python +corresponding to the Python runtime package you have installed. The +name of the developer package can vary from one Linux distribution to +another. Normally it has the same name as the Python runtime package with +'-dev' or '-devel' appended to the package name. You will need to look up +the list of available packages in your packaging system to determine the +actual name of the package to install. + +Lack Of Python Shared Library +----------------------------- + +In the optimal case, when mod_wsgi is compiled, the resulting Apache module +should be less than 250 Kbytes in size.
If this is not the case and the +module is over 1MB in size, it indicates that the version of Python being +used was not originally configured so as to produce a Python shared library +and is instead only producing a static library. + +Although the existence of only a static library for Python doesn't normally +cause compilation of mod_wsgi to fail, it does mean that when 'libtool' is +used to generate the mod_wsgi Apache module, it has to embed the actual +static library objects into the Apache module instead of linking against a +shared library. + +The consequence of this is that when the mod_wsgi Apache module is loaded +by Apache, the operating system dynamic linker has to perform address +relocations on the Python library component of the mod_wsgi Apache module. +Because these relocations require memory to be modified, the full Python +library then becomes private memory of the process rather than being shared. + +On a Linux system, the need to perform the address relocations at runtime +will immediately cause each Apache child process to grow in size by +between 1 and 2MB. On a Solaris system, depending on which compiler is +being used and which options, the amount of additional memory used can be +5MB or more. + +To determine whether the compiled mod_wsgi module is making use of a +shared library for Python, many UNIX systems provide the 'ldd' +program.
The output from running this on the 'mod_wsgi.so' file would +be something like:: + + $ ldd mod_wsgi.so + linux-vdso.so.1 => (0x00007fffeb3fe000) + libpython2.5.so.1.0 => /usr/local/lib/libpython2.5.so.1.0 (0x00002adebf94d000) + libpthread.so.0 => /lib/libpthread.so.0 (0x00002adebfcba000) + libdl.so.2 => /lib/libdl.so.2 (0x00002adebfed6000) + libutil.so.1 => /lib/libutil.so.1 (0x00002adec00da000) + libc.so.6 => /lib/libc.so.6 (0x00002adec02dd000) + libm.so.6 => /lib/libm.so.6 (0x00002adec0635000) + /lib64/ld-linux-x86-64.so.2 (0x0000555555554000) + +Note how there is a dependency listed on the '.so' file for Python. If +this is not present then mod_wsgi is using a static Python library. + +Although mod_wsgi will still work when compiled against a version of Python +which only provides a static library, you are highly encouraged to ensure +that your Python installation has been configured and compiled with the +'--enable-shared' option to enable the production and use of a shared +library for Python. + +If rebuilding Python to generate a shared library, do make sure that the +Python shared library, or a symlink to it appears in the Python 'config' +directory of your Python installation. If the shared library doesn't appear +here next to the static version of the library, 'libtool' will not be able +to find it and will still use the static version of the library. It is +understood that the Python build process may not actually do this, so you +may have to do it by hand. + +To check, go to the Python 'config' directory of your Python installation +and do a directory listing:: + + $ ls -las + + 4 drwxr-sr-x 2 root staff 4096 2007-11-29 23:26 . + 20 drwxr-sr-x 21 root staff 20480 2007-11-29 23:26 .. 
+ 4 -rw-r--r-- 1 root staff 2078 2007-11-29 23:26 config.c + 4 -rw-r--r-- 1 root staff 1446 2007-11-29 23:26 config.c.in + 8 -rwxr-xr-x 1 root staff 7122 2007-11-29 23:26 install-sh + 7664 -rw-r--r-- 1 root staff 7833936 2007-11-29 23:26 libpython2.5.a + 40 -rw-r--r-- 1 root staff 38327 2007-11-29 23:26 Makefile + 8 -rwxr-xr-x 1 root staff 7430 2007-11-29 23:26 makesetup + 8 -rw-r--r-- 1 root staff 6456 2007-11-29 23:26 python.o + 20 -rw-r--r-- 1 root staff 17862 2007-11-29 23:26 Setup + 4 -rw-r--r-- 1 root staff 368 2007-11-29 23:26 Setup.config + 4 -rw-r--r-- 1 root staff 41 2007-11-29 23:26 Setup.local + +If you only see a '.a' file for Python library, then either Python wasn't +installed with the shared library, or the shared library was placed +elsewhere. What appears to normally happen is that the shared library is +actually placed in the 'lib' directory two levels above the Python 'config' +directory. In that case you need to create a symlink in the 'config' +directory to where the shared library is actually installed:: + + $ ln -s ../../libpython2.5.so . + +Apart from the additional memory consumption when using a static library, +it is also preferable that a shared library be used where it is possible +that you will upgrade your Python installation to a newer patch revision. +This is because if you upgrade Python to a newer patch revision but do +not recompile mod_wsgi, mod_wsgi will still incorporate the older static +Python library and will not pick up any changes from the newer version +of Python. This will result in undefined behaviour as the Python library +code may not match up with the Python code modules or external modules +in the Python installation. If a Python shared library is used, this will +not be a problem. 
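Whether a given Python installation was configured with '--enable-shared' can also be checked from Python itself. Below is a small sketch that assumes Python 2.7 or 3.x, where the 'sysconfig' module is available. For older versions the same configuration variables can generally be obtained via 'distutils.sysconfig.get_config_var()'. The function name is made up for illustration, and on Windows several of these values may be 'None'.

```python
import sysconfig


def python_library_info():
    """Summarise how this Python's library was built (POSIX builds)."""
    return {
        # 1 when configured with --enable-shared, 0 or None otherwise.
        'enable_shared': bool(sysconfig.get_config_var('Py_ENABLE_SHARED')),
        # Library name used when linking, e.g. a '.so' or a '.a' file.
        'ldlibrary': sysconfig.get_config_var('LDLIBRARY'),
        # Directory the Python library is installed in.
        'libdir': sysconfig.get_config_var('LIBDIR'),
        # The Python 'config' directory discussed above.
        'config_dir': sysconfig.get_config_var('LIBPL'),
    }


if __name__ == '__main__':
    for name, value in python_library_info().items():
        print('%s = %r' % (name, value))
```

Run this with the same 'python' executable that is passed to '--with-python' when configuring mod_wsgi. That way the output describes the installation mod_wsgi will actually be built against.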
+ +Multiple Python Versions +------------------------ + +Where there are multiple versions of Python installed on a system and it is +necessary to ensure that a specific version is used, the '--with-python' +option can be supplied to 'configure' when installing mod_wsgi:: + + ./configure --with-python=/usr/local/bin/python2.5 + +This may be necessary where, for example, the default Python version supplied +with the system is an older version of Python. More specifically, it would +be required where it isn't possible to replace the older version of Python +outright due to operating system management scripts being dependent on the +older version of Python and not working with newer versions of Python. + +Where multiple versions of Python are present and are installed under the +same directory, this should generally be all that is required. If, however, +the newer version of Python you wish to use is in a different location, for +example under '/usr/local', it is possible that when Apache is started it +will not be able to find the Python library files for the version of Python +you wish to use. + +This can occur because the Python library, when initialised, determines where +the Python installation resides by looking through directories specified in +the 'PATH' environment variable for the 'python' executable and using that +as the base location for calculating the installation prefix. Specifically, +the directory above the directory containing the 'python' executable is +taken as being the installation prefix. + +When the Python which should be used is installed in a non standard +location, then that 'bin' directory is unlikely to be in the 'PATH' used by +Apache when it is started. As such, rather than finding +'/usr/local/bin/python' it would instead find '/usr/bin/python' and so use +'/usr' rather than the directory '/usr/local' as the installation prefix.
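As a rough illustration of the heuristic just described, the following Python 3 sketch searches a given 'PATH' value for a 'python' executable and takes the directory above its 'bin' directory as the prefix. It is a simplified model for explanation only, and the function names are hypothetical. The search Python actually performs at startup is more involved than this.

```python
import os
import shutil


def prefix_for_executable(executable):
    """Take the directory above the one holding 'python' as the prefix."""
    return os.path.dirname(os.path.dirname(executable))


def guess_prefix_from_path(path_value):
    """Find 'python' on the given PATH string and derive a prefix from it."""
    executable = shutil.which('python', path=path_value)
    if executable is None:
        return None
    return prefix_for_executable(executable)


if __name__ == '__main__':
    print(prefix_for_executable('/usr/local/bin/python'))  # prints /usr/local
    print(guess_prefix_from_path(os.environ.get('PATH', '')))
```

Given a 'PATH' of '/usr/bin:/bin', as Apache might inherit, this model would settle on '/usr', even though the intended installation lives under '/usr/local'.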
+ +When this occurs, if under '/usr' there was no Python installation of the +same version as the Python which should be used, then normally:: + + 'import site' failed; use -v for traceback + +would appear in the Apache error log file when Python is first being +initialised within Apache. Any attempt to make a request against a WSGI +application would also result in errors, as no modules at all, except for +built-in modules, would be found when an attempt is made to import them. + +Alternatively, if there was a Python installation of the same version, +albeit not the desired installation, then there may be no obvious issues on +startup, but at run time you may find modules cannot be found when being +imported as they are installed into a different location than that which +was being used. Even if an equivalent module is found, it could fail at run +time in subtle ways if the two Python installations, although of the same +version, were compiled in different ways, or if it is a third party module +and the two locations hold different versions of it with different APIs. + +In this situation it will be necessary to explicitly tell mod_wsgi +where the installation of the version of Python which should be used is +located. This can be done using the WSGIPythonHome directive:: + + WSGIPythonHome /usr/local + +The value given to the WSGIPythonHome directive should be a normalised +path corresponding to that defined by the Python 'sys.prefix' variable +for the version of Python being used and passed to the '--with-python' +option when configuring mod_wsgi:: + + >>> import sys + >>> sys.prefix + '/usr/local' + +An alternative, although less desirable, way of achieving this is to set the +'PATH' environment variable in the startup scripts for Apache.
For a standard +Apache installation using ASF structure, this can be done by editing the +'envvars' file in the same directory as the Apache executable and adding the +alternate bin directory to the head of the 'PATH':: + + PATH=/usr/local/bin:$PATH + export PATH + +If there are any concerns over what Python installation directory is being +used and you want to verify what it is, then use a small test WSGI script +which outputs the values of 'sys.prefix' and 'sys.path'. For example:: + + import sys + + def application(environ, start_response): + status = '200 OK' + output = 'Hello World!' + + response_headers = [('Content-type', 'text/plain'), + ('Content-Length', str(len(output)))] + start_response(status, response_headers) + + print >> sys.stderr, 'sys.prefix = %s' % repr(sys.prefix) + print >> sys.stderr, 'sys.path = %s' % repr(sys.path) + + return [output] + +Using ModPython and ModWsgi +--------------------------- + +Using mod_python and mod_wsgi together is no longer supported and recent +versions of mod_wsgi will cause the startup of Apache to be aborted if both +are loaded at the same time. + +Python Patch Level Mismatch +--------------------------- + +If the Python package is upgraded to a newer patch level revision, one +will likely see the following warning messages in the Apache error log +when Python is being initialised:: + + mod_wsgi: Compiled for Python/2.4.1. + mod_wsgi: Runtime using Python/2.4.2. + +The warning is indicating that a newer version of Python is now being +used than what mod_wsgi was originally compiled for. + +This would generally not be a problem provided that both versions of Python +were originally installed with the '--enable-shared' option supplied to +'configure'. If this option is used then the Python library will be linked +in dynamically at runtime and so the upgraded Python library will +automatically be used.
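When these warnings appear, the first thing to establish is whether only the patch level differs. A small helper along the following lines could pull the versions out of the two log lines. It assumes the messages keep the 'Python/X.Y.Z' form shown above. The function names are hypothetical.

```python
import re

# Matches the 'Python/X.Y.Z' fragment of the mod_wsgi start-up warnings.
VERSION_RE = re.compile(r"Python/(\d+)\.(\d+)\.(\d+)")


def parse_version(log_line):
    """Return (major, minor, micro) from a line like 'Compiled for Python/2.4.1.'."""
    match = VERSION_RE.search(log_line)
    if match is None:
        raise ValueError("no Python/X.Y.Z version in %r" % (log_line,))
    return tuple(int(part) for part in match.groups())


def only_patch_level_differs(compiled_line, runtime_line):
    """True when major and minor versions agree but the patch level does not."""
    compiled = parse_version(compiled_line)
    runtime = parse_version(runtime_line)
    return compiled[:2] == runtime[:2] and compiled[2] != runtime[2]
```

For the two messages above the helper returns true, which is the case this section covers. A difference in the major or minor version falls outside it.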
+ +If '--enable-shared' was however not used and the Python library is +therefore embedded into the actual mod_wsgi Apache module, then there is a +risk of undefined behaviour. This is because the version of the Python +library embedded into the mod_wsgi Apache module will be older than the +corresponding Python code modules and extension modules being used from the +Python library directory. + +Thus, if a shared library is not being used for Python it will be necessary +to rebuild mod_wsgi against the newer patch level revision of Python and +reinstall it. + +Mixing 32 Bit And 64 Bit Packages +--------------------------------- + +When attempting to compile mod_wsgi on a Linux system using an X86 64 bit +processor, the following error message can arise:: + + /bin/sh /usr/lib64/apr/build/libtool --silent --mode=link gcc -o \ + mod_wsgi.la -I/usr/local/include/python2.4 -DNDEBUG -rpath \ + /usr/lib64/httpd/modules -module -avoid-version mod_wsgi.lo \ + -L/usr/local/lib/python2.4/config -lpython2.4 -lpthread -ldl -lutil + /usr/bin/ld: /usr/local/lib/python2.4/config/ + libpython2.4.a(abstract.o): relocation R_X86_64_32 against `a local + symbol' can not be used when making a shared object; recompile with -fPIC + /usr/local/lib/python2.4/config/libpython2.4.a: could not read symbols: Bad value + collect2: ld returned 1 exit status + apxs:Error: Command failed with rc=65536 + . + make: *** [mod_wsgi.la] Error 1 + +This error is believed to be the result of the version of Python being used +having been originally compiled for the generic X86 32 bit architecture +whereas mod_wsgi is being compiled for X86 64 bit architecture. The actual +error arises in this case because 'libtool' would appear to be unable to +generate a dynamically loadable module for the X86 64 bit architecture from +an X86 32 bit static library. Alternatively, the problem may be due to 'libtool' +on this platform not being able to create a loadable module from an X86 64 +bit static library in all cases.
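To check which architecture a given Python interpreter was built for, you can ask it directly. The sketch below uses 'struct.calcsize' on a pointer and 'platform.architecture()'. It reports only on the interpreter running it. The static 'libpython' library in the same installation is usually built the same way, but that is an assumption rather than a guarantee.

```python
import platform
import struct


def pointer_bits():
    """Width in bits of a C pointer in the running interpreter (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    # Run this with the same 'python' that was passed to '--with-python'.
    print("pointer size: %d bits" % pointer_bits())
    print("platform.architecture(): %r" % (platform.architecture(),))
```

A result of 32 bits on an X86 64 bit system would point towards the first of the two causes suggested above.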
+ +If the first is the cause, the only solution to this problem is to recompile +Python for the X86 64 bit architecture. When doing this, it is preferable, +and may actually be necessary, to ensure that the '--enable-shared' option +is provided to the 'configure' script for Python when it is being compiled +and installed. + +If rebuilding Python to generate a shared library, do make sure that the +Python shared library, or a symlink to it, appears in the Python 'config' +directory of your Python installation. If the shared library doesn't appear +here next to the static version of the library, 'libtool' will not be able +to find it and will still use the static version of the library. It is +understood that the Python build process may not actually do this, so you +may have to do it by hand. + +If the version of Python being used was compiled for X86 64 bit +architecture and a shared library does exist, but not in the 'config' +directory, then adding the missing symlink may be all that is required. + +Unable To Find Python Shared Library +------------------------------------ + +When mod_wsgi is built against a version of Python providing a shared +library, the Python shared library must be in a directory which is searched +for libraries at runtime by Apache. If this isn't the case the Python +shared library will not be able to be found when loading the mod_wsgi +module into Apache. The error in this situation will be similar to:: + + error while loading shared libraries: libpython2.4.so.1.0: \ + cannot open shared object file: No such file or directory + +A number of alternatives exist for resolving this problem. The preferred +solution would be to copy the Python shared library into a directory which +is searched for dynamic libraries at run time. Directories which would +generally always be searched are '/lib' and '/usr/lib'.
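To find out where the shared library for a particular Python installation is expected to be, you can query its build configuration with the 'sysconfig' module. This module is available in Python 2.7 and later. Older Python 2 versions offer 'distutils.sysconfig.get_config_var' instead. The sketch assumes a UNIX build where the 'Py_ENABLE_SHARED', 'LIBDIR' and 'LDLIBRARY' configuration variables are set. The file actually loaded at run time may carry an extra version suffix, as in the error message above. The helper names are hypothetical.

```python
import os
import sysconfig


def python_shared_library():
    """Expected path of the Python shared library, or None if Python is static."""
    if not sysconfig.get_config_var("Py_ENABLE_SHARED"):
        return None  # built without '--enable-shared' (or not a UNIX build)
    libdir = sysconfig.get_config_var("LIBDIR")
    name = sysconfig.get_config_var("LDLIBRARY")
    if not libdir or not name:
        return None
    return os.path.join(libdir, name)


def in_default_search_dirs(library_path, default_dirs=("/lib", "/usr/lib")):
    """True if the library sits directly in one of the given directories.

    The defaults are the two directories named above. Many 64 bit systems
    also search '/lib64' and '/usr/lib64', so pass those in if relevant.
    """
    directory = os.path.normpath(os.path.dirname(library_path))
    return directory in default_dirs


if __name__ == "__main__":
    path = python_shared_library()
    if path is None:
        print("this Python does not appear to provide a shared library")
    else:
        print("%s (in default search dirs: %s)" % (path, in_default_search_dirs(path)))
```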
+ +For some systems the directory '/usr/local/lib' may also be searched, but +this may depend on the directory having been explicitly added to the +appropriate system file listing the directories to be searched. The name +and location of this configuration file differ between platforms. On Linux +systems it is often called '/etc/ld.so.conf'. If changes are made to the +file on Linux systems the 'ldconfig' command also needs to be run. See the +manual page for 'ldconfig' for further details. + +Rather than changing the system wide list of directories to search for +shared libraries, additional search directories can be specified just +for Apache. On Linux this would entail setting the 'LD_LIBRARY_PATH' +environment variable to include the directory where the Python shared +library is installed. + +The setting and exporting of the environment variable would be placed in +the Apache 'envvars' file, for a standard Apache installation, located in +the same directory as the Apache web server executable. If using a +customised Apache installation, such as on Red Hat, the 'envvars' file may +not exist. In this case you would need to add this into the actual startup +script for Apache. For Red Hat this is '/etc/sysconfig/httpd'. + +A final alternative on some systems is to embed the directory to search +for the Python shared library into the mod_wsgi Apache module itself. On +Linux systems this can be done by setting the environment variable +'LD_RUN_PATH' to the directory containing the Python shared library when +initially building the mod_wsgi source code. + +GNU C Stack Smashing Extensions +------------------------------- + +Various Linux distributions are starting to ship with a version of the GNU +C compiler which incorporates an extension which implements protection for +stack-smashing. In some instances where such a compiler is used to build +mod_wsgi, the module cannot then be loaded by Apache.
The specific +problem is that the symbol ``__stack_chk_fail_local`` is being flagged as +undefined:: + + $ invoke-rc.d apache2 reload + apache2: Syntax error on line 190 of /etc/apache2/apache2.conf: \ + Cannot load /usr/lib/apache2/modules/mod_wsgi.so into server: \ + /usr/lib/apache2/modules/mod_wsgi.so: \ + undefined symbol: __stack_chk_fail_local failed! + invoke-rc.d: initscript apache2, action "reload" failed. + +The exact reason for this is not known but it is speculated to occur +when the system libraries or Apache itself have not been compiled with a +version of the GNU C compiler incorporating the extension. + +To work around the problem, modify the 'Makefile' for mod_wsgi and change +the value of 'CFLAGS' to:: + + CFLAGS = -Wc,-fno-stack-protector + +Perform a 'clean' in the directory and then rebuild and reinstall the +mod_wsgi module. + +Undefined 'forkpty' On Fedora 7 +------------------------------- + +On Fedora 7, the provided binary version of Apache is not linked against +the 'libutil' system library. This causes problems when Python is initialised +and the 'posix' module imported for the first time. This is because the +'posix' module requires functions from 'libutil' but they will not be present. +The error encountered would be similar to:: + + httpd: Syntax error on line 54 of /etc/httpd/conf/httpd.conf: Cannot \ + load /etc/httpd/modules/mod_wsgi.so into server: \ + /etc/httpd/modules/mod_wsgi.so: undefined symbol: forkpty + +This problem can be fixed by adding '-lutil' to the list of libraries to +link mod_wsgi against when it is being built. This can be done by adding +'-lutil' to the 'LDLIBS' variable in the mod_wsgi 'Makefile' after having +run 'configure'.
+ +An alternative method which may work is to edit the 'envvars' file, if it +exists and is used, located in the same directory as the Apache 'httpd' +executable, or the Apache startup script, and add:: + + LD_PRELOAD=/usr/lib/libutil.so + export LD_PRELOAD + +Missing Include Files On SUSE +----------------------------- + +SUSE Linux follows a slightly different convention to other Linux +distributions and has split their Apache "dev" packages in such a way as to +allow packages for different Apache MPMs to be installed at the same time. +Although the resultant mod_wsgi module isn't strictly MPM specific, it +does indirectly include the MPM specific header file "mpm.h". Because the +header file is MPM specific, when configuring mod_wsgi, it is necessary to +reference the version of "apxs" from the MPM specific "dev" package, otherwise +the "mpm.h" header file will not be found at compile time. These errors +are:: + + In file included from mod_wsgi.c:4882: /usr/include/apache2/mpm_common.h:46:17: error: mpm.h: No such file or directory + ... + mod_wsgi.c: In function 'wsgi_set_accept_mutex': + mod_wsgi.c:5200: error: 'ap_accept_lock_mech' undeclared (first use in this function) + mod_wsgi.c:5200: error: (Each undeclared identifier is reported only once + mod_wsgi.c:5200: error: for each function it appears in.) + apxs:Error: Command failed with rc=65536 + +To avoid this problem, when configuring mod_wsgi, it is necessary to use +the "--with-apxs" option to designate that either "apxs2-worker" or +"apxs2-prefork" should be used. Thus:: + + ./configure --with-apxs=/usr/sbin/apxs2-worker + +or:: + + ./configure --with-apxs=/usr/sbin/apxs2-prefork + +Although which is used is not important, since mod_wsgi when compiled isn't +specific to either, it is best to use the one which corresponds to the variant of +Apache being used. + +Apache Maintainer Mode +---------------------- + +When building mod_wsgi from source code, on UNIX systems there should be +minimal, if any, compiler warnings.
If you see a lot of warnings, especially +complaints about ``ap_strstr``, then your Apache installation has been +configured for maintainer mode:: + + mod_wsgi.c: In function 'wsgi_process_group': + mod_wsgi.c:722: warning: passing argument 1 of 'ap_strstr' discards + qualifiers from pointer target type + mod_wsgi.c:740: warning: passing argument 1 of 'ap_strstr' discards + qualifiers from pointer target type + +Specifically, whoever built the version of Apache being used supplied the +option '--enable-maintainer-mode' when configuring Apache prior to +installation. You would be able to tell at the time of compiling mod_wsgi +if this has been done as the option '-DAP_DEBUG' would be supplied to the +compiler when mod_wsgi source code is compiled. + +These warnings can be ignored, but in general you shouldn't run Apache in +maintainer mode. + +A further reason for not running Apache in maintainer mode is that certain +situations can cause Apache to fail an internal assertion check when using +mod_wsgi. The specific error message is:: + + [crit] file http_filters.c, line 346, assertion "readbytes > 0" failed + [notice] child pid 18551 exit signal Aborted (6) + +This occurs because the Apache code has an overly aggressive assertion +check, which is arguably incorrect. This particular assertion check will +fail when a zero length read is performed on the Apache 'HTTP_IN' input +filter. + +This scenario can arise in mod_wsgi due to a workaround in place to get +around a bug in Apache related to the generation of a '100-continue' response. +The Apache bug is described in: + + * https://issues.apache.org/bugzilla/show_bug.cgi?id=38014 + +The scenario can also be triggered as a result of a WSGI application +performing a zero length read on 'wsgi.input'. + +Changes to mod_wsgi are being investigated to see if zero length reads can +be ignored, but due to the workaround for the bug, this would only be able +to be done for Apache 2.2.8 or later.
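On the application side, a zero length read on 'wsgi.input' is easy to avoid: only read when 'CONTENT_LENGTH' says there is something to read. The sketch below is a minimal WSGI application written for Python 3 that does this. 'RecordingInput' is a hypothetical test helper, not part of mod_wsgi. It exists only to confirm which reads were made.

```python
import io


def application(environ, start_response):
    # CONTENT_LENGTH may be missing or empty; treat both as zero.
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0

    # Only touch wsgi.input when there is actually something to read,
    # so a zero length read is never issued.
    body = environ["wsgi.input"].read(length) if length > 0 else b""

    output = ("received %d bytes\n" % len(body)).encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(output)))])
    return [output]


class RecordingInput(io.BytesIO):
    """Hypothetical test helper: records every size passed to read()."""

    def __init__(self, data=b""):
        io.BytesIO.__init__(self, data)
        self.read_sizes = []

    def read(self, size=-1):
        self.read_sizes.append(size)
        return io.BytesIO.read(self, size)
```

With an empty body, 'read()' is never called. With a five byte body, exactly one read of five bytes is made.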
+ +The preferred solution is simply not to use Apache with maintainer mode +enabled for systems where you are running real code. Unfortunately, it +looks like some Linux distributions, e.g. SUSE, accidentally released Apache +binary packages with this mode enabled by default. You should update to an +Apache binary package that doesn't have the mode enabled, or compile from +source code. diff --git a/docs/user-guides/issues-with-expat-library.rst b/docs/user-guides/issues-with-expat-library.rst new file mode 100644 index 0000000..d05aa9b --- /dev/null +++ b/docs/user-guides/issues-with-expat-library.rst @@ -0,0 +1,225 @@ +========================= +Issues With Expat Library +========================= + +This article describes problems caused by mismatches in the version of +the "expat" library embedded into Python and that linked into Apache. Where +incompatible versions are used, Apache can crash as soon as any Python code +module imports the "pyexpat" module. + +Note that this only applies to Python versions prior to Python 2.5. From +Python 2.5 onwards, the copy of the "expat" library bundled in with Python +is name space prefixed, thereby avoiding name clashes with an "expat" library +which has previously been loaded. + +The Dreaded Segmentation Fault +------------------------------ + +When moving beyond creating simple WSGI applications to more complicated +tasks, one can unexpectedly be confronted with Apache crashing. This +generally manifests in no response being returned to the browser when a +request is made. Upon further investigation of the Apache error log file, a +message similar to the following message is found:: + + [notice] child pid 3238 exit signal Segmentation fault (11) + +The change which causes this is the explicit addition of code to import the +Python module "pyexpat", or the importing of any Python module which +indirectly makes use of the "pyexpat" module.
Examples of other modules +which make use of the "pyexpat" module are "xmlrpclib" and modules from the +"PyXML" package. Nearly always, any module which in some way performs +processing of XML data will be affected as most such modules rely on using +the "pyexpat" module in some way. + +Verifying Expat Is The Problem +------------------------------ + +To verify that the "pyexpat" module is the trigger for the problem, +construct a simple WSGI application script file containing:: + + def application(environ, start_response): + status = '200 OK' + output = 'without expat\n' + + response_headers = [('Content-type', 'text/plain'), + ('Content-Length', str(len(output)))] + start_response(status, response_headers) + + return [output] + +Verify that this handler works and the browser receives the response +"without expat". Now modify the handler such that the "pyexpat" module is +being imported. Also change the response so that it is clear that the +modified handler is being used:: + + import pyexpat + + def application(environ, start_response): + status = '200 OK' + output = 'with expat\n' + + response_headers = [('Content-type', 'text/plain'), + ('Content-Length', str(len(output)))] + start_response(status, response_headers) + + return [output] + +Presuming that script reloading is enabled, if upon a request being +received by the WSGI application a successful response of "with expat" is +received by the browser, it would generally indicate that the "pyexpat" +module is not the problem after all. If however no response is received and +the Apache error log records a "Segmentation fault" then the "pyexpat" +module is the trigger. + +Mismatch In Versions Of Expat +----------------------------- + +Segmentation faults can occur with any application where different +components of the application were compiled against different versions of a +common library such as the "expat" library.
The actual cause of the problem +is generally a change in the API of the library, such as changed function +prototypes, changed data types, or changes in structure layouts. In the +case where mod_wsgi is being used, the different components are Apache +and the "pyexpat" module from Python. + +Normally when different components of an application are built, they would +be built against the same version of the library and such problems would +not occur. In the case of the "pyexpat" module however, it is compiled +against a distinct version of the "expat" library which is then embedded +within the "pyexpat" module. At the same time, Apache will be built against +the version of the "expat" library included with the operating system, or +if not a standard part of the operating system, a version which is supplied +with Apache. + +Thus if the version of the "expat" library embedded into the "pyexpat" +module is different to that which Apache was compiled against, the +potential for this problem will exist. Note though that there may not +always be a problem. Whether there is or not will ultimately depend on what +changes were made in the "expat" library between the releases of the +different versions used. It is also possible that how each library version was +compiled could be a factor. + +Expat Version Used By Apache +---------------------------- + +To determine the version of the "expat" library which is used by +Apache, on Linux the "ldd" command can be used. Other operating systems +also provide this program or will generally have some form of equivalent +program. For example, on Mac OS X the command which is run is "otool -L". + +The purpose of these programs is to generate a list of all shared libraries +that an application is linked against. To determine where the "expat" +library being used by Apache is located, it is necessary to run the "ldd" +program on the "httpd" program. On a Linux system, the "httpd" program is +normally located in "/usr/sbin".
Because we are only interested in the +"expat" library, we can ignore anything but the reference to that library:: + + [grahamd@dscpl grahamd]$ ldd /usr/sbin/httpd | grep expat + libexpat.so.0 => /usr/lib/libexpat.so.0 (0xb7e8c000) + +From this output it can be seen that the "httpd" program appears to be +using "/usr/lib/libexpat.so.0". Although some operating systems embed +versioning information in the name of the shared library, it does not +generally indicate the true version of the code base which made up the +library. To obtain this, it is necessary to extract the version information +out of the library. For the "expat" library this can be determined by +searching within the strings contained in the library for a version string +starting with ``expat_``:: + + [grahamd@dscpl grahamd]$ strings /usr/lib/libexpat.so.0 | grep expat_ + expat_1.95.8 + +The version of the "expat" library would therefore appear to be "1.95.8". +Unfortunately though, many operating systems allow the library search path +to be overridden at the point that a program is run using an environment +variable such as "LD_LIBRARY_PATH" and it is quite possible that when +Apache is run, the context in which it is run could result in it finding +the "expat" library in a different location. + +To be absolutely sure, it is necessary to determine which "expat" library +the running copy of Apache used. On Linux and many other operating systems, +this can be determined using the "lsof" command. If this program doesn't +exist, an alternate program which may be available is "ofiles". Either of +these should be run against one of the active Apache processes. If Apache +was originally started as root, the command will also need to be run as +root:: + + [grahamd@dscpl grahamd]$ ps aux | grep http | head -3 + root 3625 0.0 0.6 31068 12836 ? SN Sep25 0:08 /usr/sbin/httpd + apache 24814 0.0 0.7 34196 15604 ? SN 04:11 0:00 /usr/sbin/httpd + apache 24815 0.0 0.7 33924 15916 ?
SN 04:11 0:00 /usr/sbin/httpd + + [grahamd@dscpl grahamd]$ sudo /usr/sbin/lsof -p 3625 | grep expat + httpd 3625 root mem REG 253,0 123552 6409040 + /usr/lib/libexpat.so.0.5.0 + + [grahamd@dscpl grahamd]$ strings /usr/lib/libexpat.so.0.5.0 | grep expat_ + expat_1.95.8 + +Expat Version Used By Python +---------------------------- + +To determine the version of the "expat" library which is embedded in the +Python "pyexpat" module, the module should be imported and the version +information extracted from the module. This can be done by executing +"python" on the command line and entering the necessary code directly:: + + [grahamd@dscpl grahamd]$ python + Python 2.3.3 (#1, May 7 2004, 10:31:40) + [GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2 + Type "help", "copyright", "credits" or "license" for more information. + >>> import pyexpat + >>> pyexpat.version_info + (1, 95, 7) + +Combining Python And Apache +--------------------------- + +When mod_wsgi is used from within Apache, although there is a version of +the "expat" library embedded in the "pyexpat" module, it will effectively +be ignored. This is because Apache has already loaded into memory at +startup the version of the "expat" library which it is linked against. That +this occurs can be seen by using the ability of Linux to forcibly preload a +shared library into a program when run, even though that program wasn't +linked against the library originally. This is achieved using the +"LD_PRELOAD" environment variable:: + + [grahamd@dscpl grahamd]$ LD_PRELOAD=/usr/lib/libexpat.so.0.5.0 python + Python 2.3.3 (#1, May 7 2004, 10:31:40) + [GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2 + Type "help", "copyright", "credits" or "license" for more information.
+ >>> import pyexpat + >>> pyexpat.version_info + (1, 95, 8) + +As can be seen, although the "pyexpat" module for this version of Python +embedded version 1.95.7 of the "expat" library, when the same version of +the "expat" library as was being used by Apache is forcibly loaded into the +program at startup, the version information obtained from the "pyexpat" +module now shows that version 1.95.8 of the "expat" library is being used. + +Luckily in this case, the patch level difference between the two versions +of the "expat" library as used by Python and Apache doesn't cause a +problem. If however the two versions of the "expat" library were +incompatible, one would expect to see the "python" program crash with a +segmentation fault at this point. This therefore can be used as an +alternate way of verifying that it is the "pyexpat" module and more +specifically the version of the "expat" library used, that is causing the +problem. + +Updating System Expat Version +----------------------------- + +Because the version of the "expat" library embedded within the "pyexpat" +module is shipped as source code within the Python distribution, it can be +hard to replace it. The preferred approach to resolving the mismatch is +therefore to replace/update the version of the "expat" library that is used +by Apache. + +Generally the problem occurs where that used by Apache is older than that +which is being used by Python. In that case, the version of the "expat" +library used by Apache should be updated to be the same version as that +embedded within the "pyexpat" module. By using the same version, one would +expect any problems to disappear. If problems still persist, it is possible +that Apache may also need to be recompiled against the same version of the +"expat" library as used in Python. 
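The two version checks described in this article can be combined into a small comparison. The sketch below assumes you have captured, as text, the output of 'strings' run on the "expat" library used by Apache. It compares the 'expat_X.Y.Z' version found there with 'version_info' from 'xml.parsers.expat', the documented interface to the "pyexpat" module. As the LD_PRELOAD example shows, the Python side may report whichever "expat" library is actually loaded. The comparison is therefore most meaningful when run from the command line 'python'. The helper names are hypothetical.

```python
import re
from xml.parsers import expat

# Version marker embedded in the library, e.g. 'expat_1.95.8'.
EXPAT_MARKER_RE = re.compile(r"expat_(\d+)\.(\d+)\.(\d+)")


def expat_version_from_strings(text):
    """Return (major, minor, micro) from 'strings' output, or None if absent."""
    match = EXPAT_MARKER_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def matches_python_expat(strings_output):
    """True if Apache's expat version equals the one Python reports."""
    apache_version = expat_version_from_strings(strings_output)
    return apache_version is not None and apache_version == tuple(expat.version_info)
```

For the sessions shown in this article, 'expat_1.95.8' on the Apache side against (1, 95, 7) on the Python side would compare as unequal, flagging the mismatch.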
diff --git a/docs/user-guides/issues-with-pickle-module.rst b/docs/user-guides/issues-with-pickle-module.rst new file mode 100644 index 0000000..a0d6cad --- /dev/null +++ b/docs/user-guides/issues-with-pickle-module.rst @@ -0,0 +1,175 @@ +========================= +Issues With Pickle Module +========================= + +This article describes various limitations on what data can be stored using +the "pickle" module from a WSGI application script file. This arises due +to the fact that a WSGI application script file is not treated exactly the +same as a standard Python module. + +Note that these limitations only apply to the WSGI application script file +which is the target of the WSGIScriptAlias, AddHandler or Action +directives. Any standard Python modules or packages which make up an +application and which are being imported from directories located in +``sys.path`` using the 'import' statement are not affected. + +Packing And Script Reloading +---------------------------- + +The first source of problems and limitations is how the operation of the +"pickle" serialisation routine is affected by the ability of mod_wsgi to +automatically reload WSGI application script files. The particular types of +data which are known to be affected are function objects and class objects. + +To illustrate the problems and where they arise, consider the following +output from an interactive Python session:: + + >>> import pickle + >>> def a(): pass + ... + >>> pickle.dumps(a) + 'c__main__\na\np0\n.' + >>> z = a + >>> pickle.dumps(z) + 'c__main__\na\np0\n.' + +As can be seen, it is possible to pickle a function object. This can even be +done through a copy of the function object made by reference, although in +that case the pickled object still refers to the original function object.
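Because "pickle" stores a function only as a reference to its module and name, a saved reference breaks once the module is replaced. You can reproduce this outside of mod_wsgi with a short Python 3 sketch. It simulates script reloading by executing the same source into a fresh module object, registered in 'sys.modules' under the same name. 'load_script' and the module name 'demo_script' are hypothetical stand-ins. They do not reflect how mod_wsgi itself is implemented.

```python
import pickle
import sys
import types

SOURCE = "def handler():\n    return 'ok'\n"


def load_script(name, source):
    """Simulate (re)loading a script: each call creates a new module object."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def can_pickle(obj):
    """True if pickle can serialise obj, False if it refuses."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


first = load_script("demo_script", SOURCE)
saved = first.handler               # reference kept outside the module

before_reload = can_pickle(saved)   # demo_script.handler is this very object

load_script("demo_script", SOURCE)  # "reload": a new handler replaces the old one

after_reload = can_pickle(saved)    # the name now resolves to a different object
```

Here 'before_reload' is true and 'after_reload' is false. This mirrors what happens when a reloaded WSGI application script file replaces the function object that a saved reference was copied from.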
+ +If now the original function object is deleted however, and the copy of the +function object is pickled, a failure will occur:: + + >>> del a + >>> pickle.dumps(z) + Traceback (most recent call last): + ... + pickle.PicklingError: Can't pickle : it's not found as __main__.a + +The exception has been raised because the original function object was +deleted from where it was created. It occurs because the copy of the +original function object is still internally identified by the name which +it was assigned at the point of creation. The "pickle" serialisation +routine will check that the original object as identified by the name still +exists. If it doesn't exist, it will refuse to serialise the object. + +Creating a new function object in place of the original function object +does not eliminate the problem, although it does result in a different sort +of exception:: + + >>> def a(): pass + ... + >>> pickle.dumps(z) + Traceback (most recent call last): + ... + pickle.PicklingError: Can't pickle : it's not the same object as __main__.a + +In this case, the "pickle" serialisation routine recognises that "a" exists +but realises that it is a different function object from the one the +"z" copy was originally made from. + +Problems start to occur with mod_wsgi where the object being saved is a +reference to a function object, with that reference held somewhere outside +of the module in which the function was defined. If the module holding the +original function object was actually the WSGI application script file and +it was reloaded because of the automatic script reloading mechanism, an +attempt to pickle the object will fail. This is because the original +function object which had been copied from will have been replaced by a new +one when the script was reloaded. + +This sort of problem, although it will not occur for an instance of a +class, will occur for the class object itself:: + + >>> class B: pass + ...
+ >>> b=B() + >>> pickle.dumps(b) + '(i__main__\nB\np0\n(dp1\nb.' + >>> del B + >>> pickle.dumps(b) + '(i__main__\nB\np0\n(dp1\nb.' + >>> class B: pass + ... + >>> pickle.dumps(B) + 'c__main__\nB\np0\n.' + >>> C = B + >>> pickle.dumps(C) + 'c__main__\nB\np0\n.' + >>> del B + >>> pickle.dumps(C) + Traceback (most recent call last): + ... + pickle.PicklingError: Can't pickle : it's not found as __main__.B + +Note though that for the case of a class instance, an appropriate class +object must exist at the same location when the serialised object is being +restored:: + + >>> class B: pass + ... + >>> b = B() + >>> pickle.loads(pickle.dumps(b)) + <__main__.B instance at 0x41e40> + >>> del B + >>> pickle.loads(pickle.dumps(b)) + Traceback (most recent call last): + ... + AttributeError: 'module' object has no attribute 'B' + +Unpacking And Module Names +-------------------------- + +The second problem derives from how the mod_wsgi script loading mechanism +does not make use of the standard Python module importing mechanism. This +is necessary as the standard Python module importing mechanism requires +every loaded module to have a unique name, with each module residing in +``sys.modules`` under that name. Further, that name must be able to be +used to import the module. + +The mod_wsgi script loading mechanism does not place modules in +``sys.modules`` under their original name so as to allow multiple modules +with the same name in different directories and also to avoid having to use +the ".py" extension for script files. + +The consequence though of modules not residing in ``sys.modules`` under +their original name is that function objects and class objects within such +a module may not be able to be converted back into objects from their +serialised form. This is because "pickle", when it attempts to automatically +import a module which isn't already loaded, will not be +able to load the WSGI application script file.
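You can also see this lookup problem with a Python 3 sketch. "pickle" records only the module name and class name of an instance's class. At load time it imports that module by name. The module name used here is a hypothetical stand-in for a script module that cannot be imported from any file on ``sys.path``. It is not the naming scheme mod_wsgi actually uses.

```python
import pickle
import sys
import types

# Hypothetical name for a module that exists only because a loader created it.
MODULE_NAME = "_hypothetical_script_module"

module = types.ModuleType(MODULE_NAME)
exec("class C(object):\n    pass\n", module.__dict__)
sys.modules[MODULE_NAME] = module

payload = pickle.dumps(module.C())   # stores the module name and 'C'


def try_loads(data):
    """Unpickle data, returning the ImportError instead of raising it."""
    try:
        return pickle.loads(data)
    except ImportError as exc:       # ModuleNotFoundError is a subclass
        return exc


restored = try_loads(payload)        # module is registered: works

del sys.modules[MODULE_NAME]         # e.g. a new process where the script
failed = try_loads(payload)          # was never loaded under that name
```

'restored' is an instance of 'module.C', while 'failed' is an 'ImportError', because nothing on ``sys.path`` provides a module of that name.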
+ +The problem can be seen in the following output from an interactive Python +session:: + + >>> import sys, imp, pickle + >>> m = imp.new_module("m") + >>> sys.modules["m"] = m + >>> exec "class C: pass" in m.__dict__ + >>> c = m.C() + >>> pickle.dumps(c) + '(im\nC\np0\n(dp1\nb.' + >>> pickle.loads(pickle.dumps(c)) + + >>> del sys.modules["m"] + >>> pickle.loads(pickle.dumps(c)) + Traceback (most recent call last): + ... + ImportError: No module named m + +Summary Of Limitations +---------------------- + +Although the first problem described above could be avoided by disabling +script reloading, there is no way to work around the second problem +resulting from how mod_wsgi names modules when stored in ``sys.modules``. + +In practice, what this means is that neither function objects, class +objects nor instances of classes which are defined in a WSGI application +script file should be stored using the "pickle" module. + +In order to ensure that no strange problems at all are likely to occur, it +is suggested that only basic builtin Python types, i.e., scalars, tuples, +lists and dictionaries, be stored using the "pickle" module from a WSGI +application script file. That is, avoid any type of object which has user +defined code associated with it. + +Note that this limitation only applies to the WSGI application script file; +it doesn't apply to normal Python modules imported using the Python "import" +statement. diff --git a/docs/user-guides/processes-and-threading.rst b/docs/user-guides/processes-and-threading.rst new file mode 100644 index 0000000..05813af --- /dev/null +++ b/docs/user-guides/processes-and-threading.rst @@ -0,0 +1,495 @@ +======================= +Processes And Threading +======================= + +Apache can operate in a number of different modes depending on the platform +being used and the way in which it is configured.
This ranges from multiple +processes being used, with only one request being handled at a time within +each process, to one or more processes being used, with concurrent requests +being handled in distinct threads executing within those processes. + +The combinations possible are further increased by mod_wsgi through its +ability to create groups of daemon processes to which WSGI applications can +be delegated. As with Apache itself, each process group can consist of one +or more processes and optionally make use of multithreading. Unlike Apache, +where some combinations are only possible based on how Apache was compiled, +the mod_wsgi daemon processes can operate in any mode based only on runtime +configuration settings. + +This article provides background information on how Apache and mod_wsgi +make use of processes and threads to handle requests, and how Python +sub interpreters are used to isolate WSGI applications. The implications +of the various modes of operation for data sharing are also discussed. + +WSGI Process/Thread Flags +------------------------- + +Apache is not the only web server which can use a combination of processes +and/or threads to handle requests, and the WSGI specification acknowledges +this fact. This acknowledgement is in the +form of specific key/value pairs which must be supplied as part of the WSGI +environment to a WSGI application. The purpose of these key/value pairs is +to indicate whether the underlying web server does or does not make use of +multiple processes and/or multiple threads to handle requests. + +These key/value pairs are defined as follows in the WSGI specification. + +*wsgi.multithread* + This value should evaluate true if the application object may be + simultaneously invoked by another thread in the same process, and + should evaluate false otherwise.
+ +*wsgi.multiprocess* + This value should evaluate true if an equivalent application object may + be simultaneously invoked by another process, and should evaluate false + otherwise. + +A WSGI application which is not written to take into consideration the +different combinations of process and threading models may be neither +portable nor robust when deployed to an alternate hosting +platform or configuration. + +Although you may not need an application or application component to work +under all possible combinations for these values initially, it is highly +recommended that any application component still be designed to work under +any of the different operating modes. If for some reason this cannot be +done due to the very nature of what functionality the component provides, +the component should check whether it is being run within a compatible +configuration and return an HTTP 500 internal server error response if it +isn't. + +An example of a component for which restrictions would apply is one +providing an interactive browser based debugger session in response to an +internal failure of a WSGI application. In this scenario, for the component +to work correctly, subsequent HTTP requests must be processed by the same +process. As such, the component can only be used with a web server that +uses a single process. In other words, the value of 'wsgi.multiprocess' +would have to evaluate as false. + +Multi-Processing Modules +------------------------ + +The main factor which determines how Apache operates is which +multi-processing module (MPM) is built into Apache at compile time. +Although runtime configuration can customise the behaviour of the MPM, the +choice of MPM will dictate whether or not multithreading is available. + +On UNIX based systems, Apache defaults to being built with the 'prefork' +MPM.
If Apache 1.3 is being used this is actually the only choice, but for +later versions of Apache, this can be overridden at build time by supplying +an appropriate value in conjunction with the '--with-mpm' option when +running the 'configure' script for Apache. The main alternative to the +'prefork' MPM which can be used on UNIX systems is the 'worker' MPM. + +If you are unsure which MPM is built into Apache, it can be determined +by running the Apache web server executable with the '-V' option. The +output from running the web server executable with this option will be +information about how it was configured when built:: + + Server version: Apache/2.2.1 + Server built: Mar 4 2007 20:48:15 + Server's Module Magic Number: 20051115:1 + Server loaded: APR 1.2.6, APR-Util 1.2.6 + Compiled using: APR 1.2.6, APR-Util 1.2.6 + Architecture: 32-bit + Server MPM: Worker + threaded: yes (fixed thread count) + forked: yes (variable process count) + Server compiled with.... + -D APACHE_MPM_DIR="server/mpm/worker" + -D APR_HAS_MMAP + -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled) + -D APR_USE_SYSVSEM_SERIALIZE + -D APR_USE_PTHREAD_SERIALIZE + -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT + -D APR_HAS_OTHER_CHILD + -D AP_HAVE_RELIABLE_PIPED_LOGS + -D DYNAMIC_MODULE_LIMIT=128 + -D HTTPD_ROOT="/usr/local/apache-2.2" + -D SUEXEC_BIN="/usr/local/apache-2.2/bin/suexec" + -D DEFAULT_SCOREBOARD="logs/apache_runtime_status" + -D DEFAULT_ERRORLOG="logs/error_log" + -D AP_TYPES_CONFIG_FILE="conf/mime.types" + -D SERVER_CONFIG_FILE="conf/httpd.conf" + +Which MPM is being used can be determined from the 'Server MPM' field. + +On the Windows platform the only available MPM is 'winnt'. + +The UNIX 'prefork' MPM +---------------------- + +This MPM is the most commonly used. It was the only mode of operation +available in Apache 1.3 and is still the default mode on UNIX systems in +later versions of Apache. 
In this configuration, the main Apache process +will at startup create multiple child processes. When a request is received, +it will be accepted and processed by whichever of the child +processes is ready. + +Each child process will only handle one request at a time. If another +request arrives at the same time, it will be handled by the next available +child process. When it is detected that the number of available processes +is running out, additional child processes will be created as necessary. If +a limit is specified as to the number of child processes which may be +created and the limit is reached, and enough requests +arrive to fill up the listener socket queue, the client may instead +receive an error resulting from not being able to establish a connection +with the web server. + +Where additional child processes were created to handle a peak in the +number of concurrent requests, and the number of requests has +subsequently dropped off, the excess child processes may be shut down and +killed off. Child processes may also be shut down and killed off after they +have handled some set number of requests. + +Although threads are not used to service individual requests, this does not +preclude an application from creating separate threads to perform some +specific task. + +For the typical 'prefork' configuration where multiple processes are used, +the WSGI environment key/value pairs indicating how processes and threads +are being used will be as follows. + +*wsgi.multithread* + False + +*wsgi.multiprocess* + True + +Because multiple processes are being used, a WSGI middleware component such +as the interactive browser based debugger described above could not be +used. If such a debugger is required during development and testing of a +WSGI application, the only option is to limit +the number of processes being used.
This could be achieved using the Apache +configuration:: + + StartServers 1 + ServerLimit 1 + +With this configuration, only one process will be started, with no +additional processes ever being created. The WSGI environment key/value +pairs indicating how processes and threads are being used will for this +configuration be as follows. + +*wsgi.multithread* + False + +*wsgi.multiprocess* + False + +In effect, this configuration has the result of serialising all requests +through a single process. This will allow an interactive browser based +debugger to be used, but may prevent more complex WSGI applications which +make use of AJAX techniques from working. This could occur where a web page +initiates a sequence of AJAX requests and expects later requests to be able +to complete while a response for an initial request is still pending. In +other words, problems may occur where requests overlap, as subsequent +requests will not be able to be executed until the initial request has +completed. + +The UNIX 'worker' MPM +--------------------- + +The 'worker' MPM is similar to 'prefork' mode except that within each child +process there will exist a number of worker threads. Instead of a request +only being able to be processed by the next available idle child process +and with the handling of the request being the only thing the child process +is then doing, the request may be processed by a worker thread within a +child process which already has other worker threads handling other +requests at the same time. + +It is possible that a WSGI application could be executed at the same time +from multiple worker threads within the one child process. This means that +multiple worker threads may want to access common shared data at the same +time. As a consequence, such common shared data must be protected in a way +that will allow access and modification in a thread safe manner. 
Normally +this would necessitate the use of some form of synchronisation mechanism to +ensure that only one thread at a time accesses and/or modifies the common +shared data. + +If all worker threads within a child process are busy when a new request +arrives, the request will be processed by an idle worker thread in another +child process. Apache may still create new child processes on demand if +necessary. Apache may also still shut down and kill off excess child +processes, or child processes that have handled more than a set number of +requests. + +Overall, use of the 'worker' MPM will result in fewer child processes needing +to be created, but resource usage of individual child processes will be +greater. On modern computer systems, the 'worker' MPM would in general be +the preferred MPM and should if possible be used in preference to the +'prefork' MPM. + +Although contention for the global interpreter lock (GIL) in Python can +cause issues for pure Python programs, it is not generally as big an issue +when using Python within Apache. This is because all the underlying +infrastructure for accepting requests and mapping the URL to a WSGI +application, as well as the handling of requests against static files, is +performed by Apache in C code. While this code is being executed, the +thread will not be holding the Python GIL, thus allowing a greater level of +overlapping execution where a system has multiple CPUs or CPUs with +multiple cores. + +This ability to make good use of more than one processor, even when using +multithreading, is further enhanced by the fact that Apache uses multiple +processes for handling requests and not just a single process. Thus, even +when there is some contention for the GIL within a specific process, it +doesn't stop other processes from being able to run, as the GIL is only +local to a process and does not extend across processes.
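As a minimal sketch of the per-process synchronisation described above for the 'worker' MPM, the hypothetical application below keeps a module-level hit counter and guards the read-modify-write with a ``threading.Lock``. The lock only coordinates threads within one process; under a multiprocess configuration each child process would still have its own independent counter.

```python
import threading


class HitCounter:
    """Counter shared by all request threads within one process."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def increment(self):
        # Without the lock, two threads could read the same old value
        # and one of the updates would be lost.
        with self._lock:
            self._count += 1
            return self._count

    @property
    def count(self):
        with self._lock:
            return self._count


# Module-level (global) data: one instance per process/sub interpreter.
hits = HitCounter()


def application(environ, start_response):
    number = hits.increment()
    body = ('Request %d handled by this process\n' % number).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

To exercise it outside Apache, several threads can call ``application()`` directly with a stub ``start_response``; the final count should equal the total number of calls.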
+ +For the typical 'worker' configuration where multiple processes and +multiple threads are used, the WSGI environment key/value pairs indicating +how processes and threads are being used will be as follows. + +*wsgi.multithread* + True + +*wsgi.multiprocess* + True + +As with the 'prefork' MPM, the number of processes can be restricted +to just one if required using the configuration:: + + StartServers 1 + ServerLimit 1 + +With this configuration, only one process will be started, with no +additional processes ever being created, but that one process will still +make use of multiple threads. + +The WSGI environment key/value pairs indicating how processes and threads +are being used will for this configuration be as follows. + +*wsgi.multithread* + True + +*wsgi.multiprocess* + False + +Because multiple threads are being used, there would be no problem with +overlapping requests generated by an AJAX based web page. + +The Windows 'winnt' MPM +----------------------- + +On the Windows platform the 'winnt' MPM is the only option available. With +this MPM, multiple worker threads within a child process are used to handle +all requests. The 'winnt' MPM differs from the 'worker' MPM, however, in +that there is only one child process. At no time are additional child +processes created, nor is that one child process shut down and killed off, +except where Apache as a whole is being stopped or restarted. Because there +is only one child process, the maximum number of threads used within it is +typically much greater. + +The WSGI environment key/value pairs indicating how processes and threads +are being used will for this configuration be as follows. + +*wsgi.multithread* + True + +*wsgi.multiprocess* + False + +The mod_wsgi Daemon Processes +----------------------------- + +When using the 'daemon' mode of mod_wsgi, each process group can be +individually configured so as to run in a manner similar to either +'prefork', 'worker' or 'winnt' MPMs for Apache.
This is achieved by +controlling the number of processes and threads within each process +using the 'processes' and 'threads' options of the WSGIDaemonProcess +directive. + +To emulate the same process/thread model as the 'winnt' MPM, that is, +a single process with multiple threads, the following configuration would +be used:: + + WSGIDaemonProcess example threads=25 + +The WSGI environment key/value pairs indicating how processes and threads +are being used will for this configuration be as follows. + +*wsgi.multithread* + True + +*wsgi.multiprocess* + False + +Note that by not specifying the 'processes' option, only a single process is +created within the process group. Although providing 'processes=1' as an +option would also result in a single process being created, this has a +slightly different meaning and so you should only do this if necessary. + +The difference between not specifying the 'processes' option and defining +'processes=1' is that the WSGI environment attribute called +'wsgi.multiprocess' will be set to True when the 'processes' option +is defined, whereas not providing the option at all will result in the +attribute being set to False. This distinction allows for the case where +some form of mapping mechanism might be used to distribute requests across +multiple process groups, in which case it is in effect still a multiprocess +application. + +In other words, if you use the configuration:: + + WSGIDaemonProcess example processes=1 threads=25 + +the WSGI environment key/value pairs indicating how processes and threads +are being used will instead be: + +*wsgi.multithread* + True + +*wsgi.multiprocess* + True + +If you need to ensure that 'wsgi.multiprocess' is False so that interactive +debuggers do not complain about an incompatible configuration, simply do +not specify the 'processes' option and allow the default behaviour of a +single daemon process to apply.
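A component that genuinely needs a single process, such as the interactive debugger mentioned earlier, can perform the compatibility check itself and return an HTTP 500 response rather than misbehaving. The following is a minimal sketch; the wrapper name ``require_single_process`` and the ``hello`` application are hypothetical and not part of mod_wsgi.

```python
def require_single_process(application):
    """Wrap a WSGI application so that it refuses to run when requests
    may be spread across several processes (wsgi.multiprocess is true)."""

    def wrapper(environ, start_response):
        # The WSGI specification requires this key; treat a missing value
        # as "multiprocess" to err on the side of caution.
        if environ.get('wsgi.multiprocess', True):
            body = (b'This application requires a single process '
                    b'configuration (wsgi.multiprocess must be false).\n')
            start_response('500 Internal Server Error', [
                ('Content-Type', 'text/plain'),
                ('Content-Length', str(len(body)))])
            return [body]
        return application(environ, start_response)

    return wrapper


def hello(environ, start_response):
    body = b'Hello World!'
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


application = require_single_process(hello)
```

With ``WSGIDaemonProcess example threads=25`` requests would pass through to the wrapped application, whereas adding ``processes=1`` would cause every request to receive the 500 response.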
+ +To emulate the same process/thread model as the 'worker' MPM, that is, +multiple processes with multiple threads, the following configuration would +be used:: + + WSGIDaemonProcess example processes=2 threads=25 + +The WSGI environment key/value pairs indicating how processes and threads +are being used will for this configuration be as follows. + +*wsgi.multithread* + True + +*wsgi.multiprocess* + True + +To emulate the same process/thread model as the 'prefork' MPM, that is, +multiple processes with only a single thread running in each, the following +configuration would be used:: + + WSGIDaemonProcess example processes=5 threads=1 + +The WSGI environment key/value pairs indicating how processes and threads +are being used will for this configuration be as follows. + +*wsgi.multithread* + False + +*wsgi.multiprocess* + True + +Note that when using mod_wsgi daemon processes, the processes are only used +to execute the Python based WSGI application. The processes are not in any +way used to serve static files, or host applications implemented in other +languages. + +Unlike the normal Apache child processes used when mod_wsgi runs in +'embedded' mode, the number of daemon processes within a process group is +fixed. That is, when the server experiences additional load, no daemon +processes are created beyond the number defined. You should +therefore always plan ahead and make sure the number of processes and +threads defined is adequate to cope with the expected load. + +Sharing Of Global Data +---------------------- + +When the 'winnt' MPM is being used, or the 'prefork' or 'worker' MPM is +forced to run with only a single process, all request handlers within a +specific WSGI application will always be accessing the same global data. +This global data will persist in memory until Apache is shut down or +restarted, or in the case of the 'prefork' or 'worker' MPM until the child +process is recycled due to reaching a predefined request limit.
+ +This ability to access the same global data and for that data to persist +for the lifetime of the child process is not present when either the +'prefork' or 'worker' MPM is used in multiprocess mode. In other words, +where the WSGI environment key/value pair indicating how processes are used +is set to: + +*wsgi.multiprocess* + True + +This is because request handlers can execute within the context of distinct +child processes, each with their own set of global data unique to that +child process. + +The consequence of this is that you cannot assume that separate +invocations of a request handler will have access to the same global data +if that data only resides within the memory of the child process. If some +set of global data must be accessible by all invocations of a handler, that +data will need to be stored in a way that it can be accessed from multiple +child processes. Such sharing could be achieved by storing the global data +within an external database, the filesystem or in shared memory accessible +by all child processes. + +Since the global data will be accessible from multiple child processes at +the same time, there must be adequate locking mechanisms in place to +prevent distinct child processes from trying to modify the same data at the +same time. The locking mechanisms also need to be able to deal with the +case of multiple threads within one child process accessing the global data +at the same time, as will be the case for the 'worker' and 'winnt' MPMs. + +Python Sub Interpreters +----------------------- + +The default behaviour of mod_wsgi is to create a distinct Python sub +interpreter for each WSGI application. Thus, where Apache is being used to +host multiple WSGI applications, a process will contain multiple sub +interpreters. When Apache is run in a mode whereby there are multiple child +processes, each child process will contain its own sub interpreter for each WSGI +application.
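To see these boundaries in practice, a diagnostic application can report which process, thread and sub interpreter handled each request. A minimal sketch follows. It assumes the ``mod_wsgi.process_group`` and ``mod_wsgi.application_group`` keys that mod_wsgi adds to the WSGI environment, and reads them with defaults so the same code also runs under other WSGI servers.

```python
import os
import threading


def application(environ, start_response):
    # Report where this request ran. The mod_wsgi.* keys are read with
    # defaults so the sketch also runs outside mod_wsgi.
    lines = [
        'pid: %d' % os.getpid(),
        'thread: %s' % threading.current_thread().name,
        'process group: %r' % environ.get('mod_wsgi.process_group', ''),
        'application group: %r' % environ.get('mod_wsgi.application_group', ''),
        'multithread: %r' % environ.get('wsgi.multithread'),
        'multiprocess: %r' % environ.get('wsgi.multiprocess'),
    ]
    body = ('\n'.join(lines) + '\n').encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

Refreshing a page served by this application under different configurations shows whether successive requests land in the same process, and which application group (and so which sub interpreter) they run in.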
+ +When a sub interpreter is created for a WSGI application, it will then +normally persist for the life of the process. The only exception to this +is where interpreter reloading is enabled, in which case the sub +interpreter will be destroyed and recreated when the WSGI application +script file has been changed. + +The sub interpreters created for the different WSGI applications each +have their own set of Python modules. In other words, a change to the +global data within the context of one sub interpreter will not be seen from +the sub interpreter corresponding to a different WSGI application. This +will be the case whether or not the sub interpreters are in the same +process. + +This behaviour can be modified and multiple applications grouped together +using the WSGIApplicationGroup directive. Specifically, the directive +indicates that the marked WSGI applications should be run within the +context of a common sub interpreter rather than being run in their own sub +interpreters. By doing this, each WSGI application will then have access +to the same global data. Do note though that this doesn't change the fact +that global data will not be shared between processes. + +The only other way of sharing data between sub interpreters within the one +child process would be to use an external data store, or a third-party +C extension module for Python which allows communication or sharing of +data between multiple interpreters within the same process. + +Building A Portable Application +------------------------------- + +Taking into consideration the different process models used by Apache and the +manner in which interpreters are used by mod_wsgi, building a portable and +robust application requires that the following be satisfied. + +1.
Where shared data needs to be visible to all application instances, +regardless of which child process they execute in, and changes made to the +data by one application must be immediately available to another, including any +executing in another child process, an external data store such as a +database or shared memory must be used. Global variables in normal Python +modules cannot be used for this purpose. + +2. Access to and modification of shared data in an external data store must +be protected so as to prevent multiple threads in the same or different +processes from interfering with each other. This would normally be achieved +through a locking mechanism visible to all child processes. + +3. An application must be re-entrant; simply put, it must be able to be +called concurrently by multiple threads. Data which needs to +exist for the life of the request would need to be stored as stack based +data, thread local data, or cached in the WSGI application environment. +Global variables within the actual application module cannot be used for +this purpose. + +4. Where global data in a module local to a child process is still used, +for example as a cache, access to and modification of the global data must +be protected by local thread locking mechanisms. diff --git a/docs/user-guides/quick-installation-guide.rst b/docs/user-guides/quick-installation-guide.rst new file mode 100644 index 0000000..2922435 --- /dev/null +++ b/docs/user-guides/quick-installation-guide.rst @@ -0,0 +1,227 @@ +======================== +Quick Installation Guide +======================== + +This document describes the steps for installing mod_wsgi on a UNIX system +from the original source code. + +Apache Requirements +------------------- + +Apache 2.0, 2.2 or 2.4 can be used. + +For Apache 2.0, 2.2 and 2.4, the single-threaded 'prefork' or multithreaded +'worker' Apache MPMs can be used. For Apache 2.4, the 'event' MPM can also +be used.
+ +The version of Apache and its runtime libraries must have been compiled with +support for threading. + +On Linux systems, if Apache has been installed from a package repository, +you must have installed the corresponding Apache "dev" package as well. + +For most Linux distributions, the "dev" package for Apache 2.X is +"apache2-dev" where the corresponding Apache package is "apache2". Some +systems, however, distinguish the "dev" package based on which MPM is used by +Apache. As such, it may also be called "apache2-worker-dev" or +"apache2-prefork-dev". If using Apache 2.X, do not mistakenly install +"apache-dev", which is the "dev" package for the Apache 1.3 package called +just "apache". + +Python Requirements +------------------- + +Any Python 2.X version from Python 2.6 onwards can be used. For Python 3.X, +you will need Python 3.3 or later. + +The version of Python being used must have been compiled with support for +threading. + +On Linux systems, if Python has been installed from a package repository, +you must have installed the corresponding Python "dev" package as well. + +Python should preferably be available as a shared library. If it is not, +the base runtime memory usage of mod_wsgi will be greater. + +Unpacking The Source Code +------------------------- + +Source code tarballs can be obtained from: + + * https://github.com/GrahamDumpleton/mod_wsgi/releases + +After having downloaded the tarball for the version you want to use, +unpack it with the command:: + + tar xvfz mod_wsgi-X.Y.tar.gz + +Replace 'X.Y' with the actual version number being used.
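Before running "configure", it may be worth confirming that the Python installation you intend to build against meets the requirements listed under "Python Requirements". The following is a rough sketch, not part of mod_wsgi; run it with the same "python" executable that "configure" will pick up. It relies on the ``Py_ENABLE_SHARED`` build variable exposed by the standard ``sysconfig`` module, which may report 0 for macOS framework builds even though those are shared.

```python
import sys
import sysconfig

# Versions accepted per the requirements above: 2.6+ or 3.3+.
version_ok = sys.version_info >= (3, 3) or (2, 6) <= sys.version_info < (3, 0)

# Thread support: importing threading fails on a build without threads.
try:
    import threading  # noqa: F401
    threads_ok = True
except ImportError:
    threads_ok = False

# Py_ENABLE_SHARED is 1 when Python was configured with --enable-shared.
shared_ok = bool(sysconfig.get_config_var('Py_ENABLE_SHARED'))

print('executable:     %s' % sys.executable)
print('version ok:     %s' % version_ok)
print('threads ok:     %s' % threads_ok)
print('shared library: %s' % shared_ok)
```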
+ +Configuring The Source Code +--------------------------- + +To set up the package ready for building, run the "configure" script from +within the source code directory:: + + ./configure + +The configure script will attempt to identify the Apache installation to +use by searching in various standard locations for the Apache build tools +included with your distribution called "apxs2" or "apxs". If not found in +any of these standard locations, your PATH will be searched. + +Which Python installation to use will be determined by looking for the +"python" executable in your PATH. + +If these programs are not in a standard location and cannot be found in +your PATH, or you wish to use alternate versions to those found, the +"--with-apxs" and "--with-python" options can be used in conjunction with +the "configure" script:: + + ./configure --with-apxs=/usr/local/apache/bin/apxs \ + --with-python=/usr/local/bin/python + +On some Linux distributions, such as SUSE and CentOS, it will be necessary +to use the "--with-apxs" option and specify either "/usr/sbin/apxs2-worker" +or "/usr/sbin/apxs2-prefork". This is necessary as these Linux distributions +allow installation of "dev" packages for both Apache MPM variants at the +same time, whereas other Linux distributions do not. + +If you have multiple versions of Python installed and you are not using +the default one, you may have to ensure that the PATH inherited +by Apache when it runs results in Apache finding the +alternate version. Alternatively, the WSGIPythonHome directive should +be used to specify the exact location of the Python installation +corresponding to the version of Python compiled against. If this is not +done, the version of Python running within Apache may attempt to use the +Python modules from the wrong version of Python.
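One way to find a suitable value for the WSGIPythonHome directive is to ask the Python installation that mod_wsgi was compiled against for its installation prefix, ``sys.prefix``. This is an assumption about the usual choice rather than something stated above, and the helper name below is hypothetical. Run it with that same "python" executable:

```python
import sys


def python_home_directive():
    """Hypothetical helper: build a WSGIPythonHome line from sys.prefix."""
    # sys.prefix is the installation prefix of the running interpreter,
    # e.g. /usr/local for a python installed as /usr/local/bin/python.
    return 'WSGIPythonHome %s' % sys.prefix


print(python_home_directive())
```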
+ +Building The Source Code +------------------------ + +Once the package has been configured, it can be built by running:: + + make + +If the mod_wsgi source code does not build successfully, see: + + * :doc:`../user-guides/installation-issues` + +If successful, the only product of the build process that needs to be +installed is the Apache module itself. There are no separate Python code +files as everything is done within C code compiled into the Apache module. + +To install the Apache module into the standard location for Apache modules +as dictated by Apache for your installation, run:: + + make install + +Installation should be done as the 'root' user, or via the 'sudo' command if +appropriate. + +If you want to install the Apache module in a non-standard location +dictated by how your operating system distribution structures the +configuration files and modules for Apache, you will need to copy the file +manually into place. + +If installing the Apache module by hand, the file is called 'mod_wsgi.so'. +The compiled Apache module can be found in the ".libs" subdirectory. The +name of the file should be kept the same when copied into its appropriate +location. + +Loading Module Into Apache +-------------------------- + +Once the Apache module has been installed into your Apache installation's +module directory, it is still necessary to configure Apache to actually +load the module. + +Exactly how this is done and in which of the main Apache configuration +files it should be placed is dependent on which version of Apache you are +using and may also be influenced by how your operating system's Apache +distribution has organised the Apache configuration files. You may +therefore need to check any documentation for your operating system to +see in what way the procedure may need to be modified.
+ +In the simplest case, all that is required is to add a line of the form:: + + LoadModule wsgi_module modules/mod_wsgi.so + +into the main Apache "httpd.conf" configuration file at the same point that +other Apache modules are being loaded. The last argument to the directive +should either be an absolute path to where the mod_wsgi module file is +located, or a path expressed relative to the root of your Apache +installation. If you used "make" to install the package, see where it +copied the file to work out what to set this value to. + +Restart Apache Web Server +------------------------- + +Having added the required directives, you should perform a restart of +Apache to check everything is okay. If you are using an unmodified Apache +distribution from the Apache Software Foundation, a restart is performed +using the 'apachectl' command:: + + apachectl restart + +If you see any sort of problem, or if you are upgrading from an older +version of mod_wsgi, it is recommended you actually stop and then start +Apache instead:: + + apachectl stop + apachectl start + +Note that on many Linux distributions where Apache is prepackaged, the +Apache software has been modified and as a result the 'apachectl' command +may not work properly or the command may not be present. On these systems, +you will need to use whatever is the sanctioned method for restarting +system services. + +This may be via an 'init.d' script:: + + /etc/init.d/httpd stop + /etc/init.d/httpd start + +or via some special service maintenance script. + +On Debian-derived distributions, restarting Apache is usually done via the +'invoke-rc.d' command:: + + invoke-rc.d apache2 stop + invoke-rc.d apache2 start + +On RedHat-derived distributions, restarting Apache is usually done via the +'service' command:: + + service httpd stop + service httpd start + +In nearly all cases, the scripts used to restart Apache will need to be run +as the 'root' user or via 'sudo'.
+ +In general, for any system where you are using a prepackaged version of +Apache, it is wise to always check the documentation for that package or +system to determine the correct way to restart the Apache service. This is +because such systems often wrap 'apachectl', or replace it, with a +script which performs additional actions. + +If all is okay, you should see a line of the form:: + + Apache/2.4.8 (Unix) mod_wsgi/4.4.21 Python/2.7 configured + +in the Apache error log file. + +Cleaning Up After Build +----------------------- + +To clean up after installation, run:: + + make clean + +If you need to build the module for a different version of Apache, you +should run:: + + make distclean + +and then rerun "configure" against the alternate version of Apache before +attempting to run "make" again. diff --git a/docs/user-guides/registering-cleanup-code.rst b/docs/user-guides/registering-cleanup-code.rst new file mode 100644 index 0000000..81889e5 --- /dev/null +++ b/docs/user-guides/registering-cleanup-code.rst @@ -0,0 +1,144 @@ +======================== +Registering Cleanup Code +======================== + +This document describes how to register callbacks to perform +cleanup tasks at the end of a request and when an application process is +being shut down. + +Cleanup At End Of Request +------------------------- + +To perform a cleanup task at the end of a request, a couple of different +approaches can be used depending on the requirements. The first approach +entails wrapping the calling of a WSGI application within a Python 'try' +block, with the cleanup code being triggered from the 'finally' block:: + + def _application(environ, start_response): + status = '200 OK' + output = b'Hello World!'
+ + response_headers = [('Content-type', 'text/plain'), + ('Content-Length', str(len(output)))] + start_response(status, response_headers) + + return [output] + + def application(environ, start_response): + try: + return _application(environ, start_response) + finally: + # Perform required cleanup task. + ... + +This might even be factored into a convenient WSGI middleware component:: + + class ExecuteOnCompletion1: + def __init__(self, application, callback): + self.__application = application + self.__callback = callback + def __call__(self, environ, start_response): + try: + return self.__application(environ, start_response) + finally: + self.__callback(environ) + +The WSGI environment passed in the 'environ' argument to the application +can also be supplied to the cleanup callback, as shown, in case it needs +to look at any configuration information or information passed back in the +environment from the application. + +The application would then be replaced with an instance of this class +initialised with a reference to the original application and a suitable +cleanup function:: + + def cleanup(environ): + # Perform required cleanup task. + ... + + application = ExecuteOnCompletion1(_application, cleanup) + +Using this approach, the cleanup function will actually be called prior to +the response content being consumed by mod_wsgi and written back to the +client. As such, it is probably only suitable where a complete response is +returned as a list of byte strings. It would not be suitable where a generator +is being returned, as the cleanup would be called prior to any strings being +consumed from the generator. This would be problematic where the cleanup +task was to close or delete some resource from which the generator was +obtaining the response content.
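The ordering problem can be seen outside of mod_wsgi with a small Python 3 sketch. It repeats ``ExecuteOnCompletion1`` so that it is self-contained, wraps a hypothetical ``generator_app`` whose response body is a generator, and stands in for the server by simply iterating over the result:

```python
events = []


class ExecuteOnCompletion1:
    def __init__(self, application, callback):
        self.__application = application
        self.__callback = callback

    def __call__(self, environ, start_response):
        try:
            return self.__application(environ, start_response)
        finally:
            self.__callback(environ)


def generator_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])

    def body():
        # Nothing in here runs until the server starts iterating.
        for chunk in (b'one\n', b'two\n'):
            events.append('yield ' + chunk.decode('ascii').strip())
            yield chunk

    return body()


def cleanup(environ):
    events.append('cleanup')


app = ExecuteOnCompletion1(generator_app, cleanup)
result = app({}, lambda status, headers: None)
for chunk in result:  # the server consuming the response
    pass

print(events)  # ['cleanup', 'yield one', 'yield two']
```

The cleanup entry appears first because the ``finally`` clause runs as soon as the application returns the (not yet started) generator.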
+
+In order to have the cleanup task only executed after the complete
+response has been consumed, it would be necessary to wrap the result of the
+application within an instance of a purpose-built, generator-like object.
+This object needs to yield each item from the response in turn, and when
+this object is cleaned up by virtue of its 'close()' method being called,
+it should in turn call 'close()' on the result returned from the
+application if necessary, and then call the supplied cleanup callback::
+
+    class Generator2:
+        def __init__(self, iterable, callback, environ):
+            self.__iterable = iterable
+            self.__callback = callback
+            self.__environ = environ
+        def __iter__(self):
+            for item in self.__iterable:
+                yield item
+        def close(self):
+            try:
+                if hasattr(self.__iterable, 'close'):
+                    self.__iterable.close()
+            finally:
+                self.__callback(self.__environ)
+
+    class ExecuteOnCompletion2:
+        def __init__(self, application, callback):
+            self.__application = application
+            self.__callback = callback
+        def __call__(self, environ, start_response):
+            try:
+                result = self.__application(environ, start_response)
+            except:
+                self.__callback(environ)
+                raise
+            return Generator2(result, self.__callback, environ)
+
+Note that for a successfully completed request, since the cleanup task is
+executed after the complete response has been written back to the client,
+any error occurring in the cleanup task will leave no evidence in the
+response seen by the client. As far as the client is concerned, everything
+will look okay. The only indication of an error will be found in the
+Apache error log.
+
+Neither of the solutions above is specific to mod_wsgi, and both should
+work with any WSGI hosting solution which complies with the WSGI
+specification.
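Since these wrappers do not depend on mod_wsgi, their behaviour can be checked outside Apache. The sketch below is a Python 3 transcription of 'Generator2' and 'ExecuteOnCompletion2' (with 'except BaseException:' written out in place of the bare 'except:'), driven by 'run()', a hypothetical and simplified stand-in for a WSGI server which iterates the result and then calls 'close()'. 'generator_app', 'failing_app' and 'cleanup' are illustrative helpers only. It checks both that the cleanup is deferred until the body has been consumed, and that it still runs once if the application raises.

```python
events = []

class Generator2:
    def __init__(self, iterable, callback, environ):
        self.__iterable = iterable
        self.__callback = callback
        self.__environ = environ

    def __iter__(self):
        for item in self.__iterable:
            yield item

    def close(self):
        try:
            if hasattr(self.__iterable, 'close'):
                self.__iterable.close()
        finally:
            self.__callback(self.__environ)

class ExecuteOnCompletion2:
    def __init__(self, application, callback):
        self.__application = application
        self.__callback = callback

    def __call__(self, environ, start_response):
        try:
            result = self.__application(environ, start_response)
        except BaseException:
            # No iterable to wrap, so clean up immediately.
            self.__callback(environ)
            raise
        return Generator2(result, self.__callback, environ)

def generator_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])

    def body():
        events.append('body produced')
        yield b'Hello World!'

    return body()

def failing_app(environ, start_response):
    raise RuntimeError('application failed')

def cleanup(environ):
    events.append('cleanup called')

def run(app):
    # Hypothetical, simplified server loop: iterate, then close().
    result = app({}, lambda status, headers: None)
    try:
        for chunk in result:
            events.append('chunk written')
    finally:
        if hasattr(result, 'close'):
            result.close()

run(ExecuteOnCompletion2(generator_app, cleanup))
success_events = list(events)
print(success_events)  # ['body produced', 'chunk written', 'cleanup called']

# When the application raises, the cleanup still runs exactly once.
events.clear()
try:
    run(ExecuteOnCompletion2(failing_app, cleanup))
except RuntimeError:
    pass
print(events)  # ['cleanup called']
```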
+
+Cleanup On Process Shutdown
+---------------------------
+
+To perform a cleanup task on shutdown of either an Apache child process
+when using 'embedded' mode of mod_wsgi, or of a daemon process when using
+'daemon' mode of mod_wsgi, the standard Python 'atexit' module can be used::
+
+    import atexit
+
+    def cleanup():
+        # Perform required cleanup task.
+        ...
+
+    atexit.register(cleanup)
+
+Such a registered cleanup function will also be called if the 'Interpreter'
+reload mechanism is enabled and the Python sub interpreter in which the
+cleanup function was registered is destroyed.
+
+Note that although mod_wsgi will ensure that cleanup functions registered
+using the 'atexit' module are called correctly, this solution may not be
+portable to all WSGI hosting solutions.
+
+Also be aware that although one can register a cleanup function to be
+called on process shutdown, there is no absolute guarantee that it will be
+called. This is because a process may crash, or it may be forcibly killed
+off by Apache if it takes too long to shut down normally. As a result, an
+application should not depend on cleanup functions being called on process
+shutdown, and must have some means of detecting an abnormal shutdown when
+it is next started up and of recovering from it automatically.
diff --git a/docs/user-guides/reloading-source-code.rst b/docs/user-guides/reloading-source-code.rst
new file mode 100644
index 0000000..199cd30
--- /dev/null
+++ b/docs/user-guides/reloading-source-code.rst
@@ -0,0 +1,483 @@
+=====================
+Reloading Source Code
+=====================
+
+This document contains information about mechanisms available in mod_wsgi
+for automatic reloading of source code when an application is changed and
+any issues related to those mechanisms.
+
+Embedded Mode Vs Daemon Mode
+----------------------------
+
+What is achievable in the way of automatic source code reloading depends on
+which mode your WSGI application is running in.
+
+If your WSGI application is running in embedded mode, then what happens
+when you make code changes is largely dictated by how Apache works, as it
+controls the processes handling requests. In general, if using embedded
+mode you will have no choice but to manually restart Apache in order for
+code changes to be used.
+
+If using daemon mode, because mod_wsgi directly manages the processes which
+handle requests and in which your WSGI application runs, there is more
+scope for performing automatic source code reloading.
+
+As a consequence, it is important to understand what mode your WSGI
+application is running in.
+
+If you are running on Windows, are using Apache 1.3, or have not used the
+WSGIDaemonProcess/WSGIProcessGroup directives to delegate your WSGI
+application to a mod_wsgi daemon mode process, then you will be using
+embedded mode.
+
+If you are not sure whether you are using embedded mode or daemon mode,
+then temporarily substitute your WSGI application entry point with::
+
+    def application(environ, start_response):
+        status = '200 OK'
+
+        if not environ['mod_wsgi.process_group']:
+            output = 'EMBEDDED MODE'
+        else:
+            output = 'DAEMON MODE'
+
+        response_headers = [('Content-Type', 'text/plain'),
+                            ('Content-Length', str(len(output)))]
+
+        start_response(status, response_headers)
+
+        return [output]
+
+If your WSGI application is running in embedded mode, this will output
+'EMBEDDED MODE' to the browser. If your WSGI application is running in
+daemon mode, this will output 'DAEMON MODE' to the browser.
+
+Reloading In Embedded Mode
+--------------------------
+
+However you have configured Apache to mount your WSGI application, you will
+have a script file which contains the entry point for the WSGI application.
+This script file is not treated exactly like a normal Python module and
+need not even use a '.py' extension. It is in fact preferable that a '.py'
+extension not be used, for reasons described below.
+
+For embedded mode, one of the properties of the script file is that by
+default it will be reloaded whenever the file is changed. The primary
+intent in reloading the file is to provide a second chance at getting any
+configuration in it, and the mapping to the application, correct. If the
+script weren't reloaded in this way, you would need to restart Apache even
+for a trivial change to the script file.
+
+Do note though that this script reloading mechanism is not intended as a
+general purpose code reloading mechanism. Only the script file itself is
+reloaded; no other Python modules are reloaded. This means that if you
+modify normal Python code files which are used by your WSGI application,
+you will need to trigger a restart of Apache. For example, if you are using
+Django in embedded mode and need to change your 'settings.py' file, you
+would still need to restart Apache.
+
+The fact that only the script file, and not the whole process, is reloaded
+also has a number of implications and imposes certain restrictions on what
+code in the script file can do or how it should be implemented.
+
+The first issue is that when the script file is imported, if the code makes
+modifications to ``sys.path`` or other global data structures and the
+changes are additive, checks should first be made to ensure that the change
+has not already been made, else duplicate data will be added every time the
+script file is reloaded.
+
+This means that when updating ``sys.path``, instead of using::
+
+    import sys
+    sys.path.append('/usr/local/wsgi/modules')
+
+the more correct way would be to use::
+
+    import sys
+    path = '/usr/local/wsgi/modules'
+    if path not in sys.path:
+        sys.path.append(path)
+
+This will ensure that the path doesn't get added multiple times.
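The difference can be seen with a rough simulation outside Apache. The sketch below, written for Python 3, approximates repeated reloads by executing a script body several times, each time in a fresh, empty namespace; this is an assumption about the effect of a reload made for illustration, not a description of how mod_wsgi itself is implemented. The two directory names are hypothetical.

```python
import sys

# Two hypothetical script file bodies; the directories are illustrative.
NAIVE = (
    "import sys\n"
    "sys.path.append('/tmp/naive-modules')\n"
)
GUARDED = (
    "import sys\n"
    "path = '/tmp/guarded-modules'\n"
    "if path not in sys.path:\n"
    "    sys.path.append(path)\n"
)

def simulate_reloads(source, times):
    # Re-execute the script body in a new, empty namespace each time,
    # much as a reload discards the old module contents.
    code = compile(source, '<wsgi-script>', 'exec')
    for _ in range(times):
        exec(code, {})

simulate_reloads(NAIVE, 3)
simulate_reloads(GUARDED, 3)
print(sys.path.count('/tmp/naive-modules'))    # 3
print(sys.path.count('/tmp/guarded-modules'))  # 1
```

The naive version grows 'sys.path' on every reload, while the guarded version adds its directory only once however many times the script is executed.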
+
+Even where the script file has a '.py' extension, the fact that it is not
+treated like a normal module means that you should never try to import the
+file from another code file using the 'import' statement or any other
+import mechanism. The easiest way to avoid this is to not use the '.py'
+extension on script files, and to never place script files in a directory
+which is located on the standard module search path, nor add the directory
+containing the script to ``sys.path`` explicitly.
+
+If an attempt is made to import the script file as a module, the result
+will be that it is loaded a second time as an independent module. This is
+because script files are loaded under a module name which is keyed to the
+full absolute path of the script file and not just the basename of the
+file. Importing the script file directly will therefore not give access to
+the same data as exists in the script file as loaded by mod_wsgi.
+
+The fact that the script file is not treated like a normal Python module
+also has implications when it comes to using the "pickle" module in
+conjunction with objects contained within the script file.
+
+In practice, what this means is that neither function objects, class
+objects nor instances of classes which are defined in the script file
+should be stored using the "pickle" module.
+
+The technical reasons for the limitations on the use of the "pickle" module
+in conjunction with objects defined in the script file are further
+discussed in the document :doc:`../user-guides/issues-with-pickle-module`.
+
+The act of reloading script files also means that any data previously held
+by the module corresponding to the script file will be deleted. If such
+data constituted handles to database connections, and the connections are
+not able to clean themselves up when deleted, it may result in resource
+leakage.
+
+One should therefore be cautious of what data is kept in a script file.
+
+Preferably the script file should only act as a bridge to code and data
+residing in a normal Python module imported from an entirely different
+directory.
+
+Restarting Apache Processes
+---------------------------
+
+As explained above, the only facility that mod_wsgi provides for reloading
+source code files in embedded mode is the reloading of just the script
+file providing the entry point for your WSGI application.
+
+If you have no choice but to use embedded mode and still desire some
+measure of automatic source code reloading, one option, which works on both
+Windows and UNIX systems, is to force Apache to automatically recycle the
+Apache server child process that handled a request once the request has
+completed.
+
+To enable this, you need to modify the value of the MaxRequestsPerChild
+directive in the Apache configuration. Normally this would be set to a
+value of '0', indicating that the process should never be restarted as a
+result of the number of requests processed. To have it restart a process
+after every request, set it to the value '1' instead::
+
+    MaxRequestsPerChild 1
+
+Do note however that this will cause the process to be restarted after
+every request. That is, the process will be restarted even if the request
+was for a static file or a PHP application and wasn't handled by your WSGI
+application at all. The restart will also occur even if you have made no
+changes to your code.
+
+Because a restart happens regardless of the request type, using this method
+is not recommended.
+
+Because of how the Apache server child processes are monitored and restarts
+are handled, it is technically possible that this method will yield
+performance which is worse than CGI scripts. For that reason you may even
+be better off using a CGI/WSGI bridge to host your WSGI application. At
+least that way the handling of other types of requests, such as for static
+files and PHP applications, will not be affected.
+
+Reloading In Daemon Mode
+------------------------
+
+If using mod_wsgi daemon mode, what happens when the script file is changed
+is different to what happens in embedded mode. In daemon mode, if the
+script file has changed, then rather than just the script file being
+reloaded, the daemon process which contains the application will be shut
+down and restarted automatically.
+
+Detection of the change in the script file will occur at the time of the
+first request to arrive after the change has been made. The way that the
+restart is performed does not affect the handling of the request, which is
+still processed once the daemon process has been restarted.
+
+Where there are multiple daemon processes in the process group, a cascade
+effect will occur, with successive processes being restarted until the
+request is again routed to one of the newly restarted processes.
+
+In this way, when daemon mode is being used, restarting a WSGI application
+after a change has been made to the code is a simple matter of touching the
+script file. Any daemon processes will then automatically restart without
+the need to restart the whole of Apache.
+
+So, if you are using Django in daemon mode and need to change your
+'settings.py' file, once you have made the required change, also touch the
+script file containing the WSGI application entry point. Having done that,
+on the next request the process will be restarted and your Django
+application reloaded.
+
+Restarting Daemon Processes
+---------------------------
+
+If you are using daemon mode of mod_wsgi, restarting of processes can, to a
+degree, also be controlled by a user, or by the WSGI application itself,
+without restarting the whole of Apache.
+
+To force a daemon process to be restarted, if you are using a single daemon
+process with many threads for the application, then you can embed a page in
+your application (hopefully password protected) that sends an appropriate
+signal to itself.
+
+This should only be done for daemon processes and not within the Apache
+child processes, as sending such a signal within a child process may
+interfere with the operation of Apache. Whether the code is executing
+within a daemon process can be determined by checking the
+'mod_wsgi.process_group' variable in the WSGI environment passed to the
+application. The value will be non-empty when running in a daemon process::
+
+    if environ['mod_wsgi.process_group'] != '':
+        import signal, os
+        os.kill(os.getpid(), signal.SIGINT)
+
+This will cause the daemon process your application is in to shut down.
+The Apache process supervisor will then automatically restart your process,
+ready for subsequent requests. On the restart it will pick up your new
+code. This way you can control a reload from your application through some
+special web page set up specifically for that purpose.
+
+You can also send this signal from an external application, but the
+problem there may be identifying which process to send the signal to. If
+you are running the daemon process(es) as a user/group distinct from
+Apache, and each application is running as a different user, then you
+could just look for the Apache (httpd) processes owned by the user the
+application is running as, as opposed to the Apache user, and send them
+all the signal.
+
+If the daemon process is running as the same user as Apache, or there are
+distinct applications running in different daemon processes but as the same
+user, determining which daemon processes to send the signal to may be
+harder.
+
+Either way, to make it easier to identify which processes belong to a
+daemon process group, you can use the 'display-name' option of the
+WSGIDaemonProcess directive to name the process. On many platforms, when
+this option is used, that name will then appear in the output from the 'ps'
+command in place of the name of the actual Apache server binary.
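Once the daemon processes carry a distinctive name, an external script can find and signal them. The following Python 3 sketch assumes that the daemon process group was configured with a hypothetical 'display-name=myapp-wsgi' (a name without spaces), that the system 'ps' accepts the POSIX '-e' and '-o pid=,args=' options, and that the script runs as a user permitted to signal those processes. 'find_pids()' and 'signal_daemon_group()' are illustrative names, not part of mod_wsgi. The parsing is kept separate from the signalling so that it can be checked without touching any real processes.

```python
import os
import signal
import subprocess

def find_pids(ps_output, display_name):
    # Parse 'ps' lines of the form "<pid> <command line>" and return the
    # pids whose first command-line word equals the display name.
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].split()[0] == display_name:
            pids.append(int(fields[0]))
    return pids

def signal_daemon_group(display_name, sig=signal.SIGINT):
    # Assumes a POSIX 'ps' which accepts '-e' and '-o pid=,args='.
    output = subprocess.run(['ps', '-eo', 'pid=,args='],
                            capture_output=True, text=True,
                            check=True).stdout
    pids = find_pids(output, display_name)
    for pid in pids:
        os.kill(pid, sig)
    return pids
```

For example, 'signal_daemon_group("myapp-wsgi")' would send SIGINT to each matching process, after which the Apache process supervisor restarts them as described above.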
+
+Monitoring For Code Changes
+---------------------------
+
+The use of signals to restart a daemon process could also be employed in a
+mechanism which automatically detects changes to any Python modules or
+dependent files. This could be achieved by creating a thread at startup
+which periodically checks whether file timestamps have changed and
+triggers a restart if they have.
+
+Example code for such an automatic restart mechanism which is compatible
+with how mod_wsgi works is shown below::
+
+    import os
+    import sys
+    import time
+    import signal
+    import threading
+    import atexit
+    import Queue
+
+    _interval = 1.0
+    _times = {}
+    _files = []
+
+    _running = False
+    _queue = Queue.Queue()
+    _lock = threading.Lock()
+
+    def _restart(path):
+        _queue.put(True)
+        prefix = 'monitor (pid=%d):' % os.getpid()
+        print >> sys.stderr, '%s Change detected to \'%s\'.' % (prefix, path)
+        print >> sys.stderr, '%s Triggering process restart.' % prefix
+        os.kill(os.getpid(), signal.SIGINT)
+
+    def _modified(path):
+        try:
+            # If path doesn't denote a file and we were previously
+            # tracking it, then it has been removed or the file type
+            # has changed, so force a restart. If not previously
+            # tracking the file then we can ignore it, as it is probably
+            # a pseudo reference, such as when a file is extracted from
+            # a collection of modules contained in a zip file.
+
+            if not os.path.isfile(path):
+                return path in _times
+
+            # Check for when file last modified.
+
+            mtime = os.stat(path).st_mtime
+            if path not in _times:
+                _times[path] = mtime
+
+            # Force restart when modification time has changed, even
+            # if time now older, as that could indicate older file
+            # has been restored.
+
+            if mtime != _times[path]:
+                return True
+        except:
+            # If any exception occurred, it is likely that the file has
+            # been removed just before stat(), so force a restart.
+
+            return True
+
+        return False
+
+    def _monitor():
+        while 1:
+            # Check modification times on all files in sys.modules.
+
+            for module in sys.modules.values():
+                if not hasattr(module, '__file__'):
+                    continue
+                path = getattr(module, '__file__')
+                if not path:
+                    continue
+                if os.path.splitext(path)[1] in ['.pyc', '.pyo', '.pyd']:
+                    path = path[:-1]
+                if _modified(path):
+                    return _restart(path)
+
+            # Check modification times on files which have
+            # specifically been registered for monitoring.
+
+            for path in _files:
+                if _modified(path):
+                    return _restart(path)
+
+            # Go to sleep for specified interval.
+
+            try:
+                return _queue.get(timeout=_interval)
+            except:
+                pass
+
+    _thread = threading.Thread(target=_monitor)
+    _thread.setDaemon(True)
+
+    def _exiting():
+        try:
+            _queue.put(True)
+        except:
+            pass
+        _thread.join()
+
+    atexit.register(_exiting)
+
+    def track(path):
+        if path not in _files:
+            _files.append(path)
+
+    def start(interval=1.0):
+        global _interval
+        if interval < _interval:
+            _interval = interval
+
+        global _running
+        _lock.acquire()
+        if not _running:
+            prefix = 'monitor (pid=%d):' % os.getpid()
+            print >> sys.stderr, '%s Starting change monitor.' % prefix
+            _running = True
+            _thread.start()
+        _lock.release()
+
+This would be used by importing the Python module containing the above
+code into the script file, starting the monitoring system, and adding any
+additional non-Python files which should be tracked::
+
+    import os
+
+    import monitor
+    monitor.start(interval=1.0)
+    monitor.track(os.path.join(os.path.dirname(__file__), 'site.cf'))
+
+    def application(environ, start_response):
+        ...
+
+Where many non-Python files in a directory hierarchy need to be added, such
+as template files which would otherwise be cached within the running
+process, the ``os.path.walk()`` function could be used to traverse all the
+files, adding the required files based on extension or other criteria
+using the 'track()' function.
+
+This mechanism would generally work adequately where a single daemon
+process is used within a process group.
+You would need to be careful, however, when multiple daemon processes are
+used. This is because it may not be possible to synchronise the checks
+exactly across all of the daemon processes. As a result you may end up with
+the daemon processes running a mixture of old and new code until they all
+synchronise with the new code base. This problem can be minimised by
+defining a short interval between scans, although that will increase the
+overhead of the checks.
+
+Using such an approach may in some cases be useful if using mod_wsgi as a
+development platform. It is certainly not recommended that you use this
+mechanism on a production system.
+
+The reasons for not using it on a production system are the additional
+overhead and the chance that daemon processes are restarted when you are
+not expecting them to be. For example, in a production environment where
+requests are coming in all the time, you do not want a restart triggered
+when you are part way through making a set of changes which cover multiple
+files, as it is then likely that an inconsistent set of code will be loaded
+and the application will fail.
+
+Note that you should also not use this mechanism on a system where you have
+configured mod_wsgi to preload your WSGI application as soon as the daemon
+process has started. If you do that, then the monitor thread will be
+recreated immediately after each restart, and so every single code change
+you make to a preloaded file will cause the daemon process to be restarted,
+even if there is no intervening request.
+
+If preloading is really required, the example code would need to be
+modified so as to not use signals to restart the daemon process, but
+instead to reset to zero the variable saved away in the WSGI script file
+that records the modification time of the script file. This will have the
+effect of delaying the restart until the next request has arrived.
+Because that variable holding the modification time is an internal
+implementation detail of mod_wsgi and not strictly part of its published
+API or behaviour, you should only use that approach if it is warranted.
+
+Restarting Windows Apache
+-------------------------
+
+On the Windows platform there is no daemon mode, only embedded mode. The
+MPM used by Apache is the 'winnt' MPM. This MPM is like the worker MPM on
+UNIX systems, except that there is only one process.
+
+As only embedded mode is available, modifying the WSGI script file only
+results in the WSGI script file itself being reloaded; the process as a
+whole is not reloaded. Thus there is normally no way, through modifying the
+WSGI script file or any other Python code file used by the application, of
+having the whole application reloaded automatically.
+
+The recipe in the previous section can be used with daemon mode on UNIX
+systems to implement an automated scheme for restarting the daemon
+processes when any code change is made, but because Windows lacks the
+'fork()' system call, daemon mode isn't supported there in the first place.
+
+Thus, the only way one can have code changes picked up on Windows is to
+restart Apache as a whole. Although a full restart is required, Apache on
+Windows only uses a single child server process and so the impact isn't as
+significant as on UNIX platforms, where many processes may need to be
+shut down and restarted.
+
+With that in mind, it is actually possible to modify the prior recipe for
+restarting a daemon process to restart Apache itself. To achieve this
+sleight of hand, it is necessary to use the Python 'ctypes' module to get
+access to a special internal Apache function called 'ap_signal_parent()',
+which is available in the Windows version of Apache.
+
+The required change to get this to work is to replace the restart
+function in the previous code with the following::
+
+    def _restart(path):
+        _queue.put(True)
+        prefix = 'monitor (pid=%d):' % os.getpid()
+        print >> sys.stderr, '%s Change detected to \'%s\'.' % (prefix, path)
+        print >> sys.stderr, '%s Triggering Apache restart.' % prefix
+        import ctypes
+        ctypes.windll.libhttpd.ap_signal_parent(1)
+
+Other than that, the prior code would be used exactly as before. Now when
+any change is made to Python code used by the application or any other
+monitored files, Apache will be restarted automatically for you.
+
+As before, it is recommended that this only be used during development
+and not on a production system.
diff --git a/docs/user-guides/virtual-environments.rst b/docs/user-guides/virtual-environments.rst
new file mode 100644
index 0000000..e416ba3
--- /dev/null
+++ b/docs/user-guides/virtual-environments.rst
@@ -0,0 +1,214 @@
+====================
+Virtual Environments
+====================
+
+This document contains information about how to make use of Python virtual
+environments, such as those created by Ian Bicking's virtualenv, with
+mod_wsgi.
+
+    * http://pypi.python.org/pypi/virtualenv
+
+The purpose of such Python virtual environments is to allow one to create
+multiple distinct Python environments for the same version of Python, but
+with different sets of Python modules and packages installed into the
+Python 'site-packages' directory.
+
+A virtual Python environment is useful where it is necessary to run
+multiple WSGI applications which have conflicting requirements as to what
+version of a Python module or package needs to be installed. Virtual
+environments can also be used where Apache and daemon mode of mod_wsgi are
+used to host WSGI applications for different users and each user wants to
+be able to install their own Python modules and packages separately.
+
+Note that aspects of the configuration described here will not work if
+mod_python is being loaded into Apache at the same time as mod_wsgi. This
+is because mod_python will in that case be responsible for initialising the
+Python interpreter, thereby overriding what mod_wsgi is trying to do. For
+best results, you should therefore use only mod_wsgi and not try to use
+mod_python on the same server at the same time.
+
+Baseline Environment
+--------------------
+
+The first step in using virtual environments with mod_wsgi is to point
+mod_wsgi at a baseline Python environment. This step is actually optional;
+if it is not done, the main Python installation for the system, usually the
+one which mod_wsgi was compiled for, will be used as the baseline
+environment.
+
+Although the main Python installation can be used, it is better, especially
+in a shared environment where daemon mode of mod_wsgi is used to host WSGI
+applications for different users, to make the baseline environment a
+virgin environment with an effectively empty 'site-packages' directory.
+This way there is no possibility of conflicts between modules and packages
+in a user's individual Python virtual environment and the baseline
+environment.
+
+To create a virgin environment using the 'virtualenv' program, the
+'--no-site-packages' option should be supplied when creating the
+environment::
+
+    $ cd /usr/local/pythonenv
+
+    $ virtualenv --no-site-packages BASELINE
+    New python executable in BASELINE/bin/python
+    Installing setuptools............done.
+
+Note that the version of Python from which this baseline environment is
+created must be the same version of Python that mod_wsgi was compiled for.
+It is not possible to mix environments based on different major/minor
+versions of Python.
+
+Once the baseline Python environment has been created, the WSGIPythonHome
+directive should be defined within the global part of the main Apache
+configuration files.
+The directive should refer to the top-level directory for the baseline
+environment created by the 'virtualenv' script::
+
+    WSGIPythonHome /usr/local/pythonenv/BASELINE
+
+This Python environment will now be used as the baseline environment for
+all WSGI applications running under mod_wsgi, whether they be run in
+embedded mode or daemon mode.
+
+There is no need to set the WSGIPythonHome directive if you want to use
+the main Python installation as the baseline environment.
+
+Application Environments
+------------------------
+
+If you have created a dedicated virtual environment for a specific WSGI
+application, then this environment can now be overlaid on top of the
+baseline environment. If the baseline environment was a virgin environment,
+this virtual environment should also be initially created as a virgin
+environment.
+
+For example, to create a virtual environment dedicated to developing Pylons
+applications, the following would be used::
+
+    $ virtualenv --no-site-packages PYLONS-1
+    New python executable in PYLONS-1/bin/python
+    Installing setuptools............done.
+
+    $ source PYLONS-1/bin/activate
+
+    (PYLONS-1)$ easy_install Pylons
+    Searching for Pylons
+    .......
+
+The Pylons instructions for creating a Pylons application would then be
+followed and the application tested using the Pylons built-in web server.
+
+As an additional step, however, the WSGI script file described in the
+instructions would be modified to overlay the virtual environment for the
+application on top of the baseline environment. This would be done by
+adding the following at the very start of the WSGI script file::
+
+    import site
+    site.addsitedir('/usr/local/pythonenv/PYLONS-1/lib/python2.5/site-packages')
+
+Note that in this case the full path to the 'site-packages' directory for
+the virtual environment needs to be specified, not just the root of the
+virtual environment.
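Where several script files need to do this, the 'site-packages' path could be derived from the root of the virtual environment rather than being written out in full each time. The following Python 3 sketch assumes the standard POSIX virtualenv layout '<root>/lib/pythonX.Y/site-packages' (virtual environments on Windows use a different layout); 'add_virtualenv()' is a hypothetical helper, not part of mod_wsgi or virtualenv. Because it takes the version from the running interpreter, it looks for the directory matching the Python version mod_wsgi was compiled for, which must in any case match the virtual environment.

```python
import os
import site
import sys

def add_virtualenv(root):
    # Hypothetical helper. Assumes the POSIX virtualenv layout
    # <root>/lib/pythonX.Y/site-packages, where X.Y is the version of the
    # running interpreter (under mod_wsgi, the one it was compiled for).
    version = 'python%d.%d' % sys.version_info[:2]
    site_packages = os.path.join(root, 'lib', version, 'site-packages')
    if not os.path.isdir(site_packages):
        raise ValueError('no site-packages directory at %r' % site_packages)
    # site.addsitedir() appends the directory (if not already present)
    # and processes any '.pth' files found in it.
    site.addsitedir(site_packages)
    return site_packages
```

A WSGI script file would then call, for example, 'add_virtualenv("/usr/local/pythonenv/PYLONS-1")'. Failing loudly when the directory is missing avoids silently running against the wrong set of packages.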
+
+Using 'site.addsitedir()' is a bit different to simply adding the directory
+to 'sys.path', as the function will also open up any '.pth' files located
+in the directory and process them. This is necessary to ensure that any
+special directories related to Python eggs are automatically added to
+'sys.path'.
+
+Note that although virtualenv includes the script 'activate_this.py', which
+the virtualenv documentation claims should be invoked using 'execfile()' in
+the context of mod_wsgi, you may want to be cautious about using it. This
+is because the script modifies 'sys.prefix', which may actually cause
+problems with the operation of mod_wsgi, or of Python modules already
+loaded into the Python interpreter, if that code depends on the value of
+'sys.prefix' not changing. The WSGIPythonHome directive already described
+should instead be used if you want to associate Python as a whole with the
+virtual environment.
+
+Despite that, the 'activate_this.py' script is an attempt to resolve an
+issue with how 'site.addsitedir()' works, namely that any new directories
+which are added to 'sys.path' by 'site.addsitedir()' are appended to the
+end. The problem with this in the context of mod_wsgi is that if
+WSGIPythonHome was not used to associate mod_wsgi with a virgin baseline
+environment, then any packages/modules in the main Python installation will
+still take precedence over those in the virtual environment.
+
+To work around this problem, 'activate_this.py' invokes 'site.addsitedir()'
+but then also reorders 'sys.path' so any newly added directories are
+shifted to the front of 'sys.path'. This then ensures that, where there are
+different versions of packages in the virtual environment, they take
+precedence over those in the main Python installation.
+
+As explained, because 'activate_this.py' does other things which may not be
+appropriate in the context of mod_wsgi, if you are unable to set
+WSGIPythonHome to point mod_wsgi at a virgin baseline environment, then
+instead of just calling 'site.addsitedir()' you should use the code::
+
+    ALLDIRS = ['/usr/local/pythonenv/PYLONS-1/lib/python2.5/site-packages']
+
+    import sys
+    import site
+
+    # Remember original sys.path.
+    prev_sys_path = list(sys.path)
+
+    # Add each new site-packages directory.
+    for directory in ALLDIRS:
+        site.addsitedir(directory)
+
+    # Reorder sys.path so new directories are at the front.
+    new_sys_path = []
+    for item in list(sys.path):
+        if item not in prev_sys_path:
+            new_sys_path.append(item)
+            sys.path.remove(item)
+    sys.path[:0] = new_sys_path
+
+If you still want to use the activation script from virtualenv, then use::
+
+    activate_this = '/usr/local/pythonenv/PYLONS-1/bin/activate_this.py'
+    execfile(activate_this, dict(__file__=activate_this))
+
+If the fact that 'sys.prefix' has been modified doesn't cause any issues,
+then great. If you see subtle unexplained problems that may be linked to
+the change to 'sys.prefix', then use the more long-handed approach above,
+whereby 'site.addsitedir()' is used directly and 'sys.path' is reordered
+afterwards.
+
+Process Environments
+--------------------
+
+When 'site.addsitedir()' is used from a WSGI script file to overlay a
+virtual environment on top of the baseline environment, it is only applied
+to the specific Python interpreter instance that the application has been
+delegated to run in. This means that WSGI applications running in the same
+process but within different Python interpreter instances can use different
+virtual environments.
+
+Conversely, if all WSGI applications running in the same process, but
+within different Python interpreters, need to use the same virtual
+environment, you would need to set up 'sys.path' in the WSGI script file
+of every application.
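Rather than repeating the reordering code in every WSGI script file, one option is to wrap it in a function kept in a shared module. The sketch below is a Python 3 version of the same add-then-reorder technique; 'prepend_site_dirs()' is a hypothetical name, not a mod_wsgi or virtualenv API. The module holding it would itself need to live in a directory already on 'sys.path', so that each script file can import it before any virtual environment has been added.

```python
import site
import sys

def prepend_site_dirs(directories):
    # Hypothetical helper implementing the add-then-reorder technique.
    prev_sys_path = list(sys.path)

    # site.addsitedir() appends each directory, plus any directories
    # named in '.pth' files within it, to the end of sys.path.
    for directory in directories:
        site.addsitedir(directory)

    # Move every newly added entry to the front, keeping their order,
    # so they take precedence over the main Python installation.
    new_entries = [item for item in sys.path if item not in prev_sys_path]
    for item in new_entries:
        sys.path.remove(item)
    sys.path[:0] = new_entries
    return new_entries
```

A script file would then call, for example, 'prepend_site_dirs(["/usr/local/pythonenv/PYLONS-1/lib/python2.5/site-packages"])' before importing the application.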
+
+Alternatively, if using mod_wsgi 2.0 or later and embedded mode, the
+WSGIPythonPath directive can be used to set up the virtual environment for
+all Python interpreters created within the process in one step::
+
+    WSGIPythonPath /usr/local/pythonenv/PYLONS-1/lib/python2.5/site-packages
+
+Similarly, if using mod_wsgi 2.0 or later and daemon mode, the
+'python-path' option to the WSGIDaemonProcess directive can be used to
+set up the virtual environment::
+
+    WSGIDaemonProcess pylons \
+        python-path=/usr/local/pythonenv/PYLONS-1/lib/python2.5/site-packages
+
+Note that WSGIPythonPath does not have this effect for mod_wsgi prior to
+version 2.0. This is because in older versions WSGIPythonPath merely added
+any listed directories to 'sys.path', whereas in mod_wsgi 2.0 and later it
+calls 'site.addsitedir()' for each listed directory.
+
+Do note, though, that mod_wsgi 2.X versions prior to 2.4 do not perform
+the reordering of 'sys.path' explained previously when using the
+WSGIPythonPath directive or the 'python-path' option of WSGIDaemonProcess.
+Thus, when using mod_wsgi 2.3 or earlier, you would need to use
+WSGIPythonHome to reference a virgin baseline environment if the standard
+Python site-packages directory contains conflicting packages. From
+mod_wsgi 2.4 onwards this is not an issue, and a virtual environment's
+site-packages directory will always override that of the standard Python
+installation.
-- 
cgit v1.2.1