diff options
Diffstat (limited to 'docs/user-guides/debugging-techniques.rst')
-rw-r--r-- | docs/user-guides/debugging-techniques.rst | 1148 |
1 files changed, 1148 insertions, 0 deletions
diff --git a/docs/user-guides/debugging-techniques.rst b/docs/user-guides/debugging-techniques.rst new file mode 100644 index 0000000..368b795 --- /dev/null +++ b/docs/user-guides/debugging-techniques.rst @@ -0,0 +1,1148 @@ +==================== +Debugging Techniques +==================== + +Be it when initially setting up mod_wsgi for the first time, or later +during development or use of your WSGI application, you are bound to get +some sort of unexpected Python error. By default all you are usually going +to see as evidence of this is a HTTP 500 "Internal Server Error" being +displayed in the browser window and little else. + +The purpose of this document is to explain where to go look for more +details on what caused the error, describe techniques one can use to have +your WSGI application generate more useful debug information, as well as +mechanisms that can be employed to interactively debug your application. + +Note that although this document is intended to deal with techniques which +can be used when using mod_wsgi, many of the techniques are also directly +transferable or adaptable to other web hosting mechanisms for WSGI +applications. + +Apache Error Log Files +---------------------- + +When using mod_wsgi, unless you or the web framework you are using takes +specific action to catch exceptions and present the details in an alternate +manner, the only place that details of uncaught exceptions will be recorded +is in the Apache error log files. The Apache error log files are therefore +your prime source of information when things go wrong. + +Do note though that log messages generated by mod_wsgi are logged with +various serverity levels and which ones will be output to the Apache error +log files will depend on how Apache has been configured. The standard +configuration for Apache has the !LogLevel directive being set to 'warn'. +With this setting any important error messages will be output, but +informational messages generated by mod_wsgi which can assist in working +out what it is doing are not. Thus, if new to mod_wsgi or trying to debug a +problem, it is worthwhile setting the Apache configuration to use 'info' +log level instead:: + + LogLevel info + +If your Apache web server is only providing services for one host, it is +likely that you will only have one error log file. If however the Apache +web server is configured for multiple virtual hosts, then it is possible +that there will be multiple error log files, one corresponding to the main +server host and an additional error log file for each virtual host. Such a +virtual host specific error log if one is being used, would have been +configured through the placing of the Apache CustomLog directive within +the context of the VirtualHost container. + +Although your WSGI application may be hosted within a particular virtual +host and that virtual host has its own error log file, some error and +informational messages will still go to the main server host error log +file. Thus you may still need to consult both error log files when using +virtual hosts. + +Messages of note that will end up in the main server host error log file +include notifications in regard to initialisation of Python and the +creation and destruction of Python sub interpreters, plus any errors which +occur when doing this. + +Messages of note that would end up in the virtual host error log file, if +it exists, include details of uncaught Python exceptions which occur when +the WSGI application script is being loaded, or when the WSGI application +callable object is being executed. + +Messages that are logged by a WSGI application via the 'wsgi.errors' object +passed through to the application in the WSGI environment are also logged. +These will got to the virtual host error log file if it exists, or the main +error log file if the virtual host is not setup with its own error log file. +Thus, if you want to add debugging messages to your WSGI application code, +you can use 'wsgi.errors' in conjunction with the 'print' statement as shown +below:: + + def application(environ, start_response): + status = '200 OK' + output = 'Hello World!' + + print >> environ['wsgi.errors'], "application debug #1" + + response_headers = [('Content-type', 'text/plain'), + ('Content-Length', str(len(output)))] + start_response(status, response_headers) + + print >> environ['wsgi.errors'], "application debug #2" + + return [output] + +If 'wsgi.errors' is not available to the code which needs to output log +messages, then it should explicitly direct output from the 'print' +statement to 'sys.stderr':: + + import sys + + def function(): + print >> sys.stderr, "application debug #3" + ... + +If ``sys.stderr`` or ``sys.stdout`` is used directly then +these messages will end up in the main server host error log file and not +that for the virtual host unless the WSGI application is running in a +daemon process specifically associated with a virtual host. + +Do be aware though that writing to ``sys.stdout`` is by default +restricted in versions of mod_wsgi prior to 3.0 and will result in an +exception occurring of the form:: + + IOError: sys.stdout access restricted by mod_wsgi + +This is because portable WSGI applications should not write to +``sys.stdout`` or use the 'print' statement without specifying an +alternate file object besides ``sys.stdout`` as the target. This +restriction can be disabled for the whole server using the +WSGIRestrictStdout directive, or by mapping ``sys.stdout`` to +``sys.stderr`` at global scope within in the WSGI application script +file:: + + import sys + sys.stdout = sys.stderr + +In general, a WSGI application should always endeavour to only log messages +via the 'wsgi.errors' object that is passed through to a WSGI application +in the WSGI environment. This is because this is the only way of logging +messages for which there is some guarantee that they will end up in a log +file that you might have access to if using a shared server. + +An application shouldn't however cache 'wsgi.errors' and try to use it +outside of the context of a request. If this is done an exception will be +raised indicating that the request has expired and the error log object +is now invalid. + +That messages output via ``sys.stderr`` and ``sys.stdout`` end up in +the Apache error logs at all is provided as a convenience but there is no +requirement in the WSGI specification that they are valid means of a WSGI +application logging messages. + +Displaying Request Environment +------------------------------ + +When a WSGI application is invoked, the request headers are passed as CGI +variables in the WSGI request environment. The dictionary used for this +also holds information about the WSGI execution environment and mod_wsgi. +This includes mod_wsgi specific variables indicating the name of the +process and application groups within which the WSGI application is +executing. + +Knowing the values of the process and application group variables can be +important when needing to validate that your Apache configuration is doing +what you intended as far as ensuring your WSGI application is running in +daemon mode or otherwise. + +A simple way of validating such details or getting access to any of the +other WSGI request environment variables is to substitute your existing +WSGI application with one which echos back the details to your browser. +Such a task can be achieved with the following test application. The +application could be extended as necessary to display other information as +well, with process ID, user ID and group ID being shown as examples:: + + import cStringIO + import os + + def application(environ, start_response): + headers = [] + headers.append(('Content-Type', 'text/plain')) + write = start_response('200 OK', headers) + + input = environ['wsgi.input'] + output = cStringIO.StringIO() + + print >> output, "PID: %s" % os.getpid() + print >> output, "UID: %s" % os.getuid() + print >> output, "GID: %s" % os.getgid() + print >> output + + keys = environ.keys() + keys.sort() + for key in keys: + print >> output, '%s: %s' % (key, repr(environ[key])) + print >> output + + output.write(input.read(int(environ.get('CONTENT_LENGTH', '0')))) + + return [output.getvalue()] + +For the case of the process group as recorded by the +'mod_wsgi.process_group' variable in the WSGI request environment, if the +value is an empty string then the WSGI application is running in embedded +mode. For any other value it will be running in daemon mode with the process +group named by the variables value. + +Note that by default WSGI applications run in embedded mode, which means +within the Apache server child processes which accept the original requests. +Daemon mode processes would only be used through appropriate use of the +WSGIDaemonProcess and WSGIProcessGroup directives to delegate the WSGI +application to a named daemon process group. + +For the case of the application group as recorded by the +'mod_wsgi.application_group' variable in the WSGI request environment, if the +value is an empty string then the WSGI application is running in the main +Python interpreter. That is, the very first interpreter created when Python +was initialised. For any other value it indicates it is running in the named +Python sub interpreter. + +Note that by default WSGI applications would always run in a sub +interpreter rather than the main interpreter. The name of this sub +interpreter would be automatically constructed from the name of the server +or virtual host, the URL mount point of the WSGI application and the number +of the listener port when it is other than ports 80 or 443. + +To delegate a WSGI application to run in main Python interpreter, the +WSGIApplicationGroup directive would need to have been used with the value +'%{GLOBAL}'. Although the value is '%{GLOBAL}', this translates to the +empty string seen for the value of 'mod_wsgi.application_group' within the +WSGI request environment. + +The WSGIApplicationGroup directive could also be used to designate a +specific named sub interpreter rather than that selected automatically. + +For newcomers this can all be a bit confusing, which is where the test +application comes in as you can use it to validate where your WSGI +application is running is where you intended it to run. + +The set of WSGI request environment variables will also show the WSGI +variables indicating whether process is multithreaded and whether the +process group is multiprocess or not. For a more complete explanation +of what that means see documentation of +:doc:`../user-guides/processes-and-threading`. + +Tracking Request and Response +----------------------------- + +Although one can use above test application to display the request +environment, it is replacing your original WSGI application. Rather than +replace your existing application you can use a WSGI middleware wrapper +application which logs the details to the Apache error log instead:: + + # Original WSGI application. + + def application(environ, start_response): + ... + + # Logging WSGI middleware. + + import pprint + + class LoggingMiddleware: + + def __init__(self, application): + self.__application = application + + def __call__(self, environ, start_response): + errors = environ['wsgi.errors'] + pprint.pprint(('REQUEST', environ), stream=errors) + + def _start_response(status, headers, *args): + pprint.pprint(('RESPONSE', status, headers), stream=errors) + return start_response(status, headers, *args) + + return self.__application(environ, _start_response) + + application = LoggingMiddleware(application) + +The output from the middleware would end up in the Apache error log for the +virtual host, or if no virtual host specific error log file, in the main +Apache error log file. + +For more complicated problems it may also be necessary to track both the +request and response content as well. A more complicated middleware which +can log these as well as header information to the file system is as +follows:: + + # Original WSGI application. + + def application(environ, start_response): + ... + + # Logging WSGI middleware. + + import threading + import pprint + import time + import os + + class LoggingInstance: + def __init__(self, start_response, oheaders, ocontent): + self.__start_response = start_response + self.__oheaders = oheaders + self.__ocontent = ocontent + + def __call__(self, status, headers, *args): + pprint.pprint((status, headers)+args), stream=self.__oheaders) + self.__oheaders.close() + + self.__write = self.__start_response(status, headers, *args) + return self.write + + def __iter__(self): + return self + + def write(self, data): + self.__ocontent.write(data) + self.__ocontent.flush() + return self.__write(data) + + def next(self): + data = self.__iterable.next() + self.__ocontent.write(data) + self.__ocontent.flush() + return data + + def close(self): + if hasattr(self.__iterable, 'close'): + self.__iterable.close() + self.__ocontent.close() + + def link(self, iterable): + self.__iterable = iter(iterable) + + class LoggingMiddleware: + + def __init__(self, application, savedir): + self.__application = application + self.__savedir = savedir + self.__lock = threading.Lock() + self.__pid = os.getpid() + self.__count = 0 + + def __call__(self, environ, start_response): + self.__lock.acquire() + self.__count += 1 + count = self.__count + self.__lock.release() + + key = "%s-%s-%s" % (time.time(), self.__pid, count) + + iheaders = os.path.join(self.__savedir, key + ".iheaders") + iheaders_fp = file(iheaders, 'w') + + icontent = os.path.join(self.__savedir, key + ".icontent") + icontent_fp = file(icontent, 'w+b') + + oheaders = os.path.join(self.__savedir, key + ".oheaders") + oheaders_fp = file(oheaders, 'w') + + ocontent = os.path.join(self.__savedir, key + ".ocontent") + ocontent_fp = file(ocontent, 'w+b') + + errors = environ['wsgi.errors'] + pprint.pprint(environ, stream=iheaders_fp) + iheaders_fp.close() + + length = int(environ.get('CONTENT_LENGTH', '0')) + input = environ['wsgi.input'] + while length != 0: + data = input.read(min(4096, length)) + if data: + icontent_fp.write(data) + length -= len(data) + else: + length = 0 + icontent_fp.flush() + icontent_fp.seek(0, os.SEEK_SET) + environ['wsgi.input'] = icontent_fp + + iterable = LoggingInstance(start_response, oheaders_fp, ocontent_fp) + iterable.link(self.__application(environ, iterable)) + return iterable + + application = LoggingMiddleware(application, '/tmp/wsgi') + +For this middleware, the second argument to the constructor should be a +preexisting directory. For each request four files will be saved. These +correspond to input headers, input content, response status and headers, +and request content. + +Poorly Performing Code +---------------------- + +The WSGI specification allows any iterable object to be returned as the +response, so long as the iterable yields string values. That this is the +case means that one can too easily return an object which satisfies this +requirement but has some sort of performance related issue. + +The worst case of this is where instead of returning a list containing +strings, a single string is returned. The problem with a string is that +when it is iterated over, a single character of the string is yielded each +time. In other words, a single character is written back to the client on +each loop, with a flush occuring in between to ensure that the character +has actually been written and isn't just being buffered. + +Although for small strings a performance impact may not be noticed, if +returning large strings the affect on request throughput could be quite +significant. + +Another case which can cause problems is to return a file like object. For +iteration over a file like object, typically what can occur is that a +single line within the file is returned each time. If the file is a line +oriented text file where each line is a of a reasonable length, this may be +okay, but if the file is a binary file there may not actually be line +breaks within the file. + +For the case where file contains many short lines, throughput would be +affected much like in the case where a string is returned. For the case +where the file is just binary data, the result can be that the complete +file may be read in on the first loop. If the file is large, this could +cause a large transient spike in memory usage. Once that memory is +allocated, it will then be retained by the process, albeit that it may be +reused by the process at a later point. + +Because of the performance impacts in terms of throughput and memory usage, +both these cases should be avoided. For the case of returning a string, it +should be returned with a single element list. For the case of a file like +object, the 'wsgi.file_wrapper' extension should be used, or a wrapper +which suitably breaks the response into chunks. + +In order to identify where code may be inadvertantly returning such iterable +types, the following code can be used:: + + import types + + import cStringIO + import socket + import StringIO + + BAD_ITERABLES = [ + cStringIO.InputType, + socket.SocketType, + StringIO.StringIO, + types.FileType, + types.StringType, + ] + + class ValidatingMiddleware: + + def __init__(self, application): + self.__application = application + + def __call__(self, environ, start_response): + errors = environ['wsgi.errors'] + + result = self.__application(environ, start_response) + + value = type(result) + if value == types.InstanceType: + value = result.__class__ + if value in BAD_ITERABLES: + print >> errors, 'BAD ITERABLE RETURNED: ', + print >> errors, 'URL=%s ' % environ['REQUEST_URI'], + print >> errors, 'TYPE=%s' % value + + return result + + def application(environ, start_response): + ... + + application = ValidatingMiddleware(application) + +Error Catching Middleware +------------------------- + +Because mod_wsgi only logs details of uncaught exceptions to the Apache +error log and returns a generic HTTP 500 "Internal Server Error" response, +if you want the details of any exception to be displayed in the error +page and be visible from the browser, you will need to use a WSGI error +catching middleware component. + +One example of WSGI error catching middleware is the ErrorMiddleware class +from Paste. + + * http://www.pythonpaste.org + +This class can be configured not only to catch exceptions and present the +details to the browser in an error page, it can also be configured to send +the details of any errors in email to a designated recipient, or log the +details to an alternate log file. + +Being able to have error details sent by email would be useful in a +production environment or where your application is running on a web +hosting environment and the Apache error logs would not necessarily be +closely monitored on a day to day basis. Enabling of that particular +feature though should possibly only be done when you have some confidence +in the application else you might end up getting inundated with emails. + +To use the error catching middleware from Paste you simply need to wrap +your existing application with it such that it then becomes the top level +application entry point:: + + def application(environ, start_response): + status = '200 OK' + output = 'Hello World!\n\n' + + response_headers = [('Content-type', 'text/plain'), + ('Content-Length', str(len(output)))] + start_response(status, response_headers) + + return [output] + + from paste.exceptions.errormiddleware import ErrorMiddleware + application = ErrorMiddleware(application, debug=True) + +In addition to displaying information about the Python exception that has +occurred and the stack traceback, this middleware component will also +output information about the WSGI environment such that you can see what +was being passed to the WSGI application. This can be useful if the cause +of any problem was unexpected values passed in the headers of the HTTP +request. + +Note that error catching middleware is of absolutely no use for trying +to capture and display in the browser any errors that occur at global scope +within the WSGI application script when it is being imported. Details of +any such errors occuring at this point will only be captured in the Apache +error log files. As much as possible you should avoid performing +complicated tasks when the WSGI application script file is being imported, +instead you should only trigger such actions the first time a request is +received. By doing this you will be able to capture errors in such +initialisation code with the error catching middleware. + +Also note that the debug mode whereby details are displayed in the browser +should only be used during development and not in a production system. This +is because details which are displayed may be of use to anyone who may wish +to compromise your site. + +Python Interactive Debugger +--------------------------- + +Python debuggers such as implemented by the 'pdb' module can sometimes be +useful in debugging Python applications, especially where there is a need +to single step through code and analyse application state at each point. +Use of such debuggers in web applications can be a bit more tricky than +normal applications though and especially so with mod_wsgi. + +The problem with mod_wsgi is that the Apache web server can create multiple +child processes to respond to requests. Partly because of this, but also +just to prevent problems in general, Apache closes off standard input at +startup. Thus there is no actual way to interact with the Python debugger +module if it were used. + +To get around this requires having complete control of the Apache web +server that you are using to host your WSGI application. In particular, it +will be necessary to shutdown the web server and then startup the 'httpd' +process explicitly in single process debug mode, avoiding the 'apachectl' +management application altogether:: + + $ apachectl stop + $ httpd -X + +If Apache is normally started as the 'root' user, this also will need to be +run as the 'root' user otherwise the Apache web server will not have the +required permissions to write to its log directories etc. + +The result of starting the 'httpd' process in this way will be that the +Apache web server will run everything in one process rather than using +multiple processes. Further, it will not close off standard input thus +allowing the Python debugger to be used. + +Do note though that one cannot be using the ability of mod_wsgi to run +your application in a daemon process when doing this. The WSGI application +must be running within the main Apache process. + +To trigger the Python debugger for any call within your code, the following +customised wrapper for the 'Pdb' class should be used:: + + class Debugger: + + def __init__(self, object): + self.__object = object + + def __call__(self, *args, **kwargs): + import pdb, sys + debugger = pdb.Pdb() + debugger.use_rawinput = 0 + debugger.reset() + sys.settrace(debugger.trace_dispatch) + + try: + return self.__object(*args, **kwargs) + finally: + debugger.quitting = 1 + sys.settrace(None) + +This might for example be used to wrap the actual WSGI application callable +object:: + + def application(environ, start_response): + status = '200 OK' + output = 'Hello World!\n\n' + + response_headers = [('Content-type', 'text/plain'), + ('Content-Length', str(len(output)))] + start_response(status, response_headers) + + return [output] + + application = Debugger(application) + +When a request is now received, the Python debugger will be triggered and +you can interactively debug your application from the window you ran the +'httpd' process. For example:: + + > /usr/local/wsgi/scripts/hello.py(21)application() + -> status = '200 OK' + + (Pdb) list + 16 finally: + 17 debugger.quitting = 1 + 18 sys.settrace(None) + 19 + 20 def application(environ, start_response): + 21 -> status = '200 OK' + 22 output = 'Hello World!\n\n' + 23 + 24 response_headers = [('Content-type', 'text/plain'), + 25 ('Content-Length', str(len(output)))] + 26 start_response(status, response_headers) + + (Pdb) print start_response + <built-in method start_response of mod_wsgi.Adapter object at 0x1160180> + + cont + +When wishing to allow the request to complete, issue the 'cont' command. If +wishing to cause the request to abort, issue the 'quit' command. This will +result in a 'BdbQuit' exception being raised and would result in a HTTP +500 "Internal Server Error" response being returned to the client. To kill +off the whole 'httpd' process, after having issued 'cont' or 'quit' to exit +the debugger, interrupt the process using 'CTRL-C'. + +To see what commands the Python debugger accepts, issue the 'help' command +and also consult the documentation for the 'pdb' module on the Python web +site. + +Note that the Python debugger expects to be able to write to +``sys.stdout`` to display information to the terminal. Thus if using +using a Python web framework which replaces ``sys.stdout`` such as +web.py, you will not be able to use the Python debugger. + +Browser Based Debugger +---------------------- + +In order to use the Python debugger modules you need to have direct access +to the host and the Apache web server that is running your WSGI application. +If your only access to the system is via your web browser this makes the use +of the full Python debugger impractical. + +An alternative to the Python debugger modules which is available is an +extension of the WSGI error catching middleware previously described. This +is the EvalException class from Paste. It embodies the error catching +attributes of the ErrorMiddleware class, but also allows some measure of +interactive debugging and introspection through the web browser. + +As with any WSGI middleware component, to use the class entails creating +a wrapper around the application you wish to debug:: + + def application(environ, start_response): + status = '200 OK' + output = 'Hello World!\n\n' + + response_headers = [('Content-type', 'text/plain'), + ('Content-Length', str(len(output)))] + start_response(status, response_headers) + + return [output] + + from paste.evalexception.middleware import EvalException + application = EvalException(application) + +Like ErrorMiddleware when an unexpected exception occurs a web page is +presented which shows the location of the error along with the contents of +the WSGI application environment. Where EvalException is different however +is that it is possible to inspect the local variables residing within each +stack frame down to where the error occurred. Further, it is possible to +enter Python code which can be evaluated within the context of the selected +stack frame in order to access data or call functions or methods of +objects. + +In order for this to all work requires that subsequent requests back to +the WSGI application always end up with the same process where the error +originally occurred. With mod_wsgi this does however present a bit of a +problem as Apache can create and use multiple child processes to handle +requests. + +Because of this requirement, if you want to be able to use this browser +based interactive debugger, if running your application in embedded mode of +mod_wsgi, you will need to configure Apache such that it only starts up one +child process to handle requests and that it never creates any additional +processes. The Apache configuration directives required to achieve this are +as follows:: + + StartServers 1 + ServerLimit 1 + +The directives must be placed at global scope within the main Apache +configuration files and will affect the whole Apache web server. + +If you are using the worker MPM on a UNIX system, restricting Apache to +just a single process may not be an issue, at least during development. If +however you are using the prefork MPM on a UNIX system, you may see issues +if you are using an AJAX intensive page that relies on being able to +execute parallel requests, as only one request at a time will be able to be +handled by the Apache web server. + +If using Apache 2.X on a UNIX system, a better approach is to use daemon +mode of mod_wsgi and delegate your application to run in a single daemon +process. This process may be single or multithreaded as per any threading +requirements of your application. + +Which ever configuration is used, if the browser based interactive debugger +is used it should only be used on a development system and should never be +deployed on a production system or in a web hosting environment. This is +because the debugger will allow one to execute arbitrary Python code within +the context of your application from a remote client. + +Debugging Crashes With GDB +-------------------------- + +In cases where Apache itself crashes for no apparent reason, the above +techniques are not always particularly useful. This is especially the case +where the crash occurs in non Python code outside of your WSGI application. + +The most common cause of Apache crashing, besides any still latent bugs +that may exist in mod_wsgi, of which hopefully there aren't any, are shared +library version mismatches. Another major cause of crashes is third party C +extension modules for Python which are not compatible with being used in a +Python sub interpreter which isn't the first interpreter created when +Python is initialised, or modules which are not compatible with Python sub +interpreters being destroyed and the module then being used in a new Python +sub interpreter. + +Examples of where shared library version mismatches are known to occur are +between the version of the 'expat' library used by Apache and that embedded +within the Python 'pyexpat' module. Another is between the version of the +MySQL client libraries used by PHP and the Python MySQL module. + +Both these can be a cause of crashes where the different components are +compiled and linked against different versions of the shared library for +the packages in question. It is vitally important that all packages making +use of a shared library were compiled against and use the same version of +a shared library. + +Another problematic package is Subversion. In this case there can be +conflicts between the version of Subversion libraries used by mod_dav_svn +and the Python Subversion bindings. Certain versions of the Python +Subversion modules also cause problems because they appear to be +incompatible with use in a Python sub interpreter which isn't the first +interpreter created when Python is initialised. + +In this latter issue, the sub interpreter problems can often be solved by +forcing the WSGI application using the Python Subversion modules to run in +the '%{GLOBAL}' application group. This solution often also resolves issues +with SWIG generated bindings, especially where the '-thread' option was +supplied to 'swig' when the bindings were generated. + +Whatever the reason, in some cases the only way to determine why Apache or +Python is crashing is to use a C code debugger such as 'gdb'. Now although +it is possible to attach 'gdb' to a running process, the preferred method +for using 'gdb' in conjunction with Apache is to run Apache in single +process debug mode from within 'gdb'. + +To do this it is necessary to first shutdown Apache. The 'gdb' debugger can +then be started against the 'httpd' executable and then the process started +up from inside of 'gdb':: + + $ /usr/local/apache/bin/apachectl stop + $ sudo gdb /usr/local/apache/bin/httpd + GNU gdb 6.1-20040303 (Apple version gdb-384) (Mon Mar 21 00:05:26 GMT 2005) + Copyright 2004 Free Software Foundation, Inc. + GDB is free software, covered by the GNU General Public License, and you are + welcome to change it and/or distribute copies of it under certain conditions. + Type "show copying" to see the conditions. + There is absolutely no warranty for GDB. Type "show warranty" for details. + This GDB was configured as "powerpc-apple-darwin"...Reading symbols for shared + libraries ........ done + + (gdb) run -X + Starting program: /usr/local/apache/bin/httpd -X + Reading symbols for shared libraries .+++ done + Reading symbols for shared libraries ..................... done + +If Apache is normally started as the 'root' user, this also will need to be +run as the 'root' user otherwise the Apache web server will not have the +required permissions to write to its log directories etc. + +If Apache was crashing on startup, you should immediately encounter the +error, otherwise use your web browser to access the URL which is causing +the crash to occur. You can then commence trying to debug why the crash is +occuring. + +Note that you should ensure that you have not assigned your WSGI +application to run in a mod_wsgi daemon process using the WSGIDaemonProcess +and WSGIProcessGroup directives. This is because the above procedure will +only catch crashes which occur when the application is running in embedded +mode. If it turns out that the application only crashes when run in mod_wsgi +daemon mode, an alternate method of using 'gdb' will be required. + +In this circumstance you should run Apache as normal, but ensure that you +only create one mod_wsgi daemon process and have it use only a single +thread:: + + WSGIDaemonProcess debug threads=1 + WSGIProcessGroup debug + +If not running the daemon process as a distinct user where you can tell +which process it is, then you will also need to ensure that Apache +!LogLevel directive has been set to 'info'. This is to ensure that +information about daemon processes created by mod_wsgi are logged to the +Apache error log. This is necessary, as you will need to consult the Apache +error logs to determine the process ID of the daemon process that has been +created for that daemon process group:: + + mod_wsgi (pid=666): Starting process 'debug' with threads=1. + +Knowing the process ID, you should then run 'gdb', telling it to attach +directly to the daemon process:: + + $ sudo gdb /usr/local/apache/bin/httpd 666 + GNU gdb 6.1-20040303 (Apple version gdb-384) (Mon Mar 21 00:05:26 GMT 2005) + Copyright 2004 Free Software Foundation, Inc. + GDB is free software, covered by the GNU General Public License, and you are + welcome to change it and/or distribute copies of it under certain conditions. + Type "show copying" to see the conditions. + There is absolutely no warranty for GDB. Type "show warranty" for details. + This GDB was configured as "powerpc-apple-darwin"...Reading symbols for shared + libraries ........ done + + /Users/grahamd/666: No such file or directory. + Attaching to program: `/usr/local/apache/bin/httpd', process 666. + Reading symbols for shared libraries .+++..................... done + 0x900c7060 in sigwait () + (gdb) cont + Continuing. + +Once 'gdb' has been started and attached to the process, then initiate the +request with the URL that causes the application to crash. + +Attaching to the running daemon process can also be useful where a single +request or the whole process is appearing to hang. In this case one can +force a stack trace to be output for all running threads to try and +determine what code is getting stuck. The appropriate gdb command in this +instance is 'thread apply all bt':: + + sudo gdb /usr/local/apache-2.2/bin/httpd 666 + GNU gdb 6.3.50-20050815 (Apple version gdb-477) (Sun Apr 30 20:06:22 GMT 2006) + Copyright 2004 Free Software Foundation, Inc. + GDB is free software, covered by the GNU General Public License, and you are + welcome to change it and/or distribute copies of it under certain conditions. + Type "show copying" to see the conditions. + There is absolutely no warranty for GDB. Type "show warranty" for details. + This GDB was configured as "powerpc-apple-darwin"...Reading symbols + for shared libraries ....... done + + /Users/grahamd/666: No such file or directory. + Attaching to program: `/usr/local/apache/bin/httpd', process 666. + Reading symbols for shared libraries .+++..................... done + 0x900c7060 in sigwait () + (gdb) thread apply all bt + + Thread 4 (process 666 thread 0xd03): + #0 0x9001f7ac in select () + #1 0x004189b4 in apr_pollset_poll (pollset=0x1894650, + timeout=-1146117585187099488, num=0xf0182d98, descriptors=0xf0182d9c) + at poll/unix/select.c:363 + #2 0x002a57f0 in wsgi_daemon_thread (thd=0x1889660, data=0x18895e8) + at mod_wsgi.c:6980 + #3 0x9002bc28 in _pthread_body () + + Thread 3 (process 666 thread 0xc03): + #0 0x9001f7ac in select () + #1 0x0041d224 in apr_sleep (t=1000000) at time/unix/time.c:246 + #2 0x002a2b10 in wsgi_deadlock_thread (thd=0x0, data=0x2aee68) at + mod_wsgi.c:7119 + #3 0x9002bc28 in _pthread_body () + + Thread 2 (process 666 thread 0xb03): + #0 0x9001f7ac in select () + #1 0x0041d224 in apr_sleep (t=299970002) at time/unix/time.c:246 + #2 0x002a2dec in wsgi_monitor_thread (thd=0x0, data=0x18890e8) at + mod_wsgi.c:7197 + #3 0x9002bc28 in _pthread_body () + + Thread 1 (process 666 thread 0x203): + #0 0x900c7060 in sigwait () + #1 0x0041ba9c in apr_signal_thread (signal_handler=0x2a29a0 + <wsgi_check_signal>) at threadproc/unix/signals.c:383 + #2 0x002a3728 in wsgi_start_process (p=0x1806418, daemon=0x18890e8) + at mod_wsgi.c:7311 + #3 0x002a6a4c in wsgi_hook_init (pconf=0x1806418, ptemp=0x0, + plog=0xc8, s=0x18be8d4) at mod_wsgi.c:7716 + #4 0x0000a5b0 in ap_run_post_config (pconf=0x1806418, plog=0x1844418, + ptemp=0x180e418, s=0x180da78) at config.c:91 + #5 0x000033d4 in main (argc=3, argv=0xbffffa8c) at main.c:706 + +It is suggested when trying to debug such issues that the daemon process be +made to run with only a single thread. This will reduce how many stack +traces one needs to analyse. + +If you are running with multiple processes within the daemon process group +and all requests are hanging, you will need to get a snapshot of what is +happening in all processes in the daemon process group. Because doing this +by hand will be tedious, it is better to automate it. + +To automate capturing the stack traces, first create a file called 'gdb.cmds' +which contains the following:: + + set pagination 0 + thread apply all bt + detach + quit + +This can then be used in conjunction with 'gdb' to avoid needing to enter +the commands manually. For example:: + + sudo gdb /usr/local/apache-2.2/bin/httpd -x gdb.cmds -p 666 + +To be able to automate this further and apply it to all processes in a +daemon process group, then first off ensure that daemon processes are named +in 'ps' output by using the 'display-name' option to WSGIDaemonProcess +directive. + +For example, to apply default naming strategy as implemented by mod_wsgi, use:: + + WSGIDaemonProcess xxx display-name=%{GLOBAL} + +In the output of a BSD derived 'ps' command, this will now show the process +as being named '(wsgi:xxx)':: + + $ ps -cxo command,pid | grep wsgi + (wsgi:xxx) 666 + +Note that the name may be truncated as the resultant name can be no longer +than what was the length of the original executable path for Apache. You +may therefore like to name it explicitly:: + + WSGIDaemonProcess xxx display-name=(wsgi:xxx) + +Having named the processes in the daemon process group, we can now parse the +output of 'ps' to identify the process and apply the 'gdb' command script to +each:: + + for pid in `ps -cxo command,pid | awk '{ if ($0 ~ /wsgi:xxx/ && $1 !~ /grep/) print $NF }'`; do sudo gdb -x gdb.cmds -p $pid; done + +The actual name given to the daemon process group using the 'display-name' +option should be replaced in this command line. That is, change 'wsgi:xxx' +appropriately. + +If you are having problems with process in daemon process groups hanging, +you might consider implementing a monitoring system which automatically +detects somehow when the processes are no longer responding to requests and +automatically trigger this dump of the stack traces before restarting the +daemon process group or Apache. + +Extracting Python Stack Traces +------------------------------ + +Using gdb to get stack traces as described above only gives you information +about what is happening at the C code level. This will not tell where in the +actual Python code execution was at. Your only clue is going to be where a +call out was being made to some distinct C function in a C extension module +for Python. + +One can get stack traces for Python code by using:: + + def _stacktraces(): + code = [] + for threadId, stack in sys._current_frames().items(): + code.append("\n# ThreadID: %s" % threadId) + for filename, lineno, name, line in traceback.extract_stack(stack): + code.append('File: "%s", line %d, in %s' % (filename, + lineno, name)) + if line: + code.append(" %s" % (line.strip())) + + for line in code: + print >> sys.stderr, line + +The caveat here obviously is that the process has to still be running. There +is also the issue of how you trigger that function to dump stack traces for +executing Python threads. + +If the problem you have is that some request handler threads are stuck, +either blocked, or stuck in an infinite loop, and you want to know what they +are doing, then so long as there are still some handler threads left and +the application is still responding to requests, then you could trigger it +from a request handler triggered by making a request against a specific URL. + +This though depends on you only running your application within a single +process because as soon as you have multiple processes you have no gaurantee +that a request will go to the process you want to debug. + +A better method therefore is to have a perpetually running background thread +which monitors for a specific file in the file system. When that file is +created or the modification time changes, then the background thread would +dump the stack traces for the process. + +Sample code which takes this approach is included below. This code could be +placed temporarily at the end of your WSGI script file if you know you are +going to need it because of a recurring problem:: + + import os + import sys + import time + import signal + import threading + import atexit + import Queue + import traceback + + FILE = '/tmp/dump-stack-traces.txt' + + _interval = 1.0 + + _running = False + _queue = Queue.Queue() + _lock = threading.Lock() + + def _stacktraces(): + code = [] + for threadId, stack in sys._current_frames().items(): + code.append("\n# ProcessId: %s" % os.getpid()) + code.append("# ThreadID: %s" % threadId) + for filename, lineno, name, line in traceback.extract_stack(stack): + code.append('File: "%s", line %d, in %s' % (filename, + lineno, name)) + if line: + code.append(" %s" % (line.strip())) + + for line in code: + print >> sys.stderr, line + + try: + mtime = os.path.getmtime(FILE) + except: + mtime = None + + def _monitor(): + while 1: + global mtime + + try: + current = os.path.getmtime(FILE) + except: + current = None + + if current != mtime: + mtime = current + _stacktraces() + + # Go to sleep for specified interval. + + try: + return _queue.get(timeout=_interval) + except: + pass + + _thread = threading.Thread(target=_monitor) + _thread.setDaemon(True) + + def _exiting(): + try: + _queue.put(True) + except: + pass + _thread.join() + + atexit.register(_exiting) + + def _start(interval=1.0): + global _interval + if interval < _interval: + _interval = interval + + global _running + _lock.acquire() + if not _running: + prefix = 'monitor (pid=%d):' % os.getpid() + print >> sys.stderr, '%s Starting stack trace monitor.' % prefix + _running = True + _thread.start() + _lock.release() + + _start() + +Once your WSGI script file has been loaded, then touching the file +'/tmp/dump-stack-traces.txt' will cause stack traces for active Python +threads to be output to the Apache error log. + +Note that the sample code doesn't deal with possibility that with multiple +processes for same application, that all processes may attempt to dump +information at the same time. As such, you may get interleaving of output +from multiple processes in Apache error logs at the same time. + +What you may want to do is modify this code to dump out to some special +directory, distinct files containing the trace where the names of the file +include the process ID and a date/time. That way each will be separate. + +An example of what one might expect to see from the above code is as +follows:: + + # ProcessId: 666 + # ThreadID: 4352905216 + File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 497, in __bootstrap + self.__bootstrap_inner() + File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner + self.run() + File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 477, in run + self.__target(*self.__args, **self.__kwargs) + File: "/Library/WebServer/Sites/django-1/htdocs/project.wsgi", line 72, in _monitor + _stacktraces() + File: "/Library/WebServer/Sites/django-1/htdocs/project.wsgi", line 47, in _stacktraces + for filename, lineno, name, line in traceback.extract_stack(stack): + + # ThreadID: 4322832384 + File: "/Library/WebServer/Sites/django-1/htdocs/project.wsgi", line 21, in application + return _application(environ, start_response) + File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/core/handlers/wsgi.py", line 245, in __call__ + response = middleware_method(request, response) + File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/contrib/sessions/middleware.py", line 36, in process_response + request.session.save() + File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/contrib/sessions/backends/db.py", line 63, in save + obj.save(force_insert=must_create, using=using) + File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/base.py", line 434, in save + self.save_base(using=using, force_insert=force_insert, force_update=force_update) + File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/base.py", line 527, in save_base + result = manager._insert(values, return_id=update_pk, using=using) + File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/manager.py", line 195, in _insert + return insert_query(self.model, values, **kwargs) + File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/query.py", line 1479, in insert_query + return query.get_compiler(using=using).execute_sql(return_id) + File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/sql/compiler.py", line 783, in execute_sql + cursor = super(SQLInsertCompiler, self).execute_sql(None) + File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/sql/compiler.py", line 727, in execute_sql + cursor.execute(sql, params) + File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/debug_toolbar/panels/sql.py", line 95, in execute + stacktrace = tidy_stacktrace(traceback.extract_stack()) + File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/debug_toolbar/panels/sql.py", line 40, in tidy_stacktrace + s_path = os.path.realpath(s[0]) + File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/posixpath.py", line 355, in realpath + if islink(component): + File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/posixpath.py", line 132, in islink + st = os.lstat(path) + +Note that one of the displayed threads will be that for the thread which is +dumping the stack traces. That stack trace can obviously be ignored. + +One could extend the above recipe in more elaborate ways by using a WSGI +middleware that capture details of each request from the WSGI environment +and also dumping out from that the URL for the request being handled by +any threads. This may assist in working out whether problems are related +to a specific URL. |