summaryrefslogtreecommitdiff
path: root/docs/user-guides/debugging-techniques.rst
blob: 13d47e8c8ab7fee4ca8697fc194de45213d3797a (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
====================
Debugging Techniques
====================

Be it when initially setting up mod_wsgi for the first time, or later
during development or use of your WSGI application, you are bound to get
some sort of unexpected Python error. By default all you are usually going
to see as evidence of this is a HTTP 500 "Internal Server Error" being
displayed in the browser window and little else.

The purpose of this document is to explain where to go look for more
details on what caused the error, describe techniques one can use to have
your WSGI application generate more useful debug information, as well as
mechanisms that can be employed to interactively debug your application.

Note that although this document is intended to deal with techniques which
can be used when using mod_wsgi, many of the techniques are also directly
transferable or adaptable to other web hosting mechanisms for WSGI
applications.

Apache Error Log Files
----------------------

When using mod_wsgi, unless you or the web framework you are using takes
specific action to catch exceptions and present the details in an alternate
manner, the only place that details of uncaught exceptions will be recorded
is in the Apache error log files. The Apache error log files are therefore
your prime source of information when things go wrong.

Do note though that log messages generated by mod_wsgi are logged with
various severity levels and which ones will be output to the Apache error
log files will depend on how Apache has been configured. The standard
configuration for Apache has the !LogLevel directive being set to 'warn'.
With this setting any important error messages will be output, but
informational messages generated by mod_wsgi which can assist in working
out what it is doing are not. Thus, if new to mod_wsgi or trying to debug a
problem, it is worthwhile setting the Apache configuration to use 'info'
log level instead::

    LogLevel info

If your Apache web server is only providing services for one host, it is
likely that you will only have one error log file. If however the Apache
web server is configured for multiple virtual hosts, then it is possible
that there will be multiple error log files, one corresponding to the main
server host and an additional error log file for each virtual host. Such a
virtual host specific error log if one is being used, would have been
configured through the placing of the Apache CustomLog directive within
the context of the VirtualHost container.

Although your WSGI application may be hosted within a particular virtual
host and that virtual host has its own error log file, some error and
informational messages will still go to the main server host error log
file. Thus you may still need to consult both error log files when using
virtual hosts.

Messages of note that will end up in the main server host error log file
include notifications in regard to initialisation of Python and the
creation and destruction of Python sub interpreters, plus any errors which
occur when doing this.

Messages of note that would end up in the virtual host error log file, if
it exists, include details of uncaught Python exceptions which occur when
the WSGI application script is being loaded, or when the WSGI application
callable object is being executed.

Messages that are logged by a WSGI application via the 'wsgi.errors' object
passed through to the application in the WSGI environment are also logged.
These will go to the virtual host error log file if it exists, or the main
error log file if the virtual host is not setup with its own error log file.
Thus, if you want to add debugging messages to your WSGI application code,
you can use 'wsgi.errors' in conjunction with the 'print' statement as shown
below::

    def application(environ, start_response):
        status = '200 OK'
        output = b'Hello World!'

        print >> environ['wsgi.errors'], "application debug #1"

        response_headers = [('Content-type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response(status, response_headers)

        print >> environ['wsgi.errors'], "application debug #2"

        return [output]

If 'wsgi.errors' is not available to the code which needs to output log
messages, then it should explicitly direct output from the 'print'
statement to 'sys.stderr'::

    import sys

    def function():
        print >> sys.stderr, "application debug #3"
            ...

If ``sys.stderr`` or ``sys.stdout`` is used directly then
these messages will end up in the main server host error log file and not
that for the virtual host unless the WSGI application is running in a
daemon process specifically associated with a virtual host.

Do be aware though that writing to ``sys.stdout`` is by default
restricted in versions of mod_wsgi prior to 3.0 and will result in an
exception occurring of the form::

    IOError: sys.stdout access restricted by mod_wsgi

This is because portable WSGI applications should not write to
``sys.stdout`` or use the 'print' statement without specifying an
alternate file object besides ``sys.stdout`` as the target. This
restriction can be disabled for the whole server using the
WSGIRestrictStdout directive, or by mapping ``sys.stdout`` to
``sys.stderr`` at global scope within in the WSGI application script
file::

    import sys
    sys.stdout = sys.stderr

In general, a WSGI application should always endeavour to only log messages
via the 'wsgi.errors' object that is passed through to a WSGI application
in the WSGI environment. This is because this is the only way of logging
messages for which there is some guarantee that they will end up in a log
file that you might have access to if using a shared server.

An application shouldn't however cache 'wsgi.errors' and try to use it
outside of the context of a request. If this is done an exception will be
raised indicating that the request has expired and the error log object
is now invalid.

That messages output via ``sys.stderr`` and ``sys.stdout`` end up in
the Apache error logs at all is provided as a convenience but there is no
requirement in the WSGI specification that they are valid means of a WSGI
application logging messages.

Displaying Request Environment
------------------------------

When a WSGI application is invoked, the request headers are passed as CGI
variables in the WSGI request environment. The dictionary used for this
also holds information about the WSGI execution environment and mod_wsgi.
This includes mod_wsgi specific variables indicating the name of the
process and application groups within which the WSGI application is
executing.

Knowing the values of the process and application group variables can be
important when needing to validate that your Apache configuration is doing
what you intended as far as ensuring your WSGI application is running in
daemon mode or otherwise.

A simple way of validating such details or getting access to any of the
other WSGI request environment variables is to substitute your existing
WSGI application with one which echos back the details to your browser.
Such a task can be achieved with the following test application. The
application could be extended as necessary to display other information as
well, with process ID, user ID and group ID being shown as examples::

    import cStringIO
    import os

    def application(environ, start_response):
        headers = []
        headers.append(('Content-Type', 'text/plain'))
        write = start_response('200 OK', headers)

        input = environ['wsgi.input']
        output = cStringIO.StringIO()

        print >> output, "PID: %s" % os.getpid()
        print >> output, "UID: %s" % os.getuid()
        print >> output, "GID: %s" % os.getgid()
        print >> output

        keys = environ.keys()
        keys.sort()
        for key in keys:
            print >> output, '%s: %s' % (key, repr(environ[key]))
        print >> output

        output.write(input.read(int(environ.get('CONTENT_LENGTH', '0'))))

        return [output.getvalue()]

For the case of the process group as recorded by the
'mod_wsgi.process_group' variable in the WSGI request environment, if the
value is an empty string then the WSGI application is running in embedded
mode. For any other value it will be running in daemon mode with the process
group named by the variables value.

Note that by default WSGI applications run in embedded mode, which means
within the Apache server child processes which accept the original requests.
Daemon mode processes would only be used through appropriate use of the
WSGIDaemonProcess and WSGIProcessGroup directives to delegate the WSGI
application to a named daemon process group.

For the case of the application group as recorded by the
'mod_wsgi.application_group' variable in the WSGI request environment, if the
value is an empty string then the WSGI application is running in the main
Python interpreter. That is, the very first interpreter created when Python
was initialised. For any other value it indicates it is running in the named
Python sub interpreter.

Note that by default WSGI applications would always run in a sub
interpreter rather than the main interpreter. The name of this sub
interpreter would be automatically constructed from the name of the server
or virtual host, the URL mount point of the WSGI application and the number
of the listener port when it is other than ports 80 or 443.

To delegate a WSGI application to run in main Python interpreter, the
WSGIApplicationGroup directive would need to have been used with the value
'%{GLOBAL}'. Although the value is '%{GLOBAL}', this translates to the
empty string seen for the value of 'mod_wsgi.application_group' within the
WSGI request environment.

The WSGIApplicationGroup directive could also be used to designate a
specific named sub interpreter rather than that selected automatically.

For newcomers this can all be a bit confusing, which is where the test
application comes in as you can use it to validate where your WSGI
application is running is where you intended it to run.

The set of WSGI request environment variables will also show the WSGI
variables indicating whether process is multithreaded and whether the
process group is multiprocess or not. For a more complete explanation
of what that means see documentation of
:doc:`../user-guides/processes-and-threading`.

Tracking Request and Response
-----------------------------

Although one can use above test application to display the request
environment, it is replacing your original WSGI application. Rather than
replace your existing application you can use a WSGI middleware wrapper
application which logs the details to the Apache error log instead::

    # Original WSGI application.

    def application(environ, start_response):
        ...

    # Logging WSGI middleware.

    import pprint

    class LoggingMiddleware:

        def __init__(self, application):
            self.__application = application

        def __call__(self, environ, start_response):
            errors = environ['wsgi.errors']
            pprint.pprint(('REQUEST', environ), stream=errors)

            def _start_response(status, headers, *args):
                pprint.pprint(('RESPONSE', status, headers), stream=errors)
                return start_response(status, headers, *args)

            return self.__application(environ, _start_response)

    application = LoggingMiddleware(application)

The output from the middleware would end up in the Apache error log for the
virtual host, or if no virtual host specific error log file, in the main
Apache error log file.

For more complicated problems it may also be necessary to track both the
request and response content as well. A more complicated middleware which
can log these as well as header information to the file system is as
follows::

    # Original WSGI application.

    def application(environ, start_response):
        ...

    # Logging WSGI middleware.

    import threading
    import pprint
    import time
    import os

    class LoggingInstance:
        def __init__(self, start_response, oheaders, ocontent):
            self.__start_response = start_response
            self.__oheaders = oheaders
            self.__ocontent = ocontent

        def __call__(self, status, headers, *args):
            pprint.pprint((status, headers)+args), stream=self.__oheaders)
            self.__oheaders.close()

            self.__write = self.__start_response(status, headers, *args)
            return self.write

        def __iter__(self):
            return self

        def write(self, data):
            self.__ocontent.write(data)
            self.__ocontent.flush()
            return self.__write(data)

        def next(self):
            data = self.__iterable.next()
            self.__ocontent.write(data)
            self.__ocontent.flush()
            return data

        def close(self):
            if hasattr(self.__iterable, 'close'):
                self.__iterable.close()
            self.__ocontent.close()

        def link(self, iterable):
            self.__iterable = iter(iterable)

    class LoggingMiddleware:

        def __init__(self, application, savedir):
            self.__application = application
            self.__savedir = savedir
            self.__lock = threading.Lock()
            self.__pid = os.getpid()
            self.__count = 0

        def __call__(self, environ, start_response):
            self.__lock.acquire()
            self.__count += 1
            count = self.__count
            self.__lock.release()

            key = "%s-%s-%s" % (time.time(), self.__pid, count)

            iheaders = os.path.join(self.__savedir, key + ".iheaders")
            iheaders_fp = file(iheaders, 'w')

            icontent = os.path.join(self.__savedir, key + ".icontent")
            icontent_fp = file(icontent, 'w+b')

            oheaders = os.path.join(self.__savedir, key + ".oheaders")
            oheaders_fp = file(oheaders, 'w')

            ocontent = os.path.join(self.__savedir, key + ".ocontent")
            ocontent_fp = file(ocontent, 'w+b')

            errors = environ['wsgi.errors']
            pprint.pprint(environ, stream=iheaders_fp)
            iheaders_fp.close()

            length = int(environ.get('CONTENT_LENGTH', '0'))
            input = environ['wsgi.input']
            while length != 0:
                data = input.read(min(4096, length))
                if data:
                    icontent_fp.write(data)
                    length -= len(data)
                else:
                    length = 0
            icontent_fp.flush()
            icontent_fp.seek(0, os.SEEK_SET)
            environ['wsgi.input'] = icontent_fp

            iterable = LoggingInstance(start_response, oheaders_fp, ocontent_fp)
            iterable.link(self.__application(environ, iterable))
            return iterable

    application = LoggingMiddleware(application, '/tmp/wsgi')

For this middleware, the second argument to the constructor should be a
preexisting directory. For each request four files will be saved. These
correspond to input headers, input content, response status and headers,
and request content.

Poorly Performing Code
----------------------

The WSGI specification allows any iterable object to be returned as the
response, so long as the iterable yields string values. That this is the
case means that one can too easily return an object which satisfies this
requirement but has some sort of performance related issue.

The worst case of this is where instead of returning a list containing
strings, a single string is returned. The problem with a string is that
when it is iterated over, a single character of the string is yielded each
time. In other words, a single character is written back to the client on
each loop, with a flush occurring in between to ensure that the character
has actually been written and isn't just being buffered.

Although for small strings a performance impact may not be noticed, if
returning large strings the affect on request throughput could be quite
significant.

Another case which can cause problems is to return a file like object. For
iteration over a file like object, typically what can occur is that a
single line within the file is returned each time. If the file is a line
oriented text file where each line is a of a reasonable length, this may be
okay, but if the file is a binary file there may not actually be line
breaks within the file.

For the case where file contains many short lines, throughput would be
affected much like in the case where a string is returned. For the case
where the file is just binary data, the result can be that the complete
file may be read in on the first loop. If the file is large, this could
cause a large transient spike in memory usage. Once that memory is
allocated, it will then be retained by the process, albeit that it may be
reused by the process at a later point.

Because of the performance impacts in terms of throughput and memory usage,
both these cases should be avoided. For the case of returning a string, it
should be returned with a single element list. For the case of a file like
object, the 'wsgi.file_wrapper' extension should be used, or a wrapper
which suitably breaks the response into chunks.

In order to identify where code may be inadvertently returning such iterable
types, the following code can be used::

    import types

    import cStringIO
    import socket
    import StringIO

    BAD_ITERABLES = [
      cStringIO.InputType,
      socket.SocketType,
      StringIO.StringIO,
      types.FileType,
      types.StringType,
    ]

    class ValidatingMiddleware:

        def __init__(self, application):
            self.__application = application

        def __call__(self, environ, start_response):
            errors = environ['wsgi.errors']

            result = self.__application(environ, start_response)

            value = type(result)
            if value == types.InstanceType:
                value = result.__class__
            if value in BAD_ITERABLES:
                print >> errors, 'BAD ITERABLE RETURNED: ',
                print >> errors, 'URL=%s ' % environ['REQUEST_URI'],
                print >> errors, 'TYPE=%s' % value

            return result

    def application(environ, start_response):
        ...

    application = ValidatingMiddleware(application)

Error Catching Middleware
-------------------------

Because mod_wsgi only logs details of uncaught exceptions to the Apache
error log and returns a generic HTTP 500 "Internal Server Error" response,
if you want the details of any exception to be displayed in the error
page and be visible from the browser, you will need to use a WSGI error
catching middleware component.

One example of WSGI error catching middleware is the ErrorMiddleware class
from Paste.

  * http://www.pythonpaste.org
    
This class can be configured not only to catch exceptions and present the
details to the browser in an error page, it can also be configured to send
the details of any errors in email to a designated recipient, or log the
details to an alternate log file.

Being able to have error details sent by email would be useful in a
production environment or where your application is running on a web
hosting environment and the Apache error logs would not necessarily be
closely monitored on a day to day basis. Enabling of that particular
feature though should possibly only be done when you have some confidence
in the application else you might end up getting inundated with emails.

To use the error catching middleware from Paste you simply need to wrap
your existing application with it such that it then becomes the top level
application entry point::

    def application(environ, start_response):
        status = '200 OK'
        output = b'Hello World!\n'

        response_headers = [('Content-type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response(status, response_headers)

        return [output]

    from paste.exceptions.errormiddleware import ErrorMiddleware
    application = ErrorMiddleware(application, debug=True)

In addition to displaying information about the Python exception that has
occurred and the stack traceback, this middleware component will also
output information about the WSGI environment such that you can see what
was being passed to the WSGI application. This can be useful if the cause
of any problem was unexpected values passed in the headers of the HTTP
request.

Note that error catching middleware is of absolutely no use for trying
to capture and display in the browser any errors that occur at global scope
within the WSGI application script when it is being imported. Details of
any such errors occurring at this point will only be captured in the Apache
error log files. As much as possible you should avoid performing
complicated tasks when the WSGI application script file is being imported,
instead you should only trigger such actions the first time a request is
received. By doing this you will be able to capture errors in such
initialisation code with the error catching middleware.

Also note that the debug mode whereby details are displayed in the browser
should only be used during development and not in a production system. This
is because details which are displayed may be of use to anyone who may wish
to compromise your site.

Python Interactive Debugger
---------------------------

Python debuggers such as implemented by the 'pdb' module can sometimes be
useful in debugging Python applications, especially where there is a need
to single step through code and analyse application state at each point.
Use of such debuggers in web applications can be a bit more tricky than
normal applications though and especially so with mod_wsgi.

The problem with mod_wsgi is that the Apache web server can create multiple
child processes to respond to requests. Partly because of this, but also
just to prevent problems in general, Apache closes off standard input at
startup. Thus there is no actual way to interact with the Python debugger
module if it were used.

To get around this requires having complete control of the Apache web
server that you are using to host your WSGI application. In particular, it
will be necessary to shutdown the web server and then startup the 'httpd'
process explicitly in single process debug mode, avoiding the 'apachectl'
management application altogether::

    $ apachectl stop
    $ httpd -X

If Apache is normally started as the 'root' user, this also will need to be
run as the 'root' user otherwise the Apache web server will not have the
required permissions to write to its log directories etc.

The result of starting the 'httpd' process in this way will be that the
Apache web server will run everything in one process rather than using
multiple processes. Further, it will not close off standard input thus
allowing the Python debugger to be used.

Do note though that one cannot be using the ability of mod_wsgi to run
your application in a daemon process when doing this. The WSGI application
must be running within the main Apache process.

To trigger the Python debugger for any call within your code, the following
customised wrapper for the 'Pdb' class should be used::

    class Debugger:

        def __init__(self, object):
            self.__object = object

        def __call__(self, *args, **kwargs):
            import pdb, sys
            debugger = pdb.Pdb()
            debugger.use_rawinput = 0
            debugger.reset()
            sys.settrace(debugger.trace_dispatch)

            try:
                return self.__object(*args, **kwargs)
            finally:
                debugger.quitting = 1
                sys.settrace(None)

This might for example be used to wrap the actual WSGI application callable
object::

    def application(environ, start_response):
        status = '200 OK'
        output = b'Hello World!\n'

        response_headers = [('Content-type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response(status, response_headers)

        return [output]

    application = Debugger(application)

When a request is now received, the Python debugger will be triggered and
you can interactively debug your application from the window you ran the
'httpd' process. For example::

    > /usr/local/wsgi/scripts/hello.py(21)application()
    -> status = '200 OK'

    (Pdb) list
     16             finally:
     17                 debugger.quitting = 1
     18                 sys.settrace(None)
     19
     20     def application(environ, start_response):
     21  ->     status = '200 OK'
     22         output = b'Hello World!\n'
     23
     24         response_headers = [('Content-type', 'text/plain'),
     25                             ('Content-Length', str(len(output)))]
     26         start_response(status, response_headers)

    (Pdb) print start_response
    <built-in method start_response of mod_wsgi.Adapter object at 0x1160180>

    cont

When wishing to allow the request to complete, issue the 'cont' command. If
wishing to cause the request to abort, issue the 'quit' command. This will
result in a 'BdbQuit' exception being raised and would result in a HTTP
500 "Internal Server Error" response being returned to the client. To kill
off the whole 'httpd' process, after having issued 'cont' or 'quit' to exit
the debugger, interrupt the process using 'CTRL-C'.

To see what commands the Python debugger accepts, issue the 'help' command
and also consult the documentation for the 'pdb' module on the Python web
site.

Note that the Python debugger expects to be able to write to
``sys.stdout`` to display information to the terminal. Thus if using
using a Python web framework which replaces ``sys.stdout`` such as
web.py, you will not be able to use the Python debugger.

Browser Based Debugger
----------------------

In order to use the Python debugger modules you need to have direct access
to the host and the Apache web server that is running your WSGI application.
If your only access to the system is via your web browser this makes the use
of the full Python debugger impractical.

An alternative to the Python debugger modules which is available is an
extension of the WSGI error catching middleware previously described. This
is the EvalException class from Paste. It embodies the error catching
attributes of the ErrorMiddleware class, but also allows some measure of
interactive debugging and introspection through the web browser.

As with any WSGI middleware component, to use the class entails creating
a wrapper around the application you wish to debug::

    def application(environ, start_response):
        status = '200 OK'
        output = b'Hello World!\n'

        response_headers = [('Content-type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response(status, response_headers)

        return [output]

    from paste.evalexception.middleware import EvalException
    application = EvalException(application)

Like ErrorMiddleware when an unexpected exception occurs a web page is
presented which shows the location of the error along with the contents of
the WSGI application environment. Where EvalException is different however
is that it is possible to inspect the local variables residing within each
stack frame down to where the error occurred. Further, it is possible to
enter Python code which can be evaluated within the context of the selected
stack frame in order to access data or call functions or methods of
objects.

In order for this to all work requires that subsequent requests back to
the WSGI application always end up with the same process where the error
originally occurred. With mod_wsgi this does however present a bit of a
problem as Apache can create and use multiple child processes to handle
requests.

Because of this requirement, if you want to be able to use this browser
based interactive debugger, if running your application in embedded mode of
mod_wsgi, you will need to configure Apache such that it only starts up one
child process to handle requests and that it never creates any additional
processes. The Apache configuration directives required to achieve this are
as follows::

    StartServers 1
    ServerLimit 1

The directives must be placed at global scope within the main Apache
configuration files and will affect the whole Apache web server.

If you are using the worker MPM on a UNIX system, restricting Apache to
just a single process may not be an issue, at least during development. If
however you are using the prefork MPM on a UNIX system, you may see issues
if you are using an AJAX intensive page that relies on being able to
execute parallel requests, as only one request at a time will be able to be
handled by the Apache web server.

If using Apache 2.X on a UNIX system, a better approach is to use daemon
mode of mod_wsgi and delegate your application to run in a single daemon
process. This process may be single or multithreaded as per any threading
requirements of your application.

Which ever configuration is used, if the browser based interactive debugger
is used it should only be used on a development system and should never be
deployed on a production system or in a web hosting environment. This is
because the debugger will allow one to execute arbitrary Python code within
the context of your application from a remote client.

Debugging Crashes With GDB
--------------------------

In cases where Apache itself crashes for no apparent reason, the above
techniques are not always particularly useful. This is especially the case
where the crash occurs in non Python code outside of your WSGI application.

The most common cause of Apache crashing, besides any still latent bugs
that may exist in mod_wsgi, of which hopefully there aren't any, are shared
library version mismatches. Another major cause of crashes is third party C
extension modules for Python which are not compatible with being used in a
Python sub interpreter which isn't the first interpreter created when
Python is initialised, or modules which are not compatible with Python sub
interpreters being destroyed and the module then being used in a new Python
sub interpreter.

Examples of where shared library version mismatches are known to occur are
between the version of the 'expat' library used by Apache and that embedded
within the Python 'pyexpat' module. Another is between the version of the
MySQL client libraries used by PHP and the Python MySQL module.

Both these can be a cause of crashes where the different components are
compiled and linked against different versions of the shared library for
the packages in question. It is vitally important that all packages making
use of a shared library were compiled against and use the same version of
a shared library.

Another problematic package is Subversion. In this case there can be
conflicts between the version of Subversion libraries used by mod_dav_svn
and the Python Subversion bindings. Certain versions of the Python
Subversion modules also cause problems because they appear to be
incompatible with use in a Python sub interpreter which isn't the first
interpreter created when Python is initialised.

In this latter issue, the sub interpreter problems can often be solved by
forcing the WSGI application using the Python Subversion modules to run in
the '%{GLOBAL}' application group. This solution often also resolves issues
with SWIG generated bindings, especially where the ``-thread`` option was
supplied to 'swig' when the bindings were generated.

Whatever the reason, in some cases the only way to determine why Apache or
Python is crashing is to use a C code debugger such as 'gdb'. Now although
it is possible to attach 'gdb' to a running process, the preferred method
for using 'gdb' in conjunction with Apache is to run Apache in single
process debug mode from within 'gdb'.

To do this it is necessary to first shutdown Apache. The 'gdb' debugger can
then be started against the 'httpd' executable and then the process started
up from inside of 'gdb'::

    $ /usr/local/apache/bin/apachectl stop
    $ sudo gdb /usr/local/apache/bin/httpd
    GNU gdb 6.1-20040303 (Apple version gdb-384) (Mon Mar 21 00:05:26 GMT 2005)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "powerpc-apple-darwin"...Reading symbols for shared
    libraries ........ done

    (gdb) run -X
    Starting program: /usr/local/apache/bin/httpd -X
    Reading symbols for shared libraries .+++ done
    Reading symbols for shared libraries ..................... done

If Apache is normally started as the 'root' user, this also will need to be
run as the 'root' user otherwise the Apache web server will not have the
required permissions to write to its log directories etc.

If Apache was crashing on startup, you should immediately encounter the
error, otherwise use your web browser to access the URL which is causing
the crash to occur. You can then commence trying to debug why the crash is
occurring.

Note that you should ensure that you have not assigned your WSGI
application to run in a mod_wsgi daemon process using the WSGIDaemonProcess
and WSGIProcessGroup directives. This is because the above procedure will
only catch crashes which occur when the application is running in embedded
mode. If it turns out that the application only crashes when run in mod_wsgi
daemon mode, an alternate method of using 'gdb' will be required.

In this circumstance you should run Apache as normal, but ensure that you
only create one mod_wsgi daemon process and have it use only a single
thread::

    WSGIDaemonProcess debug threads=1
    WSGIProcessGroup debug

If not running the daemon process as a distinct user where you can tell
which process it is, then you will also need to ensure that Apache
!LogLevel directive has been set to 'info'. This is to ensure that
information about daemon processes created by mod_wsgi are logged to the
Apache error log. This is necessary, as you will need to consult the Apache
error logs to determine the process ID of the daemon process that has been
created for that daemon process group::

    mod_wsgi (pid=666): Starting process 'debug' with threads=1.

Knowing the process ID, you should then run 'gdb', telling it to attach
directly to the daemon process::

    $ sudo gdb /usr/local/apache/bin/httpd 666
    GNU gdb 6.1-20040303 (Apple version gdb-384) (Mon Mar 21 00:05:26 GMT 2005)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "powerpc-apple-darwin"...Reading symbols for shared
    libraries ........ done

    /Users/grahamd/666: No such file or directory.
    Attaching to program: `/usr/local/apache/bin/httpd', process 666.
    Reading symbols for shared libraries .+++..................... done
    0x900c7060 in sigwait ()
    (gdb) cont
    Continuing.

Once 'gdb' has been started and attached to the process, then initiate the
request with the URL that causes the application to crash.

Attaching to the running daemon process can also be useful where a single
request or the whole process is appearing to hang. In this case one can
force a stack trace to be output for all running threads to try and
determine what code is getting stuck. The appropriate gdb command in this
instance is 'thread apply all bt'::

    sudo gdb /usr/local/apache-2.2/bin/httpd 666
    GNU gdb 6.3.50-20050815 (Apple version gdb-477) (Sun Apr 30 20:06:22 GMT 2006)
    Copyright 2004 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "powerpc-apple-darwin"...Reading symbols
    for shared libraries ....... done

    /Users/grahamd/666: No such file or directory.
    Attaching to program: `/usr/local/apache/bin/httpd', process 666.
    Reading symbols for shared libraries .+++..................... done
    0x900c7060 in sigwait ()
    (gdb) thread apply all bt

    Thread 4 (process 666 thread 0xd03):
    #0  0x9001f7ac in select ()
    #1  0x004189b4 in apr_pollset_poll (pollset=0x1894650,
        timeout=-1146117585187099488, num=0xf0182d98, descriptors=0xf0182d9c)
        at poll/unix/select.c:363
    #2  0x002a57f0 in wsgi_daemon_thread (thd=0x1889660, data=0x18895e8)
        at mod_wsgi.c:6980
    #3  0x9002bc28 in _pthread_body ()

    Thread 3 (process 666 thread 0xc03):
    #0  0x9001f7ac in select ()
    #1  0x0041d224 in apr_sleep (t=1000000) at time/unix/time.c:246
    #2  0x002a2b10 in wsgi_deadlock_thread (thd=0x0, data=0x2aee68) at
        mod_wsgi.c:7119
    #3  0x9002bc28 in _pthread_body ()

    Thread 2 (process 666 thread 0xb03):
    #0  0x9001f7ac in select ()
    #1  0x0041d224 in apr_sleep (t=299970002) at time/unix/time.c:246
    #2  0x002a2dec in wsgi_monitor_thread (thd=0x0, data=0x18890e8) at
        mod_wsgi.c:7197
    #3  0x9002bc28 in _pthread_body ()

    Thread 1 (process 666 thread 0x203):
    #0  0x900c7060 in sigwait ()
    #1  0x0041ba9c in apr_signal_thread (signal_handler=0x2a29a0
        <wsgi_check_signal>) at threadproc/unix/signals.c:383
    #2  0x002a3728 in wsgi_start_process (p=0x1806418, daemon=0x18890e8)
        at mod_wsgi.c:7311
    #3  0x002a6a4c in wsgi_hook_init (pconf=0x1806418, ptemp=0x0,
        plog=0xc8, s=0x18be8d4) at mod_wsgi.c:7716
    #4  0x0000a5b0 in ap_run_post_config (pconf=0x1806418, plog=0x1844418,
        ptemp=0x180e418, s=0x180da78) at config.c:91
    #5  0x000033d4 in main (argc=3, argv=0xbffffa8c) at main.c:706

It is suggested when trying to debug such issues that the daemon process be
made to run with only a single thread. This will reduce how many stack
traces one needs to analyse.

If you are running with multiple processes within the daemon process group
and all requests are hanging, you will need to get a snapshot of what is
happening in all processes in the daemon process group. Because doing this
by hand will be tedious, it is better to automate it.

To automate capturing the stack traces, first create a file called 'gdb.cmds'
which contains the following::

    set pagination 0
    thread apply all bt
    detach
    quit

This can then be used in conjunction with 'gdb' to avoid needing to enter
the commands manually. For example::

    sudo gdb /usr/local/apache-2.2/bin/httpd -x gdb.cmds -p 666

To be able to automate this further and apply it to all processes in a
daemon process group, then first off ensure that daemon processes are named
in 'ps' output by using the 'display-name' option to WSGIDaemonProcess
directive.

For example, to apply default naming strategy as implemented by mod_wsgi, use::

    WSGIDaemonProcess xxx display-name=%{GLOBAL}

In the output of a BSD derived 'ps' command, this will now show the process
as being named '(wsgi:xxx)'::

    $ ps -cxo command,pid | grep wsgi
    (wsgi:xxx)        666

Note that the name may be truncated as the resultant name can be no longer
than what was the length of the original executable path for Apache. You
may therefore like to name it explicitly::

    WSGIDaemonProcess xxx display-name=(wsgi:xxx)

Having named the processes in the daemon process group, we can now parse the
output of 'ps' to identify the process and apply the 'gdb' command script to
each::

    for pid in `ps -cxo command,pid | awk '{ if ($0 ~ /wsgi:xxx/ && $1 !~ /grep/) print $NF }'`; do sudo gdb -x gdb.cmds -p $pid; done

The actual name given to the daemon process group using the 'display-name'
option should be replaced in this command line. That is, change 'wsgi:xxx'
appropriately.

If you are having problems with process in daemon process groups hanging,
you might consider implementing a monitoring system which automatically
detects somehow when the processes are no longer responding to requests and
automatically trigger this dump of the stack traces before restarting the
daemon process group or Apache.

Extracting Python Stack Traces
------------------------------

Using gdb to get stack traces as described above only gives you information
about what is happening at the C code level. This will not tell where in the
actual Python code execution was at. Your only clue is going to be where a
call out was being made to some distinct C function in a C extension module
for Python.

One can get stack traces for Python code by using::

    def _stacktraces():
       code = []
       for threadId, stack in sys._current_frames().items():
           code.append("\n# ThreadID: %s" % threadId)
           for filename, lineno, name, line in traceback.extract_stack(stack):
               code.append('File: "%s", line %d, in %s' % (filename,
                       lineno, name))
               if line:
                   code.append("  %s" % (line.strip()))

       for line in code:
           print >> sys.stderr, line

The caveat here obviously is that the process has to still be running. There
is also the issue of how you trigger that function to dump stack traces for
executing Python threads.

If the problem you have is that some request handler threads are stuck,
either blocked, or stuck in an infinite loop, and you want to know what they
are doing, then so long as there are still some handler threads left and
the application is still responding to requests, then you could trigger it
from a request handler triggered by making a request against a specific URL.

This though depends on you only running your application within a single
process because as soon as you have multiple processes you have no guarantee
that a request will go to the process you want to debug.

A better method therefore is to have a perpetually running background thread
which monitors for a specific file in the file system. When that file is
created or the modification time changes, then the background thread would
dump the stack traces for the process.

Sample code which takes this approach is included below. This code could be
placed temporarily at the end of your WSGI script file if you know you are
going to need it because of a recurring problem::

    import os
    import sys
    import time
    import signal
    import threading
    import atexit
    import Queue
    import traceback 

    FILE = '/tmp/dump-stack-traces.txt'

    _interval = 1.0

    _running = False
    _queue = Queue.Queue()
    _lock = threading.Lock()

    def _stacktraces(): 
        code = [] 
        for threadId, stack in sys._current_frames().items(): 
            code.append("\n# ProcessId: %s" % os.getpid()) 
            code.append("# ThreadID: %s" % threadId) 
            for filename, lineno, name, line in traceback.extract_stack(stack): 
                code.append('File: "%s", line %d, in %s' % (filename, 
                        lineno, name)) 
                if line: 
                    code.append("  %s" % (line.strip())) 

        for line in code:
            print >> sys.stderr, line

    try:
        mtime = os.path.getmtime(FILE)
    except:
        mtime = None

    def _monitor():
        while 1:
            global mtime

            try:
                current = os.path.getmtime(FILE)
            except:
                current = None

            if current != mtime:
                mtime = current
                _stacktraces()

            # Go to sleep for specified interval.

            try:
                return _queue.get(timeout=_interval)
            except:
                pass

    _thread = threading.Thread(target=_monitor)
    _thread.setDaemon(True)

    def _exiting():
        try:
            _queue.put(True)
        except:
            pass
        _thread.join()

    atexit.register(_exiting)

    def _start(interval=1.0):
        global _interval
        if interval < _interval:
            _interval = interval

        global _running
        _lock.acquire()
        if not _running:
            prefix = 'monitor (pid=%d):' % os.getpid()
            print >> sys.stderr, '%s Starting stack trace monitor.' % prefix
            _running = True
            _thread.start()
        _lock.release()

    _start()

Once your WSGI script file has been loaded, then touching the file
'/tmp/dump-stack-traces.txt' will cause stack traces for active Python
threads to be output to the Apache error log.

Note that the sample code doesn't deal with possibility that with multiple
processes for same application, that all processes may attempt to dump
information at the same time. As such, you may get interleaving of output
from multiple processes in Apache error logs at the same time.

What you may want to do is modify this code to dump out to some special
directory, distinct files containing the trace where the names of the file
include the process ID and a date/time. That way each will be separate.

An example of what one might expect to see from the above code is as
follows::

    # ProcessId: 666
    # ThreadID: 4352905216
    File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 497, in __bootstrap
      self.__bootstrap_inner()
    File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
      self.run()
    File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 477, in run
      self.__target(*self.__args, **self.__kwargs)
    File: "/Library/WebServer/Sites/django-1/htdocs/project.wsgi", line 72, in _monitor
      _stacktraces()
    File: "/Library/WebServer/Sites/django-1/htdocs/project.wsgi", line 47, in _stacktraces
      for filename, lineno, name, line in traceback.extract_stack(stack):

    # ThreadID: 4322832384
    File: "/Library/WebServer/Sites/django-1/htdocs/project.wsgi", line 21, in application
      return _application(environ, start_response)
    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/core/handlers/wsgi.py", line 245, in __call__
      response = middleware_method(request, response)
    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/contrib/sessions/middleware.py", line 36, in process_response
      request.session.save()
    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/contrib/sessions/backends/db.py", line 63, in save
      obj.save(force_insert=must_create, using=using)
    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/base.py", line 434, in save
      self.save_base(using=using, force_insert=force_insert, force_update=force_update)
    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/base.py", line 527, in save_base
      result = manager._insert(values, return_id=update_pk, using=using)
    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/manager.py", line 195, in _insert
      return insert_query(self.model, values, **kwargs)
    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/query.py", line 1479, in insert_query
      return query.get_compiler(using=using).execute_sql(return_id)
    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/sql/compiler.py", line 783, in execute_sql
      cursor = super(SQLInsertCompiler, self).execute_sql(None)
    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/django/db/models/sql/compiler.py", line 727, in execute_sql
      cursor.execute(sql, params)
    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/debug_toolbar/panels/sql.py", line 95, in execute
      stacktrace = tidy_stacktrace(traceback.extract_stack())
    File: "/Library/WebServer/Sites/django-1/lib/python2.6/site-packages/debug_toolbar/panels/sql.py", line 40, in tidy_stacktrace
      s_path = os.path.realpath(s[0])
    File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/posixpath.py", line 355, in realpath
      if islink(component):
    File: "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/posixpath.py", line 132, in islink
      st = os.lstat(path)

Note that one of the displayed threads will be that for the thread which is
dumping the stack traces. That stack trace can obviously be ignored.

One could extend the above recipe in more elaborate ways by using a WSGI
middleware that capture details of each request from the WSGI environment
and also dumping out from that the URL for the request being handled by
any threads. This may assist in working out whether problems are related
to a specific URL.