pathoc: break all the Python webservers!

2012-09-27

A few months ago, I announced pathod, a pathological HTTP daemon. The project started as a testing tool to let me craft standards-violating HTTP responses while working on mitmproxy. It soon became a free-standing project, and has turned out to be incredibly useful in security testing, exploit delivery and general creative mischief. In the last release, I added pathoc - pathod's malicious client-side twin. It does for HTTP requests what pathod does for HTTP responses, and uses the same hyper-terse specification language.

In this post, I show how pathoc can be used as a very simple fuzzer, by finding issues in a number of major pure-Python webservers. None of the tested servers failed catastrophically - they all caught the unexpected exception and continued serving requests. Nonetheless, I think it's reasonable to say that we've triggered a bug if a) the server returns a 500 Internal Server Error response or terminates the connection abnormally, and b) we see a traceback in our logs. By this definition, I found bugs in every pure-Python server I tested.

All of the problems I list below are simple failures of validation - what they have in common is that somewhere in the project, code is called with input that it doesn't expect and can't handle. This matters - in fact, I'd argue that the majority of security problems fall into this category. It's interesting to ponder why this type of issue is so ubiquitous in Python servers. I have no doubt that part of the answer lies in Python's use of exceptions - errors that would be explicit in other languages can be implicit in Python, and code that seems clean and intuitive might in fact be buggy. I think this is especially relevant right now, given the recent flurry of discussion surrounding the Go language and its error handling. It's pretty instructive to read Russ Cox's recent riposte to a post criticizing Go's explicit approach, while looking at the bugs below. I love Python and I think it's a fine language, but I also think the designers of Go probably made the right choice.
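
To make the point about implicit errors concrete, here is a minimal sketch in Python 3. It is not code from any of the servers tested, and both function names are made up for illustration. The first function is the clean-looking one-liner that raises on hostile input; the second returns the error explicitly, so the caller has to decide what to do with it.

```python
def naive_content_length(headers):
    # Reads well, but int() raises ValueError for a value like "x".
    return int(headers.get("Content-Length", 0))


def parse_content_length(headers):
    """Return (length, error); exactly one of the two is None."""
    raw = headers.get("Content-Length", "0")
    # Accept only plain ASCII digits. int() on its own would also
    # accept inputs such as "+5", " 5 " or "1_000".
    if not (raw.isascii() and raw.isdigit()):
        return None, "400 Bad Request: invalid Content-Length %r" % raw
    return int(raw), None
```

With the second form, a malformed header becomes a value the caller can turn into a 400 response, rather than an exception that surfaces several frames up as a 500 or a dropped connection.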

Basic fuzzing with pathoc

My methodology for these tests was very simple indeed. I launched each server in turn, and used pathoc to fire corrupted GET requests at the daemon until I saw an error. I then looked at the logs, and boiled the distinct cases down to a minimal pathoc specification by hand. This exercises a rather shallow set of features in the server software - mostly parsing of the HTTP lead-in and request headers. It's possible to give software a much, much deeper workout with pathoc, but I'll leave that for a future post.

My pathoc fuzzing command looked something like this:

pathoc -n 1000 -p 8080 -t 1 localhost 'get:/:b@10:ir,"\x00"'

The most important flags here are -n, which tells pathoc to make 1000 consecutive requests, and -t, which tells pathoc to time out after one second (necessary to prevent hangs when daemons terminate improperly). The request specification itself breaks down as follows:

get          Issue a GET request
/            ... to the path /
b@10         ... with a body consisting of 10 random bytes
ir,"\x00"    ... and inject a NULL byte at a random location.

It's that last clause - the random injection - that makes the difference between simply crafting requests and basic fuzzing. Every time a new request is issued, the injection occurs at a different location. I varied the injected character between a NULL byte, a carriage return and a random alphabet letter. Each exposed different errors in different servers. For a complete description of the specification language, see the online docs.
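
For readers who want to see the idea outside pathoc, the random injection can be sketched in a few lines of Python 3. This is an illustration of the technique only, not how pathoc is implemented. The helper names and the exact bytes of the request (including the Content-Length header) are assumptions made for the example.

```python
import random


def build_request(path="/", body=b""):
    """Assemble a raw GET request, roughly what 'get:/:b@10' describes."""
    head = b"GET " + path.encode("ascii") + b" HTTP/1.1\r\n"
    head += b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n\r\n"
    return head + body


def inject_random(data, payload, rng=random):
    """Insert payload at a random offset, loosely mirroring the 'ir' clause."""
    offset = rng.randrange(len(data) + 1)
    return data[:offset] + payload + data[offset:]


rng = random.Random()
body = bytes(rng.randrange(256) for _ in range(10))  # 10 random bytes
for _ in range(3):
    # Each iteration corrupts the same request at a different position.
    print(repr(inject_random(build_request(body=body), b"\x00", rng)))
```

Because the offset is chosen afresh on every call, repeated requests probe the request line, the headers and the body in turn, which is what turns a fixed request into a crude fuzzer.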

Results

For each bug, I've given a traceback and a minimal pathoc call to trigger the issue. The tracebacks have been edited lightly to shorten file paths and remove irrelevances like timestamps.

CherryPy

pathoc -p 8080 localhost 'get:/:b@10:h"Content-Length"="x"'
ENGINE ValueError("invalid literal for int() with base 10: 'x'",)
Traceback (most recent call last):
  File "cherrypy/wsgiserver/wsgiserver2.py", line 1292, in communicate
    req.parse_request()
  File "cherrypy/wsgiserver/wsgiserver2.py", line 591, in parse_request
    success = self.read_request_headers()
  File "cherrypy/wsgiserver/wsgiserver2.py", line 711, in read_request_headers
    if mrbs and int(self.inheaders.get("Content-Length", 0)) > mrbs:
ValueError: invalid literal for int() with base 10: 'x'

pathoc -p 8080 localhost 'get:/:i4,"\r"'
ENGINE TypeError("argument of type 'NoneType' is not iterable",)
Traceback (most recent call last):
  File "cherrypy/wsgiserver/wsgiserver2.py", line 1292, in communicate
    req.parse_request()
  File "cherrypy/wsgiserver/wsgiserver2.py", line 580, in parse_request
    success = self.read_request_line()
  File "cherrypy/wsgiserver/wsgiserver2.py", line 644, in read_request_line
    if NUMBER_SIGN in path:
TypeError: argument of type 'NoneType' is not iterable

Tornado

pathoc -p 8080 localhost 'get:/:b@10:h"Content-Length"="x"'
[E 120927 11:42:26 iostream:307] Uncaught exception, closing connection.
    Traceback (most recent call last):
      File "tornado/iostream.py", line 304, in wrapper
        callback(*args)
      File "tornado/httpserver.py", line 254, in _on_headers
        content_length = int(content_length)
    ValueError: invalid literal for int() with base 10: 'x'
[E 120927 11:42:26 ioloop:435] Exception in callback <tornado.stack_context._StackContextWrapper object at 0x1012e28e8>
    Traceback (most recent call last):
      File "tornado/ioloop.py", line 421, in _run_callback
        callback()
      File "tornado/iostream.py", line 304, in wrapper
        callback(*args)
      File "tornado/httpserver.py", line 254, in _on_headers
        content_length = int(content_length)
    ValueError: invalid literal for int() with base 10: 'x'

pathoc -p 8080 localhost 'get:/:h"h\r\n"="x"'
[E iostream:307] Uncaught exception, closing connection.
    Traceback (most recent call last):
      File "tornado/iostream.py", line 304, in wrapper
        callback(*args)
      File "tornado/httpserver.py", line 236, in _on_headers
        headers = httputil.HTTPHeaders.parse(data[eol:])
      File "tornado/httputil.py", line 127, in parse
        h.parse_line(line)
      File "tornado/httputil.py", line 113, in parse_line
        name, value = line.split(":", 1)
    ValueError: need more than 1 value to unpack
[E ioloop:435] Exception in callback <tornado.stack_context._StackContextWrapper object at 0x1012bd7e0>
    Traceback (most recent call last):
      File "tornado/ioloop.py", line 421, in _run_callback
        callback()
      File "tornado/iostream.py", line 304, in wrapper
        callback(*args)
      File "tornado/httpserver.py", line 236, in _on_headers
        headers = httputil.HTTPHeaders.parse(data[eol:])
      File "tornado/httputil.py", line 127, in parse
        h.parse_line(line)
      File "tornado/httputil.py", line 113, in parse_line
        name, value = line.split(":", 1)
    ValueError: need more than 1 value to unpack
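
The second Tornado failure comes from unpacking the result of split(":", 1) on a header line that contains no colon. A minimal Python 3 sketch of a more defensive parse follows. It is my own illustration rather than Tornado's code or its eventual fix, and parse_header_line is a hypothetical name.

```python
def parse_header_line(line):
    """Return (name, value), or None if the line is not 'name: value'."""
    # partition() always returns three parts, so there is nothing to unpack
    # incorrectly; sep is empty when the line has no colon at all.
    name, sep, value = line.partition(":")
    if not sep or not name.strip():
        return None  # the caller can answer with 400 Bad Request
    return name.strip(), value.strip()
```

Both halves of the broken header from the pathoc call above, the bare "h" and the nameless ": x", come back as None instead of raising.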

Twisted

pathoc -p 8080 localhost 'get:/:b@10:h"Content-Length"="x"'
[HTTPChannel,4,127.0.0.1] Unhandled Error
  Traceback (most recent call last):
    File "twisted/python/log.py", line 84, in callWithLogger
      return callWithContext({"system": lp}, func, *args, **kw)
    File "twisted/python/log.py", line 69, in callWithContext
      return context.call({ILogContext: newCtx}, func, *args, **kw)
    File "twisted/python/context.py", line 118, in callWithContext
      return self.currentContext().callWithContext(ctx, func, *args, **kw)
    File "twisted/python/context.py", line 81, in callWithContext
      return func(*args,**kw)
  --- <exception caught here> ---
    File "twisted/internet/selectreactor.py", line 150, in _doReadOrWrite
      why = getattr(selectable, method)()
    File "twisted/internet/tcp.py", line 199, in doRead
      rval = self.protocol.dataReceived(data)
    File "twisted/protocols/basic.py", line 564, in dataReceived
      why = self.lineReceived(line)
    File "twisted/web/http.py", line 1558, in lineReceived
      self.headerReceived(self.__header)
    File "twisted/web/http.py", line 1580, in headerReceived
      self.length = int(data)
  exceptions.ValueError: invalid literal for int() with base 10: 'x'

SimpleHTTP

pathoc -p 8080 localhost 'get:"/\0"'
Exception happened during processing of request from ('127.0.0.1', 54029)
Traceback (most recent call last):
  File "lib/python2.7/SocketServer.py", line 284, in _handle_request_noblock
    self.process_request(request, client_address)
  File "lib/python2.7/SocketServer.py", line 310, in process_request
    self.finish_request(request, client_address)
  File "lib/python2.7/SocketServer.py", line 323, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "lib/python2.7/SocketServer.py", line 638, in __init__
    self.handle()
  File "python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "lib/python2.7/BaseHTTPServer.py", line 328, in handle_one_request
    method()
  File "lib/python2.7/SimpleHTTPServer.py", line 44, in do_GET
    f = self.send_head()
  File "lib/python2.7/SimpleHTTPServer.py", line 68, in send_head
    if os.path.isdir(path):
  File "lib/python2.7/genericpath.py", line 41, in isdir
    st = os.stat(s)
TypeError: must be encoded string without NULL bytes, not str
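
The same class of failure is easy to reproduce directly against the os module. The sketch below is written for Python 3, where os.stat signals an embedded NULL byte with ValueError rather than the Python 2.7 TypeError shown above. safe_isdir is a hypothetical helper, not part of the standard library.

```python
import os


def safe_isdir(path):
    """Return True/False for usable paths, None for paths the OS cannot accept."""
    if "\x00" in path:
        return None  # reject before the path reaches any filesystem call
    return os.path.isdir(path)


try:
    os.stat("/\x00")
except ValueError as exc:
    # Python 3 refuses the path before making the system call.
    print("os.stat refused the path:", exc)
```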

Waitress

pathoc -p 8080 localhost 'get:/:i16," "'
ERROR:waitress:uncaptured python exception, closing channel
<waitress.channel.HTTPChannel connected 127.0.0.1:62330 at 0x1007ca310>
(
    <type 'exceptions.IndexError'>:list index out of range
        [lib/python2.7/asyncore.py|read|83]
        [lib/python2.7/asyncore.py|handle_read_event|444]
        [lib/python2.7/site-packages/waitress/channel.py|handle_read|169]
        [lib/python2.7/site-packages/waitress/channel.py|received|186]
        [lib/python2.7/site-packages/waitress/parser.py|received|99]
        [lib/python2.7/site-packages/waitress/parser.py|parse_header|158]
        [lib/python2.7/site-packages/waitress/parser.py|get_header_lines|247]
)

Edit: The first version of this post had examples that were due to the test WSGI application, not waitress. I've replaced them with the traceback above, which has been reformatted for clarity.

Werkzeug

pathoc -p 8080 localhost 'get:/:h"Host"="n\r\0"'
Traceback (most recent call last):
  File "flask/app.py", line 1518, in __call__
    return self.wsgi_app(environ, start_response)
  File "flask/app.py", line 1507, in wsgi_app
    return response(environ, start_response)
  File "/usr/local/lib/python2.7/site-packages/werkzeug/wrappers.py", line 1082, in __call__
    app_iter, status, headers = self.get_wsgi_response(environ)
  File "werkzeug/wrappers.py", line 1070, in get_wsgi_response
    headers = self.get_wsgi_headers(environ)
  File "werkzeug/wrappers.py", line 986, in get_wsgi_headers
    headers['Location'] = location
  File "werkzeug/datastructures.py", line 1132, in __setitem__
    self.set(key, value)
  File "werkzeug/datastructures.py", line 1097, in set
    self._validate_value(_value)
  File "werkzeug/datastructures.py", line 1065, in _validate_value
    raise ValueError('Detected newline in header value.  This is '
ValueError: Detected newline in header value.  This is a potential security problem