A few months ago, I announced pathod, a pathological HTTP daemon. The project started as a testing tool to let me craft standards-violating HTTP responses while working on mitmproxy. It soon became a free-standing project, and has turned out to be incredibly useful in security testing, exploit delivery and general creative mischief. In the last release, I added pathoc - pathod's malicious client-side twin. It does for HTTP requests what pathod does for HTTP responses, and uses the same hyper-terse specification language.
In this post, I show how pathoc can be used as a very simple fuzzer, by finding issues in a number of major pure-Python webservers. None of the tested servers failed catastrophically - they all caught the unexpected exception and continued serving requests. Nonetheless, I think it's reasonable to say that we've triggered a bug if a) the server returns a 500 Internal Server Error response or terminates the connection abnormally, and b) we see a traceback in our logs. In fact, by this definition, I found bugs in every pure-Python server I tested.
All of the problems I list below are simple failures of validation - what they have in common is that somewhere in the project, code is called with input that it doesn't expect and can't handle. This matters - in fact, I'd argue that the majority of security problems fall in this category. It's interesting to ponder why this type of issue is so ubiquitous in Python servers. I have no doubt that part of the answer lies in Python's use of exceptions - errors that would be explicit in other languages can be implicit in Python, and code that seems clean and intuitive might in fact be buggy. I think this is especially relevant right now, given the recent flurry of discussion surrounding the Go language and its error handling. It's pretty instructive to read Russ Cox's recent riposte to this post criticizing Go's explicit approach, while looking at the bugs below. I love Python and I think it's a fine language, but I also think the designers of Go probably made the right choice.
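To make the implicit-versus-explicit point concrete, here is a minimal sketch in modern Python. Both function names and the choice of a "400 Bad Request" result are my own illustrations, not code taken from any of the servers tested. The first function has the same shape as the Content-Length handling that appears in several tracebacks below; the second forces the caller to look at the failure case.

```python
import re


def content_length_implicit(headers):
    # Reads cleanly, but int() raises ValueError on a value like "x",
    # and nothing at the call site hints that this can happen.
    return int(headers.get("Content-Length", 0))


def content_length_explicit(headers):
    # Hypothetical alternative: return (length, error) so the failure
    # path is visible to every caller. Only plain ASCII digits pass.
    raw = headers.get("Content-Length", "0").strip()
    if not re.fullmatch(r"[0-9]+", raw):
        return None, "400 Bad Request"
    return int(raw), None
```

Calling `content_length_explicit({"Content-Length": "x"})` yields an error value the caller can turn into a response, while the implicit version raises from deep inside the parsing code.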
Basic fuzzing with pathoc
My methodology for these tests was very simple indeed. I launched each server in turn, and used pathoc to fire corrupted GET requests at the daemon until I saw an error. I then looked at the logs, and boiled the distinct cases down to a minimal pathoc specification by hand. This exercises a rather shallow set of features in the server software - mostly parsing of the HTTP lead-in and request headers. It's possible to give software a much, much deeper workout with pathoc, but I'll leave that for a future post.
My pathoc fuzzing command looked something like this:
pathoc -n 1000 -p 8080 -t 1 localhost 'get:/:b@10:ir,"\x00"'
The most important flags here are -n, which tells pathoc to make 1000 consecutive requests, and -t, which tells pathoc to time out after one second (necessary to prevent hangs when daemons terminate improperly). The request specification itself breaks down as follows:
get | Issue a GET request |
/ | ... to the path / |
b@10 | ... with a body consisting of 10 random bytes |
ir,"\x00" | ... and inject a NULL byte at a random location. |
It's that last clause - the random injection - that makes the difference between simply crafting requests and basic fuzzing. Every time a new request is issued, the injection occurs at a different location. I varied the injected character between a NULL byte, a carriage return and a random letter of the alphabet. Each exposed different errors in different servers. For a complete description of the specification language, see the online docs.
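For readers who want to see the idea of random injection outside of pathoc, here is a rough Python sketch. It is not pathoc's implementation: the helper name `inject_at_random`, the fixed request bytes and the seeded generator are all assumptions made for illustration, and pathoc may choose its offsets differently.

```python
import random


def inject_at_random(message, payload, rng=random):
    """Return a copy of message with payload inserted at a random offset."""
    # Any offset from 0 (before the first byte) to len(message)
    # (after the last byte) is allowed.
    offset = rng.randrange(len(message) + 1)
    return message[:offset] + payload + message[offset:]


# A hand-written stand-in for the request pathoc would generate;
# the body is ten fixed bytes here rather than random ones.
base = b"GET / HTTP/1.1\r\nContent-Length: 10\r\n\r\n" + b"A" * 10

# Seeded so that a run can be reproduced when a variant triggers an error.
rng = random.Random(0)
variants = [inject_at_random(base, b"\x00", rng) for _ in range(5)]
```

Each variant differs from the base request by exactly one injected byte, which makes it straightforward to work back from a traceback to the offset that caused it.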
Results
For each bug, I've given a traceback and a minimal pathoc call to trigger the issue. The tracebacks have been edited lightly to shorten file paths and remove irrelevances like timestamps.
CherryPy
pathoc -p 8080 localhost 'get:/:b@10:h"Content-Length"="x"'
ENGINE ValueError("invalid literal for int() with base 10: 'x'",)
Traceback (most recent call last):
File "cherrypy/wsgiserver/wsgiserver2.py", line 1292, in communicate
req.parse_request()
File "cherrypy/wsgiserver/wsgiserver2.py", line 591, in parse_request
success = self.read_request_headers()
File "cherrypy/wsgiserver/wsgiserver2.py", line 711, in read_request_headers
if mrbs and int(self.inheaders.get("Content-Length", 0)) > mrbs:
ValueError: invalid literal for int() with base 10: 'x'
pathoc -p 8080 localhost 'get:/:i4,"\r"'
ENGINE TypeError("argument of type 'NoneType' is not iterable",)
Traceback (most recent call last):
File "cherrypy/wsgiserver/wsgiserver2.py", line 1292, in communicate
req.parse_request()
File "cherrypy/wsgiserver/wsgiserver2.py", line 580, in parse_request
success = self.read_request_line()
File "cherrypy/wsgiserver/wsgiserver2.py", line 644, in read_request_line
if NUMBER_SIGN in path:
TypeError: argument of type 'NoneType' is not iterable
Tornado
pathoc -p 8080 localhost 'get:/:b@10:h"Content-Length"="x"'
[E 120927 11:42:26 iostream:307] Uncaught exception, closing connection.
Traceback (most recent call last):
File "tornado/iostream.py", line 304, in wrapper
callback(*args)
File "tornado/httpserver.py", line 254, in _on_headers
content_length = int(content_length)
ValueError: invalid literal for int() with base 10: 'x'
[E 120927 11:42:26 ioloop:435] Exception in callback <tornado.stack_context._StackContextWrapper object at 0x1012e28e8>
Traceback (most recent call last):
File "tornado/ioloop.py", line 421, in _run_callback
callback()
File "tornado/iostream.py", line 304, in wrapper
callback(*args)
File "tornado/httpserver.py", line 254, in _on_headers
content_length = int(content_length)
ValueError: invalid literal for int() with base 10: 'x'
pathoc -p 8080 localhost 'get:/:h"h\r\n"="x"'
[E iostream:307] Uncaught exception, closing connection.
Traceback (most recent call last):
File "tornado/iostream.py", line 304, in wrapper
callback(*args)
File "tornado/httpserver.py", line 236, in _on_headers
headers = httputil.HTTPHeaders.parse(data[eol:])
File "tornado/httputil.py", line 127, in parse
h.parse_line(line)
File "tornado/httputil.py", line 113, in parse_line
name, value = line.split(":", 1)
ValueError: need more than 1 value to unpack
[E ioloop:435] Exception in callback <tornado.stack_context._StackContextWrapper object at 0x1012bd7e0>
Traceback (most recent call last):
File "tornado/ioloop.py", line 421, in _run_callback
callback()
File "tornado/iostream.py", line 304, in wrapper
callback(*args)
File "tornado/httpserver.py", line 236, in _on_headers
headers = httputil.HTTPHeaders.parse(data[eol:])
File "tornado/httputil.py", line 127, in parse
h.parse_line(line)
File "tornado/httputil.py", line 113, in parse_line
name, value = line.split(":", 1)
ValueError: need more than 1 value to unpack
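The second Tornado traceback comes from unpacking the result of `line.split(":", 1)` when the line contains no colon. As a hedged illustration of validating before unpacking, the sketch below uses a hypothetical `parse_header_line` helper; it is not Tornado's code, nor necessarily how the project would choose to fix the issue.

```python
def parse_header_line(line):
    # str.partition always returns three parts, so there is nothing to
    # unpack incorrectly; a missing colon shows up as an empty separator.
    name, sep, value = line.partition(":")
    if not sep or not name.strip():
        # No colon, or an empty header name: report the line as malformed
        # instead of letting an exception escape.
        return None
    return name.strip(), value.strip()
```

A caller can treat a `None` result as grounds for rejecting the request, rather than discovering the problem through an uncaught ValueError.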
Twisted
pathoc -p 8080 localhost 'get:/:b@10:h"Content-Length"="x"'
[HTTPChannel,4,127.0.0.1] Unhandled Error
Traceback (most recent call last):
File "twisted/python/log.py", line 84, in callWithLogger
return callWithContext({"system": lp}, func, *args, **kw)
File "twisted/python/log.py", line 69, in callWithContext
return context.call({ILogContext: newCtx}, func, *args, **kw)
File "twisted/python/context.py", line 118, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "twisted/python/context.py", line 81, in callWithContext
return func(*args,**kw)
--- <exception caught here> ---
File "twisted/internet/selectreactor.py", line 150, in _doReadOrWrite
why = getattr(selectable, method)()
File "twisted/internet/tcp.py", line 199, in doRead
rval = self.protocol.dataReceived(data)
File "twisted/protocols/basic.py", line 564, in dataReceived
why = self.lineReceived(line)
File "twisted/web/http.py", line 1558, in lineReceived
self.headerReceived(self.__header)
File "twisted/web/http.py", line 1580, in headerReceived
self.length = int(data)
exceptions.ValueError: invalid literal for int() with base 10: 'x'
SimpleHTTPServer
pathoc -p 8080 localhost 'get:"/\0"'
Exception happened during processing of request from ('127.0.0.1', 54029)
Traceback (most recent call last):
File "lib/python2.7/SocketServer.py", line 284, in _handle_request_noblock
self.process_request(request, client_address)
File "lib/python2.7/SocketServer.py", line 310, in process_request
self.finish_request(request, client_address)
File "lib/python2.7/SocketServer.py", line 323, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "lib/python2.7/SocketServer.py", line 638, in __init__
self.handle()
File "python2.7/BaseHTTPServer.py", line 340, in handle
self.handle_one_request()
File "lib/python2.7/BaseHTTPServer.py", line 328, in handle_one_request
method()
File "lib/python2.7/SimpleHTTPServer.py", line 44, in do_GET
f = self.send_head()
File "lib/python2.7/SimpleHTTPServer.py", line 68, in send_head
if os.path.isdir(path):
File "lib/python2.7/genericpath.py", line 41, in isdir
st = os.stat(s)
TypeError: must be encoded string without NULL bytes, not str
Waitress
pathoc -p 8080 localhost 'get:/:i16," "'
ERROR:waitress:uncaptured python exception, closing channel
<waitress.channel.HTTPChannel connected 127.0.0.1:62330 at 0x1007ca310>
(
<type 'exceptions.IndexError'>:list index out of range
[lib/python2.7/asyncore.py|read|83]
[lib/python2.7/asyncore.py|handle_read_event|444]
[lib/python2.7/site-packages/waitress/channel.py|handle_read|169]
[lib/python2.7/site-packages/waitress/channel.py|received|186]
[lib/python2.7/site-packages/waitress/parser.py|received|99]
[lib/python2.7/site-packages/waitress/parser.py|parse_header|158]
[lib/python2.7/site-packages/waitress/parser.py|get_header_lines|247]
)
Edit: The first version of this post had examples that were due to the test WSGI application, not waitress. I've replaced them with the traceback above, which has been reformatted for clarity.
Werkzeug
pathoc -p 8080 localhost 'get:/:h"Host"="n\r\0"'
Traceback (most recent call last):
File "flask/app.py", line 1518, in __call__
return self.wsgi_app(environ, start_response)
File "flask/app.py", line 1507, in wsgi_app
return response(environ, start_response)
File "/usr/local/lib/python2.7/site-packages/werkzeug/wrappers.py", line 1082, in __call__
app_iter, status, headers = self.get_wsgi_response(environ)
File "werkzeug/wrappers.py", line 1070, in get_wsgi_response
headers = self.get_wsgi_headers(environ)
File "werkzeug/wrappers.py", line 986, in get_wsgi_headers
headers['Location'] = location
File "werkzeug/datastructures.py", line 1132, in __setitem__
self.set(key, value)
File "werkzeug/datastructures.py", line 1097, in set
self._validate_value(_value)
File "werkzeug/datastructures.py", line 1065, in _validate_value
raise ValueError('Detected newline in header value. This is '
ValueError: Detected newline in header value. This is a potential security problem