<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0"><channel><title>cortesi</title><link>http://corte.si</link><description>Cortesi</description><generator>PyRSS2Gen-1.0.0</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>mitmproxy 0.9.1</title><link>http://corte.si/posts/code/mitmproxy/announce0_9_1/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/code/mitmproxy/announce0_9_1/index.html"&gt;mitmproxy 0.9.1&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;16 June 2013&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;&lt;a href="http://mitmproxy.org"&gt;
&lt;img src="http://corte.si/posts/code/mitmproxy/announce0_9_1/mitmproxy_0_9_1.png"/&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm happy to announce the release of &lt;a href="http://mitmproxy.org"&gt;mitmproxy 0.9.1&lt;/a&gt;.
This is a bugfix release, with no significant changes in behaviour.&lt;/p&gt;

&lt;p&gt;As hinted in my previous release note, the project itself is also evolving. As
of this release, mitmproxy and its sister projects (&lt;a href="http://pathod.net"&gt;pathod&lt;/a&gt;
and &lt;a href="https://github.com/mitmproxy/netlib"&gt;netlib&lt;/a&gt;) are housed under a separate
organization on Github, rather than my own personal space:&lt;/p&gt;

&lt;p&gt;&lt;a class="btn" href="https://github.com/mitmproxy"&gt;github.com/mitmproxy&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm also very happy to welcome the first external core developer to the
mitmproxy project: &lt;a href="http://maximilianhils.com/"&gt;Maximilian Hils&lt;/a&gt;. Max is the
author of &lt;a href="http://honeyproxy.org/"&gt;HoneyProxy&lt;/a&gt;, a web analysis front-end for
mitmproxy. In the next few months, he'll be working on integrating and
expanding his work to become mitmproxy's official web interface. Max's efforts
will be sponsored by Google under their &lt;a href="http://www.google-melange.com/gsoc/homepage/google/gsoc2013"&gt;Summer of
Code&lt;/a&gt; program, and
will be mentored by the &lt;a href="http://www.honeynet.org/"&gt;HoneyNet Project&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Changelog&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Use "correct" case for Content-Type headers added by mitmproxy.&lt;/li&gt;
&lt;li&gt;Make UTF environment detection more robust.&lt;/li&gt;
&lt;li&gt;Improved MIME-type detection for viewers.&lt;/li&gt;
&lt;li&gt;Always read files in binary mode (Windows compatibility fix).&lt;/li&gt;
&lt;li&gt;Correct PyOpenSSL dependency declaration.&lt;/li&gt;
&lt;li&gt;Some developer documentation.&lt;/li&gt;
&lt;/ul&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/code/mitmproxy/announce0_9_1/index.html</guid><pubDate>Sun, 16 Jun 2013 16:48:00 GMT</pubDate></item><item><title>Skout: a devastating privacy vulnerability</title><link>http://corte.si/posts/security/skout/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/security/skout/index.html"&gt;Skout: a devastating privacy vulnerability&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;31 May 2013&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;I've become a bit weary of the process of public vulnerability disclosure - I'm
much more likely nowadays to just drop companies an anonymous notice and move
on. Every so often, though, I come across an issue so egregious that talking
about it publicly seems like an imperative. This is one of them. &lt;/p&gt;

&lt;p&gt;First, some background. Skout is a location-based mobile social network. The
idea is to allow people to meet others in their area, semi-anonymously, get to
know them, and then perhaps line up a meeting in meatspace. As far as I can
tell, a huge fraction of the userbase are singles, using Skout as an ad-hoc
dating app. Skout's scale is significant - they don't release exact user
numbers, but I've seen claims of more than 10 million users, and a growth rate
of a million users per month.&lt;/p&gt;

&lt;p&gt;In 2012, Skout went through a major PR catastrophe, when its service was linked
to &lt;a href="http://bits.blogs.nytimes.com/2012/06/12/after-rapes-involving-children-skout-a-flirting-app-faces-crisis/"&gt;no fewer than 3 separate rapes of
children&lt;/a&gt;
by adult men posing as teenagers. Skout immediately suspended the service for
teenagers and went through a security re-vamp. A month later, &lt;a href="http://blog.skout.com/2012/07/13/teens-welcome-back-to-skout/"&gt;teens were
allowed back&lt;/a&gt;,
with Skout making much of its new safety system, "advanced, proprietary
algorithms" to weed out stalkers, and its long-term commitment to community
safety.&lt;/p&gt;

&lt;p&gt;Given this background, the problem I found is simple but devastating. The Skout
mobile application talks to Skout's servers through a simple API. When a user's
profile is viewed, an unencrypted plain-HTTP request is made to a path like
this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;http://i22.skout.com/services/ServerService/getProfile
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What's returned is a blob of XML containing the user's complete profile data.
In fact, the profile data is &lt;em&gt;too&lt;/em&gt; complete, including some
information that is never actually used by the app. For example, we can see the
user's exact date of birth:&lt;/p&gt;

&lt;pre&gt;&amp;lt;ax213:birthdayDate&amp;gt;xx/xx/1995&amp;lt;/ax213:birthdayDate&amp;gt;&lt;/pre&gt;

&lt;p&gt;... but only the user's age in years is actually displayed. Most serious,
however, is the high-precision location information that is returned in the
ax213:homeLocation and ax213:location tags:&lt;/p&gt;

&lt;pre&gt;&amp;lt;ax213:latitude&amp;gt;-xx.xxx&amp;lt;/ax213:latitude&amp;gt;
&amp;lt;ax213:longitude&amp;gt;xxx.xxx&amp;lt;/ax213:longitude&amp;gt;&lt;/pre&gt;

&lt;p&gt;Three decimal places of precision in the co-ordinates are enough to locate a
user to within about 110 meters north-south, and to substantially less than
that east-west, depending on the distance from the equator. Here's what that
looks like in a hypothetical example:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
    &lt;img src="http://corte.si/posts/security/skout/skout-map.png"/&gt;
&lt;/center&gt;&lt;/p&gt;
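&lt;p&gt;The arithmetic behind those numbers is easy to check. The sketch below uses a
hypothetical helper name and assumes a spherical Earth with a mean radius of
6,371 km, which is accurate enough here: one step in the third decimal place is
0.001 degrees, roughly 111 meters of latitude, while the east-west span shrinks
with the cosine of the latitude.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import math

# Mean radius of the Earth in metres, treating it as a sphere (an approximation).
EARTH_RADIUS_M = 6371000.0


def grid_cell_size_m(decimal_places, latitude_deg):
    """Return (north_south_m, east_west_m): the distance covered by one step
    in the last decimal place of a lat/long pair at the given latitude."""
    step_rad = math.radians(10 ** -decimal_places)
    north_south = EARTH_RADIUS_M * step_rad
    # Meridians converge toward the poles, so a step in longitude covers
    # less ground as the latitude grows (scaled by cos(latitude)).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for lat in (0, 35, 60):
    ns, ew = grid_cell_size_m(3, lat)
    print(f"latitude {lat}: {ns:.0f} m north-south, {ew:.0f} m east-west")
```
&lt;/code&gt;&lt;/pre&gt;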

&lt;p&gt;I used &lt;a href="http://mitmproxy.org"&gt;mitmproxy&lt;/a&gt; to observe Skout's traffic, but
because the request is unencrypted any tool that allows you to inspect network
traffic would be enough. The result is a stalker's wet dream - click on an
anonymous profile, watch your network traffic, and find out exactly where the
victim lives. I've also seen minors located at malls where they hang out, and
at their schools... Given the scale of Skout's userbase and the ease with which
the data can be obtained, I think there's a high likelihood that this issue has
already been used for unsavoury purposes. &lt;/p&gt;

&lt;p&gt;I reported the vulnerability to Skout on the 24th of May. I'm happy to report
that they immediately realised the seriousness of the situation, and their API
stopped returning exact lat/long values a few hours later. Subsequent
correspondence with Niklas Lindstrom, Skout's CTO, confirmed that they were
taking steps to tighten security. I've encouraged Skout to speak about this
publicly - their userbase needs to know about the issue, and needs to be
reassured that action is being taken to ensure that this type of privacy breach
won't ever recur.&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/security/skout/index.html</guid><pubDate>Fri, 31 May 2013 09:08:00 GMT</pubDate></item><item><title>How mitmproxy works</title><link>http://corte.si/posts/code/mitmproxy/howitworks/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/code/mitmproxy/howitworks/index.html"&gt;How mitmproxy works&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;16 May 2013&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;I started work on &lt;a href="http://mitmproxy.org"&gt;mitmproxy&lt;/a&gt; because I was frustrated
with the available interception tools. I had a long list of minor complaints -
they were insufficiently flexible, not programmable enough, mostly written in
Java (a language I don't enjoy), and so forth. My most serious problem, though,
was opacity. The best tools were all closed source and commercial. SSL
interception is a complicated and delicate process, and after a certain point,
not understanding precisely what your proxy is doing just doesn't fly.&lt;/p&gt;

&lt;p&gt;The text below is now part of the &lt;a href="http://mitmproxy.org/doc/index.html"&gt;official
documentation&lt;/a&gt; of mitmproxy. It's a
detailed description of mitmproxy's interception process, and is more or less
the overview document I wish I had when I first started the project. I proceed
by example, starting with the simplest unencrypted explicit proxying, and
working up to the most complicated interaction - transparent proxying of
SSL-protected traffic&lt;sup class="footnote-ref" id="fnref-ssl"&gt;&lt;a href="#fn-ssl"&gt;1&lt;/a&gt;&lt;/sup&gt; in the presence of
&lt;a href="http://en.wikipedia.org/wiki/Server_Name_Indication"&gt;SNI&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;Explicit HTTP&lt;/h1&gt;

&lt;p&gt;Configuring the client to use mitmproxy as an explicit proxy is the simplest
and most reliable way to intercept traffic. The proxy protocol is codified in
the &lt;a href="http://www.ietf.org/rfc/rfc2068.txt"&gt;HTTP RFC&lt;/a&gt;, so the behaviour of both
the client and the server is well defined, and usually reliable. In the
simplest possible interaction with mitmproxy, a client connects directly to the
proxy and makes a request that looks like this:&lt;/p&gt;

&lt;pre&gt;GET http://example.com/index.html HTTP/1.1&lt;/pre&gt;

&lt;p&gt;This is a proxy GET request - an extended form of the vanilla HTTP GET request
that includes a scheme and host specification, giving mitmproxy all the
information it needs to relay the request upstream.&lt;/p&gt;
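&lt;p&gt;To make this concrete, here's a minimal sketch of pulling the upstream target
out of a proxy request line with Python's &lt;code&gt;urllib.parse&lt;/code&gt;. The
function name is hypothetical and this is not mitmproxy's own parser; it only
handles the request line, not headers or bodies.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
from urllib.parse import urlsplit


def parse_proxy_request_line(line):
    """Split an absolute-form proxy request line into the pieces a proxy
    needs to relay it upstream: (method, scheme, host, port, path)."""
    method, target, _version = line.split(" ")
    url = urlsplit(target)
    # Fall back to the scheme's default port when none is given.
    port = url.port or {"http": 80, "https": 443}[url.scheme]
    path = url.path or "/"
    if url.query:
        path += "?" + url.query
    return method, url.scheme, url.hostname, port, path


print(parse_proxy_request_line("GET http://example.com/index.html HTTP/1.1"))
# ('GET', 'http', 'example.com', 80, '/index.html')
```
&lt;/code&gt;&lt;/pre&gt;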

&lt;p&gt;&lt;img src="http://corte.si/posts/code/mitmproxy/howitworks/explicit.png"/&gt;&lt;/p&gt;

&lt;table class="table"&gt;
    &lt;tbody&gt;
        &lt;tr&gt;

            &lt;td&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;The client connects to the proxy and makes a request.&lt;/td&gt;

        &lt;/tr&gt;

        &lt;tr&gt;

            &lt;td&gt;&lt;b&gt;2&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;Mitmproxy connects to the upstream server and simply forwards
            the request on.&lt;/td&gt;

        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;h1&gt;Explicit HTTPS&lt;/h1&gt;

&lt;p&gt;The process for an explicitly proxied HTTPS connection is quite different. The
client connects to the proxy and makes a request that looks like this:&lt;/p&gt;

&lt;pre&gt;CONNECT example.com:443 HTTP/1.1&lt;/pre&gt;

&lt;p&gt;A conventional proxy can neither view nor manipulate an SSL-encrypted data
stream, so a CONNECT request simply asks the proxy to open a pipe between the
client and server. The proxy here is just a facilitator - it blindly forwards
data in both directions without knowing anything about the contents. The
negotiation of the SSL connection happens over this pipe, and the subsequent
flow of requests and responses is completely opaque to the proxy.&lt;/p&gt;
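&lt;p&gt;A CONNECT request carries only a host and a port. The sketch below (a
hypothetical function name, not mitmproxy's code) shows how little a proxy
learns from it, along with the "200 Connection Established" reply a proxy sends
once it agrees to open the pipe.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
def parse_connect(line):
    """Parse 'CONNECT host:port HTTP/1.1' into (host, port)."""
    method, authority, _version = line.split(" ")
    if method != "CONNECT":
        raise ValueError("not a CONNECT request: " + line)
    # rpartition splits on the last colon, so the port is always the suffix;
    # strip brackets from IPv6 literals such as [::1].
    host, _, port = authority.rpartition(":")
    return host.strip("[]"), int(port)


# Reply a conventional proxy sends once the pipe to the server is open.
CONNECT_OK = b"HTTP/1.1 200 Connection Established\r\n\r\n"

print(parse_connect("CONNECT example.com:443 HTTP/1.1"))
# ('example.com', 443)
```
&lt;/code&gt;&lt;/pre&gt;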

&lt;h3&gt;The MITM in mitmproxy&lt;/h3&gt;

&lt;p&gt;This is where mitmproxy's fundamental trick comes into play. The MITM in its
name stands for Man-In-The-Middle - a reference to the process we use to
intercept and interfere with these theoretically opaque data streams. The basic
idea is to pretend to be the server to the client, and pretend to be the client
to the server, while we sit in the middle decoding traffic from both sides. The
tricky part is that the &lt;a href="http://en.wikipedia.org/wiki/Certificate_authority"&gt;Certificate
Authority&lt;/a&gt; system is
designed to prevent exactly this attack, by allowing a trusted third-party to
cryptographically sign a server's SSL certificates to verify that they are
legit. If this signature doesn't match or is from a non-trusted party, a secure
client will simply drop the connection and refuse to proceed. Despite the many
shortcomings of the CA system as it exists today, this is usually fatal to
attempts to MITM an SSL connection for analysis. Our answer to this conundrum
is to become a trusted Certificate Authority ourselves. Mitmproxy includes a
full CA implementation that generates interception certificates on the fly. To
get the client to trust these certificates, we &lt;a href="http://mitmproxy.org/doc/ssl.html"&gt;register mitmproxy as a trusted
CA with the device manually&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Complication 1: What's the remote hostname?&lt;/h3&gt;

&lt;p&gt;To proceed with this plan, we need to know the domain name to use in the
interception certificate - the client will verify that the certificate is for
the domain it's connecting to, and abort if this is not the case. At first
blush, it seems that the CONNECT request above gives us all we need - in this
example, the CONNECT target and the hostname the client expects are both
"example.com". But what if the client had initiated the connection as follows:&lt;/p&gt;

&lt;pre&gt;CONNECT 10.1.1.1:443 HTTP/1.1&lt;/pre&gt;

&lt;p&gt;Using the IP address is perfectly legitimate because it gives us enough
information to initiate the pipe, even though it doesn't reveal the remote
hostname.&lt;/p&gt;
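&lt;p&gt;Spotting this case is simple. As a sketch, assuming the CONNECT target has
already been split into host and port, Python's standard
&lt;code&gt;ipaddress&lt;/code&gt; module can tell a bare IP address from a hostname
(the helper name is hypothetical):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import ipaddress


def connect_target_is_ip(host):
    """True if a CONNECT target is a bare IPv4/IPv6 address, in which case it
    says nothing about the hostname the client expects on the certificate."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


print(connect_target_is_ip("10.1.1.1"))     # True
print(connect_target_is_ip("example.com"))  # False
```
&lt;/code&gt;&lt;/pre&gt;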

&lt;p&gt;Mitmproxy has a cunning mechanism that smooths this over - &lt;a href="http://mitmproxy.org/doc/features/upstreamcerts.html"&gt;upstream
certificate sniffing&lt;/a&gt;. As
soon as we see the CONNECT request, we pause the client part of the
conversation, and initiate a simultaneous connection to the server. We complete
the SSL handshake with the server, and inspect the certificates it used. Now,
we use the Common Name in the upstream SSL certificates to generate the dummy
certificate for the client. Voila, we have the correct hostname to present to
the client, even if it was never specified.&lt;/p&gt;

&lt;h3&gt;Complication 2: Subject Alternative Name&lt;/h3&gt;

&lt;p&gt;Enter the next complication. Sometimes, the certificate Common Name is not, in
fact, the hostname that the client is connecting to. This is because of the
optional &lt;a href="http://en.wikipedia.org/wiki/SubjectAltName"&gt;Subject Alternative
Name&lt;/a&gt; field in the SSL certificate
that allows an arbitrary number of alternative domains to be specified. If the
expected domain matches any of these, the client will proceed, even though the
domain doesn't match the certificate Common Name. The answer here is simple:
when we extract the CN from the upstream cert, we also extract the SANs, and add
them to the generated dummy certificate.&lt;/p&gt;
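&lt;p&gt;As an illustration, the sketch below gathers the CN and DNS SANs from a
certificate in the dictionary format returned by Python's
&lt;code&gt;ssl.SSLSocket.getpeercert()&lt;/code&gt;. It is not mitmproxy's
implementation (mitmproxy depends on pyOpenSSL), and the helper name and sample
values are hypothetical. Note that &lt;code&gt;getpeercert()&lt;/code&gt; only fills
in these fields for a certificate that was validated, so an interceptor that
skips verification would have to parse the binary form instead.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
def names_for_dummy_cert(peercert):
    """Collect the Common Name and DNS Subject Alternative Names from a
    certificate dict shaped like ssl.SSLSocket.getpeercert() output."""
    common_name = None
    # 'subject' is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in peercert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                common_name = value
    # Keep only DNS entries; SANs can also hold IP addresses, emails, etc.
    sans = [value for kind, value in peercert.get("subjectAltName", ())
            if kind == "DNS"]
    return common_name, sans


# Shaped like getpeercert() output; the values are illustrative.
upstream = {
    "subject": ((("organizationName", "Example Inc"),),
                (("commonName", "example.com"),)),
    "subjectAltName": (("DNS", "example.com"), ("DNS", "www.example.com")),
}
print(names_for_dummy_cert(upstream))
# ('example.com', ['example.com', 'www.example.com'])
```
&lt;/code&gt;&lt;/pre&gt;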

&lt;h3&gt;Complication 3: Server Name Indication&lt;/h3&gt;

&lt;p&gt;One of the big limitations of vanilla SSL is that each certificate requires its
own IP address. This means that you can't do virtual hosting where multiple
domains with independent certificates share the same IP address. In a world
with a rapidly shrinking IPv4 address pool this is a problem, and we have a
solution in the form of the &lt;a href="http://en.wikipedia.org/wiki/Server_Name_Indication"&gt;Server Name
Indication&lt;/a&gt; extension to
the SSL and TLS protocols. This lets the client specify the remote server name
at the start of the SSL handshake, which then lets the server select the right
certificate to complete the process.&lt;/p&gt;

&lt;p&gt;SNI breaks our upstream certificate sniffing process, because when we connect
without using SNI, we get served a default certificate that may have nothing to
do with the certificate expected by the client. The solution is another tricky
complication to the client connection process. After the client connects, we
allow the SSL handshake to continue until just &lt;em&gt;after&lt;/em&gt; the SNI value has been
passed to us. Now we can pause the conversation, and initiate an upstream
connection using the correct SNI value, which then serves us the correct
upstream certificate, from which we can extract the expected CN and SANs.&lt;/p&gt;
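&lt;p&gt;Python's standard &lt;code&gt;ssl&lt;/code&gt; module exposes a hook at exactly this
point: &lt;code&gt;SSLContext.sni_callback&lt;/code&gt; runs once the client's hello
has been received, and can swap in a different context (and therefore a
different certificate) before the handshake continues. Because the callback
runs synchronously, any upstream sniffing done inside it effectively pauses the
client handshake. The sketch below assumes a hypothetical
&lt;code&gt;context_for_host&lt;/code&gt; function that returns a context holding a
freshly generated interception certificate; it illustrates the technique and is
not mitmproxy's code.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import ssl


def make_sni_callback(context_for_host, seen_names):
    """Return a function suitable for SSLContext.sni_callback.

    context_for_host(hostname) is a hypothetical helper that returns an
    SSLContext loaded with an interception certificate for that hostname,
    e.g. one built from the upstream certificate fetched using the same SNI.
    """
    def on_sni(ssl_sock, server_name, initial_context):
        seen_names.append(server_name)
        if server_name is not None:
            # Swap in a context whose certificate matches the requested name;
            # the handshake then continues with that certificate.
            ssl_sock.context = context_for_host(server_name)
        # Returning None lets the handshake proceed; with no SNI the
        # original context (and its default certificate) is used.
        return None

    return on_sni


names = []
server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
# Placeholder factory: a real one would load a generated certificate chain.
server_ctx.sni_callback = make_sni_callback(
    lambda host: ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER), names)
```
&lt;/code&gt;&lt;/pre&gt;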

&lt;p&gt;There's another wrinkle here. Due to a limitation of the SSL library mitmproxy
uses, we can't detect that a connection &lt;em&gt;hasn't&lt;/em&gt; sent an SNI request until it's
too late for upstream certificate sniffing. In practice, we therefore make a
vanilla SSL connection upstream to sniff non-SNI certificates, and then discard
the connection if the client sends an SNI notification. If you're watching your
traffic with a packet sniffer, you'll see two connections to the server when an
SNI request is made, the first of which is immediately closed after the SSL
handshake. Luckily, this is rarely a problem.&lt;/p&gt;

&lt;h3&gt;Putting it all together&lt;/h3&gt;

&lt;p&gt;Let's put all of this together into the complete explicitly proxied HTTPS flow.&lt;/p&gt;

&lt;p&gt;&lt;img src="explicit_https.png"/&gt;&lt;/p&gt;

&lt;table class="table"&gt;
    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;
            &lt;td&gt;The client makes a connection to mitmproxy, and issues an HTTP
            CONNECT request.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;2&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;Mitmproxy responds with a 200 Connection Established, as if it
            has set up the CONNECT pipe.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;3&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;The client believes it's talking to the remote server, and
            initiates the SSL connection. It uses SNI to indicate the hostname
            it is connecting to.&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;4&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;Mitmproxy connects to the server, and establishes an SSL
            connection using the SNI hostname indicated by the client.&lt;/td&gt;

        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;5&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;The server responds with the matching SSL certificate, which
            contains the CN and SAN values needed to generate the interception
            certificate.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;6&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;Mitmproxy generates the interception cert, and continues the
            client SSL handshake paused in step 3.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;7&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;The client sends the request over the established SSL
            connection.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;8&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;Mitmproxy passes the request on to the server over the SSL
            connection initiated in step 4.&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;h1&gt;Transparent HTTP&lt;/h1&gt;

&lt;p&gt;When a transparent proxy is used, the HTTP/S connection is redirected into a
proxy at the network layer, without any client configuration being required.
This makes transparent proxying ideal for those situations where you can't
change client behaviour - proxy-oblivious Android applications being a common
example.&lt;/p&gt;

&lt;p&gt;To achieve this, we need to introduce two extra components. The first is a
redirection mechanism that transparently reroutes a TCP connection destined for
a server on the Internet to a listening proxy server. This usually takes the
form of a firewall on the same host as the proxy server -
&lt;a href="http://www.netfilter.org/"&gt;iptables&lt;/a&gt; on Linux or
&lt;a href="http://en.wikipedia.org/wiki/PF_(firewall)"&gt;pf&lt;/a&gt; on OSX. Once the client has
initiated the connection, it makes a vanilla HTTP request, which might look
something like this:&lt;/p&gt;

&lt;pre&gt;GET /index.html HTTP/1.1&lt;/pre&gt;

&lt;p&gt;Note that this request differs from the explicit proxy variation, in that it
omits the scheme and hostname. How, then, do we know which upstream host to
forward the request to? The routing mechanism that has performed the
redirection keeps track of the original destination for us.  Each routing
mechanism has a different way of exposing this data, so this introduces the
second component required for working transparent proxying: a host module that
knows how to retrieve the original destination address from the router. In
mitmproxy, this takes the form of a built-in set of
&lt;a href="https://github.com/cortesi/mitmproxy/tree/master/libmproxy/platform"&gt;modules&lt;/a&gt;
that know how to talk to each platform's redirection mechanism.  Once we have
this information, the process is fairly straightforward.&lt;/p&gt;
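&lt;p&gt;For the Linux case, a minimal sketch: when iptables REDIRECTs an IPv4
connection, netfilter records the original destination, which the proxy can
read back with the &lt;code&gt;SO_ORIGINAL_DST&lt;/code&gt; socket option. Python's
&lt;code&gt;socket&lt;/code&gt; module doesn't name that constant, so the sketch
defines it (80, from the Linux netfilter headers) and decodes the returned
&lt;code&gt;sockaddr_in&lt;/code&gt; by hand. The function names are hypothetical,
and pf on OSX needs an entirely different mechanism.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import socket
import struct

# SO_ORIGINAL_DST as defined in linux/netfilter_ipv4.h; Python's socket
# module does not export it as a named constant.
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw):
    """Decode a 16-byte struct sockaddr_in into (ip, port)."""
    # Layout: 2-byte family, 2-byte port (network byte order),
    # 4-byte IPv4 address, 8 bytes of padding.
    port = struct.unpack("!H", raw[2:4])[0]
    ip = socket.inet_ntoa(raw[4:8])
    return ip, port


def original_destination(conn):
    """Ask netfilter where a REDIRECTed IPv4 connection was originally headed.
    Linux-only; conn is the client socket returned by accept()."""
    raw = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)


# Hand-built sockaddr_in for 10.1.1.1:80, to exercise the decoder offline.
sample = struct.pack("!HH4s8x", socket.AF_INET, 80,
                     socket.inet_aton("10.1.1.1"))
print(parse_sockaddr_in(sample))
# ('10.1.1.1', 80)
```
&lt;/code&gt;&lt;/pre&gt;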

&lt;p&gt;&lt;img src="transparent.png"/&gt;&lt;/p&gt;

&lt;table class="table"&gt;
    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;
            &lt;td&gt;The client makes a connection to the server.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;2&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;The router redirects the connection to mitmproxy, which is
            typically listening on a local port of the same host. Mitmproxy
            then consults the routing mechanism to establish what the original
            destination was.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;3&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;Now, we simply read the client's request...&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;4&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;... and forward it upstream.&lt;/td&gt;

        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;h1&gt;Transparent HTTPS&lt;/h1&gt;

&lt;p&gt;The first step is to determine whether we should treat an incoming connection
as HTTPS. The mechanism for doing this is simple - we use the routing mechanism
to find out what the original destination port is. By default, we treat all
traffic destined for ports 443 and 8443 as SSL.&lt;/p&gt;
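&lt;p&gt;The decision itself is tiny. A sketch, assuming the original destination port
has already been recovered from the routing mechanism; the names are
hypothetical, and the port set is a parameter rather than hard-wired:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
# Ports treated as SSL by default, as described in the text.
DEFAULT_SSL_PORTS = frozenset({443, 8443})


def should_intercept_as_ssl(original_port, ssl_ports=DEFAULT_SSL_PORTS):
    """Decide from the original destination port whether to begin an SSL
    handshake with the client or to read a plain HTTP request."""
    return original_port in ssl_ports


print(should_intercept_as_ssl(443))  # True
print(should_intercept_as_ssl(80))   # False
```
&lt;/code&gt;&lt;/pre&gt;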

&lt;p&gt;From here, the process is a merger of the methods we've described for
transparently proxying HTTP, and explicitly proxying HTTPS. We use the routing
mechanism to establish the upstream server address, and then proceed as for
explicit HTTPS connections to establish the CN and SANs, and cope with SNI.&lt;/p&gt;

&lt;p&gt;&lt;img src="transparent_https.png"/&gt;&lt;/p&gt;

&lt;table class="table"&gt;
    &lt;tbody&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;1&lt;/b&gt;&lt;/td&gt;
            &lt;td&gt;The client makes a connection to the server.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;2&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;The router redirects the connection to mitmproxy, which is
            typically listening on a local port of the same host. Mitmproxy
            then consults the routing mechanism to establish what the original
            destination was.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;3&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;The client believes it's talking to the remote server, and
            initiates the SSL connection. It uses SNI to indicate the hostname
            it is connecting to.&lt;/td&gt;
        &lt;/tr&gt;

        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;4&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;Mitmproxy connects to the server, and establishes an SSL
            connection using the SNI hostname indicated by the client.&lt;/td&gt;

        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;5&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;The server responds with the matching SSL certificate, which
            contains the CN and SAN values needed to generate the interception
            certificate.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;6&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;Mitmproxy generates the interception cert, and continues the
            client SSL handshake paused in step 3.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;7&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;The client sends the request over the established SSL
            connection.&lt;/td&gt;
        &lt;/tr&gt;
        &lt;tr&gt;
            &lt;td&gt;&lt;b&gt;8&lt;/b&gt;&lt;/td&gt;

            &lt;td&gt;Mitmproxy passes the request on to the server over the SSL
            connection initiated in step 4.&lt;/td&gt;
        &lt;/tr&gt;
    &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class="footnotes"&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id="fn-ssl"&gt;
&lt;p&gt;I use "SSL" to refer to both SSL and TLS in the generic sense, unless otherwise specified.&amp;nbsp;&lt;a href="#fnref-ssl" class="footnoteBackLink" title="Jump back to footnote 1 in the text."&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/code/mitmproxy/howitworks/index.html</guid><pubDate>Thu, 16 May 2013 08:53:00 GMT</pubDate></item><item><title>pathod 0.9</title><link>http://corte.si/posts/code/pathod/announce0_9.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/code/pathod/announce0_9.html"&gt;pathod 0.9&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;16 May 2013&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;I've just released &lt;a href="http://pathod.net"&gt;pathod 0.9&lt;/a&gt;, my toolset for crafting
malicious and interesting HTTP traffic. Apart from the usual range of stability
improvements and bugfixes, this release introduces a major new set of features:
proxy support. &lt;a href="http://pathod.net/docs/pathoc"&gt;Pathoc&lt;/a&gt;, the client, has
sprouted support for vanilla proxy connections, and is also able to tunnel
through proxies using CONNECT. &lt;a href="http://pathod.net/docs/pathod"&gt;Pathod&lt;/a&gt;, the
server, will now respond to proxy requests as well as straight HTTP, and will
treat CONNECT requests as SSL with on-the-fly generation of dummy certificates. &lt;/p&gt;

&lt;p&gt;The Pathod changes in particular open a whole new range of possibilities for
fuzzing and other mischief. Any client with proxy support can be directed at
Pathod, which can then impersonate the upstream server and return the
creatively malicious response of your choice.&lt;/p&gt;

&lt;p&gt;There have also been some organizational changes. This is the first release
based on &lt;a href="http://github.com/cortesi/netlib"&gt;netlib&lt;/a&gt;, the gonzo networking
library pathod now shares with &lt;a href="http://mitmproxy.org"&gt;mitmproxy&lt;/a&gt;. Over the next
while, pathod and mitmproxy will move closer together. As a sign of this, the
major version numbers between these projects are now synchronized. &lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/code/pathod/announce0_9.html</guid><pubDate>Thu, 16 May 2013 08:41:00 GMT</pubDate></item><item><title>mitmproxy 0.9</title><link>http://corte.si/posts/code/mitmproxy/announce0_9/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/code/mitmproxy/announce0_9/index.html"&gt;mitmproxy 0.9&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;15 May 2013&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;&lt;a href="http://mitmproxy.org"&gt;
&lt;img src="http://corte.si/posts/code/mitmproxy/announce0_9/mitmproxy_0_9.png"/&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm happy to announce the release of &lt;a href="http://mitmproxy.org"&gt;mitmproxy 0.9&lt;/a&gt;.
This is a major release, with huge improvements to mitmproxy pretty much across
the board. So much has happened in the year since the last release that it's
difficult to pick out the headlines. Mitmproxy is now faster, more scalable,
and works in more tricky corner cases than ever before. Full transparent mode
support has landed for both Linux and OSX. Content decoding is much nicer, with
a slew of new targets like
&lt;a href="http://en.wikipedia.org/wiki/Action_Message_Format"&gt;AMF&lt;/a&gt; and &lt;a href="https://code.google.com/p/protobuf/"&gt;Protocol
Buffers&lt;/a&gt;. We now have a WSGI container
that allows you to host web apps right in the proxy. In addition to this, there
is a myriad of new features, bugfixes and other small improvements. &lt;/p&gt;

&lt;p&gt;There are also changes afoot in the project itself. As a first step, I've moved
mitmproxy from the GPLv3 to an MIT license. I hope that this will make it
easier for people to use the project in more contexts. Keep an eye out for more
changes along these lines soon, geared to broadening participation in the project.&lt;/p&gt;

&lt;h2&gt;Changelog&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Upstream certs mode is now the default.&lt;/li&gt;
&lt;li&gt;Add a WSGI container that lets you host in-proxy web applications.&lt;/li&gt;
&lt;li&gt;Full transparent proxy support for Linux and OSX.&lt;/li&gt;
&lt;li&gt;Introduce netlib, a common codebase for mitmproxy and pathod
(http://github.com/cortesi/netlib).&lt;/li&gt;
&lt;li&gt;Full support for SNI.&lt;/li&gt;
&lt;li&gt;Color palettes for mitmproxy, tailored for light and dark terminal
backgrounds.&lt;/li&gt;
&lt;li&gt;Stream flows to file as responses arrive with the "W" shortcut in
mitmproxy.&lt;/li&gt;
&lt;li&gt;Extend the filter language, including ~d domain match operator, ~a to
match asset flows (js, images, css).&lt;/li&gt;
&lt;li&gt;Follow mode in mitmproxy ("F" shortcut) to "tail" flows as they arrive.&lt;/li&gt;
&lt;li&gt;--dummy-certs option to specify and preserve the dummy certificate
directory.&lt;/li&gt;
&lt;li&gt;Server replay from the current captured buffer.&lt;/li&gt;
&lt;li&gt;Huge improvements in content views. We now have viewers for AMF, HTML,
JSON, Javascript, images, XML, URL-encoded forms, as well as hexadecimal
and raw views.&lt;/li&gt;
&lt;li&gt;Add Set Headers, analogous to replacement hooks. Defines headers that are set
on flows, based on a matching pattern.&lt;/li&gt;
&lt;li&gt;A graphical editor for path components in mitmproxy.&lt;/li&gt;
&lt;li&gt;A small set of standard user-agent strings, which can be used easily in
the header editor.&lt;/li&gt;
&lt;li&gt;Proxy authentication to limit access to mitmproxy.&lt;/li&gt;
&lt;/ul&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/code/mitmproxy/announce0_9/index.html</guid><pubDate>Wed, 15 May 2013 11:43:00 GMT</pubDate></item><item><title>Google, destroyer of ecosystems</title><link>http://corte.si/posts/socialmedia/rip-google-reader.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/socialmedia/rip-google-reader.html"&gt;Google, destroyer of ecosystems&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;14 March 2013&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;Google has finally shut down a service I actually care about - &lt;a href="http://googlereader.blogspot.co.nz/2013/03/powering-down-google-reader.html"&gt;Google Reader
will die a graceless, undignified death on July 1,
2013&lt;/a&gt;.
The only way Google could inconvenience me more would be to shut down search
itself, and yet - I'm not angry that Google is shutting Reader down. I'm
furious that they ever entered the RSS game at all. Consider this quote from a
TechCrunch &lt;a href="http://techcrunch.com/2006/01/10/searchfox-to-shut-down/"&gt;article in January
2006&lt;/a&gt;. Here, Michael
Arrington ends an article about the shutdown of a feed reader service with a
statement that seems truly bizarre today:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The RSS reader space is becoming hyper competitive, with dozens of different
  choices for readers. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A hyper competitive space with dozens of choices? Reader made its first public
appearance a couple of months before this, in October 2005. I remember this
period well - it was a time of immense excitement, when RSS seemed to be the
future, the news ecosystem was vibrant, and this thing called the blogosphere,
fueled by peer subscription, was doubling in size every six months. It was into
this magic garden that Google wandered, like a giant toddler leaving
destruction in its wake. Reader was undeniably a good product, but its best
quality was also its worst: it was free. Subsidized by Google's immense search
profits, it never had to earn its keep, and its competitors started to die.
Over time, the "hyper competitive" RSS reader market turned into a monoculture.
Today, on the eve of its shutdown, RSS more or less means "Google Reader" to a
large fraction of readers, to the extent where even the best feed readers on
iOS are just Google Reader clients&lt;sup class="footnote-ref" id="fnref-1"&gt;&lt;a href="#fn-1"&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;The sudden shock of Reader's closure will harm a news ecosystem that I &lt;a href="http://corte.si/posts/socialmedia/trouble-with-social-news.html"&gt;already
believe to be deeply ill&lt;/a&gt;.  Google
Reader is not just a core part of my information diet - it's also the most
direct channel I have to readers of this blog. As of today, the Reader
subscriber count for &lt;a href="http://corte.si"&gt;corte.si&lt;/a&gt; stands at about 3 times the
total number of other subscribers combined. Some of these readers will migrate
to other services and stay in touch, but many will inevitably abandon the idea
of direct subscription to blogs entirely. In the next few months, tens of
thousands of small blogs will lose direct contact with a large fraction of
their readers. &lt;/p&gt;

&lt;p&gt;The truth is this: Google destroyed the RSS feed reader ecosystem with a
subsidized product, stifling its competitors and killing innovation. It then
neglected Google Reader itself for years, after it had effectively become the
only player. Today it does further damage by buggering up the already
beleaguered links between publishers and readers. It would have been better for
the Internet if Reader had never been at all.&lt;/p&gt;

&lt;div class="footnotes"&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id="fn-1"&gt;
&lt;p&gt;Yes, I'm aware that there are a few hardy outliers still playing in this space. My own logs show that their reach is insignificant, though, and when I tried to shift my subscriptions about a year ago, there was nothing as good as Reader itself. Once &lt;a href="http://www.newsblur.com"&gt;NewsBlur's&lt;/a&gt; servers have recovered, I definitely plan to give it another shot.&amp;nbsp;&lt;a href="#fnref-1" class="footnoteBackLink" title="Jump back to footnote 1 in the text."&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/socialmedia/rip-google-reader.html</guid><pubDate>Thu, 14 Mar 2013 16:15:00 GMT</pubDate></item><item><title>Things I found on GitHub: aspell custom dictionary entries</title><link>http://corte.si/posts/hacks/github-spellingdicts/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/hacks/github-spellingdicts/index.html"&gt;Things I found on GitHub: aspell custom dictionary entries&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;26 February 2013&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;I've been doing a series of posts looking at data gathered with
&lt;a href="https://github.com/cortesi/ghrabber"&gt;ghrabber&lt;/a&gt;, a simple tool I wrote that
lets you grab files matching a search specification from GitHub. Last week, I
looked at &lt;a href="http://corte.si/posts/hacks/github-shhistory/index.html"&gt;shell history&lt;/a&gt; in the
broad, and then specifically at &lt;a href="http://corte.si/posts/hacks/github-pipechains/index.html"&gt;pipe
chains&lt;/a&gt;.  Today, I move on to
something different - custom &lt;a href="http://aspell.net/"&gt;aspell&lt;/a&gt; dictionaries. When
aspell finds a word it doesn't recognize, the user is prompted to correct it,
ignore it, or add it to a custom dictionary so that it will be recognized as
correct in future. These words are written to the user's custom dictionary - a
file named &lt;strong&gt;.aspell_en_pw&lt;/strong&gt; that lives in the user's home directory. It
turns out that 30 people have checked aspell dictionaries into GitHub,
containing a total of 9501 custom words. The chart below shows the top 50
words, with the X-axis showing the percentage of files the word appeared in.&lt;/p&gt;
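&lt;p&gt;Percentages like the ones in this chart take only a few lines of Python to compute. This is a minimal, hypothetical sketch, not the code behind the chart: it assumes each .aspell_en_pw file starts with a header line such as "personal_ws-1.1 en 42" followed by one word per line, and the helper names are made up for illustration.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical sketch: helper names and the header check are assumptions.


def read_personal_dict(text):
    """Return the set of words in one aspell personal dictionary.

    Assumes the usual layout: a header line such as "personal_ws-1.1 en 42"
    followed by one word per line.
    """
    lines = text.splitlines()
    if lines and lines[0].startswith("personal_ws"):
        lines = lines[1:]  # drop the header line
    return {line.strip() for line in lines if line.strip()}


def word_prevalence(files):
    """Map each word to the percentage of dictionary files containing it."""
    counts = Counter()
    for text in files:
        # read_personal_dict returns a set, so a word counts once per file
        counts.update(read_personal_dict(text))
    return {word: 100.0 * n / len(files) for word, n in counts.items()}
```

&lt;p&gt;Sorting the result by value and keeping the first 50 entries gives a top-50 listing of the kind shown above.&lt;/p&gt;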

&lt;div class="row"&gt;
    &lt;div class="span6"&gt;
        &lt;a href="http://corte.si/posts/hacks/github-spellingdicts/aspell.png"&gt;&lt;img src="http://corte.si/posts/hacks/github-spellingdicts/aspell.png"/&gt;&lt;/a&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;There were a few requests for the raw data behind the previous two posts, so
this time round you can also &lt;a href="http://corte.si/posts/hacks/github-spellingdicts/aspell-all.csv"&gt;download a CSV file&lt;/a&gt;
with the occurrence totals for each word in the dataset.&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/hacks/github-spellingdicts/index.html</guid><pubDate>Tue, 26 Feb 2013 09:07:00 GMT</pubDate></item><item><title>Things I found on GitHub: pipe chains</title><link>http://corte.si/posts/hacks/github-pipechains/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/hacks/github-pipechains/index.html"&gt;Things I found on GitHub: pipe chains&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;22 February 2013&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;Earlier this week I published &lt;a href="https://github.com/cortesi/ghrabber"&gt;ghrabber&lt;/a&gt;,
a simple tool that lets you grab files matching an arbitrary search
specification from GitHub. I used ghrabber to retrieve all the bash_history
and zsh_history files accidentally checked in to repos, and took &lt;a href="http://corte.si/posts/hacks/github-shhistory/index.html"&gt;a light look
at the dataset with some simple
graphs&lt;/a&gt;. In total, I
obtained 234 shell history files with 165k individual command entries. This is
a very rare opportunity to "shoulder-surf", to actually see what people &lt;em&gt;do&lt;/em&gt; at
the command prompt, and perhaps get some insights into how to improve things.&lt;/p&gt;

&lt;p&gt;Along those lines, today's post looks at pipe chains - that is, compound
commands that pipe the output of one command to another. The pipe operator lies
at the core of the Unix command-line philosophy. The fact that we can easily
compose complex operations is the reason why we are able to write small tools
that "do one thing well" without losing generality. The shell history files on
Github give us real data about what people do with composed commands,
and how they do it. &lt;/p&gt;

&lt;div class="row"&gt;
    &lt;div class="span6"&gt;
        &lt;a href="http://corte.si/posts/hacks/github-pipechains/pipechains.png"&gt;&lt;img src="http://corte.si/posts/hacks/github-pipechains/pipechains.png"/&gt;&lt;/a&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;It turns out that about 2% of all commands issued on the command-line use
pipes. The graph above shows the prevalence of the most common pipe chains - that
is, what percentage of the users in my sample used each chain. There's a lot of
fascinating stuff we can read straight from this image.&lt;/p&gt;
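&lt;p&gt;Prevalence figures of this kind could be computed roughly as follows. This hypothetical Python sketch is not the analysis code used for the graph: it reduces each command to its chain of command names using the standard shlex module (its punctuation_chars and whitespace_split options together need Python 3.8 or later), then counts the share of history files, treated as one per user, that contain each chain at least once. It only splits on the pipe operator, so other shell operators such as ";" are not handled.&lt;/p&gt;

```python
import shlex
from collections import Counter

# Hypothetical sketch: function names and parsing rules are assumptions.


def pipe_chain(command):
    """Return the command names in a pipeline, e.g. ('ps', 'grep')."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # keep URLs, paths and flags as single words
    chain, expect_command = [], True
    try:
        for token in lexer:
            if token == "|":
                expect_command = True  # the next word starts a new stage
            elif expect_command:
                chain.append(token)
                expect_command = False
    except ValueError:
        return ()  # unbalanced quotes: treat the entry as unparseable
    return tuple(chain)


def chain_prevalence(histories):
    """Map each chain of two or more commands to the percentage of
    history files that use it at least once."""
    counts = Counter()
    for commands in histories:
        chains = {pipe_chain(c) for c in commands}
        counts.update(ch for ch in chains if len(ch) > 1)
    return {ch: 100.0 * n / len(histories) for ch, n in counts.items()}
```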

&lt;p&gt;Starting at the top, the first thing we notice is how widely used the &lt;strong&gt;ps |
grep&lt;/strong&gt; chain is. About 17% of users in my sample used this chain - given the
type of data we have, the real-world prevalence would surely be higher still.
I've just been extolling the virtues of small tools and composability, but in
this case practicality should beat purity. I suggest that everyone should have
a command-alias similar to this in their shell configuration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;alias pg="ps aux | grep"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I've added this to my .zshrc today, and I've already used it twice. &lt;/p&gt;

&lt;p&gt;Next up, we have the &lt;strong&gt;ls | grep&lt;/strong&gt; pipes. The vast majority of uses here could
actually be accomplished using the shell's filename generation mechanism.  This
ranges from simple redundancies like grepping for file extensions, to
performing quite complex matching operations that could be done using the
shell's advanced glob operations. I'm guilty of this myself - I rarely use
features like recursive globbing, expansions using character ranges, case
insensitive globbing, and so forth. I've brushed up on &lt;a href="http://linux.die.net/man/1/zshexpn"&gt;filename expansion for
my chosen shell&lt;/a&gt;, and perhaps you should
too. &lt;/p&gt;

&lt;p&gt;The last thing I want to point out is a pattern that's genuinely dangerous -
&lt;strong&gt;curl | bash&lt;/strong&gt;, along with its cousins &lt;strong&gt;curl | sh&lt;/strong&gt; and &lt;strong&gt;wget | sh&lt;/strong&gt;.
Unfortunately, this has become the recommended installation pattern for some
tools - the vast majority of invocations here are for &lt;a href="https://rvm.io/"&gt;RVM&lt;/a&gt; and
&lt;a href="http://yeoman.io/"&gt;Yeoman&lt;/a&gt;. I don't think it's a good idea to pipe anything
from the web straight into a local shell, but the situation is made
particularly dire by the fact that almost half of these invocations are either
over plain HTTP or explicitly turn certificate validation off. &lt;/p&gt;
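&lt;p&gt;Classifying these invocations mostly comes down to the URL scheme and the flags passed to the downloader. The following is a hypothetical Python helper, not the script behind the numbers in this post. It treats curl's -k/--insecure and wget's --no-check-certificate as disabling certificate validation, only recognizes a two-stage pipeline into sh, bash or zsh, and does not unpack bundled short flags such as -sk.&lt;/p&gt;

```python
import shlex

# Hypothetical sketch: names and classification labels are assumptions.
# Flags that switch off TLS certificate checks in curl (-k, --insecure)
# and wget (--no-check-certificate).
INSECURE_FLAGS = {"-k", "--insecure", "--no-check-certificate"}
SHELLS = {"sh", "bash", "zsh"}


def stages(command):
    """Split a command line into pipeline stages, each a list of words."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # keep "https://..." as one word
    result = [[]]
    for token in lexer:
        if token == "|":
            result.append([])
        else:
            result[-1].append(token)
    return result


def classify_download_to_shell(command):
    """Return None unless the command is 'curl/wget ... | shell';
    otherwise "no-cert-check", "https", "plain-http" or "unknown"."""
    try:
        parts = stages(command)
    except ValueError:
        return None  # unbalanced quotes
    if len(parts) != 2 or not all(parts):
        return None
    fetch, shell = parts
    if fetch[0] not in ("curl", "wget") or shell[0] not in SHELLS:
        return None
    if INSECURE_FLAGS.intersection(fetch):
        return "no-cert-check"
    if any(arg.startswith("https://") for arg in fetch):
        return "https"
    if any(arg.startswith("http://") for arg in fetch):
        return "plain-http"
    return "unknown"  # no explicit scheme in the URL
```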

&lt;p&gt;I'll stop here, although there are interesting things to say about nearly every
entry in the graph above. Next week, I'll move on from the shell history
sample and look at some other juicy datasets extracted using ghrabber.&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/hacks/github-pipechains/index.html</guid><pubDate>Fri, 22 Feb 2013 09:59:00 GMT</pubDate></item><item><title>Things I found on GitHub: shell history</title><link>http://corte.si/posts/hacks/github-shhistory/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/hacks/github-shhistory/index.html"&gt;Things I found on GitHub: shell history&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;19 February 2013&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;Github recently introduced hugely &lt;a href="https://github.com/blog/1381-a-whole-new-code-search"&gt;improved code
search&lt;/a&gt;, one of those
rare moments when a service I use adds a feature that directly and measurably
improves my life. Predictably, there was soon a
&lt;a href="http://www.webmonkey.com/2013/01/users-scramble-as-github-search-exposes-passwords-security-details/"&gt;flurry&lt;/a&gt;
&lt;a href="http://www.scmagazine.com.au/News/330152,passwords-ssh-keys-exposed-on-github.aspx"&gt;of&lt;/a&gt;
&lt;a href="http://arstechnica.com/security/2013/01/psa-dont-upload-your-important-passwords-to-github/"&gt;breathless&lt;/a&gt;
stories about the security implications. This shouldn't have been news to
anyone - by now, it should be clear that better search in almost any context
has security or privacy implications, a law of the universe almost as solid as
the second law of thermodynamics. We saw this with &lt;a href="http://www.securityfocus.com/news/11417"&gt;Google's own code
search&lt;/a&gt;, as well as &lt;a href="http://en.wikipedia.org/wiki/Google_hacking"&gt;Google
proper&lt;/a&gt;, Facebook's &lt;a href="http://actualfacebookgraphsearches.tumblr.com/"&gt;Graph
Search&lt;/a&gt; and even
&lt;a href="http://www.wired.com/wiredenterprise/2013/02/microsoft-bing-fights-botnets/"&gt;Bing&lt;/a&gt;.
A certain fraction of people will always make mistakes, and any
sufficiently powerful search will allow bad guys to find and take advantage of
the outliers. &lt;/p&gt;

&lt;p&gt;After the dust had settled a bit I started wondering what else we could do with
Github's search - other than snookering schmucks who checked in their private
keys. I'm always enticed by data, and the combination of search and the ability
to download raw checked-in files seemed like a promising avenue to explore.
Let's see what we can come up with.&lt;/p&gt;

&lt;h1&gt;&lt;a href="https://github.com/cortesi/ghrabber"&gt;ghrabber&lt;/a&gt; - grab files from GitHub&lt;/h1&gt;

&lt;p&gt;First, some tooling. I've just released ghrabber, a simple tool that lets you
grab all files matching a search specification from GitHub. Here, for instance,
is an obvious wheeze - fetching all files with the extension ".key":&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;./ghrabber.py "extension:key"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Downloaded files are saved locally to files named &lt;strong&gt;user.repository&lt;/strong&gt;. Existing
files with the same name are skipped, which means that you can reasonably
efficiently stop and resume a ghrab. &lt;/p&gt;
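&lt;p&gt;The skip-and-resume behaviour boils down to checking for the target file before downloading. Here is a minimal Python sketch of that idea. It is not ghrabber's actual code, and the download callable and the (user, repository, url) result tuples are hypothetical stand-ins for the GitHub search and fetch steps. One design choice worth noting: writing to a temporary ".part" file and renaming it into place means an interrupted download never leaves behind a truncated file that a later run would wrongly skip.&lt;/p&gt;

```python
import os

# Hypothetical sketch: grab(), its arguments and the ".part" suffix are
# illustrative assumptions, not ghrabber's real interface.


def grab(results, download, outdir="."):
    """Save each result to a file named user.repository in outdir.

    results: iterable of (user, repository, url) tuples.
    download: callable taking a URL and returning the contents as bytes.
    Returns the list of paths written during this run.
    """
    written = []
    for user, repository, url in results:
        path = os.path.join(outdir, f"{user}.{repository}")
        if os.path.exists(path):
            continue  # already fetched by an earlier run, so skip it
        partial = path + ".part"
        with open(partial, "wb") as f:
            f.write(download(url))
        os.replace(partial, path)  # rename only once the write has completed
        written.append(path)
    return written
```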

&lt;h1&gt;Shell history files&lt;/h1&gt;

&lt;p&gt;I've been having a lot of fun exploring Github with ghrabber. I'll return to
this in future posts - today I'll start with a quick illustration of what can
be done. One type of difficult-to-find information that is sometimes checked in
to repos is shell history. Two simple ghrabber commands for the two most
popular shells is all we need:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;./ghrabber.py "path:.bash_history"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;and&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;./ghrabber.py "path:.zsh_history"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;After cleaning the data a bit, I had 234 history files varying in length from 1
line to just over 10 thousand, containing a total of 165k entries. I fed this
into &lt;a href="http://pandas.pydata.org/"&gt;Pandas&lt;/a&gt; for analysis, parsing each command
using a combination of hand-hacked heuristics and the built-in
&lt;a href="http://docs.python.org/2/library/shlex.html"&gt;shlex&lt;/a&gt; module. The remainder of
this post is a light exploration of some approaches to this dataset, steering
clear of the obvious and tediously well-covered security implications.&lt;/p&gt;
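&lt;p&gt;A parser in this spirit might look like the following hypothetical Python sketch. Its specific heuristics are illustrative assumptions, not the ones used for this post: it strips the ": start:elapsed;" prefix that zsh writes when its EXTENDED_HISTORY option is on, tokenizes the rest with shlex.split, skips leading VAR=value assignments and sudo, and returns the command name.&lt;/p&gt;

```python
import re
import shlex

# Hypothetical sketch: the skip rules below are illustrative heuristics.
# zsh's EXTENDED_HISTORY format prefixes entries with ": start:elapsed;".
ZSH_PREFIX = re.compile(r"^: \d+:\d+;")


def command_name(line):
    """Return the command name of one history entry, or None."""
    line = ZSH_PREFIX.sub("", line.strip())
    try:
        words = shlex.split(line)
    except ValueError:
        return None  # e.g. unbalanced quotes in a multi-line entry
    for word in words:
        # Skip environment assignments like LANG=C and a leading sudo.
        if "=" in word or word == "sudo":
            continue
        return word
    return None
```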

&lt;p&gt;&lt;a href="http://corte.si/posts/hacks/github-shhistory/topcmds.png"&gt;&lt;img src="http://corte.si/posts/hacks/github-shhistory/topcmds.png"/&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One way to slice the data is to look at the percentage of history files a given
command appears in. This gives us a nice listing of the top commands by user
prevalence, which you can see in the graph on the left above. On the right,
I've taken the same list of commands, and checked how many invocations are
preceded by a &lt;strong&gt;man&lt;/strong&gt; lookup for the command. This gives us an idea of which
commonly-used commands have difficult or unintuitive interfaces. It's
interesting that &lt;strong&gt;ln&lt;/strong&gt; is right at the top of the list, considering how simple
the command syntax is. My theory is that everyone forgets the order of the
source and target files.&lt;/p&gt;
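&lt;p&gt;Both measures are simple to compute once each history entry has been tokenized. The hypothetical Python sketch below uses plain collections rather than Pandas, and its definition of "preceded by a man lookup" is an assumption: an invocation counts only if the entry immediately before it is man with that command among its arguments.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical sketch: command_stats() and its input shape are assumptions.


def command_stats(histories):
    """histories: one list per history file, each entry a list of words,
    e.g. [["man", "ln"], ["ln", "-s", "src", "dst"]].

    Returns {command: (percent of files using it,
                       percent of its invocations right after "man command")}.
    """
    files_with, invocations, after_man = Counter(), Counter(), Counter()
    for entries in histories:
        entries = [words for words in entries if words]  # drop blank entries
        files_with.update({words[0] for words in entries})
        previous = []
        for words in entries:
            name = words[0]
            invocations[name] += 1
            # Assumed definition: the entry just before is "man ... name ...".
            if previous[:1] == ["man"] and name in previous[1:]:
                after_man[name] += 1
            previous = words
    total = len(histories)
    return {
        name: (100.0 * files_with[name] / total,
               100.0 * after_man[name] / invocations[name])
        for name in files_with
    }
```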

&lt;div class="row"&gt;
    &lt;div class="span4"&gt;
        &lt;a href="http://corte.si/posts/hacks/github-shhistory/editors.png"&gt;&lt;img src="http://corte.si/posts/hacks/github-shhistory/editors.png"/&gt;&lt;/a&gt;
    &lt;/div&gt;
    &lt;div class="span4"&gt;
        &lt;a href="http://corte.si/posts/hacks/github-shhistory/tmuxes.png"&gt;&lt;img src="http://corte.si/posts/hacks/github-shhistory/tmuxes.png"/&gt;&lt;/a&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Since we have a list of the most widely used commands, it's also trivial to do
silly popularity comparisons. Above is the obvious look at the state of the
editor wars (vim is winning, folks), and a check on how
&lt;a href="http://tmux.sourceforge.net/"&gt;tmux&lt;/a&gt; is doing in supplanting screen (the faster
the better). &lt;/p&gt;

&lt;div class="row"&gt;
    &lt;div class="span4"&gt;&lt;a href="http://corte.si/posts/hacks/github-shhistory/args-ssh.png"&gt;&lt;img src="http://corte.si/posts/hacks/github-shhistory/args-ssh.png"/&gt;&lt;/a&gt;&lt;/div&gt;
    &lt;div class="span4"&gt;&lt;a href="http://corte.si/posts/hacks/github-shhistory/args-mkdir.png"&gt;&lt;img src="http://corte.si/posts/hacks/github-shhistory/args-mkdir.png"/&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;div class="row"&gt;
    &lt;div class="span4"&gt;&lt;a href="http://corte.si/posts/hacks/github-shhistory/args-rm.png"&gt;&lt;img src="http://corte.si/posts/hacks/github-shhistory/args-rm.png"/&gt;&lt;/a&gt;&lt;/div&gt;
    &lt;div class="span4"&gt;&lt;a href="http://corte.si/posts/hacks/github-shhistory/args-ls.png"&gt;&lt;img src="http://corte.si/posts/hacks/github-shhistory/args-ls.png"/&gt;&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Another interesting thing to do is to look at the most commonly used flags to
commands. I think having "real data" of command use may well guide us to design
better command-line interfaces. I'd love to know the most common invocation
flags for some of the tools I write.&lt;/p&gt;
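&lt;p&gt;Counting flags is a small extension of the same tokenized data. This is a hypothetical Python sketch: it stops at "--", strips "=value" from long options, and splits bundled short options such as -la into -l and -a. That last rule is a simplification, since it also breaks apart short options with attached values (like -p22) and single-dash long options such as find's -name.&lt;/p&gt;

```python
from collections import Counter

# Hypothetical sketch: flag_counts() and its splitting rules are assumptions.


def flag_counts(invocations, command):
    """Count the flags passed to command across tokenized invocations.

    invocations is a list of word lists such as ["ls", "-la"].
    Bundled short options are split, so -la counts as -l and -a.
    """
    counts = Counter()
    for words in invocations:
        if not words or words[0] != command:
            continue
        for word in words[1:]:
            if word == "--":
                break  # conventional end-of-options marker
            if word.startswith("--"):
                counts[word.split("=", 1)[0]] += 1  # --color=auto -> --color
            elif word.startswith("-") and len(word) > 1:
                counts.update("-" + ch for ch in word[1:])
    return counts
```

&lt;p&gt;For example, calling most_common(10) on the result for ssh would list its ten most frequent flags.&lt;/p&gt;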

&lt;p&gt;I'll stop there. The data pool in this case is very deep, and there is a huge
range of interesting bits of command-line ethnography that could be done. Stay
tuned for more in the coming weeks.&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/hacks/github-shhistory/index.html</guid><pubDate>Tue, 19 Feb 2013 10:37:00 GMT</pubDate></item><item><title>The trouble with social news</title><link>http://corte.si/posts/socialmedia/trouble-with-social-news.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/socialmedia/trouble-with-social-news.html"&gt;The trouble with social news&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;24 January 2013&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;There is something terribly awry with the social news ecosystem. This is a
feeling that's been growing on me over the last few years, and is the reason
why I've cut both &lt;a href="http://reddit.com"&gt;Reddit&lt;/a&gt; and &lt;a href="http://news.ycombinator.com"&gt;Hacker
News&lt;/a&gt; (which together constitute pretty much all of
"social news") out of my information diet.  Although I've mulled over things in
various conversations, I've never actually tried to put my feeling of unease in
writing, until today. What's spurring me into action is a &lt;a href="http://yann.lecun.com/ex/pamphlets/publishing-models.html"&gt;proposal by Yann
LeCun&lt;/a&gt; that a model
similar to social news be adopted for scientific peer review - self-assembled
Reviewing Entities voting on streams of submitted papers, regulated by a
reputation system for authors and reviewers. Basically, this is science a la
Reddit: complete with subreddits, karma and upboats. I find the idea frankly
terrifying.&lt;/p&gt;

&lt;p&gt;I guess it's time, then, to put finger to keyboard and lay out what disquiets
me about social news. &lt;/p&gt;

&lt;h2&gt;Karma Corrupts&lt;/h2&gt;

&lt;p&gt;You start by introducing a reputation mechanism like
&lt;a href="http://www.reddit.com/wiki/faq#toc_9"&gt;karma&lt;/a&gt; to improve some outcome - say, to
increase the quality of comments, or to apply a threshold to restrict voting to
trustworthy community members. This seems like a plausible and even elegant
mechanism at first, until you discover the terrible side-effects. &lt;/p&gt;

&lt;p&gt;Humans are fundamentally status-seeking social apes, and you've now introduced
a visible measure of social worth that people will be driven to maximize. In
the real world, we have a word for those who spend their lives accumulating
karma - we call them politicians. And so, within karma communities, we see the
rise of a political class - persuasive centrists who cater (perhaps
unconsciously) to a constituency, and who express (perhaps eloquently) opinions
calculated to appeal to the masses and avoid controversy. Hacker News and many
subreddits are dominated by people like this, whose comments are largely
predictable and rarely add anything new or unexpected to the conversation. &lt;/p&gt;

&lt;p&gt;At the bottom end of the food chain, we have a different class of creature with
the same basic aim as the politicians, but without the persuasive charm needed
to pull off the political approach. These are the karma whores, who use a
mixture of frank pandering, provocation and calculated outrage to achieve the
same aims. &lt;/p&gt;

&lt;p&gt;The karma maximization game often acts contrary to the goals we aimed to
achieve by introducing karma in the first place: the tenor of the community
suffers, the diversity of opinion declines, and the karma whores post pictures
of their cats everywhere.&lt;/p&gt;

&lt;h2&gt;The Lossy Sieve&lt;/h2&gt;

&lt;p&gt;Go and have a look at the &lt;a href="http://news.ycombinator.com/newest"&gt;new story submission
queue&lt;/a&gt; on Hacker News. Scroll through a few
pages, and pay attention to the stories stuck at one vote - they will most
likely never receive another upvote and will die in obscurity. Now, go look at
the &lt;a href="http://news.ycombinator.com/"&gt;front page&lt;/a&gt;. When I do this exercise I'm
struck by the fact that there's plenty of crap on the front page, and quite a
bit of good stuff in the submission queue languishing in obscurity. So, quality
can't be the sole metric here - what determines what gets onto the front page
and what doesn't?&lt;/p&gt;

&lt;p&gt;Let's try a thought experiment. First, set up a small number of voting accounts
- say, 10 or so. Now, in the new submission queue, pick 5 random stories every
hour, and give them a small number of upvotes soon after they are submitted. I
predict that you will find that stories that received this small initial boost
are vastly more likely to end up on the front page. If I'm right, then chance
dominates story selection - as long as an article exceeds some basic quality
threshold, it all depends on who happens to see the story soon after it is
submitted, and whether the spirit moves them to vote. Note that this is not the
case at the extremes - frankly bad content won't be upvoted, and really
important stories will usually find their way to the top. The lossy sieve
phenomenon affects everything in between. &lt;/p&gt;

&lt;p&gt;What this boils down to is that social news doesn't provide an effective filter
- good content gets lost, and mediocre content finds its way onto our screens.&lt;/p&gt;

&lt;h2&gt;The Pinhole Effect&lt;/h2&gt;

&lt;p&gt;In social news, the front page is king. Most users never go beyond the first or
second page of top stories. However, front-page real estate is incredibly
limited compared to the volume of submissions on most popular subreddits and on
Hacker News. The effect of this is that we're looking at a fast-flowing river
of information through a pinhole.  Even assuming that the selection mechanism
works flawlessly, what you see on the front page is a small sliver of the
total, chosen through a consensus mechanism that takes no account of individual
variation in tastes and interests. The news you see is not tailored to &lt;em&gt;you&lt;/em&gt; -
it's tailored to some abstract, average participant, with all the rough edges
of individuality smoothed away. The effect of this is that even at its best,
the stories that emerge from the social news system feel like a predictable
pablum dished up by the hivemind. The subreddit system tries to improve this by
allowing communities to self-assemble around interests, but the pinhole effect
still dominates in busy subreddits like
&lt;a href="http://reddit.com/r/programming"&gt;/r/programming&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Gaming The System&lt;/h2&gt;

&lt;p&gt;Social news systems are eminently gameable, and cheating is rife. Part of the
reason for this is that a story's destiny depends on a relatively small number
of votes. If your story has any merit at all, you significantly increase the
likelihood that it will end up on the front page by giving it a small nudge at
the beginning of its life. If it has no merit whatsoever, you can still force
it onto people's screens with a few tens or hundreds of votes. Conversely, you
can use the same effect to censor and oppress views you disagree with if your
social news site has downvotes. Anyone who's kept an eye on these things can
rattle off examples of gaming in action: the &lt;a href="http://en.wikipedia.org/wiki/Digg_Patriots"&gt;voting
rings&lt;/a&gt;, the &lt;a href="http://www.reddit.com/r/reddit.com/comments/b7e25/today_i_learned_that_one_of_reddits_most_active/"&gt;"social media
consultants"&lt;/a&gt;,
the &lt;a href="http://www.reddit.com/r/shitredditsays"&gt;vigilante thought-polizei&lt;/a&gt;,
the &lt;a href="http://www.reddit.com/comments/2n2tu/ron_paul_on_the_debate_my_opponents_called_for/c2n5v8"&gt;political
operators&lt;/a&gt;,
and dozens of other types of manipulation and villainy. What's more - these
visible scandals are just the tip of the iceberg. Eyeballs are valuable, and
there's an active arms race with social news sites on the one side, and a dark
army of spammers, scammers and true believers on the other. How much of what we
see is affected by this type of cheating? We just don't know, but my suspicion
is that the effect is significant.&lt;/p&gt;

&lt;p&gt;The point here is broader than any particular instance of gaming. It's that
social news sites are structurally susceptible to manipulation in ways that
can't be fixed without changing the core of their operation. A system like this
might be good enough to deliver &lt;a href="http://knowyourmeme.com/memes/rage-comics"&gt;rage
comics&lt;/a&gt;, but I feel queasy trusting
it any further.&lt;/p&gt;

&lt;h2&gt;Community Collapse Disorder &lt;/h2&gt;

&lt;p&gt;My final beef with social news is a problem that it shares with pretty much all
online communities, especially technical ones. We're all familiar with the
life-cycle of technical forums. They start with a small community of insiders
who create value, which then attracts more people to participate, which then
dilutes the quality of the contributions (and often introduces a few
pathological bad actors), which then causes the good contributors to move on,
which causes the magic well to dry up. Everyone then takes their toys and moves
to the next community, and the cycle repeats. We saw this with Usenet and the
original C2 wiki, and we are seeing it now with Hacker News and many technical
subreddits, all at various points in this life-cycle. &lt;/p&gt;

&lt;p&gt;I believe that Community Collapse Disorder is one of the Big Problems online
that we don't yet have a satisfactory solution to. People are trying, though.
Hacker News, for instance, seems to be rather &lt;a href="https://www.google.com/search?hl=en&amp;amp;q=site%3Anews.ycombinator.com+%22eternal+september%22"&gt;poignantly aware of its own
decline&lt;/a&gt;,
with some of the &lt;a href="http://al3x.net/2011/02/22/solving-the-hacker-news-problem.html"&gt;best of the old-timers calling for an
alternative&lt;/a&gt;.
Paul Graham himself recognizes the issue, and has been tweaking things in
various ways to combat the phenomenon, without much success. &lt;/p&gt;

&lt;p&gt;At the moment, we just don't know how to build online communities that are both
inclusive and stable. Democracy, here, seems to lead inevitably to decline, and
social news sites are no exception.&lt;/p&gt;

&lt;h2&gt;A better way forward?&lt;/h2&gt;

&lt;p&gt;A big part of the reason I don't use social news anymore is that my existing
social networks have become so much more effective at turning up good content.
The absolute best source of news for me is simply the set of links shared by
the folks I follow on &lt;a href="http://twitter.com/cortesi"&gt;Twitter&lt;/a&gt;. I follow people
who post interesting content, and whom I trust to act as information filters
for me. Most of them share my technical interests, but some are interesting
because they are from my home town, or because they share some more esoteric
pursuit with me. So, the news stream I see is exactly tailored to me. At the
same time, there is also room for idiosyncrasy - if someone I follow shares
something left-field that tickles their fancy, I'll see it. In turn, I try to
be a responsible information filter for those who follow me - I find a link or
two worth tweeting on most days.&lt;/p&gt;

&lt;p&gt;There are still things I miss - Twitter is great for sharing links, but is an
awful medium for technical discussion.
&lt;a href="https://plus.google.com/106243676845481872244"&gt;Google+&lt;/a&gt; could be a better
alternative, but just doesn't seem to have achieved liftoff for me. I would
also love better tools for aggregating and harvesting links from my social
network. At the moment I use &lt;a href="http://flipboard.com"&gt;Flipboard&lt;/a&gt; and
&lt;a href="http://getprismatic.com"&gt;Prismatic&lt;/a&gt;, but I have issues with both. On the
whole, though, these are quibbles. It seems to me that using social networks to
filter news is a better way forward - if I were tackling the social news
problem, I'd be building tools to support this process.&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/socialmedia/trouble-with-social-news.html</guid><pubDate>Thu, 24 Jan 2013 11:01:00 GMT</pubDate></item><item><title>Go: a nice language with an annoying personality</title><link>http://corte.si/posts/code/go/go-rant.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/code/go/go-rant.html"&gt;Go: a nice language with an annoying personality&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;18 January 2013&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;Last week, I had the pleasure of attending &lt;a href="http://dropbox.com"&gt;Dropbox&lt;/a&gt;'s
annual company &lt;a href="https://blog.dropbox.com/2012/03/hack-week-ii/"&gt;hack fest&lt;/a&gt;.  It
was a great opportunity to get a look at how Dropbox works internally, and
mingle with the smart and driven folks who make one of my favourite products.
In the spirit of hack week, my friend
&lt;a href="http://twitter.com/alexdong"&gt;@alexdong&lt;/a&gt; and I decided to do our project in Go. We'd
both wanted to explore the language, but had never quite been able to make time
- a week-long code holiday seemed to be the perfect opportunity. I was hopeful
that Go would turn out to hit a magical sweet spot: a light set of abstractions
hugging close to the machine, while still providing the indoor plumbing and
civilized conveniences of life that I had grown used to with languages like
Python. Five days of furious hacking later, I can report that Go might well
deliver on this promise, but has enough annoying personality quirks that I will
think twice about basing any more projects on it.&lt;/p&gt;

&lt;p&gt;My main beef with Go has nothing to do with fundamental language design, and
may seem almost inconsequential at first glance. The Go compiler treats unused
package imports and unused local variables as compile errors. This is great in
theory and is something you might well want to enforce before code can be
committed, but during the actual &lt;em&gt;process&lt;/em&gt; of producing code it's nothing but
an irksome, unnecessary pain in the ass. Let's look at a concrete example,
starting with the following snippet&lt;sup class="footnote-ref" id="fnref-1"&gt;&lt;a href="#fn-1"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;pre&gt;import (
    &amp;quot;io/ioutil&amp;quot;
)
...
...
    m, err := ioutil.ReadFile(path)
    if err != nil {
        return nil, err
    }
...
...
    DoSomething(m)

&lt;/pre&gt;

&lt;p&gt;I'm a firm believer that printing stuff to screen is a programmer's best
debugging tool, so say we're hacking away and want to print the value of &lt;strong&gt;m&lt;/strong&gt;
while running our unit tests. We change the code as follows, adding an import
for the "fmt" package and a call to Print:&lt;/p&gt;

&lt;pre&gt;import (
    &amp;quot;io/ioutil&amp;quot;
    &amp;quot;fmt&amp;quot;
)
...
...
    m, err := ioutil.ReadFile(path)
    if err != nil {
        return nil, err
    }
    fmt.Print(m)
...
...
    DoSomething(m)

&lt;/pre&gt;

&lt;p&gt;Now we keep hacking, and want to comment out the print statement for a moment
like so: &lt;/p&gt;

&lt;pre&gt;import (
    &amp;quot;io/ioutil&amp;quot;
    &amp;quot;fmt&amp;quot;
)
...
...
    m, err := ioutil.ReadFile(path)
    if err != nil {
        return nil, err
    }
    //fmt.Print(m)
...
...
    DoSomething(m)

&lt;/pre&gt;

&lt;p&gt;This is a compile error. We have to switch contexts, move to the top of the
file, comment out the import as well, and then move back to the spot we're
really hacking on:&lt;/p&gt;

&lt;pre&gt;import (
    &amp;quot;io/ioutil&amp;quot;
    //&amp;quot;fmt&amp;quot;
)
...
...
    m, err := ioutil.ReadFile(path)
    if err != nil {
        return nil, err
    }
    //fmt.Print(m)
...
...
    DoSomething(m)

&lt;/pre&gt;

&lt;p&gt;A few seconds later, we want to re-enable the Print statement - so up we go
again to the top of the file to re-enable the import. This is even worse when
we want to, say, comment out the &lt;strong&gt;DoSomething&lt;/strong&gt; call while hacking:&lt;/p&gt;

&lt;pre&gt;import (
    &amp;quot;io/ioutil&amp;quot;
)
...
...
    m, err := ioutil.ReadFile(path)
    if err != nil {
        return nil, err
    }
...
...
    //DoSomething(m)

&lt;/pre&gt;

&lt;p&gt;This is also a compile error because now &lt;em&gt;m&lt;/em&gt; is unused. We have to hunt up in
our code to find the declaration, which could be an explicit &lt;strong&gt;var&lt;/strong&gt; statement
or an implicit declaration via a &lt;strong&gt;:=&lt;/strong&gt; assignment. In this case we find the
declaration, and use the magic underscore name to throw the offending value away:&lt;/p&gt;

&lt;pre&gt;import (
    &amp;quot;io/ioutil&amp;quot;
)
...
...
    _, err := ioutil.ReadFile(path)
    if err != nil {
        return nil, err
    }
...
...
    //DoSomething(m)

&lt;/pre&gt;

&lt;p&gt;That should fix it, right? Well, no. It turns out we've previously declared and
used &lt;strong&gt;err&lt;/strong&gt; (a very common idiom), so this is still a compile error. We're
using the "declare and assign" syntax, but have no new variables on the
left-hand side of the ":=". So we need to make another tweak:&lt;/p&gt;

&lt;pre&gt;import (
    &amp;quot;io/ioutil&amp;quot;
)
...
...
    _, err = ioutil.ReadFile(path)
    if err != nil {
        return nil, err
    }
...
...
    //DoSomething(m)

&lt;/pre&gt;

&lt;p&gt;Five seconds later, we want to re-enable &lt;strong&gt;DoSomething&lt;/strong&gt;, and now we have to
unwind the entire process. &lt;/p&gt;

&lt;p&gt;The cumulative effect of all this is like trying to write code while someone
next to you randomly knocks your hands off the keyboard every few seconds.
It's a pointlessly pedantic approach that adds constant friction to your
write-compile-test cycle, breaks your flow, and just generally makes life a
little harder for very little benefit. There's no way to turn this mis-feature
off, no flag we can pass to the compiler to temporarily make this a warning
rather than an error while hacking&lt;sup class="footnote-ref" id="fnref-2"&gt;&lt;a href="#fn-2"&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
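&lt;p&gt;For reference, the workaround suggested in the Go FAQ is the blank identifier: assigning a package member or a variable to &lt;strong&gt;_&lt;/strong&gt; counts as a use. The sketch below is a minimal illustration built on the snippet above. The helper name &lt;strong&gt;checkReadable&lt;/strong&gt; is hypothetical, and &lt;strong&gt;DoSomething&lt;/strong&gt; stays commented out just as in the post. It still means editing the top of the file and the code near the declaration, so at best it softens the friction described here.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"io/ioutil" // mirrors the post; since Go 1.16, os.ReadFile is preferred
	"os"
)

// Keeps the "fmt" import in use while the fmt.Print call below is
// commented out. Remove it (and the import) before committing.
var _ = fmt.Print

// checkReadable is a hypothetical helper mirroring the snippet in the post.
func checkReadable(path string) error {
	m, err := ioutil.ReadFile(path)
	if err != nil {
		return err
	}
	//fmt.Print(m)
	//DoSomething(m)
	_ = m // counts as a use, so the "m, err :=" line can stay untouched
	return nil
}

func main() {
	// os.Args[0] is the running program's own path, which should be readable.
	if err := checkReadable(os.Args[0]); err != nil {
		os.Exit(1)
	}
}
```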

&lt;p&gt;The irony of the situation is that I agree with the sentiment behind this. I
don't want dangling variables or imports in my codebase. And I agree that if
something is worth warning about it's worth making it an error. The mistake is
to confuse the state we want at the conclusion of a unit of hacking&lt;sup class="footnote-ref" id="fnref-3"&gt;&lt;a href="#fn-3"&gt;3&lt;/a&gt;&lt;/sup&gt;, with
what we need at every point in between, during the write-compile-test cycle.
This cycle is the core of the process of actually producing code, and the
&lt;a href="http://xkcd.com/353/"&gt;exhilarating sense of weightlessness&lt;/a&gt; that you get when
hacking in Python is largely due to the fact that the language works really,
really hard to optimize this process. Go has given away this feeling of
exhilaration, basically for nothing.&lt;/p&gt;

&lt;p&gt;Despite all this, it's still possible that the benefits of Go do outweigh its
irritating personality. Interfaces, memory management, first-class concurrency
and static type checking are a knockout combination, and the language in general
has something of the taut practicality that I love in C. So, despite the
rantiness of this post, I'll keep hacking on our project and make sure I
produce a few thousand more lines of code before making a final call on the
language. Look for a project release and a blog post along these lines in the
coming months.&lt;/p&gt;

&lt;div class="footnotes"&gt;
&lt;hr /&gt;
&lt;ol&gt;
&lt;li id="fn-1"&gt;
&lt;p&gt;Ellipses indicate "an arbitrary amount of intervening code"&amp;nbsp;&lt;a href="#fnref-1" class="footnoteBackLink" title="Jump back to footnote 1 in the text."&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn-2"&gt;
&lt;p&gt;I edited this paragraph a bit for tone. I originally accused the Go
documentation of being faintly smug about all of this - which is not fair, and
doesn't add anything to the argument.&amp;nbsp;&lt;a href="#fnref-2" class="footnoteBackLink" title="Jump back to footnote 2 in the text."&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn-3"&gt;
&lt;p&gt;Why don't we have a word for this? By "unit of hacking", I mean the work
that goes on between starting to hack on a change-set and doing a commit. At the
beginning and at the end, the code is in a clean state, but in between there
are many periods of transition where cleanliness requirements are relaxed.&amp;nbsp;&lt;a href="#fnref-3" class="footnoteBackLink" title="Jump back to footnote 3 in the text."&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/code/go/go-rant.html</guid><pubDate>Fri, 18 Jan 2013 11:51:00 GMT</pubDate></item><item><title>Released: pathod 0.3</title><link>http://corte.si/posts/code/pathod/announce0_3.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/code/pathod/announce0_3.html"&gt;Released: pathod 0.3&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;16 November 2012&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;I've just released &lt;a href="http://pathod.net"&gt;pathod 0.3&lt;/a&gt;, which beefs up
&lt;a href="http://pathod.net/docs/pathoc"&gt;pathoc&lt;/a&gt;'s fuzzing capabilities, improves the
spec language and includes lots of bugfixes and other small tweaks. Get it
while it's hot!&lt;/p&gt;

&lt;h2&gt;Better fuzzing&lt;/h2&gt;

&lt;p&gt;A major focus of this release is to improve
&lt;a href="http://pathod.net/docs/pathoc"&gt;pathoc&lt;/a&gt;'s capabilities as a basic fuzzing tool.
I've had fun &lt;a href="http://corte.si/posts/code/pathod/pythonservers/index.html"&gt;breaking webservers&lt;/a&gt; with
pathoc, and it's even come in handy in my Day Job. Here's a quick summary of
how things have changed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;-e&lt;/strong&gt; flag tells pathoc to explain its requests. This prints out an
expanded pathoc query specification, with all randomly generated content and
query modifications resolved. If you trigger an exception, you can precisely
replay the offending query using this explanation.&lt;/li&gt;
&lt;li&gt;The options for outputting requests and responses have been expanded hugely.
First, the &lt;strong&gt;-q&lt;/strong&gt; and &lt;strong&gt;-r&lt;/strong&gt; flags tell pathoc to dump complete records of
requests and responses respectively. This data is sniffed by instrumenting
the socket, so is canonical regardless of our ability to interpret returned
data. The &lt;strong&gt;-x&lt;/strong&gt; option makes pathoc dump this data in hexdump format
(otherwise unprintable characters are escaped to preserve your terminal).&lt;/li&gt;
&lt;li&gt;A number of options have been added to let you ignore expected responses.
&lt;strong&gt;-C&lt;/strong&gt; takes a comma-separated list of response codes to ignore. &lt;strong&gt;-T&lt;/strong&gt;
ignores server timeouts. This lets you home in on the exceptional responses
that you care about, and ignore the rest.&lt;/li&gt;
&lt;/ul&gt;
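&lt;p&gt;To make the ignore options concrete, here is a minimal Go sketch of filtering results against a comma-separated list of response codes, in the spirit of &lt;strong&gt;-C&lt;/strong&gt;, with a separate switch for timeouts as &lt;strong&gt;-T&lt;/strong&gt; is described. It is not pathoc's implementation, since pathoc is written in Python. The names &lt;strong&gt;parseIgnoreList&lt;/strong&gt; and &lt;strong&gt;interesting&lt;/strong&gt; are hypothetical.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseIgnoreList turns a string like "200,404" into a set of status codes.
// Hypothetical helper, not part of pathoc.
func parseIgnoreList(spec string) (map[int]bool, error) {
	codes := make(map[int]bool)
	if strings.TrimSpace(spec) == "" {
		return codes, nil
	}
	for _, field := range strings.Split(spec, ",") {
		code, err := strconv.Atoi(strings.TrimSpace(field))
		if err != nil {
			return nil, fmt.Errorf("bad response code %q: %v", field, err)
		}
		codes[code] = true
	}
	return codes, nil
}

// interesting reports whether a result is worth showing. Timeouts are hidden
// when ignoreTimeouts is set, and listed status codes are always hidden.
func interesting(status int, timedOut bool, ignore map[int]bool, ignoreTimeouts bool) bool {
	if timedOut {
		return !ignoreTimeouts
	}
	return !ignore[status]
}

func main() {
	ignore, err := parseIgnoreList("200,404")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, status := range []int{200, 404, 500} {
		fmt.Println(status, interesting(status, false, ignore, true))
	}
	fmt.Println("timeout", interesting(0, true, ignore, true))
}
```

Keeping the ignore list as a set makes each check a single map lookup, no matter how many codes are listed.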

&lt;h2&gt;Language improvements&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;I've simplified response specifications by making the response message
a standard component with the "r" mnemonic. &lt;/li&gt;
&lt;li&gt;I've added the "u" mnemonic to request specifications, as a shortcut for
specifying the User-Agent header: &lt;/li&gt;
&lt;/ul&gt;

&lt;pre&gt;get:/:u"My Weird User-Agent"&lt;/pre&gt;

&lt;p&gt;We also have a small library of representative User-Agent strings that can be
  used instead of specifying your own. For example, this specifies the
  GoogleBot User-Agent string:&lt;/p&gt;

&lt;pre&gt;get:/:ug&lt;/pre&gt;

&lt;p&gt;The available shortcuts are listed in the docs, and can also be displayed from the
  command line using the &lt;strong&gt;--show-uas&lt;/strong&gt; flag to pathoc:&lt;/p&gt;

&lt;pre class="terminal"&gt;&gt; ./pathoc --show-uas
User agent strings:
   a android
   l blackberry
   b bingbot
   c chrome
   f firefox
   g googlebot
   i ie9
   p ipad
   h iphone
   s safari&lt;/pre&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/code/pathod/announce0_3.html</guid><pubDate>Fri, 16 Nov 2012 15:25:00 GMT</pubDate></item><item><title>pathoc: break all the Python webservers!</title><link>http://corte.si/posts/code/pathod/pythonservers/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/code/pathod/pythonservers/index.html"&gt;pathoc: break all the Python webservers!&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;27 September 2012&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;A few months ago, I announced &lt;a href="http://pathod.net"&gt;pathod&lt;/a&gt;, a pathological HTTP
daemon. The project started as a testing tool to let me craft
standards-violating HTTP responses while working on
&lt;a href="http://mitmproxy.org"&gt;mitmproxy&lt;/a&gt;. It soon became a free-standing project, and
has turned out to be incredibly useful in security testing, exploit delivery
and general creative mischief. In the last release, I added pathoc - pathod's
malicious client-side twin. It does for HTTP requests what pathod does for HTTP
responses, and uses the same &lt;a href="http://pathod.net/docs/language"&gt;hyper-terse specification
language&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;In this post, I show how pathoc can be used as a very simple fuzzer, by finding
issues in a number of major pure-Python webservers. None of the tested servers
failed catastrophically - they all caught the unexpected exception and
continued serving requests. Nonetheless, I think it's reasonable to say that
we've triggered a bug if a) the server returns a 500 Internal Server Error
response or terminates the connection abnormally, and b) we see a traceback in
our logs.  In fact, by this definition, I found bugs in &lt;em&gt;every&lt;/em&gt; pure-Python
server I tested. &lt;/p&gt;

&lt;p&gt;All of the problems I list below are simple failures of validation - what they
have in common is that somewhere in the project, code is called with input that
it doesn't expect and can't handle.  This matters - in fact, I'd argue that the
majority of security problems fall in this category. It's interesting to ponder
why this type of issue is so ubiquitous in Python servers. I have no doubt that
part of the answer lies in Python's use of exceptions - errors that would be
explicit in other languages can be implicit in Python, and code that seems
clean and intuitive might in fact be buggy. I think this is especially relevant
right now, given the recent flurry of discussion surrounding the &lt;a href="http://golang.org/"&gt;Go
language&lt;/a&gt; and its error handling. It's pretty instructive
to read Russ Cox's &lt;a href="https://plus.google.com/116810148281701144465/posts/iqAiKAwP6Ce"&gt;recent
riposte&lt;/a&gt; to
&lt;a href="http://uberpython.wordpress.com/2012/09/23/why-im-not-leaving-python-for-go/"&gt;this
post&lt;/a&gt;
criticizing Go's explicit approach, while looking at the bugs below. &lt;a href="https://github.com/cortesi"&gt;I love
Python&lt;/a&gt; and I think it's a fine language, but I
also think the designers of Go probably made the right choice.&lt;/p&gt;
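&lt;p&gt;Three of the tracebacks below (CherryPy, Tornado and Twisted) come down to calling &lt;strong&gt;int()&lt;/strong&gt; on a Content-Length header without handling the &lt;strong&gt;ValueError&lt;/strong&gt;. To illustrate the explicit style, here is a minimal Go sketch. The names &lt;strong&gt;parseContentLength&lt;/strong&gt; and &lt;strong&gt;statusFor&lt;/strong&gt;, and the choice to answer with 400 Bad Request, are assumptions and are not taken from any of the servers tested. Because &lt;strong&gt;strconv.ParseUint&lt;/strong&gt; returns its error as an ordinary value, the failure case is visible right at the call site.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// errBadLength is returned for values that are not plain decimal digits.
var errBadLength = errors.New("invalid Content-Length")

// parseContentLength validates a raw header value. An empty value is treated
// as "no body"; otherwise only unsigned decimal digits are accepted
// (ParseUint rejects signs like "+5" or "-1").
func parseContentLength(raw string) (uint64, error) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return 0, nil
	}
	n, err := strconv.ParseUint(raw, 10, 64)
	if err != nil {
		return 0, errBadLength
	}
	return n, nil
}

// statusFor maps a header value to the status a server might send back:
// a 400 Bad Request instead of a 500 and a traceback in the logs.
func statusFor(raw string) int {
	if _, err := parseContentLength(raw); err != nil {
		return http.StatusBadRequest
	}
	return http.StatusOK
}

func main() {
	for _, v := range []string{"10", "x", "-1", ""} {
		fmt.Printf("%q -> %d\n", v, statusFor(v))
	}
}
```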

&lt;h1&gt;Basic fuzzing with pathoc&lt;/h1&gt;

&lt;p&gt;My methodology for these tests was very simple indeed. I launched each server
in turn, and used pathoc to fire corrupted GET requests at the daemon until I
saw an error. I then looked at the logs, and boiled the distinct cases down to
a minimal pathoc specification by hand. This exercises a rather shallow set of
features in the server software - mostly parsing of the HTTP lead-in and
request headers. It's possible to give software a much, much deeper workout
with pathoc, but I'll leave that for a future post. &lt;/p&gt;

&lt;p&gt;My pathoc fuzzing command looked something like this:&lt;/p&gt;

&lt;pre class="terminal"&gt;
pathoc -n 1000 -p 8080 -t 1 localhost 'get:/:b@10:ir,"\x00"'
&lt;/pre&gt;

&lt;p&gt;The most important flags here are &lt;b&gt;-n&lt;/b&gt;, which tells pathoc to make 1000
consecutive requests, and &lt;b&gt;-t&lt;/b&gt;, which tells pathoc to time out after one
second (necessary to prevent hangs when daemons terminate improperly). The
request specification itself breaks down as follows:&lt;/p&gt;

&lt;table class="table"&gt;
    &lt;tr&gt;
        &lt;td&gt;get&lt;/td&gt;
        &lt;td&gt;Issue a GET request&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;td&gt;/&lt;/td&gt;
        &lt;td&gt;... to the path / &lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;td&gt;b@10&lt;/td&gt;
        &lt;td&gt;... with a body consisting of 10 random bytes &lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;td&gt;ir,"\x00"&lt;/td&gt;
        &lt;td&gt;... and inject a NULL byte at a random location.&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;It's that last clause - the random injection - that makes the difference
between simply crafting requests and basic fuzzing. Every time a new request is
issued, the injection occurs at a different location. I varied the injected
character between a NULL byte, a carriage return and a random letter of the alphabet.
Each exposed different errors in different servers. For a complete description
of the specification language, see the &lt;a href="http://pathod.net/docs/language"&gt;online
docs&lt;/a&gt;.&lt;/p&gt;

&lt;h1&gt;Results&lt;/h1&gt;

&lt;p&gt;For each bug, I've given a traceback and a minimal pathoc call to trigger the
issue. The tracebacks have been edited lightly to shorten file paths and
remove irrelevances like timestamps.&lt;/p&gt;

&lt;ul class="nav nav-tabs"&gt;
  &lt;li class="active"&gt;&lt;a href="#cherrypy" data-toggle="tab"&gt;cherrypy&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#tornado" data-toggle="tab"&gt;tornado&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#twisted" data-toggle="tab"&gt;twisted&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#simplehttp" data-toggle="tab"&gt;SimpleHTTPServer&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#waitress" data-toggle="tab"&gt;waitress&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href="#werkzeug" data-toggle="tab"&gt;werkzeug&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="tab-content"&gt;

&lt;div class="tab-pane active" id="cherrypy"&gt;

&lt;pre class="terminal"&gt;pathoc -p 8080 localhost 'get:/:b@10:h"Content-Length"="x"'&lt;/pre&gt;

&lt;pre&gt;
ENGINE ValueError("invalid literal for int() with base 10: 'x'",)
Traceback (most recent call last):
  File "cherrypy/wsgiserver/wsgiserver2.py", line 1292, in communicate
    req.parse_request()
  File "cherrypy/wsgiserver/wsgiserver2.py", line 591, in parse_request
    success = self.read_request_headers()
  File "cherrypy/wsgiserver/wsgiserver2.py", line 711, in read_request_headers
    if mrbs and int(self.inheaders.get("Content-Length", 0)) &gt; mrbs:
ValueError: invalid literal for int() with base 10: 'x'
&lt;/pre&gt;


&lt;pre class="terminal"&gt;pathoc -p 8080 localhost 'get:/:i4,"\r"&lt;/pre&gt;

&lt;pre&gt;
ENGINE TypeError("argument of type 'NoneType' is not iterable",)
Traceback (most recent call last):
  File "cherrypy/wsgiserver/wsgiserver2.py", line 1292, in communicate
    req.parse_request()
  File "cherrypy/wsgiserver/wsgiserver2.py", line 580, in parse_request
    success = self.read_request_line()
  File "cherrypy/wsgiserver/wsgiserver2.py", line 644, in read_request_line
    if NUMBER_SIGN in path:
TypeError: argument of type 'NoneType' is not iterable
&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="tab-pane" id="tornado"&gt;

&lt;pre class="terminal"&gt;pathoc -p 8080 localhost 'get:/:b@10:h"Content-Length"="x"'&lt;/pre&gt;

&lt;pre&gt;
[E 120927 11:42:26 iostream:307] Uncaught exception, closing connection.
    Traceback (most recent call last):
      File "tornado/iostream.py", line 304, in wrapper
        callback(*args)
      File "tornado/httpserver.py", line 254, in _on_headers
        content_length = int(content_length)
    ValueError: invalid literal for int() with base 10: 'x'
[E 120927 11:42:26 ioloop:435] Exception in callback &lt;tornado.stack_context._StackContextWrapper object at 0x1012e28e8&gt;
    Traceback (most recent call last):
      File "tornado/ioloop.py", line 421, in _run_callback
        callback()
      File "tornado/iostream.py", line 304, in wrapper
        callback(*args)
      File "tornado/httpserver.py", line 254, in _on_headers
        content_length = int(content_length)
    ValueError: invalid literal for int() with base 10: 'x'
&lt;/pre&gt;


&lt;pre class="terminal"&gt;pathoc -p 8080 localhost 'get:/:h"h\r\n"="x"'&lt;/pre&gt;

&lt;pre&gt;
[E iostream:307] Uncaught exception, closing connection.
    Traceback (most recent call last):
      File "tornado/iostream.py", line 304, in wrapper
        callback(*args)
      File "tornado/httpserver.py", line 236, in _on_headers
        headers = httputil.HTTPHeaders.parse(data[eol:])
      File "tornado/httputil.py", line 127, in parse
        h.parse_line(line)
      File "tornado/httputil.py", line 113, in parse_line
        name, value = line.split(":", 1)
    ValueError: need more than 1 value to unpack
[E ioloop:435] Exception in callback &lt;tornado.stack_context._StackContextWrapper object at 0x1012bd7e0&gt;
    Traceback (most recent call last):
      File "tornado/ioloop.py", line 421, in _run_callback
        callback()
      File "tornado/iostream.py", line 304, in wrapper
        callback(*args)
      File "tornado/httpserver.py", line 236, in _on_headers
        headers = httputil.HTTPHeaders.parse(data[eol:])
      File "tornado/httputil.py", line 127, in parse
        h.parse_line(line)
      File "tornado/httputil.py", line 113, in parse_line
        name, value = line.split(":", 1)
    ValueError: need more than 1 value to unpack
&lt;/pre&gt;



&lt;/div&gt;

&lt;div class="tab-pane" id="twisted"&gt;

&lt;pre class="terminal"&gt;pathoc -p 8080 localhost 'get:/:b@10:h"Content-Length"="x"'&lt;/pre&gt;

&lt;pre&gt;
[HTTPChannel,4,127.0.0.1] Unhandled Error
    Traceback (most recent call last):
      File "twisted/python/log.py", line 84, in callWithLogger
        return callWithContext({"system": lp}, func, *args, **kw)
      File "twisted/python/log.py", line 69, in callWithContext
        return context.call({ILogContext: newCtx}, func, *args, **kw)
      File "twisted/python/context.py", line 118, in callWithContext
        return self.currentContext().callWithContext(ctx, func, *args, **kw)
      File "twisted/python/context.py", line 81, in callWithContext
        return func(*args,**kw)
    --- &lt;exception caught here&gt; ---
      File "twisted/internet/selectreactor.py", line 150, in _doReadOrWrite
        why = getattr(selectable, method)()
      File "twisted/internet/tcp.py", line 199, in doRead
        rval = self.protocol.dataReceived(data)
      File "twisted/protocols/basic.py", line 564, in dataReceived
        why = self.lineReceived(line)
      File "twisted/web/http.py", line 1558, in lineReceived
        self.headerReceived(self.__header)
      File "twisted/web/http.py", line 1580, in headerReceived
        self.length = int(data)
    exceptions.ValueError: invalid literal for int() with base 10: 'x'
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="tab-pane" id="simplehttp"&gt;

&lt;pre class="terminal"&gt;pathoc -p 8080 localhost 'get:"/\0"'&lt;/pre&gt;

&lt;pre&gt;
Exception happened during processing of request from ('127.0.0.1', 54029)
Traceback (most recent call last):
  File "lib/python2.7/SocketServer.py", line 284, in _handle_request_noblock
    self.process_request(request, client_address)
  File "lib/python2.7/SocketServer.py", line 310, in process_request
    self.finish_request(request, client_address)
  File "lib/python2.7/SocketServer.py", line 323, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "lib/python2.7/SocketServer.py", line 638, in __init__
    self.handle()
  File "python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "lib/python2.7/BaseHTTPServer.py", line 328, in handle_one_request
    method()
  File "lib/python2.7/SimpleHTTPServer.py", line 44, in do_GET
    f = self.send_head()
  File "lib/python2.7/SimpleHTTPServer.py", line 68, in send_head
    if os.path.isdir(path):
  File "lib/python2.7/genericpath.py", line 41, in isdir
    st = os.stat(s)
TypeError: must be encoded string without NULL bytes, not str
&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="tab-pane" id="waitress"&gt;

&lt;pre class="terminal"&gt;pathoc -p 8080 localhost 'get:/:i16," "'&lt;/pre&gt;
&lt;pre&gt;ERROR:waitress:uncaptured python exception, closing channel 
&lt;waitress.channel.HTTPChannel connected 127.0.0.1:62330 at 0x1007ca310&gt; 
(
    &lt;type 'exceptions.IndexError'&gt;:list index out of range 
        [lib/python2.7/asyncore.py|read|83] 
        [lib/python2.7/asyncore.py|handle_read_event|444]
        [lib/python2.7/site-packages/waitress/channel.py|handle_read|169]
        [lib/python2.7/site-packages/waitress/channel.py|received|186]
        [lib/python2.7/site-packages/waitress/parser.py|received|99]
        [lib/python2.7/site-packages/waitress/parser.py|parse_header|158]
        [lib/python2.7/site-packages/waitress/parser.py|get_header_lines|247]
)
&lt;/pre&gt;

&lt;b&gt;Edit: The first version of this post had examples that were due to the test
WSGI application, not waitress. I've replaced them with the traceback above,
which has been reformatted for clarity. &lt;/b&gt;

&lt;/div&gt;

&lt;div class="tab-pane" id="werkzeug"&gt;

&lt;pre class="terminal"&gt;pathoc -p 8080 localhost 'get:/:h"Host"="n\r\0"'&lt;/pre&gt;

&lt;pre&gt;
Traceback (most recent call last):
  File "flask/app.py", line 1518, in __call__
    return self.wsgi_app(environ, start_response)
  File "flask/app.py", line 1507, in wsgi_app
    return response(environ, start_response)
  File "/usr/local/lib/python2.7/site-packages/werkzeug/wrappers.py", line 1082, in __call__
    app_iter, status, headers = self.get_wsgi_response(environ)
  File "werkzeug/wrappers.py", line 1070, in get_wsgi_response
    headers = self.get_wsgi_headers(environ)
  File "werkzeug/wrappers.py", line 986, in get_wsgi_headers
    headers['Location'] = location
  File "werkzeug/datastructures.py", line 1132, in __setitem__
    self.set(key, value)
  File "werkzeug/datastructures.py", line 1097, in set
    self._validate_value(_value)
  File "werkzeug/datastructures.py", line 1065, in _validate_value
    raise ValueError('Detected newline in header value.  This is '
ValueError: Detected newline in header value.  This is a potential security problem
&lt;/pre&gt;
&lt;/div&gt;

&lt;/div&gt;
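&lt;p&gt;Several of these failures come down to unchecked assumptions about the input: that a header line contains a colon, and that paths and header values contain no NUL, CR or LF bytes. The following minimal Go sketch checks these up front, so a server could answer with an error response instead of raising. The function and error names are hypothetical, and the rules are deliberately simpler than a full HTTP parser. Lines are assumed to arrive with their trailing CRLF already stripped.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	errNoColon  = errors.New("header line has no colon")
	errBadName  = errors.New("empty or malformed header name")
	errBadBytes = errors.New("NUL, CR or LF in header line")
)

// hasControl reports whether s contains a byte that should never appear in a
// request path or header line (NUL, CR, LF).
func hasControl(s string) bool {
	return strings.ContainsAny(s, "\x00\r\n")
}

// parseHeaderLine splits "Name: value" and rejects malformed input instead of
// assuming the split always yields two parts.
func parseHeaderLine(line string) (name, value string, err error) {
	if hasControl(line) {
		return "", "", errBadBytes
	}
	i := strings.IndexByte(line, ':')
	if i == -1 {
		return "", "", errNoColon
	}
	name = line[:i]
	if name == "" || strings.ContainsAny(name, " \t") {
		return "", "", errBadName
	}
	value = strings.Trim(line[i+1:], " \t") // strip optional whitespace
	return name, value, nil
}

func main() {
	for _, line := range []string{"Host: localhost", "h", "Host: n\r\x00"} {
		name, value, err := parseHeaderLine(line)
		fmt.Printf("%q -> name=%q value=%q err=%v\n", line, name, value, err)
	}
	fmt.Println("path with NUL rejected:", hasControl("/\x00"))
}
```

Checking for control bytes before any trimming matters here, because trimming whitespace would otherwise silently strip a trailing CR.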

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/code/pathod/pythonservers/index.html</guid><pubDate>Thu, 27 Sep 2012 09:46:00 GMT</pubDate></item><item><title>Limits of data visualization with space filling curves</title><link>http://corte.si/posts/visualisation/hilbert-snake/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/visualisation/hilbert-snake/index.html"&gt;Limits of data visualization with space filling curves&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;20 September 2012&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;I recently wrote a &lt;a href="http://corte.si/posts/visualisation/binvis/index.html"&gt;series&lt;/a&gt; of
&lt;a href="http://corte.si/posts/visualisation/entropy/index.html"&gt;posts&lt;/a&gt; using the &lt;a href="http://corte.si/posts/code/hilbert/portrait/index.html"&gt;Hilbert
curve&lt;/a&gt; to visualize binaries,
culminating in a &lt;a href="http://corte.si/posts/visualisation/malware/index.html"&gt;gallery showing regions of high entropy in
malware&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/08b983ec55bfd50d1d2cb9a90b1ae54e.html'&gt;
        &lt;img src='http://corte.si/posts/visualisation/hilbert-snake/malwarexample.png'/&gt;
    &lt;/a&gt;
&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;The fact that the Hilbert curve has excellent locality preservation means that
one-dimensional features are preserved (as much as they can be) in the
two-dimensional layout. This lets us visually pick out features of interest,
and makes it possible, for instance, to quickly identify different malware
packers just based on their layout characteristics. &lt;/p&gt;
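&lt;p&gt;For readers who haven't met the curve before, the mapping from a one-dimensional offset to a cell in the square is short. This Go sketch uses the well-known iterative &lt;strong&gt;d2xy&lt;/strong&gt; formulation (given, for example, on Wikipedia). It is not necessarily the code used to render the images in these posts. The side length &lt;strong&gt;n&lt;/strong&gt; is assumed to be a power of two.&lt;/p&gt;

```go
package main

import "fmt"

// rot rotates/flips a quadrant so that the sub-curves join up correctly.
func rot(n, x, y, rx, ry int) (int, int) {
	if ry == 0 {
		if rx == 1 {
			x = n - 1 - x
			y = n - 1 - y
		}
		x, y = y, x
	}
	return x, y
}

// d2xy maps a distance d along the Hilbert curve to (x, y) in an n-by-n grid.
// n must be a power of two and d must lie in [0, n*n).
func d2xy(n, d int) (x, y int) {
	t := d
	for s := 1; n > s; s *= 2 {
		rx := (t / 2) % 2
		ry := (t ^ rx) % 2
		x, y = rot(s, x, y, rx, ry)
		x += s * rx
		y += s * ry
		t /= 4
	}
	return x, y
}

func main() {
	// Consecutive offsets always land in neighbouring cells (the locality
	// property), yet a 4-cell block starting at offset 0 forms a 2x2 square
	// while the same block starting at offset 2 does not.
	const n = 8
	for _, start := range []int{0, 2} {
		fmt.Print("block at offset ", start, ":")
		for d := start; d != start+4; d++ {
			x, y := d2xy(n, d)
			fmt.Printf(" (%d,%d)", x, y)
		}
		fmt.Println()
	}
}
```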

&lt;p&gt;An obvious next step is to ask if it's possible to extend this idea to let us
visually compare binaries, creating a sort of visual diff. Unfortunately, we
now bump our heads against the limitations of space-filling curve
visualization. I made the animation below after a recent conversation along
these lines, and I think it illustrates the main issues nicely. It shows a
single contiguous stretch of data (the black area) being shifted progressively
through a binary.  At each timestep, the only thing that changes is the
starting location of the data block:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
    &lt;img src='http://corte.si/posts/visualisation/hilbert-snake/hilbertsnake.gif'/&gt;
&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Two things are immediately clear: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The block of data doesn't retain its
shape at different offsets - identical stretches of data can look totally
different depending on their locations. &lt;/li&gt;
&lt;li&gt;There's no way to quickly see
&lt;em&gt;where&lt;/em&gt; in the binary a piece of information lies. Unless you are very familiar
with the particular curve and know its exact orientation, you can't say, for
instance, whether the data block lies a third of the way through the binary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's often worthwhile to trade off these things for locality preservation, but
it definitely scotches certain use cases. I do wonder if it might be possible
to tune the trade-off somewhat - sacrificing some locality preservation for
better shape retention and offset estimation. I've toyed with some ideas along
these lines (see the unrolled layouts in the &lt;a href="http://corte.si/posts/visualisation/binvis/index.html"&gt;binary visualization
post&lt;/a&gt;), but I still don't have a
satisfying solution. If anyone out there knows of one, drop me a line.&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/visualisation/hilbert-snake/index.html</guid><pubDate>Thu, 20 Sep 2012 12:30:00 GMT</pubDate></item><item><title>Finding the UDID leak: a guessing game</title><link>http://corte.si/posts/security/udid-leak-guessing.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/security/udid-leak-guessing.html"&gt;Findng the UDID leak: a guessing game&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;07 September 2012&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;It's become quite a popular parlor game to guess who is responsible for the
recent Antisec UDID leak. I've now seen no fewer than six separate apps named as
the probable source (two of which came from &lt;a href="http://www.marco.org"&gt;Marco
Arment&lt;/a&gt;). Before we pick the next culprit, I think it's
worth taking a step back to consider the list of things we &lt;em&gt;don't&lt;/em&gt; know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We don't know that we're dealing with just one source. The Antisec dump may
well be an amalgam of data from various sources.&lt;/li&gt;
&lt;li&gt;We don't know that we're looking for just one app, or even a set of apps by
one developer. The leak may well come from one of the myriad third-party services
which could be included in thousands of apps.&lt;/li&gt;
&lt;li&gt;We don't know that Antisec is being truthful about the scale of the database,
or the additional data they claim is associated with the UDID/APNS records.&lt;/li&gt;
&lt;li&gt;We certainly don't know that the data was filched from an FBI laptop or that
the NCFTA was in any way involved.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given all of these unknowns, I think a simple process-of-elimination approach
to tracking down the leak will probably be fruitless, or worse, result in the
finger being pointed at even more innocent parties. The one entity that may
already have the answer to this question is Apple. They have a list of a
million affected UDIDs, and they presumably have records of all apps that have
ever used the associated push tokens. Given a large and precise sample like
this, it should be possible to find the origin(s) of the leak reasonably
easily. Indeed, if Apple is on the ball they may already have done this.&lt;/p&gt;

&lt;p&gt;Now for some frank speculation of my own. Let's assume for a moment that
Antisec has been entirely truthful about the data, and that we're dealing with
a single source. In that case, we're looking for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;... an app or third-party service integrated into multiple apps&lt;/li&gt;
&lt;li&gt;... with 12 million or more users&lt;/li&gt;
&lt;li&gt;... that is APNS-enabled&lt;/li&gt;
&lt;li&gt;... which also gathers user data like real names and zip codes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll throw my hat in the ring and say that my money is on a third-party
service, not a single app. If my hunch is right, the list of possible culprits
is actually rather short.&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/security/udid-leak-guessing.html</guid><pubDate>Fri, 07 Sep 2012 21:59:00 GMT</pubDate></item><item><title>The UDID leak is a privacy catastrophe</title><link>http://corte.si/posts/security/udid-leak.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/security/udid-leak.html"&gt;The UDID leak is a privacy catastrophe&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;04 September 2012&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;Something I've been worrying about for a long time has just happened: &lt;a href="http://pastebin.com/nfVT7b0Z"&gt;Antisec
has leaked a database with more than a million
UDIDs&lt;/a&gt;. The UDID issue has been a bit of a white
whale of mine - I've written many blog posts about it and spent more hours than
I care to think negotiating responsible disclosure with companies misusing
UDIDs. Let's recap some of the posts I've written about this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://corte.si/posts/security/openfeint-udid-deanonymization/index.html"&gt;In May 2011&lt;/a&gt;,
just before its sale to Gree was announced, I showed that
&lt;a href="http://en.wikipedia.org/wiki/OpenFeint"&gt;OpenFeint&lt;/a&gt; was misusing UDIDs in a way
that allowed you to link a UDID to a user's identity, geolocation and Facebook
and Twitter accounts. Although I didn't discuss it openly at the time, you could also
completely take over an OpenFeint account, and access chat, forums, friends
lists, and more using just a UDID. This resulted in a class-action lawsuit
against OpenFeint, which has since petered out.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://corte.si/posts/security/apple-udid-survey/index.html"&gt;Later that month&lt;/a&gt;, I
published a survey looking at how UDIDs are used in practice.
The data is now slightly out of date, but shows just how widely UDIDs are used and misused.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="http://corte.si/posts/security/udid-must-die/index.html"&gt;In September 2011&lt;/a&gt;,
I published the most troubling news so far, which
paradoxically also got the least coverage in the press. I looked at
&lt;em&gt;all&lt;/em&gt; the gaming social networks on iOS - basically OpenFeint and its
competitors - and found catastrophic mismanagement by nearly everyone. The
vulnerabilities ranged from de-anonymization, to takeover of the user's gaming
social network account, to the ability to completely take over the user's
Facebook and Twitter accounts using just a UDID.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As serious as these problems are, I'm afraid they're just the tip of the iceberg.
Negotiating disclosure and trying to convince companies to fix their problems
has taken literally months of my time, so I've stopped publishing on this issue
for the moment. It's disheartening to say it, but some of the companies
mentioned in my posts &lt;em&gt;still&lt;/em&gt; have unfixed problems (they were all notified
well in advance of any publication). I will also note ominously that I know of
a number of similar vulnerabilities elsewhere in the iOS app ecosystem that
I've just not had the time to pursue. &lt;/p&gt;

&lt;p&gt;When speaking to people about this, I've often been asked "What's the worst
that can happen?". My response was always that the worst case scenario would be
if a large database of UDIDs leaked... and here we are.&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/security/udid-leak.html</guid><pubDate>Tue, 04 Sep 2012 18:12:00 GMT</pubDate></item><item><title>Defiler</title><link>http://corte.si/posts/photos/lymantriid/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/photos/lymantriid/index.html"&gt;Defiler&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;26 August 2012&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;I've been living out of a bag for the last 3 weeks, working hard on a series of
intense but fun audits. After running in high gear for a while I find that I
need a mental palate cleanser - something to help me refocus and stop me from
getting snowblind. When that happens, I grab my camera, strap on my macro rig, and walk out
the door to try to catch the local wildlife in the act. It's become a bit of a
game - the aim is to catch creatures in their natural setting and leave them
completely undisturbed when I go, with no posing, prodding or other
disturbances. Getting a usable shot of a 5mm target sitting on a twig swaying
in the wind is a fun challenge. &lt;/p&gt;

&lt;p&gt;Today I find myself in Sydney, working in a part of town that is shot
through with unreasonably beautiful walking tracks. The place is also blessed
with a huge diversity of invertebrate life that makes my &lt;a href="http://en.wikipedia.org/wiki/Dunedin"&gt;adopted home
town&lt;/a&gt; seem barren by comparison. I walked
along a nearby track until I found a quiet, leafy spot, geared up, and
leopard-crawled through the underbrush. Not long after, I came face-to-face
with this imposing little chap sitting on the tip of a fern frond. &lt;/p&gt;

&lt;p&gt;&lt;center&gt;
    &lt;a href="http://corte.si/posts/photos/lymantriid/lymantriid2.jpg"&gt;
        &lt;img src="http://corte.si/posts/photos/lymantriid/lymantriid2-small.jpg"&gt;
    &lt;/a&gt;
&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;This is a &lt;a href="http://en.wikipedia.org/wiki/Lymantriidae"&gt;Lymantriid&lt;/a&gt; caterpillar
of some variety, probably one of the tussock moths native to Australia.
"Lymantria" means "defiler" - some species of this family can cause huge damage
to foliage, and are considered to be destructive pests. So much so that when a
single male &lt;a href="http://en.wikipedia.org/wiki/Gypsy_moth"&gt;Gypsy Moth&lt;/a&gt; (Lymantria
dispar) was discovered in Hamilton, New Zealand, they sprayed the entire city
with a caterpillar-specific &lt;a href="http://www.biosecurity.govt.nz/pests-diseases/forests/gypsy-moth/residents/foray.htm"&gt;bacterial
insecticide&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;No need for drastic measures with this particular fellow, though - he's native
to this ecosystem, and the only pest is me and my camera. He was head down
munching away when I found him, and paid absolutely no attention to me when I
moved in close to get these shots. He's got reason to be cocksure, too - those
tufts of hair on his back contain hollow, poison-filled spines that can cause a
pretty unpleasant reaction when touched. &lt;/p&gt;

&lt;p&gt;&lt;center&gt;
    &lt;a href="http://corte.si/posts/photos/lymantriid/lymantriid1.jpg"&gt;
        &lt;img src="http://corte.si/posts/photos/lymantriid/lymantriid1-small.jpg"&gt;
    &lt;/a&gt;
&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;A few hours exploring and photographing is a very effective brain-cleaner,
leaving me ready to deal with spiny, venomous defilers of the digital variety.&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/photos/lymantriid/index.html</guid><pubDate>Sun, 26 Aug 2012 13:14:00 GMT</pubDate></item><item><title>pathod 0.2: the daemon gets an evil twin</title><link>http://corte.si/posts/code/pathod/announce0_2.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/code/pathod/announce0_2.html"&gt;pathod 0.2: the daemon gets an evil twin&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;22 August 2012&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;I've just pushed pathod 0.2 out the door. This is a huge release, with many new
features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://pathod.net/docs/pathoc"&gt;pathoc&lt;/a&gt;, pathod's evil client-side twin.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://pathod.net/docs/test"&gt;libpathod.test&lt;/a&gt;, a framework for using pathod in your unit tests.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://pathod.net/docs/language"&gt;Improved mini language&lt;/a&gt;, including many new abilities and improvements.&lt;/li&gt;
&lt;li&gt;A rewrite of the networking core.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project also has a new website at &lt;a href="http://pathod.net"&gt;pathod.net&lt;/a&gt;. Yes,
pathod is now self-hosting, so you can try out both pathod and pathoc
specifications right on the website. There's also a new &lt;a href="http://public.pathod.net/200:b%22hello,%20sailor.%22"&gt;public pathod
instance&lt;/a&gt;, which I'm
sure everyone will use entirely responsibly. &lt;/p&gt;

&lt;p&gt;&lt;a class="btn btn-large" href="http://pathod.net"&gt;get it here&lt;/a&gt;&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/code/pathod/announce0_2.html</guid><pubDate>Wed, 22 Aug 2012 14:02:00 GMT</pubDate></item><item><title>Introducing pathod: a pathological HTTP server</title><link>http://corte.si/posts/code/pathod/announce0_1.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/code/pathod/announce0_1.html"&gt;Introducing pathod: a pathological HTTP server&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;01 May 2012&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;I've just released &lt;a href="http://cortesi.github.com/pathod"&gt;pathod&lt;/a&gt;, a pathological
HTTP/S daemon useful for testing and torturing HTTP clients. At its core is a
tiny, terse language for crafting HTTP responses. It also has a built-in web
interface that lets you play with the response spec language, inspect logs, and
access pathod's full help document. &lt;/p&gt;

&lt;p&gt;The rest of this post is a quick teaser showing some of pathod's abilities.
See the detailed documentation on the &lt;a href="http://cortesi.github.com/pathod"&gt;pathod
site&lt;/a&gt; if you want more. &lt;/p&gt;

&lt;h1&gt;The simplest possible response&lt;/h1&gt;

&lt;p&gt;The easiest way to craft a response is to specify it directly in the request
URL. Let's start with the simplest possible example. Start pathod, and then
visit this URL:&lt;/p&gt;

&lt;pre class="terminal"&gt;
http://localhost:9999/p/200
&lt;/pre&gt;

&lt;p&gt;The "/p/" path is the location of the response generator in pathod's default
configuration - everything after that is a response specification in pathod's
mini-language.  The general form of a response spec is as follows:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;code[MESSAGE]:[colon-separated list of features]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this case, we're specifying only the HTTP response code - that is, an HTTP
200 OK with no headers and no content, resulting in a response like this:&lt;/p&gt;

&lt;pre class="terminal"&gt;
HTTP/1.1 200 OK
&lt;/pre&gt;
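&lt;p&gt;To make the general form concrete, here is a minimal Python sketch that splits a spec into its status code and feature list. It is a hypothetical helper, not pathod's actual parser: it assumes features are separated by colons, that colons inside double-quoted strings should be left alone, and it ignores the optional MESSAGE part. The example uses a header feature of the kind introduced in the next section.&lt;/p&gt;

```python
def split_spec(spec):
    """Split a pathod-style response spec into its colon-separated parts.

    Hypothetical helper for illustration only, not pathod's own parser.
    Colons inside double-quoted strings are kept, so a quoted header
    value such as "a:b" is not split in two.
    """
    parts = []
    current = []
    in_quote = False
    for ch in spec:
        if ch == '"':
            # Toggle quoting state; the quote itself is kept in the output.
            in_quote = not in_quote
        if ch == ":" and not in_quote:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


code, *features = split_spec('200:h"Etag"="foo"')
print(code)      # 200
print(features)  # ['h"Etag"="foo"']
```

&lt;p&gt;Tracking quote state is the one subtlety: a naive split on every colon would break a quoted value such as "a:b" into two bogus features.&lt;/p&gt;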

&lt;h1&gt;Specifying features&lt;/h1&gt;

&lt;p&gt;One example of a "feature" is a response header. Lets embellish our response by
adding one:&lt;/p&gt;

&lt;pre class="terminal"&gt;
200:h"Etag"="foo"
&lt;/pre&gt;

&lt;p&gt;The first letter of the feature - "h", in this case - is a mnemonic indicating
the type of feature we're adding. The full response to this spec looks like this:&lt;/p&gt;

&lt;pre class="terminal"&gt;
HTTP/1.1 200 OK
Etag: foo
&lt;/pre&gt;
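&lt;p&gt;The mapping from a spec like this to what actually goes over the wire can be sketched in a few lines of Python. The render_response helper below is hypothetical and only illustrates the shape of the bytes a client would receive; the reason phrase comes from the standard library's http.client.responses table rather than from pathod, whose real output may differ in detail.&lt;/p&gt;

```python
from http.client import responses


def render_response(code, headers=(), body=b""):
    """Serialise a status code, header list and body into raw HTTP/1.1 bytes.

    Hypothetical helper illustrating what a spec such as 200:h"Etag"="foo"
    describes; it is not part of pathod.
    """
    # http.client.responses maps status codes to standard reason phrases.
    lines = ["HTTP/1.1 %d %s" % (code, responses.get(code, ""))]
    for name, value in headers:
        lines.append("%s: %s" % (name, value))
    # A blank line (CRLF CRLF) separates the head from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("latin-1") + body


print(render_response(200, [("Etag", "foo")]))
# b'HTTP/1.1 200 OK\r\nEtag: foo\r\n\r\n'
```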

&lt;p&gt;Both "Etag" and "foo" are Value Specifiers, a syntax used throughout the
response specification language. In this case they are literal values, as
indicated by the fact that they are quoted strings. The Value Specification
syntax also lets us load values from files or generate random data. For
instance, here is a specification that generates 100k of random binary data for
the header value:&lt;/p&gt;

&lt;pre class="terminal"&gt;
200:h"Etag"=@100k
&lt;/pre&gt;

&lt;p&gt;Now, binary data in the header value will probably break things in interesting
ways, but is unlikely to be read by the client as a valid (but over-long)
value. To see whether the client really drops off its perch when we feed it a single
valid 100k header, we have to constrain the random data. Here's the same response,
but with data generated only from ASCII letters:&lt;/p&gt;

&lt;pre class="terminal"&gt;
200:h"Etag"=@100k,ascii_letters
&lt;/pre&gt;

&lt;p&gt;pathod has a large number of built-in character classes from which random
data can be generated. &lt;/p&gt;
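&lt;p&gt;Random value generation of this kind is easy to sketch in Python. In the hypothetical random_value helper below, a bare size produces raw bytes from os.urandom, while a named character class is looked up in Python's string module (which does define ascii_letters). Treating "k" as 1024 bytes is an assumption; the post doesn't say which multiplier pathod uses, and this is not pathod's implementation.&lt;/p&gt;

```python
import os
import random
import string

# Assumed size suffixes; whether pathod's "k" means 1000 or 1024 is not
# stated here, so 1024 is an illustrative choice.
UNITS = {"": 1, "k": 1024, "m": 1024 ** 2}


def parse_size(text):
    """Turn a size such as "100k" into a byte count."""
    suffix = text[-1].lower() if text[-1].isalpha() else ""
    number = text[:-1] if suffix else text
    return int(number) * UNITS[suffix]


def random_value(size, charset=None):
    """Generate data for a value spec like @100k or @100k,ascii_letters.

    With no charset the result is raw binary from os.urandom; otherwise
    each byte is drawn from the named attribute of the string module.
    """
    n = parse_size(size)
    if charset is None:
        return os.urandom(n)
    alphabet = getattr(string, charset)
    return "".join(random.choices(alphabet, k=n)).encode("ascii")


etag = random_value("100k", "ascii_letters")
print(len(etag))  # 102400
```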

&lt;h1&gt;Pauses and Disconnects&lt;/h1&gt;

&lt;p&gt;Next, we can disrupt the communications in various ways. At the moment, this
means adding pauses and disconnects to a response. Let's start with an HTTP 404
response with a body consisting of 100k of random binary data:&lt;/p&gt;

&lt;pre class="terminal"&gt;
404:b@100k
&lt;/pre&gt;

&lt;p&gt;Here's the same response, but with a 120 second pause after sending 100 bytes:&lt;/p&gt;

&lt;pre class="terminal"&gt;
404:b@100k:p120,100
&lt;/pre&gt;

&lt;p&gt;And the same response again, but with a hard disconnect after sending 100 bytes:&lt;/p&gt;

&lt;pre class="terminal"&gt;
404:b@100k:d100
&lt;/pre&gt;

&lt;p&gt;Instead of specifying the disconnect point explicitly, we can ask pathod to
disconnect at a random point of its choosing:&lt;/p&gt;

&lt;pre class="terminal"&gt;
404:b@100k:dr
&lt;/pre&gt;
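&lt;p&gt;On the server side, these pauses and disconnects come down to writing part of the response and then either sleeping or closing the socket. The following Python sketch uses hypothetical helper names and a plain socket close for the "hard" disconnect; pathod's own networking code may behave differently, for example in how it picks the random disconnect point.&lt;/p&gt;

```python
import random
import socket
import time


def send_with_pause(sock, data, offset, seconds):
    """Like p120,100: send `offset` bytes, wait `seconds`, then send the rest."""
    sock.sendall(data[:offset])
    time.sleep(seconds)
    sock.sendall(data[offset:])


def send_with_disconnect(sock, data, offset=None):
    """Like d100 (or dr when offset is None): send a prefix, then hang up."""
    if offset is None:
        # Pick a random disconnect point somewhere inside the payload.
        offset = random.randrange(len(data))
    sock.sendall(data[:offset])
    sock.close()


# Demonstration over a local socket pair instead of a real HTTP client.
server, client = socket.socketpair()
send_with_disconnect(server, b"x" * 200, 100)
received = b""
while True:
    chunk = client.recv(4096)
    if not chunk:
        break  # EOF: the "server" end has been closed
    received += chunk
client.close()
print(len(received))  # 100
```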

&lt;p&gt;That's it for the teaser - hopefully it's enough to entice you into looking at
&lt;a href="http://cortesi.github.com/pathod""&gt;pathod&lt;/a&gt;'s full documentation.&lt;/p&gt;

&lt;h1&gt;What's next?&lt;/h1&gt;

&lt;p&gt;pathod is an "airport project" - the first draft was written in its
entirety during a 40-hour trip back home from New York (I drew a bad lot in
stopovers). I've now firmed it up a bit, but there's still work to be done. In
the next month, mitmproxy's test suite will move to pathod, after which
there will be a simple, well-documented way to use pathod in unit tests. I also plan to build
out the JSON API (which is used to drive pathod in test suites), and expand the
mini-language with convenient ways to generate pathological cookies,
authentication headers, SSL errors, and cache control. &lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/code/pathod/announce0_1.html</guid><pubDate>Tue, 01 May 2012 08:14:00 GMT</pubDate></item><item><title>mitmproxy 0.8</title><link>http://corte.si/posts/code/mitmproxy/announce0_8/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/code/mitmproxy/announce0_8/index.html"&gt;mitmproxy 0.8&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;09 April 2012&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;&lt;a href="http://mitmproxy.org"&gt;
&lt;img src="http://corte.si/posts/code/mitmproxy/announce0_8/mitmproxy_0_8.png"/&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm happy to announce the release of &lt;a href="http://mitmproxy.org"&gt;mitmproxy 0.8&lt;/a&gt;.
This release has a few major new features, big speedups, and many, many small
bugfixes and improvements. Here are the headlines:&lt;/p&gt;

&lt;h2&gt;Android interception&lt;/h2&gt;

&lt;p&gt;The most prominent new feature is that we now have a supported way to intercept
Android traffic. What's more, we can do this without a cumbersome transparent
proxying rig - see the &lt;a href="http://mitmproxy.org/doc/certinstall/android.html"&gt;Android section in the
documentation&lt;/a&gt; for the
details. Special thanks go to &lt;a href="http://twitter.com/yjmbo"&gt;Jim Cheetham&lt;/a&gt; for
lending me an Android device and helping to get this feature off the ground.&lt;/p&gt;

&lt;h2&gt;Replacement patterns&lt;/h2&gt;

&lt;p&gt;Another exceedingly useful new feature is &lt;a href="http://mitmproxy.org/doc/replacements.html"&gt;replacement
patterns&lt;/a&gt;. These consist of a
filter, a regular expression and a replacement string, and run continuously
while mitmproxy processes requests and responses. You can pass these either on
the command-line, or using a built-in replacement pattern editor.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://corte.si/posts/code/mitmproxy/announce0_8/mitmproxy0_8_replace.png"/&gt;&lt;/p&gt;

&lt;p&gt;I'm sure you can immediately think of many uses for this flexible feature, but
my favourite is to use it during testing as a way to conveniently inject
complicated exploits into web traffic. I do this by setting a replacement
pattern that swaps a short but likely unique string (say MYXSS) for a long
exploit. I then use front-end tools like Firebug to insert the short marker into
requests by hand, and mitmproxy expands it into the full exploit on the fly.&lt;/p&gt;
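&lt;p&gt;The marker trick can be sketched in Python as a list of (filter, regex, replacement) rules applied to message bodies. This is not mitmproxy's scripting API: the URL-substring filter is a stand-in for mitmproxy's far richer filter expressions, and the attribute-injection payload is only an illustrative placeholder for a "long exploit".&lt;/p&gt;

```python
import re

# Hypothetical rules in the spirit of replacement patterns:
# (filter, regex, replacement). The "filter" here is just a URL substring.
PAYLOAD = '" onfocus=alert(document.domain) autofocus="'
RULES = [("example.com", re.compile("MYXSS"), PAYLOAD)]


def apply_rules(url, body, rules=RULES):
    """Apply every rule whose filter matches the URL to the body text."""
    for url_filter, pattern, replacement in rules:
        if url_filter in url:
            # A function replacement sidesteps re.sub's backslash handling,
            # so the payload is inserted exactly as written.
            body = pattern.sub(lambda m: replacement, body)
    return body


print(apply_rules("http://example.com/search", "q=MYXSS"))
# q=" onfocus=alert(document.domain) autofocus="
```

&lt;p&gt;Because the short marker is all you type by hand, the same rule can be reused across any number of requests without retyping the exploit.&lt;/p&gt;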

&lt;h2&gt;Improved pretty-printing of request and response contents&lt;/h2&gt;

&lt;p&gt;This release of mitmproxy has a completely redesigned subsystem for
pretty-printing request and response bodies. For instance, we now extract EXIF
tags and other basic information to give you something better than a hex dump 
when looking at an image:&lt;/p&gt;

&lt;p&gt;&lt;img src="http://corte.si/posts/code/mitmproxy/announce0_8/mitmproxy0_8-pretty.png"/&gt;&lt;/p&gt;

&lt;p&gt;We also have much improved HTML indenting (using &lt;a href="http://lxml.de/"&gt;lxml&lt;/a&gt;), and
a built-in JavaScript beautifier (thanks to
&lt;a href="http://jsbeautifier.org"&gt;JSBeautifier&lt;/a&gt;) that teases out compressed and
obfuscated scripts into something readable.&lt;/p&gt;

&lt;h2&gt;Changelog&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Detailed tutorial for Android interception. Some features that land in
this release have finally made reliable Android interception possible.&lt;/li&gt;
&lt;li&gt;Upstream-cert mode, which uses information from the upstream server to
generate interception certificates.&lt;/li&gt;
&lt;li&gt;Replacement patterns that let you easily do global replacements in flows
matching filter patterns. Can be specified on the command-line, or edited
interactively.&lt;/li&gt;
&lt;li&gt;Much more sophisticated and usable pretty printing of request bodies.
Support for auto-indentation of JavaScript, inspection of image EXIF
data, and more.&lt;/li&gt;
&lt;li&gt;Details view for flows, showing connection and SSL cert information (X
keyboard shortcut).&lt;/li&gt;
&lt;li&gt;Server certificates are now stored and serialized in saved traffic for
later analysis. This means that the 0.8 serialization format is NOT
compatible with 0.7.&lt;/li&gt;
&lt;li&gt;Add a shortcut key ("f") to load the remainder of a request or response body,
if it is abbreviated.&lt;/li&gt;
&lt;li&gt;Many other improvements, including bugfixes, an expanded scripting API
and more sophisticated certificate handling.&lt;/li&gt;
&lt;/ul&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/code/mitmproxy/announce0_8/index.html</guid><pubDate>Mon, 09 Apr 2012 16:57:00 GMT</pubDate></item><item><title>mitmproxy 0.7</title><link>http://corte.si/posts/code/mitmproxy/announce0_7/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/code/mitmproxy/announce0_7/index.html"&gt;mitmproxy 0.7&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;27 February 2012&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;&lt;a href="http://mitmproxy.org"&gt;
&lt;img src="http://corte.si/posts/code/mitmproxy/announce0_7/mitmproxy_0_7.png"/&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm happy to announce the release of &lt;a href="http://mitmproxy.org"&gt;mitmproxy 0.7&lt;/a&gt;. The
biggest visible change is a new structured editor for headers, query strings
and form fields. Other new features include a reverse proxy mode, extended
script API that makes many common tasks much easier, and a myriad of
improvements to the interface (including a massive increase in speed).
Everybody still on 0.6 should upgrade - get it here:&lt;/p&gt;

&lt;h2&gt;&lt;a href="http://mitmproxy.org"&gt;mitmproxy-0.7.tar.gz&lt;/a&gt; &lt;a href="http://mitmproxy.org/docs"&gt;(docs)&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;You can also now install mitmproxy using &lt;a href="http://pypi.python.org/pypi/pip"&gt;pip&lt;/a&gt;, like so:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;pip install mitmproxy
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In other news, the project has had an amazing month, after a rash of
high-profile results obtained using mitmproxy were published. It started with
&lt;a href="http://mclov.in/2012/02/08/path-uploads-your-entire-address-book-to-their-servers.html"&gt;Arun Thampi's
discovery&lt;/a&gt;
that Path uploads users' address books to their servers. Things snowballed from
there, and for a few days mitmproxy seemed to be everywhere. Similar findings
were made for
&lt;a href="http://markchang.tumblr.com/post/17244167951/hipster-uploads-part-of-your-iphone-address-book-to-its"&gt;Hipster&lt;/a&gt;,
&lt;a href="http://www.theverge.com/2012/2/14/2798008/ios-apps-and-the-address-book-what-you-need-to-know"&gt;The
Verge&lt;/a&gt;
did a mitmproxy-driven AddressbookGate expose (including vaguely threatening
background shots of mitmproxy doing its dastardly work), and lots of people
said nice things on Twitter. &lt;/p&gt;

&lt;p&gt;To see the impact all of this had on the mitmproxy project, you need only look at
the &lt;a href="http://github.com/cortesi/mitmproxy"&gt;Github page&lt;/a&gt; - watchers of the repo
went from about 200 a month ago to 950 at the time of this post. &lt;/p&gt;

&lt;h2&gt;Changelog&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;New built-in key/value editor. This lets you interactively edit URL query
strings, headers and URL-encoded form data. &lt;/li&gt;
&lt;li&gt;Extend script API to allow duplication and replay of flows.&lt;/li&gt;
&lt;li&gt;API for easy manipulation of URL-encoded forms and query strings.&lt;/li&gt;
&lt;li&gt;Add "D" shortcut in mitmproxy to duplicate a flow.&lt;/li&gt;
&lt;li&gt;Reverse proxy mode. In this mode mitmproxy acts as an HTTP server,
forwarding all traffic to a specified upstream server.&lt;/li&gt;
&lt;li&gt;UI improvements - use Unicode characters to make GUI more compact,
improve spacing and layout throughout.&lt;/li&gt;
&lt;li&gt;Add support for filtering by HTTP method.&lt;/li&gt;
&lt;li&gt;Add the ability to specify an HTTP body size limit.&lt;/li&gt;
&lt;li&gt;Move to typed netstrings for serialization format - this makes 0.7
backwards-incompatible with serialized data from 0.6!&lt;/li&gt;
&lt;li&gt;Significant improvements in speed and responsiveness of UI. &lt;/li&gt;
&lt;li&gt;Many minor bugfixes and improvements.&lt;/li&gt;
&lt;/ul&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/code/mitmproxy/announce0_7/index.html</guid><pubDate>Mon, 27 Feb 2012 20:38:00 GMT</pubDate></item><item><title>OpenBSD in decline?</title><link>http://corte.si/posts/security/openbsd-decline/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/security/openbsd-decline/index.html"&gt;OpenBSD in decline?&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;26 February 2012&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;My leisurely Sunday activity today is to set up a new
&lt;a href="http://openbsd.org"&gt;OpenBSD&lt;/a&gt; firewall for my mobile app testing lab. I haven't
done a from-scratch OpenBSD install for years, so I spent some time reading
through the change logs for the last few versions to catch up with what's
changed. Although the project is clearly still making steady, well-engineered
progress, I had the nagging feeling that the rate of change wasn't what it used
to be. So, I pulled some numbers from &lt;a href="http://archives.neohapsis.com/archives/openbsd/cvs/"&gt;CVS commit message list
archives&lt;/a&gt;, and graphed
them. Here are the monthly commit counts from January 2001 to January
2012. The orange line is a simple 12-month moving average: &lt;/p&gt;

&lt;p&gt;&lt;img src="http://corte.si/posts/security/openbsd-decline/commitspermonth.png"/&gt;&lt;/p&gt;

&lt;p&gt;Now, we should be cautious about interpreting this - the number of commits
doesn't tell us anything about the quality, importance or magnitude of code
change. Even if it did all of these things, there are other and perhaps better
measures of a project's health. Still, the trend is clear, and suggests a
sustained decline in activity.&lt;/p&gt;
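&lt;p&gt;For reference, a simple 12-month moving average of monthly counts can be computed as below. The post doesn't say whether the average in the graph is trailing or centred, so this Python sketch assumes a trailing window and leaves the first 11 months empty.&lt;/p&gt;

```python
def moving_average(values, window=12):
    """Trailing simple moving average over monthly counts.

    Returns None for the first window-1 months, where a full window of
    data is not yet available.
    """
    result = []
    running = 0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the month that has just slid out of the window.
            running -= values[i - window]
        result.append(running / window if i >= window - 1 else None)
    return result


print(moving_average([1, 2, 3, 4], window=2))  # [None, 1.5, 2.5, 3.5]
```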

&lt;p&gt;I just &lt;a href="http://openbsd.org/orders.html"&gt;bought some T-shirts&lt;/a&gt; to help support
one of my favourite open source projects. You should too. &lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/security/openbsd-decline/index.html</guid><pubDate>Sun, 26 Feb 2012 09:08:00 GMT</pubDate></item><item><title>Malware</title><link>http://corte.si/posts/visualisation/malware/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/visualisation/malware/index.html"&gt;Malware&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;05 January 2012&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;div class="hidden"&gt;&lt;h2&gt;If you subscribe to my RSS feed, please visit this
article directly.  The table below has interactive elements that won't work in
most feed readers.&lt;/h2&gt; &lt;/div&gt;

&lt;p&gt;Hover and click for more.&lt;/p&gt;

&lt;table class="spacertable"&gt;
    &lt;tr&gt;

&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/00f29767bee5f8bd5b2d55d5be734f69.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_00f29767bee5f8bd5b2d55d5be734f69_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/01310712a180d9f939c126712d24363d.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_01310712a180d9f939c126712d24363d_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/023293a96c763bbdee3991994cdcdcef.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_023293a96c763bbdee3991994cdcdcef_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/0309fc0e6dbeb714c5361f82b2ccb037.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_0309fc0e6dbeb714c5361f82b2ccb037_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/038e3a7add116ac69e5f9539ce461386.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_038e3a7add116ac69e5f9539ce461386_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;

&lt;/tr&gt;&lt;tr&gt;

&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/03b3f30aed5b7dc39bd6e356bbde3713.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_03b3f30aed5b7dc39bd6e356bbde3713_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/04240e137999dc6b5115de8db3a15f53.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_04240e137999dc6b5115de8db3a15f53_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/04fee7e6dedf912b4a72886486627b05.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_04fee7e6dedf912b4a72886486627b05_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/05fd535d70dfb5ee4f36e87e39d8c70d.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_05fd535d70dfb5ee4f36e87e39d8c70d_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/07ddb50c4cc358fc3718847684ca5fae.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_07ddb50c4cc358fc3718847684ca5fae_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;

&lt;/tr&gt;&lt;tr&gt;

&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/08b983ec55bfd50d1d2cb9a90b1ae54e.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_08b983ec55bfd50d1d2cb9a90b1ae54e_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/08c926bf7fbb3397236effef1b30b4df.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_08c926bf7fbb3397236effef1b30b4df_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/094fedd2e4c175cd81dc170fd4d03917.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_094fedd2e4c175cd81dc170fd4d03917_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/096381c0f5ddc29319ba2b2647cea116.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_096381c0f5ddc29319ba2b2647cea116_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/09dd27fcccb9c000d37c6394364be1b5.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_09dd27fcccb9c000d37c6394364be1b5_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;

&lt;/tr&gt;&lt;tr&gt;

&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/0b4f82e83741e79310d797d54db5a9be.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_0b4f82e83741e79310d797d54db5a9be_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/0bcee1314e8c61fa8ef55743f3bb7742.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_0bcee1314e8c61fa8ef55743f3bb7742_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/0cc9e0ba6a0bd8b79aaf2be22c496228.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_0cc9e0ba6a0bd8b79aaf2be22c496228_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/0d9109ab6b06f38221b713eb6a54c42f.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_0d9109ab6b06f38221b713eb6a54c42f_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/0d97f71367f8b6dcb8cbc8ec964ebdbe.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_0d97f71367f8b6dcb8cbc8ec964ebdbe_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;

&lt;/tr&gt;&lt;tr&gt;

&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/0dcfe476fbd68148f007e6c48c226e0f.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_0dcfe476fbd68148f007e6c48c226e0f_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/0e2bf707dbc146c9d60c373237d050b7.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_0e2bf707dbc146c9d60c373237d050b7_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/0eab36fc4307a1fd3ad8d832c526cf40.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_0eab36fc4307a1fd3ad8d832c526cf40_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/0f5c70c82a74c8ff3d05fbf4d90bc5bf.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_0f5c70c82a74c8ff3d05fbf4d90bc5bf_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/0fc12afe2d283b92184897b6e7bcc2c2.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_0fc12afe2d283b92184897b6e7bcc2c2_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;

&lt;/tr&gt;&lt;tr&gt;

&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/0ff25e3cefcce4336d0abeb9f02ccb02.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_0ff25e3cefcce4336d0abeb9f02ccb02_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/109f8c72ff91dee5906aba0e47324526.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_109f8c72ff91dee5906aba0e47324526_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/12e9e61357be212f28ea4c81ef75018d.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_12e9e61357be212f28ea4c81ef75018d_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/12eec9b3e0aa2e6683487c13eede2382.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_12eec9b3e0aa2e6683487c13eede2382_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/131f1cb94df6e2969ac874503cbfd934.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_131f1cb94df6e2969ac874503cbfd934_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;

&lt;/tr&gt;&lt;tr&gt;

&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/14064e26cbd3daed7e6eb3b4fb245c8f.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_14064e26cbd3daed7e6eb3b4fb245c8f_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/14560f7dc19e6fef87743f83e5234519.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_14560f7dc19e6fef87743f83e5234519_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/14e6950dd4bcffe54bf158a20437e6b4.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_14e6950dd4bcffe54bf158a20437e6b4_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/1511f2d75e07bb94f5da8cbc031a51dd.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_1511f2d75e07bb94f5da8cbc031a51dd_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/1542a2f2732bbdad500bf112686503ac.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_1542a2f2732bbdad500bf112686503ac_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;

&lt;/tr&gt;&lt;tr&gt;

&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/163524fb9a41e6ec79178a902797f8f1.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_163524fb9a41e6ec79178a902797f8f1_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/16c533cc9b3dac1bde9885b4bd967bff.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_16c533cc9b3dac1bde9885b4bd967bff_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/177827ae9615791e067b4a9fb4be1ab9.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_177827ae9615791e067b4a9fb4be1ab9_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/17fa099ecef82edd1e4ddc61be575ae4.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_17fa099ecef82edd1e4ddc61be575ae4_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/17fd97da6d93430ec0d9aa040b4b2c58.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_17fd97da6d93430ec0d9aa040b4b2c58_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;

&lt;/tr&gt;&lt;tr&gt;

&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/18ce863d41622cd7aaa3c7d3d11e2f3e.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_18ce863d41622cd7aaa3c7d3d11e2f3e_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/18f9ede7d921742f963a0eb06887fdfa.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_18f9ede7d921742f963a0eb06887fdfa_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/1998bb714c0de980635ee9b8c1951381.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_1998bb714c0de980635ee9b8c1951381_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/19bc481e5cb1113c7eff49b67273f892.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_19bc481e5cb1113c7eff49b67273f892_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/1a30184661ee6585f4a188107e63a4d2.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_1a30184661ee6585f4a188107e63a4d2_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;

&lt;/tr&gt;&lt;tr&gt;

&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/1a3aa70d060be5e6e778e3519b400bf1.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_1a3aa70d060be5e6e778e3519b400bf1_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/1a8700c754f97c115fa91fa161fa05cc.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_1a8700c754f97c115fa91fa161fa05cc_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/1aa40b6ea4e7be64d4e6a024fcdf76fe.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_1aa40b6ea4e7be64d4e6a024fcdf76fe_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/1b0e377994cfdb4eec0d2fb028118844.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_1b0e377994cfdb4eec0d2fb028118844_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;


&lt;td&gt;
    &lt;a href='http://corte.si/posts/visualisation/malware/detail/1b5bad65f8b72a52cfcae67e3e538f34.html'&gt;
        &lt;img class='malwareimg' src='http://corte.si/posts/visualisation/malware/images/small_1b5bad65f8b72a52cfcae67e3e538f34_entropy.png'/&gt;
    &lt;/a&gt;
&lt;/td&gt;

&lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;The images above are &lt;a href="http://corte.si/posts/visualisation/entropy/index.html"&gt;entropy visualizations&lt;/a&gt;
of samples from a malware database - black is zero entropy, with colour ranging
through blue, up to hot pink for maximum entropy. Large areas of very high
entropy are usually packed sections - compressed, encrypted or otherwise
obfuscated by the malware authors to make the malware hard to detect and
reverse engineer.
Smaller areas might be keys, passwords, or other chunks of data meant to be
hidden from view.&lt;/p&gt;
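&lt;p&gt;As a rough illustration of how such regions could be flagged without looking at the pictures at all, here is a minimal Python sketch. It scores fixed-size blocks with Shannon entropy (in bits per byte, from 0 to 8) and reports the blocks that reach a cut-off. The helper names, the 256-byte block size and the 7.0 threshold are illustrative assumptions, not the settings behind the images above.&lt;/p&gt;

```python
import math
from collections import Counter


def shannon_entropy(chunk):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    total = len(chunk)
    # H = sum of p * log2(1/p) over the byte values that occur in the chunk.
    return sum((c / total) * math.log2(total / c) for c in Counter(chunk).values())


def high_entropy_blocks(data, block_size=256, threshold=7.0):
    """Return (offset, entropy) for every block whose entropy reaches threshold.

    block_size and threshold are illustrative defaults, not tuned values.
    The final block may be shorter than block_size.
    """
    hits = []
    for offset in range(0, len(data), block_size):
        h = shannon_entropy(data[offset:offset + block_size])
        if h >= threshold:
            hits.append((offset, h))
    return hits


# Usage on a file (the path is a placeholder):
# with open("sample.bin", "rb") as f:
#     for offset, h in high_entropy_blocks(f.read()):
#         print(hex(offset), round(h, 2))
```

&lt;p&gt;The threshold is a judgment call: set it lower and more regions are flagged, including some that are merely dense rather than packed.&lt;/p&gt;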

&lt;p&gt;When you hover over an image, you see a &lt;a href="http://corte.si/posts/visualisation/binvis/index.html"&gt;character class
visualization&lt;/a&gt; with the following colors:&lt;/p&gt;

&lt;table style="margin: 30px"&gt;
    &lt;tr&gt;
        &lt;td style="background-color: #000000"&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td&gt;0x00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td style="background-color: #ffffff"&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td&gt;0xFF&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td style="background-color: #377eb8"&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td&gt;Printable characters&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td style="background-color: #e41a1c"&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td&gt;Everything else&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Clicking will show you high-detail versions of both visualizations, and let you
look up the binary hash to see what it is. I've used a square Hilbert curve
layout - each file starts in the top-left corner and passes through the quadrants
clockwise.&lt;/p&gt;

&lt;p&gt;I spent hours looking through thousands of these visualizations today. I find them
eerie and rather beautiful - an entirely different perspective from my
day-to-day interactions with malware.&lt;/p&gt;

&lt;script src="http://corte.si/js01-jquery-1.8.0.min.js" type="text/javascript"&gt;&lt;/script&gt;
&lt;script&gt;
    $(function(){
        // Preload the character class images so the hover swap is instant.
        $('.malwareimg').each(function(){
            $('&lt;img/&gt;').appendTo('body')
                .css({ display: "none" })
                .attr('src', $(this).attr('src').replace("entropy", "charclass"));
        });
        // Show the character class image on hover, and restore the entropy image afterwards.
        $('.malwareimg').hover(
            function(){
                var t = $(this);
                t.attr('src', t.attr('src').replace("entropy", "charclass"));
            },
            function(){
                var t = $(this);
                t.attr('src', t.attr('src').replace("charclass", "entropy"));
            }
        );
    });
&lt;/script&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/visualisation/malware/index.html</guid><pubDate>Thu, 05 Jan 2012 22:50:00 GMT</pubDate></item><item><title>Visualizing entropy in binary files</title><link>http://corte.si/posts/visualisation/entropy/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/visualisation/entropy/index.html"&gt;Visualizing entropy in binary files&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;04 January 2012&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;Last week, I wrote about &lt;a href="http://corte.si/posts/visualisation/binvis/index.html"&gt;visualizing binary files using space-filling
curves&lt;/a&gt;, a technique I use when I need to get a
quick overview of the broad structure of a file. Today, I'll show you an
elaboration of the same basic idea - still based on space-filling curves, but
this time using a colour function that measures local entropy.&lt;/p&gt;

&lt;p&gt;Before I get to the details, let's quickly talk about the motivation for a
visualization like this. We can think of entropy as the degree to which a chunk
of data is disordered. If we have a data set where all the elements have the
same value, the amount of disorder is nil, and the entropy is zero. If the data
set has the maximum amount of heterogeneity (i.e. all possible symbols are
represented equally), then we also have the maximum amount of disorder, and
thus the maximum amount of entropy. There are two common types of high-entropy
data that are of special interest to reverse engineers and penetration testers.
The first is compressed data - finding and extracting compressed sections is a
common task in many security audits. The second is cryptographic material -
which is obviously at the heart of most security work. Here, I'm referring not
only to key material and certificates, but also to hashes and actual encrypted
data. As I show below, a tool like the one I'm describing today can be highly
useful in spotting this type of information.&lt;/p&gt;
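&lt;p&gt;To make the two extremes concrete: Shannon's formula gives the entropy of a chunk of bytes as H = sum of p(i) * log2(1/p(i)) over every byte value i that occurs, where p(i) is the fraction of the chunk taken up by value i. Measured in bits per byte, it ranges from 0 (a single repeated value) to 8 (all 256 values equally common). The minimal Python sketch below (the function name is illustrative) checks both ends of that range and one point in between.&lt;/p&gt;

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a byte string, in bits per byte."""
    total = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    # Sum p * log2(1/p) for each byte value that actually occurs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# All elements identical: no disorder, zero entropy.
print(shannon_entropy(b"\x00" * 1024))         # 0.0
# Two symbols, equally common: exactly one bit per byte.
print(shannon_entropy(b"\x00\xff" * 512))      # 1.0
# Every byte value equally represented: the 8-bit maximum.
print(shannon_entropy(bytes(range(256)) * 4))  # 8.0
```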

&lt;p&gt;For this visualization, I use the &lt;a href="http://en.wikipedia.org/wiki/Entropy_(information_theory)"&gt;Shannon
entropy&lt;/a&gt; measure to
calculate byte entropy over a sliding window. This gives each byte a "local
entropy" value (the entropy of a window of nearby bytes), even though the
concept doesn't really apply to single symbols.&lt;/p&gt;
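&lt;p&gt;Here is a minimal Python sketch of the sliding-window idea. It assumes a 32-byte window centred on each byte and clipped at the ends of the data; the window size, the centring and the function names are assumptions for illustration, not necessarily what binvis does. Rather than recounting each window from scratch, it keeps a running count of byte values and updates it as the window's edges move.&lt;/p&gt;

```python
import math
from collections import Counter


def window_entropy(counts, size):
    """Entropy in bits per byte of a window, given its byte-value counts."""
    # Skip values whose count has dropped to zero as the window moved on.
    return sum((c / size) * math.log2(size / c) for c in counts.values() if c)


def local_entropy(data, window=32):
    """Give each byte the entropy of a `window`-byte window centred on it.

    Near the start and end of the data the window is clipped, so it
    holds fewer bytes there.
    """
    half = window // 2
    n = len(data)
    counts = Counter()
    lo = hi = 0  # the current window is data[lo:hi]
    result = []
    for i in range(n):
        start = max(0, i - half)
        end = min(n, i - half + window)
        # Both edges only ever move right, so each byte is added and removed once.
        while hi != end:
            counts[data[hi]] += 1
            hi += 1
        while lo != start:
            counts[data[lo]] -= 1
            lo += 1
        result.append(window_entropy(counts, hi - lo))
    return result
```

&lt;p&gt;One consequence worth noting: a 32-byte window holds at most 32 distinct values, so its entropy can never exceed log2(32) = 5 bits. Values like these would need to be scaled against that ceiling, rather than against the 8-bit maximum, before being turned into colours.&lt;/p&gt;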

&lt;p&gt;With that out of the way, let's look at some pretty pictures.&lt;/p&gt;

&lt;h1&gt;Visualizing the OSX ksh binary&lt;/h1&gt;

&lt;p&gt;In my previous post, I used the &lt;a href="http://en.wikipedia.org/wiki/Korn_shell"&gt;ksh&lt;/a&gt;
binary as a guinea pig, and I'll do the same here. On the left is the entropy
visualization with colours ranging from black for zero entropy, through shades
of blue as entropy increases, to hot pink for maximum entropy. On the right is
the Hilbert curve visualization from the last post for comparison - see &lt;a href="http://corte.si/posts/visualisation/binvis/index.html"&gt;the
post itself&lt;/a&gt; for an explanation of the colour
scheme. Click for larger versions with much more detail:&lt;/p&gt;
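&lt;p&gt;The exact colour function isn't spelled out here, so the following Python sketch is only a guess at its general shape: a piecewise-linear ramp from black at zero entropy, through pure blue at the midpoint, to hot pink (the CSS colour #ff69b4) at the maximum. The midpoint, the linear interpolation and the function names are all assumptions.&lt;/p&gt;

```python
BLACK = (0, 0, 0)
BLUE = (0, 0, 255)
HOT_PINK = (255, 105, 180)  # CSS "hotpink", #ff69b4


def lerp(a, b, t):
    """Linearly interpolate between RGB tuples a and b, with t from 0 to 1."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def entropy_colour(e, max_entropy=8.0):
    """Map an entropy value to RGB along a hypothetical black-blue-pink ramp."""
    t = min(max(e / max_entropy, 0.0), 1.0)  # normalise and clamp to [0, 1]
    if t >= 0.5:
        return lerp(BLUE, HOT_PINK, (t - 0.5) * 2)
    return lerp(BLACK, BLUE, t * 2)
```

&lt;p&gt;The max_entropy parameter is there because the ceiling depends on the window: a window shorter than 256 bytes can never reach 8 bits.&lt;/p&gt;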

&lt;table class="spacertable"&gt;
    &lt;tr&gt;
        &lt;td&gt;Entropy&lt;/td&gt;
        &lt;td&gt;Byte class&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;
            &lt;a href="http://corte.si/posts/visualisation/entropy/hilbert-entropy-large.png"&gt;
                &lt;img src="http://corte.si/posts/visualisation/entropy/hilbert-entropy.png"/&gt;
            &lt;/a&gt;
        &lt;/td&gt;
        &lt;td&gt;
            &lt;a href="http://corte.si/posts/visualisation/binvis/binary-large-hilbert.png"&gt;
                &lt;img src="http://corte.si/posts/visualisation/binvis/binary-hilbert.png"/&gt;
            &lt;/a&gt;
        &lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;Note that this is a dual-architecture
&lt;a href="http://en.wikipedia.org/wiki/Mach-O"&gt;Mach-O&lt;/a&gt; file, containing code for both
i386 and x86_64. You can see this if you squint at these images - some broad
structures in the file appear twice. We can also see that there are a
number of different sections of the &lt;strong&gt;ksh&lt;/strong&gt; binary that have very high entropy.
It's not immediately obvious why a system binary would contain either
compressed sections or cryptographic material. As it happens, the explanation
in this case is quite interesting. Let's have a closer look:&lt;/p&gt;
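&lt;p&gt;The dual-architecture layout is easy to confirm in the file itself. A universal ("fat") Mach-O file starts with a big-endian header holding the magic number 0xCAFEBABE and a count of architectures, followed by one 20-byte record per architecture giving its CPU type, subtype, and the offset and size of its slice. Each slice is essentially a complete single-architecture binary, which is why broad structures show up twice. The minimal Python sketch below lists the slices; it handles only the classic 32-bit fat header and names only the two CPU types relevant here, and the function name is illustrative.&lt;/p&gt;

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # magic number of a universal ("fat") Mach-O header

# cputype values: CPU_TYPE_X86, and the same with the 64-bit ABI flag
# (0x01000000) set. Anything else is reported as a hex number.
CPU_NAMES = {7: "i386", 0x01000007: "x86_64"}


def fat_slices(header):
    """Return (architecture, offset, size) for each slice of a fat Mach-O file.

    `header` is the start of the file; all fields are big-endian, hence "!".
    """
    magic, nfat_arch = struct.unpack_from("!II", header, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat Mach-O file")
    slices = []
    for i in range(nfat_arch):
        # fat_arch record: cputype, cpusubtype, offset, size, align (20 bytes).
        cputype, _, offset, size, _ = struct.unpack_from("!iiIII", header, 8 + 20 * i)
        slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices


# Usage (on OSX of this era; newer systems may ship different slices):
# with open("/bin/ksh", "rb") as f:
#     print(fat_slices(f.read(4096)))
```

&lt;p&gt;Java class files share the 0xCAFEBABE magic number, so a more careful tool would sanity-check the architecture count before trusting it.&lt;/p&gt;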

&lt;p&gt;&lt;center&gt;
    &lt;img src="http://corte.si/posts/visualisation/entropy/entropy-annotated.png"/&gt;
&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;Sections &lt;strong&gt;1&lt;/strong&gt; and &lt;strong&gt;2&lt;/strong&gt; are a lovely validation of the central idea of this
post. These two areas do indeed contain cryptographic material - in this case,
&lt;a href="http://developer.apple.com/library/mac/#technotes/tn2206/_index.html"&gt;code signing hashes and
certificates&lt;/a&gt;.
Rather satisfyingly, they stand out like a sore thumb. It turns out that all of
the official OSX binaries are signed by Apple. The system then uses these
signatures to apply &lt;a href="http://developer.apple.com/library/mac/#technotes/tn2206/_index.html"&gt;a variety of
policies&lt;/a&gt;,
depending on who the signatory is and whether they are trusted.&lt;/p&gt;

&lt;p&gt;You can dump some rudimentary data about a binary's signature using the
&lt;strong&gt;codesign&lt;/strong&gt; command (which you can also use to sign binaries yourself):&lt;/p&gt;

&lt;pre&gt;
&gt; codesign -dvv /bin/ksh 
Executable=/bin/ksh
Identifier=com.apple.ksh
Format=Mach-O universal (i386 x86_64)
CodeDirectory v=20100 size=5662 flags=0x0(none) hashes=278+2 location=embedded
Signature size=4064
Authority=Software Signing
Authority=Apple Code Signing Certification Authority
Authority=Apple Root CA
Info.plist=not bound
Sealed Resources=none
Internal requirements count=1 size=92
&lt;/pre&gt;

&lt;p&gt;Section &lt;strong&gt;3&lt;/strong&gt; (the two occurrences are the same data repeated for each
architecture) is interesting for a different reason - it's a cautionary example
of how the simple entropy measure we're using sometimes detects high entropy in
highly structured data. A hex dump of the start of the region looks like this: &lt;/p&gt;

&lt;pre&gt;
000d1f00  00 01 00 00 00 02 00 00  00 06 00 00 00 00 00 00  |................|
000d1f10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000d1f20  00 01 02 03 04 05 06 07  08 09 0a 0b 0c 0d 0e 0f  |................|
000d1f30  10 11 12 13 14 15 16 17  18 19 1a 1b 1c 1d 1e 1f  |................|
000d1f40  20 21 22 23 24 25 26 27  28 29 2a 2b 2c 2d 2e 2f  | !"#$%&amp;'()*+,-./|
000d1f50  30 31 32 33 34 35 36 37  38 39 3a 3b 3c 3d 3e 3f  |0123456789:;&lt;=&gt;?|
000d1f60  40 41 42 43 44 45 46 47  48 49 4a 4b 4c 4d 4e 4f  |@ABCDEFGHIJKLMNO|
000d1f70  50 51 52 53 54 55 56 57  58 59 5a 5b 5c 5d 5e 5f  |PQRSTUVWXYZ[\]^_|
000d1f80  60 61 62 63 64 65 66 67  68 69 6a 6b 6c 6d 6e 6f  |`abcdefghijklmno|
000d1f90  70 71 72 73 74 75 76 77  78 79 7a 7b 7c 7d 7e 7f  |pqrstuvwxyz{|}~.|
000d1fa0  80 81 82 83 84 85 86 87  88 89 8a 8b 8c 8d 8e 8f  |................|
000d1fb0  90 91 92 93 94 95 96 97  98 99 9a 9b 9c 9d 9e 9f  |................|
000d1fc0  a0 a1 a2 a3 a4 a5 a6 a7  a8 a9 aa ab ac ad ae af  |................|
000d1fd0  b0 b1 b2 b3 b4 b5 b6 b7  b8 b9 ba bb bc bd be bf  |................|
000d1fe0  c0 c1 c2 c3 c4 c5 c6 c7  c8 c9 ca cb cc cd ce cf  |................|
000d1ff0  d0 d1 d2 d3 d4 d5 d6 d7  d8 d9 da db dc dd de df  |................|
000d2000  e0 e1 e2 e3 e4 e5 e6 e7  e8 e9 ea eb ec ed ee ef  |................|
000d2010  f0 f1 f2 f3 f4 f5 f6 f7  f8 f9 fa fb fc fd fe ff  |................|
&lt;/pre&gt;

&lt;p&gt;We see that this section contains each byte value from 0x00 to 0xff in order -
furthermore this whole block is repeated with minor variations a number of
times. There are two things to explain here - why is this detected as "high
entropy" data, and what the heck is it doing in the file? &lt;/p&gt;

&lt;p&gt;First, we need to understand that the Shannon entropy measure looks only at the
relative occurrence frequencies of individual symbols (in this case, bytes). A
chunk of data like the one above therefore looks like it has high entropy,
because each symbol occurs once and only once, making the data highly
heterogeneous. &lt;/p&gt;
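&lt;p&gt;A quick way to convince yourself of this is to compare the ordered run of bytes from the hex dump with a shuffled copy of the same bytes. The Python sketch below (helper names are illustrative) shows the Shannon measure scoring both at the 8-bit maximum. For contrast, it also applies the same measure to the differences between consecutive bytes, one simple way (chosen here purely for illustration) to make a measure notice order. By that measure the ordered run collapses to zero, because every step is +1.&lt;/p&gt;

```python
import math
import random
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy in bits per byte; depends only on byte frequencies."""
    total = len(data)
    return sum((c / total) * math.log2(total / c) for c in Counter(data).values())


def delta_entropy(data):
    """Entropy of the differences between consecutive bytes (mod 256)."""
    return shannon_entropy(bytes((b - a) % 256 for a, b in zip(data, data[1:])))


ordered = bytes(range(256))         # 00 01 02 ... ff, as in the hex dump
shuffled = bytearray(ordered)
random.Random(0).shuffle(shuffled)  # same bytes, order scrambled

print(shannon_entropy(ordered))   # 8.0
print(shannon_entropy(shuffled))  # 8.0 - identical, only frequencies count
print(delta_entropy(ordered))     # 0.0 - every step is +1
```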

&lt;p&gt;Now, what earthly use would chunks of data like this be? With a bit of digging,
I found the answer in the &lt;strong&gt;ksh&lt;/strong&gt; source code. These sections are maps used for
translation between various &lt;a href="http://en.wikipedia.org/wiki/EBCDIC"&gt;character&lt;/a&gt;
&lt;a href="http://en.wikipedia.org/wiki/ASCII"&gt;encodings&lt;/a&gt;. If you're interested, here's
the &lt;a href="http://opensource.apple.com/source/ksh/ksh-13/ksh/src/lib/libast/string/ccmap.c"&gt;culprit in all its repetitive
glory&lt;/a&gt;.&lt;/p&gt;
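&lt;p&gt;Translation tables like these are typically 256-entry arrays indexed by byte value, and when they map one complete 8-bit character set onto another, every byte value appears exactly once - precisely the pattern the entropy measure rewards. The Python sketch below builds two such tables with the standard library: the identity map that bytes.maketrans produces, and an EBCDIC-to-Latin-1 map derived from Python's cp037 codec. Code page 037 is just one EBCDIC variant, standing in here for whatever variants the ksh tables actually cover.&lt;/p&gt;

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy in bits per byte."""
    total = len(data)
    return sum((c / total) * math.log2(total / c) for c in Counter(data).values())


# Identity map: entry i holds byte i. This is the table bytes.translate()
# starts from before any characters are remapped.
identity = bytes.maketrans(b"", b"")

# EBCDIC (code page 037) to Latin-1: entry i is the Latin-1 byte for EBCDIC byte i.
ebcdic_to_latin1 = bytes(range(256)).decode("cp037").encode("latin-1")

print(identity == bytes(range(256)))      # True
print(ebcdic_to_latin1[0xC1:0xC4])        # b'ABC' (EBCDIC 0xC1-0xC3 are A-C)
print(shannon_entropy(ebcdic_to_latin1))  # 8.0 - a permutation of 0x00-0xff
```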

&lt;h1&gt;The code&lt;/h1&gt;

&lt;p&gt;As usual, the code for generating all of the images in this post is up on
GitHub. The entropy visualizations were created with
&lt;a href="https://github.com/cortesi/scurve/blob/master/binvis"&gt;binvis&lt;/a&gt;, a new addition
to &lt;a href="https://github.com/cortesi/scurve"&gt;scurve&lt;/a&gt;, my compendium of code related
to space-filling curves. &lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/visualisation/entropy/index.html</guid><pubDate>Wed, 04 Jan 2012 05:26:00 GMT</pubDate></item><item><title>A personal link mill</title><link>http://corte.si/posts/socialmedia/linkmill/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/socialmedia/linkmill/index.html"&gt;A personal link mill&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;30 December 2011&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;I posted a link to an interesting visualization paper on Twitter today,
&lt;a href="https://twitter.com/#!/__mharrison__/status/152503684822081537"&gt;prompting someone to ask me where I had found
it&lt;/a&gt;. Sadly, I
had to admit that I had no clue where I first saw it referenced, due to the way
I consume links I find on the net. So, I thought I'd write a quick blog post to
explain myself, and then pitch a product idea that could make my life (and
maybe yours) much easier.&lt;/p&gt;

&lt;p&gt;First, the problem statement: my aim is to efficiently discover links to
interesting stuff on the net. Simple as that. A few years ago, my flow of links
came mostly from social news sites (&lt;a href="http://news.ycombinator.com"&gt;Hacker News&lt;/a&gt;
and &lt;a href="http://reddit.com"&gt;Reddit&lt;/a&gt;), and items shared by people I follow on social
networks. Over time, I became more and more disenchanted with this way of doing
things. The social news approach is to take a torrent of very low quality links
(user submissions), and then crowd-source the filtration process through
voting.  But popularity is not a good measure of information quality, and the
result is a bland, lowest-common-denominator view of the world that has no room
for anything that doesn't make it to the front page. Don't get me wrong -
Reddit and HN do a lot of other things well - but they just don't cut it as
primary information sources. Mining links from social networks is a more
promising approach, but still problematic. None of the social networks provide
the tools needed to extract shared links from the update stream and consume
them efficiently. There is also a structural issue - I don't necessarily want
to mix my social ties and my information sources, and I definitely don't want
to be limited to just one platform. These are separate functions that I feel
require separate tools.&lt;/p&gt;

&lt;h1&gt;My personal link mill&lt;/h1&gt;

&lt;p&gt;Eventually, I took matters into my own hands. First, I hugely broadened the
number of information sources I consumed. The tool I use for this is Google
Reader - I now subscribe to about 800 individual feeds, and this number is
growing daily. The trick here is to find high-quality, low-volume link sources.
The motherlode of good links for me was to be found on social bookmarking
sites. About 700 of my subscriptions are to the RSS feeds of individual users
on &lt;a href="http://pinboard.in"&gt;Pinboard&lt;/a&gt; and &lt;a href="http://delicious.com"&gt;Delicious&lt;/a&gt;. This
gives me very fine control and a great mix of interests. Plus, getting links
from individual curators handily sidesteps the social news group-think problem.
The remainder of my subscriptions are split between blogs, some sub-Reddits, a
few Twitter users and subsections of &lt;a href="http://arxiv.org"&gt;arXiv&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So much for how my intake works. Just as important is the way that I consume
it. I do my "filtering" in batches, usually in the evening. Using
&lt;a href="http://reederapp.com/"&gt;Reeder&lt;/a&gt; on my iPad works well for me, letting me flick
quickly and comfortably through all the new links of the day. When I find
something that looks interesting, I resist the temptation to read it then and
there - instead, I batch up all my reading for later. If it's a web page, it
goes to &lt;a href="http://www.instapaper.com/"&gt;Instapaper&lt;/a&gt;.  If it's a PDF, it gets
downloaded into a &lt;a href="http://www.dropbox.com/"&gt;Dropbox&lt;/a&gt; folder, which is synced to
&lt;a href="http://www.goodiware.com/goodreader.html"&gt;GoodReader&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Finally, the actual reading. Every morning, I toddle off to a nice cafe with my
iPad, and read all the interesting stuff I saved the previous day in a single
sitting. I'm ruthless about just skimming things that don't warrant careful
attention. If I find something particularly interesting I save it permanently,
and perhaps tweet it or mail it to someone I think might be interested. &lt;/p&gt;

&lt;h1&gt;Problems - and a product idea?&lt;/h1&gt;

&lt;p&gt;This system works for me, but it has many problems. There's no end-to-end
coordination, so by the time I sit down to actually read something, I have no
easy way to tell which feed it came from. Google Reader sucks at managing
hundreds of low-volume subscriptions. Reeder is great, but it isn't tailored to
consuming redundant information from many sources. The end result is that
maintaining the system I have is a time-consuming pain in the ass. The fact
that it's still worth it despite this makes me think there might be commercial
room for a better solution.&lt;/p&gt;

&lt;p&gt;Which brings me to a rough product idea - a formalized version of this link
mill for people who want to take direct control of their information intake.
The business end is a generalized feed consumer, letting you subscribe to RSS
feeds, Twitter users, Google+ updates, sub-Reddits and other information
sources.  Links are extracted from these feeds, keeping track of which links
appeared where. The user is then presented with a stream of links to consume,
de-duplicated so that those appearing in multiple feeds are presented only
once. The system keeps track of links the user marks as "interesting", batching
them for later consumption. It also uses this information to score the feeds,
letting the user see which feeds are low quality, and should be ditched. Given
the right tools, the time needed for a user to maintain and tend their link
feed garden would be quite modest, and the rewards would be great.&lt;/p&gt;
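&lt;p&gt;To make the idea a little more concrete, here is a minimal, entirely hypothetical Python sketch of the bookkeeping such a tool would need. It remembers which feeds each link appeared in, presents each link only once, queues the ones marked interesting, and scores each feed by the fraction of its links that turned out to be interesting. The class, method and feed names are invented for illustration, and extracting links from real feeds is left out.&lt;/p&gt;

```python
from collections import defaultdict


class LinkMill:
    """Hypothetical bookkeeping core for a personal link mill."""

    def __init__(self):
        self.sources = defaultdict(list)    # link -> feeds it appeared in, in order seen
        self.feed_links = defaultdict(set)  # feed -> links it has contributed
        self.presented = set()              # links already shown to the user
        self.interesting = set()            # links the user marked as interesting
        self.reading_queue = []             # interesting links batched for later

    def ingest(self, feed, links):
        """Record links extracted from one feed; return only the new ones to show."""
        fresh = []
        for link in links:
            if feed not in self.sources[link]:
                self.sources[link].append(feed)
            self.feed_links[feed].add(link)
            if link not in self.presented:
                self.presented.add(link)
                fresh.append(link)
        return fresh

    def mark_interesting(self, link):
        """Queue a link for later reading (at most once)."""
        if link not in self.interesting:
            self.interesting.add(link)
            self.reading_queue.append(link)

    def feed_scores(self):
        """Fraction of each feed's links that the user found interesting."""
        return {
            feed: len(links.intersection(self.interesting)) / len(links)
            for feed, links in self.feed_links.items()
        }


mill = LinkMill()
mill.ingest("pinboard:alice", ["http://a.example/", "http://b.example/"])
mill.ingest("reddit:r/visualization", ["http://b.example/", "http://c.example/"])
mill.mark_interesting("http://b.example/")
print(mill.sources["http://b.example/"])  # both feeds, in the order they delivered it
print(mill.feed_scores())
```

&lt;p&gt;Because sources records every feed a link arrived through, in order, the question that started this post (where did I first see this?) becomes a simple lookup.&lt;/p&gt;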

&lt;p&gt;If someone built this, I for one would gladly fork over some of my hard-earned
doubloons to use it. In fact, with some validation of the idea and a few
collaborators I might think of building it myself. Does this sound useful to
anyone else?&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/socialmedia/linkmill/index.html</guid><pubDate>Fri, 30 Dec 2011 15:42:00 GMT</pubDate></item><item><title>Visualizing binaries with space-filling curves</title><link>http://corte.si/posts/visualisation/binvis/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/visualisation/binvis/index.html"&gt;Visualizing binaries with space-filling curves&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;23 December 2011&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;In my day job I often come across binary files with unknown content. I have a
set of standard avenues of attack when I confront such a beast - use "file" to
see if it's a known file type, "strings" to see if there's readable text, run
some in-house code to extract compressed sections, and, of course, fire up a
hex editor to take a direct look. There's something missing in that list,
though - I have no way to get a quick view of the overall structure of the
file.  Using a hex editor for this is not much chop - if the first section of
the file looks random (i.e. probably compressed or encrypted), who's to say
that there isn't a chunk of non-random information a meg further down?
Ideally, we want to do this type of broad pattern-finding by eye, so a
visualization seems to be in order.&lt;/p&gt;

&lt;p&gt;Let's begin by picking a colour scheme. We have 256 different byte
values, but for a first-pass look at a file, we can compress that down into a
few common classes:&lt;/p&gt;

&lt;table&gt;
    &lt;tr&gt;
        &lt;td style="background-color: #000000"&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td&gt;0x00&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td style="background-color: #ffffff"&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td&gt;0xFF&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td style="background-color: #377eb8"&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td&gt;Printable characters&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td style="background-color: #e41a1c"&gt;&amp;nbsp;&lt;/td&gt;
        &lt;td&gt;Everything else&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;
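&lt;p&gt;In code, this classification is just a handful of comparisons. In the minimal Python sketch below, "printable" is taken to mean the ASCII characters in Python's string.printable (letters, digits, punctuation and whitespace). That boundary is an assumption, and the original tool may draw it differently; the function names are also illustrative.&lt;/p&gt;

```python
import string

# Colours from the table above, keyed by byte class.
COLOURS = {
    "zero": "#000000",       # 0x00
    "ff": "#ffffff",         # 0xFF
    "printable": "#377eb8",  # printable characters
    "other": "#e41a1c",      # everything else
}

# Assumption: "printable" means the bytes of Python's string.printable.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def byte_class(b):
    """Classify a single byte value (0-255) into one of the four classes."""
    if b == 0x00:
        return "zero"
    if b == 0xFF:
        return "ff"
    if b in PRINTABLE:
        return "printable"
    return "other"


def byte_colour(b):
    """Return the hex colour for a byte value."""
    return COLOURS[byte_class(b)]
```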

&lt;p&gt;This covers the most common padding bytes, nicely highlights strings, and lumps
everything else into a miscellaneous bucket. The broad outline of what we need
to do next is clear - we sample the file at regular intervals, translate each
sampled byte to a colour, and write the corresponding pixel to our image. This
brings us to the big question - what's the best way to arrange the pixels? A
first stab might be to lay the pixels out row by row, snaking to and fro to
make sure each pixel is always adjacent to its predecessor. It turns out,
however, that this zigzag pattern is not very satisfying - small-scale
features (i.e. features that take up only a few lines) tend to get lost.&lt;/p&gt;

&lt;p&gt;What we want is a layout that maps our one-dimensional sequence of samples onto the
2-d image, while keeping elements that are close together in one dimension as
near as possible to each other in two dimensions. This is called "locality
preservation", and &lt;a href="http://en.wikipedia.org/wiki/Space-filling_curve"&gt;space-filling
curves&lt;/a&gt; are a family of
mathematical constructs that have precisely this property. If you're a regular
reader of this blog, you may know that I have an
&lt;a href="http://corte.si/posts/code/hilbert/portrait/index.html"&gt;almost&lt;/a&gt;
&lt;a href="http://corte.si/posts/code/sortvis-fruitsalad/index.html"&gt;unseemly&lt;/a&gt;
&lt;a href="http://corte.si/posts/code/hilbert/swatches/index.html"&gt;fondness&lt;/a&gt; for these critters. So,
let's add a couple of space-filling curves to the mix to see how they stack up.
The &lt;a href="http://en.wikipedia.org/wiki/Z-order_curve"&gt;Z-order curve&lt;/a&gt; has found wide
practical use in computer science. It's not the best in terms of locality
preservation, but it's quick and easy to compute - a point's position along the
curve is just its x and y coordinates with their bits interleaved. The &lt;a href="http://en.wikipedia.org/wiki/Hilbert_curve"&gt;Hilbert
curve&lt;/a&gt;, on the other hand, is
(nearly) as good as it gets at locality preservation, but is much more
complicated to generate. Here's what our three candidate curves look like - in
each case, the traversal starts in the top-left corner:&lt;/p&gt;

&lt;table class="spacertable"&gt;
    &lt;tr&gt;
        &lt;td&gt;Zigzag&lt;/td&gt;
        &lt;td&gt;Z-order&lt;/td&gt;
        &lt;td&gt;Hilbert&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;&lt;img src="http://corte.si/posts/visualisation/binvis/zigzag.png"/&gt;&lt;/td&gt;
        &lt;td&gt;&lt;img src="http://corte.si/posts/visualisation/binvis/zorder.png"/&gt;&lt;/td&gt;
        &lt;td&gt;&lt;img src="http://corte.si/posts/visualisation/binvis/hilbert.png"/&gt;&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;
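&lt;p&gt;The three layouts can be sketched as functions mapping a sample index d to pixel coordinates (x, y), with y growing downwards. This is an illustrative Python sketch, not the scurve source. The Hilbert function follows the well-known iterative index-to-coordinate conversion given on Wikipedia's Hilbert curve page, the Z-order function de-interleaves the bits of d, and all names are hypothetical.&lt;/p&gt;

```python
def zigzag_d2xy(d, width):
    """Row-by-row layout that reverses direction on every other row."""
    y, x = divmod(d, width)
    if y % 2 == 1:
        x = width - 1 - x
    return x, y


def zorder_d2xy(d, order):
    """Z-order (Morton) layout on a 2**order square: de-interleave d's bits."""
    x = y = 0
    for i in range(order):
        x += (d % 2) * 2 ** i  # even bits of d go to x
        d //= 2
        y += (d % 2) * 2 ** i  # odd bits of d go to y
        d //= 2
    return x, y


def _rotate(s, x, y, rx, ry):
    """Flip/transpose a quadrant of side s so the sub-curve joins up."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(d, order):
    """Hilbert layout on a 2**order square, from (0, 0) to (2**order - 1, 0)."""
    x = y = 0
    t = d
    for i in range(order):
        s = 2 ** i
        rx = (t // 2) % 2
        ry = (t ^ rx) % 2
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
    return x, y
```

&lt;p&gt;Stepping d upwards, consecutive zigzag and Hilbert positions are always neighbouring pixels, while the Z-order curve jumps: zorder_d2xy(7, 2) is (3, 1) but zorder_d2xy(8, 2) is (0, 2), on the far side of the square.&lt;/p&gt;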

&lt;p&gt;And here they are, visualizing the
&lt;a href="http://en.wikipedia.org/wiki/Korn_shell"&gt;ksh&lt;/a&gt;
(&lt;a href="http://en.wikipedia.org/wiki/Mach-O"&gt;Mach-O&lt;/a&gt;,
&lt;a href="http://en.wikipedia.org/wiki/Fat_binary"&gt;dual-architecture&lt;/a&gt;) binary
distributed with OS X. Click on the images for larger, significantly more
spectacular versions:&lt;/p&gt;

&lt;table class="spacertable"&gt;
    &lt;tr&gt;
        &lt;td&gt;Zigzag&lt;/td&gt;
        &lt;td&gt;Z-order&lt;/td&gt;
        &lt;td&gt;Hilbert&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;
            &lt;a href="http://corte.si/posts/visualisation/binvis/binary-large-zigzag.png"&gt;
                &lt;img src="http://corte.si/posts/visualisation/binvis/binary-zigzag.png"/&gt;
            &lt;/a&gt;
        &lt;/td&gt;
        &lt;td&gt;
            &lt;a href="http://corte.si/posts/visualisation/binvis/binary-large-zorder.png"&gt;
                &lt;img src="http://corte.si/posts/visualisation/binvis/binary-zorder.png"/&gt;
            &lt;/a&gt;
        &lt;/td&gt;
        &lt;td&gt;
            &lt;a href="http://corte.si/posts/visualisation/binvis/binary-large-hilbert.png"&gt;
                &lt;img src="http://corte.si/posts/visualisation/binvis/binary-hilbert.png"/&gt;
            &lt;/a&gt;
        &lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;The classical Hilbert and Z-order curves are square, so for these
visualizations I've unrolled them, stacking four sub-curves on top of each
other. To my eye, the Hilbert curve is the clear winner here. Local features
are prominent because they are nicely clumped together. The Z-order curve shows
some annoying artifacts, with contiguous chunks of data sometimes split between
two or more visual blocks.&lt;/p&gt;
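&lt;p&gt;Here is one way to do the unrolling, sketched in Python under a few assumptions. Each square is filled with a Hilbert curve transposed to run from its top-left to its bottom-left corner, which puts the end of one square directly above the start of the next. Pixel d samples the byte at offset d * len(data) // npixels. binvis may orient and sample its sub-curves differently, and the function names are hypothetical.&lt;/p&gt;

```python
def hilbert_d2xy(d, order):
    """Hilbert curve on a 2**order square, from (0, 0) to (2**order - 1, 0)."""
    x = y = 0
    t = d
    for i in range(order):
        s = 2 ** i
        rx = (t // 2) % 2
        ry = (t ^ rx) % 2
        if ry == 0:  # flip/transpose the quadrant so the sub-curve joins up
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
    return x, y


def stacked_d2xy(d, order):
    """Unrolled layout: square Hilbert sub-curves stacked vertically.

    Each square is transposed so it runs from its top-left corner to its
    bottom-left corner, keeping the path continuous between squares.
    """
    n = 2 ** order
    block, local = divmod(d, n * n)
    cx, cy = hilbert_d2xy(local, order)
    return cy, cx + block * n


def render(data, order, stacks=4):
    """Return an (n * stacks)-row by n-column grid of sampled bytes, grid[y][x]."""
    n = 2 ** order
    npixels = n * n * stacks
    grid = [[None] * n for _ in range(n * stacks)]
    for d in range(npixels):
        x, y = stacked_d2xy(d, order)
        # Sample the file at regular intervals (assumes data is non-empty).
        grid[y][x] = data[d * len(data) // npixels]
    return grid
```

&lt;p&gt;render returns raw byte values. Passing each one through a byte-to-colour mapping such as the class table above gives the final image.&lt;/p&gt;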

&lt;p&gt;The downside of the space-filling curve visualizations is that we can't look at
a feature in the image and tell where, exactly, it can be found in the file.
I'm toying with the idea (though not very seriously) of writing an interactive
binary file viewer with a space-filling curve navigation pane. This would let
the user click on or hover over a patch of structure and see the file offset
and the corresponding hex. &lt;/p&gt;
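&lt;p&gt;The reverse lookup such a viewer would need is simple in principle. The sketch below assumes four Hilbert squares stacked vertically, each transposed to run from its top-left to its bottom-left corner, with pixel d sampling the byte at offset d * file_size // pixels. Under those assumptions, hilbert_xy2d (a Python port of the well-known iterative coordinate-to-index conversion) recovers the position along the curve, and the file offset follows directly. All names are hypothetical.&lt;/p&gt;

```python
# Quadrant bits (rx, ry) -> order in which the curve visits that quadrant.
QUADRANT = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}


def hilbert_xy2d(x, y, order):
    """Inverse Hilbert mapping on a 2**order square: (x, y) -> index d."""
    n = 2 ** order
    d = 0
    for i in reversed(range(order)):
        s = 2 ** i
        rx = (x // s) % 2
        ry = (y // s) % 2
        d += s * s * QUADRANT[(rx, ry)]
        if ry == 0:  # undo the quadrant flip/transpose
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
    return d


def pixel_to_offset(x, y, order, file_size, stacks=4):
    """File offset sampled at pixel (x, y) of the stacked, transposed layout."""
    n = 2 ** order
    block, cx = divmod(y, n)  # which square, and the row within it
    d = block * n * n + hilbert_xy2d(cx, x, order)
    return d * file_size // (n * n * stacks)
```

&lt;p&gt;For an order-2 image (4 pixels wide, 16 high) of a 64-byte file, pixel_to_offset(0, 4, 2, 64) returns 16, the first byte of the second square. A viewer could then show the hex dump starting at that offset.&lt;/p&gt;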

&lt;h1&gt;More detail&lt;/h1&gt;

&lt;p&gt;We can get more detail in these images by increasing the granularity of the
colour mapping. One way to do this is to use a trick I first concocted to
&lt;a href="http://corte.si/posts/code/hilbert/portrait/index.html"&gt;visualize the Hilbert curve at
scale&lt;/a&gt;. The basic idea is to use a
3-d Hilbert curve traversal of the RGB colour cube to create a palette of
colours. Because the Hilbert curve preserves locality, byte values that are
close together get colours that are close together in the cube, so similar
bytes look similar in the
visualization. See the &lt;a href="http://corte.si/posts/code/hilbert/portrait/index.html"&gt;original
post&lt;/a&gt; for more.&lt;/p&gt;
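&lt;p&gt;Here is a rough Python sketch of such a palette: walk the RGB cube along a 3-d Hilbert curve and take 256 evenly spaced points, one per byte value. The curve uses John Skilling's "transpose" algorithm, which is not necessarily how scurve computes it. The even spacing is an assumption about how the palette is sampled, and the names are hypothetical.&lt;/p&gt;

```python
def hilbert_point(h, bits, dims=3):
    """Map index h to a point on a dims-dimensional Hilbert curve.

    Follows John Skilling's transpose algorithm ("Programming the Hilbert
    curve", 2004). Coordinates lie in range(2 ** bits).
    """
    total = bits * dims
    x = [0] * dims
    # Spread the bits of h across the axes, most significant bit first.
    for k in range(total):
        bit = (h // 2 ** (total - 1 - k)) % 2
        x[k % dims] += bit * 2 ** (bits - 1 - k // dims)
    # Gray decode.
    t = x[dims - 1] // 2
    for i in range(dims - 1, 0, -1):
        x[i] ^= x[i - 1]
    x[0] ^= t
    # Undo the excess rotations and reflections, level by level.
    q = 2
    while q != 2 ** bits:
        p = q - 1
        for i in range(dims - 1, -1, -1):
            if (x[i] // q) % 2:
                x[0] ^= p  # invert the low bits of axis 0
            else:
                t = (x[0] ^ x[i]) % q  # exchange the low bits
                x[0] ^= t
                x[i] ^= t
        q *= 2
    return tuple(x)


def hilbert_palette(colours=256, bits=8):
    """Pick evenly spaced (R, G, B) points along a Hilbert walk of the cube."""
    length = 2 ** (bits * 3)
    return [hilbert_point(i * length // colours, bits) for i in range(colours)]


palette = hilbert_palette()
```

&lt;p&gt;palette[b] is then the colour for byte value b. Neighbouring entries are neighbours along the curve, so byte values that are numerically close tend to get similar colours.&lt;/p&gt;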

&lt;p&gt;So, here's a Hilbert curve mapping of a binary file, using a Hilbert-order
traversal of the RGB cube as a colour palette. Again, click on the image for
the much nicer large scale version:&lt;/p&gt;

&lt;p&gt;&lt;center&gt;
    &lt;a href="http://corte.si/posts/visualisation/binvis/hilbert-hilbert-large.png"&gt;
        &lt;img src="http://corte.si/posts/visualisation/binvis/hilbert-hilbert.png"/&gt;
    &lt;/a&gt;
&lt;/center&gt;&lt;/p&gt;

&lt;p&gt;This shows significantly more fine-grained structure, which might be good for a
deep dive into a binary. On the other hand, the colours don't map cleanly to
distinct byte classes, so the image is harder to interpret. An ideal hex viewer
would let you flick between the two palettes for navigation. &lt;/p&gt;

&lt;h1&gt;The code&lt;/h1&gt;

&lt;p&gt;As usual, I'm publishing the code for generating all of the images in this
post. The binary visualizations were created with
&lt;a href="https://github.com/cortesi/scurve/blob/master/binvis"&gt;binvis&lt;/a&gt;, which is a new
addition to &lt;a href="https://github.com/cortesi/scurve"&gt;scurve&lt;/a&gt;, my space-filling curve
project. The curve diagrams were made with the "drawcurve" utility to be found
in the same place.&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/visualisation/binvis/index.html</guid><pubDate>Fri, 23 Dec 2011 06:02:00 GMT</pubDate></item><item><title>netograph.com - Realtime privacy snapshots of the social web</title><link>http://corte.si/posts/netograph/launch/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/netograph/launch/index.html"&gt;netograph.com - Realtime privacy snapshots of the social web&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;08 December 2011&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;Today, I'm launching &lt;a href="http://netograph.com"&gt;Netograph&lt;/a&gt;, a new privacy-related
site that I've been hacking on over the past few months. The goal of the
project is to provide you with a quick overview of the privacy picture for a
URL, &lt;strong&gt;before&lt;/strong&gt; you've clicked on the link. At the moment, Netograph scans
&lt;a href="http://reddit.com"&gt;Reddit&lt;/a&gt;, &lt;a href="http://news.ycombinator.com"&gt;Hacker News&lt;/a&gt;,
&lt;a href="http://pinboard.in"&gt;Pinboard&lt;/a&gt;, &lt;a href="http://delicious.com"&gt;Delicious&lt;/a&gt; and
&lt;a href="http://digg.com"&gt;Digg&lt;/a&gt; - links on these sites should show up within a few
minutes of submission.&lt;/p&gt;

&lt;p&gt;For more details, head over to &lt;a href="http://netograph.com"&gt;netograph.com&lt;/a&gt;. There you
will also find
&lt;a href="https://addons.mozilla.org/en-US/firefox/addon/netograph/"&gt;Firefox&lt;/a&gt; and
&lt;a href="https://chrome.google.com/webstore/detail/bfhmbldbigkpniinkmckafbgcajcbaai"&gt;Chrome&lt;/a&gt;
browser addons that let you view the Netograph report for a URL instantly with
a right-click. Enjoy!&lt;/p&gt;

&lt;table class="spacertable"&gt;
    &lt;tr&gt;

        &lt;td&gt;
            &lt;a href="http://netograph.com/starmap/1740"&gt;
                &lt;img src="http://corte.si/posts/netograph/launch/ng-guardian.png"&gt;
                guardian.co.uk
            &lt;/a&gt;
        &lt;/td&gt;

        &lt;td&gt;
            &lt;a href="http://netograph.com/starmap/2512"&gt;
                &lt;img src="http://corte.si/posts/netograph/launch/ng-techcrunch.png"&gt;
                techcrunch.com
            &lt;/a&gt;
        &lt;/td&gt;

        &lt;td&gt;
            &lt;a href="http://netograph.com/starmap/2457"&gt;
                &lt;img src="http://corte.si/posts/netograph/launch/ng-reddit.png"&gt;
                reddit.com
            &lt;/a&gt;
        &lt;/td&gt;

    &lt;/tr&gt;
&lt;/table&gt;

&lt;h2&gt;What's next?&lt;/h2&gt;

&lt;p&gt;This is just the first step. As I hinted in a &lt;a href="http://corte.si/posts/privacy/neighbourhoods-of-trust/index.html"&gt;previous
post&lt;/a&gt;, the most interesting
results from Netograph are likely to come from aggregating and
cross-correlating the data for individual URLs. I'm already hard at work on
this - the next iteration of Netograph will aim to shine some light on the
sometimes shadowy network of third-parties that track and analyze nearly every
URL we visit. I will also be publishing some interesting tidbits from this data
corpus on my blog as I go along, so watch this space.&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/netograph/launch/index.html</guid><pubDate>Thu, 08 Dec 2011 05:39:00 GMT</pubDate></item><item><title>Otago Polytechnic Talk</title><link>http://corte.si/posts/talks/polytech.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/talks/polytech.html"&gt;Otago Polytechnic Talk&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;31 October 2011&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;Further reading for the guest lecture I'm giving at Otago Polytechnic today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The talk I'm not giving: &lt;a href="https://www.owasp.org/index.php/Top_10_2010-Main"&gt;OWASP Top 10&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tools: &lt;a href="http://getfirebug.com/"&gt;FireBug&lt;/a&gt;, &lt;a href="https://addons.mozilla.org/en-US/firefox/addon/tamper-data/"&gt;TamperData&lt;/a&gt;, &lt;a href="http://python.org"&gt;Python&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;a href="http://en.wikipedia.org/wiki/Samy_(XSS)"&gt;Myspace Worm&lt;/a&gt;, and Samy
Kamkar's &lt;a href="http://namb.la/popular/tech.html"&gt;own explanation of the exploit&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Halvar Flake's &lt;a href="http://www.immunityinc.com/infiltrate/2011/presentations/Fundamentals_of_exploitation_revisited.pdf"&gt;Programming and state machines&lt;/a&gt;, which is where I first saw the term "programming the weird machine".&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/talks/polytech.html</guid><pubDate>Mon, 31 Oct 2011 09:05:00 GMT</pubDate></item><item><title>Neighborhoods of trust on the web</title><link>http://corte.si/posts/privacy/neighbourhoods-of-trust/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/privacy/neighbourhoods-of-trust/index.html"&gt;Neighborhoods of trust on the web&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;27 September 2011&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;For the last fortnight I've been hard at work on a new project that aims to
examine trust and security on the web at scale. The basic idea is to use a
browser instance to render a URL, and then to extract all persistent state with
browser forensic techniques afterwards. This gives you a dump of cookies, cache
contents, Flash storage, HTML5 databases, and so on. At the same time, all
traffic is routed through a specialised version of
&lt;a href="http://mitmproxy.org"&gt;mitmproxy&lt;/a&gt;, and captured for later analysis. The result
is a very detailed snapshot of what viewing a given URL actually &lt;em&gt;does&lt;/em&gt;. The
next step is to do this "at scale" - this means running many instances of this
process in parallel on headless servers, decoupling things using queues,
backing it all onto a database, and then spending days and days fine-tuning.
I'm happy with my progress so far - my infrastructure is now scanning all
the URLs passing through &lt;a href="http://news.ycombinator.com"&gt;Hacker News&lt;/a&gt;,
&lt;a href="http://reddit.com"&gt;Reddit&lt;/a&gt;, &lt;a href="http://digg.com"&gt;Digg&lt;/a&gt;,
&lt;a href="http://delicious.com"&gt;Delicious&lt;/a&gt; and &lt;a href="http://pinboard.in"&gt;Pinboard&lt;/a&gt; in
realtime, without breaking a sweat.&lt;/p&gt;

&lt;p&gt;I am pretty excited about the possibilities for this project, and I'm exploring
plans for the future with like-minded security folk. Get in touch if this
interests you, and keep an eye on my blog for more news.&lt;/p&gt;

&lt;p&gt;After my pilot run, I had 150 gigs of data covering about 120 thousand URLs.
Below is a quick peek at one tiny slice of this data - an appetizer for things
to come.&lt;/p&gt;

&lt;h1&gt;Neighborhoods of trust&lt;/h1&gt;

&lt;p&gt;&lt;a href="http://corte.si/posts/privacy/neighbourhoods-of-trust/images/full.png"&gt;
    &lt;img src="http://corte.si/posts/privacy/neighbourhoods-of-trust/images/wholegraph.png"/&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This graph shows structures that emerge from the way sites use third-party
executable resources. In this context, "executable" means JavaScript,
Flash and HTML, and "third-party" means domains other than the URL's own. The
nodes in this graph are the third-party domains, and the edges are associations
between them via the URLs I crawled. For example, if a site loaded scripts from
both Google Analytics and from Doubleclick, that would create (or reinforce) an
edge between the nodes "google-analytics.com" and "doubleclick.com".  Using
this data, I calculated a co-occurrence coefficient for the third-party
sources, and then extracted the resulting neighbourhood structures
&lt;a href="http://lanl.arxiv.org/abs/0803.0476"&gt;algorithmically&lt;/a&gt;. The neighbourhood
information was used to colour and lay out the graph, trying to keep nodes that
are closely correlated together. Finally, nodes are scaled based on how many
URLs reference them.&lt;/p&gt;
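&lt;p&gt;Here is a minimal Python sketch of the edge-weight step. It assumes the crawl has already been reduced to a mapping from each URL to the set of third-party domains that served it executable content. The post doesn't say which co-occurrence coefficient was used, so this sketch assumes the Jaccard coefficient (URLs loading both domains divided by URLs loading either). The per-domain counts double as node sizes. The function name and the sample data are made up.&lt;/p&gt;

```python
from collections import Counter
from itertools import combinations


def cooccurrence(pages):
    """Compute node sizes and edge weights for the third-party graph.

    pages maps each crawled URL to the set of third-party domains it loaded
    executable resources from. Returns (node_counts, edge_weights), where
    edge_weights[(a, b)] is the Jaccard coefficient of domains a and b.
    """
    node_counts = Counter()  # URLs referencing each domain (node size)
    pair_counts = Counter()  # URLs referencing both domains of a pair
    for domains in pages.values():
        node_counts.update(domains)
        pair_counts.update(combinations(sorted(domains), 2))
    edge_weights = {
        (a, b): both / (node_counts[a] + node_counts[b] - both)
        for (a, b), both in pair_counts.items()
    }
    return node_counts, edge_weights


# Made-up example crawl.
pages = {
    "http://example.com/a": {"google-analytics.com", "doubleclick.net"},
    "http://example.org/b": {"google-analytics.com"},
    "http://example.net/c": {"google-analytics.com", "doubleclick.net", "facebook.com"},
}
sizes, weights = cooccurrence(pages)
```

&lt;p&gt;The weighted edges could then be handed to a community-detection algorithm (the linked paper describes the Louvain method) and to a layout tool.&lt;/p&gt;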

&lt;p&gt;The result is a rather stunning graph showing neighborhoods of trust - areas of
the Internet bound together based on the third parties allowed to run code in
users' browsers. I've spent a few hours playing with this data, and the sheer
range of interesting structure is surprising. At one end of the spectrum, you
can zoom in to the individual node relationships, and find small clusters of
unexpected sites that cross-load resources from each other, often because they
are owned by the same entity. At the other end, countries, language groups, and
broad fields of interest aggregate in huge tribes of kinship.&lt;/p&gt;

&lt;p&gt;Here are a few of the larger-scale features from the graph: &lt;/p&gt;

&lt;table class="layouttable"&gt;
    &lt;tr&gt;
        &lt;td&gt;
            &lt;img style="float: left" src="http://corte.si/posts/privacy/neighbourhoods-of-trust/images/wholegraph-b.png"/&gt;
        &lt;/td&gt;
        &lt;td&gt;

                &lt;h2&gt;Mainstream&lt;/h2&gt;

                The most widely used resources dominate in the neighbourhood
                extraction algorithm, which causes them to cluster together in
                their own super-community. The top nodes in this cluster,
                in descending order of occurrence, are: google-analytics.com,
                facebook.com, doubleclick.net, fbcdn.net, quantserve.com,
                twitter.com, google.com, googlesyndication.com, googleapis.com,
                scorecardresearch.net, facebook.net, addthis.com. These are
                also the top nodes overall.
        &lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;td&gt;
            &lt;img style="clear: left; float: left;" src="http://corte.si/posts/privacy/neighbourhoods-of-trust/images/wholegraph-a.png"/&gt;
        &lt;/td&gt;
        &lt;td&gt;

            &lt;h2&gt;Japanese&lt;/h2&gt;

            The main resources are hatena.ne.jp, microad.jp, mixi.jp,
            yahoo.co.jp, nakanohito.jp. More surprisingly, also in this cluster
            are topsy.com, appspot.com and postrank.com. Perhaps these
            resources are especially commonly used on Japanese sites. 

        &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;

            &lt;img src="http://corte.si/posts/privacy/neighbourhoods-of-trust/images/wholegraph-d.png"/&gt;

        &lt;/td&gt;
        &lt;td&gt;

            &lt;h2&gt;Russian&lt;/h2&gt;

            Top resources are yadro.ru, yandex.ru, rambler.ru, vkontakte.ru,
            openstat.net, userapi.com, shinystat.net, and dt00.net.

        &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;

            &lt;img src="http://corte.si/posts/privacy/neighbourhoods-of-trust/images/wholegraph-c.png"/&gt;

        &lt;/td&gt;
        &lt;td&gt;

            &lt;h2&gt;Porn&lt;/h2&gt;

            And here we have a portion of the web dedicated to pron. The top
            resources are awempire.com, clickbank.net, picadmedia.com,
            getresponse.com, adultfriendfinder.com, adultadword.com, phcdn.com,
            juicyads.com, brazzers.com, etology.com, data-ero-advertising.com
            and viddler.com. A more surprising inclusion in this group is
            wufoo.com - I wonder if this is an artifact, or whether Wufoo
            really does have a use in the adult content world.  

        &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;

            &lt;img src="http://corte.si/posts/privacy/neighbourhoods-of-trust/images/wholegraph-e.png"/&gt;

        &lt;/td&gt;
        &lt;td&gt;

            &lt;h2&gt;Misc&lt;/h2&gt;

            Just to show that it's not all clear-cut, here's an example of a
            neighbourhood I find harder to explain. The top resources are
            netdna-cdn.com, amgdgt.com, trafficmp.com, ooyala.com,
            suitesmart.com, demdex.net, adfrontiers.com, lycos.com and
            break.com. I speculate that this group might be loosely aligned
            around a number of big CDNs and analysis suites.

        &lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;The graph in this post was created, analyzed and pre-processed using
&lt;a href="http://projects.skewed.de/graph-tool/"&gt;graph-tool&lt;/a&gt;, a great Python library for
dealing with large graphs. The visualization and modularity analysis were done
using the ever-wonderful &lt;a href="http://gephi.org/"&gt;Gephi&lt;/a&gt;. If these aren't both in
your arsenal of analysis tools, you're missing out.&lt;/p&gt;

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/privacy/neighbourhoods-of-trust/index.html</guid><pubDate>Tue, 27 Sep 2011 23:23:00 GMT</pubDate></item><item><title>Why the Apple UDID had to die</title><link>http://corte.si/posts/security/udid-must-die/index.html</link><description>&lt;div class="post"&gt;
    &lt;div class="posthead"&gt;
        &lt;h1&gt;&lt;a href="http://corte.si/posts/security/udid-must-die/index.html"&gt;Why the Apple UDID had to die&lt;/a&gt;&lt;/h1&gt;
        &lt;span class="postdate"&gt;09 September 2011&lt;/span&gt;
    &lt;/div&gt;
    &lt;div class="postbody"&gt;
        &lt;p&gt;&lt;strong&gt;EDIT: A &lt;a href="http://blogs.wsj.com/digits/2011/09/19/privacy-risk-found-on-cellphone-games/"&gt;WSJ Digits
article&lt;/a&gt;
is now up, containing responses from Zynga and Chillingo. Other networks
declined to comment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A UDID is a "Unique Device Identifier" - you can think of it as a serial number
burned permanently into every iPhone, iPad and iPod Touch. Any installed app
can access the UDID without requiring the user's knowledge or consent.  We know
that UDIDs are very widely used - in a sample of 94 apps I tested, &lt;a href="http://corte.si/posts/security/apple-udid-survey/index.html"&gt;74%
silently sent the UDID to one or more servers on the
Internet&lt;/a&gt;, often without encryption.
This means that UDIDs are not secret values - if you use an Apple device
regularly, it's certain that your UDID has found its way into scores of
databases you're entirely unaware of. Developers often assume UDIDs are
anonymous values, and routinely use them to aggregate detailed and sensitive
user behavioural information. One example is Flurry, a mobile analytics firm
used by 15% of apps I tested, which can monitor application startup, shutdown,
scores achieved, and a host of other application-specific events, all linked to
the user's UDID. I recently showed that it was possible to use
&lt;a href="http://en.wikipedia.org/wiki/OpenFeint"&gt;OpenFeint&lt;/a&gt;, a large mobile social
gaming network, to &lt;a href="http://corte.si/posts/security/openfeint-udid-deanonymization/index.html"&gt;de-anonymize
UDIDs&lt;/a&gt;, linking them to
usernames, email addresses, GPS locations, and even Facebook profiles.&lt;/p&gt;

&lt;p&gt;This post looks at the way UDIDs are used in the broader social gaming
ecosystem. The work is based on a simple question: what happens if we swap our
UDID for another while communicating with the network?  There are a number of
ways to do this - in my case I used &lt;a href="http://mitmproxy.org"&gt;mitmproxy&lt;/a&gt;, an
intercepting HTTP/S proxy I developed which lets me re-write the traffic
leaving a device on the fly. In most cases this was a simple matter of
replacing one string with another, but two networks (Scoreloop and Crystal)
prevented UDID substitution using cryptography. Unfortunately, both networks
relied on the secrecy of key material distributed in the application binaries
to every device. I have verified that it is possible to reverse engineer the
application binaries to extract the key material and circumvent the
cryptographic protection.&lt;/p&gt;
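&lt;p&gt;The Scoreloop and Crystal schemes aren't described in detail here, so the Python sketch below is a generic, hypothetical illustration of the weakness rather than either network's actual protocol. If requests are signed with an HMAC whose key ships inside every copy of the app, anyone who extracts that key can produce a valid signature for any UDID. APP_KEY, the message format and the sample UDID are all made up.&lt;/p&gt;

```python
import hashlib
import hmac

# Hypothetical secret compiled into every copy of the app binary.
APP_KEY = b"compiled-into-every-copy-of-the-app"


def sign(udid, key):
    """Signature the client sends alongside its UDID (hypothetical format)."""
    return hmac.new(key, udid.encode("ascii"), hashlib.sha256).hexdigest()


def server_accepts(udid, signature):
    """Server-side check: is this a genuine signature for this UDID?"""
    return hmac.compare_digest(sign(udid, APP_KEY), signature)


victim_udid = "0123456789abcdef0123456789abcdef01234567"  # made-up UDID

# Naively swapping in another UDID without the key fails...
assert not server_accepts(victim_udid, sign(victim_udid, b"wrong key"))

# ...but a key recovered from the binary signs any UDID the attacker likes.
extracted_key = APP_KEY
assert server_accepts(victim_udid, sign(victim_udid, extracted_key))
```

&lt;p&gt;The signature only proves that the sender has the app's key, and every installed copy of the app has it.&lt;/p&gt;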

&lt;p&gt;The outcome of this experiment shows that social gaming networks systematically
misuse UDIDs, resulting in serious privacy breaches for their users. All the
networks I tested allowed UDIDs to be linked to potentially identifying user
information, ranging from usernames to email addresses, friends lists and
private messages. Furthermore, 5 of the 7 networks allow an attacker to log in
as a user using only their UDID, giving the attacker complete control of the
user's account. Two networks had further problems that compromised a user's
Facebook and Twitter accounts - Crystal lets an attacker take control of a user's
accounts by leaking API keys, while Scoreloop partially discloses users'
friends lists, even if they are private. &lt;/p&gt;

&lt;style&gt;
    .yes {
        background-color: #d55858;
        color: #000000;
    }
    .no {
        background-color: #5bd65b;
        color: #000000;
    }
&lt;/style&gt;

&lt;table class="table table-bordered"&gt;

    &lt;tr&gt;
        &lt;th&gt;&lt;/th&gt;
        &lt;th&gt;Data leaked&lt;/th&gt;
        &lt;th&gt;Log in as user&lt;/th&gt;
        &lt;th&gt;Social Media Accounts&lt;/th&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th&gt;&lt;a href="http://www.chillingo.com/"&gt;Crystal&lt;/a&gt;&lt;/th&gt;
        &lt;td class="yes"&gt; Username, friends, Facebook, Twitter, games played, location, email address &lt;/td&gt;
        &lt;td class="yes"&gt; Yes &lt;/td&gt;
        &lt;td class="yes"&gt; Control of Facebook, Twitter accounts&lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th&gt;&lt;a href="http://www.gameloft.com/"&gt;GameLoft&lt;/a&gt;&lt;/th&gt;
        &lt;td class="yes"&gt; Username, email address, games played, nationality, friends &lt;/td&gt;
        &lt;td class="yes"&gt; Yes &lt;/td&gt;
        &lt;td class="no"&gt; No &lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th&gt;&lt;a href="http://www.geocade.com/"&gt;Geocade&lt;/a&gt;&lt;/th&gt;
        &lt;td class="yes"&gt; Username, email address, games played, location &lt;/td&gt;
        &lt;td class="yes"&gt; Yes &lt;/td&gt;
        &lt;td class="no"&gt; No &lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th&gt;&lt;a href="http://openfeint.com/"&gt;OpenFeint&lt;/a&gt;&lt;/th&gt;
        &lt;td class="yes"&gt; Username, last played game, online status, friends &lt;/td&gt;
        &lt;td class="yes"&gt; Yes &lt;/td&gt;
        &lt;td class="no"&gt; No &lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th&gt;&lt;a href="http://www.scoreloop.com/"&gt;Scoreloop&lt;/a&gt;&lt;/th&gt;
        &lt;td class="yes"&gt; Email address, gender, username, nationality, friends &lt;/td&gt;
        &lt;td class="yes"&gt; Yes &lt;/td&gt;
        &lt;td class="yes"&gt; Access private Facebook and Twitter friends lists &lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th&gt;&lt;a href="http://plusplus.com/"&gt;Plus+&lt;/a&gt;&lt;/th&gt;
        &lt;td class="yes"&gt; Username &lt;/td&gt;
        &lt;td class="no"&gt; No &lt;/td&gt;
        &lt;td class="no"&gt; No &lt;/td&gt;
    &lt;/tr&gt;

    &lt;tr&gt;
        &lt;th&gt;&lt;a href="http://www.zynga.com/"&gt;Zynga&lt;/a&gt;&lt;/th&gt;
        &lt;td class="yes"&gt; First name, username, friends*, in-game messages*,
        mobile number* &lt;/td&gt;
        &lt;td class="yes"&gt; Yes* &lt;/td&gt;
        &lt;td class="no"&gt; No &lt;/td&gt;
    &lt;/tr&gt;

&lt;/table&gt;

&lt;p&gt;* The starred Zynga findings rely on the fact that other networks can be used
to obtain the user's email address using the UDID. &lt;/p&gt;

&lt;p&gt;There are two caveats to keep in mind while considering these results. First,
the findings are based on the default settings for each social network - some
networks may have settings that reduce the amount of information exposed.
Second, some of the data leaked is optional - for instance, it's not mandatory
for a user to link Facebook or Twitter accounts with any of the networks. &lt;/p&gt;

&lt;p&gt;All the affected companies and Apple were notified 5 weeks ago. The Crystal and
Scoreloop teams have both repaired the problems that could lead to a follow-on
compromise of a user's social network accounts. At the time of writing, it is
still possible to log in as a user using only a UDID on five of the vulnerable
networks. &lt;/p&gt;

&lt;h1&gt;The future&lt;/h1&gt;

&lt;p&gt;A few days after I notified the companies involved, it was revealed that Apple
was &lt;a href="http://techcrunch.com/2011/08/19/apple-ios-5-phasing-out-udid/"&gt;quietly killing the UDID
API&lt;/a&gt;. It will
still be present in iOS 5, but is marked as deprecated and will probably be
removed in future. I recommend that developers shift away from using UDIDs now,
rather than wait for formal removal of the API.&lt;/p&gt;

&lt;p&gt;We can now expect a frenzy of activity as developers look for alternatives. The
challenge will be to make sure that the cure isn't as bad as the disease -
Apple's recommendation to "create a unique identifier specific to your app"
could tempt developers to replicate the UDID mechanism on a smaller scale,
flaws and all. Expect more blog posts on this topic soon.&lt;/p&gt;
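&lt;p&gt;Here is a hypothetical server-side sketch in Python of one way to avoid rebuilding the UDID's flaws. It is not Apple's recommendation or any network's design. Each installation gets a random identifier, and the identifier alone never authenticates. A separate random secret, issued once and stored on the server only as a hash, is what proves the caller owns the account. All names are made up.&lt;/p&gt;

```python
import hashlib
import hmac
import secrets
import uuid

# Hypothetical server-side store: install_id -> SHA-256 of that install's secret.
ACCOUNTS = {}


def register_install():
    """Create a random per-install identifier plus a separate secret."""
    install_id = str(uuid.uuid4())  # may leak freely, like a username
    secret = secrets.token_urlsafe(32)  # kept on the device only
    ACCOUNTS[install_id] = hashlib.sha256(secret.encode()).hexdigest()
    return install_id, secret


def log_in(install_id, secret):
    """Knowing the identifier is not enough; the secret must match too."""
    stored = ACCOUNTS.get(install_id)
    if stored is None:
        return False
    presented = hashlib.sha256(secret.encode()).hexdigest()
    return hmac.compare_digest(stored, presented)


device_id, device_secret = register_install()
```

&lt;p&gt;If the identifier ends up in third-party databases, as UDIDs did, it still can't be used to log in as the user.&lt;/p&gt;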

    &lt;/div&gt;
    &lt;div class="postfooter"&gt;
    &lt;/div&gt;
&lt;/div&gt;
</description><guid isPermaLink="true">http://corte.si/posts/security/udid-must-die/index.html</guid><pubDate>Fri, 09 Sep 2011 20:22:00 GMT</pubDate></item></channel></rss>