corte.si

I recently published some research showing that the OpenFeint social gaming network can be used to link Apple UDIDs to users' real-world identities. To understand why this is a problem, we have to look at the way UDIDs are used in the broader app ecosystem. Once we do this, we see that the vast majority of applications send UDIDs to servers on the Internet, and that UDID-linked user information is aggregated in literally thousands of databases on the net. In this context, UDID de-anonymization is a serious threat to user privacy.

We have one good research paper surveying UDID use - in 2010, Eric Smith looked at the unencrypted portion of app traffic, and found that 68% of tested apps send UDIDs upstream in the clear. I was curious to see what the figures would look like if encrypted (HTTPS) traffic was included, so I decided to do my own survey, using mitmproxy to analyse all traffic from the 94 applications I had installed on my iPhone. Below is a set of graphs highlighting the main facts. I've also published a list of all applications and the domains they contacted here - it makes for interesting reading.

Apps are noisier than you think they are

84% of apps tested contacted one or more domains during use. At the extreme end, iDestroy contacted 14 domains, including 3 different ad networks and OpenFeint.

... and send your UDID to more places than you expect

74% of apps tested sent the device UDID to one or more domains.

... often without encryption

46% of apps that transmitted UDIDs did so in the clear at least once. The remaining 54% used encryption for all of their UDID traffic¹.
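The classification behind these two figures can be stated precisely in a short sketch. It assumes each captured request has already been reduced to a hypothetical `Request` record (scheme, host, and whether the UDID appeared in it). An app counts as sending its UDID in the clear if any UDID-bearing request was not sent over HTTPS, and as encrypted only if every such request was. The app names and hosts below are made-up placeholders, not survey data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Request:
    """Hypothetical summary of one captured request."""
    scheme: str         # "http" or "https"
    host: str
    carries_udid: bool  # did the UDID appear anywhere in the request?


def udid_transport(requests: Iterable[Request]) -> Optional[str]:
    """Classify one app's UDID traffic.

    Returns None if the app never sent its UDID, "clear" if at least one
    UDID-bearing request was not sent over HTTPS, and "encrypted" if every
    UDID-bearing request was.
    """
    schemes = {r.scheme for r in requests if r.carries_udid}
    if not schemes:
        return None
    if any(s != "https" for s in schemes):
        return "clear"
    return "encrypted"


# Made-up example data: app names and hosts are placeholders.
apps: Dict[str, List[Request]] = {
    "app_a": [Request("https", "api.example.com", True),
              Request("http", "ads.example.net", True)],
    "app_b": [Request("https", "api.example.com", True),
              Request("http", "cdn.example.org", False)],
    "app_c": [Request("http", "cdn.example.org", False)],
}

results = {name: udid_transport(reqs) for name, reqs in apps.items()}
# Share of UDID-sending apps that leaked it in the clear at least once.
senders = [c for c in results.values() if c is not None]
clear_share = senders.count("clear") / len(senders)
print(results)      # {'app_a': 'clear', 'app_b': 'encrypted', 'app_c': None}
print(clear_share)  # 0.5
```

Note that `app_b` still counts as encrypted even though it makes a plain HTTP request, because that request does not carry the UDID. A single cleartext UDID request is enough to expose the identifier, which is why one such request puts an app in the "clear" bucket.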

A few big UDID aggregators dominate

Three big aggregators of UDID-related data dominate: Apple, Flurry, and OpenFeint. Each one of these companies has the vast majority of UDIDs on file, linked to a rich set of privacy-sensitive information. OpenFeint's ubiquity is one of the reasons why UDID de-anonymization using their API is so serious.

... behind them is a long tail of smaller aggregators

Here is a list of all the remaining domains that had UDIDs transmitted to them - a mixture of ad networks, analytics firms, individual developer sites, and online services.

ads.mp.mydas.mobi
analytics.localytics.com
api.dropbox.com
bayobongo.com
bbc.112.2o7.net
beatwave.collect3.com.au
catalog.lexcycle.com
data.mobclix.com
init.gc.apple.com
msh.amazon.com
notifications.lexcycle.com
promo.limbic.com
soma.smaato.com
www.chimerasw.com
www.phasiclabs.com
www.trainyard.ca
api.twitter.com
ngpipes.ngmoco.com
npr.122.2o7.net
ws.tapjoyads.com

Methodology

For each application, I started a logging instance of mitmdump, like so:

mitmdump -w appname

I then started the application, interacted with anything that might elicit network traffic, and shut it down. I analyzed the collected data with a simple script that used the libmproxy API to traverse the traffic dumps and extract the relevant information.
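The libmproxy API used by the original script has changed considerably since, so the following is only a sketch of what the analysis step might look like, not the script itself. It searches each request for the device's own UDID in the path and query string, the header values, and the body, and collects the domains each app contacted. `Flow`, `DEVICE_UDID`, `carries_udid`, and `summarize` are hypothetical names, and the UDID is a placeholder. Assuming a recent mitmproxy release, similar fields (`scheme`, `host`, `path`, `headers`, `content`) are available on `flow.request` for flows read back from a `mitmdump -w` file with `mitmproxy.io.FlowReader`.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

# The tester knows their own device's UDID (40 hex characters).
# This value is a made-up placeholder.
DEVICE_UDID = "0123456789abcdef0123456789abcdef01234567"


@dataclass
class Flow:
    """Hypothetical stand-in for one captured HTTP(S) request."""
    scheme: str
    host: str
    path: str  # path plus query string
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def carries_udid(flow: Flow, udid: str = DEVICE_UDID) -> bool:
    """True if the UDID appears in the path/query, a header value, or the body.

    Matching is case-insensitive because hex digits may be upper- or
    lower-case. Hashed or otherwise transformed UDIDs are not detected.
    """
    needle = udid.lower()
    if needle in flow.path.lower():
        return True
    if any(needle in value.lower() for value in flow.headers.values()):
        return True
    return needle.encode() in flow.body.lower()


def summarize(flows: Iterable[Flow]) -> Tuple[Set[str], Set[str]]:
    """Return (domains contacted, domains that received the UDID) for one app."""
    contacted: Set[str] = set()
    udid_domains: Set[str] = set()
    for f in flows:
        contacted.add(f.host)
        if carries_udid(f):
            udid_domains.add(f.host)
    return contacted, udid_domains


# Made-up traffic for a single app; hosts are placeholders.
flows = [
    Flow("https", "api.example.com", "/register?device=" + DEVICE_UDID.upper()),
    Flow("http", "ads.example.net", "/banner", body=b"udid=" + DEVICE_UDID.encode()),
    Flow("http", "cdn.example.org", "/logo.png"),
]
contacted, udid_domains = summarize(flows)
```

Here all three hosts end up in `contacted`, while only `api.example.com` and `ads.example.net` end up in `udid_domains`. Running `summarize` once per dump file, one per app, gives the per-app domain lists, and tallying `udid_domains` across apps gives the per-domain aggregator counts.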

Because 54% of the UDID-transmitting apps in my sample encrypted all of their UDID traffic, they would have gone undetected by Smith's study, which only examined unencrypted traffic. One would therefore expect a much greater difference between our results than the one we see: Smith found that 68% of apps send UDIDs in the clear, versus my 74% for UDID transmission of any kind. The discrepancy can be accounted for by the fact that we used different samples. Smith used predominantly applications from Apple's "Top Free" lists, whereas I used both paid and free applications that happened to be on my phone.
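A rough back-of-envelope figure makes the size of the gap concrete. Multiplying the two percentages from my survey gives the approximate share of all tested apps that sent UDIDs in the clear, which is the only kind of UDID traffic a clear-text-only survey like Smith's could observe (this ignores rounding in the underlying percentages):

$$
\underbrace{0.74}_{\text{apps sending UDIDs}} \times \underbrace{0.46}_{\text{of those, in the clear}} \approx 0.34
$$

By Smith's method, then, only about a third of my sample would have been flagged, against his 68%. That is the gap the difference in samples has to account for.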