Things I found on GitHub: aspell custom dictionary entries

2013-02-26

I've been doing a series of posts looking at data gathered with ghrabber, a simple tool I wrote that lets you grab files matching a search specification from GitHub. Last week, I looked at shell history in broad terms, and then specifically at pipe chains. Today, I move on to something different: custom aspell dictionaries.

When aspell finds a word it doesn't recognize, it prompts the user to correct it, ignore it, or add it to a custom dictionary so that it will be recognized as correct in future. Added words are written to the user's custom dictionary, a file named .aspell.en.pws that lives in the user's home directory.

It turns out that 30 people have checked aspell dictionaries into GitHub, containing a total of 9501 custom words. The chart below shows the top 50 words, with the X-axis showing the percentage of files each word appeared in.
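For anyone who wants to try a similar tally on their own collection of dictionary files, here is a minimal sketch of the "percentage of files" calculation. It is not the script used for this post. The function names are made up for illustration, and it assumes each file is a plain-text word list with one word per line after an optional header line starting with "personal_ws". Words are compared case-sensitively, which is also an assumption.

```python
from collections import Counter
from pathlib import Path


def read_words(path):
    """Return the set of distinct words in one aspell personal word list."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    lines = text.splitlines()
    # Assumption: the first line may be a header such as "personal_ws-1.1 en 42".
    if lines and lines[0].startswith("personal_ws"):
        lines = lines[1:]
    return {line.strip() for line in lines if line.strip()}


def word_percentages(paths):
    """Map each word to the percentage of files it appears in."""
    paths = list(paths)
    if not paths:
        return {}
    counts = Counter()
    for path in paths:
        # read_words returns a set, so a word counts once per file.
        counts.update(read_words(path))
    return {word: 100.0 * n / len(paths) for word, n in counts.items()}


def top_words(paths, limit=50):
    """Return (word, percentage) pairs, most widespread first."""
    percentages = word_percentages(paths)
    # Sort by descending percentage, then alphabetically to break ties.
    ranked = sorted(percentages.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

Calling top_words() on a list of downloaded dictionary paths gives the data for a chart like this one. Counting each word once per file, rather than summing raw occurrences, keeps a single dictionary with duplicate entries from skewing the ranking.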

There were a few requests for the raw data behind the previous two posts, so this time round you can also download a CSV file with the occurrence totals for each word in the dataset.