Non-programming books for Programmers
I started this blog as an outlet for the neglected parts of my brain not devoted to technology - a place to write about books, sociology, economics, and things that don't necessarily involve a keyboard. From this perspective, I've failed spectacularly: nearly everything on this blog has turned out to be about programming. I recognize a lost battle when I see one, so my new strategy is to run with the fact that this is a technical blog, but to inject some non-technical content anyway.
This will be the first in a series of reviews of non-programming books for programmers, exploring the sometimes tenuous connections between the books I love and the business of cutting code. I will do my best to be interesting and thought-provoking, but in the end I can only guarantee one thing: each and every one of these will be way, way tldr.
The Superorganism: Bert Hölldobler & E. O. Wilson
It's impossible to talk about The Superorganism without first mentioning Bert Hölldobler and E. O. Wilson's most famous collaboration - a book called simply The Ants. I've been fascinated with ants since childhood, and The Ants is one of my favourite books - deep enough to be intellectually satisfying on almost any detail, and broad enough to be one of those rare books that summarizes nearly everything to be said about its subject. It's hard to avoid platitudes like "authoritative" and "magisterial" when talking about a book like this, so I will resort to a simple computer science analogy: The Ants is to the study of ants what The Art of Computer Programming is to the study of algorithms. Only more so, because unlike Knuth, Hölldobler and Wilson actually completed their survey in 1990. It should be no surprise, then, that I had The Superorganism on pre-order as soon as I heard that Hölldobler and Wilson were publishing their first new book in almost two decades. The Superorganism expands on a theme that also lies at the heart of The Ants - the workings of insect societies. The Superorganism paints with a broader brush than its predecessor, touching frequently on the other great families of eusocial insects - termites, bees and wasps.
If you haven't delved into the world of social insects before, you're in for a treat. The range and complexity of social insect behaviour can be weirder and more wonderful than anything found in science fiction. Consider, for example, the lives of what the authors call the "ultimate superorganism": the Attine leafcutter ants. The remarkable fact about the leafcutters is that they are farmers, cultivating vast fungal gardens that provide them with essential nutrients. These fungal gardens are grown on a substrate of leaf-matter, and leafcutters get their name from the fact that colonies cut up enormous quantities of leaves to transport back to their nests - one mature colony was estimated to harvest a leaf area of 4550 square meters per year. The fungus gardens are the lifeblood of the leafcutter colony, and they are tended with endless patience and skill. Leaves brought back to the nest are snipped up, molded into pellets, and carefully planted with fungal hyphae taken from elsewhere in the garden. Workers patrol the fungal gardens ceaselessly, weeding out foreign fungal strains and other contaminants. The ants secrete antibiotics that inhibit the growth of other fungi, and produce growth hormones that enhance the growth of their own strain. They wage an endless battle against Escovopsis, a parasitic species of fungus that specialises in invading Attine leafcutter gardens. Remarkably, an important part of their arsenal is a second symbiont: a bacterium that only occurs on the cuticle of leafcutter ants, which produces powerful antibiotics specific to the fungal pest. The ants grow these bacterial weapons on special patches of cuticle, modified specifically to house them. There is also a degree of communication between the ants and their garden fungus. Leafcutter ants are sensitive to the chemical signals released by distressed fungus, and learn to avoid food that harms their gardens.
When a new queen leaves the nest to mate and establish a colony of her own, she carries a sample of the fungus from her parent colony in a cavity next to her oesophagus. Once she has found a likely nesting spot, she spits out the fungal sample, and tends the growing cultivar as closely as she does her own offspring, feeding it with secreted fluid, while she herself subsists off her own body fat. Once the first brood of workers have been raised, the queen assumes her proper position as the egg-laying machine at the center of the colony, feeding on unfertilized eggs laid by her workers. If her colony is successful, she will produce about 20 eggs a minute, 24 hours a day, resulting in between 150 and 200 million offspring during her life. The colony can consist of several million ants at any one time. This population is housed in a colossal nest - one typical example had 1920 chambers with 238 fungus gardens. To build it, the ants had to shift 40 tonnes of soil. The nest itself is designed to provide optimal ventilation and humidity for the fungal gardens, and is continually adjusted by the ants to achieve the right conditions. Stretching out from the nest is a set of foraging tunnels that surface into a web of trunk routes along which leaf material is brought back to the nest. Trunk routes are meticulously maintained, with "road workers" clearing debris and encroaching vegetation. Within the ant population there is a range of physical castes, each adapted to a specific set of jobs. The smallest workers maintain and patrol the fungal gardens. The largest are gigantic supersoldiers that specialise in deterring vertebrate predators. Underpinning all of this is a sophisticated chemical communication system, involving a huge array of pheromones, and an incredibly sensitive sensory system. Hölldobler and Wilson cite research that shows that one milligram of the trail pheromone of Atta texana is enough to lead a worker 60 times around the Earth.
Ponder for a moment the immense behavioural complexity required to sustain a sophisticated insect civilization like this. There are an extraordinary number of behaviours that need to be optimized, many of which read like they are straight from the pages of a programming competition. Foraging strategies need to be devised to efficiently discover food sources. Once a food source is discovered, its value needs to be estimated, and the right fraction of the colony's labour pool needs to be allocated to exploit it. Throughput needs to be optimised by selecting the right leaf fragment size, while minimizing the significant energetic cost of cutting leaves up smaller than necessary. The cost of constructing and maintaining the web of trunk routes needs to be weighed against the efficiency benefits gained (it turns out that they can improve foraging speed tenfold). There are many, many other interesting sub-problems like these, and the colony solves them all admirably. The entire system reminds one of a super-complicated real-time strategy game, and we can be forgiven for suspecting that there must be some hyper-intelligent controller micromanaging a Zerg-like expansion of the nest. Here, however, we come to perhaps the most remarkable fact about social insects: their colonies are leaderless. There is no central strategist at all - their entire range of sophisticated behaviour is emergent, arising from the aggregate actions of many small simple units with only local information. And yet, millions of ants can act with such apparent coherence and purpose that biologists like Hölldobler and Wilson have started thinking of colonies as organisms in themselves - "superorganisms" that compete, mate, and strive for survival.
Humanity has not yet learned how to cross the chasm that separates the individual ant from the superorganism. We've seen the early glimmers of technologically produced distributed systems - one thinks of things like the Internet, peer-to-peer networks, and maybe some nebulous social constructs like "the blogosphere". The fact is, however, that we are simply incapable of designing distributed systems that even begin to approach the robustness and intricacy of insect colonies. The Superorganism is certainly not a manual for applying insectoid principles of distributed engineering to technological problems. It is, however, the best available overview of the best distributed systems we know of, and for that reason alone should be on every intellectually curious computer scientist's bookshelf.
Bees: resource allocation, peer-to-peer communication and tiered architectures
That's all very exciting, but it's not very concrete. So, for the second part of this review, I'll look at one example of distributed problem solving covered in The Superorganism, and explore its fascinating parallels with computer science.
The best-studied insect society is surely that of Apis mellifera, the honeybee. In 1947 Karl von Frisch famously decoded part of the "dance language" of the honeybee, showing that the bee waggle dance was used to convey precise information about the distance, direction and quality of a food source to nearby bees. The amazing discovery that bees conveyed complex abstract notions of this type to each other gave us an early insight into the wonder of social insect communication. Over the years since von Frisch's discovery, it has gradually emerged that the waggle dance is just one of a complex set of signals used to implement a distributed resource allocation strategy inside the bee colony. The bees in a hive are loosely specialised into "foragers", who go out of the hive to gather food, and "nectar processors", who remain in the hive to receive nectar from incoming foragers for processing and storage. When a forager returns to the nest laden with pollen and nectar, it searches until it finds a free processor to accept its cargo. The first optimisation problem the hive faces is to balance these two populations of specialists, minimising the waiting time for foragers dropping off their cargos as well as idle time for processors waiting to accept them. The second optimisation problem arises from the fact that the supply of nectar sources is not constant - if a new grove of flowers in bloom is discovered, the hive has to divert resources to exploit it as quickly as possible, adjusting the number of foragers and processors to match. This is complicated by the fact that not all nectar sources are equal: some might be particularly rich, and therefore require more foragers to exploit. A particular bee hive might be extracting nectar from a number of flower patches at the same time, and foragers need to be allocated optimally, and continually re-balanced. 
Remarkably, the bee colony accomplishes these goals without any central co-ordination, using an entirely distributed algorithm. To see how they do this, we need to flesh out the bee dance language somewhat. Hölldobler and Wilson describe three basic bee dances:
Waggle dance: The famous dance discovered by von Frisch, which directs forager bees to a specific resource with precise information on the location and distance.
Shaking dance: Recruits more bees to foraging, sending them to the dance floor to look for waggle dancers.
Tremble dance: Induces waggle dancers to stop dancing, and recruits bees to nectar processing.
These dances are signals that provide the communications framework for the "bee algorithm", sketched out by Hölldobler and Wilson in the following set of decision rules:
1. Not enough nectar collectors in the field? If yes, and you also have immediate knowledge of a producing flower patch, perform the waggle dance.
2. Is the flower patch rich or the weather fine or the day early or does the colony need substantially more food? Perform the dance with appropriately greater vivacity and persistence.
3. Not enough active foragers to send into the field? Perform the shaking maneuver.
4. Not enough nectar processors in the hive to handle the nectar inflow? Perform the tremble dance.
So, how do bees decide whether there are too many foragers or too many nectar processors, using purely local information? The answer is simple and elegant: if a returning forager waits 20 seconds or less before finding a nectar processor, it assumes that there is a surplus of processors and recruits more bees to foraging through the waggle dance. If it waits 50 seconds or more, it assumes that there are too many foragers, and uses the tremble dance to both reduce the number of foragers and increase the number of processors. Notice that all the signals used in this system are "peer to peer" - bees only communicate with nearby bees that are in the hive at the moment of communication.
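To make the wait-time rule concrete, here is a minimal Python sketch of it. The 20 and 50 second thresholds come from the description above; everything else is my own simplification for illustration and not something Hölldobler and Wilson specify. In particular, the names `choose_dance` and `rebalance` are hypothetical, and so is the assumption that each dance moves exactly one bee between the two pools.

```python
from enum import Enum
from typing import Iterable, Optional, Tuple


class Dance(Enum):
    WAGGLE = "waggle"    # recruit more bees to foraging
    TREMBLE = "tremble"  # quiet the waggle dancers, recruit nectar processors


# Thresholds taken from the text; the rest of this sketch is assumed.
SHORT_WAIT_S = 20.0
LONG_WAIT_S = 50.0


def choose_dance(wait_seconds: float) -> Optional[Dance]:
    """Decide which signal a returning forager gives, using only its own wait."""
    if wait_seconds <= SHORT_WAIT_S:
        return Dance.WAGGLE   # processors look idle: send out more foragers
    if wait_seconds >= LONG_WAIT_S:
        return Dance.TREMBLE  # processors look swamped: shift labour indoors
    return None               # in between: no signal


def rebalance(foragers: int, processors: int,
              wait_times: Iterable[float]) -> Tuple[int, int]:
    """Toy model (an assumption): each dance moves one bee between the pools."""
    for wait in wait_times:
        dance = choose_dance(wait)
        if dance is Dance.WAGGLE and processors > 0:
            foragers, processors = foragers + 1, processors - 1
        elif dance is Dance.TREMBLE and foragers > 0:
            foragers, processors = foragers - 1, processors + 1
    return foragers, processors


if __name__ == "__main__":
    # One short wait, two long waits, one in-between wait.
    print(rebalance(10, 10, [5, 60, 60, 30]))
```

The point of the sketch is that `choose_dance` takes nothing but the forager's own waiting time as input: no bee ever needs to know the size of either population.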
The system described above is clear enough to implement easily, and there is a rich range of parallels with computer science. It's not surprising, therefore, that a bit of searching through the literature shows that a number of computer scientists have started mining the bee resource allocation algorithm for ideas. One nice example comes from Sunil Nakrani and Craig Tovey, who have successfully applied a subset of the behaviour outlined above in a paper called On Honey Bees and Dynamic Allocation in an Internet Server Colony. Consider a hypothetical data center of servers used to implement a hosted application environment. Each application is backed by a dynamic pool of virtual servers, and servers can be added to or removed from the pools transparently. There is, however, a switching cost to moving resources about - re-allocating a virtual server involves server downtime and therefore lost revenue. Application load varies unpredictably - one day an application might be getting three hits a day, and the next it might crop up on Reddit and have a massive load spike. The hosting company is paid based on usage - say, per HTTP request served - and faces the complex problem of optimally allocating its server resources to minimize downtime and maximize revenue. Nakrani and Tovey approach this problem by mapping the bee resource allocation system onto the server allocation problem. In this mapping, foraging bees are the servers, and flower patches are the applications. In nature, the bee recruitment signal - the waggle dance described above - is triggered if a flower patch is sufficiently "profitable". The more profitable the nectar source, the greater the "vivacity and persistence" of the recruitment signal. Nakrani and Tovey simulated a system where servers used a central advertboard to post recruitment adverts.
In broad terms, Nakrani and Tovey's servers were more likely to read a random advert from the advertboard, and switch to a different application, when their current application was less profitable. On the other hand, a server was more likely to post an advert to recruit more servers to its application, if its application was more profitable. The result is a distributed algorithm that performs within about 11.5% of an omniscient resource allocator with complete knowledge of all future HTTP requests.
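The advertboard idea is simple enough to sketch in a few lines of Python. To be clear, this is my own toy illustration of the broad behaviour described above, not the model from Nakrani and Tovey's paper: the linear probability functions, the `step` function and the application names are all assumptions, and the sketch ignores the switching cost entirely.

```python
import random
from typing import Dict, List


def post_probability(profit: float, max_profit: float) -> float:
    """Assumed rule: more profitable applications advertise more often."""
    return profit / max_profit if max_profit > 0 else 0.0


def read_probability(profit: float, max_profit: float) -> float:
    """Assumed rule: servers on less profitable applications read the board more."""
    return 1.0 - post_probability(profit, max_profit)


def step(assignment: List[str], profits: Dict[str, float],
         rng: random.Random) -> List[str]:
    """One round: servers post adverts, then some servers read one and switch."""
    max_profit = max(profits.values())
    # Phase 1: each server may post an advert for its current application.
    board = [app for app in assignment
             if rng.random() < post_probability(profits[app], max_profit)]
    if not board:
        return list(assignment)  # nothing advertised, nobody moves
    # Phase 2: each server may read one random advert and follow it.
    new_assignment = []
    for app in assignment:
        if rng.random() < read_probability(profits[app], max_profit):
            new_assignment.append(rng.choice(board))
        else:
            new_assignment.append(app)
    return new_assignment


if __name__ == "__main__":
    rng = random.Random(0)
    servers = ["blog"] * 3 + ["shop"] * 3   # hypothetical applications
    profits = {"blog": 0.2, "shop": 1.0}    # hypothetical per-server profit
    for _ in range(5):
        servers = step(servers, profits, rng)
    print(servers)
```

Each server decides using only the profitability of its own application and one advert drawn at random. The total number of servers never changes; they simply drift towards whichever applications are paying best.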
Interestingly, Nakrani and Tovey also had something to teach entomologists. They found that while the bee recruitment algorithm performed superbly when there was a lot of variability in application load, it was outperformed by much simpler algorithms when load was relatively static. Their simulation therefore seems to indicate that the bee recruitment algorithm is an adaptation to variability in nectar sources. While this blog post focuses on what computer scientists can learn from insects, the possibility that information might flow the other way is a fascinating one.
When I first read about the loose specialisation in the beehive, with foragers handing over their load to processors, my immediate thought was that this described a tiered architecture. Now, there are a number of sound non-architectural reasons why a colony would want to have some bees specialise in foraging. Foragers tend to be the older bees in the colony, and this makes complete sense. Foraging is a hazardous activity, and bees have a limited lifespan. Sending out bees that are approaching the end of their lives anyway is good economics. Hölldobler and Wilson write that this specialisation
... causes a problem for the honeybee colony: How can the rate of food collection, particularly of nectar, and the rate of food processing be kept in balance?
The computer scientist in me suspects that there may be a different way to look at this aspect of bee behaviour. In computing we produce tiered architectures with independent layers because they improve efficiency and flexibility in various ways. I can't help but wonder if a similar benefit might support this aspect of bee behaviour.
One last note before I'm done. Karl von Frisch once said that
... the life of bees is like a magic well. The more you draw from it, the more there is to draw.
There are some 20,000 species of bee in the world, ranging from solitary species to the great super-societies of domestic honeybees. There are 14,000 species of ants, 4,000 species of termite, and more than 100,000 species of wasp. Each of these species is a unique product of evolution's boundless ingenuity, and each has its own suite of solutions to the problems of survival. When one of these species disappears - and they are doing so at a terrifying rate - the tragedy is not simply that something beautiful is irretrievably gone from the world, but also that we have lost another irreplaceable magic well to study, learn from, and emulate. E. O. Wilson has devoted much of the latter years of his life to the great cause of preserving our biological legacy - if you are interested in this urgent issue (and you should be) I recommend his 2002 book The Future of Life.
If you click on one of the Amazon links in this article and buy the book, you're putting a coin in my tip jar. All proceeds will go to a book I'll review on this blog.