Digital Literacy: Search Algorithms Are Mechanical Turks

Algorithms, the sets of processes or instructions contained in computer code that determine how a particular task will be completed, are among the most pervasive features of computing culture. They power everything from your automatic coffee maker to your smartphone, but because they are frequently hidden from their users, it is easy to overlook them and their impact on how we gain access to information.

One of the areas where algorithms have the most impact is our information search and retrieval practices. Online search is dominated by complex search algorithms, the best known of which is Google’s PageRank. While there are many different ways of thinking about these search algorithms, from the standpoint of digital literacy it is fascinating to see the extent to which these algorithms have been accepted as reliable stand-ins for other forms of information seeking. One reason for this substitution is that they are, on the whole, quite good at finding and serving us the information that we want. Another is the long-standing cultural assumption that computers (and many other forms of technology) are objective means of accessing information.

For example, a common saying in computing circles is “garbage in, garbage out.” This maxim serves to underscore the idea that computers are objective, neutral information processors—if a particular output is “garbage” then the problem is that the input that led to it was “garbage” in the first place.

Information gatekeepers have a vested interest in supporting claims like this one. In particular, they want users to think that their technologies (and, in the case of many information technologies, the algorithms that power them) are autonomous machines that objectively process queries and serve up the best results. If we accept the autonomous, machine-like nature of algorithms, it is easier to accept their objectivity and independence from their human creators.

Consider the case of Google. In their paper describing PageRank, the algorithm’s creators wrote that it is “a method for rating Web pages objectively and mechanically” (my emphasis). This language underscores the common assumption that technological processes aren’t capable of deception or other forms of obfuscation. That is, machines don’t lie. They have inputs and outputs, and the quality of the output depends on the input, not on the algorithm that processes it. However, algorithms, even Google’s algorithm, depend entirely on the decisions, and therefore the biases, of their creators.

Moreover, companies like Google constantly tweak their algorithms in an effort to improve them. In the case of search engines, this can mean trying to thwart spammers or reacting to users who have found a way to game search engine results. Google itself tweaks its search engine algorithm over 500 times a year, or, on average, more than once a day.
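To make the point about human decisions concrete, here is a minimal sketch of a simplified, textbook-style PageRank calculation. It is not Google's production system: the toy link graph, the page names, and the function name `pagerank` are all invented for illustration. The thing to notice is the damping factor, a number the designer chooses (the original PageRank paper reports usually setting it to 0.85). Changing that one number changes every page's score, even though the links, the "input," stay exactly the same.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified power-iteration PageRank over {page: [pages it links to]}.

    Illustrative only: `damping` is a designer-chosen parameter, and every
    page that is linked to must also appear as a key in `links`.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page receives a small baseline score regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its score over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

scores_default = pagerank(toy_web, damping=0.85)
scores_tweaked = pagerank(toy_web, damping=0.50)

for page in sorted(toy_web):
    print(page, round(scores_default[page], 3), round(scores_tweaked[page], 3))
```

Running the two calls side by side shows the same set of links yielding different scores. The output is "mechanical" only after someone has decided what the machine should value.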

In most cases, however, algorithms aren’t autonomous machines. From the perspective of digital literacy, it is better to think about an algorithm like Google’s as a kind of Mechanical Turk. The “Turk” was a chess-playing machine that appeared to operate autonomously, a robot capable of challenging human players in a game of skill. In fact, the machine was not autonomous at all; it was operated by a person concealed in a compartment inside it. Similarly, even though algorithms give the appearance of autonomous behavior, maintaining that semblance of autonomous action requires frequent tinkering and depends on guidance from human controllers. In this sense, algorithms are mechanical, in that they have features that enhance the speed or accuracy of these human decisions, but they are not independent of those decisions.

This is not to say that what these algorithms produce can’t be considered “true” or accurate, or that some algorithms don’t produce better results than others. Rather, it is to say that information seekers must examine what is known about these algorithms, and the contingencies that produced them, if they wish to come to reliable conclusions about the information the algorithms deliver. As such, examining these contingencies is as important a factor in digital literacy as the examination of the publication history of printed texts (the name and affiliation of the author(s), who printed it, what form it was printed in, how it was edited, etc.) was for print literacy.