How we see people—ourselves as well as others—in the virtual world is perhaps the most challenging problem in the design of online spaces. Online, we have no inherent appearance. This is liberating, for we can rethink and recreate personal identity cues, or omit visual representation altogether. Yet a world in which it is difficult to perceive the inhabitants as distinct individuals can be dull and confusing. We are sensory beings, gaining much of our impressions of people and places through sight and sound. Text is excellent at communicating information, but a dry text interface does not provide the feel of a vibrant society. If we are to create more immersive and sensorial interfaces for social communication, one of the fundamental problems we must solve is how to represent the participants.
Data portraits are depictions of people made by visualizing data by and about them. These data can be anything: a portrait of someone’s medical history (Plaisant et al. 1996), travels (Rekimoto, Miyaki, and Ishizawa 2007), shopping lists (Sherman 2006), and so on. We will focus here on the data people create in their online interactions: their email exchanges, status updates, and contributions to online discussions.
I call these depictions “portraits” rather than “visualizations” to emphasize the subjectivity of the representation. The goal of a visualization is often accuracy; it is a tool for scientific or sociological analysis. A portrait is an artistic production, shaped by the tension among the often-conflicting goals of the subject, artist, and audience. The subject wants to appear in the best possible light; the audience wants to gain insight about the subject; and the artist has his or her own aesthetic message to convey, as well as mediating between the subject and audience. The data portrait artist has an enormous number of choices to make in creating the portrait, beginning with deciding what data to show. Even given the decision to use, say, one’s history in a conversation, decisions remain about which patterns to show and how to depict them. This is not to deny that data portraits are visualizations; they are. The distinction is in the conceptual framing.
The portrait framework helps us see that these decisions can be in cooperation or in conflict with the subject. Is it in the tradition of the commissioned portrait, where the subject has the final say on what is shown and what hidden? Or is it more like a street photograph, where the artist captures a starkly revealing image, fascinating to the audience but perhaps appalling for the subject? Visualizations share these issues, and certainly data portraits are a form of visualization. But there is a distinction.
The term “portrait” emphasizes that the representation is an evocative depiction, meant to convey something about the subject’s character or place in society. A portrait provides a salient, recognizable, characteristic, evocative, or symbolic representation of its subject. Facial portraits are the archetypal form, which is unsurprising given that we are neurologically predisposed to recognizing other humans by facial structure. (If dogs were artists, perhaps they would portray each other via creatively rendered scents.) A person’s face can tell us something about his or her place in society, character, and approach to the world.
The goal of portraiture—evoking a person’s essential qualities—is a difficult challenge. Looking at traditional portraits, we see that many images of people’s faces are not deeply evocative; many paintings barely capture a likeness, let alone an essence. Even many photographs, which are accurate recordings of the person’s appearance at one particular place and moment, convey little of their subject’s character (and often barely resemble them). The difficulty is not only in the rendering, but also in the source data; the face itself is a fallible indicator of character. We make inferences about personality and intention based on structural characteristics such as a heavy brow or weak chin, which in fact are very poor predictors of these traits. Creating portraits from data has the same challenges: one must base the portrait on salient and emblematic data and then represent it in a legible and intuitive form.
The “essential qualities” of a person vary from one culture to another. Medieval portraits emphasized one’s position in a rigidly hierarchical society; nineteenth-century portraits showed an increasing interest in human psychology. Data portraits (besides their general context in the twenty-first-century world of computational analysis, algorithmic rendering, and online social media) reflect the mores of the community for which they are made, whether composed of health-measuring life-loggers, argumentative tabloid readers, or weapon-collecting role-playing gamers.
A data portrait can serve several purposes. It can be a mirror designed for private viewing of the data that exist about oneself and for managing the impression they make. It can function as a public portrait, whether as a work of art or as an avatar, representing the subject in an online interaction space. It can be used to make an activist statement about privacy, surveillance, and power in our culture.
For online communities, data portraiture can create recognizable and meaningful renderings of the participants. One of the big problems in these communities is that it is difficult to keep track of the other members (Hancock and Dunham 2001). For a newcomer to the community, or any participant in a very large group, it is hard to figure out others’ roles and contributions. Even if you can see, for instance, all the comments that someone has made in an ongoing discussion, this unwieldy archive does not easily provide you with a clear picture of his or her role in the community. You would need to spend hours poring over transcripts, piecing an impression together through scattered remarks. Visualization reifies this data and condenses it into something we can easily perceive, compactly embodying a tremendous amount of information and making it possible to see years of activity in a single glance. Data portraits like this can help members of a community keep track of who the other participants are, showing the roles they play and creating a concise representation of the things they have said and done. Here, the portraits act as proxies for the subjects, affecting how others in the community act toward them.
Communities flourish when their members have stable identities, and upholding local norms enhances social status. Anonymity and its effective equivalent, cheap pseudonyms, are in general antithetical to community. Yet a stable identity need not be a singular identity, nor does it need to be based on one’s real name or tied to one’s offline self. In the physical world, we can maintain separate personas in different contexts; but online, where searches can easily aggregate everything said and done under a particular identifier, using one’s real name for all interactions eliminates the contextual privacy we take for granted in everyday life. Strong pseudonymous identities can have extensive histories and reputations within their community. They can thus provide the stability and motivation for social cooperation that we associate with “real” identity, yet also maintain privacy. Data portraits support pseudonymity by providing an effective way to see these histories and reputations; they are a picture of the person’s actions in the local space, rather than merely a photo of his or her offline appearance.
Data portraits also function as mirrors. Voluntarily or not, we all leave an increasingly detailed trail of data behind us—a history not only of deliberately published interactions, but also of our searches, movements, and the mentions others make of us. Being able to see the patterns of these data trails—the impression given off by our recorded actions—can help us decide how we want to modify our behavior to shape our virtual persona. Looking at yourself through such a lens could, with the relevant data, make it easy to see if you talk more or less than others, or if you answer your share of questions in addition to seeking advice. On the positive side, this is socialization; we see ourselves as others see us and modify our behaviors to conform to our community’s standards and to gain status according to its values. On the negative side, we have the chilling effect of ubiquitous surveillance, where our self-awareness of the inferences that might be drawn from what we do causes us to greatly curtail our comments and actions.
Data portraiture raises many questions: who controls the depiction, what purpose can it serve, and how can we create a vivid portrayal of a person using data about him or her instead of an image of his or her physical features? To address them, we will start by seeing how artists of different eras, with their varied conceptions about what is important to know about a person, have represented people. We will then look at recent examples of data portraits, and the varied approaches they use to translate material such as their subject’s online interactions into a visual depiction. Finally, we will look at the big conceptual changes that arise when the “artist” is a machine: how does it affect the negotiation between subject and artist about how the former should be portrayed? What brings insight to the mechanical eye?
A portrait reflects its era’s concepts of identity. Throughout history, portraits have been about much more than simply showing what someone looks like. Their purpose is to convey something about who their subjects are, though what is considered significant varies among cultures. Our notions of identity, of the constancy of a person in different circumstances and throughout life, are culturally constructed; as notions of identity change, so do portraits.
The Middle Ages did not hold immediate, fleshy reality in high regard, rejecting material life in favor of the spiritual. Early medieval portraits of rulers often did not depict the actual appearance of the living person. Instead, they used the image of the long-ago Roman emperor in whose succession the ruler claimed to be standing. The portrait thus showed the historical, political, and social context in which the subject wished to be viewed, rather than the earthly, physical visage (Schneider 2002).
The Late Middle Ages and the Renaissance saw a renewed fascination with the surrounding world. Lifelikeness became the goal of portraiture, and painters acquired great technical skill in recreating the appearance of their sitters and in rendering the richly symbolic clothes and objects that represented class and status in an increasingly complicated world. The relationship between artist and subject also became more complicated. Although lifelikeness was the artist’s goal, most sitters also wanted a flattering rendering, and the sitter’s patronage was important to most artists (Woods-Marsden 1987).
Until photography, a painting was the only way to capture and preserve how someone looked at a certain time, or to convey that image to others at a distance. When King Henry VIII of England was seeking a new queen after the death of his third wife, he sent Hans Holbein to paint portraits of various prospective brides, including Anne of Cleves, whom the King eventually chose as his fourth wife. Although Henry had found her portrait attractive and, on the basis of it and other descriptions, contracted to marry her, once he met her in person he was repelled by her, and the marriage was quickly annulled (Warnicke 2000). Paintings could be proxies: a painting of a ruler in his chair was accorded the same gestures that would be given to the monarch himself (viewers uncovered their heads and were careful not to turn their backs to the portrait); portraits of criminals who could not be found could be executed in their stead (Schneider 2002).
By the twentieth century, serious art was increasingly abstract. One reason is that photography brought an easy ability to capture a moment of reality, and thus verisimilitude became its province. Yet it was not only in reaction to this new technology that painting moved increasingly toward images that explore the nature of form, light, or movement, or diverge from representation completely.
The publication of Freud’s Interpretation of Dreams in 1899 ushered in a century in which psychology and its attendant questions about how the mind works would be very influential. “Are we to paint what’s on the face, what’s inside the face, or what’s behind it?” Picasso is famously said to have asked (see figure 8.2). A related intellectual theme was the increased awareness of the subjectivity of all perception (Baumer 1977). In painting, this was manifest in a shift to seeing the artist’s vision, rather than the subject, as the primary point of an artwork. Twentieth-century painted portraits are far from the venerating images of the Renaissance: the subject is often distorted or otherwise made grotesque.
The project of “portraying somebody in her/his individual originality or quality of essence” has come to an end. But portraiture as genre has become the form of new conceptions of subjectivity and new notions of representation. (van Alphen 1996, 254)
By mid-century, photography had become an inexpensive and popular hobby. Capturing reality was possible for anyone with a camera, not only a few skilled artists. It became the norm to photograph events such as vacations and birthdays; schools provided portraits of the pupils each year. It became clear, too, that simply capturing how someone looked at a given moment with a snapshot, though accurate, did not necessarily create an evocative portrait.
Today, we are in the age of information. Vast databanks collect details of our everyday life. The human genome is decoded. Although facial portraits, abstract and distorted though they may be, still define the genre, there are noteworthy works that use recordings and data to depict their subject. Sophie Calle’s Address Book (1983) portrays her subject through other people’s opinions. Upon finding a stranger’s address book in the street, she called some of the numbers in it and asked about the owner. With the transcripts of these conversations and pictures illustrating what she had learned were his favorite activities, she created a portrait of a man through the words of his acquaintances. It’s a fascinating work in its depiction of a person through the multiple lenses of varying acquaintances. And, it is a disturbing work in its invasion of the privacy of the man who had the misfortune to drop this collection of personal data, his list of contacts, at that particular place and time.
Portraits of people through their possessions (the inventory portrait) are physical data portraits. Some draw upon our cultural understanding of objects to express meaning. Peter Menzel’s Material World (Menzel 1994) is a series of photographs of families around the world, with all their possessions displayed in front of their home. Although each family was chosen because it was deemed to be “statistically average” for its country, the photographs still read as highly individual portraits of specific people, evoked through the objects they live with. Rachel Strickland’s Portable Effects (Strickland 1998) was an installation in which visitors were requested to empty whatever bag they had with them—a backpack or purse—and the contents and owner were photographed separately. The resulting images were exhibited in a gallery, where viewers were asked to try to match the right face to its possessions, testing the resemblance between face and data.
Others evoke their subject through the intimacy between a person and his or her objects. Christian Boltanski created a series of portraits called Inventory of Objects Belonging to a Young Woman of Charleston, Inventory of Objects Belonging to a Woman of Bois-Colombes, and so on, in which he selected a person, photographed his or her possessions, and published the images in a book. The art historian Ernst van Alphen argues that these nonrepresentational portraits can evoke their subjects more successfully than the more traditional form: “This success is due to the fact that one of the traditional components of the portrait has been exchanged for another semiotic principle. Similarity has gone, contiguity is proposed as the new mode of portraiture” (van Alphen 1996, 250).
Many contemporary self-portraits depict the artist through an intimately personal inventory. Tracey Emin’s Everyone I Have Ever Slept With is a tent in which she appliquéd the names of 102 people she had slept with, from her grandmother with whom she had napped as a small child to her recent lovers (see figure 8.4). Here, the inventory is data: a list of names, not actual objects. As a self-portrait, for most viewers its concept is its most evocative feature: we get an impression of someone almost aggressively eager to shock. It presents another level of legibility to viewers within her community who know, or know of, some of the people listed.
Emin’s work is a forerunner to the social network portrait, in which we see a person through the set of people with whom she is connected; these portraits gain vividness only when we know something of the connections, as we saw in chapter 4. Such knowledge may come from being part of the artist’s community, but it can also be part of the self-portrait, if it is itself made from evocative portraits of others.
Not all data are expressive. Steve Miller’s Genetic Portrait of Isabel Goldsmith is a beautiful, abstract-seeming painting of chromosomes cultured from its subject’s white blood cells (see figure 8.5). The depicted data comprise very much the essence of the person: DNA is what, at the biological level, makes the individual. But pictures of DNA are effectively meaningless; though so much is encoded in them, as a visual representation they are no more informative than random squiggles. Such images are nonetheless intriguing, enough so that a company now sells custom DNA portraits: you send them a sample and several hundred dollars and they send you a colorful wall-sized print of your DNA (DNA Art by DNA 11 2012). What fascinates the viewer is the idea that this indecipherable pattern is the key to who someone is. It is a portrait of a concept, not a person.
In the late 1990s, my students and I began experimenting with ways to represent participants in online discussions. The Web, with its wealth of information and easy navigation, was attracting huge numbers of people online. Discussion sites were having difficulty keeping up with the accelerating influx of newcomers who were unfamiliar with the customs and social rules, and people were having a hard time figuring out where the interesting discussions were. We wanted to make visualizations that would function as maps, guiding people where to go. More fundamentally, we were interested in making visualizations that would help people make sense of the other online participants and of the evolving social mores.
Making sense of the different personalities and complex social dynamics in a big online discussion is difficult, especially for newcomers. You see a person frequently referred to in other postings. Who is that person? Why does everyone seem to be ignoring one person’s questions but eagerly answering another’s? You say something and someone criticizes you. Are you dealing with a crank or with an authority in the group whom you need to take seriously? You might be able to answer some of these questions by painstakingly reading all the back correspondence, but there is also the problem of keeping track of people for whom you have no visual referent. In most online discussions, people are represented by email addresses or screen names. Sometimes these are quite memorable, but often they are either cryptically obscure or forgettably common. If there are four or five “Dans” in a group, it is easy to conflate them.
One solution is to add a visual representation of the participants; indeed, many discussion sites make it possible for people to accompany their postings with a photo or graphic. These images are of limited value. Photos are most useful for groups in which the participants know each other face to face, where they serve as a reminder of a familiar person. But a photo of oneself is not useful for anyone who wishes to be pseudonymous, or for those who do not wish their gender or race to be the most notable aspects of their identity. Graphical icons are similar to self-chosen screen names: they are arbitrary. A photo can be faked, and a graphic can falsely imply a skill or affiliation. Furthermore, icons are far from unique; popular ones, from sports team logos to cute cats, have many users.
The key problem was how to make a recognizable graphical image that would meaningfully represent a person. The most salient material, we decided, would be the actions of the person him- or herself. So we decided to visualize people’s conversation history, with the goal of creating a compact representation, a portrait that would show the patterns of their actions within that context. Over the next several years, we made many portrait sketches, experimenting with approaches to rendering textual history.
The two big problems in creating a data portrait are choosing what data to show and designing the visual representation for it. In reviewing these portrait sketches, I will focus on the visual representation problem and, in particular, on the question of what makes a data portrait intuitively legible. A traditional portrait has a face, body, and expression—features in which we can easily read the subject’s age, gender, and often also something of his or her character, social position, and dominant mood, even across cultures. But the data portrait is abstract. Making it intuitively legible, or intriguing enough that one is willing to learn to read it, is a key goal in designing data portraits.
Authorlines (Viégas and Smith 2004), a companion piece to Newsgroup Crowds discussed in chapter 6, uses a bubble graph to show the different roles individuals play within a group (see figure 8.6). Each column is a week, and the bubbles represent conversations the subject has been part of; the larger the bubble, the more posts she has written. Conversations that the subject has initiated appear as orange circles above the middle line; those initiated by others are yellow circles under the line.
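The encoding just described can be sketched in a few lines of Python. This is an illustrative reconstruction, not the Authorlines implementation: the `Post` and `Bubble` records, the `author_bubbles` function, and the decision to count posts per conversation per ISO week are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Post:
    author: str
    thread_id: str
    thread_starter: str  # author of the first post in the conversation
    day: date


@dataclass(frozen=True)
class Bubble:
    week: tuple      # (ISO year, ISO week): the column the bubble sits in
    thread_id: str   # the conversation the bubble stands for
    posts: int       # number of posts by the subject; drives bubble size
    initiated: bool  # True: drawn above the midline; False: below it


def author_bubbles(posts, author):
    """Group one author's posts into per-week, per-conversation bubbles."""
    counts = defaultdict(int)
    starters = {}
    for p in posts:
        if p.author != author:
            continue
        iso = p.day.isocalendar()  # (year, week, weekday)
        counts[((iso[0], iso[1]), p.thread_id)] += 1
        starters[p.thread_id] = p.thread_starter
    bubbles = [
        Bubble(week, tid, n, starters[tid] == author)
        for (week, tid), n in counts.items()
    ]
    return sorted(bubbles, key=lambda b: (b.week, b.thread_id))
```

A renderer would then place each bubble in its week's column, above the midline when `initiated` is true and below it otherwise, with its size growing with `posts`.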
Certain patterns of behavior are immediately apparent. Spammers almost always only initiate conversations (at least as of the time this work was created; they have since become more savvy about insinuating themselves into ongoing discussions) and do not follow up on their posts; their portrait shows lots of small bubbles, all above the midline. Someone who takes the role of an expert has a very different portrait. They initiate fewer postings but are likely to respond to others’ questions, sometimes with a couple of postings. The portrait of people who take on the role of answerer has small but not uniform bubbles primarily in the response zone. Highly argumentative people may either initiate or respond; their portrait is recognizable by its large circles showing where they have gotten deeply embroiled in a disagreement.
Authorlines is clear and legible. It would be an invaluable tool for helping a participant or, especially, a newcomer, assess who is who within the community. However, the form of the portrait is itself part of the message-bearing content. Without knowing what the data is, one would never guess that it represented people rather than, say, mortgage failure rates or gross national product. It shows you statistics about a person but you do not think of it as a proxy for the person.
An early Sociable Media Group project, PeopleGarden* (Xiong and Donath 1999), used organic forms to stand for the person (see figures 8.7 and 8.8). Its metaphorical poetics, which portrays the participants in a discussion group as flowers and the group as a garden, makes interpreting its meaning intuitive. As the height of a real flower indicates its age, the height of a PeopleGarden flower indicates how long someone has been posting to the group. The number of petals represents posting frequency; a lush blossom indicates an engaged contributor. A petal’s color indicates whether it is an initial posting or a response and the color fades over time; it is easy to remember that a faded flower is an inactive participant. The flower metaphor makes the portraits easily legible. It gives them visual appeal and a sense of vitality; rather than a dry statistical graph, here the data appear as an enticing garden. The drawback is that the metaphor overwhelms the content it depicts. PeopleGarden portrays everyone as pretty flowers no matter how hostile or gruesome their comments are.
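The flower mapping can be made concrete with a short Python sketch. It is a hypothetical illustration rather than the PeopleGarden code: the `Message` and `Petal` records, the `flower` function, and the fade constants `FADE_DAYS` and `MIN_OPACITY` are assumptions, and the linear fade is only one plausible way to make older petals paler.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Message:
    sent: datetime
    is_reply: bool


@dataclass(frozen=True)
class Petal:
    kind: str       # "initial" or "reply"; a renderer would pick the hue from this
    opacity: float  # 1.0 for a brand-new post, fading toward MIN_OPACITY


FADE_DAYS = 90.0   # assumed: a petal reaches its palest after this many days
MIN_OPACITY = 0.2  # assumed: old petals stay faintly visible


def flower(messages, now):
    """Return (stem height in days, petals) for one participant."""
    if not messages:
        return 0.0, []
    ordered = sorted(messages, key=lambda m: m.sent)
    one_day = timedelta(days=1)
    # Height: how long the person has been posting to the group.
    height_days = (now - ordered[0].sent) / one_day
    petals = []
    for m in ordered:
        age_days = (now - m.sent) / one_day
        opacity = max(MIN_OPACITY, 1.0 - age_days / FADE_DAYS)
        petals.append(Petal("reply" if m.is_reply else "initial", opacity))
    return height_days, petals
```

One petal per message means that the number of petals tracks how much someone has posted, while a flower made entirely of pale petals reads as a participant who has gone quiet.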
Metaphor is a powerful, but sometimes tricky, way to introduce meaning. The challenge in using metaphor is to abstract sufficiently from the source. A less figural representation could still draw meaning from our familiarity with natural forms without looking like a blossom; it could use growth and height to indicate age, brightness and fading to show recent presence, denseness of detail to indicate activity. This way, the representation achieves legibility without relying on literal depictions.
A key point to notice in the difference between Authorlines and PeopleGarden is neutrality. Authorlines’ graphs are legible, but neutral. You can easily understand that larger circles and more of them represent an increase in some quantity, but you do not know whether this is desirable. Even after learning the key to the mapping, it is up to the viewer to interpret it, to decide what a “desirable” dataset is. PeopleGarden’s design implies values: we want the garden to be lush, to have many flowers with lots of petals. We may find it more attractive when the flowers are diverse, with different colors and heights.
Designers need to be aware of the values their visualizations promote. A discussion group with only three or four participants might still be quite useful and successful, but could appear scraggly and sparse in a PeopleGarden-like portrayal, unless, for example, the designer added a rule to keep interacting participants tightly bunched, distributing them at a distance from each other only if they communicated very little. The designer of any visualization, but especially one that functions as a proxy for a person, needs to be cognizant of what values the representation promotes.
Another experimental interface from the Sociable Media Group, Anthropomorphs* (Perry and Donath 2004), portrayed its subjects as humanlike forms, encoding numerous statistics into the size, shape, and posture of these bodies (see figure 8.9). For example, the more messages they have written, the higher their arms are raised; the higher the proportion of replies to initial posts they have sent, the wider open their eyes; the more central they are to the group, measured by how many responses they receive, the larger their legs are and the wider apart they are set. Each box on their body stands for one message the subject has sent, so the more prolific the subject, the bigger his or her abdomen. Their facial expressions and body color reflect the emotional tone of their postings.
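A mapping of this kind can be sketched as a function from posting statistics to drawing parameters. The Python below is an assumption-laden illustration, not the Anthropomorphs code: the `Stats` and `Figure` records, the `figures` function, the 0-to-1 scales, and the choice to normalize against the most active member of the group are all invented for the example, and the emotional-tone encoding is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    initial_posts: int       # conversations the subject started
    replies_sent: int        # replies the subject wrote to others
    responses_received: int  # replies others wrote to the subject


@dataclass(frozen=True)
class Figure:
    arm_raise: float     # 0..1, grows with messages written
    eye_openness: float  # 0..1, share of the subject's messages that are replies
    stance_width: float  # 0..1, grows with responses received
    body_boxes: int      # one box per message sent


def _scaled(value, maximum):
    """Scale value into 0..1, guarding against an empty or silent group."""
    return value / maximum if maximum else 0.0


def figures(group):
    """Map each participant's Stats to the parameters of a humanlike figure."""
    max_sent = max((s.initial_posts + s.replies_sent for s in group.values()), default=0)
    max_received = max((s.responses_received for s in group.values()), default=0)
    result = {}
    for name, s in group.items():
        sent = s.initial_posts + s.replies_sent
        result[name] = Figure(
            arm_raise=_scaled(sent, max_sent),
            eye_openness=_scaled(s.replies_sent, sent),
            stance_width=_scaled(s.responses_received, max_received),
            body_boxes=sent,
        )
    return result
```

Scaling against the group rather than against fixed thresholds is a design choice with consequences: the same person would be drawn differently in a quieter or busier community.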
There were earlier attempts to use human forms as a medium for visualization. Herman Chernoff proposed representing data by varying the size, shape, and position of facial features (Chernoff 1973). His idea was that since people are so good at recognizing faces and detecting the minute differences between them, we should be able to exploit this ability for depicting data. The problem is that moving and changing facial features has nonlinear perceptual effects; making a single feature larger can change the overall expression and appearance of the face. A seemingly small change in a feature can appear significant, and vice versa.
Unlike Chernoff faces, Anthropomorphs makes use of the viewer’s social and psychological interpretation of the work’s humanoid form. It attempts to match the meaning of the data to the expression of the visualization. For example, how open the figure’s eyes are indicates the subject’s ratio of replies to initial postings: someone who is actively paying attention to others (making a lot of replies) would be depicted with wide-open, alert eyes. The width of the figure’s stance indicates the number of responses the subject receives: the wide-set legs of the person who receives many responses are meant to show someone sturdily ensconced in the community.
Here, the problem is that the humanlike forms are too intuitive—we read too much into them. The prolific poster, who in PeopleGarden appears as a lush flower, here is big-bellied, his girth appearing to push his arms up from his sides; the novice looks, by contrast, compact and self-possessed. And I say “he” because this rendering, from the blue body to the wide stance, reads as male, though there is no reason to assume the subject is not, in fact, female. Furthermore, these particular human forms are quite cartoonlike. Some users found them cute and attractive, but to others they were cloying.
Although Anthropomorphs was not an entirely successful visualization, as an experiment it taught us several things about the advantages and pitfalls of using human forms for depicting data. The central issue is that of unintended interpretations. When mapping a numerical quantity to growth in the form, one needs to be careful about making the form larger, not heavier. Different postures can indicate different emotions; unless you want the visualization to express these feelings, take care not to have it look dejected or triumphant, for example. In particular, if the portrait functions as a proxy for a person, it balances on a fine line between something that you can identify with and something that has an identity of its own.
One of the problems with Anthropomorphs was that the faces created characters. In a game environment, when one is playing a fictional role, a proxy with its own identity may work well. But for typical discussion groups, the proxy should not compete with the subject for human(like) identity. To address this, we designed a subsequent version with more abstract anthropomorphic visualizations. These had silhouette heads, with no faces, and elongated bodies. These were less cute and personable, which is more suitable for many applications (see figure 8.10).
The humanlike form has the advantage of immediately reading as a person. A group of such portraits is intuitively perceived as a group of people; it does not risk being mistaken for a chart of mortgage rates or baseball statistics. The key is to find the right balance between figurative images and abstraction. Part of the appeal of the anthropomorphic depictions is that they read as individuals; the disadvantage is that they can easily convey unintended expressions and personalities. A more abstract rendering reduces the expression, but can also lose the individuality. One purpose for data portraits is to humanize the online experience, make us recognize these collections of words and statistics as representing real people. For this, the humanlike visualization form, problematic as it is, can often be a good solution.
Another approach to using human forms in data portraiture is to use the form as a frame for the visualization, but not as a carrier of information. Lexigraphs I* is a group portrait of Twitter users (Dragulescu 2009). Each person appears as a silhouette outlined in words derived from their updates, animated by the rhythm of their postings (see figure 8.11). The silhouettes are identically shaped; the individuality of the portrait is in the specific words and rhythms. The silhouette is thus purely decorative; it bears no specific information. Yet setting the words in the shape of heads contributes greatly to the sense that one is looking at portraits of individual people. It is important that these head shapes are themselves quite abstract; there are no features, simply the impression of a human. Edward Tufte, in his influential writings about graphical design, inveighed against “data ducks,” decorative elements that distort or overwhelm the information the graphic should convey. Is making a data portrait in the outline of a head a form of data duck? Although the shape of the head does not provide data about the individual, I would argue that it is not a pointless decoration, but a design that immediately clues the viewer into perceiving the visualization as a portrait of a person. A community thus rendered becomes an inhabited cityscape, an online space where people-watching is an entertaining and informative pastime.
Words are the raw material of many data portraits, and incorporating the text itself into the portrait provides immediate context and detail. The challenge is to compress what may be a large body of words into a quickly readable image. For this reason, textual analysis is a key element in the design of data portraits. One approach is to highlight the most evocative words. Lexigraphs I analyzes word usage to find ones used with unusual frequency. The technique (discussed in greater detail below) is similar to that of creating a caricature: one finds the norm and highlights the ways that the subject deviates from it (Brennan 1985).
As with any portrait, there is a trade-off between expressivity and accuracy: the artist’s vision, which can render the subject distinctly and vividly, also distorts the portrayal. An interface that allows the viewers to delve more deeply into the source material for the portrait—to see the original text and context from which the portrait was made—gives them both the concise representation of the portrait and the ability to form their own impression of the subject. This is especially important in situations, such as online communities, where the portrait functions as a proxy for the subject.
At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives. For a wide range of problems in data presentation, small multiples are the best design solution.
—Edward Tufte (1990)
Data portraits usually exist in a series, created by applying an algorithm to the data of numerous subjects. The ability to compare among multiples makes these abstract depictions legible; it is primarily in the context of other portraits of similar design that we can understand a portrait’s nuances and vocabulary. Lexigraphs I, for instance, shows a group of Twitter users, portraying each with salient words from their current and past updates. It animates with the rhythm of each user’s postings. While we would get some impression from a single portrait, only upon seeing the whole group can we judge whether the person is notably prolific or unusually personal in their postings; these are relative qualities, and one needs to see the community to understand its individuals.
A portrait photographer depends upon another person to complete his picture. The subject imagined, which in a sense is me, must be discovered in someone else willing to take part in a fiction he cannot possibly know about. My concerns are not his. We have separate ambitions for the image. His need to plead his case probably goes as deep as my need to plead mine, but the control is with me.
—Richard Avedon, foreword to In the American West (1985)
Every portrait has an artist, a subject, and an audience. The tension among them is what creates the portrait: the subject wishes to appear in a positive light. The artist wants to create a good artwork, to represent the subject truthfully while also keeping him or her happy. The audience wants to get a sense of what the subject is like. When the subject’s good will is very important to the artist, presenting him or her favorably will be paramount. The portrait must still have some resemblance to its subject, yet here the subject has great influence over the artist, and may request a flattering depiction. This can produce blandly agreeable depictions, such as the photographs of board members and CEOs that line corporate hallways. Yet it can also produce deeply insightful and empathic works (Cohen 2003). The great Renaissance portraits (e.g., Holbein’s paintings of Henry VIII, Medici portraits) were commissioned works, and the painters could not offend the powerful merchants and royalty who were their subjects. Still, they managed to create revealing and compelling portrayals. When the subject is not a patron, the portrait may be far more stark and revealing. This is the relationship of the artist and subjects in seventeenth-century Dutch “genre” painting (Jan Steen, Frans Hals, Judith Leyster) and in contemporary art, from painting to street photography (Robert Frank, Lisette Model). What the viewer learns of the subject varies not only by the dynamics of the artist–subject relationship, but also by the artist’s skill and inclination. There are great and revealing portraits of powerful patrons, as well as opaque ones made by artists whose interest is in surface and design, rather than in portraying the subject’s psychology.
At the extreme are passport photos and mug shots, utilitarian images made for identifying the subjects, who in these cases have no say at all as to how they wish to appear. Here, the audience—immigration and law officials—is in control; there is no artistic vision set on furthering a career, and no desire on the subject’s part to be thus immortalized. Though even here, especially with the mug shot, where the subject is likely to be feeling angry, frightened, or defiant, a strong sense of personality can seep through the regimented form. Mug shots of arrested civil rights workers, for example, show pride and dignity in their deliberate civil disobedience (Etheridge, Wilkins, and McWhorter 2008).
The photographer Richard Avedon pointed out that although photographic portraits are always accurate—light really did bounce off the subject’s face in that particular way at some time—that does not mean they are objective; they present the artist’s viewpoint. And, being accurate does not mean that they are true; they do not, he claims, get at some fundamental story or observation about the subject. In his foreword to In the American West, he argues: “A portrait is not a likeness. The moment an emotion or fact is transformed into a photograph it is no longer a fact but an opinion. There is no such thing as inaccuracy in a photograph. All photographs are accurate. None of them is the truth.”
In the world of information visualization, the goal is to depict the data as objectively as possible. This is the opposite of art, where the artist’s subjective vision is central. Data portraits sit between these extremes: their techniques come from the world of statistical analysis, but their purpose is artistic. Some may be closer to one extreme or the other; neither is “right,” but understanding where a particular portrait falls in this subjectivity continuum is important to understanding its function.
One of the key aspects of art is choosing what not to show. Even in traditional painting, the artist chooses among all the possible poses and settings, omitting all but one. In creating a data portrait, the first decision is what data to show. Who controls the information a data portrait includes? Is it legitimate for the subject to omit data? If the portrait derives from my contributions to a conversation, can I edit my words? If it is made from, say, the times I punch in and out of work, can I change these data? There are no clear-cut answers to these questions. They depend on the purpose of the portrait, the intentions of the subject, and what message the portrait conveys about its relationship to accuracy.
Portraits that show one’s history within a community usually cannot be edited. Their function is not simply to depict people as they are, but to promote acting within the norms of the group and pursuing the achievements that bring high status in that context. One changes one’s portrait not by editing it after the fact, but by modifying one’s actions. Such portraits are common in online role playing games, where the player profiles show such data as the player’s role, skills, and achievements in the game (see figure 8.12). They are also used in “serious” sites. Stack Overflow, for example, is a successful knowledge-building site in which participants ask and answer questions about computer technology. It has a strict hierarchy of privileges, making status, in the form of “reputation points,” highly sought after and valuable. The user profiles on this site feature subjects’ total reputation points, the questions they have asked, and the community’s assessment of them, as well as the answers they have supplied and how well they were received; it also shows the topics to which they contribute, and other details about their participation on the site (see figure 8.13). Some Stack Overflow users remain pseudonymous, providing little information about their age, nationality, role in the physical world, and so on, yet they are vivid personas within the site, portrayed through its distinctive lens.
Other data portraits give the subject more control of the depiction. Today’s self-written online profiles are primarily text, but people increasingly include data, whether as a link to their Twitter stream, statistics from various monitoring services (How far did you run this week? How many hours did you work?), updates on their travels, the music they are listening to, and so on. One can choose which of these data streams to include in one’s self-portrait, but cannot alter the data itself. If I am embarrassed by how sporadically I exercise or how unadventurous my musical taste is, I can choose not to include this data stream in my profile, but I cannot—or at least, should not—falsify it.
Deception is possible, of course. The economics of honesty tell us that if the gains from being deceptive are high, people will lie; and if the cost of being deceived is high, the audience will demand more reliable signals (Donath 1998, forthcoming). For example, sites for self-monitoring athletic achievements allow users to record their daily statistics. In the general community, there is little concern about this; most viewers have little to lose if they falsely believe an acquaintance has been running steadily, when in actuality he’s been sitting on the sofa. In events such as races, however, self-reports do not count, only official measurements made by an outside authority. Here, cheating can be very tempting for some racers, but can be very costly to other, more honest ones, as well as to the audience who expects to see a fair race. Control over one’s own portrayal of oneself works in situations where trust is high or where the cost of being deceived is low.
Giving people greater freedom to portray themselves, even falsely, can be revealing; discerning audiences can read a great deal into an untruthful self-portrait. The profiles in online dating sites are self-written descriptions. There are no technical controls to enforce honesty, yet people often reveal more than they intend. No one sets out to write a profile that says, “I am whiny and needy, quick to blame those around me for all my shortcomings,” or “I am officious and pompous, ready to take over any gathering with my long-winded pronouncements”; yet such subtexts can be read into what the writer intended as a flattering self-depiction. As data portraits become increasingly common forms of online self-representation, subjects, artists, and viewers will need to address the issue of what is acceptable data-retouching, and how to convey the existence or absence of such adjustments.
Having some measure of control over what data about oneself are made public is essential to maintaining privacy. Personal data can be very embarrassing if revealed out of context, to the wrong person, or at an inappropriate time. The obvious examples are things such as treatment for STDs, sappy love notes, or drunken pictures from one’s youth. But many things can be embarrassing if taken out of context. Grocery lists (part of one’s personal database for anyone who shops online or uses a supermarket loyalty card) reveal private eating habits. These facts might be perfectly ordinary, but they are still part of one’s private domestic world, unsettling if revealed without one’s permission. The words of endearment I use with my children can embarrass me in front of my colleagues, while the professional jargon I use at work can seem stilted and pretentious in front of my family.
The artist Kelly Sherman has exhibited a series of portraits titled Wish Lists (Sherman 2006). Each was a person’s wish list found online, typed onto a sheet of paper (see figure 8.14). One, for instance, called “Tara,” listed:
non-breakable dishes and glasses
bathroom rugs and shower curtain
gift certificate for Ross
boom box with LL Cool J CD
These simple lists are quite evocative. Like people-watching from a café seat, one can read these lists and make up stories about the people they represent, and make guesses about their age, home, and relationships. As with the imaginary biographies we compose about the passersby on the street, these are probably inaccurate, but nonetheless vivid. But Wish Lists does not feel intrusive. It is not a privacy violation, because there is no way to recognize the identity behind the list. The impression is vivid, but also anonymous. It does, however, highlight how evocative even the most innocuous information can be. This does not mean that we all need to hide our wish lists, grocery receipts, and all such mundane, telling details. We should, however, be aware of how vividly they portray us. Removing information is one solution; so is adding it. If one’s data portrait seems skewed and uncharacteristic, the solution may be to provide more information, to round out the portrayal and place the different details of one’s daily life in their broader context.
Most data portraits are created algorithmically: the artist designs a program for making portraits, rather than the portraits themselves. Each step in the process—mining for data, analyzing it, and then depicting it—involves creative choices. But it is still an automated process—a final stage, perhaps, in the increasing mechanization of portraiture.
The traditional portrait was painted by hand, the artist consciously shaping each brush stroke. The advent of photography changed the artist’s role in the work’s creation; the photographer’s eye and intention remained actively involved, but creating the image itself became the job of a machine. Data portraits automate the process even further, raising questions about the artist’s participation in his or her creation and about the source of meaning and artistic interpretation in these works.
Indeed, two aspects of automation distinguish the hand-painted portrait from the computer-generated one. First, there is the automation of rendering. Painting is a physical act, with each brushstroke laid by hand (Donath 2011). The algorithmic portrait, like a photograph, is produced by a machine; the artist need not ever touch the final object. Second, and more radical, is the automation of observation. Though a photograph is machine made, the artist observes the subject, decides when to shoot, and often chooses the most evocative (or flattering) shot among numerous images. With the algorithmic portrait, the artist, having finished designing the algorithm, need never see the subject or the subject’s data. Photography automated the hand; algorithmic portraits automate the eye.
Traditional artists create portraits based on their responses to individual sitters, often spending days, even months, in close quarters with the subject. The artist with a portrait-machine can depict people without establishing any relationship with them; indeed, data portraitists and their subjects usually never meet. The subjects of data portraits, like the subjects in street photographs, are often unaware that they are being portrayed.
Yet in some ways, the line between algorithmic and handmade portraits is not as clear as it first seems. Artists bring their existing set of skills and techniques to each portrait. Some are highly attuned to each sitter, while others churn out rote images, applying their technique identically to each sitter. You need not look further than fifteen-minute pastel portraits commissioned for $20 in many tourist districts, or even at a gallery filled with nearly identical white-ruffed seventeenth-century nobility, to see that handmade portraits can be inexpressive and conventional, efficiently produced by following a set of painterly rules.
On the other hand, following rules and having a degree of separation from your subject do not rule out the creation of expressive and evocative portrayals. It depends on the rules and tools. Computational portraits can incorporate sophisticated algorithms that highlight the most salient features for evoking the individual. The programmer/artist, who neither sees the subject nor touches the portrait, can in effect respond individually and meaningfully to different subjects.
One approach is to highlight how the subject differs from a given norm. Caricature works this way, by exaggerating the features that differentiate the subject. A facial caricature exaggerates features such as a prominent nose or small eyes:
[Caricature] is a transformation which amplifies perceptually significant information while reducing less relevant details. The resulting distortion satisfies the beholder’s mental model of what is unique about a particular face. Caricature … can be considered a sophisticated form of semantic bandwidth compression. (Brennan 1985, 170)
What makes a caricature expressive is which features the artist chooses to amplify. For eyes alone, one could consider their overall size a feature, or how widely spaced they are, or their shape or angle. One could exaggerate the lines around them, the bags under them, or the eyelashes lining them. The artist’s task is to determine which of the myriad possible features distinguish the subject in an interesting way.
We can use a caricature-like approach to highlight characteristic words and phrases in a body of text. “Term frequency–inverse document frequency,” or TF-IDF (Salton 1988), is a statistical method of determining how significant a word is in a collection of words. Given some definition of normal word-frequency distribution, this method compares how frequently a given word occurs in a collection relative to its expected frequency. A word receives more weight the more frequently it appears, offset by giving little weight to common words.
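As a minimal sketch of the idea, assuming the textbook form of the weighting (term frequency multiplied by the logarithm of inverse document frequency) and treating each person's collected posts as one "document," the calculation might look like the following. The three sample texts and the function names are invented for illustration; this is not the code behind any of the projects discussed here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {name: {word: score}} with score = tf * log(N / df).

    `docs` maps a person's name to the text they wrote. A word that
    appears in every person's text gets log(N / N) = 0, so it drops
    out of the portrait no matter how often it is used.
    """
    counts = {name: Counter(text.lower().split()) for name, text in docs.items()}
    n_docs = len(counts)

    # Document frequency: in how many people's texts does each word occur?
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    scores = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values())
        scores[name] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in word_counts.items()
        }
    return scores


def salient_words(scores, name, k=3):
    """Top-k words for one person, ties broken alphabetically."""
    ranked = sorted(scores[name].items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical sample data, invented for this example.
posts = {
    "ana": "data data data clusters and the coffee",
    "ben": "the garden and the roses and the rain",
    "cy": "the train and the rain and the delay",
}

scores = tf_idf(posts)
for person in posts:
    print(person, salient_words(scores, person))
```

In this toy collection, "the" and "and" score zero for everyone because all three people use them, while "data" rises to the top of the first portrait because only one person uses it, and uses it often.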
Caricature exaggerates perceptually significant features that differ from a given norm (see figure 8.15). A traditional artist works from an internalized model of what constitutes an ordinary face. Different norms will result in highlighting different features as unusual and significant. Caricatures of people of a different race than the artist’s norm will exaggerate the differences in typical facial structure between the two races.
Verbal portraits that use caricature-like techniques to highlight significant words must also choose their norms carefully. Lexigraphs I, which visualizes English-speaking Twitter users, compares their words with the norm of a large collection of Twitter updates, rather than the English language as a whole. Words such as “tweet” or “follower” are relatively rare in general, but very common on Twitter; using this corpus prevents words that are common in this setting from being part of the portrait. If, however, you wanted to exaggerate for an outside audience the clichés of Twitter jargon, you could use ordinary English as the norm. The resulting portrait would be an exaggerated version of what makes someone’s Twitter-speech so different from ordinary verbal exchange. It might feature, along with mentions of tweets and followers, many shortened words and abbreviations such as MT (modified tweet). It might also bring out different subject matter: mentions of political news stories and design innovations make up a much bigger proportion of my postings to Twitter than of my everyday conversation.
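The effect of the norm can be sketched with a small example. The score used here is a smoothed log-ratio of a word's share of the subject's text to its share of the reference corpus; this is a stand-in chosen for brevity, not the method Lexigraphs I uses, and every word count below is invented.

```python
import math
from collections import Counter


def distinctiveness(subject, norm):
    """Score each of the subject's words by p_subj * log(p_subj / p_norm).

    Add-one smoothing on the norm keeps a word that never occurs in the
    reference corpus from causing a division by zero.
    """
    vocab = set(subject) | set(norm)
    n_subj = sum(subject.values())
    n_norm = sum(norm.values()) + len(vocab)
    scores = {}
    for word, count in subject.items():
        p_subj = count / n_subj
        p_norm = (norm.get(word, 0) + 1) / n_norm
        scores[word] = p_subj * math.log(p_subj / p_norm)
    return scores


def top_words(scores, k=2):
    """Highest-scoring k words, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Hypothetical counts: one person's updates and two candidate norms.
subject = Counter({"the": 30, "tweet": 8, "follower": 6,
                   "typography": 5, "election": 4})
general_english = Counter({"the": 600, "house": 50, "election": 10,
                           "follower": 2, "typography": 2, "tweet": 1})
twitter_norm = Counter({"the": 500, "tweet": 90, "follower": 70,
                        "house": 20, "election": 12, "typography": 1})

against_english = top_words(distinctiveness(subject, general_english))
against_twitter = top_words(distinctiveness(subject, twitter_norm))
print("norm = general English:", against_english)
print("norm = Twitter corpus: ", against_twitter)
```

With these made-up numbers, the same person's portrait is dominated by Twitter jargon when measured against general English, and by their own subject matter when measured against other Twitter users.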
Themail, described in a previous chapter, varies the norms it uses in order to create a multilayered portrait of a pair of people. The background shows the words that typify all their correspondence in the context of general English usage, while the foreground shows how their discussion topics changed over time, found by setting their words for that period against the corpus of all their interactions.
Refining this technique further can produce even more vivid portraits. We can develop the requisite heuristics by creating portraits by hand, with the goal of articulating, and ultimately automating, the design decisions.
Like the automated word-frequency portraits, the handmade Rhythm of Salience* emphasizes characteristic words (Abrams and Hall 2006; Donath 2006). It is a group portrait of the participants in a conversation; I made it by highlighting the words and phrases each person used that seemed to exemplify what I took to be their favored topics and typical speech patterns (see figure 8.16). For example, “Janet” was a facilitator of the conversation; her highlighted phrases include “common ground,” “invitation,” “question,” and “expand that for me.” Warren is a media theorist, and among his highlighted phrases are “agonistic pluralism,” “hegemonic control,” “discourse,” “interrogation,” and “Foucault.” Also highlighted were common words such as “text” and “theory,” which in his case were equally evocative. The portrait of Mark, a statistician, included “data,” “similarity,” “clusters,” “statistics,” and “multivariate”; “rambling” and “coffee” round out his portrayal. Note that with the basic TF-IDF algorithm, words such as “similarity” or “text” would manifest the same degree of significance regardless of who said them, but with this personal topic-based approach, they are given higher weight in the profile of someone who was typified by a relevant topic or role.
Rhythm of Salience is a handmade data portrait. The typifying topics were the artist’s general impression of each person, and the highlighted words were ones that seemed to fit with that impression. Could this process be automated? To some extent, yes. Topic modeling is a rapidly developing field and computers are increasingly able to extract topics, emotional tone, and other semantic content from text (McCallum, Wang, and Corrada-Emmanuel 2007; Pang and Lee 2008). More difficult would be the humanizing details that help a portrait come alive, such as including the references to coffee and the self-conscious remark about rambling in the portrait of Mark the statistician.
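One very rough way to approximate the topic-based weighting is to start from a per-word base score and boost the words that belong to a topic assigned to the person. The sketch below assumes the topic assignments and the topic word lists already exist (from a topic model or from hand labeling); the names, the word lists, and the boost factor are all hypothetical.

```python
def topic_weighted(base_scores, person_topics, topic_lexicon, boost=2.0):
    """Multiply a word's base score by `boost` when the word belongs to
    one of the topics assigned to this person; leave other words as is."""
    boosted_vocab = set()
    for topic in person_topics:
        boosted_vocab |= topic_lexicon.get(topic, set())
    return {
        word: score * boost if word in boosted_vocab else score
        for word, score in base_scores.items()
    }


# Hypothetical inputs: identical base scores for both speakers.
base = {"similarity": 0.2, "text": 0.2, "coffee": 0.3}
lexicon = {
    "statistics": {"data", "similarity", "clusters", "multivariate"},
    "media theory": {"text", "theory", "discourse"},
}

mark = topic_weighted(base, ["statistics"], lexicon)
warren = topic_weighted(base, ["media theory"], lexicon)
print("Mark:  ", mark)
print("Warren:", warren)
```

Here "similarity" counts for more in the statistician's portrait and "text" for more in the media theorist's, even though both words start with the same base score. A word such as "coffee" is left untouched, which is exactly the kind of humanizing detail this simple scheme has no way to single out.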
If we gave ten people the raw data of the Rhythm of Salience and told them to highlight the most evocative phrases, we would see ten different portraits. It is easy for us to imagine such an exercise, and to understand that each artist would see the subjects from a different perspective and would find in their words different phrases to capture that impression. But it is important to remember that automated portraits are also subjective, that the mechanical artist has a point of view. The algorithmic artist expresses a vision for society by choosing which patterns to highlight and how to depict them. Even the most “objective” data portrait has some subjectivity in the data it shows and the color it uses. The visual style of a portrait can indicate its degree of subjectivity. The statistical graphs in Authorlines imply that it is an objective, factual depiction, whereas the cartoon figures in Anthropomorphs suggest a more subjective view, suited to its computational analysis of emotional content.
Furthermore, objectivity and accuracy are different, and a computational portrait can approach objectivity without being accurate. Personas* is a piece that critiques the role of the machine as “artist” (Zinman 2009). It creates a portrait based on the results of a Web search for the subject’s name. It analyzes the resulting texts and attempts to characterize the person by fitting him or her into a set of categories of roles and interests (see figure 8.17). The result is sometimes surprisingly apt, but can also be very far off, given the computer’s inability to distinguish different people with the same name and its errors in language comprehension. Personas is a reminder of the fallibility, social naiveté, and opacity of the computer as portrayer in an era when such computer analysis of people is increasingly prevalent.
The art historian Richard Brilliant ends his book Portraiture with a warning:
Indeed, before long, one may expect that instead of an artist’s profile portrait the future will preserve only complete actuarial files, stored in some omniscient computer, ready to spew forth a different kind of personal profile, beginning with one’s Social Security number. Then, and only then, will portraiture as a distinctive genre of art disappear. (Brilliant 1990, 174)
Is a data portrait—created by a machine, visually abstract, depicting a person through data—inherently dehumanizing? I argue that it is not. Data portraits have the potential to evoke the individuality of their subjects in ways that are not possible with traditional forms.
For example, a person’s role in her social world—not her status, which traditional portraits have long depicted with displays of prized possessions and carefully chosen clothing, but her web of social ties, whether numerous and diverse, or fewer but dense and close—is a major part of her identity. Yet, with the exception of group portraits that depict a small family or corporate circle, the subject’s social connections are at best implied in existing portraits. As we saw in the “Mapping Social Networks” chapter, depicting a person’s social network and the communication that flows through it can be meaningful, evocative, and, owing to its great complexity, best rendered by machine.
We are still only novices in this field, still learning what kinds of data are expressive, still developing the vocabulary to display these data. This process involves all three parties, not just the artists. The viewers’ understanding of the visual vocabulary must evolve along with that of the artists, and the subjects, who could be all of us, need to think about how we wish to appear in the information world.