In the 1970s and ’80s, urban designer William H. Whyte walked the streets of New York City, watching, recording, and analyzing how people move, sit, and strike up conversations in urban streets and plazas. His goal was to discover what actually attracted people to city spaces and which features created vibrant areas. He found that many of the things people claimed they wanted, such as wide-open spaces and vendor-free districts, were not what they actually liked and sought out; big plazas, the celebrated feature of many new big building sites, were actually sterile and empty. Based on his observations of where people chose to walk, sit, and linger, he developed a set of guidelines for creating lively and sociable urban spaces.
One of the factors he cited as essential for creating a vibrant environment was what he called triangulation: “The process by which some external stimulus provides a linkage between people and prompts strangers to talk to each other as if they were not” (Whyte 1988, 154). This stimulus, or social catalyst, could be a performer, a minor altercation, a striking view, a popular food truck, and so on. Social catalysts change the relationship of the people in the space with each other, providing them with a shared experience and a reason to acknowledge each other. Public spaces have the potential to be social spaces; the purpose of the social catalyst is to unlock that potential.
In this final chapter, we will look at how communication technologies can function as catalysts that change the social dynamics of a physical space (Karahalios 2004). These works open the space to outside voices, give people new means with which to interact with each other, or change what the people in the spaces know about each other. Online, engaging in a discussion with strangers is a common experience. Face to face, however, such interactions are rare, especially in cities, where one can encounter thousands of people but greet no one. The goal of putting social catalysts in public spaces is to bring some of the online world’s open sociability to the physical world.
One might say, as a general rule, that acquainted persons in a social situation require a reason not to enter into a face engagement with each other, while unacquainted persons require a reason to do so.
—Erving Goffman, Behavior in Public Places (1966)
What is the value of interacting with strangers face to face, and why are there high barriers to doing so? In the city, one reason is that the vast multitude of strangers makes it impossible to acknowledge each person. Louis Wirth, the early urban sociologist, wrote: “The reserve, the indifference and the blasé outlook which urbanites manifest in their relationships may thus be regarded as devices for immunizing themselves against the personal claims and expectations of others.” Were city dwellers expected to greet each passerby, life would come to a halt, gridlocked by graciousness.1 In a small community, however, people walk down the street amid a web of relationships, greeting many passersby, each nod reflecting a history of interactions and knowledge. Such communities need not be rural; even the largest of cities have village-like enclaves, neighborhoods where families have lived for generations (Young and Willmott 1992).
There are costs to making a connection: the small but still real effort required to be properly sociable (Goffman 1966); the responsibility you take on for helping the other if needed; and the possibility that you have just made yourself vulnerable to someone who could turn out to be oppressively strange. When people feel insecure, pressured, or intimidated, they may retreat from interaction. Several years ago, I was in a New York subway when a large and rather threatening man got on and began gesturing and shouting. Before this, people had been making the minute acknowledgments of everyday life, saying “Excuse me” if they bumped into each other, or looking quietly around at each other. But with the arrival of the ominously volatile man, everyone in the train looked down at their feet, and all conversation and eye contact stopped. By retreating to their private space and ceasing to acknowledge the presence of others, they were removing themselves from responsibility. Would they have helped each other if needed? Possibly, though less likely than if their immediate response had been joining together rather than retreating to isolation (Milgram 1964).
Mobility can create a lonely crowd. A suburban town inhabited mostly by newcomers and commuters living in perpetual transit can still feel isolated, even though it may be dressed up with quaint, village-like facades. When we are hurrying to get somewhere, late for work or late getting home, the cost of time in engaging with others—particularly in an urban space with its seemingly inexhaustible supply of strangers—is high. Rushing commuters avoid eye contact; they don’t want to be stopped to give directions. They want the street to be an efficient means for moving from one place to another.
Yet efficiency is not always the goal. In thriving, vibrant parts of our cities there are people sight-seeing, shopping, sitting in cafés, walking dogs, caring for children, or heading from one place to another—but not in a frenzied rush.2 People enjoy even brief interactions: the storeowner who chats with customers, the short conversation with a stranger catalyzed by a street performance. Interaction is an investment in a relationship, no matter how fleeting. These minor engagements still create a sense of connection with others and begin to dissolve the wall that says that they are not my responsibility.
Wirth observed that, in the life lived among strangers, “whereas the [urban] individual gains, on the one hand, a certain degree of emancipation or freedom from the personal and emotional controls of intimate groups, he loses, on the other hand, the spontaneous self-expression, the morale and the sense of participation that comes with living in an integrated society” (Wirth 1938, 12). The bonds between strangers and the sense of responsibility we each have to each other are tentative and delicate. Fostering them helps make a place not just more pleasant but safer (Milgram 1964, 1970).
Technology has exacerbated the problem of isolation in the city. People are involved in their own world of music and mobile updates, their eyes and ears focused on private entertainment, oblivious to the sights and sounds around them. Though the reader of a traditional print book might have been immersed in the story, the book’s cover itself could be a catalyst: “Is that as good as his last book?” people might have asked in a café or park. The move to electronic books removes the last vestige of public display from the act of reading.3 Yet technology need not be only an alienating force. Redesigning personal media devices to publicly display images related to the user’s reading, viewing, or listening would reintroduce the communicative and catalytic aspect (see figure 12.1).
On a larger scale, technology can transform the nature of a public space, making deeper awareness of and communication with those around you into an inherent part of being there. That is not to say that all people want, all of the time, to engage with others. But many do, at least some of the time. Our goal here is to find the means of making this happen and to understand the contexts in which it works.
Mediated discussions and interactive games, performed in public spaces, can function as especially intriguing social catalysts.4 They can, like the street performer, function as entertainment. But they can also change the nature of what is public about a space, and by doing so, redraw its boundaries. They can add to a space by annexing other spaces through communicative portals. They can change the ephemerality of the space, bringing in additional history or recording for posterity. And they can introduce new roles and new players into the urban mix, ones that raise questions about volition and responsibility.
This chapter is about designs whose purpose is to provoke thought and conversation in public space. The variety of possible designs is immense. We will focus on three areas: video connections that establish a visual link between two spaces, telepresence systems that enable people to move and act in distant venues, and augmentation that adds virtual information to people and objects. These technologies each have the ability to catalyze social connections, but with different implications for the participants’ sense of safety, privacy, and engagement. I will primarily discuss works that have been built and installed in public, because seeing how people reacted to them is useful and often surprising. Most are artworks, rather than commercial projects; the goal of the former is to change how people relate and to make them think and talk, whereas the latter, which may well function as catalysts, usually have marketing as their ultimate goal.
Our senses define what is “near,” but each one does this a bit differently. Touch is quite limited; we can only feel things within arm’s reach, just a few feet. We can see much farther: several miles, given an open horizon and a clear, sunny day. In the city, though, unless you are high up in a skyscraper, your view is quickly blocked by buildings. We cannot hear as far as we can see, but sound does travel through walls and around corners. The boundaries of what is proximate are set by our senses. For this reason, interfaces that connect us to other spaces, that expand proximity, seem magical. Today, we are somewhat accustomed to communicating with people at a distance, but there is still something extraordinary about a window through which we can see a distant scene as if it were a continuation of our own. Scale, setting, and interaction matter here. Live video streaming onto a computer screen at home seems just like TV. But a full-scale window into another space is like a doorway into another world, one that you could almost step through and be transported.
The first public creation of such a portal was Kit Galloway and Sherrie Rabinowitz’s aptly named Hole-in-Space (see figure 12.2), which connected a big display window of The Broadway department store in Los Angeles with a window at Lincoln Center in New York City (Galloway and Rabinowitz 1980; SFMOMA 2008). Passersby could see and speak with their life-size counterparts in the other space. The connection ran for three nights, initially with no explanatory materials. The first night, people discovered and explored the installation, and on subsequent nights they were joined by others who had arranged to meet up with distant family and friends via the installation or to sing and dance with the bicoastal audience. Though there were some design problems, including off-center gaze and confusion about the symmetry of the communication, the installation was very successful (Karahalios 2004).
Hole-In-Space suddenly severed the distance between both cities and created an outrageous pedestrian intersection. There was the evening of discovery, followed by the evening of intentional word-of-mouth rendezvous, followed by a mass migration of families and trans-continental loved ones, some of which had not seen each other for over twenty years. (Galloway and Rabinowitz 1980)
Since then, other similar projects have been developed (see Karahalios 2004). Sometimes these electronic portals invite interaction, but simply connecting two spaces with video and audio does not automatically catalyze interaction between them. Often, passersby would see the people and activity from the other side, but would not attempt to communicate. For designers, the question is why some installations, such as Hole-in-Space, work so well while others fail.
Galloway and Rabinowitz had taken care to make Hole-in-Space feel like an extraordinary spatial connection rather than a mundane videoconference. The display was bright, accessible, and easy to interact with. People appeared life-sized, and the picture filled the window to its edge, leaving no television-like frame: the distant people appeared to be right behind the glass. It was installed in busy public places, which drew crowds of entertainment-seeking shoppers and theater-goers. It was also made at a time before fast Internet connections and cheap cameras made live, two-way video common.
Framing, both physical and metaphorical, is important. Often, installations that attempt to capture the magical feeling that Hole-in-Space created miss it because their physical setting is off. For instance, when the distant scene is shown on a screen, even a life-size one, the frame of the screen breaks the illusion of common space, and the on-screen people seem no more present in the room than does the anchor on the evening news. An unexpected setting can help. A more recent project, Hole in the Earth, used video to connect a public square in Rotterdam, the Netherlands, with a popular mosque in Bandung, Indonesia, theatrically framing the work as a hole cut through to the other side of the earth (see figures 12.3 and 12.4). Although less gracefully intuitive than Hole-in-Space, it did succeed at making the now commonplace experience of video connectivity again seem extraordinary, reminding people of the distance, the sheer physical separation, between the two ends of an Internet connection (Ueda 2006).
Hole-in-Space ran for a limited time (it used very expensive satellite communication), which made it into an event; this created a much denser crowd and festive atmosphere than if it had been an ongoing installation. It is hard to know how people would have responded if it had been left up as a permanent feature in the New York and Los Angeles urban scenes. Would people continue to see it as a space where spontaneous street performance was allowed, where one could talk to and make faces at complete strangers? Or would it have faded into daily life, the people seen walking on the other side of its glass no more open to random interactions than any actually local person walking behind an ordinary storefront window?
Video portals such as Hole-in-Space and Hole in the Earth are about being there. They seek to reconstruct the experience of being in the presence of the other; their magic is in their erasure of distance. A video connection can also engage people by going beyond being there. Artificial reality pieces create an imaginary third space in which the distant participants interact. This computational third space can have its own “magical” effects and physics; exploring them is the catalyst for interaction.
One of the earliest artificial reality installations, Myron Krueger’s Videoplace, featured this symmetrical common ground (Krueger, Gionfriddo, and Hinrichsen 1985; Krueger 1991). Krueger’s work showed participants as silhouettes, and he used those forms to explore the nature of interaction, creating scenarios with different rules of engagement.
In Krueger’s work, people interact with gestures, their silhouettes bumping up against each other with surprising but believable effects. In one scenario, your silhouette is a tiny form, and you move it about the screen with gestures: lifting an arm up makes it jump up and pointing in a direction causes it to go that way (see figure 12.5). The silhouette is simultaneously a depiction of self and a control device.
Krueger designed Videoplace to connect people in separate spaces (though many of its applications work well on their own also). Discussing the aesthetics of artificial reality, he noted:
The relationship of one viewer to another can … be the explicit subject of the work. Thus, for the first time, the artist can compose relationships between friend and stranger where the very nature of the interaction can be changed as casually as we change the subject in a conversation. … Rather than isolating people further from one another, the challenge for artificial realities is inventing new ways to bring people together. (Krueger 1991, 93)
Erving Goffman (1966, 126) noted that although in many circumstances it was not acceptable for unacquainted people to spontaneously speak to each other in public, there were a number of exceptions. One was when people were engaged in an “unserious sport.” The movements and gestures that people make to interact with a system such as Videoplace are light and unserious; by getting people to move in this way, the interface changes the rules of engagement.
Works that connect two spaces need to ensure that there are people at both sides, or else function gracefully in their absence. Hole-in-Space connected two very populous public spaces for a limited time; it did not matter that it was not interesting when the other side was vacant, for there was no lack of audience at either end. Many installations, however, are in places with fewer passersby—a pair of lobbies, common rooms, or cafés—where it is important that the screen present something interesting when only one space is occupied. Several of Krueger’s projects work well as standalone installations, able to entertain a solo visitor as well as connected ones.
Karrie Karahalios’s Telemurals* addressed this problem by functioning as a mirror as well as a portal (Karahalios and Donath 2003, 2004). Like Krueger’s pieces, Telemurals featured silhouettes of the participants interacting in a common, artificial space (see figure 12.6). As the participant moved, the silhouette became increasingly detailed, while lack of movement made it fade and fray. The goal with this visual transformation was to encourage people to gesture and move. This fading and filling in worked even if no one was at the other location. Passersby became intrigued by it, and stopped to explore the installation, increasing the likelihood that they would be there when someone came by the other side. People could talk to each other, too. The audio was straightforward, but the screen also displayed their words as text, and the imperfections in speech recognition made this transcript (deliberately) more comical than redundant. By making speech into an entertaining game, this simultaneous mistranslation encouraged people to talk at greater length than the stilted “hellos” that often make up much of the conversation among strangers in other mediated spaces.
In a public space such as Times Square, anyone can walk by. Connecting such a space to another, distant one does not fundamentally change how public it is. Semipublic spaces, however, are more limited. In an office, a common area may be public to the employees of the company, but not to the world at large. A company with widely dispersed satellite offices may want to have a way to connect these far-flung groups and provide a sense of connection, but that connection needs to function within the expectations of privacy and control of the semipublic space (Dourish and Bly 1992; Mantei et al. 1991).
Attempting to encourage sociability by placing a video connection in a space where people expect some privacy, as in an office, can backfire, causing people instead to limit their conversations to neutral and impersonal topics or to avoid the space altogether. Control and awareness are the important factors: giving people control over when the system is operational and providing intuitive indications of what is visible and audible to others help ensure that the connection works as a social catalyst rather than a deterrent (Bellotti and Sellen 1993; Dourish et al. 1996; Langheinrich 2001).
In a space where I have control over who enters physically, I should similarly have control over who enters virtually, including the ability to turn the system off. I don’t expect to be able to disable a system running in a public space, but I do expect to be able to do so if it is in my office, my home, or a conference room where I am running a meeting. Control also tests the installation’s value: when people have the ability to turn a system off, it must provide a sufficiently strong benefit for them to turn it back on again.
Awareness determines how easily people can intuit what others see of them. If I am unaware that there is a camera in my space and people elsewhere can see me, then I am under covert surveillance. Users or administrators of public video installations sometimes attempt to create awareness by putting up signs announcing that cameras are operating in an area. Though their wording may be more cheerful, these signs have the same chilling effect as the warnings near surveillance cameras: you are being watched and should constrain your behavior accordingly. This effect is desirable for a security system, but not for a social catalyst. Plus, people tend to overlook such signs; they are just one more notice in an environment overloaded with information.
A more graceful way to foster awareness is with an interface that makes the connection to another space intuitively apparent. An interface that resembles a window, such as Hole-in-Space, provides awareness since we assume windows have two-way visibility: when the other space is empty, we know no one is there to see us, and when it is occupied, we assume those present can see us, just as we see them. The window illusion is essential, for an interface that simply appears as a video screen does not engender this assumption of symmetry; we see video screens all the time without thinking that the people in the programs can see us (indeed, believing that the people on TV are watching you is a sign of mental instability). A common virtual space can also create awareness of distant others and their perception of you, but how vividly or intuitively it does so depends on its design. Both Telemurals and Videoplace present a common virtual space to both sides, but in those unfamiliar environments, users may initially be unsure of what the distant viewers see or even that there is another set of viewers.
The screen-based works discussed above emphasize the appearance and movements of the participants. Being flat, they are always in the role of window or wall. They are works that emphasize symmetry, making a connection between two spaces where each site has a screen and camera, and each user can see and be seen. A different type of social catalyst allows remote visitors to interact with people in the space via a physical object, essentially a sculptural or robotic avatar. This embodied tele-interaction opens up the public space to interactions with people from outside, whose own location and identity may be unknown. Here, aspects of the online world, such as frictionless entry and exit from conversations and anonymous participation, enter (or intrude upon) the face-to-face, local environment.
One design goal of embodied tele-interaction is to give heft to the remote visitor. If you have ever sat in a meeting where some people participated via speakerphone you know that locally present people attract one’s attention more than an invisible virtual presence does; when the distant participants are not speaking, people quickly forget about them. The shape and scale of the physical avatar can give them significant presence in the space. We designed the AgoraPhone and the Chit Chat Club projects to explore the forms and interactions that would provide remote participants with substance, while not overwhelming their personality and message.
At the same time, the remote participant has less at stake in maintaining the civility of the space. We know that when people can make anonymous comments online, the quality of discourse plunges. People do not behave this way when out in public because they are identifiable and because other people, who might become angry and offended, are physically present. Giving voice and presence in a space to people who have the safety of distance and the cloak of anonymity has the potential to invite hostility, rather than make the space more sociable. Thus, another important design goal is to foster constructive remote participation.
The AgoraPhone* was an installation that gave voice in a public space to distant, anonymous people; it relied on the visual cues of its design to maintain civility (Dobson 2002a). Its physical manifestation was a human-sized and vaguely humanoid sculpture that was installed on a well-traveled walkway plaza (see figure 12.7). It was painted a warm orange, with a big trumpetlike opening for speaking and listening. Postcards and online notices advertised its service:
AgoraPhone is a free, uncensored, and easily accessed communication place. … Combining increasingly popular mediated communication allowances with old school public interaction, the first installation of AgoraPhone consists of a phone number that can be dialed from anywhere, and a communication sculpture installed as an element of urban architecture. From any touch-tone phone anywhere, people can call AgoraPhone’s number and be connected to the public place. (Dobson 2002a)
People called from far away, and from cell phones elsewhere in the plaza; they called to play music or seek anonymous advice. It created a public platform in the plaza for the emblematic contemporary speaker, mediated and anonymous. And it created a spectacle in the plaza that gave the passersby a reason to stop, listen, and respond.
The name “AgoraPhone” comes from the ancient Greek word agora, meaning a public space where people gather for debate, discussion, and other civic and social purposes. Today, many cities have spaces that are set aside for public speech, perhaps the most famous being Speakers’ Corner in Hyde Park in London. Expectations about behavior are different in such places than in ordinary plazas; it is acceptable to stand on a platform to lecture or attempt to rally the gathering crowd, behavior that would seem odd, if not illegal, in many other public spaces. In such a space, anyone can get up and declaim, but the audience is also free to speak its mind and often heckles the speaker. The AgoraPhone creates its own speakers’ corner, with a local audience and remote speaker. The distance gives the caller safety, a feature that the service’s advertisements highlighted: “Is there something you have been aching to express or discuss, but for one reason or another have not yet found a way to feel comfortable doing so? Dial AgoraPhone!”
Although anonymous, remote callers to the AgoraPhone were surprisingly well behaved. This was partly because of the advertisements, which depicted it as a supportive experience. The interface also played a role. Kelly Dobson, the designer, considered having a website where the remote user could view the plaza, but ultimately decided it would be too surveillance-like; being able to surreptitiously view the passersby and then speak to them might invite pranksters hoping to surprise and shock people. Thus, most remote speakers did not even know if anyone was there when they called; the only way for them to achieve any connection was to attract and engage people who would respond.
The Chit Chat Club* used a similar technology but was designed to be installed in cafés, a quite different setting. Cafés are semipublic places where people come to sit, observe the passing crowd, and converse with friends; some patrons want to read in solitude, while others hope to meet up with a neighborhood acquaintance or an agreeable stranger. Today remote socializing is common, and many people sit at a table chatting online or talking on a mobile phone. Yet, while the buzz of discussion among physically present people adds to the liveliness of the space, these one-sided conversations are dissonant, their rhythm and timbre at odds with the surroundings (Ling 2002). Chit Chat Club was designed to bring conversations with remote partners fully into the physical present.
Chit Chat Club was a set of human-scale avatar chairs, meant to be placed among regular tables and chairs. Each chair had a camera, a microphone, speakers, and a display (Karahalios and Dobson 2005). Via the Web, a remote person could occupy the chair, see and hear the people at the table, and converse with them. The chairs gave the visitor the scale and voice to be a full presence at the table. Conversation via the avatar chair was novel, yet at the same time natural; since the avatars occupied their place much as a human in a chair would, including them in the conversation and turning to face them when addressing or listening to them seemed quite intuitive.
Chit Chat Club opened the café table to a wider public. The Web interface allowed anyone to occupy a chair; a conversational group could thus become a circle of public discourse rather than a private discussion. For the participants in the café, conversing with the avatar did not bring them to a private world, as the phone does; they could engage with it and with the people and activity around them. And the avatar chairs gave the distant participant both a better sense of what was occurring in the public space and an audible voice in the conversation. (Interestingly, some people were convinced the chair itself was a very intelligent robot, rather than a physical avatar that a person controlled and was speaking through.)
Chit Chat Club included several different avatar chairs (see figures 12.8 and 3.16). Three had cartoon faces, one featured abstract graphics, and one communicated by text. People found the chairs with faces to be the most natural to talk to, and they were usually the most popular. The text chair was designed in part to accommodate people who did not have a computer microphone or preferred not to speak out loud; it used large animated fonts to display the distant person’s written responses. By making text big and striking, it allowed a distant person who was limited to text input to be a full-scale participant in a spoken conversation. The abstract chair was visually striking, but harder to engage with for small conversations. Abstraction, we found, works better for public declamation in which the tele-interactive sculpture is a platform for performance. For the intimacy of conversation, text and figurative interfaces appear more approachable and sympathetic.
Projects such as the AgoraPhone and Chit Chat Club open up a public space to outside communication. They enable a new hybrid interaction, one that combines the ease of engagement of online forums with a sense of physical presence. They were sculptures, able to transmit words, sounds, and images, but physically inert.5
Eric Paulos and John Canny’s PRoPs (Personal Roving Presences) were remotely inhabitable robots, physical avatars that could move around (Paulos and Canny 1998, 2001). In their various incarnations—a navigable blimp, a wheeled robot—the remote visitor was truly visiting, able to move around, explore the space, and be part of mobile society. The wheeled robot PRoP had an abstractly humanlike shape, including a head with a screen that showed the remote user’s face, a camera for eyes, and a microphone and speakers (see figure 12.9). Displaying the remote user was an important reminder that the rig was actually a telepresent visitor; otherwise, the robot would seem like an autonomous surveillance tool. Having the camera in the head meant the user would turn the robot to look at people and things, thus intuitively conveying the object of the telepresent person’s attention. The wheeled PRoP also had a fingerlike pointing device, which made it possible for the remote visitor to indicate specific objects.
Using a PRoP was a very different way to experience visiting a remote place than simply appearing on a static screen; you could physically interact with your environment. Paulos and Canny point out that having a large, remotely operated object in a public place requires considerable responsibility and trust. Even a well-meaning but careless operator could cause considerable harm to both the robot and bystanders by, for example, sending it tumbling down a flight of stairs.6 A difficult and important challenge is ensuring that the operator acts responsibly. The PRoP creates a situation of asymmetric presence. The operators’ experience of the world is virtual, seen on a screen, but their ability to have an impact on it is real. People online are often less inhibited, and less kind, than they are in person. This disinhibition, combined with the ability to effect remote physical action, means that ordinary people may, from their distance and behind the screen, cause harm they would not inflict in person.7
Paulos and Canny were able to limit access to the PRoP to trusted colleagues, carefully vetting who was allowed to affect a public space from afar. As more devices are connected to the Net—and already there are robots designed for children with instructions on how to set them up so that “anyone on the Internet … can activate your robot” (Hello Robot Software 2012)—the issue of how to regulate remotely controlled displays, recorders, robots, and other objects becomes increasingly complex (Goldberg 2001).
The moral questions surrounding telerobotics and the ability to effect action at a distance are most stark in the military, where officers seated safely in offices a continent away from the battlefield can send drones, telerobotic weapons, to maim and kill. On the one hand, using unmanned weapons lets a commander fight without risking the lives of his soldiers. Yet “the spectacle of Americans fighting wars with robots runs the risk of reviving the perception of the United States as a cowardly nation unwilling to back up its principles with genuine sacrifice” (Brzezinski 2003). Does saving the lives of one’s own soldiers make these weapons not only morally acceptable, but even morally imperative? Or is the ability to kill without cost morally reprehensible? Is it only by paying the cost (in this case, the lives of one’s own soldiers) that one can decide fairly whether to fight?8
The public plaza is, fortunately, a very different place from the battlefield. On the battlefield, the goal is winning at the ultimate competition, whereas in the plaza the goal is cooperation or at least coexistence. But we can draw useful analogies. To be physically present on the battlefield puts the soldier at risk; removing the soldier to a remote location sets up a situation some would argue is ethically wrong. Are there risks that a person in the plaza faces through his or her proximity to others, and does the loss of these risks affect the ethical balance?
People who share a public space engage in a complex, though often subconscious, performance demonstrating their willingness to conform to the community’s rules. This includes being dressed appropriately, maintaining the culturally prescribed distance between people, and using the subtle eye contact and gestures that allow us to, say, navigate who goes first in a narrow pathway (Goffman 1966).9 They wish to avoid costs ranging from loss of self-esteem (“I don’t want people to think I’m a slob”) to physical harm (“I don’t want those guys to hit me”).10
It is important to emphasize that in face-to-face social situations, the psychological cost of disobeying norms is, for most people, very high. Stanley Milgram carried out a series of experiments in which he required his graduate students to flout these norms; for example, they had to get on a subway car and ask another passenger to give up their seat, or intrude in a waiting line of people (Milgram et al. 1986; Milgram 1978). One important observation from the experiments was how difficult these acts were for the students to perform.
Most students reported extreme difficulty in carrying out the assignment. Students reported that when standing in front of a subject, they felt anxious, tense, and embarrassed. Frequently, they were unable to vocalize the request for a seat and had to withdraw. They sometimes feared that they were the center of attention of the car and were often unable to look directly at the subject. (Milgram 1978, 42)
The students were not afraid of physical reprisal; what challenged them was the fear of disapproval, of being seen as selfish or boorish. It did not matter that the people who would be making that judgment were strangers. The experiments show how deeply we care what strangers think of us; this concern about the regard of others maintains much of our social order.
How can we maintain at least some of that desire to stay in the good graces of others in a tele-interactive system? To start, the remote operator needs to feel that there are potential costs to them for being disruptive. One approach is for the system to impose costs externally, like requiring identified log-ins or requiring a deposit to use the system. You then behave appropriately because you might be fined or banned from using the system again in the future. However, these externally imposed costs require additional effort to monitor and impose penalties. Moreover, they do not inherently motivate people to behave cooperatively. Are there interface designs that would give people an inherent motivation to be cooperative?
Let’s start by looking at the opposite, at a (hypothetical) interface that makes it easy to misbehave. Webcams that provide live views of public spaces around the world are quite common. From their high-up, wide-angled perspective, you can see people walking down streets, sitting at tables, or driving their cars (e.g., figure 12.10). Although there is activity, it is not very interesting to watch, for the people are just distant beings milling around in their space. Now imagine that this view is the interface for a tele-interaction system; you can intervene in their space, perhaps just to speak to them, but maybe also to move an object around. You can see the reaction to your actions, the disruption to the others’ flow. Doing something that makes them pause for a second is a first step, like when a child, bored of watching ants go about their daily lives, places a twig on the anthill. But that too becomes dull. The child with a stick at first tentatively and then more ferociously pokes at the hole, stirring the ants to frenzied and entertaining reaction. What of the webcam viewer with a telerobot? The people, viewed from above, are faceless, unidentified, resembling ants more than fellow humans. The scale of their reaction is hard to judge. They might be shocked or frightened, but that is invisible in this view. The only reactions that are perceivable are big ones—people running, hiding. Face to face, we see individuals, we recognize and empathize with the emotions in their reactions; but from a distant view, their expression is invisible, and they cease to seem like individuals. Too easily, we can find excitement in action, in the big disruption that causes people to flee, that disrupts their orderly flow and sends them racing wildly in every direction.
Compare the bird’s-eye view interface with an installation in which the camera has a human-scale view and the people in the space can walk away from it, ignore it, or shut it off. In the latter case, the remote participant needs to work to keep an audience; engaging them, rather than disrupting them, creates a more interesting experience for all.
What would encourage the remote users of a telerobot to act properly? First, the remote operators need to be able to perceive the social nuances of potential and ongoing interactions. Even the most well-intentioned operator will act boorishly if he cannot discern who is willing to converse. Think about how you go about initiating conversation with a stranger—for instance, if you want to ask directions in a strange city. If many people are around, you do not approach the first person you see, but instead seek someone who appears open to this engagement. Some you rule out because they clearly seem in a hurry or avoid making eye contact. Ideally, you will find someone who responds to your initial signals that you are trying to ask something. People communicate their availability for interaction with words, eye contact, gestures, and movement. To engage graciously with people, the tele-operator must be able to perceive these signals (see Paulos and Canny 1997, 2001, for an in-depth discussion).
The people in the physical environment also need the ability to penalize poor behavior on the part of the robot’s operator, not only egregious disruption but also any inappropriate and annoying action. For example, we do not ordinarily walk up to random strangers and say “Hi there, honey!” in a goofy voice, but that might be tempting for some users of this system, giddily experimenting with the freedom they feel as distant unreachable actors.
The designers of the Chit Chat Club, AgoraPhone, and PRoPs all note that physical scale is important (Dobson 2002a,b; Karahalios and Dobson 2005; Paulos and Canny 1997). An object that is the height of an average person allows for natural interaction; if taller, it is intimidating, and if shorter, it forces the person to stoop uncomfortably.11 A balance of vulnerability is necessary between the people in the space and the remote operator. Giving the people in the physical space the ability to deactivate the robot temporarily—perhaps a button that silences it, shuts off its camera, or makes it stand still for a moment—creates this balance. It is not an exact equivalence: the people cannot see the remote operator or intrude upon his space, and the remote operator cannot push a button on a person to make her stand still. But, as we are with other people face to face, each is vulnerable to the other; each has a stake in creating a harmonious interaction.
The robotic PRoP could explore spaces and convey the operator’s words, but both its range and communicative ability were limited. Getting robots to perform reliably is not a simple task, and the world is full of things that are trivial for people to operate, but almost insurmountably difficult for a machine. For example, the robot PRoP could not open doors or operate elevators (indeed, its limited mobility is part of what rendered it safe, for it could be easily confined). Ken Goldberg, a colleague of Paulos and Canny’s and a pioneer in telerobotic art (Goldberg 2001), came up with the idea of the Tele-Actor, a setup similar in many ways to the PRoP, but with a person in the place of the robot. A tele-operated human is agile and intelligent. One can give it high-level tasks, far beyond the ability of current robots to understand and carry out. “Go to the store and get some water”; “Get a gin and tonic and drink it”; “Flirt with the blond guy in the gray T-shirt.” The Tele-Actor concept was used to explore different interaction scenarios (Goldberg et al. 2002).
In Tele-Direction* (designed by Goldberg, my students, and me), a group of operators, rather than an individual, would together decide on what directives to send to the Tele-Actor. The actor had a head-mounted camera and microphone, which sent pictures and audio of her surroundings to the directors, and a screen on which she read the directions sent by them.
A key observation from this experiment was that the people around the Tele-Actor also needed to read the directions she was carrying out. Although the head-mounted camera and microphone gave some indication that there was something unusual about this person, if you did not understand that remote instructions were directing her actions, her behavior would seem at best disjointed and often bizarre or rude. Once we added a prominent display to the actor’s outfit showing the directions, people could better comprehend the situation and her motivations (see figure 12.12).
The simplest sort of representation is strict agency. … So long as I do not, either in person or through my agent, join in the enactment of that by which I am governed, I cannot justly claim to be autonomous.
—Robert Paul Wolff, In Defense of Anarchism (1970)
The project raised interesting questions about autonomy and responsibility. If the directors said to, say, steal someone’s phone, or tell someone his hair looked funny, who was responsible? Was it the actor, because after all she was not a robot but a real person with ethics and judgment? Was it the directors? All of them, or only those who voted for the irresponsible act?
With the first iteration of the Tele-Actor, the semi-anonymity of group directing combined with the directors’ remoteness created a system in which few directors felt much responsibility for their actions, and the directions reflected it. At a business lunch, the Tele-Actor was told at one point to grab and eat a piece of food off of someone’s plate; at another, she was requested to jump on a table and bark like a dog. The slapstick tone of the directions reflects the theatrical feel of the project: the Tele-Actor is a form of street performer, a jester who can both entertain and disturb. The human in the loop here is an intelligent being, but that intelligence was used simply to facilitate blindly carrying out orders. How could the design be changed to inject judgment and responsibility into the system?
A second iteration, the Tele-Reporter*, explored this question (Tang 2002). The basic setup was the same, with two key distinctions. The role of the actor was changed to reporter, and the controlling software had a new system of rewards and discouragements to encourage constructive behavior.
Tele-Reporter was designed for the public space as agora, a place where people come together for debate and discussion. Many people who want to participate in public discussion—community or work meetings, for example—are unable to attend; they are at work, caring for children, or out of town. With a Tele-Reporter as representative, they could join in the discussion. The directors voted among themselves on the most important points to make and the Tele-Reporter would, after vetting them, convey them as eloquently and assertively as the situation required.
Whereas the Tele-Actor was meant to be subordinate to the directors, the Tele-Reporter was in charge. The Tele-Reporter could veto an inappropriate directive; moreover, anyone who had voted for something that was subsequently vetoed was briefly suspended from participating. Here, the system depends on the reporter’s integrity. Is she—can she be—responsible to both the people with whom she is physically present and to the remote people she represents?
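The vote, veto, and suspension rule described above can be made concrete with a short sketch. This is an illustration only, not the Tele-Reporter software: the class name, the method names, the plurality vote, and the one-round suspension length are all assumptions introduced here, since the text says only that voters for a vetoed directive were "briefly suspended."

```python
from collections import Counter

SUSPENSION_ROUNDS = 1  # assumed: how many rounds a vetoed voter sits out


class DirectorPool:
    """Tracks director votes and suspensions for one reporter (hypothetical)."""

    def __init__(self, directors):
        # Map each director to the number of rounds they remain suspended.
        self.suspended = {name: 0 for name in directors}

    def run_round(self, votes, reporter_accepts):
        """votes: {director: directive}; reporter_accepts: directive -> bool.

        Returns the directive the reporter conveys, or None if the winning
        directive was vetoed or no eligible votes were cast.
        """
        # Only directors who are not currently suspended may vote.
        eligible = {d: v for d, v in votes.items()
                    if self.suspended.get(d, 0) == 0}
        # Directors who sat out this round move one round closer to return.
        for d in self.suspended:
            if self.suspended[d] > 0:
                self.suspended[d] -= 1
        if not eligible:
            return None
        # Plurality vote among eligible directors (an assumed voting rule).
        winner, _ = Counter(eligible.values()).most_common(1)[0]
        if reporter_accepts(winner):
            return winner
        # Veto: suspend every director who voted for the vetoed directive.
        for d, v in eligible.items():
            if v == winner:
                self.suspended[d] = SUSPENSION_ROUNDS
        return None
```

The design point the sketch makes visible is that the reporter's judgment (`reporter_accepts`) is the only check in the loop; the software merely attaches a cost to having backed a directive she rejects.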
These experiments with tele-operated humans were provocations intended to explore some of the complex conceptual and ethical questions in creating remotely controlled interactions. By removing today’s primitive robot from the equation and replacing it with an intelligent human (a stand-in for future, smarter robots), they were able to highlight our need for clarity about motivations and explore design solutions for fostering responsibility.
Steve Mann is a computer science professor who has lived since the 1980s as a cyborg, existing in conjunction with a variety of applications, including augmented information about the world and a camera that publishes what its wearer sees. Although video streams of one’s life are now rather common online, Mann was among the first to have the ability to automatically post pictures of his surroundings as he navigated them (see figure 12.13); in the process, he set off some of the earliest arguments about which spaces were truly public, and thus fair game for webcams, and which were not. For Mann, his webcam is a response to the increasingly pervasive surveillance under which we live. Stores have cameras to deter theft and to study customer behavior; streets have cameras that were initially placed to prevent major crimes such as terrorism, but which are increasingly used to pursue minor crimes and antigovernment actions (Rosen 2005). Mann objects in particular to the one-sided nature of the recording; stores can, with very little oversight about what they do with this video, record you throughout their premises, while they forbid you from taking photographs in them.
Mann terms his “surveilling the surveillers” sousveillance. It is simultaneously a suggestion of collective surveillance, a performance to increase people’s awareness of the extensive surveillance they are under, and a dramatic retort to the powerful and secretive surveillers (Mann, Nolan, and Wellman 2003).
Mann’s cyborg connectivity seemed bizarrely eccentric in the mid-1990s, when he first began posting images to the Web from his ever-present gear. But webcams quickly moved into dorms, offices, and bedrooms, and other mobile recorders soon followed. Today we are accustomed to acting not only under the eye of innumerable surveillance cameras, but also under the gaze of many other random people’s recording devices, taking photos and videos destined to live indefinitely online, probably but not necessarily in obscurity.
A cyborg being combines both the human and the technological. In science fiction, cyborgs may be strange creatures of flesh, metal, and silicon. In our everyday life, as we increasingly augment ourselves with cell phones, computers, and cameras, we are all (or at least those of us who can afford these devices) becoming cyborgs (Haraway 1994; Turkle 2006). Our cyborg selves have already transformed public space. People carve chunks of private space out of the public sphere when their attention is focused elsewhere, on phone calls and messages. Our attached devices change our motivations and volition because we are responding to distant rather than nearby needs. And we become transmitters, too, sending images and sounds from the public space to unknown locations, opening our surroundings to unknown eyes and judges, far away in time or space.
Information is a social catalyst. The more we know about another person, the more likely it is that we will be able to discover common interests or experiences. A good hostess, upon introducing two of her guests to each other, will mention something about each of them that will help start their conversation. Although strangers on a city street today seldom spontaneously introduce themselves, on the rare occasions that they do, it is often because they have figured out that they have something in common (Milgram 1977). Something as simple as a T-shirt with one’s college or favorite band on it can thus function as an informational social catalyst.
We can cover ourselves (or our cars) with only so much data about our interests and beliefs. Plus, T-shirts and bumper stickers are not adaptive: they show all the information they have to everyone. Computational mobile devices (smartphones) can selectively make information about their bearer public. Location-based social networks and social connection programs let people control what data about themselves they wish to make available and to whom; if I am in the same location as you and you have allowed me access to your information, these services will tell me of your presence—and about your identity and interests (Eagle and Pentland 2005; Humphreys 2008).
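The contrast with the T-shirt can be sketched in code: a device that shows different information to different viewers, and only when they share a location. This is a minimal sketch under stated assumptions, not the design of any of the cited systems; the `Profile` class, its fields, and the `"*"` wildcard for "anyone nearby" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    name: str
    location: str
    # Hypothetical model: each item of personal data maps to the set of
    # people allowed to see it; "*" means anyone in the same location.
    fields: dict = field(default_factory=dict)    # item name -> value
    audience: dict = field(default_factory=dict)  # item name -> set of viewers

    def visible_to(self, viewer: "Profile") -> dict:
        """Return only the data this bearer has chosen to show this viewer."""
        if viewer.location != self.location:
            return {}  # not co-located: reveal nothing
        return {
            key: value
            for key, value in self.fields.items()
            if viewer.name in self.audience.get(key, set())
            or "*" in self.audience.get(key, set())
        }


mary = Profile(
    "mary", "cafe",
    fields={"college": "State U", "band": "a local jazz trio"},
    audience={"college": {"*"}, "band": {"sam"}},
)
sam = Profile("sam", "cafe")
lee = Profile("lee", "cafe")
print(mary.visible_to(sam))  # both items: sam was granted access to "band"
print(mary.visible_to(lee))  # only the item marked for anyone nearby
```

Unlike a bumper sticker, an item with no audience entry is shown to no one, so disclosure in this sketch is opt-in by default.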
A key technological piece in any augmentation system is the computer-readable identifier of a physical object or person. If an object is large and stationary, such as a restaurant, its location can serve as its identifier. Mobile objects can have identifying tags, such as the RFID (radio-frequency identification) tags used to identify pets, runners, library books, and so on.
RFID tags raise privacy concerns. Although it would be difficult to surreptitiously implant a tag in someone, as more consumer goods (clothing, shoes, and so on) come to have embedded tags (for inventory control), we are likely to be walking around emitting various identifiers that can be linked to us. A network of tag readers, for example at store and restaurant entrances, could then build up a detailed picture of our day as we wander about town (Lockton and Rosenberg 2005).
An augmented reality device that recognizes objects can provide you with supplementary data about them, such as their botanical name or what they are called in a foreign language. A device that recognizes specific people can tell you everything that is publicly known about them (Starner et al. 1997). Researchers have experimented with providing people in conferences and similarly semipublic situations with “active badges” that both identify people and provide some details about them, such as their work expertise, hobbies, and the like (Borovoy et al. 1998; McCarthy et al. 2004). The goal of these projects is to make it easier for people to introduce themselves to each other. They are especially useful in what we might call proto-communities: groups of people who do not actually know each other, but have some common bond that establishes trust and a feeling of camaraderie and potential interest.12 Some provide a public display identifying people in the nearby area (Churchill et al. 2004; McCarthy et al. 2004). Others send private messages to people when they pass near each other (Borovoy et al. 1998; Esbjörnsson, Juhlin, and Östergren 2004). These systems work as computational social facilitators, automatically providing introductions that a gracious host would normally offer.
These experiments used opt-in technologies: participants actively chose to wear a tag that identifies them to the system and could remove it at will. They were also able to choose what information the system would reveal about them. The resulting displays were more likely to fail because they were bland (it may not be that exciting to discover that you and Mary share an interest in optical fiber technology) than because they were distressing.
Yet it is increasingly possible to identify people without their consent. Today we can gain a certain degree of privacy in public simply by being somewhat of an enigma to our fellow strangers. Although there may be a lot of information about me available online, someone who sees me on the street but does not know my name cannot connect that material to me. Yet with machines’ growing ability to recognize people, this gap is disappearing. Once the computer can attach a name to a face, it can attach the myriad data that accompanies that name.13
Augmented reality techniques could let us see a data portrait of each person superimposed on his or her body, either privately, as viewed through computer-aided glasses, or publicly, projected onto the person themselves. In chapter 8 we looked in detail at some of the approaches to and concerns with portraying people in terms of the data about them. What changes when these portraits are attached to an actual, physical person?
How people see the information may be as important as what they see. At one end of the design continuum is the public display, such as a large projection showing a live video of the space with data superimposed on people’s images. Here the visualization is public, part of the common experience of everyone in the space. At the other end of the spectrum is surreptitious viewing. Augmented reality glasses are becoming both more powerful and more subtle. Steve Mann’s gear from the early 1990s obscured almost his entire head; today, such devices look like slightly awkward glasses. By the time computational face recognition is commonplace, it will not be obvious to anyone that the wearer of the streamlined gear is viewing an augmented world.14 And though by then one may assume that almost everyone is seeing an augmented scene, it will not be apparent what information they are viewing; the subject will not know she is being so observed.15 Yet will that allow people to be oblivious to it or cause them to be hyperaware, living in a panopticon-like situation of knowing that they may at any time be secretly observed by someone through both physical and virtual eyes?
We all live now with the possibility that somewhere, someone may be looking at our virtual data. Few of us think about this very often, if at all. And if we do, it is mostly out of concern that we are being ignored rather than a worry that we are being observed: Why has no one responded to my comment? Is anyone reading my posting? Yet something about the idea of people looking at us in person and seeing virtual information seems deeply unsettling. Is the problem the lack of control over what people will see or the combination of the data with our physical self? Perhaps as technologies that map face and data together become increasingly common, people will start thinking about maintaining their virtual profile as an integral part of their public image, their personal grooming.
But let us think for a moment also about the various guises in which public augmentation may occur. One scenario is a nightclub or public art piece, designed to provoke. It could be a wall of pictures of people in the space along with striking data portraits of them. Or, more vividly, a data spotlight that follows people around. Marie Sester’s ACCESS (Donath 2008) was a vision-enhanced robotically controlled spotlight that people (or a program) could use to highlight different people in a space (see figure 12.11). Now imagine that same spotlight, but instead of light, it projects your baby pictures, your status updates from years gone by, things people have said about you. Is this an invasion of privacy, or is privacy irrelevant here as the virtual becomes a key part of the search for attention, for status in a hierarchy of short-lived fame?
Today we have districts based on economics, industry, and the like. Places are zoned to be residential, to have no buildings over four stories, to have mixed-income housing, or light industry. In the future, we may have spaces zoned by information use, by the privacy laws that govern them. And similarly, establishments that today attract different clients by having soft, soothing background music or loud hardcore, by having easy-to-clean plastic tables or thick linen, may use personal information to create ambience.
The history that follows us now online will follow us everywhere. Today these scenarios seem intrusive, overly revealing. We can imagine that people would go to great lengths not to be identified, and that the dark glasses and hats of celebrity will become the norm for stepping out of the house. Yet we may well become accustomed to knowing a great deal about the strangers around us, so much so that the days when we knew only the surface appearance of others may seem like a disturbingly dark age of social and civil ignorance. People will think that having no data or being unidentified is a mark of disenfranchisement. They’ll want to make sure they are recognizable and that there is good and interesting information about them.
The stranger, as we think of him now, may cease to exist.