Chapter 1: A rhetorical view of captioning

Four new principles of closed captioning

  1. Every sound cannot be closed captioned.
  2. Captioners must decide which sounds are significant.
  3. Captioners must rhetorically invent and negotiate the meaning of the text.
  4. Captions are interpretations.

Figure 1.1. Captioners offer interpretations within the constraints of time and space.

A clip from 21 Jump Street shows police officers Schmidt (Jonah Hill) and Jenko (Channing Tatum), dressed in black uniforms with matching black shorts and helmets, riding their police bicycles side-by-side on the park grass. In the featured frame, the bike cops are heading straight for the viewer. A small red light is visible on each bicycle’s handlebars. The cops are pedaling to confront a small biker gang smoking pot on the other side of the park. Pounding rock music (uncaptioned) accompanies the pursuit but cuts out momentarily to call attention to the faint bicycle sirens, which sound like children’s toys. The sirens are captioned as (SIRENS WHOOPING SOFTLY), which is supposed to capture the ridiculousness of the scene. Packed into this single caption, then, is the reminder that these are not real cops because real cops would be burning rubber in a patrol car and blaring their sirens. Columbia Pictures, 2012. Blu-Ray.

Source: 21 Jump Street, 2012. Blu-Ray. Featured caption: (SIRENS WHOOPING SOFTLY).

Figure 1.2. All dog sounds are not created equal.

The top row contains two frames from an episode of Grimm (2011, Episode 1.1, NBC). In the top left frame, Nick (David Giuntoli) is shown in profile walking at night on a suburban street. A home in the background is lit by porch light. Large trees provide an ominous backdrop. The caption, [dog barking], is more than a stock sound to provide suburban ambience. A few seconds later in this scene, the same dog seems to be suffering, drawing the attention of Nick, who turns to face the camera in the top right frame. The accompanying caption is: [dog yelps, whines, goes silent]. The bottom row contains two frames from Extract (2009, Ternion Pictures), both of which are taken during a dinner table scene at night. In the bottom left frame, Joel (Jason Bateman) and Suzie (Kristen Wiig) are eating at their dining table with some unknown [DOG BARKING IN DISTANCE]. In the bottom right frame, Suzie stares blankly after Joel walks away from the table upset. The accompanying caption: [CRICKETS CHIRPING]. The dog barking in Extract is part of a stock soundscape that includes crickets chirping, whereas the dog sounds are an integral element of the horror storyline in the Grimm episode. The animal and insect captions in Extract end up intruding into the serious dinner discussion. TV source: Extract rebroadcast on Comedy Central and Grimm rebroadcast on the SyFy Channel.

Two frames from Extract (2009) and Grimm (2011) featuring dog barking captions.

Source: Grimm, Episode 1.1, 2011. SyFy Channel. Featured caption: [dog barking].

Source: Extract, 2009. Comedy Central. Featured captions: [DOG BARKING IN DISTANCE] and [CRICKETS CHIRPING].

Figure 1.3. Where does distinct speech shade off into indistinct chatter?

In this clip from Avatar, Jake Sully (Sam Worthington), inhabiting his Na’vi avatar body, is shown in a mid-shot looking slightly off-camera to the viewer’s right. Jake has just wandered away from the scientists Grace (Sigourney Weaver) and Norm (Joel Moore), who are busy taking plant root samples. Captions create a clear line between distinct speech and indistinct background chatter, even though, sonically speaking, the dividing line is not always quite so obvious. Chattering is also a popular option for describing indistinct crowd noise, but captioners need to be mindful of the term’s gendered implications. The conversations of women have at times been described dismissively as chattering. Caption: (GRACE CONTINUES CHATTERING). Twentieth Century Fox, 2009. DVD.

Source: Avatar, 2009. DVD. Featured caption: (GRACE CONTINUES CHATTERING).

1. Captions contextualize.

Captioning is about meaning, not sound per se. Captions don’t describe sounds so much as convey the purpose and meaning of sounds in specific contexts. The meaning of a sound in a particular context may transcend its origins (Chapter 3). The precise sonic qualities of a squeaky water tap may be less significant than the act of turning the tap off: (TURNS TAP OFF). In such cases, the action trumps the sound. Additional examples include: [TURNS OFF RADIO], [ unbuckles seat belt ], [BLADE PULLS FREE], [ Snaps Oscar’s Neck ], and [HITS CYMBAL]. Onomatopoeia has a role to play in captioning, but it must be used with care and only when the visual context clearly informs the meaning of the captions.

Source: Young Doctor’s Notebook, Episode 1.1, 2012. Ovation Network. Featured caption: (TURNS TAP OFF).

Source: Inception, 2010. DVD. Featured caption: [TURNS OFF RADIO].

Source: Branded, 2012. Netflix. Featured caption: [ unbuckles seat belt ].

Source: Zombie Apocalypse, 2011. SyFy Channel. Featured captions: [BLADE STRIKES] and [BLADE PULLS FREE].

Source: The Faculty, 1998. DVD. Featured caption: [ Snaps Oscar’s Neck ].

Source: The Wedding Singer, 1998. FX Network. Featured captions: [HITS CYMBAL] and [STRUMS GUITAR].

Source: Family Guy, “Finders Keepers,” 2013. Netflix. Featured caption: (RINGING DOORBELL FRANTICALLY).


2. Captions clarify.

Captions tell us which sounds are important, what people are saying, and what non-speech sounds mean. As a hearing viewer, I continually find myself relying on captions to learn characters’ names and apprehend unusual words such as “flobberworms.” (So that’s what Peter Pettigrew just said in the background of the Harry Potter movie!) Reading provides superior access over listening, particularly when a noisy environment may work against the listener’s ability to make out clearly what people are saying. The same goes for music lyrics that are transcribed on the screen for easy reading, as lyrics are famous for being misinterpreted by hearing fans.

Source: Harry Potter and the Prisoner of Azkaban, 2004. DVD. Featured words in this clip that become clear to listeners when transcribed: Lupin, Patronus, dementor.

Source: Iggy Azalea’s music video for “Fancy,” 2014. VH1. Music lyrics captioned, including: ♪ TAKIN’ ALL THE LIQUOR STRAIGHT, NEVER CHASE THAT ♪

Source: Shaun of the Dead, 2004. DVD. Music lyrics captioned, including: ♪ That’s why they call me Mr. Fahrenheit ♪

3. Captions formalize.

Captions tend to be presented in standard written English, with information about manner of speaking relegated to identifiers such as (drunken slurring). Nothing else about the speech will mark it as inflected or accented (e.g. drunk) except for a lone identifier at the beginning of the first speech caption. While standard English provides the fastest access to information, it comes at the expense of conveying the embodied aspects of speech. Embodiment is carried almost entirely by manner of speaking identifiers or simple phonetic transformations (e.g. gonna, can’t). While it is easy to find examples of nonstandard or phonetic spellings in speech captions, even these examples are informed by a desire to make the captions as fast to read as possible. Phonetic transcriptions are rhetorical insofar as they balance accuracy with accessibility. In this way, we might say that captions rationalize the teeming soundscape. Sounds that resist easy classification or simple description, such as mood music, are tamed or ignored altogether.

Source: The Internship, 2013. DVD. Featured caption: (SLURRING) Are you shitting me?

Source: Moonrise Kingdom, 2012. DVD. Featured caption: (SLOWLY) Do not cross this stick.

Source: Galaxy Quest, 1999. DVD. Featured caption: [Yelling In Slow Motion] NO!

4. Captions equalize.

Every sound tends to play at the same “volume” on the caption track. While there are ways of modulating the volume of captioned sounds and differentiating background from foreground sounds in the captions, these ways are limited and space-consuming. As a result, every sound tends to occupy the same sonic plane, making every sound equally “loud.”

Source: The Happening, 2008. DVD. Featured caption is an example of backchannel speech sounds that come forward when captioned: “[Man] I just walked down a quarter mile. It was clean.”

5. Captions linearize.

Sounds that are heard simultaneously cannot be read simultaneously. Captions linearize sound by presenting the soundscape in a form that can be read one sound/caption at a time. Although it is unusual, multiple non-speech parentheticals can be presented on the screen at the same time. Multiple sounds can also occupy the same caption – e.g. see District 9’s (2009) [ALIEN GROWLS AND PEOPLE SHOUTING INDISTINCTLY] and [RAPID GUNFIRE AND MEN SHOUTING IN DISTANCE]. Multiple, simultaneous sounds can also be reduced to single captions such as [overlapping chatter] and [overlapping shouts] from Silver Linings Playbook (2012). But simultaneous sounds must still be read one at a time. The caption reader thus experiences the film soundscape as a series of individual captions.

Source: District 9, 2009. DVD. Featured caption: [ALIEN GROWLS AND PEOPLE SHOUTING INDISTINCTLY].

Source: District 9, 2009. DVD. Featured caption: [RAPID GUNFIRE AND MEN SHOUTING IN DISTANCE].

Source: Silver Linings Playbook, 2012. DVD. Featured caption: [overlapping shouts].

Source: Aliens vs. Predator: Requiem, 2007. DVD. Featured captions: (low hissing, growling), (whimpering, crying).

6. Captions time-shift.

Viewers do not necessarily read at the same rate as characters speak. Speech captions don’t always start precisely on the first beat of the utterance being captioned. The same is true for non-speech captions, which may precede or follow the sounds being captioned. I devote Chapter 5 to exploring some of the ways in which captions give advance notice to readers. Even something as seemingly innocuous as a dash at the end of a caption can alert caption readers to a forthcoming interruption in speech. Names in non-speech captions can also give away plot details. For example, when [GINA SCREAMS] in Unknown (2011), caption readers can guess that Gina is more than an insignificant taxi driver. Readers not only learn the taxi driver’s name before listeners do but also venture a guess that Gina will return later in the narrative. I coin the term “captioned irony” – adapting the concept of dramatic irony – to describe cases in which caption readers know more or sooner than listeners who are watching with the captions turned off.

Source: Taken, 2008. DVD. Featured caption: “We can nego–”

Source: Paul, 2011. DVD. Featured captions: (PAUL THE DOG BARKS), “YOUNG TARA: Go on then, Paul.”

Source: Unknown, 2011. DVD. Featured caption: [GINA SCREAMS].

7. Captions distill.

The soundscape is often pared down to its essential elements in the caption track. Only the most significant sounds are represented. Exceptions abound, as when ambient PA announcements are overcaptioned as verbatim speech. But for the most part, ambient sounds tend to be reduced to single captions or not captioned at all. Music is distilled to a simple description and/or captioned music lyrics. Captions reconstruct the narrative as a series of elemental sounds. This process also transforms sustained sounds – instrumental music, environmental noise, ambient sounds – into discrete, one-off captions. Consider a tense scene in Terminator 3 (2003) in which the evil terminator (Kristanna Loken) has broken into a veterinary clinic looking to kill the vet, Kate Brewster (Claire Danes). As Kate confronts John Connor (Nick Stahl), whom she has trapped in a dog cage in one of the exam rooms, the commotion in other areas of the clinic is reduced to a series of elemental sounds/captions: [GLASS BREAKING], [DOGS BARKING], [DOGS BARKING], [WOMAN SCREAMS], [GUNSHOTS], [GASPING]. In this example, the captions construct a narrative out of key sounds: the terminator breaks a window to gain entry to the clinic, the dogs react, a customer screams before being shot, and Kate gasps when she sees the customer’s body fall. These are the essential moments of the scene, each of which is mapped onto a corresponding caption.

Source: Terminator 3: Rise of the Machines, 2004. SyFy Channel. Featured caption: [GUNSHOTS].

Table 1.1. Non-speech captions from a single movie displayed in table format.

A table showing the first twenty search results for the non-speech DVD captions in Lincoln. The table has five columns: Caption number, search result number, caption start time, caption end time, and caption text. The hyperlinks in the first column (which are inactive here) take users to the location in the full caption file where that caption appears. Touchstone Pictures, 2012.

Caption Number | No. | Start Time | End Time | Text
1 | 1 | 00:01:19,184 | 00:01:20,845 | (THUNDER RUMBLING)
2 | 2 | 00:01:32,130 | 00:01:33,620 | (MEN CLAMORING)
3 | 3 | 00:01:41,206 | 00:01:43,197 | (YELLING)
4 | 4 | 00:01:46,411 | 00:01:48,504 | (SCREAMING)
5 | 5 | 00:01:59,524 | 00:02:00,684 | (GROANING)
59 | 6 | 00:04:12,824 | 00:04:15,054 | My last barber hanged himself.
62 | 7 | 00:04:20,832 | 00:04:22,129 | (CHUCKLES)
70 | 8 | 00:04:36,414 | 00:04:39,383 | Yeah. We heard you speak…
86 | 9 | 00:05:03,207 | 00:05:04,401 | (STAMMERING)
146 | 10 | 00:08:24,175 | 00:08:25,506 | (LINCOLN SIGHS)
169 | 11 | 00:09:43,421 | 00:09:44,888 | -(DOOR CREAKING)
175 | 12 | 00:10:06,677 | 00:10:07,644 | (DOOR OPENS)
176 | 13 | 00:11:23,888 | 00:11:25,116 | (KISSES)
183 | 14 | 00:11:50,614 | 00:11:52,275 | (PLAYING A MARCH)
184 | 15 | 00:12:02,092 | 00:12:03,286 | (MUSIC STOPS)
190 | 16 | 00:12:42,533 | 00:12:43,932 | (AUDIENCE LAUGHING)
191 | 17 | 00:12:44,034 | 00:12:45,126 | (AUDIENCE CLAPPING)
192 | 18 | 00:12:45,269 | 00:12:47,567 | ALL: (SINGING) We are coming, Father Abraham
204 | 19 | 00:13:15,499 | 00:13:16,488 | (LAUGHS) “Only twenty?”
237 | 20 | 00:14:33,444 | 00:14:35,378 | -It’s too important.
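
A listing like Table 1.1 could be produced along the following lines. This is only a minimal sketch, not the method actually used to build the table. It assumes the caption file follows the common SubRip (.srt) layout, which matches the comma-millisecond timestamps shown above, and it assumes that a non-speech caption is any caption containing text in parentheses or brackets. The names `Caption`, `parse_srt`, and `non_speech` are hypothetical, and caption 189 in the sample is an invented placeholder; captions 190 and 191 are taken from the table.

```python
import re
from typing import List, NamedTuple


class Caption(NamedTuple):
    number: int
    start: str
    end: str
    text: str


# Assumption: a non-speech caption contains a description
# in (parentheses) or [brackets] somewhere in its text.
NON_SPEECH = re.compile(r"\([^)]+\)|\[[^\]]+\]")


def parse_srt(srt_text: str) -> List[Caption]:
    """Split SubRip-style text into captions (blocks separated by blank lines)."""
    captions = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks rather than guess at their content
        start, _, end = lines[1].partition("-->")
        # Join wrapped caption lines with a space so each caption is one row.
        text = " ".join(lines[2:])
        captions.append(Caption(int(lines[0]), start.strip(), end.strip(), text))
    return captions


def non_speech(captions: List[Caption]) -> List[Caption]:
    """Keep only captions that contain a parenthetical or bracketed description."""
    return [c for c in captions if NON_SPEECH.search(c.text)]


# Caption 189 is a hypothetical placeholder; 190 and 191 come from Table 1.1.
SAMPLE = """\
189
00:12:40,000 --> 00:12:42,000
Hypothetical line of dialogue.

190
00:12:42,533 --> 00:12:43,932
(AUDIENCE LAUGHING)

191
00:12:44,034 --> 00:12:45,126
(AUDIENCE CLAPPING)
"""

for result_no, c in enumerate(non_speech(parse_srt(SAMPLE)), start=1):
    print(c.number, result_no, c.start, c.end, c.text)
```

Some rows in Table 1.1 (captions 59, 70, and 237) show no parenthetical text, so the search behind the table evidently used criteria that this simple pattern does not capture.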

Table 1.2. Search results for “indistinct chatter” displayed in table format.

A table showing the first nineteen search results for “indistinct chatter” using all the movies in the corpus. The table has six columns: Caption number, search result number, movie source, caption start time, caption end time, and caption text. The hyperlinks in the first column (which are inactive here) take users to the location in the full caption file where that caption appears.

Caption Number | No. | Source | Start Time | End Time | Text
1167 | 1 | 21 Jump Street – 2012 | 00:50:06,684 | 00:50:08,151 | (INDISTINCT CHATTER)
511 | 2 | Aliens v Predator – Requiem – 2007 | 00:38:34,700 | 00:38:38,329 | (indistinct chatter in distance)
777 | 3 | Aliens v Predator – Requiem – 2007 | 00:56:57,4026 | 00:57:00,530 | (indistinct chatter in distance)
1271 | 4 | Argo – 2012 | 00:01:46,411 | 00:01:48,504 | (SCREAMING)
388 | 5 | Beasts of the Southern Wild – 2012 | 00:43:19,677 | 00:43:21,645 | [INDISTINCT CHATTER]
97 | 6 | CSI-NY-Unspoken – 2012 | 00:13:37,346 | 00:13:38,973 | (muffled, indistinct chatter)
140 | 7 | CSI-NY-Unspoken – 2012 | 00:20:10,005 | 00:20:11,996 | (indistinct chatter, phones ringing)
1127 | 8 | Cloud Atlas – 2012 | 01:16:19,820 | 01:16:21,549 | [INDISTINCT CHATTER OVER DEVICE]
1153 | 9 | Django Unchained – 2012 | 01:28:11,577 | 01:28:12,771 | [indistinct chatter]
966 | 10 | Inglourious Basterds – 2009 | 01:55:47,387 | 01:55:48,877 | (INDISTINCT CHATTERING)
57 | 11 | Killing Them Softly – 2012 | 00:04:11,121 | 00:04:12,452 | [indistinct chatter]
148 | 12 | Killing Them Softly – 2012 | 00:08:27,043 | 00:08:28,032 | [indistinct chatter]
178 | 13 | Killing Them Softly – 2012 | 00:09:50,793 | 00:09:52,021 | [indistinct chatter]
284 | 14 | Killing Them Softly – 2012 | 00:15:51,120 | 00:15:53,953 | – [indistinct chatter, laughter] – Go, go, go, go, go.
1217 | 15 | Killing Them Softly – 2012 | 01:21:18,536 | 01:21:20,697 | [indistinct chatter]
1331 | 16 | Killing Them Softly – 2012 | 01:33:05,709 | 01:33:08,678 | [indistinct chatter]
1335 | 17 | Killing Them Softly – 2012 | 01:34:14,645 | 01:34:18,638 | [indistinct chatter]
1345 | 18 | Killing Them Softly – 2012 | 01:36:14,431 | 01:36:17,423 | [indistinct chatter, birds chirping]
982 | 19 | Les Miserables – 2012 | 01:10:52,228 | 01:10:54,093 | (INDISTINCT CHATTERING)
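
A corpus-wide phrase search of the kind summarized in Table 1.2 might look roughly like the sketch below. It is an illustration under stated assumptions, not the tool that generated the table. The corpus is assumed to be a mapping from source title to caption rows. Matching is assumed to be a case-insensitive substring test, so that (INDISTINCT CHATTERING) also counts as a hit for “indistinct chatter,” as it does in the table. The names `to_ms`, `search`, and `CORPUS` are hypothetical; the sample rows are copied from the table. The strict timestamp check is an added design choice: it would reject a malformed value such as the 00:56:57,4026 start time listed for result 3.

```python
import re
from typing import Dict, List, Tuple

Row = Tuple[int, str, str, str]  # (caption number, start, end, text)

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_ms(ts: str) -> int:
    """Convert an HH:MM:SS,mmm timestamp to milliseconds; reject anything else."""
    m = TIMESTAMP.fullmatch(ts)
    if m is None:
        raise ValueError(f"malformed timestamp: {ts!r}")
    hours, minutes, seconds, millis = map(int, m.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def search(corpus: Dict[str, List[Row]], phrase: str) -> List[tuple]:
    """Return (result no., caption no., source, start, end, text, duration in ms)."""
    needle = phrase.casefold()
    hits = []
    for source in sorted(corpus):  # alphabetical by source title
        for number, start, end, text in corpus[source]:
            flat = " ".join(text.split())  # collapse line breaks inside a caption
            if needle in flat.casefold():
                duration = to_ms(end) - to_ms(start)
                hits.append((len(hits) + 1, number, source, start, end, flat, duration))
    return hits


# Sample rows copied from Table 1.2; the dictionary layout itself is an assumption.
CORPUS: Dict[str, List[Row]] = {
    "Inglourious Basterds – 2009": [
        (966, "01:55:47,387", "01:55:48,877", "(INDISTINCT CHATTERING)"),
    ],
    "21 Jump Street – 2012": [
        (1167, "00:50:06,684", "00:50:08,151", "(INDISTINCT CHATTER)"),
    ],
    "CSI-NY-Unspoken – 2012": [
        (97, "00:13:37,346", "00:13:38,973", "(muffled, indistinct chatter)"),
        (140, "00:20:10,005", "00:20:11,996", "(indistinct chatter,\nphones ringing)"),
    ],
}

for hit in search(CORPUS, "indistinct chatter"):
    print(*hit, sep=" | ")
```

The duration column is not part of Table 1.2. It is included here only to show how long each caption stays on screen, which is easy to derive once the timestamps are parsed.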

