Chapter 1: A rhetorical view of captioning

This supplemental website includes all of the media clips discussed in Reading Sounds (University of Chicago Press, 2015).

Four new principles of closed captioning

Every sound cannot be closed captioned.
Captioners must decide which sounds are significant.
Captioners must rhetorically invent and negotiate the meaning of the text.
Captions are interpretations.

Figure 1.1. Captioners offer interpretations within the constraints of time and space.

A clip from 21 Jump Street shows police officers Schmidt (Jonah Hill) and Jenko (Channing Tatum), dressed in black uniforms with matching black shorts and helmets, riding their police bicycles side-by-side on the park grass. In the featured frame, the bike cops are heading straight for the viewer. A small red light is visible on each bicycle’s handlebars. The cops are pedaling to confront a small biker gang smoking pot on the other side of the park. Pounding rock music (uncaptioned) accompanies the pursuit but cuts out momentarily to call attention to the faint bicycle sirens, which sound like children’s toys. The sirens are captioned as (SIRENS WHOOPING SOFTLY), which is supposed to capture the ridiculousness of the scene. Packed into this single caption, then, is the reminder that these are not real cops because real cops would be burning rubber in a patrol car and blaring their sirens. Columbia Pictures, 2012. Blu-Ray.

Source: 21 Jump Street, 2012. Blu-Ray. Featured caption: (SIRENS WHOOPING SOFTLY).

Figure 1.2. All dog sounds are not created equal.

The top row contains two frames from an episode of Grimm (2011, Episode 1.1, NBC). In the top left frame, Nick (David Giuntoli) is shown in profile walking at night on a suburban street. A home in the background is lit by porch light. Large trees provide an ominous backdrop. The caption, [dog barking], is more than a stock sound to provide suburban ambience. A few seconds later in this scene, the same dog seems to be suffering, drawing the attention of Nick, who turns to face the camera in the top right frame. The accompanying caption is: [dog yelps, whines, goes silent]. The bottom row contains two frames from Extract (2009, Ternion Pictures), both of which are taken during a dinner table scene at night. In the bottom left frame, Joel (Jason Bateman) and Suzie (Kristen Wiig) are eating at their dining table with some unknown [DOG BARKING IN DISTANCE]. In the bottom right frame, Suzie stares blankly after Joel walks away from the table upset. The accompanying caption: [CRICKETS CHIRPING]. The dog barking in Extract is part of a stock soundscape that includes crickets chirping, whereas the dog sounds are an integral element of the horror storyline in the Grimm episode. The animal and insect captions in Extract end up intruding into the serious dinner discussion. TV source: Extract rebroadcast on Comedy Central and Grimm rebroadcast on the Syfy channel.

Source: Grimm, Episode 1.1, 2011. SyFy Channel. Featured caption: [dog barking].

Source: Extract, 2009. Comedy Central. Featured captions: [DOG BARKING IN DISTANCE] and [CRICKETS CHIRPING].

Figure 1.3. Where does distinct speech shade off into indistinct chatter?

In this clip from Avatar, Jake Sully (Sam Worthington), inhabiting his Na’vi avatar body, is shown in a mid-shot looking slightly off-camera to the viewer’s right. Jake has just wandered away from the scientists Grace (Sigourney Weaver) and Norm (Joel Moore), who are busy taking plant root samples. Captions create a clear line between distinct speech and indistinct background chatter, even though, sonically speaking, the dividing line is not always quite so obvious. Chattering is also a popular option for describing indistinct crowd noise, but captioners need to be mindful of the term’s gendered implications. The conversations of women have at times been described dismissively as chattering. Caption: (GRACE CONTINUES CHATTERING). Twentieth Century Fox, 2009. DVD.

Source: Avatar, 2009. DVD. Featured caption: (GRACE CONTINUES CHATTERING).

1. Captions contextualize.

Captioning is about meaning, not sound per se. Captions don’t describe sounds so much as convey the purpose and meaning of sounds in specific contexts. The meaning of a sound in a particular context may transcend its origins (Chapter 3). The precise sonic qualities of a squeaky water tap may be less significant than the act of turning the tap off: (TURNS TAP OFF). In such cases, the action trumps the sound. Additional examples include: [TURNS OFF RADIO], [ unbuckles seat belt ], [BLADE PULLS FREE], [ Snaps Oscar’s Neck ], and [HITS CYMBAL]. Onomatopoeia has a role to play in captioning, but it must be used with care and in situations where the visual context clearly informs the meaning of the captions.

Source: Young Doctor’s Notebook, Episode 1.1, 2012. Ovation Network. Featured caption: (TURNS TAP OFF).

Source: Inception, 2010. DVD. Featured caption: [TURNS OFF RADIO].

Source: Branded, 2012. Netflix. Featured caption: [ unbuckles seat belt ].

Source: Zombie Apocalypse, 2011. SyFy Channel. Featured captions: [BLADE STRIKES] and [BLADE PULLS FREE].

Source: The Faculty, 1998. DVD. Featured caption: [ Snaps Oscar’s Neck ].

Source: The Wedding Singer, 1998. FX Network. Featured captions: [HITS CYMBAL] and [STRUMS GUITAR].

Source: Family Guy, “Finders Keepers,” 2013. Netflix. Featured caption: (RINGING DOORBELL FRANTICALLY).

Source: Paul, 2011. DVD. Featured captions: (MIMICKING SPACESHIP FLYING) and (MIMICKING SPACESHIP DISAPPEARING).

2. Captions clarify.

Captions tell us which sounds are important, what people are saying, and what non-speech sounds mean. As a hearing viewer, I continually find myself relying on captions to learn characters’ names and apprehend unusual words such as “flobberworms.” (So that’s what Peter Pettigrew just said in the background of the Harry Potter movie!) Reading provides superior access over listening, particularly when a noisy environment may work against the listener’s ability to make out clearly what people are saying. The same goes for music lyrics that are transcribed on the screen for easy reading, as lyrics are famous for being misinterpreted by hearing fans.

Source: Harry Potter and the Prisoner of Azkaban, 2004. DVD. Featured words in this clip that become clear to listeners when transcribed: Lupin, Patronus, dementor.

Source: Iggy Azalea’s music video for “Fancy,” 2014. VH1. Music lyrics captioned, including: ♪ TAKIN’ ALL THE LIQUOR STRAIGHT, NEVER CHASE THAT ♪

Source: Shaun of the Dead, 2004. DVD. Music lyrics captioned, including: ♪ That’s why they call me Mr. Fahrenheit ♪

3. Captions formalize.

Captions tend to be presented in standard written English, with information about manner of speaking relegated to identifiers such as (drunken slurring). Nothing else about the speech will mark it as inflected or accented (e.g. drunk) except for a lone identifier at the beginning of the first speech caption. While standard English provides the fastest access to information, it comes at the expense of conveying the embodied aspects of speech. Embodiment is carried almost entirely by manner of speaking identifiers or simple phonetic transformations (e.g. gonna, can’t). While it is easy to find examples of substandard or phonetic spellings in speech captions, even these examples are informed by a desire to make the captions as fast to read as possible. Phonetic transcriptions are rhetorical insofar as they balance accuracy with accessibility. In this way, we might say that captions rationalize the teeming soundscape. Sounds that resist easy classification or simple description, such as mood music, are tamed or ignored altogether.

Source: The Internship, 2013. DVD. Featured caption: (SLURRING) Are you shitting me?

Source: Moonrise Kingdom, 2012. DVD. Featured caption: (SLOWLY) Do not cross this stick.

Source: Galaxy Quest, 1999. DVD. Featured caption: [Yelling In Slow Motion] NO!

4. Captions equalize.

Every sound tends to play at the same “volume” on the caption track. While there are ways of modulating the volume of captioned sounds and differentiating background from foreground sounds in the captions, these ways are limited and space-consuming. As a result, every sound tends to occupy the same sonic plane, making every sound equally “loud.”

Source: The Happening, 2008. DVD. Featured caption is an example of backchannel speech sounds that come forward when captioned: “[Man] I just walked down a quarter mile. It was clean.”

5. Captions linearize.

Sounds that are heard simultaneously cannot be read simultaneously. Captions linearize sound by presenting the soundscape in a form that can be read one sound/caption at a time. Although it is unusual, multiple non-speech parentheticals can be presented on the screen at the same time. Multiple sounds can also occupy the same caption – see, for example, District 9’s (2009) [ALIEN GROWLS AND PEOPLE SHOUTING INDISTINCTLY] and [RAPID GUNFIRE AND MEN SHOUTING IN DISTANCE]. Multiple, simultaneous sounds can also be reduced to single captions such as [overlapping chatter] and [overlapping shouts] from Silver Linings Playbook (2012). But simultaneous sounds must still be read one at a time. The caption reader thus experiences the film soundscape as a series of individual captions.

Source: District 9, 2009. DVD. Featured caption: [ALIEN GROWLS AND PEOPLE SHOUTING INDISTINCTLY].

Source: District 9, 2009. DVD. Featured caption: [RAPID GUNFIRE AND MEN SHOUTING IN DISTANCE].

Source: Silver Linings Playbook, 2012. DVD. Featured caption: [overlapping shouts].

Source: Aliens vs. Predator: Requiem, 2007. DVD. Featured captions: (low hissing, growling), (whimpering, crying).

6. Captions time-shift.

Viewers do not necessarily read at the same rate as characters speak. Speech captions don’t always start precisely on the first beat of the utterance being captioned. The same is true for non-speech captions, which may precede or follow the sounds being captioned. I devote Chapter 5 to exploring some of the ways in which captions give advance notice to readers. Even something as seemingly innocuous as a dash at the end of a caption can alert caption readers to a forthcoming interruption in speech. Names in non-speech captions can also give away plot details. For example, when [GINA SCREAMS] in Unknown (2011), caption readers can guess that Gina is more than an insignificant taxi driver. Readers not only learn the taxi driver’s name before listeners do but also venture a guess that Gina will return later in the narrative. I coin the term “captioned irony” – adapting the concept of dramatic irony – to describe cases in which caption readers know more or sooner than listeners who are watching with the captions turned off.

Source: Taken, 2008. DVD. Featured caption: “We can nego–”

Source: Paul, 2011. DVD. Featured captions: (PAUL THE DOG BARKS), “YOUNG TARA: Go on then, Paul.”

Source: Unknown, 2011. DVD. Featured caption: [GINA SCREAMS].

7. Captions distill.

The soundscape is often pared down to its essential elements in the caption track. Only the most significant sounds are represented. Exceptions abound, as when ambient PA announcements are overcaptioned as verbatim speech. But for the most part, ambient sounds tend to be reduced to single captions or not captioned at all. Music is distilled to a simple description and/or captioned music lyrics. Captions reconstruct the narrative as a series of elemental sounds. This process also transforms sustained sounds – instrumental music, environmental noise, ambient sounds – into discrete, one-off captions. Consider a tense scene in Terminator 3 (2003) in which the evil terminator (Kristanna Loken) has broken into a veterinary clinic looking to kill the vet, Kate Brewster (Claire Danes). As Kate confronts John Connor (Nick Stahl), whom she has trapped in a dog cage in one of the exam rooms, the commotion in other areas of the clinic is reduced to a series of elemental sounds/captions: [GLASS BREAKING], [DOGS BARKING], [DOGS BARKING], [WOMAN SCREAMS], [GUNSHOTS], [GASPING]. In this example, the captions construct a narrative out of key sounds: the terminator breaks a window to gain entry to the clinic, the dogs react, a customer screams before being shot, and Kate gasps when she sees the customer’s body fall. These are the essential moments of the scene, each of which is mapped onto a corresponding caption.

Source: Terminator 3: Rise of the Machines, 2004. SyFy Channel. Featured caption: [GUNSHOTS].

Table 1.1. Non-speech captions from a single movie displayed in table format.

A table showing the first twenty search results for the non-speech DVD captions in Lincoln. The table has five columns: Caption number, search result number, caption start time, caption end time, and caption text. The hyperlinks in the first column (which are inactive here) take users to the location in the full caption file where that caption appears. Touchstone Pictures, 2012.

Caption Number	No.	Start Time	End Time	Text
1	1	00:01:19,184	00:01:20,845	(THUNDER RUMBLING)
2	2	00:01:32,130	00:01:33,620	(MEN CLAMORING)
3	3	00:01:41,206	00:01:43,197	(YELLING)
4	4	00:01:46,411	00:01:48,504	(SCREAMING)
5	5	00:01:59,524	00:02:00,684	(GROANING)
59	6	00:04:12,824	00:04:15,054	My last barber hanged himself.
59	6	00:04:12,824	00:04:15,054	(CHUCKLES)
62	7	00:04:20,832	00:04:22,129	(CHUCKLES)
70	8	00:04:36,414	00:04:39,383	Yeah. We heard you speak…
70	8	00:04:36,414	00:04:39,383	(STAMMERING) Goddamn.
86	9	00:05:03,207	00:05:04,401	(STAMMERING)
146	10	00:08:24,175	00:08:25,506	(LINCOLN SIGHS)
169	11	00:09:43,421	00:09:44,888	-(DOOR CREAKING)
169	11	00:09:43,421	00:09:44,888	-Oh!
175	12	00:10:06,677	00:10:07,644	(DOOR OPENS)
176	13	00:11:23,888	00:11:25,116	(KISSES)
183	14	00:11:50,614	00:11:52,275	(PLAYING A MARCH)
184	15	00:12:02,092	00:12:03,286	(MUSIC STOPS)
190	16	00:12:42,533	00:12:43,932	(AUDIENCE LAUGHING)
191	17	00:12:44,034	00:12:45,126	(AUDIENCE CLAPPING)
192	18	00:12:45,269	00:12:47,567	ALL: (SINGING)
192	18	00:12:45,269	00:12:47,567	We are coming, Father Abraham
204	19	00:13:15,499	00:13:16,488	(LAUGHS) “Only twenty?”
237	20	00:14:33,444	00:14:35,378	-It’s too important.
237	20	00:14:33,444	00:14:35,378	-(KNOCKING ON DOOR)
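
A search like the one behind Table 1.1 can be approximated in a few lines of Python. The following is only a sketch under stated assumptions, not the tool used for the book: it assumes the caption file is in SubRip (SRT) format, which the timestamps in the table suggest, and it treats any caption containing a parenthetical or bracketed description as a non-speech hit. The names NON_SPEECH, parse_srt, and non_speech_results are hypothetical, and caption 58 in the sample is invented to show a speech-only caption being skipped.

```python
import re

# A parenthetical or bracketed description, e.g. (THUNDER RUMBLING) or [dog barking].
NON_SPEECH = re.compile(r"[(\[][^)\]]+[)\]]")

# Captions 1 and 59 are copied from Table 1.1; caption 58 is hypothetical.
SAMPLE = """1
00:01:19,184 --> 00:01:20,845
(THUNDER RUMBLING)

58
00:04:10,000 --> 00:04:12,000
Hypothetical speech-only caption.

59
00:04:12,824 --> 00:04:15,054
My last barber hanged himself.
(CHUCKLES)
"""

def parse_srt(srt_text):
    """Yield (caption_number, start, end, lines) for each SRT block."""
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        number = int(lines[0].strip())
        start, end = [t.strip() for t in lines[1].split("-->")]
        yield number, start, end, lines[2:]

def non_speech_results(srt_text):
    """Return rows of (caption number, result number, start, end, text).

    A caption counts as one search result if any of its lines contains a
    non-speech description; every line of that caption becomes a row.
    """
    rows = []
    result_no = 0
    for number, start, end, lines in parse_srt(srt_text):
        if any(NON_SPEECH.search(line) for line in lines):
            result_no += 1
            for line in lines:
                rows.append((number, result_no, start, end, line))
    return rows

for row in non_speech_results(SAMPLE):
    print("\t".join(str(field) for field in row))
```

As in the table, a two-line caption such as number 59 yields two rows that share one search result number, so the speech line appears alongside the (CHUCKLES) description that triggered the match.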

Table 1.2. Search results for “indistinct chatter” displayed in table format.

A table showing the first nineteen search results for “indistinct chatter” using all the movies in the corpus. The table has six columns: Caption number, search result number, movie source, caption start time, caption end time, and caption text. The hyperlinks in the first column (which are inactive here) take users to the location in the full caption file where that caption appears.

Caption Number	No.	Source	Start Time	End Time	Text
1167	1	21 Jump Street – 2012	00:50:06,684	00:50:08,151	(INDISTINCT CHATTER)
511	2	Aliens v Predator – Requiem – 2007	00:38:34,700	00:38:38,329	(indistinct chatter in distance)
777	3	Aliens v Predator – Requiem – 2007	00:56:57,4026	00:57:00,530	(indistinct chatter in distance)
1271	4	Argo – 2012	00:01:46,411	00:01:48,504	(SCREAMING)
388	5	Beasts of the Southern Wild – 2012	00:43:19,677	00:43:21,645	[INDISTINCT CHATTER]
97	6	CSI-NY-Unspoken – 2012	00:13:37,346	00:13:38,973	(muffled, indistinct chatter)
140	7	CSI-NY-Unspoken – 2012	00:20:10,005	00:20:11,996	(indistinct chatter,
140	7	CSI-NY-Unspoken – 2012	00:20:10,005	00:20:11,996	phones ringing)
1127	8	Cloud Atlas – 2012	01:16:19,820	01:16:21,549	[INDISTINCT CHATTER OVER DEVICE]
1153	9	Django Unchained – 2012	01:28:11,577	01:28:12,771	[indistinct chatter]
966	10	Inglourious Basterds – 2009	01:55:47,387	01:55:48,877	(INDISTINCT CHATTERING)
57	11	Killing Them Softly – 2012	00:04:11,121	00:04:12,452	[indistinct chatter]
148	12	Killing Them Softly – 2012	00:08:27,043	00:08:28,032	[indistinct chatter]
178	13	Killing Them Softly – 2012	00:09:50,793	00:09:52,021	[indistinct chatter]
284	14	Killing Them Softly – 2012	00:15:51,120	00:15:53,953	– [indistinct chatter, laughter]
284	14	Killing Them Softly – 2012	00:15:51,120	00:15:53,953	– Go, go, go, go, go.
1217	15	Killing Them Softly – 2012	01:21:18,536	01:21:20,697	[indistinct chatter]
1331	16	Killing Them Softly – 2012	01:33:05,709	01:33:08,678	[indistinct chatter]
1335	17	Killing Them Softly – 2012	01:34:14,645	01:34:18,638	[indistinct chatter]
1345	18	Killing Them Softly – 2012	01:36:14,431	01:36:17,423	[indistinct chatter, birds chirping]
982	19	Les Miserables – 2012	01:10:52,228	01:10:54,093	(INDISTINCT CHATTERING)
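
A phrase search across the whole corpus, of the kind shown in Table 1.2, might look something like the sketch below. It is an illustration only: the corpus structure (a dictionary mapping each source title to a list of caption tuples), the function name search_corpus, and the case-insensitive substring match are all assumptions, not a description of the actual search tool. The matching rows in the sample are copied from the table; the non-matching 21 Jump Street caption uses placeholder numbers and times.

```python
# Hypothetical corpus: source title -> list of (caption number, start, end, text).
CORPUS = {
    "21 Jump Street – 2012": [
        # Placeholder number and times, not taken from the real caption file.
        (0, "00:00:00,000", "00:00:00,000", "(SIRENS WHOOPING SOFTLY)"),
        (1167, "00:50:06,684", "00:50:08,151", "(INDISTINCT CHATTER)"),
    ],
    "Cloud Atlas – 2012": [
        (1127, "01:16:19,820", "01:16:21,549", "[INDISTINCT CHATTER OVER DEVICE]"),
    ],
    "CSI-NY-Unspoken – 2012": [
        (97, "00:13:37,346", "00:13:38,973", "(muffled, indistinct chatter)"),
    ],
    "Inglourious Basterds – 2009": [
        (966, "01:55:47,387", "01:55:48,877", "(INDISTINCT CHATTERING)"),
    ],
}

def search_corpus(corpus, phrase):
    """Return rows of (caption number, result number, source, start, end, text)
    for every caption whose text contains the phrase, ignoring case."""
    needle = phrase.lower()
    rows = []
    result_no = 0
    for source in sorted(corpus):  # plain code-point ordering of titles
        for number, start, end, text in corpus[source]:
            if needle in text.lower():
                result_no += 1
                rows.append((number, result_no, source, start, end, text))
    return rows

for row in search_corpus(CORPUS, "indistinct chatter"):
    print("\t".join(str(field) for field in row))
```

Because the match is a simple substring test, a search for “indistinct chatter” also returns variants such as (INDISTINCT CHATTERING) and (muffled, indistinct chatter), which is consistent with the mix of results in the table. Sorting titles by code point places “CSI-NY-Unspoken” ahead of “Cloud Atlas,” the same order the table uses.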