Tracking sonic timelines in closed captioning

Verbs are the heart of nonspeech captions, especially when paralinguistic sounds are involved (grunting, laughing, crying, etc.), because captioning nonspeech is fundamentally about representing and embodying action (which is what verbs do).

Note, first, the distinction between discrete and sustained sounds. Nonspeech sounds that have a clear beginning and end are discrete or one-off sounds (e.g., a single cough or a short grunt). Discrete sounds can be indicated in nonspeech captions using the third-person singular present tense form of the verb (-s or -es ending):

(LAUGHS)
(GRUNTS)
(SIGHS)
(SHOUTS)
etc.

Sustained sounds are conventionally indicated using the present participle form of the verb (-ing ending), the form used to build the continuous tenses. The -ing ending can convey a state of ongoing action, the repetition of a nonspeech sound (such as a fit of coughing), ambient sounds, or multiple, overlapping instances of a type of action (such as many people clamoring or shouting).

(LAUGHING)
(GRUNTING)
(SIGHING)
(SHOUTING)
etc.

Both discrete and sustained sounds/verbs can be qualified to the left with nouns, as in (MAN SHOUTS), and to the right with adverbials, as in (SOBS LOUDLY) and (BARKING IN THE DISTANCE). (For more on building paralinguistic nonspeech captions, see Chapter 2 of my book, Reading Sounds.)
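The verb-form convention can be approximated in code. The following is a minimal Python sketch, not a captioning tool. The function name classify_caption is hypothetical, and the rule it applies is a crude assumption: any word ending in -ing marks a sustained sound, and otherwise a word ending in -s marks a discrete one. It ignores plural nouns, irregular verbs, and stop captions such as (beeping stops).

```python
def classify_caption(caption: str) -> str:
    """Guess whether a nonspeech caption is 'sustained' or 'discrete'.

    Hypothetical helper: it only looks at word endings, so it is a rough
    illustration of the convention rather than a reliable classifier.
    """
    # Drop the surrounding parentheses or brackets and normalize case.
    words = caption.strip("()[] ").upper().split()

    # A present participle anywhere in the caption wins: [LOIS GRUNTING]
    # is sustained even though the name LOIS happens to end in S.
    if any(word.endswith("ING") for word in words):
        return "sustained"

    # Otherwise treat an -s/-es ending as the third-person singular verb.
    if any(word.endswith("S") and not word.endswith("SS") for word in words):
        return "discrete"

    return "unknown"


for caption in ["(LAUGHS)", "[SAITO COUGHING]", "[CLARK GRUNTS]", "(SOBS LOUDLY)"]:
    print(caption, "->", classify_caption(caption))
```

Checking for -ing first is a deliberate choice in this sketch: it keeps a qualifying noun on the left from being mistaken for the verb.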

Here are two quick examples that make use of both discrete and sustained nonspeech captions. Both of these examples are discussed in my book:

From Inception (2010): A single cough followed by a fit of coughing. The former is a single, discrete sound captioned with the simple present tense as [COUGHS], while the latter is a continuous, repetitive sound captioned with the present participle as [SAITO COUGHING].

Source: Inception, 2010. DVD. Featured captions: [COUGHS], [SAITO COUGHING]

From Man of Steel (2013): [LOIS GRUNTING] is immediately followed in the caption track by [CLARK GRUNTS]. The captioner follows convention: Lois’ multiple vocalizations (including panting) are contrasted with Clark’s single, discrete grunt.

Source: Man of Steel, 2013. DVD. Featured captions: [LOIS GRUNTING], [CLARK GRUNTS]

Captioners don’t always adhere to this basic distinction between discrete and sustained. But more importantly, the distinction itself is deeply contextual and even arbitrary at times. While [COUGHS] and [COUGHING] seem clearly distinguishable on the basis of the one vs. the many (one discrete cough vs. a fit of coughing), [LAUGHS] and [LAUGHING] overlap quite a bit in their meanings. Complicating the distinction further: the -ing ending may simply indicate that a sound endures beyond a short burst — a long as opposed to short sigh, for example. Or it may involve multiple participants — one person laughs vs. multiple people laughing. Even so, I believe the distinction between discrete and sustained captions remains a useful one in closed captioning.

To keep sustained sounds going, so to speak, after they have disappeared from the caption track, they may be reinforced and reiterated with “continues” captions:

Source: Skyfall, 2012. DVD. Featured caption: (DOGS CONTINUE BARKING). This caption is paired with (DOGS BARKING IN DISTANCE), which precedes the “continue” caption by eight seconds.

Source: Skyfall, 2012. DVD. Featured caption: (TELEGRAPH CONTINUES CLICKING). This caption is paired with (CLICKING), which precedes the “continues” caption by thirty-eight seconds.

And sustained sounds may be terminated with “stop” captions:

Source: Aliens vs. Predator: Requiem, 2007. DVD. Featured caption: (beeping stops). This caption is immediately preceded by (steady beeping) and (beeping accelerates).

But stop captions are not required if the visual context makes it clear that a sustained sound has stopped. For example, (PHONE RINGING) may be stopped not by a closed caption but by someone visibly answering the phone. (For more on continues and stop captions, see Chapter 6 of my book, Reading Sounds.)

Each sustained sound in the caption track creates a sonic timeline that persists until it is terminated through a change in visual context or a stop caption. Multiple timelines may co-exist, with sustained sounds/captions building on each other. Sound is simultaneous, and one way of creating simultaneity on the caption track is by layering up sustained sounds.
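The idea of a timeline that stays open until something closes it can be sketched as a small data structure. This is only an illustration under stated assumptions: the class SonicTimelines and its start and stop methods are hypothetical names, a visual cue is modeled with the same stop call as a stop caption, and the two-caption sequence at the end is invented for the example rather than taken from a film.

```python
class SonicTimelines:
    """Track which sustained sounds a reader should assume are still active.

    Hypothetical model: each sustained caption opens a timeline, and the
    timeline stays open until a stop caption or a visual cue closes it.
    """

    def __init__(self):
        # Sustained sounds in the order they were introduced.
        self.active = []

    def start(self, sound):
        """Open a timeline for a sustained caption such as (PHONE RINGING)."""
        if sound not in self.active:
            self.active.append(sound)

    def stop(self, *sounds):
        """Close one or more timelines (stop caption or visual cue)."""
        for sound in sounds:
            if sound in self.active:
                self.active.remove(sound)


timelines = SonicTimelines()
timelines.start("PHONE RINGING")
timelines.start("DOGS BARKING")
timelines.stop("PHONE RINGING")  # e.g. someone visibly answers the phone
print(timelines.active)  # ['DOGS BARKING']
```

Nothing in this model expires on its own, which mirrors the claim that a sustained sound remains in force after its caption has left the screen.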

Let me suggest, tentatively, how this might play out in a specific example:

Source: Avatar, 2009. DVD. Featured caption: (CHATTERING AND SINGING STOP). This caption is paired with (VILLAGERS CHATTERING) and (SINGING IN NA’VI). Note the layering up of multiple, sustained sounds in separate captions (chattering, singing) that are simultaneously stopped with a single caption.

In this scene from Avatar (2009), sustained sounds are added to the captioned landscape one-by-one:

(VILLAGERS CHATTERING) — 08.20-10.16
(DRUMS BEATING) — 10.21-11.16
(SINGING IN NA’VI) — 12.21-13.66

The present participle (-ing verb ending) suggests ongoing action. The villagers don’t stop chattering when the drums start beating, just as the drums continue when some of the villagers start singing. Or, more likely, all three sustained sounds occur simultaneously despite being presented to us sequentially. As the camera moves down and then drops in behind Jake Sully and Neytiri as they arrive at the tribal congregation, we gain progressive access to each successive layer of sound: chattering, drumming, singing. We know these sounds are meant to persist even after their captions have left the screen because they are later explicitly stopped in the caption track:

(CHATTERING AND SINGING STOP) — 20.72-22.17

Or rather, two of the sustained sounds are terminated. What happened to (DRUMS BEATING), which was also on a sustained timeline? Should we assume that the drums kept on beating even after the chattering and singing stopped? That’s a possibility. But I think we are supposed to assume that everything stops abruptly as the villagers turn to watch the arrival of the outsider, Jake Sully. Given the continuous nature of sustained nonspeech captions, a better caption might have been (chattering and music stop), where “music” would account for both drums and singing.
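The open drum timeline can be made visible by replaying the four captions from this scene in order. The sketch below is a hedged illustration, not a description of how any captioning software works: the function open_timelines, the "start" and "stop" labels, and the short sound names are all assumptions, and the caption start times are read as seconds.

```python
# Caption events from the Avatar scene, using the start times given in the text.
# Each entry is (start_time, action, sounds). The action labels and the
# short sound names are assumptions made for this sketch.
events = [
    (8.20, "start", ["chattering"]),              # (VILLAGERS CHATTERING)
    (10.21, "start", ["drums"]),                  # (DRUMS BEATING)
    (12.21, "start", ["singing"]),                # (SINGING IN NA'VI)
    (20.72, "stop", ["chattering", "singing"]),   # (CHATTERING AND SINGING STOP)
]


def open_timelines(events):
    """Return the sustained sounds still open after replaying all events."""
    active = []
    for _time, action, sounds in sorted(events, key=lambda event: event[0]):
        for sound in sounds:
            if action == "start" and sound not in active:
                active.append(sound)
            elif action == "stop" and sound in active:
                active.remove(sound)
    return active


print(open_timelines(events))  # ['drums']
```

On this reading of the caption track, the drums are the one timeline left open after the stop caption. A check of this kind is one way a captioner might keep tabs on sustained captions that were never terminated.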

Towards the end of the clip, two of the sonic timelines are restarted: (CHATTERING AND SINGING RESUME). Should the drums have been restarted as well? The drums can be heard more clearly at the end of the clip but the captioned drums were never terminated to begin with and seem to have been forgotten.

Let me conclude with a playful visual representation of how the sustained nonspeech sounds in this scene persist to create multiple layers of meaning.

Source: Avatar, 2009. DVD. Textual animations created by the author in Adobe After Effects.

The textual animations I produced for this clip are not my alternative to closed captions (I need to be clear about that!) but rather a small attempt to visualize sustained meanings and to suggest how much cognitive effort may be required of readers to keep track of multiple captions in short-term memory. This visualization also brings out the ambiguity of Jake’s (STAMMERING), a manner of speaking identifier. Readers may assume, incorrectly in this case, that Jake continues to stammer through multiple speech captions, because the present participle form suggests ongoing action but also because that’s how manner identifiers sometimes function. A single manner identifier will sometimes be responsible for inflecting a speaker’s speech across multiple speech captions. In this case, Jake doesn’t seem to be stammering through his lines even though the single manner identifier suggests as much. (See Chapter 8 of my book, Reading Sounds, for more information on manner of speaking identifiers.)

Here’s the larger point: If each present participle inaugurates a sonic timeline, then captioners need to keep tabs on all of the sustained captions. Sustained sounds can persist on the caption track beyond their allotted screen time. Every sustained sound in the caption track should be assumed to be ongoing and active until otherwise noted via a stop caption or through visual/contextual clues. Visual cues are often sufficient to indicate when a sustained sound has stopped…but not always.