‘To the boundary quicker than you can say “Waltzing Matilda”’: The language of cricket commentary

Test Match Special (TMS) is a long-running radio programme providing ball-by-ball coverage of England’s test cricket matches. The show is much loved for its light-hearted style and evocative language. One of its best-loved commentators, the late Christopher Martin-Jenkins, was described by his former producer as a ‘master wordsmith’:

‘He always had the knack of producing the perfect phrase for any significant moment, but perhaps even more impressively he had the ability to lift the mundane. A ball outside the off stump was “across the bows”. A drive through the covers for one was “stroked crisply”.’

In a recent article, the current TMS anchor Jonathan Agnew lauded the ‘great powers of description’ of fellow commentator Henry Blofeld, and compared the commentator’s role on the programme to that of a musician:

‘I liken the commentator to a soloist, with the crowd as the background orchestra. The crowd has a tone and pitch and you can set your voice against it to make it sound melodic, gentle, harmonious […]’

As a linguist, I’m interested in analysing, scientifically, the language of a show like Test Match Special. For example, can we see how the linguistic style of TMS compares to that used by other cricket broadcasters, or how it compares to the media’s coverage of other sports? Objectively, can we say it is more ‘descriptive’ or ‘evocative’ than other commentary?

Corpus linguistics

One way to do so is to use a technique that linguists call corpus analysis.

Corpus linguistics is the study of language using very large samples of data (or corpora). In the technique, a computer is used to look for patterns in the way words and certain grammatical features appear, as well as how often they do so. Corpus analysis can be used to look for patterns in natural language, to see how features differ between dialects (dialectology) – between British and Australian speakers of English, for example – or within different social communities (sociolinguistics). It can also be used to study linguistic or literary style (stylistics). For example, looking at stylistic features, corpus analysis was recently used to show that J.K. Rowling was indeed the author of the detective novel The Cuckoo’s Calling, writing under a pen name.

Take two test matches and a game of golf…

To do an analysis, I first needed a corpus of text. For this, I took the BBC’s live text commentary from the first two Ashes tests between England and Australia of July 2013 (taking place at Trent Bridge and Lord’s). The live text feed features ball-by-ball updates, direct quotes from Test Match Special’s commentators, as well as tweets from listeners, and those of other well-known cricketing figures and celebrities. This feed was readily available, archived on the BBC website. In total, this amounted to 157,698 individual words.

To provide a comparison, I took the equivalent live text coverage of the first two tests from the Sydney Morning Herald (SMH) newspaper in Australia. The Herald’s coverage features very similar content to that of the BBC, and the resulting corpus comprised a very similar number of words (163,507).

Finally, I looked at the BBC’s live text coverage of two major golf tournaments: the 2013 Open Golf Championship and 2013 US Open. The text feed, produced over 8 days, is very similar in content to the two cricket corpora, although it amounted to fewer words in total (104,027).

I then analysed all three corpora with corpus analytics software, freely available from the web. Here’s what I learned…

1. What words occur most frequently?

One of the easiest things to do is look at the frequency with which words occur. Many words will of course feature multiple times in a given sample. For example, in the BBC’s coverage of the Ashes, there were 10,641 distinct words in the 157,698-word corpus.

As you would expect, grammatical items (function words) were the most common: ‘the’, ‘to’, ‘a’, ‘of’, ‘and’ and ‘in’ were the top six most frequent words in the corpus. This is pretty much what you would expect of any large sample of English text. ‘The’, for example, is also the most common word in the 2-billion-word Oxford English Corpus.
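As a rough illustration of how such a frequency list can be built, here is a minimal Python sketch. It is not the software I used; the tokenising rule (lowercase runs of letters, with internal apostrophes) and the sample sentence are assumptions made purely for the example, and a different rule would give slightly different counts.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its word tokens (letters, plus internal apostrophes)."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def frequency_list(text):
    """Return (total words, distinct words, Counter mapping each word to its frequency)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    return len(tokens), len(counts), counts

# Invented sample text, standing in for the real commentary feed.
sample = "The ball beats the bat. The crowd loves it, and the bowler knows it."
n_tokens, n_types, counts = frequency_list(sample)

print(n_tokens, "words in total,", n_types, "distinct")
print(counts.most_common(3))  # the most frequent words first
```

Even in this tiny sample the function word ‘the’ comes out on top, which is the same pattern seen in the full corpus.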

The content words are more interesting. The most frequent content word in the BBC Ashes text is ‘England’ (occurring 1,655 times), with ‘Australia’ mentioned only 873 times. The most frequently occurring players’ names were ‘Bell’ (548 times), ‘Root’ (447) and ‘Agar’ (409), reflecting their key roles for the two teams in the two games. For the SMH corpus, the most frequently occurring names were similar, although the Australian player Agar was mentioned more times than Root (perhaps a result of a different national bias).

Otherwise, differences in the frequency and distribution of these words between the three corpora will reflect differences in things like register and dialect, as well as style. Register refers to a particular variant of language used for a particular purpose, or in a particular setting – such as playing or watching cricket. Vocabulary (jargon) is a particularly important aspect of register, so we can expect differences in register to be apparent in the word frequencies of different corpora. In the TMS sample, cricketing terminology features heavily. ‘Wicket’ was the 50th most frequent word (which would be quite odd in any other corpus, except perhaps in one from an Ewok convention). Here are the most frequent cricketing terms in the corpus:

wicket      50th      462 times
over        59th      378 times
stump       227th     120 times
bowl        235th     118 times
maiden      283rd     95 times
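A table like this one can be read straight off a frequency list. The sketch below is only illustrative: the function name `rank_table` and the toy counts are my own inventions, not the real corpus figures, and words with equal counts simply receive consecutive ranks in an arbitrary order.

```python
from collections import Counter

def rank_table(counts, terms):
    """Return (term, rank, frequency) for each term present; rank 1 is the most frequent word."""
    ranked = [word for word, _ in counts.most_common()]
    position = {word: i + 1 for i, word in enumerate(ranked)}
    return [(term, position[term], counts[term]) for term in terms if term in counts]

# Invented toy counts, NOT the real corpus figures.
toy = Counter({"the": 50, "to": 30, "england": 12, "wicket": 7, "over": 5})

for term, rank, freq in rank_table(toy, ["wicket", "over", "maiden"]):
    print(f"{term:<10}{rank:>5}{freq:>8}")
```

Terms that never occur (here ‘maiden’) are skipped rather than given a rank.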

‘Silly’ (as in ‘point’) occurs 25 times, which is surely more than you would find in any other serious piece of sports journalism.

Finally, in terms of the words used, it was very difficult to detect any dialect differences between the Australian and British samples. (Somewhat disappointingly, I could find no examples from the Aussie Slang Dictionary in the SMH sample, and no occurrence of the word ‘pom’.) Interestingly, the word ‘fab’, which is not a word I would expect to find in the English cricketing register, occurs twice in the SMH corpus.

2. What words occur with other words?

One thing that corpus linguists particularly look for is the frequency with which words occur with other words. In particular, they are interested in collocations. A collocation is a pair or sequence of words that occur together in a corpus more often than would be expected by chance.
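One common way to put a number on ‘more often than would be expected by chance’ is pointwise mutual information (PMI), which compares how often a pair is observed with how often it would turn up if the two words were scattered independently. I should stress that this is an assumption for the sake of illustration: I am not claiming it is the measure used by the software in this analysis, and the token list below is invented.

```python
import math
from collections import Counter

def pmi(tokens, first, second):
    """PMI (in bits) for `first` immediately followed by `second`.

    Positive values mean the pair occurs more often than chance would predict.
    The number of adjacent pairs is approximated by the number of tokens.
    """
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    observed = pair_counts[(first, second)]
    if observed == 0:
        return float("-inf")  # the pair never occurs
    expected = word_counts[first] * word_counts[second] / n
    return math.log2(observed / expected)

# Invented toy data: 'poor shot' twice, plus four unrelated words.
tokens = ["poor", "shot", "poor", "shot", "a", "b", "c", "d"]
print(pmi(tokens, "poor", "shot"))
```

Raw counts, as reported in the rest of this section, are easier to read, but a measure like this helps to separate genuine collocations from pairs that are frequent only because both words are frequent.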

For example, a search of the BBC Ashes text shows that the following words collocate frequently with ‘shot’:

poor                 8 times
odd                  3 times
loose                3 times
horrible           3 times

‘Nice’, ‘lovely’, ‘woeful’, ‘terrible’, ‘rubbish’, ‘rash’, ‘glorious’, ‘beautiful’ and ‘forceful’ all appear once with ‘shot’, whereas ‘good’ does not appear at all.
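Counts like these come from looking at the word immediately before each occurrence of ‘shot’. A minimal sketch of that search, using an invented token list and a hypothetical helper called `left_neighbours`:

```python
from collections import Counter

def left_neighbours(tokens, node):
    """Count the words that appear immediately before `node`."""
    return Counter(prev for prev, word in zip(tokens, tokens[1:]) if word == node)

# Invented example tokens, not taken from the corpus.
tokens = "a poor shot then a lovely shot and another poor shot".split()
print(left_neighbours(tokens, "shot").most_common())
```

This counts every preceding word, so in a real corpus the results would still need filtering to keep only the adjectives.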

In the SMH coverage the following words collocate most frequently with ‘shot’:

fine                  6 times
wristy              6 times
poor                5 times
false                5 times
cracking         5 times

‘Bad shot’, which means roughly the same as ‘poor shot’, only appears once in either corpus. Therefore, ‘poor shot’ seems to be a genuine cricketing collocation. In both cricketing corpora there does seem to be a tendency to avoid the simple adjectives ‘good’ and ‘bad’; instead, a range of adjectives is preferred.

Collocations are particularly interesting to linguists because they can hint at particular connotations of a word, which have become ‘ingrained’ within a particular language community. (For example, the adjective ‘whinging’ might collocate with ‘pom’ and ‘typical’ might collocate with ‘Aussie’. However, I could find neither collocation in either corpus.)

3. What can we learn about the linguistic style?

TMS is well known for its light-hearted style, which does seem to carry over to the sister coverage on the BBC website. For example, the use of nicknames for the commentary team is popular. ‘Aggers’ occurs 21 times in the corpus, compared to the more formal (Jonathan) ‘Agnew’, which appears 57 times. ‘Tuffers’ appears 6 times whereas (Phil) ‘Tuffnel’ doesn’t appear at all. ‘Boycs’, for Geoffrey Boycott, appears 3 times. The word ‘Aussies’, an informal term referring to the Australian team, occurs 212 times.

What about the evidence for a particularly ‘evocative’ or ‘descriptive’ style? One thing we can look for is the use of figurative language.

Metaphor is one of the most common forms of figurative language but is notoriously difficult to spot, partly because it isn’t marked by any particular grammatical structures. Similes work in a very similar way to metaphors. However, they also have the feature of being marked by quite standard grammatical forms in English, typically involving ‘like’, ‘as’ or ‘than’. They are therefore much easier to spot.

In the three corpora, I searched for similes of the form:

like […]
as […] as […]
[…] than […]

I then manually removed non-similes such as ‘hotter than yesterday’, ‘playing as well as Australia’, and so on. Notably, the similes that remained included a number of comparative non-similes, which still contained figurative language, such as:

 […] with a whip that clatters the leg-side fence quicker than you can say “Waltzing Matilda”.
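The search-then-filter procedure can be sketched with regular expressions. The patterns below are my own rough approximations of the three forms, not the exact queries I ran: the ‘as’ pattern, for instance, allows only a single word between the two occurrences of ‘as’. They deliberately over-match, which is why the manual check is needed.

```python
import re

# Rough approximations of the three simile forms (assumed for illustration).
SIMILE_PATTERNS = {
    "like": re.compile(r"\blike\b", re.IGNORECASE),
    "as": re.compile(r"\bas\s+\w+\s+as\b", re.IGNORECASE),
    "than": re.compile(r"\b\w+\s+than\b", re.IGNORECASE),
}

def simile_candidates(sentences):
    """Return (pattern name, sentence) for every sentence matching a simile pattern."""
    hits = []
    for sentence in sentences:
        for name, pattern in SIMILE_PATTERNS.items():
            if pattern.search(sentence):
                hits.append((name, sentence))
    return hits

sentences = [
    "The sun comes out, tension rising in Trent Bridge like a shaken bottle of pop.",
    "Haddin is tougher than a cheap steak.",
    "[He] sits back in his chair, as comfy as a man enjoying a sherry at Christmas.",
    "It is hotter than yesterday.",          # a non-simile that still matches
    "Bell drives through the covers for four.",
]
for name, sentence in simile_candidates(sentences):
    print(f"{name:<5} {sentence}")
```

The fourth sentence is picked up by the ‘than’ pattern even though it is a literal comparison, so it would be discarded by hand.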

In the BBC Ashes coverage I found 91 examples of similes of these types (22 using ‘than’, 16 using ‘as’ and 53 using ‘like’). They included sentences like:

The sun comes out, tension rising in Trent Bridge like a shaken bottle of pop.
Haddin is tougher than a cheap steak.
[He] sits back in his chair, as comfy as a man enjoying a sherry at Christmas.

Only 11 of these similes were clichés (such as ‘This outfield is like lightning’ and ‘Joe Root looks angelic, but he is as hard as nails’).

In contrast, I found only 9 clear similes in the similarly-sized SMH corpus (for example, ‘Starc’s stay could have been as short as a fire-cracker’s thread!’). I found 12 in the slightly smaller golf corpus (such as, ‘His chat of winning this thing was as outlandish as his clothes’). This does suggest that the style of the BBC’s cricket coverage includes greater emphasis on figurative (and possibly ‘evocative’) language.
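Because the three corpora differ in size, the fairest comparison is a rate rather than a raw count. The sketch below normalises the simile counts reported above to occurrences per 100,000 words; the choice of 100,000 as the base is simply a convenient assumption.

```python
def per_100k(hits, corpus_size):
    """Convert a raw count into a rate per 100,000 words."""
    return hits / corpus_size * 100_000

# (number of similes, corpus size in words), as reported in this article.
corpora = {
    "BBC Ashes": (91, 157_698),
    "SMH Ashes": (9, 163_507),
    "BBC golf": (12, 104_027),
}
rates = {name: per_100k(hits, size) for name, (hits, size) in corpora.items()}

for name, rate in rates.items():
    print(f"{name:<10} {rate:5.1f} similes per 100,000 words")
```

On these figures the BBC cricket rate is roughly ten times that of the SMH and about five times that of the BBC’s golf coverage, so the gap is not an artefact of corpus size.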

Finally, I looked at the types of metaphorical themes that are used in the similes from the BBC Ashes coverage. I found that 5 similes used references to famous cricketers, including:

Suddenly, the pitch looks about as docile as a sleepy Phil Tufnell.

6 referred to other sports, or sports personalities, including:

Peter Siddle continues to be as miserly as Arsene Wenger in the transfer market.
Ryan Harris looks like a northern darts player.

7 similes used school or amateur cricket as a theme, including:

The pitch is still good. Australia batted like a village green team yesterday.
The fielding coach had come on as twelfth man, like your teacher helping out.

9 made references to cinema, including:

[…] stone-faced, watching on from the shadows like a baddie from a film.
[He] makes a gesture that looks like he’s calling for a Hannibal Lecter mask.

And 10 similes (11% of the total) used references to animals, including:

Michael Clarke, angrier than a swarm of hornets […]

Metaphors are often used in the media to report and describe current events, and there is research to show that this can influence the way we think about them. For example, a senior military official recently compared the political situation in Afghanistan to a cricket match (‘We put the Taliban into bat in 2001 and took a flurry of early wickets…with half an hour to play we find ourselves some runs short.’) Finally, then, it’s perhaps not surprising to see war being used as a metaphorical theme to describe the cricket on 3 occasions. They were:

Bell, like a Grenadier Guard, dutiful in defence.
It will be like a war of attrition.
Chris Rogers wears more armour than a tank.

4. What else can we discover?

The data showed a few final things. Firstly, the two samples of cricket commentary reflect the fact that the weather for the first two Ashes test matches was not typically British. The word ‘sun’ appears 64 times in the BBC corpus, whereas ‘rain’ occurs only 9 times. Only one of the similes in the corpus relates to the weather:

It’s grey at Lord’s, but probably not as grey as the mood in the Australia dressing room.

Secondly, fans of ‘Boycott bingo’ (a popular game for fans of TMS in which scoring is based on the various pet phrases of commentator Geoffrey Boycott) will have been slightly disappointed. The phrase ‘corridor of uncertainty’ appears only 3 times in the BBC corpus. In contrast, the related term the ‘corridor outside off’ appears 31 times in the SMH corpus. Disappointingly, ‘buffet bowling’ (referring to loose bowling against which the batsman can ‘help themselves’) appears only once in the BBC coverage, as does the related term ‘carvery’.
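Searches for single words and multi-word phrases like these can be done with the same kind of whole-word matching. The helper `count_phrase` and the sample text below are invented for illustration; this is not the search function of the software I used.

```python
import re

def count_phrase(text, phrase):
    """Count case-insensitive, whole-word occurrences of a word or multi-word phrase."""
    words = [re.escape(word) for word in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))

# Invented sample text, standing in for the commentary feed.
sample = ("Anderson finds the corridor of uncertainty again. "
          "The sun is out at Lord's. More sun forecast, no rain.")

for phrase in ["corridor of uncertainty", "sun", "rain", "cake"]:
    print(f"{phrase:<25}{count_phrase(sample, phrase)}")
```

Matching on whole words means that ‘sun’ does not also count ‘sunny’ or ‘Sunday’, which matters when comparing counts as small as these.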

Lastly, a search of the corpus indicates it had been a disappointing period for cakes and confectionery in the TMS commentary box. The word ‘cake’ itself only appears once in the corpus, alongside a singular ‘Bakewell’. (I would hope that similar analysis of future coverage would find a higher frequency of cake-related words.)

To conclude (‘at stumps on day four…’)

Corpus analysis is a technique employed by other linguists in much more sophisticated ways than I have here. However, I hope I’ve given a flavour of the types of thing corpus linguistics can be used for.

For example, corpus techniques can be used to investigate linguistic style. Certainly (albeit based on very limited samples of text) the language of the BBC’s cricket coverage does seem to be more figurative in style than that of the Sydney Morning Herald in Australia, with much greater use of linguistic figures such as similes. Figurative language is also much more prevalent in the BBC’s live text coverage of cricket than it is in the BBC’s coverage of golf. This is evidence for a particularly ‘evocative’ and ‘descriptive’ style of cricket coverage on the BBC, as popularised on Test Match Special.

Above all, I hope this serves as a timely celebration of the language of cricket commentary. For fellow language lovers, to paraphrase a stalwart of TMS, it is a genuine word buffet.


