Editing punctuation can be hard. One corpus-based approach to more effective editing involves tailoring your edits to the genre in which you’re working.

As editors, we’ve all faced the problem of not knowing exactly how to apply the seemingly straightforward rules of punctuation while editing a manuscript. How far can we bend the rules before they break? After all, linguists are the ones who get to describe language rules while we’re the ones who get to prescribe, right? Enter the tools of corpus linguistics. By analyzing punctuation data from many different genres, editors can base their decisions on real-world usage patterns specific to the genre at hand.


In their article, “Frequency Distributions of Punctuation Marks in English: Evidence from Large‑scale Corpora,” authors Kun Sun and Rong Wang (2019) studied punctuation use by analyzing data from several different corpora as well as from the Google Books N-gram Viewer. In performing these analyses, Sun and Wang sought to understand the differences in the frequency of punctuation use across a variety of contemporary registers and dialects of English. Their efforts were not limited to synchronic data, however. Sun and Wang also performed some diachronic analyses in order to understand how punctuation use has varied over time.

Sun and Wang found statistically significant differences in the frequency of ten different punctuation marks across registers. A one-way ANOVA test allowed them to reject the null hypothesis of no difference among registers and conclude that the variation in their data was unlikely to be the product of chance. For regional dialects and historical varieties of English, they repeated a similar analysis on a smaller subset of the ten punctuation marks and found meaningful frequency differences there as well. For example, comma use and semicolon use peaked around the 1850s and then experienced a rapid decline.
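For readers curious about what a one-way ANOVA actually computes, here is a minimal sketch in Python. It is not Sun and Wang's code, and the numbers are invented purely for illustration (hypothetical semicolons per 1,000 words in four text samples from each of three registers). The function name `one_way_anova_f` is likewise an assumption made for this example.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples.

    F compares the variation between group means with the variation
    inside each group; a large F suggests the groups really differ.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of each group mean around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# HYPOTHETICAL data, not taken from the study:
# semicolons per 1,000 words in four samples per register.
samples = {
    "academic": [4.1, 3.8, 4.5, 4.0],
    "fiction": [1.2, 1.5, 0.9, 1.1],
    "news": [0.8, 1.0, 0.7, 0.9],
}

f_stat, df_b, df_w = one_way_anova_f(list(samples.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The F statistic is then compared against the F distribution to obtain a p-value; a statistics package such as SciPy (`scipy.stats.f_oneway`) reports both in one call. When the p-value falls below the chosen threshold, the null hypothesis that all registers share the same mean frequency is rejected.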


The findings of this study suggest that when editing for punctuation, editors ought to take genre, register, time period, and dialect into account. For instance, when editing a research paper, they should be comfortable with a high frequency of parentheses and semicolons, but when editing a fantasy novel, they should be less comfortable with that sort of frequency. As Sun and Wang note, “Academic writing displays the fewest periods, question marks, and exclamation marks, but the highest figures for parentheses and semicolons” (2019, 26). (Additionally, their finding that patterns of punctuation use have changed significantly over time may provide a dose of historical humility to sticklers for comma and semicolon usage: things have not always been so, nor will they always be.)
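An editor who wants to see how a manuscript compares with its genre could start by measuring the manuscript's own punctuation rates. The sketch below is one rough way to do that in Python; the function name `marks_per_thousand_words` and the sample sentence are assumptions made for illustration, and the word-counting rule is deliberately simple.

```python
import re
from collections import Counter

# The marks named in the Sun and Wang quotation above.
MARKS = ".?!;()"


def marks_per_thousand_words(text, marks=MARKS):
    """Return each mark's frequency per 1,000 words of text."""
    # Rough word count: runs of letters, digits, and apostrophes.
    words = re.findall(r"[A-Za-z0-9'’]+", text)
    if not words:
        return {m: 0.0 for m in marks}
    counts = Counter(ch for ch in text if ch in marks)
    return {m: 1000 * counts[m] / len(words) for m in marks}


# Hypothetical sample; in practice, pass in a chapter or a full manuscript.
sample = "We ran the test (twice); it passed. Did it pass? Yes!"
rates = marks_per_thousand_words(sample)
for mark, rate in rates.items():
    print(f"{mark!r}: {rate:.1f} per 1,000 words")
```

This counts every period, including those in abbreviations and decimals, so the figures are only approximate. They are still enough to flag, say, a fantasy chapter whose semicolon rate looks more like that of a research paper.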

Punctuation rules often feel very inflexible, but with usage evidence from corpora, we can feel more comfortable bending them. This is not to say that, for instance, em dashes can be used willy-nilly in place of any other punctuation mark. Rather, the point is that as thoughtful editors, we can appeal to corpus data like Sun and Wang’s to craft a more nuanced approach to editing for punctuation.

To learn more about applications of corpus linguistics research in editing, read the full article:

Sun, Kun and Rong Wang. 2019. “Frequency Distributions of Punctuation Marks in English: Evidence from Large‑scale Corpora.” English Today 35, no. 4 (December): 23–35. https://doi.org/10.1017/S0266078418000512

—Josh Stevenson, Editing Research


Find More Research

Take a look at Allie Stevens’s Editing Research article for more tips on how to incorporate the wisdom of corpora into your editing practices: “Breaking the Style Guide: Using Corpora to Enhance Your Copyedits.”

Also read Brady Davis’s Editing Research piece for another treatment of this innovative editing approach: “How to Use Corpora to Edit Technical Articles Effectively and Accurately.”