How to Use Corpora to Edit Technical Articles Effectively and Accurately

Corpora are quickly becoming mainstream in the field of linguistics. But can they help us become better technical editors as well?

A corpus is a large collection of texts, often spanning multiple genres of writing or transcribed speech. Corpus analysis tools let you see words in context and find connections between related words. Corpora contain writing from many topics and fields, just as technical editors work across many topics and fields. So can we use corpora to strengthen our work as technical editors?

THE RESEARCH

In his article “Using Corpus-Based Instruction to Explore Writing Variation across the Disciplines: A Case History in a Graduate-Level Technical Editing Course,” published in Across the Disciplines, Ryan K. Boettger (2016) of the University of North Texas reports the results of teaching corpus-based approaches in a graduate-level technical editing course of 14 students. Throughout the course, the editing students worked with ESL STEM graduate students to help them with their writing, and many of the editing students used corpus-based approaches in the process. Students variously compiled their own corpora of academic writing in their clients' disciplines and analyzed them with AntConc, used AntConc to analyze a corpus of student writing and a corpus of student editing that Boettger had compiled, or used the online Corpus of Contemporary American English (COCA), which has built-in analysis tools.

[Student editors] typically enjoyed engaging with corpora and found these experiences valuable in working with [ESL] clients from various academic backgrounds and validating their editorial decisions.
—Ryan K. Boettger (2016)

After the semester, Boettger invited the students in his class to respond to a survey assessing their opinions on corpus-based learning, and every student in the class participated. The survey asked students to rate several statements on a 7-point Likert scale (1 = strongly disagree, 2 = disagree, 3 = somewhat disagree, 4 = neither agree nor disagree, 5 = somewhat agree, 6 = agree, 7 = strongly agree). On average, students agreed that they understood language systems better because of their corpus learning (mean x̅ = 5.64; standard deviation s = 1.39). They also mostly agreed that they would use corpus-based approaches in future technical editing (x̅ = 5.57; s = 1.28) and communications projects (x̅ = 5.79; s = 1.53). Finally, students mostly agreed that they would use the results of corpus-based inquiries to justify editorial suggestions to a primary decision-maker (x̅ = 5.86; s = 0.86), to justify how communicators from different disciplines write (x̅ = 6.14; s = 0.86), and to propose changes to writing practices in their workplaces (x̅ = 5.79; s = 0.86). These responses suggest that the students saw considerable value in using corpora for technical editing and writing.
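Each survey item is summarized by a mean (x̅) and a standard deviation (s) across respondents. As a minimal sketch of that arithmetic, the Python example below uses 14 invented responses (not Boettger's data) and the standard library's statistics module. It assumes s is a sample standard deviation (dividing by n − 1), which the survey summary does not state explicitly.

```python
from statistics import mean, stdev

# Invented responses from 14 hypothetical students on a 1-7 Likert scale
# (1 = strongly disagree, 7 = strongly agree). Illustrative only; these are
# NOT the survey data from Boettger (2016).
responses = [6, 5, 7, 4, 6, 6, 5, 3, 7, 6, 5, 4, 6, 7]

# Guard against data-entry typos: every answer must be on the 1-7 scale.
assert all(1 <= r <= 7 for r in responses)

x_bar = mean(responses)  # arithmetic mean
s = stdev(responses)     # sample standard deviation (divides by n - 1)

print(f"n = {len(responses)}; mean = {x_bar:.2f}; s = {s:.2f}")
# prints: n = 14; mean = 5.50; s = 1.22
```

Swapping stdev for statistics.pstdev would give the population standard deviation instead, which is slightly smaller for the same responses.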

THE IMPLICATIONS

Although the survey measured the opinions of a single class, the results suggest that corpus-based approaches can help professionals with technical editing and communications projects. As technical editors, we can compile and analyze a corpus of writing by multiple authors in our client’s discipline to justify and explain our edits. We can also teach our clients how to compile and analyze a corpus of articles in their discipline (see “How to Compile and Analyze Your Own Corpus” below), which gives them a reference for improving their writing in future papers. Corpus knowledge could be especially helpful for ESL clients. Technical editors who work for publishing companies can use corpora to improve the writing process at their workplaces, and freelance technical editors can use corpora to improve their own editing (see “How to Use a Corpus: Tools and Terminology for Analyzing COCA” below to learn how to analyze online corpora and the key terminology for analyzing corpora in general).
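When you use a corpus of writing from a client's field to justify an edit, a common move is to compare how often competing variants occur, normalized per million words so that corpora of different sizes can be compared fairly. The Python sketch below is a minimal illustration under stated assumptions: the two-sentence corpus is hypothetical, the "data are" versus "data is" pair is just an example variant, and the tokenizer is deliberately simple. It reports no real findings.

```python
import re

def tokenize(text):
    """Lowercase word tokens: runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def phrase_count(tokens, phrase):
    """Count exact occurrences of a multiword phrase in a token list."""
    words = phrase.lower().split()
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)

def per_million(count, total_tokens):
    """Normalize a raw count to a rate per million words."""
    return count / total_tokens * 1_000_000

# Hypothetical stand-in for articles by several authors in a client's discipline.
corpus_texts = [
    "The data are consistent with the model, and the data are publicly available.",
    "These data suggest a trend; the data is noisy at low temperatures.",
]

tokens = [t for text in corpus_texts for t in tokenize(text)]
for variant in ("data are", "data is"):
    hits = phrase_count(tokens, variant)
    print(f"{variant!r}: {hits} hits, {per_million(hits, len(tokens)):.0f} per million words")
```

On a toy corpus the per-million figures are inflated; the normalization only becomes meaningful when comparing real corpora of different sizes, such as a client's field corpus against COCA.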

To find out more about using corpora in the teaching and practice of technical editing, read the full article:

Boettger, Ryan K. 2016. “Using Corpus-Based Instruction to Explore Writing Variation across the Disciplines: A Case History in a Graduate-Level Technical Editing Course.” Across the Disciplines 13 (1): n1. https://www.researchgate.net/publication/283546780.

—Brady Davis, Editing Research

FEATURE IMAGE BY PATRICK TOMASSO

Find more research

Take a look at Ryan K. Boettger and Stefanie Wulff’s (2016) conference article to learn how corpus-linguistic approaches in technical writing classes help students write better in their respective fields: “Using Authentic Language Data to Teach Discipline-Specific Writing Patterns to STEM Students.” In 2016 IEEE International Professional Communication Conference (IPCC), pp. 1–4. https://doi.org/10.1109/IPCC.2016.7740513.

Read Ryan K. Boettger and Stefanie Wulff’s (2014) research article to learn about a corpus-linguistic approach to examining why students choose to leave the demonstrative “this” unattended (that is, not followed by a noun phrase): “The Naked Truth about the Naked This: Investigating Grammatical Prescriptivism in Technical Communication.” Technical Communication Quarterly 23 (2): 115–140. https://doi.org/10.1080/10572252.2013.803919.

Read Gregory Crane and Jeffrey A. Rydberg-Cox’s (2000) conference article to learn about the need for discipline-specific editors with a combination of academic and technical training to edit small corpora of works in their field: “New Technology and New Roles: The Need for ‘Corpus Editors.’” In Proceedings of the Fifth ACM Conference on Digital Libraries, pp. 252–253. https://doi.org/10.1145/336597.336686.

Mark Davies has created several powerful corpora that can be accessed freely at English-Corpora.org. Here are some of the most interesting and relevant corpora:

Davies, Mark. 2008–. The Corpus of Contemporary American English (COCA): One Billion Words, 1990–2019. https://www.english-corpora.org/coca/.
Davies, Mark. 2016–. Corpus of News on the Web (NOW): 10 Billion Words from 20 Countries, Updated Every Day (2010–yesterday). https://www.english-corpora.org/now/.
Davies, Mark. 2020–. Coronavirus Corpus: 291 Million Words and Growing, the Definitive Record of the Social, Cultural, and Economic Impact of the Coronavirus (COVID-19) in 2020 and beyond (January 2020–yesterday). https://www.english-corpora.org/corona/. The Coronavirus Corpus is a subset of NOW.

More useful corpora (adapted from Appendix A of Boettger [2016]):

Linguistic Data Consortium and The Trustees of the University of Pennsylvania. 1992–2020. Air Traffic Control (ATC) Corpus. https://catalog.ldc.upenn.edu/LDC94S14A.
Schler, Jonathan, Moshe Koppel, Shlomo Argamon, and James Pennebaker. 2006. Blog Authorship Corpus. https://www.kaggle.com/rtatman/blog-authorship-corpus.
Someya, Yasumasa. 2000. Business Letters Corpus. http://www.someya-net.com/concordancer/.
English Language Institute of the University of Michigan. 2019. Michigan Corpus of Academic Spoken English (MICASE). http://quod.lib.umich.edu/m/micase/.
English Language Institute of the University of Michigan. 2009–2010. Michigan Corpus of Upper-Level Student Papers (MICUSP). https://micusp.elicorpora.info/.
NetAdvance. 2018. Professional English Research Consortium Corpus (PERC). https://scnweb.japanknowledge.com/PERC2/.

How to Use a Corpus: Tools and Terminology for Analyzing COCA

David Brown, a professor at Carnegie Mellon University who runs the YouTube channel The Grammar Lab, made a three-part video tutorial in 2012 that introduces common corpus analysis tools and terminology. He uses the online Corpus of Contemporary American English (COCA), which has built-in analysis tools, for this introduction. Check out the tutorial to learn how to integrate corpora into your editing: https://www.youtube.com/watch?v=sCLgRTlxG0Y&list=PL3FB2495C0B9D542B.

Brown also runs The Grammar Lab blog, which contains several useful articles introducing corpus usage: www.thegrammarlab.com.

The following is a list of vocabulary that may be helpful as you go through Brown’s learning materials:

concordance: When you search for a word with a corpus analysis tool, short snippets of text appear that show that word in context. These snippets are often referred to collectively as a concordance or as concordance data. Alternatively, each individual snippet can be called a concordance, so that a search produces several concordances.
semantic prosody: the positive or negative associations that a seemingly neutral word takes on from its collocates (the words it frequently occurs near). When a seemingly neutral word most frequently occurs with collocates that have a positive connotation, the word has positive semantic prosody; when it most frequently occurs with collocates that have a negative connotation, it has negative semantic prosody.
- EXAMPLE: The word cause seems neutral, but its collocates include death, problems, damage, concern, and disease (as found using the collocation function of COCA), which gives it a negative semantic prosody.
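The concordance and collocate ideas defined above can be sketched in a few lines of Python. This is a simplified illustration, not how COCA or AntConc implement these features: it uses invented sentences, a naive tokenizer, a hand-picked stopword list, and raw co-occurrence counts within a two-word window of the same sentence. Real corpus tools typically also rank collocates with association measures such as mutual information.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(sentences, node, width=3):
    """Keyword-in-context (concordance) lines for each hit of `node`."""
    lines = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok == node:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left:>25} [{tok}] {right}")
    return lines

def collocates(sentences, node, window=2, stopwords=frozenset()):
    """Count words within `window` tokens of `node`, same sentence only."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok == node:
                span = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                counts.update(w for w in span if w != node and w not in stopwords)
    return counts

# Hypothetical sentences invented for illustration (not COCA data).
sentences = [
    "Smoking can cause disease.",
    "Some viruses cause disease in cattle.",
    "Floods often cause damage.",
    "The leak may cause problems.",
    "Heart failure was the cause of death.",
]
STOPWORDS = frozenset({"the", "of", "can", "may", "was", "often", "some", "in"})

for line in kwic(sentences, "cause"):
    print(line)
print(collocates(sentences, "cause", stopwords=STOPWORDS).most_common(3))
```

With a real corpus you would run the same functions over many thousands of sentences; this invented sample only shows the mechanics and says nothing about how cause actually behaves in COCA.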

How to Compile and Analyze Your Own Corpus

To compile a corpus, save your articles as text files (you can use Adobe Acrobat to do this). You may need to check the text files for conversion errors. Then download Laurence Anthony’s (2019) corpus analysis tool to analyze your text file corpus:

Anthony, Laurence. 2019. AntConc (Version 3.5.8) [Computer Software]. Tokyo, Japan: Waseda University. http://www.antlab.sci.waseda.ac.jp/.

Once you’ve downloaded and opened the program, go to Help on the menu bar and select Readme File for instructions on how to use AntConc. In short, you load your text files with File > Open File(s) and then analyze them with the tools described in the Readme file.
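If you would like to see what "compiling a corpus" amounts to outside AntConc, the Python sketch below loads every .txt file in one folder and builds a basic frequency list, roughly the kind of output AntConc's Word List tool shows. It is a minimal sketch, not a replacement for AntConc: the one-folder layout, UTF-8 encoding, and simple tokenizing rule are assumptions, and the two demo files are hypothetical.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

def load_corpus(folder):
    """Read every .txt file in `folder` into a {filename: text} dict.

    Assumes UTF-8; undecodable bytes are replaced so that conversion
    debris is visible when you check the files rather than crashing the load.
    """
    return {p.name: p.read_text(encoding="utf-8", errors="replace")
            for p in sorted(Path(folder).glob("*.txt"))}

def word_list(corpus):
    """Frequency list of lowercase word tokens across all files."""
    counts = Counter()
    for text in corpus.values():
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

# Demo with two hypothetical files in a temporary folder; point
# load_corpus() at your own folder of converted articles instead.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "article1.txt").write_text(
        "The results are robust. The method is simple.", encoding="utf-8")
    Path(folder, "article2.txt").write_text(
        "Results suggest the method works.", encoding="utf-8")
    corpus = load_corpus(folder)

freq = word_list(corpus)
print(len(corpus), "files;", freq.most_common(3))
```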

For a great overview of AntConc’s features, check out Laurence Anthony’s (2011) video tutorials on YouTube: https://www.youtube.com/watch?v=9TsqFVrUYO0&list=PLx4CiNE_oyfPp9uVMmbByd-ncdN2aUDu8.

In addition, check out David Brown’s tutorial article on The Grammar Lab blog: “AntConc Walk-Through.” http://www.thegrammarlab.com/?nor-portfolio=antconc-walk-through.
