Digitizing the OED

I recently picked up a copy of John Simpson’s memoir The Word Detective: Searching for the Meaning of It All at the Oxford English Dictionary, largely because I like words and dictionaries and the OED in particular. A nice surprise, however, is that Simpson, the former chief editor of the OED, oversaw the digitization of the OED to CD-ROM, beginning in the early 1980s, as well as its later migration online. This is exciting stuff, not only for the description of transferring a massive database from print to computer (~67 million characters), but also because Simpson nicely describes how digitization did not replace the original function of the OED, but rather added new dimensions to it.

One of the benefits of the OED was that its data was already structured; i.e. definitions, pronunciations, etymology, etc. are distinguished by their formatting, “a change of typeface, size of print, special print characters, indentation, etc.,” consistently and repetitively. The OED teamed up with International Computaprint Corporation, IBM, and the University of Waterloo (all in North America). The first two helped with digitizing the data, while Waterloo’s Computer Science Department helped construct the database. The typing took 150 people working for 18 months. Afterwards, the 20,000 pages of type, each three columns of small print, had to be proofread, a task taken on by 50 freelancers.

Simpson’s descriptions of how this large project took shape and was organized are interesting, but he shines when describing the new possibilities that digitization would open for the OED. Up to this point, dictionaries were incredibly linear: you looked up the word you wanted, and there you were. But what if, as Simpson describes it, you were able “to search the entire content of the dictionary instantly for information relating to the language”? He gives the example of finding all the words in English that end in -ology (1,011 in the OED), followed by comparing them with all the words that end in -ography (508). Given how time-consuming doing this would be with the print dictionary, it wasn’t done, but digitization could make such a search feasible and quick.
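To make the idea concrete, here is a minimal sketch of that kind of suffix search in Python. It is only an illustration, not how the OED's database actually works: the function name words_ending_in and the tiny sample_headwords list are my own inventions, and a real search would run over the full list of headwords rather than six words.

```python
def words_ending_in(headwords, suffix):
    """Return the headwords ending in `suffix`, ignoring case.

    A leading hyphen in the suffix (as in "-ology") is dropped before matching.
    """
    suffix = suffix.lower().lstrip("-")
    matches = [word for word in headwords if word.lower().endswith(suffix)]
    # Sort alphabetically without letting capital letters sort first.
    return sorted(matches, key=str.lower)


# Hypothetical stand-in for a dictionary's headword list (an assumption,
# not OED data).
sample_headwords = [
    "amphibology",
    "tropology",
    "geography",
    "biography",
    "dictionary",
    "Etymology",
]

ology = words_ending_in(sample_headwords, "-ology")
ography = words_ending_in(sample_headwords, "-ography")

print(len(ology), ology)
print(len(ography), ography)
```

With the whole headword list loaded, the same two calls would give the counts to compare, which is exactly the sort of question that was impractical to ask of the print edition.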

“Hundreds of other questions which might have been asked about the language were not asked, or were only answered falteringly by considering just a sample of the data. What if you could dream up more or less any question you wanted about the language, ask it, and receive an answer seconds later?” Simpson writes. This seems to be the common-sense attraction of what is now collectively referred to as digital humanities: it opens up the possibility of new questions, new forms of analysis, and the ability to see patterns and meanings that would be impossible or extremely impractical to reach without digital tools. At the same time, the possibilities these new avenues offer do not mean we abandon other avenues of research and analysis. Just because we can search the entire corpus for all instances of -ology doesn’t mean that we won’t sometimes just need or want to know the specific meaning(s) and history of amphibology or tropology – just two of the many words that Simpson explores in his fascinating memoir.


Weekly Roundup: June 8


  • Global Perspectives on Digital History, a site that collects material from “hundreds of venues where high-quality scholarship is likely to appear, including the personal websites of scholars, institutional sites, blogs, and other feeds” including Twitter.
  • Finally, I’m soliciting resources on GIS – these may be books, websites, blogs, videos, anything. To begin, a friend recommended I check out the GIS subreddit. What else is out there?

My HTRC 2015 via Twitter

Digital History & AHA 2015

Far from being in New York for this year’s AHA conference, I was visiting family in Texas. Nonetheless, two things were fairly clear from roughly 1800 miles away: digital history projects were getting some attention, and conference attendees were, thankfully, tweeting the conference away.

As a side note to the rest of this post, I want to thank everyone who tweeted from AHA and from every conference. Not only is it a great tool for those attending conferences, but it is a great resource for the rest of us who are unable for one reason or another to attend a given conference. In fact, I probably could not write this post if it were not for the AHA attendees who tweeted panels, projects, papers, thoughts, questions, and answers.

To begin, let’s just look at the panels that were exclusively (based on the online program) on digital history:

And needless to say, there were digital projects and methods presented on panels that were not exclusively on digital history. But we can’t stop there: there were also those who presented at the Digital Projects Lightning Round. You can see a complete list of the projects with short descriptions here – with the added bonus that the projects are linked to their respective websites.

In addition to the AHA’s list of participants, check out Anelise H. Shrout’s list of Digital Projects at the AHA on her blog, especially since she includes projects from the associated THATCamp as well.

Finally, you can attend, as it were, the Getting Started in Digital History Workshop via Jason M. Kelly’s blog.

