Keywords in Context

One of the initial project goals was to provide a means for examining the text using keywords, or more specifically a variation on the tools designed for “keywords in context” (KWIC). This technique involves generating a concordance, a list of the words used within a text along with those words immediately surrounding them, as a means for providing context. Traditionally, generating a concordance was a labor-intensive process reserved for only the most important books, often source texts that serve as the foundation for religion, such as the Judeo-Christian Bible.
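To make the idea concrete, here is a minimal sketch of how a keyword-in-context listing can be produced. It is not the tool used in this project; the function name `kwic`, the `width` parameter, and the sample sentence are all assumptions chosen for illustration, and the tokenization is deliberately naive.

```python
import re

def kwic(text, keyword, width=4):
    """Return (left, keyword, right) tuples for each occurrence of keyword.

    Hypothetical helper: splits on runs of word characters and lowercases
    everything, which is far cruder than a real concordancer.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    rows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            # Up to `width` words of context on each side of the match.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

sample = ("The history of all hitherto existing society "
          "is the history of class struggles.")

# Print the keyword in a centered column with its context on either side.
for left, kw, right in kwic(sample, "history", width=3):
    print(f"{left:>20} | {kw} | {right}")
```

Each row pairs the keyword with the words on either side of it, which is essentially what a concordance does at scale across a whole corpus.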

As a linguistic tool, examining the collocation of terms and their respective uses across a corpus can provide valuable research insights, and with the advent of digital text analytics applications, generating these annotated corpora became significantly faster and more accessible.

It was our hope that, when applied to our collection of Marxist writings, the results could also have pedagogical value: By searching for a keyword of personal interest, perhaps drawn from current events or an unrelated research interest, a scholar unfamiliar with Marxism could gain insight from how the terms were applied, resulting in something of a “Marxist perspective” on the topic.

Unfortunately, the open source tools available for building a custom KWIC web application are somewhat limited. There are several desktop applications, such as AntConc and KHCoder, but the server-side applications appear to be built primarily in Java and to require significant customization. Although this doesn’t rule out building one, for a near-term solution we used Voyant Tools, a collection of research applications developed to enhance humanities reading through lightweight text analytics, with particular emphasis on HTML documents. Because our corpus was available in valid XHTML, we uploaded it and examined it using the Voyant “Contexts” tool.

The initial results were interesting. Contexts identified the top ten keywords (“class” being first), and generated concordances for them across the entire corpus.

Contexts Tool

The selections are listed alphabetically by author, with the document title in the first column, and the selected keyword is displayed in the center. The “context” and “expand” sliders at the bottom of the screen increase the amount of text displayed on either side of the keyword, and using the “plus” icon to the left of the title reveals the expanded passage.

Expanded Context

Particularly interesting are the results from a random keyword. Assuming it can be found, Contexts applies wildcard stemming, revealing unexpected instances. Consider the search for “fool” and its variants:

The keyword "fool*"

Applying a keyword relevant to a particular interest, such as the term “suffrage,” is also revelatory:

The keyword "suffrag*"
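The trailing asterisk in searches such as “fool*” and “suffrag*” behaves like a prefix wildcard. How Voyant implements this internally is not described here, so the following is only a sketch of the general idea using Python’s standard `fnmatch` module; the function name `match_variants` and the sample sentence are assumptions made for illustration.

```python
import re
from fnmatch import fnmatchcase

def match_variants(text, pattern):
    """Return the distinct words in text that match a wildcard pattern.

    Hypothetical helper: tokens are lowercased runs of word characters,
    and the pattern uses shell-style wildcards (e.g. "fool*").
    """
    tokens = re.findall(r"\w+", text.lower())
    return sorted({t for t in tokens if fnmatchcase(t, pattern.lower())})

sample = "Only a fool is fooled twice; such foolish fooling helps no one."
print(match_variants(sample, "fool*"))
```

A single pattern picks up every variant that shares the prefix, which is why such a search can surface instances a reader would not have thought to look for.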

Although it is possible to limit the results to specific texts, as a web application hosted on an external server the Voyant Contexts tool can be slow and prone to stalling, particularly when non-indexed keywords are searched. Adding or removing materials requires generating a new corpus, and the service limits how many documents it will accept. As an open source project, however, Voyant Tools can be hosted as a dedicated instance. Should the project generate sufficient interest, such an approach is worthy of consideration.

In the meantime, access to the corpus via Contexts is available at http://adiuva.me/marx/contexts.html.

2 thoughts on “Keywords in Context”

  1. James Mellone

    The idea of a concordance is a good one when it comes to a collection of works by someone like Marx. That’s a large undertaking that would require a more involved absorption of Marx’s works and their content than you may be able to tackle at this stage. As a first step toward providing more descriptive information about content, I would like to see a summary of sorts, 150 words or so, for each book in the current digital collection. Although I’m conversant with some of the main concepts of Marxism, my sporadic reading means I cannot recall which concept he addressed in which book. He repeated himself extensively, but it would be good to have a brief overview of each work so the reader has a contextual starting point for doing some keyword searching.

    1. Dave Williams Post author

      That’s an excellent idea — and in keeping with the nature of the project, ideally one that could be accomplished programmatically.

      It might be useful to apply Topic Modeling, generating information that would fill a similar role to the summaries you describe. Not exactly a general description, but a “label” of the main themes, alongside a series of keyword “tags.” Definitely something to keep in mind for a future iteration.
