Author Archives: Dave Williams

About Dave Williams

Web & Digital Services Librarian at Queens College, CUNY.

A Digital Collection

As part of our ongoing experiments in visualization, I recently rebuilt a personal website using Omeka (or, more specifically, what its developers now call “Omeka Classic”), a lightweight web publishing platform for digital collections from George Mason University’s Roy Rosenzweig Center for History and New Media. Previous exposure had suggested that, thanks to its metadata support, Omeka on its own could serve as a decent digital asset management add-on to an existing site. In this instance I wanted to find out what Omeka offers when our corpus of plain text files is treated as an online collection.

As it happens, a number of very nice plugins extend Omeka’s functionality, and several are designed specifically for text analysis and visualization. With each file’s content attached as item type metadata (in this case, the textual component), I was able to use the Ngram plugin to analyze term frequency and plot it over time.
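
The Ngram plugin’s internals are not described here, so the following is only a minimal Python sketch of the general idea it builds on, not the plugin’s implementation: count how often a term (one word or a multi-word n-gram) appears in each year’s texts, then divide by the total number of n-grams of that length so that years with more text are not unfairly favored. The (year, text) input format and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately simple tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    # Every contiguous run of n tokens, joined with single spaces.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequency_by_year(documents, term):
    """Relative frequency of `term` (one or more words) for each year.

    `documents` is an iterable of (year, text) pairs, a hypothetical
    stand-in for collection items plus their date metadata.
    """
    key = " ".join(tokenize(term))
    n = len(key.split())
    if n == 0:
        raise ValueError("term must contain at least one word")
    hits = defaultdict(int)    # occurrences of the term per year
    totals = defaultdict(int)  # all n-grams of the same length per year
    for year, text in documents:
        grams = ngrams(tokenize(text), n)
        hits[year] += Counter(grams)[key]
        totals[year] += len(grams)
    # Normalize by the number of n-grams so long and short years are comparable.
    return {year: (hits[year] / totals[year] if totals[year] else 0.0)
            for year in sorted(totals)}


docs = [
    (1848, "A spectre is haunting Europe, the spectre of communism."),
    (1867, "Capital is money; capital is commodities."),
]
print(term_frequency_by_year(docs, "spectre"))
```

Plotting the resulting year-to-frequency mapping as a line chart gives the kind of term-over-time view described above.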

Keywords in Context

One of the initial project goals was to provide a means of examining the text through keywords, or more specifically through a variation on the tools designed for “keywords in context” (KWIC). This technique involves generating a concordance: a list of the words used within a text, each shown together with the words immediately surrounding it, so that every occurrence can be read in context. Traditionally, generating a concordance was a labor-intensive process reserved for the most important books, often the foundational texts of a religion, such as the Judeo-Christian Bible.
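
To make the KWIC idea concrete, here is a minimal Python sketch for a single keyword; it is an illustration, not the project’s actual tool. The function name kwic and the width parameter (how many words to show on each side) are hypothetical. Matching ignores case and surrounding punctuation, while the displayed context keeps the original words.

```python
import re


def kwic(text, keyword, width=4):
    """Keyword-in-context lines: `width` words on either side of each match."""
    words = text.split()
    # Compare on a normalized form (punctuation stripped from the ends,
    # lowercased) but display the original words.
    normalized = [re.sub(r"^\W+|\W+$", "", w).lower() for w in words]
    target = keyword.lower()
    lines = []
    for i, w in enumerate(normalized):
        if w == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{words[i]}] {right}".strip())
    return lines


text = "A spectre is haunting Europe, the spectre of communism."
for line in kwic(text, "spectre", width=2):
    print(line)
```

A full concordance would repeat this for every distinct word in the corpus, usually aligning the bracketed keyword in a single column so the surrounding contexts can be scanned quickly.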

As a linguistic tool, examining how terms collocate and how they are used across a corpus can yield valuable research insights. With the advent of digital text analytics applications, generating these annotated corpora has become significantly faster and more accessible.
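
Collocation can be approximated by counting which words occur within a fixed window of a chosen “node” word across the whole corpus. The Python sketch below does exactly that and nothing more: it returns raw co-occurrence counts. Collocation studies usually go further and rank candidates with an association measure such as pointwise mutual information or log-likelihood. The function name collocates and the window parameter are hypothetical.

```python
import re
from collections import Counter


def collocates(texts, node, window=5):
    """Count words occurring within `window` tokens of `node` across a corpus."""
    node = node.lower()
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == node:
                # Neighbours on the left and right, clipped at the text edges.
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


corpus = [
    "A spectre is haunting Europe, the spectre of communism.",
    "The history of all hitherto existing society is the history of class struggles.",
]
print(collocates(corpus, "history", window=2).most_common(3))
```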

It was our hope that, when applied to our collection of Marxist writings, the results could also have pedagogical value: by searching for a keyword of personal interest, perhaps drawn from current events or an unrelated research area, a scholar unfamiliar with Marxism could gain insight from how the term was applied, arriving at something of a “Marxist perspective” on the topic.

Improvements and Interactions

Network Explorations

Initial Experiments

The initial data corpus for this project is composed of twenty-five files, available as both plain text and XHTML and encoded in UTF-8 (the 8-bit Unicode Transformation Format). These were produced from materials hosted by the Marxists Internet Archive, by carefully cleaning the content with regular expressions. Extraneous presentational markup and line endings were removed and, where applicable, replaced with semantic markup.
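
The project’s actual regular expressions are not given here, so the Python sketch below only illustrates the kind of cleaning described, using hypothetical rules: presentational emphasis (b, i) is swapped for its semantic equivalents (strong, em), purely presentational wrappers (font, center) and inline styling attributes are dropped, and hard line endings are collapsed into spaces. Regular expressions are a fragile way to process HTML in general, so rules like these are best checked against each source file.

```python
import re


def clean_markup(html):
    """Hypothetical clean-up rules illustrating regex-based markup cleaning."""
    # Swap presentational emphasis for semantic equivalents (<b> -> <strong>, <i> -> <em>).
    # The \b keeps <br>, <body>, <img>, etc. from matching.
    html = re.sub(r"<(/?)b\b[^>]*>", r"<\1strong>", html, flags=re.IGNORECASE)
    html = re.sub(r"<(/?)i\b[^>]*>", r"<\1em>", html, flags=re.IGNORECASE)
    # Drop purely presentational wrapper tags, keeping their contents.
    html = re.sub(r"</?(?:font|center)\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Remove inline presentational attributes.
    html = re.sub(r'\s(?:style|align|color|bgcolor)="[^"]*"', "", html, flags=re.IGNORECASE)
    # Collapse hard line endings (and whitespace around them) into single spaces.
    html = re.sub(r"\s*\n\s*", " ", html)
    return html.strip()


raw = '<p align="center">A <b>spectre</b>\nis <font color="red">haunting</font> Europe.</p>'
print(clean_markup(raw))
```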

XHTML was selected because, as a markup language that follows the XML standard, it supports the inclusion of additional metadata. For this project we wanted to record both the date and the location where each document was originally written, and this was accomplished using a set of elements defined by the Dublin Core Metadata Initiative. For example, Marx and Engels’s Manifesto of the Communist Party was created in Brussels, Belgium, in February 1848, so the XHTML version contains the following markup, in which DC.coverage.x and DC.coverage.y hold the longitude and latitude of Brussels:

<meta name="DC.date" content="1848-02-21" />
<meta name="DC.coverage.x" scheme="Point" content="4.3333" />
<meta name="DC.coverage.y" scheme="Point" content="50.8333" />
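
Because the files are XML, this metadata can be read back with a standard XML parser. The Python sketch below, using only the standard library, collects every DC.* meta element and converts the date and coordinates into typed values. It assumes well-formed XHTML; ElementTree does not load external DTDs, so DTD-defined entities such as &nbsp; would cause a parse error unless they are replaced first. The function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

XHTML_META = "{http://www.w3.org/1999/xhtml}meta"


def dublin_core(xhtml):
    """Collect DC.* <meta> name/content pairs from an XHTML document."""
    root = ET.fromstring(xhtml)
    meta = {}
    for el in root.iter():
        # Accept <meta> with or without the XHTML namespace.
        if el.tag in ("meta", XHTML_META):
            name = el.get("name", "")
            if name.startswith("DC."):
                meta[name] = el.get("content")
    return meta


def when_and_where(meta):
    """Return (date, (latitude, longitude)); in this markup x is longitude, y latitude."""
    written = date.fromisoformat(meta["DC.date"])
    lat = float(meta["DC.coverage.y"])
    lon = float(meta["DC.coverage.x"])
    return written, (lat, lon)


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><head>
<title>Manifesto of the Communist Party</title>
<meta name="DC.date" content="1848-02-21" />
<meta name="DC.coverage.x" scheme="Point" content="4.3333" />
<meta name="DC.coverage.y" scheme="Point" content="50.8333" />
</head><body /></html>"""

print(when_and_where(dublin_core(SAMPLE)))
```

Values extracted this way can then drive a timeline (from DC.date) or a map (from the coordinate pair).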

Interrogating Marx(ism)
