Digital Humanities

Stephen Best and Sharon Marcus introduced the concept of “surface reading,” a reading practice focused on what a text plainly presents, aiming to understand the text “at face value” without seeking hidden meanings or counter-signals. They describe this approach as one where criticism serves a modest purpose: “to indicate what the text says about itself” (Best and Marcus 2009: 11).

In contrast, the last two decades have seen the emergence of “distant reading,” a practice popularized by Franco Moretti. Critics sometimes call it “not reading,” because the actual analysis is performed by computers rather than human readers. Distant reading and kindred methods, such as distant viewing, fall under the domain of the digital humanities, a field that employs computational and statistical techniques to address research questions too large in scale for individual scholars. Despite debates about its place in literary theory, the field has gained significance as the digitization of texts has become widespread. Projects like Google Books and Project Gutenberg have digitized vast numbers of literary texts, making computational approaches to literature far more feasible. Although the Optical Character Recognition (OCR) used for digitization is prone to errors, the resulting texts are still usable for most computational analyses.

Computational approaches are particularly well suited to text analysis because words are discrete units, which makes texts amenable to counting and related operations. These methods let researchers compare versions of a text, search for words, analyze word proximities, and identify clusters of words. They can also attribute anonymous texts or expose misattributions by examining stylistic markers such as punctuation and sentence structure. For example, computational analysis has shown that the frequency of words expressing doubt in Joseph Conrad’s *Heart of Darkness* is statistically distinct from their frequency in a broader corpus of standard English (Stubbs 2005).
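To make this kind of counting concrete, here is a minimal sketch that compares the frequency of selected words in a target text against a reference corpus using Dunning’s log-likelihood statistic, a standard measure for such comparisons. The file names and the word list are illustrative assumptions, not Stubbs’s actual data or procedure.

```python
# A minimal stylometric sketch. The file names and DOUBT_WORDS list are
# hypothetical placeholders, not Stubbs's actual data or word list.
import re
from collections import Counter
from math import log

DOUBT_WORDS = {"seemed", "perhaps", "something", "somehow", "vague"}

def word_counts(path):
    """Tokenize a plain-text file into lowercase words and count them."""
    with open(path, encoding="utf-8") as f:
        return Counter(re.findall(r"[a-z']+", f.read().lower()))

def log_likelihood(k1, n1, k2, n2):
    """Dunning's log-likelihood (G2) for a word occurring k1 times in
    n1 tokens of corpus 1 versus k2 times in n2 tokens of corpus 2."""
    e1 = n1 * (k1 + k2) / (n1 + n2)  # expected count under equal rates
    e2 = n2 * (k1 + k2) / (n1 + n2)
    g2 = 0.0
    if k1:
        g2 += k1 * log(k1 / e1)
    if k2:
        g2 += k2 * log(k2 / e2)
    return 2 * g2

text = word_counts("heart_of_darkness.txt")   # the target text
ref = word_counts("reference_corpus.txt")     # a standard-English sample
n_text, n_ref = sum(text.values()), sum(ref.values())

for w in sorted(DOUBT_WORDS):
    g2 = log_likelihood(text[w], n_text, ref[w], n_ref)
    print(f"{w:10s} text={text[w]:5d} ref={ref[w]:5d} G2={g2:8.1f}")
```

A high G2 value flags a word whose frequency departs sharply from the reference corpus; values above roughly 6.6 correspond to p < 0.01 on one degree of freedom.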

Computational methods are also useful at larger scales. Franco Moretti has criticized traditional literary history for resting on a small and unrepresentative selection of canonized texts: of the approximately 30,000 novels published in English in the 19th century, fewer than one percent entered the canon. While human scholars cannot feasibly read the remaining 29,700 novels, computational approaches can analyze them to reveal patterns in vocabulary, themes, and genres, as Johanna Drucker points out (Drucker 2021: 113).
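As a sketch of what such pattern-finding might look like at archive scale, the following code tracks one word’s relative frequency per decade across a directory of digitized novels. The directory layout (novels/1820/*.txt and so on) and the tracked word are assumptions for illustration, not a reconstruction of any particular study.

```python
# A minimal "distant reading" sketch: relative frequency of one word per
# decade across a corpus too large to read. The directory layout and the
# tracked word are hypothetical.
import re
from collections import Counter
from pathlib import Path

WORD = "railway"  # an example vocabulary pattern to trace across time

totals = Counter()  # total tokens per decade
hits = Counter()    # occurrences of WORD per decade

for path in Path("novels").glob("*/*.txt"):
    decade = path.parent.name  # e.g. "1820"
    tokens = re.findall(r"[a-z']+", path.read_text(encoding="utf-8").lower())
    totals[decade] += len(tokens)
    hits[decade] += tokens.count(WORD)

for decade in sorted(totals):
    rate = hits[decade] / totals[decade] * 1_000_000
    print(f"{decade}s: {rate:7.1f} occurrences per million words")
```

Normalizing to occurrences per million words matters here: decades with more surviving (or more digitized) novels would otherwise dominate the raw counts.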

Computational methods often depend on collaboration with traditional literary analysis. In sorting novels by genre, for example, software must be guided by genre markers identified through close reading. Ted Underwood’s computational research on science fiction, which drew on bibliographies compiled by genre scholars, successfully identified novels within the genre. Similarly, computational analysis of detective fiction has traced the genre’s defining characteristics back to Edgar Allan Poe’s “The Murders in the Rue Morgue” (Underwood 2016).
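A schematic sketch of how scholar-compiled bibliographies can supervise such a classifier follows; the file layout, label file, and parameters are assumptions for illustration, not Underwood’s actual pipeline. The idea is simply that close reading supplies the labels and the software learns which vocabulary correlates with them.

```python
# A schematic genre-classification sketch. labels.csv (filename,label rows)
# stands in for a scholar-compiled bibliography; all paths and parameters
# are hypothetical.
import csv
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

texts, labels = [], []
with open("labels.csv", newline="", encoding="utf-8") as f:
    for filename, label in csv.reader(f):  # e.g. "wells_1898.txt,science_fiction"
        texts.append(Path("novels", filename).read_text(encoding="utf-8"))
        labels.append(label)

# Represent each novel by weighted word frequencies, then learn which
# vocabulary separates the scholar-labeled genre from everything else.
X = TfidfVectorizer(max_features=10_000, sublinear_tf=True).fit_transform(texts)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, labels, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```

Cross-validation here estimates how reliably vocabulary alone recovers the scholars’ genre judgments on novels the model has not seen.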

Moreover, computational methods can differentiate fiction from non-fiction, as Andrew Piper’s work demonstrates with 95% accuracy for 19th-century novels; his findings also show that the distinction between fiction and non-fiction remains consistent over time (Piper 2018: 105).

Critics of computational methods argue that these approaches present themselves as apolitical while risking the perpetuation of biases related to race, gender, and social inequality. Safiya Umoja Noble’s *Algorithms of Oppression* (2018) highlights the risk of embedding prejudice in algorithms, and her call for a “Critical Black Digital Humanities” (Noble 2019) echoes the concerns of scholars such as Roopika Risam, who examines postcolonial digital humanities. Computational methods have nonetheless been used to study race and gender, as in Richard Jean So’s research on racial inequality in postwar fiction and the work on gender and authorship by Underwood, David Bamman, and Sabrina Lee (Underwood et al. 2018). It remains essential, however, to scrutinize these methods for bias, since their assumptions can be less visible than those of overtly theoretical approaches.


