Design questions all the way down. A General Methodological Model of Distant Reading
Yann Audin of the Laboratoire de recherche sur les écritures numériques will present a poster at the annual conference of the Journal of Computational Literary Studies at the University of Potsdam.
Presentation abstract
Since it was coined in “Conjectures on World Literature” (Moretti 2000), the expression “Distant Reading” (DR) has come to represent a wide array of practices in Computational Literary Studies. Despite their varied goals, computational tools and infrastructures, and the extensive differences in the types and sizes of their corpora, DR projects share a common general conceptual structure:
- Corpus creation: A literary corpus can include well-structured and cleaned textual data (Fileva 2023; Bories, Fabo, and Plecháč 2022), metadata on the studied works and their historical contexts (Wilkens 2016) and annotations (Wolfe 2002);
- Formal model: A computational method to capture relevant information from the formal features of the corpus; the process of literary modelling is described by McCarty (2013) and Piper (2015, 2017);
- Datafication: The application of the formal model on the corpus to transform it into a database is an act of reduction that is necessary to study massive corpora, but it amplifies any biases introduced in the previous steps;
- Data representation: Data can be visualized in graphical (Drucker 2011; Jänicke et al. 2017) or tabular (Flanders and Jannidis 2018) form to uncover large-scale patterns, connections, outliers, etc.;
- Analysis: Distant Reading infers findings and confirms (or refutes) hypotheses from the trends discovered with data representations and statistical methods, in conjunction with scholarly expertise and literary insight.
These steps are informed by technical limitations, corpus specificities, and the research question. They further carry design questions (literary, scientific and technical choices) that are often dismissed as trivial in DR discussions; this poster presents these questions through a general methodological pipeline of Distant Reading. More significantly, it aims to showcase the compounding effect of design choices in a practice based on large-scale corpora and algorithmic automation.
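The five steps and the way an early design choice propagates can be illustrated with a toy sketch. It is not the poster's method: the two-sentence corpus is invented, the function names (`tokenize`, `datafy`, `type_token_ratio`, `tabulate`) are hypothetical, and the type-token ratio is only an assumed stand-in for a formal measure. The single design question shown is whether to lowercase the text before counting.

```python
import re
from collections import Counter

# Step 1, corpus creation: two invented sentences stand in for a corpus.
corpus = {
    "text_a": "The sea was calm. The Sea was calm again.",
    "text_b": "A storm rose over the sea.",
}


def tokenize(text, lowercase):
    """Step 2, formal model: a text is modelled as a bag of word tokens.

    `lowercase` is the design choice under examination.
    """
    if lowercase:
        text = text.lower()
    return re.findall(r"[A-Za-z]+", text)


def datafy(corpus, lowercase):
    """Step 3, datafication: reduce every text to a table of token counts."""
    return {
        name: Counter(tokenize(text, lowercase))
        for name, text in corpus.items()
    }


def type_token_ratio(counts):
    """Distinct tokens divided by total tokens (an illustrative measure)."""
    return len(counts) / sum(counts.values())


def tabulate(data):
    """Step 4, data representation: one tab-separated row per text."""
    return [
        f"{name}\t{type_token_ratio(counts):.2f}"
        for name, counts in data.items()
    ]


# Step 5, analysis: compare the same measure under both design choices.
for lowercase in (True, False):
    print(f"lowercase={lowercase}")
    for row in tabulate(datafy(corpus, lowercase)):
        print(row)
```

In this toy case, keeping the original casing treats "sea" and "Sea" as different words, so the ratio for `text_a` moves from 5/9 to 6/9 while `text_b` is unaffected. On a large corpus the same choice is applied automatically to every text, which is one small instance of the compounding effect described above.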
Overview of the poster
This poster is organized around a schematic form of the general DR pipeline that integrates theoretical and practical elements from more than two decades of Distant Reading. Importantly, it does not attempt to propose a standardized methodological pipeline for Distant Reading in Computational Literary Studies. Rather, it showcases the different streams of DR as alternative paths in the conceptual model, and presents DR as a flexible and extensible project. For instance, it points out the difference between using a computational method as a proxy for a literary concept and translating a concept into a formal model. The poster’s objective is to make the complexities of DR projects explicit through an exploration of the technical assemblages of DR, while identifying some of their epistemic consequences for Computational Literary Studies.