DH2024 “Reinvention & Responsibility“, GMU, Arlington Campus, 5.-10. August 2024

Come visit us at the ADHO Conference in DC!

Friday, August 9th

Responsible and Sustainable Editing: A Life Cycle for Digital Editions of Historical Travelogues (DEHisRe)

Poster Session – II

2:00 PM – 3:30 PM (EDT)

Van Metre Hall 125

Our presentation „A Life Cycle for Digital Editions of Historical Travelogues“ focuses on responsible resource management in Digital Humanities by outlining objectives for sustainable action in digital scholarly editions trough a case study, including a) efficient use of available development resources and infrastructure; b) consistent pursuit of code efficiency using open standards; c) transparent documentation for data reusability.

Responsible and Sustainable Editing: A Life Cycle for Digital Editions of Historical Travelogues (DEHisRe)

Sandra Balck, Anna Ananieva, Jacob Möhrke
Leibniz Institute for East and Southeast European Studies (IOS Regensburg), Germany

Abstact

Poster presented at the conference DH2024 Reinvention & Responsibility, 6–10 August 2024, Washington, D.C.

Introduction

The DFG-founded project „Digital Editions of Historical Travelogues“ (DEHisRe) develops a research infrastructure for the digital transcription, annotation, and visualization of historical travel accounts. In addition, we are examining ontology concepts for travel accounts and travel routes (Balck et al. 2023).

As a case study, we are using the handwritten travel records of Franz Xaver Bronner (1758-1850), a German migrant who traveled to Kazan on the Volga in 1810 from Aarau in Switzerland and returned from Russia in 1817 (Beyer-Thoma 2017).

By incorporating the best practices in the field, we are presenting a six-step life cycle of digital editing. This circular workflow is based on an iterative process and is aligned with the „Life Cycle of Historical Information“ (Boonstra et al. 2004; Meroño-Peñuela et al. 2014) while aiming to benefit sustainable digital editions of historical travelogues.

Six-Step Life Cycle  of Digital Editing 

The circular workflow starts with the digitization and transcription (1) of the texts to be edited, followed by the modeling and annotation (2 & 3) of data contained in the text. The processed data provides answers to complex information queries (4) and enables their analysis (5). Visualization (6) creates a new access to the texts, opening different perspectives. New research questions can trigger another iteration of the life cycle. (Fig. 1)

Fig. 1: Six-Step Life Cycle of Digital Editing

Step 1: Creating 

Creating includes the digitization and transcription of historical sources. It begins with the selection of documents to be digitized, followed by digitization and subsequent manual transcription or automated text recognition using appropriate tools. The goal of this step is to create a digital copy of the original, which forms the basis for the following steps and supports the preservation of cultural heritage.[1]

Step 2: Modeling 

During the step of data modeling, the digitized document is structured and enriched. It includes defining relationships between data, categorizing information, and creating schemas. We use a data model based on the TEI guidelines (TEI 2019) and the DTA Base Format (DTABf) (Haaf et al. 2014). These open standards allow lossless data exchange, the merging of information from different sources, and the reuse of data for extended analyses. For ontological modeling, we apply CIDOC-CRM.[2]

Step 3: Editing 

The editing step enriches the texts using the previously created data models. The digitized texts are corrected, annotated, and provided with metadata. Methods like NER and ontological annotation are employed. The dates were semi-automatically annotated with existing software modified to suit the specifics of the case study (Möhrke 2023).[3] Additional information is outsourced to registers.[4]

Step 4: Information Retrieval 

This step involves developing systems and algorithms to retrieve relevant information from large data sets. In our case study XSLT and XQuery are primarily used. This step is crucial for the accessibility and usability of the data. 

Step 5: Analysis 

In the analysis step, digitized and edited data are examined to extract insights. This can encompass various types of data analysis, including textual, network, spatiotemporal, or statistical evaluation.  The goal is to identify patterns, trends, or relationships within the data, potentially leading to new research questions that restart the life cycle.[5]

Step 6: Presentation 

This step involves presenting and disseminating the results by publishing digital editions, creating visualizations, or developing user interfaces for databases. Modeling in established formats allows visualization in various tools and linking with other digital editions. Presentation can be delivered through digital platforms, web applications, or traditional print publications. 

Conclusion 

In the context of digital editions, sustainable editing is primarily considered in terms of reusability and long-term availability (Crompton 2023).[6] The six-step life cycle we present adds value to responsible and sustainable editing in two ways: by consistently implementing established standards, and by integrating them into a workflow that efficiently utilizes and reuses data.

It helps create digital resources from historical sources, providing deeper insights and new research opportunities. Each step contributes to the digital transformation of historical information, maximizing its value for research and teaching while ensuring compliance with the FAIR principles. The iterative nature of the project emphasizes continuous improvement of digital editions, with transparent documentation ensuring data and pipeline reusability. Continuous refinement through case-by-case replication enhances the quality and usability of digital resources, providing sustainable practical guidance for innovative DH approaches along with responsible action.


[1] Overall, our workflow and deliveries align with the current European and national initiatives in the Digital Humanities and Digital Editing, such as DARIAH and NFDI. 

[2] See documentation by Ingo Frank: Draft of OWL ontology and SHACL shapes for frame-based Ontology Design Patterns for the DTO (Digital Editions of Historical Travelogues Ontology), https://github.com/dehisre01/bronner-ontology-design-patterns (10 June 2024).

[3] For this approach, the program HeidelTime was used and supplemented with regular expressions tailored to the text. This approach proved to be superior to the results of statistical models (such as spacy or flair) in experiments and has the advantage of being more resource-efficient. See documentation by Jacob Möhrke: https://github.com/dehisre01/bronner-heideltime-annotation (10 June 2024).

[4] These registers compile preliminary work of Hermann Beyer-Thoma and have been comlemented with authority data from GND, VIAF, and GeoNames. This ensures definitive identification and interoperability. The technical implementation was done via Python, OpenRefine and ediarum, i.e. with existing tools aiming for sustainable editing.

[5] One example emerged regarding overlaps between Bronner’s travel route and those of contemporaries with similar destinations. To further explore this, we need at least two editions with geolocated annotations using compatible data formats. Consequently, we initiated a dialogue with the „edition humboldt digital“ project to address this query (see Fischer; Thomas 2021; Humbodt 2020).

[6] Initiatives such as Text+ have already been established with the explicit goal of contributing to greater sustainability of digital data. An obvious path for reducing inconsistency is the use of standards like TEI (Unsworth 2011), which is common in digital editions nowadays and should improve sustainability in the future.

References

Balck, Sandra / Beyer-Thoma, Hermann / Frank, Ingo / Ananieva, Anna (2023): “Interlinking Text and Data with Semantic Annotation and Ontology Design Patterns to Analyse Historical Travelogues”, in: Digital Humanities Quarterly 17.3 <http://www.digitalhumanities.org/dhq/vol/17/3/000726/000726.html> [10 June 2024].

Balck, Sandra / Ananieva, Anna / Beyer-Thoma, Hermann / Frank, Ingo / Möhrke, Jacob / Schnell, Corwin (2022): “Building a Digital Infrastructure for the Edition and Analysis of Historical Travelogues”, in: Cummings, James (ed.): TEI 2022 Conference Book, Newcastle Upon Tyne, UK, September 2022: 70-71. DOI: 10.5281/zenodo.7071025.

Beyer-Thoma, Hermann (2017). “Donauwörth – Aarau – Kazan‘: Die Auswanderungsentscheidung des ehemaligen bayerischen Mönchs Franz-Xaver Bronner im Jahr 1809”, in: Zeitschrift für bayerische Landesgeschichte 79, 3: 689-741 <https://nbn-resolving.org/urn:nbn:de:0168-ssoar-74075-8> [10 June 2024].

Boonstra, Onno / Breure, Leen / Doorn, Peter (2004): “Past, Present and Future of Historical Information Science”, in: Historical Social Research 29, 2: 4-132, DOI: 10.12759/hsr.29.2004.2.4-132.

Crompton, Constance (2023): “No Boutique or Fashionable Technologies: Project Development, Mentorship, and Sustainability in an Innovation-First World”, in: Digital Humanities Quarterly 17, 1 <https://digitalhumanities.org/dhq/vol/17/1/000660/000660.html> [10 June 2024].

Fischer, Gordon / Thomas, Christian (2021): ”Humboldt auf Reisen: Chronotopische Zugänge zur edition humboldt digital“, in: vDHd 2021 <https://vdhd2021.hypotheses.org/292> [10. June 2024].

Haaf, Susanne / Geyken, Alexander / Wiegand, Frank (2015): “The DTA ‘Base Format’: A TEI Subset for the Compilation of a Large Reference Corpus of Printed Text from Multiple Sources”, in: Journal of the Text Encoding Initiative 8, 9 April. DOI: 10.4000/jtei.1114.

Humboldt, Alexander von (2020): “Fragmente des Sibirischen Reise-Journals 1829 [= Tagebücher der Russisch-Sibirischen Reise I]”, ed. by Tobias Kraft; Florian Schnee; Ulrich Päßler, in: Ette, Ottmar (ed.): edition humboldt digital, Berlin: Berlin-Brandenburgische Akademie der Wissenschaften, 2020 <https://edition-humboldt.de/H0005449 > [10 June 2024].

Meroño-Peñuela, Albert / Ashkpour, Ashkan / van Erp, Marieke / Mandemakers, Kees / Breure, Leene / Scharnhorst, Andrea / Schlobach, Stefan / van Harmelen, Frank (2015): “Semantic Technologies for Historical Research: A Survey”, in: Semantic Web 6, 6: 539–564. DOI: 10.3233/SW-140158.

Möhrke, Jacob (2023): Semi-automated Annotation of Dates. <https://dehisre.ios-regensburg.de/semi-automatisierte-annotation-von-zeitangaben> [10 May 2023].

Text Encoding Initiative – TEI (2019): A very gentle introduction to the TEI markup language.  <https://tei-c.org/Vault/Tutorials/mueller-index.htm> [10 June 2024].

Unsworth, John (2011): “Computational Work with Very Large Text Collections: Interoperability, Sustainability, and the TEI”, in: Journal of the Text Encoding Initiative 1, 8 June. DOI: 10.4000/jtei.215.

Weitin, Thomas (2017): “Scalable Reading”, in: Zeitschrift für Literaturwissenschaft und Linguistik 47, 1:  1–6.

0 Kommentare

Einen Kommentar abschicken

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert

WordPress Cookie Hinweis von Real Cookie Banner