Networks from Archives
Our project, "Networks from Archives," reconstructs networks of administrative correspondence from a large dataset held by the Historical Overseas Archives of Lisbon. The dataset contains nearly 170,000 documents, dated from 1610 to 1833, that record administrative correspondence between Portugal and its former Atlantic colonies. In this article, we focus on the period from 1642, when the Overseas Council was established in Lisbon, to 1822, when Brazil declared independence. Through this analysis, we aim to contribute to ongoing debates about the historical relations and networks linking Portugal and its former colonies.
In "Networks from archives: Reconstructing networks of official correspondence in the early modern Portuguese empire," we detail the development of our digital methodology for converting archival data into network data. Figure 1 provides a succinct overview of the steps involved in our methodology, which can be summarized as follows:
First, we drew a random sample from the 169,221 register entries, stored as plain text, to make manual annotation manageable. Next, we annotated the entries in the sample with the relevant labels. We then used the annotated entries to train an NER (Named Entity Recognition) model, which we applied to all entries to identify senders, recipients, and their attributes. In addition, we used regular expressions to identify recurring text patterns and to extract metadata from the entries. Finally, we built the network data by resolving duplicate entities and correcting typos, and then carried out network analysis and visualization.
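The sampling and regular-expression steps can be illustrated with a minimal Python sketch. It assumes that each register entry opens with a header of the form "year, month, day, place. DOCUMENT TYPE"; the sample entries, the month table, the pattern HEADER, and the helpers draw_sample and extract_metadata are all hypothetical illustrations, not the project's actual code or data format.

```python
import random
import re

# Hypothetical register entries, loosely modelled on catalogue summaries.
# The real entry format in the dataset may differ.
ENTRIES = [
    "1745, Março, 12, Lisboa. CONSULTA do Conselho Ultramarino ao rei D. João V sobre ...",
    "1756, Abril, 3, Bahia. CARTA do governador ao rei D. José sobre ...",
    "1801, Junho, 21, Luanda. OFÍCIO do governador ao secretário de estado sobre ...",
]

# Portuguese month names mapped to month numbers (assumed modern spelling).
MONTHS = {
    "janeiro": 1, "fevereiro": 2, "março": 3, "abril": 4,
    "maio": 5, "junho": 6, "julho": 7, "agosto": 8,
    "setembro": 9, "outubro": 10, "novembro": 11, "dezembro": 12,
}

# Assumed header pattern: "<year>, <month>, <day>, <place>. <DOCUMENT TYPE> ..."
# The last group captures the upper-case document type (accented capitals included).
HEADER = re.compile(r"^(\d{4}),\s*(\w+),\s*(\d{1,2}),\s*([^.]+)\.\s*([A-ZÀ-Ý]+)")


def draw_sample(entries, k, seed=42):
    """Draw a reproducible random sample of k entries for manual annotation."""
    return random.Random(seed).sample(entries, k)


def extract_metadata(entry):
    """Parse date, place, and document type from an entry header; None if no match."""
    m = HEADER.match(entry)
    if m is None:
        return None
    year, month, day, place, doc_type = m.groups()
    return {
        "year": int(year),
        "month": MONTHS.get(month.lower()),  # None if the month name is unknown
        "day": int(day),
        "place": place.strip(),
        "type": doc_type,
    }
```

Fixing the seed makes the annotation sample reproducible. Entries whose header does not match the assumed pattern return None, which flags them for manual inspection instead of silently guessing a date.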
By following these steps, we converted the archival data into a structured network format, which allowed us to analyze the administrative correspondence and the relationships between entities more systematically.
Fig. 1. Schematic of the methodology to turn archival data into network data.
The main goal is to convert large amounts of unstructured archival text into network data. This approach is increasingly popular in historical research and has been applied successfully to a range of historical periods. Designing a network study involves identifying the social entities of interest (nodes), the relevant types of relations between them (edges), and the attributes of both nodes and edges. Although working with archival material may seem more passive than active modes of data collection, such as experiments or surveys, researchers still have some freedom in how they interpret and code the sources to infer types of relationships and attributes of actors. How much freedom they have depends on the source material, which ranges from structured indexes to unstructured natural-language prose.
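To make the vocabulary of nodes, edges, and attributes concrete, here is a minimal Python sketch of the final pipeline step: merging spelling variants of the same entity and aggregating sender-recipient pairs into weighted, directed edges. The input records, the alias table, and the functions normalize and build_network are hypothetical; in practice the pairs would come from the NER output, and deduplication would likely need more than accent and whitespace normalization.

```python
import unicodedata
from collections import Counter, defaultdict

# Hypothetical (sender, recipient, year) records, as they might come out of
# the NER step. Spelling variants are included on purpose.
RECORDS = [
    ("Conselho Ultramarino", "rei D. João V", 1745),
    ("Conselho  Ultramarino", "Rei D. Joao V", 1746),
    ("governador da Bahia", "rei D. João V", 1746),
    ("rei D. João V", "governador da Bahia", 1747),
    ("Conselho Ultra-marino", "rei D. João V", 1748),
]

# Hypothetical alias table for variants that normalization alone cannot merge.
ALIASES = {"conselho ultra-marino": "conselho ultramarino"}


def normalize(name):
    """Casefold, strip accents, and collapse whitespace so variants share one key."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    key = " ".join(no_accents.casefold().split())
    return ALIASES.get(key, key)


def build_network(records):
    """Return node attributes and weighted directed edges (sender -> recipient)."""
    weights = Counter()
    years = defaultdict(list)
    nodes = defaultdict(lambda: {"sent": 0, "received": 0})
    for sender, recipient, year in records:
        s, r = normalize(sender), normalize(recipient)
        weights[(s, r)] += 1          # edge weight = number of documents
        years[(s, r)].append(year)
        nodes[s]["sent"] += 1         # node attributes: simple activity counts
        nodes[r]["received"] += 1
    edges = {
        pair: {
            "weight": w,
            "first_year": min(years[pair]),
            "last_year": max(years[pair]),
        }
        for pair, w in weights.items()
    }
    return dict(nodes), edges
```

Here each normalized name is a node with count attributes, and each directed edge carries a weight and the first and last year of exchange. Plain dictionaries like these can be loaded into a graph library for further analysis and visualization.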