MAPE Engine

What is MAPE Engine?

The extensive, unstructured correspondence of the Portuguese Empire (1610–1833), archived in the Arquivo Histórico Ultramarino de Lisboa, poses a major challenge to traditional research methods because of its complexity and volume. This study presents the MAPE Engine, an AI-powered framework that integrates natural language processing (NLP), large language models (LLMs), embedding-based validation, and clustering techniques to extract information from approximately 180,000 historical records. The method automatically assigns a concise, contextual topic to each piece of correspondence and organizes the topics into thematic clusters that reveal overarching categories, including colonial administration, maritime trade, religious affairs, and mobility. By drawing on the multilingual and contextual understanding of the LLaMA 3.2 model together with clustering algorithms, the approach overcomes the limitations of traditional archival processing and makes the records more accessible and easier to interpret. The MAPE Engine opens new possibilities for archival research: it enables international scholars and history enthusiasts to explore hidden patterns and connections in historical datasets through a bilingual, user-friendly tool available in English and Portuguese.
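The two steps named above, embedding-based validation of an assigned topic and clustering of topics into themes, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the bag-of-words `embed` function stands in for a real embedding model, the thresholds are arbitrary, and the greedy grouping in `cluster_topics` is a simple placeholder rather than the clustering algorithm used by the MAPE Engine.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def validate_topic(record_text, topic, threshold=0.2):
    """Accept a topic only if it is similar enough to the record it labels.

    The threshold is a hypothetical value chosen for this example.
    """
    return cosine(embed(record_text), embed(topic)) >= threshold


def cluster_topics(topics, threshold=0.3):
    """Greedy single-pass grouping: each topic joins the first cluster whose
    seed topic is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_vector, member_topics)
    for topic in topics:
        vector = embed(topic)
        for seed, members in clusters:
            if cosine(seed, vector) >= threshold:
                members.append(topic)
                break
        else:
            clusters.append((vector, [topic]))
    return [members for _, members in clusters]


record = "Request for a maritime trade licence in Goa"
print(validate_topic(record, "maritime trade licence"))
print(cluster_topics([
    "maritime trade licence",
    "maritime trade dispute",
    "church appointment",
]))
```

In a real pipeline the `embed` function would be replaced by a multilingual sentence-embedding model, and the grouping step by a standard clustering algorithm; the validation idea (reject a topic that is not close to its source record) stays the same.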

The MAPE Engine will soon be available on the server of the Tadeusz Manteuffel Institute of History, Polish Academy of Sciences.

Find out more about MAPE Engine

Our Dataset

MAPE: A Dataset of Correspondence from the Portuguese Empire

The MAPE dataset comprises 182,491 historical correspondence records from the Arquivo Histórico Ultramarino de Lisboa (Portuguese Overseas Archives of Lisbon, hereafter AHU), in particular from the collection of the Conselho Ultramarino (Overseas Council), covering the period from 1581 to 1859.

The MAPE dataset is provided as a single CSV file in the repository root. It consolidates all correspondence registers extracted from the AHU PDFs into a uniform tabular structure.
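Because the dataset is a single CSV file, it can be read with Python's standard `csv` module. The sketch below is only an illustration: the column names in the sample (`record_id`, `summary_pt`, `summary_en`) and the file name in the closing comment are hypothetical, so the real header row should be inspected before relying on any field.

```python
import csv
import io


def read_records(handle):
    """Read every row from an open CSV handle into a list of dicts
    keyed by the names in the header row."""
    return list(csv.DictReader(handle))


# Hypothetical two-row sample standing in for the real file.
# The actual column names must be taken from the dataset's own header.
SAMPLE = io.StringIO(
    "record_id,summary_pt,summary_en\n"
    "1,Requerimento ao rei,Petition to the king\n"
    "2,Carta do governador,Letter from the governor\n"
)

records = read_records(SAMPLE)
columns = list(records[0].keys())

print(columns)       # column names found in the header row
print(len(records))  # number of data rows read

# For the downloaded file, the same function applies, for example:
# with open("mape.csv", newline="", encoding="utf-8") as f:  # file name is an assumption
#     records = read_records(f)
```

Opening the file with `newline=""` and an explicit UTF-8 encoding follows the `csv` module's recommendations and keeps Portuguese diacritics intact.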

The consolidated dataset originally contained correspondence in Portuguese only, which is a significant barrier for a global audience. To overcome this limitation, we translated the original content into English using Google Gemini 1.5 Flash, a lightweight transformer-based model optimized for multilingual text processing and translation. Google Gemini 1.5 Flash supports over 100 languages and is designed to balance speed, computational efficiency, and high-quality text generation.

Download the dataset: MAPE: A Dataset of Correspondence from the Portuguese Empire, https://zenodo.org/records/15481608

How to cite: Błoch, A., Vasques Filho, D., Bojanowski, M., Santana, C., & Hussain, S. (2025). MAPE: A Dataset of Correspondence from the Portuguese Empire [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15481608

© 2023 by MAPE.