
Our latest article: Unveiling the Critical Nexus of Data Preprocessing and Transparent Documentation for Result Quality and Reproducibility in Digital History



This paper underscores the importance of adequate data preprocessing, transparency, and documentation in digital history research, showing how these often-overlooked practices affect research quality and reproducibility. We present a topic modelling case study of more than 160,000 records of official correspondence of the Atlantic Portuguese Empire from 1640 to 1822 to illustrate how these practices, combined with standardised formats and metadata conventions, facilitate the sharing and reproduction of experiments. First, we evaluate the impact of data cleaning and preprocessing on model performance. Second, for model selection, we compare the performance of latent Dirichlet allocation (LDA), latent semantic indexing (LSI), and the Gibbs sampling algorithm for a Dirichlet mixture model (GSDMM). Besides stressing the underestimated significance of data preprocessing and transparent documentation for strengthening research robustness and fostering a culture of reproducibility, we also demonstrate the potential of topic modelling in digital historical studies, specifically in the context of the Atlantic Portuguese Empire.
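To make the idea of documented preprocessing concrete, here is a minimal sketch in Python. It is not the pipeline used in the paper. The stop-word list, the step names, and the three sample records are all invented for illustration. The sketch records every cleaning step it applies and reports how the vocabulary size changes, which is the kind of information a reader would need to reproduce an experiment.

```python
import json
import re
import unicodedata

# Hypothetical stop-word list; a real study would publish its full list.
STOPWORDS = {"de", "da", "do", "a", "o", "e", "para", "sobre"}


def strip_accents(text):
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def preprocess(text):
    # Return the cleaned tokens together with the ordered list of steps
    # applied, so that the pipeline documents itself.
    steps = []
    text = text.lower()
    steps.append("lowercase")
    text = strip_accents(text)
    steps.append("strip_accents")
    tokens = re.findall(r"[a-z]+", text)
    steps.append("tokenize_letters_only")
    tokens = [t for t in tokens if t not in STOPWORDS]
    steps.append("remove_stopwords")
    return tokens, steps


def vocabulary(token_lists):
    # Set of distinct tokens across all records.
    return {t for tokens in token_lists for t in tokens}


# Invented toy records, NOT taken from the corpus described above.
records = [
    "Carta do governador sobre a defesa da capitania.",
    "CARTA do Governador, sobre a Defesa da Capitania!",
    "Requerimento de mercês para o capitão.",
]

# Baseline: naive whitespace splitting with no cleaning at all.
raw_vocab = vocabulary(r.split() for r in records)

clean = []
steps = []
for record in records:
    tokens, steps = preprocess(record)
    clean.append(tokens)
clean_vocab = vocabulary(clean)

# A small machine-readable report that could be shared alongside results.
report = {
    "steps": steps,
    "raw_vocab_size": len(raw_vocab),
    "clean_vocab_size": len(clean_vocab),
}
print(json.dumps(report, indent=2))
```

In this toy case, the first two records differ only in capitalisation and punctuation, so they collapse to the same token list after cleaning. Saving a report like this with the model output lets others see exactly which transformations preceded the topic model.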


 
 
 


© 2023 by MAPE.
