Natural Language Processing (NLP), statistical methods of analysis, and, more recently, Machine Learning (ML) are at the heart of the computational analysis of large amounts of unstructured data. While advanced techniques and tools are readily available to preprocess, process, and analyze vast corpora in Western languages, most of them are either inapplicable or only inconsistently applicable to Ottoman Turkish documents. Consequently, except for sporadic attempts to introduce computational text analysis to Ottoman Studies, Ottoman scholarship remains largely incompatible with the age of Big Data.
In an attempt to offer a solution to some of the infrastructural and methodological problems that prevent Ottoman texts from being analyzed computationally, I have spent the last two years developing Rumi 1.0 and Rumi Analyzer, two software tools optimized for the Ottoman Turkish language. In my talk I will discuss the challenges of processing Ottoman Turkish texts algorithmically and the solutions I have devised for them, and I will present the results of my analyses of early modern Ottoman documents.
Tamás Kiss (PhD, CEU) is a software developer and a historian of the early modern Ottoman Empire and the Mediterranean with an interest in computer-enhanced stylometry and topic modelling. He is currently developing tools for Ottoman Turkish computational text analysis and working toward a best-practices guide for Ottomanists interested in using computational methods in their research. He is a non-stipendiary postdoctoral researcher at CEU’s Medieval Studies Department and an e-learning developer at QLytix Hungary.