Alambic: an AI-based Environment for Exploring Literary Texts in Modern Arabic
The number of digitized and born-digital heritage documents in Arabic, held in national libraries, archives, and specialized collections around the world, is growing rapidly. However, despite recent digitization initiatives, this important cultural heritage remains difficult to exploit at scale without intelligent tools for in-depth analysis of these collections.
Given these challenges, the aim of this project is to create an AI and Natural Language Processing (NLP) text-analysis pipeline for literary texts in Arabic (books and other documents of substantial length). Initially, the pipeline will include named entity recognition, dialogical structure identification, and emotion analysis, whose outputs will be used to visualize and explore fictional narrative structure, character networks, and literary cartographies.
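As an illustration of how named entity output could feed a character network, here is a minimal sketch. It is not the project's implementation: the function name `character_network`, the sentence-level co-occurrence rule, and the sample data (transliterated names standing in for Arabic mentions) are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def character_network(sentences):
    """Count how often two characters are mentioned in the same sentence.

    `sentences` is a list of lists of character names, i.e. the
    (hypothetical) output of an upstream named entity recognition step.
    Returns a Counter mapping (name_a, name_b) pairs to co-occurrence counts.
    """
    edges = Counter()
    for mentions in sentences:
        # Deduplicate and sort so each pair has one canonical orientation.
        for a, b in combinations(sorted(set(mentions)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical sample: one list of character mentions per sentence.
sample = [
    ["Zaynab", "Hamid"],
    ["Hamid"],
    ["Zaynab", "Hamid", "Ibrahim"],
]
network = character_network(sample)
for (a, b), weight in network.most_common():
    print(a, b, weight)
```

Each weighted pair can be read as an edge in a graph whose nodes are characters. Sentence-level co-occurrence is only one possible linking rule; paragraph windows or dialogue turns would be equally plausible choices.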
This project will thus open up an advanced AI-powered approach to the study of literary texts, one that can be used to test hypotheses and develop new knowledge. The proposed pipeline will simplify digital reading and research for students and scholars, and will make it possible to carry out both qualitative and quantitative analyses, proposing new ways of understanding works or authors through the emotions associated with places and characters. More broadly, the resulting representations will make it possible to create virtual navigation scenarios around a given character or place, and to link the identified elements to other texts or online resources (GeoNames, Wikidata, etc.).
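The idea of associating emotions with places and characters can be sketched as a simple aggregation. This is a hedged example only: the input format (a list of entities plus one emotion label per sentence), the label names, and the functions `emotion_profiles` and `dominant_emotion` are assumptions, not part of the project's specification.

```python
from collections import Counter, defaultdict


def emotion_profiles(annotated):
    """Tally emotion labels per entity.

    `annotated` is a list of (entities, emotion) pairs, one per sentence,
    as a hypothetical combined output of the entity and emotion steps.
    """
    profiles = defaultdict(Counter)
    for entities, emotion in annotated:
        # Count each entity once per sentence, even if mentioned twice.
        for entity in set(entities):
            profiles[entity][emotion] += 1
    return profiles


def dominant_emotion(profiles, entity):
    """Return the most frequent emotion for an entity, or None if unseen."""
    counts = profiles.get(entity)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical annotated sentences (entities, emotion label).
sample = [
    (["Cairo", "Zaynab"], "sadness"),
    (["Cairo"], "joy"),
    (["Cairo"], "sadness"),
]
profiles = emotion_profiles(sample)
print(dominant_emotion(profiles, "Cairo"))
```

A profile of this kind could then be attached to a node in a character network or to a point on a map, which is one way to read the "emotions associated with places and characters" described above.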
Furthermore, this project will provide the scientific community with a model for named entity recognition and emotion analysis in Arabic literary texts, as well as a widely available dataset of analysed fictional sources in Arabic, together with its annotation guidelines.
Project lead: Motasem Alrahabi