Conference article

Bulgarian Language Technology for Digital Humanities: a focus on the Culture of Giving for Education

Kiril Simov
IICT-BAS, Sofia, Bulgaria

Petya Osenova
IICT-BAS, Sofia, Bulgaria


Published in: Selected papers from the CLARIN Annual Conference 2018, Pisa, 8-10 October 2018

Linköping Electronic Conference Proceedings 159:20, p. 196-204


Published: 2019-05-28

ISBN: 978-91-7685-034-3

ISSN: 1650-3686 (print), 1650-3740 (online)


The paper presents the main language technology components needed to support digital humanities research, with a focus on the culture of giving for education. This domain is socially significant and covers various historical periods. It also takes into consideration the social position of the givers, their gender, and the type of giving act (a posthumous bequest by last will or financial support during one's lifetime). The survey describes how the NLP tools were adapted to the task, as well as various ways of improving targeted extraction from a specially designed corpus of texts related to giving. The main challenge was the language variety caused by the long time span of the texts (80-100 years). We provided two initial instruments for targeted information extraction: statistics with ranked word occurrences and content analysis. Even at this preliminary stage, the technology proved very useful for our colleagues in sociology, cultural studies, and educational studies.
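As an illustration of the first instrument, ranked word occurrences over a corpus can be sketched as follows. This is a minimal sketch, not the authors' tooling: the function name `ranked_word_counts`, the regex-based tokenization, and the two sample sentences are assumptions made for the example, and a real pipeline would presumably use the lemmatized output of the NLP tools rather than raw word forms.

```python
import re
from collections import Counter


def ranked_word_counts(texts, top_n=10):
    """Return the top_n (word, count) pairs, most frequent first.

    Hypothetical helper: tokenizes on Unicode word characters, so it
    handles Cyrillic text, but it does no lemmatization or normalization
    of historical spelling variants.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    # Sort by descending count, then alphabetically, for a stable ranking.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Invented sample sentences, not taken from the corpus described above.
sample_texts = [
    "Дарение за училище в Габрово",
    "Завещание за училище и дарение за читалище",
]
ranking = ranked_word_counts(sample_texts, top_n=3)
for word, count in ranking:
    print(word, count)
```

Counting surface forms like this would split the variants of one word across several rows, which is why the language variety over the 80-100 year span matters for this kind of statistic.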


Bulgarian Language Technology, Processing different codifications of contemporary Bulgarian, Linked Open Data of Biographical Data


