Towards the Automated Enrichment of Multilingual Terminology Databases with Knowledge-Rich Contexts - Experiments with Russian EuroTermBank Data

Anne-Kathrin Schumann
University of Vienna, Austria / Tilde, Latvia

Part of: Proceedings of CHAT 2012: The 2nd Workshop on the Creation, Harmonization and Application of Terminology Resources, co-located with TKE 2012, June 22, 2012, Madrid, Spain

Linköping Electronic Conference Proceedings 72:4, pp. 27-34

Published: 2012-06-11


ISSN: 1650-3686 (print), 1650-3740 (online)


Although knowledge-rich context (KRC) extraction has received a lot of attention, to our knowledge few attempts have been made to feed KRCs directly into a terminological resource. The aim of this study, therefore, is to investigate to what extent pattern-based KRC extraction can be useful for the enrichment of terminological resources. The paper describes experiments aimed at enriching a multilingual term bank, namely EuroTermBank, with KRCs extracted from Russian-language web corpora. The contexts are extracted using a simple pattern-based method and then ranked by means of a supervised machine learning algorithm. The internet is used as a source of information because it is a primary means of finding information about terms and concepts for many language professionals, and a KRC extraction approach must therefore be able to deal with the quality of data found online in order to be applicable to real tasks.


computer-aided terminography; knowledge-rich contexts; web as corpus; Russian language; multilingual terminology databases


