Towards the Automated Enrichment of Multilingual Terminology Databases with Knowledge-Rich Contexts - Experiments with Russian EuroTermBank Data

Anne-Kathrin Schumann
University of Vienna, Austria / Tilde, Latvia


In: Proceedings of CHAT 2012: The 2nd Workshop on the Creation, Harmonization and Application of Terminology Resources, Co-located with TKE 2012, June 22, 2012, Madrid, Spain

Linköping Electronic Conference Proceedings 72:4, pp. 27-34


Published: 2012-06-11


ISSN: 1650-3686 (print), 1650-3740 (online)


Although knowledge-rich context (KRC) extraction has received a lot of attention, to our knowledge few attempts at directly feeding KRCs into a terminological resource have been undertaken. The aim of this study, therefore, is to investigate to what extent pattern-based KRC extraction can be useful for the enrichment of terminological resources. The paper describes experiments aiming at the enrichment of a multilingual term bank, namely EuroTermBank, with KRCs extracted from Russian-language web corpora. The contexts are extracted using a simple pattern-based method and then ranked by means of a supervised machine learning algorithm. The internet is used as a source of information since it is a primary means for finding information about terms and concepts for many language professionals, and a KRC extraction approach must therefore be able to deal with the quality of data found online in order to be applicable to real tasks.
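
As a rough illustration of what pattern-based KRC extraction can look like, the following minimal sketch filters sentences that mention a term and match a lexical knowledge pattern. It is not the method or pattern set used in the paper: the patterns, the relation labels, and the names `KNOWLEDGE_PATTERNS`, `split_sentences` and `extract_krc_candidates` are all assumptions made for this example, and the supervised ranking step is not sketched.

```python
import re

# Hypothetical knowledge patterns (illustrative only, not the paper's pattern set).
# Each relation label maps to a regex applied to a lowercased Russian sentence.
KNOWLEDGE_PATTERNS = {
    "definition": re.compile(r"\bпредставляет собой\b|\bназывается\b|\bэто\b"),
    "part_whole": re.compile(r"\bсостоит из\b|\bвключает в себя\b"),
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_krc_candidates(text, term):
    """Return (relation, sentence) pairs for sentences that contain `term`
    (case-insensitive substring match, no lemmatisation) and match a pattern."""
    candidates = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        if term.lower() not in lowered:
            continue
        for relation, pattern in KNOWLEDGE_PATTERNS.items():
            if pattern.search(lowered):
                candidates.append((relation, sentence))
                break  # keep only the first matching relation per sentence
    return candidates
```

Because the term is matched as a plain substring, inflected forms that change the stem are missed; a realistic setup for Russian would presumably work on lemmatised or morphologically tagged text.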


computer-aided terminography; knowledge-rich contexts; web as corpus; Russian language; multilingual terminology databases


