Using Wikipedia for Domain Terms Extraction

Jorge Vivaldi
Universitat Pompeu Fabra, Barcelona, Spain

Horacio Rodríguez
Technical University of Catalonia, Barcelona, Spain


Part of: Proceedings of CHAT 2012: The 2nd Workshop on the Creation, Harmonization and Application of Terminology Resources, co-located with TKE 2012, June 22, 2012, Madrid, Spain

Linköping Electronic Conference Proceedings 72:1, pp. 3-10


Published: 2012-06-11


ISSN: 1650-3686 (print), 1650-3740 (online)


Domain terms are a useful resource for tuning both resources and NLP processors to domain-specific tasks. This paper proposes a method for obtaining terms from potentially any domain using Wikipedia.


Keywords: Term extraction; domain terminology; Wikipedia
