Conference article

Defining Verbal Synonyms: Between Syntax and Semantics

Zdenka Urešová
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Prague, Czech Republic

Eva Fučíková
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Prague, Czech Republic

Jan Hajič
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Prague, Czech Republic

Eva Hajičová
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Prague, Czech Republic

In: Proceedings of the 17th International Workshop on Treebanks and Linguistic Theories (TLT 2018), December 13–14, 2018, Oslo University, Norway

Linköping Electronic Conference Proceedings 155:9, pp. 75-90

Published: 2018-12-10

ISBN: 978-91-7685-137-1

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

While studying verbal synonymy, we have investigated the relation between syntax and semantics in the hope that exploring this relationship will give us more insight into synonymy as a relation between (similar) meanings of different lexemes. Most synonym lexicons (Wordnets and similar thesauri) are based on an intuition about the similarity of word meanings, or on notions like “semantic roles.” In some cases, syntax is also taken into account, but we have found no annotation and/or evaluation experiment that examines how strongly syntax can contribute to synonym specification. We have prepared an annotation experiment using two treebanks (Czech and English) from the Universal Dependencies (UD) set of parallel corpora (PUDs) in order to see how strong a correlation exists between syntax and the assignment of verbs in context to pre-determined (bilingual) classes of synonyms. The resulting statistics confirmed that while syntax does support decisions about synonymy, this support is not strong enough on its own, and that more semantic criteria are indeed necessary. The results of the annotation will also help to further improve the rules and specifications for creating synonym classes. Moreover, we have collected evidence that the annotation setup we used can identify synonym classes that should be merged, and the resulting data (which we plan to publish openly) can possibly serve for the evaluation of automatic methods used in this area.

Keywords

synonyms, lexical resource, parallel corpus, annotation, interannotator agreement, syntax, semantics, Universal Dependencies, valency
