Conference article

Open Stylometric System WebSty: Towards Multilingual and Multipurpose Workbench

Maciej Piasecki
Faculty of Computer Science and Management, Wroclaw University of Science and Technology, Poland

Tomasz Walkowiak
Faculty of Electronics, Wroclaw University of Science and Technology, Poland

Maciej Eder
Institute of Polish Language, Polish Academy of Sciences, and Pedagogical University of Kraków, Poland


Published in: Selected papers from the CLARIN Annual Conference 2017, Budapest, 18–20 September 2017

Linköping Electronic Conference Proceedings 147:12, pp. 145–158


Published: 2018-05-16

ISBN: 978-91-7685-273-6

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

WebSty is an open, web-based stylometric system designed for Social Sciences & Humanities (SS&H) users. It was designed according to the CLARIN philosophy: no need for installation, minimised requirements on the users’ technical skills and knowledge, and focus on SS&H tasks. In the paper, we present its latest extension with several visualisation methods, techniques for the extraction of characteristic features, and support for multilinguality.

Keywords

stylometry, Web-based application, authorship attribution, text clustering

