WebSty is an open, web-based stylometric system designed for Social Sciences & Humanities (SS&H) users. It follows the CLARIN philosophy: no installation, minimal demands on users’ technical skills and knowledge, and a focus on SS&H tasks. In this paper, we present its latest extension, which adds several visualisation methods, techniques for extracting characteristic features, and multilingual support.