From a stone portal to a web portal: Automatic detection of changes in word usage

Authors

  • Mojca Brglez, Jožef Stefan Institute; Faculty of Arts, University of Ljubljana
  • Veronika Bajt, Peace Institute
  • Senja Pollak, Jožef Stefan Institute
  • Špela Rot, University of Ljubljana, Faculty of Arts
  • Matej Martinc, Jožef Stefan Institute

DOI:

https://doi.org/10.51663/pnz.65.3.07

Keywords:

word usage change detection, semantics, semantic shifts, sociolinguistics

Abstract

This paper presents a system for detecting changes in word usage in Slovene that enables automatic detection of semantic shifts across different time periods. We first present the system's technical design and requirements, the methodology for discovering changes, and the graphical user interface, which makes the system user-friendly. We then demonstrate how the system can be implemented on Gigafida 2.0, the reference corpus of Slovene, and used to search for and analyse changes in word usage across different time periods. We evaluate the system's results through a cognitive-linguistic and lexical analysis of the most changed adjectives and nouns, in which we examine and categorise the meanings and usages of words in the detected clusters according to their semantic motivation and their coverage in the dictionary. Finally, we apply the system to the representation of migration across time periods separated by manually set boundaries that significantly influenced attitudes towards migration and migrants in Slovenia, and thereby test its usefulness for sociolinguistic research. From a linguistic perspective, we find that the system distinguishes semantically, syntactically, and otherwise contextually different usages, and we show that it enables the detection of both short-term and long-term changes. On the sociolinguistic side, we find that the system clearly shows the influence of external factors in specific time periods on language and discourse, which makes it a useful tool for sociolinguistic analysis.
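The abstract does not say how the difference between two time periods is scored, so the following is only an illustrative sketch and not the authors' implementation. It assumes that each occurrence of a word has already been assigned to a usage cluster, and it compares the cluster distributions of two periods with the Jensen-Shannon divergence, a measure commonly used in work on lexical semantic change. The function names and the toy cluster labels are hypothetical.

```python
import math
from collections import Counter


def cluster_distribution(labels, n_clusters):
    """Normalised histogram of cluster labels for one time period."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def jensen_shannon_divergence(p, q):
    """Base-2 JSD between two discrete distributions (0 = identical, 1 = disjoint)."""
    # Mixture distribution; m[i] > 0 wherever p[i] > 0 or q[i] > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a[i] == 0 contribute nothing to the KL divergence.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical cluster labels for the occurrences of one word in two periods.
period_a = [0, 0, 0, 1, 0, 0]
period_b = [1, 1, 2, 1, 0, 1]

p = cluster_distribution(period_a, n_clusters=3)
q = cluster_distribution(period_b, n_clusters=3)
score = jensen_shannon_divergence(p, q)
print(f"usage change score: {score:.3f}")
```

A higher score would indicate that the word's occurrences are spread over the usage clusters differently in the two periods; ranking words by such a score is one way to surface candidates for closer manual analysis.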

Published

2025-12-22

Section

Articles
