A System for Word Usage Change Detection: Its Use in Linguistic and Sociolinguistic Studies

Authors

  • Mojca Brglez, Institut "Jožef Stefan"
  • Veronika Bajt, Mirovni inštitut
  • Senja Pollak, Institut "Jožef Stefan"
  • Špela Rot, Univerza v Ljubljani, Filozofska fakulteta
  • Matej Martinc, Institut "Jožef Stefan"

DOI:

https://doi.org/10.51663/pnz.65.3.07

Keywords:

word usage change, semantics, meaning shifts, sociolinguistics

Abstract

This paper presents a system for detecting changes in Slovene word usage, which enables the automatic identification of semantic and other shifts across time periods. We first describe the system's technical design and requirements, the method it uses to detect changes, and its graphical user interface, which is designed to be easy to use. We then show how the system can be applied to Gigafida 2.0, the reference corpus of Slovene, to search for and analyse changes in word usage across different time periods. We evaluate the system's output through a cognitive-linguistic and lexical analysis of the adjectives and nouns with the largest detected changes, examining the word meanings and usages in the detected clusters and categorising them by their semantic motivation and by how they are represented in dictionaries. Finally, we test the system's applicability to sociolinguistic research with a case study of how migration is represented in manually delimited time periods that significantly shaped attitudes toward migration and migrants in Slovenia. From a linguistic perspective, we observe that the system distinguishes between semantic, syntactic, and other contextually distinct usages and that it detects both short-term and long-term changes. Furthermore, the system clearly shows the impact of external factors on language and discourse in specific time periods, which makes it a valuable tool for sociolinguistic analysis.
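The abstract does not spell out how a change score is computed from the detected clusters. As a purely illustrative sketch, assume that every occurrence of a word in each time period has already been assigned a usage-cluster label, for example by clustering contextual embeddings. One common way to turn such labels into a score is the Jensen-Shannon divergence between the two periods' cluster distributions. The function names, the integer label format, and the choice of divergence are all assumptions made for this example. They are not the authors' published implementation.

```python
from collections import Counter
import math


def cluster_distribution(labels, k):
    """Relative frequency of each cluster label 0..k-1 within one period (hypothetical helper)."""
    counts = Counter(labels)
    n = len(labels)
    return [counts.get(c, 0) / n for c in range(k)]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # Wherever p_i > 0 (or q_i > 0), m_i > 0 as well, so no division by zero occurs.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def usage_change_score(labels_t1, labels_t2):
    """Hypothetical change score: JSD between per-period usage-cluster distributions.

    Each argument is a non-empty list of integer cluster labels, one per occurrence
    of the target word in that period.
    """
    k = max(max(labels_t1), max(labels_t2)) + 1
    p = cluster_distribution(labels_t1, k)
    q = cluster_distribution(labels_t2, k)
    return jensen_shannon(p, q)


# Hypothetical labels for one word: usage moves from clusters 0/1 to clusters 1/2.
before = [0, 0, 0, 1, 1]
after = [1, 1, 2, 2, 2]
print(round(usage_change_score(before, after), 3))  # prints 0.6
```

A score of 0 means that the word's usages are distributed identically across clusters in both periods. A score of 1 means that the two periods share no usage cluster at all. Ranking words by such a score is one way to obtain a list of "most changed" words, but the system described in the paper may rely on a different measure.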

Published

2025-12-22

Section

Articles