The Potential of ChatGPT in the Development of the Thesaurus of Modern Slovene
DOI:
https://doi.org/10.51663/pnz.65.3.08
Keywords:
digital lexicography, ChatGPT, synonyms, word senses, Slovene language
Abstract
In this study, we examine how well ChatGPT-4 performs in two lexicographic tasks: (a) cleaning the list of automatically retrieved synonym candidates and assigning synonymic material to lexical senses, and (b) generating dictionary entries, including sense division, definitions, and examples, based on different input data. As a gold standard, we consider the lexicographic decisions recorded in the Digital Dictionary Database for Slovene. In the first experiment, we analyse the results for 246 dictionary entries and find that ChatGPT processed the data identically to lexicographers in 41.9 % of cases, while in 58.1 % of cases, it made different decisions. When assessing the relevance of synonym candidates, ChatGPT was more permissive than the gold standard. Differences in synonym placement (assignment to a different sense in 14.6 % of entries, missing placement in 19.9 %) can be partly attributed to input data characteristics, such as task complexity and the brevity of semantic indicators. In the second experiment, we test ChatGPT’s ability to autonomously generate dictionary entries for 116 headwords. The analysis of generated sense divisions and definitions reveals that the system performs moderately well: in 57 % of cases, it identified all senses, almost 80 % of generated entries received an average score of 3.5 or higher, and 19 % received the highest score from both evaluators. The main challenges include excessive splitting of senses, failure to recognise figurative meanings, and reduced predictability of results. We conclude that ChatGPT has potential for speeding up manual lexicographic work if its results are properly monitored and refined.
License
Copyright (c) 2025 Špela Arhar Holdt, Magdalena Gapsa, Polona Gantar, Iztok Kosem

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.