Predicting Slovene Text Complexity Using Readability Measures

The majority of existing readability measures are designed for English texts; we aim to adapt these measures to Slovene and test how well they transfer. We test ten well-known readability formulas and eight additional readability criteria on five types of texts: children's magazines, general magazines, daily newspapers, technical magazines, and transcriptions of National Assembly sessions. As these groups of texts target different audiences, we assume that the differences in writing styles should be reflected in their readability scores. Our analysis shows which readability measures perform well on this task and which fail to distinguish between the groups.


Introduction
In English, the problem of determining text readability (i.e., how easy a text is to understand) has long been a topic of research, with its origins in the 19th century (Sherman 1893). Since then, many different methods and readability measures have been developed, often with the goal of determining whether a text is too difficult for its target age group. Even though the question of readability is complex from a linguistic standpoint, a large majority of existing measures are based on simple heuristics. There has been little research on the readability of languages other than English; we therefore aim to apply these measures to Slovene and evaluate how well they perform.
There are several factors that might cause these measures to perform poorly on non-English languages:
- Many measures are fine-tuned to correspond to the grade levels of the United States education system. Other languages would likely require different tuning, as a) their education systems differ from the US system, and b) the readability differences between grade levels likely vary across languages, meaning that each language would require specifically tuned parameters.
- Some measures rely on a list of common English words, and their results depend on how this list is defined. For Slovene, no publicly available list of common words currently exists, so it is not known how such measures would perform.
- The existing readability measures do not use morphological information to determine difficult words, but rely on syllable and character counts or a list of difficult words. As Slovene is morphologically much more complex than English, words with complex morphology are harder to understand than those with simple morphology, even if they have the same number of characters or syllables.
We analyze the commonly used readability measures (as well as some novel measures) on Slovene texts and propose a word list needed to implement the word-list-based measures. We calculate statistical distributions of scores for each readability measure across subcorpora and assess the ability of the measures to distinguish between different subcorpora using a variety of statistical tests. We show that machine learning classification models, using a combination of readability measures, can predict the subcorpus a given text belongs to.
The paper extends the short version presented in Škvorc et al. (2018) and is structured as follows. We first present the related work on readability measures and describe the readability measures used in our analysis. The methodology of the analysis is presented next, followed by the results split into three sections. The last section concludes the paper and presents ideas for further work.

Related Work
For English, there exists a variety of works focused on determining readability using readability formulas. These formulas rely on different features of the text, such as the average sentence length, the percentage of difficult words, and the average number of characters per word. Examples of such measures include the Coleman-Liau index (Coleman and Liau 1975), LIX (Björnsson 1968), and the automated readability index (ARI) (Senter and Smith 1967). Some formulas, like the Flesch-Kincaid grade level (Kincaid et al. 1975) and SMOG (McLaughlin 1969), use the number of syllables per word to determine if a word is difficult. Additionally, some measures (e.g., the Spache readability formula (Spache 1953) and the Dale-Chall readability formula (Dale and Chall 1948)) rely on a pre-constructed list of difficult words.
Aside from the readability formulas, there exists a variety of other approaches that can be used to determine readability (Bailin and Grafstein 2016). For example, various machine-learning approaches can obtain better results than readability formulas, such as the approach presented in Francois and Miltsakaki (2012), which outperforms readability formulas on French text.
There is little work attempting to apply these measures to Slovene texts. Most work dealing with the readability of Slovene text is focused on manual methods. For example, Justin (2009) analyzes Slovene textbooks from a variety of angles, including readability. On the other hand, works that focus on automatic readability measures are rare. Zwitter Vitez (2014) uses a variety of readability measures for author recognition in Slovene text, but we found no works that used them to determine readability.
In addition to Slovene, some related works evaluate readability measures on other languages. Debowski et al. (2015) evaluate readability formulas on Polish text and show that they obtain better results by using a more complex, machine-learning-based approach.

Readability Measures
In our analysis, we used two groups of readability measures:
- Existing readability formulas for English: we focused mainly on popular methods that have been shown to achieve good results on English texts. These measures mostly rely on easy-to-obtain features such as the number of difficult words, sentence length, and word length.
- Natural-language-processing-based readability criteria: we used additional criteria that are not present in the existing readability formulas but can be obtained from tools for automatic language processing, such as the percentage of verbs, the number of unique words, and the morphological difficulty of words. The existing English formulas do not use such criteria, but they might contain useful information for determining the readability of Slovene texts.
In the following two subsections we present the established readability measures for grading English text and our proposed additional criteria.

Existing Readability Formulas
There exists a variety of ways to measure the readability of texts written in English. For our analysis, we used the 10 readability formulas given below. The entities used in the expressions correspond to the number of occurrences of a given entity, e.g., words corresponds to the number of words in a measured text.
- Gunning fog index (Gunning 1952) is calculated as:

  0.4 × (words / sentences + 100 × complex words / words),

where a word is considered complex if it contains three or more syllables. As there exists no established automatic method for counting syllables of Slovene words, we used a rule-based approach designed for English. The resulting score is calibrated to the grade levels of the USA education system.
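The rule-based syllable counter can be approximated by counting vowel groups; the sketch below is a minimal, hypothetical version of such a heuristic, not the exact implementation used in the paper.

```python
import re

def count_syllables(word):
    """Approximate syllable count by counting vowel groups, an
    English-style heuristic (no established counter exists for Slovene)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def is_complex(word):
    """A word counts as complex for the Gunning fog index
    if it has three or more syllables."""
    return count_syllables(word) >= 3

print(count_syllables("beseda"), is_complex("razumljivost"))
```

Because the vowel-group rule was tuned for English spelling, it only approximates Slovene syllabification (e.g., it ignores syllabic r).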
- Flesch reading ease (Kincaid et al. 1975) is calculated as:

  206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words).

The score does not correspond to grade levels. Instead, the higher the value, the easier the text is considered to be. A text with a score of 100 should be easily understood by 11-year-old students, while a text with a score of 0 should be intended for university graduates.
- Flesch-Kincaid grade level (Kincaid et al. 1975) is similar to Flesch reading ease, but does correspond to grade levels. It is calculated as:

  0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59.

- Dale-Chall readability formula (Dale and Chall 1948) is calculated as:

  0.1579 × (100 × difficult words / words) + 0.0496 × (words / sentences).

The formula requires a predefined list of common (easy) words; the words which are not on the list are considered difficult. The novelty of the Dale-Chall formula was that it did not use word-length counts but a count of "hard" words which do not appear on a specially designed list of common words. This list was defined as the words familiar to most 4th-grade students: when 80 percent of the fourth-graders indicated that they knew a word, the word was added to the list.
Higher scores indicate that the text is harder, but the resulting score does not correspond to grade levels, nor is it appropriate for texts aimed at children below the 4th grade. In our analysis, we obtained the difficult words in two ways:
1. By constructing a list of "easy" words and considering every word not on the list as difficult. The list of easy words is described later in the paper.
2. By considering words with more than seven characters as difficult.
- Spache readability formula (Spache 1953) is calculated as:

  0.141 × (words / sentences) + 0.086 × (100 × difficult words / words) + 0.839.

Difficult words are defined as words that do not appear in the list of commonly used words, which is the same as the one used in the Dale-Chall readability formula. This method was specifically designed for texts targeting children up to the fourth grade and was not designed to perform well on harder texts. The obtained score corresponds to grade levels.
- Automated readability index (Senter and Smith 1967) is calculated as:

  4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43.

The formula was designed so that it could be computed automatically in the era of typewriters, and therefore it does not use information relating to syllables or difficult words. The obtained score corresponds to grade levels.
- SMOG (Simple Measure of Gobbledygook) (McLaughlin 1969) can be calculated as:

  1.0430 × √(30 × difficult words / sentences) + 3.1291,

where difficult words are defined as words with three or more syllables. The score corresponds to grade levels.
- LIX (Björnsson 1968) is calculated as:

  words / sentences + 100 × long words / words,

where long words are defined as words consisting of more than six characters. LIX is the only measure we used that was not designed specifically for English but for a variety of languages. Because of this, it does not use syllables or a list of unique words. The score does not correspond to grade levels.
- RIX (Anderson 1983) is a simplification of LIX, calculated as:

  long words / sentences.

- Coleman-Liau index (Coleman and Liau 1975) is calculated as:

  0.0588 × L − 0.296 × S − 15.8,

where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. The obtained score corresponds to grade levels.
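For illustration, the length-based formulas above can be computed directly from simple counts. The following sketch (with a naive tokenizer; the function names are ours, not from the paper) implements ARI, LIX, RIX, and the Coleman-Liau index:

```python
import re

def counts(text):
    """Basic counts used by the formulas:
    sentences, words, characters, long words (> 6 characters)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    chars = sum(len(w) for w in words)
    long_words = sum(1 for w in words if len(w) > 6)
    return len(sentences), len(words), chars, long_words

def ari(text):
    s, w, c, _ = counts(text)
    return 4.71 * (c / w) + 0.5 * (w / s) - 21.43

def lix(text):
    s, w, _, lw = counts(text)
    return w / s + 100 * lw / w

def rix(text):
    s, _, _, lw = counts(text)
    return lw / s

def coleman_liau(text):
    s, w, c, _ = counts(text)
    L = 100 * c / w  # average letters per 100 words
    S = 100 * s / w  # average sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8

sample = "To je kratek stavek. Tukaj je še en daljši, bolj zapleten stavek."
print(round(lix(sample), 1))
```

The syllable-based formulas (Gunning fog, Flesch, SMOG) follow the same pattern once a syllable counter is chosen.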

Language-Processing-Based Readability Criteria
The readability formulas described in the previous section use a small number of common criteria, such as the number of syllables in words or the number of words in a sentence. In our analysis, we also analyzed Slovene texts using the following additional statistics:
- percentage of long words,
- percentage of difficult words,
- percentage of verbs,
- percentage of adjectives,
- percentage of unique words,
- average sentence length.
Many of these (the percentages of long words, difficult words, and unique words, and the average sentence length) are used as features in the readability measures described above. We evaluate them individually to determine how important each of them is for Slovene texts. The percentage of verbs is used because a higher number of verbs can indicate more complex sentences with multiple clauses. The percentage of adjectives was chosen because we assumed a higher percentage of adjectives could indicate longer, more descriptive sentences that are harder to understand.
To take into account the richer morphology of Slovene and its less fixed word order compared to English, we computed two additional criteria:
- Context of difficult words, which is the average number of difficult words that appear in the context (i.e., the three words before or after) of a difficult word. Difficult words are defined as words that do not appear on the list of common words. The intuition behind this metric is that a difficult word that appears in the context of easy words is easier to understand than one surrounded by other difficult words, since its meaning can be more easily inferred from the context.
- Average morphological difficulty, where we use the Slovene morphological lexicon Sloleks (Arhar Holdt 2009) to assign a "morphological difficulty" score to each word. Sloleks is a lexicon of word forms and contains frequency information for morphological variants of over 100,000 lemmas (base forms of words as defined in a dictionary). We use the relative frequency of a word variant compared to other variants of the same lemma as the morphological difficulty score.
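The context-of-difficult-words criterion can be sketched as follows (a simplified illustration; the function name and the toy word list are ours, not from the paper):

```python
def context_difficulty(words, common, window=3):
    """Average number of difficult words appearing within `window`
    positions before or after each difficult word, where 'difficult'
    means 'not on the common-word list', as in the paper."""
    difficult = [i for i, w in enumerate(words) if w not in common]
    if not difficult:
        return 0.0
    total = 0
    for i in difficult:
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        total += sum(1 for j in range(lo, hi)
                     if j != i and words[j] not in common)
    return total / len(difficult)
```

In the toy call `context_difficulty(["a", "X", "b", "c", "Y", "d"], {"a", "b", "c", "d"})`, the two difficult words X and Y each see one other difficult word in their three-word context, so the score is 1.0.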

Analysis of Slovene Texts
In this section, we describe the methodology used for our analysis. In the first subsection, we describe the data sets on which we conducted our analysis. In the second subsection, we describe how we constructed the list of easy words used in some of the readability measures.

Data Sets
We created a set of subcorpora from the Gigafida reference corpus of written Slovene (Logar et al. 2012). Gigafida contains 39,427 Slovene texts released from 1990 to 2011, for a total of 1,187,002,502 words. We focused on texts published in magazines, newspapers, and books, while ignoring texts collected from the internet. The texts in the Gigafida corpus are segmented into paragraphs and sentences, tokenized, and part-of-speech tagged using the Obeliks tagger (Grčar et al. 2012). We grouped the texts based on the intended audience into five subcorpora: children's magazines (aimed at younger children), general magazines, newspapers, computer magazines, and transcriptions of National Assembly sessions.
In Table 1 we show the number of documents in each subcorpus and the average number of words per document. The subcorpus of newspapers contains the largest number of documents, while the subcorpus of texts sourced from the National Assembly of Slovenia contains the fewest. Our hypothesis is that the readability measures will be able to distinguish texts from different subcorpora. We assume that children's magazines will be easily distinguishable from other genres that address an adult population. We also suppose that general magazines are less complex than specialized magazines. The National Assembly transcripts were included as they differ from other texts in two major ways: a) they are transcripts of spoken language, and b) they relate to a highly technical subject matter. Because of this, we were interested in how readability measures would grade them. To test our hypothesis and to determine how well each readability measure works, we analyzed texts from each subcorpus to obtain a score distribution for each measure. The scores were calculated separately for each source text (e.g., one magazine article, a newspaper, or one assembly session).

List of Common Words
For designing the list of common words, we took a corpus-based approach. Note that the methodology of creating a list of common words from language corpora has already been tested for other languages (see, e.g., Kilgarriff et al. 2014). We used four corpora to create the list of common words: Kres, Janes, Gos, and Šolar:
- Šolar (Kosem et al. 2011) contains 2,703 texts written by pupils in Slovenia from grades 6 to 13 (grades 6 to 9 in primary school and grades 1 to 4 in secondary school). The texts include essays, summaries, and answers to examination questions.
- Gos (Verdonik et al. 2011) contains around 120 hours of recorded spoken Slovene (1,035,101 words), as well as transcriptions of the recordings. The recordings are collected from a variety of sources, including conversations, television, radio, and phone calls. Around 10% of the corpus consists of recorded lessons in primary and secondary schools.
- Janes (Fišer et al. 2014) contains Slovene texts from various internet sources, such as tweets, forum posts, blogs, comments, and Wikipedia talk pages.
- Kres (Logar Berginc and Šuster 2009) is a subcorpus of Gigafida that is balanced with respect to the source (e.g., newspapers, magazines, or the internet).
We extracted the most common words and defined the common words as the ones that appear frequently in all four corpora (and are therefore not specific to a certain text type). We use four corpora to include texts that primarily reflect language production by different language users (Gos, Janes, Šolar), as well as texts that primarily reflect standard language (Kres). We aimed at covering the younger school-going population (Šolar) as well as adults. For some corpora, we could have assigned words to different age levels (e.g., using pupils' grade levels in Šolar or the age groups available in Gos metadata), but these corpora are very specific and the resulting word groups would mainly reflect the genre instead of age levels. Because of this, we opted for crossing the word lists to obtain a single list. The overlap of the most common words in the four corpora eliminates frequent words which are typical of only one of the corpora (e.g., administrative language in Kres, spoken language markers in Gos, Twitter-specific usage in Janes, and literary references from essays in Šolar).
From each corpus, we extracted the 10,000 most frequent word lemma and part-of-speech tuples. In order to construct a list of common words representative of the Slovene language, we selected the word lemmas that occurred in the most-frequent-word lists of all four corpora. We obtained a list of 2,562 common words, which we used in the readability measures.
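The intersection of the per-corpus top-10,000 lemma lists can be sketched as follows (hypothetical helper names; the real input would be the lemmatized corpora):

```python
from collections import Counter

def top_lemmas(corpus_lemmas, n=10000):
    """Return the set of the n most frequent lemmas in one corpus."""
    return {lemma for lemma, _ in Counter(corpus_lemmas).most_common(n)}

def common_word_list(corpora, n=10000):
    """Intersect the per-corpus top-n lists, as done for
    Kres, Janes, Gos, and Šolar."""
    tops = [top_lemmas(corpus, n) for corpus in corpora]
    return set.intersection(*tops)
```

A word survives only if it is frequent in every corpus, which is what filters out corpus-specific vocabulary such as Twitter-specific usage in Janes.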

Results
For each text in each subcorpus, we calculated readability scores using all readability measures described in the previous section. In Figure 1 we present a few examples of the obtained score distributions. We show distributions for three text subcorpora (children's magazines, newspapers, and technical magazines) and three readability scores (SMOG, Coleman-Liau, and the average number of words in a sentence). To show a compact overview of all included readability measures, we calculated the median and the first and third quartiles of the distribution for each score and each text subcorpus. The box-and-whiskers plots showing these results are visualized in Figure 2, which shows that most readability measures are able to distinguish between different subcorpora. Additionally, some of the readability measures confirm our original hypothesis, i.e., they are able to distinguish children's magazines from other genres that address an adult population, and evaluate general magazines as less complex than computer magazines. Figure 2 allows for an additional interpretation of the readability measures. For example, the mean scores of children's magazines vs. general magazines vs. newspapers show increasing complexity in the following measures: percentage of long words, Flesch-Kincaid grade level, Gunning fog index, Dale-Chall readability formula (based on complexity defined by syllables), context of difficult words, SMOG, LIX, RIX, and the automated readability index. All these measures consider the length of words and/or sentences. The percentage of adjectives also seems to correlate with the complexity of these three text types, although to a lesser extent. The same holds for Flesch reading ease, since its higher scores indicate lower complexity. For the majority of these measures, the distinction between newspapers and specialized computer magazines is either less evident or not evident at all, but they do indicate that computer magazines are less readable than general magazines.
Scores using the list of common words do not lead to the same conclusions. The percentage of difficult words and the Dale-Chall readability formula with the word list do not reflect the complexity of genres, but to some extent they do distinguish between general and specialized texts (i.e., newspapers and general magazines have lower scores than specialized computer magazines). One of the reasons for the relatively high complexity scores of children's magazines might be the large proportion of literary language, such as in poems for children with many words not on the list of common words. For example, "KRAH, KRAH, KRAH! MENE NIČ NI STRAH!" (Krah, krah, krah! I am not afraid!) has 7 words, out of which 4 are on the list of simple words, while the interjection KRAH is not. Therefore, the proportion of difficult words in this segment is 42.9% (3 occurrences of the word KRAH out of 7 words in total). On the other hand, the words are short, so length-based measures consider them to be simple words.
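The KRAH example can be verified with a few lines of code (the toy common-word list below is an assumption for illustration, not the full 2,562-word list):

```python
def pct_difficult(words, common):
    """Percentage of words not found on the common-word list."""
    return 100 * sum(w not in common for w in words) / len(words)

verse = ["krah", "krah", "krah", "mene", "nič", "ni", "strah"]
common = {"mene", "nič", "ni", "strah"}
print(round(pct_difficult(verse, common), 1))  # 3 of the 7 words are difficult
```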
The readability scores for the National Assembly subcorpus show high variability across the measures, which might be attributed to the fact that it is a different genre (spoken, but specialized). For example, in several measures where readability complexity rises from children's magazines to general magazines and newspapers, the National Assembly scores are close to general magazines. Very long words are less likely to be used in spoken language, even in a political context. Average morphological difficulty and the context of difficult words lead to the interpretation that this genre is more complex (less "readable"). The very high score for the context of difficult words might be attributed to enumerations of Assembly members, e.g., "Obveščen sem, da so zadržani in se današnje seje ne morejo udeležiti naslednje poslanke in poslanci: Ciril Pucko, Franc Kangler, Vincencij Demšar, Branko Kalalemina, …" (I was informed that the following deputies are occupied and cannot attend this session: …). The relatively high percentage of verbs can also be interpreted from this perspective; e.g., the National Assembly texts include many performatives, such as "Pričenjam nadaljevanje seje" (Starting the continuation of the session) and "Ugotavljamo prisotnost v dvorani" (Establishing the presence).
In summary, using a list of common words did not improve the partitioning of the text subcorpora perceived as easy and as difficult to read. Both measures that use it (the Dale-Chall and Spache readability formulas) are poor separators. A number of simple readability measures worked well, such as the percentage of long words, the percentages of verbs and adjectives, and the average morphological difficulty.
We also calculated the sample mean and standard deviation of the readability measures for each text subcorpus. The results are shown in Table 2.
Using these results, we calculated the Bhattacharyya distance between the distributions of children's magazines and newspapers for each score. The Bhattacharyya distance measures the similarity between two statistical distributions. We assumed the scores were distributed normally, as the results shown in Figure 1 show that the scores approximately follow a normal distribution, and calculated the distance between two normal distributions p and q using the following formula:

  D_B(p, q) = (1/4) ln((1/4)(σ_p²/σ_q² + σ_q²/σ_p² + 2)) + (1/4)((μ_p − μ_q)² / (σ_p² + σ_q²)).

We also show the Bhattacharyya coefficient, which measures the overlap between two statistical distributions and can be calculated as:

  BC(p, q) = e^(−D_B(p, q)).

The results are presented in Table 3. These results are similar to the ones shown in Figure 2, with the readability formulas using the list of difficult words showing less dichotomization power. The largest distance is obtained using average sentence lengths.
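Under the normality assumption, the distance and coefficient between two subcorpora can be computed directly from their sample means and standard deviations:

```python
import math

def bhattacharyya_distance(mu1, sd1, mu2, sd2):
    """Bhattacharyya distance between two univariate normal distributions."""
    v1, v2 = sd1 ** 2, sd2 ** 2
    return (0.25 * math.log(0.25 * (v1 / v2 + v2 / v1 + 2))
            + 0.25 * (mu1 - mu2) ** 2 / (v1 + v2))

def bhattacharyya_coefficient(mu1, sd1, mu2, sd2):
    """Overlap between the two distributions: BC = exp(-D_B)."""
    return math.exp(-bhattacharyya_distance(mu1, sd1, mu2, sd2))
```

Identical distributions give a distance of 0 and a coefficient of 1; the larger the distance, the better the readability measure separates the two subcorpora.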

Additional Statistical Tests
In addition to the initial analysis presented in the previous section, we performed additional, more thorough statistical tests to determine which of the evaluated measures are better at predicting the group a text belongs to. We used the following approaches:
- Mutual information. This measure reports the amount of information we get about a random variable Y by observing another random variable X. In our case, mutual information reports the amount of information we get about the group of texts by knowing the score of a certain readability measure. Mutual information is defined as:

  I(X; Y) = Σ_y Σ_x p(x, y) log(p(x, y) / (p(x) p(y))),

where p(x) and p(y) are the marginal probability distribution functions of X and Y, and p(x, y) is the joint probability function of X and Y. In our case, X represents the distributions of readability measures and Y the distribution of groups. The higher the mutual information between the readability measure and the groups, the more useful the measure is for determining group membership.
- Analysis of variance (ANOVA). This measure first splits the samples of a statistical distribution into several groups (in our case, based on the group the texts belong to) and then calculates whether the groups are significantly different from one another. We use this measure to determine if the distributions obtained by calculating a single measure on each group of texts are significantly different. If they are, they can be useful for determining the group membership of a given text.
- Feature selection using a chi-squared test. Similarly to mutual information, we use the chi-squared test to determine whether the readability measures and the group memberships are mutually dependent. If they are, this indicates that knowing the value of the readability measure is useful when determining which group a text belongs to.
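Mutual information for discrete variables can be estimated from co-occurrence counts; for continuous readability scores one would first discretize them into bins. A minimal sketch of the definition above:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X; Y) in nats for two paired discrete samples,
    estimated from relative frequencies."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n  # p(x, y)
        # p(x, y) / (p(x) p(y)) with the 1/n factors combined
        mi += p_joint * math.log(p_joint * n * n / (px[x] * py[y]))
    return mi
```

Perfectly dependent binary samples give I = ln 2, while independent samples give 0.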
In addition to the three statistical tests listed above, we also ranked each feature using a random forest classifier (Breiman 2001). The classifier is capable of automatically combining different readability measures in order to predict which subcorpus a given text belongs to, and of calculating how important each readability measure was when making the prediction. The classifier is described in more detail in the next section. Using each of these tests, we obtained scores that tell us how useful each readability measure is when trying to predict the subcorpus a text came from. The results are presented in Table 4, with higher scores indicating better (more informative) readability measures.
The results of the statistical tests show that the features commonly used by the readability formulas (i.e., the average sentence length and the number of long words) are useful when it comes to determining group membership. In particular, the average sentence length stands out, since it is ranked as the most important measure in three out of the four tests. At least one of LIX or RIX is also highly ranked (in the top 50% of all measures) by all the tests. These are the only tested measures that were not designed specifically for English, which could be one of the reasons why they perform better on Slovene texts. The results also show that a number of the proposed simpler readability criteria, such as the percentage of verbs, the percentage of adjectives, and the average morphological difficulty, are less useful than the established statistical formulas. The results are inconclusive about the most useful readability criterion for Slovene. Several formulas and statistics are useful, but the rankings differ between tests. When using our list of common words, the Dale-Chall and Spache readability formulas are again shown to perform worse than the formulas that consider long words as difficult.

Classification Results
In addition to the statistical evaluation, we also performed a test with machine learning classifiers (Kononenko and Kukar 2007) to see whether we could use our readability measures to predict which subcorpus a text belongs to. With classification models, we can automatically learn how to split the texts into different subcorpora based on the readability formulas and other readability criteria. We used the following classification models.
- Decision trees construct a binary decision tree where each node splits the training set based on one readability measure. The trained tree can predict the subcorpus of a given text.
- Random forests (Breiman 2001) create multiple decision trees in a random manner. This reduces the variance of the model and often gives better prediction accuracy than a single decision tree.
- Naive Bayes is a probabilistic model based on Bayes' theorem. The model assumes that the readability measures are independent.
- Extreme gradient boosting (Chen and Carlos 2016) constructs a large number of simple classifiers and combines them, achieving state-of-the-art results on many classification problems.
In order to use the classification models, we first train them on a training subset of our data set. We used a randomly selected 75% of our data set for training. To evaluate the models, we calculated the classification accuracy (i.e., the percentage of texts each model predicted correctly) on the remaining 25% of the data set. The obtained results are presented in Table 5. The results obtained by the majority classifier (i.e., classifying everything as the most frequent group) are presented as a baseline score. Table 5 shows that we are able to predict the correct group of a text with high accuracy, over 98% with the best-performing model (random forest). This shows that a combination of the readability measures evaluated in this paper can be used to accurately distinguish between different groups of text.
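To illustrate the classification setup, the sketch below implements a minimal Gaussian naive Bayes classifier over two hypothetical readability features; the paper used library implementations, and the feature values here are invented for the example.

```python
import math
from collections import defaultdict

class GaussianNB:
    """Minimal Gaussian naive Bayes: each readability measure is modelled
    as an independent normal distribution per class."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.stats = {}
        n = len(X)
        for label, rows in groups.items():
            means = [sum(col) / len(rows) for col in zip(*rows)]
            # Variance floor avoids division by zero for constant features.
            vars_ = [max(sum((v - m) ** 2 for v in col) / len(rows), 1e-9)
                     for col, m in zip(zip(*rows), means)]
            self.stats[label] = (math.log(len(rows) / n), means, vars_)
        return self

    def predict(self, row):
        def log_post(label):
            prior, means, vars_ = self.stats[label]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                for x, m, v in zip(row, means, vars_))
        return max(self.stats, key=log_post)

# Toy training set: two well-separated classes of readability scores.
model = GaussianNB().fit(
    [[1.0, 2.0], [1.1, 2.1], [5.0, 6.0], [5.1, 6.2]],
    ["easy", "easy", "hard", "hard"])
print(model.predict([1.05, 2.05]))
```

In practice, each row would hold the full vector of readability scores for one document, and the labels would be the five subcorpora.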

Conclusion and Future Work
We analyzed the statistical distributions of well-known readability measures on Slovene texts. We extracted five subcorpora of texts from the Gigafida corpus with commonly perceived different readability levels: children's magazines, popular magazines, newspapers, technical magazines, and National Assembly texts. We find that the readability formulas are able to distinguish between these subcorpora reasonably well, with the exception of National Assembly texts, which belong to a different, spoken genre that the used measures were not originally designed to handle. A number of simple readability statistics, such as the context of difficult words and the average sentence length, also dichotomize the different subcorpora of text.
In this work, we focused only on simple readability formulas along with some additional readability criteria. There exist several more complex methods for evaluating the complexity of texts, such as the ones presented in Lu (2009) and Wiersma et al. (2010). Such advanced methods might be more suitable for Slovene texts than the simple methods used in this paper, and we plan to test them in future work.
Most of the used English readability formulas were designed to correlate with school grades and were initially tuned on that domain. For Slovene, there is currently no publicly available data set with texts tagged according to the appropriate grade level, which prevents an analysis of the readability measures from this perspective. In future work, we plan to prepare such a corpus and design several readability scores fit for different purposes. This will allow us to frame text complexity as a classification problem with the goal of predicting the grade level of a text instead of its group membership. In a similar approach, experts would annotate texts with readability scores, which would allow us to fit a regression model using the readability measures analyzed in this paper.
Another area that we plan to explore is the use of coherence and cohesion measures (Barzilay and Lapata 2008; Crossley et al. 2016), which are used to determine whether words, sentences, and paragraphs are logically connected. Coherence and cohesion methods usually use machine learning approaches that mostly rely on language-specific features and therefore need to be evaluated on Slovene texts. The same applies to readability measures based on machine learning (Francois and Miltsakaki 2012), which we also plan to analyze in the future.

Predicting Slovene Text Complexity Using Readability Measures: Summary
In English, the problem of determining text readability (i.e., how easy a text is to understand) has long been a topic of research, with its origins in the 19th century. Since then, many different methods and readability measures have been developed, often with the goal of determining whether a text is too difficult for its target age group. Even though the question of readability is complex from a linguistic standpoint, a large majority of existing measures are based on simple heuristics. Since most of these measures were developed for English texts, it is hard to say how well they would perform on Slovene texts. Measures designed for English are built to correspond with the American school system, are sometimes based on pre-constructed lists of easy words, which do not exist for Slovene, and do not take morphological information into account when determining whether a word is difficult.
In our work, we analyze some common readability measures on Slovene text. We also introduce and analyze two additional readability criteria that do not appear in any of the analyzed readability measures: morphological difficulty, where we assume that word forms that appear rarely are harder to understand than those that appear commonly, and the context of difficult words, where we assume that difficult words are easier to understand in a context of simple words, as their meaning can be inferred from that context. We performed the analysis on 14,581 text documents from the Gigafida corpus, which were split into five groups based on their target audience (children's magazines, popular magazines, newspaper articles, computer magazines, and transcriptions of sessions of the National Assembly). We assumed that the groups should have different readability scores due to their differing target audiences and writing styles.
For each analyzed readability measure, we checked how well it separates texts from different groups. We did this by first obtaining the statistical distribution of readability scores for the texts in each group and then checking how much the distributions differ. We show that a number of common readability measures designed for English work well on Slovene texts. To determine which of the measures perform best, we used several statistical tests.
We also show that machine-learning methods can be used to accurately (over 98% chance of a correct prediction) predict which group a text belongs to based on its readability scores. We trained four different machine-learning models (decision trees, random forests, a naïve Bayes classifier, and extreme gradient boosting) and evaluated them on our dataset. We obtained the best result (98.4% classification accuracy) by using random forests.

Tadej Škvorc, Simon Krek, Senja Pollak, Špela Arhar Holdt, Marko Robnik-Šikonja

Figure 1: The score distributions for three text subcorpora and three readability measures. The distributions show that the readability scores of technical magazines are the most consistent, while the newspapers' scores are more diverse. Children's magazines' scores have a strong peak on the left-hand side (easier texts) that is well separated from the other sources.

Figure 2: The scores of each readability measure for each subcorpus of texts, represented with box plots. The subcorpora depicted from left to right are: 1) children's magazines, 2) general magazines, 3) newspapers, 4) computer magazines, and 5) National Assembly transcriptions. The boxes show the first, second, and third quartiles of the distributions, while the whiskers extend for 1.5 IQR past the first and third quartiles.

Table 1: The number of documents and the average number of words per document for each subcorpus.

Table 2: The mean and standard deviation for each subcorpus of texts and each readability score.

Table 3: The Bhattacharyya distances and coefficients between the distributions of scores for children's magazines and newspapers for each readability measure. The results are sorted by decreasing distance.

Table 4: The ranks of readability measures obtained by the statistical tests, which report the usefulness of the readability measures for predicting group membership. The measures are ordered from the most useful to the least useful.

Table 5: The classification accuracies for each of the models. The numbers show the percentage of texts for which the group membership was correctly predicted.