HSE University Develops Tool for Assessing Text Complexity in Low-Resource Languages
Researchers at the HSE Centre for Language and Brain have developed a tool for assessing text complexity in low-resource languages. The first version supports several of Russia’s minority languages, including Adyghe, Bashkir, Buryat, Tatar, Ossetian, and Udmurt. This is the first tool of its kind designed specifically for these languages, taking into account their unique morphological and lexical features.
According to the Institute of Linguistics of the Russian Academy of Sciences, 155 languages are spoken in Russia. Some of them are used by relatively small communities—for example, around 80,000 people speak Adyghe, while 250,000 to 350,000 people speak Buryat, Ossetian, and Udmurt. Other languages, such as Bashkir and Tatar, have more than one million native speakers. All of these languages hold official status in various republics of Russia, making it essential not only to preserve them but also to create conditions for their development, including opportunities for learning and use in education and science.
In 2025, a Presidential Decree approving the Fundamentals of the State Language Policy of the Russian Federation was adopted. It affirms linguistic diversity and outlines a strategy for the development and practical use of the languages spoken by the peoples of Russia. One way to advance these goals is to create digital tools that make working with low-resource languages easier and more accessible.
A team of scientists at the HSE Centre for Language and Brain has developed an online text complexity calculator for quick and easy assessment of text difficulty in several minority languages, taking into account their linguistic features. The calculator is based on Textometr, a tool created by Antonina Laposhina and Maria Lebedeva for evaluating the complexity of Russian-language texts.
The calculator developed by psycholinguists at HSE University evaluates texts across several parameters: word length and frequency based on data from language corpora; the percentage of vocabulary covered by the frequency list (i.e. the share of words in the text that appear among the 5,000 most frequent words in the respective language); and the distribution of parts of speech within the text. In addition, the calculator considers factors such as lexical density and diversity, as well as the text's narrativity and descriptiveness.
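To make some of these parameters concrete, the sketch below computes three of them for a short text: mean word length, frequency-list coverage, and a simple lexical diversity measure. It is an illustration under stated assumptions, not the calculator's actual code. The function names, the regex tokenizer, and the use of the type-token ratio as the diversity measure are all hypothetical choices; a real tool for morphologically rich languages would likely rely on corpus-based tokenization and lemmatization.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens.

    Assumption: a plain regex tokenizer. In Python 3, \\w matches Unicode
    letters (as well as digits and underscores), so Cyrillic-based
    alphabets are handled.
    """
    return re.findall(r"\w+", text.lower())


def lexical_metrics(text, frequency_list):
    """Return a few surface metrics of text complexity.

    frequency_list is a set of the most frequent word forms in the
    language (the article mentions the top 5,000); here it is a
    hypothetical input supplied by the caller.
    """
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no words")
    covered = sum(1 for token in tokens if token in frequency_list)
    return {
        # Average number of characters per word.
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
        # Share of running words found in the frequency list.
        "frequency_list_coverage": covered / len(tokens),
        # Type-token ratio: distinct words divided by running words
        # (one common proxy for lexical diversity, assumed here).
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }


# Toy usage with an English sentence and a two-word "frequency list".
metrics = lexical_metrics("The cat saw the dog", {"the", "cat"})
print(metrics)
```

A lower coverage value means more of the text falls outside the frequent vocabulary, which is one reason a text may be harder for learners.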
The key innovation is the use of the Flesch Reading Ease formula, adapted separately for each language, making it possible to assess text complexity and readability more accurately.
The Flesch score is based on the number of words, sentences, and syllables, but the original coefficients were developed for English and do not work well for structurally different languages—such as the polysynthetic Adyghe language, in which the average word is much longer. In a 2025 study, Uliana Petrunina and Nina Zdorova recalculated the formula’s coefficients specifically for Adyghe, which significantly improved the accuracy of the readability assessment.
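The structure of the formula can be sketched as follows. The English coefficients shown are those of the standard Flesch Reading Ease formula; the recalculated coefficients for Adyghe and the other languages are not given in this article, so none are included here. The class and function names are hypothetical, and the sketch assumes that word, sentence, and syllable counts have already been obtained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleschCoefficients:
    """Coefficients of a Flesch-style readability formula."""
    intercept: float
    sentence_length_weight: float  # multiplies words per sentence
    syllable_weight: float         # multiplies syllables per word


# Standard coefficients of the original formula for English.
ENGLISH = FleschCoefficients(206.835, 1.015, 84.6)


def flesch_reading_ease(words, sentences, syllables, coeffs=ENGLISH):
    """Compute a Flesch-style score from raw counts.

    Higher scores indicate easier text. Passing a different
    FleschCoefficients instance is how a language-specific adaptation
    would plug in (an assumption about design, not the tool's API).
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return (
        coeffs.intercept
        - coeffs.sentence_length_weight * (words / sentences)
        - coeffs.syllable_weight * (syllables / words)
    )


# Same sentence length, but twice as many syllables per word:
short_words = flesch_reading_ease(words=100, sentences=5, syllables=150)
long_words = flesch_reading_ease(words=100, sentences=5, syllables=300)
print(short_words, long_words)
```

With the English coefficients, doubling the average number of syllables per word pushes the score far below zero. This illustrates why a language with much longer average words needs its own coefficients for the scale to remain meaningful.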
'The parameters of our calculator are adapted to the structural features of each of the six low-resource languages of Russia, using text corpora as well as frequency and morphological analyses. We also adapted the classic Flesch Reading Ease score. As a result, the algorithm can be easily reconfigured for other low-resource languages, regardless of their typological characteristics,' explains Uliana Petrunina, Research Fellow at the HSE Centre for Language and Brain and one of the developers of the tool.
The tool will help create comparable stimulus materials for linguistic experiments and provide teachers with a resource for selecting high-quality educational materials by difficulty level. This solution represents an important contribution to the preservation and development of Russia’s minority languages and to supporting the country’s linguistic diversity.
'Our tool allows researchers and teachers to select materials based on their linguistic complexity, which is particularly important for research and education in languages with limited resources,' says Nina Zdorova, one of the creators of the tool.
Future versions are expected to include additional low-resource languages that are underrepresented in linguistics, both in Russia and beyond.