
HSE University Develops Tool for Assessing Text Complexity in Low-Resource Languages

An installation at the National Library of the Republic of Tatarstan celebrating the history of Tatar writing, featuring symbols from various alphabets

© Wikimedia Commons

Researchers at the HSE Centre for Language and Brain have developed a tool for assessing text complexity in low-resource languages. The first version supports several of Russia’s minority languages, including Adyghe, Bashkir, Buryat, Tatar, Ossetian, and Udmurt. This is the first tool of its kind designed specifically for these languages, taking into account their unique morphological and lexical features.

According to the Institute of Linguistics of the Russian Academy of Sciences, 155 languages are spoken in Russia. Some of them are used by relatively small communities—for example, around 80,000 people speak Adyghe, while 250,000 to 350,000 people speak Buryat, Ossetian, and Udmurt. Other languages, such as Bashkir and Tatar, have more than one million native speakers. All of these languages hold official status in various republics of Russia, making it essential not only to preserve them but also to create conditions for their development, including opportunities for learning and use in education and science. 

In 2025, a Presidential Decree approving the Fundamentals of the State Language Policy of the Russian Federation was adopted. It affirms linguistic diversity and outlines a strategy for the development and practical use of the languages spoken by the peoples of Russia. One way to advance these goals is to create digital tools that make working with low-resource languages easier and more accessible.

A team of scientists at the HSE Centre for Language and Brain has developed an online text complexity calculator for quick and easy assessment of text difficulty in several minority languages, taking into account their linguistic features. The calculator is based on Textometr, a tool created by Antonina Laposhina and Maria Lebedeva for evaluating the complexity of Russian-language texts.

The calculator developed by psycholinguists at HSE University evaluates texts across several parameters: word length and frequency based on data from language corpora; the percentage of vocabulary covered by the frequency list (i.e., the share of words in the text that appear among the 5,000 most frequent words in the respective language); and the distribution of parts of speech within the text. In addition, the calculator considers factors such as lexical density and diversity, as well as the text's narrativity and descriptiveness.
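To make the frequency-list coverage and diversity measures more concrete, here is a minimal Python sketch. It is not the calculator's or Textometr's actual code. The regex tokenizer, the `lexical_profile` function and the `LexicalProfile` fields are hypothetical. The frequency list is assumed to be a plain list of words ranked by corpus frequency, and lexical diversity is approximated by the simple type-token ratio. Lexical density, the part-of-speech distribution and narrativity require a morphological analyser, so they are left out.

```python
import re
from dataclasses import dataclass

# Assumption: a plain regex tokenizer. The real tool relies on corpus data and
# language-specific morphological analysis, which this sketch omits.
# \w matches Cyrillic letters too, since str patterns are Unicode-aware by default.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Return lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


@dataclass
class LexicalProfile:
    n_tokens: int
    mean_word_length: float  # average word length in characters
    coverage: float          # share of tokens found in the top-N frequency list
    type_token_ratio: float  # unique words / all words, a basic diversity measure


def lexical_profile(text: str, ranked_freq_list: list[str], top_n: int = 5000) -> LexicalProfile:
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no word tokens")
    top_words = set(ranked_freq_list[:top_n])  # e.g. the 5,000 most frequent words
    covered = sum(1 for t in tokens if t in top_words)
    return LexicalProfile(
        n_tokens=len(tokens),
        mean_word_length=sum(len(t) for t in tokens) / len(tokens),
        coverage=covered / len(tokens),
        type_token_ratio=len(set(tokens)) / len(tokens),
    )


# Toy example with a made-up three-word frequency list
profile = lexical_profile("The cat saw the dog", ["the", "a", "cat"])
print(profile)
# LexicalProfile(n_tokens=5, mean_word_length=3.0, coverage=0.6, type_token_ratio=0.8)
```

For a real language, `ranked_freq_list` would come from that language's corpus, which is where the calculator's corpus-based frequency data fits in.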

The key innovation is the use of the Flesch Reading Ease formula, adapted separately for each language, making it possible to assess text complexity and readability more accurately. 

The Flesch score is based on the number of words, sentences, and syllables, but the original coefficients were developed for English and do not work well for structurally different languages—such as the polysynthetic Adyghe language, in which the average word is much longer. In a 2025 study, Uliana Petrunina and Nina Zdorova recalculated the formula’s coefficients specifically for Adyghe, which significantly improved the accuracy of the readability assessment.
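In its original English form, the Flesch Reading Ease score is 206.835 − 1.015 × (words ÷ sentences) − 84.6 × (syllables ÷ words). The sketch below keeps these three numbers as parameters, so each language can supply its own. The Adyghe coefficients from the 2025 study are not given in this article and are not reproduced here. The vowel-counting syllable rule, the punctuation-based sentence splitter and all names (`FleschCoefficients`, `flesch_reading_ease`) are simplifying assumptions for illustration only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FleschCoefficients:
    base: float
    sentence_weight: float   # multiplies average words per sentence
    syllable_weight: float   # multiplies average syllables per word


# The original coefficients, fitted for English.
ENGLISH = FleschCoefficients(206.835, 1.015, 84.6)


def count_syllables(word: str, vowels: str) -> int:
    # Approximation: one syllable per vowel letter, at least one per word.
    return max(1, sum(1 for ch in word.lower() if ch in vowels))


def flesch_reading_ease(text: str, coeffs: FleschCoefficients, vowels: str) -> float:
    # Assumption: sentences end with ., ! or ?; words are runs of \w characters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w, vowels) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return (coeffs.base
            - coeffs.sentence_weight * words_per_sentence
            - coeffs.syllable_weight * syllables_per_word)


score = flesch_reading_ease("The cat sat. The dog ran.", ENGLISH, vowels="aeiouy")
print(round(score, 2))  # 119.19
```

With the English weights, every extra syllable per word costs 84.6 points. This is why texts in a language with long words, such as polysynthetic Adyghe, can receive implausibly low scores unless the coefficients are recalculated.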

Uliana Petrunina

'The parameters of our calculator are adapted to the structural features of each of the six low-resource languages of Russia, using text corpora as well as frequency and morphological analyses. We also adapted the classic Flesch Reading Ease score. As a result, the algorithm can be easily reconfigured for other low-resource languages, regardless of their typological characteristics,' explains Uliana Petrunina, Research Fellow at the HSE Centre for Language and Brain and one of the developers of the tool.
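The article does not describe how the coefficients were recalculated. The sketch below shows one plausible approach, purely as an assumption: ordinary least squares over texts whose difficulty is already known. The reference scores (for example, expert ratings) are hypothetical inputs, and `fit_flesch_coefficients` and `_solve` are illustrative names, not part of the HSE tool.

```python
def _solve(matrix: list[list[float]], rhs: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]  # augmented matrix
    tol = 1e-12 * max(abs(v) for row in matrix for v in row)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) <= tol:
            raise ValueError("singular system: samples do not vary enough")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_flesch_coefficients(
    samples: list[tuple[float, float, float]],
) -> tuple[float, float, float]:
    """Fit score ~ base - a * words_per_sentence - b * syllables_per_word.

    samples: (words_per_sentence, syllables_per_word, reference_score) per text.
    Returns (base, a, b) by ordinary least squares via the normal equations.
    """
    xtx = [[0.0] * 3 for _ in range(3)]
    xty = [0.0] * 3
    for wps, spw, score in samples:
        row = (1.0, wps, spw)
        for i in range(3):
            xty[i] += row[i] * score
            for j in range(3):
                xtx[i][j] += row[i] * row[j]
    c0, c1, c2 = _solve(xtx, xty)
    return c0, -c1, -c2  # Flesch writes both slopes as subtractions


# Synthetic check: scores generated from the English coefficients
samples = [(wps, spw, 206.835 - 1.015 * wps - 84.6 * spw)
           for wps, spw in [(10, 1.2), (15, 1.5), (20, 1.3), (8, 1.8), (25, 1.6)]]
base, a, b = fit_flesch_coefficients(samples)
print(round(base, 3), round(a, 3), round(b, 3))  # 206.835 1.015 84.6
```

Fitting on synthetic scores generated from known coefficients, as above, is a quick way to check the fitting code itself before applying it to real rated texts in a new language.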

The tool will help create comparable stimulus materials for linguistic experiments and provide teachers with a resource for selecting high-quality educational materials by difficulty level. This solution represents an important contribution to the preservation and development of Russia’s minority languages and to supporting the country’s linguistic diversity. 

Nina Zdorova

'Our tool allows researchers and teachers to select materials based on their linguistic complexity, which is particularly important for research and education in languages with limited resources,' says Nina Zdorova, one of the creators of the tool.

Future versions are expected to include additional low-resource languages that are underrepresented in linguistics, both in Russia and beyond.
