Tabular Data Anonymisation Solution for Safe Use in AI Systems Developed at HSE University

The AI and Digital Science Institute at the HSE Faculty of Computer Science has developed a tabular data anonymisation service designed to prepare corporate datasets for use in analytics and AI applications. The solution can identify personal data in structured datasets, apply consistent and reproducible anonymisation rules, and generate the artifacts required for quality control, auditing, and subsequent use of data in secure environments.
The solution addresses one of the key challenges of AI adoption in organisations: real-world data is essential for training, testing, and monitoring models, yet its direct use often carries the risk of exposing personal information. This challenge is particularly acute when working with data from corporate information systems, where information about users, employees, students, or clients is stored in the form of interconnected tables, identifiers, and attributes.
The service developed at HSE University addresses this challenge by combining processing rules, a replacement registry, and a reproducible anonymisation model. Given identical input data, the system produces consistent and predictable results, which is essential for the replicability of experiments, data quality assurance, and subsequent auditing. This approach preserves the structure of the dataset and maintains its suitability for analytical applications and AI use scenarios.
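As an illustration of how a reproducible replacement scheme with a registry might work in principle, here is a minimal sketch in Python. It is not the HSE service's implementation: the keyed-hash (HMAC) technique, the function names `pseudonym` and `anonymise`, the `SECRET_KEY` value, and the sample columns are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical secret key; a real deployment would keep it in a secure store.
SECRET_KEY = b"demo-key-change-me"


def pseudonym(value: str, column: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable replacement for a value using a keyed hash (HMAC-SHA256)."""
    message = f"{column}:{value}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{column}_{digest[:12]}"


def anonymise(rows, sensitive_columns, key: bytes = SECRET_KEY):
    """Replace sensitive values and record every replacement in a registry."""
    registry = {}  # maps (column, original value) to its replacement
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the source rows are left untouched
        for column in sensitive_columns:
            original = row[column]
            replacement = pseudonym(original, column, key)
            registry[(column, original)] = replacement
            new_row[column] = replacement
        result.append(new_row)
    return result, registry


# Illustrative data only: two students, one of whom appears twice.
rows = [
    {"student_id": "S-001", "email": "a@example.org", "grade": 5},
    {"student_id": "S-002", "email": "b@example.org", "grade": 4},
    {"student_id": "S-001", "email": "a@example.org", "grade": 3},
]
anonymised, registry = anonymise(rows, ["student_id", "email"])
```

Because the replacement depends only on the key, the column name, and the original value, the same identifier receives the same pseudonym in every row and on every run, so links between records survive while the original values do not appear in the output. In this sketch the registry holds the original values, which is why it would need to be stored separately from the anonymised data.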
The solution is being developed in compliance with Russian personal data legislation and applicable requirements for data anonymisation. Its architecture provides for separate storage of source data and processing artifacts, as well as management of replacement rules, access controls, integrity checks, and a replacement registry. Together, these mechanisms enable the service to be integrated into a controlled AI data lifecycle management framework.
Currently, the service is used within HSE University's SmartMLOps platform to process data from the university's corporate information systems. Its applications include preparing data for analytics, testing, and the deployment of AI services. The solution can also be adapted for use in secure environments by organisations handling sensitive datasets, including those in education, healthcare, industry, finance, and government.
A separate line of development focuses on creating a version for unstructured data such as text documents, communications, contracts, and other materials in which personal data appears in free form. This version is currently under development and undergoing pilot testing. It will use a combination of rule-based methods, natural language processing (NLP) tools, and named entity recognition (NER) models to identify personal data in texts while taking context into account.
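To give a sense of what the rule-based part of such a pipeline could look like, the following Python sketch masks email addresses and phone numbers in free text with regular expressions. The patterns, the labels, and the `mask_text` function are assumptions chosen for illustration and do not describe the version being piloted at HSE University.

```python
import re

# Hypothetical, deliberately simple patterns; production rules would be stricter.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{8,}\d"),
}


def mask_text(text: str):
    """Replace every pattern match with its label and return what was found."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group()))
        text = pattern.sub(f"[{label}]", text)
    return text, findings


sample = "Contact Ivan at ivan.petrov@example.org or +7 495 123-45-67."
masked, findings = mask_text(sample)
print(masked)  # Contact Ivan at [EMAIL] or [PHONE].
```

The name "Ivan" is left in place, since fixed patterns cannot reliably recognise personal names. This is the kind of gap that context-aware named entity recognition is meant to close.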
'It is not enough for AI projects to simply have access to data. It is necessary to prepare data in a way that preserves its analytical value while ensuring that personal information is not disclosed. Our service addresses exactly this engineering challenge: it integrates anonymisation into a managed process for preparing data for AI,' said project leader Hadi Saleh, Head of the Unit for Applied Technological Solutions at the AI and Digital Science Institute, HSE Faculty of Computer Science.
The rights to the core components of the solution are reserved. The team envisions further development of the service both as an internal tool at HSE University and as a solution for deployment in secure environments within organisations that require data preparation for AI while complying with personal data protection requirements.
The project is ranked among the top 10 in the Data Security, Trust, and Quality category of the 2026 Gravitation International University Award in AI and Big Data.
The service was developed by the team of the Strategic Technological Project 'Multi-Agent AI Platform for Sectoral Solutions' as part of HSE University’s Development Programme for 2025–2036, supported under the Priority 2030 Strategic Academic Leadership Programme.


