Large Language Models No Longer Require Powerful Servers

Researchers from Yandex Research, HSE University, MIT, KAUST, and ISTA have developed a method for rapidly compressing large language models (LLMs) without significant loss of quality. With this method, a smartphone or laptop is enough to work with an LLM; expensive servers and high-powered GPUs are no longer required.
This method enables faster testing and more efficient implementation of new neural network-based solutions, reducing both development time and costs. As a result, LLMs are more accessible not only to large corporations, but also to smaller companies, non-profit laboratories and institutes, as well as individual developers and researchers.
Previously, running a language model on a smartphone or laptop required first quantising it on an expensive server, a process that could take anywhere from a few hours to several weeks. With the new method, quantisation can be performed directly on a smartphone or laptop in just a few minutes.
Challenges in implementing LLMs
The main obstacle to using LLMs is that they require considerable computational power. This applies to open-source models as well. For example, the popular DeepSeek-R1 is too large to run even on high-end servers built for AI and machine learning workloads, meaning that very few companies can effectively use LLMs, even if the model itself is publicly available.
The new method reduces a model's size while maintaining its quality, making it possible to run the model on more accessible devices. It can also compress very large models, such as DeepSeek-R1 with 671 billion parameters and Llama 4 Maverick with 400 billion parameters. Until now, models of this size could only be quantised with basic methods, which caused significant quality loss.
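To give a sense of scale, the parameter counts above can be turned into a rough estimate of weight storage. The sketch below is back-of-the-envelope arithmetic only: the 16-bit baseline and the 4-bit and 3-bit targets are illustrative assumptions, not figures from the article, and the helper name `model_size_gb` is hypothetical. Real deployments also need memory for activations, caches, and quantisation metadata.

```python
def model_size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes), ignoring all overhead."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter counts as quoted in the article.
models = {"DeepSeek-R1": 671e9, "Llama 4 Maverick": 400e9}

# Bit widths are assumptions chosen for illustration.
for name, num_params in models.items():
    for bits in (16, 4, 3):
        size = model_size_gb(num_params, bits)
        print(f"{name}: {bits}-bit weights -> about {size:,.1f} GB")
```

Under these assumptions, going from 16 bits to 4 bits per parameter cuts weight storage by a factor of four, which is why the bit width a method can reach without quality loss matters so much for the choice of hardware.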
The new quantisation method opens up more opportunities to use LLMs across various fields, particularly in resource-limited sectors such as education and the social sphere. Startups and independent developers can now implement compressed models to create innovative products and services without the need for costly hardware investments. Yandex is already applying the new method for prototyping—creating working versions of products and quickly validating ideas. Testing compressed models takes less time than testing the original versions.
Key details of the new method
The new quantisation method is named HIGGS (Hadamard Incoherence with Gaussian MSE-Optimal GridS). It enables the compression of neural networks without the need for additional data or computationally intensive parameter optimisation. This is especially useful in situations where there is not enough relevant data available to train the model. HIGGS strikes a balance between the quality, size, and complexity of the quantised models, making them suitable for use on a variety of devices.
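The two ingredients in the method's name, a Hadamard transform and a grid that minimises mean squared error for Gaussian data, can be illustrated with a toy example. The sketch below is not the authors' implementation: it uses a plain (non-randomised) Hadamard matrix, a scalar grid fitted with Lloyd's algorithm on random samples, and a per-group scale. All function names, the group size of 64, and the toy weights are assumptions made for illustration. No calibration data is used at any point, which is what "data-free" refers to.

```python
import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix (Sylvester construction); n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)


def gaussian_grid(levels: int, iters: int = 100, samples: int = 100_000) -> np.ndarray:
    """Approximate an MSE-optimal scalar grid for N(0, 1) with Lloyd's algorithm."""
    x = np.sort(np.random.default_rng(0).standard_normal(samples))
    grid = np.quantile(x, (np.arange(levels) + 0.5) / levels)  # initial guess
    for _ in range(iters):
        edges = (grid[:-1] + grid[1:]) / 2        # nearest-neighbour cell boundaries
        cell = np.searchsorted(edges, x)          # cell index of every sample
        grid = np.array([x[cell == k].mean() for k in range(levels)])  # centroids
    return grid


def quantise_dequantise(weights: np.ndarray, bits: int = 4, group: int = 64) -> np.ndarray:
    """Rotate groups of weights, snap them to the grid, and rotate back.

    weights.size must be divisible by group.
    """
    grid = gaussian_grid(2 ** bits)
    H = hadamard(group)
    rotated = weights.reshape(-1, group) @ H      # rotation spreads out outliers
    scale = np.sqrt((rotated ** 2).mean(axis=1, keepdims=True))
    scale = np.maximum(scale, 1e-12)              # guard against all-zero groups
    edges = (grid[:-1] + grid[1:]) / 2
    codes = np.searchsorted(edges, rotated / scale)  # integer codes that would be stored
    restored = (grid[codes] * scale) @ H.T        # H is orthonormal, so H.T inverts it
    return restored.reshape(weights.shape)


# Toy heavy-tailed "weights"; real layers would come from a trained model.
w = np.random.default_rng(1).standard_t(df=3, size=(256, 64))
w_hat = quantise_dequantise(w, bits=4)
rel_err = np.sum((w - w_hat) ** 2) / np.sum(w ** 2)
print(f"relative squared error at 4 bits: {rel_err:.4f}")
```

The design idea being illustrated is that the rotation makes each group of weights look closer to Gaussian, so a single grid fitted once to the Gaussian distribution can be reused for every layer instead of being learned from data.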
The method has already been validated on the widely used Llama 3 and Qwen2.5 models. Experiments have shown that HIGGS outperforms all existing data-free quantisation methods, including NF4 (4-bit NormalFloat) and HQQ (Half-Quadratic Quantisation), in terms of both quality and model size.

Scientists from HSE University, the Massachusetts Institute of Technology (MIT), the Institute of Science and Technology Austria (ISTA), and King Abdullah University of Science and Technology (KAUST, Saudi Arabia) all contributed to the development of the method.
The HIGGS method is already accessible to developers and researchers on Hugging Face and GitHub, with a research paper available on arXiv.
Response from the academic community, and other methods
The paper describing the new method has been accepted for presentation at one of the largest AI conferences in the world—the North American Chapter of the Association for Computational Linguistics (NAACL). The conference will be held from April 29 to May 4, 2025, in Albuquerque, New Mexico, USA, and Yandex will be among the attendees, along with other companies and universities such as Google, Microsoft Research, and Harvard University. The paper has been cited by Red Hat AI, an American software company, as well as Peking University, Hong Kong University of Science and Technology, Fudan University, and others.
Scientists from Yandex have previously presented 12 studies focused on LLM quantisation. The company aims to make the application of LLMs more efficient, less energy-consuming, and accessible to all developers and researchers. For example, the Yandex Research team has developed LLM compression methods that reduce computational costs nearly eightfold without significantly compromising the quality of the neural network’s responses. The team has also developed a solution that makes it possible to run a model with 8 billion parameters on a regular computer or smartphone through a browser interface, even without major computational power.


