Large Language Models No Longer Require Powerful Servers

Scientists from Yandex, HSE University, MIT, KAUST, and ISTA have made a breakthrough in optimising LLMs. Yandex Research, in collaboration with leading science and technology universities, has developed a method for rapidly compressing large language models (LLMs) without compromising quality. Now, a smartphone or laptop is enough to work with LLMs—there's no need for expensive servers or high-powered GPUs.

This method enables faster testing and more efficient implementation of new neural network-based solutions, reducing both development time and costs. As a result, LLMs are more accessible not only to large corporations, but also to smaller companies, non-profit laboratories and institutes, as well as individual developers and researchers.

Previously, running a language model on a smartphone or laptop required quantising the model on an expensive server, a process that could take anywhere from a few hours to several weeks. With the new method, quantisation can be performed directly on a smartphone or laptop in just a few minutes.

Challenges in implementing LLMs

The main obstacle to using LLMs is that they require considerable computational power. This applies to open-source models as well. For example, the popular DeepSeek-R1 is too large to run even on high-end servers built for AI and machine learning workloads, meaning that very few companies can effectively use LLMs, even if the model itself is publicly available.

The new method reduces a model's size while maintaining its quality, making it possible to run the model on more accessible devices. It can compress even very large models, such as DeepSeek-R1 with 671 billion parameters and Llama 4 Maverick with 400 billion parameters, which until now could only be quantised with basic methods that caused significant quality loss.
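To see why compression matters at this scale, a back-of-envelope estimate (our own illustration, not a figure from the study) shows the memory needed just to store the weights of a 671-billion-parameter model at different bit widths:

```python
# Back-of-envelope weight-storage footprint; ignores activations,
# KV cache, and runtime overhead.
def weight_gib(params_billions: float, bits_per_weight: int) -> float:
    """GiB needed to store the weights alone."""
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

for bits in (16, 8, 4):
    print(f"671B parameters at {bits}-bit: {weight_gib(671, bits):,.0f} GiB")
```

Halving the bit width halves the footprint, which is why moving from 16-bit to 4-bit weights brings models of this size within reach of far more modest hardware.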

The new quantisation method opens up more opportunities to use LLMs across various fields, particularly in resource-limited sectors such as education and social services. Startups and independent developers can now deploy compressed models to create innovative products and services without costly hardware investments. Yandex is already applying the new method for prototyping, that is, creating working versions of products and quickly validating ideas. Testing compressed models takes less time than testing the original versions.

Key details of the new method

The new quantisation method is named HIGGS (Hadamard Incoherence with Gaussian MSE-Optimal GridS). It enables the compression of neural networks without the need for additional data or computationally intensive parameter optimisation. This is especially useful in situations where there is not enough relevant data available to train the model. HIGGS strikes a balance between the quality, size, and complexity of the quantised models, making them suitable for use on a variety of devices.
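The article does not give implementation details, but the two ingredients named in the acronym can be illustrated with a toy sketch under our own assumptions (this is not Yandex's actual code): multiply the weights by a random-sign Hadamard matrix so their distribution becomes approximately Gaussian and incoherent, then round each value to the nearest point of a small grid chosen to minimise mean squared error for a standard Gaussian.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix via Sylvester's construction (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

# 2-bit (4-level) Lloyd-Max quantiser levels for a standard Gaussian:
# these minimise mean squared error for N(0, 1) inputs.
GRID = np.array([-1.510, -0.4528, 0.4528, 1.510])

def quantise(W: np.ndarray, seed: int = 0) -> np.ndarray:
    """Rotate weights with a random-sign Hadamard transform (incoherence),
    snap to the nearest Gaussian-optimal grid point, and rotate back."""
    n = W.shape[-1]
    signs = np.random.default_rng(seed).choice([-1.0, 1.0], size=n)
    H = hadamard(n)
    rotated = (W * signs) @ H                  # incoherence-inducing rotation
    scale = rotated.std()                      # per-tensor scale to ~unit variance
    idx = np.abs(rotated[..., None] / scale - GRID).argmin(axis=-1)
    deq = GRID[idx] * scale                    # dequantised (rotated) weights
    return (deq @ H.T) * signs                 # undo rotation and sign flips

W = np.random.default_rng(1).normal(size=(8, 16))
W_hat = quantise(W)
err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```

The sketch is "data-free" in the same sense as the methods the article compares against: no calibration data or retraining is needed, only the weights themselves and the assumption that the rotated weights are roughly Gaussian.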

The method has already been validated on the widely used Llama 3 and Qwen2.5 models. Experiments have shown that HIGGS outperforms all existing data-free quantisation methods, including NF4 (4-bit NormalFloat) and HQQ (Half-Quadratic Quantisation), in terms of both quality and model size.

Scientists from HSE University, the Massachusetts Institute of Technology (MIT), the Institute of Science and Technology Austria (ISTA), and King Abdullah University of Science and Technology (KAUST, Saudi Arabia) all contributed to the development of the method.

The HIGGS method is already accessible to developers and researchers on Hugging Face and GitHub, with a research paper available on arXiv.

Response from the academic community, and other methods

The paper describing the new method has been accepted for presentation at one of the world's largest AI conferences, the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). The conference will be held from April 29 to May 4, 2025, in Albuquerque, New Mexico, USA; Yandex will be among the attendees, along with other companies and universities such as Google, Microsoft Research, and Harvard University. The paper has been cited by Red Hat AI, an American software company, as well as by researchers at Peking University, the Hong Kong University of Science and Technology, Fudan University, and others.

Previously, scientists from Yandex presented 12 studies focused on LLM quantisation. The company aims to make LLMs more efficient, less energy-intensive, and accessible to all developers and researchers. For example, the Yandex Research team has previously developed compression methods that reduce computational costs by nearly eight times without significantly compromising the quality of the neural network's responses. The team has also developed a solution that runs a model with 8 billion parameters on a regular computer or smartphone through a browser interface, without requiring major computational power.

See also:

HSE Scientists Develop Method to Stabilise Iodine in Solar Cells

Scientists at HSE MIEM, in collaboration with colleagues from China, have developed a method to improve the durability of perovskite solar cells by addressing iodine loss from the material. The researchers introduced quaternary ammonium molecules into the perovskite structure; these molecules form strong electrostatic pairs with iodine ions, effectively anchoring them within the crystal lattice. As a result, the solar cells retain more than 92% of their power after a thousand hours of operation at 85°C. The study has been published in Advanced Energy Materials.

HSE Researchers Create Genome-Wide Map of Quadruplexes

An international team, including researchers from HSE University, has created the first comprehensive map of quadruplexes—unstable DNA structures involved in gene regulation. For the first time, scientists have shown that these structures function in pairs: one is located in a DNA region that initiates gene transcription, while the other lies in a nearby region that enhances this process. In healthy tissues, quadruplexes regulate tissue-specific genes, whereas in cancerous tissues they influence genes responsible for cell growth and division. These findings may contribute to the development of new anticancer drugs that target quadruplexes. The study has been published in Nucleic Acids Research.

Mathematician from HSE University–Nizhny Novgorod Solves Equation Considered Unsolvable in Quadratures Since 19th Century

Mathematician Ivan Remizov from HSE University–Nizhny Novgorod and the Institute for Information Transmission Problems of the Russian Academy of Sciences has made a conceptual breakthrough in the theory of differential equations. He has derived a universal formula for solving problems that had been considered unsolvable in quadratures for more than 190 years. This result fundamentally reshapes one of the oldest areas of mathematics and could have important implications for fundamental physics and economics. The paper has been published in Vladikavkaz Mathematical Journal.

Scientists Reveal How Language Supports Complex Cognitive Processing in the Brain

Valeria Vinogradova, a researcher at HSE University, together with British colleagues, studied how language proficiency affects cognitive processing in deaf adults. The study showed that higher language proficiency—regardless of whether the language is signed or spoken—is associated with higher activity and stronger functional connectivity within the brain network responsible for cognitive task performance. The findings have been published in Cerebral Cortex.

HSE AI Research Centre Simplifies Particle Physics Experiments

Scientists at the HSE AI Research Centre have developed a novel approach to determining robustness in deep learning models. Their method works eight times faster than an exhaustive model search and significantly reduces the need for manual verification. It can be applied to particle physics problems using neural networks of various architectures. The study has been published in IEEE Access.

Scientists Show That Peer Influence Can Be as Effective as Expert Advice

Eating habits can be shaped not only by the authority of medical experts but also through ordinary conversations among friends. Researchers at HSE University have shown that advice from peers to reduce sugar consumption is just as effective as advice from experts. The study's findings have been published in Frontiers in Nutrition.

HSE University Develops Tool for Assessing Text Complexity in Low-Resource Languages

Researchers at the HSE Centre for Language and Brain have developed a tool for assessing text complexity in low-resource languages. The first version supports several of Russia’s minority languages, including Adyghe, Bashkir, Buryat, Tatar, Ossetian, and Udmurt. This is the first tool of its kind designed specifically for these languages, taking into account their unique morphological and lexical features.

HSE Scientists Uncover How Authoritativeness Shapes Trust

Researchers at the HSE Institute for Cognitive Neuroscience have studied how the brain responds to audio deepfakes—realistic fake speech recordings created using AI. The study shows that people tend to trust the current opinion of an authoritative speaker even when new statements contradict the speaker’s previous position. This effect also occurs when the statement conflicts with the listener’s internal attitudes. The research has been published in the journal NeuroImage.

Language Mapping in the Operating Room: HSE Neurolinguists Assist Surgeons in Complex Brain Surgery

Researchers from the HSE Center for Language and Brain took part in brain surgery on a patient who had been seriously wounded in the SMO. A shell fragment approximately five centimetres long entered through the eye socket, penetrated the cranial cavity, and became lodged in the brain, piercing the temporal lobe responsible for language. Surgeons at the Burdenko Main Military Clinical Hospital removed the foreign object while the patient remained conscious. During the operation, neurolinguists conducted language tests to ensure that language function was preserved.

AI Overestimates How Smart People Are, According to HSE Economists

Scientists at HSE University have found that current AI models, including ChatGPT and Claude, tend to overestimate the rationality of their human opponents—whether first-year undergraduate students or experienced scientists—in strategic thinking games, such as the Keynesian beauty contest. While these models attempt to predict human behaviour, they often end up playing 'too smart' and losing because they assume a higher level of logic in people than is actually present. The study has been published in the Journal of Economic Behavior & Organization.