• A
  • A
  • A
  • ABC
  • ABC
  • ABC
  • А
  • А
  • А
  • А
  • А
Regular version of the site

Large Language Models No Longer Require Powerful Servers

Large Language Models No Longer Require Powerful Servers

© iStock

Scientists from Yandex, HSE University, MIT, KAUST, and ISTA have made a breakthrough in optimising LLMs. Yandex Research, in collaboration with leading science and technology universities, has developed a method for rapidly compressing large language models (LLMs) without compromising quality. Now, a smartphone or laptop is enough to work with LLMs—there's no need for expensive servers or high-powered GPUs.

This method enables faster testing and more efficient implementation of new neural network-based solutions, reducing both development time and costs. As a result, LLMs are more accessible not only to large corporations, but also to smaller companies, non-profit laboratories and institutes, as well as individual developers and researchers.

Previously, running a language model on a smartphone or laptop required quantising on an expensive server—a process that could take anywhere from a few hours to several weeks. Quantisation can now be performed directly on a smartphone or laptop in just a few minutes.

Challenges in implementing LLMs

The main obstacle to using LLMs is that they require considerable computational power. This applies to open-source models as well. For example, the popular DeepSeek-R1 is too large to run even on high-end servers built for AI and machine learning workloads, meaning that very few companies can effectively use LLMs, even if the model itself is publicly available.

The new method reduces the model's size while maintaining its quality, making it possible to run on more accessible devices. This method allows even larger models, such as DeepSeek-R1 with 671 billion parameters and Llama 4 Maverick with 400 billion parameters, to be compressed, which until now could only be quantised using basic methods and resulted in significant quality loss.

The new quantisation method opens up more opportunities to use LLMs across various fields, particularly in resource-limited sectors such as education and the social sphere. Startups and independent developers can now implement compressed models to create innovative products and services without the need for costly hardware investments. Yandex is already applying the new method for prototyping—creating working versions of products and quickly validating ideas. Testing compressed models takes less time than testing the original versions.

Key details of the new method

The new quantisation method is named HIGGS (Hadamard Incoherence with Gaussian MSE-Optimal GridS). It enables the compression of neural networks without the need for additional data or computationally intensive parameter optimisation. This is especially useful in situations where there is not enough relevant data available to train the model. HIGGS strikes a balance between the quality, size, and complexity of the quantised models, making them suitable for use on a variety of devices.

The method has already been validated on the widely used Llama 3 and Qwen2.5 models. Experiments have shown that HIGGS outperforms all existing data-free quantisation methods, including NF4 (4-bit NormalFloat) and HQQ (Half-Quadratic Quantisation), in terms of both quality and model size.

© iStock

Scientists from HSE University, the Massachusetts Institute of Technology (MIT), the Austrian Institute of Science and Technology (ISTA), and King Abdullah University of Science and Technology (KAUST, Saudi Arabia), all contributed to the development of the method.

The HIGGS method is already accessible to developers and researchers on Hugging Face and GitHub, with a research paper available on arXiv.

Response from the academic community, and other methods

The paper describing the new method has been accepted for presentation at one of the largest AI conferences in the world—the North American Chapter of the Association for Computational Linguistics (NAACL). The conference will be held from April 29 to May 4, 2025, in Albuquerque, New Mexico, USA, and Yandex will be among the attendees, along with other companies and universities such as Google, Microsoft Research, and Harvard University. The paper has been cited by Red Hat AI, an American software company, as well as Peking University, Hong Kong University of Science and Technology, Fudan University, and others.

Previously, scientists from Yandex presented 12 studies focused on LLM quantisation. The company aims to make the application of LLMs more efficient, less energy-consuming, and accessible to all developers and researchers. For example, the Yandex Research team has previously developed methods for compressing LLMs, which reduce computational costs by nearly eight times, while not significantly compromising the quality of the neural network’s responses. The team has also developed a solution that allows running a model with 8 billion parameters on a regular computer or smartphone through a browser interface, even without major computational power.

See also:

HSE Develops App for Assessing Phonological Processing in Children

Researchers at the HSE Centre for Language and Brain have developed a new digital tool for assessing children's phonological processing skills—the ZARYA (Sound Analysis of the Russian Language) test battery. It is the first standardised application in Russia designed to provide a fast and reliable assessment of children's ability to distinguish speech sounds, retain them in working memory, and perform phonemic analysis. The app runs on Android tablets and smartphones and is available for download from RuStore. Details of the test validation have been published in the Journal of Speech, Language, and Hearing Research.

Researchers Discover How Spelling Errors Slow Down Reading in Russian

Psycholinguists from the Centre for Language and Brain at HSE University–St Petersburg have shown that words that are frequently misspelled are processed more slowly by readers, even when presented with the correct spelling. The researchers confirmed this effect for the first time using Russian-language materials and found that response speed is most strongly linked to how confidently individuals can distinguish the correct spelling of a word from an incorrect one. The study has been published in The Mental Lexicon.

Scientists Discover Why Europium 'Misbehaves'

Europium is a rare-earth metal responsible for the pure red glow in displays and other luminescent materials. For a long time, however, it refused to emit light when surrounded by certain organic molecules known as acylpyrazolone ligands. Chemists have now uncovered the reason: in europium complexes with these ligands, a 'black window' appears—a charge-transfer state in which the energy absorbed by the ligand is dissipated as heat rather than emitted as light. Understanding this mechanism opens the way to designing more efficient red-emitting materials for displays, fluorescent thermometers, and chemical sensors. The results have been published in Dalton Transactions.

HSE Economists Reveal How the Wage Gap Emerges Among Vocational School Graduates

HSE researchers examined the careers of 600,000 graduates of Russian secondary vocational education programmes and found that at the start of their careers, the gender wage gap reaches 23%, doubling after three years. This disparity is largely due to male and female students choosing different occupations when enrolling in vocational schools. These were the findings made by Sergey Roshchin, Natalya Yemelina, and Ksenia Rozhkova from of the HSE Faculty of Economic Sciences. The article has been published in Educational Studies.

HSE Researchers Make Aldehydes Perform Dual Function

Chemists from HSE University have discovered a way to carry out a reductive addition reaction without using an external reducing agent. Instead, the required 'resource' is supplied by the aldehyde itself, one of the reaction participants. This approach helps prevent unwanted side reactions, reduces toxicity, and simplifies the production and synthesis of organic molecules, including those used in the manufacture of medicines. The study has been published in Journal of Catalysis.

HSE Scientists Explain Why Findings in Autism Research Differ

Researchers from the Cognitive Health and Intelligence Centre at HSE University conducted the first-ever systematic review of studies on the specifics of emotion-from-motion perception in autism. The review showed that differences found between autistic and non-autistic individuals are largely associated with the experimental design and the types of tasks given to study participants. The review findings have been published in Research in Autism.

Tremors: Scientists Develop Method for Real-Time Tracking of Hazardous Underground Vibrations

Researchers from HSE MIEM and IPKON RAS have developed a new mathematical monitoring model that can identify the source of hazardous underground vibrations in real time. The technology could help reduce the risk of damage to buildings, roads, and other infrastructure located near quarries and mining sites. The paper has been published in Russian Mining Industry.

HSE Researchers Determine Which Internet Users Are More Likely to Fact-Check

Researchers at HSE University examined the strategies employed by Russian internet users to verify unreliable information and the factors that motivate them to do so. The study found that more than half of users who encounter potentially false information online attempt to verify it by locating the original source. The likelihood of fact-checking is influenced by several factors, including age, place of residence, social status, information literacy skills, and the use of AI. The findings have been published in Monitoring of Public Opinion: Economic and Social Changes.

Tabular Data Anonymisation Solution for Safe Use in AI Systems Developed at HSE University

The AI and Digital Science Institute at the HSE Faculty of Computer Science has developed a tabular data anonymisation service designed to prepare corporate datasets for use in analytics and AI applications. The solution can identify personal data in structured datasets, apply consistent and reproducible anonymisation rules, and generate the artifacts required for quality control, auditing, and subsequent use of data in secure environments.

Population Lifespan Is Governed by Mathematical Laws

Researchers at HSE University and MSU have established a universal law governing the time to extinction of a population in a random environment. Their analysis of the evolution of branching processes—complex probabilistic systems—shows that, regardless of the initial population size, extinction follows strict mathematical laws. The results have been published in the Journal of Applied Probability.