Coordinating complicated interactive systems, whether it's the different modes of transportation in a city or the various components that must work together to make an effective and efficient robot, is an increasingly important subject for software designers. Now, researchers have developed an entirely new way of approaching these complex problems, using simple diagrams as a tool to reveal better approaches to software optimization in deep-learning models.
They say the new method makes addressing these complex tasks so simple that it can be reduced to a drawing that fits on the back of a napkin.
The new approach is described in the journal Transactions on Machine Learning Research, in a paper by incoming doctoral student Vincent Abbott and Professor Gioele Zardini of MIT's Laboratory for Information and Decision Systems (LIDS).
"We designed a new language to talk about these new systems," Zardini says. This new diagram-based "language" is heavily based on something called category theory, he explains.
It all has to do with designing the underlying architecture of computer algorithms, the programs that will actually end up sensing and controlling the various parts of the system being optimized. "The components are different parts of an algorithm, and they have to talk to each other, exchange information, but also account for energy usage, memory consumption, and so on," he says. Such optimizations are notoriously difficult, because each change in one part of the system can in turn cause changes in other parts, which can further affect other parts, and so on.
The researchers decided to focus on the particular class of deep-learning algorithms, which are currently a hot topic of research. Deep learning is the basis of the large artificial intelligence models, including large language models such as ChatGPT and image-generation models such as Midjourney. These models manipulate data through a "deep" series of matrix multiplications interspersed with other operations. The numbers within the matrices are parameters, and they are updated during long training runs, allowing complex patterns to be found. Models consist of billions of parameters, which makes computation expensive and makes improved resource usage and optimization invaluable.
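To make that structure concrete, here is a minimal sketch of such a "deep" stack of matrix multiplications; the layer widths and the ReLU nonlinearity are arbitrary illustrative choices, not details from the paper, and the matrix entries play the role of the parameters a training run would update.

```python
import numpy as np

def deep_model(x, weights):
    """A 'deep' series of matrix multiplications interspersed
    with other operations (here, an elementwise ReLU)."""
    for W in weights[:-1]:
        x = np.maximum(x @ W, 0.0)   # matrix multiply, then ReLU
    return x @ weights[-1]           # final linear layer

# The entries of each matrix are the parameters that training would update.
rng = np.random.default_rng(0)
sizes = [512, 1024, 1024, 256]       # arbitrary layer widths for illustration
weights = [rng.standard_normal((m, n)) * 0.01 for m, n in zip(sizes, sizes[1:])]

x = rng.standard_normal((8, sizes[0]))    # a small batch of inputs
print(deep_model(x, weights).shape)       # (8, 256)
```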
Diagrams can represent the details of the parallelized operations that deep-learning models consist of, revealing the relationships between the algorithms and the parallelized graphics processing unit (GPU) hardware they run on, supplied by companies such as Nvidia. "I'm very excited about this," says Zardini, because "we seem to have found a language that very nicely describes deep-learning algorithms, explicitly representing all the important things, which are the operators you use," for example the energy consumption, the memory allocation, and any other parameter you are trying to optimize for.
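To give a flavor of what annotating operators with resource usage could look like in ordinary code, the hypothetical sketch below tags each matrix multiplication in a toy pipeline with a naive estimate of floating-point operations and bytes moved; this simple cost model is an illustrative assumption, not the diagrammatic formalism described in the paper.

```python
from dataclasses import dataclass

BYTES_PER_FLOAT = 4  # assume 32-bit floats

@dataclass
class MatMul:
    """A single (m x k) @ (k x n) matrix multiplication operator."""
    m: int
    k: int
    n: int

    def flops(self) -> int:
        return 2 * self.m * self.k * self.n

    def bytes_moved(self) -> int:
        # Read both operands and write the result (a deliberately naive model).
        return BYTES_PER_FLOAT * (self.m * self.k + self.k * self.n + self.m * self.n)

# A toy pipeline of operators, each carrying its own resource estimate.
pipeline = [MatMul(1024, 512, 1024), MatMul(1024, 1024, 256)]
print("total FLOPs:      ", sum(op.flops() for op in pipeline))
print("total bytes moved:", sum(op.bytes_moved() for op in pipeline))
```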
Much of the progress in deep learning has stemmed from resource-efficiency optimizations. The latest DeepSeek model showed that a small team can compete with top models from OpenAI and other major labs by focusing on resource efficiency and the relationship between software and hardware. Typically, in deriving these optimizations, he says, "people need a lot of trial and error to discover new architectures." For example, a widely used optimization program called FlashAttention took more than four years to develop, he says. But with the new framework they developed, "we can really approach this problem in a more formal way." And all of this is represented visually in a precisely defined graphical language.
But the methods used to find these improvements "are very limited," he says. "I think this shows that there's a major gap, in that we don't have a formal, systematic method of relating an algorithm either to its optimal execution, or even of really understanding how many resources it will take to run." But now, with the new diagram-based method they devised, such a system exists.
Category theory, which underlies this approach, is a way of mathematically describing the different components of a system and how they interact, in a generalized, abstract manner. Different perspectives can be related to one another. For example, mathematical formulas can be related to the algorithms that implement them and use resources, or descriptions of systems can be related to robust "monoidal string diagrams." These visualizations allow you to directly play around and experiment with how the different parts connect and interact. What they developed, he says, amounts to "string diagrams on steroids," incorporating many more graphical conventions and many more properties.
"Category theory can be thought of as the mathematics of abstraction and composition," Abbott says. "Any compositional system can be described using category theory, and the relationship between compositional systems can then also be studied." Algebraic rules that are typically associated with functions can also be represented as diagrams, he says. "Then, a lot of the visual tricks we can do with diagrams, we can relate to algebraic tricks and functions. So it creates this correspondence between these different systems."
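As a loose analogy in ordinary Python, rather than in the paper's notation, composing two elementwise passes over data and then fusing them into a single pass is the kind of algebraic rewrite that has a direct diagrammatic counterpart; the names and functions below are purely illustrative.

```python
def compose(f, g):
    """Sequential composition: apply g, then f."""
    return lambda x: f(g(x))

scale = lambda xs: [2.0 * x for x in xs]
shift = lambda xs: [x + 1.0 for x in xs]

# Two separate passes over the data...
two_passes = compose(shift, scale)

# ...and the algebraically equivalent fused single pass.
fused = lambda xs: [2.0 * x + 1.0 for x in xs]

data = [0.0, 1.0, 2.0]
assert two_passes(data) == fused(data)  # same function, fewer traversals
```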
As a result, he says, "this solves a very important problem, which is that we have these deep-learning algorithms, but they're not clearly understood as mathematical models." But by representing them as diagrams, it becomes possible to approach them formally and systematically, he says.
This provides a clear visual understanding of the way parallel real-world processes can be represented by parallel processing in multicore computer GPUs. "In this way," Abbott says, "diagrams can both represent a function, and then reveal how to optimally execute it on a GPU."
The "attention" algorithm is used by deep-learning models that require general, contextual information, and it is a key phase of the serialized blocks that constitute large language models such as ChatGPT. FlashAttention is an optimization that took years to develop, but it resulted in a sixfold improvement in the speed of attention algorithms.
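For readers unfamiliar with the operation being optimized, here is a plain NumPy sketch of the standard attention computation, softmax(QK^T / sqrt(d)) V; FlashAttention computes the same result but reorganizes the work into tiles so the large intermediate score matrix never has to be fully written to and re-read from slow GPU memory. The sizes below are arbitrary.

```python
import numpy as np

def attention(Q, K, V):
    """Standard (unoptimized) attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # large intermediate matrix
    scores -= scores.max(axis=-1, keepdims=True)   # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 128, 64                                     # sequence length, head dimension
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(attention(Q, K, V).shape)                    # (128, 64)
```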
Applying their method to the well-established FlashAttention algorithm, Zardini says that "here we are able to derive it, literally, on a napkin." He then adds, "OK, maybe it's a large napkin." But to drive home how much their new approach can simplify dealing with these complex algorithms, they titled their formal research paper on the work "FlashAttention on a Napkin."
This method, Abbott says, "allows for optimizations to be derived very quickly, in contrast to prevailing methods." While they initially applied the approach to the already existing FlashAttention algorithm, thereby verifying its effectiveness, "we hope to now use this language to automate the detection of improvements," says Zardini, who, in addition to being a principal investigator in LIDS, is the Rudge and Nancy Allen Assistant Professor of Civil and Environmental Engineering and an affiliate faculty member with the Institute for Data, Systems, and Society.
The plan is that eventually, he says, they will develop the software to the point that "the researcher uploads their code, and with the new algorithm you automatically detect what can be improved, what can be optimized, and you return an optimized version of the algorithm to the user."
In addition to automating algorithm optimization, Zardini notes that a robust analysis of how deep-learning algorithms relate to hardware resource usage enables systematic co-design of hardware and software. This line of work integrates with Zardini's focus on categorical co-design, which uses the tools of category theory to simultaneously optimize various components of engineered systems.
Abbott says that "this whole field of optimized deep learning models, I believe, is quite critically unaddressed, and that's why these diagrams are so exciting. They open the doors to a systematic approach to this problem."
"I'm very impressed by the quality of this research. ... The new approach to diagramming deep-learning algorithms used by this paper could be a very significant step," says Jeremy Howard, founder and CEO of Answers.ai, who was not associated with this work. "This paper is the first time I've seen such a notation used to deeply analyze the performance of a deep-learning algorithm on real-world hardware."
"This is a beautifully executed piece of theoretical research, which also aims for high accessibility to uninitiated readers, a trait rarely seen in papers of this kind," says Petar Velickovic, a senior research scientist at Google DeepMind and a lecturer at Cambridge University, who was not associated with this work. These researchers, he says, "are clearly excellent communicators, and I cannot wait to see what they come up with next!"
The new diagram-based language, having been posted online, has already attracted significant attention and interest from software developers. A reviewer of Abbott's prior paper introducing the diagrams noted that "the proposed neural circuit diagrams look excellent from an artistic standpoint (as far as I am able to judge)." "It's technical research, but it's also flashy!" Zardini says.