Video: This intriguing theory from a master of conceptual science might end up being crucial to new AI advances.
Get ready for a lot of math…!
We have a sort of intuitive understanding of a big need in artificial intelligence and machine learning: making sure that systems converge well, that data is oriented the right way, and that we understand what these tools are doing, that we can look under the hood.
A lot of us have already heard the term “curse of dimensionality,” but Tomaso Armando Poggio invokes this frightening trope with a good bit of mathematics attached… (Poggio is the Eugene McDermott Professor in the Department of Brain and Cognitive Sciences, a researcher at the McGovern Institute for Brain Research, and a member of the MIT Computer Science and Artificial Intelligence Laboratory, CSAIL.)
In talking about the contributions of Alessandro Volta in 1800 and his development of the first battery, Poggio makes the analogy to current technology and the frontier that we’re facing now.
We need, he says, a theory of machine learning to provide, in his words, “deep explainability,” and to enable other kinds of fundamental advancement.
“That’s the root of a lot of the problems,” Poggio says. “(A lack of) explainability: not knowing exactly the properties and limitations of those systems … and we need a theory because we need better systems.”
He also suggests we can find principles that human intelligence has in common with large language models, and use those for deeper exploration.
(Watch Poggio’s description of a process where someone can use a “powerful estimator” and parametric analysis to approximate an unknown function, and then, in principle, find the relevant parameters by optimizing the fit between the components, and of how this process relates to thinking, more broadly, about learning an implicit function from input/output data.)
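To make that concrete, here is a minimal toy sketch (my own illustration, not code from the talk) of the general recipe: pick a parametric estimator, then choose its parameters by optimizing the fit to input/output samples of an otherwise unknown function.

```python
# Minimal toy sketch (my illustration, not code from the talk): approximate an
# unknown function from input/output samples by optimizing a parametric fit.
import numpy as np

rng = np.random.default_rng(0)

def unknown_function(x):
    # Stand-in for the implicit function we only observe through data.
    return np.sin(3.0 * x) + 0.5 * x

x = rng.uniform(-2.0, 2.0, size=200)   # inputs
y = unknown_function(x)                # observed outputs

# The "powerful estimator" here is just a degree-7 polynomial; optimizing the
# fit means choosing its coefficients to minimize squared error on the data.
degree = 7
design = np.vander(x, degree + 1)                    # columns x**7 ... x**0
params, *_ = np.linalg.lstsq(design, y, rcond=None)

x_test = np.linspace(-2.0, 2.0, 5)
approx = np.vander(x_test, degree + 1) @ params
print(np.max(np.abs(approx - unknown_function(x_test))))  # small fit error
```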
Later, in assessing an image problem where the parameters number no fewer than 10 to the power of 1,000, Poggio compares that number to the number of protons in the entire universe: 10 to the power of 80.
“This (dimensional volume) is a real curse,” he says.
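Some back-of-the-envelope arithmetic (my numbers, not a slide from the talk) shows where a figure like 10 to the power of 1,000 can come from: a naive grid approximation of a function of d variables needs roughly k to the power of d samples, so even modest per-dimension resolution explodes once d reaches image-like sizes.

```python
# Back-of-the-envelope arithmetic (my numbers, not a slide from the talk):
# a naive grid approximation of a function of d variables needs ~k**d samples.
k = 10                          # sample points per dimension
d = 1000                        # dimensions, on the order of an image's pixels
grid_points = k ** d            # exact, since Python ints are arbitrary precision
protons_in_universe = 10 ** 80  # commonly cited estimate

print(len(str(grid_points)))              # 1001 digits, i.e. about 10**1000
print(grid_points > protons_in_universe)  # True, by an absurd margin
```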
In describing the curse of dimensionality as it affects new systems, Poggio talks about the example of working with a “well-known and classical function,” and also describes the nature of a compositional function that would help with these sorts of problems.
Breaking functions down into binary trees of small collections of variables, he talks about dimensionality and the principle of sparse connectivity, again with a detailed description that you’ll want to listen to, maybe more than once.
“(This approach) will avoid the curse of dimensionality when approximation is done by a deep network with the same compositional structure, that same sparse connectivity at different layers. … the question was, then, are compositionally sparse functions very rare, something that happens, perhaps, with images? … this would explain why convolutional networks are good, and dense networks are bad.”
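As a rough illustration of what a compositionally sparse function looks like (my own toy example, not Poggio’s), here is a function of eight variables built as a binary tree of two-variable constituent functions, along with the crude parameter-counting argument for why approximating the pieces is exponentially cheaper than approximating a generic dense function.

```python
# Illustrative toy example (mine, not Poggio's): a compositionally sparse
# function of 8 variables built as a binary tree of 2-variable constituents.
import numpy as np

def g(a, b):
    # Each constituent function depends on only two variables.
    return np.tanh(a + 2.0 * b)

def compositional(x):
    # The binary tree composes the 2-variable pieces layer by layer.
    l1 = [g(x[0], x[1]), g(x[2], x[3]), g(x[4], x[5]), g(x[6], x[7])]
    l2 = [g(l1[0], l1[1]), g(l1[2], l1[3])]
    return g(l2[0], l2[1])

print(compositional(np.linspace(0.1, 0.8, 8)))

# Crude parameter counting at a fixed accuracy: a generic d-variable function
# needs roughly k**d parameters, but this tree only needs d - 1 two-variable
# pieces, each costing roughly k**2, so the total grows linearly in d.
d, k = 8, 10
print(k ** d, (d - 1) * k ** 2)   # 100000000 vs 700
```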
Not to add more technicality, but the following statement by Poggio seems to sum up this part of his theory:
“It turns out (that) every practical function, every function that is Turing computable in polynomial, (or) non-exponential, time, is compositionally sparse, and can be approximated without curse of dimensionality, by a deep network with the appropriate sparse connectivity at each layer.”
Read that sentence closely.
Generally, using the example of a convolutional network, Poggio talks about how sparsity could help us to uncover key improvements in AI/ML systems. He explains what he calls a “conjecture” on sparse models this way:
“This may be what transformers can do, for at least a subset of functions: to find that sparse composition at each level of the hierarchy. And this is done by self-attention, which selects a small number, a sparse number of tokens, at each layer in the network.”
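Here is a minimal sketch of what “self-attention that selects a sparse number of tokens” could look like, again my own toy code rather than anything shown in the talk: a single attention layer where each query keeps only its top-k scoring tokens.

```python
# Minimal sketch of the conjecture's mechanism (my toy code, not Poggio's):
# one self-attention layer where each query keeps only its top-k tokens,
# i.e. a sparse selection of variables at that layer.
import numpy as np

def sparse_self_attention(X, Wq, Wk, Wv, k=2):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])              # (tokens, tokens)
    kth_best = np.sort(scores, axis=1)[:, -k][:, None]
    masked = np.where(scores >= kth_best, scores, -np.inf)  # drop all but top k
    weights = np.exp(masked - masked.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over survivors
    return weights @ V                                  # each output mixes only k tokens

rng = np.random.default_rng(0)
tokens, dim = 6, 4
X = rng.normal(size=(tokens, dim))
Wq, Wk, Wv = (rng.normal(size=(dim, dim)) for _ in range(3))
print(sparse_self_attention(X, Wq, Wk, Wv, k=2).shape)  # (6, 4)
```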
This is, to put it mildly, very interesting for engineers who are trying to break through the current limitations of what we can do with AI and ML. A lot of it, to be sure, has to do with black box models, and dimensionality, and fitting.
Take a look and see what you think of this approach. Poggio concludes with a summary:
“I think we need a theory-first approach to AI. This will provide true explainability, will allow us to improve on the systems … which we don’t understand why they work, which is kind of very ironic. And perhaps beyond that, to really discover principles of intelligence that apply also to our brain(s). … any testing conjecture to be explored (involves the idea that) what that (model) may be doing is really: to find at least for a subset of interesting function(s), the sparse variables that are needed at each layer in a network.”