What can you do about data sparsity? What do you do when your matrix is mostly zeros, and you can’t get a good look at a complex system because so many of its entries are empty?
This can be a major issue. If you’re wondering what the pros would do, you’re not alone: sparse data problems keep popping up in journals and tech media alike. But what does the term actually mean? Digging through your lexicon of terms, studying classifier charts, ruminating on the Boltzmann machine – all of that can help, but hearing from people in the field is also a powerful way to connect with what’s on the front burner right now.
For some thoughts on this, Saman Amarasinghe goes all the way back to the 1950s to talk about FORTRAN and its design.
“One thing interesting about Fortran was it had only one data structure; the data structure was tensors,” he says.
As he moves on, though, Amarasinghe suggests that new AI systems have to treat sparsity as a fundamental issue – and that dense tensors present a challenge there.
“The world is not dense,” he said, underlining phenomena like replication and symmetry that can help us to conceive of data models differently.
Sparsity, by the way, refers to situations where there isn’t enough data, or where too many of the data points have a zero or null value.
The latter is often referred to as ‘controlled sparsity’.
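For concreteness, here is a minimal Python sketch (ours, not from the talk) that measures that second kind of sparsity – the fraction of entries in a small, made-up matrix that are zero:

```python
import numpy as np

# Hypothetical example: a small matrix where most entries are zero.
A = np.array([
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [5, 0, 0, 2],
    [0, 0, 0, 0],
])

sparsity = 1.0 - np.count_nonzero(A) / A.size
print(f"sparsity: {sparsity:.0%}")  # 81% of the entries are zero
```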
Amarasinghe suggests that we can deal with these kinds of sparsity with new approaches that expand on what dense tensors have done for the past half-century.
Dense tensors, he notes, are flexible, but they waste memory.
The solution? Compress the data set, drop the zeros, and keep metadata that records where each remaining value belongs.
“The problem is: I have all of these zeros I’m storing,” Amarasinghe says. “So what I want to do is: instead of doing that, how do I compress this thing, (and I) don’t store the zeros. But now how do you figure out what that value is? Because we don’t know. … we need to keep additional data called metadata. And the metadata will say, for that value, what the row and column number is: this is called the coordinate format.”
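As a rough illustration of the coordinate format he describes, here is a small Python sketch using scipy.sparse – the matrix is made up and the library choice is ours, not Amarasinghe’s:

```python
import numpy as np
from scipy.sparse import coo_matrix

# A mostly-zero matrix, stored densely.
A = np.array([
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [5, 0, 0, 2],
    [0, 0, 0, 0],
])

# Coordinate (COO) format: keep only the nonzero values, plus metadata
# recording the row and column index of each one.
coo = coo_matrix(A)
print(coo.data)  # [3 5 2]  -> the stored values
print(coo.row)   # [0 2 2]  -> metadata: row of each value
print(coo.col)   # [2 0 3]  -> metadata: column of each value

# Round-trip back to a dense array to confirm nothing was lost.
assert (coo.toarray() == A).all()
```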
Amarasinghe then shows a series of slides illustrating the kind of code you would need to write to produce a multi-tensor result.
“This is hard,” he concludes, while also providing some caveats that may be handy in tomorrow’s engineering world.
Ignoring sparsity, he contends, is throwing away performance. Amarasinghe explains how the efficiencies work:
“At some point, you get better and better performance,” he says. “And if there’s a lot of sparsity, (you get) a huge amount of performance. Why? Because if you multiply by zero, you don’t have to do anything. You don’t even have to fetch items. So normally, you keep zeros multiplied (and) if you add, you just have to frame the data and copy it – you don’t (unintelligible) operation. So because of these two, I can do a lot less operation, a lot less data fetches, and you get good performance.”
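To make that argument concrete, here is a hand-rolled Python sketch (ours, not code from the talk) of a dot product where one vector stores only its nonzeros, so zero entries are never fetched and never multiplied:

```python
# Dense dot product: touches every element, including the zeros.
def dense_dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# Sparse dot product: x is stored as {index: value} for nonzeros only,
# so zero entries are never fetched and never multiplied.
def sparse_dot(x_nonzeros, y):
    return sum(v * y[i] for i, v in x_nonzeros.items())

y = [2.0, 4.0, 6.0, 8.0, 10.0]
x_dense = [0.0, 3.0, 0.0, 0.0, 1.0]
x_sparse = {1: 3.0, 4: 1.0}  # only the nonzero entries survive compression

assert dense_dot(x_dense, y) == sparse_dot(x_sparse, y)  # both give 22.0
```

Fewer multiplies, fewer fetches: the sparse version does two of each here, while the dense version does five.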
Going deeper into ideas like vector multiplication, Amarasinghe illustrates more of the work the engineers have to do to deal with data sparsity at a fundamental level.
“If you look at where the data is, large amounts of data, things like sparse neural networks (are) right at the cusp of getting performance using matrix matrix multiply,” he says. “But there are many different other domains, we have sparsities in much large numbers. So you can get a huge amount of performance in here.”
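For a sense of scale, here is a scipy.sparse sketch with invented sizes and density – a matrix-vector multiply over a matrix that is 99.9% zeros, the kind of regime he is pointing at:

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical setup: a 10,000 x 10,000 matrix in which only 0.1% of the
# entries are nonzero, roughly the regime of a large graph or a heavily
# pruned neural-network layer.
A = sp.random(10_000, 10_000, density=0.001, format="csr", random_state=0)
x = np.random.default_rng(0).standard_normal(10_000)

# The sparse matrix-vector multiply only visits the stored nonzeros
# (about 100,000 of them); a dense multiply would do 100,000,000
# multiply-adds and fetch every zero along the way.
y = A @ x
print(A.nnz, "stored values out of", A.shape[0] * A.shape[1], "entries")
```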
New approaches might help us figure out how to handle data irregularities in systems.
Amarasinghe also presents a slide comparing relative sparsity across domains such as:
- Internet and social media
- Circuit simulation
- Computational chemistry
- Fluid dynamics
- Statistics
This part of the presentation speaks to the idea of analyzing different kinds of data systems differently. It also reflects trends in AI right now: you can find papers on data sparsity problems in statistics, for example, all over the Internet. We also see that sparse data bias is viewed as a major problem for these systems.
To address this, Amarasinghe suggests engineers can build a sparse tensor compiler to generate optimized code for these problems. Watch the part of the talk where he goes into lossless compression carefully – some of the visuals may help.
“What we have done is (we’ve) made it possible for programmers to write the code as they’re working on dense data,” he says. “But actually, in fact, the data is compressed. So we are going to operate on compressed data, and get great performance.”
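The compiler he describes does this automatically. As a loose analogy only (not the compiler itself), here is a scipy.sparse sketch in which the computation is written as if the data were dense, but runs unchanged on compressed storage:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Write the computation as if the data were dense ...
def scaled_matvec(A, x, alpha):
    return alpha * (A @ x)

x = np.array([1.0, 2.0, 3.0, 4.0])
A_dense = np.array([
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [5, 0, 0, 2],
    [0, 0, 0, 0],
], dtype=float)

# ... then hand it either representation. The dense and compressed (CSR)
# versions give the same answer, but the CSR run touches only the
# stored nonzeros.
A_sparse = csr_matrix(A_dense)
assert np.allclose(scaled_matvec(A_dense, x, 2.0),
                   scaled_matvec(A_sparse, x, 2.0))
```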