Startup Dreamers

AI Watching And Listening: Cross-Sensory Cognition Work

By admin, September 18, 2023

Sometimes we forget how much AI is really doing behind the scenes. For a reminder, we need look no further than what came out of Imagination in Action and what the experts there showed us.

Large language models are taking our world by storm, imitating human cognition in many different ways, and that is feeding a broader trend toward digital disruption.

That idea comes through loud and clear as James Glass takes us through some of the intersections between video, audio and new technology.

For example, take a look at the part of the video where he talks about image captioning and the interplay between visuals and text:

“We were interested in seeing if we could take speech and pair it up with vision, and with no other information, see what the machine could learn from raw audio samples and raw pixels,” he explains. “And so since nothing like this existed, we went out and collected about 400,000 or so people talking about images. People like to do this; it’s pretty easy. Then we (built) a deep learning model, having one branch grovel (sic) over the image and another branch grovel (sic) over the audio, and then at a high level, have them connect and try and learn a joint audiovisual semantic latent representation of the signal.”
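To make the two-branch idea concrete, here is a minimal sketch in Python. It is not the model Glass describes: the random weights stand in for trained deep networks, and every name and size in it (`make_branch`, `embed`, `EMBED_DIM`, the patch and frame counts) is an assumption made up for illustration. It only shows the shape of the thing: one branch turns an image into a vector, another turns a spoken caption into a vector, and both vectors live in the same space so they can be compared.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 16  # size of the shared space (hypothetical value)

def make_branch(input_dim, embed_dim=EMBED_DIM):
    """Random weights standing in for a trained deep network branch."""
    return rng.normal(scale=0.1, size=(input_dim, embed_dim))

def embed(features, weights):
    """Project each patch or frame, average-pool, and scale to unit length."""
    projected = np.tanh(features @ weights)   # (n_items, embed_dim)
    pooled = projected.mean(axis=0)           # one vector per image or caption
    return pooled / np.linalg.norm(pooled)

image_branch = make_branch(input_dim=48)   # assumed: 48 values per pixel patch
audio_branch = make_branch(input_dim=40)   # assumed: 40 values per audio frame

image_patches = rng.normal(size=(49, 48))   # a 7x7 grid of patches (toy data)
audio_frames = rng.normal(size=(120, 40))   # 120 frames of speech (toy data)

image_vec = embed(image_patches, image_branch)
audio_vec = embed(audio_frames, audio_branch)

# Cosine similarity in the joint space. Training would push this up for
# matching image/caption pairs and down for mismatched ones.
similarity = float(image_vec @ audio_vec)
print(similarity)
```

With untrained weights the score is meaningless; the point is that once both branches land in the same space, "does this caption go with this picture?" becomes a single number.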

Glass talks about “semantic objects” as versatile units of digital cognition, and shows us how the computer ‘thinks’ by offering a display where you can hear people talking about items in a picture, and see pixels lighting up around those objects.

In a way, it’s kind of like stepping through code in a debugger, where you see what the machine is doing while it’s doing it.

Lighthouses and sunsets are pretty, but Glass suggests there’s more to it than that:

“It’s sort of like somebody shining a flashlight at a picture while you’re talking. And it’s not perfect, but you get a sense that on some of the concepts that you’re hearing, it sort of knows what you’re talking about. You can quantify this a little bit more by looking through a large data set and finding patches (sic) and images that have high correspondence with segments in the speech captions, and pooling them together and then clustering, and you get hundreds and hundreds of these kinds of clusters…”
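The flashlight effect can be sketched in a few lines, assuming we already have one embedding per image patch and one per audio frame in the same space. The function names `matchmap` and `flashlight` and the tiny 2x2 example are hypothetical, chosen for illustration rather than taken from the talk: each audio frame is compared with every patch, and the patches that score highest are the ones that "light up."

```python
import numpy as np

def matchmap(patch_embs, frame_embs):
    """Similarity of every image patch with every audio frame: (patches, frames)."""
    return patch_embs @ frame_embs.T

def flashlight(patch_embs, frame_embs, grid_shape):
    """For each audio frame, a heatmap over the image grid scaled to [0, 1]."""
    sims = matchmap(patch_embs, frame_embs)
    sims = sims - sims.min(axis=0, keepdims=True)    # lowest patch becomes 0
    peak = sims.max(axis=0, keepdims=True)
    sims = sims / np.where(peak > 0, peak, 1.0)      # highest patch becomes 1
    return sims.T.reshape(len(frame_embs), *grid_shape)

# Toy example: a 2x2 image whose four patches have distinct embeddings,
# and two audio frames that "talk about" the last and the first patch.
patch_embs = np.eye(4)
frame_embs = np.array([[0.0, 0.0, 0.0, 1.0],
                       [1.0, 0.0, 0.0, 0.0]])

heat = flashlight(patch_embs, frame_embs, grid_shape=(2, 2))
print(heat[0])  # brightest cell is bottom-right
print(heat[1])  # brightest cell is top-left
```

Pooling the brightest patch and speech-segment pairs across a large data set and clustering them is the step Glass describes next; that part is left out of this sketch.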

He talks about the “Rosetta Stone” of language intersection, where some of these new technologies will enable better translations – or more to the point, entirely new kinds of translations that go beyond written and spoken words in ways that sound like science fiction.

But that’s really just the tip of the iceberg. Think about what’s going to happen when we allow AI entities to translate between media, between speech and visuals!

Or to put it another way, think back about a decade to early AI work. We had unsupervised machine learning, and supervised machine learning.

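These paradigms that Glass is talking about are inherently different. They’re based on self-supervised learning, as he mentions several times, where the training signal comes from the data itself (here, which spoken caption goes with which image) rather than from human-written labels. And that’s critically important: these systems can learn from far more data than people could ever annotate by hand.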
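What does "no labels" look like in practice? One common recipe is a contrastive loss: take a batch of image and caption vectors, treat each true pair as the right answer, and treat every other caption in the batch as a wrong one. The sketch below uses an InfoNCE-style loss in one direction (image to audio). That choice, the function name `contrastive_loss` and the `temperature` value are assumptions for illustration, not necessarily what Glass's group used.

```python
import numpy as np

def contrastive_loss(image_vecs, audio_vecs, temperature=0.1):
    """InfoNCE-style loss where row i of each array is a matching pair."""
    logits = (image_vecs @ audio_vecs.T) / temperature    # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The "label" for image i is simply caption i: the pairing is the supervision.
    return float(-np.mean(np.diag(log_probs)))

# Toy batch of four pairs with unit-length embeddings.
image_vecs = np.eye(4)
aligned_audio = np.eye(4)                     # each caption matches its image
shuffled_audio = np.roll(np.eye(4), 1, axis=0)  # captions paired with wrong images

loss_aligned = contrastive_loss(image_vecs, aligned_audio)
loss_shuffled = contrastive_loss(image_vecs, shuffled_audio)
print(loss_aligned, loss_shuffled)
```

The aligned batch gives a loss near zero and the shuffled batch a much larger one, which is the signal a network would follow during training without anyone ever writing a label.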

As an example, Glass talks about scene analysis and perception models. Listen to this part where he discusses a methodology for multimedia analysis:

“You can modify that basic model to have a visual branch that’s processing video, and an audio branch that’s processing speech and the audio sounds, and learn a high-level embedding space. And you can do things like retrieval: play an audio snippet and retrieve the corresponding video snippet, and things like that.”
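Retrieval in a shared embedding space comes down to a nearest-neighbor lookup. Here is a minimal sketch, assuming the audio snippet and the video snippets have already been embedded by their respective branches; `normalize`, `retrieve` and the toy vectors are hypothetical names and data for illustration only.

```python
import numpy as np

def normalize(x):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieve(query_vec, video_vecs, top_k=3):
    """Return indices and scores of the top_k video snippets closest to the query."""
    scores = normalize(video_vecs) @ normalize(query_vec)
    order = np.argsort(-scores)[:top_k]
    return order.tolist(), scores[order].tolist()

# Toy library of five video snippets in an 8-dimensional space.
video_vecs = np.eye(5, 8)

# An audio query that lands close to snippet 2, plus a little noise.
audio_query = video_vecs[2] + 0.1 * np.ones(8)

indices, scores = retrieve(audio_query, video_vecs, top_k=2)
print(indices[0])  # index of the best-matching video snippet
```

The same lookup works in the other direction: embed a video snippet, then search a library of audio embeddings.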

Video: These are some very interesting new things that AI has just become capable of

He talks about listening and understanding, and how we can move the ball forward:

“Deep Learning has really enabled us to make connections across modalities,” he says. “It’s fascinating: self-supervised learning has let us learn from large quantities of unannotated data. And these newer large language models (are) going to be a really interesting research direction (in which) to connect perception with language: two of the original pillars of artificial intelligence.”

It truly is fascinating. After a while, you might find it almost keeps you up at night. With AI doing all of this – how long until it’s doing it better than us? Anyway, the applications are evident, and the methodology, the cutting-edge research, is starkly impressive.
