Startup Dreamers

AI Watching And Listening: Cross-Sensory Cognition Work

By admin | September 18, 2023 | 4 Mins Read

Sometimes we forget how much AI is really doing behind the scenes. For a reminder, we need look no further than what came out of Imagination in Action and what the experts there showed us.

Large language models are taking our world by storm, with the ability to imitate human cognition in many different ways. All of this is feeding a massive trend toward digital disruption.

That idea comes through loud and clear as James Glass takes us through some of the intersections between video, audio and new technology.

For example, take a look at the part of the video where he talks about image captioning and the interplay between visuals and text:

“We were interested in seeing if we could take speech and pair it up with vision, and with no other information, see what the machine could learn from raw audio samples and raw pixels,” he explains. “And so since nothing like this existed, we went out and collected about 400,000 or so people talking about images. People like to do this; it’s pretty easy. Then we (built) a deep learning model, having one branch grovel (sic) over the image and another branch grovel (sic) over the audio, and then at a high level, have them connect and try and learn a joint audiovisual semantic latent representation of the signal.”
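To make the two-branch idea concrete, here is a minimal sketch in plain Python. It is not Glass's model: the linear "branches", the toy weights and features, and the margin-based loss are all assumptions chosen for illustration. It only shows the general shape of the approach, where each branch maps its input into a shared space and training rewards matched image and speech pairs for scoring higher than mismatched ones.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(weights, features):
    """One 'branch': a linear map from raw features into the shared space."""
    return [dot(row, features) for row in weights]

def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]

def pair_score(image_w, audio_w, image_feats, audio_feats):
    """Similarity of an image and a spoken caption in the joint space."""
    img = normalize(project(image_w, image_feats))
    aud = normalize(project(audio_w, audio_feats))
    return dot(img, aud)

def margin_loss(score_match, score_mismatch, margin=0.2):
    """Zero once the matched pair beats the mismatched pair by the margin."""
    return max(0.0, margin - score_match + score_mismatch)

# Hypothetical toy weights and features (not from the talk).
IMAGE_W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # 3 image features -> 2 dims
AUDIO_W = [[0.0, 1.0], [1.0, 0.0]]             # 2 audio features -> 2 dims
lighthouse_image = [1.0, 0.0, 0.5]
lighthouse_speech = [0.0, 2.0]
sunset_speech = [3.0, 0.0]

match = pair_score(IMAGE_W, AUDIO_W, lighthouse_image, lighthouse_speech)
mismatch = pair_score(IMAGE_W, AUDIO_W, lighthouse_image, sunset_speech)
loss = margin_loss(match, mismatch)
```

In a real system the branches would be deep networks and the weights would be learned by minimizing a loss of this kind over many pairs; here they are fixed by hand so the arithmetic is easy to follow.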

Glass talks about “semantic objects” as versatile units of digital cognition, and shows us how the computer ‘thinks’ by offering a display where you can hear people talking about items in a picture, and see pixels lighting up around those objects.

In a way, it’s kind of like stepping through code in a debugger, where you see what the machine is doing while it’s doing it.

Lighthouses and sunsets are pretty, but Glass suggests there’s more to it than that:

“It’s sort of like somebody shining a flashlight at a picture while you’re talking. And it’s not perfect, but you get a sense that on some of the concepts that you’re hearing, it sort of knows what you’re talking about. You can quantify this a little bit more by looking through a large data set and finding patches (sic) and images that have high correspondence with segments in the speech captions, and pooling them together and then clustering, and you get hundreds and hundreds of these kinds of clusters…”
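The "flashlight" effect can be sketched as a similarity map between one speech segment and every patch of an image. The snippet below is an illustrative assumption, not the method from the talk: the 2x2 grid of patch embeddings and the segment embedding are made-up numbers, and the dot product stands in for whatever scoring the real model uses.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def similarity_map(patch_grid, segment):
    """Score every image patch against one speech-segment embedding."""
    return [[dot(patch, segment) for patch in row] for row in patch_grid]

def brightest_patch(sim_map):
    """Return (row, col) of the patch that responds most to the segment."""
    best = max(
        (value, r, c)
        for r, row in enumerate(sim_map)
        for c, value in enumerate(row)
    )
    return best[1], best[2]

# Hypothetical embeddings: a 2x2 grid of image patches, each a 2-d vector.
PATCH_GRID = [
    [[0.1, 0.0], [0.9, 0.1]],
    [[0.0, 0.2], [0.1, 0.8]],
]
SEGMENT = [1.0, 0.0]  # embedding of one spoken word or phrase (assumed)

heat = similarity_map(PATCH_GRID, SEGMENT)
spot = brightest_patch(heat)
```

Patches with high scores are the ones that would "light up" while the corresponding words are spoken. Collecting such high-scoring patch and segment pairs across a large data set is the input to the clustering step Glass mentions.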

He talks about the “Rosetta Stone” of language intersection, where some of these new technologies will enable better translations – or more to the point, entirely new kinds of translations transcending text and verbal reading in very sci-fi ways.

But that’s really just the tip of the iceberg. Think about what’s going to happen when we allow AI entities to translate between media, between speech and visuals!

Or to put it another way, think back about a decade to early AI work. We had unsupervised machine learning, and supervised machine learning.

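The paradigms that Glass is talking about are inherently different. They’re based on self-supervised learning, as he mentions several times: the training signal comes from the data itself (here, speech that happens to accompany an image) rather than from human-written labels. And that’s critically important. Self-supervising systems evolve in ways that make it hard for humans to keep up with them.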

As an example, Glass talks about scene analysis and perception models. Listen to this part where he discusses a methodology for multimedia analysis:

“You can modify that basic model to have a visual branch that’s processing video, and an audio branch that’s processing speech and the audio sounds, and learn a high-level embedding space. And you can do things like retrieval: play an audio snippet and retrieve the corresponding video snippet, and things like that.”
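Once audio and video live in the same embedding space, retrieval reduces to a nearest-neighbor search. The following is a rough sketch under stated assumptions: the clip names and three-dimensional vectors are invented, and cosine similarity is one common choice of score, not necessarily the one used in the work described.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, library, top_k=1):
    """Return the names of the top_k library items closest to the query."""
    ranked = sorted(
        library,
        key=lambda name: cosine(query, library[name]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical video-snippet embeddings in the shared space.
VIDEO_LIBRARY = {
    "waves_clip": [0.9, 0.1, 0.0],
    "traffic_clip": [0.0, 0.2, 0.9],
    "dog_clip": [0.1, 0.9, 0.1],
}
AUDIO_QUERY = [0.8, 0.2, 0.1]  # embedding of an audio snippet (assumed)

best = retrieve(AUDIO_QUERY, VIDEO_LIBRARY)
top_two = retrieve(AUDIO_QUERY, VIDEO_LIBRARY, top_k=2)
```

The same function works in the other direction (video query, audio library), which is the practical payoff of embedding both modalities in one space.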

Video: These are some very interesting new things that AI has just become capable of

He talks about listening and understanding, and how we can move the ball forward:

“Deep Learning has really enabled us to make connections across modalities,” he says. “It’s fascinating: self-supervised learning has let us learn from large quantities of unannotated data. And these newer large language models (are) going to be a really interesting research direction (in which) to connect perception with language: two of the original pillars of artificial intelligence.”

It truly is fascinating. After a while, you might find it almost keeps you up at night. With AI doing all of this – how long until it’s doing it better than us? Anyway, the applications are evident, and the methodology, the cutting-edge research, is starkly impressive.

