Saturday April 11, 2020

Google AI can focus on individual speakers in a crowd

Google AI to identify speakers from crowd. Wikimedia Commons

Just as most smartphone cameras now allow users to focus on a single object among many, it may soon be possible to pick out individual voices in a crowd by suppressing all other sounds, thanks to a new Artificial Intelligence (AI) system developed by Google researchers.

This is an important development, as computers are not as good as humans at focusing their attention on a particular person in a noisy environment. Known as the cocktail party effect, the ability to mentally "mute" all other voices and sounds comes naturally to humans.

Google AI will identify individual speakers now. Wikimedia Commons

However, automatic speech separation — separating an audio signal into its individual speech sources — remains a significant challenge for computers, Inbar Mosseri and Oran Lang, software engineers at Google Research, wrote in a blog post this week. In a new paper, the researchers presented a deep learning audio-visual model for isolating a single speech signal from a mixture of sounds such as other voices and background noise.

“In this work, we are able to computationally produce videos in which speech of specific people is enhanced while all other sounds are suppressed,” Mosseri and Lang said. The method works on ordinary videos with a single audio track, and all that is required from the user is to select the face of the person in the video they want to hear, or to have such a person be selected algorithmically based on context.

The researchers believe this capability can have a wide range of applications, from speech enhancement and recognition in videos, through video conferencing, to improved hearing aids, especially in situations where there are multiple people speaking. “A unique aspect of our technique is in combining both the auditory and visual signals of an input video to separate the speech,” the researchers said.

This will also help in speech enhancement. VOA

“Intuitively, movements of a person’s mouth, for example, should correlate with the sounds produced as that person is speaking, which in turn can help identify which parts of the audio correspond to that person,” they explained.

The visual signal not only improves the speech separation quality significantly in cases of mixed speech, but, importantly, it also associates the separated, clean speech tracks with the visible speakers in the video, the researchers said. (IANS)

Microsoft Registers New Daily Record of 2.7 Billion Meeting Minutes on March 31

Amid COVID-19 lockdowns, Microsoft registered a new record of 2.7 billion meeting minutes in a single day on March 31, a 200 per cent increase from 900 million on March 16, the company announced on Thursday.

As students and teachers turn to Teams for distance learning, there are 183,000 tenants in 175 countries using Teams for Education, said Jared Spataro, Corporate Vice President for Microsoft 365.

"As the world works remotely, it is no surprise people are turning on video in Teams meetings two times more than before many of us began working from home full-time. We've also seen total video calls in Teams grow by over 1,000 percent in the month of March," Spataro said.

Microsoft found people in Norway and the Netherlands turn on video most, with about 60 per cent of calls including video. “People in Australia use video in meetings 57 per cent of the time, Italy 53 per cent, Chile 52 per cent, Switzerland 51 per cent, and Spain 49 per cent. People in the UK, Canada, and Sweden use video 47 per cent of the time and people in Mexico and the US use it 41 per cent and 38 per cent respectively,” the company said.

People in India use video in 22 per cent of meetings, Singapore 26 per cent, South Africa 36 per cent, France 37 per cent, and Japan 39 per cent. “This may be attributed in part to less access to devices and stable internet in some regions such as India and South Africa,” said Spataro.

Between March 1 and March 31, the average time between a person's first and last use of Teams each day increased by over one hour. The number of weekly Teams mobile users grew by more than 300 per cent from early February to March 31.

“We’ve seen large increases in usage of Teams on mobile devices from customers in higher education and primary and secondary education (K-12). We’ve also seen a notable increase from customers in government-related industries,” said Spataro.

Microsoft Stream is the service that powers live events and meeting recordings in Teams.

“As a result of customers moving events online, the number of Stream videos in Teams per week has increased over five times in the last month with hundreds of hours of video uploaded per minute,” the executive said. (IANS)