
The calm after the storm

In the most sophisticated acoustic laboratory of its kind in Israel, at the Bar-Ilan University School of Engineering, Dr. Sharon Ganot and his colleagues are conducting experiments in the computerized decoding of multi-participant conversations, in which the speakers move about constantly and their voices are swallowed up by background noise and by other conversations.

The acoustic laboratory at Bar-Ilan University. Photo: Sharon Ganot

How many times have you found yourself pressing your ear against the phone's earpiece, or even switching on the speakerphone, and despite all your efforts you still could not hear the person on the other end? And your situation is still better than that of a hard-of-hearing person, whose hearing aid amplifies the speakers' voices but amplifies the background noise along with them.

The problem of processing speech signals has preoccupied Dr. Sharon Ganot of the School of Engineering at Bar-Ilan University since he himself was a graduate student. Ganot, his students, and post-doctoral researchers who come to his lab from abroad develop a variety of algorithms that share a common goal: to receive a speech signal in a noisy environment and improve it - starting with a single stationary speaker accompanied by loud noise, and ending with many speakers who move about in a noisy environment.

The laboratory, one of the most sophisticated and unique of its kind in the world, began operating this year at the School of Engineering, and it is what gives Ganot and his laboratory staff great flexibility in performing complex experiments in speech signal processing.

"The problem that bothers me, which takes different forms and complexities, is the reception of a speech signal in an environment with disturbances and its improvement. The disturbances can be caused by background noise, other conversations near the speakers, as well as severe reverberation (for example inside a receiver). Providing an answer to this problem will help to solve communication problems of the type of automatic pointing of the camera to the speaker in a video conference, improvement of hearing aids for the deaf, and the like," explains Ganot. "In the first step I receive a speech signal, which I pick up with the help of one or more microphones. Even today, some luxury cars are equipped with four hands-free speakers, so I can use the information from four microphones to perform the tasks I want to perform."

"The problems facing me are many and complex. First, the speech signal itself is a natural signal, which cannot be shaped as we wish, and therefore there is no good model for it that can be entered into a computer. The speech signal is characterized by several phenomena. Second, its properties change over time; On top of that, its intensity changes from very low intensities to very strong intensities (or vice versa) in short time intervals; And of course, environmental factors cannot be ignored. Every environment is acoustically different. The complex environment is expressed in a very large collection of reverberations, due to the impact of the sound waves on various objects and of course the walls. It is this large collection that creates the sense of reverberation. When the room is very 'reverberant' or alternatively completely echoless, the human listener experiences this as a feeling of discomfort, although this makes it much easier for the algorithms."

The phenomenon of reflections in a room is usually measured with two numbers. The first is the reverberation time, measured in seconds: the time it takes for a signal, from the moment it leaves the speaker's mouth, to decay away as it continues to reverberate in the room. For the human ear, a decay time of 200-300 milliseconds is reasonable; such conditions are common, for example, in office rooms. When the decay time reaches half a second, we begin to feel discomfort.

In order to neutralize the reverberation, Ganot explains, the algorithm is required to identify the complex acoustic system that connects the point where the speech was produced to the point where it was received. The resulting filter has many coefficients, whose number grows with the reverberation time.

"Any impact of the speech signal on any bone in the room will cause it to be reflected from it and therefore arrive with a certain delay and delay compared to its direct arrival from the original speaker. If we add many such delays, we will get a dense system of delays that fade in increasingly longer times. The second number is the power ratio between the main arrival and the other arrivals. In normal rooms, at a distance of about a meter to a meter and a half between the speaker and the sensor, the intensity of the reflections becomes dominant."

Another problematic phenomenon of acoustic systems is that they change rapidly: when the speaker moves even a few centimeters, we get a completely different room response.

One microphone, noisy environment

"The classic problem, which has been handled with varying degrees of success for almost 40 years, is the attempt to clean signal noise picked up by a single microphone. Let's refer to the example of the microphone of the mobile phone that the driver is trying to dictate a number to dial. Even if we close the windows, the environment will still be quite noisy, and in the background there is a constant noise, or at most a slowly changing noise, coming from the air conditioner and the car's engine. The compromise will always be between noise removal and speech distortion. The more we clean up the noise, the more we will be left with a more metallic sound."

"No problem in this area has been fully resolved, so it is still not possible to delete the word noise from the dictionary, even though there are dramatic improvements," notes Ganot, who made two contributions to the topic: "The first contribution leaves the received signal in the time domain, and the second refers to it in the frequency domain .”

The solution for cleaning noise in the time domain was developed by Ganot in his master's thesis, under the guidance of Prof. Ehud Weinstein and Prof. David Burstein of Tel Aviv University.
"I am trying to give a statistical model to the speech signal and to its change over time. If I knew the particular clean speech signal, I could easily estimate its characteristic parameters. The problem is that I do not know the particular signal that I want to clean (otherwise there would be nothing to clean...); I only know the speech signal accompanied by noise. If someone whispered the exact parameters to me, I could run an optimal filter and clean the speech signal of the noise. The filter I used is called the Kalman filter, named after the Hungarian-American scientist Rudolf Kalman."

The second method of cleaning a signal of noise works in the frequency domain. "The human ear also operates in different frequency ranges: there are sensory cells in the inner ear, each responsible for a different frequency range" (see: Haim Somer, "The Ear and the Voice", "Galileo" 127). "Another fact that we exploit to clean signal noise in the frequency domain is that the human ear is not sensitive to intensities in a linear fashion; it has a sort of logarithmic scale. When the intensities are low, we clearly notice differences between them, and when the intensities are high, we do not notice these differences."

"I convert the speech signal to the frequency plane and want to create a model for it that will distinguish it from the noise. To achieve this model I train the computer. I take a database of many clean speech sentences, which are not related in any way to the speaker I am trying to clean, and extract from this large collection a statistical model that manages to characterize any speech signal. All this before running the algorithm on a specific problem. When approaching a new signal that I am trying to clean, I take the noisy samples and compare them to the model I learned about clean speech, and based on this comparison decide which of the models in the database is the most suitable for the specific signal I have just picked up. After choosing the appropriate model, I know how to use it to clean the speech signal from the noisy signal."

The goal is to implement these algorithms in tiny devices, such as a mobile phone or a hearing aid. This algorithm distorts the sound a little more than the time-domain algorithm, but its computational load is very low, and it is especially suited to hearing aids. This work, too, was carried out jointly with Prof. David Burstein of Tel Aviv University.

The multi-microphone problem

Ganot continues: "Having addressed the problem of receiving a signal with one microphone and filtering out the noise, we decided to exploit the spatial dimension as well; after all, a person has two ears. Moreover, a computerized system has no such limit, and an array of many microphones can be used instead of a single one. By using an array of microphones we gain directionality: whereas a single microphone is equally sensitive in all directions, or at least over a wide angular sector, an array of microphones has a spatial separation capability that we can use to distinguish between the desired signal, coming from a certain direction or point, and a noise signal coming from somewhere else."

"When I speak to a person to my right, it is clear that the right ear will receive the speech signal before the left ear. Therefore, it is possible to use the time differences between the reception of the signal in both ears to estimate where the speaker is. In complex acoustic problems, this can be done using an array of microphones, where the relative difference between the arrival of the signals to the various microphones indicates the direction of the signal."

"We can exploit the fact that the desired signal and the noise will never arrive from the same point in order to pick the conversation we want out of the jumble of voices. The conversation comes from a given place, while the noise comes from an air conditioner, or from a conversation at a nearby table. The spatial information allows us to achieve much better performance. When trying to follow a lecturer who walks to and fro around the hall without a wireless microphone, the changing acoustic environment must be learned. We do this with an array of microphones scattered around the room, feeding the received signals into an algorithm that knows how to preserve the desired signal and suppress all the interfering signals coming from other directions."

"In order to achieve this I use two ideas: the first idea is that instead of trying to network the entire system that connects the speaker to the microphone, which is a complicated system, I network only the relative connection between the reception of the signal in the various microphones. The second idea is to take advantage of the fact that the speech signal varies in time at a high rate as opposed to noise that varies at a low rate. This way I achieve a good separation ability between the two signals."

"Even if the algorithm did not completely solve the problem, at least it improved the quality of the speech dramatically. Recently, we have added several improvements. One, in collaboration with Prof. Israel Cohen from the Technion, enabled a better handling of noises that change over time, such as a passing truck when our car window is open. The second, with Ronen Talmon, a student working in collaboration with Israel Cohen and me, enables treatment in rooms with significantly longer reverberation times. Now Dr. Ganot and his partners can use the sophisticated and unique laboratory in Israel to try any microphone array and any source of noise or speech they request.

A complex problem that we are trying to analyze in the laboratory is the "cocktail party" problem: several people talk at the same time in a room, sometimes while walking about and in the presence of background noise, and we must isolate one conversation from the jumble of conversations. The algorithm's task is to separate the desired speakers and isolate them from the other voices and noises. It has many applications, for example as an aid for the hearing impaired: in this application it is possible to focus on the conversation taking place with the speakers facing the wearer of the hearing aid.

Another interesting problem being addressed in the new lab is dereverberation. Too much reverberation disturbs listeners, and may also degrade automatic speech recognition systems. One of the examples on the laboratory's website demonstrates the suppression of reverberation in speech signals recorded from a distance of 250 centimeters.
This work was carried out in collaboration with Prof. Cohen and Dr. Emanuel Habets, who did his post-doctoral training in Dr. Ganot's laboratory. Dr. Habets is now at Imperial College London.

Another problem is that of acoustic echo: a speaker on one side of a telephone conversation wants to hear his interlocutor at the other end of the line, but also hears his own voice returning from the other end. The Ganot research group, together with a group from the Netherlands (headed by Dr. Piet Sooman, and including the then doctoral student Dr. Habets), developed an algorithm that effectively cancels the echo while simultaneously improving the quality of the signal sent from the remote phone, by reducing noises such as the hum of an air conditioner or a computer fan and by lowering the reverberation level.
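A standard textbook building block for this problem is an adaptive filter that learns the echo path and subtracts the predicted echo, for example normalized LMS (NLMS). The sketch below is such a baseline, not the joint echo/noise/dereverberation algorithm described above; all parameters are illustrative:

```python
import numpy as np

def nlms_echo_cancel(mic, far_end, taps=256, mu=0.5, eps=1e-6):
    w = np.zeros(taps)                      # running estimate of the echo path
    x_buf = np.zeros(taps)                  # most recent far-end samples
    out = np.empty(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1); x_buf[0] = far_end[n]
        echo_hat = w @ x_buf                # predicted echo at the microphone
        e = mic[n] - echo_hat               # residual: near-end speech + noise
        w += mu * e * x_buf / (x_buf @ x_buf + eps)   # NLMS coefficient update
        out[n] = e
    return out

# Toy usage: the far-end signal leaks back through a short random echo path.
rng = np.random.default_rng(2)
far = rng.standard_normal(8000)
path = rng.standard_normal(64) * np.exp(-np.arange(64) / 16)
mic = np.convolve(far, path)[:8000] + 0.01 * rng.standard_normal(8000)
residual = nlms_echo_cancel(mic, far)
```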

The most sophisticated acoustic laboratory in Israel

This year, the most sophisticated acoustic laboratory in Israel began operating at the Bar-Ilan School of Engineering, planned and run under the responsibility of Dr. Sharon Ganot.
The laboratory, established according to Dr. Ganot's design and located in the School of Engineering, looks at first glance like the recording studio of a record company, acoustically isolated from its surroundings. What makes it unique is the ability to control the reverberation level in the room: the ceiling, floor, and walls are made up of panels that can each be set either to reflect or to absorb sound, so that this single room can reproduce a large collection of rooms of different acoustic character.
The equipment allows simultaneous recording from 24 microphones and simultaneous playback of eight signals through loudspeakers.

Glossary

  • A signal is a description of the variation of a physical phenomenon as a function of some variable (usually time). A system converts the signal at its input into another signal at its output. In an acoustic system, the system applies a large number of delays and attenuations to the input signal (the speech as it leaves the mouth) to produce a reverberant output signal (the speech as received by the ear or microphone). Signals and systems are represented using mathematical functions (see the short sketch after this glossary).
  • Fourier (Jean Baptiste Joseph Fourier, 1768-1830) was a French physicist and mathematician. Among his other occupations, he joined Napoleon's campaign of conquest in Egypt and was even appointed to senior administrative positions in the French government there. As part of his research on heat transfer, he claimed in 1822 (without proof) that any periodic function (even a discontinuous one) can be written as an infinite series of trigonometric functions (sines and cosines) whose frequencies are the frequency of the periodic signal and its integer multiples. These frequencies are called harmonics, and the resulting series is called a Fourier series. Later the concept was extended to signals with an infinite period (that is, general non-periodic functions); it can be shown that such signals can be written as a continuum of harmonics, known as the Fourier transform.
  • A wave is a physical phenomenon that depends on both time and location. An example familiar to all of us is the waves of the sea. If we look at a given moment, we see a collection of "hills" and "valleys" stretching out to the coastline. If we float on a buoy (at a fixed point), we rise and fall as a function of time. So sea waves change both in time and in location. It can be shown that the wavelength (the distance between the hills along the position axis) multiplied by the frequency (the rate of rise and fall on the buoy) equals the propagation speed of the wave. Sound is a wave, and its propagation speed is only 342 meters per second (depending on temperature and air pressure); a 342 Hz tone, for example, has a wavelength of exactly one meter. The sound wave propagates through changes in the pressure of the particles of the medium, and therefore sound does not exist in a vacuum.
  • Creation of a speech signal: the speech signal originates in the air exhaled from the lungs. This air makes its way toward the mouth (and sometimes the nose as well). If the vocal cords come into action, the air flow is interrupted intermittently, and instead of a continuous flow we get pulses, that is, a periodic signal. The spacing between the pulses (known as the pitch) determines how high the voice sounds. The oral cavity serves as a variable resonator for the sound wave; the resonance frequencies (called formants) can be controlled by changing the position of the tongue and lips, and they determine the spoken vowel. The sound wave leaves the oral cavity and spreads through the air until it reaches the receiver: a human ear or a microphone, which converts the sound wave into a nerve signal or an electrical signal, respectively.
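As promised in the "signal" entry above, a tiny illustration of the point: an acoustic system made of delays and attenuations is simply a convolution of the input signal with an impulse response (the numbers here are invented):

```python
import numpy as np

x = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # an impulsive input "signal"
h = np.array([1.0, 0.0, 0.5, 0.0, 0.25])  # a "system": direct path plus two weaker, later reflections
y = np.convolve(x, h)                      # the reverberant output signal
print(y)  # the input, repeated at each delay and scaled by each attenuation
```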

Many recorded examples can be heard on the website of Dr. Ganot's laboratory, under the "Audio files demonstration" link.

6 comments

  1. Eran, you're right. My intention was to point out that the brain solves the technological problem presented here. It's a shame that it is still not possible to delve into the intricacies of the brain to check how it does this (as well as many other things).

  2. To comment 3: It's like saying that it's interesting that Rabin was murdered in Rabin Square...
    The names are similar for the simple reason that it is the same problem.

  3. I suggest that before you assign grades to laboratories, you check on what basis the lab is accredited, what its level of accuracy is, and what kind of acoustic information can be extracted from it.
    Producing a variable reverberation time does not require much knowledge or technology; by contrast, creating acoustic fields, mapping and calibrating them, and maintaining their stability is a much harder task, and this, together with many other parameters, is what testifies to a laboratory's level.
    I suggest you visit two acoustic sites located in the city of Or Yehuda, at Optoacoustics Ltd. and Isosound Labs Ltd., which own the most professional facilities in Israel, with certificates to that effect.

  4. It is interesting that what the article calls the cocktail party problem - the problem of focusing the microphones on the desired speaker out of a jumble of speakers and other noises - shares its name with a natural ability of the human brain, known as the cocktail party effect, whereby a person can concentrate on a particular speaker even amid other conversations and noises.
