Searching for words in a video soundtrack, and recognizing bank callers by voice

AVIOS Israel holds its sixth annual speech technology event today. We interviewed two of the presenters about notable developments to be shown there: a voice search engine and a biometric speaker recognition system.

AVIOS Israel association logo
The annual conference of AVIOS Israel (Applied Voice Input/Output Society), which brings together companies working on speech recognition and speech output for computer systems, will be held today in RG. Ahead of the conference, I interviewed the managers of two companies that have developed speech recognition technologies for the Daily Mail newsletter. We bring both articles to the readers of the knowledge site as well.

Search for a word in a video

At the AVIOS conference, NSC will present a technology it has brought from the worlds of espionage and call centers to all internet users: the ability to find a spoken word within a sound file or a video soundtrack and jump directly to the relevant section, explains Dr. Ami Moyal, the company's CEO.

Dr. Ami Moyal, CEO of NSC, what will you present at the conference?

"NSC (short for Natural Speech Communication) will present at the AVIOS conference its search engine for sound files, www.snipp.tv. The user types search words, and the engine looks for them in sound files or in the soundtracks of previously indexed video files, then displays the files in which the search word was spoken. The Internet is flooded with video content, but until now the only way to search it was through the accompanying text, that is, whatever the person who uploaded the file chose to write. Having received the search results, the user then had to watch the videos in their entirety. We treat the sound file as if it were a text file and can bring the user to the point where the typed word was spoken, even in long videos, so the user can decide whether the content is relevant."

How did you arrive at such an innovative development, which seems to be borrowed from James Bond movies?

"I have been in the field of speech recognition for almost 20 years, and for me, extending speech recognition technology to other fields is natural. I definitely see the use of speech recognition expanding beyond human-machine communication into other areas, especially through its integration into the world of search. Since our establishment, we have been developing a speech recognition engine. What sets us apart is that we implement the engine on dedicated blade servers. This engine can handle large amounts of information, both in real time and offline, in a wide variety of languages and with very high recognition performance.

In recent years we have focused mainly on keyword spotting, that is, the ability to find a particular word within a body of recorded speech. Naturally, the users of such methods include certain organizations in the security market that record conversations in large quantities and want to analyze them automatically to find the conversations that deserve attention. We have managed to achieve good performance even with noise or other channel distortions, while performing recognition in the particular languages that the security market requires. The second market that uses this is the call center industry, where organizations record conversations and seek to extract business intelligence from them. From there we arrived at the development that is our highlight today: the search engine for multimedia content on the Internet."

How does the system work?

"Because the Internet is flooded with video content, the Snipp.tv search engine lets the user type a word just as in any other search engine. Every file goes through a preliminary indexing process on our side, which is how we can display search results in less than a second, even when the word was spoken in dozens or hundreds of video files. The system can pinpoint the place within the video where the word was spoken. In this way we spare the user the need to listen to or watch entire files and make the search very fast. The site is still in beta, and we strongly believe in our approach and solution. We have uploaded to the site video files from significant suppliers, including Reuters, Fox and Tribune."

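The indexing idea described in this answer can be illustrated with a minimal sketch. It assumes a speech recognizer has already produced time-stamped words for each file; the file names, the transcript data and the functions `build_index` and `search` are hypothetical and are not taken from NSC's system, whose internals the interview does not describe.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    file_id: str    # which indexed file the word was spoken in
    seconds: float  # offset into the soundtrack where it was spoken


def build_index(transcripts):
    """Map each recognized word to every place it was spoken.

    `transcripts` maps a file id to a list of (word, seconds) pairs.
    Producing those pairs is the speech recognizer's job and is
    assumed to have happened already.
    """
    index = defaultdict(list)
    for file_id, words in transcripts.items():
        for word, seconds in words:
            index[word.lower()].append(Hit(file_id, seconds))
    return index


def search(index, query):
    """Return the hits for a typed word, ordered by file and then by time."""
    return sorted(index.get(query.lower(), []))


# Hypothetical recognizer output for two short clips.
transcripts = {
    "news_clip_a": [("markets", 3.2), ("election", 41.7)],
    "news_clip_b": [("election", 12.0), ("weather", 95.4)],
}

index = build_index(transcripts)
hits = search(index, "Election")
for hit in hits:
    print(f"{hit.file_id}: jump to {hit.seconds:.1f}s")
```

Because the expensive recognition step runs once, at indexing time, a query becomes a dictionary lookup, which is one plausible reason results can come back in well under a second.
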
Did you beat Google to it?

"In launching a multimedia search engine based on speech recognition technology, we did indeed launch a website before Google. But everyone in the industry knows that multimedia search needs to be put in order, so I assume all the search engines are working on it. Our advantage is that the technology already works in other areas, which is why we have already launched the website. The site has been live for a few weeks, bloggers in various places have started commenting on it, and the responses are positive and encouraging. Some have written that our site addresses not only the flood of video but also real search within the video. Visits to the website are increasing."

What is the next development?

"Broadly speaking, our solution is suitable for searching multimedia content in any market where such content exists. One example is content producers, who can receive search services or automatic tag generation services for their content from us. Another is the corporate market, which will expand its content databases from text to multimedia and will certainly need to index and search that content.

From a global market perspective, voice search appears to be gaining momentum, meaning a voice interface in which the user says a word and the system searches for it in its database."

"Biometric voice recognition will become an integral part of the risk management system"

Says Almog Ali-Raz, CEO of PerSay, which will present at AVIOS its systems serving banks, telecom companies, healthcare providers and security organizations

Almog Ali-Raz, CEO of PerSay, tell us about the company.

"PerSay develops, manufactures and markets systems for biometric speaker identification that can produce voice signatures of customers and employees. These voice signatures are then used to identify people when they access service centers and perform sensitive operations. The company started a few years ago as a spin-off of Verint, which developed the technology for security organizations. Today it provides the technology to customers in banking, telecom and healthcare, to large organizations that use it in applications such as password resets, and to security and government organizations."

What are the challenges facing the development of the system?

"The challenges are first of all algorithmic. We have to develop software that can process the user's voice, find its unique characteristics and distinguish it from other voices. The second challenge is to build a system that takes an algorithm that works in the laboratory and implements it in a complex IT environment. The stakeholders in deploying a product like ours in a bank's customer service center include information security, customer service, operations, IT and telephony personnel, and usually system integrators as well, because the system has to be integrated into an existing technological environment. The secret of the company's success is that it excels on both counts: the identification accuracy of its systems compared with competitors, mainly international ones, and the ease of deploying them."

What are the advantages of the system for the organizations that use it?

"Our systems give organizations a great deal of added value. First, they raise the level of security by adding a biometric layer, which enables multi-factor authentication. Second, they improve the customer experience, because customers do not have to answer questions from service representatives or remember complex passwords. On top of this, the systems make organizations more efficient by shortening call times at service centers and enabling the automation of sensitive manual processes such as password resets. One advantage of speaker identification technology is that it can be applied across all the channels through which customers contact an organization: it can identify people calling service centers, performing actions on the Internet or using mobile devices. Another distinguishing feature of our systems is that they do not depend on language or accent."

Is the system safe from impersonators?

"Yes. Beyond verifying identity, the company can also locate imposters in real time by comparing callers against a database of pre-recorded voice signatures. For example, if someone has defrauded a bank through its service center and managed to withdraw money from accounts, and his conversations with the bank were recorded, the recordings can be used to build a voice signature that will flag him the next time he calls."

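The watchlist comparison described in this answer can be sketched roughly as follows. The sketch assumes that a voice signature is a fixed-length numeric vector and that two signatures are compared by cosine similarity against a threshold. Both are illustrative assumptions, as are the case ids, the numbers and the function names; the interview does not say how PerSay represents or compares signatures.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_known_imposter(call_signature, watchlist, threshold=0.9):
    """Return the id of the best watchlist match, or None.

    `watchlist` maps a fraud-case id to a stored voice signature.
    The threshold of 0.9 is an arbitrary illustrative value; a real
    system would tune it against false-accept and false-reject rates.
    """
    best_id, best_score = None, threshold
    for case_id, signature in watchlist.items():
        score = cosine_similarity(call_signature, signature)
        if score >= best_score:
            best_id, best_score = case_id, score
    return best_id


# Hypothetical signatures built from recordings of earlier fraudulent calls.
watchlist = {
    "case_001": [0.9, 0.1, 0.3],
    "case_002": [0.1, 0.8, 0.5],
}

match = find_known_imposter([0.88, 0.12, 0.31], watchlist)  # close to case_001
no_match = find_known_imposter([0.0, 0.0, 1.0], watchlist)  # resembles neither
print(match, no_match)
```

Verification of a claimed identity would compare the caller against one enrolled signature, whereas the watchlist search above compares the caller against every stored signature, which is why it can flag someone who gives a false name.
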
Who are your customers?

"The company's clients include some of the world's leading banks, telecom companies, security organizations and more, but the emphasis is mainly on banking. We have developed a dedicated set of products for banks that allows identification both with a voice password and during a natural conversation, for example your conversation with a service representative at the bank. We lead the field worldwide in the number and size of installations. In the telecom sector we can mention Bell Canada, for example: over 750 of its customers chose to identify themselves using a voice password, and in the first year we recorded millions of identifications."

What is the future of biometric speech recognition technology?

"In the future, we see the issue of biometric speaker identification becoming an integral part of risk management in customer access to organizations. Each of us will have a signature or a set of voice signatures that will enable optimal privacy protection and efficient access to remote applications and services."

The articles were first published in the Daily Mail newsletter of the People and Computers group (The People)

6 comments

  1. This will be effective for indexing but not really for passwords since anyone can record you on countless occasions and then sting using the recording

  2. Everyone who has OFFICE 2007 has the ONENOTE software there, which, among other things, allows you to record directly from the microphone. There is a component called Microsoft Search 4 that knows how to analyze (not in real time) the file and convert it to text.

  3. I think it will take time for the companies in the field to establish themselves in the market and improve their products, and only then will they be able to move to the private market and provide solutions for the home user.

    On the other hand, it is good to know that Israel has a foothold in this field.

  4. With all due respect, and there is a great deal of respect and appreciation, we still do not see such software being integrated for the home user. The technology may be here, but PCs are still too weak to support it. Blades like these may be needed to make this magic work, but the bulk of the people who will need to consume the product sit behind a PC, and therefore the pace of development, as it seems to me as an end user, is not satisfactory. I remember that when Windows XP came out, I downloaded a heavy piece of software called Dragon (if I'm not mistaken) to try. For a whole week I trained it to recognize my voice by reading texts of different difficulty levels. Nothing helped. The software slowed down the computer, suddenly opened applications without anyone uttering a sound and recognized almost nothing. In another attempt this year, I saw that nothing had changed.

    Hope the engineers will work overtime and already bring this great news of using voice as a communication tool with a computer. Just like Star Trek.

    Greetings friends,
    Ami Bachar
