
Computer vision is an important component of the cognitive age

So says Tal Drori, director of the Multimedia Analytics department at IBM's research laboratory in Haifa, in an interview with the Hayadan site on the occasion of a colloquium on computer vision hosted by the laboratory.

Tal Drori, director of the Multimedia Analytics lab at the IBM research lab in Haifa. PR photo

About two weeks ago, the Cognitive Computing Colloquium, a conference dealing with computer vision, was held at the IBM research center in Haifa. One of the sessions was dedicated to multimedia analytics.

In an interview with the Hayadan website, Tal Drori, director of the Multimedia Analytics department, explains that this is a boom period for the field of computer vision.
"In my lab we work on a range of analytics applications. In the field of computer vision, we are engaged in cognitive medical imaging (see "The radiologist assistant who never gets tired"): medical imaging that takes into account both elements of the medical images and the wider data, such as medical records, recent tests and medical studies, and that can talk to the radiologist and present results and insights in the radiologist's own language.
"Another area we are researching in the lab is object recognition and augmented reality. Augmented reality connects directly to cognitive computing. If cognitive computing makes it possible to upgrade a person's interaction with the environment, augmented reality is the means: taking reality as the person sees it and enriching it. One example of such a project is a mobile application that photographs a product shelf in a supermarket and enriches the view with nutritional values, coupons or anything else that requires the computer to identify the product and supply data related to it. Augmented reality can also bring benefits in a field such as maintenance: a technician who comes to service a machine and needs both hands free wears smart glasses designed for industrial environments, and through them sees both the machine and all the data on the component, such as its characteristics and the date of its last service. In addition, the system can send a query to Watson, which will explain from past experience how to handle this type of problem.
"Another group is investigating biometric authentication. Take, for example, logging into a bank account from a mobile device. The system we developed turns on the video camera and asks the person to speak to the camera and say a random number, to make sure that they are not simply holding a photo in front of the camera. This is combined with face recognition, voice recognition and sometimes also a request to sign on the screen. Together, these measures make it possible to provide a very high level of authentication on mobile devices.
"The issue of biometric authentication is relevant to many uses, since most of our interaction is through the phone rather than a desktop computer. Authentication is important when entering sensitive systems. It can also matter during a physical visit to the bank: when a robot greets us at the branch, apart from just saying hello it will also be able to verify that we are indeed the account holder. Smart cars, too, will need authentication by face or voice recognition, so that only those authorized to drive the car can do so."
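
The login flow Drori describes, a fresh random number spoken to the camera combined with face and voice checks, can be illustrated with a minimal Python sketch. This is not IBM's implementation: the function names, the four-digit challenge, the 0.8 threshold and the assumption that separate face and voice recognizers each return a similarity score between 0 and 1 are all hypothetical and chosen only for illustration.

```python
import secrets


def make_challenge(digits: int = 4) -> str:
    """Return a fresh random number (as a string) for the user to say aloud.

    A new challenge per login attempt means a photo or an old recording
    cannot know the expected answer in advance.
    """
    return "".join(secrets.choice("0123456789") for _ in range(digits))


def authenticate(challenge: str, spoken_digits: str,
                 face_score: float, voice_score: float,
                 threshold: float = 0.8) -> bool:
    """Combine the liveness check with two biometric scores.

    spoken_digits is assumed to come from a speech recognizer, and the
    scores from face and voice matchers (hypothetical, values in 0..1).
    """
    # Liveness: the spoken number must match the challenge just issued.
    if spoken_digits != challenge:
        return False
    # Identity: every biometric factor must clear the threshold.
    return face_score >= threshold and voice_score >= threshold


if __name__ == "__main__":
    challenge = make_challenge()
    print("Please say:", challenge)
    print(authenticate(challenge, challenge, face_score=0.93, voice_score=0.88))
```

Requiring every factor to pass is the strictest way to combine them; a real system might instead weigh the scores or add the on-screen signature as a further factor.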

Gal Ashur from IBM's research laboratory in Haifa presents the system for analyzing video content at a conference held by the company in Las Vegas. Photo: Avi Blizovsky

The computer suddenly understands by itself what is in the video
Gal Ashur, director of the video and GIS technology group in the Multimedia Analytics department that Drori manages, presented the video content analysis system at a conference held by IBM in Las Vegas a week earlier. Computer analysis of the content is designed to enrich the information about it and make it easier to search. Ashur's group uses Docker containers, a lightweight form of operating-system virtualization, to go through the video and extract its contents.

"We basically created a solution to which we upload video from all kinds of sources, including a smartphone application we wrote that records video while driving, along with GPS data on where it was recorded," said Ashur. "Other sources are video from car dashboard cameras, GoPro cameras and more."
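
One way to picture how video recorded alongside GPS data can be used is to look up the location for any moment in the recording. The Python sketch below is an illustration only, not the group's code: it assumes a hypothetical track of (time in seconds, latitude, longitude) samples sorted by time, and estimates the position at a given video timestamp by linear interpolation between the two nearest samples.

```python
from bisect import bisect_right


def position_at(track, t):
    """Estimate (lat, lon) at video time t, in seconds.

    track is a list of (time_s, lat, lon) tuples sorted by time
    (a hypothetical format). Times outside the track are clamped
    to the first or last sample.
    """
    times = [sample[0] for sample in track]
    i = bisect_right(times, t)
    if i == 0:
        return track[0][1:]
    if i == len(track):
        return track[-1][1:]
    t0, lat0, lon0 = track[i - 1]
    t1, lat1, lon1 = track[i]
    fraction = (t - t0) / (t1 - t0)
    return (lat0 + fraction * (lat1 - lat0),
            lon0 + fraction * (lon1 - lon0))


if __name__ == "__main__":
    # Made-up sample track for illustration.
    track = [(0.0, 32.00, 35.00), (10.0, 32.10, 35.20)]
    print(position_at(track, 5.0))
```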


"The most basic feature of the video analysis system is recognizing text (for example, road signs), in other words, performing optical character recognition (OCR). If the video is a recording of organizational training, the system can read the text in the presentations and indicate the time at which it appears, so that it is easy to reach that point in the lecture."
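
The idea of recording the time at which each piece of text appears, so that a viewer can jump straight to it, can be sketched as a small search index. The Python example below assumes that an OCR step has already produced (timestamp in seconds, recognized text) pairs; that format and the function names are hypothetical and do not describe the actual system.

```python
from collections import defaultdict


def build_text_index(ocr_results):
    """Map each lowercased word to the timestamps at which it was seen.

    ocr_results is an iterable of (timestamp_seconds, recognized_text)
    pairs, assumed to come from an OCR pass over the video frames.
    """
    index = defaultdict(list)
    for timestamp, text in ocr_results:
        for word in text.lower().split():
            index[word].append(timestamp)
    return index


def find_first(index, word):
    """Return the earliest timestamp at which word appears, or None."""
    times = index.get(word.lower())
    return min(times) if times else None


if __name__ == "__main__":
    # Made-up OCR output for a recorded training session.
    results = [
        (12.0, "Quarterly Sales"),
        (95.5, "Safety Training"),
        (140.0, "Sales Forecast"),
    ]
    index = build_text_index(results)
    print(find_first(index, "training"))
```

The same index could hold text read from road signs in driving footage; only the source of the OCR results changes.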

In conclusion, Ashur said that "in the future it will also be possible to add speech recognition, or face recognition and maybe even record our lives and attach metadata to everything we see so that we can search in the future and watch relevant events again. For this purpose, the computer that processes the data requires analytical ability."

The amount of video is very large, and there is huge value in having the computer understand video content: finding the right video, and the right segment within it, is very important for many applications.
"The future is cognitive in computer vision too: systems will help a person understand better and act better. Seeing is one of a person's most natural forms of interaction. If we can help a person through our understanding of what they see, and understand the context, that is a cognitive system," Drori summarizes.

