
IBM makes available to the research community one million encoded facial images to improve facial recognition systems

IBM's research laboratories are making Diversity in Faces available to the global research community - a data set containing coded data on one million facial images • Dr. Aya Sofer, vice president for artificial intelligence technology research at IBM worldwide: "The data set will help in the development of fairer artificial intelligence systems. This is a call to the global research community to contribute to the continuation of research in the field and promote this important issue"

Encoded facial recognition image. Photo: IBM

Artificial intelligence applications today show steadily improving capabilities in complex data processing tasks, for example converting speech to text, translating, or recognizing contexts, objects and images. These applications rely on deep learning and on processing huge amounts of data to build more accurate models - but the strength of these technologies is sometimes also their weakness.

Artificial intelligence systems learn what they are taught: if the learning process (known as "training") is not based on sufficiently diverse data sets, the accuracy and fairness of the insights produced by these systems may be compromised. This challenge is most evident in facial recognition technologies, where it is difficult to build systems that meet expectations of fairness.

The heart of the problem lies not in the technology but in the way the systems are trained. For them to work as expected, and for their results to become more accurate over time, the information used to train them must be diverse and cover as wide a range of human variety as possible. The training data sets must be comprehensive and diverse enough for the technology to learn all the ways in which human faces differ from each other, and to identify those differences accurately in a wide variety of situations. The image data must reflect the distribution and variation of facial features as we see them around the world.

This is why IBM is now harnessing the power of science to develop fairer and more accurate artificial intelligence systems. IBM's research division is now making Diversity in Faces available to the global research community - a data set that includes coded data on one million anonymous images of human faces, selected as the most appropriate representatives of human diversity from a public database of 100 million images (the YFCC100M database). The million images in the set were coded using ten independent coding methods documented in the scientific literature. This is the first data set of its kind available for free use by the global research and development community, and the purpose of making it available to researchers is to advance research on the fairness and accuracy of facial recognition technologies.

The coding of the images in the set included objective measurements of human faces, such as skull structure (total length, nose length, forehead height), as well as more subjective characterizations, such as human assessments of the age and gender of the person photographed.

How do we measure and ensure the required diversity of human faces? We know that faces differ by age, gender and skin tone, and that they also vary within each of these categories. But as previous studies have shown, these variables are only part of the picture and are not sufficient to characterize the entire variety of human faces. Dimensions such as the symmetry of the face, contrast, the position of the face, and the length and width of its various components (eyes, nose, forehead) are also important.
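
To make the question of measuring diversity concrete, here is a minimal Python sketch of one common way to score how evenly a single coded attribute is spread across its categories. It uses normalized Shannon entropy, which is an illustrative choice and not a measure the article attributes to IBM. The function name `shannon_evenness`, the age-group labels and the sample counts are all hypothetical.

```python
import math
from collections import Counter


def shannon_evenness(labels):
    """Return normalized Shannon entropy of the labels, in [0, 1].

    1.0 means every observed category is equally represented;
    values near 0.0 mean one category dominates.
    (Illustrative measure, not taken from the article.)
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    k = len(counts)
    if k < 2:
        # A single category carries no diversity at all.
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Divide by the maximum possible entropy for k categories.
    return entropy / math.log(k)


# Hypothetical age-group codes for two sets of 100 images each.
balanced = ["0-20", "21-40", "41-60", "61+"] * 25
skewed = ["21-40"] * 97 + ["0-20", "41-60", "61+"]

print(f"balanced set: {shannon_evenness(balanced):.3f}")
print(f"skewed set:   {shannon_evenness(skewed):.3f}")
```

The balanced set scores close to 1 and the skewed set scores far lower, even though both contain the same four age groups. Note that this sketch only counts categories that actually appear in the data, so a group that is missing entirely does not lower the score; a fuller treatment would pass in the expected list of categories.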

"IBM is proud to provide access to the new data set with the goal of promoting collective research and contributing to the creation of fairer artificial intelligence systems," said Dr. Aya Sofer, vice president of artificial intelligence technology research at IBM worldwide. "While IBM Research Labs is committed to continuing to research fairer facial recognition systems, we do not believe we can do it alone. Making the data set available to the global research community is a call to others to contribute to the continuation of research in the field and promote this important issue on the agenda of the scientific world."

For an evaluation site intended for the scientific community