
Do artificial intelligence systems really have their own secret language?

Researchers in the US made the intriguing claim that the DALL-E 2 model may have invented its own secret language to understand descriptions of objects.

By: Aaron J. Snoswell, Postdoctoral Research Fellow, Computational Law and AI Accountability, Queensland University of Technology

Artificial intelligence draws. Photo: depositphotos.com

A new generation of artificial intelligence (AI) models can produce "creative" images on demand from text prompts. Software like Imagen, Midjourney and DALL-E 2 is starting to change the way creative content is made, with implications for copyright and intellectual property.

While the output of these models is often impressive, it is difficult to know exactly how they produce their results (a problem that plagues the entire field of deep learning and still has no full answer; several companies, including IBM, claim to be developing systems that will provide explanations, but none has yet reached the market). Last week, researchers in the US made the intriguing claim that the DALL-E 2 model may have invented its own secret language to understand descriptions of objects.

The researchers fed DALL-E 2 requests to generate images containing text captions, then fed the resulting (gibberish) captions back into the system. They concluded that DALL-E 2 thinks Vicootes means "vegetables", while Wa ch zod rea refers to "sea creatures that a whale might eat". These claims are fascinating, and if true they could have important security and interpretability implications for this type of large AI model. So what exactly is going on?

Does DALL-E 2 have a hidden language?

DALL-E 2 apparently has no "secret language". It might be more accurate to say it has a vocabulary of its own, but we cannot know even that for sure.

First of all, at this point it is very difficult to verify any claims about DALL-E 2 and other large AI models, because only a handful of researchers and creative professionals have access to them. Any images shared publicly (on Twitter, for example) should be taken with a fairly large grain of salt, because they were "cherry-picked" by a human from among many output images created by the AI.


Even those with access can use these models only in limited ways. For example, DALL-E 2 users can create or modify images, but cannot (yet) interact with the AI system more deeply, for instance by changing the code behind the scenes. This means that "explainable AI" methods for understanding how these systems work cannot be applied, and systematically investigating their behavior is challenging.

So, what's up?

One possibility is that the "gibberish" phrases are related to words from languages other than English. For example, Apoploe, which seems to conjure images of birds, is similar to the Latin Apodidae, the binomial name of a family of bird species. This seems like a plausible explanation: DALL-E 2 was trained on a very wide variety of data taken from the internet, which included many non-English words.

Similar things have happened before: large natural-language AI models have accidentally learned to write computer code without deliberate training.

Is it all about tokens?

One point that supports this theory is the fact that AI language models do not read text the way humans do. Instead, they break the input text into "tokens" before processing it.

Different tokenization approaches produce different results. Treating each word as a token seems like an intuitive approach, but it causes trouble when identical tokens carry different meanings (the word "match", for example, means different things in a game of tennis and in lighting a fire).

On the other hand, treating each character as a token produces a smaller number of possible tokens, but each one conveys much less meaningful information.
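To make the trade-off concrete, here is a toy Python sketch. The sentence and both strategies are illustrative assumptions only, not DALL-E 2's actual pipeline:

```python
# Toy illustration of two naive tokenization strategies.
text = "they watched the match"

# Word-level: intuitive, but the token "match" is identical
# whether the sentence is about tennis or lighting a fire.
word_tokens = text.split()
print(word_tokens)   # ['they', 'watched', 'the', 'match']

# Character-level: a much smaller set of possible tokens,
# but each token carries far less meaning on its own.
char_tokens = list(text)
print(char_tokens)   # ['t', 'h', 'e', 'y', ' ', 'w', ...]
```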

DALL-E 2 (and other models) uses an intermediate approach called byte-pair encoding (BPE). Examining the BPE representations of some of the gibberish words suggests that this could be an important factor in understanding the "secret language".
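One way to see what the model actually receives is to run the gibberish phrases through a CLIP-style BPE tokenizer. The sketch below uses Hugging Face's CLIPTokenizer as a stand-in; whether this exactly matches DALL-E 2's internal tokenizer is an assumption:

```python
# Inspect the BPE subword pieces for the reported gibberish phrases.
# DALL-E 2's text encoder is CLIP-based, but treating this tokenizer
# as equivalent to its internal one is an assumption of this sketch.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

for phrase in ["Apoploe vuuasrcapfeaiovash", "Wa ch zod rea"]:
    # tokenize() returns the subword pieces the model actually sees.
    print(phrase, "->", tokenizer.tokenize(phrase))
```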

Not the whole picture

The "secret language" could also just be an example of the "garbage in, garbage out" principle. DALL-E 2 cannot say "I don't know what you're talking about", so it will always produce some kind of image from the given input text.

Either way, none of these possibilities is a complete explanation of what is going on. For example, removing individual characters from gibberish words appears to corrupt the generated images in very specific ways. And individual gibberish words do not necessarily combine to produce coherent composite images (as they would if there really were a secret "language" under the hood).
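Although outsiders cannot change the model's code, this kind of behavior can still be probed through the prompt interface. Below is a hedged sketch of such a character-removal experiment, assuming access to OpenAI's image-generation API; the probing procedure is our hypothetical reconstruction, not the researchers' published code:

```python
# Generate an image for the full gibberish phrase, then one for each
# variant with a single character removed, and compare the results.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
base = "Apoploe vuuasrcapfeaiovash"

variants = [base] + [base[:i] + base[i + 1:] for i in range(len(base))]
for prompt in variants:
    result = client.images.generate(model="dall-e-2", prompt=prompt,
                                    n=1, size="256x256")
    print(f"{prompt!r} -> {result.data[0].url}")
```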

Why is it important?

Beyond intellectual curiosity, you may be wondering whether all of this really matters.

The answer is yes. DALL-E 2's "secret language" is an example of an "adversarial attack" against a machine-learning system: a way to break the system's intended behavior by deliberately choosing inputs the AI does not handle well.

One reason for carrying out such attacks is that they challenge our confidence in the model. If the AI interprets gibberish in unintended ways, it may also interpret meaningful words in unintended ways. This also raises security concerns: DALL-E 2 filters input text to prevent users from producing harmful or offensive content, but a "secret language" of gibberish may allow users to bypass these filters.
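To see why such a bypass is plausible, consider a hypothetical blocklist filter. The blocklist, prompts and filter logic below are invented for illustration; OpenAI's actual filter is not public:

```python
# A naive word-blocklist filter only sees the surface text, so
# gibberish the model maps to a blocked concept sails through.
BLOCKLIST = {"whale", "attack"}

def naive_filter(prompt: str) -> bool:
    """Return True if the prompt passes a word-blocklist check."""
    return not any(word in BLOCKLIST for word in prompt.lower().split())

print(naive_filter("a whale attack at sea"))      # False: blocked
print(naive_filter("wa ch zod rea in the ocean")) # True: passes
```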

A recent study discovered "trigger phrases" for some AI models: short nonsense phrases, such as "zoning tap fiennes", that can reliably cause the models to produce racist, harmful or biased content. This research is part of the ongoing effort to understand and control how complex deep-learning systems learn from data.

Finally, phenomena such as DALL-E 2's "secret language" raise interpretability concerns. We want these models to behave as a human expects, but seeing structured output in response to gibberish confounds our expectations.

Shedding light on existing concerns

You probably remember the uproar in 2017 around some Facebook chatbots that "invented their own language". The current situation is similar in that the results are worrying - but not in the "Skynet is coming to take over the world" sense.

Instead, the "secret language" of DALL-E 2 highlights existing concerns about the robustness, security and interpretability of deep-learning systems.

For the original article in The Conversation
