The information density in DNA is about a hundred million times greater than that of digital storage. This means that for every unit of volume that currently holds 1 megabyte, we could potentially store up to 100 terabytes.

Researchers at the Taub Faculty of Computer Science have developed an AI-based method that speeds up information retrieval from DNA-based databases by three orders of magnitude and significantly improves accuracy. The research team included doctoral student Omer Tzabri, Dr. Daniela Bar-Lev, Dr. Itay Or, Prof. Eitan Yaakovi, and Prof. Tovi Etzion.
DNA information storage is a new and promising field of research, the main focus of which is the use of DNA as a platform for storing information. DNA has significant advantages as an information storage system, including the preservation of information for enormous periods of time; a dramatic reduction in energy and economic costs and environmental damage; and a leap in information density, which means a dramatic reduction in storage volume.
Regarding the "shelf life" of information: in 2013, researchers from Denmark succeeded in extracting DNA from the bone of a horse that lived 700,000 years ago. In 2021, an international team succeeded in extracting DNA from mammoths that lived more than a million years ago. For comparison, the lifespan of a magnetic disk, such as those used in server farms, is measured in years or at most a few decades. The expected leap in long-term storage is therefore clear.
Regarding cost and energy: the "cloud", which provides us with most computing services, is based on server farms that currently account for about 3% of global electricity consumption and about 2% of total carbon emissions. As the amount of information grows exponentially, the environmental damage caused by continued use of existing technologies can be expected to keep growing.
Regarding information density: the information density in DNA is up to about a hundred million times greater than that of digital storage. This means that, potentially, every unit of volume that currently holds 1 megabyte could store up to 100 terabytes.
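The arithmetic behind this figure can be checked in a few lines. The sketch below assumes decimal units (1 megabyte = 10^6 bytes, 1 terabyte = 10^12 bytes) and takes the "hundred million" ratio from the text; the function name is illustrative only.

```python
# Back-of-the-envelope check of the density claim (decimal SI units assumed).
MEGABYTE = 10**6        # bytes
TERABYTE = 10**12       # bytes
DENSITY_RATIO = 10**8   # "about a hundred million times", as quoted in the article


def dna_capacity_bytes(conventional_bytes: int, ratio: int = DENSITY_RATIO) -> int:
    """Bytes DNA could hold in the volume that conventionally holds `conventional_bytes`."""
    return conventional_bytes * ratio


capacity = dna_capacity_bytes(MEGABYTE)
print(capacity // TERABYTE, "TB")  # prints: 100 TB
```

A ratio of 10^8 is eight orders of magnitude, which is the step from megabytes to hundreds of terabytes.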
DNA is a molecule that consists of a sequence of organic compounds called nucleotides. These are divided into four types, denoted by the letters A, C, G, and T. Accordingly, while in traditional computing information is represented by only two digits, 0 and 1, storage in DNA is based on sequences of four letters, so each position in the sequence can carry two bits instead of one.
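To make the four-letter idea concrete, here is a minimal sketch that maps every two bits to one nucleotide and back. The specific mapping (00 to A, 01 to C, 10 to G, 11 to T) is an assumption chosen for illustration; it is not the encoding used in the study, and practical DNA storage codes add further constraints and error correction.

```python
# Illustrative 2-bits-per-nucleotide mapping (an assumption, not the study's code).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): read two bits per base and regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"DNA")
print(strand)          # prints: CACACATGCAAC
print(decode(strand))  # prints: b'DNA'
```

Three bytes become twelve nucleotides, since each nucleotide stands for two bits.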
To write (store) the information in this technology, DNA synthesis is required – the creation of DNA molecules according to the sequence that encodes the information; and to read the information, DNA sequencing is required.
The development of DNA storage technology is accompanied by many technological challenges. First, both synthesis and sequencing are slow, noisy processes that introduce errors into the stored information, mainly insertions, deletions, and substitutions. In addition, due to the limitations of the synthesis process, many copies of each of the DNA molecules encoding the information are created. These are stored together, in no particular order, in a storage medium that constitutes the memory system. Sequencing then yields many noisy copies of these molecules: most of them contain errors, and some are lost completely. The current study presents a comprehensive computational solution for retrieving information and correcting errors in these complex systems, using innovative algorithms and methods for encoding and retrieving information. Through experiments, the researchers show that their solution shortens the time it takes to retrieve and read information from days to 10 minutes.
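The kind of noise described here can be imitated with a toy model. The sketch below is not the Technion simulator; the function names, the error probabilities, and the independent per-base error model are all assumptions made for illustration. It produces several noisy copies of each strand, drops some strands entirely, and returns the reads in shuffled order.

```python
import random

BASES = "ACGT"


def noisy_copy(strand, p_sub=0.01, p_ins=0.01, p_del=0.01, rng=random):
    """Return one copy of `strand` with random deletions, insertions, substitutions."""
    out = []
    for base in strand:
        r = rng.random()
        if r < p_del:
            continue                           # deletion: the base is dropped
        if r < p_del + p_ins:
            out.append(rng.choice(BASES))      # insertion: extra random base first
            out.append(base)
        elif r < p_del + p_ins + p_sub:
            others = [b for b in BASES if b != base]
            out.append(rng.choice(others))     # substitution: a different base
        else:
            out.append(base)                   # base copied correctly
    return "".join(out)


def sequence_pool(strands, copies_per_strand=5, p_loss=0.1, seed=0, **error_rates):
    """Simulate reading an unordered pool: noisy copies, with some strands lost."""
    rng = random.Random(seed)
    reads = []
    for strand in strands:
        if rng.random() < p_loss:
            continue                           # this strand disappears completely
        for _ in range(copies_per_strand):
            reads.append(noisy_copy(strand, rng=rng, **error_rates))
    rng.shuffle(reads)                         # the pool has no inherent order
    return reads


print(sequence_pool(["ACGTACGTAC", "TTGGCCAATT"], p_sub=0.05, p_ins=0.05, p_del=0.05))
```

Recovering the original strands from such a pool means grouping reads that came from the same strand and then reconstructing each strand from its imperfect copies.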

The method developed by the Technion researchers, DNAformer, consists of an AI model trained on simulated data (created using a simulator developed at the Technion) so that it can reconstruct DNA sequences based on their incorrect copies. In addition, the method also contains a dedicated error-correction code unique to DNA, which preserves the information in an error-resistant manner. On top of all this, an additional safety mechanism has been developed, which can identify particularly noisy DNA sequences and effectively apply powerful algorithmic tools to them. At the end of the process, everything is converted back into digital information.
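For intuition about reconstructing a sequence from its incorrect copies, the sketch below uses a simple per-position majority vote. This is a deliberately naive stand-in, not DNAformer: it assumes the reads have already been grouped by source strand and contain substitutions only, so they all have the same length. The agreement threshold that flags a group for heavier processing loosely mirrors the idea of a safety mechanism for particularly noisy sequences; the names `consensus`, `reconstruct`, and `min_agreement` are hypothetical.

```python
from collections import Counter


def consensus(reads):
    """Per-position majority vote over equal-length reads of the same strand.

    Returns the voted strand and the lowest per-position agreement (0 to 1).
    """
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be non-empty and of equal length")
    bases, agreement = [], []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        bases.append(base)
        agreement.append(count / len(reads))
    return "".join(bases), min(agreement)


def reconstruct(reads, min_agreement=0.6):
    """Vote on a strand and flag it if any position is too contested."""
    strand, confidence = consensus(reads)
    needs_fallback = confidence < min_agreement
    return strand, needs_fallback


reads = ["ACGT", "ACGA", "ACGT", "TCGT"]
print(reconstruct(reads))  # prints: ('ACGT', False)
```

Majority voting breaks down once insertions and deletions shift the positions of the bases, which is one reason more capable reconstruction methods are needed in practice.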
The new method presented by the researchers allows reading 100 megabytes of information 3,200 times faster than the most accurate method that existed to date, without loss of accuracy. Compared to other methods that were considered fast until this development, the new method improves accuracy by up to 40%, in addition to a significant improvement in run time. These capabilities were demonstrated on 3.1 megabytes of data, which included a color still image, a 24-second audio clip featuring astronaut Neil Armstrong speaking on the moon, and written text on the virtues of DNA as a promising storage method.
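As a rough consistency check of the reported figures, the sketch below assumes that the 10-minute retrieval time and the 3,200-fold speed-up refer to the same benchmark. The article does not state this explicitly, so the result is only indicative.

```python
import math

SPEEDUP = 3200      # "3,200 times faster", from the article
NEW_MINUTES = 10    # "10 minutes", from the article

# Assumption: both figures describe the same retrieval task.
old_minutes = NEW_MINUTES * SPEEDUP
old_days = old_minutes / (60 * 24)

print(f"{old_days:.1f} days")                             # prints: 22.2 days
print(f"{math.log10(SPEEDUP):.1f} orders of magnitude")   # prints: 3.5 orders of magnitude
```

Under that assumption, the earlier method would need roughly three weeks, which is in line with the description of a reduction "from days to 10 minutes" and with a gain of about three orders of magnitude.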
The researchers intend to develop versions based on DNAformer that are tailored to different needs. They also explain that the technology they developed is scalable and adaptive, meaning that it can be adapted to very large amounts of data to meet market needs and future synthesis and sequencing technologies.
The research was supported by the European Research Council (ERC grant), by the European Innovation Council (EIC grant, DiDAX project), and by the Israel Science Foundation (ISF).
The article was published in Nature Machine Intelligence.
2 comments
(Small, **eight** orders of magnitude)
They nicely compare the density of digital storage to that of DNA. A similar comparison of read and write access times between the two storage media is lacking.
Let's say 10 minutes for DNA. How much is that for a flash drive? What's the ratio?