
Israeli researchers report progress in storing information in DNA: all of YouTube in one spoonful

Researchers at the Technion and the Herzliya Interdisciplinary Center have demonstrated a significant improvement in the efficiency of the process required to store digital information in DNA. In an article published in the journal Nature Biotechnology, the group demonstrated the storage of information at a density equivalent to more than 10 petabytes (10 million gigabytes) in a single gram of DNA, while significantly optimizing the writing process.

Biological computing. Illustration: shutterstock

To illustrate, this density makes it theoretically possible to store all the information saved on YouTube in the volume of a teaspoon.

The research was led by research student Leon Inabi of the Faculty of Computer Science at the Technion, under the supervision of Prof. Zohar Yakhini of the same faculty and of the Efi Arazi School of Computer Science at the Herzliya Interdisciplinary Center. It was conducted in collaboration with the laboratory of Prof. Roi Amit of the Faculty of Biotechnology and Food Engineering at the Technion.

The amount of digital information has grown at an enormous pace since IBM invented the hard disk in the 1950s. Storing this information has become a major challenge, not only technologically but also economically and environmentally: today's server farms, the information warehouses that serve us all, are responsible for about 2% of global carbon emissions (a rate similar to the cumulative emissions of all the airplanes in the world) and about 3% of the world's electricity consumption (more than the electricity consumption of the whole of the UK). Against this background, a new and revolutionary technological approach has been developing over the last decade: storing information in DNA. This technology allows significant miniaturization and preservation of the information for a much longer period (about a thousand times longer), at almost no energy or economic cost.

The basic idea behind encoding information in DNA is this: the DNA molecule is a chain made up of links called nucleotides. The nucleotides come in four types, denoted by the letters A, C, G and T. To store information in DNA, each binary sequence (consisting of the digits 0 and 1) must be translated into a sequence of these letters. In the next step, in a process called synthesis, actual DNA molecules representing these sequences are produced. To read the information, the DNA molecules must be sequenced. Sequencing produces an output that represents the nucleotide sequence of each molecule in the input, and this output is translated back into a binary sequence that represents the original encoded message. Modern technologies allow thousands of different nucleotide sequences to be synthesized at the same time.
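The translation step described above can be sketched in a few lines of Python. This is a minimal illustration only: the two-bits-per-letter table (00 to A, 01 to C, 10 to G, 11 to T) and the function names encode and decode are assumptions made for the example, not the scheme used by the researchers, and real systems add constraints and error correction on top.

```python
# Illustrative mapping only (an assumption, not the researchers' scheme):
# every pair of bits becomes one nucleotide letter.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a nucleotide string, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Translate a nucleotide string back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"DNA"
strand = encode(message)          # the sequence that would be sent to synthesis
print(strand)                     # CACACATGCAAC
print(decode(strand) == message)  # True: sequencing output maps back to the message
```

With four letters, each position in the sequence carries exactly two bits, so a message of n bytes needs 4n nucleotides under this simple mapping.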

DNA storage is a very complex technological challenge. Reading the information (sequencing) has advanced enormously following the genome revolution, but writing the information still poses significant technological difficulties. Hence the importance of the breakthrough achieved by the researchers of the Technion and the Herzliya Interdisciplinary Center, which allows: (1) an increase in the number of letters used to encode the information (beyond the original 4 letters); (2) a significant reduction in the number of synthesis rounds required to store the information in DNA; (3) an improved error correction mechanism in the code.

As mentioned, natural DNA consists of four building blocks, the four letters A, C, G and T. The team of researchers increased the number of letters available for actual use, with each new letter being a unique combination of the original letters. The concept is similar to producing new colors by mixing base colors in unique proportions. Increasing the number of letters allows more information to be encoded at each position in the sequence of DNA molecules. According to Prof. Yakhini, "in the synthesis and sequencing processes used today, there is a built-in information redundancy, because each molecule is produced in a large number of copies and read in a large number of copies during sequencing. The technology we developed utilizes this redundancy to increase the effective number of letters far above the original 4 letters, thus allowing us to encode each unit of information in fewer synthesis cycles."
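The color-mixing idea can be made concrete with a small sketch. It assumes, purely for illustration, that a new letter is a mixing ratio of A, C, G and T expressed in k equal parts, and that the letter at a position is recovered by comparing the base frequencies seen across many reads with each allowed ratio. The parameter k, the function names and the least-squares matching are assumptions of this example, not details taken from the article.

```python
from itertools import product
from math import log2

BASES = "ACGT"


def composite_letters(k: int) -> list[tuple[int, int, int, int]]:
    """All ways to split k parts among A, C, G and T (each tuple sums to k)."""
    return [r for r in product(range(k + 1), repeat=4) if sum(r) == k]


def infer_letter(reads: str, k: int) -> tuple[int, int, int, int]:
    """Pick the mixing ratio closest to the base frequencies observed
    at one position across many sequenced copies (least squares)."""
    freqs = [reads.count(base) / len(reads) for base in BASES]
    return min(
        composite_letters(k),
        key=lambda r: sum((x / k - f) ** 2 for x, f in zip(r, freqs)),
    )


for k in (1, 2, 6):
    n = len(composite_letters(k))
    print(f"k={k}: {n} letters, {log2(n):.2f} bits per position")

# 100 reads of one position: roughly half A and half C.
observed = "A" * 48 + "C" * 52
print(infer_letter(observed, k=2))  # (1, 1, 0, 0): an equal mix of A and C
```

With k = 1 the alphabet reduces to the four natural letters (2 bits per position). Larger k gives more letters and more bits per position, but the ratios become harder to tell apart, so more reads are needed at each position.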

The researchers were able to reduce the number of synthesis rounds required per unit of information by 20%. Furthermore, they showed that it will be possible to reduce the number of synthesis rounds by 75% in the future without significant development effort. This means that the storage process will be faster and less expensive. "In this work, we applied information coding in a practical way, with a synthesis efficiency tens of percent greater than that of conventional coding," explains Prof. Amit. "The research included the actual application of the new coding method to store large volumes of information on DNA molecules, and its reconstruction in order to test the process." Indeed, on one of the shelves in Prof. Amit's lab at the Technion is a small test tube containing about 10 nanograms of DNA (a nanogram is one billionth of a gram), which encodes thousands of copies of the Bible in a bilingual version.
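The link between alphabet size and synthesis rounds can be checked with simple arithmetic. The sketch below assumes a simplified model in which each synthesis round adds one position to every molecule and each position carries log2(n) bits for an alphabet of n letters; it ignores error correction overhead and any other practical cost. The function names are hypothetical, and the alphabet sizes in the loop are arbitrary examples, not figures from the study.

```python
from math import log2


def cycle_reduction(alphabet_size: int) -> float:
    """Fraction of synthesis rounds saved relative to the 4-letter alphabet,
    assuming one position per round and log2(alphabet_size) bits per position."""
    return 1 - log2(4) / log2(alphabet_size)


def bits_per_position_for(reduction: float) -> float:
    """Bits each position must carry to reach a given saving in rounds."""
    return log2(4) / (1 - reduction)


for n in (4, 6, 16, 256):
    print(f"{n:>3} letters: {cycle_reduction(n):.1%} fewer rounds")

print(bits_per_position_for(0.20))  # about 2.5 bits per position
print(bits_per_position_for(0.75))  # 8 bits per position
```

Under this model, the 20% saving corresponds to about 2.5 bits per position instead of 2, and a 75% saving would need each position to carry 8 bits, that is, an effective alphabet of 256 distinguishable letters.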

The research group has developed an advanced mechanism that makes it possible to overcome the errors that are an integral part of a biological-physical process such as this one. Part of the DNA sequence of the molecules that store the information, designed by Leon Inabi and Prof. Yakhini, is devoted to this error correction mechanism. According to Leon Inabi, "Thanks to the use of error correction codes, adapted to the unique coding we created, we were able to perform extremely efficient coding and successfully recover the information. When working in a system consisting of millions of parts (molecules), even extremely rare events (one-in-a-million events) do occur, and they may disrupt the reading. The careful coding allowed us to overcome these problems."
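The article does not say which error correction codes were used, so the following is only a generic textbook illustration of the principle of devoting part of the stored sequence to redundancy. It uses the classic Hamming(7,4) code, chosen here as a stand-in, which adds three check bits to every four data bits and can repair any single flipped bit. The function names are hypothetical.

```python
def hamming_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword (check bits at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming_decode(codeword: list[int]) -> list[int]:
    """Repair at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


original = [1, 0, 1, 1]
stored = hamming_encode(original)
damaged = list(stored)
damaged[4] ^= 1  # simulate a single read error
print(hamming_decode(damaged) == original)  # True
```

The price of this protection is extra length: seven stored bits for every four bits of data. Practical DNA storage systems use stronger codes adapted to their own error patterns.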

The researchers note that "the technology presented in the article has the potential to optimize additional processes in synthetic biology and biotechnology. We believe that in the coming years we will see a significant increase in the use of synthetic DNA in research and industry."

The artificial DNA used by the researchers and designed by the group was produced by the American company Twist Bioscience, which also employs a development group in Tel Aviv, and was sequenced at the Technion's genomic center. The research was partially supported by the European Union's Horizon 2020 framework program. Leon Inabi is supported by an Adams scholarship of the Israel Academy of Sciences. Dr. Orna Attar and research student Inbal Vakanin also participated in the study.
