
How do you reduce the transfer time of genomic data from five years to a few months?

The Sheba Hospital Cancer Research Center (SCRC) expected that transferring the 300 terabytes of data needed to advance its research would take a reasonable amount of time, and was surprised to learn that it would take 43,800 hours. IUCC, Israel's research and education network, set up a dedicated gigabit-per-second line that lets the center transfer 4 to 6 terabytes per day and complete the download within about 100 days.

Computing in medicine. Image shutterstock

The Sheba Hospital Cancer Research Center (SCRC) expected that transferring the 300 terabytes of data required to advance its research, from the Genomic Data Commons (GDC) Data Portal of the US National Cancer Institute in Chicago to local storage, would be relatively simple.

The SCRC researchers purchased the necessary resources from the Sheba data center, opened accounts on the GDC data portal, received permission to access the databases, installed the client software and pressed ENTER. Then they were surprised to receive the following message: "Your download will take about 43,800 hours." The exact wording may have been slightly different, but the meaning was clear: the existing network infrastructure was not up to the task.
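As a quick sanity check on that message, the sketch below converts the quoted estimate into years and into the average throughput it implies. It assumes decimal units (1 TB = 10^12 bytes) and 365-day years; the helper name `implied_rate_mbps` is illustrative and not part of any GDC tool.

```python
# Assumptions: decimal (SI) units, 1 TB = 10**12 bytes, 1 Mbit = 10**6 bits,
# and 365-day years (leap years ignored).
TB = 10**12
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def implied_rate_mbps(volume_bytes: float, hours: float) -> float:
    """Average rate in megabits per second needed to move volume_bytes in `hours`."""
    seconds = hours * 3600
    return volume_bytes * 8 / seconds / 10**6


volume = 300 * TB         # dataset size given in the article
estimate_hours = 43_800   # the client's quoted estimate

years = estimate_hours / HOURS_PER_DAY / DAYS_PER_YEAR
rate = implied_rate_mbps(volume, estimate_hours)

print(f"{estimate_hours} h = {years:.1f} years")
print(f"implied average rate: {rate:.1f} Mbit/s")
```

Under these assumptions, 43,800 hours is 5.0 years, and the implied average rate is only about 15.2 Mbit/s, a small fraction of a gigabit line.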

The Sheba Cancer Research Center, affiliated with the Sackler School of Medicine, was founded by Prof. Gideon (Gidi) Ravavi, who also heads it. The center has become known as a pioneering research laboratory that does not shy away from technological challenges.

Prof. Ravavi is not only a world-renowned researcher in his field: identifying the role of genetic elements in activating cancer-causing genes and deciphering changes in RNA. He is also known for taking the initiative on technology and for insisting on staying at the forefront of the technologies that enable advanced cancer research. This vision ensures that the SCRC is equipped with the technologies needed to decode the genetic and epigenetic mechanisms that affect gene expression, and their impact on cancer. Under his leadership, the SCRC was the first research facility in Israel to purchase next-generation sequencing (NGS) technologies. Prof. Ravavi also made sure to purchase enough storage space to meet the ambitious goal of downloading the GDC datasets.

To carry out the data transfer, Dr. Eran Eyal, head of bioinformatics at the SCRC, and his team began to think outside the box. Way outside the box. "We consulted with Sheba's IT department and with the commercial internet providers," says Eran, "but they had no practical solution. We even explored traveling from Israel to the NIH facility in Chicago with disks in suitcases to bring the data back. But the cost of such a trip was too high, and the NIH team had never encountered a lab that wanted to do that."

Israel's research and education network: the solution

Prof. Gideon Ravavi

It was the GDC portal team that advised them to consult the Inter-University Computation Center (IUCC), Israel's National Research and Education Network (NREN). Eran contacted Hank Nussbacher, IUCC's manager of network and computing infrastructure. Sheba had already used IUCC's services for a telemedicine application, but that application did not require a link or speed anywhere near what is needed to transfer 300 terabytes efficiently and securely.

Hank proposed a dedicated 1 gigabit per second line between IUCC and the SCRC, using an existing carrier's infrastructure. He contacted the GDC team and tested the application to make sure it worked and that the connection from Israel could sustain a continuous load of 1 gigabit per second. In October 2017, the line passed the tests and was put into operation. "The initial assessments indicated that what had been an impractical, multi-year download process became a very practical project of about three months," recalls Eran. "Now that we see how well the solution works, we are considering expanding the scope of the project and trying to obtain more data."
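Why a dedicated line turns years into months follows from its rate. The rough best-case calculation below assumes decimal units and that the full 1 Gbit/s is usable with no protocol overhead, retransmissions or pauses, so it gives a ceiling rather than a forecast; the function names are hypothetical.

```python
# Best-case transfer time on a dedicated 1 Gbit/s link.
# Assumptions: decimal units and the full line rate is usable (no protocol
# overhead, retransmissions or idle time), so real transfers take longer.
LINK_BPS = 1_000_000_000      # 1 Gbit/s, the line described in the article
VOLUME_BYTES = 300 * 10**12   # 300 TB
SECONDS_PER_DAY = 86_400


def bytes_per_day(link_bps: int) -> float:
    """Maximum bytes a link of the given bit rate can carry in 24 hours."""
    return link_bps / 8 * SECONDS_PER_DAY


def best_case_days(volume_bytes: int, link_bps: int) -> float:
    """Days needed to move volume_bytes at the full link rate."""
    return volume_bytes / bytes_per_day(link_bps)


daily_tb = bytes_per_day(LINK_BPS) / 10**12
days = best_case_days(VOLUME_BYTES, LINK_BPS)
print(f"ceiling: {daily_tb:.1f} TB/day, 300 TB in {days:.1f} days")
```

This yields a ceiling of 10.8 TB per day, or roughly 28 days for 300 TB, so a three-month plan leaves room for actual throughput well below line rate.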

According to Dr. Nitzan Kol and Amri Naishul, the SCRC researchers who actually handle the transfer, the configuration of the Sheba and NIH systems does not allow the full 1 gigabit per second to be reached, but the link still meets the needs of the mission. "The GDC's TCP infrastructure is certainly not ideal. We see peaks of about 800 megabits per second and lows of about 600 megabits per second. But in the end we reach 4 to 6 terabytes per day, enough to transfer the 300 terabytes in the planned time frame."
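The researchers' figures can be cross-checked with a few unit conversions. This sketch reads the 600 and 800 figures as megabits per second (a 1 Gbit/s line cannot carry 800 megabytes per second) and uses decimal units; the helper names are hypothetical.

```python
# Relating the quoted rates to daily volume and total transfer time.
# Assumptions: decimal units; 600/800 are megabits per second.
SECONDS_PER_DAY = 86_400
TB = 10**12


def mbps_to_tb_per_day(mbps: float) -> float:
    """Terabytes moved in 24 hours at a sustained rate of `mbps` megabits/s."""
    return mbps * 10**6 / 8 * SECONDS_PER_DAY / TB


def tb_per_day_to_mbps(tb_per_day: float) -> float:
    """Average rate in megabits per second implied by a daily volume in TB."""
    return tb_per_day * TB * 8 / SECONDS_PER_DAY / 10**6


for mbps in (600, 800):
    print(f"{mbps} Mbit/s sustained -> {mbps_to_tb_per_day(mbps):.2f} TB/day")

for tb_day in (4, 6):
    days = 300 / tb_day
    print(f"{tb_day} TB/day -> avg {tb_per_day_to_mbps(tb_day):.0f} Mbit/s, "
          f"300 TB in {days:.0f} days")
```

Sustained 600 to 800 Mbit/s would move 6.48 to 8.64 TB per day, while the reported 4 to 6 TB per day corresponds to an average of roughly 370 to 556 Mbit/s, below the quoted lows; the article does not say what accounts for the difference. At 4 to 6 TB per day, 300 TB takes 50 to 75 days.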

Research to restore hope

The SCRC currently focuses on three main areas: RNA modifications and their role in regulating gene expression and cell fate; research on transposable elements (TEs), also known as "jumping genes", DNA sequences that can move from one location in the genome to another; and genome sequencing for personalized medicine. GDC data is essential for studying TEs and for whole-genome sequencing of specific subgroups of cancer patients, since both involve huge datasets and extensive analysis.

Given the nature of the GDC datasets, there is no concern about the data becoming obsolete over time. "This type of data does not become 'outdated'," says Prof. Ravavi. "Now that we have a way to transfer the data thanks to IUCC, the only remaining problem is capacity. The more data we want, the more storage space we need. It's a matter of balancing resources." The relevance and usefulness of the data go beyond the research at the Sheba center, and the data could become an important tool for cancer research throughout the country. "In principle," says Prof. Ravavi, "the SCRC will be happy to share the publicly available genomic data downloaded as part of this project."

https://www.inthefieldstories.net/delivering-the-data-for-groundbreaking-cancer-research/

3 comments

  1. 2 plane tickets to the USA + 30 3.5" disks of 10 TB each + three days = all the material in Israel, securely and fast

  2. I don't understand all the fuss. Did no one check Amazon's service???
    There is a great Amazon service for data transfer... it's called AWS Snowball.
    From what I checked, it should cost a little more than 10K, and they do everything securely and quickly.
    And much faster than 100 days, without having to talk to a thousand people and hold out-of-the-box consultations... come on.

    I wonder how much this whole business cost…
