The Sheba Hospital Cancer Research Center (SCRC) expected that transferring the 300 terabytes of data needed to advance its research would take a reasonable amount of time, and was surprised to see an estimate of 43,800 hours. The Interuniversity Computing Center (MHC) set up a gigabit-per-second communication line that lets the center transfer 4-6 terabytes per day and finish the job within about 100 days.
The Sheba Hospital Cancer Research Center (SCRC) thought that transferring the 300 terabytes of data needed to advance its research, from the Genomic Data Commons (GDC) Data Portal of the US National Cancer Institute in Chicago to local storage, would be relatively simple.
The researchers at the SCRC purchased the necessary resources from the Sheba data center, opened accounts on the GDC data portal, received permission to access the databases, installed the client software and pressed ENTER. Then they were surprised to receive the following message: "Your download will take approximately 43,800 hours". The exact wording may have been slightly different, but the message was clear: the existing network infrastructure was not up to the task.
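As a rough check on that estimate, the sketch below works out what 43,800 hours means in years and what average throughput it implies for 300 terabytes. It assumes decimal units (1 TB = 10^12 bytes) and a 365-day year; the function names are illustrative and do not come from the GDC client software.

```python
HOURS_PER_YEAR = 24 * 365   # assumption: a 365-day year
BYTES_PER_TB = 10**12       # assumption: decimal terabytes


def hours_to_years(hours):
    """Convert a duration in hours to years."""
    return hours / HOURS_PER_YEAR


def implied_rate_mbit_per_s(total_tb, hours):
    """Average throughput (Mbit/s) needed to move total_tb in the given hours."""
    bits = total_tb * BYTES_PER_TB * 8
    seconds = hours * 3600
    return bits / seconds / 10**6


if __name__ == "__main__":
    print(f"{hours_to_years(43_800):.1f} years")
    print(f"{implied_rate_mbit_per_s(300, 43_800):.1f} Mbit/s")
```

Under these assumptions, the estimate corresponds to five years of continuous downloading at an average of roughly 15 megabits per second.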
The Cancer Research Center in Sheba, affiliated with the Sackler School of Medicine, was founded by Prof. Gideon (Gidi) Ravavi, who also heads it. The center became known as a pioneering research laboratory, which does not shy away from technological challenges.
Prof. Ravavi is a world-renowned researcher in his field of expertise: identifying the role of genetic elements in activating genes that cause cancer, and deciphering changes in RNA. He is also known for his technological initiative and for insisting on staying at the forefront of the technology that enables advanced cancer research. This vision ensures that the SCRC is equipped with the technologies needed to decipher the genetic and epigenetic mechanisms that affect gene expression, and their impact on cancer. Under his leadership, the SCRC was the first research facility in Israel to purchase DNA microarray and next-generation sequencing (NGS) technologies. Prof. Ravavi also made sure to purchase enough storage space to meet the ambitious goal of downloading the GDC datasets.
To carry out the data transfer, Dr. Eran Eyal, head of bioinformatics at the SCRC, and his team began to think outside the box. Way outside the box. "We consulted Sheba's IT department and the commercial Internet providers," says Eran. "But they had no practical solution. We even explored the possibility of traveling from Israel to the NIH facility in Chicago with disks in suitcases to bring the data back to Israel. But the cost of such a trip was too high. And the NIH team had never encountered a lab that wanted to do that."
Israel's R&D network: the helpful solution
It was the GDC portal team that led them to consult the Interuniversity Computing Center (MHC), Israel's National Research and Education Network (NREN). Eran contacted Hank Nussbacher, Network and Computing Infrastructure Manager at MHC. Sheba already used MHC's services for a telemedicine application, but that application did not require anything like the link or speed needed to transfer 300 terabytes efficiently and securely.
Hank proposed a dedicated gigabit-per-second line between MHC and the SCRC, using an existing carrier's infrastructure. He contacted the GDC team and tested the application to make sure that it worked and that the connection from Israel could handle a continuous gigabit-per-second load. In October 2017, the line passed the tests and was put into operation. "Initial assessments indicated that what had previously been a multi-year, impractical download process had turned into a very practical project that would take about three months," recalls Eran. "Now that we see how well the solution works, we are considering expanding the scope of the project and trying to obtain more data."
According to Dr. Nitzan Kol and Amri Naishul, the SCRC researchers who are actually handling the transfer, the configuration of Sheba's and NIH's systems does not allow the full speed of one gigabit per second to be reached, but the line still meets the needs of the mission. "The TCP infrastructure of the GDC is definitely not ideal. We see peaks of about 800 megabits per second and lows of about 600 megabits per second. But in the end we reach 4 to 6 terabytes per day. Enough to allow us to transfer the 300 terabytes in the planned time frame."
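The figures quoted here can be cross-checked with a little arithmetic. The sketch below converts a sustained line rate into terabytes per day, and a daily volume into total transfer days. It assumes the observed rates are in megabits per second (as a gigabit-per-second line implies), uses decimal units (1 TB = 10^12 bytes), and ignores protocol overhead and idle time; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400
BITS_PER_TB = 8 * 10**12    # assumption: decimal terabytes


def tb_per_day(rate_mbit_per_s):
    """Terabytes moved in 24 hours at a sustained rate given in Mbit/s."""
    return rate_mbit_per_s * 10**6 * SECONDS_PER_DAY / BITS_PER_TB


def days_to_transfer(total_tb, daily_tb):
    """Days needed to move total_tb at daily_tb terabytes per day."""
    return total_tb / daily_tb


if __name__ == "__main__":
    for rate in (600, 800, 1000):
        print(f"{rate} Mbit/s sustained: {tb_per_day(rate):.2f} TB/day")
    for daily in (4, 6):
        print(f"300 TB at {daily} TB/day: {days_to_transfer(300, daily):.0f} days")
```

A sustained 600 to 800 megabits per second would give roughly 6.5 to 8.6 terabytes per day, so the reported 4 to 6 terabytes per day implies a somewhat lower average rate. At that daily volume, 300 terabytes takes between 50 and 75 days.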
Research to restore hope
The SCRC currently focuses on three main areas: RNA modifications and their role in regulating gene expression and cell fate; the study of transposable genetic elements (TEs), also known as "jumping genes", which are DNA sequences that move from one place in the genome to another; and genomic sequencing for personalized medicine. GDC data is essential for the study of TEs and for whole-genome sequencing of specific subgroups of cancer patients, since both involve huge datasets and extensive analysis.
Due to the nature of the GDC datasets, there is no concern that the data will lose its relevance over time. "This type of data does not become 'obsolete'," says Prof. Ravavi. "The only problem, now that we have a way to transfer the data thanks to MHC, becomes a question of capacity. The more data we want, the more storage space we need. It's a matter of balancing resources." The relevance and usefulness of the data go beyond the research at the Sheba center, and the data has the potential to become an important tool for cancer research all over the country. "In principle," says Prof. Ravavi, "the SCRC will be happy to share the publicly available genomic data that was downloaded within the scope of this project."
https://www.inthefieldstories.net/delivering-the-data-for-groundbreaking-cancer-research/
3 comments
2 plane tickets to the USA + 30 3.5-inch disks of 10 TB each + three days = all the material in Israel, securely and quickly
I don't understand all the fuss. Did no one look into Amazon's service???
There is a great Amazon service for data transfer... it's called AWS Snowball
According to what I checked, it should cost a little more than 10K, and they do everything securely and quickly.
And it is much faster than 100 days, and you don't have to talk to a thousand people or hold consultations outside the box... come on.
I wonder how much this whole business cost…