New major data-sharing collaborations will help discover new genes and variants in ALS

From the start of Project MinE, it was anticipated that DNA profiles from external controls would become available in the near future. These "external" control data are important because they increase the likelihood of finding specific genes or variants that are relevant in ALS. Project MinE has now started important data-sharing collaborations that bring in external control data sets.

Sequencing genomes is very costly, and because the whole genome is investigated at once, we need many samples to find relevant differences between ALS patients and control participants. Sharing genetic data is therefore critical to progress in genomics: external control data can be integrated into the analysis without any additional sequencing costs.

Many genomics initiatives around the globe hold large sets of control data, and several of them are now collaborating with Project MinE. Recently, Project MinE started a new collaboration with TOPMed, a program of the National Heart, Lung, and Blood Institute in the USA that has made more than 10,000 genomes available, and Project MinE has now been granted access to these highly valuable data. Another 1,600 genomes, including those of ALS participants, have become available through a collaboration with the New York Genome Center's ALS consortium.

These two examples also illustrate why Project MinE's goal of whole-genome sequencing 22,500 participants is more relevant than ever. By sharing data, Project MinE is not only working towards solving ALS but is also contributing to research into other diseases.

The approval of these data-sharing requests is a major milestone for Project MinE. However, the non-trivial challenges of storage and computation remain. Our current estimates indicate that the total volume of data will exceed 2,000 terabytes before the end of 2018. This enormous and highly valuable dataset will ultimately help us continue to discover genes that play a role in ALS.

Accumulating, storing, and analyzing these enormous amounts of data is very costly and is only possible thanks to the many donations and contributions made to this important project. We look forward to further scaling up the Project MinE dataset and will continue to both request data from and share data with other initiatives. By continuing to collaborate, we both benefit from and contribute to a host of research initiatives, and we continue to pave the way for novel gene discoveries.