Scientists have revealed nearly 100 genetic variants implicated in the development of cancers such as breast cancer and prostate cancer, reports said.
The new method, designed by a team of scientists and described in the journal Science, identified these variants in under-explored regions of DNA that do not code for proteins but instead influence the activity of other genes. As more whole-genome sequences become available, this approach can be applied to find potential disease-causing variants in the non-coding regions of the genome.
Researchers can now identify DNA regions within non-coding DNA, the major part of the genome that is not translated into a protein, where mutations can cause diseases such as cancer.
Their approach reveals many potential genetic variants within non-coding DNA that drive the development of a variety of different cancers. This approach has great potential to find other disease-causing variants.
Unlike the coding region of the genome, where our 23,000 protein-coding genes lie, the non-coding region, which makes up 98% of our genome, is poorly understood. Recent studies have emphasized the biological value of the non-coding regions, previously considered 'junk' DNA, in the regulation of proteins. This new information provides a starting point for researchers to sift through the non-coding regions and identify the most functionally important ones.
"Our technique allows scientists to focus in on the most functionally important parts of the non-coding regions of the genome," says Professor Mark Gerstein, the senior author, from Yale University. "This is not just beneficial for cancer research, but can be extended to other genetic diseases too."
The team used the full set of genetic variants from the first phase of the 1000 Genomes Project, together with information about the non-coding regions generated by the ENCODE Project, and identified regions that did not accumulate much variation.
Protein-coding genes play a crucial role in human survival and fitness, and are under strong 'purifying' selection, which removes variation. The team found that some non-coding DNA regions showed almost the same low levels of variation as protein-coding genes, and called these 'ultrasensitive' regions.
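The comparison described above can be illustrated with a minimal sketch. It assumes that variation is summarized as variants per kilobase, and that a non-coding region counts as 'ultrasensitive' when its density falls within a chosen tolerance of the pooled protein-coding baseline. The `Region` class, the `tolerance` value and the toy counts are hypothetical and are not taken from the study, which may measure selection differently.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    category: str       # "coding" or "noncoding" (hypothetical labels)
    length_bp: int      # region length in base pairs
    variant_count: int  # variants observed in the region


def variants_per_kb(region):
    """Variant density, used here as a crude proxy for selection."""
    return 1000 * region.variant_count / region.length_bp


def find_ultrasensitive(regions, tolerance=1.2):
    """Return names of non-coding regions whose variant density is at most
    `tolerance` times the pooled density of the coding regions."""
    coding = [r for r in regions if r.category == "coding"]
    if not coding:
        raise ValueError("need at least one coding region as a baseline")
    baseline = 1000 * sum(r.variant_count for r in coding) / sum(
        r.length_bp for r in coding
    )
    return [
        r.name
        for r in regions
        if r.category == "noncoding"
        and variants_per_kb(r) <= tolerance * baseline
    ]


# Toy data, invented for illustration only.
regions = [
    Region("geneA", "coding", 2000, 4),
    Region("enhancer1", "noncoding", 1000, 2),
    Region("intergenic1", "noncoding", 1000, 15),
]
print(find_ultrasensitive(regions))
```

In this toy example only `enhancer1` is flagged, because its variant density matches the coding baseline while `intergenic1` carries far more variation.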
Within the ultrasensitive regions, they looked at specific single DNA letters that, when altered, caused the greatest disturbance to the genetic region. If this non-coding, ultrasensitive region is central to a network of many related genes, variation can cause a greater knock-on effect, resulting in disease.
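One simple way to picture what "central to a network" means is to count how many other genes each gene is linked to. The sketch below uses that count (degree) as a stand-in for centrality; the gene names and links are invented, and the study may rely on a different network measure.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count the distinct neighbours of each gene in an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {gene: len(linked) for gene, linked in neighbours.items()}


def most_central(edges):
    """Return the gene with the most connections."""
    degrees = degree_centrality(edges)
    return max(degrees, key=degrees.get)


# Hypothetical network: HUB is linked to three genes, G3 to two.
edges = [("HUB", "G1"), ("HUB", "G2"), ("HUB", "G3"), ("G3", "G4")]
print(degree_centrality(edges))
print(most_central(edges))
```

Under this simplification, a variant that disrupts regulation of `HUB` could affect more downstream genes than one affecting `G4`.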
They integrated all this information to develop a computer workflow known as FunSeq. This system prioritizes genetic variants in the non-coding regions based on their predicted impact on human disease.
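The idea of prioritizing variants can be sketched as a ranking over a few yes/no features. This is not FunSeq's actual scoring scheme: the feature names, the equal weighting and the example variants below are assumptions made purely for illustration.

```python
# Hypothetical features, loosely based on the criteria described in the text.
FEATURES = (
    "in_ultrasensitive_region",
    "disrupts_single_letter_site",
    "linked_to_central_gene",
)


def score(variant):
    """Count how many of the illustrative features a variant satisfies."""
    return sum(1 for feature in FEATURES if variant.get(feature, False))


def prioritize(variants):
    """Sort variants from highest to lowest score (ties keep input order)."""
    return sorted(variants, key=score, reverse=True)


# Invented example variants.
variants = [
    {"id": "v2", "in_ultrasensitive_region": True},
    {
        "id": "v1",
        "in_ultrasensitive_region": True,
        "disrupts_single_letter_site": True,
        "linked_to_central_gene": True,
    },
    {
        "id": "v3",
        "in_ultrasensitive_region": True,
        "linked_to_central_gene": True,
    },
]
for v in prioritize(variants):
    print(v["id"], score(v))
```

A real workflow would likely weight the features differently and draw them from annotation data rather than hand-written flags.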
"Our method is a practical and successful way to screen for purifying selection in non-coding regions of the genome using freely available data such as those from the ENCODE and 1000 Genomes Projects," says Dr Yali Xue, an author from the Wellcome Trust Sanger Institute. "It really shows the value of these large-scale open access data-sets."