13,000 offer up DNA to put their genomes online
Since opening to the public late last month, the Personal Genome Project has signed up 13,000 volunteers who will donate genetic material for the benefit of gene research worldwide. Information derived from that material will also be posted online.
The project was launched last year with the goal of creating the world's first publicly accessible database of human genomic and trait data from 100,000 people. It began as a closed pilot study with 10 volunteers so that those who sign up later "will know what they're getting into," said George Church, the Harvard Medical School professor leading the initiative.
Those first 10 volunteers had their genomes, along with photos and personal and family history, placed online as a pilot for the experiment, which one day could include millions of unique genomes.
Church said study participants have not been promised any anonymity -- just the opposite.
Participants are schooled on the fact that their private medical data, including any diseases or deformities, will be available for the world to view. Church acknowledged that this will initially scare some people off, but he said that once people have gotten used to the idea of participating in medical research, "it's a fairly small additional step to say, 'Let's allow anyone at all to take a look at it.'"
"We don't need that many people to enroll. One hundred thousand people out of 6.5 billion is a very tiny number of people," Church said.
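Church's "very tiny number" is easy to check. This minimal Python sketch uses only the two figures he quotes and nothing else:

```python
# Figures quoted by Church; nothing else is assumed.
volunteers = 100_000
world_population = 6_500_000_000

share = volunteers / world_population
print(f"{share:.2e} of the world's population ({share:.5%})")
# prints: 1.54e-05 of the world's population (0.00154%)
```

In other words, the goal amounts to roughly one person in every 65,000.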
The purpose of the public genome database is to offer up genetic information to the world's scientific community, including computer scientists, for the study of hereditary medical issues, according to Church. The project is among the first to allow researchers other than traditional medical doctors to use the data.
"I think there's a lot of opportunity for someone who looks at things differently to make connections that the so-called experts missed," he said. "So we're very excited about having participation of computer scientists, mathematicians, physicists and so forth."
Church believes that within a few years, everyone will have the opportunity to keep their own genome data, along with personal medical information, in a personally controlled electronic record. That information becomes even more valuable when it can be shared with the scientific community at large.
"...If everyone shares, then suddenly it adds value to the resources everyone already has," he said.
The Personal Genome Project will focus initially on medical research, though not in a narrow sense. For example, Church and his team are interested in morphological characteristics, such as what gives a person's face its shape.
"That doesn't sound like it's immediately medical, but things about morphology can affect whether you have sleeping or breathing problems," Church said. "We're trying not to be prejudicial in deciding in advance what's medical or not because there are opportunities for serendipity and holistic interconnections that computers can find that people may have missed because they're not as good at finding correlations."
To date, scientists have identified 1,450 genes that are considered predictive of hereditary disease and actionable. That means a person carrying one of these genes can be treated medically if warned early enough, or can make lifestyle changes to become less susceptible to illness, Church said.
The project targets families with a history of disease or abnormalities, because their data will be more useful in finding genetic links. According to Church, older volunteers get priority because they have longer life histories and more medical incidents.
Volunteers who sign up to have their genetic material tested must first answer a detailed questionnaire and demonstrate that they understand that their private information will be made very public. They will then be asked to provide genetic material, such as hair, blood, skin or saliva, from which their DNA will be extracted and sequenced.
While volunteers won't have their names published alongside their genomic information, Church said the subjects are fully aware that anyone familiar with them could deduce their identities from the photos and background information.
A genome is the full set of chromosomes, or the complete genetic sequence, of a human being, half of it inherited from the father and half from the mother. That sequence consists of about 6 billion base pairs: pairs of nucleotides, one on each of the two complementary DNA strands, held together by hydrogen bonds.
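To make "complementary strands" concrete, here is a small Python sketch of the standard Watson-Crick pairing rules: A pairs with T through two hydrogen bonds, and G pairs with C through three. The function names are illustrative only and are not part of any Personal Genome Project software.

```python
# Standard Watson-Crick pairing: A with T (two hydrogen bonds),
# G with C (three hydrogen bonds).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def partner_strand(strand: str) -> str:
    """Base sitting opposite each position, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """The partner strand read in its own 5'->3' direction (strands run antiparallel)."""
    return partner_strand(strand)[::-1]


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding this stretch of double helix together."""
    return sum(HYDROGEN_BONDS[base] for base in strand)


seq = "GATTACA"
print(partner_strand(seq))      # CTAATGT
print(reverse_complement(seq))  # TGTAATC
print(hydrogen_bonds(seq))      # 16
```

Each letter pairing in this sketch counts as one of the roughly 6 billion base pairs in a human genome.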
In order to store the research data, one byte of capacity is required for each base pair. As a result, 6GB of data capacity is needed to store the genetic information of just one person, according to Church.
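Church's storage estimate is simple multiplication. The Python sketch below applies his one-byte-per-base-pair figure to the 10-person pilot, the 100,000-volunteer goal and a hypothetical one million participants. The helper names are hypothetical; the sketch assumes decimal units (1 GB = 10^9 bytes) and ignores photos, medical records, indexes and any replication overhead.

```python
# Assumptions: Church's figure of 1 byte per base pair, about 6 billion
# base pairs per (two-copy) human genome, and decimal storage units.
BYTES_PER_BASE_PAIR = 1
BASE_PAIRS_PER_GENOME = 6_000_000_000


def genome_storage_bytes(people: int) -> int:
    """Raw sequence storage for `people` genomes, excluding any overhead."""
    return people * BASE_PAIRS_PER_GENOME * BYTES_PER_BASE_PAIR


def human_readable(n_bytes: float) -> str:
    """Format a byte count with decimal (SI) units, e.g. 6e9 -> '6 GB'."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n_bytes < 1000:
            return f"{n_bytes:g} {unit}"
        n_bytes /= 1000
    return f"{n_bytes:g} PB"


for people in (1, 10, 100_000, 1_000_000):
    print(f"{people:>9,} people: {human_readable(genome_storage_bytes(people))}")
```

By this arithmetic, the 100,000-volunteer goal alone implies about 600 TB of raw sequence data, and a million participants about 6 PB.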
To address the scalability such a database requires, the project has turned to Web 2.0 technology and crowdsourcing. In other words, the project is inviting developers and technology vendors worldwide to contribute. For example, Isilon Systems Inc. stepped up to offer network-attached storage (NAS) clusters as primary storage for the project.
Church said Harvard chose NAS clusters because traditional monolithic storage arrays with RAID 5 protection, which can survive only one disk failure at a time, are becoming less reliable for research in which data grows exponentially. "We're starting to see solutions at that scale start to fail on a very regular basis," he said, "meaning you get two simultaneous disk failures and then lose [the] whole data set."
Church said he eventually envisions millions of volunteers participating in the Personal Genome Project, which will require a highly scalable infrastructure. Even the current database, with data on just 10 people, runs on 100 servers and a three-node Isilon IQ 12000x cluster.
The success of the project is highly dependent on how well the crowdsourcing model works and which companies step forward to offer up technology for research. Church said Google has also offered "significant gifts," as has Amazon, which offered to host the data on its cloud storage service.
Church said he expects the Personal Genome Project to have its 100,000 volunteers by the end of the year, even though not all of them will have been processed by then.
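Church's point about two simultaneous failures follows from how RAID 5 works: each stripe stores one parity block, the byte-wise XOR of its data blocks, so any single missing block can be recomputed from the survivors, but a second loss in the same stripe cannot. The Python sketch below illustrates only that parity arithmetic. It is a simplification (real RAID 5 rotates parity across disks and runs in the controller or storage software), the function names are hypothetical, and it does not describe Isilon's own protection scheme.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Data blocks followed by one parity block, one block per disk."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe: list[bytes], failed: set[int]) -> bytes:
    """Recompute the single lost block from the surviving disks."""
    if not failed:
        raise ValueError("no failed disk to rebuild")
    if len(failed) > 1:
        # Single parity gives one equation per byte; two unknowns cannot be solved.
        raise ValueError("single parity cannot recover two or more lost disks")
    survivors = [block for i, block in enumerate(stripe) if i not in failed]
    return xor_blocks(survivors)


# Example: three data disks plus one parity disk.
stripe = make_stripe([b"GATT", b"ACAG", b"TTCA"])
assert rebuild(stripe, {1}) == b"ACAG"  # one failure: data recovered
try:
    rebuild(stripe, {0, 2})  # two simultaneous failures
except ValueError as err:
    print("data lost:", err)
```

Schemes with two or more independent parity blocks per stripe, such as RAID 6, exist precisely to tolerate the double failure the sketch refuses to handle.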
"We're trying to build a model where even if only 100,000 out of 6.5 billion share, it's enough to benefit all 6 billion," he said.