From: www.itworld.com

Computers helped drive breakthrough in human genome sequencing

December 8, 2000

At a White House press conference Monday, the Human Genome Project public consortium
and Celera Genomics, a private firm, jointly reported that they had assembled working
drafts of the human genome sequence. The two groups' presence on the same podium marked
an apparent truce in what has been a desperate push to be first to announce a decoded
human genome sequence.

Besides being a breakthrough in scientific learning, the genome detective work is
something of a breakthrough in modern computing techniques. Distributed computing,
database technology, advanced search software, and other technologies were employed to
reach the goal of uncovering the basic plan for human life.

The work to create a genetic blueprint for a human being revealed a total of 3.12
billion base pairs in the human genome. An assembled genome is one in which the
location and order of the letters of genetic code along the chromosomes are known.
Computers are relied on to find matches between DNA sequences, and those matches serve
to unravel the code.
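
To give a rough sense of what "finding matches" between DNA sequences can mean, here is a minimal Python sketch that looks for the longest stretch where the end of one fragment equals the start of another, then joins the two. It is an illustration only, not the method used by Celera or the public consortium. The function names, the sample fragments, and the `min_len` threshold are all hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` letters exists.
    `min_len` is an illustrative threshold, not a value from the article.
    """
    shortest = max(min_len, 1)  # never test a zero-length overlap
    for n in range(min(len(a), len(b)), shortest - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def merge(a: str, b: str, min_len: int = 3):
    """Join two fragments on their overlap, or return None if they do not overlap."""
    n = overlap(a, b, min_len)
    return a + b[n:] if n else None


# Two made-up fragments that share the letters "TGCA"
left, right = "ACGTTGCA", "TGCAGGT"
print(overlap(left, right))
print(merge(left, right))
```

Real assembly software has to cope with sequencing errors, repeated regions, and millions of fragments, which is where the heavy computing comes in. This sketch only shows the basic comparison step on exact matches.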

Some observers suggest that the work is leading to the creation of a new field of
technology known as bioinformatics. They say that a new discipline is arising out of
the wedding of computer science and biology.

For its part, Celera has hooked up DNA sequencers with a supercomputing facility
featuring 800 interconnected Compaq Alpha-based computer systems, each of which is
capable of performing more than 250 billion sequence comparisons per hour. Celera has
an alliance with Oracle for database development.

"The whole project has been about information acquisition and storage," said Bruce
Birren, assistant director of the Whitehead Sequencing Center in Cambridge, Mass., a
key participant in the Human Genome Sequencing Consortium.

"We've read out the four-letter code that represents the book of life," Birren said,
referring to the letters A, C, G, and T, which stand for DNA's four basic chemical
components. "We've always studied one gene at a time, but our perspective is changed
because we now see the entire landscape. That takes computational ability."

There is substantial analytical work yet to do in the field, as researchers look to
establish possible links between specific genes and specific traits. That next stage of
work can be expected to drive further computing advances, even as computing advances
drive genome mapping forward.

"Now we're moving into a phase where interpreting [genetic] information is
going to require new analytical tools," Birren said. Researchers are already using a
mix of different advanced software technologies -- including neural networks, fuzzy
logic, and data smoothing -- to uncover patterns in the genetic data.

It will also be necessary to carefully match analytical and data management software
tools, said Michael Roberson, a program manager at the SAS Institute in Cary, N.C.

"One of the areas where SAS software has been used for a long time has been in the area
of clinical trials," he said.

On one level, Roberson explained, genetic data manipulation and management is similar
to traditional data mining and data warehousing tasks. But there are differences.

"In human genome work, data warehousing is made more complicated by the fact that the
data is very irregular and very large," he said. "When you're looking at this data in
relation to clinical trial data, it's much harder to take pieces of information from a
lot of sources and combine them as you would, for example, with a traditional credit
card information database. It's tricky data to work with, because the [techniques
associated with the] collection of the data tend to be different for each subject."

Roberson said his group was looking at new technology known as data smoothing, which
uses pattern recognition techniques to pick out true genetic markers from noisy data sets.
In May the SAS Institute spun off iBiomatics LLC as a wholly owned subsidiary to
specifically meet the computing needs of researchers in the emerging life science
industry.
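
The article does not say how SAS's data smoothing works. As a generic illustration of the idea, the Python sketch below applies a centered moving average, one of the simplest smoothing techniques, to a made-up noisy signal. The function name, the window size, and the sample values are assumptions chosen for the example.

```python
def moving_average(values, window=3):
    """Smooth `values` with a centered moving average.

    `window` must be a positive odd number. Near the ends of the list the
    average is taken over whatever neighbors exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        neighborhood = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(neighborhood) / len(neighborhood))
    return smoothed


# Made-up signal: an isolated spike at index 1, a sustained peak at indexes 4-6
signal = [0, 5, 0, 0, 6, 7, 6, 0, 0]
smoothed = moving_average(signal, window=3)
print([round(v, 2) for v in smoothed])
```

After smoothing, the isolated spike is pulled down toward its neighbors while the sustained peak remains the highest point, which is the sense in which smoothing can help separate a consistent signal from one-off noise.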

