Computers helped drive breakthrough in human genome sequencing

December 8, 2000, 03:35 PM, ITworld.com

At a White House press conference Monday, the Human Genome Project public consortium
and Celera Genomics, a private firm, jointly reported that they had assembled working
drafts of the human genome sequence. The two groups' presence on the same podium marked
an apparent truce in what had been a desperate race to be first to announce a decoded
human genome sequence.

The genome detective work is a breakthrough in scientific learning, and it is also
something of a breakthrough in modern computing techniques. Distributed computing,
database technology, advanced search software and other technologies were all used to
reach the goal of uncovering the basic plan for human life.

The work to create a genetic blueprint for a human being revealed a total of 3.12
billion base pairs in the human genome. An assembled genome is one in which the
location and order of the letters of genetic code along the chromosomes are known.
Researchers rely on computers to find the matches between DNA sequences that let them
piece the code together.
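
To give a feel for what "uncovering matches" can mean in practice, here is a minimal
sketch of one simple kind of match: finding where the end of one DNA fragment overlaps
the start of another, then joining the two. It is an illustration only, not the method
used by Celera or the consortium. The function names, the toy fragments and the
three-letter minimum overlap are all assumptions made for the example.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` letters exists.
    """
    min_overlap = max(min_overlap, 1)  # an overlap of zero letters is not a match
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first and shrink until one fits.
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two fragments on their best overlap, or return None if none is found."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# Toy fragments invented for illustration (hypothetical data).
read_a = "ACGTTGCA"
read_b = "TGCAGGTC"
print(overlap_length(read_a, read_b))  # 4
print(merge_reads(read_a, read_b))     # ACGTTGCAGGTC
```

Comparing every fragment against every other in this way grows quickly with the number
of fragments, which is one reason a project of this size leans on large computing
facilities.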

Some observers suggest that the work is giving rise to a new discipline known as
bioinformatics, born of the wedding of computer science and biology.

For its part, Celera has hooked up DNA sequencers with a supercomputing facility
featuring 800 interconnected Compaq Alpha-based computer systems, each of which is
capable of performing more than 250 billion sequence comparisons per hour. Celera has
an alliance with Oracle for database development.

"The whole project has been about information acquisition and storage," said Bruce
Birren, assistant director of the Whitehead Sequencing Center in Cambridge, Mass., a
key participant in the Human Genome Sequencing Consortium.

"We've read out the four-letter code that represents the book of life," Birren said,
referring to the letters A, C, G and T, which stand for DNA's four basic chemical
components. "We've always studied one gene at a time, but our perspective is changed
because we now see the entire landscape. That takes computational ability."
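
A rough sense of the storage side of that remark comes from the fact that a
four-letter alphabet needs only two bits per letter. The sketch below packs a DNA
string into two bits per base and estimates the raw size of 3.12 billion bases, the
figure reported above. It is a back-of-the-envelope illustration under stated
assumptions: the names are hypothetical, it covers the bare letters only, and it says
nothing about how the sequencing groups actually stored their data.

```python
# Two bits are enough to tell four letters apart (assumed encoding, chosen arbitrarily).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(sequence: str) -> int:
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in sequence:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of `length` bases from its packed integer form."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])  # read the last base first
        value >>= 2
    return "".join(reversed(bases))


def packed_size_bytes(base_count: int) -> int:
    """Bytes needed at two bits per base, rounded up to a whole byte."""
    return (base_count * 2 + 7) // 8


GENOME_BASES = 3_120_000_000  # the 3.12 billion base pairs cited in the article

print(unpack(pack("GATTACA"), 7))       # GATTACA
print(packed_size_bytes(GENOME_BASES))  # 780000000, i.e. about 780 million bytes
```

The bare letters are therefore modest in size. The heavier computing load comes from
comparing and analyzing the sequence rather than from holding it.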

Substantial analytical work remains in the field, as researchers look to establish
possible links between specific genes and specific traits. That next stage of work can
be expected to drive further computing advances, even as computing advances drive
genome mapping forward.

"Now we're moving into a phase where interpreting [genetic] information is
going to require new analytical tools," Birren said. Researchers are already using a
mix of different advanced software technologies -- including neural networks, fuzzy
logic, and data smoothing -- to uncover patterns in the genetic data.

It will also be necessary to carefully match analytical and data management software
tools, said Michael Roberson, a program manager at the SAS Institute in Cary, N.C.

"One of the areas where SAS software has been used for a long time has been in the area
of clinical trials," he said.

On one level, Roberson explained, genetic data manipulation and management is similar
to traditional data mining and data warehousing tasks. But there are differences.

"In human genome work, data warehousing is made more complicated by the fact that the
data is very irregular and very large," he said. "When you're looking at this data in
relation to clinical trial data, it's much harder to take pieces of information from a
lot of sources and combine them as you would, for example, with a traditional credit
card information database. It's tricky data to work with, because the [techniques
associated with the] collection of the data tend to be different for each subject."
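
The difficulty Roberson describes can be pictured with a small sketch that combines
per-subject records from several sources when each source collects different fields.
This is a hypothetical illustration, not SAS software: the function name, the field
names and the sample records are all invented. It keeps every value along with the
source it came from, so that gaps and disagreements stay visible instead of being
silently overwritten.

```python
from collections import defaultdict


def combine_records(sources):
    """Merge per-subject records from several sources.

    `sources` maps a source name to a list of dicts, each with a "subject_id" key.
    The result maps subject id -> field name -> list of (source, value) pairs.
    Fields a source never collected are simply absent for that subject.
    """
    combined = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            subject = record["subject_id"]
            for field, value in record.items():
                if field == "subject_id":
                    continue
                combined[subject].setdefault(field, []).append((source_name, value))
    return dict(combined)


# Invented sample data: each source records a different set of fields.
sources = {
    "clinic": [
        {"subject_id": "S1", "age": 54, "response": "improved"},
        {"subject_id": "S2", "age": 61},
    ],
    "lab": [
        {"subject_id": "S1", "age": 55, "marker_demo": "AG"},
        {"subject_id": "S3", "marker_demo": "GG"},
    ],
}

merged = combine_records(sources)
print(merged["S1"]["age"])          # [('clinic', 54), ('lab', 55)]
print("response" in merged["S2"])   # False
```

In a credit card database every record has the same columns. Here the two sources
disagree on one subject's age and each is missing fields the other has, which is the
kind of irregularity the quote points to.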

Roberson said his group was looking at a new technology known as data smoothing, which
uses pattern recognition techniques to pick out true genetic markers from noisy data
sets. In May the SAS Institute spun off iBiomatics LLC as a wholly owned subsidiary to
meet the computing needs of researchers in the emerging life science industry.
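
The article does not say how SAS's data smoothing works. As a stand-in, the sketch
below uses a plain moving average, one of the simplest smoothing techniques, to show
the general idea of separating a sustained signal from an isolated noisy spike. The
function names, the sample values and the threshold are assumptions made for the
example.

```python
def moving_average(values, window=3):
    """Smooth a list of numbers with a centered moving average.

    Near the ends, the window shrinks to the values that are available.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        neighborhood = values[lo:hi]
        smoothed.append(sum(neighborhood) / len(neighborhood))
    return smoothed


def positions_above(values, threshold):
    """Indexes whose value meets or exceeds the threshold."""
    return [i for i, v in enumerate(values) if v >= threshold]


# Invented signal: a lone spike at index 1 and a sustained bump at indexes 4 to 6.
signal = [0.0, 9.0, 0.0, 0.0, 6.0, 7.0, 6.0, 0.0, 0.0]
smoothed = moving_average(signal, window=3)

print(positions_above(signal, 5.0))    # [1, 4, 5, 6]
print(positions_above(smoothed, 5.0))  # [5]
```

Before smoothing, the lone spike passes the threshold along with the sustained bump.
After smoothing, only the center of the bump remains.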
