November 18, 2011, 3:11 AM — Scientists at Cambridge University are developing a computer system that can read vast amounts of scientific literature, make rapid connections between facts and develop hypotheses.
Cambridge University says most biomedical scientists can't manage to keep on top of reading all of the publications in their field, let alone an adjacent field. Cambridge points out that the US National Library of Medicine's biomedical bibliographic database now lists over 19 million records and adds up to 4,000 new records daily.
It says that for a prolific field such as cancer research, the number of publications could quickly become unmanageable and important hypothesis-generating evidence may be missed.
With these problems in mind, the university is now developing a computer system to help scientists in the biomedical field.
To be useful, says Cambridge, such a system would need to trawl through the literature in the same way that a scientist would. It would need to read literature to uncover new knowledge, evaluate the quality of the information, look for patterns and connections between facts, and then generate hypotheses to test.
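The workflow described above — read the literature, extract facts, connect them, and propose untested links — can be sketched in miniature. The snippet below is purely illustrative and is not Cambridge's actual system: the toy "abstracts", the fixed term list, and the simple substring matching are all assumptions standing in for real natural-language processing. It treats a hypothesis as a pair of terms that are never mentioned together directly but share a common partner in the literature.

```python
from itertools import combinations
from collections import Counter

# Toy stand-ins for biomedical abstracts (illustrative only).
ABSTRACTS = [
    "chemical_x is linked to oxidative stress in liver cells",
    "oxidative stress contributes to liver toxicity",
]

# A fixed vocabulary of terms of interest (a real system would extract these).
KEY_TERMS = {"chemical_x", "liver toxicity", "oxidative stress"}

def extract_facts(text):
    """Return the key terms mentioned in one abstract."""
    return {term for term in KEY_TERMS if term in text}

def connect(abstracts):
    """Count how often pairs of terms co-occur across the literature."""
    pairs = Counter()
    for text in abstracts:
        for a, b in combinations(sorted(extract_facts(text)), 2):
            pairs[(a, b)] += 1
    return pairs

def hypothesise(pairs):
    """Propose links between terms never seen together directly."""
    terms = {t for pair in pairs for t in pair}
    return [
        pair
        for pair in combinations(sorted(terms), 2)
        if pair not in pairs
    ]

pairs = connect(ABSTRACTS)
# chemical_x and liver toxicity never co-occur directly, but both
# co-occur with oxidative stress, so the link is proposed as a hypothesis.
print(hypothesise(pairs))  # → [('chemical_x', 'liver toxicity')]
```

This "shared-partner" heuristic is a deliberately crude version of literature-based discovery; the point is only to make the read-connect-hypothesise loop concrete.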
Not only would such a program speed up the progress of scientific discovery but, with the capacity to consider vast numbers of factors, it might even discover information that could be missed by the human brain, said Cambridge.
Dr Anna Korhonen and a team of researchers at Cambridge's Natural Language and Information Processing Group are aiming to develop systems that can understand written language in the same way that humans do. One of the projects Korhonen is involved in has recently developed a method of "text mining" the area of cancer risk assessment of chemicals, one of the most literature-dependent areas of biomedicine.
Every year, thousands of new chemicals are developed, any one of which might pose a potential risk to human health. Korhonen said: "The first stage of any risk assessment is a literature review, which is a major bottleneck as there could be tens of thousands of articles for a single chemical. Performed manually it's expensive and because of the rising number of publications it's becoming too challenging to manage."
Her team has developed the CRAB tool in collaboration with Professor Ulla Stenius' group at the Institute of Environmental Medicine at Sweden's Karolinska Institutet.
The CRAB approach involves developing programs that can analyse natural language texts, despite their complexity, inconsistency and ambiguity. The CRAB technology is billed as the first text-mining tool aimed at aiding literature reviews in chemical risk assessments.
At the press of a button, a profile is rapidly built for any given chemical from all of the available literature, describing highly specific patterns of connections between that chemical and toxicity.
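A chemical profile of this kind can be imagined as a tally of toxicity-related evidence per category. The sketch below is a hypothetical illustration, not the CRAB tool itself: the category names, cue words, and example abstracts are all invented for demonstration, and real text mining would go far beyond substring matching.

```python
from collections import Counter

# Hypothetical toxicity categories and cue phrases (not CRAB's taxonomy).
CATEGORIES = {
    "carcinogenicity": ["tumour", "carcinoma"],
    "genotoxicity": ["dna damage", "mutation"],
}

def build_profile(chemical, abstracts):
    """Tally, per category, how many abstracts link the chemical to a cue phrase."""
    profile = Counter()
    for text in abstracts:
        text = text.lower()
        if chemical not in text:
            continue  # skip literature about other compounds
        for category, cues in CATEGORIES.items():
            if any(cue in text for cue in cues):
                profile[category] += 1
    return profile

# Invented example abstracts for demonstration.
docs = [
    "Benzidine exposure induced bladder carcinoma in exposed workers.",
    "Benzidine caused DNA damage in cultured cells.",
    "Unrelated study of another compound.",
]
print(build_profile("benzidine", docs))
# → Counter({'carcinogenicity': 1, 'genotoxicity': 1})
```

Aggregating such counts over tens of thousands of articles is what turns an unmanageable literature review into a summary a risk assessor can actually read.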