Built mainly on open-source software, the ASCO's CancerLinQ project is a "learning health system" that will eventually analyze data from millions of cancer patients via their EHRs. The prototype system ingests de-identified patient data from two dozen oncology practices.
"We architected the system in such a way as to be able to accept any data in any format and then we used machine-learning algorithms to identify what was sent to us," Hauser said.
Once in the database, the data is mapped to a standardized medical vocabulary, such as the World Health Organization's International Classification of Diseases (ICD).
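The mapping step can be illustrated with a minimal sketch. It is not CancerLinQ's implementation: the article says the project uses machine-learning algorithms to identify incoming data, while this example swaps in a plain dictionary lookup to keep the idea visible. The table `ICD10_BY_TERM`, the helper names, and the sample records are all hypothetical; the codes shown are ICD-10 category codes.

```python
from typing import Optional

# Hypothetical lookup table: free-text diagnosis -> ICD-10 category code.
# A real system would need a far larger vocabulary and fuzzier matching.
ICD10_BY_TERM = {
    "breast cancer": "C50",
    "malignant neoplasm of breast": "C50",
    "lung cancer": "C34",
    "colon cancer": "C18",
    "prostate cancer": "C61",
}


def normalize(term: str) -> str:
    """Lowercase the term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def map_to_icd10(term: str) -> Optional[str]:
    """Return the ICD-10 category for a diagnosis string, or None if unknown."""
    return ICD10_BY_TERM.get(normalize(term))


# Hypothetical de-identified records as they might arrive from a practice.
records = [
    {"patient": "p1", "diagnosis": "Breast  Cancer"},
    {"patient": "p2", "diagnosis": "lung cancer"},
    {"patient": "p3", "diagnosis": "unspecified mass"},
]

# Attach a standardized code to each record; unmapped terms stay None
# so they can be flagged for review instead of being guessed at.
coded = [dict(r, icd10=map_to_icd10(r["diagnosis"])) for r in records]
```

Leaving unrecognized terms as `None` rather than forcing a nearest match keeps uncertain records out of any downstream comparison of treatments and outcomes.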
While the prototype was built just as a proof of concept, cancer doctors will eventually be able to consult the full-scale database like a Google search. That will allow doctors to see how patients with the same types of cancer were treated around the country, and how they fared.
While it currently uses CouchDB, a NoSQL database, as its back end, the ASCO is considering using Cassandra with Hadoop for the full build. The full database is expected to be completed in 12 to 18 months.
Beyond helping an individual patient, big data will allow the healthcare community as a whole to detect poor drug interactions quickly. "So this gives us the ability to look at that [common cancer] population and figure out the best dosages and cycles of treatment," Hauser said.
While the ASCO is among the largest cancer research organizations, it is by no means alone in its use of big data in determining best practices.
Cleveland Clinic - a 4,500-bed healthcare system - uses an EHR from Epic Systems and a SQL transactional database for retrospective data analysis of its EHRs to improve patient treatment.
"We think first about outcomes: what data can we collect and make available to clinicians so they know how well they're doing in treating their patient," said Dr. C. Martin Harris, CIO of Cleveland Clinic.
Cleveland Clinic is also starting to use Hadoop, but it's still a small part of the research because the data is confined to the clinic's own systems.
"It may appear if we only analyze Cleveland Clinic data that we're doing well with regard to a patient, but in fact if the patient went to someone else's emergency room 10 times, we didn't know that," Harris said.
Cleveland Clinic is working with other state health plans to collect a broader swath of patient data. Along with University Hospitals, another of Ohio's largest healthcare providers, Cleveland Clinic is preparing to share data across Ohio's statewide electronic medical records exchange, CliniSync.
Once on the exchange, Cleveland Clinic will be electronically linked to 21 other hospitals already using the system.