Stock service's data warehouse now delivers in seconds, not hours
Until a year ago, Pink OTC Markets Inc. had a problem. The company, which operates the Pink Sheets over-the-counter stock quotation service and the OTCQX service for foreign stocks to trade in the United States, couldn't deliver the reports its customers wanted.
Companies whose stock prices were quoted on the exchanges wanted data to understand why their stock price moved, as did trading firms. Head traders, meanwhile, wanted to get data to analyze the performance of traders working underneath them, while compliance officers wanted the same data to make sure they were following beefed-up SEC regulations.
The company used an Informix database from IBM to process stock trades and quotes as well as store it for data warehousing purposes. But reports took several hours to generate. And without a Web front-end, they would have to be created by a Pink OTC employee for every ad hoc inquiry, said Rahul Bose, director of application development for Pink OTC.
Last year, Pink OTC began testing a data warehouse using a columnar database from Vertica Systems Inc. and high-speed data-integration software from SyncSort Inc. "Reports that used to take two hours to run now can show up 10-15 seconds later," said Bose.
The reports can be generated on stock quotes and trades executed as recent as 5 seconds ago. That virtual real-time performance is enabled by SyncSort's DMExpress software, which the vendor says can load 5.4 TB of data into the Vertica data warehouse in an hour, and the three server cluster now running Vertica in production.
Not that Pink OTC needs that kind of performance, yet. Its billion record database takes up about half a terabyte. But the firm is adding several million records a day, a rate of data volume that should grow, said Bose, as the markets recover and as Pink OTC continues to attract more large foreign firms to OTCQX.
OTXQX lets foreign stocks make themselves available to U.S. investors without having to, for example, list as an American Depository Receipt (ADR) on the New York Stock Exchange. This allows them to avoid more direct SEC scrutiny and the extra costs associated with Sarbanes-Oxley.
Firms listing on OTCQX already have a combined market capitalization of $420 billion, said Bose. They include Benetton, Adidas and Roche, said Bose.
Pink OTC looked at three main alternatives to Vertica-SyncSort: a separate Informix instance tuned for data warehousing, the InfoBright columnar database, and a popular capital markets application called OneTick, said Bose. The firm eliminated OneTick because it didn't support the JDBC API for Java. Pink OTC is a Java shop.
The firm then eliminated Informix because of the cost. According to Bose, Informix licenses combined with a Storage Area Network (SAN) he felt would be needed to "get good performance" with Informix would have brought the cost to more than a million dollars, compared to the several hundred thousand it spent on Vertica and its cluster.
That cluster is made up of Sun servers with dual quad-core CPUs and the Red Hat Linux operating system, using conventional hard drives.
"In the grand scheme of things, these are pretty low-end machines," he said.
It then chose Vertica over InfoBright for several reasons. InfoBright, while easier to use, "doesn't give you as many knobs and dials" to enable performance optimization, he said, though he cautioned this was based on a bakeoff from last year.
While InfoBright was roughly as fast as Vertica out of the box, it lagged by up to a factor of ten after Vertica was optimized, said Bose.
Vertica was also easier to set up in a cluster than InfoBright, Bose said, and can continue to run even when one node crashes, is taken down, and then rejoined.