Look, Ma. No ETL

July 13, 2010
One of the first things you learn about in business intelligence is ETL. Raw data gets harvested, washed and served. But Sandy Steier hadn’t heard.

Sandy had been busy analyzing data. For years on Wall Street, he pored over mortgage-backed securities with a tool he and peers developed for themselves.

He only learned of ETL recently. He’d become acquainted with a data architect with whom he shared a bus ride every day to and from their offices in downtown Manhattan. “I had never really spoken to him before,” Sandy recalls. “He was in a different world even though we both dealt with data.”

Sandy described to him his rapidly maturing tool. As I imagine the scene, the calm data architect suddenly twisted himself on the cramped bus seat to face Sandy. “You don’t do ETL? You work with raw data??”

No, he didn’t do any ETL, Sandy explained. “We didn’t realize how important that was,” he recalled. “We had always just stuck the raw data into the database and then realized, ‘Hey, this data’s a mess.’” He instructed users to clean it themselves. “You get the data from the horse’s mouth. You’re the expert. We didn’t realize how powerful this was.”

In Sandy’s system, you don’t worry about database design. He and his partners not only didn’t worry about ETL, they wondered how data analysis could be done any way but theirs: import first, clean later. “It makes good sense if you can get away with it.”

The crucial factor that lets the tool work as it does is speed. Speed allows the 1010Data engine to calculate and recalculate repeatedly, so the summaries that cubes harbor for anticipated queries are no longer necessary. Parallel processing with a columnar database runs fast enough to compute results on demand. In place of ETL, he uses what he now calls “ELTAR,” for extract, load, and transform as required.
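To make “load first, transform as required” concrete, here is a minimal sketch in Python. It is not 1010Data’s engine or API; the sample rows, column names and function names are all hypothetical, and a real system would rely on a parallel columnar store rather than plain lists. The point is only the order of operations: raw values are stored untouched, and cleaning happens inside the query that needs it.

```python
# Hypothetical raw feed, stored exactly as received (nothing is cleaned at load time).
RAW_ROWS = [
    {"ticker": " ibm ", "price": "125.50"},
    {"ticker": "IBM", "price": "126.00"},
    {"ticker": "msft", "price": "n/a"},
    {"ticker": "MSFT ", "price": "25.00"},
]


def load(rows):
    """Load step: keep the raw values untouched, one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def clean_price(text):
    """Transform applied only when a query asks for it; None if unparseable."""
    try:
        return float(text)
    except ValueError:
        return None


def average_price_by_ticker(columns):
    """Query: normalize tickers and prices on the fly, then aggregate."""
    totals = {}
    for ticker, price in zip(columns["ticker"], columns["price"]):
        value = clean_price(price)
        if value is None:
            continue  # skip rows this particular query cannot use
        key = ticker.strip().upper()
        total, count = totals.get(key, (0.0, 0))
        totals[key] = (total + value, count + 1)
    return {key: total / count for key, (total, count) in totals.items()}


table = load(RAW_ROWS)
result = average_price_by_ticker(table)
print(result)
```

Because the stored columns stay raw, a different analyst could apply a different cleaning rule to the same data without reloading anything. The cost is that every query repeats the cleanup, which is why the approach depends on the engine being fast.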

One hurdle, he says, is the conventional beliefs held by his sales prospects. In a recent phone call, he explained to a prospect that ETL was unnecessary. The man replied, “That’s not credible.” In fine sales form, Sandy said, “Then you’ll be impressed when I prove it to you.” The prospect replied more firmly, “You don’t understand. That’s not credible.”

Actually, the technology’s credibility doesn’t matter much. The company, 1010Data, offers reporting and analytics in the cloud, invisible to customers except for the results. Sandy says, “We could have monkeys writing on scratchpads.” To those willing to try, he offers to prove his claim with the prospect’s own data.

Their technology’s speed allows a team of a few people to do the work of dozens, he says, and to finish in weeks large data warehouse projects that would otherwise take months or years. If multiple customers use the same data, such as stock market data, the time required is even less.

All without ETL.