We data folks live in exciting times.

As we saw at the Strata + Hadoop show in San Jose, open source developers continue to deliver new ways to analyze high volumes of fast-moving data.

One of the hottest of these projects, Apache Kafka, can feed data from thousands of applications to emerging platforms like HBase and Cassandra. Enterprises can use Kafka message brokers to tap real-time data streams from myriad sources and address a range of use cases. They can:

  • Engage prospects who visit their websites, based on highly granular activity tracking
  • Correlate transaction histories with store sensors and smartphone apps to make location-based retail offers to customers
  • Manage supply chains and product shipments based on real-time location checks, operational metrics and traffic patterns

Cool stuff!

But innovations like Kafka also raise questions about the role of the enterprise data warehouse (EDW). The EDW is the reliable system of record, and it will often be a source, but not a target, for streaming use cases like the examples above. The short answer? EDWs must coexist with a growing number of complementary platforms.

“Accept that the world will get more distributed,” Gartner VP Ted Friedman advised attendees at the Gartner Enterprise Information & Master Data Management Summit in Dallas in March.

Enterprises of all types are re-examining long-held assumptions about the EDW. Recently a global financial services organization in EMEA speculated to us that its Data Lake might become its central data “hub,” with EDWs serving as “spokes” for various lines of business.

While the merits of such a strategy will vary by customer, we find that the most successful enterprises follow some consistent guiding principles:

  • Hadoop is a must-have architecture component. Hadoop’s ability to cost-effectively process fast-growing volumes of structured and unstructured data has proven a powerful complement to the data warehouse. In many cases EDWs remain the system of record, and Hadoop serves as the analytics testbed for new data types and use cases. But perhaps the most compelling reason to invest in a Data Lake now is to tap future innovation. Hadoop is the focal point for Apache open source contributions such as Spark, Kafka and Storm. Join the community today to capitalize on future game changers.
  • The customer is king. Analyst George Gilbert of Wikibon envisions the rise of “systems of engagement” that enable enterprises to identify, predict and shape individual customer experiences. To do this, enterprises analyze historical and real-time customer activity across multiple channels so they can act most effectively on their 360-degree view of each customer. As a case in point, the Canada-based digital bank Tangerine integrates social media, emails and customer records stored in SQL Server to improve both real-time customer service and longer-term product offerings.
  • Data analytics can and should become a profit center. Ted Friedman and Debra Logan of Gartner predicted at the Gartner conference in Dallas that by 2020, half of enterprises will “successfully” link financial objectives to data and analytics, and 10 percent will “have a highly profitable business unit specifically for productizing and commercializing their information assets.” So while platforms will proliferate, leading enterprises will assign teams to put the pieces together to capitalize on digital insights.
  • Go to war for talent. Analytics initiatives are only as good as the people who drive them. The leaders are doing what it takes to win talent for Apache Hadoop, Kafka and Spark. Even in today’s tight job market, it is easier to find and hire experts with the right skills than to develop that expertise exclusively in-house.
  • Automate. Automate. Automate. With so many new tasks requiring deep expertise, it is critical to take the manual labor out of repetitive tasks like ETL and creating and managing data warehouses wherever possible.

While some of these principles seem revolutionary, enterprises can and should take an incremental approach, continuously re-shaping and experimenting.