Data Lakes and Network Optimization: What’s Next for Telecommunications and Big Data
Relational data warehouses served communications service providers (CSPs) well in the past, but it’s time to start thinking beyond columns and rows. Unstructured data will be the fuel that powers risk management and decision-making in the near future. To get the most out of data of every kind, we need new ways to store, access and analyze it.
Data Lakes and Sandboxes
A data lake (also called an enterprise data hub) is a massive data repository, typically based on a Hadoop architecture and housed on a commodity hardware cluster. It can not only solve the problems of data storage, integration and accessibility but also enable better real-time analysis and decision-making. Information (structured, semi-structured and unstructured) stored in a data lake retains its native format and original attributes, so it is preserved intact for future use.
You can also create a more narrowly defined data lake, usually referred to as a data sandbox, for a project with a defined scope.
One potential pitfall to be aware of before you dive into the creation of a data lake is that these deployments may be best used by data analysis experts. A huge amount of unstructured data with ungoverned metadata can be a challenge for the ordinary user. Sandboxes are a good way to test the data lake environment, and can also be utilized to gain many of the benefits of a data lake without a massive migration project.
Security can also be an issue in a data lake. Speak with IT and your data privacy team to determine what data can go into the lake and how you can protect information from unauthorized users. Regulatory compliance issues can also arise. You will need to develop a process that keeps personally identifiable data controlled and protected, or run the risk of data exposure. Enterprise-grade solutions can provide the tools you need to secure a data lake, but every company needs to determine its own acceptable exposure to risk and govern its data accordingly.
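One way to keep personally identifiable data controlled is to pseudonymize it before it lands in the lake. The following is a minimal sketch in Python, not a complete privacy solution. The field names, the sample record and the key are all hypothetical, and a real deployment would manage the key through whatever secrets tooling your security team approves.

```python
import hashlib
import hmac

# Hypothetical field names; a real record layout would come from your own systems.
PII_FIELDS = {"subscriber_id", "msisdn"}


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` with PII fields replaced by keyed SHA-256 hashes."""
    masked = {}
    for field, value in record.items():
        if field in PII_FIELDS:
            # A keyed hash (HMAC) gives the same token for the same input,
            # so records can still be joined without exposing the raw value.
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            masked[field] = digest.hexdigest()
        else:
            masked[field] = value
    return masked


# Illustrative record only; the values are made up.
record = {
    "subscriber_id": "A-1001",
    "msisdn": "15550100",
    "cell_id": "C42",
    "dropped_calls": 3,
}
masked = pseudonymize(record, secret_key=b"example-key-not-for-production")
```

Because the same input always produces the same token, analysts can still count and correlate activity per subscriber, while the raw identifiers stay out of the lake.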
Responsive Network Management
One initial data lake project for a CSP to consider is a responsive network management initiative. Your company is probably already doing this to a greater or lesser degree, but a data lake would enable analysts to work with the huge amounts of historical and real-time data generated by switches, routers and other infrastructure, and to understand the big network picture both past and present.
This information can be correlated to spot trends, patterns and predictable behavior, and so improve overall efficiency. You might opt to begin with a test project such as determining whether and where to add infrastructure to meet SLAs, allocating bandwidth in real time, addressing Quality-of-Service (QoS) issues such as latency and reliability, or predicting network component failures before they happen.
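As a rough illustration of such a test project, the sketch below flags devices whose recent average latency exceeds an SLA threshold. It is written in Python under stated assumptions: the threshold, the window size, the device names and the sample values are all hypothetical, and a production system would read from the lake rather than from an in-memory list.

```python
from collections import defaultdict, deque
from statistics import mean

SLA_LATENCY_MS = 50.0  # hypothetical SLA threshold
WINDOW = 3             # number of most recent samples to average per device


def flag_sla_risks(samples, threshold_ms=SLA_LATENCY_MS, window=WINDOW):
    """Return the ids of devices whose mean latency over their most recent
    `window` samples exceeds `threshold_ms`.

    `samples` is an iterable of (device_id, latency_ms) tuples in time order.
    """
    # Each deque keeps only the newest `window` readings for one device.
    recent = defaultdict(lambda: deque(maxlen=window))
    for device_id, latency_ms in samples:
        recent[device_id].append(latency_ms)
    return sorted(
        device_id
        for device_id, values in recent.items()
        if len(values) == window and mean(values) > threshold_ms
    )


# Made-up readings for two hypothetical routers.
samples = [
    ("router-a", 20.0), ("router-b", 45.0),
    ("router-a", 22.0), ("router-b", 58.0),
    ("router-a", 21.0), ("router-b", 61.0),
]
at_risk = flag_sla_risks(samples)
```

With these sample values only `router-b` is flagged, since its three most recent readings average above 50 ms. Averaging over a window rather than reacting to a single reading is a simple way to avoid alerting on one-off spikes.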
Data That You Can Trust
Obviously, when you are making business shifts based on what your data is telling you, you want to know that you can trust that data. Speak with the people who do much of the data analysis in your company to find out how they judge data quality now. Do they rely somewhat on the lineage of that data? Do they find value in reviewing how other analysts have worked with that same data set? Finding out which metadata your analysts need, and which they can do without, is a key step in planning the data lake.
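To make the idea of lineage concrete, here is a minimal sketch in Python of the kind of metadata record an analyst might want attached to a data set. The class, its fields and the sample values are hypothetical and are not taken from any particular metadata tool.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetLineage:
    """Hypothetical lineage record attached to a data set in the lake."""
    name: str
    source_system: str
    transformations: list = field(default_factory=list)
    used_by: list = field(default_factory=list)

    def record_step(self, analyst: str, description: str) -> None:
        """Append a transformation step so later analysts can review it."""
        self.transformations.append({"analyst": analyst, "description": description})
        # Track each analyst once, in order of first use.
        if analyst not in self.used_by:
            self.used_by.append(analyst)


# Made-up data set and steps for illustration.
lineage = DatasetLineage(name="router_latency_raw", source_system="network-probes")
lineage.record_step("analyst_1", "dropped samples with missing timestamps")
lineage.record_step("analyst_2", "aggregated to 5-minute averages")
lineage.record_step("analyst_1", "joined with cell-site inventory")
```

A record like this answers both questions above: where the data came from, and how other analysts have already worked with it.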
Data lakes suit the needs of those who can sort out the contextual bias of data captured from multiple sources and comfortably merge and reconcile information from structured, semi-structured and unstructured sources.
Choosing the Right Architecture
Apache Hadoop is ideally suited to the data lake or sandbox scenario. It runs on commodity hardware, it provides cost-effective storage and it processes massive amounts of data of any type very efficiently.
Enterprise distributions of Hadoop add more effective backup options and mission-critical robustness.
Hadoop also provides a platform that can be built on. As we move away from discount-driven competition, CSPs will increasingly rely on leveraging big data to tightly target customer offerings. And as we move toward an ever more seamless integration between all communications services, the data that flows from all those sources will serve as a great foundation.
If you’re interested in learning more about how Hadoop can help your business, be sure to download the free ebook “The Executive’s Guide to Big Data & Apache Hadoop”.
Currently I'm the Sr. Product Marketing Manager at MapR Technologies. I have 10+ years of experience in the technology industry in marketing, sales and consulting. I have an executive MBA from the Fuqua School of Business, Duke University. My domain of expertise is business intelligence & analytics.