Calculating the Soft Costs of Hadoop

kingmesal
7 Min Read

Estimating the total cost of ownership (TCO) for Hadoop has been a challenge. Costs can be hidden or unanticipated during the planning and deployment stages. But time, and good analytics, have shown that TCO can vary by as much as 50% between Hadoop distributions. Architecture makes all the difference, particularly when it comes to the soft costs of the Hadoop environment. Choose wisely, or your cost-effective Big Data deployment may turn into a huge waste of money and resources.

The Cost of Expertise

The rush to deploy Hadoop is understandable: what enterprise wouldn’t want a solution that removes the restrictions on how much data a business can store, process, and analyze? Information can be collected from assorted touch points and integrated with vast amounts of previously underutilized or ignored data. New discoveries can be made from seemingly disparate data points, creating actionable information that can provide that much-desired competitive edge. Bottom line: you can never know too much about your market, and Big Data, when properly utilized, surfaces correlations that humans simply cannot pull out of the unceasing flow of data on their own.

But Hadoop can be a bit of a wild card. To get the most out of any Big Data project, and to avoid unpleasant surprises down the line, businesses should go into the Hadoop adoption process with eyes wide open. Among the most critical questions to answer is whether the skill set needed to deploy, maintain, and secure Hadoop exists within the organization.

After adopting Hadoop, many companies quickly realize that they simply do not have the in-house expertise to make it run smoothly, let alone to make Hadoop enterprise-ready. At a bare minimum, success with Hadoop requires an IT staff that understands block size configurations, knows what to do if a NameNode is lost, and comprehends the ins and outs of HBase and its less-than-straightforward interactions with the Hadoop Distributed File System (HDFS).
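Even a seemingly mundane detail such as the HDFS block size illustrates the point. The sketch below is a minimal example assuming the standard Apache Hadoop client API and a reachable cluster (the file path is purely illustrative); it reads the cluster’s default block size and what an existing file actually uses, the kind of routine check an administrator needs to be comfortable making.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeCheck {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Cluster default block size for new files (dfs.blocksize).
        System.out.println("Default block size: "
                + fs.getDefaultBlockSize(new Path("/")) + " bytes");

        // What an existing file actually uses; the path is illustrative.
        FileStatus status = fs.getFileStatus(new Path("/data/example/events.log"));
        System.out.println("File block size:    " + status.getBlockSize() + " bytes");
        System.out.println("Replication factor: " + status.getReplication());
    }
}
```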

An enterprise with Hadoop wizards on staff can essentially choose any distribution and end up with an agile, robust deployment. Most businesses, though, underestimate how much time and effort can go into a Hadoop project. And if the necessary chores aren’t carried out, the deployment will cease to be useful very quickly.

In one example cited by CITO Research in the recent white paper “Five Questions to Ask Before Choosing a Hadoop Distribution,” a manufacturing company that had deployed Hadoop estimated that, shortly after deployment, it was using less than 4% of its Big Data. Due to configuration issues that could not be addressed by the company’s in-house IT, users experienced constant downtime during NameNode bottlenecks and upgrades. Many analysts simply opted not to use Hadoop because of these challenges, meaning that all the data in Hadoop was going to waste. The company then deployed another distribution of Hadoop and was able to utilize over 75% of its Big Data.

The Cost of Security and Backup

Before a company begins to utilize captured data in new ways, it must ensure that all personally identifiable and sensitive information is classified, and then managed and protected to help ensure privacy and security. New policies may need to be developed to address these issues. User access controls and roles may need to be redefined and implemented. Employees and executives may need to receive training. These are all costly endeavors, but a properly deployed Hadoop distribution should deliver enough benefit to more than compensate for the start-up costs.

Unfortunately, some deployments will demand significant resource and financial investments, offsetting benefits and raising their TCO dramatically over time. As an example, Hadoop’s security features are generally not robust or flexible enough for enterprise use. For compliance and risk management reasons, virtually all enterprises need multiple levels of encryption: disk encryption for data at rest and wire-level encryption for data traveling between nodes in the cluster. Apache Hadoop offers neither form of encryption.

To greatly reduce TCO, enterprises will want to look for a distribution that supports native wire-level encryption implemented with public-private key pairs and disk encryption, along with support for the authentication methods already used by their other applications.
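As a rough illustration, the snippet below sets the stock Apache Hadoop properties for protecting RPC traffic and the HDFS data-transfer path. Whether these settings are available, sufficient, or superseded by a vendor’s own mechanism depends on the distribution and version, so treat the property names and values as an assumption to verify against your distribution’s documentation. In practice these values live in core-site.xml and hdfs-site.xml on every node rather than in application code.

```java
import org.apache.hadoop.conf.Configuration;

public class WireEncryptionSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Protect Hadoop RPC traffic: "privacy" adds confidentiality
        // on top of authentication and integrity (SASL QOP).
        conf.set("hadoop.rpc.protection", "privacy");

        // Encrypt the HDFS block data-transfer protocol between
        // clients and DataNodes.
        conf.setBoolean("dfs.encrypt.data.transfer", true);

        System.out.println("hadoop.rpc.protection = "
                + conf.get("hadoop.rpc.protection"));
        System.out.println("dfs.encrypt.data.transfer = "
                + conf.getBoolean("dfs.encrypt.data.transfer", false));
    }
}
```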

Enterprises also, rather obviously, need backup capabilities for their Hadoop clusters. Here again, hidden costs can arise well after deployment. The replication process in HDFS offers protection from disk failure, but it is not immune to human errors. Further, if a file is corrupted, that corrupted file will be automatically replicated across the cluster, exacerbating the problem.

When you roll new code into production, you need a backup in case something goes awry and you need to roll the system back. Apache Hadoop doesn’t offer this capability. To avoid this type of exposure, and the costs in lost productivity that accompany it, businesses should consider a commercial Hadoop distribution with snapshot capabilities. Users can take point-in-time snapshots of every file and table. If an error occurs, the snapshot can be restored.
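As a sketch of what that workflow can look like, the example below takes a point-in-time snapshot of a data directory before a release, assuming a distribution and version that expose snapshot support through the standard FileSystem API; the directory path and snapshot name are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotBeforeDeploy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Directory the new code will modify (illustrative path).
        Path dataDir = new Path("/data/pipeline/output");

        // Take a point-in-time snapshot before rolling out the release.
        // (On snapshot-enabled HDFS an admin must first allow snapshots
        // on the directory, e.g. via hdfs dfsadmin -allowSnapshot.)
        Path snapshot = fs.createSnapshot(dataDir, "pre-release-1");
        System.out.println("Snapshot created at: " + snapshot);

        // If the rollout corrupts the data, the snapshot contents remain
        // readable under <dir>/.snapshot/<name> and can be copied back;
        // once the release is verified, the snapshot can be dropped:
        // fs.deleteSnapshot(dataDir, "pre-release-1");
    }
}
```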

Your TCO and Hadoop: Calculating the Real Cost

While general TCO issues can be examined and predicted, the true TCO of any given deployment is unique to each enterprise. Skill sets, regulatory requirements, disaster recovery, and many other factors come into play.

A good place to begin the evaluation (or redeployment) process is with the Hadoop TCO Calculator, which provides a personalized overview of the true costs of deploying and running various distributions of Hadoop. The self-service tool uses your own data, and you can change the inputs in real time to estimate costs across a number of variables and scenarios. Access the Hadoop TCO Calculator here.
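For readers who want to rough out the numbers before (or alongside) using the calculator, the sketch below captures the basic idea: annual TCO as the sum of hardware, licensing, administration, and downtime costs, compared across two hypothetical distributions. Every figure and category here is invented for illustration; none of it comes from the CITO Research paper or the calculator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SimpleTcoModel {
    /** Sums annual cost categories into a single TCO figure. */
    static long totalTco(Map<String, Long> annualCosts) {
        return annualCosts.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        // Hypothetical annual costs (USD) for two distributions on the same cluster.
        Map<String, Long> distA = new LinkedHashMap<>();
        distA.put("hardware", 400_000L);
        distA.put("licensing", 0L);             // community edition
        distA.put("administration", 600_000L);  // e.g. 4 full-time admins
        distA.put("downtime", 250_000L);        // NameNode outages, upgrades

        Map<String, Long> distB = new LinkedHashMap<>();
        distB.put("hardware", 400_000L);
        distB.put("licensing", 200_000L);       // commercial support
        distB.put("administration", 300_000L);  // e.g. 2 full-time admins
        distB.put("downtime", 50_000L);

        long a = totalTco(distA);
        long b = totalTco(distB);
        System.out.printf("Distribution A TCO: $%,d%n", a);
        System.out.printf("Distribution B TCO: $%,d%n", b);
        System.out.printf("Difference: %.0f%%%n", 100.0 * (a - b) / a);
    }
}
```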
