Calculating the Soft Costs of Hadoop

kingmesal

Estimating the total cost of ownership (TCO) for Hadoop has always been a challenge. Costs can be hidden or unanticipated during the planning and deployment stages. But time, and good analytics, have shown that TCO can vary by as much as 50% between Hadoop distributions. Architecture makes all the difference, particularly when it comes to soft costs: the expense of running and maintaining the Hadoop environment. Choose wisely, or your cost-effective Big Data deployment may turn into a huge waste of money and resources.

The Cost of Expertise

The rush to deploy Hadoop is understandable: what enterprise wouldn't want a solution that removes the restrictions on how much data a business can store, process, and analyze? Information can be collected from assorted touch points and integrated with vast amounts of previously underutilized or ignored data. New discoveries can be made from seemingly disparate data points, creating actionable information that can provide that much-desired competitive edge. Bottom line: you can never know too much about your market, and Big Data, when properly utilized, provides insights that humans simply can't extract unaided from the unceasing flow of data.

But Hadoop can be a bit of a wild card. To get the most out of any Big Data project, and to avoid unpleasant surprises down the line, businesses should go into the Hadoop adoption process with eyes wide open. Among the most critical questions to answer is whether the right skill set to deploy, maintain, and secure Hadoop exists within the organization.

After adopting Hadoop, many companies quickly realize that they simply do not have the in-house expertise to make it run smoothly, let alone to make Hadoop enterprise-ready. At a bare minimum, success with Hadoop requires an IT staff that understands block size configurations, knows what to do if a NameNode is lost, and comprehends the ins and outs of HBase and its less-than-straightforward interactions with the Hadoop Distributed File System (HDFS).
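To make that concrete, here is a minimal sketch of the sort of block-size and replication audit such a staff runs routinely, using the standard Hadoop Java API. The /data/ingest path and the 128 MB fallback are illustrative assumptions, not anything prescribed by the article.

```java
// Hypothetical sketch: the kind of low-level HDFS housekeeping an in-house
// team must be comfortable with. Paths and values are illustrative only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeAudit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml
        // dfs.blocksize is the cluster-wide default; a poor choice here is a
        // classic source of NameNode pressure (too many small blocks).
        long defaultBlockSize = conf.getLongBytes("dfs.blocksize", 128 * 1024 * 1024);
        System.out.printf("Default block size: %d MB%n", defaultBlockSize >> 20);

        FileSystem fs = FileSystem.get(conf);
        for (FileStatus st : fs.listStatus(new Path("/data/ingest"))) {
            // Per-file block size and replication can diverge from the default;
            // auditing them is routine care-and-feeding for a Hadoop cluster.
            System.out.printf("%s  blockSize=%d  replication=%d%n",
                    st.getPath(), st.getBlockSize(), st.getReplication());
        }
    }
}
```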

An enterprise with Hadoop wizards on staff can essentially choose any distribution and end up with an agile, robust deployment. Most businesses, though, underestimate how much time and effort can go into a Hadoop project. And if the necessary chores aren’t carried out, the deployment will cease to be useful very quickly.

In one example cited by CITO Research in the recent white paper "Five Questions to Ask Before Choosing a Hadoop Distribution," a manufacturing company that had deployed Hadoop estimated that, shortly after deployment, it was using less than 4% of its Big Data. Due to configuration issues that could not be addressed by the company's in-house IT, users were experiencing constant downtime from NameNode bottlenecks and upgrades. Many analysts simply opted not to use Hadoop because of these challenges, meaning that all the data in Hadoop was going to waste. The company deployed another distribution of Hadoop and was able to utilize over 75% of its Big Data.

The Cost of Security and Backup

Before a company begins to utilize captured data in new ways, it must ensure that all personally identifiable and sensitive information is classified, and then managed and protected to help ensure privacy and security. New policies may need to be developed to address these issues. User access controls and roles may need to be redefined and implemented. Employees and executives may need to receive training. These are all costly endeavors, but a properly deployed Hadoop distribution should deliver enough benefits to more than compensate for these start-up costs.

Unfortunately, some deployments will demand significant resource and financial investments, offsetting benefits and raising their TCO dramatically over time. As an example, Hadoop's security features are generally not robust or flexible enough for enterprise use. For compliance and risk management reasons, virtually all enterprises need multiple levels of encryption: disk encryption to protect data at rest, and wire-level encryption to secure data traveling between nodes in the cluster. Apache Hadoop offers neither form of encryption.

To greatly reduce TCO, enterprises will want to look for a distribution that supports native wire-level encryption implemented using public-private key pairs and disk encryption capabilities, along with authentication methods supported for other applications.

Enterprises also, rather obviously, need backup capabilities for their Hadoop clusters. Here again, hidden costs can arise well after deployment. The replication process in HDFS offers protection from disk failure, but it is not immune to human errors. Further, if a file is corrupted, that corrupted file will be automatically replicated across the cluster, exacerbating the problem.
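As a concrete illustration of why replication is not backup, consider the minimal sketch below (the path is hypothetical): HDFS faithfully replicates whatever it is given, and a single errant command removes every replica at once.

```java
// Minimal sketch (hypothetical path): replication guards against disk
// failure, not against human error or corrupt data.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationIsNotBackup {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path report = new Path("/data/reports/q3.parquet");

        // Three replicas protect the file's bytes against a failed disk...
        fs.setReplication(report, (short) 3);

        // ...but if those bytes are corrupt, all three replicas are corrupt,
        // and one errant delete destroys every copy at once:
        fs.delete(report, false); // nothing left to restore from
    }
}
```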

When you roll new code into production, you need a backup in case something goes awry and you need to roll the system back. Apache Hadoop doesn’t offer this capability. To avoid this type of exposure, and the costs in lost productivity that accompany it, businesses should consider a commercial Hadoop distribution with snapshot capabilities. Users can take point-in-time snapshots of every file and table. If an error occurs, the snapshot can be restored.
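The workflow that paragraph describes might look like the sketch below. It uses the generic snapshot hooks on org.apache.hadoop.fs.FileSystem that snapshot-capable distributions and file systems implement; whether the calls succeed depends on the platform, the directory must first have been made snapshottable by an administrator, and every path and name here is hypothetical.

```java
// Sketch of the snapshot-and-restore workflow described above, using the
// generic snapshot entry points on org.apache.hadoop.fs.FileSystem.
// Support varies by distribution; all paths and names are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class SnapshotRollback {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // 1. Take a point-in-time snapshot before rolling new code into production.
        Path snapRoot = fs.createSnapshot(new Path("/data/tables"), "pre-release");

        // 2. Deploy. If the release corrupts a file, pull the frozen copy back:
        Path damaged = new Path("/data/tables/users.db");
        Path frozen  = new Path(snapRoot, "users.db");
        FileUtil.copy(fs, frozen, fs, damaged,
                false /* deleteSource */, true /* overwrite */, conf);
    }
}
```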

Your TCO and Hadoop: Calculating the Real Cost

While general TCO issues can be examined and predicted, the true TCO of any given deployment is unique to each enterprise. Skill sets, regulatory requirements, disaster recovery, and many other factors come into play.

A good place to begin the evaluation (or redeployment process) is with the Hadoop TCO Calculator, which provides a personalized overview of the true costs of deploying and running various distributions of Hadoop. This self-service tool uses your own data, and you can change the inputs in real time to estimate costs across a number of variables and scenarios. Access the Hadoop TCO Calculator here.
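To show the shape of such a calculation, here is a toy annual model. This is emphatically not the Hadoop TCO Calculator itself, and every number below is invented for illustration; the point is that once staffing and downtime are priced in, the soft-cost lines can dwarf the hardware line.

```java
// Toy back-of-the-envelope TCO model. NOT the Hadoop TCO Calculator:
// every variable below is a made-up illustration of the kind of input
// such a tool asks for.
public class ToyHadoopTco {
    public static void main(String[] args) {
        int nodes = 20;
        double hardwarePerNodePerYear = 6_000;  // amortized servers + power
        double adminHoursPerWeek = 30;          // tuning, upgrades, firefighting
        double adminHourlyRate = 85;
        double downtimeHoursPerYear = 40;       // NameNode bottlenecks, upgrades
        double downtimeCostPerHour = 2_500;     // idle analysts, stale reports

        double hardware = nodes * hardwarePerNodePerYear;
        double staffing = adminHoursPerWeek * 52 * adminHourlyRate;
        double downtime = downtimeHoursPerYear * downtimeCostPerHour;

        System.out.printf("Hardware: $%,.0f  Staffing: $%,.0f  Downtime: $%,.0f%n",
                hardware, staffing, downtime);
        System.out.printf("Annual TCO: $%,.0f%n", hardware + staffing + downtime);
    }
}
```

Even with these invented inputs, staffing and downtime together exceed the hardware bill, which is exactly the soft-cost dynamic this article describes.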
