
Lower Big Data Hardware TCO with Hadoop

kingmesal
Last updated: 2015/11/19 at 9:11 AM
6 Min Read

Apache Hadoop unquestionably delivers ROI by adding flexibility to data storage, management, processing, and analytics. And it’s logical to assume that “commodity” servers translate into cost savings. But the truth is that while Hadoop can and does deliver significant cost reductions and revenue gains, much depends on the actual deployment.

The true total cost of ownership (TCO) of any distributed system depends on its architecture and on best-practice IT operations. Setting aside the cost of purchasing and maintaining hardware, performance, reliability, and scalability are critical to real TCO: a sluggish, unstable system may cost less for its physical infrastructure, but it will drain both capital and operational budgets.

Hadoop itself is not a magical solution that makes data management faster, easier, and cheaper; the choice of distribution matters. Architectural differences between Hadoop distributions can swing a company’s TCO by 20-50%.

Know Your Hadoop Distribution

Hadoop was created to fill a specific need. To grow and prosper on the web, companies such as Yahoo and Facebook needed a way to efficiently work with very large, dissimilar data sets. Hadoop was the solution.

The open source community further developed Hadoop, and applications for the platform, to meet a range of data-intensive use cases. As Hadoop gained users, attention turned to tuning and refining the platform to cut costs while improving the performance of its parallel processing architecture. For mission-critical business use, Apache Hadoop’s reliance on a NameNode and its append-only file system (HDFS does not support random writes) created problems for scalability, reliability, and performance. MapR built its own file system to make Hadoop enterprise-ready and to remove the single point of failure: the NameNode.

The difference Hadoop can make in TCO when you consider data storage costs alone is impressive. The average cost to store data in a data warehouse is greater than $10,000/terabyte. The cost for a mature enterprise-grade distribution of Hadoop is less than $1,000/terabyte. But how available is that data?
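
To put those per-terabyte figures in perspective, here is a back-of-the-envelope comparison. The 500 TB volume is a hypothetical example; the rates are simply the rounded estimates cited above, not vendor quotes.

```java
// Back-of-the-envelope storage cost comparison. The 500 TB volume is a
// hypothetical example; the per-terabyte rates are the rounded estimates
// cited in the article, not vendor quotes.
public class StorageCostSketch {
    public static void main(String[] args) {
        double terabytes = 500;              // hypothetical data volume
        double warehousePerTb = 10_000;      // ~$10,000/TB in a data warehouse
        double hadoopPerTb = 1_000;          // ~$1,000/TB on enterprise Hadoop

        System.out.printf("Warehouse: $%,.0f%n", terabytes * warehousePerTb);
        System.out.printf("Hadoop:    $%,.0f%n", terabytes * hadoopPerTb);
        // Roughly $5,000,000 versus $500,000: an order of magnitude apart.
    }
}
```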

One of Hadoop’s biggest TCO-killers is its use of NameNodes. A NameNode holds the metadata for everything stored on the DataNodes in a cluster, so every file operation must consult it. This architecture creates serious data-access bottlenecks, and it also makes the NameNode a single point of failure: a single job processing 10 million files can overwhelm it and render the entire cluster inoperable.
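
To see why centralized metadata bites at scale, consider a rough heap estimate. It uses the common rule of thumb that each file, directory, and block object costs on the order of 150 bytes of NameNode heap; the blocks-per-file average is a hypothetical assumption.

```java
// Rough NameNode heap estimate. The 150-bytes-per-object figure is a
// widely used rule of thumb, not an exact cost; blocksPerFile is a
// hypothetical average.
public class NameNodeHeapSketch {
    static final long BYTES_PER_OBJECT = 150;

    public static void main(String[] args) {
        long files = 10_000_000L;    // the job size from the paragraph above
        long blocksPerFile = 2;      // hypothetical average
        long objects = files + files * blocksPerFile;
        double heapGb = objects * BYTES_PER_OBJECT / 1e9;
        System.out.printf("~%.1f GB of NameNode heap for %,d files%n",
                          heapGb, files);
        // Every lookup for those files funnels through this one process:
        // the bottleneck and failure point described above.
    }
}
```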

There is a workaround for the latter issue: configuring NameNode high availability, with a standby NameNode kept in sync through journaled edit logs, removes the single point of failure. Unfortunately, setting this up is a complicated procedure that requires an additional investment in hardware, and it does nothing for the processing bottleneck. To avoid the problem altogether, look for a distribution that forgoes NameNodes in favor of a distributed metadata architecture, as MapR does.
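
For reference, this is a minimal sketch of what the journaling workaround involves, expressed through Hadoop’s Configuration API rather than raw hdfs-site.xml. The “mycluster” nameservice and all host names are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;

// A minimal sketch of NameNode HA with quorum journaling. The nameservice
// and host names are hypothetical placeholders; in practice these
// properties live in hdfs-site.xml.
public class NameNodeHaSketch {
    public static Configuration haConfig() {
        Configuration conf = new Configuration();
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "host1:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "host2:8020");
        // Edits go to a quorum of JournalNodes: the extra hardware
        // investment mentioned above.
        conf.set("dfs.namenode.shared.edits.dir",
                 "qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster");
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha."
                 + "ConfiguredFailoverProxyProvider");
        return conf;
    }
}
```

Note that even with both NameNodes running, all metadata operations still route through the single active one, which is why the bottleneck survives the workaround.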

NameNodes also limit Hadoop’s scalability: a cluster can only scale as far as its NameNode can track files and blocks, which means a lot of time, resources, and skill are required to make the best use of blocks. Left unmanaged, a 1 MB file occupies a full block entry in the NameNode’s metadata, just as a 2 GB file would. And when small files are compressed or packed into sequence files to sidestep the problem, the data may no longer be readily addressable for analysis.
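
The “sequenced” workaround mentioned above typically means packing many small files into a container format such as a SequenceFile, so the batch consumes a single NameNode entry. A minimal sketch, assuming local source files and omitting error handling:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

// Packs many small local files into one SequenceFile so they occupy a
// single NameNode entry. Paths are hypothetical; error handling omitted.
public class SmallFilePacker {
    public static void pack(List<java.nio.file.Path> smallFiles, Path target)
            throws IOException {
        Configuration conf = new Configuration();
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(target),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (java.nio.file.Path f : smallFiles) {
                // Key = original file name, value = raw contents.
                writer.append(new Text(f.getFileName().toString()),
                              new BytesWritable(Files.readAllBytes(f)));
            }
        }
    }
}
```

The trade-off is exactly the one described above: once packed, an individual file is no longer directly addressable by name without unpacking or indexing.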

A mature, enterprise-ready distribution of Hadoop eliminates block-size limitations as well as the cap on the total number of files a cluster can manage. It will also use a fully read-write file system to deliver high availability and low-latency reliability.

Evaluating For TCO

When evaluating Hadoop distributions for enterprise use, factor in setup and maintenance time along with the price of the hardware. Find the sweet spot: the distribution that can deliver higher performance on less hardware. Look for distributions that can run operations and analytics on the same platform, and that support volumes, which make large amounts of data easier to manage.

Look also for features that will keep your Hadoop deployment delivering strong TCO over time. For example, a distribution that supports Apache Hadoop’s APIs, as well as multiple versions of key Hadoop components (rather than a single version of Hive, Spark, Flume, and so on), will help you avoid forced upgrades down the road.

You’ll also want to make sure the distribution you are evaluating doesn’t hold your data hostage in a proprietary format. Additionally, since many enterprises run applications that work over NFS, look for full NFS compatibility so you can leverage the skills and solutions you already have in house.

Your Cluster, Your TCO

The information provided above is, by necessity, general in nature. If you’d like to gauge the TCO of Hadoop distributions using your own data, across a number of variables in different scenarios, you can do so with this TCO Calculator for Hadoop. Simply input your estimated data volume, number of files, and data growth per year, then test different scenarios by adjusting variables across hardware costs, software costs, staffing, and environmental factors.
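
For a feel of what such a calculator does under the hood, here is a toy model built from the same categories of inputs. Every rate in it is a hypothetical placeholder to be replaced with your own numbers.

```java
// A toy TCO model over the calculator's input categories: data volume,
// growth, hardware, software, and staffing. All rates are hypothetical
// placeholders, not real vendor pricing.
public class TcoModelSketch {
    public static void main(String[] args) {
        double dataTb = 200;             // estimated data volume, year one
        double annualGrowth = 0.40;      // 40% data growth per year
        int years = 3;

        double hardwarePerTb = 500;      // hypothetical one-time $/TB
        double softwarePerTbYear = 300;  // hypothetical subscription $/TB/year
        double staffPerYear = 150_000;   // hypothetical operations staffing

        double owned = 0, total = 0;
        for (int y = 1; y <= years; y++) {
            double needed = dataTb * Math.pow(1 + annualGrowth, y - 1);
            total += (needed - owned) * hardwarePerTb;  // buy only new capacity
            total += needed * softwarePerTbYear;        // license what you run
            total += staffPerYear;
            owned = needed;
        }
        System.out.printf("Estimated %d-year TCO: $%,.0f%n", years, total);
    }
}
```

Swap in your own figures, run the numbers against each distribution’s pricing and hardware profile, and the comparison becomes concrete rather than anecdotal.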
