Hadoop

Retail Data Monetization: Are you sitting on top of a retail goldmine?

June 28, 2016 by Ajith Nayar

The majority of sales in the 4.5 trillion dollar US retail market are made in-store, and the volume of transaction data collected at various points in the trading process is immense. This data is a treasure trove of insight into customers as well as product performance. While many retailers mine this data to gain specific insights into their shoppers, the possibilities that such data analysis opens up are largely untapped.[read more]

Apache Spark and Hadoop: The best big data solution for enterprises

June 1, 2016 by Jagadish Thaker

The term big data has become the center of attention for enterprises. In the past, business decisions were made on the basis of transactional data stored in relational databases. This is known as traditional data, which is structured and easy to analyze for business insights. Apart from this critical business data,...[read more]

Five Steps to Successfully Manage Multiple Data Platforms

April 13, 2016 by Kevin Petrie

We data folks live in exciting times. As we saw at the Strata + Hadoop show in San Jose, open source developers continue to deliver new ways to analyze high volumes of fast-moving data. One of the hottest, Apache Kafka, can feed data from thousands of applications to emerging platforms like HBase and Cassandra. Enterprises can use Kafka...[read more]

Apache Drill vs. Apache Spark: What’s The Right Tool for the Job?

March 1, 2016 by Jim Scott

If you’re looking to implement a big data project, you’re probably deciding whether to go with Apache Spark SQL or Apache Drill. This article can help you decide which query tool you should use for the kinds of projects you’re working on.[read more]

Preparing Yourself to Move to Apache Spark

February 29, 2016 by Jim Scott

While MapReduce has been the mainstay of Hadoop processing, Apache Spark is now taking the throne as the way to handle distributed computation. The reasons are obvious: Spark is very fast due to its use of Resilient Distributed Datasets, or RDDs, and it has a clean programming model.[read more]

A Guide to Spark Streaming - Code Examples Included

February 25, 2016 by Jim Scott

Apache Spark is great for processing large amounts of data over large clusters, but wouldn’t it be great if you could process data in near real time? You can with Spark Streaming.[read more]

Big Data and Hadoop Development 2016

December 16, 2015 by Jenny Brown

Solving Hadoop's challenges and shortcomings with SAS will allow you to make the most of Big Data and use it as a catalyst for organizational growth, profit and development.[read more]

Hadoop and Spark: Better Together

October 1, 2015 by Jim Scott

The various online reports about the end of Hadoop as a big data framework bring to mind Mark Twain’s notable quote about the reports of his demise being an exaggeration. Hadoop is very much alive, and numerous organizations continue to make it a key component of their big data and analytics initiatives.[read more]

Managing Big Data Integration and Security with Hadoop

September 2, 2015 by Jason Parms

An open-source framework like Hadoop offers endless possibilities for development, and with a strong governing body like the Apache Software Foundation behind it, one can expect increasing numbers of modules and technologies to integrate with Hadoop to enable your business to achieve its Big Data goals – and maybe even go significantly beyond what you can envision today.[read more]

Data, Data, Data: Communicating Successfully in the Deluge

September 1, 2015 by Daniel Matthews

Now that big data is in the mix, there’s potential for a ton of noise. The business best able to cull what’s vital from the deluge of data, turn it into information and communicate it loud and clear will ride the landslide of data to success.[read more]

Taking the Mystery Out of Big Data

July 10, 2015 by Doug Lautzenheiser

In the not-so-distant past, firms tracked their own internal transactions and master data (products, customers, employees, and so forth) but little else. Companies probably only had very large databases if their industry called for high-volume and high-speed applications such as telecommunication, shipping, or point of sale. Even then, these transactions were all formatted in a standard way and could be saved inside the kind of relational database that grew out of IBM research in the 1970s.[read more]

Big Data Hadoop Use Cases in the Oil and Gas Industry

June 22, 2015 by Dave Mendle

While U.S. oil production has begun expanding, so much so that the International Energy Agency predicts that by 2016 the U.S. will surpass Saudi Arabia and Russia, the rest of the world’s oil production has ceased to expand. In an effort to streamline and optimize oil and gas production methods, advances in instrumentation, process automation, collaboration, and data management are being developed.[read more]

How Big Data Affects Us Through the Internet of Things

June 19, 2015 by Dave Mendle

Although big data is ever-present in our lives, it can be difficult to understand how much it has really changed our day-to-day living. Let’s take a closer look at how big data has woven its way into the lives of many consumers today, via the Internet of Things (IoT).[read more]

4 Considerations When Choosing a Hadoop Distribution

June 18, 2015 by Dave Mendle

Choosing the right Hadoop distribution can be a tricky process. Many businesses looking to adopt Hadoop in their data infrastructure have a hard time figuring out what really differentiates one distribution from another. With so many options available, it’s easy to get lost in the choices.[read more]

The Data Lake: A More Balanced Perspective

June 17, 2015 by Tamara Dull

The Big Data MOPS Series.

The recent data lake debate with my colleague, Anne Buff, may be over, but the discussion in many organizations is just getting started. What we learned during the debate—and you may be discovering in your own organization—is that it forces the larger discussion of managing growing volumes of data in a big data world. With the onslaught of big data technologies in recent years, organizations are having to look once again at the underlying technologies supporting their data collection, processing, storage, and analysis activities. And right now, the Hadoop-based data lake happens to be a very popular option.[read more]