© 2008-25 SmartData Collective. All Rights Reserved.

Better than Brute Force: Big Data Analytics Tips

metabrown

Long before I encountered the term “Big Data,” questions about dealing with large datasets were a routine part of my work. This situation was by no means unique. Government agencies and many businesses have been amassing repositories of detailed data for a long time.

Consider, for example, the transaction history of a large retailer. A single transaction might include one item, or dozens. For each item, there may be several descriptors, such as a product ID and price. Besides the items purchased, there is the time at which the transaction took place, the register, the cashier, customer payment information and more. Each item corresponds to a lot, distributor, manufacturer, shipper and so on. The customer corresponds to another stream of information, covering previous purchases, loyalty program status and marketing history. This is adding up to a lot of data, and we’re still describing just one transaction. A department store chain, grocer or big box retailer may handle a thousand transactions or more each day, year round, at each of hundreds, even thousands, of outlets.

Lately, I have heard “Big Data” used to describe everything from baseball statistics, which are far smaller in scale than the retailer example, to online retailing and social media, where the data resources can become enormous. Big is relative, yet when people ask, “Can you handle our data?” they all have certain shared concerns.

The prospect who asks if you can handle a large volume of data is looking for reassurance, and proof that you have something of value to offer. Here are some thoughts behind the question…

It’s hard for us to store this much data, let alone do anything useful with it.

We’ve tried some things that didn’t work.

Some of the things that didn’t work crashed the system.

We spent a lot of money on the last thing that didn’t work. Spending a lot of money on another thing that doesn’t work could be a career-ending move.

Of course, your answer will boil down to “Of course we can handle your data!” but if you say it that way, most people won’t believe you, and for good reasons. They want an answer that addresses those unspoken concerns.

So what’s the right way to respond? Answer the question with questions. At least, that is the way to begin. (This is true whether you are a vendor, outside consultant, or an internal resource such as a business analyst.) First, ask about goals. What kind of questions does your prospect expect to answer? Why? How will the information be of use to the business? What’s the vision for integrating analytics into decision-making and business processes?

Why begin with these questions? Primarily to learn the answers, because the answers will guide you in everything else you ask and do. But there are other reasons as well. Asking questions about business goals helps your clients to become aware of gaps in their own reasoning and other challenges of meeting expectations. The process of describing goals can easily lead someone to a realization that the data doesn’t support their wants, and some redirection is necessary. You’ll be saving yourself and your client a lot of trouble by getting the big questions on the table from the start.

Your clients are, most likely, a smart bunch of people. They are experts in their own professions, not yours. They may be unsure of how to evaluate you and the services you offer. Asking questions in a respectful manner is a way to show that you care about them and what they do. If you have put serious thought into the questions you ask and listen to the answers, you’re very likely to uncover some issues of importance, including some your prospect had not previously considered. Your client develops an appreciation for what you know and what you can do for them. As you develop an understanding of needs, your prospect develops respect and trust for you.

While you ask questions about Big Data analysis goals, keep asking yourself how much data is required to address the client’s goals. Just because an organization has endless heaps of data doesn’t mean there is a reason to touch and feel every bit of it.

Is the client asking for information about every single person in the data – individually? Very often, that’s not the case. And if that’s not the case, handling the data becomes simpler. Focusing on particular segments? OK, then you need data relevant to those segments, not everybody and her mother. Is every field in the database relevant to your research? Are there open-ended text fields, and if so, do you need them? Narrowing the data to just the relevant cases and fields can easily reduce the volume of data by a factor of ten, or one-hundred, one-thousand… you get the picture.
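Here is a minimal sketch of that narrowing step in Python. The record layout, the field names and the `narrow` helper are all hypothetical, made up for illustration; the point is only that filtering cases and dropping fields happens before any analysis does.

```python
from typing import Callable, Iterable, Iterator

Record = dict[str, object]


def narrow(records: Iterable[Record],
           keep_fields: list[str],
           is_relevant: Callable[[Record], bool]) -> Iterator[Record]:
    """Yield only the relevant records, cut down to the fields of interest."""
    for record in records:
        if is_relevant(record):
            # Drop every field the research question does not need,
            # such as open-ended text.
            yield {field: record[field] for field in keep_fields}


# Hypothetical transaction records; the field names are illustrative only.
transactions = [
    {"customer_id": 1, "segment": "loyalty", "amount": 42.50, "cashier_note": "gift wrap"},
    {"customer_id": 2, "segment": "walk-in", "amount": 9.99, "cashier_note": ""},
    {"customer_id": 3, "segment": "loyalty", "amount": 120.00, "cashier_note": "price check"},
]

# Keep one segment and two fields; everything else never reaches the analysis.
loyalty_only = list(narrow(
    transactions,
    keep_fields=["customer_id", "amount"],
    is_relevant=lambda r: r["segment"] == "loyalty",
))
```

Because `narrow` is a generator, it can read from a file or database cursor one record at a time, so the full dataset never has to sit in memory.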

If the goals center on understanding behaviors of large numbers of people (or transactions, or some other things) as a group, then you have discovered another fine opportunity to reduce scale, because you don’t need to use every single individual to do a good job of describing the group. What you need is a sample. There are statistical approaches to sampling, and there are data mining approaches; use what suits your client’s needs and your own work processes.
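One common way to draw a simple random sample from data too large to load at once is reservoir sampling. The sketch below is an illustration I am supplying, not a method named above; the function name, the seed and the synthetic "population" of integers are assumptions.

```python
import random
from typing import Iterable, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> list[T]:
    """Draw k items uniformly at random from a stream in a single pass."""
    rng = random.Random(seed)
    sample: list[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Synthetic stand-in for a large repository: the integers 0..99,999.
population = range(100_000)
sample = reservoir_sample(population, k=1_000)

sample_mean = sum(sample) / len(sample)
true_mean = sum(population) / len(population)
```

On this synthetic data, the mean of the 1,000 sampled values should land close to the mean of all 100,000, which is the sense in which a sample describes the group.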

If you don’t trust or use sampling, you don’t know drivel about data analysis; go back to school and stop wasting clients’ time and money.

What if your client really does need to be able to address every single case in a huge repository? This is certainly possible. Perhaps your client wants to rate the profitability or purchasing potential of every customer. Or maybe you’re dealing with insurance claims – which ones are potentially fraudulent? Tax payment information – who’s cheating? Situations like these do call for handling lots and lots of data.

Still, there are opportunities to minimize the resources required. Break big questions into small pieces, use sampling early and often, and educate yourself about the resource requirements for the things you do. Scoring a million cases using your predictive model, for example, may require less computing power than building the model itself on a sample of a few thousand cases. This information isn’t always easy to find; making nice with vendor tech support and your client’s IT staff may lead you to invaluable tricks of the trade.
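To make the build-on-a-sample, score-everything pattern concrete, here is a rough sketch with a deliberately tiny model: a least-squares line fitted on 2,000 synthetic cases, then applied to a million cases one at a time. The data, the function names and the choice of model are assumptions for illustration; real models and their resource costs vary.

```python
import random
from typing import Iterable, Iterator, Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score(cases: Iterable[float], slope: float, intercept: float) -> Iterator[float]:
    """Apply the fitted model lazily, one case at a time."""
    for x in cases:
        yield slope * x + intercept


# Synthetic training sample: y = 3x + 2 plus noise (made-up numbers).
rng = random.Random(0)
sample_x = [rng.uniform(0, 100) for _ in range(2_000)]
sample_y = [3 * x + 2 + rng.gauss(0, 1) for x in sample_x]

# Build the model on the sample only.
slope, intercept = fit_line(sample_x, sample_y)

# Score a million cases in one streaming pass; nothing is held in memory.
n_scored = sum(1 for _ in score(range(1_000_000), slope, intercept))
```

Here scoring is one multiplication and one addition per case, so the full repository is read once and never stored. Whether scoring is cheaper than model building in your own tools is something to confirm with the vendor or IT staff.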

Make it a point to minimize your demands on the data at every step of the analytic process. Exploring the data? If you’re looking to understand what’s typical, use a sample, not the whole dataset. Seeking the extreme and unusual? You’ll need more data to work with, but still may be better off with a subset of the data, at least to begin. You’re ready to build models? Often, you’ll get the best results by modeling segments one at a time. Throwing everybody into one giant equation is asking for a weak model. Take a sample of training data from a single segment and you can work faster, often producing more accurate models.
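As a toy illustration of modeling segments one at a time, the sketch below draws a sample from each segment and fits the simplest possible "model" (an average) per segment, then compares it with one pooled figure. The segment names, the spend values and the helper function are invented for the example.

```python
import random
from collections import defaultdict
from statistics import mean


def sample_by_segment(records, segment_key, per_segment, seed=0):
    """Group records by segment, then draw a random sample from each group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record[segment_key]].append(record)
    return {
        segment: rng.sample(rows, min(per_segment, len(rows)))
        for segment, rows in groups.items()
    }


# Synthetic customers in two segments with very different spending (made up).
rng = random.Random(1)
customers = (
    [{"segment": "loyalty", "spend": rng.gauss(80, 10)} for _ in range(5_000)]
    + [{"segment": "walk-in", "spend": rng.gauss(20, 5)} for _ in range(5_000)]
)

samples = sample_by_segment(customers, "segment", per_segment=200)

# One small model per segment, each built from 200 sampled cases.
segment_models = {
    segment: mean(row["spend"] for row in rows)
    for segment, rows in samples.items()
}

# One model for everybody, built from all 10,000 cases.
pooled_model = mean(row["spend"] for row in customers)
```

In this made-up data the pooled average falls between the two segments and describes neither, while each per-segment figure comes from only 200 cases. Note that this helper groups everything in memory first; on a truly large repository you would filter or sample at the source instead.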

When you tell a prospect that you can handle the data, it shouldn’t mean you’ll do it by brute force. A little finesse yields stronger results with less strain on resources.
