SmartData Collective

Better than Brute Force: Big Data Analytics Tips

metabrown

Long before I encountered the term “Big Data,” questions about dealing with large datasets were a routine part of my work. This situation was by no means unique. Government agencies and many businesses have been amassing repositories of detailed data for a long time.

Consider, for example, the transaction history of a large retailer. A single transaction might include one item, or dozens. For each item, there may be several descriptors, such as a product ID and price. Besides the items purchased, there is the time at which the transaction took place, the register, the cashier, customer payment information and more. Each item corresponds to a lot, distributor, manufacturer, shipper and so on. The customer corresponds to another stream of information, covering previous purchases, loyalty program status and marketing history. This is adding up to a lot of data, and we’re still describing just one transaction.  A department store chain, grocer or big box retailer may handle a thousand transactions or more each day, year round, at each of hundreds, even thousands, of outlets.

Lately, I have heard “Big Data” used to describe everything from baseball statistics, which are far smaller in scale than the retailer example, to online retailing and social media, where the data resources can become enormous. Big is relative, yet when people ask, “Can you handle our data?” they all have certain shared concerns.

The prospect who asks if you can handle a large volume of data is looking for reassurance, and proof that you have something of value to offer. Here are some thoughts behind the question…

It’s hard for us to store this much data, let alone do anything useful with it.

We’ve tried some things that didn’t work.

Some of the things that didn’t work crashed the system.

We spent a lot of money on the last thing that didn’t work. Spending a lot of money on another thing that doesn’t work could be a career-ending move.

Naturally, your answer will boil down to “Of course we can handle your data!” but if you say it that way, most people won’t believe you, and for good reasons. They want an answer that addresses those unspoken concerns.

So what’s the right way to respond? Answer the question with questions. At least, that is the way to begin. (This is true whether you are a vendor, outside consultant, or an internal resource such as a business analyst.) First, ask about goals. What kind of questions does your prospect expect to answer? Why? How will the information be of use to the business? What’s the vision for integrating analytics into decision-making and business processes?

Why begin with these questions? Primarily to learn the answers, because the answers will guide you in everything else you ask and do. But there are other reasons as well. Asking questions about business goals helps your clients to become aware of gaps in their own reasoning and other challenges of meeting expectations. The process of describing goals can easily lead someone to a realization that the data doesn’t support their wants, and some redirection is necessary. You’ll be saving yourself and your client a lot of trouble by getting the big questions on the table from the start.

Your clients are, most likely, a smart bunch of people. They are experts in their own professions, not yours. They may be unsure of how to evaluate you and the services you offer. Asking questions in a respectful manner is a way to show that you care about them and what they do. If you have put serious thought into the questions you ask and listen to the answers, you’re very likely to uncover some issues of importance, including some your prospect had not previously considered. Your client develops an appreciation for what you know and what you can do for them. As you develop an understanding of needs, your prospect develops respect and trust for you.

While you ask questions about Big Data analysis goals, keep asking yourself how much data is required to address the client’s goals. Just because an organization has endless heaps of data doesn’t mean there is a reason to touch and feel every bit of it.

Is the client asking for information about every single person in the data – individually? Very often, that’s not the case. And if that’s not the case, handling the data becomes simpler. Focusing on particular segments? OK, then you need data relevant to those segments, not everybody and her mother. Is every field in the database relevant to your research? Are there open-ended text fields, and if so, do you need them? Narrowing the data to just the relevant cases and fields can easily reduce the volume of data by a factor of ten, or one-hundred, one-thousand… you get the picture.
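As a rough illustration of narrowing the data to relevant cases and fields, here is a minimal Python sketch. The column names, the sample records and the `narrow` helper are all invented for this example; a real extract would come from the client's own repository.

```python
import csv
import io

# Hypothetical transaction extract; columns and values are invented for illustration.
RAW = """customer_id,segment,amount,cashier,comment
1,loyalty,25.00,A17,great service
2,walk-in,9.50,B02,
3,loyalty,110.25,A17,long line at register
4,online,42.00,,gift wrap requested
"""

def narrow(rows, keep_fields, keep_row):
    """Yield only the relevant cases, trimmed to only the relevant fields."""
    for row in rows:
        if keep_row(row):
            yield {field: row[field] for field in keep_fields}

reader = csv.DictReader(io.StringIO(RAW))

# Keep one segment and two fields; the open-ended comment field is dropped.
loyalty = list(
    narrow(reader, ["customer_id", "amount"], lambda r: r["segment"] == "loyalty")
)
print(loyalty)
```

Because `narrow` is a generator, the rows are filtered as they are read, so the full extract never has to sit in memory at once.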

If the goals center on understanding behaviors of large numbers of people (or transactions, or some other things) as a group, then you have discovered another fine opportunity to reduce scale, because you don’t need to use every single individual to do a good job of describing the group. What you need is a sample. There are statistical approaches to sampling, and there are data mining approaches; use what suits your client’s needs and your own work processes.
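To make the point about samples concrete, here is a small Python sketch using simple random sampling, which is only one of many possible approaches. The population is synthetic (invented purchase amounts), and the sizes and seed are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Synthetic "population": 200,000 purchase amounts (invented for illustration).
population = [rng.gauss(50.0, 15.0) for _ in range(200_000)]

# Simple random sample without replacement: 1% of the cases.
sample = rng.sample(population, 2_000)

pop_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean: {pop_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means should land close together, even though the sample touches only one case in a hundred.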

If you don’t trust or use sampling, you don’t know drivel about data analysis; go back to school and stop wasting clients’ time and money.

What if your client really does need to be able to address every single case in a huge repository? This is certainly possible. Perhaps your client wants to rate the profitability or purchasing potential of every customer. Or maybe you’re dealing with insurance claims – which ones are potentially fraudulent? Tax payment information – who’s cheating? Situations like these do call for handling lots and lots of data.

Still, there are opportunities to minimize the resources required. Break big questions into small pieces, use sampling early and often, and educate yourself about the resource requirements for the things you do. Scoring a million cases using your predictive model, for example, may require less computing power than building the model itself on a sample of a few thousand cases. This information isn’t always easy to find; making nice with vendor tech support and your client’s IT staff may lead you to invaluable tricks of the trade.
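One way to picture why scoring can be cheap is a sketch like the following, in Python. It assumes a hypothetical, already-fitted linear model (the weights are invented) and a stand-in `cases` generator in place of a real repository. Each case costs a few multiplications, and the cases are processed in small batches so that only one batch is in memory at a time.

```python
from itertools import islice

# Hypothetical, already-fitted linear model; the weights are invented for illustration.
WEIGHTS = {"recency": -0.02, "frequency": 0.5, "spend": 0.01}
INTERCEPT = 1.0

def score(case):
    """Score one case: a handful of multiplications and additions."""
    return INTERCEPT + sum(WEIGHTS[name] * case[name] for name in WEIGHTS)

def batches(iterable, size):
    """Yield lists of at most `size` items so only one batch is held at a time."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch

def cases(n):
    """Stand-in for reading cases from a large repository, one at a time."""
    for i in range(n):
        yield {"recency": i % 30, "frequency": i % 5, "spend": float(i % 200)}

scored = 0
high_value = 0
for batch in batches(cases(100_000), 10_000):
    for case in batch:
        scored += 1
        if score(case) > 3.0:  # arbitrary cutoff, chosen only for the example
            high_value += 1

print(scored, high_value)
```

The same loop works for a million cases or more; the work grows in proportion to the number of cases while memory use stays flat.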

Make it a point to minimize your demands on the data at every step of the analytic process. Exploring the data? If you’re looking to understand what’s typical, use a sample, not the whole dataset. Seeking the extreme and unusual? You’ll need more data to work with, but still may be better off with a subset of the data, at least to begin. You’re ready to build models? Often, you’ll get the best results by modeling segments one at a time. Throwing everybody into one giant equation is asking for a weak model. Take a sample of training data from a single segment and you can work faster, often producing more accurate models.
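The idea of modeling segments one at a time can be sketched in Python as below. Everything here is an assumption made for illustration: the two segments are synthetic, their behavior is deliberately different, and a plain least-squares line stands in for whatever model you would actually build.

```python
import random

def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def mse(model, points):
    """Mean squared error of a fitted line over a set of points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in points) / len(points)

rng = random.Random(7)  # fixed seed so the sketch is repeatable

def make_segment(n, intercept, slope):
    """Invented segment: a linear response plus random noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, intercept + slope * x + rng.gauss(0, 1)) for x in xs]

# Two segments that behave differently: the response rises in A and falls in B.
segments = {
    "A": make_segment(50_000, 0.0, 2.0),
    "B": make_segment(50_000, 10.0, -1.0),
}

# One giant equation: fit on a sample drawn from everybody.
everybody = segments["A"] + segments["B"]
pooled_model = fit_line(rng.sample(everybody, 1_000))
pooled_error = mse(pooled_model, everybody)

# One segment at a time: fit each model on a sample from that segment only.
segment_error = {}
for name, points in segments.items():
    model = fit_line(rng.sample(points, 500))
    segment_error[name] = mse(model, points)

print(f"pooled MSE: {pooled_error:.2f}")
for name, err in segment_error.items():
    print(f"segment {name} MSE: {err:.2f}")
```

In this contrived setup the pooled line has to compromise between two opposite trends, so each per-segment model, trained on only 500 cases, fits its own segment far better. Real segments will rarely differ this starkly.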

When you tell a prospect that you can handle the data, it shouldn’t mean you’ll do it by brute force. A little finesse yields stronger results with less strain on resources.
