SmartData Collective

The Data Time Investment

Venky Ganti

In a prior blog post on challenges beyond the 3V’s of working with data, I discussed some issues that hinder the efficiency of data analysts and drastically raise the bar on the motivation they need to begin working with new data.

Here, I will drill down into those issues and share my past experience with them.

How do I Find and Understand Data?

Let’s consider the scenario in which an engineer or a data analyst inside Google wants to find relevant data, say, a table in Dremel or an SSTable on GFS. She has to remember the name of the table and which of Google’s myriad data stores contains it. Further, unlike documents, which are self-describing, it is not easy to “understand” what is inside a dataset and how to use it. The user needs to understand the data by talking to people who know about it, or through some alternative means. Contrast the effort an engineer within Google spends finding and understanding data with the effort an external user spends using Google to find and understand information on the web.

Let me recall one of my own frustrating experiences with a similar scenario. I worked on the AdWords team at Google. I needed to find information about search queries that led to similar user behavior on Google’s products, specifically Search and Ads. I felt that there must be several such datasets in the Search and Ads teams. I found two in the Ads teams because I knew someone who worked on those projects. But it turned out, after further investigation, that I could not use either because of differences in target applications. I had little luck finding similar information from the Search teams. I tried building my own dataset, spent months on it, and didn’t succeed. Recently, after I left Google, an ex-colleague told me he chanced upon a pointer to the right data and successfully used it!

Of course, these problems around finding and understanding data are not peculiar to Google; they exist at any organization that leverages data to enhance its decision-making and its products. In general, an engineer at Google has a better chance of overcoming these problems thanks to awesome internal tools (e.g., code search).

Much of the technology related to data has focused on processing massive amounts of data and visualizing results better. But there is no focus on empowering users to find and understand data within these databases so that they can prepare queries and programs more reliably and efficiently.

The primary reason for the lack of focus on these issues, in my opinion, is that it is much easier to measure and show concrete progress on query processing efficiency and visualization capabilities. On the other hand, it is hard today to articulate the benefits of helping data users find and understand data. By the way, wasn’t this true for Search over the web until Google came along and illustrated the economic and productivity gains across a wide spectrum of users? I believe that we are on the cusp of a similar revolution in data consumption.

Who do I Ask?

After an analyst finds a dataset, she needs to understand how other analysts and applications use it. Often, it is very hard to find such knowledgeable users. There were many times when I found it quite hard, even at Google, to identify the people I needed to talk to about such questions; when I did find them, I felt the pain of distracting engineers with run-of-the-mill questions which they must have answered many times over.

As an example, I was responsible for migrating an application that read data from one engine to a newer, more robust engine. A big part of the migration involved rewriting queries to read from the new schema. I was among the last few to do this migration, so similar questions must already have been answered. But the wiki that I was pointed to didn’t have all the information I needed. So, I had to drag myself very reluctantly to a very busy principal engineer, the only one I knew directly, to get help. I would have greatly appreciated being able to quickly find someone else who had gone through a similar migration.

On the flip side, I would answer the same set of questions over and over about data that I produced and maintained. I tried creating a wiki page, but was still asked lots of questions. As we all know, this approach comes with its own challenge: keeping the wiki updated and reliable over time. In retrospect, I wouldn’t be surprised if my colleagues or I had missed a few updates.

How much Time?

So, how much time do analysts actually spend on finding and understanding data? I haven’t tried measuring this yet. We just don’t have the methodology and the tools to do it. The answer varies widely depending on who you ask and which data they need to use. New users of a particular dataset will spend upwards of 80% of their time on these tasks, while experts spend much less. However, experts spend time answering other users’ questions over and over.
