SmartData Collective
The Data Time Investment

Venky Ganti

In a prior blog post on challenges beyond the 3V’s of working with data, I discussed some issues that hinder the efficiency of data analysts and drastically raise the motivation they need to begin working with new data.

Here, I will drill down into those issues and share my past experience with them.

How do I Find and Understand Data?

Let’s consider the scenario in which an engineer or a data analyst inside Google wants to find relevant data, say, a table in Dremel or an SSTable on GFS. She has to remember the name of the table and which of Google’s myriad data stores contains it. Further, unlike documents, which are self-describing, a dataset does not make it easy to “understand” what is inside it and how to use it. The user has to learn about the data by talking to people who know it, or through some alternative means. Contrast the effort an engineer within Google spends finding and understanding data with the effort an external user spends using Google to find and understand information on the web.

Let me recall one of my own frustrating experiences in a similar scenario. I worked on the AdWords team at Google, and I needed to find information about search queries that led to similar user behavior on Google’s products, specifically Search and Ads. I felt that there must be several such datasets in the Search and Ads teams. I found two in the Ads teams because I knew someone who worked on those projects, but further investigation showed that I could not use either because of differences in target applications. I had little luck finding similar information from the Search teams. I tried building my own dataset, spent months on it, and didn’t succeed. Recently, after I left Google, an ex-colleague told me he chanced upon a pointer to the right data and successfully used it!

Of course, these problems around finding and understanding data are not peculiar to Google; they exist at any organization that leverages data to enhance its decision-making and its products. In general, an engineer at Google has a better chance of overcoming these problems thanks to awesome internal tools (e.g., code search).

Much of the technology related to data has focused on processing massive amounts of data and on visualizing results better. But there has been no focus on empowering users to find and understand data within these databases so that they can prepare queries and programs more reliably and efficiently.

In my opinion, the primary reason for the lack of focus on these issues is that it is much easier to measure and show concrete progress on query processing efficiency and visualization capabilities. On the other hand, it is hard today to articulate the benefits of helping data users find and understand data. By the way, wasn’t this true of Search over the web until Google came along and illustrated the economic and productivity gains across a wide spectrum of users? I believe that we are at the cusp of a similar revolution in data consumption.

Who do I Ask?

After an analyst finds a dataset, she needs to understand how other analysts and applications use it. Often, it is very hard to find such knowledgeable users. There were many times when I found it quite hard, even at Google, to identify the people I needed to talk to about such questions; when I did find them, I felt the pain of distracting engineers with run-of-the-mill questions that they must have answered many times over.

As an example, I was responsible for migrating an application that read data from one engine to a newer, more robust engine. A big part of the migration involved rewriting queries to read from the new schema. I was among the last few to do this migration, so similar questions must already have been answered. But the wiki that I was pointed to didn’t have all the information I needed. So, I had to drag myself very reluctantly to a very busy principal engineer, the only one I knew directly, to get help. I would have greatly appreciated being able to quickly find someone else who had gone through a similar migration.

On the flip side, I would answer the same set of questions over and over about data that I produced and maintained. I tried creating a wiki page, but was still asked lots of questions. As we all know, this approach comes with its own challenge: keeping the wiki updated and reliable over time. In retrospect, I wouldn’t be surprised if my colleagues or I had missed a few updates.

How much Time?

So, how much time do analysts actually spend on finding and understanding data? I haven’t tried measuring this yet; we just don’t have the methodology and the tools to do it. Depending on whom you ask and which data they need to use, the answer varies widely. Users new to a particular dataset will spend upwards of 80% of their time on these tasks, while experts spend much less. However, experts spend their time answering other users’ questions over and over.
