SmartData Collective
© 2008-23 SmartData Collective. All Rights Reserved.

Investigating the Potential of Data Preparation

Dave Menninger
Last updated: 2017/10/05 at 4:47 PM

Data preparation is critical to the effectiveness of both operational and analytic business processes. Operational processes today are fed by streams of constantly generated data. Our data and analytics in the cloud benchmark research shows that more than half (55%) of organizations spend the most time in their analytic processes preparing data for analysis, a situation that reduces their productivity. Data now comes from more sources than ever, at a faster pace and in a dizzying array of formats; it often contains inconsistencies in both structure and content.

In response to these changing information conditions, data preparation technology is evolving. Big data, data science, streaming data and self-service all are impacting the way organizations collect and prepare data. Data sources used in analytic processes now include cloud-based data and external data. Many data sources now include large amounts of unstructured data, in contrast to just a few years ago when most organizations focused primarily on structured data. Our big data analytics benchmark research shows that nearly half (49%) include unstructured content such as documents or Web pages in their analyses.

The ways in which data is stored in organizations are changing as well. Historically, data was extracted, transformed and loaded, and only then made available to end users through data warehouses or data marts. Now data warehouses are being supplemented with, or in some cases replaced by, data lakes, which I have written about. As a result, the data preparation process may involve not just loading raw information into a data lake, but also retrieving and refining information from it.

The advent of big data technologies such as Hadoop and NoSQL databases intensifies the need to apply data science techniques to make sense of these volumes of information. At this scale, querying and reporting alone are both inefficient and ineffective as analytical techniques. Using data science, in turn, means addressing additional data preparation requirements such as normalizing, sampling, binning and dealing with missing or outlying values. For example, in our next-generation predictive analytics benchmark research, 83 percent of organizations reported using sampling in preparing their analyses. Data scientists also frequently use sandboxes, which are copies of the data that can be manipulated without impacting operational processes or production data sources. Managing sandboxes adds yet another challenge to the data preparation process.
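To make the preparation steps named above more concrete, here is a minimal sketch in plain Python of filling missing values, clamping outliers, normalizing, binning and sampling. The function names, the median fill, the interquartile-range rule for outliers and the sample data are all illustrative assumptions, not techniques taken from the benchmark research.

```python
import random
import statistics


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clamp values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def normalize(values):
    """Rescale values to the range [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bin_values(values, n_bins):
    """Assign each value in [0, 1] to one of n_bins equal-width bins."""
    return [min(int(v * n_bins), n_bins - 1) for v in values]


def sample(rows, k, seed=0):
    """Draw k rows without replacement, reproducibly for a given seed."""
    return random.Random(seed).sample(rows, k)


# Hypothetical measurements with two gaps and one obvious outlier.
raw = [12.0, 15.0, None, 14.0, 13.0, 250.0, 16.0, None, 11.0, 15.0]

filled = fill_missing(raw)
clipped = clip_outliers(filled)
scaled = normalize(clipped)
bins = bin_values(scaled, n_bins=4)
subset = sample(list(zip(scaled, bins)), k=5)
```

The order matters in this sketch: outliers are clamped before normalization so that a single extreme value does not compress every other value toward zero. In practice each choice (median versus mean fill, clamping versus dropping, bin count) depends on the analysis being prepared.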

Data governance is always a challenge; in this new world it has, if anything, grown even more difficult as the volume and variety of data grow. At the moment most big data technologies trail their relational database counterparts in providing data governance capabilities. The developers of data preparation processes must adapt them to these new environments, supplementing them with processes that support governance and compliance for personally identifiable information (PII), payment card industry (PCI) data, protected health information (PHI) and other standards for the handling of sensitive, restricted data.
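As one illustration of how a preparation process might support the handling of sensitive data, the sketch below masks records before they are copied into a sandbox or data lake. The field names, the salt and the card-number pattern are hypothetical, and salted hashing is shown only as a simple form of pseudonymization; it is an assumption for this example, not a statement of what any standard requires.

```python
import hashlib
import re

# Hypothetical names of fields that hold PII or PHI in the source records.
SENSITIVE_FIELDS = {"email", "patient_id"}

# Matches 16 to 19 digits, optionally separated by spaces or hyphens,
# and captures the last four digits so they can be kept for reference.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}(\d{4})\b")


def pseudonymize(value, salt):
    """Replace a value with a truncated salted SHA-256 digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]


def mask_card_numbers(text):
    """Hide all but the last four digits of card-like numbers in free text."""
    return CARD_PATTERN.sub(lambda m: "****-" + m.group(1), text)


def mask_record(record, salt):
    """Return a copy of the record that is safer to load into a sandbox."""
    masked = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            masked[key] = pseudonymize(str(value), salt)
        elif isinstance(value, str):
            masked[key] = mask_card_numbers(value)
        else:
            masked[key] = value
    return masked


# Hypothetical record used only to exercise the functions above.
record = {
    "email": "a@example.com",
    "patient_id": 1234,
    "note": "Paid with 4111 1111 1111 1111 today",
    "amount": 42.5,
}
masked = mask_record(record, salt="demo-salt")
```

Because the same input and salt always produce the same digest, pseudonymized fields can still be used to join or count records without exposing the original values.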

In the emerging self-service approach to data preparation, three separate user personas are typically involved. Operational teams need to derive useful information from data as soon as it is generated to complete business transactions and keep operations flowing smoothly. Analysts need access to relevant information to guide better decision-making. And the IT organization is often called upon to support either or both of these roles when the complexities of data access and preparation exceed the skills of those in the lines of business. While IT departments probably welcome the opportunity to enable end users to perform more self-service tasks, they cannot do so in ways that ignore enterprise requirements. Nonetheless, the trend toward deploying tools that support self-service data preparation is growing. The push for self-service and the need to meet enterprise requirements can lead to conflict for organizations that want to derive maximum business value from their data as quickly as possible while still maintaining appropriate data governance, security and consistency.

To help understand how organizations are tackling these changes, Ventana Research is conducting benchmark research on data preparation. This research will identify existing and planned approaches and related technologies, best practices for implementing them and market trends in data preparation. It will assess the current challenges associated with innovations in data preparation, including self-service capabilities and architectures that support big data environments. The research will assess the extent to which tools and processes for data preparation support superior performance and determine how organizations balance the demand for self-service capabilities with enterprise requirements for data governance and repeatability. It will uncover ways in which data preparation and supporting technologies are being used to enhance operational and analytic processes.

This research will also provide new insights into the changes now occurring in business and IT functions as organizations seek to capitalize on data preparation to gain competitive advantage and to support regulatory compliance, risk management and governance processes. The research will investigate how organizations are implementing data preparation tools to support all types of operational and business processes, including operational intelligence, business intelligence and data science.

Data is an essential component of every aspect of business, and organizations that use it well are likely to gain advantages over competitors that do not. Watch our community for updates. We expect the research to reveal impactful insights that will help business and IT. When it is complete, we’ll share education and best practices about how organizations can tackle these challenges and opportunities.

Dave Menninger November 22, 2016