The Establishment of Data Preparation

February 22, 2015

Data is an essential ingredient for every aspect of business, and those that use it well are likely to gain advantages over competitors that do not. Our benchmark research on information optimization reveals a variety of drivers for deploying information, most commonly analytics, information access, decision-making, process improvements and customer experience and satisfaction. Accomplishing any of these purposes requires that data be prepared through a sequence of steps: accessing, searching, aggregating, enriching, transforming and cleaning data from different sources to create a single uniform data set. To prepare data properly, businesses need flexible tools that enable them to enrich the context of data drawn from multiple sources, collaborate on its preparation to serve business needs and govern the process of preparation to ensure security and consistency. Users of these tools range from analysts to operations professionals in the lines of business.
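As a rough illustration of that sequence of steps, here is a minimal Python sketch that cleans, transforms, aggregates and enriches records from two sources into one uniform data set. Everything in it is an assumption made for illustration: the two sources, the field names, the region lookup table and the `clean_name` and `prepare` helpers are hypothetical and do not describe any particular product.

```python
from collections import defaultdict

# Hypothetical records from two sources (say, a CRM export and a billing system).
crm_rows = [
    {"customer": " Acme Corp ", "region_code": "na"},
    {"customer": "Globex", "region_code": "eu"},
]
billing_rows = [
    {"customer": "acme corp", "amount": "1200.50"},
    {"customer": "Globex ", "amount": "300"},
    {"customer": "ACME CORP", "amount": "99.50"},
]

# Assumed lookup table used for enrichment.
REGION_NAMES = {"na": "North America", "eu": "Europe"}


def clean_name(name: str) -> str:
    """Cleaning: collapse whitespace and lowercase so names match across sources."""
    return " ".join(name.split()).lower()


def prepare(crm, billing):
    # Transforming and aggregating: parse text amounts and total them per customer.
    totals = defaultdict(float)
    for row in billing:
        totals[clean_name(row["customer"])] += float(row["amount"])

    # Enriching: attach a readable region name, producing one row per customer.
    prepared = []
    for row in crm:
        key = clean_name(row["customer"])
        prepared.append({
            "customer": key,
            "region": REGION_NAMES.get(row["region_code"], "Unknown"),
            "total_billed": round(totals.get(key, 0.0), 2),
        })
    return sorted(prepared, key=lambda r: r["customer"])


dataset = prepare(crm_rows, billing_rows)
for record in dataset:
    print(record)
```

Even in this toy case, most of the code deals with inconsistent spelling and data types rather than with analysis, which is the kind of effort dedicated tools aim to reduce.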

Data preparation efforts often encounter challenges created by the use of tools not designed for these tasks. Many of today’s analytics and business intelligence products do not provide enough flexibility, and data management tools for data integration are too complicated for analysts who need to interact ad hoc with data. Depending on IT staff to fill ad hoc requests takes far too long for the rapid pace of today’s business. Even worse, many organizations use spreadsheets because they are familiar and easy to work with. However, when it comes to data preparation, spreadsheets are awkward and time-consuming and require expertise to code them to perform these tasks. They also incur risks of errors in data and inconsistencies among disparate versions stored on individual desktops.

In effect, inadequate tools waste analysts’ time, which is a scarce resource in many organizations, and can squander market opportunities through delays in preparation and unreliable data quality. Our information optimization research shows that most analysts spend the majority of their time not in actual analysis but in readying the data for analysis. More than 45 percent of their time goes to preparing data for analysis or reviewing the quality and consistency of data.

Businesses need technology tools capable of handling data preparation tasks quickly and dependably so users can be sure of data quality and concentrate on the value-adding aspects of their jobs. More than a dozen such tools designed for these tasks are on the market. The best among them are easy for analysts to use, which our research shows is critical: More than half (58%) of participants said that usability is a very important evaluation criterion, more than any other, in software for optimizing information. These tools also deal with the large numbers and types of sources organizations have accumulated: 92 percent of those in our research have 16 to 20 data sources, and 80 percent have more than 20 sources. Complicating the issue further, these sources are not all inside the enterprise; they also are found on the Internet and in cloud-based environments where data may be in applications or in big data stores.

Organizations can’t make business use of their data until it is ready, so simplifying and enhancing the data preparation process can make it possible for analysts to begin analysis sooner and thus be more productive. Our analysis of time related to data preparation finds that when this is done right, significant amounts of time could be shifted to tasks that contribute to achieving business goals. We conclude that, assuming analysts spend 20 hours a week working on analytics, most are spending six hours on preparing data, another six hours on reviewing data for quality and consistency issues, three more hours on assembling information, another two hours waiting for data from IT and one hour presenting information for review; this leaves only two hours for performing the analysis itself.

Dedicated data preparation tools provide support for key tasks that, according to our research and experience, about one-third of organizations still perform manually. These data tasks include search, aggregation, reduction, lineage tracking, metrics definition and collaboration. If an organization is able to reduce the 14 hours previously mentioned in data-related tasks (preparing data, reviewing data and waiting for data from IT) by one-third, it will have more than four extra hours a week for analysis; four hours alone is 10 percent of a 40-hour work week. Multiply this time by the number of individual analysts and it becomes significant. Using the proper tools can enable such a reallocation of time to make better use of these employees’ professional expertise.
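The time arithmetic above can be checked with a short calculation. The sketch below uses only the hourly figures given in the text; the team size of 25 analysts in the usage example is a made-up number for illustration, and `team_hours_saved` is a hypothetical helper.

```python
# Weekly hours from the breakdown in the text (20 hours of analytics work).
hours = {
    "preparing data": 6,
    "reviewing data quality": 6,
    "assembling information": 3,
    "waiting for data from IT": 2,
    "presenting information": 1,
}
ANALYTICS_HOURS = 20
WORK_WEEK_HOURS = 40

# Time left for the analysis itself after all other tasks.
analysis_hours = ANALYTICS_HOURS - sum(hours.values())

# The data-related tasks named in the text.
data_related = (
    hours["preparing data"]
    + hours["reviewing data quality"]
    + hours["waiting for data from IT"]
)

# Cutting those tasks by one-third; the text rounds the result down to 4 hours.
saved_exact = data_related / 3
saved_whole = int(saved_exact)
share_of_week = saved_whole / WORK_WEEK_HOURS


def team_hours_saved(analysts: int, weeks: int = 1) -> int:
    """Hours freed across a team, using the rounded per-analyst figure."""
    return saved_whole * analysts * weeks


print(f"Hours left for analysis today: {analysis_hours}")
print(f"Data-related hours: {data_related}")
print(f"Hours saved per analyst per week: {saved_exact:.2f} (rounded down to {saved_whole})")
print(f"Share of a 40-hour week: {share_of_week:.0%}")
print(f"Hypothetical team of 25 analysts: {team_hours_saved(25)} hours per week")
```

Under these assumptions the freed time is twice the two hours currently left for analysis, which is why the effect compounds quickly across a team.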

These savings can apply in any line of business. For example, our research into next-generation finance analytics shows that more than two-thirds (68%) of finance organizations spend most of their analytics time on data-related tasks. Further analysis shows that only 36 percent of finance organizations that spend the most time on data-related tasks can produce metrics within a week, compared to more than half (56%) of those that spend more time on analytic tasks. This difference is important to finance organizations seeking to take a more active role in corporate decision-making.

Another example is found in big data. The flood of business data has created even more challenges as the types of sources have expanded beyond just the RDBMS and data appliances; Hadoop, in-memory and NoSQL big data sources exist in at least 25 percent of organizations, according to our big data integration research. Our projections of growth based on what companies are planning indicate that Hadoop, in-memory and NoSQL sources will increase significantly. Each of these types must draw from systems from various providers, each of which has its own interfaces for accessing data, let alone loading it. Our research in big data finds similar results regarding data preparation: The tasks that consume the most time are reviewing data for quality and consistency (52%) and preparing data (46%). Without automation to streamline accessing and loading data, big data can be an insurmountable task for companies seeking efficiency in their deployments.

A third example is in the critical area of customer analytics. Customer data is used across many departments but especially marketing, sales and customer service. Our research again finds similar issues regarding time lost to data preparation tasks. In our next-generation customer analytics benchmark research, preparing data is the most time-consuming task (in 47% of organizations), followed closely by reviewing data (43%). The research also finds that data not being readily available is the most common point of dissatisfaction with customer analytics (in 63% of organizations). Our research finds other examples, too, in human resources, sales, manufacturing and the supply chain.

The good news is that these business-focused data preparation tools offer usability in the form of spreadsheet-like interfaces and include analytic workflows that simplify and enhance data preparation. When searching for and profiling data and examining fields based on analytics, use of color can help highlight patterns in the data. Capabilities for addressing duplicate and incorrect data about, for example, companies, addresses, products and locations are built in for simplicity of access and use. In addition, data preparation is entering a new stage in which machine learning and pattern recognition, along with predictive analytics techniques, can help guide individuals to issues and focus their efforts on looking forward. Tools also are advancing in collaboration, helping teams of analysts work together to save time and take advantage of colleagues’ expertise and knowledge of the data, along with interfacing to IT and data management professionals. In our information optimization research, collaboration is a critical technology innovation, according to more than half (51%) of organizations. They desire several collaborative capabilities ranging from discussion forums to knowledge sharing to requests on activity streams.
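To make the duplicate-handling capability mentioned above more concrete, here is a minimal Python sketch of rule-based matching for company names. It is a simplified stand-in, not how any specific tool works: the suffix list, the sample names and the `match_key` and `deduplicate` helpers are all assumptions for illustration, and real products typically rely on more sophisticated techniques such as the pattern recognition described above.

```python
import re

# Assumed list of legal suffixes to ignore when comparing company names.
SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "co"}


def match_key(name: str) -> str:
    """Build a comparison key: lowercase, drop punctuation and trailing legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def deduplicate(names):
    """Group raw names that share the same comparison key."""
    groups = {}
    for name in names:
        groups.setdefault(match_key(name), []).append(name)
    return groups


# Hypothetical company names as they might appear across several sources.
companies = ["Acme Corp.", "ACME Corporation", "acme, inc", "Globex LLC", "Globex"]
groups = deduplicate(companies)

for key, variants in groups.items():
    print(f"{key}: {variants}")
```

Each group can then be reviewed by an analyst and collapsed into a single record, which is where the collaborative review features come into play.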

This data preparation technology provides support for ad hoc and other agile approaches to working with data that map to how businesses actually operate. Taking a dedicated approach can help simplify and speed data preparation and add value by enabling users to perform analysis sooner and allocate more time to it. If you have not taken a look at how data preparation can improve analytics and operational processes, I recommend that you start now. Organizations are saving time and becoming more effective by focusing more on business value-adding tasks.