Socializing Self-service Data Preparation to Enhance Accessibility and Governance

Increasingly, business applications are leveraging a centralized data marketplace with social features to improve collaboration.

November 7, 2017

Data socialization applies a core concept of social media and modern consumer apps – creating and sharing information – to a central data management platform. It unites data preparation, data cataloging, data stewardship, automation and governance with machine learning and social media attributes.

Increasingly, business applications are leveraging a centralized data marketplace with social features to improve collaboration so that individuals and organizations become more informed, agile and productive. By combining a data marketplace with socialization and self-service data preparation, organizations can expedite, simplify and improve best practices for organizing and optimizing data discovery. Benefits include:

  • Point-and-click access to available datasets
  • Understanding the relevancy of data in relation to context that identifies patterns of use and related assets
  • Performing data quality scoring
  • Reviewing data origin and lineage
  • Suggesting and sharing relevant sources
  • Following key users and datasets
  • Automatically recommending likely data preparation actions based on user persona

The combination of a centralized data marketplace with social features enables data scientists, analysts and even novice business users across a company to search for, share and reuse prepared, managed data. The result is enterprise-wide collaboration and agility, with better and faster business decisions, while an analytics community takes shape. Users can learn from each other and be more productive and better connected as they source, cleanse and prepare data for analytical and operational processes.

For example, two-thirds of healthcare organizations report needing additional resources in order to provide required reports on quality performance and outcomes to effectively manage their revenue cycle. And more than 75% say their clinical data is presented in a different view or in a different format from their financial data – making revenue reporting difficult, manual and time-consuming. The ability to pull data directly from any source, structured or not, prep it, export it to other applications and share it with the right users enables rapid analysis and improved efficiencies.

At Datawatch, we walk the walk by centralizing data access for marketing operations. Because an accurate view of our marketing and sales funnel – from initial prospects to closed sales – is critical to decisions about budget and campaign support, we have certified, validated data sets from Salesforce, Pardot, NetSuite, Zendesk, Google Analytics, Google AdWords and multiple social media applications, all shared across various teams. The result is a self-sufficient team that conducts go-to-market analysis and creates marketing dashboards without depending on our data science or IT resources.

Governance Reinforced

IT leaders may be concerned that the collaborative nature of a data marketplace opens enterprises up to security and governance risks. In fact, it is the opposite. The data marketplace enhances key factors of data governance – data masking, stewardship, lineage and role-based permissions – to ensure the proper management of all sources. For example, healthcare finance team members could export data from different applications and share the datasets – without revealing sensitive information – to create a full picture of the revenue cycle, reimbursement rates and individual facility performance. Rather than having individuals with rogue, personalized data sources and spreadsheets, self-service data preparation with data socialization encourages all information to be shared and stored in a centralized platform, making it far easier to manage and track.
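To make the data masking idea concrete, here is a minimal Python sketch of how sensitive fields might be pseudonymized before a dataset is shared. It is an illustration only, not how any particular data marketplace product works: the field names, the salt and the helper functions are all hypothetical, and a real platform would draw its masking rules from governance policy.

```python
import hashlib

# Hypothetical list of sensitive columns; in practice this would come
# from the organization's governance policy, not a hard-coded set.
SENSITIVE_FIELDS = {"patient_name", "ssn"}


def mask_value(value: str, salt: str = "demo-salt") -> str:
    """Replace a sensitive value with a short, stable pseudonym.

    The same input always yields the same pseudonym, so masked datasets
    can still be joined on the masked column.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "masked-" + digest[:10]


def mask_record(record: dict, sensitive=SENSITIVE_FIELDS) -> dict:
    """Return a copy of the record with sensitive fields masked."""
    return {
        key: mask_value(str(val)) if key in sensitive else val
        for key, val in record.items()
    }
```

Because the pseudonym is stable, a finance analyst could still count distinct patients or join two masked exports without ever seeing the underlying names. A salted hash is only one possible masking technique, and it is weaker than tokenization or encryption if the salt leaks.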

Through curation and governance, it becomes possible to harness the “tribal knowledge” that too often goes unshared, converting it to best practices and shared resources and even making automatic recommendations when a user chooses a dataset to work with. Users get a head start on new projects by being able to search on cataloged data, metadata and data preparation models indexed by user, type, application and unique data values to quickly find the right information.
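As a rough illustration of searching a catalog indexed by user, type, application and data values, the Python sketch below scans a small in-memory catalog. The `CatalogEntry` class, its fields and the `search` function are hypothetical names invented for this example; a production catalog would use a proper search index rather than a linear scan.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One cataloged asset (hypothetical structure for illustration)."""
    name: str
    owner: str          # user who published the asset
    source_app: str     # application the data came from
    kind: str           # e.g. "dataset" or "prep model"
    sample_values: set = field(default_factory=set)  # distinct data values


def search(catalog, term):
    """Return entries whose name, owner, application, kind or values match."""
    term = term.lower()
    hits = []
    for entry in catalog:
        haystack = [entry.name, entry.owner, entry.source_app, entry.kind]
        haystack.extend(entry.sample_values)
        # Case-insensitive substring match across all indexed attributes.
        if any(term in str(item).lower() for item in haystack):
            hits.append(entry)
    return hits
```

With entries for, say, a Salesforce funnel dataset and a Google Analytics prep model, `search(catalog, "salesforce")` would return only the first, whether the term matched the application name or a value inside the data.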

Maximizing the Data Scientist’s Role

Data scientists add value to organizations when their deep skillsets, like R and Python scripting, are applied to complex data requests. With self-service data preparation and data socialization, they gain the ability to acquire and prepare data from any source, eliminate or automate redundant work across different silos, and share techniques and curated data with peers, empowering business users to leverage data prep features themselves or access pre-prepped datasets. This enables data scientists to serve as an organization’s stewards, monitoring shared content and determining whether it should be certified as an enterprise data source. They can work to improve data quality and build trust in their analytics for better outcomes for everyone.