SmartData Collective
© 2008-25 SmartData Collective. All Rights Reserved.

Approaches to Big Data Visualization

Richard

Displaying results

Data visualization is the practice of organizing and displaying data, manually or otherwise, in a pictorial or graphic format so that your audience can:

  • See the results of your analysis efforts more clearly
  • Simplify the complexities within the data you are using
  • Understand and grasp a point that you are using the data to make

Not a new concept

The concept of using pictures (typography, color, contrast, and shape) to communicate or understand data is not new. It has been around for centuries, from the manual creation of maps and graphs in the 17th century to the invention of the pie chart in the early 1800s.

Today, computers can process large amounts of data lightning fast, which makes visualizations tremendously more valuable. Going forward, we can expect the data visualization process to continue to evolve, perhaps as more of a mixture of art and science than a number-crunching technology.

Instant gratifications

An exciting example of this evolution is how the industry has moved past simply generating and publishing charts and graphs for an audience to review and deliberate on. Audiences now expect interactive visualizations.

With interactive visualization, we can take the concept of data visualization much further by using technology to let the audience interact with the data. The user gets the self-service ability to drill down into the generated pictures, charts, and graphs (to access more or specific details) and to change, interactively and in real time (or near real time), what data is displayed (perhaps a different time frame or event) and how it is processed or presented (maybe a bar graph rather than a pie chart).

This allows visualizations to be much more effective and personalized.
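
Behind the scenes, drilling down or switching the time frame usually means re-filtering and re-aggregating the same records on request. The following is a minimal Python sketch of that idea, not a real charting tool; the sample records and the `summarize` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (date, region, city, sales amount)
RECORDS = [
    (date(2016, 1, 5), "East", "Boston", 120.0),
    (date(2016, 1, 9), "East", "New York", 200.0),
    (date(2016, 2, 2), "West", "Seattle", 90.0),
    (date(2016, 2, 20), "East", "Boston", 60.0),
]

def summarize(records, start, end, level="region"):
    """Total sales between start and end (inclusive), grouped by level."""
    column = {"region": 1, "city": 2}[level]
    totals = defaultdict(float)
    for row in records:
        if start <= row[0] <= end:
            totals[row[column]] += row[3]
    return dict(totals)

# High-level view: the whole year, grouped by region.
overview = summarize(RECORDS, date(2016, 1, 1), date(2016, 12, 31))

# Drill-down: the user picks the East region and narrows to January.
east_only = [r for r in RECORDS if r[1] == "East"]
drill_down = summarize(east_only, date(2016, 1, 1), date(2016, 1, 31), level="city")
```

An interactive front end would call something like `summarize` again each time the user changes the time frame or grouping, then redraw the chart from the new totals.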

In Chapter 5, Displaying Results with D3, we will go through displaying the results of big data analysis in a typical web browser with Data-Driven Documents (D3), using a variety of examples. D3 lets you apply pre-built data visualizations to datasets.

Data-driven documents

Data-Driven Documents is referred to within the open source community as D3.

D3 is an open source library written in JavaScript. Its objective is to make it easy to manipulate documents based on data using standard web technologies (such as HTML or CSS). Its value-add is to provide you with full capabilities without having to build your own framework or tie yourself to a proprietary one.

These library components give you excellent tools for big data visualization and a data-driven approach to DOM manipulation. D3’s functional style allows you to reuse code modules that you (or others) have already built, adding pretty much any features you need or want. The result can become as powerful as you want it to be (or have the time to make it), giving your data visualizations a unique style and making them interactive, exactly how you want or need them to be.
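
To make the phrase "data-driven" concrete, here is a rough Python sketch of the underlying idea: each element of the document is derived from one data value. This is not D3 and does not use D3's API; the `bars_to_svg` function and its parameters are assumptions made for illustration.

```python
def bars_to_svg(values, bar_width=20, gap=5, height=100):
    """Return an SVG string with one <rect> per data value.

    Assumes values is non-empty and its largest value is positive.
    """
    peak = max(values)
    rects = []
    for i, value in enumerate(values):
        bar_height = round(height * value / peak)  # scale to the tallest bar
        x = i * (bar_width + gap)
        y = height - bar_height                    # SVG's y axis points down
        rects.append(
            f'<rect x="{x}" y="{y}" width="{bar_width}" height="{bar_height}"/>'
        )
    width = len(values) * (bar_width + gap)
    body = "".join(rects)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">{body}</svg>'
    )

svg = bars_to_svg([4, 8, 15, 16])
```

Changing the input list changes the document. D3 applies the same principle in the browser, where it can also update elements that already exist.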

Dashboards

As discussed earlier in this chapter, big data is collecting and accumulating daily (in fact, minute by minute), and there is a realization that organizations rely on this information for a variety of reasons.

Various types of reporting formats are utilized on this data, including data dashboards.

As with everything, there are various opinions as to the most accurate definition of a data dashboard.

For example, A. Chiang writes:

“A dashboard is a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance.”

Refer to the following link for more information: http://www.dashboardinsight.com/articles/digital-dashboards/fundamentals/what-is-a-dashboard.aspx.

Whatever the definition, any dashboard has the capacity for supplying timely, important information for its audience to use in decision making, if it is well designed and constructed.

It is critical that dashboards present data in a relevant, concise, and well-thought-out manner (not just as a collection of visual representations in a workbook or spreadsheet). In addition, dashboards need a supporting infrastructure capable of refreshing the dashboard in a timely manner, as well as some form of data quality assurance (DQA). Making decisions based upon a dashboard with incorrectly presented, stale, or even incorrect data can lead to disaster.
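
As a rough illustration of the refresh and quality checks described above, the following Python sketch flags stale data and a couple of basic data problems before a dashboard is rendered. The `check_dashboard_data` function, the `revenue` field, and the 15-minute limit are all hypothetical choices, not part of any particular dashboard product.

```python
from datetime import datetime, timedelta

def check_dashboard_data(rows, last_refresh, now, max_age=timedelta(minutes=15)):
    """Return a list of problems; an empty list means the data passed."""
    problems = []
    # Freshness check: has the data been refreshed recently enough?
    if now - last_refresh > max_age:
        problems.append("stale: last refresh is older than the allowed age")
    # Basic quality checks on each row.
    for i, row in enumerate(rows):
        if row.get("revenue") is None:
            problems.append(f"row {i}: missing revenue")
        elif row["revenue"] < 0:
            problems.append(f"row {i}: negative revenue")
    return problems

now = datetime(2016, 11, 1, 12, 0)
rows = [{"revenue": 10.0}, {"revenue": None}, {"revenue": -3.0}]
issues = check_dashboard_data(
    rows, last_refresh=datetime(2016, 11, 1, 11, 30), now=now
)
```

A real pipeline would likely show a warning on the dashboard, or block publication, whenever `issues` is not empty.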

Chapter 6, Dashboard for Big Data – Tableau, of this book examines the topic of effective dashboarding and includes working examples demonstrating solutions for effectively presenting results based upon your big data analysis in a real-time dashboard format using Tableau.

Tableau is categorized as business intelligence software designed to help people see and understand data; more than just a code library, Tableau is considered to be a suite or a family of interactive data visualization products.

Tableau’s structure allows us to combine multiple views of data from multiple sources into a single, highly effective dashboard that can provide data consumers with much richer insights. Tableau also works with a variety of data formats (both structured and unstructured) and can handle the volumes of big data (terabytes or petabytes, millions or billions of rows), turning that big data into valuable visualizations for targeted audiences.

To address the velocity of today’s big data world, you can use Tableau to connect directly to local and cloud data sources, or just import your data for fast in-memory (more on in-memory later in this book) performance.

Another goal of Tableau is self-service analytics (which we mentioned earlier in this chapter and will talk more about later on), where a user can have a dialog with selected data, asking questions in real time (not in batch mode) using easy point-and-click analytics to mine big data intuitively and to discover insights and opportunities that may exist within the dataset or datasets.

Some of the more exciting abilities Tableau offers include:

  • Real-time drag-and-drop cluster analysis
  • Cross data source joining
  • Powerful data connectors
  • Mobile enabled
  • Real-time territory or region data exploration

Outliers

In Chapter 7, Dealing with Outliers Using Python, we will dive into outliers.

As was defined earlier in this chapter, an outlier is an observation point that is distant or vastly different from the other observed data points within the data.

Although outliers typically represent (only) about 1 to 5 percent of your data, when you’re working with big data, investigating, or even just viewing, 1 to 5 percent of that data is rather difficult.
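
One common rule of thumb for flagging points that are "distant" from the rest is the interquartile range (IQR) fence: anything more than 1.5 IQRs below the first quartile or above the third quartile is treated as a candidate outlier. The sketch below uses Python's standard `statistics` module (Python 3.8 or later). The rule, the `iqr_outliers` helper, and the sample readings are illustrative assumptions, not necessarily the approach taken in Chapter 7.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k IQRs outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 14, 95]
outliers = iqr_outliers(readings)
```

Here only the reading of 95 falls outside the fences. Whether it should be removed, capped, or kept is the adjudication question discussed next.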

Investigation and adjudication

Outliers, you see, can be determined to be non-influential or very influential to the point you are trying to make with your data visualization.

The act or process of making this determination is critically important to your analysis, but it is also very problematic when dealing with the larger volumes, many varieties, and velocities of big data. For example, a fundamental step in making this determination is sizing your samples, that is, calculating the percentage of outliers relative to the size of the data sample. This is not so simple a task when the data is in terabytes or petabytes!
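
When the full dataset is too large to scan comfortably, one pragmatic option is to estimate the outlier percentage from a random sample. The sketch below shows the arithmetic only. The data, the threshold of 100, and the `outlier_percentage` helper are hypothetical, and a sample-based estimate will generally differ somewhat from the true share.

```python
import random

def outlier_percentage(sample, is_outlier):
    """Percentage of observations in sample flagged by is_outlier."""
    if not sample:
        raise ValueError("sample must not be empty")
    flagged = sum(1 for value in sample if is_outlier(value))
    return 100.0 * flagged / len(sample)

# Hypothetical data: 980 ordinary readings and 20 extreme ones.
population = [50.0] * 980 + [500.0] * 20

rng = random.Random(42)               # fixed seed so the run is repeatable
sample = rng.sample(population, 200)  # inspect 200 rows instead of all 1,000

full_share = outlier_percentage(population, lambda v: v > 100)
sample_share = outlier_percentage(sample, lambda v: v > 100)
```

For this made-up population, `full_share` is exactly 2.0 percent, while `sample_share` is only an estimate that depends on which rows were drawn.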

Identifying and removing outliers can be tremendously complicated and there are many differences in opinions as to how to go about determining the percentage of outliers that exist in your dataset as well as determining their effect on the data and deciding what to do with them. It is, however, generally accepted that an automated process can be created that can facilitate at least the identification of outliers, possibly even through the use of visualization.

Carrying on, all the approaches for the investigation and adjudication of outliers, such as sorting, capping, graphing, and so on, require manipulating and processing the data using a tool that is feature-rich and robust.
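
Two of the approaches just mentioned, sorting and capping, can be sketched in a few lines of Python. The readings and the cap limits of 7 and 18 are hypothetical; in practice the limits would come from your own analysis of the data.

```python
def cap(values, lower, upper):
    """Clamp each value into the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

# Hypothetical readings with one extreme value.
readings = [10, 12, 11, 13, 12, 11, 14, 95]

# Sorting brings the most extreme values to the front for inspection.
ranked = sorted(readings, reverse=True)

# Capping keeps every observation but limits the influence of extremes.
capped = cap(readings, lower=7, upper=18)
```

Capping changes the data, so it is worth recording which values were altered and why.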

This chapter offers working examples demonstrating solutions for effectively and efficiently identifying and dealing with big data outliers (as well as some other dataset anomalies) using Python.

Python is a scripting language that is extremely easy to learn and incredibly readable, since its coding syntax so closely resembles the English language.

According to the article The 9 most in-demand programming languages of 2016, by Bouwkamp, available at http://www.codingdojo.com/blog/9-most-in-demand-programming-languages-of-2016, Python is listed among the most in-demand programming languages (at the time of writing).

Created by Guido van Rossum as far back as 1989, Python is actually very simple in nature, but the industry also considers it to be extremely powerful and fast, and it can be run in almost any environment.

As per www.python.org:

“Open sourced (and free!), Python is part of the winning formula for productivity, software quality, and maintainability at many companies and institutions around the world.”

There is a growing interest within the industry in using the Python language for data analysis, and even for big data analysis. It is an exceptional choice for the data scientist’s typical day-to-day activities because it provides a standard library as well as third-party libraries (some focusing on big data or scientific computing, such as Pydoop and SciPy) to accomplish almost anything you need or want to do with the data you have or are accumulating, including:

  • Automations
  • Building websites and web pages
  • Accessing and manipulating data
  • Calculating statistics
  • Creating visualizations
  • Reporting
  • Building predictive and explanatory models
  • Evaluating models on additional data
  • Integrating models into production systems

As a final note here, Python’s standard library is very extensive, offering a wide range of built-in modules that provide access to system functionality, as well as standardized solutions to many problems that occur in everyday programming, making Python an obvious choice to explore for dealing with big data outliers and related processing.
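
As a small taste of what the standard library offers, the built-in `statistics` module covers common summary statistics without any third-party packages. The values below are made up for illustration.

```python
import statistics

# Hypothetical measurements.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle of the sorted values
spread = statistics.pstdev(values)  # population standard deviation
common = statistics.mode(values)    # most frequent value
```

For larger datasets, libraries such as SciPy provide faster and richer equivalents, but the standard library is often enough for a first look.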

Operational intelligence

In Chapter 8, Big Data Operational Intelligence with Splunk, of this book, we concentrate on big data Operational Intelligence.

Operational intelligence (OI) is a type of analytics that attempts to deliver visibility and insight from (usually machine generated) operational or event data, running queries against streaming data feeds in real time, producing analytic results as operational instructions, which can be immediately acted upon by an organization, through manual or automated actions (a clear example of turning datasets into value!).
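
The pattern described above, a query running continuously against a stream and emitting instructions, can be sketched with a Python generator. This is a toy model rather than how any OI product works internally; the event layout, the `monitor` function, and the rule of two errors within the last three events are all assumptions.

```python
from collections import deque

def monitor(events, window=3, threshold=2):
    """Yield an instruction whenever the last `window` events
    contain at least `threshold` errors."""
    recent = deque(maxlen=window)  # sliding window over the stream
    for event in events:
        recent.append(event)
        errors = sum(1 for e in recent if e["status"] == "ERROR")
        if errors >= threshold:
            yield (
                f"investigate {event['machine']}: "
                f"{errors} errors in last {len(recent)} events"
            )

# Hypothetical event feed from a single machine.
feed = [
    {"machine": "m1", "status": "OK"},
    {"machine": "m1", "status": "ERROR"},
    {"machine": "m1", "status": "OK"},
    {"machine": "m1", "status": "ERROR"},
    {"machine": "m1", "status": "ERROR"},
]
actions = list(monitor(feed))
```

Because `monitor` consumes any iterable, the same function could read from a live feed instead of a list, yielding each instruction as soon as the rule fires.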

Sophisticated OI systems also provide the ability to associate metadata with certain metrics, process steps, channels, and so on, found within data. With this ability, it becomes easy to acquire additional related information. For example, machine-generated operational data is typically full of unique identifiers and result or status codes. These codes or identifiers may be efficient for processing and storage, but are not always easily interpreted by human beings. To make this data more readable (and therefore more valuable), we can associate additional, more user-friendly information with the data results, possibly in the form of a status or event description or perhaps a product name or machine name.
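
In its simplest form, this kind of enrichment is a lookup from a code to a friendlier description. The Python sketch below illustrates the idea; the lookup tables, field names, and machine name are hypothetical.

```python
# Hypothetical lookup tables mapping codes to readable descriptions.
STATUS_DESCRIPTIONS = {
    "200": "OK",
    "404": "Not Found",
    "503": "Service Unavailable",
}
MACHINE_NAMES = {"a1f3": "packaging-line-2"}

def enrich(event):
    """Return a copy of event with human-readable fields added."""
    enriched = dict(event)
    enriched["status_text"] = STATUS_DESCRIPTIONS.get(
        event["status"], "unknown status"
    )
    enriched["machine_name"] = MACHINE_NAMES.get(
        event["machine_id"], "unknown machine"
    )
    return enriched

raw_event = {"machine_id": "a1f3", "status": "503"}
readable = enrich(raw_event)
```

The original event is left untouched, so the compact codes remain available for storage and processing while the enriched copy is used for display.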

Once there is an understanding of the challenges of applying basic analytics and visualization techniques to operational big data, the value of that data can be better or more quickly realized. In this chapter, we offer working examples demonstrating solutions for the valuing of operational or event big data with operational intelligence using Splunk.

So, what is Splunk? H. Klein says:

“Splunk started out as a kind of “Google for Log files”. It does a lot more… It stores all your logs and provides very fast search capabilities roughly in the same way Google does for the internet…”

Splunk software is a great tool to help unlock hidden value in machine-generated operational data (as well as other types of data). With Splunk, you can collect, index, search, analyze, and visualize all your data in one place, providing an integrated method to organize and extract real-time insights from massive amounts of machine data from virtually anywhere.

Splunk stores data in flat files and indexes them; it doesn’t require any database software running in the background to make this happen. Splunk calls these stores indexes, and the components that build them indexers. Splunk can index any type of time-series data (data with timestamps), making it an optimal choice for big data OI solutions. During data indexing, Splunk breaks data into events based on the timestamps it identifies.
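
To give a feel for timestamp-based event breaking, here is a simplified Python sketch: a line that starts with a timestamp opens a new event, and any other line is attached to the event before it. This is not Splunk's actual implementation. The timestamp layout, the `split_events` function, and the sample log are assumptions.

```python
import re
from datetime import datetime

# Assumed timestamp layout: "YYYY-MM-DD HH:MM:SS " at the start of a line.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ")

def split_events(raw):
    """Split raw text into (timestamp, text) events."""
    events = []
    for line in raw.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            # A timestamp starts a new event.
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append([stamp, line[match.end():]])
        elif events:
            # Continuation line: append it to the current event.
            events[-1][1] += "\n" + line
    return [tuple(event) for event in events]

raw_log = (
    "2016-11-01 12:00:01 ERROR disk full\n"
    "  at volume /data\n"
    "2016-11-01 12:00:05 INFO cleanup started\n"
)
events = split_events(raw_log)
```

The three input lines become two events, with the indented line kept as part of the first one.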

Although simple search terms (for example, a machine ID) will work, Splunk also offers its own Search Processing Language (SPL). SPL (think of it as being somewhat like SQL) is an extremely powerful tool for searching enormous amounts of big data and performing statistical operations on what is relevant within a specific context.
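
The "search, then compute statistics" pattern can be mimicked in plain Python to show what such a query does conceptually: first keep the events that match a term, then count them per field value. This is only an analogy, roughly what an SPL search for a term followed by a `stats count by host` step would express; the events and field names here are hypothetical.

```python
from collections import Counter

# Hypothetical indexed events.
events = [
    {"host": "web1", "text": "ERROR timeout"},
    {"host": "web2", "text": "INFO started"},
    {"host": "web1", "text": "ERROR disk full"},
    {"host": "web3", "text": "ERROR timeout"},
]

# Step 1: search, keeping only events that contain the term.
matches = [e for e in events if "ERROR" in e["text"]]

# Step 2: statistics, counting matching events per host.
counts = Counter(e["host"] for e in matches)
```

A search language performs the same two steps declaratively and at a scale where a simple in-memory list would not be practical.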

This tutorial has been taken from Big Data Visualization by James D. Miller. Use the code ORSCF50 at the checkout to save 50% on the RRP until the 30th of November.

Richard Gall is co-editor of the Packt Hub. He’s interested in politics, tech culture, and how software is being used by modern businesses.
