Data Analysis Using Relationship Graphs

February 15, 2011

There are four key data visualization techniques used by data analysis pros in the government and local law enforcement.  As financial institutions, e-commerce organizations and social network analysts begin to apply data visualization more frequently, these techniques will help guide the process of uncovering meaningful insights hidden within mountains of disparate data.  This post focuses on advanced data visualization using relationship graphs.

In our last post (“Four Key Data Visualization Techniques Used by the Pros”), we mentioned four important techniques in data visualization.  They are:

 

1)   Data Preparation & Data Connectivity

2)   Data Profiling

3)   Advanced Analysis Using Relationship Graphs

4)   Annotation, Collaboration and Presentation

 

We summarized the key aspects of data profiling, especially as they relate to uncovering data anomalies prior to advanced analysis.  Using a fraud analysis example, we profiled banking alerts across business lines.  The fraud analyst found that specific loan officers were linked to more than one fraud alert.  The alerts also seemed to be concentrated in specific branches.

This post tackles the third phase of the analysis: advanced analysis using relationship graphs.  Unlike traditional forms of business intelligence, which usually present summary-level charts in a dashboard format, relationship graphs show linkages (relationships) between data entities.  Here’s a simple relationship graph from an earlier post that shows linkages between people, flights and addresses:

 

 

This graph shows that three different people are linked to one common address at 2911 Major Avenue in Minneapolis.  It shows the flights they took and other addresses with which they are associated.  Using this type of data visualization, intelligence analysts identify important connections between data.  They discover “networks” of people, activity and events.  Additional investigation may include watch list checks, identity verification of people in the network and supplemental data analysis using related information from blogs or news.
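The shared-address pattern described above can be sketched in a few lines of code. This is only an illustration, not the tool used to draw the graph: the Major Avenue address comes from the example, while the people, flights, other address and the `shared_entities` helper are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical edge list: (person, relationship, entity).
# Only the Major Avenue address appears in the post; everything else is a placeholder.
edges = [
    ("Person A", "lives_at", "2911 Major Avenue, Minneapolis"),
    ("Person B", "lives_at", "2911 Major Avenue, Minneapolis"),
    ("Person C", "lives_at", "2911 Major Avenue, Minneapolis"),
    ("Person A", "took_flight", "Flight 101"),
    ("Person B", "took_flight", "Flight 101"),
    ("Person C", "took_flight", "Flight 202"),
    ("Person C", "lives_at", "Other Address 1"),
]


def shared_entities(edges, min_people=2):
    """Return {entity: sorted people} for entities linked to at least min_people people."""
    people_by_entity = defaultdict(set)
    for person, _relation, entity in edges:
        people_by_entity[entity].add(person)
    return {
        entity: sorted(people)
        for entity, people in people_by_entity.items()
        if len(people) >= min_people
    }


# Entities that connect three or more people, i.e. candidate hubs of a "network".
hubs = shared_entities(edges, min_people=3)
print(hubs)
```

Lowering `min_people` to 2 would also surface the shared flight, which is the kind of secondary connection an analyst might check next.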

Relationship graphs are not only used by government agencies and local law enforcement.   CRM analysts explore product purchasing behavior by customer, type of product, store and region.  Marketers measure lead generation performance by analyzing linkages between key phrases used from the major search engines, web pages, completed web forms, opportunities and closed deals.  Pharmaceutical companies identify influential networks of physicians based on accreditations, hospital affiliations, publications, patients and other attributes.

Returning to our fraud analysis example, let’s use this form of advanced analysis to show relationships between banking customers, loan officers, branch affiliations and the address of the property associated with each bank loan.

 

 

After filtering the data to analyze just high appraisal alerts, the analyst notices that some customers are linked to properties in states where their loan officer has no branch affiliation.  For example, Dan Lane owns a property in Washington State.  His loan officer is Charles Head, who is assigned to three branches, none of which are in Washington.  Robert Miles has a loan for a property in Maryland with a loan officer (Jack Carnahan) who works in the Los Angeles branch.  John Kilpatrick (center of the graph) exhibits similar data anomalies.  These types of insights are almost impossible to discern from detailed tables, spreadsheets or charts, but relationship graphs reveal them at a glance.
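The anomaly the analyst spots visually amounts to a simple rule: flag any loan whose property sits in a state where the loan officer has no branch. Here is a minimal sketch of that rule. The customer and officer names and the Washington, Maryland and Los Angeles details come from the example above; Charles Head's branch states, the control record and the `out_of_state_loans` helper are assumptions made for illustration.

```python
# States in which each loan officer has a branch.
officer_branch_states = {
    # Assumed values: the post only says none of his three branches are in Washington.
    "Charles Head": {"CA", "NV", "AZ"},
    # Los Angeles branch, per the post.
    "Jack Carnahan": {"CA"},
}

loans = [
    {"customer": "Dan Lane", "officer": "Charles Head", "property_state": "WA"},
    {"customer": "Robert Miles", "officer": "Jack Carnahan", "property_state": "MD"},
    # Hypothetical control record that should NOT be flagged.
    {"customer": "Sample Customer", "officer": "Jack Carnahan", "property_state": "CA"},
]


def out_of_state_loans(loans, officer_branch_states):
    """Return (customer, officer, state) for loans outside the officer's branch states.

    An officer with no branch data is treated as having no states, so every
    one of that officer's loans is flagged for review.
    """
    flagged = []
    for loan in loans:
        states = officer_branch_states.get(loan["officer"], set())
        if loan["property_state"] not in states:
            flagged.append((loan["customer"], loan["officer"], loan["property_state"]))
    return flagged


flagged = out_of_state_loans(loans, officer_branch_states)
for customer, officer, state in flagged:
    print(f"{customer}: property in {state}, outside the branch states of {officer}")
```

A rule like this only finds the pattern once someone knows to look for it; the graph is what suggests the rule in the first place.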

Relationship graphs can also be constructed using data-driven attributes.  For example, analysts can pinpoint the most connected nodes or the links with the highest value.  When combined with other forms of data visualization, a more detailed picture is revealed.  In the graph below, the loan officers and banking customers are scaled based on the number of connections they have.  The thickness of the links shows the amount of money at risk to the bank.  The timeline on the left shows the length of time between account origination and an alert being triggered.  Since the visualizations interact with one another, the analyst can identify a person of interest in seconds rather than days.
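The two data-driven attributes mentioned here, node size by number of connections and link thickness by money at risk, can be computed directly from a list of links. The sketch below is one possible approach under assumed data: the customers, officers, amounts, width range and both helper functions are hypothetical, and a real visualization tool may scale things differently.

```python
from collections import Counter

# Hypothetical (customer, loan officer, amount at risk) links.
links = [
    ("Customer 1", "Officer X", 250_000),
    ("Customer 2", "Officer X", 400_000),
    ("Customer 3", "Officer X", 125_000),
    ("Customer 3", "Officer Y", 90_000),
]


def node_degrees(links):
    """Count how many links touch each node; this drives the node size."""
    degree = Counter()
    for customer, officer, _amount in links:
        degree[customer] += 1
        degree[officer] += 1
    return degree


def link_widths(links, min_width=1.0, max_width=8.0):
    """Linearly scale each link's amount at risk into [min_width, max_width]."""
    amounts = [amount for _, _, amount in links]
    lo, hi = min(amounts), max(amounts)
    span = (hi - lo) or 1  # avoid dividing by zero when all amounts are equal
    return {
        (customer, officer): min_width + (amount - lo) / span * (max_width - min_width)
        for customer, officer, amount in links
    }


degrees = node_degrees(links)
widths = link_widths(links)
print(degrees.most_common(1))  # the most connected node
print(widths)
```

Linear scaling is the simplest choice; when a few very large loans dominate, a logarithmic scale may keep the smaller links readable.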

 

 

For example, a short interval alert may correspond to a customer connected to more than one fraud alert.  That customer may be connected to a loan officer who shares connections with other people of interest.  Each of these people may be involved in banking transactions where the money at risk to the bank is significant.
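The first step of that chain of reasoning, finding customers who have more than one alert and at least one short-interval alert, could be approximated as follows. This is a sketch only: the alert records, the 30-day threshold and the `people_of_interest` helper are assumptions, not values taken from the analysis.

```python
from collections import defaultdict
from datetime import date

# Hypothetical alerts: (customer, account origination date, alert date).
alerts = [
    ("Customer 1", date(2010, 1, 4), date(2010, 1, 20)),
    ("Customer 1", date(2010, 1, 4), date(2010, 3, 1)),
    ("Customer 2", date(2009, 6, 1), date(2010, 6, 1)),
]


def people_of_interest(alerts, max_days=30, min_alerts=2):
    """Customers with at least min_alerts alerts, one of them within max_days of origination."""
    intervals = defaultdict(list)
    for customer, opened, alerted in alerts:
        intervals[customer].append((alerted - opened).days)
    return sorted(
        customer
        for customer, days in intervals.items()
        if len(days) >= min_alerts and min(days) <= max_days
    )


shortlist = people_of_interest(alerts)
print(shortlist)
```

From a shortlist like this, the analyst would follow the graph outward to the connected loan officers and the money at risk.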

In this case, advanced analysis using relationship graphs has provided a detailed picture of connections that the fraud analyst can use to isolate cases, prioritize resources and investigate at a pace far beyond what would have been possible using traditional forms of business intelligence.  The time saved in this type of analysis can be enormous.  Accurate results are a by-product of the process.

As we will learn in our next two posts, these visualizations are very effective forms of communication allowing analysts to collaborate.   When coupled with the flexibility to integrate other sources of data, relationship graphs can reveal even greater insights. 

This type of analysis has been applied across many domains.  Fraud, cyber and intelligence analysis represent three core areas where these techniques have proven useful.  But the applications of relationship graphing extend far beyond these domains.  With the growth of social media, Social Network Analysis (SNA) is becoming more widely adopted to identify important connections, affiliations and spheres of influence across a wide variety of data sets.  At the heart of SNA is the idea that certain people, topics and events are influential within and outside the network.  The same approach is being used to identify and measure other spheres of influence in the life sciences world and social media.  Since a breakdown in one part of a network could negatively impact other parts of the network, the same techniques can be applied in manufacturing, sales and e-commerce.  Some of these important topics will be explored in future posts.

To learn more about the application of data visualization techniques, please visit www.centrifugesystems.com or www.visualsalesperformance.com.