# SAS Visual Analytics: How Catwoman Influenced My Data

December 31, 2013
Image: Wikipedia

Eartha Kitt was the best Catwoman. It’s true – her voice and diction were what made her so purr-fect for the role. When I was younger I would watch the Batman reruns (POW!) and then try to walk and talk like Eartha. “How can Batgirl be the best anything when Catwoman is around??” [Check out Catwoman in action here!]

You’re probably wondering what that statement has to do with SAS or why I would mention it at all. Last week I was raving about how wonderful it is to use formats with SAS Visual Analytics to solve all your problems and pretty much end all suffering in the world (well, the reporting world). But here’s a tip for working with dressed-up data items.

## A Costume Does Not a Superhero Make

A format is a way to change how your data appears in a chart or table. You can think of a format like a costume. For instance, even if I had a Catwoman costume, it would not suddenly give me any special powers; I would just look like her. You wouldn’t find me thinking I had new agile jumping abilities or, for that matter, deciding to embark upon a criminal enterprise with my new friend the Joker. So the costume would make me look different, but inside I’m still just a geeky SAS programmer.

Think of a format the same way (this is an important concept, litter mates!). Your underlying data does not change because you changed the format. For instance, using the same example from last week, here is my data. The data item, Arrival Date (Original), is the true value; it shows the calls per minute. There were several ways to change the data item so it could appear as more than one superhero (or date value). Actually, I guess Catwoman was a supervillain, not a hero.

## Has the Joker Been in My Report?

Let’s say that we want to create an “Average Actions Per Hour” data item to plot in a chart similar to this one. The simplest thing to do is duplicate the Actions (Total) measure and change its aggregation from Sum to Average. That’s what I did in the following graphic. But when I double-checked my math on the averages … it was not what I expected.

My expectation was that 19 actions divided by 3 days would be 6.3, not 3.8. Eeek … what happened? I double-checked and the SUM is correct – there were a total of 19 calls. The average is wrong … but why? Let’s start by looking at the raw data and then removing some assumptions. My first assumption was that because the X-axis had the actions plotted by Hour, the Actions (Averaged) data item would know what to do. Turns out it does know what to do … just not the way I was thinking.

Remember when I said the costume didn’t make me Catwoman? Well, the hour format did not change the underlying data in this instance either. Instead of totaling the actions (19) and then dividing by the number of days (3), it divided by the number of values. In this case there were 5 values contributing to the end result, which is where 19 / 5 = 3.8 comes from. SAS Visual Analytics didn’t sum by hour – instead it did it by minute.
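To see the two denominators side by side, here is a minimal Python sketch (not SAS Visual Analytics code). The five rows are hypothetical: I made them up so that they total 19 actions across 3 days, matching the numbers above.

```python
from datetime import datetime

# Hypothetical rows: (arrival datetime, actions).
# Five minute-level values over three days, totalling 19 actions.
rows = [
    (datetime(2013, 12, 1, 9, 5), 4),
    (datetime(2013, 12, 1, 9, 40), 3),
    (datetime(2013, 12, 2, 9, 12), 5),
    (datetime(2013, 12, 3, 9, 7), 2),
    (datetime(2013, 12, 3, 9, 55), 5),
]

total = sum(actions for _, actions in rows)
days = {stamp.date() for stamp, _ in rows}  # distinct calendar days

average_per_value = total / len(rows)  # what a plain Average does
average_per_day = total / len(days)    # what I expected to see

print(total, len(rows), len(days))                              # 19 5 3
print(round(average_per_value, 1), round(average_per_day, 1))   # 3.8 6.3
```

The format on the X-axis never enters the calculation; only the number of rows does.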

## Geez – What is SAS Visual Analytics Doing?

Ok – now we know what we don’t want and how we inadvertently introduced errors into the reporting process. [Another failed example here.] Seems like we just need to aggregate to the hour level and then it should understand. As we went through last week, create a new calculated item. Since I want the data to be aggregated to the hour, I need to remove the minutes and seconds from the data item. Using the DateTimeFromDateHMS function, I rebuilt the data item to be at the hour level. The entire trick is really that 0s are used for the minute and second.
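The same idea, sketched in Python rather than as a SAS Visual Analytics expression: keep the date and the hour, and force the minute and second to 0. The function name `truncate_to_hour` and the sample timestamp are my own, for illustration only.

```python
from datetime import datetime

def truncate_to_hour(stamp: datetime) -> datetime:
    """Keep the date and hour; set minute, second and microsecond to 0."""
    return stamp.replace(minute=0, second=0, microsecond=0)

arrival = datetime(2013, 12, 1, 9, 40, 17)  # hypothetical arrival time
print(truncate_to_hour(arrival))  # 2013-12-01 09:00:00
```

Unlike a format, this really does change the stored value, so every arrival within the same hour on the same day ends up with the same timestamp.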

Then I changed the format to Time, showing only the hour. My result is below. SAS VA is still using the count of values as the denominator for the average, so the change to the hour level is not having any influence on the calculation. As Robin would say, “The batcomputer is none too frisky today, Batman!!”

## Turn off the Bat Signal … I’ve Got it!

Tell Commissioner Gordon we’ve figured it out – no Bat Signal required. We have to let the calculation know how we want it calculated. So let’s build a new aggregation for our measure.

1. Create a New Aggregated Measure.
2. Use a formula similar to the one below. It’s just the sum of actions divided by the distinct count of arrival days.
3. Update the chart to use your new measure.
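As a rough Python illustration of what that aggregated measure computes (this is not the SAS Visual Analytics formula itself), the sketch below sums actions per hour and divides by the distinct count of arrival days. The data, the function name `actions_per_day_by_hour`, and the optional `start`/`end` arguments (standing in for a date range filter) are all hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime

def actions_per_day_by_hour(rows, start=None, end=None):
    """Per hour: sum of actions / distinct count of arrival days."""
    totals = defaultdict(int)  # hour -> total actions
    days = defaultdict(set)    # hour -> distinct arrival days
    for stamp, actions in rows:
        day = stamp.date()
        # Skip rows outside the optional date range.
        if (start and day < start) or (end and day > end):
            continue
        totals[stamp.hour] += actions
        days[stamp.hour].add(day)
    return {hour: totals[hour] / len(days[hour]) for hour in totals}

# Hypothetical rows: (arrival datetime, actions).
rows = [
    (datetime(2013, 12, 1, 9, 5), 4),
    (datetime(2013, 12, 1, 9, 40), 3),
    (datetime(2013, 12, 2, 9, 12), 5),
    (datetime(2013, 12, 3, 9, 7), 2),
    (datetime(2013, 12, 3, 9, 55), 5),
    (datetime(2013, 12, 2, 10, 30), 6),
]

full = actions_per_day_by_hour(rows)
narrow = actions_per_day_by_hour(rows, date(2013, 12, 1), date(2013, 12, 2))
print(round(full[9], 1), full[10])  # 6.3 6.0
print(narrow[9])                    # 6.0
```

Because the denominator is recomputed from whatever rows survive the filter, narrowing the date range changes both the sum and the number of distinct days.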

Now let’s look at these results next to our original so we can compare how the chart changes. I added a date range slider so the users can determine how to explore the data. Here’s a video I made to show how the date range slider works for this example. In the video you can see how the new measure handles the distinct number of days.

## Oh What Purr-fection!

It’s like many things – when you let the machine do everything, it sometimes doesn’t make the choice you expect or intend. Two lessons from this post: how to create an aggregated measure, and don’t forget to double-check the math.

Don’t forget to check out the video.