SmartData Collective

Python and Productivity

Editor SDC
Last updated: 2009/04/22 at 11:26 AM

One of the main benefits of programmability is the ability to extend and automate SPSS Statistics capabilities. I’d like to tell you the story of a recent extension effort: the SPSSINC TURF command and dialog.

TURF stands for Total Unduplicated Reach and Frequency, a common technique in market research. Suppose you have a survey about sports viewing. It asks about football, soccer, baseball, basketball, hockey, and other sports. You would like to know how to reach the most viewers with no more than three sports.

You could use FREQUENCIES to tabulate the positive responses to each sport, but that doesn’t answer the question, because the audiences overlap. What you want is the highest reach among combinations of up to three sports, with each viewer counted only once.

Calculating the TURF requires finding the set union of positive responses for every combination of sports up to a certain size, and then presenting the best of those combinations. The task is conceptually simple, but it is computationally demanding: the number of combinations grows explosively as the number of questions increases.
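To put a number on that growth, here is a small sketch that counts how many combinations have to be evaluated. The function name `turf_combination_count` is made up for this illustration and is not part of SPSSINC TURF.

```python
from math import comb


def turf_combination_count(n_items, max_size):
    """Count the combinations of 1 up to max_size items chosen from n_items."""
    return sum(comb(n_items, k) for k in range(1, max_size + 1))


# Six sports, combinations of up to three: 6 + 15 + 20
print(turf_combination_count(6, 3))   # 41

# Thirty questions, still only up to three at a time: 30 + 435 + 4060
print(turf_combination_count(30, 3))  # 4525
```

Each of those combinations needs its own set union, which is why the size of the problem matters so much.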

SPSS Statistics does not have a built-in way to do this, so I set out to create an extension command for it, implemented in Python: SPSSINC TURF. First, I decided to work with transposed data and the built-in set algebra capabilities of Python. I make a pass through the question data and create a set for each question, listing the case numbers that have positive responses. That’s just a few lines of code.
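The idea of transposing the data into one set per question can be sketched in plain Python. This is an illustration only, not the SPSSINC TURF source: it assumes the responses are already available as rows of 0/1 values, and the names `build_reach_sets`, `questions`, and `rows` are hypothetical.

```python
def build_reach_sets(rows, questions):
    """Return {question: set of case numbers with a positive response}.

    rows is assumed to be an iterable of sequences holding one 0/1
    answer per question, in the same order as `questions`.
    """
    reach = {q: set() for q in questions}
    for case_number, row in enumerate(rows, start=1):
        for question, answer in zip(questions, row):
            if answer == 1:
                reach[question].add(case_number)
    return reach


questions = ["football", "soccer", "baseball"]
rows = [
    (1, 0, 0),  # case 1 watches football only
    (1, 1, 0),  # case 2 watches football and soccer
    (0, 0, 1),  # case 3 watches baseball only
    (0, 0, 0),  # case 4 watches none of them
]
reach = build_reach_sets(rows, questions)

# The reach of a combination is the size of the union of its sets.
print(len(reach["football"] | reach["soccer"]))  # 2
```

Once the data is in this form, the overlap problem disappears: a set union counts each case only once.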

The trickier part was figuring out how to manage all the set union calculations. It’s a set of tree structures for which a little bit of recursion boils the work down to a few lines of code. My first try was getting clumsy, so I went out for a bike ride for a few hours and came back with the algorithm worked out in my head. I believe in the left-brain, right-brain approach: study something intensively; then relax or do something different, and things are much clearer when you return to the subject.
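One way such a recursion might look is sketched below. It is not the actual SPSSINC TURF algorithm, only an example of walking the tree of combinations so that each larger combination reuses the union already computed for its parent. The function name `turf` and the sample data are assumptions made for this sketch.

```python
def turf(reach, max_size):
    """Return (reach count, combination) pairs, best reach first.

    reach maps each question name to the set of case numbers that
    responded positively to it.
    """
    names = sorted(reach)
    results = []

    def extend(start, chosen, covered):
        # Add each remaining question in turn to the current combination.
        for i in range(start, len(names)):
            combo = chosen + (names[i],)
            union = covered | reach[names[i]]  # reuse the parent's union
            results.append((len(union), combo))
            if len(combo) < max_size:
                extend(i + 1, combo, union)

    extend(0, (), frozenset())
    # Highest reach first; ties go to smaller, then alphabetical, combinations.
    results.sort(key=lambda r: (-r[0], len(r[1]), r[1]))
    return results


sample = {
    "football": {1, 2, 3},
    "soccer": {3, 4},
    "baseball": {1, 2},
    "hockey": {5},
}
for count, combo in turf(sample, 2)[:3]:
    print(count, combo)
```

Because every combination extends one that has already been evaluated, each step costs a single set union rather than a union rebuilt from scratch.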

Putting this together, I finished the code, but I was worried that this task would be so computationally demanding that it would be too slow to be useful. As it turned out, though, the approach I took, heavily leveraging Python sets and some other features, runs amazingly fast. And although the sets have to fit in memory, it seems to handle pretty large problems.

I went on to create a dialog box interface using the Version 17 Custom Dialog Builder (CDB), and extension command syntax using the extension mechanism. That mechanism requires a small XML file to define the syntax and uses our extension.py module to handle the interface.

So, what sort of effort did this take? Less than one day, including the bike ride. How much more productive could you be? Combining Python and SPSS Statistics with the CDB and other tools reduced this task to about 225 lines of code plus the dialog and XML.

I posted this to SPSS Developer Central, where it can be downloaded for free. It is written for SPSS Statistics 17, but it will work with version 16 (not including the dialog) with a small change documented in the readme file. One competing product that does this as a main feature sells for a 4-figure price.

The original version I posted had a subset of the features I had thought about doing. I wanted to see what interest there might be. Within a few days I had received and implemented a few enhancement requests. By getting the first version out to the world, it was easier to see what additional features users might want. Again, that means higher productivity, because I am not implementing things that would probably not be used. But maybe I’ll do more later.

In my experience, this is typical of many programmability projects: big results for small amounts of work. Of course, I’ve done this a lot, so I know all the tools and how to approach a problem. Programmability definitely requires an investment in learning the technology, but it’s hard to beat the ROI.
