SmartData Collective

Python and Productivity

Editor SDC

One of the main benefits of programmability is the ability to extend and automate SPSS Statistics capabilities. I’d like to tell you the story of a recent extension effort: the SPSSINC TURF command and dialog.

TURF stands for Total Unduplicated Reach and Frequency. It is a common technique in market research. Suppose you have a survey about sports viewing popularity. It asks about football, soccer, baseball, basketball, hockey, and other sports. You would like to know how to reach the most viewers with no more than three sports.

You could tabulate the positive responses to each sport with FREQUENCIES. But this doesn’t answer the question, because the audiences will overlap. You would like to know the highest reach among combinations of up to three sports, with the overlap eliminated.
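To make the overlap problem concrete, here is a minimal sketch with made-up data. The case numbers and the two sets are hypothetical, not taken from any real survey.

```python
# Hypothetical viewers, identified by case number.
football = {0, 1, 4}
basketball = {0, 4}

# Adding the per-sport counts double-counts anyone who watches both.
summed_counts = len(football) + len(basketball)

# The unduplicated reach is the size of the set union.
reach = len(football | basketball)

# The difference between the two is the overlapping audience.
overlap = len(football & basketball)

print(summed_counts, reach, overlap)
```

Here the two counts add up to 5, but only 3 distinct viewers are reached, because 2 of them watch both sports.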

Calculating the TURF requires finding the set union of positive responses for every combination of sports up to a certain size, and then presenting the best of those combinations. The task is conceptually simple but computationally demanding, and it grows explosively as the number of questions increases.
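As an illustration of that calculation, the following is a brute-force sketch using `itertools.combinations`. It is not the SPSSINC TURF code; the function name `turf_brute_force` and the sample sets are assumptions made up for this example.

```python
from itertools import combinations
from math import comb

# Hypothetical data: for each sport, the case numbers with a positive response.
response_sets = {
    "football": {0, 1, 4},
    "soccer": {1, 2},
    "baseball": {3, 4},
    "basketball": {0, 4},
    "hockey": {2},
}

def turf_brute_force(sets, max_size):
    """Return (reach, combination) pairs for all combinations up to max_size."""
    results = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(sets), size):
            # Union of the response sets removes duplicated viewers.
            reached = set().union(*(sets[name] for name in combo))
            results.append((len(reached), combo))
    # Highest reach first; ties go to smaller, then alphabetically earlier, combinations.
    results.sort(key=lambda item: (-item[0], len(item[1]), item[1]))
    return results

ranked = turf_brute_force(response_sets, max_size=3)
best_reach, best_combo = ranked[0]

# Number of combinations examined: C(5,1) + C(5,2) + C(5,3).
n_combinations = sum(comb(len(response_sets), k) for k in range(1, 4))
print(best_reach, best_combo, n_combinations)
```

With 5 questions and a maximum size of 3 there are only 25 combinations to check. The same sum with 30 questions is 4,525, which hints at how quickly the work grows.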

SPSS Statistics does not have a built-in way to do this, so I set out to create an extension command for it, implemented in Python: SPSSINC TURF. First, I decided to work with transposed data and the built-in set algebra capabilities of Python. I pass through the question data and create a set for each question listing the case numbers that have positive responses. That’s just a few lines of code.
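A sketch of that transposition step might look like the following. It assumes the cases are already available as plain Python tuples. Reading them from the active SPSS dataset is left out, and `build_response_sets` is a hypothetical helper, not a function from the actual command.

```python
# Hypothetical survey rows: one tuple per case, 1 = watches the sport, 0 = does not.
questions = ["football", "soccer", "baseball", "basketball", "hockey"]
cases = [
    (1, 0, 0, 1, 0),
    (1, 1, 0, 0, 0),
    (0, 1, 0, 0, 1),
    (0, 0, 1, 0, 0),
    (1, 0, 1, 1, 0),
    (0, 0, 0, 0, 0),
]

def build_response_sets(names, rows, positive=1):
    """Return {question name: set of case numbers with a positive response}."""
    sets = {name: set() for name in names}
    for case_number, row in enumerate(rows):
        for name, value in zip(names, row):
            if value == positive:
                sets[name].add(case_number)
    return sets

response_sets = build_response_sets(questions, cases)
print(response_sets["football"])
```

Once the data is in this shape, the reach of any combination is just the size of a set union, and the case-level rows are no longer needed.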

The trickier part was figuring out how to manage all the set union calculations. It’s a set of tree structures for which a little bit of recursion boils the work down to a few lines of code. My first try was getting clumsy, so I went out for a bike ride for a few hours and came back with the algorithm worked out in my head. I believe in the left-brain, right-brain approach: study something intensively; then relax or do something different, and things are much clearer when you return to the subject.
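One way such a recursion could look is sketched below. This is a guess at the general shape, not the algorithm from the actual command: each level of the tree adds one more question and reuses the union already computed by its parent, so no union is rebuilt from scratch. The name `turf_recursive` and the sample data are assumptions.

```python
def turf_recursive(names, sets, max_size):
    """Yield (combination, reach) for every combination of up to max_size names."""
    if max_size < 1:
        return

    def extend(start, chosen, reached):
        # Each loop iteration is one child node: add one more question.
        for i in range(start, len(names)):
            name = names[i]
            combo = chosen + (name,)
            union = reached | sets[name]  # reuse the parent's union
            yield combo, len(union)
            if len(combo) < max_size:
                # Only later names are considered, so each combination appears once.
                yield from extend(i + 1, combo, union)

    yield from extend(0, (), frozenset())

# Hypothetical data: case numbers with a positive response per sport.
response_sets = {
    "football": {0, 1, 4},
    "soccer": {1, 2},
    "baseball": {3, 4},
    "basketball": {0, 4},
    "hockey": {2},
}

reach_by_combo = dict(turf_recursive(sorted(response_sets), response_sets, 3))
best_reach = max(reach_by_combo.values())
print(len(reach_by_combo), best_reach)
```

Writing it as a generator keeps memory use low, because only the unions along the current branch of the tree exist at any one time.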

Putting this together, I finished the code, but I was worried that this task would be so computationally demanding that it would be too slow to be useful. As it turned out, though, the approach I took, heavily leveraging Python sets and some other features, runs amazingly fast. And although the sets have to fit in memory, it seems to handle pretty large problems.

I went on to create a dialog box interface using the Version 17 Custom Dialog Builder, along with extension command syntax using the extension mechanism. That mechanism requires a small XML file to define the syntax and uses our extension.py module to handle the interface.

So, what sort of effort did this take? Less than one day, including the bike ride. How much more productive could you be? Combining Python and SPSS with the Custom Dialog Builder (CDB) and other tools reduced this task to about 225 lines of code plus the dialog and XML.

I posted this to SPSS Developer Central, where it can be downloaded for free. It is written for SPSS Statistics 17, but it will work with version 16 (not including the dialog) with a small change documented in the readme file. One competing product that does this as a main feature sells for a four-figure price.

The original version posted had a subset of the features I had thought about doing. I wanted to see what interest there might be. Within a few days I had received and implemented a few enhancement requests. By getting the first version out to the world, it was easier to see what additional features users might want. Again, higher productivity by not implementing things that would probably not be used. But maybe I’ll do more later.

This is typical of many programmability projects, in my experience: big results for small amounts of work. Of course, I’ve done this a lot, so I know all the tools and how to approach a problem. Programmability definitely requires an investment in learning the technology, but it’s hard to beat the ROI.
