Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    How Data Analytics Is Reshaping Patient Financing Decisions
    How Data Analytics Is Reshaping Patient Financing Decisions
    13 Min Read
    business using business intelligence
    How to Use a Competitive Intelligence Dashboard to Turn Market Data Into Smarter Marketing Decisions 
    9 Min Read
    unusual trading activity
    Signal Or Noise? A Decision Tree For Evaluating Unusual Trading Activity
    3 Min Read
    software developer using ai
    How Data Analytics Helps Developers Deliver Better Tech Services
    8 Min Read
    ai for stock trading
    Can Data Analytics Help Investors Outperform Warren Buffett
    9 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: Python and Productivity
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Uncategorized > Python and Productivity
Uncategorized

Python and Productivity

Editor SDC
Editor SDC
6 Min Read
SHARE

One of the main benefits of programmability is the ability to extend and automate SPSS Statistics capabilities. I’d like to tell you the story of a recent extension effort: the SPSSINC TURF command and dialog.

TURF analysis is Total Unduplicated Reach and Frequency. It is a common technique in market research. Suppose you […]

One of the main benefits of programmability is the ability to extend and automate SPSS Statistics capabilities. I’d like to tell you the story of a recent extension effort: the SPSSINC TURF command and dialog.

More Read

Top 4 Things Businesses Should Know about Apple WatchOS 2
Facebook, Scrabble, and the Limits of Free
Celebrating the Employee
9 Questions CEO’s Should Ask About Data Governance
TEDx: Big Brains Meet Barcamps

TURF analysis is Total Unduplicated Reach and Frequency. It is a common technique in market research. Suppose you have a survey about sports viewing popularity. It asks about football, soccer, baseball, basketball, hockey, and other sports. You would like to know how to reach the most viewers with no more than three sports.

You could tabulate with FREQUENCIES the positive responses to each sport. But this doesn’t answer the question, because the audiences will overlap. You would like to know the highest reach of combinations of up to three sports eliminating the overlap.

Calculating the TURF requires finding the set union for all combinations up to a certain size of positive responses to the sports and then presenting the best of those combinations. That is a computationally demanding task that grows explosively as the number of questions increases, but it is conceptually simple.

SPSS Statistics does not have a built-in way to do this, so I set out to create an extension command implemented in Python for it: SPSSINC TURF. First, I decided to work with transposed data and the built-in set algebra capabilities of Python. I pass the question dataset and create a set for each question listing the case numbers that have positive responses. That’s just a few lines of code.

The trickier part was figuring out how to manage all the set union calculations. It’s a set of tree structures for which a little bit of recursion boils the work down to a few lines of code. My first try was getting clumsy, so I went out for a bike ride for a few hours and came back with the algorithm worked out in my head. I believe in the left-brain, right-brain approach: study something intensively; then relax or do something different, and things are much clearer when you return to the subject.

Putting this together, I finished the code, but I was worried that this task would be so computationally demanding that it would be too slow to be useful. As it turned out, though, the approach I took, heavily leveraging Python sets and some other features, runs amazingly fast. And although the sets have to fit in memory, it seems to handle pretty large problems.

I went on to create a dialog box interface using the Version 17 Custom Dialog Builder and extension command syntax using the extension mechanism, which requires a small xml file to define the syntax and uses our extension.py module to handle that interface.

So, what sort of effort did this take? Less than one day, including the bike ride. How much more productive could you be? Taking advantage of the combination of Python and SPSS together along with the CDB and other tools reduced this task to about 225 lines of code plus the dialog and xml.

I posted this to SPSS Developer Central, where it can be downloaded for free. It is written for SPSS Statistics 17, but it will work with version 16 (not including the dialog) with a small change documented in the readme file. One competing product that does this as a main feature sells for a 4-figure price.

The original version posted had a subset of the features I had thought about doing. I wanted to see what interest there might be. Within a few days I had received and implemented a few enhancement requests. By getting the first version out to the world, it was easier to see what additional features users might want. Again, higher productivity by not implementing things that would probably not be used. But maybe I’ll do more later.

This experience is typical of many programmability projects, in my experience. Big results for small amounts of work. Of course, I’ve done this a lot, so I know all the tools and how to approach a problem. Programmability definitely requires an investment in learning the technology, but it’s hard to beat the ROI.

Share This Article
Facebook Pinterest LinkedIn
Share

Follow us on Facebook

Latest News

How Data Analytics Is Reshaping Patient Financing Decisions
How Data Analytics Is Reshaping Patient Financing Decisions
Analytics Big Data Exclusive
AI driven big data company
How AI-Driven Workflows Are Changing the Way Companies Think About Data Risk
Artificial Intelligence Data Management Exclusive Risk Management
ai product development
Why Businesses Outsource AI Product Development Companies
Exclusive News
banking tools
The Fintech and Banking Tools Global Entrepreneurs Rely On
Fintech Infographic

Stay Connected

1.2KFollowersLike
33.7KFollowersFollow
222FollowersPin

You Might also Like

Jason Adams Explains TunkRank

1 Min Read

Public Expression, Liability, and Anonymity

3 Min Read

Wolfram Talks About Wolfram Alpha

4 Min Read

Largest HIPAA Breach Ever: Hackers Steal Data on 4.5 Million Community Health Systems Patients

5 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

AI chatbots
AI Chatbots Can Help Retailers Convert Live Broadcast Viewers into Sales!
Chatbots
ai in ecommerce
Artificial Intelligence for eCommerce: A Closer Look
Artificial Intelligence

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?