Consuming Output for Further Processing

April 22, 2009

A common need with SPSS Statistics is to produce some statistical results and then use them for further processing.  We sometimes call that “output as input”.  This is very straightforward if you can do it with AGGREGATE or procedures that produce results as SPSS datasets, but it is possible to do this for anything that SPSS Statistics can put in a pivot table.

One way to retrieve output for reuse is to write a Basic or Python script.  The output would be produced as pivot table(s) in the Viewer.  Then the script would search through the Viewer document and find the desired table.  Using something like the GetValueAt API of the Datacells class (Python) or the ValueAt property of the Data Cells object (Basic), a program can retrieve cell values.  The script might be kicked off via a SCRIPT command.

This works, but it can be tricky to program, and it is roundabout and inefficient.  And in distributed mode with SPSS Statistics Server, this solution is unavailable.

Back in SPSS version 12 we introduced the Output Management System (OMS), but many users have only a vague understanding of this powerful mechanism.  It provides a much easier and more efficient solution to grabbing output.  Combining OMS and programmability, the output could still be processed by a Python or Basic script, but it could also be retrieved by a Python or .NET program – instead of a Python script – by using the XML workspace.  This allows for better synchronization, and it works in either local or distributed mode.

OMS is a listener for the output.  It is not built in to particular procedures.  In fact, the procedure does not even know that something is listening.  Rather, you start OMS listening for particular objects.  When an object of interest comes along, OMS keeps a copy.  When you stop the listener, the captured objects are written to memory, to a dataset, or to a file and are available to your SPSS syntax or programmability code.

You tell OMS what to listen for by selecting the types of objects, most often TABLES, and, if desired, the particular types of tables you want, such as a crosstabs table.  The OMS command also specifies the output format (choices include XML, SAV, HTML, text, Excel, Word, and PDF) and where to write it.  You stop the listener with the OMSEND command.
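As a rough illustration of the OMS ... OMSEND bracket described above, here is a small sketch that only assembles the syntax as a string, so it can be read or run without SPSS Statistics.  The function name build_oms_block and the workspace name reg_summary are made up for this example.  Inside SPSS Statistics you would typically hand the resulting text to spss.Submit.

```python
def build_oms_block(command_syntax, command_id, table_subtype, workspace_name):
    """Wrap SPSS syntax in an OMS ... OMSEND pair that captures one table type.

    Hypothetical helper for illustration; it only builds text.
    """
    lines = [
        # Start listening: tables only, from one command, of one subtype.
        "OMS /SELECT TABLES",
        "  /IF COMMANDS=['%s'] SUBTYPES=['%s']" % (command_id, table_subtype),
        # Send what is captured to the in-memory XML workspace.
        "  /DESTINATION FORMAT=OXML XMLWORKSPACE='%s'." % workspace_name,
        # The procedure itself does not know anything is listening.
        command_syntax.strip(),
        # Stop the listener so the captured output becomes available.
        "OMSEND.",
    ]
    return "\n".join(lines)


syntax = build_oms_block(
    "REGRESSION /DEPENDENT mpg /METHOD=ENTER horse weight.",
    "Regression", "Model Summary", "reg_summary")
print(syntax)
```

Changing the /DESTINATION line is what switches the capture to a dataset or a file instead of the workspace.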

If you write the output to a dataset, then you can activate that dataset and apply standard SPSS commands to it.  You can also access the dataset with Python programmability, whether or not it is activated, using the Dataset class in the spss module.

A more general mechanism is to have OMS write to the XML workspace.  This is an in-memory structure that can be read by Python or .NET code.  The OMS command assigns a name to the workspace item it creates.  Then the program code can retrieve all or a selected part of that item using the GetXmlUtf16 Python API.  (The same can be done from .NET.)  You write an XPath expression to say which part of the XML you want.
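To give a feel for what "an XPath expression that picks part of the XML" means, here is a sketch that runs outside SPSS Statistics using Python's standard xml.etree.ElementTree.  The XML fragment is a hand-written, heavily simplified stand-in for a workspace item, not the real output schema, and the numbers in it are invented.  The function name cell_numbers is also just for this example.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical imitation of a captured table (values are made up).
SAMPLE_XML = """
<pivotTable subType="Model Summary">
  <dimension axis="column" text="Statistics">
    <category text="R">
      <cell text=".900" number="0.9"/>
    </category>
    <category text="R Square">
      <cell text=".810" number="0.81"/>
    </category>
  </dimension>
</pivotTable>
"""


def cell_numbers(xml_text, column_label):
    """Return the numeric form of every cell under the given column label."""
    root = ET.fromstring(xml_text)
    # XPath: any <category> whose text attribute matches, then its <cell> children.
    path = ".//category[@text='%s']/cell" % column_label
    return [float(cell.get("number")) for cell in root.findall(path)]


r_square = cell_numbers(SAMPLE_XML, "R Square")
print(r_square)
```

Each cell carries both a text and a number attribute in this sketch, which is why the expression can select the category by its visible label and still return a numeric value.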

XML and XPath are very powerful but can be a bit intimidating, so we have provided some Python helper functions to make it easy.  In the spssaux module, which is installed with the Python plug-in, there is a function createXmlOutput that takes care of the OMS wrapper and writing to the workspace.  All you give it is the command syntax you want and the identifiers for the type of table you are interested in.

Correspondingly, getValuesFromXmlWorkspace can retrieve specific information from the workspace item created by the first function.  You use the visible properties of the table to determine what is to be retrieved.  And then you are off to the races.

Here is an example.  Let’s say you want to run a regression and do something if the R Square statistic is too small.  The R Square is in the Model Summary table.  Here’s an example of that table.

[Figure: The Model Summary Table]

So the task is to run the regression and retrieve the second column of this table.  Here is a little Python program to do this.  It expects that the cars.sav data file shipped with SPSS Statistics is the active dataset.

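begin program.
import spss, spssaux

# Syntax whose output we want to harvest.
cmd="""REGRESSION /DEPENDENT mpg
/METHOD=ENTER horse weight."""

# Run the command under OMS and capture the Model Summary table.
tag, errorlevel = spssaux.createXmlOutput(cmd, omsid='Regression',
    subtype='Model Summary')

# Retrieve the R Square column in its numeric form.  float() guards
# against the value coming back from the workspace as a string.
Rsquare = float(spssaux.getValuesFromXmlWorkspace(tag, 'Model Summary',
    colCategory="R Square", cellAttrib="number")[0])
if Rsquare < .7:
    print("ouch!")
end program.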

Let’s walk through this code.

  • The cmd= line is the syntax to run to create the output we want to harvest.  It could be more than one command.
  • The createXmlOutput call runs the command, specifying that we are interested in the Model Summary table of the Regression command.  It returns two values: a tag to use when retrieving output, and an error code, which is ignored in this example.
  • The getValuesFromXmlWorkspace call uses the tag and the OMS table subtype along with specifying the part of the table we want.  Looking at the example table, we see a column label that can be used for retrieval.  That column will have its value stored as both text and a number in the XML, so we specify that we want the number form.  The function returns a list of the things that matched.  Here we take the first and only element.
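Because the retrieval step hands back a list of matches, it can be worth guarding against an empty list before indexing into it.  The following is a minimal sketch of that idea in plain Python.  The helper name first_number is hypothetical, the sample list stands in for a real retrieval result, and it assumes the matched values may arrive as strings.

```python
def first_number(values, default=None):
    """Return the first match as a float, or default if nothing matched."""
    if not values:
        return default
    return float(values[0])


# Stand-in for the list a retrieval call might return (made-up value).
matches = ["0.81"]
r_square = first_number(matches)

if r_square is not None and r_square < 0.7:
    print("ouch!")
```

An empty list usually means the label or subtype did not match anything, so returning a default makes that case easy to detect.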

We know what to retrieve just by looking at the labels in the table.  In this example, just identifying the column is enough, but you can also specify row labels.  Some tables are too complicated for this approach, but a great many things can be done using this simple model.  The spssaux module also has a createDatasetOutput function that works in a similar way but creates an SPSS dataset instead of XML.  Values would be retrieved from that dataset with the Dataset class or a cursor object.

Note that this table was not retrieved from the Viewer.  It was captured by the OMS listener and placed in the workspace, from which the Python code extracted it.

Inside these Python functions, OMS commands and XPath expressions were generated, but you don’t need to learn those technologies in order to benefit from them.