SAS Admin: Process Data Faster in RDBMS by Buffering the Data in Memory

October 3, 2012
175 Views

Contributed by Stephen Overton to BI Notes


Contributed by Stephen Overton to BI Notes


By default, accessing third party relational databases can be very slow if not configured properly.  I recently started using PostgreSQL 9.1, an open source database, to store high volumes of data for my SAS Global Forum 2013 paper.  At first it was taking forever to load up data because SAS was inserting 1 row at a time into the database table.  After adding a simple option my data processing was off to the races!

Buffering Options

The SAS INSERTBUFF and READBUFF options will improve ODBC and OLE DB libraries dramatically.   By default these are set to 1 and 250 rows respectively for ODBC connections.  Other third party databases, such as Oracle, DB2, or MS SQL Server, will probably benefit as well but I have not been able to test.  Setting these buffer sizes tells SAS how many rows to buffer in memory before processing.  

Using the LIBNAME Option

These options can be added to the LIBNAME statement to set the buffering sizes for all processing done on tables within the library.  Ideally if you have the SAS Metadata server running, your SAS Administrator should set these options through the Data Library manager in SAS Management Console.

If you are using Base SAS or writing code in SAS Enterprise Guide, you can also manually write the LIBNAME step like this:

LIBNAME pgsgf13 ODBC  DBCOMMIT=10000  READBUFF=30000 INSERTBUFF=30000  DATASRC=sasgf13  SCHEMA=public ;

Be sure to check out SAS support for more information on the INSERTBUFF and READBUFF options for the LIBNAME statement.

Using a SAS Data Set Option

You can also explicitly define these buffer options for an individual data step in your code if you want.   This may come in handy depending on the type, size and width of data you plan on inserting.

LIBNAME pgsgf13 ODBC DATASRC=sasgf13 SCHEMA=public ;
data pgsgf13.test(dbcommit=500000 insertbuff=10000 readbuff=10000);
*** DATA STEP STUFF ****;
run;

Be sure to check out SAS support for more information on the INSERTBUFF and READBUFF options for the data step.

Considerations

Careful consideration must be taken into account when setting these options.  The optimal setting depends on your SAS compute server resources and network capacity.  The number of rows to buffer should be much less for very wide tables with lots of character data because of the physical byte sizes of character columns and the overall width of the table.  In my project I am using very skinny fact tables with numeric data, which requires only 8 bytes per column of numeric data.  Assuming I have 10 numeric columns, that’s only about 80 bytes of data per row.  For my data step which inserts a huge volume of data, I could theoretically set the INSERTBUFF equal to something like 1,000,000 rows, but SAS does have a hard limit of approximately 32,000 rows it can buffer in memory :-)

The post SAS Administration: Process Data Faster in RDBMS by Buffering the Data in Memory appeared first on Business Intelligence Notes for SAS® BI Users.