Modeling Visualization Macros

March 16, 2009

I was creating some scoring models and decided to look for some macros to evaluate them.

1) Here is a nice SAS macro for a gains chart with the KS statistic, from Wensui's blog at http://statcompute.spaces.live.com/blog/

It's particularly useful for modelers. I saw a version of this macro some time back that also plotted the curves, but this one is quite nice too.

2) I then found some ROC macros for SAS, and a document on ROC curves in SPSS.

3) I found an R package for ROC curves.

Then I compared all three.


SAS MACRO TO CALCULATE GAINS CHART WITH KS

%macro ks(data = , score = , y = );

options nocenter mprint nodate;

data _tmp1;
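  set &data;
  /* keep rows with a score and a 0/1 target */
  where &score ~= . and &y in (1, 0);
  /* random number used to break ties between equal scores */
  random = ranuni(1);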
  keep &score &y random;
run;

proc sort data = _tmp1 sortsize = max;
  by descending &score random;
run;

data _tmp2;
  set _tmp1;
  by descending &score random;
  i + 1;
run;

proc rank data = _tmp2 out = _tmp3 groups = 10;
  var i;
run;

proc sql noprint;
create table
  _tmp4 as
select
  i + 1       as decile,
  count(*)    as cnt,
  sum(&y)     as bad_cnt,
  min(&score) as min_scr format = 8.2,
  max(&score) as max_scr format = 8.2
from
  _tmp3
group by
  i;

select
  sum(cnt) into :cnt
from
  _tmp4;

select
  sum(bad_cnt) into :bad_cnt
from
  _tmp4;    
quit;

data _tmp5;
  set _tmp4;
  retain cum_cnt cum_bcnt cum_gcnt;
  cum_cnt  + cnt;
  cum_bcnt + bad_cnt;
  cum_gcnt + (cnt - bad_cnt);
  cum_pct  = cum_cnt  / &cnt;
  cum_bpct = cum_bcnt / &bad_cnt;
  cum_gpct = cum_gcnt / (&cnt - &bad_cnt);
  /* KS = absolute gap between cumulative bad% and good%, in points */
  ks       = (max(cum_bpct, cum_gpct) - min(cum_bpct, cum_gpct)) * 100;

  format cum_bpct percent9.2 cum_gpct percent9.2
         ks       6.2;
  
  label decile    = 'DECILE'
        cnt       = '#FREQ'
        bad_cnt   = '#BAD'
        min_scr   = 'MIN SCORE'
        max_scr   = 'MAX SCORE'
        cum_gpct  = 'CUM GOOD%'
        cum_bpct  = 'CUM BAD%'
        ks        = 'KS';
run;

title "%upcase(&score) KS";
proc print data  = _tmp5 label noobs;
  var decile cnt bad_cnt min_scr max_scr cum_bpct cum_gpct ks;
run;    
title;

proc datasets library = work nolist;
  delete _: / memtype = data;
run;
quit;

%mend ks;    

data test;
  do i = 1 to 1000;
    score = ranuni(1);
    if score * 2 + rannor(1) * 0.3 > 1.5 then y = 1;
    else y = 0;
    output;
  end;
run;

%ks(data = test, score = score, y = y);

/*
SCORE KS              
                                MIN         MAX
  
DECILE    #FREQ    #BAD       SCORE       SCORE     CUM BAD%    CUM GOOD%        KS
   1       100      87         0.91        1.00      34.25%        1.74%      32.51
   2       100      78         0.80        0.91      64.96%        4.69%      60.27
   3       100      49         0.69        0.80      84.25%       11.53%      72.72
   4       100      25         0.61        0.69      94.09%       21.58%      72.51
   5       100      11         0.51        0.60      98.43%       33.51%      64.91
   6       100       3         0.40        0.51      99.61%       46.51%      53.09
   7       100       1         0.32        0.40     100.00%       59.79%      40.21
   8       100       0         0.20        0.31     100.00%       73.19%      26.81
   9       100       0         0.11        0.19     100.00%       86.60%      13.40
  10       100       0         0.00        0.10     100.00%      100.00%       0.00
*/
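For readers without SAS, the same decile gains logic can be sketched in Python. This is a minimal sketch, not Wensui's code: the function name `ks_gains_table`, its parameters, and the toy data are hypothetical. It keeps rows with a score and a 0/1 target, sorts by descending score with a random tie-breaker, assigns groups with the PROC RANK `GROUPS=` formula floor(i * groups / (n + 1)), and reports KS as the absolute gap between cumulative bad% and good%, in percentage points.

```python
import random


def ks_gains_table(scores, targets, groups=10, seed=1):
    """Decile gains table with KS, loosely mirroring the SAS %ks macro."""
    rng = random.Random(seed)
    # Keep rows with a score and a 0/1 target (the macro's WHERE clause);
    # the third element is a random key that breaks ties between equal scores.
    rows = [(s, y, rng.random()) for s, y in zip(scores, targets)
            if s is not None and y in (0, 1)]
    rows.sort(key=lambda r: (-r[0], r[2]))
    n = len(rows)
    total_bad = sum(y for _, y, _ in rows)
    total_good = n - total_bad
    if total_bad == 0 or total_good == 0:
        raise ValueError("need both bad (1) and good (0) rows")

    # Group the i-th row (1-based) as floor(i * groups / (n + 1)),
    # the formula PROC RANK documents for GROUPS=.
    buckets = {}
    for i, row in enumerate(rows, start=1):
        buckets.setdefault(i * groups // (n + 1), []).append(row)

    table, cum_bad, cum_good = [], 0, 0
    for g in sorted(buckets):
        chunk = buckets[g]
        bad = sum(y for _, y, _ in chunk)
        cum_bad += bad
        cum_good += len(chunk) - bad
        cum_bpct = cum_bad / total_bad
        cum_gpct = cum_good / total_good
        table.append({
            "decile": g + 1,
            "cnt": len(chunk),
            "bad_cnt": bad,
            "min_scr": min(s for s, _, _ in chunk),
            "max_scr": max(s for s, _, _ in chunk),
            "cum_bpct": cum_bpct,
            "cum_gpct": cum_gpct,
            "ks": abs(cum_bpct - cum_gpct) * 100,
        })
    return table


if __name__ == "__main__":
    # Hypothetical toy data: 20 distinct scores; the 10 highest are "bad".
    table = ks_gains_table(list(range(20, 0, -1)), [1] * 10 + [0] * 10)
    for row in table:
        print(row["decile"], row["cnt"], row["bad_cnt"], round(row["ks"], 2))
    print("KS =", max(row["ks"] for row in table))
```

The tie-breaker uses its own seeded generator, so when scores tie the Python deciles will not match the SAS output digit for digit; the seed only makes the Python run reproducible.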

Here is another example, a SAS macro for ROC curves, which comes from a SUGI 22 poster paper: http://www2.sas.com/proceedings/sugi22/POSTERS/PAPER219.PDF

/***************************************************************/;
/* MACRO PURPOSE: CREATE AN ROC DATASET AND PLOT */;
/* */;
/* VARIABLES  INTERPRETATION */;
/* */;
/* DATAIN     INPUT SAS DATA SET */;
/* LOWLIM     MACRO VARIABLE LOWER LIMIT FOR CUTOFF */;
/* UPLIM      MACRO VARIABLE UPPER LIMIT FOR CUTOFF */;
/* NINC       MACRO VARIABLE NUMBER OF INCREMENTS */;
/* I          LOOP INDEX */;
/* OD         OPTICAL DENSITY */;
/* CUTOFF     CUTOFF FOR TEST */;
/* STATE      STATE OF NATURE ("P" = POSITIVE, "N" = NEGATIVE) */;
/* TEST       QUALITATIVE RESULT WITH CUTOFF ("R" IF OD > CUTOFF, ELSE "N") */;
/* */;
/* DATE       WRITTEN BY */;
/* */;
/* 09-25-96   A. STEAD */;
/***************************************************************/;
%MACRO ROC(DATAIN,LOWLIM,UPLIM,NINC=20);
OPTIONS MTRACE MPRINT;
DATA ROC;
SET &DATAIN;
LOWLIM = &LOWLIM; UPLIM = &UPLIM; NINC = &NINC;
DO I = 1 TO NINC+1;
CUTOFF = LOWLIM + (I-1)*((UPLIM-LOWLIM)/NINC);
IF OD > CUTOFF THEN TEST="R"; ELSE TEST="N";
OUTPUT;
END;
DROP I;
RUN;
PROC PRINT;
RUN;
PROC SORT; BY CUTOFF;
RUN;
PROC FREQ; BY CUTOFF;
TABLE TEST*STATE / OUT=PCTS1 OUTPCT NOPRINT;
RUN;
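/* PCT_COL = PERCENT OF EACH STATE THAT TESTS "R": */
/* SENSITIVITY FOR STATE "P", FALSE-POSITIVE RATE FOR STATE "N" */
DATA TRUEPOS; SET PCTS1; IF STATE="P" AND TEST="R";
TP_RATE = PCT_COL; KEEP CUTOFF TP_RATE;
RUN;
DATA FALSEPOS; SET PCTS1; IF STATE="N" AND TEST="R";
FP_RATE = PCT_COL; KEEP CUTOFF FP_RATE;
RUN;
/* IF NO OBSERVATION OF A STATE TESTS "R" AT A CUTOFF, ITS RATE IS MISSING; SET IT TO ZERO */
DATA ROC; MERGE TRUEPOS FALSEPOS; BY CUTOFF;
IF TP_RATE = . THEN TP_RATE=0.0;
IF FP_RATE = . THEN FP_RATE=0.0;
RUN;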
PROC PRINT;
RUN;
PROC GPLOT DATA=ROC;
PLOT TP_RATE*FP_RATE=CUTOFF;
RUN;
%MEND;
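The cutoff loop in %ROC is easy to mirror outside SAS. Here is a minimal Python sketch under the same assumptions as the macro header (one OD reading per observation and a STATE of "P" or "N"); the function name `roc_grid` and the sample readings are hypothetical. Like PCT_COL, it returns rates in percent.

```python
def roc_grid(od, state, lowlim, uplim, ninc=20):
    """TP and FP rates (in percent) at NINC+1 evenly spaced cutoffs.

    Follows the %ROC macro: an observation tests "R" when its OD is
    strictly greater than the cutoff; STATE is "P" or "N".
    """
    positives = [x for x, s in zip(od, state) if s == "P"]
    negatives = [x for x, s in zip(od, state) if s == "N"]
    if not positives or not negatives:
        raise ValueError("need at least one 'P' and one 'N' observation")

    points = []
    for i in range(1, ninc + 2):  # DO I = 1 TO NINC+1
        cutoff = lowlim + (i - 1) * ((uplim - lowlim) / ninc)
        # Share of each state testing "R", like PCT_COL from PROC FREQ.
        tp_rate = 100 * sum(x > cutoff for x in positives) / len(positives)
        fp_rate = 100 * sum(x > cutoff for x in negatives) / len(negatives)
        points.append((cutoff, tp_rate, fp_rate))
    return points


if __name__ == "__main__":
    # Hypothetical optical-density readings with known states.
    od = [0.12, 0.35, 0.40, 0.55, 0.61, 0.78, 0.83, 0.95]
    state = ["N", "N", "N", "P", "N", "P", "P", "P"]
    for cutoff, tpr, fpr in roc_grid(od, state, 0.0, 1.0, ninc=10):
        print(f"cutoff={cutoff:.2f}  TP%={tpr:6.2f}  FP%={fpr:6.2f}")
```

The macro expands every observation NINC+1 times and lets PROC FREQ do the counting, while this sketch counts directly per cutoff; the rates should agree. One small difference: the sketch also keeps cutoffs at which nobody tests "R" (both rates 0).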

 

SAS version 9.2 also has a macro called %ROCPLOT: http://support.sas.com/kb/25/018.html

 

SPSS can also produce ROC curves, and there is a nice document on that here:

http://www.childrensmercy.org/stats/ask/roc.asp

Here are some examples in R using the ROCR package, from

http://rocr.bioinf.mpi-sb.mpg.de/

 

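ROCR needs only three commands to produce a simple ROC plot (shown here on the ROCR.xval example data that ships with the package):
library(ROCR)
data(ROCR.xval)  # 10 cross-validation runs of predictions and labels
pred <- prediction(ROCR.xval$predictions, ROCR.xval$labels)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf, col = rainbow(10))  # one colour per run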
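A grid of NINC cutoffs only approximates the curve. For intuition about the kind of plot ROCR draws, here is a minimal Python sketch that instead treats every distinct score as a cutoff, a common way to trace an empirical ROC curve. It is an illustration, not ROCR's implementation; `roc_points` and the sample data are hypothetical.

```python
def roc_points(predictions, labels):
    """(FPR, TPR) pairs using every distinct prediction value as a cutoff.

    Scores >= cutoff count as predicted positive; labels are 1 or 0.
    """
    pairs = sorted(zip(predictions, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative labels")

    points = [(0.0, 0.0)]  # cutoff above every score: nothing predicted positive
    tp = fp = 0
    for idx, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last observation sharing this score,
        # so tied scores move the curve in a single step.
        if idx == len(pairs) - 1 or pairs[idx + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    # Hypothetical scores and 0/1 labels.
    preds = [0.95, 0.85, 0.80, 0.70, 0.55, 0.40, 0.30, 0.10]
    labels = [1, 1, 0, 1, 0, 1, 0, 0]
    for fpr, tpr in roc_points(preds, labels):
        print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```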

The graphics in the ROCR package are outstanding.

Citation:

Tobias Sing, Oliver Sander, Niko Beerenwinkel, Thomas Lengauer.
ROCR: visualizing classifier performance in R.
Bioinformatics 21(20):3940-3941 (2005).

 
