Jackknifing Gini Coefficients

DATA JACK; /* read the data and create data set with the jackknife samples */
ARRAY DATA{200} D1-D200; /* array to store the original data, set up for a maximum of 200 data
points; increase here and in next statement if necessary */
RETAIN D1-D200 /* tell SAS not to clear the array */
NPT 0; /* NPT counts the number of data points */
KEEP IJACK /* IJACK is the number of the jackknife sample */
X I; /* X is an observed value */
INFILE CARDS EOF=WRITE; /* go to the section labeled WRITE when done reading the data */
/* read in original data and copy to jackknife data set */
IJACK = 0; /* IJACK = 0 => working with original data */
INPUT X @@; /* read a data value, possibly more than one on a line */
NPT = NPT + 1; /* update the number of data values */
DATA{NPT} = X; /* store this value in the data array */
OUTPUT JACK; /* output to the jack data set */
RETURN; /* and go get another input value */
/* after reading all the data, create the jack data sets */
WRITE: /* remove each point in turn from the data set */
DO IJACK = 1 TO NPT; /* loop executed once for each sample */
DO I = 1 TO NPT; /* look at each point in turn */
IF I NE IJACK THEN DO; /* if it is in this jackknife sample, i.e. it is not the point being removed */
X = DATA{I}; /* get its value from the data array */
OUTPUT JACK; /* and store it on the data set */
END; /* of the IF block */
END; /* of the loop over points */
END; /* of the loop over jackknife samples */
/* use SAS macro variables to store useful pieces of information between data steps */
CALL SYMPUT("NPT",NPT); /* the number of data points */

/*** USER adds raw data here, after the CARDS statement. The current INPUT statement is set up to read values of one variable. These may be stored one value per line or more than one value per line. An unlimited number of lines of data is permitted. If there are more than 200 data values, the user needs to increase the size of the data array and the number of D variables used (see top of this data step). ***/
CARDS;
18 18 20 23 25 28
;


/*** Calculate Gini coefficient for the original data and each jackknife sample. To jackknife other statistics, replace the following PROC SORT and DATA step with a procedure that calculates a statistic BY IJACK and stores it in a variable named JACKX on an output data set named JACKS. ***/
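/* Worked check (added illustration, not part of the original listing): for the sample data
18 18 20 23 25 28, n = 6 and the weights (2i-n-1) are -5 -3 -1 1 3 5, so
GINISUM = -90 - 54 - 20 + 23 + 75 + 140 = 74 and SUM = 132. The Gini coefficient of the
original sample (IJACK = 0) should therefore be 74/((6-1)*132) = 74/660, about 0.1121. */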

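PROC SORT DATA=JACK; /* sort each jackknife sample so that the values are in increasing order */
BY IJACK X;
RUN;
DATA JACKS; /* calculate the Gini coefficient and pseudovalue for each sample */
SET JACK;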
RETAIN GINISUM /* accumulates (2i-n-1)*X */
SUM /* accumulates X, for calculating mean */
IPT /* index of data value, 1=smallest, NPT=largest */
NPT /* number of values in this sample */
N /* number of values in original sample */
SAMPLE; /* value of Gini coefficient from original data */
BY IJACK; /* need to do these calculations separately for each jackknife sample */
IF FIRST.IJACK THEN DO; /* if this is the first value in the sample initialize sums and counters */
IPT = 0;
SUM = 0;
GINISUM = 0;
N = INPUT(SYMGET("NPT"), BEST12.); /* find number of points in original data, converted to a number */
IF IJACK = 0 THEN NPT = N; /* number of points in this sample */
ELSE NPT = N-1; /* there is one fewer point in jackknife samples */
END;
IPT = IPT + 1; /* then add 1 to the number of data points */
SUM = SUM + X; /* add the value to the sum */
GINISUM = GINISUM+(2*IPT - (NPT+1))*X; /* and add (2i-n-1)X to its sum */
IF LAST.IJACK THEN DO; /* if just did the last value in the sample */
JACKX = GINISUM/((NPT-1)*SUM); /* calculate Gini coefficient for sample */
PSEUDOX = SAMPLE+(N-1)*(SAMPLE-JACKX); /* and pseudovalue (missing when IJACK = 0) */
OUTPUT; /* store on JACKS data set */
IF IJACK = 0 THEN DO; /* if this was the Gini coefficient for the observed data */
SAMPLE = JACKX; /* remember the value in a variable */
CALL SYMPUT("SAMPLE",JACKX); /* and store it in a macro variable */
END;
END;
KEEP IJACK PSEUDOX; /* JACKS data set only needs the estimate of the pseudovalue (PSEUDOX) and
the number of the jackknife sample (IJACK), which is needed to keep the original sample (IJACK=0) separate from the jackknifed values (IJACK > 0) */
RUN;
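/* Added sketch, not part of the original listing: one common way to summarize the
pseudovalues, assuming the JACKS data set holds one PSEUDOX value per jackknife sample.
The mean of the pseudovalues for IJACK > 0 is the jackknife estimate of the Gini
coefficient, and the standard error of that mean is the jackknife standard error. */
PROC MEANS DATA=JACKS N MEAN STDERR;
WHERE IJACK > 0; /* exclude the original sample (IJACK = 0) */
VAR PSEUDOX;
RUN;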
