New Features and Old: Statistics in TABLE
By Noreen Redden
FOCUS provides through the ANALYSE command many statistical functions, including regression analysis, mean, median and quartiles. The output of ANALYSE may be a HOLD file, which then allows the user to report from the HOLD file.
However, some statistical parameters actually can be determined directly through TABLE. Thus, a user may generate these parameters in any release or platform of FOCUS or WebFOCUS.
Most of us are familiar with AVE., which gives us the mean of a particular field. Once you know the rules, mode (the value with the highest frequency), median (the center value), and standard deviation are also derivable in one pass of the data through TABLE.
Using the DEFINE FUNCTION command (click here for more information) to create the Standard Deviation Function, This procedure will display MEAN, NUM_OBS RANGE and standard deviation. (In this case using SALES from the CAR file).
DEFINE FUNCTION SSDEV/D15.3
(VAR1/D12.2,AQ/D12.2,AV/D12.2,CQ/I9)
SVAR/D12.2 =(AQ-(AV * AV))
* (CQ /(CQ -1)) ;
SSDEV/D15.3=SQRT(SVAR);
END
Once the function has been set up in a PROFILE (FOCPROF, EDASPROF), it can be used with any field in any file.
-* SINCE I DECLARE THAT THE VARIABLE MUST
BE D12.2 …
DEFINE FILE CAR
XSALES/D12.2 = SALES;
END
TABLE FILE CAR
SUM
CNT.XSALES AS 'NUM_OBS' AVE.SALES AS 'MEAN'
MIN.SALES MAX.SALES
COMPUTE RANGE/D7 = MAX.SALES MIN.SALES;
COMPUTE STD/D15.3 = SSDEV( XSALES,
ASQ.XSALES, AVE.XSALES, CNT.XSALES);
SSDEV XSALES
BY COUNTRY
END
If MODE or MEDIAN is required, a multi-verb set will be needed. The parameterized FOCEXEC shown on the following page will give NUM_OBS, RANGE, MODE, MEDIAN, VARIANCE and STANDARD DEVIATION for any field in any file. (There is also an optional grouping parameter.) This does not require the DEFINE FUNCTION operation, and is not platform- or release-dependent.
Now, for some other statistical variables:
TILES
Release 7.2 with FOCUS for S/390 and WEBFOCUS release 5 introduced TILES. This is an extension of the BY [HIGHEST] syntax, which apportions those values as equally as possible into the number of TILE groups you specify. For instance, if a particular class
has 1,000, and you need to divide into the Top 10 percent, etc., use the TILE 10 operation.
TABLE FILE STUDENTS
PRINT SSN NAME BY
HIGHEST GPA IN-GROUPS-OF
10 TILES
END
About 100 students will be in each tile. This is an estimate because if there are "tied" values of GPA, then all students with the same GPA will be in the same TILE even if this makes that particular TILE contain more than 100 students. For more details,
refer to the document here.
FORECAST
Another new feature of release 7.2 of FOCUS for S/390, and WEBFOCUS Release 5 supports predicting values beyond the range of the values stored in the data source using moving average, exponential moving average or regression analysis. This forecasting is invoked via the RECAP
command.
The following syntax is for the MOVAVE and EXPAVE methods:
ON sfld RECAP fld1[/fmt] = FORECAST(fld2,
interval, npredict, 'method',npoint);
The following syntax is for the REGRESS method (omits the npoint parameter):
ON sfld RECAP fld1[/fmt] =
FORECAST(fld2, interval, npredict, 'REGRESS');
For more details, refer to the document here.
One example would be to project SALES for the next four months.
TABLE FILE SALES
SUM SALES BY MONTH
ON MONTH RECAP SALESP/D12.2 =
FORECAST(SALES,1,3,’MOVAVE’,4);
END
This will project SALES for each of the next four months, using the moving average of three months at a time.
|
-* The first section is simply to get the values for the variables,
-* filename, fieldname, group by, and what stats are required
-* The following variables contain the information required:
-* &1 = FILENAME
-* &2 = FIELDNAME
-* &3 = Sort field (GROUP BY) or Blank if no grouping required
-* Following variables will have a value of Y if the statistic is required
-* &NUM_OBS -- Number of observations
-* &RANGE --- Minimum and Maximum
-* &MODE --- Most frequently occurring value
-* &MEDIAN --- center value or average of 2 center values (if even NUM_OBS)
-* &VARIANCE --- Variance
-* &STDEV -- Standard Deviation
-SET &OBSP=IF &NUM_OBS EQ 'Y' THEN 'AS NUMBER, OBSERVATIONS’ ELSE - 'NOPRINT';
-SET &MIN=IF &RANGE EQ 'Y' THEN 'MIN.'|&2 |' MAX.'|&2 ELSE ' ';
-SET &STD = IF &STDEV EQ 'Y' THEN 'AS STANDARD, DEVIATION’ ELSE
- 'NOPRINT';
-SET &AVE=IF &MEAN EQ 'Y' THEN 'AS MEAN' ELSE 'NOPRINT';
-SET &MDN=IF &MEDIAN EQ 'Y' THEN 'AS MEDIAN' ELSE 'NOPRINT';
-SET &VARN = IF &VARIANCE EQ ‘Y’ THEN ‘AS VARIANCE’ ELSE -
- ‘NOPRINT’;
-STEP2
-* Each instance of the field to be analyzed must be differentiated
-* Therefore, we assign a different number to each instance.
DEFINE FILE &1
XKEY/I9 WITH &2 = XKEY + 1;
END
TABLE FILE &1
-* the total of the field observed (to be used in calculation of average/mean
SUM &2 NOPRINT
-* the count of number of observations
CNT.&2 &OBSP
-* MINIMUM and MAXIMUM if range requested
&MIN
-* using ASQ , calculate the first portion required for standard deviation
COMPUTE SVAR.&2=(ASQ.&2-(AVE.&2*AVE.&2))
* (CNT.&2/(CNT.&2-1)); &VARN
COMPUTE SSDEV.&2|/D15.3=SQRT(SVAR.&2);
&STD
-* determine which occurrence will be median
-* IF AN EVEN NUMBER OF ENTRIES, MEDIAN IS THE AVERAGE OF EITHER -* SIDE
-* IF AN UNEVEN NUMBER OF ENTRIES, MEDIAN IS THE CENTER VALUE
COMPUTE AVERIT/A1 = IF IMOD(CNT.&2,2,'I1') EQ 0
THEN 'Y' ELSE 'N'; NOPRINT
COMPUTE AVE4/I4 = IMOD(CNT.&2,4,'I1'); NOPRINT
COMPUTE MEAN = &2 / CNT.&2; &AVE
-* DETERMINE WHICH ENTRY DETERMINES THE QUARTILE
COMPUTE Q2/I3 = CNT.&2|/2; NOPRINT
-* SECOND VERB DETERMINES MODE
-IF &3 EQ ' ' GOTO NOBY00;
BY &3 NOPRINT UNDER-LINE SUBHEAD
"GROUP: &3 = <&3 "
-NOBY00
SUM CNT.XKEY NOPRINT MIN.&2 NOPRINT
COMPUTE CHKIT/I9 =
-IF &3 EQ ' ' GOTO NOBY01;
IF &3 EQ LAST &3 THEN CNT.XKEY ELSE
-NOBY01
IF CNT.XKEY GT CHKIT THEN CNT.XKEY ELSE CHKIT; NOPRINT
COMPUTE MODE/D15.2 =
IF CNT.XKEY EQ CHKIT
THEN MIN.&2 ELSE MODE;
-IF &3 EQ ' ' GOTO NOBY0;
BY &3
-NOBY0
BY &2 NOPRINT
-*
-* 3rd VERB NEEDS ENTRY # TO DETERMINE MEDIAN
PRINT &2 NOPRINT
COMPUTE CTR =
-IF &3 EQ ' ' GOTO NOBY1;
IF &3 NE LAST &3 THEN 1 ELSE
-NOBY1
CTR + 1; NOPRINT
COMPUTE MEDIAN = IF AVERIT EQ 'N' AND CTR EQ (Q2 + 1) THEN &2 ELSE
IF AVERIT EQ 'N' THEN MEDIAN ELSE IF CTR EQ Q2 THEN &2 ELSE
IF CTR EQ Q2 + 1 THEN (MEDIAN + &2) / 2
ELSE MEDIAN; &MDN
-IF &3 EQ ' ' GOTO NOBY2;
BY &3
-NOBY2
BY &2 NOPRINT
WHERE TOTAL CTR EQ CNT.&2
-* ONLY PRINTS LAST LINE, SO NO DETAIL IS SHOWN
END
|

|
|