Purpose of PROC MEANS
The MEANS procedure provides an easy way to compute descriptive statistics. Descriptive statistics such as the mean, minimum, and maximum provide useful information about numeric data.

Specifying Statistics
By default, PROC MEANS computes the n-count (the number of non-missing values), the mean, the standard deviation, and the minimum and maximum values for variables. To specify statistics, list their keywords in the PROC MEANS statement.


Descriptive Statistics
Keyword Description
 CLM Two-sided confidence limit for the mean
 CSS Corrected sum of squares
 CV Coefficient of variation
 KURTOSIS Kurtosis
 LCLM One-sided confidence limit below the mean
 MAX Maximum value
 MEAN Average
 MIN Minimum value
 N Number of observations with nonmissing values
 NMISS Number of observations with missing values
 RANGE Range
 SKEWNESS Skewness
 STDDEV / STD  Standard Deviation
 STDERR Standard error of the mean
 SUM Sum
 SUMWGT Sum of the Weight variable values
 UCLM One-sided confidence limit above the mean
 USS Uncorrected sum of squares
 VAR Variance


Quantile Statistics
Keyword Description
 MEDIAN / P50  Median or 50th percentile
 P1 1st percentile
 P5 5th percentile
 P10 10th percentile
 Q1 / P25 Lower quartile or 25th percentile
 Q3 / P75 Upper quartile or 75th percentile
 P90 90th percentile
 P95 95th percentile
 P99 99th percentile
 QRANGE Difference between upper and lower quartiles: Q3-Q1


Hypothesis Testing
Keyword Description
 PROBT  Probability of a greater absolute value for the t value
 T Student's t for testing the hypothesis that the population mean is 0
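
As a minimal sketch of how the keywords in the tables above are used, the step below requests a mix of descriptive, quantile, and hypothesis-testing statistics. It assumes the SASHELP.CLASS sample data set that ships with SAS, which is not one of the data sets used in this chapter.

```sas
/* Assumption: SASHELP.CLASS (Name, Sex, Age, Height, Weight) is available. */
/* Listing keywords in the PROC MEANS statement replaces the default        */
/* statistics (N, MEAN, STD, MIN, MAX) with the ones requested.             */
proc means data=sashelp.class n mean median q1 q3 stderr t probt;
run;
```

Only the listed statistics appear in the report, in the order in which the keywords are given.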


Limiting Decimal Places
Because PROC MEANS uses the BEST. format by default, procedure output can contain unnecessary decimal places. To limit decimal places, use the MAXDEC= option and set it equal to the number of decimal places that you want displayed.
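
A short illustration of MAXDEC=, again assuming the SASHELP.CLASS sample data set rather than a data set from this chapter:

```sas
/* Assumption: SASHELP.CLASS is available.                  */
/* MAXDEC=1 displays every statistic with one decimal place. */
proc means data=sashelp.class maxdec=1;
run;
```

MAXDEC= affects only how values are displayed in the report, not how they are computed.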

Specifying Variables in PROC MEANS
By default, PROC MEANS computes statistics for all numeric variables. To specify the variables to include in PROC MEANS output, list them in a VAR statement.
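
For example, the following sketch restricts the analysis to two variables. The data set (SASHELP.CLASS) and its variable names are assumptions chosen for illustration.

```sas
/* Assumption: SASHELP.CLASS is available.                       */
/* Without the VAR statement, Age would be summarized as well.   */
proc means data=sashelp.class maxdec=1;
   var height weight;
run;
```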

Group Processing Using the CLASS Statement
Include a CLASS statement, specifying variable names, to group PROC MEANS output by variable values. Statistics are not computed for the CLASS variables.
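
A minimal CLASS example, assuming the SASHELP.CLASS sample data set:

```sas
/* Assumption: SASHELP.CLASS is available.                        */
/* Statistics for Height and Weight are reported separately for   */
/* each value of Sex. No prior sorting is required for CLASS.     */
proc means data=sashelp.class maxdec=1;
   var height weight;
   class sex;
run;
```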

Group Processing Using the BY Statement
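Include a BY statement, specifying variable names, to group PROC MEANS output by variable values. Your data must be sorted according to those variables. Statistics are not computed for the BY variables. Unlike CLASS processing, which produces a single report, BY-group processing produces a separate table for each BY group.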
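
The sort requirement can be sketched as follows. The data set SASHELP.CLASS and the output name work.class_sorted are assumptions made for illustration.

```sas
/* Assumption: SASHELP.CLASS is available; work.class_sorted is a  */
/* hypothetical name for the sorted copy.                          */
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

/* The BY variable must match the sort order of the input data.    */
proc means data=work.class_sorted maxdec=1;
   by sex;
   var height weight;
run;
```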

Creating a Summarized Data Set Using PROC MEANS
You can create an output data set that contains summarized variables by using the OUTPUT statement in PROC MEANS. When you use the OUTPUT statement without specifying the statistic-keyword= option, the summary statistics N, MEAN, STD, MIN, and MAX are produced for all of the numeric variables or for all of the variables that are listed in a VAR statement.
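
The sketch below writes named summary variables to an output data set instead of a report. The input data set (SASHELP.CLASS) and the names work.class_stats, AvgHeight, AvgWeight, MinHeight, and MinWeight are all hypothetical choices for illustration.

```sas
/* Assumption: SASHELP.CLASS is available; output names are hypothetical. */
/* NOPRINT suppresses the report so that only the data set is produced.   */
proc means data=sashelp.class noprint;
   var height weight;
   class sex;
   output out=work.class_stats
      mean=AvgHeight AvgWeight
      min=MinHeight MinWeight;
run;

/* Inspect the summarized data set. */
proc print data=work.class_stats;
run;
```

The variable names after each keyword are matched, in order, to the variables in the VAR statement. When a CLASS statement is used, the output data set also contains the automatic variables _TYPE_ and _FREQ_ and a row for the overall summary.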

Creating a Summarized Data Set Using PROC SUMMARY
You can also create a summarized output data set by using PROC SUMMARY. The PROC SUMMARY code for producing an output data set is exactly the same as the code for producing an output data set with PROC MEANS. The difference between the two procedures is that PROC MEANS produces a report by default, whereas PROC SUMMARY does not. PROC SUMMARY is typically used with an OUTPUT statement to create a data set; to produce a report with PROC SUMMARY, you must add the PRINT option to the PROC SUMMARY statement.
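
To illustrate the difference in defaults, this sketch asks PROC SUMMARY for a report by adding the PRINT option. It assumes the SASHELP.CLASS sample data set.

```sas
/* Assumption: SASHELP.CLASS is available.                           */
/* With PRINT, PROC SUMMARY displays the same kind of report that    */
/* PROC MEANS displays by default.                                   */
proc summary data=sashelp.class print;
   var height weight;
   class sex;
run;
```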

The FREQ Procedure
The FREQ procedure is both a descriptive and a statistical procedure that produces one-way and n-way frequency tables. It concisely describes your data by reporting the distribution of variable values.

Specifying Variables
By default, the FREQ procedure creates frequency tables for every variable in your data set. To specify the variables to analyze, include them in a TABLES statement.
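
For instance, the step below requests one-way frequency tables for two variables only. The data set SASHELP.CLASS is an assumption chosen for illustration.

```sas
/* Assumption: SASHELP.CLASS is available.                          */
/* Variables separated by spaces each get their own one-way table.  */
proc freq data=sashelp.class;
   tables sex age;
run;
```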

Creating Two-Way Tables
When a TABLES statement contains two variables joined by an asterisk (*), PROC FREQ produces crosstabulations. The resulting table displays values for

  • cell frequency
  • cell percentage of total frequency
  • cell percentage of row frequency
  • cell percentage of column frequency.
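
A minimal two-way table, assuming the SASHELP.CLASS sample data set:

```sas
/* Assumption: SASHELP.CLASS is available.                         */
/* The first variable (Sex) forms the rows and the second (Age)    */
/* forms the columns of the crosstabulation.                       */
proc freq data=sashelp.class;
   tables sex*age;
run;
```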

Creating N-Way Tables
Crosstabulations can include more than two variables. When three or more variables are joined in a TABLES statement, the result is a series of two-way tables that are grouped by the values of the first variables listed.
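
A three-way request might look like the sketch below. It assumes the SASHELP.CARS sample data set and its variables Origin, Type, and DriveTrain, none of which appear in this chapter.

```sas
/* Assumption: SASHELP.CARS is available.                          */
/* One Type-by-DriveTrain table is produced for each Origin value. */
proc freq data=sashelp.cars;
   tables origin*type*drivetrain;
run;
```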

Creating Tables in List Format
To reduce the bulk of n-way table output, add a slash (/) and the LIST option to the end of the TABLES statement. PROC FREQ then prints compact, multi-column lists instead of a series of tables.
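
The same kind of three-way request in list format, again assuming the SASHELP.CARS sample data set:

```sas
/* Assumption: SASHELP.CARS is available.                           */
/* LIST prints one line per combination of values instead of a      */
/* separate two-way table for each value of Origin.                 */
proc freq data=sashelp.cars;
   tables origin*type*drivetrain / list;
run;
```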

Suppressing Table Information
You can suppress the display of specific statistics by adding one or more options to the TABLES statement:

  • NOFREQ suppresses cell frequencies
  • NOPERCENT suppresses cell percentages
  • NOROW suppresses row percentages
  • NOCOL suppresses column percentages.
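
These options can be combined, as in this sketch that keeps only cell frequencies and overall percentages. The data set SASHELP.CLASS is an assumption chosen for illustration.

```sas
/* Assumption: SASHELP.CLASS is available.                    */
/* NOROW and NOCOL remove the row and column percentages from */
/* each cell of the crosstabulation.                          */
proc freq data=sashelp.class;
   tables sex*age / norow nocol;
run;
```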


Syntax


PROC MEANS <DATA=SAS-data-set>
<statistic-keyword(s)> <option(s)>;
<VAR variable(s)>;
<CLASS variable(s)>;
<BY variable(s)>;
<OUTPUT OUT=SAS-data-set statistic=variable(s)>;
RUN;

PROC SUMMARY <DATA=SAS-data-set>
<statistic-keyword(s)> <option(s)>;
<VAR variable(s)>;
<CLASS variable(s)>;
<OUTPUT OUT=SAS-data-set>;
RUN;

PROC FREQ <DATA=SAS-data-set>;
TABLES variable-1*variable-2 <* ... variable-n>
/ <NOFREQ|NOPERCENT|NOROW|NOCOL> <LIST>;
RUN;



Sample Programs
proc means data=clinic.heart min max maxdec=1;
   var arterial heart cardiac urinary;
   class survive sex;
run;


proc summary data=clinic.diabetes;
   var age height weight;
   class sex;
   output out=work.sum_gender
      mean=AvgAge AvgHeight AvgWeight;
run; 


proc freq data=clinic.heart order=freq;
   tables sex*survive*shock / nopercent list;
run;



Points to Remember
  • In PROC MEANS, use a VAR statement to limit output to relevant variables. Exclude statistics for nominal variables such as ID or ProductCode.

  • By default, PROC MEANS prints the full width of each numeric variable. Use the MAXDEC= option to limit decimal places and to improve legibility.

  • Data must be sorted for BY group processing. You might need to run PROC SORT before using PROC MEANS with a BY statement.

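  • PROC MEANS and PROC SUMMARY produce the same results; however, the default output is different. PROC MEANS produces a report by default, whereas PROC SUMMARY produces a report only when you specify the PRINT option and is typically used with an OUTPUT statement to create a data set.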

  • If you do not include a TABLES statement, PROC FREQ produces a one-way frequency table for every variable in the data set.

  • Variables that have continuous numeric values can create a large amount of output. Use a TABLES statement to exclude such variables, or group their values by applying a FORMAT statement.
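
The last point, grouping continuous values with a format, can be sketched as follows. The data set SASHELP.CLASS, the format name HTGRP, and the cutoff of 60 are all hypothetical choices made for illustration.

```sas
/* Assumption: SASHELP.CLASS is available; HTGRP and the cutoff of 60 */
/* are hypothetical.                                                   */
proc format;
   value htgrp low-<60  = 'Under 60'
               60-high  = '60 and over';
run;

/* PROC FREQ counts the formatted values, so Height yields two rows   */
/* instead of one row per distinct height.                            */
proc freq data=sashelp.class;
   tables height;
   format height htgrp.;
run;
```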