Group Processing Using the BY Statement

Like the CLASS statement, the BY statement specifies variables to use for categorizing observations.


General form, BY statement:
BY variable(s);

where variable(s) specifies category variables for group processing.


But BY and CLASS differ in two key ways:

  1. Unlike CLASS processing, BY processing requires that your data already be sorted or indexed in the order of the BY variables. Unless the observations are already sorted, or the data set is indexed on the BY variables, you need to run the SORT procedure before using PROC MEANS with a BY statement.

    Alert: Be careful when sorting data sets to enable group processing. If you don't specify an output data set by using the OUT= option, PROC SORT will overwrite your initial data set with the newly sorted observations.

  2. BY group results have a layout that is different from the layout of CLASS group results. Note that the BY statement in the program below creates four small tables, one for each combination of Survive and Sex values; a CLASS statement would produce a single large table.
          proc sort data=clinic.heart out=work.heartsort;
             by survive sex;
          run;
          proc means data=work.heartsort maxdec=1;
             var arterial heart cardiac urinary;
             by survive sex;
          run;

Survive=DIED Sex=1
Variable    N     Mean   Std Dev   Minimum   Maximum
Arterial    4     92.5      10.5      83.0     103.0
Heart       4    111.0      53.4      54.0     183.0
Cardiac     4    176.8      75.2      95.0     260.0
Urinary     4     98.0     186.1       0.0     377.0

Survive=DIED Sex=2
Variable    N     Mean   Std Dev   Minimum   Maximum
Arterial    6     94.2      27.3      72.0     145.0
Heart       6    103.7      16.7      81.0     130.0
Cardiac     6    318.3     102.6     156.0     424.0
Urinary     6    100.3     155.7       0.0     405.0

Survive=SURV Sex=1
Variable    N     Mean   Std Dev   Minimum   Maximum
Arterial    5     77.2      12.2      61.0      88.0
Heart       5    109.0      32.0      77.0     149.0
Cardiac     5    298.0     139.8      66.0     410.0
Urinary     5    100.8      60.2      44.0     200.0

Survive=SURV Sex=2
Variable    N     Mean   Std Dev   Minimum   Maximum
Arterial    5     78.8       6.8      72.0      87.0
Heart       5    100.0      13.4      84.0     111.0
Cardiac     5    330.2      87.0     256.0     471.0
Urinary     5    111.2     152.4      12.0     377.0
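
For comparison, the same analysis can be written with a CLASS statement instead of a BY statement. This is a minimal sketch, assuming the clinic library is already assigned as in the program above. It reads clinic.heart directly, with no PROC SORT step, and reports all combinations of Survive and Sex in one table rather than four.

```sas
/* Sketch: CLASS version of the BY-group program above.            */
/* Assumes the clinic libref is already assigned in this session.  */
proc means data=clinic.heart maxdec=1;
   var arterial heart cardiac urinary;
   class survive sex;   /* no prior sort or index is required */
run;
```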


Hot tip: Because it doesn't require a sorting step, the CLASS statement is easier to use than the BY statement. However, BY group processing can be more efficient when you are categorizing data that includes many variables.
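
BY processing also accepts data that is indexed rather than sorted on the BY variables. The sketch below illustrates one way to set that up with PROC DATASETS. The copy work.heart and the index name survsex are hypothetical names chosen for this example, and the clinic library is assumed to be assigned already. Whether an index or a sort performs better depends on your data, so treat this as an option to test rather than a recommendation.

```sas
/* Copy the data to the Work library so the original is untouched. */
/* work.heart is a hypothetical name used only for this sketch.    */
data work.heart;
   set clinic.heart;
run;

/* Create a composite index on the BY variables.                   */
/* survsex is a hypothetical index name.                           */
proc datasets library=work nolist;
   modify heart;
   index create survsex=(survive sex);
quit;

/* BY-group processing can now use the index instead of a sort.    */
proc means data=work.heart maxdec=1;
   var arterial heart cardiac urinary;
   by survive sex;
run;
```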