Introduction

Some of the SAS data sets that you work with might be quite large. Large data sets can take a relatively long time to process because, by default, SAS reads the observations in a data set sequentially. For example, assume that your data set has 500 observations. To read the five-hundredth observation, SAS first reads observations 1 through 499 and then reads observation 500. When you need only particular observations, you can have SAS access them directly for greater speed and efficiency.
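
As a rough illustration of direct access, the sketch below uses the POINT= option in the SET statement to read a single observation by number. It assumes the SASHELP.CLASS sample data set (19 observations) as a small stand-in for a large data set; the output data set name work.obs10 and the variable name obsnum are made up for this example.

```sas
/* Sketch only: read observation 10 directly, without reading 1 through 9. */
/* SASHELP.CLASS stands in for a large data set; obsnum is a made-up name.  */
data work.obs10;
   obsnum = 10;                        /* observation number to read         */
   set sashelp.class point=obsnum;     /* POINT= variable is not written out */
   output;                             /* write the observation explicitly   */
   stop;                               /* POINT= never reaches end of file,  */
                                       /* so STOP ends the DATA step         */
run;
```

Without the STOP statement this step would loop continuously, because direct access never triggers the end-of-file condition that normally ends a DATA step.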

You will need to access specific observations directly when you want to create a representative sample of a large data set, which can be much easier to work with than the full data set. For example, if you are concerned about the accuracy of the data in a large data set, you could audit a small sample of the data in order to determine if a full audit is necessary. A representative sample is a subset of the full data set. The subset should contain observations that are taken from throughout the original data set so that the subset gives an accurate representation of the full data set. This lesson discusses two types of representative samples:

  • systematic samples
  • random samples.
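
The two sample types can be sketched as follows. This is a minimal illustration rather than the lesson's own code: it assumes the SASHELP.CLASS sample data set as a stand-in for a large data set, and the names work.systematic, work.random, pickit, and totobs are chosen only for this example. The random sample shown here is drawn with replacement, so the same observation can be selected more than once.

```sas
/* Systematic sample: every 5th observation, starting with observation 1. */
data work.systematic;
   do pickit = 1 to totobs by 5;
      /* NOBS= is assigned at compile time, so totobs is available */
      /* to the DO statement before the SET statement executes.    */
      set sashelp.class point=pickit nobs=totobs;
      output;
   end;
   stop;                                  /* required with POINT=   */
run;

/* Random sample with replacement: 5 observations chosen at random. */
data work.random;
   drop i;
   do i = 1 to 5;
      pickit = ceil(ranuni(0) * totobs);  /* integer from 1 to totobs */
      set sashelp.class point=pickit nobs=totobs;
      output;
   end;
   stop;
run;
```

A seed of 0 makes RANUNI use the system clock, so each run selects a different sample; a positive seed would make the selection repeatable.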

Indexes can also make working with very large data sets easier. An index is a separate data structure that is associated with a data set, and that contains information about the specific location of observations in the data set according to the value of key variables. An index enables you to access a particular observation directly, without needing to read all of the observations that precede it in the data set. Indexes are useful in many instances, including WHERE and BY processing. This lesson discusses how to create and maintain both simple and composite indexes.
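
The sketch below shows one way that simple and composite indexes might be created and then inspected. It is an illustration under stated assumptions: it copies the SASHELP.CLASS sample data set into the WORK library, and the composite index name sexage is made up for this example. A simple index takes the name of its key variable, while a composite index needs a name that does not match any variable in the data set.

```sas
/* Create a simple index on Name while the data set is created. */
data work.class (index=(name));
   set sashelp.class;
run;

/* Add a composite index on Sex and Age with PROC DATASETS. */
proc datasets library=work nolist;
   modify class;
   index create sexage=(sex age);
quit;

/* Add a simple index on Age with PROC SQL. */
proc sql;
   create index age on work.class(age);
quit;

/* List the indexes that are now associated with the data set. */
proc contents data=work.class;
run;

/* MSGLEVEL=I asks SAS to write a log note when an index is used.   */
/* SAS decides whether using the index is worthwhile, so on a data  */
/* set this small it may read the observations sequentially anyway. */
options msglevel=i;
proc print data=work.class;
   where name = 'Alice';
run;
```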





Time to complete: 1.5 hours



In this lesson, you learn to

  • create a systematic sample from a known number of observations
  • create a systematic sample from an unknown number of observations
  • create a random sample with replacement
  • create a random sample without replacement
  • use indexes
  • create indexes in the DATA step
  • manage indexes with PROC DATASETS
  • manage indexes with PROC SQL
  • document and maintain indexes.
