In SAS, DATA and PROC statements define the two basic program steps. DATA steps create or modify SAS datasets, and PROC steps tell SAS what analyses are to be conducted on the dataset. Some programs will not have a PROC step, but almost all will have a DATA step. A DATA or PROC step continues until all of the statements in that step are completed. The end of a step is indicated when another DATA or PROC statement appears or when SAS encounters a RUN statement.

DATA Step

The DATA step creates a SAS dataset that contains the data along with a "data dictionary." The data dictionary contains information on the variables and their properties (whether they are numeric or character, the width of the values at input, etc.). The following example creates a SAS data set from raw input:

DATA EXAMPLE1;
INPUT NAME $ SEX $ AGE INCOME;
CARDS;
Susan F 18 12000
Fred M 20 21586
Jane F 19 22232
(many observations omitted)
John M 19 14128
;
Notice that each statement ends with a semicolon. Also, the dollar sign after the variables NAME and SEX indicates that those variables are character variables and not numbers. It is also good practice to put a semicolon on its own line at the end of the data. This is not essential, but it does provide a logical break point.

The above DATA step inputs the data but does nothing with it. To conduct an analysis, we need a PROC statement.

PROC Step

The PROCedure step is used to perform some type of analysis on the data, including PRINTing it. The following are examples of PROC statements.
PROC PRINT;
PROC MEANS;
VARIABLES AGE INCOME;
RUN;
According to the SAS documentation, the RUN command is optional in some versions of SAS. Our version of SAS, however, does seem to require it. Together, the DATA and the PROC steps make up a SAS program.
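To see how the two kinds of steps fit together, here is a minimal sketch of a complete program. It uses only the three rows shown in the DATA step example (the omitted observations are not filled in), writes the variable list with VAR, and adds PROC CONTENTS, a procedure not covered above, on the assumption that you want to inspect the data dictionary. The DATA= option on each PROC is included for clarity.

```sas
/* DATA step: build the data set from in-line data.                */
/* Only the three rows shown earlier are used; this is a sketch.   */
DATA EXAMPLE1;
   INPUT NAME $ SEX $ AGE INCOME;
   CARDS;
Susan F 18 12000
Fred M 20 21586
Jane F 19 22232
;
RUN;

/* PROC step (an addition to the tutorial): list the variables     */
/* and their properties, i.e. the data dictionary.                 */
PROC CONTENTS DATA=EXAMPLE1;
RUN;

/* PROC step: print every observation. */
PROC PRINT DATA=EXAMPLE1;
RUN;

/* PROC step: summary statistics for the two numeric variables. */
PROC MEANS DATA=EXAMPLE1;
   VAR AGE INCOME;
RUN;
```

Ending every step with its own RUN statement is a style choice; it makes the boundary of each step explicit rather than relying on the next DATA or PROC statement to close it.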
SAS statements

SAS statements follow certain rules so that the program can understand what you want. Specifically:

SAS statements generally start with a keyword (e.g., DATA, PROC).
All SAS statements end with a semicolon (;). The semicolon is like a period at the end of a sentence written in English.
SAS statements can start anywhere on a line. However, readability is improved by indenting, much as we indent paragraphs in written English. Extra spaces and blank lines are ignored, so it doesn't matter if you put more than one space between words.
SAS does not distinguish between upper and lower case letters in statements. Consequently, "PROC" is the same as "Proc". Upper and lower case does matter in data values, however, since the comparisons SEX = 'F' and SEX = 'f' are not equivalent.
Missing data are represented by a . (a period or dot).

Separate Data Files

Data are often kept in a separate file to avoid having to re-enter them every time we want to use SAS. External data sets are also easy to share with other researchers. These files must be in plain text (ASCII) format. To ensure this, create them with NotePad or another text editor, or save them using the "*.txt" option in WordPerfect or MS Word after selecting Save As on the File menu.

Let's assume that you have entered your data into NotePad (or some similar text editor) and saved it in a separate file called Example.raw.
The file might look like this:
Susan F 18 12000
Fred M 20 21586
Jane F 19 22232
Wendy F 22 25000
Bill M 24 25589
John M 19 14128
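A file laid out like this follows the rules given earlier: values are separated by blanks, a missing value is entered as a period, and character values keep their case. The sketch below illustrates the last two points with made-up rows that are not the contents of Example.raw. The data set name EXAMPLE2 and the WHERE statement are assumptions added for illustration.

```sas
/* Made-up rows for illustration only; not the contents of Example.raw. */
/* Fred's INCOME is missing (entered as a period), and Jane's SEX is    */
/* typed in lower case.                                                 */
DATA EXAMPLE2;
   INPUT NAME $ SEX $ AGE INCOME;
   CARDS;
Susan F 18 12000
Fred M 20 .
Jane f 19 22232
;
RUN;

/* Character comparisons are case sensitive, so only rows whose SEX */
/* is exactly an upper-case F are printed.                          */
PROC PRINT DATA=EXAMPLE2;
   WHERE SEX = 'F';
RUN;
```

With these rows, the listing should contain Susan but not Jane, which is why it pays to enter codes such as F and M consistently.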
Get into the SAS Program Editor (select Clear from the Edit menu if there is already something there). Type the following statements:
FILENAME MYDATA 'A:EXAMPLE.RAW';
DATA CLASS;
INFILE MYDATA;
INPUT NAME $ SEX $ AGE INCOME;
PROC PRINT;
PROC MEANS;
VARIABLES AGE INCOME;
RUN;
Make sure you have a semicolon at the end of each statement!

The first line in this program assigns a logical filename (in this case 'MYDATA') to the name of your physical file ('A:EXAMPLE.RAW'). From now on, SAS will refer to your data file by its logical name. Think of this logical name as an alias or nickname. Just as nicknames are usually a short form of something much longer, logical filenames save you from typing out something like 'c:/sas/students/data/classnote/example1.raw' more than once.

The second line ('DATA CLASS;') tells SAS you are invoking the DATA step and that you want to give your data the temporary name 'CLASS'. SAS does not like to work with the original data set more than it has to, so it creates temporary data sets internally. This minimizes the possibility of altering (i.e., 'screwing up') the original data.

The third line ('INFILE MYDATA;') tells SAS to read your data file under the logical filename MYDATA and copy it into the temporary data set called CLASS. This line replaces the CARDS line that we used in the first tutorial.

The fourth line (the INPUT line) tells SAS the names of your variables, just as it did in the first example.

The remaining lines, beginning with the two PROC statements, indicate what you want done to the data. If the PROC statement is not included in the same run, you may have to use the DATA= option to tell SAS which data set you are using. For example:

PROC PRINT DATA=CLASS;
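The DATA= option matters most once a session holds more than one data set, because a PROC without DATA= works on the most recently created data set. The sketch below assumes the same file path as above and adds a second, purely hypothetical data set called OTHER to show the difference.

```sas
/* Read the external file into the temporary data set CLASS. */
/* The path is copied from the example above.                */
FILENAME MYDATA 'A:EXAMPLE.RAW';
DATA CLASS;
   INFILE MYDATA;
   INPUT NAME $ SEX $ AGE INCOME;
RUN;

/* Hypothetical second data set, created only to illustrate the point. */
DATA OTHER;
   INPUT X;
   CARDS;
1
2
;
RUN;

/* No DATA= option: SAS uses the most recently created data set (OTHER). */
PROC PRINT;
RUN;

/* DATA= names the data set explicitly, so CLASS is printed and analyzed. */
PROC PRINT DATA=CLASS;
RUN;

PROC MEANS DATA=CLASS;
   VAR AGE INCOME;
RUN;
```

Naming the data set on every PROC is a little more typing, but it keeps a program correct when DATA steps are later added or reordered.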