Alternatives to SAS® IF-THEN/ELSE Processing

Imelda C. Go, Lexington County School District One, Lexington, SC

ABSTRACT

IF-THEN/ELSE statements are simple and easy to use, but they have limitations. They are not always easy to read or to modify, and they may be less efficient than other methods available in SAS. Alternatives discussed include SELECT groups, ARRAY processing, and PROC FORMAT. The discussion includes examples of creating new variables from existing ones, recoding variable values, validating data, and controlling the appearance of output.

INTRODUCTION

IF-THEN/ELSE statements are the basic statements for conditional processing. They are simple, easy to learn, and easy to use. However, they do not always provide the easiest solution to a programming problem. Fortunately, there are a number of alternatives to IF-THEN/ELSE processing. This paper uses a question-and-answer format and contrived examples to show those alternatives. The alternatives provided are not necessarily the only or the best ways to handle specific programming situations.

Do you use only IF-THEN/ELSE statements for conditional processing?

The following assigns the teacher and counselor values based on rating.

    data one;
      length teacher counselor $30;
      input rating $20.;
      /* assign staff according to the rating value */
      if rating='Exemplary' then teacher='Frodo';
      else if rating in ('Poor', 'Fair') then do;
        teacher='Aragorn';
        counselor='Gandalf';
      end;
      else do;
        teacher='unassigned';
        counselor='Legolas';
      end;
      cards;
    ...
    ;

A SELECT group may be used instead.

    data one;
      length teacher counselor $30;
      input rating $20.;
      /* each WHEN lists the rating values it handles */
      select (rating);
        when ('Exemplary') teacher='Frodo';
        when ('Poor', 'Fair') do;
          teacher='Aragorn';
          counselor='Gandalf';
        end;
        otherwise do;
          teacher='unassigned';
          counselor='Legolas';
        end;
      end;
      cards;
    ...
    ;

Do you create data sets with subsetting IF-THEN/ELSE statements only to use the resulting data sets as input for exactly the same procedure(s)?
The following creates two data sets: one for males and one for females.

    data males females;
      input sex $1. grade 2.;
      /* route each observation to the matching data set */
      if sex='M' then output males;
      else if sex='F' then output females;
      cards;
    ...
    ;
    proc freq data=males;
      tables grade;
    run;
    proc freq data=females;
      tables grade;
    run;

When the resulting data sets are mutually exclusive subsets of the original data set and are used with exactly the same procedures, BY-group processing can be used instead, provided the procedures support it. In the rewritten code below, the data must first be sorted by the variable named in the BY statement of PROC FREQ.

    data one;
      input sex $1. grade 2.;
      /* drop observations that belong to neither group */
      if sex not in ('M', 'F') then delete;
      cards;
    ...
    ;
    proc sort;
      by sex;
    run;
    proc freq;
      by sex;
      tables grade;
    run;

Do you create data sets with subsetting IF statements in different DATA steps only to use them as input for exactly the same procedure(s)?

The following creates two data sets: one for 10th grade males and one for 7th grade females.

    data one;
      input sex $1. grade 2.;
      cards;
    ...
    ;
    data M10;
      set one;
      if sex='M' and grade=10;
    run;
    proc freq data=M10;
      tables grade;
    run;
    data F7;
      set one;
      if sex='F' and grade=7;
    run;
    proc freq data=F7;
      tables grade;
    run;
When exploratory data analysis is performed, the analyst may need to look at several subsets of the data to see whether anything of interest appears. Instead of creating a data set for each subset of interest, use the WHERE statement to specify a subset of the data for the procedure.

    data one;
      input sex $1. grade 2.;
      cards;
    ...
    ;
    proc freq;
      tables grade;
      where sex='M' and grade=10;
    run;
    proc freq;
      tables grade;
      where sex='F' and grade=7;
    run;

The WHERE= data set option can be used in the same way.

    data one;
      input sex $1. grade 2.;
      cards;
    ...
    ;
    proc freq data=one (where=(sex='M' and grade=10));
      tables grade;
    run;
    proc freq data=one (where=(sex='F' and grade=7));
      tables grade;
    run;

Do you create new variables with conditional statements only to control the appearance of output?

In the example below, the gender2 variable is created for the sole purpose of printing the more user-friendly values F and M (instead of 1 and 2) in PROC FREQ output.

    data one;
      input gender;
      if gender=1 then gender2='F';
      else if gender=2 then gender2='M';
      cards;
    ...
    ;
    proc freq;
      tables gender2;
    run;

Instead of creating a new variable, create a user-defined format to control the appearance of output. PROC FREQ will print the values of the gender variable as F and M instead of 1 and 2.

    data one;
      input gender;
      cards;
    ...
    ;
    proc format;
      value gender 1='F' 2='M';
    run;
    proc freq;
      format gender gender.;
    run;

The gender. format may be applied by using the FORMAT statement with a procedure, or it may be applied in the DATA step as shown below. If the format is applied in the DATA step, then the same format will apply to the variable in every procedure where the variable is used.

    proc format;
      value gender 1='F' 2='M';
    run;
    data one;
      input gender;
      format gender gender.;
      cards;
    ...
    ;
    proc freq;
    run;

Do you validate data using conditional statements?

Suppose that the valid values for a gender variable are 1 and 2 and that all other values are invalid.

    data one;
      input gender;
      /* set any value other than 1 or 2 to missing */
      if gender not in (1,2) then gender=.;
      cards;
    ...
    ;

An informat can be used to perform simple data validation. If all the valid values for a variable are specified, all other values can be considered invalid. The keyword OTHER is used to indicate range values that are not included in any of the other ranges for an informat. When _ERROR_ is specified as an informatted value, all values in the corresponding informat range are not valid and a missing value will be assigned to the variable. When _SAME_ is specified as an informatted value, a value in the corresponding informat range stays the same.

Suppose that values from a variable with integer and non-integer values need to be validated and the only valid values are 1 and 2. The following INVALUE statement uses _SAME_, _ERROR_, and OTHER for this task:

    proc format;
      invalue check 1=_same_
                    2=_same_
                    other=_error_;
    run;

An informat’s range can be specified as a list of values separated by commas. The following statement is functionally equivalent to the previous one:

    proc format;
      invalue check 1,2=_same_
                    other=_error_;
    run;

The informat is used in the INPUT statement. SAS will assign a missing value to gender if a value other than 1 or 2 is encountered.

    data one;
      input gender check.;
      cards;
    ...
    ;
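
As a minimal sketch of the check. informat in use, the step below reads a few made-up values and prints the result. The data lines are hypothetical and are not taken from the paper. The colon modifier on the INPUT statement is also an assumption added here: it makes SAS read each value up to the next blank instead of relying on the informat's default width, so that a value such as 1.5 reaches the informat whole.

```sas
/* Sketch only: the sample values below are invented for illustration. */
proc format;
  invalue check 1,2=_same_      /* valid values pass through unchanged */
                other=_error_;  /* anything else is treated as invalid */
run;

data one;
  /* the colon modifier reads each value up to the next blank */
  input gender :check.;
  cards;
1
2
3
1.5
;

proc print data=one;
run;
```

Under these assumptions, the first two observations should keep their values and the last two should come through as missing. Because _ERROR_ marks a value as invalid data, SAS should also write an invalid-data note to the log for each rejected value.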