A Survey of Some of The Most Useful SAS Functions: Ron Cody, Camp Verde, Texas
ABSTRACT
SAS Functions provide amazing power to your DATA step programming. Some of these functions are essential—
some of them save you writing volumes of unnecessary code. This paper covers some of the most useful SAS
functions. Some of these functions may be new to you and they will change the way you program and approach
common programming tasks.
INTRODUCTION
The majority of the functions described in this paper work with character data. There are functions that search for
strings, others that can find and replace strings or join strings together. Still others can measure the spelling
distance between two strings (useful for "fuzzy" matching). Some of the newest and most amazing functions are not
functions at all, but call routines. Did you know that you can sort values within an observation? Did you know that
not only can you identify the largest or smallest value in a list of variables, but you can also identify the second, third, or
nth largest or smallest value? If this introduction has caught your attention, read on!
Remember, the storage length of a SAS character variable is set at compile time. Since the LENGTH statement
comes before the assignment statement for String, SAS assigns a length of 7 for String. The LENGTHN function
returns a 3 since this is the length of String, not counting the trailing blanks. Finally, by concatenating a colon on
each side of String, it is easy to see that this value contains 4 trailing blanks.
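The program that this paragraph describes is not reproduced above. The following is only a sketch of what it might look like, assuming it has the same statements as Program 2 but with the LENGTH statement placed first; the data set name chars1 is hypothetical.

```sas
data chars1;                              /* hypothetical data set name */
   length String $ 7;                     /* storage length is set here, at compile time */
   String = 'abc';
   Storage_length = lengthc(String);      /* storage length, trailing blanks included */
   Length = lengthn(String);              /* length not counting trailing blanks */
   Display = ":" || String || ":";        /* colons make the trailing blanks visible */
   put Storage_length= /
       Length= /
       Display=;
run;
```

With the LENGTH statement first, the log should show a storage length of 7, a length of 3, and four blanks between abc and the closing colon, as the paragraph describes.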
If you move the LENGTH statement further down in the program like this:
NESUG 2012 Foundations and Fundamentals
Program 2
data chars2;
   String = 'abc';
   length String $ 7;
   Storage_length = lengthc(String);
   Length = lengthn(String);
   Display = ":" || String || ":";
   put Storage_length= /
       Length= /
       Display=;
run;
Notice that the LENGTH statement is ignored. Since String = 'abc' appears before the LENGTH statement, the
length of String has already been set. As a good rule of thumb, run PROC CONTENTS on all of your data sets and
check the storage length of all your character variables. Don't be surprised if you see some character variables with
lengths of 200, the default length for many of the SAS character functions; that is, the length that SAS assigns to a
variable if you do not explicitly indicate the length in a LENGTH statement or some other way.
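As a small illustration of this check, the sketch below creates a character variable from a function result without a LENGTH statement and then runs PROC CONTENTS. The data set and variable names are hypothetical, and TRANWRD is used only as an example of a function that is documented to return a 200-byte result when the target variable has no length yet; the behavior of other functions may differ by SAS release.

```sas
data lengths;                                   /* hypothetical data set name */
   Address = '12 Main Street';                  /* length 14, taken from the literal */
   Short = tranwrd(Address,'Street','St.');     /* no LENGTH statement for Short */
run;

proc contents data=lengths;                     /* compare the Len column for Address and Short */
run;
```

Adding a LENGTH statement for Short before the assignment is the usual way to avoid the oversized variable.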
*New way;
if missing(Age) then . . .
if missing(Char) then . . .
The argument to the missing function can either be character or numeric and the function returns a value of true if the
argument is a missing value and false otherwise. I highly recommend that you use this function in any program
where you need to test for a missing value. You will find that the programs read so much better.
If you need to set one or more character or numeric variables to a missing value, you can do it the old way like this:
array x[10] x1-x10;
array chars[3] $ a b c;
do i = 1 to 10;
   x[i] = .;
end;
do i = 1 to 3;
   chars[i] = ' ';
end;
drop i;
or you can save yourself a lot of effort by using the call missing routine like this:
call missing(of x1-x10, a, b, c);
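Here is a short, self-contained sketch that combines the CALL MISSING routine with the MISSING function. The starting values are hypothetical and exist only so that there is something to clear.

```sas
data _null_;
   length a b c $ 8;
   array x[10] x1-x10;
   /* hypothetical starting values */
   do i = 1 to 10;
      x[i] = i;
   end;
   a = 'one';
   b = 'two';
   c = 'three';
   /* one call clears numeric and character variables together */
   call missing(of x1-x10, a, b, c);
   /* MISSING accepts either a numeric or a character argument */
   if missing(x1) and missing(a) then put 'x1 and a are both missing';
run;
```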
One way to think about the INPUT function is to ask yourself "What does an INPUT statement do?" It takes a text value, usually
from a file, and reads it according to a supplied INFORMAT. Well, the INPUT function does a similar thing. It takes a
text value (the first argument to the function) and "reads" it as if it were reading data from a file, according to the
INFORMAT that you supply as the second argument. Perhaps the next program will make this clear:
Program 3
data _null_;
   c_date = "9/15/2004";
   c_num = "123";
   Sas_Date = input(c_date,mmddyy10.);
   Number = input(c_num,10.);
   put SAS_Date= Number=;
run;
You have two character variables in this program (c_date and c_num). By using the INPUT function you created a
true SAS date (numeric) on the first value and performed a character to numeric conversion on the second. Notice
that the informat used to convert c_num is 10. This is not a problem. Unlike reading text from a file, the INFORMAT
you supply cannot read past the end of the character value. After you run this program, the values of SAS_Date and
Number are:
Figure 3: Output from Program 3
SAS_Date = 16329
Number = 123
This example takes three numeric values (a SAS date, a number, and a social security number) and creates three
character variables. After you run this program, the values of the character variables are:
Figure 4: Output from Program 4
Char_Date = "1/2/1960"
Money = "$1,234.00"
SS_Char = "123-45-6789"
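Program 4 itself is not reproduced above. The sketch below shows the general technique of numeric to character conversion with the PUT function. The input values and the three formats (MMDDYY10., DOLLAR9.2, and SSN11.) are assumptions chosen to resemble Figure 4, not the author's original code, so the exact text produced (for example, leading zeros in the date) may differ from the figure.

```sas
data _null_;
   /* hypothetical numeric values */
   Date = 1;                              /* a SAS date: one day after January 1, 1960 */
   Amount = 1234;
   SS = 123456789;
   /* PUT writes a value with a format and returns a character string */
   Char_Date = put(Date,mmddyy10.);
   Money = put(Amount,dollar9.2);
   SS_Char = put(SS,ssn11.);
   put Char_Date= Money= SS_Char=;
run;
```

The second argument to PUT is always a format, and the result is always character, even when the format is a numeric one.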
The next program shows how you can use a format to group ages into categories. This is somewhat easier than
writing a series of IF-THEN-ELSE statements. Here is the program:
Program 5
proc format;
   value agegrp 0-20='0 to 20'
                21-40='21 to 40'
                41-high='41+';
run;
data PutEx;
   input Age @@;
   AgeGroup = put(Age,agegrp.);
datalines;
15 25 60
;
The new variable AgeGroup is now a character variable with the formatted values for Age. The storage length of
this new variable is the length of the longest formatted value. Below, you can see the values of Age and AgeGroup.
Program 6
data locate;
   input String $10.;
   /* i means ignore case */
   First = find(String,'xyz','i');
   First_c = findc(String,'xyz','i');
datalines;
abczyx1xyz
1234567890
abcz1y2x39
XYZabcxyz
;
This example uses the 'i' modifier for both functions. By using this modifier, you save yourself the trouble of having to
change the case of one or more strings before you start your search.
Figure 6: Output from Program 6
String      First  First_c
abczyx1xyz      8        4
1234567890      0        0
abcz1y2x39      0        4
XYZabcxyz       1        1
In the first observation, the substring 'xyz' is not found until the eighth position in String. Because the FINDC function
is looking for an 'x' or a 'y' or a 'z', it returns a 4 in the first observation because of the 'z' in the fourth position. Notice
that when there are no matches as in observation 2, the functions return a 0.
For Phone1, the COMPRESS function removes only blanks, because you supplied just one argument. For Phone2, you
specified the characters to remove: open and closed parentheses, a dash, and a space. For Phone3, you specified the
two modifiers 'k' (keep) and 'd' (digits). Notice the two commas. They are necessary to tell SAS that 'kd' is a list of
modifiers (third argument) and not a list of characters to remove (second argument). Notice in the listing below that
Phone2 and Phone3 are the same. However, had there been any extraneous characters in Phone, Phone2 would still
contain those characters.
Figure 7: Output from Program 7
Phone            Phone1         Phone2      Phone3
(908)235-4490    (908)235-4490  9082354490  9082354490
(201) 555-77 99  (201)555-7799  2015557799  2015557799
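Program 7 is not reproduced above. The following sketch is consistent with the description and with Figure 7, but the data set name and the input informat are assumptions.

```sas
data phone;                                 /* hypothetical data set name */
   input Phone $15.;
   Phone1 = compress(Phone);                /* one argument: remove blanks only */
   Phone2 = compress(Phone,'()- ');         /* remove parentheses, dashes, and blanks */
   Phone3 = compress(Phone,,'kd');          /* modifiers: k = keep, d = digits */
datalines;
(908)235-4490
(201) 555-77 99
;
```

The 'kd' form is usually the more robust choice, because you name what to keep rather than trying to list every character that might need to be removed.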
Here is another very useful example showing how the COMPRESS function can be used to extract the digits from
values that contain other non-digit characters, such as units. Take a look:
Program 8
data Units;
   input @1 Wt $10.;
   Wt_Lbs = input(compress(Wt,,'kd'),8.);
   if findc(Wt,'K','i') then
      Wt_Lbs = 2.2*Wt_Lbs;
datalines;
155lbs
90Kgs.
;
You see that the input data contains units such as lbs. or Kgs. This is a fairly common problem. Using the
COMPRESS function makes for a very simple and elegant solution. You start by keeping only the digits in the
original value and you use the INPUT function to do the character to numeric conversion. Now you need to test if the
original value contained an upper- or lowercase 'k'. If so, you need to convert kilograms to pounds. The FINDC
function with the 'i' modifier makes this a snap.
Figure 8: Output from Program 8
Listing of Data Set Units
Wt Wt_Lbs
155lbs 155
90Kgs. 198
Here you want to extract the state code (starting in position 3 for a length of 2) and the digit part of the ID starting in
position 5. Notice that you omit the third argument in the digit extraction, so the SUBSTR function returns everything
from position 5 to the end of the string. This is useful since some of the digit strings are 3 characters long and some
are 4. You use the INPUT function to perform the character to numeric conversion in this example.
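The program being described is not reproduced above. A minimal sketch of the technique, assuming IDs that have a two-character prefix, a two-character state code, and then 3 or 4 digits, might look like this. The ID values, the data set name, and the variable names are all hypothetical.

```sas
data extract;                               /* hypothetical data set name */
   input ID : $10. @@;
   State = substr(ID,3,2);                  /* start at position 3, length 2 */
   Num = input(substr(ID,5),4.);            /* no third argument: take the rest of the string */
datalines;
XYNY123 XYNJ1234
;
```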
THE SUBSTR FUNCTION USED ON THE LEFT-HAND SIDE OF THE EQUAL SIGN
Way back when I was learning SAS (probably before your time), this was called the SUBSTR pseudo function. That
name was too scary and SAS has renamed it the SUBSTR function used on the left-hand side of the equal sign. To
my knowledge, this is the only SAS function allowed to the left of the equal sign. Here's what it does:
It allows you to replace characters in an existing string with new characters. This sounds complicated, but you will
see in the following program that it is actually straightforward. This next program uses the SUBSTR function (on the
left-hand side of the equal sign) to mask the first 5 characters in an account number. Here is the code:
Program 10
data bank;
   input Id Account : $9. @@;
   Account2 = Account;
   substr(Account2,1,5) = '*****';
datalines;
001 123456789 002 049384756 003 119384757
;
First you assign the value of Account to another variable (Account2) so that you don't destroy the original value.
Next, you replace the characters in Account2, starting from position 1 for a length of 5 with five asterisks. Here is the
listing:
Figure 10: Output from Program 10
Id Account2
1 *****6789
2 *****4756
3 *****4757
Some names contain a middle initial, some do not. By using -1 as the second argument to the function, you
always get the last word in the string, which here is the last name.
Figure 11: Output from Program 11
Name              Last_Name
Jeff W. Snoker    Snoker
Raymond Albert    Albert
Alfred E. Newman  Newman
Steven J. Foster  Foster
Jose Romerez      Romerez
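Program 11 is not reproduced above. A sketch that would produce a listing like Figure 11 with the SCAN function follows; the data set name, the informat, and the choice of a blank as the only delimiter are assumptions.

```sas
data names;                                 /* hypothetical data set name */
   input Name $20.;
   /* a negative word number counts from the right, so -1 is the last word */
   Last_Name = scan(Name,-1,' ');
datalines;
Jeff W. Snoker
Raymond Albert
Alfred E. Newman
;
```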
Notice that in order to specify a single quote as a delimiter, you need to place the list of delimiters in double
quotes.
In each of the three lines using the TRANWRD function, you are replacing the words Street, Avenue, and Road with
their abbreviations.
Figure 13: Output from Program 13
Listing of Data Set CONVERT
Obs Address
1 89 Lazy Brook Rd.
2 123 River Rd.
3 12 Main St.
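Program 13 is not reproduced above. The sketch below shows three TRANWRD calls of the kind described. The input addresses are hypothetical values chosen so that the results match Figure 13, and the informat is an assumption.

```sas
data convert;
   input Address $30.;
   /* arguments: source string, text to find, replacement text */
   Address = tranwrd(Address,'Street','St.');
   Address = tranwrd(Address,'Avenue','Ave.');
   Address = tranwrd(Address,'Road','Rd.');
datalines;
89 Lazy Brook Road
123 River Rd.
12 Main Street
;
```

Keep in mind that TRANWRD is case sensitive and replaces every occurrence of the text it finds, so a value such as 'Roadway' would also be changed.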
You may also consider using the SOUNDEX function to match names in two files. However, I have found that
SOUNDEX tends to match names that are quite dissimilar.
You can see that when you concatenate the two strings (One and Two) without using either of these functions, the
result maintains those blanks. Notice that the Trim variable has no blanks between the 'ABC' and 'XYZ' and the Strip
variable has no blanks at all. Finally, you can see that it is much easier to use the CATS function to remove leading
and trailing blanks and then concatenate the strings.
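The program being discussed is not reproduced above. A minimal sketch that shows the same contrast follows; the values of One and Two and the variable name Cat are assumptions, and the colons are added only to make the blanks visible in the log.

```sas
data _null_;
   length One Two $ 10;
   One = '   ABC   ';                       /* hypothetical value with leading and trailing blanks */
   Two = 'XYZ';
   Cat   = ':' || One || Two || ':';                      /* all blanks preserved */
   Trim  = ':' || trim(One) || Two || ':';                /* trailing blanks of One removed */
   Strip = ':' || strip(One) || strip(Two) || ':';        /* leading and trailing blanks removed */
   Cats  = ':' || cats(One,Two) || ':';                   /* strips each argument, then concatenates */
   put Cat= / Trim= / Strip= / Cats=;
run;
```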
The CATS function concatenates all of the survey responses into a single string and the COUNTC function then
counts how many Y's (ignoring case) there are in the string. I really love this program!
Figure 19: Output from Program 19
Listing of Survey
Q1 Q2 Q3 Q4 Q5 Num
y y n n Y 3
n n n n n 0
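Program 19 is not reproduced above. A sketch of the technique that is consistent with Figure 19 follows; the input layout (five one-character responses in adjacent columns) is an assumption.

```sas
data survey;
   input (Q1-Q5)($1.);
   /* CATS builds one string from the five answers; COUNTC counts the y's, ignoring case */
   Num = countc(cats(of Q1-Q5),'y','i');
datalines;
yynnY
nnnnn
;
```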
SOME DATE FUNCTIONS (MDY, MONTH, WEEKDAY, DAY, YEAR, AND YRDIF)
This section describes some of the most common (and useful) date functions. The MDY function returns a SAS date
given a month, day, and year value (the three arguments to the function). The WEEKDAY, DAY, MONTH and YEAR
functions all take a SAS date as their argument and return the day of the week (1=Sunday, 2=Monday, etc.), the day
of the month (a number from 1 to 31), the month (a number from 1 to 12) and the year respectively.
The YRDIF function computes the number of years between two dates. The first two arguments are the first date and
the second date. An optional third argument allows you to specify the number of days in a month and the number of
days in a year. For example, for certain financial calculations (such as bond interest), you might specify '30/360', which
asks for 30-day months and 360-day years. The YRDIF function had a slight problem computing the difference in
years when one or both of the dates fell in a leap year. This problem was fixed in version 9.3. If you are running a
version of SAS prior to 9.3, you need to specify 'ACT/ACT' as the third argument to the YRDIF function. Note that the
calculation may be off by one day when leap years are involved. This author still believes this is better than
subtracting the two dates and dividing by 365.25, the way we computed ages prior to the YRDIF function.
The next program demonstrates all of these date functions:
Program 20
data DateExamples;
   input (Date1 Date2)(:mmddyy10.) M D Y;
   SAS_Date = MDY(M,D,Y);
   WeekDay = weekday(Date1);
   MonthDay = day(Date1);
   Year = year(Date1);
   Age = yrdif(Date1,Date2);
   format Date: mmddyy10.;
datalines;
10/21/1955 10/21/2012 6 15 2011
;
Figure 20: Output from Program 20
SAS_Date  WeekDay  MonthDay  Year  Age
   18793        6        21  1955   57
Defining arrays this way can save you lots of time and programming effort.
Figure 21: Output from Program 21
A B C x1 x2 x3 y z
Ron John Mary 1 2 . 3 .
input x1-x5;
Sum = sum(of x1-x5);
if n(of x1-x5) ge 4 then
Mean1 = mean(of x1-x5);
if nmiss(of x1-x5) le 3 then
Mean2 = mean(of x1-x5);
datalines;
1 2 . 3 4
. . . 8 9
;
In this program, you compute Mean1 only if there are 4 or more non-missing values. You compute Mean2 only if
there are 3 or fewer missing values.
Figure 22: Output from Program 22
Sum Mean1 Mean2
10 2.5 2.5
17 . 8.5
Program 24
data Moving;
   input X @@;
   Moving = mean(X,lag(x),lag2(x));
datalines;
50 40 55 20 70 50
;
In this program, you compute the mean of the current value and the two previous values.
Figure 24: Output from Program 24
X Moving
50 50.0000
40 45.0000
55 48.3333
20 38.3333
70 48.3333
50 46.6667
The original scores were 80, 70, 90, 10, and 80. In the output below, you see that they have traded places so that
Score1 is the lowest, Score2 the next lowest, and so forth.
Figure 25: Output from Program 25
Score1 Score2 Score3 Score4 Score5 Top3
10 70 80 80 90 83.333
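Program 25 is not reproduced above. A sketch that is consistent with Figure 25 uses the CALL SORTN routine to sort the values within the observation and then averages the three highest. The data set name is hypothetical, and the original program may have computed Top3 differently.

```sas
data scores;                                /* hypothetical data set name */
   input Score1-Score5;
   call sortn(of Score1-Score5);            /* sorts the values in ascending order within the observation */
   Top3 = mean(of Score3-Score5);           /* after sorting, the last three are the highest */
datalines;
80 70 90 10 80
;
```

Because CALL SORTN overwrites the original order, copy the scores to other variables first if you need to know which score belonged to which test.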
CONCLUSION
This paper covered some of the most useful functions in SAS. I understand that it may be a bit overwhelming, but I
just couldn't leave out some of my favorites. I think you can see how indispensable SAS functions (and call routines)
are to DATA step programming.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Name: Ron Cody
Address: PO Box 5049
City, State ZIP: Camp Verde, TX 78010
E-mail: [email protected]
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.