The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2004.
Base SAS 9.1 Procedures Guide. Cary, NC: SAS Institute Inc.
Base SAS 9.1 Procedures Guide
Copyright 2004 by SAS Institute Inc., Cary, NC, USA
ISBN 1-59047-204-7
All rights reserved. Produced in the United States of America. No part of this publication
may be reproduced, stored in a retrieval system, or transmitted, in any form or by any
means, electronic, mechanical, photocopying, or otherwise, without the prior written
permission of the publisher, SAS Institute Inc.
U.S. Government Restricted Rights Notice. Use, duplication, or disclosure of this
software and related documentation by the U.S. government is subject to the Agreement
with SAS Institute and the restrictions set forth in FAR 52.227-19 Commercial Computer
Software-Restricted Rights (June 1987).
SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.
1st printing, January 2004
SAS Publishing provides a complete selection of books and electronic products to help
customers use SAS software to its fullest potential. For more information about our
e-books, e-learning products, CDs, and hard-copy books, visit the SAS Publishing Web site
at support.sas.com/pubs or call 1-800-727-3228.
SAS and all other SAS Institute Inc. product or service names are registered trademarks
or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA
registration.
Other brand and product names are registered trademarks or trademarks of their
respective companies.
Contents
What's New xi
Overview xi
Details xi
PART
Concepts
Chapter 1 Choosing the Right Procedure 3
Chapter 2 Fundamental Concepts for Using Base SAS Procedures 15
Language Concepts 16
Procedure Concepts 19
Output Delivery System 32
Chapter 3
Overview 57
Statements 58
PART
Procedures
Chapter 4 through Chapter 52
PART
Appendices
Appendix 1
Overview 1339
Keywords and Formulas 1340
Statistical Background 1348
References 1373
Appendix 2
Appendix 3
Overview 1377
CENSUS 1377
CHARITY 1378
CUSTOMER_RESPONSE 1380
DJIA 1383
EDUCATION 1384
EMPDATA 1385
ENERGY 1387
GROC 1388
MATCH_11 1388
PROCLIB.DELAY 1390
PROCLIB.EMP95 1391
PROCLIB.EMP96 1392
PROCLIB.INTERNAT 1393
PROCLIB.LAKES 1393
PROCLIB.MARCH 1394
PROCLIB.PAYLIST2 1395
PROCLIB.PAYROLL 1395
PROCLIB.PAYROLL2 1398
PROCLIB.SCHEDULE 1399
PROCLIB.STAFF 1402
PROCLIB.SUPERV 1405
RADIO 1405
Appendix 4 Recommended Reading
Index 1421
What's New
Overview
New and enhanced Base SAS procedures in SAS 9 and 9.1
- improve ODS formatting
- enable import and export of Microsoft Excel 2002 spreadsheets and Microsoft
  Access 2002 tables
- support long format and informat names
- list and compare SAS registries
- support parallel sorting operations
- improve statistical processing
- improve printer definitions.
A list of ODS table names is now provided for each procedure that supports ODS.
You can use these names to reference the table when using the Output Delivery System
(ODS) to select tables and create output data sets.
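As a minimal sketch of how an ODS table name can be used to create an output data set, the following step captures PROC MEANS results. The data set WORK.SCORES, its variables, and the output name WORK.SCORE_STATS are hypothetical. The table name Summary is assumed to be the one that PROC MEANS reports; the ODS TRACE statement writes the actual table names to the SAS log so that you can confirm it.

```sas
/* Hypothetical sample data; names are illustrative assumptions. */
data work.scores;
   input Group $ Score;
   datalines;
A 10
A 14
B 20
B 26
;
run;

/* Write the name of each ODS table that a step produces to the log. */
ods trace on;

/* Route the table named Summary to an output data set. */
ods output Summary=work.score_stats;

proc means data=work.scores mean max;
   class Group;
   var Score;
run;

ods trace off;

/* Inspect the data set that ODS created. */
proc print data=work.score_stats;
run;
```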
Note:
- This section describes the features of Base SAS procedures that are new or
  enhanced since SAS 8.2.
- z/OS is the successor to the OS/390 operating system. SAS 9.1 is supported on
  both OS/390 and z/OS operating systems and, throughout this document, any
  reference to z/OS also applies to OS/390, unless otherwise stated.
Details
The CONTENTS Procedure
The new look for output from the CONTENTS procedure and the CONTENTS
statement in PROC DATASETS provides a better format for the Output Delivery
System (ODS). PROC CONTENTS output now displays the data representation of a file
by reporting the native platform for each file, rather than just telling you whether the
data representation is native or foreign. Also, PROC CONTENTS output now provides
the encoding value, whether a character variable is transcoded if required, and whether
the data set is part of a generation group. A new example was added that shows how to
get PROC CONTENTS output into an ODS output data set for processing.
The ORDER= option was added to the CONTENTS statement to enable you to print
a list of variables in alphabetical order even if they include mixed-case names.
- export to Microsoft Excel 2002 spreadsheets and Microsoft Access 2002 tables. The
  new data sources are available for the Windows operating environment on 32-bit
  platforms if your site has a license for SAS/ACCESS Interface to PC File Formats.
- specify SAS data set options in the DATA= argument when you are exporting to
  all data sources except for delimited, comma-separated, and tab-delimited external
  files. For example, if the data set that you are exporting has an assigned
  password, use the ALTER=, PW=, READ=, or WRITE= data set option. To export
  only data that meets a specified condition, use the WHERE= data set option.
- The maximum length for character informat names is now 30. The maximum
  length for numeric informat names is now 31.
in the chi-square goodness-of-fit test for one-way tables, in the binomial computations
for one-way tables, and in the computation of kappa statistics for two-way tables.
The following new options are available in the TABLES statement:
- The CONTENTS= option enables you to specify the text for the HTML contents
  file links to crosstabulation tables.
- The BDT option enables you to request Tarone's adjustment in the Breslow-Day
  test for homogeneity of odds ratios when you use the CMH option to compute the
  Breslow-Day test for stratified 2×2 tables.
- The NOWARN option suppresses the log warning message that the asymptotic
  chi-square test might not be valid when more than 20% of the table cells have
  expected frequencies less than 5.
- The CROSSLIST option displays crosstabulation tables in ODS column format.
  This option creates a table that has a table definition that you can customize with
  the TEMPLATE procedure.
Additionally, the FREQ procedure now produces exact confidence limits for the
common odds ratio and related tests.
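The following sketch shows how two of the TABLES statement options described above might be combined. The data set WORK.SURVEY and its variables are hypothetical, and the cell counts are invented only to give the step something to tabulate.

```sas
/* Hypothetical cell counts for a 2x2 table; names are assumptions. */
data work.survey;
   input Gender $ Answer $ Count;
   datalines;
F Yes 12
F No 3
M Yes 8
M No 9
;
run;

proc freq data=work.survey;
   weight Count;                     /* each row represents Count observations */
   tables Gender*Answer / chisq      /* request chi-square tests */
                          nowarn     /* suppress the small-expected-count warning */
                          crosslist; /* display the table in ODS column format */
run;
```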
external functions that are written in the C or C++ programming languages for use in
SAS programs and C-language structures and types. For PROC PROTO documentation,
go to http://support.sas.com/documentation/onlinedoc. Select Base SAS from
the Product-Specific Documentation list.
- The LISTREG option lists the contents of the registry in the log.
- The COMPAREREG1 and COMPAREREG2 options are used together to compare
  two registries. The results appear in the log.
- Numeric class variables that do not have a format assigned to them are
  automatically formatted with the BEST12. format.
- PROC REPORT now writes the value _PAGE_ for the _BREAK_ variable in the
  output data set for observations that are derived from a COMPUTE BEFORE
  _PAGE_ or COMPUTE AFTER _PAGE_ statement.
- The DATECOPY option copies to the output data set the SAS internal date and
  time when the input data set was created, and the SAS internal date and time
  when it was last modified prior to the sort.
- The DUPOUT= option specifies an output data set that contains duplicate
  observations.
- The OVERWRITE option deletes the input data set before the replacement output
  data set is populated with observations.
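A minimal sketch of the DUPOUT= option, assuming a hypothetical data set WORK.VISITS: the NODUPKEY option removes observations that repeat a BY value, and DUPOUT= names the data set that receives the removed observations.

```sas
/* Hypothetical data; Id 1 appears twice. */
data work.visits;
   input Id Visit $;
   datalines;
1 first
2 first
1 second
3 first
;
run;

/* WORK.UNIQUE keeps one observation per Id;
   WORK.DUPS receives the observations that NODUPKEY removes. */
proc sort data=work.visits out=work.unique nodupkey dupout=work.dups;
   by Id;
run;

proc print data=work.dups;
run;
```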
- You can now reference a permanent SAS data set by its physical filename.
- When using the INTO clause to assign values to a range of macro variables, you
  can now specify leading zeroes in the macro variable names.
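The INTO clause enhancement can be sketched as follows. The data set WORK.PEOPLE and the macro variable names NAME01 through NAME03 are hypothetical.

```sas
/* Hypothetical data set with three rows. */
data work.people;
   input Name $;
   datalines;
Ann
Bob
Cy
;
run;

proc sql noprint;
   /* Assign one row per macro variable; the names carry leading zeroes. */
   select Name
      into :name01 - :name03
      from work.people
      order by Name;
quit;

/* Write the three macro variable values to the log. */
%put &name01 &name02 &name03;
```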
- Available statistics include upper and lower confidence limits, skewness, and
  kurtosis. PROC TABULATE now supports the ALPHA= option, which enables you
  to specify a confidence level.
- Numeric class variables that do not have a format assigned to them are
  automatically formatted with the BEST12. format.
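As an illustration of the new PROC TABULATE statistics, the following sketch requests confidence limits at a 90% level along with skewness and kurtosis. The data set WORK.WEIGHTS and the variable Weight are hypothetical.

```sas
/* Hypothetical measurements. */
data work.weights;
   input Weight @@;
   datalines;
61.2 58.9 70.4 66.0 63.5 72.1
;
run;

/* ALPHA=0.10 requests 90% confidence limits for the mean. */
proc tabulate data=work.weights alpha=0.10;
   var Weight;
   table Weight*(n mean lclm uclm skewness kurtosis);
run;
```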
PART
Concepts
CHAPTER
1
Choosing the Right Procedure
Functional Categories of Base SAS Procedures 3
Report Writing 3
Statistics 3
Utilities 4
Report-Writing Procedures 4
Statistical Procedures 6
Available Statistical Procedures 6
Efficiency Issues 7
Quantiles 7
Computing Statistics for Groups of Observations 7
Additional Information about the Statistical Procedures 7
Utility Procedures 7
Brief Descriptions of Base SAS Procedures 10
Report Writing
These procedures display useful information, such as data listings (detail reports),
summary reports, calendars, letters, labels, multipanel reports, and graphical reports:
CALENDAR
PLOT
SUMMARY*
CHART*
TABULATE*
FREQ*
REPORT*
TIMEPLOT
MEANS
SQL
Statistics
These procedures compute elementary statistical measures that include descriptive
statistics based on moments, quantiles, confidence intervals, frequency counts,
cross-tabulations, correlations, and distribution tests. They also rank and standardize
data:
CHART
RANK
SUMMARY
CORR
REPORT
TABULATE
FREQ
SQL
UNIVARIATE
MEANS
STANDARD
Utilities
These procedures perform basic utility operations. They create, edit, sort, and
transpose data sets, create and restore transport data sets, create user-defined formats,
and provide basic file maintenance such as to copy, append, and compare data sets:
APPEND
EXPORT
PWENCODE
FONTREG
PRTEXP
CATALOG
FORMAT
REGISTRY
CIMPORT
FSLIST
RELEASE*
COMPARE
IMPORT
SORT
CONTENTS
OPTIONS
SOURCE*
CONVERT*
OPTLOAD
SQL
COPY
OPTSAVE
TAPECOPY*
CPORT
PDS*
TAPELABEL*
CV2VIEW@
PDSCOPY*
TEMPLATE+
DATASETS
PMENU
TRANSPOSE
PRINTTO
TRANTAB#
BMDP
DBCSTAB
DOCUMENT+
PRTDEF
* See the SAS documentation for your operating environment for a description of these procedures.
+ See SAS Output Delivery System: User's Guide for a description of these procedures.
@ See SAS/ACCESS for Relational Databases: Reference for a description of this procedure.
# See SAS National Language Support (NLS): User's Guide for a description of these procedures.
Report-Writing Procedures
Table 1.1 on page 5 lists report-writing procedures according to the type of report.
Table 1.1 Report-Writing Procedures
To produce                                     Procedure
Detail reports                                 REPORT, SQL
Summary reports                                MEANS or SUMMARY, REPORT, SQL, TABULATE
Calendars                                      CALENDAR
Multipanel reports (telephone book listings)   REPORT
Low-resolution graphical reports*              CHART, PLOT, TIMEPLOT
* These reports quickly produce a simple graphical picture of the data. To produce high-resolution graphical
reports, use SAS/GRAPH software.
Statistical Procedures
Available Statistical Procedures
Table 1.2 on page 6 lists statistical procedures according to task. Table A1.1 on page
1341 lists the most common statistics and the procedures that compute them.
Table 1.2 Elementary Statistical Procedures by Task
To produce                                Procedure
Descriptive statistics                    CORR, MEANS or SUMMARY, REPORT, SQL, TABULATE, UNIVARIATE
Frequency and cross-tabulation tables     FREQ, TABULATE, UNIVARIATE
Correlation analysis                      CORR
Distribution analysis                     UNIVARIATE, FREQ
Robust estimation                         UNIVARIATE
Data transformation
   Computing ranks                        RANK
   Standardizing data                     STANDARD
Low-resolution graphics*                  CHART, UNIVARIATE
Efficiency Issues
Quantiles
For a large sample size n, the calculation of quantiles, including the median, requires
computing time proportional to n log(n). Therefore, a procedure, such as UNIVARIATE,
that automatically calculates quantiles may require more time than other data
summarization procedures. Furthermore, because data is held in memory, the procedure
also requires more storage space to perform the computations. By default, the report
procedures PROC MEANS, PROC SUMMARY, and PROC TABULATE require less
memory because they do not automatically compute quantiles. These procedures also
provide an option to use a new fixed-memory quantiles estimation method that is
usually less memory intensive. See Quantiles on page 555 for more information.
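A minimal sketch of requesting the fixed-memory estimation method in PROC MEANS follows. It assumes that the option in question is QMETHOD=P2; the data set WORK.BIG and the variable x are hypothetical and are generated only to provide a large sample.

```sas
/* Generate a hypothetical large sample of uniform random values. */
data work.big;
   do i = 1 to 100000;
      x = ranuni(12345);
      output;
   end;
run;

/* QMETHOD=P2 estimates the quantiles in fixed memory
   instead of holding the ordered data in memory. */
proc means data=work.big median p90 qmethod=p2;
   var x;
run;
```

Because the fixed-memory method produces estimates, the results can differ slightly from quantiles that are computed from the fully ordered data.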
Utility Procedures
Table 1.3 on page 8 groups utility procedures according to task.
Which
Supply information
COMPARE
CONTENTS
OPTIONS
SQL
OPTIONS
OPTLOAD
OPTSAVE
DOCUMENT+
FONTREG
FORMAT
PRINTTO
PRTDEF
PRTEXP
TEMPLATE
Create, browse, and edit
data
DBCSTAB#
FORMAT
SORT
SQL
TRANSPOSE
TRANTAB#
Manage SAS files
SQL
Transform data
FSLIST
APPEND
BMDP*
CATALOG
CIMPORT
CONVERT*
COPY
CPORT
CV2VIEW@
DATASETS
EXPORT
IMPORT
PDS*
PDSCOPY*
REGISTRY
RELEASE*
SOURCE*
SQL
TAPECOPY
TAPELABEL*
Control windows
PMENU
Miscellaneous
PWENCODE
* See the SAS documentation for your operating environment for a description of these procedures.
+ See SAS Output Delivery System: User's Guide for a description of these procedures.
@ See SAS/ACCESS for Relational Databases: Reference for a description of this procedure.
# See SAS National Language Support (NLS): User's Guide for a description of these procedures.
CPORT procedure
writes SAS data libraries, data sets, and catalogs in a special format called a
transport file. Coupled with the CIMPORT procedure, PROC CPORT enables you
to move SAS libraries, data sets, and catalogs from one operating environment to
another.
CV2VIEW procedure
converts SAS/ACCESS view descriptors to PROC SQL views. Starting in SAS
System 9, conversion of SAS/ACCESS view descriptors to PROC SQL views is
recommended because PROC SQL views are platform independent and enable you
to use the LIBNAME statement. See SAS/ACCESS for Relational Databases:
Reference for details.
DATASETS procedure
lists, copies, renames, and deletes SAS files and SAS generation groups, manages
indexes, and appends SAS data sets in a SAS data library. The procedure provides
all the capabilities of the APPEND, CONTENTS, and COPY procedures. You can
also modify variables within data sets, manage data set attributes, such as labels
and passwords, or create and delete integrity constraints.
DBCSTAB procedure
produces conversion tables for the double-byte character sets that SAS supports.
DOCUMENT procedure
manipulates procedure output that is stored in ODS documents. PROC
DOCUMENT enables a user to browse and edit output objects and hierarchies,
and to replay them to any supported ODS output format. See SAS Output Delivery
System: User's Guide for details.
EXPORT procedure
reads data from a SAS data set and writes it to an external data source.
FONTREG procedure
adds system fonts to the SAS registry.
FORMAT procedure
creates user-defined informats and formats for character or numeric variables.
PROC FORMAT also prints the contents of a format library, creates a control data
set to write other informats or formats, and reads a control data set to create
informats or formats.
FREQ procedure
produces one-way to n-way frequency tables and reports frequency counts. PROC
FREQ can compute chi-square tests for one-way to n-way tables, tests and
measures of association and of agreement for two-way to n-way cross-tabulation
tables, risks and risk difference for 2×2 tables, trends tests, and
Cochran-Mantel-Haenszel statistics. You can also create output data sets.
FSLIST procedure
displays the contents of an external file or copies text from an external file to the
SAS Text Editor.
IMPORT procedure
reads data from an external data source and writes them to a SAS data set.
MEANS procedure
computes descriptive statistics for numeric variables across all observations and
within groups of observations. You can also create an output data set that contains
specific statistics and identifies minimum and maximum values for groups of
observations.
OPTIONS procedure
lists the current values of all SAS system options.
OPTLOAD procedure
reads SAS system option settings from the SAS registry or a SAS data set, and
puts them into effect.
OPTSAVE procedure
saves SAS system option settings to the SAS registry or a SAS data set.
PDS procedure
lists, deletes, and renames the members of a partitioned data set. See the SAS
documentation for your operating environment for more information.
PDSCOPY procedure
copies partitioned data sets from disk to tape, disk to disk, tape to tape, or tape to
disk. See the SAS documentation for your operating environment for more
information.
PLOT procedure
produces scatter plots that graph one variable against another. The coordinates of
each point on the plot correspond to the two variables' values in one or more
observations of the input data set.
PMENU procedure
defines menus that you can use in DATA step windows, macro windows, and
SAS/AF windows, or in any SAS application that enables you to specify customized
menus.
PRINT procedure
prints the observations in a SAS data set, using all or some of the variables.
PROC PRINT can also print totals and subtotals for numeric variables.
PRINTTO procedure
defines destinations for SAS procedure output and the SAS log.
PRTDEF procedure
creates printer definitions for individual SAS users or all SAS users.
PRTEXP procedure
exports printer definition attributes to a SAS data set so that they can be easily
replicated and modified.
PWENCODE procedure
encodes passwords for use in SAS programs.
RANK procedure
computes ranks for one or more numeric variables across the observations of a
SAS data set. The ranks are written to a new SAS data set. Alternatively, PROC
RANK produces normal scores or other rank scores.
REGISTRY procedure
imports registry information into the USER portion of the SAS registry.
RELEASE procedure
releases unused space at the end of a disk data set in the z/OS environment. See
the SAS documentation for this operating environment for more information.
REPORT procedure
combines features of the PRINT, MEANS, and TABULATE procedures with
features of the DATA step in a single report-writing tool that can produce both
detail and summary reports.
SORT procedure
sorts observations in a SAS data set by one or more variables. PROC SORT stores
the resulting sorted observations in a new SAS data set or replaces the original
data set.
SOURCE procedure
provides an easy way to back up and process source library data sets. See the SAS
documentation for your operating environment for more information.
SQL procedure
implements a subset of the Structured Query Language (SQL) for use in SAS. SQL
is a standardized, widely used language that retrieves and updates data in SAS
data sets, SQL views, and DBMS tables, as well as views based on those tables.
PROC SQL can also create tables and views, summaries, statistics, and reports
and perform utility functions such as sorting and concatenating.
STANDARD procedure
standardizes some or all of the variables in a SAS data set to a given mean and
standard deviation and produces a new SAS data set that contains the
standardized values.
SUMMARY procedure
computes descriptive statistics for the variables in a SAS data set across all
observations and within groups of observations and outputs the results to a new
SAS data set.
TABULATE procedure
displays descriptive statistics in tabular form. The value in each table cell is
calculated from the variables and statistics that define the pages, rows, and
columns of the table. The statistic associated with each cell is calculated on values
from all observations in that category. You can write the results to a SAS data set.
TAPECOPY procedure
copies an entire tape volume or files from one or more tape volumes to one output
tape volume. See the SAS documentation for your operating environment for more
information.
TAPELABEL procedure
lists the label information of an IBM standard-labeled tape volume under the z/OS
environment. See the SAS documentation for this operating environment for more
information.
TEMPLATE procedure
customizes ODS output for an entire SAS job or a single ODS output object. See
SAS Output Delivery System: User's Guide for details.
TIMEPLOT procedure
produces plots of one or more variables over time intervals.
TRANSPOSE procedure
transposes a data set, changing observations into variables and vice versa.
TRANTAB procedure
creates, edits, and displays customized translation tables.
UNIVARIATE procedure
computes descriptive statistics (including quantiles), confidence intervals, and
robust estimates for numeric variables. Provides detail on the distribution of
numeric variables, which include tests for normality, plots to illustrate the
distribution, frequency tables, and tests of location.
CHAPTER
2
Fundamental Concepts for Using
Base SAS Procedures
Language Concepts 16
Temporary and Permanent SAS Data Sets 16
Naming SAS Data Sets 16
USER Data Library 17
SAS System Options 17
Data Set Options 18
Global Statements 18
Procedure Concepts 19
Input Data Sets 19
RUN-Group Processing 19
Creating Titles That Contain BY-Group Information 20
BY-Group Processing 20
Suppressing the Default BY Line 20
Inserting BY-Group Information into a Title 20
Example: Inserting a Value from Each BY Variable into the Title 21
Example: Inserting the Name of a BY Variable into a Title 22
Example: Inserting the Complete BY Line into a Title 23
Error Processing of BY-Group Specifications 24
Shortcuts for Specifying Lists of Variable Names 24
Formatted Values 25
Using Formatted Values 25
Example: Printing the Formatted Values for a Data Set 25
Example: Grouping or Classifying Formatted Data 27
Example: Temporarily Associating a Format with a Variable 28
Example: Temporarily Dissociating a Format from a Variable 29
Formats and BY-Group Processing 30
Formats and Error Checking 30
Processing All the Data Sets in a Library 30
Operating Environment-Specific Procedures 30
Statistic Descriptions 31
Computational Requirements for Statistics 32
Output Delivery System 32
What Is the Output Delivery System? 32
Gallery of ODS Samples 33
Introduction to the ODS Samples 33
Listing Output 33
PostScript Output 35
HTML Output 35
RTF Output 36
PDF Output 37
XML Output 38
Language Concepts
The SAS system options WORK=, WORKINIT, and WORKTERM affect how you
work with temporary and permanent libraries. See SAS Language Reference:
Dictionary for complete documentation.
Typically, two-level names represent permanent SAS data sets. A two-level name
takes the form libref.SAS-data-set. The libref is a name that is temporarily associated
with a SAS data library. A SAS data library is an external storage location that stores
SAS data sets in your operating environment. A LIBNAME statement associates the
libref with the SAS data library. In the following PROC PRINT step, PROCLIB is the
libref and EMP is the SAS data set within the library:
libname proclib 'SAS-data-library';
proc print data=proclib.emp;
run;
Note: If you have a USER data library defined, then you can still use the WORK
data library by specifying WORK.SAS-data-set.
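The naming rules above can be sketched as follows. The directory path is a hypothetical placeholder that must be replaced with an existing location in your operating environment; the data set names and values are also assumptions.

```sas
/* Hypothetical path; substitute a directory that exists at your site. */
libname proclib '/path/to/proclib';

/* Two-level name: a permanent data set in the PROCLIB library. */
data proclib.emp;
   input Name $ Salary;
   datalines;
Ann 34000
Bob 29000
;
run;

/* Explicit WORK libref: a temporary data set, even if a
   USER data library is defined. */
data work.emp_copy;
   set proclib.emp;
run;

proc print data=work.emp_copy;
run;
```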
The individual procedure chapters contain reminders that you can use data set
options where it is appropriate.
SAS data set options are
ALTER=
OBS=
BUFNO=
OBSBUF=
BUFSIZE=
OUTREP=
CNTLLEV=
POINTOBS=
COMPRESS=
PW=
DLDMGACTION=
PWREQ=
DROP=
READ=
ENCODING=
RENAME=
ENCRYPT=
REPEMPTY=
FILECLOSE=
REPLACE=
FIRSTOBS=
REUSE=
GENMAX=
SORTEDBY=
GENNUM=
SORTSEQ=
IDXNAME=
SPILL=
IDXWHERE=
TOBSNO=
IN=
TYPE=
INDEX=
WHERE=
KEEP=
WHEREUP=
LABEL=
WRITE=
For a complete description of SAS data set options, see SAS Language Reference:
Dictionary.
Global Statements
You can use these global statements anywhere in SAS programs except after a
DATALINES, CARDS, or PARMCARDS statement:
comment       ODS
DM            OPTIONS
ENDSAS        PAGE
FILENAME      RUN
FOOTNOTE      %RUN
%INCLUDE      SASFILE
LIBNAME       SKIP
%LIST         TITLE
LOCK
For information about all but the ODS statement, refer to SAS Language Reference:
Dictionary. For information about the ODS statement, refer to Output Delivery
System on page 32 and to SAS Output Delivery System: User's Guide.
Procedure Concepts
If you omit the DATA= option, the procedure uses the value of the SAS system option
_LAST_=. The default of _LAST_= is the most recently created SAS data set in the
current SAS job or session. _LAST_= is described in detail in SAS Language Reference:
Dictionary.
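A minimal sketch of how the default input data set is chosen when DATA= is omitted, using hypothetical data sets WORK.FIRST and WORK.SECOND:

```sas
data work.first;
   x = 1;
run;

data work.second;
   y = 2;
run;

/* No DATA= option: the step reads WORK.SECOND,
   the most recently created data set. */
proc print;
run;

/* Set _LAST_= explicitly; the next step without DATA= reads WORK.FIRST. */
options _last_=work.first;

proc print;
run;
```

Specifying DATA= explicitly in each step avoids depending on the order in which data sets were created.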
RUN-Group Processing
RUN-group processing enables you to submit a PROC step with a RUN statement
without ending the procedure. You can continue to use the procedure without issuing
another PROC statement. To end the procedure, use a RUN CANCEL or a QUIT
statement. Several Base SAS procedures support RUN-group processing:
CATALOG
DATASETS
PLOT
PMENU
TRANTAB
See the section on the individual procedure for more information.
Note: PROC SQL executes each query automatically. Neither the RUN nor RUN
CANCEL statement has any effect.
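RUN-group processing might look like the following sketch in PROC DATASETS. The data sets WORK.A and WORK.B are hypothetical.

```sas
data work.a;
   x = 1;
run;

data work.b;
   y = 2;
run;

proc datasets library=work nolist;
   /* First RUN group: executes immediately; the procedure stays active. */
   contents data=a;
   run;

   /* Second RUN group: submitted without another PROC statement. */
   delete b;
   run;
quit;   /* QUIT ends the procedure. */
```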
BYLINE
inserts the complete default BY line into the title.
suffix
supplies text to place immediately after the BY-group information that you insert
in the title. No space appears between the BY-group information and the suffix.
store has four departments. See GROC on page 1388 for the DATA step that
creates the data set.
2 sorts the data by Region and Department.
3 uses the SAS system option NOBYLINE to suppress the BY line that normally
statement, #BYVAL2 inserts the value of the second BY variable, Department, into
the title. In the second TITLE statement, #BYVAL(Region) inserts the value of
Region into the title. The first period after Region indicates that a suffix follows.
The second period is the suffix.
5 uses the SAS system option BYLINE to return to the creation of the default BY
data groc;
   input Region $9. Manager $ Department $ Sales;
   datalines;
Southeast Hayes Paper 250
Southeast Hayes Produce 100
Southeast Hayes Canned 120
Southeast Hayes Meat 80
...more lines of data...
Northeast Fuller Paper 200
Northeast Fuller Produce 300
Northeast Fuller Canned 420
Northeast Fuller Meat 125
;
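The technique that the numbered notes describe can be sketched end to end with a small stand-in data set. WORK.SALES and its values are hypothetical, and PROC PRINT is used here in place of a charting procedure only to keep the sketch short.

```sas
/* Hypothetical stand-in data. */
data work.sales;
   input Region $ Department $ Sales;
   datalines;
North Paper 200
North Meat 125
South Paper 250
South Meat 80
;
run;

/* BY-group processing requires sorted (or indexed) data. */
proc sort data=work.sales;
   by Region Department;
run;

/* Suppress the default BY line so that the titles carry the information. */
options nobyline;

proc print data=work.sales noobs;
   by Region Department;
   var Sales;
   title1 'Sales for #byval2';          /* value of the second BY variable */
   title2 'in the #byval(Region)..';    /* value of Region, then the suffix "." */
run;

/* Restore the default BY line and clear the titles. */
options byline;
title;
```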
[Output: vertical bar charts of Sales Sum by Manager (Aikmann, Duncan, Jeffreys)]
#BYVAR(Region) inserts the name of the variable Region into the title. (If Region
had a label, #BYVAR would use the label instead of the name.) The suffix al is
appended to the label. In the second TITLE statement, #BYVAL1 inserts the value
of the first BY variable, Region, into the title.
3 uses the SAS system option BYLINE to return to the creation of the default BY
[Output: vertical bar chart of Sales Mean by Manager (Aikmann, Duncan, Jeffreys)]
[Output: vertical bar charts of Sales Sum by Manager (Aikmann, Duncan, Jeffreys)]
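Notation         Meaning
x1-xn            specifies variables X1 through Xn. The numbers must be consecutive.
x:               specifies all variables whose names begin with the letter X.
x--a             specifies all variables between X and A, inclusive, according to
                 the position of the variables in the data set.
x-numeric-a      specifies all numeric variables between X and A, inclusive, according
                 to the position of the variables in the data set.
x-character-a    specifies all character variables between X and A, inclusive, according
                 to the position of the variables in the data set.
_numeric_        specifies all numeric variables.
_character_      specifies all character variables.
_all_            specifies all variables.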
Note: You cannot use shortcuts to list variable names in the INDEX CREATE
statement in PROC DATASETS.
See SAS Language Reference: Concepts for complete documentation.
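As a short illustration of the variable list shortcuts, the following steps use a hypothetical data set WORK.SCORES whose variables are created in the order Name, x1, x2, x3, Total.

```sas
/* Hypothetical data; variable order is Name, x1, x2, x3, Total. */
data work.scores;
   input Name $ x1 x2 x3 Total;
   datalines;
Ann 1 2 3 6
Bob 4 5 6 15
;
run;

proc print data=work.scores;
   var x1-x3;        /* numbered range: x1, x2, x3 */
run;

proc print data=work.scores;
   var x:;           /* every variable whose name begins with x */
run;

proc print data=work.scores;
   var Name--x2;     /* positional range: Name, x1, x2 */
run;

proc means data=work.scores;
   var _numeric_;    /* all numeric variables */
run;
```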
Formatted Values
Using Formatted Values
Typically, when you print or group variable values, Base SAS procedures use the
formatted values. This section contains examples of how base procedures use formatted
values.
Gender  Jobcode  Salary  Birth    Hired
M       TA2      34376   12SEP60  04JUN87
F       ME2      35108   15OCT64  09AUG90
M       ME1      29769   05NOV67  16OCT90
F       FA3      32886   31AUG65  29JUL90
M       TA3      38822   13DEC50  17NOV85
M       ME3      43025   26APR54  07JUN80
M       SCP      18723   06JUN62  01OCT90
M       PT2      88606   30MAR61  10FEB81
M       TA2      32615   17JAN63  02DEC90
F       TA3      38785   22DEC68  05OCT89
The following PROC FORMAT step creates the format $JOBFMT., which assigns
descriptive names for each job:
proc format;
   value $jobfmt
      'FA1'='Flight Attendant Trainee'
      'FA2'='Junior Flight Attendant'
      'FA3'='Senior Flight Attendant'
      'ME1'='Mechanic Trainee'
      'ME2'='Junior Mechanic'
      'ME3'='Senior Mechanic'
      'PT1'='Pilot Trainee'
      'PT2'='Junior Pilot'
      'PT3'='Senior Pilot'
      'TA1'='Ticket Agent Trainee'
      'TA2'='Junior Ticket Agent'
      'TA3'='Senior Ticket Agent'
      'NA1'='Junior Navigator'
      'NA2'='Senior Navigator'
      'BCK'='Baggage Checker'
      'SCP'='Skycap';
run;
The FORMAT statement in this PROC MEANS step temporarily associates the
$JOBFMT. format with the variable Jobcode:
options nodate pageno=1
        linesize=64 pagesize=60;
proc means data=proclib.payroll mean max;
   class jobcode;
   var salary;
   format jobcode $jobfmt.;
   title 'Summary Statistics for';
   title2 'Each Job Code';
run;
PROC MEANS produces this output, which uses the $JOBFMT. format:
Summary Statistics for
Each Job Code
11
23039.36
23979.00
16
27986.88
28978.00
32933.86
33419.00
Mechanic Trainee
28500.25
29769.00
Junior Mechanic
14
35576.86
36925.00
Senior Mechanic
42410.71
43900.00
Junior Navigator
42032.20
43433.00
Senior Navigator
52383.00
53798.00
Pilot Trainee
67908.00
71349.00
Junior Pilot
10
87925.20
91908.00
Senior Pilot
10504.50
11379.00
Skycap
18308.86
18833.00
27721.33
28880.00
20
33574.95
34803.00
Note: Because formats are character strings, formats for numeric variables are
ignored when the values of the numeric variables are needed for mathematical
calculations.
      'SCP'='Skycap';
run;
options nodate pageno=1
        linesize=64 pagesize=40;
proc means data=proclib.payroll mean max;
   class jobcode;
   var salary;
   format jobcode $codefmt.;
   title 'Summary Statistics for Job Codes';
   title2 '(Using a Format that Groups the Job Codes)';
run;
34
27404.71
33419.00
Mechanic
29
35274.24
43900.00
45913.75
53798.00
20
72176.25
91908.00
18308.86
18833.00
Navigator
Pilot
Skycap
Ticket Agent
41
34076.73
40899.00
-------------------------------------------------------
Gender  Jobcode  Salary   Birth    Hired
M       TA2      $34,376  12SEP60  04JUN87
F       ME2      $35,108  15OCT64  09AUG90
M       ME1      $29,769  05NOV67  16OCT90
F       FA3      $32,886  31AUG65  29JUL90
M       TA3      $38,822  13DEC50  17NOV85
M       ME3      $43,025  26APR54  07JUN80
M       SCP      $18,723  06JUN62  01OCT90
M       PT2      $88,606  30MAR61  10FEB81
M       TA2      $32,615  17JAN63  02DEC90
F       TA3      $38,785  22DEC68  05OCT89
      '1'='Freshman'
      '2'='Sophomore'
      '3'='Junior'
      '4'='Senior';
run;
data debate;
   input Name $ Gender $ Year $ GPA @@;
   format year $yrfmt.;
   datalines;
Capiccio m 1 3.598 Tucker m 1 3.901
Bagwell f 2 3.722 Berry m 2 3.198
Metcalf m 2 3.342 Gold f 3 3.609
Gray f 3 3.177 Syme f 3 3.883
Baglione f 4 4.000 Carr m 4 3.750
Hall m 4 3.574 Lewis m 4 3.421
;
PROC MEANS produces this output, which does not use the YRFMT. format:
Average GPA
3.42
3.56
4
4
3.69
-------------------------------
Statistic Descriptions
Table 2.1 on page 31 identifies common descriptive statistics that are available in
several Base SAS procedures. See Keywords and Formulas on page 1340 for more
detailed information about available statistics and theoretical information.
Table 2.1 Common Descriptive Statistics That Base Procedures Calculate
Statistic
Description
confidence intervals
Procedures
FREQ, MEANS/SUMMARY, TABULATE, UNIVARIATE
CSS
corrected sum of
squares
CV
coefficient of variation
goodness-of-fit tests
FREQ, UNIVARIATE
KURTOSIS
kurtosis
MAX
largest (maximum)
value
MEAN
mean
MEDIAN
MIN
smallest (minimum)
value
MODE
UNIVARIATE
number of observations
on which calculations
are based
NMISS
number of missing
values
NOBS
number of observations
MEANS/SUMMARY, UNIVARIATE
PCTN
REPORT, TABULATE
PCTSUM
REPORT, TABULATE
Pearson correlation
CORR
percentiles
RANGE
range
31
32
Chapter 2
Statistic
Description
Procedures
robust statistics
trimmed means,
Winsorized means
UNIVARIATE
SKEWNESS
skewness
Spearman correlation
CORR
STD
standard deviation
STDERR
SUM
sum
SUMWGT
sum of weights
tests of location
UNIVARIATE
USS
uncorrected sum of
squares
VAR
variance
33
Listing Output
Traditional SAS output is Listing output. You do not need to change your SAS
programs to create listing output. By default, you continue to create this kind of output
even if you also create a type of output that contains more formatting.
Output 2.1 Listing Output
[Listing output titled "Average Quarterly Sales Amount by Each Sales Representative": monospace tables of sales statistics for each sales representative, including Jensen.]
PostScript Output
With ODS, you can produce output in PostScript format.
HTML Output
With ODS, you can produce output in HTML (Hypertext Markup Language). You can
browse these files with Internet Explorer, Netscape, or any other browser that fully
supports the HTML 3.2 tagset.
Note: To create HTML 4.0 tagsets, use the ODS HTML4 statement. In SAS 9, the
ODS HTML statement generates HTML 3.2 tagsets. In future releases of SAS, the ODS
HTML statement will support the most current HTML tagsets available.
RTF Output
With ODS, you can produce RTF (Rich Text Format) output, which is used with
Microsoft Word.
PDF Output
With ODS, you can produce output in PDF (Portable Document Format), which can
be viewed with the Adobe Acrobat Reader.
XML Output
With ODS, you can produce output that is tagged with XML (Extensible Markup
Language) tags.
ODS output
ODS output consists of formatted output from any of the ODS destinations. For
example, the OUTPUT destination produces SAS data sets; the LISTING
destination produces listing output; the HTML destination produces output that is
formatted in Hypertext Markup Language.
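Routing procedure output to a destination can be sketched as follows. This is a minimal sketch, not a definitive recipe: the file name means.html is hypothetical, and SASHELP.CLASS is a sample data set shipped with SAS that stands in for your own data.

```sas
/* Sketch only: 'means.html' is a hypothetical file name. */
ods html file='means.html';      /* open the HTML destination */

proc means data=sashelp.class mean max;
   var height weight;
run;

ods html close;                  /* close the destination so that the file is complete */
```

Because the LISTING destination is open by default, the same step also produces listing output unless you close that destination.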
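Figure 2.1 ODS Processing: What Goes in and What Comes Out
[Figure: a table definition is combined with data to produce an output object. The output object is routed to ODS destinations (DOCUMENT, LISTING, OUTPUT, MARKUP, HTML, RTF, PRINTER, SAS tagsets, and user-defined tagsets), which produce the corresponding ODS outputs, such as document output, listing output, a SAS data set, and HTML 3.2 output.]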
HTML4
SASIOXML
SASXMOH
CSVALL
HTMLCSS
SASREPORT
SASXMOIM
DEFAULT
IMODE
SASXML
SASXMOR
DOCBOOK
PHTML
SASXMOG
WML
EVENT_MAP
LATEX
SHORT_MAP
TPL_STYLE_MAP
CSV
LATEX2
STYLE_DISPLAY
TROFF
CSVBYLINE
NAMEDHTML
STYLE_POPUP
WMLOLIST
GRAPH
ODSSTYLE
TEXT_MAP
GTABLEAPPLET
PYX
TPL_STYLE_LIST
CAUTION:
These tagsets are experimental tagsets. Do not use these tagsets in production jobs.
Features of ODS
ODS is designed to overcome the limitations of traditional SAS output and to make it
easy to access and create the new formatting options. ODS provides a method of
delivering output in a variety of formats, and makes the formatted output easy to access.
Important features of ODS include the following:
- ODS combines raw data with one or more table definitions to produce one or more
  output objects. These objects can be sent to any or all ODS destinations. You
  control the specific type of output from ODS by selecting an ODS destination. The
  currently available ODS destinations can produce
  - traditional monospace output
  - an output data set
  - an ODS document that contains a hierarchy file of the output objects
  - output that is formatted for a high-resolution printer such as PostScript and
    PDF
  - output that is formatted in various markup languages such as HTML
  - RTF output that is formatted for use with Microsoft Word.
- ODS provides table definitions that define the structure of the output from SAS
  procedures and from the DATA step. You can customize the output by modifying
  these definitions, or by creating your own.
- ODS provides a way for you to choose individual output objects to send to ODS
  destinations. For example, PROC UNIVARIATE produces five output objects. You
  can easily create HTML output, an output data set, traditional listing output, or
  printer output from any or all of these output objects. You can send different
  output objects to different destinations.
- In the SAS windowing environment, ODS stores a link to each output object in the
  Results folder in the Results window.
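Choosing individual output objects can be sketched with the ODS SELECT statement. This is an illustrative sketch under stated assumptions: the file name univ.html is hypothetical, SASHELP.CLASS stands in for your own data, and BasicMeasures and TestsForLocation are two of the output objects that PROC UNIVARIATE produces by default.

```sas
/* Sketch only: 'univ.html' is a hypothetical file name. */
ods html file='univ.html';

/* Send only these two output objects to the open destinations. */
ods select BasicMeasures TestsForLocation;

proc univariate data=sashelp.class;
   var height;
run;

ods html close;
```

The selection applies to the next procedure step; the other output objects from PROC UNIVARIATE are not sent to any destination.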
Third-Party
Formatted
destinations
The following table lists the ODS destination categories, the destinations that each
category includes, and the formatted output that results from each destination.
Table 2.4 Destination Category Table
Category                 Destinations    Results
SAS Formatted            DOCUMENT        ODS document
                         LISTING
                         OUTPUT
Third-Party Formatted    HTML
                         MARKUP
                         PRINTER
                         RTF
As future destinations are added to ODS, they automatically will become available to
the DATA step and to all procedures that support ODS.
by the destination that does not support it. Otherwise, ODS would support a small
subset of features that are only common to all destinations. If this were true, then it
would be difficult to move your reports from one output format to another output
format. ODS provides many output formatting options, so that you can use the
appropriate format for the output that you want. It is best to use the appropriate
destination suited for your purpose.
different data sets into a single table. You can easily access and process your
output data sets using all of the SAS data set features. For example, you can
access your output data using variable names and perform WHERE-expression
processing just as you would process data from any other SAS data set.
- an HTML file (called the body file) that contains the results from the
  procedure
output for a page description language or a hybrid language like RTF, which
requires all of the text to be measured and placed at a specific position on the page.
- PRINTER Family
The PRINTER destination produces output for
generation.
- Features that we expect users to change on each document, such as the output file
  name.
The ODS style attributes control the way that individual elements are created.
Attributes are aspects of a given style, such as type face, weight, font size, and color.
The values of the attributes collectively determine the appearance of each part of the
document to which the style is applied. With style attributes, it is unnecessary to insert
destination-specific code (such as raw HTML) into the document. Each output
destination will interpret the attributes that are necessary to generate the presentation
of the document. Because not all destinations are the same, not all attributes can be
interpreted by all destinations. Style attributes that are incompatible with a selected
destination are ignored. For example, PostScript does not support active links, so the
URL= attribute is ignored when producing PostScript output.
Each style attribute specifies a value for one aspect of the presentation. For example,
the BACKGROUND= attribute specifies the color for the background of an HTML table
or for a colored table in printed output. The FONT_STYLE= attribute specifies whether
to use a Roman or an italic font. For information on style attributes, see the section on
style attributes in the SAS Output Delivery System: User's Guide.
Note: Because style definitions control the presentation of the data, they have no
effect on output objects that go to the LISTING or OUTPUT destination.
Results
View
Templates
definitions. If you want to view the underlying SAS code for a style
definition, then select the style and open it.
Operating Environment Information: For information on navigating in the
Explorer window without a mouse, see the section on "Window Controls and
General Navigation" in the SAS documentation for your operating
environment.
- TEMPLATE Procedure:
You can also display a list of the available styles by submitting the following
PROC TEMPLATE statements:
proc template;
   list styles;
run;
- SQL Procedure:
You can also display a list of the available styles by submitting the following
PROC SQL statements:
proc sql;
select * from styles.style-name;
The stylename is the name of any style from the template store (for example,
styles.default or styles.beige).
For more information on how ODS destinations use styles and how you can
customize styles, see the section on the DEFINE STYLE statement in the SAS Output
Delivery System: User's Guide.
Most Base SAS procedures that support ODS use one or more table definitions
to produce output objects. These table definitions include definitions for table
elements: columns, headers, and footers. Each table element can specify the use of
one or more style elements for various parts of the output. These style elements
cannot be specified within the syntax of the procedure, but you can use customized
styles for the ODS destinations that you use. For more information about
customizing tables and styles, see the TEMPLATE procedure in the SAS Output
Delivery System: User's Guide.
If you make a mistake when you modify the SAS registry, then your system might become
unstable or unusable. You will not be warned if an entry is incorrect. Incorrect entries
can cause errors, and can even prevent you from bringing up a SAS session. For
more information about how to configure the SAS registry, see the SAS registry
section in SAS Language Reference: Concepts.
To change the default setting of the HTML version in the SAS registry:
1 Select Solutions -> Accessories -> Registry Editor, or issue the command REGEDIT.
2 Select ODS.
3 Select Edit -> Modify, or click the right mouse button and select MODIFY. The
  Edit String Value window appears.
4 Type the HTML version in the Value Data text box and select OK.
Display 2.6 SAS Registry Showing HTML Version Setting
ODS
Destinations
Edit
Modify
or
Click the right mouse button and select MODIFY.
5 Type in the Value Data entry into the Edit Value String or Edit Signed Integer
SAS Output
By default, ODS output is formatted according to instructions that a PROC step or
DATA step defines. However, ODS provides ways for you to customize the output. You
can customize the output for an entire SAS job, or you can customize the output for a
single output object.
Destination     Default List
OUTPUT          EXCLUDE ALL
All others      SELECT ALL
- a partial path. A partial path consists of any part of the full path that begins
  immediately after a period (.) and continues to the end of the full path. For
  example, if the full path is
     Univariate.City_Pop_90.TestsForLocation
- a label path. For example, the label path for the output object is
     "The UNIVARIATE Procedure"."CityPop_90"."Tests For Location"
  Note: The trace record shows the label path only if you specify the LABEL
  option in the ODS TRACE statement.
- a partial label path. A partial label path consists of any part of the label that
  begins immediately after a period (.) and continues to the end of the label. For
  example, if the label path is
     "The UNIVARIATE Procedure"."CityPop_90"."Tests For Location"
Note: Although you can maintain a selection list for one destination and an
exclusion list for another, it is easier to understand the results if you maintain the same
types of lists for all the destinations where you route output.
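The path forms described above can be seen in a trace record. The following is a minimal sketch: SASHELP.CLASS and the variable Height are stand-ins for your own data, so the middle level of the path differs from the City_Pop_90 example.

```sas
/* Write a trace record to the SAS log; LABEL adds the label path. */
ods trace on / label;

/* TestsForLocation is a partial path: the last level of the full path. */
ods select TestsForLocation;

proc univariate data=sashelp.class;
   var height;
run;

ods trace off;
```

The log then shows the name, label, template, path, and label path of the selected output object.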
Remember that not all procedures use table definitions. If you produce a trace record
for one of these procedures, no definition appears in the trace record. Conversely, some
procedures use multiple table definitions to produce their output. If you produce a trace
record for one of these procedures, more than one definition appears in the trace record.
The trace record refers to the table definition as a template. For a detailed
explanation of the trace record, see the section on the ODS TRACE statement in the
SAS Output Delivery System: User's Guide.
You can use PROC TEMPLATE to modify an entire table definition. When a
procedure or DATA step uses a table definition, it uses the elements that are defined or
referenced in its table definition. In general, you cannot directly specify a table element
for your procedure or DATA step to use without modifying the definition itself.
Note: Three Base SAS procedures, PROC PRINT, PROC REPORT, and PROC
TABULATE, do provide a way for you to access table elements from the procedure step
itself. Accessing the table elements enables you to customize your report. For more
information about these procedures, see the Base SAS Procedures Guide.
Summary of ODS
In the past, the term "output" has generally referred to the outcome of a SAS
procedure and DATA step. With the advent of the Output Delivery System, "output"
takes on a much broader meaning. ODS is designed to optimize output from SAS
procedures and the DATA step. It provides a wide range of formatting options and
greater flexibility in generating, storing, and reproducing SAS output.
Important features of ODS include the following:
- ODS combines raw data with one or more table definitions to produce one or more
  output objects. An output object tells ODS how to format the results of a procedure
  or DATA step.
- ODS provides table definitions that define the structure of the output from SAS
  procedures and from the DATA step. You can customize the output by modifying
  these definitions, or by creating your own definitions.
- ODS provides a way for you to choose individual output objects to send to ODS
  destinations.
- ODS stores a link to each output object in the Results folder for easy retrieval and
  access.
- As future destinations are added to ODS, they will automatically become available
  to the DATA step and all procedures that support ODS.
One of the main goals of ODS is to enable you to produce output for numerous
destinations from a single source, without requiring separate sources for each
destination. ODS supports many destinations:
DOCUMENT
enables you to capture output objects from a single run of the analysis and produce
multiple reports in various formats whenever you want without re-running your
SAS programs.
LISTING
produces output that looks the same as the traditional SAS output.
HTML
produces output for online viewing.
MARKUP
produces output for markup language tagsets.
OUTPUT
produces SAS output data sets, thereby eliminating the need to parse PROC
PRINTTO output.
PRINTER
produces presentation-ready printed reports.
RTF
produces output suitable for Microsoft Word reports.
By default, ODS output is formatted according to instructions that the procedure or
DATA step defines. However, ODS provides ways for you to customize the presentation
of your output. You can customize the presentation of your SAS output, or you can
customize the look of a single output object. ODS gives you greater flexibility in
generating, storing, and reproducing SAS procedure and DATA step output with a wide
range of formatting options.
CHAPTER
3
Statements with the Same
Function in Multiple Procedures
Overview 57
Statements 58
BY 58
FREQ 61
QUIT 63
WEIGHT 63
WHERE 68
Overview
Several statements are available and have the same function in a number of Base
SAS procedures. Some of the statements are fully documented in SAS Language
Reference: Dictionary, and others are documented in this section. The following list
shows you where to find more information about each statement:
ATTRIB
affects the procedure output and the output data set. The ATTRIB statement does
not permanently alter the variables in the input data set. The LENGTH= option
has no effect. See SAS Language Reference: Dictionary for complete
documentation.
BY
orders the output according to the BY groups. See "BY" on page 58.
FORMAT
affects the procedure output and the output data set. The FORMAT statement does
not permanently alter the variables in the input data set. The DEFAULT= option
is not valid. See SAS Language Reference: Dictionary for complete documentation.
FREQ
treats observations as if they appear multiple times in the input data set. See
"FREQ" on page 61.
LABEL
affects the procedure output and the output data set. The LABEL statement does
not permanently alter the variables in the input data set except when it is used
with the MODIFY statement in PROC DATASETS. See SAS Language Reference:
Dictionary for complete documentation.
QUIT
executes any statements that have not executed and ends the procedure. See
"QUIT" on page 63.
WEIGHT
specifies weights for analysis variables in the statistical calculations. See
"WEIGHT" on page 63.
WHERE
subsets the input data set by specifying certain conditions that each observation
must meet before it is available for processing. See "WHERE" on page 68.
Statements
BY
Orders the output according to the BY groups.
See also: "Creating Titles That Contain BY-Group Information" on page 20
Required Arguments
variable
specifies the variable that the procedure uses to form BY groups. You can specify
more than one variable. If you do not use the NOTSORTED option in the BY
statement, then the observations in the data set must either be sorted by all the
variables that you specify, or they must be indexed appropriately. Variables in a BY
statement are called BY variables.
Options
DESCENDING
specifies that the observations are sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The observations are grouped in another way, for example, chronological order.
The requirement for ordering or indexing observations according to the values of
BY variables is suspended for BY-group processing when you use the NOTSORTED
option. In fact, the procedure does not use an index if you specify NOTSORTED. The
procedure defines a BY group as a set of contiguous observations that have the same
values for all BY variables. If observations with the same values for the BY variables
are not contiguous, then the procedure treats each contiguous set as a separate BY
group.
Note: You cannot use the GROUPFORMAT option, which is available in the BY
statement in a DATA step, in a BY statement in any PROC step.
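The effect of the NOTSORTED option can be sketched as follows. This is an illustrative sketch that assumes the DEBATE data set used elsewhere in this chapter, with observations that are grouped by Year but not necessarily sorted.

```sas
/* Each run of contiguous observations with the same Year value forms a BY group. */
proc print data=debate noobs;
   by year notsorted;
run;
```

If the same Year value appears in two separate runs of observations, the step reports two separate BY groups for that value.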
BY-Group Processing
Procedures create output for each BY group. For example, the elementary statistics
procedures and the scoring procedures perform separate analyses for each BY group.
The reporting procedures produce a report for each BY group.
Note: All Base SAS procedures except PROC PRINT process BY groups
independently. PROC PRINT can report the number of observations in each BY group
as well as the number of observations in all BY groups. Similarly, PROC PRINT can
sum numeric variables in each BY group and across all BY groups.
You can use only one BY statement in each PROC step. When you use a BY
statement, the procedure expects an input data set that is sorted by the order of the BY
variables or one that has an appropriate index. If your input data set does not meet
these criteria, then an error occurs. Either sort it with the SORT procedure or create an
appropriate index on the BY variables.
Depending on the order of your data, you may need to use the NOTSORTED or
DESCENDING option in the BY statement in the PROC step.
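Preparing a data set for BY-group processing can be sketched as follows. This is a minimal sketch: the output data set name DEBATE_SORTED is hypothetical, and the DEBATE data set with the variables Year and GPA is assumed from the examples in this chapter.

```sas
/* In PROC SORT, the BY statement specifies how to sort the data. */
proc sort data=debate out=debate_sorted;
   by year;
run;

/* In other procedures, the BY statement states how the data are already sorted. */
proc means data=debate_sorted mean;
   by year;
   var gpa;
run;
```

Writing the sorted observations to a separate data set with OUT= leaves the original data set unchanged.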
For more information on
variable(s). If the BY variable is numeric and has no user-applied format, then the
BEST12. format is applied for the purpose of BY-group processing.
- The procedure continues adding observations to the current BY group until both
CHART
SORT (required)
COMPARE
STANDARD
CORR
SUMMARY
FREQ
TABULATE
MEANS
TIMEPLOT
PLOT
TRANSPOSE
UNIVARIATE
RANK
Note: In the SORT procedure, the BY statement specifies how to sort the data. With
the other procedures, the BY statement specifies how the data are currently sorted.
Example
This example uses a BY statement in a PROC PRINT step. There is output for each
value of the BY variable, Year. The DEBATE data set is created in "Example:
Temporarily Dissociating a Format from a Variable" on page 29.
options nodate pageno=1 linesize=64
        pagesize=40;
proc print data=debate noobs;
   by year;
   title 'Printing of Team Members';
   title2 'by Year';
run;
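Printing of Team Members
by Year

Year=Freshman
Name        Gender     GPA
Capiccio      m       3.598
Tucker        m       3.901

Year=Sophomore
Name        Gender     GPA
Bagwell       f       3.722
Berry         m       3.198
Metcalf       m       3.342

Year=Junior
Name        Gender     GPA
Gold          f       3.609
Gray          f       3.177
Syme          f       3.883

Year=Senior
Name        Gender     GPA
Baglione      f       4.000
Carr          m       3.750
Hall          m       3.574
Lewis         m       3.421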
FREQ
Treats observations as if they appear multiple times in the input data set.
Tip: You can use a WEIGHT statement and a FREQ statement in the same step of any
procedure that supports both statements.
FREQ variable;
Required Arguments
variable
specifies a numeric variable whose value represents the frequency of the observation.
If you use the FREQ statement, then the procedure assumes that each observation
CORR
MEANS/SUMMARY
REPORT
STANDARD
TABULATE
UNIVARIATE
Example
The data in this example represent a ship's course and speed (in nautical miles per
hour), recorded every hour. The frequency variable, Hours, represents the number of
hours that the ship maintained the same course and speed. Each of the following PROC
MEANS steps calculates average course and speed. The different results demonstrate
the effect of using Hours as a frequency variable.
The following PROC MEANS step does not use a frequency variable:
options nodate pageno=1 linesize=64 pagesize=40;
data track;
input Course Speed Hours @@;
datalines;
30 4 8 50 7 20
75 10 30 30 8 10
80 9 22 20 8 25
83 11 6 20 6 20
;
Without a frequency variable, each observation has a frequency of 1, and the total
number of observations is 8.
When you use Hours as a frequency variable, the frequency of each observation is the
value of Hours, and the total number of observations is 141 (the sum of the values of
the frequency variable).
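The two PROC MEANS steps that this example describes can be sketched as follows. This is a minimal sketch: the N and MEAN statistics and the MAXDEC=2 option are illustrative assumptions, not settings confirmed by the text. The only difference between the steps is the FREQ statement.

```sas
/* Step 1: no frequency variable; each of the 8 observations counts once. */
proc means data=track maxdec=2 n mean;
   var course speed;
run;

/* Step 2: Hours is the frequency variable; N becomes 141, the sum of Hours. */
proc means data=track maxdec=2 n mean;
   var course speed;
   freq hours;
run;
```

With the TRACK data, the means without a frequency variable are 388/8 = 48.5 for Course and 63/8 = 7.875 for Speed. With FREQ HOURS, the means are 6948/141 (about 49.28) for Course and 1136/141 (about 8.06) for Speed.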
QUIT
Executes any statements that have not executed and ends the procedure.
QUIT;
CATALOG
DATASETS
PLOT
PMENU
SQL
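The role of QUIT can be sketched with PROC SQL, one of the procedures listed above. This is an illustrative sketch; SASHELP.CLASS is a sample data set shipped with SAS and stands in for your own data.

```sas
proc sql;
   /* The query runs as soon as the statement is submitted. */
   select name, age
      from sashelp.class
      where age > 14;
quit;   /* ends the SQL procedure */
```

Without QUIT, the procedure stays active until another step begins or the session ends.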
WEIGHT
Specifies weights for analysis variables in the statistical calculations.
Tip: You can use a WEIGHT statement and a FREQ statement in the same step of any
procedure that supports both statements.
WEIGHT variable;
Required Arguments
variable
specifies a numeric variable whose values weight the values of the analysis variables.
The values of the variable do not have to be integers. The behavior of the procedure
when it encounters a nonpositive weight variable value is as follows:
Weight value    The procedure
0               counts the observation in the total number of observations
less than 0     converts the weight value to zero and counts the observation in
                the total number of observations
missing         excludes the observation from the analysis
CORR
FREQ
MEANS/SUMMARY
REPORT
STANDARD
TABULATE
UNIVARIATE
Note: In PROC FREQ, the value of the variable in the WEIGHT statement
represents the frequency of occurrence for each observation. See the PROC FREQ
documentation in Volume 3 of this book for more information.
is not an estimate of the variance of the ith observation, because this variance involves
the observation's weight, which varies from observation to observation.
If the values of your variable are counts that represent the number of occurrences of
each observation, then use this variable in the FREQ statement rather than in the
WEIGHT statement. In this case, because the values are counts, they should be
integers. (The FREQ statement truncates any noninteger values.) The variance that is
computed with a FREQ variable is an estimate of the common variance, $\sigma^2$, of the
observations.
Note: If your data come from a stratified sample where the weights $w_i$ represent
the strata weights, then neither the WEIGHT statement nor the FREQ statement
provides appropriate stratified estimates of the mean, variance, or variance of the
mean. To perform the appropriate analysis, consider using PROC SURVEYMEANS,
which is a SAS/STAT procedure that is documented in the SAS/STAT User's Guide.
The following PROC MEANS step computes the average estimate of the object size
while ignoring the weights. Without a WEIGHT variable, PROC MEANS uses the
default weight of 1 for every observation. Thus, the estimates of object size at all
distances are given equal weight. The average estimate of the object size exceeds the
actual size by 3.55 cm.
proc means data=size maxdec=3 n mean var stddev;
   var objectsize;
   title1 'Unweighted Analysis of the SIZE Data Set';
run;
The next two PROC MEANS steps use the precision measure (Precision) in the
WEIGHT statement and show the effect of using different values of the VARDEF=
option. The first PROC step creates an output data set that contains the variance and
standard deviation. If you reduce the weighting of the estimates that are made at
greater distances, the weighted average estimate of the object size is closer to the actual
size.
proc means data=size maxdec=3 n mean var stddev;
   weight precision;
   var objectsize;
   output out=wtstats var=Est_SigmaSq std=Est_Sigma;
   title1 'Weighted Analysis Using Default VARDEF=DF';
run;
proc means data=size maxdec=3 n mean var std
           vardef=weight;
   weight precision;
   var objectsize;
   title1 'Weighted Analysis Using VARDEF=WEIGHT';
run;
In the first PROC MEANS step, the variance is an estimate of $\sigma^2$, where the
variance of the ith observation is assumed to be $\mathrm{var}(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight
for the ith observation. In the second PROC MEANS step, the computed variance is an
estimate of $\frac{n-1}{n}\,\sigma^2 / \bar{w}$, where $\bar{w}$ is the average weight. For large n, this is an
approximate estimate of the variance of an observation with average weight.
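The relationship between the two computed variances can be written out explicitly. The following is a sketch that assumes the usual weighted definitions for VARDEF=DF and VARDEF=WEIGHT (divisors $n-1$ and $\sum_i w_i$, respectively); the symbols $\bar{x}_w$, $s^2_{\mathrm{DF}}$, and $s^2_{\mathrm{W}}$ are notation introduced here for illustration.

```latex
\begin{align*}
\bar{x}_w &= \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad
\bar{w} = \frac{1}{n}\sum_{i=1}^{n} w_i, \\
s^2_{\mathrm{DF}} &= \frac{\sum_{i=1}^{n} w_i (x_i - \bar{x}_w)^2}{n-1}
  \quad\text{(an estimate of } \sigma^2\text{)}, \\
s^2_{\mathrm{W}} &= \frac{\sum_{i=1}^{n} w_i (x_i - \bar{x}_w)^2}{\sum_{i=1}^{n} w_i}
  = \frac{n-1}{n\,\bar{w}}\; s^2_{\mathrm{DF}}
  \quad\text{(an estimate of } \tfrac{n-1}{n}\,\sigma^2/\bar{w}\text{)}.
\end{align*}
```

The last equality follows because $\sum_i w_i = n\,\bar{w}$, so the two statistics share a numerator and differ only in the divisor.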
The following statements create and print a data set with the weighted variance and
weighted standard deviation of each observation. The DATA step combines the output
data set that contains the variance and the standard deviation from the weighted
analysis with the original data set. The variance of each observation is computed by
dividing Est_SigmaSq, the estimate of $\sigma^2$ from the weighted analysis when
VARDEF=DF, by each observation's weight (Precision). The standard deviation of each
observation is computed by dividing Est_Sigma, the estimate of $\sigma$ from the weighted
analysis when VARDEF=DF, by the square root of each observation's weight (Precision).
data wtsize(drop=_freq_ _type_);
   set size;
   if _n_=1 then set wtstats;
   Est_VarObs=est_sigmasq/precision;
   Est_StdObs=est_sigma/sqrt(precision);
run;
proc print data=wtsize noobs;
   title 'Weighted Statistics';
   by distance;
   format est_varobs est_stdobs
          est_sigmasq est_sigma precision 6.3;
run;
Weighted Statistics
(One block for each value of the BY variable, Distance. The BY lines are not shown.)

BY group 1
ObjectSize    Precision    Est_SigmaSq    Est_Sigma    Est_VarObs    Est_StdObs
    30          0.667        20.678         4.547        31.017         5.569
    20          0.667        20.678         4.547        31.017         5.569
    30          0.667        20.678         4.547        31.017         5.569
    25          0.667        20.678         4.547        31.017         5.569

BY group 2
ObjectSize    Precision    Est_SigmaSq    Est_Sigma    Est_VarObs    Est_StdObs
    43          0.333        20.678         4.547        62.035         7.876
    33          0.333        20.678         4.547        62.035         7.876
    25          0.333        20.678         4.547        62.035         7.876
    30          0.333        20.678         4.547        62.035         7.876

BY group 3
ObjectSize    Precision    Est_SigmaSq    Est_Sigma    Est_VarObs    Est_StdObs
    25          0.222        20.678         4.547        93.052         9.646
    36          0.222        20.678         4.547        93.052         9.646
    48          0.222        20.678         4.547        93.052         9.646
    33          0.222        20.678         4.547        93.052         9.646

BY group 4
ObjectSize    Precision    Est_SigmaSq    Est_Sigma    Est_VarObs    Est_StdObs
    43          0.167        20.678         4.547        124.07        11.139
    36          0.167        20.678         4.547        124.07        11.139
    23          0.167        20.678         4.547        124.07        11.139
    48          0.167        20.678         4.547        124.07        11.139

BY group 5
ObjectSize    Precision    Est_SigmaSq    Est_Sigma    Est_VarObs    Est_StdObs
    30          0.133        20.678         4.547        155.09        12.453
    25          0.133        20.678         4.547        155.09        12.453
    50          0.133        20.678         4.547        155.09        12.453
    38          0.133        20.678         4.547        155.09        12.453
WHERE
Subsets the input data set by specifying certain conditions that each observation must meet before
it is available for processing.
WHERE where-expression;
Required Arguments
where-expression
RANK
CHART
REPORT
COMPARE
SORT
CORR
SQL
STANDARD
FREQ
TABULATE
MEANS/SUMMARY
TIMEPLOT
PLOT
TRANSPOSE
UNIVARIATE
Details
- The CALENDAR and COMPARE procedures and the APPEND statement in
  PROC DATASETS accept more than one input data set. See the documentation for
  the specific procedure for more information.
- To subset the output data set, use the WHERE= data set option:
proc report data=debate nowd
            out=onlyfr(where=(year='1'));
run;
Example
In this example, PROC PRINT prints only those observations that meet the condition
of the WHERE expression. The DEBATE data set is created in "Example: Temporarily
Dissociating a Format from a Variable" on page 29.
options nodate pageno=1 linesize=64
        pagesize=40;
run;
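A WHERE statement in a PROC PRINT step can be sketched as follows. This is a minimal sketch: the condition GPA > 3.5 is an assumption that is consistent with the GPA values in the output for this example, and the titles are omitted.

```sas
/* Sketch only: the condition gpa>3.5 is inferred from the output, not stated in the text. */
proc print data=debate;
   where gpa>3.5;   /* only observations that meet the condition are processed */
run;
```

The WHERE statement subsets the input before the procedure processes it, whereas the WHERE= data set option on an output data set subsets what the procedure writes.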
Name        Gender    Year          GPA
Capiccio      m       Freshman     3.598
Tucker        m       Freshman     3.901
Bagwell       f       Sophomore    3.722
Gold          f       Junior       3.609
Syme          f       Junior       3.883
Baglione      f       Senior       4.000
Carr          m       Senior       3.750
Hall          m       Senior       3.574
PART
Procedures
CHAPTER
4
The APPEND Procedure
Overview: APPEND Procedure 75
Syntax: APPEND Procedure 75
Chapter 3, Statements with the Same Function in Multiple Procedures, on page 57 for
details. You can also use any global statements. See Global Statements on page 18 for
a list.
Reminder: You can use data set options with the BASE= and DATA= options. See Data
CHAPTER
5
The CALENDAR Procedure
Overview: CALENDAR Procedure 79
What Does the CALENDAR Procedure Do? 79
What Types of Calendars Can PROC CALENDAR Produce? 79
Advanced Scheduling and Project Management Tasks 83
Syntax: CALENDAR Procedure 84
PROC CALENDAR Statement 85
BY Statement 91
CALID Statement 92
DUR Statement 93
FIN Statement 94
HOLIDUR Statement 95
HOLIFIN Statement 95
HOLISTART Statement 96
HOLIVAR Statement 97
MEAN Statement 97
OUTDUR Statement 98
OUTFIN Statement 98
OUTSTART Statement 99
START Statement 99
SUM Statement 100
VAR Statement 101
Concepts: CALENDAR Procedure 101
Type of Calendars 101
Schedule Calendar 102
Definition 102
Required Statements 102
Examples 102
Summary Calendar 102
Definition 102
Required Statements 103
Multiple Events on a Single Day 103
Examples 103
The Default Calendars 103
Description 103
When You Unexpectedly Produce a Default Calendar 103
Examples 104
Calendars and Multiple Calendars 104
Definitions 104
Why Create Multiple Calendars 104
How to Identify Multiple Calendars 104
Using Holidays or Calendar Data Sets with Multiple Calendars 105
For the activities data set that is shown in this calendar, see Example 1 on page 114.
80
Chapter 5
[Output 5.1: Simple schedule calendar for July 1996, with columns Sunday through Saturday. Activity labels shown in the calendar, as printed (some are truncated to the cell width): Interview/JW, Dist. Mtg./All, Mgrs. Meeting/District 6, VIP Banquet/JW, Planning Counci, Seminar/White, Trade Show/Knox, Mgrs. Meeting/District 7, Sales Drive/District 6, NewsLetter Dead, Co. Picnic/All, Dentist/JW, Bank Meeting/1s, Sales Drive/District 7, Birthday/Mary, Close Sale/WYGIX Co., Inventors Show/Melvin.]
For an explanation of the program that produces this calendar, see Example 4 on
page 127.
[Output 5.2: Advanced schedule calendar for July 1996, with columns Sunday through Saturday and each week subdivided into rows for the calendars CAL1 and CAL2. CAL1 shows the holiday Independence and the activities Assemble Tank, Lay Power Line, Drill Well/$1,000.00, Install Pump/$500.00, Pour Foundation/$1,500.00, Install Pipe/$1,000.00, and Erect Tower/$2,500.00. CAL2 shows the activity Excavate/$3,500.00 and the holiday Vacation. Activities that continue from or into another week are marked with < and >.]
In a summary calendar, each piece of information for a given day is the value of a
variable for that day. The variables can be either numeric or character, and you can
format them as necessary. You can use the SUM and MEAN statements to calculate sums
and means for any numeric variables. These statistics appear in a box below the
calendar, as shown in Output 5.3. The data set that is shown in this calendar is created
in Example 7 on page 143.
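As a minimal sketch of the kind of step that produces a summary calendar like this one: the data set name MEALS and the date variable DATE are assumptions made for illustration, and the variable names Brkfst, Lunch, and Dinner are taken from the statistics box in the output.

```sas
/* Hypothetical activities data set WORK.MEALS: one observation per day */
/* with a SAS date variable DATE and numeric counts for each meal.      */
proc sort data=meals;
   by date;                    /* activities must be sorted by starting date */
run;

proc calendar data=meals;
   start date;                 /* no DUR or FIN statement: summary calendar  */
   sum  brkfst lunch dinner;   /* monthly totals, printed below the calendar */
   mean brkfst lunch dinner;   /* monthly means, printed in the same box     */
run;
```

Because the step contains no DUR or FIN statement, PROC CALENDAR treats the request as a summary calendar.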
[Output 5.3: Summary calendar for December 1996, with columns Sunday through Saturday. Day cells contain the numeric values of the displayed variables. The box below the calendar reports the following statistics:]
          Sum       Mean
Brkfst    2763      172.688
Lunch     2830      188.667
Dinner    2990      186.875
The following table lists the statements and options available in the CALENDAR
procedure according to function.
To do this
MEAN
SUM
DUR or FIN
CALID
Specify holidays
HOLISTART
HOLIDUR
HOLIFIN
HOLIVAR
To do this
Control display
OUTSTART
OUTDUR
OUTFIN
Specify grouping
BY
CALID
To do this
CALEDATA=
activities
DATA=
holidays
HOLIDATA=
WORKDATA=
Control printing
display all months, even if no activities exist
FILL
FORMCHAR=
HEADER=
LOCALE
MISSING
WEEKDAYS
DATETIME
DAYLENGTH=
INTERVAL=
LEGEND
MEANTYPE=
Options
CALEDATA=SAS-data-set
specifies the calendar data set, a SAS data set that contains weekly work schedules
for multiple calendars.
Default: If you omit the CALEDATA= option, then PROC CALENDAR uses a
default work schedule, as described in The Default Calendars on page 103.
Tip: A calendar data set is useful if you are using multiple calendars or a
nonstandard work schedule.
See also: Calendar Data Set on page 109
Featured in: Example 3 on page 122
DATA=SAS-data-set
specifies the activities data set, a SAS data set that contains starting dates for all
activities and variables to display for each activity. Activities must be sorted or
indexed by starting date.
Default: If you omit the DATA= option, then the most recently created SAS data set
is used.
See also: Activities Data Set on page 107
Featured in: All examples. See Examples: CALENDAR Procedure on page 114
DATETIME
specifies that START and FIN variables contain values in DATETIME. format.
Default: If you omit the DATETIME option, then PROC CALENDAR assumes that
the START and FIN values are in the DATE. format.
Featured in: Example 3 on page 122
DAYLENGTH=hours
gives the number of hours in a standard working day. The hour value must be a SAS
TIME value.
Default: 24 if INTERVAL=DAY (the default), 8 if INTERVAL=WORKDAY.
Restriction: DAYLENGTH= applies only to schedule calendars.
Interaction: If you specify the DAYLENGTH= option and the calendar data set
contains a D_LENGTH variable, then PROC CALENDAR uses the DAYLENGTH=
value only when the D_LENGTH value is missing.
Interaction: When INTERVAL=DAY and you have no CALEDATA= data set,
specifying a DAYLENGTH= value has no effect.
Tip: The DAYLENGTH= option is useful when you use the DUR statement and
your work schedule contains days of varying lengths, for example, a 5 half-day
work week. In a work week with varying day lengths, you need to set a standard
day length to use in calculating duration times. For example, an activity with a
duration of 3.0 workdays lasts 24 hours if DAYLENGTH=8:00 or 30 hours if
DAYLENGTH=10:00.
Tip: Instead of specifying the DAYLENGTH= option, you can specify the length of
the working day by using a D_LENGTH variable in the CALEDATA= data set. If
you use this method, then you can specify different standard day lengths for
different calendars.
See also: Calendar Data Set on page 109 for more information on setting the
length of the standard workday
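The arithmetic in the DAYLENGTH= tip can be checked with a short DATA step. This is only an illustrative sketch: the variable names are arbitrary, and the step does not call PROC CALENDAR; it simply multiplies a duration of 3.0 workdays by each day length and writes the result to the SAS log.

```sas
data _null_;
   dur = 3.0;                              /* duration in workdays          */
   do daylength = '8:00't, '10:00't;       /* SAS time constants (seconds)  */
      hours = dur * daylength / 3600;      /* convert seconds to hours      */
      put daylength= time5. hours=;        /* write both values to the log  */
   end;
run;
```

The computed values correspond to the 24 and 30 hours quoted in the tip above.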
FILL
displays all months between the first and last activity, start and finish dates
inclusive, including months that contain no activities.
Default: If you do not specify FILL, then PROC CALENDAR prints only months
that contain activities. (Months that contain only holidays are not printed.)
Featured in:
FORMCHAR <(position(s))>=formatting-character(s)
defines the characters to use for constructing the outlines and dividers for the cells in
the calendar as well as all identifying markers (such as asterisks and arrows) used to
indicate holidays or continuation of activities in PROC CALENDAR output.
position(s)
identifies the position of one or more characters in the SAS formatting-character
string. A space or a comma separates the positions.
Default: Omitting (position(s)) is the same as specifying all 20 possible system
formatting characters, in order.
Range: PROC CALENDAR uses 17 of the 20 formatting characters that SAS
provides. Table 5.1 on page 87 shows the formatting characters that PROC
CALENDAR uses. Figure 5.1 on page 88 illustrates their use in PROC
CALENDAR output.
formatting-character(s)
lists the characters to use for the specified positions. PROC CALENDAR assigns
characters in formatting-character(s) to position(s), in the order that they are listed.
For instance, the following option assigns an asterisk (*) to the twelfth position,
assigns a single dash (-) to the thirteenth, and does not alter remaining characters:
formchar(12 13)='*-'
to this:
*------------------ACTIVITY--------------*
Interaction: The SAS system option FORMCHAR= specifies the default formatting
characters. The SAS system option defines the entire string of formatting
characters. The FORMCHAR= option in a procedure can redefine selected
characters.
Tip: You can use any character in formatting-characters, including hexadecimal
characters. If you use hexadecimal characters, then you must put an x after the
closing quotation mark. For instance, the following option assigns the hexadecimal
character 2D to the third formatting character, the hexadecimal character 7C to
the seventh character, and does not alter the remaining characters:
formchar(3,7)='2D7C'x
See also: For information on which hexadecimal codes to use for which characters,
Default
Used to draw
vertical bar
horizontal bar
Position
Default
Used to draw
10
11
12
13
activity line
16
activity separator
18
<
19
>
activity continuation to
20
holiday marker
HEADER=SMALL | MEDIUM | LARGE
specifies the type of heading to use in printing the name of the month.
SMALL
prints the month and year on one line.
MEDIUM
prints the month and year in a box four lines high.
LARGE
prints the month seven lines high using asterisks (*). The year is included if space
is available.
Default: MEDIUM
HOLIDATA=SAS-data-set
specifies the holidays data set, a SAS data set that contains the holidays you want to
display in the output. One variable must contain the holiday names and another
must contain the starting dates for each holiday. PROC CALENDAR marks holidays
in the calendar output with asterisks (*) when space permits.
Interaction: Displaying holidays on a calendar requires a holidays data set and a
HOLISTART statement. A HOLIVAR statement is recommended for naming
holidays. HOLIDUR is required if any holiday lasts longer than one day.
Tip: The holidays data set does not require sorting.
See also: Holidays Data Set on page 108
Featured in: All examples. See Examples: CALENDAR Procedure on page 114
INTERVAL=DAY | WORKDAY
specifies the units of the DUR and HOLIDUR variables as one of two default
day lengths:
DAY
specifies the values of the DUR and HOLIDUR variables in units of 24-hour days
and specifies the default 7-day calendar. For instance, a DUR value of 3.0 is
treated as 72 hours. The default calendar work schedule consists of seven working
days, all starting at 00:00 with a length of 24:00.
WORKDAY
specifies the values of the DUR and HOLIDUR variables in units of 8-hour days
and specifies that the default calendar contains five days a week, Monday through
Friday, all starting at 09:00 with a length of 08:00. When WORKDAY is specified,
PROC CALENDAR treats the values of the DUR and HOLIDUR variables in units
of working days, as defined in the DAYLENGTH= option, the CALEDATA= data
set, or the default calendar. For example, if the working day is 8 hours long, then
a DUR value of 3.0 is treated as 24 hours.
Default: DAY
Interaction: In the absence of a CALEDATA= data set, PROC CALENDAR uses
the work schedule defined in a default calendar.
Interaction: The WEEKDAYS option automatically sets the INTERVAL= value to
WORKDAY.
See also: Calendars and Multiple Calendars on page 104 and Calendar Data Set
on page 109 for more information on the INTERVAL= option and the specification
of working days; The Default Calendars on page 103
Featured in: Example 5 on page 134
LEGEND
prints the names of the variables whose values appear in the calendar. This
identifying text, or legend box, appears at the bottom of the page for each month if
LOCALE (Experimental)
prints the names of months and weekdays in the language that is indicated by the
value of the LOCALE= SAS system option. The LOCALE option in PROC
CALENDAR does not change the starting day of the week.
Default: If LOCALE is not specified, then names of months and weekdays are
printed in English.
CAUTION:
LOCALE is an experimental option that is available in SAS 9.1. Do not use this option
in production jobs.
MEANTYPE=NOBS | NDAYS
However, it may omit some days if you use the OUTSTART statement with the
OUTDUR or OUTFIN statement.
Featured in:
MISSING
determines how missing values are treated, based on the type of calendar.
Summary Calendar
If there is a day without an activity scheduled, then PROC CALENDAR prints the
values of variables for that day by using the SAS or user-defined format that is
specified for missing values.
Default: If you omit MISSING, then days without activities contain no values.
Schedule Calendar
variables with missing values appear in the label of an activity, using the format
specified for missing values.
Default: If you do not specify MISSING, then PROC CALENDAR ignores missing
values.
WEEKDAYS
suppresses the display of Saturdays and Sundays in the output. It also specifies that
the value of the INTERVAL= option is WORKDAY.
Default: If you omit WEEKDAYS, then the calendar displays all seven days.
Tip:
Featured in:
WORKDATA=SAS-data-set
specifies the workdays data set, a SAS data set that defines the work pattern during
a standard working day. Each numeric variable in the workdays data set denotes a
unique workshift pattern during one working day.
Tip:
The workdays data set is useful in conjunction with the calendar data set.
See also: Workdays Data Set on page 110 and Calendar Data Set on page 109
Featured in:
BY Statement
Processes activities separately for each BY group, producing a separate calendar for each value of
the BY variable.
Calendar type: Summary and schedule
Main discussion:
BY on page 58
BY <DESCENDING> variable-1
<...<DESCENDING> variable-n>
<NOTSORTED>;
Required Arguments
variable
specifies the variable that the procedure uses to form BY groups. You can specify
more than one variable, but the observations in the data set must be sorted by all the
variables that you specify or have an appropriate index. Variables in a BY statement
are called BY variables.
Options
DESCENDING
specifies that the observations are sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The observations are grouped in another way, for example, chronological order.
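A minimal sketch of BY-group processing follows. The data set ACTIVITIES and the variables DEPT, DATE, and LONG are hypothetical names chosen for illustration; the sort order assumes that the data must be ordered by the BY variable and, within each BY group, by starting date.

```sas
/* Sort by the BY variable first, then by the starting date. */
proc sort data=activities;
   by dept date;
run;

/* One calendar is produced for each value of DEPT. */
proc calendar data=activities;
   by dept;
   start date;      /* starting date of each activity     */
   dur long;        /* duration, which makes it a schedule */
run;
```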
CALID Statement
Processes activities in groups defined by the values of a calendar identifier variable.
Calendar type: Summary and schedule
Tip: Useful for producing multiple schedule calendars and for use with SAS/OR
software.
See also: Calendar Data Set on page 109
Featured in:
Example 2 on page 118, Example 3 on page 122, and Example 6 on page 137
CALID variable
</ OUTPUT=COMBINE|MIX|SEPARATE>;
Required Arguments
variable
Requirement:
Tip:
Options
OUTPUT=COMBINE|MIX|SEPARATE
controls the amount of space required to display output for multiple calendars.
COMBINE
produces one page for each month that contains activities and subdivides each day
by the CALID value.
Restriction: The input data must be sorted by or indexed on the START variable.
Featured in: Example 2 on page 118 and Example 4 on page 127
MIX
produces one page for each month that contains activities and does not identify
activities by the CALID value.
Restriction: The input data must be sorted by or indexed on the START variable.
Tip: MIX requires the least space for output.
Featured in: Example 4 on page 127
SEPARATE
produces a separate page for each value of the CALID variable.
Restriction: The input data must be sorted by the CALID variable and then by
DUR Statement
Specifies the variable that contains the duration of each activity.
Alias: DURATION
Interaction: If you use both a DUR and a FIN statement, then DUR is ignored.
Tip: To produce a schedule calendar, you must use either a DUR or FIN statement.
Featured in:
DUR variable;
Required Arguments
variable
Duration
- Duration is measured inclusively from the start of the activity (as given in the
START variable). In the output, any activity that lasts part of a day is displayed
as lasting a full day.
- The INTERVAL= option in a PROC CALENDAR statement automatically sets the
unit of the duration variable, depending on its own value as follows:
If INTERVAL=    Then the default length of the duration unit is
DAY             24 hours
WORKDAY         8 hours
FIN Statement
Specifies the variable in the activities data set that contains the finishing date of each activity.
Alias: FINISH
Calendar type: Schedule
Interaction: If you use both a FIN and a DUR statement, then FIN is used.
Tip: To produce a schedule calendar, you must use either a FIN or DUR statement.
Featured in: Example 6 on page 137
FIN variable;
Required Arguments
variable
Restriction: If the FIN variable contains datetime values, then you must specify
the DATETIME option in the PROC CALENDAR statement.
HOLIDUR Statement
Specifies the variable in the holidays data set that contains the duration of each holiday for a
schedule calendar.
Alias: HOLIDURATION
Calendar type: Schedule
Default: If you do not use a HOLIDUR or HOLIFIN statement, then all holidays last
one day.
Restriction: Cannot use with a HOLIFIN statement.
Featured in: Example 1 on page 114 through Example 5 on page 134
HOLIDUR variable;
Required Arguments
variable
Holiday Duration
- If you use both the HOLIFIN and HOLIDUR statements, then PROC CALENDAR
uses the HOLIFIN variable value to define each holiday's duration.
- Set the unit of the holiday duration variable in the same way that you set the unit
of the duration variable; use either the INTERVAL= and DAYLENGTH= options
or the CALEDATA= data set.
- Duration is measured inclusively from the start of the holiday (as given in the
HOLISTART variable). In the output, any holiday lasting at least half a day
appears as lasting a full day.
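The holiday statements can be sketched together as follows. All data set and variable names (HOLIDAYS, HOLIDAY, DATE, DAYS, ACTIVITIES, LONG) are hypothetical, and the holiday dates are invented for illustration; the activities data set is assumed to be already sorted by its starting date.

```sas
/* Hypothetical holidays data set: name, starting date, and duration. */
data holidays;
   length holiday $ 16;
   holiday = 'Independence Day'; date = '04JUL1996'd; days = 1; output;
   holiday = 'Vacation';         date = '08JUL1996'd; days = 2; output;
   format date date9.;
run;

proc calendar data=activities holidata=holidays;
   start date;          /* activity starting date (activities data set) */
   dur long;            /* activity duration                            */
   holistart date;      /* holiday starting date (holidays data set)    */
   holidur days;        /* needed because one holiday lasts two days    */
   holivar holiday;     /* label each holiday with its name             */
run;
```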
HOLIFIN Statement
Specifies the variable in the holidays data set that contains the finishing date of each holiday.
Alias: HOLIFINISH
Calendar type: Schedule
Default: If you do not use a HOLIFIN or HOLIDUR statement, then all holidays last
one day.
HOLIFIN variable;
Required Arguments
variable
Holiday Duration
If you use both the HOLIFIN and the HOLIDUR statements, then PROC
CALENDAR uses only the HOLIFIN variable.
HOLISTART Statement
Specifies a variable in the holidays data set that contains the starting date of each holiday.
Alias: HOLISTA, HOLIDAY
HOLISTART variable;
Required Arguments
variable
Details
- The holidays data set need not be sorted.
- All holidays last only one day, unless you use a HOLIFIN or HOLIDUR statement.
- If two or more holidays occur on the same day, then PROC CALENDAR uses only
the first observation.
HOLIVAR Statement
Specifies a variable in the holidays data set whose values are used to label the holidays.
Alias: HOLIVARIABLE, HOLINAME
Calendar type: Summary and schedule
Default: If you do not use a HOLIVAR statement, then PROC CALENDAR uses the
word DATE to identify holidays.
HOLIVAR variable;
Required Arguments
variable
a variable whose values are used to label the holidays. Typically, this variable
contains the names of the holidays.
Range: character or numeric.
Restriction: This variable must be in the holidays data set.
Tip:
MEAN Statement
Specifies numeric variables in the activities data set for which mean values are to be calculated
for each month.
Calendar type: Summary
Tip:
Required Arguments
variable(s)
numeric variable for which mean values are calculated for each month.
Restriction: This variable must be in the activities data set.
Options
FORMAT=format-name
OUTDUR Statement
Specifies in days the length of the week to be displayed.
Alias: OUTDURATION
Requirement: The OUTSTART statement is required.
OUTDUR number-of-days;
Required Arguments
number-of-days
Length of Week
Use either the OUTDUR or OUTFIN statement to supply the procedure with
information about the length of the week to display. If you use both, then PROC
CALENDAR ignores the OUTDUR statement.
OUTFIN Statement
Specifies the last day of the week to display in the calendar.
Alias: OUTFINISH
Requirement: The OUTSTART statement is required.
Featured in: Example 3 on page 122 and Example 8 on page 147
OUTFIN day-of-week;
Required Arguments
day-of-week
the name of the last day of the week to display. For example,
outfin friday;
Length of Week
Use either the OUTFIN or OUTDUR statement to supply the procedure with
information about the length of the week to display. If you use both, then PROC
CALENDAR uses only the OUTFIN statement.
OUTSTART Statement
Specifies the starting day of the week to display in the calendar.
Alias: OUTSTA
Default: If you do not use OUTSTART, then each calendar week begins with Sunday.
Featured in: Example 3 on page 122 and Example 8 on page 147
OUTSTART day-of-week;
Required Arguments
day-of-week
the name of the starting day of the week for each week in the calendar. For example,
outstart monday;
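As a sketch of how the OUTSTART and OUTFIN statements combine, the following step displays a Monday-through-Friday week. The data set ACTIVITIES and the variables DATE and LONG are hypothetical names, and the data set is assumed to be sorted by DATE.

```sas
proc calendar data=activities;
   start date;
   dur long;
   outstart monday;   /* first day displayed in each week              */
   outfin friday;     /* last day displayed; requires OUTSTART as well */
run;
```

Using OUTDUR 5 in place of the OUTFIN statement would be an alternative way to request a five-day display, because either statement supplies the length of the week.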
START Statement
Specifies the variable in the activities data set that contains the starting date of each activity.
Alias: STA, DATE, ID
Required: START is required for both summary and schedule calendars.
Featured in: All examples
START variable;
Required Arguments
variable
SUM Statement
Specifies numeric variables in the activities data set to total for each month.
Calendar type: Summary
Tip: To apply different formats to variables that are being summed, use multiple SUM
statements.
Featured in:
Required Arguments
variable(s)
Options
FORMAT=format-name
F=
- The sum appears in the LEGEND box if you specify the LEGEND option.
- PROC CALENDAR automatically displays variables named in a SUM statement
in the calendar output, even if the variables are not named in the VAR statement.
VAR Statement
Specifies the variables that you want to display for each activity.
Alias: VARIABLE
VAR variable(s);
Required Arguments
variable(s)
specifies one or more variables that you want to display in the calendar.
Range: The values of variable can be either character or numeric.
Restriction: These variables must be in the activities data set.
Tip:
Details
When VAR Is Not Used
If you do not use a VAR statement, then the procedure displays all variables in the
activities data set in the order in which they occur in the data set, except for the BY,
CALID, START, DUR, and FIN variables. However, not all variables are displayed if
the LINESIZE= and PAGESIZE= settings do not allow enough space in the calendar.
Display of Variables
- PROC CALENDAR displays variables in the order that they appear in the VAR
statement. Not all variables are displayed, however, if the LINESIZE= and
PAGESIZE= settings do not allow enough space in the calendar.
Type of Calendars
PROC CALENDAR can produce two kinds of calendars: schedule and summary.
Use a
if you want to
schedule calendar
schedule calendar
summary calendar
Note: PROC CALENDAR produces a summary calendar if you do not use a DUR or
FIN statement in the PROC step.
Schedule Calendar
Definition
A report in calendar format that shows when activities and holidays start and end.
Required Statements
You must supply a START statement and either a DUR or FIN statement.
Use this statement
START
DUR*
duration of an activity
FIN*
* Choose one of these. If you do not use a DUR or FIN statement, then PROC CALENDAR
assumes that you want to create a summary calendar report.
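A minimal sketch of a schedule calendar that uses START with FIN is shown here. The data set ACTIVITIES and the variables STARTDATE and ENDDATE are hypothetical names; a DUR statement that names a duration variable could be used in place of FIN.

```sas
/* Activities must be sorted (or indexed) by the starting date. */
proc sort data=activities;
   by startdate;
run;

proc calendar data=activities;
   start startdate;   /* required for every calendar                    */
   fin enddate;       /* finishing date; its presence makes this a      */
                      /* schedule calendar rather than a summary one    */
run;
```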
Examples
See Simple Schedule Calendar on page 79, Advanced Schedule Calendar on page
80, as well as Example 1 on page 114, Example 2 on page 118, Example 3 on page 122,
Example 4 on page 127, Example 5 on page 134, and Example 6 on page 137
Summary Calendar
Definition
A report in calendar format that displays activities and holidays that last only one
day and that can provide summary information in the form of sums and means.
Required Statements
You must supply a START statement. This statement identifies the variable in the
activities data set that contains an activity's starting date.
Examples
See Simple Summary Calendar on page 81, Example 7 on page 143, and Example 8
on page 147
- your application uses a 5-day work week with 8-hour days or a 7-day work week
with 24-hour days. See Table 5.2 on page 103.
Scheduled work days   Then set INTERVAL=   By default DAYLENGTH=   Work periods
7 (M-Sun)             DAY                  24                      24-hour days
5 (M-F)               WORKDAY              8                       8-hour days
- If the activities data set does not contain a CALID variable, then PROC
CALENDAR produces a default calendar.
- If both the holidays and calendar data sets do not contain a CALID variable, then
PROC CALENDAR produces a default calendar even if the activities data set
contains a CALID variable.
- If the activities and calendar data sets contain the CALID variable, but the
holidays data set does not, then the default holidays are used.
Examples
See the 7-day default calendar in Output 5.1 and the 5-day default calendar in
Example 1 on page 114
Definitions
calendar
a logical entity that represents a weekly work pattern, which consists of weekly
work schedules and daily shifts. PROC CALENDAR contains two default work
patterns: 5-day week with an 8-hour day or a 7-day week with a 24-hour day. You
can also define your own work patterns by using CALENDAR and WORKDAYS
data sets.
calendar report
a report in calendar format that displays activities, holidays, and nonwork periods.
A calendar report can contain multiple calendars in one of three formats:
separate
Each identified calendar prints on separate output pages.
combined
All identified calendars print on the same output pages and each is identified.
mixed
All identified calendars print on the same output pages but are not identified
as belonging to separate calendars.
multiple calendar
a logical entity that represents multiple weekly work patterns.
- print separate calendars on the same page and identify each one.
- print separate calendars on the same page without identifying them.
- print separate pages for each identified calendar.
As an example, consider a calendar that shows the activities of all departments
within a division. Each department can have its own calendar identication value and,
if necessary, can have individual weekly work patterns, daily work shifts, and holidays.
If you place activities that are associated with different calendars in the same
activities data set, then you can use PROC CALENDAR to produce calendar reports that
print
- the schedule and events for each department on separate pages (separate output)
- the schedule and events for the entire division, each identified by department
(combined output)
- the schedule and events for the entire division, but not identified by department
(mixed output).
The multiple-calendar feature was added specifically to enable PROC CALENDAR to
process the output of PROC CPM in SAS/OR software, a project management tool. See
Example 6 on page 137.
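The three kinds of output can be sketched as follows. The data set ACTIVITIES and the variables _CAL_, DATE, and LONG are hypothetical names; the sort for SEPARATE assumes that the data should be ordered by the CALID variable and then by the starting date.

```sas
/* COMBINE and MIX expect the activities sorted by the START variable. */
proc sort data=activities;
   by date;
run;

proc calendar data=activities;
   calid _cal_ / output=combine;   /* use output=mix for mixed output */
   start date;
   dur long;
run;

/* SEPARATE expects the activities sorted by the CALID variable first. */
proc sort data=activities;
   by _cal_ date;
run;

proc calendar data=activities;
   calid _cal_ / output=separate;  /* one set of pages per calendar */
   start date;
   dur long;
run;
```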
How to Identify Calendars with the CALID Statement and the Special
Variable _CAL_
To identify multiple calendars, you must use the CALID statement to specify the
variable whose values identify which calendar an event belongs to. This variable can
be numeric or character.
You can use the special variable name _CAL_ or you can use another variable name.
PROC CALENDAR automatically looks for a variable named _CAL_ in the holiday and
calendar data sets, even when the activities data set uses a variable with another name
as the CALID variable. Therefore, if you use the name _CAL_ in your holiday and
calendar data sets, then you can more easily reuse these data sets in different calendar
applications.
- Every value of the CALID variable that appears in either the holidays or calendar
data sets defines a calendar.
- If a CALID value appears in the HOLIDATA= data set but not in the
CALEDATA= data set, then the work schedule of the default calendar is used.
- If a CALID value appears in the CALEDATA= data set but not in the
HOLIDATA= data set, then the holidays of the default calendar are used.
- If a CALID value does not appear in either the HOLIDATA= or CALEDATA= data
set, then the work schedule and holidays of the default calendar are used.
- If the CALID variable is not found in the holiday or calendar data sets, then
PROC CALENDAR looks for the default variable _CAL_ instead. If neither the
CALID variable nor a _CAL_ variable appears in a data set, then the observations
in that data set are applied to a default calendar.
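As a minimal sketch of the lookup rules above, the following program uses a CALID variable named DEPT in the activities data set while the holidays data set identifies calendars with _CAL_. The data set names ACTS and HOLS, the variable names DEPT, TASK, LONG, HDATE, and HNAME, and all data values are hypothetical. Under the rules listed above, PROC CALENDAR is expected to fall back to _CAL_ in the holidays data set because DEPT is not found there, and the ADMIN calendar, which appears in no holidays or calendar data set, is expected to use the default calendar.

```sas
/* Hypothetical activities data set: DEPT is the CALID variable. */
data acts;
   input date : date7. task $ dept $ long;
   datalines;
01JUL96 Audit SALES 2
02JUL96 Review ADMIN 1
08JUL96 Training SALES 3
;

/* Hypothetical holidays data set: calendars are identified by _CAL_. */
data hols;
   input hdate : date7. hname $ _cal_ $;
   datalines;
04JUL96 Indep SALES
;

/* Combined output is sorted by starting date only. */
proc sort data=acts;
   by date;
run;

options nodate pageno=1 linesize=132 pagesize=60;

proc calendar data=acts holidata=hols;
   calid dept / output=combine;   /* DEPT is not in HOLS, so _CAL_ is looked up there */
   start date;
   dur long;
   holistart hdate;
   holivar hname;
run;
```

Because the holidays data set uses the name _CAL_, it could be reused unchanged with an activities data set whose CALID variable has a different name.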
Examples
Example 2 on page 118, Example 3 on page 122, Example 4 on page 127, and
Example 8 on page 147
Data Set      Option
activities    DATA= option
holidays      HOLIDATA= option
calendar      CALEDATA= option
workdays      WORKDATA= option
Structure
Each observation in the activities data set contains information about one activity.
One variable must contain the starting date. If you are producing a schedule calendar,
then another variable must contain either the activity duration or finishing date. Other
variables can contain additional information about an activity.
If a variable contains an activity's   Specify it with the   Calendar type
starting date                          START statement       Schedule, Summary
duration                               DUR statement         Schedule
finishing date                         FIN statement         Schedule
is used. In such situations, you might find PROC SUMMARY useful to collapse your
data set to contain one activity per starting date.
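One way to collapse an activities data set to a single observation per starting date is sketched below. The data set ACTS, its variables DATE and COST, and the output name ACTS_BY_DAY are hypothetical, and summing is only one possible way to combine activities that share a date.

```sas
/* Hypothetical activities data: two observations share 01JUL96. */
data acts;
   input date : date7. cost;
   format date date7.;
   datalines;
01JUL96 100
01JUL96 250
02JUL96 75
;

/* NWAY keeps only the rows for each distinct DATE value;
   SUM= stores the total of COST under the same variable name. */
proc summary data=acts nway;
   class date;
   var cost;
   output out=acts_by_day (drop=_type_ _freq_) sum=;
run;

proc print data=acts_by_day;
run;
```

With these values, ACTS_BY_DAY should contain two observations, one per date, with COST totals of 350 and 75.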
Examples
Every example in the Examples section uses an activities data set.
Purpose
You can use a holidays data set, specified with the HOLIDATA= option, to
Structure
Each observation in the holidays data set must contain at least the holiday starting
date. A holiday lasts only one day unless a duration or finishing date is specified.
Supplying a holiday name is recommended, though not required. If you do not specify
which variable contains the holiday name, then PROC CALENDAR uses the word DATE
to identify each holiday.
If a variable contains a holiday's   Specify it with this statement
starting date                        HOLISTART
name                                 HOLIVAR
duration                             HOLIDUR
finishing date                       HOLIFIN
No Sorting Needed
You do not need to sort or index the holidays data set.
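A minimal sketch of a holidays data set with a starting date, a name, and a duration follows. The names HOLS, ACTS, HDATE, HNAME, HDAYS, TASK, and LONG and all data values are hypothetical. HOLIDUR is used here, rather than HOLIFIN, to give the second holiday a length of two days.

```sas
/* Hypothetical holidays data set; no sorting or indexing is needed. */
data hols;
   input hdate : date7. hname $ hdays;
   datalines;
04JUL96 Indep 1
08JUL96 Vacation 2
;

/* Hypothetical activities data set, already in starting-date order. */
data acts;
   input date : date7. task $ long;
   datalines;
01JUL96 Audit 2
10JUL96 Review 1
;

options nodate pageno=1 linesize=132 pagesize=60;

proc calendar data=acts holidata=hols;
   start date;
   dur long;
   holistart hdate;   /* holiday starting date */
   holivar hname;     /* holiday name shown on the calendar */
   holidur hdays;     /* holiday duration */
run;
```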
Examples
Every example in the Examples section uses a holidays data set.
Structure
Each observation in the calendar data set defines one weekly work schedule. The
data set created in the DATA step shown below defines weekly work schedules for two
calendars, CALONE and CALTWO.
data cale;
input _sun_ $ _mon_ $ _tue_ $ _wed_ $ _thu_ $ /
_fri_ $ _sat_ $ _cal_ $ d_length time6.;
datalines;
holiday workday workday workday workday
workday holiday calone 8:00
holiday shift1 shift1 shift1 shift1
shift2 holiday caltwo 9:00
;
_CAL_
the CALID (calendar identifier) variable. The values of this variable identify
different calendars. If this variable is not present, then the first observation in
this data set defines the work schedule that is applied to all calendars in the
activities data set.
If the CALID variable contains a missing value, then the character or numeric
value for the default calendar (DEFAULT or 0) is used. See "The Default Calendars"
on page 103 for further details.
D_LENGTH
the daylength identifier variable. Values of D_LENGTH indicate the length of the
standard workday to be used in calendar calculations. You can set the workday
length either by placing this variable in your calendar data set or by using the
DAYLENGTH= option.
Missing values for this variable default to the number of hours specified in the
DAYLENGTH= option; if the DAYLENGTH= option is not used, the day length
defaults to 24 hours if INTERVAL=DAY, or 8 hours if INTERVAL=WORKDAY.
If INTERVAL=   Workshift begins at   Day length
DAY            00:00                 24 hours
WORKDAY        9:00                  8 hours
You can reset the length of the standard workday with the DAYLENGTH= option or
a D_LENGTH variable in the calendar data set. You can define other work shifts in a
workdays data set.
Examples
Example 3 on page 122, Example 4 on page 127, and Example 7 on page 143 feature
a calendar data set.
If INTERVAL=   Workshift begins at   Day length
DAY            00:00                 24 hours
WORKDAY        9:00                  8 hours
Structure
Each variable in the workdays data set contains one daily schedule of alternating
work and nonwork periods. For example, this DATA step creates a data set that
contains specifications for two work shifts:
data work;
input shift1 : time6. shift2 : time6.;
datalines;
 7:00  7:00
12:00 11:00
13:00     .
17:00     .
;
The variable SHIFT1 specifies a 10-hour workday (7:00 to 17:00) with one nonwork period (a lunch
hour from 12:00 to 13:00); the variable SHIFT2 specifies a 4-hour workday with no nonwork periods.
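To make the alternating work and nonwork periods concrete, the following sketch adds up the working time in SHIFT1 from the four time values in the DATA step above. It is only an arithmetic check; the variable names WORK_HOURS and SPAN_HOURS are hypothetical. SAS time values are stored in seconds, so each difference is divided by 3600.

```sas
data _null_;
   /* SHIFT1 alternates: work 7:00-12:00, nonwork 12:00-13:00, work 13:00-17:00. */
   work_hours = (('12:00't - '7:00't) + ('17:00't - '13:00't)) / 3600;
   /* Elapsed time from the first start to the last stop. */
   span_hours = ('17:00't - '7:00't) / 3600;
   put work_hours= span_hours=;
run;
```

The log should report 9 working hours within a 10-hour span for SHIFT1.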
Examples
See Example 3 on page 122
Data set               Variable
Activities (DATA=)     CALID
                       START
                       DUR          1.0 is used
                       FIN
                       VAR
                       SUM, MEAN
Calendar (CALEDATA=)   CALID
                       D_LENGTH
                       SUM, MEAN
Holiday (HOLIDATA=)    CALID
                       HOLISTART
                       HOLIDUR
                       HOLIFIN
                       HOLIVAR      no value is used
Workdays (WORKDATA=)   any
omits values to make the output fit the page and prints messages to that effect in the
SAS log.
This example
- creates a schedule calendar
- uses one of the two default work patterns: 8-hour day, 5-day week
Program
Create the activities data set. ALLACTY contains both personal and business activities
information for a bank president.
data allacty;
input date : date7. event $ 9-36 who $ 37-48 long;
datalines;
01JUL96 Dist. Mtg.                  All          1
17JUL96 Bank Meeting                1st Natl     1
02JUL96 Mgrs. Meeting               District 6   2
11JUL96 Mgrs. Meeting               District 7   2
03JUL96 Interview                   JW           1
08JUL96 Sales Drive                 District 6   5
15JUL96 Sales Drive                 District 7   5
08JUL96 Trade Show                  Knox         3
22JUL96 Inventors Show              Melvin       3
11JUL96 Planning Council            Group II     1
18JUL96 Planning Council            Group III    1
25JUL96 Planning Council            Group IV     1
12JUL96 Seminar                     White        1
19JUL96 Seminar                     White        1
18JUL96 NewsLetter Deadline         All          1
05JUL96 VIP Banquet                 JW           1
19JUL96 Co. Picnic                  All          1
16JUL96 Dentist                     JW           1
24JUL96 Birthday                    Mary         1
25JUL96 Close Sale                  WYGIX Co.    2
;
Sort the activities data set by the variable that contains the starting date. You are not
required to sort the holidays data set.
proc sort data=allacty;
by date;
run;
Set LINESIZE= appropriately. If the line size is not long enough to print the variable values,
then PROC CALENDAR either truncates the values or produces no calendar output.
options nodate pageno=1 linesize=132 pagesize=60;
Create the schedule calendar. DATA= identifies the activities data set; HOLIDATA= identifies
the holidays data set. WEEKDAYS specifies that a week consists of five eight-hour work days.
proc calendar data=allacty holidata=hol weekdays;
Specify an activity start date variable and an activity duration variable. The START
statement specifies the variable in the activities data set that contains the starting date of the
activities; DUR specifies the variable that contains the duration of each activity. Creating a
schedule calendar requires START and DUR.
start date;
dur long;
Output
[Output: schedule calendar for July 1996 with the title text "Julia Cho" and columns Monday through Friday. Holidays appear as starred labels (Independence, Vacation) and each activity appears as a bar labeled event/who, for example Dist. Mtg./All and Sales Drive/District 6.]
CALID statement:
_CAL_ variable
OUTPUT=COMBINE option
DUR statement
24-hour day, 7-day week
Program
Create the activities data set and identify separate calendars. ALLACTY2 contains both
personal and business activities for a bank president. The _CAL_ variable identifies which
calendar an event belongs to.
data allacty2;
input date:date7. happen $ 10-34 who $ 35-47 _CAL_ $ long;
datalines;
01JUL96  Dist. Mtg.               All          CAL1 1
02JUL96  Mgrs. Meeting            District 6   CAL1 2
03JUL96  Interview                JW           CAL1 1
05JUL96  VIP Banquet              JW           CAL1 1
06JUL96  Beach trip               family       CAL2 2
08JUL96  Sales Drive              District 6   CAL1 5
08JUL96  Trade Show               Knox         CAL1 3
09JUL96  Orthodontist             Meagan       CAL2 1
11JUL96  Mgrs. Meeting            District 7   CAL1 2
11JUL96  Planning Council         Group II     CAL1 1
12JUL96  Seminar                  White        CAL1 1
14JUL96  Co. Picnic               All          CAL1 1
14JUL96  Business trip            Fred         CAL2 2
15JUL96  Sales Drive              District 7   CAL1 5
16JUL96  Dentist                  JW           CAL1 1
17JUL96  Bank Meeting             1st Natl     CAL1 1
17JUL96  Real estate agent        Family       CAL2 1
18JUL96  NewsLetter Deadline      All          CAL1 1
18JUL96  Planning Council         Group III    CAL1 1
19JUL96  Seminar                  White        CAL1 1
22JUL96  Inventors Show           Melvin       CAL1 3
24JUL96  Birthday                 Mary         CAL1 1
25JUL96  Planning Council         Group IV     CAL1 1
25JUL96  Close Sale               WYGIX Co.    CAL1 2
27JUL96  Ballgame                 Family       CAL2 1
;
Create the holidays data set and identify which calendar a holiday affects. The _CAL_
variable identifies which calendar a holiday belongs to.
data vac;
input hdate:date7. holiday $ 11-25 _CAL_ $ ;
datalines;
29JUL96   vacation        CAL2
04JUL96   Independence    CAL1
;
Sort the activities data set by the variable that contains the starting date. When
creating a calendar with combined output, you sort only by the activity starting date, not by the
CALID variable. You are not required to sort the holidays data set.
proc sort data=allacty2;
by date;
run;
Set LINESIZE= appropriately. If the linesize is not long enough to print the variable values,
then PROC CALENDAR either truncates the values or produces no calendar output.
options nodate pageno=1 pagesize=60 linesize=132;
Create the schedule calendar. DATA= identifies the activities data set; HOLIDATA=
identifies the holidays data set. By default, the output calendar displays a 7-day week.
proc calendar data=allacty2 holidata=vac;
Combine all events and holidays on a single calendar. The CALID statement specifies the
variable that identifies which calendar an event belongs to. OUTPUT=COMBINE places all
events and holidays on the same calendar.
calid _CAL_ / output=combine;
Specify an activity start date variable and an activity duration variable. The START
statement specifies the variable in the activities data set that contains the starting date of the
activities; DUR specifies the variable that contains the duration of each activity. Creating a
schedule calendar requires START and DUR.
start date ;
dur long;
Retrieve holiday information. The HOLISTART and HOLIVAR statements specify the
variables in the holidays data set that contain the start date and name of each holiday,
respectively. HOLISTART is required when you use a holidays data set.
holistart hdate;
holivar holiday;
Output
[Output: combined schedule calendar for July 1996 with the title text "Julia Cho", a seven-day week, and separate rows labeled CAL1 and CAL2 within each week. The holiday Independence belongs to CAL1 and the holiday vacation to CAL2. Activities appear as bars labeled happen/who, truncated to the cell width where necessary, for example Trade Show/Knox on CAL1 and Business trip/Fred on CAL2.]
This example
- produces separate output pages for each calendar in a single PROC step
Sort the activities data set by             And set OUTPUT= to   See Example
calendar identification and starting date   SEPARATE             3, 8
starting date                               COMBINE              4, 2
starting date                               MIX                  4
Program
Specify a library so that you can permanently store the activities data set.
libname well SAS-data-library;
Create the activities data set and identify separate calendars. WELL.ACT is a
permanent SAS data set that contains activities for a well construction project. The _CAL_
variable identifies the calendar that an activity belongs to.
data well.act;
input task & $16. dur : 5. date : datetime16. _cal_ $ cost;
datalines;
Drill Well          3.50  01JUL96:12:00:00  CAL1  1000
Lay Power Line      3.00  04JUL96:12:00:00  CAL1  2000
Assemble Tank       4.00  05JUL96:08:00:00  CAL1  1000
Build Pump House    3.00  08JUL96:12:00:00  CAL1  2000
Pour Foundation     4.00  11JUL96:08:00:00  CAL1  1500
Install Pump        4.00  15JUL96:14:00:00  CAL1  500
Install Pipe        2.00  19JUL96:08:00:00  CAL1  1000
Erect Tower         6.00  20JUL96:08:00:00  CAL1  2500
Deliver Material    2.00  01JUL96:12:00:00  CAL2  500
Excavate            4.75  03JUL96:08:00:00  CAL2  3500
;
Create the holidays data set. The _CAL_ variable identifies the calendar that a holiday
belongs to.
data well.hol;
input date date. holiday $ 11-25 _cal_ $;
datalines;
09JUL96   Vacation        CAL2
04JUL96   Independence    CAL1
;
Create the calendar data set. Each observation defines the workshifts for an entire week.
The _CAL_ variable identifies to which calendar the workshifts apply. CAL1 uses the default
8-hour workshifts for Monday through Friday. CAL2 uses a half day on Saturday and the
default 8-hour workshift for Monday through Friday.
data well.cal;
input _sun_ $ _sat_ $ _mon_ $ _tue_ $ _wed_ $ _thu_ $
_fri_ $ _cal_ $;
datalines;
Holiday Holiday Workday Workday Workday Workday Workday CAL1
Holiday Halfday Workday Workday Workday Workday Workday CAL2
;
Create the workdays data set. This data set defines the daily workshifts that are named in
the calendar data set. Each variable (not observation) contains one daily schedule of alternating
work and nonwork periods. The HALFDAY workshift lasts 4 hours.
data well.wor;
input halfday time5.;
datalines;
08:00
12:00
;
Sort the activities data set by the variables that contain the calendar identification
and the starting date, respectively. You are not required to sort the holidays data set.
proc sort data=well.act;
by _cal_ date;
run;
Set LINESIZE= appropriately. If the linesize is not long enough to print the variable values,
then PROC CALENDAR either truncates the values or produces no calendar output.
options nodate pageno=1 linesize=132 pagesize=60;
Create the schedule calendar. DATA= identifies the activities data set; HOLIDATA=
identifies the holidays data set; CALEDATA= identifies the calendar data set; WORKDATA=
identifies the workdays data set. DATETIME specifies that the variable specified with the
START statement contains values in SAS datetime format.
proc calendar data=well.act
holidata=well.hol
caledata=well.cal
workdata=well.wor
datetime;
Print each calendar on a separate page. The CALID statement specifies that the _CAL_
variable identifies calendars. OUTPUT=SEPARATE prints information for each calendar on
separate pages.
calid _cal_ / output=separate;
Specify an activity start date variable and an activity duration variable. The START
statement specifies the variable in the activities data set that contains the activity starting
date; DUR specifies the variable that contains the activity duration. START and DUR are
required for a schedule calendar.
start date;
dur dur;
Retrieve holiday information. HOLISTART and HOLIVAR specify the variables in the
holidays data set that contain the start date and name of each holiday, respectively.
HOLISTART is required when you use a holidays data set.
holistart date;
holivar holiday;
Customize the calendar appearance. OUTSTART and OUTFIN specify that the calendar
display a 6-day week, Monday through Saturday.
outstart Monday;
outfin Saturday;
Output
[Output, calendar CAL1: schedule calendar page for July 1996 with six columns, Monday through Saturday. It shows the holiday Independence and the CAL1 activities as bars labeled task/cost, for example Drill Well/$1,000.00, Pour Foundation/$1,500.00, Install Pump/$500.00, and Erect Tower/$2,500.00.]
[Output, calendar CAL2: schedule calendar page for July 1996 with six columns, Monday through Saturday. It shows the holiday Vacation and the CAL2 activities Deliver Material/$500.00 and Excavate/$3,500.00.]
Example 4: Multiple Schedule Calendars with Atypical Workshifts (Combined and Mixed Output)
OUTPUT=COMBINE option
OUTPUT=MIX option
DUR statement
OUTSTART statement
OUTFIN statement
Data sets:
This example
- produces a schedule calendar
- schedules activities around holidays
Sort the activities data set by             And set OUTPUT= to   See Example
calendar identification and starting date   SEPARATE             3, 8
starting date                               COMBINE              4, 2
starting date                               MIX                  4
Specify the SAS data library where the activities data set is stored.
libname well SAS-data-library;
Sort the activities data set by the variable that contains the starting date. Do not sort
by the CALID variable when producing combined calendar output.
proc sort data=well.act;
by date;
run;
Set PAGESIZE= and LINESIZE= appropriately. When you combine calendars, check the
value of PAGESIZE= to ensure that there is enough room to print the activities from multiple
calendars. If LINESIZE= is too small for the variable values to print, then PROC CALENDAR
either truncates the values or produces no calendar output.
options nodate pageno=1 linesize=132 pagesize=60;
Create the schedule calendar. DATA= identifies the activities data set; HOLIDATA=
identifies the holidays data set; CALEDATA= identifies the calendar data set; WORKDATA=
identifies the workdays data set. DATETIME specifies that the variable specified with the
START statement contains values in SAS datetime format.
proc calendar data=well.act
holidata=well.hol
caledata=well.cal
workdata=well.wor
datetime;
Combine all events and holidays on a single calendar. The CALID statement specifies
that the _CAL_ variable identifies the calendars. OUTPUT=COMBINE prints multiple
calendars on the same page and identifies each calendar.
calid _cal_ / output=combine;
Specify an activity start date variable and an activity duration variable. The START
statement specifies the variable in the activities data set that contains the starting date of the
activities; DUR specifies the variable that contains the duration of each activity. START and
DUR are required for a schedule calendar.
start date;
dur dur;
Retrieve holiday information. HOLISTART and HOLIVAR specify the variables in the
holidays data set that contain the start date and name of each holiday, respectively.
HOLISTART is required when you use a holidays data set.
holistart date;
holivar holiday;
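For mixed output, the same pattern applies with OUTPUT=MIX in place of OUTPUT=COMBINE: the activities are sorted by starting date only, and the calendars are not identified in the report. The sketch below is self-contained and uses a hypothetical two-activity data set named ACT rather than WELL.ACT, with no holidays, calendar, or workdays data sets, so both calendars fall back to the default calendar.

```sas
/* Hypothetical miniature activities data: two calendars, CAL1 and CAL2. */
data act;
   input task $ dur date : date7. _cal_ $;
   datalines;
Drill 3 01JUL96 CAL1
Dig 2 02JUL96 CAL2
;

/* Mixed output, like combined output, is sorted by starting date only. */
proc sort data=act;
   by date;
run;

options nodate pageno=1 linesize=132 pagesize=60;

proc calendar data=act;
   calid _cal_ / output=mix;   /* print all calendars together, unlabeled */
   start date;
   dur dur;
run;
```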
Output 5.7 Multiple Schedule Calendars with Atypical Workshifts (Combined Output)
Well Drilling Work Schedule: Combined Calendars
[Output: combined schedule calendar for July 1996, Sunday through Saturday, with rows labeled CAL1 and CAL2 within each week. Independence appears on CAL1 and Vacation on CAL2. Activities appear as bars labeled task/cost, for example Drill Well/$1,000.00 and Build Pump House/$2,000.00 on CAL1 and Excavate/$3,500.00 on CAL2.]
Output 5.8 Multiple Schedule Calendar with Atypical Workshifts (Mixed Output)
Well Drilling Work Schedule: Mixed Calendars
[Output: mixed schedule calendar for July 1996, Monday through Saturday, with no calendar labels. The holidays Independence and Vacation and the activities from both calendars, such as Drill Well/$1,000.00, Deliver Material/$500.00, and Excavate/$3,500.00, appear together in the same weekly rows.]
This example produces a schedule calendar that displays only holidays. You can use
this same code to produce a set of blank calendars by removing the HOLIDATA= option
and the HOLISTART, HOLIVAR, and HOLIDUR statements from the PROC
CALENDAR step.
Program
Create the activities data set. Specify one activity in the first month and one in the last, each
with a duration of 0. PROC CALENDAR does not print activities with zero durations in the
output.
data acts;
input sta : date7. act $ 11-30 dur;
datalines;
01JAN97   Start                0
31DEC97   Finish               0
;
Set PAGESIZE= and LINESIZE= appropriately. To create larger boxes for each day in the
calendar output, increase the value of PAGESIZE=.
options nodate pageno=1 linesize=132 pagesize=30;
Create the calendar. DATA= identifies the activities data set; HOLIDATA= identifies the
holidays data set. FILL displays all months, even those with no activities. By default, only
months with activities appear in the report. INTERVAL=WORKDAY specifies that activities and
holidays are measured in 8-hour days and that PROC CALENDAR schedules activities only
Monday through Friday.
proc calendar data=acts holidata=holidays fill interval=workday;
Specify an activity start date variable and an activity duration variable. The START
statement specifies the variable in the activities data set that contains the starting date of the
activities; DUR specifies the variable that contains the duration of each activity. Creating a
schedule calendar requires START and DUR.
start sta;
dur dur;
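The blank-calendar variant mentioned at the start of this example can be sketched by leaving out the HOLIDATA= option and the holiday statements. The program below is assembled from the statements shown in this example and is a sketch under that assumption, not a separately tested program.

```sas
/* One zero-duration activity in the first month and one in the last. */
data acts;
   input sta : date7. act $ 11-30 dur;
   datalines;
01JAN97   Start                0
31DEC97   Finish               0
;

options nodate pageno=1 linesize=132 pagesize=30;

/* No HOLIDATA= option and no HOLISTART, HOLIVAR, or HOLIDUR statements. */
proc calendar data=acts fill interval=workday;
   start sta;
   dur dur;
run;
```

FILL is what makes the months between January and December appear even though they contain no activities.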
Output
[Calendar output: pages for January 1997 and December 1997. January 1 is marked as the New
Years holiday, and days in the last two weeks of December are marked Christmas Break. No
activities appear in the day boxes because both activities have a duration of 0.]
indicate nonwork days, weekly work schedules, and workshifts with holidays,
calendar, and workshift data sets.)
2 indicate which activities are successors to others (precedence relationships).
3 define resource limitations if you want them considered in the schedule.
4 provide an initial starting date.
PROC CPM can process your data to generate a data set that contains the start and
end dates for each activity. PROC CPM schedules the activities based on the duration
information, weekly work patterns, and workshifts, as well as holidays and nonwork days
that interrupt the schedule. You can generate several views of the schedule that is
computed by PROC CPM, from a simple listing of start and finish dates to a calendar, a
Gantt chart, or a network diagram.
See Also
This example introduces users of PROC CALENDAR to more advanced SAS
scheduling tools. For an introduction to project management tasks and tools and
several examples, see Project Management Using the SAS System. For more examples,
see SAS/OR Software: Project Management Examples. For complete reference
documentation, see SAS/OR User's Guide: Project Management.
Program
Set appropriate options. If the linesize is not long enough to print the variable values, then
PROC CALENDAR either truncates the values or produces no calendar output. A longer
linesize also makes it easier to view a listing of a PROC CPM output data set.
options nodate pageno=1 linesize=132 pagesize=60;
Create the activities data set and identify separate calendars. This data identifies two
calendars: the professor's (the value of _CAL_ is Prof.) and the student's (the value of _CAL_ is
Student). The Succ1 variable identifies which activity cannot begin until the current one ends.
For example, Analyze Exp 1 cannot begin until Run Exp 1 is completed. The DAYS value of 0
for JOBNUM 3, 6, and 8 indicates that these are milestones.
data grant;
   input jobnum Task $ 4-22 Days Succ1 $ 27-45 aldate : date7. altype $
         _cal_ $;
   format aldate date7.;
   datalines;
1  Run Exp 1           11 Analyze Exp 1       .       .  Student
2  Analyze Exp 1        5 Send Report 1       .       .  Prof.
3  Send Report 1        0 Run Exp 2           .       .  Prof.
4  Run Exp 2           11 Analyze Exp 2       .       .  Student
5  Analyze Exp 2        4 Send Report 2       .       .  Prof.
6  Send Report 2        0 Write Final Report  .       .  Prof.
7  Write Final Report   4 Send Final Report   .       .  Prof.
8  Send Final Report    0                     .       .  Student
9  Site Visit           1                     18jul96 ms Prof.
;
Create the holidays data set and identify which calendar a nonwork day belongs to.
The two holidays are listed twice, once for the professor's calendar and once for the student's.
Because each person is associated with a separate calendar, PROC CPM can apply the personal
vacation days to the appropriate calendars.
data nowork;
   format holista date7. holifin date7.;
   input holista : date7. holifin : date7. name $ 17-32 _cal_ $;
   datalines;
04jul96 04jul96 Independence Day Prof.
02sep96 02sep96 Labor Day        Prof.
04jul96 04jul96 Independence Day Student
02sep96 02sep96 Labor Day        Student
15jul96 16jul96 PROF Vacation    Prof.
15aug96 16aug96 STUDENT Vacation Student
;
Calculate the schedule with PROC CPM. PROC CPM uses information supplied in the
activities and holidays data sets to calculate start and finish dates for each activity. The DATE=
option supplies the starting date of the project. The CALID statement is not required, even
though this example includes two calendars, because the calendar identification variable has the
special name _CAL_.
proc cpm data=grant
         date='01jul96'd
         interval=weekday
         out=gcpm1
         holidata=nowork;
   activity task;
   successor succ1;
   duration days;
   calid _cal_;
   id task;
   aligndate aldate;
   aligntype altype;
   holiday holista / holifin=holifin;
run;
Print the output data set that was created with PROC CPM. This step is not required.
PROC PRINT is a useful way to view the calculations produced by PROC CPM. See Output 5.10.
proc print data=gcpm1;
   title 'Data Set GCPM1, Created with PROC CPM';
run;
Sort GCPM1 by the variable that contains the activity start dates before using it with
PROC CALENDAR.
proc sort data=gcpm1;
by e_start;
run;
Create the schedule calendar. GCPM1 is the activity data set. PROC CALENDAR uses the
E_START and E_FINISH dates, calculated by PROC CPM, to print the schedule. The VAR
statement selects only the variable TASK to display on the calendar output. See Output 5.11.
proc calendar data=gcpm1
              holidata=nowork
              interval=workday;
   start e_start;
   fin e_finish;
   calid _cal_ / output=combine;
   holistart holista;
   holifin holifin;
   holivar name;
   var task;
   title 'Schedule for Experiment X-15';
   title2 'Professor and Student Schedule';
run;
Output
Output 5.10
PROC PRINT displays the observations in GCPM1, showing the scheduling calculations created by PROC CPM.
Obs  Task                Succ1               Days  _cal_    E_START  E_FINISH  L_START  L_FINISH  T_FLOAT  F_FLOAT
  1  Run Exp 1           Analyze Exp 1         11  Student  01JUL96  16JUL96   01JUL96  16JUL96         0        0
  2  Analyze Exp 1       Send Report 1          5  Prof.    17JUL96  23JUL96   17JUL96  23JUL96         0        0
  3  Send Report 1       Run Exp 2              0  Prof.    24JUL96  24JUL96   24JUL96  24JUL96         0        0
  4  Run Exp 2           Analyze Exp 2         11  Student  24JUL96  07AUG96   24JUL96  07AUG96         0        0
  5  Analyze Exp 2       Send Report 2          4  Prof.    08AUG96  13AUG96   08AUG96  13AUG96         0        0
  6  Send Report 2       Write Final Report     0  Prof.    14AUG96  14AUG96   14AUG96  14AUG96         0        0
  7  Write Final Report  Send Final Report      4  Prof.    14AUG96  19AUG96   14AUG96  19AUG96         0        0
  8  Send Final Report                          0  Student  20AUG96  20AUG96   20AUG96  20AUG96         0        0
  9  Site Visit                                 1  Prof.    18JUL96  18JUL96   18JUL96  18JUL96         0        0
Output 5.11
PROC CALENDAR created this schedule calendar by using the E_START and E_FINISH dates that were
calculated by PROC CPM. The activities on July 24th and August 14th, because they are milestones, do not
delay the start of a successor activity. Note that Site Visit occurs on July 18, the same day that Analyze Exp 1
occurs. To prevent this overallocation of resources, you can use resource-constrained scheduling, available
in SAS/OR software.
[Schedule calendar output: pages for July 1996 and August 1996, with separate PROF. and STUDENT
rows in each week. The July page shows Run Exp 1, Site Visit, Analyze Exp 1, Send Report 1, and
the start of Run Exp 2, together with Independence Day and the PROF Vacation days. The August
page shows the rest of Run Exp 2, Analyze Exp 2, Send Report 2, Write Final Report, and Send
Final Report, together with the STUDENT Vacation days.]
CALID statement:
_CAL_ variable
OUTPUT=SEPARATE option
FORMAT statement
LABEL statement
MEAN statement
SUM statement
Other features:
PROC FORMAT:
PICTURE statement
This example
- produces a summary calendar
- displays holidays
- produces sum and mean values by business day (observation) for three variables
- prints a legend and uses variable labels
- uses picture formats to display values.
Program
Create the activities data set. MEALS records how many meals were served for breakfast,
lunch, and dinner on the days that the cafeteria was open for business.
data meals;
   input date : date7. Brkfst Lunch Dinner;
   datalines;
02Dec96 123 234 238
03Dec96 188 188 198
04Dec96 123 183 176
05Dec96 200 267 243
06Dec96 176 165 177
09Dec96 178 198 187
10Dec96 165 176 187
11Dec96 187 176 231
12Dec96 176 187 222
13Dec96 187 187 123
16Dec96 176 165 177
17Dec96 156   . 167
18Dec96 198 143 167
19Dec96 178 198 187
20Dec96 165 176 187
23Dec96 187 187 123
;
datalines;
26DEC96   Repairs
27DEC96   Repairs
30DEC96   Repairs
31DEC96   Repairs
24DEC96   Christmas Eve
25DEC96   Christmas
;
Sort the activities data set by the activity starting date. You are not required to sort the
holidays data set.
proc sort data=meals;
by date;
run;
Create picture formats for the variables that indicate how many meals were served.
proc format;
   picture bfmt other = '000 Brkfst';
   picture lfmt other = '000 Lunch ';
   picture dfmt other = '000 Dinner';
run;
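To see what a picture format of this kind does to a single value, a small sketch such as the following can help. The DATA step, the variable names N and TEXT, and the test values are illustrative assumptions and are not part of the original example.

```sas
/* Same picture as in the example: three digit selectors followed by literal text. */
proc format;
   picture bfmt other = '000 Brkfst';
run;

/* Apply the format with the PUT function and write the results to the SAS log. */
data _null_;
   do n = 5, 123;
      text = put(n, bfmt.);
      put n= text=;
   end;
run;
```

The 0 digit selectors do not print leading zeros, so a one-digit count is padded with blanks rather than zeros.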
Set PAGESIZE= and LINESIZE= appropriately. The legend box prints on the next page if
PAGESIZE= is not set large enough. LINESIZE= controls the width of the cells in the calendar.
options nodate pageno=1 linesize=132 pagesize=60;
Create the summary calendar. DATA= identifies the activities data set; HOLIDATA=
identifies the holidays data set. The START statement specifies the variable in the activities
data set that contains the activity starting date; START is required.
proc calendar data=meals holidata=closed;
start date;
Retrieve holiday information. The HOLISTART and HOLIVAR statements specify the
variables in the holidays data set that contain the start date and the name of each holiday,
respectively. HOLISTART is required when you use a holidays data set.
holistart date;
holiname holiday;
Calculate, label, and format the sum and mean values. The SUM and MEAN statements
calculate sum and mean values for three variables and print them with the specified format.
The LABEL statement prints a legend and uses labels instead of variable names. The FORMAT
statement associates picture formats with three variables.
sum brkfst lunch dinner / format=4.0;
mean brkfst lunch dinner / format=6.2;
label brkfst = 'Breakfasts Served'
      lunch  = 'Lunches Served'
      dinner = 'Dinners Served';
format brkfst bfmt.
       lunch lfmt.
       dinner dfmt.;
Output
Output 5.12
[Summary calendar output for December 1996. Each business day from December 2 through
December 23 shows the breakfast, lunch, and dinner counts in picture format. December 24
through 27 and December 30 and 31 are marked as holidays (Christmas Eve, Christmas, Repairs).]

                        Sum      Mean
Breakfasts Served      2763    172.69
Lunches Served         2830    188.67
Dinners Served         2990    186.88
LEGEND
CALID statement:
_CAL_ variable
OUTPUT=SEPARATE option
OUTSTART statement
OUTFIN statement
SUM statement
Data sets:
This example
3
3
3
3
And set
OUTPUT= to
See Example
SEPARATE
3, 8
starting date
COMBINE
4, 2
starting date
MIX
Program
Specify the SAS data library where the activities data set is stored.
libname well 'SAS-data-library';
run;
Sort the activities data set by the variables containing the calendar identification and
the starting date, respectively.
proc sort data=well.act;
by _cal_ date;
run;
Set PAGESIZE= and LINESIZE= appropriately. The legend box prints on the next page if
PAGESIZE= is not set large enough. LINESIZE= controls the width of the boxes.
options nodate pageno=1 linesize=132 pagesize=60;
Create the summary calendar. DATA= identifies the activities data set; HOLIDATA=
identifies the holidays data set; CALDATA= identifies the calendar data set; WORKDATA=
identifies the workdays data set. DATETIME specifies that the variable specified with the
START statement contains a SAS datetime value. LEGEND prints text that identifies the
variables.
proc calendar data=well.act
holidata=well.hol
datetime legend;
Print each calendar on a separate page. The CALID statement specifies that the _CAL_
variable identifies calendars. OUTPUT=SEPARATE prints information for each calendar on
separate pages.
calid _cal_ / output=separate;
Specify an activity start date variable and retrieve holiday information. The START
statement specifies the variable in the activities data set that contains the activity starting
date. The HOLISTART and HOLIVAR statements specify the variables in the holidays data set
that contain the start date and name of each holiday, respectively. These statements are
required when you use a holidays data set.
start date;
holistart date;
holivar holiday;
Calculate sum values. The SUM statement totals the COST variable for all observations in
each calendar.
sum cost / format=dollar10.2;
Display a 6-day week. OUTSTART and OUTFIN specify that the calendar display a 6-day
week, Monday through Saturday.
outstart Monday;
outfin Saturday;
Output
Output 5.13
[Summary calendar output: one page per calendar, each showing July 1996 as a six-day week,
Monday through Saturday. Each activity box lists the task, its duration, and its cost. The
legend identifies the task, dur, and cost variables. The cost sum is $11,500.00 on the first
page and $4,000.00 on the second page, which also marks a Vacation holiday.]
CHAPTER 6
The CATALOG Procedure
Overview: CATALOG Procedure 153
Syntax: CATALOG Procedure 154
PROC CATALOG Statement 155
CHANGE Statement 157
CONTENTS Statement 157
COPY Statement 158
DELETE Statement 160
EXCHANGE Statement 161
EXCLUDE Statement 162
MODIFY Statement 163
SAVE Statement 164
SELECT Statement 165
Concepts: CATALOG Procedure 165
Interactive Processing with RUN Groups 165
Definition 165
How to End a PROC CATALOG Step 166
Error Handling and RUN Groups 166
Specifying an Entry Type 166
Four Ways to Supply an Entry Type 166
Why Use the ENTRYTYPE= Option? 167
Avoid a Common Error 167
The ENTRYTYPE= Option 167
Catalog Concatenation 168
Restrictions 168
Results: CATALOG Procedure 169
Examples: CATALOG Procedure 170
Example 1: Copying, Deleting, and Moving Catalog Entries from Multiple Catalogs 170
Example 2: Displaying Contents, Changing Names, and Changing a Description 174
Example 3: Using the FORCE Option with the KILL Option 176
For more information on SAS data libraries and catalogs, refer to SAS Language
Reference: Concepts.
To learn how to use the SAS windowing environment to manage entries in a SAS
catalog, see the SAS online Help for the SAS Explorer window. You may prefer to use
the Explorer window instead of using PROC CATALOG. The window can do most of
what the procedure does.
Supports the Output Delivery System. See "Output Delivery System" on page 32
for details.
Tip: You can perform similar functions with the SAS Explorer window and with
dictionary tables in the SQL procedure. For information on the Explorer window, see
the online Help. For information on PROC SQL, see Chapter 44, "The SQL Procedure,"
on page 1027.
To do this
COPY, SELECT
COPY, EXCLUDE
To do this
SAVE
CHANGE
EXCHANGE
MODIFY
CONTENTS
To do this
ENTRYTYPE=
KILL
FORCE
Required Arguments
CATALOG=<libref.>catalog
CAT=, C=
the catalog.
Options
ENTRYTYPE=etype
restricts processing of the current PROC CATALOG step to one entry type.
Alias:
ET=
catalog.
Interaction: The specified entry type applies to any one-level entry names used in a
Tip: In order to process multiple entry types in a single PROC CATALOG step, use
ENTRYTYPE= in a subordinate statement, not in the PROC CATALOG statement.
FORCE
COPY
COPY MOVE
SAVE
Tip:
Featured in:
KILL
is specified.
Interaction: The SAVE statement has no effect because the KILL option deletes all
Tip:
Featured in:
CAUTION:
Do not attempt to limit the effects of the KILL option. This option deletes all entries in a
SAS catalog before any option or other statement takes effect.
CHANGE Statement
Renames one or more catalog entries.
Tip: You can change multiple names in a single CHANGE statement or use multiple
CHANGE statements.
CHANGE old-name-1=new-name-1
<old-name-n=new-name-n>
</ ENTRYTYPE=etype>;
Required Arguments
old-name=new-name
specifies the current name of a catalog entry and the new name you want to assign to
it. Specify any valid SAS name.
Restriction: You must designate the type of the entry, either with the name
Options
ENTRYTYPE=etype
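As an illustration of the CHANGE syntax above, the following sketch renames two entries in one statement. The catalog WORK.MYCAT and the entry names are hypothetical; ENTRYTYPE= after the slash supplies the type for the one-level names.

```sas
/* Hypothetical catalog and entry names, for illustration only. */
proc catalog catalog=work.mycat;
   /* Rename two PROGRAM entries in a single CHANGE statement. */
   change oldmenu=mainmenu
          oldrpt=monthrpt / entrytype=program;
run;
quit;
```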
CONTENTS Statement
Lists the contents of a catalog in the procedure output or writes a list of the contents to a SAS
data set, an external file, or both.
Featured in: Example 2 on page 174
Without Options
The output is sent to the procedure output.
Options
Note: The ENTRYTYPE= (ET=) option is not available for the CONTENTS
statement.
CATALOG=<libref.>catalog
sends the contents to a SAS data set. When the statement executes, a message on
the SAS log reports that a data set has been created. The data set contains six
variables in this order:
LIBNAME
the libref
MEMNAME
NAME
TYPE
DESC
DATE
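A minimal sketch of writing the contents listing to a data set and then inspecting it follows. The catalog WORK.MYCAT and the output data set WORK.CATLIST are hypothetical names chosen for illustration.

```sas
/* Hypothetical catalog and data set names, for illustration only. */
proc catalog catalog=work.mycat;
   /* Write the list of entries to a data set instead of the procedure output. */
   contents out=work.catlist;
run;
quit;

/* View the variables that describe each entry (LIBNAME, MEMNAME, NAME, and so on). */
proc print data=work.catlist;
run;
```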
COPY Statement
Copies some or all of the entries in one catalog to another catalog.
Restriction: A COPY statement's effect ends at a RUN statement or at the beginning of a
statement other than the SELECT or EXCLUDE statement.
Tip: Use SELECT or EXCLUDE statements, but not both, after the COPY statement to
limit which entries are copied.
Tip: You can copy entries from multiple catalogs in a single PROC step, not just the one
specified in the PROC CATALOG statement.
Tip: The ENTRYTYPE= option does not require a forward slash (/) in this statement.
Featured in: Example 1 on page 170
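The tips above can be sketched as follows: one COPY statement limited by SELECT, and a second COPY that reads from a different catalog through IN=. All catalog and entry names here are hypothetical.

```sas
/* Hypothetical catalog and entry names, for illustration only. */
proc catalog catalog=work.source;
   /* Copy only two PROGRAM entries from WORK.SOURCE to WORK.TARGET. */
   copy out=work.target;
   select report1 report2 / entrytype=program;
run;
   /* Copy from a catalog other than the one named in the PROC CATALOG */
   /* statement, leaving out one entry.                                */
   copy in=work.other out=work.target;
   exclude scratch.log;
run;
quit;
```

Each COPY statement is followed by either SELECT or EXCLUDE, never both, and each ends at its own RUN statement.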
To do this
ENTRYTYPE=
IN=
MOVE
NEW
NOEDIT
NOSOURCE
Required Arguments
OUT=<libref.>catalog
Options
ENTRYTYPE=etype
restricts processing to one entry type for the current COPY statement and any
subsequent SELECT or EXCLUDE statements.
See: "The ENTRYTYPE= Option" on page 167
See also: "Specifying an Entry Type" on page 166
IN=<libref.>catalog
MOVE
deletes the original catalog or entries after the new copy is made.
Interaction: When MOVE removes all entries from a catalog, the procedure deletes
NEW
overwrites the destination (specified by OUT=) if it already exists. If you omit NEW,
PROC CATALOG updates the destination. For information about using the NEW
option with concatenated catalogs, see "Catalog Concatenation" on page 168.
NOEDIT
prevents the copied version of the following SAS/AF entry types from being edited by
the BUILD procedure:
CBT
PROGRAM
FRAME
SCL
HELP
SYSTEM
MENU
Restriction: If you specify the NOEDIT option for an entry that is not one of these
types, it is ignored.
Tip: When creating SAS/AF applications for other users, use NOEDIT to protect the
application by preventing certain catalog entries from being altered.
NOSOURCE
omits copying the source lines when you copy a SAS/AF PROGRAM, FRAME, or SCL
entry.
Alias:
NOSRC
Restriction: If you specify this option for an entry other than a PROGRAM,
DELETE Statement
Deletes entries from a SAS catalog.
Tip: Use DELETE to delete only a few entries; use SAVE when it is more convenient to
specify which entries not to delete.
Tip: You can specify multiple entries. You can also use multiple DELETE statements.
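The contrast between DELETE and SAVE described in the tips above might be sketched like this. The catalogs WORK.MYCAT and WORK.SCRATCH and the entry names are hypothetical.

```sas
/* Hypothetical catalog and entry names, for illustration only. */

/* DELETE: name the few entries to remove. */
proc catalog catalog=work.mycat;
   delete temp1 temp2 / entrytype=log;
run;
quit;

/* SAVE: name the few entries to keep; all other entries are deleted. */
proc catalog catalog=work.scratch;
   save mainmenu.program;
run;
quit;
```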
Required Arguments
entry(s)
Options
ENTRYTYPE=etype
EXCHANGE Statement
Switches the names of two catalog entries.
Restriction:
EXCHANGE name-1=other-name-1
<name-n=other-name-n>
</ ENTRYTYPE=etype>;
Required Arguments
name=other-name
specifies two catalog entry names that the procedure will switch.
Interaction: You can specify only the entry name without the entry type if you use
the ENTRYTYPE= option on either the PROC CATALOG statement or the
EXCHANGE statement.
See also: Specifying an Entry Type on page 166
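For instance, the following sketch swaps the names of two entries of the same type. The catalog and entry names are hypothetical; the entry type is supplied once, after the slash.

```sas
/* Hypothetical catalog and entry names, for illustration only. */
proc catalog catalog=work.mycat;
   /* After this RUN group, the entry that was DRAFT is named FINAL and vice versa. */
   exchange draft=final / entrytype=program;
run;
quit;
```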
Options
ENTRYTYPE=etype
EXCLUDE Statement
Specifies entries that the COPY statement does not copy.
Restriction: Requires the COPY statement.
Restriction: Do not use the EXCLUDE statement with the SELECT statement.
Tip: You can use multiple EXCLUDE statements with a single COPY statement within
a RUN group.
See also: "COPY Statement" on page 158 and "SELECT Statement" on page 165
Required Arguments
entry(s)
Options
ENTRYTYPE=etype
MODIFY Statement
Changes the description of a catalog entry.
Featured in: Example 2 on page 174
Required Arguments
entry
specifies the name of one SAS catalog entry. Optionally, you can specify the entry
type with the name.
Restriction: You must designate the type of the entry, either when you specify the
DESC
Use DESCRIPTION= with no text to remove the current description.
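A short sketch of both uses of DESCRIPTION= follows: setting a new description and removing an existing one. The catalog, entry names, and description text are hypothetical.

```sas
/* Hypothetical catalog and entry names, for illustration only. */
proc catalog catalog=work.mycat;
   /* Replace the description of one entry. */
   modify mainmenu.program (description='Main menu for the reporting application');
   /* DESCRIPTION= with no text removes the current description. */
   modify oldmenu.program (description=);
run;
quit;
```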
Options
ENTRYTYPE=etype
SAVE Statement
Specifies entries not to delete from a SAS catalog.
Tip: Use SAVE to delete all but a few entries in a catalog. Use DELETE when it is
more convenient to specify which entries to delete.
Tip: You can specify multiple entries and use multiple SAVE statements.
See also: "DELETE Statement" on page 160
Required Arguments
entry(s)
Options
ENTRYTYPE=etype
SELECT Statement
Specifies entries that the COPY statement will copy.
Tip: You can use multiple SELECT statements with a single COPY statement within a
RUN group.
See also: COPY Statement on page 158 and EXCLUDE Statement on page 162
Required Arguments
entry(s)
Options
ENTRYTYPE=etype
Definition
The CATALOG procedure is interactive. Once you submit a PROC CATALOG
statement, you can continue to submit and execute statements or groups of statements
without repeating the PROC CATALOG statement.
A set of procedure statements ending with a RUN statement is called a RUN group.
The changes specified in a given group of statements take effect when a RUN statement
is encountered.
Note: When you enter a QUIT, DATA, or PROC statement, any statements following
the last RUN group execute before the CATALOG procedure terminates. If you enter a
RUN statement with the CANCEL option, however, the remaining statements do not
execute before the procedure ends.
See Example 2 on page 174.
Be careful when setting up batch jobs in which one RUN group's statements depend on the
effects of a previous RUN group, especially when deleting and renaming entries.
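The RUN-group behavior described above can be sketched as follows. The catalog WORK.DEMO and the entry names are hypothetical, and the setup assumes that the CATALOG access method of the FILENAME statement is available to create SOURCE entries.

```sas
/* Setup (hypothetical names): create two SOURCE entries in WORK.DEMO. */
%macro make_entry(name);
   filename e catalog "work.demo.&name..source";
   data _null_;
      file e;
      put "placeholder text";
   run;
   filename e clear;
%mend make_entry;

%make_entry(alpha)
%make_entry(beta)

proc catalog catalog=work.demo;
   change alpha=gamma / entrytype=source;
run;                 /* RUN group 1: the rename takes effect here      */

   delete beta / entrytype=source;
run cancel;          /* RUN group 2: canceled, so BETA is not deleted  */

   contents;         /* no RUN statement follows this statement        */
quit;                /* CONTENTS executes before the procedure ends    */
```

Under these assumptions, the final listing should show GAMMA and BETA: the rename ran in its own RUN group, and the DELETE was discarded by RUN CANCEL.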
Example
delete test1.program test1.log test2.log;
ET= in parentheses
delete test1 (et=program);
ENTRYTYPE= without a slash (2)
(1) in a subordinate statement
(2) in the PROC CATALOG or the COPY statement
Note: All statements, except the CONTENTS statement, accept the ENTRYTYPE=
(alias ET=) option.
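The different ways of giving an entry type can be compared side by side in a short sketch. The catalog WORK.DEMO and the entries T1 through T6 are hypothetical, and the setup assumes that the CATALOG access method of the FILENAME statement is available to create SOURCE entries.

```sas
/* Setup (hypothetical names): create six SOURCE entries in WORK.DEMO. */
%macro make_entry(name);
   filename e catalog "work.demo.&name..source";
   data _null_;
      file e;
      put "placeholder text";
   run;
   filename e clear;
%mend make_entry;

%make_entry(t1)
%make_entry(t2)
%make_entry(t3)
%make_entry(t4)
%make_entry(t5)
%make_entry(t6)

proc catalog catalog=work.demo;
   delete t1.source;              /* type as part of the entry name   */
   delete t2 (et=source);         /* ET= in parentheses               */
   delete t3 t4 / et=source;      /* ET= after a slash                */
run;
quit;

/* ENTRYTYPE= without a slash, on the PROC CATALOG statement. */
proc catalog catalog=work.demo entrytype=source;
   delete t5 t6;
run;
quit;
```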
Catalog Concatenation
The CATALOG procedure supports both implicit and explicit concatenation of
catalogs. All statements and options that can be used on single (unconcatenated)
catalogs can be used on catalog concatenations.
Restrictions
When you use the CATALOG procedure to copy concatenated catalogs and you use
the NEW option, the following rules apply:
1 If the input catalog is a concatenation and if the output catalog exists in any level
of the input concatenation, the copy is not allowed.
2 If the output catalog is a concatenation and if the input catalog exists in the first
level of the output concatenation, the copy is not allowed.
For example, the following code demonstrates these two rules, and the copy fails:
libname first 'path-name1';
libname second 'path-name2';
/* create concat.x */
libname concat (first second);
/* fails rule #1 */
proc catalog c=concat.x;
copy out=first.x new;
run;
quit;
/* fails rule #2 */
proc catalog c=first.x;
copy out=concat.x new;
run;
quit;
In summary, the following table shows when copies are allowed. In the table, A and
B are libraries, and each contains catalog X. Catalog C is an implicit concatenation of A
and B, and catalog D is an implicit concatenation of B and A.
Input catalog    Output catalog    Copy allowed?
C.X              B.X               No
C.X              D.X               No
D.X              C.X               No
A.X              A.X               No
A.X              B.X               Yes
B.X              A.X               Yes
C.X              A.X               No
B.X              C.X               Yes
A.X              C.X               No
Catalog_Random
Catalog_Sequential
This example
Program
Input Catalogs
The SAS catalog PERM.SAMPLE contains the following entries:
Name        Type
DEFAULT     FORM
FSLETTER    FORM
LOAN        FRAME
LOAN        HELP
BUILD       KEYS
LOAN        KEYS
CREDIT      LOG
TEST1       LOG
TEST2       LOG
TEST3       LOG
LOAN        PMENU
CREDIT      PROGRAM
TEST1       PROGRAM
TEST2       PROGRAM
TEST3       PROGRAM
LOAN        SCL
PASSIST     SLIST
PRTINFO     XPRINTER
The SAS catalog PERM.FORMATS contains the following entries:
Name        Type       Description
REVENUE     FORMAT     FORMAT:MAXLEN=16,16,12
DEPT        FORMATC    FORMAT:MAXLEN=1,1,14
Program
Set the SAS system options. Write the source code to the log by specifying the SOURCE SAS
system option.
options nodate pageno=1 linesize=80 pagesize=60 source;
Assign a library reference to a SAS data library. The LIBNAME statement assigns the
libref PERM to the SAS data library that contains a permanent SAS catalog.
libname perm 'SAS-data-library';
Copy everything except three LOG entries and PASSIST.SLIST from PERM.SAMPLE
to WORK.TESTCAT. The EXCLUDE statement specifies which entries not to copy. ET=
specifies a default type. (ET=) specifies an exception to the default type.
copy out=testcat;
exclude test1 test2 test3 passist (et=slist) / et=log;
run;
Copy two formats from PERM.FORMATS to PERM.FINANCE. The IN= option enables
you to copy from a different catalog than the one specified in the PROC CATALOG statement.
Note the entry types for numeric and character formats: REVENUE.FORMAT is a numeric
format and DEPT.FORMATC is a character format. The COPY and SELECT statements execute
before the QUIT statement ends the PROC CATALOG step.
copy in=perm.formats out=perm.finance;
select revenue.format dept.formatc;
quit;
Log
1    libname perm 'SAS-data-library';
NOTE: Directory for library PERM contains files of mixed engine types.
NOTE: Libref PERM was successfully assigned as follows:
      Engine:        V9
      Physical Name: SAS-data-library
2    options nodate pageno=1 linesize=80 pagesize=60 source;
3    proc catalog cat=perm.sample;
4    delete credit.program credit.log;
5    run;
NOTE: Deleting entry CREDIT.PROGRAM in catalog PERM.SAMPLE.
NOTE: Deleting entry CREDIT.LOG in catalog PERM.SAMPLE.
6    copy out=tcatall;
7    run;
NOTE: Copying entry DEFAULT.FORM from catalog PERM.SAMPLE to catalog
WORK.TCATALL.
NOTE: Copying entry FSLETTER.FORM from catalog PERM.SAMPLE to catalog
WORK.TCATALL.
NOTE: Copying entry LOAN.FRAME from catalog PERM.SAMPLE to catalog WORK.TCATALL.
NOTE: Copying entry LOAN.HELP from catalog PERM.SAMPLE to catalog WORK.TCATALL.
NOTE: Copying entry BUILD.KEYS from catalog PERM.SAMPLE to catalog WORK.TCATALL.
NOTE: Copying entry LOAN.KEYS from catalog PERM.SAMPLE to catalog WORK.TCATALL.
NOTE: Copying entry TEST1.LOG from catalog PERM.SAMPLE to catalog WORK.TCATALL.
NOTE: Copying entry TEST2.LOG from catalog PERM.SAMPLE to catalog WORK.TCATALL.
NOTE: Copying entry TEST3.LOG from catalog PERM.SAMPLE to catalog WORK.TCATALL.
NOTE: Copying entry LOAN.PMENU from catalog PERM.SAMPLE to catalog WORK.TCATALL.
NOTE: Copying entry TEST1.PROGRAM from catalog PERM.SAMPLE to catalog
WORK.TCATALL.
NOTE: Copying entry TEST2.PROGRAM from catalog PERM.SAMPLE to catalog
WORK.TCATALL.
NOTE: Copying entry TEST3.PROGRAM from catalog PERM.SAMPLE to catalog
WORK.TCATALL.
NOTE: Copying entry LOAN.SCL from catalog PERM.SAMPLE to catalog WORK.TCATALL.
NOTE: Copying entry PASSIST.SLIST from catalog PERM.SAMPLE to catalog
WORK.TCATALL.
NOTE: Copying entry PRTINFO.XPRINTER from catalog PERM.SAMPLE to catalog
WORK.TCATALL.
8    copy out=testcat;
9    exclude test1 test2 test3 passist (et=slist) / et=log;
10   run;
NOTE: Copying entry DEFAULT.FORM from catalog PERM.SAMPLE to catalog
WORK.TESTCAT.
NOTE: Copying entry FSLETTER.FORM from catalog PERM.SAMPLE to catalog
WORK.TESTCAT.
NOTE: Copying entry LOAN.FRAME from catalog PERM.SAMPLE to catalog WORK.TESTCAT.
NOTE: Copying entry LOAN.HELP from catalog PERM.SAMPLE to catalog WORK.TESTCAT.
NOTE: Copying entry BUILD.KEYS from catalog PERM.SAMPLE to catalog WORK.TESTCAT.
NOTE: Copying entry LOAN.KEYS from catalog PERM.SAMPLE to catalog WORK.TESTCAT.
NOTE: Copying entry LOAN.PMENU from catalog PERM.SAMPLE to catalog WORK.TESTCAT.
NOTE: Copying entry TEST1.PROGRAM from catalog PERM.SAMPLE to catalog
WORK.TESTCAT.
NOTE: Copying entry TEST2.PROGRAM from catalog PERM.SAMPLE to catalog
WORK.TESTCAT.
NOTE: Copying entry TEST3.PROGRAM from catalog PERM.SAMPLE to catalog
WORK.TESTCAT.
NOTE: Copying entry LOAN.SCL from catalog PERM.SAMPLE to catalog WORK.TESTCAT.
NOTE: Copying entry PRTINFO.XPRINTER from catalog PERM.SAMPLE to catalog
WORK.TESTCAT.
11   copy out=logcat move;
12   select test1 test2 test3 / et=log;
13   run;
NOTE: Moving entry TEST1.LOG from catalog PERM.SAMPLE to catalog WORK.LOGCAT.
NOTE: Moving entry TEST2.LOG from catalog PERM.SAMPLE to catalog WORK.LOGCAT.
NOTE: Moving entry TEST3.LOG from catalog PERM.SAMPLE to catalog WORK.LOGCAT.
14   copy out=perm.finance noedit;
15   select loan.frame loan.help loan.keys loan.pmenu;
16   run;
NOTE: Copying entry LOAN.FRAME from catalog PERM.SAMPLE to catalog PERM.FINANCE.
NOTE: Copying entry LOAN.HELP from catalog PERM.SAMPLE to catalog PERM.FINANCE.
NOTE: Copying entry LOAN.KEYS from catalog PERM.SAMPLE to catalog PERM.FINANCE.
NOTE: Copying entry LOAN.PMENU from catalog PERM.SAMPLE to catalog PERM.FINANCE.
17   copy in=perm.formats out=perm.finance;
18   select revenue.format dept.formatc;
19   quit;
NOTE: Copying entry REVENUE.FORMAT from catalog PERM.FORMATS to catalog
PERM.FINANCE.
NOTE: Copying entry DEPT.FORMATC from catalog PERM.FORMATS to catalog
PERM.FINANCE.
This example
- lists the entries in a catalog and routes the output to a file
- changes entry names
- changes entry descriptions
Program
Set the SAS system options. The system option SOURCE writes the source code to the log.
options nodate pageno=1 linesize=80 pagesize=60 source;
Assign a library reference. The LIBNAME statement assigns a libref to the SAS data library
that contains a permanent SAS catalog.
libname perm 'SAS-data-library';
List the entries in a catalog and route the output to a file. The CONTENTS statement
creates a listing of the contents of the SAS catalog PERM.FINANCE and routes the output to a
file.
proc catalog catalog=perm.finance;
contents;
title1 'Contents of PERM.FINANCE before changes are made';
run;
Change entry names. The CHANGE statement changes the name of an entry that contains a
user-written character format. (ET=) specifies the entry type.
change dept=deptcode (et=formatc);
run;
Process entries in multiple run groups. The MODIFY statement changes the description of
an entry. The CONTENTS statement creates a listing of the contents of PERM.FINANCE after
all the changes have been applied. QUIT ends the procedure.
modify loan.frame (description='Loan analysis app. - ver1');
contents;
title1 'Contents of PERM.FINANCE after changes are made';
run;
quit;
Output
This example
- creates a resource environment
- tries to delete all catalog entries by using the KILL option but receives an error
- specifies the FORCE option to successfully delete all catalog entries by using the
  KILL option.
Program
Start a process (resource environment) by opening the catalog entry MATT in the
WORK.SASMACR catalog.
%macro matt;
%put &syscc;
%mend matt;
Specify the KILL option to delete all catalog entries in WORK.SASMACR. Since there is
a resource environment (process using the catalog), KILL will not work and an error is sent to
the log.
proc catalog c=work.sasmacr kill;
run;
quit;
Log
ERROR: You cannot open WORK.SASMACR.CATALOG for update access because
WORK.SASMACR.CATALOG is in use by you in resource environment
Line Mode Process.
WARNING: Command CATALOG not processed because of errors noted above.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE CATALOG used (Total process time):
      real time           0.04 seconds
      cpu time            0.03 seconds
Log
NOTE: Deleting entry MATT.MACRO in catalog WORK.SASMACR.
CHAPTER 7
The CHART Procedure
Overview: CHART Procedure 179
What Does the CHART Procedure Do? 179
What Types of Charts Can PROC CHART Create? 180
Syntax: CHART Procedure 184
PROC CHART Statement 185
BLOCK Statement 187
BY Statement 188
HBAR Statement 189
PIE Statement 189
STAR Statement 190
VBAR Statement 191
Customizing All Types of Charts 191
Concepts: CHART Procedure 197
Results: CHART Procedure 197
Missing Values 197
ODS Table Names 198
Portability of ODS Output with PROC CHART 198
Examples: CHART Procedure 198
Example 1: Producing a Simple Frequency Count 198
Example 2: Producing a Percentage Bar Chart 201
Example 3: Subdividing the Bars into Categories 203
Example 4: Producing Side-by-Side Bar Charts 206
Example 5: Producing a Horizontal Bar Chart for a Subset of the Data 209
Example 6: Producing Block Charts for BY Groups 210
References 213
same types of charts as PROC CHART does. In addition, PROC GCHART can produce
donut charts.
Output 7.1
[Vertical bar chart: Count Sum (vertical axis, 50 to 200) by Response (Always, Usually, Sometimes, Rarely, Never)]
Output 7.2 shows the same data presented in a horizontal bar chart. The two types
of bar charts have essentially the same characteristics, except that horizontal bar
charts by default display a table of statistic values to the right of the bars. The
following statements produce the output:
[Horizontal bar chart: Count Sum (axis 20 to 200) by Response, with a table of Count Sum values to the right of the bars: Always 106.0000, Usually 202.0000, Sometimes 119.0000, Rarely 97.0000, Never 44.0000]
Block Charts
Block charts display the relative magnitude of data by using blocks of varying height,
each set in a square that represents a category of data. Output 7.3 shows the number
of each survey response in the form of a block chart.
options nodate pageno=1 linesize=80
pagesize=30;
proc chart data=survey;
block response / sumvar=count
midpoints='Always' 'Usually'
'Sometimes' 'Rarely' 'Never';
run;
Output 7.3 Block Chart
[Block chart titled "The SAS System": one block for each Response category, including Usually, Sometimes, Rarely, and Never]
Pie Charts
Pie charts represent the relative contribution of parts to the whole by displaying data
as wedge-shaped slices of a circle. Each slice represents a category of the data. Output
7.4 shows the survey results divided by response into five pie slices. The following
statements produce the output:
options nodate pageno=1 linesize=80
pagesize=35;
proc chart data=survey;
pie response / sumvar=count;
run;
[Pie chart of Count Sum by Response: Never 44 (7.75%), Rarely 97 (17.08%), Always 106 (18.66%), Sometimes 119 (20.95%), Usually 202 (35.56%)]
Star Charts
With PROC CHART, you can produce star charts that show group frequencies, totals,
or mean values. A star chart is similar to a vertical bar chart, but the bars on a star
chart radiate from a center point, like spokes in a wheel. Star charts are commonly used
for cyclical data, such as measures taken every month or day or hour, or for data like
these in which the categories have an inherent order (always meaning more frequent
than usually which means more frequent than sometimes). Output 7.5 shows the
survey data displayed in a star chart. The following statements produce the output:
options nodate pageno=1 linesize=80
pagesize=60;
proc chart data=survey;
star response / sumvar=count;
run;
Output 7.5 Star Chart
[Star chart titled "The SAS System": Center = 0, Outside = 202; points for Never 44, Rarely 97, Sometimes 119, Usually 202, Always 106]
Options
DATA=SAS-data-set
FORMCHAR <(position(s))>='formatting-character(s)'
defines the characters to use for constructing the horizontal and vertical axes,
reference lines, and other structural parts of a chart. It also defines the symbols to
use to create the bars, blocks, or sections in the output.
position(s)
identifies the position of one or more characters in the SAS formatting-character
string. A space or a comma separates the positions.
Default: Omitting (position(s)) is the same as specifying all 20 possible SAS
formatting characters, in order.
Range: PROC CHART uses 6 of the 20 formatting characters that SAS provides.
Table 7.1 on page 186 shows the formatting characters that PROC CHART uses.
Figure 7.1 on page 186 illustrates the use of formatting characters commonly
used in PROC CHART.
formatting-character(s)
lists the characters to use for the specified positions. PROC CHART assigns
characters in formatting-character(s) to position(s), in the order that they are
listed. For instance, the following option assigns the asterisk (*) to the second
formatting character, the pound sign (#) to the seventh character, and does not
alter the remaining characters:
formchar(2,7)='*#'
Interaction: The SAS system option FORMCHAR= specifies the default formatting
characters. The system option defines the entire string of formatting characters.
The FORMCHAR= option in a procedure can redefine selected characters.
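As a minimal sketch of the FORMCHAR= option in context, the following step redefines three positions for a single chart. The SURVEY data set is re-created here from the counts shown in the overview so that the step is self-contained. The choice of positions 1, 2, and 7 with plain ASCII characters is an assumption made for illustration.

```sas
/* Survey counts taken from the overview charts. */
data survey;
   length response $ 9;
   input response count;
   datalines;
Always 106
Usually 202
Sometimes 119
Rarely 97
Never 44
;

/* Redefine formatting characters 1, 2, and 7 for this procedure step only. */
proc chart data=survey formchar(1,2,7)='|-+';
   vbar response / sumvar=count;
run;
```

Characters at positions that are not listed keep the values set by the FORMCHAR= system option.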
Tip:
formchar(2,7)='2D7C'x
See also: For information on which hexadecimal codes to use for which characters,
Default    Used to draw
Vertical axes in bar charts, the sides of the blocks in block charts, and
reference lines in horizontal bar charts. In side-by-side bar charts, the first
and second formatting characters appear around each value of the group
variable (below the chart) to indicate the width of each group.
Horizontal axes in bar charts, the horizontal lines that separate the blocks
in a block chart, and reference lines in vertical bar charts. In side-by-side
bar charts, the first and second formatting characters appear around each
value of the group variable (below the chart) to indicate the width of each
group.
Tick marks in bar charts and the centers in pie and star charts.
16    Ends of blocks and the diagonal lines that separate blocks in a block chart.
20
[Figure 7.1: side-by-side vertical bar chart of Pies_Sold Mean (axis 100 to 400, with reference lines at 100, 200, and 300) by Flavor (apple, blueberry, cherry, rhubarb), grouped by Bakery; the group label shown is Clyde]
LPI=value
specifies the proportions of PIE and STAR charts. The value is determined by
(lines per inch / columns per inch) * 10
For example, if you have a printer with 8 lines per inch and 12 columns per inch,
then specify LPI=6.6667.
Default: 6
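The LPI= arithmetic can be checked with a short DATA step before the value is used on the PROC CHART statement. This is a sketch only. The printer characteristics are the ones from the example above, the variable names are made up for illustration, and the SURVEY data set is re-created from the counts shown in the overview.

```sas
/* Printer from the example above: 8 lines per inch, 12 columns per inch. */
data _null_;
   lines_per_inch   = 8;
   columns_per_inch = 12;
   lpi = (lines_per_inch / columns_per_inch) * 10;
   put 'Suggested LPI= value: ' lpi 8.4;   /* writes 6.6667 to the log */
run;

data survey;
   length response $ 9;
   input response count;
   datalines;
Always 106
Usually 202
Sometimes 119
Rarely 97
Never 44
;

/* Use the computed value so that the pie prints as a circle on that printer. */
proc chart data=survey lpi=6.6667;
   pie response / sumvar=count;
run;
```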
BLOCK Statement
Produces a block chart.
Featured in: Example 6 on page 210
Required Arguments
variable(s)
specifies the variables for which PROC CHART produces a block chart, one chart for
each variable.
Options
The options available on the BLOCK, HBAR, PIE, STAR, and VBAR statements are
documented in Customizing All Types of Charts on page 191.
Statement Results
Because each block chart must fit on one output page, you may have to adjust the
SAS system options LINESIZE= and PAGESIZE= if you have a large number of charted
values for the BLOCK variable and for the variable specified in the GROUP= option.
Table 7.2 on page 187 shows the maximum number of charted values of BLOCK
variables for selected LINESIZE= (LS=) specifications that can fit on a 66-line page.
Table 7.2 Maximum Number of Bars of BLOCK Variables
GROUP= Value    LS= 132    LS= 120    LS= 105    LS= 90    LS= 76    LS= 64
0,1
5,6
If the value of any GROUP= level is longer than three characters, then the maximum
number of charted values for the BLOCK variable that can fit might be reduced by one.
BLOCK level values truncate to 12 characters. If you exceed these limits, then PROC
CHART produces a horizontal bar chart instead.
BY Statement
Produces a separate chart for each BY group.
Main discussion: BY on page 58
Featured in:
Required Arguments
variable
specifies the variable that the procedure uses to form BY groups. You can specify
more than one variable. If you do not use the NOTSORTED option in the BY
statement, then the observations in the data set must either be sorted by all the
variables that you specify, or they must be indexed appropriately. Variables in a BY
statement are called BY variables.
Options
DESCENDING
specifies that the observations are sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The observations are grouped in another way, for example, chronological order.
The requirement for ordering or indexing observations according to the values of
BY variables is suspended for BY-group processing when you use the NOTSORTED
option. In fact, the procedure does not use an index if you specify NOTSORTED. The
procedure defines a BY group as a set of contiguous observations that have the same
values for all BY variables. If observations with the same values for the BY variables
are not contiguous, then the procedure treats each contiguous set as a separate BY
group.
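A small sketch of BY-group processing with NOTSORTED follows. The VISITS data set and its variable names are hypothetical. The months are stored in calendar order, which is not alphabetic order, so NOTSORTED lets PROC CHART treat each contiguous run of a month as one BY group without a prior sort.

```sas
/* Hypothetical data: months are in chronological, not alphabetic, order. */
data visits;
   input month $ response $ count;
   datalines;
Jan Always 10
Jan Never 4
Feb Always 12
Feb Never 3
Mar Always 9
Mar Never 6
;

proc chart data=visits;
   by month notsorted;              /* one chart per contiguous month */
   vbar response / sumvar=count;
run;
```

Without NOTSORTED, the step would be expected to stop with an error at the Feb observations, because Feb sorts before Jan.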
HBAR Statement
Produces a horizontal bar chart.
Tip: HBAR charts can print either the name or the label of the chart variable.
Featured in: Example 5 on page 209
Required Argument
variable(s)
specifies the variables for which PROC CHART produces a horizontal bar chart, one
chart for each variable.
Options
The options available on the BLOCK, HBAR, PIE, STAR, and VBAR statements are
documented in Customizing All Types of Charts on page 191.
Statement Results
Each chart occupies one or more output pages, depending on the number of bars;
each bar occupies one line, by default.
By default, for horizontal bar charts of TYPE=FREQ, CFREQ, PCT, or CPCT, PROC
CHART prints the following statistics: frequency, cumulative frequency, percentage,
and cumulative percentage. If you use one or more of the statistics options, then PROC
CHART prints only the statistics that you request, plus the frequency.
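To illustrate how the statistics options narrow the table that is printed beside a horizontal bar chart, the following sketch requests only the percentage. The SURVEY data set is re-created from the counts shown in the overview, and the use of FREQ= to weight each observation is an assumption made for illustration.

```sas
data survey;
   length response $ 9;
   input response count;
   datalines;
Always 106
Usually 202
Sometimes 119
Rarely 97
Never 44
;

proc chart data=survey;
   /* Default: all four statistics are printed beside the bars. */
   hbar response / freq=count;

   /* PERCENT requested: only the percentage, plus the frequency, is printed. */
   hbar response / freq=count percent;
run;
```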
PIE Statement
Produces a pie chart.
Required Argument
variable(s)
specifies the variables for which PROC CHART produces a pie chart, one chart for
each variable.
Options
The options available on the BLOCK, HBAR, PIE, STAR, and VBAR statements are
documented in Customizing All Types of Charts on page 191.
Statement Results
PROC CHART determines the number of slices for the pie in the same way that it
determines the number of bars for vertical bar charts. Any slices of the pie accounting
for less than three print positions are grouped together into an "OTHER" category.
The pie's size is determined only by the SAS system options LINESIZE= and
PAGESIZE=. By default, the pie looks elliptical if your printer does not print 6 lines per
inch and 10 columns per inch. To make a circular pie chart on a printer that does not
print 6 lines and 10 columns per inch, use the LPI= option on the PROC CHART
statement. See the description of LPI= on page 186 for the formula that gives you the
proper LPI= value for your printer.
If you try to create a PIE chart for a variable with more than 50 levels, then PROC
CHART produces a horizontal bar chart instead.
STAR Statement
Produces a star chart.
Required Argument
variable(s)
specifies the variables for which PROC CHART produces a star chart, one chart for
each variable.
Options
The options available on the BLOCK, HBAR, PIE, STAR, and VBAR statements are
documented in Customizing All Types of Charts on page 191.
Statement Results
The number of points in the star is determined in the same way as the number of
bars for vertical bar charts.
If all the data values are positive, then the center of the star represents zero and the
outside circle represents the maximum value. If any data values are negative, then the
center represents the minimum. See the description of the AXIS= option on page 193
for more information about how to specify maximum and minimum values. For
information about how to specify the proportion of the chart, see the description of the
LPI= option on page 186.
If you try to create a star chart for a variable with more than 24 levels, then PROC
CHART produces a horizontal bar chart instead.
VBAR Statement
Produces a vertical bar chart.
Featured in: Example 1 on page 198, Example 2 on page 201, Example 3 on page 203,
Example 4 on page 206
Required Argument
variable(s)
specifies the variables for which PROC CHART produces a vertical bar chart, one
chart for each variable.
Options
The options available on the BLOCK, HBAR, PIE, STAR, and VBAR statements are
documented in Customizing All Types of Charts on page 191.
Statement Results
PROC CHART prints one page per chart. Along the vertical axis, PROC CHART
describes the chart frequency, the cumulative frequency, the chart percentage, the
cumulative percentage, the sum, or the mean. At the bottom of each bar, PROC CHART
prints a value according to the value of the TYPE= option, if specified. For character
variables or discrete numeric variables, this value is the actual value represented by
the bar. For continuous numeric variables, the value gives the midpoint of the interval
represented by the bar.
PROC CHART can automatically scale the vertical axis, determine the bar width,
and choose spacing between the bars. However, by using options, you can choose bar
intervals and the number of bars, include missing values in the chart, produce
side-by-side charts, and subdivide the bars. If the number of characters per line
(LINESIZE=) is not sufficient to display all vertical bars, then PROC CHART produces
a horizontal bar chart instead.
DISCRETE
FREQ=
MISSING
To do this
SUMVAR=
TYPE=
Specify groupings
Group the bars in side-by-side charts
GROUP=
G100
GROUP=
LEVELS=
MIDPOINTS=
SUBGROUP=
Compute statistics
Compute the cumulative frequency for each bar
CFREQ
CPERCENT
FREQ
MEAN
PERCENT
SUM
ASCENDING
AXIS=
DESCENDING
GSPACE=
NOHEADER
NOSPACE
NOSTATS
NOSYMBOL
NOZEROS
REF=
SPACE=
SYMBOL=
WIDTH=
Options
ASCENDING
prints the bars and any associated statistics in ascending order of size within groups.
Alias: ASC
Restriction: Available only on the HBAR and VBAR statements
AXIS=value-expression
specifies the values for the response axis, where value-expression is a list of
individual values, each separated by a space, or a range with a uniform interval for
the values. For example, the following range specifies tick marks on a bar chart from
0 to 100 at intervals of 10:
hbar x / axis=0 to 100 by 10;
the scale, PROC CHART uses the maximum value from the AXIS= list. If no value
is greater than 0, then PROC CHART ignores the AXIS= option.
Interaction: For HBAR and VBAR charts, AXIS= determines tick marks on the
response axis. If the AXIS= specification contains only one value, then the value
determines the minimum tick mark if the value is less than 0, or determines the
maximum tick mark if the value is greater than 0.
Interaction: For STAR charts, a single AXIS= value sets the minimum (the center
of the chart) if the value is less than zero, or sets the maximum (the outside circle)
if the value is greater than zero. If the AXIS= specification contains more than one
value, then PROC CHART uses the minimum and maximum values from the list.
Interaction: If you use AXIS= and the BY statement, then PROC CHART produces
uniform axes over BY groups.
CAUTION:
Values in value-expression override the range of the data. For example, if the data
range is 1 to 10 and you specify a range of 3 to 5, then only the data in the range 3
to 5 appears on the chart. Values out of range produce a warning message in the
SAS log.
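A brief sketch of AXIS= on a bar chart follows. The SURVEY data set is re-created from the counts shown in the overview, and the range 0 to 250 is an assumption chosen so that every bar, including the largest value of 202, falls inside the axis.

```sas
data survey;
   length response $ 9;
   input response count;
   datalines;
Always 106
Usually 202
Sometimes 119
Rarely 97
Never 44
;

proc chart data=survey;
   /* Tick marks from 0 to 250 at intervals of 50 on the response axis. */
   hbar response / sumvar=count axis=0 to 250 by 50;
run;
```

Choosing a maximum below 202 would clip the Usually bar, as the caution above describes.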
CFREQ
DESCENDING
prints the bars and any associated statistics in descending order of size within groups.
Alias: DESC
Restriction: Available only on the HBAR and VBAR statements
DISCRETE
specifies that a numeric chart variable is discrete rather than continuous. Without
DISCRETE, PROC CHART assumes that all numeric variables are continuous and
automatically chooses intervals for them unless you use MIDPOINTS= or LEVELS=.
FREQ=variable
specifies a data set variable that represents a frequency count for each observation.
Normally, each observation contributes a value of one to the frequency counts. With
FREQ=, each observation contributes the value of its FREQ= variable.
Restriction: If the FREQ= values are not integers, then PROC CHART truncates
them.
Interaction: If you use SUMVAR=, then PROC CHART multiplies the sums by the
FREQ= value.
GROUP=variable
produces side-by-side charts, with each chart representing the observations that have
a common value for the GROUP= variable. The GROUP= variable can be character
or numeric and is assumed to be discrete. For example, the following statement
produces a frequency bar chart for men and women in each department:
vbar gender / group=dept;
GSPACE=n
specifies the amount of extra space between groups of bars. Use GSPACE=0 to leave
no extra space between adjacent groups of bars.
Restriction: Available only on the HBAR and VBAR statements
Interaction: PROC CHART ignores GSPACE= if you omit GROUP=
G100
specifies that the sum of percentages for each group equals 100. By default, PROC
CHART uses 100 percent as the total sum. For example, if you produce a bar chart
that separates males and females into three age categories, then the six bars, by
default, add to 100 percent; however, with G100, the three bars for females add to
100 percent, and the three bars for males add to 100 percent.
Restriction: Available only on the BLOCK, HBAR, and VBAR statements
Interaction: PROC CHART ignores G100 if you omit GROUP=.
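The effect of G100 can be seen by charting the same grouped data with and without the option. The STAFF data set, its variable names, and its counts are hypothetical.

```sas
/* Hypothetical counts of employees by gender and age category. */
data staff;
   input gender $ agegroup $ n;
   datalines;
F Young 10
F Middle 20
F Older 10
M Young 5
M Middle 10
M Older 5
;

proc chart data=staff;
   /* Default: the six bars together add to 100 percent. */
   vbar agegroup / group=gender freq=n type=percent;

   /* G100: the three bars within each gender add to 100 percent. */
   vbar agegroup / group=gender freq=n type=percent g100;
run;
```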
LEVELS=number-of-midpoints
specifies the number of bars that represent each chart variable when the variables
are continuous.
MEAN
MIDPOINTS=midpoint-specification | OLD
defines the range of values that each bar, block, or section represents by specifying
the range midpoints.
The value for MIDPOINTS= is one of the following:
midpoint-specification
specifies midpoints, either individually, or across a range at a uniform interval.
For example, the following statement produces a chart with five bars; the first bar
represents the range of values of X with a midpoint of 10, the second bar
represents the range with a midpoint of 20, and so on:
vbar x / midpoints=10 20 30 40 50;
OLD
specifies an algorithm that PROC CHART used in previous versions of SAS to
choose midpoints for continuous variables. The old algorithm was based on the
work of Nelder (1976). The current algorithm that PROC CHART uses if you omit
OLD is based on the work of Terrell and Scott (1985).
Default: Without MIDPOINTS=, PROC CHART displays the values in the SAS
in ascending order.
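The following sketch contrasts explicit midpoints with LEVELS= for a continuous variable. The SCORES data set and its values are hypothetical.

```sas
/* Hypothetical continuous measurements. */
data scores;
   input x @@;
   datalines;
8 12 14 19 21 22 27 31 33 38 41 44 47 52
;

proc chart data=scores;
   /* Five bars with midpoints 10, 20, 30, 40, and 50, written as a range. */
   vbar x / midpoints=10 to 50 by 10;

   /* Let PROC CHART choose the midpoints, but ask for five bars. */
   vbar x / levels=5;
run;
```

The range form 10 to 50 by 10 is equivalent to listing the five midpoints individually.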
MISSING
specifies that missing values are valid levels for the chart variable.
NOHEADER
Alias:
NOSTATS
Alias:
Alias:
PERCENT
prints the percentages of observations having a given value for the chart variable.
Restriction: Available only on the HBAR statement
REF=value(s)
Featured in:
SPACE=n
Use the GSPACE= option to specify the amount of space between the bars
within each group.
Tip:
SUBGROUP=variable
subdivides each bar or block into characters that show the contribution of the values
of variable to that bar or block. PROC CHART uses the first character of each value
to fill in the portion of the bar or block that corresponds to that value, unless more
than one value begins with the same first character. In that case, PROC CHART
uses the letters A, B, C, and so on to fill in the bars or blocks. If the variable is
formatted, then PROC CHART uses the first character of the formatted value.
The characters used in the chart and the values that they represent are given in a
legend at the bottom of the chart. The subgroup symbols are ordered A through Z
and 0 through 9 with the characters in ascending order.
PROC CHART calculates the height of a bar or block for each subgroup
individually and then rounds the percentage of the total bar up or down. So the total
height of the bar may be higher or lower than the same bar without the
SUBGROUP= option.
Restriction: Available only on the BLOCK, HBAR, and VBAR statements
Interaction: If you use both TYPE=MEAN and SUBGROUP=, then PROC CHART
first calculates the mean for each variable that is listed in the SUMVAR= option,
then subdivides the bar into the percentages that each subgroup contributes.
Featured in: Example 3 on page 203
SUMVAR=variable
specifies the variable for which PROC CHART displays either values or means
(depending on the value of TYPE=) in the chart.
Interaction: If you use SUMVAR= and you use TYPE= with a value other than
MEAN or SUM, then TYPE=SUM overrides the specified TYPE= value.
Tip: Both HBAR and VBAR charts can print labels for SUMVAR= variables if you
use a LABEL statement.
Featured in: Example 3 on page 203, Example 4 on page 206, Example 5 on page
209, Example 6 on page 210
SYMBOL=character(s)
specifies the character or characters that PROC CHART uses in the bars or blocks of
the chart when you do not use the SUBGROUP= option.
Default: asterisk (*)
Restriction: Available only on the BLOCK, HBAR, and VBAR statements
Interaction: If the SAS system option OVP is in effect and if your printing device
supports overprinting, then you can specify up to three characters to produce
overprinted charts.
Featured in: Example 6 on page 210
TYPE=statistic
specifies what the bars or sections in the chart represent. The statistic is one of the
following:
CFREQ
specifies that each bar, block, or section represent the cumulative frequency.
CPERCENT
specifies that each bar, block, or section represent the cumulative percentage.
Alias: CPCT
FREQ
specifies that each bar, block, or section represent the frequency with which a
value or range occurs for the chart variable in the data.
MEAN
specifies that each bar, block, or section represent the mean of the SUMVAR=
variable across all observations that belong to that bar, block, or section.
Interaction: With TYPE=MEAN, you can only compute MEAN and FREQ
statistics.
Featured in: Example 4 on page 206
PERCENT
specifies that each bar, block, or section represent the percentage of observations
that have a given value or that fall into a given range of the chart variable.
Alias: PCT
Featured in: Example 2 on page 201
SUM
specifies that each bar, block, or section represent the sum of the SUMVAR=
variable for the observations that correspond to each bar, block, or section.
Default: FREQ (unless you use SUMVAR=, which causes a default of SUM)
Interaction: With TYPE=SUM, you can only compute SUM and FREQ statistics.
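The way the TYPE= default shifts when SUMVAR= is present can be seen in a minimal sketch. The data set DEMO_SALES and its variables Region and Amount are hypothetical names invented for this illustration; they do not come from the examples in this chapter.

```sas
/* Hypothetical data: DEMO_SALES, Region, and Amount are illustrative names. */
data demo_sales;
   input Region $ Amount;
   datalines;
east 10
east 30
west 20
west 40
west 60
;

proc chart data=demo_sales;
   /* No SUMVAR=: TYPE= defaults to FREQ (east=2, west=3). */
   hbar region;
   /* SUMVAR= without TYPE=: the default becomes SUM (east=40, west=120). */
   hbar region / sumvar=amount;
   /* TYPE=MEAN charts the mean of Amount for each bar (east=20, west=40). */
   hbar region / sumvar=amount type=mean;
run;
```

HBAR is used here only because its output also prints the statistic beside each bar, which makes the three cases easy to compare.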
WIDTH=n
Missing Values
PROC CHART follows these rules when handling missing values:
- Missing values are not considered valid levels for the chart variable unless you
use the MISSING option.
- Missing values for a GROUP= or SUBGROUP= variable are treated as valid levels.
- PROC CHART ignores missing values for the FREQ= option and the SUMVAR=
option.
- If the value of the FREQ= variable is missing, zero, or negative, then the
observation is excluded from the calculation of the chart statistic.
- If the value of the SUMVAR= variable is missing, then the observation is excluded
from the calculation of the chart statistic.
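A small sketch can make these rules concrete. The data set DEMO_SIZES and the variable Qty are hypothetical; one observation has a missing chart variable and another has a missing SUMVAR= variable.

```sas
/* Hypothetical data: a period is read as a missing value by list input. */
data demo_sizes;
   input Size $ Qty;
   datalines;
small 2
large 3
.     1
small .
;

proc chart data=demo_sizes;
   /* Default: the observation with a missing Size is not charted. */
   vbar size;
   /* MISSING: the missing Size value is treated as a valid level. */
   vbar size / missing;
   /* The observation with a missing Qty is excluded from the SUM statistic. */
   vbar size / sumvar=qty;
run;
```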
Name    Description               Statement Used
BLOCK   A block chart             BLOCK
HBAR    A horizontal bar chart    HBAR
PIE     A pie chart               PIE
STAR    A star chart              STAR
VBAR    A vertical bar chart      VBAR
VBAR statement
This example produces a vertical bar chart that shows a frequency count for the
values of the chart variable.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the SHIRTS data set. SHIRTS contains the sizes of a particular shirt that is sold
during a week at a clothing store, with one observation for each shirt that is sold.
data shirts;
input Size $ @@;
datalines;
medium
large
large
large
large
medium
medium
small
small
medium
medium
large
small
medium
large
large
large
small
medium
medium
medium
medium
medium
large
small
small
;
Create a vertical bar chart with frequency counts. The VBAR statement produces a
vertical bar chart for the frequency counts of the Size values.
proc chart data=shirts;
vbar size;
run;
Output
The frequency chart shows the store's sales of the shirt for the week: 9
large shirts, 11 medium shirts, and 6 small shirts.
This example produces a vertical bar chart. The chart statistic is the percentage for
each category of the total number of shirts sold.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create a vertical bar chart with percentages. The VBAR statement produces a vertical bar
chart. TYPE= specifies percentage as the chart statistic for the variable Size.
proc chart data=shirts;
vbar size / type=percent;
run;
Output
The chart shows the percentage of total sales for each shirt size. Of all
the shirts sold, about 42.3 percent were medium, 34.6 were large, and
23.1 were small.
[Vertical bar chart of percentages (vertical axis ticks from 10 to 40) for each value of Size: large, medium, small.]
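The percentages quoted above can be cross-checked outside PROC CHART. The sketch below is one way to do that, assuming a hypothetical summary data set SHIRT_COUNTS that holds the three counts reported in Example 1 (9 large, 11 medium, 6 small).

```sas
/* Hypothetical summary of the counts reported for the SHIRTS data. */
data shirt_counts;
   input Size $ Count;
   datalines;
large 9
medium 11
small 6
;

proc freq data=shirt_counts;
   weight count;   /* each row stands for Count shirts */
   tables size;    /* prints Frequency and Percent for each size */
run;
```

With 26 shirts in total, the Percent column should agree with the chart: 11/26 is about 42.3, 9/26 about 34.6, and 6/26 about 23.1.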
This example
- produces a vertical bar chart for categories of one variable with bar lengths that
represent the values of another variable.
- subdivides each bar into categories based on the values of a third variable.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the PIESALES data set. PIESALES contains the number of each flavor of pie that is
sold for two years at three bakeries that are owned by the same company. One bakery is on
Samford Avenue, one on Oak Street, and one on Clyde Drive.
data piesales;
input Bakery $ Flavor $ Year Pies_Sold;
datalines;
Samford  apple      1995  234
Samford  apple      1996  288
Samford  blueberry  1995  103
Samford  blueberry  1996  143
Samford  cherry     1995  173
Samford  cherry     1996  195
Samford  rhubarb    1995   26
Samford  rhubarb    1996   28
Oak      apple      1995  319
Oak      apple      1996  371
Oak      blueberry  1995  174
Oak      blueberry  1996  206
Oak      cherry     1995  246
Oak      cherry     1996  311
Oak      rhubarb    1995   51
Oak      rhubarb    1996   56
Clyde    apple      1995  313
Clyde    apple      1996  415
Clyde    blueberry  1995  177
Clyde    blueberry  1996  201
Clyde    cherry     1995  250
Clyde    cherry     1996  328
Clyde    rhubarb    1995   60
Clyde    rhubarb    1996   59
;
Create a vertical bar chart with the bars that are subdivided into categories. The
VBAR statement produces a vertical bar chart with one bar for each pie flavor. SUBGROUP=
divides each bar into sales for each bakery.
proc chart data=piesales;
vbar flavor / subgroup=bakery
Specify the bar length variable. SUMVAR= specifies Pies_Sold as the variable whose values
are represented by the lengths of the bars.
sumvar=pies_sold;
run;
Output
The bar that represents the sales of apple pies, for example, shows 1,940 total pies across both
years and all three bakeries. The symbol for the Samford Avenue bakery represents the 522
pies at the top, the symbol for the Oak Street bakery represents the 690 pies in the middle, and
the symbol for the Clyde Drive bakery represents the 728 pies at the bottom of the bar for apple
pies. By default, the labels along the horizontal axis are truncated to eight characters.
[Vertical bar chart (vertical axis ticks from 200 to 1800) with one bar for each Flavor: apple, blueberr, cherry, rhubarb. Each bar is subdivided by Bakery. Legend: C = Clyde, O = Oak, S = Samford.]
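The subgroup totals quoted for the apple bar can be checked with a short sketch. It assumes a hypothetical data set APPLE_SALES that contains only the apple rows copied from PIESALES; PROC MEANS then reports the sum and mean of Pies_Sold for each bakery.

```sas
/* Apple rows copied from PIESALES; the data set name is hypothetical. */
data apple_sales;
   input Bakery $ Year Pies_Sold;
   datalines;
Samford 1995 234
Samford 1996 288
Oak     1995 319
Oak     1996 371
Clyde   1995 313
Clyde   1996 415
;

proc means data=apple_sales sum mean maxdec=0;
   class bakery;    /* one row of statistics per bakery */
   var pies_sold;
run;
```

The sums should come out as 728 for Clyde, 690 for Oak, and 522 for Samford, which add up to the 1,940 pies shown by the apple bar. The means (364, 345, and 261) are the values that TYPE=MEAN charts in Example 4.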
This example
- charts the mean values of a variable for the categories of another variable
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create a side-by-side vertical bar chart. The VBAR statement produces a side-by-side
vertical bar chart to compare the sales across values of Bakery, specified by GROUP=. Each
Bakery group contains a bar for each Flavor value.
proc chart data=piesales;
vbar flavor / group=bakery
Create reference lines. REF= draws reference lines to mark pie sales at 100, 200, and 300.
ref=100 200 300
Specify the bar length variable. SUMVAR= specifies Pies_Sold as the variable that is
represented by the lengths of the bars.
sumvar=pies_sold
Specify the statistical variable. TYPE= averages the sales for 1995 and 1996 for each
combination of bakery and flavor.
type=mean;
run;
Output
The side-by-side bar charts compare the sales of apple pies, for example, across bakeries. The
mean for the Clyde Drive bakery is 364, the mean for the Oak Street bakery is 345, and the
mean for the Samford Avenue bakery is 261.
[Side-by-side vertical bar chart of Pies_Sold Mean (vertical axis ticks from 50 to 350, reference lines at 100, 200, and 300) with one bar for each Flavor (apple, blueberry, cherry, rhubarb) within each Bakery group.]
This example
- produces horizontal bar charts only for observations with a common value
- charts the values of a variable for the categories of another variable
- creates side-by-side bar charts for the categories of a third variable.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the variable value limitation for the horizontal bar chart. WHERE= limits the
chart to only the 1995 sales totals.
proc chart data=piesales(where=(year=1995));
Create a side-by-side horizontal bar chart. The HBAR statement produces a side-by-side
horizontal bar chart to compare sales across values of Flavor, specified by GROUP=. Each
Flavor group contains a bar for each Bakery value.
hbar bakery / group=flavor
Specify the bar length variable. SUMVAR= specifies Pies_Sold as the variable whose values
are represented by the lengths of the bars.
sumvar=pies_sold;
run;
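The same restriction can also be written with a WHERE statement inside the procedure step instead of the WHERE= data set option. The following is a minimal sketch of that alternative, using a hypothetical four-row data set PIES_DEMO so that it runs on its own.

```sas
/* Hypothetical subset with the same variables as PIESALES. */
data pies_demo;
   input Bakery $ Flavor $ Year Pies_Sold;
   datalines;
Oak   apple 1995 319
Oak   apple 1996 371
Clyde apple 1995 313
Clyde apple 1996 415
;

proc chart data=pies_demo;
   where year=1995;   /* keeps only the 1995 observations */
   hbar bakery / group=flavor sumvar=pies_sold;
run;
```

Either form limits the chart to the 1995 observations. The data set option keeps the filter next to the data set name, while the statement form may read more naturally when the condition is long.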
Output
             1995 Pie Sales for Each Bakery According to Flavor

Flavor    Bakery                                                Pies_Sold
                                                                      Sum
                   |
apple     Clyde    |******************************************   313.0000
          Oak      |*******************************************  319.0000
          Samford  |*******************************              234.0000
                   |
blueberr  Clyde    |************************                     177.0000
          Oak      |***********************                      174.0000
          Samford  |**************                               103.0000
                   |
cherry    Clyde    |*********************************            250.0000
          Oak      |*********************************            246.0000
          Samford  |***********************                      173.0000
                   |
rhubarb   Clyde    |********                                      60.0000
          Oak      |*******                                       51.0000
          Samford  |***                                           26.0000
                   |
                   ----+---+---+---+---+---+---+---+---+---+--
                      30  60  90  120 150 180 210 240 270 300

                                 Pies_Sold Sum
PROC SORT
SAS system options:
NOBYLINE
OVP
TITLE statement:
#BYVAL specification
Data set:
This example
- sorts the data set
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Sort the input data set PIESALES. PROC SORT sorts PIESALES by year. Sorting is
required to produce a separate chart for each year.
proc sort data=piesales out=sorted_piesales;
by year;
run;
Suppress BY lines and allow overprinted characters in the block charts. NOBYLINE
suppresses the usual BY lines in the output. OVP allows overprinted characters in the charts.
options nobyline ovp;
Specify the BY group for multiple block charts. The BY statement produces one chart for
1995 sales and one for 1996 sales.
proc chart data=sorted_piesales;
by year;
Create a block chart. The BLOCK statement produces a block chart for each year. Each chart
contains a grid (Bakery values along the bottom, Flavor values along the side) of cells that
contain the blocks.
block bakery / group=flavor
Specify the bar length variable. SUMVAR= specifies Pies_Sold as the variable whose values
are represented by the lengths of the blocks.
sumvar=pies_sold
Suppress the default header line. NOHEADER suppresses the default header line.
noheader
Specify the block symbols. SYMBOL= specifies the symbols in the blocks.
symbol='OX';
Specify the titles. The #BYVAL specification inserts the year into the second line of the title.
title 'Pie Sales for Each Bakery and Flavor';
title2 '#byval(year)';
run;
Reset the printing of the default BY line. The SAS system option BYLINE resets the
printing of the default BY line.
options byline;
Output
[Block chart output: one chart for each year, with Bakery values along the bottom and Flavor values along the side.]
References
Nelder, J.A. (1976), "A Simple Algorithm for Scaling Graphs," Applied Statistics,
Volume 25, Number 1, London: The Royal Statistical Society.
Terrell, G.R. and Scott, D.W. (1985), "Oversmoothed Nonparametric Density
Estimates," Journal of the American Statistical Association, 80, 389, 209-214.
CHAPTER
8
The CIMPORT Procedure
Overview: CIMPORT Procedure
What Does the CIMPORT Procedure Do?
General File Transport Process
Syntax: CIMPORT Procedure
PROC CIMPORT Statement
EXCLUDE Statement
SELECT Statement
Results: CIMPORT Procedure
Examples: CIMPORT Procedure
Example 1: Importing an Entire Data Library
Example 2: Importing Individual Catalog Entries
Example 3: Importing a Single Indexed SAS Data Set
2 If you are changing operating environments, move the transport file to the new
3 Use PROC CIMPORT to translate the transport file into the format appropriate
To do this
INFILE=
TAPE
EET=
ET=
DATECOPY
EXTENDSN=
MEMTYPE=
To do this
FORCE
Create a new catalog for the imported transport file, and delete
any existing catalog with the same name
NEW
NOEDIT
NOSRC
Required Arguments
destination=libref | < libref. >member-name
identifies the type of file to import and specifies the specific catalog, SAS data set, or
SAS data library to import.
destination
identifies the file or files in the transport file as a single catalog, as a single SAS
data set, or as the members of a SAS data library. The destination argument can
be one of the following:
CATALOG | CAT | C
DATA | DS | D
LIBRARY | LIB | L
libref | <libref.>member-name
specifies the specific catalog, SAS data set, or SAS data library as the destination
of the transport file. If the destination argument is CATALOG or DATA, you can
specify both a libref and a member name. If the libref is omitted, PROC CIMPORT
uses the default library as the libref, which is usually the WORK library. If the
destination argument is LIBRARY, specify only a libref.
Options
DATECOPY
copies the SAS internal date and time when the SAS file was created and the date
and time when it was last modified to the resulting destination file. Note that the
operating environment date and time are not preserved.
Restriction: DATECOPY can be used only when the destination file uses the V8 or
V9 engine.
Tip: You can alter the file creation date and time with the DTC= option on the
MODIFY statement in a PROC DATASETS step. See MODIFY Statement on page 348.
EET=(etype(s))
excludes specified entry types from the import process. If the etype is a single entry
type, then you can omit the parentheses. Separate multiple values with spaces.
Interaction: You cannot specify both the EET= option and the ET= option in the
ET=(etype(s))
specifies the entry types to import. If the etype is a single entry type, then you can
omit the parentheses. Separate multiple values with spaces.
Interaction: You cannot specify both the EET= option and the ET= option in the
EXTENDSN=YES | NO
specifies whether to extend by 1 byte the length of short numerics (fewer than 8
bytes) when you import them. You can avoid a loss of precision when you transport a
short numeric in IBM format to IEEE format if you extend its length. You cannot
extend the length of an 8-byte short numeric.
Default: YES
Restriction: This option applies only to data sets.
Tip:
FORCE
enables access to a locked catalog. By default, PROC CIMPORT locks the catalog
that it is updating to prevent other users from accessing the catalog while it is being
updated. The FORCE option overrides this lock, which allows other users to access
the catalog while it is being imported, or allows you to import a catalog that is
currently being accessed by other users.
CAUTION:
The FORCE option can lead to unpredictable results. The FORCE option allows
multiple users to access the same catalog entry simultaneously.
INFILE=fileref | filename
FILE=
Featured in:
MEMTYPE=mtype
specifies that only data sets, only catalogs, or both, be moved when a SAS library is
imported. Values for mtype can be
ALL
both catalogs and data sets
CATALOG | CAT
catalogs
DATA | DS
SAS data sets
NEW
creates a new catalog to contain the contents of the imported transport file when the
destination you specify has the same name as an existing catalog. NEW deletes any
existing catalog with the same name as the one you specify as a destination for the
import. If you do not specify NEW, and the destination you specify has the same
name as an existing catalog, PROC CIMPORT appends the imported transport file to
the existing catalog.
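As a hedged illustration of how ET= and NEW combine in the PROC CIMPORT statement, the sketch below imports only two entry types into a freshly created catalog. The libref NEWLIB, the fileref TRANS2, the catalog name FINANCE, and the quoted placeholders are assumptions that you would replace with names valid in your operating environment.

```sas
/* Placeholders: substitute a real library path and transport file name. */
libname newlib 'SAS-data-library';
filename trans2 'transport-file';

proc cimport catalog=newlib.finance infile=trans2
             et=(scl pmenu)   /* import only SCL and PMENU entries */
             new;             /* delete any existing NEWLIB.FINANCE first */
run;
```

Because ET= and EET= cannot be used together, a step like this selects entry types by inclusion only.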
NOEDIT
You obtain the same results if you create a new catalog to contain SCL code by
using the MERGE statement with the NOEDIT option in the BUILD procedure of
SAS/AF software.
Note: The NOEDIT option affects only SAS/AF PROGRAM and SCL entries. It
does not affect FSEDIT SCREEN and FSVIEW FORMULA entries.
Alias: NEDIT
NOSRC
suppresses the importing of source code for SAS/AF entries that contain compiled
SCL code.
You obtain the same results if you create a new catalog to contain SCL code by
using the MERGE statement with the NOSOURCE option in the BUILD procedure
of SAS/AF software.
Alias: NSRC
Interaction: PROC CIMPORT ignores the NOSRC option if you use it with an
EXCLUDE Statement
Excludes specified files or entries from the import process.
Tip: There is no limit to the number of EXCLUDE statements you can use in one
invocation of PROC CIMPORT.
Interaction: You can use either EXCLUDE statements or SELECT statements in a PROC
CIMPORT step, but not both.
Required Arguments
SAS file(s) | catalog entry(s)
specifies either the name(s) of one or more SAS files or the name(s) of one or more
catalog entries to be excluded from the import process. Specify SAS filenames if you
import a data library; specify catalog entry names if you import an individual SAS
catalog. Separate multiple filenames or entry names with a space. You can use
shortcuts to list many like-named files in the EXCLUDE statement. For more
information, see Shortcuts for Specifying Lists of Variable Names on page 24.
Options
ENTRYTYPE=entry-type
specifies a single entry type for the catalog entry(s) listed in the EXCLUDE
statement. See SAS Language Reference: Concepts for a complete list of catalog entry
types.
Restriction: ENTRYTYPE= is valid only when you import an individual SAS
catalog.
Alias: ETYPE=, ET=
MEMTYPE=mtype
specifies a single member type for the SAS file(s) listed in the EXCLUDE statement.
Values for mtype can be
ALL
both catalogs and data sets
CATALOG
catalogs
DATA
SAS data sets.
You can also specify the MEMTYPE= option, enclosed in parentheses, immediately
after the name of a file. In parentheses, MEMTYPE= identifies the type of the
filename that just precedes it. When you use this form of the option, it overrides the
MEMTYPE= option that follows the slash in the EXCLUDE statement, but it must
match the MEMTYPE= option in the PROC CIMPORT statement.
Restriction: MEMTYPE= is valid only when you import a SAS data library.
Alias: MTYPE=, MT=
Default: ALL
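The parenthesized form of MEMTYPE= described above can be sketched as follows. The libref NEWLIB, the fileref TRANFILE, and the member names TIMES and FINANCE are hypothetical, and the quoted placeholders must be replaced with real paths before the step can run.

```sas
/* Placeholders: substitute a real library path and transport file name. */
libname newlib 'SAS-data-library';
filename tranfile 'transport-file';

proc cimport library=newlib infile=tranfile;
   /* Skip the data set TIMES and the catalog FINANCE; import the rest. */
   exclude times(memtype=data) finance(memtype=catalog);
run;
```

Naming the member type next to each file is mainly useful when a data set and a catalog in the transport file share the same name.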
SELECT Statement
Specifies individual files or entries to import.
Tip: There is no limit to the number of SELECT statements you can use in one
invocation of PROC CIMPORT.
Interaction: You can use either EXCLUDE statements or SELECT statements in a PROC
CIMPORT step, but not both.
Featured in:
Required Arguments
SAS file(s) | catalog entry(s)
specifies either the name(s) of one or more SAS files or the name(s) of one or more
catalog entries to import. Specify SAS filenames if you import a data library; specify
catalog entry names if you import an individual SAS catalog. Separate multiple
filenames or entry names with a space. You can use shortcuts to list many
like-named files in the SELECT statement. For more information, see Shortcuts for
Specifying Lists of Variable Names on page 24.
Options
ENTRYTYPE=entry-type
specifies a single entry type for the catalog entry(s) listed in the SELECT statement.
See SAS Language Reference: Concepts for a complete list of catalog entry types.
Restriction: ENTRYTYPE= is valid only when you import an individual SAS
catalog.
Alias: ETYPE=, ET=
MEMTYPE=mtype
specifies a single member type for the SAS file(s) listed in the SELECT statement.
Valid values are CATALOG or CAT, DATA, or ALL.
You can also specify the MEMTYPE= option, enclosed in parentheses, immediately
after the name of a file. In parentheses, MEMTYPE= identifies the type of the
filename that just precedes it. When you use this form of the option, it overrides the
MEMTYPE= option that follows the slash in the SELECT statement, but it must
match the MEMTYPE= option in the PROC CIMPORT statement.
Restriction: MEMTYPE= is valid only when you import a SAS data library.
Alias: MTYPE=, MT=
Default: ALL
2 Move the transport file from the other environment to the newly created file under
This example shows how to use PROC CIMPORT to read from disk a transport file,
named TRANFILE, that PROC CPORT created from a SAS data library in another
operating environment. The transport file was moved to the new operating environment
by means of communications software or magnetic medium. PROC CIMPORT imports
the transport file to a SAS data library, called NEWLIB, in the new operating
environment.
Program
Specify the library name and filename. The LIBNAME statement specifies a libname for
the new SAS data library. The FILENAME statement specifies the filename of the transport file
that PROC CPORT created and enables you to specify any operating environment options for
file characteristics.
libname newlib 'SAS-data-library';
filename tranfile 'transport-file'
host-option(s)-for-file-characteristics;
Import the SAS data library in the NEWLIB library. PROC CIMPORT imports the SAS
data library into the library named NEWLIB.
proc cimport library=newlib infile=tranfile;
run;
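For context, the transport file that this example reads would have been written in the source operating environment by a PROC CPORT step along the following lines. This is a sketch only: the libref SOURCE and the quoted placeholders are assumptions, not names taken from the example.

```sas
/* Run in the source operating environment; replace the placeholders. */
libname source 'SAS-data-library';
filename tranfile 'transport-file';

proc cport library=source file=tranfile;
run;
```

The LIBRARY= destination in PROC CIMPORT pairs with the LIBRARY= source used here, so that every member written to the transport file can be restored.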
SAS Log
NOTE: Proc
NOTE: Entry
NOTE: Entry
NOTE: Entry
NOTE: Entry
NOTE: Entry
NOTE: Total
NOTE:
NOTE:
NOTE:
NOTE:
This example shows how to use PROC CIMPORT to import the individual catalog
entries LOAN.PMENU and LOAN.SCL from the transport file TRANS2, which was
created from a single SAS catalog.
Program
Specify the library name, filename, and operating environment options. The LIBNAME
statement specifies a libname for the new SAS data library. The FILENAME statement specifies
the filename of the transport file that PROC CPORT created and enables you to specify any
operating environment options for file characteristics.
libname newlib 'SAS-data-library';
filename trans2 'transport-file'
host-option(s)-for-file-characteristics;
Import the specified catalog entries to the new SAS catalog. PROC CIMPORT imports
the individual catalog entries from the TRANS2 transport file and stores them in a new SAS
catalog called NEWLIB.FINANCE. The SELECT statement selects only the two specified
entries from the transport file to be imported into the new catalog.
proc cimport catalog=newlib.finance infile=trans2;
select loan.pmenu loan.scl;
run;
SAS Log
NOTE:
NOTE:
NOTE:
NOTE:
This example shows how to use PROC CIMPORT to import an indexed SAS data set
from a transport file that was created by PROC CPORT from a single SAS data set.
Program
Specify the library name, filename, and operating environment options. The LIBNAME
statement specifies a libname for the new SAS data library. The FILENAME statement specifies
the filename of the transport file that PROC CPORT created and enables you to specify any
operating environment options for file characteristics.
libname newdata 'SAS-data-library';
filename trans3 'transport-file'
host-option(s)-for-file-characteristics;
Import the SAS data set. PROC CIMPORT imports the single SAS data set that you identify
with the DATA= specification in the PROC CIMPORT statement. PROC CPORT exported the
data set NEWDATA.TIMES in the transport file TRANS3.
proc cimport data=newdata.times infile=trans3;
run;
SAS Log
NOTE: Proc CIMPORT begins to create/update data set NEWDATA.TIMES
NOTE: The data set index x is defined.
NOTE: Data set contains 2 variables and 2 observations.
Logical record length is 16
CHAPTER
9
The COMPARE Procedure
Overview: COMPARE Procedure
What Does the COMPARE Procedure Do?
What Information Does PROC COMPARE Provide?
How Can PROC COMPARE Output Be Customized?
Syntax: COMPARE Procedure
PROC COMPARE Statement
BY Statement
ID Statement
VAR Statement
WITH Statement
Concepts: COMPARE Procedure
Comparisons Using PROC COMPARE
A Comparison by Position of Observations
A Comparison with an ID Variable
The Equality Criterion
Using the CRITERION= Option
Definition of Difference and Percent Difference
How PROC COMPARE Handles Variable Formats
Results: COMPARE Procedure
Results Reporting
SAS Log
Macro Return Codes (SYSINFO)
Procedure Output
Procedure Output Overview
Data Set Summary
Variables Summary
Observation Summary
Values Comparison Summary
Value Comparison Results
Table of Summary Statistics
Comparison Results for Observations (Using the TRANSPOSE Option)
ODS Table Names
Output Data Set (OUT=)
Output Statistics Data Set (OUTSTATS=)
Examples: COMPARE Procedure
Example 1: Producing a Complete Report of the Differences
Example 2: Comparing Variables in Different Data Sets
Example 3: Comparing a Variable Multiple Times
Example 4: Comparing Variables That Are in the Same Data Set
Example 5: Comparing Observations with an ID Variable
Example 6: Comparing Values of Observations Using an Output Data Set (OUT=)
Further, PROC COMPARE creates two kinds of output data sets that give detailed
information about the differences between observations of variables it is comparing.
The following example compares the data sets PROCLIB.ONE and PROCLIB.TWO,
which contain similar data about students:
data proclib.one(label='First Data Set');
input student year $ state $ gr1 gr2;
label year='Year of Birth';
format gr1 4.1;
datalines;
1000 1970 NC 85 87
1042 1971 MD 92 92
1095 1969 PA 78 72
1187 1970 MA 87 94
;
data proclib.two(label='Second Data Set');
input student $ year $ state $ gr1
gr2 major $;
label state='Home State';
format gr1 5.2;
datalines;
1000 1970 NC 84 87 Math
1042 1971 MA 92 92 History
1095 1969 PA 79 73 Physics
1187 1970 MD 87 74 Dance
1204 1971 NC 82 96 French
;
                           COMPARE Procedure
              Comparison of PROCLIB.ONE with PROCLIB.TWO
                            (Method=EXACT)

                           Data Set Summary

Dataset               Created          Modified  NVar  NObs  Label
PROCLIB.ONE  13MAY98:15:01:42  13MAY98:15:01:42     5     4  First Data Set
PROCLIB.TWO  13MAY98:15:01:44  13MAY98:15:01:44     6     5  Second Data Set

                           Variables Summary

Number of Variables in Common: 5.
Number of Variables in PROCLIB.TWO but not in PROCLIB.ONE: 1.
Number of Variables with Conflicting Types: 1.
Number of Variables with Differing Attributes: 3.

Variable  Dataset      Type  Length
student   PROCLIB.ONE  Num        8
          PROCLIB.TWO  Char       8

         Listing of Common Variables with Differing Attributes

Variable  Dataset      Type  Length  Format  Label
year      PROCLIB.ONE  Char       8          Year of Birth
          PROCLIB.TWO  Char       8
state     PROCLIB.ONE  Char       8
          PROCLIB.TWO  Char       8          Home State
gr1       PROCLIB.ONE  Num        8  4.1
          PROCLIB.TWO  Num        8  5.2

                          Observation Summary

Observation     Base  Compare
First Obs          1        1
First Unequal      1        1
Last  Unequal      4        4
Last  Match        4        4
Last  Obs          .        5

                       Values Comparison Summary

Number of Variables Compared with All Observations Equal: 1.
Number of Variables Compared with Some Observations Unequal: 3.
Total Number of Values which Compare Unequal: 6.
Maximum Difference: 20.

Variable  Type  Len  Compare Label  Ndif   MaxDif
state     CHAR    8  Home State        2
gr1       NUM     8                    2    1.000
gr2       NUM     8                    2   20.000
Procedure Output on page 246 shows the default output for these two data sets.
Example 1 on page 256 shows the complete output for these two data sets.
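A comparison of this kind can be requested with a step as short as the sketch below. It assumes that the PROCLIB libref has already been assigned (the quoted path is a placeholder) and that PROCLIB.ONE and PROCLIB.TWO exist as created above; no options beyond BASE= and COMPARE= are used.

```sas
/* Placeholder path: point PROCLIB at the library that holds ONE and TWO. */
libname proclib 'SAS-data-library';

proc compare base=proclib.one compare=proclib.two;
run;
```

Without an ID statement, observations are matched by position, so the first four observations of each data set are compared and the fifth observation of PROCLIB.TWO has no match.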
To do this
BY
ID
VAR
To do this
BASE=
COMPARE=
OUT=
OUTALL
OUTBASE
OUTCOMP
OUTDIF
OUTNOEQUAL
OUTPERCENT
OUTSTATS=
CRITERION=
METHOD=
ALLOBS
ALLVARS
BRIEFSUMMARY
FUZZ=
MAXPRINT=
NODATE
NOPRINT
NOSUMMARY
NOVALUES
PRINTALL
To do this
Print the value differences by observation, not by
variable
LISTALL
LISTBASE
LISTBASEOBS
LISTBASEVAR
LISTCOMP
LISTCOMPOBS
LISTCOMPVAR
LISTEQUALVAR
LISTOBS
LISTVAR
Options
ALLOBS
includes in the report of value comparison results the values and, for numeric
variables, the differences for all matching observations, even if they are judged equal.
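Default: If you omit ALLOBS, then PROC COMPARE prints values only for
observations that are judged unequal.
Interaction: When used with the TRANSPOSE option, ALLOBS invokes the
ALLVARS option and displays the values for all matching observations and
variables.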
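ALLSTATS
prints a table of summary statistics for all pairs of matching variables.
See also: Table of Summary Statistics on page 250 for information on the
statistics produced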
ALLVARS
includes in the report of value comparison results the values and, for numeric
variables, the differences for all pairs of matching variables, even if they are judged
equal.
Default: If you omit ALLVARS, then PROC COMPARE prints values only for
the variables that are judged unequal.
Interaction: When used with the TRANSPOSE option, ALLVARS displays unequal
values in context with the values for other matching variables. If you omit the
TRANSPOSE option, then ALLVARS invokes the ALLOBS option and displays the
values for all matching observations and variables.
BASE=SAS-data-set
Alias:
Tip: You can use the WHERE= data set option with the BASE= option to limit the
observations that are available for comparison.
BRIEFSUMMARY
produces a short comparison summary and suppresses the four default summary
reports (data set summary report, variables summary report, observation summary
report, and values comparison summary report).
Alias: BRIEF
COMPARE=SAS-data-set
Alias: COMP=, C=
Default: If you omit COMPARE=, then the comparison data set is the same as the
base data set, and PROC COMPARE compares variables within the data set.
Restriction: If you omit COMPARE=, then you must use the WITH statement.
Tip: You can use the WHERE= data set option with COMPARE= to limit the
observations that are available for comparison.
CRITERION=γ
specifies the criterion for judging the equality of numeric values. Normally, the value
of γ (gamma) is positive, in which case the number itself becomes the equality
criterion. If you use a negative value for γ, then PROC COMPARE uses an equality
criterion proportional to the precision of the computer on which SAS is running.
Default: 0.00001
See also: The Equality Criterion on page 242 for more information
ERROR
displays an error message in the SAS log when differences are found.
Interaction: This option overrides the WARNING option.
FUZZ=number
alters the values comparison results for numbers less than number. PROC
COMPARE prints
Range: 0 - 1
Tip: A report that contains many trivial differences is easier to read in this form.
LISTALL
lists all variables and observations that are found in only one data set.
Alias: LIST
LISTBASE
lists all observations and variables that are found in the base data set but not in the
comparison data set.
Interaction: Using LISTBASE is equivalent to using the LISTBASEOBS and
LISTBASEVAR options.
LISTBASEOBS
lists all observations that are found in the base data set but not in the comparison
data set.
LISTBASEVAR
lists all variables that are found in the base data set but not in the comparison data
set.
LISTCOMP
lists all observations and variables that are found in the comparison data set but not
in the base data set.
Interaction: Using LISTCOMP is equivalent to using the LISTCOMPOBS and
LISTCOMPVAR options.
LISTCOMPOBS
lists all observations that are found in the comparison data set but not in the base
data set.
LISTCOMPVAR
lists all variables that are found in the comparison data set but not in the base data
set.
LISTEQUALVAR
prints a list of variables whose values are judged equal at all observations in addition
to the default list of variables whose values are judged unequal.
LISTOBS
lists all observations that are found in only one data set.
Interaction: Using LISTOBS is equivalent to using the LISTBASEOBS and
LISTCOMPOBS options.
LISTVAR
lists all variables that are found in only one data set.
Interaction: Using LISTVAR is equivalent to using both the LISTBASEVAR and
LISTCOMPVAR options.
MAXPRINT=total | (per-variable, total)
METHOD=ABSOLUTE | EXACT | PERCENT | RELATIVE<(δ)>
specifies the method for judging the equality of numeric values. The constant δ
(delta) is a number between 0 and 1 that specifies a value to add to the denominator
when calculating the equality measure. By default, δ is 0.
Unless you use the CRITERION= option, the default method is EXACT. If you use
the CRITERION= option, then the default method is RELATIVE(φ), where φ (phi) is
a small number that depends on the numerical precision of the computer on which
SAS is running and on the value of CRITERION=.
See also: The Equality Criterion on page 242
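As a minimal sketch of how CRITERION= and METHOD= work together, the following step asks PROC COMPARE to judge numeric values equal when they differ by no more than 0.5 in absolute terms. PROCLIB.ONE and PROCLIB.TWO are the data sets used throughout this chapter; the threshold 0.5 is an arbitrary value chosen only for illustration.

```sas
/* Judge numeric values unequal only when ABS(y - x) > 0.5.       */
/* The 0.5 threshold is illustrative, not a recommended setting.  */
proc compare base=proclib.one compare=proclib.two
             method=absolute criterion=0.5;
run;
```

Omitting METHOD= here would select the RELATIVE method instead, because CRITERION= is specified.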
NODATE
suppresses the display in the data set summary report of the creation dates and the
last modified dates of the base and comparison data sets.
NOMISSBASE
judges a missing value in the base data set equal to any value. (By default, a missing
value is equal only to a missing value of the same kind, that is .=., .^=.A, .A=.A,
.A^=.B, and so on.)
You can use this option to determine the changes that would be made to the
observations in the comparison data set if it were used as the master data set and
the base data set were used as the transaction data set in a DATA step UPDATE
statement. For information on the UPDATE statement, see the chapter on SAS
language statements in SAS Language Reference: Dictionary.
NOMISSCOMP
judges a missing value in the comparison data set equal to any value. (By default, a
missing value is equal only to a missing value of the same kind, that is .=., .^=.A,
.A=.A, .A^=.B, and so on.)
You can use this option to determine the changes that would be made to the
observations in the base data set if it were used as the master data set and the
comparison data set were used as the transaction data set in a DATA step UPDATE
statement. For information on the UPDATE statement, see the chapter on SAS
language statements in SAS Language Reference: Dictionary.
NOMISSING
judges missing values in both the base and comparison data sets equal to any value.
By default, a missing value is only equal to a missing value of the same kind, that is
.=., .^=.A, .A=.A, .A^=.B, and so on.
Alias: NOMISS
Interaction: Using NOMISSING is equivalent to using both NOMISSBASE and
NOMISSCOMP.
NOPRINT
NOSUMMARY
suppresses the data set, variable, observation, and values comparison summary
reports.
Tip: NOSUMMARY produces no output if there are no differences in the matching
values.
Featured in: Example 2 on page 261
NOTE
displays notes in the SAS log that describe the results of the comparison, whether or
not differences were found.
NOVALUES
Featured in:
OUT=SAS-data-set
names the output data set. If SAS-data-set does not exist, then PROC COMPARE
creates it. SAS-data-set contains the differences between matching variables.
See also: Output Data Set (OUT=) on page 254
Featured in: Example 6 on page 270
OUTALL
writes an observation to the output data set for each observation in the base data set
and for each observation in the comparison data set. The option also writes
observations to the output data set that contains the differences and percent
differences between the values in matching observations.
Tip: Using OUTALL is equivalent to using the following four options: OUTBASE,
OUTCOMP, OUTDIF, and OUTPERCENT.
See also: Output Data Set (OUT=) on page 254
OUTBASE
writes an observation to the output data set for each observation in the base data set,
creating observations in which _TYPE_=BASE.
See also: Output Data Set (OUT=) on page 254
Featured in: Example 6 on page 270
OUTCOMP
writes an observation to the output data set for each observation in the comparison
data set, creating observations in which _TYPE_=COMP.
See also: Output Data Set (OUT=) on page 254
Featured in: Example 6 on page 270
OUTDIF
writes an observation to the output data set for each pair of matching observations.
The values in the observation include values for the differences between the values
in the pair of observations. The value of _TYPE_ in each observation is DIF.
Default: The OUTDIF option is the default unless you specify the OUTBASE,
OUTCOMP, or OUTPERCENT option. If you use any of these options, then you
must explicitly specify the OUTDIF option to create _TYPE_=DIF observations in
the output data set.
See also: Output Data Set (OUT=) on page 254
Featured in: Example 6 on page 270
OUTNOEQUAL
suppresses the writing of an observation to the output data set when all values in
the observation are judged equal. In addition, in observations containing values for
some variables judged equal and others judged unequal, the OUTNOEQUAL option
uses the special missing value ".E" to represent differences and percent differences
for variables judged equal.
See also: Output Data Set (OUT=) on page 254
Featured in: Example 6 on page 270
OUTPERCENT
writes an observation to the output data set for each pair of matching observations.
The values in the observation include values for the percent differences between the
values in the pair of observations. The value of _TYPE_ in each observation is
PERCENT.
See also: Output Data Set (OUT=) on page 254
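The OUT= options described above can be combined. The following sketch, which assumes the chapter's PROCLIB.ONE and PROCLIB.TWO data sets and a hypothetical output data set named WORK.DIFFS, keeps only observations that have at least one unequal value and writes the base, comparison, and difference rows for each of them.

```sas
/* WORK.DIFFS is a hypothetical name for the output data set.       */
/* OUTNOEQUAL drops observations in which all values are equal.     */
/* OUTBASE, OUTCOMP, and OUTDIF request _TYPE_=BASE, COMPARE, DIF.  */
proc compare base=proclib.one compare=proclib.two
             out=work.diffs outnoequal outbase outcomp outdif
             noprint;
run;

/* Inspect the result; _TYPE_ and _OBS_ identify each row. */
proc print data=work.diffs;
run;
```

OUTDIF is listed explicitly because specifying OUTBASE or OUTCOMP turns off the default creation of _TYPE_=DIF observations.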
OUTSTATS=SAS-data-set
writes summary statistics for all pairs of matching variables to the specified
SAS-data-set.
Tip: If you want to print a table of statistics in the procedure output, then use the
STATS, ALLSTATS, or PRINTALL option.
See also: Output Statistics Data Set (OUTSTATS=) on page 255 and Table of
Summary Statistics on page 250.
Featured in: Example 7 on page 273
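A short sketch of OUTSTATS= in use follows. It assumes the chapter's PROCLIB.ONE and PROCLIB.TWO data sets; WORK.DIFFSTAT is a hypothetical name for the statistics data set.

```sas
/* WORK.DIFFSTAT is a hypothetical output data set name.          */
/* NOPRINT suppresses the printed report; only the statistics     */
/* data set is produced.                                          */
proc compare base=proclib.one compare=proclib.two
             outstats=work.diffstat noprint;
   var gr1 gr2;
run;

/* Each row holds one statistic for one variable in the   */
/* _BASE_, _COMP_, _DIF_, and _PCTDIF_ columns.           */
proc print data=work.diffstat;
run;
```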
PRINTALL
STATS
prints a table of summary statistics for all pairs of matching numeric variables that
are judged unequal.
See also: Table of Summary Statistics on page 250 for information on the
statistics produced.
TRANSPOSE
prints the value differences by observation, not by variable.
WARNING
displays a warning message in the SAS log when differences are found.
Interaction: The ERROR option overrides the WARNING option.
BY Statement
Produces a separate comparison for each BY group.
Main discussion: BY on page 58
Required Arguments
variable
specifies the variable that the procedure uses to form BY groups. You can specify
more than one variable. If you do not use the NOTSORTED option in the BY
statement, then the observations in the data set must be sorted by all the variables
that you specify. Variables in a BY statement are called BY variables.
Options
DESCENDING
specifies that the observations are sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The observations are grouped in another way, for example, chronological order.
The requirement for ordering observations according to the values of BY variables is
suspended for BY-group processing when you use the NOTSORTED option. The
procedure defines a BY group as a set of contiguous observations that have the same
values for all BY variables. If observations with the same values for the BY variables
are not contiguous, then the procedure treats each contiguous set as a separate BY
group.
ID Statement
Lists variables to use to match observations.
See also: A Comparison with an ID Variable on page 241
Featured in: Example 5 on page 266
ID <DESCENDING> variable-1 <...<DESCENDING> variable-n> <NOTSORTED>;
Required Arguments
variable
specifies the variable that the procedure uses to match observations. You can specify
more than one variable, but the data set must be sorted by the variable or variables
you specify. These variables are ID variables. ID variables also identify observations
on the printed reports and in the output data set.
Options
DESCENDING
specifies that the data set is sorted in descending order by the variable that
immediately follows the word DESCENDING in the ID statement.
If you use the DESCENDING option, then you must sort the data sets. SAS does
not use an index to process an ID statement with the DESCENDING option.
Further, the use of DESCENDING for ID variables must correspond to the use of the
DESCENDING option in the BY statement in the PROC SORT step that was used to
sort the data sets.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The data are grouped in another way, for example, chronological order.
See also: Comparing Unsorted Data on page 238
- prints the warning Duplicate Observations for the first occurrence for that data
  set
- prints the total number of duplicate observations found in the data set in the
  observation summary report
- uses the first observation with the duplicate value for the comparison.
When the data sets are not sorted, PROC COMPARE detects only those duplicate
observations that occur in succession.
VAR Statement
Restricts the comparison of the values of variables to those named in the VAR statement.
Featured in: Example 2 on page 261, Example 3 on page 263, and Example 4 on page 264
VAR variable(s);
Required Arguments
variable(s)
one or more variables that appear in the BASE= and COMPARE= data sets or only
in the BASE= data set.
Details
- If you do not use the VAR statement, then PROC COMPARE compares the values
  of all matching variables except those appearing in BY and ID statements.
- If a variable in the VAR statement does not exist in the COMPARE= data set, then
  PROC COMPARE writes a warning message to the SAS log and ignores the
  variable.
- If a variable in the VAR statement does not exist in the BASE= data set, then
  PROC COMPARE stops processing and writes an error message to the SAS log.
- The VAR statement restricts only the comparison of values of matching variables.
  PROC COMPARE still reports on the total number of matching variables and
  compares their attributes. However, it produces neither error nor warning
  messages about these variables.
WITH Statement
Compares variables in the base data set with variables that have different names in the
comparison data set, and compares different variables that are in the same data set.
Restriction: You must use the VAR statement when you use the WITH statement.
Featured in: Example 2 on page 261, Example 3 on page 263, and Example 4 on page 264
WITH variable(s);
Required Arguments
variable(s)
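As an illustrative sketch of pairing VAR with WITH, the following step compares two differently named variables within a single data set. It assumes the chapter's PROCLIB.ONE data set and its numeric variables GR1 and GR2; because COMPARE= is omitted, the base data set also serves as the comparison data set.

```sas
/* Compare GR1 with GR2 inside PROCLIB.ONE.                      */
/* The first VAR variable is paired with the first WITH variable. */
proc compare base=proclib.one;
   var gr1;
   with gr2;
run;
```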
Data Set ONE

IDNUM  NAME      GENDER  GPA
2998   Bagwell           3.722
9866   Metcalf           3.342
2118   Gray              3.177
3847   Baglione          4.000
2342   Hall              3.574

Data Set TWO

IDNUM  NAME      GENDER  GPA    YEAR
2998   Bagwell           3.722
9866   Metcalf           3.342
2118   Gray              3.177
3847   Baglione          4.000
2342   Hall              3.574
7565   Gold              3.609
1755   Syme              3.883

When you use PROC COMPARE to compare data set TWO with data set ONE, the
procedure compares the first observation in data set ONE with the first observation in
data set TWO, the second observation in the first data set with the
second observation in the second data set, and so on. In each observation that it
compares, the procedure compares the values of IDNUM, NAME, GENDER, and
GPA.
The procedure does not report on the values of the last two observations or the
variable YEAR in data set TWO because there is nothing to compare them with in data
set ONE.
Figure 9.2

Data Set ONE

IDNUM  NAME      GENDER  GPA
2998   Bagwell           3.722
9866   Metcalf           3.342
2118   Gray              3.177
3847   Baglione          4.000
2342   Hall              3.574

Data Set TWO

IDNUM  NAME      GENDER  GPA    YEAR
2998   Bagwell           3.722
9866   Metcalf           3.342
2118   Gray              3.177
3847   Baglione          4.000
2342   Hall              3.574
7565   Gold              3.609
1755   Syme              3.883

The data sets contain three matching variables: NAME, GENDER, and GPA. They
also contain five matching observations: the observations with values of 2998, 9866,
2118, 3847, and 2342 for IDNUM.
Data Set TWO contains two observations (IDNUM=7565 and IDNUM=1755) for
which data set ONE contains no matching observations. Similarly, no variable in data
set ONE matches the variable YEAR in data set TWO.
See Example 5 on page 266 for an example that uses an ID variable.
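The matching behavior described above can be tried with a small sketch. The data sets WORK.ONE and WORK.TWO below are hypothetical cut-down versions of the figure (only IDNUM, NAME, and GPA are reproduced, and only some rows). Both data sets are sorted by the ID variable before the comparison, as the ID statement requires.

```sas
/* Hypothetical subsets of the data shown in the figure. */
data work.one;
   input idnum name $ gpa;
   datalines;
2998 Bagwell 3.722
9866 Metcalf 3.342
2118 Gray 3.177
;

data work.two;
   input idnum name $ gpa;
   datalines;
2998 Bagwell 3.722
9866 Metcalf 3.342
2118 Gray 3.177
7565 Gold 3.609
;

/* The ID statement expects both data sets sorted by the ID variable. */
proc sort data=work.one; by idnum; run;
proc sort data=work.two; by idnum; run;

/* Match observations on IDNUM; LISTOBS reports the observations */
/* that are found in only one of the two data sets.              */
proc compare base=work.one compare=work.two listobs;
   id idnum;
run;
```

With these inputs, the observation with IDNUM=7565 exists only in WORK.TWO, so it has no match in the base data set.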
- The RELATIVE method compares the absolute relative difference to the value
  specified by CRITERION=.
- The PERCENT method compares the absolute percent difference to the value
  specified by CRITERION=.
For a numeric variable compared, let x be its value in the base data set and let y be
its value in the comparison data set. If both x and y are nonmissing, then the values
are judged unequal according to the value of METHOD= and the value of CRITERION=
(γ) as follows:
- If METHOD=EXACT, then the values are unequal if y does not equal x.
- If METHOD=ABSOLUTE, then the values are unequal if
     ABS(y - x) > γ
- If METHOD=RELATIVE, then the values are unequal if
     ABS(y - x) / ((ABS(x) + ABS(y)) / 2 + δ) > γ
- If METHOD=PERCENT, then the values are unequal if
     100 * ABS(y - x) / ABS(x) > γ    for x ≠ 0
  or
     y ≠ 0                            for x = 0

Difference = y - x
Percent Difference = (y - x) / x * 100    for x ≠ 0
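The equality measures can be reproduced outside PROC COMPARE with a short DATA step. This is only a sketch: GAMMA and DELTA are hypothetical variable names that stand in for the CRITERION= value and the RELATIVE(δ) constant, the input values are made up, and the relative measure ABS(y - x) / ((ABS(x) + ABS(y)) / 2 + δ) is assumed from the SAS documentation of METHOD=RELATIVE.

```sas
/* Sketch only: GAMMA and DELTA are illustrative stand-ins for the */
/* CRITERION= value and the RELATIVE(delta) constant.              */
data _null_;
   gamma = 0.01;
   delta = 0;
   input x y;

   dif = y - x;                          /* Difference             */
   if x ne 0 then pctdif = dif / x * 100; /* Percent Difference    */
   else pctdif = .;                      /* not defined for x = 0  */

   /* METHOD=ABSOLUTE: unequal if ABS(y - x) > gamma */
   unequal_abs = (abs(dif) > gamma);

   /* METHOD=RELATIVE: guard against a zero denominator (x = y = 0) */
   denom = (abs(x) + abs(y)) / 2 + delta;
   if denom ne 0 then unequal_rel = (abs(dif) / denom > gamma);
   else unequal_rel = 0;

   put x= y= dif= pctdif= unequal_abs= unequal_rel=;
   datalines;
94 74
72 73
100 100.0001
;
```

The first two input pairs are the gr2 values that differ between PROCLIB.ONE and PROCLIB.TWO, so DIF and PCTDIF can be checked against the value comparison results shown in the procedure output.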
Results Reporting
PROC COMPARE reports the results of its comparisons in the following ways:
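- the SAS log
- return codes stored in the automatic macro variable SYSINFO
- procedure output
- output data sets.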
SAS Log
When you use the WARNING, PRINTALL, or ERROR option, PROC COMPARE
writes a description of the differences to the SAS log.
Bit  Condition   Code   Hex

  1  DSLABEL         1  0001X
  2  DSTYPE          2  0002X
  3  INFORMAT        4  0004X
  4  FORMAT          8  0008X
  5  LENGTH         16  0010X
  6  LABEL          32  0020X
  7  BASEOBS        64  0040X
  8  COMPOBS       128  0080X
  9  BASEBY        256  0100X
 10  COMPBY        512  0200X
 11  BASEVAR      1024  0400X
 12  COMPVAR      2048  0800X
 13  VALUE        4096  1000X
 14  TYPE         8192  2000X
 15  BYVAR       16384  4000X
 16  ERROR       32768  8000X

These codes are ordered and scaled to enable a simple check of the degree to which
the data sets differ. For example, if you want to check that two data sets contain the
same variables, observations, and values, but you do not care about differences in
labels, formats, and so forth, then use the following statements:
proc compare base=SAS-data-set
             compare=SAS-data-set;
run;

%if &sysinfo >= 64 %then
   %do;
      handle error;
   %end;
You can examine individual bits in the SYSINFO value by using DATA step
bit-testing features to check for specic conditions. For example, to check for the
presence of observations in the base data set that are not in the comparison data set,
use the following statements:
proc compare base=SAS-data-set
             compare=SAS-data-set;
run;

%let rc=&sysinfo;
data _null_;
   if &rc='1......'b then
      put 'Observations in Base but not in Comparison Data Set';
run;
PROC COMPARE must run before you check SYSINFO and you must obtain the
SYSINFO value before another SAS step starts because every SAS step resets
SYSINFO.
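An alternative to DATA step bit testing is to test the saved return code with the BAND function in macro code. The sketch below assumes the chapter's PROCLIB.ONE and PROCLIB.TWO data sets; %CHECK_COMPARE is a hypothetical macro name, and the codes 64 and 128 are the BASEOBS and COMPOBS values from the table of return codes.

```sas
/* %CHECK_COMPARE is a hypothetical macro written for illustration. */
%macro check_compare(base=, comp=);
   proc compare base=&base compare=&comp noprint;
   run;

   /* Save SYSINFO immediately; the next SAS step resets it. */
   %let rc = &sysinfo;

   /* BAND returns a nonzero value when the tested bit is set. */
   %if %sysfunc(band(&rc, 64)) %then
      %put NOTE: Base has observations that are not in the comparison data set.;
   %if %sysfunc(band(&rc, 128)) %then
      %put NOTE: Comparison has observations that are not in the base data set.;

   /* Codes of 64 and above indicate differences beyond attributes. */
   %if &rc >= 64 %then
      %put WARNING: Data sets differ in observations, variables, or values (SYSINFO=&rc).;
%mend check_compare;

%check_compare(base=proclib.one, comp=proclib.two)
```

Wrapping the checks in a macro keeps the %IF logic valid in releases that do not accept %IF in open code.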
Procedure Output
Procedure Output Overview
The following sections show and describe the default output of the two data sets
shown in Overview: COMPARE Procedure on page 226. Because PROC COMPARE
produces lengthy output, the output is presented in seven pieces.
Output 9.2
Partial Output
COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)
Data Set Summary

Dataset               Created          Modified  NVar    NObs  Label

PROCLIB.ONE  11SEP97:15:11:07  11SEP97:15:11:09     5       4  First Data Set
PROCLIB.TWO  11SEP97:15:11:10  11SEP97:15:11:10     6       5  Second Data Set

Variables Summary
This report compares the variables in the two data sets. The first part of the report
lists the following:
Number of Variables in Common: 5.
Number of Variables in PROCLIB.TWO but not in PROCLIB.ONE: 1.
Number of Variables with Conflicting Types: 1.
Number of Variables with Differing Attributes: 3.

Variable  Dataset      Type  Length

student   PROCLIB.ONE  Num        8
          PROCLIB.TWO  Char       8

Variable  Dataset      Type  Length  Format  Label

year      PROCLIB.ONE  Char       8          Year of Birth
          PROCLIB.TWO  Char       8
state     PROCLIB.ONE  Char       8
          PROCLIB.TWO  Char       8          Home State
gr1       PROCLIB.ONE  Num        8  4.1
          PROCLIB.TWO  Num        8  5.2

Observation Summary
This report provides information about observations in the base and comparison data
sets. First, the report identifies the first and last observation in each data set, the
first and last matching observations, and the first and last differing observations. Then,
the report lists the following:
- the number of observations that the data sets have in common
- the number of observations in the base data set that are not in the comparison
  data set and vice versa
- the total number of observations in each data set
- the number of matching observations for which PROC COMPARE judged some
  variables unequal
- the number of matching observations for which PROC COMPARE judged all
  variables equal.
Output 9.4
Partial Output
Observation Summary

Observation      Base  Compare

First Obs           1        1
First Unequal       1        1
Last  Unequal       4        4
Last  Match         4        4
Last  Obs           .        5

In addition, for the variables for which some matching observations have unequal
values, the report lists
- the name of the variable
Variable  Type  Len   Compare Label  Ndif   MaxDif

state     CHAR    8   Home State        2
gr1       NUM     8                     2    1.000
gr2       NUM     8                     2   20.000

Output 9.6
Partial Output
Value Comparison Results for Variables

__________________________________________________________
           ||  Home State
           ||  Base Value           Compare Value
       Obs ||  state                state
 ________  ||  ________             ________
           ||
         2 ||  MD                   MA
         4 ||  MA                   MD
__________________________________________________________

__________________________________________________________
           ||       Base    Compare
       Obs ||        gr1        gr1      Diff.     % Diff
 ________  ||  _________  _________  _________  _________
           ||
         1 ||       85.0      84.00    -1.0000    -1.1765
         3 ||       78.0      79.00     1.0000     1.2821
__________________________________________________________

__________________________________________________________
           ||       Base    Compare
       Obs ||        gr2        gr2      Diff.     % Diff
 ________  ||  _________  _________  _________  _________
           ||
         3 ||    72.0000    73.0000     1.0000     1.3889
         4 ||    94.0000    74.0000   -20.0000   -21.2766
__________________________________________________________

You can suppress the value comparison results with the NOVALUES option. If you
use both the NOVALUES and TRANSPOSE options, then PROC COMPARE lists for
each observation the names of the variables with values judged unequal but does not
display the values and differences.
MAX
the maximum value
MIN
the minimum value
STDERR
the standard error of the mean
T
the T ratio (MEAN/STDERR)
PROB> | T |
the probability of a greater absolute T value if the true population mean is 0.
NDIF
the number of matching observations judged unequal, and the percent of the
matching observations that were judged unequal.
DIFMEANS
the difference between the mean of the base values and the mean of the
comparison values. This line contains three numbers. The first is the difference
expressed as a percentage of the mean of the base values. The second is the difference
expressed as a percentage of the mean of the comparison values. The third is the
difference in the two means (the comparison mean minus the base mean).
R
the correlation of the base and comparison values for matching observations that
are nonmissing in both data sets.
RSQ
the square of the correlation of the base and comparison values for matching
observations that are nonmissing in both data sets.
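The DIFMEANS line can be checked by hand. The following sketch recomputes its three numbers for gr2 from the base and comparison means reported in the ALLSTATS output for PROCLIB.ONE and PROCLIB.TWO (86.25 and 81.50); the variable names are hypothetical.

```sas
/* Recompute DifMeans for gr2 from the reported means.           */
/* BASE_MEAN and COMP_MEAN are copied from the ALLSTATS output.  */
data _null_;
   base_mean = 86.25;
   comp_mean = 81.50;

   difmeans    = comp_mean - base_mean;        /* comparison minus base  */
   pct_of_base = 100 * difmeans / base_mean;   /* as a % of base mean    */
   pct_of_comp = 100 * difmeans / comp_mean;   /* as a % of compare mean */

   put difmeans= pct_of_base= 8.3 pct_of_comp= 8.3;
run;
```

The results, -4.75, -5.507, and -5.828, agree with the DifMeans line that the procedure prints for gr2.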
Output 9.7 is from the ALLSTATS option using the two data sets shown in
Overview:
Output 9.7
Partial Output
Value Comparison Results for Variables

__________________________________________________________
           ||       Base    Compare
       Obs ||        gr1        gr1      Diff.     % Diff
 ________  ||  _________  _________  _________  _________
           ||
         1 ||       85.0      84.00    -1.0000    -1.1765
         3 ||       78.0      79.00     1.0000     1.2821
 ________  ||  _________  _________  _________  _________
           ||
 N         ||          4          4          4          4
 Mean      ||    85.5000    85.5000          0     0.0264
 Std       ||     5.8023     5.4467     0.8165     1.0042
 Max       ||    92.0000    92.0000     1.0000     1.2821
 Min       ||    78.0000    79.0000    -1.0000    -1.1765
 StdErr    ||     2.9011     2.7234     0.4082     0.5021
 t         ||    29.4711    31.3951     0.0000     0.0526
 Prob>|t|  ||     <.0001     <.0001     1.0000     0.9614
           ||
 Ndif      ||          2    50.000%
 DifMeans  ||     0.000%     0.000%          0
 r, rsq    ||      0.991      0.983
__________________________________________________________

__________________________________________________________
           ||       Base    Compare
       Obs ||        gr2        gr2      Diff.     % Diff
 ________  ||  _________  _________  _________  _________
           ||
         3 ||    72.0000    73.0000     1.0000     1.3889
         4 ||    94.0000    74.0000   -20.0000   -21.2766
 ________  ||  _________  _________  _________  _________
           ||
 N         ||          4          4          4          4
 Mean      ||    86.2500    81.5000    -4.7500    -4.9719
 Std       ||     9.9457     9.4692    10.1776    10.8895
 Max       ||    94.0000    92.0000     1.0000     1.3889
 Min       ||    72.0000    73.0000   -20.0000   -21.2766
 StdErr    ||     4.9728     4.7346     5.0888     5.4447
 t         ||    17.3442    17.2136    -0.9334    -0.9132
 Prob>|t|  ||     0.0004     0.0004     0.4195     0.4285
           ||
 Ndif      ||          2    50.000%
 DifMeans  ||    -5.507%    -5.828%    -4.7500
 r, rsq    ||      0.451      0.204
__________________________________________________________

Note: If you use a wide line size with PRINTALL, then PROC COMPARE prints the
value comparison result for character variables next to the result for numeric variables.
In that case, PROC COMPARE calculates only NDIF for the character variables.
_OBS_1=number-1 _OBS_2=number-2
where number-1 is the number of the observation in the base data set for which the
value of the variable is shown, and number-2 is the number of the observation in the
comparison data set.
Output 9.8 shows the differences in PROCLIB.ONE and PROCLIB.TWO by
observation instead of by variable.
_OBS_1=1 _OBS_2=1:
 Variable  Base Value     Compare        Diff.       % Diff
 gr1             85.0       84.00    -1.000000    -1.176471

_OBS_1=2 _OBS_2=2:
 Variable  Base Value     Compare
 state     MD             MA

_OBS_1=3 _OBS_2=3:
 Variable  Base Value     Compare        Diff.       % Diff
 gr1             78.0       79.00     1.000000     1.282051
 gr2        72.000000   73.000000     1.000000     1.388889

_OBS_1=4 _OBS_2=4:
 Variable  Base Value     Compare        Diff.       % Diff
 gr2        94.000000   74.000000   -20.000000   -21.276596
 state     MA             MD

If you use an ID statement, then the identifying label has the following form:
ID-1=ID-value-1 ...
ID-n=ID-value-n
where ID is the name of an ID variable and ID-value is the value of the ID variable.
Note: When you use the TRANSPOSE option, PROC COMPARE prints only the
first 12 characters of the value.
Table Name            Description                       Generated...

CompareDatasets                                         by default, unless
                                                        NOSUMMARY or NOVALUES
                                                        option is specified
CompareDetails (Comparison
Results for Observations)
CompareDifferences
CompareSummary        Summary report of                 by default
                      observations, values, and
                      variables with unequal values
CompareVariables      A listing of differences in
                      variable types or attributes
                      between the base data set and
                      the compare data set
PERCENT
The values in this observation are the percent differences between the values
in the base and comparison data sets. For character variables the values in
observations of type PERCENT are the same as the values in observations of
type DIF.
_OBS_
is a numeric variable that contains a number further identifying the source of the
OUT= observations.
For observations with _TYPE_ equal to BASE, _OBS_ is the number of the
observation in the base data set from which the values of the VAR variables were
copied. Similarly, for observations with _TYPE_ equal to COMPARE, _OBS_ is the
number of the observation in the comparison data set from which the values of the
VAR variables were copied.
For observations with _TYPE_ equal to DIF or PERCENT, _OBS_ is a sequence
number that counts the matching observations in the BY group.
_OBS_ has the label Observation Number.
The COMPARE procedure takes variable names and attributes for the OUT= data
set from the base data set except for the lengths of ID and VAR variables, for which it
uses the longer length regardless of which data set that length is from. This behavior
has two important repercussions:
- If you use the VAR and WITH statements, then the names of the variables in the
  OUT= data set come from the VAR statement. Thus, observations with _TYPE_
  equal to BASE contain the values of the VAR variables, while observations with
  _TYPE_ equal to COMPARE contain the values of the WITH variables.
- If you include a variable more than once in the VAR statement in order to compare
  it with more than one variable, then PROC COMPARE can include only the first
  comparison in the OUT= data set because each variable must have a unique name.
  Other comparisons produce warning messages.
For an example of the OUT= option, see Example 6 on page 270.
_BASE_
is a numeric variable that contains the value of the statistic calculated from the
values of the variable named by _VAR_ in the observations in the base data set
with matching observations in the comparison data set.
_COMP_
is a numeric variable that contains the value of the statistic calculated from the
values of the variable named by the _VAR_ variable (or by the _WITH_ variable if
you use the WITH statement) in the observations in the comparison data set with
matching observations in the base data set.
_DIF_
is a numeric variable that contains the value of the statistic calculated from the
differences of the values of the variable named by the _VAR_ variable in the base
data set and the matching variable (named by the _VAR_ or _WITH_ variable) in
the comparison data set.
_PCTDIF_
is a numeric variable that contains the value of the statistic calculated from the
percent differences of the values of the variable named by the _VAR_ variable in
the base data set and the matching variable (named by the _VAR_ or _WITH_
variable) in the comparison data set.
Note: For both types of output data sets, PROC COMPARE assigns one of the
following data set labels:
Comparison of base-SAS-data-set
with comparison-SAS-data-set
Comparison of variables in base-SAS-data-set
Labels are limited to 40 characters.
See Example 7 on page 273 for an example of an OUTSTATS= data set.
This example shows the most complete report that PROC COMPARE produces as
procedure output.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create a complete report of the differences between two data sets. BASE= and
COMPARE= specify the data sets to compare. PRINTALL prints a full report of the differences.
proc compare base=proclib.one compare=proclib.two printall;
   title 'Comparing Two Data Sets: Full Report';
run;
Output
A > in the output marks information that is in the full report but not in the default report. The
additional information includes a listing of variables found in one data set but not the other, a
listing of observations found in one data set but not the other, a listing of variables with all
equal values, and summary statistics. For an explanation of the statistics, see Table of
Summary Statistics on page 250.
COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)

Data Set Summary

Dataset               Created          Modified  NVar    NObs  Label

PROCLIB.ONE  11SEP97:16:19:59  11SEP97:16:20:01     5       4  First Data Set
PROCLIB.TWO  11SEP97:16:20:01  11SEP97:16:20:01     6       5  Second Data Set

Variables Summary

Number of Variables in Common: 5.
Number of Variables in PROCLIB.TWO but not in PROCLIB.ONE: 1.
Number of Variables with Conflicting Types: 1.
Number of Variables with Differing Attributes: 3.

Listing of Variables in PROCLIB.TWO but not in PROCLIB.ONE

Variable  Type  Length

major     Char

Listing of Common Variables with Conflicting Types

Variable  Dataset      Type  Length

student   PROCLIB.ONE  Num        8
          PROCLIB.TWO  Char       8

COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)

Listing of Common Variables with Differing Attributes

Variable  Dataset      Type  Length  Format  Label

year      PROCLIB.ONE  Char       8          Year of Birth
          PROCLIB.TWO  Char       8
state     PROCLIB.ONE  Char       8
          PROCLIB.TWO  Char       8          Home State
gr1       PROCLIB.ONE  Num        8  4.1
          PROCLIB.TWO  Num        8  5.2

Observation Summary

Observation      Base  Compare

First Obs           1        1
First Unequal       1        1
Last  Unequal       4        4
Last  Match         4        4
Last  Obs           .        5

COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)

Values Comparison Summary

Number of Variables Compared with All Observations Equal: 1.
Number of Variables Compared with Some Observations Unequal: 3.
Total Number of Values which Compare Unequal: 6.
Maximum Difference: 20.

Variable  Type  Len   Label

year      CHAR    8   Year of Birth

Variable  Type  Len   Compare Label  Ndif   MaxDif

state     CHAR    8   Home State        2
gr1       NUM     8                     2    1.000
gr2       NUM     8                     2   20.000

COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)
Value Comparison Results for Variables

__________________________________________________________
           ||  Year of Birth
           ||  Base Value           Compare Value
       Obs ||  year                 year
 ________  ||  ________             ________
           ||
         1 ||  1970                 1970
         2 ||  1971                 1971
         3 ||  1969                 1969
         4 ||  1970                 1970
__________________________________________________________

__________________________________________________________
           ||  Home State
           ||  Base Value           Compare Value
       Obs ||  state                state
 ________  ||  ________             ________
           ||
         1 ||  NC                   NC
         2 ||  MD                   MA
         3 ||  PA                   PA
         4 ||  MA                   MD
__________________________________________________________

__________________________________________________________
          ||       Base    Compare
      Obs ||        gr1        gr1      Diff.     % Diff
 ________ ||  _________  _________  _________  _________
          ||
        1 ||       85.0      84.00    -1.0000    -1.1765
        2 ||       92.0      92.00          0          0
        3 ||       78.0      79.00     1.0000     1.2821
        4 ||       87.0      87.00          0          0
 ________ ||  _________  _________  _________  _________
          ||
        N ||          4          4          4          4
     Mean ||    85.5000    85.5000          0     0.0264
      Std ||     5.8023     5.4467     0.8165     1.0042
      Max ||    92.0000    92.0000     1.0000     1.2821
      Min ||    78.0000    79.0000    -1.0000    -1.1765
   StdErr ||     2.9011     2.7234     0.4082     0.5021
        t ||    29.4711    31.3951     0.0000     0.0526
 Prob>|t| ||     <.0001     <.0001     1.0000     0.9614
          ||
     Ndif ||          2    50.000%
 DifMeans ||     0.000%     0.000%          0
   r, rsq ||      0.991      0.983
__________________________________________________________
COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)

Value Comparison Results for Variables

__________________________________________________________
          ||       Base    Compare
      Obs ||        gr2        gr2      Diff.     % Diff
 ________ ||  _________  _________  _________  _________
          ||
        1 ||    87.0000    87.0000          0          0
        2 ||    92.0000    92.0000          0          0
        3 ||    72.0000    73.0000     1.0000     1.3889
        4 ||    94.0000    74.0000   -20.0000   -21.2766
 ________ ||  _________  _________  _________  _________
          ||
        N ||          4          4          4          4
     Mean ||    86.2500    81.5000    -4.7500    -4.9719
      Std ||     9.9457     9.4692    10.1776    10.8895
      Max ||    94.0000    92.0000     1.0000     1.3889
      Min ||    72.0000    73.0000   -20.0000   -21.2766
   StdErr ||     4.9728     4.7346     5.0888     5.4447
        t ||    17.3442    17.2136    -0.9334    -0.9132
 Prob>|t| ||     0.0004     0.0004     0.4195     0.4285
          ||
     Ndif ||          2    50.000%
 DifMeans ||    -5.507%    -5.828%    -4.7500
   r, rsq ||      0.451      0.204
__________________________________________________________
This example compares a variable from the base data set with a variable in the
comparison data set. All summary reports are suppressed.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Suppress all summary reports of the differences between two data sets. BASE=
specifies the base data set and COMPARE= specifies the comparison data set. NOSUMMARY
suppresses all summary reports.
proc compare base=proclib.one compare=proclib.two nosummary;
Specify one variable from the base data set to compare with one variable from the
comparison data set. The VAR and WITH statements specify the variables to compare. This
example compares GR1 from the base data set with GR2 from the comparison data set.
var gr1;
with gr2;
title 'Comparison of Variables in Different Data Sets';
run;
Output
Comparison of Variables in Different Data Sets
COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)
NOTE: Data set PROCLIB.TWO contains 1 observations not in PROCLIB.ONE.
NOTE: Values of the following 1 variables compare unequal: gr1^=gr2
VAR statement
WITH statement
Data sets:
This example compares one variable from the base data set with two variables in the
comparison data set.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Suppress all summary reports of the differences between two data sets. BASE=
specifies the base data set and COMPARE= specifies the comparison data set. NOSUMMARY
suppresses all summary reports.
proc compare base=proclib.one compare=proclib.two nosummary;
Specify one variable from the base data set to compare with two variables from the
comparison data set. The VAR and WITH statements specify the variables to compare. This
example compares GR1 from the base data set with GR1 and GR2 from the comparison data set.
var gr1 gr1;
with gr1 gr2;
title 'Comparison of One Variable with Two Variables';
run;
Output
The Value Comparison Results section shows the result of the comparison.
COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)
NOTE: Data set PROCLIB.TWO contains 1 observations not in PROCLIB.ONE.
NOTE: Values of the following 2 variables compare unequal: gr1^=gr1 gr1^=gr2
__________________________________________________________
          ||       Base    Compare
      Obs ||        gr1        gr2      Diff.     % Diff
 ________ ||  _________  _________  _________  _________
          ||
        1 ||       85.0    87.0000     2.0000     2.3529
        3 ||       78.0    73.0000    -5.0000    -6.4103
        4 ||       87.0    74.0000   -13.0000   -14.9425
__________________________________________________________
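The Diff. and % Diff columns in a value comparison report can be checked by hand. The following is a minimal sketch, not part of the original example. It assumes that Diff. is the compare value minus the base value and that % Diff is that difference expressed as a percentage of the base value, which agrees with every row shown above. The variable names BASE, COMPARE, DIFF, and PCTDIFF are hypothetical.

```sas
/* Hypothetical check of the first row above (gr1=85 in the base data */
/* set, gr2=87 in the comparison data set).                           */
data _null_;
   base    = 85;
   compare = 87;
   diff    = compare - base;
   /* Assumed formula: difference relative to the base value. */
   pctdiff = 100 * diff / base;
   put diff= 8.4 pctdiff= 8.4;
run;
```

If the assumption holds, the values written to the log should agree with the 2.0000 and 2.3529 reported for observation 1.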
This example shows that PROC COMPARE can compare two variables that are in
the same data set.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create a short summary report of the differences within one data set. ALLSTATS prints
summary statistics. BRIEFSUMMARY prints only a short comparison summary.
proc compare base=proclib.one allstats briefsummary;
Specify two variables from the base data set to compare. The VAR and WITH statements
specify the variables in the base data set to compare. This example compares GR1 with GR2.
Because there is no comparison data set, the variables GR1 and GR2 must be in the base data
set.
var gr1;
with gr2;
title 'Comparison of Variables in the Same Data Set';
run;
Output
Comparison of Variables in the Same Data Set
COMPARE Procedure
Comparisons of variables in PROCLIB.ONE
(Method=EXACT)
NOTE: Values of the following 1 variables compare unequal: gr1^=gr2
ID statement
In this example, PROC COMPARE compares only the observations that have
matching values for the ID variable.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Sort the data sets by the ID variable. Both data sets must be sorted by the variable that will
be used as the ID variable in the PROC COMPARE step. OUT= specifies the location of the
sorted data.
proc sort data=proclib.emp95 out=emp95_byidnum;
by idnum;
run;
proc sort data=proclib.emp96 out=emp96_byidnum;
by idnum;
run;
Create a summary report that compares observations with matching values for the ID
variable. The ID statement specifies IDNUM as the ID variable.
proc compare base=emp95_byidnum compare=emp96_byidnum;
id idnum;
title 'Comparing Observations that Have Matching IDNUMs';
run;
Output
COMPARE Procedure
Comparison of WORK.EMP95_BYIDNUM with WORK.EMP96_BYIDNUM
(Method=EXACT)

Data Set Summary

Dataset                      Created          Modified  NVar    NObs

WORK.EMP95_BYIDNUM  13MAY98:16:03:36  13MAY98:16:03:36     4      10
WORK.EMP96_BYIDNUM  13MAY98:16:03:36  13MAY98:16:03:36     4      12
Variables Summary
Number of Variables in Common: 4.
Number of ID Variables: 1.
Observation Summary

Observation      Base  Compare  ID

First Obs           1        1  idnum=0987
First Unequal       1        1  idnum=0987
Last  Unequal      10       12  idnum=9857
Last  Obs          10       12  idnum=9857
Variable  Type  Len  Ndif   MaxDif

address   CHAR   42     4
salary    NUM     8     4     2400
COMPARE Procedure
Comparison of WORK.EMP95_BYIDNUM with WORK.EMP96_BYIDNUM
(Method=EXACT)
Value Comparison Results for Variables
_______________________________________________________
       ||       Base    Compare
 idnum ||     salary     salary      Diff.     % Diff
 _____ ||  _________  _________  _________  _________
       ||
 0987  ||      44010      45110       1100     2.4994
 3286  ||      87734      89834       2100     2.3936
 3888  ||      77558      79958       2400     3.0945
 9857  ||      38756      40456       1700     4.3864
_______________________________________________________
This example creates and prints an output data set that shows the differences
between matching observations.
In Example 5 on page 266, the output does not show the differences past the 20th
character. The output data set in this example shows the full values. Further, it shows
the observations that occur in only one of the data sets.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=120 pagesize=40;
Sort the data sets by the ID variable. Both data sets must be sorted by the variable that will
be used as the ID variable in the PROC COMPARE step. OUT= specifies the location of the
sorted data.
proc sort data=proclib.emp95 out=emp95_byidnum;
by idnum;
run;
proc sort data=proclib.emp96 out=emp96_byidnum;
by idnum;
run;
Specify the data sets to compare. BASE= and COMPARE= specify the data sets to compare.
proc compare base=emp95_byidnum compare=emp96_byidnum
Create the output data set RESULT and include all unequal observations and their
differences. OUT= names and creates the output data set. NOPRINT suppresses the printing
of the procedure output. OUTNOEQUAL includes only observations that are judged unequal.
OUTBASE writes an observation to the output data set for each observation in the base data
set. OUTCOMP writes an observation to the output data set for each observation in the
comparison data set. OUTDIF writes an observation to the output data set that contains the
differences between the two observations.
out=result outnoequal outbase outcomp outdif
noprint;
Specify the ID variable. The ID statement specifies IDNUM as the ID variable.
id idnum;
run;
Print the output data set RESULT and use the BY and ID statements with the ID
variable. PROC PRINT prints the output data set. Using the BY and ID statements with the
same variable makes the output easy to read. See Chapter 34, The PRINT Procedure, on page
703 for more information on this technique.
proc print data=result noobs;
by idnum;
id idnum;
title 'The Output Data Set RESULT';
run;
Output
The differences for character variables are noted with an X or a period (.). An X shows that the characters do
not match. A period shows that the characters do match. For numeric variables, an E means that there is no
difference. Otherwise, the numeric difference is shown. By default, the output data set shows that two
observations in the comparison data set have no matching observation in the base data set. You do not have to
use an option to make those observations appear in the output data set.
_TYPE_
0987
BASE
COMPARE
DIF
name
address
Dolly Lunford
1
1
Dolly Lunford
...............
45110
1100
BASE
COMPARE
5
5
Robert Jones
Robert Jones
29025
29025
DIF
...............
........................................X.
3278
COMPARE
Mary Cravens
35362
3286
BASE
COMPARE
6
7
Hoa Nguyen
Hoa Nguyen
87734
89834
DIF
...............
..........................................
2100
BASE
Kim Siu
77558
COMPARE
DIF
8
7
Kim Siu
...............
79958
2400
6544
COMPARE
Roger Monday
47007
9857
BASE
COMPARE
DIF
10
12
10
Kathy Krupski
Kathy Krupski
...............
38756
40456
1700
2776
3888
_OBS_
1
salary
Apex NC 27505
44010
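When only the differences are of interest, the rows of the output data set can be filtered on the _TYPE_ variable. The following is a minimal sketch, not part of the original example. It assumes that WORK.RESULT was created by the PROC COMPARE step above and that difference rows carry the value DIF in _TYPE_, as the listing indicates. The title text is hypothetical.

```sas
/* Assumes WORK.RESULT exists from the PROC COMPARE step above. */
proc print data=result noobs;
   /* Keep only the rows that hold differences. */
   where _type_ = 'DIF';
   title 'Difference Rows in RESULT';
run;
```

Changing the WHERE expression to select BASE or COMPARE rows would show the original values instead.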
This example creates an output data set that contains summary statistics for the
numeric variables that are compared.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Sort the data sets by the ID variable. Both data sets must be sorted by the variable that will
be used as the ID variable in the PROC COMPARE step. OUT= specifies the location of the
sorted data.
proc sort data=proclib.emp95 out=emp95_byidnum;
by idnum;
run;
proc sort data=proclib.emp96 out=emp96_byidnum;
by idnum;
run;
Create the output data set of statistics and compare observations that have matching
values for the ID variable. BASE= and COMPARE= specify the data sets to compare.
OUTSTATS= creates the output data set DIFFSTAT. NOPRINT suppresses the procedure
output. The ID statement specifies IDNUM as the ID variable. PROC COMPARE uses the
values of IDNUM to match observations.
proc compare base=emp95_byidnum compare=emp96_byidnum
outstats=diffstat noprint;
id idnum;
run;
Print the output data set DIFFSTAT. PROC PRINT prints the output data set DIFFSTAT.
proc print data=diffstat noobs;
title 'The DIFFSTAT Data Set';
run;
Output
The variables are described in Output Statistics Data Set (OUTSTATS=) on page 255.
_VAR_    _TYPE_      _BASE_     _COMP_     _DIF_    _PCTDIF_

salary   N            10.00      10.00     10.00     10.0000
salary   MEAN      52359.00   53089.00    730.00      1.2374
salary   STD       24143.84   24631.01    996.72      1.6826
salary   MAX       92100.00   92100.00   2400.00      4.3864
salary   MIN       29025.00   29025.00      0.00      0.0000
salary   STDERR     7634.95    7789.01    315.19      0.5321
salary   T             6.86       6.82      2.32      2.3255
salary   PROBT         0.00       0.00      0.05      0.0451
salary   NDIF          4.00      40.00       .         .
salary   DIFMEANS      1.39       1.38    730.00       .
salary   R,RSQ         1.00       1.00       .         .
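Because DIFFSTAT is an ordinary SAS data set, a single statistic can be read from it in a later step. The following is a minimal sketch, not part of the original example. It assumes that WORK.DIFFSTAT exists from the step above. Judging from the listing, the NDIF row appears to hold the number of unequal values in _BASE_ and the percentage of unequal values in _COMP_; treat that reading as an assumption.

```sas
/* Assumes WORK.DIFFSTAT exists from the OUTSTATS= step above. */
data _null_;
   set diffstat;
   where _type_ = 'NDIF';
   /* Assumed meaning: _BASE_ = count of unequal values, */
   /* _COMP_ = percentage of unequal values.             */
   put _var_= _base_= _comp_=;
run;
```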
CHAPTER 10
The CONTENTS Procedure
Overview: CONTENTS Procedure 275
Syntax: CONTENTS Procedure 275
See:
To do this
CENTILES
DATA=
DETAILS|NODETAILS
DIRECTORY
FMTLEN
MEMTYPE=
NODS
NOPRINT
ORDER=IGNORECASE
OUT=
OUT2=
SHORT
VARNUM
CHAPTER 11
The COPY Procedure
Overview: COPY Procedure 277
Syntax: COPY Procedure 277
Concepts: COPY Procedure 278
Transporting SAS Data Sets between Hosts 278
Example: COPY Procedure 279
Example 1: Copying SAS Data Sets between Hosts 279
• The IN= argument is required with PROC COPY. In the COPY statement, IN= is
optional. If IN= is omitted, the default value is the libref of the procedure input
library.
• PROC DATASETS cannot work with libraries that allow only sequential data
access.
Note: The MIGRATE procedure is available specifically for migrating a SAS data
library from a previous release to the most recent release. For migration, PROC
MIGRATE offers benefits that PROC COPY does not. For documentation on PROC
MIGRATE, see the Migration Community at http://support.sas.com/rnd/migration.
either the transport (XPORT) engine or the XML engine. This file is referred to as
a transport file and is always a sequential file.
2 After the file is created, you can move it to another operating environment via
to copy the data sets from the transport file to a SAS data library.
For an example, see Example 1 on page 279.
For details on transporting files, see Moving and Accessing SAS Files across
Operating Environments.
The CPORT and CIMPORT procedures also provide a way to transport SAS files. For
information, see Chapter 8, The CIMPORT Procedure, on page 215 and Chapter 13,
The CPORT Procedure, on page 285.
XPORT engine
Program
Assign library references. Assign a libref, such as SOURCE, to the SAS data library that
contains the SAS data set that you want to transport. Also, assign a libref to the transport file
and use the XPORT keyword to specify the XPORT engine.
libname source 'SAS-data-library-on-sending-host';
libname xptout xport 'filename-on-sending-host';
Copy the SAS data sets to the transport file. Use PROC COPY to copy the SAS data sets
from the IN= library to the transport file. MEMTYPE=DATA specifies that only SAS data sets
are copied. SELECT selects the data sets that you want to copy.
proc copy in=source out=xptout memtype=data;
select bonus budget salary;
run;
SAS Log
1    libname source 'SAS-data-library-on-sending-host';
NOTE: Libref SOURCE was successfully assigned as follows:
      Engine:        V9
      Physical Name: SAS-data-library-on-sending-host
2    libname xptout xport 'filename-on-sending-host';
NOTE: Libref XPTOUT was successfully assigned as follows:
      Engine:        XPORT
      Physical Name: filename-on-sending-host
3    proc copy in=source out=xptout memtype=data;
4       select bonus budget salary;
5    run;
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
Enable the procedure to read data from the transport file. The XPORT engine in the
LIBNAME statement enables the procedure to read the data from the transport file.
libname insource xport 'filename-on-receiving-host';
Copy the SAS data sets to the receiving host. After you copy the files (for example, by using
FTP in binary mode to the Windows NT host), use PROC COPY to copy the SAS data sets to the
WORK data library on the receiving host.
proc copy in=insource out=work;
run;
1    libname insource xport 'filename-on-receiving-host';
NOTE: Libref INSOURCE was successfully assigned as follows:
      Engine:        XPORT
      Physical Name: filename-on-receiving-host
2    proc copy in=insource out=work;
3    run;
NOTE: Input library INSOURCE is sequential.
NOTE: Copying INSOURCE.BUDGET to WORK.BUDGET (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines.
      System Option for BUFSIZE was used.
NOTE: The data set WORK.BUDGET has 1 observations and 3 variables.
NOTE: Copying INSOURCE.BONUS to WORK.BONUS (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines.
      System Option for BUFSIZE was used.
NOTE: The data set WORK.BONUS has 1 observations and 3 variables.
NOTE: Copying INSOURCE.SALARY to WORK.SALARY (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines.
      System Option for BUFSIZE was used.
NOTE: The data set WORK.SALARY has 1 observations and 3 variables.
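After the copy completes, it can be useful to confirm which members arrived in the WORK library. The following is a minimal sketch, not part of the original example. It assumes that the PROC COPY step above has already run in the same SAS session on the receiving host.

```sas
/* Lists the members of the WORK library. NODS suppresses the   */
/* per-data-set detail, so only the library directory is shown. */
proc contents data=work._all_ nods;
run;
```

If the copy succeeded as the log above reports, the directory should list BUDGET, BONUS, and SALARY.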
CHAPTER 12
The CORR Procedure
Information about the CORR Procedure 283
The documentation for the CORR procedure has moved to Volume 3 of this book.
CHAPTER 13
The CPORT Procedure
Overview: CPORT Procedure 285
What Does the CPORT Procedure Do? 285
General File Transport Process 286
Syntax: CPORT Procedure 286
PROC CPORT Statement 286
EXCLUDE Statement 292
SELECT Statement 293
TRANTAB Statement 294
Concepts: CPORT Procedure 294
Results: CPORT Procedure 294
Examples: CPORT Procedure 295
Example 1: Exporting Multiple Catalogs 295
Example 2: Exporting Individual Catalog Entries 296
Example 3: Exporting a Single SAS Data Set 296
Example 4: Applying a Translation Table 297
Example 5: Exporting Entries Based on Modification Date 298
PROC CPORT produces no output (other than the transport files), but it does write
notes to the SAS log.
• Use PROC CIMPORT to translate the transport file into the format appropriate
To do this
FILE=
To do this
TAPE
AFTER=
EET=
ET=
GENERATION=
MEMTYPE=
ASIS
CONSTRAINT
DATECOPY
INDEX
NOCOMPRESS
OUTTYPE=
UPCASE
TRANSLATE
NOEDIT
NOSRC
OUTLIB=
Required Arguments
identifies the type of file to export and specifies the catalog, SAS data set, or SAS
data library to export.
source-type
identifies the file(s) to export as a single catalog, as a single SAS data set, or as
the members of a SAS data library. The source-type argument can be one of the
following:
CATALOG | CAT | C
DATA | DS | D
LIBRARY | LIB | L
libref | <libref.>member-name
specifies the specific catalog, SAS data set, or SAS data library to export. If
source-type is CATALOG or DATA, you can specify both a libref and a member
name. If the libref is omitted, PROC CPORT uses the default library as the libref,
which is usually the WORK library. If the source-type argument is LIBRARY,
specify only a libref. If you specify a library, PROC CPORT exports only data sets
and catalogs from that library. You cannot export other types of files.
Options
AFTER=date
exports copies of all data sets or catalog entries that have a modification date later
than or equal to the date you specify. The modification date is the most recent date
when the contents of the data set or catalog entry changed. Specify date as a SAS
date literal or as a numeric SAS date value.
Tip: You can determine the modification date of a catalog entry by using the
CATALOG procedure.
Featured in: Example 5 on page 298.
ASIS
suppresses the conversion of displayed character data to transport format. Use this
option when you move files that contain DBCS (double-byte character set) data from
one operating environment to another if both operating environments use the same
type of DBCS data.
Interaction: The ASIS option invokes the NOCOMPRESS option.
Interaction: You cannot use both the ASIS option and the OUTTYPE= options in
the same PROC CPORT step.
CONSTRAINT=YES | NO
controls the exportation of integrity constraints that have been defined on a data set.
When you specify CONSTRAINT=YES, all types of integrity constraints are exported
for a library; only general integrity constraints are exported for a single data set.
When you specify CONSTRAINT=NO, indexes created without integrity constraints
are ported, but neither integrity constraints nor any indexes created with integrity
constraints are ported. For more information on integrity constraints, see the section
on SAS files in SAS Language Reference: Concepts.
Alias: CON=
Default: YES
Interaction: You cannot specify both CONSTRAINT= and INDEX= in the same
PROC CPORT step.
Interaction: If you specify INDEX=NO, no integrity constraints are exported.
DATECOPY
copies the SAS internal date and time when the SAS file was created and the date
and time when it was last modified to the resulting transport file. Note that the
operating environment date and time are not preserved.
Restriction: DATECOPY can be used only when the destination file uses the V8 or
V9 engine.
Tip: You can alter the file creation date and time with the DTC= option on the
MODIFY statement (see MODIFY Statement on page 348) in a PROC DATASETS step.
EET=(etype(s))
excludes specified entry types from the transport file. If etype is a single entry type,
then you can omit the parentheses. Separate multiple values with a space.
Interaction: You cannot use both the EET= option and the ET= option in the same
PROC CPORT step.
ET=(etype(s))
includes specified entry types in the transport file. If etype is a single entry type,
then you can omit the parentheses. Separate multiple values with a space.
Interaction: You cannot use both the EET= option and the ET= option in the same
PROC CPORT step.
FILE=fileref | 'filename'
specifies a previously defined fileref or the filename of the transport file to write to. If
you omit the FILE= option, then PROC CPORT writes to the fileref SASCAT, if
defined. If the fileref SASCAT is not defined, PROC CPORT writes to SASCAT.DAT
in the current directory.
Note: The behavior of PROC CPORT when SASCAT is undefined varies from one
operating environment to another. For details, see the SAS documentation for your
operating environment.
Featured in: All examples.
GENERATION=YES | NO
specifies whether to export all generations of a SAS data set. To export only the base
generation of a data set, specify GENERATION=NO in the PROC CPORT statement.
To export a specific generation number, use the GENNUM= data set option when you
specify a data set in the PROC CPORT statement. For more information on
generation data sets, see SAS Language Reference: Concepts.
Note: PROC CIMPORT imports all generations of a data set that are present in
the transport file. It deletes any previous generation set with the same name and
replaces it with the imported generation set, even if the number of generations does
not match.
Alias: GEN=
INTYPE=DBCS-type
specifies the type of DBCS data stored in the SAS files to be exported. Double-byte
character set (DBCS) data uses up to two bytes for each character in the set.
DBCS-type must be one of the following values:
IBM | HITAC | FACOM    for z/OS
IBM                    for VSE
DEC | SJIS             for OpenVMS
PCIBM | SJIS           for OS/2
Restriction: The INTYPE= option is allowed only if SAS is built with Double-Byte
Tip:
MEMTYPE=mtype
restricts the type of SAS file that PROC CPORT writes to the transport file.
MEMTYPE= restricts processing to one member type. Values for mtype can be
ALL              both catalogs and data sets
CATALOG | CAT    catalogs
DATA | DS        SAS data sets
Alias: MT=
Default: ALL
Featured in:
NOCOMPRESS
suppresses the compression of binary zeros and blanks in the transport file.
Alias: NOCOMP
Default: By default, PROC CPORT compresses binary zeros and blanks to conserve
space.
Interaction: The ASIS, INTYPE=, and OUTTYPE= options invoke the
NOCOMPRESS option.
Note: Compression of the transport file does not alter the flag in each catalog and
data set that indicates whether the original file was compressed.
NOEDIT
exports SAS/AF PROGRAM and SCL entries without edit capability when you
import them.
The NOEDIT option produces the same results as when you create a new catalog
to contain SCL code by using the MERGE statement with the NOEDIT option in the
BUILD procedure of SAS/AF software.
Note: The NOEDIT option affects only SAS/AF PROGRAM and SCL entries. It
does not affect FSEDIT SCREEN or FSVIEW FORMULA entries.
Alias: NEDIT
NOSRC
specifies that exported catalog entries contain compiled SCL code but not the source
code.
The NOSRC option produces the same results as when you create a new catalog to
contain SCL code by using the MERGE statement with the NOSOURCE option in
the BUILD procedure of SAS/AF software.
Alias: NSRC
OUTLIB=libref
specifies a libref associated with a SAS data library. If you specify the OUTLIB=
option, PROC CIMPORT is invoked automatically to re-create the input data library,
data set, or catalog in the specified library.
Alias: OUT=
Tip: Use the OUTLIB= option when you change SAS files from one DBCS type to
another within the same operating environment if you want to keep the original
data intact.
OUTTYPE=UPCASE
TRANSLATE=(translation-list)
translates specified characters from one ASCII or EBCDIC value to another. Each
element of translation-list has the form
ASCII-value-1 TO ASCII-value-2
EBCDIC-value-1 TO EBCDIC-value-2
You can use hexadecimal or decimal representation for ASCII values. If you use
the hexadecimal representation, values must begin with a digit and end with an x.
Use a leading zero if the hexadecimal value begins with an alphabetic character.
For example, to translate all left brackets to left braces, specify the TRANSLATE=
option as follows (for ASCII characters):
translate=(5bx to 7bx)
The following example translates all left brackets to left braces and all right
brackets to right braces:
translate=(5bx to 7bx 5dx to 7dx)
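The options above are written in the PROC CPORT statement itself. The following is a minimal sketch, not taken from the original text, that combines AFTER= with TRANSLATE= in one step. The quoted library and file names are placeholders in the style of the examples in this chapter, the catalog SOURCE.FINANCE is borrowed from Example 2, and the cutoff date is an arbitrary assumption.

```sas
/* Placeholder locations: replace with values valid on your host. */
libname source 'SAS-data-library';
filename tranfile 'transport-file';

/* Export only entries modified on or after the (arbitrary) date, */
/* translating brackets to braces on the way out.                 */
proc cport catalog=source.finance file=tranfile
           after='09sep1996'd
           translate=(5bx to 7bx 5dx to 7dx);
run;
```

AFTER= takes a SAS date literal here; a numeric SAS date value would also be accepted according to the option description.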
EXCLUDE Statement
Excludes specified files or entries from the transport file.
Tip: There is no limit to the number of EXCLUDE statements you can use in one
invocation of PROC CPORT.
Interaction: You can use either EXCLUDE statements or SELECT statements in a PROC
CPORT step, but not both.
Required Arguments
SAS file(s) | catalog entry(s)
specifies either the name(s) of one or more SAS files or the names of one or more
catalog entries to be excluded from the transport file. Specify SAS filenames when
you export a SAS data library; specify catalog entry names when you export an
individual SAS catalog. Separate multiple filenames or entry names with a space.
You can use shortcuts to list many like-named files in the EXCLUDE statement. For
more information, see Shortcuts for Specifying Lists of Variable Names on page 24.
Options
ENTRYTYPE=entry-type
specifies a single entry type for the catalog entries listed in the EXCLUDE statement.
See SAS Language Reference: Concepts for a complete list of catalog entry types.
Restriction: ENTRYTYPE= is valid only when you export an individual SAS
catalog.
Alias: ETYPE=, ET=
MEMTYPE=mtype
specifies a single member type for the SAS file(s) listed in the EXCLUDE statement.
Valid values are CATALOG or CAT, DATA, or ALL. If you do not specify the
MEMTYPE= option in the EXCLUDE statement, then processing is restricted to
those member types specified in the MEMTYPE= option in the PROC CPORT
statement.
You can also specify the MEMTYPE= option, enclosed in parentheses, immediately
after the name of a file. In parentheses, MEMTYPE= identifies the type of the file
name that just precedes it. When you use this form of the option, it overrides the
MEMTYPE= option that follows the slash in the EXCLUDE statement, but it must
match the MEMTYPE= option in the PROC CPORT statement:
Restriction: MEMTYPE= is valid only when you export a SAS data library.
Restriction: If you specify a member type for MEMTYPE= in the PROC CPORT
statement, it must agree with the member type that you specify for MEMTYPE=
in the EXCLUDE statement.
Alias: MTYPE=, MT=
Default: If you do not specify MEMTYPE= in the PROC CPORT statement or in
the EXCLUDE statement, the default is MEMTYPE=ALL.
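The two ways of giving a member type in the EXCLUDE statement can be sketched as follows. This is an illustration under stated assumptions, not an example from the original text: the member names FORMATS and SCRATCH are hypothetical, and the quoted library and file names are placeholders. No MEMTYPE= option appears in the PROC CPORT statement, so the default of ALL is assumed to apply.

```sas
/* Placeholder locations: replace with values valid on your host. */
libname source 'SAS-data-library';
filename tranfile 'transport-file';

proc cport library=source file=tranfile;
   /* FORMATS is typed in parentheses as a catalog; SCRATCH takes */
   /* the member type that follows the slash (a data set).        */
   exclude formats(memtype=catalog) scratch / memtype=data;
run;
```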
SELECT Statement
Includes specified files or entries in the transport file.
Tip: There is no limit to the number of SELECT statements you can use in one
invocation of PROC CPORT.
Interaction: You can use either EXCLUDE statements or SELECT statements in a PROC
CPORT step, but not both.
Featured in: Example 2 on page 296
Required Arguments
SAS file(s) | catalog entry(s)
specifies either the name(s) of one or more SAS files or the names of one or more
catalog entries to be included in the transport file. Specify SAS filenames when you
export a SAS data library; specify catalog entry names when you export an
individual SAS catalog. Separate multiple filenames or entry names with a space.
You can use shortcuts to list many like-named files in the SELECT statement. For
more information, see Shortcuts for Specifying Lists of Variable Names on page 24.
Options
ENTRYTYPE=entry-type
specifies a single entry type for the catalog entries listed in the SELECT statement.
See SAS Language Reference: Concepts for a complete list of catalog entry types.
Restriction: ENTRYTYPE= is valid only when you export an individual SAS
catalog.
Alias: ETYPE=, ET=
MEMTYPE=mtype
specifies a single member type for the SAS file(s) listed in the SELECT statement.
Valid values are CATALOG or CAT, DATA, or ALL. If you do not specify the
MEMTYPE= option in the SELECT statement, then processing is restricted to those
member types specified in the MEMTYPE= option in the PROC CPORT statement.
You can also specify the MEMTYPE= option, enclosed in parentheses, immediately
after the name of a member. In parentheses, MEMTYPE= identifies the type of the
member name that just precedes it. When you use this form of the option, it
overrides the MEMTYPE= option that follows the slash in the SELECT statement,
but it must match the MEMTYPE= option in the PROC CPORT statement.
Restriction: MEMTYPE= is valid only when you export a SAS data library.
Restriction: If you specify a member type for MEMTYPE= in the PROC CPORT
statement, it must agree with the member type that you specify for MEMTYPE=
in the SELECT statement.
Alias: MTYPE=, MT=
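A SELECT statement with the ENTRYTYPE= option can name several entries of one type without repeating the type on each entry. The following is a minimal sketch, not taken from the original text. The catalog SOURCE.FINANCE and the entry LOAN come from Example 2; the entry name PAYMENT is hypothetical, and the quoted library and file names are placeholders.

```sas
/* Placeholder locations: replace with values valid on your host. */
libname source 'SAS-data-library';
filename tranfile 'transport-file';

proc cport catalog=source.finance file=tranfile;
   /* Both names are read as SCL entries because of ENTRYTYPE=. */
   select loan payment / entrytype=scl;
run;
```

Because EXCLUDE and SELECT cannot be combined in one PROC CPORT step, a step written this way must not also contain an EXCLUDE statement.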
TRANTAB Statement
Specifies translation tables for characters in catalog entries you export.
Tip: You can specify only one table for each TRANTAB statement, but there is no limit
to the number of TRANTAB statements you can use in one invocation of PROC CPORT.
Featured in:
See: The TRANTAB Statement for the CPORT Procedure and the UPLOAD and
DOWNLOAD Procedures in SAS National Language Support (NLS): User's Guide
TRANTAB NAME=translation-table-name
<option(s)>;
This example shows how to use PROC CPORT to export entries from all of the SAS
catalogs in the SAS data library you specify.
Program
Specify the library reference for the SAS data library that contains the source files to
be exported and the file reference to which the output transport file is written. The
LIBNAME statement assigns a libref for the SAS data library. The FILENAME statement
assigns a fileref and any operating environment options for file characteristics for the transport
file that PROC CPORT creates.
libname source 'SAS-data-library';
filename tranfile 'transport-file'
host-option(s)-for-file-characteristics;
Create the transport file. The PROC CPORT step executes on the operating environment
where the source library is located. MEMTYPE=CATALOG writes all SAS catalogs in the source
library to the transport file.
proc cport library=source file=tranfile memtype=catalog;
run;
SAS Log
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
This example shows how to use PROC CPORT to export individual catalog entries,
rather than all of the entries in a catalog.
Program
Assign library references. The LIBNAME and FILENAME statements assign a libref for the
source library and a fileref for the transport file, respectively.
libname source 'SAS-data-library';
filename tranfile 'transport-file'
host-option(s)-for-file-characteristics;
Write an entry to the transport file. SELECT writes only the LOAN.SCL entry to the
transport file for export.
proc cport catalog=source.finance file=tranfile;
select loan.scl;
run;
SAS Log
NOTE: Proc CPORT begins to transport catalog SOURCE.FINANCE
NOTE: The catalog has 5 entries and its maximum logical record length is 866.
NOTE: Entry LOAN.SCL has been transported.
This example shows how to use PROC CPORT to export a single SAS data set.
Program
Assign library references. The LIBNAME and FILENAME statements assign a libref for the
source library and a fileref for the transport file, respectively.
libname source SAS-data-library;
filename tranfile transport-file
host-option(s)-for-file-characteristics;
Specify the type of file that you are exporting. The DATA= specification in the PROC
CPORT statement tells the procedure that you are exporting a SAS data set rather than a
library or a catalog.
proc cport data=source.times file=tranfile;
run;
SAS Log
NOTE: Proc CPORT begins to transport data set SOURCE.TIMES
NOTE: The data set contains 2 variables and 2 observations.
Logical record length is 16.
NOTE: Transporting data set index information.
This example shows how to apply a customized translation table to the transport file
before PROC CPORT exports it. For this example, assume that you have already
created a customized translation table called TTABLE1.
Program
Assign library references. The LIBNAME and FILENAME statements assign a libref for the
source library and a fileref for the transport file, respectively.
libname source SAS-data-library;
filename tranfile transport-file
host-option(s)-for-file-characteristics;
Apply the translation specifics. The TRANTAB statement applies the translation that you
specify with the customized translation table TTABLE1. TYPE= limits the translation to
FORMAT entries.
proc cport catalog=source.formats file=tranfile;
trantab name=ttable1 type=(format);
run;
This example shows how to use PROC CPORT to transport only the catalog entries
with modification dates equal to or later than the date you specify in the AFTER=
option.
Program
Assign library references. The LIBNAME and FILENAME statements assign a libref for the
source library and a fileref for the transport file, respectively.
libname source SAS-data-library;
filename tranfile transport-file
host-option(s)-for-file-characteristics;
Specify the catalog entries to be written to the transport file. AFTER= specifies that only
catalog entries with modification dates on or after September 9, 1996, should be written to the
transport file.
proc cport catalog=source.finance file=tranfile
after='09sep1996'd;
run;
SAS Log
PROC CPORT writes messages to the SAS log to inform you that it began the export process for
all the entries in the specified catalog. However, PROC CPORT wrote only the entries
LOAN.FRAME and LOAN.HELP in the FINANCE catalog to the transport file because only
those two entries had a modification date equal to or later than September 9, 1996. That is, of
all the entries in the specified catalog, only two met the requirement of the AFTER= option.
CHAPTER 14
The CV2VIEW Procedure
Information about the CV2VIEW Procedure 301
See:
CHAPTER 15
The DATASETS Procedure
Overview: DATASETS Procedure 304
What Does the DATASETS Procedure Do? 304
Sample PROC DATASETS Output 305
Notes 306
Syntax: DATASETS Procedure 307
PROC DATASETS Statement 308
AGE Statement 312
APPEND Statement 313
AUDIT Statement 319
CHANGE Statement 322
CONTENTS Statement 323
COPY Statement 327
DELETE Statement 334
EXCHANGE Statement 338
EXCLUDE Statement 339
FORMAT Statement 339
IC CREATE Statement 340
IC DELETE Statement 343
IC REACTIVATE Statement 343
INDEX CENTILES 344
INDEX CREATE Statement 345
INDEX DELETE Statement 346
INFORMAT Statement 347
LABEL Statement 347
MODIFY Statement 348
RENAME Statement 352
REPAIR Statement 353
SAVE Statement 355
SELECT Statement 356
Concepts: DATASETS Procedure 357
Procedure Execution 357
Execution of Statements 357
RUN-Group Processing 357
Error Handling 358
Password Errors 359
Forcing a RUN Group with Errors to Execute 359
Ending the Procedure 359
Using Passwords with the DATASETS Procedure 359
Restricting Member Types for Processing 360
In the PROC DATASETS Statement 360
In Subordinate Statements 360
Output 15.1
Libref:         HEALTH
Engine:         V9
Physical Name:  external-file
File Name:      external-file

              Member  Obs, Entries                             File
 #  Name      Type      or Indexes  Vars  Label                Size  Last Modified
 1  ALL       DATA              23    17                      13312  29JAN2002:08:06:46
 2  BODYFAT   DATA               1     2                       5120  29JAN2002:08:06:46
 3  CONFOUND  DATA               8     4                       5120  29JAN2002:08:06:46
 4  CORONARY  DATA              39     4                       5120  29JAN2002:08:06:46
 5  DRUG1     DATA               6     2  JAN95 Data           5120  29JAN2002:08:06:46
 6  DRUG2     DATA              13     2  MAY95 Data           5120  29JAN2002:08:06:46
 7  DRUG3     DATA              11     2  JUL95 Data           5120  29JAN2002:08:06:46
 8  DRUG4     DATA               7     2  JAN92 Data           5120  29JAN2002:08:06:46
 9  DRUG5     DATA               1     2  JUL92 Data           5120  29JAN2002:08:06:46
10  GROUP     DATA             148    11                      25600  29JAN2002:08:06:46
11  MLSCL     DATA              32     4                       5120  29JAN2002:08:06:46
12  NAMES     DATA               7     4                       5120  29JAN2002:08:06:46
13  OXYGEN    DATA              31     7                       9216  29JAN2002:08:06:46
14  PERSONL   DATA             148    11                      25600  29JAN2002:08:06:46
15  PHARM     DATA               6     3  Sugar Study          5120  29JAN2002:08:06:46
16  POINTS    DATA               6     6                       5120  29JAN2002:08:06:46
17  PRENAT    DATA             149     6                      17408  29JAN2002:08:06:46
18  RESULTS   DATA              10     5                       5120  29JAN2002:08:06:46
19  SLEEP     DATA             108     6                       9216  29JAN2002:08:06:46
20  SYNDROME  DATA              46     8                       9216  29JAN2002:08:06:46
21  TENSION   DATA               4     3                       5120  29JAN2002:08:06:46
22  TEST2     DATA              15     5                       5120  29JAN2002:08:06:46
23  TRAIN     DATA               7     2                       5120  29JAN2002:08:06:47
24  VISION    DATA              16     3                       5120  29JAN2002:08:06:47
25  WEIGHT    DATA              83    13  California Results  13312  29JAN2002:08:06:47
26  WGHT      DATA              83    13  California Results  13312  29JAN2002:08:06:47
delete syndrome;
change prenat=infant;
run;
Notes
- Although the DATASETS procedure can perform some operations on catalogs,
  generally the CATALOG procedure is the best utility to use for managing catalogs.
  For documentation of PROC CATALOG, see Overview: CATALOG Procedure on
  page 153.
- The term member often appears as a synonym for SAS file. If you are unfamiliar
  with SAS files and SAS libraries, refer to SAS Files Concepts in SAS Language
  Reference: Concepts.
- PROC DATASETS cannot work with sequential data libraries.
<GENNUM=ALL|HIST|REVERT|integer>
<MEMTYPE=mtype>>;
EXCHANGE name-1=other-name-1
<name-n=other-name-n>
</ < ALTER=alter-password>
<MEMTYPE=mtype> >;
MODIFY SAS-file <(option(s))>
</ < CORRECTENCODING=encoding-value>
<DTC=SAS-date-time>
<GENNUM=integer>
<MEMTYPE=mtype>>;
FORMAT variable-list-1 <format-1>
<variable-list-n <format-n>>;
IC CREATE <constraint-name=> constraint
<MESSAGE=message-string < MSGTYPE=USER>>;
IC DELETE constraint-name(s)| _ALL_;
IC REACTIVATE foreign-key-name REFERENCES libref;
INDEX CENTILES index(s)
</ <REFRESH>
<UPDATECENTILES= ALWAYS|NEVER|integer>>;
INDEX CREATE index-specification(s)
</ <NOMISS>
<UNIQUE>
<UPDATECENTILES=ALWAYS|NEVER|integer>>;
INDEX DELETE index(s) | _ALL_;
INFORMAT variable-list-1 <informat-1>
<variable-list-n <informat-n>>;
LABEL variable-1=<'label-1'|' '>
<variable-n=<'label-n'|' '>>;
RENAME old-name-1=new-name-1
<old-name-n=new-name-n>;
REPAIR SAS-file(s)
</ < ALTER=alter-password>
<GENNUM=integer>
<MEMTYPE=mtype>>;
SAVE SAS-file(s) </ MEMTYPE=mtype>;
To do this
LIBRARY=
ALTER=
To do this
DETAILS|NODETAILS
FORCE
FORCE
GENNUM=
KILL
MEMTYPE=
NOLIST
NOWARN
PW=
READ=
Options
ALTER=alter-password
provides the alter password for any alter-protected SAS files in the SAS data library.
See also: Using Passwords with the DATASETS Procedure on page 359
DETAILS|NODETAILS
supply the read password, the directory listing contains missing values for the
columns produced by the DETAILS option.
Default: If neither DETAILS nor NODETAILS is specified, the default is the system
option setting, which is NODETAILS.
Tip:
Featured in:
FORCE
- forces a RUN group to execute even if errors are present in one or more
  statements in the RUN group. See RUN-Group Processing on page 357 for a
  discussion of RUN-group processing and error handling.
- forces all APPEND statements to concatenate two data sets even when the
  variables in the data sets are not exactly the same. The APPEND statement
  drops the extra variables and issues a warning message. Without the FORCE
  option, the procedure issues an error message and stops processing if you try to
  perform an append operation with two SAS data sets whose variables are not
  exactly the same. Refer to APPEND Statement on page 313 for more
  information on the FORCE option.
GENNUM=ALL|HIST|REVERT|integer
restricts processing for generation data sets. Valid values are as follows:
ALL
for subordinate CHANGE and DELETE statements, refers to the base version and
all historical versions in a generation group.
HIST
for a subordinate DELETE statement, refers to all historical versions, but excludes
the base version in a generation group.
REVERT|0
for a subordinate DELETE statement, refers to the base version in a generation
group and changes the most current historical version, if it exists, to the base
version.
integer
for subordinate AUDIT, CHANGE, MODIFY, DELETE, and REPAIR statements,
refers to a specific version in a generation group. Specifying a positive number is
an absolute reference to a specific generation number that is appended to a data set
name; that is, gennum=2 specifies MYDATA#002. Specifying a negative number is
a relative reference to a historical version in relation to the base version, from the
youngest to the oldest; that is, gennum=-1 refers to the youngest historical version.
See also: Restricting Processing for Generation Data Sets on page 362
See also: Understanding Generation Data Sets in SAS Language Reference:
Concepts
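The GENNUM= values described above can be tried on a small generation group. The following is a minimal sketch, assuming that generation data sets are permitted in the WORK library at your site; WORK.MYDATA is a hypothetical data set that is created only for this illustration.

```sas
/* Create a hypothetical generation group: GENMAX=3 keeps up to   */
/* three versions. Replacing the data set twice leaves a base     */
/* version plus two historical versions.                          */
data mydata(genmax=3);
   x = 1;
run;

data mydata;
   x = 2;
run;

data mydata;
   x = 3;
run;

/* GENNUM=HIST on a subordinate DELETE statement removes the      */
/* historical versions and keeps the base version.                */
proc datasets library=work nolist;
   delete mydata / gennum=hist;
run;
quit;
```

Replacing HIST with ALL would remove the base version as well, and a negative integer such as -1 would target only the youngest historical version.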
KILL
deletes all SAS files in the SAS data library that are available for processing. The
MEMTYPE= option subsets the member types that the statement deletes.
CAUTION:
The KILL option deletes the SAS files immediately after you submit the statement.
LIBRARY=libref
names the library that the procedure processes. This library is the procedure input
library.
Aliases: DDNAME=, DD=, LIB=
Default: WORK or USER. See Temporary and Permanent SAS Data Sets on page
MEMTYPE=(mtype(s))
restricts processing to one or more member types and restricts the listing of the data
library directory to SAS files of the specified member types. For example, the
following PROC DATASETS statement limits processing to SAS data sets in the
default data library and limits the directory listing in the SAS log to SAS files of
member type DATA:
proc datasets memtype=data;
Aliases: MTYPE=, MT=
Default: ALL
NOLIST
suppresses the printing of the directory of the SAS files in the SAS log.
Featured in: Example 3 on page 381
Note: If you specify the ODS RTF destination, PROC DATASETS output will go
to both the SAS log and the ODS output area. The NOLIST option will suppress
output to both. To see the output only in the SAS log, use the ODS EXCLUDE
statement by specifying the member directory as the exclusion.
NOWARN
suppresses the error processing that occurs when a SAS file that is specified in a
SAVE, CHANGE, EXCHANGE, REPAIR, DELETE, or COPY statement or listed as
the first SAS file in an AGE statement is not in the procedure input library. When an
error occurs and the NOWARN option is in effect, PROC DATASETS continues
processing that RUN group. If NOWARN is not in effect, PROC DATASETS stops
processing that RUN group and issues a warning for all operations except DELETE,
for which it does not stop processing.
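A minimal sketch of NOWARN in use follows. The member names are hypothetical: NOT_THERE is assumed to be absent from the WORK library, and SCRATCH is created only so that the second statement has something to act on.

```sas
/* Hypothetical member created for this sketch */
data scratch;
   x = 1;
run;

/* NOT_THERE is assumed not to exist. With NOWARN in effect, the  */
/* text above says that PROC DATASETS continues processing the    */
/* RUN group, so the DELETE statement is still expected to run.   */
proc datasets library=work nolist nowarn;
   change not_there=renamed;
   delete scratch;
run;
quit;
```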
PW=password
provides the password for any protected SAS files in the SAS data library. PW= can
act as an alias for READ=, WRITE=, or ALTER=.
See also: Using Passwords with the DATASETS Procedure on page 359
READ=read-password
provides the read-password for any read-protected SAS files in the SAS data library.
See also: Using Passwords with the DATASETS Procedure on page 359
AGE Statement
Renames a group of related SAS files in a library.
Featured in:
Required Arguments
current-name
is a SAS file that the procedure renames. current-name receives the name of the first
name in related-SAS-file(s).
related-SAS-file(s)
Options
ALTER=alter-password
provides the alter password for any alter-protected SAS files named in the AGE
statement. Because an AGE statement renames and deletes SAS files, you need alter
access to use the AGE statement. You can use the option either in parentheses after
the name of each SAS file or after a forward slash.
See also: Using Passwords with the DATASETS Procedure on page 359
MEMTYPE=mtype
restricts processing to one member type. All of the SAS files that you name in the
AGE statement must be the same member type. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
Aliases: MTYPE=, MT=
Default: If you do not specify MEMTYPE= in the PROC DATASETS statement, the
default is DATA.
See also: Restricting Member Types for Processing on page 360
Details
- The AGE statement renames current-name to the name of the first name in
  related-SAS-file(s), renames the first name in related-SAS-file(s) to the second
  name in related-SAS-file(s), and so on until it changes the name of the next-to-last
  SAS file in related-SAS-file(s) to the last name in related-SAS-file(s). The AGE
  statement then deletes the last file in related-SAS-file(s).
- If the first SAS file named in the AGE statement does not exist in the SAS data
  library, PROC DATASETS stops processing the RUN group containing the AGE
  statement and issues an error message. The AGE statement does not age any of
  the related-SAS-file(s). To override this behavior, use the NOWARN option in the
  PROC DATASETS statement.
  If one of the related-SAS-file(s) does not exist, the procedure prints a warning
  message to the SAS log but continues to age the SAS files that it can.
- If you age a data set that has an index, the index continues to correspond to the
  data set.
- You can age only entire generation groups. For example, if data sets A and B have
  generation groups, then the following statement deletes generation group B and
  ages (renames) generation group A to the name B:
age a b;
For example, suppose the generation group for data set A has 3 historical versions
and the generation group for data set B has 2 historical versions. Then aging A to
B has this effect:
Old Name  Version  New Name  Version
A         base     B         base
A         1        B         1
A         2        B         2
A         3        B         3
B         base     is deleted
B         1        is deleted
B         2        is deleted
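The renaming chain described in the Details above can be sketched with four small data sets. TODAY, DAY1, DAY2, and DAY3 are hypothetical names that are created in the WORK library only for this illustration.

```sas
/* Hypothetical members, one observation each */
data today;  d = 0;  run;
data day1;   d = 1;  run;
data day2;   d = 2;  run;
data day3;   d = 3;  run;

/* TODAY is renamed DAY1, the old DAY1 becomes DAY2, the old DAY2 */
/* becomes DAY3, and the old DAY3 is deleted.                     */
proc datasets library=work nolist;
   age today day1 day2 day3;
run;
quit;
```

After the step, no member named TODAY remains, so a new TODAY can be created for the next cycle.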
APPEND Statement
Adds the observations from one SAS data set to the end of another SAS data set.
Reminder: You can use data set options with the BASE= and DATA= options. See Data
Set Options on page 18 for a list. You can use any global statements as well. See
Global Statements on page 18.
Requirement: The BASE= data set must be a member of a SAS library that supports
update processing.
If the BASE= data set is accessed through a SAS server and if no other user
has the data set open at the time the APPEND statement begins processing, the
BASE= data set defaults to CNTLLEV=MEMBER (member-level locking). When this
happens, no other user can update the file while the data set is processed.
Tip: If a failure occurs during processing, the data set is marked as damaged and is
reset to its pre-append condition at the next REPAIR statement. If the data set has an
index, the index is not updated with each observation but is updated once at the end.
(This is Version 7 and later behavior, as long as APPENDVER=V6 is not set.)
Default:
<DATA=<libref.>SAS-data-set>
<FORCE>;
Required Arguments
BASE=<libref.> SAS-data-set
Options
APPENDVER=V6
uses the Version 6 behavior for appending observations to the BASE= data set, which
is to append one observation at a time. Beginning in Version 7, to improve
performance, the default behavior changed so that all observations are appended
after the data set is processed.
See also: Appending to an Indexed Data Set Fast-Append Method on page 316
DATA=<libref.> SAS-data-set
names the SAS data set containing observations that you want to append to the end
of the SAS data set specified in the BASE= argument.
libref
specifies the library that contains the SAS data set. If you omit libref, the default
is the libref for the procedure input library. The DATA= data set can be from any
SAS data library, but you must use the two-level name if the data set resides in a
library other than the procedure input library.
SAS-data-set
names a SAS data set. If the APPEND statement cannot find an existing data set
with this name, it stops processing.
Alias: NEW=
Default: the most recently created SAS data set, from any SAS data library
See also: Appending with Generation Groups on page 318
Featured in: Example 5 on page 386
FORCE
forces the APPEND statement to concatenate data sets when the DATA= data set
contains variables that either
For an existing BASE= data set: If there is a WHERE statement on the BASE= data
set, it will take effect only if the WHEREUP= option is set to YES.
CAUTION:
For the non-existent BASE= data set: If there is a WHERE statement on the
non-existent BASE= data set, the WHERE statement takes effect regardless of the
WHEREUP= option setting.
Note: You cannot append a data set to itself by using the WHERE= data set
option.
- If you do not give the read password for the DATA= data set in the APPEND
  statement, by default the procedure looks for the read password for the DATA=
  data set in the PROC DATASETS statement. However, the procedure does not
  look for the write password for the BASE= data set in the PROC DATASETS
  statement. Therefore, you must specify the write password for the BASE= data set
  in the APPEND statement.
- If the BASE= data set is read-protected only, you must specify its read password in
  the APPEND statement.
- In Version 6, when you appended to an indexed data set, the index was updated
  for each added observation. Index updates tend to be random; therefore, disk I/O
  could have been high.
- Currently, SAS does not update the index until all observations are added to the
  data set. After the append, SAS internally sorts the observations and inserts the
  data into the index in sequential order, which reduces most of the disk I/O and
  results in a faster append method.
The fast-append method is used by default when the following requirements are met;
otherwise, the Version 6 method is used:
- The BASE= data set is open for member-level locking. If CNTLLEV= is set to
  record, then the fast-append code is not used.
- The BASE= data set does not contain referential integrity constraints.
- The BASE= data set is not accessed using the Cross Environment Data Access
  (CEDA) facility.
- The BASE= data set is not using a WHERE= data set option.
To display information in the SAS log about the append method that is being used,
you can specify the MSGLEVEL= system option as follows:
options msglevel=i;
validated for uniqueness until the index is updated. Then, if a nonunique value is
detected, the offending observation is deleted from the data set. This means that after
observations are appended, some of them may subsequently be deleted.
For a simple example, consider that the BASE= data set has ten observations
numbered from 1 to 10 with a UNIQUE index for the variable ID. You append a data
set that contains five observations numbered from 1 to 5, and observations 3 and 4 both
contain the same value for ID. The following occurs:
1. After the observations are appended, the BASE= data set contains 15 observations
   numbered from 1 to 15.
2. SAS updates the index for ID, validates the values, and determines that
   observations 13 and 14 contain the same value for ID.
3. SAS deletes one of the observations from the BASE= data set, resulting in 14
   observations that are numbered from 1 to 15. For example, observation 13 is
   deleted. Note that you cannot predict which observation will be deleted, because
   the internal sort may place either observation first. (In Version 6, you could
   predict that observation 13 would be added and observation 14 would be rejected.)
If you do not want the current behavior (which could result in deleted observations)
or if you want to be able to predict which observations are appended, request the
Version 6 append method by specifying the APPENDVER=V6 option:
proc datasets;
append base=a data=b appendver=v6;
run;
Note: In Version 6, deleting the index and then recreating it after the append could
improve performance. The current method may eliminate the need to do that. However,
the performance depends on the nature of your data.
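The ten-plus-five scenario described above can be set up as follows. This is a sketch under stated assumptions: WORK.BASE10 and WORK.ADD5 are hypothetical data sets created only for this illustration, and the ID values in ADD5 are chosen so that its observations 3 and 4 share a value.

```sas
/* BASE= data set: ten observations with a UNIQUE index on ID */
data base10(index=(id / unique));
   do id = 1 to 10;
      output;
   end;
run;

/* DATA= data set: five observations; observations 3 and 4 both  */
/* have ID=13                                                     */
data add5;
   input id @@;
   datalines;
11 12 13 13 14
;

/* Ask SAS to report in the log which append method it uses */
options msglevel=i;

/* Under the fast-append method, uniqueness is checked only after */
/* all observations are added, so one of the two ID=13            */
/* observations is expected to be deleted afterward.              */
proc datasets library=work nolist;
   append base=base10 data=add5;
run;
quit;
```

Adding APPENDVER=V6 to the APPEND statement requests the observation-at-a-time behavior, in which the second duplicate is rejected as it is read.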
22may1952
;
data format2;
input Date datetime20.;
format Date datetime20.;
datalines;
25aug1952:11:23:07.4
;
proc append base=format1 data=format2;
run;
- If the length of a variable is longer in the DATA= data set than in the BASE= data
  set, or if the same variable is a character variable in one data set and a numeric
  variable in the other, use the FORCE option. Using FORCE has these
  consequences:
  - The length of the variables in the BASE= data set takes precedence. SAS
    truncates values from the DATA= data set to fit them into the length that is
    specified in the BASE= data set.
  - The type of the variables in the BASE= data set takes precedence. The
    APPEND statement replaces values of the wrong type (all values for the
    variable in the DATA= data set) with missing values.
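The length rule above can be illustrated with two one-observation data sets. BASE_SHORT and DATA_LONG are hypothetical names created in the WORK library only for this sketch.

```sas
/* BASE= data set: NAME has length 4 */
data base_short;
   length name $ 4;
   name = 'Ann';
run;

/* DATA= data set: NAME has length 10 */
data data_long;
   length name $ 10;
   name = 'Alexander';
run;

/* Without FORCE the append stops with an error because NAME is   */
/* longer in the DATA= data set. With FORCE the BASE= length      */
/* takes precedence and the appended value is truncated to fit.   */
proc datasets library=work nolist;
   append base=base_short data=data_long force;
run;
quit;
```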
SAS Statements
Result
proc datasets;
append base=a
data=b(gennum=2);
proc datasets;
append base=a(gennum=2)
data=b(gennum=2);
System Failures
If a system failure or some other type of interruption occurs while the procedure is
executing, the append operation may not be successful; it is possible that not all,
perhaps none, of the observations will be added to the BASE= data set. In addition, the
BASE= data set may suffer damage. The APPEND operation performs an update in
place, which means that it does not make a copy of the original data set before it begins
to append observations. If you want to be able to restore the original observations, you
can initiate an audit trail for the base data file and choose to store a before-update
image of the observations. Then you can write a DATA step to extract and reapply the
original observations to the data file. For information about initiating an audit trail,
see the PROC DATASETS AUDIT Statement on page 319.
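A minimal sketch of the precaution described above follows. The libref MYLIB, the data file MASTER, and the data set NEWOBS are hypothetical and are assumed to exist already; the library path is a placeholder. The TYPE=AUDIT data set option is used to read the audit file.

```sas
/* Placeholder path; MYLIB.MASTER and MYLIB.NEWOBS are assumed    */
/* to exist                                                       */
libname mylib 'SAS-data-library';

/* Start an audit trail on the base data file and ask for         */
/* before-update images                                           */
proc datasets library=mylib nolist;
   audit master;
   initiate;
   log before_image=yes;
run;
quit;

/* Perform the append once the audit trail is active */
proc datasets library=mylib nolist;
   append base=master data=newobs;
run;
quit;

/* Inspect what the audit file recorded */
proc print data=mylib.master(type=audit);
run;
```

The two PROC DATASETS steps are kept separate here only to make the order of operations explicit.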
AUDIT Statement
Initiates and controls event logging to an audit file as well as suspends, resumes, or terminates
event logging in an audit file.
See also: Understanding an Audit Trail in SAS Language Reference: Concepts
The AUDIT statement takes one of two forms, depending on whether you are
initiating the audit trail or suspending, resuming, or terminating event logging in an
audit file.
Tip:
specifies the SAS data file in the procedure input library that you want to audit.
INITIATE
creates an audit file that has the same name as the SAS data file and a data set type
of AUDIT. The audit file logs additions, deletions, and updates to the SAS data file.
You must initiate an audit trail before you can suspend, resume, or terminate it.
Options
SAS-password
specifies the password for the SAS data file, if one exists. The parentheses are
required.
GENNUM=integer
AUDIT_ALL=NO|YES
specifies whether logging can be suspended and audit settings can be changed.
AUDIT_ALL=YES specifies that all images are logged and cannot be suspended.
That is, you cannot use the LOG statement to turn off logging of particular images,
and you cannot suspend event logging by using the SUSPEND statement. To turn off
logging, you must use the TERMINATE statement, which terminates event logging
and deletes the audit file.
Default: NO
LOG
Tip: If you do not want to log a particular image, specify NO for the image type.
For example, the following code turns off logging the error images, but the
administrative, before, and data images continue to be logged:
log error_image=no;
SUSPEND
suspends event logging to the audit file, but does not delete the audit file.
RESUME
The following example creates the same audit file but stores only error record images:
proc datasets library=MyLib;
audit MyFile (alter=MyPassword);
initiate;
log data_image=NO before_image=NO;
run;
CHANGE Statement
Renames one or more SAS files in the same SAS data library.
Featured in:
CHANGE old-name-1=new-name-1
<old-name-n=new-name-n >
</ <ALTER=alter-password>
<GENNUM=ALL|integer>
<MEMTYPE=mtype>>;
Required Arguments
old-name=new-name
changes the name of a SAS file in the input data library. old-name must be the name
of an existing SAS file in the input data library.
Featured in:
Options
ALTER=alter-password
provides the alter password for any alter-protected SAS files named in the CHANGE
statement. Because a CHANGE statement changes the names of SAS files, you need
alter access to use the CHANGE statement for new-name. You can use the option
either in parentheses after the name of each SAS file or after a forward slash.
See also: Using Passwords with the DATASETS Procedure on page 359
GENNUM=ALL|integer
restricts processing for generation data sets. You can use the option either in
parentheses after the name of each SAS file or after a forward slash. Valid values are
ALL | 0
refers to the base version and all historical versions of a generation group.
integer
refers to a specific version from a generation group. Specifying a positive number
is an absolute reference to a specific generation number that is appended to a data
set's name; that is, gennum=2 specifies MYDATA#002. Specifying a negative
number is a relative reference to a historical version in relation to the base
version, from the youngest to the oldest; that is, gennum=-1 refers to the youngest
historical version.
For example, the following statements change the name of version A#003 to base B:
proc datasets;
change A=B / gennum=3;
proc datasets;
change A(gennum=3)=B;
proc datasets;
change A(gennum=3)=B(gennum=3);
See also: Restricting Processing for Generation Data Sets on page 362
See also: Understanding Generation Data Sets in SAS Language Reference:
Concepts
MEMTYPE=mtype
restricts processing to one member type. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
Aliases: MTYPE=, MT=
Default: If you do not specify MEMTYPE= in the PROC DATASETS statement, the
default is MEMTYPE=ALL.
See also: Restricting Member Types for Processing on page 360
Details
- The CHANGE statement changes names by the order that the old-names occur in
  the directory listing, not in the order that you list the changes in the CHANGE
  statement.
- If the old-name SAS file does not exist in the SAS data library, PROC DATASETS
  stops processing the RUN group containing the CHANGE statement and issues an
  error message. To override this behavior, use the NOWARN option in the PROC
  DATASETS statement.
- If you change the name of a data set that has an index, the index continues to
  correspond to the data set.
CONTENTS Statement
Describes the contents of one or more SAS data sets and prints the directory of the SAS data
library.
Reminder: You can use data set options with the DATA=, OUT=, and OUT2= options.
See Data Set Options on page 18 for a list. You can use any global statements as well.
See Global Statements on page 18.
Featured in: Example 4 on page 384
CONTENTS <option(s)>;
To do this
DATA=
OUT=
OUT2=
DETAILS|NODETAILS
To do this
DIRECTORY
FMTLEN
MEMTYPE=
NODS
NOPRINT
VARNUM
ORDER=IGNORECASE
SHORT
CENTILES
Options
CENTILES
Index
Update Centiles
Current Update
Percent
# of Unique
Values
Variables
DATA=SAS-file-specification
To obtain the contents of a specific version from a generation group, use the
GENNUM= data set option as shown in the following CONTENTS statement:
contents data=HtWt(gennum=3);
<libref.>_ALL_
gives you information about all SAS data sets that have the type or types specified
by the MEMTYPE= option. libref refers to the SAS data library. The default for
libref is the libref of the procedure input library.
- If you are using the _ALL_ keyword, you need read access to all
  read-protected SAS data sets in the SAS data library.
If you specify a read-protected data set in the DATA= option but do not give
the read password, by default the procedure looks in the PROC DATASETS
statement for the read password. However, if you do not specify the DATA= option
and the default data set (last one created in the session) is read protected, the
procedure does not look in the PROC DATASETS statement for the read password.
Tip:
Featured in:
DETAILS|NODETAILS
DETAILS includes these additional columns of information in the output, but only if
DIRECTORY is also specified.
Default: If neither DETAILS nor NODETAILS is specified, the defaults are as
follows: for the CONTENTS procedure, the default is the system option setting,
which is NODETAILS; for the CONTENTS statement, the default is whatever is
specified on the PROC DATASETS statement, which also defaults to the system
option setting.
See also: description of the additional columns in Options in PROC DATASETS
DIRECTORY
prints a list of all SAS files in the specified SAS data library. If DETAILS is also
specified, using DIRECTORY causes the additional columns described in
DETAILS|NODETAILS on page 309 to be printed.
FMTLEN
prints the length of the informat or format. If you do not specify a length for the
informat or format when you associate it with a variable, the length does not appear
in the output of the CONTENTS statement unless you use the FMTLEN option. The
length also appears in the FORMATL or INFORML variable in the output data set.
MEMTYPE=(mtype(s))
the following statements produce the contents of only the SAS data sets with the
member type DATA:
proc datasets memtype=data;
contents data=_all_;
run;
NODS
suppresses printing the contents of individual files when you specify _ALL_ in the
DATA= option. The CONTENTS statement prints only the SAS data library
directory. You cannot use the NODS option when you specify only one SAS data set
in the DATA= option.
NODETAILS
IGNORECASE
VARNUM
OUT=SAS-data-set
OUT2=SAS-data-set
names the output data set to contain information about indexes and integrity
constraints.
Tip: If UPDATECENTILES was not specified in the index definition, then the
default value of 5 is used in the RECREATE variable of the OUT2 data set.
Tip: OUT2= does not suppress the printed output from the statement. To suppress
the printed output, use the NOPRINT option.
See also: The OUT2= Data Set on page 374 for a description of the variables in
the OUT2= data set.
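A minimal sketch of writing index information to an OUT2= data set follows. WORK.CLASS and WORK.IXINFO are hypothetical names; the sketch assumes that the SASHELP.CLASS sample data set is available at your site.

```sas
/* Build a small indexed data set so that OUT2= has something to  */
/* report                                                         */
data class(index=(name));
   set sashelp.class;
run;

/* OUT2= collects index and integrity constraint information;     */
/* NOPRINT suppresses the printed output                          */
proc datasets library=work nolist;
   contents data=class out2=ixinfo noprint;
run;
quit;

/* Look at what was captured */
proc print data=ixinfo;
run;
```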
SHORT
prints only the list of variable names, the index information, and the sort
information for the SAS data set.
VARNUM
prints a list of the variable names in the order of their logical position in the data
set. By default, the CONTENTS statement lists the variables alphabetically. The
physical position of the variable in the data set is engine-dependent.
Details
The CONTENTS statement prints an alphabetical listing of the variables by default,
except for variables in the form of a numbered range list. Numbered range lists, such
as x1-x100, are printed in incrementing order, that is, x1-x100. For more information,
see Alphabetic List of Variables and Attributes on page 366.
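The two orderings can be compared side by side. The following sketch assumes that the SASHELP.CLASS sample data set is available; any data set in the procedure input library would do.

```sas
proc datasets library=sashelp nolist;
   /* Default: variables are listed alphabetically */
   contents data=class;
   /* VARNUM: variables are listed by logical position */
   contents data=class varnum;
run;
quit;
```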
COPY Statement
Copies all or some of the SAS files in a SAS library.
Featured in: Example 1 on page 376
COPY OUT=libref-1
<CLONE|NOCLONE>
<CONSTRAINT=YES|NO>
<DATECOPY>
<FORCE>
<IN=libref-2>
<INDEX=YES|NO>
<MEMTYPE=(mtype(s))>
<MOVE < ALTER=alter-password>> ;
Required Arguments
OUT=libref-1
Featured in:
Options
ALTER=alter-password
provides the alter password for any alter-protected SAS files that you are moving
from one data library to another. Because the MOVE option deletes the SAS file from
the original data library, you need alter access to move the SAS file.
See also: Using Passwords with the DATASETS Procedure on page 359
CLONE|NOCLONE
For the BUFSIZE= attribute, the following table summarizes how the COPY
statement works:
Table 15.1
If you use
  CLONE      uses the BUFSIZE= value from the input data set for the output data
             set.
  NOCLONE    uses the current setting of the SAS system option BUFSIZE= for the
             output data set.
  neither
For the COMPRESS= and REUSE= attributes, the following table summarizes
how the COPY statement works:
Table 15.2
If you use
  CLONE      uses the values from the input data set for the output data set. If the
             engine for the input data set does not support the compression and
             reuse space attributes, then the COPY statement uses the current
             setting of the corresponding SAS system option.
  NOCLONE    uses the current setting of the SAS system options COMPRESS= and
             REUSE= for the output data set.
  neither    defaults to CLONE.
For the OUTREP= attribute, the following table summarizes how the COPY
statement works:
Table 15.3
If you use
  CLONE      results in a copy with the data representation of the input data set.
  NOCLONE
  neither    default is CLONE.
If you use
  CLONE      results in a copy that uses the encoding of the input data set or, if
             specified, the value of the INENCODING= option in the LIBNAME
             statement for the input library.
  NOCLONE
  neither    default is CLONE.
CONSTRAINT=YES|NO
specifies whether to copy all integrity constraints when copying a data set.
Default: NO
Tip: For data sets with integrity constraints that have a foreign key, the COPY
statement copies the general and referential constraints if CONSTRAINT=YES is
specified and the entire library is copied. If you use the SELECT or EXCLUDE
statement to copy the data sets, then the referential integrity constraints are not
copied. For more information, see Understanding Integrity Constraints in SAS
Language Reference: Concepts.
DATECOPY
copies the SAS internal date and time when the SAS file was created and the date
and time when it was last modified to the resulting copy of the file. Note that the
operating environment date and time are not preserved.
Restriction: DATECOPY cannot be used with encrypted files or catalogs.
Restriction: DATECOPY can be used only when the resulting SAS file uses the V8
or V9 engine.
Tip: You can alter the file creation date and time with the DTC= option on the
MODIFY statement. See MODIFY Statement on page 348.
Tip: If the file that you are copying has attributes that require additional
processing, the last modified date is changed to the current date. For example,
when you copy a data set that has an index, the index must be rebuilt, and this
changes the last modified date to the current date. Other attributes that require
additional processing and that could affect the last modified date include integrity
constraints and a sort indicator.
FORCE
allows you to use the MOVE option for a SAS data set on which an audit trail exists.
Note:
IN=libref-2
INDEX=YES|NO
specifies whether to copy all indexes for a data set when copying the data set to
another SAS data library.
Default: YES
MEMTYPE=(mtype(s))
MOVE
moves SAS files from the input data library (named with the IN= option) to the
output data library (named with the OUT= option) and deletes the original files from
the input data library.
Restriction: The MOVE option can be used to delete a member of a SAS library
only if the IN= engine supports the deletion of tables. A tape format engine does
not support table deletion. If you use a tape format engine, SAS suppresses the
MOVE operation and prints a warning.
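The following is a minimal sketch of how several COPY options described above can be combined. The librefs SOURCE, DEST, and ARCHIVE and the member name OLDLOG are hypothetical and must already be assigned or exist; the option names themselves come from the COPY statement syntax shown earlier.

```sas
/* Hypothetical librefs SOURCE, DEST, ARCHIVE; hypothetical member OLDLOG */
proc datasets library=source nolist;
   /* copy all data sets, keeping attributes, indexes, constraints, and dates */
   copy out=dest clone constraint=yes index=yes datecopy memtype=data;
run;
   /* move one data set: it is copied to ARCHIVE and then deleted from SOURCE */
   copy out=archive move memtype=data;
      select oldlog;
run;
quit;
```

Because IN= is omitted, the procedure input library (SOURCE) is used as the input library for both COPY statements.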
Featured in:
NOCLONE
Also, you can select a group of members whose names begin with the same letter or
letters by entering the common letters followed by a colon (:). For example, you can
select the four members in the previous example and all other members having names
that begin with the letter T by specifying the following statement:
select t:;
You specify members to exclude in the same way that you specify those to select.
That is, you can list individual member names, use an abbreviated list, or specify a
common letter or letters followed by a colon (:). For example, the following statement
excludes the members STATS, TEAMS1, TEAMS2, TEAMS3, TEAMS4 and all the
members that begin with the letters RBI from the copy operation:
exclude stats teams1-teams4 rbi:;
Note that the MEMTYPE= option affects which types of members are available to be
selected or excluded.
When a SELECT or EXCLUDE statement is used with CONSTRAINT=YES, only the
general integrity constraints on the data sets are copied. Any referential integrity
constraints are not copied. For more information, see Understanding Integrity
Constraints in SAS Language Reference: Concepts.
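The selection rules above can be sketched as two separate copy operations, one that selects members and one that excludes them. The librefs SOURCE, DEST1, and DEST2 are hypothetical; the member names follow the examples in the text.

```sas
/* Hypothetical librefs SOURCE, DEST1, and DEST2 must already be assigned */
proc datasets library=source nolist;
   /* copy STATS and every data set whose name begins with T */
   copy out=dest1 memtype=data;
      select stats t:;
run;
   /* copy every data set except STATS, TEAMS1-TEAMS4, and names beginning with RBI */
   copy out=dest2 memtype=data;
      exclude stats teams1-teams4 rbi:;
run;
quit;
```

A single COPY statement takes either a SELECT or an EXCLUDE statement, which is why the sketch uses two RUN groups.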
- You cannot limit its effect to the member immediately preceding it by enclosing
  the MEMTYPE= option in parentheses.
- The SELECT and EXCLUDE statements and the IN= option (in the COPY
  statement) affect the behavior of the MEMTYPE= option in the COPY statement
  according to the following rules:
  1. MEMTYPE= in a SELECT or EXCLUDE statement takes precedence over
  2. If you do not use the IN= option, or you use it to specify the library that
  3. If you specify an input data library in the IN= option other than the
MIXEDCASE, the copy fails with an error if the OUT= engine does not support long
variable names.
When a variable name is truncated, the variable name is shortened to eight bytes. If
this name has already been defined in the data set, the name is shortened and a digit is
added, starting with the number 2. The process of truncation and adding a digit
continues until the variable name is unique. For example, a variable named
LONGVARNAME becomes LONGVARN, provided that a variable with that name does
not already exist in the data set. In that case, the variable name becomes LONGVAR2.
CAUTION:
Truncated variable names can collide with names already defined in the input data set.
This is possible when the variable name that is already defined is exactly eight bytes
long and ends in a digit. In that case, the truncated name is defined in the output
data set and the name from the input data set is changed. For example,
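options validvarname=mixedcase;
data test;
   /* nine-byte name that truncates to LONGVAR1 under VALIDVARNAME=V6 */
   longvar10='aLongVariableName';
   /* LONGVAR1 is exactly eight bytes and ends in a digit */
   retain longvar1-longvar5 0;
run;
options validvarname=v6;
proc copy in=work out=sasuser;
   select test;
run;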
DELETE Statement
Deletes SAS files from a SAS data library.
Featured in:
DELETE SAS-file(s)
</ <ALTER=alter-password>
<GENNUM=ALL|HIST|REVERT|integer>
<MEMTYPE=mtype>>;
Required Arguments
SAS-file(s)
Options
ALTER=alter-password
provides the alter password for any alter-protected SAS files that you want to delete.
You can use the option either in parentheses after the name of each SAS file or after
a forward slash.
See also: Using Passwords with the DATASETS Procedure on page 359
GENNUM=ALL|HIST|REVERT|integer
restricts processing for generation data sets. You can use the option either in
parentheses after the name of each SAS file or after a forward slash. Valid values are
ALL
refers to the base version and all historical versions in a generation group.
HIST
refers to all historical versions, but excludes the base version in a generation group.
REVERT|0
deletes the base version and changes the most current historical version, if it
exists, to the base version.
integer
is a number that references a specific version from a generation group. Specifying
a positive number is an absolute reference to a specific generation number that is
appended to a data set's name; that is, gennum=2 specifies MYDATA#002.
Specifying a negative number is a relative reference to a historical version in
relation to the base version, from the youngest to the oldest; that is, gennum=-1
refers to the youngest historical version.
See also: Restricting Processing for Generation Data Sets on page 362
See also: Understanding Generation Data Sets in SAS Language Reference:
Concepts
MEMTYPE=mtype
restricts processing to one member type. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
Aliases: MT=, MTYPE=
Default: DATA
Details
- SAS immediately deletes SAS files when the RUN group executes. You do not
  have an opportunity to verify the delete operation before it begins.
- If you attempt to delete a SAS file that does not exist in the procedure input
  library, PROC DATASETS issues a message and continues processing. If
  NOWARN is used, no message is issued.
- When you use the DELETE statement to delete a data set that has indexes
  associated with it, the statement also deletes the indexes.
- You cannot use the DELETE statement to delete a data file that has a foreign key
  integrity constraint or a primary key with foreign key references. For data files
  that have foreign keys, you must remove the foreign keys before you delete the
  data file. For data files that have primary keys with foreign key references, you
  must remove the foreign keys that reference the primary key before you delete the
  data file.
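A minimal sketch of a DELETE request that reflects the points above follows. The libref MYLIB and the member names SCRATCH1 and SCRATCH2 are hypothetical.

```sas
/* Hypothetical libref MYLIB and members SCRATCH1, SCRATCH2 */
/* NOWARN suppresses the message if a named member does not exist */
proc datasets library=mylib nolist nowarn;
   /* deletion happens as soon as the RUN group executes; there is no prompt */
   delete scratch1 scratch2 / memtype=data;
run;
quit;
```

Any indexes associated with the deleted data sets are removed along with them.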
The following statements delete the base version and all historical versions where
the data set name begins with the letter A:
proc datasets;
delete A:(gennum=all);
proc datasets;
delete A: / gennum=all;
proc datasets gennum=all;
delete A:;
Deleting the Base Version and Renaming the Youngest Historical Version to the Base
Version
The following statements delete the base version and rename the youngest historical
version to the base version, where the data set name is A:
proc datasets;
delete A(gennum=revert);
proc datasets;
delete A / gennum=revert;
proc datasets gennum=revert;
delete A;
The following statements delete the base version and rename the youngest historical
version to the base version, where the data set name begins with the letter A:
proc datasets;
delete A:(gennum=revert);
proc datasets;
delete A: / gennum=revert;
proc datasets gennum=revert;
delete A:;
The following statements delete a specific historical version, where the data set name
begins with the letter A:
proc datasets;
delete A:(gennum=1);
proc datasets;
delete A: / gennum=1;
The following statements use a relative number to delete the youngest historical
version, where the data set name begins with the letter A:
proc datasets;
delete A:(gennum=-1);
proc datasets;
delete A: / gennum=-1;
proc datasets gennum=-1;
delete A:;
The following statements delete all historical versions and leave the base version,
where the data set name begins with the letter A:
proc datasets;
delete A:(gennum=hist);
proc datasets;
delete A: / gennum=hist;
proc datasets gennum=hist;
delete A:;
EXCHANGE Statement
Exchanges the names of two SAS files in a SAS library.
Featured in:
EXCHANGE name-1=other-name-1
<name-n=other-name-n>
</ <ALTER=alter-password>
<MEMTYPE=mtype>>;
Required Arguments
name=other-name
exchanges the names of SAS files in the procedure input library. Both name and
other-name must already exist in the procedure input library.
Options
ALTER=alter-password
provides the alter password for any alter-protected SAS files whose names you want
to exchange. You can use the option either in parentheses after the name of each
SAS file or after a forward slash.
See also: Using Passwords with the DATASETS Procedure on page 359
MEMTYPE=mtype
restricts processing to one member type. You can exchange the names only of SAS
files of the same type. You can use the option either in parentheses after the name of
each SAS file or after a forward slash.
Default: If you do not specify MEMTYPE= in the PROC DATASETS statement, the
default is ALL.
See also: Restricting Member Types for Processing on page 360
Details
- When you exchange more than one pair of names in one EXCHANGE statement,
  PROC DATASETS performs the exchanges in the order that the names of the SAS
  files occur in the directory listing, not in the order that you list the exchanges in
  the EXCHANGE statement.
- If the name SAS file does not exist in the SAS data library, PROC DATASETS
  stops processing the RUN group that contains the EXCHANGE statement and
  issues an error message. To override this behavior, specify the NOWARN option in
  the PROC DATASETS statement.
- The EXCHANGE statement also exchanges the associated indexes so that they
  correspond with the new name.
- The EXCHANGE statement only allows two existing generation groups to
  exchange names. You cannot exchange a specific generation number with either an
  existing base version or another generation number.
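As a brief sketch of the EXCHANGE statement, the following swaps the names of two data sets. The libref MYLIB and the member names CURRENT and PREVIOUS are hypothetical, and both members are assumed to exist already.

```sas
/* Hypothetical libref MYLIB; CURRENT and PREVIOUS must both already exist */
proc datasets library=mylib nolist;
   /* after this RUN group, each data set carries the other's name */
   exchange current=previous / memtype=data;
run;
quit;
```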
EXCLUDE Statement
Excludes SAS files from copying.
Restriction:
Restriction:
Required Arguments
SAS-file(s)
specifies one or more SAS files to exclude from the copy operation. All SAS files you
name in the EXCLUDE statement must be in the library that is specified in the IN=
option in the COPY statement. If the SAS files are generation groups, the EXCLUDE
statement allows only selection of the base versions.
Options
MEMTYPE=mtype
restricts processing to one member type. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
Aliases:
MTYPE=, MT=
Default: If you do not specify MEMTYPE= in the PROC DATASETS statement, the
331
FORMAT Statement
Permanently assigns, changes, and removes variable formats in the SAS data set specified in the
MODIFY statement.
Restriction:
Required Arguments
variable-list
specifies one or more variables whose format you want to assign, change, or remove.
If you want to disassociate a format from a variable, list the variable last in the list
with no format following. For example:
format x1-x3 4.1 time hhmm2.2 age;
Options
format
specifies a format to apply to the variable or variables listed before it. If you do not
specify a format, the FORMAT statement removes any format associated with the
variables in variable-list.
Note: You can use shortcut methods for specifying variables, such as the keywords
_NUMERIC_, _CHARACTER_, and _ALL_. See Shortcuts for Specifying Lists of
Variable Names on page 24 for more information.
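To show where the FORMAT statement sits, here is a minimal sketch of a MODIFY RUN group that assigns two formats and removes a third. The libref MYLIB, the data set SALES, and the variable names are hypothetical; DOLLAR12.2 and DATE9. are standard SAS formats.

```sas
/* Hypothetical libref MYLIB, data set SALES, and variable names */
proc datasets library=mylib nolist;
   modify sales;
      /* AMOUNT and SALEDATE get formats; NOTES, listed last with no format, */
      /* has any existing format removed                                     */
      format amount dollar12.2 saledate date9. notes;
run;
quit;
```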
IC CREATE Statement
Creates an integrity constraint.
Must be in a MODIFY RUN group
See also: Understanding Integrity Constraints in SAS Language Reference: Concepts
Restriction:
Required Arguments
constraint
CHECK (WHERE-expression)
limits the data values of variables to a specific set, range, or list of values. This is
accomplished with a WHERE expression.
PRIMARY KEY (variables)
specifies a primary key, that is, a set of variables that do not contain missing
values and whose values are unique.
Interaction: A primary key affects the values of an individual data file until it
constraints, which means that variables in a data file are part of both a primary
key and a foreign key definition, if you use exactly the same variables, then the
variables must be defined in a different order.
FOREIGN KEY (variables) REFERENCES table-name
<ON DELETE referential-action> <ON UPDATE referential-action>
specifies a foreign key, that is, a set of variables whose values are linked to the
values of the primary key variables in another data file. The referential actions
are enforced when updates are made to the values of a primary key variable that
is referenced by a foreign key.
There are three types of referential actions: RESTRICT, SET NULL, and
CASCADE:
For a RESTRICT referential action,
a delete operation
deletes the primary key row, but only if no foreign key values match the deleted
value.
an update operation
updates the primary key value, but only if no foreign key values match the
current value to be updated.
For a SET NULL referential action,
a delete operation
deletes the primary key row and sets the corresponding foreign key values to
NULL.
an update operation
modifies the primary key value and sets all matching foreign key values to
NULL.
For a CASCADE referential action,
an update operation
modifies the primary key value, and additionally modifies any matching foreign
key values to the same value. CASCADE is not supported for delete operations.
Default: RESTRICT is the default action if no referential action is specified.
Interaction: Before it will enforce a SET NULL or CASCADE referential action,
SAS checks to see if there are other foreign keys that reference the primary key
and that specify RESTRICT for the intended operation. If RESTRICT is specified,
or if the constraint reverts to the default values, then RESTRICT is enforced for all
foreign keys, unless no foreign key values match the values to be updated or deleted.
Requirement: When defining overlapping primary key and foreign key constraints,
which means that variables in a data file are part of both a primary key and a
foreign key definition,
- if you use exactly the same variables, then the variables must be defined in a
  different order.
- the foreign key's update and delete referential actions must both be
  RESTRICT.
Options
<constraint-name=>
is an optional name for the constraint. The name must be a valid SAS name. When
you do not supply a constraint name, a default name is generated. This default
constraint name has the following form
Default name    Constraint type
_NMxxxx_        Not Null
_UNxxxx_        Unique
_CKxxxx_        Check
_PKxxxx_        Primary key
_FKxxxx_        Foreign key
message-string is the text of an error message to be written to the log when the data
fails the constraint. For example,
ic create not null(socsec)
   message='Invalid Social Security number';
Length:
Note that for a referential constraint to be established, the foreign key must specify the
same number of variables as the primary key, in the same order, and the variables
must be of the same type (character/numeric) and length.
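The constraint types described above can be sketched together as follows. The libref MYLIB, the data sets DEPT and EMP, the variables DEPTNO and SALARY, and all constraint names are hypothetical. The sketch assumes that DEPTNO has the same type and length in both data sets, as a referential constraint requires.

```sas
/* Hypothetical libref MYLIB, data sets DEPT and EMP, and constraint names */
proc datasets library=mylib nolist;
   modify dept;
      /* primary key: DEPTNO must be unique and nonmissing */
      ic create pk_dept = primary key (deptno)
         message='Department number must be unique and nonmissing';
run;
   modify emp;
      /* foreign key: deletes are restricted, updates cascade from DEPT */
      ic create fk_emp = foreign key (deptno) references dept
         on delete restrict on update cascade;
      /* check constraint expressed as a WHERE expression */
      ic create ck_salary = check (where=(salary > 0));
run;
quit;
```

The primary key is created first, in its own RUN group, because the foreign key must reference an existing primary key.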
IC DELETE Statement
Deletes an integrity constraint.
Restriction:
Arguments
constraint-name(s)
names one or more constraints to delete. For example, to delete the constraints
Unique_D and Unique_E, use this statement:
ic delete Unique_D Unique_E;
_ALL_
deletes all constraints for the SAS data file specified in the preceding MODIFY
statement.
IC REACTIVATE Statement
Reactivates a foreign key integrity constraint that is inactive.
Restriction:
Arguments
foreign-key-name
refers to the SAS library containing the data set that contains the primary key that
is referenced by the foreign key.
For example, suppose that you have the foreign key FKEY defined in data set
MYLIB.MYOWN and that FKEY is linked to a primary key in data set
MAINLIB.MAIN. If the integrity constraint is inactivated by a copy or move operation,
you can reactivate the integrity constraint by using the following code:
proc datasets library=mylib;
modify myown;
ic reactivate fkey references mainlib;
run;
INDEX CENTILES
Updates centiles statistics for indexed variables.
Restriction:
Required Arguments
index(s)
Options
REFRESH
specifies when centiles are to be updated. It is not practical to update centiles after
every data set update. Therefore, you can specify as the value of
UPDATECENTILES the percent of the data values that can be changed before
centiles for the indexed variables are updated.
Valid values for UPDATECENTILES are
ALWAYS|0
updates centiles when the data set is closed if any changes have been made to the
data set index.
NEVER|101
does not update centiles.
integer
is the percent of values for the indexed variable that can be updated before
centiles are refreshed.
Alias: UPDCEN
Default: 5 (percent)
Required Arguments
index-specification(s)
Options
NOMISS
excludes from the index all observations with missing values for all index variables.
When you create an index with the NOMISS option, SAS uses the index only for
WHERE processing and only when missing values fail to satisfy the WHERE
expression. For example, if you use the following WHERE statement, SAS does not
use the index, because missing values satisfy the WHERE expression:
where dept ne '01';
UNIQUE
specifies that the combination of values of the index variables must be unique. If you
specify UNIQUE and multiple observations have the same values for the index
variables, the index is not created.
Featured in:
UPDATECENTILES=ALWAYS|NEVER|integer
specifies when centiles are to be updated. It is not practical to update centiles after
every data set update. Therefore, you can specify the percent of the data values that
can be changed before centiles for the indexed variables are updated. Valid values for
UPDATECENTILES are as follows:
ALWAYS|0
updates centiles when the data set is closed if any changes have been made to the
data set index.
NEVER|101
does not update centiles.
integer
specifies the percent of values for the indexed variable that can be updated before
centiles are refreshed.
Alias: UPDCEN
Default: 5% (percent)
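A minimal sketch of the index options described above follows. The libref MYLIB, the data set EMP, the variables EMPID, LASTNAME, and DEPTNO, and the composite index name NAMEDEPT are all hypothetical.

```sas
/* Hypothetical libref MYLIB, data set EMP, and variable and index names */
proc datasets library=mylib nolist;
   modify emp;
      /* simple index: values must be unique, missing values are excluded, */
      /* and centiles refresh after 10 percent of values have changed      */
      index create empid / unique nomiss updatecentiles=10;
      /* composite index on two variables, using the default centile setting */
      index create namedept=(lastname deptno);
run;
quit;
```

If EMPID contains duplicate values, the UNIQUE index is not created.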
Required Arguments
index(s)
names one or more indexes to delete. The index(es) must be for variables in the SAS
data set that is named in the preceding MODIFY statement. You can delete both
simple and composite indexes.
_ALL_
deletes all indexes, except for indexes that are owned by an integrity constraint.
When an index is created, it is marked as owned by the user, by an integrity
constraint, or by both. If an index is owned by both a user and an integrity
constraint, the index is not deleted until both an IC DELETE statement and an
INDEX DELETE statement are processed.
Note: You can use the CONTENTS statement to produce a list of all indexes for a
data set.
INFORMAT Statement
Permanently assigns, changes, and removes variable informats in the data set specified in the
MODIFY statement.
Restriction:
Required Arguments
variable-list
specifies one or more variables whose informats you want to assign, change, or
remove. If you want to disassociate an informat from a variable, list the variable last
in the list with no informat following. For example:
informat a b 2. x1-x3 4.1 c;
Options
informat
LABEL Statement
Assigns, changes, and removes variable labels for the SAS data set specified in the MODIFY
statement.
Restriction:
Required Arguments
variable=<label>
assigns a label to a variable. If a single quotation mark appears in the label, write it
as two single quotation marks in the LABEL statement. Specifying variable= or
variable=' ' removes the current label.
Range:
1-256 characters
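The quoting rule above can be sketched as follows. The libref MYLIB, the data set EMP, and the variables HIREDATE and NOTES are hypothetical.

```sas
/* Hypothetical libref MYLIB, data set EMP, and variable names */
proc datasets library=mylib nolist;
   modify emp;
      /* the apostrophe inside the label is written as two single quotation marks; */
      /* a blank label removes the current label from NOTES                        */
      label hiredate='Employee''s hire date'
            notes=' ';
run;
quit;
```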
MODIFY Statement
Changes the attributes of a SAS file and, through the use of subordinate statements, the attributes
of variables in the SAS file.
Featured in:
To do this
MEMTYPE=
Specify attributes
Change the character-set encoding
CORRECTENCODING=
DTC=
LABEL=
SORTEDBY=
TYPE=
Modify passwords
Modify an alter password
ALTER=
PW=
READ=
WRITE=
GENMAX=
GENNUM=
Required Arguments
SAS-file
Options
ALTER=password-modification
assigns, changes, or removes an alter password for the SAS file named in the
MODIFY statement. password-modification is one of the following:
- new-password
- old-password / new-password
- / new-password
- old-password /
- /
enables you to change the encoding indicator, which is recorded in the file's
descriptor information, in order to match the actual encoding of the file's data.
See: The CORRECTENCODING= Option on the MODIFY Statement of the
DATASETS Procedure in SAS National Language Support (NLS): User's Guide
DTC=SAS-date-time
specifies a date and time to substitute for the date and time stamp placed on a SAS
file at the time of creation. You cannot use this option in parentheses after the name
of each SAS file; you must specify DTC= after a forward slash. For example:
modify mydata / dtc='03MAR00:12:01:00'dt;
Tip: Use DTC= to alter a SAS file's creation date and time prior to using the
DATECOPY option in the CIMPORT procedure, COPY procedure, CPORT
procedure, SORT procedure, and the COPY statement in the DATASETS
procedure.
Restriction: A SAS file's creation date and time cannot be set later than the date
engine.
GENMAX=number-of-generations
specifies the maximum number of versions. You can use this option either in
parentheses after the name of each SAS file or after a forward slash.
Range: 0 to 1,000
Default: 0
GENNUM=integer
restricts processing for generation data sets. You can specify GENNUM= either in
parentheses after the name of each SAS file or after a forward slash. Valid value is
integer, which is a number that references a specific version from a generation group.
Specifying a positive number is an absolute reference to a specific generation number
that is appended to a data set's name; that is, gennum=2 specifies MYDATA#002.
assigns, changes, or removes a data set label for the SAS data set named in the
MODIFY statement. If a single quotation mark appears in the label, write it as two
single quotation marks. LABEL= or LABEL=' ' removes the current label.
Range: 1-40 characters
Featured in: Example 3 on page 381
MEMTYPE=mtype
PW=password-modification
assigns, changes, or removes a read, write, or alter password for the SAS file named
in the MODIFY statement. password-modification is one of the following:
- new-password
- old-password / new-password
- / new-password
- old-password /
- /
See also: Manipulating Passwords on page 351
READ=password-modification
assigns, changes, or removes a read password for the SAS file named in the MODIFY
statement. password-modification is one of the following:
- new-password
- old-password / new-password
- / new-password
- old-password /
- /
SORTEDBY=sort-information
specifies how the data are currently sorted. SAS stores the sort information with the
file but does not verify that the data are sorted the way you indicate.
sort-information can be one of the following:
by-clause </ collate-name>
indicates how the data are currently sorted. Values for by-clause are the variables
and options you can use in a BY statement in a PROC SORT step. collate-name
names the collating sequence used for the sort. By default, the collating sequence
is that of your host operating environment.
_NULL_
removes any existing sort information.
Restriction: The data must be sorted in the order that you specify. If the data are
not in the specified order, SAS will not sort them for you.
Featured in:
TYPE=special-type
assigns or changes the special data set type of a SAS data set. SAS does not verify
- the SAS data set type you specify in the TYPE= option (except to check if it has
  a length of eight or fewer characters).
- that the SAS data set's structure is appropriate for the type you have
  designated.
Note: Do not confuse the TYPE= option with the MEMTYPE= option. The
TYPE= option specifies a type of special SAS data set. The MEMTYPE= option
specifies one or more types of SAS files in a SAS data library.
Tip: Most SAS data sets have no special type. However, certain SAS procedures,
like the CORR procedure, can create a number of special SAS data sets. In
addition, SAS/STAT software and SAS/EIS software support special data set types.
WRITE=password-modification
assigns, changes, or removes a write password for the SAS file named in the
MODIFY statement. password-modification is one of the following:
- new-password
- old-password / new-password
- / new-password
- old-password /
- /
Manipulating Passwords
In order to assign, change, or remove a password, you must specify the password for
the highest level of protection that currently exists on that file.
Assigning Passwords
/* assigns a password to an unprotected file */
modify colors (pw=green);
Changing Passwords
/* changes the write password from YELLOW to BROWN */
modify cars (write=yellow/brown);
Removing Passwords
/* removes the alter password RED from STATES */
modify states (alter=red/);
Removing Passwords
/* removes the alter password RED from STATES#002 */
modify states (alter=red/) / gennum=2;
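Beyond passwords, the attribute options described earlier for the MODIFY statement can be sketched in the same parenthesized form. The libref MYLIB, the data set EMP, and the variable DEPTNO are hypothetical, and EMP is assumed to be unprotected and already sorted by DEPTNO, because SAS does not verify the sort order.

```sas
/* Hypothetical libref MYLIB, data set EMP, and variable DEPTNO */
proc datasets library=mylib nolist;
   /* assign a data set label and record the existing sort order */
   modify emp (label='Employee master file' sortedby=deptno);
run;
   /* remove the stored sort information again */
   modify emp (sortedby=_null_);
run;
quit;
```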
RENAME Statement
Renames variables in the SAS data set specified in the MODIFY statement.
Restriction:
Featured in:
RENAME old-name-1=new-name-1
<old-name-n=new-name-n>;
Required Arguments
old-name=new-name
changes the name of a variable in the data set specified in the MODIFY statement.
old-name must be a variable that already exists in the data set. new-name cannot be
the name of a variable that already exists in the data set or the name of an index,
and the new name must be a valid SAS name. See Rules for SAS Variable Names
in SAS Language Reference: Concepts.
Details
- If old-name does not exist in the SAS data set or new-name already exists, PROC
  DATASETS stops processing the RUN group containing the RENAME statement
  and issues an error message.
- When you use the RENAME statement to change the name of a variable for which
  there is a simple index, the statement also renames the index.
- If the variable that you are renaming is used in a composite index, the composite
  index automatically references the new variable name. However, if you attempt to
  rename a variable to a name that has already been used for a composite index, you
  receive an error message.
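A minimal sketch of a RENAME request follows. The libref MYLIB, the data set EMP, and the variable names are hypothetical. The old names are assumed to exist, and the new names are assumed not to be in use as variable or index names.

```sas
/* Hypothetical libref MYLIB, data set EMP, and variable names */
proc datasets library=mylib nolist;
   modify emp;
      /* a simple index on LNAME or FNAME, if one exists, is renamed as well */
      rename lname=lastname fname=firstname;
run;
quit;
```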
REPAIR Statement
Attempts to restore damaged SAS data sets or catalogs to a usable condition.
REPAIR SAS-file(s)
</ <ALTER=alter-password>
<GENNUM=integer>
<MEMTYPE=mtype>>;
Required Arguments
SAS-file(s)
specifies one or more SAS data sets or catalogs in the procedure input library.
Options
ALTER=alter-password
provides the alter password for any alter-protected SAS files that are named in the
REPAIR statement. You can use the option either in parentheses after the name of
each SAS file or after a forward slash.
See also: Using Passwords with the DATASETS Procedure on page 359
GENNUM=integer
restricts processing for generation data sets. You can use the option either in
parentheses after the name of each SAS file or after a forward slash. Valid value is
integer, which is a number that references a specific version from a generation group.
Specifying a positive number is an absolute reference to a specific generation number
that is appended to a data set's name; that is, gennum=2 specifies MYDATA#002.
Specifying a negative number is a relative reference to a historical version in relation
to the base version, from the youngest to the oldest; that is, gennum=-1 refers to the
youngest historical version. Specifying 0, which is the default, refers to the base
version.
See also: Restricting Processing for Generation Data Sets on page 362
See also: Understanding Generation Data Sets in SAS Language Reference:
Concepts
MEMTYPE=mtype
Details
The most common situations that require the REPAIR statement are as follows:
- A system failure occurs while you are updating a SAS data set or catalog.
- The device on which a SAS data set or an associated index resides is damaged. In
  this case, you can restore the damaged data set or index from a backup device, but
  the data set and index no longer match.
- The disk that stores the SAS data set or catalog becomes full before the file is
  completely written to disk. You may need to free some disk space. PROC
  DATASETS requires free space when repairing SAS data sets with indexes and
  when repairing SAS catalogs.
- An I/O error occurs while you are writing a SAS data set or catalog entry.
When you use the REPAIR statement for SAS data sets, it recreates all indexes for
the data set. It also attempts to restore the data set to a usable condition, but the
restored data set may not include the last several updates that occurred before the
system failed. You cannot use the REPAIR statement to recreate indexes that were
destroyed by using the FORCE option in a PROC SORT step.
When you use the REPAIR statement for a catalog, you receive a message stating
whether the REPAIR statement restored the entry. If the entire catalog is potentially
damaged, the REPAIR statement attempts to restore all the entries in the catalog. If
only a single entry is potentially damaged, for example when a single entry is being
updated and a disk-full condition occurs, on most systems only the entry that is open
when the problem occurs is potentially damaged. In this case, the REPAIR statement
attempts to repair only that entry. Some entries within the restored catalog may not
include the last updates that occurred before a system crash or an I/O error. The
REPAIR statement issues warning messages for entries that may have truncated data.
To repair a damaged catalog, the version of SAS that you use must be able to update
the catalog. Whether a SAS version can update a catalog (or just read it) is determined
by the SAS version that created the catalog:
- A damaged Version 6 catalog can be repaired with Version 6 only.
- A damaged Version 8 catalog can be repaired with either Version 8 or SAS System
  9, but not with Version 6.
- A damaged SAS System 9 catalog can be repaired with SAS System 9 only.
If the REPAIR operation is not successful, try to restore the SAS data set or catalog
from your system's backup files.
If you issue a REPAIR statement for a SAS file that does not exist in the specified
library, PROC DATASETS stops processing the run group that contains the REPAIR
statement, and issues an error message. To override this behavior and continue
processing, use the NOWARN option in the PROC DATASETS statement.
If you are using Cross-Environment Data Access (CEDA) to process a damaged
foreign SAS data set, CEDA cannot repair it. CEDA does not support update processing,
which is required in order to repair a damaged data set. To repair the foreign file, you
must move it back to its native environment. Note that observations may be lost during
the repair process. For more information about CEDA, refer to "Processing Data Using
Cross-Environment Data Access" in SAS Language Reference: Concepts.
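The following is a minimal sketch of a repair step, not a definitive recipe. It assumes a library assigned to the libref HEALTH that contains a damaged data set named GROUP; both names are borrowed from the examples in this chapter for illustration only, and the library path is a placeholder. MEMTYPE=DATA restricts the repair to the data set, and NOWARN is optional: it only changes what happens when the named file does not exist.

```sas
/* Placeholder path (an assumption): substitute your own library. */
libname health 'SAS-data-library';

proc datasets library=health nolist nowarn;
   /* Attempt to restore HEALTH.GROUP to a usable condition and
      recreate its indexes. The most recent updates may be lost. */
   repair group / memtype=data;
run;
quit;
```

If the repair does not succeed, the fallback described above still applies: restore the file from a backup.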
SAVE Statement
Deletes all the SAS files in a library except the ones listed in the SAVE statement.
Featured in: Example 2 on page 380
Required Arguments
SAS-file(s)
specifies one or more SAS files that you do not want to delete from the SAS data
library.
Options
MEMTYPE=mtype
restricts processing to one member type. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
Aliases:
Default: If you do not specify the MEMTYPE= option in the PROC DATASETS
Details
- If one of the SAS files in SAS-file does not exist in the procedure input library,
  PROC DATASETS stops processing the RUN group containing the SAVE
  statement and issues an error message. To override this behavior, specify the
  NOWARN option in the PROC DATASETS statement.
- When the SAVE statement deletes SAS data sets, it also deletes any indexes
  associated with those data sets.
  CAUTION:
  SAS immediately deletes libraries and library members when you submit a RUN
  group. You are not asked to verify the delete operation before it begins. Because
  the SAVE statement deletes many SAS files in one operation, be sure that you
  understand how the MEMTYPE= option affects which types of SAS files are
  saved and which types are deleted.
- When you use the SAVE statement with generation groups, the SAVE statement
  treats the base version and all historical versions as a unit. You cannot save a
  specific version.
SELECT Statement
Selects SAS files for copying.
Restriction:
Restriction:
Featured in:
SELECT SAS-file(s)
</ <ALTER=alter-password>
<MEMTYPE= mtype>>;
Required Arguments
SAS-file(s)
specifies one or more SAS files that you want to copy. All of the SAS files that you
name must be in the data library that is referenced by the libref named in the IN=
option in the COPY statement. If the SAS files have generation groups, the SELECT
statement allows only selection of the base versions.
Options
ALTER=alter-password
provides the alter password for any alter-protected SAS files that you are moving
from one data library to another. Because you are moving and thus deleting a SAS
file from a SAS data library, you need alter access. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
See also: Using Passwords with the DATASETS Procedure on page 359
MEMTYPE=mtype
restricts processing to one member type. You can use the option either in
parentheses after the name of each SAS file or after a forward slash.
Aliases:
Default: If you do not specify the MEMTYPE= option in the PROC DATASETS
331
See also: Restricting Member Types for Processing on page 360
Featured in:
Procedure Execution
Execution of Statements
When you start the DATASETS procedure, you specify the procedure input library in
the PROC DATASETS statement. If you omit a procedure input library, the procedure
processes the current default SAS data library (usually the WORK data library). To
specify a new procedure input library, issue the DATASETS procedure again.
Statements execute in the order they are written. For example, if you want to see
the contents of a data set, copy a data set, and then visually compare the contents of
the second data set with the first, the statements that perform those tasks must appear
in that order (that is, CONTENTS, COPY, CONTENTS).
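The ordering described above can be sketched as follows. This is an illustration under stated assumptions rather than an example from the guide: the librefs HEALTH and DEST, the placeholder library paths, and the data set name REPORT are hypothetical, and the second CONTENTS statement assumes that the copy is addressed with a two-level name because DEST is not the procedure input library.

```sas
/* Placeholder paths (assumptions): substitute your own libraries. */
libname health 'SAS-data-library';
libname dest   'another-SAS-data-library';

proc datasets library=health nolist;
   /* 1. Describe the original data set. */
   contents data=report;

   /* 2. Copy it to the DEST library. */
   copy out=dest;
      select report;

   /* 3. Describe the copy, so that the two listings can be compared. */
   contents data=dest.report;
run;
quit;
```

Swapping the order of the statements would change the order of the results, because each statement executes in the position in which it is written.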
RUN-Group Processing
PROC DATASETS supports RUN-group processing. RUN-group processing enables
you to submit RUN groups without ending the procedure.
The DATASETS procedure supports four types of RUN groups. Each RUN group is
defined by the statements that compose it and by what causes it to execute.
Some statements in PROC DATASETS act as implied RUN statements because they
cause the RUN group preceding them to execute.
The following list discusses what statements compose a RUN group and what causes
each RUN group to execute:
- The MODIFY statement, and any of its subordinate statements, form a RUN
group. These RUN groups always execute immediately. No other statement is
necessary to cause a MODIFY RUN group to execute.
statement to execute; it becomes part of the same RUN group. To execute the
RUN group, submit one of the following statements:
- PROC DATASETS
- APPEND
- CONTENTS
- COPY
- MODIFY
- QUIT
- RUN
- another DATA or PROC step.
SAS reads the program statements that are associated with one task until it reaches
a RUN statement or an implied RUN statement. It executes all of the preceding
statements immediately, then continues reading until it reaches another RUN
statement or implied RUN statement. To execute the last task, you must use a RUN
statement or a statement that stops the procedure.
The following PROC DATASETS step contains five RUN groups:
libname dest 'SAS-data-library';
/* RUN group */
proc datasets;
/* RUN group */
change nutr=fatg;
delete bldtest;
exchange xray=chest;
/* RUN group */
copy out=dest;
select report;
/* RUN group */
modify bp;
label dias='Taken at Noon';
rename weight=bodyfat;
/* RUN group */
append base=tissue data=newtiss;
quit;
Note: If you are running in interactive line mode, you can receive messages that
statements have already executed before you submit a RUN statement. Plan your tasks
carefully if you are using this environment for running PROC DATASETS.
Error Handling
Generally, if an error occurs in a statement, the RUN group containing the error does
not execute. RUN groups preceding or following the one containing the error execute
normally. The MODIFY RUN group is an exception. If a syntax error occurs in a
statement subordinate to the MODIFY statement, only the statement containing the
error fails. The other statements in the RUN group execute.
Note that if the first word of the statement (the statement name) is in error and the
procedure cannot recognize it, the procedure treats the statement as part of the
preceding RUN group.
Password Errors
If there is an error involving an incorrect or omitted password in a statement, the
error affects only the statement containing the error. The other statements in the RUN
group execute.
used in parentheses, the option only refers to the name immediately preceding the
option. If you are working with more than one SAS file in a data library and each
SAS file has a different password, you must specify password options in
parentheses after individual names.
In the following statement, the ALTER= option provides the password RED for
the SAS file BONES only:
delete xplant bones(alter=red);
2. after a forward slash (/) in a subordinate statement. When you use a password
option following a slash, the option refers to all SAS files named in the statement
unless the same option appears in parentheses after the name of a SAS file. This
method is convenient when you are working with more than one SAS file and they
all have the same password.
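As a hedged illustration of the slash form, the sketch below reuses the file names XPLANT and BONES from the statement above. The libref HEALTH, the placeholder library path, the data set CHEST, and the passwords RED and BLUE are assumptions made for the example only.

```sas
/* Placeholder path (an assumption): substitute your own library. */
libname health 'SAS-data-library';

proc datasets library=health nolist;
   /* After the slash, ALTER=RED applies to both XPLANT and BONES. */
   delete xplant bones / alter=red;

   /* A password in parentheses takes precedence for that one file:
      CHEST uses BLUE, and any other file named here would use RED. */
   delete chest(alter=blue) / alter=red;
run;
quit;
```

The slash form keeps the statement short when every file shares one password, and the parenthesized form covers the exceptions.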
* In the APPEND and CONTENTS statements, you use these options just as you use any SAS data set option, in parentheses
after the SAS data set name.
DATASETS statement can be useful if all the SAS files you are working with in
the library have the same password. Do not specify the option in parentheses.
In the following PROC DATASETS step, the PW= option provides the password
RED for the SAS files INSULIN and ABNEG:
proc datasets pw=red;
delete insulin;
contents data=abneg;
run;
Note: For the password for a SAS file in a SELECT statement, SAS looks in
the COPY statement before it looks in the PROC DATASETS statement.
In Subordinate Statements
Use the MEMTYPE= option in the following subordinate statements to limit the
member types that are available for processing:
AGE
CHANGE
DELETE
EXCHANGE
EXCLUDE
REPAIR
SAVE
SELECT
Note: The MEMTYPE= option works slightly differently for the CONTENTS, COPY,
and MODIFY statements. Refer to CONTENTS Statement on page 323, COPY
Statement on page 327, and MODIFY Statement on page 348 for more information.
2. after a slash (/) at the end of the statement. When used following a slash, the
MEMTYPE= option refers to all SAS files named in the statement unless the option
appears in parentheses after the name of a SAS file. For example, the following
statement deletes LOTPIX.CATALOG, REGIONS.DATA, and APPL.CATALOG:
delete lotpix regions(memtype=data) appl / memtype=catalog;
deletes APPL.CATALOG:
proc datasets memtype=catalog;
delete appl;
run;
Note: When you use the EXCLUDE and SELECT statements, the procedure
looks in the COPY statement for the MEMTYPE= option before it looks in the
PROC DATASETS statement. For more information, see Specifying Member
Types When Copying or Moving SAS Files on page 331.
4. for the default value. If you do not specify a MEMTYPE= option in the subordinate
statement or in the PROC DATASETS statement, the default value for the
subordinate statement determines the member type available for processing.
Member Types
The following list gives the possible values for the MEMTYPE= option:
ACCESS
access descriptor files (created by SAS/ACCESS software)
ALL
all member types
CATALOG
SAS catalogs
DATA
SAS data files
FDB
financial database
MDDB
multidimensional database
PROGRAM
stored compiled SAS programs
VIEW
SAS views
Table 15.5 on page 362 shows the member types that you can use in each statement:
Table 15.5
Statement   Default member type
AGE         DATA
CHANGE      ALL
CONTENTS    DATA (1)
COPY        ALL
DELETE      DATA
EXCHANGE    ALL
EXCLUDE     ALL
MODIFY      DATA
REPAIR      ALL (2)
SAVE        ALL
SELECT      ALL
(1) When DATA=_ALL_ in the CONTENTS statement, the default is ALL. ALL includes only DATA and VIEW.
(2) ALL includes only DATA and CATALOG.
* For the APPEND and CONTENTS statements, use GENNUM= just as you use any SAS data set option, in parentheses
after the SAS data set name.
2. after a forward slash (/) in a subordinate statement. When you use the
GENNUM= option following a slash, the option refers to all SAS data sets named
in the statement unless the same option appears in parentheses after the name of
a SAS data set. This method is convenient when you are working with more than
one file and you want the same version for all files.
In the following statement, the GENNUM= option in parentheses specifies the
generation version for SAS data set CHEST, and the GENNUM= option after the
slash specifies the generation version for SAS data set VIRUS:
delete chest (gennum=2) virus / gennum=1;
PROC DATASETS statement can be useful if you want the same version for all of
the SAS data sets you are working with in the library. Do not specify the option in
parentheses.
In the following PROC DATASETS step, the GENNUM= option specifies the
generation version for the SAS files INSULIN and ABNEG:
proc datasets gennum=2;
delete insulin;
contents data=abneg;
run;
Note: For the generation version for a SAS file in a SELECT statement, SAS
looks in the COPY statement before it looks in the PROC DATASETS statement.
Procedure Output
indicates whether the SAS data set is READ, WRITE, or ALTER password
protected.
Data Set Type
names the special data set type (such as CORR, COV, SSPC, EST, or FACTOR), if
any.
Observations
is the total number of observations currently in the file. Note that for a very large
data set, if the number of observations exceeds the number that can be stored in a
double-precision integer, the count will show as missing.
Deleted Observations
is the number of observations marked for deletion. These observations are not
included in the total number of observations, shown in the Observations field.
Note that for a very large data set, if the number of deleted observations exceeds
the number that can be stored in a double-precision integer, the count will show as
missing.
Compressed
indicates whether the data set is compressed. If the data set is compressed, the
output includes an additional item, Reuse Space (with a value of YES or NO),
that indicates whether to reuse space that is made available when observations
are deleted.
Sorted
indicates whether the data set is sorted. If you sort the data set with PROC SORT,
PROC SQL, or specify sort information with the SORTEDBY= data set option, a
value of YES appears here, and there is an additional section to the output. See
Sort Information on page 367 for details.
Data Representation
Output 15.3
Data Set Name        HEALTH.GROUP                             Observations          148
Member Type          DATA                                     Variables             11
Engine               V9                                       Indexes               1
Created              Wednesday, February 05, 2003 02:20:56    Observation Length    96
Last Modified        Wednesday, February 05, 2003 02:20:56    Deleted Observations  0
Protection           READ                                     Compressed            NO
Data Set Type                                                 Sorted                YES
Label                Test Subjects
Data Representation  WINDOWS_32
Encoding             wlatin1 Western (Windows)
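Output 15.4
Data Set Page Size          8192
Number of Data Set Pages    4
First Data Page             1
Max Obs per Page            84
Obs in First Data Page      62
Index File Page Size        4096
Number of Index File Pages  2
Number of Data Set Repairs  0
File Name                   c:\Myfiles\health\group.sas7bdat
Release Created             9.0101B0
Host Created                XP_PRO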
is the logical position of each variable in the observation. This is the number that
is assigned to the variable when it is defined.
Variable
specifies the variable's length, which is the number of bytes used to store each of a
variable's values in a SAS data set.
Transcode
Output 15.5
 #  Variable  Type  Len  Format   Informat  Label                           Transcode
 9  BIRTH     Num     8  DATE7.   DATE7.                                    YES
 4  CITY      Char   15  $.       $.                                        NO
 3  FNAME     Char   15  $.       $.                                        NO
10  HIRED     Num     8  DATE7.   DATE7.                                    YES
11  HPHONE    Char   12  $.       $.                                        YES
 1  IDNUM     Char    4  $.       $.                                        YES
 7  JOBCODE   Char    3  $.       $.                                        YES
 2  LNAME     Char   15  $.       $.                                        YES
 8  SALARY    Num     8  COMMA8.            current salary excluding bonus  YES
 6  SEX       Char    1  $.       $.                                        YES
 5  STATE     Char    2  $.       $.                                        YES
indicates the number of each index. The indexes are numbered sequentially as
they are defined.
Index
displays the name of each index. For simple indexes, the name of the index is the
same as a variable in the data set.
Unique Option
indicates whether the index must have unique values. If the column contains YES,
the combination of values of the index variables is unique for each observation.
Nomiss Option
indicates whether the index excludes missing values for all index variables. If the
column contains YES, the index does not contain observations with missing values
for all index variables.
# of Unique Values
Output 15.6
Index  Unique Option  NoMiss Option  # of Unique Values  Variables
vital  YES            YES            148                 BIRTH SALARY
Sort Information
The section shown in Output 15.7 appears only if the Sorted field has a value of YES.
Sortedby
indicates how the data are currently sorted. This field contains either the
variables and options you use in the BY statement in PROC SORT, the column
name in PROC SQL, or the values you specify in the SORTEDBY= option.
Validated
indicates whether PROC SORT or PROC SQL sorted the data. If PROC SORT or
PROC SQL sorted the data set, the value is YES. If you assigned the sort
information with the SORTEDBY= data set option, the value is NO.
Character Set
is the character set used to sort the data. The value for this field can be ASCII,
EBCDIC, or PASCII.
Collating Sequence
is the collating sequence used to sort the data set. This field does not appear if you
do not specify a specific collating sequence that is different from the character set.
(not shown)
Sort Option
Output 15.7
Sortedby       LNAME
Validated      NO
Character Set  ANSI
LISTING in an ODS SELECT or ODS EXCLUDE statement, you affect both the log
and the listing.
Description
Table is generated:
Directory
Members
Library member
information
Table 15.7 ODS Table Names Produced by PROC CONTENTS and PROC DATASETS with the
CONTENTS Statement
ODS Table
Description
Table is generated:
Attributes
Directory
EngineHost
IntegrityConstraints
Indexes
IndexesShort
Members
Position
PositionShort
Sortedby
SortedbyShort
Variables
VariablesShort
* For PROC DATASETS, if both the NOLIST option and either the DIRECTORY option or DATA=<libref.>_ALL_
are specified, then the NOLIST option is ignored.
LABEL
variable label (blank if none given).
LENGTH
variable length.
LIBNAME
libref used for the data library.
MEMLABEL
label for this SAS data set (blank if no label).
MEMNAME
SAS data set that contains the variable.
MEMTYPE
library member type (DATA or VIEW).
MODATE
date the data set was last modified.
NAME
variable name.
NOBS
number of observations in the data set.
NODUPKEY
indicates whether the NODUPKEY option was used in a PROC SORT statement
to sort the input data set.
NODUPREC
indicates whether the NODUPREC option was used in a PROC SORT statement
to sort the input data set.
NPOS
physical position of the first character of the variable in the data set.
POINTOBS
indicates if the data set can be addressed by observation.
PROTECT
the first letter of the level of protection. The value for PROTECT is one or more of
the following:
A
REUSE
indicates whether the space made available when observations are deleted from a
compressed data set should be reused. If the data set is not compressed, the
REUSE variable has a value of NO.
SORTED
the value depends on the sorting characteristics of the input data set. Possible
values are
. (period)
SORTEDBY
the value depends on that variable's role in the sort. Possible values are
. (period)
if the variable was not used to sort the input data set.
n
where n is an integer that denotes the position of that variable in the sort. A
negative value of n indicates that the data set is sorted by the descending
order of that variable.
TYPE
type of the variable (1=numeric, 2=character).
TYPEMEM
special data set type (blank if no TYPE= value is specified).
VARNUM
variable number in the data set. Variables are numbered in the order they appear.
The output data set is sorted by the variables LIBNAME and MEMNAME.
Note: The variable names are sorted so that the values X1, X2, and X10 are listed
in that order, not in the true collating sequence of X1, X10, X2. Therefore, if you want
to use a BY statement on MEMNAME in subsequent steps, run a PROC SORT step on
the output data set first or use the NOTSORTED option in the BY statement.
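The two workarounds named in the note can be sketched as follows. The sketch assumes that the OUT= data set is WORK.GRPOUT, the name used in Example 4; the output data set name GRPSORTED is hypothetical.

```sas
/* Option 1: sort the OUT= data set into the true collating sequence,
   then use an ordinary BY statement. */
proc sort data=grpout out=grpsorted;
   by libname memname;
run;

proc print data=grpsorted;
   by libname memname;
   var name type length varnum;
run;

/* Option 2: leave the data set as it is and tell the BY statement
   that the groups are not in sorted order. */
proc print data=grpout;
   by libname memname notsorted;
   var name type length varnum;
run;
```

The first option costs an extra pass over the data but leaves a data set that any later BY-group step can use. The second avoids the sort but must be repeated in every step that uses BY.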
The following is an example of an output data set created from the GROUP data set,
which is shown in Example 4 on page 384 and in Procedure Output on page 364.
Output 15.8
OBS  LIBNAME  MEMNAME  MEMLABEL       TYPEMEM  NAME     TYPE  LENGTH  VARNUM  LABEL
  1  HEALTH   GROUP    Test Subjects           BIRTH       1       8       9
  2  HEALTH   GROUP    Test Subjects           CITY        2      15       4
  3  HEALTH   GROUP    Test Subjects           FNAME       2      15       3
  4  HEALTH   GROUP    Test Subjects           HIRED       1       8      10
  5  HEALTH   GROUP    Test Subjects           HPHONE      2      12      11
  6  HEALTH   GROUP    Test Subjects           IDNUM       2       4       1
  7  HEALTH   GROUP    Test Subjects           JOBCODE     2       3       7
  8  HEALTH   GROUP    Test Subjects           LNAME       2      15       2
  9  HEALTH   GROUP    Test Subjects           SALARY      1       8       8  current salary excluding bonus
 10  HEALTH   GROUP    Test Subjects           SEX         2       1       6
 11  HEALTH   GROUP    Test Subjects           STATE       2       2       5

OBS  FORMAT  FORMATL  FORMATD  INFORMAT  INFORML  INFORMD  JUST  NPOS  NOBS  ENGINE
  1  DATE          7        0  DATE            7        0     1     8   148  V9
  2  $             0        0  $               0        0     0    58   148  V9
  3  $             0        0  $               0        0     0    43   148  V9
  4  DATE          7        0  DATE            7        0     1    16   148  V9
  5  $             0        0  $               0        0     0    79   148  V9
  6  $             0        0  $               0        0     0    24   148  V9
  7  $             0        0  $               0        0     0    76   148  V9
  8  $             0        0  $               0        0     0    28   148  V9
  9  COMMA         8        0                  0        0     1     0   148  V9
 10  $             0        0  $               0        0     0    75   148  V9
 11  $             0        0  $               0        0     0    73   148  V9

OBS  CRDATE            MODATE            DELOBS  IDXUSAGE   IDXCOUNT  PROTECT  FLAGS  COMPRESS  REUSE  SORTED  SORTEDBY
  1  29JAN02:08:06:46  29JAN02:09:13:36       0  COMPOSITE         1  R--      ---    NO        NO          0         .
  2  29JAN02:08:06:46  29JAN02:09:13:36       0  NONE              1  R--      ---    NO        NO          0         .
  3  29JAN02:08:06:46  29JAN02:09:13:36       0  NONE              1  R--      ---    NO        NO          0         .
  4  29JAN02:08:06:46  29JAN02:09:13:36       0  NONE              1  R--      ---    NO        NO          0         .
  5  29JAN02:08:06:46  29JAN02:09:13:36       0  NONE              1  R--      ---    NO        NO          0         .
  6  29JAN02:08:06:46  29JAN02:09:13:36       0  NONE              1  R--      ---    NO        NO          0         .
  7  29JAN02:08:06:46  29JAN02:09:13:36       0  NONE              1  R--      ---    NO        NO          0         .
  8  29JAN02:08:06:46  29JAN02:09:13:36       0  NONE              1  R--      ---    NO        NO          0         1
  9  29JAN02:08:06:46  29JAN02:09:13:36       0  COMPOSITE         1  R--      ---    NO        NO          0         .
 10  29JAN02:08:06:46  29JAN02:09:13:36       0  NONE              1  R--      ---    NO        NO          0         .
 11  29JAN02:08:06:46  29JAN02:09:13:36       0  NONE              1  R--      ---    NO        NO          0         .

OBS  CHARSET  COLLATE  NODUPKEY  NODUPREC  ENCRYPT  POINTOBS  GENMAX  GENNUM  GENNEXT
  1  ANSI              NO        NO        NO       YES            0       .        .
  2  ANSI              NO        NO        NO       YES            0       .        .
  3  ANSI              NO        NO        NO       YES            0       .        .
  4  ANSI              NO        NO        NO       YES            0       .        .
  5  ANSI              NO        NO        NO       YES            0       .        .
  6  ANSI              NO        NO        NO       YES            0       .        .
  7  ANSI              NO        NO        NO       YES            0       .        .
  8  ANSI              NO        NO        NO       YES            0       .        .
  9  ANSI              NO        NO        NO       YES            0       .        .
 10  ANSI              NO        NO        NO       YES            0       .        .
 11  ANSI              NO        NO        NO       YES            0       .        .
Note: For information about how to get the CONTENTS output into an ODS data
set for processing, see Example 7 on page 389.
LIBNAME
libref used for the data library.
MEMNAME
SAS data set that contains the variable.
MG
the value of MESSAGE=, if it is used, in the IC CREATE statement.
MSGTYPE
the value will be blank unless an integrity constraint is violated and you specified
a message.
NAME
the name of the index or integrity constraint.
NOMISS
contains YES if the NOMISS option is defined for the index.
NUMVALS
the number of distinct values in the index (displayed for centiles).
NUMVARS
the number of variables involved in the index or integrity constraint.
ONDELETE
for a foreign key integrity constraint, contains RESTRICT or SET NULL if
applicable (the ON DELETE option in the IC CREATE statement).
ONUPDATE
for a foreign key integrity constraint, contains RESTRICT or SET NULL if
applicable (the ON UPDATE option in the IC CREATE statement).
RECREATE
the SAS statement necessary to recreate the index or integrity constraint.
REFERENCE
for a foreign key integrity constraint, contains the name of the referenced data set.
TYPE
the type. For an index, the value is Index while for an integrity constraint, the
value is the type of integrity constraint (Not Null, Check, Primary Key, etc.).
UNIQUE
contains YES if the UNIQUE option is defined for the index.
UPERC
the percentage of the index that has been updated since the last refresh (displayed
for centiles).
UPERCMX
the percentage of the index update that triggers a refresh (displayed for centiles).
WHERE
for a check integrity constraint, contains the WHERE statement.
This example
- changes the names of SAS files
- copies SAS files between SAS data libraries
- deletes SAS files
- selects SAS files to copy
- exchanges the names of SAS files
- excludes SAS files from a copy operation.
Program
Write the programming statements to the SAS log. The SOURCE system option
accomplishes this.
options pagesize=60 linesize=80 nodate pageno=1 source;
Specify the procedure input library, and add more details to the directory. DETAILS
prints these additional columns in the directory: Obs, Entries or Indexes, Vars, and
Label. All member types are available for processing because the MEMTYPE= option does not
appear in the PROC DATASETS statement.
proc datasets library=health details;
Delete two files in the library, and modify the names of a SAS data set and a catalog.
The DELETE statement deletes the TENSION data set and the A2 catalog. MT=CATALOG
applies only to A2 and is necessary because the default member type for the DELETE statement
is DATA. The CHANGE statement changes the name of the A1 catalog to POSTDRUG. The
EXCHANGE statement exchanges the names of the WEIGHT and BODYFAT data sets.
MEMTYPE= is not necessary in the CHANGE or EXCHANGE statement because the default is
MEMTYPE=ALL for each statement.
delete tension a2(mt=catalog);
change a1=postdrug;
exchange weight=bodyfat;
Restrict processing to one member type and delete and move data views.
MEMTYPE=VIEW restricts processing to SAS data views. MOVE specifies that all SAS data
views named in the SELECT statements in this step be deleted from the HEALTH data library
and moved to the DEST1 data library.
copy out=dest1 move memtype=view;
Move the SAS data view SPDATA from the HEALTH data library to the DEST1 data
library.
select spdata;
Move the catalogs to another data library. The SELECT statement specifies that the
catalogs ETEST1 through ETEST5 be moved from the HEALTH data library to the DEST1 data
library. MEMTYPE=CATALOG overrides the MEMTYPE=VIEW option in the COPY statement.
select etest1-etest5 / memtype=catalog;
Exclude all files that meet specified criteria from processing. The EXCLUDE statement
excludes from the COPY operation all SAS files that begin with the letter D and the other SAS
files listed. All remaining SAS files in the HEALTH data library are copied to the DEST2 data
library.
copy out=dest2;
exclude d: mlscl oxygen test2 vision weight;
quit;
SAS Log
1    options pagesize=60 linesize=80 nodate pageno=1 source;
2    libname dest1 'c:\Myfiles\dest1';
NOTE: Libref DEST1 was successfully assigned as follows:
      Engine:        V9
      Physical Name: c:\Myfiles\dest1
3    libname dest2 'c:\Myfiles\dest2';
NOTE: Libref DEST2 was successfully assigned as follows:
      Engine:        V9
      Physical Name: c:\Myfiles\dest2
4    libname health 'c:\Myfiles\health';
NOTE: Libref HEALTH was successfully assigned as follows:
      Engine:        V9
      Physical Name: c:\Myfiles\health
5    proc datasets library=health details;
Directory
Libref
Engine
Physical Name
File Name
HEALTH
V9
c:\Myfiles\health
c:\Myfiles\health
Name
Member
Type
Obs, Entries
or Indexes
1
2
3
4
5
6
7
A1
A2
ALL
BODYFAT
CONFOUND
CORONARY
DRUG1
CATALOG
CATALOG
DATA
DATA
DATA
DATA
DATA
23
1
23
1
8
39
6
17
2
4
4
2
DRUG2
DATA
13
DRUG3
DATA
11
10
DRUG4
DATA
11
DRUG5
DATA
12
13
14
15
16
17
18
19
20
21
ETEST1
ETEST2
ETEST3
ETEST4
ETEST5
ETESTS
FORMATS
GROUP
INFANT
MLSCL
CATALOG
CATALOG
CATALOG
CATALOG
CATALOG
CATALOG
CATALOG
DATA
DATA
DATA
1
1
1
1
1
1
6
148
149
32
11
6
4
22
23
24
25
NAMES
OXYGEN
PERSONL
PHARM
DATA
DATA
DATA
DATA
7
31
148
6
4
7
11
3
26
27
28
29
30
31
32
33
34
35
36
POINTS
PRENAT
RESULTS
SLEEP
SPDATA
SYNDROME
TENSION
TEST2
TRAIN
VISION
WEIGHT
DATA
DATA
DATA
DATA
VIEW
DATA
DATA
DATA
DATA
DATA
DATA
6
149
10
108
.
46
4
15
7
16
83
6
6
5
6
2
8
3
5
2
3
13
Vars
Label
JAN95
Data
MAY95
Data
JUL95
Data
JAN92
Data
JUL92
Data
Multiple
Sclerosi
s Data
Sugar
Study
Californ
ia
Results
File
Size
Last Modified
62464
17408
13312
5120
5120
5120
5120
19FEB2002:14:41:15
19FEB2002:14:41:15
19FEB2002:14:41:19
19FEB2002:14:41:19
19FEB2002:14:41:19
19FEB2002:14:41:20
19FEB2002:14:41:20
5120
19FEB2002:14:41:20
5120
19FEB2002:14:41:20
5120
19FEB2002:14:41:20
5120
19FEB2002:14:41:20
17408
17408
17408
17408
17408
17408
17408
25600
17408
5120
19FEB2002:14:41:20
19FEB2002:14:41:20
19FEB2002:14:41:20
19FEB2002:14:41:20
19FEB2002:14:41:20
19FEB2002:14:41:21
19FEB2002:14:41:21
19FEB2002:14:41:21
05FEB2002:12:52:30
19FEB2002:14:41:21
5120
9216
25600
5120
19FEB2002:14:41:21
19FEB2002:14:41:21
19FEB2002:14:41:21
19FEB2002:14:41:21
5120
17408
5120
9216
5120
9216
5120
5120
5120
5120
13312
19FEB2002:14:41:21
19FEB2002:14:41:22
19FEB2002:14:41:22
19FEB2002:14:41:22
19FEB2002:14:41:29
19FEB2002:14:41:22
19FEB2002:14:41:22
19FEB2002:14:41:22
19FEB2002:14:41:22
19FEB2002:14:41:22
19FEB2002:14:41:22
37
WGHT
DATA
83
13
Californ
ia
Results
13312
19FEB2002:14:41:23
6    delete tension a2(mt=catalog);
7    change a1=postdrug;
8    exchange weight=bodyfat;
NOTE: Deleting HEALTH.TENSION (memtype=DATA).
NOTE: Deleting HEALTH.A2 (memtype=CATALOG).
NOTE: Changing the name HEALTH.A1 to HEALTH.POSTDRUG (memtype=CATALOG).
NOTE: Exchanging the names HEALTH.WEIGHT and HEALTH.BODYFAT (memtype=DATA).
9    copy out=dest1 move memtype=view;
10   select spdata;
11   select etest1-etest5 / memtype=catalog;
NOTE: Moving HEALTH.SPDATA to DEST1.SPDATA (memtype=VIEW).
NOTE: Moving HEALTH.ETEST1 to DEST1.ETEST1 (memtype=CATALOG).
NOTE: Moving HEALTH.ETEST2 to DEST1.ETEST2 (memtype=CATALOG).
NOTE: Moving HEALTH.ETEST3 to DEST1.ETEST3 (memtype=CATALOG).
NOTE: Moving HEALTH.ETEST4 to DEST1.ETEST4 (memtype=CATALOG).
NOTE: Moving HEALTH.ETEST5 to DEST1.ETEST5 (memtype=CATALOG).
12   copy out=dest2;
13   exclude d: mlscl oxygen test2 vision weight;
14   quit;
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
NOTE:
This example uses the SAVE statement to save some SAS files from deletion and to
delete other SAS files.
Program
Write the programming statements to the SAS log. SAS option SOURCE writes all
programming statements to the log.
options pagesize=40 linesize=80 nodate pageno=1 source;
Save the data sets CHRONIC, AGING, and CLINICS, and delete all other SAS files (of
all types) in the ELDER library. MEMTYPE=DATA is necessary because the ELDER library
has a catalog named CLINICS and a data set named CLINICS.
save chronic aging clinics / memtype=data;
run;
SAS Log
41   options pagesize=40 linesize=80 nodate pageno=1 source;
42   libname elder 'c:\Myfiles\elder';
NOTE: Libref ELDER was successfully assigned as follows:
      Engine:        V9
      Physical Name: c:\Myfiles\elder
43   proc datasets lib=elder;
                         Directory
          Libref         ELDER
          Engine         V9
          Physical Name  c:\Myfiles\elder
          File Name      c:\Myfiles\elder

               Member   File
  #  Name      Type     Size   Last Modified
  1  AGING     DATA      5120  06FEB2003:08:51:21
  2  ALCOHOL   DATA      5120  06FEB2003:08:51:21
  3  BACKPAIN  DATA      5120  06FEB2003:08:51:21
  4  CHRONIC   DATA      5120  06FEB2003:08:51:21
  5  CLINICS   CATALOG  17408  06FEB2003:08:51:21
  6  CLINICS   DATA      5120  06FEB2003:08:51:21
  7  DISEASE   DATA      5120  06FEB2003:08:51:21
  8  GROWTH    DATA      5120  06FEB2003:08:51:21
  9  HOSPITAL  CATALOG  17408  06FEB2003:08:51:21
44   save chronic aging clinics / memtype=data;
45   run;
NOTE: Saving ELDER.CHRONIC (memtype=DATA).
NOTE: Saving ELDER.AGING (memtype=DATA).
NOTE: Saving ELDER.CLINICS (memtype=DATA).
NOTE: Deleting ELDER.ALCOHOL (memtype=DATA).
NOTE: Deleting ELDER.BACKPAIN (memtype=DATA).
NOTE: Deleting ELDER.CLINICS (memtype=CATALOG).
NOTE: Deleting ELDER.DISEASE (memtype=DATA).
NOTE: Deleting ELDER.GROWTH (memtype=DATA).
NOTE: Deleting ELDER.HOSPITAL (memtype=CATALOG).
This example modifies two SAS data sets using the MODIFY statement and
statements subordinate to it. Example 4 on page 384 shows the modifications to the
GROUP data set.
Tasks include
Program
Write the programming statements to the SAS log. SAS option SOURCE writes the
programming statements to the log.
options pagesize=40 linesize=80 nodate pageno=1 source;
Specify HEALTH as the procedure input library to process. NOLIST suppresses the
directory listing for the HEALTH data library.
proc datasets library=health nolist;
Add a label to a data set, assign a READ password, and specify how to sort the data.
LABEL= adds a data set label to the data set GROUP. READ= assigns GREEN as the read
password. The password appears as Xs in the SAS log. SAS issues a warning message if you
specify a level of password protection on a SAS file that does not include alter protection.
SORTEDBY= specifies how the data is sorted.
modify group (label='Test Subjects' read=green sortedby=lname);
Create the composite index VITAL on the variables BIRTH and SALARY for the
GROUP data set. NOMISS excludes all observations that have missing values for BIRTH and
SALARY from the index. UNIQUE specifies that the index is created only if each observation
has a unique combination of values for BIRTH and SALARY.
index create vital=(birth salary) / nomiss unique;
Assign an informat and a format to the variable BIRTH, and assign a label to the
variable SALARY.
informat birth date7.;
format birth date7.;
label salary='current salary excluding bonus';
Rename a variable, and assign a label. Modify the data set OXYGEN by renaming the
variable OXYGEN to INTAKE and assigning a label to the variable INTAKE.
modify oxygen;
rename oxygen=intake;
label intake='Intake Measurement';
quit;
SAS Log
6    options pagesize=40 linesize=80 nodate pageno=1 source;
7    libname health 'c:\Myfiles\health';
NOTE: Libref HEALTH was successfully assigned as follows:
      Engine:        V9
      Physical Name: c:\Myfiles\health
8    proc datasets library=health nolist;
9    modify group (label='Test Subjects' read=XXXXX sortedby=lname);
WARNING: The file HEALTH.GROUP.DATA is not ALTER protected. It could be
         deleted or replaced without knowing the password.
10   index create vital=(birth salary) / nomiss unique;
NOTE: Composite index vital has been defined.
11   informat birth date7.;
12   format birth date7.;
13   label salary='current salary excluding bonus';
14   modify oxygen;
15   rename oxygen=intake;
NOTE: Renaming variable oxygen to intake.
16   label intake='Intake Measurement';
17   quit;
NOTE: MODIFY was successful for HEALTH.OXYGEN.DATA.
NOTE: PROCEDURE DATASETS used (Total process time):
      real time           16.96 seconds
      cpu time            0.73 seconds
This example shows the output from the CONTENTS statement for the GROUP data
set. The output shows the modifications made to the GROUP data set in Example 3 on
page 381.
Program
options pagesize=40 linesize=132 nodate pageno=1;
Specify HEALTH as the procedure input library, and suppress the directory listing.
proc datasets library=health nolist;
Create the output data set GRPOUT from the data set GROUP. Specify GROUP as the
data set to describe, give read access to the GROUP data set, and create the output data set
GRPOUT, which appears in The OUT= Data Set on page 370.
contents data=group (read=green) out=grpout;
title 'The Contents of the GROUP Data Set';
run;
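The OUT= data set can be processed like any other SAS data set. The following is a minimal sketch, assuming that the step above has run and created WORK.GRPOUT. It prints a few of the variables that CONTENTS writes to an OUT= data set (NAME, TYPE, LENGTH, and LABEL). The title text is illustrative only.

```sas
/* List the variable-level information captured in the OUT= data set. */
/* WORK.GRPOUT is assumed to exist from the CONTENTS statement above. */
proc print data=grpout noobs;
   var name type length label;
   title 'Variables Described in WORK.GRPOUT';
run;
```

In an OUT= data set, TYPE is numeric (1 for numeric variables, 2 for character variables), so the printed values differ from the Num and Char text shown in the procedure output.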
Output
Output 15.9
Data Set Name        HEALTH.GROUP                 Observations          148
Member Type          DATA                         Variables             11
Engine               V9                           Indexes               1
Created                                           Observation Length    96
Last Modified                                     Deleted Observations  0
Protection           READ                         Compressed            NO
Data Set Type                                     Sorted                YES
Label                Test Subjects
Data Representation  WINDOWS_32
Encoding             wlatin1 Western (Windows)

Data Set Page Size          8192
Number of Data Set Pages    4
First Data Page             1
Max Obs per Page            84
Obs in First Data Page      62
Index File Page Size        4096
Number of Index File Pages  2
Number of Data Set Repairs  0
File Name                   c:\Myfiles\health\group.sas7bdat
Release Created             9.0101B0
Host Created                XP_PRO

 #  Variable  Type  Len  Format   Informat  Label
 9  BIRTH     Num     8  DATE7.   DATE7.
 4  CITY      Char   15  $.       $.
 3  FNAME     Char   15  $.       $.
10  HIRED     Num     8  DATE7.   DATE7.
11  HPHONE    Char   12  $.       $.
 1  IDNUM     Char       $.       $.
 7  JOBCODE   Char    3  $.       $.
 2  LNAME     Char   15  $.       $.
 8  SALARY    Num     8  COMMA8.
 6  SEX       Char    1  $.       $.
 5  STATE     Char    2  $.       $.

                             # of
         Unique   NoMiss   Unique
Index    Option   Option   Values   Variables
vital    YES      YES         148   BIRTH SALARY

Sort Information
Sortedby       LNAME
Validated      NO
Character Set  ANSI
This example appends one data set to the end of another data set.
ID   TREAT   INITWT   WT3MOS   AGE
 1   Other   166.28   146.98    35
 2   Other   214.42   210.22    54
 3   Other   172.46   159.42    33
 5   Other   175.41   160.66    37
 6   Other   173.13   169.40    20
 7   Other   181.25   170.94    30
10   Other   239.83   214.48    48
11   Other   175.32   162.66    51
12   Other   227.01   211.06    29
13   Other   274.82   251.82    31
The data set EXP.SUR contains the variable WT6MOS, but the EXP.RESULTS data set does not.
id   treat     initwt   wt3mos   wt6mos   age
14   surgery   203.60   169.78   143.88    38
17   surgery   171.52   150.33   123.18    42
18   surgery   207.46   155.22        .    41
Program
options pagesize=40 linesize=64 nodate pageno=1;
Suppress the printing of the EXP library. LIBRARY= specifies EXP as the procedure input
library. NOLIST suppresses the directory listing for the EXP library.
proc datasets library=exp nolist;
Append the data set EXP.SUR to the EXP.RESULTS data set. The APPEND statement
appends the data set EXP.SUR to the data set EXP.RESULTS. FORCE causes the APPEND
statement to carry out the append operation even though EXP.SUR has a variable that
EXP.RESULTS does not. APPEND does not add the WT6MOS variable to EXP.RESULTS.
append base=exp.results data=exp.sur force;
run;
Output
Output 15.10
The EXP.RESULTS Data Set
ID   TREAT     INITWT   WT3MOS   AGE
 1   Other     166.28   146.98    35
 2   Other     214.42   210.22    54
 3   Other     172.46   159.42    33
 5   Other     175.41   160.66    37
 6   Other     173.13   169.40    20
 7   Other     181.25   170.94    30
10   Other     239.83   214.48    48
11   Other     175.32   162.66    51
12   Other     227.01   211.06    29
13   Other     274.82   251.82    31
14   surgery   203.60   169.78    38
17   surgery   171.52   150.33    42
18   surgery   207.46   155.22    41
AGE statement
This example shows how the AGE statement ages SAS files.
Program
Write the programming statements to the SAS log. SAS option SOURCE writes the
programming statements to the log.
options pagesize=40 linesize=80 nodate pageno=1 source;
Specify DAILY as the procedure input library and suppress the directory listing.
proc datasets library=daily nolist;
Delete the last SAS file in the list, DAY7, and then age (or rename) DAY6 to DAY7,
DAY5 to DAY6, and so on, until it ages TODAY to DAY1.
age today day1-day7;
run;
SAS Log
CONTENTS Statement
The example shows how to get PROC CONTENTS output into an ODS output data
set for processing.
Program
title1 "PROC CONTENTS ODS Output";
options nodate nonumber nocenter formdlim='-';
data a;
x=1;
run;
Use the ODS OUTPUT statement to specify data sets to which CONTENTS data will be
directed.
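As a minimal sketch of such a step, assuming the WORK.A data set created above, the following statements route three PROC CONTENTS output objects (Attributes, EngineHost, and Variables) to data sets. The data set names ATR, ENG, and VAR are illustrative assumptions, not names taken from this example.

```sas
/* Route each CONTENTS output object to its own data set.      */
/* ATR, ENG, and VAR are hypothetical output data set names.   */
ods output attributes=atr
           enginehost=eng
           variables=var;

proc contents data=a;
run;

/* Print one of the captured tables to see what was written.   */
proc print data=var noobs;
run;
```

Each output data set can then be subset, merged, or reshaped with ordinary DATA step code.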
Output 15.11
Member   Label1                cValue1                                 nValue1
WORK.A   Data Set Name         WORK.A                                        .
WORK.A   Member Type           DATA                                          .
WORK.A   Engine                V9                                            .
WORK.A   Created               Thursday, October 10, 2002 00:56:03  1349873763
WORK.A   Last Modified         Thursday, October 10, 2002 00:56:03  1349873763
WORK.A   Protection                                                          .
WORK.A   Data Set Type                                                       .
WORK.A   Label                                                               .
WORK.A   Data Representation   WINDOWS_32                                    .
WORK.A   Encoding              wlatin1 Western (Windows)                     .

Member   Label2                 cValue2    nValue2
WORK.A   Observations           1         1.000000
WORK.A   Variables              1         1.000000
WORK.A   Indexes                0                0
WORK.A   Observation Length     8         8.000000
WORK.A   Deleted Observations   0                0
WORK.A   Compressed             NO               .
WORK.A   Sorted                 NO               .
WORK.A                                           0
WORK.A                                           0
WORK.A                                           0

Member   Num   Variable   Type   Len   Pos
WORK.A     1   x          Num      8     0

Member   Label1                       cValue1    nValue1
WORK.A   Data Set Page Size           4096       4096.000000
WORK.A   Number of Data Set Pages     1          1.000000
WORK.A   First Data Page              1          1.000000
WORK.A   Max Obs per Page             501        501.000000
WORK.A   Obs in First Data Page       1          1.000000
WORK.A   Number of Data Set Repairs   0          0
WORK.A   File Name                    C:\DOCUME~1\userid\LOCALS~1\Temp\SAS Temporary Files\_TD3084\a.sas7bdat   .
WORK.A   Release Created              9.0101B0   .
WORK.A   Host Created                 XP_PRO     .

Member   attribute             value
WORK.A   Data Representation   WINDOWS_32
WORK.A   Encoding              wlatin1 Western (Windows)
WORK.A   Observations          1
WORK.A   Variables             1
For more information on the SAS Output Delivery System, see SAS Output Delivery
System: User's Guide.
CHAPTER 16
The DBCSTAB Procedure
Information about the DBCSTAB Procedure
See:
CHAPTER 17
The DISPLAY Procedure
Overview: DISPLAY Procedure
Syntax: DISPLAY Procedure
PROC DISPLAY Statement
Example: DISPLAY Procedure
Example 1: Executing a SAS/AF Application
Required Argument
CATALOG=libref.catalog.entry.type
Options
BATCH
runs PROGRAM and SCL entries in batch mode. If a PROGRAM entry contains a
display, then it will not run, and you will receive the following error message:
ERROR: Cannot allocate window.
or an SCL entry.
Suppose that your company has developed a SAS/AF application that compiles
statistics from an invoice database. Further, suppose that this application is stored in
the SASUSER data library, as a FRAME entry in a catalog named
INVOICES.WIDGETS. You can execute this application using the following SAS code:
Program
proc display catalog=sasuser.invoices.widgets.frame;
run;
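The BATCH option described earlier applies to PROGRAM and SCL entries rather than to FRAME entries such as the one above. The following is a minimal sketch only: REPORTS is a hypothetical SCL entry that is assumed to exist in the same catalog and to display no windows.

```sas
/* REPORTS is a hypothetical SCL entry used only for illustration. */
/* BATCH runs the entry in batch mode, so it must not open a       */
/* display window.                                                 */
proc display catalog=sasuser.invoices.reports.scl batch;
run;
```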
CHAPTER 18
The DOCUMENT Procedure
Information about the DOCUMENT Procedure
See:
CHAPTER 19
The EXPLODE Procedure
Information about the EXPLODE Procedure
list.
CHAPTER 20
The EXPORT Procedure
Overview: EXPORT Procedure
Syntax: EXPORT Procedure
PROC EXPORT Statement
Data Source Statements
Examples: EXPORT Procedure
Example 1: Exporting a Delimited External File
Example 2: Exporting a Subset of Observations to an Excel Spreadsheet
Example 3: Exporting to a Specific Spreadsheet in an Excel Workbook
Example 4: Exporting a Microsoft Access Table
Example 5: Exporting a Specific Spreadsheet in an Excel Workbook on a PC Server
Export Data
• OpenVMS Alpha
• UNIX
• Microsoft Windows.
All examples
Required Arguments
DATA=<libref.>SAS-data-set
identifies the input SAS data set with either a one- or two-level SAS name (library
and member name). If you specify a one-level name, by default, PROC EXPORT uses
either the USER library (if assigned) or the WORK library (if USER is not assigned).
Default: If you do not specify a SAS data set, PROC EXPORT uses the most
recently created SAS data set, which SAS keeps track of with the system variable
_LAST_. However, in order to be certain that PROC EXPORT uses the correct
data set, you should identify the SAS data set.
Restriction: PROC EXPORT can export data only if the format of the data is
supported by the data source or the amount of data is within the limitations of the
data source. For example, some data sources have a maximum number of rows or
columns, and some data sources cannot support SAS user-defined formats and
informats. If the data that you want to export exceeds the limits of the data
source, PROC EXPORT may not be able to export it correctly. When incompatible
formats are encountered, the procedure formats the data to the best of its ability.
Restriction: PROC EXPORT does not support writing labels as column names.
However, SAS does support column names up to 32 characters.
Featured in: All examples
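The following is a minimal sketch of why naming the input data set matters. WORK.CLASS_COPY and the output path are illustrative assumptions. Because DATA= is specified, the step exports the intended data set even if another data set is created in between and becomes the value of _LAST_.

```sas
/* Create a data set to export (hypothetical name). */
data work.class_copy;
   set sashelp.class;
run;

/* Name the input data set explicitly rather than relying on _LAST_. */
proc export data=work.class_copy
            outfile='c:\myfiles\class_copy.csv'
            dbms=csv
            replace;
run;
```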
(SAS-data-set-options)
specifies SAS data set options. For example, if the data set that you are exporting
has an assigned password, you can use the ALTER=, PW=, READ=, or WRITE= data
set option, or to export only data that meets a specified condition, you can use the
WHERE= data set option. For information about SAS data set options, see Data Set
Options in SAS Language Reference: Dictionary.
Restriction: You cannot specify data set options when exporting delimited,
comma-separated, or tab-delimited external files.
Featured in: Example 2 on page 414
OUTFILE="filename"
specifies the complete path and filename or a fileref for the output PC file,
spreadsheet, or delimited external file. If you specify a fileref or if the complete path
and filename does not include special characters (such as the backslash in a path),
lowercase characters, or spaces, you can omit the quotation marks. A fileref is a SAS
name that is associated with the physical location of the output file. To assign a
fileref, use the FILENAME statement. For more information about PC file formats,
see SAS/ACCESS for PC Files: Reference.
Featured in: Example 1 on page 411, Example 2 on page 414, and Example 3 on
page 415
Restriction: PROC EXPORT does not support device types or access methods for
the FILENAME statement except for DISK. For example, PROC EXPORT does
not support the TEMP device type, which creates a temporary external file.
Restriction: For client/server applications: When running SAS/ACCESS software
on UNIX to access data that is stored on a PC server, you must specify the full
path and filename of the file that you want to export. The use of a fileref is not
supported.
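The fileref form of OUTFILE= can be sketched as follows. The fileref CLASSOUT and the path are illustrative assumptions. The FILENAME statement uses the default DISK device type, which is the only device type that PROC EXPORT supports.

```sas
/* Associate a fileref with the physical output file (DISK device). */
filename classout 'c:\myfiles\class.txt';

/* A fileref is a SAS name, so no quotation marks are needed here.  */
proc export data=sashelp.class
            outfile=classout
            dbms=tab
            replace;
run;

/* Release the fileref when it is no longer needed. */
filename classout clear;
```

DBMS=TAB is specified explicitly because a fileref carries no file extension from which PROC EXPORT could infer the type of data.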
OUTTABLE="tablename"
specifies the table name of the output DBMS table. If the name does not include
special characters (such as question marks), lowercase characters, or spaces, you can
omit the quotation marks. Note that the DBMS table name may be case sensitive.
Requirement: When you export a DBMS table, you must specify the DBMS= option.
Featured in: Example 4 on page 415
Options
DBMS=identifier
specifies the type of data to export. To export a DBMS table, you must specify
DBMS= by using a valid database identifier. For example, DBMS=ACCESS specifies
to export a table into a Microsoft Access 2000 or 2002 database. To export PC files,
spreadsheets, and delimited external files, you do not have to specify DBMS= if the
filename that is specified in OUTFILE= contains a valid extension so that PROC
EXPORT can recognize the type of data. For example, PROC EXPORT recognizes the
filename ACCOUNTS.WK1 as a Lotus 1-2-3 Release 2 spreadsheet and the filename
MYDATA.CSV as an external file that contains comma-separated data values;
therefore, a DBMS= specification is not necessary.
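As a brief sketch of relying on the file extension, the following step omits DBMS= and lets PROC EXPORT recognize the .csv extension. The path is an illustrative assumption.

```sas
/* No DBMS= option: the .csv extension identifies the type of data. */
proc export data=sashelp.class
            outfile='c:\myfiles\mydata.csv'
            replace;
run;
```

Specifying DBMS= anyway is a reasonable habit when the extension is missing or ambiguous, as in Example 1, where the filename has no extension.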
The following values are valid for the DBMS= option:
Identifier   Output Data Source     Extension   Host Availability                       Version of File Created
ACCESS                              .mdb        Microsoft Windows *                     2000
ACCESS97                            .mdb        Microsoft Windows *                     97
ACCESS2000                          .mdb        Microsoft Windows *                     2000
ACCESS2002                          .mdb        Microsoft Windows *                     2000
ACCESSCS                            .mdb        UNIX                                    2000**
CSV                                 .csv        OpenVMS Alpha, UNIX, Microsoft Windows
DBF                                 .dbf        UNIX, Microsoft Windows
DLM                                 .*          OpenVMS Alpha, UNIX, Microsoft Windows
EXCEL                               .xls        Microsoft Windows *                     97
EXCEL4                              .xls        Microsoft Windows                       4.0
EXCEL5                              .xls        Microsoft Windows                       5.0
EXCEL97      Excel 97 spreadsheet   .xls        Microsoft Windows *                     97
EXCEL2000                           .xls        Microsoft Windows *                     97
EXCEL2002                           .xls        Microsoft Windows *                     97
EXCELCS      Excel spreadsheet      .xls        UNIX                                    97**
JMP          JMP table              .jmp        UNIX, Microsoft Windows
PCFS         Files on PC server     .*          UNIX
TAB                                 .txt        OpenVMS Alpha, UNIX, Microsoft Windows
WK1                                 .wk1        Microsoft Windows                       5.0
WK3                                 .wk3        Microsoft Windows
WK4                                 .wk4        Microsoft Windows
* Not available for Microsoft Windows 64-Bit Edition.
** Value listed here is the default value. The real version of file loaded depends on
the version of the existing file or the value specified for the VERSION= statement.
Restriction: The availability of an output data source depends on
• the operating environment, and in some cases the platform, as specified in
the previous table.
• whether your site has a license to the SAS/ACCESS software for PC file
formats. If you do not have a license, only delimited files are available.
Featured in: Example 1 on page 411 and Example 4 on page 415
When you specify a value for DBMS=, consider the following for specific data
sources:
• To export to an existing Microsoft Access database, PROC EXPORT can write to
Access 97, Access 2000, or Access 2002 regardless of your specification. For
example, if you specify DBMS=ACCESS2000 and the database is in Access 97
format, PROC EXPORT exports the table, and the database remains in Access
97 format. However, if you specify OUTFILE= for an Access database that does
not exist, a new database is created using the format specified in DBMS=. For
example, to create a new Access database, specifying DBMS=ACCESS (which
defaults to Access 2000 or 2002 format) creates an MDB file that can be read by
Access 2000 or Access 2002, not by Access 97.
The following table lists the DBMS= specifications and indicates which
version of Microsoft Access can open the resulting database:
Specification   Access 2002   Access 2000   Access 97
ACCESS          yes           yes           no
ACCESS2002      yes           yes           no
ACCESS2000      yes           yes           no
ACCESS97        yes           yes           yes

Specification   Excel 2002   Excel 2000   Excel 97   Excel 5.0   Excel 4.0
EXCEL           yes          yes          yes        no          no
EXCEL2002       yes          yes          yes        no          no
EXCEL2000       yes          yes          yes        no          no
EXCEL97         yes          yes          yes        no          no
EXCEL5          yes          yes          yes        yes         no
EXCEL4          yes          yes          yes        yes         yes
Note: Later versions of Excel can open and update files in earlier formats.
• When exporting a SAS data set to a dBASE file (DBF), if the data set contains
missing values (for either character or numeric values), the missing values are
translated to blanks.
• When exporting a SAS data set to a dBASE file (DBF), values for a character
variable that are longer than 255 characters are truncated in the resulting
dBASE file because of dBASE limitations.
REPLACE
overwrites an existing file. Note that for a Microsoft Access database or an Excel
workbook, REPLACE overwrites the target table or spreadsheet. If you do not
specify REPLACE, PROC EXPORT does not overwrite an existing file.
Featured in:
PROC EXPORT provides a variety of statements that are specific to the output data
source.
SHEET=spreadsheet-name;
identifies a particular spreadsheet name to load into a workbook. You use this
statement for Microsoft Excel 97, 2000, or 2002 only. If the SHEET= statement is
not specified, PROC EXPORT uses the SAS data set name as the spreadsheet
name to load the data.
For Excel data access, a spreadsheet name is treated as a special case of a range
name with a dollar sign ($) appended. For example, if you export a table and
specify sheet=Invoice, you will see a range (table) name INVOICE and another
range (table) name INVOICE$ created. Excel appends a dollar sign ($) to a
spreadsheet name in order to distinguish it from the corresponding range name.
Note: You should not append the dollar sign ($) when you specify the
spreadsheet name. For example, SHEET=Invoice$ is not allowed.
You should avoid using special characters for spreadsheet names when
exporting a table to an Excel file. Special characters such as a space or a hyphen
are replaced with an underscore. For example, if you export a table and specify
sheet='Sheet Number 1', PROC EXPORT creates the range names
Sheet_Number_1 and Sheet_Number_1$.
Featured in: Example 3 on page 415
DBPWD="database-password";
specifies a password that allows access to a database. You can replace the equal
sign with a blank.
PWD="password";
specifies the user password used by the DBMS to validate a specific userid. If the
password does not contain lowercase characters, special characters, or national
characters, you can omit the quotation marks. You can replace the equal sign with
a blank.
Note: The DBMS client software may default to the userid and password that
were used to log in to the operating environment; SAS does not generate a default
value.
UID="userid";
identifies the user to the DBMS. If the userid does not contain lowercase
characters, special characters, or national characters, you can omit the quotation
marks. You can replace the equal sign with a blank.
Note: The DBMS client software may default to the userid and password that
were used to log in to the operating environment; SAS does not generate a default
value.
WGDB="workgroup-database-name";
specifies the workgroup (security) database name that contains the USERID and
PWD data for the DBMS. If the workgroup database name does not contain
lowercase characters, special characters, or national characters, you can omit the
quotation marks. You can replace the equal sign with a blank.
Note: A default workgroup database may be used by the DBMS; SAS does not
generate a default value.
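The security statements above can be combined in one step, as in the following sketch. Every value shown (database path, workgroup file, userid, and password) is a hypothetical placeholder, and the step assumes a site that has SAS/ACCESS software for PC file formats on Microsoft Windows.

```sas
/* All paths, the userid, and the password are hypothetical values. */
proc export data=sasuser.cust
            outtable="customers"
            dbms=access
            replace;
   database="c:\myfiles\securedb.mdb";   /* Access database to write to  */
   wgdb="c:\myfiles\security.mdw";       /* workgroup (security) database */
   uid="exportuser";                     /* userid known to the DBMS      */
   pwd="exportpw";                       /* password for that userid      */
run;
```

For a database that is protected by a database password instead of user-level security, the DBPWD= statement would be used in place of UID=, PWD=, and WGDB=.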
This example exports the following SAS data set named SASHELP.CLASS and
creates a delimited external file:
Output 20.1
Obs   Name      Sex   Age   Height   Weight
  1   Alfred    M      14     69.0    112.5
  2   Alice     F      13     56.5     84.0
  3   Barbara   F      13     65.3     98.0
  4   Carol     F      14     62.8    102.5
  5   Henry     M      14     63.5    102.5
  6   James     M      12     57.3     83.0
  7   Jane      F      12     59.8     84.5
  8   Janet     F      15     62.5    112.5
  9   Jeffrey   M      13     62.5     84.0
 10   John      M      12     59.0     99.5
 11   Joyce     F      11     51.3     50.5
 12   Judy      F      14     64.3     90.0
 13   Louise    F      12     56.3     77.0
 14   Mary      F      15     66.5    112.0
 15   Philip    M      16     72.0    150.0
 16   Robert    M      12     64.8    128.0
 17   Ronald    M      15     67.0    133.0
 18   Thomas    M      11     57.5     85.0
 19   William   M      15     66.5    112.0
Program
Identify the input SAS data set, specify the output filename, and specify the type of
file. Note that the filename does not contain an extension. DBMS=DLM specifies that the
output file is a delimited external file.
proc export data=sashelp.class
outfile='c:\myfiles\class'
dbms=dlm;
Specify the delimiter. The DELIMITER= option specifies that an & (ampersand) will delimit
data fields in the output file. The delimiter separates the columns of data in the output file.
delimiter='&';
run;
SAS Log
The SAS log displays the following information about the successful export. Notice
the generated SAS DATA step.
47   /**********************************************************************
48   *   PRODUCT:   SAS
49   *   VERSION:   9.00
50   *   CREATOR:   External File Interface
51   *   DATE:      07FEB02
52   *   DESC:      Generated SAS Datastep Code
53   *   TEMPLATE SOURCE:  (None Specified.)
54   ***********************************************************************/
55   data _null_;
56   set  SASHELP.CLASS   end=EFIEOD;
57   %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
58   %let _EFIREC_ = 0; /* clear export record count macro variable */
59   file 'c:\myfiles\class' delimiter='&' DSD DROPOVER
59 ! lrecl=32767;
60   format Name $8. ;
61   format Sex $1. ;
62   format Age best12. ;
63   format Height best12. ;
64   format Weight best12. ;
65   if _n_ = 1 then   /* write column names */
66   do;
67   put
68   'Name'
69   '&'
70   'Sex'
71   '&'
72   'Age'
73   '&'
74   'Height'
75   '&'
76   'Weight'
77   ;
78   end;
79   do;
80   EFIOUT + 1;
81   put Name $ @;
82   put Sex $ @;
83   put Age @;
84   put Height @;
85   put Weight ;
86   ;
87   end;
88   if _ERROR_ then call symput('_EFIERR_',1); /* set ERROR detection
88 ! macro variable */
89   If EFIEOD then
90   call symput('_EFIREC_',EFIOUT);
91   run;
NOTE: Numeric values have been converted to character
      values at the places given by: (Line):(Column).
      88:44   90:31
NOTE: The file c:\myfiles\class is:
      File Name=c:\myfiles\class,
      RECFM=V,LRECL=32767
NOTE: 20 records were written to the file c:\myfiles\class.
      The minimum record length was 17.
      The maximum record length was 26.
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: DATA statement used (Total process time):
      real time           0.13 seconds
      cpu time            0.05 seconds
Output
The external file produced by PROC EXPORT follows.
Name&Sex&Age&Height&Weight
Alfred&M&14&69&112.5
Alice&F&13&56.5&84
Barbara&F&13&65.3&98
Carol&F&14&62.8&102.5
Henry&M&14&63.5&102.5
James&M&12&57.3&83
Jane&F&12&59.8&84.5
Janet&F&15&62.5&112.5
Jeffrey&M&13&62.5&84
John&M&12&59&99.5
Joyce&F&11&51.3&50.5
Judy&F&14&64.3&90
Louise&F&12&56.3&77
Mary&F&15&66.5&112
Philip&M&16&72&150
Robert&M&12&64.8&128
Ronald&M&15&67&133
Thomas&M&11&57.5&85
William&M&15&66.5&112
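One way to check an exported delimited file is to read it back with PROC IMPORT. The following is a minimal sketch, assuming the file c:\myfiles\class created by this example. WORK.CLASS_CHECK is a hypothetical data set name.

```sas
/* Read the ampersand-delimited file back into a SAS data set. */
proc import datafile='c:\myfiles\class'
            out=work.class_check
            dbms=dlm
            replace;
   delimiter='&';      /* same delimiter that was used for the export */
   getnames=yes;       /* first record holds the column names          */
run;

/* Print the first few observations for a visual comparison. */
proc print data=work.class_check (obs=5);
run;
```

Comparing the printed observations with Output 20.1 gives a quick confirmation that the delimiter and the column names round-trip as expected.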
This example exports the SAS data set SASHELP.CLASS, shown in Output 20.1.
PROC EXPORT creates an Excel file named Femalelist.xls, and by default, creates a
spreadsheet named Class. Since the SHEET= data source statement is not specified,
PROC EXPORT uses the name of the SAS data set as the spreadsheet name. The
WHERE= SAS data set option is specified in order to export a subset of the
observations, which results in the spreadsheet containing only the female students.
Program
Identify the input SAS data set, request a subset of the observations, specify the
output data source, specify the output file, and overwrite the target spreadsheet if it
exists. The output file is an Excel 2000 spreadsheet.
proc export data=sashelp.class (where=(sex='F'))
outfile='c:\myfiles\Femalelist.xls'
dbms=excel
replace;
run;
This example exports a SAS data set named MYFILES.GRADES1 and creates an
Excel 2000 workbook named Grades.xls. MYFILES.GRADES1 becomes one spreadsheet
in the workbook named Grades1.
Program
Identify the input SAS data set, specify the output data source, and specify the output
file.
proc export data=myfiles.grades1
dbms=excel2000
outfile='c:\Myfiles\Grades.xls';
run;
This example exports a SAS data set named SASUSER.CUST, the first five
observations of which follow, and creates a Microsoft Access 97 table. The security level
for this Access table is none, so it is not necessary to specify any of the database
security statements.
Obs   Name            Street                Zipcode
  1   David Taylor    124 Oxbow Street      72511
  2   Theo Barnes     2412 McAllen Avenue   72513
  3   Lydia Stirog    12550 Overton Place   72516
  4   Anton Niroles   486 Gypsum Street     72511
  5   Cheryl Gaspar   36 E. Broadway        72515
Program
Identify the input SAS data set, specify the output DBMS table name and the output
data source, and overwrite the output file if it exists. The output file is a Microsoft Access
97 table. The option REPLACE overwrites an existing file. If you do not specify REPLACE,
PROC EXPORT does not overwrite an existing file.
proc export data=sasuser.cust
outtable="customers"
dbms=access97
replace;
Specify the path and filename of the database to contain the table.
database="c:\myfiles\mydatabase.mdb";
run;
This example exports a SAS data set named SASHELP.CLASS and creates an Excel
2000 workbook named demo.xls. SASHELP.CLASS becomes one spreadsheet named
Class in the workbook named demo.xls.
Program
proc export data=sashelp.class
dbms=excelcs
outfile='c:\Myfiles\demo.xls';
sheet='Class';
server='sales';
port=4632;
version='2000';
run;
CHAPTER 21
The FONTREG Procedure
Overview: FONTREG Procedure
Syntax: FONTREG Procedure
PROC FONTREG Statement
FONTFILE Statement
FONTPATH Statement
TRUETYPE Statement
TYPE1 Statement (Experimental)
Concepts: FONTREG Procedure
Supported Font Types and Font Naming Conventions
Removing Fonts from the SAS Registry
Modifying SAS/GRAPH Device Drivers to Use System Fonts
Examples: FONTREG Procedure
Example 1: Adding a Single Font File
Example 2: Adding All Font Files from Multiple Directories
Example 3: Replacing Existing TrueType Font Files from a Directory
Operating Environment Information: For z/OS sites that do not use the hierarchical
file system (HFS), only the FONTFILE statement is supported. See FONTREG
Procedure in SAS Companion for z/OS for details.
Options
MODE=ADD | REPLACE | ALL
specifies how to handle new and existing fonts in the SAS registry:
ADD
add fonts that do not already exist in the SAS registry. Do not modify existing
fonts.
REPLACE
replace fonts that already exist in the SAS registry. Do not add new fonts.
ALL
add new fonts that do not already exist in the SAS registry and replace fonts that
already exist in the SAS registry.
Default: ADD
Featured in:
Default: TERSE
Featured in:
NOUPDATE
specifies that the procedure should run without actually updating the SAS registry.
This option enables you to test the procedure on the specified fonts before modifying
the SAS registry.
USESASHELP
specifies that the SAS registry in the SASHELP library should be updated. You must
have write access to the SASHELP library in order to use this option. If the
USESASHELP option is not specified, then the SAS registry in the SASUSER library
is updated.
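A trial run with NOUPDATE can be sketched as follows. The directory name is a placeholder in the same style as the examples in this chapter, and MSGLEVEL=VERBOSE is added so that the log reports what would have been registered.

```sas
/* Dry run: examine the fonts without changing the SAS registry.    */
/* 'your-font-directory' is a placeholder for a real directory path. */
proc fontreg noupdate msglevel=verbose;
   fontpath 'your-font-directory';
run;
```

If the log looks correct, the same step can be rerun without NOUPDATE to update the registry in the SASUSER library, or with USESASHELP to update the registry in the SASHELP library.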
FONTFILE Statement
Specifies one or more font files to be processed.
Featured in: Example 1 on page 425
FONTFILE 'file' <'file'>;
Argument
file
is the complete pathname to a font file. If the file is recognized as a valid font file,
then the file is processed. Each pathname must be enclosed in quotation marks. If
you specify more than one pathname, then you must separate the pathnames with a
space.
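A FONTFILE statement that names more than one file might look like the following sketch. Both pathnames are placeholders, not real files.

```sas
/* Register two font files in one step. Each pathname is quoted, */
/* and the pathnames are separated by a space.                   */
proc fontreg;
   fontfile 'your-font-file-1' 'your-font-file-2';
run;
```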
FONTPATH Statement
Specifies one or more directories to be searched for valid font files to process.
Featured in: Example 2 on page 426
Argument
directory
specifies a directory to search. All files that are recognized as valid font files are
processed. Each directory must be enclosed in quotation marks. If you specify more
than one directory, then you must separate the directories with a space.
TRUETYPE Statement
Specifies one or more directories to be searched for TrueType font files.
Featured in:
Argument
directory
specifies a directory to search. Only files that are recognized as valid TrueType font
files are processed. Each directory must be enclosed in quotation marks. If you
specify more than one directory, then you must separate the directories with a space.
CAUTION:
TYPE1 is an experimental statement that is available in SAS 9.1. Do not use this
statement in production jobs.
Argument
directory
specifies a directory to search. Only files that are recognized as valid Type 1 font files
are processed. Each directory must be enclosed in quotation marks. If you specify
more than one directory, then you must separate the directories with a space.
If you do not include a tag in your font specification, then SAS searches the registry
for fonts with that name. If more than one font with that name is encountered, then
SAS uses the one that has the highest rank in the following table.
Table 21.1
Rank   Type       Tag     File extension(s)
1      TrueType   <ttf>   .ttf
2      Type1      <at1>   .pfa .pfb
3      PFR        <pfr>   .pfr
CAUTION:
Support for the Type1 and PFR font types is experimental in SAS 9.1. Do not use these
fonts in production jobs.
Note: SAS does not support nonscalable FreeType fonts of any type. Even if they
are recognized as valid FreeType fonts, they will not be added to the SAS registry.
Font files that are not produced by major vendors can be unreliable, and in some
cases SAS might not be able to use them.
The following SAS output methods and device drivers can use FreeType fonts:
Accessories
Registry Editor
(Alternatively, you can type regedit in the command window or Command ===>
prompt.)
Display 21.1
Delete
Key
For more information about PROC REGISTRY, see Chapter 41, The REGISTRY
Procedure, on page 831.
The SASWMF and SASEMF device drivers do not require this change.
For more information about SAS/GRAPH device drivers and the GDEVICE
procedure, see SAS/GRAPH Reference, Volumes 1 and 2.
FONTFILE statement
This example shows how to add a single font file to the SAS registry.
Program
Specify a font file to add. The FONTFILE statement specifies the complete path to a single
font file.
proc fontreg;
fontfile 'your-font-file';
run;
Log
NOTE: PROCEDURE PRINTTO used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds
20   proc fontreg;
21   fontfile 'your-font-file';
22   run;
SUMMARY:
  Files processed: 1
  Unusable files: 0
  Files identified as fonts: 1
  Fonts that were processed: 1
  Fonts replaced in the SAS registry: 0
  Fonts added to the SAS registry: 1
  Fonts that could not be used: 0
NOTE: PROCEDURE FONTREG used (Total process time):
      real time           0.17 seconds
      cpu time            0.03 seconds
MSGLEVEL= option
FONTPATH statement
This example shows how to add all valid font files from two different directories and
how to write detailed information to the SAS log.
Program
Write complete details to the SAS log. The MSGLEVEL=VERBOSE option writes complete
details about what fonts were added, what fonts were not added, and what font files were not
understood.
proc fontreg msglevel=verbose;
Specify the directories to search for valid fonts. You can specify more than one directory in
the FONTPATH statement. Each directory must be enclosed in quotation marks. If you specify
more than one directory, then you must separate the directories with a space.
fontpath 'your-font-directory-1' 'your-font-directory-2';
run;
Log (Partial)
NOTE: PROCEDURE PRINTTO used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds
34
35
36
37
MODE= option
TRUETYPE statement
This example reads all the TrueType fonts in the specified directory and replaces
those that already exist in the SAS registry.
Program
Replace existing fonts only. The MODE=REPLACE option limits the action of the procedure
to replacing fonts that are already defined in the SAS registry. New fonts will not be added.
proc fontreg mode=replace;
Specify a directory that contains TrueType font files. Files in the directory that are not
recognized as being TrueType font files are ignored.
truetype 'your-font-directory';
run;
Log
53   proc fontreg mode=replace;
54   truetype 'your-font-directory';
55   run;
SUMMARY:
  Files processed: 49
  Unusable files: 4
  Files identified as fonts: 45
  Fonts that were processed: 39
  Fonts replaced in the SAS registry: 39
  Fonts added to the SAS registry: 0
  Fonts that could not be used: 0
NOTE: PROCEDURE FONTREG used (Total process time):
      real time           1.39 seconds
      cpu time            0.63 seconds
See Also
• The GDEVICE procedure in SAS/GRAPH Reference, Volumes 1 and 2
• The FONTSLOC and SYSPRINTFONT SAS system options in SAS Language
Reference: Dictionary
CHAPTER
22
The FORMAT Procedure
Overview: FORMAT Procedure 430
What Does the FORMAT Procedure Do? 430
What Are Formats and Informats? 430
How Are Formats and Informats Associated with a Variable? 430
Syntax: FORMAT Procedure 431
PROC FORMAT Statement 432
EXCLUDE Statement 434
INVALUE Statement 435
PICTURE Statement 438
SELECT Statement 447
VALUE Statement 448
Informat and Format Options 451
Specifying Values or Ranges 453
Concepts: FORMAT Procedure 455
Associating Informats and Formats with Variables 455
Methods of Associating Informats and Formats with Variables 455
Tips 455
See Also 456
Storing Informats and Formats 456
Format Catalogs 456
Temporary Informats and Formats 456
Permanent Informats and Formats 456
Accessing Permanent Informats and Formats 457
Missing Formats and Informats 457
Results: FORMAT Procedure 458
Output Control Data Set 458
Input Control Data Set 460
Procedure Output 461
Examples: FORMAT Procedure 463
Example 1: Creating a Picture Format 464
Example 2: Creating a Format for Character Values 466
Example 3: Writing a Format for Dates Using a Standard SAS Format 468
Example 4: Converting Raw Character Data to Numeric Values 470
Example 5: Creating a Format from a Data Set 472
Example 6: Printing the Description of Informats and Formats 477
Example 7: Retrieving a Permanent Format 478
Example 8: Writing Ranges for Character Strings 480
Example 9: Filling a Picture Format 483
Display 22.1
[Figure: raw data value $1,544.32 -> read with COMMA9.2 informat -> converted value 1544.32 -> printed using DOLLAR9.2 format -> printed value $1,544.32]
In the figure, SAS reads the raw data value that contains the dollar sign and comma.
The COMMA9.2 informat ignores the dollar sign and comma and converts the value to
1544.32. The DOLLAR9.2 format prints the value, adding the dollar sign and comma.
For more information about associating informats and formats with variables, see
Associating Informats and Formats with Variables on page 455.
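To do this                                                      Use this statement
Exclude catalog entries from processing by the FMTLIB           EXCLUDE
and CNTLOUT= options
Create an informat for reading and converting raw data values   INVALUE
Create a template for printing numbers                          PICTURE
Select catalog entries for processing by the FMTLIB             SELECT
and CNTLOUT= options
Create a format that specifies character strings to use to      VALUE
print variable values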
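To do this                                                      Use this option
Specify a SAS data set from which PROC FORMAT builds            CNTLIN=
informats and formats
Create a SAS data set that stores information about             CNTLOUT=
informats and formats
Print information about informats and formats                   FMTLIB
Specify a SAS catalog to contain the informats and formats      LIBRARY=
that you are creating
Specify the number of characters of the informatted or          MAXLABLEN=
formatted value that appear in the PROC FORMAT output
Specify the number of characters of the start and end values    MAXSELEN=
that appear in the PROC FORMAT output
Prevent a new informat or format from replacing an existing     NOREPLACE
one of the same name
Print information about each format and informat on a           PAGE (1)
separate page
(1) Used in conjunction with FMTLIB. If PAGE is specified, FMTLIB is invoked (or assumed).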
Options
CNTLIN=input-control-SAS-data-set
specifies a SAS data set from which PROC FORMAT builds informats and formats.
CNTLIN= builds formats and informats without using a VALUE, PICTURE, or
INVALUE statement. If you specify a one-level name, then the procedure searches
only the default data library (either the WORK data library or USER data library)
for the data set, regardless of whether you specify the LIBRARY= option.
Note: LIBRARY= can point to either a data library or a catalog. If only a libref is
specified, then a catalog name of FORMATS is assumed.
A common source for an input control data set is the output from the
CNTLOUT= option of another PROC FORMAT step.
See also: Input Control Data Set on page 460
CNTLOUT=output-control-SAS-data-set
creates a SAS data set that stores information about informats and formats that are
contained in the catalog specified in the LIBRARY= option.
Note: LIBRARY= can point to either a data library or a catalog. If only a libref is
specified, then a catalog name of FORMATS is assumed.
If you are creating an informat or format in the same step that the CNTLOUT=
option appears, then the informat or format that you are creating is included in the
CNTLOUT= data set.
If you specify a one-level name, then the procedure stores the data set in the
default data library (either the WORK data library or the USER data library),
regardless of whether you specify the LIBRARY= option.
Tip: You can use an output control data set as an input control data set in
subsequent PROC FORMAT steps.
See also: Output Control Data Set on page 458
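The round trip between CNTLOUT= and CNTLIN= described above can be sketched as follows. This is a minimal illustration, not an excerpt from the guide: the format name AGEGRP, the data set name WORK.FMTDESC, and the libref MYLIB are hypothetical, and 'SAS-data-library' stands for a path that you supply.

```sas
/* Hypothetical names (AGEGRP, FMTDESC, MYLIB) are for illustration only. */

/* Step 1: create a format in the default WORK.FORMATS catalog. */
proc format;
   value agegrp low-<18 = 'minor'
                18-high = 'adult';
run;

/* Step 2: write a description of every entry in WORK.FORMATS */
/* to an output control data set.                             */
proc format library=work cntlout=work.fmtdesc;
run;

/* Step 3: rebuild the same formats in another catalog, using */
/* the output control data set as an input control data set.  */
libname mylib 'SAS-data-library';
proc format library=mylib cntlin=work.fmtdesc;
run;
```

Because WORK.FMTDESC is a two-level name, it does not depend on the rule for one-level names and the default data library.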
FMTLIB
prints information about all the informats and formats in the catalog that is specified
in the LIBRARY= option. To get information only about specific informats or formats,
subset the catalog using the SELECT or EXCLUDE statement.
Interaction: The PAGE option invokes FMTLIB.
If your output from FMTLIB is not formatted correctly, then try increasing the
value of the LINESIZE= system option.
Tip: If you use the SELECT or EXCLUDE statement and omit the FMTLIB and
CNTLOUT= options, then the procedure invokes the FMTLIB option and you
receive FMTLIB option output.
LIBRARY=libref<.catalog>
specifies a catalog to contain informats or formats that you are creating in the current
PROC FORMAT step. The procedure stores these informats and formats in the
catalog that you specify so that you can use them in subsequent SAS sessions or jobs.
Note: LIBRARY= can point to either a data library or a catalog. If only a libref is
specified, then a catalog name of FORMATS is assumed.
Alias: LIB=
Default: If you omit the LIBRARY= option, then formats and informats are stored
in the WORK.FORMATS catalog. If you specify the LIBRARY= option but do not
specify a name for the catalog, then formats and informats are stored in the
libref.FORMATS catalog.
SAS automatically searches LIBRARY.FORMATS. You might want to use the
LIBRARY libref for your format catalog. You can control the order in which SAS
searches for format catalogs with the FMTSEARCH= system option. For further
information about FMTSEARCH=, see the section on SAS system options in SAS
Language Reference: Dictionary.
MAXLABLEN=number-of-characters
specifies the number of characters in the informatted or formatted value that you
want to appear in the CNTLOUT= data set or in the output of the FMTLIB option.
The FMTLIB option prints a maximum of 40 characters for the informatted or
formatted value.
MAXSELEN=number-of-characters
specifies the number of characters in the start and end values that you want to
appear in the CNTLOUT= data set or in the output of the FMTLIB option. The
FMTLIB option prints a maximum of 16 characters for start and end values.
NOREPLACE
prevents a new informat or format that you are creating from replacing an existing
informat or format of the same name. If you omit NOREPLACE, then the procedure
warns you that the informat or format already exists and replaces it.
PAGE
prints information about each format and informat (that is, each entry) in the catalog
on a separate page.
EXCLUDE Statement
Excludes entries from processing by the FMTLIB and CNTLOUT= options.
Restriction: You cannot use a SELECT statement and an EXCLUDE statement within
the same PROC FORMAT step.
EXCLUDE entry(s);
Required Arguments
entry(s)
specifies one or more catalog entries to exclude from processing. Catalog entry names
are the same as the name of the informat or format that they store. Because
informats and formats can have the same name, and because character and numeric
informats or formats can have the same name, you must use certain prefixes when
specifying informats and formats in the EXCLUDE statement. Follow these rules
when specifying entries in the EXCLUDE statement:
- Precede names of entries that contain character formats with a dollar sign ($).
- Precede names of entries that contain character informats with an at sign and a
dollar sign (for example, @$entry-name).
- Precede names of entries that contain numeric informats with an at sign (@).
- Specify names of entries that contain numeric formats without a prefix.
FMTLIB Output
If you use the EXCLUDE statement without either FMTLIB or CNTLOUT= in the
PROC FORMAT statement, then the procedure invokes FMTLIB.
INVALUE Statement
Creates an informat for reading and converting raw data values.
Featured in: Example 4 on page 470.
See also: The section on informats in SAS Language Reference: Dictionary for
documentation on informats supplied by SAS.
Option summary: DEFAULT=, FUZZ=, MAX=, MIN=, NOTSORTED, JUST, UPCASE
Required Arguments
name
Tip: When SAS prints messages that refer to a user-written informat, the name is
prefixed by an at sign (@). When the informat is stored, the at sign is prefixed to
the name that you specify for the informat; this is why the name is limited to 31
or 30 characters. You need to use the at sign only when you are using the name in
an EXCLUDE or SELECT statement; do not prefix the name with an at sign when
you are associating the informat with a variable.
Options
The following options are common to the INVALUE, PICTURE, and VALUE
statements and are described in Informat and Format Options on page 451:
DEFAULT=length
FUZZ= fuzz-factor
MAX=length
MIN=length
NOTSORTED
In addition, you can use the following options:
JUST
left-justifies all input strings before they are compared to the ranges.
UPCASE
converts all raw data values to uppercase before they are compared to the possible
ranges. If you use UPCASE, then make sure the values or ranges you specify are in
uppercase.
value-range-set(s)
specifies raw data and values that the raw data will become. The value-range-set(s)
can be one or more of the following:
value-or-range-1 <, value-or-range-n>=informatted-value|[existing-informat]
The informat converts the raw data to the values of informatted-value on the right
side of the equal sign.
informatted-value
is the value you want the raw data in value-or-range to become. Use one of the
following forms for informatted-value:
character-string
is a character string up to 32,767 characters long. Typically, character-string
becomes the value of a character variable when you use the informat to convert
raw data. Use character-string for informatted-value only when you are creating
a character informat. If you omit the single or double quotation marks around
character-string, then the INVALUE statement assumes that the quotation
marks are there.
For hexadecimal literals, you can use up to 32,767 typed characters, or up to
16,382 represented characters at 2 hexadecimal characters per represented
character.
number
is a number that becomes the informatted value. Typically, number becomes the
value of a numeric variable when you use the informat to convert raw data. Use
number for informatted-value when you are creating a numeric informat. The
maximum for number depends on the host operating environment.
_ERROR_
treats data values in the designated range as invalid data. SAS assigns a
missing value to the variable, prints the data line in the SAS log, and issues a
warning message.
_SAME_
prevents the informat from converting the raw data as any other value. For
example, the following GROUP. informat converts values 01 through 20 and
assigns the numbers 1 through 20 as the result. All other values are assigned a
missing value.
invalue group 01-20= _same_
other= .;
existing-informat
is an informat that is supplied by SAS or a user-defined informat. The informat
you are creating uses the existing informat to convert the raw data that match
value-or-range on the left side of the equals sign. If you use an existing informat,
then enclose the informat name in square brackets (for example, [date9.]) or with
parentheses and vertical bars, for example, (|date9.|). Do not enclose the name of
the existing informat in single quotation marks.
value-or-range
See Specifying Values or Ranges on page 453.
Consider the following examples:
- The $GENDER. character informat converts the raw data values F and M to
character values 1 and 2:
invalue $gender 'F'='1'
                'M'='2';
The dollar sign prefix indicates that the informat converts character data.
- When you are creating numeric informats, you can specify character strings or
numbers for value-or-range. For example, the TRIAL. informat converts any
character string that sorts between A and M to the number 1 and any character
string that sorts between N and Z to the number 2. The informat treats the
unquoted range 1-3000 as a numeric range, which includes all numeric values
between 1 and 3000:
invalue trial 'A'-'M'=1
              'N'-'Z'=2
              1-3000=3;
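As a hedged illustration of how an informat created with INVALUE is applied, the sketch below defines a numeric informat with the UPCASE option and uses it in the INPUT function. The informat name GRADE, the data set SCORES, and the data values are hypothetical.

```sas
/* Hypothetical informat: convert letter grades to grade points. */
/* UPCASE converts the raw data to uppercase before comparison,  */
/* so the ranges themselves are written in uppercase.            */
proc format;
   invalue grade (upcase)
      'A'   = 4
      'B'   = 3
      'C'   = 2
      other = .;
run;

data scores;
   input letter $1.;
   /* No at sign is used when the informat is applied to data. */
   points = input(letter, grade.);
   datalines;
a
B
c
x
;
run;

proc print data=scores noobs;
run;
```

With UPCASE in effect, the lowercase values a and c match the ranges 'A' and 'C'; the value x falls into OTHER and becomes missing.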
PICTURE Statement
Creates a template for printing numbers.
Option summary: DATATYPE=, DEFAULT=, DECSEP=, DIG3SEP=, FUZZ=, MAX=, MIN=, MULTILABEL, NOTSORTED, ROUND, FILL=, MULTIPLIER=, NOEDIT, PREFIX=
Required Arguments
name
Options
The following options are common to the INVALUE, PICTURE, and VALUE
statements and are described in Informat and Format Options on page 451:
DEFAULT= length
FUZZ=fuzz-factor
MAX=length
MIN=length
NOTSORTED
In addition, you can use the following arguments:
DATATYPE=DATE | TIME | DATETIME
specifies that you can use directives in the picture as a template to format date, time,
or datetime values. See the definition and list of directives on page 442.
If you format a numeric missing value, then the resulting label will be ERROR.
Adding a clause to your program that checks for missing values can eliminate the
ERROR label.
DECSEP=character
FILL=character
specifies a character that completes the formatted value. If the number of significant
digits is less than the length of the format, then the format must complete, or fill, the
formatted value:
- The format uses character to fill the formatted value if you specify zeros as digit
selectors.
- The format uses zeros to fill the formatted value if you specify nonzero digit
selectors. The FILL= option has no effect.
If the picture includes other characters, such as a comma, which appear to the left
of the digit selector that maps to the last significant digit placed, then the characters
are replaced by the fill character or leading zeros.
Default: ' ' (a blank)
Interaction: If you use the FILL= and PREFIX= options in the same picture, then
the format places the prefix and then the fill characters.
Featured in: Example 9 on page 483
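The interaction between FILL= and PREFIX= can be tried with a sketch like the one below. The format name CHK and the test value are hypothetical; the picture uses zero digit selectors, which the description above requires for the fill character to take effect.

```sas
/* Hypothetical format CHK: fill unused leading positions with */
/* asterisks and place a dollar sign before the first digit.   */
proc format;
   picture chk low-high = '00,000,000.00' (fill='*' prefix='$');
run;

data _null_;
   amount = 1234.5;
   /* Per the Interaction note, the prefix is placed first and */
   /* the fill characters occupy the remaining positions.      */
   put amount chk.;
run;
```

Replacing the zero digit selectors with nines would, according to the rules above, produce leading zeros and leave FILL= without effect.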
MULTILABEL
allows the assignment of multiple labels or external values to internal values. The
following PICTURE statements show the two uses of the MULTILABEL option. In
each case, number formats are assigned as labels. The first PICTURE statement
assigns multiple labels to a single internal value. Multiple labels may also be
assigned to a single range of internal values. The second PICTURE statement
assigns labels to overlapping ranges of internal values. The MULTILABEL option
allows the assignment of multiple labels to the overlapped internal values.
picture abc (multilabel)
   1000='9,999'
   1000='9999';
picture overlap (multilabel)
   /* without decimals */
   0-999='999'
   1000-9999='9,999'
   /* with decimals */
   0-9='9.999'
   10-99='99.99'
   100-999='999.9';
In the second PICTURE statement, the primary label for 15
is 015, and the secondary label for 15 is 15.00 because the range 0-999 occurs in
sequence before the range 10-99. If you want the primary label for 15 to use the
99.99 format, then you might want to change the range 10-99 to 0-99 in the
PICTURE statement. The range 0-99 occurs in sequence before the range 0-999 and
will produce the desired result.
MULTIPLIER=n
Alias: MULT=
Default: 10^n, where n is the number of digits after the first decimal point in the
picture. For example, suppose your data contains a value 123.456 and you want to
print it using a picture of 999.999. The format multiplies 123.456 by 10^3 to obtain
a value of 123456, which results in a formatted value of 123.456.
Example: Example 1 on page 464
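The default multiplier described above can be made explicit. In this sketch the format names P3DFLT and P3EXPL are hypothetical; the second picture states the multiplier that the first one receives by default because its picture has three digits after the decimal point.

```sas
proc format;
   /* Three digits follow the decimal point, so the default   */
   /* multiplier is 10**3.                                     */
   picture p3dflt low-high = '999.999';
   /* The same multiplier, stated explicitly with MULT=.       */
   picture p3expl low-high = '999.999' (mult=1000);
run;

data _null_;
   x = 123.456;
   /* Both formats multiply x by 1000 before the digits are    */
   /* placed into the picture.                                 */
   put x p3dflt. +2 x p3expl.;
run;
```

Because the product is truncated unless ROUND is specified, values that are not exactly representable can lose a unit in the last place; adding the ROUND option is one way to guard against that.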
NOEDIT
specifies that numbers are message characters rather than digit selectors; that is, the
format prints the numbers as they appear in the picture. For example, the following
PICTURE statement creates the MILES. format, which formats any variable value
greater than 1000 as >1000 miles:
picture miles 1-1000='0000'
              1000<-high='>1000 miles'(noedit);
PREFIX='prefix'
specifies a character prefix to place in front of the value's first significant digit. You
must use zero digit selectors or the prefix will not be used.
The picture must be wide enough to contain both the value and the prefix. If the
picture is not wide enough to contain both the value and the prefix, then the format
truncates or omits the prefix. Typical uses for PREFIX= are printing leading
currency symbols and minus signs. For example, the PAY. format prints the variable
value 25500 as $25,500.00:
picture pay low-high='000,009.99'
            (prefix='$');
Default: no prefix
Interaction: If you use the FILL= and PREFIX= options in the same picture, then
the format places the prefix and then the fill characters.
ROUND
rounds the value to the nearest integer before formatting. Without the ROUND
option, the format multiplies the variable value by the multiplier, truncates the
decimal portion (if any), and prints the result according to the template that you
define. With the ROUND option, the format multiplies the variable value by the
multiplier, rounds that result to the nearest integer, and then formats the value
according to the template. Note that if the FUZZ= option is also specified, the
rounding takes place after SAS has used the fuzz factor to determine which range
the value belongs to.
Tip: Note that the ROUND option rounds a value of .5 to the next highest integer.
value-range-set
specifies one or more variable values and a template for printing those values. The
value-range-set is the following:
value-or-range-1 <, value-or-range-n>=picture
picture
specifies a template for formatting values of numeric variables. The picture is a
sequence of characters in single quotation marks. The maximum length for a
picture is 40 characters. Pictures are specified with three types of characters: digit
selectors, message characters, and directives. You can have a maximum of 16 digit
selectors in a picture.
Digit selectors are numeric characters (0 through 9) that define positions for
numeric values. A picture format with nonzero digit selectors prints any leading
zeros in variable values; picture digit selectors of 0 do not print leading zeros in
variable values. If the picture format contains digit selectors, then a digit selector
must be the first character in the picture.
Note: This chapter uses 9s as nonzero digit selectors.
Message characters are nonnumeric characters that print as specified in the
picture. The following PICTURE statement contains both digit selectors (99) and
message characters (illegal day value). Because the DAYS. format has nonzero
digit selectors, values are printed with leading zeros. The special range OTHER
prints the message characters for any values that do not fall into the specified
range (1 through 31).
picture days 01-31='99'
             other='99-illegal day value';
Directives are special characters that you can use in the picture to format date,
time, or datetime values.
Restriction: You can only use directives when you specify the DATATYPE= option
in the PICTURE statement.
%A
%b
%B
%d
%H
%I
%j
%m
%M
%p
%S
%U   Week number of the year (Sunday as the first day of the week)
     as a decimal number (0,53), with no leading zero
%w
%y
%Y
%%   %
Any directive that generates numbers can produce a leading zero, if desired, by
adding a 0 before the directive. This applies to %d, %H, %I, %j, %m, %M, %S, %U,
and %y. For example, if you specify %y in the picture, then 2001 would be
formatted as 1, but if you specify %0y, then 2001 would be formatted as 01.
Tip: Add code to your program to direct how you want missing values to be
displayed.
value-or-range
See Specifying Values or Ranges on page 453.
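A short sketch of a date picture that uses directives follows. It assumes the usual meanings of the directives used (%d for day of the month, %B for the full month name, %Y for the four-digit year); the format name LONGDATE and the test values are hypothetical. DEFAULT=20 is included because the formatted text can be longer than the picture string itself.

```sas
proc format;
   picture longdate (default=20)
      /* %0d adds a leading zero to the day of the month. */
      low-high = '%0d %B %Y' (datatype=date)
      /* Direct how missing values are displayed, as the  */
      /* Tip above suggests.                               */
      other    = 'no date';
run;

data _null_;
   d = '05MAR2004'd;
   put d longdate.;
   d = .;
   put d longdate.;
run;
```

The OTHER range is one way to keep a missing value from producing the ERROR label mentioned under DATATYPE=.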
      Amount
 1    -2.051
 2    -0.050
 3    -0.017
 4     0.000
 5     0.093
 6     0.540
 7     0.556
 8     6.600
 9    14.630
The following PROC FORMAT step uses the ROUND format option and creates the
NOZEROS. format, which eliminates leading zeros in the formatted values:
libname library 'SAS-data-library';
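The remainder of this PROC FORMAT step is not present in this copy of the text. The sketch below is a reconstruction under stated assumptions: it uses four ranges, the ROUND option, and prefixes and multipliers chosen so that the nine sample values of AMOUNT produce the output shown later in this section. The exact pictures are an assumption, and the format is stored in WORK.FORMATS here so that the example runs without a permanent library.

```sas
/* Sample data: the nine AMOUNT values listed above. */
data sample;
   input Amount;
   datalines;
-2.051
-0.050
-0.017
0.000
0.093
0.540
0.556
6.600
14.630
;
run;

/* Assumed reconstruction of NOZEROS.: values between -1 and 1 */
/* are multiplied by 100 and printed as two digits after a     */
/* prefix that supplies the sign and the decimal point.        */
proc format;
   picture nozeros (round)
      low - -1 = '00.00' (prefix='-')
      -1 <-< 0 = '99'    (prefix='-.' mult=100)
      0  -<  1 = '99'    (prefix='.'  mult=100)
      1 - high = '00.00';
run;
```

For instance, -0.017 falls into the second range, becomes 1.7 after multiplication, rounds to 2, and is printed as -.02.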
The following table explains how one value from each range is formatted. Figure 22.1
on page 446 provides an illustration of each step. The circled numbers in the figure
correspond to the step numbers in the table.
Table 22.1 [columns: Step, Rule, In this example]
Figure 22.1
The following PROC PRINT step associates the NOZEROS. format with the
AMOUNT variable in SAMPLE. The output shows the result of rounding.
proc print data=sample noobs;
format amount nozeros.;
title 'Formatting the Variable Amount';
title2 'with the NOZEROS. Format';
run;
Amount
-2.05
-.05
-.02
.00
.09
.54
.56
6.60
14.63
CAUTION:
The picture must be wide enough for the prefix and the numbers. In this example, if the
value -45.00 were formatted with NOZEROS. then the result would be 45.00 because
it falls into the first range, low - -1, and the picture for that range is not wide
enough to accommodate the prefixed minus sign and the number.
Specifying No Picture
This PICTURE statement creates a picture-name format that has no picture:
picture picture-name;
Using this format has the effect of applying the default SAS format to the values.
SELECT Statement
Selects entries for processing by the FMTLIB and CNTLOUT= options.
Restriction: You cannot use a SELECT statement and an EXCLUDE statement within
the same PROC FORMAT step.
Featured in: Example 6 on page 477.
SELECT entry(s);
Required Arguments
entry(s)
specifies one or more catalog entries for processing. Catalog entry names are the
same as the name of the informat or format that they store. Because informats and
formats can have the same name, and because character and numeric informats or
formats can have the same name, you must use certain prefixes when specifying
informats and formats in the SELECT statement. Follow these rules when specifying
entries in the SELECT statement:
- Precede names of entries that contain character formats with a dollar sign ($).
- Precede names of entries that contain character informats with an at sign and a
dollar sign, for example, @$entry-name.
- Precede names of entries that contain numeric informats with an at sign (@).
- Specify names of entries that contain numeric formats without a prefix.
In addition, the following SELECT statement selects all formats or informats that
occur alphabetically between apple and pear, inclusive:
select apple-pear;
FMTLIB Output
If you use the SELECT statement without either FMTLIB or CNTLOUT= in the
PROC FORMAT statement, then the procedure invokes FMTLIB.
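The prefix rules for SELECT can be seen in a sketch such as the following. It assumes that a catalog LIBRARY.FORMATS already contains the character format $STATE., the numeric format ANSWER., and the numeric informat TRIAL. from this chapter; those entries and the library path are assumptions.

```sas
libname library 'SAS-data-library';

/* Print descriptions of three specific entries:          */
/*   $state  - character format (dollar sign prefix)      */
/*   answer  - numeric format (no prefix)                 */
/*   @trial  - numeric informat (at sign prefix)          */
proc format library=library fmtlib;
   select $state answer @trial;
run;
```

An EXCLUDE statement with the same entry names would report on every entry except these three, but it cannot be combined with SELECT in one step.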
VALUE Statement
Creates a format that specifies character strings to use to print variable values.
See also: The chapter about formats in SAS Language Reference: Dictionary for
documentation on formats supplied by SAS.
Option summary: DEFAULT=, FUZZ=, MAX=, MIN=, MULTILABEL, NOTSORTED
Required Arguments
name
Options
The following options are common to the INVALUE, PICTURE, and VALUE
statements and are described in Informat and Format Options on page 451:
DEFAULT=length
FUZZ= fuzz-factor
MAX=length
MIN=length
NOTSORTED
In addition, you can use the following options:
MULTILABEL
allows the assignment of multiple labels or external values to internal values. The
following VALUE statements show the two uses of the MULTILABEL option. The
first VALUE statement assigns multiple labels to a single internal value. Multiple
labels may also be assigned to a single range of internal values. The second VALUE
statement assigns labels to overlapping ranges of internal values. The MULTILABEL
option allows the assignment of multiple labels to the overlapped internal values.
value one (multilabel)
   1='ONE'
   1='UNO'
   1='UN';
value agefmt (multilabel)
   15-29='below 30 years'
   30-50='between 30 and 50'
   51-high='over 50 years'
   15-19='15 to 19'
   20-25='20 to 25'
   25-39='25 to 39'
   40-55='40 to 55'
   56-high='56 and above';
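A multilabel format only shows all of its labels in a procedure that is enabled for them. The sketch below uses PROC MEANS with the MLF option in the CLASS statement; the data set PEOPLE and its values are hypothetical, and the format repeats the AGEFMT. definition above so that the example is self-contained.

```sas
proc format;
   value agefmt (multilabel)
      15-29   = 'below 30 years'
      30-50   = 'between 30 and 50'
      51-high = 'over 50 years'
      15-19   = '15 to 19'
      20-25   = '20 to 25'
      25-39   = '25 to 39'
      40-55   = '40 to 55'
      56-high = '56 and above';
run;

/* Hypothetical data for illustration. */
data people;
   input age income;
   datalines;
17 12000
25 31000
42 52000
58 61000
;
run;

/* MLF tells PROC MEANS to use every label that applies to a */
/* value, so one observation can contribute to several rows. */
proc means data=people n mean;
   class age / mlf;
   var income;
   format age agefmt.;
run;
```

In procedures and DATA step code that are not multilabel enabled, only one label per value is used.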
value-range-set(s)
specifies one or more variable values and a character string or an existing format.
The value-range-set(s) can be one or more of the following:
value-or-range-1 <, value-or-range-n>=formatted-value|[existing-format]
The variable values on the left side of the equals sign print as the character string
on the right side of the equals sign.
formatted-value
specifies a character string that becomes the printed value of the variable value
that appears on the left side of the equals sign. Formatted values are always
character strings, regardless of whether you are creating a character or numeric
format.
Formatted values can be up to 32,767 characters. For hexadecimal literals, you
can use up to 32,767 typed characters, or up to 16,382 represented characters at 2
hexadecimal characters per represented character. Some procedures, however, use
only the first 8 or 16 characters of a formatted value.
Requirement: You must enclose a formatted value in single or double quotation
marks. The following example shows a formatted value that is enclosed in
double quotation marks.
value $ score
M=Male "(pass)"
F=Female "(pass)";
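Tip: Formatting numeric variables does not preclude the use of those variables in
arithmetic operations. SAS uses stored values for arithmetic operations.
existing-format
specifies a format supplied by SAS or an existing user-defined format. If you use an
existing format, then enclose the format name in square brackets (for example,
[date9.]) or with parentheses and vertical bars, for example,
(|date9.|). Do not enclose the name of the existing format in single quotation
marks.
Using an existing format can be thought of as nesting formats. A nested level of
one means that if you are creating the format A with the format B as a formatted
value, then the procedure has to use only one existing format to create A.
Tip: Avoid nesting formats more than one level. The resource requirements can
increase dramatically with each additional level.
Consider the following examples: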
- The $STATE. character format prints the postal code for selected states:
value $state 'Delaware'='DE'
             'Florida'='FL'
             'Ohio'='OH';
The variable value Delaware prints as DE, the variable value Florida prints
as FL, and the variable value Ohio prints as OH. Note that the $STATE. format
begins with a dollar sign.
Note: Range specifications are case sensitive. In the $STATE. format above,
the value OHIO would not match any of the specified ranges. If you are not
certain what case the data values are in, then one solution is to use the
UPCASE function on the data values and specify all uppercase characters for
the ranges.
- The numeric format ANSWER. writes the values 1 and 2 as yes and no:
value answer 1='yes'
             2='no';
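The case-sensitivity workaround mentioned in the note on the $STATE. format can be sketched as follows. The format name $STATEUC and the test values are hypothetical; the ranges are written in uppercase and the data values are passed through the UPCASE function before the format is applied.

```sas
proc format;
   /* Ranges are all uppercase to match the upcased data. */
   value $stateuc 'DELAWARE' = 'DE'
                  'FLORIDA'  = 'FL'
                  'OHIO'     = 'OH';
run;

data _null_;
   length name $ 12;
   do name = 'Ohio', 'OHIO', 'ohio';
      /* UPCASE removes the dependence on the case of the data. */
      code = put(upcase(name), $stateuc.);
      put name= code=;
   end;
run;
```

All three spellings resolve to the same range, so each iteration writes code=OH.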
Specifying No Ranges
This VALUE statement creates a format-name format that has no ranges:
value format-name;
Using this format has the effect of applying the default SAS format to the values.
FUZZ=fuzz-factor
specifies a fuzz factor for matching values to a range. If a number does not match
or fall in a range exactly but comes within fuzz-factor, then the format considers it
a match. For example, the following VALUE statement creates the LEVELS.
format, which uses a fuzz factor of .2:
value levels (fuzz=.2) 1='A'
                       2='B'
                       3='C';
FUZZ=.2 means that if a variable value falls within .2 of a value on either end
of the range, then the format uses the corresponding formatted value to print the
variable value. So the LEVELS. format formats the value 2.1 as B.
If a variable value matches one value or range without the fuzz factor, and also
matches another value or range with the fuzz factor, then the format assigns the
variable value to the value or range that it matched without the fuzz factor.
Default: 1E-12 for numeric formats and 0 for character formats.
Tip: Specify FUZZ=0 to save storage space when you use the VALUE statement
to create numeric formats.
Tip: A value that is excluded from a range using the < operator does not receive
the formatted value, even if it falls into the range when you use the fuzz factor.
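The effect of the fuzz factor can be checked with a small sketch. It repeats the LEVELS. definition from this section under the hypothetical name LEVELSX and adds an OTHER range, which is an addition made here so that unmatched values are easy to see.

```sas
proc format;
   value levelsx (fuzz=.2)
      1     = 'A'
      2     = 'B'
      3     = 'C'
      other = 'unmatched';
run;

data _null_;
   do x = 2, 2.1, 2.5;
      length label $ 9;
      label = put(x, levelsx.);
      /* 2 matches exactly, 2.1 is within .2 of 2, and 2.5 is */
      /* more than .2 away from every listed value.           */
      put x= label=;
   end;
run;
```

The values 2 and 2.1 receive the label B, and 2.5 falls into the OTHER range.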
MAX=length
specifies a maximum length for the informat or format. When you associate the
format with a variable, you cannot specify a width greater than the MAX= value.
Default: 40
Range: 1-40
MIN=length
specifies a minimum length for the informat or format.
Default: 1
Range: 1-40
NOTSORTED
stores values or ranges for informats or formats in the order in which you define
them. If you do not specify NOTSORTED, then values or ranges are stored in
sorted order by default, and SAS uses a binary searching algorithm to locate the
range that a particular value falls into. If you specify NOTSORTED, then SAS
searches each range in the order in which you define them until a match is found.
Use NOTSORTED if
- you know the likelihood of certain ranges occurring, and you want your
informat or format to search those ranges first to save processing time.
- you want to preserve the order that you define ranges when you print a
description of the informat or format using the FMTLIB option.
- you want to preserve the order that you define ranges when you use the
ORDER=DATA option and the PRELOADFMT option to analyze class
variables in PROC MEANS, PROC SUMMARY, or PROC TABULATE.
Do not use NOTSORTED if the distribution of values is uniform or unknown, or
if the number of values is relatively small. The binary searching algorithm that
SAS uses when NOTSORTED is not specied optimizes the performance of the
search under these conditions.
Note: SAS automatically sets the NOTSORTED option when you use the
CPORT and the CIMPORT procedures to transport informats or formats between
option to build the formats and informats from the imported control data set.
You can use LOW or HIGH as one value in a range, and you can use the range
LOW-HIGH to encompass all values. For example, these are valid ranges:
low-'ZZ'
35-high
low-high
You can use the less than (<) symbol to exclude values from ranges. If you are
excluding the first value in a range, then put the < after the value. If you are
excluding the last value in a range, then put the < before the value. For example,
the following range does not include 0:
0<-100
If a value at the high end of one range also appears at the low end of another
range, and you do not use the < noninclusion notation, then PROC FORMAT
assigns the value to the first range. For example, in the following ranges, the
value AJ is part of the first range:
'AA'-'AJ'=1 'AJ'-'AZ'=2
In this example, to include the value AJ in the second range, use the noninclusive
notation on the first range:
'AA'-<'AJ'=1 'AJ'-'AZ'=2
If you overlap values in ranges, then PROC FORMAT returns an error message
unless, for the VALUE statement, the MULTILABEL option is specified. For
example, the following ranges will cause an error:
'AA'-'AK'=1 'AJ'-'AZ'=2
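The range notation above can be combined in one format. In this sketch the format name TEMP, the labels, and the test values are hypothetical; the < symbol is placed so that 0 and 100 each belong to exactly one range and no ranges overlap.

```sas
proc format;
   value temp
      low -< 0    = 'below zero'          /* excludes 0            */
      0           = 'zero'
      0 <- 100    = 'above 0, up to 100'  /* excludes 0, keeps 100 */
      100 <- high = 'above 100';          /* excludes 100          */
run;

data _null_;
   length label $ 20;
   do t = -5, 0, 100, 100.5;
      label = put(t, temp.);
      put t= label=;
   end;
run;
```

The value 100 is labeled 'above 0, up to 100' and 100.5 is labeled 'above 100', because each boundary value is included in only one range.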
If the value were 96.9, then the printed result would be 96.9.
[Table: methods of associating informats and formats with variables; columns: Step, Informats, Formats; rows: In a DATA step, In a PROC step]
Tips
- Do not confuse the FORMAT statement with the FORMAT procedure. The
FORMAT and INFORMAT statements associate an existing format or informat
(either standard SAS or user-defined) with one or more variables. PROC FORMAT
creates user-defined formats or informats. Assigning your own format or informat
to a variable is a two-step process: creating the format or informat with the
FORMAT procedure, and then assigning the format or informat with the
FORMAT, INFORMAT, or ATTRIB statement.
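The two-step process described in this tip can be sketched as follows. The format name YESNO, the data set SURVEY, and the data values are hypothetical.

```sas
/* Step 1: create the user-defined format with PROC FORMAT. */
proc format;
   value yesno 1 = 'yes'
               2 = 'no';
run;

/* Step 2: associate the format with a variable by using the */
/* FORMAT statement (here, in a DATA step).                   */
data survey;
   input id answer;
   format answer yesno.;
   datalines;
1 1
2 2
3 1
;
run;

proc print data=survey noobs;
run;
```

Because the FORMAT statement appears in the DATA step, the association is stored with the data set and applies in later procedures without being repeated.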
See Also
- For complete documentation on the ATTRIB, INFORMAT, and FORMAT
statements, see the section on statements in SAS Language Reference: Dictionary.
- For complete documentation on the INPUT and PUT functions, see the section on
functions in SAS Language Reference: Dictionary.
- See Formatted Values on page 25 for more information and examples of using
formats in base SAS procedures.
Format Catalogs
PROC FORMAT stores user-defined informats and formats as entries in SAS
catalogs.* You use the LIBRARY= option in the PROC FORMAT statement to specify
the catalog. If you omit the LIBRARY= option, then formats and informats are stored
in the WORK.FORMATS catalog. If you specify LIBRARY=libref but do not specify a
catalog name, then formats and informats are stored in the libref.FORMATS catalog.
Note that this use of a one-level name differs from the use of a one-level name
elsewhere in SAS. With the LIBRARY= option, a one-level name indicates a library;
elsewhere in SAS, a one-level name indicates a file in the WORK library.
The name of the catalog entry is the name of the format or informat. The entry types
are
- FORMAT for numeric formats
- FORMATC for character formats
- INFMT for numeric informats
- INFMTC for character informats.
- If you have more than one format catalog, or if the format catalog is named
something other than FORMATS, then you should do the following:
1. Assign a libref to a SAS data library in the SAS session in which you are
running the PROC FORMAT step.
2. Specify LIBRARY=libref or LIBRARY=libref.catalog in the PROC FORMAT
step, where libref is the libref that you assigned in step 1.
3. In the SAS program that uses your user-defined formats and informats, use
the FMTSEARCH= option in an OPTIONS statement, and include libref or
libref.catalog in the list of format catalogs.
The syntax for specifying a list of format catalogs to search is
OPTIONS FMTSEARCH=(catalog-specification-1< catalog-specification-n>);
where each catalog-specication can be libref or libref.catalog. If only libref is specied,
then SAS assumes that the catalog name is FORMATS.
When searching for a format or informat, SAS always searches in WORK.FORMATS
first, and then LIBRARY.FORMATS, unless one of them appears in the FMTSEARCH=
list. SAS searches the catalogs in the FMTSEARCH= list in the order that they are
listed until the format or informat is found.
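A minimal sketch of the search path follows. To stay self-contained it stores a format in a catalog of the WORK library with a name other than FORMATS; the catalog name DEMOFMTS and the format RISK. are assumptions made for illustration only.

```sas
/* Store a hypothetical format in WORK.DEMOFMTS rather than WORK.FORMATS. */
proc format library=work.demofmts;
   value risk low-<0.3  = 'Low'
              0.3-<0.7  = 'Medium'
              0.7-high  = 'High';
run;

/* Add the catalog to the search path. WORK.FORMATS and            */
/* LIBRARY.FORMATS are still searched first because neither one    */
/* appears in the list.                                             */
options fmtsearch=(work.demofmts);

data _null_;
   score=0.5;
   put score= risk.;   /* the format is found through FMTSEARCH= */
run;
```

Without the OPTIONS statement, the DATA step would be expected to fail to locate RISK., because the format is not in either of the catalogs that SAS searches by default.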
For further information on FMTSEARCH=, see the section on SAS system options in
SAS Language Reference: Dictionary. For an example that uses the LIBRARY= and
FMTSEARCH= options together, see Example 8 on page 480.
Refer to the section on SAS system options in SAS Language Reference: Dictionary
for more information on NOFMTERR.
Chapter 22
FILL
for picture formats, a numeric variable whose value is the value of the FILL=
option
FMTNAME
a character variable whose value is the format or informat name
FUZZ
a numeric variable whose value is the value of the FUZZ= option
HLO
a character variable that contains range information about the format or informat
in the form of eight different letters that can appear in any combination. Values
are
F
O
range is OTHER
LABEL
a character variable whose value is the informatted or formatted value or the
name of an existing informat or format
LENGTH
a numeric variable whose value is the value of the LENGTH= option
MAX
a numeric variable whose value is the value of the MAX= option
MIN
a numeric variable whose value is the value of the MIN= option
MULT
a numeric variable whose value is the value of the MULT= option
NOEDIT
for picture formats, a numeric variable whose value indicates whether the
NOEDIT option is in effect. Values are
1
PREFIX
for picture formats, a character variable whose value is the value of the PREFIX=
option
SEXCL
a character variable that indicates whether the range's starting value is excluded.
Values are
Y
START
a character variable that gives the range's starting value
TYPE
a character variable that indicates the type of format. Possible values are
C
character format
I
numeric informat
J
character informat
N
numeric format (excluding pictures)
P
picture format
Output 22.1 shows an output control data set that contains information on all the
informats and formats created in Examples: FORMAT Procedure on page 463.
Output 22.1
[Listing of the output control data set: 23 observations for the BENEFIT, NOZEROS, PTSFRMT, USCURR, CITY, and EVAL entries. The column layout is not reproduced here.]
You can use the SELECT or EXCLUDE statement to control which formats and
informats are represented in the output control data set. For details, see SELECT
Statement on page 447 and EXCLUDE Statement on page 434.
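As a rough sketch of how an output control data set is produced, the following steps create one hypothetical format (AGEGRP., not part of this chapter's examples), write its definition to a data set with the CNTLOUT= option, and print a few of the variables described above. The data set name WORK.FMTINFO is also an assumption.

```sas
/* Define a hypothetical numeric format in WORK.FORMATS. */
proc format library=work;
   value agegrp low-<18 = 'Minor'
                18-high = 'Adult';
run;

/* Write only the AGEGRP. definition to an output control data set. */
proc format library=work cntlout=work.fmtinfo;
   select agegrp;
run;

/* Inspect a subset of the control variables. */
proc print data=work.fmtinfo noobs;
   var fmtname start end label type hlo sexcl eexcl;
run;
```

Each range of the format becomes one observation, so the printed data set should contain two rows for AGEGRP.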
- For both numeric and character formats, the data set must contain the variables
FMTNAME, START, and LABEL, which are described in "Output Control Data
Set" on page 458. The remaining variables are not always required.
- If you are creating a character format or informat, then you must either begin the
format or informat name with a dollar sign ($) or specify a TYPE variable with the
value C.
- If you are creating a PICTURE statement format, then you must specify a TYPE
variable with the value P.
- If you are creating a format with ranges of input values, then you must specify the
END variable. If range values are to be noninclusive, then the variables SEXCL
and EEXCL must each have a value of Y. Inclusion is the default.
You can create more than one format from an input control data set if the
observations for each format are grouped together.
You can use a VALUE, INVALUE, or PICTURE statement in the same PROC
FORMAT step with the CNTLIN= option. If the VALUE, INVALUE, or PICTURE
statement is creating the same informat or format that the CNTLIN= option is
creating, then the VALUE, INVALUE, or PICTURE statement creates the informat or
format and the CNTLIN= data set is not used. You can, however, create an informat or
format with VALUE, INVALUE, or PICTURE and create a different informat or format
with CNTLIN= in the same PROC FORMAT step.
For an example featuring an input control data set, see Example 5 on page 472.
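The requirements above can be illustrated with a small input control data set. This is a sketch under assumed names (the data set WORK.CTL and the format RESULT. are hypothetical); it uses END for the ranges and EEXCL to exclude the ending value of the first range.

```sas
/* Build a hypothetical input control data set with two ranges. */
data work.ctl;
   length fmtname $ 8 start end $ 8 label $ 12 type eexcl $ 1;
   input fmtname start end label type eexcl;
   datalines;
RESULT 0 60 Fail N Y
RESULT 60 100 Pass N N
;
run;

/* Create the RESULT. format from the control data set. */
proc format library=work cntlin=work.ctl;
run;

/* Apply the format at and around the range boundary. */
data _null_;
   do score=59.9, 60, 100;
      put score= result.;
   end;
run;
```

Because EEXCL is Y for the first observation, the value 60 is excluded from the first range and falls into the second one.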
Procedure Output
The FORMAT procedure prints output only when you specify the FMTLIB option or
the PAGE option in the PROC FORMAT statement. The printed output is a table for
each format or informat entry in the catalog that is specified in the LIBRARY= option.
The output also contains global information and the specifics of each value or range
that is defined for the format or informat. You can use the SELECT or EXCLUDE
statement to control which formats and informats are represented in the FMTLIB
output. For details, see "SELECT Statement" on page 447 and "EXCLUDE Statement"
on page 434. For an example, see Example 6 on page 477.
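A short sketch of requesting FMTLIB output for selected entries is shown here. The format SIZE. and the informat YN. are hypothetical names chosen for this illustration.

```sas
/* Define one hypothetical format and one hypothetical informat. */
proc format library=work;
   value   size low-<10='Small' 10-high='Large';
   invalue yn   'Y'=1 'N'=0;
run;

/* Print descriptions of only those two entries.              */
/* The at sign (@) marks YN as an informat in the SELECT list. */
proc format library=work fmtlib;
   select size @yn;
run;
```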
The FMTLIB output shown in Output 22.2 contains a description of the NOZEROS.
format, which is created in Building a Picture Format: Step by Step on page 443, and
the EVAL. informat, which is created in Example 4 on page 470.
Output 22.2
----------------------------------------------------------------------------
|       FORMAT NAME: NOZEROS  LENGTH:    5   NUMBER OF VALUES:    4        |
|   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH   5  FUZZ: STD        |
|--------------------------------------------------------------------------|
|START           |END             |LABEL  (VER. 7.00     29MAY98:10:00:24) |
|----------------+----------------+----------------------------------------|
|LOW             |              -1|00.00               P-   F   M100       |
|              -1<               0<99                  P-.  F   M100       |
|               0|               1<99                  P.   F   M100       |
|               1|HIGH            |00.00               P    F   M100       |
----------------------------------------------------------------------------
----------------------------------------------------------------------------
|     INFORMAT NAME: @EVAL    LENGTH:    1   NUMBER OF VALUES:    5        |
|   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH   1  FUZZ:   0        |
|--------------------------------------------------------------------------|
|START           |END             |INVALUE(VER. 7.00     29MAY98:10:00:25) |
|----------------+----------------+----------------------------------------|
|C               |C               |                                       1|
|E               |E               |                                       2|
|N               |N               |                                       0|
|O               |O               |                                       4|
|S               |S               |                                       3|
----------------------------------------------------------------------------
The fields are described below in the order they appear in the output, from left to right:
INFORMAT NAME
FORMAT NAME
the name of the informat or format. Informat names begin with an at-sign (@).
LENGTH
the length of the informat or format. PROC FORMAT determines the length in the
following ways:
- For character informats, the value for LENGTH is the length of the longest
raw data value on the left side of the equals sign.
- LENGTH is the same as the longest raw data value on the left side of
the equal sign.
- For formats, the value for LENGTH is the length of the longest value on the
right side of the equal sign.
In the output for @EVAL., the length is 1 because 1 is the length of the longest
raw data value on the left side of the equals sign.
In the output for NOZEROS., the LENGTH is 5 because the longest picture is 5
characters.
NUMBER OF VALUES
MIN LENGTH
the minimum length of the informat or format. The value for MIN LENGTH is 1
unless you specify a different minimum length with the MIN= option.
MAX LENGTH
the maximum length of the informat or format. The value for MAX LENGTH is 40
unless you specify a different maximum length with the MAX= option.
DEFAULT LENGTH
the length of the longest value in the INVALUE or LABEL field, or the value of
the DEFAULT= option.
FUZZ
the fuzz factor. For informats, FUZZ always is 0. For formats, the value for this
field is STD if you do not use the FUZZ= option. STD signifies the default fuzz
value.
START
the beginning value of a range. FMTLIB prints only the first 16 characters of a
value in the START and END columns.
END
the ending value of a range. The exclusion sign (<) appears after the values in
START and END, if the value is excluded from the range.
INVALUE
LABEL
INVALUE appears only for informats and contains the informatted values.
LABEL appears only for formats and contains either the formatted value or
picture. The SAS release number and the date on which the format or informat
was created are in parentheses after INVALUE or LABEL.
For picture formats, such as NOZEROS., the LABEL section contains the
PREFIX=, FILL=, and MULT= values. To note these values, FMTLIB prints the
letters P, F, and M to represent each option, followed by the value. For example, in
the LABEL section, P-. indicates that the prefix value is a dash followed by a
period.
FMTLIB prints only 40 characters in the LABEL column.
Create the data set PROCLIB.STAFF. The INPUT statement assigns the names Name,
IdNumber, Salary, Site, and HireDate to the variables that appear after the DATALINES
statement. The FORMAT statement assigns the standard SAS format DATE7. to the variable
HireDate.
data proclib.staff;
input Name & $16. IdNumber $ Salary
30JAN79
18JUN76
20MAR84
18SEP74
10JAN93
16FEB83
02FEB90
15APR86
19JUN93
18DEC91
The variables are about a small subset of employees who work for a corporation that
has sites in the U.S. and Britain. The data contain the name, identification number,
salary (in British pounds), location, and date of hire for each employee.
This example uses a PICTURE statement to create a format that prints the values
for the variable Salary in the data set PROCLIB.STAFF in U.S. dollars.
Program
Assign two SAS library references (PROCLIB and LIBRARY). Assigning a library
reference LIBRARY is useful in this case because if you use PROC FORMAT, then SAS
automatically searches for informats and formats in any library that is referenced with the
LIBRARY libref.
libname proclib 'SAS-data-library-1';
libname library 'SAS-data-library-2';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Define the USCurrency. picture format. The PICTURE statement creates a template for
printing numbers. LOW-HIGH ensures that all values are included in the range. The MULT=
statement option specifies that each value is multiplied by 1.61. The PREFIX= statement option adds a
US dollar sign to any number that you format. The picture contains six digit selectors, five for
the salary and one for the dollar sign prefix.
proc format library=library;
   picture uscurrency low-high='000,000' (mult=1.61 prefix='$');
run;
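The effect of MULT= and PREFIX= can be checked on a single value before the format is attached to a data set. The sketch below assumes a hypothetical format name, USDEMO., stored in WORK.FORMATS, and an illustrative salary of 20000 pounds.

```sas
/* Hypothetical copy of the picture, stored in WORK.FORMATS. */
proc format;
   picture usdemo low-high='000,000' (mult=1.61 prefix='$');
run;

data _null_;
   pounds=20000;            /* illustrative salary in British pounds    */
   put pounds= usdemo.;     /* value is multiplied by 1.61, then        */
                            /* written through the picture with a $     */
run;
```

Because 20000 multiplied by 1.61 is 32200, the log should show the amount as a five-digit dollar figure with a comma and a leading dollar sign.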
Print the PROCLIB.STAFF data set. The NOOBS option suppresses the printing of
observation numbers. The LABEL option uses variable labels instead of variable names for
column headings.
proc print data=proclib.staff noobs label;
Specify a label and format for the Salary variable. The LABEL statement substitutes the
specified label for the variable name in the report. In this case, "Salary in U.S. Dollars" is substituted for
the variable name Salary for this print job only. The FORMAT statement associates the USCurrency.
format with the variable name Salary for the duration of this procedure step.
label salary='Salary in U.S. Dollars';
format salary uscurrency.;
Output
      PROCLIB.STAFF with a Format for the Variable Salary

                           Salary in
                     Id       U.S.             Hire
Name               Number    Dollars    Site    Date
Capalleti, Jimmy    2355    $34,072    BR1    30JAN79
Chen, Len           5889    $33,771    BR1    18JUN76
Davis, Brad         3878    $31,509    BR2    20MAR84
Leung, Brenda       4409    $55,256    BR2    18SEP74
Martinez, Maria     3985    $78,980    US2    10JAN93
Orfali, Philip      0740    $80,648    US2    16FEB83
Patel, Mary         2398    $56,643    BR3    02FEB90
Smith, Robert       5162    $64,561    BR5    15APR86
Sorrell, Joseph     4421    $62,403    US1    19JUN93
Zook, Carla         7385    $37,010    BR3    18DEC91
VALUE statement
OTHER keyword
Data set:
This example uses a VALUE statement to create a character format that prints a
value of a character variable as a different character string.
Program
Assign two SAS library references (PROCLIB and LIBRARY). Assigning a library
reference LIBRARY is useful in this case because if you use PROC FORMAT, then SAS
automatically searches for informats and formats in any library that is referenced with the
LIBRARY libref.
libname proclib 'SAS-data-library-1';
libname library 'SAS-data-library-2';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create the catalog named LIBRARY.FORMATS, where the user-defined formats will be
stored. The LIBRARY= option specifies a permanent storage location for the formats that you
create. It also creates a catalog named FORMATS in the specified library. If you do not use
LIBRARY=, then SAS temporarily stores formats and informats that you create in a catalog
named WORK.FORMATS.
proc format library=library;
Define the $CITY. format. The special codes BR1, BR2, and so on, are converted to the names
of the corresponding cities. The keyword OTHER specifies that values in the data set that do
not match any of the listed city code values are converted to the value INCORRECT CODE.
value $city 'BR1'='Birmingham UK'
            'BR2'='Plymouth UK'
            'BR3'='York UK'
            'US1'='Denver USA'
            'US2'='Miami USA'
            other='INCORRECT CODE';
run;
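The OTHER keyword can also be exercised directly with the PUT function, outside of a reporting procedure. The following sketch uses a hypothetical, shortened format named $SITEFMT. in WORK.FORMATS so that it does not depend on the LIBRARY libref.

```sas
/* Hypothetical, shortened version of a site-code format. */
proc format;
   value $sitefmt 'BR1'='Birmingham UK'
                  'US1'='Denver USA'
                  other='INCORRECT CODE';
run;

data _null_;
   length site $ 3;
   do site='BR1', 'US1', 'BR5';
      city=put(site, $sitefmt.);   /* BR5 matches no listed code */
      put site= city=;
   end;
run;
```

The code BR5 is not listed in the VALUE statement, so it is mapped to the OTHER label.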
Print the PROCLIB.STAFF data set. The NOOBS option suppresses the printing of
observation numbers. The LABEL option uses variable labels instead of variable names for
column headings.
proc print data=proclib.staff noobs label;
Specify a label for the Salary variable. The LABEL statement substitutes the label Salary
in U.S. Dollars for the name SALARY.
label salary='Salary in U.S. Dollars';
Specify formats for Salary and Site. The FORMAT statement temporarily associates the
USCurrency. format (created in Example 1 on page 464) with the variable SALARY and also
temporarily associates the format $CITY. with the variable SITE.
format salary uscurrency. site $city.;
Output
           PROCLIB.STAFF with a Format for the Variables
                         Salary and Site

                           Salary in
                     Id       U.S.                        Hire
Name               Number    Dollars    Site               Date
Capalleti, Jimmy    2355    $34,072    Birmingham UK     30JAN79
Chen, Len           5889    $33,771    Birmingham UK     18JUN76
Davis, Brad         3878    $31,509    Plymouth UK       20MAR84
Leung, Brenda       4409    $55,256    Plymouth UK       18SEP74
Martinez, Maria     3985    $78,980    Miami USA         10JAN93
Orfali, Philip      0740    $80,648    Miami USA         16FEB83
Patel, Mary         2398    $56,643    York UK           02FEB90
Smith, Robert       5162    $64,561    INCORRECT CODE    15APR86
Sorrell, Joseph     4421    $62,403    Denver USA        19JUN93
Zook, Carla         7385    $37,010    York UK           18DEC91
VALUE statement:
HIGH keyword
Data set:
This example uses an existing format that is supplied by SAS as a formatted value.
Program
This program defines a format called BENEFIT., which distinguishes employees who were
hired on or before 31DEC1979 from employees who were hired later. The purpose of this
program is to indicate which employees are eligible to receive a benefit, based on a hire
date on or before December 31, 1979. All employees with a later hire date are listed as
ineligible for the benefit.
Assign two SAS library references (PROCLIB and LIBRARY). Assigning a library
reference LIBRARY is useful in this case because if you use PROC FORMAT, then SAS
automatically searches for informats and formats in any library that is referenced with the
LIBRARY libref.
libname proclib 'SAS-data-library-1';
libname library 'SAS-data-library-2';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Store the BENEFIT. format in the catalog LIBRARY.FORMATS. The LIBRARY= option
specifies the permanent storage location LIBRARY for the formats that you create. If you do not
use LIBRARY=, then SAS temporarily stores formats and informats that you create in a catalog
named WORK.FORMATS.
proc format library=library;
Define the first range in the BENEFIT. format. This first range differentiates between the
employees who were hired on or before 31DEC1979 and those who were hired after that date.
The keyword LOW and the SAS date constant '31DEC1979'D create the first range, which
includes all date values that occur on or before December 31, 1979. For values that fall into this
range, SAS applies the WORDDATEw. format.*
value benefit
   low-'31DEC1979'd=[worddate20.]
Define the second range in the BENEFIT. format. The second range consists of all dates on
or after January 1, 1980. The SAS date constant '01JAN1980'D and the keyword HIGH specify
the range. Values that fall into this range receive ** Not Eligible ** as a formatted value.
   '01JAN1980'd-high='** Not Eligible **';
run;
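A minimal sketch of a format that delegates one range to an existing SAS format is shown below. The format name ELIG. and the two test dates are assumptions made for this illustration; the format is stored in WORK.FORMATS.

```sas
/* Hypothetical format: one range uses WORDDATE20., the other a literal. */
proc format;
   value elig low-'31DEC1979'd  = [worddate20.]
              '01JAN1980'd-high = '** Not Eligible **';
run;

data _null_;
   do hired='18SEP1974'd, '20MAR1984'd;
      /* Write the raw date and then the same value through ELIG. */
      put hired date9. +1 hired elig.;
   end;
run;
```

The first date falls in the range that is passed to WORDDATE20., and the second falls in the range with the literal label.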
Print the data set PROCLIB.STAFF. The NOOBS option suppresses the printing of
observation numbers. The LABEL option uses variable labels instead of variable names for
column headings.
proc print data=proclib.staff noobs label;
* For more information about SAS date constants, see the section on dates, times, and intervals in SAS Language Reference:
Concepts. For complete documentation on WORDDATEw., see the section on formats in SAS Language Reference: Dictionary.
Specify a label for the Salary variable. The LABEL statement substitutes the label Salary
in U.S. Dollars for the name SALARY.
label salary='Salary in U.S. Dollars';
Specify formats for Salary, Site, and Hiredate. The FORMAT statement associates the
USCurrency. format (created in Example 1 on page 464) with SALARY, the $CITY. format
(created in Example 2 on page 466) with SITE, and the BENEFIT. format with HIREDATE.
format salary uscurrency. site $city. hiredate benefit.;
Output
           PROCLIB.STAFF with a Format for the Variables
                    Salary, Site, and HireDate

                           Salary in
                     Id       U.S.
Name               Number    Dollars    Site              HireDate
Capalleti, Jimmy    2355    $34,072    Birmingham UK     January 30, 1979
Chen, Len           5889    $33,771    Birmingham UK     June 18, 1976
Davis, Brad         3878    $31,509    Plymouth UK       ** Not Eligible **
Leung, Brenda       4409    $55,256    Plymouth UK       September 18, 1974
Martinez, Maria     3985    $78,980    Miami USA         ** Not Eligible **
Orfali, Philip      0740    $80,648    Miami USA         ** Not Eligible **
Patel, Mary         2398    $56,643    York UK           ** Not Eligible **
Smith, Robert       5162    $64,561    INCORRECT CODE    ** Not Eligible **
Sorrell, Joseph     4421    $62,403    Denver USA        ** Not Eligible **
Zook, Carla         7385    $37,010    York UK           ** Not Eligible **
INVALUE statement
Program
This program converts quarterly employee evaluation grades, which are alphabetic,
into numeric values so that reports can be generated that sum the grades up as points.
Set up two SAS library references, one named PROCLIB and the other named
LIBRARY.
libname proclib 'SAS-data-library-1';
libname library 'SAS-data-library-2';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=64 pagesize=40;
Create the numeric informat Evaluation. The INVALUE statement converts the specified
values. The letters O (Outstanding), S (Superior), E (Excellent), C (Commendable), and N (None)
correspond to the numbers 4, 3, 2, 1, and 0, respectively.
proc format library=library;
   invalue evaluation 'O'=4
                      'S'=3
                      'E'=2
                      'C'=1
                      'N'=0;
run;
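The conversion that the informat performs can be tried on individual values with the INPUT function. This sketch assumes a hypothetical informat name, EVALPTS., stored in WORK.FORMATS, with the same letter-to-number mapping.

```sas
/* Hypothetical informat with the same mapping as in the text. */
proc format;
   invalue evalpts 'O'=4 'S'=3 'E'=2 'C'=1 'N'=0;
run;

data _null_;
   length grade $ 1;
   do grade='O', 'C', '2';
      points=input(grade, evalpts.);  /* '2' is not listed, so it is */
      put grade= points=;             /* read as the number 2        */
   end;
run;
```

Letters that are listed are converted to their assigned numbers, and a digit that is not listed is read as itself.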
Create the PROCLIB.POINTS data set. The instream data, which immediately follows the
DATALINES statement, contains a unique identification number (EmployeeId) and bonus
evaluations for each employee for each quarter of the year (Q1-Q4). Some of the bonus
evaluation values that are listed in the data lines are numbers; others are character values.
Where character values are listed in the data lines, the Evaluation. informat converts the value
O to 4, the value S to 3, and so on. The raw data values 0 through 4 are read as themselves
because they are not referenced in the definition of the informat. Converting the letter values to
numbers makes it possible to calculate the total number of bonus points for each employee for
the year. TotalPoints is the total number of bonus points.
data proclib.points;
input EmployeeId $ (Q1-Q4) (evaluation.,+1);
TotalPoints=sum(of q1-q4);
datalines;
2355 S O O S
5889 2 2 2 2
3878 C E E E
4409 0 1 1 1
3985 3 3 3 2
0740 S E E S
2398 E E C C
5162 C C C E
4421 3 2 2 2
7385 C C C N
;
Print the PROCLIB.POINTS data set. The NOOBS option suppresses the printing of
observation numbers.
proc print data=proclib.points noobs;
Output
              The PROCLIB.POINTS Data Set

Employee                                 Total
   Id       Q1     Q2     Q3     Q4     Points
  2355       3      4      4      3       14
  5889       2      2      2      2        8
  3878       1      2      2      2        7
  4409       0      1      1      1        3
  3985       3      3      3      2       11
  0740       3      2      2      3       10
  2398       2      2      1      1        6
  5162       1      1      1      2        5
  4421       3      2      2      2        9
  7385       1      1      1      0        3
This example shows how to create a format from a SAS data set.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create a temporary data set named scale. The first two variables in the data lines, called
BEGIN and END, will be used to specify a range in the format. The third variable in the data
lines, called AMOUNT, contains a percentage that will be used as the formatted value in the
format. Note that all three variables are character variables as required for PROC FORMAT
input control data sets.
data scale;
input begin $ 1-2 end $ 5-8 amount $ 10-12;
datalines;
0   3    0%
4   6    3%
7   8    6%
9   10   8%
11  16   10%
;
Create the input control data set CTRL and set the length of the LABEL variable. The
LENGTH statement ensures that the LABEL variable is long enough to accommodate the label
***ERROR***.
data ctrl;
length label $ 11;
Rename variables and create an end-of-file flag. The data set CTRL is derived from
WORK.SCALE. RENAME= renames BEGIN and AMOUNT as START and LABEL,
respectively. The END= option creates the variable LAST, whose value is set to 1 when the last
observation is processed.
set scale(rename=(begin=start amount=label)) end=last;
Create the variables FMTNAME and TYPE with fixed values. The RETAIN statement is
more efficient than an assignment statement in this case. RETAIN retains the value of
FMTNAME and TYPE in the program data vector and eliminates the need for the value to be
written on every iteration of the DATA step. FMTNAME specifies the name PercentageFormat,
which is the format that the input control data set creates. The TYPE variable specifies that the
input control data set will create a numeric format.
retain fmtname 'PercentageFormat' type 'n';
Create an "other" category. Because the only valid values for this application are 0-16, any
other value (such as missing) should be indicated as an error to the user. The IF statement
executes only after the DATA step has processed the last observation from the input data set.
When IF executes, HLO receives a value of O to indicate that the range is OTHER, and LABEL
receives a value of ***ERROR***. The OUTPUT statement writes these values as the last
observation in the data set. HLO has missing values for all other observations.
output;   /* write each range observation before the OTHER observation is added */
if last then do;
   hlo='O';
   label='***ERROR***';
output;
end;
run;
Print the control data set, CTRL. The NOOBS option suppresses the printing of observation
numbers.
proc print data=ctrl noobs;
Note that although the last observation contains values for START and END, these values are ignored because
of the O value in the HLO variable.
label          start    end    fmtname             type    hlo
0%             0        3      PercentageFormat    n
3%             4        6      PercentageFormat    n
6%             7        8      PercentageFormat    n
8%             9        10     PercentageFormat    n
10%            11       16     PercentageFormat    n
***ERROR***    11       16     PercentageFormat    n       O
Store the created format in the catalog WORK.FORMATS and specify the source for
the format. The CNTLIN= option specifies that the data set CTRL is the source for the format
PercentageFormat.
proc format library=work cntlin=ctrl;
run;
Create the numeric informat Evaluation. The INVALUE statement converts the specified
values. The letters O (Outstanding), S (Superior), E (Excellent), C (Commendable), and N (None)
correspond to the numbers 4, 3, 2, 1, and 0, respectively.
proc format;
   invalue evaluation 'O'=4
                      'S'=3
                      'E'=2
                      'C'=1
                      'N'=0;
run;
Create the WORK.POINTS data set. The instream data, which immediately follows the
DATALINES statement, contains a unique identification number (EmployeeId) and bonus
evaluations for each employee for each quarter of the year (Q1-Q4). Some of the bonus
evaluation values that are listed in the data lines are numbers; others are character values.
Where character values are listed in the data lines, the Evaluation. informat converts the value
O to 4, the value S to 3, and so on. The raw data values 0 through 4 are read as themselves
because they are not referenced in the definition of the informat. Converting the letter values to
numbers makes it possible to calculate the total number of bonus points for each employee for
the year. TotalPoints is the total number of bonus points. The addition operator is used instead
of the SUM function so that any missing value will result in a missing value for TotalPoints.
data points;
input EmployeeId $ (Q1-Q4) (evaluation.,+1);
TotalPoints=q1+q2+q3+q4;
datalines;
2355 S O O S
5889 2 . 2 2
3878 C E E E
4409 0 1 1 1
3985 3 3 3 2
0740 S E E S
2398 E E   C
5162 C C C E
4421 3 2 2 2
7385 C C C N
;
Generate a report for WORK.POINTS and associate the PercentageFormat. format with the
TotalPoints variable. The DEFINE statement performs the association. The column that
contains the formatted values of TotalPoints is using the alias Pctage. Using an alias enables
you to print a variable twice, once with a format and once with the default format. See Chapter
42, "The REPORT Procedure," on page 845 for more information about PROC REPORT.
proc report data=work.points nowd headskip split='#';
   column employeeid totalpoints totalpoints=Pctage;
   define employeeid / right;
   define totalpoints / 'Total#Points' right;
   define pctage / format=PercentageFormat12. 'Percentage' left;
   title 'The Percentage of Salary for Calculating Bonus';
run;
Output
Output 22.3
      The Percentage of Salary for Calculating Bonus

Employee     Total
Id          Points    Percentage
2355            14    10%
5889             .    ***ERROR***
3878             7    6%
4409             3    0%
3985            11    10%
0740            10    8%
2398             .    ***ERROR***
5162             5    3%
4421             9    8%
7385             3    0%
This example illustrates how to print a description of an informat and a format. The
description shows the values that are input and output.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Print a description of Evaluation. and NOZEROS. The FMTLIB option prints information
about the formats and informats in the catalog that the LIBRARY= option specifies.
LIBRARY=LIBRARY points to the LIBRARY.FORMATS catalog.
proc format library=library fmtlib;
Select an informat and a format. The SELECT statement selects EVALUATION and NOZEROS,
which were created in previous examples. The at sign (@) in front of EVALUATION indicates that
EVALUATION. is an informat.
select @evaluation nozeros;
Output
----------------------------------------------------------------------------
|       FORMAT NAME: NOZEROS  LENGTH:    5   NUMBER OF VALUES:    4        |
|   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH   5  FUZZ: STD        |
|--------------------------------------------------------------------------|
|START           |END             |LABEL  (VER. V7|V8   10APR2002:18:55:08)|
|----------------+----------------+----------------------------------------|
|LOW             |              -1|00.00               P-   F   M100       |
|              -1<               0<99                  P-.  F   M100       |
|               0|               1<99                  P.   F   M100       |
|               1|HIGH            |00.00               P    F   M100       |
----------------------------------------------------------------------------
----------------------------------------------------------------------------
|     INFORMAT NAME: @EVALUATION  LENGTH:    1   NUMBER OF VALUES:    5    |
|   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH   1  FUZZ:   0        |
|--------------------------------------------------------------------------|
|START           |END             |INVALUE(VER. 9.00    10APR2002:18:55:11)|
|----------------+----------------+----------------------------------------|
|C               |C               |                                       1|
|E               |E               |                                       2|
|N               |N               |                                       0|
|O               |O               |                                       4|
|S               |S               |                                       3|
----------------------------------------------------------------------------
This example uses the LIBRARY= option and the FMTSEARCH= system option to
store and retrieve a format stored in a catalog other than WORK.FORMATS or
LIBRARY.FORMATS.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=64 pagesize=60;
Create the NOZEROS. format. The PICTURE statement defines the picture format
NOZEROS. See "Building a Picture Format: Step by Step" on page 443.
proc format library=proclib;
   picture nozeros
      low   -    -1 = '00.00' (prefix='-')
      -1   <-<    0 = '99'    (prefix='-.' mult=100)
      0     -<    1 = '99'    (prefix='.'  mult=100)
      1     -  high = '00.00';
run;
Add the PROCLIB.FORMATS catalog to the search path that SAS uses to find
user-defined formats. The FMTSEARCH= system option defines the search path. The
FMTSEARCH= system option requires only a libref. FMTSEARCH= assumes that the catalog
name is FORMATS if no catalog name appears. Without the FMTSEARCH= option, SAS would
not find the NOZEROS. format.*
options
fmtsearch=(proclib);
Print the SAMPLE data set. The FORMAT statement associates the NOZEROS. format with
the Amount variable.
proc print data=sample;
format amount nozeros.;
* For complete documentation on the FMTSEARCH= system option, see the section on SAS system options in SAS Language
Reference: Dictionary.
Output
Retrieving the NOZEROS. Format from PROCLIB.FORMATS
The SAMPLE Data Set

Obs    Amount
  1     -2.05
  2      -.05
  3      -.01
  4       .00
  5       .09
  6       .54
  7       .55
  8      6.60
  9     14.63
This example creates a format and shows how to use ranges with character strings.
Program
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create the TRAIN data set from the PROCLIB.STAFF data set. PROCLIB.STAFF was
created in Examples: FORMAT Procedure on page 463.
data train;
set proclib.staff(keep=name idnumber);
run;
Print the data set TRAIN without a format. The NOOBS option suppresses the printing of
observation numbers.
proc print data=train noobs;
                      Id
Name                Number
Capalleti, Jimmy     2355
Chen, Len            5889
Davis, Brad          3878
Leung, Brenda        4409
Martinez, Maria      3985
Orfali, Philip       0740
Patel, Mary          2398
Smith, Robert        5162
Sorrell, Joseph      4421
Zook, Carla          7385
Store the format in WORK.FORMATS. Because the LIBRARY= option does not appear, the
format is stored in WORK.FORMATS and is available only for the current SAS session.
proc format;
Create the $SkillTest. format. The $SkillTest. format prints each employee's identification
number and the skills test that the employee has been assigned. Employees must take either TEST A,
TEST B, or TEST C, depending on their last name. The exclusion operator (<) excludes the last
value in the range. Thus, the first range includes employees whose last name begins with any
letter from A through D, and the second range includes employees whose last name begins with
any letter from E through L. The tilde (~) in the last range is necessary to include an entire
string that begins with the letter Z.
   value $skilltest 'a'-<'e','A'-<'E'='Test A'
                    'e'-<'m','E'-<'M'='Test B'
                    'm'-'z~','M'-'Z~'='Test C';
run;
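One way to check the ranges before building a report is to apply the format with the PUT function in a DATA step. The following is a minimal sketch that assumes the $SkillTest. format has just been created as shown; the three names are taken from the TRAIN data set, and the variable names NAME and TEST are chosen only for illustration.

```sas
data _null_;
   length name $ 16;
   /* Three names from the TRAIN data set, one for each range. */
   do name = 'Davis, Brad', 'Leung, Brenda', 'Zook, Carla';
      test = put(name, $skilltest.);   /* apply the format to the name */
      put name= test=;                 /* write both values to the log */
   end;
run;
```

The values written to the log should agree with the Test column that the report in this example shows for the same employees.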
Generate a report of the TRAIN data set. The FORMAT= option in the DEFINE statement
associates $SkillTest. with the NAME variable. The column that contains the formatted values
of NAME uses the alias Test. Using an alias enables you to print a variable twice, once with
a format and once with the default format. See Chapter 42, "The REPORT Procedure," on page
845 for more information about PROC REPORT.
proc report data=train nowd headskip;
   column name name=test idnumber;
   define test / display format=$skilltest. 'Test';
   define idnumber / center;
   title 'Test Assignment for Each Employee';
run;
Output
Test Assignment for Each Employee

Name                Test       IdNumber
Capalleti, Jimmy    Test A       2355
Chen, Len           Test A       5889
Davis, Brad         Test A       3878
Leung, Brenda       Test B       4409
Martinez, Maria     Test C       3985
Orfali, Philip      Test C       0740
Patel, Mary         Test C       2398
Smith, Robert       Test C       5162
Sorrell, Joseph     Test C       4421
Zook, Carla         Test C       7385
This example
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=64 pagesize=40;
Create the PAY data set. The PAY data set contains the monthly salary for each employee.
data pay;
input Name $ MonthlySalary;
datalines;
Liu   1259.45
Lars  1289.33
Kim   1439.02
Wendy 1675.21
Alex  1623.73
;
Define the SALARY. picture format and specify how the picture will be filled. When
FILL= and PREFIX= PICTURE statement options appear in the same picture, the format
places the prefix and then the fill characters. The SALARY. format fills the picture with the fill
character because the picture has zeros as digit selectors. The leftmost comma in the picture is
replaced by the fill character.
proc format;
   picture salary low-high='00,000,000.00' (fill='*' prefix='$');
run;
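To see how the prefix and fill characters are placed for a single value, the format can be applied with the PUT function. This is a minimal sketch that assumes the SALARY. format has just been created as shown; the variable names AMOUNT and CHECK are chosen only for illustration.

```sas
data _null_;
   amount = 1259.45;                 /* Liu's salary from the PAY data set */
   check  = put(amount, salary.);    /* format the number as text          */
   put check=;                       /* write the formatted value to the log */
run;
```

The value written to the log should match the MonthlySalary value that the output of this example shows for Liu.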
Print the PAY data set. The NOOBS option suppresses the printing of observation numbers.
The FORMAT statement temporarily associates the SALARY. format with the variable
MonthlySalary.
proc print data=pay noobs;
   format monthlysalary salary.;
   title 'Printing Salaries for a Check';
run;
Output
Printing Salaries for a Check

Name     MonthlySalary
Liu      ****$1,259.45
Lars     ****$1,289.33
Kim      ****$1,439.02
Wendy    ****$1,675.21
Alex     ****$1,623.73
See Also
FMTSEARCH= System option
VALIDFMTNAME= System option
FORMAT Statement
CHAPTER 23
The FORMS Procedure
Information about the FORMS Procedure
See:
list.
CHAPTER 24
The FREQ Procedure
Information about the FREQ Procedure
The documentation for the FREQ procedure has moved to Volume 3 of this book.
CHAPTER 25
The FSLIST Procedure
Overview: FSLIST Procedure 489
Syntax: FSLIST Procedure 489
Statement Descriptions 490
PROC FSLIST Statement 490
FSLIST Command 492
Using the FSLIST Window 494
General Information about the FSLIST Window 494
FSLIST Window Commands 494
Global Commands 494
Scrolling Commands 494
Searching Commands 496
Display Commands 498
Other Commands 499
You must specify either the FILEREF= or the UNIT= argument with the PROC
FSLIST statement.
NUM|NONUM
OVP|NOOVP
Statement Descriptions
The only statement that the FSLIST procedure supports is the PROC FSLIST
statement, which starts the procedure.
Requirements
You must specify an external file for PROC FSLIST to browse.
FSLIST Command
The FSLIST procedure can also be initiated by entering the following command on
the command line of any SAS window:
FSLIST <*|?|file-specification <carriage-control-option <overprinting-option>>>
where carriage-control-option can be CC, FORTCC, or NOCC and overprinting-option
can be OVP or NOOVP.
UNIT=nn
defines the FORTRAN-style logical unit number of the external file to browse. This
option is useful when the file to browse has a fileref of the form FTnnF001, where nn
is the logical unit number that is specified in the UNIT= argument. For example, you
can specify
proc fslist unit=20;
instead of
proc fslist fileref=ft20f001;
CAPS | NOCAPS
controls how search strings for the FIND command are treated:
CAPS
NOCAPS
The default is NOCAPS. You can use the CAPS command in the FSLIST window to
change the behavior of the procedure while you are browsing a file.
CC | FORTCC | NOCC
indicates whether carriage-control characters are used to format the display. You can
specify one of the following values for this option:
CC
FORTCC
NOCC
If the FSLIST procedure can determine from the file's attributes that the file
contains carriage-control information, then that carriage-control information is used
to format the displayed text (the CC option is the default). Otherwise, the entire
contents of the file are treated as text (the NOCC option is the default).
HSCROLL=n | HALF | PAGE
indicates the default horizontal scroll amount for the LEFT and RIGHT commands.
The following values are valid:
n
HALF
PAGE
sets the default scroll amount to the full window width.
The default is HSCROLL=HALF. You can use the HSCROLL command in the
FSLIST window to change the default scroll amount.
NOBORDER
suppresses the sides and bottom of the FSLIST window's border. When this option is
used, text can appear in the columns and row that are normally occupied by the
border.
NUM | NONUM
controls the display of line sequence numbers in files that have a record length of 80
and contain sequence numbers in columns 73 through 80. NUM displays the line
sequence numbers; NONUM suppresses them. The default is NONUM.
OVP | NOOVP
indicates whether the carriage-control code for overprinting is honored:
OVP
causes the procedure to honor the overprint code and print the
current line over the previous line when the code is encountered.
NOOVP
causes the procedure to ignore the overprint code and print each
line from the file on a separate line of the display.
The default is NOOVP. The OVP option is ignored if the NOCC option is in effect.
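The options described above can be combined on one PROC FSLIST statement. The following is a minimal sketch under stated assumptions: the MYRPT fileref and the quoted file name are hypothetical placeholders, and the combination of options is chosen only to illustrate the syntax.

```sas
/* MYRPT and the quoted file name are hypothetical placeholders. */
filename myrpt 'external-file';

/* Browse the file, treat carriage-control characters as text (NOCC),  */
/* convert unquoted FIND strings to uppercase (CAPS), and scroll left  */
/* or right by a full window width (HSCROLL=PAGE).                     */
proc fslist fileref=myrpt nocc caps hscroll=page;
run;
```

Because NOCC is in effect in this sketch, specifying OVP would have no effect, as noted in the description of the OVP | NOOVP option.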
FSLIST Command
The FSLIST command provides a handy way to initiate an FSLIST session from any
SAS window. The command enables you to use either a fileref or a filename to specify
the file to browse. It also enables you to specify how carriage-control information is
interpreted.
*
opens a dialog window in which you can specify the name of the file to browse, along
with various FSLIST procedure options. In the dialog window, you can specify either
a physical filename, a fileref, or a directory name. If you specify a directory name,
then a selection list of the files in the directory appears, from which you can choose
the desired file.
?
opens a selection window from which you can choose the external file to browse. The
selection list in the window includes all external files that are identified in the
current SAS session (all files with defined filerefs).
Note: Only filerefs that are defined within the current SAS session appear in the
selection list. Under some operating environments, it is possible to allocate filerefs
outside of SAS. Such filerefs do not appear in the selection list that is displayed by
the FSLIST command.
To select a file, position the cursor on the corresponding fileref and press ENTER.
Note: The selection window is not opened if no filerefs have been defined in the
current SAS session. Instead, an error message is printed, instructing you to enter a
filename with the FSLIST command.
file-specification
CC | FORTCC | NOCC
indicates whether carriage-control characters are used to format the display. You can
specify one of the following values for this option:
CC
FORTCC
NOCC
treats carriage-control characters as regular text.
If the FSLIST procedure can determine from the file's attributes that the file
contains carriage-control information, then that carriage-control information is used
to format the displayed text (the CC option is the default). Otherwise, the entire
contents of the file are treated as text (the NOCC option is the default).
OVP | NOOVP
indicates whether the carriage-control code for overprinting is honored. OVP causes
the overprint code to be honored; NOOVP causes it to be ignored. The default is
NOOVP. The OVP option is ignored if NOCC is in effect.
Global Commands
In the FSLIST window, you can use any of the global commands that are described in
the Global Commands chapter in SAS/FSP Procedures Guide.
Scrolling Commands
n
scrolls the window so that line n of text is at the top of the window. Type the
desired line number in the command window or on the command line and press
ENTER. If n is greater than the number of lines in the file, then the last few lines
of the file are displayed at the top of the window.
BACKWARD <n|HALF|PAGE|MAX>
scrolls vertically toward the first line of the file. The following scroll amounts can
be specified:
n
scrolls upward by the specified number of lines.
HALF
scrolls upward by half the number of lines in the window.
PAGE
scrolls upward by the number of lines in the window.
MAX
scrolls upward until the first line of the file is displayed.
If the scroll amount is not explicitly specified, then the window is scrolled by the
amount that was specified in the most recent VSCROLL command. The default
VSCROLL amount is PAGE.
BOTTOM
scrolls downward until the last line of the file is displayed.
FORWARD <n|HALF|PAGE|MAX>
scrolls vertically toward the end of the file. The following scroll amounts can be
specified:
n
scrolls downward by the specified number of lines.
HALF
scrolls downward by half the number of lines in the window.
PAGE
scrolls downward by the number of lines in the window.
MAX
scrolls downward until the last line of the file is displayed.
If the scroll amount is not explicitly specified, then the window is scrolled by the
amount that was specified in the most recent VSCROLL command. The default
VSCROLL amount is PAGE. Regardless of the scroll amount, this command does
not scroll beyond the last line of the file.
HSCROLL <n|HALF|PAGE>
sets the default horizontal scrolling amount for the LEFT and RIGHT commands.
The following scroll amounts can be specified:
n
sets the default scroll amount to the specified number of columns.
HALF
sets the default scroll amount to half the number of columns in the window.
PAGE
sets the default scroll amount to the number of columns in the window.
The default HSCROLL amount is HALF.
LEFT <n|HALF|PAGE|MAX>
scrolls horizontally toward the left margin of the text. This command is ignored
unless the file width is greater than the window width. The following scroll
amounts can be specified:
n
scrolls left by the specified number of columns.
HALF
scrolls left by half the number of columns in the window.
PAGE
scrolls left by the number of columns in the window.
MAX
scrolls left until the left margin of the text is displayed at the left edge of the
window.
If the scroll amount is not explicitly specified, then the window is scrolled by the
amount that was specified in the most recent HSCROLL command. The default
HSCROLL amount is HALF. Regardless of the scroll amount, this command does
not scroll beyond the left margin of the text.
RIGHT <n|HALF|PAGE|MAX>
scrolls horizontally toward the right margin of the text. This command is ignored
unless the file width is greater than the window width. The following scroll
amounts can be specified:
n
scrolls right by the specified number of columns.
HALF
scrolls right by half the number of columns in the window.
PAGE
scrolls right by the number of columns in the window.
MAX
scrolls right until the right margin of the text is displayed at the right edge of
the window.
If the scroll amount is not explicitly specified, then the window is scrolled by the
amount that was specified in the most recent HSCROLL command. The default
HSCROLL amount is HALF. Regardless of the scroll amount, this command does
not scroll beyond the right margin of the text.
TOP
scrolls upward until the first line of text from the file is displayed.
VSCROLL <n|HALF|PAGE>
sets the default vertical scrolling amount for the FORWARD and BACKWARD
commands. The following scroll amounts can be specified:
n
sets the default scroll amount to the specified number of lines.
HALF
sets the default scroll amount to half the number of lines in the window.
PAGE
sets the default scroll amount to the number of lines in the window.
The default VSCROLL amount is PAGE.
Searching Commands
BFIND <search-string <PREFIX|SUFFIX|WORD>>
locates the previous occurrence of the specified string in the file, starting at the
current cursor position and proceeding backward toward the beginning of the file.
The search-string value must be enclosed in quotation marks if it contains
embedded blanks.
If a FIND command has previously been issued, then you can use the BFIND
command without arguments to repeat the search in the opposite direction.
The CAPS option on the PROC FSLIST statement and the CAPS ON command
cause search strings to be converted to uppercase for the purposes of the search,
unless the strings are enclosed in quotation marks. See the discussion of the FIND
command for details.
By default, the BFIND command locates any occurrence of the specified string,
even where the string is embedded in other strings. You can use any one of the
following options to alter the command's behavior:
PREFIX
causes the search string to match the text string only when the text string
occurs at the beginning of a word.
SUFFIX
causes the search string to match the text string only when the text string
occurs at the end of a word.
WORD
causes the search string to match the text string only when the text string is
a distinct word.
You can use the RFIND command to repeat the most recent BFIND command.
CAPS <ON|OFF>
controls how the FIND, BFIND, and RFIND commands locate matches for a
search string. By default, the FIND, BFIND, and RFIND commands locate only
those text strings that exactly match the search string as it was entered. When
you issue the CAPS command, the FIND, BFIND, and RFIND commands convert
search strings into uppercase for the purposes of searching (displayed text is not
affected), unless the strings are enclosed in quotation marks. Strings in quotation
marks are not affected.
For example, after you issue a CAPS ON command, both of the following
commands locate occurrences of NC but not occurrences of nc:
find NC
find nc
If you omit the ON or OFF argument, then the CAPS command acts as a toggle,
turning the attribute on if it was off or off if it was on.
FIND search-string <NEXT|FIRST|LAST|PREV|ALL>
<PREFIX|SUFFIX|WORD>
locates an occurrence of the specified search-string in the file. The search-string
must be enclosed in quotation marks if it contains embedded blanks.
The text in the search-string must match the text in the file in terms of both
characters and case. For example, the command
find raleigh
will not locate the text Raleigh in the file. You must instead use
find Raleigh
When the CAPS option is used with the PROC FSLIST statement or when a
CAPS ON command is issued in the window, the search string is converted to
uppercase for the purposes of the search, unless the string is enclosed in quotation
marks. In that case, the command
find raleigh
will locate only the text RALEIGH in the file. To locate the text Raleigh, you must
instead use the command
find 'Raleigh'
Display Commands
COLUMN <ON|OFF>
displays a column ruler below the message line in the FSLIST window. The ruler
is helpful when you need to determine the column in which a particular character
is located. If you omit the ON or OFF specification, then the COLUMN command
acts as a toggle, turning the ruler on if it was off and off if it was on.
HEX <ON|OFF>
controls the special hexadecimal display format of the FSLIST window. When the
hexadecimal format is turned on, each line of characters from the file occupies
three lines of the display. The first is the line displayed as characters; the next two
lines of the display show the hexadecimal value of the operating environment's
character codes for the characters in the line of text. The hexadecimal values are
displayed vertically, with the most significant byte on top. If you omit the ON or
OFF specification, then the HEX command acts as a toggle, turning the
hexadecimal format on if it was off and off if it was on.
NUMS <ON|OFF>
controls whether line numbers are shown at the left side of the window. By
default, line numbers are not displayed. If line numbers are turned on, then they
remain at the left side of the display when text in the window is scrolled right and
left. If you omit the ON or OFF argument, then the NUMS command acts as a
toggle, turning line numbering on if it was off or off if it was on.
Other Commands
BROWSE fileref|actual-filename <CC|FORTCC|NOCC <OVP|NOOVP>>
closes the current file and displays the specified file in the FSLIST window. You
can specify either a fileref previously associated with a file or an actual filename
enclosed in quotation marks. The BROWSE command also accepts the same
carriage-control options as the FSLIST command. See "FSLIST Command
Options" on page 493 for details.
END
closes the FSLIST window and ends the FSLIST session.
HELP <command>
opens a Help window that provides information about the FSLIST procedure and
about the commands available in the FSLIST window. To get information about a
specific FSLIST window command, follow the HELP command with the name of
the desired command.
KEYS
opens the KEYS window for browsing and editing function key definitions for the
FSLIST window. The default key definitions for the FSLIST window are stored in
the FSLIST.KEYS entry in the SASHELP.FSP catalog.
If you change any key definitions in the KEYS window, then a new
FSLIST.KEYS entry is created in your personal PROFILE catalog
(SASUSER.PROFILE, or WORK.PROFILE if the SASUSER library is not
allocated).
When the FSLIST procedure is initiated, it looks for function key definitions
first in the FSLIST.KEYS entry in your personal PROFILE catalog. If that entry
does not exist, then the default entry in the SASHELP.FSP catalog is used.
CHAPTER 26
The IMPORT Procedure
Overview: IMPORT Procedure 501
Syntax: IMPORT Procedure 502
PROC IMPORT Statement 502
Data Source Statements 506
Examples: IMPORT Procedure 514
Example 1: Importing a Delimited External File 514
Example 2: Importing a Specific Spreadsheet from an Excel Workbook 517
Example 3: Importing a Subset of Records from an Excel Spreadsheet 518
Example 4: Importing a Microsoft Access Table 519
Example 5: Importing a Specific Spreadsheet from an Excel Workbook on a PC Server 521
Import Data
Restriction:
3 Microsoft Windows.
PROC IMPORT
DATAFILE="filename" | TABLE="tablename"
OUT=<libref.>SAS-data-set <(SAS-data-set-options)>
<DBMS=identifier> <REPLACE>;
<data-source-statement(s);>
All examples
PROC IMPORT
DATAFILE="filename" | TABLE="tablename"
OUT=<libref.>SAS-data-set <(SAS-data-set-options)>
<DBMS=identifier> <REPLACE>;
Required Arguments
DATAFILE="filename"
specifies the complete path and filename or a fileref for the input PC file,
spreadsheet, or delimited external file. If you specify a fileref or if the complete path
and filename does not include special characters (such as the backslash in a path),
lowercase characters, or spaces, you can omit the quotation marks. A fileref is a SAS
name that is associated with the physical location of the input file. To assign a
fileref, use the FILENAME statement. For more information about PC file formats,
see SAS/ACCESS for PC Files: Reference.
Featured in:
page 518
Restriction: PROC IMPORT does not support device types or access methods for
the FILENAME statement except for DISK. For example, PROC IMPORT does not
support the TEMP device type, which creates a temporary external file.
Restriction: For client/server applications: When running SAS/ACCESS software
on UNIX to access data that is stored on a PC server, you must specify the full
path and filename of the file that you want to import. The use of a fileref is not
supported.
Interaction: For some input data sources like a Microsoft Excel spreadsheet, in
order to determine the data type (numeric or character) for a column, the first
eight rows of data are scanned and the most prevalent type of data is used. If
most of the data in the first eight rows is missing, SAS defaults to the character
data type; any subsequent numeric data for that column becomes missing as well.
Mixed data can also create missing values. For example, if the first eight rows
contain mostly character data, SAS assigns the column as a character data type;
any subsequent numeric data for that column becomes missing.
Restriction: PROC IMPORT can import data only if the data type is supported by
SAS. SAS supports numeric and character types of data but not, for example,
binary objects. If the data that you want to import is a type not supported by SAS,
PROC IMPORT may not be able to import it correctly. In many cases, the
procedure attempts to convert the data to the best of its ability; however, for some
types, this is not possible.
Tip: For information about how SAS converts data types, see the specific
information for the data source that you are importing in SAS/ACCESS for PC
Files: Reference. For example, see the chapter "Understanding XLS Essentials" for
a table that lists XLS data types and the resulting SAS variable data type and
formats.
Tip: For a DBF file, if the file was created by Microsoft Visual FoxPro, the file must
be exported by Visual FoxPro into an appropriate dBASE format in order to import
the file to SAS.
TABLE="tablename"
specifies the table name of the input DBMS table. If the name does not include
special characters (such as question marks), lowercase characters, or spaces, you can
omit the quotation marks. Note that the DBMS table name may be case sensitive.
Requirement: When you import a DBMS table, you must specify the DBMS=
option.
Featured in: Example 4 on page 519
OUT=<libref.>SAS-data-set
identifies the output SAS data set with either a one- or two-level SAS name (library
and member name). If the specified SAS data set does not exist, PROC IMPORT
creates it. If you specify a one-level name, by default PROC IMPORT uses either the
USER library (if assigned) or the WORK library (if USER not assigned).
Featured in: All examples
(SAS-data-set-options)
specifies SAS data set options. For example, to assign a password to the resulting
SAS data set, you can use the ALTER=, PW=, READ=, or WRITE= data set option,
or to import only data that meets a specified condition, you can use the WHERE=
data set option. For information about all SAS data set options, see "Data Set
Options" in SAS Language Reference: Dictionary.
Restriction: You cannot specify data set options when importing delimited,
comma-separated, or tab-delimited external files.
Featured in: Example 3 on page 518
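As an illustration of the required arguments together with a data set option, the following is a minimal sketch, not one of the numbered examples. The file path, the sheet name, the output data set name, and the BALANCE column are all hypothetical; DBMS=EXCEL is available only under Microsoft Windows, as the identifier table in this chapter indicates.

```sas
/* The path, sheet name, and BALANCE column are hypothetical. */
proc import datafile='c:\myfiles\accounts.xls'
            out=work.overdue (where=(balance > 1000))  /* keep matching rows only */
            dbms=excel
            replace;
   sheet='Sheet1';   /* name the spreadsheet explicitly              */
   getnames=yes;     /* take variable names from the first row       */
   mixed=yes;        /* treat columns of mixed data as character     */
run;
```

The WHERE= data set option is applied to the output data set, so only rows that satisfy the condition are stored in WORK.OVERDUE. This kind of subsetting is not available for delimited files, per the restriction above.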
Options
DBMS=identifier
specifies the type of data to import. To import a DBMS table, you must specify
DBMS= using a valid database identifier. For example, DBMS=ACCESS specifies to
import a Microsoft Access 2000 or 2002 table. To import PC files, spreadsheets, and
delimited external files, you do not have to specify DBMS= if the filename that is
Identifier    Input Data Source     Extension   Host Availability
ACCESS                              .mdb        Microsoft Windows *
ACCESS97                            .mdb        Microsoft Windows *
ACCESS2000                          .mdb        Microsoft Windows *
ACCESS2002                          .mdb        Microsoft Windows *
ACCESSCS                            .mdb        UNIX
CSV                                 .csv        OpenVMS Alpha, UNIX, Microsoft Windows
DBF                                 .dbf        UNIX, Microsoft Windows
DLM                                 .*          OpenVMS Alpha, UNIX, Microsoft Windows
EXCEL                               .xls        Microsoft Windows *
EXCEL4                              .xls        Microsoft Windows
EXCEL5                              .xls        Microsoft Windows
EXCEL97                             .xls        Microsoft Windows *
EXCEL2000                           .xls        Microsoft Windows *
EXCELCS       Excel spreadsheet     .xls        UNIX
JMP           JMP table             .jmp        UNIX, Microsoft Windows
PCFS          Files on PC server    .*          UNIX
TAB                                 .txt        OpenVMS Alpha, UNIX, Microsoft Windows
WK1                                 .wk1        Microsoft Windows
WK3                                 .wk3        Microsoft Windows
WK4                                 .wk4        Microsoft Windows
To import a Microsoft Access table, PROC IMPORT can distinguish whether the
table is in Access 97, 2000, or 2002 format regardless of your specification. For
example, if you specify DBMS=ACCESS and the table is an Access 97 table,
PROC IMPORT will import the file.
Identifier   Excel 2002   Excel 2000   Excel 97   Excel 5.0   Excel 4.0
EXCEL        yes          yes          yes        yes         yes
EXCEL2002    yes          yes          yes        yes         yes
EXCEL2000    yes          yes          yes        yes         yes
EXCEL97      yes          yes          yes        yes         yes
EXCEL5       no           no           no         yes         yes
EXCEL4       no           no           no         yes         yes
Note: Although Excel 4.0 and Excel 5.0 spreadsheets are often
interchangeable, it is recommended that you specify the exact version.
REPLACE
overwrites an existing SAS data set. If you do not specify REPLACE, PROC
IMPORT does not overwrite an existing data set.
Featured in: Example 1 on page 514
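The effect of REPLACE is easiest to see when the same import is run more than once. The following is a minimal sketch under stated assumptions: the file path and the output data set name are hypothetical, and the file is assumed to hold variable names in its first row.

```sas
/* The path and data set name are hypothetical. */
proc import datafile='c:\myfiles\expenses.csv'
            out=work.expenses
            dbms=csv
            replace;       /* overwrite WORK.EXPENSES if it already exists */
   getnames=yes;           /* read variable names from the first row       */
   datarow=2;              /* start reading data at row 2                  */
run;

/* Review the variable names and types that the import produced. */
proc contents data=work.expenses;
run;
```

Without REPLACE, a second run of the same step leaves the existing WORK.EXPENSES data set unchanged, as described above.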
All examples
PROC IMPORT provides a variety of statements that are specific to the input data
source.
Supported Syntax
Valid Values
Default Value
CSV/TAB
GETNAMES=
YES | NO
YES
DATAROW=
1 to 32767
GUESSING ROWS=
1 to 32767
none
GETNAMES=
YES | NO
YES
DATAROW=
1 to 32767
GUESSINGROWS=
1 to 32767
none
DBF
GETDELETED=
YES | NO
NO
GETNAMES=
YES | NO
YES
RANGE=
Range Name or
Absolute Range Value,
such as A1...C4
DLM
JMP
SHEET=
Sheet Name
EXCEL4 / EXCEL5
GETNAMES=
YES | NO
RANGE=
Range Name or
Absolute Range Value,
such as A1...C4
SHEET=
Sheet Name
YES
Data Source   Supported Syntax   Valid Values                     Default Value
EXCEL         GETNAMES=          YES | NO                         YES
EXCEL97       RANGE=             Range Name or Absolute Range
EXCEL2000                        Value, such as A1...C4
EXCEL2002     SHEET=             Sheet Name
              MIXED=             YES | NO                         NO
              SCANTEXT=          YES | NO                         YES
              SCANTIME=          YES | NO                         YES
              USEDATE=           YES | NO                         YES
              TEXTSIZE=          1 to 32767                       1024
              DBSASLABEL=        COMPAT | NONE                    COMPAT
EXCELCS       VERSION=           5 | 95 | 97 | 2000 | 2002        97
              SERVER=            Server Name
              SERVICE=           Service Name
              PORT=              1 to 32767
              RANGE=             Range Name or Absolute Range
                                 Value, such as A1...C4
              SHEET=             Sheet Name
              SCANTEXT=          YES | NO                         YES
              SCANTIME=          YES | NO                         YES
              USEDATE=           YES | NO                         YES
              TEXTSIZE=          1 to 32767                       1024
              DBSASLABEL=        COMPAT | NONE                    COMPAT
DATAROW=n;
starts reading data from row number n in the external file.
Default:
when GETNAMES=NO
DELIMITER=char | nnx;
for a delimited external file, specifies the delimiter that separates columns of data
in the input file. You can specify the delimiter as a single character or as a
hexadecimal value. For example, if columns of data are separated by an
absolute-range
identifies the top left cell that begins the range and the bottom
right cell that ends the range. For Excel 4.0, 5.0, and 7.0 (95),
the beginning and ending cells are separated by two periods;
that is, C9..F12 specifies a cell range that begins at cell C9,
ends at cell F12, and includes all the cells in between. For
Excel 97, 2000, and 2002, the beginning and ending cells are
separated by a colon; that is, C9:F12.
Tip: For Excel 97, 2000, and 2002, you can include the
PROC IMPORT reads the first spreadsheet in the file. For Excel 97 and later,
PROC IMPORT reads the first spreadsheet from an ascending sort of the
spreadsheet names. To be certain that PROC IMPORT reads the desired
spreadsheet, you should identify the spreadsheet by specifying SHEET=.
Limitation: SAS supports spreadsheet names up to 31 characters. With the $
USEDATE=YES | NO;
If USEDATE=YES, then DATE. format is used for date/time columns in the data
source table while importing data from Excel workbook. If USEDATE=NO, then
DATETIME. format is used for date/time.
VERSION="file-version";
specifies the version of the file that you want to create if the file does not yet
exist on your PC server. The default version is data-source specific. For a Microsoft
Excel workbook, the valid values are 2002, 2000, 97, 95, and 5, and the default
value is 97.
Note: Always quote the version value.
Note: If the file already exists on the PC server, then this value can be
ignored.
Supported Syntax
Valid Values
Default Value
ACCESS
DATABASE=
ACCESS97
DBPWD=
YES
ACCESS2000
UID=
ACCESS2002
PWD=
WGDB=
SCANMEMO=
Database password
User ID
User password
MEMOSIZE=
NO
1024
COMPAT
DBSASLABEL=
YES
YES | NO
SCANTIME=
USEDATE=
YES | NO
YES | NO
1 to 32767
COMPAT | NONE
ACCESSCS
VERSION=
97 | 2000 | 2002
SERVER=
Server Name
SERVICE=
Service Name
PORT=
1 to 32767
DATABASE=
DBPWD=
UID=
PWD=
WGDB=
SCANMEMO=
Database password
2000
YES
YES
YES
1024
COMPAT
User ID
User password
MEMOSIZE=
DBSASLABEL=
YES | NO
SCANTIME=
USEDATE=
YES | NO
YES | NO
1 to 32767
COMPAT | NONE
DATABASE="database";
specifies the complete path and filename of the database that contains the
specified DBMS table. If the database name does not contain lowercase characters,
special characters, or national characters ($, #, or @), you can omit the quotation
marks. You can replace the equal sign with a blank.
Note: A default may be configured in the DBMS client software; however, SAS
does not generate a default value.
DBPWD="database password";
specifies a password that allows access to a database. You can replace the equal
sign with a blank.
DBSASLABEL=COMPAT | NONE;
When DBSASLABEL=COMPAT, the data source's column names are saved as the
corresponding SAS label names. This is the default value.
When DBSASLABEL=NONE, the data source's column names are not saved
as SAS label names. SAS label names are left as nulls.
Featured in:
MEMOSIZE="field-length";
specifies the field length for importing Microsoft Access Memo fields.
Range: 1 to 32,767
Default: 1024
Tip: To prevent Memo fields from being imported, you can specify MEMOSIZE=0.
PORT=1 to 3276;
scans data for its data type from row 1 to the row number that is specied.
Note: This number should be greater than the value that is specified for
DATAROW=.
PWD="password";
specifies the user password used by the DBMS to validate a specific userid. If the
password does not contain lowercase characters, special characters, or national
characters, you can omit the quotation marks. You can replace the equal sign with
a blank.
Note: The DBMS client software may default to the userid and password that
were used to log in to the operating environment; SAS does not generate a default
value.
SCANMEMO=YES | NO;
scans the length of data for memo fields and uses the length of the longest string
data that it finds as the SAS column width. However, if the maximum length that
it finds is greater than what is specified in the MEMOSIZE= option, then the
smaller value that is specified in MEMOSIZE= will be applied as the SAS variable
width.
SCANTIME=YES | NO;
scans all row values for a DATETIME data type field and automatically
determines the TIME data type if only time values (that is, no date or datetime
values) exist in the column.
SERVER="PC-server-name";
specifies the name of the PC server. You must bring up the listener on the PC
server before you can establish a connection to it. You can configure the service
name, port number, maximum number of connections allowed, and use of data
encryption on your PC server. This is a required statement. Refer to your PC
server administrator for the information that is needed. Alias: SERVER_NAME=.
SERVICE="service-name";
specifies the service name that is defined in your service file for your client and
server machines. This statement and the PORT= statement should not be used in
the same procedure. Note that this service name must be defined on both your
UNIX machine and your PC server. Alias: SERVICE_NAME=.
UID= "user-id";
identifies the user to the DBMS. If the userid does not contain lowercase
characters, special characters, or national characters, you can omit the quotation
marks. You can replace the equal sign with a blank.
Note: The DBMS client software may default to the userid and password that
were used to log in to the operating environment; SAS does not generate a default
value.
WGDB= "workgroup-database-name" ;
specifies the workgroup (security) database name that contains the USERID and
PWD data for the DBMS. If the workgroup database name does not contain
lowercase characters, special characters, or national characters, you can omit the
quotation marks. You can replace the equal sign with a blank.
Note: A default workgroup database may be used by the DBMS; SAS does not
generate a default value.
USEDATE=YES | NO;
If USEDATE=YES, then the DATE. format is used for date/time columns in the data
source table when importing data from an Excel workbook. If USEDATE=NO, then
the DATETIME. format is used for date/time columns.
VERSION="file-version";
specifies the version of the file that you want to create if the file does not yet
exist on your PC server. The default version is data-source specific. For a Microsoft
Excel workbook, the valid values are 2002, 2000, 97, 95 and 5, and the default value
is 97.
Note: Always quote the version value.
Note: If the file already exists on the PC server, this value can be ignored.
Full
Specify DBPWD=, PWD=, UID=, and WGDB=.
Each statement has a default value; however, you may find it necessary to provide a
value for each statement explicitly.
PRINT procedure
This example imports the following delimited external file and creates a temporary
SAS data set named WORK.MYDATA:
Region&State&Month&Expenses&Revenue
Southern&GA&JAN2001&2000&8000
Southern&GA&FEB2001&1200&6000
Southern&FL&FEB2001&8500&11000
Northern&NY&FEB2001&3000&4000
Northern&NY&MAR2001&6000&5000
Southern&FL&MAR2001&9800&13500
Northern&MA&MAR2001&1500&1000
Program
Specify the delimiter. The DELIMITER= option specifies that an & (ampersand) delimits data
fields in the input file. The delimiter separates the columns of data in the input file.
delimiter='&';
Generate the variable names from the first row of data in the input file.
getnames=yes;
run;
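The program fragments above do not show the opening PROC IMPORT statement. A minimal, self-contained sketch of the complete step might look like the following. It rests on assumptions that are not in the original: the sample records are first written to a scratch file in the WORK library's directory (the macro variable name DLMFILE and that path are hypothetical, not the location shown in the log), and DBMS=DLM with REPLACE are assumed to be the intended options.

```sas
/* Assumption: build a scratch path inside the WORK directory. */
%let dlmfile = %sysfunc(pathname(work))/delimiter.txt;

/* Write the sample records. Single quotes keep the macro      */
/* processor from treating & as a macro variable reference.    */
data _null_;
   file "&dlmfile";
   put 'Region&State&Month&Expenses&Revenue';
   put 'Southern&GA&JAN2001&2000&8000';
   put 'Southern&GA&FEB2001&1200&6000';
   put 'Southern&FL&FEB2001&8500&11000';
   put 'Northern&NY&FEB2001&3000&4000';
   put 'Northern&NY&MAR2001&6000&5000';
   put 'Southern&FL&MAR2001&9800&13500';
   put 'Northern&MA&MAR2001&1500&1000';
run;

/* Import the delimited file; variable names come from row 1. */
proc import datafile="&dlmfile"
            out=work.mydata
            dbms=dlm
            replace;
   delimiter='&';
   getnames=yes;
run;

proc print data=work.mydata;
run;
```

The informats that PROC IMPORT chooses for each column can differ between SAS releases, so the generated DATA step may not match the log shown in this example line for line.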
Print the WORK.MYDATA data set. PROC PRINT produces a simple listing.
options nodate ps=60 ls=80;
proc print data=mydata;
run;
SAS Log
The SAS log displays information about the successful import. For this example,
PROC IMPORT generates a SAS DATA step, as shown in the partial log that follows.
     /**********************************************************************
79   *   PRODUCT:   SAS
80   *   VERSION:   9.00
81   *   CREATOR:   External File Interface
82   *   DATE:      24JAN02
83   *   DESC:      Generated SAS Datastep Code
84   *   TEMPLATE SOURCE:  (None Specified.)
85   ***********************************************************************/
86      data MYDATA ;
87      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
88      infile 'C:\My Documents\myfiles\delimiter.txt' delimiter = '&' MISSOVER
88 ! DSD lrecl=32767 firstobs=2 ;
89         informat Region $8. ;
90         informat State $2. ;
91         informat Month $7. ;
92         informat Expenses best32. ;
93         informat Revenue best32. ;
94         format Region $8. ;
95         format State $2. ;
96         format Month $7. ;
97         format Expenses best12. ;
98         format Revenue best12. ;
99      input
100                 Region $
101                 State $
102                 Month $
103                 Expenses
104                 Revenue
105     ;
106     if _ERROR_ then call symput('_EFIERR_',1);  /* set ERROR detection
106!  macro variable */
107     run;
NOTE: Numeric values have been converted to character
      values at the places given by: (Line):(Column).
      106:44
NOTE: The infile 'C:\My Documents\myfiles\delimiter.txt' is:
      File Name=C:\My Documents\myfiles\delimiter.txt,
      RECFM=V,LRECL=32767
NOTE: 7 records were read from the infile 'C:\My
      Documents\myfiles\delimiter.txt'.
      The minimum record length was 29.
      The maximum record length was 31.
NOTE: The data set WORK.MYDATA has 7 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.05 seconds
from C:\My
Output
This output lists the output data set, MYDATA, created by PROC IMPORT from the
delimited external file.
Region      State    Month      Expenses    Revenue
Southern    GA       JAN2001        2000       8000
Southern    GA       FEB2001        1200       6000
Southern    FL       FEB2001        8500      11000
Northern    NY       FEB2001        3000       4000
Northern    NY       MAR2001        6000       5000
Southern    FL       MAR2001        9800      13500
Northern    MA       MAR2001        1500       1000
This example imports a specific spreadsheet from an Excel workbook, which contains
multiple spreadsheets, and creates a new, permanent SAS data set named
SASUSER.ACCOUNTS.
Program
Specify the input file. The filename contains the extension .XLS, which PROC IMPORT
recognizes as identifying an Excel 2000 spreadsheet.
proc import datafile="c:\myfiles\Accounts.xls"
Do not generate the variable names from the input file. PROC IMPORT will use default
variable names.
getnames=no;
run;
Print the SASUSER.ACCOUNTS data set. PROC PRINT produces a simple listing. The
OBS= data set option limits the output to the first 10 observations.
proc print data=sasuser.accounts(obs=10);
run;
Output
The following output displays the first 10 observations of the output data set,
SASUSER.ACCOUNTS:
F1                                 F2                      F3
Dharamsala Tea                     10 boxes x 20 bags      18.00
Tibetan Barley Beer                24 - 12 oz bottles      19.00
Licorice Syrup                     12 - 550 ml bottles     10.00
Chef Anton's Cajun Seasoning       48 - 6 oz jars          22.00
Chef Anton's Gumbo Mix             36 boxes                21.35
Grandma's Boysenberry Spread       12 - 8 oz jars          25.00
Uncle Bob's Organic Dried Pears    12 - 1 lb pkgs.         30.00
Northwoods Cranberry Sauce         12 - 12 oz jars         40.00
Mishi Kobe Beef                    18 - 500 g pkgss.       97.00
Fish Roe                           12 - 200 ml jars        31.00
This example imports a subset of an Excel spreadsheet and creates a temporary SAS
data set. The WHERE= SAS data set option is specified in order to import only a subset
of records from the Excel spreadsheet.
Program
Specify the input file.
proc import datafile='c:\Myfiles\Class.xls'
Identify the output SAS data set, and request that only a subset of the records be
imported.
out=work.femaleclass (where=(sex='F'));
run;
Print the new SAS data set. PROC PRINT produces a simple listing.
proc print data=work.femaleclass;
run;
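The WHERE= data set option on OUT= is not specific to Excel. As a minimal sketch that runs without the Class.xls workbook, the following assumes a CSV copy of SASHELP.CLASS written to the WORK directory stands in for the spreadsheet (the macro variable CSVFILE and the file name are hypothetical):

```sas
/* Assumption: a CSV export of SASHELP.CLASS substitutes for the workbook. */
%let csvfile = %sysfunc(pathname(work))/class.csv;

proc export data=sashelp.class
            outfile="&csvfile"
            dbms=csv
            replace;
run;

/* WHERE= is applied as the output data set is written, */
/* so only the rows with Sex='F' are kept.               */
proc import datafile="&csvfile"
            out=work.femaleclass(where=(sex='F'))
            dbms=csv
            replace;
   getnames=yes;
run;

proc print data=work.femaleclass;
run;
```

Because the filter is a data set option, the whole file is still read; only the rows written to WORK.FEMALECLASS are restricted.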
Output
The following output displays the output SAS data set, WORK.FEMALECLASS:
Name       Sex    Age    Height    Weight
Alice      F       13      56.5      84.0
Barbara    F       13      65.3      98.0
Carol      F       14      62.8     102.5
Jane       F       12      59.8      84.5
Janet      F       15      62.5     112.5
Joyce      F       11      51.3      50.5
Judy       F       14      64.3      90.0
Louise     F       12      56.3      77.0
Mary       F       15      66.5     112.0
This example imports a Microsoft Access 97 table and creates a permanent SAS data
set named SASUSER.CUST. The Access table has user-level security, so it is necessary
to specify values for the PWD=, UID=, and WGDB= statements.
Program
Specify the path and filename of the database that contains the table.
database="c:\myfiles\east.mdb";
Specify the workgroup (security) database name that contains the user ID and
password data for the Microsoft Access table.
wgdb="c:\winnt\system32\security.mdb";
Print the SASUSER.CUST data set. PROC PRINT produces a simple listing. The OBS= data
set option limits the output to the first five observations.
proc print data=sasuser.cust(obs=5);
run;
Output
The following output displays the first five observations of the output data set,
SASUSER.CUST.
Name              Street    Zipcode
David Taylor                72511
Theo Barnes                 72513
Lydia Stirog                72516
Anton Niroles               72511
Cheryl Gaspar               72515
Program
proc import dbms=excelcs
datafile="c:\myfiles\Invoice.xls"
out=work.prices;
server='Sales';
service='pcfiles';
sheet='Prices';
getnames=yes;
usedate=no;
run;
proc print data=work.prices(obs=10);
run;
CHAPTER 27
The MEANS Procedure
Overview: MEANS Procedure
What Does the MEANS Procedure Do?
What Types of Output Does PROC MEANS Produce?
Syntax: MEANS Procedure
PROC MEANS Statement
BY Statement
CLASS Statement
FREQ Statement
ID Statement
OUTPUT Statement
TYPES Statement
VAR Statement
WAYS Statement
WEIGHT Statement
Concepts: MEANS Procedure
Using Class Variables
Using TYPES and WAYS Statements
Ordering the Class Values
Computational Resources
Statistical Computations: MEANS Procedure
Computation of Moment Statistics
Confidence Limits
Student's t Test
Quantiles
Results: MEANS Procedure
Missing Values
Column Width for the Output
The N Obs Statistic
Output Data Set
Examples: MEANS Procedure
Example 1: Computing Specific Descriptive Statistics
Example 2: Computing Descriptive Statistics with Class Variables
Example 3: Using the BY Statement with Class Variables
Example 4: Using a CLASSDATA= Data Set with Class Variables
Example 5: Using Multilabel Value Formats with Class Variables
Example 6: Using Preloaded Formats with Class Variables
Example 7: Computing a Confidence Limit for the Mean
Example 8: Computing Output Statistics
Example 9: Computing Different Output Statistics for Several Variables
Example 10: Computing Output Statistics with Missing Class Variable Values
Example 11: Identifying an Extreme Value with the Output Statistics
Example 12: Identifying the Top Three Extreme Values with the Output Statistics
References
Output 27.1
         N          Mean       Std Dev       Minimum       Maximum
------------------------------------------------------------------
        10     5.5000000     3.0276504     1.0000000    10.0000000
------------------------------------------------------------------
Output 27.2
School     Year    N Obs    Variable              N          Mean         Range
-----------------------------------------------------------------------------
Kennedy    1993       20    MoneyRaised          20    28.5660000    23.5600000
                            HoursVolunteered     20    19.2000000    20.0000000
           1994       18    MoneyRaised          18    31.5794444    65.4400000
                            HoursVolunteered     18    24.2777778    15.0000000
Monroe     1992       16    MoneyRaised          16    28.5450000    48.2700000
                            HoursVolunteered     16    18.8125000    38.0000000
           1993       12    MoneyRaised          12    28.0500000    52.4600000
                            HoursVolunteered     12    15.8333333    21.0000000
           1994       28    MoneyRaised          28    29.4100000    73.5300000
                            HoursVolunteered     28    19.1428571    26.0000000
-----------------------------------------------------------------------------
                                            Most       Most        Money          Hours
Obs    School     Year    _TYPE_    _FREQ_    Cash       Time       Raised    Volunteered
  1                  .       0        109     Willard    Tonya       78.65             40
  2               1992       1         31     Tonya      Tonya       55.16             40
  3               1993       1         32     Cameron    Amy         65.44             31
  4               1994       1         46     Willard    L.T.        78.65             33
  5    Kennedy       .       2         53     Luther     Jay         72.22             35
  6    Monroe        .       2         56     Willard    Tonya       78.65             40
  7    Kennedy    1992       3         15     Thelma     Jay         52.63             35
  8    Kennedy    1993       3         20     Bill       Amy         42.23             31
  9    Kennedy    1994       3         18     Luther     Che-Min     72.22             33
 10    Monroe     1992       3         16     Tonya      Tonya       55.16             40
 11    Monroe     1993       3         12     Cameron    Myrtle      65.44             26
 12    Monroe     1994       3         28     Willard    L.T.        78.65             33
In addition to the report, the program also creates an output data set (located on
page 2 of the output) that identifies the students who raised the most money and who
volunteered the most time over all the combinations of School and Year and within the
combinations of School and Year:
- The first observation in the data set shows the students with the maximum values
  overall for MoneyRaised and HoursVolunteered.
- Observations 2 through 4 show the students with the maximum values for each
  year, regardless of school.
- Observations 5 and 6 show the students with the maximum values for each school,
  regardless of year.
- Observations 7 through 12 show the students with the maximum values for each
  school-year combination.
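The program that produced this output is not shown in this section. As a rough sketch of the same technique, the following uses SASHELP.CLASS in place of the charity data (an assumption, as are the output names WORK.EXTREMES, Tallest, and Heaviest). Two class variables give _TYPE_ values 0 through 3, and MAXID records which student holds each maximum.

```sas
/* Sketch only: SASHELP.CLASS substitutes for the data set used above. */
proc means data=sashelp.class n mean range;
   class Sex Age;
   var Height Weight;
   /* MAXID stores the Name of the observation that has the maximum */
   /* Height and the maximum Weight; MAX= stores the maxima in      */
   /* variables named after the analysis variables.                 */
   output out=work.extremes
          maxid(Height(Name) Weight(Name))=Tallest Heaviest
          max=;
run;

/* _TYPE_=0 is the overall row, 1 is by Age, 2 is by Sex, 3 is Sex by Age. */
proc print data=work.extremes;
run;
```

The rightmost class variable (Age here, Year above) corresponds to _TYPE_=1, which is why the per-year rows precede the per-school rows in the listing.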
Statements: BY, CLASS, FREQ, ID, OUTPUT, TYPES, VAR, WAYS, WEIGHT
PROC MEANS statement options: DATA=, NOTRAP, SUMSIZE=, THREADS | NOTHREADS,
CLASSDATA=, COMPLETETYPES, EXCLUSIVE, MISSING, ALPHA=, EXCLNPWGTS, QMARKERS=,
QMETHOD=, QNTLDEF=, statistic-keyword, VARDEF=, FW=, MAXDEC=, NONOBS, NOPRINT,
ORDER=, PRINTALLTYPES, PRINTIDVARS, CHARTYPE, DESCENDTYPES, IDMIN, NWAY
Options
ALPHA=value
specifies the confidence level to compute the confidence limits for the mean. The
percentage for the confidence limits is $(1 - \mathit{value}) \times 100$. For example,
ALPHA=.05 results in a 95% confidence limit.
Default: .05
Range: between 0 and 1
Interaction: To compute confidence limits, specify the statistic-keyword CLM,
LCLM, or UCLM.
See also: Confidence Limits on page 553
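As a small illustration of ALPHA=, the following sketch requests 90% confidence limits for the mean. The choice of SASHELP.CLASS and of the variables Height and Weight is an assumption made only so that the step runs as written.

```sas
/* ALPHA=0.10 gives (1 - 0.10) x 100 = 90% confidence limits. */
proc means data=sashelp.class n mean clm alpha=0.10 maxdec=2;
   var Height Weight;
run;
```

Replacing CLM with only LCLM or only UCLM would request a one-sided limit at the same ALPHA= value.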
CHARTYPE
specifies that the _TYPE_ variable in the output data set is a character
representation of the binary value of _TYPE_. The length of the variable equals the
number of class variables.
Main discussion: Output Data Set on page 557
Interaction: When you specify more than 32 class variables, _TYPE_ automatically
CLASSDATA=SAS-data-set
specifies a data set that contains the combinations of values of the class variables
that must be present in the output. Any combinations of values of the class variables
that occur in the CLASSDATA= data set but not in the input data set appear in the
output and have a frequency of zero.
Restriction: The CLASSDATA= data set must contain all class variables. Their
data type and format must match the corresponding class variables in the input
data set.
Interaction: If you use the EXCLUSIVE option, then PROC MEANS excludes any
observation in the input data set whose combination of class variables is not in the
CLASSDATA= data set.
Tip: Use the CLASSDATA= data set to filter or to supplement the input data set.
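A minimal sketch of CLASSDATA= follows. The data set WORK.SEXLEVELS and the extra level 'U' are hypothetical; Sex is declared as a one-character variable on the assumption that this matches the type and length of Sex in SASHELP.CLASS.

```sas
/* Hypothetical list of class levels that must appear in the output. */
data work.sexlevels;
   length Sex $ 1;
   input Sex $;
   datalines;
F
M
U
;
run;

/* 'U' does not occur in SASHELP.CLASS, so it should be reported */
/* with a frequency of zero.                                      */
proc means data=sashelp.class classdata=work.sexlevels n mean;
   class Sex;
   var Height;
run;
```

Adding EXCLUSIVE to the PROC MEANS statement would instead restrict the analysis to the levels listed in WORK.SEXLEVELS.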
COMPLETETYPES
creates all possible combinations of class variables even if the combination does not
occur in the input data set.
Interaction: The PRELOADFMT option in the CLASS statement ensures that
PROC MEANS writes all user-defined format ranges or values for the
combinations of class variables to the output, even when a frequency is zero.
Tip: Using COMPLETETYPES does not increase the memory requirements.
Featured in: Example 6 on page 570
DATA=SAS-data-set
DESCENDTYPES
DESCENDING | DESCEND
Tip:
EXCLNPWGTS
excludes observations with nonpositive weight values (zero or negative) from the
analysis. By default, PROC MEANS treats observations with negative weights like
those with zero weights and counts them in the total number of observations.
Alias: EXCLNPWGT
See also: WEIGHT= on page 548 and WEIGHT Statement on page 549
EXCLUSIVE
excludes from the analysis all combinations of the class variables that are not found
in the CLASSDATA= data set.
Requirement: If a CLASSDATA= data set is not specified, then this option is
ignored.
FW=field-width
specifies the field width to display the statistics in printed or displayed output. FW=
has no effect on statistics that are saved in an output data set.
Default: 12
Tip: If PROC MEANS truncates column labels in the output, then increase the field
width.
Featured in:
page 567
IDMIN
specifies that the output data set contain the minimum value of the ID variables.
Interaction: Specify PRINTIDVARS to display the value of the ID variables in the
output.
See also: ID Statement on page 540
MAXDEC=number
specifies the maximum number of decimal places to display the statistics in the
printed or displayed output. MAXDEC= has no effect on statistics that are saved in
an output data set.
Default: BEST. width for columnar format, typically about 7.
Range: 0-8
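FW= and MAXDEC= affect only the displayed report. A short sketch, assuming SASHELP.CLASS as the input data set:

```sas
/* Display statistics in 8-character fields with one decimal place. */
/* Values written by an OUTPUT statement would not be affected.     */
proc means data=sashelp.class mean std min max fw=8 maxdec=1;
   var Height Weight;
run;
```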
MISSING
considers missing values as valid values to create the combinations of class variables.
Special missing values that represent numeric values (the letters A through Z and
the underscore (_) character) are each considered as a separate value.
Default: If you omit MISSING, then PROC MEANS excludes the observations with
NONOBS
suppresses the column that displays the total number of observations for each unique
combination of the values of the class variables. This column corresponds to the
_FREQ_ variable in the output data set.
See also: The N Obs Statistic on page 556
Featured in:
NOPRINT
NOTRAP
disables floating point exception (FPE) recovery during data processing. By default,
PROC MEANS traps these errors and sets the statistic to missing.
In operating environments where the overhead of FPE recovery is significant,
NOTRAP can improve performance. Note that normal SAS FPE handling is still in
effect so that PROC MEANS terminates in the case of math exceptions.
NWAY
specifies that the output data set contain only statistics for the observations with the
highest _TYPE_ and _WAY_ values. When you specify class variables, this
corresponds to the combination of all class variables.
Interaction: If you specify a TYPES statement or a WAYS statement, then PROC
ORDER=DATA | FORMATTED | FREQ | UNFORMATTED
specifies the sort order to create the unique combinations for the values of the class
variables in the output, where
DATA
orders values according to their order in the input data set.
Interaction: If you use PRELOADFMT in the CLASS statement, then the order
for the values of each class variable matches the order that PROC FORMAT
uses to store the values of the associated user-defined format. If you use the
CLASSDATA= option, then PROC MEANS uses the order of the unique values
of each class variable in the CLASSDATA= data set to order the output levels.
If you use both options, then PROC MEANS first uses the user-defined formats
to order the output. If you omit EXCLUSIVE, then PROC MEANS appends
after the user-defined format and the CLASSDATA= values the unique values of
the class variables in the input data set based on the order in which they are
encountered.
Tip: By default, PROC FORMAT stores a format definition in sorted order. Use
the NOTSORTED option to store the values or ranges of a user-defined format
in the order that you define them.
FORMATTED
orders values by their ascending formatted values. This order depends on your
operating environment.
Alias: FMT | EXTERNAL
FREQ
orders values by descending frequency count so that levels with the most
observations are listed first.
Interaction: For multiway combinations of the class variables, PROC MEANS
determines the order of a class variable combination from the individual class
variable frequencies.
Interaction: Use the ASCENDING option in the CLASS statement to order
Use NOPRINT when you want to create only an OUT= output data set.
Featured in:
PRINTALLTYPES
displays all requested combinations of class variables (all _TYPE_ values) in the
printed or displayed output. Normally, PROC MEANS shows only the NWAY type.
Alias: PRINTALL
Interaction: If you use the NWAY option, the TYPES statement, or the WAYS
PRINTIDVARS
PRINTIDS
QMARKERS=number
specifies the default number of markers to use for the P2 quantile estimation method.
The number of markers controls the size of fixed memory space.
Default: The default value depends on which quantiles you request. For the median
(P50), number is 7. For the quartiles (P25 and P50), number is 25. For the
quantiles P1, P5, P10, P90, P95, or P99, number is 105. If you request several
quantiles, then PROC MEANS uses the largest value of number.
Range:
Tip: Increase the number of markers above the default settings to improve the
accuracy of the estimate; reduce the number of markers to conserve memory and
computing time.
QMETHOD=OS | P2 | HIST
specifies the method that PROC MEANS uses to process the input data when it
computes quantiles. If the number of observations is less than or equal to the
QMARKERS= value and QNTLDEF=5, then both methods produce the same results.
OS
uses order statistics. This is the same method that PROC UNIVARIATE uses.
Note: This technique can be very memory-intensive.
P2|HIST
uses the P2 method to approximate the quantile.
Default: OS
Restriction: When QMETHOD=P2, PROC MEANS will not compute weighted
quantiles.
Tip: When QMETHOD=P2, reliable estimations of some quantiles (P1, P5, P95, P99)
may not be possible for some data sets.
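The two quantile methods can be compared side by side. This sketch assumes SASHELP.CLASS (19 observations) as input and an illustrative QMARKERS= value of 25; because the number of observations does not exceed the number of markers and QNTLDEF=5 is in effect, the statement above implies that both steps should report the same quantiles.

```sas
/* Order statistics: exact, but memory use grows with the data. */
proc means data=sashelp.class median q1 q3 qmethod=os qntldef=5;
   var Height;
run;

/* P2 approximation with a fixed number of markers (25 is an */
/* illustrative choice, not a recommendation).                */
proc means data=sashelp.class median q1 q3 qmethod=p2 qmarkers=25;
   var Height;
run;
```

On a much larger input data set the P2 results would generally be approximations rather than exact matches.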
QNTLDEF=1 | 2 | 3 | 4 | 5
specifies the mathematical definition that PROC MEANS uses to calculate quantiles
when QMETHOD=OS. To use QMETHOD=P2, you must use QNTLDEF=5.
Default: 5
Alias: PCTLDEF=
statistic-keyword(s)
specifies which statistics to compute and the order to display them in the output.
The available keywords in the PROC statement are
Descriptive statistic keywords
CLM               RANGE
CSS               SKEWNESS|SKEW
CV                STDDEV|STD
KURTOSIS|KURT     STDERR
LCLM              SUM
MAX               SUMWGT
MEAN              UCLM
MIN               USS
N                 VAR
NMISS
Quantile statistic keywords
MEDIAN|P50        Q3|P75
P1                P90
P5                P95
P10               P99
Q1|P25            QRANGE
Requirement: To compute standard error, confidence limits for the mean, and the
Student's t-test, you must use the default value of the VARDEF= option, which is
DF. To compute skewness or kurtosis, you must use VARDEF=N or VARDEF=DF.
Tip: Use CLM or both LCLM and UCLM to compute a two-sided confidence limit
for the mean. Use only LCLM or UCLM to compute a one-sided confidence limit.
SUMSIZE=value
specifies the amount of memory that is available for data summarization when you
use class variables. value may be one of the following:
n|nK|nM|nG
specifies the amount of memory available in bytes, kilobytes, megabytes, or
gigabytes, respectively. If n is 0, then PROC MEANS uses the value of the SAS
system option SUMSIZE=.
MAXIMUM|MAX
specifies the maximum amount of memory that is available.
Tip: For best results, do not make SUMSIZE= larger than the amount of physical
memory that is available for the PROC step. If additional space is needed, then
PROC MEANS uses utility files.
See also: The SAS system option SUMSIZE= in SAS Language Reference:
Dictionary.
Main discussion: Computational Resources on page 552
THREADS | NOTHREADS
enables or disables parallel processing of the input data set. This option overrides
the SAS system option THREADS | NOTHREADS. See SAS Language Reference:
Concepts for more information about parallel processing.
Default: value of SAS system option THREADS | NOTHREADS.
Interaction: PROC MEANS honors the SAS system option THREADS except when
VARDEF=divisor
specifies the divisor to use in the calculation of the variance and standard deviation.
Table 27.1 on page 534 shows the possible values for divisor and associated divisors.
Table 27.1
Value           Divisor                     Formula for Divisor
DF              degrees of freedom          $n - 1$
N               number of observations      $n$
WDF             sum of weights minus one    $(\sum_i w_i) - 1$
WEIGHT | WGT    sum of weights              $\sum_i w_i$
The procedure computes the variance as CSS/divisor, where CSS is the corrected
sum of squares, $\sum_i (x_i - \bar{x})^2$.
Default: DF
Requirement: To compute the standard error of the mean, confidence limits for the
mean, or the Student's t-test, use the default value of VARDEF=.
Tip: When you use the WEIGHT statement and VARDEF=DF, the variance is an
estimate of $\sigma^2$, where the variance of the ith observation is
$\mathrm{var}(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight for the ith observation.
This yields an estimate of the variance of an observation with unit weight.
Tip: When you use the WEIGHT statement and VARDEF=WGT, the computed
variance is asymptotically (for large n) an estimate of $\sigma^2 / \bar{w}$, where
$\bar{w}$ is the average weight. This yields an asymptotic estimate of the variance of
an observation with average weight.
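The effect of the divisor is easiest to see on a tiny weighted data set. The following sketch uses made-up values (the data set WORK.WEIGHTED and the variables x and w are hypothetical) and runs the same analysis with VARDEF=DF and VARDEF=WGT.

```sas
/* Hypothetical data: five values with weights that sum to 9. */
data work.weighted;
   input x w;
   datalines;
1 1
2 2
3 3
4 2
5 1
;
run;

/* Divisor is n - 1 = 4. */
proc means data=work.weighted n mean var vardef=df;
   var x;
   weight w;
run;

/* Divisor is the sum of the weights = 9. */
proc means data=work.weighted n mean var vardef=wgt;
   var x;
   weight w;
run;
```

Assuming that CSS is the weighted corrected sum of squares when a WEIGHT statement is present, the weighted mean here is 27/9 = 3 and CSS is 4 + 2 + 0 + 2 + 4 = 12, so the two variances would be 12/4 = 3 and 12/9, roughly 1.33.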
BY Statement
Produces separate statistics for each BY group.
Main discussion: BY on page 58
Required Arguments
variable
specifies the variable that the procedure uses to form BY groups. You can specify
more than one variable. If you omit the NOTSORTED option in the BY statement,
then the observations in the data set either must be sorted by all the variables that
you specify or must be indexed appropriately. Variables in a BY statement are called
BY variables.
Options
DESCENDING
specifies that the observations are sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The observations are sorted in another way, for example, chronological order.
The requirement for ordering or indexing observations according to the values of
BY variables is suspended for BY-group processing when you use the NOTSORTED
option. In fact, the procedure does not use an index if you specify NOTSORTED. The
procedure defines a BY group as a set of contiguous observations that have the same
values for all BY variables. If observations with the same values for the BY variables
are not contiguous, then the procedure treats each contiguous set as a separate BY
group.
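A minimal sketch of BY-group processing follows, assuming SASHELP.CLASS as the input. Because the BY statement below omits NOTSORTED, the data set is sorted by the BY variable first (WORK.CLASS_SORTED is a hypothetical name).

```sas
/* Sort a copy of the data so that the BY groups are in order. */
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

/* One table of statistics is produced for each value of Sex. */
proc means data=work.class_sorted n mean;
   by Sex;
   var Height Weight;
run;
```

With NOTSORTED, the sort could be skipped, but each contiguous run of equal Sex values would then be reported as its own BY group.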
CLASS Statement
Specifies the variables whose values define the subgroup combinations for the analysis.
Tip: You can use multiple CLASS statements.
Tip: Some CLASS statement options are also available in the PROC MEANS
statement. They affect all CLASS variables. Options that you specify in a CLASS
statement apply only to the variables in that CLASS statement.
See also: For information about how the CLASS statement groups formatted values, see
Formatted Values on page 25.
Featured in: Example 2 on page 560, Example 4 on page 564, Example 5 on page 567,
Example 6 on page 570, and Example 10 on page 578
Required Arguments
variable(s)
specifies one or more variables that the procedure uses to group the data. Variables
in a CLASS statement are referred to as class variables. Class variables are numeric
or character. Class variables can have continuous values, but they typically have a
few discrete values that define levels of the variable. You do not have to sort the data
by class variables.
Interaction: Use the TYPES statement or the WAYS statement to control which
class variables that PROC MEANS uses to group the data.
Tip: To reduce the number of class variable levels, use a FORMAT statement to
combine variable values. When a format combines several internal values into one
formatted value, PROC MEANS outputs the lowest internal value.
See also: Using Class Variables on page 550
Options
ASCENDING
EXCLUSIVE
excludes from the analysis all combinations of the class variables that are not found
in the preloaded range of user-defined formats.
Requirement:
formats.
Featured in:
GROUPINTERNAL
specifies not to apply formats to the class variables when PROC MEANS groups the
values to create combinations of class variables.
Interaction: If you specify the PRELOADFMT option, then PROC MEANS ignores
Tip:
MISSING
considers missing values as valid values for the class variable levels. Special missing
values that represent numeric values (the letters A through Z and the underscore (_)
character) are each considered as a separate value.
Default: If you omit MISSING, then PROC MEANS excludes the observations with
See also:
Featured in:
MLF
enables PROC MEANS to use the primary and secondary format labels for a given
range or overlapping ranges to create subgroup combinations when a multilabel
format is assigned to a class variable.
Requirement: You must use PROC FORMAT and the MULTILABEL option in the
VALUE statement to create a multilabel format.
Interaction: If you use the OUTPUT statement with MLF, then the class variable
contains a character string that corresponds to the formatted value. Because the
formatted value becomes the internal value, the length of this variable is the
number of characters in the longest format label.
Interaction: Using MLF with ORDER=FREQ may not produce the order that you
Tip:
See also: The MULTILABEL option in the VALUE statement of the FORMAT
Note: When the formatted values overlap, one internal class variable value maps
to more than one class variable subgroup combination. Therefore, the sum of the N
statistics for all subgroups is greater than the number of observations in the data set
(the overall N statistic).
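The following sketch shows one way to set up a multilabel format and use it with MLF. The format name AGEGRP and its labels are hypothetical, and SASHELP.CLASS (ages 11 through 16) is assumed as the input.

```sas
/* Hypothetical multilabel format: every age falls in one */
/* narrow band and also in the overall 'All ages' band.   */
proc format;
   value agegrp (multilabel)
      11-13 = '11 to 13'
      14-16 = '14 to 16'
      11-16 = 'All ages';
run;

/* MLF tells PROC MEANS to use every label that applies. */
proc means data=sashelp.class n mean;
   class Age / mlf;
   format Age agegrp.;
   var Height;
run;
```

Because each observation is counted under two labels here, the N values across the three subgroups should add up to twice the number of observations, which illustrates the note above.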
ORDER=DATA | FORMATTED | FREQ | UNFORMATTED
specifies the order to group the levels of the class variables in the output, where
DATA
orders values according to their order in the input data set.
Interaction: If you use PRELOADFMT, then the order of the values of each class
variable matches the order that PROC FORMAT uses to store the values of the
associated user-defined format. If you use the CLASSDATA= option in the
PROC statement, then PROC MEANS uses the order of the unique values of
each class variable in the CLASSDATA= data set to order the output levels. If
you use both options, then PROC MEANS first uses the user-defined formats to
order the output. If you omit EXCLUSIVE in the PROC statement, then PROC
MEANS appends after the user-defined format and the CLASSDATA= values
the unique values of the class variables in the input data set based on the order
in which they are encountered.
Tip: By default, PROC FORMAT stores a format definition in sorted order. Use
the NOTSORTED option to store the values or ranges of a user-defined format
in the order that you define them.
Featured in: Example 10 on page 578
FORMATTED
orders values by their ascending formatted values. This order depends on your
operating environment. If no format has been assigned to a class variable, then
the default format, BEST12., is used.
Alias: FMT | EXTERNAL
Featured in: Example 5 on page 567
FREQ
orders values by descending frequency count so that levels with the most
observations are listed first.
Interaction: For multiway combinations of the class variables, PROC MEANS
determines the order of a level from the individual class variable frequencies.
Interaction: Use the ASCENDING option to order values by ascending frequency
count.
Featured in: Example 5 on page 567
UNFORMATTED
orders values by their unformatted values, which yields the same order as PROC
SORT. This order depends on your operating environment. This sort sequence is
particularly useful for displaying dates chronologically.
Alias: UNFMT | INTERNAL
Default: UNFORMATTED
Tip: By default, all orders except FREQ are ascending. For descending orders, use
the DESCENDING option.
PRELOADFMT
specifies that all formats are preloaded for the class variables.
Requirement: PRELOADFMT has no effect unless you specify either
COMPLETETYPES, EXCLUSIVE, or ORDER=DATA and you assign formats to
the class variables.
Interaction: To limit PROC MEANS output to the combinations of formatted class
variable values present in the input data set, use the EXCLUSIVE option in the
CLASS statement.
Interaction: To include all ranges and values of the user-defined formats in the
output, even when the frequency is zero, use COMPLETETYPES in the PROC
statement.
Featured in: Example 6 on page 570
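A short sketch of PRELOADFMT combined with COMPLETETYPES follows. The format $SEXFMT and its 'Unknown' level are hypothetical; SASHELP.CLASS is assumed as the input.

```sas
/* Hypothetical user-defined format with a level ('U') that */
/* does not occur in the input data.                         */
proc format;
   value $sexfmt
      'F' = 'Female'
      'M' = 'Male'
      'U' = 'Unknown';
run;

/* PRELOADFMT loads every format value; COMPLETETYPES keeps */
/* the levels whose frequency is zero.                       */
proc means data=sashelp.class completetypes n mean;
   class Sex / preloadfmt;
   format Sex $sexfmt.;
   var Height;
run;
```

Without one of COMPLETETYPES, EXCLUSIVE, or ORDER=DATA, the requirement above says that PRELOADFMT would have no effect.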
Computer Resources
The total of unique class values that PROC MEANS allows depends on the amount of
computer memory that is available. See Computational Resources on page 552 for
more information.
The GROUPINTERNAL option can improve computer performance because the
grouping process is based on the internal values of the class variables. If a numeric
class variable is not assigned a format and you do not specify GROUPINTERNAL, then
PROC MEANS uses the default format, BEST12., to format numeric values as
character strings. Then PROC MEANS groups these numeric variables by their
character values, which takes additional time and computer memory.
FREQ Statement
Specifies a numeric variable that contains the frequency of each observation.
Main discussion: FREQ on page 61
FREQ variable;
Required Arguments
variable
specifies a numeric variable whose value represents the frequency of the observation.
If you use the FREQ statement, then the procedure assumes that each observation
represents n observations, where n is the value of variable. If n is not an integer,
then SAS truncates it. If n is less than 1 or is missing, then the procedure does not
use that observation to calculate statistics.
The sum of the frequency variable represents the total number of observations.
Note: The FREQ variable does not affect how PROC MEANS identifies multiple
extremes when you use the IDGROUP syntax in the OUTPUT statement.
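The following sketch illustrates the FREQ statement on a small made-up data set (WORK.SCORES, Score, and Count are hypothetical names). Each row stands for Count identical observations.

```sas
/* Hypothetical summarized data: 3 scores of 10, 1 of 20, 2 of 30. */
data work.scores;
   input Score Count;
   datalines;
10 3
20 1
30 2
;
run;

/* FREQ expands each row to Count observations, so N is the */
/* sum of Count rather than the number of rows.              */
proc means data=work.scores n mean sum;
   var Score;
   freq Count;
run;
```

With these values the step should report N=6 and a sum of 3(10) + 1(20) + 2(30) = 110, as if the six individual scores had been entered one per row.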
ID Statement
Includes additional variables in the output data set.
See Also: Discussion of id-group-specification in OUTPUT Statement on page 540.
ID variable(s);
Required Arguments
variable(s)
identifies one or more variables from the input data set whose maximum values for
groups of observations PROC MEANS includes in the output data set.
Interaction: Use IDMIN in the PROC statement to include the minimum value of
the ID variables in the output data set.
Tip: Use the PRINTIDVARS option in the PROC statement to include the value of
the ID variable in the displayed output.
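A brief sketch of the ID statement with IDMIN, assuming SASHELP.CLASS as the input and WORK.BYSEX and AvgHeight as hypothetical output names:

```sas
/* Without IDMIN, the maximum value of Name in each group is */
/* carried into the output data set; IDMIN selects the       */
/* minimum value instead.                                    */
proc means data=sashelp.class noprint idmin;
   class Sex;
   id Name;
   var Height;
   output out=work.bysex mean=AvgHeight;
run;

proc print data=work.bysex;
run;
```

For a character ID variable such as Name, minimum and maximum refer to the sort order of the values, not to the observation that has the smallest or largest Height.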
OUTPUT Statement
Writes statistics to a new SAS data set.
Tip: You can use multiple OUTPUT statements to create several OUT= data sets.
Featured in: Example 8 on page 575, Example 9 on page 577, Example 10 on page 578,
Options
OUT=SAS-data-set
names the new output data set. If SAS-data-set does not exist, then PROC MEANS
creates it. If you omit OUT=, then the data set is named DATAn, where n is the
smallest integer that makes the name unique.
Default: DATAn
Tip: You can use data set options with the OUT= option. See "Data Set Options" on
page 18 for a list.
output-statistic-specification(s)
specifies the statistics to store in the OUT= data set and names one or more
variables that contain the statistics. The form of the output-statistic-specification is
statistic-keyword<(variable-list)>=<name(s)>
where
statistic-keyword
specifies which statistic to store in the output data set. The available statistic
keywords are
Descriptive statistics keywords
CSS               RANGE
CV                SKEWNESS|SKEW
KURTOSIS|KURT     STDDEV|STD
LCLM              STDERR
MAX               SUM
MEAN              SUMWGT
MIN               UCLM
NMISS             USS
                  VAR
Quantile statistics keywords
P1                P90
P5                P95
P10               P99
Q1|P25            Q3|P75
QRANGE
By default the statistics in the output data set automatically inherit the
analysis variable's format, informat, and label. However, statistics computed for
N, NMISS, SUMWGT, USS, CSS, VAR, CV, T, PROBT, SKEWNESS, and
KURTOSIS will not inherit the analysis variable's format because this format may
be invalid for these statistics (for example, dollar or datetime formats).
Restriction: If you omit variable and name(s), then PROC MEANS allows the
statistic-keyword only once in a single OUTPUT statement, unless you also use
the AUTONAME option.
Featured in: Example 8 on page 575, Example 9 on page 577, Example 11 on
name(s)
specifies one or more names for the variables in the output data set that will contain
the analysis variable statistics. The first name contains the statistic for the first
analysis variable; the second name contains the statistic for the second analysis
variable; and so on.
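Default: the analysis variable name. If you specify AUTONAME, then the default
is the combination of the analysis variable name and the statistic-keyword.
Interaction: If you specify variable-list, then PROC MEANS uses the order in
which you specify the analysis variables to store the statistics in the output
data set variables.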
Featured in: Example 8 on page 575
Default: If you use the CLASS statement and an OUTPUT statement without an
Tip:
id-group-specification
combines the features of and extends the ID statement, the IDMIN option in the PROC
statement, and the MAXID and MINID options in the OUTPUT statement to create
an OUT= data set that identifies multiple extreme values. The form of the
id-group-specification is
IDGROUP (<MIN|MAX (variable-list-1) <...MIN|MAX (variable-list-n)>>
<<MISSING> <OBS> <LAST>> OUT <[n]>
(id-variable-list)=<name(s)>)
MIN|MAX(variable-list)
specifies the selection criteria to determine the extreme values of one or more
input data set variables specified in variable-list. Use MIN to determine the
minimum extreme value and MAX to determine the maximum extreme value.
When you specify multiple selection variables, the ordering of observations for
the selection of n extremes is done the same way that PROC SORT sorts data with
multiple BY variables. PROC MEANS concatenates the variable values into a
single key. The MAX(variable-list) selection criterion is similar to using PROC
SORT and the DESCENDING option in the BY statement.
Default: If you do not specify MIN or MAX, then PROC MEANS uses the
O(
log )
O ( log )
The OUT= data set contains the variables MinX_1, MinX_2, MinY_1, MinY_2,
MinZ_1, and MinZ_2.
(id-variable-list)
identifies one or more input data set variables whose values PROC MEANS
includes in the OUT= data set. PROC MEANS determines which observations to
output by the selection criteria that you specify (MIN, MAX, and LAST).
name(s)
specifies one or more names for variables in the OUT= data set.
Default: If you omit name, then PROC MEANS uses the names of variables in the
id-variable-list.
Tip: Use the AUTONAME option to automatically resolve naming conflicts.
Alias: IDGRP
Tip: When you want the output data set to contain extreme values along with other
ID variables, it is more efficient to include them in the id-variable-list than to
request separate statistics. For example, the statement
CAUTION:
The IDGROUP syntax allows you to create output variables with the same name. When
this happens, only the first variable appears in the output data set. Use the
AUTONAME option to automatically resolve these naming conflicts.
Note: If you specify fewer new variable names than the combination of analysis
variables and identification variables, then the remaining output variables use the
corresponding names of the ID variables as soon as PROC MEANS exhausts the list
of new variable names.
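The IDGROUP syntax can be sketched as follows. The data set BAKE and its variables are hypothetical, and the step is only an illustration of selecting the two largest values of one analysis variable together with an identifying variable.

```sas
/* Hypothetical data: one row per contestant. */
data bake;
   input LastName $ Flavor $ TasteScore;
   datalines;
Orlando Vanilla 80
Ramey Rum 72
Davis Spice 91
Conrad Vanilla 94
Matthew Chocolate 92
;
run;

proc means data=bake noprint;
   var TasteScore;
   /* MAX(TasteScore) is the selection criterion; OUT[2] keeps the   */
   /* two most extreme observations; the id-variable-list names the  */
   /* variables to copy. No names follow the equal sign, so the      */
   /* names of the id variables are used as the basis for the names. */
   output out=top2
          idgroup(max(TasteScore) out[2] (LastName TasteScore)=);
run;

proc print data=top2;
run;
```

Because two extremes are requested, each id variable should appear twice in TOP2, distinguished by numeric suffixes such as _1 and _2.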
maximum-id-specification(s)
specifies that one or more identification variables be associated with the maximum
values of the analysis variables. The form of the maximum-id-specification is
MAXID <(variable-1 <(id-variable-list-1)> <...variable-n
<(id-variable-list-n)>>)>=name(s)
variable
identifies the numeric analysis variable whose maximum values PROC MEANS
determines. PROC MEANS may determine several maximum values for a variable
because, in addition to the overall maximum value, subgroup levels, which are
defined by combinations of class variable values, also have maximum values.
Tip: If you use an ID statement and omit variable, then PROC MEANS uses all
analysis variables.
id-variable-list
identifies one or more variables whose values identify the observations with the
maximum values of the analysis variable.
Default: the ID statement variables
name(s)
specifies the names for new variables that contain the values of the identification
variable associated with the maximum value of each analysis variable.
Tip: If you use an ID statement, and omit variable and id-variable, then PROC
MEANS associates all ID statement variables with each analysis variable. Thus,
for each analysis variable, the number of variables that are created in the output
data set equals the number of variables that you specify in the ID statement.
Tip: Use the AUTONAME option to automatically resolve naming conflicts.
Limitation: If multiple observations contain the maximum value within a class
level, then PROC MEANS saves the value of the ID variable for only the first of
those observations in the output data set.
Featured in: Example 11 on page 580
CAUTION:
The MAXID syntax allows you to create output variables with the same name. When
this happens, only the first variable appears in the output data set. Use the
AUTONAME option to automatically resolve these naming conflicts.
Note: If you specify fewer new variable names than the combination of analysis
variables and identification variables, then the remaining output variables use the
corresponding names of the ID variables as soon as PROC MEANS exhausts the list
of new variable names.
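A minimal sketch of the MAXID syntax follows. The data set BAKE and the output names MaxTaste and TopBaker are hypothetical; the step associates the contestant's name with the maximum score overall and within each class level.

```sas
/* Hypothetical data: one row per contestant. */
data bake;
   input LastName $ Flavor $ TasteScore;
   datalines;
Orlando Vanilla 80
Conrad Vanilla 94
Davis Spice 91
Becker Spice 83
Matthew Chocolate 92
;
run;

proc means data=bake noprint;
   class Flavor;
   var TasteScore;
   /* MaxTaste holds the maximum score; TopBaker holds the LastName */
   /* value from the observation that has that maximum.             */
   output out=best
          max(TasteScore)=MaxTaste
          maxid(TasteScore(LastName))=TopBaker;
run;

proc print data=best;
run;
```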
minid-specification
AUTOLABEL
specifies that PROC MEANS appends the statistic name to the end of the variable
label. If an analysis variable has no label, then PROC MEANS creates a label by
appending the statistic name to the analysis variable name.
Featured in: Example 12 on page 583
AUTONAME
specifies that PROC MEANS creates a unique variable name for an output statistic
when you do not explicitly assign the variable name in the OUTPUT statement. This
is accomplished by appending the statistic-keyword to the end of the input variable
name from which the statistic was derived. For example, the statement
output min(x)=/autoname;
produces the x_Min variable in the output data set. AUTONAME also resolves
conflicts among output variable names, so a statement that requests the same
statistic twice, such as
output min(x)= min(x)=/autoname;
produces two variables, x_Min and x_Min2, in the output data set.
Featured in: Example 12 on page 583
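The naming rule for AUTONAME can be checked with a short sketch like the following. The data set XY and its variables are hypothetical.

```sas
/* Hypothetical data set with two analysis variables. */
data xy;
   input x y;
   datalines;
1 10
2 20
3 30
;
run;

proc means data=xy noprint;
   var x y;
   /* No names follow the equal signs, so AUTONAME builds each name */
   /* from the analysis variable name and the statistic keyword.    */
   output out=stats min= max= / autoname;
run;

/* List the variable names that were generated. */
proc contents data=stats;
run;
```

Following the rule described above, the generated names should take the form x_Min, y_Min, x_Max, and y_Max.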
KEEPLEN
specifies that statistics in the output data set inherit the length of the analysis
variable that PROC MEANS uses to derive them.
CAUTION:
You permanently lose numeric precision when the length of the analysis variable causes
PROC MEANS to truncate or round the value of the statistic. However, the precision of
the statistic will match that of the input.
LEVELS
includes a variable named _LEVEL_ in the output data set. This variable contains a
value from 1 to n that indicates a unique combination of the values of class variables
(the values of the _TYPE_ variable).
Main discussion: Output Data Set on page 557
Featured in: Example 8 on page 575
NOINHERIT
specifies that the variables in the output data set that contain statistics do not
inherit the attributes (label and format) of the analysis variables which are used to
derive them.
Tip: By default, the output data set includes an output variable for each analysis
variable and for five observations that contain N, MIN, MAX, MEAN, and
STDDEV. Unless you specify NOINHERIT, this variable inherits the format of the
analysis variable, which may be invalid for the N statistic (for example, datetime
formats).
WAYS
includes a variable named _WAY_ in the output data set. This variable contains a
value from 1 to the maximum number of class variables that indicates how many
class variables PROC MEANS combines to create the _TYPE_ value.
Main discussion: Output Data Set on page 557
See also: WAYS Statement on page 548
Featured in: Example 8 on page 575
TYPES Statement
Identifies which of the possible combinations of class variables to generate.
Main discussion: "Output Data Set" on page 557
Requirement: CLASS statement
Featured in: Example 2 on page 560, Example 5 on page 567, and Example 12 on page 583
TYPES request(s);
Required Arguments
request(s)
Request                 Equivalent to
types (A B)*(C D);      types A*C A*D B*C B*D;
types (A B C)*D;        types A*D B*D C*D;
Interaction: The CLASSDATA= option places constraints on the NWAY type. PROC
MEANS generates all other types as if derived from the resulting NWAY type.
Tip: Use ( ) to request the overall total (_TYPE_=0).
Tip: If you do not need all types in the output data set, then use the TYPES
statement to specify particular subtypes rather than applying a WHERE clause to
the data set. Doing so saves time and computer memory.
then the B*C analysis (_TYPE_=3) is written first, followed by the A*C analysis
(_TYPE_=5). However, if you specify
class B A C;
types (A B)*C;
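How the order of the CLASS statement determines the _TYPE_ values, and therefore the order of the analyses, can be explored with a sketch such as the following. The data set ABC and the output data sets ORDER1 and ORDER2 are hypothetical.

```sas
/* Hypothetical data with three class variables and one analysis variable. */
data abc;
   input A $ B $ C $ Y;
   datalines;
a1 b1 c1 10
a1 b2 c1 20
a2 b1 c2 30
a2 b2 c2 40
;
run;

/* Same TYPES request, CLASS variables listed as A B C. */
proc means data=abc noprint;
   class A B C;
   types (A B)*C;
   var Y;
   output out=order1 mean=MeanY;
run;

/* Same TYPES request, CLASS variables listed as B A C. */
proc means data=abc noprint;
   class B A C;
   types (A B)*C;
   var Y;
   output out=order2 mean=MeanY;
run;

/* Compare the _TYPE_ values and the order of the two analyses. */
proc print data=order1;
   var _type_ A B C MeanY;
run;
proc print data=order2;
   var _type_ A B C MeanY;
run;
```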
VAR Statement
Identifies the analysis variables and their order in the output.
Default: If you omit the VAR statement, then PROC MEANS analyzes all numeric
variables that are not listed in the other statements. When all variables are character
variables, PROC MEANS produces a simple count of observations.
Tip: You can use multiple VAR statements.
See also: Chapter 46, "The SUMMARY Procedure," on page 1177
Featured in: Example 1 on page 558
Required Arguments
variable(s)
identifies the analysis variables and specifies their order in the results.
Option
WEIGHT=weight-variable
specifies a numeric variable whose values weight the values of the variables that are
specified in the VAR statement. The values of the variable do not have to be integers.
If the value of the weight variable is

Weight value    PROC MEANS
0               counts the observation in the total number of observations
less than 0     converts the value to zero and counts the observation in the total
                number of observations
missing         excludes the observation

To exclude observations that contain negative and zero weights from the analysis,
use EXCLNPWGT. Note that most SAS/STAT procedures, such as PROC GLM,
exclude negative and zero weights by default.
The weight variable does not change how the procedure determines the range,
extreme values, or number of missing values.
Restriction: To compute weighted quantiles, use QMETHOD=OS in the PROC
statement.
Restriction: Skewness and kurtosis are not available with the WEIGHT option.
Tip: When you use the WEIGHT option, consider which value of the VARDEF=
option is appropriate. See the discussion of VARDEF= on page 534.
Note: Prior to Version 7 of SAS, the procedure did not exclude the observations
with missing weights from the count of observations.
WAYS Statement
Specifies the number of ways to make unique combinations of class variables.
WAYS list;
Required Arguments
list
specifies one or more integers that define the number of class variables to combine to
form all the unique combinations of class variables. For example, you can specify 2
for all possible pairs and 3 for all possible triples. The list can be specified in the
following ways:
m
m1 m2 ... mn
m1,m2,...,mn
m TO n <BY increment>
m1,m2, TO m3 <BY increment>, m4
Range: 0 to maximum number of class variables
Example: To create the two-way types for the classification variables A, B, and C,
use
class A B C ;
ways 2;
This WAYS statement is equivalent to specifying a*b, a*c, and b*c in the TYPES
statement.
See also: WAYS option on page 546
WEIGHT Statement
Specifies weights for observations in the statistical calculations.
See also: For information on how to calculate weighted statistics and for an example
that uses the WEIGHT statement, see "WEIGHT" on page 63
WEIGHT variable;
Required Arguments
variable
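specifies a numeric variable whose values weight the values of the analysis variables.
The values of the variable do not have to be integers. If the value of the weight
variable is

Weight value    PROC MEANS
0               counts the observation in the total number of observations
less than 0     converts the value to zero and counts the observation in the total
                number of observations
missing         excludes the observation
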
To exclude observations that contain negative and zero weights from the analysis,
use EXCLNPWGT. Note that most SAS/STAT procedures, such as PROC GLM,
exclude negative and zero weights by default.
Restriction: To compute weighted quantiles, use QMETHOD=OS in the PROC
statement.
Restriction: Skewness and kurtosis are not available with the WEIGHT statement.
Interaction: If you use the WEIGHT= option in a VAR statement to specify a
weight variable, then PROC MEANS uses this variable instead to weight those
VAR statement variables.
Tip: When you use the WEIGHT statement, consider which value of the VARDEF=
option is appropriate. See the discussion of VARDEF= on page 534 and the
calculation of weighted statistics in "Keywords and Formulas" on page 1340 for
more information.
Note: Prior to Version 7 of SAS, the procedure did not exclude the observations
with missing weights from the count of observations.
CAUTION:
Single extreme weight values can cause inaccurate results. When one (and only one)
weight value is many orders of magnitude larger than the other weight values (for
example, 49 weight values of 1 and one weight value of 1×10^14), certain statistics
might not be within acceptable accuracy limits. The affected statistics are those that
are based on the second moment (such as standard deviation, corrected sum of
squares, variance, and standard error of the mean). Under certain circumstances, no
warning is written to the SAS log.
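The treatment of zero, negative, and missing weights can be explored with a sketch like the following. The data set W and its variables Y and Wt are hypothetical; the two steps differ only in the EXCLNPWGT option, so comparing N and SUMWGT between them shows which observations each run counts.

```sas
/* Hypothetical data: positive, zero, negative, and missing weights. */
data w;
   input Y Wt;
   datalines;
10 1
20 2
30 0
40 -1
50 .
;
run;

/* Default handling of nonpositive weights. */
proc means data=w n mean sumwgt;
   var Y;
   weight Wt;
run;

/* EXCLNPWGT excludes observations with negative and zero weights. */
proc means data=w exclnpwgt n mean sumwgt;
   var Y;
   weight Wt;
run;
```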
is equivalent to
proc means;
class a b c d e;
types a*b a*c a*d a*e b*c b*d b*e c*d c*e d*e
a*b*c a*b*d a*b*e a*c*d a*c*e a*d*e
b*c*d b*c*e b*d*e c*d*e;
run;
If you omit the TYPES statement and the WAYS statement, then PROC MEANS uses
all class variables to subgroup the data (the NWAY type) for displayed output and
computes all types (2^k, where k is the number of class variables) for the output data set.
In the example, PROC MEANS does not list male cats before female cats. Instead, it
determines the order of gender for all types over the entire data set. PROC MEANS
found more observations for female pets (f=4, m=3).
Computational Resources
PROC MEANS employs the same memory allocation scheme across all operating
environments. When class variables are involved, PROC MEANS must keep a copy of
each unique value of each class variable in memory. You can estimate the memory
requirements to group the class variable by calculating
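\[ N_{c_1}(L_{c_1} + K) + N_{c_2}(L_{c_2} + K) + \cdots + N_{c_n}(L_{c_n} + K) \]
where $N_{c_i}$ is the number of unique values of class variable $c_i$, $L_{c_i}$ is the
length of $c_i$, and $K$ is a constant amount of overhead for each value.
Each unique combination of class variable values for a given type forms a level of
that type. When every combination occurs in the data, the number of levels of the
type is the product
\[ N_{c_1} \cdots N_{c_n} \]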
Clearly, the memory requirements of the levels overwhelm those of the class variables.
For this reason, PROC MEANS may open one or more utility files and write the levels
of one or more types to disk. These types are either the primary types that PROC
MEANS built during the input data scan or the derived types.
If PROC MEANS must write partially complete primary types to disk while it
processes input data, then one or more merge passes may be required to combine type
levels in memory with those on disk. In addition, if you use an order other than DATA
for any class variable, then PROC MEANS groups the completed types on disk. For this
reason, the peak disk space requirements can be more than twice the memory
requirements for a given type.
When PROC MEANS uses a temporary work file, you will receive the following note
in the SAS log:
Processing on disk occurred during summarization.
Peak disk usage was approximately nnn Mbytes.
Adjusting SUMSIZE may improve performance.
Confidence Limits
With the keywords CLM, LCLM, and UCLM, you can compute confidence limits for
the mean. A confidence limit is a range, constructed around the value of a sample
statistic, that contains the corresponding true population value with given probability
(1 - ALPHA) in repeated sampling.
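A two-sided $100(1-\alpha)\%$ confidence interval for the mean has upper and lower
limits
\[ \bar{x} \pm t_{(1-\alpha/2;\,n-1)} \frac{s}{\sqrt{n}} \]
where $s$ is $\sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$ and $t_{(1-\alpha/2;\,n-1)}$ is the $(1-\alpha/2)$ critical value of the
Student's t statistic with $n-1$ degrees of freedom.
A one-sided $100(1-\alpha)\%$ confidence interval is computed as
\[ \bar{x} + t_{(1-\alpha;\,n-1)} \frac{s}{\sqrt{n}} \quad \text{(upper)} \]
\[ \bar{x} - t_{(1-\alpha;\,n-1)} \frac{s}{\sqrt{n}} \quad \text{(lower)} \]
A two-sided $100(1-\alpha)\%$ confidence interval for the standard deviation has lower
and upper limits
\[ s\sqrt{\frac{n-1}{\chi^2_{(1-\alpha/2;\,n-1)}}}, \qquad s\sqrt{\frac{n-1}{\chi^2_{(\alpha/2;\,n-1)}}} \]
where $\chi^2_{(1-\alpha/2;\,n-1)}$ and $\chi^2_{(\alpha/2;\,n-1)}$ are the $(1-\alpha/2)$ and $\alpha/2$ critical values of the
chi-square statistic with $n-1$ degrees of freedom. A one-sided $100(1-\alpha)\%$
confidence interval is computed by replacing $\alpha/2$ with $\alpha$.
A $100(1-\alpha)\%$ confidence interval for the variance has upper and lower limits that
are equal to the squares of the corresponding upper and lower limits for the standard
deviation.
When you use the WEIGHT statement or WEIGHT= in a VAR statement and the
default value of VARDEF=, which is DF, the $100(1-\alpha)\%$ confidence interval for the
weighted mean has upper and lower limits
\[ \bar{y}_w \pm t_{(1-\alpha/2)} \frac{s_w}{\sqrt{\sum_{i=1}^{n} w_i}} \]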
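The two-sided limits for the mean can be reproduced by hand as a check. The following is a minimal sketch, assuming a hypothetical data set SCORES with one variable x; it requests CLM from PROC MEANS and then recomputes the limits in a DATA step with the TINV function so that the two sets of values can be compared.

```sas
/* Hypothetical sample. */
data scores;
   input x @@;
   datalines;
80 72 75 73 84 91 79 84 83 87
;
run;

/* Limits as computed by PROC MEANS (ALPHA=0.05 gives 95% limits). */
proc means data=scores n mean std clm alpha=0.05;
   var x;
run;

/* Store n, the mean, and the standard deviation. */
proc means data=scores noprint;
   var x;
   output out=stats n=n mean=xbar std=s;
run;

/* Recompute the two-sided limits from the formula above. */
data limits;
   set stats;
   alpha = 0.05;
   tcrit = tinv(1 - alpha/2, n - 1);   /* (1 - alpha/2) critical value */
   lower = xbar - tcrit * s / sqrt(n);
   upper = xbar + tcrit * s / sqrt(n);
run;

proc print data=limits;
   var n xbar s tcrit lower upper;
run;
```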
Student's t Test
PROC MEANS calculates the t statistic as
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
where $\bar{x}$ is the sample mean, $n$ is the number of nonmissing values for a variable, and $s$
is the sample standard deviation. Under the null hypothesis, the population mean
equals $\mu_0$. When the data values are approximately normally distributed, the
probability under the null hypothesis of a t statistic as extreme as, or more extreme
than, the observed value (the p-value) is obtained from the t distribution with $n-1$
degrees of freedom. For large n, the t statistic is asymptotically equivalent to a z test.
When you use the WEIGHT statement or WEIGHT= in a VAR statement and the
default value of VARDEF=, which is DF, the Student's t statistic is calculated as
\[ t_w = \frac{\bar{y}_w - \mu_0}{s_w \big/ \sqrt{\sum_{i=1}^{n} w_i}} \]
where $\bar{y}_w$ is the weighted mean, $s_w$ is the weighted standard deviation, and $w_i$ is the
weight for the ith observation. The $t_w$ statistic is treated as having a Student's t
distribution with $n-1$ degrees of freedom. If you specify the EXCLNPWGT option in
the PROC statement, then n is the number of nonmissing observations when the value
of the WEIGHT variable is positive. By default, n is the number of nonmissing
observations for the WEIGHT variable.
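As an illustrative sketch of the unweighted formula, the following steps request the T and PROBT keywords and then recompute both values in a DATA step. The data set DIFFS and its variable d are hypothetical, and the hand computation assumes a hypothesized mean of 0.

```sas
/* Hypothetical sample of differences. */
data diffs;
   input d @@;
   datalines;
1.2 -0.4 0.8 2.1 0.3 1.7 -0.2 0.9
;
run;

/* t statistic and its two-tailed p-value from PROC MEANS. */
proc means data=diffs n mean std t probt;
   var d;
run;

/* Store n, the mean, and the standard deviation. */
proc means data=diffs noprint;
   var d;
   output out=summ n=n mean=xbar std=s;
run;

/* Recompute t and the two-tailed p-value from the formula above. */
data ttest;
   set summ;
   mu0  = 0;                                   /* assumed null value */
   t    = (xbar - mu0) / (s / sqrt(n));
   pval = 2 * (1 - probt(abs(t), n - 1));      /* t distribution, n-1 df */
run;

proc print data=ttest;
   var n xbar s t pval;
run;
```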
Quantiles
The options QMETHOD=, QNTLDEF=, and QMARKERS= determine how PROC
MEANS calculates quantiles. QNTLDEF= deals with the mathematical definition of a
quantile. See "Quantile and Related Statistics" on page 1345. QMETHOD= deals with
the mechanics of how PROC MEANS handles the input data. The two methods are
OS
reads all data into memory and sorts it by unique value.
P2
accumulates all data into a fixed sample size that is used to approximate the
quantile.
If data set A has 100 unique values for a numeric variable X and data set B has 1000
unique values for numeric variable X, then QMETHOD=OS for data set B will take 10
times as much memory as it does for data set A. If QMETHOD=P2, then both data sets
A and B will require the same memory space to generate quantiles.
The QMETHOD=P2 technique is based on the piecewise-parabolic (P2) algorithm
invented by Jain and Chlamtac (1985). P2 is a one-pass algorithm to determine
quantiles for a large data set. It requires a fixed amount of memory for each variable
for each level within the type. However, simulation studies indicate that reliable
estimates of some quantiles (P1, P5, P95, P99) may not be possible for some data sets,
such as those with heavy-tailed or skewed distributions.
If the number of observations is less than the QMARKERS= value, then
QMETHOD=P2 produces the same results as QMETHOD=OS when QNTLDEF=5. To
compute weighted quantiles, you must use QMETHOD=OS.
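The two methods can be compared on the same data with a sketch like the following. The data set BIG is hypothetical and is filled with simulated normal values; the QMARKERS= value of 105 is an arbitrary odd integer chosen for illustration.

```sas
/* Hypothetical large data set of simulated standard normal values. */
data big;
   do i = 1 to 100000;
      x = rannor(27);
      output;
   end;
   drop i;
run;

/* Order statistics: all data are read into memory and sorted. */
proc means data=big qmethod=os median p95;
   var x;
run;

/* P2 approximation: fixed memory, controlled by QMARKERS=. */
proc means data=big qmethod=p2 qmarkers=105 median p95;
   var x;
run;
```

The second step should report values close to, but not necessarily identical to, those from the first step.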
Missing Values
PROC MEANS excludes missing values for the analysis variables before calculating
statistics. Each analysis variable is treated individually; a missing value for an
observation in one variable does not affect the calculations for other variables. The
statements handle missing values as follows:
- If a class variable has a missing value for an observation, then PROC MEANS
excludes that observation from the analysis unless you use the MISSING option in
the PROC statement or CLASS statement.
- the variable _FREQ_ that contains the number of observations that a given output
level represents.
- the variables requested in the OUTPUT statement that contain the output
statistics and extreme values.
- the variable _STAT_ that contains the names of the default statistics if you omit
statistic keywords.
\[ m \cdot (2^k - 1) \cdot n + 1 \]
where $k$ is the number of class variables, $n$ is the number of observations for the
given BY group in the input data set, and $m$ is 1, 5, or 6.
PROC MEANS determines the actual number of levels for a given type from the
number of unique combinations of each active class variable. A single level is composed
of all input observations whose formatted class values match.
Figure 27.1 on page 558 shows the values of _TYPE_ and the number of observations
in the data set when you specify one, two, and three class variables.
Figure 27.1

C B A   _WAY_   _TYPE_   Subgroup     Number of observations     Total number of observations
                         defined by   of this _TYPE_ and _WAY_   in the data set
                                      in the data set
0 0 0   0       0        Total        1
0 0 1   1       1        A            a                          1+a (one CLASS variable)
0 1 0   1       2        B            b
0 1 1   2       3        A*B          a*b                        1+a+b+a*b (two CLASS variables)
1 0 0   1       4        C            c
1 0 1   2       5        A*C          a*c
1 1 0   2       6        B*C          b*c
1 1 1   3       7        A*B*C        a*b*c                      1+a+b+a*b+c+a*c+b*c+a*b*c (three CLASS variables)

C B A = character binary equivalent of _TYPE_ (CHARTYPE option)
A, B, C = CLASS variables
a, b, c = number of levels of A, B, C, respectively
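The identification variables in the output data set can be inspected with a short sketch such as the following. The data set AB and the output data set ALLTYPES are hypothetical; CHARTYPE, WAYS, and LEVELS are used so that _TYPE_, _WAY_, and _LEVEL_ all appear.

```sas
/* Hypothetical data: two class variables with two levels each. */
data ab;
   input A $ B $ Y;
   datalines;
a1 b1 10
a1 b2 20
a2 b1 30
a2 b2 40
;
run;

proc means data=ab noprint chartype;
   class A B;
   var Y;
   /* With no TYPES or WAYS statement, all types are written. */
   output out=alltypes mean=MeanY / ways levels;
run;

proc print data=alltypes;
run;
```

With two levels for each class variable and every combination present, the count of output observations follows the pattern 1 + a + b + a*b, which is 9 here.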
This example
- specifies the analysis variables
- computes the statistics for the specified keywords and displays them in order
- specifies the field width of the statistics.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the CAKE data set. CAKE contains data from a cake-baking contest: each
participant's last name, age, score for presentation, score for taste, cake flavor, and number of
cake layers. The number of cake layers is missing for two observations. The cake flavor is
missing for another observation.
data cake;
   input LastName $ 1-12 Age 13-14 PresentScore 16-17
         TasteScore 19-20 Flavor $ 23-32 Layers 34;
   datalines;
Orlando     27 93 80  Vanilla    1
Ramey       32 84 72  Rum        2
Goldston    46 68 75  Vanilla    1
Roe         38 79 73  Vanilla    2
Larsen      23 77 84  Chocolate  .
Davis       51 86 91  Spice      3
Strickland  19 82 79  Chocolate  1
Nguyen      57 77 84  Vanilla    .
Hildenbrand 33 81 83  Chocolate  1
Byron       62 72 87  Vanilla    2
Sanders     26 56 79  Chocolate  1
Jaeger      43 66 74             1
Davis       28 69 75  Chocolate  2
Conrad      69 85 94  Vanilla    1
Walters     55 67 72  Chocolate  2
Rossburger  28 78 81  Spice      2
Matthew     42 81 92  Chocolate  2
Becker      36 62 83  Spice      2
Anderson    27 87 85  Chocolate  1
Merritt     62 73 84  Chocolate  1
;
Specify the analyses and the analysis options. The statistic keywords specify the statistics
and their order in the output. FW= uses a field width of eight to display the statistics.
proc means data=cake n mean max min range std fw=8;
Specify the analysis variables. The VAR statement specifies that PROC MEANS calculate
statistics on the PresentScore and TasteScore variables.
var PresentScore TasteScore;
run;
Output
PROC MEANS lists PresentScore first because this is the first variable that is specified in the
VAR statement. A field width of eight truncates the statistics to four decimal places.
This example
- analyzes the data for the two-way combination of class variables and across all
observations
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the GRADE data set. GRADE contains each student's last name, gender, status of
either undergraduate (1) or graduate (2), expected year of graduation, class section (A or B),
final exam score, and final grade for the course.
data grade;
   input Name $ 1-8 Gender $ 11 Status $13 Year $ 15-16
         Section $ 18 Score 20-21 FinalGrade 23-24;
   datalines;
Abbott    F 2 97 A 90 87
Branford  M 1 98 A 92 97
Crandell  M 2 98 B 81 71
Dennison  M 1 97 A 85 72
Edgar     F 1 98 B 89 80
Faust     M 1 97 B 78 73
Greeley   F 2 97 A 82 91
Hart      F 1 98 B 84 80
Isley     M 2 97 A 88 86
Jasper    M 1 97 B 91 93
;
Generate the default statistics and specify the analysis options. Because no statistics are
specified in the PROC MEANS statement, all default statistics (N, MEAN, STD, MIN, MAX) are
generated. MAXDEC= limits the displayed statistics to three decimal places.
proc means data=grade maxdec=3;
Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate
statistics on the Score variable.
var Score;
Specify subgroups for the analysis. The CLASS statement separates the analysis into
subgroups. Each combination of unique values for Status and Year represents a subgroup.
class Status Year;
Specify which subgroups to analyze. The TYPES statement requests that the analysis be
performed on all the observations in the GRADE data set as well as the two-way combination
of Status and Year, which results in four subgroups (because Status and Year each have two
unique values).
types () status*year;
run;
Output
PROC MEANS displays the default statistics for all the observations (_TYPE_=0) and the four
class levels of the Status and Year combination (Status=1, Year=97; Status=1, Year=98;
Status=2, Year=97; Status=2, Year=98).
Status   Year   N Obs    N      Mean   Std Dev   Minimum   Maximum
1        98     3        3    88.333     4.041    84.000    92.000
2        97     3        3    86.667     4.163    82.000    90.000
2        98     1        1    81.000         .    81.000    81.000
SORT procedure
Data set:
This example
- separates the analysis for the combination of class variables within BY values
- shows the sort order requirement for the BY statement
- calculates the minimum, maximum, and median.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Sort the GRADE data set. PROC SORT sorts the observations by the variable Section.
Sorting is required in order to use Section as a BY variable in the PROC MEANS step.
proc sort data=Grade out=GradeBySection;
by section;
run;
Specify the analyses. The statistic keywords specify the statistics and their order in the
output.
proc means data=GradeBySection min max median;
Divide the data set into BY groups. The BY statement produces a separate analysis for each
value of Section.
by Section;
Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate
statistics on the Score variable.
var Score;
Specify subgroups for the analysis. The CLASS statement separates the analysis by the
values of Status and Year. Because there is no TYPES statement in this program, analyses are
performed for each subgroup, within each BY group.
class Status Year;
run;
Output
Final Exam Scores for Student Status and Year of Graduation
Within Each Section
Section=A
Status   Year   N Obs      Minimum      Maximum       Median
1        98     1       92.0000000   92.0000000   92.0000000
2        97     3       82.0000000   90.0000000   88.0000000

Section=B
Status   Year   N Obs      Minimum      Maximum       Median
1        98     2       84.0000000   89.0000000   86.5000000
2        98     1       81.0000000   81.0000000   81.0000000
This example
- specifies the field width and decimal places of the displayed statistics
- uses only the values in CLASSDATA= data set as the levels of the combinations of
class variables
- calculates the range, median, minimum, and maximum
- displays all combinations of the class variables in the analysis.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the CAKETYPE data set. CAKETYPE contains the cake flavors and number of layers
that must occur in the PROC MEANS output.
data caketype;
   input Flavor $ 1-10 Layers 12;
   datalines;
Vanilla    1
Vanilla    2
Vanilla    3
Chocolate  1
Chocolate  2
Chocolate  3
;
Specify the analyses and the analysis options. The FW= option uses a field width of seven
and the MAXDEC= option uses zero decimal places to display the statistics. CLASSDATA= and
EXCLUSIVE restrict the class levels to the values that are in the CAKETYPE data set.
PRINTALLTYPES displays all combinations of class variables in the output.
proc means data=cake range median min max fw=7 maxdec=0
classdata=caketype exclusive printalltypes;
Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate
statistics on the TasteScore variable.
var TasteScore;
Specify subgroups for analysis. The CLASS statement separates the analysis by the values
of Flavor and Layers. Note that these variables, and only these variables, must appear in the
CAKETYPE data set.
class flavor layers;
run;
Output
PROC MEANS calculates statistics for the 13 chocolate and vanilla cakes. Because the
CLASSDATA= data set contains 3 as the value of Layers, PROC MEANS uses 3 as a class value
even though the frequency is zero.
Layers   N Obs   Range   Median   Minimum   Maximum
2        5          20       75        72        92
3        0           .        .         .         .

Flavor      Layers   N Obs   Range   Median   Minimum   Maximum
Chocolate   2        3          20       75        72        92
Vanilla     1        3          19       80        75        94
            2        2          14       80        73        87
            3        0           .        .         .         .
FORMAT procedure
FORMAT statement
Data set: CAKE on page 559
This example
- computes the statistics for the specified keywords and displays them in order
- specifies the field width of the statistics
- suppresses the column with the total number of observations
- analyzes the data for the one-way combination of cake flavor and the two-way
combination of cake flavor and participant's age
- assigns user-defined formats to the class variables
- uses multilabel formats as the levels of class variables
- orders the levels of the cake flavors by the descending frequency count and orders
the levels of age by the ascending formatted values.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=64;
Create the $FLVRFMT. and AGEFMT. formats. PROC FORMAT creates user-defined
formats to categorize the cake flavors and ages of the participants. MULTILABEL creates a
multilabel format for Age. A multilabel format is one in which multiple labels can be assigned to
the same value, in this case because of overlapping ranges. Each value is represented in the
output for each range in which it occurs.
proc format;
   value $flvrfmt
         'Chocolate'='Chocolate'
         'Vanilla'='Vanilla'
         'Rum','Spice'='Other Flavor';
   value agefmt (multilabel)
         15 - 29='below 30 years'
         30 - 50='between 30 and 50'
         51 - high='over 50 years'
         15 - 19='15 to 19'
         20 - 25='20 to 25'
         25 - 39='25 to 39'
         40 - 55='40 to 55'
         56 - high='56 and above';
run;
Specify the analyses and the analysis options. FW= uses a field width of six to display the
statistics. The statistic keywords specify the statistics and their order in the output. NONOBS
suppresses the N Obs column.
proc means data=cake fw=6 n min max median nonobs;
Specify subgroups for the analysis. The CLASS statements separate the analysis by values
of Flavor and Age. ORDER=FREQ orders the levels of Flavor by descending frequency count.
ORDER=FMT orders the levels of Age by ascending formatted values. MLF specifies that
multilabel value formats be used for Age.
class flavor/order=freq;
class age /mlf order=fmt;
Specify which subgroups to analyze. The TYPES statement requests the analysis for the
one-way combination of Flavor and the two-way combination of Flavor and Age.
types flavor flavor*age;
Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate
statistics on the TasteScore variable.
var TasteScore;
Format the output. The FORMAT statement assigns user-defined formats to the Age and
Flavor variables for this analysis.
format age agefmt. flavor $flvrfmt.;
Output
The one-way combination of class variables appears before the two-way combination. A field
width of six truncates the statistics to four decimal places. For the two-way combination of Age
and Flavor, the total number of observations is greater than for the one-way combination of Flavor.
This situation arises because of the multilabel format for Age, which maps one internal value to
more than one formatted value.
The order of the levels of Flavor is based on the frequency count for each level. The order of the
levels of Age is based on the order of the user-defined formats.
Flavor             N      Min      Max   Median
------------------------------------------------
Chocolate          9    72.00    92.00    83.00
Vanilla            6    73.00    94.00    82.00
Other Flavor       4    72.00    91.00    82.00
------------------------------------------------
84.00
84.00
75.00
85.00
81.00
40 to 55
72.00
92.00
82.00
56 and above
84.00
84.00
84.00
below 30 years
75.00
85.00
79.00
between 30 and 50
83.00
92.00
87.50
over 50 years
72.00
84.00
78.00
25 to 39
73.00
80.00
76.50
40 to 55
75.00
75.00
75.00
56 and above
84.00
94.00
87.00
below 30 years
80.00
80.00
80.00
between 30 and 50
73.00
75.00
74.00
over 50 years
Other Flavor
84.00
25 to 39
Vanilla
84.00
94.00
87.00
25 to 39
72.00
83.00
81.00
40 to 55
91.00
91.00
91.00
below 30 years
81.00
81.00
81.00
between 30 and 50
72.00
83.00
77.50
over 50 years
91.00
91.00
91.00
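The effect of a multilabel format on the counts can be seen on a much smaller scale. The following is a minimal sketch, not part of the original example: the AGEGRP. format and the PEOPLE data set are hypothetical names invented for illustration, and only three of the overlapping ranges are used.

```sas
/* Hypothetical multilabel format with overlapping ranges */
proc format;
   value agegrp (multilabel)
         15 - 29 = 'below 30 years'
         20 - 25 = '20 to 25'
         25 - 39 = '25 to 39';
run;

/* Hypothetical data: three observations only */
data people;
   input Age Score;
   datalines;
18 70
25 80
35 90
;

/* MLF asks PROC MEANS to use every label that a value maps to */
proc means data=people n nonobs;
   class Age / mlf;
   var Score;
   format Age agegrp.;
run;
```

Because an age of 25 falls in all three ranges, that single observation should be counted under three labels, so the level counts add up to more than the three observations in the data set. This is the same reason that the two-way table above reports more observations than the one-way table.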
FORMAT procedure
FORMAT statement
Data set: CAKE on page 559
This example
- specifies the field width of the statistics
- suppresses the column with the total number of observations
- includes all possible combinations of class variable values in the analysis even if
  the frequency is zero
- considers missing values as valid class levels
- analyzes the one-way and two-way combinations of class variables
- assigns user-defined formats to the class variables
- uses only the preloaded range of user-defined formats as the levels of class
  variables
- orders the results by the value of the formatted data.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=64;
Create the LAYERFMT. and $FLVRFMT. formats. PROC FORMAT creates user-defined
formats to categorize the number of cake layers and the cake flavors. NOTSORTED keeps
$FLVRFMT unsorted to preserve the original order of the format values.
proc format;
   value layerfmt 1='single layer'
                  2-3='multi-layer'
                  .='unknown';
   value $flvrfmt (notsorted)
                  'Vanilla'='Vanilla'
                  'Orange','Lemon'='Citrus'
                  'Spice'='Spice'
                  'Rum','Mint','Almond'='Other Flavor';
run;
Generate the default statistics and specify the analysis options. FW= uses a field width of
seven to display the statistics. COMPLETETYPES includes class levels with a frequency of zero.
MISSING considers missing values valid values for all class variables. NONOBS suppresses the
N Obs column. Because no specific analyses are requested, all default analyses are performed.
proc means data=cake fw=7 completetypes missing nonobs;
Specify subgroups for the analysis. The CLASS statement separates the analysis by values
of Flavor and Layers. PRELOADFMT and EXCLUSIVE restrict the levels to the preloaded
values of the user-defined formats. ORDER=DATA orders the levels of Flavor and Layers by
formatted data values.
class flavor layers/preloadfmt exclusive order=data;
Specify which subgroups to analyze. The WAYS statement requests one-way and two-way
combinations of class variables.
ways 1 2;
Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate
statistics on the TasteScore variable.
var TasteScore;
Format the output. The FORMAT statement assigns user-defined formats to the Flavor and
Layers variables for this analysis.
format layers layerfmt. flavor $flvrfmt.;
Output
The one-way combination of class variables appears before the two-way combination. PROC
MEANS reports only the level values that are listed in the preloaded range of user-defined
formats even when the frequency of observations is zero (in this case, citrus). PROC MEANS
rejects entire observations based on the exclusion of any single class value in a given
observation. Therefore, when the number of layers is unknown, statistics are calculated for only
one observation. The other observation is excluded because the flavor chocolate was not
included in the preloaded user-defined format for Flavor.
The order of the levels is based on the order of the user-defined formats. PROC FORMAT
automatically sorted the Layers format and did not sort the Flavor format.
Layers              N       Mean    Std Dev    Minimum    Maximum
--------------------------------------------------------------
unknown             1     84.000          .     84.000     84.000
single layer        3     83.000      9.849     75.000     94.000
multi-layer         6     81.167      7.548     72.000     91.000
--------------------------------------------------------------

Analysis Variable : TasteScore
Flavor              N       Mean    Std Dev    Minimum    Maximum
--------------------------------------------------------------
Vanilla             6     82.167      7.834     73.000     94.000
Citrus              0          .          .          .          .
Spice               3     85.000      5.292     81.000     91.000
Other Flavor        1     72.000          .     72.000     72.000
--------------------------------------------------------------

Analysis Variable : TasteScore
Flavor
Layers
N
Mean
Std Dev
Minimum
Maximum
-----------------------------------------------------------------------------Vanilla
unknown
1
84.000
.
84.000
84.000
single layer
75.000
94.000
80.000
9.899
73.000
87.000
unknown
multi-layer
unknown
single layer
multi-layer
Other Flavor
9.849
single layer
Spice
83.000
multi-layer
Citrus
85.000
5.292
81.000
91.000
unknown
single layer
multi-layer
72.000
72.000
72.000
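The zero-frequency Citrus rows come from preloading the format. The following is a minimal sketch of the same mechanism on a small scale, not part of the original example: the $SIZEFMT. format and the SHIRTS data set are hypothetical names, and the data deliberately contain no observation for one of the formatted levels.

```sas
/* Hypothetical format; NOTSORTED keeps the levels in the order listed */
proc format;
   value $sizefmt (notsorted)
         'S'='Small'
         'M'='Medium'
         'L'='Large';
run;

/* Hypothetical data with no 'M' observations */
data shirts;
   input Size $ Price;
   datalines;
S 10
S 12
L 15
;

/* PRELOADFMT loads all format levels; COMPLETETYPES keeps empty ones */
proc means data=shirts completetypes nonobs n mean;
   class Size / preloadfmt order=data;
   var Price;
   format Size $sizefmt.;
run;
```

Under these assumptions the Medium level should still appear in the report, with a count of zero and a missing mean, in the same way that Citrus appears in the output above.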
This example
- specifies the field width and number of decimal places of the statistics
- computes a two-sided 90 percent confidence limit for the mean values of
  MoneyRaised and HoursVolunteered for the three years of data.
If this data is representative of a larger population of volunteers, then the confidence
limits provide ranges of likely values for the true population means.
Program
Create the CHARITY data set. CHARITY contains information about high-school students'
volunteer work for a charity. The variables give the name of the high school, the year of the
fund-raiser, the first name of each student, the amount of money each student raised, and the
number of hours each student volunteered. A DATA step on page 1378 creates this data set.
data charity;
   input School $ 1-7 Year 9-12 Name $ 14-20 MoneyRaised 22-26
         HoursVolunteered 28-29;
   datalines;
Monroe  1992 Allison 31.65 19
Monroe  1992 Barry   23.76 16
Monroe  1992 Candace 21.11  5
   . . . more data lines . . .
Kennedy 1994 Sid     27.45 25
Kennedy 1994 Will    28.88 21
Kennedy 1994 Morty   34.44 25
;
Specify the analyses and the analysis options. FW= uses a field width of eight and
MAXDEC= uses two decimal places to display the statistics. ALPHA=0.1 specifies a 90%
confidence limit, and the CLM keyword requests two-sided confidence limits. MEAN and STD
request the mean and the standard deviation, respectively.
proc means data=charity fw=8 maxdec=2 alpha=0.1 clm mean std;
Specify subgroups for the analysis. The CLASS statement separates the analysis by values
of Year.
class Year;
Specify the analysis variables. The VAR statement specifies that PROC MEANS calculate
statistics on the MoneyRaised and HoursVolunteered variables.
var MoneyRaised HoursVolunteered;
Output
PROC MEANS displays the lower and upper confidence limits for both variables for each year.
32
1994
46
MoneyRaised
HoursVolunteered
25.17
15.86
31.58
20.02
28.37
17.94
10.69
6.94
MoneyRaised
26.73
33.78
30.26
14.23
HoursVolunteered
19.68
22.63
21.15
5.96
-----------------------------------------------------------------------------
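The two-sided limits that CLM reports can be checked by hand with the usual t-based formula, mean plus or minus t times the standard error. The following is a minimal sketch, not part of the original example: the CLM_CHECK data set and its variable names are hypothetical, and the inputs are the rounded 1994 MoneyRaised values read from the output above (N of 46, mean of 30.26, standard deviation of 14.23).

```sas
data clm_check;
   /* Rounded values copied from the 1994 MoneyRaised row (assumption) */
   n     = 46;
   mean  = 30.26;
   std   = 14.23;
   alpha = 0.1;                        /* matches ALPHA=0.1 */

   t          = tinv(1 - alpha/2, n - 1);  /* two-sided critical value */
   half_width = t * std / sqrt(n);         /* t times the standard error */
   lower      = mean - half_width;
   upper      = mean + half_width;

   put lower=8.2 upper=8.2;            /* writes the limits to the log */
run;
```

Because the inputs are already rounded to two decimal places, the limits written to the log should agree with the displayed limits only up to rounding in the last digit.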
PRINT procedure
Data set:
This example
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the analysis options. NOPRINT suppresses the display of all PROC MEANS output.
proc means data=Grade noprint;
Specify subgroups for the analysis. The CLASS statement separates the analysis by values
of Status and Year.
class Status Year;
Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate
statistics on the FinalGrade variable.
var FinalGrade;
Specify the output data set options. The OUTPUT statement creates the SUMSTAT data
set and writes the mean value for the final grade to the new variable AverageGrade. IDGROUP
writes the name of the student with the top exam score to the variable BestScore and the
observation number that contained the top score. WAYS and LEVELS write information on how
the class variables are combined.
output out=sumstat mean=AverageGrade
idgroup (max(score) obs out (name)=BestScore)
/ ways levels;
run;
Print the output data set WORK.SUMSTAT. The NOOBS option suppresses the observation
numbers.
proc print data=sumstat noobs;
title1 'Average Undergraduate and Graduate Course Grades';
title2 'For Two Years';
run;
Output
The first observation contains the average course grade and the name of the student with the
highest exam score over the two-year period. The next four observations contain values for each
class variable value. The remaining four observations contain values for the Year and Status
combination. The variables _WAY_, _TYPE_, and _LEVEL_ show how PROC MEANS created
the class variable combinations. The variable _OBS_ contains the observation number in the
GRADE data set that contained the highest exam score.
                                                    Average    Best
Status   Year   _WAY_   _TYPE_   _LEVEL_   _FREQ_     Grade    Score      _OBS_
     .      .       0        0         1       10   83.0000    Branford       2
     .     97       1        1         1        6   83.6667    Jasper        10
     .     98       1        1         2        4   82.0000    Branford       2
     1      .       1        2         1        6   82.5000    Branford       2
     2      .       1        2         2        4   83.7500    Abbott         1
     1     97       2        3         1        3   79.3333    Jasper        10
     1     98       2        3         2        3   85.6667    Branford       2
     2     97       2        3         3        3   88.0000    Abbott         1
     2     98       2        3         4        1   71.0000    Crandell       3
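The _TYPE_ variable makes it easy to pull one kind of subgroup out of the output data set. The following is a minimal sketch, not part of the original example, that assumes the SUMSTAT data set created by the program above and prints only the Status and Year combinations.

```sas
/* Keep only the two-way (Status by Year) observations */
proc print data=sumstat noobs;
   where _type_ = 3;
run;
```

With two class variables, _TYPE_=3 identifies the observations in which both class variables contribute, so under this assumption the listing should contain the last four observations shown above. Selecting on _WAY_=2 would be an equivalent filter for this data set.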
PRINT procedure
WHERE= data set option
Data set:
This example
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the analysis options. NOPRINT suppresses the display of all PROC MEANS output.
DESCEND orders the observations in the OUT= data set by descending _TYPE_ value.
proc means data=Grade noprint descend;
Specify subgroups for the analysis. The CLASS statement separates the analysis by values
of Status and Year.
class Status Year;
Specify the analysis variables. The VAR statement specifies that PROC MEANS calculate
statistics on the Score and FinalGrade variables.
var Score FinalGrade;
Specify the output data set options. The OUTPUT statement writes the mean for Score and
FinalGrade to variables of the same name. The median final grade is written to the variable
MedianGrade. The WHERE= data set option restricts the observations in SUMDATA. One
observation contains overall statistics (_TYPE_=0). The remainder must have a status of 1.
output out=Sumdata (where=(status=1 or _type_=0))
mean= median(finalgrade)=MedianGrade;
run;
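The listing that follows includes an Obs column, which suggests that the output data set was displayed with PROC PRINT, but that step does not appear in this section. The following is a minimal sketch of such a step, offered as an assumption rather than as the original program; it uses only the SUMDATA data set created above and adds no titles.

```sas
/* Display the subset of observations kept by the WHERE= data set option */
proc print data=Sumdata;
run;
```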
Output
The first three observations contain statistics for the class variable levels with a status of 1.
The last observation contains the statistics for all the observations (no subgroup). Score
contains the mean test score, FinalGrade contains the mean final grade, and MedianGrade
contains the median final grade.
                                                     Final    Median
Obs   Status   Year   _TYPE_   _FREQ_     Score      Grade     Grade
  1        1     97        3        3   84.6667    79.3333        73
  2        1     98        3        3   88.3333    85.6667        80
  3        1      .        2        6   86.5000    82.5000        80
  4        .      .        0       10   86.0000    83.0000        83
Example 10: Computing Output Statistics with Missing Class Variable Values
Procedure features:
ASCENDING
MISSING
ORDER=
OUTPUT statement
Other features:
PRINT procedure
Data set:
This example
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the analysis options. NWAY prints observations with the highest _TYPE_ value.
NOPRINT suppresses the display of all PROC MEANS output.
proc means data=cake nway noprint;
Specify subgroups for the analysis. The CLASS statements separate the analysis by Flavor
and Layers. ORDER=FREQ and ASCENDING order the levels of Flavor by ascending
frequency. MISSING uses missing values of Layers as a valid class level value.
class flavor /order=freq ascending;
class layers /missing;
Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate
statistics on the TasteScore variable.
var TasteScore;
Specify the output data set options. The OUTPUT statement creates the CAKESTAT data
set and outputs the maximum value for the taste score to the new variable HighScore.
output out=cakestat max=HighScore;
run;
Output
The CAKESTAT output data set contains only observations for the combination of both class
variables, Flavor and Layers. Therefore, the value of _TYPE_ is 3 for all observations. The
observations are ordered by ascending frequency of Flavor. The missing value in Layers is a
valid value for this class variable. PROC MEANS excludes the observation with the missing
flavor because it is an invalid value for Flavor.
                                                   High
Obs    Flavor       Layers    _TYPE_    _FREQ_    Score
  1    Rum               2         3         1       72
  2    Spice             2         3         2       83
  3    Spice             3         3         1       91
  4    Vanilla           .         3         1       84
  5    Vanilla           1         3         3       94
  6    Vanilla           2         3         2       87
  7    Chocolate         .         3         1       84
  8    Chocolate         1         3         5       85
  9    Chocolate         2         3         3       92
CLASS statement
OUTPUT statement options:
statistic keyword
MAXID
Other features:
PRINT procedure
Data set:
This example
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the analyses. The statistic keywords specify the statistics and their order in the
output. CHARTYPE writes the _TYPE_ values as binary characters in the output data set.
proc means data=Charity n mean range chartype;
Specify subgroups for the analysis. The CLASS statement separates the analysis by School
and Year.
class School Year;
Specify the analysis variables. The VAR statement specifies that PROC MEANS calculate
statistics on the MoneyRaised and HoursVolunteered variables.
var MoneyRaised HoursVolunteered;
Specify the output data set options. The OUTPUT statement writes the new variables,
MostCash and MostTime, which contain the names of the students who collected the most
money and volunteered the most time, respectively, to the PRIZE data set.
output out=Prize maxid(MoneyRaised(name)
HoursVolunteered(name))= MostCash MostTime
max= ;
Output
The first page of output shows the output from PROC MEANS with the statistics for six class
levels: one for Monroe High for the years 1992, 1993, and 1994; and one for Kennedy High for
the same three years.
MoneyRaised
HoursVolunteered
20
20
28.5660000
19.2000000
23.5600000
20.0000000
1994
18
MoneyRaised
HoursVolunteered
18
18
31.5794444
24.2777778
65.4400000
15.0000000
1992
16
MoneyRaised
HoursVolunteered
16
16
28.5450000
18.8125000
48.2700000
38.0000000
1993
Monroe
20
12
MoneyRaised
HoursVolunteered
12
12
28.0500000
15.8333333
52.4600000
21.0000000
1994
28
MoneyRaised
28
29.4100000
73.5300000
HoursVolunteered
28
19.1428571
26.0000000
-----------------------------------------------------------------------------
The output from PROC PRINT shows the maximum MoneyRaised and HoursVolunteered values
and the names of the students who are responsible for them. The first observation contains the
overall results, the next three contain the results by year, the next two contain the results by
school, and the final six contain the results by School and Year.
                                      Most       Most       Money       Hours
Obs  School   Year  _TYPE_  _FREQ_    Cash       Time      Raised   Volunteered
  1              .    00      109     Willard    Tonya      78.65        40
  2           1992    01       31     Tonya      Tonya      55.16        40
  3           1993    01       32     Cameron    Amy        65.44        31
  4           1994    01       46     Willard    L.T.       78.65        33
  5  Kennedy     .    10       53     Luther     Jay        72.22        35
  6  Monroe      .    10       56     Willard    Tonya      78.65        40
  7  Kennedy  1992    11       15     Thelma     Jay        52.63        35
  8  Kennedy  1993    11       20     Bill       Amy        42.23        31
  9  Kennedy  1994    11       18     Luther     Che-Min    72.22        33
 10  Monroe   1992    11       16     Tonya      Tonya      55.16        40
 11  Monroe   1993    11       12     Cameron    Myrtle     65.44        26
 12  Monroe   1994    11       28     Willard    L.T.       78.65        33
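Because CHARTYPE stores _TYPE_ as a character string of binary digits, a subset of the output data set is selected with a quoted value rather than a number. The following is a minimal sketch, not part of the original example, that assumes the PRIZE data set created by the program above.

```sas
/* With CHARTYPE, _TYPE_ is character: '11' means both School and Year */
proc print data=Prize;
   where _type_ = '11';
run;
```

Under this assumption the listing should contain only the six School and Year combinations. Each digit position corresponds to one class variable in CLASS statement order, so '10' would select the by-school observations and '01' the by-year observations.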
Example 12: Identifying the Top Three Extreme Values with the Output Statistics
Procedure features:
FORMAT procedure
FORMAT statement
PRINT procedure
RENAME = data set option
Data set: CHARITY on page 573
This example
- suppresses the display of PROC MEANS output
- analyzes the data for the one-way combination of the class variables and across all
  observations
- stores the total and average amount of money raised in new variables
- stores in new variables the top three amounts of money raised, the names of the
  three students who raised the money, the years when it occurred, and the schools
  the students attended
- automatically resolves conflicts in the variable names when names are assigned to
  the new variables in the output data set
- appends the statistic name to the label of the variables in the output data set that
  contain statistics that were computed for the analysis variable
- assigns a format to the analysis variable so that the statistics that are computed
  from this variable inherit the attribute in the output data set.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the YRFMT. and $SCHFMT. formats. PROC FORMAT creates user-defined formats
that assign the value of All to the missing levels of the class variables.
proc format;
   value yrFmt . = " All";
   value $schFmt ' ' = "All    ";
run;
Generate the default statistics and specify the analysis options. NOPRINT suppresses
the display of all PROC MEANS output.
proc means data=Charity noprint;
Specify subgroups for the analysis. The CLASS statement separates the analysis by values
of School and Year.
class School Year;
Specify which subgroups to analyze. The TYPES statement requests the analysis across all
the observations and for each one-way combination of School and Year.
types () school year;
Specify the analysis variable. The VAR statement specifies that PROC MEANS calculate
statistics on the MoneyRaised variable.
var MoneyRaised;
Specify the output data set options. The OUTPUT statement creates the TOP3LIST data
set. RENAME= renames the _FREQ_ variable that contains the frequency count for each class level.
SUM= and MEAN= specify that the sum and mean of the analysis variable (MoneyRaised) are
written to the output data set. IDGROUP writes 12 variables that contain the top three
amounts of money raised and the three corresponding students, schools, and years.
AUTOLABEL appends the statistic name to the label for the output variables that
contain the sum and mean. AUTONAME resolves naming conflicts for these variables.
output out=top3list(rename=(_freq_=NumberStudents))sum= mean=
idgroup( max(moneyraised) out[3] (moneyraised name
school year)=)/autolabel autoname;
Format the output. The LABEL statement assigns a label to the analysis variable
MoneyRaised. The FORMAT statement assigns user-defined formats to the Year and School
variables and a SAS dollar format to the MoneyRaised variable.
label MoneyRaised='Amount Raised';
format year yrfmt. school $schfmt.
moneyraised dollar8.2;
run;
Display information about the TOP3LIST data set. PROC DATASETS displays the contents
of the TOP3LIST data set. NOLIST suppresses the directory listing for the WORK data library.
proc datasets library=work nolist;
contents data=top3list;
title1 'Contents of the PROC MEANS Output Data Set';
run;
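The same IDGROUP syntax can select the smallest values instead of the largest. The following is a minimal sketch of that variation, not part of the original example: it assumes the CHARITY data set used above, and the output data set name BOTTOM3LIST is hypothetical.

```sas
/* Variation: the three smallest amounts raised in each subgroup */
proc means data=Charity noprint;
   class School Year;
   types () school year;
   var MoneyRaised;
   output out=bottom3list              /* hypothetical data set name */
          idgroup( min(moneyraised) out[3] (moneyraised name
                   school year)=) / autoname;
run;
```

As in the original program, OUT[3] asks for three extreme observations per subgroup, and the empty name list after the equal sign leaves the naming of the new variables to PROC MEANS, which numbers them with the suffixes _1, _2, and _3.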
Output
The output from PROC PRINT shows the top three values of MoneyRaised, the names of the
students who raised these amounts, the schools the students attended, and the years when the
money was raised. The first observation contains the overall results, the next three contain the
results by year, and the final two contain the results by school. The missing class levels for
School and Year are replaced with the value ALL.
The labels for the variables that contain statistics that were computed from MoneyRaised
include the statistic name at the end of the label.
                                          Money     Money
                              Number    Raised_   Raised_     Money     Money     Money
Obs  School   Year  _TYPE_  Students        Sum      Mean  Raised_1  Raised_2  Raised_3
  1  All       All     0       109     $3192.75    $29.29    $78.65    $72.22    $65.44
  2  All      1992     1        31      $892.92    $28.80    $55.16    $53.76    $52.63
  3  All      1993     1        32      $907.92    $28.37    $65.44    $47.33    $42.23
  4  All      1994     1        46     $1391.91    $30.26    $78.65    $72.22    $56.87
  5  Kennedy   All     2        53     $1575.95    $29.73    $72.22    $52.63    $43.89
  6  Monroe    All     2        56     $1616.80    $28.87    $78.65    $65.44    $56.87

Obs  Name_1   Name_2   Name_3   School_1  School_2  School_3  Year_1  Year_2  Year_3
  1  Willard  Luther   Cameron  Monroe    Kennedy   Monroe      1994    1994    1993
  2  Tonya    Edward   Thelma   Monroe    Monroe    Kennedy     1992    1992    1992
  3  Cameron  Myrtle   Bill     Monroe    Monroe    Kennedy     1993    1993    1993
  4  Willard  Luther   L.T.     Monroe    Kennedy   Monroe      1994    1994    1994
  5  Luther   Thelma   Jenny    Kennedy   Kennedy   Kennedy     1994    1992    1992
  6  Willard  Cameron  L.T.     Monroe    Monroe    Monroe      1994    1993    1994
WORK.TOP3LIST
DATA
V9
18:59 Thursday, March 14, 2002
18:59 Thursday, March 14, 2002
WINDOWS
wlatin1
Observations            6
Variables               18
Indexes                 0
Observation Length      144
Deleted Observations    0
Compressed              NO
Sorted                  NO
Western (Windows)
12288
1
1
85
6
0
filename
9.0000B0
WIN_PRO
Variable            Type    Len    Format       Label
MoneyRaised_1       Num       8    DOLLAR8.2    Amount Raised
MoneyRaised_2       Num       8    DOLLAR8.2    Amount Raised
MoneyRaised_3       Num       8    DOLLAR8.2    Amount Raised
MoneyRaised_Mean    Num       8    DOLLAR8.2    Amount Raised_Mean
MoneyRaised_Sum     Num       8    DOLLAR8.2    Amount Raised_Sum
Name_1              Char      7
Name_2              Char      7
Name_3              Char      7
NumberStudents      Num       8
School              Char      7    $SCHFMT.
School_1            Char      7    $SCHFMT.
School_2            Char      7    $SCHFMT.
School_3            Char      7    $SCHFMT.
Year                Num       8    YRFMT.
Year_1              Num       8    YRFMT.
Year_2              Num       8    YRFMT.
Year_3              Num       8    YRFMT.
_TYPE_              Num       8
See the TEMPLATE procedure in SAS Output Delivery System: User's Guide for an
example of how to create a custom table definition for this output data set.
References
Jain, R. and Chlamtac, I. (1985), "The P2 Algorithm for Dynamic Calculation of
Quantiles and Histograms Without Storing Observations," Communications of the
Association for Computing Machinery, 28:10.
CHAPTER
28
The MIGRATE Procedure
Information about the MIGRATE Procedure
589
See:
CHAPTER
29
The OPTIONS Procedure
Overview: OPTIONS Procedure 591
What Does the OPTIONS Procedure Do? 591
What Types of Output Does PROC OPTIONS Produce? 591
Displaying the Settings of a Group of Options 593
Syntax: OPTIONS Procedure 595
PROC OPTIONS Statement 595
Results: OPTIONS Procedure 596
Examples: OPTIONS Procedure 597
Example 1: Producing the Short Form of the Options Listing 597
Example 2: Displaying the Setting of a Single Option 598
For information about SAS system options, see the section on SAS system options in
SAS Language Reference: Dictionary.
The following example shows a partial log that displays the settings of portable
options.
proc options;
run;
Output 29.1
Portable Options:

APPLETLOC=(system-specific pathname)
                   Location of Java applets
ARMAGENT=          ARM Agent to use to collect ARM records
ARMLOC=ARMLOC.LOG  Identify location where ARM records are to be written
ARMSUBSYS=(ARM_NONE)
                   Enable/Disable ARMing of SAS subsystems
NOASYNCHIO         Do not enable asynchronous input/output
AUTOSAVELOC=       Identifies the location where program editor contents are
                   auto saved
NOAUTOSIGNON       SAS/CONNECT remote submit will not automatically attempt
                   to SIGNON
NOBATCH            Do not use the batch set of default values for SAS system
                   options
BINDING=DEFAULT    Controls the binding edge for duplexed output
BOTTOMMARGIN=0.000 Bottom margin for printed output
BUFNO=1            Number of buffers for each SAS data set
BUFSIZE=0          Size of buffer for page of SAS data set
BYERR              Set the error flag if a null data set is input to the SORT
                   procedure
BYLINE             Print the by-line at the beginning of each by-group
BYSORTED           Require SAS data set observations to be sorted for BY
                   processing
NOCAPS             Do not translate source input to uppercase
NOCARDIMAGE        Do not process SAS source and data lines as 80-byte records
CATCACHE=0         Number of SAS catalogs to keep in cache memory
CBUFNO=0           Number of buffers to use for each SAS catalog
CENTER             Center SAS procedure output
NOCHARCODE         Do not use character combinations as substitute for
                   special characters not on the keyboard
CLEANUP            Attempt recovery from out-of-resources condition
NOCMDMAC           Do not support command-style macros
CMPLIB=            Identify previously compiled libraries of CMP subroutines
                   to use when linking
CMPOPT=(NOEXTRAMATH NOMISSCHECK NOPRECISE NOGUARDCHECK)
                   Enable SAS compiler performance optimizations
NOCOLLATE          Do not collate multiple copies of printed output
COLORPRINTING      Print in color if printer supports color
COMAMID=TCP        Specifies the communication access method to be used for
                   SAS distributed products
COMPRESS=NO        Specifies whether to compress observations in output SAS
                   data sets
To view the setting of a particular option, you can use the OPTION= option in the PROC
OPTIONS statement. The following example shows a log that PROC OPTIONS produces for a
single SAS system option.
options pagesize=60;
proc options option=pagesize;
run;
Output 29.2
25   options pagesize=60;
26   proc options option=pagesize;
27   run;

SAS (r) Proprietary Software Release XXX

PAGESIZE=60
Output 29.3
6
7
BYERR              Set the error flag if a null data set is input to the SORT
                   procedure
CLEANUP            Attempt recovery from out-of-resources condition
NODMSSYNCHK        Do not enable syntax check, in windowing mode, for a
                   submitted statement block
DSNFERR            Generate error when SAS data set not found condition occurs
NOERRORABEND       Do not abend on error conditions
NOERRORBYABEND     Do not abend on By-group error condition
ERRORCHECK=NORMAL  Level of special error processing to be performed
ERRORS=20          Maximum number of observations for which complete error
                   messages are printed
FMTERR             Treat missing format or informat as an error
QUOTELENMAX        Enable warning for quoted string length max
VNFERR             Treat variable not found on _NULL_ SAS data set as an error
The following table lists the values that are available when you use the GROUP=
option with PROC OPTIONS.
Values for Use with GROUP=
COMMUNICATIONS
GRAPHICS
MACRO
DATAQUALITY
HELP
MEMORY
INPUTCONTROL
META
ENVDISPLAY
INSTALL
ODSPRINT
ENVFILES
LANGUAGECONTROL
PERFORMANCE
LISTCONTROL
SASFILES
EXECMODES
LOG_LISTCONTROL
SORT
EXTFILES
LOGCONTROL
IDMS
ORACLE
DATACOM
IMS
REXX
DB2
ISPF
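A group listing is requested by naming one of these values in the GROUP= option. The following is a minimal sketch that uses MEMORY, one of the values listed above; the options that it reports depend on the SAS release and the operating environment, so no output is shown here.

```sas
/* List only the system options that belong to the MEMORY group */
proc options group=memory;
run;
```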
To do this
LONG
SHORT
DEFINE
VALUE
GROUP=
HOST
NOHOST | PORT
OPTION=
Options
DEFINE
displays the short description of the option, the option group, and the option type. It
displays information about when the option can be set, whether an option can be
restricted, and whether PROC OPTSAVE will save the option.
Interaction: This option has no effect when SHORT is specified.
GROUP=group-name
displays the options in the group specified by group-name. For more information on
options groups, see Displaying the Settings of a Group of Options on page 593.
HOST | NOHOST
displays only host options (HOST) or displays only portable options (NOHOST).
Alias:
LONG | SHORT
specifies the format for displaying the settings of the SAS system options. LONG
lists each option on a separate line with a description; SHORT produces a
compressed listing without the descriptions.
Default: LONG
Featured in:
NOHOST | PORT
OPTION=option-name
displays a short description and the value (if any) of the option specified by
option-name. DEFINE and VALUE provide additional information about the option.
option-name
specifies the option to use as input to the procedure.
Requirement: If a SAS system option uses an equals sign, such as PAGESIZE=, do
not include the equals sign when specifying the option to OPTION=.
Featured in:
SHORT
VALUE
displays the option value and scope, as well as how the value was set.
Interaction: This option has no effect when SHORT is specified.
Note: SAS options that are passwords, such as EMAILPW and METAPASS,
return the value xxxxxxxx and not the actual password.
This example shows how to generate the short form of the listing of SAS system
option settings. Compare this short form with the long form that is shown in
Overview: OPTIONS Procedure on page 591.
Program
List all options and their settings. SHORT lists the SAS system options and their settings
without any descriptions.
proc options short;
run;
Log (partial)
1
2
Portable Options:
APPLETLOC=(system-specific pathname) ARMAGENT= ARMLOC=ARMLOC.LOG ARMSUBSYS=
(ARM_NONE) NOASYNCHIO AUTOSAVELOC= NOAUTOSIGNON NOBATCH BINDING=DEFAULT
BOTTOMMARGIN=0.000 IN BUFNO=1 BUFSIZE=0 BYERR BYLINE BYSORTED NOCAPS
NOCARDIMAGE CATCACHE=0 CBUFNO=0 CENTER NOCHARCODE CLEANUP NOCMDMAC CMPLIB=
CMPOPT=(NOEXTRAMATH NOMISSCHECK NOPRECISE NOGUARDCHECK) NOCOLLATE COLORPRINTING
COMAMID=TCP COMPRESS=NO CONNECTPERSIST CONNECTREMOTE= CONNECTSTATUS CONNECTWAIT
CONSOLELOG= COPIES=1 CPUCOUNT=1 CPUID DATASTMTCHK=COREKEYWORDS DATE DATESTYLE=MDY
DBSLICEPARM=(THREADED_APPS, 2) DBSRVTP=NONE NODETAILS DEVICE= DFLANG=ENGLISH
DKRICOND=ERROR DKROCOND=WARN DLDMGACTION=REPAIR NODMR DMS NODMSEXP DMSLOGSIZE=99999
DMSOUTSIZE=99999 NODMSSYNCHK DQLOCALE= DQSETUPLOC= DSNFERR NODTRESET NODUPLEX
NOECHOAUTO EMAILAUTHPROTOCOL=NONE EMAILHOST=LOCALHOST EMAILID= EMAILPORT=25 EMAILPW=
ENGINE=V9 NOERRORABEND NOERRORBYABEND ERRORCHECK=NORMAL ERRORS=20 NOEXPLORER
FIRSTOBS=1 FMTERR FMTSEARCH=(WORK LIBRARY) FONTSLOC=(system-specific pathname)
FORMCHAR=$<>\^_{|}~+=|-/\<>* FORMDLIM= FORMS=DEFAULT GISMAPS= GWINDOW HELPENCMD
HELPINDEX=(/help/common.hlp/index.txt /help/common.hlp/keywords.htm common.hhk)
HELPTOC=(/help/helpnav.hlp/config.txt /help/common.hlp/toc.htm common.hhc)
IBUFSIZE=0 NOIMPLMAC INITCMD= INITSTMT= INVALIDDATA=. LABEL LEFTMARGIN=0.000 IN
LINESIZE=97 LOGPARM= MACRO MAPS=(system-specific pathname) NOMAUTOLOCDISPLAY
MAUTOSOURCE MAXSEGRATIO=75 MCOMPILENOTE=NONE MERGENOBY=NOWARN MERROR
METAAUTORESOURCES= METACONNECT= METAENCRYPTALG=NONE METAENCRYPTLEVEL=EVERYTHING
METAID= METAPASS= METAPORT=0 METAPROFILE= METAPROTOCOL=BRIDGE METAREPOSITORY=Default
METASERVER= METAUSER= NOMFILE MINDELIMITER= MINPARTSIZE=0 MISSING=. NOMLOGIC
NOMLOGICNEST NOMPRINT NOMPRINTNEST NOMRECALL MSGLEVEL=N NOMSTORED MSYMTABMAX=4194304
NOMULTENVAPPL MVARSIZE=4096 NONETENCRYPT NETENCRYPTALGORITHM= NETENCRYPTKEYLEN=0
NETMAC NEWS= NOTES NUMBER NOOBJECTSERVER OBS=9223372036854775807 ORIENTATION=PORTRAIT
NOOVP NOPAGEBREAKINITIAL PAGENO=1 PAGESIZE=55 PAPERDEST= PAPERSIZE=LETTER
PAPERSOURCE= PAPERTYPE=PLAIN PARM= PARMCARDS=FT15F001 PRINTERPATH= NOPRINTINIT
PRINTMSGLIST QUOTELENMAX REPLACE REUSE=NO RIGHTMARGIN=0.000 IN NORSASUSER S=0
S2=0 SASAUTOS=(system-specific pathname) SASCMD= SASFRSCR=
SASHELP=(system-specific pathname) SASMSTORE= SASSCRIPT=
SASUSER=(system-specific pathname) SEQ=8 SERROR NOSETINIT SIGNONWAIT SKIP=0
SOLUTIONS SORTDUP=PHYSICAL SORTEQUALS SORTSEQ= SORTSIZE=2097152 SOURCE NOSOURCE2
SPDEINDEXSORTSIZE=33554432 SPDEMAXTHREADS=0 SPDESORTSIZE=33554432 SPDEUTILLOC=
SPDEWHEVAL=COST NOSPOOL NOSSLCLIENTAUTH NOSSLCRLCHECK STARTLIB SUMSIZE=0
NOSYMBOLGEN SYNCHIO SYNTAXCHECK SYSPARM= SYSPRINTFONT= NOSYSRPUTSYNC TBUFSIZE=0
TCPPORTFIRST=0 TCPPORTLAST=0 TERMINAL TERMSTMT= TEXTURELOC=\\dntsrc\sas\m901\ods\misc
THREADS TOOLSMENU TOPMARGIN=0.000 IN TRAINLOC= TRANTAB= UNIVERSALPRINT USER= UTILLOC=
UUIDCOUNT=100 UUIDGENDHOST= V6CREATEUPDATE=NOTE VALIDFMTNAME=LONG VALIDVARNAME=V7
VIEWMENU VNFERR WORK=(system-specific pathname) WORKINIT WORKTERM YEARCUTOFF=1920
_LAST_=_NULL_
This example shows how to display the setting of a single SAS system option. The
log shows the current setting of the SAS system option CENTER. The DEFINE and
VALUE options display additional information.
Program
Set the CENTER SAS system option. OPTION=CENTER displays option value information.
DEFINE and VALUE display additional information.
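The program statements for this example are not reproduced here. A minimal sketch that matches the description, assuming the PROC OPTIONS statement options OPTION=, DEFINE, and VALUE named in the text, might look like this:

```sas
/* Set the CENTER system option. */
options center;

/* Display the setting of CENTER only. DEFINE and VALUE request */
/* additional information about the option and its value.       */
proc options option=center define value;
run;
```

The exact wording of the resulting log depends on the operating environment, so no log is shown for this sketch.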
Output 29.4
CHAPTER
30
The OPTLOAD Procedure
Overview: OPTLOAD Procedure 601
What Does the OPTLOAD Procedure Do?
Syntax: OPTLOAD Procedure 601
PROC OPTLOAD Statement 602
To do this
KEY=
DATA=
Options
DATA=libref.dataset
specifies the library and data set name from which SAS system option settings are
loaded. The SAS variable OPTNAME contains the character value of the SAS system
option name, and the SAS variable OPTVALUE contains the character value of the
SAS system option setting.
Default: If you omit the DATA= option and the KEY= option, the procedure uses
the default SAS library and data set. The default library is where the current user
profile resides. Unless you specify a library, the default library is SASUSER. If
SASUSER is being used by another active SAS session, then the temporary
WORK library is the default location from which the data set is loaded. The
default data set name is MYOPTS.
KEY="SAS registry key"
specifies the location in the SAS registry of stored SAS system option settings. The
registry is retained in SASUSER. If SASUSER is not available, then the temporary
WORK library is used. For example, KEY="OPTIONS".
Requirement: You must use quotation marks around the SAS registry key name.
Requirement: Separate the names in a sequence of key names with a backslash (\). For
example, KEY="CORE\OPTIONS".
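The two sources described above can be sketched as follows. The data set name WORK.SAVED_OPTS is hypothetical, and the sketch first creates it with PROC OPTSAVE so that PROC OPTLOAD has something to read.

```sas
/* Create a data set of option settings to load from.     */
/* WORK.SAVED_OPTS is a hypothetical name for this sketch. */
proc optsave out=work.saved_opts;
run;

/* Change an option, then restore the saved settings. */
options nocenter;

proc optload data=work.saved_opts;
run;

/* Load settings from a registry key instead. This assumes that */
/* settings were previously saved under CORE\OPTIONS.           */
proc optload key="core\options";
run;
```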
CHAPTER
31
The OPTSAVE Procedure
Overview: OPTSAVE Procedure 603
What Does the OPTSAVE Procedure Do?
Syntax: OPTSAVE Procedure 603
PROC OPTSAVE Statement 604
To do this
KEY=
OUT=
Options
KEY="SAS registry key"
specifies the location in the SAS registry of stored SAS system option settings. The
registry is retained in SASUSER. If SASUSER is not available, then the temporary
WORK library is used. For example, KEY="OPTIONS".
Restriction: SAS registry key names cannot span multiple lines.
Requirement: You must use quotation marks around the SAS registry key name.
Tip: To specify a subkey, enter multiple key names starting with the root key.
Caution: If the key already exists, it will be overwritten. If the specified key does
not already exist in the current SAS registry, then the key is automatically created
when option settings are saved in the SAS registry.
OUT=libref.dataset
specifies the names of the library and data set where SAS system option settings are
saved. The SAS variable OPTNAME contains the character value of the SAS system
option name. The SAS variable OPTVALUE contains the character value of the SAS
system option setting.
Default: If you omit the OUT= and the KEY= options, the procedure uses the
default SAS library and data set. The default SAS library is where the current
user profile resides. Unless you specify a SAS library, the default library is
SASUSER. If SASUSER is in use by another active SAS session, then the
temporary WORK library is the default location where the data set is saved. The
default data set name is MYOPTS.
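As a short illustration of the OUT= and KEY= options, the following sketch saves the current settings to a data set and to a registry key. The data set name WORK.SAVED_OPTS is hypothetical; the PROC PRINT step is included only to show the OPTNAME and OPTVALUE variables.

```sas
/* Save the current option settings to a data set. */
proc optsave out=work.saved_opts;
run;

/* Inspect the first few saved settings. */
proc print data=work.saved_opts(obs=5);
   var optname optvalue;
run;

/* Save the settings to a registry key. An existing key */
/* with this name is overwritten.                       */
proc optsave key="core\options";
run;
```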
CHAPTER
32
The PLOT Procedure
Overview: PLOT Procedure 606
Syntax: PLOT Procedure 608
PROC PLOT Statement 609
BY Statement 612
PLOT Statement 613
Concepts: PLOT Procedure 624
RUN Groups 624
Generating Data with Program Statements 625
Labeling Plot Points with Values of a Variable 625
Pointer Symbols 625
Understanding Penalties 626
Changing Penalties 627
Collision States 628
Reference Lines 628
Hidden Label Characters 628
Overlaying Label Plots 628
Computational Resources Used for Label Plots 628
Time 629
Memory 629
Results: PLOT Procedure 629
Scale of the Axes 629
Printed Output 629
ODS Table Names 629
Portability of ODS Output with PROC PLOT 630
Missing Values 630
Hidden Observations 630
Examples: PLOT Procedure 631
Example 1: Specifying a Plotting Symbol 631
Example 2: Controlling the Horizontal Axis and Adding a Reference Line 632
Example 3: Overlaying Two Plots 634
Example 4: Producing Multiple Plots per Page 636
Example 5: Plotting Data on a Logarithmic Scale 639
Example 6: Plotting Date Values on an Axis 640
Example 7: Producing a Contour Plot 642
Example 8: Plotting BY Groups 646
Example 9: Adding Labels to a Plot 649
Example 10: Excluding Observations That Have Missing Values 652
Example 11: Adjusting Labels on a Plot with the PLACEMENT= Option 654
Example 12: Adjusting Labeling on a Plot with a Macro 658
Example 13: Changing a Default Penalty 661
Output 32.1
A Simple Plot
High Values of the Dow Jones
Industrial Average
from 1954 to 1994
Plot of High*Year.
[Plot: vertical axis High, 0 to 4000; horizontal axis Year, 1950 to 2000; plotting symbol A]
You can also overlay two plots, as shown in Output 32.2. One plot shows the high
values of the DJIA; the other plot shows the low values. The plot also shows that you
can specify plotting symbols and put a box around a plot. The statements that produce
Output 32.2 are shown in Example 3 on page 634.
Output 32.2
PROC PLOT can also label points on a plot with the values of a variable, as shown in
Output 32.3. The plotted data represents population density and crime rates for
selected U.S. states. The SAS code that produces Output 32.3 is shown in Example 11
on page 654.
Output 32.3
[Label plot: vertical axis Density, 0 to 500; horizontal axis CrimeRate, 2000 to 9000; each point is labeled with a state name]
To do this
BY
PLOT
To do this
DATA=
MISSING
NOMISS
UNIFORM
FORMCHAR=
NOLEGEND
VTOH=
HPERCENT=
VPERCENT=
Options
DATA=SAS-data-set
Procedures."
FORMCHAR <(position(s))>='formatting-character(s)'
defines the characters to use for constructing the borders of the plot.
position(s)
identifies the position of one or more characters in the SAS formatting-character
string. A space or a comma separates the positions.
Default: Omitting (position(s)) is the same as specifying all twenty possible SAS
formatting characters, in order. The following table shows the formatting characters
that PROC PLOT uses.
Position    Default   Used to draw
1           |         vertical separators
2           -         horizontal separators
3 5 9 11    -         corners
7           +         intersection of vertical and horizontal separators
formatting-character(s)
lists the characters to use for the specified positions. PROC PLOT assigns
characters in formatting-character(s) to position(s), in the order that they are
listed. For instance, the following option assigns the asterisk (*) to the third
formatting character, the pound sign (#) to the seventh character, and does not
alter the remaining characters:
formchar(3,7)='*#'
Interaction: The SAS system option FORMCHAR= specifies the default formatting
characters. The system option defines the entire string of formatting characters.
The FORMCHAR= option in a procedure can redefine selected characters.
Tip: You can use any character in formatting-characters, including hexadecimal
characters. If you use hexadecimal characters, then you must put an x after the
closing quotation mark. For instance, the following option assigns the hexadecimal
character 2D to the third formatting character, the hexadecimal character 7C to
the seventh character, and does not alter the remaining characters:
formchar(3,7)='2D7C'x
Tip:
formchar (1,2,7)=
HPERCENT=percent(s)
specifies one or more percentages of the available horizontal space to use for each
plot. HPERCENT= enables you to put multiple plots on one page. PROC PLOT tries
to fit as many plots as possible on a page. After using each of the percent(s), PROC
PLOT cycles back to the beginning of the list. A zero in the list forces PROC PLOT to
go to a new page even if it could fit the next plot on the same page.
hpercent=33
prints three plots per page horizontally; each plot is one-third of a page wide.
hpercent=50 25 25
prints three plots per page; the first is twice as wide as the other two.
hpercent=33 0
produces plots that are one-third of a page wide; each plot is on a separate page.
hpercent=300
produces plots that are three pages wide.
Alias: HPCT=
Default: 100
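As an illustration of how HPERCENT= combines with VPERCENT=, the following sketch requests four plots with each plot limited to half of the page in both directions. The data set DEMO and its variables are invented for this sketch; with these settings PROC PLOT should be able to place the four plots on one page, provided that the page is large enough.

```sas
/* Hypothetical data for illustration only. */
data demo;
   do x = 1 to 10;
      y = x**2;
      z = 100 - x;
      output;
   end;
run;

/* Each plot uses 50% of the horizontal and 50% of the vertical space. */
proc plot data=demo hpercent=50 vpercent=50;
   plot y*x z*x y*z z*y;
run;
quit;
```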
MISSING
includes missing character variable values in the construction of the axes. It has no
effect on numeric variables.
Interaction: overrides the NOMISS option for character variables
NOLEGEND
suppresses the legend at the top of each plot. The legend lists the names of the
variables being plotted and the plotting symbols used in the plot.
NOMISS
excludes observations for which either variable is missing from the calculation of the
axes. Normally, PROC PLOT draws an axis based on all the values of the variable
being plotted, including points for which the other variable is missing.
Interaction: The HAXIS= option overrides the effect of NOMISS on the horizontal
axis. The VAXIS= option overrides the effect on the vertical axis.
Interaction: NOMISS is overridden by MISSING for character variables.
Featured in:
UNIFORM
uniformly scales axes across BY groups. Uniform scaling enables you to directly
compare the plots for different values of the BY variables.
Restriction: You cannot use PROC PLOT with the UNIFORM option with an
engine that supports concurrent access if another user is updating the data set at
the same time.
VPERCENT=percent(s)
specifies one or more percentages of the available vertical space to use for each plot.
If you use a percentage greater than 100, then PROC PLOT prints sections of the
plot on successive pages.
Alias: VPCT=
Default: 100
VTOH=aspect-ratio
specifies the aspect ratio (vertical to horizontal) of the characters on the output
device. aspect-ratio is a positive real number. If you use the VTOH= option, then
PROC PLOT spaces tick marks so that the distance between horizontal tick marks is
nearly equal to the distance between vertical tick marks. For example, if characters
are twice as high as they are wide, then specify VTOH=2.
Minimum: 0
Interaction: VTOH= has no effect if you use the HSPACE= and the VSPACE=
options.
BY Statement
Produces a separate plot and starts a new page for each BY group.
Main discussion: BY on page 58
Featured in:
Required Arguments
variable
specifies the variable that the procedure uses to form BY groups. You can specify
more than one variable. If you do not use the NOTSORTED option in the BY
statement, then the observations in the data set must either be sorted by all the
variables that you specify or be indexed appropriately. Variables in a BY statement
are called BY variables.
Options
DESCENDING
specifies that the observations are sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The data is grouped in another way, for example, chronological order.
The requirement for ordering or indexing observations according to the values of
BY variables is suspended for BY-group processing when you use the NOTSORTED
option. In fact, the procedure does not use an index if you specify NOTSORTED. The
procedure defines a BY group as a set of contiguous observations that have the same
values for all BY variables. If observations with the same values for the BY variables
are not contiguous, then the procedure treats each contiguous set as a separate BY
group.
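The following sketch shows one way to use the BY statement with PROC PLOT. The data set SCORES and its variables are hypothetical. The data is sorted first because NOTSORTED is not specified, and UNIFORM is added so that the axes are scaled the same way in each BY group.

```sas
/* Hypothetical data; GROUP is the BY variable. */
data scores;
   input group $ x y;
   datalines;
B 1 4
A 1 2
A 2 3
B 2 6
;
run;

/* Sort by the BY variable before plotting. */
proc sort data=scores;
   by group;
run;

/* One plot per value of GROUP, each on a new page. */
proc plot data=scores uniform;
   by group;
   plot y*x;
run;
quit;
```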
PLOT Statement
Requests the plots to be produced by PROC PLOT.
Tip:
To do this
HPOS=
and
VPOS=
HZERO
and
VZERO
HREF=
and
VREF=
BOX
Overlay plots
OVERLAY
CONTOUR
Scontour-level=
SLIST=
LIST=
OUTWARD=
PENALTIES=
PLACEMENT=
SPLIT=
STATES
Required Arguments
plot-request(s)
specifies the variables (vertical and horizontal) to plot and the plotting symbol to use
to mark the points on the plot.
Each form of plot-request(s) supports a label variable. A label variable is preceded
by a dollar sign ($) and specifies a variable whose values label the points on the plot.
For example,
plot y*x $ label-variable
plot y*x='*' $ label-variable
See Labeling Plot Points with Values of a Variable on page 625 for more
information. In addition, see Example 9 on page 649 and all the examples that follow
it.
The plot-request(s) can be one or more of the following:
vertical*horizontal <$ label-variable>
specifies the variable to plot on the vertical axis and the variable to plot on the
horizontal axis.
For example, the following statement requests a plot of Y by X:
plot y*x;
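Plot request    What is plotted
(a - - d)       all possible combinations of the variables from A through D, in data set order
(x1 - x4)       x1*x2 x1*x3 x1*x4 x2*x3 x2*x4 x3*x4
(_numeric_)     all possible combinations of the numeric variables
y*(x1 - x4)     y*x1 y*x2 y*x3 y*x4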
If both the vertical and horizontal specifications request more than one variable and
if a variable appears in both lists, then it will not be plotted against itself. For example,
the following statement does not plot B*B and C*C:
plot (a b c)*(b c d);
A colon combines the variables pairwise. Thus, the first variables of each list
combine to request a plot, as do the second, third, and so on. For example, the following
plot requests are equivalent:
plot (y1-y2) : (x1-x2);
plot y1*x1 y2*x2;
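The two grouping forms can be tried side by side. In the following sketch the data set DEMO and all of its variables are invented for illustration. By the rules above, the first PLOT statement should request A*B, A*C, A*D, B*C, B*D, C*B, and C*D, and the second should request Y1*X1 and Y2*X2.

```sas
/* Hypothetical data for illustration only. */
data demo;
   do i = 1 to 5;
      a = i;  b = 2*i;  c = i**2;  d = 10 - i;
      y1 = a; y2 = b;   x1 = c;    x2 = d;
      output;
   end;
   drop i;
run;

proc plot data=demo;
   /* Asterisk: every combination, except a variable against itself. */
   plot (a b c)*(b c d);
   /* Colon: pairwise combinations only. */
   plot (y1-y2) : (x1-x2);
run;
quit;
```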
Options
BOX
draws a border around the entire plot, rather than just on the left side and bottom.
Featured in: Example 3 on page 634
CONTOUR<=number-of-levels>
draws a contour plot using plotting symbols with varying degrees of shading where
number-of-levels is the number of levels for dividing the range of variable. The plot
Comments
10 to 100 by 5
by 5
1 2 10 to 100
by 5
For example,
haxis='Paris' 'London' 'Tokyo'
'date-time-value'i TO <'date-time-value'i>
<BY increment>
'date-time-value'i
any SAS date, time, or datetime value described for the SAS functions
INTCK and INTNX. The suffix i is one of the following:
D     date
T     time
DT    datetime
increment
one of the valid arguments for the INTCK or INTNX functions. For dates,
increment can be one of the following:
DAY
WEEK
MONTH
QTR
YEAR
For datetimes, increment can be one of the following:
DTDAY
DTWEEK
DTMONTH
DTQTR
DTYEAR
For times, increment can be one of the following:
HOUR
MINUTE
SECOND
For example,
haxis='01JAN95'd to '01JAN96'd
by month
haxis='01JAN95'd to '01JAN96'd
by qtr
Note: You must use a FORMAT statement to print the tick-mark values
in an understandable form.
Interaction: You can use the HAXIS= and VAXIS= options with the VTOH= option
to equate axes. If your data is suitable, then use HAXIS=BY n and VAXIS=BY n
with the same value for n and specify a value for the VTOH= option. The number
of columns that separate the horizontal tick marks is nearly equal to the number
of lines that separate the vertical tick marks times the value of the VTOH= option.
In some cases, PROC PLOT cannot simultaneously use all three values and
changes one or more of the values.
Featured in: Example 2 on page 632, Example 5 on page 639, and Example 6 on
page 640
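To show the date form of HAXIS= together with the FORMAT statement that the note calls for, here is a small sketch. The data set SALES and its values are invented; the DATE7. informat and format are used so that the tick marks print as dates rather than as raw SAS date values.

```sas
/* Hypothetical quarterly data. */
data sales;
   input date :date7. amount;
   datalines;
01JAN95 120
01APR95 180
01JUL95 150
01OCT95 210
01JAN96 240
;
run;

proc plot data=sales;
   /* Tick marks at quarterly intervals across the year. */
   plot amount*date / haxis='01JAN95'd to '01JAN96'd by qtr;
   /* Without a format, the tick marks print as numbers. */
   format date date7.;
run;
quit;
```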
HEXPAND
expands the horizontal axis to minimize the margins at the sides of the plot and to
maximize the distance between tick marks, if possible.
HEXPAND causes PROC PLOT to ignore information about the spacing of the
data. Plots produced with this option waste less space but may obscure the nature of
the relationship between the variables.
HPOS=axis-length
specifies the number of print positions on the horizontal axis. The maximum value of
axis-length that allows a plot to fit on one page is three positions less than the value
of the LINESIZE= system option because there must be space for the procedure to
print information next to the vertical axis. The exact maximum depends on the
number of characters that are in the vertical variable's values. If axis-length is too
large to fit on a line, then PROC PLOT ignores the option.
HREF=value-specification
draws lines on the plot perpendicular to the specified values on the horizontal axis.
PROC PLOT includes the values you specify with the HREF= option on the
horizontal axis unless you specify otherwise with the HAXIS= option.
For the syntax for value-specification, see HAXIS= on page 616.
Featured in: Example 8 on page 646
HREFCHAR='character'
specifies the character to use to draw the horizontal reference line.
HSPACE=n
specifies that a tick mark will occur on the horizontal axis at every nth print
position, where n is the value of HSPACE=.
HZERO
assigns a value of zero to the first tick mark on the horizontal axis.
Interaction: PROC PLOT ignores HZERO if the horizontal variable has negative
values or if the HAXIS= option specifies a range that does not begin with zero.
LIST<=penalty-value>
lists the horizontal and vertical axis values, the penalty, and the placement state of
all points plotted with a penalty greater than or equal to penalty-value. If no plotted
points have a penalty greater than or equal to penalty-value, then no list is printed.
Tip: LIST is equivalent to LIST=0.
See also: Understanding Penalties on page 626
Featured in: Example 11 on page 654
OUTWARD=character
tries to force the point labels outward, away from the origin of the plot, by protecting
positions next to symbols that match character that are in the direction of the origin
(0,0). The algorithm tries to avoid putting the labels in the protected positions, so
they usually move outward.
This option is useful only when you are labeling points with the values of a
variable.
Tip:
OVERLAY
overlays all plots that are specified in the PLOT statement on one set of axes. The
variable names, or variable labels if they exist, from the first plot are used to label
the axes. Unless you use the HAXIS= or the VAXIS= option, PROC PLOT
automatically scales the axes in the way that best fits all the variables.
When the SAS system option OVP is in effect and overprinting is allowed, the
plots are superimposed; otherwise, when NOOVP is in effect, PROC PLOT uses the
plotting symbol from the first plot to represent points that appear in more than one
plot. In such a case, the output includes a message telling you how many
observations are hidden.
Featured in:
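A minimal sketch of an overlaid plot follows. The data set HILO and its values are made up for illustration and are not actual market data. Each plot request names its own plotting symbol, and BOX draws a border around the combined plot.

```sas
/* Hypothetical high and low values by year. */
data hilo;
   input year high low;
   datalines;
1990 30 23
1991 32 25
1992 34 31
1993 38 32
1994 40 36
;
run;

proc plot data=hilo;
   /* Both requests share one set of axes. */
   plot high*year='*' low*year='o' / overlay box;
run;
quit;
```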
PENALTIES<(index-list)>=penalty-list
changes the default penalties. The index-list provides the positions of the penalties in
the list of penalties. The penalty-list contains the values that you are specifying for
the penalties that are indicated in the index-list. The index-list and the penalty-list
can contain one or more integers. In addition, both index-list and penalty-list accept
the form:
value TO value
PLACEMENT=(expression(s))
controls the placement of labels by specifying possible locations of the labels relative
to their coordinates. Each expression consists of a list of one or more suboptions (H=,
L=, S=, or V=) that are joined by an asterisk (*) or a colon (:). PROC PLOT uses the
asterisk and colon to expand each expression into combinations of values for the four
possible suboptions. The asterisk creates every possible combination of values in the
expression list. A colon creates only pairwise combinations. The colon takes
precedence over the asterisk. With the colon, if one list is shorter than the other,
then the values in the shorter list are reused as necessary.
Use the following suboptions to control the placement:
H=integer(s)
specifies the number of horizontal spaces (columns) to shift the label relative to
the starting position. Both positive and negative integers are valid. Positive
integers shift the label to the right; negative integers shift it to the left. For
example, you can use the H= suboption in the following way:
place=(h=0 1 -1 2 -2)
You can use the keywords BY ALT in this list. BY ALT produces a series of
numbers whose signs alternate between positive and negative and whose absolute
values change by one after each pair. For instance, the following PLACE=
specifications are equivalent:
place=(h=0 -1 to -3 by alt)
place=(h=0 -1 1 -2 2 -3 3)
If the series includes zero, then the zero appears twice. For example, the
following PLACE= options are equivalent:
place=(h=0 to 2 by alt)
place=(h=0 0 1 -1 2 -2)
Default: H=0
Range: -500 to 500
L=integer(s)
specifies the number of lines onto which the label may be split.
Default: L=1
Range: 1-200
S=start-position(s)
specifies where to start printing the label. The value for start-position can be one
or more of the following:
CENTER
the procedure centers the label around the plotting symbol.
RIGHT
the label starts at the plotting symbol location and continues to the right.
LEFT
the label starts to the left of the plotting symbol and ends at the plotting symbol
location.
Default: CENTER
V=integer(s)
specifies the number of vertical spaces (lines) to shift the label relative to the
starting position. V= behaves the same as the H= suboption, described earlier.
A new expression begins when a suboption is not preceded by an operator.
Parentheses around each expression are optional. They make it easier to recognize
individual expressions in the list. However, the entire expression list must be in
parentheses, as shown in the following example. Table 32.1 on page 621 shows how
this expression is expanded and describes each placement state.
place=((v=1)
(s=right left : h=2 -2)
(v=-1)
(h=0 1 to 2 by alt * v=1 -1)
(l=1 to 3 * v=1 to 2 by alt *
h=0 1 to 2 by alt))
Alias: PLACE=
Defaults: There are two defaults for the PLACE= option. If you are using a blank
as the plotting symbol, then the default placement state is PLACE=(S=CENTER :
V=0 : H=0 : L=1), which centers the label. If you are using anything other than a
blank, then the default is PLACE=((S=RIGHT LEFT : H=2 -2) (V=1 -1 * H=0 1 -1
2 -2)). The default for labels placed with symbols includes multiple positions
around the plotting symbol so the procedure has flexibility when placing labels on
a crowded plot.
Tip: Use the STATES option to print a list of placement states.
See also: Labeling Plot Points with Values of a Variable on page 625
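To see how a PLACEMENT= expression list expands, it can help to request the STATES and LIST options on a small labeled plot. The following sketch assumes a hypothetical data set TOWNS. The expression list tries the label to the right or left of the symbol first, then on the lines above and below it.

```sas
/* Hypothetical labeled points. */
data towns;
   input name $ x y;
   datalines;
Alpha 1 1
Beta  2 3
Gamma 3 2
Delta 4 4
;
run;

proc plot data=towns;
   plot y*x='*' $ name /
        placement=((s=right left : h=2 -2)
                   (v=1 -1 * h=0 1 -1 2 -2))
        states   /* print the placement states in effect        */
        list;    /* list each point with its penalty and state */
run;
quit;
```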
Table 32.1
[Table columns: Expression, Placement state, Meaning. The rows begin with the expression (V=1) and continue through the placement states of the last expression in the list.]
Scontour-level='character-list'
specifies the plotting symbol to use for a single contour level. When PROC PLOT
produces contour plots, it automatically chooses the symbols to use for each level of
intensity. You can use the S= option to override these symbols and specify your own.
You can include up to three characters in character-list. If overprinting is not
allowed, then PROC PLOT uses only the first character.
For example, to specify three levels of shading for the Z variable, use the following
statement:
plot y*x=z /
contour=3 s1='A' s2='+' s3='X0A';
s2='7F'x s3='A6'x;
This feature was designed especially for printers where the hexadecimal constants
can represent grey-scale fill characters.
Range: 1 to the highest contour level (determined by the CONTOUR option).
See also: SLIST= and CONTOUR
SLIST=character-list-1 <character-list-n>
specifies plotting symbols for multiple contour levels. Each character-list specifies the
plotting symbol for one contour level: the first character-list for the first level, the
second character-list for the second level, and so on. For example:
plot y*x=z /
contour=5
slist='.' ':' '!' '=' '+O';
Default: If you omit a plotting symbol for each contour level, then PROC PLOT
Restriction: If you use the SLIST= option, then it must be listed last in the PLOT
statement.
See also: Scontour-level= and CONTOUR=
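The contour options can be tried on generated data. In this sketch the data set SURFACE and the function used for Z are invented for illustration. CONTOUR=3 divides the range of Z into three levels, and S1= through S3= assign one symbol to each level.

```sas
/* Hypothetical grid of points with a third variable Z. */
data surface;
   do x = 0 to 20;
      do y = 0 to 10;
         z = x + 2*y;
         output;
      end;
   end;
run;

proc plot data=surface;
   /* The value of Z determines the symbol that is printed. */
   plot y*x=z / contour=3 s1='.' s2='+' s3='X';
run;
quit;
```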
SPLIT=split-character
when labeling plot points, specifies where to split the label when the label spans two
or more lines. The label is split onto the number of lines that is specified in the L=
suboption to the PLACEMENT= option. If you specify a split character, then the
procedure always splits the label on each occurrence of that character, even if it
cannot find a suitable placement. If you specify L=2 or more but do not specify a split
character, then the procedure tries to split the label on blanks or punctuation but
will split words if necessary.
PROC PLOT shifts split labels as a block, not as individual fragments (a fragment
is the part of the split label that is contained on one line). For example, to force "This
is a label" to split after the "a", change it to "This is a*label" and specify
SPLIT='*'.
See also: Labeling Plot Points with Values of a Variable on page 625
STATES
lists all the placement states in effect. STATES prints the placement states in the
order that you specify them in the PLACE= option.
VAXIS=axis-specification
specifies tick mark values for the vertical axis. VAXIS= follows the same rules as
the HAXIS= option on page 616.
Featured in:
VEXPAND
expands the vertical axis to minimize the margins above and below the plot and to
maximize the space between vertical tick marks, if possible.
See also: HEXPAND on page 618
VPOS=axis-length
specifies the number of print positions on the vertical axis. The maximum value for
axis-length that allows a plot to fit on one page is 8 lines less than the value of the
SAS system option PAGESIZE= because you must allow room for the procedure to
print information under the horizontal axis. The exact maximum depends on the
titles that are used, whether or not plots are overlaid, and whether or not CONTOUR
is specified. If the value of axis-length specifies a plot that cannot fit on one page,
then the plot spans multiple pages.
See also: HPOS= on page 618
VREF=value-specification
draws lines on the plot perpendicular to the specified values on the vertical axis.
PROC PLOT includes the values you specify with the VREF= option on the vertical
axis unless you specify otherwise with the VAXIS= option. For the syntax for
value-specification, see HAXIS= on page 616.
Featured in:
VREFCHAR='character'
specifies the character to use to draw the vertical reference line.
VSPACE=n
specifies that a tick mark will occur on the vertical axis at every nth print position,
where n is the value of VSPACE=.
VZERO
assigns a value of zero to the first tick mark on the vertical axis.
Interaction: PROC PLOT ignores the VZERO option if the vertical variable has
negative values or if the VAXIS= option specifies a range that does not begin with
zero.
RUN Groups
PROC PLOT is an interactive procedure. It remains active after a RUN statement is
executed. Usually, SAS terminates a procedure after executing a RUN statement.
When you start the PLOT procedure, you can continue to submit any valid statements
without resubmitting the PROC PLOT statement. Thus, you can easily experiment with
changing labels, values of tick marks, and so forth. Any options submitted in the PROC
PLOT statement remain in effect until you submit another PROC PLOT statement.
When you submit a RUN statement, PROC PLOT executes all the statements
submitted since the last PROC PLOT or RUN statement. Each group of statements is
called a RUN group. With each RUN group, PROC PLOT begins a new page and begins
with the first item in the VPERCENT= and HPERCENT= lists, if any.
To terminate the procedure, submit a QUIT statement, a DATA statement, or a
PROC statement. Like the RUN statement, each of these statements completes a RUN
group. If you do not want to execute the statements in the RUN group, then use the
RUN CANCEL statement, which terminates the procedure immediately.
You can use the BY statement interactively. The BY statement remains in effect
until you submit another BY statement or terminate the procedure.
See Example 11 on page 654 for an example of using RUN group processing with
PROC PLOT.
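RUN-group processing as described above can be sketched with two groups in a single invocation of the procedure. The data set DEMO is hypothetical. The second PLOT statement is submitted without a new PROC PLOT statement, and QUIT ends the procedure.

```sas
/* Hypothetical data for illustration only. */
data demo;
   do x = 1 to 10;
      y = 2*x;
      z = x**2;
      output;
   end;
run;

proc plot data=demo;
   plot y*x;
run;              /* first RUN group; PROC PLOT remains active */

   plot z*x / box;
run;              /* second RUN group starts on a new page     */

quit;             /* terminates the procedure                  */
```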
y = 2.54 + 3.83x
If the plot is printed with a LINESIZE= value of 80, then about 75 positions are
available on the horizontal axis for the X values. Thus, 2 is a good increment: 51
observations are generated, which is fewer than the 75 available positions on the
horizontal axis.
However, if the plot is printed with a LINESIZE= value of 132, then an increment of
2 produces a plot in which the plotting symbols have space between them. For a
smoother line, a better increment is 1, because 101 observations are generated.
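The data for such a plot can be generated with a DATA step. The following sketch assumes that x ranges from 0 to 100, which is consistent with the 51 observations at an increment of 2 and the 101 observations at an increment of 1 mentioned above. The data set name GENERATE is chosen only for illustration.

```sas
/* Generate 51 points of y = 2.54 + 3.83x for x = 0, 2, ..., 100. */
data generate;
   do x = 0 to 100 by 2;
      y = 2.54 + 3.83*x;
      output;
   end;
run;

/* With LINESIZE=80, about 75 horizontal positions are available. */
options linesize=80;

proc plot data=generate;
   plot y*x;
run;
quit;
```

For a line size of 132, changing the increment in the DO statement to 1 generates 101 observations.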
Pointer Symbols
When you are using a label variable and do not specify a plotting symbol or if the
value of the variable you use as the plotting symbol is null (00x), PROC PLOT uses
pointer symbols as plotting symbols. Pointer symbols associate a point with its label by
pointing in the general direction of the label placement. PROC PLOT uses four
different pointer symbols based on the value of the S= and V= suboptions in the
PLACEMENT= option. The table below shows the pointer symbols:
S=        V=     Symbol
LEFT      any    <
RIGHT     any    >
CENTER    >0     ^
CENTER    <=0    v
If you are using pointer symbols and multiple points coincide, then PROC PLOT uses
the number of points as the plotting symbol if the number of points is between 2 and 9.
If the number of points is more than 9, then the procedure uses an asterisk (*).
Note: Because of character set differences among operating environments, the
pointer symbol for S=CENTER and V>0 may differ from the one shown here.
Understanding Penalties
PROC PLOT assesses the quality of placements with penalties. If all labels are
plotted with zero penalty, then no labels collide and all labels are near their symbols.
When it is not possible to place all labels with zero penalty, PROC PLOT tries to
minimize the total penalty. Table 32.2 on page 626 gives a description of the penalty,
the default value of the penalty, the index that you use to reference the penalty, and the
range of values that you can specify if you change the penalties. Each penalty is
described in more detail in Table 32.3 on page 627.
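Table 32.2
Penalties Table
Penalty                                           Default penalty   Index    Range
not placing a blank                               1                 1        0-500
bad split, no split character specified           1                 2        0-500
bad split with split character                    50                3        0-500
free horizontal shift, fhs                        2                 4        0-500
free vertical shift, fvs                          1                 5        0-500
vertical shift weight, vsw                        2                 6        0-500
vertical/horizontal shift denominator, vhsd       5                 7        1-500
collision state                                   500               8        0-10,000
(reserved for future use)                                           9-14
not printing the first character                  11                15       0-500
not printing the second character                 10                16       0-500
not printing the third character                  8                 17       0-500
not printing the fourth character                 5                 18       0-500
not printing the fifth through 200th character    2                 19-214   0-500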
Table 32.3 on page 627 contains the index values from Table 32.2 on page 626 with a
description of the corresponding penalty.
Table 32.3
Index     Description of penalty
1         a nonblank character in the plot collides with an embedded blank in a label, or there is not a blank or a
          plot boundary before or after each label fragment.
2         a split occurs on a nonblank or nonpunctuation character when you do not specify a split character.
3         a label is placed with a different number of lines than the L= suboption specifies, when you specify a
          split character.
4-7       a label is placed far away from the corresponding point. PROC PLOT calculates the penalty according to
          this (integer arithmetic) formula:

          \[ \left[\, \max\left(|H| - fhs,\; 0\right) + vsw \times \max\left(|V| - \frac{L + fvs + (V > 0)}{2},\; 0\right) \right] / vhsd \]

          Notice that penalties 4 through 7 are actually just components of the formula used to determine the
          penalty. Changing the penalty for a free horizontal or free vertical shift to a large value such as 500 has
          the effect of removing any penalty for a large horizontal or vertical shift. Example 6 on page 640
          illustrates a case in which removing the horizontal shift penalty is useful.
8         a label may collide with its own plotting symbol. If the plotting symbol is blank, then a collision state
          cannot occur. See "Collision States" on page 628 for more information.
15-214    a label character does not appear in the plot. By default, the penalty for not printing the first character
          is greater than the penalty for not printing the second character, and so on. By default, the penalty for
          not printing the fifth and subsequent characters is the same.
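The distance penalty for indexes 4 through 7 can be worked through numerically. The sketch below reads the formula as [MAX(|H| - fhs, 0) + vsw x MAX(|V| - (L + fvs + (V > 0)) / 2, 0)] / vhsd, where H, V, and L are taken to be the horizontal shift, vertical shift, and number of lines of a placement state. The function name is hypothetical, and the points at which integer truncation is applied are an assumption.

```python
def distance_penalty(h: int, v: int, lines: int,
                     fhs: int, fvs: int, vsw: int, vhsd: int) -> int:
    """Penalty for placing a label far from its point (illustrative sketch).

    fhs, fvs, vsw, and vhsd correspond to penalty indexes 4 through 7.
    """
    if vhsd < 1:
        raise ValueError("vhsd must be at least 1")
    # Horizontal shifts up to fhs positions are free.
    horizontal = max(abs(h) - fhs, 0)
    # (V > 0) contributes 1 when the label is shifted upward.
    free_vertical = (lines + fvs + (1 if v > 0 else 0)) // 2
    vertical = vsw * max(abs(v) - free_vertical, 0)
    # Integer arithmetic: the fractional part of the quotient is dropped.
    return (horizontal + vertical) // vhsd


# A one-line label shifted 12 positions right and 3 lines up.
print(distance_penalty(12, 3, 1, fhs=2, fvs=1, vsw=2, vhsd=5))
# Raising the free horizontal shift to 500 removes the horizontal part.
print(distance_penalty(12, 3, 1, fhs=500, fvs=1, vsw=2, vhsd=5))
```

The second call shows why a large value such as 500 for the free horizontal shift has the effect of removing the penalty for a large horizontal shift: the MAX term for H can no longer exceed zero.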
Note:
Changing Penalties
You can change the default penalties with the PENALTIES= option in the PLOT
statement. Because PROC PLOT considers penalties when it places labels, changing
the default penalties can change the placement of the labels. For example, if you have
labels that all begin with the same two-letter prefix, then you might want to increase
the default penalty for not printing the third, fourth, and fifth characters to 11, 10, and
8 and decrease the penalties for not printing the first and second characters to 2. The
following PENALTIES= option accomplishes this change:
penalties(15 to 20)=2 2 11 10 8 2
This example extends the penalty list. The twentieth penalty of 2 is the penalty for
not printing the sixth through 200th character. When the last index i is greater than
18, the last penalty is used for the (i - 14)th character and beyond.
You can also extend the penalty list by just specifying the starting index. For
example, the following PENALTIES= option is equivalent to the one above:
penalties(15)=2 2 11 10 8 2
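To see how a PENALTIES= list maps onto label characters, the following Python sketch expands a starting index and a list of values into a per-character lookup. It only models the rule stated above (index 15 is the first character, and when the last index i is greater than 18 the last value covers the (i - 14)th character through the 200th). The function name and the dictionary representation are hypothetical.

```python
FIRST_CHAR_INDEX = 15   # penalty index for not printing the first character
MAX_LABEL_CHARS = 200   # the last penalty covers characters up to the 200th


def char_penalties(start_index: int, values: list) -> dict:
    """Map 1-based character positions to 'not printed' penalties (sketch)."""
    if start_index < FIRST_CHAR_INDEX:
        raise ValueError("character penalties start at index 15")
    if not values:
        raise ValueError("at least one penalty value is required")
    table = {}
    for offset, value in enumerate(values):
        # Index i applies to character i - 14.
        table[start_index + offset - 14] = value
    last_index = start_index + len(values) - 1
    if last_index > 18:
        # The last penalty is reused for every later character.
        for position in range(last_index - 14 + 1, MAX_LABEL_CHARS + 1):
            table[position] = values[-1]
    return table


table = char_penalties(15, [2, 2, 11, 10, 8, 2])
print([table[k] for k in range(1, 8)])   # penalties for characters 1 through 7
```

With the values from the example, the third, fourth, and fifth characters become the most expensive ones to hide, so PROC PLOT prefers to sacrifice the shared two-letter prefix first.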
Collision States
Collision states are placement states that may cause a label to collide with its own
plotting symbol. PROC PLOT usually avoids using collision states because of the large
default penalty of 500 that is associated with them. PROC PLOT does not consider the
actual length or splitting of any particular label when determining if a placement state
is a collision state. The following are the rules that PROC PLOT uses to determine
collision states:
- When S=CENTER, placement states that do not shift the label up or down
  sufficiently so that all of the label is shifted onto completely different lines from
  the symbol are collision states.
- When S=RIGHT, placement states that shift the label zero or more positions to the
  left without first shifting the label up or down onto completely different lines from
  the symbol are collision states.
- When S=LEFT, placement states that shift the label zero or more positions to the
  right without first shifting the label up or down onto completely different lines
  from the symbol are collision states.
Note:
Reference Lines
PROC PLOT places labels and computes penalties before placing reference lines on a
plot. The procedure does not attempt to avoid rows and columns that contain reference
lines.
len
Time
For a given plot size, the time that is required to construct the plot is roughly
proportional to n × len. The amount of time required to split the labels is roughly
proportional to n × s². Generally, the more placement states that you specify, the more
time that PROC PLOT needs to place the labels. However, increasing the number of
horizontal and vertical shifts gives PROC PLOT more flexibility to avoid collisions,
often resulting in less time used to place labels.
Memory
PROC PLOT uses 24p bytes of memory for the internal placement state list. PROC
PLOT uses n(84 + 5len + 4s(1 + 1.5(s + 1))) bytes for the internal list of labels.
PROC PLOT builds all plots in memory; each printing position uses one byte of memory.
If you run out of memory, then request fewer plots in each PLOT statement and put a
RUN statement after each PLOT statement.
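The memory figures above are easy to evaluate for a concrete case. The Python sketch below is a rough estimator, assuming that n is the number of labeled points, len is the label length, s is the number of label fragments, and p is the number of placement states (that reading of the symbols is an assumption, as are the function names and the hpos and vpos arguments used for the plot size).

```python
def placement_state_list_bytes(p: int) -> int:
    """24p bytes for the internal placement state list."""
    return 24 * p


def label_list_bytes(n: int, label_len: int, s: int) -> float:
    """n(84 + 5len + 4s(1 + 1.5(s + 1))) bytes for the internal list of labels."""
    return n * (84 + 5 * label_len + 4 * s * (1 + 1.5 * (s + 1)))


def plot_buffer_bytes(hpos: int, vpos: int) -> int:
    """One byte per printing position of a plot that is built in memory."""
    return hpos * vpos


# Ten labels of length 14, each split into at most two fragments,
# with twelve placement states, on an 80 by 35 plot.
total = (placement_state_list_bytes(12)
         + label_list_bytes(10, 14, 2)
         + plot_buffer_bytes(80, 35))
print(total)
```

In this example the plot buffer dominates, which is consistent with the advice to request fewer plots in each PLOT statement when memory runs short.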
Printed Output
Each plot uses one full page unless the plot's size is changed by the VPOS= and
HPOS= options in the PLOT statement, the VPERCENT= or HPERCENT= options in
the PROC PLOT statement, or the PAGESIZE= and LINESIZE= system options. Titles,
legends, and variable labels are printed at the top of each page. Each axis is labeled
with the variable's name or, if it exists, the variable's label.
Normally, PROC PLOT begins a new plot on a new page. However, the VPERCENT=
and HPERCENT= options enable you to print more than one plot on a page.
VPERCENT= and HPERCENT= are described earlier in PROC PLOT Statement on
page 609.
PROC PLOT always begins a new page after a RUN statement and at the beginning
of a BY group.
Table 32.4
Table Name     Description
Plot           A single plot
Overlaid
Missing Values
If values of either of the plotting variables are missing, then PROC PLOT does not
include the observation in the plot. However, in a plot of Y*X, values of X with
corresponding missing values of Y are included in scaling the X axis, unless the
NOMISS option is specified in the PROC PLOT statement.
Hidden Observations
By default, PROC PLOT uses different plotting symbols (A, B, C, and so on) to
represent observations whose values coincide on a plot. However, if you specify your
own plotting symbol or if you use the OVERLAY option, then you may not be able to
recognize coinciding values.
If you specify a plotting symbol, then PROC PLOT uses the same symbol regardless
of the number of observations whose values coincide. If you use the OVERLAY option
and overprinting is not in effect, then PROC PLOT uses the symbol from the first plot
request. In both cases, the output includes a message telling you how many
observations are hidden.
PLOT statement
plotting symbol in plot request
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. NUMBER enables printing of the page number. PAGENO= specifies the starting
page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the
number of lines on an output page.
options nodate number pageno=1 linesize=80 pagesize=35;
Create the DJIA data set. DJIA contains the high and low closing marks for the Dow Jones
Industrial Average from 1954 to 1994. A DATA step on page 1383 creates this data set.
data djia;
input Year @7 HighDate date7. High @24 LowDate date7. Low;
format highdate lowdate date7.;
datalines;
1954 31DEC54 404.39 11JAN54 279.87
1955 30DEC55 488.40 17JAN55 388.20
...more data lines...
1993 29DEC93 3794.33 20JAN93 3241.95
1994 31JAN94 3978.36 04APR94 3593.35
;
Create the plot. The plot request plots the values of High on the vertical axis and the values of
Year on the horizontal axis. It also specifies an asterisk as the plotting symbol.
proc plot data=djia;
plot high*year='*';
Output
PROC PLOT determines the tick marks and the scale of both axes.
[Plot: High (vertical axis, 0 to 4000) by Year (horizontal axis, 1950 to 2000). Symbol used is '*'.]
This example specifies values for the horizontal axis and draws a reference line from
the vertical axis.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=35;
Create the plot. The plot request plots the values of High on the vertical axis and the values of
Year on the horizontal axis. It also specifies an asterisk as the plotting symbol.
proc plot data=djia;
plot high*year='*'
Customize the horizontal axis and draw a reference line. HAXIS= specifies that the
horizontal axis will show the values 1950 to 1995 in five-year increments. VREF= draws a
reference line that extends from the value 3000 on the vertical axis.
/ haxis=1950 to 1995 by 5 vref=3000;
Output
High Values of Dow Jones Industrial Average
from 1954 to 1994
[Plot of High*Year. Symbol used is '*'. Vertical axis: High, 0 to 4000, with a reference line at 3000; horizontal axis: Year, 1950 to 1995 in five-year increments.]
This example overlays two plots and puts a box around the plot.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=64 pagesize=30;
Create the plot. The first plot request plots High on the vertical axis, plots Year on the
horizontal axis, and specifies an asterisk as a plotting symbol. The second plot request plots
Low on the vertical axis, plots Year on the horizontal axis, and specifies an 'o' as a plotting
symbol. OVERLAY superimposes the second plot onto the first. BOX draws a box around the
plot. OVERLAY and BOX apply to both plot requests.
proc plot data=djia;
plot high*year='*'
     low*year='o' / overlay box;
Output
Plot of Highs and Lows
for the Dow Jones Industrial Average
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=120 pagesize=60;
Specify the plot sizes. VPERCENT= specifies that 50% of the vertical space on the page of
output is used for each plot. HPERCENT= specifies that 50% of the horizontal space is used for
each plot.
proc plot data=djia vpercent=50 hpercent=50;
Create the first plot. This plot request plots the values of High on the vertical axis and the
values of Year on the horizontal axis. It also specifies an asterisk as the plotting symbol.
plot high*year='*';
Create the second plot. This plot request plots the values of Low on the vertical axis and the
values of Year on the horizontal axis. It also specifies an 'o' as the plotting symbol.
plot low*year='o';
Create the third plot. The first plot request plots High on the vertical axis, plots Year on the
horizontal axis, and specifies an asterisk as a plotting symbol. The second plot request plots
Low on the vertical axis, plots Year on the horizontal axis, and specifies an 'o' as a plotting
symbol. OVERLAY superimposes the second plot onto the first. BOX draws a box around the
plot. OVERLAY and BOX apply to both plot requests.
plot high*year='*' low*year='o' / overlay box;
Output
[Three plots on one page, each with a vertical axis from 0 to 4000 and a horizontal axis (Year) from 1950 to 2000:
 Plot of High*Year. Symbol used is '*'.
 Plot of Low*Year. Symbol used is 'o'.
 Plot of High*Year ('*') overlaid with Plot of Low*Year ('o'), drawn in a box.]
NOTE: 7 obs hidden.
This example uses a DATA step to generate data. The PROC PLOT step shows two
plots of the same data: one plot without a horizontal axis specification and one plot
with a logarithmic scale specified for the horizontal axis.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create the EQUA data set. EQUA contains values of X and Y. Each value of X is calculated as
10 raised to the power Y.
data equa;
do Y=1 to 3 by .1;
X=10**y;
output;
end;
run;
Specify the plot sizes. HPERCENT= makes room for two plots side-by-side by specifying that
50% of the horizontal space is used for each plot.
proc plot data=equa hpercent=50;
Create the plots. The plot requests plot Y on the vertical axis and X on the horizontal axis.
HAXIS= specifies a logarithmic scale for the horizontal axis for the second plot.
plot y*x;
plot y*x / haxis=10 100 1000;
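The effect of the two axis specifications can be checked with a little arithmetic. The Python sketch below mirrors the DATA step (Y from 1 to 3 by 0.1, X = 10**Y) and computes where each point would fall, as a fraction of the axis width, on a linear axis from 0 to 1000 and on an axis whose tick marks 10, 100, and 1000 are equally spaced. The helper names are hypothetical, and the mapping is a simplified model of a logarithmic axis rather than PROC PLOT's own scaling code.

```python
import math


def equa() -> list:
    """Mirror the DATA step: Y runs from 1 to 3 by 0.1 and X = 10**Y."""
    rows = []
    for i in range(21):
        y = 1 + i / 10
        rows.append((10 ** y, y))
    return rows


def linear_axis_position(x: float, low: float = 0.0, high: float = 1000.0) -> float:
    """Fraction of the way along a linear axis from low to high."""
    return (x - low) / (high - low)


def log_axis_position(x: float, low: float = 10.0, high: float = 1000.0) -> float:
    """Fraction of the way along an axis with equally spaced ticks 10, 100, 1000."""
    return (math.log10(x) - math.log10(low)) / (math.log10(high) - math.log10(low))


for x, y in equa()[::5]:
    print(f"Y={y:.1f}  linear={linear_axis_position(x):.3f}  log={log_axis_position(x):.3f}")
```

On the logarithmic axis the position grows by the same amount for every step in Y, so the points line up; on the linear axis most of the points are crowded near the left edge.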
Output
Two Plots with Different
Horizontal Axis Specifications
[Two plots of Y*X, side by side. Symbol used is 'A'. Vertical axis in both plots: Y, 1.0 to 3.0 in increments of 0.1. Horizontal axis (X) in the first plot: tick marks at 0, 500, and 1000. Horizontal axis (X) in the second plot: tick marks at 10, 100, and 1000.]
This example shows how you can specify date values on an axis.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=120 pagesize=40;
488
460
356
480
388
328
280
394
590
330
321
511
309
Create the plot. The plot request plots Calls on the vertical axis and Date on the horizontal
axis. HAXIS= specifies a monthly time scale for the horizontal axis. The notation '1JAN94'd is a date
constant. The value '1JAN95'd ensures that the axis will have enough room for observations
from December.
proc plot data=emergency_calls;
plot calls*date / haxis='1JAN94'd to '1JAN95'd by month;
Format the DATE values. The FORMAT statement assigns the DATE7. format to Date.
format date date7.;
Output
[Plot of Calls by Date. Symbol used is 'A'. Vertical axis: Number of Calls, 100 to 600; horizontal axis: Date, 01JAN94 to 01JAN95 in monthly increments.]
This example shows how to represent the values of three variables with a
two-dimensional plot by setting one of the variables as the CONTOUR variable. The
variables X and Y appear on the axes, and Z is the contour variable. Program
statements are used to generate the observations for the plot, and the following
equation describes the contour surface:
\[ z = 46.2 + 0.09x - 0.0005x^2 + 0.1y - 0.0005y^2 + 0.0004xy \]
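As a rough check of the contour surface, the Python sketch below evaluates a quadratic surface z = 46.2 + 0.09x - 0.0005x^2 + 0.1y - 0.0005y^2 + 0.0004xy over a grid and splits the range of z into ten equal increments. The coefficients and the grid (X from 0 to 400 by 5, Y from 0 to 350 by 10) are assumptions: they were chosen because they reproduce the first five observations printed for the CONTOURS data set and the range of z (2.2 to 61.3) shown in the plot legend. Equal-width increments are likewise an assumption about how CONTOUR=10 divides the values.

```python
def z_surface(x: float, y: float) -> float:
    """Assumed contour surface; see the lead-in for how it was chosen."""
    return (46.2 + 0.09 * x - 0.0005 * x ** 2
            + 0.1 * y - 0.0005 * y ** 2 + 0.0004 * x * y)


def contour_levels(z_min: float, z_max: float, n: int = 10) -> list:
    """Split the range of z into n equal increments (assumed behavior)."""
    width = (z_max - z_min) / n
    return [(z_min + k * width, z_min + (k + 1) * width) for k in range(n)]


# Assumed grid: X from 0 to 400 by 5, Y from 0 to 350 by 10.
grid = [z_surface(x, y) for x in range(0, 401, 5) for y in range(0, 351, 10)]
z_min, z_max = min(grid), max(grid)

for low, high in contour_levels(z_min, z_max):
    print(f"{low:5.1f} - {high:5.1f}")
```

Each increment would then be drawn with its own plotting symbol, from the lowest values of Z to the highest.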
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=64 pagesize=25;
Print the CONTOURS data set. The OBS= data set option limits the printing to only the first
5 observations. NOOBS suppresses printing of the observation numbers.
proc print data=contours(obs=5) noobs;
title 'CONTOURS Data Set';
title2 'First 5 Observations Only';
run;
   Z      X      Y
46.2      0      0
47.2      0     10
48.0      0     20
48.8      0     30
49.4      0     40
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. NOOVP ensures
that overprinting is not used in the plot.
options nodate pageno=1 linesize=120 pagesize=60 noovp;
Create the plot. The plot request plots Y on the vertical axis, plots X on the horizontal axis,
and specifies Z as the contour variable. CONTOUR=10 specifies that the plot will divide the
values of Z into ten increments, and each increment will have a different plotting symbol.
proc plot data=contours;
plot y*x=z / contour=10;
Output
The shadings associated with the values of Z appear at the bottom of the plot. The plotting symbol # shows
where high values of Z occur.
A Contour Plot
[Contour plot of Y*X with Z as the contour variable. Vertical axis: Y; horizontal axis: X, 0 to 400. The legend lists ten contour levels for z, from 2.2 to 61.3, with symbols that run from '.' for the lowest values to '#' for the highest.]
BY statement
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=35;
Create the EDUCATION data set. EDUCATION contains educational data* about some U.S.
states. DropoutRate is the percentage of high school dropouts. Expenditures is the dollar
amount the state spends on each pupil. MathScore is the score of eighth-grade students on a
standardized math test. Not all states participated in the math test. A DATA step on page 1384
creates this data set.
data education;
input State $14. +1 Code $ DropoutRate Expenditures MathScore
Region $;
label dropout='Dropout Percentage - 1989'
      expend='Expenditure Per Pupil - 1989'
      math='8th Grade Math Exam - 1990';
datalines;
Alabama        AL 22.3 3197 252 SE
Alaska         AK 35.8 7716 .   W
...more data lines...
New York       NY 35.0 .    261 NE
North Carolina NC 31.2 3874 250 SE
North Dakota   ND 12.1 3952 281 MW
Ohio           OH 24.4 4649 264 MW
;
Sort the EDUCATION data set. PROC SORT sorts EDUCATION by Region so that Region
can be used as the BY variable in PROC PLOT.
proc sort data=education;
by region;
run;
Create a separate plot for each BY group. The BY statement creates a separate plot for
each value of Region.
proc plot data=education;
by region;
Create the plot with a reference line. The plot request plots Expenditures on the vertical
axis, plots DropoutRate on the horizontal axis, and specifies an asterisk as the plotting symbol.
HREF= draws a reference line that extends from 28.6 on the horizontal axis. The reference line
represents the national average.
plot expenditures*dropoutrate='*' / href=28.6;
Output
PROC PLOT produces a plot for each BY group. Only the plots for Midwest and Northeast
are shown.
[Plot for the Midwest BY group. Symbol used is '*'. Vertical axis: Expenditures, 3500 to 5500; horizontal axis: Dropout Percentage - 1989, 10 to 30, with a reference line at 28.6.]
[Plot for the Northeast BY group. Symbol used is '*'. Vertical axis: Expenditures, 4000 to 8000; horizontal axis: Dropout Percentage - 1989, 15 to 35, with a reference line at 28.6.]
NOTE: 1 obs had missing values.
PLOT statement
label variable in plot request
Data set: EDUCATION on page 646
This example shows how to modify the plot request to label points on the plot with
the values of variables. This example adds labels to the plot shown in Example 8 on
page 646.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=35;
Sort the EDUCATION data set. PROC SORT sorts EDUCATION by Region so that Region
can be used as the BY variable in PROC PLOT.
proc sort data=education;
by region;
run;
Create a separate plot for each BY group. The BY statement creates a separate plot for
each value of Region.
proc plot data=education;
by region;
Create the plot with a reference line and a label for each data point. The plot request
plots Expenditures on the vertical axis, plots DropoutRate on the horizontal axis, and specifies
an asterisk as the plotting symbol. The label variable specification ($ state) in the PLOT
statement labels each point on the plot with the name of the corresponding state. HREF= draws
a reference line that extends from 28.6 on the horizontal axis. The reference line represents the
national average.
plot expenditures*dropoutrate='*' $ state / href=28.6;
Output
PROC PLOT produces a plot for each BY group. Only the plots for Midwest and Northeast are
shown.
[Plot for the Midwest BY group, with each point labeled by state (Michigan, Illinois, Minnesota, Ohio, Nebraska, Kansas, Iowa, Indiana, Missouri, North Dakota). Symbol used is '*'. Vertical axis: Expenditures, 3500 to 5500; horizontal axis: Dropout Percentage - 1989, 10 to 30, with a reference line at 28.6.]
[Plot for the Northeast BY group, with each point labeled by state (New Jersey, Connecticut, Massachusetts, Maryland, Delaware, Maine, New Hampshire). Symbol used is '*'. Vertical axis: Expenditures, 4000 to 8000; horizontal axis: Dropout Percentage - 1989, 15 to 35, with a reference line at 28.6.]
NOTE: 1 obs had missing values.
This example shows how missing values affect the calculation of the axes.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=35;
Sort the EDUCATION data set. PROC SORT sorts EDUCATION by Region so that Region
can be used as the BY variable in PROC PLOT.
proc sort data=education;
by region;
run;
Exclude data points with missing values. NOMISS excludes observations that have a
missing value for either of the axis variables.
proc plot data=education nomiss;
Create a separate plot for each BY group. The BY statement creates a separate plot for
each value of Region.
by region;
Create the plot with a reference line and a label for each data point. The plot request
plots Expenditures on the vertical axis, plots DropoutRate on the horizontal axis, and specifies
an asterisk as the plotting symbol. The label variable specification ($ state) in the PLOT
statement labels each point on the plot with the name of the corresponding state. HREF= draws
a reference line extending from 28.6 on the horizontal axis. The reference line represents the
national average.
plot expenditures*dropoutrate='*' $ state / href=28.6;
Output
PROC PLOT produces a plot for each BY group. Only the plot for the Northeast is shown.
Because New York has a missing value for Expenditures, the observation is excluded and
PROC PLOT does not use the value 35 for DropoutRate to calculate the horizontal axis.
Compare the horizontal axis in this output with the horizontal axis in the plot for Northeast in
Example 9 on page 649.
[Plot for the Northeast BY group, with each point labeled by state (New Jersey, Connecticut, Massachusetts, Maryland, Delaware, Maine, New Hampshire). Symbol used is '*'. Vertical axis: Expenditures, 4000 to 8000; horizontal axis: Dropout Percentage - 1989, 16 to 30 in increments of 2, with a reference line at 28.6.]
NOTE: 1 obs had missing values.
This example illustrates the default placement of labels and how to adjust the
placement of labels on a crowded plot. The labels are values of a variable in the data set.*
This example also shows RUN group processing in PROC PLOT.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=120 pagesize=37;
Create the CENSUS data set. CENSUS contains the variables CrimeRate and Density for
selected states. CrimeRate is the number of crimes per 100,000 people. Density is the population
density per square mile in the 1980 census. A DATA step on page 1377 creates this data set.
data census;
input Density CrimeRate State $ 14-27 PostalCode $ 29-30;
datalines;
263.3 4575.3 Ohio           OH
 62.1 7017.1 Washington     WA
...more data lines...
111.6 4665.6 Tennessee      TN
120.4 4649.9 North Carolina NC
;
Create the plot with a label for each data point. The plot request plots Density on the
vertical axis, CrimeRate on the horizontal axis, and uses the first letter of the value of State as
the plotting symbol. This makes it easier to match the symbol with its label. The label variable
specification ($ state) in the PLOT statement labels each point with the corresponding state
name.
proc plot data=census;
plot density*crimerate=state $ state /
Specify plot options. BOX draws a box around the plot. LIST= lists the labels that have
penalties greater than or equal to 1. HAXIS= and VAXIS= specify increments only. PROC PLOT
uses the data to determine the range for the axes.
box
list=1
haxis=by 1000
vaxis=by 250;
* Source: U.S. Bureau of the Census and the 1987 Uniform Crime Reports, FBI.
The labels Tennessee, South Carolina, Arkansas, Minnesota, and South Dakota have penalties. The
default placement states do not provide enough possibilities for PROC PLOT to avoid penalties given the
proximity of the points. Seven label characters are hidden.
[Plot of Density*CrimeRate, drawn in a box. The plotting symbol is the first letter of State, and each point is labeled with the state name. Vertical axis: Density, 0 to 500 in increments of 250; horizontal axis: CrimeRate, 2000 to 9000 in increments of 1000. Several labels in the crowded lower part of the plot overwrite one another.]
NOTE: 7 label characters hidden.

                  Vertical   Horizontal              Starting            Vertical   Horizontal
Label             Axis       Axis          Penalty   Position   Lines    Shift      Shift
Tennessee         111.60     4665.6         2        Center     1         1         -1
South Carolina    103.40     5161.9         2        Right      1         0          2
Arkansas           43.90     4245.2         6        Right      1         0          2
Minnesota          51.20     4615.8         7        Left       1         0         -2
South Dakota        9.10     2678.0        11        Right      1         0          2
Request a second plot. Because PROC PLOT is interactive, the procedure is still running at
this point in the program. It is not necessary to restart the procedure to submit another plot
request. LIST=1 produces no output because there are no penalties of 1 or greater.
plot density*crimerate=state $ state /
box
list=1
haxis=by 1000
vaxis=by 250
Specify placement options. PLACEMENT= gives PROC PLOT more placement states to use
to place the labels. PLACEMENT= contains three expressions. The first expression specifies the
preferred positions for the label. The first expression resolves to placement states centered
above the plotting symbol, with the label on one or two lines. The second and third expressions
resolve to placement states that enable PROC PLOT to place the label in multiple positions
around the plotting symbol.
placement=((v=2 1 : l=2 1)
           ((l=2 2 1 : v=0 1 0) * (s=right left : h=2 -2))
           (s=center right left * l=2 1 * v=0 1 -1 2 *
            h=0 1 to 5 by alt));
Output
[Plot of Density*CrimeRate, drawn in a box, with the PLACEMENT= option in effect. The plotting symbol is the first letter of State, and each point is labeled with the state name. Some labels are split across two lines (for example, New Hampshire, West Virginia, South Carolina, and South Dakota). Vertical axis: Density, 0 to 500 in increments of 250; horizontal axis: CrimeRate, 2000 to 9000 in increments of 1000.]
This example illustrates the default placement of labels and uses a macro to adjust
the placement of labels. The labels are values of a variable in the data set.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=120 pagesize=37;
Use conditional logic to determine placement. The %PLACE macro provides an alternative
to using the PLACEMENT= option. The higher the value of n, the more freedom PROC PLOT
has to place labels.
%macro place(n);
%if &n > 13 %then %let n = 13;
placement=(
%if &n <= 0 %then (s=center); %else (h=2 -2 : s=right left);
%if &n = 1 %then (v=1 * h=0 -1 to -2 by alt);
%else %if &n = 2 %then (v=1 -1 * h=0 -1 to -5 by alt);
%else %if &n > 2 %then (v=1 to 2 by alt * h=0 -1 to -10 by alt);
%if &n > 3 %then
(s=center right left * v=0 1 to %eval(&n - 2) by alt *
h=0 -1 to %eval(-3 * (&n - 2)) by alt *
l=1 to %eval(2 + (10 * &n - 35) / 30)); )
%if &n > 4 %then penalty(7)=%eval((3 * &n) / 2);
%mend;
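As a worked illustration (not part of the original program), the following shows the option text that %PLACE(4) would generate, traced by hand from the macro definition above. The trace assumes that %EVAL performs integer arithmetic, so (10 * 4 - 35) / 30 evaluates to 0 and the upper bound for L= becomes 2. Setting the MPRINT system option should make it possible to confirm the generated text in the SAS log.

```sas
/* Hand-traced expansion of %place(4); line breaks added for readability. */
/* With n=4, the %ELSE branch of the first test is taken, the "n > 2" and  */
/* "n > 3" branches contribute one expression each, and the "n > 4"       */
/* branch (the PENALTY(7)= override) is skipped.                          */
placement=(
   (h=2 -2 : s=right left)
   (v=1 to 2 by alt * h=0 -1 to -10 by alt)
   (s=center right left * v=0 1 to 2 by alt *
    h=0 -1 to -6 by alt *
    l=1 to 2)
)
```

Larger values of n widen the ranges for V= and H= in the last expression, which is how the macro gives PROC PLOT more freedom to place labels.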
Create the plot. The plot request plots Density on the vertical axis, CrimeRate on the
horizontal axis, and uses the first letter of the value of State as the plotting symbol. The label
variable specification ($ state) in the PLOT statement labels each point with the
corresponding state name.
proc plot data=census;
plot density*crimerate=state $ state /
Specify plot options. BOX draws a box around the plot. LIST= lists the labels that have
penalties greater than or equal to 1. HAXIS= and VAXIS= specify increments only. PROC PLOT
uses the data to determine the range for the axes. The PLACE macro determines the placement
of the labels.
box
list=1
haxis=by 1000
vaxis=by 250
%place(4);
Output
[Plot of Density*CrimeRate, boxed. The plotting symbol is the first letter of State, and each point is labeled with the state name, placed by the %PLACE macro. Vertical axis: Density, 0 to 500 by 250. Horizontal axis: CrimeRate, 2000 to 9000 by 1000.]
This example demonstrates how changing a default penalty affects the placement of
labels. The goal is to produce a plot that has labels that do not detract from how the
points are scattered.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=120 pagesize=37;
Create the plot. The plot request plots Density on the vertical axis, CrimeRate on the
horizontal axis, and uses the first letter of the value of State as the plotting symbol. The label
variable specification ($ state) in the PLOT statement labels each point with the
corresponding state name.
proc plot data=census;
plot density*crimerate=state $ state /
Specify the placement. PLACEMENT= specifies that the preferred placement states are 100
columns to the left and the right of the point, on the same line with the point.
placement=(h=100 to 10 by alt * s=left right)
Change the default penalty. PENALTIES(4)= changes the default penalty for a free
horizontal shift to 500, which removes all penalties for a horizontal shift. LIST= shows how far
PROC PLOT shifted the labels away from their respective points.
penalties(4)=500
list=0
Customize the axes. HAXIS= creates a horizontal axis long enough to leave space for the
labels on the sides of the plot. VAXIS= specifies that the values on the vertical axis be in
increments of 100.
haxis=0 to 13000 by 1000
vaxis=by 100;
Output
[Plot of Density*CrimeRate. The plotting symbol is the first letter of State; the state-name labels are shifted to the left and right sides of the plot, away from the points. Vertical axis: Density, 0 to 500 by 100. Horizontal axis: CrimeRate, 0 to 13000 by 1000.]
NOTE: 1 obs hidden.
                  Vertical  Horizontal            Starting           Vertical  Horizontal
Label                 Axis        Axis  Penalty   Position   Lines      Shift       Shift
Maryland            428.70      5477.6        0   Right          1          0          55
Delaware            307.60      4938.8        0   Right          1          0          59
Pennsylvania        264.30      3163.2        0   Right          1          0          65
Ohio                263.30      4575.3        0   Right          1          0          66
Illinois            205.30      5416.5        0   Right          1          0          56
Florida             180.00      8503.2        0   Left           1          0         -64
California          151.40      6506.4        0   Right          1          0          45
Tennessee           111.60      4665.6        0   Right          1          0          61
North Carolina      120.40      4649.9        0   Right          1          0          46
New Hampshire       102.40      3371.7        0   Right          1          0          52
South Carolina      103.40      5161.9        0   Right          1          0          52
Georgia              94.10      5792.0        0   Left           1          0         -42
West Virginia        80.80      2190.7        0   Right          1          0          76
Alabama              76.60      4451.4        0   Right          1          0          41
Missouri             71.20      4707.5        0   Right          1          0          47
Mississippi          53.40      3438.6        0   Right          1          0          68
Vermont              55.20      4271.2        0   Right          1          0          44
Minnesota            51.20      4615.8        0   Right          1          0          49
Washington           62.10      7017.1        0   Left           1          0         -49
Texas                54.30      7722.4        0   Left           1          0         -49
Arkansas             43.90      4245.2        0   Right          1          0          65
Oklahoma             44.10      6025.6        0   Left           1          0         -43
Idaho                11.50      4156.3        0   Right          1          0          69
Oregon               27.40      6969.9        0   Left           1          0         -53
South Dakota          9.10      2678.0        0   Right          1          0          67
North Dakota          9.40      2833.0        0   Right          1          0          52
Nevada                7.30      6371.4        0   Right          1          0          50
CHAPTER 33
The PMENU Procedure
Overview: PMENU Procedure 665
Syntax: PMENU Procedure 666
PROC PMENU Statement 667
CHECKBOX Statement 668
DIALOG Statement 668
ITEM Statement 670
MENU Statement 673
RADIOBOX Statement 675
RBUTTON Statement 675
SELECTION Statement 676
SEPARATOR Statement 677
SUBMENU Statement 677
TEXT Statement 678
Concepts: PMENU Procedure 679
Procedure Execution 679
Initiating the Procedure 679
Ending the Procedure 680
Steps for Building and Using PMENU Catalog Entries 680
Templates for Coding PROC PMENU Steps 681
Examples: PMENU Procedure 682
Example 1: Building a Menu Bar for an FSEDIT Application 682
Example 2: Collecting User Input in a Dialog Box 685
Example 3: Creating a Dialog Box to Search Multiple Variables 688
Example 4: Creating Menus for a DATA Step Window Application 694
Example 5: Associating Menus with a FRAME Application 700
[Figure 33.1: A menu bar (Edit, Reports, Help), pull-down menu items (Farm, Industrial..., Manufacturing...), and a dialog box that prompts "Select a commodity:" (Wheat, Corn, Oats) and "Select a market:" (Farmville, Monticello, Plainview), with OK and Cancel buttons.]
Note: A menu bar in some operating environments may appear as a popup menu or
may appear at the bottom of the window.
The PMENU procedure produces no immediately visible output. It simply builds a
catalog entry of type PMENU that can be used later in an application.
You must use at least one MENU statement followed by at least one ITEM
statement.
You can also use appropriate global statements with this procedure. See
Chapter 2, "Fundamental Concepts for Using Base SAS Procedures," on page 15 for a
list.
MENU pull-down-menu;
SELECTION selection command-string;
SEPARATOR;
SUBMENU submenu-name SAS-file;
To do this
CHECKBOX
DIALOG
ITEM
MENU
SELECTION
SEPARATOR
SUBMENU
TEXT
Options
CATALOG=<libref.>catalog
specifies the catalog in which the PMENU entries are stored.
DESC 'entry-description'
provides a description for the PMENU catalog entries created in the step.
Default: Menu description
Note: These descriptions are displayed when you use the CATALOG window in
the windowing environment or the CONTENTS statement in the CATALOG
procedure.
CHECKBOX Statement
Defines choices that a user can make within a dialog box.
Required Arguments
column
specifies the column in the dialog box where the check box and text are placed.
line
specifies the line in the dialog box where the check box and text are placed.
text-for-selection
defines the text that describes this check box. This text appears in the window and,
if the SUBSTITUTE= option is not used, is also inserted into the command in the
preceding DIALOG statement when the user selects the check box.
Options
COLOR=color
defines the color of the check box and the text that describes it.
ON
indicates that by default this check box is active. If you use this option, then you
must specify it immediately after the CHECKBOX keyword.
SUBSTITUTE=text-for-substitution
specifies the text that is to be inserted into the command in the DIALOG statement.
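The following is a minimal sketch of how the CHECKBOX arguments and options described above might fit together. The catalog WORK.DEMO, the menu names, and the %PRTRPT macro are hypothetical names chosen for illustration; the doubled percent sign is assumed to produce a literal percent sign in the command string, and &1 and &2 mark where the check box text is inserted.

```sas
proc pmenu catalog=work.demo;          /* hypothetical catalog            */
   menu rptbar;                        /* hypothetical menu bar name      */
      item 'Report' menu=rep;
   menu rep;
      item 'Print...' dialog=prtdlg;
      /* %prtrpt is a hypothetical user-written macro, not a SAS macro */
      dialog prtdlg '%%prtrpt(&1,&2)';
         text #1 @1 'Select report options:';
         /* ON comes immediately after the CHECKBOX keyword */
         checkbox on #3 @5 'Include totals' substitute='TOTALS';
         checkbox    #4 @5 'Include notes'  substitute='NOTES';
run;
quit;
```

Because SUBSTITUTE= is used on both check boxes, the short values TOTALS and NOTES, rather than the displayed text, would be inserted into the command.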
DIALOG Statement
Describes a dialog box that is associated with an item on a pull-down menu.
Restriction: Must be followed by at least one TEXT statement.
Featured in: Example 2 on page 685, Example 3 on page 688, and Example 4 on page 694
Required Arguments
command-string
is the command or partial command that is executed when the item is selected. The
limit of the command-string that results after the substitutions are made is the
command-line limit for your operating environment. Typically, the command-line
limit is approximately 80 characters.
The limit for command-string field-number-specification is 200 characters.
Note: If you are using PROC PMENU to submit any command that is valid only
in the PROGRAM EDITOR window (such as the INCLUDE command), then you
must have the windowing environment running, and you must return control to the
PROGRAM EDITOR window.
dialog-box
is the same name specified for the DIALOG= option in a previous ITEM statement.
field-number-specification
Note: To specify a literal @ (at sign), % (percent sign), or & (ampersand) in the
command-string, use a double character: @@ (at signs), %% (percent signs), or &&
(ampersands).
Details
- You cannot control the placement of the dialog box. The dialog box is not
  scrollable. The size and placement of the dialog box are determined by your
  windowing environment.
- To use the DIALOG statement, specify an ITEM statement with the DIALOG=
  option in the ITEM statement.
- The ITEM statement creates an entry in a menu bar or in a pull-down menu, and
  the DIALOG= option specifies which DIALOG statement describes the dialog box.
- You can use CHECKBOX, RADIOBOX, and RBUTTON statements to define the
  contents of the dialog box.
- Figure 33.2 on page 670 shows a typical dialog box. A dialog box can request
  information in three ways:
  - Fill in a field. Fields that accept text from a user are called text fields.
  - Choose from a list of mutually exclusive choices. A group of selections of this
    type is called a radio box, and each individual selection is called a radio
    button.
  - Indicate whether you want to select other independent choices. For example,
    you could choose to use various options by selecting any or all of the listed
    selections. A selection of this type is called a check box.
[Figure 33.2: A typical dialog box, with callouts for a radio box, a text field, a check box, and a push button. The box prompts "Select a commodity:" (Wheat, Corn, Oats) and "Select a market:" (Farmville, Monticello, Plainview) and has OK and Cancel buttons.]
Dialog boxes have two or more buttons, such as OK and Cancel, automatically
built into the box.* A button causes an action to occur.
ITEM Statement
Identifies an item to be listed in a menu bar or in a pull-down menu.
To do this
DIALOG=
MENU=
SELECTION=
SUBMENU=
HELP=
ACCELERATE=
GRAY
ID=
MNEMONIC=
STATE=
Required Arguments
command
a single word that is a valid SAS command for the window in which the menu
appears. Commands that are more than one word, such as WHERE CLEAR, must be
enclosed in single quotation marks. The command appears in uppercase letters on
the menu bar.
If you want to control the case of a SAS command on the menu, then enclose the
command in single quotation marks. The case that you use then appears on the
menu.
menu-item
a word or text string, enclosed in quotation marks, that describes the action that
occurs when the user selects this item. A menu item should not begin with a percent
sign (%).
Options
ACCELERATE=name-of-key
defines a key sequence that can be used instead of selecting an item. When the user
presses the key sequence, it has the same effect as selecting the item from the menu
bar or pull-down menu.
Restriction: The functionality of this option is limited to only a few characters. For
Restriction: This option is not available in all operating environments. If you
include this option and it is not available in your operating environment, then the
option is ignored.
action-option
MENU=pull-down-menu
the name of an associated MENU statement, which displays a pull-down menu
when the user selects this item.
Featured in: Example 1 on page 682
SELECTION=selection
the name of an associated SELECTION statement, which submits a command
when the user selects this item.
Featured in: Example 1 on page 682
SUBMENU=submenu
the name of an associated SUBMENU statement, which displays a pmenu entry
when the user selects this item.
Featured in: Example 1 on page 682
GRAY
indicates that the item is not an active choice in this window. This option is useful
when you want to define standard lists of items for many windows, but not all items
are valid in all windows. When this option is set and the user selects the item, no
action occurs.
HELP=help-text
specifies text that is displayed when the user displays the menu item. For example,
if you use a mouse to pull down a menu, then position the mouse pointer over the
item and the text is displayed.
Restriction: This option is not available in all operating environments. If you
include this option and it is not available in your operating environment, then the
option is ignored.
Tip: The place where the text is displayed is operating environment-specific.
ID=integer
a value that is used as an identifier for an item in a pull-down menu. This identifier
is used within a SAS/AF application to selectively activate or deactivate items in a
menu or to set the state of an item as a check box or a radio button.
Minimum: 3001
Restriction: Integers from 0 to 3000 are reserved for operating environment and
SAS use.
Restriction: This option is not available in all operating environments. If you
include this option and it is not available in your operating environment, then the
option is ignored.
Tip: ID= is useful with the WINFO function in SAS Component Language.
Tip: You can use the same ID for more than one item.
MNEMONIC=character
underlines the first occurrence of character in the text string that appears on the
pull-down menu. The character must be in the text string.
The character is typically used in combination with another key, such as ALT.
When you use the key sequence, it has the same effect as putting your cursor on the
item, but it does not invoke the action that the item controls.
Restriction: This option is not available in all operating environments. If you
include this option and it is not available in your operating environment, then the
option is ignored.
STATE=CHECK|RADIO
provides the ability to place a check box or a radio button next to an item that has
been selected.
Tip: STATE= is used with the ID= option and the WINFO function in SAS
Component Language.
Restriction: This option is not available in all operating environments. If you
include this option and it is not available in your operating environment, then the
option is ignored.
MENU Statement
Names the catalog entry that stores the menus or defines a pull-down menu.
Featured in: Example 1 on page 682
MENU menu-bar;
MENU pull-down-menu;
Required Arguments
One of the following arguments is required:
menu-bar
names the catalog entry that stores the menus.
pull-down-menu
names the pull-down menu that appears when the user selects an item in the menu
bar. The value of pull-down-menu must match the pull-down-menu name that is
specified in the MENU= option in a previous ITEM statement.
[Figure 33.3: Pull-Down Menu. Labels shown in the figure: Primary windows, Other windows, OUTPUT, MANAGER, LOG, PGM, KEYS, HELP, PMENU, BYE.]
RADIOBOX Statement
Defines a box that contains mutually exclusive choices within a dialog box.
RADIOBOX DEFAULT=button-number;
Required Arguments
DEFAULT=button-number
Details
The RADIOBOX statement indicates the beginning of a list of selections.
Immediately after the RADIOBOX statement, you must list an RBUTTON statement
for each of the selections the user can make. When the user makes a choice, the text
value that is associated with the selection is inserted into the command string of the
previous DIALOG statement at field locations prefixed by a percent sign (%).
RBUTTON Statement
Lists mutually exclusive choices within a dialog box.
Required Arguments
column
specifies the column in the dialog box where the radio button and text are placed.
line
specifies the line in the dialog box where the radio button and text are placed.
text-for-selection
defines the text that appears in the dialog box and, if the SUBSTITUTE= option is
not used, defines the text that is inserted into the command in the preceding
DIALOG statement.
Note: Be careful not to overlap columns and lines when placing text and radio
buttons; if you overlap text and buttons, you will get an error message. Also, specify
space between other text and a radio button.
Options
COLOR=color
defines the color of the radio button and the text that describes the button.
Restriction: This option is not available in all operating environments. If you
include this option and it is not available in your operating environment, then the
option is ignored.
NONE
defines a button that indicates none of the other choices. Defining this button
enables the user to ignore any of the other choices. No characters, including blanks,
are inserted into the DIALOG statement.
Restriction: If you use this option, then it must appear immediately after the
RBUTTON keyword.
SUBSTITUTE=text-for-substitution
specifies the text that is to be inserted into the command in the DIALOG statement.
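As a hedged sketch of the RADIOBOX and RBUTTON descriptions above, the following step defines one radio box with three buttons. The catalog WORK.DEMO, the menu names, and the %SETORD macro are hypothetical; %1 marks the field where the selected button's value is inserted.

```sas
proc pmenu catalog=work.demo;          /* hypothetical catalog            */
   menu sortbar;                       /* hypothetical menu bar name      */
      item 'Sort' menu=srt;
   menu srt;
      item 'Order...' dialog=orddlg;
      /* %setord is a hypothetical user-written macro, not a SAS macro */
      dialog orddlg '%%setord(%1)';
         text #1 @1 'Sort order:';
         radiobox default=1;           /* first button selected by default */
            rbutton      #3 @5 'Ascending'     substitute='ASC';
            rbutton      #4 @5 'Descending'    substitute='DESC';
            /* NONE comes immediately after the RBUTTON keyword */
            rbutton none #5 @5 'No preference';
run;
quit;
```

If the user picks the NONE button, no characters are inserted at %1, so the hypothetical macro would need to tolerate an empty argument.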
SELECTION Statement
Defines a command that is submitted when an item is selected.
Required Arguments
selection
is the same name specified for the SELECTION= option in a previous ITEM
statement.
command-string
Details
You define the name of the item in the ITEM statement and specify the
SELECTION= option to associate the item with a subsequent SELECTION statement.
The SELECTION statement then defines the actual command that is submitted when
the user chooses the item in the menu bar or pull-down menu.
You are likely to use the SELECTION statement to define a command string. You
create a simple alias by using the ITEM statement, which invokes a longer command
string that is defined in the SELECTION statement. For example, you could include an
item in the menu bar that invokes a WINDOW statement to enable data entry. The
actual commands that are processed when the user selects this item are the commands
to include and submit the application.
Note: If you are using PROC PMENU to issue any command that is valid only in
the PROGRAM EDITOR window (such as the INCLUDE command), then you must
have the windowing environment running, and you must return control to the
PROGRAM EDITOR window.
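The alias idea described above can be sketched as follows. The catalog WORK.DEMO and the menu and selection names are hypothetical; the sketch assumes a window, such as FSEDIT, in which WHERE CLEAR is a valid command.

```sas
proc pmenu catalog=work.demo;          /* hypothetical catalog            */
   menu subbar;                        /* hypothetical menu bar name      */
      item 'Subset' menu=sub;
   menu sub;
      /* The user sees "Clear subset"; SELECTION= points to the command */
      item 'Clear subset' selection=wc;
      selection wc 'where clear';
run;
quit;
```

The menu text and the submitted command are kept separate, so the wording on the menu can change without touching the command string.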
SEPARATOR Statement
Draws a line between items on a pull-down menu.
SEPARATOR;
SUBMENU Statement
Specifies the SAS file that contains a common submenu associated with an item.
Featured in: Example 1 on page 682
Required Arguments
submenu-name
specifies a name for the submenu statement. To associate a submenu with a menu
item, submenu-name must match the submenu name specified in the SUBMENU=
action-option in the ITEM statement.
SAS-file
specifies the name of the SAS file that contains the common submenu.
TEXT Statement
Specifies text and the input fields for a dialog box.
Required Arguments
column
field-description
defines how the TEXT statement is used. The field-description can be one of the
following:
LEN=field-length
is the length of an input field in which the user can enter information. If the
LEN= argument is used, then the information entered in the field is inserted into
the command string of the previous DIALOG statement at field locations prefixed
by an at sign (@).
Featured in: Example 2 on page 685
text
is the text string that appears inside the dialog box at the location defined by line
and column.
line
Options
ATTR=attribute
defines the attribute for the text or input field. Valid attribute values are
- BLINK
- HIGHLIGH
- REV_VIDE
- UNDERLIN
Restriction: This option is not available in all operating environments. If you
include this option and it is not available in your operating environment, then the
option is ignored.
Restriction: Your hardware may not support all of these attributes.
COLOR=color
defines the color for the text or input field characters. These are the color values that
you can use:
BLACK
BROWN
GRAY
MAGENTA
PINK
WHITE
BLUE
CYAN
GREEN
ORANGE
RED
YELLOW
Restriction: This option is not available in all operating environments. If you
include this option and it is not available in your operating environment, then the
option is ignored.
Restriction: Your hardware may not support all of these colors.
Procedure Execution
DIALOG statements as well as statements that are associated with the DIALOG
statement within the same RUN group. For example, the following statements define
two separate PMENU catalog entries. Both are stored in the same catalog, but each
PMENU catalog entry is independent of the other. In the example, both PMENU
catalog entries create menu bars that simply list windowing environment commands
the user can select and execute:
libname proclib 'SAS-data-library';
proc pmenu catalog=proclib.mycat;
   menu menu1;
   item end;
   item bye;
run;
   menu menu2;
   item end;
   item pgm;
   item log;
   item output;
run;
When you submit these statements, you receive a message that says that the
PMENU entries have been created. To display one of these menu bars, you must
associate the PMENU catalog entry with a window and then activate the window with
the menus turned on, as described in "Steps for Building and Using PMENU Catalog
Entries" on page 680.
- Build a simple menu bar. All items on the menu bar are windowing environment
  commands:
   proc pmenu;
      menu menu-bar;
         item command;
         ...more-ITEM-statements...
   run;
- Create a menu bar with an item that submits a command other than that which
  appears on the menu bar:
   proc pmenu;
      menu menu-bar;
         item 'menu-item' selection=selection;
         ...more-ITEM-statements...
         selection selection 'command-string';
   run;
- Create a menu bar with an item that opens a dialog box, which displays
  information and requests text input:
   proc pmenu;
      menu menu-bar;
         item 'menu-item' menu=pull-down-menu;
         ...more-ITEM-statements...
      menu pull-down-menu;
         item 'menu-item' dialog=dialog-box;
         dialog dialog-box 'command @1';
            text #line @column 'text';
            text #line @column LEN=field-length;
   run;
- Create a menu bar with an item that opens a dialog box, which permits one choice
  from a list of possible values:
   proc pmenu;
      menu menu-bar;
         item 'menu-item' menu=pull-down-menu;
         ...more-ITEM-statements...
      menu pull-down-menu;
         item 'menu-item' dialog=dialog-box;
         dialog dialog-box 'command %1';
            text #line @column 'text';
            radiobox default=button-number;
               rbutton #line @column
                       'text-for-selection';
               ...more-RBUTTON-statements...
   run;
- Create a menu bar with an item that opens a dialog box, which permits several
  independent choices:
   proc pmenu;
      menu menu-bar;
         item 'menu-item' menu=pull-down-menu;
         ...more-ITEM-statements...
      menu pull-down-menu;
         item 'menu-item' dialog=dialog-box;
         dialog dialog-box 'command &1';
            text #line @column 'text';
            checkbox #line @column 'text';
            ...more-CHECKBOX-statements...
   run;
This example creates a menu bar that can be used in an FSEDIT application to
replace the default menu bar. The selections available on these pull-down menus do not
enable end users to delete or duplicate observations.
Program
Declare the PROCLIB library. The PROCLIB library is used to store menu definitions.
   libname proclib 'SAS-data-library';
Specify the catalog for storing menu definitions. Menu definitions will be stored in the
PROCLIB.MENUCAT catalog.
   proc pmenu catalog=proclib.menucat;
Specify the name of the catalog entry. The MENU statement specifies PROJECT as the
name of the catalog entry. The menus are stored in the catalog entry
PROCLIB.MENUCAT.PROJECT.PMENU.
   menu project;
Design the menu bar. The ITEM statements specify the items for the menu bar. The value of
the MENU= option is used in a subsequent MENU statement. The Edit item uses a common
predefined submenu; the menus for the other items are defined in this PROC step.
   item 'File' menu=f;
   item 'Edit' submenu=editmnu;
   item 'Scroll' menu=s;
   item 'Help' menu=h;
Design the File menu. This group of statements defines the selections available under File
on the menu bar. The first ITEM statement specifies Goback as the first selection under File.
The value of the SELECTION= option corresponds to the subsequent SELECTION statement,
which specifies END as the command that is issued for that selection. The second ITEM
statement specifies that the SAVE command is issued for that selection.
   menu f;
   item 'Goback' selection=g;
   item 'Save';
   selection g 'end';
Add the EDITMNU submenu. The SUBMENU statement associates a predefined submenu
that is located in the SAS file SASHELP.CORE.EDIT with the Edit item on the menu bar. The
name of this SUBMENU statement is EDITMNU, which corresponds with the name in the
SUBMENU= action-option in the ITEM statement for the Edit item.
   submenu editmnu sashelp.core.edit;
Design the Scroll menu. This group of statements defines the selections available under
Scroll on the menu bar.
   menu s;
   item 'Next Obs' selection=n;
   item 'Prev Obs' selection=p;
   item 'Top';
   item 'Bottom';
   selection n 'forward';
   selection p 'backward';
Design the Help menu. This group of statements defines the selections available under Help
on the menu bar. The SETHELP command specifies a HELP entry that contains user-written
information for this FSEDIT application. The semicolon that appears after the HELP entry
name enables the HELP command to be included in the string. The HELP command invokes
the HELP entry.
   menu h;
   item 'Keys';
   item 'About this application' selection=hlp;
   selection hlp 'sethelp user.menucat.staffhlp.help;help';
   quit;
You can also specify the menu bar on the command line in the FSEDIT session or by
issuing a CALL EXECCMD command in SAS Component Language (SCL).
See Associating a Menu Bar with an FSEDIT Session on page 691 for other
methods of associating the customized menu bar with the FSEDIT window.
The FSEDIT window shows the menu bar.
DIALOG statement
TEXT statement option:
LEN=
This example adds a dialog box to the menus created in Example 1 on page 682. The
dialog box enables the user to use a WHERE clause to subset the SAS data set.
Tasks include
Program
Declare the PROCLIB library. The PROCLIB library is used to store menu denitions.
libname proclib SAS-data-library;
Specify the catalog for storing menu denitions. Menu denitions will be stored in the
PROCLIB.MENUCAT catalog.
proc pmenu catalog=proclib.menucat;
Specify the name of the catalog entry. The MENU statement species PROJECT as the
name of the catalog entry. The menus are stored in the catalog entry
PROCLIB.MENUCAT.PROJECT.PMENU.
menu project;
Design the menu bar. The ITEM statements specify the items for the menu bar. The value of
the MENU= option is used in a subsequent MENU statement.
item
item
item
item
item
File menu=f;
Edit menu=e;
Scroll menu=s;
Subset menu=sub;
Help menu=h;
Design the File menu. This group of statements defines the selections under File on the
menu bar. The first ITEM statement specifies Goback as the first selection under File. The
value of the SELECTION= option corresponds to the subsequent SELECTION statement, which
specifies END as the command that is issued for that selection. The second ITEM statement
specifies that the SAVE command is issued for that selection.
   menu f;
   item 'Goback' selection=g;
   item 'Save';
   selection g 'end';
Design the Edit menu. This group of statements defines the selections available under Edit
on the menu bar.
   menu e;
   item 'Cancel';
   item 'Add';
Design the Scroll menu. This group of statements defines the selections available under
Scroll on the menu bar.
   menu s;
   item 'Next Obs' selection=n;
   item 'Prev Obs' selection=p;
   item 'Top';
   item 'Bottom';
   selection n 'forward';
   selection p 'backward';
Design the Subset menu. This group of statements defines the selections available under
Subset on the menu bar. The value d1 in the DIALOG= option is used in the subsequent
DIALOG statement.
   menu sub;
   item 'Where' dialog=d1;
   item 'Where Clear';
Design the Help menu. This group of statements defines the selections available under Help
on the menu bar. The SETHELP command specifies a HELP entry that contains user-written
information for this FSEDIT application. The semicolon enables the HELP command to be
included in the string. The HELP command invokes the HELP entry.
   menu h;
   item 'Keys';
   item 'About this application' selection=hlp;
   selection hlp 'sethelp proclib.menucat.staffhlp.help;help';
Design the dialog box. The DIALOG statement builds a WHERE command. The arguments
for the WHERE command are provided by user input into the text entry fields described by the
three TEXT statements. The @1 notation is a placeholder for user input in the text field. The
TEXT statements specify the text in the dialog box and the length of the input field.
   dialog d1 'where @1';
   text #2 @3 'Enter a valid WHERE clause or UNDO';
   text #4 @3 'WHERE ';
   text #4 @10 len=40;
   quit;
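To make the @1 placeholder concrete, the comments below trace one substitution by hand. The variable name SALARY and the value typed are hypothetical and are not taken from the example data.

```sas
/* Command string stored by the DIALOG statement:   where @1             */
/* Text typed into the LEN=40 input field:          salary > 30000       */
/* Command that results after substitution:         where salary > 30000 */
```

Because the field is 40 characters long, the resulting command should stay well within the command-line limit mentioned for the DIALOG statement.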
You can also specify the menu bar on the command line in the FSEDIT session or by
issuing a CALL EXECCMD command in SAS Component Language (SCL). Refer to
SAS Component Language: Reference for complete documentation on SCL.
See Associating a Menu Bar with an FSEDIT Session on page 691 for other
methods of associating the customized menu bar with the FSEDIT window.
This dialog box appears when the user chooses Subset and then Where.
DIALOG statement
SAS macro invocation
ITEM statement
DIALOG= option
RADIOBOX statement option:
DEFAULT=
RBUTTON statement option:
SUBSTITUTE=
Other features: SAS macro invocation
This example shows how to modify the menu bar in an FSEDIT session to enable a
search for one value across multiple variables. The example creates customized menus
to use in an FSEDIT session. The menu structure is the same as in the preceding
example, except for the WHERE dialog box.
When selected, the menu item invokes a macro. The user input becomes values for
macro parameters. The macro generates a WHERE command that expands to include
all the variables needed for the search.
Program
Declare the PROCLIB library. The PROCLIB library is used to store menu definitions.
   libname proclib 'SAS-data-library';
Specify the catalog for storing menu definitions. Menu definitions will be stored in the
PROCLIB.MENUCAT catalog.
   proc pmenu catalog=proclib.menucat;
Specify the name of the catalog entry. The MENU statement specifies PROJECT as the name
of the catalog entry. The menus are stored in the catalog entry
PROCLIB.MENUCAT.PROJECT.PMENU.
   menu project;
Design the menu bar. The ITEM statements specify the items for the menu bar. The value of
the MENU= option is used in a subsequent MENU statement.
   item 'File' menu=f;
   item 'Edit' menu=e;
   item 'Scroll' menu=s;
   item 'Subset' menu=sub;
   item 'Help' menu=h;
Design the File menu. This group of statements defines the selections under File on the
menu bar. The first ITEM statement specifies Goback as the first selection under File. The
value of the SELECTION= option corresponds to the subsequent SELECTION statement, which
specifies END as the command that is issued for that selection. The second ITEM statement
specifies that the SAVE command is issued for that selection.
   menu f;
   item 'Goback' selection=g;
   item 'Save';
   selection g 'end';
Design the Edit menu. The ITEM statements define the selections under Edit on the menu
bar.
   menu e;
   item 'Cancel';
   item 'Add';
Design the Scroll menu. This group of statements defines the selections under Scroll on the
menu bar. If the quoted string in the ITEM statement is not a valid command, then the
SELECTION= option corresponds to a subsequent SELECTION statement, which specifies a
valid command.
   menu s;
   item 'Next Obs' selection=n;
   item 'Prev Obs' selection=p;
   item 'Top';
   item 'Bottom';
   selection n 'forward';
   selection p 'backward';
Design the Subset menu. This group of statements defines the selections under Subset on
the menu bar. The DIALOG= option names a dialog box that is defined in a subsequent
DIALOG statement.
   menu sub;
   item 'Where' dialog=d1;
   item 'Where Clear';
Design the Help menu. This group of statements defines the selections under Help on the
menu bar. The SETHELP command specifies a HELP entry that contains user-written
information for this FSEDIT application. The semicolon that appears after the HELP entry
name enables the HELP command to be included in the string. The HELP command invokes
the HELP entry.
menu h;
item 'Keys';
item 'About this application' selection=hlp;
selection hlp 'sethelp proclib.menucat.staffhlp.help;help';
Design the dialog box. WBUILD is a SAS macro. The double percent sign that precedes
WBUILD is necessary to prevent PROC PMENU from expecting a field number to follow. The
field numbers %1, %2, and %3 equate to the values that the user specified with the radio boxes.
The field number @1 equates to the search value that the user enters. See "How the WBUILD
Macro Works" on page 693.
dialog d1 '%%wbuild(%1,%2,@1,%3)';
Add a radio box for region selection. The TEXT statement specifies text for the dialog box
that appears on line 1 and begins in column 1. The RADIOBOX statement specifies that a radio
box will appear in the dialog box. DEFAULT= specifies that the first radio button (Northeast)
will be selected by default. The RBUTTON statements specify the mutually exclusive choices for
the radio buttons: Northeast, Northwest, Southeast, or Southwest. SUBSTITUTE= gives
the value that is substituted for the %1 in the DIALOG statement above if that radio button is
selected.
text #1 @1 'Choose a region:';
radiobox default=1;
rbutton #3 @5 'Northeast' substitute='NE';
rbutton #4 @5 'Northwest' substitute='NW';
rbutton #5 @5 'Southeast' substitute='SE';
rbutton #6 @5 'Southwest' substitute='SW';
Add a radio box for pollutant selection. The TEXT statement specifies text for the dialog
box that appears on line 8 (#8) and begins in column 1 (@1). The RADIOBOX statement
specifies that a radio box will appear in the dialog box. DEFAULT= specifies that the first radio
button (Pollutant A) will be selected by default. The RBUTTON statements specify the
mutually exclusive choices for the radio buttons: Pollutant A or Pollutant B.
SUBSTITUTE= gives the value that is substituted for the %2 in the preceding DIALOG
statement if that radio button is selected.
text #8 @1 'Choose a contaminant:';
radiobox default=1;
rbutton #10 @5 'Pollutant A' substitute='pol_a,2';
rbutton #11 @5 'Pollutant B' substitute='pol_b,4';
Add an input field. The first TEXT statement specifies text for the dialog box that appears on
line 13 and begins in column 1. The second TEXT statement specifies an input field that is 6
bytes long that appears on line 13 and begins in column 25. The value that the user enters in
the field is substituted for the @1 in the preceding DIALOG statement.
text #13 @1 'Enter Value for Search:';
text #13 @25 len=6;
Add a radio box for comparison operator selection. The TEXT statement specifies text for
the dialog box that appears on line 15 and begins in column 1. The RADIOBOX statement
specifies that a radio box will appear in the dialog box. DEFAULT= specifies that the first radio
button (Greater Than or Equal To) will be selected by default. The RBUTTON statements
specify the mutually exclusive choices for the radio buttons. SUBSTITUTE= gives the value that
is substituted for the %3 in the preceding DIALOG statement if that radio button is selected.
text #15 @1 'Choose a comparison criterion:';
radiobox default=1;
rbutton #16 @5 'Greater Than or Equal To' substitute='GE';
rbutton #17 @5 'Less Than or Equal To' substitute='LE';
rbutton #18 @5 'Equal To' substitute='EQ';
quit;
This dialog box appears when the user selects Subset and then Where.
pollutant A twice at each lake, and the results are recorded in the variables POL_A1
and POL_A2. Tests were conducted for pollutant B four times at each lake, and the
results are recorded in the variables POL_B1 - POL_B4. Each lake is located in one of
four regions. The following output lists the contents of PROCLIB.LAKES:
Output 33.1  PROCLIB.LAKES

region  lake       pol_a1  pol_a2  pol_b1  pol_b2  pol_b3  pol_b4
NE      Carr         0.24    0.99    0.95    0.36    0.44    0.67
NE      Duraleigh    0.34    0.01    0.48    0.58    0.12    0.56
NE      Charlie      0.40    0.48    0.29    0.56    0.52    0.95
NE      Farmer       0.60    0.65    0.25    0.20    0.30    0.64
NW      Canyon       0.63    0.44    0.20    0.98    0.19    0.01
NW      Morris       0.85    0.95    0.80    0.67    0.32    0.81
NW      Golf         0.69    0.37    0.08    0.72    0.71    0.32
NW      Falls        0.01    0.02    0.59    0.58    0.67    0.02
SE      Pleasant     0.16    0.96    0.71    0.35    0.35    0.48
SE      Juliette     0.82    0.35    0.09    0.03    0.59    0.90
SE      Massey       1.01    0.77    0.45    0.32    0.55    0.66
SE      Delta        0.84    1.05    0.90    0.09    0.64    0.03
SW      Alumni       0.45    0.32    0.45    0.44    0.55    0.12
SW      New Dam      0.80    0.70    0.31    0.98    1.00    0.22
SW      Border       0.51    0.04    0.55    0.35    0.45    0.78
SW      Red          0.22    0.09    0.02    0.10    0.32    0.01
To associate the customized menu bar with the FSEDIT session, do any one of
the following:
- enter a SETPMENU command on the command line. The command for this
  example is
setpmenu proclib.menucat.project.pmenu
Using the custom menu item, you would select Southwest, Pollutant A, enter .50
as the value, and choose Greater Than or Equal To as the comparison criterion. Two
lakes, New Dam and Border, meet the criteria.
The WBUILD macro uses the four pieces of information from the dialog box to
generate a WHERE command:
- One of the values for region, either NE, NW, SE, or SW, becomes the value of the
  macro parameter REGION.
- Either pol_a,2 or pol_b,4 supplies the values of the PREFIX and NUMVAR
  macro parameters. The comma is part of the value that is passed to the WBUILD
  macro and serves to delimit the two parameters, PREFIX and NUMVAR.
- The value that the user enters for the search becomes the value of the macro
  parameter VALUE.
- The operator that the user chooses becomes the value of the macro parameter
  OPERATOR.
To see how the macro works, again consider the following example, in which you
want to know if any of the lakes in the southwest tested for a value of .50 or greater for
pollutant A. The values of the macro parameters would be
REGION    SW
PREFIX    pol_a
NUMVAR    2
VALUE     .50
OPERATOR  GE
The first %IF statement checks to make sure that the user entered a value. If a
value has been entered, then the macro begins to generate the WHERE command.
First, the macro creates the beginning of the WHERE command:
where region="SW" and (
Next, the %DO loop executes. For pollutant A, it executes twice because
NUMVAR=2. In the macro definition, the period in &prefix.&i concatenates pol_a
with 1 and with 2. At each iteration of the loop, the macro resolves PREFIX,
OPERATOR, and VALUE, and it generates a part of the WHERE command. On the
first iteration, it generates pol_a1 GE .50
The %IF statement in the loop checks to see if the loop is working on its last
iteration. If it is not, then the macro makes a compound WHERE command by
putting an OR between the individual clauses. The next part of the WHERE command
becomes OR pol_a2 GE .50
The loop ends after two executions for pollutant A, and the macro generates the end
of the WHERE command:
)
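The assembled command for this case can be checked outside FSEDIT. The following is a minimal sketch, assuming that the PROCLIB libref is already assigned and that PROCLIB.LAKES matches the listing in Output 33.1; it applies the same condition as a WHERE statement in PROC PRINT. The choice of PROC PRINT and of the variables listed is illustrative and is not part of the original example.

```sas
/* Sketch only: assumes PROCLIB is assigned and contains LAKES  */
/* as listed in Output 33.1.                                    */
proc print data=proclib.lakes noobs;
   /* Same condition that WBUILD builds for SW, pol_a, 2, .50, GE */
   where region="SW" and (pol_a1 ge .50 or pol_a2 ge .50);
   var region lake pol_a1 pol_a2;
run;
```

With the data shown in Output 33.1, the only Southwest lakes that satisfy the condition are New Dam and Border, which agrees with the result described for the dialog box.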
Results from the macro are placed on the command line. The following code is the
definition of the WBUILD macro. The underlined code shows the parts of the WHERE
command that are text strings that the macro does not resolve:
%macro wbuild(region,prefix,numvar,value,operator);
   /* check to see if value is present */
   %if &value ne %then %do;
      where region="&region" AND (
      /* If the values are character,              */
      /* enclose &value in double quotation marks. */
      %do i=1 %to &numvar;
         &prefix.&i &operator &value
         /* if not on last variable, */
         /* generate OR              */
         %if &i ne &numvar %then %do;
            OR
         %end;
      %end;
      )
   %end;
%mend wbuild;
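One way to inspect the text that WBUILD generates, without opening the dialog box, is to write it to the SAS log. This is a sketch that assumes the WBUILD macro above has already been compiled in the current session; the second call uses argument values chosen only for illustration.

```sas
/* Sketch only: assumes %WBUILD has been compiled in this session. */

/* Southwest, pollutant A (2 variables), value .50, operator GE */
%put %wbuild(SW,pol_a,2,.50,GE);

/* Illustrative second call: Northeast, pollutant B (4 variables) */
%put %wbuild(NE,pol_b,4,.25,LE);
```

The first call should write text along the lines of where region="SW" AND ( pol_a1 GE .50 OR pol_a2 GE .50 ) to the log, which is the command traced in the walk-through above.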
DIALOG statement
SELECTION statement
Other features: FILENAME statement
This example defines an application that enables the user to enter human resources
data for various departments and to request reports from the data sets that are created
by the data entry.
The first part of the example describes the PROC PMENU step that creates the
menus. The subsequent sections describe how to use the menus in a DATA step window
application.
Tasks include
- associating customized menus with a DATA step window
- creating menus for a DATA step window
- submitting SAS code from a menu selection
- creating a pull-down menu selection that calls a dialog box.
Program
Declare the PROCLIB library. The PROCLIB library is used to store menu definitions.
libname proclib 'SAS-data-library';
Declare the DE and PRT filenames. The FILENAME statements define the external files in
which the programs to create the windows are stored.
filename de 'external-file';
filename prt 'external-file';
Specify the catalog for storing menu definitions. Menu definitions will be stored in the
PROCLIB.MENUS catalog.
proc pmenu catalog=proclib.menus;
Specify the name of the catalog entry. The MENU statement specifies SELECT as the name
of the catalog entry. The menus are stored in the catalog entry
PROCLIB.MENUS.SELECT.PMENU.
menu select;
Design the menu bar. The ITEM statements specify the three items on the menu bar. The
value of the MENU= option is used in a subsequent MENU statement.
item 'File' menu=f;
item 'Data_Entry' menu=deptsde;
item 'Print_Report' menu=deptsprt;
Design the File menu. This group of statements defines the selections under File. The value
of the SELECTION= option is used in a subsequent SELECTION statement.
menu f;
item End
item End
selection
selection
Design the Data_Entry menu. This group of statements defines the selections under
Data_Entry on the menu bar. The ITEM statements specify that For Dept01 and For
Dept02 appear under Data_Entry. The value of the SELECTION= option equates to a
subsequent SELECTION statement, which contains the string of commands that are actually
submitted. The value of the DIALOG= option equates to a subsequent DIALOG statement,
which describes the dialog box that appears when this item is selected.
menu deptsde;
item 'For Dept01' selection=de1;
item 'For Dept02' selection=de2;
item 'Other Departments' dialog=deother;
Specify commands under the Data_Entry menu. The commands in single quotation marks
are submitted when the user selects For Dept01 or For Dept02. The END command ends the
current window and returns to the PROGRAM EDITOR window so that further commands can
be submitted. The INCLUDE command includes the SAS statements that create the data entry
window. The CHANGE command modifies the DATA statement in the included program so that
it creates the correct data set. (See "Using a Data Entry Program" on page 698.) The SUBMIT
command submits the DATA step program.
selection de1 'end;pgm;include de;change xx 01;submit';
selection de2 'end;pgm;include de;change xx 02;submit';
Design the DEOTHER dialog box. The DIALOG statement defines the dialog box that
appears when the user selects Other Departments. The DIALOG statement modifies the
command string so that the name of the department that is entered by the user is used to
change deptxx in the SAS program that is included. (See "Using a Data Entry Program" on
page 698.) The first two TEXT statements specify text that appears in the dialog box. The third
TEXT statement specifies an input field. The name that is entered in this field is substituted for
the @1 in the DIALOG statement.
dialog deother 'end;pgm;include de;c deptxx @1;submit';
text #1 @1 'Enter department name';
text #2 @3 'in the form DEPT99:';
text #2 @25 len=7;
Design the Print_Report menu. This group of statements defines the choices under the
Print_Report item. These ITEM statements specify that For Dept01 and For Dept02
appear in the pull-down menu. The value of the SELECTION= option equates to a subsequent
SELECTION statement, which contains the string of commands that are actually submitted.
menu deptsprt;
item 'For Dept01' selection=prt1;
item 'For Dept02' selection=prt2;
item 'Other Departments' dialog=prother;
Specify commands for the Print_Report menu. The commands in single quotation marks
are submitted when the user selects For Dept01 or For Dept02. The END command ends the
current window and returns to the PROGRAM EDITOR window so that further commands can
be submitted. The INCLUDE command includes the SAS statements that print the report. (See
"Printing a Program" on page 699.) The CHANGE command modifies the PROC PRINT step in
the included program so that it prints the correct data set. The SUBMIT command submits the
PROC PRINT program.
selection prt1 'end;pgm;include prt;change xx 01 all;submit';
selection prt2 'end;pgm;include prt;change xx 02 all;submit';
Design the PROTHER dialog box. The DIALOG statement defines the dialog box that
appears when the user selects Other Departments. The DIALOG statement modifies the
command string so that the name of the department that is entered by the user is used to
change deptxx in the SAS program that is included. (See "Printing a Program" on page 699.)
The first two TEXT statements specify text that appears in the dialog box. The third TEXT
statement specifies an input field. The name entered in this field is substituted for the @1 in the
DIALOG statement.
dialog prother 'end;pgm;include prt;c deptxx @1 all;submit';
text #1 @1 'Enter department name';
text #2 @3 'in the form DEPT99:';
text #2 @25 len=7;
Specify a second catalog entry and menu bar. The MENU statement specifies ENTRDATA
as the name of the catalog entry that this RUN group is creating. File is the only item on the
menu bar. The selections available are End this window and End this SAS session.
menu entrdata;
item 'File' menu=f;
menu f;
item 'End this window' selection=endwdw;
item 'End this SAS session' selection=endsas;
selection endwdw 'end';
selection endsas 'bye';
run;
quit;
The %INCLUDE statement recalls the statements in the file HRWDW. The statements in
HRWDW redisplay the primary window. See the HRSELECT window on page 698.
filename hrwdw 'external-file';
%include hrwdw;
run;
The SELECTION and DIALOG statements in the PROC PMENU step modify the
DATA statement in this program so that the correct department name is used when the
data set is created. That is, if the user selects Other Departments and enters DEPT05,
then the DATA statement is changed by the command string in the DIALOG statement
to
data proclib.dept05;
Printing a Program
When the user selects Print_Report from the menu bar, a pull-down menu is
displayed. When the user selects one of the listed departments or chooses to enter a
different department, the following statements are invoked. These statements are
stored in the external file referenced by the PRT fileref.
The xx's are changed to the appropriate department number by the CHANGE command in the
SELECTION or DIALOG statement in the PROC PMENU step. PROC PRINT prints that data
set.
libname proclib 'SAS-data-library';
proc print data=proclib.deptxx;
title 'Information for deptxx';
run;
This PROC PRINTTO step restores the default output destination. See Chapter 35, "The
PRINTTO Procedure," on page 771 for documentation on PROC PRINTTO.
proc printto;
run;
The %INCLUDE statement recalls the statements in the file HRWDW. The statements in
HRWDW redisplay the primary window.
filename hrwdw 'external-file';
%include hrwdw;
run;
ITEM statement
MENU statement
Other features: SAS/AF software
This example creates menus for a FRAME entry and gives the steps necessary to
associate the menus with a FRAME entry from SAS/AF software.
Program
Declare the PROCLIB library. The PROCLIB library is used to store menu definitions.
libname proclib 'SAS-data-library';
Specify the catalog for storing menu definitions. Menu definitions will be stored in the
PROCLIB.MENUCAT catalog.
proc pmenu catalog=proclib.menucat;
Specify the name of the catalog entry. The MENU statement specifies FRAME as the name
of the catalog entry. The menus are stored in the catalog entry
PROCLIB.MENUCAT.FRAME.PMENU.
menu frame;
Design the menu bar. The ITEM statements specify the items in the menu bar. The value of
MENU= corresponds to a subsequent MENU statement.
item 'File' menu=f;
item 'Help' menu=h;
Design the File menu. The MENU statement equates to the MENU= option in a preceding
ITEM statement. The ITEM statements specify the selections that are available under File on
the menu bar.
menu f;
item 'Cancel';
item 'End';
Design the Help menu. The MENU statement equates to the MENU= option in a preceding
ITEM statement. The ITEM statements specify the selections that are available under Help on
the menu bar. The value of the SELECTION= option equates to a subsequent SELECTION
statement.
menu h;
item 'About the application' selection=a;
item 'About the keys' selection=k;
Specify commands for the Help menu. The SETHELP command specifies a HELP entry
that contains user-written information for this application. The semicolon that appears after the
HELP entry name enables the HELP command to be included in the string. The HELP
command invokes the HELP entry.
selection a 'sethelp proclib.menucat.app.help;help';
selection k 'sethelp proclib.menucat.keys.help;help';
run;
quit;
View
Properties Window
2 In the Properties window, select the Value field for the pmenuEntry Attribute
Build
Test
Notice that the menus are now associated with the FRAME.
CHAPTER 34
The PRINT Procedure
Overview: PRINT Procedure 703
What Does the PRINT Procedure Do? 703
Simple Listing Report 704
Customized Report 704
Syntax: PRINT Procedure 705
PROC PRINT Statement 707
BY Statement 715
ID Statement 716
PAGEBY Statement 717
SUM Statement 718
SUMBY Statement 719
VAR Statement 719
Results: Print Procedure 720
Procedure Output 720
Page Layout 720
Observations 720
Column Headings 722
Column Width 723
Examples: PRINT Procedure 723
Example 1: Selecting Variables to Print 723
Example 2: Customizing Text in Column Headers 727
Example 3: Creating Separate Sections of a Report for Groups of Observations 731
Example 4: Summing Numeric Variables with One BY Group 737
Example 5: Summing Numeric Variables with Multiple BY Variables 742
Example 6: Limiting the Number of Sums in a Report 748
Example 7: Controlling the Layout of a Report with Many Variables 754
Example 8: Creating a Customized Layout with BY Groups and ID Variables 761
Example 9: Printing All the Data Sets in a SAS Library 767
highly customized report that groups the data and calculates totals and subtotals for
numeric variables.
Output 34.1

Obs  Region    State  Month  Expenses  Revenues
  1  Southern  GA     JAN95      2000      8000
  2  Southern  GA     FEB95      1200      6000
  3  Southern  FL     FEB95      8500     11000
  4  Northern  NY     FEB95      3000      4000
  5  Northern  NY     MAR95      6000      5000
  6  Southern  FL     MAR95      9800     13500
  7  Northern  MA     MAR95      1500      1000
Customized Report
The following HTML report is a customized report that is produced by PROC PRINT
using ODS. The statements that create this report
For an explanation of the program that produces this report, see "Program: Creating
an HTML Report with the STYLE Option" on page 765.
Display 34.1
ID variable(s) <option>;
SUM variable(s) <option>;
VAR variable(s) <option>;
To do this
BY
ID
PAGEBY
SUMBY
SUM
VAR
To do this
CONTENTS=
DATA=
DOUBLE
N=
NOOBS
OBS=
ROUND
ROWS=
WIDTH=UNIFORM
HEADING=
LABEL or SPLIT=
SPLIT=
STYLE
WIDTH=
Options
CONTENTS=link-text
specifies the text for the links in the HTML contents file to the output produced by
the PROC PRINT statement. For information on HTML output, see SAS Output
Delivery System: User's Guide.
Restriction: CONTENTS= does not affect the HTML body file. It affects only the
HTML contents file.
DATA=SAS-data-set
HEADING=direction
controls the orientation of the column headings, where direction is one of the
following:
HORIZONTAL
prints all column headings horizontally.
Alias: H
VERTICAL
prints all column headings vertically.
Alias: V
Default: Headings are either all horizontal or all vertical. If you omit HEADING=,
the direction is determined as follows:
- If you do not use LABEL, spacing dictates whether column headings are
  vertical or horizontal.
- If you use LABEL and at least one variable has a label, all headings are
  horizontal.
LABEL
Default: If you omit LABEL, PROC PRINT uses the variable's name as the column
heading even if the PROC PRINT step contains a LABEL statement. If a variable
does not have a label, PROC PRINT uses the variable's name as the column
heading.
Interaction: By default, if you specify LABEL and at least one variable has a label,
PROC PRINT prints all column headings horizontally. Therefore, using LABEL
may increase the number of pages of output. (Use HEADING=VERTICAL in the
PROC PRINT statement to print vertical column headings.)
Interaction: PROC PRINT sometimes conserves space by splitting labels across
multiple lines. Use SPLIT= in the PROC PRINT statement to control where these
splits occur. You do not need to use LABEL if you use SPLIT=.
Tip: To create a blank column header for a variable, use this LABEL statement in
your PROC PRINT step:
label variable-name='00'x;
See also: For information on using the LABEL statement to create temporary
Note: The SAS system option LABEL must be in effect in order for any procedure
to use labels. For more information see the section on system options in SAS
Language Reference: Dictionary.
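The difference between LABEL and SPLIT= can be seen with a small test data set. The sketch below is illustrative only: the data set WORK.DEMO, its variables, and the label text are hypothetical and do not come from this chapter.

```sas
/* Hypothetical demo data; names and labels are illustrative only. */
data work.demo;
   input region $ expenses;
   label region   = 'Sales Region'
         expenses = 'Monthly*Expenses';
   datalines;
Southern 2000
Northern 3000
;
run;

/* LABEL: labels become column headings; the asterisk prints as text. */
proc print data=work.demo label;
run;

/* SPLIT=: labels are used and the heading breaks at each asterisk,   */
/* so LABEL does not need to be specified as well.                    */
proc print data=work.demo split='*';
run;
```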
N<=string-1 <string-2>>
prints the number of observations in the data set, in BY groups, or both and specifies
explanatory text to print with the number.
If you use the N option
PROC PRINT
with a BY statement
Featured in:
NOOBS
OBS=
specifies a column header for the column that identifies each observation by number.
Tip: OBS= honors the split character (see the discussion of SPLIT= on page 710).
Featured in: Example 2 on page 727
ROUND
rounds unformatted numeric values to two decimal places. (Formatted values are
already rounded by the format to the specified number of decimal places.) For both
formatted and unformatted variables, PROC PRINT uses these rounded values to
calculate any sums in the report.
If you omit ROUND, PROC PRINT adds the actual values of the rows to obtain
the sum even though it displays the formatted (rounded) values. Any sums are also
rounded by the format, but they include only one rounding error, that of rounding the
sum of the actual values. The ROUND option, on the other hand, rounds values
before summing them, so there may be multiple rounding errors. The results without
ROUND are more accurate, but ROUND is useful for published reports where it is
important for the total to be the sum of the printed (rounded) values.
Be aware that the results from PROC PRINT with the ROUND option may differ
from the results of summing the same data with other methods such as PROC
MEANS or the DATA step. Consider a simple case in which
- the data set contains three values for X: .003, .004, and .009.
- X has a format of 5.2.
Depending on how you calculate the sum, you can get three different answers:
0.02, 0.01, and 0.016. The following figure shows the results of calculating the sum
with PROC PRINT (without and with the ROUND option) and PROC MEANS.
Figure 34.1

Actual Values   PROC PRINT      PROC PRINT     PROC MEANS
                without ROUND   with ROUND
-------------   -------------   ------------   ---------------------
                OBS      X      OBS      X     Analysis Variable : X
    .003          1    0.00       1    0.00             Sum
    .004          2    0.00       2    0.00          0.0160000
    .009          3    0.01       3    0.01
   =====              =====           =====
    .016               0.02            0.01
Notice that the sum produced without the ROUND option (0.02) is closer to the
actual result (0.016) than the sum produced with ROUND (0.01). However, the sum
produced with ROUND reflects the numbers displayed in the report.
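The three sums can be reproduced with a short program. This is a sketch under the assumptions stated in the text (three values of X and a 5.2 format); the data set name WORK.ROUNDTEST and the titles are hypothetical.

```sas
/* Hypothetical test data: the three values of X discussed above. */
data work.roundtest;
   input x;
   format x 5.2;
   datalines;
.003
.004
.009
;
run;

title 'PROC PRINT without ROUND';
proc print data=work.roundtest;
   sum x;            /* sums the actual values, then formats the total */
run;

title 'PROC PRINT with ROUND';
proc print data=work.roundtest round;
   sum x;            /* rounds each value first, then sums             */
run;

title 'PROC MEANS';
proc means data=work.roundtest sum;
   var x;            /* reports the sum of the actual values           */
run;
title;
```

Comparing the three totals side by side is a quick way to decide whether ROUND is appropriate for a given report.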
Alias:
CAUTION: Do not use ROUND with PICTURE formats. ROUND is for use with numeric values.
SAS procedures treat variables that have picture formats as character variables.
Using ROUND with such variables may lead to unexpected results.
ROWS=page-format
formats rows on a page. Currently, PAGE is the only value that you can use for
page-format:
PAGE
prints only one row of variables for each observation per page. When you use
ROWS=PAGE, PROC PRINT does not divide the page into sections; it prints as
many observations as possible on each page. If the observations do not fill the last
page of the output, PROC PRINT divides the last page into sections and prints all
the variables for the last few observations.
Restriction: Physical page size does not mean the same thing in HTML output as it
does in traditional procedure output. Therefore, HTML output from PROC PRINT
appears the same whether or not you use ROWS=.
Tip: The PAGE value can reduce the number of pages in the output if the data set
contains large numbers of variables and observations. However, if the data set
contains a large number of variables but few observations, the PAGE value can
increase the number of pages in the output.
See also: "Page Layout" on page 720 for discussion of the default layout.
Featured in:
SPLIT=split-character
specifies the split character, which controls line breaks in column headers. It also
uses labels as column headers. PROC PRINT breaks a column heading when it
reaches the split character and continues the header on the next line. The split
character is not part of the column heading although each occurrence of the split
character counts toward the 256-character maximum for a label.
Alias: S=
Interaction: You do not need to use both LABEL and SPLIT= because SPLIT=
Note: PROC PRINT does not split labels of BY variables in the heading preceding
each BY group even if you specify SPLIT=. Instead, PROC PRINT replaces the split
character with a blank.
STYLE
location
identifies the part of the report that the STYLE option affects. The following table
shows the available locations and the other statements in which you can specify
them.
Note: Style specifications in a statement other than the PROC PRINT
statement override the same style specification in the PROC PRINT statement.
However, style attributes that you specify in the PROC PRINT statement are
inherited, provided that you do not override the style with style specifications in
another statement. For instance, if you specify a blue background and a white
foreground for all column headers in the PROC PRINT statement, and you specify
a gray background for the column headers of a variable in the VAR statement, the
background for that particular column header is gray, and the foreground is white
(as specified in the PROC PRINT statement).
Table 34.1 Specifying Locations in the STYLE Option
This location   Other statements in which it can be specified
BYLABEL         none
DATA            VAR, ID, SUM
GRANDTOTAL      SUM
HEADER          VAR, ID, SUM
N               none
OBS             none
OBSHEADER       none
TABLE           none
TOTAL           SUM
For your convenience and for consistency with other procedures, the following
table shows aliases for the different locations.
Table 34.2
Location     Aliases
BYLABEL      BYSUMLABEL, BYLBL, BYSUMLBL
DATA         COLUMN, COL
GRANDTOTAL   GRANDTOT, GRAND, GTOTAL, GTOT
HEADER       HEAD, HDR
N            none
OBS          OBSDATA, OBSCOLUMN, OBSCOL
OBSHEADER    OBSHEAD, OBSHDR
TABLE        REPORT
TOTAL        TOT, BYSUMLINE, BYLINE, BYSUM
style-element-name
is the name of a style element that is part of a style definition that is registered
with the Output Delivery System. SAS provides some style definitions. Users can
create their own style definitions with PROC TEMPLATE.
When style elements are processed, more specific style elements override less
specific style elements.
Default: The following table shows the default style element for each location.
Table 34.3
Location     Default style element
BYLABEL      Header
DATA         Data
GRANDTOTAL   Header
HEADER       Header
N            NoteContent
OBS          RowHeader
OBSHEADER    Header
TABLE        Table
TOTAL        Header
style-attribute-specification
describes the style attribute to change. Each style-attribute-specification has this
general form:
style-attribute-name=style-attribute-value
You can set these style attributes in the TABLE location:
BACKGROUND=
FONT_WIDTH=*
BACKGROUNDIMAGE=
FOREGROUND=*
BORDERCOLOR=
FRAME=
BORDERCOLORDARK=
HTMLCLASS=
BORDERCOLORLIGHT=
JUST=
BORDERWIDTH=
OUTPUTWIDTH=
CELLPADDING=
POSTHTML=
CELLSPACING=
POSTIMAGE=
FONT=*
POSTTEXT=
FONT_FACE=*
PREHTML=
FONT_SIZE=*
PREIMAGE=
FONT_STYLE=*
PRETEXT=
FONT_WEIGHT=*
RULES=
*When you use these attributes, they affect only the text that is specified with the
PRETEXT=, POSTTEXT=, PREHTML=, and POSTHTML= attributes. To alter the
foreground color or the font for the text that appears in the table, you must set the
corresponding attribute in a location that affects the cells rather than the table.
You can set these style attributes in all locations other than TABLE:
ASIS=
FONT_WIDTH=
BACKGROUND=
HREFTARGET=
BACKGROUNDIMAGE=
HTMLCLASS=
BORDERCOLOR=
JUST=
BORDERCOLORDARK=
NOBREAKSPACE=
BORDERCOLORLIGHT=
POSTHTML=
BORDERWIDTH=
POSTIMAGE=
CELLHEIGHT=
POSTTEXT=
CELLWIDTH=
PREHTML=
FLYOVER=
PREIMAGE=
FONT=
PRETEXT=
FONT_FACE=
PROTECTSPECIALCHARS=
FONT_SIZE=
TAGATTR=
FONT_STYLE=
URL=
FONT_WEIGHT=
VJUST=
For information about style attributes, see DEFINE STYLE statement in SAS
Output Delivery System: User's Guide.
Restriction: This option affects all destinations except Listing and Output.
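The inheritance rule described in the note for STYLE (blue header background with white foreground set in the PROC PRINT statement, gray background for one variable set in a VAR statement) might be written as follows. This is a sketch, not the program behind Display 34.1: it assumes that the SASHELP.CLASS sample data set is available, and the HTML file name is hypothetical.

```sas
/* Sketch only: SASHELP.CLASS is assumed to be available;      */
/* style_demo.html is a hypothetical output file name.         */
ods html file='style_demo.html';

proc print data=sashelp.class
           style(header)=[background=blue foreground=white];
   /* Overrides only the background for this column header;    */
   /* the white foreground is inherited from PROC PRINT.       */
   var name / style(header)=[background=gray];
   var age height;
run;

ods html close;
```

Because STYLE does not affect the Listing destination, the effect is visible only in the HTML file.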
UNIFORM
WIDTH=column-width
determines the column width for each variable. The value of column-width must be
one of the following:
FULL
uses a variable's formatted width as the column width. If the variable does not
have a format that explicitly specifies a field width, PROC PRINT uses the default
width. For a character variable, the default width is the length of the variable.
For a numeric variable, the default width is 12. When you use WIDTH=FULL, the
column widths do not vary from page to page.
Tip: Using WIDTH=FULL can reduce execution time.
MINIMUM
uses for each variable the minimum column width that accommodates all values of
the variable.
Alias: MIN
UNIFORM
uses each variable's formatted width as its column width on all pages. If the
variable does not have a format that explicitly specifies a field width, PROC
PRINT uses the widest data value as the column width. When you specify
WIDTH=UNIFORM, PROC PRINT normally needs to read the data set twice.
However, if all the variables in the data set have formats that explicitly specify a
field width (for example, BEST12. but not BEST.), PROC PRINT reads the data
set only once.
Alias: U
Tip: If the data set is large and you want a uniform report, you can save computer
resources by using formats that explicitly specify a field width so that PROC
PRINT reads the data only once.
Tip: WIDTH=UNIFORM is the same as UNIFORM.
Restriction: When not all variables have formats that explicitly specify a width,
PROC PRINT individually constructs each page of output. The procedure analyzes the
data for a page and decides how best to display them. Therefore, column widths
may differ from one page to another.
Tip: Column width is affected not only by variable width but also by the length of
column headings. Long column headings may lessen the usefulness of WIDTH=.
See also: For a discussion of default column widths, see "Column Width" on page
723.
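As an illustration of the tip about explicit field widths, the following sketch requests a uniform layout and gives every variable a format with an explicit width. It assumes that the SASHELP.CLASS sample data set is available; the particular formats are illustrative choices.

```sas
/* Sketch only: SASHELP.CLASS is assumed to be available.         */
/* Every variable gets a format with an explicit field width, so, */
/* per the description above, one pass over the data suffices.    */
proc print data=sashelp.class width=uniform;
   format name $8. sex $1. age 3. height weight 6.1;
run;
```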
BY Statement
Produces a separate section of the report for each BY group.
Main discussion:
BY on page 58
Featured in:
BY <DESCENDING> variable-1
< <DESCENDING> variable-n>
<NOTSORTED>;
Required Arguments
variable
species the variable that the procedure uses to form BY groups. You can specify
more than one variable. If you do not use the NOTSORTED option in the BY
statement, the observations in the data set must either be sorted by all the variables
that you specify, or they must be indexed appropriately. Variables in a BY statement
are called BY variables.
Options
DESCENDING
specifies that the data set is sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The data are grouped in another way, such as chronological order.
The requirement for ordering or indexing observations according to the values of
BY variables is suspended for BY-group processing when you use the NOTSORTED
option. In fact, the procedure does not use an index if you specify NOTSORTED. The
procedure defines a BY group as a set of contiguous observations that have the same
values for all BY variables. If observations with the same values for the BY variables
are not contiguous, the procedure treats each contiguous set as a separate BY group.
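As a minimal sketch of NOTSORTED, assume a hypothetical data set VISITS whose observations are grouped by month in chronological order. The month values are not in alphabetic order, so a BY statement without NOTSORTED would fail on this data.

```sas
/* Hypothetical data set: months appear in chronological order. */
data visits;
   input Month $ Clinic $ Patients;
   datalines;
JAN East 40
JAN West 35
FEB East 42
FEB West 38
MAR East 45
;

/* NOTSORTED treats each contiguous run of equal Month values */
/* as one BY group; no sorting or index is required.          */
proc print data=visits;
   by Month notsorted;
run;
```

If another JAN observation followed the MAR observation, it would form a separate BY group, because BY groups are defined by contiguity when NOTSORTED is in effect.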
ID Statement
Identifies observations by using the formatted values of the variables that you list instead of by
using observation numbers.
Featured in:
Required Arguments
variable(s)
specifies one or more variables to print instead of the observation number at the
beginning of each row of the report.
Restriction: If the ID variables occupy so much space that no room remains on the
line for at least one other variable, PROC PRINT writes a warning to the SAS log
and does not treat all ID variables as ID variables.
Interaction: If a variable in the ID statement also appears in the VAR statement,
the output contains two columns for that variable.
Options
STYLE <(location(s))>=<style-element-name><[style-attribute-specification(s)]>
specifies the style element to use for ID columns created with the ID statement. For
information about the arguments of this option and how it is used, see STYLE on
page 711 in the PROC PRINT statement.
Tip: To specify different style elements for different ID columns, use a separate ID
statement for each variable and add a different STYLE option to each ID
statement.
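The tip about separate ID statements can be sketched as follows. The data set SALES, its variables, and the chosen colors are assumptions made for this illustration; the DATA location and the BACKGROUND= attribute are the same ones used in the examples later in this chapter.

```sas
/* Hypothetical data set, created only for this illustration. */
data sales;
   input Region $ State $ Expenses;
   datalines;
Southern GA 2000
Northern NY 3000
;

proc print data=sales;
   /* One ID statement per variable, each with its own STYLE option. */
   id Region / style(data)=[background=white];
   id State  / style(data)=[background=gray];
   var Expenses;
run;
```

The style settings affect ODS destinations such as HTML, PDF, and RTF; they are not expected to change traditional listing output.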
PAGEBY Statement
Controls page ejects that occur before a page is full.
Requirements: BY statement
PAGEBY BY-variable;
Required Arguments
BY-variable
identifies a variable appearing in the BY statement in the PROC PRINT step. If the
value of the BY variable changes, or if the value of any BY variable that precedes it
in the BY statement changes, PROC PRINT begins printing a new page.
Interaction: If you use the BY statement with the SAS system option NOBYLINE,
which suppresses the BY line that normally appears in output produced with
BY-group processing, PROC PRINT always starts a new page for each BY group.
This behavior ensures that if you create customized BY lines by putting BY-group
information in the title and suppressing the default BY lines with NOBYLINE, the
information in the titles matches the report on the pages. (See Creating Titles
That Contain BY-Group Information on page 20.)
SUM Statement
Totals values of numeric variables.
Featured in: Example 4 on page 737, Example 5 on page 742, Example 6 on page 748,
and Example 8 on page 761
Required Arguments
variable(s)
Option
STYLE <(location(s))>=<style-element-name><[style-attribute-specification(s)]>
specifies the style element to use for cells containing sums that are created with the
SUM statement. For information about the arguments of this option and how it is
used, see STYLE on page 711 in the PROC PRINT statement.
Tip: To specify different style elements for different cells reporting sums, use a
separate SUM statement for each variable and add a different STYLE option to
each SUM statement.
Tip: If the STYLE option is used in multiple SUM statements that affect the same
location, the STYLE option in the last SUM statement will be used.
Note: When the value of a BY variable changes, the SAS System considers that the
values of all variables listed after it in the BY statement also change.
SUMBY Statement
Limits the number of sums that appear in the report.
Requirements: BY statement
Featured in: Example 6 on page 748
SUMBY BY-variable;
Required Arguments
BY-variable
identifies a variable that appears in the BY statement in the PROC PRINT step. If
the value of the BY variable changes, or if the value of any BY variable that precedes
it in the BY statement changes, PROC PRINT prints the sums of all variables listed
in the SUM statement.
VAR Statement
Selects variables that appear in the report and determines their order.
Tip: If you omit the VAR statement, PROC PRINT prints all variables in the data set.
Featured in: Example 1 on page 723 and Example 8 on page 761
Required Arguments
variable(s)
identifies the variables to print. PROC PRINT prints the variables in the order that
you list them.
Interaction: In the PROC PRINT output, variables that are listed in the ID
statement precede variables that are listed in the VAR statement. If a variable in
the ID statement also appears in the VAR statement, the output contains two
columns for that variable.
Option
STYLE <(location(s))>=<style-element-name><[style-attribute-specification(s)]>
specifies the style element to use for all columns that are created by a VAR
statement. For information about the arguments of this option and how it is used,
see STYLE on page 711 in the PROC PRINT statement.
Tip: To specify different style elements for different columns, use a separate VAR
statement to create a column for each variable and add a different STYLE option
to each VAR statement.
Procedure Output
PROC PRINT always produces a printed report. You control the appearance of the
report with statements and options. See Examples: PRINT Procedure on page 723 for
a sampling of the types of reports that the procedure produces.
Page Layout
Observations
By default, PROC PRINT uses an identical layout for all observations on a page of
output. First, it attempts to print observations on a single line (see Figure 34.2 on page
721).
Figure 34.2 (diagram: page 1 lists the columns Obs, Var_1, Var_2, and Var_3, with each observation printed on a single line)
If PROC PRINT cannot fit all the variables on a single line, it splits the observations
into two or more sections and prints the observation number or the ID variables at the
beginning of each line. For example, in Figure 34.3 on page 721, PROC PRINT prints
the values for the first three variables in the first section of each page and the values
for the second three variables in the second section of each page.
Figure 34.3 (diagram: pages 1 and 2 each contain two sections; the first section lists Obs, Var_1, Var_2, and Var_3, and the second section lists Obs, Var_4, Var_5, and Var_6)
If PROC PRINT cannot fit all the variables on one page, the procedure prints
subsequent pages with the same observations until it has printed all the variables. For
example, in Figure 34.4 on page 722, PROC PRINT uses the first two pages to print
values for the first three observations and the second two pages to print values for the
rest of the observations.
Figure 34.4 (diagram: four pages, each listing Obs with a subset of Var_1 through Var_12; pages 1 and 2 cover the first three observations, and pages 3 and 4 cover the remaining observations)
Note: You can alter the page layout with the ROWS= option in the PROC PRINT
statement (see the discussion of ROWS= on page 710).
Note: PROC PRINT may produce slightly different output if the data set is not
RADIX addressable. Version 6 compressed files are not RADIX addressable, while,
beginning with Version 7, compressed files are RADIX addressable. (The integrity of the
data is not compromised; the procedure simply numbers the observations differently.)
Column Headings
By default, spacing dictates whether PROC PRINT prints column headings
horizontally or vertically. Figure 34.2 on page 721, Figure 34.3 on page 721, and Figure
34.4 on page 722 all illustrate horizontal headings. Figure 34.5 on page 722 illustrates
vertical headings.
Figure 34.5 (diagram: a report page on which the column headings Obs and Var are printed vertically, one character per line)
Note: If you use LABEL and at least one variable has a label, PROC PRINT prints
all column headings horizontally unless you specify HEADING=VERTICAL.
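A minimal sketch of the interaction between LABEL and HEADING=, assuming a hypothetical data set SALES and a hypothetical label:

```sas
/* Hypothetical data set, created only for this illustration. */
data sales;
   input Region $ Expenses;
   datalines;
Southern 2000
Northern 3000
;

/* LABEL alone would force horizontal headings because Expenses */
/* has a label; HEADING=VERTICAL requests vertical headings.    */
proc print data=sales label heading=vertical;
   label Expenses='Monthly Expenses';
run;
```

Removing HEADING=VERTICAL from this step should return the headings to the horizontal layout that the note describes.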
Column Width
By default, PROC PRINT uses a variable's formatted width as the column width.
(The WIDTH= option overrides this default behavior.) If the variable does not have a
format that explicitly specifies a field width, PROC PRINT uses the widest data value
for that variable on that page as the column width.
If the formatted value of a character variable or the data width of an unformatted
character variable exceeds the linesize minus the length of all the ID variables, PROC
PRINT may truncate the value. Consider the following situation:
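- The linesize is 80.
- Two character variables are used as ID variables.
- COMMENT is a long character variable.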
When PROC PRINT prints these three variables on a line, it uses 14 print positions
for the two ID variables and the space after each one. This leaves 80-14, or 66, print
positions for COMMENT. Longer values of COMMENT are truncated.
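The arithmetic above can be tried with a sketch like the following. The data set NOTES, the variable lengths (10, 2, and 200), and the data values are assumptions chosen so that the two ID variables and the space after each one occupy 10 + 1 + 2 + 1 = 14 print positions.

```sas
options linesize=80;

/* Hypothetical data set; the lengths are assumptions for this sketch. */
data notes;
   length IdNumber $ 10 State $ 2 Comment $ 200;
   IdNumber = 'A000000001';
   State    = 'NC';
   Comment  = repeat('x', 99);   /* REPEAT returns n+1 copies: 100 characters */
run;

/* In listing output, the ID columns leave 80 - 14 = 66 print */
/* positions, so a 100-character Comment value cannot fit.    */
proc print data=notes;
   id IdNumber State;
   var Comment;
run;
```

The truncation described here applies to traditional listing output, where LINESIZE= limits the width of a line.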
WIDTH= controls the column width.
Note: Column width is affected not only by variable width but also by the length of
column headings. Long column headings may lessen the usefulness of WIDTH=.
This example
- selects three variables for the report
Create the input data set. EXPREV contains information about a company's monthly
expenses and revenues for two regions of the United States.
data exprev;
input Region $ State $ Month monyy5.
Expenses Revenues;
format month monyy5.;
datalines;
Southern GA JAN95 2000 8000
Southern GA FEB95 1200 6000
Southern FL FEB95 8500 11000
Northern NY FEB95 3000 4000
Northern NY MAR95 6000 5000
Southern FL MAR95 9800 13500
Northern MA MAR95 1500 1000
;
Print the data set EXPREV. DOUBLE inserts a blank line between observations. (This option
has no effect on the HTML output.)
proc print data=exprev double;
Select the variables to include in the report. The VAR statement creates columns for
Month, State, and Expenses, in that order.
var month state expenses;
Specify a title. The TITLE statement specifies a title for the report.
title 'Monthly Expenses for Offices in Each State';
run;
Output: Listing
Output 34.2
By default, PROC PRINT identifies each observation by number under the column heading Obs.
Obs    Month    State    Expenses
  1    JAN95     GA          2000
  2    FEB95     GA          1200
  3    FEB95     FL          8500
  4    FEB95     NY          3000
  5    MAR95     NY          6000
  6    MAR95     FL          9800
  7    MAR95     MA          1500
Create HTML output and specify the file to store the output in. The ODS HTML
statement opens the HTML destination. FILE= specifies the external file that you want to
contain the HTML output.
ods html file='your_file.html';
proc print data=exprev double;
var month state expenses;
title 'Monthly Expenses for Offices in Each State';
run;
Close the HTML destination. The ODS HTML CLOSE statement closes the HTML
destination.
ods html close;
Output: HTML
Display 34.2
Create stylized HTML output. The first STYLE option specifies that the column headers be
written in white italic font.
The second STYLE option specifies that SAS change the color of the background of the
observations column to red.
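A PROC PRINT step that matches this description might look like the following sketch. It assumes that the EXPREV data set from this example exists and that the HTML destination is already open; the locations HEADER and OBS and the attributes FONT_STYLE=, FOREGROUND=, and BACKGROUND= are the ones used elsewhere in this chapter.

```sas
/* Sketch only: assumes WORK.EXPREV exists and ODS HTML is open. */
proc print data=exprev double
           style(header)={font_style=italic foreground=white}  /* column headers */
           style(obs)={background=red};                        /* Obs column     */
   var month state expenses;
   title 'Monthly Expenses for Offices in Each State';
run;
```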
Close the HTML destination. The ODS HTML CLOSE statement closes the HTML
destination.
ods html close;
LABEL statement
ODS PDF statement
Data set:
This example
Print the report and define the column headings. SPLIT= identifies the asterisk as the
character that starts a new line in column headers. The N option prints the number of
observations at the end of the report. OBS= specifies the column header for the column that
identifies each observation by number. The split character (*) starts a new line in the column
heading. Therefore, the equal signs (=) in the value of OBS= underline the column header.
proc print data=exprev split='*' n obs='Observation*Number*===========';
Select the variables to include in the report. The VAR statement creates columns for
Month, State, and Expenses, in that order.
var month state expenses;
Assign the variables' labels as column headings. The LABEL statement associates a label
with each variable for the duration of the PROC PRINT step. When you use SPLIT= in the
PROC PRINT statement, the procedure uses labels for column headers. The split character (*)
starts a new line in the column heading. Therefore, the equal signs (=) in the labels underline
the column headers.
label month='Month**====='
state='State**====='
expenses='Expenses**========';
Specify a title for the report, and format any variable containing numbers. The
FORMAT statement assigns a format to use for Expenses in the report. The TITLE statement
specifies a title.
format expenses comma10.;
title 'Monthly Expenses for Offices in Each State';
run;
Output: Listing
Output 34.3
Observation    Month    State      Expenses
  Number
===========    =====    =====      ========
     1         JAN95     GA           2,000
     2         FEB95     GA           1,200
     3         FEB95     FL           8,500
     4         FEB95     NY           3,000
     5         MAR95     NY           6,000
     6         MAR95     FL           9,800
     7         MAR95     MA           1,500

N = 7
Create PDF output and specify the file to store the output in. The ODS PDF statement
opens the PDF destination and creates PDF output. The FILE= argument specifies your
external file that contains the PDF output.
ods pdf file='your_file.pdf';
proc print data=exprev split='*' n obs='Observation*Number*===========';
var month state expenses;
label month='Month**====='
state='State**====='
expenses='Expenses**========';
format expenses comma10.;
title 'Monthly Expenses for Offices in Each State';
run;
Close the PDF destination. The ODS PDF CLOSE statement closes the PDF destination.
ods pdf close;
Output: PDF
Display 34.4
Create stylized PDF output. The first STYLE option specifies that the background color of
the cell containing the value for N be changed to blue and that the font style be changed to
italic. The second STYLE option specifies that the background color of the observation column,
the observation header, and the other variables' headers be changed to white.
proc print data=exprev split='*' n obs='Observation*Number*==========='
style(N) = {font_style=italic background= blue}
style(HEADER OBS OBSHEADER) = {background=white};
Create stylized PDF output. The STYLE option changes the color of the cells containing
data to gray.
var month state expenses / style(DATA) = [ background = gray ] ;
label month='Month**====='
state='State**====='
expenses='Expenses**========';
format expenses comma10.;
Close the PDF destination. The ODS PDF CLOSE statement closes the PDF destination.
ods pdf close;
Other features:
SORT procedure
LABEL statement
ODS RTF statement
Data set:
This example
Sort the EXPREV data set. PROC SORT sorts the observations by Region, State, and Month.
proc sort data=exprev;
by region state month;
run;
Print the report, specify the total number of observations in each BY group, and
suppress the printing of observation numbers. N= prints the number of observations in a
BY group at the end of that BY group. The explanatory text that the N= option provides
precedes the number. NOOBS suppresses the printing of observation numbers at the beginning
of the rows. LABEL uses variables' labels as column headings.
proc print data=exprev n='Number of observations for the state: '
noobs label;
Specify the variables to include in the report. The VAR statement creates columns for
Month, Expenses, and Revenues, in that order.
var month expenses revenues;
Create a separate section for each region of the state and specify page breaks for each
BY group of Region. The BY statement produces a separate section of the report for each BY
group and prints a heading above each one. The PAGEBY statement starts a new page each
time the value of Region changes.
by region state;
pageby region;
Establish the column headings. The LABEL statement associates a label with the variable
Region for the duration of the PROC PRINT step. When you use the LABEL option in the
PROC PRINT statement, the procedure uses labels for column headings.
label region='Sales Region';
Format the columns that contain numbers and specify a title. The FORMAT statement
assigns a format to Expenses and Revenues for this report. The TITLE statement specifies a
title.
format revenues expenses comma10.;
title 'Sales Figures Grouped by Region and State';
run;
Output: Listing
Output 34.4
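------------------- Sales Region=Northern State=MA -------------------
Month      Expenses      Revenues
MAR95         1,500         1,000

------------------- Sales Region=Northern State=NY -------------------
Month      Expenses      Revenues
FEB95         3,000         4,000
MAR95         6,000         5,000

------------------- Sales Region=Southern State=FL -------------------
Month      Expenses      Revenues
FEB95         8,500        11,000
MAR95         9,800        13,500

------------------- Sales Region=Southern State=GA -------------------
Month      Expenses      Revenues
JAN95         2,000         8,000
FEB95         1,200         6,000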
Create output for Microsoft Word and specify the file to store the output in. The ODS
RTF statement opens the RTF destination and creates output formatted for Microsoft Word. The
FILE= option specifies your external file that contains the RTF output. The STARTPAGE=NO
option specifies that no new pages be inserted within the PRINT procedure, even if new pages
are requested by the procedure code.
ods rtf startpage=no file='your_file.rtf';
proc sort data=exprev;
by region state month;
run;
proc print data=exprev n='Number of observations for the state: '
noobs label;
var month expenses revenues;
by region state;
pageby region;
label region='Sales Region';
format revenues expenses comma10.;
title 'Sales Figures Grouped by Region and State';
run;
Close the RTF destination. The ODS RTF CLOSE statement closes the RTF destination.
ods rtf close;
Output: RTF
Display 34.6
Creating Separate Sections of a Report for Groups of Observations: Default RTF Output
Create a stylized RTF report. The first STYLE option specifies that the background color
of the cell containing the number of observations be changed to gray.
The second STYLE option specifies that the background color of the column header for the
variable MONTH be changed to white.
The third STYLE option specifies that the background color of the column header for the
variable EXPENSES be changed to blue and the font color be changed to white.
The fourth STYLE option specifies that the background color of the column header for the
variable REVENUES be changed to gray.
proc print data=exprev n='Number of observations for the state: '
noobs label style(N) = {background=gray};
var month / style(HEADER) = [background = white];
var expenses / style(HEADER) = [background = blue foreground=white];
var revenues / style(HEADER) = [background = gray];
by region state;
pageby region;
label region='Sales Region';
format revenues expenses comma10.;
title 'Sales Figures Grouped by Region and State';
run;
ods rtf close;
Creating Separate Sections of a Report for Groups of Observations: RTF Output Using Styles
SUM statement
Other features:
This example
- sums expenses and revenues for each region and for all regions
- shows the number of observations in each BY group and in the whole report
- creates a customized title, containing the name of the region. This title replaces
the default BY line for each BY group.
Start each BY group on a new page and suppress the printing of the default BY line.
The SAS system option NOBYLINE suppresses the printing of the default BY line. When you
use PROC PRINT with NOBYLINE, each BY group starts on a new page.
options nodate pageno=1 linesize=70 pagesize=60 nobyline;
Sort the data set. PROC SORT sorts the observations by Region.
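A PROC SORT step along the following lines would perform this sort. It is a sketch that assumes the EXPREV data set created in the first example of this chapter.

```sas
/* Sketch only: assumes WORK.EXPREV already exists. */
proc sort data=exprev;
   by region;   /* the BY statement in the PROC PRINT step uses the same variable */
run;
```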
Print the report, suppress the printing of observation numbers, and print the total
number of observations for the selected variables. NOOBS suppresses the printing of
observation numbers at the beginning of the rows. N= prints the number of observations in a
BY group at the end of that BY group and (because of the SUM statement) prints the number
of observations in the data set at the end of the report. The first piece of explanatory text that
N= provides precedes the number for each BY group. The second piece of explanatory text that
N= provides precedes the number for the entire data set.
proc print data=exprev noobs
n='Number of observations for the state: '
'Number of observations for the data set: ';
Sum the values for the selected variables. The SUM statement alone sums the values of
Expenses and Revenues for the entire data set. Because the PROC PRINT step contains a BY
statement, the SUM statement also sums the values of Expenses and Revenues for each region
that contains more than one observation.
sum expenses revenues;
by region;
Format the numeric values for a specified column. The FORMAT statement assigns the
COMMA10. format to Expenses and Revenues for this report.
format revenues expenses comma10.;
Specify and format a dynamic (or current) title. The TITLE statement specifies a title. The
#BYVAL specification places the current value of the BY variable Region in the title. Because
NOBYLINE is in effect, each BY group starts on a new page, and the title serves as a BY line.
title 'Revenue and Expense Totals for the #byval(region) Region';
run;
Generate the default BY line. The SAS system option BYLINE resets the printing of the
default BY line.
options byline;
Output: Listing
Output 34.5
State     Month      Expenses      Revenues
NY        FEB95         3,000         4,000
NY        MAR95         6,000         5,000
MA        MAR95         1,500         1,000
------               ----------    ----------
Region                 10,500        10,000

State     Month      Expenses      Revenues
GA        JAN95         2,000         8,000
GA        FEB95         1,200         6,000
FL        FEB95         8,500        11,000
FL        MAR95         9,800        13,500
------               ----------    ----------
Region                 21,500        38,500
                     ==========    ==========
                       32,000        48,500
Produce output that is tagged with Extensible Markup Language (XML) tags and
specify the file to store it in. The ODS MARKUP statement opens the MARKUP destination
and creates a file containing output that is tagged with XML tags. The FILE= argument
specifies your external file that contains the XML output.
ods markup file='your_file.xml';
options byline;
Close the MARKUP destination. The ODS MARKUP CLOSE statement closes the MARKUP
destination.
ods markup close;
Output: XML file
Output 34.6
Summing Numeric Variables with One BY Group: Partial XML Output Viewed with a Text Editor
BY statement
SUM statement
Other features: SORT procedure
Data set:
This example
- sums expenses and revenues for
   - each region
   - each state with more than one row in the report
   - all rows in the report.
- shows the number of observations in each BY group and in the whole report.
Sort the data set. PROC SORT sorts the observations by Region and State.
Print the report, suppress the printing of observation numbers, and print the total
number of observations for the selected variables. The N option prints the number of
observations in a BY group at the end of that BY group and prints the total number of
observations used in the report at the bottom of the report. NOOBS suppresses the printing of
observation numbers at the beginning of the rows.
proc print data=exprev n noobs;
Create a separate section of the report for each BY group, and sum the values for the
selected variables. The BY statement produces a separate section of the report for each BY
group. The SUM statement alone sums the values of Expenses and Revenues for the entire data
set. Because the program contains a BY statement, the SUM statement also sums the values of
Expenses and Revenues for each BY group that contains more than one observation.
by region state;
sum expenses revenues;
Establish a label for a selected variable, format the values of specified variables, and
create a title. The LABEL statement associates a label with the variable Region for the
duration of the PROC PRINT step. The BY line at the beginning of each BY group uses the
label. The FORMAT statement assigns a format to the variables Expenses and Revenues for
this report. The TITLE statement specifies a title.
label region='Sales Region';
format revenues expenses comma10.;
title 'Revenue and Expense Totals for Each State and Region';
run;
Output: Listing
Output 34.7
The report uses default column headers (variable names) because neither the SPLIT= nor the
LABEL option is used. Nevertheless, the BY line at the top of each section of the report shows
the BY variables' labels and their values. The name of a BY variable identifies the subtotals in
the report.
PROC PRINT sums Expenses and Revenues for each BY group that contains more than one
observation. However, sums are shown only for the BY variables whose values change from one
BY group to the next. For example, in the third BY group, where the sales region is Southern
and the state is FL, Expenses and Revenues are summed only for the state because the next BY
group is for the same region.
------------------- Sales Region=Northern State=MA -------------------
Month      Expenses      Revenues
MAR95         1,500         1,000

N = 1

------------------- Sales Region=Northern State=NY -------------------
Month      Expenses      Revenues
FEB95         3,000         4,000
MAR95         6,000         5,000
------   ----------    ----------
State         9,000         9,000
Region       10,500        10,000

N = 2

------------------- Sales Region=Southern State=FL -------------------
Month      Expenses      Revenues
FEB95         8,500        11,000
MAR95         9,800        13,500
------   ----------    ----------
State        18,300        24,500

N = 2

------------------- Sales Region=Southern State=GA -------------------
Month      Expenses      Revenues
JAN95         2,000         8,000
FEB95         1,200         6,000
------   ----------    ----------
State         3,200        14,000
Region       21,500        38,500
         ==========    ==========
             32,000        48,500

N = 2
Total N = 7
Produce HTML output and specify the file to store the output in. The ODS HTML
statement opens the HTML destination and creates a file that contains HTML output. The
FILE= argument specifies your external file that contains the HTML output.
ods html file='your_file.html';
proc sort data=exprev;
by region state;
run;
proc print data=exprev n noobs;
by region state;
sum expenses revenues;
label region='Sales Region';
format revenues expenses comma10.;
title 'Revenue and Expense Totals for Each State and Region';
run;
Close the HTML destination. The ODS HTML CLOSE statement closes the HTML
destination.
ods html close;
Output: HTML
Display 34.8
Create stylized HTML output. The STYLE option in the first SUM statement specifies that
the background color of the cell containing the grand total for the variable EXPENSES be
changed to white and the font color be changed to dark gray.
The STYLE option in the second SUM statement specifies that the background color of cells
containing totals for the variable REVENUES be changed to blue and the font color be changed
to white.
by region state;
sum expenses / style(GRANDTOTAL) = [background =white foreground=dark gray];
sum revenues / style(TOTAL) = [background =blue foreground=white];
label region='Sales Region';
format revenues expenses comma10.;
title 'Revenue and Expense Totals for Each State and Region';
run;
ods html close;
Summing Numeric Variables with Multiple BY Variables: HTML Output Using Styles
BY statement
SUM statement
SUMBY statement
Other features:
SORT procedure
LABEL statement
Data set:
This example
- creates a separate section of the report for each combination of state and region
- sums expenses and revenues only for each region and for all regions, not for
individual states.
Sort the data set. PROC SORT sorts the observations by Region and State.
Print the report and remove the observation numbers. NOOBS suppresses the printing
of observation numbers at the beginning of the rows.
proc print data=exprev noobs;
Sum the values for each region. The SUM and BY statements work together to sum the
values of Revenues and Expenses for each BY group as well as for the whole report. The
SUMBY statement limits the subtotals to one for each region.
by region state;
sum revenues expenses;
sumby region;
Assign labels to specific variables. The LABEL statement associates a label with the
variable Region for the duration of the PROC PRINT step. This label is used in the BY lines.
label region='Sales Region';
Assign a format to the necessary variables and specify a title. The FORMAT statement
assigns the COMMA10. format to Expenses and Revenues for this report.
format revenues expenses comma10.;
title 'Revenue and Expense Figures for Each Region';
run;
Output: Listing
Output 34.8
The report uses default column headers (variable names) because neither the SPLIT= nor the
LABEL option is used. Nevertheless, the BY line at the top of each section of the report shows
the BY variables' labels and their values. The name of a BY variable identifies the subtotals in
the report.
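------------------- Sales Region=Northern State=MA -------------------
Month      Expenses      Revenues
MAR95         1,500         1,000

------------------- Sales Region=Northern State=NY -------------------
Month      Expenses      Revenues
FEB95         3,000         4,000
MAR95         6,000         5,000
------   ----------    ----------
Region       10,500        10,000

------------------- Sales Region=Southern State=FL -------------------
Month      Expenses      Revenues
FEB95         8,500        11,000
MAR95         9,800        13,500

------------------- Sales Region=Southern State=GA -------------------
Month      Expenses      Revenues
JAN95         2,000         8,000
FEB95         1,200         6,000
------   ----------    ----------
Region       21,500        38,500
         ==========    ==========
             32,000        48,500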
Produce PostScript output and specify the file to store the output in. The ODS PS
statement opens the PS destination and creates a file that contains PostScript output. The
FILE= argument specifies your external file that contains the PostScript output.
ods ps file='your_file.ps';
by region state;
sum revenues expenses;
sumby region;
Close the PS destination. The ODS PS CLOSE statement closes the PS destination.
ods ps close;
Output: PostScript
Display 34.10 Limiting the Number of Sums in a Report: PostScript Output
ods ps file='your_file.ps';
by region state;
Create stylized PostScript output. The STYLE option in the first SUM statement specifies
that the background color of cells containing totals for the variable REVENUES be changed to
blue and the font color be changed to white.
The STYLE option in the second SUM statement specifies that the background color of the cell
containing the grand total for the EXPENSES variable be changed to white and the font color
be changed to dark gray.
sum revenues / style(TOTAL) = [background =blue foreground=white];
sum expenses / style(GRANDTOTAL) = [background =white foreground=dark gray];
ods ps close;
This example shows two ways of printing a data set with a large number of
variables: one is the default, and the other uses ROWS=. For detailed explanations of
the layouts of these two reports, see the ROWS= option on page 710 and see Page
Layout on page 720.
These reports use a pagesize of 24 and a linesize of 64 to help illustrate the different
layouts.
Note: When the two reports are written as HTML output, they do not differ.
Create the EMPDATA data set. The data set EMPDATA contains personal and job-related
information about a company's employees. A DATA step on page 1385 creates this data set.
data empdata;
input IdNumber $ 1-4 LastName $ 9-19 FirstName $ 20-29
City $ 30-42 State $ 43-44 /
Gender $ 1 JobCode $ 9-11 Salary 20-29 @30 Birth date9.
@43 Hired date9. HomePhone $ 54-65;
format birth hired date9.;
datalines;
1919    Adams      Gerald    Stamford     CT
M       TA2        34376     15SEP1948    07JUN1975  203/781-1255
1653    Alexander  Susan     Bridgeport   CT
F       ME2        35108     18OCT1952    12AUG1978  203/675-7715
. . . more lines of data . . .
1407    Grant      Daniel    Mt. Vernon   NY
M       PT1        68096     26MAR1957    21MAR1978  914/468-1616
1114    Green      Janice    New York     NY
F       TA2        32928     21SEP1957    30JUN1975  212/588-1092
;
Print only the first 12 observations in a data set. The OBS= data set option uses only the
first 12 observations to create the report. (This is just to conserve space here.) The ID statement
identifies observations with the formatted value of IdNumber rather than with the observation
number. This report is shown in Example 7 on page 754.
proc print data=empdata(obs=12);
id idnumber;
title 'Personnel Data';
run;
Print a report that contains only one row of variables on each page. ROWS=PAGE
prints only one row of variables for each observation on a page. This report is shown in Example
7 on page 754.
proc print data=empdata(obs=12) rows=page;
id idnumber;
title 'Personnel Data';
run;
Output: Listing
Output 34.9
Personnel Data

  Id                    First
Number    LastName      Name        City          State    Gender
 1919     Adams         Gerald      Stamford       CT        M
 1653     Alexander     Susan       Bridgeport     CT        F
 1400     Apple         Troy        New York       NY        M
 1350     Arthur        Barbara     New York       NY        F
 1401     Avery         Jerry       Paterson       NJ        M
 1499     Barefoot      Joseph      Princeton      NJ        M
 1101     Baucom        Walter      New York       NY        M

  Id      Job
Number    Code    Salary    Birth      Hired      HomePhone
 1919     TA2      34376    15SEP48    07JUN75    203/781-1255
 1653     ME2      35108    18OCT52    12AUG78    203/675-7715
 1400     ME1      29769    08NOV55    19OCT78    212/586-0808
 1350     FA3      32886    03SEP53    01AUG78    718/383-1549
 1401     TA3      38822    16DEC38    20NOV73    201/732-8787
 1499     ME3      43025    29APR42    10JUN68    201/812-5665
 1101     SCP      18723    09JUN50    04OCT78    212/586-8060

Personnel Data

  Id                    First
Number    LastName      Name        City          State    Gender
 1333     Blair         Justin      Stamford       CT        M
 1402     Blalock       Ralph       New York       NY        M
 1479     Bostic        Marie       New York       NY        F
 1403     Bowden        Earl        Bridgeport     CT        M
 1739     Boyce         Jonathan    New York       NY        M

  Id      Job
Number    Code    Salary    Birth      Hired      HomePhone
 1333     PT2      88606    02APR49    13FEB69    203/781-1777
 1402     TA2      32615    20JAN51    05DEC78    718/384-2849
 1479     TA3      38785    25DEC56    08OCT77    718/384-8816
 1403     ME1      28072    31JAN57    24DEC79    203/675-3434
 1739     PT1      66517    28DEC52    30JAN79    212/587-1247

Output 34.10
Each page of this report contains values for only some of the variables
in each observation. However, each page contains values for more
observations than the default report does.

                        Personnel Data

Id                    First
Number    LastName    Name        City          State    Gender
1919      Adams       Gerald      Stamford       CT        M
1653      Alexander   Susan       Bridgeport     CT        F
1400      Apple       Troy        New York       NY        M
1350      Arthur      Barbara     New York       NY        F
1401      Avery       Jerry       Paterson       NJ        M
1499      Barefoot    Joseph      Princeton      NJ        M
1101      Baucom      Walter      New York       NY        M
1333      Blair       Justin      Stamford       CT        M
1402      Blalock     Ralph       New York       NY        M
1479      Bostic      Marie       New York       NY        F
1403      Bowden      Earl        Bridgeport     CT        M
1739      Boyce       Jonathan    New York       NY        M

                        Personnel Data

Id        Job
Number    Code    Salary    Birth      Hired      HomePhone
1919      TA2      34376    15SEP48    07JUN75    203/781-1255
1653      ME2      35108    18OCT52    12AUG78    203/675-7715
1400      ME1      29769    08NOV55    19OCT78    212/586-0808
1350      FA3      32886    03SEP53    01AUG78    718/383-1549
1401      TA3      38822    16DEC38    20NOV73    201/732-8787
1499      ME3      43025    29APR42    10JUN68    201/812-5665
1101      SCP      18723    09JUN50    04OCT78    212/586-8060
1333      PT2      88606    02APR49    13FEB69    203/781-1777
1402      TA2      32615    20JAN51    05DEC78    718/384-2849
1479      TA3      38785    25DEC56    08OCT77    718/384-8816
1403      ME1      28072    31JAN57    24DEC79    203/675-3434
1739      PT1      66517    28DEC52    30JAN79    212/587-1247
data empdata;
   input IdNumber $ 1-4 LastName $ 9-19 FirstName $ 20-29
         City $ 30-42 State $ 43-44 /
         Gender $ 1 JobCode $ 9-11 Salary 20-29 @30 Birth date9.
         @43 Hired date9. HomePhone $ 54-65;
   format birth hired date9.;
   datalines;
1919    Adams      Gerald    Stamford     CT
M       TA2        34376     15SEP1948    07JUN1975  203/781-1255
1653    Alexander  Susan     Bridgeport   CT
F       ME2        35108     18OCT1952    12AUG1978  203/675-7715
. . . more lines of data . . .
1407    Grant      Daniel    Mt. Vernon   NY
M       PT1        68096     26MAR1957    21MAR1978  914/468-1616
1114    Green      Janice    New York     NY
F       TA2        32928     21SEP1957    30JUN1975  212/588-1092
;
Create output for Microsoft Word and specify the file to store the output in. The ODS
RTF statement opens the RTF destination and creates output formatted for Microsoft Word. The
FILE= argument specifies your external file that contains the RTF output.
ods rtf file='your_file.rtf';
Close the RTF destination. The ODS RTF CLOSE statement closes the RTF destination.
ods rtf close;
Output: RTF
Display 34.12 Layout for a Report with Many Variables: RTF Output
data empdata;
   input IdNumber $ 1-4 LastName $ 9-19 FirstName $ 20-29
         City $ 30-42 State $ 43-44 /
         Gender $ 1 JobCode $ 9-11 Salary 20-29 @30 Birth date9.
         @43 Hired date9. HomePhone $ 54-65;
   format birth hired date9.;
   datalines;
1919    Adams      Gerald    Stamford     CT
M       TA2        34376     15SEP1948    07JUN1975  203/781-1255
1653    Alexander  Susan     Bridgeport   CT
F       ME2        35108     18OCT1952    12AUG1978  203/675-7715
. . . more lines of data . . .
1407    Grant      Daniel    Mt. Vernon   NY
M       PT1        68096     26MAR1957    21MAR1978  914/468-1616
1114    Green      Janice    New York     NY
F       TA2        32928     21SEP1957    30JUN1975  212/588-1092
;
=
red foreground = white}
=
blue foreground = white};
BY statement
ID statement
SUM statement
VAR statement
Other features:
SORT procedure
Data set:
Create and sort a temporary data set. PROC SORT creates a temporary data set in which
the observations are sorted by JobCode and Gender.
options nodate pageno=1 linesize=64 pagesize=60;
proc sort data=empdata out=tempemp;
by jobcode gender;
run;
Identify the character that starts a new line in column headers. SPLIT= identifies the
asterisk as the character that starts a new line in column headers.
proc print data=tempemp split='*';
Specify the variables to include in the report. The VAR statement and the ID statement
together select the variables to include in the report. The ID statement and the BY statement
produce the special format.
id jobcode;
by jobcode;
var gender salary;
Calculate the total value for each BY group. The SUM statement totals the values of
Salary for each BY group and for the whole report.
sum salary;
Assign labels to the appropriate variables. The LABEL statement associates a label with
each variable for the duration of the PROC PRINT step. When you use SPLIT= in the PROC
PRINT statement, the procedure uses labels for column headings.
   label jobcode='Job Code*========'
         gender='Gender*======'
         salary='Annual Salary*=============';
Create formatted columns. The FORMAT statement assigns a format to Salary for this
report. The WHERE statement selects for the report only the observations for job codes that
contain the letters 'FA' or 'ME'. The TITLE statements specify two titles.
   format salary dollar11.2;
   where jobcode contains 'FA' or jobcode contains 'ME';
   title 'Expenses Incurred for';
   title2 'Salaries for Flight Attendants and Mechanics';
run;
Output: Listing

Job Code    Gender    Annual Salary
========    ======    =============

FA1           F          $23,177.00
              F          $22,454.00
              M          $22,268.00
--------              -------------
FA1                      $67,899.00

FA2           F          $28,888.00
              F          $27,787.00
              M          $28,572.00
--------              -------------
FA2                      $85,247.00

FA3           F          $32,886.00
              F          $33,419.00
              M          $32,217.00
--------              -------------
FA3                      $98,522.00

ME1           M          $29,769.00
              M          $28,072.00
              M          $28,619.00
--------              -------------
ME1                      $86,460.00

ME2           F          $35,108.00
              F          $34,929.00
              M          $35,345.00
              M          $36,925.00
              M          $35,090.00
              M          $35,185.00
--------              -------------
ME2                     $212,582.00

ME3           M          $43,025.00
                      =============
                        $593,735.00
Produce HTML output and specify the file to store the output in. The ODS HTML
statement opens the HTML destination and creates a file that contains HTML output. The
FILE= argument specifies your external file that contains the HTML output.
ods html file='your_file.html';
id jobcode;
by jobcode;
var gender salary;
sum salary;
Close the HTML destination. The ODS HTML CLOSE statement closes the HTML
destination.
ods html close;
Output: HTML
Display 34.14 Creating a Customized Layout with BY Groups and ID Variables: Default HTML Output
Create stylized HTML output. The first STYLE option specifies that the font of the headers
be changed to italic. The second STYLE option specifies that the background of cells that
contain input data be changed to blue and the foreground of these cells be changed to white.
proc print data=tempemp (obs=10) split='*'
     style(HEADER) = {font_style=italic}
     style(DATA) = {background=blue foreground=white};
id jobcode;
by jobcode;
var gender salary;
Create total values that are written in red. The STYLE option specifies that the color of the
foreground of the cells that contain the totals be changed to red.
sum salary
/ style(total)= [foreground=red];
Macro facility
DATASETS procedure
PRINT procedure
Data set:
This example prints all the data sets in a SAS library. You can use the same
programming logic with any procedure. Just replace the PROC PRINT step near the
end of the example with whatever procedure step you want to execute. The example
uses the macro language. For details about the macro language, see SAS Guide to
Macro Processing, Version 6, Second Edition.
Program
libname printlib 'SAS-data-library';
options nodate pageno=1 linesize=80 pagesize=60;
Copy the desired data sets from the WORK library to a permanent library. PROC
DATASETS copies two data sets from the WORK library to the PRINTLIB library in order to
limit the number of data sets available to the example.
proc datasets library=work memtype=data nolist;
copy out=printlib;
select list exprev;
run;
Create a macro and specify the parameters. The %MACRO statement creates the macro
PRINTALL. When you call the macro, you can pass one or two parameters to it. The first
parameter is the name of the library whose data set you want to print. The second parameter is
a library used by the macro. If you do not specify this parameter, the WORK library is the
default.
%macro printall(libname,worklib=work);
Create the local macro variables. The %LOCAL statement creates two local macro variables,
NUM and I, to use in a loop.
%local num i;
Produce an output data set. This PROC DATASETS step reads the library that you specify
as a parameter when you invoke the macro. The CONTENTS statement produces an output
data set called TEMP1 in WORKLIB. This data set contains an observation for each variable in
each data set in the library LIBNAME. By default, each observation includes the name of the
data set that the variable is included in as well as other information about the variable.
However, the KEEP= data set option writes only the name of the data set to TEMP1.
proc datasets library=&libname memtype=data nodetails;
contents out=&worklib..temp1(keep=memname) data=_all_ noprint;
run;
Specify the unique values in the data set, assign a macro variable to each one, and
assign DATA step information to a macro variable. This DATA step increments the value
of N each time it reads the last occurrence of a data set name (when IF LAST.MEMNAME is
true). The CALL SYMPUT statement uses the current value of N to create a macro variable for
each unique value of MEMNAME in the data set TEMP1. The TRIM function removes extra
blanks in the TITLE statement in the PROC PRINT step that follows.
data _null_;
   set &worklib..temp1 end=final;
   by memname notsorted;
   if last.memname;
   n+1;
   call symput('ds'||left(put(n,8.)),trim(memname));
When it reads the last observation in the data set (when FINAL is true), the DATA step assigns
the value of N to the macro variable NUM. At this point in the program, the value of N is the
number of unique data set names in TEMP1, that is, the number of data sets in the library.
   if final then call symput('num',put(n,8.));
Run the DATA step. The RUN statement is crucial. It forces the DATA step to run, thus
creating the macro variables that are used in the CALL SYMPUT statements before the %DO
loop, which uses them, executes.
run;
Print the data sets and end the macro. The %DO loop issues a PROC PRINT step for each
data set. The %MEND statement ends the macro.
%do i=1 %to &num;
proc print data=&libname..&&ds&i noobs;
title "Data Set &libname..&&ds&i";
run;
%end;
%mend printall;
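The indirect reference &&ds&i in the %DO loop can be checked in isolation. The following is a minimal sketch, not part of the PRINTALL macro: the %LET statements stand in for the macro variables that CALL SYMPUT would create, and the values EXPREV and LIST are illustrative assumptions.

```sas
/* Illustrative stand-ins for the macro variables that CALL SYMPUT creates */
%let ds1=EXPREV;
%let ds2=LIST;
%let i=2;

/* First pass:  && resolves to & and &i resolves to 2, which yields &ds2. */
/* Second pass: &ds2 resolves to LIST, which %PUT writes to the SAS log.  */
%put &&ds&i;
```

Writing &ds&i instead would make the macro processor look for a macro variable named DS, which is why the doubled ampersand is needed.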
Print all the data sets in the PRINTLIB library. This invocation of the PRINTALL macro
prints all the data sets in the library PRINTLIB.
options nodate pageno=1 linesize=70 pagesize=60;
%printall(printlib)
Output
Output 34.12

                  Data Set printlib.EXPREV                  1

Region      State    Month    Expenses    Revenues

Northern     MA      MAR95      1500         1000
Northern     NY      FEB95      3000         4000
Northern     NY      MAR95      6000         5000
Southern     FL      FEB95      8500        11000
Southern     FL      MAR95      9800        13500
Southern     GA      JAN95      2000         8000
Southern     GA      FEB95      1200         6000

                   Data Set printlib.LIST                   2

Name                 Street                 City           State    Zip

Gabrielli, Theresa   24 Ridgetop Rd.        Westboro        MA      01581
Clayton, Aria        314 Bridge St.         Hanover         NH      03755
Dix, Martin L.       4 Shepherd St.         Norwich         VT      05055
Slater, Emily C.     2009 Cherry St.        York            PA      17407
Ericson, Jane        211 Clancey Court      Chapel Hill     NC      27514
An, Ing              95 Willow Dr.          Charlotte       NC      28211
Jacobson, Becky      7 Lincoln St.          Tallahassee     FL      32312
Misiewicz, Jeremy    43-C Lakeview Apts.    Madison         WI      53704
Ahmadi, Hafez        5203 Marston Way       Boulder         CO      80302
Archuleta, Ruby      Box 108                Milagro         NM      87429
CHAPTER 35
The PRINTTO Procedure

Overview: PRINTTO Procedure
Syntax: PRINTTO Procedure
   PROC PRINTTO Statement
Concepts: PRINTTO Procedure
   Page Numbering
   Routing SAS Log or Procedure Output Directly to a Printer
Examples: PRINTTO Procedure
   Example 1: Routing to External Files
   Example 2: Routing to SAS Catalog Entries
   Example 3: Using Procedure Output as an Input File
   Example 4: Routing to a Printer
Table 35.1
windowing environment
LABEL=
LOG=
NEW
PRINT=
Without Options
Using a PROC PRINTTO statement with no options
- closes any files opened by a PROC PRINTTO statement
- points both the SAS log and SAS procedure output to their default destinations.
Interaction: To close the appropriate file and to return only the SAS log or
Options
LABEL=description
provides a description for a catalog entry that contains a SAS log or procedure output.
Range: 1 to 256 characters
Interaction: Use the LABEL= option only when you specify a catalog entry as the
After routing the log to an external file or a catalog entry, you can specify LOG
to route the SAS log back to its default destination.
Tip: When routing the SAS log, include a RUN statement in the PROC PRINTTO
step. If you omit the RUN statement, the first line of the following DATA or
PROC step is not routed to the new file. (This occurs because a statement does not
execute until a step boundary is crossed.)
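The round trip described here might look like the following minimal sketch. The quoted name log-file is a placeholder for an external file, as elsewhere in this chapter, and the DATA _NULL_ step is only an illustrative way to write a line to the log.

```sas
/* Route the SAS log to an external file; 'log-file' is a placeholder name */
proc printto log='log-file' new;
run;   /* the RUN statement makes the routing take effect before the next step */

data _null_;
   put 'This line is written to the routed log.';
run;

/* LOG=LOG returns only the SAS log to its default destination */
proc printto log=log;
run;
```

Because only LOG= is specified in the last step, the destination for procedure output is left unchanged.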
Tip:
Interaction: The SAS log and procedure output cannot be routed to the same
page 782
NEW
clears any information that exists in a file and prepares the file to receive the SAS
log or procedure output.
Default: If you omit NEW, the new information is appended to the existing file.
Interaction: If you specify both LOG= and PRINT=, NEW applies to both.
Featured in:
page 782
PRINT=PRINT | file-specification | SAS-catalog-entry
entry.OUTPUT
a SAS catalog entry stored in the default SAS library and catalog:
SASUSER.PROFILE.
fileref
a fileref previously assigned to a SAS catalog entry. Search for "FILENAME,
CATALOG Access Method" in the SAS online documentation.
Aliases:
FILE=, NAME=
Default: PRINT
Interaction: The procedure output and the SAS log cannot be routed to the same
procedure output. If you omit NEW, the new output is appended to the file.
Interaction: To route the SAS log and procedure output to the same file, specify the
the LABEL option to provide a description for the entry in the catalog directory.
Featured in:
UNIT=nn
You can define this fileref yourself; however, some operating systems predefine
certain filerefs in this form.
Tip:
Page Numbering
- When the SAS system option NUMBER is in effect, there is a single
  page-numbering sequence for all output in the current job or session. When
  NONUMBER is in effect, output pages are not numbered.
- You can specify the beginning page number for the output you are currently
  producing by using the PAGENO= system option in an OPTIONS statement.
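As a minimal sketch of these two options, the following statements turn page numbering on and reset the sequence. SASHELP.CLASS is used here only as convenient sample data and is an assumption, not part of the original text.

```sas
/* NUMBER turns page numbering on; PAGENO=1 starts the sequence at 1 */
options number pageno=1;

proc print data=sashelp.class;   /* sample data, used only for illustration */
run;

/* Reset the sequence so that the next output produced starts at page 1 again */
options pageno=1;
```

Without the second OPTIONS statement, later output would continue the same numbering sequence for the rest of the job or session.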
PRINTTO statement:
Without options
Options:
LOG=
NEW
PRINT=
This example uses PROC PRINTTO to route the log and procedure output to an
external file and then reset both destinations to the default.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. The SOURCE
option writes lines of source code to the default destination for the SAS log.
options nodate pageno=1 linesize=80 pagesize=60 source;
Route the SAS log to an external file. PROC PRINTTO uses the LOG= option to route the
SAS log to an external file. By default, this log is appended to the current contents of log-file.
proc printto log='log-file';
run;
Create the NUMBERS data set. The DATA step uses list input to create the NUMBERS data
set.
data numbers;
   input x y z;
   datalines;
 14.2   25.2   96.8
 10.8   51.6   96.8
  9.5   34.2  138.2
  8.8   27.6   83.2
 11.5   49.4  287.0
  6.3   42.0  170.7
;
Route the procedure output to an external file. PROC PRINTTO routes output to an
external file. Because NEW is specified, any output written to output-file will overwrite the
file's current contents.
proc printto print='output-file' new;
run;
Print the NUMBERS data set. The PROC PRINT output is written to the specified external
file.
proc print data=numbers;
   title 'Listing of NUMBERS Data Set';
run;
Reset the SAS log and procedure output destinations to default. PROC PRINTTO routes
subsequent logs and procedure output to their default destinations and closes both of the
current files.
proc printto;
run;
Log
Output 35.1
1
2
3
Output 35.2
5
6
7
8
data numbers;
input x y z;
datalines;
15
16
16
17
;
proc printto print=output-file new;
run;
18
19
20
21
22
23
24
proc printto;
run;
Output
Output 35.3

   x        y        z

 14.2     25.2     96.8
 10.8     51.6     96.8
  9.5     34.2    138.2
  8.8     27.6     83.2
 11.5     49.4    287.0
  6.3     42.0    170.7
PRINTTO statement:
Without options
Options:
LABEL=
LOG=
NEW
PRINT=
This example uses PROC PRINTTO to route the SAS log and procedure output to a
SAS catalog entry and then to reset both destinations to the default.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60 source;
Assign a libname.
libname lib1 'SAS-data-library';
Route the SAS log to a SAS catalog entry. PROC PRINTTO routes the SAS log to a SAS
catalog entry named SASUSER.PROFILE.TEST.LOG. The PRINTTO procedure uses the default
libref and catalog SASUSER.PROFILE because only the entry name and type are specified.
LABEL= assigns a description for the catalog entry.
proc printto log=test.log label='Inventory program' new;
run;
Create the LIB1.INVENTRY data set. The DATA step creates a permanent SAS data set.
data lib1.inventry;
   length Dept $ 4 Item $ 6 Season $ 6 Year 4;
   input dept item season year @@;
   datalines;
3070 20410 spring 1996 3070 20411 spring 1997
3070 20412 spring 1997 3070 20413 spring 1997
3070 20414 spring 1996 3070 20416 spring 1995
3071 20500 spring 1994 3071 20501 spring 1995
3071 20502 spring 1996 3071 20503 spring 1996
3071 20505 spring 1994 3071 20506 spring 1994
3071 20507 spring 1994 3071 20424 spring 1994
;
Route the procedure output to a SAS catalog entry. PROC PRINTTO routes procedure
output from the subsequent PROC REPORT step to the SAS catalog entry
LIB1.CAT1.INVENTRY.OUTPUT. LABEL= assigns a description for the catalog entry.
proc printto print=lib1.cat1.inventry.output
             label='Inventory program' new;
run;
proc report data=lib1.inventry nowindows headskip;
   column dept item season year;
   title 'Current Inventory Listing';
run;
Reset the SAS log and procedure output back to the default and close the file. PROC
PRINTTO closes the current files that were opened by the previous PROC PRINTTO step and
reroutes subsequent SAS logs and procedure output to their default destinations.
proc printto;
run;
Log
Output 35.4
You can view this catalog entry in the BUILD window of the SAS Explorer.
8
9
10
11
12
data lib1.inventry;
length Dept $ 4 Item $ 6 Season $ 6 Year 4;
input dept item season year @@;
datalines;
NOTE: SAS went to a new line when INPUT statement reached past the end of a
line.
NOTE: The data set LIB1.INVENTRY has 14 observations and 4 variables.
NOTE: DATA statement used:
real time
0.00 seconds
cpu time
0.00 seconds
20
21
22
23
24
;
proc printto print=lib1.cat1.inventry.output
label=Inventory program new;
run;
25
26
27
28
29
30
31
32
proc printto;
run;
Output
Output 35.5
You can view this catalog entry in the BUILD window of the SAS Explorer.

Dept    Item     Season    Year

3070    20410    spring    1996
3070    20411    spring    1997
3070    20412    spring    1997
3070    20413    spring    1997
3070    20414    spring    1996
3070    20416    spring    1995
3071    20500    spring    1994
3071    20501    spring    1995
3071    20502    spring    1996
3071    20503    spring    1996
3071    20505    spring    1994
3071    20506    spring    1994
3071    20507    spring    1994
3071    20424    spring    1994
PRINTTO statement:
Without options
Options:
LOG=
NEW
PRINT=
This example uses PROC PRINTTO to route procedure output to an external file and
then uses that file as input to a DATA step.
Generate random values for the variables. The DATA step uses the RANUNI function to
randomly generate values for the variables X and Y in the data set TEST.
data test;
do n=1 to 1000;
x=int(ranuni(77777)*7);
y=int(ranuni(77777)*5);
output;
end;
run;
Assign a fileref and route procedure output to the file that is referenced. The
FILENAME statement assigns a fileref to an external file. PROC PRINTTO routes subsequent
procedure output to the file that is referenced by the fileref ROUTED. See Output 35.6.
filename routed 'output-filename';
proc printto print=routed new;
run;
Produce the frequency counts. PROC FREQ computes frequency counts and a chi-square
analysis of the variables X and Y in the data set TEST. This output is routed to the file that is
referenced as ROUTED.
proc freq data=test;
tables x*y / chisq;
run;
Close the file. You must use another PROC PRINTTO to close the file that is referenced by
fileref ROUTED so that the following DATA step can read it. The step also routes subsequent
procedure output to the default destination. PRINT= causes the step to affect only procedure
output, not the SAS log.
proc printto print=print;
run;
Create the data set PROBTEST. The DATA step uses ROUTED, the file containing PROC
FREQ output, as an input file and creates the data set PROBTEST. This DATA step reads all
records in ROUTED but creates an observation only from a record that begins with Chi-Squa.
data probtest;
   infile routed;
   input word1 $ @;
   if word1='Chi-Squa' then
do;
input df chisq prob;
keep chisq prob;
output;
end;
run;
Print the PROBTEST data set. PROC PRINT produces a simple listing of data set
PROBTEST. This output is routed to the default destination. See Output 35.7.
proc print data=probtest;
   title 'Chi-Square Analysis for Table of X by Y';
run;
Output 35.6

Frequency|
Percent  |
Row Pct  |
Col Pct  |       0|       1|       2|       3|       4|  Total
---------+--------+--------+--------+--------+--------+
       0 |     29 |     33 |     12 |     25 |     27 |    126
         |   2.90 |   3.30 |   1.20 |   2.50 |   2.70 |  12.60
         |  23.02 |  26.19 |   9.52 |  19.84 |  21.43 |
         |  15.18 |  16.18 |   6.25 |  11.74 |  13.50 |
---------+--------+--------+--------+--------+--------+
       1 |     23 |     26 |     29 |     20 |     19 |    117
         |   2.30 |   2.60 |   2.90 |   2.00 |   1.90 |  11.70
         |  19.66 |  22.22 |  24.79 |  17.09 |  16.24 |
         |  12.04 |  12.75 |  15.10 |   9.39 |   9.50 |
---------+--------+--------+--------+--------+--------+
       2 |     28 |     26 |     32 |     30 |     25 |    141
         |   2.80 |   2.60 |   3.20 |   3.00 |   2.50 |  14.10
         |  19.86 |  18.44 |  22.70 |  21.28 |  17.73 |
         |  14.66 |  12.75 |  16.67 |  14.08 |  12.50 |
---------+--------+--------+--------+--------+--------+
       3 |     26 |     24 |     36 |     32 |     45 |    163
         |   2.60 |   2.40 |   3.60 |   3.20 |   4.50 |  16.30
         |  15.95 |  14.72 |  22.09 |  19.63 |  27.61 |
         |  13.61 |  11.76 |  18.75 |  15.02 |  22.50 |
---------+--------+--------+--------+--------+--------+
       4 |     25 |     31 |     28 |     36 |     29 |    149
         |   2.50 |   3.10 |   2.80 |   3.60 |   2.90 |  14.90
         |  16.78 |  20.81 |  18.79 |  24.16 |  19.46 |
         |  13.09 |  15.20 |  14.58 |  16.90 |  14.50 |
---------+--------+--------+--------+--------+--------+
       5 |     32 |     29 |     26 |     33 |     27 |    147
         |   3.20 |   2.90 |   2.60 |   3.30 |   2.70 |  14.70
         |  21.77 |  19.73 |  17.69 |  22.45 |  18.37 |
         |  16.75 |  14.22 |  13.54 |  15.49 |  13.50 |
---------+--------+--------+--------+--------+--------+
       6 |     28 |     35 |     29 |     37 |     28 |    157
         |   2.80 |   3.50 |   2.90 |   3.70 |   2.80 |  15.70
         |  17.83 |  22.29 |  18.47 |  23.57 |  17.83 |
         |  14.66 |  17.16 |  15.10 |  17.37 |  14.00 |
---------+--------+--------+--------+--------+--------+
Total         191      204      192      213      200     1000
            19.10    20.40    19.20    21.30    20.00   100.00

                                                             2
                     The FREQ Procedure

               Statistics for Table of x by y

Statistic                      DF       Value      Prob
------------------------------------------------------
Chi-Square                     24     27.2971    0.2908
Likelihood Ratio Chi-Square    24     28.1830    0.2524
Mantel-Haenszel Chi-Square      1      0.6149    0.4330
Phi Coefficient                        0.1652
Contingency Coefficient                0.1630
Cramer's V                             0.0826

Sample Size = 1000
Output 35.7

 chisq      prob

27.297     0.291
PRINTTO statement:
Option:
PRINT= option
This example uses PROC PRINTTO to route procedure output directly to a printer.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Associate a fileref with the printer name. The FILENAME statement associates a fileref
with the printer name that you specify. If you want to associate a fileref with the default printer,
omit printer-name.
filename your_fileref printer 'printer-name';
Specify the file to route to the printer. The PRINT= option specifies the file that PROC
PRINTTO routes to the printer.
proc printto print=your_fileref;
run;
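A program that routes output to a printer would normally go on to produce some output and then restore the default destination. The following is a minimal sketch of such a continuation; it is not part of the original example, and SASHELP.CLASS is used only as convenient sample data.

```sas
/* Procedure output produced here is routed to the printer fileref */
proc print data=sashelp.class;   /* sample data, an assumption for illustration */
run;

/* PROC PRINTTO with no options closes the routed file and */
/* points the log and procedure output back to their defaults */
proc printto;
run;
```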
CHAPTER 36
The PROTO Procedure

Information about the PROTO Procedure
list.
CHAPTER 37
The PRTDEF Procedure

Overview: PRTDEF Procedure
Syntax: PRTDEF Procedure
   PROC PRTDEF Statement
Input Data Set: PRTDEF Procedure
   Summary of Valid Variables
   Required Variables
   Optional Variables
Examples: PRTDEF Procedure
   Example 1: Defining Multiple Printer Definitions
   Example 2: Creating a Ghostview Printer in SASUSER to Preview PostScript Printer Output in SASUSER
   Example 3: Creating a Single Printer Definition That Is Available to All Users
   Example 4: Adding, Modifying, and Deleting Printer Definitions
   Example 5: Deleting a Single Printer Definition
To do this
DATA=
DELETE
Specify that the registry entries are being created for export
to a different host
FOREIGN
LIST
REPLACE
USESASHELP
Options
DATA=SAS-data-set
specifies the SAS input data set that contains the printer attributes.
Requirements: Printer attributes variables that must be specified are DEST,
DEVICE, MODEL, and NAME, except when the value of the variable OPCODE is
DELETE, in which case only the NAME variable is required.
DELETE
specifies that the default operation is to delete the printer definitions from the
registry.
Interaction: If both DELETE and REPLACE are specified, then DELETE is the
default operation.
Tip: If the user-defined printer definition is deleted, then the administrator-defined
printer may still appear if it exists in the SASHELP catalog.
FOREIGN
specifies that the registry entries are being created for export to a different host. As a
consequence, tests of any host-dependent items, such as the TRANTAB, are skipped.
LIST
specifies that a list of printers that are created or replaced will be written to the log.
REPLACE
specifies that the default operation is to modify existing printer definitions. Any
printer name that already exists will be modified by using the information in the
printer attributes data set. Any printer name that does not exist will be added.
Interaction: If both REPLACE and DELETE are specified, then a DELETE will be
performed.
USESASHELP
specifies that the printer definitions are to be placed in the SASHELP library,
where they are available to all users.
If the USESASHELP option is not specified, then the printer definitions are
placed in the current SASUSER library, where they are available to the local user
only.
Restriction: To use the USESASHELP option, you must have permission to write
Variable Name    Variable Description

Required
DEST             Destination
DEVICE           Device
MODEL            Prototype
NAME             Printer name

Optional
BOTTOM
CHARSET
DESC             Description
FONTSIZE
HOSTOPT          Host options
LEFT
LRECL
OPCODE           Operation code
PAPERIN
PAPEROUT
PAPERSIZ         Paper size
PAPERTYP         Paper type
PREVIEW          Preview
PROTOCOL         Protocol
RES
RIGHT
STYLE
TOP
TRANTAB          Translation table
TYPEFACE         Default font
UNITS            CM or IN units
VIEWER           Viewer
WEIGHT
Required Variables
To create or modify a printer, you must supply the NAME, MODEL, DEVICE, and
DEST variables. All the other variables use default values from the printer prototype
that is specied by the MODEL variable.
To delete a printer, specify only the required NAME variable.
The following variables are required in the input data set:
DEST
DEVICE
specifies the type of I/O device to use when sending output to the
printer. Valid devices are listed in the Printer Definition wizard and
in the SAS Registry Editor.
Restriction: DEVICE is limited to 31 characters.
MODEL
NAME
specifies the printer definition name that will be associated with the
rest of the attributes in the printer definition.
The name is unique within a given registry. If a new printer
definition contains a name that already exists, then the record will
Optional Variables
The following variables are optional in the input data set:
BOTTOM
specifies the default bottom margin in the units that are specified by the UNITS
variable.
CHARSET
specifies the default font character set.
Restriction: The value must be one of the character set names in the typeface
that is specified by the TYPEFACE variable.
Restriction: CHARSET is limited to 31 characters.
DESC
specifies the description of the printer.
Restriction: The description can have a maximum of 1023 characters.
Default: DESC defaults to the prototype that is used to create the printer.
FONTSIZE
specifies the point size of the default font.
HOSTOPT
specifies any host options for the output destination. The host options are not case
sensitive.
Restriction: The host options can have a maximum of 1023 characters.
LEFT
specifies the default left margin in the units that are specified by the UNITS
variable.
LRECL
specifies the buffer size or record length to use when sending output to the printer.
Default: If LRECL is less than zero when modifying an existing printer, the
printer's buffer size will be reset to that specified by the printer prototype.
OPCODE
is a character variable that specifies what action (Add, Delete, or Modify) to
perform on the printer definition.
Add
creates a new printer definition in the registry. If the REPLACE option has
been specified, then this operation will also modify an existing printer
definition.
Delete
removes an existing printer definition from the registry.
Tip:
PAPERIN
specifies the default paper source or input tray.
Restriction: The value of PAPERIN must be one of the paper source names in
the printer prototype that is specified by the MODEL variable.
Restriction: PAPERIN is limited to 31 characters.
PAPEROUT
specifies the default paper destination or output tray.
Restriction: The value of PAPEROUT must be one of the paper destination
names in the printer prototype that is specified by the MODEL variable.
Restriction: PAPEROUT is limited to 31 characters.
PAPERSIZ
specifies the default paper size.
Restriction: The value of PAPERSIZ must be one of the paper size names listed
in the printer prototype that is specified by the MODEL variable.
Restriction: PAPERSIZ is limited to 31 characters.
PAPERTYP
specifies the default paper type.
Restriction: The value of PAPERTYP must be one of the paper source names
listed in the printer prototype that is specified by the MODEL variable.
Restriction: PAPERTYP is limited to 31 characters.
PREVIEW
specifies the printer application to use for print preview.
Restriction: PREVIEW is limited to 127 characters.
PROTOCOL
specifies the I/O protocol to use when sending output to the printer.
Operating Environment Information: On mainframe systems, the protocol
describes how to convert the output to a format that can be processed by a protocol
converter that connects the mainframe to an ASCII device.
Restriction: PROTOCOL is limited to 31 characters.
RES
specifies the default printer resolution.
Restriction: The value of RES must be one of the resolution values available to
the printer prototype that is specified by the MODEL variable.
Restriction: RES is limited to 31 characters.
RIGHT
specifies the default right margin in the units that are specified by the UNITS
variable.
STYLE
specifies the default font style.
Restriction: The value of STYLE must be one of the styles available to the
TOP
specifies the default top margin in the units that are specified by the UNITS
variable.
TRANTAB
specifies which translation table to use when sending output to the printer.
Operating Environment Information: The translation table is needed when an
EBCDIC host sends data to an ASCII device.
Restriction: TRANTAB is limited to 8 characters.
TYPEFACE
specifies the typeface of the default font.
Restriction: The typeface must be one of the typeface names available to the
UNITS
specifies the units CM or IN that are used by margin variables.
VIEWER
specifies the host system command that is to be used during print previews. As a
result, PROC PRTDEF causes a preview printer to be created.
Preview printers are specialized printers that are used to display printer output
on the screen before printing.
Tip: The values of the PREVIEW, PROTOCOL, DEST, and HOSTOPT variables
are ignored when a value for VIEWER has been specified. Place %s where the
input filename would normally be in the viewer command. The %s can be used
as many times as needed.
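The %s placeholder rule can be pictured with a short Python sketch. It only illustrates the substitution described for VIEWER and is not how SAS implements it; the helper name expand_viewer_command and the file name are hypothetical.

```python
def expand_viewer_command(template: str, filename: str) -> str:
    """Replace every %s in a viewer command template with the input filename."""
    return template.replace("%s", filename)


# 'ghostview %s' is the VIEWER value used in the Ghostview example in this
# chapter; the file name below is made up for illustration.
print(expand_viewer_command("ghostview %s", "sasprt.ps"))
```

Because every occurrence is replaced, a command that names the input file more than once receives the same filename in each position.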
WEIGHT
specifies the default font weight.
Restriction: The value must be one of the valid weights for the typeface that is
Program
Create the PRINTERS data set. The INPUT statement contains the names of the four
required variables. Each data line contains the information that is needed to produce a single
printer definition.
data printers;
input name $ 1-14 model $ 16-42 device $ 46-53 dest $ 57-70;
datalines;
Myprinter      PostScript Level 1 (Color)    PRINTER    printer1
Laserjet       PCL 5                         PIPE       lp -dprinter5
Color LaserJet PostScript Level 2 (Color)    PIPE       lp -dprinter2
;
Specify the input data set that contains the printer attributes, create the printer
definitions, and make the definitions available to all users. The DATA= option specifies
PRINTERS as the input data set that contains the printer attributes.
PROC PRTDEF creates the printer definitions for the SAS registry, and the USESASHELP
option specifies that the printer definitions will be available to all users.
proc prtdef data=printers usesashelp;
run;
This example creates a Ghostview printer definition in the SASUSER library for
previewing PostScript output.
Program
Create the GSVIEW data set, and specify the printer name, printer description,
printer prototype, and commands to be used for print preview. The GSVIEW data set
contains the variables whose values contain the information that is needed to produce the
printer definitions.
The NAME variable specifies the printer name that will be associated with the rest of the
attributes in the printer definition data record.
The DESC variable specifies the description of the printer.
The MODEL variable specifies the printer prototype to use when defining this printer.
The VIEWER variable specifies the host system commands to be used for print preview.
GSVIEW must be installed on your system and the value for VIEWER must include the path
to find it. You must enclose the value in single quotation marks because of the %s. If you use
double quotation marks, SAS will assume that %s is a macro variable.
DEVICE and DEST are required variables, but no value is needed in this example. Therefore,
a dummy or blank value should be assigned.
data gsview;
name = "Ghostview";
desc = "Print Preview with Ghostview";
model= "PostScript Level 2 (Color)";
viewer = 'ghostview %s';
device = "Dummy";
dest = " ";
run;
Specify the input data set that contains the printer attributes, create the printer
definitions, write the printer definitions to the SAS log, and replace a printer
definition in the SAS registry. The DATA= option specifies GSVIEW as the input data set
that contains the printer attributes.
PROC PRTDEF creates the printer definitions.
The LIST option specifies that a list of printers that are created or replaced will be written to
the SAS log.
The REPLACE option specifies that a printer definition will replace a printer definition in the
registry if the name of the printer definition matches a name already in the registry. If the
printer definition names do not match, then the new printer definition is added to the registry.
proc prtdef data=gsview list replace;
run;
This example creates a definition for a Tektronix Phaser 780 printer with a
Ghostview print previewer with the following specifications:
- bottom margin of 2.5 centimeters
- font size of 14 points
- paper size of ISO A4.
Program
Create the TEK780 data set and supply appropriate information for the printer
destination. The TEK780 data set contains the variables whose values contain the information
that is needed to produce the printer definitions.
In the example, assignment statements are used to assign these variables.
The NAME variable specifies the printer name that will be associated with the rest of the
attributes in the printer definition data record.
The DESC variable specifies the description of the printer.
The MODEL variable specifies the printer prototype to use when defining this printer.
The DEVICE variable specifies the type of I/O device to use when sending output to the printer.
The DEST variable specifies the output destination for the printer.
The PREVIEW variable specifies which printer will be used for print preview.
The UNITS variable specifies whether the margin variables are measured in centimeters or
inches.
The BOTTOM variable specifies the default bottom margin in the units that are specified by the
UNITS variable.
The FONTSIZE variable specifies the point size of the default font.
The PAPERSIZ variable specifies the default paper size.
data tek780;
name = "Tek780";
desc = "Test Lab Phaser 780P";
model = "Tek Phaser 780 Plus";
device = "PRINTER";
dest = "testlab3";
preview = "Ghostview";
units = "cm";
bottom = 2.5;
fontsize = 14;
papersiz = "ISO A4";
run;
Create the TEK780 printer definition. The DATA= option specifies TEK780 as the input
data set.
proc prtdef data=tek780;
run;
This example
- adds two printer definitions
- modifies a printer definition
- deletes two printer definitions.
Program
Create the PRINTERS data set and specify which actions to perform on the printer
definitions. The PRINTERS data set contains the variables whose values contain the
information that is needed to produce the printer definitions.
The MODEL variable specifies the printer prototype to use when defining this printer.
The DEVICE variable specifies the type of I/O device to use when sending output to the printer.
The DEST variable specifies the output destination for the printer.
The OPCODE variable specifies which action (add, delete, or modify) to perform on the printer
definition.
The first Add operation creates a new printer definition for Color PostScript in the SAS registry,
and the second Add operation creates a new printer definition for ColorPS in the SAS registry.
The Mod operation modifies the existing printer definition for LaserJet 5 in the registry.
The Del operation deletes the printer definitions for Gray PostScript and test from the registry.
The & specifies that two or more blanks separate character values. This allows the name and
model value to contain blanks.
data printers;
length name   $ 80
       model  $ 80
       device $ 8
       dest   $ 80
       opcode $ 3
       ;
input opcode $& name $& model $& device $& dest $&;
datalines;
add  Color PostScript   PostScript Level 2 (Color)        DISK   sasprt.ps
mod  LaserJet 5         PCL 5                             DISK   sasprt.pcl
del  Gray PostScript    PostScript Level 2 (Gray Scale)   DISK   sasprt.ps
del  test               PostScript Level 2 (Color)        DISK   sasprt.ps
add  ColorPS            PostScript Level 2 (Color)        DISK   sasprt.ps
;
Create multiple printer definitions and write them to the SAS log. The DATA= option
specifies the input data set PRINTERS that contains the printer attributes. PROC PRTDEF
processes five printer definition records: two are added, one is modified, and two are deleted.
The LIST option specifies that a list of printers that are created or replaced will be written to the log.
proc prtdef data=printers library=sasuser list;
run;
This example shows you how to delete a printer from the registry.
Program
Create the DELETEPRT data set. The NAME variable contains the name of the printer to
delete.
data deleteprt;
name='printer1';
run;
Delete the printer definition from the registry and write the deleted printer to the log.
The DATA= option specifies DELETEPRT as the input data set.
PROC PRTDEF creates printer definitions for the SAS registry.
DELETE specifies that the printer is to be deleted.
LIST specifies to write the deleted printer to the log.
proc prtdef data=deleteprt delete list;
run;
See Also
Procedures
Chapter 38, The PRTEXP Procedure, on page 803
CHAPTER 38
The PRTEXP Procedure
Overview: PRTEXP Procedure 803
Syntax: PRTEXP Procedure 803
PROC PRTEXP Statement 804
EXCLUDE Statement 804
SELECT Statement 804
Concepts: PRTEXP Procedure 805
Examples: PRTEXP Procedure 805
Example 1: Writing Attributes to the SAS Log 805
Example 2: Writing Attributes to a SAS Data Set 806
Options
USESASHELP
specifies that SAS search only the SASHELP portion of the registry for printer
definitions.
Default: The default is to search both the SASUSER and SASHELP portions of the
registry for printer definitions.
OUT=SAS-data-set
specifies the SAS data set to create that contains the printer definitions.
The data set that is specified by the OUT=SAS-data-set option is the same type of
data set that is specified by the DATA=SAS-data-set option in PROC PRTDEF to
define each printer.
Default: If OUT=SAS-data-set is not specified, then the data that is needed to
define each printer is written to the SAS log.
EXCLUDE Statement
The EXCLUDE statement will cause the output to contain information from all those printers that
are not listed.
EXCLUDE printer_1 <... printer_n>;
Required Arguments
printer_1 ... printer_n
specifies the printer(s) that you do not want the output to contain information about.
SELECT Statement
The SELECT statement will cause the output to contain information from only those printers that
are listed.
SELECT printer_1 <... printer_n>;
Required Arguments
printer_1 ... printer_n
specifies the printer(s) that you would like the output to contain information about.
This example shows you how to write the attributes that are used to define a printer
to the SAS log.
Program
Specify the printer that you want information about, specify that only the SASHELP
portion of the registry be searched, and write the information to the SAS log. The
SELECT statement specifies that you want the attribute information that is used to define the
printer Postscript to be included in the output. The USESASHELP option specifies that only the
SASHELP registry is to be searched for Postscript's printer definitions. The data that is needed
to define each printer is written to the SAS log because the OUT= option was not used to specify
a SAS data set.
proc prtexp usesashelp;
select postscript;
run;
This example shows you how to create a SAS data set that contains the data that
PROC PRTDEF would use to define the printers PCL4, PCL5, PCL5E, and PCL5C.
Program
Specify the printers that you want information about and create the PRDVTER data
set. The SELECT statement specifies the printers PCL4, PCL5, PCL5E, and PCL5C. The OUT=
option creates the SAS data set PRDVTER, which contains the same attributes that are used by
PROC PRTDEF to define the printers PCL4, PCL5, PCL5E, and PCL5C. SAS will search both
the SASUSER and SASHELP registries, because USESASHELP was not specified.
proc prtexp out=PRDVTER;
select pcl4 pcl5 pcl5e pcl5c;
run;
See Also
Procedures
Chapter 37, The PRTDEF Procedure, on page 789
CHAPTER 39
The PWENCODE Procedure
Overview: PWENCODE Procedure 807
Syntax: PWENCODE Procedure 807
PROC PWENCODE Statement 807
Concepts: PWENCODE Procedure 808
Using Encoded Passwords in SAS Programs 808
Encoding versus Encryption 808
Examples: PWENCODE Procedure 809
Example 1: Encoding a Password 809
Example 2: Using an Encoded Password in a SAS Program 809
Example 3: Saving an Encoded Password to the Paste Buffer 811
Required Argument
IN=password
specifies the password to encode. password can have no more than 512 characters.
password can contain letters, numerals, spaces, and special characters. If password
contains embedded single or double quotation marks, then use the standard SAS
rules for quoting character constants (see "SAS Constants in Expressions" in SAS
Language Reference: Concepts for details).
Featured in:
page 811
Options
OUT=fileref
specifies a fileref to which the output string is to be written. If the OUT= option is
not specified, then the output string is written to the SAS log.
Featured in:
METHOD=encoding-method
specifies the encoding method to use. Currently, sas001 is the only supported
encoding method and is the default if the METHOD= option is omitted.
IN= argument
This example shows a simple case of encoding a password and writing the encoded
password to the SAS log.
Program
proc pwencode in='my password';
run;
Log
Output 39.1
6    proc pwencode in='my password';
7    run;

{sas001}bXkgcGFzc3dvcmQ=

NOTE: PROCEDURE PWENCODE used (Total process time):
      real time           0.31 seconds
      cpu time            0.08 seconds
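The distinction between encoding and encryption can be seen by examining the encoded string in the log. The following Python sketch is not part of SAS; it assumes that the text after the {sas001} tag is ordinary Base64, which is consistent with the two encoded strings printed in this chapter. The function name decode_sas001 is hypothetical.

```python
import base64


def decode_sas001(encoded: str) -> str:
    """Decode a {sas001} string, assuming the payload is plain Base64."""
    prefix = "{sas001}"
    if not encoded.startswith(prefix):
        raise ValueError("expected a string that starts with {sas001}")
    payload = encoded[len(prefix):]
    return base64.b64decode(payload).decode("ascii")


# The encoded string written to the log in this example.
print(decode_sas001("{sas001}bXkgcGFzc3dvcmQ="))
```

Under that assumption, an encoded password hides the clear text from casual viewing only, so files that hold encoded passwords still need operating-system protection.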
IN= argument
OUT= option
This example
- encodes a password and saves it to an external file
- reads the encoded password with a DATA step, stores it in a macro variable, and
  uses it in a SAS/ACCESS LIBNAME statement.
Declare a fileref.
filename pwfile 'external-filename';
Encode the password and write it to the external file. The OUT= option specifies which
external fileref the encoded password will be written to.
proc pwencode in='mypass1' out=pwfile;
run;
Set the SYMBOLGEN SAS system option. The purpose of this step is to show that the
clear-text password does not appear in the SAS log, even when the macro variable that contains
the encoded password is resolved there. This step is not required in order for the program to work
properly. For more information about the SYMBOLGEN SAS system option, see SAS Macro
Language: Reference.
options symbolgen;
Read the file and store the encoded password in a macro variable. The DATA step stores
the encoded password in the macro variable DBPASS. For details about the INFILE and INPUT
statements, the $VARYING. informat, and the CALL SYMPUT routine, see SAS Language
Reference: Dictionary.
data _null_;
infile pwfile obs=1 length=l;
input @;
input @1 line $varying1024. l;
call symput('dbpass',substr(line,1,l));
run;
Use the encoded password to access a DBMS. You must use double quotation marks (" ") so
that the macro variable resolves properly.
libname x odbc dsn=SQLServer user=testuser password="&dbpass";
Log
28   data _null_;
29      infile pwfile obs=1 length=l;
30      input @;
31      input @1 line $varying1024. l;
32      call symput('dbpass',substr(line,1,l));
33   run;
34   libname x odbc
SYMBOLGEN:  Macro variable DBPASS resolves to {sas001}bXlwYXNzMQ==
34 !                dsn=SQLServer user=testuser password="&dbpass";
NOTE: Libref X was successfully assigned as follows:
      Engine:        ODBC
      Physical Name: SQLServer
IN= argument
OUT= option
Other features:
This example saves an encoded password to the paste buffer. You can then paste the
encoded password into another SAS program or into the password field of an
authentication dialog box.
Program
Declare a fileref with the CLIPBRD access method. For more information about the
FILENAME statement with the CLIPBRD access method, see SAS Language Reference:
Dictionary.
filename clip clipbrd;
Encode the password and save it to the paste buffer. The OUT= option saves the encoded
password to the fileref that was declared in the previous statement.
proc pwencode in='my password' out=clip;
run;
CHAPTER 40
The RANK Procedure
Overview: RANK Procedure 813
What Does the RANK Procedure Do? 813
Ranking Data 814
Syntax: RANK Procedure 815
PROC RANK Statement 816
BY Statement 818
RANKS Statement 819
VAR Statement 820
Concepts: RANK Procedure 820
Computer Resources 820
Statistical Applications 820
Results: RANK Procedure 821
Missing Values 821
Output Data Set 821
Examples: RANK Procedure 822
Example 1: Ranking Values of Multiple Variables 822
Example 2: Ranking Values within BY Groups 823
Example 3: Partitioning Observations into Groups Based on Ranks 826
References 829
Ranking Data
Output 40.1 shows the results of ranking the values of one variable with a simple
PROC RANK step. In this example, the new ranking variable shows the order of finish
of five golfers over a four-day competition. The player with the lowest number of
strokes finishes in first place. The following statements produce the output:
proc rank data=golf out=rankings;
var strokes;
ranks Finish;
run;
proc print data=rankings;
run;
Output 40.1
Player    Strokes    Finish
Jack        279         2
Jerry       283         3
Mike        274         1
Randy       296         4
Tito        302         5
In Output 40.2, the candidates for city council are ranked by district according to
the number of votes that they received in the election and according to the number of
years that they have served in office.
This example shows how PROC RANK can
- reverse the order of the rankings so that the highest value receives the rank of 1,
  the next highest value receives the rank of 2, and so on
- rank the observations separately by values of multiple variables
Output 40.2
Assignment of the Lowest Rank Value to the Highest Variable Value within Each BY Group
Results of City Council Election

District=1
Obs    Candidate      Vote    Years    VoteRank    YearsRank
 1     Cardella       1689      8         1            1
 2     Latham         1005      2         3            2
 3     Smith          1406      0         2            3
 4     Walker          846      0         4            3
N = 4

District=2
Obs    Candidate      Vote    Years    VoteRank    YearsRank
 5     Hinkley         912      0         3            3
 6     Kreitemeyer    1198      0         2            3
 7     Lundell        2447      6         1            1
 8     Thrash          912      2         3            2
N = 4
Chapter 3, Statements with the Same Function in Multiple Procedures, on page 57 for
details. You can also use any global statements. See Global Statements on page 18 for
a list.
To do this
BY
RANKS
VAR
To do this
DATA=
OUT=
FRACTION or NPLUS1
GROUPS=
NORMAL=
Compute percentages
PERCENT
SAVAGE
DESCENDING
TIES=
Note:
You can specify only one ranking method in a single PROC RANK step.
Options
DATA=SAS-data-set
access if another user is updating the data set at the same time.
DESCENDING
reverses the direction of the ranks. With DESCENDING, the largest value receives a
rank of 1, the next largest value receives a rank of 2, and so on. Otherwise, values
are ranked from smallest to largest.
Featured in:
FRACTION
Featured in:
NORMAL=BLOM | TUKEY | VW
computes normal scores from the ranks. The resulting variables appear normally
distributed. The formulas are
BLOM
$y_i = \Phi^{-1}\left((r_i - 3/8)/(n + 1/4)\right)$
TUKEY
$y_i = \Phi^{-1}\left((r_i - 1/3)/(n + 1/3)\right)$
VW
$y_i = \Phi^{-1}\left(r_i/(n + 1)\right)$
where $\Phi^{-1}$ is the inverse cumulative normal (PROBIT) function, $r_i$ is the rank of the
ith observation, and n is the number of nonmissing observations for the ranking
variable.
VW stands for van der Waerden. With NORMAL=VW, you can use the scores for a
nonparametric location test. All three normal scores are approximations to the exact
expected order statistics for the normal distribution, also called normal scores. The
BLOM version appears to fit slightly better than the others (Blom 1958; Tukey 1962).
Interaction: If you specify the TIES= option, then PROC RANK computes the
normal score from the ranks based on non-tied values and applies the TIES=
specification to the resulting normal score.
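The three normal-score formulas differ only in the constants applied to the rank and to n. The following Python sketch evaluates them for untied ranks; it is an illustration of the arithmetic, not PROC RANK itself, and the function name normal_score is hypothetical. It uses the standard-library NormalDist class for the inverse cumulative normal function.

```python
from statistics import NormalDist

# (offset subtracted from the rank, offset added to n) for each method
OFFSETS = {
    "BLOM": (3 / 8, 1 / 4),
    "TUKEY": (1 / 3, 1 / 3),
    "VW": (0.0, 1.0),
}


def normal_score(rank: float, n: int, method: str = "BLOM") -> float:
    """Return the normal score for one rank among n nonmissing values."""
    a, b = OFFSETS[method.upper()]
    p = (rank - a) / (n + b)          # fractional rank strictly between 0 and 1
    return NormalDist().inv_cdf(p)    # inverse cumulative normal (PROBIT)


# Blom scores for ranks 1 through 5; the middle rank maps to 0.
print([round(normal_score(r, 5, "BLOM"), 4) for r in range(1, 6)])
```

For every method the scores are symmetric about zero: the score for rank r is the negative of the score for rank n + 1 - r.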
NPLUS1
computes fractional ranks by dividing each rank by the denominator n+1, where n is
the number of observations having nonmissing values of the ranking variable.
Aliases:
FN1, N1
OUT=SAS-data-set
names the output data set. If SAS-data-set does not exist, PROC RANK creates it. If
you omit OUT=, the data set is named using the DATAn naming convention.
PERCENT
divides each rank by the number of observations that have nonmissing values of the
variable and multiplies the result by 100 to get a percentage.
Alias:
Tip:
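The NPLUS1 and PERCENT computations are simple rescalings of the rank. The Python sketch below shows the arithmetic for a single rank; it is illustrative only, the function name fractional_rank is hypothetical, and the FRACTION case (rank divided by n) is taken from the description of fractional ranks under Statistical Applications in this chapter.

```python
def fractional_rank(rank: float, n: int, method: str) -> float:
    """Rescale a rank among n nonmissing values."""
    method = method.upper()
    if method == "FRACTION":
        return rank / n            # rank divided by n
    if method == "NPLUS1":
        return rank / (n + 1)      # rank divided by n + 1
    if method == "PERCENT":
        return 100 * rank / n      # rank divided by n, times 100
    raise ValueError(f"unknown method: {method}")


# Rank 2 of 4 nonmissing values under each method.
for name in ("FRACTION", "NPLUS1", "PERCENT"):
    print(name, fractional_rank(2, 4, name))
```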
SAVAGE
computes Savage (or exponential) scores from the ranks by the following formula
(Lehman 1998):
$$y_i = \left[\sum_{j=n-r_i+1}^{n} \frac{1}{j}\right] - 1$$
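The Savage score is a partial harmonic sum minus 1. The following Python sketch evaluates it for integer ranks; it is an illustration of the formula rather than the procedure's implementation, it does not handle the fractional ranks that tied values can produce, and the function name savage_score is hypothetical.

```python
def savage_score(rank: int, n: int) -> float:
    """Sum 1/j for j = n - rank + 1, ..., n, then subtract 1."""
    return sum(1.0 / j for j in range(n - rank + 1, n + 1)) - 1.0


# With n = 2, the two scores are -0.5 and 0.5.
print([savage_score(r, 2) for r in (1, 2)])
```

Summed over all ranks from 1 to n, the scores add up to zero, which is a convenient check on any hand calculation.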
TIES=HIGH | LOW | MEAN
specifies how to compute normal scores or ranks for tied data values.
HIGH
assigns the largest of the corresponding ranks (or largest of the normal scores
when NORMAL= is specified).
LOW
assigns the smallest of the corresponding ranks (or smallest of the normal scores
when NORMAL= is specified).
MEAN
assigns the mean of the corresponding ranks (or mean of the normal scores when
NORMAL= is specified).
Default: MEAN (unless the FRACTION option or PERCENT option is in effect).
Interaction: If you specify the NORMAL= option, then the TIES= specification
applies to the normal score, not to the rank that is used to compute the normal
score.
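The effect of TIES= on plain ranks, with and without DESCENDING, can be sketched in Python. This is an illustration only (it ignores missing values and normal scores), and the function name rank_values is hypothetical. The sample values are the Present scores from Example 1 in this chapter, where three participants tie at 68.

```python
def rank_values(values, ties="MEAN", descending=False):
    """Rank values; tied values share the HIGH, LOW, or MEAN of their ranks."""
    ordered = sorted(values, reverse=descending)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # smallest rank held by the tie
        last = first + ordered.count(v) - 1     # largest rank held by the tie
        if ties.upper() == "LOW":
            ranks.append(first)
        elif ties.upper() == "HIGH":
            ranks.append(last)
        else:
            ranks.append((first + last) / 2)
    return ranks


present = [77, 93, 68, 68, 56, 68, 82]
print(rank_values(present, ties="LOW", descending=True))   # [3, 1, 4, 4, 7, 4, 2]
```

The three tied scores occupy ranks 4, 5, and 6, so LOW gives each of them 4, HIGH gives 6, and MEAN gives 5.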
Featured in:
BY Statement
Produces a separate set of ranks for each BY group.
Main discussion: BY on page 58
Featured in:
Required Arguments
variable
specifies the variable that the procedure uses to form BY groups. You can specify
more than one variable. If you do not use the NOTSORTED option in the BY
statement, the observations in the data set must either be sorted by all the variables
that you specify, or they must be indexed appropriately. Variables in a BY statement
are called BY variables.
Options
DESCENDING
specifies that the observations are sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The observations are grouped in another way, such as chronological order.
The requirement for ordering or indexing observations according to the values of
BY variables is suspended for BY-group processing when you use the
NOTSORTED option. In fact, the procedure does not use an index if you specify
NOTSORTED. The procedure defines a BY group as a set of contiguous observations
that have the same values for all BY variables. If observations with the same values
for the BY variables are not contiguous, the procedure treats each contiguous set as a
separate BY group.
RANKS Statement
Creates new variables for the rank values.
Requirement: If you use the RANKS statement, you must also use the VAR statement.
Default: If you omit the RANKS statement, the rank values replace the original variable
values in the output data set.
RANKS new-variable(s);
Required Arguments
new-variable(s)
specifies one or more new variables that contain the ranks for the variable(s) listed in
the VAR statement. The first variable listed in the RANKS statement contains the
ranks for the first variable listed in the VAR statement, the second variable listed in
the RANKS statement contains the ranks for the second variable listed in the VAR
statement, and so forth.
VAR Statement
Specifies the input variables.
Default: If you omit the VAR statement, PROC RANK computes ranks for all numeric
variables in the input data set.
Featured in: Example 1 on page 822, Example 2 on page 823, and Example 3 on page 826
VAR data-set-variable(s);
Required Arguments
data-set-variable(s)
Computer Resources
PROC RANK stores all values in memory of the variables for which it computes
ranks.
Statistical Applications
Ranks are useful for investigating the distribution of values for a variable. The ranks
divided by n or n+1 form values in the range 0 to 1, and these values estimate the
cumulative distribution function. You can apply inverse cumulative distribution
functions to these fractional ranks to obtain probability quantile scores, which you can
compare to the original values to judge the fit to the distribution. For example, if a set
of data has a normal distribution, the normal scores should be a linear function of the
original values, and a plot of scores versus original values should be a straight line.
Many nonparametric methods are based on analyzing ranks of a variable:
- A two-sample t-test applied to the ranks is equivalent to a Wilcoxon rank sum test
  using the t approximation for the significance level. If you apply the t-test to the
  normal scores rather than to the ranks, the test is equivalent to the van der
  Waerden test. If you apply the t-test to median scores (GROUPS=2), the test is
  equivalent to the median test.
- You can obtain a Friedman's two-way analysis for block designs by ranking within
  BY groups and then performing a main-effects analysis of variance on these ranks
  (Conover 1998).
Missing Values
Missing values are not ranked and are left missing when ranks or rank scores
replace the original values in the output data set.
PRINT procedure
This example
- reverses the order of the ranks so that the highest value receives the rank of 1
- assigns tied values the best possible rank
- creates ranking variables and prints them with the original variables.
Program
Set the SAS system options. The NODATE option specifies to omit the date and time when
the SAS job began. The PAGENO= option specifies the page number for the next page of output
that SAS produces. The LINESIZE= option specifies the line size. The PAGESIZE= option
specifies the number of lines for a page of SAS output.
options nodate pageno=1 linesize=80 pagesize=60;
Create the CAKE data set. This data set contains each participant's last name, score for
presentation, and score for taste in a cake-baking contest.
data cake;
input Name $ 1-10 Present 12-13 Taste 15-16;
datalines;
Davis      77 84
Orlando    93 80
Ramey      68 72
Roe        68 75
Sanders    56 79
Simms      68 77
Strickland 82 79
;
Generate the ranks for the numeric variables in descending order and create the
output data set ORDER. DESCENDING reverses the order of the ranks so that the high
score receives the rank of 1. TIES=LOW gives tied values the best possible rank. OUT= creates
the output data set ORDER.
proc rank data=cake out=order descending ties=low;
Create two new variables that contain ranks. The VAR statement specifies the variables to
rank. The RANKS statement creates two new variables, PresentRank and TasteRank, that
contain the ranks for the variables Present and Taste, respectively.
var present taste;
ranks PresentRank TasteRank;
run;
Print the data set. PROC PRINT prints the ORDER data set. The TITLE statement specifies
a title.
proc print data=order;
title "Rankings of Participants' Scores";
run;
Output
Rankings of Participants' Scores
Obs    Name          Present    Taste    PresentRank    TasteRank
 1     Davis            77        84          3             1
 2     Orlando          93        80          1             2
 3     Ramey            68        72          4             7
 4     Roe              68        75          4             6
 5     Sanders          56        79          7             3
 6     Simms            68        77          4             5
 7     Strickland       82        79          2             3
Other features:
PRINT procedure
This example
- ranks observations separately within BY groups
- reverses the order of the ranks so that the highest value receives the rank of 1
- assigns tied values the best possible rank
- creates ranking variables and prints them with the original variables.
Program
Set the SAS system options. The NODATE option specifies to omit the date and time when
the SAS job began. The PAGENO= option specifies the page number for the next page of output
that SAS produces. The LINESIZE= option specifies the line size. The PAGESIZE= option
specifies the number of lines for a page of SAS output.
options nodate pageno=1 linesize=80 pagesize=60;
Create the ELECT data set. This data set contains each candidate's last name, district
number, vote total, and number of years of experience on the city council.
data elect;
input Candidate $ 1-11 District 13 Vote 15-18 Years 20;
datalines;
Cardella    1 1689 8
Latham      1 1005 2
Smith       1 1406 0
Walker      1  846 0
Hinkley     2  912 0
Kreitemeyer 2 1198 0
Lundell     2 2447 6
Thrash      2  912 2
;
Generate the ranks for the numeric variables in descending order and create the
output data set RESULTS. DESCENDING reverses the order of the ranks so that the highest
vote total receives the rank of 1. TIES=LOW gives tied values the best possible rank. OUT=
creates the output data set RESULTS.
proc rank data=elect out=results ties=low descending;
Create a separate set of ranks for each BY group. The BY statement separates the
rankings by values of District.
by district;
Create two new variables that contain ranks. The VAR statement specifies the variables to
rank. The RANKS statement creates the new variables, VoteRank and YearsRank, that contain
the ranks for the variables Vote and Years, respectively.
var vote years;
ranks VoteRank YearsRank;
run;
Print the data set. PROC PRINT prints the RESULTS data set. The N option prints the
number of observations in each BY group. The TITLE statement specifies a title.
proc print data=results n;
by district;
title 'Results of City Council Election';
run;
Output
In the second district, Hinkley and Thrash tied with 912 votes. They both receive a rank of 3
because TIES=LOW.
District=1
Obs    Candidate      Vote    Years    VoteRank    YearsRank
 1     Cardella       1689      8         1            1
 2     Latham         1005      2         3            2
 3     Smith          1406      0         2            3
 4     Walker          846      0         4            3
N = 4

District=2
Obs    Candidate      Vote    Years    VoteRank    YearsRank
 5     Hinkley         912      0         3            3
 6     Kreitemeyer    1198      0         2            3
 7     Lundell        2447      6         1            1
 8     Thrash          912      2         3            2
N = 4
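The BY-group ranking in this example can be reproduced with a short Python sketch. It is an illustration of the logic, not PROC RANK; the function names are hypothetical, and the rows repeat the candidate data shown in the output above. Unlike a BY statement, the sketch groups rows by key and does not require them to be sorted or contiguous.

```python
from collections import defaultdict

# (Candidate, District, Vote, Years), as listed in the output above
rows = [
    ("Cardella", 1, 1689, 8), ("Latham", 1, 1005, 2),
    ("Smith", 1, 1406, 0), ("Walker", 1, 846, 0),
    ("Hinkley", 2, 912, 0), ("Kreitemeyer", 2, 1198, 0),
    ("Lundell", 2, 2447, 6), ("Thrash", 2, 912, 2),
]


def rank_low_descending(values):
    """Largest value gets rank 1; tied values share the smallest rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def rank_by_group(rows, key_index, value_index):
    """Rank one column separately within each value of the key column."""
    groups = defaultdict(list)
    for position, row in enumerate(rows):
        groups[row[key_index]].append(position)
    ranks = [None] * len(rows)
    for positions in groups.values():
        group_ranks = rank_low_descending([rows[p][value_index] for p in positions])
        for p, r in zip(positions, group_ranks):
            ranks[p] = r
    return ranks


print(rank_by_group(rows, 1, 2))  # VoteRank:  [1, 3, 2, 4, 3, 2, 1, 3]
print(rank_by_group(rows, 1, 3))  # YearsRank: [1, 2, 3, 3, 3, 3, 1, 2]
```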
PRINT procedure
SORT procedure
This example
- partitions observations into groups on the basis of values of two input variables
- groups observations separately within BY groups
- replaces the original variable values with the group values.
Program
Set the SAS system options. The NODATE option specifies to omit the date and time when
the SAS job began. The PAGENO= option specifies the page number for the next page of output
that SAS produces. The LINESIZE= option specifies the line size. The PAGESIZE= option
specifies the number of lines for a page of SAS output.
options nodate pageno=1 linesize=80 pagesize=60;
Create the SWIM data set. This data set contains swimmers' first names and their times, in
seconds, for the backstroke and the freestyle. This example groups the swimmers into pairs,
within male and female classes, based on times for both strokes so that every swimmer is paired
with someone who has a similar time for each stroke.
data swim;
input Name $ 1-7 Gender $ 9 Back 11-14 Free 16-19;
datalines;
Andrea  F 28.6 30.3
Carole  F 32.9 24.0
Clayton M 27.0 21.9
Curtis  M 29.0 22.6
Doug    M 27.3 22.4
Ellen   F 27.8 27.0
Jan     F 31.3 31.2
Jimmy   M 26.3 22.5
Karin   F 34.6 26.2
Mick    M 29.0 25.4
Richard M 29.7 30.2
Sam     M 27.2 24.1
Susan   F 35.1 36.1
;
Sort the SWIM data set and create the output data set PAIRS. PROC SORT sorts the
data set by Gender. This is required to obtain a separate set of ranks for each group. OUT=
creates the output data set PAIRS.
proc sort data=swim out=pairs;
by gender;
run;
Generate the ranks that are partitioned into three groups and create an output data
set. GROUPS=3 assigns one of three possible group values (0,1,2) to each swimmer for each
stroke. OUT= creates the output data set RANKPAIR.
proc rank data=pairs out=rankpair groups=3;
Create a separate set of ranks for each BY group. The BY statement separates the
rankings by Gender.
by gender;
Replace the original values of the variables with the rank values. The VAR statement
specifies that Back and Free are the variables to rank. With no RANKS statement, PROC
RANK replaces the original variable values with the group values in the output data set.
var back free;
run;
Print the data set. PROC PRINT prints the RANKPAIR data set. The N option prints the
number of observations in each BY group. The TITLE statement specifies a title.
proc print data=rankpair n;
by gender;
title 'Pairings of Swimmers for Backstroke and Freestyle';
run;
Output
The group values pair up swimmers with similar times to work on each stroke. For example,
Andrea and Ellen work together on the backstroke because they have the fastest times in the
female class. The groups of male swimmers are unbalanced because there are seven male
swimmers; for each stroke, one group has three swimmers.
Gender=F
Name       Back    Free
Andrea       0       1
Carole       1       0
Ellen        0       1
Jan          1       2
Karin        2       0
Susan        2       2
N = 6

Gender=M
Name       Back    Free
Clayton      0       0
Curtis       2       1
Doug         1       0
Jimmy        0       1
Mick         2       2
Richard      2       2
Sam          1       1
N = 7
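The group values in this output can be checked by hand. The Python sketch below assumes that GROUPS=k assigns floor(rank * k / (n + 1)) to each observation, where rank uses the default TIES=MEAN and n is the number of nonmissing values in the BY group; that formula is not restated in this section, but it reproduces the values shown above. The function names are hypothetical.

```python
import math


def mean_ranks(values):
    """Ascending ranks; tied values receive the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1
        last = first + ordered.count(v) - 1
        ranks.append((first + last) / 2)
    return ranks


def group_values(values, k):
    """Assumed GROUPS=k rule: floor(rank * k / (n + 1))."""
    n = len(values)
    return [math.floor(r * k / (n + 1)) for r in mean_ranks(values)]


# Backstroke times for the male swimmers, in data set order:
# Clayton, Curtis, Doug, Jimmy, Mick, Richard, Sam
male_back = [27.0, 29.0, 27.3, 26.3, 29.0, 29.7, 27.2]
print(group_values(male_back, 3))  # [0, 2, 1, 0, 2, 2, 1]
```

Curtis and Mick tie at 29.0 and share the mean rank 5.5, which places both in group 2 alongside Richard; that is why one male group has three swimmers.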
References
Blom, G. (1958), Statistical Estimates and Transformed Beta Variables, New York:
John Wiley & Sons, Inc.
Conover, W.J. (1998), Practical Nonparametric Statistics, Third Edition, New York:
John Wiley & Sons, Inc.
Conover, W.J. and Iman, R.L. (1976), "On Some Alternative Procedures Using Ranks
for the Analysis of Experimental Designs," Communications in Statistics, A5, 14,
1348-1368.
Conover, W.J. and Iman, R.L. (1981), "Rank Transformations as a Bridge between
Parametric and Nonparametric Statistics," The American Statistician, 35, 124-129.
Iman, R.L. and Conover, W.J. (1979), "The Use of the Rank Transform in Regression,"
Technometrics, 21, 499-509.
Lehman, E.L. (1998), Nonparametrics: Statistical Methods Based on Ranks, New
Jersey: Prentice Hall.
Quade, D. (1966), "On Analysis of Variance for the k-Sample Problem," Annals of
Mathematical Statistics, 37, 1747-1758.
Tukey, John W. (1962), "The Future of Data Analysis," Annals of Mathematical
Statistics, 33, 22.
CHAPTER 41
The REGISTRY Procedure
Overview: REGISTRY Procedure 831
Syntax: REGISTRY Procedure 831
PROC REGISTRY Statement 832
Creating Registry Files with the REGISTRY Procedure 836
Structure of a Registry File 836
Specifying Key Names 836
Specifying Values for Keys 836
Sample Registry Entries 837
Examples: REGISTRY Procedure 839
Example 1: Importing a File to the Registry 839
Example 2: Listing and Exporting the Registry 840
Example 3: Comparing the Registry to an External File 841
Example 4: Comparing Registry Files 842
To do this
CLEARSASUSER
COMPAREREG1 and
COMPAREREG2
COMPARETO
DEBUGON
DEBUGOFF
EXPORT=
FULLSTATUS
IMPORT=
Write the contents of the registry to the SAS log. Used with
the STARTAT= option to list specific keys.
LIST
LISTHELP
LISTREG
LISTUSER
STARTAT=
Delete from the specified registry all the keys and values
that are in the specified file
UNINSTALL
UPCASE
USESASHELP
Options
CLEARSASUSER
COMPAREREG1=libname.registry-name-1
specifies one of two registries to compare. The results appear in the SAS log.
libname
is the name of the library in which the registry file resides.
registry-name-1
is the name of the first registry.
Requirement:
Interaction: To specify a single key and all of its subkeys, specify the STARTAT=
option.
Featured in:
COMPAREREG2=libname.registry-name-2
specifies the second of two registries to compare. The results appear in the SAS log.
libname
is the name of the library in which the registry file resides.
registry-name-2
is the name of the second registry.
Requirement:
Featured in:
COMPARETO=file-specification
- keys that are defined in the external file but not in the registry
- value names for a given key that are in the external file but not in the registry
- differences in the content of like-named values in like-named keys.
COMPARETO= does not report as differences any keys and values that are in the
registry but not in the file because the registry could easily be composed of pieces
from many different files.
file-specification is one of the following:
external-file
is the path and name of an external file that contains the registry information.
fileref
is a fileref that has been assigned to an external file.
Requirement: You must have previously associated the fileref with an external
See also: For information about how to structure a file that contains registry
information, see Creating Registry Files with the REGISTRY Procedure on page
836.
DEBUGON
EXPORT=file-specification
the specified file. To write the SASHELP portion of the registry, specify the
USESASHELP option. You must have write permission to the SASHELP library
to use USESASHELP.
Interaction: To export a single key and all of its subkeys, specify the STARTAT=
option.
Featured in:
FULLSTATUS
lists the keys, subkeys, and values that were added or deleted as a result of running
the IMPORT= and the UNINSTALL options.
IMPORT=file-specification
specifies the file to import into the SAS registry. PROC REGISTRY does not
overwrite the existing registry. Instead, it updates the existing registry with the
contents of the specified file.
Note: The .sasxreg file extension is not required.
file-specification is one of the following:
external-file
is the path and name of an external file that contains the registry information.
fileref
is a fileref that has been assigned to an external file.
Requirement: You must have previously associated the fileref with an external
SAS registry. To import the file to the SASHELP portion of the registry, specify
the USESASHELP option. You must have write permission to SASHELP to use
USESASHELP.
Interaction: To obtain additional information in the SAS log as you import a file,
use FULLSTATUS.
Featured in:
See also: For information about how to structure a file that contains registry
information, see Creating Registry Files with the REGISTRY Procedure on page
836.
LIST
writes the contents of the entire SAS registry to the SAS log.
Interaction: To write a single key and all of its subkeys, use the STARTAT= option.
LISTHELP
writes the contents of the SASHELP portion of the registry to the SAS log.
Interaction: To write a single key and all of its subkeys, use the STARTAT= option.
LISTREG=libname.registry-name
Interaction: To list a single key and all of its subkeys, use the STARTAT= option.
LISTUSER
writes the contents of the SASUSER portion of the registry to the SAS log.
Interaction: To write a single key and all of its subkeys, use the STARTAT= option.
Featured in:
STARTAT=key-name
exports or writes the contents of a single key and all of its subkeys.
Interaction: Use STARTAT= with the EXPORT=, LIST, LISTHELP, LISTUSER,
UNINSTALL=file-specification
deletes from the specified registry all the keys and values that are in the specified file.
file-specification is one of the following:
external-file
is the name of an external file that contains the keys and values to delete.
fileref
is a fileref that has been assigned to an external file. To assign a fileref you can
SASUSER portion of the SAS registry. To delete the keys and values from the
SASHELP portion of the registry, specify the USESASHELP option. You must
have write permission to SASHELP to use this option.
Interaction: Use FULLSTATUS to obtain additional information in the SAS log as
information, see Creating Registry Files with the REGISTRY Procedure on page
836.
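As an illustration of how the IMPORT=, UNINSTALL=, and FULLSTATUS options described above might be combined, the following sketch imports a registry file into SASUSER and later removes the same keys and values. The fileref REGFILE and the file name my-keys.sasxreg are hypothetical; substitute a file that is structured as a valid registry file.

```sas
/* Hypothetical file name; substitute a real registry file. */
filename regfile 'my-keys.sasxreg';

/* Add the keys and values in the file to the SASUSER portion of the */
/* registry. FULLSTATUS lists what was added in the SAS log.         */
proc registry import=regfile fullstatus;
run;

/* Later, delete from SASUSER the keys and values that are in the file. */
proc registry uninstall=regfile fullstatus;
run;
```

Because both steps read the same file, the second step is intended to undo the first without touching other keys in the registry.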
UPCASE
USESASHELP
performs the specified operation on the SASHELP portion of the SAS registry.
Interaction: Use USESASHELP with the IMPORT=, EXPORT=, COMPARETO, or
A string="my data"
Binary data=hex: 01,00,76,63,62,6B
Dword=dword:00010203
Signed integer value=int:-123
Unsigned integer value (decimal)=dword:0001E240
Display 41.2
To see what the actual registry text file looks like, you can use PROC REGISTRY to
write the contents of the registry key to the SAS log, using the LISTUSER and
STARTAT= options:
Example Code 41.1
proc registry
   listuser
   startat='sasuser-registry-key-name';
run;
proc registry
   listuser
   startat='HKEY_SYSTEM_ROOT\CORE\PRINTING\PRINTERS\PostScript\DEFAULT SETTINGS';
run;
Output 41.1
Example 1: Importing a File to the Registry
Procedure features: IMPORT=
This example imports a file into the SASUSER portion of the SAS registry.
Source File
The following file contains examples of valid key name sequences in a registry file:
[HKEY_USER_ROOT\AllGoodPeopleComeToTheAidOfTheirCountry]
@="This is a string value"
"Value2"=""
"Value3"="C:\\This\\Is\\Another\\String\\Value"
Program
Assign a fileref to a file that contains valid text for the registry. The FILENAME
statement assigns the fileref SOURCE to the external file that contains the text to read into the
registry.
filename source 'external-file';
Invoke PROC REGISTRY to import the file that contains input for the registry. PROC
REGISTRY reads the input file that is identified by the fileref SOURCE. IMPORT= writes to
the SASUSER portion of the SAS registry by default.
proc registry
   import=source;
run;
SAS Log
1    filename source 'external-file';
2    proc registry
3       import=source;
4    run;
Parsing REG file and loading the registry please wait....
Registry IMPORT is now complete.
Example 2: Listing and Exporting the Registry
Procedure features: EXPORT=, LISTUSER
This example lists the SASUSER portion of the SAS registry and exports it to an
external file.
Note: This is usually a very large file. To export a portion of the registry, use the
STARTAT= option.
Program
Write the contents of the SASUSER portion of the registry to the SAS log. The
LISTUSER option causes PROC REGISTRY to write the entire SASUSER portion of the
registry to the log.
proc registry
   listuser
Export the registry to the specified file. The EXPORT= option writes a copy of the
SASUSER portion of the SAS registry to the external file.
   export='external-file';
run;
SAS Log
1    proc registry listuser export='external-file';
2    run;
Starting to write out the registry file, please wait...
The export to file external-file is now complete.
Contents of SASUSER REGISTRY.
[ HKEY_USER_ROOT]
[   CORE]
[     EXPLORER]
[       CONFIGURATION]
          Initialized= "True"
[       FOLDERS]
[         UNXHOST1]
            Closed= "658"
            Icon= "658"
            Name= "Home Directory"
            Open= "658"
            Path= "~"
Example 3: Comparing the Registry to an External File
This example compares the SASUSER portion of the SAS registry to an external file.
Comparisons such as this are useful if you want to know the difference between a
backup file that was saved with a .txt file extension and the current registry file.
Note: To compare the SASHELP portion of the registry with an external file, specify
the USESASHELP option.
Program
Assign a fileref to the external file that contains the text to compare to the registry.
The FILENAME statement assigns the fileref TESTREG to the external file.
filename testreg 'external-file';
Compare the specified file to the SASUSER portion of the SAS registry. The
COMPARETO option compares the contents of a file to a registry. It returns information about
keys and values that it finds in the file that are not in the registry.
proc registry
compareto=testreg;
run;
SAS Log
This SAS log shows two differences between the SASUSER portion of the registry
and the specified external file. In the registry, the value of Initialized is True; in the
external file, it is False. In the registry, the value of Icon is 658; in the external file
it is 343.
1    filename testreg 'external-file';
2    proc registry
3       compareto=testreg;
4    run;
Parsing REG file and comparing the registry please wait....
COMPARE DIFF: Value "Initialized" in
[HKEY_USER_ROOT\CORE\EXPLORER\CONFIGURATION]: REGISTRY TYPE=STRING, CURRENT
VALUE="True"
COMPARE DIFF: Value "Initialized" in
[HKEY_USER_ROOT\CORE\EXPLORER\CONFIGURATION]: FILE TYPE=STRING, FILE
VALUE="False"
COMPARE DIFF: Value "Icon" in
[HKEY_USER_ROOT\CORE\EXPLORER\FOLDERS\UNXHOST1]: REGISTRY TYPE=STRING,
CURRENT VALUE="658"
COMPARE DIFF: Value "Icon" in
[HKEY_USER_ROOT\CORE\EXPLORER\FOLDERS\UNXHOST1]: FILE TYPE=STRING, FILE
VALUE="343"
Registry COMPARE is now complete.
COMPARE: There were differences between the registry and the file.
Example 4: Comparing Registry Files
Program
Declare the PROCLIB library. The PROCLIB library contains a registry file.
libname proclib 'SAS-data-library';
Start PROC REGISTRY and specify the first registry file to be used in the comparison.
proc registry comparereg1=sasuser.regstry
Limit the comparison to the registry keys including and following the specified
registry key. The STARTAT= option limits the scope of the comparison to the EXPLORER
subkey under the CORE key. By default the comparison includes the entire contents of both
registries.
   startat='CORE\EXPLORER'
Specify the second registry file to be used in the comparison.
   comparereg2=proclib.regstry;
run;
SAS Log
8    proc registry comparereg1=sasuser.regstry
9
10      startat='CORE\EXPLORER'
11      comparereg2=proclib.regstry;
12   run;
NOTE: Comparing registry SASUSER.REGSTRY to registry PROCLIB.REGSTRY
NOTE: Diff in Key (CORE\EXPLORER\MENUS\FILES\SAS) Item (1;&Open)
SASUSER.REGSTRY Type: String len 17 data PGM;INCLUDE %s;
PROCLIB.REGSTRY Type: String len 15 data WHOSTEDIT %s;
NOTE: Diff in Key (CORE\EXPLORER\MENUS\FILES\SAS) Item (3;&Submit)
SASUSER.REGSTRY Type: String len 23 data PGM;INCLUDE %s;SUBMIT
PROCLIB.REGSTRY Type: String len 21 data WHOSTEDIT %s;SUBMIT
NOTE: Diff in Key (CORE\EXPLORER\MENUS\FILES\SAS) Item (4;&Remote Submit)
SASUSER.REGSTRY Type: String len 35 data SIGNCHECK;PGM;INCLUDE %s;RSUBMIT;
PROCLIB.REGSTRY Type: String len 33 data SIGNCHECK;WHOSTEDIT %s;RSUBMIT;
NOTE: Diff in Key (CORE\EXPLORER\MENUS\FILES\SAS) Item (@)
SASUSER.REGSTRY Type: String len 17 data PGM;INCLUDE %s;
PROCLIB.REGSTRY Type: String len 15 data WHOSTEDIT %s;
NOTE: Item (2;Open with &Program Editor) in key
(CORE\EXPLORER\MENUS\FILES\TXT) not found in registry PROCLIB.REGSTRY
NOTE: Diff in Key (CORE\EXPLORER\MENUS\FILES\TXT) Item (4;&Submit)
SASUSER.REGSTRY Type: String len 24 data PGM;INCLUDE %s;SUBMIT;
PROCLIB.REGSTRY Type: String len 22 data WHOSTEDIT %s;SUBMIT;
NOTE: Diff in Key (CORE\EXPLORER\MENUS\FILES\TXT) Item (5;&Remote Submit)
SASUSER.REGSTRY Type: String len 35 data SIGNCHECK;PGM;INCLUDE %s;RSUBMIT;
PROCLIB.REGSTRY Type: String len 33 data SIGNCHECK;WHOSTEDIT %s;RSUBMIT;
NOTE: PROCEDURE REGISTRY used (Total process time):
      real time           0.07 seconds
      cpu time            0.02 seconds
See Also
SAS registry chapter in SAS Language Reference: Concepts
CHAPTER 42
The REPORT Procedure
Overview: REPORT Procedure 847
What Does the REPORT Procedure Do? 847
What Types of Reports Can PROC REPORT Produce? 847
What Do the Various Types of Reports Look Like? 847
Concepts: REPORT Procedure 852
Laying Out a Report 852
Planning the Layout 852
Usage of Variables in a Report 853
Display Variables 853
Order Variables 853
Across Variables 854
Group Variables 854
Analysis Variables 855
Computed Variables 855
Interactions of Position and Usage 855
Statistics That Are Available in PROC REPORT 857
Using Compute Blocks 858
What Is a Compute Block? 858
The Purpose of Compute Blocks 858
The Contents of Compute Blocks 859
Four Ways to Reference Report Items in a Compute Block 859
Compute Block Processing 860
Using Break Lines 861
What Are Break Lines? 861
Creating Break Lines 861
Order of Break Lines 861
The Automatic Variable _BREAK_ 862
Using Compound Names 862
Using Style Elements in PROC REPORT 863
Using the STYLE= Option 863
Using a Format to Assign a Style Attribute Value 866
Controlling the Spacing between Rows 866
Printing a Report 867
Printing with ODS 867
Printing from the REPORT Window 867
Printing with a Form 867
Printing from the Output Window 867
Printing from Noninteractive or Batch Mode 867
Printing from Interactive Line Mode 868
Using PROC PRINTTO 868
Storing and Reusing a Report Definition 868
Although the WHERE and FORMAT statements are not essential, here they limit the
amount of output and make the values easier to understand.
libname proclib 'SAS-data-library';
Figure 42.1

Sector      Manager   Department     Sales
Southeast   Smith     Paper         $50.00
Southeast   Smith     Meat/Dairy   $100.00
Southeast   Smith     Canned       $120.00
Southeast   Smith     Produce       $80.00
Southeast   Jones     Paper         $40.00
Southeast   Jones     Meat/Dairy   $300.00
Southeast   Jones     Canned       $220.00
Southeast   Jones     Produce       $70.00
The report in Figure 42.2 on page 849 uses the same observations as those in Figure
42.1 on page 848. However, the statements that produce this report
Figure 42.2

Manager   Department     Sales
-----------------------------------
Jones     Paper         $40.00
          Canned       $220.00
          Meat/Dairy   $300.00
          Produce       $70.00
-------                -------
Jones                  $630.00
Smith     Paper         $50.00
          Canned       $120.00
          Meat/Dairy   $100.00
          Produce       $80.00
-------                -------
Smith                  $350.00

(Callouts in the figure: Customized summary line for the whole report; Default summary
line for Manager)
The summary report in Figure 42.3 on page 849 contains one row for each store in
the northern sector. Each detail row represents four observations in the input data set,
one observation for each department. Information about individual departments does
not appear in this report. Instead, the value of Sales in each detail row is the sum of
the values of Sales in all four departments. In addition to consolidating multiple
observations into one row of the report, the statements that create this report
Figure 42.3

            Manager        Sales
            -------   ----------
Northeast   Alomar        786.00
            Andrews     1,045.00
                      ----------
                       $1,831.00
Northwest   Brown         598.00
            Pelfrey       746.00
            Reveiz      1,110.00
                      ----------
                       $2,454.00

(Callouts in the figure: Detail row; Customized summary line for the whole report)
The summary report in Figure 42.4 on page 850 is similar to Figure 42.3 on page
849. The major difference is that it also includes information for individual
departments. Each selected value of Department forms a column in the report. In
addition, the statements that create this report
- compute and display a variable that is not in the input data set
Figure 42.4

                      ______Department_______   Perishable
Sector     Manager    Meat/Dairy      Produce        Total
--------------------------------------------------------
Northeast  Alomar        $190.00       $86.00      $276.00
           Andrews       $300.00      $125.00      $425.00
Northwest  Brown         $250.00       $73.00      $323.00
           Pelfrey       $205.00       $76.00      $281.00
           Reveiz        $600.00       $30.00      $630.00
The customized report in Figure 42.5 on page 851 shows each manager's store on a
separate page. Only the first two pages appear here. The statements that create this
report create
Figure 42.5

Sales for Individual Stores
Northeast Sector
Store managed by Alomar

Department        Sales       Profit
-----------------------------------
Canned          $420.00      $168.00
Meat/Dairy      $190.00       $47.50
Paper            $90.00       $36.00
Produce          $86.00       $21.50
              ---------    ---------
                $786.00      $196.50

Sales for Individual Stores
Northeast Sector
Store managed by Andrews

Department        Sales       Profit
-----------------------------------
Canned          $420.00      $168.00
Meat/Dairy      $300.00       $75.00
Paper           $200.00       $80.00
Produce         $125.00       $31.25
              ---------    ---------
              $1,045.00      $261.25

(Callouts on each page of the figure: Detail row; Computed variable)
The report in Figure 42.6 on page 852 uses customized style elements to control
things like font faces, font sizes, and justification, as well as the width of the border of
the table and the width of the spacing between cells. This report was created by using
the HTML destination of the Output Delivery System (ODS) and the STYLE= option in
several statements in the procedure.
For an explanation of the program that produces this report, see Example 16 on page
994. For information on ODS, see Output Delivery System on page 32.
Figure 42.6 HTML Output
When you understand the layout of the report, use the COLUMN and DEFINE
statements in PROC REPORT to construct the layout.
The COLUMN statement lists the items that appear in the columns of the report,
describes the arrangement of the columns, and defines headers that span multiple
columns. A report item can be
Display Variables
A report that contains one or more display variables has a row for every observation
in the input data set. Display variables do not affect the order of the rows in the report.
If no order variables appear to the left of a display variable, then the order of the rows
in the report reflects the order of the observations in the data set. By default, PROC
REPORT treats all character variables as display variables.
Featured in: Example 1 on page 948
Order Variables
A report that contains one or more order variables has a row for every observation in
the input data set. If no display variable appears to the left of an order variable, then
PROC REPORT orders the detail rows according to the ascending, formatted values of
the order variable. You can change the default order with ORDER= and DESCENDING
in the DEFINE statement or with the DEFINITION window.
If the report contains multiple order variables, then PROC REPORT establishes the
order of the detail rows by sorting these variables from left to right in the report. PROC
REPORT does not repeat the value of an order variable from one row to the next if the
value does not change, unless an order variable to its left changes values.
Featured in:
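The following sketch illustrates order and display variables in a detail report. The data set WORK.DEMO and its values are hypothetical and exist only to make the step self-contained; the variable names echo the ones used in this chapter.

```sas
/* Hypothetical demo data; values are illustrative only. */
data work.demo;
   input Sector $ Manager $ Department $ Sales;
   datalines;
NE Alomar Canned 420
NE Alomar Produce 86
NE Andrews Canned 420
NE Andrews Produce 125
NW Brown Canned 230
NW Brown Produce 73
;
run;

proc report data=work.demo nowd;
   column Manager Department Sales;
   /* Rows are sorted by Manager, then by Department within Manager. */
   define Manager    / order;
   /* DESCENDING reverses the default ascending order. */
   define Department / order order=formatted descending;
   /* A display variable: one row per observation, no sorting effect. */
   define Sales      / display format=dollar10.2;
run;
```

The report should contain one row per observation, with each manager's name printed only on the first row of that manager's block.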
Across Variables
PROC REPORT creates a column for each value of an across variable. PROC
REPORT orders the columns by the ascending, formatted values of the across variable.
You can change the default order with ORDER= and DESCENDING in the DEFINE
statement or with the DEFINITION window. If no other variable helps define the
column (see COLUMN Statement on page 893), then PROC REPORT displays the N
statistic (the number of observations in the input data set that belong to that cell of the
report).
If you are familiar with procedures that use class variables, then you will see that
across variables are class variables that are used in the column dimension.
Featured in:
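As a minimal sketch of an across variable that stands alone in its column, the following step uses a hypothetical WORK.DEMO data set. Because no other item helps define the Department columns, each cell is expected to show the N statistic.

```sas
/* Hypothetical demo data; values are illustrative only. */
data work.demo;
   input Sector $ Manager $ Department $ Sales;
   datalines;
NE Alomar Canned 420
NE Alomar Produce 86
NE Andrews Canned 420
NE Andrews Produce 125
NW Brown Canned 230
NW Brown Produce 73
;
run;

proc report data=work.demo nowd;
   column Manager Department;
   define Manager    / group;
   /* One column per formatted value of Department; cells hold counts. */
   define Department / across;
run;
```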
Group Variables
If a report contains one or more group variables, then PROC REPORT tries to
consolidate into one row all observations from the data set that have a unique
combination of formatted values for all group variables.
When PROC REPORT creates groups, it orders the detail rows by the ascending,
formatted values of the group variable. You can change the default order with ORDER=
and DESCENDING in the DEFINE statement or with the DEFINITION window.
If the report contains multiple group variables, then the REPORT procedure
establishes the order of the detail rows by sorting these variables from left to right in the
report. PROC REPORT does not repeat the values of a group variable from one row to
the next if the value does not change, unless a group variable to its left changes values.
If you are familiar with procedures that use class variables, then you will see that
group variables are class variables that are used in the row dimension.
Note: You cannot always create groups. PROC REPORT cannot consolidate
observations into groups if the report contains any order variables or any display
variables that do not have one or more statistics associated with them (see COLUMN
Statement on page 893). In the windowing environment, if PROC REPORT cannot
immediately create groups, then the procedure changes all display and order variables
to group variables so that it can create the group variable that you requested. In the
nonwindowing environment, it returns to the SAS log a message that explains why it
could not create groups. Instead, it creates a detail report that displays group variables
the same way as it displays order variables. Even when PROC REPORT creates a
detail report, the variables that you define as group variables retain that usage in their
definitions.
Featured in:
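A sketch of group variables, assuming a hypothetical WORK.DEMO data set: every remaining item in the report is an analysis variable with a statistic, so PROC REPORT can consolidate the observations for each Sector and Manager combination into one row.

```sas
/* Hypothetical demo data; values are illustrative only. */
data work.demo;
   input Sector $ Manager $ Department $ Sales;
   datalines;
NE Alomar Canned 420
NE Alomar Produce 86
NE Andrews Canned 420
NE Andrews Produce 125
NW Brown Canned 230
NW Brown Produce 73
;
run;

proc report data=work.demo nowd;
   column Sector Manager Sales;
   define Sector  / group;
   define Manager / group;
   /* The statistic lets PROC REPORT summarize Sales within each group. */
   define Sales   / analysis sum format=dollar10.2;
run;
```

If Sales were defined as DISPLAY instead, the note above suggests that the procedure would fall back to a detail report and write an explanatory message to the SAS log.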
Analysis Variables
An analysis variable is a numeric variable that is used to calculate a statistic for all
the observations represented by a cell of the report. (Across variables, in combination
with group variables or order variables, determine which observations a cell
represents.) You associate a statistic with an analysis variable in the variable's
definition or in the COLUMN statement. By default, PROC REPORT uses numeric
variables as analysis variables that are used to calculate the Sum statistic.
The value of an analysis variable depends on where it appears in the report:
- In a detail report, the value of an analysis variable in a detail row is the value of
the statistic associated with that variable calculated for a single observation.
Calculating a statistic for a single observation is not practical; however, using the
variable as an analysis variable enables you to create summary lines for sets of
observations or for all observations.
- In a summary report, the value displayed for an analysis variable is the value of
the statistic that you specify calculated for the set of observations represented by
that cell of the report.
- In a summary line for any report, the value of an analysis variable is the value of
the statistic that you specify calculated for all observations represented by that
cell of the summary line.
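The first and third cases above can be sketched together: a detail report in which Sales is an analysis variable so that summary lines can be requested. The WORK.DEMO data set is hypothetical.

```sas
/* Hypothetical demo data; values are illustrative only. */
data work.demo;
   input Sector $ Manager $ Department $ Sales;
   datalines;
NE Alomar Canned 420
NE Alomar Produce 86
NE Andrews Canned 420
NE Andrews Produce 125
NW Brown Canned 230
NW Brown Produce 73
;
run;

proc report data=work.demo nowd;
   column Manager Department Sales;
   define Manager    / order;
   define Department / display;
   /* In detail rows, the sum is the value of the single observation. */
   define Sales      / analysis sum format=dollar10.2;
   /* Summary line after each manager: sum over that manager's rows. */
   break after Manager / summarize;
   /* Summary line at the end of the report: sum over all rows. */
   rbreak after / summarize;
run;
```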
See also:
908
Featured in: Example 2 on page 951, Example 3 on page 954, Example 4 on
Computed Variables
Computed variables are variables that you define for the report. They are not in the
input data set, and PROC REPORT does not add them to the input data set. However,
computed variables are included in an output data set if you create one.
In the windowing environment, you add a computed variable to a report from the
COMPUTED VAR window.
In the nonwindowing environment, you add a computed variable by
- including the computed variable in the COLUMN statement
- defining the variable's usage as COMPUTED in the DEFINE statement
- computing the value of the variable in a compute block associated with the
variable.
Featured in: Example 5 on page 960, Example 10 on page 975, and Example 13
on page 983
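The three nonwindowing steps listed above can be sketched as follows. The WORK.DEMO data set, the variable name Profit, and the 25% margin are all assumptions made for illustration.

```sas
/* Hypothetical demo data; values are illustrative only. */
data work.demo;
   input Sector $ Manager $ Department $ Sales;
   datalines;
NE Alomar Canned 420
NE Alomar Produce 86
NE Andrews Canned 420
NE Andrews Produce 125
NW Brown Canned 230
NW Brown Produce 73
;
run;

proc report data=work.demo nowd;
   /* Step 1: include the computed variable in the COLUMN statement. */
   column Manager Sales Profit;
   define Manager / group;
   define Sales   / analysis sum format=dollar10.2;
   /* Step 2: define its usage as COMPUTED. */
   define Profit  / computed format=dollar10.2;
   /* Step 3: compute its value in a compute block for the variable. */
   compute Profit;
      Profit = Sales.sum * 0.25;   /* assumed 25% margin */
   endcomp;
run;
```

Profit is placed to the right of Sales because a compute block can use only items that appear to its left in the report.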
values of order and group variables, considered from left to right in the report.
Similarly, PROC REPORT orders columns for an across variable from left to right,
according to the values of the variable.
Several items can collectively define the contents of a column in a report. For
instance, in Figure 42.7 on page 856, the values that appear in the third and fourth
columns are collectively determined by Sales, an analysis variable, and by Department,
an across variable. You create this kind of report with the COLUMN statement or, in
the windowing environment, by placing report items above or below each other. This is
called stacking items in the report because each item generates a header, and the
headers are stacked one above the other.
Figure 42.7

                      ______Department_______   Perishable
Sector     Manager    Meat/Dairy      Produce        Total
--------------------------------------------------------
Northeast  Alomar        $190.00       $86.00      $276.00
           Andrews       $300.00      $125.00      $425.00
Northwest  Brown         $250.00       $73.00      $323.00
           Pelfrey       $205.00       $76.00      $281.00
           Reveiz        $600.00       $30.00      $630.00
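Stacking in the COLUMN statement might be written as in the following sketch, in which the comma operator places Sales below Department. The WORK.DEMO data set is hypothetical, and this step is not the program that produced the figure.

```sas
/* Hypothetical demo data; values are illustrative only. */
data work.demo;
   input Sector $ Manager $ Department $ Sales;
   datalines;
NE Alomar Canned 420
NE Alomar Produce 86
NE Andrews Canned 420
NE Andrews Produce 125
NW Brown Canned 230
NW Brown Produce 73
;
run;

proc report data=work.demo nowd;
   /* Department,Sales stacks the analysis variable under each */
   /* value of the across variable.                            */
   column Sector Manager Department,Sales;
   define Sector     / group;
   define Manager    / group;
   define Department / across 'Department';
   /* A blank header suppresses the repeated Sales header. */
   define Sales      / analysis sum format=dollar10.2 ' ';
run;
```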
When you use multiple items to define the contents of a column, at most one of the
following can be in a column:
3
3
3
3
3
More than one of these items in a column creates a conflict for PROC REPORT about
which values to display.
Table 42.1 on page 857 shows which report items can share a column.
Note:
Table 42.1
Analysis
Order
Group
Computed
Across
Statistic
Display
Analysis
Order
Group
Computed
variable
Across
X*
Statistic
When a display variable and an across variable share a column, the report must also contain another variable that is
not in the same column.
When a column is defined by stacked report items, PROC REPORT formats the
values in the column by using the format that is specified for the lowest report item in
the stack that does not have an ACROSS usage.
The following items can stand alone in a column:
- display variable
- analysis variable
- order variable
- group variable
- computed variable
- across variable
- N statistic.
Note: The values in a column that is occupied only by an across variable are
frequency counts.
PCTSUM
CV
RANGE
MAX
STDDEV|STD
MEAN
STDERR
MIN
SUM
SUMWGT
NMISS
USS
PCTN
VAR
Q3|P75
P1
P90
P5
P95
P10
P99
Q1|P25
QRANGE
These statistics, the formulas that are used to calculate them, and their data
requirements are discussed in Keywords and Formulas on page 1340.
To compute standard error and the Student's t-test you must use the default value of
VARDEF=, which is DF.
Every statistic except N must be associated with a variable. You associate a statistic
with a variable either by placing the statistic above or below a numeric display variable
or by specifying the statistic as a usage option in the DEFINE statement or in the
DEFINITION window for an analysis variable.
You can place N anywhere because it is the number of observations in the input data
set that contribute to the value in a cell of the report. The value of N does not depend
on a particular variable.
Note: If you use the MISSING option in the PROC REPORT statement, then N
includes observations with missing group, order, or across variables.
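The following sketch shows one way to associate statistics with a variable in the DEFINE statement and to place N in its own column. The WORK.DEMO data set and the alias AvgSale are hypothetical.

```sas
/* Hypothetical demo data; values are illustrative only. */
data work.demo;
   input Sector $ Manager $ Department $ Sales;
   datalines;
NE Alomar Canned 420
NE Alomar Produce 86
NE Andrews Canned 420
NE Andrews Produce 125
NW Brown Canned 230
NW Brown Produce 73
;
run;

proc report data=work.demo nowd;
   /* Sales appears twice; the second use gets the alias AvgSale. */
   column Manager Sales Sales=AvgSale N;
   define Manager / group;
   define Sales   / analysis sum  format=dollar10.2 'Total';
   define AvgSale / analysis mean format=dollar10.2 'Mean';
   /* N needs no variable: it counts the observations in each row. */
   define N       / 'Obs';
run;
```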
In addition, all compute blocks can use most SAS language elements to perform
calculations (see The Contents of Compute Blocks on page 859). A PROC REPORT
step can contain multiple compute blocks, but they cannot be nested.
LENGTH
CALL
RETURN
DO (all forms)
SELECT
END
IF-THEN/ELSE
sum
comments
null statements
macro variables and macro invocations
all DATA step functions.
For information about SAS language elements see the appropriate section in SAS
Language Reference: Dictionary.
Within a compute block, you can also use these PROC REPORT features:
- Compute blocks for a customized summary can contain one or more LINE
statements, which place customized text and formatted values in the summary.
(See LINE Statement on page 907.)
- Compute blocks for a report item can contain one or more CALL DEFINE
statements, which set attributes like color and format each time a value for the
item is placed in the report. (See CALL DEFINE Statement on page 890.)
- Any compute block can contain the automatic variable _BREAK_ (see The
Automatic Variable _BREAK_ on page 862).
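A sketch that combines the first two features, assuming a hypothetical WORK.DEMO data set and an arbitrary threshold of 500: a CALL DEFINE statement in a compute block for a report item, and a LINE statement in a compute block for a customized summary.

```sas
/* Hypothetical demo data; values are illustrative only. */
data work.demo;
   input Sector $ Manager $ Department $ Sales;
   datalines;
NE Alomar Canned 420
NE Alomar Produce 86
NE Andrews Canned 420
NE Andrews Produce 125
NW Brown Canned 230
NW Brown Produce 73
;
run;

proc report data=work.demo nowd;
   column Manager Sales;
   define Manager / group;
   define Sales   / analysis sum format=dollar10.2;
   /* Compute block for a report item: runs for each Sales value. */
   compute Sales;
      if Sales.sum > 500 then   /* arbitrary threshold */
         call define(_col_, 'style', 'style=[font_weight=bold]');
   endcomp;
   /* Compute block for a customized summary at the end of the report. */
   compute after;
      line 'Total sales for all stores: ' Sales.sum dollar10.2;
   endcomp;
run;
```

The style attribute is expected to have a visible effect only in ODS destinations that support it, such as HTML; the LINE text also appears in traditional monospace output.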
- by name.
- by a compound name that identifies both the variable and the name of the statistic
that you calculate with it. A compound name has this form
variable-name.statistic
Note: Referencing variables that have missing values leads to missing values. If a
compute block references a variable that has a missing value, then PROC REPORT
displays that variable as a blank (for character variables) or as a period (for numeric
variables).
The following table shows how to use each type of reference in a compute block.
If the variable that you reference         Then refer to it by    For example
is this type
group                                      name*                  Department
order                                      name                   Department
computed                                   name                   Department
display                                    name                   Department
display sharing a column with a            a compound name*       Sales.sum
statistic
analysis                                   a compound name*       Sales.mean
any type sharing a column with an          column number**        _c3_
across variable
*  If the variable has an alias, then you must reference it with the alias.
** Even if the variable has an alias, you must reference it by column number.
Featured in:
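Reference by column number might look like the following sketch. It assumes a hypothetical WORK.DEMO data set with exactly two Department values, so that the stacked Sales columns are the second and third columns of the report (_c2_ and _c3_).

```sas
/* Hypothetical demo data with two departments. */
data work.demo;
   input Sector $ Manager $ Department $ Sales;
   datalines;
NE Alomar Canned 420
NE Alomar Produce 86
NE Andrews Canned 420
NE Andrews Produce 125
NW Brown Canned 230
NW Brown Produce 73
;
run;

proc report data=work.demo nowd;
   column Manager Department,Sales Total;
   define Manager    / group;
   define Department / across;
   define Sales      / analysis sum format=dollar10.2;
   define Total      / computed format=dollar10.2;
   compute Total;
      /* Sales shares its columns with an across variable, so the */
      /* values are referenced by column number, not by name.     */
      Total = sum(_c2_, _c3_);
   endcomp;
run;
```

If the data contained a different number of Department values, the column numbers in the compute block would need to change accordingly.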
item. The value of a computed variable in any row of a report is the last value
assigned to that variable during that execution of the DATA step statements in the
compute block. PROC REPORT assigns values to the columns in a row of a report
from left to right. Consequently, you cannot base the calculation of a computed
variable on any variable that appears to its right in the report.
Note: PROC REPORT recalculates computed variables at breaks. For details on
compute block processing see How PROC REPORT Builds a Report on page 936.
- text
- values calculated for either a set of rows or for the whole report.
4 blank line
5 page break.
In traditional SAS monospace output only, if you define a customized summary for
the same location, then customized break lines appear after underlining or double
underlining.
- the value _RBREAK_ if the current line is part of a break at the beginning or end of
the report
- the value _PAGE_ if the current line is part of a break at the beginning or end of a
page.
Output 42.1

            Manager        Sales
Northeast   Alomar       $786.00
            Andrews    $1,045.00
---------              ---------
Northeast              $1,831.00

Northwest   Brown        $598.00
            Pelfrey      $746.00
            Reveiz     $1,110.00
---------              ---------
Northwest              $2,454.00

Southeast   Jones        $630.00
            Smith        $350.00
---------              ---------
Southeast                $980.00

Southwest   Adams        $695.00
            Taylor       $353.00
---------              ---------
Southwest              $1,048.00

=========              =========
Total:                 $6,313.00
=========              =========
Note: Unless you use the NOALIAS option in the PROC REPORT statement, when
you refer in a compute block to a statistic that has an alias, you do not use a compound
name. Generally, you must use the alias. However, if the statistic shares a column with
an across variable, then you must reference it by column number (see Four Ways to
Reference Report Items in a Compute Block on page 859).
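As a sketch of referencing statistics by alias, the following step uses Sales twice under the hypothetical aliases MinSale and MaxSale and refers to them by those aliases, not by compound names, in the compute block. The WORK.DEMO data set and the computed variable Spread are also hypothetical.

```sas
/* Hypothetical demo data; values are illustrative only. */
data work.demo;
   input Sector $ Manager $ Department $ Sales;
   datalines;
NE Alomar Canned 420
NE Alomar Produce 86
NE Andrews Canned 420
NE Andrews Produce 125
NW Brown Canned 230
NW Brown Produce 73
;
run;

proc report data=work.demo nowd;
   column Manager Sales=MinSale Sales=MaxSale Spread;
   define Manager / group;
   define MinSale / analysis min format=dollar10.2 'Smallest';
   define MaxSale / analysis max format=dollar10.2 'Largest';
   define Spread  / computed format=dollar10.2;
   compute Spread;
      /* Aliased statistics are referenced by alias. */
      Spread = MaxSale - MinSale;
   endcomp;
run;
```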
You can use braces ({ and }) instead of square brackets ([ and ]).
location(s)
identifies the part of the report that the STYLE= option affects. The following
table shows what parts of a report are affected by values of location.
Table 42.2 Location Values

Location Value    Part of Report Affected
CALLDEF           Cells identified by a CALL DEFINE statement
COLUMN            Column cells
HEADER|HDR        Column headers
LINES             Lines generated by LINE statements
REPORT            Report as a whole
SUMMARY           Summary lines
The valid and default values for location vary by what statement the STYLE=
option appears in. Table 42.3 on page 864 shows valid and default values for
location for each statement. To specify more than one value of location in the same
STYLE= option, separate each value with a space.
style-element-name
is the name of a style element that is part of a style definition that is registered
with the Output Delivery System. SAS provides some style definitions. Users can
create their own style definitions with the TEMPLATE procedure (see SAS Output
Delivery System: User's Guide for information about PROC TEMPLATE). The
following table shows the default style elements for each statement.
Table 42.3 Locations and Default Style Elements for Each Statement in PROC REPORT

Statement      Valid Location Values          Default Location    Default Style
                                              Value               Element
PROC REPORT    REPORT, COLUMN, HEADER|HDR,    REPORT              Table
               SUMMARY, LINES, CALLDEF
BREAK          SUMMARY, LINES                 SUMMARY             DataEmphasis
CALL DEFINE    CALLDEF                        CALLDEF             Data
COMPUTE        LINES                          LINES               NoteContent
DEFINE         COLUMN, HEADER|HDR             COLUMN and          COLUMN: Data
                                              HEADER              HEADER: Header
RBREAK         SUMMARY, LINES                 SUMMARY             DataEmphasis
style-attribute-specification(s)
describes the style attribute to change. Each style-attribute-specification has this
general form:
style-attribute-name=style-attribute-value
Valid values of style-attribute-name for the REPORT location:
BACKGROUNDIMAGE=       FOREGROUND=*
BORDERCOLOR=           FRAME=
BORDERCOLORDARK=       HTMLCLASS=
BORDERCOLORLIGHT=      JUST=
BORDERWIDTH=           OUTPUTWIDTH=
CELLPADDING=           POSTHTML=
CELLSPACING=           POSTIMAGE=
FONT=*                 POSTTEXT=
FONT_FACE=*            PREHTML=
FONT_SIZE=*            PREIMAGE=
FONT_STYLE=*           PRETEXT=
FONT_WEIGHT=*          RULES=
FONT_WIDTH=*
* When you use these attributes in this location, they affect only the text that is specified
with the PRETEXT=, POSTTEXT=, PREHTML=, and POSTHTML= attributes. To alter
the foreground color or the font for the text that appears in the table, you must set the
corresponding attribute in a location that affects the cells rather than the table.
The following table shows valid values of style-attribute-name for the CALLDEF,
COLUMN, HEADER, LINES, and SUMMARY locations. Note that not all style
attributes are valid in all destinations. See SAS Output Delivery System: User's
Guide for more information on these style attributes, their valid values, and their
applicable destinations.
ASIS=                  FONT_WIDTH=
BACKGROUND=            HREFTARGET=
BACKGROUNDIMAGE=       HTMLCLASS=
BORDERCOLOR=           JUST=
BORDERCOLORDARK=       NOBREAKSPACE=
BORDERCOLORLIGHT=      POSTHTML=
BORDERWIDTH=           POSTIMAGE=
CELLHEIGHT=            POSTTEXT=
CELLWIDTH=             PREHTML=
FLYOVER=               PREIMAGE=
FONT=                  PRETEXT=
FONT_FACE=             PROTECTSPECIALCHARS=
FONT_SIZE=             TAGATTR=
FONT_STYLE=            URL=
FONT_WEIGHT=           VJUST=
Specifications in a statement other than the PROC REPORT statement override the
same specification in the PROC REPORT statement. However, any style attributes that
you specify in the PROC REPORT statement and do not override in another statement
are inherited. For instance, if you specify a blue background and a white foreground for
all column headings in the PROC REPORT statement, and you specify a gray
background for the column headings of a variable in the DEFINE statement, then the
background for that particular column heading is gray, and the foreground is white (as
specified in the PROC REPORT statement).
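The inheritance rule can be illustrated with a minimal sketch. It assumes the SASHELP.CLASS sample data set is available, and the HTML file name is hypothetical; neither comes from this guide.

```sas
ods html body='style_demo.html';   /* hypothetical output file name */

/* Blue background and white foreground for all column headings */
proc report data=sashelp.class nowd
            style(header)=[background=blue foreground=white];
   column name age height;
   define name   / display;
   define age    / display;
   /* Only the background is overridden for this heading; the white
      foreground is inherited from the PROC REPORT statement. */
   define height / display style(header)=[background=gray];
run;

ods html close;
```

In the resulting table, the Height heading should be gray with white text, while the Name and Age headings keep the blue background.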
Printing a Report
Forms are available only when you run SAS from a windowing environment.
cases, through Print utilities in the File menu. You cannot view or print
the file until you free it.
b. Use operating environment commands to send the file to the printer.
options that you use. Refer to the SAS documentation for your operating environment
for information about how these files are named and where they are stored.
You can print the output file directly or use PROC PRINTTO to redirect the output to
another file. In either case, no form is used, but carriage control characters are written
if the destination is a print file.
Use operating environment commands to send the file to the printer.
Save Report
A report definition may differ from the SAS program that creates the report (see the
discussion of OUTREPT= on page 879).
You can use a report definition to create an identically structured report for any SAS
data set that contains variables with the same names as the ones that are used in the
report definition. Use the REPORT= option in the PROC REPORT statement to load a
report definition when you start PROC REPORT. In the windowing environment, load a
report definition from the LOAD REPORT window by selecting File, then Open Report.
Reminder: You can use the ATTRIB, FORMAT, LABEL, and WHERE statements. See
Chapter 3, Statements with the Same Function in Multiple Procedures, on page 57 for
details. You can also use any global statements. See Global Statements on page 18 for
a list.
To do this
BREAK
BY
CALL DEFINE
COLUMN
DEFINE
FREQ
LINE
RBREAK
WEIGHT
To do this
DATA=
OUT=
THREADS | NOTHREADS
WINDOWS|NOWINDOWS
NOALIAS
VARDEF=
QMARKERS=
QMETHOD=
QNTLDEF=
EXCLNPWGT
To do this
COMPLETECOLS|NOCOMPLETECOLS
COMPLETEROWS|NOCOMPLETEROWS
BOX*
CENTER|NOCENTER
COLWIDTH=*
FORMCHAR=*
LS=*
MISSING
PANELS=*
PS=
PSPACE=*
SHOWALL
SPACING=*
WRAP
HEADLINE*
HEADSKIP*
NOHEADER
NAMED
SPLIT=
STYLE=
CONTENTS=
Store and retrieve report definitions, PROC REPORT statements, and your report profile
Write to the SAS log the PROC
REPORT code that creates the current
report
LIST
NOEXEC
OUTREPT=
PROFILE=
REPORT=
COMMAND
HELP=
PROMPT
Options
BOX
CENTER|NOCENTER
specifies whether to center or left-justify the report and summary text (customized
break lines).
PROC REPORT honors the first of these centering specifications that it finds:
COLWIDTH=column-width
specifies the default number of characters for columns containing computed variables
or numeric data set variables.
Default: 9
Range: 1 to the linesize
Restriction: This option has no effect on ODS destinations other than traditional
WIDTH= in the definition for that column. If WIDTH= is not present, then PROC
REPORT uses a column width large enough to accommodate the format for the
item. (For information about formats see the discussion of FORMAT= on page 901.)
If no format is associated with the item, then the column width depends on
variable type:
If the variable is a
Featured in:
COMMAND
displays command lines rather than menu bars in all REPORT windows.
After you have started PROC REPORT in the windowing environment, you can
display the menu bars in the current window by issuing the COMMAND command.
You can display the menu bars in all PROC REPORT windows by issuing the
PMENU command. The PMENU command affects all the windows in your SAS
session. Both of these commands are toggles.
You can store a setting of COMMAND in your report profile. PROC REPORT
honors the first of these settings that it finds:
COMPLETECOLS|NOCOMPLETECOLS
creates all possible combinations for the values of the across variables even if one or
more of the combinations do not occur within the input data set. Consequently, the
column headings are the same for all logical pages of the report within a single BY
group.
Default: COMPLETECOLS
Interaction: The PRELOADFMT option in the DEFINE statement ensures that
PROC REPORT uses all user-defined format ranges for the combinations of across
variables, even when a frequency is zero.
COMPLETEROWS|NOCOMPLETEROWS
displays all possible combinations of the values of the group variables, even if one or
more of the combinations do not occur in the input data set. Consequently, the row
headings are the same for all logical pages of the report within a single BY group.
Default: NOCOMPLETEROWS
Interaction: The PRELOADFMT option in the DEFINE statement ensures that
PROC REPORT uses all user-defined format ranges for the combinations of group
variables, even when a frequency is zero.
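As a sketch of how COMPLETEROWS and PRELOADFMT might be combined, the following step uses the SASHELP.CLASS sample data set and a hypothetical format, $SEXFMT, that defines a value ('U') that does not occur in the data. Both names are assumptions made for illustration.

```sas
/* Hypothetical format with one range that never occurs in the data */
proc format;
   value $sexfmt 'F' = 'Female'
                 'M' = 'Male'
                 'U' = 'Unknown';
run;

proc report data=sashelp.class nowd completerows;
   column sex height;
   /* PRELOADFMT asks PROC REPORT to consider every range of the format */
   define sex    / group format=$sexfmt. preloadfmt;
   define height / analysis mean format=6.1;
run;
```

Under these assumptions, the report is expected to include a row for Unknown even though no observation has that value.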
CONTENTS=link-text
specifies the text for the entries in the HTML contents file or PDF table of contents
for the output that is produced by PROC REPORT. For information on HTML and
PDF output, see "Output Delivery System" on page 32.
Note: A hexadecimal value (such as 'DF'x) that is specified within link-text will
not resolve because it is specified within quotation marks. To resolve a hexadecimal
value, use the %sysfunc(byte(num)) function, where num is the hexadecimal value.
Be sure to enclose link-text in double quotation marks (" ") so that the macro function
will resolve.
Restriction: For HTML output, the CONTENTS= option has no effect on the
HTML body file. It affects only the HTML contents file.
DATA=SAS-data-set
EXCLNPWGT
excludes observations with nonpositive weight values (zero or negative) from the
analysis. By default, PROC REPORT treats observations with negative weights like
those with zero weights and counts them in the total number of observations.
Alias: EXCLNPWGTS
Requirement: You must use a WEIGHT statement.
See also: WEIGHT Statement on page 912
FORMCHAR <(position(s))>=formatting-character(s)
formatting-character(s)
lists the characters to use for the specified positions. PROC REPORT assigns
characters in formatting-character(s) to position(s), in the order that they are
listed. For instance, the following option assigns the asterisk (*) to the third
formatting character, the pound sign (#) to the seventh character, and does not
alter the remaining characters:
formchar(3,7)='*#'
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
characters. The system option defines the entire string of formatting characters.
The FORMCHAR= option in a procedure can redefine selected characters.
Tip: You can use any character in formatting-characters, including hexadecimal
characters. If you use hexadecimal characters, then you must put an x after the
closing quotation mark. For instance, the following option assigns the hexadecimal
character 2D to the third formatting character, the hexadecimal character 7C to
the seventh character, and does not alter the remaining characters:
formchar(3,7)='2D7C'x
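To see the FORMCHAR= option in a complete step, consider this sketch. It assumes the SASHELP.CLASS sample data set and changes the second formatting character, which the HEADLINE option uses for underlining. The hexadecimal form assumes an ASCII system, where 7E is the tilde.

```sas
ods listing;   /* FORMCHAR= affects only traditional monospace output */

/* Underline the column headers with tildes instead of hyphens */
proc report data=sashelp.class nowd headline
            formchar(2)='~';
   column name age height;
   define name   / display;
   define age    / display;
   define height / display;
run;

/* The same assignment written as a hexadecimal character (ASCII assumed) */
proc report data=sashelp.class nowd headline
            formchar(2)='7E'x;
   column name age height;
   define name   / display;
   define age    / display;
   define height / display;
run;
```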
Table 42.4
Position    Default    Used to draw
Figure 42.8
Sector     Manager        Sales
------------------------------
Northeast  Alomar        786.00
           Andrews     1,045.00
                     ----------
                       1,831.00
                     ----------
Northwest  Brown         598.00
           Pelfrey       746.00
           Reveiz      1,110.00
                     ----------
                       2,454.00
                     ----------
                     ==========
                       4,285.00
                     ==========
HEADLINE
underlines all column headers and the spaces between them at the top of each page
of the report.
The HEADLINE option underlines with the second formatting character. (See the
discussion of FORMCHAR= on page 874.)
Default: hyphen (-)
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
Tip: In traditional (monospace) SAS output, you can underline column headers
without underlining the spaces between them, by using two hyphens ('--') as
the last line of each column header instead of using HEADLINE.
Featured in: Example 2 on page 951 and Example 8 on page 968
HEADSKIP
writes a blank line beneath all column headers (or beneath the underlining that the
HEADLINE option writes) at the top of each page of the report.
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
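The difference between HEADLINE and the two-hyphen technique can be tried with a sketch like the following, which assumes the SASHELP.CLASS sample data set and relies on the default split character (/).

```sas
ods listing;   /* these options affect only traditional monospace output */

/* HEADLINE underlines the headers and the spaces between them;
   HEADSKIP writes a blank line beneath the underlining. */
proc report data=sashelp.class nowd headline headskip;
   column name age height;
   define name   / display;
   define age    / display;
   define height / display;
run;

/* Underline each header separately: two hyphens as the last header line */
proc report data=sashelp.class nowd headskip;
   column name age height;
   define name   / display 'Name/--';
   define age    / display 'Age/--';
   define height / display 'Height/--';
run;
```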
HELP=libref.catalog
identifies the library and catalog containing user-defined help for the report. This
help can be in CBT or HELP catalog entries. You can write a CBT or HELP entry for
each item in the report with the BUILD procedure in SAS/AF software. Store all
such entries for a report in the same catalog.
Specify the entry name for help for a particular report item in the DEFINITION
window for that report item or in a DEFINE statement.
Restriction: This option has no effect in the nonwindowing environment or on ODS
destinations other than traditional SAS monospace output.
LIST
writes to the SAS log the PROC REPORT code that creates the current report. This
listing may differ in these ways from the statements that you submit:
- It shows some defaults that you may not have specified.
- It omits some statements that are not specific to the REPORT procedure,
  whether you submit them with the PROC REPORT step or had previously
  submitted them. These statements include
     BY
     FOOTNOTE
     FREQ
     TITLE
     WEIGHT
     WHERE
windowing environment, you can write the report definition for the report that is
currently in the REPORT window to the SOURCE window by selecting Tools, then
Report Statements.
LS=line-size
- the LS= setting stored in the report definition loaded with REPORT= in the
  PROC REPORT statement
- the SAS system option LINESIZE=.
Range: 64-256 (integer)
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
Featured in: Example 6 on page 964 and Example 8 on page 968
MISSING
considers missing values as valid values for group, order, or across variables. Special
missing values used to represent numeric values (the letters A through Z and the
underscore (_) character) are each considered as a different value. A group for each
missing value appears in the report. If you omit the MISSING option, then PROC
REPORT does not include observations with a missing value for any group, order, or
across variables in the report.
See also: For information about special missing values, see the section on missing
values in SAS Language Reference: Concepts.
Featured in: Example 11 on page 977
NAMED
writes name= in front of each value in the report, where name is the column header
for the value.
Interaction: When you use the NAMED option, PROC REPORT automatically uses
the NOHEADER option.
Tip: Use NAMED in conjunction with the WRAP option to produce a report that
wraps all columns for a single row of the report onto consecutive lines rather than
placing columns of a wide report on separate pages.
Featured in: Example 7 on page 966
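A minimal sketch of NAMED used together with WRAP follows. It assumes the SASHELP.CLASS sample data set; the LS= and PS= values are arbitrary choices made only to keep the output narrow.

```sas
ods listing;   /* WRAP affects only traditional monospace output */

/* Each value is written as name=value, and all columns for one row
   wrap onto consecutive lines instead of continuing on another page. */
proc report data=sashelp.class nowd named wrap ls=64 ps=20;
   column name sex age height weight;
   define name   / display;
   define sex    / display;
   define age    / display;
   define height / display;
   define weight / display;
run;
```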
NOALIAS
lets you use a report that was created before compute blocks required aliases (before
Release 6.11). If you use NOALIAS, then you cannot use aliases in compute blocks.
NOCENTER
NOEXEC
suppresses the building of the report. Use NOEXEC with OUTREPT= to store a
report definition in a catalog entry. Use NOEXEC with LIST and REPORT= to
display a listing of the specified report definition.
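The combinations of NOEXEC with OUTREPT=, LIST, and REPORT= might look like the following sketch. The catalog entry WORK.REPORTS.CLASSRPT is a hypothetical name, and SASHELP.CLASS is assumed as the input data set.

```sas
/* Store the report definition without building the report */
proc report data=sashelp.class nowd noexec
            outrept=work.reports.classrpt;   /* hypothetical catalog entry */
   column sex height weight;
   define sex    / group;
   define height / analysis mean;
   define weight / analysis mean;
run;

/* Write the stored definition to the SAS log, again without building it */
proc report data=sashelp.class nowd noexec list
            report=work.reports.classrpt;
run;

/* Build the report from the stored definition; no COLUMN statement
   can be used together with REPORT= */
proc report data=sashelp.class nowd
            report=work.reports.classrpt;
run;
```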
NOHEADER
NOWINDOWS
Alias: NOWD
See WINDOWS|NOWINDOWS on page 884.
OUT=SAS-data-set
names the output data set. If this data set does not exist, then PROC REPORT
creates it. The data set contains one observation for each detail row of the report and
one observation for each unique summary line. If you use both customized and
default summaries at the same place in the report, then the output data set contains
only one observation because the two summaries differ only in how they present the
data. Information about customization (underlining, color, text, and so forth) is not
data and is not saved in the output data set.
The output data set contains one variable for each column of the report. PROC
REPORT tries to use the name of the report item as the name of the corresponding
variable in the output data set. However, this is not possible if a data set variable is
under or over an across variable or if a data set variable appears multiple times in
the COLUMN statement without aliases. In these cases, the name of the variable is
based on the column number (_C1_, _C2_, and so forth).
Output data set variables that are derived from input data set variables retain the
formats of their counterparts in the input data set. PROC REPORT derives labels for
these variables from the corresponding column headers in the report unless the only
item defining the column is an across variable. In that case, the variables have no
label. If multiple items are stacked in a column, then the labels of the corresponding
output data set variables come from the analysis variable in the column.
The output data set also contains a character variable named _BREAK_. If an
observation in the output data set derives from a detail row in the report, then the
value of _BREAK_ is missing. If the observation derives from a summary line, then
the value of _BREAK_ is the name of the break variable that is associated with the
summary line, or _RBREAK_. If the observation derives from a COMPUTE BEFORE
_PAGE_ or COMPUTE AFTER _PAGE_ statement, then the value of _BREAK_ is
_PAGE_. Note, however, that for COMPUTE BEFORE _PAGE_ and COMPUTE
AFTER _PAGE_, the _PAGE_ value is written to the output data set only; it is not
available as a value of the automatic variable _BREAK_ during execution of the
procedure.
Interaction: You cannot use OUT= in a PROC REPORT step that uses a BY
statement.
Featured in: Example 12 on page 980 and Example 13 on page 983
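To inspect the _BREAK_ variable described above, a sketch such as the following can be used. It assumes the SASHELP.CLASS sample data set; WORK.CLASS_OUT is a hypothetical output data set name.

```sas
proc report data=sashelp.class nowd out=work.class_out;
   column sex name height;
   define sex    / order;
   define name   / display;
   define height / analysis sum;
   break after sex / summarize;   /* one summary line per value of Sex */
   rbreak after    / summarize;   /* one summary line for the whole report */
run;

/* _BREAK_ is missing for detail rows, names the break variable for the
   BREAK summaries, and is _RBREAK_ for the report summary. */
proc print data=work.class_out;
   var sex name height _break_;
run;
```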
OUTREPT=libref.catalog.entry
stores in the specified catalog entry the REPORT definition that is defined by the
PROC REPORT step that you submit. PROC REPORT assigns the entry a type of
REPT.
The stored report definition may differ in these ways from the statements that you
submit:
- It omits some statements that are not specific to the REPORT procedure,
  whether you submit them with the PROC REPORT step or whether they are
  already in effect when you submit the step. These statements include
     BY
     FOOTNOTE
     FREQ
     TITLE
     WEIGHT
     WHERE
- It omits these PROC REPORT statement options:
     LIST
     NOALIAS
     OUT=
     OUTREPT=
     PROFILE=
     REPORT=
     WINDOWS|NOWINDOWS
PANELS=number-of-panels
specifies the number of panels on each page of the report. If the width of a report is
less than half of the line size, then you can display the data in multiple sets of
columns so that rows that would otherwise appear on multiple pages appear on the
same page. Each set of columns is a panel. A familiar example of this kind of report
is a telephone book, which contains multiple panels of names and telephone numbers
on a single page.
When PROC REPORT writes a multipanel report, it fills one panel before
beginning the next.
The number of panels that fits on a page depends on the
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output. However, the COLUMNS= option in the ODS PRINTER
or ODS PDF statement produces similar results. For details, see the chapter on
ODS statements in SAS Output Delivery System: User's Guide.
Default: 1
See also: For information about the space between panels and the line size, see the
discussions of PSPACE= on page 881 and the discussion of LS= on page 877.
PCTLDEF=
- specifies the location of menus that define alternative menu bars and pull-down
  menus for the REPORT and COMPUTE windows.
You create a profile from the PROFILE window while using PROC REPORT in a
windowing environment. To create a profile
1. Invoke PROC REPORT with the WINDOWS option.
2. Select Tools, then Report Profile.
PROMPT
opens the REPORT window and starts the PROMPT facility. This facility guides you
through creating a new report or adding more data set variables or statistics to an
existing report.
If you start PROC REPORT with prompting, then the first window gives you a
chance to limit the number of observations that are used during prompting. When
you exit the prompter, PROC REPORT removes the limit.
Restriction: When you use the PROMPT option, you open the REPORT window.
When the REPORT window is open, you cannot send procedure output to any ODS
destination.
Tip: You can store a setting of PROMPT in your report profile. PROC REPORT
honors the first of these settings that it finds:
PSPACE=space-between-panels
specifies the number of blank characters between panels. PROC REPORT separates
all panels in the report by the same number of blank characters. For each panel, the
sum of its width and the number of blank characters separating it from the panel to
its left cannot exceed the line size.
Default: 4
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
QMARKERS=number
specifies the default number of markers to use for the P² estimation method. The
number of markers controls the size of fixed memory space.
Default: The default value depends on which quantiles you request. For the median
(P50), number is 7. For the quartiles (P25 and P75), number is 25. For the
quantiles P1, P5, P10, P90, P95, or P99, number is 105. If you request several
quantiles, then PROC REPORT uses the largest default value of number.
Range: any odd integer greater than 3
Tip: Increase the number of markers above the default settings to improve the
accuracy of the estimates; you can reduce the number of markers to conserve
computing resources.
QMETHOD=OS|P2
specifies the method that PROC REPORT uses to process the input data when it
computes quantiles. If the number of observations is less than or equal to the value
of the QMARKERS= option, and the value of the QNTLDEF= option is 5, then both
methods produce the same results.
OS
uses order statistics. This is the technique that PROC UNIVARIATE uses.
Note: This technique can be very memory intensive.
P2
uses the P² method to approximate the quantile.
Default: OS
Restriction: When QMETHOD=P2, PROC REPORT does not compute weighted
quantiles.
Tip: When QMETHOD=P2, reliable estimates of some quantiles (P1, P5, P95, P99)
might not be possible for some data sets such as those with heavily tailed or
skewed distributions.
QNTLDEF=1|2|3|4|5
specifies the mathematical definition that the procedure uses to calculate quantiles
when the value of the QMETHOD= option is OS. When QMETHOD=P2, you must
use QNTLDEF=5.
Default: 5
Alias: PCTLDEF=
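The quantile options can be exercised with a short sketch that assumes the SASHELP.CLASS sample data set. The QMARKERS= value of 25 is an arbitrary odd integer greater than 3, chosen only for illustration.

```sas
/* Approximate medians with the P2 method; QNTLDEF= keeps its default
   of 5, which QMETHOD=P2 requires. */
proc report data=sashelp.class nowd qmethod=p2 qmarkers=25;
   column sex height weight;
   define sex    / group;
   define height / analysis median 'Median Height';
   define weight / analysis median 'Median Weight';
run;
```

Because this data set is small, the number of observations in each group is less than the number of markers, so the results should match those of QMETHOD=OS.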
REPORT=libref.catalog.entry
specifies the report definition to use. PROC REPORT stores all report definitions as
entries of type REPT in a SAS catalog.
Interaction: If you use REPORT=, then you cannot use the COLUMN statement.
SHOWALL
overrides options in the DEFINE statement that suppress the display of a column.
See also: NOPRINT and NOZERO in DEFINE Statement on page 897
SPACING=space-between-columns
specifies the number of blank characters between columns. For each column, the sum
of its width and the blank characters between it and the column to its left cannot
exceed the line size.
Default: 2
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
Interaction: PROC REPORT separates all columns in the report by the number of
blank characters specified by SPACING= in the PROC REPORT statement unless
you use SPACING= in the DEFINE statement to change the spacing to the left of
a specific item.
Interaction: When CENTER is in effect, PROC REPORT ignores spacing that
precedes the leftmost variable in the report.
Featured in: Example 2 on page 951
SPLIT=character
specifies the split character. PROC REPORT breaks a column header when it
reaches that character and continues the header on the next line. The split character
itself is not part of the column header although each occurrence of the split character
counts toward the 256-character maximum for a label.
Default: slash (/)
Interaction: The FLOW option in the DEFINE statement honors the split character.
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
Featured in: Example 5 on page 960
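A brief sketch of SPLIT= follows, assuming the SASHELP.CLASS sample data set. The asterisk is an arbitrary choice of split character, and the header text is invented for illustration.

```sas
ods listing;   /* traditional monospace output */

/* Each header breaks at the asterisk and continues on the next line */
proc report data=sashelp.class nowd split='*';
   column name height weight;
   define name   / display 'Student*Name';
   define height / display 'Height*(inches)';
   define weight / display 'Weight*(pounds)';
run;
```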
STYLE<(location(s))>=<style-element-name><[style-attribute-specification(s)]>
specifies the style element to use for the specified locations in the report. See "Using
Style Elements in PROC REPORT" on page 863 for details.
Restriction: This option affects only the HTML, RTF, and Printer output.
Featured in: Example 15 on page 989 and Example 16 on page 994
THREADS | NOTHREADS
enables or disables parallel processing of the input data set. This option overrides
the SAS system option THREADS | NOTHREADS. See SAS Language Reference:
Concepts for more information about parallel processing.
Default: value of SAS system option THREADS | NOTHREADS.
Interaction: PROC REPORT uses the value of the SAS system option THREADS
except when a BY statement is specied or the value of the SAS system option
CPUCOUNT is less than 2. You can use THREADS in the PROC REPORT
statement to force PROC REPORT to use parallel processing in these situations.
VARDEF=divisor
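specifies the divisor to use in the calculation of the variance and standard deviation.
Table 42.5 on page 883 shows the possible values for divisor and associated divisors.
Table 42.5
Value        Divisor                    Formula for Divisor
DF           degrees of freedom         $n - 1$
N            number of observations     $n$
WDF          sum of weights minus one   $(\sum_i w_i) - 1$
WEIGHT|WGT   sum of weights             $\sum_i w_i$
The procedure computes the variance as $CSS/divisor$, where $CSS$ is the corrected
sum of squares and equals $\sum_i (x_i - \bar{x})^2$. When you weight the analysis
variables, $CSS$ equals $\sum_i w_i (x_i - \bar{x}_w)^2$, where $\bar{x}_w$ is the
weighted mean.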
WINDOWS|NOWINDOWS
WRAP
displays one value from each column of the report, on consecutive lines if necessary,
before displaying another value from the first column. By default, PROC REPORT
displays values for only as many columns as it can fit on one page. It fills a page
with values for these columns before starting to display values for the remaining
columns on the next page.
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
Interaction: When WRAP is in effect, PROC REPORT ignores PAGE in any item
definitions.
Tip: Typically, you use WRAP in conjunction with the NAMED option in order to
avoid wrapping column headers.
BREAK Statement
Produces a default summary at a break (a change in the value of a group or order variable). The
information in a summary applies to a set of observations. The observations share a unique
combination of values for the break variable and all other group or order variables to the left of
the break variable in the report.
Featured in: Example 4 on page 957 and Example 5 on page 960.
To do this
COLOR=
DOL*
DUL*
OL*
PAGE
SKIP
STYLE=
SUMMARIZE
SUPPRESS
UL*
Required Arguments
location
break-variable
is a group or order variable. The REPORT procedure writes break lines each time
the value of this variable changes.
Options
COLOR=color
specifies the color of the break lines in the REPORT window. You can use the
following colors:
BLACK      MAGENTA
BLUE       ORANGE
BROWN      PINK
CYAN       RED
GRAY       WHITE
GREEN      YELLOW
DOL
(for double overlining) uses the thirteenth formatting character to overline each value
- that appears in the summary line
- that would appear in the summary line if you specified the SUMMARIZE option.
Default: equals sign (=)
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
DUL
(for double underlining) uses the thirteenth formatting character to underline each
value
- that appears in the summary line
- that would appear in the summary line if you specified the SUMMARIZE option.
Default: equals sign (=)
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
OL
(for overlining) uses the second formatting character to overline each value
- that appears in the summary line
- that would appear in the summary line if you specified the SUMMARIZE option.
Default: hyphen (-)
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
STYLE=<style-element-name><[style-attribute-specification(s)]>
specifies the style element to use for default summary lines that are created with the
BREAK statement. See "Using Style Elements in PROC REPORT" on page 863 for
details.
Restriction: This option affects only the HTML, RTF, and Printer output.
SUMMARIZE
writes a summary line in each group of break lines. A summary line for a set of
observations contains values for
- the break variable (which you can suppress with the SUPPRESS option)
- other group or order variables to the left of the break variable
- statistics
- analysis variables
- computed variables.
The following table shows how PROC REPORT calculates the value for each kind
of report item in a summary line that is created by the BREAK statement:
If the report item is
missing*
a statistic
an analysis variable
a computed variable
* If you reference a variable with a missing value in a customized summary line, then PROC
REPORT displays that variable as a blank (for character variables) or a period (for numeric
variables).
Note: PROC REPORT cannot create groups in a report that contains order or
display variables.
Featured in:
page 971
SUPPRESS
suppresses printing of
unavailable for use in customized break lines unless you assign a value to it in the
compute block that is associated with the break (see COMPUTE Statement on
page 895).
Featured in:
UL
(for underlining) uses the second formatting character to underline each value
Note: If you define a customized summary for the break, then customized break
lines appear after underlining or double underlining. For more information about
customized break lines, see "COMPUTE Statement" on page 895 and "LINE Statement"
on page 907.
BY Statement
Creates a separate report on a separate page for each BY group.
Restriction: If you use the BY statement, then you must use the NOWINDOWS option in
the PROC REPORT statement.
Restriction: You cannot use the OUT= option when you use a BY statement.
Interaction: If you use the RBREAK statement in a report that uses BY processing, then
PROC REPORT creates a default summary for each BY group. In this case, you cannot
summarize information for the whole report.
Tip: Using the BY statement does not make the FIRST. and LAST. variables available
in compute blocks.
Main discussion: "BY" on page 58
BY <DESCENDING> variable-1
   <...<DESCENDING> variable-n> <NOTSORTED>;
Required Arguments
variable
specifies the variable that the procedure uses to form BY groups. You can specify
more than one variable. If you do not use the NOTSORTED option in the BY
statement, then the observations in the data set either must be sorted by all the
variables that you specify or must be indexed appropriately. Variables in a BY
statement are called BY variables.
Options
DESCENDING
specifies that the data set is sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The data are grouped in another way, for example, chronological order.
The requirement for ordering or indexing observations according to the values of
BY variables is suspended for BY-group processing when you use the NOTSORTED
option. In fact, the procedure does not use an index if you specify NOTSORTED. The
procedure defines a BY group as a set of contiguous observations that have the same
values for all BY variables. If observations with the same values for the BY variables
are not contiguous, then the procedure treats each contiguous set as a separate BY
group.
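BY-group processing in PROC REPORT can be sketched as follows, assuming the SASHELP.CLASS sample data set. WORK.CLASS_SORTED is a hypothetical name for the sorted copy.

```sas
/* Sort first, because NOTSORTED is not used in the BY statement below */
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

/* NOWINDOWS is required with a BY statement; OUT= cannot be used */
proc report data=work.class_sorted nowindows;
   by sex;
   column name age height;
   define name   / display;
   define age    / display;
   define height / analysis mean format=6.1;
   /* With BY processing, RBREAK summarizes each BY group,
      not the report as a whole. */
   rbreak after / summarize;
run;
```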
Featured in:
Required Arguments
column-id
specifies a column name or a column number (that is, the position of the column from
the left edge of the report). A column ID can be one of the following:
is the attribute to define. For attribute names, refer to Table 42.6 on page 891.
_ROW_
sets the value for the attribute. For values for each attribute, refer to Table 42.6 on
page 891.
Table 42.6 Attribute Descriptions
Attribute   Description   Values                                  Affects
BLINK                                                             windowing environment
COLOR                                                             windowing environment
COMMAND                                                           windowing environment
FORMAT                    a SAS format or a user-defined format   windowing and nonwindowing environments
HIGHLIGHT                                                         windowing environment
RVSVIDEO                                                          windowing environment
STYLE=
URL
URLBP                                                             HTML output
URLP                                                              HTML output
* The total length of the URL that you specify (including any characters that come from the BASE= and PATH=
options) cannot exceed the line size. Use the LS= option in the PROC REPORT statement to alter the line size
for the PROC REPORT step.
# For information on the BASE= and PATH= options, see the documentation for the ODS HTML statement in
SAS Output Delivery System: User's Guide.
Note: The attributes BLINK, HIGHLIGHT, and RVSVIDEO do not work on all
devices.
See "Using Style Elements in PROC REPORT" on page 863 for details.
Restriction: This option affects only the HTML, RTF, and Printer destinations.
Interaction: If you set a style element for the CALLDEF location in the PROC
REPORT statement and you want to use that exact style element in a CALL
DEFINE statement, then use an empty string as the value for the STYLE
attribute, as shown here:
call define (_col_, "STYLE", "" );
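For context, a CALL DEFINE statement normally sits inside a compute block. The sketch below assumes the SASHELP.CLASS sample data set; the threshold of 65, the yellow background, and the HTML file name are arbitrary illustration choices.

```sas
ods html body='calldef_demo.html';   /* hypothetical output file name */

proc report data=sashelp.class nowd;
   column name age height;
   define name   / display;
   define age    / display;
   define height / display;
   compute height;
      /* Set the STYLE attribute for the current cell of this column */
      if height > 65 then
         call define(_col_, "style", "style=[background=yellow]");
   endcomp;
run;

ods html close;
```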
COLUMN Statement
Describes the arrangement of all columns and of headers that span more than one column.
Restriction: You cannot use the COLUMN statement if you use REPORT= in the PROC
REPORT statement.
Featured in: Example 1 on page 948, Example 3 on page 954, Example 5 on page 960,
Example 6 on page 964, Example 10 on page 975, and Example 11 on page 977
COLUMN column-specification(s);
Required Arguments
column-specification(s)
- report-item(s)
- report-item-1, report-item-2 <. . . , report-item-n>
- (header-1 < . . . header-n > report-item(s) )
- report-item=name
If you stack a display variable under an across variable, then all the values of
that display variable appear in the report.
Interaction: A series of stacked report items can include only one analysis
variable or statistic. If you include more than one analysis variable or statistic,
then PROC REPORT returns an error because it cannot determine which
values to put in the cells of the report.
Tip: You can use parentheses to group report items whose headers should appear
at the same level rather than stacked one above the other.
Featured in: Example 5 on page 960, Example 6 on page 964, and Example 10 on
page 975
(header-1 < header-n > report-item(s))
creates one or more headers that span multiple columns.
header
is a string of characters that spans one or more columns in the report. PROC
REPORT prints each header on a separate line. You can use split characters in
a header to split one header over multiple lines. See the discussion of SPLIT=
on page 883.
In traditional (monospace) SAS output, if the first and last characters of a
header are one of the following characters, then PROC REPORT uses that
character to expand the header to fill the space over the column or columns:
: = \_ .* +
Similarly, if the first character of a header is < and the last character is >, or
vice-versa, then PROC REPORT expands the header to fill the space over the
column by repeating the first character before the text of the header and the
last character after it.
Note: A hexadecimal value (such as 'DF'x) that is specified within header
will not resolve because it is specified within quotation marks. To resolve a
hexadecimal value, use the %sysfunc(byte(num)) function, where num is the
hexadecimal value. Be sure to enclose header in double quotation marks (" ") so
that the macro function will resolve.
report-item(s)
specifies the columns to span.
Featured in: Example 10 on page 975
report-item=name
specifies an alias for a report item. You can use the same report item more than
once in a COLUMN statement. However, you can use only one DEFINE statement
for any given name. (The DEFINE statement designates characteristics such as
formats and customized column headers. If you omit a DEFINE statement for an
item, then the REPORT procedure uses defaults.) Assigning an alias in the
COLUMN statement does not by itself alter the report. However, it does enable you
to use separate DEFINE statements for each occurrence of a variable or statistic.
Featured in: Example 3 on page 954
Note: You cannot always use an alias. When you refer in a compute block to a report
item that has an alias, you must usually use the alias. However, if the report item
shares a column with an across variable, then you must reference the column by column
number (see Four Ways to Reference Report Items in a Compute Block on page 859).
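The forms of column-specification can be combined in one statement. The following is a minimal sketch, not one of the numbered examples in this chapter: it assumes the SASHELP.CLASS sample data set, and the header text and the aliases HMIN and HMAX are hypothetical names chosen for illustration.

```sas
/* Sketch only: SASHELP.CLASS and the alias names are assumptions. */
proc report data=sashelp.class nowd;
   /* HEIGHT is used twice, so each occurrence gets its own alias.  */
   /* The quoted header in parentheses spans both HEIGHT columns.   */
   column sex ('Height' height=hmin height=hmax) weight;
   define sex    / group 'Sex';
   define hmin   / analysis min  'Minimum';
   define hmax   / analysis max  'Maximum';
   define weight / analysis mean format=6.1 'Mean Weight';
run;
```

Because each alias has its own DEFINE statement, the two HEIGHT columns can carry different statistics and headers.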
COMPUTE Statement
Starts a compute block. A compute block contains one or more programming statements that
PROC REPORT executes as it builds the report.
Interaction: An ENDCOMP statement must mark the end of the group of statements in
the compute block.
Featured in: Example 2 on page 951, Example 3 on page 954, Example 4 on page 957,
Example 5 on page 960, Example 9 on page 971, and Example 10 on page 975
Required Arguments
You must specify either a location or a report item in the COMPUTE statement.
location
AFTER
- immediately after the last row of a set of rows that have the same value for
the variable that you specify as target or, if there is a default summary on
that variable, immediately after the creation of the preliminary summary line
(see How PROC REPORT Builds a Report on page 936).
- except in Printer and RTF output, near the bottom of each page, immediately
before any footnotes, if you specify _PAGE_ as target.
BEFORE
- immediately before the first row of a set of rows that have the same value for
the variable that you specify as target or, if there is a default summary on
that variable, immediately after the creation of the preliminary summary line
(see How PROC REPORT Builds a Report on page 936).
- except in Printer and RTF output, near the top of each page, between any
titles and the column headings, if you specify _PAGE_ as target.
report-item
Options
STYLE<(location(s))>=<style-element-name><[style-attribute-specification(s)]>
specifies the style to use for the text that is created by any LINE statements in this
compute block. See Using Style Elements in PROC REPORT on page 863 for
details.
Restriction: This option affects only the HTML, RTF, and Printer destinations.
Featured in:
target
controls when the compute block executes. If you specify a location (BEFORE or
AFTER) for the COMPUTE statement, then you can also specify target, which can be
one of the following:
break-variable
is a group or order variable.
When you specify a break variable, PROC REPORT executes the statements in
the compute block each time the value of the break variable changes.
_PAGE_ </ justification>
except in Printer and RTF output, causes the compute block to execute once for
each page, either immediately after printing any titles or immediately before
printing any footnotes. justification controls the placement of text and values. It
can be one of the following:
CENTER
LEFT
RIGHT
Featured in:
type-specification
specifies the type and, optionally, the length of report-item. If the report item that is
associated with a compute block is a computed variable, then PROC REPORT
assumes that it is a numeric variable unless you use a type specification to specify
that it is a character variable. A type specification has the form
CHARACTER < LENGTH=length>
where
CHARACTER
specifies that the computed variable is a character variable. If you do not specify a
length, then the variable's length is 8.
Alias: CHAR
Featured in: Example 10 on page 975
LENGTH=length
specifies the length of a computed character variable.
Default: 8
Range: 1 to 200
Interaction: If you specify a length, then you must use CHARACTER to indicate
that the computed variable is a character variable.
Featured in: Example 10 on page 975
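A type specification might be used as in the following sketch. It assumes the SASHELP.CLASS sample data set; the computed variable STATUS and the cutoff of 60 are hypothetical and serve only to illustrate the syntax.

```sas
/* Sketch only: data set, variable name STATUS, and cutoff are assumptions. */
proc report data=sashelp.class nowd;
   column name age height status;
   define name   / display;
   define age    / display;
   define height / display;
   define status / computed 'Status';

   /* Without CHARACTER, PROC REPORT would treat STATUS as numeric. */
   /* LENGTH=12 overrides the default length of 8.                  */
   compute status / character length=12;
      if height > 60 then status = 'Tall';
      else status = 'Not tall';
   endcomp;
run;
```

STATUS is placed to the right of HEIGHT in the COLUMN statement so that the value of HEIGHT is available when the compute block executes.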
DEFINE Statement
Describes how to use and display a report item.
If you do not use a DEFINE statement, then PROC REPORT uses default
characteristics.
Featured in: Example 2 on page 951, Example 3 on page 954, Example 4 on page 957,
Example 5 on page 960, Example 6 on page 964, Example 9 on page 971, and Example
10 on page 975
Tip:
To do this
Specify how to use a report item (see Usage of Variables in a Report on page 853)
Define the item, which must be a data set variable, as an
across variable
ACROSS
ANALYSIS
COMPUTED
DISPLAY
GROUP
ORDER
EXCLUSIVE
FORMAT=
ITEMHELP=
MISSING
ORDER=
PRELOADFMT
SPACING=
statistic
STYLE=
WEIGHT=
WIDTH=
DESCENDING
FLOW
ID
NOPRINT
NOZERO
PAGE
CENTER
LEFT
RIGHT
COLOR=
column-header
Required Arguments
report-item
specifies the name or alias (established in the COLUMN statement) of the data set
variable, computed variable, or statistic to define.
Note: Do not specify a usage option in the definition of a statistic. The name of the
statistic tells PROC REPORT how to use it.
Options
ACROSS
defines report-item, which must be a data set variable, as an across variable. (See
Across Variables on page 854.)
Featured in: Example 5 on page 960
ANALYSIS
defines report-item, which must be a data set variable, as an analysis variable. (See
Analysis Variables on page 855.)
By default, PROC REPORT calculates the Sum statistic for an analysis variable.
Specify an alternate statistic with the statistic option in the DEFINE statement.
Note: Naming a statistic in the DEFINE statement implies the ANALYSIS
option, so you never need to specify ANALYSIS. However, specifying ANALYSIS may
make your code easier for novice users to understand.
Featured in: Example 2 on page 951, Example 3 on page 954, and Example 4 on
page 957
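The relationship between ANALYSIS and the statistic option can be sketched as follows. The example assumes the SASHELP.CLASS sample data set and is only an illustration; the two analysis definitions should be equivalent in usage.

```sas
/* Sketch only: SASHELP.CLASS is an assumed sample data set. */
proc report data=sashelp.class nowd;
   column sex height weight;
   define sex    / group;
   define height / analysis mean format=6.1;  /* usage stated explicitly  */
   define weight / mean format=6.1;           /* MEAN implies ANALYSIS    */
run;
```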
CENTER
centers the formatted values of the report item within the column width and centers
the column header over the values. This option has no effect on the CENTER option
in the PROC REPORT statement, which centers the report on the page.
COLOR=color
specifies the color in the REPORT window of the column header and of the values of
the item that you are defining. You can use the following colors:
BLACK      MAGENTA
BLUE       ORANGE
BROWN      PINK
CYAN       RED
GRAY       WHITE
GREEN      YELLOW
Note: Not all operating environments and devices support all colors, and in some
operating environments and devices, one color may map to another color. For
example, if the DEFINITION window displays the word BROWN in yellow
characters, then selecting BROWN results in a yellow item.
column-header
defines the column header for the report item. Enclose each header in single or
double quotation marks. When you specify multiple column headers, PROC REPORT
uses a separate line for each one. The split character also splits a column header
over multiple lines.
In traditional (monospace) SAS output, if the first and last characters of a heading
are one of the following characters, then PROC REPORT uses that character to
expand the heading to fill the space over the column:
: = \_ .* +
Similarly, if the first character of a header is < and the last character is >, or
vice-versa, then PROC REPORT expands the header to fill the space over the column
by repeating the first character before the text of the header and the last character
after it.
Note: A hexadecimal value (such as 'DF'x) that is specified within
column-header will not resolve because it is specified within quotation marks. To
resolve a hexadecimal value, use the %sysfunc(byte(num)) function, where num is
the hexadecimal value. Be sure to enclose column-header in double quotation marks
(" ") so that the macro function will resolve.
Default:
Item                        Header
variable without a label    variable name
variable with a label       variable label
statistic                   statistic name
Tip: If you want to use names when labels exist, then submit the following SAS
statement before invoking PROC REPORT:
options nolabel;
Tip: HEADLINE underlines all column headers and the spaces between them. In
traditional (monospace) SAS output, you can underline column headers without
underlining the spaces between them, by using the special characters '--' as the
last line of each column header instead of using HEADLINE (see Example 4 on
page 957).
page 960
COMPUTED
defines the specified item as a computed variable. Computed variables are variables
that you define for the report. They are not in the input data set, and PROC
REPORT does not add them to the input data set.
In the windowing environment, you add a computed variable to a report from the
COMPUTED VAR window.
In the nonwindowing environment, you add a computed variable by
DESCENDING
reverses the order in which PROC REPORT displays rows or values of a group, order,
or across variable.
Tip: By default, PROC REPORT orders group, order, and across variables by their
formatted values. Use the ORDER= option in the DEFINE statement to specify an
alternate sort order.
DISPLAY
defines report-item, which must be a data set variable, as a display variable. (See
Display Variables on page 853.)
EXCLUSIVE
excludes from the report and the output data set all combinations of the group
variables and the across variables that are not found in the preloaded range of
user-defined formats.
Requirement: You must specify the PRELOADFMT option in the DEFINE
statement in order to preload the variable formats.
FLOW
wraps the value of a character variable in its column. The FLOW option honors the
split character. If the text contains no split character, then PROC REPORT tries to
split text at a blank.
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
FORMAT=format
assigns a SAS or user-defined format to the item. This format applies to report-item
as PROC REPORT displays it; the format does not alter the format associated with a
variable in the data set. For data set variables, PROC REPORT honors the first of
these formats that it finds:
width. For character variables in the input data set, the default column width is the
variable's length. For numeric variables in the input data set and for computed
variables (both numeric and character), the default column width is the value
specified by COLWIDTH= in the PROC REPORT statement or in the ROPTIONS
window.
In the windowing environment, if you are unsure what format to use, then type a
question mark (?) in the format field in the DEFINITION window to access the
FORMATS window.
Featured in: Example 2 on page 951 and Example 6 on page 964
GROUP
defines report-item, which must be a data set variable, as a group variable. (See
Group Variables on page 854.)
Featured in: Example 4 on page 957, Example 6 on page 964, and Example 14 on
page 986
ID
specifies that the item that you are defining is an ID variable. An ID variable and all
columns to its left appear at the left of every page of a report. ID ensures that you
can identify each row of the report when the report contains more columns than will
fit on one page.
Featured in: Example 6 on page 964
ITEMHELP=entry-name
references a HELP or CBT entry that contains help information for the report item.
Use PROC BUILD in SAS/AF software to create a HELP or CBT entry for a report
item. All HELP and CBT entries for a report must be in the same catalog, and you
must specify that catalog with the HELP= option in the PROC REPORT statement
or from the User Help fields in the ROPTIONS window.
Of course, you can access these entries only from a windowing environment. To
access a Help entry from the report, select the item and issue the HELP command.
PROC REPORT first searches for and displays an entry named entry-name.CBT. If
no such entry exists, then PROC REPORT searches for entry-name.HELP. If neither
a CBT nor a HELP entry for the selected item exists, then the opening frame of the
Help for PROC REPORT is displayed.
LEFT
left-justifies the formatted values of the report item within the column width and
left-justifies the column headers over the values. If the format width is the same as
the width of the column, then the LEFT option has no effect on the placement of
values.
MISSING
considers missing values as valid values for the report item. Special missing values
that represent numeric values (the letters A through Z and the underscore (_)
character) are each considered as a separate value.
Default: If you omit the MISSING option, then PROC REPORT excludes from the
report and the output data sets all observations that have a missing value for any
group, order, or across variable.
NOPRINT
Interaction: Even though the columns that you define with NOPRINT do not
appear in the report, you must count them when you are referencing columns by
number (see Four Ways to Reference Report Items in a Compute Block on page
859).
Interaction: SHOWALL in the PROC REPORT statement or the ROPTIONS
window overrides all occurrences of NOPRINT.
Featured in: Example 3 on page 954, Example 9 on page 971, and Example 12 on
page 980
NOZERO
suppresses the display of the report item if its values are all zero or missing.
Interaction: Even though the columns that you define with NOZERO do not appear
in the report, you must count them when you are referencing columns by number
(see Four Ways to Reference Report Items in a Compute Block on page 859).
Interaction: SHOWALL in the PROC REPORT statement or in the ROPTIONS
window overrides all occurrences of NOZERO.
ORDER
defines report-item, which must be a data set variable, as an order variable. (See
Order Variables on page 853.)
Featured in: Example 2 on page 951
ORDER=DATA|FORMATTED|FREQ|INTERNAL
orders the values of a group, order, or across variable according to the specified order,
where
DATA
orders values according to their order in the input data set.
FORMATTED
orders values by their formatted (external) values. If no format has been assigned
to a class variable, then the default format, BEST12., is used.
FREQ
orders values by ascending frequency count.
INTERNAL
orders values by their unformatted values, which yields the same order that PROC
SORT would yield. This order is operating environment-dependent. This sort
sequence is particularly useful for displaying dates chronologically.
Default: FORMATTED
Interaction: DESCENDING in the item's definition reverses the sort sequence for
an item. By default, the order is ascending.
Featured in: Example 2 on page 951
Note: The default value for the ORDER= option in PROC REPORT is not the
same as the default value in other SAS procedures. In other SAS procedures, the
default is ORDER=INTERNAL. The default for the option in PROC REPORT may
change in a future release to be consistent with other procedures. Therefore, in
production jobs where it is important to order report items by their formatted values,
specify ORDER=FORMATTED even though it is currently the default. Doing so
ensures that PROC REPORT will continue to produce the reports you expect even if
the default changes.
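The difference between ORDER=FORMATTED and ORDER=INTERNAL matters most when the formatted values do not sort in the natural order, as with dates. The following is a minimal sketch under stated assumptions: the data set WORK.VISITS and its variables are hypothetical and are created here only for illustration.

```sas
/* Sketch only: WORK.VISITS and its variables are hypothetical. */
data work.visits;
   input visit :date9. amount;
   format visit monname9.;      /* display each date as a month name */
   datalines;
15JAN2004 10
20FEB2004 20
05APR2004 30
;
run;

proc report data=work.visits nowd;
   column visit amount;
   /* ORDER=INTERNAL sorts on the underlying date values, so the  */
   /* months should appear in calendar order. The default,        */
   /* ORDER=FORMATTED, would sort the month names alphabetically. */
   define visit  / group order=internal 'Month';
   define amount / analysis sum 'Amount';
run;
```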
PAGE
inserts a page break just before printing the first column containing values of the
report item.
Interaction: PAGE is ignored if you use WRAP in the PROC REPORT statement or
RIGHT
right-justifies the formatted values of the specified item within the column width and
right-justifies the column headers over the values. If the format width is the same as
the width of the column, then RIGHT has no effect on the placement of values.
SPACING=horizontal-positions
defines the number of blank characters to leave between the column being defined
and the column immediately to its left. For each column, the sum of its width and
the blank characters between it and the column to its left cannot exceed the line size.
Default: 2
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
Interaction: When PROC REPORT's CENTER option is in effect, PROC REPORT
ignores spacing that precedes the leftmost variable in the report.
Interaction: SPACING= in an item's definition overrides the value of SPACING= in
the PROC REPORT statement or in the ROPTIONS window.
statistic
associates a statistic with an analysis variable. You must associate a statistic with
every analysis variable in its definition. PROC REPORT uses the statistic that you
specify to calculate values for the analysis variable for the observations that are
represented by each cell of the report. You cannot use statistic in the definition of any
other kind of variable.
See Statistics That Are Available in PROC REPORT on page 857 for a list of
available statistics.
Default: SUM
Featured in: Example 2 on page 951, Example 3 on page 954, and Example 4 on
page 957
Note: PROC REPORT uses the name of the analysis variable as the default
header for the column. You can customize the column header with the column-header
option in the DEFINE statement.
STYLE<(location(s))>=<style-element-name><[style-attribute-specification(s)]>
specifies the style element to use for column headers and for text inside cells for this
report item. See Using Style Elements in PROC REPORT on page 863 for details.
Restriction: This option affects only the HTML, RTF, and Printer destinations.
Featured in:
WEIGHT=weight-variable
specifies a numeric variable whose values weight the values of the analysis variable
that is specified in the DEFINE statement. The variable value does not have to be an
integer. The following table describes how PROC REPORT treats various values of
the WEIGHT variable.
Weight Value
less than 0    converts the value to zero and counts the observation in the total number of observations
missing
To exclude observations that contain negative and zero weights from the analysis,
use the EXCLNPWGT option in the PROC REPORT statement. Note that most
SAS/STAT procedures, such as PROC GLM, exclude negative and zero weights by
default.
Restriction: To compute weighted quantiles, use QMETHOD=OS in the PROC
REPORT statement.
Tip: When you use the WEIGHT= option, consider which value of the VARDEF=
option in the PROC REPORT statement is appropriate.
Tip: Use the WEIGHT= option in separate variable definitions in order to specify
different weights for the variables.
Note: Prior to Version 7 of SAS, the REPORT procedure did not exclude the
observations with missing weights from the count of observations.
WIDTH=column-width
defines the width of the column in which PROC REPORT displays report-item.
Default: A column width that is just large enough to handle the format. If there is
no format, then PROC REPORT uses the value of the COLWIDTH= option in the
PROC REPORT statement.
Range: 1 to the value of the SAS system option LINESIZE=
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
Interaction: WIDTH= in an item definition overrides the value of COLWIDTH= in
the PROC REPORT statement or the ROPTIONS window.
Tip: When you stack items in the same column in a report, the width of the item
that is at the bottom of the stack determines the width of the column.
Featured in: Example 10 on page 975
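Several of the layout options described above can appear together in one DEFINE statement. The following sketch assumes the SASHELP.CLASS sample data set and traditional SAS monospace output, because WIDTH= and SPACING= have no effect on other ODS destinations; the specific widths and headers are arbitrary illustrations.

```sas
/* Sketch only: SASHELP.CLASS and the chosen widths are assumptions. */
proc report data=sashelp.class nowd headline;
   column name age height;
   define name   / display width=12 left 'Student';
   /* SPACING=4 leaves four blanks between this column and NAME. */
   define age    / display width=5 spacing=4 right 'Age';
   define height / analysis mean width=8 format=8.1 'Height';
run;
```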
ENDCOMP Statement
Marks the end of one or more programming statements that PROC REPORT executes as it builds
the report.
Restriction:
ENDCOMP;
FREQ Statement
Treats observations as if they appear multiple times in the input data set.
Tip: The effects of the FREQ and WEIGHT statements are similar except when
calculating degrees of freedom.
See also: For an example that uses the FREQ statement, see Example on page 62
FREQ variable;
Required Arguments
variable
specifies a numeric variable whose value represents the frequency of the observation.
If you use the FREQ statement, then the procedure assumes that each observation
represents n observations, where n is the value of variable. If n is not an integer,
then SAS truncates it. If n is less than 1 or is missing, then the procedure does not
use that observation to calculate statistics.
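As a rough illustration of the FREQ statement, the following sketch treats each input row as COUNT observations. The data set WORK.SCORES and its variables are hypothetical and are created here only for the example.

```sas
/* Sketch only: WORK.SCORES, SCORE, and COUNT are hypothetical. */
data work.scores;
   input score count;
   datalines;
70 3
80 5
90 2
;
run;

proc report data=work.scores nowd;
   /* Each row counts as COUNT observations in the statistics. */
   freq count;
   column score,(n mean);
run;
```

With the FREQ statement in effect, the N statistic should reflect the sum of COUNT rather than the number of rows in the data set.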
LINE Statement
Provides a subset of the features of the PUT statement for writing customized summaries.
Restriction: This statement is valid only in a compute block that is associated with a
location in the report.
Restriction: You cannot use the LINE statement in conditional statements (IF-THEN,
IF-THEN/ELSE, and SELECT) because it is not executed until PROC REPORT has
executed all other statements in the compute block.
Featured in: Example 2 on page 951, Example 3 on page 954, and Example 9 on page 971
LINE specification(s);
Required Arguments
specification(s)
can have one of the following forms. You can mix different forms of specifications in
one LINE statement.
item item-format
specifies the item to display and the format to use to display it, where
item
is the name of a data set variable, a computed variable, or a statistic in the
report. For information about referencing report items see Four Ways to
Reference Report Items in a Compute Block on page 859.
item-format
is a SAS format or user-defined format. You must specify a format for each item.
Featured in: Example 2 on page 951
character-string
specifies a string of text to display. When the string is a blank and nothing else is
in specification(s), PROC REPORT prints a blank line.
Note: A hexadecimal value (such as 'DF'x) that is specified within
character-string will not resolve because it is specified within quotation marks. To
resolve a hexadecimal value, use the %sysfunc(byte(num)) function, where num
is the hexadecimal value. Be sure to enclose character-string in double quotation
marks (" ") so that the macro function will resolve.
Featured in: Example 2 on page 951
number-of-repetitions*character-string
specifies a character string and the number of times to repeat it.
Featured in: Example 3 on page 954
pointer-control
specifies the column in which PROC REPORT displays the next specification. You
can use either of the following forms for pointer controls:
@column-number
specifies the number of the column in which to begin displaying the next item in
the specification list.
+column-increment
specifies the number of columns to skip before beginning to display the next
item in the specification list.
Both column-number and column-increment can be either a variable or a literal
value.
Restriction: The pointer controls are designed for monospace output. They have
no effect on the HTML, RTF, or Printer output.
Featured in: Example 3 on page 954 and Example 5 on page 960
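The forms of specification can be sketched together in one compute block. The example below assumes the SASHELP.CLASS sample data set and traditional monospace output (the pointer control has no effect elsewhere); the text of the customized lines is hypothetical.

```sas
/* Sketch only: SASHELP.CLASS and the line text are assumptions. */
proc report data=sashelp.class nowd;
   column sex name height;
   define sex    / order;
   define name   / display;
   define height / analysis mean format=5.1;

   /* Runs each time the value of SEX changes. */
   compute after sex;
      line ' ';                                /* blank line             */
      line 30*'-';                             /* repeated string        */
      line @3 'Mean height: ' height.mean 5.1; /* pointer, text, item    */
   endcomp;
run;
```

HEIGHT is an analysis variable, so the compute block refers to it by the compound name HEIGHT.MEAN, and the item is followed by a format as the LINE statement requires.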
RBREAK Statement
Produces a default summary at the beginning or end of a report or at the beginning or end of each
BY group.
Featured in:
To do this
COLOR=
DOL*
DUL*
OL*
Start a new page after the last break line of a break located at the
beginning of the report
PAGE
Write a blank line for the last break line of a break located at the
beginning of the report
SKIP*
STYLE=
SUMMARIZE
UL*
Required Arguments
location
controls the placement of the break lines and is either of the following:
AFTER
places the break lines at the end of the report.
BEFORE
places the break lines at the beginning of the report.
Options
COLOR=color
specifies the color of the break lines in the REPORT window. You can use the
following colors:
BLACK      MAGENTA
BLUE       ORANGE
BROWN      PINK
CYAN       RED
GRAY       WHITE
GREEN      YELLOW
Note: Not all operating environments and devices support all colors, and in some
operating environments and devices, one color may map to another color. For
example, if the DEFINITION window displays the word BROWN in yellow
characters, then selecting BROWN results in a yellow item.
DOL
(for double overlining) uses the thirteenth formatting character to overline each value
Featured in:
DUL
(for double underlining) uses the thirteenth formatting character to underline each
value
OL
(for overlining) uses the second formatting character to overline each value
PAGE
starts a new page after the last break line of a break located at the beginning of the
report.
SKIP
writes a blank line after the last break line of a break located at the beginning of the
report.
Restriction: This option has no effect on ODS destinations other than traditional
SAS monospace output.
STYLE=
specifies the style element to use for default summary lines that are created with the
RBREAK statement. See Using Style Elements in PROC REPORT on page 863 for
details.
Restriction: This option affects only the HTML, RTF, and Printer destinations.
SUMMARIZE
includes a summary line as one of the break lines. A summary line at the beginning
or end of a report contains values for
- statistics
- analysis variables
- computed variables.
The following table shows how PROC REPORT calculates the value for each kind
of report item in a summary line created by the RBREAK statement:
a statistic
an analysis variable
a computed variable
Featured in:
UL
(for underlining) uses the second formatting character to underline each value
- that appears in the summary line
- that would appear in the summary line if you specified the SUMMARIZE option.
Default: hyphen (-)
Restriction: This option has no effect on ODS destinations other than traditional
only)
2 summary line (SUMMARIZE)
3 underlining or double underlining (UL or DUL, traditional SAS monospace output
only)
4 skipped line (SKIP, traditional SAS monospace output only)
5 page break (PAGE).
Note: If you dene a customized summary for the break, then customized break
lines appear after underlining or double underlining. For more information about
customized break lines, see COMPUTE Statement on page 895 and LINE Statement
on page 907.
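A default summary at the end of a report might be requested as in the following sketch. It assumes the SASHELP.CLASS sample data set; DOL affects only traditional SAS monospace output.

```sas
/* Sketch only: SASHELP.CLASS is an assumed sample data set. */
proc report data=sashelp.class nowd headline;
   column sex height weight;
   define sex    / group;
   define height / analysis mean format=6.1;
   define weight / analysis mean format=6.1;
   /* One summary line at the end of the report, double overlined. */
   rbreak after / dol summarize;
run;
```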
WEIGHT Statement
Specifies weights for analysis variables in the statistical calculations.
See also: For information about calculating weighted statistics see Calculating
Weighted Statistics on page 64. For an example that uses the WEIGHT statement, see
Weighted Statistics Example on page 65.
WEIGHT variable;
Required Arguments
variable
specifies a numeric variable whose values weight the values of the analysis variables.
The value of the variable does not have to be an integer. If the value of variable is
Weight value
PROC REPORT
less than 0
missing
To exclude observations that contain negative and zero weights from the analysis,
use EXCLNPWGT. Note that most SAS/STAT procedures, such as PROC GLM,
exclude negative and zero weights by default.
Tip: When you use the WEIGHT statement, consider which value of the VARDEF=
option is appropriate. See VARDEF= on page 883 and the calculation of weighted
statistics in Keywords and Formulas on page 1340 for more information.
Note: Prior to Version 7 of SAS, the procedure did not exclude the observations
with missing weights from the count of observations.
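The WEIGHT statement can be sketched as follows. The data set WORK.SURVEY and the variables REGION, INCOME, and WT are hypothetical and are created here only to illustrate the syntax.

```sas
/* Sketch only: WORK.SURVEY and its variables are hypothetical. */
data work.survey;
   input region $ income wt;
   datalines;
East 40000 1.5
East 52000 0.5
West 47000 2.0
West 39000 1.0
;
run;

proc report data=work.survey nowd;
   /* WT weights every analysis variable in the report. */
   weight wt;
   column region income;
   define region / group;
   define income / analysis mean format=comma10. 'Weighted Mean Income';
run;
```

To weight only one analysis variable, the WEIGHT= option in that variable's DEFINE statement can be used instead of the WEIGHT statement.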
BREAK
Controls PROC REPORT's actions at a change in the value of a group or order variable or at the
top or bottom of a report.
Path
Edit -> Summarize information
After you select Summarize Information, PROC REPORT offers you four choices for
the location of the break:
- Before Item
- After Item
- At the top
- At the bottom.
Description
Note: For information about changing the formatting characters that are used by the
line drawing options in this window, see the discussion of FORMCHAR= on page 874.
Options
Overline summary
REPORT overlines.
Double overline summary
REPORT overlines.
Underline summary
starts a new page after the last break line. This option has no effect in a break at the
end of a report.
Interaction: If you use this option in a break on a variable and you create a break at
the end of the report, then the summary for the whole report is on a separate page.
Summarize analysis columns
writes a summary line in each group of break lines. A summary line contains values
for
- statistics
- analysis variables
- computed variables.
A summary line between sets of observations also contains
- the break variable (which you can suppress with Suppress break value)
- other group or order variables to the left of the break variable.
The following table shows how PROC REPORT calculates the value for each kind
of report item in a summary line created by the BREAK window:
missing*
a statistic
an analysis variable
a computed variable
If you reference a variable with a missing value in a customized summary line, then PROC
REPORT displays that variable as a blank (for character variables) or a period (for numeric
variables).
suppresses printing of
Color
From the list of colors, select the one to use in the REPORT window for the column
header and the values of the item that you are defining.
Default: The color of Foreground in the SASCOLOR window. (For more
Note:
Buttons
Edit Program
opens the COMPUTE window and enables you to associate a compute block with a
location in the report.
OK
applies the information in the BREAK window to the report and closes the window.
Cancel
COMPUTE
Attaches a compute block to a report item or to a location in the report. Use the SAS Text Editor
commands to manipulate text in this window.
Path
From Edit Program in the COMPUTED VAR, DEFINITION, or BREAK window.
Description
For information about the SAS language features that you can use in the COMPUTE
window, see The Contents of Compute Blocks on page 859.
COMPUTED VAR
Adds a variable that is not in the input data set to the report.
Path
Select a column. Then select
Edit -> Add Item -> Computed Column
After you select Computed Column, PROC REPORT prompts you for the location of
the computed column relative to the column that you have selected. After you select a
location, the COMPUTED VAR window opens.
Description
Enter the name of the variable at the prompt. If it is a character variable, then
select the Character data check box and, if you want, enter a value in the Length
field. The length can be any integer between 1 and 200. If you leave the field blank,
then PROC REPORT assigns a length of 8 to the variable.
After you enter the name of the variable, select Edit Program to open the COMPUTE
window. Use programming statements in the COMPUTE window to define the
computed variable. After closing the COMPUTE and COMPUTED VAR windows, open
the DEFINITION window to describe how to display the computed variable.
Note: The position of a computed variable is important. PROC REPORT assigns
values to the columns in a row of a report from left to right. Consequently, you cannot
base the calculation of a computed variable on any variable that appears to its right in
the report.
DATA COLUMNS
Lists all variables in the input data set so that you can add one or more data set variables to the
report.
Path
Select a report item. Then select
Edit -> Add Item -> Data Column
After you select Data column, PROC REPORT prompts you for the location of the
computed column relative to the column that you have selected. After you select a
location, the DATA COLUMNS window opens.
Description
Select one or more variables to add to the report. When you select the first variable,
it moves to the top of the list in the window. If you select multiple variables, then
subsequent selections move to the bottom of the list of selected variables. An asterisk
(*) identifies each selected variable. The order of selected variables from top to bottom
determines their order in the report from left to right.
DATA SELECTION
Loads a data set into the current report definition.
Path
File
Description
The first list box in the DATA SELECTION window lists all the librefs defined for
your SAS session. The second one lists all the SAS data sets in the selected library.
Note: You must use data that is compatible with the current report definition. The
data set that you load must contain variables whose names are the same as the
variable names in the current report definition.
Buttons
OK
loads the selected data set into the current report definition.
Cancel
DEFINITION
Displays the characteristics associated with an item in the report and lets you change them.
Path
Select a report item. Then select Edit > Define.
Description
Usage
For an explanation of each type of usage see Laying Out a Report on page 852.
DISPLAY
defines the selected item as a display variable. DISPLAY is the default for character
variables.
ORDER
defines the selected item as an order variable.
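GROUP
defines the selected item as a group variable.
ACROSS
defines the selected item as an across variable.
ANALYSIS
defines the selected item as an analysis variable. You must specify a statistic (see the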
discussion of the Statistic= attribute on page 920) for an analysis variable.
ANALYSIS is the default for numeric variables.
COMPUTED
defines the selected item as a computed variable. Computed variables are variables
that you define for the report. They are not in the input data set, and PROC
REPORT does not add them to the input data set. However, computed variables are
included in an output data set if you create one.
In the windowing environment, you add a computed variable to a report from the
COMPUTED VAR window.
Attributes
Format=
assigns a SAS or user-defined format to the item. This format applies to the selected
item as PROC REPORT displays it; the format does not alter the format that is
associated with a variable in the data set. For data set variables, PROC REPORT
honors the first of these formats that it finds:
• the format that is assigned with FORMAT= in the DEFINITION window
• the format that is assigned in a FORMAT statement when you start PROC
REPORT
• the format that is associated with the variable in the data set.
If none of these is present, then PROC REPORT uses BESTw. for numeric
variables and $w. for character variables. The value of w is the default column
width. For character variables in the input data set, the default column width is the
variable's length. For numeric variables in the input data set and for computed
variables (both numeric and character), the default column width is the value of the
COLWIDTH= attribute in the ROPTIONS window.
If you are unsure what format to use, then type a question mark (?) in the format
field in the DEFINITION window to access the FORMATS window.
Spacing=
defines the number of blank characters to leave between the column being defined
and the column immediately to its left. For each column, the sum of its width and
the blank characters between it and the column to its left cannot exceed the line size.
Default: 2
Interaction: When PROC REPORT's CENTER option is in effect, PROC REPORT
ignores spacing that precedes the leftmost variable in the report.
Interaction: SPACING= in an item definition overrides the value of SPACING= in
the PROC REPORT statement or the ROPTIONS window.
Width=
defines the width of the column in which PROC REPORT displays the selected item.
Range: 1 to the value of the SAS system option LINESIZE=
Default: A column width that is just large enough to handle the format. If there is
no format, then PROC REPORT uses the value of COLWIDTH=.
Note: When you stack items in the same column in a report, the width of the
item that is at the bottom of the stack determines the width of the column.
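The Format=, Spacing=, and Width= attributes correspond to options of the same names in the DEFINE statement. The following is a minimal sketch under assumptions: the data set WORK.GROCERY_DEMO and the particular widths are invented for illustration, and the layout effect of WIDTH= and SPACING= is visible in traditional monospace listing output.

```sas
/* Hypothetical sample data for illustration only. */
data work.grocery_demo;
   input Sector $ Department $ Sales;
   datalines;
ne np1 500
ne p1 490
nw np1 1070
nw p1 1055
;
run;

proc report data=work.grocery_demo nowd;
   column Department Sales;
   /* WIDTH= sets the column width explicitly. */
   define Department / group width=12;
   /* FORMAT= controls how values are displayed. SPACING=4 leaves four
      blanks between Sales and the column to its left, overriding the
      default of 2. */
   define Sales / analysis sum format=dollar10.2 width=12 spacing=4;
run;
```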
Statistic=
associates a statistic with an analysis variable. You must associate a statistic with
every analysis variable in its definition. PROC REPORT uses the statistic that you
specify to calculate values for the analysis variable for the observations represented
by each cell of the report. You cannot use statistic in the definition of any other kind
of variable.
Default: SUM
Note: PROC REPORT uses the name of the analysis variable as the default
header for the column. You can customize the column header with the Header field of
the DEFINITION window.
You can use the following values for statistic:
Descriptive statistic keywords
CSS          PCTSUM
CV           RANGE
MAX          STDDEV|STD
MEAN         STDERR
MIN          SUM
N            SUMWGT
NMISS        USS
PCTN         VAR
Quantile statistic keywords
MEDIAN|P50   Q3|P75
P1           P90
P5           P95
P10          P99
Q1|P25       QRANGE
Explanations of the keywords, the formulas that are used to calculate them, and
the data requirements are discussed in Appendix 1, "SAS Elementary Statistics
Procedures," on page 1339.
Requirement: To compute standard error and the Student's t-test, you must use the
default value of VARDEF=, which is DF.
See also: For definitions of these statistics, see "Keywords and Formulas" on page
1340.
Order=
DATA
orders values according to their order in the input data set.
FORMATTED
orders values by their formatted (external) values. By default, the order is
ascending.
FREQ
orders values by ascending frequency count.
INTERNAL
orders values by their unformatted values, which yields the same order that PROC
SORT would yield. This order is operating environment-dependent. This sort
sequence is particularly useful for displaying dates chronologically.
Default: FORMATTED
Interaction: DESCENDING in the item's definition reverses the sort sequence for
an item.
Note: The default value for the ORDER= option in PROC REPORT is not the
same as the default value in other SAS procedures. In other SAS procedures, the
default is ORDER=INTERNAL. The default for the option in PROC REPORT may
change in a future release to be consistent with other procedures. Therefore, in
production jobs where it is important to order report items by their formatted values,
specify ORDER=FORMATTED even though it is currently the default. Doing so
ensures that PROC REPORT will continue to produce the reports you expect even if
the default changes.
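Following that advice, a production job can state the order explicitly in the DEFINE statement. This is a sketch only: the format $DEPTDEMO and the data set WORK.GROCERY_DEMO are hypothetical names created here for illustration.

```sas
/* Hypothetical format that maps department codes to display names. */
proc format;
   value $deptdemo 'np1'='Paper'
                   'np2'='Canned'
                   'p1' ='Meat/Dairy'
                   'p2' ='Produce';
run;

/* Hypothetical sample data. */
data work.grocery_demo;
   input Department $ Sales;
   datalines;
np1 290
np2 840
p1 490
p2 211
;
run;

proc report data=work.grocery_demo nowd;
   column Department Sales;
   /* ORDER=FORMATTED is spelled out even though it is the current
      default, so rows sort by the formatted names (Canned, Meat/Dairy,
      Paper, Produce). ORDER=INTERNAL would sort by the stored codes
      (np1, np2, p1, p2) instead. */
   define Department / group order=formatted format=$deptdemo.;
   define Sales      / analysis sum format=dollar10.2;
run;
```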
Justify=
You can justify the placement of the column header and of the values of the item that
you are defining within a column in one of three ways:
LEFT
left-justifies the formatted values of the item that you are defining within the
column width and left-justifies the column header over the values. If the format
width is the same as the width of the column, then LEFT has no effect on the
placement of values.
RIGHT
right-justifies the formatted values of the item that you are defining within the
column width and right-justifies the column header over the values. If the format
width is the same as the width of the column, then RIGHT has no effect on the
placement of values.
CENTER
centers the formatted values of the item that you are defining within the column
width and centers the column header over the values. This option has no effect on
the setting of the SAS system option CENTER.
When justifying values, PROC REPORT justifies the field width defined by the
format of the item within the column. Thus, numbers are always aligned.
Data type=
shows you if the report item is numeric or character. You cannot change this field.
Item Help=
references a HELP or CBT entry that contains help information for the selected item.
Use PROC BUILD in SAS/AF software to create a HELP or CBT entry for a report
item. All HELP and CBT entries for a report must be in the same catalog, and you
must specify that catalog with the HELP= option in the PROC REPORT statement
or from the User Help fields in the ROPTIONS window.
To access a help entry from the report, select the item and issue the HELP
command. PROC REPORT first searches for and displays an entry named
entry-name.CBT. If no such entry exists, then PROC REPORT searches for
entry-name.HELP. If neither a CBT nor a HELP entry for the selected item exists,
then the opening frame of the help for PROC REPORT is displayed.
Alias=
By entering a name in the Alias field, you create an alias for the report item that
you are defining. Aliases let you distinguish between different uses of the same
report item. When you refer in a compute block to a report item that has an alias,
you must use the alias (see Example 3 on page 954).
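In statement form, an alias is assigned in the COLUMN statement with the form variable=alias. The sketch below is illustrative only: the data set WORK.GROCERY_DEMO and the alias names SalesMin, SalesMax, and Spread are assumptions. It uses Sales twice with different statistics and refers to each use by its alias inside a compute block.

```sas
/* Hypothetical sample data with several observations per department. */
data work.grocery_demo;
   input Department $ Sales;
   datalines;
np1 500
np1 340
np1 1070
p1 490
p1 1055
;
run;

proc report data=work.grocery_demo nowd;
   /* Sales appears twice, each time under its own alias. */
   column Department Sales=SalesMin Sales=SalesMax Spread;
   define Department / group;
   define SalesMin   / analysis min 'Min Sales';
   define SalesMax   / analysis max 'Max Sales';
   define Spread     / computed 'Spread';

   compute Spread;
      /* Items that have an alias are referenced by the alias. */
      Spread = SalesMax - SalesMin;
   endcomp;
run;
```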
Options
NOPRINT
suppresses the display of the item that you are defining. Use this option
• if you do not want to show the item in the report but you need to use the values
in it to calculate other values that you use in the report
• to establish the order of rows in the report
• if you do not want to use the item as a column but want to have access to its
values in summaries (see Example 9 on page 971).
Interaction: Even though the columns that you define with NOPRINT do not
appear in the report, you must count them when you are referencing columns by
number (see "Four Ways to Reference Report Items in a Compute Block" on page
859).
Interaction: SHOWALL in the PROC REPORT statement or the ROPTIONS
window overrides all occurrences of NOPRINT.
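As a sketch of the first use of NOPRINT listed above, the following step hides Sales but still uses its sum to calculate a displayed value. The data set WORK.GROCERY_DEMO and the 25% rate are assumptions for illustration.

```sas
/* Hypothetical sample data. */
data work.grocery_demo;
   input Department $ Sales;
   datalines;
np1 500
np1 340
p1 490
p1 1055
;
run;

proc report data=work.grocery_demo nowd;
   column Department Sales Profit;
   define Department / group;
   /* Sales is not displayed, but it is still the second column of the
      report for the purpose of column-number references. */
   define Sales      / analysis sum noprint;
   define Profit     / computed format=dollar10.2;

   compute Profit;
      Profit = 0.25 * Sales.sum;   /* 25% is an arbitrary example rate */
   endcomp;
run;
```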
NOZERO
suppresses the display of the item that you are defining if its values are all zero or
missing.
Interaction: Even though the columns that you define with NOZERO do not appear
in the report, you must count them when you are referencing columns by number
(see "Four Ways to Reference Report Items in a Compute Block" on page 859).
Interaction: SHOWALL in the PROC REPORT statement or the ROPTIONS
window overrides all occurrences of NOZERO.
DESCENDING
reverses the order in which PROC REPORT displays rows or values of a group, order,
or across variable.
PAGE
inserts a page break just before printing the first column containing values of the
selected item.
Interaction: PAGE is ignored if you use WRAP in the PROC REPORT statement or
in the ROPTIONS window.
FLOW
wraps the value of a character variable in its column. The FLOW option honors the
split character. If the text contains no split character, then PROC REPORT tries to
split text at a blank.
ID column
specifies that the item that you are defining is an ID variable. An ID variable and all
columns to its left appear at the left of every page of a report. ID ensures that you
can identify each row of the report when the report contains more columns than will
fit on one page.
Color
From the list of colors, select the one to use in the REPORT window for the column
header and the values of the item that you are defining.
Note:
Buttons
Apply
applies the information in the open window to the report and keeps the window open.
Edit Program
opens the COMPUTE window and enables you to associate a compute block with the
variable that you are defining.
OK
applies the information in the DEFINITION window to the report and closes the
window.
Cancel
closes the DEFINITION window without applying changes made with APPLY.
DISPLAY PAGE
Displays a particular page of the report.
Path
View > Display Page
Description
You can get to the last page of the report by entering a large number for the page
number. When you are on the last page of the report, PROC REPORT sends a note to
the message line of the REPORT window.
EXPLORE
Lets you experiment with your data.
Restriction: You cannot open the EXPLORE window unless your report contains at least
one group or order variable.
Path
Edit > Explore Data
Description
In the EXPLORE window you can
Window Features
list boxes
The EXPLORE window contains three list boxes. These boxes contain the value All
levels as well as actual values for the first three group or order variables in your
report. The values reflect any WHERE clause processing that is in effect. For
example, if you use a WHERE clause to subset the data so that it includes only the
northeast and northwest sectors, then the only values that appear in the list box for
Sector are All levels, Northeast, and Northwest. Selecting All levels in this
case displays rows of the report for only the northeast and northwest sectors. To see
data for all the sectors, you must clear the WHERE clause before you open the
EXPLORE window.
Selecting values in the list boxes restricts the display in the REPORT window to
the values that you select. If you select incompatible values, then PROC REPORT
returns an error.
Remove Column
Above each list box in the EXPLORE window is a check box labeled Remove Column.
Selecting this check box and applying the change removes the column from the
REPORT window. You can easily restore the column by clearing the check box and
applying that change.
Buttons
OK
applies the information in the EXPLORE window to the report and closes the window.
Apply
applies the information in the EXPLORE window to the report and keeps the window
open.
Rotate columns
changes the order of the variables displayed in the list boxes. Each variable that can
move one column to the left does; the leftmost variable moves to the third column.
Cancel
closes the EXPLORE window without applying changes made with APPLY.
FORMATS
Displays a list of formats and provides a sample of each one.
Path
From the DEFINITION window, type a question mark (?) in the Format field and select
any of the buttons except Cancel, or press RETURN.
Description
When you select a format in the FORMATS window, a sample of that format appears
in the Sample: field. Select the format that you want to use for the variable that you
are defining.
Buttons
OK
writes the format that you have selected into the Format field in the DEFINITION
window and closes the FORMATS window. To see the format in the report, select
Apply in the DEFINITION window.
Cancel
closes the FORMATS window without writing a format into the Format field.
LOAD REPORT
Loads a stored report definition.
Path
File > Open Report
Description
The first list box in the LOAD REPORT window lists all the librefs that are defined
for your SAS session. The second list box lists all the catalogs that are in the selected
library. The third list box lists descriptions of all the stored report definitions (entry
types of REPT) that are in the selected catalog. If there is no description for an entry,
then the list box contains the entry's name.
Buttons
OK
Cancel
closes the LOAD REPORT window without loading a new report definition.
Note: Issuing the END command in the REPORT window returns you to the
previous report definition (with the current data).
MESSAGES
Automatically opens to display notes, warnings, and errors returned by PROC REPORT.
You must close the MESSAGES window by selecting OK before you can continue to
use PROC REPORT.
PROFILE
Customizes some features of the PROC REPORT environment by creating a report profile.
Path
Tools > Report Profile
Description
The PROFILE window creates a report profile that
• specifies the SAS library, catalog, and entry that define alternative menus to use
in the REPORT and COMPUTE windows. Use PROC PMENU to create catalog
entries of type PMENU that define these menus. PMENU entries for both
windows must be in the same catalog.
• sets defaults for WINDOWS, PROMPT, and COMMAND. PROC REPORT uses the
default option whenever you start the procedure unless you specifically override
the option in the PROC REPORT statement.
Specify the catalog that contains the profile to use with the PROFILE= option in the
PROC REPORT statement (see the discussion of PROFILE= on page 880).
Buttons
OK
PROMPTER
Prompts you for information as you add items to a report.
Path
Specify the PROMPT option when you start PROC REPORT or select PROMPT from
the ROPTIONS window. The PROMPTER window opens the next time that you add an
item to the report.
Description
The prompter guides you through parts of the windows that are most commonly used
to build a report. As the content of the PROMPTER window changes, the title of the
window changes to the name of the window that you would use to perform a task if you
were not using the prompter. The title change is to help you begin to associate the
windows with their functions and to learn what window to use if you later decide to
change something.
If you start PROC REPORT with prompting, then the first window gives you a
chance to limit the number of observations that are used during prompting. When you
exit the prompter, PROC REPORT removes the limit.
Buttons
OK
applies the information in the open window to the report and continues the
prompting process.
Note: When you select OK from the last prompt window, PROC REPORT
removes any limit on the number of observations that it is working with.
Apply
applies the information in the open window to the report and keeps the window open.
Backup
closes the PROMPTER window without applying any more changes to the report. If
you have limited the number of observations to use during prompting, then PROC
REPORT removes the limit.
REPORT
Is the surface on which the report appears.
Path
Use WINDOWS or PROMPT in the PROC REPORT statement.
Description
You cannot write directly in any part of the REPORT window except column headers.
To change other aspects of the report, you select a report item (for example, a column
heading) as the target of the next command and issue the command. To select an item,
use a mouse or cursor keys to position the cursor over it. Then click the mouse button
or press ENTER. To execute a command, make a selection from the menu bar at the top
of the REPORT window. PROC REPORT displays the effect of a command immediately
unless the DEFER option is on.
Note: Issuing the END command in the REPORT window returns you to the
previous report definition with the current data. If there is no previous report
definition, then END closes the REPORT window.
ROPTIONS
Displays choices that control the layout and display of the entire report and identifies the SAS data
library and catalog containing CBT or HELP entries for items in the report.
Path
Tools > Options > Report
Description
Modes
DEFER
stores the information for changes and makes the changes all at once when you turn
DEFER mode off or select View > Refresh.
DEFER is particularly useful when you know that you need to make several
changes to the report but do not want to see the intermediate reports.
By default, PROC REPORT redisplays the report in the REPORT window each
time you redefine the report by adding or deleting an item, by changing information
in the DEFINITION window, or by changing information in the BREAK window.
PROMPT
opens the PROMPTER window the next time that you add an item to the report.
Options
CENTER
centers the report and summary text (customized break lines). If CENTER is not
selected, then the report is left-justified.
PROC REPORT honors the first of these centering specifications that it finds:
• the CENTER or NOCENTER option stored in the report definition loaded with
REPORT= in the PROC REPORT statement
HEADLINE
underlines all column headers and the spaces between them at the top of each page
of the report.
HEADLINE underlines with the second formatting character. (See the discussion
of FORMCHAR= on page 874.)
Default: hyphen (-)
Tip:
HEADSKIP
writes a blank line beneath all column headers (or beneath the underlining that the
HEADLINE option writes) at the top of each page of the report.
NAMED
writes name= in front of each value in the report, where name is the column header
for the value.
Tip: Use NAMED in conjunction with WRAP to produce a report that wraps all
columns for a single row of the report onto consecutive lines rather than placing
columns of a wide report on separate pages.
Interaction: When you use NAMED, PROC REPORT automatically uses NOHEADER.
NOHEADER
SHOWALL
overrides the parts of a definition that suppress the display of a column (NOPRINT
and NOZERO). You define a report item with a DEFINE statement or in the
DEFINITION window.
WRAP
displays one value from each column of the report, on consecutive lines if necessary,
before displaying another value from the first column. By default, PROC REPORT
displays values for only as many columns as it can fit on one page. It fills a page
with values for these columns before starting to display values for the remaining
columns on the next page.
Interaction: When WRAP is in effect, PROC REPORT ignores PAGE in any item
definitions.
Tip: Typically, you use WRAP in conjunction with NAMED to avoid wrapping
column headers.
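The NAMED and WRAP options are also available in the PROC REPORT statement. The following is a minimal sketch, assuming a small invented data set (WORK.GROCERY_DEMO) and an arbitrary line size. With a genuinely wide data set, each row would continue onto consecutive lines instead of spilling onto another page.

```sas
/* Hypothetical sample data. A real use case would have many more columns. */
data work.grocery_demo;
   input Sector $ Department $ Sales;
   datalines;
ne np1 500
ne p1 490
nw np1 1070
nw p1 1055
;
run;

options linesize=64;   /* arbitrary example line size */

/* NAMED writes name= in front of each value. WRAP keeps all columns
   of one row together on consecutive lines. */
proc report data=work.grocery_demo nowd named wrap;
   column Sector Department Sales;
   define Sector     / display;
   define Department / display;
   define Sales      / display format=dollar10.2;
run;
```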
BOX
MISSING
considers missing values as valid values for group, order, or across variables. Special
missing values that are used to represent numeric values (the letters A through Z
and the underscore (_) character) are each considered as a different value. A group
for each missing value appears in the report. If you omit the MISSING option, then
PROC REPORT does not include observations with a missing value for one or more
group, order, or across variables in the report.
Attributes
Linesize
specifies the line size for a report. PROC REPORT honors the first of these line-size
specifications that it finds:
Tip: If the line size is greater than the width of the REPORT window, then use SAS
windowing environment commands RIGHT and LEFT to display portions of the
report that are not currently in the display.
Pagesize
specifies the page size for a report. PROC REPORT honors the first of these page
size specifications that it finds:
COLWIDTH=
specifies the default number of characters for columns containing computed variables
or numeric data set variables.
Range: 1 to the linesize
Default: 9
Interaction: When setting the width for a column, PROC REPORT first looks at
WIDTH= in the definition for that column. If WIDTH= is not present, then PROC
REPORT uses a column width large enough to accommodate the format for the
item. (For information about formats, see the discussion of Format= on page 919.)
If no format is associated with the item, then the column width depends on
variable type:
If the variable is a
SPACING=space-between-columns
specifies the number of blank characters between columns. For each column, the sum
of its width and the blank characters between it and the column to its left cannot
exceed the line size.
Default: 2
Interaction: PROC REPORT separates all columns in the report by the number of
SPLIT=
specifies the split character. PROC REPORT breaks a column header when it
reaches that character and continues the header on the next line. The split character
itself is not part of the column header although each occurrence of the split character
counts toward the 40-character maximum for a label.
Default: slash (/)
Interaction: The FLOW option in the DEFINE statement honors the split character.
Note: If you are typing over a header (rather than entering one from the
PROMPTER or DEFINITION window), then you do not see the effect of the split
character until you refresh the screen by adding or deleting an item, by changing
the contents of a DEFINITION or a BREAK window, or by selecting View > Refresh.
PANELS=number-of-panels
specifies the number of panels on each page of the report. If the width of a report is
less than half of the line size, then you can display the data in multiple sets of
columns so that rows that would otherwise appear on multiple pages appear on the
same page. Each set of columns is a panel. A familiar example of this kind of report
is a telephone book, which contains multiple panels of names and telephone numbers
on a single page.
When PROC REPORT writes a multipanel report, it fills one panel before
beginning the next.
Tip: The number of panels that fits on a page depends on the
See also: For information about specifying the space between panels, see the
discussion of PSPACE= on page 932. For information about setting the line size,
see the discussion of Linesize on page 931.
PSPACE=space-between-panels
specifies the number of blank characters between panels. PROC REPORT separates
all panels in the report by the same number of blank characters. For each panel, the
sum of its width and the number of blank characters separating it from the panel to
its left cannot exceed the line size.
Default: 4
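PANELS= and PSPACE= can also be given in the PROC REPORT statement. The sketch below builds a telephone-list style report. The data set WORK.PHONE_DEMO, its generated names and extensions, and the option values are all assumptions made up for illustration, and the panel layout applies to traditional monospace listing output.

```sas
/* Hypothetical narrow data set: 30 generated names and extensions. */
data work.phone_demo;
   length Name $ 10;
   do i = 1 to 30;
      Name = cats('Person', i);
      Ext  = 1000 + i;
      output;
   end;
   drop i;
run;

options linesize=80 pagesize=20;   /* arbitrary example page dimensions */

/* Request three panels per page with six blanks between panels.
   PROC REPORT fills one panel before beginning the next. */
proc report data=work.phone_demo nowd panels=3 pspace=6 headline;
   column Name Ext;
   define Name / display width=10;
   define Ext  / display width=5 format=4.;
run;
```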
User Help
identifies the library and catalog containing user-defined help for the report. This
help can be in CBT or HELP catalog entries. You can write a CBT or HELP entry for
each item in the report with the BUILD procedure in SAS/AF software. You must
store all such entries for a report in the same catalog.
Specify the entry name for help for a particular report item in the DEFINITION
window for that report item or in a DEFINE statement.
SAVE DATA SET
Path
File
Description
To specify an output data set, enter the name of the SAS data library and the name
of the data set (called member in the window) that you want to create in the Save Data
Set window.
Buttons
OK
Creates the output data set and closes the Save Data Set window.
Cancel
Closes the Save Data Set window without creating an output data set.
SAVE DEFINITION
Saves a report definition for subsequent use with the same data set or with a similar data set.
Path
File > Save Report
Description
The SAVE DEFINITION window prompts you for the complete name of the catalog
entry in which to store the definition of the current report and for an optional
description of the report. This description shows up in the LOAD REPORT window and
helps you to select the appropriate report.
SAS stores the report definition as a catalog entry of type REPT. You can use a report
definition to create an identically structured report for any SAS data set that contains
variables with the same names as those used in the report definition.
Buttons
OK
Creates the report definition and closes the SAVE DEFINITION window.
Cancel
SOURCE
Lists the PROC REPORT statements that build the current report.
Path
Tools > Report Statements
STATISTICS
Displays statistics that are available in PROC REPORT.
Path
Edit > Add Item > Statistic
After you select Statistic, PROC REPORT prompts you for the location of the
statistic relative to the column that you have selected. After you select a location, the
STATISTICS window opens.
Description
Select the statistics that you want to include in your report and close the window.
When you select the first statistic, it moves to the top of the list in the window. If you
select multiple statistics, then subsequent selections move to the bottom of the list of
selected statistics. An asterisk (*) indicates each selected statistic. The order of selected
statistics from top to bottom determines their order in the report from left to right.
Note: If you double-click on a statistic, then PROC REPORT immediately adds it to
the report. The STATISTICS window remains open.
To compute standard error and the Student's t test, you must use the default value of
VARDEF=, which is DF.
To add all selected statistics to the report, select File > Accept Selection.
Selecting File > Close closes the STATISTICS window without adding the selected
statistics to the report.
WHERE
Selects observations from the data set that meet the conditions that you specify.
Path
Subset > Where
Description
Enter a where-expression in the Enter where clause field. A where-expression is an
arithmetic or logical expression that generally consists of a sequence of operands and
operators. For information about constructing a where-expression, see the
documentation of the WHERE statement in the section on statements in SAS Language
Reference: Dictionary.
Note: You can clear all where-expressions by leaving the Enter where clause field
empty and by selecting OK.
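Outside the windowing environment, the same kind of subsetting can be sketched with a WHERE statement inside the PROC REPORT step. The data set WORK.GROCERY_DEMO and the particular where-expression are assumptions for illustration.

```sas
/* Hypothetical sample data. */
data work.grocery_demo;
   input Sector $ Department $ Sales;
   datalines;
ne np1 500
ne p1 490
nw np1 1070
se np1 300
sw p1 250
;
run;

proc report data=work.grocery_demo nowd;
   /* Only observations that satisfy the where-expression are used:
      the two northern sectors, with sales above 400. */
   where Sector in ('ne', 'nw') and Sales > 400;
   column Sector Department Sales;
   define Sector     / group;
   define Department / group;
   define Sales      / analysis sum format=dollar10.2;
run;
```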
Buttons
OK
Applies the where-expression to the report and closes the WHERE window.
Cancel
WHERE ALSO
Selects observations from the data set that meet the conditions that you specify and any other
conditions that are already in effect.
Path
Subset > Where Also
Description
Enter a where-expression in the Enter where also clause field. A
where-expression is an arithmetic or logical expression that generally consists of a
sequence of operands and operators. For information about constructing a
where-expression, see the documentation of the WHERE statement in the chapter on
statements in SAS Language Reference: Dictionary.
Buttons
OK
Adds the where-expression to any other where-expressions that are already in effect
and applies them all to the report. It also closes the WHERE ALSO window.
Cancel
Sequence of Events
This section explains the general process of building a report. For examples that
illustrate this process, see Report-Building Examples on page 937. The sequence of
events is the same whether you use programming statements or the windowing
environment.
To understand the process of building a report, you must understand the difference
between report variables and temporary variables. Report variables are variables that
are specified in the COLUMN statement. A report variable can come from the input
data set or can be computed (that is, the DEFINE statement for that variable specifies
the COMPUTED option). A report variable might or might not appear in a compute
block. Variables that appear only in one or more compute blocks are temporary
variables. Temporary variables do not appear in the report and are not written to the
output data set (if one is requested).
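The distinction can be sketched as follows. In this illustrative step, Pct is a report variable because it appears in the COLUMN statement, while GrandTotal appears only in compute blocks and is therefore a temporary variable. The data set WORK.GROCERY_DEMO and both variable names are assumptions.

```sas
/* Hypothetical sample data. */
data work.grocery_demo;
   input Department $ Sales;
   datalines;
np1 500
np1 340
p1 490
p1 1055
;
run;

proc report data=work.grocery_demo nowd;
   column Department Sales Pct;
   define Department / group;
   define Sales      / analysis sum format=dollar10.2;
   define Pct        / computed format=percent8.1 'Share';

   /* GrandTotal is a temporary variable: it is not in the COLUMN
      statement, so it never appears in the report. It keeps its value
      from row to row because only report variables are reset at the
      start of each row. */
   compute before;
      GrandTotal = Sales.sum;
   endcomp;

   compute Pct;
      Pct = Sales.sum / GrandTotal;
   endcomp;
run;
```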
PROC REPORT constructs a report as follows:
1 It consolidates the data by group, order, and across variables. It calculates all
statistics for the report, those for detail rows as well as those for summary lines in
breaks. Statistics include those computed for analysis variables. PROC REPORT
calculates statistics for summary lines whether or not they appear in the report. It
stores all this information in a temporary file.
2 It initializes all temporary variables to missing.
3 It begins constructing the rows of the report.
a At the beginning of each row, it initializes all report variables to missing.
• Values for all other variables come from the temporary file that was
created at the beginning of the report-building process.
c Whenever it comes to a break, PROC REPORT first constructs the break
lines that are created with the BREAK or RBREAK statement or with
options in the BREAK window. If there is a compute block attached to the
break, then PROC REPORT executes the statements in the compute
block. See "Construction of Summary Lines" on page 937 for details.
Note: Because of the way PROC REPORT builds a report, you can
• use group statistics in compute blocks for a break before the group variable.
• use statistics for the whole report in a compute block at the beginning of the
report.
This document references these statistics with the appropriate compound name.
For information about referencing report items in a compute block, see "Four Ways
to Reference Report Items in a Compute Block" on page 859.
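Both points in the note can be sketched with compute blocks that write a customized line before the detail rows they summarize. The data set WORK.GROCERY_DEMO and the wording of the lines are assumptions for illustration. Sales.sum is the compound name for the Sum statistic of Sales.

```sas
/* Hypothetical sample data. */
data work.grocery_demo;
   input Sector $ Department $ Sales;
   datalines;
ne np1 500
ne p1 490
nw np1 1070
nw p1 1055
;
run;

proc report data=work.grocery_demo nowd;
   column Sector Department Sales;
   define Sector     / group;
   define Department / group;
   define Sales      / analysis sum format=dollar10.2;

   /* Statistic for the whole report, available at the beginning
      of the report. */
   compute before;
      line 'Report total: ' Sales.sum dollar10.2;
   endcomp;

   /* Group statistic, available in a break before the group. */
   compute before Sector;
      line 'Sector total: ' Sales.sum dollar10.2;
   endcomp;
run;
```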
Report-Building Examples
Building a Report That Uses Groups and a Report Summary
The report in Output 42.2 contains five columns:
• Sector and Department are group variables.
• Sales is an analysis variable that is used to calculate the Sum statistic.
• Profit is a computed variable whose value is based on the value of Department.
• The N statistic indicates how many observations each row represents.
At the end of the report a break summarizes the statistics and computed variables in
the report and assigns to Sector the value of TOTALS:.
The following statements produce Output 42.2. The user-defined formats that are
used are created by a PROC FORMAT step on page 949.
Output 42.2

Sector     Department      Sales     Profit          N
------------------------------------------------------
Northeast  Canned        $840.00    $336.00          2
           Meat/Dairy    $490.00    $122.50          2
           Paper         $290.00    $116.00          2
           Produce       $211.00     $52.75          2
Northwest  Canned      $1,070.00    $428.00          3
           Meat/Dairy  $1,055.00    $263.75          3
           Paper         $150.00     $60.00          3
           Produce       $179.00     $44.75          3
=========              =========  =========  =========
TOTALS:                $4,285.00  $1,071.25         20
=========              =========  =========  =========

1 PROC REPORT starts building the report by consolidating the data (Sector and
Department are group variables) and by calculating the statistics (Sales.sum and
N) for each detail row and for the break at the end of the report. It stores these
values in a temporary file.
2 Now, PROC REPORT is ready to start building the first row of the report. This
report does not contain a break at the beginning of the report or a break before
any groups, so the first row of the report is a detail row. The procedure initializes
all report variables to missing, as Figure 42.9 on page 939 illustrates. Missing
values for a character variable are represented by a blank, and missing values for
a numeric variable are represented by a period.
Figure 42.9
Sector     Department  Sales    Profit

3 Figure 42.10 on page 939 illustrates the construction of the first three columns of
the row. PROC REPORT fills in values for the row from left to right. Values come
from the temporary file that is created at the beginning of the report-building
process.
Figure 42.10 First Detail Row with Values Filled in from Left to Right
Sector     Department  Sales    Profit
Northeast

Sector     Department  Sales    Profit
Northeast  Canned

Sector     Department  Sales    Profit
Northeast  Canned      $840.00
4 The next column in the report contains the computed variable Profit. When it gets
to this column, PROC REPORT executes the statements in the compute block that
is attached to Profit. Nonperishable items (which have a value of np1 or np2)
return a profit of 40%; perishable items (which have a value of p1 or p2) return a
profit of 25%.
   if department='np1' or department='np2'
      then profit=0.4*sales.sum;
   else profit=0.25*sales.sum;
Note: Because PROC REPORT fills in the values of a row from left to right,
you cannot base the calculation of a computed variable on any variable that
appears to its right in the report.
Figure 42.11 A Computed Variable Added to the First Detail Row
Sector     Department  Sales    Profit
Northeast  Canned      $840.00  $336.00

5 Next, PROC REPORT fills in the value for the N statistic. The value comes from
the temporary file.
Sector     Department  Sales    Profit
Northeast  Canned      $840.00  $336.00

Sector     Department  Sales      Profit     N
                       $4,285.00  $1,071.25  20
9 If no compute block is attached to the break, then the preliminary version of the
summary line is the same as the final version. However, in this example, a
compute block is attached to the break. Therefore, PROC REPORT now executes
the statements in that compute block. In this case, the compute block contains one
statement:
   sector='TOTALS:';
This statement replaces the value of Sector, which in the summary line is
missing by default, with the word TOTALS:. After PROC REPORT executes the
statement, it modifies the summary line to reflect this change to the value of
Sector. The final version of the summary line appears in Figure 42.14 on page 941.
Sector   Department  Sales      Profit     N
TOTALS:              $4,285.00  $1,071.25  20
10 Finally, PROC REPORT writes all the break lines, with underlining, overlining,
• Sctrpct is a computed variable whose values are based on the values of Sales and a
temporary variable, Sctrtot, which is the total sales for a sector.
At the beginning of the report, a customized report summary tells what the sales for
all stores are. At a break before each group of observations for a department, a default
summary summarizes the data for that sector. At the end of each group a break inserts
a blank line.
The following statements produce Output 42.3. The user-defined formats that are
used are created by a PROC FORMAT step on page 949.
Note: Calculations of the percentages do not multiply their results by 100 because
PROC REPORT prints them with the PERCENT. format.
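As a small illustration of that note, the following DATA step is a sketch that divides one department's sales by its sector total and writes the result both unformatted and with the PERCENT9.2 format. The variable names are hypothetical; the two numbers are the Northeast Canned sales and the Northeast sector total that appear in Output 42.3.

```sas
/* Sketch only: variable names are illustrative, not from the report. */
data _null_;
   sector_total = 1831;                      /* Northeast sector sales */
   dept_sales   = 840;                       /* Northeast Canned sales */
   sctrpct      = dept_sales / sector_total; /* a proportion, not multiplied by 100 */
   put 'Unformatted: ' sctrpct 8.4;
   put 'PERCENT9.2:  ' sctrpct percent9.2;   /* the format scales by 100 and adds % */
run;
```

Multiplying by 100 before applying the PERCENT. format would inflate the displayed value by a factor of 100.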
   /* The PROC REPORT and COLUMN statements are missing from this text.  */
   /* The next two statements are reconstructed from the DEFINE          */
   /* statements and from the NOHEADER option that the text describes.   */
   proc report data=grocery nowd noheader;
      column sector department sales sctrpct sales=salespct;

      define sector     / 'Sector' group
                          format=$sctrfmt.;
      define department / group format=$deptfmt.;
      define sales      / analysis sum
                          format=dollar9.2 ;
      define sctrpct    / computed
                          format=percent9.2 ;
      define salespct   / pctsum format=percent9.2;

      compute before;
         line ' ';
         line @16 'Total for all stores is '
              sales.sum dollar9.2;
         line ' ';
         line @29 'Sum of' @40 'Percent'
              @51 'Percent of';
         line @6 'Sector' @17 'Department'
              @29 'Sales'
              @40 'of Sector' @51 'All Stores';
         line @6 55*'=';
         line ' ';
      endcomp;

      break before sector / summarize ul;

      compute before sector;
         sctrtot=sales.sum;
         sctrpct=sales.sum/sctrtot;
      endcomp;

      compute sctrpct;
         sctrpct=sales.sum/sctrtot;
      endcomp;

      break after sector/skip;
      where sector contains 'n';
      title 'Report for Northeast and Northwest Sectors';
   run;
Output 42.3
               Total for all stores is $4,285.00

                            Sum of     Percent    Percent of
     Sector     Department  Sales      of Sector  All Stores
     =======================================================

     Northeast              $1,831.00    100.00%     42.73%
     ---------              ---------  ---------  ---------
     Northeast  Canned        $840.00     45.88%     19.60%
                Meat/Dairy    $490.00     26.76%     11.44%
                Paper         $290.00     15.84%      6.77%
                Produce       $211.00     11.52%      4.92%

     Northwest              $2,454.00    100.00%     57.27%
     ---------              ---------  ---------  ---------
     Northwest  Canned      $1,070.00     43.60%     24.97%
                Meat/Dairy  $1,055.00     42.99%     24.62%
                Paper         $150.00      6.11%      3.50%
                Produce       $179.00      7.29%      4.18%
1 PROC REPORT starts building the report by consolidating the data (Sector and
Department are group variables) and by calculating the statistics (Sales.sum and
Sales.pctsum) for each detail row, for the break at the beginning of the report, for
the breaks before each group, and for the breaks after each group. It stores these
values in a temporary file.
2 PROC REPORT initializes the temporary variable, Sctrtot, to missing (see Figure
42.15).
Report Variables                                         Temporary Variable
Sector     Department  Sales.sum  Sctrpct  Sales.pctsum  Sctrtot
3 Because this PROC REPORT step contains a COMPUTE BEFORE statement, the
procedure constructs a preliminary summary line for the break at the beginning of
the report. This preliminary summary line contains values for the statistics
(Sales.sum and Sales.pctsum) and the computed variable (Sctrpct).
At this break, Sales.sum is the sales for all stores, and Sales.pctsum is the
percentage those sales represent for all stores (100%). PROC REPORT takes the
values for these statistics from the temporary file that it created at the beginning
of the report-building process.
The value for Sctrpct comes from executing the statements in the corresponding
compute block. Because the value of Sctrtot is missing, PROC REPORT cannot
calculate a value for Sctrpct. Therefore, in the preliminary summary line (which is
not printed in this case), this variable also has a missing value (see Figure 42.16
on page 944).
The statements in the COMPUTE BEFORE block do not alter any variables.
Therefore, the final summary line is the same as the preliminary summary line.
Note: The COMPUTE BEFORE statement creates a break at the beginning of
the report. You do not need to use an RBREAK statement.
Figure 42.16 Preliminary and Final Summary Line for the Break at the Beginning
of the Report
Report Variables                                         Temporary Variable
Sector     Department  Sales.sum  Sctrpct  Sales.pctsum  Sctrtot
                       $4,285.00           100.00%
4 Because the program does not include an RBREAK statement with the
SUMMARIZE option, PROC REPORT does not write the final summary line to the
report. Instead, it uses LINE statements to write a customized summary that
embeds the value of Sales.sum into a sentence and to write customized column
headers. (The NOHEADER option in the PROC REPORT statement suppresses
the default column headers, which would have appeared before the customized
summary.)
5 Next, PROC REPORT constructs a preliminary summary line for the break before
the first group of observations. (This break both uses the SUMMARIZE option in
the BREAK statement and has a compute block attached to it. Either of these
conditions generates a summary line.) The preliminary summary line contains
values for the break variable (Sector), the statistics (Sales.sum and Sales.pctsum),
and the computed variable (Sctrpct). At this break, Sales.sum is the sales for one
sector (the northeast sector). PROC REPORT takes the values for Sector,
Sales.sum, and Sales.pctsum from the temporary file that it created at the
beginning of the report-building process.
The value for Sctrpct comes from executing the statements in the corresponding
compute blocks. Because the value of Sctrtot is still missing, PROC REPORT
cannot calculate a value for Sctrpct. Therefore, in the preliminary summary line,
Sctrpct has a missing value (see Figure 42.17 on page 944).
Figure 42.17 Preliminary Summary Line for the Break before the First Group of
Observations
Report Variables                                         Temporary Variable
Sector     Department  Sales.sum  Sctrpct  Sales.pctsum  Sctrtot
Northeast              $1,831.00           42.73%
6 PROC REPORT creates the final version of the summary line by executing the
statements in the COMPUTE BEFORE SECTOR compute block.
• The first statement assigns the value of Sales.sum, which in that part of the
report represents total sales for one Sector, to the variable Sctrtot.
• The second statement recalculates Sctrpct from the new value of Sctrtot.
Figure 42.18
Report Variables                                         Temporary Variable
Sector     Department  Sales.sum  Sctrpct  Sales.pctsum  Sctrtot
Northeast              $1,831.00  100.00%  42.73%        $1,831.00
7 Because the program contains a BREAK BEFORE statement with the
SUMMARIZE option, PROC REPORT writes the final summary line to the report.
The UL option in the BREAK statement underlines the summary line.
8 Now, PROC REPORT is ready to start building the first detail row of the report. It
initializes all report variables to missing. Values for temporary variables do not
change. Figure 42.19 on page 945 illustrates the first detail row at this point.
Figure 42.19
Report Variables                                         Temporary Variable
Sector     Department  Sales.sum  Sctrpct  Sales.pctsum  Sctrtot
                                                         $1,831.00
9 Figure 42.20 on page 945 illustrates the construction of the first three columns of
the row. PROC REPORT fills in values for the row from left to right. The values
come from the temporary file that it created at the beginning of the report-building
process.
Figure 42.20
Report Variables                                         Temporary Variable
Sector     Department  Sales.sum  Sctrpct  Sales.pctsum  Sctrtot
Northeast                                                $1,831.00

Report Variables                                         Temporary Variable
Sector     Department  Sales.sum  Sctrpct  Sales.pctsum  Sctrtot
Northeast  Canned                                        $1,831.00

Report Variables                                         Temporary Variable
Sector     Department  Sales.sum  Sctrpct  Sales.pctsum  Sctrtot
Northeast  Canned      $840.00                           $1,831.00
10 The next column in the report contains the computed variable Sctrpct. When it
gets to this column, PROC REPORT executes the statement in the compute block
attached to Sctrpct. This statement calculates the percentage of the sector's total
sales that this department accounts for:
   sctrpct=sales.sum/sctrtot;
Figure 42.21 First Detail Row with the First Computed Variable Added
Report Variables                                         Temporary Variable
Sector     Department  Sales.sum  Sctrpct  Sales.pctsum  Sctrtot
Northeast  Canned      $840.00    45.88%                 $1,831.00

11 The next column in the report contains the statistic Sales.pctsum. PROC REPORT
gets this value from the temporary file. The first detail row is now complete (see
Figure 42.22 on page 946).
Figure 42.22
Report Variables                                         Temporary Variable
Sector     Department  Sales.sum  Sctrpct  Sales.pctsum  Sctrtot
Northeast  Canned      $840.00    45.88%   19.60%        $1,831.00
12 PROC REPORT writes the detail row to the report. It repeats steps 8, 9, 10, 11,
Figure 42.23 Preliminary Summary Line for the Break before the Second Group of
Observations
Report Variables                                         Temporary Variable
Sector     Department  Sales.sum  Sctrpct  Sales.pctsum  Sctrtot
Northwest              $2,454.00  134.00%  57.27%        $1,831.00
CAUTION:
Synchronize values for computed variables in break lines to prevent incorrect results.
If the PROC REPORT step does not recalculate Sctrpct in the compute block
that is attached to the break, then the value in the final summary line will not
be synchronized with the other values in the summary line, and the report will
be incorrect.
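The effect that this caution describes can be sketched outside PROC REPORT with a DATA step. The step below is only an illustration with hypothetical variable names: it divides the Northwest sector total by a Sctrtot value left over from the Northeast sector, and then repeats the division after Sctrtot has been updated.

```sas
/* Sketch only: mimics the stale temporary variable with plain variables. */
data _null_;
   sctrtot   = 1831;               /* left over from the Northeast sector */
   sales_sum = 2454;               /* Northwest sector total              */

   stale = sales_sum / sctrtot;    /* uses the old sector total           */

   sctrtot = sales_sum;            /* what COMPUTE BEFORE SECTOR does     */
   fresh   = sales_sum / sctrtot;  /* recalculated with the new total     */

   put stale= percent9.2 fresh= percent9.2;
run;
```

The stale value is well above 100%, in line with the preliminary summary line in Figure 42.23, while the recalculated value is 100%.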
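15 PROC REPORT creates the final version of the summary line by executing the
statements in the COMPUTE BEFORE SECTOR compute block.
• The first statement assigns the value of Sales.sum, which in that part of the
report represents sales for the Northwest sector, to the variable Sctrtot.
• The second statement recalculates Sctrpct from the new value of Sctrtot.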
Figure 42.24 Final Summary Line for the Break before the Second Group of
Observations
Report Variables                                         Temporary Variable
Sector     Department  Sales.sum  Sctrpct  Sales.pctsum  Sctrtot
Northwest              $2,454.00  100.00%  57.27%        $2,454.00
FORMAT statement
FORMAT procedure:
   LIBRARY=
SAS system options:
   FMTSEARCH=
Automatic macro variables:
   SYSDATE
This example uses a permanent data set and permanent formats to create a report
that contains
Program
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
   libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
   options nodate pageno=1 linesize=64 pagesize=60;
Create the GROCERY data set. GROCERY contains one day's sales figures for eight stores in
the Grocery Mart chain. Each observation contains one day's sales data for one department in
one store.
   data grocery;
      /* The trailing @@ reads several observations from each data line */
      input Sector $ Manager $ Department $ Sales @@;
      datalines;
   se 1 np1 50    se 1 p1 100   se 1 np2 120   se 1 p2 80
   se 2 np1 40    se 2 p1 300   se 2 np2 220   se 2 p2 70
   nw 3 np1 60    nw 3 p1 600   nw 3 np2 420   nw 3 p2 30
   nw 4 np1 45    nw 4 p1 250   nw 4 np2 230   nw 4 p2 73
   nw 9 np1 45    nw 9 p1 205   nw 9 np2 420   nw 9 p2 76
   sw 5 np1 53    sw 5 p1 130   sw 5 np2 120   sw 5 p2 50
   sw 6 np1 40    sw 6 p1 350   sw 6 np2 225   sw 6 p2 80
   ne 7 np1 90    ne 7 p1 190   ne 7 np2 420   ne 7 p2 86
   ne 8 np1 200   ne 8 p1 300   ne 8 np2 420   ne 8 p2 125
   ;
Create the $SCTRFMT., $MGRFMT., and $DEPTFMT. formats. PROC FORMAT creates
permanent formats for Sector, Manager, and Department. The LIBRARY= option specifies a
permanent storage location so that the formats are available in subsequent SAS sessions. These
formats are used for examples throughout this section.
   proc format library=proclib;
      value $sctrfmt 'se' = 'Southeast'
                     'ne' = 'Northeast'
                     'nw' = 'Northwest'
                     'sw' = 'Southwest';

      value $mgrfmt '1' = 'Smith'   '2' = 'Jones'
                    '3' = 'Reveiz'  '4' = 'Brown'
                    '5' = 'Taylor'  '6' = 'Adams'
                    '7' = 'Alomar'  '8' = 'Andrews'
                    '9' = 'Pelfrey';

      value $deptfmt 'np1' = 'Paper'
                     'np2' = 'Canned'
                     'p1'  = 'Meat/Dairy'
                     'p2'  = 'Produce';
   run;
Specify the format search library. The SAS system option FMTSEARCH= adds the SAS
data library PROCLIB to the search path that is used to locate formats.
options fmtsearch=(proclib);
Specify the report options. The NOWD option runs the REPORT procedure without the
REPORT window and sends its output to the open output destination(s).
proc report data=grocery nowd;
Specify the report columns. The report contains a column for Manager, Department, and
Sales. Because there is no DEFINE statement for any of these variables, PROC REPORT uses
the character variables (Manager and Department) as display variables and the numeric
variable (Sales) as an analysis variable that is used to calculate the sum statistic.
column manager department sales;
Produce a report summary. The RBREAK statement produces a default summary at the end
of the report. DOL writes a line of equal signs (=) above the summary information.
SUMMARIZE sums the value of Sales for all observations in the report.
rbreak after / dol summarize;
Select the observations to process. The WHERE statement selects for the report only the
observations for stores in the southeast sector.
   where sector='se';
Format the report columns. The FORMAT statement assigns formats to use in the report.
You can use the FORMAT statement only with data set variables.
format manager $mgrfmt. department $deptfmt.
sales dollar11.2;
Specify the titles. SYSDATE is an automatic macro variable that returns the date when the
SAS job or SAS session began. The TITLE2 statement uses double rather than single quotation
marks so that the macro variable resolves.
   title 'Sales for the Southeast Sector';
title2 "for &sysdate";
run;
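The difference between the two kinds of quotation marks can be seen in a short sketch. It is not part of the example program; it uses the SASHELP.CLASS sample data set only so that some output is produced under the titles.

```sas
/* Sketch only: contrasts single and double quotation marks in titles. */
title1 'for &sysdate';    /* single quotes: the text &sysdate is printed as is */
title2 "for &sysdate";    /* double quotes: the macro variable resolves        */

proc print data=sashelp.class(obs=1);
run;

title;                    /* clear the titles afterward */
```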
Output
         Sales for the Southeast Sector
                  for 04JAN02

      Manager  Department        Sales
      Smith    Paper            $50.00
      Smith    Meat/Dairy      $100.00
      Smith    Canned          $120.00
      Smith    Produce          $80.00
      Jones    Paper            $40.00
      Jones    Meat/Dairy      $300.00
      Jones    Canned          $220.00
      Jones    Produce          $70.00
                           ===========
                               $980.00
This example
• arranges the rows alphabetically by the formatted values of Manager and the
internal values of Department (so that sales for the two departments that sell
nonperishable goods precede sales for the two departments that sell perishable
goods)
• controls the default column width and the spacing between columns
• underlines the column headers and writes a blank line beneath the underlining
• creates a default summary of Sales for each manager
• creates a customized summary of Sales for the whole report.
Program
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
   libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
options nodate pageno=1 linesize=64 pagesize=60
fmtsearch=(proclib);
Specify the report options. The NOWD option runs PROC REPORT without the REPORT
window and sends its output to the open output destination(s). COLWIDTH=10 sets the default
column width to 10 characters. SPACING=5 puts five blank characters between columns.
HEADLINE underlines all column headers and the spaces between them at the top of each page
of the report. HEADSKIP writes a blank line beneath the underlining that HEADLINE writes.
proc report data=grocery nowd
colwidth=10
spacing=5
headline headskip;
Specify the report columns. The report contains a column for Manager, Department, and
Sales.
column manager department sales;
Define the sort order variables. The values of all variables with the ORDER option in the
DEFINE statement determine the order of the rows in the report. In this report, PROC
REPORT arranges the rows first by the value of Manager (because it is the first variable in the
COLUMN statement) and then by the values of Department.
ORDER= specifies the sort order for a variable. This report arranges the rows according to the
formatted values of Manager and the internal values of Department (np1, np2, p1, and p2).
FORMAT= specifies the formats to use in the report.
define manager / order order=formatted format=$mgrfmt.;
define department / order order=internal format=$deptfmt.;
Define the analysis variable. Sum calculates the sum statistic for all observations that are
represented by the current row. In this report each row represents only one observation.
Therefore, the Sum statistic is the same as the value of Sales for that observation in the input
data set. Using Sales as an analysis variable in this report enables you to summarize the values
for each group and at the end of the report.
define sales / analysis sum format=dollar7.2;
Produce a report summary. This BREAK statement produces a default summary after the
last row for each manager. OL writes a row of hyphens above the summary line. SUMMARIZE
writes the value of Sales (the only analysis or computed variable) in the summary line. PROC
REPORT sums the values of Sales for each manager because Sales is an analysis variable that
is used to calculate the Sum statistic. SKIP writes a blank line after the summary line.
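The BREAK statement that this paragraph describes does not appear in this text. The following sketch shows a statement that matches the description (a break after Manager with OL, SUMMARIZE, and SKIP), placed in a reduced version of the step so that it can run on its own. It assumes that the GROCERY data set and the $MGRFMT. and $DEPTFMT. formats from Example 1 are available; the exact layout of the original statement is not known.

```sas
/* Sketch only: assumes GROCERY and the user-defined formats exist. */
proc report data=grocery nowd
            colwidth=10 spacing=5
            headline headskip;
   column manager department sales;

   define manager    / order order=formatted format=$mgrfmt.;
   define department / order order=internal  format=$deptfmt.;
   define sales      / analysis sum format=dollar7.2;

   /* Default summary after the last row for each manager:           */
   /* OL overlines, SUMMARIZE writes the sum, SKIP adds a blank line */
   break after manager / ol summarize skip;

   where sector='se';
run;
```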
Produce a customized summary. This COMPUTE statement begins a compute block that
produces a customized summary at the end of the report. The LINE statement places the quoted
text and the value of Sales.sum (with the DOLLAR9.2 format) in the summary. An ENDCOMP
statement must end the compute block.
   compute after;
      line 'Total sales for these stores were: '
           sales.sum dollar9.2;
   endcomp;
Select the observations to process. The WHERE statement selects for the report only the
observations for stores in the southeast sector.
   where sector='se';
Output
         Sales for the Southeast Sector

      Manager     Department       Sales
      ----------------------------------
      Jones       Paper           $40.00
                  Canned         $220.00
                  Meat/Dairy     $300.00
                  Produce         $70.00
      -------                    -------
      Jones                      $630.00

      Smith       Paper           $50.00
                  Canned         $120.00
                  Meat/Dairy     $100.00
                  Produce         $80.00
      -------                    -------
      Smith                      $350.00

      Total sales for these stores were:   $980.00
Example 3: Using Aliases to Obtain Multiple Statistics for the Same Variable
Procedure features:
   COLUMN statement:
      with aliases
   COMPUTE statement arguments:
      AFTER
   DEFINE statement options:
      ANALYSIS
      MAX
      MIN
      NOPRINT
      customizing column headers
   LINE statement:
      pointer controls
      quoted text
      repeating a character string
      variable values and formats
      writing a blank line
Other features:
   Formats:
The customized summary at the end of this report displays the minimum and
maximum values of Sales over all departments for stores in the southeast sector. To
determine these values, PROC REPORT needs the MIN and MAX statistic for Sales in
every row of the report. However, to keep the report simple, the display of these
statistics is suppressed.
Program
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
   libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
   options nodate pageno=1 linesize=64 pagesize=60
           fmtsearch=(proclib);
Specify the report options. The NOWD option runs PROC REPORT without the REPORT
window and sends its output to the open output destination(s). HEADLINE underlines all
column headers and the spaces between them at the top of each page of the report. HEADSKIP
writes a blank line beneath the underlining that HEADLINE writes.
proc report data=grocery nowd headline headskip;
Specify the report columns. The report contains columns for Manager and Department. It
also contains three columns for Sales. The column specifications SALES=SALESMIN and
SALES=SALESMAX create aliases for Sales. These aliases enable you to use a separate
definition of Sales for each of the three columns.
column manager department sales
sales=salesmin
sales=salesmax;
Define the sort order variables. The values of all variables with the ORDER option in the
DEFINE statement determine the order of the rows in the report. In this report, PROC REPORT
arranges the rows first by the value of Manager (because it is the first variable in the COLUMN
statement) and then by the values of Department. The ORDER= option specifies the sort order
for a variable. This report arranges the values of Manager by their formatted values and
arranges the values of Department by their internal values (np1, np2, p1, and p2). FORMAT=
specifies the formats to use in the report. Text in quotation marks specifies column headings.
   define manager    / order
                       order=formatted
                       format=$mgrfmt.
                       'Manager';
   define department / order
                       order=internal
                       format=$deptfmt.
                       'Department';
Define the analysis variable. The value of an analysis variable in any row of a report is the
value of the statistic that is associated with it (in this case Sum), calculated for all observations
that are represented by that row. In a detail report each row represents only one observation.
Therefore, the Sum statistic is the same as the value of Sales for that observation in the input
data set.
   define sales / analysis sum format=dollar7.2 'Sales';
Define additional analysis variables for use in the summary. These DEFINE statements
use aliases from the COLUMN statement to create separate columns for the MIN and MAX
statistics for the analysis variable Sales. NOPRINT suppresses the printing of these statistics.
Although PROC REPORT does not print these values in columns, it has access to them so that
it can print them in the summary.
define salesmin / analysis min noprint;
define salesmax / analysis max noprint;
Print a horizontal line at the end of the report. This COMPUTE statement begins a
compute block that executes at the end of the report. The first LINE statement writes a blank
line. The second LINE statement writes 53 hyphens (-), beginning in column 7. Note that the
pointer control (@) has no effect on ODS destinations other than traditional SAS monospace
output.
   compute after;
      line ' ';
      line @7 53*'-';
Produce a customized summary. The first line of this LINE statement writes the text in
quotation marks, beginning in column 7. The second line writes the value of Salesmin with the
DOLLAR7.2 format, beginning in the next column. The cursor then moves one column to the
right (+1), where PROC REPORT writes the text in quotation marks. Again, the cursor moves
one column to the right, and PROC REPORT writes the value of Salesmax with the DOLLAR7.2
format. (Note that the program must reference the variables by their aliases.) The third line
writes the text in quotation marks, beginning in the next column. Note that the pointer control
(@) is designed for the Listing destination (traditional SAS output). It has no effect on ODS
destinations other than traditional SAS monospace output. The ENDCOMP statement ends the
compute block.
      line @7 '| Departmental sales ranged from '
              salesmin dollar7.2 +1 'to' +1 salesmax dollar7.2
              '. |';
      line @7 53*'-';
   endcomp;
Select the observations to process. The WHERE statement selects for the report only the
observations for stores in the southeast sector.
   where sector='se';
Specify the titles. SYSDATE is an automatic macro variable that returns the date when the
SAS job or SAS session began. The TITLE2 statement uses double rather than single quotation
marks so that the macro variable resolves.
   title 'Sales for the Southeast Sector';
   title2 "for &sysdate";
   run;
Output
         Sales for the Southeast Sector
                  for 04JAN02

      Manager  Department    Sales
      ----------------------------
      Jones    Paper        $40.00
               Canned      $220.00
               Meat/Dairy  $300.00
               Produce      $70.00
      Smith    Paper        $50.00
               Canned      $120.00
               Meat/Dairy  $100.00
               Produce      $80.00
Program
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
   libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
options nodate pageno=1 linesize=64 pagesize=60
fmtsearch=(proclib);
Specify the report options. The NOWD option runs PROC REPORT without the REPORT
window and sends its output to the open output destination(s). HEADLINE underlines all
column headings and the spaces between them at the top of each page of the report. HEADSKIP
writes a blank line beneath the underlining that HEADLINE writes.
proc report data=grocery nowd headline headskip;
Specify the report columns. The report contains columns for Sector, Manager, and Sales.
column sector manager sales;
Define the group and analysis variables. In this report, Sector and Manager are group
variables. Sales is an analysis variable that is used to calculate the Sum statistic. Each detail
row represents a set of observations that have a unique combination of formatted values for all
group variables. The value of Sales in each detail row is the sum of Sales for all observations in
the group. FORMAT= specifies the format to use in the report. Text in quotation marks in a
DEFINE statement specifies the column heading.
   define sector  / group
                    format=$sctrfmt.
                    'Sector';
   define manager / group
                    format=$mgrfmt.
                    'Manager';
   define sales   / analysis sum
                    format=comma10.2
                    'Sales';
Produce a report summary. This BREAK statement produces a default summary after the
last row for each sector. OL writes a row of hyphens above the summary line. SUMMARIZE
writes the value of Sales in the summary line. PROC REPORT sums the values of Sales for
each sector because Sales is an analysis variable used to calculate the Sum statistic.
SUPPRESS prevents PROC REPORT from displaying the value of Sector in the summary line.
SKIP writes a blank line after the summary line.
break after sector / ol
summarize
suppress
skip;
Produce a customized summary. This compute block creates a customized summary at the
end of the report. The LINE statement writes the quoted text and the value of Sales.sum (with a
format of DOLLAR9.2) in the summary. An ENDCOMP statement must end the compute block.
   compute after;
      line 'Combined sales for the northern sectors were '
           sales.sum dollar9.2 '.';
   endcomp;
Specify a format for the summary rows. In detail rows, PROC REPORT displays the value
of Sales with the format that is specified in its definition (COMMA10.2). The compute block
specifies an alternate format to use in the current column on summary rows. Summary rows are
identified as a value other than a blank for _BREAK_.
   compute sales;
      if _break_ ne ' ' then
         call define(_col_,"format","dollar11.2");
   endcomp;
Select the observations to process. The WHERE statement selects for the report only the
observations for stores in the northeast and northwest sectors. The TITLE statement specifies
the title.
   where sector contains 'n';
Output
        Sales Figures for Northern Sectors

      Sector     Manager          Sales
      ------------------------------
      Northeast  Alomar          786.00
                 Andrews       1,045.00
                             ----------
                              $1,831.00

      Northwest  Brown           598.00
                 Pelfrey         746.00
                 Reveiz        1,110.00
                             ----------
                              $2,454.00
Program
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
   libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
options nodate pageno=1 linesize=64 pagesize=60
fmtsearch=(proclib);
Specify the report options. The NOWD option runs PROC REPORT without the REPORT
window and sends its output to the open output destination(s). HEADLINE underlines the
column headings. HEADSKIP writes a blank line beneath the underlining that HEADLINE
writes. SPLIT= defines the split character as an asterisk (*) because the default split character
(/) is part of the name of a department.
   proc report data=grocery nowd
               headline
               headskip
               split='*';
Specify the report columns. Department and Sales are separated by a comma in the
COLUMN statement, so they collectively determine the contents of the column that they define.
Each item generates a header, but the header for Sales is set to blank in its definition. Because
Sales is an analysis variable, its values fill the cells that are created by these two variables.
column sector manager department,sales perish;
Define the group variables. In this report, Sector and Manager are group variables. Each
detail row of the report consolidates the information for all observations with the same values of
the group variables. FORMAT= specifies the formats to use in the report. Text in quotation
marks in the DEFINE statements specifies column headings. These statements illustrate two
ways to write a blank line in a column header. 'Sector' '' writes a blank line because each
quoted string is a line of the column heading. The two adjacent quotation marks write a blank
line for the second line of the heading. 'Manager* ' writes a blank line because the split
character (*) starts a new line of the heading. That line contains only a blank.
   define sector  / group format=$sctrfmt. 'Sector' '';
   define manager / group format=$mgrfmt. 'Manager* ';
Dene the across variable. PROC REPORT creates a column and a column heading for each
formatted value of the across variable Department. PROC REPORT orders the columns by these
values. PROC REPORT also generates a column heading that spans all these columns. Quoted
text in the DEFINE statement for Department customizes this heading. In traditional
(monospace) SAS output, PROC REPORT expands the heading with underscores to ll all
columns that are created by the across variable.
define department / across format=$deptfmt. _Department_;
Dene the analysis variable. Sales is an analysis variable that is used to calculate the sum
statistic. In each case, the value of Sales is the sum of Sales for all observations in one
department in one group. (In this case, the value represents a single observation.)
define sales / analysis sum format=dollar11.2 ' ';
Define the computed variable. The COMPUTED option indicates that PROC REPORT must
compute values for Perish. You compute the variable's values in a compute block that is
associated with Perish.
define perish / computed format=dollar11.2
'Perishable*Total';
Produce a report summary. This BREAK statement creates a default summary after the last
row for each value of Manager. The only option that is in use is SKIP, which writes a blank line.
You can use this technique to double-space in many reports that contain a group or order
variable.
break after manager / skip;
Calculate values for the computed variable. This compute block computes the value of
Perish from the values for the Meat/Dairy department and the Produce department. Because
the variables Sales and Department collectively define these columns, there is no way to
identify the values to PROC REPORT by name. Therefore, the assignment statement uses
column numbers to unambiguously specify the values to use. Each time PROC REPORT needs a
value for Perish, it sums the values in the third and fourth columns of that row of the report.
compute perish;
perish=sum(_c3_, _c4_);
endcomp;
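The column-number technique can be tried on a small scale. The following is a minimal sketch, not part of the original example: the data set WORK.MINI, its values, and the computed variable TOTAL are hypothetical and exist only to show how _Cn_ names map to report columns.

```sas
/* Hypothetical miniature data set, for illustration only */
data work.mini;
   input Manager $ Department $ Sales;
   datalines;
A p1 190
A p2 86
B p1 300
B p2 125
;
run;

proc report data=work.mini nowd;
   /* Report columns: _C1_=Manager, _C2_=Sales for p1, */
   /* _C3_=Sales for p2, _C4_=Total                     */
   column manager department,sales total;
   define manager    / group;
   define department / across;
   define sales      / analysis sum format=dollar11.2 ' ';
   define total      / computed format=dollar11.2 'Total';
   compute total;
      /* Columns under an across variable have no names, */
      /* so the assignment refers to them by position.   */
      total=sum(_c2_, _c3_);
   endcomp;
run;
```

Because Manager occupies the first report column here, the two across columns are _C2_ and _C3_. In the GROCERY report, Sector and Manager occupy the first two columns, which is why that program uses _C3_ and _C4_.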
Produce a customized summary. This compute block creates a customized summary at the
end of the report. The first LINE statement writes 57 hyphens (-) starting in column 4.
Subsequent LINE statements write the quoted text in the specified columns and the values of
the variables _C3_, _C4_, and _C5_ with the DOLLAR11.2 format. Note that the pointer control
(@) is designed for the Listing destination. It has no effect on ODS destinations other than
traditional SAS monospace output.
compute after;
   line @4 57*'-';
   line @4 '| Combined sales for meat and dairy : '
        @46 _c3_ dollar11.2 ' |';
   line @4 '| Combined sales for produce : '
        @46 _c4_ dollar11.2 ' |';
   line @4 '|' @60 '|';
   line @4 '| Combined sales for all perishables: '
        @46 _c5_ dollar11.2 ' |';
   line @4 57*'-';
endcomp;
Select the observations to process. The WHERE statement selects for the report only the
observations for departments p1 and p2 in stores in the northeast or northwest sector.
where sector contains 'n'
and (department='p1' or department='p2');
Output
    Sales Figures for Perishables in Northern Sectors

                    _______Department_______
Sector     Manager    Meat/Dairy      Produce   Perishable
                                                     Total
---------------------------------------------------------

Northeast  Alomar       $190.00       $86.00      $276.00
           Andrews      $300.00      $125.00      $425.00

Northwest  Brown        $250.00       $73.00      $323.00
           Pelfrey      $205.00       $76.00      $281.00
           Reveiz       $600.00       $30.00      $630.00

---------------------------------------------------------
| Combined sales for meat and dairy :       $1,545.00   |
| Combined sales for produce :                $390.00   |
|                                                       |
| Combined sales for all perishables:       $1,935.00   |
---------------------------------------------------------
Formats:
The report in this example displays six statistics for the sales for each manager's
store. The output is too wide to fit all the columns on one page, so three of the statistics
appear on the second page of the report. In order to make it easy to associate the
statistics on the second page with their group, the report repeats the values of Manager
and Sector on every page of the report.
Program
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
options nodate pageno=1 linesize=80 pagesize=60
fmtsearch=(proclib);
Specify the report options. The NOWD option runs PROC REPORT without the REPORT
window and sends its output to the open output destination(s). HEADLINE underlines all
column headings and the spaces between them at the top of each page of the report. HEADSKIP
writes a blank line beneath the underlining that HEADLINE writes. LS= sets the line size for
the report to 66, and PS= sets the page size to 18.
proc report data=grocery nowd headline headskip
ls=66 ps=18;
Specify the report columns. This COLUMN statement creates a column for Sector, Manager,
and each of the six statistics that are associated with Sales.
column sector manager (Sum Min Max Range Mean Std),sales;
Define the group variables and the analysis variable. ID specifies that Manager is an ID
variable. An ID variable and all columns to its left appear at the left of every page of a report.
In this report, Sector and Manager are group variables. Each detail row of the report
consolidates the information for all observations with the same values of the group variables.
FORMAT= specifies the formats to use in the report.
define manager / group format=$mgrfmt. id;
define sector / group format=$sctrfmt.;
define sales / format=dollar11.2 ;
Output
            Sales Statistics for All Sectors

                            Sum          Min          Max
Sector     Manager        Sales        Sales        Sales
---------------------------------------------------------

Northeast  Alomar       $786.00       $86.00      $420.00
           Andrews    $1,045.00      $125.00      $420.00
Northwest  Brown        $598.00       $45.00      $250.00
           Pelfrey      $746.00       $45.00      $420.00
           Reveiz     $1,110.00       $30.00      $600.00
Southeast  Jones        $630.00       $40.00      $300.00
           Smith        $350.00       $50.00      $120.00
Southwest  Adams        $695.00       $40.00      $350.00
           Taylor       $353.00       $50.00      $130.00

            Sales Statistics for All Sectors

                          Range         Mean          Std
Sector     Manager        Sales        Sales        Sales
---------------------------------------------------------

Northeast  Alomar       $334.00      $196.50      $156.57
           Andrews      $295.00      $261.25      $127.83
Northwest  Brown        $205.00      $149.50      $105.44
           Pelfrey      $375.00      $186.50      $170.39
           Reveiz       $570.00      $277.50      $278.61
Southeast  Jones        $260.00      $157.50      $123.39
           Smith         $70.00       $87.50       $29.86
Southwest  Adams        $310.00      $173.75      $141.86
           Taylor        $80.00       $88.25       $42.65
TITLE statement
WHERE statement
Data set: GROCERY on page 949
Formats:
The first PROC REPORT step in this example creates a report that displays one
value from each column of the report, using two rows to do so, before displaying another
value from the first column. (By default, PROC REPORT displays values for only as
many columns as it can fit on one page. It fills a page with values for these columns
before starting to display values for the remaining columns on the next page.)
Each item in the report is identified in the body of the report rather than in a column
header.
The report definition created by the first PROC REPORT step is stored in a catalog
entry. The second PROC REPORT step uses it to create a similar report for a different
sector of the city.
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
options nodate pageno=1 linesize=80 pagesize=60
fmtsearch=(proclib);
Specify the report options. The NOWD option runs PROC REPORT without the REPORT
window and sends its output to the open output destination(s). NAMED writes name= in front
of each value in the report, where name= is the column heading for the value. When you use
NAMED, PROC REPORT suppresses the display of column headings at the top of each page.
proc report data=grocery nowd
named
wrap
ls=64 ps=36
outrept=proclib.reports.namewrap;
Specify the report columns. The report contains a column for Sector, Manager, Department,
and Sales.
column sector manager department sales;
Define the display and analysis variables. Because no usage is specified in the DEFINE
statements, PROC REPORT uses the defaults. The character variables (Sector, Manager, and
Department) are display variables. Sales is an analysis variable that is used to calculate the
sum statistic. FORMAT= specifies the formats to use in the report.
define sector / format=$sctrfmt.;
define manager / format=$mgrfmt.;
define department / format=$deptfmt.;
define sales / format=dollar11.2;
Select the observations to process. A report definition might differ from the SAS program
that creates the report. In particular, PROC REPORT stores neither WHERE statements nor
TITLE statements.
where manager='1';
Specify the title. SYSDATE is an automatic macro variable that returns the date when the
SAS job or SAS session began. The TITLE statement uses double rather than single quotation
marks so that the macro variable resolves.
title "Sales Figures for Smith on &sysdate";
run;
Output
This is the output from the first PROC REPORT step, which creates the
report definition.
Manager=Smith   Department=Paper
Manager=Smith   Department=Meat/Dairy
Manager=Smith   Department=Canned
Manager=Smith   Department=Produce
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. FMTSEARCH= specifies the
library to include when searching for user-created formats.
options nodate pageno=1 fmtsearch=(proclib);
Specify the report options, load the report definition, and select the observations to
process. REPORT= uses the report definition that is stored in
PROCLIB.REPORTS.NAMEWRAP to produce the report. The second report differs from the
first one because it uses different WHERE and TITLE statements.
proc report data=grocery report=proclib.reports.namewrap
nowd;
where sector='sw';
title "Sales Figures for the Southwest Sector on &sysdate";
run;
Output
Sales Figures for the Southwest Sector on 04JAN02

Sector=Southwest  Manager=Taylor  Department=Paper
Sales=     $53.00
Sector=Southwest  Manager=Taylor  Department=Meat/Dairy
Sales=    $130.00
Sector=Southwest  Manager=Taylor  Department=Canned
Sales=    $120.00
Sector=Southwest  Manager=Taylor  Department=Produce
Sales=     $50.00
Sector=Southwest  Manager=Adams   Department=Paper
Sales=     $40.00
Sector=Southwest  Manager=Adams   Department=Meat/Dairy
Sales=    $350.00
Sector=Southwest  Manager=Adams   Department=Canned
Sales=    $225.00
Sector=Southwest  Manager=Adams   Department=Produce
Sales=     $80.00
FORMCHAR=
HEADLINE
LS=
PANELS=
PS=
PSPACE=
BREAK statement options:
SKIP
Other features:
Program
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
options nodate pageno=1 linesize=80 pagesize=60
fmtsearch=(proclib);
Specify the report options. The NOWD option runs PROC REPORT without the REPORT
window and sends its output to the open output destination(s). HEADLINE underlines all
column headings and the spaces between them at the top of each panel of the report.
FORMCHAR= sets the value of the second formatting character (the one that HEADLINE
uses) to the tilde (~). Therefore, the tilde underlines the column headings in the output.
HEADSKIP writes a blank line beneath the underlining that HEADLINE writes. LS= sets the
line size for the report to 64, and PS= sets the page size to 18. PANELS= creates a multipanel
report. Specifying PANELS=99 ensures that PROC REPORT fits as many panels as possible on
one page. PSPACE=6 places six spaces between panels.
proc report data=grocery nowd headline
formchar(2)='~'
panels=99 pspace=6
ls=64 ps=18;
Specify the report columns. The report contains a column for Manager, Department, and
Sales.
column manager department sales;
Define the sort order and analysis columns. The values of all variables with the ORDER
option in the DEFINE statement determine the order of the rows in the report. In this report,
PROC REPORT arranges the rows first by the value of Manager (because it is the first variable
in the COLUMN statement) and then, within each value of Manager, by the values of
Department. The ORDER= option specifies the sort order for a variable. This report arranges
the values of Manager by their formatted values and arranges the values of Department by their
internal values (np1, np2, p1, and p2). FORMAT= specifies the formats to use in the report.
define manager / order
order=formatted
format=$mgrfmt.;
define department / order
order=internal
format=$deptfmt.;
define sales / format=dollar7.2;
Produce a report summary. This BREAK statement produces a default summary after the
last row for each manager. Because SKIP is the only option in the BREAK statement, each
break consists of only a blank line.
break after manager / skip;
Select the observations to process. The WHERE statement selects for the report only the
observations for stores in the northwest or southwest sector.
where sector='nw' or sector='sw';
Output
                 Sales for the Western Sectors

Manager  Department    Sales      Manager  Department    Sales
~~~~~~~~~~~~~~~~~~~~~~~~~~~~      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Adams    Paper        $40.00      Reveiz   Paper        $60.00
         Canned      $225.00               Canned      $420.00
         Meat/Dairy  $350.00               Meat/Dairy  $600.00
         Produce      $80.00               Produce      $30.00

Brown    Paper        $45.00      Taylor   Paper        $53.00
         Canned      $230.00               Canned      $120.00
         Meat/Dairy  $250.00               Meat/Dairy  $130.00
         Produce      $73.00               Produce      $50.00

Pelfrey  Paper        $45.00
         Canned      $420.00
         Meat/Dairy  $205.00
         Produce      $76.00
The report in this example displays a record of one day's sales for each store. The
rows are arranged so that all the information about one store is together, and the
information for each store begins on a new page. Some variables appear in columns.
Others appear only in the page header that identifies the sector and the store's manager.
The header that appears at the top of each page is created with the _PAGE_
argument in the COMPUTE statement.
Profit is a computed variable based on the value of Sales and Department.
The text that appears at the bottom of the page depends on the total of Sales for the
store. Only the first two pages of the report appear here.
Program
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
options nodate pageno=1 linesize=64 pagesize=30
fmtsearch=(proclib);
Specify the report options. The NOWD option runs PROC REPORT without the REPORT
window and sends its output to the open output destination(s). HEADLINE underlines all
column headings and the spaces between them at the top of each page of the report.
HEADSKIP writes a blank line beneath the underlining that HEADLINE writes.
proc report data=grocery nowd
headline headskip;
Specify the report columns. The report contains a column for Sector, Manager, Department,
Sales, and Profit, but the NOPRINT option suppresses the printing of the columns for Sector and
Manager. The page heading (created later in the program) includes their values. To get these
variable values into the page heading, Sector and Manager must be in the COLUMN statement.
column sector manager department sales Profit;
Define the group, computed, and analysis variables. In this report, Sector, Manager, and
Department are group variables. Each detail row of the report consolidates the information for
all observations with the same values of the group variables. Profit is a computed variable
whose values are calculated in the next section of the program. FORMAT= specifies the formats
to use in the report.
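define sector / group noprint;
define manager / group noprint;
define profit / computed format=dollar11.2;
define sales / analysis sum format=dollar11.2;
define department / group format=$deptfmt.;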
Create a customized page header. This compute block executes at the top of each page, after
PROC REPORT writes the title. It writes the page heading for the current manager's store. The
LEFT option left-justifies the text in the LINE statements. Each LINE statement writes the
text in quotation marks just as it appears in the statement. The first two LINE statements
write a variable value with the format specified immediately after the variable's name.
compute before _page_ / left;
   line sector $sctrfmt. ' Sector';
   line 'Store managed by ' manager $mgrfmt.;
   line ' ';
   line ' ';
   line ' ';
endcomp;
Produce a report summary. This BREAK statement creates a default summary after the last
row for each manager. OL writes a row of hyphens above the summary line. SUMMARIZE
writes the values of Sales and Profit (the analysis and computed variables) in the summary
line. The PAGE option starts a new page after each default summary so that the page heading
that is created in the preceding compute block always pertains to the correct manager.
break after manager / ol summarize page;
Produce a customized summary. This compute block places conditional text in a customized
summary that appears after the last detail row for each manager.
compute after manager;
Specify the length of the customized summary text. The LENGTH statement assigns a
length of 35 to the temporary variable TEXT. In this particular case, the LENGTH statement is
unnecessary because the longest version appears in the first IF/THEN statement. However,
using the LENGTH statement ensures that even if the order of the conditional statements
changes, TEXT will be long enough to hold the longest version.
length text $ 35;
Specify the conditional logic for the customized summary text. You cannot use the LINE
statement in conditional statements (IF-THEN, IF-THEN/ELSE, and SELECT) because it does
not take effect until PROC REPORT has executed all other statements in the compute block.
These IF-THEN/ELSE statements assign a value to TEXT based on the value of Sales.sum in
the summary row. A LINE statement writes that variable, whatever its value happens to be.
if sales.sum lt 500 then
   text='Sales are below the target region.';
else if sales.sum ge 500 and sales.sum lt 1000 then
   text='Sales are in the target region.';
else if sales.sum ge 1000 then
   text='Sales exceeded goal!';
line ' ';
line text $35.;
endcomp;
run;
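The pattern of assigning the text conditionally and then writing it with a single unconditional LINE statement can be seen in isolation in the following minimal sketch. The data set WORK.STORES and its values are hypothetical and are not part of the GROCERY example.

```sas
/* Hypothetical data: two managers with different sales totals */
data work.stores;
   input Manager $ Sales;
   datalines;
A 300
A 150
B 700
B 400
;
run;

proc report data=work.stores nowd;
   column manager sales;
   define manager / group;
   define sales   / analysis sum format=dollar11.2;
   break after manager / summarize;
   compute after manager;
      length text $ 35;
      /* Choose the text first. LINE cannot be made conditional, */
      /* so it always executes and writes whatever TEXT holds.   */
      if sales.sum lt 500 then
         text='Sales are below the target region.';
      else if sales.sum lt 1000 then
         text='Sales are in the target region.';
      else text='Sales exceeded goal!';
      line text $35.;
   endcomp;
run;
```

In this sketch the total for manager A is 450 and the total for manager B is 1100, so the two summaries should carry different messages even though both are written by the same LINE statement.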
Output
Northeast Sector
Store managed by Alomar

Department        Sales       Profit
------------------------------------

Canned          $420.00      $168.00
Meat/Dairy      $190.00       $47.50
Paper            $90.00       $36.00
Produce          $86.00       $21.50
            -----------  -----------
                $786.00      $196.50

Department        Sales       Profit
------------------------------------

Canned          $420.00      $168.00
Meat/Dairy      $300.00       $75.00
Paper           $200.00       $80.00
Produce         $125.00       $31.25
            -----------  -----------
              $1,045.00      $261.25
TITLE statement
Data set: GROCERY on page 949
Formats: $MGRFMT. and $DEPTFMT. on page 949
The summary report in this example shows the total sales for each store and the
percentage that these sales represent of sales for all stores. Each of these columns has
its own header. A single header also spans all the columns. This header looks like a
title, but it differs from a title because it would be stored in a report definition. You
must submit a null TITLE statement whenever you use the report definition, or the
report will contain both a title and the spanning header.
The report includes a computed character variable, COMMENT, that flags stores
with an unusually high percentage of sales. The text of COMMENT wraps across
multiple rows. It makes sense to compute COMMENT only for individual stores.
Therefore, the compute block that does the calculation includes conditional code that
prevents PROC REPORT from calculating COMMENT on the summary line.
Program
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
options nodate pageno=1 linesize=64 pagesize=60
fmtsearch=(proclib);
Specify the report options. The NOWD option runs PROC REPORT without the REPORT
window and sends its output to the open output destination(s). HEADLINE underlines all
column headings and the spaces between them at the top of each page of the report. The null
TITLE statement suppresses the title of the report.
proc report data=grocery nowd headline;
title;
Specify the report columns. The COLUMN statement uses the text in quotation marks as a
spanning heading. The heading spans all the columns in the report because they are all
included in the pair of parentheses that contains the heading. The COLUMN statement
associates two statistics with Sales: Sum and Pctsum. The Sum statistic sums the values of
Sales for all observations that are included in a row of the report. The Pctsum statistic shows
what percentage of Sales that sum is for all observations in the report.
column ('Individual Store Sales as a Percent of All Sales'
sector manager sales,(sum pctsum) comment);
Define the group and analysis columns. In this report, Sector and Manager are group
variables. Each detail row represents a set of observations that have a unique combination of
formatted values for all group variables. Sales is, by default, an analysis variable that is used to
calculate the Sum statistic. However, because statistics are associated with Sales in the COLUMN
statement, those statistics override the default. FORMAT= specifies the formats to use in the
report. Text between quotation marks specifies the column heading.
define manager / group
format=$mgrfmt.;
define sector / group
format=$sctrfmt.;
define sales / format=dollar11.2
'';
define sum / format=dollar9.2
'Total Sales';
Define the percentage and computed columns. The DEFINE statement for Pctsum
specifies a column heading, a format, and a column width of 8. The PERCENT. format presents
the value of Pctsum as a percentage rather than a decimal. The DEFINE statement for
COMMENT defines it as a computed variable and assigns it a column width of 20 and a blank
column heading. The FLOW option wraps the text for COMMENT onto multiple lines if it
exceeds the column width.
define pctsum / 'Percent of Sales' format=percent6. width=8;
define comment / computed width=20 '' flow;
Calculate the computed variable. Options in the COMPUTE statement define COMMENT
as a character variable with a length of 40.
compute comment / char length=40;
Specify the conditional logic for the computed variable. For every store where sales
exceeded 15% of the sales for all stores, this compute block creates a comment that says "Sales
substantially above expectations." Of course, on the summary row for the report, the
value of Pctsum is 100%. However, it is inappropriate to flag this row as having exceptional sales.
The automatic variable _BREAK_ distinguishes detail rows from summary rows. In a detail row,
the value of _BREAK_ is blank. The THEN statement executes only on detail rows where the
value of Pctsum exceeds 0.15.
if sales.pctsum gt .15 and _break_ = ' '
   then comment='Sales substantially above expectations.';
else comment=' ';
endcomp;
Produce the report summary. This RBREAK statement creates a default summary at the
end of the report. OL writes a row of hyphens above the summary line. SUMMARIZE writes the
values of Sales.sum and Sales.pctsum in the summary line.
rbreak after / ol summarize;
run;
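The interaction between the Pctsum statistic and the _BREAK_ variable can be tried on a much smaller input. The following is a minimal sketch under stated assumptions: the data set WORK.STORES3, its three rows, and the comment text are hypothetical, and the 0.15 threshold is simply carried over from the example.

```sas
/* Hypothetical data: three stores whose sales total 1000 */
data work.stores3;
   input Manager $ Sales;
   datalines;
A 100
B 300
C 600
;
run;

proc report data=work.stores3 nowd;
   column manager sales,(sum pctsum) comment;
   define manager / group;
   define sales   / format=dollar11.2 '';
   define sum     / format=dollar9.2 'Total Sales';
   define pctsum  / 'Percent of Sales' format=percent6. width=8;
   define comment / computed width=20 '' flow;
   compute comment / char length=40;
      /* _BREAK_ is blank on detail rows, so the summary row */
      /* (where Pctsum is 100%) is never flagged.            */
      if sales.pctsum gt .15 and _break_ = ' '
         then comment='Above the threshold.';
      else comment=' ';
   endcomp;
   rbreak after / summarize;
run;
```

With these values, stores B and C account for 30% and 60% of sales and should be flagged, while store A (10%) and the summary row should not.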
Output
                                                               1
      Individual Store Sales as a Percent of All Sales
                        Total   Percent
Sector     Manager      Sales  of Sales
-------------------------------------------------------------
Northeast  Alomar     $786.00       12%
           Andrews  $1,045.00       17%  Sales substantially
                                         above expectations.
Northwest  Brown      $598.00        9%
           Pelfrey    $746.00       12%
           Reveiz   $1,110.00       18%  Sales substantially
                                         above expectations.
Southeast  Jones      $630.00       10%
           Smith      $350.00        6%
Southwest  Adams      $695.00       11%
           Taylor     $353.00        6%
                    ---------  --------
                    $6,313.00      100%
COLUMN statement
with the N statistic
Other features:
TITLE statement
Formats: $MGRFMT. on page 949
This example illustrates the difference between the way PROC REPORT handles
missing values for group (or order or across) variables with and without the MISSING
option. The differences in the reports are apparent if you compare the values of N for
each row and compare the totals in the default summary at the end of the report.
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
options nodate pageno=1 linesize=64 pagesize=60
fmtsearch=(proclib);
Create the GROCMISS data set. GROCMISS is identical to GROCERY except that it
contains some observations with missing values for Sector, Manager, or both.
data grocmiss;
   input Sector $ Manager $ Department $ Sales @@;
   datalines;
se 1 np1 50    .  1 p1 100   se . np2 120   se 1 p2 80
se 2 np1 40    se 2 p1 300   se 2 np2 220   se 2 p2 70
nw 3 np1 60    nw 3 p1 600   .  3 np2 420   nw 3 p2 30
nw 4 np1 45    nw 4 p1 250   nw 4 np2 230   nw 4 p2 73
nw 9 np1 45    nw 9 p1 205   nw 9 np2 420   nw 9 p2 76
sw 5 np1 53    sw 5 p1 130   sw 5 np2 120   sw 5 p2 50
.  . np1 40    sw 6 p1 350   sw 6 np2 225   sw 6 p2 80
ne 7 np1 90    ne 7 p1 190   ne 7 np2 420   ne 7 p2 86
ne 8 np1 200   ne 8 p1 300   ne 8 np2 420   ne 8 p2 125
;
Specify the report options. The NOWD option runs PROC REPORT without the REPORT
window and sends its output to the open output destination(s). HEADLINE underlines all
column headings and the spaces between them.
proc report data=grocmiss nowd headline;
Specify the report columns. The report contains columns for Sector, Manager, the N statistic,
and Sales.
column sector manager N sales;
Define the group and analysis variables. In this report, Sector and Manager are group
variables. Sales is, by default, an analysis variable that is used to calculate the Sum statistic.
Each detail row represents a set of observations that have a unique combination of formatted
values for all group variables. The value of Sales in each detail row is the sum of Sales for all
observations in the group. In this PROC REPORT step, the procedure does not include
observations with a missing value for the group variable. FORMAT= specifies formats to use in
the report.
define sector / group format=$sctrfmt.;
define manager / group format=$mgrfmt.;
define sales / format=dollar9.2;
Produce a report summary. This RBREAK statement creates a default summary at the end
of the report. DOL writes a row of equal signs above the summary line. SUMMARIZE writes the
values of N and Sales.sum in the summary line.
rbreak after / dol summarize;
Include the missing values. The MISSING option in the second PROC REPORT step includes
the observations with missing values for the group variable.
proc report data=grocmiss nowd headline missing;
column sector manager N sales;
define sector / group format=$sctrfmt.;
define manager / group format=$mgrfmt.;
define sales / format=dollar9.2;
rbreak after / dol summarize;
run;
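The effect of MISSING on N and on the report total can be checked with a few rows. This is a minimal sketch, not part of the original example: the data set WORK.GRP and its values are hypothetical.

```sas
/* Hypothetical data: one of four observations has a missing Sector. */
/* In list input, a single period is read as a missing value.        */
data work.grp;
   input Sector $ Sales;
   datalines;
ne 10
ne 20
. 5
sw 30
;
run;

/* Without MISSING: the observation with a missing Sector is excluded */
proc report data=work.grp nowd;
   column sector n sales;
   define sector / group;
   define sales  / analysis sum;
   rbreak after / summarize;
run;

/* With MISSING: the missing value forms its own group */
proc report data=work.grp nowd missing;
   column sector n sales;
   define sector / group;
   define sales  / analysis sum;
   rbreak after / summarize;
run;
```

The first summary line should report N=3 and a Sales total of 60, while the second should report N=4 and a total of 65, which is the same kind of difference that the GROCMISS reports illustrate.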
Other features:
This example uses WHERE processing as it builds an output data set. This
technique enables you to do WHERE processing after you have consolidated multiple
observations into a single row.
The first PROC REPORT step creates a report (which it does not display) in which
each row represents all the observations from the input data set for a single manager.
The second PROC REPORT step builds a report from the output data set. This report
uses line-drawing characters to separate the rows and columns.
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
options nodate pageno=1 linesize=64 pagesize=60
fmtsearch=(proclib);
Specify the report options and columns. The NOWD option runs PROC REPORT without
the REPORT window and sends its output to the open output destination(s). OUT= creates the
output data set TEMP. The output data set contains a variable for each column in the report
(Manager and Sales) as well as for the variable _BREAK_, which is not used in this example.
Each observation in the data set represents a row of the report. Because Manager is a group
variable and Sales is an analysis variable that is used to calculate the Sum statistic, each row
in the report (and therefore each observation in the output data set) represents multiple
observations from the input data set. In particular, each value of Sales in the output data set is
the total of all values of Sales for that manager. The WHERE= data set option in the OUT=
option filters those rows as PROC REPORT creates the output data set. Only those observations
with sales that exceed $1,000 become observations in the output data set.
proc report data=grocery nowd
out=temp( where=(sales gt 1000) );
column manager sales;
Define the group and analysis variables. Because the definitions of all report items in this
report include the NOPRINT option, PROC REPORT does not print a report. However, the
PROC REPORT step does execute and create an output data set.
define manager / group noprint;
define sales / analysis sum noprint;
run;
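The same filter-after-consolidation technique can be sketched with a few rows. The data set names WORK.STORES and WORK.BIG and the values below are hypothetical. PROC PRINT is used here only to display the result.

```sas
/* Hypothetical data: two managers, two observations each */
data work.stores;
   input Manager $ Sales;
   datalines;
A 300
A 150
B 700
B 400
;
run;

/* Consolidate by manager, then keep only rows whose total exceeds 1000. */
/* WHERE= is applied to the summarized rows as the output data set is    */
/* written, not to the input observations.                               */
proc report data=work.stores nowd
            out=work.big( where=(sales gt 1000) );
   column manager sales;
   define manager / group noprint;
   define sales   / analysis sum noprint;
run;

proc print data=work.big;
run;
```

No single input observation exceeds 1000, so a WHERE statement on the input would select nothing. Applied to the consolidated rows, the filter should keep only manager B, whose total is 1100.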
This is the output data set that PROC REPORT creates. It is used as
the input set in the second PROC REPORT step.
Manager    Sales    _BREAK_
3           1110
8           1045
Specify the report options and columns, define the group and analysis columns, and
specify the titles. DATA= specifies the output data set from the first PROC REPORT step as
the input data set for this report. The BOX option draws an outline around the output,
separates the column headings from the body of the report, and separates rows and columns of
data. The TITLE statements specify a title for the report.
proc report data=temp box nowd;
column manager sales;
define manager / group format=$mgrfmt.;
define sales / analysis sum format=dollar11.2;
title 'Managers with Daily Sales';
title2 'of over';
title3 'One Thousand Dollars';
run;
----------------------
|Manager        Sales|
|--------------------|
|Andrews|   $1,045.00|
|-------+------------|
|Reveiz |   $1,110.00|
----------------------
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
options nodate pageno=1 linesize=64 pagesize=60
fmtsearch=(proclib);
Specify the report options. The NOWD option runs PROC REPORT without the REPORT
window and sends its output to the open output destination(s). OUT= creates the output data
set PROFIT.
proc report data=grocery nowd out=profit;
Specify the report columns. The report contains columns for Manager, Department, Sales,
and Profit, which is not in the input data set. Because the purpose of this report is to generate
an output data set to use in another procedure, the report layout simply uses the default usage
for all the data set variables to list all the observations. DEFINE statements for the data set
variables are unnecessary.
column sector manager department sales Profit;
Define the computed column. The COMPUTED option tells PROC REPORT that Profit is
defined in a compute block somewhere in the PROC REPORT step.
define profit / computed;
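The compute block that defines Profit is not shown in this copy of the program. The following is a minimal sketch of what such a block could look like, continuing the PROC REPORT step above. The margins of 40% for nonperishable departments and 25% for perishable departments are an assumption inferred from the Sales and Profit values in the output data set, not a statement of the original program.

```sas
   /* Sketch only: margins are inferred from the output data set.    */
   /* Sales has the default ANALYSIS usage with the SUM statistic,   */
   /* so the compute block refers to it as sales.sum.                */
   compute profit;
      if department in ('np1', 'np2') then profit = 0.4 * sales.sum;
      else profit = 0.25 * sales.sum;
   endcomp;
run;
```

Because Profit is a computed variable, the compute block refers to it by name alone, while the analysis variable needs the compound name sales.sum.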
This is the output data set that is created by PROC REPORT. It is used
as input for PROC CHART.
Manager    Department    Sales    Profit    _BREAK_
1          np1              50     20.00
1          p1              100     25.00
1          np2             120     48.00
1          p2               80     20.00
2          np1              40     16.00
2          p1              300     75.00
2          np2             220     88.00
2          p2               70     17.50
3          np1              60     24.00
3          p1              600    150.00
3          np2             420    168.00
3          p2               30      7.50
4          np1              45     18.00
4          p1              250     62.50
4          np2             230     92.00
4          p2               73     18.25
9          np1              45     18.00
9          p1              205     51.25
9          np2             420    168.00
9          p2               76     19.00
5          np1              53     21.20
5          p1              130     32.50
5          np2             120     48.00
5          p2               50     12.50
6          np1              40     16.00
6          p1              350     87.50
6          np2             225     90.00
6          p2               80     20.00
7          np1              90     36.00
7          p1              190     47.50
7          np2             420    168.00
7          p2               86     21.50
8          np1             200     80.00
8          p1              300     75.00
8          np2             420    168.00
8          p2              125     31.25
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
options nodate pageno=1 linesize=80 pagesize=60
fmtsearch=(proclib);
Chart the data in the output data set. PROC CHART uses the output data set from the
previous PROC REPORT step to chart the sum of Profit for each sector.
proc chart data=profit;
block sector / sumvar=profit;
format sector $sctrfmt.;
format profit dollar7.2;
title 'Sum of Profit by Sector';
run;
[Block chart: Sum of Profit by Sector. Sector values: Northwest, Southeast, Southwest]
Formats:
This example shows how to use formats to control the number of groups that PROC
REPORT creates. The program creates a format for Department that classifies the four
departments as one of two types: perishable or nonperishable. Consequently, when
Department is an across variable, PROC REPORT creates only two columns instead of
four. The column header is the formatted value of the variable.
Program
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
specifies the library to include when searching for user-created formats.
options nodate pageno=1 linesize=64 pagesize=60
fmtsearch=(proclib);
Create the $PERISH. format. PROC FORMAT creates a format for Department. This
variable has four different values in the data set, but the format has only two values.
proc format;
value $perish 'p1','p2'='Perishable'
'np1','np2'='Nonperishable';
run;
Specify the report options. The NOWD option runs the REPORT procedure without the
REPORT window and sends its output to the open output destination(s). HEADLINE underlines
all column headings and the spaces between them at the top of each page of the report.
HEADSKIP writes a blank line beneath the underlining that HEADLINE writes.
proc report data=grocery nowd
headline
headskip;
Specify the report columns. Department and Sales are separated by a comma in the
COLUMN statement, so they collectively determine the contents of the column that they define.
Because Sales is an analysis variable, its values fill the cells that are created by these two
variables. The report also contains a column for Manager and a column for Sales by itself
(which is the sales for all departments).
column manager department,sales sales;
Define the group and across variables. Manager is a group variable. Each detail row of the
report consolidates the information for all observations with the same value of Manager.
Department is an across variable. PROC REPORT creates a column and a column heading for
each formatted value of Department. ORDER=FORMATTED arranges the values of Manager
and Department alphabetically according to their formatted values. FORMAT= specifies the
formats to use. The empty quotation marks in the definition of Department specify a blank
column heading, so no heading spans all the departments. However, PROC REPORT uses the
formatted values of Department to create a column heading for each individual department.
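The DEFINE statements that this paragraph describes do not appear in this copy of the program. A sketch that is consistent with the description (group and across usages, ORDER=FORMATTED, the $MGRFMT. and $PERISH. formats, and an empty heading for Department) might look like this. Treat the exact layout as an assumption.

```sas
   /* Sketch reconstructed from the description above. */
   define manager / group order=formatted
                    format=$mgrfmt.;
   define department / across order=formatted
                       format=$perish. '';
```

Because $PERISH. maps four internal values to two formatted values, the ACROSS usage yields two columns rather than four.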
Define the analysis variable. Sales is an analysis variable that is used to calculate the Sum
statistic. Sales appears twice in the COLUMN statement, and the same definition applies to both
occurrences. FORMAT= specifies the format to use in the report. WIDTH= specifies the width of
the column. Notice that the column headings for the columns that both Department and Sales
create are a combination of the heading for Department and the (default) heading for Sales.
define sales / analysis sum
format=dollar9.2 width=13;
Produce a customized summary. This COMPUTE statement begins a compute block that
produces a customized summary at the end of the report. The LINE statement places the quoted
text and the value of Sales.sum (with the DOLLAR9.2 format) in the summary. An ENDCOMP
statement must end the compute block.
compute after;
line ' ';
line 'Total sales for these stores were: '
sales.sum dollar9.2;
endcomp;
Output
Sales Summary for All Stores

           Nonperishable     Perishable
Manager            Sales          Sales          Sales
------------------------------------------------------
Adams            $265.00        $430.00        $695.00
Alomar           $510.00        $276.00        $786.00
Andrews          $620.00        $425.00      $1,045.00
Brown            $275.00        $323.00        $598.00
Jones            $260.00        $370.00        $630.00
Pelfrey          $465.00        $281.00        $746.00
Reveiz           $480.00        $630.00      $1,110.00
Smith            $170.00        $180.00        $350.00
Taylor           $173.00        $180.00        $353.00
Example 15: Specifying Style Elements for ODS Output in the PROC REPORT Statement
Procedure features:
Other features:
This example creates HTML, PDF, and RTF files and sets the style elements for each
location in the report in the PROC REPORT statement.
Program
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. FMTSEARCH= specifies the
library to include when searching for user-created formats. LINESIZE= and PAGESIZE= are
not set for this example because they have no effect on HTML, RTF, and Printer output.
options nodate pageno=1 fmtsearch=(proclib);
Specify the ODS output filenames. By opening multiple ODS destinations, you can produce
multiple output files in a single execution. The ODS HTML statement produces output that is
written in HTML. The ODS PDF statement produces output in Portable Document Format
(PDF). The ODS RTF statement produces output in Rich Text Format (RTF). The output from
PROC REPORT goes to each of these files.
ods html body='external-HTML-file';
ods pdf file='external-PDF-file';
ods rtf file='external-RTF-file';
Specify the report options. The NOWD option runs PROC REPORT without the REPORT
window. In this case, SAS writes the output to the traditional procedure output, the HTML body
file, and the RTF and PDF files.
proc report data=grocery nowd headline headskip
Specify the style attributes for the report. This STYLE= option sets the style element for
the structural part of the report. Because no style element is specified, PROC REPORT uses all
the style attributes of the default style element for this location except for CELLSPACING=,
BORDERWIDTH=, and BORDERCOLOR=.
style(report)=[cellspacing=5 borderwidth=10 bordercolor=blue]
Specify the style attributes for the column headings. This STYLE= option sets the style
element for all column headings. Because no style element is specified, PROC REPORT uses all
the style attributes of the default style element for this location except for those that are
specified here.
style(header)=[foreground=yellow
font_style=italic font_size=6]
Specify the style attributes for the report columns. This STYLE= option sets the style
element for all the cells in all the columns. Because no style element is specified, PROC
REPORT uses all the style attributes of the default style element for this location except for
those that are specified here.
style(column)=[foreground='moderate brown'
font_face=helvetica font_size=4]
Specify the style attributes for the compute block lines. This STYLE= option sets the
style element for all the LINE statements in all compute blocks. Because no style element is
specified, PROC REPORT uses all the style attributes of the default style element for this
location except for those that are specified here.
style(lines)=[foreground=white background=black
font_style=italic font_weight=bold font_size=5]
Specify the style attributes for report summaries. This STYLE= option sets the style
element for all the default summary lines. Because no style element is specified, PROC
REPORT uses all the style attributes of the default style element for this location except for
those that are specified here.
style(summary)=[foreground=cx3e3d73 background=cxaeadd9
font_face=helvetica font_size=3 just=r];
Specify the report columns. The report contains columns for Manager, Department, and
Sales.
column manager department sales;
Define the sort order variables. In this report Manager and Department are order variables.
PROC REPORT arranges the rows first by the value of Manager (because it is the first variable
in the COLUMN statement), then by the value of Department. For Manager, ORDER= specifies
that values of Manager are arranged according to their formatted values; similarly, for
Department, ORDER= specifies that values of Department are arranged according to their
internal values. FORMAT= specifies the format to use for each variable. Text in quotation
marks specifies the column headings.
define manager / order
order=formatted
format=$mgrfmt.
'Manager';
define department / order
order=internal
format=$deptfmt.
'Department';
Produce a report summary. The BREAK statement produces a default summary after the last
row for each manager. SUMMARIZE writes the values of Sales (the only analysis or computed
variable in the report) in the summary line. PROC REPORT sums the values of Sales for each
manager because Sales is an analysis variable that is used to calculate the Sum statistic.
break after manager / summarize;
Produce a customized summary. The COMPUTE statement begins a compute block that
produces a customized summary after each value of Manager. The LINE statement places the
quoted text and the values of Manager and Sales.sum (with the formats $MGRFMT. and
DOLLAR7.2) in the summary. An ENDCOMP statement must end the compute block.
compute after manager;
line 'Subtotal for ' manager $mgrfmt. 'is '
sales.sum dollar7.2 '.';
endcomp;
Select the observations to process. The WHERE statement selects for the report only the
observations for stores in the southeast sector.
where sector='se';
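The end of the step is not shown in this copy. As a hedged sketch, a step like this one would normally finish with a RUN statement, and each ODS destination that was opened earlier would then be closed so that SAS completes the corresponding file. The original program may also contain a TITLE statement at this point, which is not reproduced here.

```sas
run;

/* Close each destination that was opened with ODS HTML, ODS PDF, */
/* and ODS RTF so that the output files are finished.             */
ods html close;
ods pdf close;
ods rtf close;
```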
HTML Output
PDF Output
RTF Output
STYLE= option in
PROC REPORT statement
CALL DEFINE statement
COMPUTE statement
DEFINE statement
Other features:
This example creates HTML, PDF, and RTF files and sets the style elements for each
location in the report in the PROC REPORT statement. It then overrides some of these
settings by specifying style elements in other statements.
Program
Declare the PROCLIB library. The PROCLIB library is used to store user-created formats.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. FMTSEARCH= specifies the
library to include when searching for user-created formats. LINESIZE= and PAGESIZE= are
not set for this example because they have no effect on HTML, RTF, and Printer output.
options nodate pageno=1 fmtsearch=(proclib);
Specify the ODS output filenames. By opening multiple ODS destinations, you can produce
multiple output files in a single execution. The ODS HTML statement produces output that is
written in HTML. The ODS PDF statement produces output in Portable Document Format
(PDF). The ODS RTF statement produces output in Rich Text Format (RTF). The output from
PROC REPORT goes to each of these files.
ods html body='external-HTML-file';
ods pdf file='external-PDF-file';
ods rtf file='external-RTF-file';
Specify the report options. The NOWD option runs PROC REPORT without the REPORT
window. In this case, SAS writes the output to the traditional procedure output, the HTML body
file, and the RTF and PDF files.
proc report data=grocery nowd headline headskip
Specify the style attributes for the report. This STYLE= option sets the style element for
the structural part of the report. Because no style element is specified, PROC REPORT uses all
the style attributes of the default style element for this location except for those that are
specified here.
style(report)=[cellspacing=5 borderwidth=10 bordercolor=blue]
Specify the style attributes for the column headings. This STYLE= option sets the style
element for all column headings. Because no style element is specified, PROC REPORT uses all
the style attributes of the default style element for this location except for those that are
specified here.
style(header)=[foreground=yellow
font_style=italic font_size=6]
Specify the style attributes for the report columns. This STYLE= option sets the style
element for all the cells in all the columns. Because no style element is specified, PROC
REPORT uses all the style attributes of the default style element for this location except for
those that are specified here.
style(column)=[foreground='moderate brown'
font_face=helvetica font_size=4]
Specify the style attributes for the compute block lines. This STYLE= option sets the
style element for all the LINE statements in all compute blocks. Because no style element is
specified, PROC REPORT uses all the style attributes of the default style element for this
location except for those that are specified here.
style(lines)=[foreground=white background=black
font_style=italic font_weight=bold font_size=5]
Specify the style attributes for the report summaries. This STYLE= option sets the style
element for all the default summary lines. Because no style element is specified, PROC
REPORT uses all the style attributes of the default style element for this location except for
those that are specified here.
style(summary)=[foreground=cx3e3d73 background=cxaeadd9
font_face=helvetica font_size=3 just=r];
Specify the report columns. The report contains columns for Manager, Department, and
Sales.
column manager department sales;
Define the first sort order variable. In this report Manager is an order variable. PROC
REPORT arranges the rows first by the value of Manager (because it is the first variable in the
COLUMN statement). ORDER= specifies that values of Manager are arranged according to
their formatted values. FORMAT= specifies the format to use for this variable. Text in quotation
marks specifies the column headings.
define manager / order
order=formatted
format=$mgrfmt.
'Manager'
Specify the style attributes for the first sort order variable column heading. The
STYLE= option sets the foreground and background colors of the column heading for Manager.
The other style attributes for the column heading will match those that were established for the
HEADER location in the PROC REPORT statement.
style(header)=[foreground=white
background=black];
Define the second sort order variable. In this report Department is an order variable.
PROC REPORT arranges the rows first by the value of Manager (because it is the first variable
in the COLUMN statement), then by the value of Department. ORDER= specifies that values of
Department are arranged according to their internal values. FORMAT= specifies the format to
use for this variable. Text in quotation marks specifies the column heading.
define department / order
order=internal
format=$deptfmt.
'Department'
Specify the style attributes for the second sort order variable column. The STYLE=
option sets the font of the cells in the column Department to italic. The other style attributes for
the cells will match those that were established for the COLUMN location in the PROC
REPORT statement.
style(column)=[font_style=italic];
Produce a report summary. The BREAK statement produces a default summary after the last
row for each manager. SUMMARIZE writes the values of Sales (the only analysis or computed
variable in the report) in the summary line. PROC REPORT sums the values of Sales for each
manager because Sales is an analysis variable that is used to calculate the Sum statistic.
break after manager / summarize;
Produce a customized summary. The COMPUTE statement begins a compute block that
produces a customized summary after each value of Manager. This STYLE= option specifies the style
element to use for the text that is created by the LINE statement in this compute block. This
style element switches the foreground and background colors that were specified for the LINES
location in the PROC REPORT statement. It also changes the font style, the font weight, and
the font size.
compute after manager
/ style=[font_style=roman font_size=3 font_weight=bold
background=white foreground=black];
Specify the text for the customized summary. The LINE statement places the quoted text
and the values of Manager and Sales.sum (with the formats $MGRFMT. and DOLLAR7.2) in
the summary. An ENDCOMP statement must end the compute block.
line 'Subtotal for ' manager $mgrfmt. 'is '
sales.sum dollar7.2 '.';
endcomp;
Produce a customized background for the analysis column. This compute block specifies
a background color and a bold font for all cells in the Sales column that contain values of 100 or
greater and that are not summary lines.
compute sales;
if sales.sum>100 and _break_=' ' then
call define(_col_, "style",
"style=[background=yellow
font_face=helvetica
font_weight=bold]");
endcomp;
Select the observations to process. The WHERE statement selects for the report only the
observations for stores in the southeast sector.
where sector='se';
PDF Output
RTF Output
CHAPTER 43
The SORT Procedure
Overview: SORT Procedure 1003
What Does the SORT Procedure Do? 1003
Sorting SAS Data Sets 1004
Syntax: SORT Procedure 1005
PROC SORT Statement 1005
BY Statement 1012
Concepts: SORT Procedure 1013
Multi-threaded Sorting 1013
Using PROC SORT with a DBMS 1013
Sorting Orders for Numeric Variables 1013
Sorting Orders for Character Variables 1014
Default Collating Sequence 1014
EBCDIC Order 1014
ASCII Order 1014
Specifying Sorting Orders for Character Variables 1015
Stored Sort Information 1015
Integrity Constraints: SORT Procedure 1015
Results: SORT Procedure 1016
Procedure Output 1016
Output Data Set 1016
Examples: SORT Procedure 1016
Example 1: Sorting by the Values of Multiple Variables 1017
Example 2: Sorting in Descending Order 1019
Example 3: Maintaining the Relative Order of Observations in Each BY Group 1020
Example 4: Retaining the First Observation of Each BY Group 1023
documentation for your operating environment for information about other sorting
capabilities.
Output 43.1
NOTE: There were 6 observations read from the data set WORK.EMPLOYEE.
NOTE: The data set WORK.EMPLOYEE has 6 observations and 3 variables.
NOTE: PROCEDURE SORT used:
      real time           0.01 seconds
      cpu time            0.01 seconds
Output 43.2
Name          IDnumber
Belloit         1988
Wesley          2092
Lemeux          4210
Arnsbarger      5466
Pierce          5779
Capshaw         7338
The following output shows the results of a more complicated sort by three variables.
The businesses in this example are sorted by town, then by debt from highest amount
to lowest amount, then by account number. For an explanation of the program that
produces this output, see Example 2 on page 1019.
Output 43.3
                                                        Account
Obs    Company                   Town              Debt  Number
  1    Paul's Pizza              Apex             83.00    1019
  2    Peter's Auto Parts        Apex             65.79    7288
  3    Watson Tabor Travel       Apex             37.95    3131
  4    Tina's Pet Shop           Apex             37.95    5108
  5    Apex Catering             Apex             37.95    9923
  6    Deluxe Hardware           Garner          467.12    8941
  7    Boyd & Sons Accounting    Garner          312.49    4762
  8    World Wide Electronics    Garner          119.95    1122
  9    Elway Piano and Organ     Garner           65.79    5217
 10    Ice Cream Delight         Holly Springs   299.98    2310
 11    Tim's Burger Stand        Holly Springs   119.95    6335
 12    Strickland Industries     Morrisville     657.22    1675
 13    Pauline's Antiques        Morrisville     302.05    9112
 14    Bob's Beds                Morrisville     119.95    4998
To do this             Use this option
Specify ASCII          ASCII
Specify EBCDIC         EBCDIC
Specify Danish         DANISH
Specify Finnish        FINNISH
Specify Norwegian      NORWEGIAN
Specify Swedish        SWEDISH
NATIONAL
SORTSEQ=
DATA=
DATECOPY
OUT=
DUPOUT=
REVERSE
EQUALS
NOEQUALS
NODUPKEY
NODUPRECS
OVERWRITE
SORTSIZE=
FORCE
TAGSORT
THREADS
NOTHREADS
Options
Options can include one collating-sequence-option and multiple other options. The
order of the two types of options does not matter and both types are not necessary in
the same PROC SORT step.
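As an illustration of combining the two kinds of options, the following sketch uses one collating-sequence-option (ASCII) together with several other options. The data set WORK.ACCOUNT, its variables, and the output data set name are hypothetical and exist only to make the example self-contained.

```sas
/* Hypothetical input data for the sketch. */
data work.account;
   input town $ debt;
   datalines;
Apex 37.95
Garner 65.79
Apex 83.00
;
run;

proc sort data=work.account     /* input data set                       */
          out=work.by_town      /* leave the original data set in place */
          ascii                 /* one collating-sequence-option        */
          nodupkey equals;      /* other options, in any order          */
   by town;
run;
```

With NODUPKEY and EQUALS, the first observation in input order is kept for each value of Town, so the observation with a Debt of 37.95 is the one retained for Apex.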
Collating-Sequence-Options
step.
ASCII
sorts character variables using the ASCII collating sequence. You need this option
only when you sort by ASCII on a system where EBCDIC is the native collating
sequence.
See also: Sorting Orders for Character Variables on page 1014
DANISH
NORWEGIAN
sorts characters according to the Danish and Norwegian national standard.
EBCDIC
sorts character variables using the EBCDIC collating sequence. You need this option
only when you sort by EBCDIC on a system where ASCII is the native collating
sequence.
See also: Sorting Orders for Character Variables on page 1014
FINNISH
SWEDISH
sorts characters according to the Finnish and Swedish national standard. The
Finnish and Swedish collating sequence is shown in Figure 43.1 on page 1008.
NATIONAL
See DANISH.
SORTSEQ=collating-sequence
specifies the collating sequence. The value of collating-sequence can be any one of the
collating-sequence-options in the PROC SORT statement, or the value can be the
name of a translation table, either a default translation table or one that you have
created in the TRANTAB procedure. For an example of using PROC TRANTAB and
PROC SORT with SORTSEQ=, see "Using Different Translation Tables for Sorting" in
SAS National Language Support (NLS): User's Guide. The available translation
tables are
Danish
Finnish
Italian
Norwegian
Spanish
Swedish
The following figure shows how the alphanumeric characters in each language will
sort.
Figure 43.1
CAUTION:
If you use a host sort utility to sort your data, then specifying the SORTSEQ= option
might corrupt the character BY variables. For more information, see the PROC SORT
documentation for your operating environment.
SWEDISH
See FINNISH.
Other Options
DATA=SAS-data-set
identifies the input SAS data set.
DATECOPY
copies the SAS internal date and time when the SAS data set was created and the
date and time when it was last modified prior to the sort to the resulting sorted data
set. Note that the operating environment date and time are not preserved.
Restriction: DATECOPY can be used only when the resulting data set uses the V8
or V9 engine.
Tip: You can alter the file creation date and time with the DTC= option in the
MODIFY statement in PROC DATASETS. For more information, see MODIFY
Statement on page 348.
DUPOUT= SAS-data-set
specifies the output data set to which duplicate observations are written.
EQUALS | NOEQUALS
specifies the order of the observations in the output data set. For observations with
identical BY-variable values, EQUALS maintains the relative order of the
observations within the input data set in the output data set. NOEQUALS does not
necessarily preserve this order in the output data set.
Default: EQUALS
Interaction: When you use NODUPRECS or NODUPKEY to remove observations
in the output data set, the choice of EQUALS or NOEQUALS can affect which
observations are removed.
Interaction: I/O performance may be reduced when using the EQUALS option with
the multi-threaded sort because partitioned data sets will be processed as if they
are non-partitioned data sets.
Interaction: The NOEQUALS option is supported by the multi-threaded sort.
FORCE
sorts and replaces an indexed data set when the OUT= option is not specified.
Without the FORCE option, PROC SORT does not sort and replace an indexed data
set because sorting destroys user-created indexes for the data set. When you specify
FORCE, PROC SORT sorts and replaces the data set and destroys all user-created
indexes for the data set. Indexes that were created or required by integrity
constraints are preserved.
Tip: PROC SORT checks for the sort information before it sorts a data set so that
data is not re-sorted unnecessarily. By default, PROC SORT does not sort a data
set if the sort information matches the requested sort. You can use FORCE to
override this behavior. You might need to use FORCE if SAS cannot verify the sort
specification in the data set option SORTEDBY=. For more information about
SORTEDBY=, see the chapter on SAS data set options in SAS Language
Reference: Dictionary.
Restriction: If you use PROC SORT with the FORCE option on data sets that were
created with the Version 5 compatibility engine or with a sequential engine such
as a tape format engine, you must also specify the OUT= option.
NODUPKEY
checks for and eliminates observations with duplicate BY values. If you specify this
option, then PROC SORT compares all BY values for each observation to those for
the previous observation that is written to the output data set. If an exact match is
found, then the observation is not written to the output data set.
Operating Environment Information: If you use the VMS operating environment
sort, then the observation that is written to the output data set is not always the
first observation of the BY group.
Note: See NODUPRECS for information about eliminating duplicate
observations.
Interaction: When you are removing observations with duplicate BY values with
Tip:
Featured in:
NODUPRECS
checks for and eliminates duplicate observations. If you specify this option, then
PROC SORT compares all variable values for each observation to those for the
previous observation that was written to the output data set. If an exact match is
found, then the observation is not written to the output data set.
NOTHREADS
See THREADS | NOTHREADS.
OUT=SAS-data-set
names the output data set. If SAS-data-set does not exist, then PROC SORT creates
it.
CAUTION:
Use care when you use PROC SORT without OUT=. Without OUT=, data could be lost
if your system failed during execution of PROC SORT.
Default: Without OUT=, PROC SORT overwrites the original data set.
Tip: You can use data set options with OUT=.
Featured in:
OVERWRITE
enables the input data set to be deleted before the replacement output data set is
populated with observations.
Restriction: The OVERWRITE option has no effect if you also specify the
TAGSORT option. You cannot overwrite the input data set because TAGSORT
must reread the input data set while populating the output data set.
Restriction: The OVERWRITE option is supported by the SAS sort and SAS
multi-threaded sort only. The option has no effect if you are using a host sort.
Tip: Using the OVERWRITE option can reduce disk space requirements.
CAUTION:
Use the OVERWRITE option only with a data set that is backed up or with a data set that
you can reconstruct. Because the input data set is deleted, data will be lost if a
failure occurs while the output data set is being written.
REVERSE
sorts character variables using a collating sequence that is reversed from the normal
collating sequence.
Operating Environment Information: For information about the normal collating
sequence for your operating environment, see EBCDIC Order on page 1014, ASCII
Order on page 1014, and the SAS documentation for your operating environment.
Interaction: Using REVERSE with the DESCENDING option in the BY statement
SORTSIZE=memory-specification
specifies the maximum amount of memory that is available to PROC SORT. Valid
values for memory-specification are as follows:
MAX
specifies that all available memory can be used.
n
specifies the amount of memory in bytes, where n is a real number.
nK
specifies the amount of memory in kilobytes, where n is a real number.
nM
specifies the amount of memory in megabytes, where n is a real number.
nG
specifies the amount of memory in gigabytes, where n is a real number.
Specifying the SORTSIZE= option in the PROC SORT statement temporarily
overrides the SAS system option SORTSIZE=. For more information about
SORTSIZE=, see the chapter on SAS system options in SAS Language Reference:
Dictionary.
Operating Environment Information: Some system sort utilities may treat this
option differently. Refer to the SAS documentation for your operating environment.
Tip:
TAGSORT
stores only the BY variables and the observation numbers in temporary files. The BY
variables and the observation numbers are called tags. At the completion of the
sorting process, PROC SORT uses the tags to retrieve records from the input data set
in sorted order.
Restriction: The TAGSORT option is not compatible with the OVERWRITE option.
Interaction: The TAGSORT option is not supported by the multi-threaded sort.
Tip: When the total length of BY variables is small compared with the record
length, TAGSORT reduces temporary disk usage considerably. However,
processing time may be much higher.
THREADS | NOTHREADS
option THREADS. For more information about THREADS, see the chapter on SAS
system options in SAS Language Reference: Dictionary.
Interaction: The THREADS option is honored if the value of the SAS system option
Note:
BY Statement
Specifies the sorting variables
Featured in:
1023
Required Arguments
variable
specifies the variable by which PROC SORT sorts the observations. PROC SORT
first arranges the data set by the values in ascending order, by default, of the first
BY variable. PROC SORT then arranges any observations that have the same value
of the first BY variable by the values of the second BY variable in ascending order.
This sorting continues for every specified BY variable.
Option
DESCENDING
reverses the sort order for the variable that immediately follows in the statement so
that observations are sorted from the largest value to the smallest value.
Featured in:
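The BY statement rules above can be sketched as follows. The data set and variable names are hypothetical. DESCENDING applies only to the variable that immediately follows it, so in this sketch Town and AccountNumber sort in ascending order while Debt sorts from largest to smallest within each town.

```sas
/* Hypothetical input data for the sketch. */
data work.account;
   input town $ debt accountnumber;
   datalines;
Garner 65.79 5217
Apex 37.95 5108
Apex 83.00 1019
Apex 37.95 3131
;
run;

proc sort data=work.account out=work.sorted;
   by town descending debt accountnumber;
run;

proc print data=work.sorted;
run;
```

The two Apex observations that share a Debt of 37.95 are ordered by the third BY variable, AccountNumber, so 3131 comes before 5108.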
Multi-threaded Sorting
The SAS system option THREADS activates multi-threaded sorting, which is new
with SAS System 9. Multi-threaded sorting achieves a degree of parallelism in the
sorting operations. This parallelism is intended to reduce the real-time to completion
for a given operation at the possible cost of additional CPU resources. For more
information, see the section on Support for Parallel Processing in SAS Language
Reference: Concepts.
The performance of the multi-threaded sort will be affected by the value of the SAS
system option CPUCOUNT=. CPUCOUNT= suggests how many system CPUs are
available for use by the multi-threaded sort.
The multi-threaded sort supports concurrent input from the partitions of a
partitioned data set.
Note: These partitioned data sets should not be confused with partitioned data sets
on z/OS.
Operating Environment Information: For information about the support of partitioned
data sets in your operating environment, see the SAS documentation for your operating
environment.
For more information about THREADS and CPUCOUNT=, see the chapter on SAS
system options in SAS Language Reference: Dictionary.
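A minimal sketch of requesting a multi-threaded sort follows. The CPUCOUNT= value of 4 and the data set names are illustrative assumptions. Whether threading actually helps depends on the data and on the operating environment.

```sas
/* System options: enable threading and suggest a CPU count. */
options threads cpucount=4;

/* Hypothetical input data for the sketch. */
data work.account;
   input town $ debt;
   datalines;
Garner 65.79
Apex 37.95
;
run;

/* THREADS in the PROC SORT statement requests threading for this step. */
proc sort data=work.account out=work.sorted threads;
   by town;
run;
```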
3 zero
4 positive numeric values.
EBCDIC Order
The z/OS operating environment uses the EBCDIC collating sequence.
The sorting order of the English-language EBCDIC sequence is
blank . < ( + | & ! $ * ) ;
- / , % _ > ? : # @ = "
a b c d e f g h i j k l m n o p q r ~ s t u v w x y z
{ A B C D E F G H I } J K L M N O P Q R \ S T
U V W X Y Z
0 1 2 3 4 5 6 7 8 9
The main features of the EBCDIC sequence are that lowercase letters are sorted
before uppercase letters, and uppercase letters are sorted before digits. Note also that
some special characters interrupt the alphabetic sequences. The blank is the smallest
character that you can display.
ASCII Order
The operating environments that use the ASCII collating sequence include
- UNIX and its derivatives
- OpenVMS
- Windows.
From the smallest to the largest character that you can display, the English-language
ASCII sequence is
blank ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _
a b c d e f g h i j k l m n o p q r s t u v w x y z { } ~
The main features of the ASCII sequence are that digits are sorted before uppercase
letters, and uppercase letters are sorted before lowercase letters. The blank is the
smallest character that you can display.
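To request a specific collating sequence instead of the host default, PROC SORT accepts the SORTSEQ= option. The following is a small sketch that assumes the ACCOUNT data set from the examples in this chapter; the output data set name BYNAME_ASCII is made up for illustration.

```sas
/* Sort character values in ASCII order regardless of the host default. */
proc sort data=account out=byname_ascii sortseq=ascii;
   by company;
run;
```

Specifying SORTSEQ=EBCDIC in the same position would request the EBCDIC order that is described above.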
Procedure Output
PROC SORT produces only an output data set. To see the output data set, you can
use PROC PRINT, PROC REPORT, or another of the many available methods of
printing in SAS.
With all three replacement options (implicit replacement, explicit replacement, and no
replacement) there must be at least enough space in the output data library for a copy
of the original data set.
You can also sort compressed data sets. If you specify a compressed data set as the
input data set and omit the OUT= option, then the input data set is sorted and remains
compressed. If you specify an OUT= data set, then the resulting data set is compressed
only if you choose a compression method with the COMPRESS= data set option. For
more information about COMPRESS=, see the chapter on SAS data set options in SAS
Language Reference: Dictionary.
Note: If the SAS system option NOREPLACE is in effect, then you cannot replace
an original permanent data set with a sorted version. You must either use the OUT=
option or specify the SAS system option REPLACE in an OPTIONS statement. The
SAS system option NOREPLACE does not affect temporary SAS data sets.
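The two workarounds in the note can be sketched as follows. MYLIB is a hypothetical libref, and 'SAS-data-library' is a placeholder for the location of a permanent library.

```sas
/* MYLIB and its path are placeholders for a permanent library. */
libname mylib 'SAS-data-library';

options noreplace;

/* Workaround 1: write the sorted copy elsewhere with OUT=. */
proc sort data=mylib.account out=work.account_sorted;
   by town;
run;

/* Workaround 2: allow replacement again, then sort in place. */
options replace;

proc sort data=mylib.account;
   by town;
run;
```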
PROC PRINT
This example
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the input data set ACCOUNT. ACCOUNT contains the name of each business that
owes money, the amount of money that it owes on its account, the account number, and the town
where the business is located.
data account;
   input Company $ 1-22 Debt 25-30 AccountNumber 33-36
         Town $ 39-51;
   datalines;
Pauls Pizza              83.00  1019  Apex
World Wide Electronics  119.95  1122  Garner
Strickland Industries   657.22  1675  Morrisville
Ice Cream Delight       299.98  2310  Holly Springs
Watson Tabor Travel      37.95  3131  Apex
Boyd & Sons Accounting  312.49  4762  Garner
Bobs Beds               119.95  4998  Morrisville
Tinas Pet Shop           37.95  5108  Apex
Elway Piano and Organ    65.79  5217  Garner
Tims Burger Stand       119.95  6335  Holly Springs
Peters Auto Parts        65.79  7288  Apex
Deluxe Hardware         467.12  8941  Garner
Paulines Antiques       302.05  9112  Morrisville
Apex Catering            37.95  9923  Apex
;
Create the output data set BYTOWN. OUT= creates a new data set for the sorted
observations.
proc sort data=account out=bytown;
Sort by two variables. The BY statement specifies that the observations should be first
ordered alphabetically by town and then by company.
by town company;
run;
Print the output data set BYTOWN. PROC PRINT prints the data set BYTOWN.
proc print data=bytown;
Specify the variables to print. The VAR statement specifies the variables to print and their
column order in the output.
var company town debt accountnumber;
Output
                                                    Account
Obs  Company                 Town             Debt   Number
  1  Apex Catering           Apex            37.95     9923
  2  Pauls Pizza             Apex            83.00     1019
  3  Peters Auto Parts       Apex            65.79     7288
  4  Tinas Pet Shop          Apex            37.95     5108
  5  Watson Tabor Travel     Apex            37.95     3131
  6  Boyd & Sons Accounting  Garner         312.49     4762
  7  Deluxe Hardware         Garner         467.12     8941
  8  Elway Piano and Organ   Garner          65.79     5217
  9  World Wide Electronics  Garner         119.95     1122
 10  Ice Cream Delight       Holly Springs  299.98     2310
 11  Tims Burger Stand       Holly Springs  119.95     6335
 12  Bobs Beds               Morrisville    119.95     4998
 13  Paulines Antiques       Morrisville    302.05     9112
 14  Strickland Industries   Morrisville    657.22     1675
PROC PRINT
Data set:
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the output data set SORTED. OUT= creates a new data set for the sorted
observations.
proc sort data=account out=sorted;
Sort by three variables with one in descending order. The BY statement specifies that
observations should be first ordered alphabetically by town, then by descending value of amount
owed, then by ascending value of the account number.
by town descending debt accountnumber;
run;
Print the output data set SORTED. PROC PRINT prints the data set SORTED.
proc print data=sorted;
Specify the variables to print. The VAR statement specifies the variables to print and their
column order in the output.
var company town debt accountnumber;
Output
Note that sorting last by AccountNumber puts the businesses in Apex with a debt of $37.95 in
order of account number.
                                                    Account
Obs  Company                 Town             Debt   Number
  1  Pauls Pizza             Apex            83.00     1019
  2  Peters Auto Parts       Apex            65.79     7288
  3  Watson Tabor Travel     Apex            37.95     3131
  4  Tinas Pet Shop          Apex            37.95     5108
  5  Apex Catering           Apex            37.95     9923
  6  Deluxe Hardware         Garner         467.12     8941
  7  Boyd & Sons Accounting  Garner         312.49     4762
  8  World Wide Electronics  Garner         119.95     1122
  9  Elway Piano and Organ   Garner          65.79     5217
 10  Ice Cream Delight       Holly Springs  299.98     2310
 11  Tims Burger Stand       Holly Springs  119.95     6335
 12  Strickland Industries   Morrisville    657.22     1675
 13  Paulines Antiques       Morrisville    302.05     9112
 14  Bobs Beds               Morrisville    119.95     4998
This example
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the input data set INSURANCE. INSURANCE contains the number of years worked
by all insured employees and their insurance ids.
data insurance;
input YearsWorked 1 InsuranceID 3-5;
datalines;
5 421
5 336
1 209
1 564
3 711
3 343
4 212
4 616
;
Create the output data set BYYEARS1 with the EQUALS option. OUT= creates a new
data set for the sorted observations. The EQUALS option maintains the order of the
observations relative to each other.
proc sort data=insurance out=byyears1 equals;
Sort by the first variable. The BY statement specifies that the observations should be ordered
numerically by the number of years worked.
by yearsworked;
run;
Print the output data set BYYEARS1. PROC PRINT prints the data set BYYEARS1.
proc print data=byyears1;
Specify the variables to print. The VAR statement specifies the variables to print and their
column order in the output.
var yearsworked insuranceid;
Create the output data set BYYEARS2. OUT= creates a new data set for the sorted
observations. The NOEQUALS option will not maintain the order of the observations relative to
each other.
proc sort data=insurance out=byyears2 noequals;
Sort by the first variable. The BY statement specifies that the observations should be ordered
numerically by the number of years worked.
by yearsworked;
run;
Print the output data set BYYEARS2. PROC PRINT prints the data set BYYEARS2.
proc print data=byyears2;
Specify the variables to print. The VAR statement specifies the variables to print and their
column order in the output.
var yearsworked insuranceid;
Output
Note that sorting with the EQUALS option versus sorting with the NOEQUALS option causes a
different sort order for the observations where YearsWorked=3.
        Years    Insurance
Obs    Worked           ID
  1         1          209
  2         1          564
  3         3          711
  4         3          343
  5         4          212
  6         4          616
  7         5          421
  8         5          336

        Years    Insurance
Obs    Worked           ID
  1         1          209
  2         1          564
  3         3          343
  4         3          711
  5         4          212
  6         4          616
  7         5          421
  8         5          336
PROC PRINT
Data set: ACCOUNT on page 1017
Interaction: The EQUALS option, which is the default, must be in effect to ensure that
the first observation for each BY group is the one that is retained by the NODUPKEY
option. If the NOEQUALS option has been specified, then one observation for each BY
group will still be retained by the NODUPKEY option, but not necessarily the first
observation.
In this example, PROC SORT creates an output data set that contains only the first
observation of each BY group. The NODUPKEY option prevents an observation from
being written to the output data set when its BY value is identical to the BY value of
the last observation written to the output data set. The resulting report contains one
observation for each town where the businesses are located.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the output data set TOWNS but include only the first observation of each BY
group. NODUPKEY writes only the first observation of each BY group to the new data set
TOWNS.
Operating Environment Information: If you use the VMS operating environment sort,
then the observation that is written to the output data set is not always the first
observation of the BY group.
proc sort data=account out=towns nodupkey;
Sort by one variable. The BY statement specifies that observations should be ordered by town.
by town;
run;
Print the output data set TOWNS. PROC PRINT prints the data set TOWNS.
proc print data=towns;
Specify the variables to print. The VAR statement specifies the variables to print and their
column order in the output.
var town company debt accountnumber;
Output
The output data set contains only four observations, one for each town in the input data set.
                                                    Account
Obs  Town           Company                   Debt   Number
  1  Apex           Pauls Pizza              83.00     1019
  2  Garner         World Wide Electronics  119.95     1122
  3  Holly Springs  Ice Cream Delight       299.98     2310
  4  Morrisville    Strickland Industries   657.22     1675
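Because the retained observation depends on EQUALS being in effect, one cautious variant of this example states the option explicitly rather than relying on the default. This is a sketch of that variant, not a change to the documented program.

```sas
/* EQUALS is the default; naming it documents the intent that the
   first observation of each BY group is the one NODUPKEY keeps. */
proc sort data=account out=towns nodupkey equals;
   by town;
run;
```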
CHAPTER
44
The SQL Procedure
Overview: SQL Procedure 1029
What Is the SQL Procedure? 1029
What Are PROC SQL Tables? 1029
What Are Views? 1029
SQL Procedure Coding Conventions 1030
Syntax: SQL Procedure 1031
PROC SQL Statement 1033
ALTER TABLE Statement 1038
CONNECT Statement 1042
CREATE INDEX Statement 1043
CREATE TABLE Statement 1045
CREATE VIEW Statement 1049
DELETE Statement 1051
DESCRIBE Statement 1052
DISCONNECT Statement 1053
DROP Statement 1054
EXECUTE Statement 1055
INSERT Statement 1056
RESET Statement 1058
SELECT Statement 1058
UPDATE Statement 1069
VALIDATE Statement 1070
SQL Procedure Component Dictionary 1071
BETWEEN condition 1071
BTRIM function 1072
CALCULATED 1073
CASE expression 1073
COALESCE Function 1075
column-definition 1075
column-modifier 1076
column-name 1078
CONNECTION TO 1079
CONTAINS condition 1079
EXISTS condition 1080
IN condition 1080
IS condition 1081
joined-table 1082
LIKE condition 1091
LOWER function 1093
query-expression 1093
sql-expression 1099
Figure 44.1 (diagram not reproduced; recoverable labels: PROC SQL tables (SAS data files), PROC SQL views, DBMS tables, PROC SQL, reports, macro variables)
- The ORDER BY clause sorts data by columns. In addition, tables do not need to
be presorted by a variable for use with PROC SQL. Therefore, you do not need to
use the SORT procedure with your PROC SQL programs.
- A PROC SQL statement runs when you submit it; you do not have to specify a
RUN statement. If you follow a PROC SQL statement with a RUN statement,
then SAS ignores the RUN statement and submits the statements as usual.
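A short sketch of both conventions, assuming the ACCOUNT data set from the SORT procedure examples: the query orders its own result, and the step ends with QUIT rather than RUN.

```sas
proc sql;
   /* No prior PROC SORT is needed; ORDER BY sorts the result. */
   select company, town, debt
      from account
      order by town, debt desc;
quit;   /* Each statement already ran when it was submitted. */
```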
Note:
Regular type indicates the name of a component that is described in SQL Procedure
Component Dictionary on page 1071.
view-name indicates a SAS data view of any type.
PROC SQL;
CONNECT TO dbms-name <AS alias>
< (connect-statement-argument-1=value <
connect-statement-argument-n=value>)>
< (database-connection-argument-1=value <
database-connection-argument-n=value>)>;
SELECT column-list
FROM CONNECTION TO dbms-name|alias
(dbms-query)
optional PROC SQL clauses;
<DISCONNECT FROM dbms-name|alias;>
<QUIT;>
To do this
ALTER TABLE
CONNECT TO
CREATE INDEX
CREATE TABLE
CREATE VIEW
Delete rows
DELETE
DESCRIBE
DISCONNECT FROM
DROP
EXECUTE
Add rows
INSERT
RESET
SELECT
Query a DBMS
CONNECTION TO
Modify values
UPDATE
VALIDATE
To do this
Control output
Double-space the report
DOUBLE|NODOUBLE
FEEDBACK|NOFEEDBACK
FLOW|NOFLOW
NUMBER|NONUMBER
PRINT|NOPRINT
SORTMSG|NOSORTMSG
SORTSEQ=
Control execution
Allow PROC SQL to use names other than
SAS names
DQUOTE=
ERRORSTOP|NOERRORSTOP
EXEC|NOEXEC
INOBS=
OUTOBS=
LOOPS=
PROMPT|NOPROMPT
STIMER|NOSTIMER
THREADS|NOTHREADS
UNDO_POLICY=
Options
DOUBLE|NODOUBLE
DQUOTE=ANSI|SAS
specifies whether PROC SQL treats values within double quotation marks (" ") as
variables or strings. With DQUOTE=ANSI, PROC SQL treats a quoted value as a
variable. This feature enables you to use the following as table names, column
names, or aliases:
or noninteractive session
Interaction: This option is useful only when the EXEC option is in effect.
Tip:
Tip:
EXEC|NOEXEC
specifies whether a statement should be executed after its syntax is checked for
accuracy.
Default: EXEC
Tip: NOEXEC is useful if you want to check the syntax of your SQL statements
without executing the statements.
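For instance, a syntax-only pass over a query might look like the following sketch, which assumes the ACCOUNT table from the SORT procedure examples.

```sas
/* NOEXEC: the statement is checked for syntax but not executed. */
proc sql noexec;
   select company, debt
      from account
      where debt > 100;
quit;
```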
FEEDBACK|NOFEEDBACK
specifies whether PROC SQL displays, in the SAS log, PROC SQL statements after
view references are expanded or certain other transformations of the statement are
made.
This option has the following effects:
- Any asterisk (for example, SELECT *) is expanded into the list of qualified
columns that it represents.
FLOW<=n <m>>|NOFLOW
specifies that character columns longer than n are flowed to multiple lines. PROC
SQL sets the column width at n. When you specify FLOW=n m, PROC SQL floats the
width of the columns between these limits to achieve a balanced layout. Specifying
FLOW without arguments is equivalent to specifying FLOW=12 200.
Default: NOFLOW
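As an illustration, the following sketch lets PROC SQL choose character column widths between 10 and 40. The two limits are arbitrary values picked for the example, and ACCOUNT is the table from the SORT procedure examples.

```sas
/* Character columns wider than the chosen width wrap onto extra lines. */
proc sql flow=10 40;
   select company, town
      from account;
quit;
```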
INOBS=n
restricts the number of rows (observations) that PROC SQL retrieves from any single
source.
Tip:
LOOPS=n
restricts PROC SQL to n iterations through its inner loop. You use the number of
iterations reported in the SQLOOPS macro variable (after each SQL statement is
executed) to discover the number of loops. Set a limit to prevent queries from
consuming excessive computer resources. For example, joining three large tables
without meeting the join-matching conditions could create a huge internal table that
would be inefficient to execute.
See also: Using Macro Variables Set by PROC SQL on page 1119
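A hedged sketch of how the limit and the macro variable might be used together: the value 5000 is an assumed ceiling chosen for illustration, and the self-join on ACCOUNT is only there to give the inner loop some work.

```sas
proc sql loops=5000;
   /* Pair up businesses that are located in the same town. */
   select a.company, b.company as Neighbor
      from account as a, account as b
      where a.town = b.town
        and a.accountnumber < b.accountnumber;
   /* SQLOOPS reports the iterations used by the statement above. */
   %put Inner-loop iterations used: &sqloops;
quit;
```

Running the query once without a limit and reading SQLOOPS is one way to pick a sensible LOOPS= value for later runs.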
NODOUBLE
See THREADS|NOTHREADS.
NUMBER|NONUMBER
specifies whether the SELECT statement includes a column called ROW, which is the
row (or observation) number of the data as the rows are retrieved.
Default: NONUMBER
Featured in:
OUTOBS=n
restricts the number of rows (observations) in the output. For example, if you specify
OUTOBS=10 and insert values into a table using a query-expression, then the SQL
procedure inserts a maximum of 10 rows. Likewise, OUTOBS=10 limits the output to
10 rows.
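The difference between the two limits can be seen in a sketch such as the following, where the values 10 and 5 are arbitrary: INOBS= caps what is read from each source, and OUTOBS= caps what is written or displayed.

```sas
/* Read at most 10 rows from ACCOUNT, then show at most 5 of them. */
proc sql inobs=10 outobs=5;
   select company, town, debt
      from account;
quit;
```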
PRINT|NOPRINT
Default: PRINT
Tip: NOPRINT is useful when you are selecting values from a table into macro
variables and do not want anything to be displayed.
Interaction: NOPRINT affects the value of the SQLOBS automatic macro variable.
See Using Macro Variables Set by PROC SQL on page 1119 for details.
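The use that the tip describes can be sketched as follows. The macro variable name NTOWNS is made up for this example, and ACCOUNT is the table from the SORT procedure examples.

```sas
/* Store a value in a macro variable without producing a report. */
proc sql noprint;
   select count(distinct town) into :ntowns
      from account;
quit;

%put Number of towns: &ntowns;
```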
PROMPT|NOPROMPT
modifies the effect of the INOBS=, OUTOBS=, and LOOPS= options. If you specify
the PROMPT option and reach the limit specified by INOBS=, OUTOBS=, or
LOOPS=, then PROC SQL prompts you to stop or continue. The prompting repeats if
the same limit is reached again.
Default: NOPROMPT
SORTMSG|NOSORTMSG
Certain operations, such as ORDER BY, may sort tables internally using PROC
SORT. Specifying SORTMSG requests information from PROC SORT about the sort
and displays the information in the log.
Default: NOSORTMSG
SORTSEQ=sort-table
specifies the collating sequence to use when a query contains an ORDER BY clause.
Use this option only if you want a collating sequence other than your system's or
installation's default collating sequence.
See also: SORTSEQ= option in SAS National Language Support (NLS): User's
Guide.
STIMER|NOSTIMER
specifies whether PROC SQL writes timing information to the SAS log for each
statement, rather than as a cumulative value for the entire procedure. For this
option to work, you must also specify the SAS system option STIMER. Some
operating environments require that you specify this system option when you invoke
SAS. If you use the system option alone, then you receive timing information for the
entire SQL procedure, not on a statement-by-statement basis.
Default: NOSTIMER
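A minimal sketch of statement-level timing, assuming that your operating environment allows the STIMER system option to be set in an OPTIONS statement (some require it at invocation, as noted above):

```sas
options stimer;        /* system option: required for the PROC SQL option */

proc sql stimer;       /* procedure option: time each statement separately */
   select town, sum(debt) as TotalDebt
      from account
      group by town;
   select count(*) as Businesses
      from account;
quit;
```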
THREADS|NOTHREADS
UNDO_POLICY=NONE|OPTIONAL|REQUIRED
specifies how PROC SQL handles updated data if errors occur while you are
updating data. You can use UNDO_POLICY= to control whether your changes will
be permanent:
NONE
keeps any updates or inserts.
OPTIONAL
reverses any updates or inserts that it can reverse reliably.
REQUIRED
reverses all inserts or updates that have been done to the point of the error. In
some cases, the UNDO operation cannot be done reliably. For example, when a
program uses a SAS/ACCESS view, it may not be able to reverse the effects of the
INSERT and UPDATE statements without reversing the effects of other changes
at the same time. In that case, PROC SQL issues an error message and does not
execute the statement. Also, when a SAS data set is accessed through a
SAS/SHARE server and is opened with the data set option CNTLLEV=RECORD,
you cannot reliably reverse your changes.
This option may enable other users to update newly inserted rows. If an error
occurs during the insert, then PROC SQL can delete a record that another user
updated. In that case, the statement is not executed, and an error message is
issued.
Default: REQUIRED
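As an illustration of choosing a policy, the following sketch works on a copy of the ACCOUNT table so that the original is untouched. The table name ACCOUNT_COPY and the inserted values are hypothetical.

```sas
/* UNDO_POLICY=NONE keeps whatever was inserted before an error occurred. */
proc sql undo_policy=none;
   create table account_copy as
      select * from account;
   insert into account_copy
      set Company='Hypothetical Cafe', Town='Apex',
          Debt=25.00, AccountNumber=9999;
quit;
```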
Note: Options can be added, removed, or changed between PROC SQL statements
with the RESET statement.
Restriction: You cannot use ALTER TABLE on a table that is accessed by an engine that
does not support UPDATE processing.
Restriction: You must use at least one ADD, DROP, or MODIFY clause in the ALTER
TABLE statement.
Featured in:
Arguments
<ADD CONSTRAINT constraint-name constraint-specification< , constraint-name
constraint-specification>>
adds the integrity constraint that is specified in constraint-specification and assigns
constraint-name to it.
<ADD constraint-specification< , constraint-specification>>
adds the integrity constraint that is specified in constraint-specification and assigns a
default name to it. The default constraint name has the form that is shown in the
following table:
Default Name    Constraint Type
_NMxxxx_        Not null
_UNxxxx_        Unique
_CKxxxx_        Check
_PKxxxx_        Primary key
_FKxxxx_        Foreign key
which means that variables in a data file are part of both a primary key and a
foreign key definition,
- if you use the exact same variables, then the variables must be defined in
a different order.
- the foreign key's update and delete referential actions must both be
RESTRICT.
NOT NULL (column)
specifies that column does not contain a null or missing value, including special
missing values.
PRIMARY KEY (column< , column>)
specifies one or more primary key columns, that is, columns that do not contain
missing values and whose values are unique.
Restriction: When you are defining overlapping primary key and foreign key
constraints, which means that variables in a data file are part of both a primary
key definition and a foreign key definition, if you use the exact same variables,
then the variables must be defined in a different order.
UNIQUE (column< , column>)
specifies that the values of each column must be unique. This constraint is
identical to DISTINCT.
constraint-name
specifies a name for the constraint that is being specified. The name must be a valid
SAS name.
Note: The names PRIMARY, FOREIGN, MESSAGE, UNIQUE, DISTINCT,
CHECK, and NOT cannot be used as values for constraint-name.
constraint-specification
consists of
constraint <MESSAGE=message-string <MSGTYPE=message-type>>
<DROP column< , column>>
deletes each column from the table.
<DROP CONSTRAINT constraint-name < , constraint-name>>
deletes the integrity constraint that is referenced by each constraint-name. To find
the name of an integrity constraint, use the DESCRIBE TABLE CONSTRAINTS
clause (see DESCRIBE Statement on page 1052).
<DROP FOREIGN KEY constraint-name>
removes the foreign key constraint that is referenced by constraint-name.
Note:
message-string
specifies the text of an error message that is written to the log when the integrity
constraint is not met. The maximum length of message-string is 250 characters.
message-type
specifies how the error message is displayed in the SAS log when an integrity
constraint is not met.
NEWLINE
the text that is specified for MESSAGE= is displayed as well as the default error
message for that integrity constraint.
USER
only the text that is specified for MESSAGE= is displayed.
<MODIFY column-definition<, column-definition>>
changes one or more attributes of the column that is specified in each
column-definition.
referential-action
specifies the type of action to be performed on all matching foreign key values.
CASCADE
allows primary key data values to be updated, and updates matching values in the
foreign key to the same values. This referential action is currently supported for
updates only.
RESTRICT
prevents the update or deletion of primary key data values if a matching value
exists in the foreign key. This referential action is the default.
SET NULL
allows primary key data values to be updated, and sets all matching foreign key
values to NULL.
table-name
- in the ALTER TABLE statement, refers to the name of the table that is to be
altered.
- in the REFERENCES clause, refers to the name of table that contains the
primary key that is referenced by the foreign key.
table-name can be a one-level name, a two-level libref.table name, or a physical
pathname that is enclosed in single quotation marks.
WHERE-clause
specifies a SAS WHERE clause. Do not include the WHERE keyword in the WHERE
clause.
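The arguments above can be combined as in the following sketch. The table ACCT, the constraint name DEBT_NONNEG, and the message text are all invented for illustration; ACCOUNT is the table from the SORT procedure examples.

```sas
proc sql;
   /* Work on a copy so that the original table is not altered. */
   create table acct as
      select * from account;

   /* Add a named CHECK constraint with a user-defined message. */
   alter table acct
      add constraint debt_nonneg check(debt >= 0)
          message='Debt must not be negative.' msgtype=user;

   /* List the constraints, then remove the one just added. */
   describe table constraints acct;

   alter table acct
      drop constraint debt_nonneg;
quit;
```

Note that the CHECK condition is written without the WHERE keyword, as the WHERE-clause argument requires.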
Renaming Columns
To change a column's name, you must use the SAS data set option RENAME=. You
cannot change this attribute with the ALTER TABLE statement. RENAME= is
described in the section on SAS data set options in SAS Language Reference: Dictionary.
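One way to apply RENAME= from within PROC SQL is to attach it to the source table of a query, as in this sketch. The new table name ACCOUNT_RENAMED and the new column name AmountOwed are hypothetical.

```sas
proc sql;
   /* The data set option renames Debt as the table is read. */
   create table account_renamed as
      select *
         from account(rename=(Debt=AmountOwed));
quit;
```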
Integrity Constraints
Use ALTER TABLE to modify integrity constraints for existing tables. Use the
CREATE TABLE statement to attach integrity constraints to new tables. For more
information on integrity constraints, see the section on SAS files in SAS Language
Reference: Concepts.
CONNECT Statement
Establishes a connection with a DBMS that is supported by SAS/ACCESS software.
Requirement: SAS/ACCESS software is required. For more information about this
statement, refer to your SAS/ACCESS documentation.
See also: Connecting to a DBMS Using the SQL Procedure Pass-Through Facility on
page 1115
Arguments
alias
specifies an alias that has 1 to 32 characters. The keyword AS must precede alias.
Some DBMSs allow more than one connection. The optional AS clause enables you to
name the connections so that you can refer to them later.
connect-statement-argument=value
specifies values for arguments that indicate whether you can make multiple
connections, shared or unique connections, and so on, to the database. These
arguments are optional, but if they are included, then they must be enclosed in
parentheses. See SAS/ACCESS for Relational Databases: Reference for more
information about these arguments.
database-connection-argument=value
specifies values for the DBMS-specific arguments that are needed by PROC SQL in
order to connect to the DBMS. These arguments are optional for most databases, but
if they are included, then they must be enclosed in parentheses. For more
information, see the SAS/ACCESS documentation for your DBMS.
dbms-name
identifies the DBMS that you want to connect to (for example, ORACLE or DB2).
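A sketch of a complete Pass-Through session follows. It assumes SAS/ACCESS Interface to Oracle is licensed; the alias ORA, the USER=, PASSWORD=, and PATH= values, and the DBMS table ACCOUNTS are placeholders, and the connection arguments that your DBMS accepts should be taken from its SAS/ACCESS documentation.

```sas
proc sql;
   /* All connection values below are placeholders. */
   connect to oracle as ora
      (user=myuser password=mypass path='mypath');

   /* The inner query is passed to the DBMS as written. */
   select *
      from connection to ora
         (select company, debt from accounts where debt > 100);

   disconnect from ora;
quit;
```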
Arguments
column
index-name
names the index that you are creating. If you are creating an index on one column
only, then index-name must be the same as column. If you are creating an index on
more than one column, then index-name cannot be the same as any column in the
table.
table-name
UNIQUE Keyword
The UNIQUE keyword causes SAS to reject any change to a table that would cause
more than one row to have the same index value. Unique indexes guarantee that data
in one column, or in a composite group of columns, remain unique for every row in a
table. For this reason, a unique index cannot be defined for a column that includes
NULL or missing values.
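The naming rule and the UNIQUE keyword can be seen together in the following sketch, which assumes the ACCOUNT table from the SORT procedure examples. The composite index name TownCompany is made up.

```sas
proc sql;
   /* Simple index: the index name must match the column name.
      UNIQUE rejects later changes that would duplicate a value. */
   create unique index AccountNumber
      on account(AccountNumber);

   /* Composite index: the name must differ from every column name. */
   create index TownCompany
      on account(Town, Company);
quit;
```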
Managing Indexes
You can use the CONTENTS statement in the DATASETS procedure to display a
table's index names and the columns for which they are defined. You can also use the
DICTIONARY tables INDEXES, TABLES, and COLUMNS to list information about
indexes. For more information, see Using the DICTIONARY Tables on page 1116.
See the section on SAS files in SAS Language Reference: Dictionary for a further
description of when to use indexes and how they affect SAS statements that handle
BY-group processing.
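Both approaches are sketched below for a table named ACCOUNT in the WORK library, which is an assumption carried over from the SORT procedure examples.

```sas
/* Approach 1: the CONTENTS statement lists indexes with the other attributes. */
proc datasets library=work nolist;
   contents data=account;
quit;

/* Approach 2: query the DICTIONARY.INDEXES table directly. */
proc sql;
   select *
      from dictionary.indexes
      where libname='WORK' and memname='ACCOUNT';
quit;
```

Library and member names are stored in uppercase in the DICTIONARY tables, which is why the WHERE clause uses uppercase literals.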
Arguments
column-constraint
NOT NULL
specifies that the column does not contain a null or missing value, including
special missing values.
PRIMARY KEY
specifies that the column is a primary key column, that is, a column that does not
contain missing values and whose values are unique.
Restriction: When defining overlapping primary key and foreign key constraints,
which means that variables in a data file are part of both a primary key and a
foreign key definition, if you use the exact same variables, then the variables
must be defined in a different order.
REFERENCES table-name
<ON DELETE referential-action > <ON UPDATE referential-action>
specifies that the column is a foreign key, that is, a column whose values are
linked to the values of the primary key variable in another table (the table-name
that is specified for REFERENCES). The referential-actions are performed when
the values of a primary key column that is referenced by the foreign key are
updated or deleted.
Restriction: When you are defining overlapping primary key and foreign key
constraints, which means that variables in a data file are part of both a primary
key definition and a foreign key definition,
- if you use the exact same variables, then the variables must be defined in
a different order
- the foreign key's update and delete referential actions must both be
RESTRICT.
UNIQUE
specifies that the values of the column must be unique. This constraint is identical
to DISTINCT.
Note: If you specify column-constraint, then SAS automatically assigns a name to
the constraint. The constraint name has the form
Default name    Constraint type
_CKxxxx_        Check
_FKxxxx_        Foreign key
_NMxxxx_        Not Null
_PKxxxx_        Primary key
_UNxxxx_        Unique
consists of
column-definition <column-constraint>
constraint
CHECK (WHERE-clause)
specifies that all rows in table-name satisfy the WHERE-clause.
DISTINCT (column<, column>)
specifies that the values of each column must be unique. This constraint is
identical to UNIQUE.
FOREIGN KEY (column< , column>)
REFERENCES table-name
<ON DELETE referential-action > <ON UPDATE referential-action>
specifies a foreign key, that is, a set of columns whose values are linked to the
values of the primary key variable in another table (the table-name that is specified
for REFERENCES). The referential-actions are performed when the values of a
primary key column that is referenced by the foreign key are updated or deleted.
Restriction: When you are defining overlapping primary key and foreign key
constraints, which means that variables in a data file are part of both a primary
key definition and a foreign key definition,
- if you use the exact same variables, then the variables must be defined in
a different order
- the foreign key's update and delete referential actions must both be
RESTRICT.
NOT NULL (column)
specifies that column does not contain a null or missing value, including special
missing values.
PRIMARY KEY (column< , column>)
specifies one or more primary key columns, that is, columns that do not contain
missing values and whose values are unique.
Restriction: When defining overlapping primary key and foreign key constraints,
which means that variables in a data file are part of both a primary key and a
foreign key definition, if you use the exact same variables, then the variables
must be defined in a different order.
UNIQUE (column< , column>)
specifies that the values of each column must be unique. This constraint is
identical to DISTINCT.
constraint-name
specifies a name for the constraint that is being specified. The name must be a valid
SAS name.
Note: The names PRIMARY, FOREIGN, MESSAGE, UNIQUE, DISTINCT,
CHECK, and NOT cannot be used as values for constraint-name.
constraint-specification
consists of
CONSTRAINT constraint-name constraint <MESSAGE=message-string
< MSGTYPE=message-type>>
message-string
specifies the text of an error message that is written to the log when the integrity
constraint is not met. The maximum length of message-string is 250 characters.
message-type
specifies how the error message is displayed in the SAS log when an integrity
constraint is not met.
NEWLINE
the text that is specified for MESSAGE= is displayed as well as the default error
message for that integrity constraint.
USER
only the text that is specified for MESSAGE= is displayed.
ORDER BY order-by-item
sorts the rows in table-name by the values of each order-by-item. See ORDER BY
Clause on page 1067.
query-expression
creates table-name from the results of a query. See query-expression on page 1093.
referential-action
specifies the type of action to be performed on all matching foreign key values.
CASCADE
allows primary key data values to be updated, and updates matching values in the
foreign key to the same values. This referential action is currently supported for
updates only.
RESTRICT
prevents the update or deletion of primary key data values if a matching value
exists in the foreign key. This referential action is the default.
SET NULL
sets all matching foreign key values to NULL.
table-name
- in the CREATE TABLE statement, refers to the name of the table that is to be
created. You can use data set options by placing them in parentheses
immediately after table-name. See Using SAS Data Set Options with PROC
SQL on page 1114 for details.
- in the REFERENCES clause, refers to the name of table that contains the
primary key that is referenced by the foreign key.
table-name2
creates table-name with the same column names and column attributes as
table-name2, but with no rows.
WHERE-clause
specifies a SAS WHERE clause. Do not include the WHERE keyword in the WHERE
clause.
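The following sketch pulls several of these arguments into one program. The tables TOWNS and CUSTOMERS, the constraint names, and the message text are all hypothetical; the column layout loosely follows the ACCOUNT data set from the SORT procedure examples.

```sas
proc sql;
   /* Parent table: a column constraint makes Town the primary key. */
   create table towns
      (Town char(13) primary key);

   /* Child table: named constraints, one of each common type. */
   create table customers
      (Company       char(22) not null,
       Town          char(13),
       Debt          num,
       AccountNumber num,
       constraint pk_acct  primary key(AccountNumber),
       constraint fk_town  foreign key(Town) references towns
                           on delete restrict on update restrict,
       constraint chk_debt check(Debt >= 0)
                           message='Debt must not be negative.'
                           msgtype=newline);
quit;
```

The column constraints on TOWNS.Town and CUSTOMERS.Company receive default names of the form shown in the table above, while the three CONSTRAINT specifications keep the names that are given.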
Both of these forms create a table without rows. You can use an INSERT
statement to add rows. Use an ALTER TABLE statement to modify column
attributes or to add or drop columns.
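As a sketch of that workflow, the following program creates an empty table with the same columns as an existing one, adds a row, and then adds a column. It assumes the ACCOUNT table from the SORT procedure examples; the table name ACCOUNT_EMPTY, the inserted values, and the Phone column are hypothetical.

```sas
proc sql;
   /* Same columns and attributes as ACCOUNT, but no rows. */
   create table account_empty like account;

   /* Add a row with the SET form of INSERT. */
   insert into account_empty
      set Company='New Business', Town='Apex',
          Debt=10.00, AccountNumber=1000;

   /* Add a column after the table exists. */
   alter table account_empty
      add Phone char(12);
quit;
```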
Recursive table references can cause data integrity problems. While it is possible to
recursively reference the target table of a CREATE TABLE AS statement, doing
so can cause data integrity problems and incorrect results. Constructions such
as the following should be avoided:
proc sql;
create table a as
select var1, var2
from a;
Integrity Constraints
You can attach integrity constraints when you create a new table. To modify integrity
constraints, use the ALTER TABLE statement. For more information on integrity
constraints, see the section on SAS files in SAS Language Reference: Concepts.
Arguments
column-name-list
is a comma-separated list of column names for the view, to be used in place of the
column names or aliases that are specied in the SELECT clause. The names in this
list are assigned to columns in the order in which they are specied in the SELECT
clause. If the number of column names in this list does not equal the number of
columns in the SELECT clause, then a warning is written to the SAS log.
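For illustration, the following sketch supplies three names for three selected columns, so no warning is expected. The view name DEBTVIEW and the replacement column names are made up; ACCOUNT is the table from the SORT procedure examples.

```sas
proc sql;
   /* Business, City, and AmountOwed replace company, town, and debt. */
   create view debtview (Business, City, AmountOwed) as
      select company, town, debt
         from account;

   select * from debtview;
quit;
```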
query-expression
view-name
specifies the name for the PROC SQL view that you are creating. See "What Are
Views?" on page 1029 for a definition of a PROC SQL view.
In this view, VIEW1 and INVOICE are stored permanently in the SAS data library
referenced by PROCLIB. Specifying a libref for INVOICE is optional.
Updating Views
You can update a view's underlying data with some restrictions. See "Updating
PROC SQL and SAS/ACCESS Views" on page 1121.
For more information on the SAS/ACCESS LIBNAME statement, see the SAS/ACCESS
documentation for your DBMS.
Note: Starting in SAS System 9, PROC SQL views, the Pass-Through Facility, and
the SAS/ACCESS LIBNAME statement are the preferred ways to access relational
DBMS data; SAS/ACCESS views are no longer recommended. You can convert existing
SAS/ACCESS views to PROC SQL views by using the CV2VIEW procedure. See "The
CV2VIEW Procedure" in SAS/ACCESS for Relational Databases: Reference for more
information.
You can also embed a SAS LIBNAME statement in a view with the USING clause.
This enables you to store SAS libref information in the view. Just as in the embedded
SAS/ACCESS LIBNAME statement, the scope of the libref is local to the view, and it
will not conflict with an identically named libref in the SAS session.
create view work.tableview as
select * from proclib.invoices
using libname proclib 'sas-data-library';
DELETE Statement
Removes one or more rows from a table or view that is specified in the FROM clause.
Restriction: You cannot use DELETE FROM on a table that is accessed by an engine
that does not support UPDATE processing.
Featured in: Example 5 on page 1134
DELETE
FROM table-name|sas/access-view|proc-sql-view <AS alias>
<WHERE sql-expression>;
Arguments
alias
proc-sql-view
specifies a PROC SQL view that you are deleting rows from. proc-sql-view can be a
one-level name, a two-level libref.view name, or a physical pathname that is enclosed
in single quotation marks.
sql-expression
table-name
specifies the table that you are deleting rows from. table-name can be a one-level
name, a two-level libref.table name, or a physical pathname that is enclosed in single
quotation marks.
CAUTION:
Recursive table references can cause data integrity problems. While it is possible to
recursively reference the target table of a DELETE statement, doing so can cause
data integrity problems and incorrect results. Constructions such as the following
should be avoided:
proc sql;
delete from a
where var1 > (select min(var2) from a);
If you omit a WHERE clause, then the DELETE statement deletes all the rows from the
specified table or the table that is described by a view.
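The following sketch shows a DELETE statement that is restricted by a WHERE clause. It assumes that the libref PROCLIB is already assigned and that PROCLIB.HOUSES (the table used in the INTO clause examples) exists; the copy WORK.HOUSES_COPY is a hypothetical name used so that the source table is left untouched.

```sas
proc sql;
   /* Work on a copy so that PROCLIB.HOUSES itself is not changed. */
   create table work.houses_copy as
      select * from proclib.houses;

   /* Only rows that satisfy the WHERE condition are removed. */
   /* Without the WHERE clause, every row would be deleted.   */
   delete from work.houses_copy
      where sqfeet < 1000;
quit;
```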
DESCRIBE Statement
Displays a PROC SQL definition in the SAS log.
Restriction:
PROC SQL views are the only type of view allowed in a DESCRIBE VIEW
statement.
Featured in:
Arguments
table-name
Details
• The DESCRIBE TABLE statement writes a CREATE TABLE statement to the
SAS log for the table specified in the DESCRIBE TABLE statement, regardless of
how the table was originally created (for example, with a DATA step). If
applicable, SAS data set options are included with the table definition. If indexes
are defined on columns in the table, then CREATE INDEX statements for those
indexes are also written to the SAS log.
When you are transferring a table to a DBMS that is supported by
SAS/ACCESS software, it is helpful to know how it is defined. To find out more
information about a table, use the FEEDBACK option or the CONTENTS
statement in the DATASETS procedure.
• The DESCRIBE VIEW statement writes a view definition to the SAS log. If you
use a PROC SQL view in the DESCRIBE VIEW statement that is based on or
derived from another view, then you might want to use the FEEDBACK option in
the PROC SQL statement. This option displays in the SAS log how the underlying
view is defined and expands any expressions that are used in this view definition.
The CONTENTS statement in the DATASETS procedure can also be used with a view
to find out more information.
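A short sketch of both forms follows. It assumes that PROCLIB.HOUSES exists; the view name WORK.BIGHOUSES is hypothetical.

```sas
proc sql feedback;
   /* Writes a CREATE TABLE statement for the table to the SAS log. */
   describe table proclib.houses;

   /* Hypothetical view, created only so that there is something to describe. */
   create view work.bighouses as
      select style, sqfeet
         from proclib.houses
         where sqfeet > 1500;

   /* Writes the view definition to the SAS log. */
   describe view work.bighouses;
quit;
```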
DISCONNECT Statement
Ends the connection with a DBMS that is supported by a SAS/ACCESS interface.
Requirement: SAS/ACCESS software is required. For more information on this
statement, refer to your SAS/ACCESS documentation.
See also: Connecting to a DBMS Using the SQL Procedure Pass-Through Facility on
page 1115
Arguments
alias
dbms-name
specifies the DBMS from which you want to end the connection (for example, DB2 or
ORACLE). The name you specify should match the name that is specified in the
CONNECT statement.
Details
• An implicit COMMIT is performed before the DISCONNECT statement ends the
DBMS connection. If a DISCONNECT statement is not submitted, then implicit
DISCONNECT and COMMIT actions are performed and the connection to the
DBMS is broken when PROC SQL terminates.
• PROC SQL continues executing until you submit a QUIT statement, another SAS
procedure, or a DATA step.
DROP Statement
Deletes tables, views, or indexes.
Restriction: You cannot use DROP TABLE or DROP INDEX on a table that is accessed
by an engine that does not support UPDATE processing.
Arguments
index-name
view-name
specifies a SAS data view of any type: PROC SQL view, SAS/ACCESS view, or DATA
step view. view-name can be a one-level name, a two-level libref.view name, or a
physical pathname that is enclosed in single quotation marks.
Details
• If you drop a table that is referenced in a view definition and try to execute the
view, then an error message is written to the SAS log that states that the table
does not exist. Therefore, remove references in queries and views to any table(s)
and view(s) that you drop.
• If you drop a table with indexed columns, then all the indexes are automatically
dropped. If you drop a composite index, then the index is dropped for all the
columns that are named in that index.
• You can use the DROP statement to drop a table or view in an external database
that is accessed with the Pass-Through Facility or SAS/ACCESS LIBNAME
statement, but not for an external database table or view that is described by a
SAS/ACCESS view.
EXECUTE Statement
Sends a DBMS-specific SQL statement to a DBMS that is supported by a SAS/ACCESS interface.
Requirement: SAS/ACCESS software is required. For more information on this
statement, refer to your SAS/ACCESS documentation.
See also: Connecting to a DBMS Using the SQL Procedure Pass-Through Facility on
page 1115 and the SQL documentation for your DBMS.
EXECUTE (dbms-SQL-statement)
BY dbms-name|alias;
Arguments
alias
specifies an optional alias that is defined in the CONNECT statement. Note that
alias must be preceded by the keyword BY.
dbms-name
identifies the DBMS to which you want to direct the DBMS statement (for example,
ORACLE or DB2).
dbms-SQL-statement
is any DBMS-specific SQL statement, except the SELECT statement, that can be
executed by the DBMS-specific dynamic SQL.
Details
• If your DBMS supports multiple connections, then you can use the alias that is
defined in the CONNECT statement. This alias directs the EXECUTE statements
to a specific DBMS connection.
• Any return code or message that is generated by the DBMS is available in the
macro variables SQLXRC and SQLXMSG after the statement completes.
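The following sketch shows how CONNECT, EXECUTE, and DISCONNECT might fit together in one step. It assumes that SAS/ACCESS Interface to Oracle is licensed; the alias ORA1, the connection values, and the DBMS table STAGING_ORDERS are all placeholders, and the connection options for your DBMS may differ.

```sas
proc sql;
   /* All connection values are placeholders. */
   connect to oracle as ora1 (user=myuser password=mypass path='mypath');

   /* The text in parentheses is sent to the DBMS; it cannot be a SELECT. */
   execute (delete from staging_orders where status = 'VOID') by ora1;

   /* Inspect the DBMS return code and message for the EXECUTE statement. */
   %put SQLXRC=&sqlxrc SQLXMSG=&sqlxmsg;

   /* An implicit COMMIT is performed before the connection ends. */
   disconnect from ora1;
quit;
```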
INSERT Statement
Adds rows to a new or existing table or view.
Restriction: You cannot use INSERT INTO on a table that is accessed with an engine
that does not support UPDATE processing.
Featured in:
Arguments
column
proc-sql-view
specifies a PROC SQL view into which you are inserting rows. proc-sql-view can be a
one-level name, a two-level libref.view name, or a physical pathname that is enclosed
in single quotation marks.
query-expression
table-name
specifies a PROC SQL table into which you are inserting rows. table-name can be a
one-level name, a two-level libref.table name, or a physical pathname that is enclosed
in single quotation marks.
value
is a data value.
CAUTION:
Recursive table references can cause data integrity problems. While it is possible to
recursively reference the target table of an INSERT statement, doing so can cause
data integrity problems and incorrect results. Constructions such as the following
should be avoided:
proc sql;
insert into a
select var1, var2
from a
where var1 > 0;
Note: If the INSERT statement includes an optional list of column names, then only
those columns are given values by the statement. Columns that are in the table but not
listed are given missing values.
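As an illustration of the column-list behavior, the sketch below inserts rows into a hypothetical table WORK.NEWHOUSES while naming only two of its three columns. Under the rule stated in the note, the unlisted column Price would be expected to receive missing values.

```sas
proc sql;
   /* Hypothetical table with three columns. */
   create table work.newhouses
      (style  char(8),
       sqfeet num,
       price  num);

   /* Only STYLE and SQFEET are listed, so PRICE is not given a value. */
   insert into work.newhouses (style, sqfeet)
      values ('CONDO', 950)
      values ('RANCH', 1300);

   select * from work.newhouses;
quit;
```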
RESET Statement
Resets PROC SQL options without restarting the procedure.
Featured in:
RESET <option(s)>;
The RESET statement enables you to add, drop, or change the options in PROC SQL
without restarting the procedure. See PROC SQL Statement on page 1033 for a
description of the options.
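For example, a step might start with NOPRINT and later turn printing back on. This sketch assumes that PROCLIB.HOUSES exists; the macro variable name NROWS is arbitrary.

```sas
proc sql noprint;
   /* No output is printed while NOPRINT is in effect. */
   select count(*) into :nrows
      from proclib.houses;

   /* Change options without leaving PROC SQL. */
   reset print number;

   /* This query prints, with a row number column. */
   select style, sqfeet
      from proclib.houses;
quit;
```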
SELECT Statement
Selects columns and rows of data from tables and views.
Restriction:
The clauses in the SELECT statement must appear in the order shown.
SELECT Clause
Lists the columns that will appear in the output.
See Also: "column-definition" on page 1075
Featured in:
Arguments
alias
Column Aliases
A column alias is a temporary, alternate name for a column. Aliases are specified in
the SELECT clause to name or rename columns so that the result table is clearer or
easier to read. Aliases are often used to name a column that is the result of an
arithmetic expression or summary function. An alias is one word only. If you need a
longer column name, then use the LABEL= column-modifier, as described in
"column-modifier" on page 1076. The keyword AS is not required with a column alias.
Column aliases are optional, and each column name in the SELECT clause can have
an alias. After you assign an alias to a column, you can use the alias to refer to that
column in other clauses.
If you use a column alias when creating a PROC SQL view, then the alias becomes
the permanent name of the column for each execution of the view.
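A brief sketch of a column alias follows, assuming that PROCLIB.HOUSES exists. The alias SqMeters names the result of an arithmetic expression and is then reused in the ORDER BY clause.

```sas
proc sql;
   /* SqMeters is an alias for the computed column (1 sq m = 10.764 sq ft). */
   select style,
          sqfeet / 10.764 as SqMeters format=8.1
      from proclib.houses
      order by SqMeters;
quit;
```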
INTO Clause
Stores the value of one or more columns for use later in another PROC SQL query or SAS
statement.
Restriction: An INTO clause cannot be used in a CREATE TABLE statement.
See also: "Using Macro Variables Set by PROC SQL" on page 1119
INTO macro-variable-specification
<, macro-variable-specification>
Arguments
macro-variable
specifies a SAS macro variable that stores the values of the rows that are returned.
macro-variable-specification
NOTRIM
protects the leading and trailing blanks from being deleted from values that are
stored in a range of macro variables or multiple values that are stored in a single
macro variable.
SEPARATED BY character
Details
• Use the INTO clause only in the outer query of a SELECT statement and not in a
subquery.
• When storing a single value into a macro variable, PROC SQL preserves leading
or trailing blanks. However, when storing values into a range of macro variables,
or when using the SEPARATED BY option to store multiple values in one macro
variable, PROC SQL trims leading or trailing blanks unless you use the NOTRIM
option.
• You can put multiple rows of the output into macro variables. You can check the
PROC SQL macro variable SQLOBS to see the number of rows that are produced
by a query-expression. See "Using Macro Variables Set by PROC SQL" on page
1119 for more information on SQLOBS.
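As a sketch of how SQLOBS might be combined with the INTO clause, the step below stores a list of values in one macro variable and then copies SQLOBS before another statement can change it. It assumes that PROCLIB.HOUSES exists; the macro variable names STYLES and NSTYLES are arbitrary.

```sas
proc sql noprint;
   /* Store all distinct styles in a single macro variable. */
   select distinct style
      into :styles separated by ' '
      from proclib.houses;

   /* Copy SQLOBS right away; later queries reset it. */
   %let nstyles = &sqlobs;
quit;

%put Row count reported by SQLOBS: &nstyles;
%put Styles: &styles;
```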
Examples
These examples use the PROCLIB.HOUSES table:
Style      SqFeet
-----------------
CONDO         900
CONDO        1000
RANCH        1200
RANCH        1400
SPLIT        1600
SPLIT        1800
TWOSTORY     2100
TWOSTORY     3000
TWOSTORY     1940
TWOSTORY     1860
• You can create macro variables based on the first row of the result.
proc sql noprint;
select style, sqfeet
into :style, :sqfeet
from proclib.houses;
%put &style &sqfeet;
• You can create one new macro variable per row in the result of the SELECT
statement. This example shows how you can request more values for one column
than for another. The hyphen (-) is used in the INTO clause to imply a range of
macro variables. You can use either of the keywords THROUGH or THRU instead
of a hyphen.
The following PROC SQL step puts the values from the first four rows of the
PROCLIB.HOUSES table into macro variables:
proc sql noprint;
select distinct Style, SqFeet
into :style1 - :style3, :sqfeet1 - :sqfeet4
from proclib.houses;
%put &style1 &sqfeet1;
%put &style2 &sqfeet2;
%put &style3 &sqfeet3;
%put &sqfeet4;
• You can concatenate the values of one column into one macro variable. This form
is useful for building up a list of variables or constants.
proc sql noprint;
select distinct style
into :s1 separated by ','
from proclib.houses;
%put &s1;
CONDO,RANCH,SPLIT,TWOSTORY
• You can use leading zeros in order to create a range of macro variable names, as
shown in the following example:
proc sql noprint;
select SqFeet
into :sqfeet01 - :sqfeet10
from proclib.houses;
%put &sqfeet01 &sqfeet02 &sqfeet03 &sqfeet04 &sqfeet05;
%put &sqfeet06 &sqfeet07 &sqfeet08 &sqfeet09 &sqfeet10;
15   %put &sqfeet01 &sqfeet02 &sqfeet03 &sqfeet04 &sqfeet05;
900 1000 1200 1400 1600
16   %put &sqfeet06 &sqfeet07 &sqfeet08 &sqfeet09 &sqfeet10;
1800 2100 3000 1940 1860
• You can prevent leading and trailing blanks from being trimmed from values that
are stored in macro variables. By default, when storing values in a range of macro
variables or when storing multiple values in one macro variable (with the
SEPARATED BY option), PROC SQL trims the leading and trailing blanks from
the values before creating the macro variables. If you do not want the blanks to be
trimmed, then add the NOTRIM option, as shown in the following example:
proc sql noprint;
select style, sqfeet
into :style1 - :style4 notrim,
:sqfeet separated by ',' notrim
from proclib.houses;
%put *&style1* *&sqfeet*;
%put *&style2* *&sqfeet*;
%put *&style3* *&sqfeet*;
%put *&style4* *&sqfeet*;
The results are written to the SAS log, as shown in the following output:
3    proc sql noprint;
4    select style, sqfeet
5    into :style1 - :style4 notrim,
6    :sqfeet separated by ',' notrim
7    from proclib.houses;
8
9    %put *&style1* *&sqfeet*;
*CONDO   * *     900,    1000,    1200,    1400,    1600,    1800,    2100,    3000,    1940,    1860*
10   %put *&style2* *&sqfeet*;
*CONDO   * *     900,    1000,    1200,    1400,    1600,    1800,    2100,    3000,    1940,    1860*
11   %put *&style3* *&sqfeet*;
*RANCH   * *     900,    1000,    1200,    1400,    1600,    1800,    2100,    3000,    1940,    1860*
12   %put *&style4* *&sqfeet*;
*RANCH   * *     900,    1000,    1200,    1400,    1600,    1800,    2100,    3000,    1940,    1860*
FROM Clause
Specifies source tables or views.
Featured in: Example 1 on page 1125, Example 4 on page 1131, Example 9 on page 1145,
and Example 10 on page 1148
FROM from-list
Arguments
alias
specifies a temporary, alternate name for a table, view, or in-line view that is
specified in the FROM clause.
column
names the column that appears in the output. The column names that you specify
are matched by position to the columns in the output.
from-list
Table Aliases
A table alias is a temporary, alternate name for a table that is specified in the FROM
clause. Table aliases are prefixed to column names to distinguish between columns that
are common to multiple tables. Column names in reflexive joins (joining a table with
itself) must be prefixed with a table alias in order to distinguish which copy of the table
the column comes from. Column names in other kinds of joins must be prefixed with
table aliases or table names unless the column names are unique to those tables.
The optional keyword AS is often used to distinguish a table alias from other table
names.
In-Line Views
The FROM clause can itself contain a query-expression that takes an optional table
alias. This kind of nested query-expression is called an in-line view. An in-line view is
any query-expression that would be valid in a CREATE VIEW statement. PROC SQL
can support many levels of nesting, but it is limited to 32 tables in any one query. The
32-table limit includes underlying tables that may contribute to views that are specified
in the FROM clause.
An in-line view saves you a programming step. Rather than creating a view and
referring to it in another query, you can specify the view in-line in the FROM clause.
Characteristics of in-line views include the following:
• An in-line view is not assigned a permanent name, although it can take an alias.
• An in-line view can be referred to only in the query in which it is defined. It
cannot be referenced in another query.
• The names of columns in an in-line view can be assigned in the object-item list of
that view or with a parenthesized list of names following the alias. This syntax
can be useful for renaming columns. See Example 10 on page 1148 for an example.
• In order to visually separate an in-line view from the rest of the query, you can
enclose the in-line view in any number of pairs of parentheses. Note that if you
specify an alias for the in-line view, the alias specification must appear outside the
outermost pair of parentheses for that in-line view.
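The following sketch uses an in-line view with an alias in the FROM clause. It assumes that PROCLIB.HOUSES exists; the aliases H and A and the column alias AvgSq are arbitrary. The in-line view computes the average size per style, and the outer query keeps houses that are larger than the average for their style.

```sas
proc sql;
   select h.style, h.sqfeet, a.AvgSq format=8.1
      from proclib.houses as h,
           /* In-line view: the alias A appears outside the parentheses. */
           (select style, avg(sqfeet) as AvgSq
               from proclib.houses
               group by style) as a
      where h.style = a.style
        and h.sqfeet > a.AvgSq;
quit;
```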
WHERE Clause
Subsets the output based on specified conditions.
Featured in: Example 4 on page 1131 and Example 9 on page 1145
WHERE sql-expression
Argument
sql-expression
Details
• When a condition is met (that is, the condition resolves to true), those rows are
displayed in the result table; otherwise, no rows are displayed.
• You cannot use summary functions that specify only one column. For example:
where max(measure1) > 50;
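One way to express a condition on a summary value is to group the data and test the summary in a HAVING clause instead of the WHERE clause. The sketch below assumes that PROCLIB.HOUSES exists; the alias MaxSq is arbitrary.

```sas
proc sql;
   /* WHERE filters individual rows before grouping. */
   /* HAVING filters groups by their summary values. */
   select style, max(sqfeet) as MaxSq
      from proclib.houses
      where sqfeet > 0
      group by style
      having max(sqfeet) > 1500;
quit;
```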
GROUP BY Clause
Specifies how to group the data for summarizing.
Featured in: Example 8 on page 1143 and Example 12 on page 1152
Arguments
group-by-item
Details
• You can specify more than one group-by-item to get more detailed reports. Both
the grouping of multiple items and the BY statement of a PROC step are
evaluated in similar ways. If more than one group-by-item is specified, then the
first one determines the major grouping.
• Integers can be substituted for column names (that is, SELECT object-items) in
the GROUP BY clause. For example, if the group-by-item is 2, then the results are
grouped by the values in the second column of the SELECT clause list. Using
integers can shorten your coding and enable you to group by the value of an
unnamed expression in the SELECT list. Note that if you use a floating-point
value (for example, 2.3), then PROC SQL ignores the decimal portion.
• The data does not have to be sorted in the order of the group-by values because
PROC SQL handles sorting automatically. You can use the ORDER BY clause to
specify the order in which rows are displayed in the result table.
• If you specify a GROUP BY clause in a query that does not contain a summary
function, then your clause is transformed into an ORDER BY clause and a
message to that effect is written to the SAS log.
• You can group the output by the values that are returned by an expression. For
example, if X is a numeric variable, then the output of the following is grouped by
the integer portion of values of X:
select x, sum(y)
from table1
group by int(x);
Note that an expression that contains only numeric literals (and functions of
numeric literals) or only character literals (and functions of character literals) is
ignored.
An expression in a GROUP BY clause cannot be a summary function. For
example, the following GROUP BY clause is not valid:
group by sum(x)
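As a sketch of grouping by position, the query below uses the integer 1 in place of the first SELECT item. It assumes that PROCLIB.HOUSES exists; the aliases are arbitrary.

```sas
proc sql;
   /* GROUP BY 1 groups by the first item in the SELECT list (Style). */
   select style,
          count(*)    as Houses,
          avg(sqfeet) as AvgSqFeet format=8.1
      from proclib.houses
      group by 1;
quit;
```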
HAVING Clause
Subsets grouped data based on specified conditions.
Featured in: Example 8 on page 1143 and Example 12 on page 1152
HAVING sql-expression
Argument
sql-expression
Note: This query involves remerged data because the values returned by a
summary function are compared to values of a column that is not in the GROUP BY
clause. See "Remerging Data" on page 1110 for more information about summary
functions and remerging data.
ORDER BY Clause
Specifies the order in which rows are displayed in a result table.
See also: query-expression on page 1093
Featured in: Example 11 on page 1150
Arguments
order-by-item
ASC
orders the data in ascending order. This is the default order; if neither ASC nor
DESC is specified, the data is ordered in ascending order.
DESC
Details
• The ORDER BY clause sorts the result of a query expression according to the
order specified in that query. When this clause is used, the default ordering
sequence is ascending, from the lowest value to the highest. You can use the
SORTSEQ= option to change the collating sequence for your output. See "PROC
SQL Statement" on page 1033.
• If an ORDER BY clause is omitted, then a particular order to the output rows,
such as the order in which the rows are encountered in the queried table, cannot
be guaranteed. Without an ORDER BY clause, the order of the output rows is
determined by the internal processing of PROC SQL, the default collating
sequence of SAS, and your operating environment. Therefore, if you want your
result table to appear in a particular order, then use the ORDER BY clause.
• If more than one order-by-item is specified (separated by commas), then the first
one determines the major sort order.
• Integers can be substituted for column names (that is, SELECT object-items) in
the ORDER BY clause. For example, if the order-by-item is 2 (an integer), then the
results are ordered by the values of the second column. If a query-expression
includes a set operator (for example, UNION), then use integers to specify the
order. Doing so avoids ambiguous references to columns in the table expressions.
Note that if you use a floating-point value (for example, 2.3) instead of an integer,
then PROC SQL ignores the decimal portion.
• In the ORDER BY clause, you can specify any column of a table or view that is
specified in the FROM clause of a query-expression, regardless of whether that
column has been included in the query's SELECT clause. For example, this query
produces a report ordered by the descending values of the population change for
each country from 1990 to 1995:
proc sql;
select country
from census
order by pop95-pop90 desc;
• You can order the output by the values that are returned by an expression. For
example, if X is a numeric variable, then the output of the following is ordered by
the integer portion of values of X:
select x, y
from table1
order by int(x);
Note that an expression that contains only numeric literals (and functions of
numeric literals) or only character literals (and functions of character literals) is
ignored.
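A short sketch that combines several of these points follows. It assumes that PROCLIB.HOUSES exists. The first order-by-item determines the major sort order, and the integer 2 refers to the second SELECT item.

```sas
proc sql;
   /* Major sort: Style, ascending by default.             */
   /* Minor sort: second SELECT item (SqFeet), descending. */
   select style, sqfeet
      from proclib.houses
      order by style, 2 desc;
quit;
```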
UPDATE Statement
Modifies a column's values in existing rows of a table or view.
Restriction: You cannot use UPDATE on a table that is accessed by an engine that does
not support UPDATE processing.
Featured in: Example 3 on page 1129
Arguments
alias
sql-expression
Details
• Any column that is not modified retains its original values, except in certain
queries using the CASE expression. See "CASE expression" on page 1073 for a
description of CASE expressions.
• To add, drop, or modify a column's definition or attributes, use the ALTER TABLE
statement, described in "ALTER TABLE Statement" on page 1038.
• In the SET clause, a column reference on the left side of the equal sign can also
appear as part of the expression on the right side of the equal sign. For example,
you could use this expression to give employees a $1,000 holiday bonus:
set salary=salary + 1000
• If you omit the WHERE clause, then all the rows are updated. When you use a
WHERE clause, only the rows that meet the WHERE condition are updated.
• When you update a column and an index has been defined for that column, the
values in the updated column continue to have the index defined for them.
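The sketch below puts the SET and WHERE clauses together in one UPDATE statement. It assumes that PROCLIB.HOUSES exists; the copy WORK.HOUSES_UPD is a hypothetical name used so that the source table is not modified.

```sas
proc sql;
   /* Work on a copy so that PROCLIB.HOUSES itself is not changed. */
   create table work.houses_upd as
      select * from proclib.houses;

   /* SqFeet appears on both sides of the equal sign. */
   /* Only rows that meet the WHERE condition change. */
   update work.houses_upd
      set sqfeet = sqfeet + 100
      where style = 'CONDO';
quit;
```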
VALIDATE Statement
Checks the accuracy of a query-expression's syntax and semantics without executing the
expression.
VALIDATE query-expression;
Argument
query-expression
Details
• The VALIDATE statement writes a message in the SAS log that states that the
query is valid. If there are errors, then VALIDATE writes error messages to the
SAS log.
• The VALIDATE statement can also be included in applications that use the macro
facility. When used in such an application, VALIDATE returns a value that
indicates the query-expression's validity. The value is returned through the macro
variable SQLRC (a short form for SQL return code). For example, if a SELECT
statement is valid, then the macro variable SQLRC returns a value of 0. See
"Using Macro Variables Set by PROC SQL" on page 1119 for more information.
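A minimal sketch of checking a query before running it follows, assuming that PROCLIB.HOUSES exists. The query is not executed; only its validity is reported.

```sas
proc sql;
   /* Checks syntax and semantics without returning any rows. */
   validate
      select style, sqfeet
         from proclib.houses
         where sqfeet > 1500;

   /* A value of 0 indicates that the query-expression is valid. */
   %put SQLRC=&sqlrc;
quit;
```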
BETWEEN condition
Selects rows where column values are within a range of values.
sql-expression <NOT> BETWEEN sql-expression
AND sql-expression
Argument
sql-expression
Details
• The sql-expressions must be of compatible data types. They must be either all
numeric or all character types.
• Because a BETWEEN condition evaluates the boundary values as a range, it is
not necessary to specify the smaller quantity first.
• You can use the NOT logical operator to exclude a range of numbers, for example,
to eliminate customer numbers between 1 and 15 (inclusive) so that you can
retrieve data on more recently acquired customers.
• PROC SQL supports the same comparison operators that the DATA step supports.
For example:
x between 1 and 3
x between 3 and 1
1<=x<=3
x>=1 and x<=3
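The following sketch applies BETWEEN and NOT BETWEEN in WHERE clauses, assuming that PROCLIB.HOUSES exists. The boundary values are included in the range.

```sas
proc sql;
   /* Rows with SqFeet from 1200 through 1800, inclusive. */
   select style, sqfeet
      from proclib.houses
      where sqfeet between 1200 and 1800;

   /* NOT excludes the same inclusive range. */
   select style, sqfeet
      from proclib.houses
      where sqfeet not between 1200 and 1800;
quit;
```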
BTRIM function
Removes blanks or specified characters from the beginning, the end, or both the beginning and
end of a character string.
BTRIM (<<btrim-specification> <'btrim-character' FROM>> sql-expression)
Arguments
btrim-specification
btrim-character
is a single character that is to be removed from the character string. The default
character is a blank.
sql-expression
Details
The BTRIM function operates on character strings. BTRIM removes one or more
instances of a single character (the value of btrim-character) from the beginning, the
end, or both the beginning and end of a string, depending on whether LEADING,
TRAILING, or BOTH is specified. If btrim-specification is not specified, then BOTH is
used. If btrim-character is omitted, then blanks are removed.
Note: SAS adds trailing blanks to character values that are shorter than the length
of the variable. Suppose you have a character variable Z, with length 10, and a value
'xxabcxx'. SAS stores the value with three blanks after the last x (for a total length of
10). If you attempt to remove all the x characters with
btrim(both 'x' from z)
then the result is 'abcxx' because PROC SQL sees the trailing characters as blanks, not
the x character. In order to remove all the x characters, use
btrim(both 'x' from btrim(z))
The inner BTRIM function removes the trailing blanks before passing the value to the
outer BTRIM function.
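The situation in the note can be reproduced with a small test table. This is a sketch that assumes BTRIM is available in your SAS release with the syntax shown in this section; the table WORK.DEMO and the column aliases are hypothetical.

```sas
/* Hypothetical one-row table with a padded character value. */
data work.demo;
   length z $10;
   z = 'xxabcxx';   /* stored with three trailing blanks */
run;

proc sql;
   /* OneStep trims X from the padded value;           */
   /* TwoStep removes the trailing blanks first.       */
   select btrim(both 'x' from z)        as OneStep,
          btrim(both 'x' from btrim(z)) as TwoStep
      from work.demo;
quit;
```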
CALCULATED
Refers to columns already calculated in the SELECT clause.
CALCULATED column-alias
Argument
column-alias
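As a sketch of how CALCULATED might be used, the query below refers to a column alias from the same SELECT clause inside the WHERE clause. It assumes that PROCLIB.HOUSES exists; the alias SqMeters is arbitrary.

```sas
proc sql;
   /* SqMeters is computed in the SELECT clause and    */
   /* reused in the WHERE clause through CALCULATED.   */
   select style, sqfeet,
          sqfeet / 10.764 as SqMeters format=8.1
      from proclib.houses
      where calculated SqMeters > 150;
quit;
```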
CASE expression
Selects result values that satisfy specied conditions.
Featured in: Example 3 on page 1129 and Example 13 on page 1154
CASE <case-operand>
WHEN when-condition THEN result-expression
<WHEN when-condition THEN result-expression>
<ELSE result-expression>
END
Arguments
case-operand
is a valid sql-expression that resolves to a table column whose values are compared
to all the when-conditions. See sql-expression on page 1099.
when-condition
Details
The CASE expression selects values if certain conditions are met. A CASE expression
returns a single value that is conditionally evaluated for each row of a table (or view).
Use the WHEN-THEN clauses when you want to execute a CASE expression for some
but not all of the rows in the table that is being queried or created. An optional ELSE
expression gives an alternative action if no THEN expression is executed.
When you omit case-operand, when-condition is evaluated as a Boolean (true or false)
value. If when-condition returns a nonzero, nonmissing result, then the WHEN clause
is true. If case-operand is specified, then it is compared with when-condition for
equality. If case-operand equals when-condition, then the WHEN clause is true.
If the when-condition is true for the row that is being executed, then the
result-expression that follows THEN is executed. If when-condition is false, then PROC
SQL evaluates the next when-condition until they are all evaluated. If every
when-condition is false, then PROC SQL executes the ELSE expression, and its result
becomes the CASE expression's result. If no ELSE expression is present and every
when-condition is false, then the result of the CASE expression is a missing value.
You can use a CASE expression as an item in the SELECT clause and as either
operand in an sql-expression.
Example
The following two PROC SQL steps show two equivalent CASE expressions that
create a character column with the strings in the THEN clause. The CASE expression
in the second PROC SQL step is a shorthand method that is useful when all the
comparisons are with the same column.
proc sql;
select Name, case
when Continent = 'North America' then 'Continental U.S.'
when Continent = 'Oceania' then 'Pacific Islands'
else 'None'
end as Region
from states;
proc sql;
select Name, case Continent
when 'North America' then 'Continental U.S.'
when 'Oceania' then 'Pacific Islands'
else 'None'
end as Region
from states;
Note: When you use the shorthand method, the conditions must all be equality
tests. That is, they cannot use comparison operators or other types of operators.
COALESCE Function
Returns the first nonmissing value from a list of columns.
Featured in: Example 7 on page 1138
Arguments
column-name
Details
COALESCE accepts one or more column names of the same data type. The
COALESCE function checks the value of each column in the order in which they are
listed and returns the first nonmissing value. If only one column is listed, the
COALESCE function returns the value of that column. If all the values of all
arguments are missing, the COALESCE function returns a missing value.
In some SQL DBMSs, the COALESCE function is called the IFNULL function. See
PROC SQL and the ANSI Standard on page 1122 for more information.
Note: If your query contains a large number of COALESCE function calls, it might
be more efficient to use a natural join instead. See "Natural Joins" on page 1088.
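A common use of COALESCE is to combine key columns after a full join. The sketch below builds two small hypothetical tables, WORK.A and WORK.B, and takes the first nonmissing value from each pair of columns.

```sas
/* Hypothetical input tables, for illustration only. */
data work.a;
   input id x;
   datalines;
1 10
2 .
;
run;

data work.b;
   input id y;
   datalines;
2 20
3 30
;
run;

proc sql;
   /* Each COALESCE call returns the first nonmissing argument. */
   select coalesce(a.id, b.id) as id,
          coalesce(a.x, b.y)   as value
      from work.a full join work.b
         on a.id = b.id;
quit;
```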
column-definition
Defines PROC SQL's data types and dates.
See also: "column-modifier" on page 1076
Featured in: Example 1 on page 1125
Arguments
column
is a column name.
column-modifier
Details
• SAS supports many but not all of the data types that SQL-based databases
support.
• For all the numeric data types (INTEGER, SMALLINT, DECIMAL, NUMERIC,
FLOAT, REAL, DOUBLE PRECISION, and DATE), the SQL procedure defaults to
the SAS data type NUMERIC. The width and ndec arguments are ignored; PROC
SQL creates all numeric columns with the maximum precision allowed by SAS. If
you want to create numeric columns that use less storage space, then use the
LENGTH statement in the DATA step. The various numeric data type names,
along with the width and ndec arguments, are included for compatibility with
other SQL software.
• For the character data types (CHARACTER and VARCHAR), the SQL procedure
defaults to the SAS data type CHARACTER. The width argument is honored.
• The CHARACTER, INTEGER, and DECIMAL data types can be abbreviated to
CHAR, INT, and DEC, respectively.
• A column that is declared with DATE is a SAS numeric variable with a date
informat or format. You can use any of the column-modifiers to set the appropriate
attributes for the column that is being defined. See SAS Language Reference:
Dictionary for more information on dates.
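The following sketch declares columns with several of the data type names listed above and then uses DESCRIBE TABLE to show how PROC SQL stored them. The table WORK.STAFF and its columns are hypothetical.

```sas
proc sql;
   /* Numeric type names all map to SAS numeric columns;   */
   /* the width of the character column is honored.        */
   create table work.staff
      (id     integer,
       name   varchar(30),
       hired  date format=date9.,
       salary decimal(8,2) label='Annual Salary');

   /* Writes the resulting column definitions to the SAS log. */
   describe table work.staff;
quit;
```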
column-modifier
Sets column attributes.
See also: "column-definition" on page 1075 and "SELECT Clause" on page 1058
Featured in:
column-modifier
Arguments
column-modifier
If a special character must appear as the first character in the output, then
precede it with a space or a forward slash (/).
You can omit the LABEL= part of the column-modier and still specify a label.
Be sure to enclose the label in quotation marks, as in this example:
select empname "Names of Employees"
from sql.employees;
If an apostrophe must appear in the label, then type it twice so that SAS reads
the apostrophe as a literal. Alternatively, you can use single and double quotation
marks alternately (for example, "Date Rec'd").
LENGTH=length
specifies the length of the column. This column modifier is valid only in the
context of a SELECT statement.
TRANSCODE=YES|NO
for character columns, specifies whether values can be transcoded. Use
TRANSCODE=NO to suppress transcoding. Note that when you create a table by
using the CREATE TABLE AS statement, the transcoding attribute for a given
character column in the created table is the same as it is in the source table unless
you change it with the TRANSCODE= column modifier. For more information
about transcoding, see SAS National Language Support (NLS): User's Guide.
Default: YES
Restriction: Suppression of transcoding is not supported for the V6TAPE engine.
Interaction: If the TRANSCODE= attribute is set to NO for any character
variable in a table, then PROC CONTENTS prints a transcode column that
contains the TRANSCODE= value for each variable in the data set. If all
variables in the table are set to the default TRANSCODE= value (YES), then no
transcode column is printed.
Details
If you refer to a labeled column in the ORDER BY or GROUP BY clause, then you
must use either the column name (not its label), the column's alias, or its ordering
integer (for example, ORDER BY 2). See the section on SAS statements in SAS
Language Reference: Dictionary for more information about labels.
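The following sketch, which assumes a small hypothetical table WORK.EMP, shows a label given with and without LABEL=, and an ORDER BY clause that refers to a labeled column by its ordering integer rather than by its label.

```sas
data work.emp;   /* hypothetical sample data */
   input empname $ salary;
   datalines;
Ruiz 41000
Adams 52000
Chen 47000
;
run;

proc sql;
   select empname "Names of Employees",            /* label without LABEL= */
          salary label='Annual Salary' format=dollar10.
      from work.emp
      order by 2 desc;   /* ordering integer; ORDER BY salary DESC is equivalent */
quit;
```

Writing ORDER BY 'Annual Salary' would not sort by the column, because the label is only a display attribute.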
column-name
Specifies the column to select.
See also: "column-modifier" on page 1076 and "SELECT Clause" on page 1058
column-name
column-name
Details
A column can be referred to by its name alone if it is the only column by that name
in all the tables or views listed in the current query-expression. If the same column
name exists in more than one table or view in the query-expression, then you must
qualify each use of the column name by prefixing a reference to the table that contains
it. Consider the following examples:
SALARY
EMP.SALARY
E.SALARY
CONNECTION TO
Retrieves and uses DBMS data in a PROC SQL query or view.
Tip: You can use CONNECTION TO in the SELECT statement's FROM clause as part
of the from-list.
See also: "Connecting to a DBMS Using the SQL Procedure Pass-Through Facility" on
page 1115 and your SAS/ACCESS documentation.
Arguments
alias
dbms-query
specifies the query to send to a DBMS. The query uses the DBMS's dynamic SQL.
You can use any SQL syntax that the DBMS understands, even if that is not valid for
PROC SQL. However, your DBMS query cannot contain a semicolon because that
represents the end of a statement to SAS.
The number of tables that you can join with dbms-query is determined by the
DBMS. Each CONNECTION TO component counts as one table toward the 32-table
PROC SQL limit for joins.
See SAS/ACCESS for Relational Databases: Reference for more information about
DBMS queries.
CONTAINS condition
Tests whether a string is part of a column's value.
Alias:
Restriction:
Argument
sql-expression
EXISTS condition
Tests if a subquery returns one or more rows.
See also: Query Expressions (Subqueries) on page 1102
Argument
query-expression
Details
The EXISTS condition is an operator whose right operand is a subquery. The result
of an EXISTS condition is true if the subquery resolves to at least one row. The result
of a NOT EXISTS condition is true if the subquery evaluates to zero rows. For example,
the following query subsets PROCLIB.PAYROLL (which is shown in Example 2 on page
1127) based on the criteria in the subquery. If the value for STAFF.IDNUM is on the
same row as the value 'CT' in PROCLIB.STAFF (which is shown in Example 4 on page
1131), then the matching IDNUM in PROCLIB.PAYROLL is included in the output.
Thus, the query returns all the employees from PROCLIB.PAYROLL who live in CT.
proc sql;
   select *
      from proclib.payroll p
      where exists (select *
                       from proclib.staff s
                       where p.idnumber=s.idnum
                         and state='CT');
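The same pattern can be tried without the PROCLIB library. The sketch below uses two small hypothetical tables, WORK.PAYROLL and WORK.STAFF, as stand-ins for the sample tables, and adds the NOT EXISTS form for comparison.

```sas
data work.payroll;   /* hypothetical stand-in for PROCLIB.PAYROLL */
   input idnumber $ salary;
   datalines;
1001 42000
1002 38000
1003 51000
;
run;

data work.staff;     /* hypothetical stand-in for PROCLIB.STAFF */
   input idnum $ state $;
   datalines;
1001 CT
1002 NY
1003 CT
;
run;

proc sql;
   /* Rows whose ID has a matching CT row in WORK.STAFF: 1001 and 1003. */
   select *
      from work.payroll p
      where exists (select *
                       from work.staff s
                       where p.idnumber=s.idnum
                         and s.state='CT');

   /* NOT EXISTS keeps the rows for which the subquery finds nothing: 1002. */
   select *
      from work.payroll p
      where not exists (select *
                           from work.staff s
                           where p.idnumber=s.idnum
                             and s.state='CT');
quit;
```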
IN condition
Tests set membership.
Featured in:
Arguments
constant
is a number or a quoted character string (or other special notation) that indicates a
fixed value. Constants are also called literals.
query-expression
Details
An IN condition tests if the column value that is returned by the sql-expression on
the left is a member of the set (of constants or values returned by the query-expression)
on the right. The IN condition is true if the value of the left-hand operand is in the set
of values that are defined by the right-hand operand.
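A short sketch of both forms of the right-hand operand follows: a list of constants and a query-expression. The tables WORK.FLIGHTS and WORK.INTL are hypothetical.

```sas
data work.flights;   /* hypothetical sample data */
   input flight $ dest $;
   datalines;
145 ORD
193 FRA
207 LON
311 SJA
;
run;

data work.intl;      /* hypothetical list of international destinations */
   input dest $;
   datalines;
FRA
LON
;
run;

proc sql;
   /* Set of constants on the right. */
   select * from work.flights
      where dest in ('ORD', 'SJA');

   /* Set of values returned by a query-expression on the right. */
   select * from work.flights
      where dest in (select dest from work.intl);

   /* NOT IN selects the rows whose value is outside the set. */
   select * from work.flights
      where dest not in (select dest from work.intl);
quit;
```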
IS condition
Tests for a missing value.
Featured in: Example 5 on page 1134
Argument
sql-expression
Details
IS NULL and IS MISSING are predicates that test for a missing value. IS NULL and
IS MISSING are used in the WHERE, ON, and HAVING expressions. Each predicate
resolves to true if the sql-expression's result is missing and false if it is not missing.
SAS stores a numeric missing value as a period (.) and a character missing value as
a blank space. Unlike missing values in some versions of SQL, missing values in SAS
always appear first in the collating sequence. Therefore, in Boolean and comparison
operations, the following expressions resolve to true in a predicate:
3>null
-3>null
0>null
The SAS way of evaluating missing values differs from that of the ANSI Standard for
SQL. According to the Standard, these expressions are NULL. See sql-expression on
page 1099 for more information on predicates and operators. See PROC SQL and the
ANSI Standard on page 1122 for more information on the ANSI Standard.
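The practical consequence is that a comparison such as score < 0 also selects rows in which score is missing. The sketch below illustrates this with a hypothetical table, WORK.SCORES.

```sas
data work.scores;   /* hypothetical sample data; Bob has a missing score */
   input name $ score;
   datalines;
Ann 12
Bob .
Cy -3
;
run;

proc sql;
   /* IS MISSING and IS NULL are interchangeable: both select Bob. */
   select name from work.scores where score is missing;
   select name from work.scores where score is null;

   /* A missing value sorts below every number, so this selects Bob and Cy. */
   select name from work.scores where score < 0;

   /* Exclude missing values explicitly to select only Cy. */
   select name from work.scores
      where score < 0 and score is not missing;
quit;
```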
joined-table
Joins a table with itself or with other tables or views.
Restrictions:
See also: FROM Clause on page 1063 and query-expression on page 1093
Featured in: Example 4 on page 1131, Example 7 on page 1138, Example 9 on page 1145,
Example 13 on page 1154, and Example 14 on page 1158
Arguments
alias
Types of Joins
- Inner join. See "Inner Joins" on page 1084.
- Outer join. See "Outer Joins" on page 1086.
- Cross join. See "Cross Joins" on page 1087.
- Union join. See "Union Joins" on page 1088.
- Natural join. See "Natural Joins" on page 1088.
Joining Tables
When multiple tables, views, or query-expressions are listed in the FROM clause,
they are processed to form one table. The resulting table contains data from each
contributing table. These queries are referred to as joins.
Conceptually, when two tables are specified, each row of table A is matched with all
the rows of table B to produce an internal or intermediate table. The number of rows in
the intermediate table (Cartesian product) is equal to the product of the number of rows
in each of the source tables. The intermediate table becomes the input to the rest of the
query in which some of its rows may be eliminated by the WHERE clause or
summarized by a summary function.
A common type of join is an equijoin, in which the values from a column in the first
table must equal the values of a column in the second table.
Table Limit
PROC SQL can process a maximum of 32 tables for a join. If you are using views in
a join, then the number of tables on which the views are based counts toward the
32-table limit. Each CONNECTION TO component in the Pass-Through Facility counts
as one table.
Table Aliases
Table aliases are used in joins to distinguish the columns of one table from those in
the other table(s). A table name or alias must be prefixed to a column name when you
are joining tables that have matching column names. See "FROM Clause" on page 1063
for more information on table aliases.
Inner Joins
An inner join returns a result table for all the rows in a table that have one or more
matching rows in the other table(s), as specified by the sql-expression. Inner joins can
be performed on up to 32 tables in the same query-expression.
You can perform an inner join by using a list of table-names separated by commas or
by using the INNER, JOIN, and ON keywords.
The LEFTTAB and RIGHTTAB tables are used to illustrate this type of join:
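The contents of the two tables can be recovered from the join output shown in this section. The DATA steps below are a sketch that recreates them in the WORK library; the use of list input and the default character length of 8 are assumptions, not the original definitions.

```sas
/* Sketch: rebuild LEFTTAB and RIGHTTAB from the rows shown in the join output. */
data lefttab;
   input Continent $ Export $ Country $;
   datalines;
NA wheat Canada
EUR corn France
EUR rice Italy
AFR oil Egypt
;
run;

data righttab;
   input Continent $ Export $ Country $;
   datalines;
NA sugar USA
EUR corn Spain
EUR beets Belgium
ASIA rice Vietnam
;
run;
```

With four rows in each table, the Cartesian product has 4 x 4 = 16 rows.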
The following example joins the LEFTTAB and RIGHTTAB tables to get the
Cartesian product of the two tables. The Cartesian product is the result of combining
every row from one table with every row from another table. You get the Cartesian
product when you join two tables and do not subset them with a WHERE clause or ON
clause.
proc sql;
   title 'The Cartesian Product of';
   title2 'LEFTTAB and RIGHTTAB';
   select *
      from lefttab, righttab;
The LEFTTAB and RIGHTTAB tables can be joined by listing the table names in the
FROM clause. The following query represents an equijoin because the values of
Continent from each table are matched. The column names are prefixed with the table
aliases so that the correct columns can be selected.
proc sql;
   title 'Inner Join';
   select *
      from lefttab as l, righttab as r
      where l.continent=r.continent;
                      Inner Join

Continent  Export  Country  Continent  Export  Country
------------------------------------------------------
NA         wheat   Canada   NA         sugar   USA
EUR        corn    France   EUR        corn    Spain
EUR        corn    France   EUR        beets   Belgium
EUR        rice    Italy    EUR        corn    Spain
EUR        rice    Italy    EUR        beets   Belgium
The following PROC SQL step is equivalent to the previous one and shows how to
write an equijoin using the INNER JOIN and ON keywords.
proc sql;
   title 'Inner Join';
   select *
      from lefttab as l inner join
           righttab as r
      on l.continent=r.continent;
See Example 4 on page 1131, Example 13 on page 1154, and Example 14 on page
1158 for more examples.
Outer Joins
Outer joins are inner joins that have been augmented with rows that did not match
with any row from the other table in the join. The three types of outer joins are left,
right, and full.
A left outer join, specified with the keywords LEFT JOIN and ON, has all the rows
from the Cartesian product of the two tables for which the sql-expression is true, plus
rows from the first (LEFTTAB) table that do not match any row in the second
(RIGHTTAB) table.
proc sql;
   title 'Left Outer Join';
   select *
      from lefttab as l left join
           righttab as r
      on l.continent=r.continent;
A right outer join, specified with the keywords RIGHT JOIN and ON, has all the
rows from the Cartesian product of the two tables for which the sql-expression is true,
plus rows from the second (RIGHTTAB) table that do not match any row in the first
(LEFTTAB) table.
proc sql;
   title 'Right Outer Join';
   select *
      from lefttab as l right join
           righttab as r
      on l.continent=r.continent;
A full outer join, specified with the keywords FULL JOIN and ON, has all the rows
from the Cartesian product of the two tables for which the sql-expression is true, plus
rows from each table that do not match any row in the other table.
proc sql;
   title 'Full Outer Join';
   select *
      from lefttab as l full join
           righttab as r
      on l.continent=r.continent;
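To see how the three outer joins differ in which unmatched rows they keep, it can help to use very small tables. The following sketch assumes two hypothetical tables, WORK.A and WORK.B, that share the key values B and C only. COALESCE is used in the full join so that the key is filled in from whichever table supplied the row.

```sas
data work.a;   /* hypothetical: keys A, B, C */
   input k $ x;
   datalines;
A 1
B 2
C 3
;
run;

data work.b;   /* hypothetical: keys B, C, D */
   input k $ y;
   datalines;
B 20
C 30
D 40
;
run;

proc sql;
   /* Matches for B and C, plus the unmatched row A from WORK.A. */
   select a.k, a.x, b.y
      from work.a left join work.b
      on a.k=b.k;

   /* Matches for B and C, plus the unmatched row D from WORK.B. */
   select b.k, a.x, b.y
      from work.a right join work.b
      on a.k=b.k;

   /* Matches for B and C, plus A and D; COALESCE picks the nonmissing key. */
   select coalesce(a.k, b.k) as k, a.x, b.y
      from work.a full join work.b
      on a.k=b.k;
quit;
```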
Cross Joins
A cross join returns as its result table the product of the two tables.
Using the LEFTTAB and RIGHTTAB example tables, the following program
demonstrates the cross join:
proc sql;
   title 'Cross Join';
   select *
      from lefttab as l cross join
           righttab as r;
                      Cross Join

Continent  Export  Country  Continent  Export  Country
------------------------------------------------------
NA         wheat   Canada   NA         sugar   USA
NA         wheat   Canada   EUR        corn    Spain
NA         wheat   Canada   EUR        beets   Belgium
NA         wheat   Canada   ASIA       rice    Vietnam
EUR        corn    France   NA         sugar   USA
EUR        corn    France   EUR        corn    Spain
EUR        corn    France   EUR        beets   Belgium
EUR        corn    France   ASIA       rice    Vietnam
EUR        rice    Italy    NA         sugar   USA
EUR        rice    Italy    EUR        corn    Spain
EUR        rice    Italy    EUR        beets   Belgium
EUR        rice    Italy    ASIA       rice    Vietnam
AFR        oil     Egypt    NA         sugar   USA
AFR        oil     Egypt    EUR        corn    Spain
AFR        oil     Egypt    EUR        beets   Belgium
AFR        oil     Egypt    ASIA       rice    Vietnam
The cross join is not functionally different from a Cartesian product join. You would
get the same result by submitting the following program:
proc sql;
select *
from lefttab, righttab;
Do not use an ON clause with a cross join. An ON clause will cause a cross join to
fail. However, you can use a WHERE clause to subset the output.
Union Joins
A union join returns a union of the columns of both tables. The union join places in
the results all rows with their respective column values from each input table. Columns
that do not exist in one table will have null (missing) values for those rows in the result
table. The following example demonstrates a union join.
proc sql;
   title 'Union Join';
   select *
      from lefttab union join righttab;
                      Union Join

Continent  Export  Country  Continent  Export  Country
------------------------------------------------------
                            NA         sugar   USA
                            EUR        corn    Spain
                            EUR        beets   Belgium
                            ASIA       rice    Vietnam
NA         wheat   Canada
EUR        corn    France
EUR        rice    Italy
AFR        oil     Egypt
Using a union join is similar to concatenating tables with the OUTER UNION set
operator. See query-expression on page 1093 for more information.
Do not use an ON clause with a union join. An ON clause will cause a union join to
fail.
Natural Joins
A natural join selects rows from two tables that have equal values in columns that
share the same name and the same type. An error results if two columns have the same
name but different types. If join-specification is omitted when specifying a natural join,
then INNER is implied. If no like columns are found, then a cross join is performed.
The following examples use these two tables:
         table1

   x         y         z
----------------------------
   1         2         3
   2         1         8
   6         5         4
   2         5         6

         table2

   x         b         z
----------------------------
   1         5         3
   3         5         4
   2         7         8
   6         0         4
Do not use an ON clause with a natural join. An ON clause will cause a natural join
to fail. When using a natural join, an ON clause is implied, matching all like columns.
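As a sketch of the implied ON clause, the following step builds the two tables shown above in the WORK library and runs a natural join next to an explicit inner join on the like-named columns x and z. The explicit query is an illustration of the matching rule, not a statement of how PROC SQL implements the join internally.

```sas
data work.table1;
   input x y z;
   datalines;
1 2 3
2 1 8
6 5 4
2 5 6
;
run;

data work.table2;
   input x b z;
   datalines;
1 5 3
3 5 4
2 7 8
6 0 4
;
run;

proc sql;
   /* Rows match when both x and z are equal in the two tables. */
   select *
      from work.table1 natural join work.table2;

   /* The same matching rule written out with INNER JOIN and ON. */
   select t1.x, t1.z, t1.y, t2.b
      from work.table1 as t1 inner join work.table2 as t2
      on t1.x=t2.x and t1.z=t2.z;
quit;
```

With these rows, three pairs agree on both x and z: (1,3), (2,8), and (6,4). The row of table1 with x=2 and z=6 has no partner in table2.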
table and then joining that table with the third one for the same result. However,
PROC SQL can do it all in one step as shown in the next example.
The example shows the joining of three tables: COMM, PRICE, and AMOUNT. To
calculate the total revenue from exports for each country, you need to multiply the
amount exported (AMOUNT table) by the price of each unit (PRICE table), and you
must know the commodity that each country exports (COMM table).
         COMM Table

Continent  Export  Country
-----------------------------
NA         wheat   Canada
EUR        corn    France
EUR        rice    Italy
AFR        oil     Egypt

    PRICE Table

Export     Price
------------------
rice        3.56
corn        3.45
oil           18
wheat       2.98

    AMOUNT Table

Country   Quantity
------------------
Canada       16000
France        2400
Italy          500
Egypt        10000
proc sql;
   title 'Total Export Revenue';
   select c.Country, p.Export, p.Price,
          a.Quantity, a.quantity*p.price
          as Total
      from comm c, price p, amount a
      where c.export=p.export
        and c.country=a.country;
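For comparison, the sketch below recreates the three tables in the WORK library and reaches the same totals in two steps, by first joining COMM with PRICE into an intermediate table and then joining that table with AMOUNT. The intermediate table name WORK.COMM_PRICE is hypothetical.

```sas
data comm;
   input Continent $ Export $ Country $;
   datalines;
NA wheat Canada
EUR corn France
EUR rice Italy
AFR oil Egypt
;
run;

data price;
   input Export $ Price;
   datalines;
rice 3.56
corn 3.45
oil 18
wheat 2.98
;
run;

data amount;
   input Country $ Quantity;
   datalines;
Canada 16000
France 2400
Italy 500
Egypt 10000
;
run;

proc sql;
   /* Step 1: attach the unit price to each country's export. */
   create table work.comm_price as
      select c.Country, c.Export, p.Price
         from comm c, price p
         where c.export=p.export;

   /* Step 2: join the intermediate table with the quantities. */
   select cp.Country, cp.Export, cp.Price, a.Quantity,
          a.quantity*cp.price as Total
      from work.comm_price cp, amount a
      where cp.country=a.country;
quit;
```

For Canada, for example, the total is 16000 x 2.98 = 47680.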
Note:
LIKE condition
Tests for a matching pattern.
sql-expression <NOT> LIKE sql-expression < ESCAPE character-expression>
Arguments
sql-expression
Details
The LIKE condition selects rows by comparing character strings with a
pattern-matching specification. It resolves to true and displays the matched string(s) if
the left operand matches the pattern specified by the right operand.
The ESCAPE clause is used to search for literal instances of the percent (%) and
underscore (_) characters, which are usually used for pattern matching.
matches Smuggle.
S_o
- The condition like 'a_%' matches 'app', 'a_%', and 'a__', because the underscore (_)
  in the search pattern matches any single character (including the underscore), and
  the percent (%) in the search pattern matches zero or more characters, including
  '%' and '_'.
- The condition like 'a_^%' escape '^' matches only 'a_%', because the escape
  character (^) specifies that the pattern search for a literal '%'.
- The condition like 'a_%' escape '_' matches none of the values, because the
  escape character (_) specifies that the pattern search for an 'a' followed by a literal
  '%', which does not apply to any of these values.
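These three conditions can be tried directly. The sketch below stores the three values named above in a hypothetical table, WORK.PATTERNS, and runs each condition as its own query.

```sas
data work.patterns;   /* hypothetical table holding the three sample values */
   length v $ 8;
   input v $;
   datalines;
app
a_%
a__
;
run;

proc sql;
   /* Underscore and percent both act as wildcards. */
   select v from work.patterns where v like 'a_%';

   /* The character after ^ is taken literally, so only a literal % matches. */
   select v from work.patterns where v like 'a_^%' escape '^';

   /* The underscore is the escape character here, so the pattern is 'a'
      followed by a literal %. */
   select v from work.patterns where v like 'a_%' escape '_';
quit;
```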
Note: When you are using the % character, be aware of the effect of trailing blanks.
You may have to use the TRIM function to remove trailing blanks in order to match
values.
LOWER function
Converts the case of a character string to lowercase.
See also: UPPER function on page 1114
LOWER (sql-expression)
Argument
sql-expression
Details
The LOWER function operates on character strings. LOWER changes the case of its
argument to all lowercase.
Note: The LOWER function is provided for compatibility with the ANSI SQL
standard. You can also use the SAS function LOWCASE.
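A brief sketch of LOWER next to the SAS function LOWCASE and the UPPER function follows. The table WORK.NAMES is hypothetical.

```sas
data work.names;   /* hypothetical sample data */
   input name $;
   datalines;
SMITH
Jones
;
run;

proc sql;
   select name,
          lower(name)   as lower_name,     /* ANSI-style function   */
          lowcase(name) as lowcase_name,   /* SAS function          */
          upper(name)   as upper_name
      from work.names;

   /* A common use: compare values without regard to case. */
   select name
      from work.names
      where lower(name)='smith';
quit;
```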
query-expression
Retrieves data from tables.
See also: table-expression on page 1113, Query Expressions (Subqueries) on page
1102, and In-Line Views on page 1064
Arguments
table-expression
set-operator
[Figure: a query-expression consists of table-expressions, each made up of a SELECT
clause, a FROM clause, and further clauses, linked by a set operator.]
Set Operators
PROC SQL provides these set operators:
OUTER UNION
concatenates the query results.
UNION
produces all unique rows from both queries.
EXCEPT
produces rows that are part of the first query only.
INTERSECT
produces rows that are common to both query results.
A query-expression with set operators is evaluated as follows.
- Each intermediate result table then becomes an operand linked with a set
  operator to form an expression, for example, A UNION B.
- If the query-expression involves more than two table-expressions, then the result
  from the first two becomes an operand for the next set operator and operand, such
  as (A UNION B) EXCEPT C, ((A UNION B) EXCEPT C) INTERSECT D, and so on.
PROC SQL performs set operations even if the tables or views that are referred to in
the table-expressions do not have the same number of columns. The reason for this
behavior is that the ANSI Standard for SQL requires that tables or views that are
involved in a set operation have the same number of columns and that the columns have
matching data types. If a set operation is performed on a table or view that has fewer
columns than the one(s) with which it is being linked, then PROC SQL extends the table
or view with fewer columns by creating columns with missing values of the appropriate
data type. This temporary alteration enables the set operation to be performed correctly.
ALL Keyword
The set operators automatically eliminate duplicate rows from their output tables.
The optional ALL keyword preserves the duplicate rows, reduces the execution by one
step, and thereby improves the query-expression's performance. You use it when you
want to display all the rows resulting from the table-expressions, rather than just the
unique rows. The ALL keyword is used only when a set operator is also specified.
OUTER UNION
Performing an OUTER UNION is very similar to performing the SAS DATA step
with a SET statement. The OUTER UNION concatenates the intermediate results from
the table-expressions. Thus, the result table for the query-expression contains all the
rows produced by the first table-expression followed by all the rows produced by the
second table-expression. Columns with the same name are in separate columns in the
result table.
For example, the following query expression concatenates the ME1 and ME2 tables
but does not overlay like-named columns. Output 44.1 shows the result.
                 ME1

IDnum  Jobcode    Salary     Bonus
--------------------------------------
1400   ME1         29769       587
1403   ME1         28072       342
1120   ME1         28619       986
1120   ME1         28619       986

            ME2

IDnum  Jobcode    Salary
----------------------------
1653   ME2         35108
1782   ME2         35345
1244   ME2         36925
proc sql;
   title 'ME1 and ME2: OUTER UNION';
   select *
      from me1
   outer union
   select *
      from me2;
Output 44.1

IDnum  Jobcode    Salary     Bonus  IDnum  Jobcode    Salary
--------------------------------------------------------------------
1400   ME1         29769       587                         .
1403   ME1         28072       342                         .
1120   ME1         28619       986                         .
1120   ME1         28619       986                         .
                       .         .  1653   ME2         35108
                       .         .  1782   ME2         35345
                       .         .  1244   ME2         36925
Concatenating tables with the OUTER UNION set operator is similar to performing
a union join. See Union Joins on page 1088 for more information.
To overlay columns with the same name, use the CORRESPONDING keyword.
proc sql;
   title 'ME1 and ME2: OUTER UNION CORRESPONDING';
   select *
      from me1
   outer union corr
   select *
      from me2;
- The ALL keyword is not used with OUTER UNION because this operator's default
  action is to include all rows in a result table. Thus, both rows from the table ME1
  where IDnum is 1120 appear in the output.
UNION
The UNION operator produces a table that contains all the unique rows that result
from both table-expressions. That is, the output table contains rows produced by the
first table-expression, the second table-expression, or both.
Columns are appended by position in the tables, regardless of the column names.
However, the data type of the corresponding columns must match or the union will not
occur. PROC SQL issues a warning message and stops executing.
The names of the columns in the output table are the names of the columns from the
first table-expression unless a column (such as an expression) has no name in the first
table-expression. In such a case, the name of that column in the output table is the
name of the respective column in the second table-expression.
In the following example, PROC SQL combines the two tables:
proc sql;
   title 'ME1 and ME2: UNION';
   select *
      from me1
   union
   select *
      from me2;
In the following example, ALL includes the duplicate row from ME1. In addition,
ALL changes the sorting by specifying that PROC SQL make one pass only. Thus, the
values from ME2 are simply appended to the values from ME1.
proc sql;
   title 'ME1 and ME2: UNION ALL';
   select *
      from me1
   union all
   select *
      from me2;
EXCEPT
The EXCEPT operator produces (from the first table-expression) an output table that
has unique rows that are not in the second table-expression. If the intermediate result
from the first table-expression has at least one occurrence of a row that is not in the
intermediate result of the second table-expression, then that row (from the first
table-expression) is included in the result table.
In the following example, the IN_USA table contains flights to cities within and
outside the USA. The OUT_USA table contains flights only to cities outside the USA.
This example returns only the rows from IN_USA that are not also in OUT_USA:
proc sql;
   title 'Flights from IN_USA Only';
   select * from in_usa
   except
   select * from out_usa;
     IN_USA

Flight    Dest
------------------
   145    ORD
   156    WAS
   188    LAX
   193    FRA
   207    LON

     OUT_USA

Flight    Dest
------------------
   193    FRA
   207    LON
   311    SJA
INTERSECT
The INTERSECT operator produces an output table that has rows that are common
to both tables. For example, using the IN_USA and OUT_USA tables shown above, the
following example returns rows that are in both tables:
proc sql;
   title 'Flights from Both IN_USA and OUT_USA';
   select * from in_usa
   intersect
   select * from out_usa;
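The EXCEPT and INTERSECT examples can be run on their own with the following sketch, which builds IN_USA and OUT_USA in the WORK library from the rows listed above. Reading Flight as a numeric column is an assumption.

```sas
data in_usa;
   input Flight Dest $;
   datalines;
145 ORD
156 WAS
188 LAX
193 FRA
207 LON
;
run;

data out_usa;
   input Flight Dest $;
   datalines;
193 FRA
207 LON
311 SJA
;
run;

proc sql;
   /* Rows of IN_USA that are not in OUT_USA: flights 145, 156, and 188. */
   select * from in_usa
   except
   select * from out_usa;

   /* Rows common to both tables: flights 193 and 207. */
   select * from in_usa
   intersect
   select * from out_usa;
quit;
```

Flight 311 appears in neither result because it exists only in OUT_USA, which is the second operand of EXCEPT.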
sql-expression
Produces a value from a sequence of operands and operators.
operand operator operand
Arguments
operand
SAS Functions
PROC SQL supports the same SAS functions as the DATA step, except for the
functions LAG, DIF, and SOUND. For example, the SCAN function is used in the
following query:
select style, scan(street,1) format=$15.
from houses;
USER Literal
USER can be specified in a view definition, for example, to create a view that restricts
access to those in the user's department. Note that the USER literal value is stored in
uppercase, so it is advisable to use the UPCASE function when comparing to this value:
create view myemp as
select * from dept12.employees
where upcase(manager)=user;
This view produces a different set of employee information for each manager who
references it.
Unlike missing values in some versions of SQL, missing values in SAS always appear
first in the collating sequence. Therefore, in Boolean and comparison operations, the
following expressions resolve to true in a predicate:
3>null
-3>null
0>null
Group  Operator             Description
----------------------------------------------
       ()
       case-expression
       **                   raises to a power
       unary +, unary -
       *                    multiplies
       /                    divides
       +                    adds
       -                    subtracts
       ||                   concatenates
       <NOT> IN condition
       IS <NOT> condition
       =, eq                equals
       >, gt                is greater than
       <, lt                is less than
       >=, ge
       <=, le
       =*
       eqt
       gtt
       ltt
       get
       let
       net
       &, AND
       |, OR                indicates logical OR
10     ¬, ^, NOT
Symbols for operators might vary, depending on your operating environment. See
SAS Language Reference: Dictionary for more information on operators and expressions.
Subqueries can return multiple values. The following example uses the tables
PROCLIB.DELAY and PROCLIB.MARCH. These tables contain information about the
same flights and have the Flight column in common. The following subquery returns all
the values for Flight in PROCLIB.DELAY for international flights. The values from the
subquery complete the WHERE clause in the outer query. Thus, when the outer query
is executed, only the international flights from PROCLIB.MARCH are in the output.
options ls=64 nodate nonumber;
proc sql outobs=5;
   title 'International Flights from';
   title2 'PROCLIB.MARCH';
   select Flight, Date, Dest, Boarded
      from proclib.march
      where flight in
            (select flight
                from proclib.delay
                where destype='International');
subquery returns no rows, then the result of an ALL comparison is true for each row of
the outer query.
If ANY is specified, then the comparison is true if it is true for any one of the values
that are returned by the subquery. If a subquery returns no rows, then the result of an
ANY comparison is false for each row of the outer query.
The following example selects all those in PROCLIB.PAYROLL who earn more than
the highest paid ME3:
options ls=64 nodate nonumber;
proc sql;
   title 'Employees who Earn More than';
   title2 'All ME''s';
   select *
      from proclib.payroll
      where salary > all (select salary
                             from proclib.payroll
                             where jobcode='ME3');
Note: See the first item in "Subqueries and Efficiency" on page 1105 for a note
about efficiency when using ALL.
In order to visually separate a subquery from the rest of the query, you can enclose
the subquery in any number of pairs of parentheses.
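The following sketch compares ALL, ANY, and the MAX form on a small hypothetical table, WORK.PAY, that stands in for PROCLIB.PAYROLL.

```sas
data work.pay;   /* hypothetical stand-in for PROCLIB.PAYROLL */
   input idnum $ jobcode $ salary;
   datalines;
1001 ME3 40000
1002 ME3 42000
1003 TA2 45000
1004 FA1 30000
;
run;

proc sql;
   /* True only when salary exceeds every ME3 salary: selects 1003. */
   select *
      from work.pay
      where salary > all (select salary
                             from work.pay
                             where jobcode='ME3');

   /* Same rows here, written with a summary function in the subquery. */
   select *
      from work.pay
      where salary > (select max(salary)
                         from work.pay
                         where jobcode='ME3');

   /* True when salary exceeds at least one ME3 salary: selects 1002 and 1003. */
   select *
      from work.pay
      where salary > any (select salary
                             from work.pay
                             where jobcode='ME3');
quit;
```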
Correlated Subqueries
In a correlated subquery, the WHERE expression in a subquery refers to values in a
table in the outer query. The correlated subquery is evaluated for each row in the outer
query. With correlated subqueries, PROC SQL executes the subquery and the outer
query together.
The following example uses the PROCLIB.DELAY and PROCLIB.MARCH tables. A
DATA step (PROCLIB.DELAY on page 1390) creates PROCLIB.DELAY.
PROCLIB.MARCH is shown in Example 13 on page 1154. PROCLIB.DELAY has the
Flight, Date, Orig, and Dest columns in common with PROCLIB.MARCH:
proc sql outobs=5;
   title 'International Flights';
   select *
      from proclib.march
      where 'International' in
            (select destype
                from proclib.delay
                where march.Flight=delay.Flight);
The subquery resolves by substituting every value for MARCH.Flight into the
subquery's WHERE clause, one row at a time. For example, when MARCH.Flight=219,
the subquery resolves as follows:
1 PROC SQL retrieves all the rows from DELAY where Flight=219 and passes their
3 The WHERE clause checks to see if International is in the list. Because it is, all
rows from MARCH that have a value of 219 for Flight become part of the output.
The following output contains the rows from MARCH for international flights only.
Output 44.2

Flight  Date     Depart  Orig  Dest  Miles  Boarded  Capacity
-----------------------------------------------------------------
219     01MAR94    9:31  LGA   LON    3442      198       250
622     01MAR94   12:19  LGA   FRA    3857      207       250
132     01MAR94   15:35  LGA   YYZ     366      115       178
271     01MAR94   13:17  LGA   PAR    3635      138       250
219     02MAR94    9:31  LGA   LON    3442      147       250
proc sql;
   select * from proclib.payroll
      where salary> (select max(salary)
                        from proclib.payroll
                        where jobcode='ME3');
- With subqueries, use IN instead of EXISTS when possible. For example, the
  following queries produce the same result, but the second query is usually more
  efficient:
proc sql;
   select *
      from proclib.payroll p
      where exists (select *
                       from staff s
                       where p.idnum=s.idnum
                         and state='CT');
proc sql;
   select *
      from proclib.payroll
      where idnum in (select idnum
                         from staff
                         where state='CT');
SUBSTRING function
Returns a part of a character expression.
SUBSTRING (sql-expression FROM start <FOR length>)
- start is a number (not a variable or column name) that specifies the position,
  counting from the left end of the character string, at which to begin extracting the
  substring.
- length is a number (not a variable or column name) that specifies the length of the
  substring that is to be extracted.
Details
The SUBSTRING function operates on character strings. SUBSTRING returns a
specified part of the input character string, beginning at the position that is specified by
start. If length is omitted, then the SUBSTRING function returns all characters from
start to the end of the input character string. The values of start and length must be
numbers (not variables) and can be positive, negative, or zero.
If start is greater than the length of the input character string, then the
SUBSTRING function returns a zero-length string.
If start is less than 1, then the SUBSTRING function begins extraction at the
beginning of the input character string.
If length is specified, then the sum of start and length cannot be less than start or an
error is returned. If the sum of start and length is greater than the length of the input
character string, then the SUBSTRING function returns all characters from start to the
end of the input character string. If the sum of start and length is less than 1, then the
SUBSTRING function returns a zero-length string.
Note: The SUBSTRING function is provided for compatibility with the ANSI SQL
standard. You can also use the SAS function SUBSTR.
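A minimal sketch of the FROM and FOR arguments follows, with the SAS function SUBSTR shown for comparison. The table WORK.CODES and its single value are hypothetical.

```sas
data work.codes;   /* hypothetical one-row table */
   input code $;
   datalines;
ABCDEF
;
run;

proc sql;
   select code,
          substring(code from 2 for 3) as mid3,         /* BCD                   */
          substring(code from 4)       as tail,         /* from position 4 on    */
          substr(code, 2, 3)           as mid3_substr   /* SAS function: BCD     */
      from work.codes;
quit;
```

Because start and length must be numbers, an extraction whose position comes from another column is a case for SUBSTR rather than SUBSTRING.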
summary-function
Performs statistical summary calculations.
Restriction:
See also: GROUP BY on page 1065, HAVING Clause on page 1067, SELECT Clause on
page 1058, and table-expression on page 1113
Featured in: Example 8 on page 1143, Example 12 on page 1152, and Example 15 on
page 1160
Arguments
summary-function
STD
standard deviation
STDERR
standard error of the mean
SUM
sum of values
SUMWGT
sum of the WEIGHT variable values*
T
Student's t value for testing the hypothesis that the population mean is zero
USS
uncorrected sum of squares
VAR
variance
For a description and the formulas used for these statistics, see Appendix 1, SAS
Elementary Statistics Procedures, on page 1339.
DISTINCT
specifies that only the unique values of sql-expression be used in the calculation.
ALL
Summarizing Data
Summary functions produce a statistical summary of the entire table or view that is
listed in the FROM clause or for each group that is specied in a GROUP BY clause. If
GROUP BY is omitted, then all the rows in the table or view are considered to be a
single group. These functions reduce all the values in each row or column in a table to
one summarizing or aggregate value. For this reason, these functions are often called
aggregate functions. For example, the sum (one value) of a column results from the
addition of all the values in the column.
Counting Rows
The COUNT function counts rows. COUNT(*) returns the total number of rows in a
group or in a table. If you use a column name as an argument to COUNT, then the
result is the total number of rows in a group or in a table that have a nonmissing value
for that column. If you want to count the unique values in a column, then specify
COUNT(DISTINCT column).
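The three forms of COUNT can be compared side by side. The sketch below assumes a hypothetical table, WORK.VISITS, in which one row has a missing value for clinic.

```sas
data work.visits;   /* hypothetical; the period reads as a missing clinic */
   input patient $ clinic $;
   datalines;
P1 North
P2 North
P3 .
P4 South
;
run;

proc sql;
   select count(*)               as all_rows,          /* 4: every row          */
          count(clinic)          as nonmissing_rows,   /* 3: missing is skipped */
          count(distinct clinic) as unique_clinics     /* 2: North and South    */
      from work.visits;
quit;
```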
* Currently, there is no way to designate a WEIGHT variable for a table in PROC SQL. Thus, each row (or observation) has a
weight of 1.
       Summary Table

   X         Y         Z
----------------------------
   1         3         4
   2         4         5
   8         9         4
   4         5         4
If you use one argument in the function, then the calculation is performed on that
column only. If you use more than one argument, then the calculation is performed on
each row of the specified columns. In the following PROC SQL step, the MIN and MAX
functions return the minimum and maximum of the columns they are used with. The
SUM function returns the sum of each row of the columns specified as arguments:
proc sql;
select min(x) as Colmin_x,
min(y) as Colmin_y,
max(z) as Colmax_z,
sum(x,y,z) as Rowsum
from summary;
Summary Table
Colmin_x  Colmin_y  Colmax_z    Rowsum
--------------------------------------
       1         3         5         8
       1         3         5        11
       1         3         5        21
       1         3         5        13
Remerging Data
When you use a summary function in a SELECT clause or a HAVING clause, you
might see the following message in the SAS log:
NOTE: The query requires remerging summary
statistics back with the original
data.
The process of remerging involves two passes through the data. On the first pass,
PROC SQL calculates and returns the value of summary functions. It then uses the
result to calculate the arithmetic expressions in which the summary function participates.
Salary Information
(First 10 Rows Only)
Id
Number  Jobcode    Salary  AvgSalary
------------------------------------
1704    BCK         25465   25794.22
1677    BCK         26007   25794.22
1383    BCK         25823   25794.22
1845    BCK         25996   25794.22
1100    BCK         25004   25794.22
1663    BCK         26452   25794.22
1673    BCK         25477   25794.22
1389    BCK         25028   25794.22
1834    BCK         26896   25794.22
1132    FA1         22413   23039.36
You can change the previous query to return only the average salary for each
jobcode. The following query does not require remerging because the first pass of the
data does the summarizing and the grouping. A second pass is not necessary.
proc sql outobs=10;
title 'Average Salary for Each Jobcode';
select Jobcode, avg(salary) as AvgSalary
from proclib.payroll
group by jobcode;
When you use the HAVING clause, PROC SQL may have to remerge data to resolve
the HAVING expression.
First, consider a query that uses HAVING but that does not require remerging. The
query groups the data by values of Jobcode, and the result contains one row for each
value of Jobcode and summary information for people in each Jobcode. On the first
pass, the summary functions provide values for the Number, Average Age, and Average
Salary columns. The first pass provides everything that PROC SQL needs to resolve
the HAVING clause, so no remerging is necessary.
proc sql outobs=10;
   title 'Summary Information for Each Jobcode';
   title2 '(First 10 Rows Only)';
   select Jobcode,
          count(jobcode) as number
             label='Number',
          avg(int((today()-birth)/365.25))
             as avgage format=2.
             label='Average Age',
          avg(salary) as avgsal format=dollar8.
             label='Average Salary'
      from proclib.payroll
      group by jobcode
      having avgage ge 30;
In the following query, PROC SQL remerges the data because the HAVING clause
uses the SALARY column in the comparison and SALARY is not in the GROUP BY
clause.
proc sql outobs=10;
   title 'Employees who Earn More than the';
   title2 'Average for Their Jobcode';
   title3 '(First 10 Rows Only)';
   select Jobcode, Salary,
          avg(salary) as AvgSalary
      from proclib.payroll
      group by jobcode
      having salary > AvgSalary;
proc sql;
select
jobcode, salary,
avg(salary) as avsal
from proclib.payroll
group by jobcode
having salary > avsal;
• a column from the input table is specified in the SELECT clause and is not
specified in the GROUP BY clause. This rule does not refer to columns used as
arguments to summary functions in the SELECT clause.
For example, in the following query, the presence of IdNumber in the SELECT
clause causes PROC SQL to remerge the data because IdNumber is not involved in
grouping or summarizing during the first pass. In order for PROC SQL to retrieve
the values for IdNumber, it must make a second pass through the data.
proc sql;
select IdNumber, jobcode,
avg(salary) as avsal
from proclib.payroll
group by jobcode;
table-expression
Defines part or all of a query-expression.
See also: query-expression on page 1093
Details
A table-expression is a SELECT statement. It is the fundamental building block of
most SQL procedure statements. You can combine the results of multiple
table-expressions with set operators, which creates a query-expression. Use one
ORDER BY clause for an entire query-expression. Place a semicolon only at the end of
the entire query-expression. A query-expression is often only one SELECT statement or
table-expression.
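The relationship between table-expressions and a query-expression can be sketched as follows. The tables WORK.EAST and WORK.WEST and their columns are hypothetical and exist only for this illustration:

```sas
/* Two small hypothetical tables with the same column layout. */
data work.east;
   input Name $ Sales;
   datalines;
Ann 100
Bob 250
;
run;

data work.west;
   input Name $ Sales;
   datalines;
Cara 175
Dev 90
;
run;

proc sql;
   /* Two table-expressions combined by a set operator form one   */
   /* query-expression: one ORDER BY and one semicolon at the end. */
   select Name, Sales from work.east
   union
   select Name, Sales from work.west
   order by Sales;
quit;
```

Note that neither SELECT carries its own semicolon or ORDER BY clause; both belong to the query-expression as a whole.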
UPPER function
Converts the case of a character string to uppercase.
See also: LOWER function on page 1093
UPPER (sql-expression)
Details
The UPPER function operates on character strings. UPPER converts the case of its
argument to all uppercase.
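As a brief illustration, the following sketch applies UPPER to a character column. The table WORK.NAMES and the column City are assumptions made for this example only:

```sas
/* Hypothetical table with mixed-case character values. */
data work.names;
   input City $20.;
   datalines;
Cary
new york
;
run;

proc sql;
   /* UPPER returns the uppercase form; the stored values are unchanged. */
   select City, upper(City) as CityUpper
      from work.names;
quit;
```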
SAS data set options can be combined with SQL statement arguments:
proc sql;
create table test
(a character, b numeric, pw=cat);
create index staffidx on
staff1 (lastname, alter=dog);
You cannot use SAS data set options with DICTIONARY tables because
DICTIONARY tables are read-only objects.
The only SAS data set options that you can use with PROC SQL views are those that
assign and provide SAS passwords: READ=, WRITE=, ALTER=, and PW=.
See SAS Language Reference: Dictionary for a description of SAS data set options.
Return Codes
As you use PROC SQL statements that are available in the Pass-Through Facility,
any errors are written to the SAS log. The return codes and messages that are
generated by the Pass-Through Facility are available to you through the SQLXRC and
SQLXMSG macro variables. Both macro variables are described in Using Macro
Variables Set by PROC SQL on page 1119.
checks to see if the DBMS can do the join. If it can, then PROC SQL passes the join to
the DBMS. This enhances performance by reducing data movement and translation. If
the DBMS cannot do the join, then PROC SQL processes the join. Using the
SAS/ACCESS LIBNAME statement can often provide you with the performance
benefits of the SQL Procedure Pass-Through Facility without having to write
DBMS-specific code.
To use the SAS/ACCESS LIBNAME statement, you must have SAS/ACCESS
software installed for your DBMS. For more information about the SAS/ACCESS
LIBNAME statement, refer to the SAS/ACCESS documentation for your DBMS.
For an example that demonstrates the use of a DICTIONARY table, see Example 6
on page 1136.
The following table describes the DICTIONARY tables that are available and shows
the associated SASHELP view(s) for each table.
Table 44.2
DICTIONARY table             SASHELP view
----------------------------------------------------------------------
CATALOGS                     VCATALG
CHECK_CONSTRAINTS            VCHKCON
COLUMNS                      VCOLUMN
CONSTRAINT_COLUMN_USAGE      VCNCOLU
CONSTRAINT_TABLE_USAGE       VCNTABU
DICTIONARIES                 VDCTNRY
ENGINES                      VENGINE
EXTFILES                     VEXTFL
FORMATS                      VFORMAT
GOPTIONS                     VGOPT, VALLOPT
INDEXES                      VINDEX
LIBNAMES                     VLIBNAM
MACROS                       VMACRO
MEMBERS                      VMEMBER, VSACCES, VSCATLG, VSLIB,
                             VSTABLE, VSTABVW, VSVIEW
OPTIONS                      VOPTION, VALLOPT
REFERENTIAL_CONSTRAINTS      VREFCON
STYLES                       VSTYLE
TABLE_CONSTRAINTS            VTABCON
TABLES                       VTABLE
TITLES                       VTITLE
VIEWS                        VVIEW
6    proc sql;
7       describe table dictionary.indexes;
NOTE: SQL table DICTIONARY.INDEXES was created like:
create table DICTIONARY.INDEXES
(
   libname char(8) label='Library Name',
   memname char(32) label='Member Name',
   memtype char(8) label='Member Type',
   name char(32) label='Column Name',
   idxusage char(9) label='Column Index Type',
   indxname char(32) label='Index Name',
   indxpos num label='Position of Column in Concatenated Key',
   nomiss char(3) label='Nomiss Option',
   unique char(3) label='Unique Option'
);
Use the DESCRIBE VIEW statement in PROC SQL to find out how a SASHELP
view is defined. Here's an example:
proc sql;
describe view sashelp.vstabvw;
6    proc sql;
7       describe view sashelp.vstabvw;
NOTE: SQL view SASHELP.VSTABVW is defined as:
select
from
where
order by
process by optimizing the query before the discovery process is launched. Therefore,
although it is possible to access DICTIONARY table information with SAS procedures
or the DATA step by using the SASHELP views, it is often more efficient to use PROC
SQL instead.
For example, the following programs both produce the same result, but the PROC
SQL step runs much faster because the WHERE clause is processed prior to opening
the tables that are referenced by the SASHELP.VCOLUMN view:
data mytable;
   set sashelp.vcolumn;
   where libname='WORK' and memname='SALES';
run;
proc sql;
   create table mytable as
      select * from sashelp.vcolumn
      where libname='WORK' and memname='SALES';
quit;
Note: SAS does not maintain DICTIONARY table information between queries.
Each query of a DICTIONARY table launches a new discovery process.
If you are querying the same DICTIONARY table several times in a row, then you
can get even faster performance by creating a temporary SAS data set (with the DATA
step SET statement or PROC SQL CREATE TABLE AS statement) with the
information that you want and running your query against that data set.
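The caching approach described above might look like the following sketch. The snapshot table name WORK.COLSNAP is hypothetical, and the SASHELP library is used only because it is normally available:

```sas
proc sql;
   /* One discovery pass: snapshot the rows of interest into WORK. */
   create table work.colsnap as
      select libname, memname, name, type, length
         from dictionary.columns
         where libname='SASHELP';

   /* Later queries read the snapshot instead of DICTIONARY.COLUMNS, */
   /* so no new discovery process is launched for them.              */
   select memname, count(*) as NumColumns
      from work.colsnap
      group by memname;

   select name, type, length
      from work.colsnap
      where memname='CLASS';
quit;
```

The trade-off is that the snapshot does not reflect tables that are created or altered after it is taken, so re-create it when the library contents change.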
SQLRC
contains the following status values that indicate the success of the SQL procedure
statement:
0
PROC SQL statement completed successfully with no errors.
4
PROC SQL statement encountered a situation for which it issued a warning.
The statement continued to execute.
8
PROC SQL statement encountered an error. The statement stopped
execution at this point.
12
PROC SQL statement encountered an internal error, indicating a bug in
PROC SQL that should be reported to SAS Technical Support. These errors
can occur only during compile time.
16
PROC SQL statement encountered a user error. This error code is used, for
example, when a subquery (that can only return a single value) evaluates to
more than one row. These errors can only be detected during run time.
24
PROC SQL statement encountered a system error. This error is used, for
example, if the system cannot write to a PROC SQL table because the disk is
full. These errors can occur only during run time.
28
PROC SQL statement encountered an internal error, indicating a bug in
PROC SQL that should be reported to SAS Technical Support. These errors
can occur only during run time.
SQLOOPS
contains the number of iterations that the inner loop of PROC SQL executes. The
number of iterations increases proportionally with the complexity of the query. See
also the description of LOOPS= on page 1036.
SQLXRC
contains the DBMS-specific return code that is returned by the Pass-Through
Facility.
SQLXMSG
contains descriptive information and the DBMS-specific return code for the error
that is returned by the Pass-Through Facility.
Note: Because the value of the SQLXMSG macro variable can contain special
characters (such as &, %, /, *, and ;), use the %SUPERQ macro function when
printing the value:
%put %superq(sqlxmsg);
See SAS Macro Language: Reference for information about the %SUPERQ
function.
This example retrieves the data but does not display them in SAS output because of
the NOPRINT option in the PROC SQL statement. The %PUT macro statement
displays the macro variables' values.
proc sql noprint;
   select *
      from proclib.payroll;
%put sqlobs=**&sqlobs**
     sqloops=**&sqloops**
     sqlrc=**&sqlrc**;
The message in Output 44.3 appears in the SAS log and gives you the macro variables' values.
Output 44.3
40   options ls=80;
41
42   proc sql noprint;
43      select *
44         from proclib.payroll;
45
46   %put sqlobs=**&sqlobs**
47        sqloops=**&sqloops**
48        sqlrc=**&sqlrc**;
sqlobs=**1** sqloops=**11** sqlrc=**0**
Macro variables that are generated by PROC SQL follow the scoping rules for %LET.
For more information about macro variable scoping, see SAS Macro Language:
Reference.
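One way to act on the SQLRC values listed earlier is to test the macro variable right after a statement runs. The following is a sketch only: the macro name CHECK_SQL, its parameter, and the table WORK.CLASSCOPY are hypothetical, and the messages are illustrative.

```sas
%macro check_sql(step);
   /* SQLRC reflects the most recently executed PROC SQL statement. */
   %if &sqlrc = 0 %then %put NOTE: &step completed with no errors.;
   %else %if &sqlrc = 4 %then %put NOTE: &step issued a warning (sqlrc=4).;
   %else %put NOTE: &step failed (sqlrc=&sqlrc).;
%mend check_sql;

proc sql noprint;
   create table work.classcopy as
      select * from sashelp.class;
   %check_sql(CREATE TABLE)
quit;
```

Because PROC SQL executes each statement as soon as it is submitted, the macro call that follows the CREATE TABLE statement sees the return code for that statement, in the same way that the %PUT statement in the preceding example does.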
• If the view accesses a DBMS table, then you must have been granted the
appropriate authorization by the external database management system (for
example, DB2). You must have installed the SAS/ACCESS software for your
DBMS. See the SAS/ACCESS interface guide for your DBMS for more information
on SAS/ACCESS views.
• You can update only a single table through a view. The table cannot be joined to
another table or linked to another table with a set-operator. The view cannot
contain a subquery.
• You can update a column in a view using the column's alias, but you cannot
update a derived column, that is, a column produced by an expression. In the
following example, you can update the column SS, but not WeeklySalary.
create view EmployeeSalaries as
select Employee, SSNumber as SS,
Salary/52 as WeeklySalary
from employees;
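The following sketch shows an update through such a view. The contents of WORK.EMPLOYEES and the replacement value are invented for illustration; only the view definition follows the example above.

```sas
/* Hypothetical base table for the view. */
data work.employees;
   length Employee $ 10 SSNumber $ 11;
   input Employee $ SSNumber $ Salary;
   datalines;
Chin 111-22-3333 52000
Wood 444-55-6666 41600
;
run;

proc sql;
   create view work.EmployeeSalaries as
      select Employee, SSNumber as SS,
             Salary/52 as WeeklySalary
         from work.employees;

   /* SS is an alias for a stored column, so it can be updated.     */
   /* WeeklySalary is derived from an expression and cannot be set. */
   update work.EmployeeSalaries
      set SS='999-88-7777'
      where Employee='Chin';

   /* The change is stored in the underlying table. */
   select * from work.employees;
quit;
```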
Compliance
PROC SQL follows most of the guidelines set by the American National Standards
Institute (ANSI) in its implementation of SQL. However, it is not fully compliant with
the current ANSI Standard for SQL.*
The SQL research project at SAS has focused primarily on the expressive power of
SQL as a query language. Consequently, some of the database features of SQL have not
yet been implemented in PROC SQL.
• The keyword CASE is always reserved; its use in the CASE expression (an SQL2
feature) precludes its use as a column name.
If you have a column named CASE in a table and you want to specify it in a
PROC SQL step, then you can use the SAS data set option RENAME= to rename
that column for the duration of the query. You can also surround CASE in double
quotation marks ("CASE") and set the PROC SQL option DQUOTE=ANSI.
• The keywords AS, ON, FULL, JOIN, LEFT, FROM, WHEN, WHERE, ORDER,
GROUP, RIGHT, INNER, OUTER, UNION, EXCEPT, HAVING, and INTERSECT
cannot normally be used for table aliases. These keywords all introduce clauses
that appear after a table name. Since the alias is optional, PROC SQL deals with
this ambiguity by assuming that any one of these words introduces the
corresponding clause and is not the alias. If you want to use one of these keywords
as an alias, then use the PROC SQL option DQUOTE=ANSI.
• The keyword USER is reserved for the current userid. If you specify USER on a
SELECT statement in conjunction with a CREATE TABLE statement, then the
column is created in the table with a temporary column name that is similar to
_TEMA001. If you specify USER in a SELECT statement without using the
CREATE TABLE statement, then the column is written to the output without a
column heading. In either case, the value for the column varies by operating
environment, but is typically the userid of the user who is submitting the program
or the value of the &SYSJOBID automatic macro variable.
If you have a column named USER in a table and you want to specify it in a
PROC SQL step, then you can use the SAS data set option RENAME= to rename
that column for the duration of the query. You can also enclose USER with double
quotation marks ("USER") and set the PROC SQL option DQUOTE=ANSI.
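The two workarounds for a column that is named with a reserved word can be sketched as follows. The table WORK.KW and the new name CaseNum are hypothetical:

```sas
/* Hypothetical table whose first column is named CASE. */
data work.kw;
   input case amount;
   datalines;
1 10
2 20
;
run;

proc sql;
   /* Option 1: rename the column for the duration of the query. */
   select CaseNum, amount
      from work.kw(rename=(case=CaseNum));
quit;

proc sql dquote=ansi;
   /* Option 2: with DQUOTE=ANSI, a double-quoted token is treated */
   /* as a column name rather than as a character string.          */
   select "case", amount
      from work.kw;
quit;
```

The same pattern applies to a column named USER.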
* International Organization for Standardization (ISO): Database SQL. Document ISO/IEC 9075:1992. Also available as
American National Standards Institute (ANSI) Document ANSI X3.135-1992.
Column Modifiers
PROC SQL supports the SAS INFORMAT=, FORMAT=, and LABEL= modifiers for
expressions within the SELECT clause. These modifiers control the format in which
output data are displayed and labeled.
In-Line Views
The ability to code nested query-expressions in the FROM clause is a requirement of
the ANSI Standard. PROC SQL supports such nested coding.
Outer Joins
The ability to include columns that both match and do not match in a join-expression
is a requirement of the ANSI Standard. PROC SQL supports this ability.
Arithmetic Operators
PROC SQL supports the SAS exponentiation (**) operator. PROC SQL uses the
notation <> to mean not equal.
Orthogonal Expressions
PROC SQL permits the combination of comparison, Boolean, and algebraic
expressions. For example, (X=3)*7 yields a value of 7 if X=3 is true because true is
defined to be 1. If X=3 is false, then it resolves to 0 and the entire expression yields a
value of 0.
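The (X=3)*7 expression can be tried directly. This sketch assumes a hypothetical two-row table WORK.T:

```sas
/* Hypothetical table: one row where X=3 is true, one where it is false. */
data work.t;
   input X;
   datalines;
3
5
;
run;

proc sql;
   /* (X=3) evaluates to 1 when true and 0 when false. */
   select X, (X=3)*7 as Result
      from work.t;
quit;
```

Result is 7 for the row where X is 3 and 0 for the row where X is 5.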
PROC SQL permits a subquery in any expression. This feature is required by the
ANSI Standard. Therefore, you can have a subquery on the left side of a comparison
operator in the WHERE expression.
PROC SQL permits you to order and group data by any kind of mathematical
expression (except those including summary functions) using ORDER BY and GROUP
BY clauses. You can also group by an expression that appears on the SELECT clause
by using the integer that represents the expression's ordinal position in the SELECT
clause. You are not required to select the expression by which you are grouping or
ordering. See ORDER BY Clause on page 1067 and GROUP BY Clause on page 1065
for more information.
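Both behaviors can be sketched against SASHELP.CLASS, a sample table that is normally installed with SAS; its use here, and the column alias AgeBand, are assumptions of this example:

```sas
proc sql;
   /* Group and order by the first SELECT item, referenced by position. */
   select int(Age/2)*2 as AgeBand, count(*) as N
      from sashelp.class
      group by 1
      order by 1;

   /* Order by an expression that is not itself selected. */
   select Name, Height, Weight
      from sashelp.class
      order by Weight/Height;
quit;
```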
Set Operators
The set operators UNION, INTERSECT, and EXCEPT are required by the ANSI
Standard. PROC SQL provides these operators plus the OUTER UNION operator.
The ANSI Standard also requires that the tables being operated upon all have the
same number of columns with matching data types. The SQL procedure works on
tables that have the same number of columns, as well as on those that do not, by
creating virtual columns so that a query can evaluate correctly. See query-expression
on page 1093 for more information.
Statistical Functions
PROC SQL supports many more summary functions than required by the ANSI
Standard for SQL.
PROC SQL supports the remerging of summary function results into the table's
original data. For example, computing the percentage of total is achieved with 100*x/
SUM(x) in PROC SQL. See summary-function on page 1107 for more information on
the available summary functions and remerging data.
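A percentage-of-total query of the 100*x/SUM(x) form might look like this sketch, which assumes the SASHELP.CLASS sample table and a hypothetical column alias PctOfTotal:

```sas
proc sql;
   /* SUM(Weight) is computed over all rows, then remerged with each row. */
   select Name, Weight,
          100*Weight/sum(Weight) as PctOfTotal format=5.1
      from sashelp.class;
quit;
```

Because the query mixes a detail column with a summary function and has no GROUP BY clause, the SAS log should report that summary statistics were remerged with the original data.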
ROLLBACK Statement
The ROLLBACK statement is not supported. The UNDO_POLICY= option in the
PROC SQL statement addresses rollback. See the description of the UNDO_POLICY=
option in PROC SQL Statement on page 1033 for more information.
Three-Valued Logic
ANSI-compatible SQL has three-valued logic, that is, special cases for handling
comparisons involving NULL values. Any value compared with a NULL value
evaluates to NULL.
PROC SQL follows the SAS convention for handling missing values: when numeric
NULL values are compared to non-NULL numbers, the NULL values are less than or
smaller than all the non-NULL values; when character NULL values are compared to
non-NULL characters, the character NULL values are treated as a string of blanks.
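The practical effect of the SAS convention is that a missing numeric value satisfies a "less than" comparison. The following sketch uses a hypothetical table WORK.M to show the difference:

```sas
/* Hypothetical table: row 1 has a missing X. */
data work.m;
   input Id X;
   datalines;
1 .
2 -5
3 10
;
run;

proc sql;
   /* A missing X is treated as smaller than any number, so row 1 */
   /* is returned along with row 2.                               */
   select Id, X from work.m where X < 0;

   /* Exclude missing values explicitly to return row 2 only. */
   select Id, X from work.m where X < 0 and X is not missing;
quit;
```

Under strict ANSI three-valued logic, the first query would not return the row with the NULL value.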
Embedded SQL
Currently there is no provision for embedding PROC SQL statements in other SAS
programming environments, such as the DATA step or SAS/IML software.
This example creates the table PROCLIB.PAYLIST and inserts data into it.
Program
Declare the PROCLIB library. The PROCLIB library is used in these examples to store
created tables.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Gender char(1),
Jobcode char(3),
Salary num,
Birth num informat=date7.
format=date7.,
Hired num informat=date7.
format=date7.);
Insert values into the PROCLIB.PAYLIST table. The INSERT statement inserts data
values into PROCLIB.PAYLIST according to the position in the VALUES clause. Therefore, in
the first VALUES clause, '1639' is inserted into the first column, 'F' into the second column, and
so forth. Dates in SAS are stored as integers with 0 equal to January 1, 1960. Suffixing the date
with a d is one way to use the internal value for dates.
insert into proclib.paylist
   values('1639','F','TA1',42260,'26JUN70'd,'28JAN91'd)
   values('1065','M','ME3',38090,'26JAN54'd,'07JAN92'd)
   values('1400','M','ME1',29769,'05NOV67'd,'16OCT90'd)
Include missing values in the data. The value null represents a missing value for the
character column Jobcode. The period represents a missing value for the numeric column Salary.
   values('1561','M',null,36514,'30NOV63'd,'07OCT87'd)
   values('1221','F','FA3',.,'22SEP63'd,'04OCT94'd);
Display the entire PROCLIB.PAYLIST table. The SELECT clause selects columns from
PROCLIB.PAYLIST. The asterisk (*) selects all columns. The FROM clause specifies
PROCLIB.PAYLIST as the table to select from.
select *
from proclib.paylist;
Output Table
PROCLIB.PAYLIST
PROCLIB.PAYLIST Table
Id
Num   Gender  Jobcode    Salary    Birth    Hired
-------------------------------------------------
1639  F       TA1         42260  26JUN70  28JAN91
1065  M       ME3         38090  26JAN54  07JAN92
1400  M       ME1         29769  05NOV67  16OCT90
1561  M                   36514  30NOV63  07OCT87
1221  F       FA3             .  22SEP63  04OCT94
PROCLIB.PAYROLL, PROCLIB.BONUS
This example builds a column with an arithmetic expression and creates the
PROCLIB.BONUS table from the query's result.
Input Table
PROCLIB.PAYROLL
First 10 Rows Only
Id
Number  Gender  Jobcode    Salary    Birth    Hired
---------------------------------------------------
1919    M       TA2         34376  12SEP60  04JUN87
1653    F       ME2         35108  15OCT64  09AUG90
1400    M       ME1         29769  05NOV67  16OCT90
1350    F       FA3         32886  31AUG65  29JUL90
1401    M       TA3         38822  13DEC50  17NOV85
1499    M       ME3         43025  26APR54  07JUN80
1101    M       SCP         18723  06JUN62  01OCT90
1333    M       PT2         88606  30MAR61  10FEB81
1402    M       TA2         32615  17JAN63  02DEC90
1479    F       TA3         38785  22DEC68  05OCT89
Program
Declare the PROCLIB library. The PROCLIB library is used in these examples to store
created tables.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create the PROCLIB.BONUS table. The CREATE TABLE statement creates the table
PROCLIB.BONUS from the result of the subsequent query.
proc sql;
create table proclib.bonus as
Select the columns to include. The SELECT clause specifies that three columns will be in
the new table: IdNumber, Salary, and Bonus. FORMAT= assigns the DOLLAR8. format to
Salary. The Bonus column is built with the SQL expression salary*.025.
select IdNumber, Salary format=dollar8.,
salary*.025 as Bonus format=dollar8.
from proclib.payroll;
Display the first 10 rows of the PROCLIB.BONUS table. The SELECT clause selects
columns from PROCLIB.BONUS. The asterisk (*) selects all columns. The FROM clause
specifies PROCLIB.BONUS as the table to select from. The OBS= data set option limits the
printing of the output to 10 rows.
select *
from proclib.bonus(obs=10);
Output
PROCLIB.BONUS
BONUS Information
Id
Number    Salary    Bonus
-------------------------
1919     $34,376     $859
1653     $35,108     $878
1400     $29,769     $744
1350     $32,886     $822
1401     $38,822     $971
1499     $43,025   $1,076
1101     $18,723     $468
1333     $88,606   $2,215
1402     $32,615     $815
1479     $38,785     $970
EMPLOYEES
This example updates data values in the EMPLOYEES table and drops a column.
Input
data Employees;
   input IdNum $4. +2 LName $11. FName $11. JobCode $3.
         +1 Salary 5. +1 Phone $12.;
   datalines;
1876  CHIN       JACK       TA1 42400 212/588-5634
1114  GREENWALD  JANICE     ME3 38000 212/588-1092
1556  PENNINGTON MICHAEL    ME1 29860 718/383-5681
1354  PARKER     MARY       FA3 65800 914/455-2337
1130  WOOD       DEBORAH    PT2 36514 212/587-0013
;
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Display the entire EMPLOYEES table. The SELECT clause displays the table before the
updates. The asterisk (*) selects all columns for display. The FROM clause specifies
EMPLOYEES as the table to select from.
proc sql;
title 'Employees Table';
select * from Employees;
Update the values in the Salary column. The UPDATE statement updates the values in
EMPLOYEES. The SET clause specifies that the data in the Salary column be multiplied by
1.04 when the job code ends with a 1 and 1.025 for all other job codes. (The two underscores
represent any character.) The CASE expression returns a value for each row that completes the
SET clause.
update employees
set salary=salary*
case when jobcode like '__1' then 1.04
else 1.025
end;
Modify the format of the Salary column and delete the Phone column. The ALTER
TABLE statement specifies EMPLOYEES as the table to alter. The MODIFY clause
permanently modifies the format of the Salary column. The DROP clause permanently drops the
Phone column.
alter table employees
modify salary num format=dollar8.
drop phone;
Display the entire updated EMPLOYEES table. The SELECT clause displays the
EMPLOYEES table after the updates. The asterisk (*) selects all columns.
select * from employees;
Output
Employees Table
Id                               Job
Num   LName        FName        Code    Salary  Phone
------------------------------------------------------------
1876  CHIN         JACK         TA1      42400  212/588-5634
1114  GREENWALD    JANICE       ME3      38000  212/588-1092
1556  PENNINGTON   MICHAEL      ME1      29860  718/383-5681
1354  PARKER       MARY         FA3      65800  914/455-2337
1130  WOOD         DEBORAH      PT2      36514  212/587-0013
FROM clause
table alias
inner join
joined-table component
PROC SQL statement option
NUMBER
WHERE clause
IN condition
Tables:
PROCLIB.STAFF, PROCLIB.PAYROLL
This example joins two tables in order to get more information about data that are
common to both tables.
Input Tables
PROCLIB.STAFF
First 10 Rows Only
Id
Num   Lname       Fname      City         State  Hphone
----------------------------------------------------------------------------
1919  ADAMS       GERALD     STAMFORD     CT     203/781-1255
1653  ALIBRANDI   MARIA      BRIDGEPORT   CT     203/675-7715
1400  ALHERTANI   ABDULLAH   NEW YORK     NY     212/586-0808
1350  ALVAREZ     MERCEDES   NEW YORK     NY     718/383-1549
1401  ALVAREZ     CARLOS     PATERSON     NJ     201/732-8787
1499  BAREFOOT    JOSEPH     PRINCETON    NJ     201/812-5665
1101  BAUCOM      WALTER     NEW YORK     NY     212/586-8060
1333  BANADYGA    JUSTIN     STAMFORD     CT     203/781-1777
1402  BLALOCK     RALPH      NEW YORK     NY     718/384-2849
1479  BALLETTI    MARIE      NEW YORK     NY     718/384-8816
PROCLIB.PAYROLL
First 10 Rows Only
Id
Number  Gender  Jobcode    Salary    Birth    Hired
---------------------------------------------------
1919    M       TA2         34376  12SEP60  04JUN87
1653    F       ME2         35108  15OCT64  09AUG90
1400    M       ME1         29769  05NOV67  16OCT90
1350    F       FA3         32886  31AUG65  29JUL90
1401    M       TA3         38822  13DEC50  17NOV85
1499    M       ME3         43025  26APR54  07JUN80
1101    M       SCP         18723  06JUN62  01OCT90
1333    M       PT2         88606  30MAR61  10FEB81
1402    M       TA2         32615  17JAN63  02DEC90
1479    F       TA3         38785  22DEC68  05OCT89
Program
Declare the PROCLIB library. The PROCLIB library is used in these examples to store
created tables.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=120 pagesize=40;
Add row numbers to PROC SQL output. NUMBER adds a column that contains the row
number.
proc sql number;
Select the columns to display. The SELECT clause selects the columns to show in the output.
select Lname, Fname, City, State,
IdNumber, Salary, Jobcode
Specify the tables from which to obtain the data. The FROM clause lists the tables to
select from.
from proclib.staff, proclib.payroll
Specify the join criterion and subset the query. The WHERE clause specifies that the
tables are joined on the ID number from each table. WHERE also further subsets the query
with the IN condition, which returns rows for only four employees.
where idnumber=idnum and idnum in
      ('1919', '1400', '1350', '1333');
Output
Information for Certain Employees Only
                                              Id
Row  Lname      Fname     City      State  Number    Salary  Jobcode
------------------------------------------------------------------------
  1  ADAMS      GERALD    STAMFORD  CT     1919       34376  TA2
  2  ALHERTANI  ABDULLAH  NEW YORK  NY     1400       29769  ME1
  3  ALVAREZ    MERCEDES  NEW YORK  NY     1350       32886  FA3
  4  BANADYGA   JUSTIN    STAMFORD  CT     1333       88606  PT2
DELETE statement
IS condition
RESET statement option
DOUBLE
UNION set operator
Tables:
Input Tables
PROCLIB.PAYLIST
PROCLIB.PAYLIST2
PROCLIB.PAYLIST2 Table
Id
Num   Gender  Jobcode    Salary    Birth    Hired
-------------------------------------------------
1919  M       TA2         34376  12SEP66  04JUN87
1653  F       ME2         31896  15OCT64  09AUG92
1350  F       FA3         36886  31AUG55  29JUL91
1401  M       TA3         38822  13DEC55  17NOV93
1499  M       ME1         23025  26APR74  07JUN92
Program
Declare the PROCLIB library. The PROCLIB library is used in these examples to store
created tables.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the PROCLIB.NEWPAY table. The SELECT clauses select all the columns from the
tables that are listed in the FROM clauses. The UNION set operator concatenates the query
results that are produced by the two SELECT clauses.
proc sql;
create table proclib.newpay as
select * from proclib.paylist
union
select * from proclib.paylist2;
Delete rows with missing Jobcode or Salary values. The DELETE statement deletes rows
from PROCLIB.NEWPAY that satisfy the WHERE expression. The IS condition specifies rows
that contain missing values in the Jobcode or Salary column.
delete
from proclib.newpay
where jobcode is missing or salary is missing;
Reset the PROC SQL environment and double-space the output. RESET changes the
procedure environment without stopping and restarting PROC SQL. The DOUBLE option
double-spaces the output. (The DOUBLE option has no effect on ODS output.)
reset double;
Display the entire PROCLIB.NEWPAY table. The SELECT clause selects all columns from
the newly created table, PROCLIB.NEWPAY.
select *
from proclib.newpay;
Output
Personnel Data
Id
Num   Gender  Jobcode    Salary    Birth    Hired
-------------------------------------------------
1065  M       ME3         38090  26JAN54  07JAN92

1350  F       FA3         36886  31AUG55  29JUL91

1400  M       ME1         29769  05NOV67  16OCT90

1401  M       TA3         38822  13DEC55  17NOV93

1499  M       ME1         23025  26APR74  07JUN92

1639  F       TA1         42260  26JUN70  28JAN91

1653  F       ME2         31896  15OCT64  09AUG92

1919  M       TA2         34376  12SEP66  04JUN87
This example uses DICTIONARY tables to show a list of the SAS files in a SAS data
library. If you do not know the names of the columns in the DICTIONARY table that
you are querying, then use a DESCRIBE TABLE statement with the table.
Program
Declare the PROCLIB library. The PROCLIB library is used in these examples to store
created tables.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. SOURCE writes
the programming statements to the SAS log.
options nodate pageno=1 source linesize=80 pagesize=60;
List the column names from the DICTIONARY.MEMBERS table. DESCRIBE TABLE
writes the column names from DICTIONARY.MEMBERS to the SAS log.
proc sql;
describe table dictionary.members;
Display a list of files in the PROCLIB library. The SELECT clause selects the MEMNAME
and MEMTYPE columns. The FROM clause specifies DICTIONARY.MEMBERS as the table to
select from. The WHERE clause subsets the output to include only those rows that have a libref
of PROCLIB in the LIBNAME column.
select memname, memtype
from dictionary.members
where libname='PROCLIB';
Log
277  options nodate pageno=1 source linesize=80 pagesize=60;
278
279  proc sql;
280     describe table dictionary.members;
NOTE: SQL table DICTIONARY.MEMBERS was created like:
create table DICTIONARY.MEMBERS
(
   libname char(8) label='Library Name',
   memname char(32) label='Member Name',
   memtype char(8) label='Member Type',
   engine char(8) label='Engine Name',
   index char(32) label='Indexes',
   path char(1024) label='Path Name'
);
281
282
283
284
285
Output
SAS Files in the PROCLIB Library
                                  Member
Member Name                       Type
------------------------------------------
ALL                               DATA
BONUS                             DATA
BONUS95                           DATA
DELAY                             DATA
HOUSES                            DATA
INTERNAT                          DATA
MARCH                             DATA
NEWPAY                            DATA
PAYLIST                           DATA
PAYLIST2                          DATA
PAYROLL                           DATA
PAYROLL2                          DATA
SCHEDULE                          DATA
SCHEDULE2                         DATA
STAFF                             DATA
STAFF2                            DATA
SUPERV                            DATA
SUPERV2                           DATA
joined-table component
left outer join
SELECT clause
COALESCE function
WHERE clause
CONTAINS condition
Tables: PROCLIB.PAYROLL, PROCLIB.PAYROLL2
Input Tables
PROCLIB.PAYROLL
First 10 Rows Only
Id
Number  Gender  Jobcode  Salary  Birth    Hired
---------------------------------------------------
1009    M       TA1       28880  02MAR59  26MAR92
1017    M       TA3       40858  28DEC57  16OCT81
1036    F       TA3       39392  19MAY65  23OCT84
1037    F       TA1       28558  10APR64  13SEP92
1038    F       TA1       26533  09NOV69  23NOV91
1050    M       ME2       35167  14JUL63  24AUG86
1065    M       ME2       35090  26JAN44  07JAN87
1076    M       PT1       66558  14OCT55  03OCT91
1094    M       FA1       22268  02APR70  17APR91
1100    M       BCK       25004  01DEC60  07MAY88
PROCLIB.PAYROLL2

Id
Num   Sex  Jobcode  Salary  Birth    Hired
----------------------------------------------
1036  F    TA3       42465  19MAY65  23OCT84
1065  M    ME3       38090  26JAN44  07JAN87
1076  M    PT1       69742  14OCT55  03OCT91
1106  M    PT3       94039  06NOV57  16AUG84
1129  F    ME3       36758  08DEC61  17AUG91
1221  F    FA3       29896  22SEP67  04OCT91
1350  F    FA3       36098  31AUG65  29JUL90
1369  M    TA3       36598  28DEC61  13MAR87
1447  F    FA1       22123  07AUG72  29OCT92
1561  M    TA3       36514  30NOV63  07OCT87
1639  F    TA3       42260  26JUN57  28JAN84
1998  M    SCP       23100  10SEP70  02NOV92
Program
Declare the PROCLIB library. The PROCLIB library is used in these examples to store
created tables.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Limit the number of output rows. OUTOBS= limits the output to 10 rows.
proc sql outobs=10;
Select the columns. The SELECT clause lists the columns to select. Some column names are
prefixed with a table alias because they are in both tables. LABEL= and FORMAT= are column
modifiers.
select p.IdNumber, p.Jobcode, p.Salary,
p2.jobcode label='New Jobcode',
p2.salary label='New Salary' format=dollar8.
Specify the type of join. The FROM clause lists the tables to join and assigns table aliases.
The keywords LEFT JOIN specify the type of join. The order of the tables in the FROM clause
is important. PROCLIB.PAYROLL is listed first and is considered the left table.
PROCLIB.PAYROLL2 is the right table.
from proclib.payroll as p left join proclib.payroll2 as p2
Specify the join criterion. The ON clause specifies that the join be performed based on the
values of the ID numbers from each table.
on p.IdNumber=p2.idnum;
Output
As the output shows, all rows from the left table, PROCLIB.PAYROLL, are returned. PROC
SQL assigns missing values for rows in the left table, PAYROLL, that have no matching values
for IdNum in PAYROLL2.
Select the columns and coalesce the Jobcode columns. The SELECT clause lists the
columns to select. COALESCE overlays the like-named columns. For each row, COALESCE
returns the first nonmissing value of either P2.JOBCODE or P.JOBCODE. Because
P2.JOBCODE is the first argument, if there is a nonmissing value for P2.JOBCODE,
COALESCE returns that value. Thus, the output contains the most recent job code information
for every employee. LABEL= assigns a column label.
select p.idnumber, coalesce(p2.jobcode,p.jobcode)
label='Current Jobcode',
Coalesce the Salary columns. For each row, COALESCE returns the first nonmissing value
of either P2.SALARY or P.SALARY. Because P2.SALARY is the first argument, if there is a
nonmissing value for P2.SALARY, then COALESCE returns that value. Thus, the output
contains the most recent salary information for every employee.
coalesce(p2.salary,p.salary) label='Current Salary'
format=dollar8.
Specify the type of join and the join criterion. The FROM clause lists the tables to join and
assigns table aliases. The keywords LEFT JOIN specify the type of join. The ON clause specifies
that the join is based on the ID numbers from each table.
from proclib.payroll p left join proclib.payroll2 p2
on p.IdNumber=p2.idnum;
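The effect of combining a left join with COALESCE can be seen on a very small scale. The sketch below uses two hypothetical WORK tables (PAY_OLD and PAY_NEW, names invented here) that hold three rows and two rows copied from the PAYROLL and PAYROLL2 listings above; it is an illustration under those assumptions, not part of the original example.

```sas
/* Hypothetical miniature tables; the table names are invented for illustration. */
data work.pay_old;
   input IdNumber $ Jobcode $ Salary;
   datalines;
1009 TA1 28880
1036 TA3 39392
1065 ME2 35090
;

data work.pay_new;
   input IdNum $ Jobcode $ Salary;
   datalines;
1036 TA3 42465
1065 ME3 38090
;

proc sql;
   /* Every row of PAY_OLD is kept; COALESCE prefers the PAY_NEW value */
   /* and falls back to the PAY_OLD value when there is no match.      */
   select o.IdNumber,
          coalesce(n.jobcode, o.jobcode) label='Current Jobcode',
          coalesce(n.salary, o.salary) label='Current Salary'
             format=dollar8.
      from work.pay_old as o left join work.pay_new as n
      on o.IdNumber=n.IdNum;
quit;
```

Employee 1009 has no row in PAY_NEW, so both COALESCE calls return the values from PAY_OLD for that employee; the other two employees take their values from PAY_NEW.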
Output
Subset the query. The WHERE clause subsets the left join to include only those rows
containing the value TA.
title 'Most Current Information for Ticket Agents';
select p.IdNumber,
coalesce(p2.jobcode,p.jobcode) label='Current Jobcode',
coalesce(p2.salary,p.salary) label='Current Salary'
from proclib.payroll p left join proclib.payroll2 p2
on p.IdNumber=p2.idnum
where p2.jobcode contains 'TA';
Output
Tables: PROCLIB.PAYROLL, PROCLIB.JOBS
This example creates the PROC SQL view PROCLIB.JOBS from the result of a
query-expression.
Input Table
PROCLIB.PAYROLL
First 10 Rows Only
Id
Number  Gender  Jobcode  Salary  Birth    Hired
---------------------------------------------------
1009    M       TA1       28880  02MAR59  26MAR92
1017    M       TA3       40858  28DEC57  16OCT81
1036    F       TA3       39392  19MAY65  23OCT84
1037    F       TA1       28558  10APR64  13SEP92
1038    F       TA1       26533  09NOV69  23NOV91
1050    M       ME2       35167  14JUL63  24AUG86
1065    M       ME2       35090  26JAN44  07JAN87
1076    M       PT1       66558  14OCT55  03OCT91
1094    M       FA1       22268  02APR70  17APR91
1100    M       BCK       25004  01DEC60  07MAY88
Program
Declare the PROCLIB library. The PROCLIB library is used in these examples to store
created tables.
libname proclib SAS-data-library;
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the PROCLIB.JOBS view. CREATE VIEW creates the PROC SQL view
PROCLIB.JOBS. The PW= data set option assigns password protection to the data that is
generated by this view.
proc sql;
create view proclib.jobs(pw=red) as
Select the columns. The SELECT clause specifies four columns for the view: Jobcode and
three columns, Number, AVGAGE, and AVGSAL, whose values are the products of functions.
COUNT returns the number of nonmissing values for each job code because the data is grouped
by Jobcode. LABEL= assigns a label to the column.
select Jobcode,
count(jobcode) as number label='Number',
Calculate the Avgage and Avgsal columns. The AVG summary function calculates the
average age and average salary for each job code.
avg(int((today()-birth)/365.25)) as avgage
format=2. label='Average Age',
avg(salary) as avgsal
format=dollar8. label='Average Salary'
Specify the table from which the data is obtained. The FROM clause specifies PAYROLL
as the table to select from. PROC SQL assumes the libref of PAYROLL to be PROCLIB because
PROCLIB is used in the CREATE VIEW statement.
from payroll
Organize the data into groups and specify the groups to include in the output. The
GROUP BY clause groups the data by the values of Jobcode. Thus, any summary statistics are
calculated for each grouping of rows by value of Jobcode. The HAVING clause subsets the
grouped data and returns rows for job codes that contain an average age of greater than or
equal to 30.
group by jobcode
having avgage ge 30;
Display the entire PROCLIB.JOBS view. The SELECT statement selects all columns from
PROCLIB.JOBS. PW=RED is necessary because the view is password protected.
select * from proclib.jobs(pw=red);
Output
Current Summary Information for Each Job Category
Average Age Greater Than Or Equal to 30

                  Average   Average
Jobcode  Number       Age    Salary
------------------------------------
BCK           9        36   $25,794
FA1          11        33   $23,039
FA2          16        37   $27,987
FA3           7        39   $32,934
ME1           8        34   $28,500
ME2          14        39   $35,577
ME3           7        42   $42,411
NA1           5        30   $42,032
NA2           3        42   $52,383
PT1           8        38   $67,908
PT2          10        43   $87,925
PT3           2        54   $10,505
SCP           7        37   $18,309
TA1           9        36   $27,721
TA2          20        36   $33,575
TA3          12        40   $39,680
FROM clause
joined-table component
WHERE clause
Tables: PROCLIB.STAFF2, PROCLIB.SCHEDULE2, PROCLIB.SUPERV2
This example joins three tables and produces a report that contains columns from
each table.
Input Tables
PROCLIB.STAFF2

Id
Num   Lname      Fname    City        State  Hphone
------------------------------------------------------------
1106  MARSHBURN  JASPER   STAMFORD    CT     203/781-1457
1430  DABROWSKI  SANDRA   BRIDGEPORT  CT     203/675-1647
1118  DENNIS     ROGER    NEW YORK    NY     718/383-1122
1126  KIMANI     ANNE     NEW YORK    NY     212/586-1229
1402  BLALOCK    RALPH    NEW YORK    NY     718/384-2849
1882  TUCKER     ALAN     NEW YORK    NY     718/384-0216
1479  BALLETTI   MARIE    NEW YORK    NY     718/384-8816
1420  ROUSE      JEREMY   PATERSON    NJ     201/732-9834
1403  BOWDEN     EARL     BRIDGEPORT  CT     203/675-3434
1616  FUENTAS    CARLA    NEW YORK    NY     718/384-3329
PROCLIB.SCHEDULE2

                       Id
Flight  Date     Dest  Num
---------------------------
132     01MAR94  BOS   1118
132     01MAR94  BOS   1402
219     02MAR94  PAR   1616
219     02MAR94  PAR   1478
622     03MAR94  LON   1430
622     03MAR94  LON   1882
271     04MAR94  NYC   1430
271     04MAR94  NYC   1118
579     05MAR94  RDU   1126
579     05MAR94  RDU   1106
PROCLIB.SUPERV2

Supervisor         Job
Id          State  Category
---------------------------
1417        NJ     NA
1352        NY     NA
1106        CT     PT
1442        NJ     PT
1118        NY     PT
1405        NJ     SC
1564        NY     SC
1639        CT     TA
1126        NY     TA
1882        NY     ME
Program
Declare the PROCLIB library. The PROCLIB library is used in these examples to store
created tables.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Select the columns. The SELECT clause specifies the columns to select. IdNum is prefixed
with a table alias because it appears in two tables.
proc sql;
title 'All Flights for Each Supervisor';
select s.IdNum, Lname, City 'Hometown', Jobcat,
Flight, Date
Specify the tables to include in the join. The FROM clause lists the three tables for the join
and assigns an alias to each table.
from proclib.schedule2 s, proclib.staff2 t, proclib.superv2 v
Specify the join criteria. The WHERE clause specifies the columns that join the tables. The
STAFF2 and SCHEDULE2 tables have an IdNum column, which has related values in both
tables. The STAFF2 and SUPERV2 tables have the IdNum and SUPID columns, which have
related values in both tables.
where s.idnum=t.idnum and t.idnum=v.supid;
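The same three-table join can also be written with explicit INNER JOIN and ON keywords instead of a comma-separated FROM list and a WHERE clause. The following is a sketch of that alternative form, assuming that the PROCLIB tables shown under Input Tables are available; it is offered for comparison and is not the form that this example uses.

```sas
/* Sketch: assumes PROCLIB.SCHEDULE2, PROCLIB.STAFF2, and PROCLIB.SUPERV2 exist. */
proc sql;
   title 'All Flights for Each Supervisor';
   select s.IdNum, Lname, City 'Hometown', Jobcat, Flight, Date
      from proclib.schedule2 as s
           inner join proclib.staff2 as t
              on s.idnum=t.idnum      /* schedule row matches a staff row */
           inner join proclib.superv2 as v
              on t.idnum=v.supid;     /* staff member is a supervisor     */
quit;
title;
```

Placing each join condition next to the table that it introduces can make it easier to see which condition links which pair of tables.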
Output
All Flights for Each Supervisor

Id                           Job
Num   Lname      Hometown    Category  Flight  Date
-----------------------------------------------------
1106  MARSHBURN  STAMFORD    PT        579     05MAR94
1118  DENNIS     NEW YORK    PT        132     01MAR94
1118  DENNIS     NEW YORK    PT        271     04MAR94
1126  KIMANI     NEW YORK    TA        579     05MAR94
1882  TUCKER     NEW YORK    ME        622     03MAR94
FROM clause
in-line view
Tables: PROCLIB.STAFF2, PROCLIB.SCHEDULE2, PROCLIB.SUPERV2
This example shows an alternative way to construct the query that is explained in
Example 9 on page 1145 by joining one of the tables with the results of an in-line view.
The example also shows how to rename columns with an in-line view.
Program
Declare the PROCLIB library. The PROCLIB library is used in these examples to store
created tables.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Select the columns. The SELECT clause selects all columns that are returned by the in-line
view (which will have the alias Three assigned to it), plus one column from the third table
(which will have the alias V assigned to it).
proc sql;
title 'All Flights for Each Supervisor';
select three.*, v.jobcat
Specify the in-line query. Instead of including the name of a table or view, the FROM clause
includes a query that joins two of the three tables. In the in-line query, the SELECT clause lists
the columns to select. IdNum is prefixed with a table alias because it appears in both tables.
The FROM clause lists the two tables for the join and assigns an alias to each table. The
WHERE clause specifies the columns that join the tables. The STAFF2 and SCHEDULE2 tables
have an IdNum column, which has related values in both tables.
from (select lname, s.idnum, city, flight, date
from proclib.schedule2 s, proclib.staff2 t
where s.idnum=t.idnum)
Specify an alias for the query and names for the columns. The alias Three refers to the
results of the in-line view. The names in parentheses become the names for the columns in the
view.
as three (Surname, Emp_ID, Hometown,
FlightNumber, FlightDate),
Join the results of the in-line view with the third table. The WHERE clause specifies the
columns that join the table with the in-line view. Note that the WHERE clause specifies the
renamed Emp_ID column from the in-line view.
proclib.superv2 v
where three.Emp_ID=v.supid;
Output
All Flights for Each Supervisor

                                                          Job
Surname    Emp_ID  Hometown    FlightNumber  FlightDate  Category
-------------------------------------------------------------------
MARSHBURN  1106    STAMFORD    579           05MAR94     PT
DENNIS     1118    NEW YORK    132           01MAR94     PT
DENNIS     1118    NEW YORK    271           04MAR94     PT
KIMANI     1126    NEW YORK    579           05MAR94     TA
TUCKER     1882    NEW YORK    622           03MAR94     ME
ORDER BY clause
SOUNDS-LIKE operator
Table: PROCLIB.STAFF
This example returns rows based on the functionality of the SOUNDS-LIKE operator
in a WHERE clause.
Note: The SOUNDS-LIKE operator is based on the SOUNDEX algorithm for
identifying words that sound alike. The SOUNDEX algorithm is English-biased and is
less useful for languages other than English. For more information on the SOUNDEX
algorithm, see SAS Language Reference: Dictionary.
Input Table
PROCLIB.STAFF
First 10 Rows Only

Id
Num   Lname      Fname     City        State  Hphone
------------------------------------------------------------
1919  ADAMS      GERALD    STAMFORD    CT     203/781-1255
1653  ALIBRANDI  MARIA     BRIDGEPORT  CT     203/675-7715
1400  ALHERTANI  ABDULLAH  NEW YORK    NY     212/586-0808
1350  ALVAREZ    MERCEDES  NEW YORK    NY     718/383-1549
1401  ALVAREZ    CARLOS    PATERSON    NJ     201/732-8787
1499  BAREFOOT   JOSEPH    PRINCETON   NJ     201/812-5665
1101  BAUCOM     WALTER    NEW YORK    NY     212/586-8060
1333  BANADYGA   JUSTIN    STAMFORD    CT     203/781-1777
1402  BLALOCK    RALPH     NEW YORK    NY     718/384-2849
1479  BALLETTI   MARIE     NEW YORK    NY     718/384-8816
Program
Declare the PROCLIB library. The PROCLIB library is used in these examples to store
created tables.
libname proclib 'SAS-data-library';
Select the columns and the table from which the data is obtained. The SELECT clause
selects the IdNum, Lname, and Fname columns from the table in the FROM clause,
PROCLIB.STAFF. The UPCASE function returns Lname in uppercase.
proc sql;
title "Employees Whose Last Name Sounds Like Johnson";
select idnum, upcase(lname), fname
from proclib.staff
Subset the query and sort the output. The WHERE clause uses the SOUNDS-LIKE
operator to subset the table by those employees whose last name sounds like Johnson. The
ORDER BY clause orders the output by the second column.
where lname=*"Johnson"
order by 2;
Output
Employees Whose Last Name Sounds Like Johnson
Id
Num                Fname
--------------------------------------
1411  JOHNSEN      JACK
1113  JOHNSON      LESLIE
1369  JONSON       ANTHONY
SOUNDS-LIKE is useful, but there might be instances where it does not return every row that
seems to satisfy the condition. PROCLIB.STAFF has an employee with the last name SANDERS
and an employee with the last name SANYERS. The algorithm does not find SANYERS, but it does
find SANDERS and SANDERSON.
title "Employees Whose Last Name Sounds Like Sanders";
select *
from proclib.staff
where lname=*"Sanders"
order by 2;
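One way to see why some names match and others do not is to look at the codes that the SOUNDEX function returns for them. The short DATA step below is a sketch that writes those codes to the SAS log. It assumes that the SOUNDS-LIKE operator compares names on the basis of codes like these; the list of names is taken from this example, and no particular code values are claimed here.

```sas
/* Sketch: write the SOUNDEX code for each name to the SAS log. */
data _null_;
   length name $ 12 code $ 12;
   do name='JOHNSON', 'JOHNSEN', 'JONSON', 'SANDERS', 'SANYERS';
      code=soundex(name);
      put name= code=;
   end;
run;
```

Names whose codes are identical would be expected to satisfy the =* comparison with each other; names whose codes differ would not.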
GROUP BY clause
HAVING clause
SELECT clause
ABS function
FORMAT= column-modifier
LABEL= column-modifier
MIN summary function
** operator, exponentiation
SQRT function
Tables: STORES, HOUSES
This example joins two tables in order to compare and analyze values that are unique
to each table yet have a relationship with a column that is common to both tables.
options ls=80 ps=60 nodate pageno=1 ;
data stores;
input Store $ x y;
datalines;
store1 5 1
store2 5 3
store3 3 5
store4 7 5
;
data houses;
input House $ x y;
datalines;
house1 1 1
house2 3 3
house3 2 3
house4 7 7
;
Input Tables
STORES Table
Coordinates of Stores

Store       x    y
----------------------------
store1      5    1
store2      5    3
store3      3    5
store4      7    5

HOUSES Table
Coordinates of Houses

House       x    y
----------------------------
house1      1    1
house2      3    3
house3      2    3
house4      7    7
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the query. The SELECT clause specifies three columns: HOUSE, STORE, and DIST.
The arithmetic expression uses the square root function (SQRT) to create the values of DIST,
which contain the distance from HOUSE to STORE for each row. The double asterisk (**)
represents exponentiation. LABEL= assigns a label to STORE and to DIST.
proc sql;
title 'Each House and the Closest Store';
select house, store label='Closest Store',
sqrt((abs(s.x-h.x)**2)+(abs(h.y-s.y)**2)) as dist
label='Distance' format=4.2
from stores s, houses h
Organize the data into groups and subset the query. The minimum distance from each
house to all the stores is calculated because the data are grouped by house. The HAVING clause
specifies that each row be evaluated to determine if its value of DIST is the same as the
minimum distance from that house to any store.
group by house
having dist=min(dist);
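To see what the HAVING clause filters, it can help to list every house-to-store distance before any grouping is applied. The query below is a sketch of that check. It assumes that the STORES and HOUSES data sets created by the DATA steps in this example exist in the WORK library; the title is illustrative. The ABS calls are omitted here because squaring a difference already removes its sign.

```sas
/* Sketch: list all 16 house-store combinations with their distances. */
proc sql;
   title 'All House-to-Store Distances';
   select house, store,
          sqrt((s.x-h.x)**2 + (s.y-h.y)**2) as dist format=4.2
      from stores s, houses h
      order by house, dist;
quit;
title;
```

For house2 at (3,3), store2 at (5,3) and store3 at (3,5) are each exactly 2 units away, which is why both rows satisfy dist=min(dist) and both appear in the output.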
Output
Note that two stores are tied for shortest distance from house2.
         Closest
House    Store     Distance
----------------------------
house1   store1        4.00
house2   store2        2.00
house2   store3        2.00
house3   store3        2.24
house4   store4        2.00
CASE expression
joined-table component
Cross join
SELECT clause
DISTINCT keyword
Tables: PROCLIB.MARCH, FLIGHTS
This example joins a table with itself to get all the possible combinations of the
values in a column.
Input Table
PROCLIB.MARCH
First 10 Rows Only
Flight  Date     Depart  Orig  Dest  Miles  Boarded  Capacity
-----------------------------------------------------------------
114     01MAR94    7:10  LGA   LAX    2475      172       210
202     01MAR94   10:43  LGA   ORD     740      151       210
219     01MAR94    9:31  LGA   LON    3442      198       250
622     01MAR94   12:19  LGA   FRA    3857      207       250
132     01MAR94   15:35  LGA   YYZ     366      115       178
271     01MAR94   13:17  LGA   PAR    3635      138       250
302     01MAR94   20:22  LGA   WAS     229      105       180
114     02MAR94    7:10  LGA   LAX    2475      119       210
202     02MAR94   10:43  LGA   ORD     740      120       210
219     02MAR94    9:31  LGA   LON    3442      147       250
Declare the PROCLIB library. The PROCLIB library is used in these examples to store
created tables.
libname proclib 'SAS-data-library';
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the FLIGHTS table. The CREATE TABLE statement creates the table FLIGHTS from
the output of the query. The SELECT clause selects the unique values of Dest. DISTINCT
specifies that only one row for each value of city be returned by the query and stored in the
table FLIGHTS. The FROM clause specifies PROCLIB.MARCH as the table to select from.
proc sql;
create table flights as
select distinct dest
from proclib.march;
Output
FLIGHTS Table
Dest
----
FRA
LAX
LON
ORD
PAR
WAS
YYZ
Select the columns. The SELECT clause specifies three columns for the output. The prefixes
on DEST are table aliases to specify which table to take the values of Dest from. The CASE
expression creates a column that contains the character string 'to and from'.
select f1.Dest, case
when f1.dest ne ' ' then 'to and from'
end,
f2.Dest
Specify the type of join. The FROM clause joins FLIGHTS with itself and creates a table that
contains every possible combination of rows (a Cartesian product). The table contains two rows
for each possible route, for example, PAR <-> WAS and WAS <-> PAR.
from flights as f1, flights as f2
Specify the join criterion. The WHERE clause subsets the internal table by choosing only
those rows where the name in F1.Dest sorts before the name in F2.Dest. Thus, there is only one
row for each possible route.
where f1.dest < f2.dest
Sort the output. ORDER BY sorts the result by the values of F1.Dest.
order by f1.dest;
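The condition f1.dest < f2.dest keeps exactly one row for each unordered pair of distinct destinations, so n destinations yield n(n-1)/2 routes. The query below is a small sketch that counts those rows; it assumes that the FLIGHTS table created earlier in this example exists, and the column alias Routes is invented for illustration.

```sas
/* Sketch: count the unordered pairs of distinct destinations. */
proc sql;
   select count(*) as Routes
      from flights as f1, flights as f2
      where f1.dest < f2.dest;
quit;
```

For the seven destinations listed in the FLIGHTS table, this works out to 7*6/2 = 21 routes, which matches the number of rows in the output that follows.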
Output
Dest               Dest
-----------------------
FRA   to and from  LAX
FRA   to and from  LON
FRA   to and from  WAS
FRA   to and from  ORD
FRA   to and from  PAR
FRA   to and from  YYZ
LAX   to and from  LON
LAX   to and from  PAR
LAX   to and from  WAS
LAX   to and from  ORD
LAX   to and from  YYZ
LON   to and from  ORD
LON   to and from  WAS
LON   to and from  PAR
LON   to and from  YYZ
ORD   to and from  WAS
ORD   to and from  PAR
ORD   to and from  YYZ
PAR   to and from  WAS
PAR   to and from  YYZ
WAS   to and from  YYZ
Specify a cross join. Because a cross join is functionally the same as a Cartesian product join,
the cross join syntax can be substituted for the conventional join syntax.
proc sql;
title 'All Possible Connections';
select f1.Dest, case
when f1.dest ne ' ' then 'to and from'
end,
f2.Dest
from flights as f1 cross join flights as f2
where f1.dest < f2.dest
order by f1.dest;
Output
All Possible Connections

Dest               Dest
-----------------------
FRA   to and from  LAX
FRA   to and from  LON
FRA   to and from  WAS
FRA   to and from  ORD
FRA   to and from  PAR
FRA   to and from  YYZ
LAX   to and from  LON
LAX   to and from  PAR
LAX   to and from  WAS
LAX   to and from  ORD
LAX   to and from  YYZ
LON   to and from  ORD
LON   to and from  WAS
LON   to and from  PAR
LON   to and from  YYZ
ORD   to and from  WAS
ORD   to and from  PAR
ORD   to and from  YYZ
PAR   to and from  WAS
PAR   to and from  YYZ
WAS   to and from  YYZ
joined-table component
Tables:
This example uses a table that contains data for a case-control study. Each row
contains information for a case or a control. To perform statistical analysis, you need a
table with one row for each case-control pair. PROC SQL joins the table with itself in
order to match the cases with their appropriate controls. After the rows are matched,
differencing can be performed on the appropriate columns.
The input table MATCH_11 contains one row for each case and one row for each
control. Pair contains a number that associates the case with its control. Low is 0 for
the controls and 1 for the cases. The remaining columns contain information about the
cases and controls.
Input Table
MATCH_11 Table
First 10 Rows Only

Pair  Low  Age  Lwt  Race  Smoke  Ptd  Ht  UI  race1  race2
------------------------------------------------------------
   1    0   14  135     1      0    0   0   0      0      0
   1    1   14  101     3      1    1   0   0      0      1
   2    0   15   98     2      0    0   0   0      1      0
   2    1   15  115     3      0    0   0   1      0      1
   3    0   16   95     3      0    0   0   0      0      1
   3    1   16  130     3      0    0   0   0      0      1
   4    0   17  103     3      0    0   0   0      0      1
   4    1   17  130     3      1    1   0   1      0      1
   5    0   17  122     1      1    0   0   0      0      0
   5    1   17  110     1      1    0   0   0      0      0
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the MATCH table. The SELECT clause specifies the columns for the table MATCH.
SQL expressions in the SELECT clause calculate the differences for the appropriate columns
and create new columns.
proc sql;
create table match as
select
one.Low,
one.Pair,
(one.lwt - two.lwt) as Lwt_d,
(one.smoke - two.smoke) as Smoke_d,
(one.ptd - two.ptd) as Ptd_d,
(one.ht - two.ht) as Ht_d,
(one.ui - two.ui) as UI_d
Specify the type of join and the join criterion. The FROM clause lists the table MATCH_11
twice. Thus, the table is joined with itself. The WHERE clause returns only the rows for each
pair that show the difference when the values for control are subtracted from the values for case.
from match_11 one, match_11 two
where (one.pair=two.pair and one.low>two.low);
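The self-join above produces one row per pair only if every value of Pair has exactly one case (Low=1) and one control (Low=0). The query below is a sketch of a check that could be run first. It assumes that the MATCH_11 table described above is available; the title and the aliases N_Rows and N_Cases are invented for illustration.

```sas
/* Sketch: report any pair that is not exactly one case plus one control. */
proc sql;
   title 'Pairs That Do Not Have Exactly One Case and One Control';
   select pair, count(*) as N_Rows, sum(low) as N_Cases
      from match_11
      group by pair
      having count(*) ne 2 or sum(low) ne 1;
quit;
title;
```

If the query returns no rows, each pair contributes exactly one row to MATCH, because only the combination in which ONE is the case and TWO is the control satisfies one.low>two.low.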
Display the first five rows of the MATCH table. The SELECT clause selects all the columns
from MATCH. The OBS= data set option limits the printing of the output to five rows.
select *
from match(obs=5);
Output
MATCH Table

Low  Pair  Lwt_d  Smoke_d  Ptd_d  Ht_d  UI_d
---------------------------------------------
  1     1    -34        1      1     0     0
  1     2     17        0      0     0     1
  1     3     35        0      0     0     0
  1     4     27        1      1     0     1
  1     5    -12        0      0     0     0
COUNT function
Table: SURVEY
This example uses a SAS macro to create columns. The SAS macro is not explained
here. See SAS Macro Language: Reference for information on SAS macros.
Input Table
SURVEY contains data from a questionnaire about diet and exercise habits. SAS enables you to
use a special notation for missing values. In the EDUC column, the .x notation indicates that
the respondent gave an answer that is not valid, and .n indicates that the respondent did not
answer the question. A period as a missing value indicates a data entry error.
data survey;
input id $
datalines;
1001 yes yes
1002 no yes
1003 no no
1004 yes yes
1005 no yes
1006 yes yes
1007 no yes
1008 no no
;
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Count the nonmissing responses. The COUNTM macro uses the COUNT function to perform
various counts for a column. Each COUNT function uses a CASE expression to select the rows
to be counted. The first COUNT function uses only the column as an argument to return the
number of nonmissing rows.
%macro countm(col);
count(&col) "Valid Responses for &col",
Count missing or invalid responses. The NMISS function returns the number of rows for
which the column has any type of missing value: .n, .x, or a period.
nmiss(&col) "Missing or NOT VALID Responses for &col",
Count the occurrences of various sources of missing or invalid responses. The last
three COUNT functions use CASE expressions to count the occurrences of the three notations
for missing values. The 'count me' character string gives the COUNT function a nonmissing
value to count.
count(case
when
end)
count(case
when
end)
count(case
when
end)
%mend;
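The WHEN conditions of the three CASE expressions are not shown in the listing above. The following sketch shows, without the macro, how counts of this kind might be written. It is an illustration under stated assumptions, not the original macro text: the table SURVEY_DEMO and its values are invented, and the column labels are shortened versions of the headings in the output below.

```sas
/* Hypothetical test data: one valid value and one of each missing notation. */
data work.survey_demo;
   educ=1;  output;
   educ=.n; output;
   educ=.x; output;
   educ=.;  output;
run;

proc sql;
   /* A CASE expression with no ELSE returns a missing value when the  */
   /* condition is false, and COUNT ignores missing values, so each    */
   /* COUNT tallies only the rows that match its WHEN condition.       */
   select count(*) 'Total No. of Rows',
          count(educ) 'Valid Responses',
          nmiss(educ) 'Missing or NOT VALID',
          count(case when educ=.n then 'count me' end) 'Coded as NO ANSWER',
          count(case when educ=.x then 'count me' end) 'Coded as NOT VALID',
          count(case when educ=.  then 'count me' end) 'Data Entry Errors'
      from work.survey_demo;
quit;
```

Because .n, .x, and the ordinary period are distinct missing values, each comparison matches only its own notation, while NMISS counts all three.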
Use the COUNTM macro to create the columns. The SELECT clause specifies the columns
that are in the output. COUNT(*) returns the total number of rows in the table. The COUNTM
macro uses the values of the EDUC column to create the columns that are defined in the macro.
proc sql;
title 'Counts for Each Type of Missing Response';
select count(*) "Total No. of Rows",
%countm(educ)
from survey;
Output
Counts for Each Type of Missing Response

                      Missing              Coded as
                      or NOT    Coded as   NOT        Data
Total     Valid       VALID     NO         VALID      Entry
No. of    Responses   Responses ANSWER     answers    Errors
Rows      for educ    for educ  for educ   for educ   for educ
--------------------------------------------------------------
     8            2          6         1          3          2
CHAPTER 45
The STANDARD Procedure
Overview: STANDARD Procedure 1163
What Does the STANDARD Procedure Do? 1163
Standardizing Data 1163
Syntax: STANDARD Procedure 1165
PROC STANDARD Statement 1166
BY Statement 1168
FREQ Statement 1169
VAR Statement 1169
WEIGHT Statement 1169
Results: STANDARD Procedure 1170
Missing Values 1170
Output Data Set 1170
Statistical Computations: STANDARD Procedure 1171
Examples: STANDARD Procedure 1171
Example 1: Standardizing to a Given Mean and Standard Deviation 1171
Example 2: Standardizing BY Groups and Replacing Missing Values 1173
Standardizing Data
Output 45.1 shows a simple standardization where the output data set contains
standardized student exam scores. The statements that produce the output follow:
proc standard data=score mean=75 std=5
out=stndtest;
run;
proc print data=stndtest;
run;
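With MEAN=75 and STD=5, each value is shifted and rescaled so that the variable ends up with the requested mean and standard deviation; that is, the new value is 75 + 5*(x - mean of x)/(standard deviation of x). The sketch below computes this by hand next to PROC STANDARD so that the two results can be compared. The data set SCORE_DEMO and its values are hypothetical and are not the SCORE data set used for Output 45.1.

```sas
/* Hypothetical scores, invented for illustration. */
data work.score_demo;
   input Student $ Test1;
   datalines;
Avery 94
Blake 71
Casey 86
Drew 79
;

/* Standardize with the procedure. */
proc standard data=work.score_demo mean=75 std=5 out=work.by_proc;
   var Test1;
run;

proc print data=work.by_proc;
run;

/* Standardize by hand. PROC SQL remerges MEAN and STD onto every row; */
/* STD uses the n-1 divisor, which corresponds to VARDEF=DF.           */
proc sql;
   select Student, Test1,
          75 + 5*(Test1 - mean(Test1))/std(Test1) as Test1_Std
      from work.score_demo;
quit;
```

The Test1_Std column from the query should agree with the Test1 values in the PROC STANDARD output data set.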
Output 45.1
Student      Test1
Capalleti    80.5388
Dubose       64.3918
Engles       80.9143
Grant        68.8980
Krupski      75.2816
Lundsford    79.7877
McBane       73.4041
Mullen       78.6612
Nguyen       74.9061
Patel        71.9020
Si           73.4041
Tanaka       77.9102
Output 45.2 shows a more complex example that uses BY-group processing. PROC
STANDARD computes Z scores separately for two BY groups by standardizing
life-expectancy data to a mean of 0 and a standard deviation of 1. The data are 1950
and 1993 life expectancies at birth for 16 countries. The birth rates for each country,
classified as stable or rapid, form the two BY groups. The statements that produce the
analysis also
- print statistics for each variable to standardize
- replace missing values with the given mean
- calculate standardized values using a given mean and standard deviation
- print the data set with the standardized values.
For an explanation of the program that produces this output, see Example 2 on page
1173.
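The four tasks in the list above map onto a small number of options and statements. The following sketch shows one way they could fit together on a toy data set. The data set LIFE_DEMO, its variable names, and its values are hypothetical and are not the data behind Output 45.2; see Example 2 for the program that the manual actually uses.

```sas
/* Hypothetical data set and variable names, invented for illustration. */
data work.life_demo;
   input Group $ Country $ Life50 Life93;
   datalines;
Rapid  A 40 58
Rapid  B 45 .
Rapid  C 38 62
Stable D 66 74
Stable E .  76
Stable F 69 78
;

/* BY-group processing expects the data to be sorted by the BY variable. */
proc sort data=work.life_demo;
   by Group;
run;

/* PRINT shows the statistics, REPLACE fills missing values with MEAN=, */
/* and MEAN=0 STD=1 produces z scores within each BY group.             */
proc standard data=work.life_demo mean=0 std=1 replace print
              out=work.z_demo;
   by Group;
   var Life50 Life93;
run;

proc print data=work.z_demo;
run;
```

Because MEAN=0 is specified together with REPLACE, the missing values in the toy data appear as 0 in the output data set.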
Output 45.2
                                              Standard
Name     Label                     Mean       Deviation
Life50   1950 life expectancy      67.400000  1.854724
Life93   1993 life expectancy      74.500000  4.888763

                                              Standard
Name     Label                     Mean       Deviation
Life50   1950 life expectancy      42.000000  5.033223
Life93   1993 life expectancy      59.100000  8.225300

Country           Life50     Life93
France          -0.21567    0.51138
Germany          0.32350    0.10228
Japan           -1.83316    0.92048
Russia           0.00000   -1.94323
United Kingdom   0.86266    0.30683
United States    0.86266    0.10228
Bangladesh       0.00000   -0.74161
Brazil           1.78812    0.96045
China           -0.19868    1.32518
Egypt            0.00000    0.10942
Ethiopia        -1.78812   -1.59265
India           -0.59604   -0.01216
Indonesia       -0.79472   -0.01216
Mozambique       0.00000   -1.47107
Philippines      1.19208    0.59572
Turkey           0.39736    0.83888
Tip:
Standard
Reminder: You can use the ATTRIB, FORMAT, LABEL, and WHERE statements. See
Chapter 3, Statements with the Same Function in Multiple Procedures, on page 57 for
details. You can also use any global statements. See Global Statements on page 18 for
a list.
To do this
BY
FREQ
VAR
WEIGHT
To do this
DATA=
OUT=
Computational options
Exclude observations with nonpositive weights
EXCLNPWGT
MEAN=
REPLACE
STD=
VARDEF=
Without Options
If you do not specify MEAN=, REPLACE, or STD=, the output data set is an
identical copy of the input data set.
Options
DATA=SAS-data-set
concurrent access if another user is updating the data set at the same time.
EXCLNPWGT
OUT=SAS-data-set
identifies the output data set. If SAS-data-set does not exist, PROC STANDARD
creates it. If you omit OUT=, the data set is named DATAn, where n is the smallest
integer that makes the name unique.
Default: DATAn
Featured in: Example 1 on page 1171
PRINT
prints the original frequency, mean, and standard deviation for each variable to
standardize.
Featured in: Example 2 on page 1173
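REPLACE
VARDEF=divisor
specifies the divisor to use in the calculation of variances and standard deviation.
Table 45.1 on page 1167 shows the possible values for divisor and the associated
divisors.
Table 45.1
Value          Divisor                     Formula
DF             degrees of freedom          $n - 1$
N              number of observations      $n$
WDF            sum of weights minus one    $(\sum_i w_i) - 1$
WEIGHT | WGT   sum of weights              $\sum_i w_i$
The procedure computes the variance as CSS/divisor, where CSS is the corrected sum of
squares, $\sum_i (x_i - \bar{x})^2$.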
Default: DF
Tip: When you use the WEIGHT statement and VARDEF=DF, the variance is an
estimate of $\sigma^2$, where the variance of the ith observation is
$\mathrm{var}(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight for the ith observation.
This yields an estimate of the variance of an observation with unit weight.
Tip: When you use the WEIGHT statement and VARDEF=WGT, the computed
variance is asymptotically (for large n) an estimate of $\sigma^2 / \bar{w}$, where
$\bar{w}$ is the average weight. This yields an asymptotic estimate of the variance of an
observation with average weight.
BY Statement
Calculates standardized values separately for each BY group.
Main discussion: BY on page 58
Featured in:
Required Arguments
variable
specifies the variable that the procedure uses to form BY groups. You can specify
more than one variable. If you do not use the NOTSORTED option in the BY
statement, the observations in the data set must either be sorted by all the variables
that you specify, or they must be indexed appropriately. These variables are called
BY variables.
Options
DESCENDING
specifies that the data set is sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The data are grouped in another way, such as chronological order.
The requirement for ordering or indexing observations according to the values of
BY variables is suspended for BY-group processing when you use the
NOTSORTED option. In fact, the procedure does not use an index if you specify
NOTSORTED. The procedure defines a BY group as a set of contiguous observations
that have the same values for all BY variables. If observations with the same values
for the BY variables are not contiguous, the procedure treats each contiguous set as a
separate BY group.
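The NOTSORTED behavior described above can be sketched as follows. This is a minimal illustration, not part of the original examples: the data set READINGS and its variables Batch and Value are hypothetical, and the rows are deliberately grouped in collection order rather than sorted order.

```sas
/* Hypothetical data: Batch values arrive grouped but not sorted (B, A, C). */
data readings;
   input Batch $ Value @@;
   datalines;
B 10 B 12 B 14 A 20 A 22 A 24 C 5 C 7 C 9
;

/* NOTSORTED lets each contiguous run of Batch form its own BY group, */
/* so no PROC SORT step or index is needed first.                      */
proc standard data=readings mean=0 std=1 out=readings_std;
   by Batch notsorted;
   var Value;
run;
```

Without NOTSORTED, this step would be expected to stop with an error because Batch is not in sorted order.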
FREQ Statement
Specifies a numeric variable whose values represent the frequency of the observation.
Tip: The effects of the FREQ and WEIGHT statements are similar except when
calculating degrees of freedom.
See also: For an example that uses the FREQ statement, see FREQ on page 61
FREQ variable;
Required Arguments
variable
specifies a numeric variable whose value represents the frequency of the observation.
If you use the FREQ statement, the procedure assumes that each observation
represents n observations, where n is the value of variable. If n is not an integer, the
SAS System truncates it. If n is less than 1 or is missing, the procedure does not use
that observation to calculate statistics but the observation is still standardized.
The sum of the frequency variable represents the total number of observations.
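As a minimal sketch of the FREQ statement in use, the step below treats each row as Count identical observations. The data set COUNTS and the variables Score and Count are hypothetical names chosen for illustration.

```sas
/* Hypothetical data: each row stands for Count observations of Score. */
data counts;
   input Score Count @@;
   datalines;
60 2 70 5 80 3
;

/* PRINT reports the frequency, mean, and standard deviation that the  */
/* procedure uses; with FREQ, the frequency is the sum of Count.       */
proc standard data=counts mean=0 std=1 print out=counts_std;
   freq Count;
   var Score;
run;
```

The output data set still has three rows; the FREQ variable changes the statistics used for standardizing, not the number of rows written.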
VAR Statement
Specifies the variables to standardize and their order in the printed output.
Default: If you omit the VAR statement, PROC STANDARD standardizes all numeric
variables not listed in the other statements.
Featured in: Example 1 on page 1171
VAR variable(s);
Required Arguments
variable(s)
WEIGHT Statement
Specifies weights for analysis variables in the statistical calculations.
See also: For information about calculating weighted statistics and for an example that
uses the WEIGHT statement, see WEIGHT on page 63
WEIGHT variable;
Required Arguments
variable
specifies a numeric variable whose values weight the values of the analysis variables.
The values of the variable do not have to be integers. If the value of the weight
variable is
Weight value    PROC STANDARD
0               counts the observation in the total number of observations
less than 0     converts the weight value to zero and counts the observation in
                the total number of observations
missing         excludes the observation from the calculation of mean and
                standard deviation
To exclude observations that contain negative and zero weights from the calculation
of mean and standard deviation, use EXCLNPWGT. Note that most SAS/STAT
procedures, such as PROC GLM, exclude negative and zero weights by default.
Tip: When you use the WEIGHT statement, consider which value of the VARDEF=
option is appropriate. See VARDEF= on page 1167 and the calculation of weighted
statistics in Keywords and Formulas on page 1340 for more information.
Note: Prior to Version 7 of the SAS System, the procedure did not exclude the
observations with missing weights from the count of observations.
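The weight handling described above can be tried with a short step like the following. It is only a sketch: the data set SURVEY and the variables Y and W are hypothetical, and the weights include a zero, a negative, and a missing value so that each rule applies at least once.

```sas
/* Hypothetical data: W holds the weights, including 0, -1, and missing. */
data survey;
   input Y W @@;
   datalines;
10 1.5 12 2 15 0 9 -1 11 . 14 3
;

/* EXCLNPWGT drops the zero and negative weights from the calculation  */
/* of the mean and standard deviation; VARDEF=WGT divides the weighted */
/* corrected sum of squares by the sum of the weights.                 */
proc standard data=survey mean=0 std=1 vardef=wgt exclnpwgt
              print out=survey_std;
   weight W;
   var Y;
run;
```

Removing EXCLNPWGT and rerunning shows how the reported frequency changes when nonpositive weights are counted.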
Missing Values
By default, PROC STANDARD excludes missing values for the analysis variables
from the standardization process, and the values remain missing in the output data set.
When you specify the REPLACE option, the procedure replaces missing values with the
variable's mean or the MEAN= value.
If the value of the WEIGHT variable or the FREQ variable is missing then the
procedure does not use the observation to calculate the mean and the standard
deviation. However, the observation is standardized.
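A minimal sketch of the REPLACE behavior, using a hypothetical data set GAPS with one missing value of X. Because neither MEAN= nor STD= is specified, the nonmissing values are expected to pass through unchanged and only the missing value is filled.

```sas
/* Hypothetical data: the second value of X is missing. */
data gaps;
   input X @@;
   datalines;
4 . 6 8
;

/* REPLACE alone: the missing X is replaced with the mean of the       */
/* nonmissing values. Adding MEAN=0 would instead replace it with 0    */
/* and center the other values on 0.                                   */
proc standard data=gaps replace out=filled;
   var X;
run;

proc print data=filled;
run;
```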
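$$x_i' = \frac{S \cdot (x_i - \bar{x})}{s_x} + M$$
where
$x_i'$ is a new standardized value
$S$ is the value of STD=
$M$ is the value of MEAN=
$x_i$ is an observation's value
$\bar{x}$ is a variable's mean
$s_x$ is a variable's standard deviation.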
PROC STANDARD calculates the mean (x) and standard deviation (sx ) from the
input data set. The resulting standardized variable has a mean of M and a standard
deviation of S.
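To make the formula concrete, the sketch below computes the standardized value by hand in PROC SQL and again with PROC STANDARD, so that the two results can be compared. The data set DEMO and variable X are hypothetical; S=5 and M=75 are arbitrary choices. The hand calculation relies on the PROC SQL STD function using the n-1 divisor, which corresponds to the default VARDEF=DF.

```sas
/* Hypothetical data for a by-hand check of the formula. */
data demo;
   input X @@;
   datalines;
2 4 4 4 5 5 7 9
;

/* By hand: S * (x - mean) / std + M, with S=5 and M=75.               */
/* PROC SQL remerges the summary statistics with each detail row.      */
proc sql;
   create table by_hand as
   select X,
          5 * (X - mean(X)) / std(X) + 75 as X_std
   from demo;
quit;

/* The same standardization done by the procedure. */
proc standard data=demo mean=75 std=5 out=by_proc;
   var X;
run;
```

Printing BY_HAND and BY_PROC side by side should show matching values up to rounding.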
If the data are normally distributed, standardizing is also studentizing since the
resulting data have a Student's t distribution.
PRINT procedure
This example
- standardizes two variables to a mean of 75 and a standard deviation of 5
- specifies the output data set
- combines standardized variables with original variables
- prints the output data set.
Program
Set the SAS system options. The NODATE option specifies to omit the date and time when
the SAS job began. The PAGENO= option specifies the page number for the next page of output
that SAS produces. The LINESIZE= option specifies the line size. The PAGESIZE= option
specifies the number of lines for a page of SAS output.
options nodate pageno=1 linesize=80 pagesize=60;
Create the SCORE data set. This data set contains test scores for students who took two tests
and a final exam. The FORMAT statement assigns the Zw.d format to StudentNumber. This
format pads right-justified output with 0s instead of blanks. The LENGTH statement specifies
the number of bytes to use to store values of Student.
data score;
length Student $ 9;
input Student $ StudentNumber Section $
Test1 Test2 Final @@;
format studentnumber z4.;
datalines;
Capalleti 0545 1 94 91 87 Dubose    1252 2 51 65 91
Engles    1167 1 95 97 97 Grant     1230 2 63 75 80
Krupski   2527 2 80 69 71 Lundsford 4860 1 92 40 86
McBane    0674 1 75 78 72 Mullen    6445 2 89 82 93
Nguyen    0886 1 79 76 80 Patel     9164 2 71 77 83
Si        4915 1 75 71 73 Tanaka    8534 2 87 73 76
;
Generate the standardized data and create the output data set STNDTEST. PROC
STANDARD uses a mean of 75 and a standard deviation of 5 to standardize the values. OUT=
identies STNDTEST as the data set to contain the standardized values.
proc standard data=score mean=75 std=5 out=stndtest;
Specify the variables to standardize. The VAR statement specifies the variables to
standardize and their order in the output.
var test1 test2;
run;
Create a data set that combines the original values with the standardized values.
PROC SQL joins SCORE and STNDTEST to create the COMBINED data set (table) that
contains standardized and original test scores for each student. Using AS to rename the
standardized variables NEW.TEST1 to StdTest1 and NEW.TEST2 to StdTest2 makes the
variable names unique.
proc sql;
create table combined as
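The listing above stops after CREATE TABLE COMBINED AS. The following is a sketch of how the complete step might read, not a verbatim recovery of the missing lines. It assumes the table aliases OLD (for SCORE) and NEW (for STNDTEST) implied by the description, a join on Student, and the column order that appears in the printed output.

```sas
/* Sketch only: aliases OLD and NEW, the join key, and the column      */
/* order are assumptions based on the surrounding description.         */
proc sql;
   create table combined as
   select old.student, old.studentnumber, old.section,
          old.test1, new.test1 as StdTest1,
          old.test2, new.test2 as StdTest2,
          old.final
   from score as old, stndtest as new
   where old.student=new.student;
quit;
```

Joining on Student works here because the names are unique in SCORE; StudentNumber would serve equally well as the key.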
Print the data set. PROC PRINT prints the COMBINED data set. ROUND rounds the
standardized values to two decimal places. The TITLE statement specifies a title.
proc print data=combined noobs round;
title 'Standardized Test Scores for a College Course';
run;
Output
The data set contains variables with both standardized and original values. StdTest1 and
StdTest2 store the standardized test scores that PROC STANDARD computes.
             Student                      Std              Std
Student      Number    Section   Test1   Test1    Test2   Test2    Final
Capalleti     0545        1        94    80.54      91    80.86      87
Dubose        1252        2        51    64.39      65    71.63      91
Engles        1167        1        95    80.91      97    82.99      97
Grant         1230        2        63    68.90      75    75.18      80
Krupski       2527        2        80    75.28      69    73.05      71
Lundsford     4860        1        92    79.79      40    62.75      86
McBane        0674        1        75    73.40      78    76.24      72
Mullen        6445        2        89    78.66      82    77.66      93
Nguyen        0886        1        79    74.91      76    75.53      80
Patel         9164        2        71    71.90      77    75.89      83
Si            4915        1        75    73.40      71    73.76      73
Tanaka        8534        2        87    77.91      73    74.47      76
Other features:
FORMAT procedure
PRINT procedure
SORT procedure
This example
- calculates Z scores separately for each BY group using a mean of 0 and standard
  deviation of 1
- replaces missing values with the given mean
- prints the mean and standard deviation for the variables to standardize
- prints the output data set.
Program
Set the SAS system options. The NODATE option specifies to omit the date and time when
the SAS job began. The PAGENO= option specifies the page number for the next page of output
that SAS produces. The LINESIZE= option specifies the line size. The PAGESIZE= option
specifies the number of lines for a page of SAS output.
options nodate pageno=1 linesize=80 pagesize=60;
Assign a character string format to a numeric value. PROC FORMAT creates the format
POPFMT to identify birth rates with a character value.
proc format;
value popfmt 1='Stable'
2='Rapid';
run;
Create the LIFEXP data set. Each observation in this data set contains information on 1950
and 1993 life expectancies at birth for 16 nations.* The birth rate for each nation is classified as
stable (1) or rapid (2). The nations with missing data obtained independent status after 1950.
data lifexp;
input PopulationRate Country $char14. Life50 Life93 @@;
label life50='1950 life expectancy'
life93='1993 life expectancy';
datalines;
2 Bangladesh      . 53 2 Brazil         51 67
2 China          41 70 2 Egypt          42 60
2 Ethiopia       33 46 1 France         67 77
1 Germany        68 75 2 India          39 59
2 Indonesia      38 59 1 Japan          64 79
2 Mozambique      . 47 2 Philippines    48 64
1 Russia          . 65 2 Turkey         44 66
1 United Kingdom 69 76 1 United States  69 75
;
* Data are from Vital Signs 1994: The Trends That Are Shaping Our Future, Lester R. Brown, Hal Kane, and David Malin
Roodman, eds. Copyright 1994 by Worldwatch Institute. Reprinted by permission of W.W. Norton & Company, Inc.
Sort the LIFEXP data set. PROC SORT sorts the observations by the birth rate.
proc sort data=lifexp;
by populationrate;
run;
Generate the standardized data for all numeric variables and create the output data
set ZSCORE. PROC STANDARD standardizes all numeric variables to a mean of 0 and a
standard deviation of 1. REPLACE replaces missing values. PRINT prints statistics.
proc standard data=lifexp mean=0 std=1 replace
print out=zscore;
Create the standardized values for each BY group. The BY statement standardizes the
values separately by birth rate.
by populationrate;
Assign a format to a variable and specify a title for the report. The FORMAT statement
assigns a format to PopulationRate. The output data set contains formatted values. The TITLE
statement specifies a title.
format populationrate popfmt.;
title1 'Life Expectancies by Birth Rate';
run;
Print the data set. PROC PRINT prints the ZSCORE data set with the standardized values.
The TITLE statements specify two titles to print.
proc print data=zscore noobs;
title 'Standardized Life Expectancies at Birth';
title2 "by a Country's Birth Rate";
run;
Output
PROC STANDARD prints the variable name, mean, standard deviation, input frequency, and
label of each variable to standardize for each BY group.
Life expectancies for Bangladesh, Mozambique, and Russia are no longer missing. The missing
values are replaced with the given mean (0).
PopulationRate=Stable
Name      Mean         Standard Deviation     N    Label
Life50    67.400000    1.854724               5    1950 life expectancy
Life93    74.500000    4.888763               6    1993 life expectancy
PopulationRate=Rapid
Name      Mean         Standard Deviation     N    Label
Life50    42.000000    5.033223               8    1950 life expectancy
Life93    59.100000    8.225300              10    1993 life expectancy
Standardized Life Expectancies at Birth                2
by a Country's Birth Rate
Population
Rate          Country             Life50      Life93
Stable        France             -0.21567     0.51138
Stable        Germany             0.32350     0.10228
Stable        Japan              -1.83316     0.92048
Stable        Russia              0.00000    -1.94323
Stable        United Kingdom      0.86266     0.30683
Stable        United States       0.86266     0.10228
Rapid         Bangladesh          0.00000    -0.74161
Rapid         Brazil              1.78812     0.96045
Rapid         China              -0.19868     1.32518
Rapid         Egypt               0.00000     0.10942
Rapid         Ethiopia           -1.78812    -1.59265
Rapid         India              -0.59604    -0.01216
Rapid         Indonesia          -0.79472    -0.01216
Rapid         Mozambique          0.00000    -1.47107
Rapid         Philippines         1.19208     0.59572
Rapid         Turkey              0.39736     0.83888
CHAPTER
46
The SUMMARY Procedure
Overview: SUMMARY Procedure 1177
Syntax: SUMMARY Procedure 1177
PROC SUMMARY Statement 1178
VAR Statement 1178
VAR Statement
Identifies the analysis variables and their order in the results.
Default: If you omit the VAR statement, then PROC SUMMARY produces a simple
count of observations, whereas PROC MEANS tries to analyze all the numeric variables
that are not listed in the other statements.
Interaction: If you specify statistics on the PROC SUMMARY statement and the VAR
statement is omitted, then PROC SUMMARY stops processing and an error message is
written to the SAS log.
CHAPTER
47
The TABULATE Procedure
Overview: TABULATE Procedure 1180
What Does the TABULATE Procedure Do? 1180
Simple Tables 1180
Complex Tables 1181
PROC TABULATE and the Output Delivery System 1182
Terminology: TABULATE Procedure 1183
Syntax: TABULATE Procedure 1186
PROC TABULATE Statement 1187
BY Statement 1196
CLASS Statement 1197
CLASSLEV Statement 1200
FREQ Statement 1201
KEYLABEL Statement 1202
KEYWORD Statement 1202
TABLE Statement 1203
VAR Statement 1211
WEIGHT Statement 1212
Concepts: TABULATE Procedure 1213
Statistics That Are Available in PROC TABULATE 1213
Formatting Class Variables 1214
Formatting Values in Tables 1215
How Using BY-Group Processing Differs from Using the Page Dimension 1215
Calculating Percentages 1216
Calculating the Percentage of the Value of in a Single Table Cell 1216
Using PCTN and PCTSUM 1217
Specifying a Denominator for the PCTN Statistic 1217
Specifying a Denominator for the PCTSUM Statistic 1218
Using Style Elements in PROC TABULATE 1220
What Are Style Elements? 1220
Using the STYLE= Option 1220
Applying Style Attributes to Table Cells 1221
Using a Format to Assign a Style Attribute 1221
Results: TABULATE Procedure 1222
Missing Values 1222
How PROC TABULATE Treats Missing Values 1222
No Missing Values 1223
A Missing Class Variable 1224
Including Observations with Missing Class Variables 1225
Formatting Headings for Observations with Missing Class Variables 1226
Providing Headings for All Categories 1227
Providing Text for Cells That Contain Missing Values 1228
Simple Tables
Output 47.1 shows a simple table that was produced by PROC TABULATE. The data
set ENERGY on page 1387 contains data on expenditures of energy by two types of
customers, residential and business, in individual states in the Northeast (1) and West
(4) regions of the United States. The table sums expenditures for states within a
geographic division. (The RTS option provides enough space to display the column
headers without hyphenating them.)
options nodate pageno=1 linesize=64
pagesize=40;
proc tabulate data=energy;
class region division type;
var expenditures;
table region*division, type*expenditures /
rts=20;
run;
Output 47.1
----------------------------------------------
|                  |          Type           |
|                  |-------------------------|
|                  |     1      |     2      |
|                  |------------+------------|
|                  |Expenditures|Expenditures|
|                  |------------+------------|
|                  |    Sum     |    Sum     |
|------------------+------------+------------|
|Region  |Division |            |            |
|--------+---------|            |            |
|1       |1        |     7477.00|     5129.00|
|        |---------+------------+------------|
|        |2        |    19379.00|    15078.00|
|--------+---------+------------+------------|
|4       |3        |     5476.00|     4729.00|
|        |---------+------------+------------|
|        |4        |    13959.00|    12619.00|
----------------------------------------------
Complex Tables
Output 47.2 is a more complicated table using the same data set that was used to
create Output 47.1. The statements that create this report
3
3
3
3
3
For an explanation of the program that produces this report, see Example 6 on page
1246.
Output 47.2
----------------------------------------------------------------
|                       |      Customer Base      |            |
|                       |-------------------------|            |
|                       |Residential |  Business  |    All     |
|                       | Customers  | Customers  | Customers  |
|-----------------------+------------+------------+------------|
|Region     |Division   |            |            |            |
|-----------+-----------|            |            |            |
|Northeast  |New England|       7,477|       5,129|      12,606|
|           |-----------+------------+------------+------------|
|           |Middle     |            |            |            |
|           |Atlantic   |      19,379|      15,078|      34,457|
|           |-----------+------------+------------+------------|
|           |Subtotal   |      26,856|      20,207|      47,063|
|-----------+-----------+------------+------------+------------|
|West       |Division   |            |            |            |
|           |-----------|            |            |            |
|           |Mountain   |       5,476|       4,729|      10,205|
|           |-----------+------------+------------+------------|
|           |Pacific    |      13,959|      12,619|      26,578|
|           |-----------+------------+------------+------------|
|           |Subtotal   |      19,435|      17,348|      36,783|
|-----------------------+------------+------------+------------|
|Total for All Regions  |     $46,291|     $37,555|     $83,846|
----------------------------------------------------------------
Display 47.1
Figure 47.1  Parts of a PROC TABULATE table: column headings, column, row, row headings, and cell
Figure 47.2  A three-dimensional table: page dimension (Year: 2000, Year: 2001, Year: 2002), column dimension, and row dimension
Region       Division           Type
Northeast    New England        Residential Customers
Northeast    New England        Business Customers
Northeast    Middle Atlantic    Residential Customers
Northeast    Middle Atlantic    Business Customers
West         Mountain           Residential Customers
West         Mountain           Business Customers
West         Pacific            Residential Customers
West         Pacific            Business Customers
continuation message
the text that appears below the table if it spans multiple physical pages.
nested variable
a variable whose values appear in the table with each value of another variable.
In Figure 47.1 on page 1184, Division is nested under Region.
page dimension text
the text that appears above the table if the table has a page dimension. However,
if you specify BOX=_PAGE_ in the TABLE statement, then the text that would
appear above the table appears in the box. In Figure 47.2 on page 1185, the word
Year:, followed by the value, is the page dimension text.
Page dimension text has a style. The default style is Beforecaption. For more
information about using styles, see STYLE= on page 1194 in the PROC
TABULATE statement and Output Delivery System on page 32.
subtable
the group of cells that is produced by crossing a single element from each
dimension of the TABLE statement when one or more dimensions contain
concatenated elements.
Figure 47.1 on page 1184 contains no subtables. For an illustration of a table
that is composed of multiple subtables, see Figure 47.18 on page 1274.
To do this
BY
CLASS
CLASSLEV
FREQ
KEYLABEL
KEYWORD
TABLE
VAR
WEIGHT
To do this
CONTENTS=
DATA=
OUT=
THREADS | NOTHREADS
TRAP
CLASSDATA=
EXCLUSIVE
MISSING
ALPHA=
To do this
EXCLNPWGTS
QMARKERS=
QMETHOD=
QNTLDEF=
VARDEF=
FORMAT=
FORMCHAR=
NOSEPS
ORDER=
STYLE=
Options
ALPHA=value
specifies the confidence level to compute the confidence limits for the mean. The
percentage for the confidence limits is $(1 - \text{value}) \times 100$. For example, ALPHA=.05
results in a 95% confidence limit.
Default: .05
Range: between 0 and 1
UCLM.
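As a sketch of ALPHA= in a complete step, the example below requests 90% confidence limits for the mean through the LCLM and UCLM statistic keywords. The data set ENERGY_DEMO and its values are hypothetical stand-ins, not the ENERGY data set from this guide.

```sas
/* Hypothetical data: a few expenditures per region. */
data energy_demo;
   input Region Expenditures @@;
   datalines;
1 100 1 140 1 120 1 160 4 200 4 260 4 230 4 210
;

/* ALPHA=0.1 gives (1 - 0.1) * 100 = 90% confidence limits. */
proc tabulate data=energy_demo alpha=0.1;
   class Region;
   var Expenditures;
   table Region, Expenditures*(mean lclm uclm);
run;
```

Omitting ALPHA= leaves the default of .05, so the same table would then show 95% limits.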
CLASSDATA=SAS-data-set
species a data set that contains the combinations of values of the class variables
that must be present in the output. Any combinations of values of the class variables
that occur in the CLASSDATA= data set but not in the input data set appear in each
table or output data set and have a frequency of zero.
Restriction: The CLASSDATA= data set must contain all class variables. Their
data type and format must match the corresponding class variables in the input
data set.
Interaction: If you use the EXCLUSIVE option, then PROC TABULATE excludes
any observations in the input data set whose combinations of values of class
variables are not in the CLASSDATA= data set.
Tip: Use the CLASSDATA= data set to filter or supplement the input data set.
Featured in:
CONTENTS=link-name
enables you to name the link in the HTML table of contents that points to the ODS
output of the first table that was produced by using the TABULATE procedure.
Note: CONTENTS= affects only the contents file of ODS HTML output. It has no
effect on the actual TABULATE procedure reports.
DATA=SAS-data-set
EXCLNPWGTS
excludes observations with nonpositive weight values (zero or negative) from the
analysis. By default, PROC TABULATE treats observations with negative weights
like those with zero weights and counts them in the total number of observations.
Alias: EXCLNPWGT
See also: WEIGHT= on page 1212 and WEIGHT Statement on page 1212
EXCLUSIVE
excludes from the tables and the output data sets all combinations of the class
variable that are not found in the CLASSDATA= data set.
Requirement:
ignored.
Featured in:
FORMAT=format-name
specifies a default format for the value in each table cell. You can use any SAS or
user-defined format.
Alias: F=
Default: If you omit FORMAT=, then PROC TABULATE uses BEST12.2 as the
default format.
Interaction: Formats that are specified in a TABLE statement override the format
that is specified with FORMAT=.
Tip:
Featured in:
FORMCHAR <(position(s))>=formatting-character(s)
defines the characters to use for constructing the table outlines and dividers.
position(s)
identifies the position of one or more characters in the SAS formatting-character
string. A space or a comma separates the positions.
Default: Omitting position(s) is the same as specifying all 20 possible SAS
formatting characters, in order.
Range: PROC TABULATE uses 11 of the 20 formatting characters that SAS
provides. Table 47.2 on page 1190 shows the formatting characters that PROC
TABULATE uses. Figure 47.3 on page 1191 illustrates the use of each
formatting character in the output from PROC TABULATE.
formatting-character(s)
lists the characters to use for the specified positions. PROC TABULATE assigns
characters in formatting-character(s) to position(s), in the order that they are
listed. For example, the following option assigns the asterisk (*) to the third
formatting character, the pound sign (#) to the seventh character, and does not
alter the remaining characters:
formchar(3,7)='*#'
Interaction: The SAS system option FORMCHAR= specifies the default formatting
characters. The system option defines the entire string of formatting characters.
The FORMCHAR= option in a procedure can redefine selected characters.
Restriction: The FORMCHAR= option affects only the traditional SAS monospace
output destination.
Tip: You can use any character in formatting-characters, including hexadecimal
characters. If you use hexadecimal characters, then you must put an x after the
closing quotation mark. For instance, the following option assigns the hexadecimal
character 2D to the third formatting character, assigns the hexadecimal character
7C to the seventh character, and does not alter the remaining characters:
formchar(3,7)='2D7C'x
Tip: Specifying all blanks for formatting-character(s) produces tables with no
outlines or dividers:
formchar(1,2,3,4,5,6,7,8,9,10,11)='           ' (11 blanks)
See also: For more information about formatting output, see Chapter 5,
Table 47.2
Position
Default
Used to draw
10
11
Figure 47.3
------------------------------------
|                       |  Expend  |
|                       |----------|
|                       |   Sum    |
|-----------------------+----------|
|Region     |Division   |          |
|-----------+-----------|          |
|Northeast  |New England|   $12,606|
|           |-----------+----------|
|           |Middle     |          |
|           |Atlantic   |   $34,457|
|-----------+-----------+----------|
|West       |Mountain   |   $10,205|
|           |-----------+----------|
|           |Pacific    |   $26,578|
------------------------------------
MISSING
considers missing values as valid values to create the combinations of class variables.
Special missing values that are used to represent numeric values (the letters A
through Z and the underscore (_) character) are each considered as a separate value.
A heading for each missing value appears in the table.
Default: If you omit MISSING, then PROC TABULATE does not include
observations with a missing value for any class variable in the report.
Main Discussion:
1225
See also: SAS Language Reference: Concepts for a discussion of missing values that
NOSEPS
eliminates horizontal separator lines from the row titles and the body of the table.
Horizontal separator lines remain between nested column headers.
Restriction: The NOSEPS option affects only the traditional SAS monospace
output destination.
Tip: If you want to replace the separator lines with blanks rather than remove
them, then use the FORMCHAR= option on page 1189.
Featured in:
NOTHREADS
ORDER=DATA | FORMATTED | FREQ | UNFORMATTED
specifies the sort order to create the unique combinations of the values of the class
variables, which form the headings of the table, according to the specified order.
DATA
orders values according to their order in the input data set.
Interaction: If you use PRELOADFMT in the CLASS statement, then the order
for the values of each class variable matches the order that PROC FORMAT
uses to store the values of the associated user-defined format. If you use the
CLASSDATA= option, then PROC TABULATE uses the order of the unique
values of each class variable in the CLASSDATA= data set to order the output
levels. If you use both options, then PROC TABULATE first uses the
user-defined formats to order the output. If you omit EXCLUSIVE, then PROC
TABULATE appends after the user-defined format and the CLASSDATA=
values the unique values of the class variables in the input data set in the same
order in which they are encountered.
Tip: By default, PROC FORMAT stores a format definition in sorted order. Use
the NOTSORTED option to store the values or ranges of a user-defined format
in the order that you define them.
FORMATTED
orders values by their ascending formatted values. If no format has been assigned
to a numeric class variable, then the default format, BEST12., is used. This order
depends on your operating environment.
Alias: FMT | EXTERNAL
FREQ
orders values by descending frequency count.
Interaction: Use the ASCENDING option in the CLASS statement to order
values by ascending frequency count.
UNFORMATTED
orders values by their unformatted values, which yields the same order as PROC
SORT. This order depends on your operating environment. This sort sequence is
particularly useful for displaying dates chronologically.
Alias: UNFMT | INTERNAL
Default: UNFORMATTED
Interaction: If you use the PRELOADFMT option in the CLASS statement, then
PROC TABULATE orders the levels by the order of the values in the user-defined
format.
Featured in: Understanding the Order of Headings with ORDER=DATA on page
1230
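A short sketch of how ORDER= changes the heading order. The data set VISITS and the variable Site are hypothetical; the rows are entered so that data order, frequency order, and sorted order all differ.

```sas
/* Hypothetical data: Site appears in the order C, A, B, with B most frequent. */
data visits;
   input Site $ @@;
   datalines;
C A B B B A
;

/* ORDER=FREQ lists the most frequent level first (B, then A, then C). */
proc tabulate data=visits order=freq;
   class Site;
   table Site, n;
run;

/* ORDER=DATA keeps the order of first appearance in the input (C, A, B). */
proc tabulate data=visits order=data;
   class Site;
   table Site, n;
run;
```

With no ORDER= option, the default UNFORMATTED order would list the levels as A, B, C.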
OUT=SAS-data-set
names the output data set. If SAS-data-set does not exist, then PROC TABULATE
creates it.
The number of observations in the output data set depends on the number of
categories of data that are used in the tables and the number of subtables that are
generated. The output data set contains these variables (in this order):
by variables
variables that are listed in the BY statement.
class variables
variables that are listed in the CLASS statement.
_TYPE_
a character variable that shows which combination of class variables produced the
summary statistics in that observation. Each position in _TYPE_ represents one
variable in the CLASS statement. If that variable is in the category that produced
the statistic, then the position contains a 1; if it is not, then the position contains a
0. In simple PROC TABULATE steps that do not use the universal class variable
ALL, all values of _TYPE_ contain only 1s because the only categories that are
being considered involve all class variables. If you use the variable ALL, then your
tables will contain data for categories that do not include all the class variables,
and positions of _TYPE_ will, therefore, include both 1s and 0s.
_PAGE_
The logical page that contains the observation.
_TABLE_
The number of the table that contains the observation.
statistics
statistics that are calculated for each observation in the data set.
Featured in:
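To see the output data set variables described above, a step like the following can be run and the result printed. This is a sketch with a hypothetical data set ENERGY_DEMO; the ALL keyword is included so that _TYPE_ takes more than one value.

```sas
/* Hypothetical data with two class variables and one analysis variable. */
data energy_demo;
   input Region Type Expenditures @@;
   datalines;
1 1 100 1 2 140 4 1 200 4 2 260
;

/* OUT= writes one observation per table cell. Rows that come from ALL */
/* do not involve Type, so their _TYPE_ value contains a 0.            */
proc tabulate data=energy_demo out=energy_stats;
   class Region Type;
   var Expenditures;
   table Region, (Type all)*Expenditures*sum;
run;

proc print data=energy_stats;
run;
```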
PCTLDEF=
QMARKERS=number
specifies the default number of markers to use for the P² quantile estimation method.
The number of markers controls the size of fixed memory space.
Default: The default value depends on which quantiles you request. For the median
(P50), number is 7. For the quartiles (P25 and P75), number is 25. For the
quantiles P1, P5, P10, P90, P95, or P99, number is 105. If you request several
quantiles, then PROC TABULATE uses the largest default value of number.
Range: an odd integer greater than 3
Tip: Increase the number of markers above the default settings to improve the
accuracy of the estimates; reduce the number of markers to conserve memory and
computing time.
Main Discussion:
QMETHOD=OS|P2|HIST
specifies the method PROC TABULATE uses to process the input data when it
computes quantiles. If the number of observations is less than or equal to the
QMARKERS= value and QNTLDEF=5, then both methods produce the same results.
OS
uses order statistics. This is the technique that PROC UNIVARIATE uses.
Note: This technique can be very memory-intensive.
P2|HIST
uses the P² method to approximate the quantile.
Default: OS
Restriction: When QMETHOD=P2, PROC TABULATE does not compute weighted
quantiles.
Tip: When QMETHOD=P2, reliable estimates of some quantiles (P1, P5, P95, P99)
may not be possible for some types of data.
Main Discussion:
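The quantile options can be combined as in the sketch below, which uses a hypothetical data set ENERGY_DEMO. QMARKERS=25 is an arbitrary odd value within the documented range, and QNTLDEF= is left at its default of 5, as QMETHOD=P2 requires.

```sas
/* Hypothetical data: eight expenditures in two regions. */
data energy_demo;
   input Region Expenditures @@;
   datalines;
1 100 1 140 1 120 1 160 4 200 4 260 4 230 4 210
;

/* QMETHOD=P2 approximates the quantiles in fixed memory. Because the  */
/* number of observations here is below the QMARKERS= value, the       */
/* results are expected to match QMETHOD=OS.                           */
proc tabulate data=energy_demo qmethod=p2 qmarkers=25;
   class Region;
   var Expenditures;
   table Region, Expenditures*(q1 median q3);
run;
```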
QNTLDEF=1|2|3|4|5
specifies the mathematical definition that the procedure uses to calculate quantiles
when QMETHOD=OS is specified. When QMETHOD=P2, you must use
QNTLDEF=5.
Default: 5
Alias: PCTLDEF=
Main discussion:
STYLE=<style-element-name|PARENT>[style-attribute-name=style-attribute-value< style-attribute-name=style-attribute-value>]
specifies the style element to use for the data cells of a table when it is used in the
PROC TABULATE statement. For example, the following statement specifies that
the background color for data cells be red:
proc tabulate data=one style=[background=red];
You can use braces ({ and }) instead of square brackets ([ and ]).
style-element-name
is the name of a style element that is part of a style definition that is registered
with the Output Delivery System. SAS provides some style definitions. You can
create your own style definitions with PROC TEMPLATE.
Default: If you do not specify a style element, then PROC TABULATE uses Data.
See also: See SAS Output Delivery System: User's Guide for information about
PROC TEMPLATE and the default style definitions.
PARENT
specifies that the data cell use the style element of its parent heading. The parent
style element of a data cell is one of the following:
- the style element of the leaf heading above the column that contains the data
  cell, if the table specifies no row dimension, or if the table specifies the style
  element in the column dimension expression.
- the style element of the leaf heading above the row that contains the cell, if
  the table specifies the style element in the row dimension expression.
- the Beforecaption style element, if the table specifies the style element in the
  page dimension expression.
- undefined, otherwise.
Note: The parent of a heading (not applicable to STYLE= in the PROC
TABULATE statement) is the heading under which the current heading is
nested.
style-attribute-name
specifies the attribute to change. The following table shows attributes that you can
set or change with the STYLE= option in the PROC TABULATE statement (or in
any other statement that uses STYLE=, except for the TABLE statement). Note
that not all attributes are valid in all destinations.
ASIS=
FONT_WIDTH=
BACKGROUND=
HREFTARGET=
BACKGROUNDIMAGE=
HTMLCLASS=
BORDERCOLOR=
JUST=
BORDERCOLORDARK=
NOBREAKSPACE=
BORDERCOLORLIGHT=
POSTHTML=
BORDERWIDTH=
POSTIMAGE=
CELLHEIGHT=
POSTTEXT=
CELLWIDTH=
PREHTML=
FLYOVER=
PREIMAGE=
FONT=
PRETEXT=
FONT_FACE=
PROTECTSPECIALCHARS=
FONT_SIZE=
TAGATTR=
FONT_STYLE=
URL=
FONT_WEIGHT=
VJUST=
style-attribute-value
specifies a value for the attribute. Each attribute has a different set of valid
values. See SAS Output Delivery System: User's Guide for more information about
these style attributes, their valid values, and their applicable destinations.
Alias: S=
Restriction: This option affects only the HTML, RTF, and Printer destinations.
Tip: To specify a style element for data cells with missing values, use STYLE= in
the TABLE statement MISSTEXT= option.
THREADS | NOTHREADS
enables or disables parallel processing of the input data set. This option overrides
the SAS system option THREADS | NOTHREADS. See SAS Language Reference:
Concepts for more information about parallel processing.
Default: value of SAS system option THREADS | NOTHREADS.
Interaction: PROC TABULATE uses the value of the SAS system option THREADS
except when a BY statement is specified or the value of the SAS system option
CPUCOUNT is equal to 1. In those cases, you can use THREADS in the PROC
TABULATE statement to force PROC TABULATE to use parallel processing.
TRAP | NOTRAP
enables or disables floating-point exception (FPE) recovery during data processing
beyond that provided by normal SAS FPE handling, which terminates PROC
TABULATE in the case of math exceptions. Note that with NOTRAP, normal SAS
FPE handling is still in effect so that PROC TABULATE terminates in the case of
math exceptions.
Default: NOTRAP
VARDEF=divisor
specifies the divisor to use in the calculation of the variance and standard deviation.
Table 47.3 on page 1195 shows the possible values for divisor and the associated
divisors.
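Table 47.3
Value          Divisor                    Formula for Divisor
DF             degrees of freedom         $n - 1$
N              number of observations     $n$
WDF            sum of weights minus one   $\left(\sum_i w_i\right) - 1$
WEIGHT | WGT   sum of weights             $\sum_i w_i$
The procedure computes the variance as $\mathrm{CSS}/\mathit{divisor}$, where CSS is the
corrected sum of squares and equals $\sum_i (x_i - \bar{x})^2$. When you weight the
analysis variables, CSS equals $\sum_i w_i (x_i - \bar{x}_w)^2$, where $\bar{x}_w$ is the
weighted mean.
Default: DF
Requirement: To compute the standard error of the mean or Student's t-test, use
the default value of VARDEF=.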
Tip: When you use the WEIGHT statement and VARDEF=DF, the variance is an
estimate of $\sigma^2$, where the variance of the ith observation is
$\mathrm{var}(x_i) = \sigma^2 / w_i$, and $w_i$ is the weight for the ith observation.
This yields an estimate of the variance of an observation with unit weight.
Tip: When you use the WEIGHT statement and VARDEF=WGT, the computed
variance is asymptotically (for large n) an estimate of $\sigma^2 / \bar{w}$, where
$\bar{w}$ is the average weight. This yields an asymptotic estimate of the variance of an
observation with average weight.
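As a rough illustration of how the divisor is chosen, the following sketch requests the variance and standard deviation with VARDEF=N, so that the divisor is the number of observations rather than the degrees of freedom. It assumes the SASHELP.CLASS sample data set that is shipped with SAS; the choice of Height as the analysis variable is arbitrary.

```sas
/* Sketch only: SASHELP.CLASS and the variable Height are assumptions */
/* made for illustration; substitute your own data set and variable.  */
proc tabulate data=sashelp.class vardef=n;
   class sex;
   var height;
   /* VAR and STD are computed with divisor n because VARDEF=N */
   table sex, height*(n var std);
run;
```

Rerunning the step without VARDEF=N (that is, with the default DF) should give slightly larger variances, because the divisor is then n - 1.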
BY Statement
Creates a separate table on a separate page for each BY group.
Main discussion: BY on page 58
Required Arguments
variable
specifies the variable that the procedure uses to form BY groups. You can specify
more than one variable. If you do not use the NOTSORTED option in the BY
statement, then the observations in the data set must either be sorted by all the
variables that you specify, or they must be indexed appropriately. Variables in a BY
statement are called BY variables.
Options
DESCENDING
specifies that the observations are sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The observations are grouped in another way, for example, chronological order.
The requirement for ordering or indexing observations according to the values of
BY variables is suspended for BY-group processing when you use the NOTSORTED
option. In fact, the procedure does not use an index if you specify NOTSORTED. The
procedure defines a BY group as a set of contiguous observations that have the same
values for all BY variables. If observations with the same values for the BY variables
are not contiguous, then the procedure treats each contiguous set as a separate BY
group.
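The following sketch shows the usual pattern for BY-group processing: sort the data by the BY variable, then name the same variable in the BY statement. The data set SASHELP.CLASS and the output data set name WORK.CLASS_BY_SEX are assumptions chosen for illustration.

```sas
/* Sketch only: data set and variable names are illustrative assumptions. */
proc sort data=sashelp.class out=work.class_by_sex;
   by sex;
run;

proc tabulate data=work.class_by_sex;
   by sex;                 /* one table per value of Sex */
   class age;
   var height;
   table age, height*(n mean);
run;
```

If the data were already grouped by Sex but not sorted in alphabetic order, adding NOTSORTED to the BY statement would let the step run without the PROC SORT step.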
CLASS Statement
Identifies class variables for the table. Class variables determine the categories that PROC
TABULATE uses to calculate statistics.
Tip: You can use multiple CLASS statements.
Tip: Some CLASS statement options are also available in the PROC TABULATE
statement. They affect all CLASS variables rather than just the one(s) that you specify
in a CLASS statement.
Required Arguments
variable(s)
specifies one or more variables that the procedure uses to group the data. Variables
in a CLASS statement are referred to as class variables. Class variables can be
numeric or character. Class variables can have continuous values, but they typically
have a few discrete values that define the classifications of the variable. You do not
have to sort the data by class variables.
Options
ASCENDING
specifies to sort the class variable values in ascending order.
DESCENDING
specifies to sort the class variable values in descending order.
EXCLUSIVE
excludes from tables and output data sets all combinations of class variables that are
not found in the preloaded range of user-defined formats.
Requirement: You must specify the PRELOADFMT option in the CLASS statement
to preload the class variable formats.
Featured in:
GROUPINTERNAL
specifies not to apply formats to the class variables when PROC TABULATE groups
the values to create combinations of class variables.
Interaction: If you specify the PRELOADFMT option in the CLASS statement,
then PROC TABULATE ignores the GROUPINTERNAL option and uses the
formatted values.
Interaction: If you specify the ORDER=FORMATTED option, then PROC
TABULATE ignores the GROUPINTERNAL option and uses the formatted values.
Tip: This option saves computer resources when the class variables contain discrete
numeric values.
MISSING
considers missing values as valid class variable levels. Special missing values that
represent numeric values (the letters A through Z and the underscore (_) character)
are each considered as a separate value.
Default: If you omit MISSING, then PROC TABULATE excludes the observations
with any missing CLASS variable values from tables and output data sets.
See also: SAS Language Reference: Concepts for a discussion of missing values with
special meanings.
MLF
enables PROC TABULATE to use the format label or labels for a given range or
overlapping ranges to create subgroup combinations when a multilabel format is
assigned to a class variable.
Requirement: You must use PROC FORMAT and the MULTILABEL option in the
VALUE statement to create a multilabel format.
Interaction: Using MLF with ORDER=FREQ may not produce the order that you
expect for the formatted values.
Interaction: When you specify MLF, the formatted values of the class variable
FORMAT procedure.
Featured in:
Note: When the formatted values overlap, one internal class variable value maps
to more than one class variable subgroup combination. Therefore, the sum of the N
statistics for all subgroups is greater than the number of observations in the data set
(the overall N statistic).
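A minimal sketch of the MLF option follows. The format name AGESPAN and its ranges are hypothetical, and SASHELP.CLASS is assumed as the input data set. Because the range 11-16 overlaps the two narrower ranges, each observation should be counted in more than one row.

```sas
/* Sketch only: the format AGESPAN and its ranges are hypothetical. */
proc format;
   value agespan (multilabel)
      11-13 = '11 to 13'
      14-16 = '14 to 16'
      11-16 = 'All ages';
run;

proc tabulate data=sashelp.class;
   class age / mlf;        /* use every matching label, not only the first */
   format age agespan.;
   table age, n;
run;
```

As the note above describes, the N values in the rows are expected to add up to more than the number of observations in the data set.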
ORDER=DATA | FORMATTED | FREQ | UNFORMATTED
specifies the order to group the levels of the class variables in the output, where
DATA
orders values according to their order in the input data set.
Interaction: If you use PRELOADFMT, then the order for the values of each class
variable matches the order that PROC FORMAT uses to store the values of the
associated user-defined format. If you use the CLASSDATA= option in the
PROC statement, then PROC TABULATE uses the order of the unique values of
each class variable in the CLASSDATA= data set to order the output levels. If
you use both options, then PROC TABULATE first uses the user-defined
formats to order the output. If you omit EXCLUSIVE in the PROC statement,
then PROC TABULATE places, in the order in which they are encountered, the
unique values of the class variables that are in the input data set after the
user-defined format and the CLASSDATA= values.
Tip: By default, PROC FORMAT stores a format definition in sorted order. Use
FREQ
orders values by descending frequency count.
Interaction: Use the ASCENDING option to order values by ascending frequency
count.
UNFORMATTED
orders values by their unformatted values, which yields the same order as PROC
SORT. This order depends on your operating environment. This sort sequence is
particularly useful for displaying dates chronologically.
Alias: UNFMT | INTERNAL
Default: UNFORMATTED
Interaction: If you use the PRELOADFMT option in the CLASS statement, then
PROC TABULATE orders the levels by the order of the values in the user-defined
format.
Tip: By default, all orders except FREQ are ascending. For descending orders, use
the DESCENDING option.
Featured in:
1230
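The effect of ORDER= can be sketched with two otherwise identical steps. SASHELP.CLASS is assumed as the input data set; the first step uses the default order (UNFORMATTED), and the second orders the levels of Age by descending frequency count.

```sas
/* Sketch only: SASHELP.CLASS is an assumed sample data set. */
proc tabulate data=sashelp.class;
   class age;                    /* default: ORDER=UNFORMATTED */
   table age, n;
run;

proc tabulate data=sashelp.class;
   class age / order=freq;       /* most frequent age first */
   table age, n;
run;
```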
PRELOADFMT
specifies that all formats are preloaded for the class variables.
Requirement: PRELOADFMT has no effect unless you specify EXCLUSIVE,
ORDER=DATA, or PRINTMISS and you assign formats to the class variables.
class variable values present in the input data set, use the EXCLUSIVE option in
the CLASS statement.
Interaction: To include all ranges and values of the user-defined formats in the
STYLE=<style-element-name|PARENT>[style-attribute-name=style-attribute-value< style-attribute-name=style-attribute-value>]
specifies the style element to use for page dimension text and class variable name
headings. For information about the arguments of this option, and how it is used, see
STYLE= on page 1194 in the PROC TABULATE statement.
Note: When you use STYLE= in the CLASS statement, it differs slightly from its
use in the PROC TABULATE statement. In the CLASS statement, the parent of the
heading is the page dimension text or heading under which the current heading is
nested.
Note: If a page dimension expression contains multiple nested elements, then the
Beforecaption style element is the style element of the first element in the nesting.
Alias: S=
Restriction: This option affects only the HTML, RTF, and Printer destinations.
Tip: To override a style element that is specified for page dimension text in the
CLASS statement, you can specify a style element in the TABLE statement page
dimension expression.
Tip: To override a style element that is specified for a class variable name heading
in the CLASS statement, you can specify a style element in the related TABLE
statement dimension expression.
Featured in:
CLASSLEV Statement
Specifies a style element for class variable level value headings.
Restriction: This statement affects only the HTML, RTF, and Printer destinations.
Required Arguments
variable(s)
specifies one or more class variables from the CLASS statement for which you want
to specify a style element.
Options
STYLE=<style-element-name|PARENT>[style-attribute-name=style-attribute-value< style-attribute-name=style-attribute-value>]
specifies a style element for class variable level value headings. For information on
the arguments of this option and how it is used, see STYLE= on page 1194 in the
PROC TABULATE statement.
Note: When you use STYLE= in the CLASSLEV statement, it differs slightly
from its use in the PROC TABULATE statement. In the CLASSLEV statement, the
parent of the heading is the heading under which the current heading is nested.
Alias: S=
Restriction: This option affects only the HTML, RTF, and Printer destinations.
Tip:
Featured in:
FREQ Statement
Specifies a numeric variable that contains the frequency of each observation.
Tip: The effects of the FREQ and WEIGHT statements are similar except when
calculating degrees of freedom.
See also: For an example that uses the FREQ statement, see FREQ on page 61.
FREQ variable;
Required Arguments
variable
specifies a numeric variable whose value represents the frequency of the observation.
If you use the FREQ statement, then the procedure assumes that each observation
represents n observations, where n is the value of variable. If n is not an integer,
then SAS truncates it. If n is less than 1 or is missing, then the procedure does not
use that observation to calculate statistics.
The sum of the frequency variable represents the total number of observations.
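The following sketch shows the FREQ statement applied to pre-summarized data. The data set WORK.COUNTS and its variables (Grade, Score, Count) are hypothetical and exist only for this illustration; each input row stands for Count identical observations.

```sas
/* Sketch only: WORK.COUNTS and its variables are hypothetical. */
data work.counts;
   input grade $ score count;
   datalines;
A 90 3
A 95 2
B 80 4
B 85 1
;
run;

proc tabulate data=work.counts;
   class grade;
   var score;
   freq count;             /* each row represents COUNT observations */
   table grade, score*(n mean);
run;
```

With the FREQ statement in effect, N should reflect the sum of Count within each grade rather than the number of input rows.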
KEYLABEL Statement
Labels a keyword for the duration of the PROC TABULATE step. PROC TABULATE uses the label
anywhere that the specified keyword would otherwise appear.
KEYLABEL keyword-1='description-1'
<keyword-n='description-n'>;
Required Arguments
keyword
is one of the keywords for statistics that is discussed in "Statistics That Are Available
in PROC TABULATE" on page 1213 or is the universal class variable ALL (see
"Elements That You Can Use in a Dimension Expression" on page 1208).
description
is up to 256 characters to use as a label. As the syntax shows, you must enclose
description in quotation marks.
Restriction: Each keyword can have only one label in a particular PROC
TABULATE step; if you request multiple labels for the same keyword, then PROC
TABULATE uses the last one that is specified in the step.
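A short sketch of the KEYLABEL statement follows. It assumes the SASHELP.CLASS sample data set; the label text is arbitrary. The labels replace the keywords MEAN and ALL wherever they would appear as headings in the step.

```sas
/* Sketch only: data set and label text are illustrative assumptions. */
proc tabulate data=sashelp.class;
   class sex;
   var height;
   keylabel mean='Average'
            all='Both sexes';
   table sex all, height*mean;
run;
```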
KEYWORD Statement
Specifies a style element for keyword headings.
Restriction: This statement affects only the HTML, RTF, and Printer output.
Required Arguments
keyword
is one of the keywords for statistics that is discussed in "Statistics That Are Available
in PROC TABULATE" on page 1213 or is the universal class variable ALL (see
"Elements That You Can Use in a Dimension Expression" on page 1208).
Options
STYLE=<style-element-name|PARENT>[style-attribute-name=style-attribute-value< style-attribute-name=style-attribute-value>]
specifies a style element for the keyword headings. For information on the
arguments of this option and how it is used, see STYLE= on page 1194 in the PROC
TABULATE statement.
Note: When you use STYLE= in the KEYWORD statement, it differs slightly
from its use in the PROC TABULATE statement. In the KEYWORD statement, the
parent of the heading is the heading under which the current heading is nested.
Alias: S=
Restriction: This option affects only the HTML, RTF, and Printer destinations.
Tip: To override a style element that is specified in the KEYWORD statement, you
can specify a style element in the related TABLE statement dimension expression.
Featured in: Example 14 on page 1279
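The STYLE= option in the CLASS, CLASSLEV, VAR, and KEYWORD statements can be sketched together in one step. The example assumes the SASHELP.CLASS sample data set and writes to a hypothetical file named tabulate_styles.html; the particular attribute values are arbitrary. Because these options affect only the HTML, RTF, and Printer destinations, the step is wrapped in ODS HTML statements.

```sas
/* Sketch only: the output file name and attribute values are assumptions. */
ods html file='tabulate_styles.html';

proc tabulate data=sashelp.class;
   class sex       / style=[font_weight=bold];    /* class variable name heading */
   classlev sex    / style=[font_style=italic];   /* class level value headings  */
   var height      / style=[background=yellow];   /* analysis variable heading   */
   keyword mean    / style=[font_style=italic];   /* keyword heading             */
   table sex, height*mean;
run;

ods html close;
```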
TABLE Statement
Describes a table to print.
Requirement: All variables in the TABLE statement must appear in either the VAR
statement or the CLASS statement.
Tip: Use multiple TABLE statements to create several tables.
Required Arguments
column-expression
Options
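To do this                                                      Use this option
Add dimensions
   Define the pages in a table                                  page-expression
   Define the rows in a table                                   row-expression
Name the link in the HTML table of contents                     CONTENTS=
Specify which dimension's format applies to table cells         FORMAT_PRECEDENCE=
Specify a style element for the table                           STYLE=
Specify which dimension's style applies to table cells          STYLE_PRECEDENCE=
Specify text and a style element for the box above row titles   BOX=
Supply text for table cells that contain missing values         MISSTEXT=
Suppress the continuation message for tables that span
multiple physical pages                                         NOCONTINUED
Print as many complete logical pages as possible on a
single printed page                                             CONDENSE
Create the same row and column headings for all logical pages
of the table                                                    PRINTMISS
Specify the number of spaces to indent nested row headings      INDENT=
Control the allotment of space to blank row titles              ROW=
Specify the number of print positions for row headings          RTSPACE=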
BOX=value
BOX={<label=value>
<STYLE=<style-element-name>[style-attribute-name=style-attribute-value<
style-attribute-name=style-attribute-value>]>}
specifies text and a style element for the empty box above the row titles.
Value can be one of the following:
_PAGE_
writes the page-dimension text in the box. If the page-dimension text does not fit,
then it is placed in its default position above the box, and the box remains empty.
string
writes the quoted string in the box. Any string that does not fit in the box is
truncated.
variable
writes the name (or label, if the variable has one) of a variable in the box. Any
name or label that does not fit in the box is truncated.
For details about the arguments of the STYLE= option and how it is used, see
STYLE= on page 1194 in the PROC TABULATE statement.
Featured in:
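The two most common forms of BOX= can be sketched as follows, assuming the SASHELP.CLASS sample data set. The first table writes a quoted string in the box; the second uses _PAGE_ to move the page-dimension text into the box. The box text and the RTS= width are arbitrary choices.

```sas
/* Sketch only: data set, box text, and RTS= value are assumptions. */
proc tabulate data=sashelp.class;
   class sex age;
   var height;

   /* quoted string in the box above the row titles */
   table age, height*mean / box='Mean height' rts=20;

   /* page-dimension text (one logical page per value of Sex) in the box */
   table sex, age, height*mean / box=_page_ rts=20;
run;
```

If the text does not fit in the box, widening the row title space (for example, with a larger RTS= value in the listing destination) may help.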
CONDENSE
prints as many complete logical pages as possible on a single printed page or, if
possible, prints multiple pages of tables that are too wide to fit on a page one below
the other on a single page, instead of on separate pages. A logical page is all the
rows and columns that fall within one of the following:
• a page-dimension category (with no BY-group processing)
• a BY group with no page dimension
CONTENTS=link-name
enables you to name the link in the HTML table of contents that points to the ODS
output of the table that is produced by using the TABLE statement.
Note: CONTENTS= affects only the contents file of ODS HTML output. It has no
effect on the actual TABULATE procedure reports.
FORMAT_PRECEDENCE=PAGE|ROW|COLUMN|COL
specifies whether the format that is specified for the page dimension (PAGE), row
dimension (ROW), or column dimension (COLUMN or COL) is applied to the
contents of the table cells.
Default: COLUMN
FUZZ=number
supplies a numeric value against which analysis variable values and table cell values
other than frequency counts are compared to eliminate trivial values (absolute values
less than the FUZZ= value) from computation and printing. A number whose
absolute value is less than the FUZZ= value is treated as zero in computations and
printing. The default value is the smallest representable floating-point number on
the computer that you are using.
INDENT=number-of-spaces
specifies the number of spaces to indent nested row headings, and suppresses the
row headings for class variables.
Tip: When there are no crossings in the row dimension, there is nothing to indent,
so the value of number-of-spaces has no effect. However, in such cases INDENT=
still suppresses the row headings for class variables.
Restriction: In the HTML, RTF, and Printer destinations, the INDENT= option
suppresses the row headings for class variables but does not indent nested row
headings.
Featured in: Example 8 on page 1251 (with crossings) and Example 9 on page 1253
(without crossings)
MISSTEXT=text
MISSTEXT={<label= text> <STYLE=<style-element-name>
[style-attribute-name=style-attribute-value<
style-attribute-name=style-attribute-value>]>}
supplies up to 256 characters of text to print and specifies a style element for table
cells that contain missing values. For details on the arguments of the STYLE= option
and how it is used, see STYLE= on page 1194 in the PROC TABULATE statement.
Interaction: A style element that is specified in a dimension expression overrides a
style element that is specified in the MISSTEXT= option for any given cell(s).
Featured in: "Providing Text for Cells That Contain Missing Values" on page 1228
and Example 14 on page 1279
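A minimal sketch of MISSTEXT= follows. It assumes the SASHELP.CLASS sample data set and assumes that at least one combination of Age and Sex has no observations, so that the corresponding cell would otherwise print as a missing value. The replacement text is arbitrary.

```sas
/* Sketch only: data set and replacement text are assumptions. */
proc tabulate data=sashelp.class;
   class age sex;
   var height;
   /* cells with no data print 'none' instead of a period */
   table age, sex*height*mean / misstext='none';
run;
```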
NOCONTINUED
Featured in:
PRINTMISS
prints all values that occur for a class variable each time headings for that variable
are printed, even if there are no data for some of the cells that these headings create.
Consequently, PRINTMISS creates row and column headings that are the same for
all logical pages of the table, within a single BY group.
Default: If you omit PRINTMISS, then PROC TABULATE suppresses a row or
column for which there are no data, unless you use the CLASSDATA= option in
the PROC TABULATE statement.
Restrictions: If an entire logical page contains only missing values, then that page
does not print regardless of the PRINTMISS option.
See also: CLASSDATA= option on page 1188
Featured in: Providing Headings for All Categories on page 1227
ROW=spacing
specifies whether all title elements in a row crossing are allotted space even when
they are blank. The possible values for spacing are as follows:
CONSTANT
allots space to all row titles even if the title has been blanked out (for example,
N=' ').
Alias: CONST
FLOAT
divides the row title space equally among the nonblank row titles in the crossing.
Default: CONSTANT
Featured in: Example 7 on page 1249
row-expression
defines the rows in the table. For information on constructing dimension expressions,
see "Constructing Dimension Expressions" on page 1208.
Restriction: A row dimension is the next to last dimension in a TABLE statement. A
column dimension must follow a row dimension. A page dimension may precede a
row dimension.
RTSPACE=number
specifies the number of print positions to allot to all of the headings in the row
dimension, including spaces that are used to print outlining characters for the row
headings. PROC TABULATE divides this space equally among all levels of row
headings.
Alias: RTS=
Default: one-fourth of the value of the SAS system option LINESIZE=
Restriction: The RTSPACE= option affects only the traditional SAS monospace
output destination.
Interaction: By default, PROC TABULATE allots space to row titles that are blank.
Use ROW=FLOAT in the TABLE statement to divide the space among only
nonblank titles.
See also: For more information about controlling the space for row titles, see
STYLE=<style-element-name> [style-attribute-name=style-attribute-value<
style-attribute-name=style-attribute-value>]
specifies a style element to use for parts of the table other than table cells. For
information about the arguments of this option and how it is used, see STYLE= on
page 1194 in the PROC TABULATE statement.
Note: The list of attributes that you can set or change with the STYLE= option in
the TABLE statement differs from that of the PROC TABULATE statement.
The following table shows the attributes that you can set or change with the
STYLE= option in the TABLE statement. Most of these attributes apply to parts of
the table other than cells (for example, table borders and the lines between columns
and rows). Attributes that you apply in the PROC TABULATE statement and in
other locations in the PROC TABULATE step apply to cells within the table. Note
that not all attributes are valid in all destinations. See SAS Output Delivery System:
User's Guide for more information about these style attributes, their valid values,
and their applicable destinations.
BACKGROUND=           FONT_WIDTH=*
BACKGROUNDIMAGE=      FOREGROUND=*
BORDERCOLOR=          FRAME=
BORDERCOLORDARK=      HTMLCLASS=
BORDERCOLORLIGHT=     JUST=
BORDERWIDTH=          OUTPUTWIDTH=
CELLPADDING=          POSTHTML=
CELLSPACING=          POSTIMAGE=
FONT=*                POSTTEXT=
FONT_FACE=*           PREHTML=
FONT_SIZE=*           PREIMAGE=
FONT_STYLE=*          PRETEXT=
FONT_WEIGHT=*         RULES=
* When you use these attributes in this location, they affect only the text that is specified
with the PRETEXT=, POSTTEXT=, PREHTML=, and POSTHTML= attributes. To alter
the foreground color or the font for the text that appears in the table, you must set the
corresponding attribute in a location that affects the cells rather than the table.
Note: You can use braces ({ and }) instead of square brackets ([ and ]).
Alias: S=
Restriction: This option affects only the HTML, RTF, and Printer destinations.
Tip:
Featured in:
STYLE_PRECEDENCE=PAGE|ROW|COLUMN|COL
specifies whether the style that is specified for the page dimension (PAGE), row
dimension (ROW), or column dimension (COLUMN or COL) is applied to the
contents of the table cells.
Default: COLUMN
on page 1269
Note: If the input data set contains a variable named ALL, then enclose the
name of the universal class variable in quotation marks.
keywords for statistics
See "Statistics That Are Available in PROC TABULATE" on page 1213 for a list of
available statistics. Use the asterisk (*) operator to associate a statistic keyword
with a variable. The N statistic (number of nonmissing values) can be specified in
a dimension expression without associating it with a variable.
Restriction: Statistic keywords other than N must be associated with an
analysis variable.
Default: For analysis variables, the default statistic is SUM. Otherwise, the
default statistic is N.
Examples:
n
Region*n
Sales*max
format modifiers
define how to format values in cells. Use the asterisk (*) operator to associate a
format modifier with the element (an analysis variable or a statistic) that produces
the cells that you want to format. Format modifiers have the form
f=format
Example:
Sales*f=dollar8.2
Tip:
See also:
labels
temporarily replace the names of variables and statistics. Labels affect only the
variable or statistic that immediately precedes the label. Labels have the form
statistic-keyword-or-variable-name='label-text'
Tip: PROC TABULATE eliminates the space for blank column headings from a
table but by default does not eliminate the space for blank row headings unless
all row headings are blank. Use ROW=FLOAT in the TABLE statement to
remove the space for blank row headings.
Examples:
Region='Geographical Region'
Sales*max='Largest Sale'
style-element specifications
specify style elements for page dimension text, headings, or data cells. For details,
see "Specifying Style Elements in Dimension Expressions" on page 1210.
(blank)
places the output for each element immediately after the output for the preceding
element. This process is called concatenation.
Example:
n Region*Sales ALL
Featured in:
parentheses ()
group elements and associate an operator with each concatenated element in the
group.
Examples:
Division*(Sales*max Sales*min)
(Region ALL)*Sales
Featured in:
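The elements and operators described above (crossing with the asterisk, concatenation with a blank, grouping with parentheses, format modifiers, and labels) can be combined in one dimension expression. The following sketch assumes the SASHELP.CLASS sample data set; the label text and the format 6.1 are arbitrary.

```sas
/* Sketch only: data set, label text, and formats are assumptions. */
proc tabulate data=sashelp.class;
   class sex;
   var height;
   table sex all,                            /* rows: Sex concatenated with ALL */
         n                                   /* column 1: frequency count       */
         height*(mean*f=6.1                  /* crossing with a format modifier */
                 max='Tallest'*f=6.1);       /* label replaces the keyword MAX  */
run;
```

The parentheses apply the crossing with Height to both MEAN and MAX, which is equivalent to writing height*mean*f=6.1 height*max='Tallest'*f=6.1.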
Note: When used in a dimension expression, the STYLE= option must be enclosed
within square brackets ([ and ]) or braces ({ and }).
With the exception of (CLASSLEV), all arguments are described in STYLE= on page
1194 in the PROC TABULATE statement.
(CLASSLEV)
assigns a style element to a class variable level value heading. For example, the
following TABLE statement specifies that the level value heading for the class
variable, DEPT, has a foreground color of yellow:
table dept=[style(classlev)=
[foreground=yellow]]*sales;
For an example that shows how to specify style elements within dimension
expressions, see Example 14 on page 1279.
VAR Statement
Identifies numeric variables to use as analysis variables.
Alias: VARIABLES
Tip: You can use multiple VAR statements.
Required Arguments
analysis-variable(s)
identifies the analysis variables in the table. Analysis variables are numeric
variables for which PROC TABULATE calculates statistics. The values of an analysis
variable can be continuous or discrete.
If an observation contains a missing value for an analysis variable, then PROC
TABULATE omits that value from calculations of all statistics except N (the number
of observations with nonmissing variable values) and NMISS (the number of
observations with missing variable values). For example, the missing value does not
increase the SUM, and it is not counted when you are calculating statistics such as
the MEAN.
Options
STYLE=<style-element-name|PARENT>[style-attribute-name=style-attribute-value< style-attribute-name=style-attribute-value>]
specifies a style element for analysis variable name headings. For information on the
arguments of this option and how it is used, see STYLE= on page 1194 in the PROC
TABULATE statement.
Note: When you use STYLE= in the VAR statement, it differs slightly from its
use in the PROC TABULATE statement. In the VAR statement, the parent of the
heading is the heading under which the current heading is nested.
Alias: S=
Restriction: This option affects only the HTML, RTF, and Printer destinations.
Tip: To override a style element that is specified in the VAR statement, you can
specify a style element in the related TABLE statement dimension expression.
Featured in: Example 14 on page 1279
WEIGHT=weight-variable
specifies a numeric variable whose values weight the values of the variables that are
specified in the VAR statement. The variable does not have to be an integer. If the
value of the weight variable is
Weight value    PROC TABULATE
0               counts the observation in the total number of observations
less than 0     converts the value to zero and counts the observation in the total
                number of observations
missing         excludes the observation
To exclude observations that contain negative and zero weights from the analysis,
use EXCLNPWGT. Note that most SAS/STAT procedures, such as PROC GLM,
exclude negative and zero weights by default.
Restriction: To compute weighted quantiles, use QMETHOD=OS in the PROC
statement.
Tip: When you use the WEIGHT= option, consider which value of the VARDEF=
option is appropriate (see the discussion of VARDEF= on page 1195).
Tip: Use the WEIGHT option in multiple VAR statements to specify different
weights for the analysis variables.
Note: Prior to Version 7 of SAS, the procedure did not exclude the observations
with missing weights from the count of observations.
WEIGHT Statement
Specifies weights for analysis variables in the statistical calculations.
See also: For information on calculating weighted statistics and for an example that
uses the WEIGHT statement, see Calculating Weighted Statistics on page 64
WEIGHT variable;
Required Arguments
variable
specifies a numeric variable whose values weight the values of the analysis variables.
The values of the variable do not have to be integers. PROC TABULATE responds to
weight values in accordance with the following table.
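Weight value    PROC TABULATE response
0               counts the observation in the total number of observations
less than 0     converts the value to zero and counts the observation in the total
                number of observations
missing         excludes the observation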
To exclude observations that contain negative and zero weights from the analysis,
use EXCLNPWGT. Note that most SAS/STAT procedures, such as PROC GLM,
exclude negative and zero weights by default.
Restriction: To compute weighted quantiles, use QMETHOD=OS in the PROC
statement.
Interaction: If you use the WEIGHT= option in a VAR statement to specify a
weight variable, then PROC TABULATE uses this variable instead to weight those
VAR statement variables.
Tip: When you use the WEIGHT statement, consider which value of the VARDEF=
option is appropriate. See the discussion of VARDEF= on page 1195 and the
calculation of weighted statistics in "Keywords and Formulas" on page 1340 for
more information.
Note: Prior to Version 7 of SAS, the procedure did not exclude the observations
with missing weights from the count of observations.
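The following sketch illustrates the WEIGHT statement on a small hypothetical data set (WORK.SCORES, with variables Grp, Score, and Wt). One observation has a weight of zero, so, per the table above, it should still be counted in N while contributing nothing to the weighted mean.

```sas
/* Sketch only: WORK.SCORES and its variables are hypothetical. */
data work.scores;
   input grp $ score wt;
   datalines;
A 10 1
A 20 3
B 30 2
B 40 0
;
run;

proc tabulate data=work.scores;
   class grp;
   var score;
   weight wt;              /* weights apply to all analysis variables */
   table grp, score*(n sumwgt mean);
run;
```

To weight only selected analysis variables, the WEIGHT= option in a VAR statement could be used instead of the WEIGHT statement.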
COLPCTN               PCTSUM
COLPCTSUM             RANGE
CSS                   REPPCTN
CV                    REPPCTSUM
KURTOSIS | KURT       ROWPCTN
LCLM                  ROWPCTSUM
MAX                   SKEWNESS | SKEW
MEAN                  STDDEV|STD
MIN                   STDERR
N                     SUM
NMISS                 SUMWGT
PAGEPCTN              UCLM
PAGEPCTSUM            USS
PCTN                  VAR
MEDIAN|P50            Q3|P75
P1                    P90
P5                    P95
P10                   P99
Q1|P25                QRANGE
These statistics, the formulas that are used to calculate them, and their data
requirements are discussed in "Keywords and Formulas" on page 1340.
To compute standard error of the mean (STDERR) or Student's t-test, you must use
the default value of the VARDEF= option, which is DF. The VARDEF= option is
specified in the PROC TABULATE statement.
To compute weighted quantiles, you must use QMETHOD=OS in the PROC
TABULATE statement.
Use both LCLM and UCLM to compute a two-sided confidence limit for the mean.
Use only LCLM or UCLM to compute a one-sided confidence limit. Use the ALPHA=
option in the PROC TABULATE statement to specify a confidence level.
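As a sketch of the confidence-limit keywords, the following step requests two-sided limits for the mean. It assumes the SASHELP.CLASS sample data set; ALPHA=0.1 is an arbitrary choice that corresponds to 90% confidence limits.

```sas
/* Sketch only: data set and ALPHA= value are assumptions. */
proc tabulate data=sashelp.class alpha=0.1;
   class sex;
   var height;
   /* LCLM and UCLM together give a two-sided limit for the mean */
   table sex, height*(mean lclm uclm);
run;
```

VARDEF= is left at its default (DF) here, which the text above states is required for STDERR and Student's t-test.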
0-29='Under 30'
30-39='30-39'
40-49='40-49'
50-59='50-59'
60-69='60-69'
other='70 or over';
run;
How Using BY-Group Processing Differs from Using the Page Dimension
For information on creating user-defined formats, see Chapter 22, "The FORMAT
Procedure," on page 429.
By default, PROC TABULATE includes in a table only those formats for which the
frequency count is not zero and for which values are not missing. To include missing
values for all class variables in the output, use the MISSING option in the PROC
TABULATE statement, and to include missing values for selected class variables, use
the MISSING option in a CLASS statement. To include formats for which the frequency
count is zero, use the PRELOADFMT option in a CLASS statement and the
PRINTMISS option in the TABLE statement, or use the CLASSDATA= option in the
PROC TABULATE statement.
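The combination of PRELOADFMT and PRINTMISS described above can be sketched as follows. The format name AGERNG and its ranges are hypothetical, and SASHELP.CLASS is assumed as the input data set. The last range is chosen on the assumption that no observation falls in it, so that the table should still show a row for it.

```sas
/* Sketch only: the format AGERNG and its ranges are hypothetical. */
proc format;
   value agerng
      11-12 = '11 to 12'
      13-14 = '13 to 14'
      15-16 = '15 to 16'
      17-18 = '17 to 18';   /* assumed to have no observations */
run;

proc tabulate data=sashelp.class;
   class age / preloadfmt;
   format age agerng.;
   /* PRINTMISS keeps headings for ranges with a zero frequency count */
   table age, n / printmiss misstext='0';
run;
```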
• changing the default format with the FORMAT= option in the PROC TABULATE
statement
(12.2).
2 The FORMAT= option in the PROC TABULATE statement changes the default
format. If no format modifiers affect a cell, then PROC TABULATE uses this
format for the value in that cell.
3 A format modifier in the page dimension applies to the values in all the table cells
on the logical page unless you specify another format modifier for a cell in the row
or column dimension.
4 A format modifier in the row dimension applies to the values in all the table cells
in the row unless you specify another format modifier for a cell in the column
dimension.
5 A format modifier in the column dimension applies to the values in all the table
Table 47.4
Issue
Order of observations
in the input data set
Sorting is unnecessary.
One report
summarizing all BY
groups
Percentages
Titles
Ordering class
variables
Obtaining uniform
headings
1 You can use the BY statement without sorting the data set if the data set has an index for the BY variable.
Calculating Percentages
Calculating the Percentage of the Value in a Single Table Cell
The following statistics print the percentage of the value in a single table cell in
relation to the total of the values in a group of cells. No denominator definitions are
required; however, an analysis variable may be used as a denominator definition for
percentage sum statistics.
REPPCTN and REPPCTSUM statistics print the percentage of the value in a single
table cell in relation to the total of the values in the report.
COLPCTN and COLPCTSUM statistics print the percentage of the value in a single
table cell in relation to the total of the values in the column.
ROWPCTN and ROWPCTSUM statistics print the percentage of the value in a
single table cell in relation to the total of the values in the row.
PAGEPCTN and PAGEPCTSUM statistics print the percentage of the value in a
single table cell in relation to the total of the values in the page.
These statistics calculate the most commonly used percentages. See Example 12 on
page 1266 for an example.
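As a minimal sketch of how these statistics might be requested, the following step crosses three of them with a class variable. The data set VISITS and its values are hypothetical and are not part of the examples in this chapter.

```sas
/* Hypothetical data: six observations in two divisions and two types */
data visits;
   input Division Type;
   datalines;
1 1
1 1
1 2
2 1
2 2
2 2
;

proc tabulate data=visits format=8.2;
   class division type;
   /* N plus the row, column, and report percentages of N;
      no denominator definition is needed for any of them. */
   table division,
         type*(n*f=8. rowpctn colpctn reppctn);
run;
```

Because the denominators are implied by the statistic names, the TABLE statement stays the same if more class variables are added to a dimension.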
The TABLE statement creates a row for each value of Division and a column for
each value of Type. Within each row, the TABLE statement nests four statistics: N and
three different calculations of PCTN (see Figure 47.4 on page 1218). Each occurrence of
PCTN uses a different denominator definition.
Figure 47.4
u <type> sums the frequency counts for all occurrences of Type within the same
value of Division. Thus, for Division=1, the denominator is 6 + 6, or 12.
v <division> sums the frequency counts for all occurrences of Division within the
same value of Type. Thus, for Type=1, the denominator is 6 + 3 + 8 + 5, or 22.
w The third use of PCTN has no denominator definition. Omitting a denominator
definition is the same as including all class variables in the denominator definition.
Thus, for all cells, the denominator is 6 + 3 + 8 + 5 + 6 + 3 + 8 + 5, or 44.
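The three uses of PCTN described above can be sketched as follows. This is an illustration under stated assumptions, not the program that produced Figure 47.4: the data set USERS is hypothetical and is generated so that its frequency counts match the counts quoted in the text (6, 3, 8, and 5 observations per type in Divisions 1 through 4).

```sas
/* Hypothetical data: for each division, write Count observations
   for Type=1 and Count observations for Type=2. */
data users;
   input Division Count;
   do Type = 1 to 2;
      do i = 1 to Count;
         output;
      end;
   end;
   drop i Count;
   datalines;
1 6
2 3
3 8
4 5
;

proc tabulate data=users format=8.2;
   class division type;
   table division*
           (n*f=8.
            pctn<type>='% of row'         /* denominator: row total (12 for Division=1) */
            pctn<division>='% of column'  /* denominator: column total (22 for Type=1)  */
            pctn='% of all'),             /* denominator: all cells (44)                */
         type / rts=30;
run;
```

With these counts, the Division=1, Type=1 cell has N=6, so the three percentages are 6/12, 6/22, and 6/44 of the respective totals.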
            pctsum<division>='% of column'     /* v */
            pctsum='% of all customers'),      /* w */
         type*expenditures/rts=40;
   title 'Expenditures in Each Division';
run;
The TABLE statement creates a row for each value of Division and a column for each
value of Type. Because Type is crossed with Expenditures, the value in each cell is the
sum of the values of Expenditures for all observations that contribute to the cell.
Within each row, the TABLE statement nests four statistics: SUM and three different
calculations of PCTSUM (see Figure 47.5 on page 1219). Each occurrence of PCTSUM
uses a different denominator definition.
Figure 47.5

                                          Type
                                  1 (Expend)   2 (Expend)
Division  Statistic
1         Expenditures             $7,477.00    $5,129.00
          % of row                     59.31        40.69
          % of column                  16.15        13.66
          % of all customers            8.92         6.12
2         Expenditures            $19,379.00   $15,078.00
          % of row                     56.24        43.76
          % of column                  41.86        40.15
          % of all customers           23.11        17.98
3         Expenditures             $5,476.00    $4,729.00
          % of row                     53.66        46.34
          % of column                  11.83        12.59
          % of all customers            6.53         5.64
4         Expenditures            $13,959.00   $12,619.00
          % of row                     52.52        47.48
          % of column                  30.15        33.60
          % of all customers           16.65        15.05
u <type> sums the values of Expenditures for all occurrences of Type within the
same value of Division. Thus, for Division=1, the denominator is $7,477 + $5,129.
v <division> sums the values of Expenditures for all occurrences of Division within
the same value of Type. Thus, for Type=1, the denominator is $7,477 + $19,379 +
$5,476 + $13,959.
w The third use of PCTSUM has no denominator definition. Omitting a denominator
definition is the same as including all class variables in the denominator
definition. Thus, for all cells, the denominator is $7,477 + $19,379 + $5,476 +
$13,959 + $5,129 + $15,078 + $4,729 + $12,619.
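As a check on the three denominators, the percentages for the Division=1, Type=1 cell of Figure 47.5 can be worked by hand from the sums shown in the figure. Here 46,291 is the Type=1 column total and 37,555 is the Type=2 column total.

```latex
\begin{align*}
\text{\% of row} &= \frac{7477}{7477 + 5129} \times 100
                  = \frac{7477}{12606} \times 100 \approx 59.31 \\
\text{\% of column} &= \frac{7477}{7477 + 19379 + 5476 + 13959} \times 100
                     = \frac{7477}{46291} \times 100 \approx 16.15 \\
\text{\% of all customers} &= \frac{7477}{46291 + 37555} \times 100
                            = \frac{7477}{83846} \times 100 \approx 8.92
\end{align*}
```

These values agree with the first column of the Division 1 rows in Figure 47.5.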
Region                 Style
column headings        Header
box                    Header
page dimension text    Beforecaption
row headings           Rowheader
data cells             Data
table                  Table

Table 47.6
Entries recovered: data cells; CLASS; CLASSLEV; keyword headings (KEYWORD);
TABLE; box text; missing values; VAR.
cells on the logical page unless you specify another STYLE= option for a cell in the
row or column dimension.
4 A STYLE= option that is specified in the row dimension applies to all the table
cells in the row unless you specify another STYLE= option for a cell in the column
dimension.
5 A STYLE= option that is specified in the column dimension applies to all the table
cells in the column.
Missing Values
How PROC TABULATE Treats Missing Values
How a missing value for a variable in the input data set affects your output depends
on how you use the variable in the PROC TABULATE step. Table 47.7 on page 1222
summarizes how the procedure treats missing values.
Table 47.7
1 The CLASS statement applies to all TABLE statements in a PROC TABULATE step. Therefore, if you define a variable as
a class variable, PROC TABULATE omits observations that have missing values for that variable even if you do not use the
variable in a TABLE statement.
This section presents a series of PROC TABULATE steps that illustrate how PROC
TABULATE treats missing values. The following program creates the data set and
formats that are used in this section and prints the data set. The data set COMPREV
contains no missing values (see Figure 47.6 on page 1223).
proc format;
   value cntryfmt 1='United States'
                  2='Japan';
   value compfmt 1='Supercomputer'
                 2='Mainframe'
                 3='Midrange'
                 4='Workstation'
                 5='Personal Computer'
                 6='Laptop';
run;
data comprev;
input Country Computer Rev90 Rev91 Rev92;
datalines;
1 1 788.8 877.6 944.9
1 2 12538.1 9855.6 8527.9
1 3 9815.8 6340.3 8680.3
1 4 3147.2 3474.1 3722.4
1 5 18660.9 18428.0 23531.1
2 1 469.9 495.6 448.4
2 2 5697.6 6242.4 5382.3
2 3 5392.1 5668.3 4845.9
2 4 1511.6 1875.5 1924.5
2 5 4746.0 4600.8 4363.7
;
Figure 47.6

Country          Computer               Rev90      Rev91      Rev92
United States    Supercomputer          788.8      877.6      944.9
United States    Mainframe            12538.1     9855.6     8527.9
United States    Midrange              9815.8     6340.3     8680.3
United States    Workstation           3147.2     3474.1     3722.4
United States    Personal Computer    18660.9    18428.0    23531.1
Japan            Supercomputer          469.9      495.6      448.4
Japan            Mainframe             5697.6     6242.4     5382.3
Japan            Midrange              5392.1     5668.3     4845.9
Japan            Workstation           1511.6     1875.5     1924.5
Japan            Personal Computer     4746.0     4600.8     4363.7
No Missing Values
The following PROC TABULATE step produces Figure 47.7 on page 1224:
proc tabulate data=comprev;
class country computer;
var rev90 rev91 rev92;
Figure 47.7
Because the data set contains no missing values, the table includes all observations. All headers
and cells contain nonmissing values.
Revenues from Computer Sales
for 1990 to 1992

                                      Rev90      Rev91      Rev92
Computer           Country              Sum        Sum        Sum
Supercomputer      United States     788.80     877.60     944.90
                   Japan             469.90     495.60     448.40
Mainframe          United States   12538.10    9855.60    8527.90
                   Japan            5697.60    6242.40    5382.30
Midrange           United States    9815.80    6340.30    8680.30
                   Japan            5392.10    5668.30    4845.90
Workstation        United States    3147.20    3474.10    3722.40
                   Japan            1511.60    1875.50    1924.50
Personal Computer  United States   18660.90   18428.00   23531.10
                   Japan            4746.00    4600.80    4363.70
Figure 47.8
The observation with a missing value for Computer was the category Midrange, Japan. This
category no longer exists. By default, PROC TABULATE ignores observations with missing
values for a class variable, so this table contains one fewer row than Figure 47.7 on page 1224.

                                      Rev90      Rev91      Rev92
Computer           Country              Sum        Sum        Sum
Supercomputer      United States     788.80     877.60     944.90
                   Japan             469.90     495.60     448.40
Mainframe          United States   12538.10    9855.60    8527.90
                   Japan            5697.60    6242.40    5382.30
Midrange           United States    9815.80    6340.30    8680.30
Workstation        United States    3147.20    3474.10    3722.40
                   Japan            1511.60    1875.50    1924.50
Personal Computer  United States   18660.90   18428.00   23531.10
                   Japan            4746.00    4600.80    4363.70
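The default exclusion of observations with a missing class value, and the effect of the MISSING option in the PROC TABULATE statement, can be sketched with a small data set. COMPREV_DEMO is a hypothetical three-observation data set used only for this illustration; it is not the data set behind the figures in this section.

```sas
/* Hypothetical data: the last observation has a missing value
   for the class variable Computer. */
data comprev_demo;
   input Country Computer Rev90;
   datalines;
1 3 9815.8
2 1 469.9
2 . 5392.1
;

/* Default: the observation with the missing class value is excluded. */
proc tabulate data=comprev_demo;
   class country computer;
   var rev90;
   table computer*country, rev90*sum;
run;

/* MISSING: a missing value is treated as a valid class level. */
proc tabulate data=comprev_demo missing;
   class country computer;
   var rev90;
   table computer*country, rev90*sum;
run;
```

The first step should produce two rows and the second step three rows, with the missing level of Computer listed first.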
Figure 47.9
This table includes a category with missing values of Computer. This category makes up the
first row of data in the table.
Figure 47.10
In this table, the missing value appears as the text that the MISSCOMP. format specifies.

                                      Rev90      Rev91      Rev92
Computer           Country              Sum        Sum        Sum
No type given      Japan            5392.10    5668.30    4845.90
Supercomputer      United States     788.80     877.60     944.90
                   Japan             469.90     495.60     448.40
Mainframe          United States   12538.10    9855.60    8527.90
                   Japan            5697.60    6242.40    5382.30
Midrange           United States    9815.80    6340.30    8680.30
Workstation        United States    3147.20    3474.10    3722.40
                   Japan            1511.60    1875.50    1924.50
Personal Computer  United States   18660.90   18428.00   23531.10
                   Japan            4746.00    4600.80    4363.70
Figure 47.11
This table contains a row for the categories No type given, United States and Midrange,
Japan. Because there are no data in these categories, the values for the statistics are all
missing.

                                      Rev90      Rev91      Rev92
Computer           Country              Sum        Sum        Sum
No type given      United States          .          .          .
                   Japan            5392.10    5668.30    4845.90
Supercomputer      United States     788.80     877.60     944.90
                   Japan             469.90     495.60     448.40
Mainframe          United States   12538.10    9855.60    8527.90
                   Japan            5697.60    6242.40    5382.30
Midrange           United States    9815.80    6340.30    8680.30
                   Japan                  .          .          .
Workstation        United States    3147.20    3474.10    3722.40
                   Japan            1511.60    1875.50    1924.50
Personal Computer  United States   18660.90   18428.00   23531.10
                   Japan            4746.00    4600.80    4363.70
Figure 47.12
This table replaces the period normally used to display missing values with the text of the
MISSTEXT= option.
Revenues for Computer Sales
for 1990 to 1992

                                      Rev90      Rev91      Rev92
Computer           Country              Sum        Sum        Sum
No type given      United States   NO DATA!   NO DATA!   NO DATA!
                   Japan            5392.10    5668.30    4845.90
Supercomputer      United States     788.80     877.60     944.90
                   Japan             469.90     495.60     448.40
Mainframe          United States   12538.10    9855.60    8527.90
                   Japan            5697.60    6242.40    5382.30
Midrange           United States    9815.80    6340.30    8680.30
                   Japan           NO DATA!   NO DATA!   NO DATA!
Workstation        United States    3147.20    3474.10    3722.40
                   Japan            1511.60    1875.50    1924.50
Personal Computer  United States   18660.90   18428.00   23531.10
                   Japan            4746.00    4600.80    4363.70
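A table like the one above, in which empty categories are printed and their missing statistics are replaced with text, might be requested as in the following sketch. The data set COMPREV_DEMO2 is hypothetical; PRINTMISS and MISSTEXT= are the TABLE statement options being illustrated.

```sas
/* Hypothetical data: there is no observation for Computer=3, Country=2. */
data comprev_demo2;
   input Country Computer Rev90;
   datalines;
1 1 788.8
2 1 469.9
1 3 9815.8
;

proc tabulate data=comprev_demo2;
   class country computer;
   var rev90;
   /* PRINTMISS prints every combination of class levels;
      MISSTEXT= supplies the text shown in cells with missing statistics. */
   table computer*country, rev90*sum
         / printmiss misstext='NO DATA!';
run;
```

Without PRINTMISS, the empty category would be dropped from the table and the MISSTEXT= text would have no cell in which to appear.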
Figure 47.13

                                      Rev90      Rev91      Rev92
Computer           Country              Sum        Sum        Sum
.                  United States   NO DATA!   NO DATA!   NO DATA!
                   Japan            5392.10    5668.30    4845.90
Supercomputer      United States     788.80     877.60     944.90
                   Japan             469.90     495.60     448.40
Mainframe          United States   12538.10    9855.60    8527.90
                   Japan            5697.60    6242.40    5382.30
Midrange           United States    9815.80    6340.30    8680.30
                   Japan           NO DATA!   NO DATA!   NO DATA!
Workstation        United States    3147.20    3474.10    3722.40
                   Japan            1511.60    1875.50    1924.50
Personal Computer  United States   18660.90   18428.00   23531.10
                   Japan            4746.00    4600.80    4363.70
Laptop             United States   NO DATA!   NO DATA!   NO DATA!
                   Japan           NO DATA!   NO DATA!   NO DATA!
The following program creates a simple data set in which the observations are
ordered first by the values of Animal, then by the values of Food. The ORDER= option
in the PROC TABULATE statement orders the headings for the class variables by the
order of their appearance in the data set (see Figure 47.14 on page 1231). Although
bones is the first value for Food in the group of observations where Animal=dog, all
other values for Food appear before bones in the data set because bones never appears
when Animal=cat. Therefore, the header for bones in the table in Figure 47.14 on page
1231 is not in alphabetical order.
In other words, PROC TABULATE maintains for subsequent categories the order
that was established by earlier categories. If you want to re-establish the order of Food
for each value of Animal, then use BY-group processing. PROC TABULATE creates a
separate table for each BY group, so that the ordering can differ from one BY group to
the next.
data foodpref;
input Animal $ Food $;
datalines;
cat fish
cat meat
cat milk
dog bones
dog fish
dog meat
;
proc tabulate data=foodpref format=9.
order=data;
class animal food;
table animal*food;
run;
Figure 47.14

                           Animal
            cat                          dog
            Food                         Food
   fish     meat     milk       fish     meat     bones
     N        N        N          N        N        N
     1        1        1          1        1        1
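The BY-group alternative mentioned above can be sketched as follows. The step assumes the FOODPREF data set created by the preceding DATA step, which is already ordered by Animal, so no sort is needed before the BY statement.

```sas
/* One table per value of Animal; within each BY group,
   ORDER=DATA starts over with the order found in that group. */
proc tabulate data=foodpref format=9. order=data;
   by animal;
   class food;
   table food;
run;
```

In the table for Animal=dog, the heading for bones should now come first, because it is the first value of Food within that BY group.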
Monospace font is not installed. To avoid this problem, specify the following OPTIONS
statement before executing PROC TABULATE:
options formchar="|----|+|---+=|-/\<>*";
This example
- creates a category for each type of user (residential or business) in each division of
  each region
- applies the same format to all cells in the table
- applies a format to each class variable
- extends the space for row headings.
Program
Create the ENERGY data set. ENERGY contains data on expenditures of energy for business
and residential customers in individual states in the Northeast and West regions of the United
States. A DATA step on page 1387 creates the data set.
data energy;
length State $2;
input Region Division state $ Type Expenditures;
datalines;
1 1 ME 1 708
1 1 ME 2 379
. . . more data lines . . .
4 4 HI 1 273
4 4 HI 2 298
;
Create the REGFMT., DIVFMT., and USETYPE. formats. PROC FORMAT creates formats
for Region, Division, and Type.
proc format;
   value regfmt 1='Northeast'
                2='South'
                3='Midwest'
                4='West';
   value divfmt 1='New England'
                2='Middle Atlantic'
                3='Mountain'
                4='Pacific';
   value usetype 1='Residential Customers'
                 2='Business Customers';
run;
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format
for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement separates the analysis by values
of Region, Division, and Type.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate
statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns. The TABLE statement creates a row for each formatted
value of Region. Nested within each row are rows for each formatted value of Division. The
TABLE statement also creates a column for each formatted value of Type. Each cell that is
created by these rows and columns contains the sum of the analysis variable Expenditures for
all observations that contribute to that cell.
table region*division,
type*expenditures
Specify the row title space. RTS= provides 25 characters per line for row headings.
/ rts=25;
Format the output. The FORMAT statement assigns formats to the variables Region,
Division, and Type.
format region regfmt. division divfmt. type usetype.;
Output
This example
- uses the EXCLUSIVE option to restrict the output to only the combinations
  specified in the CLASSDATA= data set. Without the EXCLUSIVE option, the
  output would be the same as in Example 1 on page 1232.
Program
Create the CLASSES data set. CLASSES contains the combinations of class variable values
that PROC TABULATE uses to create the table.
data classes;
input region division type;
datalines;
1 1 1
1 1 2
4 4 1
4 4 2
;
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. CLASSDATA= and EXCLUSIVE restrict the class level
combinations to those that are specified in the CLASSES data set.
proc tabulate data=energy format=dollar12.
classdata=classes exclusive;
Specify subgroups for the analysis. The CLASS statement separates the analysis by values
of Region, Division, and Type.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate
statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns. The TABLE statement creates a row for each formatted
value of Region. Nested within each row are rows for each formatted value of Division. The
TABLE statement also creates a column for each formatted value of Type. Each cell that is
created by these rows and columns contains the sum of the analysis variable Expenditures for
all observations that contribute to that cell.
table region*division,
type*expenditures
Specify the row title space. RTS= provides 25 characters per line for row headings.
/ rts=25;
Format the output. The FORMAT statement assigns formats to the variables Region,
Division, and Type.
format region regfmt. division divfmt. type usetype.;
Output
Energy Expenditures for Each Region
(millions of dollars)

                                           Type
                           Residential Customers   Business Customers
                               Expenditures           Expenditures
Region      Division               Sum                    Sum
Northeast   New England              $7,477                 $5,129
West        Pacific                 $13,959                $12,619
This example
- creates a table that includes all possible combinations of formatted class variable
  values (PRELOADFMT with PRINTMISS), even if those combinations have a zero
  frequency and even if they do not make sense
- uses only the preloaded range of user-defined formats as the levels of class
  variables (PRELOADFMT with EXCLUSIVE)
- writes the output to an output data set, and prints that data set.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format
for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement separates the analysis by values
of Region, Division, and Type. PRELOADFMT specifies that PROC TABULATE use the
preloaded values of the user-defined formats for the class variables.
class region division type / preloadfmt;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate
statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns, and specify row and column options. PRINTMISS
specifies that all possible combinations of user-defined formats be used as the levels of the class
variables.
table region*division,
type*expenditures / rts=25 printmiss;
Format the output. The FORMAT statement assigns formats to the variables Region,
Division, and Type.
format region regfmt. division divfmt. type usetype.;
Specify the table options and the output data set. The OUT= option specifies the name of
the output data set to which PROC TABULATE writes the data.
proc tabulate data=energy format=dollar12. out=tabdata;
Specify subgroups for the analysis. The EXCLUSIVE option, when used with
PRELOADFMT, uses only the preloaded range of user-defined formats as the levels of class
variables.
class region division type / preloadfmt exclusive;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate
statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns, and specify row and column options. The
PRINTMISS option is not specified in this case. If it were, then it would override the
EXCLUSIVE option in the CLASS statement.
table region*division,
type*expenditures / rts=25;
Format the output. The FORMAT statement assigns formats to the variables Region,
Division, and Type.
format region regfmt. division divfmt. type usetype.;
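The step that prints the output data set does not appear above. A minimal sketch of one way to list it, assuming the TABDATA data set named in the OUT= option of the preceding PROC TABULATE statement, is:

```sas
/* List the data set that PROC TABULATE wrote with OUT=TABDATA. */
proc print data=tabdata;
run;
```

Any titles or PROC PRINT options used for the published listing are not recoverable from the text, so none are shown here.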
Output
This output, created with the PRELOADFMT and PRINTMISS options, contains all possible
combinations of preloaded user-defined formats for the class variable values. It includes
combinations with zero frequencies, and combinations that make no sense, such as Northeast
and Pacific.
This output, created with the PRELOADFMT and EXCLUSIVE options, contains only those
combinations of preloaded user-defined formats for the class variable values that appear in the
input data set. This output is identical to the output from Example 1 on page 1232.

                                               Type
                               Residential Customers   Business Customers
                                   Expenditures           Expenditures
Region      Division                   Sum                    Sum
Northeast   New England                  $7,477                 $5,129
            Middle Atlantic             $19,379                $15,078
West        Mountain                     $5,476                 $4,729
            Pacific                     $13,959                $12,619
This output is a listing of the output data set TABDATA, which was created by the OUT= option
in the PROC TABULATE statement. TABDATA contains the data that is created by having the
PRELOADFMT and EXCLUSIVE options specified.

Obs  Region     Division         Type                   _TYPE_  _PAGE_  _TABLE_  Expenditures_Sum
  1  Northeast  New England      Residential Customers    111      1       1            7477
  2  Northeast  New England      Business Customers       111      1       1            5129
  3  Northeast  Middle Atlantic  Residential Customers    111      1       1           19379
  4  Northeast  Middle Atlantic  Business Customers       111      1       1           15078
  5  West       Mountain         Residential Customers    111      1       1            5476
  6  West       Mountain         Business Customers       111      1       1            4729
  7  West       Pacific          Residential Customers    111      1       1           13959
  8  West       Pacific          Business Customers       111      1       1           12619
FORMAT procedure
FORMAT statement
VALUE statement options:
MULTILABEL
This example
- shows how to activate multilabel format processing using the MLF option with the
  CLASS statement
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=64;
Create the CARSURVEY data set. CARSURVEY contains data from a survey that was
distributed by a car manufacturer to a focus group of potential customers who were brought
together to evaluate new car names. Each observation in the data set contains an identification
number, the participant's age, and the participant's ratings of four car names. A DATA step
creates the data set.
data carsurvey;
input Rater Age Progressa Remark Jupiter Dynamo;
datalines;
1 38 94 98 84 80
2 49 96 84 80 77
3 16 64 78 76 73
4 27 89 73 90 92
. . . more data lines . . .
77 61 92 88 77 85
78 24 87 88 88 91
79 18 54 50 62 74
80 62 90 91 90 86
;
Create the AGEFMT. format. The FORMAT procedure creates a multilabel format for ages by
using the MULTILABEL option on page 449. A multilabel format is one in which multiple labels
can be assigned to the same value, in this case because of overlapping ranges. Each value is
represented in the table for each range in which it occurs. The NOTSORTED option stores the
ranges in the order in which they are defined.
proc format;
   value agefmt (multilabel notsorted)
        15 - 29 = 'Below 30 years'
        30 - 50 = 'Between 30 and 50'
      51 - high = 'Over 50 years'
        15 - 19 = '15 to 19'
        20 - 25 = '20 to 25'
        25 - 39 = '25 to 39'
        40 - 55 = '40 to 55'
      56 - high = '56 and above';
run;
Specify the table options. The FORMAT= option specifies up to 10 digits as the default
format for the value in each table cell.
proc tabulate data=carsurvey format=10.;
Specify subgroups for the analysis. The CLASS statement identifies Age as the class
variable and uses the MLF option to activate multilabel format processing.
class age / mlf;
Specify the analysis variables. The VAR statement specifies that PROC TABULATE
calculate statistics on the Progressa, Remark, Jupiter, and Dynamo variables.
var progressa remark jupiter dynamo;
Define the table rows and columns. The row dimension of the TABLE statement creates a
row for each formatted value of Age. Multilabel formatting allows an observation to be included
in multiple rows or age categories. The row dimension uses the ALL class variable to
summarize information for all rows. The column dimension uses the N statistic to calculate the
number of observations for each age group. Notice that the result of the N statistic crossed with
the ALL class variable in the row dimension is the total number of observations instead of the
sum of the N statistics for the rows. The column dimension uses the ALL class variable at the
beginning of a crossing to assign a label, Potential Car Names. The four nested columns
calculate the mean ratings of the car names for each age group.
   table age all, n all='Potential Car Names'*(progressa remark
         jupiter dynamo)*mean;
Format the output. The FORMAT statement assigns the user-defined format AGEFMT. to Age
for this analysis.
format age agefmt.;
run;
Output
Output 47.3
                      Rating Four Potential Car Names
              Rating Scale 0-100 (100 is the highest rating)
---------------------------------------------------------------------------
|                  |          |            Potential Car Names            |
|                  |          |-------------------------------------------|
|                  |          |Progressa |  Remark  | Jupiter  |  Dynamo  |
|                  |          |----------+----------+----------+----------|
|                  |    N     |   Mean   |   Mean   |   Mean   |   Mean   |
|------------------+----------+----------+----------+----------+----------|
|Age               |          |          |          |          |          |
|------------------|          |          |          |          |          |
|15 to 19          |        14|        75|        78|        81|        73|
|------------------+----------+----------+----------+----------+----------|
|20 to 25          |        11|        89|        88|        84|        89|
|------------------+----------+----------+----------+----------+----------|
|25 to 39          |        26|        84|        90|        82|        72|
|------------------+----------+----------+----------+----------+----------|
|40 to 55          |        14|        85|        87|        80|        68|
|------------------+----------+----------+----------+----------+----------|
|56 and above      |        15|        84|        82|        81|        75|
|------------------+----------+----------+----------+----------+----------|
|Below 30 years    |        36|        82|        84|        82|        75|
|------------------+----------+----------+----------+----------+----------|
|Between 30 and 50 |        25|        86|        89|        81|        73|
|------------------+----------+----------+----------+----------+----------|
|Over 50 years     |        19|        82|        84|        80|        76|
|------------------+----------+----------+----------+----------+----------|
|All               |        80|        83|        86|        81|        74|
---------------------------------------------------------------------------
TABLE statement
labels
Data set:
Formats:
This example shows how to customize row and column headings. A label specifies
text for a heading. A blank label creates a blank heading. PROC TABULATE removes
the space for blank column headings from the table.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format
for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and
Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate
statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns. The TABLE statement creates a row for each formatted
value of Region. Nested within each row are rows for each formatted value of Division. The
TABLE statement also creates a column for each formatted value of Type. Each cell that is
created by these rows and columns contains the sum of the analysis variable Expenditures for
all observations that contribute to that cell. Text in quotation marks specifies headings for the
corresponding variable or statistic. Although Sum is the default statistic, it is specified here so
that you can specify a blank for its heading.
table region*division,
   type='Customer Base'*expenditures=' '*sum=' '
Specify the row title space. RTS= provides 25 characters per line for row headings.
/ rts=25;
Format the output. The FORMAT statement assigns formats to Region, Division, and Type.
format region regfmt. division divfmt. type usetype.;
Output
The heading for Type contains text that is specified in the TABLE statement. The TABLE
statement eliminated the headings for Expenditures and Sum.
---------------------------------------------------
|                       |      Customer Base      |
|                       |-------------------------|
|                       |Residential |  Business  |
|                       | Customers  | Customers  |
|-----------------------+------------+------------|
|Region     |Division   |            |            |
|-----------+-----------|            |            |
|Northeast  |New England|      $7,477|      $5,129|
|           |-----------+------------+------------|
|           |Middle     |            |            |
|           |Atlantic   |     $19,379|     $15,078|
|-----------+-----------+------------+------------|
|West       |Mountain   |      $5,476|      $4,729|
|           |-----------+------------+------------|
|           |Pacific    |     $13,959|     $12,619|
---------------------------------------------------
This example shows how to use the universal class variable ALL to summarize
information from multiple categories.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=64 pagesize=60;
Specify the table options. The FORMAT= option specifies COMMA12. as the default format
for the value in each table cell.
proc tabulate data=energy format=comma12.;
Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and
Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate
statistics on the Expenditures variable.
var expenditures;
Define the table rows. The row dimension of the TABLE statement creates a row for each
formatted value of Region. Nested within each row are rows for each formatted value of Division
and a row (labeled Subtotal) that summarizes all divisions in the region. The last row of the
report (labeled Total for All Regions) summarizes all regions. The format modifier
f=DOLLAR12. assigns the DOLLAR12. format to the cells in this row.
table region*(division all='Subtotal')
   all='Total for All Regions'*f=dollar12.,
Define the table columns. The column dimension of the TABLE statement creates a column
for each formatted value of Type and a column that is labeled All Customers that shows
expenditures for all customers in a row of the table. Each cell that is created by these rows and
columns contains the sum of the analysis variable Expenditures for all observations that
contribute to that cell. Text in quotation marks specifies headings for the corresponding variable
or statistic. Although Sum is the default statistic, it is specified here so that you can specify a
blank for its heading.
   type='Customer Base'*expenditures=' '*sum=' '
   all='All Customers'*expenditures=' '*sum=' '
Specify the row title space. RTS= provides 25 characters per line for row headings.
/ rts=25;
Format the output. The FORMAT statement assigns formats to the variables Region,
Division, and Type.
format region regfmt. division divfmt. type usetype.;
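The placement of ALL decides what is summarized. The following is a minimal sketch under stated assumptions: the data set DEMO and its five rows are hypothetical and stand in for ENERGY, which is not reproduced in this section.

```sas
/* Hypothetical data; the real ENERGY data set is listed elsewhere. */
data demo;
   input region $ division $ type $ expenditures;
   datalines;
East North Home 10
East North Shop 20
East South Home 30
West Coast Home 40
West Coast Shop 50
;

proc tabulate data=demo format=comma12.;
   class region division type;
   var expenditures;
   /* ALL inside the parentheses: one subtotal row per region.  */
   /* ALL outside the crossing: one grand-total row at the end. */
   table region*(division all='Subtotal') all='Total',
         type*expenditures=' '*sum=' '
         all='All Types'*expenditures=' '*sum=' '
         / rts=25;
run;
```

With these values, the Subtotal row for East should show 60 in the All Types column, and the Total row should show 150.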
Output
The universal class variable ALL provides subtotals and totals in this
table.
----------------------------------------------------------------
|                       |      Customer Base      |            |
|                       |-------------------------|            |
|                       |Residential |  Business  |    All     |
|                       | Customers  | Customers  | Customers  |
|-----------------------+------------+------------+------------|
|Region     |Division   |            |            |            |
|-----------+-----------|            |            |            |
|Northeast  |New England|       7,477|       5,129|      12,606|
|           |-----------+------------+------------+------------|
|           |Middle     |            |            |            |
|           |Atlantic   |      19,379|      15,078|      34,457|
|           |-----------+------------+------------+------------|
|           |Subtotal   |      26,856|      20,207|      47,063|
|-----------+-----------+------------+------------+------------|
|West       |Division   |            |            |            |
|           |-----------|            |            |            |
|           |Mountain   |       5,476|       4,729|      10,205|
|           |-----------+------------+------------+------------|
|           |Pacific    |      13,959|      12,619|      26,578|
|           |-----------+------------+------------+------------|
|           |Subtotal   |      19,435|      17,348|      36,783|
|-----------------------+------------+------------+------------|
|Total for All Regions  |     $46,291|     $37,555|     $83,846|
----------------------------------------------------------------
TABLE statement:
labels
ROW=FLOAT
Data set: ENERGY on page 1387
Formats: REGFMT., DIVFMT., and USETYPE. on page 1233
This example shows how to eliminate blank row headings from a table. To do so, you
must both provide blank labels for the row headings and specify ROW=FLOAT in the
TABLE statement.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format
for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and
Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate
statistics on the Expenditures variable.
var expenditures;
Define the table rows. The row dimension of the TABLE statement creates a row for each
formatted value of Region. Nested within these rows is a row for each formatted value of
Division. The analysis variable Expenditures and the Sum statistic are also included in the row
dimension, so PROC TABULATE creates row headings for them as well. The text in quotation
marks specifies the headings for the corresponding variable or statistic. Although Sum is the
default statistic, it is specified here so that you can specify a blank for its heading.
table region*division*expenditures=' '*sum=' ',
Define the table columns. The column dimension of the TABLE statement creates a column
for each formatted value of Type.
   type='Customer Base'
Specify the row title space and eliminate blank row headings. RTS= provides 25
characters per line for row headings. ROW=FLOAT eliminates blank row headings.
/ rts=25 row=float;
Format the output. The FORMAT statement assigns formats to the variables Region,
Division, and Type.
format region regfmt. division divfmt. type usetype.;
Output
Compare this table with the output in Example 5 on page 1244. The two tables are identical,
but the program that creates this table uses Expenditures and Sum in the row dimension.
PROC TABULATE automatically eliminates blank headings from the column dimension,
whereas you must specify ROW=FLOAT to eliminate blank headings from the row dimension.
- eliminating horizontal separator lines from the row titles and the body of the table.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format
for the value in each table cell. NOSEPS eliminates horizontal separator lines from row titles
and from the body of the table.
proc tabulate data=energy format=dollar12. noseps;
Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and
Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate
statistics on the Expenditures variable.
var expenditures;
Define the table rows and columns. The TABLE statement creates a row for each formatted
value of Region. Nested within each row are rows for each formatted value of Division. The
TABLE statement also creates a column for each formatted value of Type. Each cell that is
created by these rows and columns contains the sum of the analysis variable Expenditures for
all observations that contribute to that cell. Text in quotation marks in all dimensions specifies
headings for the corresponding variable or statistic. Although Sum is the default statistic, it is
specified here so that you can specify a blank for its heading.
table region*division,
   type='Customer Base'*expenditures=' '*sum=' '
Specify the row title space and indention value. RTS= provides 25 characters per line for
row headings. INDENT= removes row headings for class variables, places values for Division
beneath values for Region rather than beside them, and indents values for Division four spaces.
/ rts=25 indent=4;
Format the output. The FORMAT statement assigns formats to the variables Region,
Division, and Type.
format region regfmt. division divfmt. type usetype.;
Output
NOSEPS removes the separator lines from the row titles and the body of the table. INDENT=
eliminates the row headings for Region and Division and indents values for Division underneath
values for Region.
TABLE statement
ALL class variable
BOX=
CONDENSE
INDENT=
page expression
Data set: ENERGY on page 1387
Formats: REGFMT., DIVFMT., and USETYPE. on page 1233
This example creates a separate table for each region and one table for all regions.
By default, PROC TABULATE creates each table on a separate page, but the
CONDENSE option places them all on the same page.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the table options. The FORMAT= option specifies DOLLAR12. as the default format
for the value in each table cell.
proc tabulate data=energy format=dollar12.;
Specify subgroups for the analysis. The CLASS statement identifies Region, Division, and
Type as class variables.
class region division type;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate
statistics on the Expenditures variable.
var expenditures;
Define the table pages. The page dimension of the TABLE statement creates one table for
each formatted value of Region and one table for all regions. Text in quotation marks provides
the heading for each page.
table region='Region: ' all='All Regions',
Define the table rows. The row dimension creates a row for each formatted value of Division
and a row for all divisions. Text in quotation marks provides the row headings.
   division all='All Divisions',
Define the table columns. The column dimension of the TABLE statement creates a column
for each formatted value of Type. Each cell that is created by these pages, rows, and columns
contains the sum of the analysis variable Expenditures for all observations that contribute to
that cell. Text in quotation marks specifies headings for the corresponding variable or statistic.
Although Sum is the default statistic, it is specified here so that you can specify a blank for its
heading.
   type='Customer Base'*expenditures=' '*sum=' '
Specify additional table options. RTS= provides 25 characters per line for row headings.
BOX= places the page heading inside the box above the row headings. CONDENSE places as
many tables as possible on one physical page. INDENT= eliminates the row heading for
Division. (Because there is no nesting in the row dimension, there is nothing to indent.)
/ rts=25 box=_page_ condense indent=1;
Format the output. The FORMAT statement assigns formats to the variables Region,
Division, and Type.
format region regfmt. division divfmt. type usetype.;
Output
Energy Expenditures for Each Region and All Regions
(millions of dollars)
---------------------------------------------------
|Region: Northeast      |      Customer Base      |
|                       |-------------------------|
|                       |Residential |  Business  |
|                       | Customers  | Customers  |
|-----------------------+------------+------------|
|New England            |      $7,477|      $5,129|
|-----------------------+------------+------------|
|Middle Atlantic        |     $19,379|     $15,078|
|-----------------------+------------+------------|
|All Divisions          |     $26,856|     $20,207|
---------------------------------------------------

---------------------------------------------------
|Region: West           |      Customer Base      |
|                       |-------------------------|
|                       |Residential |  Business  |
|                       | Customers  | Customers  |
|-----------------------+------------+------------|
|Mountain               |      $5,476|      $4,729|
|-----------------------+------------+------------|
|Pacific                |     $13,959|     $12,619|
|-----------------------+------------+------------|
|All Divisions          |     $19,435|     $17,348|
---------------------------------------------------

---------------------------------------------------
|All Regions            |      Customer Base      |
|                       |-------------------------|
|                       |Residential |  Business  |
|                       | Customers  | Customers  |
|-----------------------+------------+------------|
|New England            |      $7,477|      $5,129|
|-----------------------+------------+------------|
|Middle Atlantic        |     $19,379|     $15,078|
|-----------------------+------------+------------|
|Mountain               |      $5,476|      $4,729|
|-----------------------+------------+------------|
|Pacific                |     $13,959|     $12,619|
|-----------------------+------------+------------|
|All Divisions          |     $46,291|     $37,555|
---------------------------------------------------
TABLE statement:
denominator definition (angle bracket operators)
N statistic
PCTN statistic
variable list
Other features:
FORMAT procedure
[Figure 47.15: survey form (image not reproduced). Legible labels: ID#; Performance; Reliability; Sales staff; Newspaper / Magazine; Word of mouth; Personality; Appearance.]
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. The FORMDLIM=
option replaces the character that delimits page breaks with a single blank. By default, a new
physical page starts whenever a page break occurs.
options nodate pageno=1 linesize=80 pagesize=18 formdlim=' ';
Store the number of observations in a macro variable. The SET statement reads the
descriptor portion of CUSTOMER_RESPONSE at compile time and stores the number of
observations (the number of respondents) in COUNT. The SYMPUT routine stores the value of
COUNT in the macro variable NUM. This variable is available for use by other procedures and
DATA steps for the remainder of the SAS session. The IF 0 condition, which is always false,
ensures that the SET statement, which reads the observations, never executes. (Reading
observations is unnecessary.) The STOP statement ensures that the DATA step executes only
once.
data _null_;
if 0 then set customer_response nobs=count;
call symput('num',left(put(count,4.)));
stop;
run;
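The NOBS= technique can be tried in isolation. The following is a minimal sketch, assuming a hypothetical three-observation data set named RESPONSES in place of CUSTOMER_RESPONSE; the %PUT statement is added here only to display the result in the log.

```sas
/* Hypothetical three-row data set standing in for CUSTOMER_RESPONSE. */
data responses;
   input customer;
   datalines;
1
1
1
;

data _null_;
   /* NOBS= is assigned at compile time, so SET never has to execute. */
   if 0 then set responses nobs=count;
   /* Store the count, left-aligned, in the macro variable NUM. */
   call symput('num', left(put(count, 4.)));
   stop;
run;

/* Write the macro variable to the SAS log. */
%put Number of respondents: &num;
```

For this data set the log line should end with 3. Note that the first argument of SYMPUT is a quoted string that names the macro variable.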
Create the PCTFMT. format. The FORMAT procedure creates a format for percentages. The
PCTFMT. format writes all values with at least one digit to the left of the decimal point and
with one digit to the right of the decimal point. A blank and a percent sign follow the digits.
proc format;
picture pctfmt low-high='009.9 %';
run;
Specify the analysis variables. The VAR statement specifies that PROC TABULATE
calculate statistics on the Factor1, Factor2, Factor3, Factor4, and Customer variables. The
variable Customer must be listed because it is used to calculate the Percent column that is
defined in the TABLE statement.
var factor1-factor4 customer;
Define the table rows and columns. The TABLE statement creates a row for each factor, a
column for frequency counts, and a column for the percentages. Text in quotation marks
supplies headers for the corresponding row or column. The format modifiers F=7. and
F=PCTFMT9. provide formats for values in the associated cells and extend the column widths to
accommodate the column headers.
table factor1='Cost'
   factor2='Performance'
   factor3='Reliability'
   factor4='Sales Staff',
   (n='Count'*f=7. pctn<customer>='Percent'*f=pctfmt9.) ;
Suppress page numbers. The SAS system option NONUMBER suppresses page numbers for
subsequent pages.
options nonumber;
Specify the analysis variables. The VAR statement specifies that PROC TABULATE
calculate statistics on the Source1, Source2, Source3, and Customer variables. The variable
Customer must be in the variable list because it appears in the denominator definition.
var source1-source3 customer;
Define the table rows and columns. The TABLE statement creates a row for each source of
the company name, a column for frequency counts, and a column for the percentages. Text in
quotation marks supplies a heading for the corresponding row or column.
table source1='TV/Radio'
   source2='Newspaper'
   source3='Word of Mouth',
   (n='Count'*f=7. pctn<customer>='Percent'*f=pctfmt9.) ;
Specify the title and footnote. The macro variable NUM resolves to the number of
respondents. The FOOTNOTE statement uses double rather than single quotation marks so
that the macro variable will resolve.
title 'Source of Company Name';
footnote "Number of Respondents: &num";
run;
Reset the SAS system options. The FORMDLIM= option resets the page delimiter to a page
eject. The NUMBER option resumes the display of page numbers on subsequent pages.
options formdlim='' number;
Output
TABLE statement:
N statistic
Other features:
FORMAT procedure
TRANSPOSE procedure
Data set options:
RENAME=
This report of listener preferences shows how many listeners select each type of
programming during each of seven time periods on a typical weekday. The data was
collected by a survey, and the results were stored in a SAS data set. Although this data
set contains all the information needed for this report, the information is not arranged
in a way that PROC TABULATE can use.
To make this crosstabulation of time of day and choice of radio programming, you
must have a data set that contains a variable for time of day and a variable for
programming preference. PROC TRANSPOSE reshapes the data into a new data set
that contains these variables. Once the data are in the appropriate form, PROC
TABULATE creates the report.
[Figure 47.16: listener survey form (image not reproduced). The form asks what kind of radio programming the respondent listens to on a typical weekday and on a typical weekend day. Use codes 1-8 for question 5 and codes 0-8 for questions 6-19: 0 Do not listen at that time, 1 Rock, 2 Top 40, 3 Country, 4 Jazz, 5 Classical, 6 Easy Listening, 7 News/Information/Talk, 8 Other.]
An external file on page 1405 contains the raw data for the survey. Several lines
from that file appear here.
967 32 f 5 3 5
7 5 5 5 7 0 0 0 8 7 0 0 8 0
781 30 f 2 3 5
5 0 0 0 5 0 0 0 4 7 5 0 0 0
859 39 f 1 0 5
1 0 0 0 1 0 0 0 0 0 0 0 0 0
. . . more data lines . . .
859 32 m .25 .25 1
1 0 0 0 0 0 0 0 1 0 0 0 0 0
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=132 pagesize=40;
Create the RADIO data set and specify the input file. RADIO contains data from a survey
of 336 listeners. The data set contains information about listeners and their preferences in radio
programming. The INFILE statement specifies the external file that contains the data.
MISSOVER prevents the input pointer from going to the next record if it fails to find values in
the current line for all variables that are listed in the INPUT statement.
data radio;
infile 'input-file' missover;
Read the appropriate data line, assign a unique number to each respondent, and
write an observation to RADIO. Each raw-data record contains two lines of information
about each listener. The INPUT statement reads only the information that this example needs.
The / line control skips the first line of information in each record. The rest of the INPUT
statement reads Time1-Time7 from the beginning of the second line. These variables represent
the listener's radio programming preference for each of seven time periods on weekdays (see
Figure 47.16 on page 1261). The listener=_N_ statement assigns a unique identifier to each
listener. An observation is automatically written to RADIO at the end of each iteration.
input /(Time1-Time7) ($1. +1);
listener=_n_;
run;
Create the $TIMEFMT. and $PGMFMT. formats. PROC FORMAT creates formats for the
time of day and the choice of programming.
proc format;
value $timefmt 'Time1'='6-9 a.m.'
   'Time2'='9 a.m. to noon'
   'Time3'='noon to 1 p.m.'
   'Time4'='1-4 p.m.'
   'Time5'='4-6 p.m.'
   'Time6'='6-10 p.m.'
   'Time7'='10 p.m. to 2 a.m.'
value $pgmfmt
run;
Reshape the data by transposing the RADIO data set. PROC TRANSPOSE creates
RADIO_TRANSPOSED. This data set contains the variable Listener from the original data set.
It also contains two transposed variables: Timespan and Choice. Timespan contains the names
of the variables (Time1-Time7) from the input data set that are transposed to form observations
in the output data set. Choice contains the values of these variables. (See A Closer Look on
page 1264 for a complete explanation of the PROC TRANSPOSE step.)
proc transpose data=radio
out=radio_transposed(rename=(col1=Choice))
name=Timespan;
by listener;
var time1-time7;
Format the transposed variables. The FORMAT statement permanently associates these
formats with the variables in the output data set.
format timespan $timefmt. choice $pgmfmt.;
run;
Create the report and specify the table options. The FORMAT= option specifies the default
format for the values in each table cell.
proc tabulate data=radio_transposed format=12.;
Specify subgroups for the analysis. The CLASS statement identifies Timespan and Choice
as class variables.
class timespan choice;
Define the table rows and columns. The TABLE statement creates a row for each formatted
value of Timespan and a column for each formatted value of Choice. In each column are values
for the N statistic. Text in quotation marks supplies headings for the corresponding rows or
columns.
table timespan='Time of Day',
   choice='Choice of Radio Program'*n='Number of Listeners';
Output
---------------------------------------------------------------------------------------------------------------
|                               |                           Choice of Radio Program                           |
|                               |-----------------------------------------------------------------------------|
|                               |            |            |            |   Jazz,    |            |            |
|                               |            |            |            | Classical, |   News/    |            |
|                               |            |Rock and Top|            |  and Easy  |Information |            |
|                               |Don't Listen|     40     |  Country   | Listening  |   /Talk    |   Other    |
|                               |------------+------------+------------+------------+------------+------------|
|                               | Number of  | Number of  | Number of  | Number of  | Number of  | Number of  |
|                               | Listeners  | Listeners  | Listeners  | Listeners  | Listeners  | Listeners  |
|-------------------------------+------------+------------+------------+------------+------------+------------|
|Time of Day                    |            |            |            |            |            |            |
|-------------------------------|            |            |            |            |            |            |
|6-9 a.m.                       |          34|         143|           7|          39|          96|          17|
|-------------------------------+------------+------------+------------+------------+------------+------------|
|9 a.m. to noon                 |         214|          59|           5|          51|           3|           4|
|-------------------------------+------------+------------+------------+------------+------------+------------|
|noon to 1 p.m.                 |         238|          55|           3|          27|           9|           4|
|-------------------------------+------------+------------+------------+------------+------------+------------|
|1-4 p.m.                       |         216|          60|           5|          50|           2|           3|
|-------------------------------+------------+------------+------------+------------+------------+------------|
|4-6 p.m.                       |          56|         130|           6|          57|          69|          18|
|-------------------------------+------------+------------+------------+------------+------------+------------|
|6-10 p.m.                      |         202|          54|           9|          44|          20|           7|
|-------------------------------+------------+------------+------------+------------+------------+------------|
|10 p.m. to 2 a.m.              |         264|          29|           3|          36|           2|           2|
---------------------------------------------------------------------------------------------------------------
A Closer Look
Reshape the data
The original input data set has all the information that you need to make the
crosstabular report, but PROC TABULATE cannot use the information in that form.
PROC TRANSPOSE rearranges the data so that each observation in the new data set
contains the variable Listener, a variable for time of day, and a variable for
programming preference. Figure 47.17 on page 1265 illustrates the transposition.
PROC TABULATE uses this new data set to create the crosstabular report.
PROC TRANSPOSE restructures data so that values that were stored in one
observation are written to one variable. You can specify which variables you want to
transpose. This section illustrates how PROC TRANSPOSE reshapes the data. The
following section explains the PROC TRANSPOSE step in this example.
When you transpose with BY processing, as this example does, you create from each
BY group one observation for each variable that you transpose. In this example,
Listener is the BY variable. Each observation in the input data set is a BY group
because the value of Listener is unique for each observation.
This example transposes seven variables, Time1 through Time7. Therefore, the
output data set has seven observations from each BY group (each observation) in the
input data set.
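The reshaping can be checked on a small scale. The following is a minimal sketch, not the example's own program: the data set RADIO_DEMO, its two listeners, and its three time slots are hypothetical and are used only to keep the output short.

```sas
/* Hypothetical data: two listeners and three time slots. */
data radio_demo;
   input (time1-time3) ($1. +1);
   listener=_n_;               /* unique identifier, one per observation */
   datalines;
1 0 7
5 5 0
;

/* Each BY group (one listener) yields one output observation per  */
/* transposed variable. NAME= names the variable that holds the    */
/* original variable names; COL1 is renamed to Choice.             */
proc transpose data=radio_demo
               out=radio_demo_t(rename=(col1=Choice))
               name=Timespan;
   by listener;
   var time1-time3;
run;

proc print data=radio_demo_t noobs;
run;
```

Two BY groups and three transposed variables should give six observations in RADIO_DEMO_T, each with the variables Listener, Timespan, and Choice.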
[Figure 47.17: transposition diagram (image not reproduced). The input data set contains the variables Listener and Time1 through Time7. The output data set contains the variables Listener, _NAME_, and COL1, with one observation for each of Time1 through Time7 per listener.]
This example shows how to use three percentage sum statistics: COLPCTSUM,
REPPCTSUM, and ROWPCTSUM.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=105 pagesize=60;
Create the FUNDRAIS data set. FUNDRAIS contains data on student sales during a school
fund-raiser. A DATA step creates the data set.
data fundrais;
length name $ 8 classrm $ 1;
input @1 team $ @8 classrm $ @10 name $
   @19 pencils @23 tablets;
sales=pencils + tablets;
datalines;
BLUE   A ANN       4   8
RED    A MARY      5  10
GREEN  A JOHN      6   4
RED    A BOB       2   3
BLUE   B FRED      6   8
GREEN  B LOUISE   12   2
BLUE   B ANNETTE   .   9
RED    B HENRY     8  10
GREEN  A ANDREW    3   5
RED    A SAMUEL   12  10
BLUE   A LINDA     7  12
GREEN  A SARA      4   .
BLUE   B MARTIN    9  13
RED    B MATTHEW   7   6
GREEN  B BETH     15  10
RED    B LAURA     4   3
;
Create the PCTFMT. format. The FORMAT procedure creates a format for percentages. The
PCTFMT. format writes all values with at least one digit, a blank, and a percent sign.
proc format;
picture pctfmt low-high='009 %';
run;
Create the report and specify the table options. The FORMAT= option specifies up to
seven digits as the default format for the value in each table cell.
proc tabulate format=7.;
Specify subgroups for the analysis. The CLASS statement identifies Team and Classrm as
class variables.
class team classrm;
Specify the analysis variable. The VAR statement specifies that PROC TABULATE calculate
statistics on the Sales variable.
var sales;
Define the table rows. The row dimension of the TABLE statement creates a row for each
formatted value of Team. The last row of the report summarizes sales for all teams.
table (team all),
Define the table columns. The column dimension of the TABLE statement creates a column
for each formatted value of Classrm. Crossed within each value of Classrm is the analysis
variable (sales) with a blank label. Nested within each column are columns that summarize
sales for the class.
- The first nested column, labeled Sum, is the sum of sales for the row for the classroom.
- The second nested column, labeled ColPctSum, is the percentage of the sum of sales for the
  row for the classroom in relation to the sum of sales for all teams in the classroom.
- The third nested column, labeled RowPctSum, is the percentage of the sum of sales for the
  row for the classroom in relation to the sum of sales for the row for all classrooms.
- The fourth nested column, labeled RepPctSum, is the percentage of the sum of sales for the
  row for the classroom in relation to the sum of sales for all teams for all classrooms.
The last column of the report summarizes sales for the row for all classrooms.
   classrm='Classroom'*sales=' '*(sum
      colpctsum*f=pctfmt9.
      rowpctsum*f=pctfmt9.
      reppctsum*f=pctfmt9.)
   all*sales*sum=' '
Specify the row title space and eliminate blank row headings. RTS= provides 20
characters per line for row headings.
/rts=20;
run;
Output
                                            Fundraiser Sales

--------------------------------------------------------------------------------------------------------
|                  |                                 Classroom                                 |       |
|                  |---------------------------------------------------------------------------|       |
|                  |                  A                  |                  B                  |  All  |
|                  |-------------------------------------+-------------------------------------+-------|
|                  |  Sum  |ColPctSum|RowPctSum|RepPctSum|  Sum  |ColPctSum|RowPctSum|RepPctSum|  Sum  |
|------------------+-------+---------+---------+---------+-------+---------+---------+---------+-------|
|team              |       |         |         |         |       |         |         |         |       |
|------------------|       |         |         |         |       |         |         |         |       |
|BLUE              |     31|     34 %|     46 %|     15 %|     36|     31 %|     53 %|     17 %|     67|
|------------------+-------+---------+---------+---------+-------+---------+---------+---------+-------|
|GREEN             |     18|     19 %|     31 %|      8 %|     39|     34 %|     68 %|     19 %|     57|
|------------------+-------+---------+---------+---------+-------+---------+---------+---------+-------|
|RED               |     42|     46 %|     52 %|     20 %|     38|     33 %|     47 %|     18 %|     80|
|------------------+-------+---------+---------+---------+-------+---------+---------+---------+-------|
|All               |     91|    100 %|     44 %|     44 %|    113|    100 %|     55 %|     55 %|    204|
--------------------------------------------------------------------------------------------------------
A Closer Look
Here are the percentage sum statistic calculations used to produce the output for the
Blue Team in Classroom A:
COLPCTSUM=31/91*100=34%
ROWPCTSUM=31/67*100=46%
REPPCTSUM=31/204*100=15%
Similar calculations were used to produce the output for the remaining teams and
classrooms.
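The same arithmetic can be reproduced in a short DATA step. This is only a sketch: the variable names are made up for illustration, and the totals are copied by hand from the report above rather than computed from the input data.

```sas
data _null_;
   cell     = 31;    /* sales for the Blue team in classroom A      */
   coltotal = 91;    /* sales for all teams in classroom A          */
   rowtotal = 67;    /* sales for the Blue team in all classrooms   */
   reptotal = 204;   /* sales for all teams in all classrooms       */

   colpctsum = cell / coltotal * 100;
   rowpctsum = cell / rowtotal * 100;
   reppctsum = cell / reptotal * 100;

   /* Write the unformatted percentages to the log */
   put colpctsum= 6.2 rowpctsum= 6.2 reppctsum= 6.2;
run;
```

The unformatted results are approximately 34.07, 46.27, and 15.20. The report shows 34 %, 46 %, and 15 % because the PCTFMT. picture keeps only the integer part.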
TABLE statement:
   ALL class variable
   denominator definitions (angle bracket operators)
   N statistic
   PCTN statistic
Other features:
   FORMAT procedure
- the total for that gender in all job classes (column percentage)
- the total for all employees.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the JOBCLASS data set. JOBCLASS contains encoded information about the gender
and job class of employees at a fictitious company.
data jobclass;
   input Gender Occupation @@;
   datalines;
1 1  1 1  1 1  1 1  1 1  1 1  1 1
1 2  1 2  1 2  1 2  1 2  1 2  1 2
1 3  1 3  1 3  1 3  1 3  1 3  1 3
1 1  1 1  1 1  1 2  1 2  1 2  1 2
1 2  1 2  1 3  1 3  1 4  1 4  1 4
1 4  1 4  1 4  1 1  1 1  1 1  1 1
1 1  1 2  1 2  1 2  1 2  1 2  1 2
1 2  1 3  1 3  1 3  1 3  1 4  1 4
1 4  1 4  1 4  1 1  1 3  2 1  2 1
2 1  2 1  2 1  2 1  2 1  2 2  2 2
2 2  2 2  2 2  2 3  2 3  2 3  2 4
2 4  2 4  2 4  2 4  2 4  2 1  2 3
2 3  2 3  2 3  2 3  2 4  2 4  2 4
2 4  2 4  2 1  2 1  2 1  2 1  2 1
2 2  2 2  2 2  2 2  2 2  2 2  2 2
2 3  2 3  2 4  2 4  2 4  2 1  2 1
2 1  2 1  2 1  2 2  2 2  2 2  2 3
2 3  2 3  2 3  2 4
;
Create the GENDFMT. and OCCUPFMT. formats. PROC FORMAT creates formats for the
variables Gender and Occupation.
proc format;
   value gendfmt 1='Female'
                 2='Male'
                 other='*** Data Entry Error ***';
   value occupfmt 1='Technical'
                  2='Manager/Supervisor'
                  3='Clerical'
                  4='Administrative'
                  other='*** Data Entry Error ***';
run;
Create the report and specify the table options. The FORMAT= option specifies the 8.2
format as the default format for the value in each table cell.
proc tabulate data=jobclass format=8.2;
Specify subgroups for the analysis. The CLASS statement identifies Gender and Occupation
as class variables.
class gender occupation;
Define the table rows. The TABLE statement creates a set of rows for each formatted value of
Occupation and for all jobs together. Text in quotation marks supplies a header for the
corresponding row.
The asterisk in the row dimension indicates that the statistics that follow in parentheses are
nested within the values of Occupation and All to form sets of rows. Each set of rows includes
four statistics:
- N, the frequency count. The format modifier (F=9.) writes the values of N without the
  decimal places that the default format would use. It also extends the column width to nine
  characters so that the word Employees fits on one line.
Define the table columns and specify the amount of space for row headings. The
column dimension creates a column for each formatted value of Gender and for all employees.
Text in quotation marks supplies the heading for the corresponding column. The RTS= option
provides 50 characters per line for row headings.
gender='Gender' all='All Employees' / rts=50;
Format the output. The FORMAT statement assigns formats to the variables Gender and
Occupation.
format gender gendfmt. occupation occupfmt.;
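For orientation, the pieces described above can be assembled into one PROC TABULATE step. The following is a sketch, not the verbatim program. The row dimension and the TITLE statements are reconstructed from the description and from the labels and titles that appear in the output, so the exact label text should be treated as an assumption. The step also assumes that the JOBCLASS data set and the GENDFMT. and OCCUPFMT. formats have already been created as shown earlier.

```sas
proc tabulate data=jobclass format=8.2;
   class gender occupation;

   /* Row dimension (reconstructed): four statistics nested within */
   /* each value of Occupation and within All                      */
   table (occupation='Job Class' all='All Jobs')
            *(n='Number of employees'*f=9.
              pctn<gender all>='Percent of row total'
              pctn<occupation all>='Percent of column total'
              pctn='Percent of total'),

   /* Column dimension, as given in the example */
         gender='Gender' all='All Employees' / rts=50;

   format gender gendfmt. occupation occupfmt.;

   /* Titles assumed from the heading of the output */
   title 'Gender Distribution';
   title2 'within Job Classes';
run;
```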
Output
                              Gender Distribution
                               within Job Classes

--------------------------------------------------------------------------------
|                                                |      Gender       |         |
|                                                |-------------------|   All   |
|                                                | Female  |  Male   |Employees|
|------------------------------------------------+---------+---------+---------|
|Job Class              |                        |         |         |         |
|-----------------------+------------------------|         |         |         |
|Technical              |Number of employees     |       16|       18|       34|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    47.06|    52.94|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |    26.23|    29.03|    27.64|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |    13.01|    14.63|    27.64|
|-----------------------+------------------------+---------+---------+---------|
|Manager/Supervisor     |Number of employees     |       20|       15|       35|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    57.14|    42.86|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |    32.79|    24.19|    28.46|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |    16.26|    12.20|    28.46|
|-----------------------+------------------------+---------+---------+---------|
|Clerical               |Number of employees     |       14|       14|       28|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    50.00|    50.00|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |    22.95|    22.58|    22.76|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |    11.38|    11.38|    22.76|
|-----------------------+------------------------+---------+---------+---------|
|Administrative         |Number of employees     |       11|       15|       26|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    42.31|    57.69|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |    18.03|    24.19|    21.14|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |     8.94|    12.20|    21.14|
|-----------------------+------------------------+---------+---------+---------|
|All Jobs               |Number of employees     |       61|       62|      123|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    49.59|    50.41|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |   100.00|   100.00|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |    49.59|    50.41|   100.00|
--------------------------------------------------------------------------------
A Closer Look
The part of the TABLE statement that defines the rows of the table uses the PCTN
statistic to calculate three different percentages.
In all calculations of PCTN, the numerator is N, the frequency count for one cell of
the table. The denominator for each occurrence of PCTN is determined by the
denominator definition. The denominator definition appears in angle brackets after the
keyword PCTN. It is a list of one or more expressions. The list tells PROC TABULATE
which frequency counts to sum for the denominator.
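One way to cross-check the three percentages independently of the denominator definitions is a PROC FREQ crosstabulation, which by default prints the frequency, the overall percent, the row percent, and the column percent in each cell. This sketch is not part of the original example and assumes that the JOBCLASS data set and the two formats already exist. With Occupation as the row variable, Row Pct corresponds to the percent of row total in the report, Col Pct to the percent of column total, and Percent to the percent of total.

```sas
proc freq data=jobclass;
   /* Rows are job classes, columns are genders, as in the report */
   tables occupation*gender;
   format gender gendfmt. occupation occupfmt.;
run;
```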
Contents of Subtables

Subtable                  Number of categories
Occupation and Gender     8
Occupation and All        4
All and Gender            2
All and All               1

Figure 47.18 highlights these subtables and the frequency counts for each category.
Figure 47.18

[The report shown under Output, divided into its four subtables. The frequency counts
for each category are listed here; the percentage rows are the same as in the report.]

Occupation and Gender
                          Female     Male
   Technical                  16       18
   Manager/Supervisor         20       15
   Clerical                   14       14
   Administrative             11       15

Occupation and All
                          All Employees
   Technical                  34
   Manager/Supervisor         35
   Clerical                   28
   Administrative             26

All and Gender
                          Female     Male
   All Jobs                   61       62

All and All
                          All Employees
   All Jobs                  123
Each use of PCTN nests a row of statistics within each value of Occupation and All.
Each denominator definition tells PROC TABULATE which frequency counts to sum for
the denominators in that row. This section explains how PROC TABULATE interprets
these denominator definitions.
Row Percentages
The part of the TABLE statement that calculates the row percentages and that labels
the row is
pctn<gender all>='Row percent'
Consider how PROC TABULATE interprets this denominator definition for each
subtable.
[Figure: the report shown under Output, repeated here to illustrate the Occupation and Gender subtable.]
PROC TABULATE looks at the first element in the denominator definition, Gender,
and asks if Gender contributes to the subtable. Because Gender does contribute to the
subtable, PROC TABULATE uses it as the denominator definition. This denominator
definition tells PROC TABULATE to sum the frequency counts for all occurrences of
Gender within the same value of Occupation.
For example, the denominator for the category female, technical is the sum of all
frequency counts for all categories in this subtable for which the value of Occupation is
technical. There are two such categories: female, technical and male,
technical. The corresponding frequency counts are 16 and 18. Therefore, the
denominator for this category is 16+18, or 34.
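The row-percentage denominators for this subtable can be listed directly by counting the observations for each value of Occupation. The query below is an illustrative sketch, assuming the JOBCLASS data set and the OCCUPFMT. format from the program above; the column name Denominator is made up for the example.

```sas
proc sql;
   /* One row per job class; the count is the sum of the female */
   /* and male frequency counts for that job class              */
   select occupation format=occupfmt.,
          count(*) as Denominator
      from jobclass
      group by occupation;
quit;
```

For Technical the count is 34, which matches 16+18 in the text.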
[Figure: the report shown under Output, repeated here to illustrate the All and Gender subtable.]
PROC TABULATE looks at the first element in the denominator definition, Gender,
and asks if Gender contributes to the subtable. Because Gender does contribute to the
subtable, PROC TABULATE uses it as the denominator definition. This denominator
definition tells PROC TABULATE to sum the frequency counts for all occurrences of
Gender in the subtable.
For example, the denominator for the category all, female is the sum of the
frequency counts for all, female and all, male. The corresponding frequency counts
are 61 and 62. Therefore, the denominator for cells in this subtable is 61+62, or 123.
[Figure: the report shown under Output, repeated here to illustrate the Occupation and All subtable.]
PROC TABULATE looks at the first element in the denominator definition, Gender,
and asks if Gender contributes to the subtable. Because Gender does not contribute to
the subtable, PROC TABULATE looks at the next element in the denominator
definition, which is All. The variable All does contribute to this subtable, so PROC
TABULATE uses it as the denominator definition. All is a reserved class variable with
only one category. Therefore, this denominator definition tells PROC TABULATE to use
the frequency count of All as the denominator.
For example, the denominator for the category clerical, all is the frequency
count for that category, 28.
Note: In these table cells, because the numerator and the denominator are the
same, the row percentages in this subtable are all 100.
[Figure: the report shown under Output, repeated here to illustrate the All and All subtable.]
PROC TABULATE looks at the first element in the denominator definition, Gender,
and asks if Gender contributes to the subtable. Because Gender does not contribute to
the subtable, PROC TABULATE looks at the next element in the denominator
definition, which is All. The variable All does contribute to this subtable, so PROC
TABULATE uses it as the denominator definition. All is a reserved class variable with
only one category. Therefore, this denominator definition tells PROC TABULATE to use
the frequency count of All as the denominator.
There is only one category in this subtable: all, all. The denominator for this
category is 123.
Note: In this table cell, because the numerator and denominator are the same, the
row percentage in this subtable is 100.
Column Percentages
The part of the TABLE statement that calculates the column percentages and labels the
row is
pctn<occupation all>='Column percent'
Consider how PROC TABULATE interprets this denominator definition for each
subtable.
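In the report, the column percentages for the Female and Male columns are consistent with denominators equal to the total count for each gender (for example, 16/61*100 is about 26.23). Those totals can be listed with a query like the one below. This is a sketch for checking the numbers, assuming the JOBCLASS data set and the GENDFMT. format from the program above; the column name Denominator is made up for the example.

```sas
proc sql;
   /* One row per gender; the count is the sum of the frequency */
   /* counts for that gender over all job classes               */
   select gender format=gendfmt.,
          count(*) as Denominator
      from jobclass
      group by gender;
quit;
```

The counts are 61 for Female and 62 for Male, which are the values shown in the All Jobs row of the report.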
[Figures: the report shown under Output, repeated here to illustrate the subtables.]
Gender Distribution
within Job Classes
--------------------------------------------------------------------------------
|                                                |      Gender       |         |
|                                                |-------------------|   All   |
|                                                | Female  |  Male   |Employees|
|------------------------------------------------+---------+---------+---------|
|Job Class              |                        |         |         |         |
|-----------------------+------------------------|         |         |         |
|Technical              |Number of employees     |       16|       18|       34|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    47.06|    52.94|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |    26.23|    29.03|    27.64|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |    13.01|    14.63|    27.64|
|-----------------------+------------------------+---------+---------+---------|
|Manager/Supervisor     |Number of employees     |       20|       15|       35|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    57.14|    42.86|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |    32.79|    24.19|    28.46|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |    16.26|    12.20|    28.46|
|-----------------------+------------------------+---------+---------+---------|
|Clerical               |Number of employees     |       14|       14|       28|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    50.00|    50.00|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |    22.95|    22.58|    22.76|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |    11.38|    11.38|    22.76|
|-----------------------+------------------------+---------+---------+---------|
|Administrative         |Number of employees     |       11|       15|       26|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    42.31|    57.69|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |    18.03|    24.19|    21.14|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |     8.94|    12.20|    21.14|
|-----------------------+------------------------+---------+---------+---------|
|All Jobs               |Number of employees     |       61|       62|      123|
|                       |------------------------+---------+---------+---------|
|                       |Percent of row total    |    49.59|    50.41|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of column total |   100.00|   100.00|   100.00|
|                       |------------------------+---------+---------+---------|
|                       |Percent of total        |    49.59|    50.41|   100.00|
--------------------------------------------------------------------------------
The part of the TABLE statement that calculates the total percentages and labels the
row is
pctn='Percent of total'
If you do not specify a denominator definition, then PROC TABULATE obtains the
denominator for a cell by totaling all the frequency counts in the subtable. Table 47.9
on page 1279 summarizes the process for all subtables in this example.
Table 47.9
Frequency counts                    Total
16, 18, 20, 15, 14, 14, 11, 15      123
34, 35, 28, 26                      123
61, 62                              123
123                                 123
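As a hedged illustration of the default denominator, the following DATA step recomputes one cell by hand. It assumes the frequency counts shown in the output for this example; the variable names total and pct are chosen only for this sketch and do not come from the example program.

```sas
data _null_;
   /* Sum of all frequency counts in the Job Class by Gender subtable */
   total = 16 + 18 + 20 + 15 + 14 + 14 + 11 + 15;
   /* Percent of total for the Technical, Female cell (16 employees) */
   pct = 100 * 16 / total;
   put total= pct= 6.2;
run;
```

The log should show total=123 and a pct value that rounds to 13.01, which agrees with the Percent of total row for Technical, Female.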
STYLE= option in
PROC TABULATE statement
CLASSLEV statement
KEYWORD statement
TABLE statement
VAR statement
Other features:
This example creates HTML, RTF, and PDF files and specifies style elements for
various table regions.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= and PAGESIZE= are
not set for this example because they have no effect on HTML, RTF, and Printer output.
options nodate pageno=1;
Specify the ODS output filenames. By opening multiple ODS destinations, you can produce
multiple output files in a single execution. The ODS HTML statement produces output that is
written in HTML. The ODS PDF statement produces output in Portable Document Format
(PDF). The ODS RTF statement produces output in Rich Text Format (RTF). The output from
PROC TABULATE goes to each of these files.
ods html body='external-HTML-file';
ods pdf file='external-PDF-file';
ods rtf file='external-RTF-file';
Specify the table options. The STYLE= option in the PROC TABULATE statement specifies
the style element for the data cells of the table.
proc tabulate data=energy style=[font_weight=bold];
Specify subgroups for the analysis. The STYLE= option in the CLASS statement specifies
the style element for the class variable name headings.
class region division type / style=[just=center];
Specify the style attributes for the class variable value headings. The STYLE= option in
the CLASSLEV statement specifies the style element for the class variable level value headings.
classlev region division type / style=[just=left];
Specify the analysis variable and its style attributes. The STYLE= option in the VAR
statement specifies a style element for the variable name headings.
var expenditures / style=[font_size=3];
Specify the style attributes for keywords, and label the ALL keyword. The STYLE=
option in the KEYWORD statement specifies a style element for keywords. The KEYLABEL
statement assigns a label to the keyword.
keyword all sum / style=[font_width=wide];
keylabel all="Total";
Define the table rows and columns and their style attributes. The STYLE= option in the
dimension expression overrides any other STYLE= specifications in PROC TABULATE that
specify attributes for table cells. The STYLE= option after the slash (/) specifies attributes for
parts of the table other than table cells.
table (region all)*(division all*[style=[background=yellow]]),
(type all)*(expenditures*f=dollar10.) /
style=[bordercolor=blue]
Specify the style attributes for cells with missing values. The STYLE= option in the
MISSTEXT option of the TABLE statement specifies a style element to use for the text in table
cells that contain missing values.
misstext=[label="Missing" style=[font_weight=light]]
Specify the style attributes for the box above the row titles. The STYLE= option in the
BOX option of the TABLE statement specifies a style element to use for text in the box above
the row titles.
box=[label="Region by Division by Type"
style=[font_style=italic]];
Format the class variable values. The FORMAT statement assigns formats to Region,
Division, and Type.
format region regfmt. division divfmt. type usetype.;
HTML Output
PDF Output
RTF Output
References
Jain, Raj and Chlamtac, Imrich (1985), "The P² Algorithm for Dynamic Calculation of
Quantiles and Histograms without Storing Observations," Communications of the
Association for Computing Machinery, 28:10.
CHAPTER 48
The TEMPLATE Procedure
Information about the TEMPLATE Procedure
See:
CHAPTER 49
The TIMEPLOT Procedure
Overview: TIMEPLOT Procedure 1287
Syntax: TIMEPLOT Procedure 1289
PROC TIMEPLOT Statement 1290
BY Statement 1291
CLASS Statement 1291
ID Statement 1292
PLOT Statement 1293
Results: TIMEPLOT Procedure 1297
Data Considerations 1297
Procedure Output 1297
Page Layout 1297
Contents of the Listing 1298
ODS Table Names 1298
Missing Values 1298
Examples: TIMEPLOT Procedure 1299
Example 1: Plotting a Single Variable 1299
Example 2: Customizing an Axis and a Plotting Symbol 1301
Example 3: Using a Variable for a Plotting Symbol 1303
Example 4: Superimposing Two Plots 1306
Example 5: Showing Multiple Observations on One Line of a Plot 1308
Output 49.1 illustrates a simple report that you can produce with PROC TIMEPLOT.
This report shows sales of refrigerators for two sales representatives during the first six
weeks of the year. The statements that produce the output follow. A DATA step in
Example 1 on page 1299 creates the data set SALES.
options linesize=64 pagesize=60 nodate
pageno=1;
Output 49.1

Month    Week     Icebox
    1       1    3450.94
    1       1    2520.04
    1       2    3240.67
    1       2    2675.42
    1       3    3160.45
    1       3    2805.35
    1       4    3400.24
    1       4    2870.61
    2       1    3550.43
    2       1    2730.09
    2       2    3385.74
    2       2    2670.93

[Plot: Icebox on a horizontal axis from min 2520.04 to max 3550.43; plotting symbol I]
Output 49.2 is a more complicated report of the same data set that is used to create
Output 49.1. The statements that create this report
- create one plot for the sale of refrigerators and one for the sale of stoves
For an explanation of the program that produces this report, see Example 5 on page
1308.
Output 49.2

                      Seller :Kreitz    Seller :LeGrange
Month       Week           Stove              Stove
January        1         $1,312.61           $728.13
January        2           $222.35           $184.24
January        3         $2,263.33           $267.35
January        4         $1,787.45           $274.51
February       1         $2,910.37           $397.98
February       2           $819.69         $2,242.24

[Plot: Stove on a horizontal axis from min $184.24 to max $2,910.37; plotting symbols K and L, with ! where the two symbols coincide]

                          Kreitz            LeGrange
Month       Week          Icebox             Icebox
January        1         $3,450.94         $2,520.04
January        2         $3,240.67         $2,675.42
January        3         $3,160.45         $2,805.35
January        4         $3,400.24         $2,870.61
February       1         $3,550.43         $2,730.09
February       2         $3,385.74         $2,670.93

[Plot: Icebox on a horizontal axis from min $2,520.04 to max $3,550.43; plotting symbols K and L]
Supports the Output Delivery System. See Output Delivery System on page 32
for details.
Tip:
Chapter 3, Statements with the Same Function in Multiple Procedures, on page 57 for
details. You can also use any global statements. See Global Statements on page 18 for
a list.
To do this
BY
CLASS
ID
PLOT
Options
DATA=SAS-data-set
specification.
Default: 2
Range: 0-12
Featured in:
SPLIT='split-character'
specifies a split character, which controls line breaks in column headings. It also
specifies that labels be used as column headings. PROC TIMEPLOT breaks a column
heading when it reaches the split character and continues the heading on the next
line. Unless the split character is a blank, it is not part of the column heading. Each
occurrence of the split character counts toward the 256-character maximum for a
label.
Alias: S=
Default: blank (' ')
Note: Column headings can occupy up to three lines. If the column label can be
split into more lines than this fixed number, then the split character is used only as a
recommendation on how to split the label.
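A minimal sketch of SPLIT= follows. The data set SALES_DEMO, its two observations, and the label text are hypothetical and serve only to show where the heading breaks.

```sas
/* Hypothetical two-observation data set for illustration */
data sales_demo;
   input Week Icebox;
   datalines;
1 3450.94
2 3240.67
;

proc timeplot data=sales_demo split='*';
   plot icebox;
   id week;
   /* The column heading breaks at the asterisk: */
   /* "Refrigerator" on one line, "Sales" on the next */
   label icebox='Refrigerator*Sales';
run;
```

Because the split character here is not a blank, the asterisk itself does not appear in the column heading.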
UNIFORM
uniformly scales the horizontal axis across all BY groups. By default, PROC
TIMEPLOT separately determines the scale of the axis for each BY group.
Interaction: UNIFORM also affects the calculation of means for reference lines (see
BY Statement
Produces a separate plot for each BY group.
Main discussion: BY on page 58
BY <DESCENDING> variable-1
<...<DESCENDING> variable-n>
<NOTSORTED>;
Required Arguments
variable
specifies the variable that the procedure uses to form BY groups. You can specify
more than one variable. If you do not use the NOTSORTED option in the BY
statement, then either the observations in the data set must be sorted by all the
variables that you specify, or they must be indexed appropriately. These variables are
called BY variables.
Options
DESCENDING
specifies that the data set is sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The data is grouped in another way, for example, chronological order.
The requirement for ordering or indexing observations according to the values of
BY variables is suspended for BY-group processing when you use the NOTSORTED
option. In fact, the procedure does not use an index if you specify NOTSORTED. The
procedure defines a BY group as a set of contiguous observations that have the same
values for all BY variables. If observations that have the same values for the BY
variables are not contiguous, then the procedure treats each contiguous set as a
separate BY group.
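The following sketch illustrates NOTSORTED with data grouped in chronological order. The data set MONTHLY and its values are hypothetical; the month names are stored in calendar order, which is not alphabetic order.

```sas
/* Hypothetical data: January precedes February, so the BY variable */
/* is grouped chronologically but is not in ascending sort order.   */
data monthly;
   input Month $ Week Units;
   datalines;
January 1 10
January 2 12
February 1 9
February 2 14
;

proc timeplot data=monthly;
   /* Each contiguous run of equal Month values forms one BY group */
   by month notsorted;
   plot units;
   id week;
run;
```

Without NOTSORTED, a BY statement on this data set would fail the sort-order check, because January sorts after February alphabetically.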
CLASS Statement
Groups data according to the values of the class variables.
Tip: PROC TIMEPLOT uses the formatted values of the CLASS variables to form
classes. Thus, if a format groups the values, then the procedure uses those groups.
Featured in:
CLASS variable(s);
Required Arguments
variable(s)
specifies one or more variables that the procedure uses to group the data. Variables
in a CLASS statement are called class variables. Class variables can be numeric or
character. Class variables can have continuous values, but they typically have a few
discrete values that define the classifications of the variable. You do not have to sort
the data by class variables.
The values of the class variables appear in the listing. PROC TIMEPLOT prints
and plots one line each time the combination of values of the class variables changes.
Therefore, the output typically is more meaningful if you sort or group the data
according to values of the class variables.
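One way to group the data before using a CLASS statement is to sort it first. The sketch below assumes a small hypothetical data set, SALES_DEMO, whose observations are deliberately out of order; the output data set name SALES_SORTED is also an assumption.

```sas
/* Hypothetical data in which equal Month/Week combinations are not adjacent */
data sales_demo;
   input Month Week Seller $ Icebox;
   datalines;
1 2 Kreitz 3240.67
1 1 LeGrange 2520.04
1 1 Kreitz 3450.94
1 2 LeGrange 2675.42
;

/* Bring observations with the same class values together */
proc sort data=sales_demo out=sales_sorted;
   by month week;
run;

proc timeplot data=sales_sorted;
   class month week;
   /* Seller is the symbol variable: K and L mark the two representatives */
   plot icebox=seller;
run;
```

After the sort, each combination of Month and Week occupies one line of the listing and plot instead of being split across several lines.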
ID Statement
Prints in the listing the values of the variables that you identify.
Featured in:
ID variable(s);
Required Arguments
variable(s)
PLOT Statement
Specifies the plots to produce.
Tip:
PLOT plot-request(s)/option(s);
Table 49.1 on page 1293 summarizes the options that are available in the PLOT
statement.
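Table 49.1

To do this                                                        Use this option
Specify the range of values to plot on the horizontal axis and    AXIS=
the interval that each print position represents
Order the values on the horizontal axis with the largest value    REVERSE
in the leftmost position
Connect the leftmost plotting symbol to the rightmost plotting    HILOC
symbol with a line of hyphens (-)
Connect the leftmost and rightmost symbols on each line of the    JOINREF
plot with a line of hyphens (-) regardless of whether the
symbols are reference symbols or plotting symbols
Suppress the name of the symbol variable in column headings       NOSYMNAME
when you use a CLASS statement
Suppress the listing of the values of the variables that          NPP
appear in the PLOT statement
Specify the number of print positions to use for the              POS=
horizontal axis
Draw lines on the plot that are perpendicular to the specified    REF=
values on the horizontal axis
Specify the character to use to draw reference lines              REFCHAR=
Plot all requests in one PLOT statement on one set of axes        OVERLAY
Specify the character to print when plotting symbols coincide     OVPCHAR=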
Required Arguments
plot-request(s)
specifies the variable or variables to plot and, optionally, the plotting symbol to use.
By default, each plot request produces a separate plot.
A plot request can have the following forms. You can mix different forms of
requests in one PLOT statement (see Example 4 on page 1306).
variable(s)
identifies one or more numeric variables to plot. PROC TIMEPLOT uses the first
character of the variable name as the plotting symbol.
Featured in: Example 1 on page 1299
(variable(s))='plotting-symbol'
identifies one or more numeric variables to plot and specifies the plotting symbol
to use for all variables in the list. You can omit the parentheses if you use only one
variable.
Featured in: Example 2 on page 1301
(variable(s))=symbol-variable
identifies one or more numeric variables to plot and specifies a symbol variable.
PROC TIMEPLOT uses the first nonblank character of the formatted value of the
symbol variable as the plotting symbol for all variables in the list. The plotting
symbol changes from one observation to the next if the value of the symbol
variable changes. You can omit the parentheses if you use only one variable.
Featured in: Example 3 on page 1303
Options
AXIS=axis-specification
specifies the range of values to plot on the horizontal axis, as well as the interval
represented by each print position on the axis. PROC TIMEPLOT labels the first and
last ends of the axis, if space permits.
- For numeric values, axis-specification can be one of the following or a
combination of both:
n< . . .n>
n TO n <BY increment>
The values must be in either ascending or descending order. Use a negative
value for increment to specify descending order. The specified values are spaced
evenly along the horizontal axis even if the values are not uniformly
distributed. Numeric values can be specified in the following ways:
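Specification             Comments
axis=1 2 10               Values are 1, 2, and 10.
axis=10 to 100 by 5       Values appear in increments of 5, starting at 10 and ending at 100.
axis=12 10 to 100 by 5    A combination of the two previous forms of specification.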
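A date, time, or datetime value in axis-specification is written as a SAS constant
whose suffix is one of the following:
D     date
T     time
DT    datetime
increment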
one of the valid arguments for the INTCK or INTNX functions. For dates,
increment can be one of the following:
DAY
WEEK
MONTH
QTR
YEAR
For datetimes, increment can be one of the following:
DTDAY
DTWEEK
DTMONTH
DTQTR
DTYEAR
For times, increment can be one of the following:
HOUR
MINUTE
SECOND
For example,
axis='01JAN95'd to '01JAN96'd by month
axis='01JAN95'd to '01JAN96'd by qtr
For descriptions of individual intervals, see the chapter on dates, times, and
intervals in SAS Language Reference: Concepts.
Note: You must use a FORMAT statement to print the tick-mark values in
an understandable form.
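The following sketch shows a date axis together with the FORMAT statement that the note calls for. The data set SHIPMENTS, its variables, and its values are hypothetical; only the AXIS= specification is taken from the example above.

```sas
/* Hypothetical data: one shipping date per order, read as SAS date values */
data shipments;
   input Order ShipDate :date7.;
   datalines;
1 15JAN95
2 03MAR95
3 20JUL95
4 11NOV95
;

proc timeplot data=shipments;
   /* The axis runs from 01JAN95 to 01JAN96 in monthly steps */
   plot shipdate / axis='01JAN95'd to '01JAN96'd by month;
   id order;
   /* Without a date format, values print as unformatted SAS date numbers */
   format shipdate date7.;
run;
```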
Interaction: The value of POS= (see POS= on page 1296) overrides an interval set
with AXIS=.
Tip: If the range that you specify does not include all your data, then PROC
TIMEPLOT uses angle brackets (< or >) on the left or right border of the plot to
indicate a value that is outside the range.
Featured in: Example 2 on page 1301
HILOC
connects the leftmost plotting symbol to the rightmost plotting symbol with a line of
hyphens (-).
Interactions: If you specify JOINREF, then PROC TIMEPLOT ignores HILOC.
JOINREF
connects the leftmost and rightmost symbols on each line of the plot with a line of
hyphens (-), regardless of whether the symbols are reference symbols or plotting
symbols. However, if a line contains only reference symbols, then PROC TIMEPLOT
does not connect the symbols.
Featured in: Example 3 on page 1303
NOSYMNAME
suppresses the name of the symbol variable in column headings when you use a
CLASS statement. If you use NOSYMNAME, then only the value of the symbol
variable appears in the column heading.
Featured in: Example 5 on page 1308
NPP
suppresses the listing of the values of the variables that appear in the PLOT
statement.
Featured in: Example 3 on page 1303
OVERLAY
plots all requests in one PLOT statement on one set of axes. Otherwise, PROC
TIMEPLOT produces a separate plot for each plot request.
Featured in: Example 4 on page 1306
OVPCHAR='character'
specifies the character to print when plotting symbols coincide.
POS=print-positions-for-plot
specifies the number of print positions to use for the horizontal axis.
Default: If you omit both POS= and AXIS=, then PROC TIMEPLOT initially
assumes that POS=20. However, if space permits, then this value increases so
that the plot fills the available space.
Interaction: If you specify POS=0 and AXIS=, then the plot fills the available
space. POS= overrides an interval set with AXIS= (see the discussion of AXIS= on
page 1294).
See also: Page Layout on page 1297
Featured in: Example 1 on page 1299
REF=reference-value(s)
draws lines on the plot that are perpendicular to the specified values on the horizontal
axis. The values for reference-value(s) may be constants, or you may use the form
MEAN(variable(s))
If you use this form of REF=, then PROC TIMEPLOT evaluates the mean for each
variable that you list and draws a reference line for each mean.
Interaction: If you use the UNIFORM option in the PROC TIMEPLOT statement,
then the procedure calculates the mean values for the variables over all
observations for all BY groups. If you do not use UNIFORM, then the procedure
calculates the mean for each variable for each BY group.
Interaction: If a plotting symbol and a reference character coincide, then PROC
REFCHAR=character
value for REFCHAR= that is the same as a plotting symbol, because PROC
TIMEPLOT will interpret the plotting symbols as reference characters and will
not connect the symbols as you expect.
Featured in:
REVERSE
orders the values on the horizontal axis with the largest value in the leftmost
position.
Featured in:
Data Considerations
The input data set usually contains a date variable to use as either a class or an ID
variable. Although PROC TIMEPLOT does not require an input data set sorted by
date, the output is usually more meaningful if the observations are in chronological
order. In addition, if you use a CLASS statement, then the output is more meaningful if
the input data set groups observations according to combinations of class variable
values. (For more information see CLASS Statement on page 1291.)
Procedure Output
Page Layout
For each plot request, PROC TIMEPLOT prints a listing and a plot. PROC
TIMEPLOT determines the arrangement of the page as follows:
Table Name      Description
Plot            A single plot
OverlaidPlot
Missing Values
Four types of variables can appear in the listing from PROC TIMEPLOT: plot
variables, ID variables, class variables, and symbol variables (as part of some column
headers). Plot variables and symbol variables can also appear in the plot.
Observations with missing values of a class variable form a class of observations.
In the listing, missing values appear as a period (.), a blank, or a special missing
value (the letters A through Z and the underscore (_) character).
In the plot, PROC TIMEPLOT handles different variables in different ways:
- If you use a symbol variable (see the discussion of plot requests on page 1294),
then PROC TIMEPLOT uses a period (.) as the plotting symbol on the plot for all
observations that have a missing value for the symbol variable.
ID statement
PLOT statement arguments:
simple plot request
POS=
This example
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Create the SALES data set. SALES contains weekly information on the sales of refrigerators
and stoves by two sales representatives.
data sales;
input Month Week Seller $ Icebox Stove;
datalines;
1 1 Kreitz   3450.94 1312.61
1 1 LeGrange 2520.04  728.13
1 2 Kreitz   3240.67  222.35
1 2 LeGrange 2675.42  184.24
1 3 Kreitz   3160.45 2263.33
1 3 LeGrange 2805.35  267.35
1 4 Kreitz   3400.24 1787.45
1 4 LeGrange 2870.61  274.51
2 1 Kreitz   3550.43 2910.37
2 1 LeGrange 2730.09  397.98
2 2 Kreitz   3385.74  819.69
2 2 LeGrange 2670.93 2242.24
;
Plot sales of refrigerators. The plot variable, Icebox, appears in both the listing and the
plot. POS= provides 50 print positions for the horizontal axis.
proc timeplot data=sales;
plot icebox / pos=50;
Label the rows in the listing. The values of the ID variables, Month and Week, are used to
uniquely identify each row of the listing.
id month week;
Output
The column headers in the listing are the variables' names. The plot uses the default plotting
symbol, which is the first character of the plot variable's name.
Month    Week     Icebox
    1       1    3450.94
    1       1    2520.04
    1       2    3240.67
    1       2    2675.42
    1       3    3160.45
    1       3    2805.35
    1       4    3400.24
    1       4    2870.61
    2       1    3550.43
    2       1    2730.09
    2       2    3385.74
    2       2    2670.93

[Plot: Icebox across 50 print positions from min 2520.04 to max 3550.43; plotting symbol I]
ID statement
PLOT statement arguments:
using a plotting symbol
AXIS=
Other features:
LABEL statement
PROC FORMAT
SAS system options:
FMTSEARCH=
Data set:
SALES on page 1299
This example
- specifies the character to use as the plotting symbol
- specifies the minimum and maximum values for the horizontal axis as well as the
interval represented by each print position
- provides context for the points in the plot by printing in the listing the values of
two variables that are not in the plot
- uses a variable's label as a column header in the listing
- creates and uses a permanent format.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
adds the SAS data library PROCLIB to the search path that is used to locate formats.
options nodate pageno=1 linesize=80 pagesize=60
fmtsearch=(proclib);
Create a format for the Month variable. PROC FORMAT creates a permanent format for
Month. The LIBRARY= option specifies a permanent storage location so that the formats are
available in subsequent SAS sessions. This format is used for examples throughout this chapter.
proc format library=proclib;
value monthfmt 1='January'
2='February';
run;
Plot sales of refrigerators. The plot variable, Icebox, appears in both the listing and the
plot. The plotting symbol is R. AXIS= sets the minimum value of the axis to 2500 and the
maximum value to 3600. BY 25 specifies that each print position on the axis represents 25 units
(in this case, dollars).
proc timeplot data=sales;
plot icebox='R' / axis=2500 to 3600 by 25;
Label the rows in the listing. The values of the ID variables, Month and Week, are used to
uniquely identify each row of the listing.
id month week;
Apply a label to the sales column in the listing. The LABEL statement associates a label
with the variable Icebox for the duration of the PROC TIMEPLOT step. PROC TIMEPLOT uses
the label as the column header in the listing.
label icebox='Refrigerator';
Apply the MONTHFMT. format to the Month variable. The FORMAT statement assigns a
format to use for Month in the report.
format month monthfmt.;
Output
The column headers in the listing are the variables' names (for Month and Week, which have no
labels) and the variable's label (for Icebox, which has a label). The plotting symbol is R (for
Refrigerator).
Month       Week    Refrigerator
January        1         3450.94
January        1         2520.04
January        2         3240.67
January        2         2675.42
January        3         3160.45
January        3         2805.35
January        4         3400.24
January        4         2870.61
February       1         3550.43
February       1         2730.09
February       2         3385.74
February       2         2670.93

[Plot: Refrigerator (Icebox) on a horizontal axis from min 2500 to max 3600; plotting symbol R]
ID statement
PLOT statement arguments:
using a variable as the plotting symbol
JOINREF
NPP
REF=
REFCHAR=
Data set:
This example
- specifies a variable to use as the plotting symbol to distinguish between points for
each of two sales representatives
- suppresses the printing of the values of the plot variable in the listing
- draws a reference line to a specified value on the axis and specifies the character
to use to draw the line
- connects the leftmost and rightmost symbols on each line of the plot.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
adds the SAS data library PROCLIB to the search path that is used to locate formats.
options nodate pageno=1 linesize=80 pagesize=60
fmtsearch=(proclib);
Plot sales of stoves. The PLOT statement specifies both the plotting variable, Stove, and a
symbol variable, Seller. The plotting symbol is the first letter of the formatted value of Seller
(in this case, L or K).
proc timeplot data=sales;
plot stove=seller /
Suppress the appearance of the plotting variable in the listing. The values of the Stove
variable will not appear in the listing.
npp
Create a reference line on the plot. REF= and REFCHAR= draw a line of colons at the sales
target of $1500.
ref=1500 refchar=':'
Draw a line between the symbols on each line of the plot. In this plot, JOINREF connects
each plotting symbol to the reference line.
joinref
Customize the horizontal axis. AXIS= sets the minimum value of the horizontal axis to 100
and the maximum value to 3000. BY 50 specifies that each print position on the axis represents
50 units (in this case, dollars).
axis=100 to 3000 by 50;
Label the rows in the listing. The values of the ID variables, Month and Week, are used to
identify each row of the listing.
id month week;
Apply the MONTHFMT. format to the Month variable. The FORMAT statement assigns a
format to use for Month in the report.
format month monthfmt.;
Output
The plot uses the first letter of the value of Seller as the plotting symbol.
Month     Week  min                                                       max
                100                                                      3000
                *-----------------------------------------------------------*
January      1  |                        K---:                              |
January      1  |             L--------------:                              |
January      2  |  K-------------------------:                              |
January      2  |  L-------------------------:                              |
January      3  |                            :--------------K               |
January      3  |   L------------------------:                              |
January      4  |                            :-----K                        |
January      4  |   L------------------------:                              |
February     1  |                            :---------------------------K  |
February     1  |      L---------------------:                              |
February     2  |              K-------------:                              |
February     2  |                            :--------------L               |
                *-----------------------------------------------------------*
This example
- superimposes two plots on one set of axes
- specifies a variable to use as the plotting symbol for one plot and a character to
use as the plotting symbol for the other plot
- draws a reference line to the mean value of each of the two variables plotted
- reverses the labeling of the axis so that the largest value is at the far left of the
plot.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=60;
Specify the number of decimal places to display. MAXDEC= specifies the number of
decimal places to display in the listing.
proc timeplot data=sales maxdec=0;
Plot sales of both stoves and refrigerators. The PLOT statement requests two plots. One
plot uses the first letter of the formatted value of Seller to plot the values of Stove. The other
uses the letter R (to match the label Refrigerators) to plot the value of Icebox.
plot stove=seller icebox='R' /
Create two reference lines on the plot. REF= draws two reference lines: one perpendicular
to the mean of Stove, the other perpendicular to the mean of Icebox.
ref=mean(stove icebox)
Apply a label to the sales column in the listing. The LABEL statement associates a label
with the variable Icebox for the duration of the PROC TIMEPLOT step. PROC TIMEPLOT uses
the label as the column header in the listing.
label icebox='Refrigerators';
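Taken together, the statements in this example might be assembled as in the sketch below. It assumes that OVERLAY and REVERSE are the PLOT statement options that superimpose the two plots and place the largest value at the far left, as described in the PLOT statement options. The four-observation data set SALES_DEMO is a hypothetical subset of SALES, used only to keep the sketch self-contained.

```sas
/* Hypothetical subset of the SALES data set */
data sales_demo;
   input Month Week Seller $ Icebox Stove;
   datalines;
1 1 Kreitz   3450.94 1312.61
1 1 LeGrange 2520.04  728.13
1 2 Kreitz   3240.67  222.35
1 2 LeGrange 2675.42  184.24
;

proc timeplot data=sales_demo maxdec=0;
   plot stove=seller icebox='R' /
        overlay                   /* both plot requests share one set of axes */
        ref=mean(stove icebox)    /* one reference line for each mean         */
        reverse;                  /* largest value in the leftmost position   */
   label icebox='Refrigerators';
run;
```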
Output
The column header for the variable Icebox in the listing is the variable's label (Refrigerators).
One plot uses the first letter of the value of Seller as the plotting symbol. The other plot uses
the letter R.
Stove    Refrigerators
 1313             3451
  728             2520
  222             3241
  184             2675
 2263             3160
  267             2805
 1787             3400
  275             2871
 2910             3550
  398             2730
  820             3386
 2242             2671

[Plot: Stove (symbols K and L) and Icebox (symbol R) overlaid on one horizontal axis that runs from max 3550.43 at the left to min 184.24 at the right]
CLASS statement
PLOT statement arguments:
creating multiple plots
NOSYMNAME
OVPCHAR=
Data set:
SALES on page 1299
Formats:
This example
- groups observations for the same month and week so that sales for the two sales
representatives for the same week appear on the same line of the plot
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page. FMTSEARCH=
adds the SAS data library PROCLIB to the search path that is used to locate formats.
options nodate pageno=1 linesize=80 pagesize=60
fmtsearch=(proclib);
Specify subgroups for the analysis. The CLASS statement groups all observations with the
same values of Month and Week into one line in the output. Using the CLASS statement with a
symbol variable produces in the listing one column of the plot variable for each value of the
symbol variable.
proc timeplot data=sales;
class month week;
Plot sales of stoves and refrigerators. Each PLOT statement produces a separate plot. The
plotting symbol is the first character of the formatted value of the symbol variable: K for Kreitz;
L for LeGrange. POS= specifies that each plot uses 25 print positions for the horizontal axis.
OVPCHAR= designates the exclamation point as the plotting symbol when the plotting symbols
coincide. NOSYMNAME suppresses the name of the symbol variable Seller from the second
listing.
plot stove=seller / pos=25 ovpchar='!';
plot icebox=seller / pos=25 ovpchar='!' nosymname;
Apply formats to values in the listing. The FORMAT statement assigns formats to use for
Stove, Icebox, and Month in the report. The TITLE statement specifies a title.
format stove icebox dollar10.2 month monthfmt.;
Output
                    Seller :Kreitz    Seller :LeGrange
Month      Week     Stove             Stove
January    1        $1,312.61         $728.13
January    2        $222.35           $184.24
January    3        $2,263.33         $267.35
January    4        $1,787.45         $274.51
February   1        $2,910.37         $397.98
February   2        $819.69           $2,242.24
Plot axis: min $184.24, max $2,910.37
[Plot of Stove, 25 print positions wide: K marks Kreitz, L marks LeGrange, and ! marks values that coincide.]
                    Kreitz            LeGrange
Month      Week     Icebox            Icebox
January    1        $3,450.94         $2,520.04
January    2        $3,240.67         $2,675.42
January    3        $3,160.45         $2,805.35
January    4        $3,400.24         $2,870.61
February   1        $3,550.43         $2,730.09
February   2        $3,385.74         $2,670.93
Plot axis: min $2,520.04, max $3,550.43
[Plot of Icebox, 25 print positions wide: K marks Kreitz and L marks LeGrange.]
CHAPTER 50
The TRANSPOSE Procedure
Overview: TRANSPOSE Procedure
What Does the TRANSPOSE Procedure Do?
What Types of Transpositions Can PROC TRANSPOSE Perform?
Syntax: TRANSPOSE Procedure
PROC TRANSPOSE Statement
BY Statement
COPY Statement
ID Statement
IDLABEL Statement
VAR Statement
Results: TRANSPOSE Procedure
Output Data Set
Output Data Set Variables
Attributes of Transposed Variables
Names of Transposed Variables
Examples: TRANSPOSE Procedure
Example 1: Performing a Simple Transposition
Example 2: Naming Transposed Variables
Example 3: Labeling Transposed Variables
Example 4: Transposing BY Groups
Example 5: Naming Transposed Variables When the ID Variable Has Duplicate Values
Example 6: Transposing Data for Statistical Analysis
Output 50.1
A Simple Transposition
The Input Data Set
Tester1  Tester2  Tester3  Tester4
     22       25       21       21
     15       19       18       17
     17       19       19       19
     20       19       16       19
     14       15       13       13
     15       17       18       19
     10       11        9       10
     22       24       23       21
The Output Data Set
_NAME_   COL1  COL2  COL3  COL4  COL5  COL6  COL7  COL8
Tester1    22    15    17    20    14    15    10    22
Tester2    25    19    19    19    15    17    11    24
Tester3    21    18    19    16    13    18     9    23
Tester4    21    17    19    19    13    19    10    21
Output 50.2
Location    Date     Length1  Weight1  Length2  Weight2  Length3  Weight3  Length4  Weight4
Cole Pond   02JUN95       31     0.25       32     0.30       32     0.25       33     0.30
Cole Pond   03JUL95       33     0.32       34     0.41       37     0.48       32     0.28
Cole Pond   04AUG95       29     0.23       30     0.25       34     0.47       32     0.30
Eagle Lake  02JUN95       32     0.35       32     0.25       33     0.30        .        .
Eagle Lake  03JUL95       30     0.20       36     0.45        .        .        .        .
Eagle Lake  04AUG95       33     0.30       33     0.28       34     0.42        .        .
Location    Date     _NAME_   Measurement
Cole Pond   02JUN95  Length1           31
Cole Pond   02JUN95  Length2           32
Cole Pond   02JUN95  Length3           32
Cole Pond   02JUN95  Length4           33
Cole Pond   03JUL95  Length1           33
Cole Pond   03JUL95  Length2           34
Cole Pond   03JUL95  Length3           37
Cole Pond   03JUL95  Length4           32
Cole Pond   04AUG95  Length1           29
Cole Pond   04AUG95  Length2           30
Cole Pond   04AUG95  Length3           34
Cole Pond   04AUG95  Length4           32
Eagle Lake  02JUN95  Length1           32
Eagle Lake  02JUN95  Length2           32
Eagle Lake  02JUN95  Length3           33
Eagle Lake  02JUN95  Length4            .
Eagle Lake  03JUL95  Length1           30
Eagle Lake  03JUL95  Length2           36
Eagle Lake  03JUL95  Length3            .
Eagle Lake  03JUL95  Length4            .
Eagle Lake  04AUG95  Length1           33
Eagle Lake  04AUG95  Length2           33
Eagle Lake  04AUG95  Length3           34
Eagle Lake  04AUG95  Length4            .
For a complete explanation of the SAS program that produces these results, see
Example 4 on page 1325.
Reminder: You can use the ATTRIB, FORMAT, LABEL, and WHERE statements. See
Chapter 3, "Statements with the Same Function in Multiple Procedures," on page 57 for
details. You can also use any global statements. See "Global Statements" on page 18 for
a list.
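To do this                                                                 Use this statement
Define BY groups                                                           BY
Copy variables directly without transposing them                           COPY
Specify a variable whose formatted values name the transposed variables    ID
Create labels for the transposed variables                                 IDLABEL
List the variables to transpose                                            VAR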
Reminder:
Options
DATA= input-data-set
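LABEL= label
specifies a name for the variable in the output data set that contains the label of the
variable that is being transposed to create the current observation.
Default: _LABEL_
LET
allows duplicate values of an ID variable. PROC TRANSPOSE transposes the observation
that contains the last occurrence of a duplicate ID value within the data set or BY group.
NAME= name
specifies the name for the variable in the output data set that contains the name of
the variable that is being transposed to create the current observation.
Default: _NAME_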
Featured in:
OUT= output-data-set
names the output data set. If output-data-set does not exist, then PROC
TRANSPOSE creates it by using the DATAn naming convention.
Default: DATAn
Featured in:
PREFIX= prefix
specifies a prefix to use in constructing names for transposed variables in the output
data set. For example, if PREFIX=VAR, then the names of the variables are VAR1,
VAR2, ..., VARn.
Interaction: when you use PREFIX= with an ID statement, the value prefixes to
the ID value.
Featured in: Example 2 on page 1323
BY Statement
Defines BY groups.
Main discussion: BY on page 58
Featured in:
BY <DESCENDING> variable-1
<...<DESCENDING> variable-n>
<NOTSORTED>;
Required Arguments
variable
specifies the variable that PROC TRANSPOSE uses to form BY groups. You can
specify more than one variable. If you do not use the NOTSORTED option in the BY
statement, then either the observations must be sorted by all the variables that you
specify, or they must be indexed appropriately. Variables in a BY statement are
called BY variables.
Options
DESCENDING
specifies that the data set is sorted in descending order by the variable that
immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The data is grouped in another way, such as chronological order.
The requirement for ordering or indexing observations according to the values of
BY variables is suspended for BY-group processing when you use the NOTSORTED
option. In fact, the procedure does not use an index if you specify NOTSORTED. The
procedure defines a BY group as a set of contiguous observations that have the same
values for all BY variables. If observations with the same values for the BY variables
are not contiguous, then the procedure treats each contiguous set as a separate BY
group.
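As a minimal sketch of the NOTSORTED option, the following step transposes a small data set whose BY groups are contiguous but not in sorted order. The data set VISITS, its variables, and the output name VISITS_WIDE are hypothetical and are used only for illustration.

```sas
/* Hypothetical data: rows are grouped by Clinic, but the groups
   appear in collection order (West before East), not sorted order. */
data visits;
   input Clinic $ Day Count;
   datalines;
West 1 12
West 2 15
East 1 9
East 2 11
;

/* NOTSORTED treats each run of contiguous observations with the same
   Clinic value as one BY group, so no PROC SORT step is needed first. */
proc transpose data=visits out=visits_wide prefix=Day;
   by clinic notsorted;
   id day;
   var count;
run;
```

Under these assumptions, VISITS_WIDE should contain one observation for each clinic, in the original West, East order, with the transposed variables Day1 and Day2.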
Figure 50.1
[Figure: an input data set with the variables TYPE, MONTH, SOLD, NOTSOLD, REPAIRED, and JUNKED (six observations: jan and feb for each of sedan, sports, and trucks) is transposed by TYPE into an output data set with the variables TYPE, _NAME_, COL1, and COL2 (twelve observations, one for each of SOLD, NOTSOLD, REPAIRED, and JUNKED within each TYPE).]
• The number of observations in the output data set (12) is the number of BY groups
(3) multiplied by the number of variables that are transposed (4).
• The maximum number of observations in any BY group in the input data set is
two; therefore, the output data set contains two variables, COL1 and COL2. COL1
and COL2 contain the values of SOLD, NOTSOLD, REPAIRED, and JUNKED.
Note: If a BY group in the input data set has more observations than other BY
groups, then PROC TRANSPOSE assigns missing values in the output data set to the
variables that have no corresponding input observations.
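The transposition that Figure 50.1 describes can be sketched as follows. The data set name CARS, the output name CARS_T, and all data values are illustrative assumptions; they are not taken from the figure.

```sas
/* Illustrative data with the same layout as the figure:
   two observations (jan, feb) for each of three vehicle types. */
data cars;
   input Type $ Month $ Sold NotSold Repaired Junked;
   datalines;
sedan jan 10 4 2 1
sedan feb 12 5 3 0
sports jan 7 2 1 1
sports feb 8 3 0 2
trucks jan 15 6 4 1
trucks feb 14 7 2 3
;

/* No VAR statement: the four numeric variables are transposed
   within each BY group. The input is already sorted by Type. */
proc transpose data=cars out=cars_t;
   by type;
run;
```

With three BY groups and four transposed variables, CARS_T should have 12 observations. Because each BY group has two input observations, the transposed values should appear in COL1 and COL2.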
COPY Statement
Copies variables directly from the input data set to the output data set without transposing them.
Featured in:
COPY variable(s);
Required Argument
variable(s)
names one or more variables that the COPY statement copies directly from the input
data set to the output data set without transposing them.
Details
Because the COPY statement copies variables directly to the output data set, the
number of observations in the output data set is equal to the number of observations in
the input data set.
The procedure pads the output data set with missing values if the number of
observations in the input data set is not equal to the number of variables that it
transposes.
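The padding behavior can be seen in a small sketch. The data set READINGS, its variables, and the output name READINGS_T are hypothetical.

```sas
/* Hypothetical data: three observations, one numeric variable to transpose. */
data readings;
   input Site $ Temp;
   datalines;
north 61
south 68
east 64
;

/* COPY carries Site into the output data set unchanged, so the output
   has as many observations as the input (three), although only one
   variable (Temp) is transposed. */
proc transpose data=readings out=readings_t;
   copy site;
   var temp;
run;
```

In this sketch, only the first observation of READINGS_T should hold the transposed values of Temp; the remaining observations should be padded with missing values.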
ID Statement
Specifies a variable in the input data set whose formatted values name the transposed variables
in the output data set.
Restriction: You cannot use PROC TRANSPOSE with an ID statement or a BY
statement with an engine that supports concurrent access if another user is updating
the data set at the same time.
Featured in: Example 2 on page 1323
ID variable;
Required Argument
variable
names the variable whose formatted values name the transposed variables.
Duplicate ID Values
Typically, each formatted ID value occurs only once in the input data set or, if you
use a BY statement, only once within a BY group. Duplicate values cause PROC
TRANSPOSE to issue a warning message and stop. However, if you use the LET option
in the PROC TRANSPOSE statement, then the procedure issues a warning message
about duplicate ID values and transposes the observation that contains the last
occurrence of the duplicate ID value.
Missing Values
If you use an ID variable that contains a missing value, then PROC TRANSPOSE
writes an error message to the log. The procedure does not transpose observations that
have a missing value for the ID variable.
IDLABEL Statement
Creates labels for the transposed variables.
Restriction:
Featured in:
IDLABEL variable;
Required Argument
variable
names the variable whose values the procedure uses to label the variables that the
ID statement names. variable can be character or numeric.
Note: To see the effect of the IDLABEL statement, print the output data set with
the PRINT procedure by using the LABEL option, or print the contents of the output
data set by using the CONTENTS statement in the DATASETS procedure.
VAR Statement
Lists the variables to transpose.
Featured in:
VAR variable(s);
Required Argument
variable(s)
Details
• If you omit the VAR statement, then the TRANSPOSE procedure transposes all
numeric variables in the input data set that are not listed in another statement.
• You must list character variables in a VAR statement if you want to transpose
them.
Output Data Set Variables
The output data set contains the following variables:
• variables that result from transposing the values of each variable into an
observation.
• a variable that PROC TRANSPOSE creates to identify the source of the values in
each observation in the output data set. This variable is a character variable
whose values are the names of the variables that are transposed from the input
data set. By default, PROC TRANSPOSE names this variable _NAME_. To
override the default name, use the NAME= option. The label for the _NAME_
variable is NAME OF FORMER VARIABLE.
• variables that PROC TRANSPOSE copies from the input data set when you use
either the BY or COPY statement. These variables have the same names and
values as they do in the input data set.
• a character variable whose values are the variable labels of the variables that are
being transposed (if any of the variables that the procedure is transposing have
labels). Specify the name of the variable by using the LABEL= option. The default
is _LABEL_.
Note: If the value of the LABEL= option or the NAME= option is the same as a
variable that appears in a BY or COPY statement, then the output data set does
not contain a variable whose values are the names or labels of the transposed
variables.
Attributes of Transposed Variables
• If any variable that the procedure is transposing is character, then all transposed
variables are character. Thus, if you are transposing a numeric variable that has a
character string as a formatted value, then the formatted value is transposed.
• The length of the transposed variables is equal to the length of the longest
variable that is being transposed.
transposed variables.
3. If you do not use an ID statement or the PREFIX= option, then PROC
TRANSPOSE looks for an input variable called _NAME_ from which to get the
names of the transposed variables.
4. If you do not use an ID statement or the PREFIX= option, and if the input data
set does not contain a variable named _NAME_, then PROC TRANSPOSE assigns
the names COL1, COL2, ..., COLn to the transposed variables.
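The naming rules can be compared side by side in a short sketch. The data set SMALL, its variables Code and Score, and the output data set names are hypothetical.

```sas
/* Hypothetical two-observation data set. */
data small;
   input Code $ Score;
   datalines;
a 1
b 2
;

/* No ID statement, no PREFIX=, and no input variable named _NAME_:
   the transposed variables should be named COL1 and COL2. */
proc transpose data=small out=t_default;
   var score;
run;

/* PREFIX= without an ID statement: the names should be Q1 and Q2. */
proc transpose data=small out=t_prefix prefix=Q;
   var score;
run;

/* ID statement without PREFIX=: the values of Code (a, b) should
   become the names of the transposed variables. */
proc transpose data=small out=t_id;
   id code;
   var score;
run;
```

Printing the contents of each output data set, for example with PROC CONTENTS, is one way to confirm the variable names that each step produces.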
Example 1: Performing a Simple Transposition
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create the SCORE data set. The data set SCORE contains students' names, their identification
numbers, and their grades on two tests and a final exam.
data score;
input Student $9. +1 StudentID $ Section $ Test1 Test2 Final;
datalines;
Capalleti 0545 1 94 91 87
Dubose    1252 2 51 65 91
Engles    1167 1 95 97 97
Grant     1230 2 63 75 80
Krupski   2527 2 80 76 71
Lundsford 4860 1 92 40 86
McBane    0674 1 75 78 72
;
Transpose the data set. PROC TRANSPOSE transposes only the numeric variables, Test1,
Test2, and Final, because no VAR statement appears and none of the numeric variables appear
in another statement. OUT= puts the result of the transposition in the data set
SCORE_TRANSPOSED.
proc transpose data=score out=score_transposed;
run;
Print the SCORE_TRANSPOSED data set. The NOOBS option suppresses the printing of
observation numbers.
proc print data=score_transposed noobs;
title 'Student Test Scores in Variables';
run;
Output
In the output data set SCORE_TRANSPOSED, the variables COL1 through COL7 contain the
individual scores for the students. Each observation contains all the scores for one test. The
variable _NAME_ contains the names of the variables from the input data set that were
transposed.
_NAME_  COL1  COL2  COL3  COL4  COL5  COL6  COL7
Test1     94    51    95    63    80    92    75
Test2     91    65    97    75    76    40    78
Final     87    91    97    80    71    86    72
Example 2: Naming Transposed Variables
This example uses the values of a variable and a user-supplied value to name
transposed variables.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Transpose the data set. PROC TRANSPOSE transposes only the numeric variables, Test1,
Test2, and Final, because no VAR statement appears. OUT= puts the result of the transposition
in the IDNUMBER data set. NAME= specifies Test as the name for the variable that contains
the names of the variables in the input data set that the procedure transposes. The procedure
names the transposed variables by using the value from PREFIX=, sn, and the value of the ID
variable StudentID.
proc transpose data=score out=idnumber name=Test
prefix=sn;
id studentid;
run;
Print the IDNUMBER data set. The NOOBS option suppresses the printing of observation
numbers.
proc print data=idnumber noobs;
title 'Student Test Scores';
run;
Output
Test   sn0545  sn1252  sn1167  sn1230  sn2527  sn4860  sn0674
Test1      94      51      95      63      80      92      75
Test2      91      65      97      75      76      40      78
Final      87      91      97      80      71      86      72
Example 3: Labeling Transposed Variables
This example uses the values of the variable in the IDLABEL statement to label
transposed variables.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Transpose the data set. PROC TRANSPOSE transposes only the numeric variables, Test1,
Test2, and Final, because no VAR statement appears. OUT= puts the result of the transposition
in the IDLABEL data set. NAME= specifies Test as the name for the variable that contains the
names of the variables in the input data set that the procedure transposes. The procedure
names the transposed variables by using the value from PREFIX=, sn, and the value of the ID
variable StudentID.
proc transpose data=score out=idlabel name=Test
prefix=sn;
id studentid;
Assign labels to the output variables. PROC TRANSPOSE uses the values of the variable
Student to label the transposed variables. The procedure provides the following as the label for
the _NAME_ variable: NAME OF FORMER VARIABLE
idlabel student;
run;
Print the IDLABEL data set. The LABEL option causes PROC PRINT to print variable labels
for column headers. The NOOBS option suppresses the printing of observation numbers.
proc print data=idlabel label noobs;
title 'Student Test Scores';
run;
Output
NAME OF FORMER
VARIABLE        Capalleti  Dubose  Engles  Grant  Krupski  Lundsford  McBane
Test1                  94      51      95     63       80         92      75
Test2                  91      65      97     75       76         40      78
Final                  87      91      97     80       71         86      72
Example 4: Transposing BY Groups
BY statement
VAR statement
RENAME=
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create the FISHDATA data set. The data in FISHDATA represents length and weight
measurements of fish that were caught at two ponds on three separate days. The raw data is
sorted by Location and Date.
data fishdata;
infile datalines missover;
input Location & $10. Date date7.
Length1 Weight1 Length2 Weight2 Length3 Weight3
Length4 Weight4;
format date date7.;
datalines;
Cole Pond  2JUN95 31 .25 32 .3 32 .25 33 .3
Cole Pond  3JUL95 33 .32 34 .41 37 .48 32 .28
Cole Pond  4AUG95 29 .23 30 .25 34 .47 32 .3
Eagle Lake  2JUN95 32 .35 32 .25 33 .30
Eagle Lake  3JUL95 30 .20 36 .45
Eagle Lake  4AUG95 33 .30 33 .28 34 .42
;
Transpose the data set. OUT= puts the result of the transposition in the FISHLENGTH data
set. RENAME= renames COL1 in the output data set to Measurement.
proc transpose data=fishdata
out=fishlength(rename=(col1=Measurement));
Specify the variables to transpose. The VAR statement limits the variables that PROC
TRANSPOSE transposes.
var length1-length4;
Organize the output data set into BY groups. The BY statement creates BY groups for each
unique combination of values of Location and Date. The procedure does not transpose the BY
variables.
by location date;
run;
Print the FISHLENGTH data set. The NOOBS option suppresses the printing of observation
numbers.
proc print data=fishlength noobs;
title 'Fish Length Data for Each Location and Date';
run;
Output
This is the output data set, FISHLENGTH. For each BY group in the original data set, PROC
TRANSPOSE creates four observations, one for each variable that it is transposing. Missing
values appear for the variable Measurement (renamed from COL1) when the variables that are
being transposed have no value in the input data set for that BY group. Several observations
have a missing value for Measurement. For example, in the last observation, a missing value
appears because the input data contained no value for Length4 on 04AUG95 at Eagle Lake.
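Location    Date     _NAME_   Measurement
Cole Pond   02JUN95  Length1           31
Cole Pond   02JUN95  Length2           32
Cole Pond   02JUN95  Length3           32
Cole Pond   02JUN95  Length4           33
Cole Pond   03JUL95  Length1           33
Cole Pond   03JUL95  Length2           34
Cole Pond   03JUL95  Length3           37
Cole Pond   03JUL95  Length4           32
Cole Pond   04AUG95  Length1           29
Cole Pond   04AUG95  Length2           30
Cole Pond   04AUG95  Length3           34
Cole Pond   04AUG95  Length4           32
Eagle Lake  02JUN95  Length1           32
Eagle Lake  02JUN95  Length2           32
Eagle Lake  02JUN95  Length3           33
Eagle Lake  02JUN95  Length4            .
Eagle Lake  03JUL95  Length1           30
Eagle Lake  03JUL95  Length2           36
Eagle Lake  03JUL95  Length3            .
Eagle Lake  03JUL95  Length4            .
Eagle Lake  04AUG95  Length1           33
Eagle Lake  04AUG95  Length2           33
Eagle Lake  04AUG95  Length3           34
Eagle Lake  04AUG95  Length4            .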
Example 5: Naming Transposed Variables When the ID Variable Has Duplicate Values
This example shows how to use values of a variable (ID) to name transposed
variables even when the ID variable has duplicate values.
Program
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=64 pagesize=40;
Create the STOCKS data set. STOCKS contains stock prices for two competing kite
manufacturers. The prices are recorded for two days, three times a day: at opening, at noon, and
at closing. Notice that the input data set contains duplicate values for the Date variable.
data stocks;
input Company $14. Date $ Time $ Price;
datalines;
Horizon Kites jun11 opening 29
Horizon Kites jun11 noon    27
Horizon Kites jun11 closing 27
Horizon Kites jun12 opening 27
Horizon Kites jun12 noon    28
Horizon Kites jun12 closing 30
SkyHi Kites   jun11 opening 43
SkyHi Kites   jun11 noon    43
SkyHi Kites   jun11 closing 44
SkyHi Kites   jun12 opening 44
SkyHi Kites   jun12 noon    45
SkyHi Kites   jun12 closing 45
;
Transpose the data set. LET transposes only the last observation for each BY group. PROC
TRANSPOSE transposes only the Price variable. OUT= puts the result of the transposition in
the CLOSE data set.
proc transpose data=stocks out=close let;
Organize the output data set into BY groups. The BY statement creates two BY groups,
one for each company.
by company;
Name the transposed variables. The values of Date are used as names for the transposed
variables.
id date;
run;
Print the CLOSE data set. The NOOBS option suppresses the printing of observation
numbers.
proc print data=close noobs;
title 'Closing Prices for Horizon Kites and SkyHi Kites';
run;
Output
Company        _NAME_  jun11  jun12
Horizon Kites  Price      27     30
SkyHi Kites    Price      44     45
Example 6: Transposing Data for Statistical Analysis
COPY statement
VAR statement
Program 1
Set the SAS system options. The NODATE option suppresses the display of the date and time
in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output
line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create the WEIGHTS data set. The data in WEIGHTS represents the results of an exercise
therapy study of three weight-lifting programs: CONT is a control group, RI is a program in
which the number of repetitions is increased, and WI is a program in which the weight is
increased.
data weights;
input Program $ s1-s7;
datalines;
CONT 85 85 86 85 87 86 87
CONT 80 79 79 78 78 79 78
CONT 78 77 77 77 76 76 77
CONT 84 84 85 84 83 84 85
CONT 80 81 80 80 79 79 80
RI 79 79 79 80 80 78 80
RI 83 83 85 85 86 87 87
RI 81 83 82 82 83 83 82
RI 81 81 81 82 82 83 81
RI 80 81 82 82 82 84 86
WI 84 85 84 83 83 83 84
WI 74 75 75 76 75 76 76
WI 83 84 82 81 83 83 82
WI 86 87 87 87 87 87 86
WI 82 83 84 85 84 85 86
;
Create the SPLIT data set. This DATA step rearranges WEIGHTS to create the data set
SPLIT. The DATA step transposes the strength values and creates two new variables: Time and
Subject. SPLIT contains one observation for each repeated measure. SPLIT can be used in a
PROC GLM step for a univariate repeated-measures analysis.
data split;
set weights;
array s{7} s1-s7;
Subject + 1;
do Time=1 to 7;
Strength=s{time};
output;
end;
drop s1-s7;
run;
Print the SPLIT data set. The NOOBS option suppresses the printing of observation
numbers. The OBS= data set option limits the printing to the first 15 observations. SPLIT has
105 observations.
proc print data=split(obs=15) noobs;
title 'SPLIT Data Set';
title2 'First 15 Observations Only';
run;
Output 1
SPLIT Data Set
First 15 Observations Only
Program  Subject  Time  Strength
CONT           1     1        85
CONT           1     2        85
CONT           1     3        86
CONT           1     4        85
CONT           1     5        87
CONT           1     6        86
CONT           1     7        87
CONT           2     1        80
CONT           2     2        79
CONT           2     3        79
CONT           2     4        78
CONT           2     5        78
CONT           2     6        79
CONT           2     7        78
CONT           3     1        78
Program 2
Transpose the SPLIT data set. PROC TRANSPOSE transposes SPLIT to create TOTSPLIT.
The TOTSPLIT data set contains the same variables as SPLIT and a variable for each strength
measurement (Str1-Str7). TOTSPLIT can be used for either a multivariate repeated-measures
analysis or a univariate repeated-measures analysis.
proc transpose data=split out=totsplit prefix=Str;
Organize the output data set into BY groups, and populate each BY group with
untransposed values. The variables in the BY and COPY statements are not transposed.
TOTSPLIT contains the variables Program, Subject, Time, and Strength with the same values
that are in SPLIT. The BY statement creates the first observation in each BY group, which
contains the transposed values of Strength. The COPY statement creates the other observations
in each BY group by copying the values of Time and Strength without transposing them.
by program subject;
copy time strength;
Specify the variable to transpose. The VAR statement specifies the Strength variable as the
only variable to be transposed.
var strength;
run;
Print the TOTSPLIT data set. The NOOBS option suppresses the printing of observation
numbers. The OBS= data set option limits the printing to the first 15 observations. SPLIT has
105 observations.
proc print data=totsplit(obs=15) noobs;
title 'TOTSPLIT Data Set';
title2 'First 15 Observations Only';
run;
Output 2
The variables in TOTSPLIT with missing values are used only in a multivariate
repeated-measures analysis. The missing values do not preclude this data set from being used
in a repeated-measures analysis because the MODEL statement in PROC GLM ignores
observations with missing values.
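Program  Subject  Time  Strength  _NAME_    Str1  Str2  Str3  Str4  Str5  Str6  Str7
CONT           1     1        85  Strength    85    85    86    85    87    86    87
CONT           1     2        85               .     .     .     .     .     .     .
CONT           1     3        86               .     .     .     .     .     .     .
CONT           1     4        85               .     .     .     .     .     .     .
CONT           1     5        87               .     .     .     .     .     .     .
CONT           1     6        86               .     .     .     .     .     .     .
CONT           1     7        87               .     .     .     .     .     .     .
CONT           2     1        80  Strength    80    79    79    78    78    79    78
CONT           2     2        79               .     .     .     .     .     .     .
CONT           2     3        79               .     .     .     .     .     .     .
CONT           2     4        78               .     .     .     .     .     .     .
CONT           2     5        78               .     .     .     .     .     .     .
CONT           2     6        79               .     .     .     .     .     .     .
CONT           2     7        78               .     .     .     .     .     .     .
CONT           3     1        78  Strength    78    77    77    77    76    76    77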
CHAPTER 51
The TRANTAB Procedure
Information about the TRANTAB Procedure
CHAPTER 52
The UNIVARIATE Procedure
Information about the UNIVARIATE Procedure
The documentation for the UNIVARIATE procedure has moved to Volume 3 of this
book.
PART
Appendices
Appendix 4 Recommended Reading
APPENDIX 1
SAS Elementary Statistics Procedures
Overview
Keywords and Formulas
Simple Statistics
Descriptive Statistics
Quantile and Related Statistics
Hypothesis Testing Statistics
Confidence Limits for the Mean
Using Weights
Data Requirements for Summarization Procedures
Statistical Background
Populations and Parameters
Samples and Statistics
Measures of Location
The Mean
The Median
The Mode
Percentiles
Quantiles
Measures of Variability
The Range
The Interquartile Range
The Variance
The Standard Deviation
Coefficient of Variation
Measures of Shape
Skewness
Kurtosis
The Normal Distribution
Sampling Distribution of the Mean
Testing Hypotheses
Defining a Hypothesis
Significance and Power
Student's t Distribution
Probability Values
References
Overview
This appendix provides a brief description of some of the statistical concepts
necessary for you to interpret the output of base SAS procedures for elementary
statistics. In addition, this appendix lists statistical notation, formulas, and standard
keywords used for common statistics in base SAS procedures. Brief examples illustrate
the statistical concepts.
Table A1.1 on page 1341 lists the most common statistics and the procedures that
compute them.
$x_i$
is the $i$th nonmissing value of the variable.
$f_i$
is the frequency that is associated with $x_i$.
$w_i$
is the weight that is associated with $x_i$ if you use a WEIGHT statement. The base
procedures automatically exclude the values of $x_i$ with missing weights from the
analysis.
By default, the base procedures treat a negative weight as if it is equal to zero.
However, if you use the EXCLNPWGT option in the PROC statement, then the
procedure also excludes those values of $x_i$ with nonpositive weights. Note that
most SAS/STAT procedures, such as PROC TTEST and PROC GLM, exclude
values with nonpositive weights by default.
If you omit the WEIGHT statement, then $w_i = 1$ for all $i$.
$n$
is the number of nonmissing values of $x_i$.
$\bar{x}$
is the mean
\[ \bar{x} = \sum w_i x_i \Big/ \sum w_i \]
$s^2$
is the variance
\[ s^2 = \frac{1}{d} \sum w_i (x_i - \bar{x})^2 \]
where $d$ is the variance divisor (the VARDEF= option) that you specify in the
PROC statement. Valid values are as follows:
When VARDEF=    d equals . . .
N               $n$
DF              $n - 1$
WEIGHT          $\sum w_i$
WDF             $\sum w_i - 1$
$z_i$
is the standardized variable
\[ z_i = (x_i - \bar{x}) / s \]
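As a worked illustration of how the variance divisor changes $s^2$, consider three hypothetical values $x = (2, 4, 6)$ with weights $w = (1, 1, 2)$. These numbers are made up for the example and do not come from any data set in this book.

```latex
\begin{align*}
n &= 3, \qquad \sum w_i = 1 + 1 + 2 = 4 \\
\bar{x} &= \frac{(1)(2) + (1)(4) + (2)(6)}{4} = \frac{18}{4} = 4.5 \\
\sum w_i (x_i - \bar{x})^2 &= (1)(-2.5)^2 + (1)(-0.5)^2 + (2)(1.5)^2
   = 6.25 + 0.25 + 4.5 = 11 \\
\text{VARDEF=N:}\quad s^2 &= 11 / 3 \approx 3.667 \\
\text{VARDEF=DF:}\quad s^2 &= 11 / (3 - 1) = 5.5 \\
\text{VARDEF=WEIGHT:}\quad s^2 &= 11 / 4 = 2.75 \\
\text{VARDEF=WDF:}\quad s^2 &= 11 / (4 - 1) \approx 3.667
\end{align*}
```

In this example the N and WDF divisors happen to coincide because $n = \sum w_i - 1$; in general they differ.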
The standard keywords and formulas for each statistic follow. Some formulas use
keywords to designate the corresponding statistic.
Table A1.1
Columns: Statistic; PROC MEANS and SUMMARY; PROC UNIVARIATE; PROC TABULATE; PROC REPORT; PROC CORR; PROC SQL
[The marks that show which procedures compute each statistic are not reproduced. The statistics listed are:]
Number of nonmissing values
Number of observations
Sum of weights
Mean
Sum
Extreme values
Minimum
Maximum
Range
Uncorrected sum of squares
Variance
Covariance
Standard deviation
Coefficient of variation
Skewness
Kurtosis
Confidence Limits
of the mean
of the variance
of quantiles
Median
Mode
Percentiles/Deciles/Quartiles
t test
for mean = 0
for mean = $\mu_0$
Correlation coefficients
Cronbach's alpha
Descriptive Statistics
The keywords for descriptive statistics are
CSS
is the sum of squares corrected for the mean, computed as
\[ \sum w_i (x_i - \bar{x})^2 \]
CV
is the percent coefficient of variation, computed as
\[ (100 s) / \bar{x} \]
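KURTOSIS | KURT
is the kurtosis, which measures heaviness of tails. When VARDEF=DF, the
kurtosis is computed as
\[ c_{4_n} \sum z_i^4 - \frac{3 (n-1)^2}{(n-2)(n-3)} \]
where $c_{4_n}$ is $\frac{n(n+1)}{(n-1)(n-2)(n-3)}$. The weighted kurtosis is computed as
\[ c_{4_n} \sum \left( (x_i - \bar{x}) / \hat{\sigma}_i \right)^4 - \frac{3 (n-1)^2}{(n-2)(n-3)}
 = c_{4_n} \sum w_i^2 \left( (x_i - \bar{x}) / \hat{\sigma} \right)^4 - \frac{3 (n-1)^2}{(n-2)(n-3)} \]
When VARDEF=N, the kurtosis is computed as
\[ \frac{1}{n} \sum z_i^4 - 3 \]
and the weighted kurtosis is computed as
\[ \frac{1}{n} \sum \left( (x_i - \bar{x}) / \hat{\sigma}_i \right)^4 - 3
 = \frac{1}{n} \sum w_i^2 \left( (x_i - \bar{x}) / \hat{\sigma} \right)^4 - 3 \]
where $\hat{\sigma}_i^2$ is $\hat{\sigma}^2 / w_i$. The formula is invariant under the transformation
$w_i^* = z w_i$, $z > 0$. When you use VARDEF=WDF or VARDEF=WEIGHT, the
kurtosis is set to missing.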
N
is the number of $x_i$ values that are not missing. Observations with $f_i$ less than
one and $w_i$ equal to missing or $w_i \le 0$ (when you use the EXCLNPWGT option)
are excluded from the analysis and are not included in the calculation of N.
NMISS
is the number of $x_i$ values that are missing. Observations with $f_i$ less than one
and $w_i$ equal to missing or $w_i \le 0$ (when you use the EXCLNPWGT option) are
excluded from the analysis and are not included in the calculation of NMISS.
NOBS
is the total number of observations and is calculated as the sum of N and NMISS.
However, if you use the WEIGHT statement, then NOBS is calculated as the sum
of N, NMISS, and the number of observations excluded because of missing or
nonpositive weights.
RANGE
is the range and is calculated as the difference between maximum value and
minimum value.
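SKEWNESS | SKEW
is skewness, which measures the tendency of the deviations to be larger in one
direction than in the other. When VARDEF=DF, the skewness is computed as
\[ c_{3_n} \sum z_i^3 \]
where $c_{3_n}$ is $\frac{n}{(n-1)(n-2)}$. The weighted skewness is computed as
\[ c_{3_n} \sum \left( (x_i - \bar{x}) / \hat{\sigma}_i \right)^3
 = c_{3_n} \sum w_i^{3/2} \left( (x_i - \bar{x}) / \hat{\sigma} \right)^3 \]
When VARDEF=N, the skewness is computed as
\[ \frac{1}{n} \sum z_i^3 \]
and the weighted skewness is computed as
\[ \frac{1}{n} \sum \left( (x_i - \bar{x}) / \hat{\sigma}_i \right)^3
 = \frac{1}{n} \sum w_i^{3/2} \left( (x_i - \bar{x}) / \hat{\sigma} \right)^3 \]
The formula is invariant under the transformation $w_i^* = z w_i$, $z > 0$. When you
use VARDEF=WDF or VARDEF=WEIGHT, the skewness is set to missing.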
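STDERR | STDMEAN
is the standard error of the mean, computed as
\[ s \Big/ \sqrt{\sum w_i} \]
SUM
is the sum, computed as
\[ \sum w_i x_i \]
SUMWGT
is the sum of the weights, $W$, computed as
\[ \sum w_i \]
USS
is the uncorrected sum of squares, computed as
\[ \sum w_i x_i^2 \]
VAR
is the variance $s^2$.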
\[ Q_3 - Q_1 \]
You use the QNTLDEF= option (PCTLDEF= in PROC UNIVARIATE) to specify the
method that the procedure uses to compute percentiles. Let $n$ be the number of
nonmissing values for a variable, and let $x_1, x_2, \ldots, x_n$ represent the ordered values of
the variable such that $x_1$ is the smallest value, $x_2$ is the next smallest value, and $x_n$ is the
largest value. For the $t$th percentile, let $p = t/100$, so that $p$ lies between 0 and 1. Then define $j$ as
the integer part of $np$ and $g$ as the fractional part of $np$ or $(n + 1) p$, so that
\[ np = j + g \quad \text{when QNTLDEF=1, 2, 3, or 5} \]
\[ (n + 1) p = j + g \quad \text{when QNTLDEF=4} \]
Here, QNTLDEF= specifies the method that the procedure uses to compute the $t$th
percentile, as shown in the table that follows.
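When you use the WEIGHT statement, the $t$th percentile is computed as
\[ y = \begin{cases}
\frac{1}{2} (x_i + x_{i+1}) & \text{if } \sum_{j=1}^{i} w_j = pW \\
x_{i+1} & \text{if } \sum_{j=1}^{i} w_j < pW < \sum_{j=1}^{i+1} w_j
\end{cases} \]
where $w_j$ is the weight that is associated with $x_j$ and $W = \sum_{i=1}^{n} w_i$ is the sum of the weights.
When the observations have identical weights, the weighted percentiles are the same as
the unweighted percentiles with QNTLDEF=5.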
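Table A1.2 Methods for Computing Percentile Statistics

QNTLDEF=  Description                            Formula
1         weighted average at $x_{np}$           $y = (1-g)\,x_j + g\,x_{j+1}$
                                                 where $x_0$ is taken to be $x_1$
2         observation numbered closest to $np$   $y = x_i$ if $g \ne \frac{1}{2}$
                                                 $y = x_j$ if $g = \frac{1}{2}$ and $j$ is even
                                                 $y = x_{j+1}$ if $g = \frac{1}{2}$ and $j$ is odd
                                                 where $i$ is the integer part of $np + \frac{1}{2}$
3         empirical distribution function        $y = x_j$ if $g = 0$
                                                 $y = x_{j+1}$ if $g > 0$
4         weighted average aimed at              $y = (1-g)\,x_j + g\,x_{j+1}$
          $x_{(n+1)p}$                           where $x_{n+1}$ is taken to be $x_n$
5         empirical distribution function        $y = \frac{1}{2}\,(x_j + x_{j+1})$ if $g = 0$
          with averaging                         $y = x_{j+1}$ if $g > 0$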
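As a rough illustration of how the definitions in Table A1.2 can differ, the following sketch requests the first quartile of nine values under two settings of QNTLDEF=. The data set name PCTL_DEMO is hypothetical, and the values in the comments come from applying the table's formulas by hand, not from a captured listing.

```sas
/* Nine ordered values used only for illustration (hypothetical data set name). */
data pctl_demo;
   input X @@;
   datalines;
-7 -2 1 3 6 10 15 21 30
;
run;

/* QNTLDEF=5 (the default): np = 9(0.25) = 2.25, so j=2 and g=0.25.   */
/* Because g > 0, the formula gives y = x3 = 1.                       */
proc means data=pctl_demo q1 qntldef=5;
   var x;
run;

/* QNTLDEF=4: (n+1)p = 10(0.25) = 2.5, so j=2 and g=0.5.              */
/* The formula gives y = (1-0.5)*x2 + 0.5*x3 = 0.5(-2) + 0.5(1) = -0.5. */
proc means data=pctl_demo q1 qntldef=4;
   var x;
run;
```

With small samples the choice of definition can change a reported quartile noticeably, which is one reason to state the definition when you report percentiles.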
T
is Student's t statistic for testing the null hypothesis that the population mean is
equal to $\mu_0$, computed as
$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{\sum w_i}} $$
By default, $\mu_0$ is equal to zero. You can use the MU0= option in the PROC
UNIVARIATE statement to specify $\mu_0$. You must use VARDEF=DF, which is the
default variance divisor; otherwise T is set to missing.
By default, when you use a WEIGHT statement, the procedure counts the $x_i$
values with nonpositive weights in the degrees of freedom. Use the EXCLNPWGT
option in the PROC statement to exclude values with nonpositive weights. Most
SAS/STAT procedures, such as PROC TTEST and PROC GLM, automatically
exclude values with nonpositive weights.
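PROBT
is the two-tailed p-value for Student's t statistic, T, with $n - 1$ degrees of freedom.
This is the probability under the null hypothesis of obtaining a more extreme
value of T than is observed in this sample.
CLM
is the two-sided confidence limit for the mean. A two-sided $100(1-\alpha)$ percent
confidence interval for the mean has upper and lower limits
$$ \bar{x} \pm t_{(1-\alpha/2;\,n-1)} \frac{s}{\sqrt{\sum w_i}} $$
where $s$ is $\sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$ and $t_{(1-\alpha/2;\,n-1)}$ is the
$(1-\alpha/2)$ critical value of Student's t distribution with $n - 1$ degrees of freedom.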
LCLM
is the one-sided confidence limit below the mean. The one-sided
$100(1-\alpha)$ percent confidence interval for the mean has the lower limit
$$ \bar{x} - t_{(1-\alpha;\,n-1)} \frac{s}{\sqrt{\sum w_i}} $$
Unless you use VARDEF=DF, which is the default variance divisor, LCLM is set to
missing.
UCLM
is the one-sided confidence limit above the mean. The one-sided
$100(1-\alpha)$ percent confidence interval for the mean has the upper limit
$$ \bar{x} + t_{(1-\alpha;\,n-1)} \frac{s}{\sqrt{\sum w_i}} $$
Unless you use VARDEF=DF, which is the default variance divisor, UCLM is set to
missing.
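The confidence-limit keywords can be tried on a small unweighted sample. The sketch below is only illustrative: the data set name CLM_DEMO is hypothetical, and with no WEIGHT statement each $w_i$ is 1, so $\sqrt{\sum w_i}$ reduces to $\sqrt{n}$. The DATA step then applies the two-sided formula directly with the TINV function so that the two results can be compared.

```sas
/* Hypothetical nine-value sample used only for illustration. */
data clm_demo;
   input X @@;
   datalines;
-7 -2 1 3 6 10 15 21 30
;
run;

/* CLM requests two-sided limits; ALPHA=0.05 gives a 95 percent interval. */
proc means data=clm_demo n mean std stderr clm alpha=0.05 maxdec=3;
   var x;
   output out=clm_stats n=n mean=xbar stderr=se;
run;

/* The same limits from the formula: xbar +/- t(1-alpha/2; n-1) * s/sqrt(n). */
data _null_;
   set clm_stats;
   alpha = 0.05;
   tcrit = tinv(1 - alpha/2, n - 1);   /* critical value with n-1 degrees of freedom */
   lower = xbar - tcrit*se;
   upper = xbar + tcrit*se;
   put lower=8.3 upper=8.3;
run;
```

For one-sided limits, the same DATA step applies with `tinv(1 - alpha, n - 1)` in place of the two-sided critical value.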
Using Weights
For more information on using weights and an example, see WEIGHT on page 63.
- SUM, MEAN, MAX, MIN, RANGE, USS, and CSS require at least one nonmissing observation.
- VAR, STD, STDERR, CV, T, and PRT require at least two nonmissing observations.
- SKEWNESS requires at least three nonmissing observations.
- KURTOSIS requires at least four nonmissing observations.
- SKEWNESS, KURTOSIS, T, and PROBT require that STD is greater than zero.
- CV requires that MEAN is not equal to zero.
- CLM, LCLM, UCLM, STDERR, T, and PROBT require that VARDEF=DF.
Statistical Background
Populations and Parameters
Usually, there is a clearly defined set of elements in which you are interested. This
set of elements is called the universe, and a set of values associated with these elements
is called a population of values. The statistical term population has nothing to do with
people per se. A statistical population is a collection of values, not a collection of people.
For example, a universe is all the students at a particular school, and there could be
two populations of interest: one of height values and one of weight values. Or, a
universe is the set of all widgets manufactured by a particular company, while the
population of values could be the length of time each widget is used before it fails.
A population of values can be described in terms of its cumulative distribution
function, which gives the proportion of the population less than or equal to each
possible value. A discrete population can also be described by a probability function,
which gives the proportion of the population equal to each possible value. A continuous
population can often be described by a density function, which is the derivative of the
cumulative distribution function. A density function can be approximated by a
histogram that gives the proportion of the population lying within each of a series of
intervals of values. A probability density function is like a histogram with an infinite
number of infinitely small intervals.
In technical literature, when the term distribution is used without qualification, it
generally refers to the cumulative distribution function. In informal writing,
distribution sometimes means the density function instead. Often the word distribution
is used simply to refer to an abstract population of values rather than some concrete
population. Thus, the statistical literature refers to many types of abstract distributions,
such as normal distributions, exponential distributions, Cauchy distributions, and so
on. When a phrase such as normal distribution is used, it frequently does not matter
whether the cumulative distribution function or the density function is intended.
It may be expedient to describe a population in terms of a few measures that
summarize interesting features of the distribution. One such measure, computed from
the population values, is called a parameter. Many different parameters can be defined
to measure different aspects of a distribution.
The most commonly used parameter is the (arithmetic) mean. If the population
contains a finite number of values, then the population mean is computed as the sum of
all the values in the population divided by the number of elements in the population.
For an infinite population, the concept of the mean is similar but requires more
complicated mathematics.
E(x) denotes the mean of a population of values symbolized by x, such as height,
where E stands for expected value. You can also consider expected values of derived
functions of the original values. For example, if x represents height, then $E(x^2)$ is the
expected value of height squared, that is, the mean value of the population obtained by
squaring every value in the population of heights.
For example, the sample mean is the usual estimator of the population mean. In the
case of the mean, the formulas for the parameter and the statistic are the same. In
other cases, the formula for a parameter may be different from that of the most
commonly used estimator. The most commonly used estimator is not necessarily the
best estimator in all applications.
Measures of Location
Measures of location include the mean, the median, and the mode. These measures
describe the center of a distribution. In the definitions that follow, notice that if the
entire sample changes by adding a fixed amount to each observation, then these
measures of location are shifted by the same fixed amount.
The Mean
The population mean $\mu = E(x)$ is usually estimated by the sample mean $\bar{x}$.
The Median
The population median is the central value, lying above and below half of the
population values. The sample median is the middle value when the data are arranged
in ascending or descending order. For an even number of observations, the midpoint
between the two middle values is usually reported as the median.
The Mode
The mode is the value at which the density of the population is at a maximum. Some
densities have more than one local maximum (peak) and are said to be multimodal.
The sample mode is the value that occurs most often in the sample. By default, PROC
UNIVARIATE reports the lowest such value if there is a tie for the most-often-occurring
sample value. PROC UNIVARIATE lists all possible modes when you specify the
MODES option in the PROC statement. If the population is continuous, then all sample
values occur once, and the sample mode has little use.
Percentiles
Percentiles, including quantiles, quartiles, and the median, are useful for a detailed
study of a distribution. For a set of measurements arranged in order of magnitude, the
pth percentile is the value that has p percent of the measurements below it and $(100 - p)$
percent above it. The median is the 50th percentile. Because it may not be possible to
divide your data so that you get exactly the desired percentile, the UNIVARIATE
procedure uses a more precise definition.
The upper quartile of a distribution is the value below which 75 percent of the
measurements fall (the 75th percentile). Twenty-five percent of the measurements fall
below the lower quartile value.
Quantiles
In the following example, SAS artificially generates the data with a pseudorandom
number function. The UNIVARIATE procedure computes a variety of quantiles and
measures of location, and outputs the values to a SAS data set. A DATA step then uses
the SYMPUT routine to assign the values of the statistics to macro variables. The
macro %FORMGEN uses these macro variables to produce value labels for the
FORMAT procedure. PROC CHART uses the resulting format to display the values of
the statistics on a histogram.
options nodate pageno=1 linesize=80 pagesize=52;

title 'Example of Quantiles and Measures of Location';

data random;
   drop n;
   do n=1 to 1000;
      X=floor(exp(rannor(314159)*.8+1.8));
      output;
   end;
run;

proc univariate data=random nextrobs=0;
   var x;
   output out=location
          mean=Mean mode=Mode median=Median
          q1=Q1 q3=Q3 p5=P5 p10=P10 p90=P90 p95=P95
          max=Max;
run;

data _null_;
   set location;
   call symput('MEAN',round(mean,1));
   call symput('MODE',mode);
   call symput('MEDIAN',round(median,1));
   call symput('Q1',round(q1,1));
   call symput('Q3',round(q3,1));
   call symput('P5',round(p5,1));
   call symput('P10',round(p10,1));
   call symput('P90',round(p90,1));
   call symput('P95',round(p95,1));
   call symput('MAX',min(50,max));
run;

%macro formgen;
%do i=1 %to &max;
   %let value=&i;
   %if &i=&p5     %then %let value=&value P5;
   %if &i=&p10    %then %let value=&value P10;
   %if &i=&q1     %then %let value=&value Q1;
   %if &i=&mode   %then %let value=&value Mode;
   %if &i=&median %then %let value=&value Median;
   %if &i=&mean   %then %let value=&value Mean;
   %if &i=&q3     %then %let value=&value Q3;
   %if &i=&p90    %then %let value=&value P90;
   %if &i=&p95    %then %let value=&value P95;
   %if &i=&max    %then %let value=>=&value;
   &i="&value"
%end;
%mend;
run;
Example of Quantiles and Measures of Location

Moments
N                       1000    Sum Weights             1000
Mean                   7.605    Sum Observations        7605
Std Deviation     7.38169794    Variance          54.4894645
Skewness          2.73038523    Kurtosis          11.1870588
Uncorrected SS        112271    Corrected SS       54434.975
Coeff Variation   97.0637467    Std Error Mean    0.23342978

Basic Statistical Measures
Location                  Variability
Mean     7.605000         Std Deviation            7.38170
Median   5.000000         Variance                54.48946
Mode     3.000000         Range                   62.00000
                          Interquartile Range      6.00000

Tests for Location
Test           -Statistic-        -----p Value------
Student's t    t    32.57939      Pr > |t|    <.0001
Sign           M       494.5      Pr >= |M|   <.0001
Signed Rank    S    244777.5      Pr >= |S|   <.0001

Quantiles (Definition 5)
Quantile      Estimate
100% Max          62.0
99%               37.5
95%               21.5
90%               16.0
75% Q3             9.0
50% Median         5.0
25% Q1             3.0
10%                2.0
5%                 1.0
1%                 0.0
0% Min             0.0

 Mean    Max     P95    P90    Q3    Median    Q1    P10    P5    Mode
7.605     62    21.5     16     9         5     3      2     1       3
[Figure: PROC CHART vertical bar chart of Frequency (0 to 120) by X Midpoint (1 through >=50). The horizontal axis labels mark P5, P10, Q1, Mode, Median, Mean, Q3, P90, and P95.]
P5  = 5TH PERCENTILE
P10 = 10TH PERCENTILE
P90 = 90TH PERCENTILE
P95 = 95TH PERCENTILE
Q1  = 1ST QUARTILE
Q3  = 3RD QUARTILE
Measures of Variability
Another group of statistics is important in studying the distribution of a population.
These statistics measure the variability, also called the spread, of values. In the
definitions given in the sections that follow, notice that if the entire sample is changed
by the addition of a fixed amount to each observation, then the values of these statistics
are unchanged. If each observation in the sample is multiplied by a constant, however,
then the values of these statistics are appropriately rescaled.
The Range
The sample range is the difference between the largest and smallest values in the
sample. For many populations, at least in statistical theory, the range is infinite, so the
sample range may not tell you much about the population. The sample range tends to
increase as the sample size increases. If all sample values are multiplied by a constant,
then the sample range is multiplied by the same constant.
The Variance
The population variance, usually denoted by $\sigma^2$, is the expected value of the squared
difference of the values from the population mean:
$$ \sigma^2 = E(x - \mu)^2 $$
The sample variance is denoted by $s^2$. The difference between a value and the mean
is called a deviation from the mean. Thus, the variance approximates the mean of the
squared deviations.
When all the values lie close to the mean, the variance is small but never less than
zero. When values are more scattered, the variance is larger. If all sample values are
multiplied by a constant, then the sample variance is multiplied by the square of the
constant.
Sometimes values other than $n - 1$ are used in the denominator of the sample variance.
The VARDEF= option controls what divisor the procedure uses.
Coefficient of Variation
The coefficient of variation is a unitless measure of relative variability. It is defined
as the ratio of the standard deviation to the mean expressed as a percentage. The
coefficient of variation is meaningful only if the variable is measured on a ratio scale. If
all sample values are multiplied by a constant, then the sample coefficient of variation
remains unchanged.
Measures of Shape
Skewness
The variance is a measure of the overall size of the deviations from the mean. Since
the formula for the variance squares the deviations, both positive and negative
deviations contribute to the variance in the same way. In many distributions, positive
deviations may tend to be larger in magnitude than negative deviations, or vice versa.
Skewness is a measure of the tendency of the deviations to be larger in one direction
than in the other. For example, the data in the last example are skewed to the right.
Population skewness is defined as
$$ \frac{E(x - \mu)^3}{\sigma^3} $$
Because the deviations are cubed rather than squared, the signs of the deviations are
maintained. Cubing the deviations also emphasizes the effects of large deviations. The
formula includes a divisor of $\sigma^3$ to remove the effect of scale, so multiplying all values
by a positive constant does not change the skewness. Skewness can thus be interpreted as a
tendency for one tail of the population to be heavier than the other. Skewness can be
positive or negative and is unbounded.
Kurtosis
The heaviness of the tails of a distribution affects the behavior of many statistics.
Hence it is useful to have a measure of tail heaviness. One such measure is kurtosis.
The population kurtosis is usually defined as
$$ \frac{E(x - \mu)^4}{\sigma^4} - 3 $$
Because the deviations are raised to the fourth power, positive and negative
deviations make the same contribution, while large deviations are strongly emphasized.
Because of the divisor $\sigma^4$, multiplying each value by a constant has no effect on kurtosis.
Population kurtosis must lie between $-2$ and $+\infty$, inclusive. If $M_3$ represents
population skewness and $M_4$ represents population kurtosis, then
$$ M_4 \ge (M_3)^2 - 2 $$
two standard deviations of the mean; and about 99.7 percent are within three standard
deviations. Use of the term normal to describe this particular kind of distribution does
not imply that other kinds of distributions are necessarily abnormal or pathological.
Many statistical methods are designed under the assumption that the population
being sampled is normally distributed. Nevertheless, most real-life populations do not
have normal distributions. Before using any statistical method based on normality
assumptions, you should consult the statistical literature to nd out how sensitive the
method is to nonnormality and, if necessary, check your sample for evidence of
nonnormality.
In the following example, SAS generates a sample from a normal distribution with a
mean of 50 and a standard deviation of 10. The UNIVARIATE procedure performs tests
for location and normality. Because the data are from a normal distribution, all p-values
from the tests for normality are greater than 0.15. The CHART procedure displays a
histogram of the observations. The shape of the histogram is a bell-like, normal density.
options nodate pageno=1 linesize=80 pagesize=52;

title '10000 Obs Sample from a Normal Distribution';
title2 'with Mean=50 and Standard Deviation=10';

data normaldat;
   drop n;
   do n=1 to 10000;
      X=10*rannor(53124)+50;
      output;
   end;
run;

proc univariate data=normaldat nextrobs=0 normal
                mu0=50 loccount;
   var x;
run;

proc format;
   picture msd
      20='20 3*Std' (noedit)
      30='30 2*Std' (noedit)
      40='40 1*Std' (noedit)
      50='50 Mean ' (noedit)
      60='60 1*Std' (noedit)
      70='70 2*Std' (noedit)
      80='80 3*Std' (noedit)
   other='        ';
run;

options linesize=80 pagesize=42;

proc chart;
   vbar x / midpoints=20 to 80 by 2;
   format x msd.;
run;
Moments
N                      10000    Sum Weights            10000
Mean              50.0323744    Sum Observations  500323.744
Std Deviation     9.92013874    Variance          98.4091525
Skewness           -0.019929    Kurtosis          -0.0163755
Uncorrected SS      26016378    Corrected SS      983993.116
Coeff Variation   19.8274395    Std Error Mean    0.09920139

Basic Statistical Measures
Location                  Variability
Mean     50.03237         Std Deviation            9.92014
Median   50.06492         Variance                98.40915
Mode     .                Range                   76.51343
                          Interquartile Range     13.28179

Tests for Location
Test           -Statistic-        -----p Value------
Student's t    t     0.32635      Pr > |t|    0.7442
Sign           M          26      Pr >= |M|   0.6101
Signed Rank    S      174063      Pr >= |S|   0.5466

Location Counts
Count                Value
Num Obs > Mu0         5026
Num Obs ^= Mu0       10000
Num Obs < Mu0         4974

Tests for Normality
Test                  --Statistic---       -----p Value------
Kolmogorov-Smirnov    D       0.006595     Pr > D       >0.1500
Cramer-von Mises      W-Sq    0.049963     Pr > W-Sq    >0.2500
Anderson-Darling      A-Sq    0.371151     Pr > A-Sq    >0.2500

Quantiles (Definition 5)
Quantile      Estimate
100% Max       90.2105
99%            72.6780
95%            66.2221
90%            62.6678
75% Q3         56.7280
50% Median     50.0649
25% Q1         43.4462
10%            37.1139
5%             33.5454
1%             26.9189
0% Min         13.6971
[Figure: PROC CHART vertical bar chart of Frequency (0 to about 800) by X Midpoint. The horizontal axis labels are 20 3*Std, 30 2*Std, 40 1*Std, 50 Mean, 60 1*Std, 70 2*Std, and 80 3*Std.]
It can be proven mathematically that if the original population has mean $\mu$ and
standard deviation $\sigma$, then the sampling distribution of the mean also has mean $\mu$, but
its standard deviation is $\sigma / \sqrt{n}$. The standard deviation of the sampling distribution of
the mean is called the standard error of the mean. The standard error of the mean
provides an indication of the accuracy of a sample mean as an estimator of the
population mean.
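The $\sigma/\sqrt{n}$ result can be sketched in a few lines. The derivation below assumes that the $n$ observations $x_1, \ldots, x_n$ are drawn independently from a population with mean $\mu$ and finite variance $\sigma^2$; the independence assumption is stated here for the sketch and is what allows the variances to add.

```latex
\begin{aligned}
E(\bar{x}) &= \frac{1}{n} \sum_{i=1}^{n} E(x_i) = \frac{1}{n}\, n\mu = \mu \\
\mathrm{Var}(\bar{x}) &= \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(x_i)
                       = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n} \\
\mathrm{SD}(\bar{x}) &= \sqrt{\mathrm{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}
\end{aligned}
```

For instance, with $\sigma = 1$ and $n = 10$ the standard error is $1/\sqrt{10} \approx 0.32$.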
If the original population has a normal distribution, then the sampling distribution of
the mean is also normal. If the original distribution is not normal but does not have
excessively long tails, then the sampling distribution of the mean can be approximated
by a normal distribution for large sample sizes.
The following example consists of three separate programs that show how the
sampling distribution of the mean can be approximated by a normal distribution as the
sample size increases. The first DATA step uses the RANEXP function to create a
sample of 1000 observations from an exponential distribution. The theoretical
population mean is 1.00, while the sample mean is 1.01, to two decimal places. The
population standard deviation is 1.00; the sample standard deviation is 1.04.
This is an example of a nonnormal distribution. The population skewness is 2.00,
which is close to the sample skewness of 1.97. The population kurtosis is 6.00, but the
sample kurtosis is only 4.80.
options pagesize=64;

proc univariate data=expodat nextrobs=0 normal
                mu0=1;
   var x;
run;
1000 Observation Sample
from an Exponential Distribution
[Figure: PROC CHART vertical bar chart of Frequency (0 to 300) by X Midpoint, with axis labels from 0.05 to 5.55 in steps of 0.5.]
Moments
N                       1000    Sum Weights             1000
Mean              1.01176214    Sum Observations  1011.76214
Std Deviation     1.04371187    Variance          1.08933447
Skewness          1.96963112    Kurtosis          4.80150594
Uncorrected SS    2111.90777    Corrected SS      1088.24514
Coeff Variation    103.15783    Std Error Mean    0.03300507

Basic Statistical Measures
Location                  Variability
Mean     1.011762         Std Deviation            1.04371
Median   0.689502         Variance                 1.08933
Mode     .                Range                    6.63851
                          Interquartile Range      1.06252

Tests for Location
Test           -Statistic-        -----p Value------
Student's t    t    0.356374      Pr > |t|    0.7216
Sign           M        -140      Pr >= |M|   <.0001
Signed Rank    S      -50781      Pr >= |S|   <.0001

Tests for Normality
Test                  --Statistic---       -----p Value------
Shapiro-Wilk          W       0.801498     Pr < W       <0.0001
Kolmogorov-Smirnov    D       0.166308     Pr > D       <0.0100
Cramer-von Mises      W-Sq    9.507975     Pr > W-Sq    <0.0050
Anderson-Darling      A-Sq     54.5478     Pr > A-Sq    <0.0050

Quantiles (Definition 5)
Quantile        Estimate
100% Max      6.63906758
99%           5.04491651
95%           3.13482318
90%           2.37803632
75% Q3        1.35733401
50% Median    0.68950221
25% Q1        0.29481436
10%           0.10219011
5%            0.05192799
1%            0.01195590
0% Min        0.00055441
The next DATA step generates 1000 different samples from the same exponential
distribution. Each sample contains ten observations. The MEANS procedure computes
the mean of each sample. In the data set that is created by PROC MEANS, each
observation represents the mean of a sample of ten observations from an exponential
distribution. Thus, the data set is a sample from the sampling distribution of the mean
for an exponential population.
PROC UNIVARIATE displays statistics for this sample of means. Notice that the
mean of the sample of means is .99, almost the same as the mean of the original
population. Theoretically, the standard deviation of the sampling distribution is
$\sigma / \sqrt{n} = 1.00 / \sqrt{10} = .32$, whereas the standard deviation of this sample from
the sampling distribution is .30. The skewness (.55) and kurtosis (-.006) are closer to
zero in the sample from the sampling distribution than in the original sample from the
exponential distribution. This is so because the sampling distribution is closer to a
normal distribution than is the original exponential distribution. The CHART
procedure displays a histogram of the 1000-sample means. The shape of the histogram
is much closer to a bell-like, normal density, but it is still distinctly lopsided.
title '1000 Sample Means with 10 Obs per Sample';
title2 'Drawn from an Exponential Distribution';

data samp10;
   drop n;
   do sample=1 to 1000;
      do n=1 to 10;
         X=ranexp(433879);
         output;
      end;
   end;

proc means data=samp10 noprint;
   output out=mean10 mean=Mean;
   var x;
   by sample;
run;

proc format;
   value axisfmt
      .05='0.05'
      .55='0.55'
      1.05='1.05'
      1.55='1.55'
      2.05='2.05'
      other=' ';
run;

proc chart data=mean10;
   vbar mean / axis=300
               midpoints=0.05 to 2.05 by .1;
   format mean axisfmt.;
run;

options pagesize=64;

proc univariate data=mean10 nextrobs=0 normal
                mu0=1;
   var mean;
run;
1000 Sample Means with 10 Obs per Sample
Drawn from an Exponential Distribution
[Figure: PROC CHART vertical bar chart of Frequency (0 to 300) by Mean Midpoint, with axis labels 0.05, 0.55, 1.05, 1.55, and 2.05.]
Moments
N                       1000    Sum Weights             1000
Mean               0.9906857    Sum Observations  990.685697
Std Deviation     0.30732649    Variance          0.09444957
Skewness          0.54575615    Kurtosis          -0.0060892
Uncorrected SS    1075.81327    Corrected SS      94.3551193
Coeff Variation   31.0215931    Std Error Mean    0.00971852

Basic Statistical Measures
Location                  Variability
Mean     0.990686         Std Deviation            0.30733
Median   0.956152         Variance                 0.09445
Mode     .                Range                    1.79783
                          Interquartile Range      0.41703

Tests for Location
Test           -Statistic-        -----p Value------
Student's t    t    -0.95841      Pr > |t|    0.3381
Sign           M         -53      Pr >= |M|   0.0009
Signed Rank    S      -22687      Pr >= |S|   0.0129

Tests for Normality
Test                  --Statistic---       -----p Value------
Shapiro-Wilk          W         0.9779     Pr < W       <0.0001
Kolmogorov-Smirnov    D       0.055498     Pr > D       <0.0100
Cramer-von Mises      W-Sq    0.953926     Pr > W-Sq    <0.0050
Anderson-Darling      A-Sq    5.945023     Pr > A-Sq    <0.0050

Quantiles (Definition 5)
Quantile      Estimate
100% Max      2.053899
99%           1.827503
95%           1.557175
90%           1.416611
75% Q3        1.181006
50% Median    0.956152
25% Q1        0.763973
10%           0.621787
5%            0.553568
1%            0.433820
0% Min        0.256069
In the following DATA step, the size of each sample from the exponential distribution
is increased to 50. The standard deviation of the sampling distribution is smaller than
in the previous example because the size of each sample is larger. Also, the sampling
distribution is even closer to a normal distribution, as can be seen from the histogram
and the skewness.
options nodate pageno=1 linesize=80 pagesize=48;

title '1000 Sample Means with 50 Obs per Sample';
title2 'Drawn from an Exponential Distribution';

data samp50;
   drop n;
   do sample=1 to 1000;
      do n=1 to 50;
         X=ranexp(72437213);
         output;
      end;
   end;

proc means data=samp50 noprint;
   output out=mean50 mean=Mean;
   var x;
   by sample;
run;

proc format;
   value axisfmt
      .05='0.05'
      .55='0.55'
      1.05='1.05'
      1.55='1.55'
      2.05='2.05'
      2.55='2.55'
      other=' ';
run;

proc chart data=mean50;
   vbar mean / axis=300
               midpoints=0.05 to 2.55 by .1;
   format mean axisfmt.;
run;

options pagesize=64;

proc univariate data=mean50 nextrobs=0 normal
                mu0=1;
   var mean;
run;
1000 Sample Means with 50 Obs per Sample
Drawn from an Exponential Distribution
[Figure: PROC CHART vertical bar chart of Frequency (0 to 300) by Mean Midpoint, with axis labels 0.05, 0.55, 1.05, 1.55, 2.05, and 2.55.]
Moments
N                       1000    Sum Weights             1000
Mean              0.99679697    Sum Observations  996.796973
Std Deviation     0.13815404    Variance          0.01908654
Skewness          0.19062633    Kurtosis          -0.1438604
Uncorrected SS    1012.67166    Corrected SS       19.067451
Coeff Variation   13.8597969    Std Error Mean    0.00436881

Basic Statistical Measures
Location                  Variability
Mean     0.996797         Std Deviation            0.13815
Median   0.996023         Variance                 0.01909
Mode     .                Range                    0.87040
                          Interquartile Range      0.18956

Tests for Location
Test           -Statistic-        -----p Value------
Student's t    t    -0.73316      Pr > |t|    0.4636
Sign           M         -13      Pr >= |M|   0.4292
Signed Rank    S      -10767      Pr >= |S|   0.2388

Tests for Normality
Test                  --Statistic---       -----p Value------
Shapiro-Wilk          W       0.996493     Pr < W        0.0247
Kolmogorov-Smirnov    D       0.023687     Pr > D       >0.1500
Cramer-von Mises      W-Sq    0.084468     Pr > W-Sq     0.1882
Anderson-Darling      A-Sq     0.66039     Pr > A-Sq     0.0877

Quantiles (Definition 5)
Quantile      Estimate
100% Max      1.454957
99%           1.337016
95%           1.231508
90%           1.179223
75% Q3        1.086515
50% Median    0.996023
25% Q1        0.896953
10%           0.814906
5%            0.780783
1%            0.706588
0% Min        0.584558
Testing Hypotheses
Defining a Hypothesis
The purpose of the statistical methods that have been discussed so far is to estimate
a population parameter by means of a sample statistic. Another class of statistical
methods is used for testing hypotheses about population parameters or for measuring
the amount of evidence against a hypothesis.
Consider the universe of students in a college. Let the variable X be the number of
pounds by which a student's weight deviates from the ideal weight for a person of the
same sex, height, and build. You want to find out whether the population of students is,
on the average, underweight or overweight. To this end, you have taken a random
sample of X values from nine students, with results as given in the following DATA step:
title 'Deviations from Normal Weight';
data x;
input X @@;
datalines;
-7 -2 1 3 6 10 15 21 30
;
You can define several hypotheses of interest. One hypothesis is that, on the average,
the students are of exactly ideal weight. If $\mu$ represents the population mean of the X
values, then you can write this hypothesis, called the null hypothesis, as $H_0: \mu = 0$.
The other two hypotheses, called alternative hypotheses, are that the students are
underweight on the average, $H_1: \mu < 0$, and that the students are overweight on the
average, $H_2: \mu > 0$.
The null hypothesis is so called because in many situations it corresponds to the
assumption of "no effect" or "no difference." However, this interpretation is not
appropriate for all testing problems. The null hypothesis is like a straw man that can
be toppled by statistical evidence. You decide between the alternative hypotheses
according to which way the straw man falls.
A naive way to approach this problem would be to look at the sample mean $\bar{x}$ and
decide among the three hypotheses according to the following rule:
- If $\bar{x} < 0$, then decide on $H_1: \mu < 0$.
- If $\bar{x} = 0$, then decide on $H_0: \mu = 0$.
- If $\bar{x} > 0$, then decide on $H_2: \mu > 0$.
The trouble with this approach is that there may be a high probability of making an
incorrect decision. If $H_0$ is true, then you are nearly certain to make a wrong decision
because the chances of $\bar{x}$ being exactly zero are almost nil. If $\mu$ is slightly less than
zero, so that $H_1$ is true, then there may be nearly a 50 percent chance that $\bar{x}$ will be
greater than zero in repeated sampling, so the chances of incorrectly choosing $H_2$ would
also be nearly 50 percent. Thus, you have a high probability of making an error if $\bar{x}$ is
near zero. In such cases, there is not enough evidence to make a confident decision, so
the best response may be to reserve judgment until you can obtain more evidence.
The question is, how far from zero must $\bar{x}$ be for you to be able to make a confident
decision? The answer can be obtained by considering the sampling distribution of $\bar{x}$. If
X has a roughly normal distribution, then $\bar{x}$ has an approximately normal sampling
distribution. The mean of the sampling distribution of $\bar{x}$ is $\mu$. Assume temporarily that
$\sigma$, the standard deviation of X, is known to be 12. Then the standard error of $\bar{x}$ for
samples of nine observations is $\sigma / \sqrt{n} = 12 / \sqrt{9} = 4$.
You know that about 95 percent of the values from a normal distribution are within
two standard deviations of the mean, so about 95 percent of the possible samples of
nine X values have a sample mean $\bar{x}$ between $0 - 2(4)$ and $0 + 2(4)$, or between $-8$
and 8. Consider the chances of making an error with the following decision rule:
- If $\bar{x} < -8$, then decide on $H_1: \mu < 0$.
- If $-8 \le \bar{x} \le 8$, then reserve judgment.
- If $\bar{x} > 8$, then decide on $H_2: \mu > 0$.
If $H_0$ is true, then in about 95 percent of the possible samples $\bar{x}$ will be between the
critical values $-8$ and 8, so you will reserve judgment. In these cases the statistical
evidence is not strong enough to fell the straw man. In the other 5 percent of the
samples you will make an error; in 2.5 percent of the samples you will incorrectly
choose $H_1$, and in 2.5 percent you will incorrectly choose $H_2$.
The price you pay for controlling the chances of making an error is the necessity of
reserving judgment when there is not sufficient statistical evidence to reject the null
hypothesis.
hypothesis. If you were interested only in the possibility of the students being
overweight on the average, then you could use a one-tailed test:
- If $\bar{x} \le 8$, then reserve judgment.
- If $\bar{x} > 8$, then decide on $H_2: \mu > 0$.
For this one-tailed test, the type I error rate is 2.5 percent, half that of the two-tailed
test.
The probability of rejecting the null hypothesis if it is false is called the power of the
statistical test and is typically denoted as $1 - \beta$. $\beta$ is called the Type II error rate,
which is the probability of not rejecting a false null hypothesis. The power depends on
the true value of the parameter. In the example, assume the population mean is 4. The
power for detecting H2 is the probability of getting a sample mean greater than 8. The
critical value 8 is one standard error higher than the population mean 4. The chance of
getting a value at least one standard deviation greater than the mean from a normal
distribution is about 16 percent, so the power for detecting the alternative hypothesis
H2 is about 16 percent. If the population mean were 8, then the power for H2 would be
50 percent, whereas a population mean of 12 would yield a power of about 84 percent.
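The power figures quoted above can be checked with the PROBNORM function. This is a minimal sketch under the same assumptions as the text: a normal sampling distribution, a standard error of 4, and a critical value of 8. The data set name POWER_DEMO and the variable names are hypothetical.

```sas
/* Power for deciding on H2 at three assumed population means. */
data power_demo;
   crit = 8;    /* critical value for deciding on H2        */
   se   = 4;    /* standard error: sigma/sqrt(n) = 12/sqrt(9) */
   do mu = 4, 8, 12;
      /* P(xbar > crit) when xbar is normal with mean mu and standard error se */
      power = 1 - probnorm((crit - mu)/se);
      output;
   end;
run;

proc print data=power_demo noobs;
   var mu power;
   format power 5.3;
run;
```

The three rows correspond to the values of about 16, 50, and 84 percent given in the text, because the critical value lies one standard error above, exactly at, and one standard error below the assumed mean.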
The smaller the type I error rate is, the less the chance of making an incorrect
decision, but the higher the chance of having to reserve judgment. In choosing a type I
error rate, you should consider the resulting power for various alternatives of interest.
Students t Distribution
In practice, you usually cannot use any decision rule that uses a critical value based
on because you do not usually know the value of . You can, however, use s as an
estimate of . Consider the following statistic:
t=
x 0 0
p
s= n
This t statistic is the difference between the sample mean and the hypothesized
mean 0 divided by the estimated standard error of the mean.
If the null hypothesis is true and the population is normally distributed, then the t
statistic has what is called a Student's t distribution with n - 1 degrees of freedom.
This distribution looks very similar to a normal distribution, but the tails of the
Student's t distribution are heavier. As the sample size gets larger, the sample standard
deviation becomes a better estimator of the population standard deviation, and the t
distribution gets closer to a normal distribution.
You can base a decision rule on the t statistic:
0
0
= 0.
Then, the TINV function in a DATA step computes the value of Student's t
distribution for a two-tailed test at the 5 percent level of significance and 8 degrees of
freedom.
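/* Read the nine deviations from normal weight */
data devnorm;
   title 'Deviations from Normal Weight';
   input X @@;
   datalines;
-7 -2 1 3 6 10 15 21 30
;

/* Compute the t statistic for the null hypothesis that the mean is 0 */
proc means data=devnorm maxdec=3 n mean
           std stderr t probt;
run;

/* Print the two-tailed 5 percent critical value for 8 degrees of freedom */
title 'Student''s t Critical Value';
data _null_;
   file print;
   t=tinv(.975,8);
   put t 5.3;
run;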
Analysis Variable : X

N       Mean    Std Dev  Std Error  t Value  Pr > |t|
------------------------------------------------------
9      8.556     11.759      3.920     2.18    0.0606
------------------------------------------------------

2.306
In the current example, the value of the t statistic is 2.18, which is less than the critical
t value of 2.3 (for a 5 percent significance level and 8 degrees of freedom). Thus, at a 5
percent significance level you must reserve judgment. If you had elected to use a 10
percent significance level, then the critical value of the t distribution would have been
1.86 and you could have rejected the null hypothesis. The sample size is so small,
however, that the validity of your conclusion depends strongly on how close the
distribution of the population is to a normal distribution.
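As a cross-check on the comparison above, the t statistic and both critical values can be recomputed from the summary statistics in the PROC MEANS output. This is a minimal sketch: the variable names are illustrative, and the inputs are the rounded values printed in the output, so the result agrees only to the displayed precision.

```sas
data _null_;
   file print;
   n    = 9;        /* number of observations */
   xbar = 8.556;    /* sample mean from the PROC MEANS output */
   s    = 11.759;   /* sample standard deviation */
   mu0  = 0;        /* hypothesized mean */
   t      = (xbar - mu0) / (s / sqrt(n));   /* estimated standard error is s/sqrt(n) */
   crit05 = tinv(.975, n - 1);              /* two-tailed, 5 percent level */
   crit10 = tinv(.95,  n - 1);              /* two-tailed, 10 percent level */
   put t= 6.3 crit05= 6.3 crit10= 6.3;
run;
```

The computed t of about 2.18 falls between the 10 percent critical value (about 1.86) and the 5 percent critical value (about 2.31), which is why the conclusion differs between the two levels.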
Probability Values
Another way to report the results of a statistical test is to compute a probability
value or p-value. A p-value gives the probability, in repeated sampling when the null
hypothesis is true, of obtaining a statistic as far in the direction(s) specified by the
alternative hypothesis as is the value actually observed. A two-tailed p-value for a t
statistic is the probability of obtaining an absolute t value that is greater than the
observed absolute t value. A one-tailed p-value for a t statistic for the alternative
hypothesis $\mu > \mu_0$ is the probability of obtaining a t value greater than the
observed t value. Once the p-value is computed, you can perform
a hypothesis test by comparing the p-value with the desired significance level. If the
p-value is less than or equal to the type I error rate of the test, then the null hypothesis
can be rejected. The two-tailed p-value, labeled Pr > |t| in the PROC MEANS output,
is .0606, so the null hypothesis could be rejected at the 10 percent significance level but
not at the 5 percent level.
A p-value is a measure of the strength of the evidence against the null hypothesis.
The smaller the p-value, the stronger the evidence for rejecting the null hypothesis.
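The p-values described above can be approximated directly with the PROBT function. This sketch assumes the rounded t value of 2.18 and 8 degrees of freedom from the example; the variable names p_two and p_one are illustrative. Because the input is rounded, the two-tailed result should be close to, but not necessarily identical to, the .0606 reported by PROC MEANS.

```sas
data _null_;
   file print;
   t  = 2.18;   /* observed t value from the PROC MEANS output (rounded) */
   df = 8;      /* degrees of freedom, n - 1 */
   p_two = 2 * (1 - probt(abs(t), df));   /* two-tailed p-value */
   p_one = 1 - probt(t, df);              /* one-tailed, alternative mu > mu0 */
   put p_two= 6.4 p_one= 6.4;
run;
```

Comparing p_two with the chosen significance level gives the same decision as comparing the t statistic with the corresponding critical value.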
Note: For a more thorough discussion, consult an introductory statistics textbook
such as Mendenhall and Beaver (1998); Ott and Mendenhall (1994); or Snedecor and
Cochran (1989).
References
Ali, M.M. (1974), "Stochastic Ordering and Kurtosis Measure," Journal of the
American Statistical Association, 69, 543-545.
Johnson, M.E., Tietjen, G.L., and Beckman, R.J. (1980), "A New Family of
Probability Distributions With Applications to Monte Carlo Studies," Journal of
the American Statistical Association, 75, 276-279.
Kaplansky, I. (1945), "A Common Error Concerning Kurtosis," Journal of the
American Statistical Association, 40, 259-263.
Mendenhall, W. and Beaver, R. (1998), Introduction to Probability and Statistics,
10th Edition, Belmont, CA: Wadsworth Publishing Company.
Ott, R. and Mendenhall, W. (1994), Understanding Statistics, 6th Edition, North
Scituate, MA: Duxbury Press.
Schlotzhauer, S.D. and Littell, R.C. (1997), SAS System for Elementary Statistical
Analysis, Second Edition, Cary, NC: SAS Institute Inc.
Snedecor, G.W. and Cochran, W.C. (1989), Statistical Methods, 8th Edition, Ames, IA:
Iowa State University Press.
APPENDIX
2
Operating Environment-Specific Procedures
Descriptions of Operating Environment-Specific Procedures
Host-Specific Procedures
Procedure   Description                                        Releases
BMDP                                                           All
CONVERT                                                        All
C16PORT                                                        6.10 - 6.12
FSDEVICE                                                       All
PDS                                                            6.09E
PDSCOPY     Copies partitioned data sets from disk to disk,    6.09E
            disk to tape, tape to tape, or tape to disk.
RELEASE                                                        6.09E
SOURCE                                                         6.09E
TAPECOPY                                                       6.09E
TAPELABEL                                                      6.09E
APPENDIX
3
Raw Data and DATA Steps
Overview
CENSUS
CHARITY
CUSTOMER_RESPONSE
DJIA
EDUCATION
EMPDATA
ENERGY
GROC
MATCH_11
PROCLIB.DELAY
PROCLIB.EMP95
PROCLIB.EMP96
PROCLIB.INTERNAT
PROCLIB.LAKES
PROCLIB.MARCH
PROCLIB.PAYLIST2
PROCLIB.PAYROLL
PROCLIB.PAYROLL2
PROCLIB.SCHEDULE
PROCLIB.STAFF
PROCLIB.SUPERV
RADIO
Overview
The programs for examples in this document generally show you how to create the
data sets that are used. Some examples show only partial data. For these examples,
the complete data is shown in this appendix.
CENSUS
data census;
input Density CrimeRate State $ 14-27 PostalCode $ 29-30;
datalines;
263.3 4575.3 Ohio
OH
62.1 7017.1
Washington
WA
Mississippi
MS
FL
80.8 2190.7
WV
West Virginia
MD
71.2 4707.5
Missouri
MO
43.9 4245.2
Arkansas
AR
7.3 6371.4
Nevada
NV
PA
11.5 4156.3
Idaho
ID
44.1 6025.6
Oklahoma
OK
51.2 4615.8
Minnesota
MN
55.2 4271.2
Vermont
VT
27.4 6969.9
Oregon
OR
IL
94.1 5792.0
Georgia
GA
9.1 2678.0
South Dakota
SD
9.4 2833.0
North Dakota
ND
NH
54.3 7722.4
Texas
TX
76.6 4451.4
Alabama
AL
DE
CA
TN
CHARITY
data Charity;
input School $ 1-7 Year 9-12 Name $ 14-20 MoneyRaised 22-26
HoursVolunteered 28-29;
datalines;
Monroe
Monroe
1992 Barry
Monroe
Monroe
1992 Danny
Monroe
1992 Edward
53.76 31
Monroe
1992 Fiona
48.55 13
Monroe
1992 Gert
24.00 16
Monroe
1992 Harold
27.55 17
Monroe
1992 Ima
15.98
Monroe
1992 Jack
20.00 23
Monroe
1992 Katie
22.11
Monroe
1992 Lisa
18.34 17
Monroe
1992 Tonya
55.16 40
Monroe
1992 Max
26.77 34
Monroe
1992 Ned
28.43 22
Monroe
1992 Opal
32.66 14
Monroe
1993 Patsy
18.33 18
Monroe
23.76 16
5
6.89 23
Monroe
Monroe
1993 Sam
15.88
Monroe
1993 Tyra
21.88 23
Monroe
1993 Myrtle
47.33 26
Monroe
1993 Frank
41.11 22
Monroe
Monroe
1993 Vern
Monroe
Monroe
1993 Bob
26.88
Monroe
1993 Leah
28.99 23
Monroe
1994 Becky
30.33 26
Monroe
1994 Sally
35.75 27
Monroe
1994 Edgar
27.11 12
Monroe
1994 Dawson
17.24 16
Monroe
1994 Lou
Monroe
1994 Damien
18.74 17
Monroe
1994 Mona
27.43
Monroe
1994 Della
56.78 15
Monroe
Monroe
1994 Carl
31.12 25
Monroe
1994 Reba
35.16 22
Monroe
1994 Dax
27.65 23
Monroe
1994 Gary
23.11 15
Monroe
1994 Suzie
26.65 11
Monroe
1994 Benito
47.44 18
Monroe
1994 Thomas
21.99 23
Monroe
1994 Annie
24.99 27
Monroe
1994 Paul
27.98 22
Monroe
1994 Alex
24.00 16
Monroe
1994 Lauren
15.00 17
Monroe
1994 Julia
12.98 15
Monroe
1994 Keith
11.89 19
Monroe
1994 Jackie
26.88 22
Monroe
1994 Pablo
13.98 28
Monroe
1994 L.T.
56.87 33
Monroe
Monroe
1994 Kathy
Monroe
1994 Abby
17.89 11
5.12 16
32.88 11
35.88 10
34.98 14
27.55 25
12.88 21
15.62
28.99 34
25.89 22
35.89 35
28.77 26
29.00 27
31.67 25
43.89 22
52.63 21
19.67 21
24.89 12
37.88 12
28.89 21
26.44 21
25.89 21
27.17 17
42.23 25
18.67 27
19.09 25
28.77 28
27.08 31
22.22 24
19.80 24
27.07 25
24.44 12
28.89 11
31.11 12
30.55 11
27.56 11
29.90 26
30.55 28
37.67 22
23.33 27
27.90 25
27.78 23
34.44 18
72.22 24
6.78 18
23.33 19
40.00 26
35.99 28
27.45 25
28.88 21
34.44 25
CUSTOMER_RESPONSE
data customer_response;
input Customer Factor1-Factor4 Source1-Source3
Quality1-Quality3;
datalines;
1 . . 1 1 1 1 . 1 . .
2 1 1 . 1 1 1 . 1 1 .
3 . . 1 1 1 1 . . . .
4 1 1 . 1 . 1 . . . 1
5 . 1 . 1 1 . . . . 1
6 . 1 . 1 1 . . . . .
7 . 1 . 1 1 . . 1 . .
8 1 . . 1 1 1 . 1 1 .
9 1 1 . 1 1 . . . . 1
10 1 . . 1 1 1 . 1 1 .
11 1 1 1 1 . 1 . 1 1 1
12 1 1 . 1 1 1 . . . .
13 1 1 . 1 . 1 . 1 1 .
14 1 1 . 1 1 1 . . . .
15 1 1 . 1 . 1 . 1 1 1
16 1 . . 1 1 . . 1 . .
17 1 1 . 1 1 1 . . 1 .
18 1 1 . 1 1 1 1 . . 1
19 . 1 . 1 1 1 1 . 1 .
20 1 . . 1 1 1 . 1 1 1
21 . . . 1 1 1 . 1 . .
22 . . . 1 1 1 . 1 1 .
23 1 . . 1 . . . . . 1
24 . 1 . 1 1 . . 1 . 1
25 1 1 . 1 1 . . . 1 1
26 1 1 . 1 1 . . 1 . .
27 1 . . 1 1 . . . 1 .
28 1 1 . 1 . . . 1 1 1
29 1 . . 1 1 1 . 1 . 1
30 1 . 1 1 1 . . 1 1 .
31 . . . 1 1 . . 1 1 .
32 1 1 1 1 1 . . 1 1 1
33 1 . . 1 1 . . 1 . 1
34 . . 1 1 . . . 1 1 .
35 1 1 1 1 1 . 1 1 . .
36 1 1 1 1 . 1 . 1 . .
37 1 1 . 1 . . . 1 . .
38 . . . 1 1 1 . 1 . .
39 1 1 . 1 1 . . 1 . 1
40 1 . . 1 . . 1 1 . 1
41 1 . . 1 1 1 1 1 . 1
42 1 1 1 1 . . 1 1 . .
43 1 . . 1 1 1 . 1 . .
44 1 . 1 1 . 1 . 1 . 1
45 . . . 1 . . 1 . . 1
46 . . . 1 1 . . . 1 .
47 1 1 . 1 . . 1 1 . .
48 1 . 1 1 1 . 1 1 . .
49 . . 1 1 1 1 . 1 . 1
50 . 1 . 1 1 . . 1 1 .
51 1 . 1 1 1 1 . . . .
52 1 1 1 1 1 1 . 1 . .
53 . 1 1 1 . 1 . 1 1 1
54 1 . . 1 1 . . 1 1 .
55 1 1 . 1 1 1 . 1 . .
56 1 . . 1 1 . . 1 1 .
57 1 1 . 1 1 . 1 . . 1
58 . 1 . 1 . 1 . . 1 1
59 1 1 1 1 . . 1 1 1 .
60 . 1 1 1 1 1 . . 1 1
61 1 1 1 1 1 1 . 1 . .
62 1 1 . 1 1 . . 1 1 .
63 . . . 1 . . . 1 1 1
64 1 . . 1 1 1 . 1 . .
65 1 . . 1 1 1 . 1 . .
66 1 . . 1 1 1 1 1 1 .
67 1 1 . 1 1 1 . 1 1 .
68 1 1 . 1 1 1 . 1 1 .
69 1 1 . 1 1 . 1 . . .
70 . . . 1 1 1 . 1 . .
71 1 . . 1 1 . 1 . . 1
72 1 . 1 1 1 1 . . 1 .
73 1 1 . 1 . 1 . 1 1 .
74 1 1 1 1 1 1 . 1 . .
75 . 1 . 1 1 1 . . 1 .
76 1 1 . 1 1 1 . 1 1 1
77 . . . 1 1 1 . . . .
78 1 1 1 1 1 1 . 1 1 .
79 1 . . 1 1 1 . 1 1 .
80 1 1 1 1 1 . 1 1 . 1
81 1 1 . 1 1 1 1 1 1 .
82 . . . 1 1 1 1 . . .
83 1 1 . 1 1 1 . 1 1 .
84 1 . . 1 1 . . 1 1 .
85 . . . 1 . 1 . 1 . .
86 1 . . 1 1 1 . 1 1 1
87 1 1 . 1 1 1 . 1 . .
88 . . . 1 . 1 . . . .
89 1 . . 1 . 1 . . 1 1
90 1 1 . 1 1 1 . 1 . 1
91 . . . 1 1 . . . 1 .
92 1 . . 1 1 1 . 1 1 .
93 1 . . 1 1 . . 1 1 .
94 1 . . 1 1 1 1 1 . .
95 1 . . 1 . 1 1 1 1 .
96 1 . 1 1 1 1 . . 1 .
97 1 1 . 1 1 . . . 1 .
98 1 . 1 1 1 1 1 1 . .
99 1 1 . 1 1 1 1 1 1 .
100 1 . 1 1 1 . . . 1 1
101 1 . 1 1 1 1 . . . .
102 1 . . 1 1 . 1 1 . .
103 1 1 . 1 1 1 . 1 . .
104 . . . 1 1 1 . 1 1 1
105 1 . 1 1 1 . . 1 . 1
106 1 1 1 1 1 1 1 1 1 1
107 1 1 1 1 . . . 1 . 1
108 1 . . 1 . 1 1 1 . .
109 . 1 . 1 1 . . 1 1 .
110 1 . . 1 . . . . . .
111 1 . . 1 1 1 . 1 1 .
112 1 1 . 1 1 1 . . . 1
113 1 1 . 1 1 . 1 1 1 .
114 1 1 . 1 1 . . . . .
115 1 1 . 1 1 . . 1 . .
116 . 1 . 1 1 1 1 1 . .
117 . 1 . 1 1 1 . . . .
118 . 1 1 1 1 . . 1 1 .
119 . . . 1 . . . 1 . .
120 1 1 . 1 . . . . 1 .
;
DJIA
data djia;
input Year @7 HighDate date7. High @24 LowDate date7. Low;
format highdate lowdate date7.;
datalines;
1954  31DEC54   404.39 11JAN54  279.87
1955  30DEC55   488.40 17JAN55  388.20
1956  06APR56   521.05 23JAN56  462.35
1957  12JUL57   520.77 22OCT57  419.79
1958  31DEC58   583.65 25FEB58  436.89
1959  31DEC59   679.36 09FEB59  574.46
1960  05JAN60   685.47 25OCT60  568.05
1961  13DEC61   734.91 03JAN61  610.25
1962  03JAN62   726.01 26JUN62  535.76
1963  18DEC63   767.21 02JAN63  646.79
1964  18NOV64   891.71 02JAN64  768.08
1965  31DEC65   969.26 28JUN65  840.59
1966  09FEB66   995.15 07OCT66  744.32
1967  25SEP67   943.08 03JAN67  786.41
1968  03DEC68   985.21 21MAR68  825.13
1969  14MAY69   968.85 17DEC69  769.93
1970  29DEC70   842.00 06MAY70  631.16
1971  28APR71   950.82 23NOV71  797.97
1972  11DEC72  1036.27 26JAN72  889.15
1973  11JAN73  1051.70 05DEC73  788.31
1974  13MAR74   891.66 06DEC74  577.60
1975  15JUL75   881.81 02JAN75  632.04
1976  21SEP76  1014.79 02JAN76  858.71
1977  03JAN77   999.75 02NOV77  800.85
1978  08SEP78   907.74 28FEB78  742.12
1979  05OCT79   897.61 07NOV79  796.67
1980  20NOV80  1000.17 21APR80  759.13
1981  27APR81  1024.05 25SEP81  824.01
1982  27DEC82  1070.55 12AUG82  776.92
1983  29NOV83  1287.20 03JAN83 1027.04
1984  06JAN84  1286.64 24JUL84 1086.57
1985  16DEC85  1553.10 04JAN85 1184.96
1986  02DEC86  1955.57 22JAN86 1502.29
1987  25AUG87  2722.42 19OCT87 1738.74
1988  21OCT88  2183.50 20JAN88 1879.14
1989  09OCT89  2791.41 03JAN89 2144.64
1990  16JUL90  2999.75 11OCT90 2365.10
1991  31DEC91  3168.83 09JAN91 2470.30
1992  01JUN92  3413.21 09OCT92 3136.58
1993  29DEC93  3794.33 20JAN93 3241.95
1994  31JAN94  3978.36 04APR94 3593.35
;
EDUCATION
data education;
input State $14. +1 Code $ DropoutRate Expenditures MathScore Region $;
label dropoutrate='Dropout Percentage - 1989'
expenditures='Expenditure Per Pupil - 1989'
mathscore='8th Grade Math Exam - 1990';
datalines;
Alabama
Alaska
AK 35.8 7716 .
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
Florida
Georgia
Hawaii
Idaho
Illinois
Indiana
Iowa
Kansas
KS 17.9 4443 .
Kentucky
Louisiana
Maine
ME 22.5 4744 .
Maryland
MW
NE
Massachusetts
MA 28.0 5979 .
Michigan
NE
Minnesota
Mississippi
MS 39.9 2874 .
SE
Missouri
MO 26.5 4263 .
MW
Montana
Nebraska
Nevada
NV 28.1 3791 .
New Hampshire
New Jersey
New Mexico
New York
NY 35.0 .
261 NE
Ohio
EMPDATA
data empdata;
input IdNumber $ 1-4 LastName $ 9-19 FirstName $ 20-29
City $ 30-42 State $ 43-44 /
Gender $ 1 JobCode $ 9-11 Salary 20-29 @30 Birth date7.
@43 Hired date7. HomePhone $ 54-65;
format birth hired date7.;
datalines;
1919
Adams
Gerald
Stamford
CT
TA2
34376
15SEP48
07JUN75
1653
Alexander
Susan
Bridgeport
CT
ME2
35108
18OCT52
12AUG78
1400
Apple
Troy
New York
NY
ME1
29769
08NOV55
19OCT78
1350
Arthur
Barbara
New York
NY
FA3
32886
03SEP53
01AUG78
1401
Avery
Jerry
Paterson
NJ
TA3
38822
16DEC38
20NOV73
1499
Barefoot
Joseph
Princeton
NJ
ME3
43025
29APR42
10JUN68
1101
Baucom
Walter
New York
NY
SCP
18723
09JUN50
04OCT78
1333
Blair
Justin
Stamford
CT
PT2
88606
02APR49
13FEB69
1402
Blalock
Ralph
New York
NY
TA2
32615
20JAN51
05DEC78
1479
Bostic
Marie
New York
NY
TA3
38785
25DEC56
08OCT77
1403
Bowden
Earl
Bridgeport
CT
ME1
28072
31JAN57
24DEC79
1739
Boyce
Jonathan
New York
NY
PT1
66517
28DEC52
30JAN79
1658
Bradley
Jeremy
New York
NY
SCP
17943
11APR55
03MAR80
1428
Brady
Christine Stamford
CT
PT1
68767
07APR58
19NOV79
1782
Brown
Jason
Stamford
CT
ME2
35345
07DEC58
25FEB80
1244
Bryant
Leonard
New York
NY
ME2
36925
03SEP51
20JAN76
1383
Burnette
Thomas
New York
NY
BCK
25823
28JAN56
23OCT80
1574
Cahill
Marshall
New York
NY
FA2
28572
30APR48
23DEC80
1789
Caraway
Davis
New York
NY
SCP
18326
28JAN45
14APR66
1404
Carter
Donald
New York
NY
PT2
91376
27FEB41
04JAN68
1437
Carter
Dorothy
Bridgeport
CT
A3
33104
23SEP48
03SEP72
1639
Carter
Karen
Stamford
CT
203/781-1255
203/675-7715
212/586-0808
718/383-1549
201/732-8787
201/812-5665
212/586-8060
203/781-1777
718/384-2849
718/384-8816
203/675-3434
212/587-1247
212/587-3622
203/781-1212
203/781-0019
718/383-3334
718/384-3569
718/383-2338
212/587-9000
718/384-2946
203/675-4117
A3
40260
29JUN45
31JAN72
1269
Caston
Franklin
Stamford
CT
NA1
41690
06MAY60
01DEC80
1065
Chapman
Neil
New York
NY
ME2
35090
29JAN32
10JAN75
1876
Chin
Jack
New York
NY
TA3
39675
23MAY46
30APR73
1037
Chow
Jane
Stamford
CT
TA1
28558
13APR52
16SEP80
1129
Cook
Brenda
New York
NY
ME2
34929
11DEC49
20AUG79
1988
Cooper
Anthony
New York
NY
FA3
32217
03DEC47
21SEP72
1405
Davidson
Jason
Paterson
NJ
SCP
18056
08MAR54
29JAN80
1430
Dean
Sandra
Bridgeport
CT
TA2
32925
03MAR50
30APR75
1983
Dean
Sharon
New York
NY
FA3
33419
03MAR50
30APR75
1134
Delgado
Maria
Stamford
CT
TA2
33462
08MAR57
24DEC76
1118
Dennis
Roger
New York
NY
PT3
111379
19JAN32
21DEC68
1438
Donaldson
Karen
Stamford
CT
TA3
39223
18MAR53
21NOV75
1125
Dunlap
Donna
New York
NY
FA2
28888
11NOV56
14DEC75
1475
Eaton
Alicia
New York
NY
FA2
27787
18DEC49
16JUL78
1117
Edgerton
Joshua
New York
NY
TA3
39771
08JUN51
16AUG80
1935
Fernandez
Katrina
Bridgeport
CT
NA2
51081
31MAR42
19OCT69
1124
Fields
Diana
White Plains NY
FA1
23177
13JUL46
04OCT78
1422
Fletcher
Marie
Princeton
NJ
FA1
22454
07JUN52
09APR79
1616
Flowers
Annette
New York
NY
TA2
34137
04MAR58
07JUN81
1406
Foster
Gerald
Bridgeport
CT
ME2
35185
11MAR49
20FEB75
1120
Garcia
Jack
New York
NY
ME1
28619
14SEP60
10OCT81
1094
Gomez
Alan
Bridgeport
CT
FA1
22268
05APR58
20APR79
1389
Gordon
Levi
New York
NY
BCK
25028
18JUL47
21AUG78
1905
Graham
Alvin
New York
NY
PT1
65111
19APR60
01JUN80
1407
Grant
Daniel
Mt. Vernon
NY
PT1
68096
26MAR57
21MAR78
1114
Green
Janice
New York
NY
TA2
32928
21SEP57
30JUN75
203/781-8839
203/781-3335
718/384-5618
212/588-5634
203/781-8868
718/383-2313
212/587-1228
201/732-2323
203/675-1647
718/384-1647
203/781-1528
718/383-1122
203/781-2229
718/383-2094
718/383-2828
212/588-1239
203/675-2962
914/455-2998
201/812-0902
718/384-3329
203/675-6363
718/384-4930
203/675-7181
718/384-9326
212/586-8815
914/468-1616
212/588-1092
ENERGY
data energy;
length State $2;
input Region Division state $ Type Expenditures;
datalines;
1 1 ME 1 708
1 1 ME 2 379
1 1 NH 1 597
1 1 NH 2 301
1 1 VT 1 353
1 1 VT 2 188
1 1 MA 1 3264
1 1 MA 2 2498
1 1 RI 1 531
1 1 RI 2 358
1 1 CT 1 2024
1 1 CT 2 1405
1 2 NY 1 8786
1 2 NY 2 7825
1 2 NJ 1 4115
1 2 NJ 2 3558
1 2 PA 1 6478
1 2 PA 2 3695
4 3 MT 1 322
4 3 MT 2 232
4 3 ID 1 392
4 3 ID 2 298
4 3 WY 1 194
4 3 WY 2 184
4 3 CO 1 1215
4 3 CO 2 1173
4 3 NM 1 545
4 3 NM 2 578
4 3 AZ 1 1694
4 3 AZ 2 1448
4 3 UT 1 621
4 3 UT 2 438
4 3 NV 1 493
4 3 NV 2 378
4 4 WA 1 1680
4 4 WA 2 1122
4 4 OR 1 1014
4 4 OR 2 756
4 4 CA 1 10643
4 4 CA 2 10114
4 4 AK 1 349
4 4 AK 2 329
4 4 HI 1 273
4 4 HI 2 298
;
GROC
data groc;
input Region $9. Manager $ Department $ Sales;
datalines;
Southeast
Hayes
Paper
250
Southeast
Hayes
Produce
100
Southeast
Hayes
Canned
120
Southeast
Hayes
Meat
80
Southeast
Michaels
Paper
40
Southeast
Michaels
Produce
300
Southeast
Michaels
Canned
220
Southeast
Michaels
Meat
Northwest
Jeffreys
Paper
Northwest
Jeffreys
Produce
600
Northwest
Jeffreys
Canned
420
Northwest
Jeffreys
Meat
Northwest
Duncan
Paper
Northwest
Duncan
Produce
250
Northwest
Duncan
Canned
230
Northwest
Duncan
Meat
73
Northwest
Aikmann
Paper
45
Northwest
Aikmann
Produce
205
Northwest
Aikmann
Canned
420
Northwest
Aikmann
Meat
Southwest
Royster
Paper
Southwest
Royster
Produce
130
Southwest
Royster
Canned
120
Southwest
Royster
Meat
Southwest
Patel
Paper
Southwest
Patel
Produce
350
Southwest
Patel
Canned
225
Southwest
Patel
Meat
Northeast
Rice
Paper
90
Northeast
Rice
Produce
90
Northeast
Rice
Canned
Northeast
Rice
Meat
Northeast
Fuller
Paper
200
Northeast
Fuller
Produce
300
Northeast
Fuller
Canned
420
Northeast
Fuller
Meat
125
70
60
30
45
76
53
50
40
80
420
86
MATCH_11
data match_11;
input Pair Low Age Lwt Race Smoke Ptd Ht UI @@;
select(race);
when (1) do;
race1=0;
race2=0;
end;
when (2) do;
race1=1;
race2=0;
end;
when (3) do;
race1=0;
race2=1;
end;
end;
datalines;
1
0 14 135 1 0 0 0 0
1 14 101 3 1 1 0 0
0 15
98 2 0 0 0 0
1 15 115 3 0 0 0 1
0 16
95 3 0 0 0 0
1 16 130 3 0 0 0 0
0 17 103 3 0 0 0 0
1 17 130 3 1 1 0 1
0 17 122 1 1 0 0 0
1 17 110 1 1 0 0 0
0 17 113 2 0 0 0 0
1 17 120 1 1 0 0 0
0 17 113 2 0 0 0 0
1 17 120 2 0 0 0 0
0 17 119 3 0 0 0 0
1 17 142 2 0 0 1 0
0 18 100 1 1 0 0 0
1 18 148 3 0 0 0 0
10 0 18
90 1 1 0 0 1
10 1 18 110 2 1 1 0 0
11 0 19 150 3 0 0 0 0
11 1 19
12 0 19 115 3 0 0 0 0
12 1 19 102 1 0 0 0 0
13 0 19 235 1 1 0 1 0
13 1 19 112 1 1 0 0 1
14 0 20 120 3 0 0 0 1
14 1 20 150 1 1 0 0 0
15 0 20 103 3 0 0 0 0
15 1 20 125 3 0 0 0 1
16 0 20 169 3 0 1 0 1
16 1 20 120 2 1 0 0 0
17 0 20 141 1 0 1 0 1
17 1 20
18 0 20 121 2 1 0 0 0
18 1 20 109 3 0 0 0 0
19 0 20 127 3 0 0 0 0
19 1 20 121 1 1 1 0 1
20 0 20 120 3 0 0 0 0
20 1 20 122 2 1 0 0 0
21 0 20 158 1 0 0 0 0
21 1 20 105 3 0 0 0 0
22 0 21 108 1 1 0 0 1
22 1 21 165 1 1 0 1 0
23 0 21 124 3 0 0 0 0
23 1 21 200 2 0 0 0 0
24 0 21 185 2 1 0 0 0
24 1 21 103 3 0 0 0 0
25 0 21 160 1 0 0 0 0
25 1 21 100 3 0 1 0 0
26 0 21 115 1 0 0 0 0
26 1 21 130 1 1 0 1 0
27 0 22
91 1 1 1 0 1
80 3 1 0 0 1
95 3 0 0 1 0
27 1 22 130 1 1 0 0 0
28 0 22 158 2 0 1 0 0
28 1 22 130 1 1 1 0 1
29 0 23 130 2 0 0 0 0
29 1 23
30 0 23 128 3 0 0 0 0
30 1 23 187 2 1 0 0 0
31 0 23 119 3 0 0 0 0
31 1 23 120 3 0 0 0 0
32 0 23 115 3 1 0 0 0
32 1 23 110 1 1 1 0 0
33 0 23 190 1 0 0 0 0
33 1 23
34 0 24
90 1 1 1 0 0
34 1 24 128 2 0 1 0 0
35 0 24 115 1 0 0 0 0
35 1 24 132 3 0 0 1 0
36 0 24 110 3 0 0 0 0
36 1 24 155 1 1 1 0 0
37 0 24 115 3 0 0 0 0
37 1 24 138 1 0 0 0 0
38 0 24 110 3 0 1 0 0
38 1 24 105 2 1 0 0 0
39 0 25 118 1 1 0 0 0
39 1 25 105 3 0 1 1 0
40 0 25 120 3 0 0 0 1
40 1 25
41 0 25 155 1 0 0 0 0
41 1 25 115 3 0 0 0 0
97 3 0 0 0 1
94 3 1 0 0 0
85 3 0 0 0 1
42 0 25 125 2 0 0 0 0
42 1 25
92 1 1 0 0 0
43 0 25 140 1 0 0 0 0
43 1 25
89 3 0 1 0 0
44 0 25 241 2 0 0 1 0
44 1 25 105 3 0 1 0 0
45 0 26 113 1 1 0 0 0
45 1 26 117 1 1 1 0 0
46 0 26 168 2 1 0 0 0
46 1 26
47 0 26 133 3 1 1 0 0
47 1 26 154 3 0 1 1 0
48 0 26 160 3 0 0 0 0
48 1 26 190 1 1 0 0 0
49 0 27 124 1 1 0 0 0
49 1 27 130 2 0 0 0 1
50 0 28 120 3 0 0 0 0
50 1 28 120 3 1 1 0 1
51 0 28 130 3 0 0 0 0
51 1 28
52 0 29 135 1 0 0 0 0
52 1 29 130 1 0 0 0 1
53 0 30
96 3 0 0 0 0
95 1 1 0 0 0
95 1 1 0 0 0
53 1 30 142 1 1 1 0 0
54 0 31 215 1 1 0 0 0
54 1 31 102 1 1 1 0 0
55 0 32 121 3 0 0 0 0
55 1 32 105 1 1 0 0 0
56 0 34 170 1 0 1 0 0
56 1 34 187 2 1 0 1 0
PROCLIB.DELAY
data proclib.delay;
input flight $3. +5 date date7. +2 orig $3. +3 dest $3. +3
delaycat $15. +2 destype $15. +8 delay;
informat date date7.;
format date date7.;
datalines;
114
01MAR94
LGA
LAX
1-10 Minutes
Domestic
202
01MAR94
LGA
ORD
No Delay
Domestic
-5
219
01MAR94
LGA
LON
11+ Minutes
International
18
622
01MAR94
LGA
FRA
No Delay
International
-5
132
01MAR94
LGA
YYZ
11+ Minutes
International
14
271
01MAR94
LGA
PAR
1-10 Minutes
International
302
01MAR94
LGA
WAS
No Delay
Domestic
114
02MAR94
LGA
LAX
No Delay
Domestic
202
02MAR94
LGA
ORD
1-10 Minutes
Domestic
219
02MAR94
LGA
LON
11+ Minutes
International
622
02MAR94
LGA
FRA
No Delay
International
132
02MAR94
LGA
YYZ
1-10 Minutes
International
271
02MAR94
LGA
PAR
1-10 Minutes
International
302
02MAR94
LGA
WAS
No Delay
Domestic
114
03MAR94
LGA
LAX
No Delay
Domestic
-1
202
03MAR94
LGA
ORD
No Delay
Domestic
-1
219
03MAR94
LGA
LON
1-10 Minutes
International
622
03MAR94
LGA
FRA
No Delay
International
-2
132
03MAR94
LGA
YYZ
1-10 Minutes
International
271
03MAR94
LGA
PAR
1-10 Minutes
International
302
03MAR94
LGA
WAS
1-10 Minutes
Domestic
114
04MAR94
LGA
LAX
11+ Minutes
Domestic
15
202
04MAR94
LGA
ORD
No Delay
Domestic
-5
219
04MAR94
LGA
LON
1-10 Minutes
International
622
04MAR94
LGA
FRA
11+ Minutes
International
30
132
04MAR94
LGA
YYZ
No Delay
International
-5
271
04MAR94
LGA
PAR
1-10 Minutes
International
5
-2
18
302
04MAR94
LGA
WAS
1-10 Minutes
Domestic
114
05MAR94
LGA
LAX
No Delay
Domestic
-2
202
05MAR94
LGA
ORD
1-10 Minutes
Domestic
219
05MAR94
LGA
LON
1-10 Minutes
International
622
05MAR94
LGA
FRA
No Delay
International
-6
132
05MAR94
LGA
YYZ
1-10 Minutes
International
271
05MAR94
LGA
PAR
1-10 Minutes
International
114
06MAR94
LGA
LAX
No Delay
Domestic
-1
202
06MAR94
LGA
ORD
No Delay
Domestic
-3
219
06MAR94
LGA
LON
11+ Minutes
International
27
132
06MAR94
LGA
YYZ
1-10 Minutes
International
302
06MAR94
LGA
WAS
1-10 Minutes
Domestic
114
07MAR94
LGA
LAX
No Delay
Domestic
-1
202
07MAR94
LGA
ORD
No Delay
Domestic
-2
219
07MAR94
LGA
LON
11+ Minutes
International
15
622
07MAR94
LGA
FRA
11+ Minutes
International
21
132
07MAR94
LGA
YYZ
No Delay
International
-2
271
07MAR94
LGA
PAR
1-10 Minutes
International
302
07MAR94
LGA
WAS
No Delay
Domestic
PROCLIB.EMP95
data proclib.emp95;
input #1 idnum $4. @6 name $15.
#2 address $42.
#3 salary 6.;
datalines;
2388 James Schmidt
100 Apt. C Blount St. SW Raleigh NC 27693
92100
2457 Fred Williams
99 West Lane Garner NC 27509
33190
2776 Robert Jones
12988 Wellington Farms Ave. Cary NC 27512
29025
8699 Jerry Capalleti
222 West L St. Oxford NC 27587
39985
2100 Lanny Engles
293 Manning Pl. Raleigh NC 27606
30998
9857 Kathy Krupski
1000 Taft Ave. Morrisville NC 27508
38756
0987 Dolly Lunford
2344 Persimmons Branch Apex NC 27505
44010
3286 Hoa Nguyen
2818 Long St. Cary NC 27513
87734
PROCLIB.EMP96
data proclib.emp96;
input #1 idnum $4. @6 name $15.
#2 address $42.
#3 salary 6.;
datalines;
2388 James Schmidt
100 Apt. C Blount St. SW Raleigh NC 27693
92100
2457 Fred Williams
99 West Lane Garner NC 27509
33190
2776 Robert Jones
12988 Wellington Farms Ave. Cary NC 27511
29025
8699 Jerry Capalleti
222 West L St. Oxford NC 27587
39985
3278 Mary Cravens
211 N. Cypress St. Cary NC 27512
35362
2100 Lanny Engles
293 Manning Pl. Raleigh NC 27606
30998
9857 Kathy Krupski
100 Taft Ave. Morrisville NC 27508
40456
0987 Dolly Lunford
2344 Persimmons Branch Trail Apex NC 27505
45110
3286 Hoa Nguyen
2818 Long St. Cary NC 27513
89834
6579 Bryan Samosky
3887 Charles Ave. Garner NC 27508
50234
3888 Kim Siu
5662 Magnolia Blvd Southwest Cary NC 27513
79958
6544 Roger Monday
PROCLIB.INTERNAT
data proclib.internat;
input flight $3.
01MAR94
LON
198
622
01MAR94
FRA
207
132
01MAR94
YYZ
115
271
01MAR94
PAR
138
219
02MAR94
LON
147
622
02MAR94
FRA
176
132
02MAR94
YYZ
106
271
02MAR94
PAR
172
219
03MAR94
LON
197
622
03MAR94
FRA
180
132
03MAR94
YYZ
75
271
03MAR94
PAR
147
219
04MAR94
LON
232
622
04MAR94
FRA
137
132
04MAR94
YYZ
117
271
04MAR94
PAR
146
219
05MAR94
LON
160
622
05MAR94
FRA
185
132
05MAR94
YYZ
157
271
05MAR94
PAR
177
219
06MAR94
LON
163
132
06MAR94
YYZ
150
219
07MAR94
LON
241
622
07MAR94
FRA
210
132
07MAR94
YYZ
164
271
07MAR94
PAR
155
PROCLIB.LAKES
data proclib.lakes;
input region $ 1-2 lake $ 5-13 pol_a1 pol_a2 pol_b1-pol_b4;
datalines;
NE  Carr       0.24  0.99  0.95  0.36  0.44  0.67
NE  Duraleigh  0.34  0.01  0.48  0.58  0.12  0.56
NE  Charlie    0.40  0.48  0.29  0.56  0.52  0.95
NE  Farmer     0.60  0.65  0.25  0.20  0.30  0.64
NW  Canyon     0.63  0.44  0.20  0.98  0.19  0.01
NW  Morris     0.85  0.95  0.80  0.67  0.32  0.81
NW  Golf       0.69  0.37  0.08  0.72  0.71  0.32
NW  Falls      0.01  0.02  0.59  0.58  0.67  0.02
SE  Pleasant   0.16  0.96  0.71  0.35  0.35  0.48
SE  Juliette   0.82  0.35  0.09  0.03  0.59  0.90
SE  Massey     1.01  0.77  0.45  0.32  0.55  0.66
SE  Delta      0.84  1.05  0.90  0.09  0.64  0.03
SW  Alumni     0.45  0.32  0.45  0.44  0.55  0.12
SW  New Dam    0.80  0.70  0.31  0.98  1.00  0.22
SW  Border     0.51  0.04  0.55  0.35  0.45  0.78
SW  Red        0.22  0.09  0.02  0.10  0.32  0.01
;
PROCLIB.MARCH
data proclib.march;
input flight $3. +5 date date7. +3 depart time5. +2 orig $3.
+3 dest $3.
01MAR94
7:10
LGA
LAX
2475
172
210
202
01MAR94
10:43
LGA
ORD
740
151
210
219
01MAR94
9:31
LGA
LON
3442
198
250
622
01MAR94
12:19
LGA
FRA
3857
207
250
132
01MAR94
15:35
LGA
YYZ
366
115
178
271
01MAR94
13:17
LGA
PAR
3635
138
250
302
01MAR94
20:22
LGA
WAS
229
105
180
114
02MAR94
7:10
LGA
LAX
2475
119
210
202
02MAR94
10:43
LGA
ORD
740
120
210
219
02MAR94
9:31
LGA
LON
3442
147
250
622
02MAR94
12:19
LGA
FRA
3857
176
250
132
02MAR94
15:35
LGA
YYZ
366
106
178
302
02MAR94
20:22
LGA
WAS
229
78
180
271
02MAR94
13:17
LGA
PAR
3635
104
250
114
03MAR94
7:10
LGA
LAX
2475
197
210
202
03MAR94
10:43
LGA
ORD
740
118
210
219
03MAR94
9:31
LGA
LON
3442
197
250
622
03MAR94
12:19
LGA
FRA
3857
180
250
132
03MAR94
15:35
LGA
YYZ
366
75
178
271
03MAR94
13:17
LGA
PAR
3635
147
250
302
03MAR94
20:22
LGA
WAS
229
123
180
114
04MAR94
7:10
LGA
LAX
2475
178
210
202
04MAR94
10:43
LGA
ORD
740
148
210
219
04MAR94
9:31
LGA
LON
3442
232
250
622
04MAR94
12:19
LGA
FRA
3857
137
250
132
04MAR94
15:35
LGA
YYZ
366
117
178
271
04MAR94
13:17
LGA
PAR
3635
146
250
302
04MAR94
20:22
LGA
WAS
229
115
180
114
05MAR94
7:10
LGA
LAX
2475
117
210
202
05MAR94
10:43
LGA
ORD
740
104
210
219
05MAR94
9:31
LGA
LON
3442
160
250
622
05MAR94
12:19
LGA
FRA
3857
185
250
132
05MAR94
15:35
LGA
YYZ
366
157
05MAR94
13:17
LGA
PAR
3635
177
06MAR94
7:10
LGA
LAX
2475
128
210
202
06MAR94
10:43
LGA
ORD
740
115
210
219
06MAR94
9:31
LGA
LON
3442
163
250
132
06MAR94
15:35
LGA
YYZ
366
150
178
302
06MAR94
20:22
LGA
WAS
229
66
180
114
07MAR94
7:10
LGA
LAX
2475
160
210
202
07MAR94
10:43
LGA
ORD
740
175
210
219
07MAR94
9:31
LGA
LON
3442
241
250
622
07MAR94
12:19
LGA
FRA
3857
210
250
132
07MAR94
15:35
LGA
YYZ
366
164
178
271
07MAR94
13:17
LGA
PAR
3635
155
250
302
07MAR94
20:22
LGA
WAS
229
135
250
114
178
271
180
PROCLIB.PAYLIST2
proc sql;
create table proclib.paylist2
(IdNum char(4),
Gender char(1),
Jobcode char(3),
Salary num,
Birth num informat=date7.
format=date7.,
Hired num informat=date7.
format=date7.);
PROCLIB.PAYROLL
This data set (table) is updated in Example 3 on page 1129 and its updated data is
used in subsequent examples.
data proclib.payroll;
input IdNumber $4. +3 Gender $1. +4 Jobcode $3. +9 Salary 5.
+2 Birth date7. +2 Hired date7.;
informat birth date7. hired date7.;
format birth date7. hired date7.;
datalines;
1919
TA2
34376
12SEP60
04JUN87
1653
ME2
35108
15OCT64
09AUG90
1400
ME1
29769
05NOV67
16OCT90
1350
FA3
32886
31AUG65
29JUL90
1401
TA3
38822
13DEC50
17NOV85
1499
ME3
43025
26APR54
07JUN80
1101
SCP
18723
06JUN62
01OCT90
1333
PT2
88606
30MAR61
10FEB81
1402
TA2
32615
17JAN63
02DEC90
1479
TA3
38785
22DEC68
05OCT89
1403
ME1
28072
28JAN69
21DEC91
1739
PT1
66517
25DEC64
27JAN91
1658
SCP
17943
08APR67
29FEB92
1428
PT1
68767
04APR60
16NOV91
1782
ME2
35345
04DEC70
22FEB92
1244
ME2
36925
31AUG63
17JAN88
1383
BCK
25823
25JAN68
20OCT92
1574
FA2
28572
27APR60
20DEC92
1789
SCP
18326
25JAN57
11APR78
1404
PT2
91376
24FEB53
01JAN80
1437
FA3
33104
20SEP60
31AUG84
1639
TA3
40260
26JUN57
28JAN84
1269
NA1
41690
03MAY72
28NOV92
1065
ME2
35090
26JAN44
07JAN87
1876
TA3
39675
20MAY58
27APR85
1037
TA1
28558
10APR64
13SEP92
1129
ME2
34929
08DEC61
17AUG91
1988
FA3
32217
30NOV59
18SEP84
1405
SCP
18056
05MAR66
26JAN92
1430
TA2
32925
28FEB62
27APR87
1983
FA3
33419
28FEB62
27APR87
1134
TA2
33462
05MAR69
21DEC88
1118
PT3
111379
16JAN44
18DEC80
1438
TA3
39223
15MAR65
18NOV87
1125
FA2
28888
08NOV68
11DEC87
1475
FA2
27787
15DEC61
13JUL90
1117
TA3
39771
05JUN63
13AUG92
1935
NA2
51081
28MAR54
16OCT81
1124
FA1
23177
10JUL58
01OCT90
1422
FA1
22454
04JUN64
06APR91
1616
TA2
34137
01MAR70
04JUN93
1406
ME2
35185
08MAR61
17FEB87
1120
ME1
28619
11SEP72
07OCT93
1094
FA1
22268
02APR70
17APR91
1389
BCK
25028
15JUL59
18AUG90
1905
PT1
65111
16APR72
29MAY92
1407
PT1
68096
23MAR69
18MAR90
1114
TA2
32928
18SEP69
27JUN87
1410
PT2
84685
03MAY67
07NOV86
1439
PT1
70736
06MAR64
10SEP90
1409
ME3
41551
19APR50
22OCT81
1408
TA2
34138
29MAR60
14OCT87
1121
ME1
29112
26SEP71
07DEC91
1991
TA1
27645
07MAY72
12DEC92
1102
TA2
34542
01OCT59
15APR91
1356
ME2
36869
26SEP57
22FEB83
1545
PT1
66130
12AUG59
29MAY90
1292
ME2
36691
28OCT64
02JUL89
1440
ME2
35757
27SEP62
09APR91
1368
FA2
27808
11JUN61
03NOV84
1369
TA2
33705
28DEC61
13MAR87
1411
FA2
27265
27MAY61
01DEC89
1113
FA1
22367
15JAN68
17OCT91
1704
BCK
25465
30AUG66
28JUN87
1900
ME2
35105
25MAY62
27OCT87
1126
TA3
40899
28MAY63
21NOV80
1677
BCK
26007
05NOV63
27MAR89
1441
FA2
27158
19NOV69
23MAR91
1421
TA2
33155
08JAN59
28FEB90
1119
TA1
26924
20JUN62
06SEP88
1834
BCK
26896
08FEB72
02JUL92
1777
PT3
109630
23SEP51
21JUN81
1663
BCK
26452
11JAN67
11AUG91
1106
PT2
89632
06NOV57
16AUG84
1103
FA1
23738
16FEB68
23JUL92
1477
FA2
28566
21MAR64
07MAR88
1476
TA2
34803
30MAY66
17MAR87
1379
ME3
42264
08AUG61
10JUN84
1104
SCP
17946
25APR63
10JUN91
1009
TA1
28880
02MAR59
26MAR92
1412
ME1
27799
18JUN56
05DEC91
1115
FA3
32699
22AUG60
29FEB80
1128
TA2
32777
23MAY65
20OCT90
1442
PT2
84536
05SEP66
12APR88
1417
NA2
52270
27JUN64
07MAR89
1478
PT2
84203
09AUG59
24OCT90
1673
BCK
25477
27FEB70
15JUL91
1839
NA1
43433
29NOV70
03JUL93
1347
TA3
40079
21SEP67
06SEP84
1423
ME2
35773
14MAY68
19AUG90
1200
ME1
27816
10JAN71
14AUG92
1970
FA1
22615
25SEP64
12MAR91
1521
ME3
41526
12APR63
13JUL88
1354
SCP
18335
29MAY71
16JUN92
1424
FA2
28978
04AUG69
11DEC89
1132
FA1
22413
30MAY72
22OCT93
1845
BCK
25996
20NOV59
22MAR80
1556
PT1
71349
22JUN64
11DEC91
1413
FA2
27435
16SEP65
02JAN90
1123
TA1
28407
31OCT72
05DEC92
1907
TA2
33329
15NOV60
06JUL87
1436
TA2
34475
11JUN64
12MAR87
1385
ME3
43900
16JAN62
01APR86
1432
ME2
35327
03NOV61
10FEB85
1111
NA1
40586
14JUL73
31OCT92
1116
FA1
22862
28SEP69
21MAR91
1352
NA2
53798
02DEC60
16OCT86
1555
FA2
27499
16MAR68
04JUL92
1038
TA1
26533
09NOV69
23NOV91
1420        ME3         43071  19FEB65  22JUL87
1561        TA2         34514  30NOV63  07OCT87
1434        FA2         28622  11JUL62  28OCT90
1414        FA1         23644  24MAR72  12APR92
1112        TA1         26905  29NOV64  07DEC92
1390        FA2         27761  19FEB65  23JUN91
1332        NA1         42178  17SEP70  04JUN91
1890        PT2         91908  20JUL51  25NOV79
1429        TA1         27939  28FEB60  07AUG92
1107        PT2         89977  09JUN54  10FEB79
1908        TA2         32995  10DEC69  23APR90
1830        PT2         84471  27MAY57  29JAN83
1882        ME3         41538  10JUL57  21NOV78
1050        ME2         35167  14JUL63  24AUG86
1425        FA1         23979  28DEC71  28FEB93
1928        PT2         89858  16SEP54  13JUL90
1480        TA3         39583  03SEP57  25MAR81
1100        BCK         25004  01DEC60  07MAY88
1995        ME1         28810  24AUG73  19SEP93
1135        FA2         27321  20SEP60  31MAR90
1415        FA2         28278  09MAR58  12FEB88
1076        PT1         66558  14OCT55  03OCT91
1426        TA2         32991  05DEC66  25JUN90
1564        SCP         18833  12APR62  01JUL92
1221        FA2         27896  22SEP67  04OCT91
1133        TA1         27701  13JUL66  12FEB92
1435        TA3         38808  12MAY59  08FEB80
1418        ME1         28005  29MAR57  06JAN92
1017        TA3         40858  28DEC57  16OCT81
1443        NA1         42274  17NOV68  29AUG91
1131        TA2         32575  26DEC71  19APR91
1427        TA2         34046  31OCT70  30JAN90
1036        TA3         39392  19MAY65  23OCT84
1130        FA1         23916  16MAY71  05JUN92
1127        TA2         33011  09NOV64  07DEC86
1433        FA3         32982  08JUL66  17JAN87
1431        FA3         33230  09JUN64  05APR88
1122        FA2         27956  01MAY63  27NOV88
1105        ME2         34805  01MAR62  13AUG90
PROCLIB.PAYROLL2
data proclib.payroll2;
input idnum $4. +3 gender $1. +4 jobcode $3. +9 salary 5.
+2 birth date7. +2 hired date7.;
informat birth date7. hired date7.;
format birth date7. hired date7.;
datalines;
1639        TA3         42260  26JUN57  28JAN84
1065        ME3         38090  26JAN44  07JAN87
1561        TA3         36514  30NOV63  07OCT87
1221        FA3         29896  22SEP67  04OCT91
1447        FA1         22123  07AUG72  29OCT92
1998        SCP         23100  10SEP70  02NOV92
1036        TA3         42465  19MAY65  23OCT84
1106        PT3         94039  06NOV57  16AUG84
1129        ME3         36758  08DEC61  17AUG91
1350        FA3         36098  31AUG65  29JUL90
1369        TA3         36598  28DEC61  13MAR87
1076        PT1         69742  14OCT55  03OCT91
PROCLIB.SCHEDULE
data proclib.schedule;
input flight $3. +5 date date7. +2 dest $3. +3 idnum $4.;
format date date7.;
informat date date7.;
datalines;
132     01MAR94  YYZ   1739
132     01MAR94  YYZ   1478
132     01MAR94  YYZ   1130
132     01MAR94  YYZ   1390
132     01MAR94  YYZ   1983
132     01MAR94  YYZ   1111
219     01MAR94  LON   1407
219     01MAR94  LON   1777
219     01MAR94  LON   1103
219     01MAR94  LON   1125
219     01MAR94  LON   1350
219     01MAR94  LON   1332
271     01MAR94  PAR   1439
271     01MAR94  PAR   1442
271     01MAR94  PAR   1132
271     01MAR94  PAR   1411
271     01MAR94  PAR   1988
271     01MAR94  PAR   1443
622     01MAR94  FRA   1545
622     01MAR94  FRA   1890
622     01MAR94  FRA   1116
622     01MAR94  FRA   1221
622     01MAR94  FRA   1433
622     01MAR94  FRA   1352
132     02MAR94  YYZ   1556
132     02MAR94  YYZ   1478
132     02MAR94  YYZ   1113
132     02MAR94  YYZ   1411
132     02MAR94  YYZ   1574
132     02MAR94  YYZ   1111
219     02MAR94  LON   1407
219     02MAR94  LON   1118
219     02MAR94  LON   1132
219     02MAR94  LON   1135
219     02MAR94  LON   1441
219     02MAR94  LON   1332
271     02MAR94  PAR   1739
271     02MAR94  PAR   1442
271     02MAR94  PAR   1103
271     02MAR94  PAR   1413
271     02MAR94  PAR   1115
271     02MAR94  PAR   1443
622     02MAR94  FRA   1439
622     02MAR94  FRA   1890
622     02MAR94  FRA   1124
622     02MAR94  FRA   1368
622     02MAR94  FRA   1477
622     02MAR94  FRA   1352
132     03MAR94  YYZ   1739
132     03MAR94  YYZ   1928
132     03MAR94  YYZ   1425
132     03MAR94  YYZ   1135
132     03MAR94  YYZ   1437
132     03MAR94  YYZ   1111
219     03MAR94  LON   1428
219     03MAR94  LON   1442
219     03MAR94  LON   1130
219     03MAR94  LON   1411
219     03MAR94  LON   1115
219     03MAR94  LON   1332
271     03MAR94  PAR   1905
271     03MAR94  PAR   1118
271     03MAR94  PAR   1970
271     03MAR94  PAR   1125
271     03MAR94  PAR   1983
271     03MAR94  PAR   1443
622     03MAR94  FRA   1545
622     03MAR94  FRA   1830
622     03MAR94  FRA   1414
622     03MAR94  FRA   1368
622     03MAR94  FRA   1431
622     03MAR94  FRA   1352
132     04MAR94  YYZ   1428
132     04MAR94  YYZ   1118
132     04MAR94  YYZ   1103
132     04MAR94  YYZ   1390
132     04MAR94  YYZ   1350
132     04MAR94  YYZ   1111
219     04MAR94  LON   1739
219     04MAR94  LON   1478
219     04MAR94  LON   1130
219     04MAR94  LON   1125
219     04MAR94  LON   1983
219     04MAR94  LON   1332
271     04MAR94  PAR   1407
271     04MAR94  PAR   1410
271     04MAR94  PAR   1094
271     04MAR94  PAR   1411
271     04MAR94  PAR   1115
271     04MAR94  PAR   1443
622     04MAR94  FRA   1545
622     04MAR94  FRA   1890
622     04MAR94  FRA   1116
622     04MAR94  FRA   1221
622     04MAR94  FRA   1433
622     04MAR94  FRA   1352
132     05MAR94  YYZ   1556
132     05MAR94  YYZ   1890
132     05MAR94  YYZ   1113
132     05MAR94  YYZ   1475
132     05MAR94  YYZ   1431
132     05MAR94  YYZ   1111
219     05MAR94  LON   1428
219     05MAR94  LON   1442
219     05MAR94  LON   1422
219     05MAR94  LON   1413
219     05MAR94  LON   1574
219     05MAR94  LON   1332
271     05MAR94  PAR   1739
271     05MAR94  PAR   1928
271     05MAR94  PAR   1103
271     05MAR94  PAR   1477
271     05MAR94  PAR   1433
271     05MAR94  PAR   1443
622     05MAR94  FRA   1545
622     05MAR94  FRA   1830
622     05MAR94  FRA   1970
622     05MAR94  FRA   1441
622     05MAR94  FRA   1350
622     05MAR94  FRA   1352
132     06MAR94  YYZ   1333
132     06MAR94  YYZ   1890
132     06MAR94  YYZ   1414
132     06MAR94  YYZ   1475
132     06MAR94  YYZ   1437
132     06MAR94  YYZ   1111
219     06MAR94  LON   1106
219     06MAR94  LON   1118
219     06MAR94  LON   1425
219     06MAR94  LON   1434
219     06MAR94  LON   1555
219     06MAR94  LON   1332
132     07MAR94  YYZ   1407
132     07MAR94  YYZ   1118
132     07MAR94  YYZ   1094
132     07MAR94  YYZ   1555
132     07MAR94  YYZ   1350
132     07MAR94  YYZ   1111
219     07MAR94  LON   1905
219     07MAR94  LON   1478
219     07MAR94  LON   1124
219     07MAR94  LON   1434
219     07MAR94  LON   1983
219     07MAR94  LON   1332
271     07MAR94  PAR   1410
271     07MAR94  PAR   1777
271     07MAR94  PAR   1103
271     07MAR94  PAR   1574
271     07MAR94  PAR   1115
271     07MAR94  PAR   1443
622     07MAR94  FRA   1107
622     07MAR94  FRA   1890
622     07MAR94  FRA   1425
622     07MAR94  FRA   1475
622     07MAR94  FRA   1433
622     07MAR94  FRA   1352
PROCLIB.STAFF
data proclib.staff;
input idnum $4. +3 lname $15. +2 fname $15. +2 city $15. +2
state $2. +5 hphone $12.;
datalines;
1919   ADAMS            GERALD           STAMFORD         CT     203/781-1255
1653   ALIBRANDI        MARIA            BRIDGEPORT       CT     203/675-7715
1400   ALHERTANI        ABDULLAH         NEW YORK         NY     212/586-0808
1350   ALVAREZ          MERCEDES         NEW YORK         NY     718/383-1549
1401   ALVAREZ          CARLOS           PATERSON         NJ     201/732-8787
1499   BAREFOOT         JOSEPH           PRINCETON        NJ     201/812-5665
1101   BAUCOM           WALTER           NEW YORK         NY     212/586-8060
1333   BANADYGA         JUSTIN           STAMFORD         CT     203/781-1777
1402   BLALOCK          RALPH            NEW YORK         NY     718/384-2849
1479   BALLETTI         MARIE            NEW YORK         NY     718/384-8816
1403   BOWDEN           EARL             BRIDGEPORT       CT     203/675-3434
1739   BRANCACCIO       JOSEPH           NEW YORK         NY     212/587-1247
1658   BREUHAUS         JEREMY           NEW YORK         NY     212/587-3622
1428   BRADY            CHRISTINE        STAMFORD         CT     203/781-1212
1782   BREWCZAK         JAKOB            STAMFORD         CT     203/781-0019
1244   BUCCI            ANTHONY          NEW YORK         NY     718/383-3334
1383   BURNETTE         THOMAS           NEW YORK         NY     718/384-3569
1574   CAHILL           MARSHALL         NEW YORK         NY     718/383-2338
1789   CARAWAY          DAVIS            NEW YORK         NY     212/587-9000
1404   COHEN            LEE              NEW YORK         NY     718/384-2946
1437   CARTER           DOROTHY          BRIDGEPORT       CT     203/675-4117
1639   CARTER-COHEN     KAREN            STAMFORD         CT     203/781-8839
1269   CASTON           FRANKLIN         STAMFORD         CT     203/781-3335
1065   COPAS            FREDERICO        NEW YORK         NY     718/384-5618
1876   CHIN             JACK             NEW YORK         NY     212/588-5634
1037   CHOW             JANE             STAMFORD         CT     203/781-8868
1129   COUNIHAN         BRENDA           NEW YORK         NY     718/383-2313
1988   COOPER           ANTHONY          NEW YORK         NY     212/587-1228
1405   DACKO            JASON            PATERSON         NJ     201/732-2323
1430   DABROWSKI        SANDRA           BRIDGEPORT       CT     203/675-1647
1983   DEAN             SHARON           NEW YORK         NY     718/384-1647
1134   DELGADO          MARIA            STAMFORD         CT     203/781-1528
1118   DENNIS           ROGER            NEW YORK         NY     718/383-1122
1438   DABBOUSSI        KAMILLA          STAMFORD         CT     203/781-2229
1125   DUNLAP           DONNA            NEW YORK         NY     718/383-2094
1475   ELGES            MARGARETE        NEW YORK         NY     718/383-2828
1117   EDGERTON         JOSHUA           NEW YORK         NY     212/588-1239
1935   FERNANDEZ        KATRINA          BRIDGEPORT       CT     203/675-2962
1124   FIELDS           DIANA            WHITE PLAINS     NY     914/455-2998
1422   FUJIHARA         KYOKO            PRINCETON        NJ     201/812-0902
1616   FUENTAS          CARLA            NEW YORK         NY     718/384-3329
1406   FOSTER           GERALD           BRIDGEPORT       CT     203/675-6363
1120   GARCIA           JACK             NEW YORK         NY     718/384-4930
1094   GOMEZ            ALAN             BRIDGEPORT       CT     203/675-7181
1389   GOLDSTEIN        LEVI             NEW YORK         NY     718/384-9326
1905   GRAHAM           ALVIN            NEW YORK         NY     212/586-8815
1407   GREGORSKI        DANIEL           MT. VERNON       NY     914/468-1616
1114   GREENWALD        JANICE           NEW YORK         NY     212/588-1092
1410   HARRIS           CHARLES          STAMFORD         CT     203/781-0937
1439   HASENHAUER       CHRISTINA        BRIDGEPORT       CT     203/675-4987
1409   HAVELKA          RAYMOND          STAMFORD         CT     203/781-9697
1408   HENDERSON        WILLIAM          PRINCETON        NJ     201/812-4789
1121   HERNANDEZ        ROBERTO          NEW YORK         NY     718/384-3313
1991   HOWARD           GRETCHEN         BRIDGEPORT       CT     203/675-0007
1102   HERMANN          JOACHIM          WHITE PLAINS     NY     914/455-0976
1356   HOWARD           MICHAEL          NEW YORK         NY     212/586-8411
1545   HERRERO          CLYDE            STAMFORD         CT     203/781-1119
1292   HUNTER           HELEN            BRIDGEPORT       CT     203/675-4830
1440   JACKSON          LAURA            STAMFORD         CT     203/781-0088
1368   JEPSEN           RONALD           STAMFORD         CT     203/781-8413
1369   JONSON           ANTHONY          NEW YORK         NY     212/587-5385
1411   JOHNSEN          JACK             PATERSON         NJ     201/732-3678
1113   JOHNSON          LESLIE           NEW YORK         NY     718/383-3003
1704   JONES            NATHAN           NEW YORK         NY     718/384-0049
1900   KING             WILLIAM          NEW YORK         NY     718/383-3698
1126   KIMANI           ANNE             NEW YORK         NY     212/586-1229
1677   KRAMER           JACKSON          BRIDGEPORT       CT     203/675-7432
1441   LAWRENCE         KATHY            PRINCETON        NJ     201/812-3337
1421   LEE              RUSSELL          MT. VERNON       NY     914/468-9143
1119   LI               JEFF             NEW YORK         NY     212/586-2344
1834   LEBLANC          RUSSELL          NEW YORK         NY     718/384-0040
1777   LUFKIN           ROY              NEW YORK         NY     718/383-4413
1663   MARKS            JOHN             NEW YORK         NY     212/587-7742
1106   MARSHBURN        JASPER           STAMFORD         CT     203/781-1457
1103   MCDANIEL         RONDA            NEW YORK         NY     212/586-0013
1477   MEYERS           PRESTON          BRIDGEPORT       CT     203/675-8125
1476   MONROE           JOYCE            STAMFORD         CT     203/781-2837
1379   MORGAN           ALFRED           STAMFORD         CT     203/781-2216
1104   MORGAN           CHRISTOPHER      NEW YORK         NY     718/383-9740
1009   MORGAN           GEORGE           NEW YORK         NY     212/586-7753
1412   MURPHEY          JOHN             PRINCETON        NJ     201/812-4414
1115   MURPHY           ALICE            NEW YORK         NY     718/384-1982
1128   NELSON           FELICIA          BRIDGEPORT       CT     203/675-1166
1442   NEWKIRK          SANDRA           PRINCETON        NJ     201/812-3331
1417   NEWKIRK          WILLIAM          PATERSON         NJ     201/732-6611
1478   NEWTON           JAMES            NEW YORK         NY     212/587-5549
1673   NICHOLLS         HENRY            STAMFORD         CT     203/781-7770
1839   NORRIS           DIANE            NEW YORK         NY     718/384-1767
1347   ONEAL            BRYAN            NEW YORK         NY     718/384-0230
1423   OSWALD           LESLIE           MT. VERNON       NY     914/468-9171
1200   OVERMAN          MICHELLE         STAMFORD         CT     203/781-1835
1970   PARKER           ANNE             NEW YORK         NY     718/383-3895
1521   PARKER           JAY              NEW YORK         NY     212/587-7603
1354   PARKER           MARY             WHITE PLAINS     NY     914/455-2337
1424   PATTERSON        RENEE            NEW YORK         NY     212/587-8991
1132   PEARCE           CAROL            NEW YORK         NY     718/384-1986
1845   PEARSON          JAMES            NEW YORK         NY     718/384-2311
1556   PENNINGTON       MICHAEL          NEW YORK         NY     718/383-5681
1413   PETERS           RANDALL          PRINCETON        NJ     201/812-2478
1123   PETERSON         SUZANNE          NEW YORK         NY     718/383-0077
1907   PHELPS           WILLIAM          STAMFORD         CT     203/781-1118
1436   PORTER           SUSAN            NEW YORK         NY     718/383-5777
1385   RAYNOR           MILTON           BRIDGEPORT       CT     203/675-2846
1432   REED             MARILYN          MT. VERNON       NY     914/468-5454
1111   RHODES           JEREMY           PRINCETON        NJ     201/812-1837
1116   RICHARDS         CASEY            NEW YORK         NY     212/587-1224
1352   RIVERS           SIMON            NEW YORK         NY     718/383-3345
1555   RODRIGUEZ        JULIA            BRIDGEPORT       CT     203/675-2401
1038   RODRIGUEZ        MARIA            BRIDGEPORT       CT     203/675-2048
1420   ROUSE            JEREMY           PATERSON         NJ     201/732-9834
1561   SANDERS          RAYMOND          NEW YORK         NY     212/588-6615
1434   SANDERSON        EDITH            STAMFORD         CT     203/781-1333
1414   SANDERSON        NATHAN           BRIDGEPORT       CT     203/675-1715
1112   SANYERS          RANDY            NEW YORK         NY     718/384-4895
1390   SMART            JONATHAN         NEW YORK         NY     718/383-1141
1332   STEPHENSON       ADAM             BRIDGEPORT       CT     203/675-1497
1890   STEPHENSON       ROBERT           NEW YORK         NY     718/384-9874
1429   THOMPSON         ALICE            STAMFORD         CT     203/781-3857
1107   THOMPSON         WAYNE            NEW YORK         NY     718/384-3785
1908   TRENTON          MELISSA          NEW YORK         NY     212/586-6262
1830   TRIPP            KATHY            BRIDGEPORT       CT     203/675-2479
1882   TUCKER           ALAN             NEW YORK         NY     718/384-0216
1050   TUTTLE           THOMAS           WHITE PLAINS     NY     914/455-2119
1425   UNDERWOOD        JENNY            STAMFORD         CT     203/781-0978
1928   UPCHURCH         LARRY            WHITE PLAINS     NY     914/455-5009
1480   UPDIKE           THERESA          NEW YORK         NY     212/587-8729
1100   VANDEUSEN        RICHARD          NEW YORK         NY     212/586-2531
1995   VARNER           ELIZABETH        NEW YORK         NY     718/384-7113
1135   VEGA             ANNA             NEW YORK         NY     718/384-5913
1415   VEGA             FRANKLIN         NEW YORK         NY     718/384-2823
1076   VENTER           RANDALL          NEW YORK         NY     718/383-2321
1426   VICK             THERESA          PRINCETON        NJ     201/812-2424
1564   WALTERS          ANNE             NEW YORK         NY     212/587-3257
1221   WALTERS          DIANE            NEW YORK         NY     718/384-1918
1133   WANG             CHIN             NEW YORK         NY     212/587-1956
1435   WARD             ELAINE           NEW YORK         NY     718/383-4987
1418   WATSON           BERNARD          NEW YORK         NY     718/383-1298
1017   WELCH            DARIUS           NEW YORK         NY     212/586-5535
1443   WELLS            AGNES            STAMFORD         CT     203/781-5546
1131   WELLS            NADINE           NEW YORK         NY     718/383-1045
1427   WHALEY           CAROLYN          MT. VERNON       NY     914/468-4528
1036   WONG             LESLIE           NEW YORK         NY     212/587-2570
1130   WOOD             DEBORAH          NEW YORK         NY     212/587-0013
1127   WOOD             SANDRA           NEW YORK         NY     212/587-2881
1433   YANCEY           ROBIN            PRINCETON        NJ     201/812-1874
1431   YOUNG            DEBORAH          STAMFORD         CT     203/781-2987
1122   YOUNG            JOANN            NEW YORK         NY     718/384-2021
1105   YOUNG            LAWRENCE         NEW YORK         NY     718/384-0008
PROCLIB.SUPERV
data proclib.superv;
input supid $4. +8 state $2. +5 jobcat $2.;
datalines;
            CT     BC
1834        NY     BC
1431        CT     FA
1433        NJ     FA
1983        NY     FA
1385        CT     ME
1420        NJ     ME
1882        NY     ME
1935        CT     NA
1417        NJ     NA
1352        NY     NA
1106        CT     PT
1442        NJ     PT
1118        NY     PT
1405        NJ     SC
1564        NY     SC
1639        CT     TA
1401        NJ     TA
1126        NY     TA
RADIO
This DATA step uses an INFILE statement to read data that is stored in an external
file.
data radio;
infile 'input-file' missover;
input /(time1-time7) ($1. +1);
listener=_n_;
run;
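
The INPUT statement above can be tried without an external file. The following is a minimal sketch, not part of the original sample: it assumes that the external file alternates one line of respondent information with one line of responses, as the listing suggests, and it reads two such pairs taken from the listing as in-stream data. The data set name RADIO_DEMO and the PROC PRINT step are illustrative assumptions.

```sas
/* Sketch only: in-stream data stands in for the external file. */
data radio_demo;
   infile datalines missover;
   /* The slash skips the first line of each pair; the format list  */
   /* then reads seven one-character values from the second line,   */
   /* moving past the blank that follows each value.                */
   input / (time1-time7) ($1. +1);
   listener = _n_;   /* number each respondent in input order */
   datalines;
833 29 m 1 3 2
2 0 0 0 2 2 0 0 4 2 0 2 0 0
859 23 f 10 3 1
1 5 0 8 8 1 4 0 1 1 1 1 1 4
;

proc print data=radio_demo noobs;
run;
```

Each iteration of the DATA step consumes two input lines, so the four lines shown produce two observations. Only the first seven values of each response line are kept, because the variable list names TIME1 through TIME7.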
5 0 0 0 1 0 0 0 0 0 8 8 5 0
833 29 m 1 3 2
2 0 0 0 2 2 0 0 4 2 0 2 0 0
859 23 f 10 3 1
1 5 0 8 8 1 4 0 1 1 1 1 1 4
781 37 f .5 2 7
7 0 0 0 1 0 0 0 1 7 0 1 0 0
833 31 f 5 4 1
1 0 0 0 1 0 0 0 4 0 4 0 0 0
942 23 f 4 2 1
1 0 0 0 1 0 1 0 1 1 0 0 0 0
848 33 f 5 4 1
1 1 0 1 1 0 0 0 1 1 1 0 0 0
222 33 f 2 0 1
1 0 0 0 1 0 0 0 0 0 0 0 0 0
851 45 f .5 1 8
8 0 0 0 8 0 0 0 0 0 8 0 0 0
848 27 f 2 4 1
1 0 0 0 1 1 0 0 4 1 1 1 1 1
781 38 m 2 2 1
5 0 0 0 1 0 0 0 0 0 1 1 0 0
222 27 f 3 1 2
2 0 2 0 2 2 0 0 2 0 0 0 0 0
467 34 f 2 2 1
1 0 0 0 0 1 0 1 0 0 0 0 1 0
833 27 f 8 8 1
7 0 1 0 7 4 0 0 1 1 1 4 1 0
677 49 f 1.5 0 8
8 0 8 0 8 0 0 0 0 0 0 0 0 0
849 43 m 1 4 1
1 0 0 0 4 0 0 0 4 0 1 0 0 0
467 28 m 2 1 7
7 0 0 0 7 0 0 7 0 0 1 0 0 0
732 29 f 1 0 2
2 0 0 0 2 0 0 0 0 0 0 0 0 0
851 31 m 2 2 2
2 5 0 6 0 0 8 0 2 2 8 2 0 0
779 42 f 8 2 2
7 2 0 2 7 0 0 0 0 0 0 0 2 0
493 40 m 1 3 3
3 0 0 0 5 3 0 5 5 0 0 0 1 1
859 30 m 1 0 7
7 0 0 0 7 0 0 0 0 0 0 0 0 0
833 36 m 4 2 5
7 5 0 5 0 5 0 0 7 0 0 0 5 0
467 30 f 1 4 1
0 0 0 0 1 0 6 0 0 1 1 1 0 6
859 32 f 3 5 2
2 2 2 2 2 2 6 6 2 2 2 2 2 6
851 43 f 8 1 5
7 5 5 5 0 0 0 4 0 0 0 0 0 0
848 29 f 3 5 1
7 0 0 0 7 1 0 0 1 1 1 1 1 0
833 25 f 2 4 5
7 0 0 0 5 7 0 0 7 5 0 0 5 0
783 33 f 8 3 8
8 0 8 0 7 0 0 0 8 0 5 4 0 5
222 26 f 10 2 1
1 1 0 1 1 0 0 0 3 1 1 0 0 0
222 23 f 3 2 2
2 2 2 2 7 0 0 2 2 0 0 0 0 0
859 50 f 1 5 4
7 0 0 0 7 0 0 5 4 4 4 7 0 0
833 26 f 3 2 1
1 0 0 1 1 0 0 5 5 0 1 0 0 0
467 29 m 7 2 1
1 1 1 1 1 0 0 1 1 1 0 0 0 0
859 35 m .5 2 2
7 0 0 0 2 0 0 7 5 0 0 4 0 0
833 33 f 3 3 6
7 0 0 0 6 8 0 8 0 0 0 8 6 0
221 36 f .5 1 5
0 7 0 0 0 7 0 0 7 0 0 7 7 0
220 32 f 2 4 5
5 0 5 0 5 5 5 0 5 5 5 5 5 5
684 19 f 2 4 2
0 2 0 2 0 0 0 0 0 2 2 0 0 0
493 55 f 1 0 5
5 0 0 5 0 0 0 0 7 0 0 0 0 0
221 27 m 1 1 7
7 0 0 0 0 0 0 0 5 0 0 0 5 0
684 19 f 0 .5 1
7 0 0 0 0 1 1 0 0 0 0 0 1 1
493 38 f .5 .5 5
0 8 0 0 5 0 0 0 5 0 0 0 0 0
221 26 f .5 2 1
0 1 0 0 0 1 0 0 5 5 5 1 0 0
684 18 m 1 .5 1
0 2 0 0 0 0 1 0 0 0 0 1 1 0
684 19 m 1 1 1
0 0 0 1 1 0 0 0 0 0 1 0 0 0
221 29 m .5 .5 5
0 0 0 0 0 5 5 0 0 0 0 0 5 5
683 18 f 2 4 8
0 0 0 0 8 0 0 0 8 8 8 0 0 0
966 23 f 1 2 1
1 5 5 5 1 0 0 0 0 1 0 0 1 0
493 25 f 3 5 7
7 0 0 0 7 2 0 0 7 0 2 7 7 0
683 18 f .5 .5 2
1 0 0 0 0 0 5 0 0 1 0 0 0 1
382 21 f 3 1 8
0 8 0 0 5 8 8 0 0 8 8 0 0 0
683 18 f 4 6 2
2 0 0 0 2 2 2 0 2 0 2 2 2 0
684 19 m .5 2 1
0 0 0 0 1 1 0 0 0 1 1 1 1 5
684 19 m 1.5 3.5 2
2 0 0 0 2 0 0 0 0 0 2 5 0 0
221 23 f 1 5 1
7 5 1 5 1 3 1 7 5 1 5 1 3 1
684 18 f 2 3 1
2 0 0 1 1 1 1 7 2 0 1 1 1 1
683 19 f 3 5 2
2 0 0 2 0 6 1 0 1 1 2 2 6 1
683 19 f 3 5 1
2 0 0 2 0 6 1 0 1 1 2 0 2 1
221 35 m 3 5 5
7 5 0 1 7 0 0 5 5 5 0 0 0 0
221 43 f 1 4 5
1 0 0 0 5 0 0 5 5 0 0 0 0 0
493 32 f 2 1 6
0 0 0 6 0 0 0 0 0 0 0 0 4 0
221 24 f 4 5 2
2 0 5 0 0 2 4 4 4 5 0 0 2 2
684 19 f 2 3 2
0 5 5 2 5 0 1 0 5 5 2 2 2 2
221 19 f 3 3 8
0 1 1 8 8 8 4 0 5 4 1 8 8 4
221 29 m 1 1 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5
221 21 m 1 1 1
1 0 0 0 0 0 5 1 0 0 0 0 0 5
683 20 f 1 2 2
0 0 0 0 2 0 0 0 2 0 0 0 0 0
493 54 f 1 1 5
7 0 0 5 0 0 0 0 0 0 5 0 0 0
493 45 m 4 6 5
7 0 0 0 7 5 0 0 5 5 5 5 5 5
850 44 m 2.5 1.5 7
7 0 7 0 4 7 5 0 5 4 3 0 0 4
220 33 m 5 3 5
1 5 0 5 1 0 0 0 0 0 0 0 5 5
684 20 f 1.5 3 1
1 0 0 0 1 0 1 0 1 0 0 1 1 0
966 63 m 3 5 3
5 4 7 5 4 5 0 5 0 0 5 5 4 0
683 21 f 4 6 1
0 1 0 1 1 1 1 0 1 1 1 1 1 1
493 23 f 5 2 5
7 5 0 4 0 0 0 0 1 1 1 1 1 0
493 32 f 8 8 5
7 5 0 0 7 0 5 5 5 0 0 7 5 5
942 33 f 7 2 5
0 5 5 4 7 0 0 0 0 0 0 7 8 0
493 34 f .5 1 5
5 0 0 0 5 0 0 0 0 0 6 0 0 0
382 40 f 2 2 5
5 0 0 0 5 0 0 5 0 0 5 0 0 0
362 27 f 0 3 8
0 0 0 0 0 0 0 0 0 0 0 0 8 0
542 36 f 3 3 7
7 0 0 0 7 1 0 0 0 7 1 1 0 0
966 39 f 3 6 5
7 0 0 0 7 5 0 0 7 0 5 0 5 0
849 32 m 1 .5 7
7 0 0 0 5 0 0 0 7 4 4 5 7 0
677 52 f 3 2 3
7 0 0 0 0 7 0 0 0 7 0 0 3 0
222 25 m 2 4 1
1 0 0 0 1 0 0 0 1 0 1 0 0 0
732 42 f 3 2 7
7 0 0 0 1 7 5 5 7 0 0 3 4 0
467 26 f 4 4 1
7 0 1 0 7 1 0 0 7 7 4 7 0 0
467 38 m 2.5 0 1
1 0 0 0 1 0 0 0 0 0 0 0 0 0
382 37 f 1.5 .5 7
7 0 0 0 7 0 0 0 3 0 0 0 3 0
856 45 f 3 3 7
7 0 0 0 7 5 0 0 7 7 4 0 0 0
677 33 m 3 2 7
7 0 0 4 7 0 0 0 7 0 0 0 0 0
490 27 f .5 1 2
2 0 0 0 2 0 0 0 2 0 2 0 0 0
362 27 f 1.5 2 2
2 0 0 0 1 0 4 0 1 0 0 0 4 4
783 25 f 2 1 1
1 0 0 0 1 7 0 0 0 0 1 1 1 0
546 30 f 8 3 1
1 1 1 1 1 0 0 1 0 5 5 0 0 0
677 30 f 2 0 1
1 0 0 0 0 1 0 0 0 0 0 0 0 1
221 35 f 2 2 1
1 0 0 0 1 0 1 0 1 1 1 0 0 0
966 32 f 6 1 7
7 1 1 1 7 4 0 1 7 1 8 8 4 0
222 28 f 1 5 4
7 0 0 0 4 0 0 4 4 4 4 0 0 0
467 29 f 5 3 4
4 5 5 5 1 4 4 5 1 1 1 1 4 4
467 32 m 3 4 1
1 0 1 0 4 0 0 0 4 0 0 0 1 0
966 30 m 1.5 1 7
7 0 0 0 7 5 0 7 0 0 0 0 5 0
967 38 m 14 4 7
7 7 7 7 7 0 4 8 0 0 0 0 4 0
490 28 m 8 1 1
7 1 1 1 1 0 0 7 0 0 8 0 0 0
833 30 f .5 1 6
6 0 0 0 6 0 0 0 0 6 0 0 6 0
851 40 m 1 0 7
7 5 5 5 7 0 0 0 0 0 0 0 0 0
859 27 f 2 5 2
6 0 0 0 2 0 0 0 0 0 0 2 2 2
851 22 f 3 5 2
7 0 2 0 2 2 0 0 2 0 8 0 2 0
967 38 f 1 1.5 7
7 0 0 0 7 5 0 7 4 0 0 7 5 0
856 34 f 1.5 1 1
0 1 0 0 0 1 0 0 4 0 0 0 0 0
222 33 m .1 .1 7
7 0 0 0 7 0 0 0 0 0 7 0 0 0
856 22 m .50 .25 1
0 1 0 0 1 0 0 0 0 0 0 0 0 0
677 30 f 2 2 4
1 0 4 0 4 0 0 0 4 0 0 0 0 0
859 25 m 2 3 7
0 0 0 0 0 7 0 0 7 0 2 0 0 1
833 35 m 2 6 7
7 0 0 0 7 1 1 0 4 7 4 7 1 1
677 35 m 10 4 1
1 1 1 1 1 8 6 8 1 0 0 8 8 8
848 29 f 5 3 8
8 0 0 0 8 8 0 0 0 8 8 8 0 0
688 26 m 3 1 1
1 1 7 1 1 7 0 0 0 8 8 0 0 0
490 41 m 2 2 5
5 0 0 0 0 0 5 5 0 0 0 0 0 5
493 35 m 4 4 7
7 5 0 5 7 0 0 7 7 7 7 0 0 0
677 27 m 15 11 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1
848 27 f 3 5 1
1 1 0 0 1 1 0 0 1 1 1 1 0 0
362 30 f 1 0 1
1 0 0 0 7 5 0 0 0 0 0 0 0 0
783 29 f 1 1 4
4 0 0 0 4 0 0 0 4 0 0 0 4 0
467 39 f .5 2 4
7 0 4 0 4 4 0 0 4 4 4 4 4 4
677 27 m 2 2 7
7 0 0 0 7 0 0 7 7 0 0 7 0 0
221 23 f 2.5 1 1
1 0 0 0 1 0 0 0 0 0 0 0 0 0
677 29 f 1 1 7
0 0 0 0 7 0 0 0 7 0 0 0 0 0
783 32 m 1 2 5
4 5 5 5 4 2 0 0 0 0 3 2 2 0
833 25 f 1 0 1
1 1 0 0 0 0 0 0 0 0 0 0 0 0
859 24 f 7 3 7
1 0 0 0 1 0 0 0 0 1 0 0 1 0
677 29 m 2 2 8
0 8 8 0 8 0 0 0 8 8 8 0 0 0
688 31 m 8 2 5
7 5 5 5 5 7 0 0 7 7 0 0 0 0
856 31 m 9 4 1
1 1 1 1 1 0 0 0 0 0 0 0 1 0
856 44 f 1 0 6
6 0 0 0 6 0 0 0 0 0 0 0 0 0
677 37 f 3 3 1
0 0 1 0 0 0 0 0 4 4 0 0 0 0
859 27 m 2 .5 2
2 2 2 2 2 2 2 2 0 0 0 0 0 2
781 30 f 10 4 2
2 0 0 0 2 0 2 0 0 0 0 0 0 2
362 27 m 12 4 3
3 1 1 1 1 3 3 3 0 0 0 0 3 0
362 33 f 2 4 1
1 0 0 0 7 0 0 7 1 1 1 1 1 0
222 26 f 8 1 1
1 1 1 1 0 0 0 1 0 0 0 0 0 0
779 37 f 6 3 1
1 1 1 1 1 0 0 1 1 0 0 0 1 0
467 32 f 1 1 2
2 0 0 0 0 0 0 0 2 0 0 2 0 0
859 23 m 1 1 1
1 0 0 0 1 1 0 1 0 0 0 0 1 1
781 33 f 1 .5 6
6 0 0 0 6 0 0 0 0 0 0 0 0 0
779 28 m 5 2 1
1 1 1 1 1 0 0 0 0 7 7 1 1 0
677 28 m 3 1 5
7 5 5 5 5 6 0 0 6 6 6 6 6 0
677 25 f 9 2 5
1 5 5 5 5 1 1 0 1 1 1 1 1 1
848 30 f 6 2 8
8 0 0 0 2 7 0 0 0 0 2 0 2 0
546 36 f 4 6 4
7 0 0 0 4 4 0 5 5 5 5 2 4 4
222 30 f 2 3 2
2 2 0 0 2 0 0 0 2 0 2 2 0 0
383 32 m 4 1 2
2 0 0 0 2 0 0 2 0 0 0 0 0 0
851 43 f 8 1 6
4 6 0 6 4 0 0 0 0 0 0 0 0 0
222 27 f 1 3 1
1 1 0 1 1 1 0 0 1 0 0 0 4 0
833 22 f 1.5 2 1
1 0 0 0 1 1 0 0 1 1 1 0 0 0
467 29 f 2 1 8
8 0 8 0 8 0 0 0 0 0 8 0 0 0
856 28 f 2 3 1
1 0 0 0 1 0 0 0 1 0 0 1 0 0
580 31 f 2.5 2.5 6
6 6 6 6 6 6 6 6 1 1 1 1 6 6
688 39 f 8 8 3
3 3 3 3 3 3 3 3 3 3 3 3 3 3
677 37 f 1.5 .5 1
6 1 1 1 6 6 0 0 1 1 6 6 6 0
859 38 m 3 6 3
7 0 0 0 7 3 0 0 3 0 3 0 0 0
677 25 f 7 1 1
0 1 1 1 2 0 0 0 1 2 1 1 1 0
848 36 f 7 1 1
0 1 0 1 1 0 0 0 0 0 0 1 1 0
781 31 f 2 4 1
1 0 0 0 1 1 0 1 1 1 1 1 0 0
781 40 f 2 2 8
8 0 0 8 8 0 0 0 0 0 8 8 0 0
677 25 f 3 5 1
1 6 1 6 6 3 0 0 2 2 1 1 1 1
779 33 f 3 2 1
1 0 1 0 0 0 1 0 1 0 0 0 1 0
677 25 m 7 1.5 1
1 1 0 1 1 0 0 0 0 0 1 0 0 0
362 35 f .5 0 1
1 0 0 0 1 0 0 0 0 0 0 0 0 0
677 41 f 6 2 7
7 7 0 7 7 0 0 0 0 0 8 0 0 0
677 24 m 5 1 5
1 5 0 5 0 0 0 0 1 0 0 0 0 0
833 29 f .5 0 6
6 0 0 0 6 0 0 0 0 0 0 0 0 0
362 30 f 1 1 1
1 0 0 0 1 0 0 0 1 0 0 0 0 0
850 26 f 6 12 6
6 0 0 0 2 2 2 6 6 6 0 0 6 6
467 25 f 2 3 1
1 0 0 6 1 1 0 0 0 0 1 1 1 1
967 29 f 1 2 7
7 0 0 0 7 0 0 7 7 0 0 0 0 0
833 31 f 1 1 7
7 0 7 0 7 3 0 0 3 3 0 0 0 0
859 40 f 7 1 5
1 5 0 5 5 1 0 0 1 0 0 0 0 0
848 31 m 1 2 1
1 0 0 0 1 1 0 0 4 4 1 4 0 0
222 32 f 2 3 3
3 0 0 0 0 7 0 0 3 0 8 0 0 0
783 33 f 2 0 4
7 0 0 0 7 0 0 0 4 0 4 0 0 0
856 28 f 8 4 2
0 2 0 2 2 0 0 0 2 0 2 0 4 0
781 30 f 3 5 1
1 1 1 1 1 1 0 0 1 1 1 1 1 0
850 25 f 6 3 1
7 5 0 5 7 1 0 0 7 0 1 0 1 0
580 33 f 2.5 4 2
2 0 0 0 2 0 0 0 0 0 8 8 0 0
677 38 f 3 3 1
1 0 0 0 1 0 1 1 1 0 1 0 0 4
677 26 f 2 2 1
1 0 1 0 1 0 0 0 1 1 1 0 0 0
467 52 f 3 2 2
2 6 6 6 6 2 0 0 2 2 2 2 0 0
542 31 f 1 3 1
1 0 1 0 1 0 0 0 1 1 1 1 1 0
859 50 f 9 3 6
6 6 6 6 6 6 6 6 6 3 3 3 6 6
779 26 f 1 2 1
7 0 1 0 1 1 4 1 4 1 1 1 4 4
779 36 m 1.5 2 4
1 4 0 4 4 0 0 4 4 4 4 0 0 0
222 31 f 0 3 7
1 0 0 0 7 0 0 0 0 0 0 0 0 0
362 27 f 1 1 1
1 0 1 0 1 4 0 4 4 1 0 4 4 0
967 32 f 3 2 7
7 0 0 0 7 0 0 0 1 0 0 1 0 0
362 29 f 10 2 2
2 2 2 2 2 2 2 2 2 2 2 7 0 0
677 27 f 3 4 1
0 5 1 1 0 5 0 0 0 1 1 1 0 0
546 32 m 5 .5 8
8 0 0 0 8 0 0 0 8 0 0 0 0 0
688 38 m 2 3 2
2 0 0 0 2 0 0 0 2 0 0 0 1 0
362 28 f 1 1 1
1 0 0 0 1 1 0 4 0 0 0 0 4 0
851 32 f .5 2 4
5 0 0 0 4 0 0 0 0 0 0 0 2 0
967 43 f 2 2 1
1 0 0 0 1 0 0 1 7 0 0 0 1 0
467 44 f 10 4 6
7 6 0 6 6 0 6 0 0 0 0 0 0 6
467 23 f 5 3 1
0 2 1 2 1 0 0 0 1 1 1 1 1 1
783 30 f 1 .5 1
1 0 0 0 1 0 0 0 0 0 0 7 0 0
677 29 f 3 1 2
2 2 2 2 2 0 0 0 0 0 0 0 0 0
859 26 f 9.5 1.5 2
2 2 2 2 2 0 0 2 2 0 0 0 0 0
222 28 f 3 0 2
2 0 0 0 2 0 0 0 0 0 2 0 0 0
966 37 m 2 1 1
7 1 1 1 7 0 0 0 7 0 0 0 0 0
859 31 f 10 10 1
0 1 1 1 1 0 0 0 1 1 0 0 1 0
781 27 f 2 1 2
2 0 0 0 1 0 0 0 4 0 0 0 0 0
677 31 f .5 .5 6
7 0 0 0 0 0 0 0 6 0 0 0 0 0
848 28 f 5 1 2
2 2 0 2 0 0 0 0 2 0 0 0 0 0
781 24 f 3 3 6
1 6 6 6 1 6 0 0 0 0 1 0 1 1
856 27 f 1.5 1 6
2 6 6 6 2 5 0 2 0 0 5 2 0 0
382 30 m 1 2 7
7 0 0 0 7 0 4 7 0 0 0 7 4 4
848 25 f 9 3 1
7 1 1 5 1 0 0 0 1 1 1 1 1 0
382 30 m 1 2 4
7 0 0 0 7 0 4 7 0 0 0 7 4 4
688 40 m 2 3 1
1 0 0 0 1 3 1 0 5 0 4 4 7 1
856 40 f .5 5 5
3 0 0 0 3 0 0 0 0 0 5 5 0 0
966 25 f 2 .5 2
1 0 0 0 2 6 0 0 4 0 0 0 0 0
859 30 f 2 4 2
2 0 0 0 0 2 0 0 0 0 2 0 0 0
849 29 m 10 1 5
7 5 5 5 7 5 5 0 0 0 0 0 7 0
781 28 m 1.5 3 4
1 0 0 0 1 4 4 0 4 4 1 1 4 0
467 35 f 4 2 6
7 6 7 6 6 7 6 7 7 7 7 7 7 6
222 32 f 10 5 1
1 1 0 1 1 0 0 1 1 1 0 0 1 0
677 32 f 1 0 1
1 0 1 0 0 0 0 0 0 0 0 0 0 0
222 54 f 21 4 3
5 0 0 0 7 0 0 7 0 0 0 0 0 0
677 30 m 4 6 1
7 0 0 0 0 1 1 1 7 1 1 0 8 1
683 29 f 1 2 8
8 0 0 0 8 0 0 0 0 8 8 0 0 0
467 38 m 3 5 1
1 0 0 0 1 0 0 1 1 0 0 0 0 0
781 29 f 2 3 8
8 0 0 0 8 8 0 0 8 8 0 8 8 0
781 30 f 1 0 5
5 0 0 0 0 5 0 0 0 0 0 0 0 0
783 40 f 1.5 3 1
1 0 0 0 1 4 0 0 1 1 1 0 0 0
851 30 f 1 1 6
6 0 0 0 6 0 0 0 6 0 0 6 0 0
851 40 f 1 1 5
5 0 0 0 5 0 0 0 0 1 0 0 0 0
779 40 f 1 0 2
2 0 0 0 2 0 0 0 0 0 0 0 0 0
467 37 f 4 8 1
1 0 0 0 1 0 3 0 3 1 1 1 0 0
859 37 f 4 3 3
0 3 7 0 0 7 0 0 0 7 8 3 7 0
781 26 f 4 1 2
2 2 0 2 1 0 0 0 2 0 0 0 0 0
859 23 f 8 3 3
3 2 0 2 3 0 0 0 1 0 0 3 0 0
967 31 f .5 0 1
1 0 0 0 0 0 0 0 0 0 0 0 0 0
851 38 m 4 2 5
7 5 0 5 4 0 4 7 7 0 4 0 8 0
467 30 m 2 1 2
2 2 0 2 0 0 0 0 2 0 2 0 0 0
848 33 f 2 2 7
7 0 0 0 0 7 0 7 7 0 0 0 7 0
688 35 f 5 8 3
2 2 2 2 2 0 0 3 3 3 3 3 0 0
467 27 f 2 3 1
1 0 1 0 0 1 0 0 1 1 1 0 0 0
783 42 f 3 1 1
1 0 0 0 1 0 0 0 1 0 1 1 0 0
687 40 m 1.5 2 1
7 0 0 0 1 1 0 0 1 0 7 0 1 0
779 30 f 4 8 7
7 0 0 0 7 0 6 7 4 2 2 0 0 6
222 34 f 9 0 8
8 2 0 2 8 0 0 0 0 0 0 0 0 0
467 28 m 3 1 2
2 0 0 0 2 2 0 0 0 2 2 0 0 0
222 28 f 8 4 2
1 2 1 2 2 0 0 1 2 2 0 0 2 0
542 35 m 2 3 2
6 0 7 0 7 0 7 0 0 0 2 2 0 0
677 31 m 12 4 3
7 3 0 3 3 4 0 0 4 4 4 0 0 0
783 45 f 1.5 2 6
6 0 0 0 6 0 0 6 6 0 0 0 0 0
942 34 f 1 .5 4
4 0 0 0 1 0 0 0 0 0 2 0 0 0
222 30 f 8 4 1
1 1 1 1 1 0 0 0 1 1 0 0 0 0
967 38 f 1.5 2 7
7 0 0 0 7 0 0 7 1 1 1 1 0 0
783 37 f 2 1 1
6 6 1 1 6 6 0 0 6 1 1 1 6 0
467 31 f 1.5 2 2
2 0 7 0 7 0 0 7 7 0 0 0 7 0
859 48 f 3 0 7
7 0 0 0 0 0 0 0 0 7 0 0 0 0
490 35 f 1 1 7
7 0 0 0 7 0 0 0 0 0 0 0 8 0
222 27 f 3 2 3
8 0 0 0 3 8 0 3 3 0 0 0 0 0
382 36 m 3 2 4
7 0 5 4 7 4 4 0 7 7 4 7 0 4
859 37 f 1 1 2
7 0 0 0 0 2 0 2 2 0 0 0 0 2
856 29 f 3 1 1
1 0 0 0 1 1 1 1 0 0 1 1 0 1
542 32 m 3 3 7
7 0 0 0 0 7 7 7 0 0 0 0 7 7
783 31 m 1 1 1
1 0 0 0 1 0 0 0 1 1 1 0 0 0
833 35 m 1 1 1
5 4 1 5 1 0 0 1 1 0 0 0 0 0
782 38 m 30 8 5
7 5 5 5 5 0 0 4 4 4 4 4 0 0
222 33 m 3 3 1
1 1 1 1 1 1 1 1 4 1 1 1 1 1
467 24 f 2 4 1
0 0 1 0 1 0 0 0 1 1 1 0 0 0
467 34 f 1 1 1
1 0 0 0 1 0 0 1 1 0 0 0 0 0
781 53 f 2 1 5
5 0 0 0 5 5 0 0 0 0 5 5 5 0
222 30 m 2 5 3
6 3 3 3 6 0 0 0 3 3 3 3 0 0
688 26 f 2 2 1
1 0 0 0 1 0 0 0 1 0 1 1 0 0
222 29 m 8 5 1
1 6 0 6 1 0 0 1 1 1 1 0 0 0
783 33 m 1 2 7
7 0 0 0 7 0 0 0 7 0 0 0 7 0
781 39 m 1.5 2.5 2
2 0 2 0 2 0 0 0 2 2 2 0 0 0
850 22 f 2 1 1
1 0 0 0 1 1 1 0 5 0 0 1 0 0
493 36 f 1 0 5
0 0 0 0 7 0 0 0 0 0 0 0 0 0
967 46 f 2 4 7
7 5 0 5 7 0 0 0 4 7 4 0 0 0
856 41 m 2 2 4
7 4 0 0 7 4 0 4 0 0 0 7 0 0
546 25 m 5 5 8
8 8 0 0 0 0 0 0 0 0 0 0 0 0
222 27 f 4 4 3
2 2 2 3 7 7 0 2 2 2 3 3 3 0
688 23 m 9 3 3
3 3 3 3 3 7 0 0 3 0 0 0 0 0
849 26 m .5 .5 8
8 0 0 0 8 0 0 0 0 8 0 0 0 0
783 29 f 3 3 1
1 0 0 0 4 0 0 4 1 0 1 0 0 0
856 34 f 1.5 2 1
7 0 0 0 7 0 0 7 4 0 0 7 0 0
966 33 m 3 5 4
7 0 0 0 7 4 5 0 7 0 0 7 4 4
493 34 f 2 5 1
1 0 0 0 1 0 0 0 7 0 1 1 8 0
467 29 m 2 4 2
2 0 0 0 2 0 0 2 2 2 2 2 2 2
677 28 f 1 4 1
1 1 1 1 1 0 0 0 1 0 1 0 0 0
781 27 m 2 2 1
1 0 1 0 4 2 4 0 2 2 1 0 1 4
467 24 m 4 4 1
7 1 0 1 1 1 0 7 1 0 0 0 0 0
859 26 m 5 5 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1
848 27 m 7 2 5
7 5 0 5 4 5 0 0 0 7 4 4 0 4
677 25 f 1 2 8
8 0 0 0 0 5 0 0 8 0 0 0 2 0
222 26 f 3.5 0 2
2 0 0 0 2 0 0 0 0 0 0 0 0 0
833 32 m 1 2 1
1 0 0 0 1 0 0 0 5 0 1 0 0 0
781 28 m 2 .5 7
7 0 0 0 7 0 0 0 4 0 0 0 0 0
783 28 f 1 1 1
1 0 0 0 1 0 0 0 0 0 1 1 0 0
222 28 f 5 5 2
2 6 6 2 2 0 0 0 2 2 0 0 2 2
851 33 m 4 5 3
1 0 0 0 7 3 0 3 3 3 3 3 7 5
859 39 m 2 1 1
1 0 0 0 1 0 0 0 0 0 0 1 0 0
848 45 m 2 2 7
7 0 0 0 7 0 0 0 7 0 0 0 0 0
467 37 m 2 2 7
7 0 0 0 0 7 0 0 0 7 0 0 7 0
859 32 m .25 .25 1
1 0 0 0 0 0 0 0 1 0 0 0 0 0
APPENDIX 4
Recommended Reading
Here is the recommended reading list for this title:
- The Little SAS Book: A Primer, Third Edition
- Output Delivery System: The Basics
For a complete list of SAS publications, see the current SAS Publishing Catalog. To
order the most current publications or to receive a free copy of the catalog, contact a
SAS representative at
SAS Publishing Sales
SAS Campus Drive
Cary, NC 27513
Telephone: (800) 727-3228*
Fax: (919) 677-8166
E-mail: [email protected]
Web address: support.sas.com/pubs
* For other SAS Institute business, call (919) 677-8000.
Customers outside the United States should contact their local SAS office.
Index
A
ACCELERATE= option
ITEM statement (PMENU) 671
ACROSS option
DEFINE statement (REPORT) 899
across variables 854, 899
activities data set 86, 107
AFTER= option
PROC CPORT statement 288
AGE statement
DATASETS procedure 312
aging data sets 388
aging files 312
ALL class variable 1246
ALL keyword 1095
ALLOBS option
PROC COMPARE statement 231
ALLSTATS option
PROC COMPARE statement 231
ALLVARS option
PROC COMPARE statement 231
ALPHA= option
PROC MEANS statement 528
PROC TABULATE statement 1188
ALTER= option
AGE statement (DATASETS) 312
CHANGE statement (DATASETS) 322
COPY statement (DATASETS) 328
DELETE statement (DATASETS) 334
EXCHANGE statement (DATASETS) 338
MODIFY statement (DATASETS) 349
PROC DATASETS statement 309
REPAIR statement (DATASETS) 353
SELECT statement (DATASETS) 356
ALTER TABLE statement
SQL procedure 1038
alternative hypotheses
ANALYSIS option
DEFINE statement (REPORT) 899
analysis variables 548, 855, 899
SUMMARY procedure 1179
TABULATE procedure 1211, 1212
weights for 64, 912, 1212
ANSI Standard 1122
APPEND 75
APPEND procedure 75
overview 75
syntax 75
APPEND statement
DATASETS procedure 314
appending data sets 314
APPEND procedure vs. APPEND statement 319
compressed data sets 316
indexed data sets 316
integrity constraints and 318
password-protected data sets 316
restricting observations 315
SET statement vs. APPEND statement 315
system failures 319
variables with different attributes 317
with different variables 317
with generation groups 318
APPENDVER= option
APPEND statement (DATASETS) 314
arithmetic mean
arithmetic operators 1123
ASCENDING option
CHART procedure 192
CLASS statement (MEANS) 536
CLASS statement (TABULATE) 1197
ASCII option
PROC SORT statement 1007
ASCII order 1007, 1014
ASIS option
PROC CPORT statement 288
asterisk (*) notation 1059
ATTR= option
TEXT statement (PMENU) 679
ATTRIB statement
procedures and 57
audit files
creating 321
event logging 320
AUDIT statement
DATASETS procedure 320
AUDIT_ALL= option
AUDIT statement (DATASETS) 320
AUTOLABEL option
OUTPUT statement (MEANS) 545
AUTONAME option
OUTPUT statement (MEANS) 545
axes
customizing
AXIS= option
CHART procedure 193
PLOT statement (TIMEPLOT) 1294
B
bar charts 180
horizontal 180, 189, 209
maximum number of bars 187
percentage charts 201
side-by-side 206
vertical 180, 191, 203
BASE= argument
APPEND statement (DATASETS) 314
base data set 226
BASE= option
PROC COMPARE statement 232
batch mode
creating printer definitions 789
printing from 867
BATCH option
PROC DISPLAY statement 396
BETWEEN condition 1071
block charts 181, 187
for BY groups 210
BLOCK statement
CHART procedure 187
BOX option
PLOT statement (PLOT) 615
PROC REPORT statement 872
TABLE statement (TABULATE) 1204
_BREAK_ automatic variable 862
break lines 861
_BREAK_ automatic variable 862
creating 861
order of 861, 888, 911
BREAK statement
REPORT procedure 885
BREAK window
REPORT procedure 913
breaks 861
BRIEFSUMMARY option
PROC COMPARE statement 232
browsing external files 489
BTRIM function (SQL) 1072
buttons 670
BY-group information
titles containing 20
BY-group processing 20, 59
error processing for 24
formats and 30
TABULATE procedure 1215
BY groups
block charts for 210
plotting 646
transposing
BY lines
inserting into titles 23
suppressing the default 20
BY processing
COMPARE procedure 237
BY statement 58
BY-group processing 59
CALENDAR procedure 91
CHART procedure 188
COMPARE procedure 236
example 60
formatting BY-variable values 59
MEANS procedure 535, 562
options 58
PLOT procedure 612
PRINT procedure 716
procedures supporting 60
RANK procedure 819
REPORT procedure 889
SORT procedure 1012
STANDARD procedure 1168
TABULATE procedure 1196
TIMEPLOT procedure 1291
TRANSPOSE procedure
BY variables
formatting values 59
inserting names into titles 22
inserting values into titles 21
C
calculated columns
SQL 1073
CALCULATED component 1073
CALEDATA= option
PROC CALENDAR statement 86
CALENDAR 84
calendar, defined 104
calendar data set 86, 109
multiple calendars 105, 106
CALENDAR procedure 84
activities data set 107
activity lines 113
calendar data set 109
calendar types 79, 101
concepts 101
customizing calendar appearance 113
default calendars 103
duration 94
examples 114
holiday duration 95
holidays data set 108
input data sets 106
missing values 111
multiple calendars 79, 92, 104
ODS portability 113
output, format of 112
output, quantity of 112
overview 79
project management 83
results 112
schedule calendars 102
scheduling 83, 137
summary calendars 102
syntax 84
task tables 84, 85
workdays data set 110
calendar reports 104
CALID statement
CALENDAR procedure 92
CALL DEFINE statement
REPORT procedure 890
CAPS option
PROC FSLIST statement 491
Cartesian product 1083, 1084
case-control studies 1158
CASE expression 1073
CATALOG 154
CATALOG= argument
PROC DISPLAY statement 396
catalog concatenation 168
catalog entries
copying 159, 165, 170
deleting 156, 160, 170, 176
displaying contents of 174
excluding, for copying 162
exporting 296, 298
importing 223
modifying descriptions of 163, 174
moving, from multiple catalogs 170
renaming 157, 174
routing log or output to entries 779
saving from deletion 164
switching names of 161
CATALOG= option
CONTENTS statement (CATALOG) 158
PROC PMENU statement 667
CATALOG procedure 154
catalog concatenation 168
concepts 165
ending a step 166
entry type specification 166
error handling 166
examples 170
interactive processing with RUN groups 165
overview 153
results 169
syntax 154
task tables 154, 155, 159
catalogs
concatenating 168
exporting multiple 295
format catalogs 456
listing contents of 157
PMENU entries 667, 673, 680
repairing 353
categories 1185
headings for 1227
categories of procedures 3
CC option
FSLIST command 493
PROC FSLIST statement 491
CENTER option
DEFINE statement (REPORT) 899
PROC REPORT statement 873
centiles 344
CENTILES option
CONTENTS statement (DATASETS) 324
CFREQ option
CHART procedure 193
CHANGE statement
CATALOG procedure 157
DATASETS procedure 322
character data
converting to numeric values 470
character strings
converting to lowercase 1093
converting to uppercase 1114
formats for 448
ranges for 480
returning a substring 1106
trimming 1072
character values
formats for 466
character variables
sorting orders for 1014
CHART 185
CHART procedure 185
bar charts 180, 206
block charts 181, 187, 210
concepts 197
customizing charts 191
examples 198
formatting characters 185
frequency counts 198
horizontal bar charts 189, 209
missing values 195, 197
ODS output 198
ODS table names 198
options 192
overview 179
percentage bar charts 201
pie charts 182, 189
results 197
star charts 183, 190
syntax 185
task table 191
variable characteristics 197
vertical bar charts 191, 203
charts
bar charts 180, 201, 206
block charts 181, 187, 210
customizing 191
horizontal bar charts 189, 209
missing values 195
pie charts 182, 189
star charts 183, 190
vertical bar charts 191, 203
CHARTYPE option
PROC MEANS statement 528
check boxes 668, 670
active vs. inactive 668
color of 668
CHECKBOX statement
PMENU procedure 668
CIMPORT 216
CIMPORT procedure 216
examples 222
file transport process 215
overview 215
results 221
syntax 216
task table 216
CLASS statement
MEANS procedure 536
TABULATE procedure 1197
TIMEPLOT procedure 1292
starting 895
COMPUTE statement
REPORT procedure 895
COMPUTE window
REPORT procedure 916
COMPUTED option
DEFINE statement (REPORT) 901
COMPUTED VAR window
REPORT procedure 916
computed variables 855, 901
storing 983
concatenating catalogs 168
concatenating data sets 386
CONDENSE option
TABLE statement (TABULATE) 1204
confidence limits 553, 573
keywords and formulas
one-sided, above the mean
one-sided, below the mean
TABULATE procedure 1188
two-sided
CONNECT statement
SQL procedure 1042
CONNECTION TO component 1079
CONSTRAINT= option
COPY statement (DATASETS) 330
PROC CPORT statement 288
CONTAINS condition 1080
CONTENTS= option
PROC PRINT statement 707
PROC REPORT statement 874
PROC TABULATE statement 1189
TABLE statement (TABULATE) 1205
CONTENTS procedure 276
overview 275
syntax 276
task table 276
versus CONTENTS statement (DATASETS) 327
CONTENTS statement
CATALOG procedure 157
DATASETS procedure 323
contingency tables 1269
continuation messages 1185
CONTOUR= option
PLOT statement (PLOT) 615
contour plots 615, 642
converting files 215, 285
COPY 278
COPY procedure 278
concepts 278
example 279
overview 277
syntax 278
transporting data sets 278
versus COPY statement (DATASETS) 333
COPY statement
CATALOG procedure 159
DATASETS procedure 327
TRANSPOSE procedure
copying data libraries
entire data library 331
copying data sets
between hosts 279
long variable names 332
copying files 327
COPY statement vs. COPY procedure 333
D
DANISH option
PROC SORT statement 1007
DATA= argument
PROC EXPORT statement 404
DATA COLUMNS window
REPORT procedure 917
data components
definition 40
Data Control Blocks (DCBs) 221, 294
data libraries
copying entire library 331
copying files 327
deleting files 334
exchanging filenames 338
importing 222
printing directories of 275, 323
processing all data sets in 30
renaming files 322
saving files from deletion 355
USER data library 17
DATA= option
APPEND statement (DATASETS) 314
CONTENTS statement (DATASETS) 324
DECSEP= option
PICTURE statement (FORMAT) 439
DEFAULT= option
FORMAT procedure 451
RADIOBOX statement (PMENU) 675
DEFINE option
PROC OPTIONS statement 595
DEFINE statement
REPORT procedure 897
DEFINITION window
REPORT procedure 918
DELETE option
PROC PRTDEF statement 790
DELETE statement
CATALOG procedure 160
DATASETS procedure 334
SQL procedure 1052
delimited files
exporting 408, 411
importing 506, 514
DELIMITER= statement
EXPORT procedure 408
IMPORT procedure 507
denominator definitions 1269
density function
DESC option
PROC PMENU statement 667
DESCENDING option
BY statement 58
BY statement (CALENDAR) 92
BY statement (CHART) 188
BY statement (COMPARE) 237
BY statement (MEANS) 535
BY statement (PLOT) 612
BY statement (PRINT) 716
BY statement (RANK) 819
BY statement (REPORT) 889
BY statement (SORT) 1012
BY statement (STANDARD) 1168
BY statement (TABULATE) 1196
BY statement (TIMEPLOT) 1291
BY statement (TRANSPOSE)
CHART procedure 193
CLASS statement (MEANS) 536
CLASS statement (TABULATE) 1197
DEFINE statement (REPORT) 901
ID statement (COMPARE) 238
PROC RANK statement 816
DESCENDTYPES option
PROC MEANS statement 529
DESCRIBE statement
SQL procedure 1053
DESCRIPTION= argument
MODIFY statement (CATALOG) 163
descriptive statistics 558, 1177
computing with class variables 560
keywords and formulas
table of 31
destination-independent input 43
detail reports 847
detail rows 847
DETAILS option
CONTENTS statement (DATASETS) 325
PROC DATASETS statement 309
deviation from the mean
device drivers
system fonts 425
E
EBCDIC option
PROC SORT statement 1007
EBCDIC order 1007, 1014
EET= option
PROC CIMPORT statement 217
PROC CPORT statement 289
efficiency
statistical procedures 7
elementary statistics procedures
embedded LIBNAME statements 1051
embedded SQL 1125
encoded passwords 807, 809
in SAS programs 808, 809
saving to paste buffer 811
encoding
versus encryption 808
encryption
versus encoding 808
ENDCOMP statement
REPORT procedure 906
ENTRYTYPE= option
CATALOG procedure 167
CHANGE statement (CATALOG) 157
COPY statement (CATALOG) 159
DELETE statement (CATALOG) 161
EXCHANGE statement (CATALOG) 161
EXCLUDE statement (CATALOG) 162
EXCLUDE statement (CIMPORT) 220
EXCLUDE statement (CPORT) 292
MODIFY statement (CATALOG) 164
PROC CATALOG statement 156
SAVE statement (CATALOG) 164
SELECT statement (CATALOG) 165
SELECT statement (CIMPORT) 221
SELECT statement (CPORT) 293
EQUALS option
PROC SORT statement 1008
equijoins 1083
error checking
formats and 30
error handling
CATALOG procedure 166
ERROR option
PROC COMPARE statement 232
error processing
of BY-group specifications 24
ERRORSTOP option
PROC SQL statement 1035
estimates
ET= option
PROC CIMPORT statement 218
PROC CPORT statement 289
ETYPE= option
SELECT statement (CPORT) 293
event logging 320
Excel
exporting spreadsheets 407, 416
exporting subset of observations to 414
exporting to a specific spreadsheet 415
importing spreadsheet from workbook 517, 521
importing spreadsheets 505
importing subset of records from 518
loading spreadsheet into workbook 408
EXCEPT operator 1098
EXCHANGE statement
CATALOG procedure 161
DATASETS procedure 338
EXCLNPWGT option
PROC REPORT statement 874
PROC STANDARD statement 1167
EXCLNPWGTS option
PROC MEANS statement 529
PROC TABULATE statement 1189
EXCLUDE statement
CATALOG procedure 162
CIMPORT procedure 219
CPORT procedure 292
DATASETS procedure 339
FORMAT procedure 434
PRTEXP procedure 804
exclusion lists 52
destinations for output objects 53
EXCLUSIVE option
CLASS statement (MEANS) 536
CLASS statement (TABULATE) 1197
DEFINE statement (REPORT) 901
PROC MEANS statement 529
PROC TABULATE statement 1189
EXEC option
PROC SQL statement 1035
EXECUTE statement
SQL procedure 1055
EXISTS condition 1080
expected value
EXPLODE 403
EXPLODE procedure 403
EXPLORE window
REPORT procedure 924
EXPORT 404
EXPORT= option
PROC REGISTRY statement 834
EXPORT procedure 404
data source statements 408
DBMS specifications 405
DBMS table statements 409
examples 411
overview 403
syntax 404
exporting
catalog entries 296, 298
CPORT procedure 285
excluding files or entries 292
multiple catalogs 295
printer definitions 790
registry contents 834, 840
selecting files or entries 293
exporting data 403
client/server model 410
DBMS tables 409
delimited files 408, 411
Microsoft Access 407, 410, 415
spreadsheet compatibility 407
spreadsheets 408
EXTENDSN= option
PROC CIMPORT statement 218
external files
browsing 489
comparing registry with 841
routing output or log to 776
extreme values 580, 583
F
FEEDBACK option
PROC SQL statement 1035
FILE= option
CONTENTS statement (CATALOG) 158
PROC CIMPORT statement 218
PROC CPORT statement 289
file transport process 286
FILEREF= argument
PROC FSLIST statement 490
files
aging 312
converting 215, 285
copying 277, 327
deleting 334
exchanging names 338
excluding from copying 339
modifying attributes 348
moving 331
renaming 322
renaming groups of 312
saving from deletion 355, 380
selecting for copying 356
FILL option
PROC CALENDAR statement 86
PICTURE statement (FORMAT) 440
FIN statement
CALENDAR procedure 94
FINNISH option
PROC SORT statement 1007
floating point exception (FPE) recovery 1195
FLOW option
DEFINE statement (REPORT) 901
PROC SQL statement 1035
FMTLEN option
CONTENTS statement (DATASETS) 325
FMTLIB option
PROC FORMAT statement 433, 434, 448
font files
adding 425, 426
searching directories for 421
specifying 421
TrueType 422, 427
Type 1 422
FONTFILE statement
FONTREG procedure 421
FONTPATH statement
FONTREG procedure 421
FONTREG 420
FONTREG procedure 420
concepts 423
examples 425
font naming conventions 423
overview 419
removing fonts from registry 424
SAS/GRAPH device drivers 425
supported font types 423
syntax 420
fonts
naming conventions 423
removing from registry 424
FORCE option
APPEND statement (DATASETS) 314
COPY statement (DATASETS) 330
PROC CATALOG statement 156, 176
PROC CIMPORT statement 218
FW= option
PROC MEANS statement 529
G
G100 option
CHART procedure 194
Gaussian distribution
generation data sets
DATASETS procedure and 362
generation groups
appending with 318
changing number of 352
copying 333
deleting 335
removing passwords 352
GENERATION option
PROC CPORT statement 289
GENMAX= option
MODIFY statement (DATASETS) 349
GENNUM= data set option 318
GENNUM= option
AUDIT statement (DATASETS) 320
CHANGE statement (DATASETS) 322
DELETE statement (DATASETS) 334
MODIFY statement (DATASETS) 349
PROC DATASETS statement 310
REPAIR statement (DATASETS) 353
GETDELETED= statement
IMPORT procedure 508
GETNAMES= statement
IMPORT procedure 508
Ghostview printer definition 797
global statements 18
GRAY option
ITEM statement (PMENU) 672
grayed items 672
GROUP BY clause
SQL procedure 1065
GROUP option
DEFINE statement (REPORT) 902
CHART procedure 194
PROC OPTIONS statement 595
group variables 854, 902
grouping formatted data 27
GROUPINTERNAL option
CLASS statement (MEANS) 537
CLASS statement (TABULATE) 1198
groups
creating with formats 986
GROUPS= option
PROC RANK statement 817
GSPACE= option
CHART procedure 194
GUESSINGROWS= statement
IMPORT procedure 508
H
HAVING clause
SQL procedure 1067
HAXIS= option
PLOT statement (PLOT) 616
HBAR statement
CHART procedure 189
HEADER= option
PROC CALENDAR statement 89
HEADING= option
PROC PRINT statement 708
HEADLINE option
PROC REPORT statement 876
HEADSKIP option
PROC REPORT statement 876
HELP= option
ITEM statement (PMENU) 672
PROC REPORT statement 877
HEXPAND option
PLOT statement (PLOT) 618
hidden label characters 628
hidden observations 630
HILOC option
PLOT statement (TIMEPLOT) 1296
HOLIDATA= option
PROC CALENDAR statement 89
holidays data set 89, 108
multiple calendars 105, 106
HOLIDUR statement
CALENDAR procedure 95
HOLIFIN statement
CALENDAR procedure 96
HOLISTART statement
CALENDAR procedure 96
HOLIVAR statement
CALENDAR procedure 97
horizontal bar charts 180, 189
for subset of data 209
horizontal separators 1251
HOST option
PROC OPTIONS statement 595
host-specific procedures
HPERCENT= option
PROC PLOT statement 610
HPOS= option
PLOT statement (PLOT) 618
HREF= option
PLOT statement (PLOT) 618
HREFCHAR= option
PLOT statement (PLOT) 618
HREVERSE option
PLOT statement (PLOT) 618
HSCROLL= option
PROC FSLIST statement 492
HSPACE= option
PLOT statement (PLOT) 618
HTML destination 45
HTML files
style elements 989
TABULATE procedure 1279
HTML output
sample 35
HTML reports 725
HTML version setting 50
hypotheses
keywords and formulas
testing
HZERO option
PLOT statement (PLOT) 618
I
IC CREATE statement
DATASETS procedure 340
IC DELETE statement
DATASETS procedure 343
IC REACTIVATE statement
DATASETS procedure 343
ID option
DEFINE statement (REPORT) 902
ITEM statement (PMENU) 672
ID statement
COMPARE procedure 238
MEANS procedure 540
PRINT procedure 717
TIMEPLOT procedure 1293
TRANSPOSE procedure
ID variables 902
COMPARE procedure 238
IDLABEL statement
TRANSPOSE procedure
IDMIN option
PROC MEANS statement 530
IMPORT 502
IMPORT= option
PROC REGISTRY statement 834
IMPORT procedure 502
data source statements 506
DBMS specifications 504
examples 514
overview 501
syntax 502
IMPORT statement
DBMS table statements 511
importing
catalog entries 223
CIMPORT procedure 215
data libraries 222
excluding files or entries 219
indexed data sets 224
selecting files or entries 221
to registry 834, 839
importing data 501
DBMS tables 511
delimited files 506, 514
Excel spreadsheets 505
Microsoft Access 505, 513, 519
PC files 506
spreadsheet from Excel workbook 517, 521
spreadsheets 506
subset of records from Excel 518
IN= argument
PROC PWENCODE statement 808
IN condition 1081
in-line views 1064, 1123
querying 1148
IN= option
COPY statement (CATALOG) 159
COPY statement (DATASETS) 330
INDENT= option
TABLE statement (TABULATE) 1205
indenting row headings 1251
INDEX CENTILES statement
DATASETS procedure 344
INDEX CREATE statement
DATASETS procedure 345
INVALUE statement
FORMAT procedure 435
IS condition 1081
ITEM statement
PMENU procedure 671
ITEMHELP= option
DEFINE statement (REPORT) 902
J
joined-table component 1082
JOINREF option
PLOT statement (TIMEPLOT) 1296
joins
cross joins 1087
definition of 1083
equijoins 1083
inner joins 1084
joining a table with itself 1084
joining more than two tables 1089
joining tables 1083
joining three tables 1145
joining two tables 1131
natural joins 1088
outer joins 1086, 1123, 1138
reflexive joins 1084
rows to be returned 1083
subqueries compared with 1091
table limit 1083
types of 1083
union joins 1088
JUST option
INVALUE statement (FORMAT) 436
K
KEEPLEN option
OUTPUT statement (MEANS) 545
KEY= option
PROC OPTLOAD statement 602
PROC OPTSAVE statement 604
key sequences 671
KEYLABEL statement
TABULATE procedure 1202
keyword headings
style elements for 1202
KEYWORD statement
TABULATE procedure 1202
keywords
for statistics
KILL option
PROC CATALOG statement 156, 176
PROC DATASETS statement 310
kurtosis
KURTOSIS keyword
L
LABEL option
PROC PRINT statement 708
MODIFY statement (DATASETS) 350
ODS TRACE statement 52
PROC PRINTTO statement 773
LISTREG= option
PROC REGISTRY statement 835
LISTUSER option
PROC REGISTRY statement 835
LISTVAR option
PROC COMPARE statement 233
LOAD REPORT window
REPORT procedure 925
LOCALE option
PROC CALENDAR statement 90
log
COMPARE procedure results 244
default destinations 771
destinations for 771
displaying SQL definitions 1053
listing registry contents in 835
routing to catalog entries 779
routing to external files 776
routing to printer 775, 785
writing printer attributes to 805
writing registry contents to 834
LOG option
AUDIT statement (DATASETS) 320
PROC PRINTTO statement 773
logarithmic scale for plots 639
LONG option
PROC OPTIONS statement 596
LOOPS= option
PROC SQL statement 1036
LOWER function (SQL) 1093
LPI= option
PROC CHART statement 186
LS= option
PROC REPORT statement 877
M
macro return codes
COMPARE procedure 244
macro variables
set by SQL procedure 1119
macros
adjusting plot labels 658
markers 882, 1193
MARKUP destination 45
definition 40
markup languages 40
matching observations 226
matching patterns 1091, 1158
matching variables 226
MAX keyword
MAX= option
FORMAT procedure 436, 452
MAXDEC= option
PROC MEANS statement 530
PROC TIMEPLOT statement 1290
maximum value
MAXLABLEN= option
PROC FORMAT statement 434
MAXPRINT= option
PROC COMPARE statement 233
MAXSELEN= option
PROC FORMAT statement 434
mean
MEAN keyword
MEAN option
CHART procedure 194
PROC STANDARD statement 1167
MEAN statement
CALENDAR procedure 97
MEANS 526
MEANS procedure 526
class variables 550
column width for output 556
computational resources 552
computer resources 539
concepts 550
examples 558
missing values 539, 556, 578
N Obs statistic 556
output 524
output data set 557
output statistics 575, 577, 578, 580,
overview 524
results 556
statistic keywords 533, 541
statistical computations 553
syntax 526
task tables 526, 527
MEANTYPE= option
PROC CALENDAR statement 90
measures of location
measures of shape
measures of variability
median
MEDIAN keyword
MEMOSIZE= statement
IMPORT procedure 512
MEMTYPE= option
AGE statement (DATASETS) 312
CHANGE statement (DATASETS) 323
CONTENTS statement (DATASETS) 325
COPY statement (DATASETS) 330, 331
DELETE statement (DATASETS) 334
EXCHANGE statement (DATASETS) 338
EXCLUDE statement (CIMPORT) 220
EXCLUDE statement (CPORT) 292
EXCLUDE statement (DATASETS) 339
MODIFY statement (DATASETS) 350
PROC CIMPORT statement 218
PROC CPORT statement 290
PROC DATASETS statement 311
REPAIR statement (DATASETS) 354
SAVE statement (DATASETS) 355
SELECT statement (CIMPORT) 221
SELECT statement (CPORT) 293
SELECT statement (DATASETS) 356
menu bars 665
associating with FSEDIT sessions 684, 691
associating with FSEDIT window 687
defining items 673
for FSEDIT applications 682
items in 671
key sequences for 671
menu items 671
MENU statement
PMENU procedure 673
merging data
SQL procedure 1110
message characters 442
MESSAGE= option
IC CREATE statement (DATASETS) 342
MESSAGES window
REPORT procedure 926
METHOD= option
PROC COMPARE statement 233
PROC PWENCODE statement 808
Microsoft Access
exporting tables 415
exporting to database 407
importing tables 505, 519
security level for tables 513
security levels 410
MIDPOINTS= option
CHART procedure 194
MIGRATE 591
MIGRATE procedure 591
MIN keyword
MIN= option
FORMAT procedure 436, 452
minimum value
missing formats/informats 457
MISSING option
CHART procedure 195
CLASS statement (MEANS) 537
CLASS statement (TABULATE) 1198
DEFINE statement (REPORT) 902
PROC CALENDAR statement 90
PROC MEANS statement 530
PROC PLOT statement 611
PROC REPORT statement 878
PROC TABULATE statement 1191
missing values
CALENDAR procedure 111
charts 195, 197
for class variables 539
MEANS procedure 539, 556, 578
NMISS keyword
PLOT procedure 630, 652
RANK procedure 821
REPORT procedure 860, 977
SQL procedure 1081, 1160
STANDARD procedure 1170
TABULATE procedure 1200, 1222
TIMEPLOT procedure 1298
TRANSPOSE procedure
MISSTEXT= option
TABLE statement (TABULATE) 1205
MIXED= statement
IMPORT procedure 508
MLF option
CLASS statement (MEANS) 537
CLASS statement (TABULATE) 1198
MNEMONIC= option
ITEM statement (PMENU) 673
mode
MODE keyword
MODE= option
PROC FONTREG statement 420
MODIFY statement
CATALOG procedure 163
DATASETS procedure 348
moment statistics 553
MOVE option
COPY statement (CATALOG) 159
COPY statement (DATASETS) 330
moving files 331
MSGLEVEL= option
PROC FONTREG statement 420
MT= option
PROC CPORT statement 290
MTYPE= option
AGE statement (DATASETS) 312
EXCLUDE statement (CPORT) 292
SELECT statement (CPORT) 293
SELECT statement (DATASETS) 356
multi-threaded sorting 1013
multilabel formats 1242
MULTILABEL option
PICTURE statement (FORMAT) 440
VALUE statement (FORMAT) 449
multilabel value formats 567
multipage tables 1253
multiple-choice survey data 1260
multiple-response survey data 1255
MULTIPLIER= option
PICTURE statement (FORMAT) 441
N
N keyword
N Obs statistic 556
N option
PROC PRINT statement 709
NAME= option
PROC TRANSPOSE statement
NAMED option
PROC REPORT statement 878
naming data sets 16
NATIONAL option
PROC SORT statement 1007
natural joins 1088
NEDIT option
PROC CPORT statement 290
nested variables 1186
NEW option
COPY statement (CATALOG) 159
PROC CIMPORT statement 218
PROC PRINTTO statement 774
APPEND statement (DATASETS) 314
NMISS keyword
NOALIAS option
PROC REPORT statement 878
NOBORDER option
PROC FSLIST statement 492
NOBS keyword
NOBYLINE system option
BY statement (MEANS) with 535
BY statement (PRINT) with 716
NOCC option
FSLIST command 493
PROC FSLIST statement 491
NOCOMPRESS option
PROC CPORT statement 290
NOCONTINUED option
TABLE statement (TABULATE) 1206
NODATE option
PROC COMPARE statement 234
NODS option
CONTENTS statement (DATASETS) 326
NODUPKEY option
PROC SORT statement 1009
NODUPRECS option
PROC SORT statement 1009
NOEDIT option
COPY statement (CATALOG) 160
PICTURE statement (FORMAT) 441
PROC CIMPORT statement 218
PROC CPORT statement 290
NOEXEC option
PROC REPORT statement 878
NOHEADER option
CHART procedure 195
PROC REPORT statement 878
NOINHERIT option
OUTPUT statement (MEANS) 546
NOLEGEND option
CHART procedure 195
PROC PLOT statement 611
NOLIST option
PROC DATASETS statement 311
NOMISS option
INDEX CREATE statement (DATASETS) 345
PROC PLOT statement 611
NOMISSBASE option
PROC COMPARE statement 234
NOMISSCOMP option
PROC COMPARE statement 234
NOMISSING option
PROC COMPARE statement 234
NONE option
RBUTTON statement (PMENU) 676
noninteractive mode
printing from 867
NONOBS option
PROC MEANS statement 530
NOOBS option
PROC PRINT statement 709
NOPRINT option
CONTENTS statement (DATASETS) 326
DEFINE statement (REPORT) 902
PROC COMPARE statement 234
PROC SUMMARY statement 1178
NOREPLACE option
PROC FORMAT statement 434
normal distribution
NORMAL= option
PROC RANK statement 817
NORWEGIAN option
PROC SORT statement 1007
NOSEPS option
PROC TABULATE statement 1191
NOSOURCE option
COPY statement (CATALOG) 160
NOSRC option
PROC CIMPORT statement 219
PROC CPORT statement 291
NOSTATS option
CHART procedure 195
NOSUMMARY option
PROC COMPARE statement 234
NOSYMBOL option
CHART procedure 195
NOSYMNAME option
PLOT statement (TIMEPLOT) 1296
NOTE option
PROC COMPARE statement 234
NOTRAP option
PROC MEANS statement 530
NOTSORTED option
BY statement 58
BY statement (CALENDAR) 92
BY statement (CHART) 188
BY statement (COMPARE) 237
BY statement (MEANS) 535
BY statement (PLOT) 612
BY statement (PRINT) 716
BY statement (RANK) 819
BY statement (REPORT) 889
BY statement (STANDARD) 1168
BY statement (TABULATE) 1196
BY statement (TIMEPLOT) 1291
BY statement (TRANSPOSE)
FORMAT procedure 452
ID statement (COMPARE) 238
FORMAT procedure 436
NOUPDATE option
PROC FONTREG statement 421
NOVALUES option
PROC COMPARE statement 234
NOWARN option
PROC DATASETS statement 311
NOZERO option
DEFINE statement (REPORT) 903
NOZEROS option
CHART procedure 195
NPLUS1 option
PROC RANK statement 817
NPP option
PLOT statement (TIMEPLOT) 1296
NSRC option
PROC CPORT statement 291
null hypothesis
NUM option
PROC FSLIST statement 492
NUMBER option
PROC SQL statement 1036
numbers
template for printing 438
numeric values
converting raw character data to 470
summing 742
numeric variables
sorting orders for 1013
summing 737
NWAY option
PROC MEANS statement 530
O
OBS= option
PROC PRINT statement 709
observations
consolidating in reports 957
frequency of 61
grouping for reports 731
hidden 630
page layout 720
SQL procedure 1029
statistics for groups of 7
total number of
transposing variables into
weighting 549
observations, comparing
comparison summary 247, 252
OPTLOAD 602
OPTLOAD procedure 602
overview 601
syntax 602
task table 602
OPTSAVE 604
OPTSAVE procedure 604
overview 603
syntax 604
task table 604
ORDER BY clause
SQL procedure 1068, 1123
ORDER option
DEFINE statement (REPORT) 903
CLASS statement (MEANS) 537
CLASS statement (TABULATE) 1198
CONTENTS statement (DATASETS) 326
DEFINE statement (REPORT) 903
PROC MEANS statement 531
PROC TABULATE statement 1191, 1230
order variables 853, 903
orthogonal expressions 1123
OUT= argument
APPEND statement (DATASETS) 314
COPY statement (CATALOG) 159
COPY statement (DATASETS) 327
PROC IMPORT statement 503
OUT= option
CONTENTS statement (CATALOG) 158
CONTENTS statement (DATASETS) 326
OUTPUT statement (MEANS) 541
PROC COMPARE statement 235, 254, 270
PROC OPTSAVE statement 604
PROC PRTEXP statement 804
PROC PWENCODE statement 808
PROC RANK statement 817
PROC REPORT statement 878
PROC SORT statement 1010
PROC STANDARD statement 1167
PROC TABULATE statement 1192
PROC TRANSPOSE statement
OUT2= option
CONTENTS statement (DATASETS) 326
OUTALL option
PROC COMPARE statement 235
OUTBASE option
PROC COMPARE statement 235
OUTCOMP option
PROC COMPARE statement 235
OUTDIF option
PROC COMPARE statement 235
OUTDUR statement
CALENDAR procedure 98
outer joins 1086, 1123, 1138
OUTER UNION set operator 1095
OUTFILE= argument
PROC EXPORT statement 405
OUTFIN statement
CALENDAR procedure 99
OUTLIB= option
PROC CPORT statement 291
OUTNOEQUAL option
PROC COMPARE statement 235
OUTOBS= option
PROC SQL statement 1036
OUTPERCENT option
PROC COMPARE statement 235
P
P keywords
p-values
page dimension 1215
page dimension text 1186
page ejects 717
page layout 720
column headings 722
column width 723
customizing 761
observations 720
plots 1297
with many variables 754
page numbering 775
PAGE option
BREAK statement (REPORT) 887
DEFINE statement (REPORT) 903
PROC FORMAT statement 434
RBREAK statement (REPORT) 910
PAGEBY statement
PRINT procedure 717
panels
in reports 968
PANELS= option
PROC REPORT statement 880
parameters
partitioned data sets
multi-threaded sorting 1013
password-protected data sets
appending 316
copying files 332
transporting 294
passwords 351
assigning 351
changing 351
DATASETS procedure with 359
encoding 807, 809
integrity constraints and 327
removing 352
pattern matching 1091, 1158
PC files
importing 506
PCTLDEF= option
PROC MEANS statement 532
PROC REPORT statement 882
PROC TABULATE statement 1193
PCTN statistic 1217
denominator for 1217
PCTSUM statistic 1217
denominator for 1218
PDF files
style elements 989
TABULATE procedure 1279
PDF output
sample 37
PDF reports 729
peakedness
penalties 626
changing 627, 661
index values for 626
PENALTIES= option
PLOT statement (PLOT) 619
percent coefficient of variation
percent difference 244
PERCENT option
CHART procedure 195
PROC RANK statement 818
percentage bar charts 201
percentages
displaying with denominator definitions 1269
in reports 975
TABULATE procedure 1216, 1266, 1269
percentiles
keywords and formulas
permanent data sets 16
permanent formats/informats 456
accessing 457
retrieving 478
picture formats 438
creating 464
digit selectors 442
directives 442
filling 483
message characters 442
steps for building 443
picture-name formats 447
PICTURE statement
FORMAT procedure 438
pie charts 182, 189
PIE statement
CHART procedure 189
PLACEMENT= option
PLOT statement (PLOT) 619
PLOT 609
PLOT procedure 609
combinations of variables 615
computational resources 628
concepts 624
examples 631
generating data with program statements 625
hidden observations 630
labeling plot points 625
missing values 630, 652
ODS table names 629
overview 606
portability of ODS output 630
printed output 629
results 629
RUN groups 624
scale of axes 629
syntax 609
task tables 609, 613
variable lists in plot requests 615
PLOT statement
PLOT procedure 613
TIMEPLOT procedure 1293
plots
collision states 628
contour plots 615, 642
customizing axes
customizing plotting symbols
data on logarithmic scale 639
data values on axis 640
hidden label characters 628
horizontal axis 632
labels 649, 654, 658
multiple observations, on one line
multiple plots per page 636
overlaying 628, 634
page layout 1297
penalties 626
plotting a single variable
plotting BY groups 646
pointer symbols 625
reference lines 628, 632
specifying in TIMEPLOT 1293
superimposing
plotting symbols 631
customizing
variables for
PMENU 667
PMENU catalog entries
naming 673
steps for building and using 680
storing 667
PMENU command 665
PMENU procedure 667
concepts 679
ending 680
examples 682
execution of 679
initiating 679
overview 665
PMENU catalog entries 680
syntax 667
task tables 667, 671
templates for 681
pointer symbols 625
populations
PORT= statement
EXPORT procedure 410
IMPORT procedure 508, 512
POS= option
PLOT statement (TIMEPLOT) 1296
PostScript files 751
PostScript output
previewing 797
sample 35
power of statistical tests
PREFIX= option
PICTURE statement (FORMAT) 441
PROC TRANSPOSE statement
preloaded formats 904, 1199
class variables with 570, 1237
PRELOADFMT option
CLASS statement (MEANS) 538
CLASS statement (TABULATE) 1199
DEFINE statement (REPORT) 904
PRINT 706
PRINT option
PROC MEANS statement 531
PROC SQL statement 1036
PROC STANDARD statement 1167
PROC PRINTTO statement 774
PRINT procedure 706
examples 723
HTML reports 725
listing reports 724, 727
overview 703
page layout 720, 754, 761
PDF reports 729
PostScript files 751
procedure output 720
results 720
RTF reports 734
style definitions with 49
style elements 711
syntax 706
task tables 706, 707
XML files 740
PRINTALL option
PROC COMPARE statement 236
PRINTALLTYPES option
PROC MEANS statement 532
printer attributes
extracting from registry 803
writing to data sets 806
writing to log 805
printer definitions 789
adding 799
available to all users 798
creating 805
deleting 790, 799, 801
exporting 790
for Ghostview printer 797
in SASHELP library 790
modifying 799, 805
multiple 796
replicating 805
PRINTER destination 46
definition 40
printers
list of 790
routing log or output to 775, 785
PRINTIDVARS option
PROC MEANS statement 532
printing
See also printing reports
all data sets in library 767
data set contents 275
formatted values 25
grouping observations 731
informat/format descriptions 477
page ejects 717
page layout 720, 754, 761
selecting variables for 719, 723
template for printing numbers 438
printing reports 867
batch mode 867
from Output window 867
from REPORT window 867
interactive line mode 868
noninteractive mode 867
PRINTTO procedure 868
with forms 867
with ODS 867
PRINTMISS option
TABLE statement (TABULATE) 1206
PRINTTO 772
PRINTTO procedure 772
concepts 775
examples 776
overview 771
printing reports 868
syntax 772
task table 772
probability function
probability values
PROBT keyword
PROC CALENDAR statement 85
PROC CATALOG statement 155
options 156
PROC CHART statement 185
PROC CIMPORT statement 216
PROC COMPARE statement 230
PROC CONTENTS statement 276
PROC CPORT statement 286
PROC DATASETS statement 308
restricting member types 360
PROC DISPLAY statement 396
PROC EXPORT statement 404
PROC FONTREG statement 420
PROC FORMAT statement 432
PROC FSLIST statement 490
PROC IMPORT statement 502
PROC MEANS statement 527
PROC OPTIONS statement 595
PROC OPTLOAD statement 602
PROC OPTSAVE statement 604
PROC PLOT statement 609
PROC PMENU statement 667
PROC PRINT statement 707
PROC PRINTTO statement 772
PROC PRTDEF statement 790
PROC PRTEXP statement 804
PROC PWENCODE statement 807
PROC RANK statement 816
PROC REGISTRY statement 832
PROC REPORT statement 870
PROC SORT statement 1005
PROC SQL statement 1034
procedure output
as input file 782
default destinations 771
destinations for 771
page numbering 775
routing to catalog entries 779
routing to external files 776
routing to printer 775, 785
procedures
descriptions of 10
ending 63
functional categories 3
host-specific
raw data for examples
report-writing procedures 3, 4
statistical procedures 3, 6
style definitions with 48
utility procedures 4, 7
PROFILE= option
PROC REPORT statement 880
PROFILE window
REPORT procedure 926
project management 83
PROMPT option
PROC REPORT statement 881
PROC SQL statement 1037
PROMPTER window
REPORT procedure 927
PROTO 789
PROTO procedure 789
PRTDEF 789
PRTDEF procedure 789
examples 796
input data set 791
optional variables 793
overview 789
required variables 792
syntax 789
task table 790
valid variables 791
PRTEXP 803
PRTEXP procedure 803
concepts 805
examples 805
overview 803
syntax 803
PS= option
PROC REPORT statement 881
PSPACE= option
PROC REPORT statement 881
pull-down menus 665
activating 665
associating FRAME applications with 700
defining 674
for DATA step window applications 694
grayed items 672
items in 671
key sequences for 671
separator lines 678
submenus 678
PUT statement
compared with LINE statement (REPORT) 908
PW= option
MODIFY statement (DATASETS) 350
PROC DATASETS statement 311
PWD= statement
EXPORT procedure 409
IMPORT procedure 512
PWENCODE 807
PWENCODE procedure 807
concepts 808
encoding vs. encryption 808
examples 809
overview 807
syntax 807
Q
Q keywords
QMARKERS= option
PROC MEANS statement 532
PROC REPORT statement 882
PROC TABULATE statement 1193
QMETHOD= option
PROC MEANS statement 532
PROC REPORT statement 882
PROC TABULATE statement 1193
QNTLDEF= option
PROC MEANS statement 532
PROC REPORT statement 882
PROC TABULATE statement 1193
QRANGE keyword
quantiles 882, 1193
efficiency issues 7
MEANS procedure 555
queries
creating tables from results 1127
creating views from results 1143
DBMS queries 1079
in-line view queries 1148
query-expression component 1093
query expressions 1094
creating PROC SQL tables from 1049
creating PROC SQL views from 1049
subqueries 1102
validating syntax 1070
QUIT statement 63
procedures supporting 63
R
radio boxes 670, 675
radio buttons 670, 676
color of 676
default 675
RADIOBOX statement
PMENU procedure 675
range
RANGE keyword
RANGE= statement
IMPORT procedure 508
ranges
for character strings 480
FORMAT procedure 453
RANK 815
RANK procedure 815
computer resources 820
concepts 820
examples 822
S
S= option
PLOT statement (PLOT) 622
samples
sampling distribution
SAS/ACCESS views
SQL procedure 1029
updating 1121
SAS/AF applications
executing 395, 396
SAS data views
DICTIONARY tables 1116
SQL procedure 1029
SAS Explorer window
list of available styles 48
SAS formatted destinations 43, 44
SAS/GRAPH device drivers
system fonts 425
SASHELP views 1116
retrieving information about 1117
SASUSER library
Ghostview printer definition in 797
SAVAGE option
PROC RANK statement 818
SAVE DATA SET window
REPORT procedure 933
SAVE DEFINITION window
REPORT procedure 933
SAVE statement
CATALOG procedure 164
DATASETS procedure 355
SCANMEMO= statement
IMPORT procedure 512
SCANTEXT= statement
IMPORT procedure 509
SCANTIME= statement
IMPORT procedure 509, 512
schedule calendars 79, 102
advanced 80
simple 79
scheduling 83
searching for patterns 1091, 1158
SELECT clause
SQL procedure 1058
SELECT statement
CATALOG procedure 165
CIMPORT procedure 221
CPORT procedure 293
DATASETS procedure 356
FORMAT procedure 448
PRTEXP procedure 804
SQL procedure 1058
selection lists 52
destinations for output objects 53
SELECTION statement
PMENU procedure 677
separator lines 678
SEPARATOR statement
PMENU procedure 678
SERVER= statement
EXPORT procedure 410
IMPORT procedure 509, 513
SERVICE= statement
EXPORT procedure 410
IMPORT procedure 509, 513
set membership 1081
set operators 1094, 1123
SET statement
appending data 315
SHEET= statement
EXPORT procedure 408
IMPORT procedure 509
SHORT option
CONTENTS statement (DATASETS) 326
PROC OPTIONS statement 596
SHOWALL option
PROC REPORT statement 882
shrinking reports 866
significance
simple indexes 1044
simple random sample
skewness
SKEWNESS keyword
SKIP option
BREAK statement (REPORT) 887
RBREAK statement (REPORT) 910
SLIST= option
PLOT statement (PLOT) 623
SORT 1005
sort order
ASCII 1007, 1014
EBCDIC 1007, 1014
for character variables 1014
for numeric variables 1013
SORT procedure 1005
character variable sorting orders 1014
collating-sequence options 1007
concepts 1013
DBMS data source 1013
examples 1016
integrity constraints 1015
multi-threaded sorting 1013
numeric variable sorting orders 1013
output 1016
output data set 1016
overview 1003
results 1016
sorting data sets 1004
stored sort information 1015
syntax 1005
task tables 1005, 1016
SORTEDBY= option
MODIFY statement (DATASETS) 350
sorting
by multiple variable values 1017
collating sequence 1007
data retrieved by views 1050
data sets 1004
in descending order 1019
maintaining relative order of observations 1020
multi-threaded 1013
retaining first observation of BY groups 1023
stored sort information 1015
SORTMSG option
PROC SQL statement 1037
SORTSEQ= option
PROC SORT statement 1007
PROC SQL statement 1037
SORTSIZE= option
PROC SORT statement 1011
SOUNDS-LIKE operator 1150
SOURCE window
REPORT procedure 934
SPACE= option
CHART procedure 195
SPACING= option
DEFINE statement (REPORT) 904
PROC REPORT statement 883
SPLIT= option
PLOT statement (PLOT) 623
PROC PRINT statement 710
PROC REPORT statement 883
PROC TIMEPLOT statement 1290
spread of values
spreadsheets
exporting 407, 416
exporting subset of observations to 414
exporting to specific spreadsheet 415
importing 505, 506
importing from Excel workbook 517, 521
importing subset of records from 518
SQL 1031
SQL, embedded 1125
SQL components 1071
BETWEEN condition 1071
overview 1163
results 1170
standardizing data 1163
statistical computations 1171
syntax 1165
task tables 1165, 1166
standardizing data 1163
order of variables 1169
specifying variables 1169
weights for analysis variables 1170
star charts 183, 190
STAR statement
CHART procedure 190
START statement
CALENDAR procedure 100
STARTAT= option
PROC REGISTRY statement 835
STATE= option
ITEM statement (PMENU) 673
statements with same function in multiple procedures 57
ATTRIB 57
BY 58
FORMAT 57
FREQ 61
LABEL 57
QUIT 63
WEIGHT 64
WHERE 69
STATES option
PLOT statement (PLOT) 623
statistic, defined
statistic option
DEFINE statement (REPORT) 904
statistical analysis
transposing data for
statistical procedures 3, 6
efficiency issues 7
quantiles 7
statistical summaries 1107
statistically significant
statistics
based on number of arguments 1109
computational requirements for 32
descriptive statistics 1177
for groups of observations 7
formulas for
in reports 954
keywords for
measures of location
measures of shape
measures of variability
normal distribution
percentiles
populations
REPORT procedure 857
samples
sampling distribution
summarization procedures
table of descriptive statistics 31
TABULATE procedure 1213
testing hypotheses
weighted statistics 64
weights
statistics procedures
STATISTICS window
REPORT procedure 934
STATS option
PROC COMPARE statement 236
STD keyword
STD= option
PROC STANDARD statement 1167
STDDEV keyword
STDERR keyword
STDMEAN keyword
STIMER option
PROC SQL statement 1037
string comparison operators
truncated 1102
stub-and-banner reports 1269
Student's t distribution
Student's t statistic
two-tailed p-value
Student's t test 554
STYLE= attribute
CALL DEFINE statement (REPORT) 892
style attributes 46
applying to table cells 1221
assigning with formats 1221
definition 48
style definitions
definition of 47
procedures with 48
SAS-supplied 48
style elements
class variable level value headings 1201
definition 47
for keyword headings 1202
for ODS output 989, 994
in dimension expressions 1210
PRINT procedure 711
REPORT procedure 863, 989, 994
TABULATE procedure 1194, 1220, 1279
STYLE= option
BREAK statement (REPORT) 887
CLASS statement (TABULATE) 1200
CLASSLEV statement (TABULATE) 1201
COMPUTE statement (REPORT) 896
DEFINE statement (REPORT) 904
ID statement (PRINT) 717
KEYWORD statement (TABULATE) 1202
PROC PRINT statement 711
PROC REPORT statement 883
PROC TABULATE statement 1194
RBREAK statement (REPORT) 910
REPORT procedure 863, 864
SUM statement (PRINT) 718
TABLE statement (TABULATE) 1207
TABULATE procedure 1220
VAR statement (PRINT) 720
VAR statement (TABULATE) 1211
SUBGROUP= option
CHART procedure 196
SUBMENU statement
PMENU procedure 678
submenus 678
subqueries 1102
compared with joins 1091
correlated 1104
efficiency and 1105
returning rows 1080
subsetting data
SQL procedure 1065, 1067
WHERE statement 69
SUBSTITUTE= option
CHECKBOX statement (PMENU) 668
RBUTTON statement (PMENU) 676
SUBSTRING function (SQL) 1106
subtables 1186
SUM keyword
sum of squares
corrected
uncorrected
sum of the weights
SUM option
CHART procedure 196
SUM statement
CALENDAR procedure 100
PRINT procedure 718, 737, 742
SUMBY statement
PRINT procedure 719
summarization procedures
data requirements
SUMMARIZE option
BREAK statement (REPORT) 887
RBREAK statement (REPORT) 910
summarizing data
SQL procedure 1108
SUMMARY 1178
summary calendars 79, 102
multiple activities per day 107
simple 81
summary-function component 1107
summary lines 847
construction of 937
SUMMARY procedure 1178
overview 1177
syntax 1178
summary reports 847
summary statistics
COMPARE procedure 250, 273
SUMSIZE= option
PROC MEANS statement 533
SUMVAR= option
CHART procedure 196
SUMWGT keyword
superimposing plots
SUPPRESS option
BREAK statement (REPORT) 888
survey data
multiple-choice 1260
multiple-response 1255
SUSPEND option
AUDIT statement (DATASETS) 321
SWEDISH option
PROC SORT statement 1007
SYMBOL= option
CHART procedure 196
symbol variables
TIMEPLOT procedure 1292
SYSINFO macro variable 244
system fonts 419
SAS/GRAPH device drivers for 425
system options
display setting for single option 598
display settings for a group 593
list of current settings 591
loading from registry or data sets 601
OPTIONS procedure 591
procedures and 17
saving current settings 603
T
T keyword
table aliases 1064, 1083
TABLE= argument
PROC IMPORT statement 503
table attributes
definition 47
table definitions 1053
definition of 40, 47
table elements
definition 47
table-expression component 1113
table expressions 1094
TABLE statement
TABULATE procedure 1203
tables
See also PROC SQL tables
applying style attributes to cells 1221
cells with missing values 1228
class variable combinations 1235
crosstabulation 1269
customizing headings 1244
describing for printing 1203
formatting values in 1215
multipage 1253
subtables 1186
two-dimensional 1232
TABULATE 1187
TABULATE procedure 1187
BY-group processing 1215
complex tables 1181
concepts 1213
dimension expressions 1208
examples 1232
formatting characters 1189
formatting class variables 1214
formatting values in tables 1215
headings 1227, 1229, 1230
missing values 1200, 1222
ODS and 1182
overview 1180
page dimension 1215
percentages 1216
portability of ODS output 1231
results 1222
simple tables 1180
statistics 1213
style definitions with 49
style elements 1194, 1220
syntax 1187
task tables 1187, 1203
terminology 1183
tagsets 45
list of 41
TAGSORT option
PROC SORT statement 1011
TAPE option
PROC CIMPORT statement 219
PROC CPORT statement 291
TEMPLATE procedure
list of available styles 48
templates
for printing numbers 438
simple transposition
syntax
task table
transposing BY groups
variable names, from numeric values
transposed variables
attributes of
labeling
naming
TRANTAB statement
CPORT procedure 294
TRAP option
PROC TABULATE statement 1195
TrueType font files
replacing from a directory 427
searching directories for 422
TRUETYPE statement
FONTREG procedure 422
truncated string comparison operators 1102
two-dimensional tables 1232
two-tailed tests
Type 1 font files 422
Type I error rate
Type II error rate
TYPE= option
CHART procedure 196
MODIFY statement (DATASETS) 351
TYPE1 statement
FONTREG procedure 422
TYPES statement
MEANS procedure 546
U
UCLM keyword
UID= statement
EXPORT procedure 409
IMPORT procedure 513
UL option
BREAK statement (REPORT) 888
RBREAK statement (REPORT) 911
uncorrected sum of squares
underlining 886, 888, 910, 911
UNDO_POLICY= option
PROC SQL statement 1037
unformatted values
comparing 244
UNIFORM option
PROC PLOT statement 611
PROC TIMEPLOT statement 1290
UNINSTALL= option
PROC REGISTRY statement 835
union joins 1088
UNION operator 1097
UNIQUE keyword 1044
UNIQUE option
CREATE INDEX statement
(DATASETS) 345
UNIT= argument
PROC FSLIST statement 491
UNIT= option
PROC PRINTTO statement 775
universe
unsorted data
comparing 238
UPCASE option
INVALUE statement (FORMAT) 436
PROC REGISTRY statement 835
UPDATE statement
SQL procedure 1069
UPDATECENTILES= option
CREATE INDEX statement
(DATASETS) 346
INDEX CENTILES statement
(DATASETS) 344
UPPER function (SQL) 1114
USEDATE= statement
IMPORT procedure 510, 513
USER data library 17
user input
collecting in dialog boxes 685
USER literal 1100
USER_VAR option
AUDIT statement (DATASETS) 321
USESASHELP option
PROC FONTREG statement 421
PROC PRTDEF statement 790
PROC PRTEXP statement 804
PROC REGISTRY statement 836
USS keyword
utility procedures 4, 7
V
VALIDATE statement
SQL procedure 1070
VALUE option
PROC OPTIONS statement 596
value-range-sets 453
VALUE statement
FORMAT procedure 448
VAR keyword
VAR statement
CALENDAR procedure 101
COMPARE procedure 239
MEANS procedure 548
PRINT procedure 719
RANK procedure 820
STANDARD procedure 1169
SUMMARY procedure 1179
TABULATE procedure 1211
TRANSPOSE procedure
VARDEF= option
PROC MEANS statement 534
PROC REPORT statement 883
PROC STANDARD statement 1167
PROC TABULATE statement 1195
variability
variable formats
COMPARE procedure 244
variable names
shortcut notations for 24
variables
across variables 854, 899
analysis variables 855, 899
associating formats/informats with 430, 455
attributes of 348
CHART procedure 197
class variables 1197
computed variables 855, 901, 983
copying without transposing
VZERO option
PLOT statement (PLOT) 624
W
WARNING option
PROC COMPARE statement 236
WAYS option
OUTPUT statement (MEANS) 546
WAYS statement
MEANS procedure 549
WBUILD macro 693
WEEKDAYS option
PROC CALENDAR statement 90
WEIGHT= option
DEFINE statement (REPORT) 905
VAR statement (MEANS) 548
VAR statement (TABULATE) 1212
WEIGHT statement 64
calculating weighted statistics 64
example 65
MEANS procedure 549
procedures supporting 64
REPORT procedure 912
STANDARD procedure 1170
TABULATE procedure 1212
weight values 874, 1189
weighted statistics 64
weighting observations 549
weights
analysis variables 64
WGDB= statement
EXPORT procedure 409
IMPORT procedure 513
WHERE ALSO window
REPORT procedure 936
WHERE clause
SQL procedure 1065
WHERE statement 69
example 69
procedures supporting 69
WHERE window
REPORT procedure 935
WIDTH= option
CHART procedure 197
DEFINE statement (REPORT) 905
PROC PRINT statement 714
window applications
menus for 694
windows
associating with menus 697
WINDOWS option
PROC REPORT statement 884
WITH statement
COMPARE procedure 240
WORKDATA= option
PROC CALENDAR statement 91
workdays data set 91, 110
default workshifts instead of 110
missing values 111
workshifts 110
WRAP option
PROC REPORT statement 884
WRITE= option
MODIFY statement (DATASETS) 351
X
XML files
XML output
sample 38
740
Your Turn
If you have comments or suggestions about Base SAS 9.1 Procedures Guide, please
send them to us on a photocopy of this page, or send us electronic mail.
For comments about this book, please return the photocopy to
SAS Publishing
SAS Campus Drive
Cary, NC 27513
email: [email protected]
For suggestions about the software, please return the photocopy to
SAS Institute Inc.
Technical Support Division
SAS Campus Drive
Cary, NC 27513
email: [email protected]
Volume 3
CORR, FREQ, and UNIVARIATE Procedures
Contents
Chapter 1. The CORR Procedure

Chapter 1
GETTING STARTED
SYNTAX . . . 6
   PROC CORR Statement . . . 7
   BY Statement . . . 12
   FREQ Statement . . . 13
   PARTIAL Statement . . . 13
   VAR Statement . . . 13
   WEIGHT Statement . . . 13
   WITH Statement . . . 14
DETAILS . . . 14
   Pearson Product-Moment Correlation . . . 14
   Spearman Rank-Order Correlation . . . 16
   Kendall's Tau-b Correlation Coefficient . . . 16
   Hoeffding Dependence Coefficient . . . 18
   Partial Correlation . . . 18
   Fisher's z Transformation . . . 21
   Cronbach's Coefficient Alpha . . . 24
   Missing Values . . . 26
   Output Tables . . . 26
   Output Data Sets . . . 27
   Determining Computer Resources . . . 28
   ODS Table Names . . . 30
   ODS Graphics (Experimental) . . . 31
EXAMPLES . . . 34
   Example 1.1. Computing Four Measures of Association . . . 34
   Example 1.2. Computing Correlations between Two Sets of Variables . . . 38
   Example 1.3. Analysis Using Fisher's z Transformation . . . 42
   Example 1.4. Applications of Fisher's z Transformation . . . 44
   Example 1.5. Computing Cronbach's Coefficient Alpha . . . 48
   Example 1.6. Saving Correlations in an Output Data Set . . . 52
   Example 1.7. Creating Scatter Plots . . . 53
   Example 1.8. Computing Partial Correlations . . . 58
Chapter 1
Getting Started
The following statements create the data set Fitness, which has been altered to contain some missing values:
*----------------- Data on Physical Fitness -----------------*
| These measurements were made on men involved in a physical |
| fitness course at N.C. State University.                   |
| The variables are Age (years), Weight (kg),                |
| Runtime (time to run 1.5 miles in minutes), and            |
| Oxygen (oxygen intake, ml per kg body weight per minute)   |
| Certain values were changed to missing for the analysis.   |
*------------------------------------------------------------*;
data Fitness;
   input Age Weight Oxygen RunTime @@;
   datalines;
44 89.47 44.609 11.37
40 75.07 45.313 10.07
44 85.84 54.297 8.65
42 68.15 59.571 8.17
38 89.02 49.874 .
47 77.45 44.811 11.63
40 75.98 45.681 11.95
43 81.19 49.091 10.85
44 81.42 39.442 13.08
38 81.87 60.055 8.63
44 73.03 50.541 10.13
45 87.66 37.388 14.03
45 66.45 44.754 11.12
47 79.15 47.273 10.60
54 83.12 51.855 10.33
49 81.42 49.156 8.95
51 69.63 40.836 10.95
51 77.91 46.672 10.00
48 91.63 46.774 10.25
49 73.37 . 10.08
57 73.37 39.407 12.63
54 79.38 46.080 11.17
52 76.32 45.441 9.63
50 70.87 54.625 8.92
51 67.25 45.118 11.08
54 91.63 39.203 12.88
51 73.71 45.790 10.47
57 59.08 50.545 9.93
49 76.32 . .
48 61.24 47.920 11.50
52 82.78 47.467 10.50
;
The following statements invoke the CORR procedure and request a correlation analysis:
ods html;
ods graphics on;
proc corr data=Fitness plots;
run;
ods graphics off;
ods html close;
Variables:    Age    Weight    Oxygen    RunTime

                              Simple Statistics

Variable      N        Mean     Std Dev         Sum     Minimum     Maximum
Age          31    47.67742     5.21144        1478    38.00000    57.00000
Weight       31    77.44452     8.32857        2401    59.08000    91.63000
Oxygen       29    47.22721     5.47718        1370    37.38800    60.05500
RunTime      29    10.67414     1.39194   309.55000     8.17000    14.03000
By default, all numeric variables not listed in other statements are used in the analysis. Observations with nonmissing values for each variable are used to derive the
univariate statistics for that variable.
              Pearson Correlation Coefficients
                 Prob > |r| under H0: Rho=0
                   Number of Observations

                 Age      Weight      Oxygen     RunTime
Age          1.00000    -0.23354    -0.31474     0.14478
                          0.2061      0.0963      0.4536
                  31          31          29          29

Weight      -0.23354     1.00000    -0.15358     0.20072
              0.2061                  0.4264      0.2965
                  31          31          29          29

Oxygen      -0.31474    -0.15358     1.00000    -0.86843
              0.0963      0.4264                  <.0001
                  29          29          29          28

RunTime      0.14478     0.20072    -0.86843     1.00000
              0.4536      0.2965      <.0001
                  29          29          28          29
By default, Pearson correlation statistics are computed from observations with nonmissing values for each pair of analysis variables. With missing values in the analysis,
the Pearson Correlation Coefficients table shown in Figure 1.2 displays the correlation, the p-value under the null hypothesis of zero correlation, and the number of
nonmissing observations for each pair of variables.
The table displays a correlation of -0.86843 between Runtime and Oxygen, which
is significant with a p-value less than 0.0001. That is, there exists an inverse linear relationship between these two variables. As Runtime (time to run 1.5 miles
in minutes) increases, Oxygen (oxygen intake, ml per kg body weight per minute)
decreases.
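The pairwise handling of missing values described above can be contrasted with listwise deletion. The following is a minimal sketch, assuming the Fitness data set created earlier in this section; it restricts the analysis to Oxygen and RunTime and adds the NOMISS option, so that only observations with nonmissing values for both variables are used.

```sas
/* Sketch: listwise deletion for the Oxygen-RunTime pair.        */
/* Assumes the Fitness data set defined earlier in this section. */
proc corr data=Fitness nomiss;
   var Oxygen RunTime;   /* analysis variables */
run;
```

Because the pairwise table already reports 28 observations for this pair, the NOMISS run should use those same 28 observations for the simple statistics as well as for the correlation.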
Syntax
The following statements are available in PROC CORR.
Tasks                                                        Options

Specify data sets
   Input data set                                            DATA=
   Output data set with Hoeffding's D statistics             OUTH=
   Output data set with Kendall correlation statistics       OUTK=
   Output data set with Pearson correlation statistics       OUTP=
   Output data set with Spearman correlation statistics      OUTS=

Control statistical analysis
   Exclude observations with nonpositive weight values
      from the analysis                                      EXCLNPWGT
   Exclude observations with missing analysis values
      from the analysis                                      NOMISS
   Request Hoeffding's measure of dependence, D              HOEFFDING
   Request Kendall's tau-b                                   KENDALL
   Request Pearson product-moment correlation                PEARSON
   Request Spearman rank-order correlation                   SPEARMAN
   Request Pearson correlation statistics using Fisher's
      z transformation                                       FISHER PEARSON
   Request Spearman rank-order correlation statistics
      using Fisher's z transformation                        FISHER SPEARMAN

Control Pearson correlation statistics
   Compute Cronbach's coefficient alpha                      ALPHA
   Compute covariances                                       COV
   Compute corrected sums of squares and crossproducts       CSSCP
   Compute correlation statistics based on Fisher's
      z transformation                                       FISHER
   Exclude missing values                                    NOMISS
   Specify singularity criterion                             SINGULAR=
   Compute sums of squares and crossproducts                 SSCP
   Specify the divisor for variance calculations             VARDEF=

Control printed output
   Display a specified number of ordered correlation
      coefficients                                           BEST=
   Suppress Pearson correlations                             NOCORR
   Suppress all printed output                               NOPRINT
   Suppress p-values                                         NOPROB
   Suppress descriptive statistics                           NOSIMPLE
   Display ordered correlation coefficients                  RANK
The following options (listed in alphabetical order) can be used in the PROC CORR
statement:
ALPHA
calculates and prints Cronbach's coefficient alpha. PROC CORR computes separate coefficients using raw and standardized values (scaling the variables to a unit
variance of 1). For each VAR statement variable, PROC CORR computes the correlation between the variable and the total of the remaining variables. It also computes
Cronbach's coefficient alpha using only the remaining variables.
If a WITH statement is specified, the ALPHA option is invalid. When you specify
the ALPHA option, the Pearson correlations will also be displayed. If you specify the
OUTP= option, the output data set also contains observations with Cronbach's coefficient alpha. If you use the PARTIAL statement, PROC CORR calculates Cronbach's
coefficient alpha for partialled variables. See the section "Partial Correlation" on
page 18.
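As an illustration of the ALPHA option, the following sketch requests Cronbach's coefficient alpha for three variables of the Fitness data set from the Getting Started section. The choice of variables is an assumption made only to show the syntax; these measurements are not items of a common scale, so the resulting coefficient has no substantive interpretation.

```sas
/* Sketch: request Cronbach's coefficient alpha.                  */
/* The variable list is illustrative only; ALPHA is normally     */
/* applied to items that are intended to measure one construct.  */
proc corr data=Fitness alpha nomiss;
   var Weight Oxygen RunTime;
run;
```

No WITH statement is used here, because the ALPHA option is invalid when a WITH statement is specified.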
BEST=n
displays the n highest correlation coefficients for each variable. Correlations are ordered from highest to lowest in absolute value.
COV
displays the variance and covariance matrix. When you specify the COV option, the
Pearson correlations will also be displayed. If you specify the OUTP= option, the
output data set also contains the covariance matrix with the corresponding _TYPE_
variable value COV. If you use the PARTIAL statement, PROC CORR computes a
partial covariance matrix.
CSSCP
displays a table of the corrected sums of squares and crossproducts. When you specify the CSSCP option, the Pearson correlations will also be displayed. If you specify
the OUTP= option, the output data set also contains a CSSCP matrix with the corresponding _TYPE_ variable value CSSCP. If you use a PARTIAL statement, PROC
CORR prints both an unpartial and a partial CSSCP matrix, and the output data set
contains a partial CSSCP matrix.
DATA=SAS-data-set
names the SAS data set to be analyzed by PROC CORR. By default, the procedure
uses the most recently created SAS data set.
EXCLNPWGT
excludes observations with nonpositive weight values from the analysis. By default,
PROC CORR treats observations with negative weights like those with zero weights
and counts them in the total number of observations.
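FISHER < ( fisher-options ) >
requests confidence limits and p-values for correlation coefficients based on Fisher's z transformation. The following fisher-options are available:
ALPHA=$\alpha$
specifies the level of the confidence limits for the correlation, $100(1-\alpha)\%$.
The value of the ALPHA= option must be between 0 and 1, and the default is
ALPHA=0.05.
BIASADJ= YES | NO
specifies whether the bias adjustment is used in constructing confidence limits. By default, BIASADJ=YES.
RHO0=$\rho_0$
specifies the value $\rho_0$ in the null hypothesis $H_0\colon \rho = \rho_0$, where $-1 < \rho_0 < 1$. By default, RHO0=0.
TYPE= LOWER | UPPER | TWOSIDED
specifies the type of confidence limits. The TYPE=LOWER option requests a lower confidence limit from the lower alternative $H_1\colon \rho < \rho_0$, the
TYPE=UPPER option requests an upper confidence limit from the upper alternative $H_1\colon \rho > \rho_0$, and the default TYPE=TWOSIDED option requests two-sided confidence limits from the two-sided alternative $H_1\colon \rho \ne \rho_0$.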
HOEFFDING
requests a table of Hoeffding's D statistics. This D statistic is 30 times larger than the
usual definition and scales the range between -0.5 and 1 so that large positive values
indicate dependence. The HOEFFDING option is invalid if a WEIGHT or PARTIAL
statement is used.
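KENDALL
requests a table of Kendall's tau-b coefficients. The KENDALL option is invalid if a WEIGHT statement is used.
NOCORR
suppresses displaying of Pearson correlations. If you specify the OUTP= option, the
data set type remains CORR. To change the data set type to COV, CSSCP, or SSCP,
use the TYPE= data set option.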
NOMISS
excludes observations with missing values from the analysis. Otherwise, PROC
CORR computes correlation statistics using all of the nonmissing pairs of variables.
Using the NOMISS option is computationally more efficient.
NOPRINT
suppresses all displayed output. Use NOPRINT if you want to create an output data
set only.
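NOPROB
suppresses displaying the probabilities (p-values) associated with each correlation coefficient.
NOSIMPLE
suppresses printing simple descriptive statistics for each variable. However, if you
request an output data set, the output data set still contains simple descriptive statistics
for the variables.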
OUTH=output-data-set
creates an output data set containing Hoeffding's D statistics. The contents of the
output data set are similar to the OUTP= data set. When you specify the OUTH=
option, the Hoeffding's D statistics will be displayed, and the Pearson correlations
will be displayed only if the PEARSON, ALPHA, COV, CSSCP, SSCP, or OUT=
option is also specified.
OUTK=output-data-set
creates an output data set containing Kendall correlation statistics. The contents of
the output data set are similar to those of the OUTP= data set. When you specify the
OUTK= option, the Kendall correlation statistics will be displayed, and the Pearson
correlations will be displayed only if the PEARSON, ALPHA, COV, CSSCP, SSCP,
or OUT= option is also specified.
OUTP=output-data-set
OUT=output-data-set
creates an output data set containing Pearson correlation statistics. This data set also
includes means, standard deviations, and the number of observations. The value of
the _TYPE_ variable is CORR. When you specify the OUTP= option, the Pearson
correlations will also be displayed. If you specify the ALPHA option, the output data
set also contains six observations with Cronbach's coefficient alpha.
OUTS=output-data-set
creates an output data set containing Spearman correlation coefficients. The contents of the output data set are similar to the OUTP= data set. When you specify
the OUTS= option, the Spearman correlation coefficients will be displayed, and the
Pearson correlations will be displayed only if the PEARSON, ALPHA, COV, CSSCP,
SSCP, or OUT= option is also specified.
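The output data set options can be tried with a sketch like the following, which assumes the Fitness data set from the Getting Started section. The data set name CorrOut is a hypothetical name chosen for this illustration.

```sas
/* Sketch: save Pearson correlation statistics and list them.    */
/* CorrOut is an arbitrary (hypothetical) output data set name.  */
proc corr data=Fitness outp=CorrOut noprint;
   var Weight Oxygen RunTime;
run;

/* Inspect the saved means, standard deviations, counts, and     */
/* correlations.                                                  */
proc print data=CorrOut;
run;
```

NOPRINT is used because only the data set is wanted; replacing OUTP= with OUTS=, OUTK=, or OUTH= would save the corresponding statistics instead.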
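PEARSON
requests a table of Pearson product-moment correlations. PROC CORR computes Pearson correlations by default when no other measure of association is requested.
RANK
displays the ordered correlation coefficients for each variable. Correlations are ordered from highest to lowest in absolute value. If you specify the HOEFFDING
option, the D statistics are displayed in order from highest to lowest.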
SINGULAR=p
specifies the criterion for determining the singularity of a variable if you use a
PARTIAL statement. A variable is considered singular if its corresponding diagonal element after Cholesky decomposition has a value less than p times the original
unpartialled value of that variable. The default value is 1E-8. The range of p is
between 0 and 1.
SPEARMAN
requests a table of Spearman correlation coefficients based on the ranks of the variables. The correlations range from -1 to 1. If you specify a WEIGHT statement, the
SPEARMAN option is invalid.
SSCP
displays a table of the sums of squares and crossproducts. When you specify the SSCP
option, the Pearson correlations will also be displayed. If you specify the OUTP=
option, the output data set contains a SSCP matrix and the corresponding _TYPE_
variable value is SSCP. If you use a PARTIAL statement, the unpartial SSCP matrix
is displayed, and the output data set does not contain an SSCP matrix.
VARDEF=d
specifies the variance divisor in the calculation of variances and covariances. The
following table shows the possible values for the value d and associated divisors,
where k is the number of PARTIAL statement variables. The default is VARDEF=DF.

Table 1.2. Possible Values for VARDEF=

Value          Divisor                      Formula
DF             degrees of freedom           $n - k - 1$
N              number of observations       $n$
WDF            sum of weights minus one     $(\sum_i w_i) - k - 1$
WEIGHT|WGT     sum of weights               $\sum_i w_i$

The variance is computed as
$$\frac{1}{d} \sum_i (x_i - \bar{x})^2$$
where $\bar{x}$ is the sample mean. If a WEIGHT statement is used, the variance is computed as
$$\frac{1}{d} \sum_i w_i (x_i - \bar{x}_w)^2$$
where $w_i$ is the weight for the ith observation and $\bar{x}_w$ is the weighted mean.
If you use the WEIGHT statement and VARDEF=DF, the variance is an estimate
of $s^2$, where the variance of the ith observation is $V(x_i) = s^2 / w_i$. This yields an
estimate of the variance of an observation with unit weight.
If you use the WEIGHT statement and VARDEF=WGT, the computed variance is
asymptotically an estimate of $s^2 / \bar{w}$, where $\bar{w}$ is the average weight (for large n). This
yields an asymptotic estimate of the variance of an observation with average weight.
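To make the effect of the divisor concrete, the following worked arithmetic uses the RunTime variable from the Getting Started output, which has n = 29 nonmissing values and a reported standard deviation of 1.39194 under the default VARDEF=DF. It assumes no PARTIAL statement (k = 0) and no WEIGHT statement; the subscripted symbols are introduced here only for this illustration.

```latex
% Divisors for RunTime with n = 29 and k = 0
d_{\mathrm{DF}} = n - k - 1 = 29 - 0 - 1 = 28, \qquad d_{\mathrm{N}} = n = 29

% The sum of squares is the same under both settings, so the
% standard deviations differ only by the square root of the divisor ratio
s_{\mathrm{N}} = s_{\mathrm{DF}} \sqrt{\frac{d_{\mathrm{DF}}}{d_{\mathrm{N}}}}
             = 1.39194 \times \sqrt{\frac{28}{29}}
             \approx 1.39194 \times 0.98261
             \approx 1.368
```

Specifying VARDEF=N would therefore report a slightly smaller standard deviation for the same data.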
BY Statement
BY variables ;
You can specify a BY statement with PROC CORR to obtain separate analyses on
observations in groups defined by the BY variables. If a BY statement appears, the
procedure expects the input data set to be sorted in order of the BY variables.
If your input data set is not sorted in ascending order, use one of the following alternatives:
Sort the data using the SORT procedure with a similar BY statement.
Specify the BY statement option NOTSORTED or DESCENDING in the BY
statement for the CORR procedure. The NOTSORTED option does not mean
that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily
in alphabetical or increasing numeric order.
Create an index on the BY variables using the DATASETS procedure.
For more information on the BY statement, refer to the discussion in SAS Language
Reference: Concepts. For more information on the DATASETS procedure, refer to
the discussion in the SAS Procedures Guide.
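The first alternative, sorting before the analysis, might look like the following sketch. The Fitness data set has no grouping variable, so the data step below derives a hypothetical one (AgeGroup, split at age 48) purely for illustration; the data set name FitnessGrp is likewise an assumption.

```sas
/* Sketch: BY-group analysis after sorting.                      */
/* AgeGroup and FitnessGrp are hypothetical names created only   */
/* for this example.                                              */
data FitnessGrp;
   set Fitness;
   length AgeGroup $ 8;
   if Age < 48 then AgeGroup = 'Under48';
   else AgeGroup = '48plus';
run;

/* Sort by the same variable that the BY statement will use. */
proc sort data=FitnessGrp;
   by AgeGroup;
run;

/* One correlation analysis per age group. */
proc corr data=FitnessGrp;
   by AgeGroup;
   var Oxygen RunTime;
run;
```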
FREQ Statement
FREQ variable ;
The FREQ statement lists a numeric variable whose value represents the frequency
of the observation. If you use the FREQ statement, the procedure assumes that each
observation represents n observations, where n is the value of the FREQ variable. If
n is not an integer, SAS truncates it. If n is less than 1 or is missing, the observation
is excluded from the analysis. The sum of the frequency variable represents the total
number of observations.
The effects of the FREQ and WEIGHT statements are similar except when calculating degrees of freedom.
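As an illustration, the following sketch builds a small grouped data set in which each row stands for Count identical observations. The data set name, variable names, and values are made up for the example.

```sas
/* Hypothetical grouped data: each row represents Count observations */
data Grouped;
   input X Y Count;
   datalines;
1 2 3
2 3 1
3 5 2
4 4 4
;
run;

proc corr data=Grouped;
   var X Y;
   freq Count;    /* total number of observations is the sum of Count */
run;
```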
PARTIAL Statement
PARTIAL variables ;
The PARTIAL statement lists variables to use in the calculation of partial correlation statistics. Only the Pearson partial correlation, Spearman partial rank-order
correlation, and Kendall's partial tau-b can be computed. The PARTIAL statement is not valid with the
HOEFFDING option. When you use the PARTIAL statement, observations with
missing values are excluded.
With a PARTIAL statement, PROC CORR also displays the partial variance and standard deviation for each analysis variable if the PEARSON option is specified.
VAR Statement
VAR variables ;
The VAR statement lists variables for which to compute correlation coefficients. If
the VAR statement is not specified, PROC CORR computes correlations for all numeric variables not listed in other statements.
WEIGHT Statement
WEIGHT variable ;
The WEIGHT statement lists weights to use in the calculation of Pearson weighted
product-moment correlation. The HOEFFDING, KENDALL, and SPEARMAN options are not valid with the WEIGHT statement.
The observations with missing weights are excluded from the analysis. By default,
for observations with nonpositive weights, weights are set to zero and the observations are included in the analysis. You can use the EXCLNPWGT option to exclude
observations with negative or zero weights from the analysis.
Note that most SAS/STAT procedures, such as PROC GLM, exclude negative and
zero weights by default. If you use the WEIGHT statement, consider which value of
the VARDEF= option is appropriate. See the discussion of the VARDEF= option for
more information.
WITH Statement
WITH variables;
The WITH statement lists variables with which correlations of the VAR statement
variables are to be computed. The WITH statement requests correlations of the form
$r(X_i, Y_j)$, where $X_1, \ldots, X_m$ are analysis variables specified in the VAR statement,
and $Y_1, \ldots, Y_n$ are variables specified in the WITH statement. The correlation matrix
has a rectangular structure of the form
$$\begin{bmatrix} r(Y_1, X_1) & \cdots & r(Y_1, X_m) \\ \vdots & \ddots & \vdots \\ r(Y_n, X_1) & \cdots & r(Y_n, X_m) \end{bmatrix}$$
Details
Pearson Product-Moment Correlation
The Pearson product-moment correlation is a parametric measure of association for
two variables. It measures both the strength and direction of a linear relationship. If
one variable X is an exact linear function of another variable Y, a positive relationship
exists if the correlation is 1 and a negative relationship exists if the correlation is −1.
If there is no linear predictability between the two variables, the correlation is 0. If
the two variables are normal with a correlation 0, the two variables are independent.
However, correlation does not imply causality because, in some cases, an underlying
causal relationship may not exist.
The following scatter plot matrix displays the relationship between two numeric random variables under various situations.
The scatter plot matrix shows a positive correlation between variables Y1 and X1, a
negative correlation between Y1 and X2, and no clear correlation between Y2 and
X1. The plot also shows no clear linear correlation between Y2 and X2, even though
Y2 is dependent on X2.
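The formula for the population Pearson product-moment correlation, denoted $\rho_{xy}$, is
$$\rho_{xy} = \frac{\mathrm{Cov}(x, y)}{\sqrt{V(x) V(y)}} = \frac{E\big((x - E(x))(y - E(y))\big)}{\sqrt{E(x - E(x))^2 \, E(y - E(y))^2}}$$
The sample correlation, such as a Pearson product-moment correlation or weighted product-moment correlation, estimates the population correlation. The formula for the sample Pearson product-moment correlation is
$$r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$$
where $\bar{x}$ is the sample mean of x and $\bar{y}$ is the sample mean of y. The formula for a
weighted Pearson product-moment correlation is
$$r_{xy} = \frac{\sum_i w_i (x_i - \bar{x}_w)(y_i - \bar{y}_w)}{\sqrt{\sum_i w_i (x_i - \bar{x}_w)^2 \sum_i w_i (y_i - \bar{y}_w)^2}}$$
where $w_i$ is the weight, $\bar{x}_w$ is the weighted mean of x, and $\bar{y}_w$ is the weighted mean
of y.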
Probability Values
Probability values for the Pearson correlation are computed by treating
$$t = (n - 2)^{1/2} \left( \frac{r^2}{1 - r^2} \right)^{1/2}$$
as coming from a t distribution with (n − 2) degrees of freedom, where r is the sample
correlation.
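The calculation can be sketched in a DATA step. The values r = −0.86843 and n = 28 are the Oxygen and RunTime correlation and pair count reported in the Fitness example output in this chapter; the variable names in the step are illustrative only.

```sas
/* Sketch: two-sided p-value for a Pearson correlation from r and n */
data _null_;
   r = -0.86843;                                /* sample correlation */
   n = 28;                                      /* number of nonmissing pairs */
   t = sqrt(n - 2) * sqrt(r**2 / (1 - r**2));   /* t statistic defined above */
   p = 2 * (1 - probt(t, n - 2));               /* two-sided probability value */
   put t= p=;
run;
```

The p-value written to the log should be consistent with the <.0001 that PROC CORR reports for this pair.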
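Spearman Rank-Order Correlation
The sample Spearman rank-order correlation is computed as
$$\frac{\sum_i (R_i - \bar{R})(S_i - \bar{S})}{\sqrt{\sum_i (R_i - \bar{R})^2 \sum_i (S_i - \bar{S})^2}}$$
where $R_i$ is the rank of $x_i$, $S_i$ is the rank of $y_i$, $\bar{R}$ is the mean of the $R_i$ values, and $\bar{S}$
is the mean of the $S_i$ values.
PROC CORR computes the Spearman correlation by ranking the data and using the
ranks in the Pearson product-moment correlation formula. In case of ties, the averaged ranks are used.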
Probability Values
Probability values for the Spearman correlation are computed by treating
$$t = (n - 2)^{1/2} \left( \frac{r^2}{1 - r^2} \right)^{1/2}$$
as coming from a t distribution with (n − 2) degrees of freedom, where r is the sample
Spearman correlation.
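Kendall's Tau-b Correlation Coefficient
Kendall's tau-b is computed as
$$\tau = \frac{\sum_{i<j} \mathrm{sgn}(x_i - x_j)\,\mathrm{sgn}(y_i - y_j)}{\sqrt{(T_0 - T_1)(T_0 - T_2)}}$$
where $T_0 = n(n-1)/2$, $T_1 = \sum_k t_k (t_k - 1)/2$, and $T_2 = \sum_l u_l (u_l - 1)/2$. Here $t_k$ is the number of tied x values in the kth group of tied x values, $u_l$ is the number of tied y values in the lth group of tied y values, n is the number of observations, and
$$\mathrm{sgn}(z) = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{if } z = 0 \\ -1 & \text{if } z < 0 \end{cases}$$
PROC CORR computes Kendall's tau-b by ranking the data and using a method similar to Knight (1966). The data are double sorted by ranking observations according
to values of the first variable and reranking the observations according to values of
the second variable. PROC CORR computes Kendall's tau-b from the number of interchanges of the first variable and corrects for tied pairs (pairs of observations with
equal values of X or equal values of Y).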
Probability Values
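Probability values for Kendall's tau-b are computed by treating
$$\frac{s}{\sqrt{V(s)}}$$
as coming from a standard normal distribution where
$$s = \sum_{i<j} \mathrm{sgn}(x_i - x_j)\,\mathrm{sgn}(y_i - y_j)$$
and V(s), the variance of s, is computed as
$$V(s) = \frac{v_0 - v_t - v_u}{18} + \frac{v_1}{2n(n-1)} + \frac{v_2}{9n(n-1)(n-2)}$$
where
$$v_0 = n(n-1)(2n+5)$$
$$v_t = \sum_k t_k (t_k - 1)(2 t_k + 5)$$
$$v_u = \sum_l u_l (u_l - 1)(2 u_l + 5)$$
$$v_1 = \Big( \sum_k t_k (t_k - 1) \Big) \Big( \sum_l u_l (u_l - 1) \Big)$$
$$v_2 = \Big( \sum_k t_k (t_k - 1)(t_k - 2) \Big) \Big( \sum_l u_l (u_l - 1)(u_l - 2) \Big)$$
The sums are over tied groups of values where $t_k$ is the number of tied x values and $u_l$
is the number of tied y values (Noether 1967). The sampling distribution of Kendall's
partial tau-b is unknown; therefore, the probability values are not available.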
Hoeffding Dependence Coefficient
Hoeffding's D statistic is computed as
$$D = 30 \, \frac{(n-2)(n-3) D_1 + D_2 - 2(n-2) D_3}{n(n-1)(n-2)(n-3)(n-4)}$$
where $D_1 = \sum_i (Q_i - 1)(Q_i - 2)$, $D_2 = \sum_i (R_i - 1)(R_i - 2)(S_i - 1)(S_i - 2)$, and
$D_3 = \sum_i (R_i - 2)(S_i - 2)(Q_i - 1)$. $R_i$ is the rank of $x_i$, $S_i$ is the rank of $y_i$, and
$Q_i$ (also called the bivariate rank) is 1 plus the number of points with both x and y
values less than the ith point.
A point that is tied on only the x value or y value contributes 1/2 to $Q_i$ if the other
value is less than the corresponding value for the ith point.
A point that is tied on both x and y contributes 1/4 to $Q_i$. PROC CORR obtains
the $Q_i$ values by first ranking the data. The data are then double sorted by ranking
observations according to values of the first variable and reranking the observations
according to values of the second variable. Hoeffding's D statistic is computed using
the number of interchanges of the first variable. When no ties occur among data
set observations, the D statistic values are between −0.5 and 1, with 1 indicating
complete dependence. However, when ties occur, the D statistic may result in a
smaller value. That is, for a pair of variables with identical values, Hoeffding's
D statistic may be less than 1. With a large number of ties in a small data set, the D
statistic may be less than −0.5. For more information about Hoeffding's D, refer to
Hollander and Wolfe (1973, p. 228).
Probability Values
The probability values for Hoeffding's D statistic are computed using the asymptotic
distribution computed by Blum, Kiefer, and Rosenblatt (1961). The formula is
$$\frac{(n-1)\pi^4}{60} D + \frac{\pi^4}{72}$$
which comes from the asymptotic distribution. If the sample size is less than 10, refer
to the tables for the distribution of D in Hollander and Wolfe (1973).
Partial Correlation
A partial correlation measures the strength of a relationship between two variables,
while controlling the effect of other variables. The Pearson partial correlation between two variables, after controlling for variables in the PARTIAL statement, is
equivalent to the Pearson correlation between the residuals of the two variables after
regression on the controlling variables.
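The equivalence between a partial correlation and the correlation of regression residuals can be checked with a sketch like the following. It assumes that SAS/STAT is available for PROC REG and uses the Fitness data set from this chapter; the output data set Resid and the residual names ResOxygen and ResRunTime are hypothetical.

```sas
/* Step 1: regress each analysis variable on the controlling variable */
proc reg data=Fitness noprint;
   model Oxygen RunTime = Weight;
   output out=Resid r=ResOxygen ResRunTime;   /* one residual per dependent variable */
run;
quit;

/* Step 2: ordinary Pearson correlation of the residuals */
proc corr data=Resid;
   var ResOxygen ResRunTime;
run;

/* Step 3: the same coefficient requested directly with PARTIAL */
proc corr data=Fitness;
   var Oxygen RunTime;
   partial Weight;
run;
```

The correlation coefficient from step 2 should match the partial correlation from step 3. The p-values are expected to differ, because the residual analysis does not account for the degree of freedom used by the controlling variable.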
Let $y = (y_1, y_2, \ldots, y_v)$ be the set of variables to correlate and $z = (z_1, z_2, \ldots, z_p)$
be the set of controlling variables. The population Pearson partial correlation between
the ith and the jth variables of y given z is the correlation between errors $(y_i - E(y_i))$
and $(y_j - E(y_j))$, where
$$E(y_i) = \alpha_i + z \beta_i \quad \text{and} \quad E(y_j) = \alpha_j + z \beta_j$$
are the regression models for variables $y_i$ and $y_j$ given the set of controlling variables
z, respectively.
For a given sample of observations, a sample Pearson partial correlation between $y_i$
and $y_j$ given z is derived from the residuals $y_i - \hat{y}_i$ and $y_j - \hat{y}_j$, where
$$\hat{y}_i = \hat{\alpha}_i + z \hat{\beta}_i \quad \text{and} \quad \hat{y}_j = \hat{\alpha}_j + z \hat{\beta}_j$$
are fitted values from regression models for variables $y_i$ and $y_j$ given z.
The partial corrected sums of squares and crossproducts (CSSCP) of y given z are
the corrected sums of squares and crossproducts of the residuals $y - \hat{y}$. Using these
partial corrected sums of squares and crossproducts, you can calculate the
partial covariances and partial correlations.
PROC CORR derives the partial corrected sums of squares and crossproducts matrix by applying the Cholesky decomposition algorithm to the CSSCP matrix. For
Pearson partial correlations, let S be the partitioned CSSCP matrix between two sets
of variables, z and y:
$$S = \begin{bmatrix} S_{zz} & S_{zy} \\ S_{zy}' & S_{yy} \end{bmatrix}$$
PROC CORR calculates $S_{yy.z}$, the partial CSSCP matrix of y after controlling for z,
by applying the Cholesky decomposition algorithm sequentially on the rows associated with z, the variables being partialled out.
After applying the Cholesky decomposition algorithm to each row associated with
variables z, PROC CORR checks all higher numbered diagonal elements associated
with z for singularity. A variable is considered singular if the value of the corresponding diagonal element is less than $\varepsilon$ times the original unpartialled corrected
sum of squares of that variable. You can specify the singularity criterion $\varepsilon$ using
the SINGULAR= option. For Pearson partial correlations, a controlling variable z is
considered singular if the $R^2$ for predicting this variable from the variables that are
already partialled out exceeds $1 - \varepsilon$. When this happens, PROC CORR excludes the
variable from the analysis. Similarly, a variable is considered singular if the $R^2$ for
predicting this variable from the controlling variables exceeds $1 - \varepsilon$. When this happens, its associated diagonal element and all higher numbered elements in this row or
column are set to zero.
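After the Cholesky decomposition algorithm is performed on all rows associated with
z, the resulting matrix has the form
$$T = \begin{bmatrix} T_{zz} & T_{zy} \\ 0 & S_{yy.z} \end{bmatrix}$$
where $T_{zz}$ is an upper triangular matrix with $T_{zz}' T_{zz} = S_{zz}$, $T_{zz}' T_{zy} = S_{zy}$, and
$S_{yy.z} = S_{yy} - T_{zy}' T_{zy}$.
If $S_{zz}$ is positive definite, then $T_{zy} = (T_{zz}')^{-1} S_{zy}$ and the partial CSSCP matrix $S_{yy.z}$
is identical to the matrix derived from the formula
$$S_{yy.z} = S_{yy} - S_{zy}' S_{zz}^{-1} S_{zy}$$
Higher-order partial correlations can be written recursively in terms of lower-order ones. For example, the second-order partial correlation between x and y given $z_1$ and $z_2$ is
$$r_{xy.z_1 z_2} = \frac{r_{xy.z_1} - r_{xz_2.z_1} \, r_{yz_2.z_1}}{\sqrt{(1 - r_{xz_2.z_1}^2)(1 - r_{yz_2.z_1}^2)}}$$
where $r_{xy.z_1}$, $r_{xz_2.z_1}$, and $r_{yz_2.z_1}$ are first-order partial correlations among variables
x, y, and $z_2$ given $z_1$.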
To derive the corresponding Spearman partial rank-order correlations and Kendall
partial tau-b correlations, PROC CORR applies the Cholesky decomposition algorithm to the Spearman rank-order correlation matrix and Kendall's tau-b correlation
matrix and uses the correlation formula. That is, the Spearman partial correlation is
equivalent to the Pearson correlation between the residuals of the linear regression
of the ranks of the two variables on the ranks of the partialled variables. Thus, if a
PARTIAL statement is specified with the CORR=SPEARMAN option, the residuals
of the ranks of the two variables are displayed in the plot. The partial tau-b correlations range from −1 to 1. However, the sampling distribution of this partial tau-b is
unknown; therefore, the probability values are not available.
Probability Values
Probability values for the Pearson and Spearman partial correlations are computed by
treating
$$\frac{(n - k - 2)^{1/2} \, r}{(1 - r^2)^{1/2}}$$
as coming from a t distribution with (n − k − 2) degrees of freedom, where r is the
partial correlation and k is the number of variables being partialled out.
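Fisher's z Transformation
For a sample correlation r using a sample from a bivariate normal distribution with
correlation $\rho = 0$, the statistic
$$t_r = (n - 2)^{1/2} \left( \frac{r^2}{1 - r^2} \right)^{1/2}$$
has a Student's t distribution with (n − 2) degrees of freedom.
With the transformation of the correlation r
$$z_r = \tanh^{-1}(r) = \frac{1}{2} \log \left( \frac{1 + r}{1 - r} \right)$$
the statistic z has an approximate normal distribution with mean and variance
$$E(z_r) = \zeta + \frac{\rho}{2(n-1)}, \qquad V(z_r) = \frac{1}{n-3}$$
where $\zeta = \tanh^{-1}(\rho)$.
For a specified value $\rho_0$, probability values for the hypothesis $H_0: \rho = \rho_0$ are computed by treating
$$z_r - \zeta_0 - \frac{\rho_0}{2(n-1)}$$
as a normal random variable with mean zero and variance 1/(n − 3), where $\zeta_0 = \tanh^{-1}(\rho_0)$ (Fisher 1970, p. 207; Anderson 1984, p. 123).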
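Without a bias adjustment, confidence limits for $\zeta$ are computed by treating $z_r - \zeta$ as having a normal distribution with mean zero and variance 1/(n − 3). The two-sided confidence limits for $\zeta$ are
$$\zeta_l = z_r - z_{(1-\alpha/2)} \sqrt{\frac{1}{n-3}}$$
$$\zeta_u = z_r + z_{(1-\alpha/2)} \sqrt{\frac{1}{n-3}}$$
where $z_{(1-\alpha/2)}$ is the $100(1 - \alpha/2)$ percentage point of the standard normal distribution.
With a bias adjustment, confidence limits for $\zeta$ are computed by treating
$$z_r - \mathrm{bias}(r) - \zeta$$
as having a normal distribution with mean zero and variance 1/(n − 3), where the
bias adjustment function (Keeping 1962, p. 308) is
$$\mathrm{bias}(r) = \frac{r}{2(n-1)}$$
The two-sided confidence limits for $\zeta$ are then
$$\zeta_l = z_r - \mathrm{bias}(r) - z_{(1-\alpha/2)} \sqrt{\frac{1}{n-3}}$$
$$\zeta_u = z_r - \mathrm{bias}(r) + z_{(1-\alpha/2)} \sqrt{\frac{1}{n-3}}$$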
These computed confidence limits of $\zeta_l$ and $\zeta_u$ are then transformed back to derive
the confidence limits for the correlation $\rho$:
$$r_l = \tanh(\zeta_l) = \frac{\exp(2\zeta_l) - 1}{\exp(2\zeta_l) + 1}$$
$$r_u = \tanh(\zeta_u) = \frac{\exp(2\zeta_u) - 1}{\exp(2\zeta_u) + 1}$$
Note that with a bias adjustment, the CORR procedure also displays the following
correlation estimate:
$$r_{adj} = \tanh(z_r - \mathrm{bias}(r))$$
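The bias-adjusted limits can be traced by hand in a DATA step. This sketch uses r = −0.86843 and n = 28, the Oxygen and RunTime values from the Fitness example in this chapter, and a 95% confidence level; the variable names in the step are illustrative.

```sas
/* Sketch: Fisher's z confidence limits with bias adjustment */
data _null_;
   r     = -0.86843;                      /* sample correlation */
   n     = 28;                            /* number of observations */
   alpha = 0.05;                          /* 95% two-sided limits */
   z     = 0.5 * log((1 + r) / (1 - r));  /* Fisher's z, tanh^{-1}(r) */
   bias  = r / (2 * (n - 1));             /* bias adjustment */
   se    = sqrt(1 / (n - 3));             /* standard error of z */
   q     = probit(1 - alpha / 2);         /* standard normal percentage point */
   zl    = z - bias - q * se;             /* lower limit for zeta */
   zu    = z - bias + q * se;             /* upper limit for zeta */
   rl    = tanh(zl);                      /* back-transformed lower limit */
   ru    = tanh(zu);                      /* back-transformed upper limit */
   radj  = tanh(z - bias);                /* bias-adjusted correlation estimate */
   put z= bias= radj= rl= ru=;
run;
```

The values written to the log should agree closely with the Fisher's z, bias adjustment, correlation estimate, and confidence limits that PROC CORR reports for this pair in the FISHER example output.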
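To test whether a population correlation $\rho_1$, estimated from a sample of $n_1$ observations with sample correlation $r_1$, is equal to a given $\rho_0$, apply the z transformation $z_1 = \tanh^{-1}(r_1)$ and $\zeta_0 = \tanh^{-1}(\rho_0)$. The p-value is then computed by treating
$$z_1 - \zeta_0 - \frac{\rho_0}{2(n_1 - 1)}$$
as a normal random variable with mean zero and variance $1/(n_1 - 3)$.
Assume that sample correlations $r_1$ and $r_2$ are computed from two independent samples of $n_1$ and $n_2$ observations, respectively. To test whether the two corresponding
population correlations, $\rho_1$ and $\rho_2$, are equal, first apply the z transformation to the
two sample correlations: $z_1 = \tanh^{-1}(r_1)$ and $z_2 = \tanh^{-1}(r_2)$.
The p-value is derived under the null hypothesis of equal correlation. That is, the
difference $z_1 - z_2$ is distributed as a normal random variable with mean zero and
variance $1/(n_1 - 3) + 1/(n_2 - 3)$.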
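A two-sample comparison can be sketched in a DATA step as follows. The correlations and sample sizes are hypothetical values chosen only to show the arithmetic.

```sas
/* Sketch: test of equal correlations from two independent samples */
data _null_;
   r1 = 0.30;  n1 = 150;                  /* hypothetical first sample */
   r2 = 0.25;  n2 = 150;                  /* hypothetical second sample */
   z1 = 0.5 * log((1 + r1) / (1 - r1));   /* tanh^{-1}(r1) */
   z2 = 0.5 * log((1 + r2) / (1 - r2));   /* tanh^{-1}(r2) */
   stat = (z1 - z2) / sqrt(1/(n1 - 3) + 1/(n2 - 3));
   p = 2 * (1 - probnorm(abs(stat)));     /* two-sided p-value */
   put stat= p=;
run;
```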
z=
z
z
See Example 1.4 for further illustrations of these applications.
Note that this approach can be extended to include more than two samples.
$$\mathrm{Cov}(T, E) = 0$$
$$\frac{\mathrm{Cov}(Y, T)^2}{V(Y) V(T)} = \frac{V(T)^2}{V(Y) V(T)} = \frac{V(T)}{V(Y)}$$
which is the proportion of the observed variance due to true differences among individuals in the sample. If Y is the sum of several observed variables measuring the
same feature, you can estimate V(T). Cronbach's coefficient alpha, based on a lower
bound for V(T), is an estimate of the reliability coefficient.
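Suppose p variables are used with $Y_j = T_j + E_j$ for $j = 1, 2, \ldots, p$, where $Y_j$ is the
observed value, $T_j$ is the true value, and $E_j$ is the measurement error. The measurement errors ($E_j$) are independent of the true values ($T_j$) and are also independent of
each other. Let $Y_0 = \sum_j Y_j$ be the total observed score and $T_0 = \sum_j T_j$ be the total
true score. Because
$$(p - 1) \sum_j V(T_j) \ge \sum_{i \ne j} \mathrm{Cov}(T_i, T_j)$$
a lower bound for $V(T_0)$ is given by
$$\frac{p}{p-1} \sum_{i \ne j} \mathrm{Cov}(T_i, T_j)$$
With $\mathrm{Cov}(Y_i, Y_j) = \mathrm{Cov}(T_i, T_j)$ for $i \ne j$, a lower bound for the reliability coefficient, $V(T_0)/V(Y_0)$, is then given by Cronbach's coefficient alpha:
$$\alpha = \frac{p}{p-1} \, \frac{\sum_{i \ne j} \mathrm{Cov}(Y_i, Y_j)}{V(Y_0)} = \frac{p}{p-1} \left( 1 - \frac{\sum_j V(Y_j)}{V(Y_0)} \right)$$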
If the variances of the items vary widely, you can standardize the items to a standard
deviation of 1 before computing the coefficient alpha. If the variables are dichotomous (0,1), the coefficient alpha is equivalent to the Kuder-Richardson 20 (KR-20)
reliability measure.
When the correlation between each pair of variables is 1, the coefficient alpha has a
maximum value of 1. With negative correlations between some variables, the coefficient alpha can have a value less than zero. The larger the overall alpha coefficient, the
more likely that items contribute to a reliable scale. Nunnally and Bernstein (1994)
suggest 0.70 as an acceptable reliability coefficient; smaller reliability coefficients
are seen as inadequate. However, this varies by discipline.
To determine how each item reflects the reliability of the scale, you calculate a coefficient alpha after deleting each variable independently from the scale. Cronbach's
coefficient alpha from all variables except the kth variable is given by
$$\alpha_k = \frac{p-1}{p-2} \left( 1 - \frac{\sum_{i \ne k} V(Y_i)}{V\big(\sum_{i \ne k} Y_i\big)} \right)$$
If the reliability coefficient increases after an item is deleted from the scale, you can
assume that the item is not correlated highly with other items in the scale. Conversely,
if the reliability coefficient decreases, you can assume that the item is highly correlated with other items in the scale. Refer to SAS Communications, Fourth Quarter
1994, for more information on how to interpret Cronbach's coefficient alpha.
Listwise deletion of observations with missing values is necessary to correctly calculate Cronbach's coefficient alpha. PROC CORR does not automatically use listwise
deletion if you specify the ALPHA option. Therefore, you should use the NOMISS
option if the data set contains missing values. Otherwise, PROC CORR prints a
warning message indicating the need to use the NOMISS option with the ALPHA
option.
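A minimal sketch of requesting coefficient alpha with listwise deletion follows. The data set Survey and the items Item1 through Item5 are hypothetical names standing in for a set of variables that measure the same feature.

```sas
/* Hypothetical scale of five items in data set Survey */
proc corr data=Survey alpha nomiss;
   var Item1-Item5;    /* items that make up the scale */
run;
```

The NOMISS option is included so that every item is evaluated on the same set of observations, as the preceding paragraph recommends.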
Missing Values
PROC CORR excludes observations with missing values in the WEIGHT and FREQ
variables. By default, PROC CORR uses pairwise deletion when observations contain missing values. PROC CORR includes all nonmissing pairs of values for each
pair of variables in the statistical computations. Therefore, the correlation statistics
may be based on different numbers of observations.
If you specify the NOMISS option, PROC CORR uses listwise deletion when a value
of the VAR or WITH statement variable is missing. PROC CORR excludes all observations with missing values from the analysis. Therefore, the number of observations
for each pair of variables is identical.
The PARTIAL statement always excludes the observations with missing values by
automatically invoking the NOMISS option. With the NOMISS option, the data are
processed more efficiently because fewer resources are needed. Also, the resulting
correlation matrix is nonnegative definite.
In contrast, if the data set contains missing values for the analysis variables and the
NOMISS option is not specified, the resulting correlation matrix may not be nonnegative definite. This leads to several statistical difficulties if you use the correlations
as input to regression or other statistical procedures.
Output Tables
By default, PROC CORR prints a report that includes descriptive statistics and correlation statistics for each variable. The descriptive statistics include the number of
observations with nonmissing values, the mean, the standard deviation, the minimum,
and the maximum.
If a nonparametric measure of association is requested, the descriptive statistics include the median. Otherwise, the sample sum is included. If a Pearson partial correlation is requested, the descriptive statistics also include the partial variance and
partial standard deviation.
If variable labels are available, PROC CORR labels the variables. If you specify the
CSSCP, SSCP, or COV option, the appropriate sum-of-squares and crossproducts and
covariance matrix appears at the top of the correlation report. If the data set contains
missing values, PROC CORR prints additional statistics for each pair of variables.
These statistics, calculated from the observations with nonmissing row and column
variable values, may include
SSCP(W,V), uncorrected sum-of-squares and crossproducts
USS(W), uncorrected sum-of-squares for the row variable
USS(V), uncorrected sum-of-squares for the column variable
CSSCP(W,V), corrected sum-of-squares and crossproducts
CSS(W), corrected sum-of-squares for the row variable
CSS(V), corrected sum-of-squares for the column variable
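In the storage formulas that follow, let
N = number of observations
C = number of correlation types requested
V = number of VAR statement variables
W = number of WITH statement variables
P = number of PARTIAL statement variables
T = V + W + P
K = V × W when W > 0; V(V + 1)/2 when W = 0
L = K when P = 0; T(T + 1)/2 when P > 0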
You can reduce CPU time by specifying NOMISS. With NOMISS, processing is
much faster when most observations do not contain missing values. The options and
statements you use in the procedure require different amounts of storage to process
the data. For Pearson correlations, the amount of temporary storage needed (in bytes)
is
$$M = 40T + 16L + 56K + 56T$$
The NOMISS option decreases the amount of temporary storage by 56K bytes, the
FISHER option increases the storage by 24K bytes, the PARTIAL statement increases the storage by 12T bytes, and the ALPHA option increases the storage by
32V + 16 bytes.
The following example uses a PARTIAL statement, which excludes missing values.
proc corr;
var x1 x2;
with y1 y2 y3;
partial z1;
run;
Therefore, using 40T + 16L + 56T + 12T , the minimum temporary storage equals
984 bytes (T = 2 + 3 + 1 and L = T (T + 1)/2).
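The arithmetic behind the 984-byte figure can be written out term by term, using the values of T and L stated above:

```latex
\begin{align*}
T &= 2 + 3 + 1 = 6, \qquad L = \frac{T(T+1)}{2} = \frac{6 \cdot 7}{2} = 21 \\
40T + 16L + 56T + 12T &= 40(6) + 16(21) + 56(6) + 12(6) \\
                      &= 240 + 336 + 336 + 72 \\
                      &= 984
\end{align*}
```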
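Using the SPEARMAN, KENDALL, or HOEFFDING option requires additional
temporary storage for each observation. For the most time-efficient processing, the
amount of temporary storage (in bytes) is
$$M = 40T + 8K + 8L \times C + 12T \times N + 28N + QS + QP + QK$$
where
QS = 0 with NOSIMPLE; 68T otherwise
QP = 56K with PEARSON and without NOMISS; 0 otherwise
QK = 32N with KENDALL or HOEFFDING; 0 otherwise
For example, with T = 3, K = 6, L = 6, C = 1, QS = 68T, QP = 0, and QK = 32N, the temporary storage is
$$M = 40 \times 3 + 8 \times 6 + 8 \times 6 \times 1 + 12 \times 3N + 28N + 3 \times 68 + 32N = 420 + 96N$$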
ODS Name             Description                                                      Option
Cov                  Covariances                                                      COV
CronbachAlpha        Coefficient alpha                                                ALPHA
CronbachAlphaDel     Coefficient alpha with deleted variable                          ALPHA
Csscp                Corrected sums of squares and crossproducts                      CSSCP
FisherPearsonCorr    Pearson correlation statistics using Fisher's z transformation   FISHER
FisherSpearmanCorr   Spearman correlation statistics using Fisher's z transformation  FISHER SPEARMAN
HoeffdingCorr        Hoeffding's D statistics                                         HOEFFDING
KendallCorr          Kendall's tau-b coefficients                                     KENDALL
PearsonCorr          Pearson correlations                                             PEARSON
SimpleStats          Simple descriptive statistics
SpearmanCorr         Spearman correlations                                            SPEARMAN
Sscp                 Sums of squares and crossproducts                                SSCP
VarInformation       Variable information
ODS Name                    Description                                                              Option
FisherPearsonPartialCorr    Pearson partial correlation statistics using Fisher's z transformation   FISHER
FisherSpearmanPartialCorr   Spearman partial correlation statistics using Fisher's z transformation  FISHER SPEARMAN
ODS Name              Description                                            Option
PartialCsscp          Partial corrected sums of squares and crossproducts    CSSCP
PartialCov            Partial covariances                                    COV
PartialKendallCorr    Partial Kendall tau-b coefficients                     KENDALL
PartialPearsonCorr    Partial Pearson correlations
PartialSpearmanCorr   Partial Spearman correlations                          SPEARMAN
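The ODS table names can be used with the ODS OUTPUT statement to save a table as a data set. The following sketch saves two of the tables listed above from an analysis of the Fitness data; the output data set names PearsonOut and StatsOut are hypothetical.

```sas
/* Save the Pearson correlations and simple statistics as data sets */
ods output PearsonCorr=PearsonOut SimpleStats=StatsOut;

proc corr data=Fitness pearson;
   var Weight Oxygen RunTime;
run;

/* Inspect the saved correlation table */
proc print data=PearsonOut;
run;
```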
requests a scatter plot matrix for all variables, scatter plots for all pairs of variables, or
both. If only the option keyword PLOTS is specified, the PLOTS=MATRIX option
is used. When you specify the PLOTS option, the Pearson correlations are also
displayed.
You can specify the following with the PLOTS= option:
MATRIX < ( matrix-options ) >
requests a scatter plot matrix for all variables. That is, the procedure displays a symmetric matrix plot with variables in the VAR list if a WITH statement is not specified.
Otherwise, the procedure displays a rectangular matrix plot with the WITH variables
appearing down the side and the VAR variables appearing across the top.
The available matrix-options are:
NMAXVAR=n
SCATTER < ( scatter-options ) >
requests a scatter plot for each pair of variables. That is, the procedure displays a
scatter plot for each pair of distinct variables from the VAR list if a WITH statement
is not specified. Otherwise, the procedure displays a scatter plot for each pair of
variables, one from the WITH list and the other from the VAR list.
The available scatter-options are:
ALPHA=numbers
NOINSET
suppresses the default inset of summary information for the scatter plot. The
inset table is displayed next to the scatter plot and contains statistics such as
number of observations (NObs), correlation, and p-value (Prob >|r|).
NOLEGEND
suppresses the default legend for overlaid prediction or confidence ellipses. The
legend table is displayed next to the scatter plot and identifies each ellipse displayed in the plot.
NMAXVAR=n
Let $\bar{Z}$ and $S$ be the sample mean and sample covariance matrix of a random sample
of size n from a bivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$.
The variable $\bar{Z} - \mu$ is distributed as a bivariate normal variate with mean zero and
covariance $(1/n)\Sigma$, and it is independent of $S$. Using Hotelling's $T^2$ statistic, which
is defined as
$$T^2 = n (\bar{Z} - \mu)' S^{-1} (\bar{Z} - \mu)$$
a $100(1 - \alpha)\%$ confidence ellipse for $\mu$ is computed from the equation
$$\frac{n}{n-1} (\bar{Z} - \mu)' S^{-1} (\bar{Z} - \mu) = \frac{2}{n-2} F_{2,n-2}(1 - \alpha)$$
where $F_{2,n-2}(1 - \alpha)$ is the $(1 - \alpha)$ critical value of an F distribution with degrees
of freedom 2 and n − 2.
A prediction ellipse is a region for predicting a new observation in the population. It
also approximates a region containing a specied percentage of the population.
Denote a new observation as the bivariate random variable $Z_{new}$. The variable
$$Z_{new} - \bar{Z} = (Z_{new} - \mu) - (\bar{Z} - \mu)$$
is distributed as a bivariate normal variate with mean zero (the zero vector) and covariance $(1 + 1/n)\Sigma$, and it is independent of $S$. A $100(1 - \alpha)\%$ prediction ellipse
is then given by the equation
$$\frac{n}{n-1} (Z_{new} - \bar{Z})' S^{-1} (Z_{new} - \bar{Z}) = \frac{2(n+1)}{n-2} F_{2,n-2}(1 - \alpha)$$
The family of ellipses generated by different critical values of the F distribution has
a common center (the sample mean) and common major and minor axis directions.
The shape of an ellipse depends on the aspect ratio of the plot. The ellipse indicates
the correlation between the two variables if the variables are standardized (by dividing the variables by their respective standard deviations). In this situation, the ratio
between the major and minor axis lengths is
$$\sqrt{\frac{1 + |r|}{1 - |r|}}$$
In particular, if r = 0, the ratio is 1, which corresponds to a circular confidence
contour and indicates that the variables are uncorrelated. A larger value of the ratio
indicates a larger positive or negative correlation between the variables.
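As a quick worked instance of the axis ratio for standardized variables, take an illustrative correlation of r = 0.6 (a value chosen only for the arithmetic):

```latex
\[
\sqrt{\frac{1 + |r|}{1 - |r|}} = \sqrt{\frac{1 + 0.6}{1 - 0.6}} = \sqrt{\frac{1.6}{0.4}} = \sqrt{4} = 2
\]
```

The major axis is then twice as long as the minor axis. The same ratio results for r = −0.6, because the formula depends only on |r|.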
Plot Description                   Option           Statement
Scatter plot                       PLOTS=SCATTER
Rectangular scatter plot matrix    PLOTS=MATRIX     WITH
Symmetric scatter plot matrix      PLOTS=MATRIX     (omit WITH)
Examples
Example 1.1. Computing Four Measures of Association
This example produces a correlation analysis with descriptive statistics and four measures of association: the Pearson product-moment correlation, the Spearman rank-order correlation, Kendall's tau-b coefficients, and Hoeffding's measure of dependence, D.
The Fitness data set created in the "Getting Started" section beginning on page 4
contains measurements from a study of physical fitness of 31 participants. The following statements request all four measures of association for the variables Weight,
Oxygen, and RunTime.
ods html;
ods graphics on;
title 'Measures of Association for a Physical Fitness Study';
proc corr data=Fitness pearson spearman kendall hoeffding
plots;
var Weight Oxygen RunTime;
run;
ods graphics off;
ods html close;
Note that Pearson correlations are computed by default only if none of the three nonparametric correlations (SPEARMAN, KENDALL, and HOEFFDING) is specified.
Otherwise, you need to specify the PEARSON option explicitly to compute Pearson
correlations.
By default, observations with nonmissing values for each variable are used to derive the univariate statistics for that variable. When nonparametric measures of association are specified, the procedure displays the median instead of the sum as an
additional descriptive measure.
3 Variables:    Weight    Oxygen    RunTime
Simple Statistics
Variable     N       Mean    Std Dev     Median    Minimum    Maximum
Weight      31   77.44452    8.32857   77.45000   59.08000   91.63000
Oxygen      29   47.22721    5.47718   46.67200   37.38800   60.05500
RunTime     29   10.67414    1.39194   10.50000    8.17000   14.03000
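Pearson Correlation Coefficients
(each cell: correlation, p-value, number of observations)
              Weight      Oxygen     RunTime
Weight       1.00000    -0.15358     0.20072
                          0.4264      0.2965
                  31          29          29
Oxygen      -0.15358     1.00000    -0.86843
              0.4264                  <.0001
                  29          29          28
RunTime      0.20072    -0.86843     1.00000
              0.2965      <.0001
                  29          28          29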
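Spearman Correlation Coefficients
(each cell: correlation, p-value, number of observations)
              Weight      Oxygen     RunTime
Weight       1.00000    -0.06824     0.13749
                          0.7250      0.4769
                  31          29          29
Oxygen      -0.06824     1.00000    -0.80131
              0.7250                  <.0001
                  29          29          28
RunTime      0.13749    -0.80131     1.00000
              0.4769      <.0001
                  29          28          29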
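Kendall Tau b Correlation Coefficients
(each cell: correlation, p-value, number of observations)
              Weight      Oxygen     RunTime
Weight       1.00000    -0.00988     0.06675
                          0.9402      0.6123
                  31          29          29
Oxygen      -0.00988     1.00000    -0.62434
              0.9402                  <.0001
                  29          29          28
RunTime      0.06675    -0.62434     1.00000
              0.6123      <.0001
                  29          28          29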
Hoeffding Dependence Coefficients
(each cell: D statistic, p-value, number of observations)
              Weight      Oxygen     RunTime
Weight       0.97690    -0.00497    -0.02355
              <.0001      0.5101      1.0000
                  31          29          29
Oxygen      -0.00497     1.00000     0.23449
              0.5101                  <.0001
                  29          29          28
RunTime     -0.02355     0.23449     1.00000
              1.0000      <.0001
                  29          28          29
The experimental PLOTS option requests a symmetric scatter plot for the analysis
variables listed in the VAR statement. The strong negative linear relationship between
Oxygen and Runtime is evident in Output 1.1.6.
The following statements request a correlation analysis between two sets of variables,
the sepal measurements and the petal measurements.
The CORR procedure displays univariate statistics for variables in the VAR and
WITH statements.
Output 1.2.1. Simple Statistics
Fisher (1936) Iris Setosa Data
The CORR Procedure
2 With Variables:    PetalLength    PetalWidth
2      Variables:    SepalLength    SepalWidth
Simple Statistics
Variable         N       Mean    Std Dev          Sum
PetalLength     49   14.71429    1.62019    721.00000
PetalWidth      48    2.52083    1.03121    121.00000
SepalLength     50   50.06000    3.52490         2503
SepalWidth      50   34.28000    3.79064         1714
Simple Statistics
Variable         Minimum    Maximum    Label
PetalLength     11.00000   19.00000    Petal Length in mm.
PetalWidth       1.00000    6.00000    Petal Width in mm.
SepalLength     43.00000   58.00000    Sepal Length in mm.
SepalWidth      23.00000   44.00000    Sepal Width in mm.
When the WITH statement is specified together with the VAR statement, the
CORR procedure produces rectangular matrices for statistics such as covariances
and correlations. The matrix rows correspond to the WITH variables (PetalLength
and PetalWidth) while the matrix columns correspond to the VAR variables
(SepalLength and SepalWidth). The CORR procedure uses the WITH variable
labels to label the matrix rows.
The SSCP option requests a table of the uncorrected sum-of-squares and crossproducts matrix, and the COV option requests a table of the covariance matrix. The SSCP
and COV options also produce a table of the Pearson correlations.
(each cell: SSCP, USS for the row variable, USS for the column variable)
                          SepalLength     SepalWidth
PetalLength               36214.00000    24756.00000
Petal Length in mm.       10735.00000    10735.00000
                          123793.0000     58164.0000
PetalWidth                 6113.00000     4191.00000
Petal Width in mm.          355.00000      355.00000
                          121356.0000     56879.0000
The variances are computed by using observations with nonmissing row and column
variable values. The Variances and Covariances table shown in Output 1.2.3 displays the covariance, variance for the row variable, variance for the column variable,
and the associated degrees of freedom for each pair of variables.
Output 1.2.3. Variances and Covariances
Fisher (1936) Iris Setosa Data
Variances and Covariances
Covariance / Row Var Variance / Col Var Variance / DF
                          SepalLength     SepalWidth
PetalLength               1.270833333    1.363095238
Petal Length in mm.       2.625000000    2.625000000
                          12.33333333    14.60544218
                                   48             48
PetalWidth                0.911347518    1.048315603
Petal Width in mm.        1.063386525    1.063386525
                          11.80141844    13.62721631
                                   47             47
(each cell: correlation, p-value, number of observations)
                          SepalLength     SepalWidth
PetalLength                   0.22335        0.22014
Petal Length in mm.            0.1229         0.1285
                                   49             49
PetalWidth                    0.25726        0.27539
Petal Width in mm.             0.0775         0.0582
                                   48             48
When there are missing values in the analysis variables, the "Pearson Correlation
Coefficients" table shown in Output 1.2.4 displays the correlation, the p-value under
the null hypothesis of zero correlation, and the number of observations for each pair
of variables. Only the correlation between PetalWidth and SepalLength and the
correlation between PetalWidth and SepalWidth are slightly positive.
The experimental PLOTS option displays a rectangular scatter plot matrix for the two
sets of variables. The VAR variables SepalLength and SepalWidth are listed across
the top of the matrix, and the WITH variables PetalLength and PetalWidth are listed
down the side of the matrix. As measured in Output 1.2.4, the plot for PetalWidth
and SepalLength and the plot for PetalWidth and SepalWidth show slight positive
correlations.
This display is requested by specifying both the ODS GRAPHICS statement and
the PLOTS option. For general information about ODS graphics, refer to Chapter 15,
"Statistical Graphics Using ODS" (SAS/STAT User's Guide). For specific information
about the graphics available in the CORR procedure, see the section "ODS Graphics"
on page 31.
The NOSIMPLE option suppresses the table of descriptive statistics. The "Pearson
Correlation Coefficients" table is displayed by default.
(each cell: correlation, p-value, number of observations)
              Weight      Oxygen     RunTime
Weight       1.00000    -0.15358     0.20072
                          0.4264      0.2965
                  31          29          29
Oxygen      -0.15358     1.00000    -0.86843
              0.4264                  <.0001
                  29          29          28
RunTime      0.20072    -0.86843     1.00000
              0.2965      <.0001
                  29          28          29
            With                 Sample                       Bias   Correlation
Variable    Variable     N   Correlation   Fisher's z   Adjustment      Estimate
Weight      Oxygen      29      -0.15358     -0.15480     -0.00274      -0.15090
Weight      RunTime     29       0.20072      0.20348      0.00358       0.19727
Oxygen      RunTime     28      -0.86843     -1.32665     -0.01608      -0.86442

            With                                        p Value for
Variable    Variable       Confidence Limits               H0:Rho=0
Weight      Oxygen        -0.490289     0.228229             0.4299
Weight      RunTime       -0.182422     0.525765             0.2995
Oxygen      RunTime       -0.935728    -0.725221             <.0001
See the section Fisher's z Transformation on page 21 for details on Fisher's z transformation.
The following statements request one-sided hypothesis tests and confidence limits for
the correlation using Fisher's z transformation.
proc corr data=Fitness nosimple nocorr fisher (type=lower);
var weight oxygen runtime;
run;
            With                   Sample                       Bias   Correlation
Variable    Variable      N   Correlation   Fisher's z    Adjustment      Estimate
Weight      Oxygen       29      -0.15358     -0.15480      -0.00274      -0.15090
Weight      RunTime      29       0.20072      0.20348       0.00358       0.19727
Oxygen      RunTime      28      -0.86843     -1.32665      -0.01608      -0.86442

            With                            p Value for
Variable    Variable    Lower 95% CL          H0:Rho<=0
Weight      Oxygen         -0.441943             0.7850
Weight      RunTime        -0.122077             0.1497
Oxygen      RunTime        -0.927408             1.0000
The TYPE=LOWER option requests a lower confidence limit and a p-value for the
test of the one-sided hypothesis $H_0\colon \rho \le 0$ against the alternative hypothesis $H_1\colon \rho > 0$.
Here Fisher's z, the bias adjustment, and the estimate of the correlation are the
same as for the two-sided alternative. However, because TYPE=LOWER is specified,
only a lower confidence limit is computed for each correlation, and one-sided p-values are computed.
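As a rough check on the Weight and Oxygen row above, the lower limit can be reproduced by hand. The sketch below assumes the usual Fisher construction: subtract the bias adjustment from z, step down by the one-sided normal quantile times $1/\sqrt{n-3}$, and transform back with tanh. The quantile 1.64485 and the symbol $\zeta_l$ are introduced here for illustration and do not appear in the output.

```latex
\begin{align*}
\zeta_l &= z - \text{bias} - \frac{z_{0.95}}{\sqrt{n-3}}
         = -0.15480 - (-0.00274) - \frac{1.64485}{\sqrt{26}}
         \approx -0.15206 - 0.32258 = -0.47464, \\
r_l &= \tanh(\zeta_l) \approx -0.4419 .
\end{align*}
```

This agrees with the displayed lower 95% limit of -0.441943. The displayed p-value of 0.7850 is consistent with $1 - \Phi(z\sqrt{n-3}) = 1 - \Phi(-0.789)$, that is, with a test statistic formed from the unadjusted z, since the bias term vanishes under $\rho_0 = 0$.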
... + (i>300);
... = 0.3*X + 0.9*rannor(246791);
... = 0.25*X + sqrt(.8375)*rannor(246791);
... = 0.3*X + 0.9*rannor(246791);
The test is requested with the option FISHER(RHO0=0.5). The results, which are
based on Fisher's transformation, are shown in Output 1.4.1.
Output 1.4.1. Fisher's Test for $H_0\colon \rho = \rho_0$
Analysis for Batch 1
The CORR Procedure
Pearson Correlation Statistics (Fishers z Transformation)
Variable
With
Variable
Sample
Correlation
Fishers z
Bias
Adjustment
Correlation
Estimate
150
0.22081
0.22451
0.0007410
0.22011
Variable
With
Variable
0.367409
------H0:Rho=Rho0----Rho0
p Value
0.50000
<.0001
The null hypothesis is rejected since the p-value is less than 0.0001.
The ODS SELECT statement restricts the output from PROC CORR to the
FisherPearsonCorr table, which is shown in Output 1.4.2; see the section ODS
Table Names on page 30. The output data set SimCorr contains Fisher's z statistics
for both batches.
Output 1.4.2. Fisher's Correlation Statistics
Testing Equality of Population Correlations
----------------------------------- Batch=1 -----------------------------------
The CORR Procedure
Pearson Correlation Statistics (Fisher's z Transformation)
Variable
With
Variable
Sample
Correlation
Fishers z
Bias
Adjustment
Correlation
Estimate
150
0.22081
0.22451
0.0007410
0.22011
Variable
With
Variable
p Value for
H0:Rho=0
0.367409
0.0065
Variable
With
Variable
Sample
Correlation
Fishers z
Bias
Adjustment
Correlation
Estimate
150
0.33694
0.35064
0.00113
0.33594
Variable
With
Variable
0.470853
p Value for
H0:Rho=0
<.0001
 n1         z1     n2         z2    variance           z       pval
150    0.22451    150    0.35064    0.013605    -1.08135    0.27954
In Output 1.4.3, the p-value of 0.2795 does not provide evidence to reject the null
hypothesis that $\rho_1 = \rho_2$. The sample sizes $n_1 = 150$ and $n_2 = 150$ are not large
enough to detect the difference $\rho_1 - \rho_2 = 0.05$ at a significance level of $\alpha = 0.05$.
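The values in Output 1.4.3 can be checked by hand. The sketch below assumes the standard two-sample comparison of Fisher's z statistics, in which the variance of the difference is $1/(n_1-3) + 1/(n_2-3)$; small differences in the last digit come from using the rounded z values shown in the output.

```latex
\begin{align*}
\text{variance} &= \frac{1}{n_1-3} + \frac{1}{n_2-3} = \frac{2}{147} \approx 0.013605, \\
z &= \frac{z_1 - z_2}{\sqrt{\text{variance}}}
   = \frac{0.22451 - 0.35064}{0.116642} \approx -1.081, \\
p &= 2\,\bigl(1 - \Phi(|z|)\bigr) \approx 0.2795 .
\end{align*}
```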
r1 and r2 :
z=
 n1         z1     n2         z2       corr        lcl        ucl
150    0.22451    100    0.23929    0.22640    0.10092    0.35187
Thus, a correlation estimate from the combined samples is r = 0.23. The 95%
confidence interval displayed in Output 1.4.4 is (0.10, 0.35) using the variance of the
combined estimate. Note that this interval contains the population correlation 0.3.
See the section Applications of Fisher's z Transformation on page 23.
The following statements request a correlation analysis and compute Cronbach's coefficient alpha for the variables Weight3, Length3, Height, and Width.
ods html;
ods graphics on;
title 'Fish Measurement Data';
proc corr data=fish1 nomiss alpha plots;
var Weight3 Length3 Height Width;
run;
ods graphics off;
ods html close;
The NOMISS option excludes observations with missing values, and the PLOTS option requests a symmetric scatter plot matrix for the analysis variables.
By default, the CORR procedure displays descriptive statistics for each variable, as
shown in Output 1.5.1.
4  Variables:    Weight3  Length3  Height   Width

                          Simple Statistics

Variable     N        Mean     Std Dev         Sum     Minimum     Maximum
Weight3     34     8.44751     0.97574   287.21524     6.23168    10.00000
Length3     34    38.38529     4.21628        1305    30.00000    46.50000
Height      34    15.22057     1.98159   517.49950    11.52000    18.95700
Width       34     5.43805     0.72967   184.89370     4.02000     6.74970
Since the NOMISS option is specified, the same set of 34 observations is used to
compute the correlation for each pair of variables. The correlations are shown in
Output 1.5.2.
Output 1.5.2. Pearson Correlation Coefficients
Fish Measurement Data
Pearson Correlation Coefficients, N = 34
Prob > |r| under H0: Rho=0
              Weight3       Length3        Height         Width
Weight3       1.00000       0.96523       0.96261       0.92789
                             <.0001        <.0001        <.0001
Length3       0.96523       1.00000       0.95492       0.92171
               <.0001                      <.0001        <.0001
Height        0.96261       0.95492       1.00000       0.92632
               <.0001        <.0001                      <.0001
Width         0.92789       0.92171       0.92632       1.00000
               <.0001        <.0001        <.0001
Since the data set contains only one species of fish, all the variables are highly correlated. This is evidenced in the scatter plot matrix for the analysis variables, which is
shown in Output 1.7.3, created in Example 1.7.
Positive correlation is needed for the alpha coefficient because variables measure a
common entity.
With the ALPHA option, the CORR procedure computes Cronbach's coefficient alpha, which is a lower bound for the reliability coefficient for the raw variables and
the standardized variables.
Because the variances of some variables vary widely, you should use the standardized
score to estimate reliability. The overall standardized Cronbach's coefficient alpha of
0.985145 provides an acceptable lower bound for the reliability coefficient. This
is much greater than the suggested value of 0.70 given by Nunnally and Bernstein
(1994).
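The standardized alpha quoted above can be reproduced from the six pairwise correlations in Output 1.5.2. The sketch below assumes the common formula for standardized alpha in terms of the number of variables $p$ and the average inter-variable correlation $\bar{r}$; the symbols $p$ and $\bar{r}$ are notation introduced here.

```latex
\begin{align*}
\bar{r} &= \frac{0.96523 + 0.96261 + 0.92789 + 0.95492 + 0.92171 + 0.92632}{6}
         = \frac{5.65868}{6} \approx 0.943113, \\
\alpha_{\text{std}} &= \frac{p\,\bar{r}}{1 + (p-1)\,\bar{r}}
         = \frac{4(0.943113)}{1 + 3(0.943113)}
         = \frac{3.772453}{3.829340} \approx 0.9851 .
\end{align*}
```

Applying the same formula to the three variables that remain after Width is deleted ($\bar{r} \approx 0.96092$, $p = 3$) gives about 0.9866, in line with the standardized alpha reported for Width in the deleted-variable table.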
Output 1.5.4. Cronbach's Coefficient Alpha with Deleted Variables
Fish Measurement Data
Cronbach Coefficient Alpha with Deleted Variable

                   Raw Variables            Standardized Variables
Deleted       Correlation                 Correlation
Variable       with Total       Alpha      with Total       Alpha
------------------------------------------------------------------
Weight3          0.975379    0.783365        0.973464    0.977103
Length3          0.967602    0.881987        0.967177    0.978783
Height           0.964715    0.655098        0.968079    0.978542
Width            0.934635    0.824069        0.937599    0.986626
The standardized alpha coefficient provides information on how each variable reflects the reliability of the scale with standardized variables. If the standardized
alpha decreases after removing a variable from the construct, then this variable is
strongly correlated with other variables in the scale. On the other hand, if the standardized alpha increases after removing a variable from the construct, then removing this variable from the scale makes the construct more reliable. The Cronbach
Coefficient Alpha with Deleted Variables table in Output 1.5.4 does not show a significant increase or decrease for the standardized alpha coefficients. See the section Cronbach's Coefficient Alpha on page 24 for more information regarding constructs and Cronbach's alpha.
The NOMISS option excludes observations with missing values of the VAR statement variables from the analysis. The NOSIMPLE option suppresses the display
of descriptive statistics, and the OUTP= option creates an output data set named
CorrOutp that contains the Pearson correlation statistics. Since the NOMISS option
is specified, the same set of 28 observations is used to compute the correlation for
each pair of variables.
Output 1.6.1. Pearson Correlation Coefficients
Correlations for a Fitness and Exercise Study
The CORR Procedure
Pearson Correlation Coefficients, N = 28
Prob > |r| under H0: Rho=0
               Weight        Oxygen       RunTime
Weight        1.00000      -0.18419       0.19505
                             0.3481        0.3199
Oxygen       -0.18419       1.00000      -0.86843
               0.3481                      <.0001
RunTime       0.19505      -0.86843       1.00000
               0.3199        <.0001
The following statements display the output data set, which is shown in Output 1.6.2.
title 'Output Data Set from PROC CORR';
proc print data=CorrOutp noobs;
run;
_TYPE_    _NAME_       Weight      Oxygen     RunTime
MEAN                  77.2168     47.1327     10.6954
STD                    8.4495      5.5535      1.4127
N                     28.0000     28.0000     28.0000
CORR      Weight       1.0000     -0.1842      0.1950
CORR      Oxygen      -0.1842      1.0000     -0.8684
CORR      RunTime      0.1950     -0.8684      1.0000
The preceding statements generate the same results as the following statements:
proc reg data=Fitness nomiss;
model runtime= weight oxygen;
run;
By default, the CORR procedure displays descriptive statistics for the VAR statement
variables, which are shown in Output 1.7.1.
Output 1.7.1. Simple Statistics
Fish Measurement Data
The CORR Procedure
4  Variables:    Height   Width    Length3  Weight3

                          Simple Statistics

Variable     N        Mean     Std Dev         Sum     Minimum     Maximum
Height      34    15.22057     1.98159   517.49950    11.52000    18.95700
Width       34     5.43805     0.72967   184.89370     4.02000     6.74970
Length3     34    38.38529     4.21628        1305    30.00000    46.50000
Weight3     34     8.44751     0.97574   287.21524     6.23168    10.00000
               Height         Width       Length3       Weight3
Height        1.00000       0.92632       0.95492       0.96261
                             <.0001        <.0001        <.0001
Width         0.92632       1.00000       0.92171       0.92789
               <.0001                      <.0001        <.0001
Length3       0.95492       0.92171       1.00000       0.96523
               <.0001        <.0001                      <.0001
Weight3       0.96261       0.92789       0.96523       1.00000
               <.0001        <.0001        <.0001
The variables are highly correlated. For example, the correlation between Height and
Width is 0.92632.
The experimental PLOTS=MATRIX option requests a scatter plot matrix for the VAR
statement variables, which is shown in Output 1.7.3.
In order to create this display, you must specify the experimental ODS GRAPHICS
statement in addition to the PLOTS=MATRIX option. For general information about
ODS graphics, refer to Chapter 15, Statistical Graphics Using ODS (SAS/STAT
User's Guide). For specific information about ODS graphics available in the CORR
procedure, see the section ODS Graphics on page 31.
To explore the correlation between Height and Width, the following statements request a scatter plot with prediction ellipses for the two variables, which is shown in
Output 1.7.4. A prediction ellipse is a region for predicting a new observation from
the population, assuming bivariate normality. It also approximates a region containing a specified percentage of the population.
ods html;
ods graphics on;
proc corr data=fish1 nomiss noprint
plots=scatter(nmaxvar=2 alpha=.20 .30);
var Height Width Length3 Weight3;
run;
ods graphics off;
ods html close;
The prediction ellipse is centered at the means $(\bar{x}, \bar{y})$. For further details, see the
section Confidence and Prediction Ellipses on page 33.
Note that the following statements can also be used to create a scatter plot for Height
and Width:
ods html;
ods graphics on;
proc corr data=fish1 noprint
plots=scatter(alpha=.20 .30);
var Height Width;
run;
ods graphics off;
ods html close;
Output 1.7.5 includes the point (13.9, 5.1), which was excluded from Output 1.7.4
because the observation had a missing value for Weight3. The prediction ellipses in
Output 1.7.5 also reflect the inclusion of this observation.
The following statements request a scatter plot with confidence ellipses for the mean,
which is shown in Output 1.7.6:
ods html;
ods graphics on;
title 'Fish Measurement Data';
proc corr data=fish1 nomiss noprint
plots=scatter(ellipse=mean nmaxvar=2 alpha=.05 .01);
var Height Width Length3 Weight3;
run;
ods graphics off;
ods html close;
The confidence ellipse for the mean is centered at the means $(\bar{x}, \bar{y})$. For further
details, see the section Confidence and Prediction Ellipses on page 33.
By default, the CORR procedure displays descriptive statistics for all the variables
and the partial variance and partial standard deviation for the VAR statement variables, as shown in Output 1.8.1.
Output 1.8.1. Descriptive Statistics
Fish Measurement Data
The CORR Procedure
2  Partial Variables:    Length3  Weight3
2          Variables:    Height   Width

                          Simple Statistics

Variable     N        Mean     Std Dev         Sum     Minimum     Maximum
Length3     34    38.38529     4.21628        1305    30.00000    46.50000
Weight3     34     8.44751     0.97574   287.21524     6.23168    10.00000
Height      34    15.22057     1.98159   517.49950    11.52000    18.95700
Width       34     5.43805     0.72967   184.89370     4.02000     6.74970

                 Simple Statistics

               Partial       Partial
Variable      Variance       Std Dev
Length3
Weight3
Height         0.26607       0.51582
Width          0.07315       0.27047
When a PARTIAL statement is specified, observations with missing values are excluded from the analysis. The partial correlations for the VAR statement variables
are shown in Output 1.8.2.
The partial correlation between the variables Height and Width is 0.25692, which
is much less than the unpartialled correlation, 0.92632. The p-value for the partial
correlation is 0.1558.
The PLOTS=SCATTER option requests a scatter plot of the residuals for the variables
Height and Width after controlling for the effect of variables Length3 and Weight3.
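For orientation, a partial correlation analysis of the kind described here could be requested with statements along the following lines. This is a minimal sketch, not necessarily the exact statements used for Output 1.8.1 through 1.8.3: it assumes the fish1 data set from the earlier examples, and the ALPHA= values for the ellipses are borrowed from the Example 1.7 statements.

```sas
/* Sketch only: assumes the fish1 data set from the earlier examples. */
ods html;
ods graphics on;
title 'Fish Measurement Data';
proc corr data=fish1 plots=scatter(alpha=.20 .30);
   var Height Width;          /* analysis variables                      */
   partial Length3 Weight3;   /* variables whose effect is partialled out */
run;
ods graphics off;
ods html close;
```

Because a PARTIAL statement is present, observations with missing values are dropped, so the NOMISS option is not needed in this sketch.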
               Height         Width
Height        1.00000       0.25692
                             0.1558
Width         0.25692       1.00000
               0.1558
In Output 1.8.3, a standard deviation of Height has roughly the same length on the
X-axis as a standard deviation of Width on the Y-axis. The major axis length is not
significantly larger than the minor axis length, indicating a weak partial correlation
between Height and Width.
References
Anderson, T.W. (1984), An Introduction to Multivariate Statistical Analysis, Second
Edition, New York: John Wiley & Sons.
Blum, J.R., Kiefer, J., and Rosenblatt, M. (1961), "Distribution Free Tests
of Independence Based on the Sample Distribution Function," Annals of
Mathematical Statistics, 32, 485-498.
Conover, W.J. (1998), Practical Nonparametric Statistics, Third Edition, New York:
John Wiley & Sons, Inc.
Cronbach, L.J. (1951), "Coefficient Alpha and the Internal Structure of Tests,"
Psychometrika, 16, 297-334.
Fisher, R.A. (1915), "Frequency Distribution of the Values of the Correlation
Coefficient in Samples from an Indefinitely Large Population," Biometrika, 10,
507-521.
Fisher, R.A. (1921), "On the Probable Error of a Coefficient of Correlation
Deduced from a Small Sample," Metron, 1, 3-32.
Fisher, R.A. (1936), "The Use of Multiple Measurements in Taxonomic Problems,"
Annals of Eugenics, 7, 179-188.
Fisher, R.A. (1970), Statistical Methods for Research Workers, Fourteenth Edition,
Darien, CT: Hafner Publishing Company.
Hoeffding, W. (1948), "A Non-Parametric Test of Independence," Annals of
Mathematical Statistics, 19, 546-557.
Hollander, M. and Wolfe, D. (1999), Nonparametric Statistical Methods, Second
Edition, New York: John Wiley & Sons, Inc.
Keeping, E.S. (1962), Introduction to Statistical Inference, New York: D. Van
Nostrand Company, Inc.
Knight, W.E. (1966), "A Computer Method for Calculating Kendall's Tau with
Ungrouped Data," Journal of the American Statistical Association, 61, 436-439.
Moore, D.S. (2000), Statistics: Concepts and Controversies, Fifth Edition, New
York: W.H. Freeman & Company.
Mudholkar, G.S. (1983), "Fisher's z-Transformation," Encyclopedia of Statistical
Sciences, 3, 130-135.
Noether, G.E. (1967), Elements of Nonparametric Statistics, New York: John Wiley
& Sons, Inc.
Novick, M.R. (1967), "Coefficient Alpha and the Reliability of Composite
Measurements," Psychometrika, 32, 1-13.
Nunnally, J.C. and Bernstein, I.H. (1994), Psychometric Theory, Third Edition, New
York: McGraw-Hill Companies.
Ott, R.L. and Longnecker, M.T. (2000), An Introduction to Statistical Methods and
Data Analysis, Fifth Edition, Belmont, CA: Wadsworth Publishing Company.
Chapter 2
DETAILS
   Inputting Frequency Counts
   Grouping with Formats
   Missing Values
   Statistical Computations
      Definitions and Notation
      Chi-Square Tests and Statistics
      Measures of Association
      Binomial Proportion
      Risks and Risk Differences
      Odds Ratio and Relative Risks for 2 x 2 Tables
      Cochran-Armitage Test for Trend
      Jonckheere-Terpstra Test
      Tests and Measures of Agreement
      Cochran-Mantel-Haenszel Statistics
      Exact Statistics
   Computational Resources
   Output Data Sets
   Displayed Output
   ODS Table Names
EXAMPLES
REFERENCES
Getting Started
Frequency Tables and Statistics
The FREQ procedure provides easy access to statistics for testing for association in a
crosstabulation table.
In this example, high school students applied for courses in a summer enrichment
program: these courses included journalism, art history, statistics, graphic arts, and
computer programming. The students accepted were randomly assigned to classes
with and without internships in local companies. The following table contains counts
of the students who enrolled in the summer program by gender and whether they were
assigned an internship slot.
Table 2.1. Summer Enrichment Data
                              Enrollment
Gender    Internship     Yes     No    Total
boys      yes             35     29       64
boys      no              14     27       41
girls     yes             32     10       42
girls     no              53     23       76
The SAS data set SummerSchool is created by inputting the summer enrichment
data as cell count data, or providing the frequency count for each combination
of variable values. The following DATA step statements create the SAS data set
SummerSchool.
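data SummerSchool;
input Gender $ Internship $ Enrollment $ Count @@;
datalines;
boys yes yes 35 boys yes no 29
boys no yes 14 boys no no 27
girls yes yes 32 girls yes no 10
girls no yes 53 girls no no 23
;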
The variable Gender takes the values 'boys' or 'girls', the variable Internship takes
the values 'yes' and 'no', and the variable Enrollment takes the values 'yes' and
'no'. The variable Count contains the number of students corresponding to each
combination of data values. The double at sign (@@) indicates that more than one
observation is included on a single data line. In this DATA step, two observations are
included on each line.
Researchers are interested in whether there is an association between internship status
and summer program enrollment. The Pearson chi-square statistic is an appropriate
statistic to assess the association in the corresponding 2 x 2 table. The following
PROC FREQ statements specify this analysis.
You specify the table for which you want to compute statistics with the TABLES
statement. You specify the statistics you want to compute with options after a slash
(/) in the TABLES statement.
proc freq data=SummerSchool order=data;
weight count;
tables Internship*Enrollment / chisq;
run;
The ORDER= option controls the order in which variable values are displayed in the
rows and columns of the table. By default, the values are arranged according to the
alphanumeric order of their unformatted values. If you specify ORDER=DATA, the
data are displayed in the same order as they occur in the input data set. Here, since
'yes' appears before 'no' in the data, 'yes' appears first in any table. Other options for
controlling order include ORDER=FORMATTED, which orders according to the formatted values, and ORDER=FREQUENCY, which orders by descending frequency
count.
In the TABLES statement, Internship*Enrollment specifies a table where the rows
are internship status and the columns are program enrollment. The CHISQ option
requests chi-square statistics for assessing association between these two variables.
Since the input data are in cell count form, the WEIGHT statement is required. The
WEIGHT statement names the variable Count, which provides the frequency of each
combination of data values.
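To make the CHISQ request concrete, the Pearson chi-square for this table can be worked by hand. Collapsing Table 2.1 over gender gives 67 and 39 students (enrolled, not enrolled) with an internship and 67 and 50 without. The sketch below uses the familiar shortcut formula for a 2 x 2 table; the cell notation $n_{ij}$ is introduced here for illustration.

```latex
\begin{align*}
Q_P &= \frac{n\,(n_{11}n_{22} - n_{12}n_{21})^2}{n_{1\cdot}\,n_{2\cdot}\,n_{\cdot 1}\,n_{\cdot 2}}
     = \frac{223\,(67 \cdot 50 - 39 \cdot 67)^2}{106 \cdot 117 \cdot 134 \cdot 89}
     = \frac{223 \cdot 737^2}{147{,}906{,}252} \approx 0.819 .
\end{align*}
```

With 1 degree of freedom this corresponds to a p-value of roughly 0.37, so the collapsed table on its own gives little evidence of association.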
Internship     Enrollment

Frequency|
Percent  |
Row Pct  |
Col Pct  |yes     |no      |  Total
---------+--------+--------+
yes      |     67 |     39 |    106
         |  30.04 |  17.49 |  47.53
         |  63.21 |  36.79 |
         |  50.00 |  43.82 |
---------+--------+--------+
no       |     67 |     50 |    117
         |  30.04 |  22.42 |  52.47
         |  57.26 |  42.74 |
         |  50.00 |  56.18 |
---------+--------+--------+
Total         134       89      223
            60.09    39.91   100.00
0.0726
0.4122
This execution of PROC FREQ first produces two individual crosstabulation tables
of Internship*Enrollment, one for boys and one for girls. Chi-square statistics are
produced for each individual table. Figure 2.3 shows the results for boys. Note that
the chi-square statistic for boys is significant at the $\alpha = 0.05$ level of significance.
Boys offered a course with an internship are more likely to enroll than boys who are
not.
If you look at the individual table for girls, displayed in Figure 2.4, you see that
there is no evidence of association for girls between internship offers and program
enrollment.
Internship by Enrollment, Gender=boys

Internship     Enrollment

Frequency|
Percent  |
Row Pct  |
Col Pct  |no      |yes     |  Total
---------+--------+--------+
no       |     27 |     14 |     41
         |  25.71 |  13.33 |  39.05
         |  65.85 |  34.15 |
         |  48.21 |  28.57 |
---------+--------+--------+
yes      |     29 |     35 |     64
         |  27.62 |  33.33 |  60.95
         |  45.31 |  54.69 |
         |  51.79 |  71.43 |
---------+--------+--------+
Total          56       49      105
            53.33    46.67   100.00
0.0196
0.0467
Internship by Enrollment, Gender=girls

Internship     Enrollment

Frequency|
Percent  |
Row Pct  |
Col Pct  |no      |yes     |  Total
---------+--------+--------+
no       |     23 |     53 |     76
         |  19.49 |  44.92 |  64.41
         |  30.26 |  69.74 |
         |  69.70 |  62.35 |
---------+--------+--------+
yes      |     10 |     32 |     42
         |   8.47 |  27.12 |  35.59
         |  23.81 |  76.19 |
         |  30.30 |  37.65 |
---------+--------+--------+
Total          33       85      118
            27.97    72.03   100.00
0.1311
0.5245
                            Dermatologist 2
Dermatologist 1    Terrible    Poor    Marginal    Clear
Terrible                 10       4           1        0
Poor                      5      10          12        2
Marginal                  2       4          12        5
Clear                     0       2           6       13
The dermatologists' evaluations of the patients are contained in the variables derm1
and derm2; the variable count is the number of patients given a particular pair of
ratings. In order to evaluate the agreement of the diagnoses (a possible contribution
to measurement error in the study), the kappa coefficient is computed. You specify
the AGREE option in the TABLES statement and use the TEST statement to request
a test for the null hypothesis that their agreement is purely by chance. You specify
the keyword KAPPA to perform this test for the kappa coefficient. The results are
shown in Figure 2.6.
data SkinCondition;
input derm1 $ derm2 $ count;
datalines;
terrible terrible 10
terrible poor 4
terrible marginal 1
terrible clear 0
poor terrible 5
poor poor 10
poor marginal 12
poor clear 2
marginal terrible 2
marginal poor 4
marginal marginal 12
marginal clear 5
clear terrible 0
clear poor 2
clear marginal 6
clear clear 13
;
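The analysis described above (AGREE in the TABLES statement, KAPPA in the TEST statement, and count as the weight variable) could be requested roughly as follows. This is a sketch under those stated assumptions rather than a verbatim copy of the statements behind Figure 2.6; ORDER=DATA and NOPRINT are choices made here to keep the rating levels in their data order and to suppress the crosstabulation table.

```sas
/* Sketch only: uses the SkinCondition data set created above. */
proc freq data=SkinCondition order=data;
   weight count;                          /* cell counts, one row per rating pair */
   tables derm1*derm2 / agree noprint;    /* kappa and related agreement statistics */
   test kappa;                            /* asymptotic test of H0: kappa = 0 */
run;
```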
Test of H0: Kappa = 0
ASE under H0            0.0612
Z                       5.6366
One-sided Pr >  Z       <.0001
Two-sided Pr > |Z|      <.0001
Sample Size = 88
The kappa coefficient has the value 0.3449, which indicates slight agreement between the dermatologists, and the hypothesis test confirms that you can reject the
null hypothesis of no agreement. This conclusion is further supported by the confidence interval of (0.2030, 0.4868), which suggests that the true kappa is greater than
zero. The AGREE option also produces Bowker's test for symmetry and the weighted
kappa coefficient, but that output is not shown.
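The kappa value can be verified from the 4 x 4 table of ratings. The sketch below assumes the usual definition of simple kappa from the observed agreement $P_o$ and the chance-expected agreement $P_e$; these symbols are introduced here. The row totals are 15, 29, 23, and 21, and the column totals are 17, 20, 31, and 20.

```latex
\begin{align*}
P_o &= \frac{10 + 10 + 12 + 13}{88} = \frac{45}{88} \approx 0.5114, \\
P_e &= \frac{15 \cdot 17 + 29 \cdot 20 + 23 \cdot 31 + 21 \cdot 20}{88^2}
     = \frac{1968}{7744} \approx 0.2541, \\
\hat{\kappa} &= \frac{P_o - P_e}{1 - P_e} = \frac{0.2572}{0.7459} \approx 0.3449 .
\end{align*}
```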
Syntax
The following statements are available in PROC FREQ.
The rest of this section gives detailed syntax information for the BY, EXACT,
OUTPUT, TABLES, TEST, and WEIGHT statements in alphabetical order after the
description of the PROC FREQ statement. Table 2.3 summarizes the basic functions
of each statement.
Statement   Description
BY          calculates separate frequency or crosstabulation tables for each BY group.
EXACT       requests exact tests for specified statistics.
OUTPUT      creates an output data set that contains specified statistics.
TABLES      specifies frequency or crosstabulation tables and requests tests and measures of association.
TEST        requests asymptotic tests for measures of association and agreement.
WEIGHT      identifies a variable with values that weight each observation.
Option      Description
DATA=       specifies the input data set.
COMPRESS    begins the next one-way table on the current page.
FORMCHAR=   specifies the outline and cell divider characters for the cells of the crosstabulation table.
NLEVELS     displays the number of levels for all TABLES variables.
NOPRINT     suppresses all displayed output.
ORDER=      specifies the order for listing variable values.
PAGE        displays one table per page.
You can specify the following options in the PROC FREQ statement.
COMPRESS
begins display of the next one-way frequency table on the same page as the preceding
one-way table if there is enough space to begin the table. By default, the next one-way table begins on the current page only if the entire table fits on that page. The
COMPRESS option is not valid with the PAGE option.
DATA=SAS-data-set
names the SAS data set to be analyzed by PROC FREQ. If you omit the DATA=
option, the procedure uses the most recently created SAS data set.
FORMCHAR(1,2,7)='formchar-string'
defines the characters to be used for constructing the outlines and dividers for the
cells of contingency tables. The FORMCHAR= option can specify 20 different SAS
formatting characters used to display output; however, PROC FREQ uses only the
first, second, and seventh formatting characters. Therefore, the proper specification
for PROC FREQ is FORMCHAR(1,2,7)='formchar-string'. The formchar-string
If you do not specify the FORMCHAR= option, PROC FREQ uses the default
formchar(1,2,7)='|-+'
Refer to the CALENDAR, PLOT, and TABULATE procedures in the Base SAS 9.1
Procedures Guide for more information on form characters.
Table 2.5. Formatting Characters Used by PROC FREQ
Position   Default   Used to Draw
1          |         vertical separators
2          -         horizontal separators
7          +         intersections of vertical and horizontal separators
NLEVELS
displays the Number of Variable Levels table. This table provides the number of
levels for each variable named in the TABLES statements. See the section Number
of Variable Levels Table on page 151 for more information. PROC FREQ determines the variable levels from the formatted variable values, as described in the section Grouping with Formats on page 99.
NOPRINT
suppresses the display of all output. Note that this option temporarily disables the
Output Delivery System (ODS). For more information, see Chapter 14, Using the
Output Delivery System (SAS/STAT User's Guide).
Note: A NOPRINT option is also available in the TABLES statement. It suppresses
display of the crosstabulation tables but allows display of the requested statistics.
ORDER=DATA | FORMATTED | FREQ | INTERNAL
specifies the order in which the values of the frequency and crosstabulation table
variables are to be reported. The following table shows how PROC FREQ interprets
values of the ORDER= option.
DATA        orders values according to their order in the input data set.
FORMATTED   orders values by their formatted values. This order is operating-environment dependent. By default, the order is ascending.
FREQ        orders values by descending frequency count.
INTERNAL    orders values by their unformatted values.
PAGE
displays only one table per page. Otherwise, PROC FREQ displays multiple tables
per page as space permits. The PAGE option is not valid with the COMPRESS option.
BY Statement
BY variables ;
You can specify a BY statement with PROC FREQ to obtain separate analyses on
observations in groups defined by the BY variables. When a BY statement appears,
the procedure expects the input data set to be sorted in order of the BY variables.
If your input data set is not sorted in ascending order, use one of the following alternatives:
Sort the data using the SORT procedure with a similar BY statement.
Specify the BY statement option NOTSORTED or DESCENDING in the BY
statement for the FREQ procedure. The NOTSORTED option does not mean
that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily
in alphabetical or increasing numeric order.
Create an index on the BY variables using the DATASETS procedure.
For more information on the BY statement, refer to the discussion in SAS Language
Reference: Concepts. For more information on the DATASETS procedure, refer to
the discussion in the Base SAS 9.1 Procedures Guide.
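The first alternative, sorting and then using a matching BY statement, might look like the following. This is a minimal sketch that reuses the SummerSchool data set from the Getting Started section; the output data set name SummerSorted is a hypothetical name chosen for illustration.

```sas
/* Sketch only: SummerSorted is a hypothetical name for the sorted copy. */
proc sort data=SummerSchool out=SummerSorted;
   by Gender;
run;

proc freq data=SummerSorted order=data;
   by Gender;                               /* one analysis per gender */
   weight Count;
   tables Internship*Enrollment / chisq;
run;
```

Each BY group gets its own crosstabulation table and its own chi-square statistics, which is the per-group behavior the BY statement is described as providing.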
EXACT Statement
EXACT statistic-options < / computation-options > ;
The EXACT statement requests exact tests or confidence limits for the specified
statistics. Optionally, PROC FREQ computes Monte Carlo estimates of the exact
p-values. The statistic-options specify the statistics for which to provide exact tests
or confidence limits. The computation-options specify options for the computation
of exact statistics.
CAUTION: PROC FREQ computes exact tests with fast and efficient algorithms
that are superior to direct enumeration. Exact tests are appropriate when a data set
is small, sparse, skewed, or heavily tied. For some large problems, computation of
exact tests may require a large amount of time and memory. Consider using asymptotic tests for such problems. Alternatively, when asymptotic methods may not be
sufficient for such large problems, consider using Monte Carlo estimation of exact
p-values. See the section Computational Resources on page 145 for more information.
Statistic-Options
The statistic-options specify the statistics for which exact tests or confidence limits are computed. PROC FREQ can compute exact p-values for the following
hypothesis tests: chi-square goodness-of-fit test for one-way tables; Pearson chi-square, likelihood-ratio chi-square, Mantel-Haenszel chi-square, Fisher's exact test,
Jonckheere-Terpstra test, Cochran-Armitage test for trend, and McNemar's test for
two-way tables. PROC FREQ can also compute exact p-values for tests of the following statistics: Pearson correlation coefficient, Spearman correlation coefficient,
simple kappa coefficient, weighted kappa coefficient, and common odds ratio. PROC
FREQ can compute exact p-values for the binomial proportion test for one-way tables, as well as exact confidence limits for the binomial proportion. Additionally,
PROC FREQ can compute exact confidence limits for the odds ratio for 2 x 2 tables,
as well as exact confidence limits for the common odds ratio for stratified 2 x 2 tables.
Table 2.6 lists the available statistic-options and the exact statistics computed. Most
of the option names are identical to the corresponding options in the TABLES statement and the OUTPUT statement. You can request exact computations for groups
of statistics by using options that are identical to the following TABLES statement
options: CHISQ, MEASURES, and AGREE. For example, when you specify the
CHISQ option in the EXACT statement, PROC FREQ computes exact p-values for
the Pearson chi-square, likelihood-ratio chi-square, and Mantel-Haenszel chi-square
tests. You request exact p-values for an individual test by specifying one of the
statistic-options shown in Table 2.6 .
Table 2.6. EXACT Statement Statistic-Options
Option
AGREE
BINOMIAL
CHISQ
COMOR
FISHER
JT
KAPPA
LRCHI
MCNEM
MEASURES
MHCHI
OR
PCHI
PCORR
SCORR
TREND
WTKAP
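As an illustration of requesting an exact p-value for one individual test, the following sketch asks for the exact Pearson chi-square test with the PCHI statistic-option. It assumes the SummerSchool data set from the Getting Started section; any cell-count data set with a weight variable would serve equally well.

```sas
/* Sketch only: assumes the SummerSchool data set created earlier. */
proc freq data=SummerSchool order=data;
   weight Count;
   tables Internship*Enrollment / chisq;
   exact pchi;      /* exact p-value for the Pearson chi-square test */
run;
```

Specifying CHISQ in the EXACT statement instead of PCHI would request exact p-values for the whole group of chi-square tests named in the text above.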
Computation-Options
The computation-options specify options for computation of exact statistics. You can
specify the following computation-options in the EXACT statement.
ALPHA=α
specifies the level of the confidence limits for Monte Carlo p-value estimates. The
value of the ALPHA= option must be between 0 and 1, and the default is 0.01.
A confidence level of $\alpha$ produces $100(1-\alpha)\%$ confidence limits. The default of
ALPHA=.01 produces 99% confidence limits for the Monte Carlo estimates. The
ALPHA= option invokes the MC option.
MAXTIME=value
species the maximum clock time (in seconds) that PROC FREQ can use to compute
an exact p-value. If the procedure does not complete the computation within the specied time, the computation terminates. The value of the MAXTIME= option must be
a positive number. The MAXTIME= option is valid for Monte Carlo estimation of
exact p-values, as well as for direct exact p-value computation.
See the section Computational Resources on page 145 for more information.
MC
requests Monte Carlo estimation of exact p-values instead of direct exact p-value
computation. Monte Carlo estimation can be useful for large problems that require a
great amount of time and memory for exact computations but for which asymptotic
approximations may not be sufcient. See the section Monte Carlo Estimation on
page 146 for more information.
The MC option is available for all EXACT statistic-options except BINOMIAL,
COMOR, MCNEM, and OR. PROC FREQ computes only exact tests or condence
limits for those statistics.
The ALPHA=, N=, and SEED= options also invoke the MC option.
N=n
species the number of samples for Monte Carlo estimation. The value of the N=
option must be a positive integer, and the default is 10000 samples. Larger values
of n produce more precise estimates of exact p-values. Because larger values of n
generate more samples, the computation time increases. The N= option invokes the
MC option.
POINT
requests exact point probabilities for the test statistics.
SEED=number
specifies the initial seed for random number generation for Monte Carlo estimation.
The value of the SEED= option must be an integer. If you do not specify the SEED=
option, or if the SEED= value is negative or zero, PROC FREQ uses the time of day
from the computer's clock to obtain the initial seed. The SEED= option invokes the
MC option.
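The computation-options above can be combined in a single EXACT statement. The following is a minimal sketch, not an example from this guide: the data set WORK.TRIAL, its variables, and the cell counts are hypothetical, and the option values are chosen only for illustration.

```sas
/* Hypothetical cell counts for a 2 x 3 table (illustration only) */
data work.trial;
   input treatment $ response $ count;
   datalines;
A Better 12
A Same 7
A Worse 3
B Better 5
B Same 9
B Worse 10
;
run;

proc freq data=work.trial;
   weight count;                       /* each record is a cell count */
   tables treatment*response / chisq;
   /* Monte Carlo estimate of the exact Pearson chi-square p-value:   */
   /* 20000 samples, fixed seed, 95% limits, at most 60 seconds       */
   exact pchi / mc n=20000 seed=12345 alpha=0.05 maxtime=60;
run;
```

Specifying a positive SEED= value makes the Monte Carlo estimate reproducible from run to run; omitting it lets the procedure take the seed from the clock.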
OUTPUT Statement
OUTPUT < OUT= SAS-data-set > options ;
The OUTPUT statement creates a SAS data set containing statistics computed by
PROC FREQ. The variables contain statistics for each two-way table or stratum, as
well as summary statistics across all strata.
Only one OUTPUT statement is allowed for each execution of PROC FREQ. You
must specify a TABLES statement with the OUTPUT statement. If you use multiple
TABLES statements, the contents of the OUTPUT data set correspond to the last
TABLES statement. If you use multiple table requests in a TABLES statement, the
contents of the OUTPUT data set correspond to the last table request.
For more information, see the section "Output Data Sets" on page 148.
Note that you can use the Output Delivery System (ODS) to create a SAS data set
from any piece of PROC FREQ output. For more information, see Table 2.11 on page
159 and Chapter 14, "Using the Output Delivery System" (SAS/STAT User's Guide).
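As a hedged illustration of the rules above, the sketch below writes the Pearson and likelihood-ratio chi-square statistics to a data set. The data set names, variables, and counts are hypothetical. Note that the TABLES statement requests CHISQ, which the PCHI and LRCHI output options require.

```sas
/* Hypothetical cell counts (illustration only) */
data work.trial;
   input treatment $ response $ count;
   datalines;
A Better 12
A Same 7
A Worse 3
B Better 5
B Same 9
B Worse 10
;
run;

proc freq data=work.trial;
   weight count;
   tables treatment*response / chisq;       /* statistics must be requested here */
   output out=work.chisq_stats pchi lrchi;  /* one OUTPUT statement per PROC FREQ */
run;

proc print data=work.chisq_stats;
run;
```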
You can specify the following options in an OUTPUT statement.
OUT=SAS-data-set
names the output data set. If you omit the OUT= option, the data set is named DATAn,
where n is the smallest integer that makes the name unique.
options
specify the statistics that you want in the output data set. Available statistics are those
produced by PROC FREQ for each one-way or two-way table, as well as the summary
statistics across all strata. When you request a statistic, the OUTPUT data set contains
that estimate or test statistic plus any associated standard error, confidence limits, p-values, and degrees of freedom. You can output statistics by using group options
identical to those specified in the TABLES statement: AGREE, ALL, CHISQ, CMH,
and MEASURES. Alternatively, you can request an individual statistic by specifying
one of the options shown in the following table.
Table 2.7. OUTPUT Statement Options and Required TABLES Statement Options
Option
AGREE
AJCHI
ALL
BDCHI
BIN | BINOMIAL
CHISQ
CMH
CMH1
CMH2
CMHCOR
CMHGA
CMHRMS
COCHQ
Required TABLES
Statement Option
AGREE
ALL or CHISQ
ALL
ALL or CMH
Option            Description                                   Required TABLES Statement Option
CONTGY            contingency coefficient                       ALL or CHISQ
CRAMV             Cramer's V                                    ALL or CHISQ
EQKAP             test for equal simple kappas                  AGREE
EQWKP             test for equal weighted kappas                AGREE
FISHER | EXACT    Fisher's exact test                           ALL or CHISQ *
GAMMA             gamma                                         ALL or MEASURES
JT                Jonckheere-Terpstra test                      JT
KAPPA             simple kappa coefficient                      AGREE
KENTB             Kendall's tau-b                               ALL or MEASURES
LAMCR             lambda asymmetric (C|R)                       ALL or MEASURES
LAMDAS            lambda symmetric                              ALL or MEASURES
LAMRC             lambda asymmetric (R|C)                       ALL or MEASURES
LGOR              adjusted logit odds ratio                     ALL or CMH or CMH1 or CMH2
LGRRC1                                                          ALL or CMH or CMH1 or CMH2
LGRRC2                                                          ALL or CMH or CMH1 or CMH2
LRCHI             likelihood-ratio chi-square                   ALL or CHISQ
MCNEM             McNemar's test                                AGREE
MEASURES          gamma, Kendall's tau-b, Stuart's tau-c,       ALL or MEASURES
                  Somers' D(C|R), Somers' D(R|C), Pearson
                  correlation coefficient, Spearman
                  correlation coefficient, lambda asymmetric
                  (C|R), lambda asymmetric (R|C), lambda
                  symmetric, uncertainty coefficient (C|R),
                  uncertainty coefficient (R|C), and symmetric
                  uncertainty coefficient; for 2 × 2 tables,
                  odds ratio and relative risks
MHCHI             Mantel-Haenszel chi-square                    ALL or CHISQ
MHOR              adjusted Mantel-Haenszel odds ratio           ALL or CMH or CMH1 or CMH2
MHRRC1                                                          ALL or CMH or CMH1 or CMH2
MHRRC2                                                          ALL or CMH or CMH1 or CMH2
N
NMISS
* ALL and CHISQ compute Fisher's exact test for 2 × 2 tables. Use the FISHER option to compute
Fisher's exact test for general r × c tables.
Table 2.7. (continued)
Option            Description                                   Required TABLES Statement Option
OR                odds ratio                                    ALL or MEASURES or RELRISK
PCHI              chi-square goodness-of-fit test for one-way   ALL or CHISQ
                  tables; for two-way tables, Pearson
                  chi-square
PCORR             Pearson correlation coefficient               ALL or MEASURES
PHI               phi coefficient                               ALL or CHISQ
PLCORR            polychoric correlation coefficient            PLCORR
RDIF1             column 1 risk difference (row 1 - row 2)      RISKDIFF
RDIF2             column 2 risk difference (row 1 - row 2)      RISKDIFF
RELRISK           odds ratio and relative risks for 2 × 2       ALL or MEASURES or RELRISK
                  tables
RISKDIFF          risks and risk differences                    RISKDIFF
RISKDIFF1         column 1 risks and risk difference            RISKDIFF
RISKDIFF2         column 2 risks and risk difference            RISKDIFF
RRC1              column 1 relative risk                        ALL or MEASURES or RELRISK
RRC2                                                            ALL or MEASURES or RELRISK
RSK1                                                            RISKDIFF
RSK11                                                           RISKDIFF
RSK12                                                           RISKDIFF
RSK2                                                            RISKDIFF
RSK21                                                           RISKDIFF
RSK22                                                           RISKDIFF
SCORR                                                           ALL or MEASURES
SMDCR                                                           ALL or MEASURES
SMDRC                                                           ALL or MEASURES
STUTC                                                           ALL or MEASURES
TREND                                                           TREND
TSYMM                                                           AGREE
U                                                               ALL or MEASURES
UCR                                                             ALL or MEASURES
URC                                                             ALL or MEASURES
WTKAP                                                           AGREE
TABLES Statement
TABLES requests < / options > ;
The TABLES statement requests one-way to n-way frequency and crosstabulation
tables and statistics for those tables.
If you omit the TABLES statement, PROC FREQ generates one-way frequency tables
for all data set variables that are not listed in the other statements.
The following argument is required in the TABLES statement.
requests
Request                   Equivalent to
tables A*(B C);           tables A*B A*C;
tables (A B)*(C D);       tables A*C B*C A*D B*D;
tables (A B C)*D;         tables A*D B*D C*D;
tables A -- C;            tables A B C;
tables (A -- C)*D;        tables A*D B*D C*D;
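The grouping syntax can be tried on a small data set. The sketch below is an illustration under stated assumptions: WORK.SURVEY and its values are hypothetical, and the range list `a -- c` relies on the variables being stored in the order a, b, c, d.

```sas
/* Hypothetical data; variable order in the data set is a, b, c, d */
data work.survey;
   input a b c d;
   datalines;
1 1 2 1
1 2 2 2
2 1 1 1
2 2 1 2
1 1 1 2
2 2 2 1
;
run;

proc freq data=work.survey;
   tables a*(b c);        /* same as: tables a*b a*c;         */
   tables (a b)*(c d);    /* same as: tables a*c b*c a*d b*d; */
   tables a -- c;         /* same as: tables a b c;           */
   tables (a -- c)*d;     /* same as: tables a*d b*d c*d;     */
run;
```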
Without Options
If you request a one-way frequency table for a variable without specifying options,
PROC FREQ produces frequencies, cumulative frequencies, percentages of the total
frequency, and cumulative percentages for each value of the variable. If you request a
two-way or an n-way crosstabulation table without specifying options, PROC FREQ
produces crosstabulation tables that include cell frequencies, cell percentages of the
total frequency, cell percentages of row frequencies, and cell percentages of column
frequencies. The procedure excludes observations with missing values from the table
but displays the total frequency of missing observations below each table.
Options
The following table lists the options available with the TABLES statement.
Descriptions follow in alphabetical order.
Table 2.9. TABLES Statement Options
Option          Description

Control Statistical Analysis
AGREE           requests tests and measures of classification agreement
ALL             requests tests and measures of association produced by CHISQ,
                MEASURES, and CMH
ALPHA=          sets the confidence level for confidence limits
BDT             requests Tarone's adjustment for the Breslow-Day test
BINOMIAL        requests binomial proportion, confidence limits and test for
                one-way tables
BINOMIALC       requests BINOMIAL statistics with a continuity correction
CHISQ           requests chi-square tests and measures of association based on
                chi-square
CL              requests confidence limits for the MEASURES statistics
CMH             requests all Cochran-Mantel-Haenszel statistics
CMH1            requests the CMH correlation statistic, and adjusted relative risks
                and odds ratios
CMH2            requests CMH correlation and row mean scores (ANOVA) statistics,
                and adjusted relative risks and odds ratios
CONVERGE=       specifies convergence criterion to compute polychoric correlation
FISHER          requests Fisher's exact test for tables larger than 2 × 2
JT              requests Jonckheere-Terpstra test
MAXITER=        specifies maximum number of iterations to compute polychoric
                correlation
MEASURES        requests measures of association and their asymptotic standard errors
MISSING         treats missing values as nonmissing
PLCORR          requests polychoric correlation
RELRISK         requests relative risk measures for 2 × 2 tables
RISKDIFF        requests risks and risk differences for 2 × 2 tables
RISKDIFFC       requests RISKDIFF statistics with a continuity correction
SCORES=         specifies the type of row and column scores
TESTF=          specifies expected frequencies for a one-way table chi-square test
TESTP=          specifies expected proportions for a one-way table chi-square test
TREND           requests Cochran-Armitage test for trend

Control Additional Table Information
CELLCHI2        displays each cell's contribution to the total Pearson chi-square
                statistic
CUMCOL          displays the cumulative column percentage in each cell
DEVIATION       displays the deviation of the cell frequency from the expected
                value for each cell
EXPECTED        displays the expected cell frequency for each cell
MISSPRINT       displays missing value frequencies
SPARSE          lists all possible combinations of variable levels even when a
                combination does not occur
TOTPCT          displays percentage of total frequency on n-way tables when n > 2

Control Displayed Output
CONTENTS=       specifies the HTML contents link for crosstabulation tables
CROSSLIST       displays crosstabulation tables in ODS column format
FORMAT=         formats the frequencies in crosstabulation tables
LIST            displays two-way to n-way tables in list format
NOCOL           suppresses display of the column percentage for each cell
NOCUM           suppresses display of cumulative frequencies and cumulative
                percentages in one-way frequency tables and in list format
NOFREQ          suppresses display of the frequency count for each cell
NOPERCENT       suppresses display of the percentage, row percentage, and column
                percentage in crosstabulation tables, or percentages and
                cumulative percentages in one-way frequency tables and in list format
NOPRINT         suppresses display of tables but displays statistics
NOROW           suppresses display of the row percentage for each cell
NOSPARSE        suppresses zero cell frequencies in the list display and in the OUT=
                data set when ZEROS is specified
NOWARN          suppresses log warning message for the chi-square test
PRINTKWT        displays kappa coefficient weights
SCOROUT         displays the row and the column scores

Create an Output Data Set
OUT=            specifies an output data set to contain variable values and
                frequency counts
OUTCUM          includes the cumulative frequency and cumulative percentage in
                the output data set for one-way tables
OUTEXPECT       includes the expected frequency of each cell in the output data set
OUTPCT          includes the percentage of column frequency, row frequency, and
                two-way table frequency in the output data set
You can specify the following options in a TABLES statement.
AGREE < (WT=FC) >
requests tests and measures of classification agreement for square tables. The
AGREE option provides McNemar's test for 2 × 2 tables and Bowker's test of symmetry for tables with more than two response categories. The AGREE option also
produces the simple kappa coefficient, the weighted kappa coefficient, the asymptotic
standard errors for the simple and weighted kappas, and the corresponding confidence
limits. When there are multiple strata, the AGREE option provides overall simple and
weighted kappas as well as tests for equal kappas among strata. When there are multiple strata and two response categories, PROC FREQ computes Cochran's Q test.
For more information, see the section "Tests and Measures of Agreement" on page
127.
The (WT=FC) specification requests that PROC FREQ use Fleiss-Cohen weights to
compute the weighted kappa coefficient. By default, PROC FREQ uses Cicchetti-Allison weights. See the section "Weighted Kappa Coefficient" on page 130 for
more information. You can specify the PRINTKWT option to display the kappa
coefficient weights.
AGREE statistics are computed only for square tables, where the number of rows
equals the number of columns. If your table is not square due to observations with
zero weights, you can use the ZEROS option in the WEIGHT statement to include
these observations. For more details, see the section "Tables with Zero Rows and
Columns" on page 133.
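The following sketch illustrates the AGREE option together with the ZEROS option. It is an assumption-laden example rather than one from this guide: the data set WORK.RATINGS and its counts are hypothetical. The second rater never assigns rating 3, so the zero-count records are what keep the table square.

```sas
/* Hypothetical agreement counts for two raters (illustration only).   */
/* rater2 never uses rating 3; those cells are supplied with count 0.  */
data work.ratings;
   input rater1 rater2 count;
   datalines;
1 1 20
1 2 5
1 3 0
2 1 4
2 2 15
2 3 0
3 1 1
3 2 2
3 3 0
;
run;

proc freq data=work.ratings;
   weight count / zeros;                           /* keep zero-weight observations */
   tables rater1*rater2 / agree(wt=fc) printkwt;   /* Fleiss-Cohen weights, shown   */
run;
```

Dropping `(wt=fc)` would fall back to the default Cicchetti-Allison weights.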
ALL
requests all of the tests and measures that are computed by the CHISQ, MEASURES,
and CMH options. The number of CMH statistics computed can be controlled by the
CMH1 and CMH2 options.
ALPHA=α
specifies the level of confidence limits. The value of the ALPHA= option must be between 0 and 1, and the default is 0.05. A confidence level of α produces 100(1 − α)%
confidence limits. The default of ALPHA=0.05 produces 95% confidence limits.
ALPHA= applies to confidence limits requested by TABLES statement options.
There is a separate ALPHA= option in the EXACT statement that sets the level of
confidence limits for Monte Carlo estimates of exact p-values, which are requested
in the EXACT statement.
BDT
requests Tarone's adjustment in the Breslow-Day test for homogeneity of odds ratios.
(You must specify the CMH option to compute the Breslow-Day test.) See the section "Breslow-Day Test for Homogeneity of the Odds Ratios" on page 142 for more
information.
BINOMIAL < (P= value) | (LEVEL= level-number | level-value) >
requests the binomial proportion for one-way tables. The BINOMIAL option also
provides the asymptotic standard error, asymptotic and exact confidence intervals,
and the asymptotic test for the binomial proportion.
BINOMIALC < (P= value) | (LEVEL= level-number | level-value) >
requests the BINOMIAL option statistics for one-way tables, and includes a continuity correction in the asymptotic confidence interval and the asymptotic test. The
BINOMIAL option statistics include the binomial proportion, the asymptotic standard error, asymptotic and exact confidence intervals, and the asymptotic test for the
binomial proportion. To request an exact test for the binomial proportion, use the
BINOMIAL option in the EXACT statement.
To specify the null hypothesis proportion for the test, use P=. If you omit P=value,
PROC FREQ uses 0.5 as the default for the test. By default BINOMIALC computes
the proportion of observations for the first variable level that appears in the output.
To specify a different level, use LEVEL=level-number or LEVEL=level-value, where
level-number is the variable level's number or order in the output, and level-value is
the formatted value of the variable level.
See the section "Binomial Proportion" on page 118 for more information.
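A minimal sketch of a binomial proportion request follows. The data set WORK.COIN and its counts are hypothetical. With the default level ordering, the proportion is computed for the first level shown in the output, which here is assumed to be Heads.

```sas
/* Hypothetical one-way counts (illustration only) */
data work.coin;
   input side $ count;
   datalines;
Heads 58
Tails 42
;
run;

proc freq data=work.coin;
   weight count;
   tables side / binomial(p=0.5);   /* asymptotic test of H0: proportion = 0.5 */
   exact binomial;                  /* adds the exact test for the proportion   */
run;
```

Replacing BINOMIAL with BINOMIALC in the TABLES statement would apply the continuity correction to the asymptotic interval and test.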
CELLCHI2
displays each crosstabulation table cell's contribution to the total Pearson chi-square
statistic, which is computed as
\[ \frac{(\text{frequency} - \text{expected})^2}{\text{expected}} \]
The CELLCHI2 option has no effect for one-way tables or for tables that are displayed with the LIST option.
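To see the cell contributions next to the quantities they are built from, CELLCHI2 can be combined with EXPECTED and DEVIATION. This is a sketch with hypothetical data (WORK.TRIAL and its counts are not from this guide).

```sas
/* Hypothetical cell counts (illustration only) */
data work.trial;
   input treatment $ response $ count;
   datalines;
A Better 12
A Same 7
A Worse 3
B Better 5
B Same 9
B Worse 10
;
run;

proc freq data=work.trial;
   weight count;
   /* Each cell shows frequency, expected frequency, deviation          */
   /* (frequency - expected), and its share of the Pearson chi-square   */
   tables treatment*response / chisq expected deviation cellchi2
                               norow nocol nopercent;
run;
```

The cell contributions sum to the Pearson chi-square statistic reported by the CHISQ option.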
CHISQ
requests chi-square tests of homogeneity or independence and measures of association based on chi-square. The tests include the Pearson chi-square, likelihood-ratio
chi-square, and Mantel-Haenszel chi-square. The measures include the phi coefficient, the contingency coefficient, and Cramer's V. For 2 × 2 tables, the CHISQ
option includes Fisher's exact test and the continuity-adjusted chi-square. For one-way tables, the CHISQ option requests a chi-square goodness-of-fit test for equal
proportions. If you specify the null hypothesis proportions with the TESTP= option,
then PROC FREQ computes a chi-square goodness-of-fit test for the specified proportions. If you specify null hypothesis frequencies with the TESTF= option, PROC
FREQ computes a chi-square goodness-of-fit test for the specified frequencies. See
the section "Chi-Square Tests and Statistics" on page 103 for more information.
CL
requests confidence limits for the MEASURES statistics. If you omit the
MEASURES option, the CL option invokes MEASURES. The FREQ procedure
determines the confidence coefficient using the ALPHA= option, which, by default,
equals 0.05 and produces 95% confidence limits.
For more information, see the section "Confidence Limits" on page 109.
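CMH
requests all Cochran-Mantel-Haenszel statistics.
CMH1
requests the CMH correlation statistic, and adjusted relative risks and odds ratios.
CMH2
requests CMH correlation and row mean scores (ANOVA) statistics, and adjusted relative risks and odds ratios.
CONTENTS=link-text
specifies the text for the HTML contents file links to crosstabulation tables. For
information on HTML output, refer to the SAS Output Delivery System User's Guide.
The CONTENTS= option affects only the HTML contents file, and not the HTML
body file.
If you omit the CONTENTS= option, by default, the HTML link text for crosstabulation tables is "Cross-Tabular Freq Table."
Note that links to all crosstabulation tables produced by a single TABLES statement
use the same text. To specify different text for different crosstabulation table links,
request the tables in separate TABLES statements and use the CONTENTS= option
in each TABLES statement.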
CONVERGE=value
specifies the convergence criterion for computing the polychoric correlation when
you specify the PLCORR option. The value of the CONVERGE= option must be a
positive number; by default, CONVERGE=0.0001. Iterative computation of the polychoric correlation stops when the convergence measure falls below the value of the
CONVERGE= option or when the number of iterations exceeds the value specified
in the MAXITER= option, whichever happens first.
See the section "Polychoric Correlation" on page 116 for more information.
CROSSLIST
displays crosstabulation tables in ODS column format, instead of the default crosstabulation cell format. In a CROSSLIST table display, the rows correspond to the
crosstabulation table cells, and the columns correspond to descriptive statistics such
as Frequency, Percent, and so on. See the section "Multiway Tables" on page 152 for
details on the contents of the CROSSLIST table.
The CROSSLIST table displays the same information as the default crosstabulation
table, but uses an ODS column format instead of the table cell format. Unlike the
default crosstabulation table, the CROSSLIST table has a table definition that you
can customize with PROC TEMPLATE. For more information, refer to the chapter
titled "The TEMPLATE Procedure" in the SAS Output Delivery System User's Guide.
You can control the contents of a CROSSLIST table with the same options available
for the default crosstabulation table. These include the NOFREQ, NOPERCENT,
NOROW, and NOCOL options. You can request additional information in a
CROSSLIST table with the CELLCHI2, DEVIATION, EXPECTED, MISSPRINT,
and TOTPCT options.
The FORMAT= option and the CUMCOL option have no effect for CROSSLIST
tables. You cannot specify both the LIST option and the CROSSLIST option in the
same TABLES statement.
You can use the NOSPARSE option to suppress display of variable levels with zero
frequency in CROSSLIST tables. By default for CROSSLIST tables, PROC FREQ
displays all levels of the column variable within each level of the row variable, including any column variable levels with zero frequency for that row. And for multiway
tables displayed with the CROSSLIST option, the procedure displays all levels of
the row variable for each stratum of the table by default, including any row variable
levels with zero frequency for the stratum.
CUMCOL
displays the cumulative column percentages in the cells of the crosstabulation table.
DEVIATION
displays the deviation of the cell frequency from the expected frequency for each cell
of the crosstabulation table. The DEVIATION option is valid for contingency tables
but has no effect on tables produced with the LIST option.
EXPECTED
displays the expected table cell frequencies under the hypothesis of independence (or
homogeneity). The EXPECTED option is valid for crosstabulation tables but has no
effect on tables produced with the LIST option.
FISHER | EXACT
requests Fisher's exact test for tables that are larger than 2 × 2. This test is also
known as the Freeman-Halton test. For more information, see the section "Fisher's
Exact Test" on page 106 and the "EXACT Statement" section on page 77.
If you omit the CHISQ option in the TABLES statement, the FISHER option invokes
CHISQ. You can also request Fisher's exact test by specifying the FISHER option in
the EXACT statement.
CAUTION: For tables with many rows or columns or with large total frequency,
PROC FREQ may require a large amount of time or memory to compute exact p-values. See the section "Computational Resources" on page 145 for more information.
FORMAT=format-name
specifies a format for the following crosstabulation table cell values: frequency, expected frequency, and deviation. PROC FREQ also uses this format to display the
total row and column frequencies for crosstabulation tables.
You can specify any standard SAS numeric format or a numeric format defined
with the FORMAT procedure. The format length must not exceed 24. If you omit
FORMAT=, by default, PROC FREQ uses the BEST6. format to display frequencies
less than 1E6, and the BEST7. format otherwise.
To change formats for all other FREQ tables, you can use PROC TEMPLATE. For information on this procedure, refer to the chapter titled "The TEMPLATE Procedure"
in the SAS Output Delivery System User's Guide.
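As a brief illustration, the sketch below applies the standard COMMA format to large cell counts. The data set WORK.SALES and its values are hypothetical.

```sas
/* Hypothetical large cell counts (illustration only) */
data work.sales;
   input region $ product $ count;
   datalines;
East A 1250000
East B 830000
West A 2100000
West B 47000
;
run;

proc freq data=work.sales;
   weight count;
   /* COMMA12. has length 12, within the documented limit of 24 */
   tables region*product / format=comma12. norow nocol nopercent;
run;
```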
JT
requests the Jonckheere-Terpstra test.
LIST
displays two-way to n-way tables in a list format rather than as crosstabulation tables.
PROC FREQ ignores the LIST option when you request statistical tests or measures
of association.
MAXITER=number
specifies the maximum number of iterations for computing the polychoric correlation
when you specify the PLCORR option. The value of the MAXITER= option must be
a positive integer; by default, MAXITER=20. Iterative computation of the polychoric
correlation stops when the number of iterations exceeds the value of the MAXITER=
option, or when the convergence measure falls below the value of the CONVERGE=
option, whichever happens first. For more information see the section "Polychoric
Correlation" on page 116.
MEASURES
requests several measures of association and their asymptotic standard errors (ASE).
The measures include gamma, Kendall's tau-b, Stuart's tau-c, Somers' D(C|R),
Somers' D(R|C), the Pearson and Spearman correlation coefficients, lambda (symmetric and asymmetric), and uncertainty coefficients (symmetric and asymmetric). To
request confidence limits for these measures of association, you can specify the CL
option.
For 2 × 2 tables, the MEASURES option also provides the odds ratio, column
1 relative risk, column 2 relative risk, and the corresponding confidence limits.
Alternatively, you can obtain the odds ratio and relative risks, without the other measures of association, by specifying the RELRISK option.
For more information, see the section "Measures of Association" on page 108.
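The difference between MEASURES and RELRISK on a 2 × 2 table can be sketched as follows. The data set WORK.COHORT and its counts are hypothetical and serve only as an illustration.

```sas
/* Hypothetical 2 x 2 cohort counts (illustration only) */
data work.cohort;
   input exposure $ disease $ count;
   datalines;
Exposed   No  60
Exposed   Yes 40
Unexposed No  80
Unexposed Yes 20
;
run;

proc freq data=work.cohort;
   weight count;
   tables exposure*disease / measures cl;   /* all measures, with confidence limits */
   tables exposure*disease / relrisk;       /* odds ratio and relative risks only   */
run;
```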
MISSING
treats missing values as nonmissing.
MISSPRINT
displays missing value frequencies for all tables, even though PROC FREQ does not
use the frequencies in the calculation of statistics. For more information, see the
section "Missing Values" on page 100.
NOCOL
suppresses the display of the column percentage for each cell of the crosstabulation table.
NOCUM
suppresses the display of cumulative frequencies and cumulative percentages for one-way frequency tables and for crosstabulation tables in list format.
NOFREQ
suppresses the display of cell frequencies for crosstabulation tables. This also suppresses frequencies for row totals.
NOPERCENT
suppresses the display of cell percentages, row total percentages, and column total
percentages for crosstabulation tables. For one-way frequency tables and crosstabulation tables in list format, the NOPERCENT option suppresses the display of percentages and cumulative percentages.
NOPRINT
suppresses the display of frequency and crosstabulation tables but displays all requested tests and statistics. Use the NOPRINT option in the PROC FREQ statement
to suppress the display of all tables.
NOROW
suppresses the display of the row percentage for each cell of the crosstabulation table.
NOSPARSE
requests that PROC FREQ not invoke the SPARSE option when you specify the
ZEROS option in the WEIGHT statement. The NOSPARSE option suppresses the
display of cells with a zero frequency count in the list output, and it also omits them
from the OUT= data set. By default, the ZEROS option invokes the SPARSE option, which displays table cells with a zero frequency count in the LIST output and
includes them in the OUT= data set. For more information, see the description of the
ZEROS option.
For CROSSLIST tables, the NOSPARSE option suppresses display of variable levels
with zero frequency. By default for CROSSLIST tables, PROC FREQ displays all
levels of the column variable within each level of the row variable, including any
column variable levels with zero frequency for that row. And for multiway tables
displayed with the CROSSLIST option, the procedure displays all levels of the row
variable for each stratum of the table by default, including any row variable levels
with zero frequency for the stratum.
NOWARN
suppresses the log warning message that the asymptotic chi-square test may not be
valid. By default, PROC FREQ displays this log message when more than 20 percent
of the table cells have expected frequencies less than five.
OUT=SAS-data-set
names the output data set that contains variable values and frequency counts. The
variable COUNT contains the frequencies and the variable PERCENT contains the
percentages. If more than one table request appears in the TABLES statement, the
contents of the data set correspond to the last table request in the TABLES statement.
For more information, see the section "Output Data Sets" on page 148 and see the
following descriptions for the options OUTCUM, OUTEXPECT, and OUTPCT.
OUTCUM
includes the cumulative frequency and the cumulative percentage for one-way tables
in the output data set when you specify the OUT= option in the TABLES statement.
The variable CUM_FREQ contains the cumulative frequency for each level of the
analysis variable, and the variable CUM_PCT contains the cumulative percentage
for each level. The OUTCUM option has no effect for two-way or multiway tables.
For more information, see the section "Output Data Sets" on page 148.
OUTEXPECT
includes the expected frequency in the output data set for crosstabulation tables when
you specify the OUT= option in the TABLES statement. The variable EXPECTED
contains the expected frequency for each table cell.
OUTPCT
includes the following additional variables in the output data set when you specify
the OUT= option in the TABLES statement for crosstabulation tables:
PCT_COL     the percentage of column frequency
PCT_ROW     the percentage of row frequency
PCT_TABL    the percentage of two-way table frequency
The OUTPCT option is valid for two-way or multiway tables, and has no effect for
one-way tables.
For more information, see the section "Output Data Sets" on page 148.
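The OUT= family of options can be sketched as follows, with one TABLES statement per output data set. All data set names and counts are hypothetical. OUTCUM is used with the one-way table, and OUTEXPECT and OUTPCT with the two-way table, matching the restrictions described above.

```sas
/* Hypothetical cell counts (illustration only) */
data work.trial;
   input treatment $ response $ count;
   datalines;
A Better 12
A Same 7
A Worse 3
B Better 5
B Same 9
B Worse 10
;
run;

proc freq data=work.trial;
   weight count;
   /* One-way table: COUNT, PERCENT, CUM_FREQ, CUM_PCT */
   tables response / out=work.resp_freq outcum;
   /* Two-way table: COUNT, PERCENT, EXPECTED, PCT_COL, PCT_ROW, ... */
   tables treatment*response / out=work.cell_stats outexpect outpct;
run;

proc print data=work.resp_freq;
run;
proc print data=work.cell_stats;
run;
```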
PLCORR
requests the polychoric correlation coefficient. For 2 × 2 tables, this statistic is more
commonly known as the tetrachoric correlation coefficient, and it is labeled as such
in the displayed output. If you omit the MEASURES option, the PLCORR option invokes MEASURES. For more information, see the section "Polychoric Correlation"
on page 116 and the descriptions for the CONVERGE= and MAXITER= options in
this list.
PRINTKWT
displays the weights PROC FREQ uses to compute the weighted kappa coefficient.
You must also specify the AGREE option, which requests the weighted kappa coefficient. You can specify (WT=FC) with the AGREE option to request Fleiss-Cohen
weights. By default, PROC FREQ uses Cicchetti-Allison weights.
See the section "Weighted Kappa Coefficient" on page 130 for more information.
RELRISK
requests relative risk measures and their confidence limits for 2 × 2 tables. These
measures include the odds ratio and the column 1 and 2 relative risks. For more
information, see the section "Odds Ratio and Relative Risks for 2 x 2 Tables" on page
122. You can also obtain the RELRISK measures by specifying the MEASURES
option, which produces other measures of association in addition to the relative risks.
RISKDIFF
requests column 1 and 2 risks (or binomial proportions), risk differences, and their
confidence limits for 2 × 2 tables. See the section "Risks and Risk Differences" on
page 120 for more information.
RISKDIFFC
requests the RISKDIFF option statistics for 2 × 2 tables, and includes a continuity
correction in the asymptotic confidence limits. The RISKDIFF option statistics include the column 1 and 2 risks (or binomial proportions), risk differences, and their
confidence limits. See the section "Risks and Risk Differences" on page 120 for more
information.
SCORES=type
specifies the type of row and column scores that PROC FREQ uses with the Mantel-Haenszel chi-square, Pearson correlation, Cochran-Armitage test for trend, weighted
kappa coefficient, and Cochran-Mantel-Haenszel statistics, where type is one of the
following (the default is SCORES=TABLE):
MODRIDIT
RANK
RIDIT
TABLE
By default, the row or column scores are the integers 1,2,... for character variables
and the actual variable values for numeric variables. Using other types of scores
yields nonparametric analyses. For more information, see the section "Scores" on
page 102.
To display the row and column scores, you can use the SCOROUT option.
SCOROUT
displays the row and the column scores. You specify the score type with the
SCORES= option. PROC FREQ uses the scores when it calculates the Mantel-Haenszel chi-square, Pearson correlation, Cochran-Armitage test for trend, weighted
kappa coefficient, or Cochran-Mantel-Haenszel statistics. The SCOROUT option
displays the row and column scores only when statistics are computed for two-way
tables. To store the scores in an output data set, use the Output Delivery System.
For more information, see the section "Scores" on page 102.
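The following sketch shows SCORES= and SCOROUT with a trend test on an R × 2 table. The data set WORK.DOSE and its counts are hypothetical. Because DOSE is numeric, table scores are the actual dose values 1, 2, 4, and 8.

```sas
/* Hypothetical dose-response counts (illustration only) */
data work.dose;
   input dose outcome $ count;
   datalines;
1 No 18
1 Yes 2
2 No 15
2 Yes 5
4 No 11
4 Yes 9
8 No 6
8 Yes 14
;
run;

proc freq data=work.dose;
   weight count;
   /* Default table scores, displayed with SCOROUT */
   tables dose*outcome / trend scores=table scorout;
   /* Rank scores give a nonparametric version of the same test */
   tables dose*outcome / trend scores=rank scorout;
run;
```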
SPARSE
lists all possible combinations of the variable values for an n-way table when n > 1,
even if a combination does not occur in the data. The SPARSE option applies only
to crosstabulation tables displayed in list format and to the OUT= output data set.
Otherwise, if you do not use the LIST option or the OUT= option, the SPARSE
option has no effect.
When you specify the SPARSE and LIST options, PROC FREQ displays all combinations of variable values in the table listing, including those values with a frequency
count of zero. By default, without the SPARSE option, PROC FREQ does not display
zero-frequency values in list output. When you use the SPARSE and OUT= options,
PROC FREQ includes empty crosstabulation table cells in the output data set. By
default, PROC FREQ does not include zero-frequency table cells in the output data
set.
For more information, see the section "Missing Values" on page 100.
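A small sketch makes the effect of SPARSE visible. The data set WORK.ORDERS is hypothetical; the combination West and B never occurs in it, so it appears in the listing and in the output data set only because SPARSE is specified.

```sas
/* Hypothetical counts; the West*B combination is absent (illustration only) */
data work.orders;
   input region $ product $ count;
   datalines;
East A 10
East B 4
West A 7
;
run;

proc freq data=work.orders;
   weight count;
   /* With SPARSE, West*B is listed with a frequency of zero */
   tables region*product / list sparse out=work.order_cells;
run;

proc print data=work.order_cells;
run;
```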
TESTF=(values)
specifies the null hypothesis frequencies for a one-way chi-square test for specified
frequencies. You can separate values with blanks or commas. The sum of the frequency values must equal the total frequency for the one-way table. The number of
TESTF= values must equal the number of variable levels in the one-way table. List
these values in the order in which the corresponding variable levels appear in the
output. If you omit the CHISQ option, the TESTF= option invokes CHISQ.
For more information, see the section "Chi-Square Test for One-Way Tables" on page
104.
TESTP=(values)
species the null hypothesis proportions for a one-way chi-square test for specied
proportions. You can separate values with blanks or commas. Specify values in probability form as numbers between 0 and 1, where the proportions sum to 1. Or specify
values in percentage form as numbers between 0 and 100, where the percentages sum
to 100. The number of TESTP= values must equal the number of variable levels in
the one-way table. List these values in the order in which the corresponding variable levels appear in the output. If you omit the CHISQ option, the TESTP= option
invokes CHISQ.
For more information, see the section Chi-Square Test for One-Way Tables on page
104.
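As an illustration, the following statements request one-way chi-square tests with the TESTP= and TESTF= options. This is a minimal sketch that assumes the CellCounts data set (variables R, C, and Count) from the section "Inputting Frequency Counts"; the null values shown are arbitrary choices for the example.

```sas
proc freq data=CellCounts;
   weight Count;
   /* R has two levels, so two null proportions are listed; they sum to 1 */
   tables R / testp=(0.5 0.5);
   /* null frequencies must sum to the table total (15 for these data) */
   tables C / testf=(10 5);
run;
```

Because CHISQ is not specified, each of these options invokes it.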
TOTPCT
displays the percentage of total frequency in crosstabulation tables, for n-way tables where n > 2. This percentage is also available with the LIST option or as the
PERCENT variable in the OUT= output data set.
TREND
performs the Cochran-Armitage test for trend. The table must be 2 × C or R × 2. For
more information, see the section Cochran-Armitage Test for Trend on page 124.
TEST Statement
TEST options ;
The TEST statement requests asymptotic tests for the specified measures of association and measures of agreement. You must use a TABLES statement with the TEST
statement.
options
specify the statistics for which to provide asymptotic tests. The available statistics are
those measures of association and agreement listed in Table 2.10 . The option names
are identical to those in the TABLES statement and the OUTPUT statement. You can
request all available tests for groups of statistics by using group options MEASURES
or AGREE. Or you can request tests individually by using one of the options shown
in Table 2.10 .
For each measure of association or agreement that you specify, the TEST statement provides an asymptotic test that the measure equals zero. When you request
an asymptotic test, PROC FREQ gives the asymptotic standard error under the null
hypothesis, the test statistic, and the p-values. Additionally, PROC FREQ reports the
confidence limits for that measure. The ALPHA= option in the TABLES statement
determines the confidence level; by default, ALPHA=0.05, which provides 95%
confidence limits. For more information, see the sections "Asymptotic Tests" on
page 109 and "Confidence Limits" on page 109, and see "Statistical Computations"
beginning on page 102 for sections describing the individual measures.
In addition to these asymptotic tests, exact tests for selected measures of association
and agreement are available with the EXACT statement. See the section EXACT
Statement on page 77 for more information.
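The following statements sketch how the TEST statement pairs with the required TABLES statement options. The example assumes the CellCounts data set (variables R, C, and Count) from the section "Inputting Frequency Counts"; the choice of GAMMA and KAPPA is only illustrative.

```sas
proc freq data=CellCounts;
   weight Count;
   /* MEASURES is required for GAMMA; AGREE is required for KAPPA */
   tables R*C / measures agree;
   /* request asymptotic tests that gamma and kappa equal zero */
   test gamma kappa;
run;
```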
Table 2.10. TEST Statement Options and Required TABLES Statement Options

Option      Required TABLES Statement Option
--------    --------------------------------
AGREE       AGREE
GAMMA       ALL or MEASURES
KAPPA       AGREE
KENTB       ALL or MEASURES
MEASURES    ALL or MEASURES
PCORR       ALL or MEASURES
SCORR       ALL or MEASURES
SMDCR       ALL or MEASURES
SMDRC       ALL or MEASURES
STUTC       ALL or MEASURES
WTKAP       AGREE

WEIGHT Statement
WEIGHT variable < / option > ;
The WEIGHT statement specifies a numeric variable with a value that represents the
frequency of the observation. The WEIGHT statement is most commonly used to
input cell count data. See the Inputting Frequency Counts section on page 98 for
more information. If you use the WEIGHT statement, PROC FREQ assumes that an
observation represents n observations, where n is the value of variable. The value of
the weight variable need not be an integer. When a weight value is missing, PROC
FREQ ignores the corresponding observation. When a weight value is zero, PROC
FREQ ignores the corresponding observation unless you specify the ZEROS option,
which includes observations with zero weights. If a WEIGHT statement does not
appear, each observation has a default weight of 1. The sum of the weight variable
values represents the total number of observations.
If any value of the weight variable is negative, PROC FREQ displays the frequencies
(as measured by the weighted values) but does not compute percentages and other
statistics. If you create an output data set using the OUT= option in the TABLES
Option
ZEROS
includes observations with zero weight values. By default, PROC FREQ ignores
observations with zero weights.
If you specify the ZEROS option, frequency and crosstabulation tables display
any levels corresponding to observations with zero weights. Without the ZEROS
option, PROC FREQ does not process observations with zero weights, and so does
not display levels that contain only observations with zero weights.
With the ZEROS option, PROC FREQ includes levels with zero weights in the chi-square goodness-of-fit test for one-way tables. Also, PROC FREQ includes any levels
with zero weights in binomial computations for one-way tables. This enables computation of binomial estimates and tests when there are no observations with positive
weights in the specified level.
For two-way tables, the ZEROS option enables computation of kappa statistics when
there are levels containing no observations with positive weight. For more information, see the section Tables with Zero Rows and Columns on page 133.
Note that even with the ZEROS option, PROC FREQ does not compute the CHISQ
or MEASURES statistics for two-way tables when the table has a zero row or zero
column, because most of these statistics are undefined in this case.
The ZEROS option invokes the SPARSE option in the TABLES statement, which
includes table cells with a zero frequency count in the list output and the OUT=
data set. By default, without the SPARSE option, PROC FREQ does not include
zero frequency cells in the list output or in the OUT= data set. If you specify the
ZEROS option in the WEIGHT statement but do not want the SPARSE option, you
can specify the NOSPARSE option in the TABLES statement.
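The interaction between ZEROS, SPARSE, and NOSPARSE can be sketched as follows. The data set CountsWithZero is hypothetical and was made up for this illustration; it contains one cell with a zero count.

```sas
data CountsWithZero;            /* hypothetical data set for illustration */
   input R C Count @@;
   datalines;
1 1 5   1 2 0   2 1 4   2 2 3
;

proc freq data=CountsWithZero;
   /* keep the observation whose weight is zero */
   weight Count / zeros;
   /* ZEROS invokes SPARSE, so the zero-frequency cell appears in the listing */
   tables R*C / list;
   /* NOSPARSE turns that behavior off for this table request */
   tables R*C / list nosparse;
run;
```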
Details
Inputting Frequency Counts
PROC FREQ can use either raw data or cell count data to produce frequency and
crosstabulation tables. Raw data, also known as case-record data, report the data
as one record for each subject or sample member. Cell count data report the data
as a table, listing all possible combinations of data values along with the frequency
counts. This way of presenting data often appears in published results.
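The following DATA step statements store the raw data in a SAS data set:
data Raw;
input Subject $ R C @@;
datalines;
01 1 1  02 1 1  03 1 1  04 1 1  05 1 1
06 1 2  07 1 2  08 1 2  09 2 1  10 2 1
11 2 1  12 2 1  13 2 2  14 2 2  14 2 2
;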
You can store the same data as cell counts using the following DATA step statements:
data CellCounts;
input R C Count @@;
datalines;
1 1 5
1 2 3
2 1 4
2 2 3
;
The variable R contains the values for the rows, and the variable C contains the values
for the columns. The Count variable contains the cell count for each row and column
combination.
Both the Raw data set and the CellCounts data set produce identical frequency
counts, two-way tables, and statistics. With the CellCounts data set, you must use
a WEIGHT statement to specify that the Count variable contains cell counts. For
example, to create a two-way crosstabulation table, submit the following statements:
proc freq data=CellCounts;
weight Count;
tables R*C;
run;
Now the table lists the frequency count for formatted level 1 as two and formatted
level 2 as three.
When you use a FORMAT statement to assign Questfmt. to a variable, the variable's
frequency table no longer includes a frequency count for the response of 8. You
must use the MISSING or MISSPRINT option in the TABLES statement to list the
frequency for no answer. The frequency count for this level includes observations
with either a value of 8 or a missing value (.).
The frequency or crosstabulation table lists the values of both character and numeric
variables in ascending order based on internal (unformatted) variable values unless
you change the order with the ORDER= option. To list the values in ascending order
by formatted values, use ORDER=FORMATTED in the PROC FREQ statement.
For more information on the FORMAT statement, refer to SAS Language Reference:
Concepts.
Missing Values
By default, PROC FREQ excludes missing values before it constructs the frequency
and crosstabulation tables. PROC FREQ also excludes missing values before computing statistics. However, the total frequency of observations with missing values is
displayed below each table. The following options change the way in which PROC
FREQ handles missing values:
The OUT= option in the TABLES statement includes an observation in the output
data set that contains the frequency of missing values. The NMISS option in the
OUTPUT statement creates a variable in the output data set that contains the number
of missing values.
Figure 2.7 shows three ways in which PROC FREQ handles missing values. The first
table uses the default method; the second table uses the MISSPRINT option; and the
third table uses the MISSING option.
                    *** Default ***
                  The FREQ Procedure

                              Cumulative    Cumulative
A    Frequency     Percent     Frequency      Percent
------------------------------------------------------
1           2       50.00             2        50.00
2           2       50.00             4       100.00

                Frequency Missing = 2
Statistical Computations
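Definitions and Notation
In this chapter, a two-way table represents the crosstabulation of variables X and Y.
Let the rows of the table be labeled by the values $X_i$, $i = 1, 2, \ldots, R$, and the columns
by $Y_j$, $j = 1, 2, \ldots, C$. Let $n_{ij}$ denote the cell frequency in the ith row and the jth
column and define the following:
\[
\begin{aligned}
n_{i\cdot} &= \sum_j n_{ij} && \text{(row totals)} \\
n_{\cdot j} &= \sum_i n_{ij} && \text{(column totals)} \\
n &= \sum_i \sum_j n_{ij} && \text{(overall total)} \\
p_{ij} &= n_{ij} / n && \text{(cell percentages)} \\
p_{i\cdot} &= n_{i\cdot} / n && \text{(row percentages)} \\
p_{\cdot j} &= n_{\cdot j} / n && \text{(column percentages)} \\
\bar{R} &= \sum_i n_{i\cdot} R_i / n && \text{(average row score)} \\
\bar{C} &= \sum_j n_{\cdot j} C_j / n && \text{(average column score)} \\
A_{ij} &= \sum_{k>i} \sum_{l>j} n_{kl} + \sum_{k<i} \sum_{l<j} n_{kl} \\
D_{ij} &= \sum_{k>i} \sum_{l<j} n_{kl} + \sum_{k<i} \sum_{l>j} n_{kl} \\
P &= \sum_i \sum_j n_{ij} A_{ij} && \text{(twice the number of concordances)} \\
Q &= \sum_i \sum_j n_{ij} D_{ij} && \text{(twice the number of discordances)}
\end{aligned}
\]
Here $R_i$ and $C_j$ denote the scores for row i and column j.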
Scores
PROC FREQ uses scores for the variable values when computing the Mantel-Haenszel chi-square, Pearson correlation, Cochran-Armitage test for trend, weighted
kappa coefficient, and Cochran-Mantel-Haenszel statistics. The SCORES= option in
the TABLES statement specifies the score type that PROC FREQ uses. The available
score types are TABLE, RANK, RIDIT, and MODRIDIT scores. The default score
type is TABLE.
For numeric variables, table scores are the values of the row and column levels. If the
row or column variables are formatted, then the table score is the internal numeric
value corresponding to that level. If two or more numeric values are classified into
the same formatted level, then the internal numeric value for that level is the smallest
of these values. For character variables, table scores are defined as the row numbers
and column numbers (that is, 1 for the first row, 2 for the second row, and so on).
Rank scores, which you can use to obtain nonparametric analyses, are defined by
\[
\begin{aligned}
\text{Row scores:} \quad R1_i &= \sum_{k<i} n_{k\cdot} + (n_{i\cdot} + 1)/2, & i &= 1, 2, \ldots, R \\
\text{Column scores:} \quad C1_j &= \sum_{l<j} n_{\cdot l} + (n_{\cdot j} + 1)/2, & j &= 1, 2, \ldots, C
\end{aligned}
\]
Ridit scores are rank scores standardized by the sample size. They are derived from rank scores as
\[
R2_i = R1_i / n, \qquad C2_j = C1_j / n
\]
Modified ridit (MODRIDIT) scores (van Elteren 1960; Lehmann 1975), which also
yield nonparametric analyses, represent the expected values of the order statistics for
the uniform distribution on (0,1). Modified ridit scores are derived from rank scores
as
\[
R3_i = R1_i / (n + 1), \qquad C3_j = C1_j / (n + 1)
\]
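To show where the score type enters an analysis, the following sketch requests score-based statistics with two different SCORES= values. It assumes the CellCounts data set (variables R, C, and Count) from the section "Inputting Frequency Counts"; any data set with ordinal row and column variables could be substituted.

```sas
proc freq data=CellCounts;
   weight Count;
   /* Mantel-Haenszel chi-square and CMH statistics computed from rank scores */
   tables R*C / chisq cmh scores=rank;
   /* same chi-square request, using modified ridit scores instead */
   tables R*C / chisq scores=modridit;
run;
```

Omitting SCORES= gives the default TABLE scores.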
Chi-Square Test for One-Way Tables
For one-way frequency tables, the CHISQ option in the TABLES statement computes
a chi-square goodness-of-fit test. Let C denote the number of classes, or levels, in the
one-way table. Let $f_i$ denote the frequency of class i (or the number of observations
in class i) for $i = 1, 2, \ldots, C$. Then PROC FREQ computes the chi-square statistic as
\[
Q_P = \sum_{i=1}^{C} \frac{(f_i - e_i)^2}{e_i}
\]
where $e_i$ is the expected frequency for class i under the null hypothesis.
In the test for equal proportions, which is the default for the CHISQ option, the null
hypothesis specifies equal proportions of the total sample size for each class. Under
this null hypothesis, the expected frequency for each class equals the total sample
size divided by the number of classes,
\[
e_i = n / C \qquad \text{for } i = 1, 2, \ldots, C
\]
In the test for specified frequencies, which PROC FREQ computes when you input null hypothesis frequencies using the TESTF= option, the expected frequencies
are those TESTF= values. In the test for specified proportions, which PROC FREQ
computes when you input null hypothesis proportions using the TESTP= option, the
expected frequencies are determined from the TESTP= proportions $p_i$, as
\[
e_i = p_i \times n \qquad \text{for } i = 1, 2, \ldots, C
\]
Under the null hypothesis (of equal proportions, specified frequencies, or specified
proportions), this test statistic has an asymptotic chi-square distribution, with $C - 1$
degrees of freedom. In addition to the asymptotic test, PROC FREQ computes the
exact one-way chi-square test when you specify the CHISQ option in the EXACT
statement.
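The arithmetic of the one-way statistic can be traced in a short DATA step. This is a sketch of the formula only, not of how PROC FREQ is implemented; the observed frequencies (8 and 7) and the null proportions (0.5 and 0.5) are values assumed for the example.

```sas
data _null_;
   array f[2]  _temporary_ (8 7);       /* observed class frequencies f_i (assumed) */
   array p0[2] _temporary_ (0.5 0.5);   /* null proportions p_i, as for TESTP= (assumed) */
   n = 0;
   do i = 1 to dim(f);
      n = n + f[i];                     /* total sample size */
   end;
   qp = 0;
   do i = 1 to dim(f);
      e  = p0[i] * n;                   /* expected frequency e_i = p_i * n */
      qp = qp + (f[i] - e)**2 / e;      /* accumulate (f_i - e_i)**2 / e_i */
   end;
   df = dim(f) - 1;                     /* C - 1 degrees of freedom */
   pvalue = 1 - probchi(qp, df);        /* asymptotic chi-square p-value */
   put qp= df= pvalue=;
run;
```

The values written to the log can be compared with the output of a TABLES request that uses TESTP=(0.5 0.5) on a one-way table with the same frequencies.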
Chi-Square Test for Two-Way Tables
The Pearson chi-square statistic for two-way tables involves the differences between
the observed and expected frequencies, where the expected frequencies are computed
under the null hypothesis of independence. The chi-square statistic is computed as
\[
Q_P = \sum_i \sum_j \frac{(n_{ij} - e_{ij})^2}{e_{ij}}
\]
where
\[
e_{ij} = \frac{n_{i\cdot} \, n_{\cdot j}}{n}
\]
When the row and column variables are independent, $Q_P$ has an asymptotic chi-square distribution with $(R - 1)(C - 1)$ degrees of freedom. For large values of $Q_P$,
this test rejects the null hypothesis in favor of the alternative hypothesis of general
association. In addition to the asymptotic test, PROC FREQ computes the exact chi-square test when you specify the PCHI or CHISQ option in the EXACT statement.
For a 2 × 2 table, the Pearson chi-square is also appropriate for testing the equality
of two binomial proportions or, for R × 2 and 2 × C tables, the homogeneity of
proportions. Refer to Fienberg (1980).
Likelihood-Ratio Chi-Square Test
The likelihood-ratio chi-square statistic involves the ratios between the observed and
expected frequencies. The statistic is computed as
\[
G^2 = 2 \sum_i \sum_j n_{ij} \ln\!\left( \frac{n_{ij}}{e_{ij}} \right)
\]
When the row and column variables are independent, $G^2$ has an asymptotic chi-square distribution with $(R - 1)(C - 1)$ degrees of freedom. In addition to the
asymptotic test, PROC FREQ computes the exact test when you specify the LRCHI
or CHISQ option in the EXACT statement.
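A request for the Pearson and likelihood-ratio chi-square tests, together with their exact versions, might look like the following. The sketch assumes the CellCounts data set (variables R, C, and Count) from the section "Inputting Frequency Counts".

```sas
proc freq data=CellCounts;
   weight Count;
   /* CHISQ requests the asymptotic chi-square tests;              */
   /* EXPECTED displays the expected cell frequencies e_ij         */
   tables R*C / chisq expected;
   /* exact Pearson (PCHI) and likelihood-ratio (LRCHI) tests      */
   exact pchi lrchi;
run;
```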
Continuity-Adjusted Chi-Square Test
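The continuity-adjusted chi-square statistic is computed as
\[
Q_C = \sum_i \sum_j \frac{\left[ \max\left( 0, \; |n_{ij} - e_{ij}| - 0.5 \right) \right]^2}{e_{ij}}
\]
Under the null hypothesis of independence, $Q_C$ has an asymptotic chi-square distribution with $(R - 1)(C - 1)$ degrees of freedom.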
Mantel-Haenszel Chi-Square Test
The Mantel-Haenszel chi-square statistic tests the alternative hypothesis that there is
a linear association between the row variable and the column variable. Both variables
must lie on an ordinal scale. The statistic is computed as
\[
Q_{MH} = (n - 1) \, r^2
\]
where $r^2$ is the square of the Pearson correlation between the row variable and the column variable. For a description of the Pearson correlation, see the "Pearson Correlation
Coefficient" section on page 113. The Pearson correlation and, thus, the Mantel-Haenszel chi-square statistic use the scores that you specify in the SCORES= option
in the TABLES statement.
Under the null hypothesis of no association, $Q_{MH}$ has an asymptotic chi-square
distribution with one degree of freedom. In addition to the asymptotic test, PROC
FREQ computes the exact test when you specify the MHCHI or CHISQ option in the
EXACT statement.
Refer to Mantel and Haenszel (1959) and Landis, Heyman, and Koch (1978).
Fisher's Exact Test
Fisher's exact test is another test of association between the row and column variables. This test assumes that the row and column totals are fixed, and then uses the
hypergeometric distribution to compute probabilities of possible tables with these
observed row and column totals. Fisher's exact test does not depend on any large-sample distribution assumptions, and so it is appropriate even for small sample sizes
and for sparse tables.
2 × 2 Tables
For 2 × 2 tables, PROC FREQ gives the following information for Fisher's exact test:
table probability, two-sided p-value, left-sided p-value, and right-sided p-value. The
table probability equals the hypergeometric probability of the observed table, and is
in fact the value of the test statistic for Fisher's exact test.
Where p is the hypergeometric probability of a specific table with the observed row
and column totals, Fisher's exact p-values are computed by summing probabilities p
over defined sets of tables,
\[
\mathrm{PROB} = \sum_{A} p
\]
The two-sided p-value is the sum of all possible table probabilities (for tables having
the observed row and column totals) that are less than or equal to the observed table
probability. So, for the two-sided p-value, the set A includes all possible tables with
hypergeometric probabilities less than or equal to the probability of the observed
table. A small two-sided p-value supports the alternative hypothesis of association
between the row and column variables.
One-sided tests are defined in terms of the frequency of the cell in the first row and
first column of the table, the (1,1) cell. Denoting the observed (1,1) cell frequency
by F, the left-sided p-value for Fisher's exact test is the probability that the (1,1) cell
frequency is less than or equal to F. So, for the left-sided p-value, the set A includes
those tables with a (1,1) cell frequency less than or equal to F. A small left-sided p-value supports the alternative hypothesis that the probability of an observation being
in the first cell is less than expected under the null hypothesis of independent row and
column variables.
Similarly, for a right-sided alternative hypothesis, A is the set of tables where the
frequency of the (1,1) cell is greater than or equal to that in the observed table. A
small right-sided p-value supports the alternative that the probability of the first cell
is greater than that expected under the null hypothesis.
Because the (1,1) cell frequency completely determines the 2 × 2 table when the
marginal row and column sums are fixed, these one-sided alternatives can be equivalently stated in terms of other cell probabilities or ratios of cell probabilities. The
left-sided alternative is equivalent to an odds ratio less than 1, where the odds ratio equals $(n_{11} n_{22}) / (n_{12} n_{21})$. Additionally, the left-sided alternative is equivalent to
the column 1 risk for row 1 being less than the column 1 risk for row 2, $p_{1|1} < p_{1|2}$.
Similarly, the right-sided alternative is equivalent to the column 1 risk for row 1 being
greater than the column 1 risk for row 2, $p_{1|1} > p_{1|2}$. Refer to Agresti (1996).
R × C Tables
Fisher's exact test was extended to general R × C tables by Freeman and Halton
(1951), and this test is also known as the Freeman-Halton test. For R × C tables, the
two-sided p-value is defined the same as it is for 2 × 2 tables. The set A contains all
tables with p less than or equal to the probability of the observed table. A small p-value supports the alternative hypothesis of association between the row and column
variables. For R × C tables, Fisher's exact test is inherently two-sided. The alternative
hypothesis is defined only in terms of general, and not linear, association. Therefore,
PROC FREQ does not provide right-sided or left-sided p-values for general R × C
tables.
For R × C tables, PROC FREQ computes Fisher's exact test using the network algorithm of Mehta and Patel (1983), which provides a faster and more efficient solution
than direct enumeration. See the section Exact Statistics beginning on page 142 for
more details.
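As a usage sketch, the following statements request Fisher's exact test for a two-way table. The example assumes the CellCounts data set (variables R, C, and Count) from the section "Inputting Frequency Counts"; because that table is 2 × 2, the output would include the table probability and the two-sided, left-sided, and right-sided p-values described above.

```sas
proc freq data=CellCounts;
   weight Count;
   /* FISHER in the TABLES statement requests Fisher's exact test */
   tables R*C / fisher;
run;
```

The same test can also be requested with the FISHER option in the EXACT statement, which is the form used for the other exact tests in this chapter.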
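Phi Coefficient
The phi coefficient is a measure of association derived from the Pearson chi-square
statistic. It has the range $-1 \le \phi \le 1$ for 2 × 2 tables. Otherwise, the range is
$0 \le \phi \le \min(\sqrt{R - 1}, \sqrt{C - 1})$. The phi coefficient is computed as
\[
\phi =
\begin{cases}
\dfrac{n_{11} n_{22} - n_{12} n_{21}}{\sqrt{n_{1\cdot} \, n_{2\cdot} \, n_{\cdot 1} \, n_{\cdot 2}}} & \text{for 2 × 2 tables} \\[2ex]
\sqrt{Q_P / n} & \text{otherwise}
\end{cases}
\]
Contingency Coefficient
The contingency coefficient is a measure of association derived from the Pearson chi-square. It has the range $0 \le P \le \sqrt{(m - 1)/m}$, where $m = \min(R, C)$ (Liebetrau
1983). The contingency coefficient is computed as
\[
P = \sqrt{\frac{Q_P}{Q_P + n}}
\]
Cramer's V
Cramer's V is a measure of association derived from the Pearson chi-square. It is computed as
\[
V =
\begin{cases}
\phi & \text{for 2 × 2 tables} \\[1ex]
\sqrt{\dfrac{Q_P / n}{\min(R - 1, \, C - 1)}} & \text{otherwise}
\end{cases}
\]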
Measures of Association
When you specify the MEASURES option in the TABLES statement, PROC FREQ
computes several statistics that describe the association between the two variables
of the contingency table. The following are measures of ordinal association that
consider whether the variable Y tends to increase as X increases: gamma, Kendall's
tau-b, Stuart's tau-c, and Somers' D. These measures are appropriate for ordinal
variables, and they classify pairs of observations as concordant or discordant. A pair
is concordant if the observation with the larger value of X also has the larger value of
Y. A pair is discordant if the observation with the larger value of X has the smaller
value of Y. Refer to Agresti (1996) and the other references cited in the discussion of
each measure of association.
The Pearson correlation coefficient and the Spearman rank correlation coefficient are
also appropriate for ordinal variables. The Pearson correlation describes the strength
of the linear association between the row and column variables, and it is computed
using the row and column scores specified by the SCORES= option in the TABLES
statement. The Spearman correlation is computed with rank scores. The polychoric
correlation (requested by the PLCORR option) also requires ordinal variables and
assumes that the variables have an underlying bivariate normal distribution. The following measures of association do not require ordinal variables, but they are appropriate for nominal variables: lambda asymmetric, lambda symmetric, and uncertainty
coefficients.
PROC FREQ computes estimates of the measures according to the formulas given in
the discussion of each measure of association. For each measure, PROC FREQ computes an asymptotic standard error (ASE), which is the square root of the asymptotic
variance denoted by var in the following sections.
Confidence Limits
If you specify the CL option in the TABLES statement, PROC FREQ computes
asymptotic confidence limits for all MEASURES statistics. The confidence coefficient is determined according to the value of the ALPHA= option, which, by default,
equals 0.05 and produces 95% confidence limits.
The confidence limits are computed as
\[
est \pm ( z_{\alpha/2} \times ASE )
\]
where est is the estimate of the measure, $z_{\alpha/2}$ is the $100(1 - \alpha/2)$ percentile of
the standard normal distribution, and ASE is the asymptotic standard error of the
estimate.
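The following sketch shows how the CL and ALPHA= options combine in a TABLES statement. It assumes the CellCounts data set (variables R, C, and Count) from the section "Inputting Frequency Counts"; the value 0.10 is an arbitrary choice that would give 90% confidence limits.

```sas
proc freq data=CellCounts;
   weight Count;
   /* MEASURES computes the measures of association;          */
   /* CL adds asymptotic confidence limits at level 1 - ALPHA */
   tables R*C / measures cl alpha=0.10;
run;
```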
Asymptotic Tests
For each measure that you specify in the TEST statement, PROC FREQ computes
an asymptotic test of the null hypothesis that the measure equals zero. Asymptotic
tests are available for the following measures of association: gamma, Kendall's tau-b,
Stuart's tau-c, Somers' D(R|C), Somers' D(C|R), the Pearson correlation coefficient, and the Spearman rank correlation coefficient. To compute an asymptotic test,
PROC FREQ uses a standardized test statistic z, which has an asymptotic standard
normal distribution under the null hypothesis. The standardized test statistic is computed as
\[
z = \frac{est}{\sqrt{var_0(est)}}
\]
where est is the estimate of the measure and $var_0(est)$ is the variance of the estimate
under the null hypothesis. Formulas for $var_0(est)$ are given in the discussion of each
measure of association.
The one-sided p-value is computed as
\[
P_1 =
\begin{cases}
\mathrm{Prob}(Z > z) & \text{if } z > 0 \\
\mathrm{Prob}(Z < z) & \text{if } z \le 0
\end{cases}
\]
Exact tests are available for two measures of association, the Pearson correlation coefcient and the Spearman rank correlation coefcient. If you specify the PCORR
option in the EXACT statement, PROC FREQ computes the exact test of the hypothesis that the Pearson correlation equals zero. If you specify the SCORR option in the
EXACT statement, PROC FREQ computes the exact test of the hypothesis that the
Spearman correlation equals zero. See the section Exact Statistics beginning on
page 142 for information on exact tests.
Gamma
The estimator of gamma is based only on the number of concordant and discordant
pairs of observations. It ignores tied pairs (that is, pairs of observations that have
equal values of X or equal values of Y ). Gamma is appropriate only when both
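variables lie on an ordinal scale. It has the range $-1 \le \Gamma \le 1$. If the two variables
are independent, then the estimator of gamma tends to be close to zero. Gamma is
estimated by
\[
G = \frac{P - Q}{P + Q}
\]
with
\[
var = \frac{16}{(P + Q)^4} \sum_i \sum_j n_{ij} \, (Q A_{ij} - P D_{ij})^2
\]
The variance of the estimator under the null hypothesis that gamma equals zero is
computed as
\[
var_0(G) = \frac{4}{(P + Q)^2} \left( \sum_i \sum_j n_{ij} d_{ij}^2 - (P - Q)^2 / n \right)
\]
where $d_{ij} = A_{ij} - D_{ij}$.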
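Kendall's Tau-b
Kendall's tau-b is similar to gamma except that tau-b uses a correction for ties. Tau-b
is appropriate only when both variables lie on an ordinal scale. Tau-b has the range
$-1 \le \tau_b \le 1$. It is estimated by
\[
t_b = \frac{P - Q}{\sqrt{w_r w_c}}
\]
with
\[
var = \frac{1}{w^4} \left( \sum_i \sum_j n_{ij} \, (2 w d_{ij} + t_b v_{ij})^2 - n^3 t_b^2 (w_r + w_c)^2 \right)
\]
where
\[
\begin{aligned}
w &= \sqrt{w_r w_c} \\
w_r &= n^2 - \sum_i n_{i\cdot}^2 \\
w_c &= n^2 - \sum_j n_{\cdot j}^2 \\
d_{ij} &= A_{ij} - D_{ij} \\
v_{ij} &= n_{i\cdot} w_c + n_{\cdot j} w_r
\end{aligned}
\]
The variance of the estimator under the null hypothesis that tau-b equals zero is computed as
\[
var_0(t_b) = \frac{4}{w_r w_c} \left( \sum_i \sum_j n_{ij} d_{ij}^2 - (P - Q)^2 / n \right)
\]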
Stuart's Tau-c
Stuart's tau-c makes an adjustment for table size in addition to a correction for ties.
Tau-c is appropriate only when both variables lie on an ordinal scale. Tau-c has the
range $-1 \le \tau_c \le 1$. It is estimated by
\[
t_c = \frac{m (P - Q)}{n^2 (m - 1)}
\]
with
\[
var = \frac{4 m^2}{(m - 1)^2 n^4} \left( \sum_i \sum_j n_{ij} d_{ij}^2 - (P - Q)^2 / n \right)
\]
where
\[
\begin{aligned}
m &= \min(R, C) \\
d_{ij} &= A_{ij} - D_{ij}
\end{aligned}
\]
The variance of the estimator under the null hypothesis that tau-c equals zero is
\[
var_0(t_c) = var
\]
Refer to Brown and Benedetti (1977).
Somers' D(C|R) and D(R|C)
Somers' D(C|R) and Somers' D(R|C) are asymmetric modifications of tau-b. C|R
denotes that the row variable X is regarded as an independent variable, while the
column variable Y is regarded as dependent. Similarly, R|C denotes that the column
variable Y is regarded as an independent variable, while the row variable X is regarded
as dependent. Somers' D differs from tau-b in that it uses a correction only for pairs
that are tied on the independent variable. Somers' D is appropriate only when both
variables lie on an ordinal scale. It has the range $-1 \le D \le 1$. Formulas for Somers'
D(R|C) are obtained by interchanging the indices.
\[
D(C|R) = \frac{P - Q}{w_r}
\]
with
\[
var = \frac{4}{w_r^4} \sum_i \sum_j n_{ij} \left( w_r d_{ij} - (P - Q)(n - n_{i\cdot}) \right)^2
\]
where
\[
\begin{aligned}
w_r &= n^2 - \sum_i n_{i\cdot}^2 \\
d_{ij} &= A_{ij} - D_{ij}
\end{aligned}
\]
The variance of the estimator under the null hypothesis that D(C|R) equals zero is
computed as
\[
var_0(D(C|R)) = \frac{4}{w_r^2} \left( \sum_i \sum_j n_{ij} d_{ij}^2 - (P - Q)^2 / n \right)
\]
Refer to Somers (1962), Goodman and Kruskal (1979), and Liebetrau (1983).
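Pearson Correlation Coefficient
PROC FREQ computes the Pearson correlation coefficient using the scores specified
in the SCORES= option. The Pearson correlation is appropriate only when both variables lie on an ordinal scale. It has the range $-1 \le \rho \le 1$. The Pearson correlation
coefficient is computed as
\[
r = \frac{v}{w} = \frac{ss_{rc}}{\sqrt{ss_r \, ss_c}}
\]
with
\[
var = \frac{1}{w^4} \sum_i \sum_j n_{ij} \left( w (R_i - \bar{R})(C_j - \bar{C}) - \frac{b_{ij} v}{2 w} \right)^2
\]
The row scores $R_i$ and the column scores $C_j$ are determined by the SCORES= option
in the TABLES statement, and
\[
\begin{aligned}
ss_r &= \sum_i \sum_j n_{ij} (R_i - \bar{R})^2 \\
ss_c &= \sum_i \sum_j n_{ij} (C_j - \bar{C})^2 \\
ss_{rc} &= \sum_i \sum_j n_{ij} (R_i - \bar{R})(C_j - \bar{C}) \\
b_{ij} &= (R_i - \bar{R})^2 ss_c + (C_j - \bar{C})^2 ss_r \\
v &= ss_{rc} \\
w &= \sqrt{ss_r \, ss_c}
\end{aligned}
\]
The standardized test statistic for the asymptotic test of the Pearson correlation is
\[
r^* = \frac{r}{\sqrt{var_0(r)}}
\]
where $var_0(r)$ is the variance of the correlation under the null hypothesis.
\[
var_0(r) = \frac{\sum_i \sum_j n_{ij} (R_i - \bar{R})^2 (C_j - \bar{C})^2 - ss_{rc}^2 / n}{ss_r \, ss_c}
\]
Spearman Rank Correlation Coefficient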
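The Spearman correlation coefficient is computed using rank scores $R1_i$ and $C1_j$,
defined in the section "Scores" beginning on page 102. It is appropriate only when
both variables lie on an ordinal scale. It has the range $-1 \le \rho_s \le 1$. The Spearman
correlation coefficient is computed as
\[
r_s = \frac{v}{w}
\]
with
\[
var = \frac{1}{n^2 w^4} \sum_i \sum_j n_{ij} (z_{ij} - \bar{z})^2
\]
where
\[
\begin{aligned}
v &= \sum_i \sum_j n_{ij} R(i) C(j) \\
w &= \frac{1}{12} \sqrt{F G} \\
F &= n^3 - \sum_i n_{i\cdot}^3 \\
G &= n^3 - \sum_j n_{\cdot j}^3 \\
R(i) &= R1_i - n/2 \\
C(j) &= C1_j - n/2 \\
\bar{z} &= \frac{1}{n} \sum_i \sum_j n_{ij} z_{ij} \\
z_{ij} &= w v_{ij} - v w_{ij} \\
v_{ij} &= n \left( R(i) C(j) + \frac{1}{2} \sum_l n_{il} C(l) + \frac{1}{2} \sum_k n_{kj} R(k) + \sum_l \sum_{k>i} n_{kl} C(l) + \sum_k \sum_{l>j} n_{kl} R(k) \right) \\
w_{ij} &= \frac{-n}{96 w} \left( F n_{\cdot j}^2 + G n_{i\cdot}^2 \right)
\end{aligned}
\]
Refer to Snedecor and Cochran (1989) and Brown and Benedetti (1977).
To compute an asymptotic test for the Spearman correlation, PROC FREQ uses a
standardized test statistic
\[
r_s^* = \frac{r_s}{\sqrt{var_0(r_s)}}
\]
where $var_0(r_s)$ is the variance of the correlation under the null hypothesis.
\[
var_0(r_s) = \frac{1}{n^2 w^2} \sum_i \sum_j n_{ij} (v_{ij} - \bar{v})^2
\]
where
\[
\bar{v} = \sum_i \sum_j n_{ij} v_{ij} / n
\]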
Polychoric Correlation
When you specify the PLCORR option in the TABLES statement, PROC FREQ
computes the polychoric correlation. This measure of association is based on the
assumption that the ordered, categorical variables of the frequency table have an underlying bivariate normal distribution. For 2 × 2 tables, the polychoric correlation is
also known as the tetrachoric correlation. Refer to Drasgow (1986) for an overview
of polychoric correlation. The polychoric correlation coefficient is the maximum
likelihood estimate of the product-moment correlation between the normal variables,
estimating thresholds from the observed table frequencies. The range of the polychoric correlation is from -1 to 1. Olsson (1979) gives the likelihood equations and
an asymptotic covariance matrix for the estimates.
To estimate the polychoric correlation, PROC FREQ iteratively solves the likelihood
equations by a Newton-Raphson algorithm using the Pearson correlation coefficient
as the initial approximation. Iteration stops when the convergence measure falls below the convergence criterion or when the maximum number of iterations is reached,
whichever occurs first. The CONVERGE= option sets the convergence criterion, and
the default value is 0.0001. The MAXITER= option sets the maximum number of
iterations, and the default value is 20.
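A request that tightens the convergence criterion and raises the iteration limit could be written as follows. This is only a sketch: it assumes the CellCounts data set (variables R, C, and Count) from the section "Inputting Frequency Counts", and the CONVERGE= and MAXITER= values are arbitrary.

```sas
proc freq data=CellCounts;
   weight Count;
   /* polychoric (here tetrachoric, since the table is 2 x 2) correlation */
   /* with a stricter convergence criterion and more iterations allowed   */
   tables R*C / plcorr converge=0.00001 maxiter=50;
run;
```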
Lambda Asymmetric
Asymmetric lambda, $\lambda(C|R)$, is interpreted as the probable improvement in predicting the column variable Y given knowledge of the row variable X. Asymmetric
lambda has the range $0 \le \lambda(C|R) \le 1$. It is computed as
\[
\lambda(C|R) = \frac{\sum_i r_i - r}{n - r}
\]
with
\[
var = \frac{n - \sum_i r_i}{(n - r)^3} \left( \sum_i r_i + r - 2 \sum_i (r_i \mid l_i = l) \right)
\]
where
\[
\begin{aligned}
r_i &= \max_j (n_{ij}) \\
r &= \max_j (n_{\cdot j})
\end{aligned}
\]
Also, let $l_i$ be the unique value of j such that $r_i = n_{ij}$, and let $l$ be the unique value
of j such that $r = n_{\cdot j}$.
Because of the uniqueness assumptions, ties in the frequencies or in the marginal
totals must be broken in an arbitrary but consistent manner. In case of ties, $l$ is defined
here as the smallest value of j such that $r = n_{\cdot j}$. For a given i, if there is at least one
value j such that $n_{ij} = r_i = c_j$, then $l_i$ is defined here to be the smallest such value
of j.
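Lambda Symmetric
The nondirectional lambda is the average of the two asymmetric lambdas, $\lambda(C|R)$
and $\lambda(R|C)$. Lambda symmetric has the range $0 \le \lambda \le 1$. Lambda symmetric is
defined as
\[
\lambda = \frac{\sum_i r_i + \sum_j c_j - r - c}{2n - r - c} = \frac{w - v}{w}
\]
with
\[
var = \frac{1}{w^4} \left( w v y - 2 w^2 \left[ n - \sum_i \sum_j (n_{ij} \mid j = l_i, \, i = k_j) \right] - 2 v^2 (n - n_{kl}) \right)
\]
where
\[
\begin{aligned}
c_j &= \max_i (n_{ij}) \\
c &= \max_i (n_{i\cdot}) \\
w &= 2n - r - c \\
v &= 2n - \sum_i r_i - \sum_j c_j \\
x &= \sum_i (r_i \mid l_i = l) + \sum_j (c_j \mid k_j = k) + r_k + c_l \\
y &= 8n - w - v - 2x
\end{aligned}
\]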
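The uncertainty coefficient U(C|R) is computed as
\[
U(C|R) = \frac{H(X) + H(Y) - H(XY)}{H(Y)} = \frac{v}{w}
\]
with
\[
var = \frac{1}{n^2 w^4} \sum_i \sum_j n_{ij} \left( H(Y) \ln\!\left( \frac{n_{ij}}{n_{i\cdot}} \right) + (H(X) - H(XY)) \ln\!\left( \frac{n_{\cdot j}}{n} \right) \right)^2
\]
where
\[
\begin{aligned}
v &= H(X) + H(Y) - H(XY) \\
w &= H(Y) \\
H(X) &= - \sum_i \frac{n_{i\cdot}}{n} \ln\!\left( \frac{n_{i\cdot}}{n} \right) \\
H(Y) &= - \sum_j \frac{n_{\cdot j}}{n} \ln\!\left( \frac{n_{\cdot j}}{n} \right) \\
H(XY) &= - \sum_i \sum_j \frac{n_{ij}}{n} \ln\!\left( \frac{n_{ij}}{n} \right)
\end{aligned}
\]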
Refer to Theil (1972, pp. 115120) and Goodman and Kruskal (1979).
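Uncertainty Coefficient (U)
The uncertainty coefficient, U, is the symmetric version of the two asymmetric coefficients. It has the range $0 \le U \le 1$. It is defined as
\[
U = \frac{2 \, (H(X) + H(Y) - H(XY))}{H(X) + H(Y)}
\]
with
\[
var = 4 \sum_i \sum_j \frac{ n_{ij} \left( H(XY) \ln\!\left( \dfrac{n_{i\cdot} \, n_{\cdot j}}{n^2} \right) - (H(X) + H(Y)) \ln\!\left( \dfrac{n_{ij}}{n} \right) \right)^2 }{ n^2 \, (H(X) + H(Y))^4 }
\]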
Binomial Proportion
When you specify the BINOMIAL option in the TABLES statement, PROC FREQ
computes a binomial proportion for one-way tables. By default this is the proportion
of observations in the first variable level, or class, that appears in the output. To
specify a different level, use the LEVEL= option.
\[
\hat{p} = n_1 / n
\]
where $n_1$ is the frequency for the first level and n is the total frequency for the one-way table. The standard error for the binomial proportion is computed as
\[
se(\hat{p}) = \sqrt{\hat{p} (1 - \hat{p}) / n}
\]
Using the normal approximation to the binomial distribution, PROC FREQ constructs
asymptotic confidence limits for p according to
\[
\hat{p} \pm ( z_{\alpha/2} \times se(\hat{p}) )
\]
where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$ percentile of the standard normal distribution. The
confidence level is determined by the ALPHA= option, which, by default, equals
0.05 and produces 95% confidence limits.
If you specify the BINOMIALC option, PROC FREQ includes a continuity correction of 1/(2n) in the asymptotic confidence limits for p. The purpose of this correction
is to adjust for the difference between the normal approximation and the binomial distribution, which is a discrete distribution. Refer to Fleiss (1981). With the continuity
correction, the asymptotic confidence limits for p are
\[
\hat{p} \pm ( z_{\alpha/2} \times se(\hat{p}) + 1/(2n) )
\]
Additionally, PROC FREQ computes exact confidence limits for the binomial proportion using the F distribution method given in Collett (1991) and also described by
Leemis and Trivedi (1996).
PROC FREQ computes an asymptotic test of the hypothesis that the binomial proportion equals $p_0$, where the value of $p_0$ is specified by the P= option in the TABLES
statement. If you do not specify a value for the P= option, PROC FREQ uses $p_0 = 0.5$
by default. The asymptotic test statistic is
\[
z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}
\]
If you specify the BINOMIALC option, PROC FREQ includes a continuity correction in the asymptotic test statistic to adjust for the difference between the
normal approximation and the discrete binomial distribution. Refer to Fleiss (1981).
The continuity correction of 1/(2n) is subtracted from $(\hat{p} - p_0)$ in the numerator of
the test statistic z if $(\hat{p} - p_0)$ is positive; otherwise, the continuity correction is added
to the numerator.
PROC FREQ computes one-sided and two-sided p-values for this test. When the test
statistic z is greater than zero, its expected value under the null hypothesis, PROC
FREQ computes the right-sided p-value, which is the probability of a larger value
of the statistic occurring under the null hypothesis. A small right-sided p-value supports the alternative hypothesis that the true value of the proportion is greater than
p0 . When the test statistic is less than or equal to zero, PROC FREQ computes the
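left-sided p-value, which is the probability of a smaller value of the statistic occurring
under the null hypothesis. The one-sided p-value $P_1$ can be expressed as
\[ P_1 = \begin{cases} \mathrm{Prob}(Z > z) & \text{if } z > 0 \\ \mathrm{Prob}(Z < z) & \text{if } z \le 0 \end{cases} \]
where $Z$ has a standard normal distribution.
For the exact test of the binomial proportion, PROC FREQ uses the binomial probability function
\[ \mathrm{Prob}(X = x \mid p_0) = \binom{n}{x}\, p_0^{\,x} (1 - p_0)^{(n - x)}, \qquad x = 0, 1, 2, \ldots, n \]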
where the variable $X$ has a binomial distribution with parameters $n$ and $p_0$. To compute $\mathrm{Prob}(X \le n_1)$, PROC FREQ sums these binomial probabilities over $x$ from
zero to $n_1$. To compute $\mathrm{Prob}(X \ge n_1)$, PROC FREQ sums these binomial probabilities over $x$ from $n_1$ to $n$. Then the exact one-sided p-value is
\[ P_1 = \min\left( \mathrm{Prob}(X \le n_1 \mid p_0),\ \mathrm{Prob}(X \ge n_1 \mid p_0) \right) \]
and the exact two-sided p-value is
\[ P_2 = 2 \times P_1 \]
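The exact p-values can be reproduced by direct summation of the binomial probabilities. The sketch below is illustrative only; `exact_binomial_p_values` is a hypothetical name, and the two-sided value follows the formula in the text literally (no cap at 1 is described there).

```python
from math import comb


def exact_binomial_p_values(n1, n, p0=0.5):
    """Exact one-sided and two-sided p-values for H0: p = p0."""
    pmf = [comb(n, x) * p0**x * (1 - p0) ** (n - x) for x in range(n + 1)]
    lower = sum(pmf[: n1 + 1])  # Prob(X <= n1 | p0)
    upper = sum(pmf[n1:])       # Prob(X >= n1 | p0)
    p1 = min(lower, upper)
    return p1, 2 * p1


print(exact_binomial_p_values(8, 10))
```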
            Column 1    Column 2    Total
Row 1       n11         n12         n1.
Row 2       n21         n22         n2.
Total       n.1         n.2         n
The column 1 risk for row 1 is the proportion of row 1 observations classified in
column 1,
\[ p_{1|1} = n_{11} / n_{1\cdot} \]
This estimates the conditional probability of the column 1 response, given the first
level of the row variable.
The column 1 risk for row 2 is the proportion of row 2 observations classified in
column 1,
\[ p_{1|2} = n_{21} / n_{2\cdot} \]
and the overall column 1 risk is the proportion of all observations classified in
column 1,
\[ p_{\cdot 1} = n_{\cdot 1} / n \]
The column 1 risk difference compares the risks for the two rows, and it is computed
as the column 1 risk for row 1 minus the column 1 risk for row 2,
\[ (p_{\mathrm{diff}})_1 = p_{1|1} - p_{1|2} \]
The risks and risk difference are defined similarly for column 2.
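The standard error of the column 1 risk estimate for row $i$ is computed as
\[ \mathrm{se}(p_{1|i}) = \sqrt{ p_{1|i} (1 - p_{1|i}) / n_{i\cdot} } \]
and the standard error of the overall column 1 risk estimate is computed as
\[ \mathrm{se}(p_{\cdot 1}) = \sqrt{ p_{\cdot 1} (1 - p_{\cdot 1}) / n } \]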
If the two rows represent independent binomial samples, the standard error for the
column 1 risk difference is computed as
\[ \mathrm{se}\left( (p_{\mathrm{diff}})_1 \right) = \sqrt{ \mathrm{var}(p_{1|1}) + \mathrm{var}(p_{1|2}) } \]
The standard errors are computed in a similar manner for the column 2 risks and risk
difference.
Using the normal approximation to the binomial distribution, PROC FREQ constructs
asymptotic confidence limits for the risks and risk differences according to
\[ \mathrm{est} \pm \left( z_{\alpha/2} \times \mathrm{se}(\mathrm{est}) \right) \]
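A minimal Python sketch of the column 1 risk difference and its asymptotic limits follows. The function name `risk_difference` is hypothetical, and the code assumes the two rows are independent binomial samples, as stated above.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference(n11, n12, n21, n22, alpha=0.05):
    """Column 1 risk difference (row 1 minus row 2) with asymptotic limits."""
    r1, r2 = n11 + n12, n21 + n22        # row totals
    p1, p2 = n11 / r1, n21 / r2          # column 1 risks for rows 1 and 2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / r1 + p2 * (1 - p2) / r2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, se, (diff - z * se, diff + z * se)


print(risk_difference(30, 70, 20, 80))
```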
The odds ratio is a useful measure of association for a variety of study designs. For a
retrospective design called a case-control study, the odds ratio can be used to estimate
the relative risk when the probability of positive response is small (Agresti 1990). In a
case-control study, two independent samples are identified based on a binary (yes-no)
response variable, and the conditional distribution of a binary explanatory variable is
examined, within fixed levels of the response variable. Refer to Stokes, Davis, and
Koch (1995) and Agresti (1996).
The odds of a positive response (column 1) in row 1 is $n_{11}/n_{12}$. Similarly, the odds
of a positive response in row 2 is $n_{21}/n_{22}$. The odds ratio is formed as the ratio of
the row 1 odds to the row 2 odds. The odds ratio for 2 × 2 tables is defined as
\[ OR = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}} \]
The odds ratio can be any nonnegative number. When the row and column variables
are independent, the true value of the odds ratio equals 1. An odds ratio greater than 1
indicates that the odds of a positive response are higher in row 1 than in row 2. Values
less than 1 indicate the odds of positive response are higher in row 2. The strength of
association increases with the deviation from 1.
The transformation $G = (OR - 1)/(OR + 1)$ transforms the odds ratio to the range
$(-1, 1)$ with $G = 0$ when $OR = 1$; $G = -1$ when $OR = 0$; and $G$ approaches 1
as $OR$ approaches infinity. $G$ is the gamma statistic, which PROC FREQ computes
when you specify the MEASURES option.
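The asymptotic $100(1 - \alpha)\%$ confidence limits for the odds ratio are
\[ \left( OR \times \exp(-z \sqrt{v}),\ OR \times \exp(z \sqrt{v}) \right) \]
where
\[ v = \mathrm{var}(\ln OR) = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}} \]
and $z$ is the $100(1 - \alpha/2)$ percentile of the standard normal distribution. If any of the
four cell frequencies are zero, the estimates are not computed.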
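The odds ratio and its asymptotic limits can be sketched in a few lines of Python. This is an illustration under the formulas above, not PROC FREQ's code; `odds_ratio_limits` is a hypothetical name, and returning `None` for a zero cell is this sketch's way of mirroring "the estimates are not computed."

```python
from math import exp, sqrt
from statistics import NormalDist


def odds_ratio_limits(n11, n12, n21, n22, alpha=0.05):
    """Odds ratio and asymptotic confidence limits for a 2 x 2 table."""
    if 0 in (n11, n12, n21, n22):
        return None  # estimates are not computed when any cell is zero
    odds_ratio = (n11 * n22) / (n12 * n21)
    v = 1 / n11 + 1 / n12 + 1 / n21 + 1 / n22  # var(ln OR)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(v)
    return odds_ratio, (odds_ratio * exp(-half), odds_ratio * exp(half))


print(odds_ratio_limits(10, 20, 5, 40))
```

Because the limits are built on the log scale, they are symmetric around the estimate in ratio terms rather than in absolute terms.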
When you specify the OR option in the EXACT statement, PROC FREQ computes exact
confidence limits for the odds ratio. Because this is a discrete problem, the confidence
coefficient for these exact confidence limits is not exactly $1 - \alpha$ but is at least $1 - \alpha$.
Thus, these confidence limits are conservative. Refer to Agresti (1992).
PROC FREQ computes exact confidence limits for the odds ratio with an algorithm
based on that presented by Thomas (1971). Refer also to Gart (1971). The following
two equations are solved iteratively for the lower and upper confidence limits, $\phi_1$ and
$\phi_2$.
\[ \sum_{i=n_{11}}^{n_{\cdot 1}} \binom{n_{1\cdot}}{i} \binom{n_{2\cdot}}{n_{\cdot 1} - i} \phi_1^{\,i} \Bigg/ \sum_{i=0}^{n_{\cdot 1}} \binom{n_{1\cdot}}{i} \binom{n_{2\cdot}}{n_{\cdot 1} - i} \phi_1^{\,i} = \alpha/2 \]
\[ \sum_{i=0}^{n_{11}} \binom{n_{1\cdot}}{i} \binom{n_{2\cdot}}{n_{\cdot 1} - i} \phi_2^{\,i} \Bigg/ \sum_{i=0}^{n_{\cdot 1}} \binom{n_{1\cdot}}{i} \binom{n_{2\cdot}}{n_{\cdot 1} - i} \phi_2^{\,i} = \alpha/2 \]
When the odds ratio equals zero, which occurs when either $n_{11} = 0$ or $n_{22} = 0$, then
PROC FREQ sets the lower exact confidence limit to zero and determines the upper
limit with level $\alpha$. Similarly, when the odds ratio equals infinity, which occurs when
either $n_{12} = 0$ or $n_{21} = 0$, then PROC FREQ sets the upper exact confidence limit
to infinity and determines the lower limit with level $\alpha$.
Relative Risks (Cohort Studies)
These measures of relative risk are useful in cohort (prospective) study designs,
where two samples are identified based on the presence or absence of an explanatory
factor. The two samples are observed in future time for the binary (yes-no) response
variable under study. Relative risk measures are also useful in cross-sectional studies,
where two variables are observed simultaneously. Refer to Stokes, Davis, and Koch
(1995) and Agresti (1996).
The column 1 relative risk is the ratio of the column 1 risks for row 1 to row 2.
The column 1 risk for row 1 is the proportion of the row 1 observations classified in
column 1,
\[ p_{1|1} = n_{11} / n_{1\cdot} \]
and the column 1 relative risk is computed as
\[ RR_1 = \frac{p_{1|1}}{p_{1|2}} \]
A relative risk greater than 1 indicates that the probability of positive response is
greater in row 1 than in row 2. Similarly, a relative risk less than 1 indicates that
the probability of positive response is less in row 1 than in row 2. The strength of
association increases with the deviation from 1.
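The asymptotic $100(1 - \alpha)\%$ confidence limits for the column 1 relative risk are
\[ \left( RR_1 \times \exp(-z \sqrt{v}),\ RR_1 \times \exp(z \sqrt{v}) \right) \]
where
\[ v = \mathrm{var}(\ln RR_1) = \frac{1 - p_{1|1}}{n_{11}} + \frac{1 - p_{1|2}}{n_{21}} \]
and $z$ is the $100(1 - \alpha/2)$ percentile of the standard normal distribution. If either $n_{11}$
or $n_{21}$ is zero, the estimates are not computed.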
PROC FREQ computes the column 2 relative risks in a similar manner.
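The column 1 relative risk and its limits can be sketched the same way as the odds ratio. The function name `relative_risk_limits` is hypothetical, and the code is an illustration of the formulas above rather than the procedure's implementation.

```python
from math import exp, sqrt
from statistics import NormalDist


def relative_risk_limits(n11, n12, n21, n22, alpha=0.05):
    """Column 1 relative risk (row 1 over row 2) with asymptotic limits."""
    if n11 == 0 or n21 == 0:
        return None  # estimates are not computed in this case
    p1 = n11 / (n11 + n12)  # column 1 risk for row 1
    p2 = n21 / (n21 + n22)  # column 1 risk for row 2
    rr = p1 / p2
    v = (1 - p1) / n11 + (1 - p2) / n21  # var(ln RR)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(v)
    return rr, (rr * exp(-half), rr * exp(half))


print(relative_risk_limits(10, 20, 5, 40))
```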
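The Cochran-Armitage trend test statistic is computed as
\[ T = \frac{ \sum_{i=1}^{R} n_{i1} (R_i - \bar{R}) }{ \sqrt{ p_{\cdot 1} (1 - p_{\cdot 1})\, s^2 } } \]
where
\[ s^2 = \sum_{i=1}^{R} n_{i\cdot} (R_i - \bar{R})^2 \]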
The row scores Ri are determined by the value of the SCORES= option in the
TABLES statement. By default, PROC FREQ uses table scores. For character variables, the table scores for the row variable are the row numbers (for example, 1 for
the first row, 2 for the second row, and so on). For numeric variables, the table score
for each row is the numeric value of the row level. When you perform the trend
test, the explanatory variable may be numeric (for example, dose of a test substance),
and these variable values may be appropriate scores. If the explanatory variable has
ordinal levels that are not numeric, you can assign meaningful scores to the variable
levels. Sometimes equidistant scores, such as the table scores for a character variable,
may be appropriate. For more information on choosing scores for the trend test, refer
to Margolin (1988).
The null hypothesis for the Cochran-Armitage test is no trend, which means that the
binomial proportion $p_{i1} = n_{i1}/n_{i\cdot}$ is the same for all levels of the explanatory variable. Under this null hypothesis, the trend test statistic is asymptotically distributed as
a standard normal random variable. In addition to this asymptotic test, PROC FREQ
can compute the exact trend test, which you request by specifying the TREND option
in the EXACT statement. See the section "Exact Statistics" beginning on page 142
for information on exact tests.
PROC FREQ computes one-sided and two-sided p-values for the trend test. When the
test statistic is greater than its null hypothesis expected value of zero, PROC FREQ
computes the right-sided p-value, which is the probability of a larger value of the
statistic occurring under the null hypothesis. A small right-sided p-value supports
the alternative hypothesis of increasing trend in binomial proportions from row 1 to
row R. When the test statistic is less than or equal to zero, PROC FREQ outputs the
left-sided p-value. A small left-sided p-value supports the alternative of decreasing
trend.
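The one-sided p-value $P_1$ can be expressed as
\[ P_1 = \begin{cases} \mathrm{Prob}(\text{Trend Statistic} > T) & \text{if } T > 0 \\ \mathrm{Prob}(\text{Trend Statistic} < T) & \text{if } T \le 0 \end{cases} \]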
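The trend statistic and its one-sided p-value can be sketched as follows. This is an illustration only: `cochran_armitage` is a hypothetical name, the default scores are the row numbers (the table scores for a character variable), and the sketch assumes $\bar{R}$ is the frequency-weighted mean score $\sum_i n_{i\cdot} R_i / n$, which the surviving text does not state explicitly.

```python
from math import sqrt
from statistics import NormalDist


def cochran_armitage(table, scores=None):
    """Trend statistic T and one-sided p-value for an R x 2 table.

    table: list of (n_i1, n_i2) pairs, one per row.
    """
    scores = scores or list(range(1, len(table) + 1))
    row_tot = [a + b for a, b in table]
    n = sum(row_tot)
    p_col1 = sum(a for a, _ in table) / n
    # Assumption: R-bar is the weighted mean of the row scores.
    r_bar = sum(t * s for t, s in zip(row_tot, scores)) / n
    num = sum(a * (s - r_bar) for (a, _), s in zip(table, scores))
    s2 = sum(t * (s - r_bar) ** 2 for t, s in zip(row_tot, scores))
    t_stat = num / sqrt(p_col1 * (1 - p_col1) * s2)
    std_normal = NormalDist()
    p1 = 1 - std_normal.cdf(t_stat) if t_stat > 0 else std_normal.cdf(t_stat)
    return t_stat, p1


print(cochran_armitage([(1, 9), (5, 5), (9, 1)]))
```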
Jonckheere-Terpstra Test
The JT option in the TABLES statement requests the Jonckheere-Terpstra test, which
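is a nonparametric test for ordered differences among classes. It tests the null hypothesis that the distribution of the response variable does not differ among classes.
The test is based on Mann-Whitney counts $M_{i,i'}$, computed for each pair of rows $i < i'$ as
\[ M_{i,i'} = \{ \text{number of times } X_{i,j} < X_{i',j'},\ j = 1, \ldots, n_{i\cdot};\ j' = 1, \ldots, n_{i'\cdot} \} + \tfrac{1}{2} \{ \text{number of times } X_{i,j} = X_{i',j'},\ j = 1, \ldots, n_{i\cdot};\ j' = 1, \ldots, n_{i'\cdot} \} \]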
where $X_{i,j}$ is response $j$ in row $i$. Then the Jonckheere-Terpstra test statistic is computed as
\[ J = \sum_{1 \le i < i' \le R} M_{i,i'} \]
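This test rejects the null hypothesis of no difference among classes for large values
of $J$. Asymptotic p-values for the Jonckheere-Terpstra test are obtained by using
the normal approximation for the distribution of the standardized test statistic. The
standardized test statistic is computed as
\[ J^* = \frac{J - E_0(J)}{\sqrt{\mathrm{var}_0(J)}} \]
where $E_0(J)$ and $\mathrm{var}_0(J)$ are the expected value and variance of the test statistic
under the null hypothesis.
\[ E_0(J) = \left( n^2 - \sum_i n_{i\cdot}^2 \right) / 4 \]
\[ \mathrm{var}_0(J) = \frac{A}{72} + \frac{B}{36\, n(n-1)(n-2)} + \frac{C}{8\, n(n-1)} \]
where
\[ A = n(n-1)(2n+5) - \sum_i n_{i\cdot}(n_{i\cdot}-1)(2n_{i\cdot}+5) - \sum_j n_{\cdot j}(n_{\cdot j}-1)(2n_{\cdot j}+5) \]
\[ B = \left[ \sum_i n_{i\cdot}(n_{i\cdot}-1)(n_{i\cdot}-2) \right] \left[ \sum_j n_{\cdot j}(n_{\cdot j}-1)(n_{\cdot j}-2) \right] \]
\[ C = \left[ \sum_i n_{i\cdot}(n_{i\cdot}-1) \right] \left[ \sum_j n_{\cdot j}(n_{\cdot j}-1) \right] \]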
In addition to this asymptotic test, PROC FREQ can compute the exact Jonckheere-Terpstra test, which you request by specifying the JT option in the EXACT statement.
See the section "Exact Statistics" beginning on page 142 for information on exact
tests.
PROC FREQ computes one-sided and two-sided p-values for the Jonckheere-Terpstra
test. When the standardized test statistic is greater than its null hypothesis expected
value of zero, PROC FREQ computes the right-sided p-value, which is the probability
of a larger value of the statistic occurring under the null hypothesis. A small right-sided p-value supports the alternative hypothesis of increasing order from row 1 to
row R. When the standardized test statistic is less than or equal to zero, PROC FREQ
computes the left-sided p-value. A small left-sided p-value supports the alternative
of decreasing order from row 1 to row R.
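The one-sided p-value $P_1$ can be expressed as
\[ P_1 = \begin{cases} \mathrm{Prob}(\text{Std JT Statistic} > J^*) & \text{if } J^* > 0 \\ \mathrm{Prob}(\text{Std JT Statistic} < J^*) & \text{if } J^* \le 0 \end{cases} \]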
McNemar's Test
PROC FREQ computes McNemar's test for 2 × 2 tables when you specify the AGREE
option. McNemar's test is appropriate when you are analyzing data from matched
pairs of subjects with a dichotomous (yes-no) response. It tests the null hypothesis of
marginal homogeneity, or $p_{1\cdot} = p_{\cdot 1}$. McNemar's test is computed as
\[ Q_M = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}} \]
Under the null hypothesis, $Q_M$ has an asymptotic chi-square distribution with one
degree of freedom. Refer to McNemar (1947), as well as the references cited in the
preceding section. In addition to the asymptotic test, PROC FREQ also computes
the exact p-value for McNemar's test when you specify the MCNEM option in the
EXACT statement.
Bowker's Test of Symmetry
For Bowker's test of symmetry, the null hypothesis is that the probabilities in the
square table satisfy symmetry or that $p_{ij} = p_{ji}$ for all pairs of table cells. When there
are more than two categories, Bowker's test of symmetry is calculated as
\[ Q_B = \sum\sum_{i<j} \frac{(n_{ij} - n_{ji})^2}{n_{ij} + n_{ji}} \]
For large samples, $Q_B$ has an asymptotic chi-square distribution with $R(R - 1)/2$
degrees of freedom under the null hypothesis of symmetry of the expected counts.
Refer to Bowker (1948). For two categories, this test of symmetry is identical to
McNemar's test.
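Both statistics can be sketched with one helper, since McNemar's test is the two-category case of Bowker's. The names below are hypothetical; skipping cell pairs whose total is zero is an assumption of this sketch (the text does not say how they are handled), and the p-value helper covers only the one-degree-of-freedom case, where a chi-square variable is the square of a standard normal variable.

```python
from math import sqrt
from statistics import NormalDist


def bowker_statistic(table):
    """Bowker's Q_B and its degrees of freedom for a square table."""
    r = len(table)
    q_b = 0.0
    for i in range(r):
        for j in range(i + 1, r):
            total = table[i][j] + table[j][i]
            if total > 0:  # assumption: empty off-diagonal pairs contribute nothing
                q_b += (table[i][j] - table[j][i]) ** 2 / total
    return q_b, r * (r - 1) // 2


def mcnemar_test(table):
    """McNemar's Q_M for a 2 x 2 table and its asymptotic p-value (1 df)."""
    q_m, _ = bowker_statistic(table)
    p_value = 2 * (1 - NormalDist().cdf(sqrt(q_m)))
    return q_m, p_value


print(mcnemar_test([[10, 15], [5, 20]]))
print(bowker_statistic([[5, 3, 1], [1, 6, 2], [4, 0, 7]]))
```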
Simple Kappa Coefficient
The simple kappa coefficient is computed as
\[ \hat{\kappa} = \frac{P_o - P_e}{1 - P_e} \]
where $P_o = \sum_i p_{ii}$ and $P_e = \sum_i p_{i\cdot}\, p_{\cdot i}$. If the two response variables are viewed as
two independent ratings of the $n$ subjects, the kappa coefficient equals +1 when there
is complete agreement of the raters. When the observed agreement exceeds chance
agreement, kappa is positive, with its magnitude reflecting the strength of agreement.
Although this is unusual in practice, kappa is negative when the observed agreement
is less than chance agreement. The minimum value of kappa is between −1 and 0,
depending on the marginal proportions.
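The asymptotic variance of the simple kappa coefficient can be estimated by the following, according to Fleiss, Cohen, and Everitt (1969):
\[ \mathrm{var} = \frac{A + B - C}{(1 - P_e)^2\, n} \]
where
\[ A = \sum_i p_{ii} \left[ 1 - (p_{i\cdot} + p_{\cdot i})(1 - \hat{\kappa}) \right]^2 \]
\[ B = (1 - \hat{\kappa})^2 \sum\sum_{i \ne j} p_{ij} (p_{\cdot i} + p_{j\cdot})^2 \]
and
\[ C = \left[ \hat{\kappa} - P_e (1 - \hat{\kappa}) \right]^2 \]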
PROC FREQ computes confidence limits for the simple kappa coefficient according
to
\[ \hat{\kappa} \pm \left( z_{\alpha/2} \times \sqrt{\mathrm{var}} \right) \]
where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$ percentile of the standard normal distribution. The
value of $\alpha$ is determined by the value of the ALPHA= option, which, by default,
equals 0.05 and produces 95% confidence limits.
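To compute an asymptotic test for the kappa coefficient, PROC FREQ uses a standardized test statistic $\hat{\kappa}^*$, which has an asymptotic standard normal distribution under
the null hypothesis that kappa equals zero. The standardized test statistic is computed as
\[ \hat{\kappa}^* = \frac{\hat{\kappa}}{\sqrt{\mathrm{var}_0(\hat{\kappa})}} \]
where $\mathrm{var}_0(\hat{\kappa})$ is the variance of the kappa coefficient under the null hypothesis.
\[ \mathrm{var}_0(\hat{\kappa}) = \frac{ P_e + P_e^2 - \sum_i p_{i\cdot}\, p_{\cdot i} (p_{i\cdot} + p_{\cdot i}) }{ (1 - P_e)^2\, n } \]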
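The simple kappa coefficient and its two variance estimates can be sketched directly from the formulas above. The function name `simple_kappa` is hypothetical, and the code is a hand check of the formulas rather than PROC FREQ's implementation.

```python
def simple_kappa(table):
    """Return (kappa, var, var0) for a square table of counts."""
    r = len(table)
    n = sum(map(sum, table))
    p = [[cell / n for cell in row] for row in table]
    row = [sum(p[i]) for i in range(r)]                        # p_i.
    col = [sum(p[i][j] for i in range(r)) for j in range(r)]   # p_.j
    p_o = sum(p[i][i] for i in range(r))
    p_e = sum(row[i] * col[i] for i in range(r))
    kappa = (p_o - p_e) / (1 - p_e)
    a = sum(p[i][i] * (1 - (row[i] + col[i]) * (1 - kappa)) ** 2 for i in range(r))
    b = (1 - kappa) ** 2 * sum(
        p[i][j] * (col[i] + row[j]) ** 2
        for i in range(r) for j in range(r) if i != j
    )
    c = (kappa - p_e * (1 - kappa)) ** 2
    var = (a + b - c) / ((1 - p_e) ** 2 * n)   # used for confidence limits
    var0 = (p_e + p_e**2 - sum(row[i] * col[i] * (row[i] + col[i]) for i in range(r))) / (
        (1 - p_e) ** 2 * n
    )                                          # used for the test of kappa = 0
    return kappa, var, var0


print(simple_kappa([[20, 5], [10, 15]]))
```

Dividing kappa by the square root of `var0` gives the standardized test statistic, while `var` feeds the confidence limits.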
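Weighted Kappa Coefficient
The weighted kappa coefficient is computed as
\[ \hat{\kappa}_w = \frac{P_{o(w)} - P_{e(w)}}{1 - P_{e(w)}} \]
where
\[ P_{o(w)} = \sum_i \sum_j w_{ij}\, p_{ij} \]
and
\[ P_{e(w)} = \sum_i \sum_j w_{ij}\, p_{i\cdot}\, p_{\cdot j} \]
and $w_{ij}$ denotes the weight for table cell $(i, j)$.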
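The asymptotic variance of the weighted kappa coefficient can be estimated by the
following, according to Fleiss, Cohen, and Everitt (1969):
\[ \mathrm{var} = \frac{ \sum_i \sum_j p_{ij} \left[ w_{ij} - (\bar{w}_{i\cdot} + \bar{w}_{\cdot j})(1 - \hat{\kappa}_w) \right]^2 - \left[ \hat{\kappa}_w - P_{e(w)} (1 - \hat{\kappa}_w) \right]^2 }{ (1 - P_{e(w)})^2\, n } \]
where
\[ \bar{w}_{i\cdot} = \sum_j p_{\cdot j}\, w_{ij} \]
and
\[ \bar{w}_{\cdot j} = \sum_i p_{i\cdot}\, w_{ij} \]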
PROC FREQ computes confidence limits for the weighted kappa coefficient according to
\[ \hat{\kappa}_w \pm \left( z_{\alpha/2} \times \sqrt{\mathrm{var}} \right) \]
where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$ percentile of the standard normal distribution. The
value of $\alpha$ is determined by the value of the ALPHA= option, which, by default,
equals 0.05 and produces 95% confidence limits.
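To compute an asymptotic test for the weighted kappa coefficient, PROC FREQ uses
a standardized test statistic $\hat{\kappa}_w^*$, which has an asymptotic standard normal distribution
under the null hypothesis that weighted kappa equals zero. The standardized test
statistic is computed as
\[ \hat{\kappa}_w^* = \frac{\hat{\kappa}_w}{\sqrt{\mathrm{var}_0(\hat{\kappa}_w)}} \]
where $\mathrm{var}_0(\hat{\kappa}_w)$ is the variance of the weighted kappa coefficient under the null hypothesis.
\[ \mathrm{var}_0(\hat{\kappa}_w) = \frac{ \sum_i \sum_j p_{i\cdot}\, p_{\cdot j} \left[ w_{ij} - (\bar{w}_{i\cdot} + \bar{w}_{\cdot j}) \right]^2 - P_{e(w)}^2 }{ (1 - P_{e(w)})^2\, n } \]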
PROC FREQ computes kappa coefficient weights using the column scores and one of
two available weight types. The column scores are determined by the SCORES= option in the TABLES statement. The two available weight types are Cicchetti-Allison
and Fleiss-Cohen, and PROC FREQ uses the Cicchetti-Allison type by default. If you
specify (WT=FC) with the AGREE option, then PROC FREQ uses the Fleiss-Cohen
weight type to construct kappa weights.
PROC FREQ computes Cicchetti-Allison kappa coefficient weights as
\[ w_{ij} = 1 - \frac{|C_i - C_j|}{C_C - C_1} \]
where $C_i$ is the score for column $i$, and $C$ is the number of categories or columns.
You can specify the score type using the SCORES= option in the TABLES statement; if you do not specify the SCORES= option, PROC FREQ uses table scores.
For numeric variables, table scores are the values of the numeric row and column
headings. You can assign numeric values to the categories in a way that reflects their
level of similarity. For example, suppose you have four categories and order them
according to similarity. If you assign them values of 0, 2, 4, and 10, the following
weights are used for computing the weighted kappa coefficient: $w_{12} = 0.8$, $w_{13} = 0.6$,
$w_{14} = 0$, $w_{23} = 0.8$, $w_{24} = 0.2$, and $w_{34} = 0.4$. Note that when there are only two
categories (that is, $C = 2$), the weighted kappa coefficient is identical to the simple
kappa coefficient.
If you specify (WT=FC) with the AGREE option in the TABLES statement, PROC
FREQ computes Fleiss-Cohen kappa coefficient weights using a form similar to that
given by Fleiss and Cohen (1973).
\[ w_{ij} = 1 - \frac{(C_i - C_j)^2}{(C_C - C_1)^2} \]
For the preceding example, the weights used for computing the weighted kappa coefficient are: $w_{12} = 0.96$, $w_{13} = 0.84$, $w_{14} = 0$, $w_{23} = 0.96$, $w_{24} = 0.36$, and $w_{34} = 0.64$.
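The two weight types can be reproduced for the example scores 0, 2, 4, and 10 with a short Python sketch. The function name `kappa_weights` and the `kind` argument are hypothetical; the scores are assumed to be sorted so that the first and last entries are $C_1$ and $C_C$.

```python
def kappa_weights(scores, kind="CA"):
    """Matrix of kappa weights: Cicchetti-Allison ("CA") or Fleiss-Cohen ("FC")."""
    span = scores[-1] - scores[0]  # C_C - C_1, assuming sorted scores
    weights = []
    for c_i in scores:
        row = []
        for c_j in scores:
            if kind == "CA":
                row.append(1 - abs(c_i - c_j) / span)
            else:
                row.append(1 - (c_i - c_j) ** 2 / span**2)
        weights.append(row)
    return weights


for kind in ("CA", "FC"):
    for row in kappa_weights([0, 2, 4, 10], kind):
        print(kind, [round(w, 2) for w in row])
```

The Fleiss-Cohen weights penalize near-disagreements less and distant disagreements relatively more than the Cicchetti-Allison weights.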
Overall Kappa Coefficient
When there are multiple strata, PROC FREQ combines the stratum-level estimates of
kappa into an overall estimate of the supposed common value of kappa. Assume there
are $q$ strata, indexed by $h = 1, 2, \ldots, q$, and let $\mathrm{var}(\hat{\kappa}_h)$ denote the squared standard
error of $\hat{\kappa}_h$. Then the estimate of the overall kappa, according to Fleiss (1981), is
computed as
\[ \hat{\kappa}_{\mathrm{overall}} = \sum_{h=1}^{q} \frac{\hat{\kappa}_h}{\mathrm{var}(\hat{\kappa}_h)} \Bigg/ \sum_{h=1}^{q} \frac{1}{\mathrm{var}(\hat{\kappa}_h)} \]
PROC FREQ computes an estimate of the overall weighted kappa in a similar manner.
Tests for Equal Kappa Coefficients
When there are multiple strata, the following chi-square statistic tests whether the
stratum-level values of kappa are equal.
\[ Q_K = \sum_{h=1}^{q} \frac{ (\hat{\kappa}_h - \hat{\kappa}_{\mathrm{overall}})^2 }{ \mathrm{var}(\hat{\kappa}_h) } \]
Under the null hypothesis of equal kappas over the $q$ strata, $Q_K$ has an asymptotic
chi-square distribution with $q - 1$ degrees of freedom. PROC FREQ computes a test
for equal weighted kappa coefficients in a similar manner.
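The overall kappa is an inverse-variance weighted mean, and $Q_K$ measures the spread of the stratum estimates around it. A minimal sketch, with a hypothetical function name and made-up inputs chosen only to make the arithmetic easy to follow:

```python
def overall_kappa(kappas, variances):
    """Overall kappa, Q_K statistic, and its degrees of freedom."""
    weights = [1 / v for v in variances]
    overall = sum(k * w for k, w in zip(kappas, weights)) / sum(weights)
    q_k = sum((k - overall) ** 2 / v for k, v in zip(kappas, variances))
    return overall, q_k, len(kappas) - 1


# Illustrative stratum-level estimates, not taken from any data set.
print(overall_kappa([0.4, 0.6], [0.01, 0.01]))
```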
Cochran's Q Test
Cochran's Q is computed for multi-way tables when each variable has two levels,
that is, for 2 × 2 × ... × 2 tables. Cochran's Q statistic is used to test the homogeneity
of the one-dimensional margins. Let $m$ denote the number of variables and $N$ denote
the total number of subjects. Then Cochran's Q statistic is computed as
\[ Q_C = (m - 1)\, \frac{ m \sum_{j=1}^{m} T_j^2 - T^2 }{ m T - \sum_{k=1}^{N} S_k^2 } \]
where $T_j$ is the number of positive responses for variable $j$, $T$ is the total number
of positive responses over all variables, and $S_k$ is the number of positive responses
for subject $k$. Under the null hypothesis, Cochran's Q is an approximate chi-square
statistic with $m - 1$ degrees of freedom. Refer to Cochran (1950). When there are
only two binary response variables ($m = 2$), Cochran's Q simplifies to McNemar's
test. When there are more than two response categories, you can test for marginal
homogeneity using the repeated measures capabilities of the CATMOD procedure.
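Cochran's Q can be computed from subject-level 0/1 responses. The sketch below uses a hypothetical function name and toy data; with two variables it reproduces McNemar's statistic, as the text states.

```python
def cochran_q(responses):
    """Cochran's Q from a list of per-subject 0/1 response lists (m variables each)."""
    m = len(responses[0])
    t_j = [sum(subject[j] for subject in responses) for j in range(m)]  # per variable
    t = sum(t_j)                                                        # grand total
    s_k = [sum(subject) for subject in responses]                       # per subject
    return (m - 1) * (m * sum(x * x for x in t_j) - t * t) / (m * t - sum(s * s for s in s_k))


# Toy data: 3 subjects (1,0), 1 subject (0,1), 2 subjects (1,1), 1 subject (0,0).
data = [[1, 0]] * 3 + [[0, 1]] + [[1, 1]] * 2 + [[0, 0]]
print(cochran_q(data))  # equals McNemar's (3 - 1)^2 / (3 + 1)
```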
Tables with Zero Rows and Columns
The AGREE statistics are defined only for square tables, where the number of rows
equals the number of columns. If the table is not square, PROC FREQ does not
compute AGREE statistics. In the kappa statistic framework, where two independent
raters assign ratings to each of $n$ subjects, suppose one of the raters does not use all
possible $r$ rating levels. If the corresponding table has $r$ rows but only $r - 1$ columns,
then the table is not square, and PROC FREQ does not compute the AGREE statistics.
To create a square table in this situation, use the ZEROS option in the WEIGHT
statement, which requests that PROC FREQ include observations with zero weights
in the analysis. Also input zero-weight observations to represent any rating levels that
are not used by a rater, so that the input data set has at least one observation for each
possible rater and rating combination. This includes all rating levels in the analysis,
whether or not all levels are actually assigned by both raters. The resulting table is a
square table, $r \times r$, and so all AGREE statistics can be computed.
For more information, see the description of the ZEROS option. By default, PROC
FREQ does not process observations that have zero weights, because these observations do not contribute to the total frequency count, and because any resulting zero-weight row or column causes many of the tests and measures of association to be
undefined. However, kappa statistics are defined for tables with a zero-weight row or
column, and the ZEROS option allows input of zero-weight observations so you can
construct the tables needed to compute kappas.
Cochran-Mantel-Haenszel Statistics
For n-way crosstabulation tables, consider the following example:
   proc freq;
      tables A*B*C*D / cmh;
   run;
The CMH option in the TABLES statement gives a stratified statistical analysis of the
relationship between C and D, after controlling for A and B. The stratified analysis
provides a way to adjust for the possible confounding effects of A and B without being forced to estimate parameters for them. The analysis produces Cochran-Mantel-Haenszel statistics, and for 2 × 2 tables, it includes estimation of the common odds
ratio, common relative risks, and the Breslow-Day test for homogeneity of the odds
ratios.
Let the number of strata be denoted by q, indexing the strata by h = 1, 2, . . . , q.
Each stratum contains a contingency table with X representing the row variable and
Y representing the column variable. For table h, denote the cell frequency in row i
and column j by nhij , with corresponding row and column marginal totals denoted
by nhi. and nh.j , and the overall stratum total by nh .
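Because the formulas for the Cochran-Mantel-Haenszel statistics are more easily defined in terms of matrices, the following notation is used. Vectors are presumed to be
column vectors unless they are transposed ($'$).
\[ \begin{aligned}
\mathbf{n}_{hi}' &= (n_{hi1}, n_{hi2}, \ldots, n_{hiC}) && (1 \times C) \\
\mathbf{n}_h' &= (\mathbf{n}_{h1}', \mathbf{n}_{h2}', \ldots, \mathbf{n}_{hR}') && (1 \times RC) \\
p_{hi\cdot} &= n_{hi\cdot} / n_h && (1 \times 1) \\
p_{h\cdot j} &= n_{h\cdot j} / n_h && (1 \times 1) \\
\mathbf{P}_{h*\cdot}' &= (p_{h1\cdot}, p_{h2\cdot}, \ldots, p_{hR\cdot}) && (1 \times R) \\
\mathbf{P}_{h\cdot *}' &= (p_{h\cdot 1}, p_{h\cdot 2}, \ldots, p_{h\cdot C}) && (1 \times C)
\end{aligned} \]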
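Assume that the strata are independent and that the marginal totals of each stratum
are fixed. The null hypothesis, $H_0$, is that there is no association between X and Y
in any of the strata. The corresponding model is the multiple hypergeometric; this
implies that, under $H_0$, the expected value and covariance matrix of the frequencies
are, respectively,
\[ \mathbf{m}_h = E[\mathbf{n}_h \mid H_0] = n_h (\mathbf{P}_{h\cdot *} \otimes \mathbf{P}_{h*\cdot}) \]
and
\[ \mathrm{var}[\mathbf{n}_h \mid H_0] = c \left( (\mathbf{D}_{\mathbf{P}_{h\cdot *}} - \mathbf{P}_{h\cdot *} \mathbf{P}_{h\cdot *}') \otimes (\mathbf{D}_{\mathbf{P}_{h*\cdot}} - \mathbf{P}_{h*\cdot} \mathbf{P}_{h*\cdot}') \right) \]
where
\[ c = \frac{n_h^2}{n_h - 1} \]
and where $\otimes$ denotes Kronecker product multiplication and $\mathbf{D}_a$ is a diagonal matrix with the elements of $a$ on the main diagonal.
The generalized CMH statistic is defined as
\[ Q_{CMH} = \mathbf{G}'\, \mathbf{V}_G^{-1}\, \mathbf{G} \]
where
\[ \mathbf{G} = \sum_h \mathbf{B}_h (\mathbf{n}_h - \mathbf{m}_h) \]
\[ \mathbf{V}_G = \sum_h \mathbf{B}_h \left( \mathrm{Var}(\mathbf{n}_h \mid H_0) \right) \mathbf{B}_h' \]
and where
\[ \mathbf{B}_h = \mathbf{C}_h \otimes \mathbf{R}_h \]
is a matrix of fixed constants based on column scores $\mathbf{C}_h$ and row scores $\mathbf{R}_h$. When
the null hypothesis is true, the CMH statistic has an asymptotic chi-square distribution
with degrees of freedom equal to the rank of $\mathbf{B}_h$. If $\mathbf{V}_G$ is found to be singular, PROC
FREQ prints a message and sets the value of the CMH statistic to missing.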
PROC FREQ computes three CMH statistics using this formula for the generalized
CMH statistic, with different row and column score definitions for each statistic. The
CMH statistics that PROC FREQ computes are the correlation statistic, the ANOVA
(row mean scores) statistic, and the general association statistic. These statistics test
the null hypothesis of no association against different alternative hypotheses. The
following sections describe the computation of these CMH statistics.
CAUTION: The CMH statistics have low power for detecting an association in which
the patterns of association for some of the strata are in the opposite direction of the
patterns displayed by other strata. Thus, a nonsignificant CMH statistic suggests
either that there is no association or that no pattern of association has enough strength
or consistency to dominate any other pattern.
Correlation Statistic
The correlation statistic, popularized by Mantel and Haenszel (1959) and Mantel
(1963), has one degree of freedom and is known as the Mantel-Haenszel statistic.
The alternative hypothesis for the correlation statistic is that there is a linear association between X and Y in at least one stratum. If either X or Y does not lie on an
ordinal (or interval) scale, then this statistic is not meaningful.
The ANOVA statistic can be used only when the column variable Y lies on an ordinal
(or interval) scale so that the mean score of Y is meaningful. For the ANOVA statistic,
the mean score is computed for each row of the table, and the alternative hypothesis
is that, for at least one stratum, the mean scores of the R rows are unequal. In other
words, the statistic is sensitive to location differences among the R distributions of
Y.
The matrix of column scores $\mathbf{C}_h$ has dimension $1 \times C$; the column scores are determined by the SCORES= option.
The matrix of row scores $\mathbf{R}_h$ has dimension $(R - 1) \times R$ and is created internally by
PROC FREQ as
\[ \mathbf{R}_h = [\mathbf{I}_{R-1}, -\mathbf{J}_{R-1}] \]
where $\mathbf{I}_{R-1}$ is an identity matrix of rank $R - 1$, and $\mathbf{J}_{R-1}$ is an $(R - 1) \times 1$ vector
of ones. This matrix has the effect of forming $R - 1$ independent contrasts of the $R$
mean scores.
When there is only one stratum, this CMH statistic is essentially an analysis of variance (ANOVA) statistic in the sense that it is a function of the variance ratio F statistic that would be obtained from a one-way ANOVA on the dependent variable Y. If
nonparametric scores are specified in this case, then the ANOVA statistic is a Kruskal-Wallis test.
If there is more than one stratum, then this CMH statistic corresponds to a stratum-adjusted ANOVA or Kruskal-Wallis test. In the special case where there is one subject
per row and one subject per column in the contingency table of each stratum, this
CMH statistic is identical to Friedman's chi-square. See Example 2.8 on page 180
for an illustration.
General Association Statistic
The alternative hypothesis for the general association statistic is that, for at least one
stratum, there is some kind of association between X and Y. This statistic is always
interpretable because it does not require an ordinal scale for either X or Y.
For the general association statistic, the matrix $\mathbf{R}_h$ is the same as the one used for the
ANOVA statistic. The matrix $\mathbf{C}_h$ is defined similarly as
\[ \mathbf{C}_h = [\mathbf{I}_{C-1}, -\mathbf{J}_{C-1}] \]
PROC FREQ generates both score matrices internally. When there is only one stratum, then the general association CMH statistic reduces to $Q_P (n - 1)/n$, where $Q_P$
is the Pearson chi-square statistic. When there is more than one stratum, then the
CMH statistic becomes a stratum-adjusted Pearson chi-square statistic. Note that a
similar adjustment can be made by summing the Pearson chi-squares across the strata.
However, the latter statistic requires a large sample size in each stratum to support the
resulting chi-square distribution with $q(R - 1)(C - 1)$ degrees of freedom. The CMH
statistic requires only a large overall sample size since it has only $(R - 1)(C - 1)$
degrees of freedom.
Refer to Cochran (1954); Mantel and Haenszel (1959); Mantel (1963); Birch (1965);
Landis, Heyman, and Koch (1978).
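For stratified 2 × 2 tables the matrix formula collapses to a scalar form, because only the (1, 1) cell is free once the margins are fixed. The Python sketch below covers that special case only and is not the general matrix computation; the function name `cmh_2x2` is hypothetical, and each stratum is assumed to have at least two observations and nonzero margins.

```python
def cmh_2x2(strata):
    """CMH statistic for a list of 2 x 2 tables given as (n11, n12, n21, n22)."""
    num = 0.0
    var = 0.0
    for n11, n12, n21, n22 in strata:
        n = n11 + n12 + n21 + n22
        r1, r2 = n11 + n12, n21 + n22
        c1, c2 = n11 + n21, n12 + n22
        num += n11 - r1 * c1 / n                   # observed minus expected n11
        var += r1 * r2 * c1 * c2 / (n * n * (n - 1))  # hypergeometric variance
    return num * num / var


print(cmh_2x2([(10, 20, 5, 40)]))
```

With a single stratum the result equals the Pearson chi-square multiplied by $(n - 1)/n$, which matches the reduction described above.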
Adjusted Odds Ratio and Relative Risk Estimates
The CMH option provides adjusted odds ratio and relative risk estimates for stratified
2 × 2 tables. For each of these measures, PROC FREQ computes the Mantel-Haenszel
estimate and the logit estimate. These estimates apply to n-way table requests in the
TABLES statement, when the row and column variables both have only two levels.
For example,
   proc freq;
      tables A*B*C*D / cmh;
   run;
In this example, if the row and column variables C and D both have two levels,
PROC FREQ provides odds ratio and relative risk estimates, adjusting for the confounding variables A and B.
The choice of an appropriate measure depends on the study design. For case-control
(retrospective) studies, the odds ratio is appropriate. For cohort (prospective) or cross-sectional studies, the relative risk is appropriate. See the section "Odds Ratio and
Relative Risks for 2 x 2 Tables" beginning on page 122 for more information on
these measures.
Throughout this section, $z$ denotes the $100(1 - \alpha/2)$ percentile of the standard normal
distribution.
Odds Ratio, Case-Control Studies
Mantel-Haenszel Estimator
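The Mantel-Haenszel estimate of the common odds ratio is computed as
\[ OR_{MH} = \frac{ \sum_h n_{h11}\, n_{h22} / n_h }{ \sum_h n_{h12}\, n_{h21} / n_h } \]
The corresponding $100(1 - \alpha)\%$ confidence limits for the common odds ratio are computed as
\[ \left( OR_{MH} \times \exp(-z \hat{\sigma}),\ OR_{MH} \times \exp(z \hat{\sigma}) \right) \]
where
\[ \begin{aligned}
\hat{\sigma}^2 = \mathrm{var}[\, \ln(OR_{MH}) \,] ={}& \frac{ \sum_h (n_{h11} + n_{h22})(n_{h11}\, n_{h22}) / n_h^2 }{ 2 \left( \sum_h n_{h11}\, n_{h22} / n_h \right)^2 } \\
&+ \frac{ \sum_h \left[ (n_{h11} + n_{h22})(n_{h12}\, n_{h21}) + (n_{h12} + n_{h21})(n_{h11}\, n_{h22}) \right] / n_h^2 }{ 2 \left( \sum_h n_{h11}\, n_{h22} / n_h \right) \left( \sum_h n_{h12}\, n_{h21} / n_h \right) } \\
&+ \frac{ \sum_h (n_{h12} + n_{h21})(n_{h12}\, n_{h21}) / n_h^2 }{ 2 \left( \sum_h n_{h12}\, n_{h21} / n_h \right)^2 }
\end{aligned} \]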
Note that the Mantel-Haenszel odds ratio estimator is less sensitive to small nh than
the logit estimator.
Logit Estimator
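The adjusted logit estimate of the odds ratio (Woolf 1955) is computed as
\[ OR_L = \exp\left( \frac{ \sum_h w_h \ln(OR_h) }{ \sum_h w_h } \right) \]
and the corresponding $100(1 - \alpha)\%$ confidence limits are
\[ \left( OR_L \times \exp\left( \frac{-z}{\sqrt{\sum_h w_h}} \right),\ OR_L \times \exp\left( \frac{z}{\sqrt{\sum_h w_h}} \right) \right) \]
where $OR_h$ is the odds ratio for stratum $h$, and
\[ w_h = \frac{1}{\mathrm{var}(\ln OR_h)} \]
If any cell frequency in a stratum $h$ is zero, then PROC FREQ adds 0.5 to each cell
of the stratum before computing $OR_h$ and $w_h$ (Haldane 1955), and prints a warning.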
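The two common odds ratio estimators can be compared with a short sketch. The function names are hypothetical; the logit version takes $\mathrm{var}(\ln OR_h)$ to be the sum of reciprocal cell counts, as given for a single 2 × 2 table earlier in this section, and applies the 0.5 adjustment described above without printing a warning.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio; strata are (n11, n12, n21, n22) tuples."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def logit_or(strata, alpha=0.05):
    """Adjusted logit estimate of the common odds ratio with confidence limits."""
    sum_w = 0.0
    sum_w_ln = 0.0
    for cells in strata:
        if 0 in cells:
            cells = tuple(x + 0.5 for x in cells)  # zero-cell adjustment
        a, b, c, d = cells
        w = 1 / (1 / a + 1 / b + 1 / c + 1 / d)    # 1 / var(ln OR_h)
        sum_w += w
        sum_w_ln += w * log(a * d / (b * c))
    est = exp(sum_w_ln / sum_w)
    half = NormalDist().inv_cdf(1 - alpha / 2) / sqrt(sum_w)
    return est, (est * exp(-half), est * exp(half))


tables = [(10, 20, 5, 40), (12, 18, 6, 30)]
print(mantel_haenszel_or(tables))
print(logit_or(tables))
```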
Exact Confidence Limits for the Common Odds Ratio
When you specify the COMOR option in the EXACT statement, PROC FREQ computes exact confidence limits for the common odds ratio for stratified 2 × 2 tables.
This computation assumes that the odds ratio is constant over all the 2 × 2 tables.
Exact confidence limits are constructed from the distribution of $S = \sum_h n_{h11}$, conditional on the marginal totals of the 2 × 2 tables.
Because this is a discrete problem, the confidence coefficient for these exact confidence limits is not exactly $1 - \alpha$ but is at least $1 - \alpha$. Thus, these confidence limits
are conservative. Refer to Agresti (1992).
PROC FREQ computes exact confidence limits for the common odds ratio with an
algorithm based on that presented by Vollset, Hirji, and Elashoff (1991). Refer also
to Mehta, Patel, and Gray (1985).
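Conditional on the marginal totals of 2 × 2 table $h$, let the random variable $S_h$ denote
the frequency of table cell (1, 1). Given the row totals $n_{h1\cdot}$ and $n_{h2\cdot}$ and column
totals $n_{h\cdot 1}$ and $n_{h\cdot 2}$, the lower and upper bounds for $S_h$ are $l_h$ and $u_h$,
\[ l_h = \max(0,\ n_{h1\cdot} - n_{h\cdot 2}) \]
\[ u_h = \min(n_{h1\cdot},\ n_{h\cdot 1}) \]
Let $C_{s_h}$ denote the hypergeometric coefficient,
\[ C_{s_h} = \binom{n_{h\cdot 1}}{s_h} \binom{n_{h\cdot 2}}{n_{h1\cdot} - s_h} \]
and let $\phi$ denote the common odds ratio. Then the conditional distribution of $S_h$ is
\[ P(S_h = s_h \mid n_{1\cdot}, n_{\cdot 1}, n_{\cdot 2}) = C_{s_h}\, \phi^{s_h} \Bigg/ \sum_{x = l_h}^{x = u_h} C_x\, \phi^{x} \]
Summing over all the 2 × 2 tables, $S = \sum_h S_h$, and the lower and upper bounds of $S$ are $l$ and $u$,
\[ l = \sum_h l_h \quad \text{and} \quad u = \sum_h u_h \]
The conditional distribution of the sum $S$ is
\[ P(S = s \mid n_{h1\cdot}, n_{h\cdot 1}, n_{h\cdot 2};\ h = 1, \ldots, q) = C_s\, \phi^{s} \Bigg/ \sum_{x = l}^{x = u} C_x\, \phi^{x} \]
where
\[ C_s = \sum_{s_1 + \cdots + s_q = s}\ \prod_h C_{s_h} \]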
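The following two equations are solved iteratively for the lower and upper confidence limits, $\phi_1$ and $\phi_2$:
\[ \sum_{x = s_0}^{x = u} C_x\, \phi_1^{x} \Bigg/ \sum_{x = l}^{x = u} C_x\, \phi_1^{x} = \alpha/2 \]
\[ \sum_{x = l}^{x = s_0} C_x\, \phi_2^{x} \Bigg/ \sum_{x = l}^{x = u} C_x\, \phi_2^{x} = \alpha/2 \]
When the observed sum $s_0$ equals the lower bound $l$, then PROC FREQ sets the lower
exact confidence limit to zero and determines the upper limit with level $\alpha$. Similarly,
when the observed sum $s_0$ equals the upper bound $u$, then PROC FREQ sets the upper
exact confidence limit to infinity and determines the lower limit with level $\alpha$.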
When you specify the COMOR option in the EXACT statement, PROC FREQ also
computes the exact test that the common odds ratio equals one. Setting $\phi = 1$, the
conditional distribution of the sum $S$ under the null hypothesis becomes
\[ P_0(S = s \mid n_{h1\cdot}, n_{h\cdot 1}, n_{h\cdot 2};\ h = 1, \ldots, q) = C_s \Bigg/ \sum_{x = l}^{x = u} C_x \]
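The point probability for this exact test is the probability of the observed sum $s_0$
under the null hypothesis, conditional on the marginals of the stratified 2 × 2 tables,
and is denoted by $P_0(s_0)$. The expected value of $S$ under the null hypothesis is
\[ E_0(S) = \sum_{x = l}^{x = u} x\, C_x \Bigg/ \sum_{x = l}^{x = u} C_x \]
The one-sided exact p-value is computed as
\[ P_1 = P_0(S \ge s_0) = \sum_{x = s_0}^{x = u} C_x \Bigg/ \sum_{x = l}^{x = u} C_x \qquad \text{if } s_0 > E_0(S) \]
\[ P_1 = P_0(S \le s_0) = \sum_{x = l}^{x = s_0} C_x \Bigg/ \sum_{x = l}^{x = u} C_x \qquad \text{if } s_0 \le E_0(S) \]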
PROC FREQ computes two-sided p-values for this test according to three different
definitions. A two-sided p-value is computed as twice the one-sided p-value, setting
the result equal to one if it exceeds one.
\[ P_2^{\,a} = 2 \times P_1 \]
Additionally, a two-sided p-value is computed as the sum of all probabilities less than
or equal to the point probability of the observed sum $s_0$, summing over all possible
values of $s$, $l \le s \le u$.
\[ P_2^{\,b} = \sum_{l \le s \le u:\ P_0(s) \le P_0(s_0)} P_0(s) \]
Also, a two-sided p-value is computed as the sum of the one-sided p-value and the
corresponding area in the opposite tail of the distribution, equidistant from the expected value.
\[ P_2^{\,c} = P_0\left( |S - E_0(S)| \ge |s_0 - E_0(S)| \right) \]
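The exact test of a common odds ratio of one can be sketched by building the coefficients $C_s$ through repeated convolution of the per-stratum hypergeometric coefficients. This is an illustration under the definitions above, not the algorithm PROC FREQ uses; the function name is hypothetical, and exact rational arithmetic is used so that the comparison of probabilities in the second two-sided definition is not affected by rounding.

```python
from fractions import Fraction
from math import comb


def exact_common_or_test(strata):
    """Exact p-values for H0: common odds ratio = 1.

    strata: iterable of (n11, n12, n21, n22) tuples, one per 2 x 2 table.
    """
    coef = {0: 1}  # C_s for the strata processed so far
    s0 = 0         # observed sum of the (1, 1) cells
    for n11, n12, n21, n22 in strata:
        r1, c1, c2 = n11 + n12, n11 + n21, n12 + n22
        lo, hi = max(0, r1 - c2), min(r1, c1)
        c_h = {s: comb(c1, s) * comb(c2, r1 - s) for s in range(lo, hi + 1)}
        # Convolution: C_s sums products of C_{s_h} over s_1 + ... + s_q = s.
        merged = {}
        for s, c in coef.items():
            for s_h, c_sh in c_h.items():
                merged[s + s_h] = merged.get(s + s_h, 0) + c * c_sh
        coef = merged
        s0 += n11
    total = sum(coef.values())
    p0 = {s: Fraction(c, total) for s, c in coef.items()}
    e0 = sum(s * p for s, p in p0.items())
    if s0 > e0:
        p1 = sum(p for s, p in p0.items() if s >= s0)
    else:
        p1 = sum(p for s, p in p0.items() if s <= s0)
    p2a = min(Fraction(1), 2 * p1)
    p2b = sum(p for p in p0.values() if p <= p0[s0])
    p2c = sum(p for s, p in p0.items() if abs(s - e0) >= abs(s0 - e0))
    return {"P1": float(p1), "P2a": float(p2a), "P2b": float(p2b), "P2c": float(p2c)}


print(exact_common_or_test([(3, 1, 1, 3)]))
```

With a single table this reduces to the conditional distribution behind Fisher's exact test; with several tables the three two-sided definitions can give different values when the null distribution is asymmetric.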
Relative Risks, Cohort Studies
Mantel-Haenszel Estimator
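The Mantel-Haenszel estimate of the common relative risk for column 1 is computed
as
\[ RR_{MH} = \frac{ \sum_h n_{h11}\, n_{h2\cdot} / n_h }{ \sum_h n_{h21}\, n_{h1\cdot} / n_h } \]
It is always computed unless the denominator is zero. Refer to Mantel and Haenszel
(1959) and Agresti (1990).
Using the estimated variance for $\log(RR_{MH})$ given by Greenland and Robins (1985),
PROC FREQ computes the corresponding $100(1 - \alpha)\%$ confidence limits for the
relative risk as
\[ \left( RR_{MH} \times \exp(-z \hat{\sigma}),\ RR_{MH} \times \exp(z \hat{\sigma}) \right) \]
where
\[ \hat{\sigma}^2 = \mathrm{var}[\, \ln(RR_{MH}) \,] = \frac{ \sum_h (n_{h1\cdot}\, n_{h2\cdot}\, n_{h\cdot 1} - n_{h11}\, n_{h21}\, n_h) / n_h^2 }{ \left( \sum_h n_{h11}\, n_{h2\cdot} / n_h \right) \left( \sum_h n_{h21}\, n_{h1\cdot} / n_h \right) } \]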
Logit Estimator
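The adjusted logit estimate of the common relative risk for column 1 is computed as
\[ RR_L = \exp\left( \frac{ \sum_h w_h \ln RR_h }{ \sum_h w_h } \right) \]
and the corresponding $100(1 - \alpha)\%$ confidence limits are
\[ \left( RR_L \times \exp\left( \frac{-z}{\sqrt{\sum_h w_h}} \right),\ RR_L \times \exp\left( \frac{z}{\sqrt{\sum_h w_h}} \right) \right) \]
where $RR_h$ is the column 1 relative risk estimator for stratum $h$, and
\[ w_h = \frac{1}{\mathrm{var}(\ln RR_h)} \]
If $n_{h11}$ or $n_{h21}$ is zero, then PROC FREQ adds 0.5 to each cell of the stratum before
computing $RR_h$ and $w_h$, and prints a warning. Refer to Kleinbaum, Kupper, and
Morgenstern (1982, Sections 17.4 and 17.5).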
Breslow-Day Test for Homogeneity of the Odds Ratios
When you specify the CMH option, PROC FREQ computes the Breslow-Day test for
stratified analysis of 2 × 2 tables. It tests the null hypothesis that the odds ratios for the
$q$ strata are all equal. When the null hypothesis is true, the statistic has approximately
a chi-square distribution with $q - 1$ degrees of freedom. Refer to Breslow and Day
(1980) and Agresti (1996).
The Breslow-Day statistic is computed as
\[ Q_{BD} = \sum_h \frac{ \left( n_{h11} - E(n_{h11} \mid OR_{MH}) \right)^2 }{ \mathrm{var}(n_{h11} \mid OR_{MH}) } \]
where E and var denote expected value and variance, respectively. The summation
does not include any table with a zero row or column. If ORMH equals zero or if it
is undened, then PROC FREQ does not compute the statistic and prints a warning
message.
For the Breslow-Day test to be valid, the sample size should be relatively large in
each stratum, and at least 80% of the expected cell counts should be greater than
5. Note that this is a stricter sample size requirement than the requirement for the
Cochran-Mantel-Haenszel test for q × 2 × 2 tables, in that each stratum sample size
(not just the overall sample size) must be relatively large. Even when the Breslow-Day
test is valid, it may not be very powerful against certain alternatives, as discussed
in Breslow and Day (1980).
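If you specify the BDT option, PROC FREQ computes the Breslow-Day test with
Tarone's adjustment, which subtracts an adjustment factor from $Q_{BD}$ to make the
resulting statistic asymptotically chi-square.
$$Q_{BDT} = Q_{BD} - \frac{\left( \sum_h \left( n_{h11} - E(n_{h11} \mid OR_{MH}) \right) \right)^2}{\sum_h var(n_{h11} \mid OR_{MH})}$$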
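As a hedged illustration of how these tests are requested, the following sketch assumes a hypothetical data set Trial that stores cell counts in a variable Count for a Center by Treatment by Response table, where Treatment and Response each have two levels. Only the CMH and BDT options come from the text above; the data set and variable names are assumptions made for the example.

```sas
/* Sketch only: Trial, Center, Treatment, Response, and Count are hypothetical names. */
proc freq data=Trial;
   weight Count;                                  /* each observation is a cell count   */
   tables Center*Treatment*Response / cmh bdt;    /* levels of Center form the q strata */
run;
```

Because the last two variables in the table request define the 2 × 2 tables, the first variable (Center here) supplies the strata whose odds ratios the Breslow-Day test compares.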
Exact Statistics
Exact statistics can be useful in situations where the asymptotic assumptions are not
met, and so the asymptotic p-values are not close approximations for the true p-values.
Standard asymptotic methods involve the assumption that the test statistic
follows a particular distribution when the sample size is sufficiently large. When the
sample size is not large, asymptotic results may not be valid, with the asymptotic
p-values differing perhaps substantially from the exact p-values. Asymptotic results
may also be unreliable when the distribution of the data is sparse, skewed, or heavily
tied. Refer to Agresti (1996) and Bishop, Fienberg, and Holland (1975). Exact
computations are based on the statistical theory of exact conditional inference for
contingency tables, reviewed by Agresti (1992).
In addition to computation of exact p-values, PROC FREQ provides the option of
estimating exact p-values by Monte Carlo simulation. This can be useful for problems
that are so large that exact computations require a great amount of time and memory,
but for which asymptotic approximations may not be sufficient.
PROC FREQ provides exact p-values for the following tests for two-way tables:
Pearson chi-square, likelihood-ratio chi-square, Mantel-Haenszel chi-square, Fisher's
exact test, Jonckheere-Terpstra test, Cochran-Armitage test for trend, and McNemar's
test. PROC FREQ also computes exact p-values for tests of hypotheses that the
following statistics equal zero: Pearson correlation coefficient, Spearman correlation
coefficient, simple kappa coefficient, and weighted kappa coefficient. Additionally,
PROC FREQ computes exact confidence limits for the odds ratio for 2 × 2 tables. For
stratified 2 × 2 tables, PROC FREQ computes exact confidence limits for the common
odds ratio, as well as an exact test that the common odds ratio equals one. For
one-way frequency tables, PROC FREQ provides the exact chi-square goodness-of-fit
test (for equal proportions or for proportions or frequencies that you specify). Also
for one-way tables, PROC FREQ provides exact confidence limits for the binomial
proportion and an exact test for the binomial proportion value.
The following sections summarize the exact computational algorithms, define the
exact p-values that PROC FREQ computes, discuss the computational resource
requirements, and describe the Monte Carlo estimation option.
Computational Algorithms
PROC FREQ computes exact p-values for general R × C tables using the network
algorithm developed by Mehta and Patel (1983). This algorithm provides a substantial
advantage over direct enumeration, which can be very time-consuming and feasible
only for small problems. Refer to Agresti (1992) for a review of algorithms for
computation of exact p-values, and refer to Mehta, Patel, and Tsiatis (1984) and Mehta,
Patel, and Senchaudhuri (1991) for information on the performance of the network
algorithm.
The reference set for a given contingency table is the set of all contingency tables
with the observed marginal row and column sums. Corresponding to this reference
set, the network algorithm forms a directed acyclic network consisting of nodes in a
number of stages. A path through the network corresponds to a distinct table in the
reference set. The distances between nodes are defined so that the total distance of a
path through the network is the corresponding value of the test statistic. At each node,
the algorithm computes the shortest and longest path distances for all the paths that
pass through that node. For statistics that can be expressed as a linear combination
of cell frequencies multiplied by increasing row and column scores, PROC FREQ
computes shortest and longest path distances using the algorithm given in Agresti,
Mehta, and Patel (1990). For statistics of other forms, PROC FREQ computes an
Definition of p-Values
For several tests in PROC FREQ, the test statistic is nonnegative, and large values of
the test statistic indicate a departure from the null hypothesis. Such tests include the
Pearson chi-square, the likelihood-ratio chi-square, the Mantel-Haenszel chi-square,
Fisher's exact test for tables larger than 2 × 2 tables, McNemar's test, and the
one-way chi-square goodness-of-fit test. The exact p-value for these nondirectional tests
is the sum of probabilities for those tables having a test statistic greater than or equal
to the value of the observed test statistic.
There are other tests where it may be appropriate to test against either a one-sided or a
two-sided alternative hypothesis. For example, when you test the null hypothesis that
the true parameter value equals 0 (T = 0), the alternative of interest may be one-sided
(T > 0, or T < 0) or two-sided (T ≠ 0). Such tests include the Pearson correlation
coefficient, Spearman correlation coefficient, Jonckheere-Terpstra test, Cochran-Armitage
test for trend, simple kappa coefficient, and weighted kappa coefficient. For
these tests, PROC FREQ outputs the right-sided p-value when the observed value of
the test statistic is greater than its expected value. The right-sided p-value is the sum
of probabilities for those tables having a test statistic greater than or equal to the
observed test statistic. Otherwise, when the test statistic is less than or equal to its
expected value, PROC FREQ outputs the left-sided p-value. The left-sided p-value
is the sum of probabilities for those tables having a test statistic less than or equal to
the one observed. The one-sided p-value $P_1$ can be expressed as
$$P_1 = \begin{cases} \mathrm{Prob}(\text{Test Statistic} \ge t) & \text{if } t > E_0(T) \\ \mathrm{Prob}(\text{Test Statistic} \le t) & \text{if } t \le E_0(T) \end{cases}$$
where t is the observed value of the test statistic and $E_0(T)$ is the expected value of
the test statistic under the null hypothesis. PROC FREQ computes the two-sided
p-value as the sum of the one-sided p-value and the corresponding area in the opposite
tail of the distribution of the statistic, equidistant from the expected value. The
two-sided p-value $P_2$ can be expressed as
$$P_2 = \mathrm{Prob}\left( |\text{Test Statistic} - E_0(T)| \ge |t - E_0(T)| \right)$$
If you specify the POINT option in the EXACT statement, PROC FREQ also displays
exact point probabilities for the test statistics. The exact point probability is the exact
probability that the test statistic equals the observed value.
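A minimal sketch of requesting an exact test together with its point probability follows. It assumes a hypothetical data set Dose with an ordinal variable Dose, a response variable Response, and cell counts in Count; only the EXACT statement, the JT statistic option, and the POINT option are taken from the text.

```sas
/* Sketch only: Dose, Response, and Count are hypothetical names. */
proc freq data=Dose;
   weight Count;
   tables Dose*Response;
   exact jt / point;     /* exact Jonckheere-Terpstra p-values plus the point probability */
run;
```

Under the definitions above, the one-sided p-value reported for this test would be right-sided or left-sided depending on whether the observed statistic exceeds its expected value under the null hypothesis.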
Computational Resources
PROC FREQ uses relatively fast and efficient algorithms for exact computations.
These recently developed algorithms, together with improvements in computer
power, make it feasible now to perform exact computations for data sets where
previously only asymptotic methods could be applied. Nevertheless, there are still large
problems that may require a prohibitive amount of time and memory for exact
computations, depending on the speed and memory available on your computer. For large
problems, consider whether exact methods are really needed or whether asymptotic
methods might give results quite close to the exact results, while requiring much less
computer time and memory. When asymptotic methods may not be sufficient for
such large problems, consider using Monte Carlo estimation of exact p-values, as
described in the section "Monte Carlo Estimation" on page 146.
Monte Carlo Estimation
If you specify the option MC in the EXACT statement, PROC FREQ computes Monte
Carlo estimates of the exact p-values instead of directly computing the exact p-values.
Monte Carlo estimation can be useful for large problems that require a great amount
of time and memory for exact computations but for which asymptotic approximations
may not be sufficient. To describe the precision of each Monte Carlo estimate, PROC
FREQ provides the asymptotic standard error and 100(1 − α)% confidence limits. The
confidence level is determined by the ALPHA= option in the EXACT statement,
which, by default, equals 0.01, and produces 99% confidence limits. The N=n option
in the EXACT statement specifies the number of samples that PROC FREQ uses for
Monte Carlo estimation; the default is 10000 samples. You can specify a larger value
for n to improve the precision of the Monte Carlo estimates. Because larger values
of n generate more samples, the computation time increases. Alternatively, you can
specify a smaller value of n to reduce the computation time.
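The options described above might be combined as in the following sketch. The data set Survey and the variables RowVar, ColVar, and Count are hypothetical, and the particular values of N= and ALPHA= are arbitrary illustrations. The SEED= option is not described in this passage; it is included on the assumption that you want a reproducible sequence of samples.

```sas
/* Sketch only: Survey, RowVar, ColVar, and Count are hypothetical names. */
proc freq data=Survey;
   weight Count;
   tables RowVar*ColVar;
   /* Monte Carlo estimate of the exact Pearson chi-square p-value:       */
   /* 50000 samples instead of the default 10000, and 95% limits          */
   /* instead of the default 99%.                                         */
   exact pchi / mc n=50000 alpha=.05 seed=12345;
run;
```

Raising N= tightens the confidence limits around the estimate at the cost of longer run time, which is the trade-off the paragraph above describes.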
To compute a Monte Carlo estimate of an exact p-value, PROC FREQ generates a
random sample of tables with the same total sample size, row totals, and column
totals as the observed table. PROC FREQ uses the algorithm of Agresti, Wackerly, and
Boyett (1979), which generates tables in proportion to their hypergeometric
probabilities conditional on the marginal frequencies. For each sample table, PROC FREQ
computes the value of the test statistic and compares it to the value for the observed
table. When estimating a right-sided p-value, PROC FREQ counts all sample tables
for which the test statistic is greater than or equal to the observed test statistic. Then
the p-value estimate equals the number of these tables divided by the total number of
tables sampled.
$$\hat{P}_{MC} = M / N$$
where M is the number of sampled tables counted and N is the total number of tables sampled.
PROC FREQ computes left-sided and two-sided p-value estimates in a similar
manner. For left-sided p-values, PROC FREQ evaluates whether the test statistic for each
sampled table is less than or equal to the observed test statistic. For two-sided
p-values, PROC FREQ examines the sample test statistics according to the expression
for $P_2$ given in the section "Asymptotic Tests" on page 109. The variable M is a
binomially distributed variable with N trials and success probability p. It follows that
the asymptotic standard error of the Monte Carlo estimate is
$$se(\hat{P}_{MC}) = \sqrt{ \hat{P}_{MC} (1 - \hat{P}_{MC}) / (N - 1) }$$
PROC FREQ constructs asymptotic confidence limits for the p-values according to
$$\hat{P}_{MC} \pm z_{\alpha/2} \cdot se(\hat{P}_{MC})$$
When the Monte Carlo estimate $\hat{P}_{MC}$ equals 0, then PROC FREQ computes the
confidence limits for the p-value as
$$\left( 0,\; 1 - \alpha^{(1/N)} \right)$$
When the Monte Carlo estimate $\hat{P}_{MC}$ equals 1, then PROC FREQ computes the
confidence limits as
$$\left( \alpha^{(1/N)},\; 1 \right)$$
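As a rough worked illustration, suppose the defaults N = 10000 and ALPHA=0.01 are in effect and that M = 250 sampled tables are counted (a hypothetical value). The calculation below assumes the standard error takes the binomial-type form $\sqrt{\hat{P}(1-\hat{P})/(N-1)}$ and that the limits use the normal quantile $z_{0.005} \approx 2.576$; both are assumptions of this sketch rather than statements recovered from the text.

```latex
\begin{align*}
\hat{P}_{MC} &= M/N = 250/10000 = 0.025 \\
se(\hat{P}_{MC}) &\approx \sqrt{0.025 \times 0.975 / 9999} \approx 0.00156 \\
\text{99\% limits} &\approx 0.025 \pm 2.576 \times 0.00156 \approx (0.0210,\; 0.0290)
\end{align*}
```

Under the same assumptions, quadrupling N would roughly halve the standard error, which is why a larger N=n value improves precision.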
Computational Resources
For each variable in a table request, PROC FREQ stores all of the levels in memory.
If all variables are numeric and not formatted, this requires about 84 bytes for each
variable level. When there are character variables or formatted numeric variables,
the memory that is required depends on the formatted variable lengths, with longer
formatted lengths requiring more memory. The number of levels for each variable is
limited only by the largest integer that your operating environment can store.
In addition, PROC FREQ requires 8000 bytes to store the table cell frequencies
1000 cells * 8 bytes/cell
the output data set D contains frequencies and percentages for the last table request,
A*B. If A has two levels (1 and 2), B has three levels (1,2, and 3), and no table cell
count is zero or missing, the output data set D includes six observations, one for each
combination of A and B. The first observation corresponds to A=1 and B=1; the second observation corresponds to A=1 and B=2; and so on. The data set includes the
variables COUNT and PERCENT. The value of COUNT is the number of observations with the given combination of A and B values. The value of PERCENT is the
percent of the total number of observations having that A and B combination.
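A step of the following shape would produce an output data set like the one described. This is a sketch: the input data set name Sample is hypothetical, and the request is written so that A*B is the last table request, as the description requires.

```sas
/* Sketch only: Sample is a hypothetical input data set containing variables A and B. */
proc freq data=Sample;
   tables A A*B / out=D;   /* OUT= captures the last table request, A*B */
run;

proc print data=D;         /* one observation per A, B combination, with COUNT and PERCENT */
run;
```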
When PROC FREQ combines different variable values into the same formatted level,
the output data set contains the smallest internal value for the formatted level. For
in a PROC FREQ step, the formatted levels listed in the frequency table for X are 1
and 2. If you create an output data set with the frequency counts, the internal values
of X are 1.1 and 1.7. To report the internal values of X when you display the output
data set, use a format of 3.1 with X.
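The behavior can be sketched as follows. The data values and the data set names Sample and Counts are hypothetical, chosen so that a format of 1. groups the values into the formatted levels 1 and 2, with 1.1 and 1.7 as the smallest internal values in each level.

```sas
/* Sketch only: data set names and values are hypothetical. */
data Sample;
   input X @@;
   datalines;
1.1 1.4 1.7 2.1 2.3
;

proc freq data=Sample;
   tables X / out=Counts;
   format X 1.;            /* formatted levels are 1 and 2 */
run;

proc print data=Counts;
   format X 3.1;           /* show the stored internal values of X */
run;
```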
Prefix    Statistic Value
DF_       degrees of freedom
E_        asymptotic standard error (ASE)
L_        lower confidence limit
U_        upper confidence limit
E0_       ASE under the null hypothesis
Z_        standardized value
P_        p-value
P2_       two-sided p-value
PL_       left-sided p-value
PR_       right-sided p-value
XP_       exact p-value
XP2_      exact two-sided p-value
XPL_      exact left-sided p-value
XPR_      exact right-sided p-value
XPT_      exact point probability
XL_       exact lower confidence limit
XR_       exact upper confidence limit
For example, variable names created for the Pearson chi-square, its degrees of
freedom, and its p-value are _PCHI_, DF_PCHI, and P_PCHI, respectively.
If the length of the prefix plus the statistic option exceeds eight characters, PROC
FREQ truncates the option so that the name of the new variable is eight characters
long.
Displayed Output
Number of Variable Levels Table
If you specify the NLEVELS option in the PROC FREQ statement, PROC FREQ
displays the "Number of Variable Levels" table. This table provides the number of
levels for all variables named in the TABLES statements. PROC FREQ determines
the variable levels from the formatted variable values. See "Grouping with Formats"
for details. The "Number of Variable Levels" table contains the following information:
Variable name
Levels, which is the total number of levels of the variable
Number of Nonmissing Levels, if there are missing levels for any of the variables
Number of Missing Levels, if there are missing levels for any of the variables
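A minimal sketch of requesting this table follows, assuming a data set named Color with variables Eyes and Hair (such as the one built in the examples later in this chapter). Only the NLEVELS option and its placement in the PROC FREQ statement come from the text.

```sas
/* Sketch only: assumes a data set Color with variables Eyes and Hair. */
proc freq data=Color nlevels;
   tables Eyes Hair;       /* levels are counted for every variable named in TABLES */
run;
```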
Multiway Tables
PROC FREQ displays all multiway table requests in the TABLES statements, unless
you specify the NOPRINT option in the PROC statement or the NOPRINT option in
the TABLES statement.
For two-way to multiway crosstabulation tables, the values of the last variable in the
table request form the table columns. The values of the next-to-last variable form the
rows. Each level (or combination of levels) of the other variables forms one stratum.
There are three ways to display multiway tables in PROC FREQ. By default, PROC
FREQ displays multiway tables as separate two-way crosstabulation tables for each
stratum of the multiway table. Also by default, PROC FREQ displays these two-way
crosstabulation tables in table cell format. Alternatively, if you specify the
CROSSLIST option, PROC FREQ displays the two-way crosstabulation tables in
ODS column format. If you specify the LIST option, PROC FREQ displays
multiway tables in list format.
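The three display styles can be compared with a sketch like the following. It assumes a data set Color with variables Region, Eyes, Hair, and a count variable Count; the variable roles are assumptions, while CROSSLIST and LIST are the TABLES statement options named above.

```sas
/* Sketch only: assumes a data set Color with Region, Eyes, Hair, and Count. */
proc freq data=Color;
   weight Count;
   tables Region*Eyes*Hair;               /* default: one Eyes by Hair table per Region */
   tables Region*Eyes*Hair / crosslist;   /* same tables in ODS column format           */
   tables Region*Eyes*Hair / list;        /* entire multiway table as a single list     */
run;
```

In each request Hair forms the columns, Eyes forms the rows, and each level of Region forms one stratum.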
Crosstabulation Tables
By default, PROC FREQ displays two-way crosstabulation tables in table cell format.
The row variable values are listed down the side of the table, the column variable
values are listed across the top of the table, and each row and column variable level
combination forms a table cell.
Each cell of a crosstabulation table may contain the following information:
Frequency, giving the number of observations that have the indicated values of
the two variables. (The NOFREQ option suppresses this information.)
the Expected cell frequency under the hypothesis of independence, if you specify
the EXPECTED option
the Deviation of the cell frequency from the expected value, if you specify the
DEVIATION option
Cell Chi-Square, which is the cell's contribution to the total chi-square statistic,
if you specify the CELLCHI2 option
Tot Pct, or the cell's percentage of the total frequency, for n-way tables when
n > 2, if you specify the TOTPCT option
Percent, the cell's percentage of the total frequency. (The NOPERCENT option
suppresses this information.)
Row Pct, or the row percentage, the cell's percentage of the total frequency
count for that cell's row. (The NOROW option suppresses this information.)
Col Pct, or column percentage, the cell's percentage of the total frequency
count for that cell's column. (The NOCOL option suppresses this information.)
Cumulative Col%, or cumulative column percent, if you specify the CUMCOL
option
The table also displays the Frequency Missing, or the number of observations with
missing values.
CROSSLIST Tables
If you specify the CROSSLIST option, PROC FREQ displays two-way
crosstabulation tables with ODS column format. Using column format, a CROSSLIST table
provides the same information (frequencies, percentages, and other statistics) as the
default crosstabulation table with cell format. But unlike the default crosstabulation
table, a CROSSLIST table has a table definition that you can customize with PROC
TEMPLATE. For more information, refer to the chapter titled "The TEMPLATE
Procedure" in the SAS Output Delivery System User's Guide.
In the CROSSLIST table format, the rows of the display correspond to the
crosstabulation table cells, and the columns of the display correspond to descriptive statistics
such as frequencies and percentages. Each table cell is identified by the values of
its TABLES row and column variable levels, with all column variable levels listed
within each row variable level. The CROSSLIST table also provides row totals,
column totals, and overall table totals.
LIST Tables
If you specify the LIST option in the TABLES statement, PROC FREQ displays
multiway tables in a list format rather than as crosstabulation tables. The LIST option
displays the entire multiway table in one table, instead of displaying a separate
two-way table for each stratum. The LIST option is not available when you also request
statistical options. Unlike the default crosstabulation output, the LIST output does
not display row percentages, column percentages, and optional information such as
expected frequencies and cell chi-squares.
For a multiway table in list format, PROC FREQ displays the following information:
the variable names and values
Frequency counts, giving the number of observations with the indicated
combination of variable values
Percent, the cell's percentage of the total number of observations. (The
NOPERCENT option suppresses this information.)
Cumulative Frequency counts, giving the sum of the frequency counts of that
cell and all other cells listed above it in the table. The last cumulative
frequency is the total number of nonmissing observations. (The NOCUM option
suppresses this information.)
Cumulative Percent values, giving the percentage of the total number of
observations for that cell and all others previously listed in the table. (The NOCUM
or the NOPERCENT option suppresses this information.)
The table also displays the Frequency Missing, or the number of observations with
missing values.
If you specify the POINT option with the TREND option in the EXACT
statement, PROC FREQ displays the exact point probability for the test statistic.
If you specify the JT option, PROC FREQ displays the Jonckheere-Terpstra
Test, showing the Statistic (JT), the standardized test statistic (Z), and the
one-sided and two-sided probability values. If you specify the JT option in the
EXACT statement, PROC FREQ also displays the exact one-sided and
two-sided probability values for this test. If you specify the POINT option with
the JT option in the EXACT statement, PROC FREQ displays the exact point
probability for the test statistic.
If you specify the AGREE option and the PRINTKWT option, PROC FREQ
displays the Kappa Coefficient Weights for square tables greater than 2 × 2.
If you specify the AGREE option, for two-way tables PROC FREQ displays
McNemar's Test and the Simple Kappa Coefficient for 2 × 2 tables. For square
tables larger than 2 × 2, PROC FREQ displays Bowker's Test of Symmetry,
the Simple Kappa Coefficient, and the Weighted Kappa Coefficient. For
McNemar's Test and Bowker's Test of Symmetry, PROC FREQ displays the
Statistic (S), the degrees of freedom (DF), and the probability value (Pr > S).
If you specify the MCNEM option in the EXACT statement, PROC FREQ
also displays the exact probability value for McNemar's test. If you specify
the POINT option with the MCNEM option in the EXACT statement, PROC
FREQ displays the exact point probability for the test statistic. For the
simple and weighted kappa coefficients, PROC FREQ displays the kappa values,
asymptotic standard errors (ASE), and Confidence Limits.
If you specify the KAPPA or WTKAP option in the TEST statement, PROC
FREQ displays asymptotic tests for the simple kappa coefficient or the
weighted kappa coefficient, respectively. If you specify the AGREE option
in the TEST statement, PROC FREQ displays both these asymptotic tests. The
test output includes the kappa coefficient, its asymptotic standard error (ASE),
Confidence Limits, the ASE under the null hypothesis H0, the standardized test
statistic (Z), and the one-sided and two-sided probability values.
If you specify the KAPPA or WTKAP option in the EXACT statement, PROC
FREQ displays asymptotic and exact tests for the simple kappa coefficient or
the weighted kappa coefficient, respectively. The test output includes the kappa
coefficient, its asymptotic standard error (ASE), Confidence Limits, the ASE
under the null hypothesis H0, the standardized test statistic (Z), and the
asymptotic and exact one-sided and two-sided probability values. If you specify the
POINT option in the EXACT statement, PROC FREQ displays the point
probability for each exact test requested.
If you specify the MC option in the EXACT statement, PROC FREQ displays
Monte Carlo estimates for all exact p-values requested by statistic-options in
the EXACT statement. The Monte Carlo output includes the p-value Estimate,
its Confidence Limits, the Number of Samples used to compute the Monte
Carlo estimate, and the Initial Seed for random number generation.
If you specify the AGREE option, for multiple strata PROC FREQ displays
Overall Simple and Weighted Kappa Coefficients, with their asymptotic
standard errors (ASE) and Confidence Limits. PROC FREQ also displays Tests for
FishersExactMC
Gamma
GammaTest
JTTest
JTTestMC
KappaStatistics
KappaWeights
List
LRChiSq
LRChiSqMC
McNemarsTest
Measures
MHChiSq
MHChiSqMC
NLevels
OddsRatioCL
OneWayChiSq
Description
Binomial proportion
Binomial proportion test
Breslow-Day test
Cochran-Mantel-Haenszel
statistics
Chi-square tests
Cochran's Q
Column scores
Exact confidence limits
for the common odds ratio
Common odds ratio exact test
Common relative risks
Column format
crosstabulation table
Crosstabulation table
Test for equal simple kappas
Tests for equal kappas
Fisher's exact test
Kappa weights
List format multiway table
Likelihood-ratio
chi-square exact test
Monte Carlo exact test for
likelihood-ratio chi-square
McNemar's test
Measures of association
Mantel-Haenszel
chi-square exact test
Monte Carlo exact test for
Mantel-Haenszel chi-square
Number of variable levels
Exact confidence limits
for the odds ratio
One-way chi-square test
Statement
TABLES
TABLES
TABLES
TABLES
Option
BINOMIAL (one-way tables)
BINOMIAL (one-way tables)
CMH (h × 2 × 2 tables)
CMH
TABLES
TABLES
TABLES
EXACT
CHISQ
AGREE (h × 2 × 2 tables)
SCOROUT
COMOR
EXACT
TABLES
TABLES
TABLES
TABLES
TABLES
EXACT
or TABLES
or TABLES
EXACT
COMOR
CMH (h × 2 × 2 tables)
CROSSLIST
(n-way table request, n > 1)
(n-way table request, n > 1)
AGREE (h × 2 × 2 tables)
AGREE (h × r × r tables, r > 2)
FISHER
FISHER or EXACT
CHISQ (2 × 2 tables)
FISHER / MC
TEST
TEST
TABLES
EXACT
GAMMA
GAMMA
JT
JT / MC
TABLES
TABLES
TABLES
EXACT
AGREE
(r × r tables, r > 2, and
no TEST or EXACT KAPPA)
AGREE and PRINTKWT
LIST
LRCHI
EXACT
LRCHI / MC
TABLES
TABLES
EXACT
AGREE (2 × 2 tables)
MEASURES
MHCHI
EXACT
MHCHI / MC
PROC
EXACT
NLEVELS
OR
TABLES
Description
Monte Carlo exact test
for one-way chi-square
One-way frequencies
Overall simple kappa
Overall kappa coefficients
Pearson chi-square
exact test
Monte Carlo exact test
for Pearson chi-square
Pearson correlation
PearsonCorrTest
RelativeRisks
RiskDiffCol1
RiskDiffCol2
RowScores
SimpleKappa
SimpleKappaMC
SimpleKappaTest
SomersDCR
SomersDCRTest
SomersDRC
SomersDRCTest
SpearmanCorr
Somers' D(C|R)
Somers' D(C|R) test
Somers' D(R|C)
Somers' D(R|C) test
Spearman correlation
SpearmanCorrMC
SpearmanCorrTest
SymmetryTest
TauB
TauBTest
TauC
TauCTest
TrendTest
Test of symmetry
Kendall's tau-b
Kendall's tau-b test
Stuart's tau-c
Stuart's tau-c test
Cochran-Armitage test
for trend
Statement
EXACT
Option
CHISQ / MC (one-way tables)
PROC
or TABLES
TABLES
TABLES
EXACT
EXACT
PCHI / MC
TEST
or EXACT
EXACT
PCORR
PCORR
PCORR / MC
TEST
or EXACT
TABLES
TABLES
TABLES
TABLES
TEST
or EXACT
EXACT
PCORR
PCORR
RELRISK or MEASURES
(2 × 2 tables)
RISKDIFF (2 × 2 tables)
RISKDIFF (2 × 2 tables)
SCOROUT
KAPPA
KAPPA
KAPPA / MC
TEST
or EXACT
TEST
TEST
TEST
TEST
TEST
or EXACT
EXACT
KAPPA
KAPPA
SMDCR
SMDCR
SMDRC
SMDRC
SCORR
SCORR
SCORR / MC
TEST
or EXACT
TABLES
TEST
TEST
TEST
TEST
TABLES
SCORR
SCORR
AGREE
KENTB
KENTB
STUTC
STUTC
TREND
Description                                  Statement        Option
Monte Carlo exact test for trend             EXACT            TREND / MC
Weighted kappa                               TEST or EXACT    WTKAP
Monte Carlo exact test for weighted kappa    EXACT            WTKAP / MC
Weighted kappa test                          TEST or EXACT    WTKAP
Examples
Example 2.1. Creating an Output Data Set with Table Cell
Frequencies
The eye and hair color of children from two different regions of Europe are recorded
in the data set Color. Instead of recording one observation per child, the data are
recorded as cell counts, where the variable Count contains the number of children
exhibiting each of the 15 eye and hair color combinations. The data set does not
include missing combinations.
data Color;
   input Region Eyes $ Hair $ Count @@;
   label Eyes  ='Eye Color'
         Hair  ='Hair Color'
         Region='Geographic Region';
   datalines;
1 blue  fair   23  1 blue  red     7  1 blue  medium 24
1 blue  dark   11  1 green fair   19  1 green red     7
1 green medium 18  1 green dark   14  1 brown fair   34
1 brown red     5  1 brown medium 41  1 brown dark   40
1 brown black   3  2 blue  fair   46  2 blue  red    21
2 blue  medium 44  2 blue  dark   40  2 blue  black   6
2 green fair   50  2 green red    31  2 green medium 37
2 green dark   23  2 brown fair   56  2 brown red    42
2 brown medium 53  2 brown dark   54  2 brown black  13
;
The following statements read the Color data set and create an output data set
containing the frequencies, percentages, and expected cell frequencies of the Eyes by
Hair two-way table. The TABLES statement requests three tables: Eyes and Hair
frequency tables and an Eyes by Hair crosstabulation table. The OUT= option
creates the FreqCnt data set, which contains the crosstabulation table frequencies.
The OUTEXPECT option outputs the expected cell frequencies to FreqCnt, and the
SPARSE option includes the zero cell counts. The WEIGHT statement specifies
that Count contains the observation weights. These statements create Output 2.1.1
through Output 2.1.3.
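The statements themselves do not appear in this copy of the text. A sketch consistent with the description above is shown here; the option names and data set names are those the paragraph mentions, while the title text and the PROC PRINT step used to list FreqCnt are assumptions.

```sas
/* Sketch reconstructed from the description; the title and the PROC PRINT step are assumptions. */
proc freq data=Color;
   weight Count;                                          /* Count holds the cell counts  */
   tables Eyes Hair Eyes*Hair / out=FreqCnt outexpect sparse;
   title 'Eye and Hair Color of European Children';
run;

proc print data=FreqCnt noobs;                            /* list the output data set     */
run;
```

Because OUT= applies to the last table request, FreqCnt holds the Eyes by Hair cells, including the zero-count cell that SPARSE retains.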
Output 2.1.1 displays the two frequency tables produced, one showing the
distribution of eye color, and one showing the distribution of hair color. By default, PROC
FREQ lists the variable values in alphabetical order. The Eyes*Hair specification
produces a crosstabulation table, shown in Output 2.1.2, with eye color defining the
table rows and hair color defining the table columns. A zero cell count for green eyes
and black hair indicates that this eye and hair color combination does not occur in the
data.
The output data set (Output 2.1.3) contains frequency counts and percentages for the
last table. The data set also includes an observation for the zero cell count (SPARSE)
and a variable with the expected cell frequency for each table cell (OUTEXPECT).
                        Hair Color

                                  Cumulative    Cumulative
Hair      Frequency     Percent    Frequency       Percent
-----------------------------------------------------------
black            22        2.89           22          2.89
dark            182       23.88          204         26.77
fair            228       29.92          432         56.69
medium          217       28.48          649         85.17
red             113       14.83          762        100.00
          Hair(Hair Color)

Frequency|
Percent  |
Row Pct  |
Col Pct  |black   |dark    |fair    |medium  |red     |  Total
---------+--------+--------+--------+--------+--------+
blue     |      6 |     51 |     69 |     68 |     28 |    222
         |   0.79 |   6.69 |   9.06 |   8.92 |   3.67 |  29.13
         |   2.70 |  22.97 |  31.08 |  30.63 |  12.61 |
         |  27.27 |  28.02 |  30.26 |  31.34 |  24.78 |
---------+--------+--------+--------+--------+--------+
brown    |     16 |     94 |     90 |     94 |     47 |    341
         |   2.10 |  12.34 |  11.81 |  12.34 |   6.17 |  44.75
         |   4.69 |  27.57 |  26.39 |  27.57 |  13.78 |
         |  72.73 |  51.65 |  39.47 |  43.32 |  41.59 |
---------+--------+--------+--------+--------+--------+
green    |      0 |     37 |     69 |     55 |     38 |    199
         |   0.00 |   4.86 |   9.06 |   7.22 |   4.99 |  26.12
         |   0.00 |  18.59 |  34.67 |  27.64 |  19.10 |
         |   0.00 |  20.33 |  30.26 |  25.35 |  33.63 |
---------+--------+--------+--------+--------+--------+
Total          22      182      228      217      113      762
             2.89    23.88    29.92    28.48    14.83   100.00
Eyes     Hair      COUNT    EXPECTED    PERCENT
blue     black         6       6.409     0.7874
blue     dark         51      53.024     6.6929
blue     fair         69      66.425     9.0551
blue     medium       68      63.220     8.9239
blue     red          28      32.921     3.6745
brown    black        16       9.845     2.0997
brown    dark         94      81.446    12.3360
brown    fair         90     102.031    11.8110
brown    medium       94      97.109    12.3360
brown    red          47      50.568     6.1680
green    black         0       5.745     0.0000
green    dark         37      47.530     4.8556
green    fair         69      59.543     9.0551
green    medium       55      56.671     7.2178
green    red          38      29.510     4.9869
The frequency tables in Output 2.2.1 list the variable values (hair color) in the order
in which they appear in the data set. The Test Percent column lists the hypothesized
percentages for the chi-square test. Always check that you have ordered the TESTP=
percentages to correctly match the order of the variable levels.
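A sketch of the kind of step this discussion refers to is given below. The statements that produced the output are not present in this copy, so everything here is an assumption guided by the text: ORDER=DATA reflects the data-set ordering of levels, the BY statement yields one analysis per region, and the TESTP= values are placeholder percentages (summing to 100) listed in the order the hair colors first appear in Color: fair, red, medium, dark, black.

```sas
/* Sketch only: the TESTP= percentages are illustrative placeholders,              */
/* not necessarily the values used to produce the output shown in this chapter.    */
proc sort data=Color;
   by Region;                                   /* BY-group processing needs sorted data */
run;

proc freq data=Color order=data;
   weight Count;
   tables Hair / nocum chisq testp=(30 12 30 25 3);
   by Region;
run;
```

If the levels were listed in a different order (for example, alphabetically under the default ORDER=), the same TESTP= list would pair each percentage with the wrong hair color, which is the mismatch the text warns about.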
PROC FREQ computes a chi-square statistic for each region. The chi-square statistic
is significant at the 0.05 level for Region 2 (p=0.0003) but not for Region 1. This
indicates a significant departure from the hypothesized percentages in Region 2.
     Chi-Square Test
for Specified Proportions
-------------------------
Chi-Square         7.7602
DF                      4
Pr > ChiSq         0.1008

    Sample Size = 246

     Chi-Square Test
for Specified Proportions
-------------------------
Chi-Square        21.3824
DF                      4
Pr > ChiSq         0.0003

    Sample Size = 516
The first TABLES statement produces a frequency table for eye color. The
BINOMIAL option computes the binomial proportion and confidence limits, and it
tests the hypothesis that the proportion for the first eye color level (brown) is 0.5. The
option ALPHA=.1 specifies that 90% confidence limits should be computed. The
second TABLES statement creates a frequency table for hair color and computes the
binomial proportion and confidence limits, but it tests that the proportion for the first
hair color (fair) is 0.28. These statements produce Output 2.3.1 and Output 2.3.2.
The frequency table in Output 2.3.1 displays the variable values in order of
descending frequency count. Since the first variable level is brown, PROC FREQ computes
the binomial proportion of children with brown eyes. PROC FREQ also computes its
asymptotic standard error (ASE), and asymptotic and exact 90% confidence limits.
If you do not specify the ALPHA= option, then PROC FREQ computes the default
95% confidence limits.
Because the value of Z is less than zero, PROC FREQ computes a left-sided p-value
(0.0019). This small p-value supports the alternative hypothesis that the true value of
the proportion of children with brown eyes is less than 50%.
Output 2.3.2 displays the results from the second TABLES statement. PROC FREQ
computes the default 95% confidence limits since the ALPHA= option is not
specified. The value of Z is greater than zero, so PROC FREQ computes a right-sided
p-value (0.1188). This large p-value provides insufficient evidence to reject the null
hypothesis that the proportion of children with fair hair is 28%.
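The statements being described are not present in this copy. A sketch that matches the description is shown below: ORDER=FREQ is assumed because the output lists levels by descending frequency, and the P= suboption is assumed as the way the 0.28 null proportion is supplied.

```sas
/* Sketch reconstructed from the description above; details are assumptions. */
proc freq data=Color order=freq;
   weight Count;
   tables Eyes / binomial alpha=.1;      /* tests proportion = 0.5, with 90% limits  */
   tables Hair / binomial(p=.28);        /* tests proportion = 0.28, with 95% limits */
run;
```

The binomial proportion is computed for the first level of each variable, so the ordering option determines which level (brown eyes, fair hair) is analyzed.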
      Binomial Proportion
       for Eyes = brown
--------------------------------
Proportion                0.4475
ASE                       0.0180
90% Lower Conf Limit      0.4179
90% Upper Conf Limit      0.4771

Exact Conf Limits
90% Lower Conf Limit      0.4174
90% Upper Conf Limit      0.4779

 Test of H0: Proportion = 0.5

ASE under H0              0.0181
Z                        -2.8981
One-sided Pr <  Z         0.0019
Two-sided Pr > |Z|        0.0038
      Binomial Proportion
       for Hair = fair
--------------------------------
Proportion                0.2992
ASE                       0.0166
95% Lower Conf Limit      0.2667
95% Upper Conf Limit      0.3317

Exact Conf Limits
95% Lower Conf Limit      0.2669
95% Upper Conf Limit      0.3331

 Test of H0: Proportion = 0.28

ASE under H0              0.0163
Z                         1.1812
One-sided Pr >  Z         0.1188
Two-sided Pr > |Z|        0.2375
In the following statements, the TABLES statement creates a two-way table, and the
option ORDER=DATA orders the contingency table values by their order in the data
set. The CHISQ option produces several chi-square tests, while the RELRISK option
produces relative risk measures. The EXACT statement creates the exact Pearson chi-square
test and exact confidence limits for the odds ratio. These statements produce
Output 2.4.1 through Output 2.4.3.
proc freq data=FatComp order=data;
   weight Count;
   tables Exposure*Response / chisq relrisk;
   exact pchi or;
   format Exposure ExpFmt. Response RspFmt.;
   title 'Case-Control Study of High Fat/Cholesterol Diet';
run;
                   Response(Heart Disease)
Frequency        |
Percent          |
Row Pct          |
Col Pct          |Yes     |No      |  Total
-----------------+--------+--------+
High Cholesterol |     11 |      4 |     15
Diet             |  47.83 |  17.39 |  65.22
                 |  73.33 |  26.67 |
                 |  84.62 |  40.00 |
-----------------+--------+--------+
Low Cholesterol  |      2 |      6 |      8
Diet             |   8.70 |  26.09 |  34.78
                 |  25.00 |  75.00 |
                 |  15.38 |  60.00 |
-----------------+--------+--------+
Total                  13       10       23
                    56.52    43.48   100.00
The contingency table in Output 2.4.1 displays the variable values so that the first
table cell contains the frequency for the first cell in the data set, the frequency of
positive exposure and positive response.
0.0334
0.0393
Sample Size = 23
Output 2.4.2 displays the chi-square statistics. Since the expected counts in some
of the table cells are small, PROC FREQ gives a warning that the asymptotic chi-square
tests may not be appropriate. In this case, the exact tests are appropriate.
The alternative hypothesis for this analysis states that coronary heart disease is more
likely to be associated with a high fat diet, so a one-sided test is desired. Fisher's
exact right-sided test analyzes whether the probability of heart disease in the high fat
group exceeds the probability of heart disease in the low fat group; since this p-value
is small, the alternative hypothesis is supported.
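Odds Ratio (Case-Control Study)
-----------------------------------
Asymptotic Conf Limits
95% Lower Conf Limit         1.1535
95% Upper Conf Limit        59.0029

Exact Conf Limits
95% Lower Conf Limit         0.8677
95% Upper Conf Limit       105.5488

Sample Size = 23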
The odds ratio, displayed in Output 2.4.3, provides an estimate of the relative risk
when an event is rare. This estimate indicates that the odds of heart disease are 8.25
times higher in the high fat diet group; however, the wide confidence limits indicate
that this estimate has low precision.
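The odds ratio of 8.25 follows directly from the four cell counts in Output 2.4.1. The sketch below recomputes it in a DATA step, together with large-sample limits based on the standard error of the log odds ratio. This is the usual logit-based approximation, assumed here for illustration. The variable names are hypothetical.

```sas
data _null_;
   /* Cell counts from Output 2.4.1:                                     */
   /*            Yes   No                                                */
   /*   High      a     b                                                */
   /*   Low       c     d                                                */
   a = 11;  b = 4;  c = 2;  d = 6;

   oddsratio = (a * d) / (b * c);            /* cross-product ratio      */
   se_log    = sqrt(1/a + 1/b + 1/c + 1/d);  /* SE of log(odds ratio)    */
   zcrit     = probit(0.975);                /* two-sided 95% limits     */

   lower = exp(log(oddsratio) - zcrit * se_log);
   upper = exp(log(oddsratio) + zcrit * se_log);

   put oddsratio= 8.4 lower= 8.4 upper= 8.4;
run;
```

With only 23 subjects the standard error of the log odds ratio is about 1, which is why the interval spans nearly two orders of magnitude.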
The CHISQ option produces chi-square tests, the EXPECTED option displays expected cell frequencies in the table, and the CELLCHI2 option displays the cell contribution to the chi-square. The NOROW and NOCOL options suppress the display
of row and column percents in the table.
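A program that uses the options just described might look like the following sketch. The counts are copied from the table in Output 2.5.1. The variable name Count, the output data set name ChiSqData, and the OUTPUT statement keywords are assumptions; the keywords are chosen to match the description of the OUT= data set later in this example.

```sas
/* Cell counts copied from Output 2.5.1; Count is an assumed name.      */
data Color;
   input Eyes $ Hair $ Count @@;
   datalines;
blue  fair 69   blue  red 28   blue  medium 68   blue  dark 51   blue  black  6
green fair 69   green red 38   green medium 55   green dark 37   green black  0
brown fair 90   brown red 47   brown medium 94   brown dark 94   brown black 16
;

/* ORDER=DATA keeps the levels in the order they appear in the data.    */
proc freq data=Color order=data;
   weight Count;
   tables Eyes*Hair / chisq expected cellchi2 norow nocol;
   output out=ChiSqData n nmiss pchi lrchi;  /* ChiSqData: hypothetical */
run;

proc print data=ChiSqData noobs;
run;
```

The OUTPUT statement writes one observation per table, so the PROC PRINT step shows the sample size, the missing-value count, and the Pearson and likelihood-ratio chi-square results in a single row.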
                           Hair(Hair Color)
Frequency      |
Expected       |
Cell Chi-Square|
Percent        |fair    |red     |medium  |dark    |black   |  Total
---------------+--------+--------+--------+--------+--------+
blue           |     69 |     28 |     68 |     51 |      6 |    222
               | 66.425 | 32.921 |  63.22 | 53.024 | 6.4094 |
               | 0.0998 | 0.7357 | 0.3613 | 0.0772 | 0.0262 |
               |   9.06 |   3.67 |   8.92 |   6.69 |   0.79 |  29.13
---------------+--------+--------+--------+--------+--------+
green          |     69 |     38 |     55 |     37 |      0 |    199
               | 59.543 |  29.51 | 56.671 |  47.53 | 5.7454 |
               | 1.5019 | 2.4422 | 0.0492 | 2.3329 | 5.7454 |
               |   9.06 |   4.99 |   7.22 |   4.86 |   0.00 |  26.12
---------------+--------+--------+--------+--------+--------+
brown          |     90 |     47 |     94 |     94 |     16 |    341
               | 102.03 | 50.568 | 97.109 | 81.446 | 9.8451 |
               | 1.4187 | 0.2518 | 0.0995 |  1.935 | 3.8478 |
               |  11.81 |   6.17 |  12.34 |  12.34 |   2.10 |  44.75
---------------+--------+--------+--------+--------+--------+
Total               228      113      217      182       22      762
                  29.92    14.83    28.48    23.88     2.89   100.00
The contingency table in Output 2.5.1 displays eye and hair color in the order in
which they appear in the Color data set. The Pearson chi-square statistic in Output
2.5.2 provides evidence of an association between eye and hair color (p=0.0073).
The cell chi-square values show that most of the association is due to more green-eyed
children with fair or red hair and fewer with dark or black hair. The opposite
occurs with the brown-eyed children.
NMISS
_PCHI_
DF_PCHI
762
20.9248
P_PCHI
_LRCHI_
.007349898
25.9733
DF_LRCHI
8
P_LRCHI
.001061424
The OUT= data set is displayed in Output 2.5.3. It contains one observation with the
sample size, the number of missing values, and the chi-square statistics and corresponding degrees of freedom and p-values as in Output 2.5.2.
@@;
11
20
16
19
For a stratified 2 × 2 table, the three CMH statistics displayed in Output 2.6.1 test
the same hypothesis. The significant p-value (0.004) indicates that the association
between treatment and response remains strong after adjusting for gender.
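Type of Study        Method              Value     95% Confidence Limits
------------------------------------------------------------------------
Cohort               Mantel-Haenszel    2.1636       1.2336       3.7948
(Col1 Risk)          Logit              2.1059       1.1951       3.7108

Cohort               Mantel-Haenszel    0.6420       0.4705       0.8761
(Col2 Risk)          Logit              0.6613       0.4852       0.9013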
The CMH option also produces a table of relative risks, as shown in Output 2.6.2.
Because this is a prospective study, the relative risk estimate assesses the effectiveness
of the new drug; the Cohort (Col1 Risk) values are the appropriate estimates for the
first column, or the risk of improvement. The probability of migraine improvement
with the new drug is just over two times the probability of improvement with the
placebo.
Output 2.6.3. CMH Option: Breslow-Day Test
Clinical Trial for Treatment of Migraine Headaches
Summary Statistics for Treatment by Response
Controlling for Gender

Breslow-Day Test for
Homogeneity of the Odds Ratios
------------------------------
Chi-Square              1.4929
DF                           1
Pr > ChiSq              0.2218

Total Sample Size = 106
The large p-value for the Breslow-Day test (0.2218) in Output 2.6.3 indicates no
significant gender difference in the odds ratios.
The TABLES statement in the following program produces a two-way table. The
MEASURES option produces measures of association, and the CL option produces
confidence limits for these measures. The TREND option tests for a trend across the
ordinal values of the Dose variable with the Cochran-Armitage test. The EXACT
statement produces exact p-values for this test, and the MAXTIME= option terminates the exact computations if they do not complete within 60 seconds. The TEST
statement computes an asymptotic test for Somers' D(C|R). These statements produce Output 2.7.1 through Output 2.7.3.
proc freq data=Pain;
   weight Count;
   tables Dose*Adverse / trend measures cl;
   test smdcr;
   exact trend / maxtime=60;
   title1 'Clinical Trial for Treatment of Pain';
run;
           Adverse
Frequency|
Percent  |
Row Pct  |
Col Pct  |No      |Yes     |  Total
---------+--------+--------+
       0 |     26 |      6 |     32
         |  16.15 |   3.73 |  19.88
         |  81.25 |  18.75 |
         |  25.49 |  10.17 |
---------+--------+--------+
       1 |     26 |      7 |     33
         |  16.15 |   4.35 |  20.50
         |  78.79 |  21.21 |
         |  25.49 |  11.86 |
---------+--------+--------+
       2 |     23 |      9 |     32
         |  14.29 |   5.59 |  19.88
         |  71.88 |  28.13 |
         |  22.55 |  15.25 |
---------+--------+--------+
       3 |     18 |     14 |     32
         |  11.18 |   8.70 |  19.88
         |  56.25 |  43.75 |
         |  17.65 |  23.73 |
---------+--------+--------+
       4 |      9 |     23 |     32
         |   5.59 |  14.29 |  19.88
         |  28.13 |  71.88 |
         |   8.82 |  38.98 |
---------+--------+--------+
Total         102       59      161
            63.35    36.65   100.00
The Row Pct values in Output 2.7.1 show the expected increasing trend in the
proportion of adverse effects due to increasing dosage (from 18.75% to 71.88%).
0.2569
0.4427
0.0499
0.0837
0.1592
0.2786
0.3547
0.6068
Statistic                  Value       ASE    95% Confidence Limits
Pearson Correlation       0.3776    0.0714      0.2378      0.5175
Spearman Correlation      0.3771    0.0718      0.2363      0.5178
0.2373
0.1250
0.1604
0.0837
0.0662
0.0621
0.0732
0.0000
0.0388
0.4014
0.2547
0.2821
0.1261
0.0515
0.0731
0.0467
0.0191
0.0271
0.0346
0.0140
0.0199
0.2175
0.0890
0.1262
Somers' D C|R
--------------------------------
Somers' D C|R             0.2569
ASE                       0.0499
95% Lower Conf Limit      0.1592
95% Upper Conf Limit      0.3547

Test of H0: Somers' D C|R = 0

ASE under H0              0.0499
Z                         5.1511
One-sided Pr >  Z         <.0001
Two-sided Pr > |Z|        <.0001
<.0001
<.0001
7.237E-07
1.324E-06
The Cochran-Armitage test (Output 2.7.3) supports the trend hypothesis. The small
left-sided p-values for the Cochran-Armitage test indicate that the probability of the
Column 1 level (Adverse=No) decreases as Dose increases or, equivalently, that
the probability of the Column 2 level (Adverse=Yes) increases as Dose increases.
The two-sided p-value tests against either an increasing or decreasing alternative.
This is an appropriate hypothesis when you want to determine whether the drug has
progressive effects on the probability of adverse effects but the direction is unknown.
$ SkinResponse @@;
1 sadness 22.5   1 calmness 22.6
2 sadness 53.7   2 calmness 53.1
3 sadness 10.8   3 calmness  8.3
4 sadness 21.1   4 calmness 21.6
5 sadness 13.7   5 calmness 13.3
6 sadness 39.2   6 calmness 37.0
7 sadness 13.7   7 calmness 14.8
8 sadness 16.3   8 calmness 14.8
In the following statements, the TABLES statement creates a three-way table stratified by Subject and a two-way table; the variables Emotion and SkinResponse
form the rows and columns of each table. The CMH2 option produces the first two
Cochran-Mantel-Haenszel statistics, the option SCORES=RANK specifies that rank
scores are used to compute these statistics, and the NOPRINT option suppresses the
contingency tables. These statements produce Output 2.8.1 and Output 2.8.2.
proc freq data=Hypnosis;
   tables Subject*Emotion*SkinResponse
          Emotion*SkinResponse
          / cmh2 scores=rank noprint;
run;
Because the CMH statistics in Output 2.8.1 are based on rank scores, the Row Mean
Scores Differ statistic is identical to Friedman's chi-square (Q = 6.45). The p-value
of 0.0917 indicates that differences in skin potential response for different emotions
are significant at the 10% level but not at the 5% level.
When you do not stratify by subject, the Row Mean Scores Differ CMH statistic is
identical to a Kruskal-Wallis test and is not significant (p=0.9038 in Output 2.8.2).
Thus, adjusting for subject is critical to reducing the background variation due to
subject differences.
Drug_B          Frequency     Percent
-------------------------------------
Favorable              28       60.87
Unfavorable            18       39.13

Drug_C          Frequency     Percent
-------------------------------------
Favorable              16       34.78
Unfavorable            30       65.22
The one-way frequency tables in Output 2.9.1 provide the marginal response for each
drug. For drugs A and B, 61% of the subjects reported a favorable response while
35% of the subjects reported a favorable response to drug C.
McNemar's test (Output 2.9.2) shows strong discordance between drugs B and C
when the response to drug A is favorable. The small negative value of the simple
kappa indicates no agreement between drug B response and drug C response.
Output 2.9.3. Cochran's Q
Study of Three Drug Treatments for a Chronic Disease
Summary Statistics for Drug_B by Drug_C
Controlling for Drug_A

Cochran's Q, for Drug_A
by Drug_B by Drug_C
-----------------------
Statistic (Q)    8.4706
DF                    2
Pr > Q           0.0145

Total Sample Size = 46
References
Agresti, A. (1990), Categorical Data Analysis, New York: John Wiley & Sons, Inc.
Agresti, A. (1992), A Survey of Exact Inference for Contingency Tables, Statistical
Science, 7(1), 131177.
Agresti, A. (1996), An Introduction to Categorical Data Analysis, New York: John
Wiley & Sons, Inc.
Agresti, A., Mehta, C.R. and Patel, N.R. (1990), Exact Inference for Contingency
Tables with Ordered Categories, Journal of the American Statistical
Association, 85, 453458.
Agresti, A., Wackerly, D., and Boyett, J.M. (1979), Exact Conditional Tests
for Cross-Classications: Approximation of Attained Signicance Levels,
Psychometrika, 44, 7583.
Birch, M.W. (1965), The Detection of Partial Association, II: The General Case,
Journal of the Royal Statistical Society, B, 27, 111124.
Bishop, Y., Fienberg, S.E., and Holland, P.W. (1975), Discrete Multivariate Analysis:
Theory and Practice, Cambridge, MA: MIT Press.
Bowker, A.H. (1948), Bowkers Test for Symmetry, Journal of the American
Statistical Association, 43, 572574.
Breslow, N.E. (1996), Statistics in Epidemiology: The Case-Control Study, Journal
of the American Statistical Association, 91, 1426.
185
186
References
Goodman, L.A. and Kruskal, W.H. (1979), Measures of Association for Cross
Classication, New York: Springer-Verlag.
Greenland, S. and Robins, J.M. (1985), Estimators of the Mantel-Haenszel Variance
Consistent in Both Sparse Data and Large-Strata Limiting Models, Biometrics,
42, 311323.
Haldane, J.B.S. (1955), The Estimation and Signicance of the Logarithm of a Ratio
of Frequencies, Annals of Human Genetics, 20, 309314.
Hollander, M. and Wolfe, D.A. (1973), Nonparametric Statistical Methods, New
York: John Wiley & Sons, Inc.
Jones, M.P., OGorman, T.W., Lemka, J.H., and Woolson, R.F. (1989), A Monte
Carlo Investigation of Homogeneity Tests of the Odds Ratio Under Various
Sample Size Congurations, Biometrics, 45, 171181.
Kendall, M. (1955), Rank Correlation Methods, Second Edition, London: Charles
Grifn and Co.
Kendall, M. and Stuart, A. (1979), The Advanced Theory of Statistics, vol. 2, New
York: Macmillan Publishing Company, Inc.
Kleinbaum, D.G., Kupper, L.L., and Morgenstern, H. (1982), Epidemiologic
Research: Principles and Quantitative Methods, Research Methods Series, New
York: Van Nostrand Reinhold.
Landis, R.J., Heyman, E.R., and Koch, G.G. (1978), Average Partial Association in
Three-way Contingency Tables: A Review and Discussion of Alternative Tests,
International Statistical Review, 46, 237254.
Leemis, L.M. and Trivedi, K.S. (1996), A Comparison of Approximate Interval
Estimators for the Bernoulli Parameter, The American Statistician, 50(1),
6368.
Lehmann, E.L. (1975), Nonparametrics: Statistical Methods Based on Ranks, San
Francisco: Holden-Day, Inc.
Liebetrau, A.M. (1983), Measures of Association, Quantitative Application in the
Social Sciences, vol. 32, Beverly Hills: Sage Publications, Inc.
Mack, G.A. and Skillings, J.H. (1980), A Friedman-Type Rank Test for Main Effects
in a Two-Factor ANOVA, Journal of the American Statistical Association, 75,
947951.
Mantel, N. (1963), Chi-square Tests with One Degree of Freedom: Extensions of the
Mantel-Haenszel Procedure, Journal of the American Statistical Association,
58, 690700.
Mantel, N. and Haenszel, W. (1959), Statistical Aspects of the Analysis of Data from
Retrospective Studies of Disease, Journal of the National Cancer Institute, 22,
719748.
Margolin, B.H. (1988), Test for Trend in Proportions, in Encyclopedia of Statistical
Sciences, vol. 9, ed. S. Kotz and N.L. Johnson, New York: John Wiley & Sons,
Inc., 334336.
187
188
References
Valz, P.D. and Thompson, M.E. (1994), Exact Inference for Kendalls S and
Spearmans Rho with Extensions to Fishers Exact Test in r c Contingency
Tables, Journal of Computational and Graphical Statistics, 3(4), 459472.
van Elteren, P.H. (1960), On the Combination of Independent Two-Sample Tests of
Wilcoxon, Bulletin of the International Statistical Institute, 37, 351361.
Vollset, S.E., Hirji, K.F., and Elashoff, R.M. (1991), Fast Computation of Exact
Condence Limits for the Common Odds Ratio in a Series of 2 2 Tables,
Journal of the American Statistical Association, 86, 404409.
Woolf, B. (1955), On Estimating the Relationship between Blood Group and
Disease, Annals of Human Genetics, 19, 251253.
189
190
Chapter 3

SYNTAX
   PROC UNIVARIATE Statement
   BY Statement
   CLASS Statement
   FREQ Statement
   HISTOGRAM Statement
   ID Statement
   INSET Statement
   OUTPUT Statement
   PROBPLOT Statement
   QQPLOT Statement
   VAR Statement
   WEIGHT Statement

DETAILS
   Missing Values
   Rounding
   Descriptive Statistics
   Calculating the Mode
   Calculating Percentiles
   Tests for Location
   Confidence Limits for Parameters of the Normal Distribution
   Robust Estimators
   Creating Line Printer Plots
   Creating High-Resolution Graphics
   Using the CLASS Statement to Create Comparative Plots
   Positioning the Inset
   Formulas for Fitted Continuous Distributions
   Goodness-of-Fit Tests

EXAMPLES
   Example 3.1. Computing Descriptive Statistics for Multiple Variables
   Example 3.2. Calculating Modes
   Example 3.3. Identifying Extreme Observations and Extreme Values
   Example 3.4. Creating a Frequency Table
   Example 3.5. Creating Plots for Line Printer Output
   Example 3.6. Analyzing a Data Set With a FREQ Variable
   Example 3.7. Saving Summary Statistics in an OUT= Output Data Set
   Example 3.8. Saving Percentiles in an Output Data Set
   Example 3.9. Computing Confidence Limits for the Mean, Standard Deviation, and Variance
   Example 3.10. Computing Confidence Limits for Quantiles and Percentiles
   Example 3.11. Computing Robust Estimates
   Example 3.12. Testing for Location
   Example 3.13. Performing a Sign Test Using Paired Data
   Example 3.14. Creating a Histogram
   Example 3.15. Creating a One-Way Comparative Histogram
   Example 3.16. Creating a Two-Way Comparative Histogram
   Example 3.17. Adding Insets with Descriptive Statistics
   Example 3.18. Binning a Histogram
   Example 3.19. Adding a Normal Curve to a Histogram
   Example 3.20. Adding Fitted Normal Curves to a Comparative Histogram
   Example 3.21. Fitting a Beta Curve
   Example 3.22. Fitting Lognormal, Weibull, and Gamma Curves
   Example 3.23. Computing Kernel Density Estimates
   Example 3.24. Fitting a Three-Parameter Lognormal Curve
   Example 3.25. Annotating a Folded Normal Curve
   Example 3.26. Creating Lognormal Probability Plots
   Example 3.27. Creating a Histogram to Display Lognormal Fit
   Example 3.28. Creating a Normal Quantile Plot
   Example 3.29. Adding a Distribution Reference Line
   Example 3.30. Interpreting a Normal Quantile Plot
   Example 3.31. Estimating Three Parameters from Lognormal Quantile Plots

Getting Started
The following examples demonstrate how you can use the UNIVARIATE procedure
to analyze the distributions of variables through the use of descriptive statistical measures and graphical displays, such as histograms.
The ODS SELECT statement restricts the default output to the tables for basic statistical measures and extreme observations.
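A minimal sketch of such a step is shown below. The data set name HomeLoans and the variable LoanToValueRatio are the names used later in this section. The DATA step that creates them is a simulated placeholder for illustration only, not the data behind Figure 3.1, so the statistics it produces will differ from those shown.

```sas
/* Placeholder data: simulated ratios, NOT the HomeLoans data           */
/* analyzed in the text.                                                */
data HomeLoans;
   do i = 1 to 200;
      LoanToValueRatio = 0.05 + 0.6 * ranuni(12345);
      output;
   end;
   drop i;
run;

/* Keep only the basic measures and extreme observations tables.        */
ods select BasicMeasures ExtremeObs;
proc univariate data=HomeLoans;
   var LoanToValueRatio;
run;
```

Without the ODS SELECT statement, PROC UNIVARIATE would also display its other default tables, such as moments, tests for location, and quantiles.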
Variable:  LoanToValueRatio

       Location                     Variability
Mean      0.292512      Std Deviation            0.16476
Median    0.248050      Variance                 0.02715
Mode      0.250000      Range                    1.24780
                        Interquartile Range      0.16419

            Extreme Observations
-------Lowest------        -----Highest-----
     Value      Obs            Value      Obs
 0.0651786        1          1.13976     5776
 0.0690157        3          1.14209     5791
 0.0699755       59          1.14286     5801
 0.0702412       84          1.17090     5799
 0.0704787        4          1.31298     5811
The tables in Figure 3.1 show, in particular, that the average ratio is 0.2925 and the
minimum and maximum ratios are 0.06518 and 1.3130, respectively.
The NOPRINT option suppresses the display of summary statistics. The INSET
statement inserts the total number of analyzed home loans in the northeast corner of
the plot.
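The options described above could be combined as in the following sketch. The placeholder DATA step, the inset label text, and the CFILL= color are assumptions made for illustration. The HISTOGRAM statement produces high-resolution graphics, which this sketch assumes are available in your installation.

```sas
/* Placeholder data: simulated ratios, NOT the HomeLoans data           */
/* analyzed in the text.                                                */
data HomeLoans;
   do i = 1 to 200;
      LoanToValueRatio = 0.05 + 0.6 * ranuni(12345);
      output;
   end;
   drop i;
run;

/* NOPRINT suppresses the tables; only the histogram is produced.       */
proc univariate data=HomeLoans noprint;
   histogram LoanToValueRatio / cfill=ltgray;
   inset n = 'Number of Homes' / position=ne;   /* label is assumed     */
run;
```

POSITION=NE places the inset in the upper right (northeast) corner of the plot area.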
The data set HomeLoans contains a variable named LoanType that classifies the
loans into two types: Gold and Platinum. It is useful to compare the distributions
of LoanToValueRatio for the two types. The following statements request quantiles
for each distribution and a comparative histogram, which are shown in Figure 3.3 and
Figure 3.4.
title 'Comparison of Loan Types';
ods select Quantiles MyHist;
proc univariate data=HomeLoans;
   var LoanToValueRatio;
   class LoanType;
   histogram LoanToValueRatio / cfill=ltgray
                                kernel(color=black)
                                name='MyHist';
   inset n='Number of Homes' median='Median Ratio' (5.3) / position=ne;
   label LoanType = 'Type of Loan';
run;
The ODS SELECT statement restricts the default output to the tables of quantiles.
The CLASS statement specifies LoanType as a classification variable for the quantile computations and comparative histogram. The KERNEL option adds a smooth
nonparametric estimate of the ratio density to each histogram. The INSET statement
specifies summary statistics to be displayed directly in the graph.
Variable:  LoanToValueRatio
LoanType = Gold

Quantile         Estimate
100% Max        1.0617647
99%             0.8974576
95%             0.6385908
90%             0.4471369
75% Q3          0.2985099
50% Median      0.2217033
25% Q1          0.1734568
10%             0.1411130
5%              0.1213079
1%              0.0942167
0% Min          0.0651786

Variable:  LoanToValueRatio
LoanType = Platinum

Quantile         Estimate
100% Max         1.312981
99%              1.050000
95%              0.691803
90%              0.549273
75% Q3           0.430160
50% Median       0.366168
25% Q1           0.314452
10%              0.273670
5%               0.253124
1%               0.231114
0% Min           0.215504
The output in Figure 3.3 shows that the median ratio for Platinum loans (0.366) is
greater than the median ratio for Gold loans (0.222). The comparative histogram in
Figure 3.4 enables you to compare the two distributions more easily.
The comparative histogram shows that the ratio distributions are similar except for a
shift of about 0.14.
A sample program, univar1.sas, for this example is available in the SAS Sample
Library for Base SAS software.
Variable:

N                          30    Sum Weights                 30
Mean               -0.0053067    Sum Observations       -0.1592
Std Deviation      0.00254362    Variance            6.47002E-6
Skewness            1.2562507    Kurtosis            0.69790426
Uncorrected SS     0.00103245    Corrected SS        0.00018763
Coeff Variation    -47.932613    Std Error Mean       0.0004644

Test                  --Statistic---     -----p Value------
Shapiro-Wilk          W      0.845364    Pr < W      0.0005
Kolmogorov-Smirnov    D      0.208921    Pr > D     <0.0100
Cramer-von Mises      W-Sq   0.329274    Pr > W-Sq  <0.0050
Anderson-Darling      A-Sq   1.784881    Pr > A-Sq  <0.0050
All four goodness-of-fit tests in Figure 3.5 reject the hypothesis that the measurements are normally distributed.
Figure 3.6 shows a normal probability plot for the measurements. A linear pattern
of points following the diagonal reference line would indicate that the measurements
are normally distributed. Instead, the curved point pattern suggests that a skewed
distribution, such as the lognormal, is more appropriate than the normal distribution.
A lognormal distribution for Deviation is fitted in Example 3.26.
A sample program, univar2.sas, for this example is available in the SAS Sample
Library for Base SAS software.
Syntax
PROC UNIVARIATE < options > ;
BY variables ;
CLASS variable-1 <(v-options)>< variable-2 <(v-options)>>
< / KEYLEVEL= value1 | ( value1 value2 ) >;
FREQ variable ;
HISTOGRAM < variables >< / options > ;
ID variables ;
INSET keyword-list < / options > ;
OUTPUT < OUT=SAS-data-set >
< keyword1=names. . .keywordk=names >< percentile-options >;
PROBPLOT < variables >< / options > ;
QQPLOT < variables >< / options > ;
VAR variables ;
WEIGHT variable ;
The PROC UNIVARIATE statement invokes the procedure. The VAR statement
specifies the numeric variables to be analyzed, and it is required if the OUTPUT
statement is used to save summary statistics in an output data set. If you do not use
the VAR statement, all numeric variables in the data set are analyzed.
sample moments
basic measures of location and variability
confidence intervals for the mean, standard deviation, and variance
tests for location
tests for normality
trimmed and Winsorized means
robust estimates of scale
quantiles and related confidence intervals
extreme observations and extreme values
frequency counts for observations
missing values
ALL
requests all statistics and tables that the FREQ, MODES, NEXTRVAL=5, PLOT,
and CIBASIC options generate. If the analysis variables are not weighted,
this option also requests the statistics and tables generated by the CIPCTLDF,
CIPCTLNORMAL, LOCCOUNT, NORMAL, ROBUSTSCALE, TRIMMED=.25,
and WINSORIZED=.25 options. PROC UNIVARIATE also uses any values
that you specify for ALPHA=, MU0=, NEXTRVAL=, CIBASIC, CIPCTLDF,
CIPCTLNORMAL, TRIMMED=, or WINSORIZED= to produce the output.
ALPHA=α
specifies the level of significance α for 100(1 − α)% confidence intervals. The value α
must be between 0 and 1; the default value is 0.05, which results in 95% confidence
intervals.
Note that specialized ALPHA= options are available for a number of confidence interval options. For example, you can specify CIBASIC( ALPHA=0.10 ) to request a
table of basic confidence limits at the 90% level. The default values of these options
are the value of the ALPHA= option in the PROC statement.
ANNOTATE=SAS-data-set
ANNO=SAS-data-set
CIBASIC
requests confidence limits for the mean, standard deviation, and variance based on the
assumption that the data are normally distributed. If you use the CIBASIC option,
you must use the default value of VARDEF=, which is DF.
TYPE=keyword
CIPCTLNORMAL
requests confidence limits for quantiles based on the assumption that the data are
normally distributed. The computational method is described in Section 4.4.1 of
Hahn and Meeker (1991) and uses the noncentral t distribution as given by Odeh and
Owen (1980). This option does not apply if you use a WEIGHT statement.
TYPE=keyword
DATA=SAS-data-set
specifies the input SAS data set to be analyzed. If the DATA= option is omitted, the
procedure uses the most recently created SAS data set.
EXCLNPWGT
excludes observations with nonpositive weight values (zero or negative) from the
analysis. By default, PROC UNIVARIATE treats observations with negative weights
like those with zero weights and counts them in the total number of observations.
This option applies only when you use a WEIGHT statement.
FREQ
requests a frequency table that consists of the variable values, frequencies, cell percentages, and cumulative percentages.
If you specify the WEIGHT statement, PROC UNIVARIATE includes the weighted
count in the table and uses this value to compute the percentages.
GOUT=graphics-catalog
specifies the SAS catalog that PROC UNIVARIATE uses to save high-resolution
graphics output. If you omit the libref in the name of the graphics-catalog, PROC
UNIVARIATE looks for the catalog in the temporary library called WORK and creates the catalog if it does not exist.
LOCCOUNT
requests a table that shows the number of observations greater than, not equal to, and
less than the value of MU0=. PROC UNIVARIATE uses these values to construct the
sign test and the signed rank test. This option does not apply if you use a WEIGHT
statement.
MODES|MODE
requests a table of all possible modes. By default, when the data contain multiple
modes, PROC UNIVARIATE displays the lowest mode in the table of basic statistical
measures. When all the values are unique, PROC UNIVARIATE does not produce a
table of modes.
MU0=values
LOCATION=values
specifies the value of the mean or location parameter (μ0) in the null hypothesis for
tests of location summarized in the table labeled Tests for Location: Mu0=value. If
you specify one value, PROC UNIVARIATE tests the same null hypothesis for all
analysis variables. If you specify multiple values, a VAR statement is required, and
PROC UNIVARIATE tests a different null hypothesis for each analysis variable in
the corresponding order. The default value is 0.
The following statement tests the hypothesis μ0 = 0 for the first variable and the
hypothesis μ0 = 0.5 for the second variable.
proc univariate mu0=0 0.5;
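Because multiple MU0= values require a VAR statement, a complete step might look like the sketch below. The data set Scores, its variables Before and After, and the data values are hypothetical and serve only to make the example runnable.

```sas
/* Hypothetical data, for illustration only.                            */
data Scores;
   input Before After @@;
   datalines;
0.2 0.7   -0.1 0.4   0.3 0.6   0.0 0.5   -0.2 0.3   0.1 0.8
;

/* MU0= values pair with the VAR variables in order:                    */
/* Before is tested against 0, After against 0.5.                       */
ods select TestsForLocation;
proc univariate data=Scores mu0=0 0.5;
   var Before After;
run;
```

Each variable gets its own Tests for Location table, whose title shows the null value that was applied to that variable.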
NEXTROBS=n
specifies the number of extreme observations that PROC UNIVARIATE lists in the
table of extreme observations. The table lists the n lowest observations and the n
highest observations. The default value is 5, and n can range between 0 and half the
maximum number of observations. You can specify NEXTROBS=0 to suppress the
table of extreme observations.
NEXTRVAL=n
specifies the number of extreme values that PROC UNIVARIATE lists in the table of
extreme values. The table lists the n lowest unique values and the n highest unique
values. The default value is 0, and n can range between 0 and half the maximum
number of observations. By default, n = 0 and no table is displayed.
NOBYPLOT
suppresses side-by-side box plots that are created by default when you use the BY
statement and the ALL option or the PLOT option in the PROC statement.
NOPRINT
suppresses all the tables of descriptive statistics that the PROC UNIVARIATE statement creates. NOPRINT does not suppress the tables that the HISTOGRAM statement creates. You can use the NOPRINT option in the HISTOGRAM statement to
suppress the creation of its tables. Use NOPRINT when you want to create an OUT=
output data set only.
NORMAL
requests tests for normality that include a series of goodness-of-fit tests based on
the empirical distribution function. The table provides test statistics and p-values
for the Shapiro-Wilk test (provided the sample size is less than or equal to 2000),
the Kolmogorov-Smirnov test, the Anderson-Darling test, and the Cramér-von Mises
test. This option does not apply if you use a WEIGHT statement.
PCTLDEF=value
DEF=value
specifies the definition that PROC UNIVARIATE uses to calculate quantiles. The
default value is 5. Values can be 1, 2, 3, 4, or 5. You cannot use PCTLDEF= when
you compute weighted quantiles. See the section "Calculating Percentiles"
for details on quantile definitions.
PLOTS | PLOT
produces a stem-and-leaf plot (or a horizontal bar chart), a box plot, and a normal
probability plot in line printer output. If you use a BY statement, side-by-side box
plots that are labeled Schematic Plots appear after the univariate analysis for the
last BY group.
PLOTSIZE=n
species the approximate number of rows used in line-printer plots requested with the
PLOTS option. If n is larger than the value of the SAS system option PAGESIZE=,
PROC UNIVARIATE uses the value of PAGESIZE=. If n is less than 8, PROC
UNIVARIATE uses eight rows to draw the plots.
ROBUSTSCALE
produces a table with robust estimates of scale. The statistics include the interquartile range, Gini's mean difference, the median absolute deviation about the median
(MAD), and two statistics proposed by Rousseeuw and Croux (1993), $Q_n$ and $S_n$.
This option does not apply if you use a WEIGHT statement.
ROUND=units
specifies the units to use to round the analysis variables prior to computing statistics.
If you specify one unit, PROC UNIVARIATE uses this unit to round all analysis
variables. If you specify multiple units, a VAR statement is required, and each unit
rounds the values of the corresponding analysis variable. If ROUND=0, no rounding
occurs. The ROUND= option reduces the number of unique variable values, thereby
reducing memory requirements for the procedure. For example, to make the rounding
unit 1 for the first analysis variable and 0.5 for the second analysis variable, submit
the statements
proc univariate round=1 0.5;
var yldstren tenstren;
run;
When a variable value is midway between the two nearest rounded points, the value
is rounded to the nearest even multiple of the roundoff value. For example, with a
roundoff value of 1, the variable values of -2.5, -2.2, and -1.5 are rounded to -2;
the values of -0.5, 0.2, and 0.5 are rounded to 0; and the values of 0.6, 1.2, and 1.4
are rounded to 1.
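The round-to-even rule can be checked with a small experiment. The following is a minimal sketch, not taken from the manual: the data set name RoundDemo and the variable x are hypothetical, and the FREQ option in the PROC statement is used only to list the distinct rounded values.

```sas
/* Hypothetical test data: values chosen to sit on and around midpoints */
data RoundDemo;
   input x @@;
   datalines;
-2.5 -2.2 -1.5 -0.5 0.2 0.5 0.6 1.2 1.4
;
run;

/* ROUND=1 rounds x to the nearest multiple of 1 before statistics are computed; */
/* FREQ requests a frequency table, which shows the distinct rounded values.     */
proc univariate data=RoundDemo round=1 freq;
   var x;
run;
```

If the rule described above holds, the frequency table should contain only a few distinct levels, because midpoint values such as 0.5 are rounded to the even multiple 0 rather than up to 1.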
specifies the type of confidence limit for the mean, where keyword is LOWER,
UPPER, or TWOSIDED. The default value is TWOSIDED.
ALPHA=
specifies the divisor to use in the calculation of variances and standard deviation. By
default, VARDEF=DF. The following table shows the possible values for divisor and
associated divisors.
Table 3.1. Possible Values for VARDEF=
Value          Divisor
DF             Degrees of freedom
N              Number of observations
WDF            Sum of weights minus one
WEIGHT|WGT     Sum of weights
The procedure computes the variance as $CSS / \mathit{divisor}$, where CSS is the corrected sum of
squares, $CSS = \sum_{i=1}^{n} (x_i - \bar{x})^2$. When you weight the analysis variables,
$CSS = \sum_{i=1}^{n} w_i (x_i - \bar{x}_w)^2$, where $\bar{x}_w$ is the weighted mean.
The default value is DF. To compute the standard error of the mean, confidence limits,
and Student's t test, use the default value of VARDEF=.
When you use the WEIGHT statement and VARDEF=DF, the variance is an estimate
of $\sigma^2$, where the variance of the ith observation is $\mathrm{var}(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight
for the ith observation. This yields an estimate of the variance of an observation with
unit weight.
When you use the WEIGHT statement and VARDEF=WGT, the computed variance
is asymptotically (for large n) an estimate of $\sigma^2 / \bar{w}$, where $\bar{w}$ is the average weight. This
yields an asymptotic estimate of the variance of an observation with average weight.
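To see how the divisor changes the reported variance, you can run the same weighted analysis twice. The following is a minimal sketch under stated assumptions: the data set Weighted and the variables x and w are hypothetical and are not part of the manual's examples.

```sas
/* Hypothetical data: x is the analysis variable, w is its weight */
data Weighted;
   input x w @@;
   datalines;
10 1  12 2  15 1  11 4  14 2
;
run;

/* Default divisor (degrees of freedom): estimates the variance */
/* of an observation with unit weight.                          */
proc univariate data=Weighted vardef=df;
   var x;
   weight w;
run;

/* Divisor = sum of weights: asymptotically estimates the variance */
/* of an observation with average weight.                          */
proc univariate data=Weighted vardef=wgt;
   var x;
   weight w;
run;
```

Both runs use the same weighted corrected sum of squares, so the two variances differ only by the ratio of the divisors.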
WINSORIZED=values <(<TYPE=keyword><ALPHA= >)>
WINSOR=values <(<TYPE=keyword><ALPHA=>)>
requests a table of Winsorized means, where value is the number or the proportion
specifies the type of confidence limit for the mean, where keyword is LOWER,
UPPER, or TWOSIDED. The default is TWOSIDED.
ALPHA=
BY Statement
BY variables ;
You can specify a BY statement with PROC UNIVARIATE to obtain separate analyses for each BY group. The BY statement specifies the variables that the procedure
uses to form BY groups. You can specify more than one variable. If you do not
use the NOTSORTED option in the BY statement, the observations in the data set
must either be sorted by all the variables that you specify or they must be indexed
appropriately.
DESCENDING
specifies that the data set is sorted in descending order by the variable that immediately follows the word DESCENDING in the BY statement.
NOTSORTED
specifies that observations are not necessarily sorted in alphabetic or numeric order.
The data are grouped in another way, for example, chronological order.
The requirement for ordering or indexing observations according to the values of
BY variables is suspended for BY-group processing when you use the NOTSORTED
option. In fact, the procedure does not use an index if you specify NOTSORTED.
The procedure defines a BY group as a set of contiguous observations that have the
same values for all BY variables. If observations with the same values for the BY
variables are not contiguous, the procedure treats each contiguous set as a separate
BY group.
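The two ways of satisfying the BY-group ordering requirement can be sketched as follows. This is an illustration only: the data set Parts and the variables Plant and Length are hypothetical names, not data sets from this guide.

```sas
/* Option 1: sort first, then analyze each BY group */
proc sort data=Parts out=PartsSorted;
   by Plant;
run;

proc univariate data=PartsSorted;
   by Plant;
   var Length;
run;

/* Option 2: the data are already grouped (for example, chronologically) */
/* but not sorted. NOTSORTED treats each contiguous run of equal Plant   */
/* values as its own BY group.                                           */
proc univariate data=Parts;
   by Plant notsorted;
   var Length;
run;
```

With NOTSORTED, a Plant value that appears in two separate runs of observations produces two separate analyses, so sorting is usually the safer choice when one analysis per value is wanted.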
CLASS Statement
CLASS variable-1 <(v-options)>< variable-2 <(v-options)>>
< / KEYLEVEL= value1 | ( value1 value2 ) >;
The CLASS statement specifies one or two variables that the procedure uses to group
the data into classification levels. Variables in a CLASS statement are referred to as
class variables. Class variables can be numeric or character. Class variables can have
floating point values, but they typically have a few discrete values that define levels of
the variable. You do not have to sort the data by class variables. PROC UNIVARIATE
uses the formatted values of the class variables to determine the classification levels.
You can specify the following v-options enclosed in parentheses after the class variable:
MISSING
specifies that missing values for the CLASS variable are to be treated as valid classification levels. Special missing values that represent numeric values (the letters A
through Z and the underscore (_) character) are each considered as a separate value.
If you omit MISSING, PROC UNIVARIATE excludes the observations with a missing class variable value from the analysis. Enclose this option in parentheses after the
class variable.
ORDER=DATA | FORMATTED | FREQ | INTERNAL
specifies the display order for the class variable values. The default value is
INTERNAL. You can specify the following values with the ORDER= option:
DATA
orders values according to their order in the input data set. When you use
a HISTOGRAM, PROBPLOT, or QQPLOT statement, PROC UNIVARIATE
displays the rows (columns) of the comparative plot from top to bottom (left
to right) in the order that the class variable values first appear in the input data
set.
FORMATTED
orders values by their ascending formatted values. This order may depend on
your operating environment. When you use a HISTOGRAM, PROBPLOT, or
QQPLOT statement, PROC UNIVARIATE displays the rows (columns) of the
comparative plot from top to bottom (left to right) in increasing order of the
formatted class variable values. For example, suppose a numeric class variable
DAY (with values 1, 2, and 3) has a user-defined format that assigns Wednesday
to the value 1, Thursday to the value 2, and Friday to the value 3. The rows
of the comparative plot will appear in alphabetical order (Friday, Thursday,
Wednesday) from top to bottom.
If there are two or more distinct internal values with the same formatted value,
then PROC UNIVARIATE determines the order by the internal value that occurs first in the input data set. For numerical variables without an explicit
format, the levels are ordered by their internal values.
FREQ
orders values by descending frequency count so that levels with the most observations are listed first. If two or more values have the same frequency count,
PROC UNIVARIATE uses the formatted values to determine the order.
When you use a HISTOGRAM, PROBPLOT, or QQPLOT statement, PROC
UNIVARIATE displays the rows (columns) of the comparative plot from top
to bottom (left to right) in order of decreasing frequency count for the class
variable values.
INTERNAL
orders values by their unformatted values, which yields the same order as
PROC SORT. This order may depend on your operating environment.
When you use a HISTOGRAM, PROBPLOT, or QQPLOT statement, PROC
UNIVARIATE displays the rows (columns) of the comparative plot from top
to bottom (left to right) in increasing order of the internal (unformatted) values
of the class variable. The first class variable is used to label the rows of the
comparative plots (top to bottom). The second class variable is used to label
the columns of the comparative plots (left to right). For example, suppose a
numeric class variable DAY (with values 1, 2, and 3) has a user-defined format
that assigns Wednesday to the value 1, Thursday to the value 2, and Friday to
the value 3. The rows of the comparative plot will appear in day-of-the-week
order (Wednesday, Thursday, Friday) from top to bottom.
You can specify the following option after the slash (/) in the CLASS statement.
KEYLEVEL=value1 | ( value1 value2 )
specifies the key cell in a comparative plot. PROC UNIVARIATE first determines
the bin size and midpoints for the key cell, and then extends the midpoint list to
accommodate the data ranges for the remaining cells. Thus, the choice of the key
cell determines the uniform horizontal axis that PROC UNIVARIATE uses for all
cells. If you specify only one class variable and use a HISTOGRAM statement,
KEYLEVEL= value identifies the key cell as the level for which variable is equal to
value. By default, PROC UNIVARIATE sorts the levels in the order that is determined by the ORDER= option. Then, the key cell is the first occurrence of a level in
this order. The cells display in order from top to bottom or left to right. Consequently,
the key cell appears at the top (or left). When you specify a different key cell with
the KEYLEVEL= option, this cell appears at the top (or left).
Likewise, with the PROBPLOT and QQPLOT statements, the key cell determines
uniform axis scaling. If you specify two class variables, use KEYLEVEL= value1
value2 to identify the key cell as the level for which variable-n is equal to value-n.
By default, PROC UNIVARIATE sorts the levels of the first CLASS variable in the
order that is determined by its ORDER= option and, within each of these levels, it
sorts the levels of the second CLASS variable in the order that is determined by its
ORDER= option. Then, the default key cell is the first occurrence of a combination
of levels for the two variables in this order. The cells display in the order of the first
CLASS variable from top to bottom and in the order of the second CLASS variable
from left to right. Consequently, the default key cell appears at the upper left corner.
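The following sketch shows how a v-option and the KEYLEVEL= option fit into the CLASS statement syntax given above. The data set Parts, the character class variable Plant, its level 'B', and the analysis variable Length are all hypothetical.

```sas
/* One class variable: cells are ordered by descending frequency,   */
/* but the cell for Plant='B' is used as the key cell, so its bins  */
/* determine the common horizontal axis of the comparative plot.    */
proc univariate data=Parts noprint;
   class Plant (order=freq) / keylevel='B';
   histogram Length;
run;
```

Choosing the key cell explicitly is mainly useful when the default key cell (the first level in ORDER= order) has an unusual range that would produce a poor common axis for the other cells.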
FREQ Statement
FREQ variable ;
The FREQ statement specifies a numeric variable whose value represents the frequency of the observation. If you use the FREQ statement, the procedure assumes
that each observation represents n observations, where n is the value of variable. If
the variable is not an integer, the SAS System truncates it. If the variable is less
than 1 or is missing, the procedure excludes that observation from the analysis. See
Example 3.6.
Note: The FREQ statement affects the degrees of freedom, but the WEIGHT statement does not.
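A small example makes the truncation and exclusion rules concrete. This is a minimal sketch: the data set Counts and the variables Score and Count are hypothetical.

```sas
/* Hypothetical pre-tabulated data: each row stands for Count observations */
data Counts;
   input Score Count;
   datalines;
1 4
2 10
3 7.8
4 0
;
run;

/* Per the rules above, 7.8 is truncated to 7 and the row with */
/* Count=0 (less than 1) is excluded from the analysis.        */
proc univariate data=Counts;
   var Score;
   freq Count;
run;
```

Under those rules the analysis should be based on 4 + 10 + 7 = 21 observations, and because FREQ (unlike WEIGHT) affects the degrees of freedom, the variance divisor reflects that count.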
HISTOGRAM Statement
HISTOGRAM < variables >< / options >;
The HISTOGRAM statement creates histograms and optionally superimposes estimated parametric and nonparametric probability density curves. You cannot use the
WEIGHT statement with the HISTOGRAM statement. You can use any number of
HISTOGRAM statements after a PROC UNIVARIATE statement. The components
of the HISTOGRAM statement are described as follows.
variables
are the variables for which histograms are to be created. If you specify a VAR statement, the variables must also be listed in the VAR statement. Otherwise, the variables
can be any numeric variables in the input data set. If you do not specify variables in
a VAR statement or in the HISTOGRAM statement, then by default, a histogram is
created for each numeric variable in the DATA= data set. If you use a VAR statement
and do not specify any variables in the HISTOGRAM statement, then by default, a
histogram is created for each variable listed in the VAR statement.
For example, suppose a data set named Steel contains exactly two numeric variables
named Length and Width. The following statements create two histograms, one for
Length and one for Width:
proc univariate data=Steel;
histogram;
run;
Likewise, the following statements create histograms for Length and Width:
proc univariate data=Steel;
var Length Width;
histogram;
run;
options
add features to the histogram. Specify all options after the slash (/) in the
HISTOGRAM statement. Options can be one of the following:
primary options for fitted parametric distributions and kernel density estimates
secondary options for fitted parametric distributions and kernel density estimates
general options for graphics and output data sets
For example, in the following statements, the NORMAL option displays a fitted normal curve on the histogram, the MIDPOINTS= option specifies midpoints for the
histogram, and the CTEXT= option specifies the color of the text:
proc univariate data=Steel;
   histogram Length / normal
                      midpoints = 5.6 5.8 6.0 6.2 6.4
                      ctext     = blue;
run;
Table 3.2 through Table 3.12 list the HISTOGRAM options by function. For complete
descriptions, see the section "Dictionary of Options" on page 217.
GAMMA(gamma-options)
LOGNORMAL(lognormal-options)
NORMAL(normal-options)
WEIBULL(Weibull-options)
Table 3.3 through Table 3.9 list secondary options that specify parameters for fitted
parametric distributions and that control the display of fitted curves. Specify these
secondary options in parentheses after the primary distribution option. For example,
you can fit a normal curve by specifying the NORMAL option as follows:
proc univariate;
histogram / normal(color=red mu=10 sigma=0.5);
run;
The COLOR= normal-option draws the curve in red, and the MU= and SIGMA=
normal-options specify the parameters $\mu = 10$ and $\sigma = 0.5$ for the curve. Note
that the sample mean and sample standard deviation are used to estimate $\mu$ and $\sigma$,
respectively, when the MU= and SIGMA= normal-options are not specified.
Table 3.3. Secondary Options Used with All Parametric Distribution Options
COLOR=color
Specifies color of density curve
FILL
L=linetype
MIDPERCENTS
NOPRINT
PERCENTS=value-list
W=n
Table 3.4. Secondary Beta-Options
ALPHA=value
Specifies first shape parameter for beta curve
BETA=value
SIGMA=value | EST
THETA=value | EST
THETA=value | EST
SIGMA=value
THETA=value | EST
THETA=value | EST
ZETA=value
SIGMA=value
SIGMA=value
THETA=value | EST
COLOR=color
FILL
K=NORMAL |
QUADRATIC |
TRIANGULAR
L=linetype
LOWER=
UPPER=
W=n
General Options
Table 3.11 summarizes options for enhancing histograms, and Table 3.12 summarizes
options for requesting output data sets.
Table 3.11. General Graphics Options
Option           Description
ANNOKEY          Applies annotation requested in ANNOTATE= data set to key cell only
ANNOTATE=        Specifies annotate data set
BARWIDTH=        Specifies width for the bars
CAXIS=           Specifies color for axis
CBARLINE=        Specifies color for outlines of histogram bars
CFILL=           Specifies color for filling under curve
CFRAME=          Specifies color for frame
CFRAMESIDE=      Specifies color for filling frame for row labels
CFRAMETOP=       Specifies color for filling frame for column labels
CGRID=           Specifies color for grid lines
CHREF=           Specifies color for HREF= lines
CPROP=           Specifies color for proportion of frequency bar
CTEXT=           Specifies color for text
CTEXTSIDE=       Specifies color for row labels of comparative histograms
CTEXTTOP=        Specifies color for column labels of comparative histograms
CVREF=           Specifies color for VREF= lines
DESCRIPTION=     Specifies description for plot in graphics catalog
ENDPOINTS=       Lists endpoints for histogram intervals
FONT=            Specifies software font for text
FORCEHIST        Forces creation of histogram
GRID             Creates a grid
FRONTREF         Draws reference lines in front of histogram bars
HEIGHT=          Specifies height of text used outside framed areas
HMINOR=          Specifies number of horizontal minor tick marks
HOFFSET=         Specifies offset for horizontal axis
HREF=            Specifies reference lines perpendicular to the horizontal axis
HREFLABELS=      Specifies labels for HREF= lines
HREFLABPOS=      Specifies vertical position of labels for HREF= lines
INFONT=          Specifies software font for text inside framed areas
INHEIGHT=        Specifies height of text inside framed areas
INTERTILE=       Specifies distance between tiles
LGRID=           Specifies a line type for grid lines
LHREF=           Specifies line style for HREF= lines
LVREF=           Specifies line style for VREF= lines
MAXNBIN=         Specifies maximum number of bins to display
MAXSIGMAS=       Limits the number of bins that display to within a specified number of
                 standard deviations above and below mean of data in key cell
MIDPOINTS=       Lists midpoints for histogram intervals
NAME=            Specifies name for plot in graphics catalog
NCOLS=           Specifies number of columns in comparative histogram
NOBARS           Suppresses histogram bars
NOFRAME          Suppresses frame around plotting area
NOHLABEL         Suppresses label for horizontal axis
NOPLOT           Suppresses plot
NOVLABEL         Suppresses label for vertical axis
NOVTICK          Suppresses tick marks and tick mark labels for vertical axis
NROWS=           Specifies number of rows in comparative histogram
PFILL=           Specifies pattern for filling under curve
RTINCLUDE        Includes right endpoint in interval
TURNVLABELS      Turns and vertically strings out characters in labels for vertical axis
VAXIS=           Specifies AXIS statement or values for vertical axis
VAXISLABEL=      Specifies label for vertical axis
VMINOR=          Specifies number of vertical minor tick marks
VOFFSET=         Specifies length of offset at upper end of vertical axis
VREF=            Specifies reference lines perpendicular to the vertical axis
VREFLABELS=      Specifies labels for VREF= lines
VREFLABPOS=      Specifies horizontal position of labels for VREF= lines
VSCALE=          Specifies scale for vertical axis
WAXIS=           Specifies line thickness for axes and frame
WBARLINE=        Specifies line thickness for bar outlines
WGRID=           Specifies line thickness for grid

Table 3.12. Options for Requesting Output Data Sets
Option           Description
MIDPERCENTS      Creates table of histogram intervals
OUTHISTOGRAM=    Specifies information on histogram intervals
Dictionary of Options
The following entries provide detailed descriptions of options in the HISTOGRAM
statement.
ALPHA=value
specifies the shape parameter $\alpha$ for fitted curves requested with the BETA and
GAMMA options. Enclose the ALPHA= option in parentheses after the BETA or
GAMMA options. By default, the procedure calculates a maximum likelihood estimate for $\alpha$. You can specify A= as an alias for ALPHA= if you use it as a beta-option.
You can specify SHAPE= as an alias for ALPHA= if you use it as a gamma-option.
ANNOKEY
applies the annotation requested with the ANNOTATE= option to the key cell only.
By default, the procedure applies annotation to all of the cells. This option is not
available unless you use the CLASS statement. You can use the KEYLEVEL= option
in the CLASS statement to specify the key cell.
ANNOTATE=SAS-data-set
ANNO=SAS-data-set
displays a fitted beta density curve on the histogram. The BETA option can occur
only once in a HISTOGRAM statement. The beta distribution is bounded below by
the parameter $\theta$ and above by the value $\theta + \sigma$. Use the THETA= and SIGMA= beta-options to specify these parameters. By default, THETA=0 and SIGMA=1. You can
specify THETA=EST and SIGMA=EST to request maximum likelihood estimates
for $\theta$ and $\sigma$. See Example 3.21.
Note: Three- and four-parameter maximum likelihood estimation may not always
converge. The beta distribution has two shape parameters, $\alpha$ and $\beta$. If these parameters are known, you can specify their values with the ALPHA= and BETA=
beta-options. By default, the procedure computes maximum likelihood estimates for
$\alpha$ and $\beta$. Table 3.3 (page 214) and Table 3.4 (page 215) list options you can specify
with the BETA option.
BETA=value
B=value
specifies the second shape parameter $\beta$ for beta density curves requested with the
BETA option. Enclose the BETA= option in parentheses after the BETA option. By
default, the procedure calculates a maximum likelihood estimate for $\beta$.
C=value
specifies the shape parameter c for Weibull density curves requested with the
WEIBULL option. Enclose the C= Weibull-option in parentheses after the WEIBULL
option. If you do not specify a value for c, the procedure calculates a maximum likelihood estimate. You can specify the SHAPE= Weibull-option as an alias for the C=
Weibull-option.
C=value-list | MISE
specifies the standardized bandwidth parameter c for kernel density estimates requested with the KERNEL option. Enclose the C= kernel-option in parentheses after
the KERNEL option. You can specify up to five values to request multiple estimates.
You can also specify the C=MISE option, which produces the estimate with a bandwidth that minimizes the approximate mean integrated square error (MISE).
You can also use the C= kernel-option with the K= kernel-option, which specifies the
kernel function, to compute multiple estimates. If you specify more kernel functions
than bandwidths, the last bandwidth in the list is repeated for the remaining estimates.
Likewise, if you specify more bandwidths than kernel functions, the last kernel function
is repeated for the remaining estimates. If you do not specify a value for c, the
bandwidth that minimizes the approximate MISE is used for all the estimates.
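The pairing rule for C= and K= lists can be sketched with the Steel data set used earlier in this section. The specific bandwidth values below are illustrative assumptions, not recommendations from the manual.

```sas
/* Two bandwidths, three kernel functions: per the rule above, the last */
/* bandwidth (1.0) is reused for the third estimate, giving             */
/*   normal     with c=0.5                                              */
/*   quadratic  with c=1.0                                              */
/*   triangular with c=1.0                                              */
proc univariate data=Steel noprint;
   histogram Length / kernel(c = 0.5 1.0
                             k = normal quadratic triangular);
run;
```

Listing every bandwidth explicitly is a little longer but avoids relying on the repetition rule when the estimates are meant to be compared.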
CAXIS=color
CAXES=color
CA=color
specifies the color for the axes and tick marks. This option overrides any COLOR=
specifications in an AXIS statement. The default value is the first color in the device
color list.
CBARLINE=color
specifies the color for the outline of the histogram bars. This option overrides the C=
option in the SYMBOL1 statement. The default value is the first color in the device
color list.
CFILL=color
specifies the color to fill the bars of the histogram (or the area under a fitted density
curve if you also specify the FILL option). See the entries for the FILL and PFILL=
options for additional details. Refer to SAS/GRAPH Software: Reference for a list of
colors. By default, bars and curve areas are not filled.
CFRAME=color
specifies the color for the area that is enclosed by the axes and frame. The area is not
filled by default.
CFRAMESIDE=color
specifies the color to fill the frame area for the row labels that display along the left
side of the comparative histogram. This color also fills the frame area for the label
of the corresponding class variable (if you associate a label with the variable). By
default, these areas are not filled. This option is not available unless you use the
CLASS statement.
CFRAMETOP=color
specifies the color to fill the frame area for the column labels that display across the
top of the comparative histogram. This color also fills the frame area for the label
of the corresponding class variable (if you associate a label with the variable). By
default, these areas are not filled. This option is not available unless you use the
CLASS statement.
CGRID=color
specifies the color for grid lines when a grid displays on the histogram. The default
color is the first color in the device color list. This option also produces a grid.
CHREF=color
CH=color
specifies the color for horizontal axis reference lines requested by the HREF= option.
The default is the first color in the device color list.
COLOR=color
specifies the color of the density curve. Enclose the COLOR= option in parentheses
after the distribution option or the KERNEL option. If you use the COLOR= option
with the KERNEL option, you can specify a list of up to five colors in parentheses
CPROP=color | EMPTY
specifies the color for a horizontal bar whose length (relative to the width of the tile)
indicates the proportion of the total frequency that is represented by the corresponding
cell in a comparative histogram. By default, no bars are displayed. This option is not
available unless you use the CLASS statement. You can specify the keyword EMPTY
to display empty bars. See Example 3.20.
CTEXT=color
CT=color
specifies the color for tick mark values and axis labels. The default is the color
specified for the CTEXT= option in the GOPTIONS statement. In the absence of a
GOPTIONS statement, the default color is the first color in the device color list.
CTEXTSIDE=color
specifies the color for the row labels that display along the left side of the comparative
histogram. By default, the color specified by the CTEXT= option is used. If you omit
the CTEXT= option, the color specified in the GOPTIONS statement is used. If you
omit the GOPTIONS statement, the first color in the device color list is used.
This option is not available unless you use the CLASS statement. You can specify
the CFRAMESIDE= option to change the background color for the row labels.
CTEXTTOP=color
specifies the color for the column labels that display across the top of the comparative histogram. By default, the color specified by the CTEXT= option is used.
If you omit the CTEXT= option, the color specified in the GOPTIONS statement is
used. If you omit the GOPTIONS statement, the first color in the device color list
is used. This option is not available unless you specify the CLASS statement. You
can use the CFRAMETOP= option to change the background color for the column
labels.
CVREF=color
CV=color
specifies the color for lines requested with the VREF= option. The default is the first
color in the device color list.
DESCRIPTION=string
DES=string
uses the endpoints as the tick mark values for the horizontal axis and determines how
to compute the bin width of the histogram bars, where values specifies values for both
the left and right endpoint of each histogram interval. The width of the histogram bars
is the difference between consecutive endpoints. The procedure uses the same values
for all variables.
The range of endpoints must cover the range of the data. For example, if you specify
endpoints=2 to 10 by 2
then all of the observations must fall in the intervals [2,4) [4,6) [6,8) [8,10]. You also
must use evenly spaced endpoints which you list in increasing order.
KEY
determines the endpoints for the data in the key cell. The initial
number of endpoints is based on the number of observations in the
key cell using the method of Terrell and Scott (1985). The procedure extends the endpoint list for the key cell in either direction as
necessary until it spans the data in the remaining cells.
UNIFORM
Neither KEY nor UNIFORM apply unless you use the CLASS statement.
If you omit ENDPOINTS, the procedure uses the midpoints. If you specify
ENDPOINTS, the procedure computes the endpoints by using an algorithm (Terrell
and Scott 1985) that is primarily applicable to continuous data that are approximately
normally distributed.
If you specify both MIDPOINTS= and ENDPOINTS, the procedure issues a warning
message and uses the endpoints.
If you specify RTINCLUDE, the procedure includes the right endpoint of each histogram interval in that interval instead of including the left endpoint.
If you use a CLASS statement and specify ENDPOINTS, the procedure uses
ENDPOINTS=KEY as the default. However if the key cell is empty, then the procedure uses ENDPOINTS=UNIFORM.
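An explicit endpoint list combined with RTINCLUDE might look like the following sketch. It reuses the Steel data set from earlier in this section and assumes, without confirmation from the manual, that every Length value lies between 5.5 and 6.5, since the endpoint range must cover the data.

```sas
/* Evenly spaced endpoints in increasing order: 5.5, 5.75, ..., 6.5.   */
/* RTINCLUDE makes each interval include its right endpoint instead of */
/* its left endpoint.                                                  */
proc univariate data=Steel noprint;
   histogram Length / endpoints = 5.5 to 6.5 by 0.25
                      rtinclude;
run;
```

Whether to use RTINCLUDE matters mainly for data recorded at coarse resolution, where many observations fall exactly on an endpoint.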
EXPONENTIAL <(exponential-options)>
EXP <(exponential-options)>
fills areas under the fitted density curve or the kernel density estimate with colors
and patterns. The FILL option can occur with only one fitted curve. Enclose the
FILL option in parentheses after a density curve option or the KERNEL option. The
CFILL= and PFILL= options specify the color and pattern for the area under the
curve. For a list of available colors and patterns, see SAS/GRAPH Reference.
FONT=font
specifies a software font for reference line and axis labels. You can also specify fonts
for axis labels in an AXIS statement. The FONT= font takes precedence over the
FTEXT= font specified in the GOPTIONS statement. Hardware characters are used
by default.
FORCEHIST
forces the creation of a histogram if there is only one unique observation. By default,
a histogram is not created if the standard deviation of the data is zero.
FRONTREF
draws reference lines requested with the HREF= and VREF= options in front of the
histogram bars. By default, reference lines are drawn behind the histogram bars and
can be obscured by them.
GAMMA <(gamma-options)>
displays a fitted gamma density curve on the histogram. The GAMMA option can
occur only once in a HISTOGRAM statement. The parameter $\theta$ must be less than
the minimum data value. Use the THETA= gamma-option to specify $\theta$. By default, THETA=0. You can specify THETA=EST to request the maximum likelihood
estimate for $\theta$. Use the ALPHA= and the SIGMA= gamma-options to specify the
shape parameter $\alpha$ and the scale parameter $\sigma$. By default, PROC UNIVARIATE
computes maximum likelihood estimates for $\alpha$ and $\sigma$. The procedure calculates the
maximum likelihood estimate of $\alpha$ iteratively using the Newton-Raphson approximation. Table 3.3 (page 214) and Table 3.6 (page 215) list options you can specify with
the GAMMA option. See Example 3.22.
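As an illustration of the gamma-options described above, the following sketch fits a gamma curve with the threshold fixed at its default and lets the procedure estimate the shape and scale. It reuses the Steel data set from earlier in this section and assumes that all Length values are positive, which the manual does not state.

```sas
/* THETA=0 (the default) fixes the threshold below the minimum data value; */
/* ALPHA= and SIGMA= are omitted, so the shape and scale parameters are    */
/* estimated by maximum likelihood.                                        */
proc univariate data=Steel noprint;
   histogram Length / gamma(theta=0 color=red);
run;
```

Replacing THETA=0 with THETA=EST would request a three-parameter fit, which, as noted for the beta distribution, may not always converge.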
GRID
displays a grid on the histogram. Grid lines are horizontal lines that are positioned at
major tick marks on the vertical axis.
HEIGHT=value
specifies the height, in percentage screen units, of text for axis labels, tick mark
labels, and legends. This option takes precedence over the HTEXT= option in the
GOPTIONS statement.
HMINOR=n
HM=n
specifies the number of minor tick marks between each major tick mark on the horizontal axis. Minor tick marks are not labeled. By default, HMINOR=0.
HOFFSET=value
specifies the offset, in percentage screen units, at both ends of the horizontal axis.
You can use HOFFSET=0 to eliminate the default offset.
HREF=values
draws reference lines that are perpendicular to the horizontal axis at the values
that you specify. If a reference line is almost completely obscured, then use the
FRONTREF option to draw the reference lines in front of the histogram bars. Also
see the CHREF=, HREFCHAR=, and LHREF= options.
HREFLABELS=label1 . . . labeln
HREFLABEL=label1 . . . labeln
HREFLAB=label1 . . . labeln
specifies labels for the lines requested by the HREF= option. The number of labels
must equal the number of lines. Enclose each label in quotes. Labels can have up to
16 characters.
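The reference-line options work together, as in the following sketch. It reuses the Steel data set from earlier in this section; the reference values 5.8 and 6.2 and the label text are illustrative assumptions.

```sas
/* Two vertical reference lines on the horizontal axis, one quoted label */
/* per line (each at most 16 characters), dashed line style, and drawn   */
/* in front of the bars so the bars do not hide them.                    */
proc univariate data=Steel noprint;
   histogram Length / href       = 5.8 6.2
                      hreflabels = 'Lower' 'Upper'
                      lhref      = 2
                      frontref;
run;
```

The number of labels matches the number of HREF= values, as the HREFLABELS= entry requires.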
HREFLABPOS=1 | 2 | 3
specifies a software font to use for text inside the framed areas of the histogram.
The INFONT= option takes precedence over the FTEXT= option in the GOPTIONS
statement. For a list of fonts, see SAS/GRAPH Reference.
INHEIGHT=value
specifies the height, in percentage screen units, of text used inside the framed areas
of the histogram. By default, the height specified by the HEIGHT= option is used.
If you do not specify the HEIGHT= option, the height specified with the HTEXT=
option in the GOPTIONS statement is used.
INTERTILE=value
specifies the distance, in horizontal percentage screen units, between the framed areas, which are called tiles. By default, INTERTILE=0.75 percentage screen units.
This option is not available unless you use the CLASS statement. You can specify
INTERTILE=0 to create contiguous tiles.
K=NORMAL | QUADRATIC | TRIANGULAR
specifies the kernel function (normal, quadratic, or triangular) used to compute a kernel density estimate. You can specify up to five values to request multiple estimates.
You must enclose this option in parentheses after the KERNEL option. You can also
use the K= kernel-option with the C= kernel-option, which specifies standardized
bandwidths. If you specify more kernel functions than bandwidths, the procedure
repeats the last bandwidth in the list for the remaining estimates. Likewise, if you
specify more bandwidths than kernel functions, the procedure repeats the last kernel
function for the remaining estimates. By default, K=NORMAL.
KERNEL<(kernel-options)>
L=linetype
specifies the line type used for fitted density curves. Enclose the L= option in parentheses after the distribution option or the KERNEL option. If you use the L= option
with the KERNEL option, you can specify a list of up to five line types for multiple kernel density estimates. See the entries for the C= and K= options for details
on specifying multiple kernel density estimates. By default, L=1, which produces a
solid line.
LGRID=linetype
specifies the line type for the grid when a grid displays on the histogram. By default,
LGRID=1, which produces a solid line. This option also creates a grid.
LHREF=linetype
LH=linetype
specifies the line type for the reference lines that you request with the HREF= option.
By default, LHREF=2, which produces a dashed line.
LOGNORMAL<(lognormal-options)>
displays a fitted lognormal density curve on the histogram. The LOGNORMAL option can occur only once in a HISTOGRAM statement. The parameter $\theta$ must be less
than the minimum data value. Use the THETA= lognormal-option to specify $\theta$. By
default, THETA=0. You can specify THETA=EST to request the maximum likelihood estimate for $\theta$. Use the SIGMA= and ZETA= lognormal-options to specify
$\sigma$ and $\zeta$. By default, the procedure computes maximum likelihood estimates for $\sigma$ and
$\zeta$. Table 3.3 (page 214) and Table 3.7 (page 215) list options you can specify with the
LOGNORMAL option. See Example 3.22 and Example 3.24.
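As a hedged illustration of the threshold handling described above, the following sketch requests an estimated threshold instead of the default THETA=0. The data set Measures and the variable Length are assumed names.

```sas
/* Assumed input: data set Measures with a numeric variable Length */
proc univariate data=Measures noprint;
   /* THETA=EST asks for an estimated threshold; SIGMA= and ZETA=
      are omitted, so the procedure estimates them as well. */
   histogram Length / lognormal(theta=est);
run;
```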
LOWER=value-list
specifies lower bounds for kernel density estimates requested with the KERNEL option. Enclose the LOWER= option in parentheses after the KERNEL option. You can
specify up to five lower bounds for multiple kernel density estimates. If you specify
more kernel estimates than lower bounds, the last lower bound is repeated for the
remaining estimates. The default is a missing value, indicating no lower bounds for
fitted kernel density curves.
LVREF=linetype
LV=linetype
specifies the line type for lines requested with the VREF= option. By default,
LVREF=2, which produces a dashed line.
MAXNBIN=n
specifies the maximum number of bins displayed in the comparative histogram. This
option is useful when the scales or ranges of the data distributions differ greatly from
cell to cell. By default, the bin size and midpoints are determined for the key cell, and
then the midpoint list is extended to accommodate the data ranges for the remaining
cells. However, if the cell scales differ considerably, the resulting number of bins
may be so great that each cell histogram is scaled into a narrow region. By using
MAXNBIN= to limit the number of bins, you can narrow the window about the data
distribution in the key cell. This option is not available unless you specify the CLASS
statement. The MAXNBIN= option is an alternative to the MAXSIGMAS= option.
MAXSIGMAS=value
limits the number of bins displayed in the comparative histogram to a range of value
standard deviations (of the data in the key cell) above and below the mean of the data
in the key cell. This option is useful when the scales or ranges of the data distributions
differ greatly from cell to cell. By default, the bin size and midpoints are determined
for the key cell, and then the midpoint list is extended to accommodate the data ranges
for the remaining cells. However, if the cell scales differ considerably, the resulting
number of bins may be so great that each cell histogram is scaled into a narrow region.
By using MAXSIGMAS= to limit the number of bins, you can narrow the window
that surrounds the data distribution in the key cell. This option is not available unless
you specify the CLASS statement.
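A minimal sketch of a comparative histogram that limits the displayed range, assuming a data set Measures with an analysis variable Length and a hypothetical classification variable Supplier:

```sas
/* Assumed input: Measures(Length, Supplier); Supplier is hypothetical */
proc univariate data=Measures noprint;
   class Supplier;
   /* Display bins only within 3 standard deviations of the key-cell
      mean, so that one widely spread cell does not squeeze the rest. */
   histogram Length / maxsigmas=3;
run;
```

MAXNBIN=n could be substituted for MAXSIGMAS= to cap the number of bins directly.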
MIDPERCENTS
requests a table listing the midpoints and percentage of observations in each histogram interval. If you specify MIDPERCENTS in parentheses after a density estimate option, the procedure displays a table that lists the midpoints, the observed
percentage of observations, and the estimated percentage of the population in each
interval (estimated from the fitted distribution). See Example 3.18.
MIDPOINTS=values | KEY | UNIFORM
specifies how to determine the midpoints for the histogram intervals, where values
determines the width of the histogram bars as the difference between consecutive
midpoints. The procedure uses the same values for all variables.
The range of midpoints, extended at each end by half of the bar width, must cover the
range of the data. For example, if you specify
midpoints=2 to 10 by 0.5
then all of the observations should fall between 1.75 and 10.25. You must use evenly
spaced midpoints listed in increasing order.
KEY
determines the midpoints for the data in the key cell. The initial
number of midpoints is based on the number of observations in the
key cell by using the method of Terrell and Scott (1985). The procedure extends the midpoint list for the key cell in either direction
as necessary until it spans the data in the remaining cells.
UNIFORM
Neither KEY nor UNIFORM applies unless you use the CLASS statement. By default,
if you use a CLASS statement, MIDPOINTS=KEY; however, if the key cell is empty
then MIDPOINTS=UNIFORM. Otherwise, the procedure computes the midpoints by
using an algorithm (Terrell and Scott 1985) that is primarily applicable to continuous
data that are approximately normally distributed.
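The midpoint list from the MIDPOINTS= entry can be placed in a complete step, as in the following sketch. The data set Measures and the variable Length are assumed names. With a spacing of 0.5, each bar is 0.5 wide, so the bars extend 0.25 beyond the first and last midpoints and cover the interval from 1.75 to 10.25.

```sas
/* Assumed input: data set Measures with a numeric variable Length
   whose values all lie between 1.75 and 10.25 */
proc univariate data=Measures noprint;
   /* Evenly spaced midpoints in increasing order; bar width = 0.5 */
   histogram Length / midpoints=2 to 10 by 0.5;
run;
```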
MU=value
specifies the parameter μ for normal density curves requested with the NORMAL
option. Enclose the MU= option in parentheses after the NORMAL option. By
default, the procedure uses the sample mean for μ.
NAME=string
specifies a name for the plot, up to eight characters long, that appears in the PROC
GREPLAY master menu. The default value is UNIVAR.
NCOLS=n
NCOL=n
NOBARS
suppresses drawing of histogram bars, which is useful for viewing fitted curves only.
NOFRAME
NOHLABEL
suppresses the label for the horizontal axis. You can use this option to reduce clutter.
NOPLOT
NOCHART
suppresses the creation of a plot. Use this option when you only want to tabulate
summary statistics for a fitted density or create an OUTHISTOGRAM= data set.
NOPRINT
suppresses tables summarizing the fitted curve. Enclose the NOPRINT option in
parentheses following the distribution option.
NORMAL<(normal-options)>
displays a fitted normal density curve on the histogram. The NORMAL option can
occur only once in a HISTOGRAM statement. Use the MU= and SIGMA= normal-options to specify μ and σ. By default, the procedure uses the sample mean and
sample standard deviation for μ and σ. Table 3.3 (page 214) and Table 3.8 (page 215)
list options you can specify with the NORMAL option. See Example 3.19.
NOVLABEL
suppresses the label for the vertical axis. You can use this option to reduce clutter.
NOVTICK
suppresses the tick marks and tick mark labels for the vertical axis. This option also
suppresses the label for the vertical axis.
NROWS=n
NROW=n
OUTHISTOGRAM=SAS-data-set
OUTHIST=SAS-data-set
creates a SAS data set that contains information about histogram intervals.
Specifically, the data set contains the midpoints of the histogram intervals, the
observed percentage of observations in each interval, and the estimated percentage
of observations in each interval (estimated from each of the specified fitted curves).
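A minimal sketch of saving the interval information without drawing a plot. The data set Measures, the variable Length, and the output name HistInfo are assumed names.

```sas
/* Assumed input: data set Measures with a numeric variable Length */
proc univariate data=Measures noprint;
   /* NOPLOT suppresses the graph; the interval midpoints and the
      observed and estimated percentages are written to HistInfo. */
   histogram Length / normal noplot outhistogram=HistInfo;
run;

/* Inspect the saved intervals */
proc print data=HistInfo;
run;
```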
PERCENTS=values
PERCENT=values
specifies a list of percents for which quantiles calculated from the data and quantiles
estimated from the fitted curve are tabulated. The percents must be between 0 and
100. Enclose the PERCENTS= option in parentheses after the curve option. The
default percents are 1, 5, 10, 25, 50, 75, 90, 95, and 99.
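For instance, the default percent list could be replaced with quintiles for a fitted normal curve, as in this sketch. The data set Measures and the variable Length are assumed names.

```sas
/* Assumed input: data set Measures with a numeric variable Length */
proc univariate data=Measures;
   /* PERCENTS= goes in parentheses after the curve option and
      replaces the default list 1, 5, 10, 25, 50, 75, 90, 95, 99. */
   histogram Length / normal(percents=20 40 60 80);
run;
```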
PFILL=pattern
specifies a pattern used to fill the bars of the histograms (or the areas under a fitted
curve if you also specify the FILL option). See the entries for the CFILL= and FILL
options for additional details. Refer to SAS/GRAPH Software: Reference for a list of
pattern values. By default, the bars and curve areas are not filled.
RTINCLUDE
includes the right endpoint of each histogram interval in that interval. By default, the
left endpoint is included in the histogram interval.
SCALE=value
is an alias for the SIGMA= option for curves requested by the BETA,
EXPONENTIAL, GAMMA, and WEIBULL options and an alias for the ZETA=
option for curves requested by the LOGNORMAL option.
SHAPE=value
is an alias for the ALPHA= option for curves requested with the GAMMA option, an
alias for the SIGMA= option for curves requested with the LOGNORMAL option,
and an alias for the C= option for curves requested with the WEIBULL option.
SIGMA=value | EST
specifies the parameter σ for the fitted density curve when you request the BETA,
EXPONENTIAL, GAMMA, LOGNORMAL, NORMAL, and WEIBULL options.
See Table 3.13 for a summary of how to use the SIGMA= option. You must enclose
this option in parentheses after the density curve option. As a beta-option, you can
specify SIGMA=EST to request a maximum likelihood estimate for σ.
Table 3.13. Uses of the SIGMA= Option
Distribution Keyword
BETA
EXPONENTIAL
GAMMA
WEIBULL
LOGNORMAL
NORMAL
WEIBULL
SIGMA= Specifies
Scale parameter
Scale parameter
Scale parameter
Scale parameter
Shape parameter
Scale parameter
Scale parameter
Default Value
1
Maximum likelihood estimate
Maximum likelihood estimate
Maximum likelihood estimate
Maximum likelihood estimate
Standard deviation
Maximum likelihood estimate
Alias
SCALE=
SCALE=
SCALE=
SCALE=
SCALE=
SHAPE=
SCALE=
THETA=value | EST
specifies the lower threshold parameter θ for curves requested with the BETA,
EXPONENTIAL, GAMMA, LOGNORMAL, and WEIBULL options. Enclose the
THETA= option in parentheses after the curve option. By default, THETA=0. If you
specify THETA=EST, an estimate is computed for θ.
THRESHOLD= value
is an alias for the THETA= option. See the preceding entry for the THETA= option.
TURNVLABELS
TURNVLABEL
turns the characters in the vertical axis labels so that they display vertically. This
happens by default when you use a hardware font.
UPPER=value-list
specifies upper bounds for kernel density estimates requested with the KERNEL option. Enclose the UPPER= option in parentheses after the KERNEL option. You can
specify up to five upper bounds for multiple kernel density estimates. If you specify
more kernel estimates than upper bounds, the last upper bound is repeated for the
remaining estimates. The default is a missing value, indicating no upper bounds for
fitted kernel density curves.
VAXIS=name|value-list
specifies the name of an AXIS statement describing the vertical axis. Alternatively,
you can specify a value-list for the vertical axis.
VAXISLABEL=label
specifies a label for the vertical axis. Labels can have up to 40 characters.
VMINOR=n
VM=n
specifies the number of minor tick marks between each major tick mark on the vertical
axis. Minor tick marks are not labeled. The default is zero.
VOFFSET=value
specifies the offset, in percentage screen units, at the upper end of the vertical axis.
VREF=value-list
draws reference lines perpendicular to the vertical axis at the values specified. Also
see the CVREF=, LVREF=, and VREFCHAR= options. If a reference line is almost
completely obscured, then use the FRONTREF option to draw the reference lines in
front of the histogram bars.
VREFLABELS=label1. . . labeln
VREFLABEL=label1. . . labeln
VREFLAB=label1. . . labeln
specifies labels for the lines requested by the VREF= option. The number of labels
must equal the number of lines. Enclose each label in quotes. Labels can have up to
16 characters.
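The VREF= family of options might be combined as in the following sketch. The data set Measures, the variable Length, and the reference values are assumed for illustration; the values are read on the vertical axis scale.

```sas
/* Assumed input: data set Measures with a numeric variable Length */
proc univariate data=Measures noprint;
   /* Two reference lines, one label per line (16 characters or fewer),
      drawn dashed (LVREF=2) and in front of the bars (FRONTREF). */
   histogram Length / vref=10 20
                      vreflabels='Low' 'High'
                      lvref=2
                      frontref;
run;
```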
VREFLABPOS=n
VSCALE=COUNT | PERCENT | PROPORTION
specifies the scale of the vertical axis for a histogram. The value COUNT requests
the data be scaled in units of the number of observations per data unit. The value
PERCENT requests the data be scaled in units of percent of observations per data
unit. The value PROPORTION requests the data be scaled in units of proportion of
observations per data unit. The default is PERCENT.
W=n
specifies the width, in pixels, of the fitted density curve or the kernel density estimate
curve. By default, W=1. You must enclose this option in parentheses after the density
curve option or the KERNEL option. As a kernel-option, you can specify a list of up
to five W= values.
WAXIS=n
specifies the line thickness, in pixels, for the axes and frame. By default, WAXIS=1.
WBARLINE=n
WEIBULL<(Weibull-options)>
displays a fitted Weibull density curve on the histogram. The WEIBULL option can
occur only once in a HISTOGRAM statement. The parameter θ must be less than
the minimum data value. Use the THETA= Weibull-option to specify θ. By default,
THETA=0. You can specify THETA=EST to request the maximum likelihood estimate for θ. Use the C= and SIGMA= Weibull-options to specify the shape parameter
c and the scale parameter σ. By default, the procedure computes the maximum likelihood estimates for c and σ. Table 3.3 (page 214) and Table 3.9 (page 215) list options
you can specify with the WEIBULL option. See Example 3.22.
PROC UNIVARIATE calculates the maximum likelihood estimate of c iteratively by
using the Newton-Raphson approximation. See also the C=, SIGMA=, and THETA=
Weibull-options.
WGRID=n
ZETA=value
specifies a value for the scale parameter ζ for lognormal density curves requested
with the LOGNORMAL option. Enclose the ZETA= lognormal-option in parentheses
after the LOGNORMAL option. By default, the procedure calculates a maximum
likelihood estimate for ζ. You can specify the SCALE= option as an alias for the
ZETA= option.
ID Statement
ID variables ;
The ID statement specifies one or more variables to include in the table of extreme
observations. The corresponding values of the ID variables appear beside the n
largest and n smallest observations, where n is the value of the NEXTROBS= option.
See Example 3.3.
INSET Statement
INSET keyword-list < / options > ;
The INSET statement places a box or table of summary statistics, called an inset,
directly in a high-resolution graph created with the HISTOGRAM, PROBPLOT, or
QQPLOT statement.
The INSET statement must follow the HISTOGRAM, PROBPLOT, or QQPLOT
statement that creates the plot that you want to augment. The inset appears in all
the graphs that the preceding plot statement produces.
You can use multiple INSET statements after a plot statement to add multiple insets
to a plot. See Example 3.17.
In an INSET statement, you specify one or more keywords that identify the information to display in the inset. The information is displayed in the order that you request
the keywords. Keywords can be any of the following:
statistical keywords
primary keywords
secondary keywords
The available statistical keywords are:
Table 3.14. Descriptive Statistic Keywords
CSS
CV
KURTOSIS
MAX
MEAN
MIN
MODE
N
NMISS
NOBS
RANGE
SKEWNESS
STD
STDMEAN
SUM
SUMWGT
USS
VAR
Table 3.15. Percentile Statistic Keywords
P1        1st percentile
P5        5th percentile
P10       10th percentile
Q1        Lower quartile (25th percentile)
MEDIAN    Median (50th percentile)
Q3        Upper quartile (75th percentile)
P90       90th percentile
P95       95th percentile
P99       99th percentile
QRANGE    Interquartile range (Q3 - Q1)
GINI
MAD
QN
SN
STD_GINI
STD_MAD
STD_QN
STD_QRANGE
STD_SN
MSIGN         Sign statistic
NORMALTEST    Test statistic for normality
PNORMAL       Probability value for the test of normality
SIGNRANK      Signed rank statistic
PROBM         Probability of greater absolute value for the sign statistic
PROBN         Probability value for the test of normality
PROBS         Probability value for the signed rank test
PROBT         Probability value for the Student's t test
T             Statistic for the Student's t test
A primary keyword enables you to specify secondary keywords in parentheses immediately after the primary keyword. Primary keywords are BETA, EXPONENTIAL,
GAMMA, LOGNORMAL, NORMAL, WEIBULL, WEIBULL2, KERNEL, and
KERNELn. If you specify a primary keyword but omit a secondary keyword, the
inset displays a colored line and the distribution name as a key for the density curve.
By default, PROC UNIVARIATE identifies inset statistics with appropriate labels and
prints numeric values using appropriate formats. To customize the label, specify the
keyword followed by an equal sign (=) and the desired label in quotes. To customize
the format, specify a numeric format in parentheses after the keyword. Labels can
have up to 24 characters.
For example, a single INSET statement can request customized labels for two statistics and display the standard deviation with
a field width of 5 and two decimal places.
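The label and format customization described above might be written as in the following sketch. The data set Measures, the variable Length, and the label text are assumed for illustration.

```sas
/* Assumed input: data set Measures with a numeric variable Length */
proc univariate data=Measures noprint;
   histogram Length;
   /* keyword='label' replaces the default label (24 characters or fewer);
      (5.2) after STD prints it with width 5 and two decimal places. */
   inset n='Sample Size' std='Std Dev' (5.2);
run;
```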
The following tables list primary keywords:
Table 3.18. Parametric Density Primary Keywords
Keyword        Distribution
BETA           Beta
EXPONENTIAL    Exponential
GAMMA          Gamma
LOGNORMAL      Lognormal
NORMAL         Normal
WEIBULL        Weibull (3-parameter)
WEIBULL2       Weibull (2-parameter)

Keyword        Description
KERNEL         Displays statistics for all kernel estimates
KERNELn        Displays statistics for only the nth kernel density estimate (n = 1, 2, 3, 4, or 5)
Table 3.20 through Table 3.28 list the secondary keywords available with primary
keywords in Table 3.18 and Table 3.19.
Table 3.20. Secondary Keywords Available with the BETA Keyword
Secondary Keyword    Alias        Description
ALPHA                SHAPE1       First shape parameter
BETA                 SHAPE2       Second shape parameter
SIGMA                SCALE        Scale parameter
THETA                THRESHOLD    Lower threshold parameter
MEAN                              Mean of the fitted distribution
STD                               Standard deviation of the fitted distribution
Table 3.21. Secondary Keywords Available with the EXPONENTIAL Keyword
Secondary Keyword    Alias        Description
SIGMA                SCALE        Scale parameter
THETA                THRESHOLD    Threshold parameter
MEAN                              Mean of the fitted distribution
STD                               Standard deviation of the fitted distribution
Table 3.22. Secondary Keywords Available with the GAMMA Keyword
Secondary Keyword    Alias        Description
ALPHA                SHAPE        Shape parameter
SIGMA                SCALE        Scale parameter
THETA                THRESHOLD    Threshold parameter
MEAN                              Mean of the fitted distribution
STD                               Standard deviation of the fitted distribution
Table 3.23. Secondary Keywords Available with the LOGNORMAL Keyword
Secondary Keyword    Alias        Description
SIGMA                SHAPE        Shape parameter
THETA                THRESHOLD    Threshold parameter
ZETA                 SCALE        Scale parameter
MEAN                              Mean of the fitted distribution
STD                               Standard deviation of the fitted distribution
Table 3.24. Secondary Keywords Available with the NORMAL Keyword
Secondary Keyword    Alias        Description
MU                   MEAN         Mean parameter
SIGMA                STD          Scale parameter
Table 3.25. Secondary Keywords Available with the WEIBULL Keyword
Secondary Keyword    Alias        Description
C                    SHAPE        Shape parameter c
SIGMA                SCALE        Scale parameter
THETA                THRESHOLD    Threshold parameter
MEAN                              Mean of the fitted distribution
STD                               Standard deviation of the fitted distribution
Table 3.26. Secondary Keywords Available with the WEIBULL2 Keyword
Secondary Keyword    Alias        Description
C                    SHAPE        Shape parameter c
SIGMA                SCALE        Scale parameter
THETA                THRESHOLD    Known lower threshold θ0
MEAN                              Mean of the fitted distribution
STD                               Standard deviation of the fitted distribution
Table 3.27. Secondary Keywords Available with the KERNEL Keyword
Secondary Keyword    Description
TYPE                 Kernel type: normal, quadratic, or triangular
BANDWIDTH            Bandwidth $\lambda$ for the density estimate
BWIDTH               Alias for BANDWIDTH
C                    Standardized bandwidth c for the density estimate: $c = (\lambda / Q)\, n^{1/5}$, where n = sample size, $\lambda$ = bandwidth, and Q = interquartile range
AMISE                Approximate mean integrated square error (MISE) for the kernel density

Secondary Keyword    Description
AD                   Anderson-Darling EDF test statistic
ADPVAL               Anderson-Darling EDF test p-value
CVM                  Cramér-von Mises EDF test statistic
CVMPVAL              Cramér-von Mises EDF test p-value
KSD                  Kolmogorov-Smirnov EDF test statistic
KSDPVAL              Kolmogorov-Smirnov EDF test p-value
The inset statistics listed in Table 3.18 through Table 3.28 are not available unless
you request a plot statement and options that calculate these statistics. For example,
proc univariate data=score;
histogram final / normal;
inset mean std normal(ad adpval);
run;
The MEAN and STD keywords display the sample mean and standard deviation of
FINAL. The NORMAL keyword with the secondary keywords AD and ADPVAL
displays the Anderson-Darling goodness-of-fit test statistic and p-value. The statistics that are specified with the NORMAL keyword are available only because the
NORMAL option is requested in the HISTOGRAM statement.
The KERNEL or KERNELn keyword is available only if you request a kernel density
estimate in a HISTOGRAM statement. The WEIBULL2 keyword is available only
if you request a two-parameter Weibull distribution in the PROBPLOT or QQPLOT
statement.
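The dependence of the KERNELn keywords on the HISTOGRAM statement can be sketched as follows. The data set Measures and the variable Length are assumed names.

```sas
/* Assumed input: data set Measures with a numeric variable Length */
proc univariate data=Measures noprint;
   /* Two kernel density estimates with standardized bandwidths 0.5 and 1.0 */
   histogram Length / kernel(c=0.5 1.0);
   /* KERNEL1 and KERNEL2 are available only because the HISTOGRAM
      statement above requests two kernel estimates. */
   inset kernel1(c bandwidth) kernel2(c bandwidth);
run;
```

Specifying KERNEL instead of KERNEL1 and KERNEL2 would display the requested statistics for all of the estimates.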
Summary of Options
The following table lists INSET statement options, which are specified after the slash
(/) in the INSET statement. For complete descriptions, see the section Dictionary of
Options on page 235.
Table 3.29. INSET Options
CFILL=color | BLANK
CFILLH=color
CFRAME=color
CHEADER=color
CSHADOW=color
CTEXT=color
DATA
DATA=SAS-data-set
FONT=font
FORMAT=format
HEADER=quoted string
HEIGHT=value
NOFRAME
POSITION=position
REFPOINT=BR | BL | TR | TL
Dictionary of Options
The following entries provide detailed descriptions of options for the INSET statement.
To specify the same format for all the statistics in the INSET statement, use the
FORMAT= option.
To create a completely customized inset, use a DATA= data set. The data set contains
the label and the value that you want to display in the inset.
If you specify multiple kernel density estimates, you can request inset statistics for
all the estimates with the KERNEL keyword. Alternatively, you can display inset
statistics for individual curves with the KERNELn keyword, where n is the curve
number between 1 and 5.
CFILL=color | BLANK
specifies the color of the background. If you omit the CFILLH= option, the header
background is included. By default, the background is empty, which causes items
that overlap the inset (such as curves or histogram bars) to show through the inset.
If you specify a value for the CFILL= option, then overlapping items no longer show
through the inset. Use CFILL=BLANK to leave the background uncolored and to
prevent items from showing through the inset.
CFILLH=color
specifies the color of the header background. The default value is the CFILL= color.
CFRAME=color
specifies the color of the frame. The default value is the same color as the axis of the
plot.
CHEADER=color
specifies the color of the header text. The default value is the CTEXT= color.
CSHADOW=color
specifies the color of the drop shadow. By default, if a CSHADOW= option is not
specified, a drop shadow is not displayed.
CTEXT=color
specifies the color of the text. The default value is the same color as the other text on
the plot.
DATA
specifies that data coordinates are to be used in positioning the inset with the
POSITION= option. The DATA option is available only when you specify
POSITION=(x,y). You must place DATA immediately after the coordinates (x,y).
DATA=SAS-data-set
requests that PROC UNIVARIATE display customized statistics from a SAS data set
in the inset table. The data set must contain two variables:
_LABEL_
_VALUE_
The label and value from each observation in the data set occupy one line in the inset.
The position of the DATA= keyword in the keyword list determines the position of its
lines in the inset.
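A hedged sketch of a customized inset follows. It assumes the two required variables are named _LABEL_ and _VALUE_ (with underscores) and that both may be character; the data sets InsetNotes and Measures, the variable Length, and the label and value text are all hypothetical.

```sas
/* Hypothetical data set supplying two custom inset lines */
data InsetNotes;
   length _LABEL_ $ 24 _VALUE_ $ 12;
   _LABEL_ = 'Plant';  _VALUE_ = 'Cary';   output;
   _LABEL_ = 'Shift';  _VALUE_ = 'Night';  output;
run;

/* Assumed input: data set Measures with a numeric variable Length */
proc univariate data=Measures noprint;
   histogram Length;
   /* DATA= sits between N and MEAN in the keyword list, so its two
      lines appear between those statistics in the inset. */
   inset n data=InsetNotes mean;
run;
```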
FONT=font
specifies the font of the text. By default, if you locate the inset in the interior of the
plot then the font is SIMPLEX. If you locate the inset in the exterior of the plot then
the font is the same as the other text on the plot.
FORMAT=format
specifies a format for all the values in the inset. If you specify a format for a particular
statistic, then this format overrides FORMAT= format. For more information about
SAS formats, see SAS Language Reference: Dictionary.
HEADER=string
specifies the header text. The string cannot exceed 40 characters. By default, no
header line appears in the inset. If all the keywords that you list in the INSET statement are secondary keywords that correspond to a fitted curve on a histogram, PROC
UNIVARIATE displays a default header that indicates the distribution and identifies
the curve.
HEIGHT=value
POSITION=position
determines the position of the inset. The position is a compass point keyword, a
margin keyword, or a pair of coordinates (x,y). You can specify coordinates in axis
percent units or axis data units. The default value is NW, which positions the inset
in the upper left (northwest) corner of the display. See the section Positioning the
Inset on page 285.
REFPOINT=BR | BL | TR | TL
specifies the reference point for an inset that PROC UNIVARIATE positions by a pair
of coordinates with the POSITION= option. The REFPOINT= option specifies which
corner of the inset frame that you want to position at coordinates (x,y). The keywords
are BL, BR, TL, and TR, which correspond to bottom left, bottom right, top left, and
top right. The default value is BL. You must use REFPOINT= with POSITION=(x,y)
coordinates.
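Coordinate positioning with a reference corner might look like the following sketch. The data set Measures, the variable Length, and the coordinate values are assumed for illustration.

```sas
/* Assumed input: data set Measures with a numeric variable Length */
proc univariate data=Measures noprint;
   histogram Length;
   /* Place the top right corner of the inset frame at (95,95),
      read here as axis percent units because DATA is not specified. */
   inset mean std / position=(95,95) refpoint=tr;
run;
```

To read the pair as data values instead, DATA would follow the coordinates immediately, as in position=(x,y) data.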
OUTPUT Statement
OUTPUT < OUT=SAS-data-set >
< keyword1=names. . .keywordk=names >< percentile-options >;
The OUTPUT statement saves statistics and BY variables in an output data set. When
you use a BY statement, each observation in the OUT= data set corresponds to one
of the BY groups. Otherwise, the OUT= data set contains only one observation.
You can use any number of OUTPUT statements in the UNIVARIATE procedure.
Each OUTPUT statement creates a new data set containing the statistics specified in
that statement. You must use the VAR statement with the OUTPUT statement. The
OUTPUT statement must contain a specification of the form keyword=names or the
PCTLPTS= and PCTLPRE= specifications. See Example 3.7 and Example 3.8.
OUT=SAS-data-set
identifies the output data set. If SAS-data-set does not exist, PROC UNIVARIATE
creates it. If you omit OUT=, the data set is named DATAn, where n is the smallest
integer that makes the name unique. The default SAS-data-set is DATAn.
keyword=name
specifies the statistics to include in the output data set and gives names to the new
variables that contain the statistics. Specify a keyword for each desired statistic, an
equal sign, and the names of the variables to contain the statistic. In the output
data set, the first variable listed after a keyword in the OUTPUT statement contains
the statistic for the first variable listed in the VAR statement; the second variable
contains the statistic for the second variable in the VAR statement, and so on. If the
list of names following the equal sign is shorter than the list of variables in the VAR
statement, the procedure uses the names in the order in which the variables are listed
in the VAR statement. The available keywords are listed in the following tables:
Table 3.30. Descriptive Statistic Keywords
CSS
CV
KURTOSIS
MAX
MEAN
MIN
MODE
N
NMISS
NOBS
RANGE
SKEWNESS
STD
STDMEAN
SUM
SUMWGT
USS
VAR
P1        1st percentile
P5        5th percentile
P10       10th percentile
Q1        Lower quartile (25th percentile)
MEDIAN    Median (50th percentile)
Q3        Upper quartile (75th percentile)
P90       90th percentile
P95       95th percentile
P99       99th percentile
QRANGE    Interquartile range (Q3 - Q1)
GINI
MAD
QN
SN
STD_GINI
STD_MAD
STD_QN
STD_QRANGE
STD_SN
MSIGN         Sign statistic
NORMALTEST    Test statistic for normality
SIGNRANK      Signed rank statistic
PROBM         Probability of a greater absolute value for the sign statistic
PROBN         Probability value for the test of normality
PROBS         Probability value for the signed rank test
PROBT         Probability value for the Student's t test
T             Statistic for the Student's t test
To store the same statistic for several analysis variables, specify a list of names. The
order of the names corresponds to the order of the analysis variables in the VAR
statement. PROC UNIVARIATE uses the first name to create a variable that contains
the statistic for the first analysis variable, the next name to create a variable that
contains the statistic for the second analysis variable, and so on. If you do not want
to output statistics for all the analysis variables, specify fewer names than the number
of analysis variables.
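The name-matching rule can be sketched as follows. The data set Measures, its variables Length and Width, and the output names are assumed for illustration.

```sas
/* Assumed input: data set Measures with numeric variables Length and Width */
proc univariate data=Measures noprint;
   var Length Width;
   /* MEAN= lists two names, matched in VAR order: MeanLength, MeanWidth.
      STD= lists one name, so only the standard deviation of Length
      (the first VAR variable) is saved. */
   output out=Stats mean=MeanLength MeanWidth
                    std=StdLength;
run;
```

Because no BY statement is used, the data set Stats contains a single observation.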
The UNIVARIATE procedure automatically computes the 1st, 5th, 10th, 25th, 50th,
75th, 90th, 95th, and 99th percentiles for the data. These can be saved in an output
data set using keyword=names specifications. For additional percentiles, you can use
the following percentile-options:
PCTLPTS=percentiles
specifies one or more percentiles that are not automatically computed by the
UNIVARIATE procedure. The PCTLPRE= and PCTLPTS= options must be used
together. You can specify percentiles with the expression start TO stop BY increment
where start is a starting number, stop is an ending number, and increment is a number
to increment by. The PCTLPTS= option generates additional percentiles and outputs
them to a data set; these additional percentiles are not printed.
To compute the 50th, 95th, 97.5th, and 100th percentiles, submit the statement
output pctlpre=P_ pctlpts=50,95 to 100 by 2.5;
You can use PCTLPTS= to output percentiles that are not in the list of quantile statistics. PROC UNIVARIATE computes the requested percentiles based on the method
that you specify with the PCTLDEF= option in the PROC UNIVARIATE statement.
You must use PCTLPRE=, and optionally PCTLNAME=, to specify variable names
for the percentiles. For example, the following statements create an output data set
that is named Pctls that contains the 20th and 40th percentiles of the analysis variables PreTest and PostTest:
proc univariate data=Score;
var PreTest PostTest;
output out=Pctls pctlpts=20 40 pctlpre=PreTest_ PostTest_
pctlname=P20 P40;
run;
PROC UNIVARIATE saves the 20th and 40th percentiles for PreTest and PostTest in
the variables PreTest_P20, PostTest_P20, PreTest_P40, and PostTest_P40.
PCTLPRE=prefixes
specifies one or more prefixes to create the variable names for the variables that contain the PCTLPTS= percentiles. To save the same percentiles for more than one
analysis variable, specify a list of prefixes. The order of the prefixes corresponds
to the order of the analysis variables in the VAR statement. The PCTLPRE= and
PCTLPTS= options must be used together.
The procedure generates new variable names using the prefix and the percentile values. If the specified percentile is an integer, the variable name is simply the prefix
followed by the value. If the specified value is not an integer, an underscore replaces
the decimal point in the variable name, and decimal values are truncated to one decimal place. For example, the following statements create the variables PWID20,
PWID33_3, PWID66_6, and PWID80 for the 20th, 33.33rd, 66.67th, and 80th percentiles of Width, respectively:
proc univariate noprint;
var Width;
output pctlpts=20 33.33 66.67 80 pctlpre=pwid;
run;
PCTLNAME=suffixes
specifies one or more suffixes to create the names for the variables that contain the
PCTLPTS= percentiles. PROC UNIVARIATE creates a variable name by combining the PCTLPRE= value and suffix-name. Because the suffix names are associated
with the percentiles that are requested, list the suffix names in the same order as the
PCTLPTS= percentiles. If you specify n suffixes with the PCTLNAME= option and
m percentile values with the PCTLPTS= option, where m > n, the suffixes are used
to name the first n percentiles, and the default names are used for the remaining m - n
percentiles. For example, consider the following statements:
proc univariate;
var Length Width Height;
output pctlpts = 20 40
pctlpre = pl pw ph
pctlname = twenty;
run;
The value TWENTY in the PCTLNAME= option is used for only the first percentile
in the PCTLPTS= list. This suffix is appended to the values in the PCTLPRE= option
to generate the new variable names PLTWENTY, PWTWENTY, and PHTWENTY,
which contain the 20th percentiles for Length, Width, and Height, respectively.
Because a second PCTLNAME= suffix is not specified, variable names for the 40th
percentiles for Length, Width, and Height are generated using the prefixes and percentile values. Thus, the output data set contains the variables PLTWENTY, PL40,
PWTWENTY, PW40, PHTWENTY, and PH40.
You must specify PCTLPRE= to supply prefix names for the variables that contain
the PCTLPTS= percentiles.
If the number of PCTLNAME= values is fewer than the number of percentiles, or
if you omit PCTLNAME=, PROC UNIVARIATE uses the percentile as the suffix to
create the name of the variable that contains the percentile. For an integer percentile,
PROC UNIVARIATE uses the percentile. Otherwise, PROC UNIVARIATE truncates
decimal values of percentiles to two decimal places and replaces the decimal point
with an underscore.
If either the prefix and suffix name combination or the prefix and percentile name
combination is longer than 32 characters, PROC UNIVARIATE truncates the prefix
name so that the variable name is 32 characters.
PROBPLOT Statement
PROBPLOT < variables >< / options >;
The PROBPLOT statement creates a probability plot, which compares ordered variable values with the percentiles of a specified theoretical distribution. If the data
distribution matches the theoretical distribution, the points on the plot form a linear pattern. Consequently, you can use a probability plot to determine how well a
theoretical distribution models a set of measurements.
Probability plots are similar to Q-Q plots, which you can create with the QQPLOT
statement. Probability plots are preferable for graphical estimation of percentiles,
whereas Q-Q plots are preferable for graphical estimation of distribution parameters.
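To compare the two displays on the same variable, both statements can be used in one step, as in this sketch. The data set Measures and the variable Length are assumed names.

```sas
/* Assumed input: data set Measures with a numeric variable Length */
proc univariate data=Measures noprint;
   /* MU=EST and SIGMA=EST add a reference line based on the estimated
      normal parameters to each plot. */
   probplot Length / normal(mu=est sigma=est);
   qqplot   Length / normal(mu=est sigma=est);
run;
```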
You can use any number of PROBPLOT statements in the UNIVARIATE procedure.
The components of the PROBPLOT statement are described as follows.
variables
are the variables for which to create probability plots. If you specify a VAR statement,
the variables must also be listed in the VAR statement. Otherwise, the variables can
be any numeric variables in the input data set. If you do not specify a list of variables,
then by default the procedure creates a probability plot for each variable listed in the
VAR statement, or for each numeric variable in the DATA= data set if you do not
specify a VAR statement. For example, each of the following PROBPLOT statements
produces two probability plots, one for Length and one for Width:
proc univariate data=Measures;
   var Length Width;
   probplot;
run;
proc univariate data=Measures;
   probplot Length Width;
run;
options
specify the theoretical distribution for the plot or add features to the plot. If you
specify more than one variable, the options apply equally to each variable. Specify
all options after the slash (/) in the PROBPLOT statement. You can specify only
one option naming a distribution in each PROBPLOT statement, but you can specify
any number of other options. The distributions available are the beta, exponential,
gamma, lognormal, normal, two-parameter Weibull, and three-parameter Weibull.
By default, the procedure produces a plot for the normal distribution.
In the following example, the NORMAL option requests a normal probability plot
for each variable, while the MU= and SIGMA= normal-options request a distribution
reference line corresponding to the normal distribution with μ = 10 and σ = 0.3.
The SQUARE option displays the plot in a square frame, and the CTEXT= option
specifies the text color.
proc univariate data=Measures;
probplot Length1 Length2 / normal(mu=10 sigma=0.3)
square ctext=blue;
run;
Distribution Options
Table 3.34 lists options for requesting a theoretical distribution.
Table 3.34. Primary Options for Theoretical Distributions
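Option                             Description
BETA(beta-options)                 Specifies beta probability plot for shape parameters α and β specified with mandatory ALPHA= and BETA= beta-options
EXPONENTIAL(exponential-options)   Specifies exponential probability plot
GAMMA(gamma-options)               Specifies gamma probability plot for shape parameter α specified with mandatory ALPHA= gamma-option
LOGNORMAL(lognormal-options)       Specifies lognormal probability plot for shape parameter σ specified with mandatory SIGMA= lognormal-option
NORMAL(normal-options)             Specifies normal probability plot
WEIBULL(Weibull-options)           Specifies three-parameter Weibull probability plot for shape parameter c specified with mandatory C= Weibull-option
WEIBULL2(Weibull2-options)         Specifies two-parameter Weibull probability plot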
Table 3.35 through Table 3.42 list secondary options that specify distribution parameters and control the display of a distribution reference line. Specify these options in
parentheses after the distribution keyword. For example, you can request a normal
probability plot with a distribution reference line by specifying the NORMAL option
as follows:
proc univariate;
probplot Length / normal(mu=10 sigma=0.3 color=red);
run;
The MU= and SIGMA= normal-options display a distribution reference line that
corresponds to the normal distribution with mean μ₀ = 10 and standard deviation
σ₀ = 0.3, and the COLOR= normal-option specifies the color for the line.
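Table 3.35. Secondary Reference Line Options Used with All Distributions
COLOR=color              Specifies color of distribution reference line
L=linetype               Specifies line type of distribution reference line
W=n                      Specifies width of distribution reference line
Secondary options by distribution (Table 3.36 through Table 3.42)
Beta-options
ALPHA=value-list | EST   Specifies mandatory shape parameter α
BETA=value-list | EST    Specifies mandatory shape parameter β
SIGMA=value | EST        Specifies σ₀ for distribution reference line
THETA=value | EST        Specifies θ₀ for distribution reference line
Exponential-options
SIGMA=value | EST        Specifies σ₀ for distribution reference line
THETA=value | EST        Specifies θ₀ for distribution reference line
Gamma-options
ALPHA=value-list | EST   Specifies mandatory shape parameter α
SIGMA=value | EST        Specifies σ₀ for distribution reference line
THETA=value | EST        Specifies θ₀ for distribution reference line
Lognormal-options
SIGMA=value-list | EST   Specifies mandatory shape parameter σ
SLOPE=value | EST        Specifies slope of distribution reference line
THETA=value | EST        Specifies θ₀ for distribution reference line
ZETA=value               Specifies ζ₀ for distribution reference line
Normal-options
MU=value | EST           Specifies μ₀ for distribution reference line
SIGMA=value | EST        Specifies σ₀ for distribution reference line
Weibull-options
C=value-list | EST       Specifies mandatory shape parameter c
SIGMA=value | EST        Specifies σ₀ for distribution reference line
THETA=value | EST        Specifies θ₀ for distribution reference line
Weibull2-options
C=value | EST            Specifies c₀ for distribution reference line
SIGMA=value | EST        Specifies σ₀ for distribution reference line
SLOPE=value | EST        Specifies slope of distribution reference line
THETA=value              Specifies known lower threshold θ₀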
Option          Description
ANNOKEY         Applies annotation requested in ANNOTATE= data set to key cell only
ANNOTATE=       Specifies annotate data set
CAXIS=          Specifies color for axis
CFRAME=         Specifies color for frame
CFRAMESIDE=     Specifies color for filling frame for row labels
CFRAMETOP=      Specifies color for filling frame for column labels
CGRID=          Specifies color for grid lines
CHREF=          Specifies color for HREF= lines
CTEXT=          Specifies color for text
CVREF=          Specifies color for VREF= lines
DESCRIPTION=    Specifies description for plot in graphics catalog
FONT=           Specifies software font for text
GRID            Creates a grid
HEIGHT=         Specifies height of text used outside framed areas
HMINOR=         Specifies number of horizontal minor tick marks
HREF=           Specifies reference lines perpendicular to the horizontal axis
HREFLABELS=     Specifies labels for HREF= lines
INFONT=         Specifies software font for text inside framed areas
INHEIGHT=       Specifies height of text inside framed areas
INTERTILE=      Specifies distance between tiles
LGRID=          Specifies a line type for grid lines
LHREF=          Specifies line style for HREF= lines
LVREF=          Specifies line style for VREF= lines
NADJ=           Adjusts sample size when computing percentiles
NAME=           Specifies name for plot in graphics catalog
NCOLS=          Specifies number of columns in comparative probability plot
NOFRAME         Suppresses frame around plotting area
NOHLABEL        Suppresses label for horizontal axis
NOVLABEL        Suppresses label for vertical axis
NOVTICK         Suppresses tick marks and tick mark labels for vertical axis
NROWS=          Specifies number of rows in comparative probability plot
PCTLMINOR       Requests minor tick marks for percentile axis
PCTLORDER=      Specifies tick mark labels for percentile axis
RANKADJ=        Adjusts ranks when computing percentiles
SQUARE          Displays plot in square format
VAXISLABEL=     Specifies label for vertical axis
VMINOR=         Specifies number of vertical minor tick marks
VREF=           Specifies reference lines perpendicular to the vertical axis
VREFLABELS=     Specifies labels for VREF= lines
VREFLABPOS=     Specifies horizontal position of labels for VREF= lines
WAXIS=          Specifies line thickness for axes and frame
Dictionary of Options
The following entries provide detailed descriptions of options in the PROBPLOT
statement.
ALPHA=value | EST
specifies the mandatory shape parameter α for probability plots requested with the
BETA and GAMMA options. Enclose the ALPHA= option in parentheses after the
BETA or GAMMA options. If you specify ALPHA=EST, a maximum likelihood
estimate is computed for α.
ANNOKEY
applies the annotation requested with the ANNOTATE= option to the key cell only.
By default, the procedure applies annotation to all of the cells. This option is not
available unless you use the CLASS statement. Specify the KEYLEVEL= option in
the CLASS statement to specify the key cell.
ANNOTATE=SAS-data-set
ANNO=SAS-data-set
specifies an annotate data set for the plot.
BETA(ALPHA=value | EST BETA=value | EST <beta-options>)
creates a beta probability plot for each combination of the required shape parameters
α and β specified by the required ALPHA= and BETA= beta-options. If you specify
ALPHA=EST and BETA=EST, the procedure creates a plot based on maximum likelihood estimates for α and β. You can specify the SCALE= beta-option as an alias
for the SIGMA= beta-option and the THRESHOLD= beta-option as an alias for the
THETA= beta-option.
To obtain graphical estimates of α and β, specify lists of values in the ALPHA= and
BETA= beta-options, and select the combination of α and β that most nearly linearizes the point pattern. To assess the point pattern, you can add a diagonal distribution reference line corresponding to lower threshold parameter θ₀ and scale parameter
σ₀ with the THETA= and SIGMA= beta-options. Alternatively, you can add a line
that corresponds to estimated values of θ₀ and σ₀ with the beta-options THETA=EST
and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the beta distribution with parameters α, β, θ₀, and σ₀ is a good fit.
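The two ways of using the ALPHA= and BETA= beta-options can be sketched as follows. The data set Measures and the variable Width come from other examples in this section; the shape values 2 and 3 are arbitrary placeholders, and the sketch assumes that Width is bounded so that a beta model is plausible.

```sas
/* Sketch only: the shape values below are illustrative assumptions. */
proc univariate data=Measures;
   /* One plot per combination of the listed shape values;              */
   /* choose the combination that gives the most linear point pattern.  */
   probplot Width / beta(alpha=2 3 beta=2 3);
   /* Plot based on maximum likelihood estimates of both shape          */
   /* parameters, with a reference line from the estimated threshold    */
   /* and scale parameters.                                             */
   probplot Width / beta(alpha=est beta=est theta=est sigma=est);
run;
```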
BETA=value | EST
B=value | EST
specifies the mandatory shape parameter β for probability plots requested with the
BETA option. Enclose the BETA= option in parentheses after the BETA option. If
you specify BETA=EST, a maximum likelihood estimate is computed for β.
C=value | EST
specifies the shape parameter c for probability plots requested with the WEIBULL
and WEIBULL2 options. Enclose this option in parentheses after the WEIBULL
or WEIBULL2 option. C= is a required Weibull-option in the WEIBULL option; in
this situation, it accepts a list of values, or if you specify C=EST, a maximum likelihood estimate is computed for c. You can optionally specify C=value or C=EST
as a Weibull2-option with the WEIBULL2 option to request a distribution reference line; in this situation, you must also specify Weibull2-option SIGMA=value or
SIGMA=EST.
CAXIS=color
CAXES=color
specifies the color for the axes. This option overrides any COLOR= specifications in
an AXIS statement. The default value is the first color in the device color list.
CFRAME=color
specifies the color for the area that is enclosed by the axes and frame. The area is not
filled by default.
CFRAMESIDE=color
specifies the color to fill the frame area for the row labels that display along the left
side of a comparative probability plot. This color also fills the frame area for the
label of the corresponding class variable (if you associate a label with the variable).
By default, these areas are not filled. This option is not available unless you use the
CLASS statement.
CFRAMETOP=color
specifies the color to fill the frame area for the column labels that display across the
top of a comparative probability plot. This color also fills the frame area for the
label of the corresponding class variable (if you associate a label with the variable).
By default, these areas are not filled. This option does not apply unless you use the
CLASS statement.
CGRID=color
specifies the color for grid lines when a grid displays on the plot. The default color is
the first color in the device color list. This option also produces a grid.
CHREF=color
CH=color
specifies the color for horizontal axis reference lines requested by the HREF= option.
The default color is the first color in the device color list.
COLOR=color
specifies the color of the diagonal distribution reference line. The default color is the
first color in the device color list. Enclose the COLOR= option in parentheses after a
distribution option keyword.
CTEXT=color
specifies the color for tick mark values and axis labels. The default color is the color
that you specify for the CTEXT= option in the GOPTIONS statement. If you omit
the GOPTIONS statement, the default is the first color in the device color list.
CVREF=color
CV=color
specifies the color for the reference lines requested by the VREF= option. The default
color is the first color in the device color list.
DESCRIPTION=string
DES=string
specifies a description for the plot in the graphics catalog.
EXPONENTIAL<(exponential-options)>
creates an exponential probability plot. To assess the point pattern, add a diagonal distribution reference line corresponding to θ₀ and σ₀ with the THETA= and SIGMA=
exponential-options. Alternatively, you can add a line corresponding to estimated
values of the threshold parameter θ₀ and the scale parameter σ with the exponential-options THETA=EST and SIGMA=EST. Agreement between the reference line and
the point pattern indicates that the exponential distribution with parameters θ₀ and
σ₀ is a good fit. You can specify the SCALE= exponential-option as an alias for the
SIGMA= exponential-option and the THRESHOLD= exponential-option as an alias
for the THETA= exponential-option.
FONT=font
specifies a software font for the reference lines and axis labels. You can also specify
fonts for axis labels in an AXIS statement. The FONT= font takes precedence over
the FTEXT= font specified in the GOPTIONS statement. Hardware characters are
used by default.
GAMMA(ALPHA=value | EST <gamma-options>)
creates a gamma probability plot for each value of the shape parameter α given by
the mandatory ALPHA= gamma-option. If you specify ALPHA=EST, the procedure
creates a plot based on a maximum likelihood estimate for α. To obtain a graphical
estimate of α, specify a list of values for the ALPHA= gamma-option, and select the
value that most nearly linearizes the point pattern. To assess the point pattern, add
a diagonal distribution reference line corresponding to θ₀ and σ₀ with the THETA=
and SIGMA= gamma-options. Alternatively, you can add a line corresponding to
estimated values of the threshold parameter θ₀ and the scale parameter σ with the
gamma-options THETA=EST and SIGMA=EST. Agreement between the reference
line and the point pattern indicates that the gamma distribution with parameters α, θ₀,
and σ₀ is a good fit. You can specify the SCALE= gamma-option as an alias for the
SIGMA= gamma-option and the THRESHOLD= gamma-option as an alias for the
THETA= gamma-option.
GRID
displays a grid. Grid lines are reference lines that are perpendicular to the percentile
axis at major tick marks.
HEIGHT=value
specifies the height, in percentage screen units, of text for axis labels, tick mark
labels, and legends. This option takes precedence over the HTEXT= option in the
GOPTIONS statement.
HMINOR=n
HM=n
specifies the number of minor tick marks between each major tick mark on the horizontal axis. Minor tick marks are not labeled. By default, HMINOR=0.
HREF=values
draws reference lines that are perpendicular to the horizontal axis at the values you
specify.
HREFLABELS=label1 . . . labeln
HREFLABEL=label1 . . . labeln
HREFLAB=label1 . . . labeln
specifies labels for the reference lines requested by the HREF= option. The number of
labels must equal the number of reference lines. Labels can have up to 16 characters.
HREFLABPOS=n
specifies the vertical position of labels for HREF= lines.
INFONT=font
specifies a software font to use for text inside the framed areas of the plot. The
INFONT= option takes precedence over the FTEXT= option in the GOPTIONS statement. For a list of fonts, see SAS/GRAPH Reference.
INHEIGHT=value
specifies the height, in percentage screen units, of text used inside the framed areas
of the plot. By default, the height specified by the HEIGHT= option is used. If you
do not specify the HEIGHT= option, the height specified with the HTEXT= option
in the GOPTIONS statement is used.
INTERTILE=value
specifies the distance, in horizontal percentage screen units, between the framed areas, which are called tiles. By default, the tiles are contiguous. This option is not
available unless you use the CLASS statement.
L=linetype
specifies the line type for a diagonal distribution reference line. Enclose the L= option
in parentheses after a distribution option. By default, L=1, which produces a solid
line.
LGRID=linetype
specifies the line type for the grid requested by the GRID option. By default,
LGRID=1, which produces a solid line.
LHREF=linetype
LH=linetype
specifies the line type for the reference lines that you request with the HREF= option.
By default, LHREF=2, which produces a dashed line.
LOGNORMAL(SIGMA=value | EST <lognormal-options>)
LNORM(SIGMA=value | EST <lognormal-options>)
creates a lognormal probability plot for each value of the shape parameter σ given by
the mandatory SIGMA= lognormal-option. If you specify SIGMA=EST, the procedure creates a plot based on a maximum likelihood estimate for σ. To obtain a graphical estimate of σ, specify a list of values for the SIGMA= lognormal-option, and select
the value that most nearly linearizes the point pattern. To assess the point pattern, add
a diagonal distribution reference line corresponding to θ₀ and ζ₀ with the THETA=
and ZETA= lognormal-options. Alternatively, you can add a line corresponding to
estimated values of the threshold parameter θ₀ and the scale parameter ζ₀ with the
lognormal-options THETA=EST and ZETA=EST. Agreement between the reference
line and the point pattern indicates that the lognormal distribution with parameters
σ, θ₀, and ζ₀ is a good fit. You can specify the THRESHOLD= lognormal-option as
an alias for the THETA= lognormal-option and the SCALE= lognormal-option as an
alias for the ZETA= lognormal-option. See Example 3.26.
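A possible workflow for the graphical estimate described above is sketched here. The data set Measures and the variable Width are taken from other examples in this section; the candidate shape values 0.2, 0.5, and 0.8 are arbitrary assumptions, not recommended settings.

```sas
/* Sketch only: the candidate SIGMA= values are illustrative assumptions. */
proc univariate data=Measures;
   /* Step 1: one lognormal plot per candidate shape value;          */
   /* look for the value that most nearly linearizes the points.     */
   probplot Width / lognormal(sigma=0.2 0.5 0.8);
   /* Step 2: let the procedure estimate the shape parameter and     */
   /* add a reference line from the estimated threshold and scale.   */
   probplot Width / lognormal(sigma=est theta=est zeta=est);
run;
```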
LVREF=linetype
specifies the line type for the reference lines requested with the VREF= option. By
default, LVREF=2, which produces a dashed line.
MU=value | EST
specifies the mean μ₀ for a normal probability plot requested with the NORMAL
option. Enclose the MU= normal-option in parentheses after the NORMAL option.
The MU= normal-option must be specified with the SIGMA= normal-option, and
they request a distribution reference line. You can specify MU=EST to request a
distribution reference line with μ₀ equal to the sample mean.
NADJ=value
specifies the adjustment value added to the sample size in the calculation of theoretical percentiles. By default, NADJ=1/4. Refer to Chambers et al. (1983).
NAME=string
specifies a name for the plot, up to eight characters long, that appears in the PROC
GREPLAY master menu. The default value is UNIVAR.
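NCOLS=n
NCOL=n
specifies the number of columns in a comparative probability plot.
NOFRAME
suppresses the frame around the plotting area.
NOHLABEL
suppresses the label for the horizontal axis. You can use this option to reduce clutter.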
NORMAL<(normal-options)>
creates a normal probability plot. This is the default if you omit a distribution option.
To assess the point pattern, you can add a diagonal distribution reference line corresponding to μ₀ and σ₀ with the MU= and SIGMA= normal-options. Alternatively,
you can add a line corresponding to estimated values of μ₀ and σ₀ with the normal-options MU=EST and SIGMA=EST; the estimates of the mean μ₀ and the standard
deviation σ₀ are the sample mean and sample standard deviation. Agreement between the reference line and the point pattern indicates that the normal distribution
with parameters μ₀ and σ₀ is a good fit.
NOVLABEL
suppresses the label for the vertical axis. You can use this option to reduce clutter.
NOVTICK
suppresses the tick marks and tick mark labels for the vertical axis. This option also
suppresses the label for the vertical axis.
NROWS=n
NROW=n
specifies the number of rows in a comparative probability plot.
PCTLMINOR
requests minor tick marks for the percentile axis. The HMINOR= option overrides the
minor tick marks requested by the PCTLMINOR option.
PCTLORDER=values
specifies the tick marks that are labeled on the theoretical percentile axis. Since
the values are percentiles, the labels must be between 0 and 100, exclusive. The
values must be listed in increasing order and must cover the plotted percentile range.
Otherwise, the default values of 1, 5, 10, 25, 50, 75, 90, 95, and 99 are used.
RANKADJ=value
specifies the adjustment value added to the ranks in the calculation of theoretical
percentiles. By default, RANKADJ=-3/8, as recommended by Blom (1958). Refer to
Chambers et al. (1983) for additional information.
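To make the roles of the rank and sample size adjustments concrete, the following sketch computes Blom's plotting positions (i - 3/8)/(n + 1/4) by hand for a normal plot. This is an illustration of the standard Blom formula, not a statement of the procedure's internal code. It assumes that Length has no missing values (so that NOBS= equals the sample size); the data set names Sorted and PlotPos and the variables p and z are hypothetical.

```sas
/* Sketch only: assumes Length has no missing values.            */
/* Sorted, PlotPos, p, and z are hypothetical names.             */
proc sort data=Measures out=Sorted;
   by Length;
run;

data PlotPos;
   set Sorted nobs=n;               /* n = number of observations      */
   p = (_n_ - 3/8) / (n + 1/4);     /* Blom plotting position for the  */
                                    /* i-th ordered value (i = _n_)    */
   z = probit(p);                   /* standard normal quantile at p   */
run;
```

Plotting Length against z in PlotPos would give the same kind of point pattern that a normal probability plot displays, which can help when checking how a change to the adjustments shifts the extreme points.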
SCALE=value | EST
is an alias for the SIGMA= option for plots requested by the BETA, EXPONENTIAL,
GAMMA, and WEIBULL options and for the ZETA= option when you request the
LOGNORMAL option. See the entries for the SIGMA= and ZETA= options.
SHAPE=value | EST
is an alias for the ALPHA= option with the GAMMA option, for the SIGMA= option with the LOGNORMAL option, and for the C= option with the WEIBULL and
WEIBULL2 options. See the entries for the ALPHA=, SIGMA=, and C= options.
SIGMA=value | EST
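specifies the parameter σ, where σ > 0. Alternatively, you can specify SIGMA=EST
to request a maximum likelihood estimate for σ₀. The interpretation and use of the
SIGMA= option depend on the distribution option with which it is used. See Table
3.44 for a summary of how to use the SIGMA= option. You must enclose this option
in parentheses after the distribution option.
Table 3.44. Uses of the SIGMA= Option
Distribution Option                  Use of the SIGMA= Option
BETA, EXPONENTIAL, GAMMA, WEIBULL    THETA=θ₀ and SIGMA=σ₀ request a distribution reference line corresponding to θ₀ and σ₀.
LOGNORMAL                            SIGMA=σ1 . . . σn requests n probability plots with shape parameters σ1 . . . σn. The SIGMA= option must be specified.
NORMAL                               MU=μ₀ and SIGMA=σ₀ request a distribution reference line corresponding to μ₀ and σ₀. SIGMA=EST requests a line with σ₀ equal to the sample standard deviation.
WEIBULL2                             SIGMA=σ₀ and C=c₀ request a distribution reference line corresponding to σ₀ and c₀.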
SLOPE=value | EST
specifies the slope for a distribution reference line requested with the LOGNORMAL
and WEIBULL2 options. Enclose the SLOPE= option in parentheses after the distribution option. When you use the SLOPE= lognormal-option with the LOGNORMAL
option, you must also specify a threshold parameter value θ₀ with the THETA=
lognormal-option to request the line. The SLOPE= lognormal-option is an alternative
to the ZETA= lognormal-option for specifying ζ₀, since the slope is equal to exp(ζ₀).
When you use the SLOPE= Weibull2-option with the WEIBULL2 option, you must
also specify a scale parameter value σ₀ with the SIGMA= Weibull2-option to request
the line. The SLOPE= Weibull2-option is an alternative to the C= Weibull2-option for
specifying c₀, since the slope is equal to 1/c₀.
For example, the first and second PROBPLOT statements produce the same probability plots and the third and fourth PROBPLOT statements produce the same probability plots:
proc univariate data=Measures;
probplot Width / lognormal(sigma=2 theta=0 zeta=0);
probplot Width / lognormal(sigma=2 theta=0 slope=1);
probplot Width / weibull2(sigma=2 theta=0 c=.25);
probplot Width / weibull2(sigma=2 theta=0 slope=4);
run;
SQUARE
displays the probability plot in a square frame. By default, the plot is in a rectangular
frame.
THETA=value | EST
specifies the lower threshold parameter θ for plots requested with the BETA,
EXPONENTIAL, GAMMA, LOGNORMAL, WEIBULL, and WEIBULL2 options.
Enclose the THETA= option in parentheses after a distribution option. When used
with the WEIBULL2 option, the THETA= option specifies the known lower threshold θ₀, for which the default is 0. When used with the other distribution options, the
THETA= option specifies θ₀ for a distribution reference line; alternatively in this situation, you can specify THETA=EST to request a maximum likelihood estimate for
θ₀. To request the line, you must also specify a scale parameter.
THRESHOLD=value | EST
is an alias for the THETA= option. See the entry for the THETA= option.
VAXISLABEL=label
specifies a label for the vertical axis. Labels can have up to 40 characters.
VMINOR=n
VM=n
specifies the number of minor tick marks between each major tick mark on the vertical
axis. Minor tick marks are not labeled. The default is zero.
VREF=values
draws reference lines perpendicular to the vertical axis at the values specified. Also
see the CVREF=, LVREF=, and VREFCHAR= options.
VREFLABELS=label1. . . labeln
VREFLABEL=label1. . . labeln
VREFLAB=label1. . . labeln
specifies labels for the reference lines requested by the VREF= option. The number
of labels must equal the number of reference lines. Enclose each label in quotes.
Labels can have up to 16 characters.
VREFLABPOS=n
specifies the horizontal position of labels for VREF= lines.
W=n
specifies the width, in pixels, for a diagonal distribution line. Enclose the W= option
in parentheses after the distribution option. By default, W=1.
WAXIS=n
specifies the line thickness, in pixels, for the axes and frame. By default, WAXIS=1.
WEIBULL(C=value | EST <Weibull-options>)
WEIB(C=value | EST <Weibull-options>)
creates a three-parameter Weibull probability plot for each value of the required shape
parameter c specified by the mandatory C= Weibull-option. To create a plot that is
based on a maximum likelihood estimate for c, specify C=EST. To obtain a graphical
estimate of c, specify a list of values in the C= Weibull-option, and select the value
that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to θ₀ and σ₀ with the THETA= and
SIGMA= Weibull-options. Alternatively, you can add a line corresponding to estimated values of θ₀ and σ₀ with the Weibull-options THETA=EST and SIGMA=EST.
Agreement between the reference line and the point pattern indicates that the Weibull
distribution with parameters c, θ₀, and σ₀ is a good fit. You can specify the SCALE=
Weibull-option as an alias for the SIGMA= Weibull-option and the THRESHOLD=
Weibull-option as an alias for the THETA= Weibull-option.
WEIBULL2<(Weibull2-options)>
W2<(Weibull2-options)>
creates a two-parameter Weibull probability plot. You should use the WEIBULL2 option when your data have a known lower threshold θ₀, which is 0 by default. To specify the threshold value θ₀, use the THETA= Weibull2-option.
An advantage of the two-parameter Weibull plot over the three-parameter Weibull
plot is that the parameters c and σ can be estimated from the slope and intercept of
the point pattern. A disadvantage is that the two-parameter Weibull distribution applies only in situations where the threshold parameter is known. To obtain a graphical
estimate of θ₀, specify a list of values for the THETA= Weibull2-option, and select
the value that most nearly linearizes the point pattern. To assess the point pattern, add
a diagonal distribution reference line corresponding to σ₀ and c₀ with the SIGMA=
and C= Weibull2-options. Alternatively, you can add a distribution reference line corresponding to estimated values of σ₀ and c₀ with the Weibull2-options SIGMA=EST
and C=EST. Agreement between the reference line and the point pattern indicates that
the Weibull distribution with parameters c₀, θ₀, and σ₀ is a good fit. You can specify
the SCALE= Weibull2-option as an alias for the SIGMA= Weibull2-option and the
SHAPE= Weibull2-option as an alias for the C= Weibull2-option.
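The contrast between the two Weibull plots can be sketched as follows, using the data set Measures and the variable Width from other examples in this section. The threshold of 0 in the first statement is an assumption about the data made only for illustration.

```sas
/* Sketch only: THETA=0 assumes the lower threshold of Width is known to be 0. */
proc univariate data=Measures;
   /* Two-parameter plot: threshold fixed, reference line from        */
   /* estimated scale and shape parameters.                           */
   probplot Width / weibull2(theta=0 sigma=est c=est);
   /* Three-parameter plot: shape estimated by maximum likelihood,    */
   /* reference line from estimated threshold and scale parameters.   */
   probplot Width / weibull(c=est theta=est sigma=est);
run;
```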
ZETA=value | EST
specifies a value for the scale parameter ζ for the lognormal probability plots requested with the LOGNORMAL option. Enclose the ZETA= lognormal-option in
parentheses after the LOGNORMAL option. To request a distribution reference line
with intercept θ₀ and slope exp(ζ₀), specify THETA=θ₀ and ZETA=ζ₀.
QQPLOT Statement
QQPLOT < variables >< / options >;
The QQPLOT statement creates quantile-quantile plots (Q-Q plots) using high-resolution graphics and compares ordered variable values with quantiles of a specified
theoretical distribution. If the data distribution matches the theoretical distribution,
the points on the plot form a linear pattern. Thus, you can use a Q-Q plot to determine
how well a theoretical distribution models a set of measurements.
variables
are the variables for which to create Q-Q plots. If you specify a VAR statement, the
variables must also be listed in the VAR statement. Otherwise, the variables can be
any numeric variables in the input data set. If you do not specify a list of variables,
then by default the procedure creates a Q-Q plot for each variable listed in the VAR
statement, or for each numeric variable in the DATA= data set if you do not specify
a VAR statement. For example, each of the following QQPLOT statements produces
two Q-Q plots, one for Length and one for Width:
proc univariate data=Measures;
   var Length Width;
   qqplot;
run;
proc univariate data=Measures;
   qqplot Length Width;
run;
options
specify the theoretical distribution for the plot or add features to the plot. If you specify more than one variable, the options apply equally to each variable. Specify all
options after the slash (/) in the QQPLOT statement. You can specify only one option
naming the distribution in each QQPLOT statement, but you can specify any number of other options. The distributions available are the beta, exponential, gamma,
lognormal, normal, two-parameter Weibull, and three-parameter Weibull. By default,
the procedure produces a plot for the normal distribution.
In the following example, the NORMAL option requests a normal Q-Q plot for each
variable. The MU= and SIGMA= normal-options request a distribution reference line
with intercept 10 and slope 0.3 for each plot, corresponding to a normal distribution
with mean μ = 10 and standard deviation σ = 0.3. The SQUARE option displays
the plot in a square frame, and the CTEXT= option specifies the text color.
proc univariate data=measures;
qqplot length1 length2 / normal(mu=10 sigma=0.3)
square ctext=blue;
run;
Table 3.45 through Table 3.54 list the QQPLOT options by function. For complete
descriptions, see the section Dictionary of Options on page 258.
Options can be any of the following:
primary options
secondary options
general options
Distribution Options
Table 3.45 lists primary options for requesting a theoretical distribution.
Table 3.45. Primary Options for Theoretical Distributions
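Option                             Description
BETA(beta-options)                 Specifies beta Q-Q plot for shape parameters α and β specified with mandatory ALPHA= and BETA= beta-options
EXPONENTIAL(exponential-options)   Specifies exponential Q-Q plot
GAMMA(gamma-options)               Specifies gamma Q-Q plot for shape parameter α specified with mandatory ALPHA= gamma-option
LOGNORMAL(lognormal-options)       Specifies lognormal Q-Q plot for shape parameter σ specified with mandatory SIGMA= lognormal-option
NORMAL(normal-options)             Specifies normal Q-Q plot
WEIBULL(Weibull-options)           Specifies three-parameter Weibull Q-Q plot for shape parameter c specified with mandatory C= Weibull-option
WEIBULL2(Weibull2-options)         Specifies two-parameter Weibull Q-Q plot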
Table 3.46 through Table 3.53 list secondary options that specify distribution parameters and control the display of a distribution reference line. Specify these options in
parentheses after the distribution keyword. For example, you can request a normal
Q-Q plot with a distribution reference line by specifying the NORMAL option as
follows:
proc univariate;
qqplot Length / normal(mu=10 sigma=0.3 color=red);
run;
The MU= and SIGMA= normal-options display a distribution reference line that
corresponds to the normal distribution with mean μ₀ = 10 and standard deviation
σ₀ = 0.3, and the COLOR= normal-option specifies the color for the line.
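Table 3.46. Secondary Reference Line Options Used with All Distributions
COLOR=color              Specifies color of distribution reference line
L=linetype               Specifies line type of distribution reference line
W=n                      Specifies width of distribution reference line
Secondary options by distribution (Table 3.47 through Table 3.53)
Beta-options
ALPHA=value-list | EST   Specifies mandatory shape parameter α
BETA=value-list | EST    Specifies mandatory shape parameter β
SIGMA=value | EST        Specifies σ₀ for distribution reference line
THETA=value | EST        Specifies θ₀ for distribution reference line
Exponential-options
SIGMA=value | EST        Specifies σ₀ for distribution reference line
THETA=value | EST        Specifies θ₀ for distribution reference line
Gamma-options
ALPHA=value-list | EST   Specifies mandatory shape parameter α
SIGMA=value | EST        Specifies σ₀ for distribution reference line
THETA=value | EST        Specifies θ₀ for distribution reference line
Lognormal-options
SIGMA=value-list | EST   Specifies mandatory shape parameter σ
SLOPE=value | EST        Specifies slope of distribution reference line
THETA=value | EST        Specifies θ₀ for distribution reference line
ZETA=value               Specifies ζ₀ for distribution reference line
Normal-options
MU=value | EST           Specifies μ₀ for distribution reference line
SIGMA=value | EST        Specifies σ₀ for distribution reference line
Weibull-options
C=value-list | EST       Specifies mandatory shape parameter c
SIGMA=value | EST        Specifies σ₀ for distribution reference line
THETA=value | EST        Specifies θ₀ for distribution reference line
Weibull2-options
C=value | EST            Specifies c₀ for distribution reference line
SIGMA=value | EST        Specifies σ₀ for distribution reference line
SLOPE=value | EST        Specifies slope of distribution reference line
THETA=value              Specifies known lower threshold θ₀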
General Options
Table 3.54 summarizes general options for enhancing Q-Q plots.
Table 3.54. General Graphics Options
Option          Description
ANNOKEY         Applies annotation requested in ANNOTATE= data set to key cell only
ANNOTATE=       Specifies annotate data set
CAXIS=          Specifies color for axis
CFRAME=         Specifies color for frame
CFRAMESIDE=     Specifies color for filling frame for row labels
CFRAMETOP=      Specifies color for filling frame for column labels
CGRID=          Specifies color for grid lines
CHREF=          Specifies color for HREF= lines
CTEXT=          Specifies color for text
CVREF=          Specifies color for VREF= lines
DESCRIPTION=    Specifies description for plot in graphics catalog
FONT=           Specifies software font for text
GRID            Creates a grid
HEIGHT=         Specifies height of text used outside framed areas
HMINOR=         Specifies number of horizontal minor tick marks
HREF=           Specifies reference lines perpendicular to the horizontal axis
HREFLABELS=     Specifies labels for HREF= lines
HREFLABPOS=     Specifies vertical position of labels for HREF= lines
INFONT=         Specifies software font for text inside framed areas
INHEIGHT=       Specifies height of text inside framed areas
INTERTILE=      Specifies distance between tiles
LGRID=          Specifies a line type for grid lines
LHREF=          Specifies line style for HREF= lines
LVREF=          Specifies line style for VREF= lines
NADJ=           Adjusts sample size when computing percentiles
NAME=           Specifies name for plot in graphics catalog
NCOLS=          Specifies number of columns in comparative Q-Q plot
NOFRAME         Suppresses frame around plotting area
NOHLABEL        Suppresses label for horizontal axis
NOVLABEL        Suppresses label for vertical axis
NOVTICK         Suppresses tick marks and tick mark labels for vertical axis
NROWS=          Specifies number of rows in comparative Q-Q plot
PCTLAXIS        Displays a nonlinear percentile axis
PCTLMINOR       Requests minor tick marks for percentile axis
PCTLSCALE       Replaces theoretical quantiles with percentiles
RANKADJ=        Adjusts ranks when computing percentiles
SQUARE          Displays plot in square format
VAXISLABEL=     Specifies label for vertical axis
VMINOR=         Specifies number of vertical minor tick marks
VREF=           Specifies reference lines perpendicular to the vertical axis
VREFLABELS=     Specifies labels for VREF= lines
VREFLABPOS=     Specifies horizontal position of labels for VREF= lines
WAXIS=          Specifies line thickness for axes and frame
Dictionary of Options
The following entries provide detailed descriptions of options in the QQPLOT statement.
ALPHA=value | EST
specifies the mandatory shape parameter α for quantile plots requested with the BETA
and GAMMA options. Enclose the ALPHA= option in parentheses after the BETA
or GAMMA options. If you specify ALPHA=EST, a maximum likelihood estimate
is computed for α.
ANNOKEY
applies the annotation requested with the ANNOTATE= option to the key cell only.
By default, the procedure applies annotation to all of the cells. This option is not
available unless you use the CLASS statement. Specify the KEYLEVEL= option in
the CLASS statement to specify the key cell.
ANNOTATE=SAS-data-set
ANNO=SAS-data-set
specifies an annotate data set for the plot.
BETA(ALPHA=value | EST BETA=value | EST <beta-options>)
creates a beta quantile plot for each combination of the required shape parameters
α and β specified by the required ALPHA= and BETA= beta-options. If you specify
ALPHA=EST and BETA=EST, the procedure creates a plot based on maximum likelihood estimates for α and β. You can specify the SCALE= beta-option as an alias
for the SIGMA= beta-option and the THRESHOLD= beta-option as an alias for the
THETA= beta-option.
To obtain graphical estimates of α and β, specify lists of values in the ALPHA= and
BETA= beta-options, and select the combination of α and β that most nearly linearizes the point pattern. To assess the point pattern, you can add a diagonal distribution reference line corresponding to lower threshold parameter θ₀ and scale parameter
σ₀ with the THETA= and SIGMA= beta-options. Alternatively, you can add a line
that corresponds to estimated values of θ₀ and σ₀ with the beta-options THETA=EST
and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the beta distribution with parameters α, β, θ₀, and σ₀ is a good fit.
BETA=value | EST
B=value | EST
specifies the mandatory shape parameter $\beta$ for quantile plots requested with the BETA
option. Enclose the BETA= option in parentheses after the BETA option. If you
specify BETA=EST, a maximum likelihood estimate is computed for $\beta$.
C=value | EST
specifies the shape parameter c for quantile plots requested with the WEIBULL and
WEIBULL2 options. Enclose this option in parentheses after the WEIBULL or
WEIBULL2 option. C= is a required Weibull-option in the WEIBULL option; in
this situation, it accepts a list of values, or if you specify C=EST, a maximum likelihood estimate is computed for c. You can optionally specify C=value or C=EST
as a Weibull2-option with the WEIBULL2 option to request a distribution reference line; in this situation, you must also specify Weibull2-option SIGMA=value or
SIGMA=EST.
CAXIS=color
CAXES=color
specifies the color for the axes. This option overrides any COLOR= specifications in
an AXIS statement. The default value is the first color in the device color list.
CFRAME=color
specifies the color for the area that is enclosed by the axes and frame. The area is not
filled by default.
CFRAMESIDE=color
specifies the color to fill the frame area for the row labels that display along the left
side of a comparative quantile plot. This color also fills the frame area for the label
of the corresponding class variable (if you associate a label with the variable). By
default, these areas are not filled. This option is not available unless you use the
CLASS statement.
CFRAMETOP=color
specifies the color to fill the frame area for the column labels that display across the
top of a comparative quantile plot. This color also fills the frame area for the label
of the corresponding class variable (if you associate a label with the variable). By
default, these areas are not filled. This option does not apply unless you use the
CLASS statement.
CGRID=color
specifies the color for grid lines when a grid displays on the plot. The default color is
the first color in the device color list. This option also produces a grid.
CHREF=color
CH=color
specifies the color for horizontal axis reference lines requested by the HREF= option.
The default color is the first color in the device color list.
COLOR=color
specifies the color of the diagonal distribution reference line. The default color is the
first color in the device color list. Enclose the COLOR= option in parentheses after a
distribution option keyword.
CTEXT=color
specifies the color for tick mark values and axis labels. The default color is the color
that you specify for the CTEXT= option in the GOPTIONS statement. If you omit
the GOPTIONS statement, the default is the first color in the device color list.
CVREF=color
CV=color
specifies the color for the reference lines requested by the VREF= option. The default
color is the first color in the device color list.
DESCRIPTION=string
DES=string
EXPONENTIAL<(exponential-options)>
creates an exponential quantile plot. To assess the point pattern, add a diagonal distribution reference line corresponding to $\theta_0$ and $\sigma_0$ with the THETA= and SIGMA=
exponential-options. Alternatively, you can add a line corresponding to estimated
values of the threshold parameter $\theta_0$ and the scale parameter $\sigma$ with the exponential-options THETA=EST and SIGMA=EST. Agreement between the reference line and
the point pattern indicates that the exponential distribution with parameters $\theta_0$ and
$\sigma_0$ is a good fit. You can specify the SCALE= exponential-option as an alias for the
SIGMA= exponential-option and the THRESHOLD= exponential-option as an alias
for the THETA= exponential-option.
FONT=font
specifies a software font for the reference lines and axis labels. You can also specify
fonts for axis labels in an AXIS statement. The FONT= font takes precedence over
the FTEXT= font specified in the GOPTIONS statement. Hardware characters are
used by default.
GAMMA(ALPHA=value | EST <gamma-options>)
creates a gamma quantile plot for each value of the shape parameter $\alpha$ given by the
mandatory ALPHA= gamma-option. If you specify ALPHA=EST, the procedure
creates a plot based on a maximum likelihood estimate for $\alpha$. To obtain a graphical
estimate of $\alpha$, specify a list of values for the ALPHA= gamma-option, and select the
value that most nearly linearizes the point pattern. To assess the point pattern, add
a diagonal distribution reference line corresponding to $\theta_0$ and $\sigma_0$ with the THETA=
and SIGMA= gamma-options. Alternatively, you can add a line corresponding to
estimated values of the threshold parameter $\theta_0$ and the scale parameter $\sigma$ with the
gamma-options THETA=EST and SIGMA=EST. Agreement between the reference
line and the point pattern indicates that the gamma distribution with parameters $\alpha$, $\theta_0$,
and $\sigma_0$ is a good fit. You can specify the SCALE= gamma-option as an alias for the
SIGMA= gamma-option and the THRESHOLD= gamma-option as an alias for the
THETA= gamma-option.
GRID
displays a grid of horizontal lines positioned at major tick marks on the vertical axis.
HEIGHT=value
specifies the height, in percentage screen units, of text for axis labels, tick mark
labels, and legends. This option takes precedence over the HTEXT= option in the
GOPTIONS statement.
HMINOR=n
HM=n
specifies the number of minor tick marks between each major tick mark on the horizontal axis. Minor tick marks are not labeled. By default, HMINOR=0.
HREF=values
draws reference lines that are perpendicular to the horizontal axis at specified values.
When you use the PCTLAXIS option, HREF= values must be in quantile units.
HREFLABELS=label1 . . . labeln
HREFLABEL=label1 . . . labeln
HREFLAB=label1 . . . labeln
specifies labels for the reference lines requested by the HREF= option. The number of
labels must equal the number of reference lines. Labels can have up to 16 characters.
HREFLABPOS=n
INFONT=font
specifies a software font to use for text inside the framed areas of the plot. The
INFONT= option takes precedence over the FTEXT= option in the GOPTIONS statement. For a list of fonts, see SAS/GRAPH Reference.
INHEIGHT=value
specifies the height, in percentage screen units, of text used inside the framed areas
of the plot. By default, the height specified by the HEIGHT= option is used. If you
do not specify the HEIGHT= option, the height specified with the HTEXT= option
in the GOPTIONS statement is used.
INTERTILE=value
specifies the distance, in horizontal percentage screen units, between the framed areas, which are called tiles. By default, INTERTILE=0.75 percentage screen units.
This option is not available unless you use the CLASS statement. You can specify
INTERTILE=0 to create contiguous tiles.
L=linetype
specifies the line type for a diagonal distribution reference line. Enclose the L= option
in parentheses after a distribution option. By default, L=1, which produces a solid
line.
LGRID=linetype
specifies the line type for the grid requested by the GRID option. By default,
LGRID=1, which produces a solid line. The LGRID= option also produces a grid.
LHREF=linetype
LH=linetype
specifies the line type for the reference lines that you request with the HREF= option.
By default, LHREF=2, which produces a dashed line.
LOGNORMAL(SIGMA=value | EST <lognormal-options>)
creates a lognormal quantile plot for each value of the shape parameter $\sigma$ given by the
mandatory SIGMA= lognormal-option. If you specify SIGMA=EST, the procedure
creates a plot based on a maximum likelihood estimate for $\sigma$. To obtain a graphical
estimate of $\sigma$, specify a list of values for the SIGMA= lognormal-option, and select
the value that most nearly linearizes the point pattern. To assess the point pattern, add
a diagonal distribution reference line corresponding to $\theta_0$ and $\zeta_0$ with the THETA=
and ZETA= lognormal-options. Alternatively, you can add a line corresponding to
estimated values of the threshold parameter $\theta_0$ and the scale parameter $\zeta_0$ with the
lognormal-options THETA=EST and ZETA=EST. Agreement between the reference
line and the point pattern indicates that the lognormal distribution with parameters
$\sigma$, $\theta_0$, and $\zeta_0$ is a good fit. You can specify the THRESHOLD= lognormal-option as
an alias for the THETA= lognormal-option and the SCALE= lognormal-option as an
alias for the ZETA= lognormal-option. See Example 3.31 through Example 3.33.
LVREF=linetype
specifies the line type for the reference lines requested with the VREF= option. By
default, LVREF=2, which produces a dashed line.
MU=value | EST
specifies the mean $\mu_0$ for a normal quantile plot requested with the NORMAL option. Enclose the MU= normal-option in parentheses after the NORMAL option. The
MU= normal-option must be specified with the SIGMA= normal-option, and they request a distribution reference line. You can specify MU=EST to request a distribution
reference line with $\mu_0$ equal to the sample mean.
NADJ=value
specifies the adjustment value added to the sample size in the calculation of theoretical percentiles. By default, NADJ=1/4. Refer to Chambers et al. (1983) for additional
information.
NAME=string
specifies a name for the plot, up to eight characters long, that appears in the PROC
GREPLAY master menu. The default value is UNIVAR.
NCOLS=n
NCOL=n
NOFRAME
suppresses the frame around the subplot area. If you specify the PCTLAXIS option,
then you cannot specify the NOFRAME option.
NOHLABEL
suppresses the label for the horizontal axis. You can use this option to reduce clutter.
NORMAL<(normal-options)>
creates a normal quantile plot. This is the default if you omit a distribution option.
To assess the point pattern, you can add a diagonal distribution reference line corresponding to $\mu_0$ and $\sigma_0$ with the MU= and SIGMA= normal-options. Alternatively,
you can add a line corresponding to estimated values of $\mu_0$ and $\sigma_0$ with the normal-options MU=EST and SIGMA=EST; the estimates of the mean $\mu_0$ and the standard
deviation $\sigma_0$ are the sample mean and sample standard deviation. Agreement between the reference line and the point pattern indicates that the normal distribution
with parameters $\mu_0$ and $\sigma_0$ is a good fit. See Example 3.28 and Example 3.30.
NOVLABEL
suppresses the label for the vertical axis. You can use this option to reduce clutter.
NOVTICK
suppresses the tick marks and tick mark labels for the vertical axis. This option also
suppresses the label for the vertical axis.
NROWS=n
NROW=n
PCTLAXIS<(axis-options)>
adds a nonlinear percentile axis along the frame of the Q-Q plot opposite the theoretical quantile axis. The added axis is identical to the axis for probability plots produced
with the PROBPLOT statement. When using the PCTLAXIS option, you must specify HREF= values in quantile units, and you cannot use the NOFRAME option. You
can specify the following axis-options:
Table 3.55. Axis Options
GRID
GRIDCHAR=character
LABEL=string
LGRID=linetype
PCTLMINOR
requests minor tick marks for the percentile axis when you specify PCTLAXIS. The
HMINOR option overrides the PCTLMINOR option.
PCTLSCALE
requests scale labels for the theoretical quantile axis in percentile units, resulting in
a nonlinear axis scale. Tick marks are drawn uniformly across the axis based on the
quantile scale. In all other respects, the plot remains the same, and you must specify
HREF= values in quantile units. For a true nonlinear axis, use the PCTLAXIS option
or use the PROBPLOT statement.
RANKADJ=value
specifies the adjustment value added to the ranks in the calculation of theoretical
percentiles. By default, RANKADJ=-3/8, as recommended by Blom (1958). Refer to
Chambers et al. (1983) for additional information.
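The role of the RANKADJ= and NADJ= adjustments can be illustrated outside SAS with a short Python sketch. It assumes that the theoretical percentile for the ith ordered value is (i + RANKADJ)/(n + NADJ), with the defaults -3/8 and 1/4, and that the plot is a normal Q-Q plot; the function name and the use of Python's statistics.NormalDist are illustrative choices, not part of PROC UNIVARIATE.

```python
from statistics import NormalDist


def normal_qq_points(values, rankadj=-3 / 8, nadj=1 / 4):
    """Return (theoretical quantile, ordered value) pairs for a normal Q-Q plot.

    Assumes the plotting position (i + rankadj) / (n + nadj) for the
    ith ordered value, i = 1, ..., n.
    """
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()
    points = []
    for i, x in enumerate(ordered, start=1):
        prob = (i + rankadj) / (n + nadj)  # adjusted plotting position
        points.append((standard_normal.inv_cdf(prob), x))
    return points
```

With three observations the plotting positions are symmetric around 0.5, so the middle point has a theoretical quantile of zero.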
SCALE=value | EST
is an alias for the SIGMA= option for plots requested by the BETA, EXPONENTIAL,
GAMMA, WEIBULL, and WEIBULL2 options and for the ZETA= option with the
LOGNORMAL option. See the entries for the SIGMA= and ZETA= options.
SHAPE=value | EST
is an alias for the ALPHA= option with the GAMMA option, for the SIGMA= option with the LOGNORMAL option, and for the C= option with the WEIBULL and
WEIBULL2 options. See the entries for the ALPHA=, SIGMA=, and C= options.
SIGMA=value | EST
specifies the parameter $\sigma$, where $\sigma > 0$. Alternatively, you can specify SIGMA=EST
to request a maximum likelihood estimate for $\sigma_0$. The interpretation and use of the
SIGMA= option depend on the distribution option with which it is used, as summarized in Table 3.56. Enclose this option in parentheses after the distribution option.
Table 3.56. Uses of the SIGMA= Option
Distribution Option
BETA
EXPONENTIAL
GAMMA
WEIBULL
LOGNORMAL
NORMAL
WEIBULL2
SIGMA=$\sigma_1 \ldots \sigma_n$ requests n quantile plots with shape parameters $\sigma_1 \ldots \sigma_n$. The SIGMA= option must be specified.
SLOPE=value | EST
specifies the slope for a distribution reference line requested with the LOGNORMAL
and WEIBULL2 options. Enclose the SLOPE= option in parentheses after the distribution option. When you use the SLOPE= lognormal-option with the LOGNORMAL
option, you must also specify a threshold parameter value $\theta_0$ with the THETA=
lognormal-option to request the line. The SLOPE= lognormal-option is an alternative
to the ZETA= lognormal-option for specifying $\zeta_0$, since the slope is equal to $\exp(\zeta_0)$.
When you use the SLOPE= Weibull2-option with the WEIBULL2 option, you must
also specify a scale parameter value $\sigma_0$ with the SIGMA= Weibull2-option to request
the line. The SLOPE= Weibull2-option is an alternative to the C= Weibull2-option for
specifying $c_0$, since the slope is equal to $1/c_0$.
For example, the first and second QQPLOT statements produce the same quantile
plots and the third and fourth QQPLOT statements produce the same quantile plots:
proc univariate data=Measures;
   qqplot Width / lognormal(sigma=2 theta=0 zeta=0);
   qqplot Width / lognormal(sigma=2 theta=0 slope=1);
   qqplot Width / weibull2(sigma=2 theta=0 c=.25);
   qqplot Width / weibull2(sigma=2 theta=0 slope=4);
run;
SQUARE
displays the quantile plot in a square frame. By default, the frame is rectangular.
THETA=value | EST
specifies the lower threshold parameter $\theta$ for plots requested with the BETA,
EXPONENTIAL, GAMMA, LOGNORMAL, WEIBULL, and WEIBULL2 options.
Enclose the THETA= option in parentheses after a distribution option. When used
with the WEIBULL2 option, the THETA= option specifies the known lower threshold $\theta_0$, for which the default is 0. When used with the other distribution options, the
THETA= option specifies $\theta_0$ for a distribution reference line; alternatively in this situation, you can specify THETA=EST to request a maximum likelihood estimate for
$\theta_0$. To request the line, you must also specify a scale parameter.
THRESHOLD=value | EST
is an alias for the THETA= option. See the entry for the THETA= option.
VAXISLABEL='label'
specifies a label for the vertical axis. Labels can have up to 40 characters.
VMINOR=n
VM=n
specifies the number of minor tick marks between each major tick mark on the vertical
axis. Minor tick marks are not labeled. The default is zero.
VREF=values
draws reference lines perpendicular to the vertical axis at the values specified. Also
see the CVREF=, LVREF=, and VREFCHAR= options.
VREFLABELS=label1. . . labeln
VREFLABEL=label1. . . labeln
VREFLAB=label1. . . labeln
specifies labels for the reference lines requested by the VREF= option. The number
of labels must equal the number of reference lines. Enclose each label in quotes.
Labels can have up to 16 characters.
VREFLABPOS=n
W=n
specifies the width, in pixels, for a diagonal distribution line. Enclose the W= option
in parentheses after the distribution option. By default, W=1.
WAXIS=n
specifies the line thickness, in pixels, for the axes and frame. By default, WAXIS=1.
WEIBULL(C=value | EST <Weibull-options>)
creates a three-parameter Weibull quantile plot for each value of the required shape
parameter c specified by the mandatory C= Weibull-option. To create a plot that is
based on a maximum likelihood estimate for c, specify C=EST. To obtain a graphical
estimate of c, specify a list of values in the C= Weibull-option, and select the value
that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to $\theta_0$ and $\sigma_0$ with the THETA= and
SIGMA= Weibull-options. Alternatively, you can add a line corresponding to estimated values of $\theta_0$ and $\sigma_0$ with the Weibull-options THETA=EST and SIGMA=EST.
Agreement between the reference line and the point pattern indicates that the Weibull
distribution with parameters c, $\theta_0$, and $\sigma_0$ is a good fit. You can specify the SCALE=
Weibull-option as an alias for the SIGMA= Weibull-option and the THRESHOLD=
Weibull-option as an alias for the THETA= Weibull-option. See Example 3.34.
WEIBULL2<(Weibull2-options)>
W2<(Weibull2-options)>
creates a two-parameter Weibull quantile plot. You should use the WEIBULL2 option
when your data have a known lower threshold $\theta_0$, which is 0 by default. To specify
the threshold value $\theta_0$, use the THETA= Weibull2-option. An
advantage of the two-parameter Weibull plot over the three-parameter Weibull plot
is that the parameters c and $\sigma$ can be estimated from the slope and intercept of the
point pattern. A disadvantage is that the two-parameter Weibull distribution applies
only in situations where the threshold parameter is known. To obtain a graphical estimate of $\theta_0$, specify a list of values for the THETA= Weibull2-option, and select the
value that most nearly linearizes the point pattern. To assess the point pattern, add
a diagonal distribution reference line corresponding to $\sigma_0$ and $c_0$ with the SIGMA=
and C= Weibull2-options. Alternatively, you can add a distribution reference line corresponding to estimated values of $\sigma_0$ and $c_0$ with the Weibull2-options SIGMA=EST
and C=EST. Agreement between the reference line and the point pattern indicates that
the Weibull distribution with parameters $c_0$, $\theta_0$, and $\sigma_0$ is a good fit. You can specify
the SCALE= Weibull2-option as an alias for the SIGMA= Weibull2-option and the
SHAPE= Weibull2-option as an alias for the C= Weibull2-option. See Example 3.34.
ZETA=value | EST
specifies a value for the scale parameter $\zeta$ for the lognormal quantile plots requested
with the LOGNORMAL option. Enclose the ZETA= lognormal-option in parentheses after the LOGNORMAL option. To request a distribution reference line with
intercept $\theta_0$ and slope $\exp(\zeta_0)$, specify THETA=$\theta_0$ and ZETA=$\zeta_0$.
VAR Statement
VAR variables ;
The VAR statement specifies the analysis variables and their order in the results. By
default, if you omit the VAR statement, PROC UNIVARIATE analyzes all numeric
variables that are not listed in the other statements.
Using the Output Statement with the VAR Statement
You must provide a VAR statement when you use an OUTPUT statement. To store
the same statistic for several analysis variables in the OUT= data set, you specify a
list of names in the OUTPUT statement. PROC UNIVARIATE makes a one-to-one
correspondence between the order of the analysis variables in the VAR statement and
the list of names that follow a statistic keyword.
WEIGHT Statement
WEIGHT variable ;
The WEIGHT statement specifies numeric weights for analysis variables in the statistical calculations. The UNIVARIATE procedure uses the values $w_i$ of the WEIGHT
variable to modify the computation of a number of summary statistics by assuming
that the variance of the ith value $x_i$ of the analysis variable is equal to $\sigma^2/w_i$, where
$\sigma$ is an unknown parameter. The values of the WEIGHT variable do not have to be integers and are typically positive. By default, observations with nonpositive or missing
values of the WEIGHT variable are handled as follows:
If the value is zero, the observation is counted in the total number of observations.
If the value is negative, it is converted to zero, and the observation is counted
in the total number of observations.
If the value is missing, the observation is excluded from the analysis.
To exclude observations that contain negative and zero weights from the analysis, use
EXCLNPWGT. Note that most SAS/STAT procedures, such as PROC GLM, exclude
negative and zero weights by default. The weight variable does not change how
the procedure determines the range, mode, extreme values, extreme observations, or
number of missing values. When you specify a WEIGHT statement, the procedure
also computes a weighted standard error and a weighted version of Student's t test.
The Student's t test is the only test of location that PROC UNIVARIATE computes
when you weight the analysis variables.
When you specify a WEIGHT variable, the procedure uses its values, $w_i$, to compute
weighted versions of the statistics provided in the Moments table. For example, the
procedure computes a weighted mean $\bar{x}_w$ and a weighted variance $s_w^2$ as
$$\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}$$
and
$$s_w^2 = \frac{1}{d} \sum_i w_i (x_i - \bar{x}_w)^2$$
In Release 6.12 and earlier releases, observations were used in the analysis if and only if the
WEIGHT variable value was greater than zero.
In Release 6.12 and earlier releases, weighted skewness and kurtosis were not computed.
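The default handling of nonpositive and missing weights, together with the weighted mean and variance, can be sketched in Python as follows. This is an illustration only: it assumes the default divisor d = n - 1 (VARDEF=DF), represents missing values as None, and uses a hypothetical exclnpwgt flag to mimic the EXCLNPWGT option.

```python
def weighted_mean_variance(xs, ws, exclnpwgt=False):
    """Return (n, weighted mean, weighted variance) with divisor d = n - 1.

    Missing values (None) exclude the observation. Negative weights are
    converted to zero and, like zero weights, still count toward n unless
    exclnpwgt is True.
    """
    pairs = []
    for x, w in zip(xs, ws):
        if x is None or w is None:
            continue  # missing analysis value or weight: excluded
        if w <= 0:
            if exclnpwgt:
                continue  # EXCLNPWGT-style exclusion
            w = 0.0  # negative weights are converted to zero
        pairs.append((x, w))
    n = len(pairs)
    sum_w = sum(w for _, w in pairs)
    mean = sum(w * x for x, w in pairs) / sum_w
    variance = sum(w * (x - mean) ** 2 for x, w in pairs) / (n - 1)
    return n, mean, variance
```

An observation with a zero or negative weight contributes nothing to the sums but still increases n, so it changes the variance through the divisor.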
Details
Missing Values
PROC UNIVARIATE excludes missing values for an analysis variable before calculating statistics. Each analysis variable is treated individually; a missing value for an
observation in one variable does not affect the calculations for other variables. The
statements handle missing values as follows:
If a BY or an ID variable value is missing, PROC UNIVARIATE treats it like
any other BY or ID variable value. The missing values form a separate BY
group.
If the FREQ variable value is missing or nonpositive, PROC UNIVARIATE
excludes the observation from the analysis.
If the WEIGHT variable value is missing, PROC UNIVARIATE excludes the
observation from the analysis.
PROC UNIVARIATE tabulates the number of missing values and reports this information in the ODS table named Missing Values; see the section ODS Table Names
on page 309. Before the number of missing values is tabulated, PROC UNIVARIATE
excludes observations when
you use the FREQ statement and the frequencies are nonpositive.
you use the WEIGHT statement and the weights are missing or nonpositive
(you must specify the EXCLNPWGT option).
In Release 6.12 and earlier releases, the weights did not affect the computation of percentiles and
the procedure did not exclude the observations with missing weights from the count of observations.
Rounding
When you specify ROUND=u, PROC UNIVARIATE rounds a variable by using the
rounding unit to divide the number line into intervals with midpoints of the form
ui, where u is the nonnegative rounding unit and i is an integer. The interval width
is u. Any variable value that falls in an interval is rounded to the midpoint of that
interval. A variable value that is midway between two midpoints, and is therefore on
the boundary of two intervals, rounds to the even midpoint. Even midpoints occur
when i is an even integer (0, 2, 4, . . .).
When ROUND=1 and the analysis variable values are between -2.5 and 2.5, the
intervals are as follows:
Table 3.57. Intervals for Rounding When ROUND=1
i      Interval         Midpoint
-2     [-2.5, -1.5]     -2
-1     [-1.5, -0.5]     -1
 0     [-0.5, 0.5]       0
 1     [0.5, 1.5]        1
 2     [1.5, 2.5]        2
When ROUND=.5 and the analysis variable values are between -1.25 and 1.25, the
intervals are as follows:
Table 3.58. Intervals for Rounding When ROUND=0.5
i      Interval           Midpoint
-2     [-1.25, -0.75]     -1.0
-1     [-0.75, -0.25]     -0.5
 0     [-0.25, 0.25]       0.0
 1     [0.25, 0.75]        0.5
 2     [0.75, 1.25]        1.0
As the rounding unit increases, the interval width also increases. This reduces
the number of unique values and decreases the amount of memory that PROC
UNIVARIATE needs.
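The rounding rule can be reproduced with a minimal Python sketch. It assumes a positive rounding unit u and uses exact rational arithmetic so that boundary values such as 0.25 with u=0.5 are detected reliably; the function name is illustrative.

```python
import math
from fractions import Fraction


def round_to_unit(x, u):
    """Round x to the nearest midpoint u*i; boundary values go to the even i."""
    unit = Fraction(str(u))
    q = Fraction(str(x)) / unit  # position of x measured in rounding units
    i = math.floor(q)
    frac = q - i
    # Move up when x is past the boundary, or on the boundary with i odd,
    # so that a boundary value always lands on the even midpoint.
    if frac > Fraction(1, 2) or (frac == Fraction(1, 2) and i % 2 == 1):
        i += 1
    return float(i * unit)
```

For example, with u=1 the boundary values 0.5 and 2.5 round to 0 and 2, while 1.5 rounds to 2.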
Descriptive Statistics
This section provides computational details for the descriptive statistics that are computed with the PROC UNIVARIATE statement. These statistics can also be saved in
the OUT= data set by specifying the keywords listed in Table 3.30 on page 237 in the
OUTPUT statement.
Standard algorithms (Fisher 1973) are used to compute the moment statistics. The
computational methods used by the UNIVARIATE procedure are consistent with
those used by other SAS procedures for calculating descriptive statistics.
The following sections give specific details on a number of statistics calculated by
the UNIVARIATE procedure.
Mean
The sample mean is calculated as
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
where n is the number of nonmissing values for a variable, $x_i$ is the ith value of the
variable, and $w_i$ is the weight associated with the ith value of the variable. If there is
no WEIGHT variable, the formula reduces to
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Sum
The sum is calculated as $\sum_{i=1}^{n} w_i x_i$, where n is the number of nonmissing values
for a variable, $x_i$ is the ith value of the variable, and $w_i$ is the weight associated with
the ith value of the variable. If there is no WEIGHT variable, the formula reduces to
$\sum_{i=1}^{n} x_i$.
Variance
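The variance is calculated as
$$\frac{1}{d} \sum_{i=1}^{n} w_i (x_i - \bar{x}_w)^2$$
where n is the number of nonmissing values for a variable, $x_i$ is the ith value of the
variable, $\bar{x}_w$ is the weighted mean, $w_i$ is the weight associated with the ith value of
the variable, and d is the divisor controlled by the VARDEF= option in the PROC
UNIVARIATE statement:
$$d = \begin{cases} n - 1 & \text{if VARDEF=DF (default)} \\ n & \text{if VARDEF=N} \\ \left(\sum_i w_i\right) - 1 & \text{if VARDEF=WDF} \\ \sum_i w_i & \text{if VARDEF=WEIGHT|WGT} \end{cases}$$
If there is no WEIGHT variable, the formula reduces to
$$\frac{1}{d} \sum_{i=1}^{n} (x_i - \bar{x})^2$$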
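The effect of the VARDEF= divisor can be checked with a short Python sketch. The function name and the string codes for the divisor are illustrative stand-ins for the VARDEF= values; weights default to 1 when no weight list is given.

```python
def variance(xs, ws=None, vardef="DF"):
    """Weighted variance with the divisor d chosen as in VARDEF=."""
    n = len(xs)
    ws = [1.0] * n if ws is None else list(ws)
    sum_w = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / sum_w
    divisors = {
        "DF": n - 1,       # degrees of freedom (default)
        "N": n,            # number of observations
        "WDF": sum_w - 1,  # sum of weights minus one
        "WEIGHT": sum_w,   # sum of weights
    }
    d = divisors[vardef]
    return sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / d
```

The standard deviation is the square root of the returned value for the same divisor.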
Standard Deviation
The standard deviation is calculated as
$$s_w = \sqrt{\frac{1}{d} \sum_{i=1}^{n} w_i (x_i - \bar{x}_w)^2}$$
where n is the number of nonmissing values for a variable, $x_i$ is the ith value of the
variable, $\bar{x}_w$ is the weighted mean, $w_i$ is the weight associated with the ith value of
the variable, and d is the divisor controlled by the VARDEF= option in the PROC
UNIVARIATE statement. If there is no WEIGHT variable, the formula reduces to
$$s = \sqrt{\frac{1}{d} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Skewness
The sample skewness, which measures the tendency of the deviations to be larger in
one direction than in the other, is calculated as follows depending on the VARDEF=
option:
Table 3.59. Formulas for Skewness
VARDEF          Formula
DF (default)    $\frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} w_i^{3/2} \left( \frac{x_i - \bar{x}_w}{s_w} \right)^3$
N               $\frac{1}{n} \sum_{i=1}^{n} w_i^{3/2} \left( \frac{x_i - \bar{x}_w}{s_w} \right)^3$
WDF             missing
WEIGHT|WGT      missing
where n is the number of nonmissing values for a variable, $x_i$ is the ith value of the
variable, $\bar{x}_w$ is the sample average, $s_w$ is the sample standard deviation, and $w_i$ is the
weight associated with the ith value of the variable. If VARDEF=DF, then n must be
greater than 2. If there is no WEIGHT variable, then $w_i = 1$ for all $i = 1, \ldots, n$.
The sample skewness can be positive or negative; it measures the asymmetry of the
data distribution and estimates the theoretical skewness $\gamma_1 = \mu_3 \mu_2^{-3/2}$, where $\mu_2$
and $\mu_3$ are the second and third central moments. Observations that are normally
distributed should have a skewness near zero.
Kurtosis
The sample kurtosis, which measures the heaviness of tails, is calculated as follows
depending on the VARDEF= option:
Table 3.60. Formulas for Kurtosis
VARDEF          Formula
DF (default)    $\frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} w_i^2 \left( \frac{x_i - \bar{x}_w}{s_w} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$
N               $\frac{1}{n} \sum_{i=1}^{n} w_i^2 \left( \frac{x_i - \bar{x}_w}{s_w} \right)^4 - 3$
WDF             missing
WEIGHT|WGT      missing
where n is the number of nonmissing values for a variable, $x_i$ is the ith value of the
variable, $\bar{x}_w$ is the sample average, $s_w$ is the sample standard deviation, and $w_i$ is the
weight associated with the ith value of the variable. If VARDEF=DF, then n must be
greater than 3. If there is no WEIGHT variable, then $w_i = 1$ for all $i = 1, \ldots, n$.
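As a numerical check of the VARDEF=DF formulas for skewness and kurtosis, the following Python sketch evaluates both for a small sample. It assumes the weighted forms with $w_i^{3/2}$ and $w_i^2$ and a standard deviation computed with divisor n - 1; the function name is illustrative.

```python
import math


def skewness_kurtosis(xs, ws=None):
    """Return (skewness, kurtosis) using the VARDEF=DF formulas."""
    n = len(xs)
    if n <= 3:
        raise ValueError("the VARDEF=DF kurtosis formula needs n > 3")
    ws = [1.0] * n if ws is None else list(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
    s = math.sqrt(sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / (n - 1))
    z = [(x - mean) / s for x in xs]  # standardized deviations
    skew = (n / ((n - 1) * (n - 2))
            * sum(w ** 1.5 * zi ** 3 for w, zi in zip(ws, z)))
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
            * sum(w ** 2 * zi ** 4 for w, zi in zip(ws, z))
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skew, kurt
```

For the symmetric sample 1, 2, 3, 4, 5 the skewness is zero and the kurtosis is -1.2.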
The sample kurtosis measures the heaviness of the tails of the data distribution. It
$$100 \times \frac{s_w}{\bar{x}_w}$$
Calculating Percentiles
The UNIVARIATE procedure automatically computes the 1st, 5th, 10th, 25th, 50th,
75th, 90th, 95th, and 99th percentiles (quantiles), as well as the minimum and maximum of each analysis variable. To compute percentiles other than these default
percentiles, use the PCTLPTS= and PCTLPRE= options in the OUTPUT statement.
You can specify one of five definitions for computing the percentiles with the
PCTLDEF= option. Let n be the number of nonmissing values for a variable, and
let $x_1, x_2, \ldots, x_n$ represent the ordered values of the variable. Let the tth percentile
be y, set $p = t/100$, and let
$np = j + g$ when PCTLDEF=1, 2, 3, or 5
$(n + 1)p = j + g$ when PCTLDEF=4
where j is the integer part of np, and g is the fractional part of np (or of $(n+1)p$ when PCTLDEF=4). Then the
PCTLDEF= option defines the tth percentile, y, as described in the following table:
Table 3.61. Percentile Definitions
PCTLDEF   Description                                   Formula
1         Weighted average at $x_{np}$                  $y = (1 - g)x_j + g x_{j+1}$, where $x_0$ is taken to be $x_1$
2         Observation numbered closest to np            $y = x_j$ if $g < 1/2$
                                                        $y = x_j$ if $g = 1/2$ and j is even
                                                        $y = x_{j+1}$ if $g = 1/2$ and j is odd
                                                        $y = x_{j+1}$ if $g > 1/2$
3         Empirical distribution function               $y = x_j$ if $g = 0$
                                                        $y = x_{j+1}$ if $g > 0$
4         Weighted average aimed at $x_{(n+1)p}$        $y = (1 - g)x_j + g x_{j+1}$, where $x_{n+1}$ is taken to be $x_n$
5         Empirical distribution function               $y = \frac{1}{2}(x_j + x_{j+1})$ if $g = 0$
          with averaging                                $y = x_{j+1}$ if $g > 0$
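The five percentile definitions can be compared side by side with a Python sketch. It assumes exact rational arithmetic for np so that the case g = 0 is detected reliably, and it clamps the index so that $x_0$ is read as $x_1$ and $x_{n+1}$ as $x_n$; the function name and the pctldef argument are illustrative.

```python
import math
from fractions import Fraction


def percentile(values, t, pctldef=5):
    """Compute the tth percentile under definitions 1 through 5."""
    x = sorted(values)
    n = len(x)
    p = Fraction(str(t)) / 100
    pos = (n + 1) * p if pctldef == 4 else n * p
    j = math.floor(pos)  # integer part
    g = pos - j          # fractional part

    def at(k):
        # 1-based access with clamping at both ends of the ordered values
        return x[min(max(k, 1), n) - 1]

    if pctldef in (1, 4):
        y = (1 - g) * at(j) + g * at(j + 1)
    elif pctldef == 2:
        half = Fraction(1, 2)
        y = at(j) if g < half or (g == half and j % 2 == 0) else at(j + 1)
    elif pctldef == 3:
        y = at(j) if g == 0 else at(j + 1)
    elif pctldef == 5:
        y = Fraction(at(j) + at(j + 1), 2) if g == 0 else at(j + 1)
    else:
        raise ValueError("pctldef must be 1, 2, 3, 4, or 5")
    return float(y)
```

For the values 1 through 8 and t=25, definitions 1, 2, and 3 return 2, definition 4 returns 2.25, and definition 5 returns 2.5. The sketch assumes integer data when averaging for definition 5; for non-integer data the Fraction call would need exact rational inputs.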
Weighted Percentiles
When you use a WEIGHT statement, the percentiles are computed differently. The
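100pth weighted percentile y is computed from the empirical distribution function
with averaging
$$y = \begin{cases} \frac{1}{2}(x_i + x_{i+1}) & \text{if } \sum_{j=1}^{i} w_j = pW \\ x_{i+1} & \text{if } \sum_{j=1}^{i} w_j < pW < \sum_{j=1}^{i+1} w_j \end{cases}$$
where $w_i$ is the weight associated with $x_i$, and where $W = \sum_{i=1}^{n} w_i$ is the sum of
the weights.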
Note that the PCTLDEF= option is not applicable when a WEIGHT statement is
used. However, in this case, if all the weights are identical, the weighted percentiles
are the same as the percentiles that would be computed without a WEIGHT statement
and with PCTLDEF=5.
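A minimal Python sketch of the weighted percentile rule follows. It assumes positive weights, accumulates them in exact rational arithmetic so that the equality case (cumulative weight equal to pW) is detected, and returns the largest value when p = 1; the function name is illustrative.

```python
from fractions import Fraction


def weighted_percentile(xs, ws, t):
    """Return the tth weighted percentile by the EDF-with-averaging rule."""
    pairs = sorted(zip(xs, ws))
    p = Fraction(str(t)) / 100
    total = sum(Fraction(str(w)) for _, w in pairs)  # W, the sum of weights
    target = p * total
    cumulative = Fraction(0)
    for i, (x, w) in enumerate(pairs):
        cumulative += Fraction(str(w))
        if cumulative == target and i + 1 < len(pairs):
            return (x + pairs[i + 1][0]) / 2  # average of x_i and x_(i+1)
        if cumulative > target:
            return x
    return pairs[-1][0]
```

With identical weights the result agrees with the unweighted PCTLDEF=5 percentile; for example, the median of 1, 2, 3, 4 with unit weights is 2.5.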
$$Q(k; n, p) = \sum_{i=0}^{k} \binom{n}{i} p^i (1 - p)^{n-i}$$
where 0 < u n
Note that confidence limits for percentiles are not computed when a WEIGHT statement is specified. See Example 3.10.
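The function Q(k; n, p) is read here as the cumulative binomial probability, that is, the probability of at most k successes in n trials with success probability p. Under that assumption (the upper summation limit k is inferred from the binomial form of the summand), it can be evaluated with a few lines of Python; the function name is illustrative.

```python
from math import comb


def binomial_cdf(k, n, p):
    """Q(k; n, p): probability of at most k successes in n Bernoulli(p) trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))
```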
Student's t Test
PROC UNIVARIATE calculates the t statistic as
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
where $\bar{x}$ is the sample mean, n is the number of nonmissing values for a variable, and
s is the sample standard deviation. The null hypothesis is that the population mean
equals $\mu_0$. When the data values are approximately normally distributed, the probability under the null hypothesis of a t statistic that is as extreme, or more extreme,
than the observed value (the p-value) is obtained from the t distribution with $n - 1$
degrees of freedom. For large n, the t statistic is asymptotically equivalent to a z test.
When you use the WEIGHT statement and the default value of VARDEF=, which is
DF, the t statistic is calculated as
$$t_w = \frac{\bar{x}_w - \mu_0}{s_w / \sqrt{\sum_{i=1}^{n} w_i}}$$
where $\bar{x}_w$ is the weighted mean, $s_w$ is the weighted standard deviation, and $w_i$ is
the weight for ith observation. The $t_w$ statistic is treated as having a Student's t
distribution with $n - 1$ degrees of freedom. If you specify the EXCLNPWGT option
in the PROC statement, n is the number of nonmissing observations when the value
of the WEIGHT variable is positive. By default, n is the number of nonmissing
observations for the WEIGHT variable.
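The unweighted t statistic, the sample mean minus the hypothesized mean divided by s/sqrt(n), can be computed directly with Python's standard library. This sketch stops at the statistic itself; obtaining the p-value requires the t distribution with n - 1 degrees of freedom, which the standard library does not provide. The function name and the mu0 argument are illustrative.

```python
import math
from statistics import mean, stdev


def t_statistic(values, mu0=0.0):
    """Return t = (sample mean - mu0) / (s / sqrt(n)), with s using divisor n - 1."""
    n = len(values)
    return (mean(values) - mu0) / (stdev(values) / math.sqrt(n))
```

For the sample 1, 2, 3, 4, 5 and a hypothesized mean of 2, the statistic equals the square root of 2.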
Sign Test
PROC UNIVARIATE calculates the sign test statistic as
$$M = (n^+ - n^-)/2$$
where $n^+$ is the number of values that are greater than $\mu_0$, and $n^-$ is the number
of values that are less than $\mu_0$. Values equal to $\mu_0$ are discarded. Under the null
hypothesis that the population median is equal to $\mu_0$, the p-value for the observed
statistic $M_{obs}$ is
$$\Pr(|M_{obs}| \ge |M|) = 0.5^{(n_t - 1)} \sum_{j=0}^{\min(n^+, n^-)} \binom{n_t}{j}$$
where $n_t = n^+ + n^-$ is the number of $x_i$ values not equal to $\mu_0$.
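The sign test statistic M, half the difference between the counts of values above and below the hypothesized median, and its two-sided binomial p-value can be sketched in Python as follows. The cap of the p-value at 1 is an assumption of this sketch (the sum can exceed 1 when the two counts are equal); the function name and the mu0 argument are illustrative.

```python
from math import comb


def sign_test(values, mu0=0.0):
    """Return (M, p-value) for the sign test of the median against mu0."""
    n_plus = sum(1 for x in values if x > mu0)
    n_minus = sum(1 for x in values if x < mu0)
    nt = n_plus + n_minus  # values equal to mu0 are discarded
    m = (n_plus - n_minus) / 2
    if nt == 0:
        return m, 1.0
    tail = sum(comb(nt, j) for j in range(min(n_plus, n_minus) + 1))
    p_value = min(1.0, 0.5 ** (nt - 1) * tail)  # capped at 1 (assumption)
    return m, p_value
```

With eight values all above the hypothesized median, M is 4 and the p-value is 1/128.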
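The signed rank statistic S is computed as
$$S = \sum_{i: x_i > \mu_0} r_i^+ - \frac{n_t (n_t + 1)}{4}$$
where $r_i^+$ is the rank of $|x_i - \mu_0|$ after discarding values of $x_i = \mu_0$, and $n_t$ is the
number of $x_i$ values not equal to $\mu_0$. Average ranks are used for tied values.
The t approximation to the significance of S uses the quantity
$$S \sqrt{\frac{n_t - 1}{n_t V - S^2}}$$
with
$$V = \frac{1}{24} n_t (n_t + 1)(2 n_t + 1) - \frac{1}{48} \sum t_i (t_i + 1)(t_i - 1)$$
where the sum is over groups tied in absolute value and where $t_i$ is the number of
values in the ith group (Iman 1974; Conover 1999). The null hypothesis tested is that
the mean (or median) is zero, assuming that the distribution is symmetric. Refer to
Lehmann (1998).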
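The signed rank statistic can be sketched in Python under the assumption that S is the sum of the ranks of $|x_i - \mu_0|$ over the values above $\mu_0$, minus $n_t(n_t+1)/4$, with average ranks for ties. The function name and the mu0 argument are illustrative, and the sketch does not compute a p-value.

```python
def signed_rank_statistic(values, mu0=0.0):
    """Return S, the centered sum of positive signed ranks."""
    diffs = [x - mu0 for x in values if x != mu0]  # discard values equal to mu0
    nt = len(diffs)
    order = sorted(range(nt), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * nt
    pos = 0
    while pos < nt:
        end = pos
        # Extend the run over all differences tied in absolute value.
        while end + 1 < nt and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        average_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    positive_sum = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return positive_sum - nt * (nt + 1) / 4
```

For the values 1, -1, 2 the two tied absolute values share the rank 1.5, so S is 1.5 + 3 - 3 = 1.5.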
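where $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$.
The two-sided $100(1 - \alpha)\%$ confidence interval for the standard deviation has lower
and upper limits
$$s \sqrt{\frac{n-1}{\chi^2_{1-\alpha/2;\,n-1}}} \quad \text{and} \quad s \sqrt{\frac{n-1}{\chi^2_{\alpha/2;\,n-1}}}$$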
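where $\chi^2_{1-\alpha/2;\,n-1}$ and $\chi^2_{\alpha/2;\,n-1}$ are the $(1 - \alpha/2)$ and $\alpha/2$ percentiles of the
chi-square distribution with $n - 1$ degrees of freedom. A one-sided $100(1 - \alpha)\%$ confidence limit has lower and upper limits
$$s \sqrt{\frac{n-1}{\chi^2_{1-\alpha;\,n-1}}} \quad \text{and} \quad s \sqrt{\frac{n-1}{\chi^2_{\alpha;\,n-1}}}$$
respectively. The $100(1 - \alpha)\%$ confidence interval for the variance has upper and
lower limits equal to the squares of the corresponding upper and lower limits for the
standard deviation. When you use the WEIGHT statement and specify VARDEF=DF
in the PROC statement, the $100(1 - \alpha)\%$ confidence interval for the weighted mean
is
$$\bar{x}_w \pm t_{1-\alpha/2} \frac{s_w}{\sqrt{\sum_{i=1}^{n} w_i}}$$
where $\bar{x}_w$ is the weighted mean, $s_w$ is the weighted standard deviation, $w_i$ is the
weight for ith observation, and $t_{1-\alpha/2}$ is the $(1 - \alpha/2)$ percentile for the t distribution
with $n - 1$ degrees of freedom.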
Robust Estimators
A statistical method is robust if it is insensitive to moderate or even large departures
from the assumptions that justify the method. PROC UNIVARIATE provides several
methods for robust estimation of location and scale. See Example 3.11.
Winsorized Means
The Winsorized mean is a robust estimator of the location that is relatively insensitive
to outliers. The k-times Winsorized mean is calculated as
\[ \bar{x}_{wk} = \frac{1}{n} \left( (k+1)\,x_{(k+1)} + \sum_{i=k+2}^{n-k-1} x_{(i)} + (k+1)\,x_{(n-k)} \right) \]
where n is the number of observations, and x(i) is the ith order statistic when the
observations are arranged in increasing order:
\[ x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)} \]
The Winsorized mean is computed as the ordinary mean after the k smallest observations
are replaced by the (k + 1)st smallest observation, and the k largest observations
are replaced by the (k + 1)st largest observation.
For data from a symmetric distribution, the Winsorized mean is an unbiased estimate
of the population mean. However, the Winsorized mean does not have a normal
distribution even if the data are from a normal population.
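A small worked case may help make the formula concrete. The five values below are hypothetical and chosen so that one of them is an obvious outlier. With $n = 5$, $k = 1$, and ordered data $1, 2, 3, 4, 100$:

```latex
\[
\bar{x}_{w1} = \frac{1}{5}\left( 2\,x_{(2)} + \sum_{i=3}^{3} x_{(i)} + 2\,x_{(4)} \right)
             = \frac{1}{5}\left( 2 \cdot 2 + 3 + 2 \cdot 4 \right)
             = \frac{15}{5} = 3
\]
```

The ordinary mean of the same values is $110/5 = 22$, so replacing the single extreme value at each end removes most of the outlier's influence.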
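The Winsorized sum of squared deviations is
\[ s^2_{wk} = (k+1)(x_{(k+1)} - \bar{x}_{wk})^2 + \sum_{i=k+2}^{n-k-1} (x_{(i)} - \bar{x}_{wk})^2 + (k+1)(x_{(n-k)} - \bar{x}_{wk})^2 \]
The Winsorized t statistic is
\[ t_{wk} = \frac{\bar{x}_{wk} - \mu_0}{SE(\bar{x}_{wk})} \]
where $\mu_0$ denotes the location under the null hypothesis, and the standard error of the
Winsorized mean is
\[ SE(\bar{x}_{wk}) = \frac{n-1}{n-2k-1} \times \frac{s_{wk}}{\sqrt{n(n-1)}} \]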
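When the data are from a symmetric distribution, the distribution of $t_{wk}$ is approximated
by a Student's t distribution with $n-2k-1$ degrees of freedom (Tukey and
McLaughlin 1963; Dixon and Tukey 1968).
The Winsorized $100(1-\alpha)\%$ confidence interval for the location parameter has
upper and lower limits
\[ \bar{x}_{wk} \pm t_{1-\frac{\alpha}{2};\,n-2k-1}\, SE(\bar{x}_{wk}) \]
where $t_{1-\frac{\alpha}{2};\,n-2k-1}$ is the $(1-\frac{\alpha}{2})100$th percentile of the Student's t distribution with
$n-2k-1$ degrees of freedom.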
Trimmed Means
Like the Winsorized mean, the trimmed mean is a robust estimator of the location
that is relatively insensitive to outliers. The k-times trimmed mean is calculated as
\[ \bar{x}_{tk} = \frac{1}{n-2k} \sum_{i=k+1}^{n-k} x_{(i)} \]
where n is the number of observations, and x(i) is the ith order statistic when the
observations are arranged in increasing order:
\[ x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)} \]
The trimmed mean is computed after the k smallest and k largest observations are
deleted from the sample. In other words, the observations are trimmed at each end.
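As a quick arithmetic check of the formula, take the hypothetical ordered sample $1, 2, 3, 4, 100$ with $n = 5$ and $k = 1$:

```latex
\[
\bar{x}_{t1} = \frac{1}{5 - 2}\sum_{i=2}^{4} x_{(i)} = \frac{2 + 3 + 4}{3} = 3
\]
```

The smallest value (1) and the largest value (100) are dropped, and the mean is taken over the three observations that remain.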
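The trimmed t statistic is
\[ t_{tk} = \frac{\bar{x}_{tk} - \mu_0}{SE(\bar{x}_{tk})} \]
where $\mu_0$ denotes the location under the null hypothesis, and the standard error of the
trimmed mean is
\[ SE(\bar{x}_{tk}) = \frac{s_{wk}}{\sqrt{(n-2k)(n-2k-1)}} \]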
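When the data are from a symmetric distribution, the distribution of $t_{tk}$ is approximated
by a Student's t distribution with $n-2k-1$ degrees of freedom (Tukey and
McLaughlin 1963; Dixon and Tukey 1968).
The trimmed $100(1-\alpha)\%$ confidence interval for the location parameter has upper
and lower limits
\[ \bar{x}_{tk} \pm t_{1-\frac{\alpha}{2};\,n-2k-1}\, SE(\bar{x}_{tk}) \]
where $t_{1-\frac{\alpha}{2};\,n-2k-1}$ is the $(1-\frac{\alpha}{2})100$th percentile of the Student's t distribution with
$n-2k-1$ degrees of freedom.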
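The following sketch shows one way to request both robust location estimates in a single step with the TRIMMED= and WINSORIZED= options. The data set Readings and the variable Value are hypothetical, and the value 1 (one observation at each end) is chosen only for illustration.

```sas
/* Hypothetical data with one unusually large reading */
data Readings;
   input Value @@;
   datalines;
12 14 15 15 16 17 18 19 21 95
;
run;

/* TRIMMED=1 and WINSORIZED=1 trim or Winsorize one          */
/* observation in each tail                                  */
proc univariate data=Readings trimmed=1 winsorized=1;
   var Value;
run;
```

Comparing the two tables with the ordinary mean in the default output gives a rough sense of how much the extreme reading pulls the mean.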
\[ G = \frac{1}{\binom{n}{2}} \sum_{i<j} |x_i - x_j| \]
\[ h = \left[\frac{n}{2}\right] + 1, \qquad k = \binom{h}{2} \]
In other words, $Q_n$ is 2.219 times the $k$th order statistic of the $\binom{n}{2}$ distances between
the data points. The bias-corrected statistic $c_{qn} Q_n$ is used to estimate $\sigma$, where $c_{qn}$ is
a correction factor; refer to Croux and Rousseeuw (1992).
Stem-and-Leaf Plot
The first plot in the output is either a stem-and-leaf plot (Tukey 1977) or a horizontal
bar chart. If any single interval contains more than 49 observations, the horizontal bar
chart appears. Otherwise, the stem-and-leaf plot appears. The stem-and-leaf plot is
like a horizontal bar chart in that both plots provide a method to visualize the overall
distribution of the data. The stem-and-leaf plot provides more detail because each
point in the plot represents an individual data value.
To change the number of stems that the plot displays, use PLOTSIZE= to increase or
decrease the number of rows. Instructions that appear below the plot explain how to
determine the values of the variable. If no instructions appear, you multiply Stem.Leaf
by 1 to determine the values of the variable. For example, if the stem value is 10 and
the leaf value is 1, then the variable value is approximately 10.1. For the stem-and-leaf
plot, the procedure rounds a variable value to the nearest leaf. If the variable
value is exactly halfway between two leaves, the value rounds to the nearest leaf with
an even integer value. For example, a variable value of 3.15 has a stem value of 3 and
a leaf value of 2.
Box Plot
The box plot, also known as a schematic box plot, appears beside the stem-and-leaf
plot. Both plots use the same vertical scale. The box plot provides a visual summary
of the data and identifies outliers. The bottom and top edges of the box correspond to
the sample 25th (Q1) and 75th (Q3) percentiles. The box length is one interquartile
range (Q3 - Q1). The center horizontal line with asterisk endpoints corresponds to
the sample median. The central plus sign (+) corresponds to the sample mean. If the
mean and median are equal, the plus sign falls on the line inside the box. The vertical
lines that project out from the box, called whiskers, extend as far as the data extend,
up to a distance of 1.5 interquartile ranges. Values farther away are potential outliers.
The procedure identifies the extreme values with a zero or an asterisk (*). If zero
appears, the value is between 1.5 and 3 interquartile ranges from the top or bottom
edge of the box. If an asterisk appears, the value is more extreme.
Note: To produce box plots using high-resolution graphics, use the BOXPLOT procedure
in SAS/STAT software; refer to Chapter 18, "The BOXPLOT Procedure," in
SAS/STAT User's Guide.
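A minimal sketch of requesting these line printer plots follows. The data set Weights and the variable Kg are hypothetical, and the PLOTSIZE= value of 20 rows is an arbitrary choice for illustration.

```sas
/* Hypothetical data: body weights in kilograms */
data Weights;
   input Kg @@;
   datalines;
61.2 58.4 70.1 66.3 59.9 72.5 64.0 68.8 63.1 90.4
57.6 65.5 69.2 62.7 67.0
;
run;

/* PLOTS requests the stem-and-leaf (or bar chart), box, and */
/* normal probability plots; PLOTSIZE= sets the row count    */
proc univariate data=Weights plots plotsize=20;
   var Kg;
run;
```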
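\[ \Phi^{-1}\left( \frac{r_i - \frac{3}{8}}{n + \frac{1}{4}} \right) \]
\[ \Phi^{-1}\left( \frac{\left(1 - \frac{3}{8i}\right) \sum_{j=1}^{i} w_{(j)}}{\left(1 + \frac{1}{4n}\right) \sum_{i=1}^{n} w_i} \right) \]
\[ \Phi^{-1}\left( \frac{i - \frac{3}{8}}{n + \frac{1}{4}} \right) \]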
When the value of VARDEF= is WDF or WEIGHT, a reference line with intercept $\hat{\mu}$
and slope $\hat{\sigma}$ is added to the plot. When the value of VARDEF= is DF or N, the slope
is $\hat{\sigma} / \sqrt{\bar{w}}$, where $\bar{w} = \frac{1}{n} \sum_{i=1}^{n} w_i$ is the average weight.
When each observation has an identical weight and the value of VARDEF= is DF, N,
or WEIGHT, the reference line reduces to the usual reference line with intercept $\hat{\mu}$
and slope $\hat{\sigma}$.
If the data are normally distributed with mean $\mu$, standard deviation $\sigma$, and each
observation has an identical weight w, then the points on the plot should lie approximately
on a straight line. The intercept for this line is $\mu$. The slope is $\sigma$ when VARDEF= is
Margin positions are recommended if you list a large number of statistics in the
INSET statement. If you attempt to display a lengthy inset in the interior of the
plot, it is most likely that the inset will collide with the data display.
Positioning the Inset Using Coordinates
To position the inset with coordinates, use POSITION=(x,y). You specify the coordinates in axis data units or in axis percentage units (the default).
If you specify the DATA option immediately following the coordinates, PROC
UNIVARIATE positions the inset by using axis data units. For example, the following statements place the bottom left corner of the inset at 45 on the horizontal
axis and 10 on the vertical axis:
title 'Test Scores for a College Course';
proc univariate data=Score noprint;
   histogram PreTest / midpoints = 45 to 95 by 10;
   inset n / header   = 'Position=(45,10)'
             position = (45,10) data;
run;
By default, the specified coordinates determine the position of the bottom left corner
of the inset. To change this reference point, use the REFPOINT= option (see the next
example).
If you omit the DATA option, PROC UNIVARIATE positions the inset by using axis
percentage units. The coordinates in axis percentage units must be between 0 and
100. The coordinates of the bottom left corner of the display are (0,0), while the
upper right corner is (100, 100). For example, the following statements create a
histogram and use coordinates in axis percentage units to position the two insets:
title 'Test Scores for a College Course';
proc univariate data=Score noprint;
   histogram PreTest / midpoints = 45 to 95 by 10;
   inset min / position = (5,25)
               header   = 'Position=(5,25)'
               refpoint = tl;
   inset max / position = (95,95)
               header   = 'Position=(95,95)'
               refpoint = tr;
run;
The REFPOINT= option determines which corner of the inset to place at the coordinates
that are specified with the POSITION= option. The first inset uses
REFPOINT=TL, so that the top left corner of the inset is positioned 5% of the way
across the horizontal axis and 25% of the way up the vertical axis. The second inset
uses REFPOINT=TR, so that the top right corner of the inset is positioned 95% of
the way across the horizontal axis and 95% of the way up the vertical axis.
A sample program, univar3.sas, for these examples is available in the SAS Sample
Library for Base SAS software.
Beta Distribution
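The fitted density function is
\[ p(x) = \begin{cases} 100h\% \, \dfrac{(x-\theta)^{\alpha-1} (\sigma+\theta-x)^{\beta-1}}{B(\alpha,\beta)\, \sigma^{\alpha+\beta-1}} & \text{for } \theta < x < \theta+\sigma \\ 0 & \text{for } x \le \theta \text{ or } x \ge \theta+\sigma \end{cases} \]
where $B(\alpha,\beta) = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ and $h$ is the width of the histogram interval.
The beta density is also written in the form
\[ p(x) = \frac{(x-a)^{p-1} (b-x)^{q-1}}{B(p,q)\,(b-a)^{p+q-1}} \]
The beta distributions are also referred to as Pearson Type I or II distributions. These
include the power-function distribution ($\beta = 1$), the arc-sine distribution ($\alpha = \beta = \frac{1}{2}$),
and the generalized arc-sine distributions ($\alpha + \beta = 1$, $\beta \ne \frac{1}{2}$).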
You can use the DATA step function BETAINV to compute beta quantiles and the
DATA step function PROBBETA to compute beta probabilities.
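The two functions can be tried in a short DATA step. The shape values 2 and 3 below are hypothetical, and the calls apply to the standard beta distribution on the interval (0,1), so a fitted distribution with threshold and scale parameters would need its values rescaled first.

```sas
/* A minimal sketch: beta quantile and probability for        */
/* hypothetical shape parameters alpha=2 and beta=3           */
data _null_;
   alpha = 2;
   beta  = 3;
   q = betainv(0.5, alpha, beta);   /* median of the beta distribution */
   p = probbeta(q, alpha, beta);    /* probability at q, back to 0.5   */
   put q= p=;
run;
```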
Exponential Distribution
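The fitted density function is
\[ p(x) = \begin{cases} \dfrac{100h\%}{\sigma} \exp\left( -\left( \dfrac{x-\theta}{\sigma} \right) \right) & \text{for } x \ge \theta \\ 0 & \text{for } x < \theta \end{cases} \]
where
$\theta$ = threshold parameter
$\sigma$ = scale parameter ($\sigma > 0$)
$h$ = width of histogram interval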
Gamma Distribution
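The fitted density function is
\[ p(x) = \begin{cases} \dfrac{100h\%}{\Gamma(\alpha)\,\sigma} \left( \dfrac{x-\theta}{\sigma} \right)^{\alpha-1} \exp\left( -\left( \dfrac{x-\theta}{\sigma} \right) \right) & \text{for } x > \theta \\ 0 & \text{for } x \le \theta \end{cases} \]
where
$\theta$ = threshold parameter
$\sigma$ = scale parameter ($\sigma > 0$)
$\alpha$ = shape parameter ($\alpha > 0$)
$h$ = width of histogram interval
The threshold parameter $\theta$ must be less than the minimum data value. You can specify
$\theta$ with the THRESHOLD= gamma-option. By default, $\theta = 0$. If you specify
THETA=EST, a maximum likelihood estimate is computed for $\theta$. In addition, you
can specify $\sigma$ and $\alpha$ with the SCALE= and ALPHA= gamma-options. By default,
the procedure calculates maximum likelihood estimates for $\sigma$ and $\alpha$.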
The gamma distributions are also referred to as Pearson Type III distributions, and
they include the chi-square, exponential, and Erlang distributions. The probability
density function for the chi-square distribution is
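\[ p(x) = \frac{1}{2\,\Gamma\left(\frac{\nu}{2}\right)} \left( \frac{x}{2} \right)^{\frac{\nu}{2}-1} \exp\left( -\frac{x}{2} \right) \quad \text{for } x > 0 \]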
p(x) =
2 2 1 ( )
2
0
Lognormal Distribution
The fitted density function is
\[ p(x) = \begin{cases} \dfrac{100h\%}{\sigma \sqrt{2\pi}\,(x-\theta)} \exp\left( -\dfrac{(\log(x-\theta) - \zeta)^2}{2\sigma^2} \right) & \text{for } x > \theta \\ 0 & \text{for } x \le \theta \end{cases} \]
where
$\theta$ = threshold parameter
$\zeta$ = scale parameter ($-\infty < \zeta < \infty$)
$\sigma$ = shape parameter ($\sigma > 0$)
$h$ = width of histogram interval
The threshold parameter $\theta$ must be less than the minimum data value. You can specify
$\theta$ with the THRESHOLD= lognormal-option. By default, $\theta = 0$. If you specify
THETA=EST, a maximum likelihood estimate is computed for $\theta$. You can specify
$\zeta$ and $\sigma$ with the SCALE= and SHAPE= lognormal-options, respectively. By default,
the procedure calculates maximum likelihood estimates for these parameters.
Note: The lognormal distribution is also referred to as the $S_L$ distribution in the
Johnson system of distributions.
Note: This book uses $\sigma$ to denote the shape parameter of the lognormal distribution,
whereas $\sigma$ is used to denote the scale parameter of the beta, exponential, gamma,
normal, and Weibull distributions. The use of $\sigma$ to denote the lognormal shape
parameter is based on the fact that $\frac{1}{\sigma}(\log(X - \theta) - \zeta)$ has a standard normal distribution
if X is lognormally distributed. Based on this relationship, you can use the DATA
step function PROBIT to compute lognormal quantiles and the DATA step function
PROBNORM to compute probabilities.
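The relationship in the note can be sketched in a short DATA step. The parameter values below are hypothetical; the quantile expression simply inverts the standard normal transformation described above.

```sas
/* A minimal sketch: lognormal quantile and probability       */
/* for hypothetical parameter values                          */
data _null_;
   theta = 0;      /* threshold */
   zeta  = 1;      /* scale     */
   sigma = 0.5;    /* shape     */
   p = 0.90;
   /* invert (log(X - theta) - zeta) / sigma = probit(p) */
   q = theta + exp(zeta + sigma * probit(p));
   /* probability that X <= q, which recovers p */
   back = probnorm((log(q - theta) - zeta) / sigma);
   put q= back=;
run;
```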
Normal Distribution
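The fitted density function is
\[ p(x) = \frac{100h\%}{\sigma \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2 \right) \quad \text{for } -\infty < x < \infty \]
where
$\mu$ = mean
$\sigma$ = standard deviation ($\sigma > 0$)
$h$ = width of histogram interval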
Weibull Distribution
The fitted density function is
\[ p(x) = \begin{cases} 100h\% \, \dfrac{c}{\sigma} \left( \dfrac{x-\theta}{\sigma} \right)^{c-1} \exp\left( -\left( \dfrac{x-\theta}{\sigma} \right)^{c} \right) & \text{for } x > \theta \\ 0 & \text{for } x \le \theta \end{cases} \]
where
$\theta$ = threshold parameter
$\sigma$ = scale parameter ($\sigma > 0$)
$c$ = shape parameter ($c > 0$)
$h$ = width of histogram interval
The threshold parameter $\theta$ must be less than the minimum data value. You can specify
$\theta$ with the THRESHOLD= Weibull-option. By default, $\theta = 0$. If you specify
THETA=EST, a maximum likelihood estimate is computed for $\theta$. You can specify $\sigma$
and $c$ with the SCALE= and SHAPE= Weibull-options, respectively. By default, the
procedure calculates maximum likelihood estimates for $\sigma$ and $c$.
The exponential distribution is a special case of the Weibull distribution where $c = 1$.
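As a sketch of how fitted curves like these can be requested, the following statements overlay Weibull and gamma densities on one histogram. The data set Lifetimes and the variable Hours are hypothetical. The threshold is fixed at 0 here; THETA=EST could be used instead when the threshold is to be estimated.

```sas
/* Hypothetical data: component lifetimes in hours */
data Lifetimes;
   input Hours @@;
   datalines;
12.1 30.5 8.7 45.2 22.9 17.4 60.3 27.8 35.1 14.6
51.0 19.9 41.7 24.3 9.8 33.6 28.4 16.2 38.9 21.5
;
run;

proc univariate data=Lifetimes noprint;
   /* Scale and shape are estimated by maximum likelihood */
   histogram Hours / weibull(theta=0)
                     gamma(theta=0);
run;
```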
Goodness-of-Fit Tests
When you specify the NORMAL option in the PROC UNIVARIATE statement or you
request a fitted parametric distribution in the HISTOGRAM statement, the procedure
computes goodness-of-fit tests for the null hypothesis that the values of the analysis
variable are a random sample from the specified theoretical distribution. See Example
3.22.
When you specify the NORMAL option, these tests, which are summarized in the
output table labeled "Tests for Normality," include the following:
Shapiro-Wilk test
Kolmogorov-Smirnov test
Anderson-Darling test
Cramér-von Mises test
The Kolmogorov-Smirnov D statistic, the Anderson-Darling statistic, and the
Cramér-von Mises statistic are based on the empirical distribution function (EDF).
However, some EDF tests are not supported when certain combinations of the parameters
of a specified distribution are estimated. See Table 3.62 on page 296 for a
list of the EDF tests available. You determine whether to reject the null hypothesis
by examining the p-value that is associated with a goodness-of-fit statistic. When the
p-value is less than the predetermined critical value ($\alpha$), you reject the null hypothesis
and conclude that the data did not come from the specified distribution.
If you want to test the normality assumptions for analysis of variance methods, beware
of using a statistical test for normality alone. A test's ability to reject the null
hypothesis (known as the power of the test) increases with the sample size. As the
sample size becomes larger, increasingly smaller departures from normality can be
detected. Since small deviations from normality do not severely affect the validity
of analysis of variance tests, it is important to examine other statistics and plots
to make a final assessment of normality. The skewness and kurtosis measures and
the plots that are provided by the PLOTS option, the HISTOGRAM statement, the
PROBPLOT statement, and the QQPLOT statement can be very helpful. For small
sample sizes, power is low for detecting larger departures from normality that may
be important. To increase the test's ability to detect such deviations, you may want to
declare significance at higher levels, such as 0.15 or 0.20, rather than the often-used
0.05 level. Again, consulting plots and additional statistics will help you assess the
severity of the deviations from normality.
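A minimal sketch of combining a formal test with a visual check follows. The data set Residuals and the variable Resid are hypothetical.

```sas
/* Hypothetical data: residuals from some earlier analysis */
data Residuals;
   input Resid @@;
   datalines;
-1.2 0.4 0.9 -0.3 1.7 -0.8 0.1 2.3 -1.5 0.6
0.2 -0.4 1.1 -2.0 0.7 0.0 -0.6 1.4 -0.9 0.3
;
run;

/* NORMAL requests the tests for normality; PLOTS adds the    */
/* line printer plots so the tests are not read in isolation  */
proc univariate data=Residuals normal plots;
   var Resid;
run;
```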
Shapiro-Wilk Statistic
If the sample size is less than or equal to 2000 and you specify the NORMAL option,
PROC UNIVARIATE computes the Shapiro-Wilk statistic, W (also denoted as Wn
to emphasize its dependence on the sample size n). The W statistic is the ratio of
the best estimator of the variance (based on the square of a linear combination of
the order statistics) to the usual corrected sum of squares estimator of the variance
(Shapiro and Wilk 1965). When n is greater than three, the coefficients to compute
the linear combination of the order statistics are approximated by the method of
Royston (1992). The statistic W is always greater than zero and less than or equal to
one ($0 < W \le 1$).
Small values of W lead to the rejection of the null hypothesis of normality. The
distribution of W is highly skewed. Seemingly large values of W (such as 0.90)
may be considered small and lead you to reject the null hypothesis. The method for
computing the p-value (the probability of obtaining a W statistic less than or equal
to the observed value) depends on n. For n = 3, the probability distribution of W is
known and is used to determine the p-value. For $n \ge 4$, a normalizing transformation
is computed:
\[ Z_n = \begin{cases} (-\log(\gamma - \log(1 - W_n)) - \mu)/\sigma & \text{if } 4 \le n \le 11 \\ (\log(1 - W_n) - \mu)/\sigma & \text{if } 12 \le n \le 2000 \end{cases} \]
The computational formulas for the EDF statistics make use of the probability integral
transformation U = F (X). If F (X) is the distribution function of X, the random
variable U is uniformly distributed between 0 and 1.
Given n observations X(1) , . . . , X(n) , the values U(i) = F (X(i) ) are computed by
applying the transformation, as discussed in the next three sections.
PROC UNIVARIATE provides three EDF tests:
Kolmogorov-Smirnov
Anderson-Darling
Cramér-von Mises
The following sections provide formal definitions of these EDF statistics.
Kolmogorov D Statistic
The Kolmogorov-Smirnov statistic is computed as the maximum of $D^{+}$ and $D^{-}$,
where $D^{+}$ is the largest vertical distance between the EDF and the distribution
function when the EDF is greater than the distribution function, and $D^{-}$ is the largest
vertical distance when the EDF is less than the distribution function.
\[ D^{+} = \max_i \left( \frac{i}{n} - U_{(i)} \right) \]
\[ D^{-} = \max_i \left( U_{(i)} - \frac{i-1}{n} \right) \]
\[ D = \max(D^{+}, D^{-}) \]
PROC UNIVARIATE uses a modified Kolmogorov D statistic to test the data against
a normal distribution with mean and variance equal to the sample mean and variance.
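To make the three formulas concrete, the following sketch computes $D^{+}$, $D^{-}$, and $D$ by hand against a normal distribution whose mean and standard deviation are taken from the sample. It is an illustration of the definitions only, not the procedure's internal method; the data set Sample, the variable x, and the macro variable names are hypothetical, and missing values are assumed to be absent.

```sas
/* Hypothetical data */
data Sample;
   input x @@;
   datalines;
4.1 5.3 3.8 6.0 5.1 4.7 5.9 4.4
;
run;

/* Order the observations so that _N_ is the rank i */
proc sort data=Sample;
   by x;
run;

/* Store the sample mean, standard deviation, and n in macro variables */
proc sql noprint;
   select mean(x) format=best32., std(x) format=best32., count(x)
      into :xbar, :s, :n
      from Sample;
quit;

data _null_;
   set Sample end=last;
   retain Dplus Dminus 0;
   U = probnorm((x - &xbar) / &s);            /* U(i) = F(X(i))  */
   Dplus  = max(Dplus,  _n_ / &n - U);        /* i/n - U(i)      */
   Dminus = max(Dminus, U - (_n_ - 1) / &n);  /* U(i) - (i-1)/n  */
   if last then do;
      D = max(Dplus, Dminus);
      put Dplus= Dminus= D=;
   end;
run;
```

Starting both running maxima at 0 is safe here because the term for the largest observation in $D^{+}$ and the term for the smallest observation in $D^{-}$ are never negative.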
Anderson-Darling Statistic
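The Anderson-Darling statistic and the Cramér-von Mises statistic belong to the
quadratic class of EDF statistics. This class of statistics is based on the squared
difference $(F_n(x) - F(x))^2$. Quadratic statistics have the following general form:
\[ Q = n \int_{-\infty}^{+\infty} (F_n(x) - F(x))^2 \, \psi(x) \, dF(x) \]
The function $\psi(x)$ weights the squared difference $(F_n(x) - F(x))^2$.
The Anderson-Darling statistic ($A^2$) is defined as
\[ A^2 = n \int_{-\infty}^{+\infty} (F_n(x) - F(x))^2 \, [F(x)(1 - F(x))]^{-1} \, dF(x) \]
The Cramér-von Mises statistic ($W^2$) is defined as
\[ W^2 = n \int_{-\infty}^{+\infty} (F_n(x) - F(x))^2 \, dF(x) \]
and it is computed as
\[ W^2 = \sum_{i=1}^{n} \left( U_{(i)} - \frac{2i-1}{2n} \right)^2 + \frac{1}{12n} \]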
Once the EDF test statistics are computed, PROC UNIVARIATE computes the associated
probability values (p-values). The UNIVARIATE procedure uses internal tables
of probability levels similar to those given by D'Agostino and Stephens (1986).
If the value is between two probability levels, then linear interpolation is used to
estimate the probability value.
The probability value depends upon the parameters that are known and the parameters
that are estimated for the distribution. Table 3.62 summarizes different combinations
fitted for which EDF tests are available.
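Table 3.62. Availability of EDF Tests

                                  Parameters
Distribution   Threshold            Scale               Shape                             Tests Available
Beta           $\theta$ known       $\sigma$ known      $\alpha, \beta$ known             all
               $\theta$ known       $\sigma$ known      $\alpha, \beta < 5$ unknown       all
Exponential    $\theta$ known       $\sigma$ known                                        all
               $\theta$ known       $\sigma$ unknown                                      all
               $\theta$ unknown     $\sigma$ known                                        all
               $\theta$ unknown     $\sigma$ unknown                                      all
Gamma          $\theta$ known       $\sigma$ known      $\alpha$ known                    all
               $\theta$ known       $\sigma$ unknown    $\alpha$ known                    all
               $\theta$ known       $\sigma$ known      $\alpha$ unknown                  all
               $\theta$ known       $\sigma$ unknown    $\alpha$ unknown                  all
               $\theta$ unknown     $\sigma$ known      $\alpha > 1$ known                all
               $\theta$ unknown     $\sigma$ unknown    $\alpha > 1$ known                all
               $\theta$ unknown     $\sigma$ known      $\alpha > 1$ unknown              all
               $\theta$ unknown     $\sigma$ unknown    $\alpha > 1$ unknown              all
Lognormal      $\theta$ known       $\zeta$ known       $\sigma$ known                    all
               $\theta$ known       $\zeta$ known       $\sigma$ unknown                  $A^2$ and $W^2$
               $\theta$ known       $\zeta$ unknown     $\sigma$ known                    $A^2$ and $W^2$
               $\theta$ known       $\zeta$ unknown     $\sigma$ unknown                  all
               $\theta$ unknown     $\zeta$ known       $\sigma < 3$ known                all
               $\theta$ unknown     $\zeta$ known       $\sigma < 3$ unknown              all
               $\theta$ unknown     $\zeta$ unknown     $\sigma < 3$ known                all
               $\theta$ unknown     $\zeta$ unknown     $\sigma < 3$ unknown              all
Normal         $\theta$ known       $\sigma$ known                                        all
               $\theta$ known       $\sigma$ unknown                                      $A^2$ and $W^2$
               $\theta$ unknown     $\sigma$ known                                        $A^2$ and $W^2$
               $\theta$ unknown     $\sigma$ unknown                                      all
Weibull        $\theta$ known       $\sigma$ known      $c$ known                         all
               $\theta$ known       $\sigma$ unknown    $c$ known                         $A^2$ and $W^2$
               $\theta$ known       $\sigma$ known      $c$ unknown                       $A^2$ and $W^2$
               $\theta$ known       $\sigma$ unknown    $c$ unknown                       $A^2$ and $W^2$
               $\theta$ unknown     $\sigma$ known      $c > 2$ known                     all
               $\theta$ unknown     $\sigma$ unknown    $c > 2$ known                     all
               $\theta$ unknown     $\sigma$ known      $c > 2$ unknown                   all
               $\theta$ unknown     $\sigma$ unknown    $c > 2$ unknown                   all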
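\[ \hat{f}_{\lambda}(x) = \frac{1}{n\lambda} \sum_{i=1}^{n} K_0\left( \frac{x - x_i}{\lambda} \right) \]
where $K_0(\cdot)$ is the kernel function, $\lambda$ is the bandwidth, n is the sample size and $x_i$ is
the ith observation.
The KERNEL option provides three kernel functions ($K_0$): normal, quadratic, and
triangular. You can specify the function with the K= kernel-option in parentheses
after the KERNEL option. Values for the K= option are NORMAL, QUADRATIC,
and TRIANGULAR (with aliases of N, Q, and T, respectively). By default, a normal
kernel is used. The formulas for the kernel functions are
Normal      $K_0(t) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} t^2 \right)$ for $-\infty < t < \infty$
Quadratic   $K_0(t) = \frac{3}{4} (1 - t^2)$ for $|t| \le 1$
Triangular  $K_0(t) = 1 - |t|$ for $|t| \le 1$
The bandwidth is
\[ \lambda = c\,Q\,n^{-\frac{1}{5}} \]
where $c$ is the value of the C= kernel-option and $Q$ is the sample interquartile range.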
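For a specific kernel function, the discrepancy between the density estimator $\hat{f}_{\lambda}(x)$
and the true density $f(x)$ is measured by the mean integrated square error (MISE):
\[ \mathrm{MISE}(\lambda) = \int_x \left\{ E(\hat{f}_{\lambda}(x)) - f(x) \right\}^2 dx + \int_x \mathrm{var}(\hat{f}_{\lambda}(x)) \, dx \]
The MISE is the sum of the integrated squared bias and the variance. An approximate
mean integrated square error (AMISE) is
\[ \mathrm{AMISE}(\lambda) = \frac{1}{4} \lambda^4 \left( \int_t t^2 K(t) \, dt \right)^2 \int_x \left( f''(x) \right)^2 dx + \frac{1}{n\lambda} \int_t K(t)^2 \, dt \]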
\[ K_0\left( \frac{x - x_i}{\lambda} \right) + K_0\left( \frac{(x - x_l) + (x_i - x_l)}{\lambda} \right) + K_0\left( \frac{(x_u - x) + (x_u - x_i)}{\lambda} \right) \]
When C=MISE is used with a bounded kernel density, the UNIVARIATE procedure
uses a bandwidth that minimizes the AMISE for its corresponding unbounded kernel.
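A minimal sketch of requesting a kernel density estimate follows, using the K= and C= kernel-options described above. The data set Thickness and the variable Gap are hypothetical.

```sas
/* Hypothetical data: measured gaps in millimeters */
data Thickness;
   input Gap @@;
   datalines;
3.46 3.51 3.42 3.55 3.49 3.47 3.60 3.44 3.52 3.48
3.53 3.45 3.50 3.57 3.43 3.49 3.54 3.46 3.51 3.47
;
run;

proc univariate data=Thickness noprint;
   /* Quadratic kernel with the bandwidth chosen to minimize the AMISE */
   histogram Gap / kernel(k=quadratic c=mise);
run;
```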
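[Figure: construction of a Q-Q plot. The $i$th point plots the ordered observation $x_{(i)}$ against the quantile $F^{-1}\left( \frac{i - 0.375}{n + 0.25} \right)$.]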
Possible Interpretation
When the pattern is linear, you can use Q-Q plots to estimate shape, location, and
scale parameters and to estimate percentiles. See Example 3.26 through Example
3.34.
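                                                                                                                                  Parameters
Distribution            Density Function p(x)                                                                        Range                             Location              Scale      Shape
Beta                    $\frac{(x-\theta)^{\alpha-1}(\theta+\sigma-x)^{\beta-1}}{B(\alpha,\beta)\,\sigma^{\alpha+\beta-1}}$   $\theta < x < \theta+\sigma$      $\theta$              $\sigma$   $\alpha, \beta$
Exponential             $\frac{1}{\sigma}\exp\left(-\frac{x-\theta}{\sigma}\right)$                                  $x \ge \theta$                    $\theta$              $\sigma$
Gamma                   $\frac{1}{\sigma\Gamma(\alpha)}\left(\frac{x-\theta}{\sigma}\right)^{\alpha-1}\exp\left(-\frac{x-\theta}{\sigma}\right)$   $x > \theta$   $\theta$   $\sigma$   $\alpha$
Lognormal (3-parameter) $\frac{1}{\sigma\sqrt{2\pi}(x-\theta)}\exp\left(-\frac{(\log(x-\theta)-\zeta)^2}{2\sigma^2}\right)$   $x > \theta$   $\theta$              $\zeta$    $\sigma$
Normal                  $\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$                   all $x$                           $\mu$                 $\sigma$
Weibull (3-parameter)   $\frac{c}{\sigma}\left(\frac{x-\theta}{\sigma}\right)^{c-1}\exp\left(-\left(\frac{x-\theta}{\sigma}\right)^{c}\right)$   $x > \theta$   $\theta$   $\sigma$   $c$
Weibull (2-parameter)   $\frac{c}{\sigma}\left(\frac{x-\theta_0}{\sigma}\right)^{c-1}\exp\left(-\left(\frac{x-\theta_0}{\sigma}\right)^{c}\right)$   $x > \theta_0$   $\theta_0$ (known)   $\sigma$   $c$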
Beta Distribution
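To create the plot, the observations are ordered from smallest to largest, and the ith
ordered observation is plotted against the quantile $B^{-1}_{\alpha\beta}\left( \frac{i-0.375}{n+0.25} \right)$, where $B^{-1}_{\alpha\beta}(\cdot)$ is the
inverse normalized incomplete beta function, n is the number of nonmissing observations,
and $\alpha$ and $\beta$ are the shape parameters of the beta distribution. In a probability
plot, the horizontal axis is scaled in percentile units.
The pattern on the plot for ALPHA=$\alpha$ and BETA=$\beta$ tends to be linear with intercept
$\theta$ and slope $\sigma$ if the data are beta distributed with the specific density function
\[ p(x) = \begin{cases} \dfrac{(x-\theta)^{\alpha-1} (\theta+\sigma-x)^{\beta-1}}{B(\alpha,\beta)\, \sigma^{\alpha+\beta-1}} & \text{for } \theta < x < \theta+\sigma \\ 0 & \text{for } x \le \theta \text{ or } x \ge \theta+\sigma \end{cases} \]
where $B(\alpha,\beta) = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$.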
Exponential Distribution
To create the plot, the observations are ordered from smallest to largest, and the ith
ordered observation is plotted against the quantile $-\log\left( 1 - \frac{i-0.375}{n+0.25} \right)$, where n is
the number of nonmissing observations. In a probability plot, the horizontal axis is
scaled in percentile units.
The pattern on the plot tends to be linear with intercept $\theta$ and slope $\sigma$ if the data are
exponentially distributed with the specific density function
\[ p(x) = \begin{cases} \dfrac{1}{\sigma} \exp\left( -\dfrac{x-\theta}{\sigma} \right) & \text{for } x \ge \theta \\ 0 & \text{for } x < \theta \end{cases} \]
Gamma Distribution
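To create the plot, the observations are ordered from smallest to largest, and the ith
ordered observation is plotted against the quantile $G^{-1}_{\alpha}\left( \frac{i-0.375}{n+0.25} \right)$, where $G^{-1}_{\alpha}(\cdot)$ is
the inverse normalized incomplete gamma function, n is the number of nonmissing
observations, and $\alpha$ is the shape parameter of the gamma distribution. In a probability
plot, the horizontal axis is scaled in percentile units.
The pattern on the plot for ALPHA=$\alpha$ tends to be linear with intercept $\theta$ and slope $\sigma$
if the data are gamma distributed with the specific density function
\[ p(x) = \begin{cases} \dfrac{1}{\sigma\,\Gamma(\alpha)} \left( \dfrac{x-\theta}{\sigma} \right)^{\alpha-1} \exp\left( -\dfrac{x-\theta}{\sigma} \right) & \text{for } x > \theta \\ 0 & \text{for } x \le \theta \end{cases} \]
where
$\theta$ = threshold parameter
$\sigma$ = scale parameter ($\sigma > 0$)
$\alpha$ = shape parameter ($\alpha > 0$)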
Lognormal Distribution
To create the plot, the observations are ordered from smallest to largest, and the
ith ordered observation is plotted against the quantile $\exp\left( \sigma \Phi^{-1}\left( \frac{i-0.375}{n+0.25} \right) \right)$, where
$\Phi^{-1}(\cdot)$ is the inverse cumulative standard normal distribution, n is the number of
nonmissing observations, and $\sigma$ is the shape parameter of the lognormal distribution.
In a probability plot, the horizontal axis is scaled in percentile units.
The pattern on the plot for SIGMA=$\sigma$ tends to be linear with intercept $\theta$ and slope
$\exp(\zeta)$ if the data are lognormally distributed with the specific density function
\[ p(x) = \begin{cases} \dfrac{1}{\sigma \sqrt{2\pi}\,(x-\theta)} \exp\left( -\dfrac{(\log(x-\theta) - \zeta)^2}{2\sigma^2} \right) & \text{for } x > \theta \\ 0 & \text{for } x \le \theta \end{cases} \]
where
$\theta$ = threshold parameter
$\zeta$ = scale parameter
$\sigma$ = shape parameter ($\sigma > 0$)
See Example 3.26 and Example 3.33.
Normal Distribution
To create the plot, the observations are ordered from smallest to largest, and the ith
ordered observation is plotted against the quantile $\Phi^{-1}\left( \frac{i-0.375}{n+0.25} \right)$, where $\Phi^{-1}(\cdot)$ is the
inverse cumulative standard normal distribution, and n is the number of nonmissing
observations. In a probability plot, the horizontal axis is scaled in percentile units.
The point pattern on the plot tends to be linear with intercept $\mu$ and slope $\sigma$ if the data
are normally distributed with the specific density function
\[ p(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) \quad \text{for all } x \]
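\[ p(x) = \begin{cases} \dfrac{c}{\sigma} \left( \dfrac{x-\theta}{\sigma} \right)^{c-1} \exp\left( -\left( \dfrac{x-\theta}{\sigma} \right)^{c} \right) & \text{for } x > \theta \\ 0 & \text{for } x \le \theta \end{cases} \]
where
$\theta$ = threshold parameter
$\sigma$ = scale parameter ($\sigma > 0$)
$c$ = shape parameter ($c > 0$)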
See Example 3.34.
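\[ p(x) = \begin{cases} \dfrac{c}{\sigma} \left( \dfrac{x-\theta_0}{\sigma} \right)^{c-1} \exp\left( -\left( \dfrac{x-\theta_0}{\sigma} \right)^{c} \right) & \text{for } x > \theta_0 \\ 0 & \text{for } x \le \theta_0 \end{cases} \]
where
$\theta_0$ = known lower threshold
$\sigma$ = scale parameter ($\sigma > 0$)
$c$ = shape parameter ($c > 0$)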
See Example 3.34.
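Distribution Keyword   Shape Parameter Option(s)        Range
BETA                   ALPHA=$\alpha$, BETA=$\beta$     $\alpha > 0$, $\beta > 0$
EXPONENTIAL            None
GAMMA                  ALPHA=$\alpha$                   $\alpha > 0$
LOGNORMAL              SIGMA=$\sigma$                   $\sigma > 0$
NORMAL                 None
WEIBULL                C=$c$                            $c > 0$
WEIBULL2               None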
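                                     Parameters                              Linear Pattern
Distribution            Location             Scale      Shape             Intercept        Slope
Beta                    $\theta$             $\sigma$   $\alpha, \beta$   $\theta$         $\sigma$
Exponential             $\theta$             $\sigma$                     $\theta$         $\sigma$
Gamma                   $\theta$             $\sigma$   $\alpha$          $\theta$         $\sigma$
Lognormal               $\theta$             $\zeta$    $\sigma$          $\theta$         $\exp(\zeta)$
Normal                  $\mu$                $\sigma$                     $\mu$            $\sigma$
Weibull (3-parameter)   $\theta$             $\sigma$   $c$               $\theta$         $\sigma$
Weibull (2-parameter)   $\theta_0$ (known)   $\sigma$   $c$               $\log(\sigma)$   $\frac{1}{c}$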
For instance, specifying MU=3 and SIGMA=2 with the NORMAL option requests
a line with intercept 3 and slope 2. Specifying SIGMA=1 and C=2 with the
WEIBULL2 option requests a line with intercept $\log(1) = 0$ and slope $\frac{1}{2}$. On a
probability plot with the LOGNORMAL and WEIBULL2 options, you can specify
the slope directly with the SLOPE= option. That is, for the LOGNORMAL option,
specifying THETA=$\theta_0$ and SLOPE=$\exp(\zeta_0)$ displays the same line as specifying
THETA=$\theta_0$ and ZETA=$\zeta_0$. For the WEIBULL2 option, specifying SIGMA=$\sigma_0$ and
SLOPE=$\frac{1}{c_0}$ displays the same line as specifying SIGMA=$\sigma_0$ and C=$c_0$.
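The first case in the paragraph above can be sketched as follows. The data set Fill and the variable Volume are hypothetical; the MU= and SIGMA= values are the ones used in the text, so the reference line has intercept 3 and slope 2.

```sas
/* Hypothetical data */
data Fill;
   input Volume @@;
   datalines;
1.2 4.8 3.1 0.5 5.9 2.7 3.6 6.4 2.0 3.3
4.1 1.8 2.9 5.2 3.8 0.9 4.5 2.4 3.0 7.1
;
run;

proc univariate data=Fill noprint;
   /* Normal Q-Q plot with a reference line for mu=3, sigma=2 */
   qqplot Volume / normal(mu=3 sigma=2);
run;
```

If the points fall close to the reference line, the specified mean and standard deviation are plausible for the data.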
Variable
CURVE
Description
Name of fitted distribution (if requested in HISTOGRAM
statement)
EXPPCT
MAXPT
MIDPT
MINPT
OBSPCT
VAR
Variable name
Description                                          Option
Confidence intervals for mean,                       CIBASIC
   standard deviation, variance
Measures of location and variability                 Default
Extreme observations                                 Default
Extreme values                                       NEXTRVAL=
Frequencies                                          FREQ
Counts used for sign test and signed rank test       LOCCOUNT
Missing values                                       Default, if missing values exist
Modes                                                MODES
Sample moments                                       Default
Line printer plots                                   PLOTS
Quantiles                                            Default
Robust measures of scale                             ROBUSTSCALE
Line printer side-by-side box plots                  PLOTS (with BY statement)
Tests for location                                   Default
Tests for normality                                  NORMALTEST
Trimmed means                                        TRIMMED=
Winsorized means                                     WINSORIZED=
Table Name           Description      Option
Bins                 Histogram bins   MIDPERCENTS secondary option
FitQuantiles                          Any distribution option
GoodnessOfFit                         Any distribution option
HistogramBins                         MIDPERCENTS option
ParameterEstimates                    Any distribution option
Parameters
The "ParameterEstimates" table lists the estimated (or specified) parameters for the
fitted curve as well as the estimated mean and estimated standard deviation. See
"Formulas for Fitted Continuous Distributions" on page 288.
Histogram Intervals
The "Bins" table is included in the summary only if you specify the MIDPERCENTS
option in parentheses after the distribution option. This table lists the midpoints for
the histogram bins along with the observed and estimated percentages of the observations
that lie in each bin. The estimated percentages are based on the fitted
distribution.
If you specify the MIDPERCENTS option without requesting a fitted distribution,
the "HistogramBins" table is included in the summary. This table lists the interval
midpoints with the observed percent of observations that lie in the interval. See the
entry for the MIDPERCENTS option on page 225.
Quantiles
The "FitQuantiles" table lists observed and estimated quantiles. You can use the
PERCENTS= option to specify the list of quantiles in this table. See the entry for the
PERCENTS= option on page 227. By default, the table lists observed and estimated
quantiles for the 1, 5, 10, 25, 50, 75, 90, 95, and 99 percent of a fitted parametric
distribution.
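The following sketch requests both optional tables for a fitted normal curve by placing the MIDPERCENTS and PERCENTS= options in parentheses after the distribution option. The data set Plates and the variable Gap are hypothetical, and the percent list is an arbitrary choice.

```sas
/* Hypothetical data */
data Plates;
   input Gap @@;
   datalines;
0.746 0.357 0.376 0.327 0.485 0.741 0.241 0.777 0.768 0.409
0.252 0.512 0.534 0.656 0.270 0.448 0.600 0.443 0.566 0.398
;
run;

proc univariate data=Plates;
   /* MIDPERCENTS adds the table of bin percentages;         */
   /* PERCENTS= limits the quantile table to 5, 50, and 95   */
   histogram Gap / normal(midpercents percents=5 50 95);
run;
```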
Computational Resources
Because the UNIVARIATE procedure computes quantile statistics, it requires additional memory to store a copy of the data in memory. By default, the MEANS,
SUMMARY, and TABULATE procedures require less memory because they do not
automatically compute quantiles. These procedures also provide an option to use a
new fixed-memory quantiles estimation method that is usually less memory intensive.
In the UNIVARIATE procedure, the only factor that limits the number of variables
that you can analyze is the computer resources that are available. The amount of
temporary storage and CPU time required depends on the statements and the options
that you specify. To calculate the computer resources the procedure needs, let
N
V
Ui
Examples
Example 3.1. Computing Descriptive Statistics for Multiple
Variables
This example computes univariate statistics for two variables. The following statements create the data set BPressure, which contains the systolic (Systolic) and diastolic (Diastolic) blood pressure readings for 22 patients:
data BPressure;
length PatientID $2;
input PatientID $ Systolic Diastolic @@;
datalines;
CK 120 50 SS 96 60 FR 100 70
CP 120 75 BL 140 90 ES 120 70
CP 165 110 JI 110 40 MC 119 66
FC 125 76 RW 133 60 KD 108 54
DS 110 50 JW 130 80 BH 120 65
JW 134 80 SB 118 76 NS 122 78
GS 122 70 AB 122 78 EC 112 62
HH 122 82
;
run;
The following statements produce descriptive statistics and quantiles for the variables
Systolic and Diastolic:
title 'Systolic and Diastolic Blood Pressure';
ods select BasicMeasures Quantiles;
proc univariate data=BPressure;
   var Systolic Diastolic;
run;
The ODS SELECT statement restricts the output, which is shown in Output 3.1.1, to
the "BasicMeasures" and "Quantiles" tables; see the section "ODS Table Names"
on page 309. You use the PROC UNIVARIATE statement to request univariate
statistics for the variables listed in the VAR statement, which specifies the analysis
variables and their order in the output. Formulas for computing the statistics in the
"BasicMeasures" table are provided in the section "Descriptive Statistics" on page
269. The quantiles are calculated using Definition 5, which is the default definition;
see the section "Calculating Percentiles" on page 273.
A sample program, uniex01.sas, for this example is available in the SAS Sample
Library for Base SAS software.
Variable:  Systolic

    Location                    Variability
Mean     121.2727     Std Deviation            14.28346
Median   120.0000     Variance                204.01732
Mode     120.0000     Range                    69.00000
                      Interquartile Range      13.00000

Quantiles (Definition 5)
Quantile       Estimate
100% Max            165
99%                 165
95%                 140
90%                 134
75% Q3              125
50% Median          120
25% Q1              112
10%                 108
5%                  100
1%                   96
0% Min               96

Variable:  Diastolic

    Location                    Variability
Mean     70.09091     Std Deviation            15.16547
Median   70.00000     Variance                229.99134
Mode     70.00000     Range                    70.00000
                      Interquartile Range      18.00000

Quantiles (Definition 5)
Quantile       Estimate
100% Max            110
99%                 110
95%                  90
90%                  82
75% Q3               78
50% Median           70
25% Q1               60
10%                  50
5%                   50
1%                   40
0% Min               40
= Exam Score;
@@;
81 84 86 86 97
42 91 90 88 86
82 83 81 80 81
The following statements use the MODES option to request a table of all possible
modes:
title 'Table of Modes for Exam Scores';
ods select Modes;
proc univariate data=Exam modes;
   var Score;
run;
The ODS SELECT statement restricts the output to the Modes table; see the section
ODS Table Names on page 309.
Output 3.2.1. Table of Modes Display
Table of Modes for Exam Scores
The UNIVARIATE Procedure
Variable: Score (Exam Score)

     Modes
Mode     Count
  81         4
  86         4
  97         4
By default, when the MODES option is used, and there is more than one mode, the
lowest mode is displayed in the BasicMeasures table. The following statements
illustrate the default behavior:
title 'Default Output';
ods select BasicMeasures;
proc univariate data=Exam;
   var Score;
run;
    Location                    Variability
Mean     83.66667     Std Deviation            11.08069
Median   84.50000     Variance                122.78161
Mode     81.00000     Range                    57.00000
                      Interquartile Range      10.00000
The default output displays a mode of 81 and includes a note regarding the number of
modes; the modes 86 and 97 are not displayed. The ODS SELECT statement restricts
the output to the BasicMeasures table; see the section ODS Table Names on page
309.
A sample program, uniex02.sas, for this example is available in the SAS Sample
Library for Base SAS software.
The ODS SELECT statement restricts the output to the "ExtremeObs" table; see
the section "ODS Table Names" on page 309. The ID statement requests that the
extreme observations are to be identified using the value of PatientID as well as the
observation number. By default, the five lowest and five highest observations are
displayed. You can use the NEXTROBS= option to request a different number of
extreme observations.
Output 3.3.1 shows that the patient identified as CP (Observation 7) has the highest
values for both Systolic and Diastolic. To visualize extreme observations, you can
create histograms; see Example 3.14.
Variable:  Systolic

---------Lowest---------      ---------Highest--------
Value   PatientID    Obs      Value   PatientID    Obs
   96   SS             2        130   JW            14
  100   FR             3        133   RW            11
  108   KD            12        134   JW            16
  110   DS            13        140   BL             5
  110   JI             8        165   CP             7

Variable:  Diastolic

---------Lowest---------      ---------Highest--------
Value   PatientID    Obs      Value   PatientID    Obs
   40   JI             8         80   JW            14
   50   DS            13         80   JW            16
   50   CK             1         82   HH            22
   54   KD            12         90   BL             5
   60   RW            11        110   CP             7
The following statements generate the Extreme Values tables for Systolic and
Diastolic, which tabulate the tails of the distributions:
title 'Extreme Blood Pressure Values';
ods select ExtremeValues;
proc univariate data=BPressure nextrval=5;
   var Systolic Diastolic;
run;
The ODS SELECT statement restricts the output to the "ExtremeValues" table; see
the section "ODS Table Names" on page 309. The NEXTRVAL= option specifies the
number of extreme values at each end of the distribution to be shown in the tables in
Output 3.3.2.
Output 3.3.2 shows that the values 78 and 80 occurred twice for Diastolic and the
maximum of Diastolic is 110. Note that Output 3.3.1 displays the value of 80 twice
for Diastolic because there are two observations with that value. In Output 3.3.2, the
value 80 is only displayed once.
Variable: Systolic

---------Lowest--------     --------Highest--------
Order    Value    Freq      Order    Value    Freq
    1       96       1         11      130       1
    2      100       1         12      133       1
    3      108       1         13      134       1
    4      110       2         14      140       1
    5      112       1         15      165       1

Variable: Diastolic

---------Lowest--------     --------Highest--------
Order    Value    Freq      Order    Value    Freq
    1       40       1         11       78       2
    2       50       2         12       80       2
    3       54       1         13       82       1
    4       60       2         14       90       1
    5       62       1         15      110       1
A sample program, uniex01.sas, for this example is available in the SAS Sample
Library for Base SAS software.
The following statements produce a frequency table for the variable ScoreChange:
title 'Analysis of Score Changes';
ods select Frequencies;
proc univariate data=Score freq;
var ScoreChange;
run;
The ODS SELECT statement restricts the output to the Frequencies table; see
the section ODS Table Names on page 309. The FREQ option on the PROC
UNIVARIATE statement requests the table of frequencies shown in Output 3.4.1.
Output 3.4.1. Table of Frequencies
Analysis of Score Changes
Variable: ScoreChange

                   ----Percents----
Value    Count      Cell       Cum
  -37        1       8.3       8.3
  -14        1       8.3      16.7
   -7        1       8.3      25.0
   -5        2      16.7      41.7
   -3        2      16.7      58.3
    2        1       8.3      66.7
    3        1       8.3      75.0
    6        1       8.3      83.3
   12        1       8.3      91.7
   14        1       8.3     100.0

From Output 3.4.1, the instructor sees that only score changes of -3 and -5 occurred
more than once.
A sample program, uniex03.sas, for this example is available in the SAS Sample
Library for Base SAS software.
The following statements produce stem-and-leaf plots, box plots, and normal probability plots for each site in the AirPoll data set:
ods select Plots SSPlots;
proc univariate data=AirPoll plot;
by Site;
var Ozone;
run;
The PLOT option produces a stem-and-leaf plot, a box plot, and a normal probability
plot for the Ozone variable at each site. Because the BY statement is used, a
side-by-side box plot is also created to compare the ozone levels across sites. Note that AirPoll
is sorted by Site; in general, the data set should be sorted by the BY variable using
the SORT procedure. The ODS SELECT statement restricts the output to the Plots
and SSPlots tables; see the section ODS Table Names on page 309. Optionally,
you can specify the PLOTSIZE=n option to control the approximate number of rows
(between 8 and the page size) that the plots occupy.
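The sorting requirement and the PLOTSIZE= option can be combined as in the following sketch. It assumes the AirPoll data set from this example; the value PLOTSIZE=20 is an arbitrary illustration, not a recommended setting.

```sas
/* Sketch: sort by the BY variable, then request plots of about 20 rows */
proc sort data=AirPoll;
   by Site;
run;

ods select Plots SSPlots;
proc univariate data=AirPoll plot plotsize=20;
   by Site;
   var Ozone;
run;
```

If the data set is already sorted by Site, the PROC SORT step is harmless but unnecessary.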
Output 3.5.1 through Output 3.5.3 show the plots produced for each BY group.
Output 3.5.4 shows the side-by-side box plot for comparing Ozone values across
sites.
[Output 3.5.1 through Output 3.5.3: stem-and-leaf plot, box plot, and normal probability plot of Ozone for each BY group. Output 3.5.4: side-by-side box plots of Ozone for sites 102, 134, and 137.]
The following statements create a table of moments for the variable Speed:
title 'Analysis of Speeding Data';
ods select Moments;
proc univariate data=Speeding;
freq Number;
var Speed;
run;
The ODS SELECT statement restricts the output, which is shown in Output 3.6.1, to
the Moments table; see the section ODS Table Names on page 309. The FREQ
statement specifies that the value of the variable Number represents the frequency of
each observation.
For the formulas used to compute these moments, see the section Descriptive
Statistics on page 269. A sample program, uniex05.sas, for this example is
available in the SAS Sample Library for Base SAS software.
Variable: Speed
Freq: Number

                           Moments
N                          94     Sum Weights                 94
Mean               74.3404255     Sum Observations          6988
Std Deviation      3.44403237     Variance             11.861359
Skewness           -0.1275543     Kurtosis            0.92002287
Uncorrected SS         520594     Corrected SS        1103.10638
Coeff Variation    4.63278538     Std Error Mean      0.35522482
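As a consistency check on the Moments table, the derived statistics can be recovered from N, the mean, and the standard deviation. This is a worked sketch that uses only the printed values; the symbols n, \bar{x}, and s are notation introduced here, and the divisor n - 1 assumes the default variance definition.

```latex
\begin{align*}
\text{Variance}        &= s^2 = (3.44403237)^2 \approx 11.8614,\\
\text{Corrected SS}    &= (n-1)\,s^2 = 93 \times 11.861359 \approx 1103.106,\\
\text{Std Error Mean}  &= \frac{s}{\sqrt{n}} = \frac{3.44403237}{\sqrt{94}} \approx 0.35522,\\
\text{Coeff Variation} &= 100\,\frac{s}{\bar{x}}
                        = 100 \times \frac{3.44403237}{74.3404255} \approx 4.6328 .
\end{align*}
```

Here n = 94 is the sum of the frequencies in Number, not the number of rows in the data set.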
1129.70
1225.47
1216.75
1249.55
1201.96
1124.46
1208.27
1146.78
1195.66
1246.13
1223.49
1131.33
3.019
2.980
3.037
2.958
3.002
2.929
3.029
3.061
2.995
3.022
2.971
2.984
The following statements produce two output data sets containing summary statistics:
proc univariate data=Belts noprint;
var Strength Width;
output out=Means
mean=StrengthMean WidthMean;
output out=StrengthStats mean=StrengthMean std=StrengthSD
min=StrengthMin
max=StrengthMax;
run;
Strength      Width
    Mean       Mean
 1205.75    3.00584

Strength    Strength    Strength    Strength
    Mean          SD         Max         Min
 1205.75     48.3290     1289.59     1101.73

 p95str      p5str
1284.34    1126.78
You can use the PCTLPTS=, PCTLPRE=, and PCTLNAME= options to save percentiles
not automatically computed by the UNIVARIATE procedure. For example,
the following statements create an output data set named Pctls, which contains the
20th and 40th percentiles of the variables Strength and Width:
proc univariate data=Belts noprint;
   var Strength Width;
   output out=Pctls pctlpts  = 20 40
                    pctlpre  = Strength Width
                    pctlname = pct20 pct40;
run;
The PCTLPTS= option specifies the percentiles to compute (in this case, the 20th
and 40th percentiles). The PCTLPRE= and PCTLNAME= options build the names
for the variables containing the percentiles. The PCTLPRE= option gives prefixes
for the new variables, and the PCTLNAME= option gives a suffix to add to the
prefix. When you use the PCTLPTS= specification, you must also use the PCTLPRE=
specification.
The OUTPUT statement saves the 20th and 40th percentiles of Strength and Width
in the variables Strengthpct20, Widthpct20, Strengthpct40, and Widthpct40.
The output data set Pctls is listed in Output 3.8.2.
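When PCTLNAME= is omitted, the variable names are built from the prefix and the percentile value. The following is a minimal sketch, assuming the Belts data set from this example; the output data set name Pctls2 and the prefixes S_ and W_ are hypothetical choices made here for illustration.

```sas
/* Sketch: PCTLPRE= without PCTLNAME=; names come from prefix + percentile */
proc univariate data=Belts noprint;
   var Strength Width;
   output out=Pctls2 pctlpts = 20 40
                     pctlpre = S_ W_;
run;

proc print data=Pctls2;
run;
```

The resulting variables should be named S_20, S_40, W_20, and W_40.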
Strengthpct20    Strengthpct40    Widthpct20    Widthpct40
      1165.91          1199.26        2.9595         2.995
A sample program, uniex06.sas, for this example is available in the SAS Sample
Library for Base SAS software.
67.4
60.6
64.1
66.0
64.7
66.3
61.0
64.9
68.6
62.1
68.5
66.0
66.2
64.1
63.7
68.6
62.9
64.4
65.1
67.5
65.5
The following statements produce confidence limits for the mean, standard deviation,
and variance of the population of heights:
title 'Analysis of Female Heights';
ods select BasicIntervals;
proc univariate data=Heights cibasic;
   var Height;
run;
The CIBASIC option requests confidence limits for the mean, standard deviation, and
variance. For example, Output 3.9.1 shows that the 95% confidence interval for the
population mean is (64.06, 65.07). The ODS SELECT statement restricts the output
to the BasicIntervals table; see the section ODS Table Names on page 309.
The confidence limits in Output 3.9.1 assume that the heights are normally
distributed, so you should check this assumption before using these confidence limits.
Example 3.9. Computing Confidence Limits for the Mean, Standard Deviation, and
Variance
See the section Shapiro-Wilk Statistic on page 293 for information on the
Shapiro-Wilk test for normality in PROC UNIVARIATE. See Example 3.19 for an example
using the test for normality.
Output 3.9.1. Default 95% Confidence Limits
Analysis of Female Heights
The UNIVARIATE Procedure
Variable: Height (Height (in))
Basic Confidence Limits Assuming Normality

Parameter         Estimate    Upper 95% Confidence Limit
Mean              64.56667                      65.07031
Std Deviation      2.18900                       2.60874
Variance           4.79171                       6.80552
By default, the CIBASIC option produces 95% confidence limits. You can request
a different confidence level by using the ALPHA= option in parentheses after the
CIBASIC option. The following statements produce 90% confidence limits:
title 'Analysis of Female Heights';
ods select BasicIntervals;
proc univariate data=Heights cibasic(alpha=.1);
var Height;
run;
Parameter         Estimate    Upper 90% Confidence Limit
Mean              64.56667                      64.98770
Std Deviation      2.18900                       2.53474
Variance           4.79171                       6.42492
For the formulas used to compute these limits, see the section Confidence Limits for
Parameters of the Normal Distribution on page 277.
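For orientation, the standard normal-theory limits take the following form. This is a sketch in notation introduced here: n is the sample size, \bar{x} the sample mean, s the sample standard deviation, and t and \chi^2 denote quantiles of the t and chi-square distributions with n - 1 degrees of freedom. The numerical check assumes n = 75, which is inferred from the ranks shown in Output 3.10.2 and is not stated in this example, and it uses the approximate t quantile 1.9925.

```latex
\begin{align*}
&\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
  && \text{(mean)}\\[4pt]
&\left(\frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2,\,n-1}},\;
       \frac{(n-1)\,s^2}{\chi^2_{\alpha/2,\,n-1}}\right)
  && \text{(variance; take square roots for the standard deviation)}\\[8pt]
&64.56667 + 1.9925 \times \frac{2.18900}{\sqrt{75}}
  \approx 64.56667 + 0.5036 \approx 65.0703
  && \text{(upper 95\% limit for the mean)}
\end{align*}
```

Because the interval for the mean is symmetric about the estimate, the lower 95% limit is approximately 64.56667 - 0.5036, or about 64.06, which agrees with the interval quoted for Output 3.9.1.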
A sample program, uniex07.sas, for this example is available in the SAS Sample
Library for Base SAS software.
The ODS SELECT statement restricts the output to the Quantiles table; see the
section ODS Table Names on page 309. The CIQUANTNORMAL option produces
confidence limits for the quantiles. As noted in Output 3.10.1, these limits assume
that the data are normally distributed. You should check this assumption before
using these confidence limits. See the section Shapiro-Wilk Statistic on page 293
for information on the Shapiro-Wilk test for normality in PROC UNIVARIATE; see
Example 3.19 for an example using the test for normality.
Output 3.10.1. Normal-Based Quantile Confidence Limits
Analysis of Female Heights
The UNIVARIATE Procedure
Variable: Height (Height (in))
Quantiles (Definition 5)

                            Confidence Limits
Quantile       Estimate     Assuming Normality
100% Max           70.0
99%                70.0     68.94553    70.58228
95%                68.6     67.59184    68.89311
90%                67.5     66.85981    68.00273
75% Q3             66.0     65.60757    66.54262
50% Median         64.4     64.14564    64.98770
25% Q1             63.1     62.59071    63.52576
10%                61.6     61.13060    62.27352
5%                 60.6     60.24022    61.54149
1%                 60.0     58.55106    60.18781
0% Min             60.0
It is also possible to use PROC UNIVARIATE to compute confidence limits for
quantiles without assuming normality. The following statements use the CIQUANTDF
option to request distribution-free confidence limits for the quantiles of the
population of heights:
                            Confidence Limits
Quantile       Estimate     Distribution Free    LCL Rank   UCL Rank   Coverage
100% Max           70.0
99%                70.0     68.6      70.0             73         75      48.97
95%                68.6     67.5      70.0             68         75      94.50
90%                67.5     66.6      68.6             63         72      91.53
75% Q3             66.0     65.7      66.6             50         63      91.77
50% Median         64.4     64.1      65.1             31         46      91.54
25% Q1             63.1     62.7      63.7             13         26      91.77
10%                61.6     60.6      62.7              4         13      91.53
5%                 60.6     60.0      61.6              1          8      94.50
1%                 60.0     60.0      60.5              1          3      48.97
0% Min             60.0

The table in Output 3.10.2 includes the ranks from which the confidence limits are
computed. For more information on how these confidence limits are calculated, see
the section Confidence Limits for Percentiles on page 274. Note that confidence
limits for quantiles are not produced when the WEIGHT statement is used.
A sample program, uniex07.sas, for this example is available in the SAS Sample
Library for Base SAS software.
Trimmed Means

Percent    Number                 Std Error
Trimmed    Trimmed    Trimmed     Trimmed      95% Confidence              t for H0:
in Tail    in Tail    Mean        Mean         Limits                DF    Mu0=0.00    Pr > |t|
   4.55          1    120.3500    2.573536     114.9635  125.7365    19    46.76446    <.0001
  13.64          3    120.3125    2.395387     115.2069  125.4181    15    50.22675    <.0001

Winsorized Means

Percent       Number                      Std Error
Winsorized    Winsorized    Winsorized    Winsorized    95% Confidence              t for H0:
in Tail       in Tail       Mean          Mean          Limits                DF    Mu0=0.00    Pr > |t|
     13.64             3    120.6364      2.417065      115.4845  125.7882    15    49.91027    <.0001
Output 3.11.1 shows the trimmed mean for Systolic is 120.35 after one observation
has been trimmed, and 120.31 after 3 observations are trimmed. The Winsorized
mean for Systolic is 120.64. For details on trimmed and Winsorized means, see the
section Robust Estimators on page 278. The trimmed means can be compared with
the means shown in Output 3.1.1 (from Example 3.1), which displays the mean for
Systolic as 121.273.
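The entries in the first row of the trimmed means table are mutually consistent, as the following worked sketch shows. It uses only the printed values plus the approximate t quantile t_{0.975,19} = 2.0930; the symbols \bar{x}_t and SE are notation introduced here.

```latex
\begin{align*}
\bar{x}_t \pm t_{0.975,\,19}\,\mathrm{SE}
   &= 120.3500 \pm 2.0930 \times 2.573536
    \approx 120.3500 \pm 5.3865
    \approx (114.9635,\ 125.7365),\\[4pt]
t  &= \frac{\bar{x}_t - \mu_0}{\mathrm{SE}}
    = \frac{120.3500 - 0}{2.573536} \approx 46.764 .
\end{align*}
```

The degrees of freedom, 19, equal the 22 observations minus the 2 trimmed observations minus 1.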
The ROBUSTSCALE option requests a table, displayed in Output 3.11.2, which
includes the interquartile range, Gini's mean difference, the median absolute deviation
about the median (MAD), Qn, and Sn.
Output 3.11.2 shows the robust estimates of scale for Systolic. For instance, the
interquartile range is 13. The estimates of σ range from 9.54 to 13.32. See the
section Robust Estimators on page 278.
A sample program, uniex01.sas, for this example is available in the SAS Sample
Library for Base SAS software.
Systolic

                                         Estimate
Measure                      Value       of Sigma
Interquartile Range       13.00000        9.63691
Gini's Mean Difference    15.03030       13.32026
MAD                        6.50000        9.63690
Sn                         9.54080        9.54080
Qn                        13.33140       11.36786
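The Estimate of Sigma column rescales each measure so that it estimates σ when the data are normal. The following worked check reproduces three of the rows; the multipliers are the usual normal-consistency constants, stated here as an assumption about how the column is formed rather than taken from this example.

```latex
\begin{align*}
\text{Interquartile range:}\quad & \frac{13.00000}{1.34898} \approx 9.6369,\\[4pt]
\text{Gini's mean difference:}\quad & 15.03030 \times \frac{\sqrt{\pi}}{2}
      \approx 15.03030 \times 0.886227 \approx 13.3203,\\[4pt]
\text{MAD:}\quad & 6.50000 \times 1.4826 \approx 9.6369 .
\end{align*}
```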
The ODS SELECT statement restricts the output to the TestsForLocation and
LocationCounts tables; see the section ODS Table Names on page 309. The
MU0= option specifies the null hypothesis value of μ0 for the tests for location; by
default, μ0 = 0. The LOCCOUNT option produces the table of the number of
observations greater than, not equal to, and less than 66 inches.
Output 3.12.1 contains the results of the tests for location. All three tests are highly
significant, causing the researchers to reject the hypothesis that the mean is 66 inches.
A sample program, uniex07.sas, for this example is available in the SAS Sample
Library for Base SAS software.
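A minimal sketch of statements consistent with the options described above follows. It assumes the Heights data set and the Height variable from Example 3.9; the title text is reused from that example, and the statements are an illustration rather than a copy of the sample program.

```sas
/* Sketch: test H0: mean = 66 and count observations relative to 66 */
title 'Analysis of Female Heights';
ods select TestsForLocation LocationCounts;
proc univariate data=Heights mu0=66 loccount;
   var Height;
run;
```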
Test             -Statistic-       -----p Value------
Student's t      t    -5.67065     Pr > |t|     <.0001
Sign             M         -20     Pr >= |M|    <.0001
Signed Rank      S        -849     Pr >= |S|    <.0001

Count                                   Value
Observations greater than 66               16
Observations not equal to 66               72
Observations less than 66                  56
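The sign statistic can be recovered from the location counts. This worked sketch assumes the usual definition of the statistic, with n^{+} and n^{-} (notation introduced here) denoting the numbers of observations above and below 66:

```latex
\[
M = \frac{n^{+} - n^{-}}{2} = \frac{16 - 56}{2} = -20,
\qquad n^{+} + n^{-} = 16 + 56 = 72 .
\]
```

The sum 72 matches the count of observations not equal to 66, since values equal to the hypothesized location do not enter the sign test.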
The ODS SELECT statement restricts the output to the BasicMeasures and
TestsForLocation tables; see the section ODS Table Names on page 309. The
instructor is not willing to assume the ScoreChange variable is normal or even
symmetric, so he decides to examine the sign test. The large p-value (0.7744) of the
sign test provides insufficient evidence of a difference in test score medians.
Variable: ScoreChange

      Location                     Variability
Mean      -3.08333      Std Deviation            13.33797
Median    -3.00000      Variance                177.90152
Mode      -5.00000      Range                    51.00000
                        Interquartile Range      10.50000

Test             -Statistic-       -----p Value------
Student's t      t    -0.80079     Pr > |t|     0.4402
Sign             M          -1     Pr >= |M|    0.7744
Signed Rank      S        -8.5     Pr >= |S|    0.5278
A sample program, uniex03.sas, for this example is available in the SAS Sample
Library for Base SAS software.
(mils);
3.478
3.497
3.461
3.501
3.533
3.468
3.528
3.439
3.488
3.516
3.556
3.495
3.489
3.495
3.450
3.564
3.477
3.513
3.515
3.474
3.482
3.518
3.514
3.443
3.516
3.522
3.536
3.496
3.484
3.500
3.512
3.523
3.470
3.458
3.476
3.520
3.491
3.539
3.482
3.466
A sample program, uniex08.sas, for this example is available in the SAS Sample
Library for Base SAS software.
Length
1
1
2
2
3
3
0.91
.
1.17
1.47
.
1.39
2.04
.
1.91
The CLASS statement requests comparisons for each level (distinct value) of the
classification variable Lot. The HISTOGRAM statement requests a comparative
histogram for the variable Length. The NROWS= option specifies the number of rows
in the comparative histogram. By default, comparative histograms are displayed in
two rows per panel.
Output 3.15.3. Comparison by Lot Source
Output 3.15.3 reveals that the distributions of Length are similar across lots except
for shifts in mean.
A sample program, uniex09.sas, for this example is available in the SAS Sample
Library for Base SAS software.
   1 = '2002' 2 = '2003';
data Disk;
   input @1 Supplier $10. Year Width;
   label Width = 'Opening Width (inches)';
   format Year mytime.;
   datalines;
Supplier A    1    1.8932
.
.
.
Supplier B    1    1.8986
Supplier A    2    1.8978
.
.
.
Supplier B    2    1.8997
;
The KEYLEVEL= option specifies the key cell as the cell for which Supplier is equal
to "SUPPLIER A" and Year is equal to "2003". This cell determines the binning for
the other cells, and the columns are arranged so that this cell is displayed in the upper
left corner. Without the KEYLEVEL= option, the default key cell would be the
cell for which Supplier is equal to "SUPPLIER A" and Year is equal to "2002"; the
column labeled 2002 would be displayed to the left of the column labeled 2003.
The VAXIS= option specifies the tick mark labels for the vertical axis. The
NROWS=2 and NCOLS=2 options specify a 2 × 2 arrangement for the tiles. The
CFRAMESIDE= and CFRAMETOP= options specify fill colors for the row and
column labels, and the CFILL= option specifies a fill color for the bars. Output 3.16.1
provides evidence that both suppliers have reduced variability from 2002 to 2003.
A sample program, uniex10.sas, for this example is available in the SAS Sample
Library for Base SAS software.
1.91
0.26
-0.08
0.19
0.79
1.17
0.92
1.03
...
0.48
0.79
;
run;
0.41
0.66
0.78
0.22
0.58
0.71
0.43
0.53
0.07
0.57
0.27
0.90
0.49
0.48
The INSET statement requests insets containing the sample mean and standard
deviation for each machine in the corresponding tile. The MIDPOINTS= option specifies
the midpoints of the histogram bins.
Output 3.17.1. Comparative Histograms
Output 3.17.1 shows that the average positions for Machines 2 and 3 are similar and
that the spread for Machine 1 is much larger than for Machines 2 and 3.
A sample program, uniex11.sas, for this example is available in the SAS Sample
Library for Base SAS software.
The ODS SELECT statement restricts the output to the HistogramBins table and
the MyHist histogram; see the section ODS Table Names on page 309. The
ENDPOINTS= option specifies the endpoints for the histogram bins. By default, if
the ENDPOINTS= option is not specified, the automatic binning algorithm computes
values for the midpoints of the bins. The MIDPERCENTS option requests a table
of the midpoints of each histogram bin and the percent of the observations that fall
in each bin. This table is displayed in Output 3.18.1; the histogram is displayed in
Output 3.18.2. The NAME= option specifies a name for the histogram that can be
used in the ODS SELECT statement.
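The options described above can be combined as in the following sketch. It assumes the Trans data set and the Thick variable that appear later in this example; the endpoint list 3.425 to 3.6 by .025 is inferred here from the bin minimum points shown in Output 3.18.1 and is not quoted from the sample program.

```sas
/* Sketch: explicit bin endpoints, a table of bin percentages, and a named graph */
title 'Enhancing a Histogram';
ods select HistogramBins MyHist;
proc univariate data=Trans;
   histogram Thick / midpercents
                     name      = 'MyHist'
                     endpoints = 3.425 to 3.6 by .025;
run;
```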
Output 3.18.1. Table of Bin Percentages Requested with MIDPERCENTS Option
Enhancing a Histogram
The UNIVARIATE Procedure
Variable: Thick
Histogram Bins for Thick

    Bin
Minimum     Observed
  Point      Percent
  3.425        8.000
  3.450       21.000
  3.475       25.000
  3.500       29.000
  3.525       11.000
  3.550        5.000
  3.575        1.000
The MIDPOINTS= option is an alternative to the ENDPOINTS= option for
specifying histogram bins. The following statements create a histogram, shown in
Output 3.18.3, that is similar to the one in Output 3.18.2:
title 'Enhancing a Histogram';
proc univariate data=Trans noprint;
   histogram Thick / midpoints    = 3.4375 to 3.5875 by .025
                     rtinclude
                     outhistogram = OutMdpts;
run;
The OUTHISTOGRAM= option produces an output data set named OutMdpts, displayed in Output 3.18.4. This data set provides information on the bins of the histogram. For more information, see the section OUTHISTOGRAM= Output Data
Set on page 308.
Output 3.18.4. The OUTHISTOGRAM= Data Set OutMdpts
OUTHISTOGRAM= Data Set

Obs    _VAR_    _MIDPT_    _OBSPCT_
  1    Thick     3.4375           9
  2    Thick     3.4625          21
  3    Thick     3.4875          26
  4    Thick     3.5125          28
  5    Thick     3.5375          11
  6    Thick     3.5625           5
  7    Thick     3.5875           0
A sample program, uniex08.sas, for this example is available in the SAS Sample
Library for Base SAS software.
Parameter    Symbol    Estimate
Mean         Mu        3.49533
Std Dev      Sigma     0.032117

Test                  ---Statistic----        -----p Value-----
Kolmogorov-Smirnov    D       0.05563823      Pr > D        >0.150
Cramer-von Mises      W-Sq    0.04307548      Pr > W-Sq     >0.250
Anderson-Darling      A-Sq    0.27840748      Pr > A-Sq     >0.250

-------Percent-------
Observed    Estimated
   3.000        3.296
   9.000        9.319
  23.000       18.091
  19.000       24.124
  24.000       22.099
  15.000       13.907
   3.000        6.011
   4.000        1.784

           ------Quantile------
Percent    Observed    Estimated
   20.0     3.46700      3.46830
   40.0     3.48350      3.48719
   60.0     3.50450      3.50347
   80.0     3.52250      3.52236
in Output 3.19.1. By default, the parameters are estimated unless you specify values
with the MU= and SIGMA= secondary options after the NORMAL primary option.
The results of three goodness-of-fit tests based on the empirical distribution function
(EDF) are displayed in Output 3.19.1. Since the p-values are all greater than 0.15,
the hypothesis of normality is not rejected.
A sample program, uniex08.sas, for this example is available in the SAS Sample
Library for Base SAS software.
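The estimated quantiles for the fitted normal curve follow directly from the parameter estimates. As a worked sketch, using the printed estimates of μ and σ and the standard normal quantile \Phi^{-1}(0.20) \approx -0.8416 (the symbol x_p is notation introduced here):

```latex
\begin{align*}
x_p      &= \hat{\mu} + \hat{\sigma}\,\Phi^{-1}(p),\\
x_{0.20} &\approx 3.49533 - 0.8416 \times 0.032117 \approx 3.49533 - 0.02703 = 3.46830,\\
x_{0.80} &\approx 3.49533 + 0.02703 = 3.52236 .
\end{align*}
```

Both values agree with the Estimated column for the 20th and 80th percentiles.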
The NOPRINT option in the PROC UNIVARIATE statement suppresses the tables
of statistical output produced by default; the NOPRINT option in parentheses after
the NORMAL option suppresses the tables of statistical output related to the fit of the
normal distribution. The normal parameters are estimated from the data for each lot,
and the curves are superimposed on each component histogram. The INTERTILE=
option specifies the space between the framed areas, which are referred to as tiles. The
CPROP= option requests the shaded bars above each tile, which represent the relative
frequencies of observations in each lot. The comparative histogram is displayed in
Output 3.20.1.
A sample program, uniex09.sas, for this example is available in the SAS Sample
Library for Base SAS software.
The ODS SELECT statement restricts the output to the ParameterEstimates and
FitQuantiles tables; see the section ODS Table Names on page 309. The BETA
primary option requests a fitted beta distribution. The THETA= secondary option
specifies the lower threshold. The SCALE= secondary option specifies the range
between the lower threshold and the upper threshold. Note that the default THETA=
and SCALE= values are zero and one, respectively.
Output 3.21.1. Superimposing a Histogram with a Fitted Beta Curve
Parameter    Symbol    Estimate
Threshold    Theta     10
Scale        Sigma     0.5
Shape        Alpha     2.06832
Shape        Beta      6.022479
Mean                   10.12782
Std Dev                0.072339

           ------Quantile------
Percent    Observed    Estimated
    1.0     10.0180      10.0124
    5.0     10.0310      10.0285
   10.0     10.0380      10.0416
   25.0     10.0670      10.0718
   50.0     10.1220      10.1174
   75.0     10.1750      10.1735
   90.0     10.2255      10.2292
   95.0     10.2780      10.2630
   99.0     10.3220      10.3237
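The Mean and Std Dev rows of the parameter table can be reproduced from the threshold, scale, and shape estimates. This worked sketch uses the standard moments of a beta distribution on the interval from θ to θ + σ; the formulas are stated here as general facts rather than quoted from this example.

```latex
\begin{align*}
\text{Mean}    &= \theta + \sigma\,\frac{\alpha}{\alpha+\beta}
                = 10 + 0.5 \times \frac{2.06832}{2.06832 + 6.022479}
                \approx 10 + 0.12782 = 10.12782,\\[6pt]
\text{Std Dev} &= \sigma\,\sqrt{\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}}
                \approx 0.5 \times \sqrt{\frac{12.4564}{65.4610 \times 9.0908}}
                \approx 0.5 \times 0.14468 \approx 0.07234 .
\end{align*}
```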
Weibull,
and
Gamma
1.741
0.378
0.501
0.450
0.643
0.241
0.714
0.247
0.845
0.483
0.777
1.121
0.922
0.319
0.352
0.768
0.597
0.880
0.486
0.636
0.409
0.231
0.344
0.529
1.080
The following statements fit three distributions (lognormal, Weibull, and gamma) and
display their density curves on a single histogram:
title 'Distribution of Plate Gaps';
ods select ParameterEstimates GoodnessOfFit FitQuantiles MyHist;
proc univariate data=Plates;
   var Gap;
   histogram / midpoints=0.2 to 1.8 by 0.2
               lognormal (l=1)
               weibull   (l=2)
               gamma     (l=8)
               vaxis = axis1
               name  = 'MyHist';
   inset n mean(5.3) std='Std Dev'(5.3) skewness(5.3)
         / pos = ne header = 'Summary Statistics';
   axis1 label=(a=90 r=0);
run;
Parameter    Symbol    Estimate
Threshold    Theta     0
Scale        Zeta      -0.58375
Shape        Sigma     0.499546
Mean                   0.631932
Std Dev                0.336436

Test                  ---Statistic----        -----p Value-----
Kolmogorov-Smirnov    D       0.06441431      Pr > D        >0.150
Cramer-von Mises      W-Sq    0.02823022      Pr > W-Sq     >0.500
Anderson-Darling      A-Sq    0.24308402      Pr > A-Sq     >0.500

           ------Quantile------
Percent    Observed    Estimated
    1.0     0.23100      0.17449
    5.0     0.24700      0.24526
   10.0     0.29450      0.29407
   25.0     0.37800      0.39825
   50.0     0.53150      0.55780
   75.0     0.74600      0.78129
   90.0     1.10050      1.05807
   95.0     1.54700      1.26862
   99.0     1.74100      1.78313
Parameter    Symbol    Estimate
Threshold    Theta     0
Scale        Sigma     0.719208
Shape        C         1.961159
Mean                   0.637641
Std Dev                0.339248

Test                  ---Statistic----        -----p Value-----
Cramer-von Mises      W-Sq    0.15937281      Pr > W-Sq      0.016
Anderson-Darling      A-Sq    1.15693542      Pr > A-Sq     <0.010

           ------Quantile------
Percent    Observed    Estimated
    1.0     0.23100      0.06889
    5.0     0.24700      0.15817
   10.0     0.29450      0.22831
   25.0     0.37800      0.38102
   50.0     0.53150      0.59661
   75.0     0.74600      0.84955
   90.0     1.10050      1.10040
   95.0     1.54700      1.25842
   99.0     1.74100      1.56691
Parameter    Symbol    Estimate
Threshold    Theta     0
Scale        Sigma     0.155198
Shape        Alpha     4.082646
Mean                   0.63362
Std Dev                0.313587

Test                  ---Statistic----        -----p Value-----
Kolmogorov-Smirnov    D       0.09695325      Pr > D        >0.250
Cramer-von Mises      W-Sq    0.07398467      Pr > W-Sq     >0.250
Anderson-Darling      A-Sq    0.58106613      Pr > A-Sq      0.137

           ------Quantile------
Percent    Observed    Estimated
    1.0     0.23100      0.13326
    5.0     0.24700      0.21951
   10.0     0.29450      0.27938
   25.0     0.37800      0.40404
   50.0     0.53150      0.58271
   75.0     0.74600      0.80804
   90.0     1.10050      1.05392
   95.0     1.54700      1.22160
   99.0     1.74100      1.57939
Output 3.22.5 provides three EDF goodness-of-fit tests for the gamma distribution:
the Anderson-Darling, the Cramér-von Mises, and the Kolmogorov-Smirnov tests.
At the α = 0.10 significance level, all tests support the conclusion that the gamma
distribution with scale parameter σ = 0.16 and shape parameter α = 4.08 provides a
good model for the distribution of plate gaps.
Based on this analysis, the fitted lognormal distribution and the fitted gamma
distribution are both good models for the distribution of plate gaps. A sample program,
uniex13.sas, for this example is available in the SAS Sample Library for Base SAS
software.
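The Mean and Std Dev rows reported for the fitted lognormal and gamma curves follow from the parameter estimates. This worked sketch uses the standard moment formulas for these distributions with threshold θ = 0, stated here as general facts rather than quoted from this example.

```latex
\begin{align*}
\text{Lognormal mean}  &= \theta + \exp\!\left(\zeta + \tfrac{1}{2}\sigma^2\right)
                        = \exp\!\left(-0.58375 + \tfrac{1}{2}(0.499546)^2\right)
                        \approx \exp(-0.45898) \approx 0.6319,\\[6pt]
\text{Gamma mean}      &= \theta + \sigma\alpha
                        = 0.155198 \times 4.082646 \approx 0.6336,\\[6pt]
\text{Gamma std dev}   &= \sigma\sqrt{\alpha}
                        = 0.155198 \times \sqrt{4.082646} \approx 0.3136 .
\end{align*}
```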
The L= secondary option specifies distinct line types for the curves (the L= values are
paired with the C= values in the order listed). Output 3.23.1 demonstrates the effect
of c. In general, larger values of c yield smoother density estimates, and smaller
values yield estimates that more closely fit the data distribution.
Output 3.23.1. Multiple Kernel Density Estimates
Output 3.23.1 reveals strong trimodality in the data, which is displayed with
comparative histograms in Example 3.15.
A sample program, uniex09.sas, for this example is available in the SAS Sample
Library for Base SAS software.
= Strength in psi;
@@;
47.39
61.63
23.80
44.06
55.33
50.76
35.66
33.93
34.90
34.03
42.66
39.44
48.81
59.30
76.15
24.83
33.38
47.98
34.50
31.86
41.96
42.21
68.93
21.87
33.73
73.51
33.88
45.32
The NOPRINT option suppresses the tables of statistical output produced by default.
Specifying THETA=EST requests a local maximum likelihood estimate (LMLE) for
θ, as described by Cohen (1951). This estimate is then used to compute maximum
likelihood estimates for σ and ζ.
Note: You can also specify THETA=EST with the WEIBULL primary option to fit a
three-parameter Weibull distribution.
A sample program, uniex14.sas, for this example is available in the SAS Sample
Library for Base SAS software.
mm);
5.40
9.69
9.53
6.17
11.32
11.08
16.84
2.10
1.39
10.16
18.38
9.52
13.06
8.56
5.33
11.22
16.61
4.58
11.48
16.49
18.85
7.09
9.38
11.40
4.81
5.53
7.77
5.08
9.36
12.92
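$$h(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\left[\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) + \exp\left(-\frac{(x+\mu)^2}{2\sigma^2}\right)\right], \quad x \ge 0$$

$$f(\theta) = \frac{\left[\sqrt{2/\pi}\;e^{-\theta^2/2} - \theta\,[1 - 2\Phi(\theta)]\right]^2}{1+\theta^2} = A$$

$$A = \frac{\bar{x}^2}{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$$

$$\hat{\sigma}_0 = \sqrt{\frac{\frac{1}{n}\sum_{i=1}^{n} x_i^2}{1+\hat{\theta}_0^2}}, \qquad \hat{\mu}_0 = \hat{\theta}_0\,\hat{\sigma}_0$$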
Begin by using PROC MEANS to compute the first and second moments and using
the following DATA step to compute the constant A:
proc means data = Assembly noprint;
var Offset;
output out=stat mean=m1 var=var n=n min = min;
run;
* Compute constant A from equation (19) of Elandt (1961);
data stat;
keep m2 a min;
set stat;
a = (m1*m1);
m2 = ((n-1)/n)*var + a;
a = a/m2;
run;
Next, use the SAS/IML subroutine NLPDD to solve equation (19) by minimizing
(f(θ) − A)², and compute the preliminary estimates of μ and σ:
The preliminary estimates are saved in the data set Prelim, as shown in Output 3.25.1:
Output 3.25.1. Preliminary Estimates of μ, σ, and θ
The Data Set Prelim

   EMU0      ESIG0    ETHETA0
6.51735    6.54953    0.99509
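The three values in Prelim are tied together by the relation between the preliminary estimates. As a quick worked check using only the printed values:

```latex
\[
\hat{\mu}_0 = \hat{\theta}_0\,\hat{\sigma}_0
            = 0.99509 \times 6.54953 \approx 6.5174 ,
\]
```

which matches EMU0 = 6.51735 up to rounding of the printed estimates.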
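Now, using the preliminary estimates of μ and σ as initial values, call the NLPDD
subroutine to maximize the log likelihood, l(μ, σ), of the folded normal distribution,
where, up to a constant,

$$l(\mu,\sigma) = -n\log\sigma + \sum_{i=1}^{n}\log\left[\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) + \exp\left(-\frac{(x_i+\mu)^2}{2\sigma^2}\right)\right]$$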
The data set ParmEst contains the maximum likelihood estimates of μ and σ (as well
ESIG
6.39650
ETHETA
1.04239
To annotate the curve on a histogram, begin by computing the width and endpoints
of the histogram intervals. The following statements save these values in a data set
called OutCalc. Note that a plot is not produced at this point.
Output 3.25.3 provides a listing of the data set OutCalc. The width of the histogram
bars is saved as the value of the variable _WIDTH_; the midpoints of the first and last
histogram bars are saved as the values of the variables _MIDPT1_ and _MIDPTN_.
Output 3.25.3. The Data Set OutCalc
Data Set OutCalc
_MIDPT1_
1.5
_WIDTH_
_MIDPTN_
22.5
The following statements create an annotate data set named Anno, which contains
the coordinates of the fitted curve:
data Anno;
   merge ParmEst OutCalc;
   length function color $ 8;
   function = 'point';
   color    = 'black';
   size     = 2;
   xsys     = '2';
   ysys     = '2';
   when     = 'a';
   * 39.894 = 100/sqrt(2*pi), which scales the density to percent per bar;
   constant = 39.894*_width_;
   left     = _midpt1_ - .5*_width_;
   right    = _midptn_ + .5*_width_;
   inc      = (right-left)/100;
   do x = left to right by inc;
      z1 = (x-emu)/esig;
      z2 = (x+emu)/esig;
      y  = (constant/esig)*(exp(-0.5*z1*z1)+exp(-0.5*z2*z2));
      output;
      function = 'draw';
   end;
run;
The following statements read the ANNOTATE= data set and display the histogram
and fitted curve:
A sample program, uniex15.sas, for this example is available in the SAS Sample
Library for Base SAS software.
The LOGNORMAL primary option requests plots based on the lognormal family of
distributions, and the SIGMA= secondary option requests plots for σ equal to 0.7,
0.9, and 1.1. These plots are displayed in Output 3.26.1, Output 3.26.2, and Output
3.26.3, respectively. Alternatively, you can specify that σ be estimated using the sample
standard deviation by using the option SIGMA=EST.
The SQUARE option displays the probability plot in a square format, the HREF=
option requests a reference line at the 95th percentile, and the LHREF= option
specifies the line type for the reference line.
Output 3.26.1. Probability Plot Based on Lognormal Distribution with σ=0.7
The value σ = 0.9 in Output 3.26.2 most nearly linearizes the point pattern.
The 95th percentile of the position deviation distribution seen in Output 3.26.2 is
approximately 0.001, since this is the value corresponding to the intersection of the
point pattern with the reference line.
The plot is displayed in Output 3.26.4. Note that the maximum likelihood estimate
of σ (in this case 0.882) does not necessarily produce the most linear point pattern.
Output 3.26.4. Probability Plot Based on Lognormal Distribution with Estimated σ
A sample program, uniex16.sas, for this example is available in the SAS Sample
Library for Base SAS software.
The ODS SELECT statement restricts the output to the ParameterEstimates and
GoodnessOfFit tables; see the section ODS Table Names on page 309. The
LOGNORMAL primary option superimposes a fitted curve on the histogram in
Output 3.27.1. The W= option specifies the line width for the curve. The INSET
statement specifies that the mean, standard deviation, and skewness be displayed in
an inset in the northeast corner of the plot. Note that the default value of the threshold
parameter θ is zero. In applications where the threshold is not zero, you can specify
θ with the THETA= option. The variable Deviation includes values that are less than
the default threshold; therefore, the option THETA=EST is used.
Output 3.27.1. Normal Probability Plot Created with Graphics Device
Output 3.27.2 provides three EDF goodness-of-fit tests for the lognormal
distribution: the Anderson-Darling, the Cramér-von Mises, and the Kolmogorov-Smirnov
tests. The null hypothesis for the three tests is that a lognormal distribution holds for
the sample data.
Parameter    Symbol    Estimate
Threshold    Theta     -0.00834
Scale        Zeta      -6.14382
Shape        Sigma     0.882225
Mean                   -0.00517
Std Dev                0.003438

Test                  ---Statistic----        -----p Value-----
Kolmogorov-Smirnov    D       0.09419634      Pr > D        >0.500
Cramer-von Mises      W-Sq    0.02919815      Pr > W-Sq     >0.500
Anderson-Darling      A-Sq    0.21606642      Pr > A-Sq     >0.500
The p-values for all three tests are greater than 0.5, so the null hypothesis is not
rejected. The tests support the conclusion that the two-parameter lognormal
distribution with scale parameter ζ = -6.14 and shape parameter σ = 0.88 provides
a good model for the distribution of position deviations. For further discussion of
goodness-of-fit interpretation, see the section Goodness-of-Fit Tests on page 292.
A sample program, uniex16.sas, for this example is available in the SAS Sample
Library for Base SAS software.
@@;
= Hole Distance (cm);
9.70
10.24
10.21
9.79
10.11
10.44
9.77
9.72
10.70
9.98
9.76
9.63
10.00
10.06
9.93
10.16
9.36
9.82
9.54
10.15
The plot compares the ordered values of Distance with quantiles of the normal distribution. The linearity of the point pattern indicates that the measurements are normally
distributed. Note that a normal Q-Q plot is created by default.
Output 3.28.1. Normal Quantile-Quantile Plot for Distance
A sample program, uniex17.sas, for this example is available in the SAS Sample
Library for Base SAS software.
The data clearly follow the line, which indicates that the distribution of the distances
is normal.
A sample program, uniex17.sas, for this example is available in the SAS Sample
Library for Base SAS software.
5.607
5.512
5.823
5.244
5.394
5.907
5.475
The following statements request the normal Q-Q plot in Output 3.30.1:
symbol v=plus;
title 'Normal Q-Q Plot for Diameters';
proc univariate data=Measures noprint;
qqplot Diameter / normal(noprint)
square
vaxis=axis1;
axis1 label=(a=90 r=0);
run;
The nonlinearity of the points in Output 3.30.1 indicates a departure from normality.
Since the point pattern is curved with slope increasing from left to right, a theoretical distribution that is skewed to the right, such as a lognormal distribution, should
provide a better fit than the normal distribution. The mild curvature suggests that you
should examine the data with a series of lognormal Q-Q plots for small values of the
shape parameter σ, as illustrated in Example 3.31. For details on interpreting a Q-Q
plot, see the section "Interpretation of Quantile-Quantile and Probability Plots" on
page 299.
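A series of lognormal Q-Q plots can be requested by listing several shape values in the SIGMA= secondary option. The following is a minimal sketch, not the statements of the example itself: the values 0.2, 0.5, and 0.8 are illustrative assumptions chosen to bracket a mild curvature, and one plot is expected for each listed value.

```sas
/* Sketch only: the sigma values below are illustrative choices. */
symbol v=plus;
title 'Lognormal Q-Q Plots for Diameters';
proc univariate data=Measures noprint;
   qqplot Diameter / lognormal(sigma=0.2 0.5 0.8 noprint)
                     square;
run;
```

The value of σ whose plot shows the most nearly linear point pattern is the natural candidate for the shape parameter.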
A sample program, uniex18.sas, for this example is available in the SAS Sample
Library for Base SAS software.
The plot in Output 3.31.2 displays the most linear point pattern, indicating that the
lognormal distribution with σ = 0.5 provides a reasonable fit for the data distribution.
Data with this particular lognormal distribution have the following density function:

\[
p(x) =
\begin{cases}
\dfrac{\sqrt{2}}{\sqrt{\pi}\,(x-\theta)} \exp\left(-2\left(\log(x-\theta)-\zeta\right)^{2}\right) & \text{for } x > \theta \\
0 & \text{for } x \le \theta
\end{cases}
\]

The points in the plot fall on or near the line with intercept θ and slope exp(ζ). Based
on Output 3.31.2, θ ≈ 5 and exp(ζ) ≈ 1.2/3 = 0.4, giving ζ ≈ log(0.4) ≈ −0.92.
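The arithmetic behind this slope-based estimate can be checked with a short DATA step. This is only an illustrative sketch: the variable names dy, dx, slope, and zeta are introduced here, and the rise of 1.2 over a run of 3 is the reading taken from the plot.

```sas
data _null_;
   dy    = 1.2;          /* vertical rise read from the plot   */
   dx    = 3;            /* horizontal run read from the plot  */
   slope = dy / dx;      /* estimate of exp(zeta)              */
   zeta  = log(slope);   /* natural logarithm gives zeta       */
   put slope= 4.2 zeta= 6.2;
run;
```

The step writes a slope of about 0.40 and a value of ζ of about −0.92 to the SAS log.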
You can also request a reference line using the SIGMA=, THETA=, and ZETA=
options together. The following statements produce the lognormal Q-Q plot in Output
3.31.4:
symbol v=plus;
title 'Lognormal Q-Q Plot for Diameters';
proc univariate data=Measures noprint;
   /* THETA=5 fixes the threshold; ZETA=EST and SIGMA=EST request estimates */
   qqplot Diameter / lognormal(theta=5 zeta=est sigma=est
                               color=black l=2 noprint)
                     square;
run;
Output 3.31.1 through Output 3.31.3 show that the threshold parameter θ is not equal
to zero. Specifying THETA=5 overrides the default value of zero. The SIGMA=EST
and ZETA=EST secondary options request estimates for σ and exp(ζ) using the sample
mean and standard deviation.
From the plot in Output 3.31.2, σ can be estimated as 0.51, which is consistent with
the estimate of 0.5 derived from the plot in Output 3.31.2. The next example illustrates how to estimate percentiles using lognormal Q-Q plots.
A sample program, uniex18.sas, for this example is available in the SAS Sample
Library for Base SAS software.
This example, which is a continuation of the previous example, shows how to use a Q-Q plot to estimate percentiles such as the 95th percentile of the lognormal distribution.
A probability plot can also be used for this purpose, as illustrated in Example 3.26.
The point pattern in Output 3.31.4 has a slope of approximately 0.39 and an intercept
of 5. The following statements reproduce this plot, adding a lognormal reference line
with this slope and intercept:
symbol v=plus;
title 'Lognormal Q-Q Plot for Diameters';
proc univariate data=Measures noprint;
   /* reference line with intercept THETA=5 and slope 0.39 */
   qqplot Diameter / lognormal(sigma=0.5 theta=5 slope=0.39 noprint)
                     pctlaxis(grid)
                     vref = 5.8 5.9 6.0
                     square;
run;
The PCTLAXIS option labels the major percentiles, and the GRID option draws
percentile axis reference lines. The 95th percentile is 5.9, since the intersection of
the distribution reference line and the 95th reference line occurs at this value on the
vertical axis.
Alternatively, you can compute this percentile from the estimated lognormal parameters. The αth percentile of the lognormal distribution is

\[ P_{\alpha} = \exp\left(\sigma\,\Phi^{-1}(\alpha) + \zeta\right) + \theta \]

where Φ⁻¹(·) is the inverse cumulative standard normal distribution. Consequently,

\[ P_{0.95} = \exp\left(\tfrac{1}{2}\,\Phi^{-1}(0.95) + \log(0.39)\right) + 5 = 5.89 \]
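The same computation can be carried out in a DATA step. The sketch below is illustrative: the variable names are introduced here, the parameter values are the estimates quoted in this example (σ = 0.5, slope 0.39, θ = 5), and the PROBIT function is used for the inverse cumulative standard normal distribution.

```sas
data _null_;
   sigma = 0.5;           /* shape parameter                          */
   zeta  = log(0.39);     /* scale parameter: log of the fitted slope */
   theta = 5;             /* threshold parameter                      */
   alpha = 0.95;          /* percentile level                         */
   /* PROBIT returns the quantile of the standard normal distribution */
   p = exp(sigma * probit(alpha) + zeta) + theta;
   put 'Estimated 95th percentile: ' p 6.2;
run;
```

The value written to the log is about 5.89, which agrees with the visual estimate of 5.9 read from the percentile axis.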
A sample program, uniex18.sas, for this example is available in the SAS Sample
Library for Base SAS software.
Because the point pattern in Output 3.33.1 is linear, you can estimate the lognormal
parameters ζ and σ as the normal plot estimates of μ and σ, which are −0.99 and
0.51. These values correspond to the previous estimates of −0.92 for ζ and 0.5 for σ
from Example 3.31. A sample program, uniex18.sas, for this example is available
in the SAS Sample Library for Base SAS software.
25.10
29.68
32.90
28.58
26.78
31.44
31.34
33.76
35.46
28.88
27.82
32.52
If no assumption is made about the parameters of this distribution, you can use the
WEIBULL option to request a three-parameter Weibull plot. As in the previous example, you can visually estimate the shape parameter c by requesting plots for different
values of c and choosing the value of c that linearizes the point pattern. Alternatively,
you can request a maximum likelihood estimate for c, as illustrated in the following
statements:
symbol v=plus;
title 'Three-Parameter Weibull Q-Q Plot for Failure Times';
proc univariate data=Failures noprint;
   /* C=EST, THETA=EST, and SIGMA=EST request maximum likelihood estimates */
   qqplot Time / weibull(c=est theta=est sigma=est noprint)
                 square
                 href=0.5 1 1.5 2
                 vref=25 27.5 30 32.5 35
                 lhref=4 lvref=4
                 chref=tan cvref=tan;
run;
Note: When using the WEIBULL option, you must either specify a list of values for
the Weibull shape parameter c with the C= option, or you must specify C=EST.
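The alternative to C=EST is a list of candidate shape values. The following is a minimal sketch of that form: the values 1.5, 2, and 2.5 are illustrative assumptions, and one plot is expected for each listed value, so you can pick the c that best linearizes the point pattern.

```sas
/* Sketch only: the c values below are illustrative choices. */
symbol v=plus;
title 'Weibull Q-Q Plots for Failure Times';
proc univariate data=Failures noprint;
   qqplot Time / weibull(c=1.5 2 2.5 noprint)
                 square;
run;
```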
Output 3.34.1 displays the plot for the estimated value c = 1.99. The reference
line corresponds to the estimated values for the threshold and scale parameters of
Now, suppose it is known that the circuit lifetime is at least 24 months. The following
statements use the known threshold value θ₀ = 24 to produce the two-parameter
Weibull Q-Q plot shown in Output 3.31.4:
symbol v=plus;
title 'Two-Parameter Weibull Q-Q Plot for Failure Times';
proc univariate data=Failures noprint;
   /* THETA=24 fixes the threshold at the known value */
   qqplot Time / weibull(theta=24 c=est sigma=est noprint)
                 square
                 vref= 25 to 35 by 2.5
                 href= 0.5 to 2.0 by 0.5
                 lhref=4 lvref=4
                 chref=tan cvref=tan;
run;
The reference line is based on maximum likelihood estimates c = 2.08 and σ = 6.05.
A sample program, uniex19.sas, for this example is available in the SAS Sample
Library for Base SAS software.
References
Blom, G. (1958), Statistical Estimates and Transformed Beta-Variables, New York:
John Wiley & Sons, Inc.
Chambers, J. M., Cleveland, W. S., Kleiner, B., and Tukey, P. A. (1983), Graphical
Methods for Data Analysis, Belmont, Calif.: Wadsworth International Group.
Cohen, A. C., Jr.
(1951), Estimating Parameters of Logarithmic-Normal
Distributions by Maximum Likelihood, Journal of the American Statistical
Association, 46, 206212.
Conover, W. J. (1999), Practical Nonparametric Statistics, Third Edition, New York:
John Wiley & Sons, Inc.
Croux, C. and Rousseeuw, P. J. (1992), Time-Efcient Algorithms for Two Highly
Robust Estimators of Scale, Computational Statistics, Vol. 1, 411428.
DAgostino, R. B. and Stephens, M. (1986), Goodness-of-Fit Techniques, New York:
Marcel Dekker, Inc.
Dixon, W. J. and Tukey, J. W. (1968), Approximate Behavior of the Distribution of
Winsorized t (Trimming/Winsorization 2), Technometrics, 10, 8398.
377
378
A Guide for
Hampel, F. R. (1974), The Inuence Curve and Its Role in Robust Estimation,
Journal of the American Statistical Association, 69, 383393.
Iman, R. L. (1974), Use of a t-statistic as an Approximation to the Exact Distribution
of the Wilcoxon Signed Ranks Test Statistic, Communications in Statistics, 3,
795806.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994), Continuous Univariate
Distributions, Vol. 1, New York: John Wiley & Sons, Inc.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995), Continuous Univariate
Distributions, Vol. 2, New York: John Wiley & Sons, Inc.
Lehmann, E. L. (1998), Nonparametrics: Statistical Methods Based on Ranks, New
Jersey: Prentice Hall.
Odeh, R. E. and Owen, D. B. (1980), Tables for Normal Tolerance Limits, Sampling
Plans, and Screening, New York: Marcel Dekker, Inc.
Owen, D. B. and Hua, T. A. (1977), Tables of Condence Limits on the Tail Area of
the Normal Distribution, Communication and Statistics, Part BSimulation and
Computation, 6, 285311.
Rousseeuw, P. J. and Croux, C. (1993), Alternatives to the Median Absolute
Deviation, Journal of the American Statistical Association. 88, 12731283.
Royston, J. P. (1992), Approximating the Shapiro-Wilk W-Test for Non-normality,
Statistics and Computing, 2, 117119.
Shapiro, S. S. and Wilk, M. B. (1965), An Analysis of Variance Test for Normality
(complete samples), Biometrika, 52, 591611.
Silverman, B. W. (1986), Density Estimation for Statistics and Data Analysis, New
York: Chapman and Hall.
Terrell, G. R. and Scott, D. W. (1985), Oversmoothed Nonparametric Density
Estimates, Journal of the American Statistical Association, 80, 209214.
Tukey, J. W. (1977), Exploratory Data Analysis, Reading, Massachusetts: AddisonWesley.
Tukey, J. W. and McLaughlin, D. H. (1963), Less Vulnerable Condence
and Signicance Procedures for Location Based on a Single Sample:
Trimming/Winsorization 1, Sankhya A, 25, 331352.
Subject Index
A
adjusted odds ratio, 137
agreement, measures of, 127
alpha level
FREQ procedure, 79, 87
Anderson-Darling statistic, 295
Anderson-Darling test, 206
annotating
histograms, 218
probability plots, 245
quantile plots, 258
ANOVA (row mean scores) statistic, 136
association, measures of
FREQ procedure, 108
asymmetric lambda, 108, 116
B
beta distribution, 288, 301
deviation from theoretical distribution, 294
EDF goodness-of-fit test, 294
estimation of parameters, 218
fitting, 218, 288
formulas for, 288
probability plots, 245, 301
quantile plots, 258, 301
binomial proportion test, 118
examples, 166
Bowker's test of symmetry, 127, 128
box plots, line printer, 207, 282
side-by-side, 206, 283
Breslow-Day test, 142
C
case-control studies
odds ratio, 122, 137, 138
cell count data, 98
example (FREQ), 161
chi-square tests
examples (FREQ), 164, 169, 172
FREQ procedure, 103, 104
Cicchetti-Allison weights, 131
Cochran's Q test, 127, 133, 182
Cochran-Armitage test for trend, 124, 177
Cochran-Mantel-Haenszel statistics (FREQ), 81, 134,
See also chi-square tests
ANOVA (row mean scores) statistic, 136
correlation statistic, 135
examples, 174
D
data summarization tools, 196
density estimation,
See kernel density estimation
descriptive statistics
computing, 269
discordant observations, 108
distribution of variables, 196
E
EDF,
See empirical distribution function
EDF goodness-of-fit tests, 294
probability values of, 296
empirical distribution function
definition of, 294
EDF test statistics, 294
exact tests
computational algorithms (FREQ), 143
computational resources (FREQ), 145
confidence limits, 77
FREQ procedure, 142, 177
network algorithm (FREQ), 143
p-value, definitions, 144
exponential distribution, 289, 301
deviation from theoretical distribution, 294
EDF goodness-of-fit test, 294
estimation of parameters, 221
fitting, 289
formulas for, 289
probability plots, 247, 301
quantile plots, 260, 301
extreme observations, 230, 315
extreme values, 315
F
Fisher's exact test
FREQ procedure, 103, 106, 107
Fisher's z transformation, 8, 21
applications, 23
confidence limits for the correlation, 22
fitted parametric distributions, 288
beta distribution, 288
exponential distribution, 289
folded normal distribution, 355
gamma distribution, 290
lognormal distribution, 291
normal distribution, 291
Weibull distribution, 292
Fleiss-Cohen weights, 131
folded normal distribution, 355
Freeman-Halton test, 107
FREQ procedure
alpha level, 79, 87
binomial proportion, 118, 166
Bowker's test of symmetry, 127
uncertainty coefficients, 108
weighted kappa coefficient, 127
frequency tables, 65, 84
creating (UNIVARIATE), 317
one-way (FREQ), 103, 104, 151, 152, 164
two-way (FREQ), 104, 105, 169
Friedman's chi-square statistic, 180
J
Jonckheere-Terpstra test, 125
K
kappa coefficient, 129, 130
tests, 132
weights, 131
Kendall correlation statistics, 8
L
lambda asymmetric, 108, 116
lambda symmetric, 108, 117
likelihood-ratio chi-square test, 103
likelihood-ratio test
chi-square (FREQ), 105
line printer plots, 281
box plots, 282, 283
normal probability plots, 282
stem-and-leaf plots, 282
listwise deletion, 26
location estimates
robust, 207, 209
location parameters, 304
probability plots, estimation with, 304
quantile plots, estimation with, 304
location, tests for
UNIVARIATE procedure, 331
lognormal distribution, 291, 302
deviation from theoretical distribution, 294
EDF goodness-of-fit test, 294
estimation of parameters, 224
fitting, 224, 291
formulas for, 291
histograms, 354, 363
probability plots, 249, 302, 360
quantile plots, 261, 302, 373
M
Mantel-Haenszel chi-square test, 103, 106
McNemar's test, 127, 128
measures of agreement, 127
measures of association, 34
nonparametric, 3
measures of location
means, 270
modes, 272, 314
trimmed means, 279
Winsorized means, 278
median absolute deviation (MAD), 280
N
network algorithm, 143
nonparametric density estimation,
See kernel density estimation
nonparametric measures of association, 3
normal distribution, 291, 302
deviation from theoretical distribution, 294
EDF goodness-of-fit test, 294
estimation of parameters, 226
fitting, 226, 291
formulas for, 291
histograms, 226
probability plots, 241, 250, 302
quantile plots, 262, 302, 365
normal probability plots,
See probability plots
line printer, 207, 282
O
odds ratio
adjusted, 137
Breslow-Day test, 142
case-control studies, 122, 137, 138
logit estimate, 138
Mantel-Haenszel estimate, 137
ODS (Output Delivery System)
CORR procedure and, 30
UNIVARIATE procedure table names, 309
ODS graph names
CORR procedure, 34
output data sets
saving correlations in, 52
overall kappa coefficient, 127, 132
P
paired data, 275, 332
pairwise deletion, 26
parameters for fitted density curves, 217, 218, 226–229
partial correlations, 18
probability values, 21
Pearson chi-square test, 103, 105
Pearson correlation coefficient, 108, 113
Pearson correlation statistics, 3
example, 34
in output data set, 8
Pearson partial correlation, 3, 13
Pearson product-moment correlation, 3, 8, 14, 34
Pearson weighted product-moment correlation,
3, 13
probability values, 16
suppressing, 8
percentiles
axes, quantile plots, 263, 305
calculating, 273
confidence limits for, 274, 328
defining, 207, 273
empirical distribution function, 273
options, 239, 240
probability plots and, 241
quantile plots and, 253
saving to an output data set, 325
visual estimates, probability plots, 305
visual estimates, quantile plots, 305
weighted, 273
weighted average, 273
phi coefficient, 103, 108
plot statements, UNIVARIATE procedure, 195
plots
box plots, 206, 207, 282, 283
comparative, 210–212, 284
comparative histograms, 224, 226, 249, 334,
337, 345
comparative probability plots, 250
comparative quantile plots, 262, 263
line printer plots, 281
normal probability plots, 207, 282
probability plots, 241, 300
quantile plots, 253, 300
size of, 207
stem-and-leaf, 207, 282
suppressing, 226
polychoric correlation coefficient, 94, 108, 116
probability plots, 241
annotating, 245
appearance, 246–250, 252
axis color, 246
beta distribution, 245, 301
comparative, 249, 250
distribution reference lines, 251
distributions for, 300
exponential distribution, 247, 301
frame, color, 246
gamma distribution, 247, 301
location parameters, estimation of, 304
lognormal distribution, 249, 302, 360, 363
normal distribution, 241, 250, 302
options summarized by function, 243
overview, 241
parameters for distributions, 243, 245, 246, 249–253
percentile axis, 250
percentiles, estimates of, 305
reference lines, 247, 248
reference lines, options, 246, 248–250, 252
scale parameters, estimation of, 304
shape parameters, 251, 303
three-parameter Weibull distribution, 302
threshold parameter, 252
threshold parameters, estimation of, 304
Q
Q-Q plots,
See quantile plots
Qn , 280
quantile plots, 253
appearance, 259–263, 265
axes, percentile scale, 263, 305
axis color, 259
beta distribution, 258, 301
comparative, 262, 263
creating, 298
diagnostics, 299
distribution reference lines, 264, 366
distributions for, 300
exponential distribution, 260, 301
frame, color, 259
gamma distribution, 260, 301
interpreting, 299
legends, suppressing (UNIVARIATE), 366
location parameters, estimation of, 304
lognormal distribution, 261, 302, 369, 373
lognormal distribution, percentiles, 372
nonnormal data, 368
normal distribution, 262, 302, 365
options summarized by function, 255
overview, 253
parameters for distributions, 255, 258, 259, 262–266
percentiles, estimates of, 305
reference lines, 259, 260, 263
reference lines, options, 259, 261–263, 265
scale parameters, estimation of, 304
shape parameters, 264, 303
three-parameter Weibull distribution, 302
threshold parameter, 265
threshold parameters, estimation of, 304
tick marks on horizontal axis, 260
tiles for comparative plots, 261
two-parameter Weibull distribution, 303
Weibull distribution, 266, 375
quantile-quantile plots,
See quantile plots
quantiles
defining, 273
empirical distribution function, 273
histograms and, 310
weighted average, 273
R
rank scores, 103
relative risk, 137
cohort studies, 123
logit estimate, 141
S
saving correlations
example, 52
scale estimates
robust, 207
scale parameters, 304
probability plots, 304
quantile plots, 304
shape parameters, 303
probability plots, 251
quantile plots, 264
Shapiro-Wilk statistic, 293
Shapiro-Wilk test, 206
sign test, 275, 276
paired data and, 332
signed rank statistic, computing, 277
simple kappa coefficient, 127, 129
singularity of variables, 8
smoothing data distribution,
See kernel density estimation
Sn , 280
Somers' D statistics, 108, 112
Spearman correlation statistics, 8
probability values, 16
Spearman partial correlation, 3, 13
Spearman rank-order correlation, 3, 16, 34
Spearman rank correlation coefficient, 108, 114
standard deviation, 8
stem-and-leaf plots, 207, 282
stratified analysis
FREQ procedure, 65, 84
stratified table
example, 174
Stuart's tau-c statistic, 108, 112
Student's t test, 275, 276
summary statistics
insets of, 230
sums of squares and crossproducts, 8
T
t test
U
uncertainty coefficients, 108, 117, 118
univariate analysis
for multiple variables, 312
UNIVARIATE procedure
calculating modes, 314
classification levels, 210
comparative plots, 210–212, 284
computational resources, 311
concepts, 268
confidence limits, 204, 277, 326
descriptive statistics, 269, 312
examples, 312
extreme observations, 230, 315
extreme values, 315
fitted continuous distributions, 288
frequency variables, 322
goodness-of-fit tests, 292
high-resolution graphics, 284
histograms, 310, 315
insets for descriptive statistics, 230
keywords for insets, 230
keywords for output data sets, 237
line printer plots, 281, 319
missing values, 210, 268
mode calculation, 272
normal probability plots, 282
ODS table names, 309
output data sets, 237, 306, 323
overview, 196
percentiles, 241, 253, 273
percentiles, confidence limits, 204, 205, 328
plot statements, 195
probability plots, 241, 300
quantile plots, 253, 300
quantiles, confidence limits, 204, 205
results, 308
robust estimates, 329
robust estimators, 278
robust location estimates, 207, 209
robust scale estimates, 207
rounding, 269
sign test, 276, 332
specifying analysis variables, 266
task tables, 253
testing for location, 331
tests for location, 275
weight variables, 267
UNIVARIATE procedure, OUTPUT statement
output data set, 306
V
variances, 8
W
Weibull distribution, 292
deviation from theoretical distribution, 294
EDF goodness-of-fit test, 294
estimation of parameters, 229
fitting, 229, 292
formulas for, 292
histograms, 354
probability plots, 253
quantile plots, 266, 375
three-parameter, 302
two-parameter, 303
weight values, 205
weighted kappa coefficient, 127, 130
weighted percentiles, 273
Wilcoxon signed rank test, 275, 277
Winsorized means, 209, 278
Y
Yule's Q statistic, 110
Z
zeros, structural and random
FREQ procedure, 133
Syntax Index
A
AGREE option
EXACT statement (FREQ), 78
OUTPUT statement (FREQ), 81
TABLES statement (FREQ), 87
TEST statement (FREQ), 97
AJCHI option
OUTPUT statement (FREQ), 81
ALL option
OUTPUT statement (FREQ), 81
PROC UNIVARIATE statement, 204
TABLES statement (FREQ), 87
ALPHA option
PROC CORR statement, 8
ALPHA= option
EXACT statement (FREQ), 79
HISTOGRAM statement (UNIVARIATE), 217,
289
PLOTS=SCATTER option (CORR), 32
PROBPLOT statement (UNIVARIATE), 245
PROC UNIVARIATE statement, 204
QQPLOT statement (UNIVARIATE), 258
TABLES statement (FREQ), 87
ANNOKEY option
HISTOGRAM statement (UNIVARIATE), 217
PROBPLOT statement (UNIVARIATE), 245
QQPLOT statement (UNIVARIATE), 258
ANNOTATE= option
HISTOGRAM statement (UNIVARIATE), 218,
355
PROBPLOT statement (UNIVARIATE), 245
PROC UNIVARIATE statement, 204, 305
QQPLOT statement (UNIVARIATE), 258
B
BARWIDTH= option
HISTOGRAM statement (UNIVARIATE), 218
BDCHI option
OUTPUT statement (FREQ), 81
BDT option
TABLES statement (FREQ), 87
BEST= option
PROC CORR statement, 8
BETA option
HISTOGRAM statement (UNIVARIATE), 218,
288, 346
PROBPLOT statement (UNIVARIATE), 245,
301
QQPLOT statement (UNIVARIATE), 258, 301
BETA= option
HISTOGRAM statement (UNIVARIATE), 218,
289
PROBPLOT statement (UNIVARIATE), 245
QQPLOT statement (UNIVARIATE), 258
BINOMIAL option
EXACT statement (FREQ), 78
OUTPUT statement (FREQ), 81
TABLES statement (FREQ), 87, 166
BINOMIALC option
TABLES statement (FREQ), 88
BY statement
CORR procedure, 12
FREQ procedure, 77
UNIVARIATE procedure, 209
C
C= option
HISTOGRAM statement (UNIVARIATE), 218,
297, 352
PROBPLOT statement (UNIVARIATE), 246
QQPLOT statement (UNIVARIATE), 259, 375
CAXIS= option
HISTOGRAM statement (UNIVARIATE), 219
PROBPLOT statement (UNIVARIATE), 246
QQPLOT statement (UNIVARIATE), 259
CBARLINE= option
HISTOGRAM statement (UNIVARIATE), 219
CELLCHI2 option
TABLES statement (FREQ), 88
CFILL= option
HISTOGRAM statement (UNIVARIATE), 219
INSET statement (UNIVARIATE), 235
CFILLH= option
INSET statement (UNIVARIATE), 235
CFRAME= option
HISTOGRAM statement (UNIVARIATE), 219
INSET statement (UNIVARIATE), 235
PROBPLOT statement (UNIVARIATE), 246
QQPLOT statement (UNIVARIATE), 259
CFRAMESIDE= option
HISTOGRAM statement (UNIVARIATE), 219
PROBPLOT statement (UNIVARIATE), 246
QQPLOT statement (UNIVARIATE), 259
CFRAMETOP= option
HISTOGRAM statement (UNIVARIATE), 219
PROBPLOT statement (UNIVARIATE), 246
QQPLOT statement (UNIVARIATE), 259
CGRID= option
HISTOGRAM statement (UNIVARIATE), 219
PROBPLOT statement (UNIVARIATE), 246
QQPLOT statement (UNIVARIATE), 259
CHEADER= option
INSET statement (UNIVARIATE), 235
CHISQ option
EXACT statement (FREQ), 78, 169
OUTPUT statement (FREQ), 81
TABLES statement (FREQ), 88, 104, 169
CHREF= option
HISTOGRAM statement (UNIVARIATE), 219
PROBPLOT statement (UNIVARIATE), 246
QQPLOT statement (UNIVARIATE), 259
CIBASIC option
PROC UNIVARIATE statement, 204, 326
CIPCTLDF option
PROC UNIVARIATE statement, 204
CIPCTLNORMAL option
PROC UNIVARIATE statement, 205
CIQUANTDF option
PROC UNIVARIATE statement, 328
CIQUANTNORMAL option
PROC UNIVARIATE statement, 205, 328
CL option
TABLES statement (FREQ), 89
CLASS statement
UNIVARIATE procedure, 210
CMH option
OUTPUT statement (FREQ), 81
TABLES statement (FREQ), 89
CMH1 option
OUTPUT statement (FREQ), 81
TABLES statement (FREQ), 89
CMH2 option
OUTPUT statement (FREQ), 81
TABLES statement (FREQ), 89
CMHCOR option
OUTPUT statement (FREQ), 81
CMHGA option
OUTPUT statement (FREQ), 81
CMHRMS option
OUTPUT statement (FREQ), 81
COCHQ option
OUTPUT statement (FREQ), 81
COLOR= option
HISTOGRAM statement (UNIVARIATE), 219
PROBPLOT statement (UNIVARIATE), 246
QQPLOT statement (UNIVARIATE), 259
COMOR option
EXACT statement (FREQ), 78
COMPRESS option
PROC FREQ statement, 75
CONTENTS= option
TABLES statement (FREQ), 89
CONTGY option
OUTPUT statement (FREQ), 82
CONVERGE= option
TABLES statement (FREQ), 90
CORR procedure
syntax, 6
CORR procedure, BY statement, 12
CORR procedure, FREQ statement, 13
CORR procedure, PARTIAL statement, 13
CORR procedure, PLOTS option
NMAXVAR= option, 31, 32
NMAXWITH= option, 31, 32
CORR procedure, PLOTS=SCATTER option
ALPHA= option, 32
ELLIPSE= option, 32
NOINSET, 32
NOLEGEND, 32
CORR procedure, PROC CORR statement, 7
ALPHA option, 8
BEST= option, 8
COV option, 8
CSSCP option, 8
DATA= option, 9
EXCLNPWGT option, 9
FISHER option, 9
HOEFFDING option, 9
KENDALL option, 9
NOCORR option, 10
NOMISS option, 10
NOPRINT option, 10
NOPROB option, 10
NOSIMPLE option, 10
OUT= option, 10
OUTH= option, 10
OUTK= option, 10
OUTP= option, 10
OUTS= option, 10
PEARSON option, 11
RANK option, 11
SINGULAR= option, 11
SPEARMAN option, 11
SSCP option, 11
VARDEF= option, 11
CORR procedure, VAR statement, 13
CORR procedure, WEIGHT statement, 13
CORR procedure, WITH statement, 14
COV option
PROC CORR statement, 8
CPROP= option
HISTOGRAM statement (UNIVARIATE), 220,
345
CRAMV option
OUTPUT statement (FREQ), 82
CROSSLIST option
TABLES statement (FREQ), 90
CSHADOW= option
INSET statement (UNIVARIATE), 235
CSSCP option
PROC CORR statement, 8
CTEXT= option
HISTOGRAM statement (UNIVARIATE), 220
INSET statement (UNIVARIATE), 235
PROBPLOT statement (UNIVARIATE), 247
QQPLOT statement (UNIVARIATE), 259
CTEXTSIDE= option
HISTOGRAM statement (UNIVARIATE), 220
CTEXTTOP= option
HISTOGRAM statement (UNIVARIATE), 220
CUMCOL option
TABLES statement (FREQ), 91
CVREF= option
HISTOGRAM statement (UNIVARIATE), 220
PROBPLOT statement (UNIVARIATE), 247
QQPLOT statement (UNIVARIATE), 259
D
DATA option
INSET statement (UNIVARIATE), 235
DATA= option
INSET statement (UNIVARIATE), 235
PROC CORR statement, 9
PROC FREQ statement, 75
PROC UNIVARIATE statement, 205, 305
DESCENDING option
BY statement (UNIVARIATE), 209
DESCRIPTION= option
HISTOGRAM statement (UNIVARIATE), 220
PROBPLOT statement (UNIVARIATE), 247
QQPLOT statement (UNIVARIATE), 260
DEVIATION option
TABLES statement (FREQ), 91
E
ELLIPSE= option
PLOTS=SCATTER option (CORR), 32
ENDPOINTS= option
HISTOGRAM statement (UNIVARIATE), 220,
340
EQKAP option
OUTPUT statement (FREQ), 82
EQWKP option
OUTPUT statement (FREQ), 82
EXACT option
OUTPUT statement (FREQ), 82
EXACT statement
FREQ procedure, 77
EXCLNPWGT option
PROC CORR statement, 9
PROC UNIVARIATE statement, 205
EXPECTED option
TABLES statement (FREQ), 91
EXPONENTIAL option
HISTOGRAM statement (UNIVARIATE), 221,
289
PROBPLOT statement (UNIVARIATE), 247,
301
QQPLOT statement (UNIVARIATE), 260, 301
F
FILL option
HISTOGRAM statement (UNIVARIATE), 221
FISHER option
CONTGY option, 82
CRAMV option, 82
EQKAP option, 82
EQWKP option, 82
EXACT option, 82
FISHER option, 82
GAMMA option, 82
JT option, 82
KAPPA option, 82
KENTB option, 82
LAMCR option, 82
LAMDAS option, 82
LAMRC option, 82
LGOR option, 82
LGRRC1 option, 82
LGRRC2 option, 82
LRCHI option, 82, 173
MCNEM option, 82
MEASURES option, 82
MHCHI option, 82
MHOR option, 82
MHRRC1 option, 82
MHRRC2 option, 82
N option, 82
NMISS option, 82
OR option, 83
OUT= option, 80
PCHI option, 83, 173
PCORR option, 83
PHI option, 83
PLCORR option, 83
RDIF1 option, 83
RDIF2 option, 83
RELRISK option, 83
RISKDIFF option, 83
RISKDIFF1 option, 83
RISKDIFF2 option, 83
RRC1 option, 83
RRC2 option, 83
RSK1 option, 83
RSK11 option, 83
RSK12 option, 83
RSK2 option, 83
RSK21 option, 83
RSK22 option, 83
SCORR option, 83
SMDCR option, 83
SMDRC option, 83
STUTC option, 83
TREND option, 83
TSYMM option, 83
U option, 83
UCR option, 83
URC option, 83
WTKAP option, 83
FREQ procedure, PROC FREQ statement, 75
COMPRESS option, 75
DATA= option, 75
FORMCHAR= option, 75
NLEVELS option, 76
NOPRINT option, 76
ORDER= option, 76
PAGE option, 77
FREQ procedure, TABLES statement, 84
ALL option, 87
ALPHA= option, 87
BDT option, 87
BINOMIAL option, 87, 166
BINOMIALC option, 88
CELLCHI2 option, 88
CHISQ option, 88, 104, 169
CL option, 89
CMH option, 89
CMH1 option, 89
CMH2 option, 89
CONTENTS= option, 89
CONVERGE= option, 90
CROSSLIST option, 90
CUMCOL option, 91
DEVIATION option, 91
EXPECTED option, 91
FISHER option, 91
FORMAT= option, 91
JT option, 91
LIST option, 91
MAXITER= option, 92
MEASURES option, 92
MISSING option, 92
MISSPRINT option, 92
NOCOL option, 92
NOCUM option, 92
NOFREQ option, 92
NOPERCENT option, 92
NOPRINT option, 93
NOROW option, 93
NOSPARSE option, 93
NOWARN option, 93
option, 87
OUT= option, 93
OUTCUM option, 93
OUTEXPECT option, 93, 161
OUTPCT option, 94
PLCORR option, 94
PRINTKWT option, 94
RELRISK option, 94, 169
RISKDIFF option, 94
RISKDIFFC option, 94
SCORES= option, 95, 181
SCOROUT option, 95
SPARSE option, 95, 161
TESTF= option, 96, 104
TESTP= option, 96, 104, 164
TOTPCT option, 96
TREND option, 96, 177
FREQ procedure, TEST statement, 96
AGREE option, 97
GAMMA option, 97
KAPPA option, 97
KENTB option, 97
MEASURES option, 97
PCORR option, 97
SCORR option, 97
SMDCR option, 97, 177
SMDRC option, 97
STUTC option, 97
WTKAP option, 97
FREQ procedure, WEIGHT statement, 97
ZEROS option, 98
FREQ statement
CORR procedure, 13
UNIVARIATE procedure, 212
FRONTREF option
HISTOGRAM statement (UNIVARIATE), 222
G
GAMMA option
HISTOGRAM statement (UNIVARIATE), 222,
290, 348
OUTPUT statement (FREQ), 82
PROBPLOT statement (UNIVARIATE), 247,
301
QQPLOT statement (UNIVARIATE), 260, 301
TEST statement (FREQ), 97
GOUT= option
PROC UNIVARIATE statement, 205
GRID option
HISTOGRAM statement (UNIVARIATE), 222
PROBPLOT statement (UNIVARIATE), 248
QQPLOT statement (UNIVARIATE), 260, 263,
372
GRIDCHAR= option
QQPLOT statement (UNIVARIATE), 263
H
HEADER= option
INSET statement (UNIVARIATE), 236
HEIGHT= option
HISTOGRAM statement (UNIVARIATE), 222
INSET statement (UNIVARIATE), 236
PROBPLOT statement (UNIVARIATE), 248
QQPLOT statement (UNIVARIATE), 260
HISTOGRAM statement
UNIVARIATE procedure, 212
HMINOR= option
HISTOGRAM statement (UNIVARIATE), 222
PROBPLOT statement (UNIVARIATE), 248
QQPLOT statement (UNIVARIATE), 260
HOEFFDING option
PROC CORR statement, 9
HOFFSET= option
HISTOGRAM statement (UNIVARIATE), 222
HREF= option
HISTOGRAM statement (UNIVARIATE), 222
PROBPLOT statement (UNIVARIATE), 248
QQPLOT statement (UNIVARIATE), 261
HREFLABELS= option
HISTOGRAM statement (UNIVARIATE), 222
I
ID statement
UNIVARIATE procedure, 230
INFONT= option
HISTOGRAM statement (UNIVARIATE), 223
PROBPLOT statement (UNIVARIATE), 248
QQPLOT statement (UNIVARIATE), 261
INHEIGHT= option
HISTOGRAM statement (UNIVARIATE), 223
PROBPLOT statement (UNIVARIATE), 248
QQPLOT statement (UNIVARIATE), 261
INSET statement
UNIVARIATE procedure, 230
INTERTILE= option
HISTOGRAM statement (UNIVARIATE), 223,
345
PROBPLOT statement (UNIVARIATE), 248
QQPLOT statement (UNIVARIATE), 261
J
JT option
EXACT statement (FREQ), 78
OUTPUT statement (FREQ), 82
TABLES statement (FREQ), 91
K
K= option
HISTOGRAM statement (UNIVARIATE), 223,
297
KAPPA option
EXACT statement (FREQ), 78
OUTPUT statement (FREQ), 82
TEST statement (FREQ), 97
KENDALL option
PROC CORR statement, 9
KENTB option
OUTPUT statement (FREQ), 82
TEST statement (FREQ), 97
KERNEL option
HISTOGRAM statement (UNIVARIATE), 223,
297, 352
KEYLEVEL= option
CLASS statement (UNIVARIATE), 211
PROC UNIVARIATE statement, 337
L
L= option
HISTOGRAM statement (UNIVARIATE), 224
PROBPLOT statement (UNIVARIATE), 248
QQPLOT statement (UNIVARIATE), 261
LABEL= option
QQPLOT statement (UNIVARIATE), 263
LAMCR option
OUTPUT statement (FREQ), 82
LAMDAS option
OUTPUT statement (FREQ), 82
LAMRC option
OUTPUT statement (FREQ), 82
LGOR option
OUTPUT statement (FREQ), 82
LGRID= option
HISTOGRAM statement (UNIVARIATE), 224
PROBPLOT statement (UNIVARIATE), 248
QQPLOT statement (UNIVARIATE), 261, 263
LGRRC1 option
OUTPUT statement (FREQ), 82
LGRRC2 option
OUTPUT statement (FREQ), 82
LHREF= option
HISTOGRAM statement (UNIVARIATE), 224
PROBPLOT statement (UNIVARIATE), 249
QQPLOT statement (UNIVARIATE), 261
LIST option
TABLES statement (FREQ), 91
LOCCOUNT option
PROC UNIVARIATE statement, 205, 331
LOGNORMAL option
HISTOGRAM statement (UNIVARIATE), 224,
291, 348, 354, 363
PROBPLOT statement (UNIVARIATE), 249,
302, 360
QQPLOT statement (UNIVARIATE), 261, 302
LOWER= option
HISTOGRAM statement (UNIVARIATE), 224
LRCHI option
EXACT statement (FREQ), 78
OUTPUT statement (FREQ), 82, 173
LVREF= option
HISTOGRAM statement (UNIVARIATE), 224
PROBPLOT statement (UNIVARIATE), 249
QQPLOT statement (UNIVARIATE), 262
M
MAXITER= option
TABLES statement (FREQ), 92
MAXNBIN= option
HISTOGRAM statement (UNIVARIATE), 224
MAXSIGMAS= option
HISTOGRAM statement (UNIVARIATE), 224
MAXTIME= option
EXACT statement (FREQ), 79
MC option
EXACT statement (FREQ), 79
MCNEM option
EXACT statement (FREQ), 78
OUTPUT statement (FREQ), 82
MEASURES option
EXACT statement (FREQ), 78
OUTPUT statement (FREQ), 82
TABLES statement (FREQ), 92
TEST statement (FREQ), 97
MHCHI option
EXACT statement (FREQ), 78
OUTPUT statement (FREQ), 82
MHOR option
OUTPUT statement (FREQ), 82
MHRRC1 option
OUTPUT statement (FREQ), 82
MHRRC2 option
OUTPUT statement (FREQ), 82
MIDPERCENTS option
HISTOGRAM statement (UNIVARIATE), 225,
343
MIDPOINTS= option
HISTOGRAM statement (UNIVARIATE), 225,
338, 340
MISSING option
CLASS statement (UNIVARIATE), 210
TABLES statement (FREQ), 92
MISSPRINT option
TABLES statement (FREQ), 92
MODES option
PROC UNIVARIATE statement, 206, 314
MU0= option
PROC UNIVARIATE statement, 206
MU= option
HISTOGRAM statement (UNIVARIATE), 226, 343
PROBPLOT statement (UNIVARIATE), 249
QQPLOT statement (UNIVARIATE), 262, 366
N
N option
OUTPUT statement (FREQ), 82
N= option
EXACT statement (FREQ), 79
NADJ= option
PROBPLOT statement (UNIVARIATE), 249
QQPLOT statement (UNIVARIATE), 262, 298
NAME= option
HISTOGRAM statement (UNIVARIATE), 226
PROBPLOT statement (UNIVARIATE), 249
QQPLOT statement (UNIVARIATE), 262
NCOLS= option
HISTOGRAM statement (UNIVARIATE), 226
PROBPLOT statement (UNIVARIATE), 249
QQPLOT statement (UNIVARIATE), 262
NEXTROBS= option
PROC UNIVARIATE statement, 206, 315
NEXTRVAL= option
PROC UNIVARIATE statement, 206, 315
NLEVELS option
PROC FREQ statement, 76
NMAXVAR= option
PLOTS option (CORR), 31
PLOTS=SCATTER option (CORR), 32
NMAXWIDTH= option
PLOTS option (CORR), 31
PLOTS=SCATTER option (CORR), 32
NMISS option
OUTPUT statement (FREQ), 82
NOBARS option
HISTOGRAM statement (UNIVARIATE), 226
NOBYPLOT option
PROC UNIVARIATE statement, 206
NOCOL option
TABLES statement (FREQ), 92
NOCORR option
PROC CORR statement, 10
NOCUM option
TABLES statement (FREQ), 92
NOFRAME option
HISTOGRAM statement (UNIVARIATE), 226
INSET statement (UNIVARIATE), 236
PROBPLOT statement (UNIVARIATE), 249
QQPLOT statement (UNIVARIATE), 262
NOFREQ option
TABLES statement (FREQ), 92
NOHLABEL option
HISTOGRAM statement (UNIVARIATE), 226
PROBPLOT statement (UNIVARIATE), 250
QQPLOT statement (UNIVARIATE), 262
NOINSET option
PLOTS=SCATTER option (CORR), 32
NOLEGEND option
PLOTS=SCATTER option (CORR), 32
NOMISS option
PROC CORR statement, 10
NOPERCENT option
TABLES statement (FREQ), 92
NOPLOT option
HISTOGRAM statement (UNIVARIATE), 226
NOPRINT option
HISTOGRAM statement (UNIVARIATE), 226
PROC CORR statement, 10
PROC FREQ statement, 76
PROC UNIVARIATE statement, 206
TABLES statement (FREQ), 93
NOPROB option
PROC CORR statement, 10
NORMAL option
HISTOGRAM statement (UNIVARIATE), 226, 291, 343
PROBPLOT statement (UNIVARIATE), 250, 302
PROC UNIVARIATE statement, 206
QQPLOT statement (UNIVARIATE), 262, 302
NORMALTEST option
PROC UNIVARIATE statement, 206
NOROW option
TABLES statement (FREQ), 93
NOSIMPLE option
PROC CORR statement, 10
NOSPARSE option
TABLES statement (FREQ), 93
NOTSORTED option
BY statement (UNIVARIATE), 209
NOVLABEL option
HISTOGRAM statement (UNIVARIATE), 226
O
OR option
EXACT statement (FREQ), 78, 169
OUTPUT statement (FREQ), 83
ORDER= option
CLASS statement (UNIVARIATE), 210
PROC FREQ statement, 76
OUT= option
OUTPUT statement (FREQ), 80
OUTPUT statement (UNIVARIATE), 237
PROC CORR statement, 10
TABLES statement (FREQ), 93
OUTCUM option
TABLES statement (FREQ), 93
OUTEXPECT option
TABLES statement (FREQ), 93, 161
OUTH= option
PROC CORR statement, 10
OUTHISTOGRAM= option
HISTOGRAM statement (UNIVARIATE), 227, 308, 340
OUTK= option
PROC CORR statement, 10
OUTP= option
PROC CORR statement, 10
OUTPCT option
TABLES statement (FREQ), 94
OUTPUT statement
FREQ procedure, 80
UNIVARIATE procedure, 237, 267
OUTS= option
PROC CORR statement, 10
P
PAGE option
PROC FREQ statement, 77
PARTIAL statement
CORR procedure, 13
PCHI option
EXACT statement (FREQ), 78
OUTPUT statement (FREQ), 83, 173
PCORR option
EXACT statement (FREQ), 78
OUTPUT statement (FREQ), 83
TEST statement (FREQ), 97
PCTLAXIS option
QQPLOT statement (UNIVARIATE), 263, 305, 372
PCTLDEF= option
PROC UNIVARIATE statement, 207, 273
PCTLMINOR option
PROBPLOT statement (UNIVARIATE), 250
QQPLOT statement (UNIVARIATE), 263
PCTLNAME= option
OUTPUT statement (UNIVARIATE), 240
PCTLORDER= option
PROBPLOT statement (UNIVARIATE), 250
PCTLPRE= option
OUTPUT statement (UNIVARIATE), 239
PCTLPTS= option
OUTPUT statement (UNIVARIATE), 239
PCTLSCALE option
QQPLOT statement (UNIVARIATE), 263, 305
PEARSON option
PROC CORR statement, 11
PERCENTS= option
HISTOGRAM statement (UNIVARIATE), 227
PFILL= option
HISTOGRAM statement (UNIVARIATE), 227
PHI option
OUTPUT statement (FREQ), 83
PLCORR option
OUTPUT statement (FREQ), 83
TABLES statement (FREQ), 94
PLOT option
PROC UNIVARIATE statement, 319
PLOTS option
PROC UNIVARIATE statement, 207
PLOTSIZE= option
PROC UNIVARIATE statement, 207
POINT option
EXACT statement (FREQ), 79
POSITION= option
INSET statement (UNIVARIATE), 236
PRINTKWT option
TABLES statement (FREQ), 94
PROBPLOT statement
UNIVARIATE procedure, 241
PROC CORR statement, 7. See CORR procedure
CORR procedure, 7
PROC FREQ statement. See FREQ procedure
PROC UNIVARIATE statement, 203. See UNIVARIATE procedure
Q
QQPLOT statement
UNIVARIATE procedure, 253
R
RANK option
PROC CORR statement, 11
RANKADJ= option
S
SCALE= option
HISTOGRAM statement (UNIVARIATE), 227, 289, 290, 346
PROBPLOT statement (UNIVARIATE), 250
QQPLOT statement (UNIVARIATE), 263
SCORES= option
TABLES statement (FREQ), 95, 181
SCOROUT option
TABLES statement (FREQ), 95
SCORR option
EXACT statement (FREQ), 78
OUTPUT statement (FREQ), 83
TEST statement (FREQ), 97
SEED= option
EXACT statement (FREQ), 79
SHAPE= option
HISTOGRAM statement (UNIVARIATE), 227
PROBPLOT statement (UNIVARIATE), 251
QQPLOT statement (UNIVARIATE), 264
SIGMA= option
HISTOGRAM statement (UNIVARIATE), 227, 289, 343
PROBPLOT statement (UNIVARIATE), 251, 360
QQPLOT statement (UNIVARIATE), 264, 366, 369
SINGULAR= option
PROC CORR statement, 11
SLOPE= option
PROBPLOT statement (UNIVARIATE), 251
QQPLOT statement (UNIVARIATE), 264
SMDCR option
OUTPUT statement (FREQ), 83
TEST statement (FREQ), 97, 177
SMDRC option
OUTPUT statement (FREQ), 83
TEST statement (FREQ), 97
SPARSE option
TABLES statement (FREQ), 95, 161
SPEARMAN option
PROC CORR statement, 11
SQUARE option
PROBPLOT statement (UNIVARIATE), 251, 360
QQPLOT statement (UNIVARIATE), 265, 366
SSCP option
PROC CORR statement, 11
STUTC option
OUTPUT statement (FREQ), 83
TEST statement (FREQ), 97
T
TABLES statement
FREQ procedure, 84
TEST statement
FREQ procedure, 96
TESTF= option
TABLES statement (FREQ), 96, 104
TESTP= option
TABLES statement (FREQ), 96, 104, 164
THETA= option
HISTOGRAM statement (UNIVARIATE), 227, 289, 346, 354, 363
PROBPLOT statement (UNIVARIATE), 252
QQPLOT statement (UNIVARIATE), 265
THRESHOLD= option
HISTOGRAM statement (UNIVARIATE), 228, 290
PROBPLOT statement (UNIVARIATE), 252
QQPLOT statement (UNIVARIATE), 265
TOTPCT option
TABLES statement (FREQ), 96
TREND option
EXACT statement (FREQ), 78, 177
OUTPUT statement (FREQ), 83
TABLES statement (FREQ), 96, 177
TRIMMED= option
PROC UNIVARIATE statement, 207, 329
TSYMM option
OUTPUT statement (FREQ), 83
TURNVLABELS option
HISTOGRAM statement (UNIVARIATE), 228
U
U option
OUTPUT statement (FREQ), 83
UCR option
OUTPUT statement (FREQ), 83
UNIVARIATE procedure
syntax, 202
UNIVARIATE procedure, BY statement, 209
DESCENDING option, 209
NOTSORTED option, 209
UNIVARIATE procedure, CLASS statement, 210
KEYLEVEL= option, 211
MISSING option, 210
ORDER= option, 210
UNIVARIATE procedure, FREQ statement, 212
UNIVARIATE procedure, HISTOGRAM statement, 212
ALPHA= option, 217, 289
ANNOKEY option, 217
ANNOTATE= option, 218, 355
BARWIDTH= option, 218
BETA option, 218, 288, 346
BETA= option, 218, 289
C= option, 218, 297, 352
CAXIS= option, 219
CBARLINE= option, 219
CFILL= option, 219
CFRAME= option, 219
CFRAMESIDE= option, 219
CFRAMETOP= option, 219
CGRID= option, 219
CHREF= option, 219
COLOR= option, 219
CPROP= option, 220, 345
CTEXT= option, 220
CTEXTSIDE= option, 220
CTEXTTOP= option, 220
CVREF= option, 220
DESCRIPTION= option, 220
ENDPOINTS= option, 220, 340
EXPONENTIAL option, 221, 289
FILL option, 221
FONT= option, 221
FORCEHIST option, 222
FRONTREF option, 222
GAMMA option, 222, 290, 348
GRID option, 222
HEIGHT= option, 222
NOFRAME option, 249
NOHLABEL option, 250
NORMAL option, 250, 302
NOVLABEL option, 250
NOVTICK option, 250
NROWS= option, 250
PCTLMINOR option, 250
PCTLORDER= option, 250
RANKADJ= option, 250
SCALE= option, 250
SHAPE= option, 251
SIGMA= option, 251, 360
SLOPE= option, 251
SQUARE option, 251, 360
THETA= option, 252
THRESHOLD= option, 252
VAXISLABEL= option, 252
VMINOR= option, 252
VREF= option, 252
VREFLABELS= option, 252
VREFLABPOS= option, 252
W= option, 252
WAXIS= option, 252
WEIBULL option, 253, 302
WEIBULL2 option, 253, 303
ZETA= option, 253
UNIVARIATE procedure, PROC UNIVARIATE statement, 203
ALL option, 204
ALPHA= option, 204
ANNOTATE= option, 204, 305
CIBASIC option, 204, 326
CIPCTLDF option, 204
CIPCTLNORMAL option, 205
CIQUANTDF option, 328
CIQUANTNORMAL option, 205, 328
DATA= option, 205, 305
EXCLNPWGT option, 205
FREQ option, 205, 317
GOUT= option, 205
KEYLEVEL= option, 337
LOCCOUNT option, 205, 331
MODES option, 206, 314
MU0= option, 206
NEXTROBS= option, 206, 315
NEXTRVAL= option, 206, 315
NOBYPLOT option, 206
NOPRINT option, 206
NORMAL option, 206
NORMALTEST option, 206
PCTLDEF= option, 207, 273
PLOT option, 319
PLOTS option, 207
PLOTSIZE= option, 207
ROBUSTSCALE option, 207, 329
ROUND= option, 207
TRIMMED= option, 207, 329
VARDEF= option, 208
V
VAR statement
CORR procedure, 13
UNIVARIATE procedure, 266
VARDEF= option
PROC CORR statement, 11
PROC UNIVARIATE statement, 208
VAXIS= option
HISTOGRAM statement (UNIVARIATE), 228
VAXISLABEL= option
HISTOGRAM statement (UNIVARIATE), 228
PROBPLOT statement (UNIVARIATE), 252
QQPLOT statement (UNIVARIATE), 265
VMINOR= option
HISTOGRAM statement (UNIVARIATE), 228
PROBPLOT statement (UNIVARIATE), 252
QQPLOT statement (UNIVARIATE), 265
VOFFSET= option
HISTOGRAM statement (UNIVARIATE), 228
VREF= option
HISTOGRAM statement (UNIVARIATE), 228
PROBPLOT statement (UNIVARIATE), 252
QQPLOT statement (UNIVARIATE), 265
VREFLABELS= option
HISTOGRAM statement (UNIVARIATE), 228
PROBPLOT statement (UNIVARIATE), 252
QQPLOT statement (UNIVARIATE), 265
VREFLABPOS= option
HISTOGRAM statement (UNIVARIATE), 229
PROBPLOT statement (UNIVARIATE), 252
QQPLOT statement (UNIVARIATE), 265
VSCALE= option
HISTOGRAM statement (UNIVARIATE), 229
W
W= option
HISTOGRAM statement (UNIVARIATE), 229
PROBPLOT statement (UNIVARIATE), 252
QQPLOT statement (UNIVARIATE), 265
WAXIS= option
HISTOGRAM statement (UNIVARIATE), 229
PROBPLOT statement (UNIVARIATE), 252
Z
ZEROS option
WEIGHT statement (FREQ), 98
ZETA= option
HISTOGRAM statement (UNIVARIATE), 229
PROBPLOT statement (UNIVARIATE), 253
QQPLOT statement (UNIVARIATE), 266, 369