R Integration White Paper
Trademarks
Dell, the Dell logo, Dell Statistica, and Dell Statistica Enterprise Server are trademarks of Dell Inc. Other trademarks and
trade names may be used in this document to refer to either the entities claiming the marks and names or their products.
Dell disclaims any proprietary interest in the marks and names of others.
Legend
CAUTION: A CAUTION icon indicates potential damage to hardware or loss of data if instructions are not
followed.
WARNING: A WARNING icon indicates a potential for property damage, personal injury, or death.
IMPORTANT, NOTE, TIP, MOBILE, or VIDEO: An information icon indicates supporting information.
Contents
Introduction
Overview
Introduction
R is a programming language and environment for statistical computing (www.r-project.org). Most of the R
environment and its source code are currently available under the GNU General Public License (GPL)
(www.r-project.org/licenses).
NOTE: None of the components of the R environment constitutes unrestricted freeware. Instead, they
are available only under the terms of specific licenses. If you intend to download those components,
you must accept the license terms before downloading and comply with them afterward. Because those
licenses can change over time, review the terms every time you download any component of the R
environment.
Statistica can interface with third-party applications, such as Microsoft Office, through certain
standards-compliant channels.
EXAMPLE: COM is an interface for interprocess communication that is built into Microsoft operating systems.
Even though R itself does not provide a COM interface, Statistica has created a separate connector application,
COMadaptR, licensed under LGPL (>=2.1) with parts under GPL2, that facilitates communication with R.
This package is based on the GPL2/LGPL2 version of an earlier application, the statconnDCOM library.
Statistica can interface with R via COMadaptR. If this library is installed on the system, you can open
R scripts and submit them to R for execution from within the Statistica environment. The interface facilitates
bidirectional data transfer and the presentation of resulting output through a user-adjustable macro that is
executed in Statistica in place of the R script.
This interface makes it possible for all Statistica products, ranging from Desktop to Enterprise and Web
solutions, to provide comprehensive support for interaction with the R platform on systems where necessary
third-party components and libraries are present, allowing you to:
run R scripts within the Statistica environment, sending results to Statistica reports, workbooks, and
graphs
process Statistica data sets in R and import tabular results from R into Statistica spreadsheets
call R from Statistica Visual Basic (SVB) to create new functionality that leverages R libraries
utilize R in Statistica Enterprise, Statistica Server, and Statistica Workspaces
NOTE: You are responsible for ensuring compliance with terms of all applicable licenses for R and all
components of the R environment. Always carefully review all the license agreements before accepting
them, as they can change over time. Those products are external to the Statistica environment, and are
not covered by any Statistica license agreements.
Automatic installation of the COMadaptR support library is included in Statistica Version 12.0 SP3 and above,
or is available with V12A maintenance update UPD008. When the system detects that an R installation is
present, it will offer you the opportunity to automatically install the COMadaptR package in the R
environment. This package is included in the Statistica distribution.
You can verify that the COMadaptR library is installed correctly, and that all its dependencies are satisfied, by
running one of the R examples that accompany Statistica. Once all the required third-party components are
installed, you should be able to open and execute R scripts within the Statistica environment. To locate the
examples, on the File tab, click Open Examples and browse to the R folder.
R is highly extensible. Users can submit libraries (packages), implementing a set of functions, usually for a
specific area of their expertise/research. The R community maintains several centralized repositories, which
make hundreds of such packages readily available to all users over the Internet. Many of these packages cater
specifically to highly specialized audiences with particular data analysis needs.
Overview
This document provides a detailed description of features that make the diversity and power of R fully
available to users of all Statistica solutions. Furthermore, these features enable you to combine the unique
capabilities of both Statistica and R platforms.
You can run native R scripts directly within Statistica, Statistica Enterprise, and Statistica Enterprise
Server.
You can retrieve R output as native Statistica spreadsheets and graphs, and manage them via highly
flexible Statistica workbook containers.
Enterprises can now use the specialized routines and capabilities of R with Statistica, Statistica Enterprise, and
Statistica Enterprise Server for these functionalities:
Enables users to run R scripts as is, and retrieve the results into Statistica reports:
    All R console output is copied into the report, and R commands are highlighted.
    Plots generated by the script are automatically embedded in the report as scalable images.
    These plots are also replicated as Statistica graphs (scalable metafile images are placed inside
    these graphs), thus enabling annotation using the powerful graphical facilities in Statistica.
Provides R language extension functions for R scripts run from the Statistica environment:
    Transfer data from Statistica spreadsheets into R data frames.
    Extract tabular data from R variables into Statistica spreadsheets. These results spreadsheets
    (as well as all graphs produced by the script) are returned (routed), according to Output Manager
    settings, into Statistica workbooks.
Executes R scripts as native macros from within Statistica Visual Basic (SVB) programs:
    Scripts can be parameterized with a Collection of objects (numbers, strings, arrays, additional
    R code or overridden R functions, spreadsheets), which are mapped to R variables accessible to the
    script. This approach provides fine-grained control over scripts' behavior in repeated runs or when
    they are used as the backend for custom Statistica modules.
Taken together, these enhancements not only enable you to run R scripts directly in the Statistica desktop
environment, but also provide a way to embed specialized R functionality into these features:
SVB programs
Custom interactive analysis modules
Workspace nodes
Enterprise analysis configurations
You can also offload such scripts to Statistica Enterprise Server for server-side processing.
Enterprise customers have the opportunity to integrate R into validated Statistica Enterprise solutions, such as
those used by FDA-regulated industries, or to provide a very powerful Statistica Enterprise Server-based R
server environment.
CAUTION: Developing and debugging R scripts: The R support in Statistica was not designed to supply
a complete R development and debugging environment. The console application and tools supplied with
standard R installation perform those functions very well and are already familiar to R users and
developers.
NOTE: Depending on your operating system version and settings, the system may ask you to confirm
these actions, possibly requesting administrator credentials.
CAUTION: Usage of such an interface directly by end users is very ineffective, sometimes unproductive,
and usually inflexible. It may also significantly degrade the overall performance of interactions with R if
performed incorrectly.
Therefore, the following architectural extensions have been added to the Statistica platform to provide a
seamless and effective R integration experience for end users. The example mentioned above also
demonstrates the significant reduction in the effort required to implement the same analysis using the new
built-in features. In Statistica, simply open and run DoseResponse.r.
R script called R.r: The support script R.r implements Statistica-specific R language extension functions.
When an R script is executed in Statistica, the support macro parses the script and executes these processes:
1 Transfers data and script parameters
2 Submits script content to the R environment
3 Manages error conditions
4 Handles script outputs, ensuring that they are properly transferred back to Statistica.
CAUTION: The support code in R.r is write-protected by default. You can inspect it, and you can
modify or enhance it to support new functionality required for specific use cases, but you do so
at your own risk.
Statistica Visual Basic macro called R.svb: The R.svb macro supports standalone execution to simplify
debugging and testing of modifications.
R scripts are displayed in slightly modified Statistica Visual Basic Macro windows. These windows actually
contain two scripts, the R script itself and the R Integration Support Macro (R.svb), accessible through two
tabs in the upper-left corner (circled in the next image).
NOTE: In order to take advantage of R Integration features described in this document, R scripts should
be executed from within Statistica.
CAUTION: Although you can develop and debug complex R programs within this environment, it was not
specifically designed for these purposes. The R environment itself is better suited for such activities.
Retrieving results
The minimal output produced during execution of an R script is a Statistica report that represents an R
console session. It includes highlighted commands and any output generated by the R environment. This
kind of report is produced even if the script is empty. You can edit and manipulate the contents of this
report in the same way as any other Statistica report.
The metafile images are embedded into graph objects as a locked background.
You can annotate R plots in Statistica, using a familiar point-and-click interface, with a set of text and drawing
objects such as lines and arrows, rectangles and ellipses, polygons and pattern/color fill areas, etc.
Since these annotations are anchored to relative positions in the plot area, they will remain correctly attached
to the plot if the graph is resized.
You can flexibly design and further enhance these graphs using Statistica graphics tools. You can save them in
other formats (JPG or GIF), or print them to PDF files.
Since the individual R plot components (the structural elements of the plot) are not accessible for
manipulation in Statistica graphs, the rich capabilities of Statistica for creating and then further editing graphs
(scaling, point markers, fit lines, etc.) are not available.
However, integration between R and Statistica lets you extract data from R. You can then render important
graphs inside the Statistica environment by writing Statistica Visual Basic macros that will execute R scripts,
extract results, and then post-process those results as necessary, which will be illustrated later.
IMPORTANT: The R language is case sensitive; therefore, R language extensions for Statistica are also
case sensitive. Extensions are only recognized by the Statistica environment when typed exactly as
shown.
Tabular data represented in Statistica in the form of spreadsheets are mapped into the equivalent R
structures: data frames.
Mapping preserves as much information as possible for both formats, as illustrated by these examples:
ActiveDataSet
The ActiveDataSet keyword has been adopted from the Statistica Visual Basic language. It performs the same
function in R scripts by referencing the active Statistica data spreadsheet.
In the desktop Statistica environment, the active data set usually means the top-most visible spreadsheet that
can act as a data source. However, it can also be a spreadsheet in a workbook selected as Active Input.
This concept is redefined and extended for server-based environments (Statistica Enterprise Server,
Enterprise), but the keyword is still valid and refers to the corresponding server-side mapping of the active
data source.
If no active data set is defined/available, the R script that uses it will fail. The same is true for SVB macros.
If the R Integration Support Macro encounters the ActiveDataSet keyword in the R script, it transfers the actual
Statistica active data set into the R environment and assigns it to a variable of the same name. Therefore, this
keyword represents a data frame variable and can be handled as such in the script.
EXAMPLE:
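A minimal sketch of handling ActiveDataSet as an ordinary data frame. Inside Statistica the keyword is supplied by the R Integration Support Macro. The stand-in data frame and its column names (DOSE, RESPONSE) are assumptions added only so that the sketch also runs in a plain R session.

```r
# Inside Statistica, ActiveDataSet is supplied by the R Integration Support Macro.
# Outside Statistica it does not exist, so a small stand-in with hypothetical
# data is created here purely to keep the sketch runnable.
if (!exists("ActiveDataSet")) {
  ActiveDataSet <- data.frame(DOSE = c(1, 2, 4, 8),
                              RESPONSE = c(0.9, 1.8, 4.1, 7.9))
}

# From here on, ActiveDataSet is handled like any other data frame.
stopifnot(is.data.frame(ActiveDataSet))
n_cases   <- nrow(ActiveDataSet)     # number of cases (rows)
var_names <- names(ActiveDataSet)    # variable (column) names

# Keep only the numeric variables and compute their means.
is_num       <- vapply(ActiveDataSet, is.numeric, logical(1))
column_means <- colMeans(ActiveDataSet[is_num], na.rm = TRUE)
print(column_means)
```

Because the keyword is case sensitive, it must be typed exactly as ActiveDataSet for Statistica to recognize it.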
Definitions of Functions:
This section explains the following function:
Spreadsheet(path.or.object, [applySelectionConditions], [recodeTextLabelAsFactors], [recodeMissingDataToNA],
[getCaseNames], [getTextValues], [attachObject], [username], [password], [connectionstring], [stationname])
Spreadsheet()
Like the ActiveDataSet keyword, the return value of the Spreadsheet() function should be treated as a data
frame variable with the contents closely matching that of the corresponding Statistica spreadsheet.
One useful feature supported by this function is the use of default search paths for spreadsheet files that are
specified only as simple file names. This means that if the function parameter consists only of a file name,
such as Spreadsheet("some.sta"), the R integration support code looks for this file in several locations:
Support code will also append the default .sta file extension if one is not present. Therefore the following
options are available:
R scripts can reference the accompanying data sets (placed in the same folder) simply by name.
Spreadsheets that are included in every Statistica installation as demonstration (example) data sets
can be referenced by name in much the same way as built-in R data sets.
EXAMPLE:
RouteOutput(x,name,description)
RouteOutput transfers various types of data from the R environment into Statistica spreadsheets:
Although the function was introduced to retrieve tabular data (such as data frames, matrices, or arrays) into
spreadsheets, single-value data, such as numbers or strings, can be passed as well. They will be placed in
single-cell spreadsheets. x can be an R variable or a literal value.
The extension, RouteOutput(), is similar to the equivalent Statistica Visual Basic function. The results
spreadsheets recreated by this function in the Statistica environment become the standard output of the R
script/analysis. They follow Output Manager settings in Statistica.
NOTE: To review the Output Manager settings in Statistica:
1 Select the Tools tab.
2 Click Options.
3 In the Options dialog box, select the Analyses/Graphs: Output Manager tab.
The results are routed either to individual windows or to a workbook (or multiple workbooks) for each analysis,
with optional output reports, such as a Microsoft Word document. The most popular setting is a single results
workbook.
The optional parameters name and description specify the name and header of the resulting spreadsheet. Provide a
spreadsheet name so that the result is easy to identify in the tree view of the results workbook.
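A minimal sketch of routing a data frame, a matrix, and a single value, using the RouteOutput(x, name, description) argument order given above. RouteOutput() exists only when the script runs inside Statistica, so the fallback defined here is a hypothetical stand-in that just prints its input. The data values and labels are illustrative assumptions.

```r
# RouteOutput() is provided by Statistica when the script runs inside it.
# This fallback is a hypothetical stand-in that only prints its input,
# so that the sketch can be tried in a plain R session.
if (!exists("RouteOutput")) {
  RouteOutput <- function(x, name = "", description = "") {
    cat("---", name, "-", description, "---\n")
    print(x)
    invisible(x)
  }
}

# Tabular data: a data frame becomes one results spreadsheet.
scores <- data.frame(GROUP = c("A", "B", "C"), MEAN = c(4.2, 5.1, 3.8))
RouteOutput(scores, "Group means", "Mean score per group")

# A matrix is routed the same way (mtcars is a built-in R data set).
cor_matrix <- cor(mtcars[, c("mpg", "hp", "wt")])
RouteOutput(cor_matrix, "Correlations", "mpg, hp, and wt")

# A single value is placed in a single-cell spreadsheet.
RouteOutput(nrow(mtcars), "Case count", "Number of rows in mtcars")
```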
NOTE: R plots transferred into Statistica as native graphs do not require explicit output routing. All
plots generated during a script run are automatically transferred and routed according to Output
Manager settings.
Many functions in R, specifically the ones that perform statistical modeling, represent their results as
structured objects, sometimes of significant complexity.
These objects cannot be reduced to a single table, and therefore cannot be handled by the RouteOutput()
extension. They could be automatically searched for tabular components, but since the object structures are
specific to a particular method, such an approach would generate a significant amount of “junk” output.
However, since the results (the actual data of interest) are either stored in such objects as tabular
components or produced by applying an object’s method to some input data, this limitation does not pose any
problems. Particular results can be easily extracted from such a statistical model object and routed back to
Statistica.
EXAMPLE:
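A minimal sketch of this pattern, assuming a linear model fitted with lm() on the built-in mtcars data set. The coefficient table is a tabular component of the model summary, and the predictions come from applying the object's predict() method to new input data. The RouteOutput() fallback is a hypothetical stand-in for running outside Statistica, and the model and new cases are illustrative only.

```r
# Hypothetical stand-in so the sketch runs outside Statistica;
# inside Statistica the real RouteOutput() extension is used instead.
if (!exists("RouteOutput")) {
  RouteOutput <- function(x, name = "", description = "") {
    cat("---", name, "-", description, "---\n")
    print(x)
    invisible(x)
  }
}

# The fitted model is a structured object, not a single table.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# 1. Extract a tabular component: the coefficient matrix of the summary.
coef_table <- coef(summary(fit))
RouteOutput(coef_table, "Coefficients", "Coefficient table for mpg ~ wt + hp")

# 2. Apply an object method to input data: predictions for new cases.
new_cases <- data.frame(wt = c(2.5, 3.0), hp = c(100, 150))
predicted <- data.frame(new_cases,
                        PREDICTED_MPG = predict(fit, newdata = new_cases))
RouteOutput(predicted, "Predictions", "Predicted mpg for two new cases")
```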
Uses() Functions
This section covers the following functions: Uses(package,[lib], [repos], [quiet], [attachImports]).
The Uses() function ensures that the respective package (library), named package, and its dependencies
(packages that are required in order for this package to run) are:
1 installed on the computer/server where the R script is running
2 loaded into the R environment automatically
If some of these libraries are not present, they will be installed from the CRAN repository
(http://cran.r-project.org) and then loaded.
Parameter: Description
quiet: TRUE or FALSE (default) flag indicating whether interactive dialog boxes should be suppressed
during package installation
attachImports: TRUE or FALSE (default) flag indicating whether to attach imports from the package and
its dependencies to the current R session
NOTE: This extension is not necessary for interaction between Statistica and R. You can implement it
within the R language itself. It simplifies the process of conditional library installation and loading by
encapsulating it in a single call.
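A minimal sketch of what conditional installation and loading might look like in plain R. The helper name ensure_package() and its defaults are assumptions for illustration. This is not the implementation of Uses() in R.r, and it does not cover the lib or attachImports parameters.

```r
# ensure_package() is a hypothetical helper, not part of Statistica or base R.
ensure_package <- function(pkg,
                           repos = "https://cran.r-project.org",
                           quiet = FALSE) {
  # Install only when the package cannot be found in the current library paths.
  # install.packages() also installs the required dependencies by default.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos, quiet = quiet)
  }
  # Attach the package; library() stops with an error if it is still missing.
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# The stats package ships with R, so this call downloads nothing.
ensure_package("stats")
```

Inside Statistica, a single Uses() call covers these steps.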
EXAMPLE:
Uses("drc") # make sure that the respective package is installed and loaded
...
DR <- multdrc(SLOPE ~ DOSE, CURVE, data = PestSci) # call the package methods
This program fits dose response curves to the respective variables of the built-in PestSci data set by calling the
multdrc function defined in the "drc" package. Uses("drc") ensures that the function is available by installing
and loading the package, if necessary.
Create
Open
Edit
Save
Execute
R functionality is available in Statistica Enterprise analysis configurations and Statistica Workspace nodes,
since they are SVB-based.
Existing R script files can be opened with Macros.Open("path\to\some.r") or created on the fly with
Macros.New() and Macro.Code.
In the latter case, Statistica needs help in distinguishing R scripts from SVB macros, which can be achieved by
either of two methods:
Specify the name for a new macro with the .R extension, even if you are not going to save it on disk.
Explicitly set Macro.Scripting to 5 (R Macro Type).
Run the scripts by calling Macro.Execute.
NOTE: The Macro.Scripting type for R scripts is 5. It will later be mapped to a symbolic constant.
EXAMPLE:
Sub Main
Dim R As New Macro
R.Code = "ActiveDataSet" ' simple R script created on-the-fly
R.Scripting = 5 ' R Macro Type = 5
R.Execute
End Sub
This Statistica Visual Basic macro runs a simple R script containing only a single command, ActiveDataSet. As
described in the previous section, this keyword is an R language extension for Statistica that transfers (and in
this case displays) the currently active Statistica data file in R.
EXAMPLE: If you run this macro after opening the example data file Exp.sta, a listing of that file will be
displayed in a report window that represents the R console session, as shown below:
NOTE: String parameters without a keyword tag are treated as R code that should be executed prior to
execution of the script itself. This is analogous to SVB hidden code and can be used to, for example,
define a common set of new functions or global variables/constants.
In order to use Collection objects, SVB scripts must include a reference to the Statistica Collection Library.
Add it via Tools > References while editing the macro.
The Collection object has several generic properties and methods that are needed to manipulate the contents
of a Collection:
Count
Exists (key)
Item (key), which adds a member to the collection with the specified key or returns an existing
member of the collection with the specified index or key
Add (item, [key]), which adds a member to the collection with an optional associated key
Remove (Index or key)
Remove All
R Integration in Dell Statistica 13.1
Copyright © Dell, 2016
However, due to the use of so-called default object properties (Item is the default property of a Collection
and it returns its Value property by default), interaction with a Collection object is reduced to intuitively
clear assignment operations:
EXAMPLE:
Once you have assembled the parameter Collection, execute the parameterized R script by calling
Macro.ExecuteWithArgument(Parameters As Collection).
EXAMPLE:
Dim s1 As New Spreadsheet, s2 As New Spreadsheet ' ... populate s1, s2 with data
var1 = Array("CASE 1", "CASE 2")
var2 = Array(1, 2, 3, 4, 5)
' * don't use spaces in parameter names
' * some names are "locked" and can't be used [e.g. 'text', 'str', 'sample']
Dim param As New Collection
param("number") = 57
param("string") = "A STRING sample..."
param("string_array") = var1 ' add items with an assignment operator
param.Add(var2, "number_array") ' OR using explicit Add() method
param("SomeSpreadsheet") = s1
param("ActiveDataSet") = s2 ' override the value of 'ActiveDataSet' keyword
' string parameters without associated keys will be treated as R code and
' will be executed before the script - an analog of SVB 'hidden code'
' * define a function that will be available to the R script
param.Add("func <- function(x) { cat('Called func(x) with x =', x) }")
' * another way to define a global constant or variable
param.Add("STATISTICA.Version = '" & Version & "'")
' now run the R script with this collection of parameters
' (parameters become R variables – the script can reference them by name)
Dim m As Macro
Set m = Macros.Open(MacroDir & "\parameterized.r")
m.ExecuteWithArgument(param)
EXAMPLE:
Dim m As Macro
Set m = Macros.Open(MacroDir & "\some.r")
More examples
We have demonstrated all the functional components required to build custom applications within the
Statistica platform that can take advantage of the specialized functionality available in R.
All installations of Statistica contain a set of examples that provide a detailed demonstration of the R
integration features. You will find these examples in the [Statistica]\Examples\R folder. These examples can
also be used as templates for your own development.
4 To run the script, click Save and Execute or use the File menu to open and run it.
NOTE: This script does not require a data file. If you are using ActiveDataSet or Spreadsheet()
extensions, keep in mind that the data sources must be located on the server:
Spreadsheet() should reference data sets by Statistica Enterprise Server URL paths.
ActiveDataSet will be mapped to the spreadsheet that you open in Statistica Enterprise Server.
This rule can be overridden by a parameter of the same name passed to the R script from SVB. Also, in
order to execute any scripts in Statistica Enterprise Server, you must have the proper permissions on
the server.
Options are also provided to transfer the data file to the server side. These options are described in detail in
the Statistica documentation.
After clicking OK, the respective R script will execute on Statistica Enterprise Server. It might be scheduled to
be executed rather than be executed immediately, depending on the server load.
The progress of the analysis can be monitored via the Server > Task Status dialog box.
You can retrieve the results from the server via the Results button or you can double-click on the task. The
representation of the results of an offloaded task is equivalent to the same task running locally.
EXAMPLE:
1 Open Examples\R\NonLinear Time Series\As Data Miner Node.sdm.
2 In the Workspace, right-click the analysis node.
3 Select Edit Code to review the underlying SVB code (NonLinearTimeSeries.svx).
NOTE: The corresponding .dmi file describes the node’s input parameters. The R script file
(NonLinearTimeSeries.r) performs nonlinear time series analysis using the tsDyn R package. This
script can serve as a template for your own R-based Data Miner nodes.
4 After running the node, double-click the Reporting Documents node to display the workbook.
This example demonstrates that the ability to incorporate specialized R functionality into Statistica Data Miner
provides very powerful opportunities for creating analytic workflows or collections of Data Miner nodes and
node-browser configurations dedicated to highly specialized analyses (such as nonlinear time series, dose
response curve fitting, and so on).
Depending on the configuration of the Statistica Enterprise system, this reusable analysis template is now
available to the respective users of both Statistica and Statistica Enterprise Server environments. The results
of this template can be combined into standard reports.
Moreover, if the respective Statistica Enterprise installation is integrated with the Statistica Document
Management System, these R scripts can be locked down. For instance, you could version them with complete
audit trails to support FDA 21 CFR Part 11 requirements.
Summary
The following capabilities, made possible by the full integration of the R environment into Statistica
Enterprise, make Statistica Enterprise arguably the most powerful enterprise analysis system to date:
Utilize extensive and highly specialized functionality of R.
Leverage it inside a secure, enterprise-wide, role-based analysis system that insulates end users from
implementation details and that can be deployed into validated manufacturing environments.
Run R-based templates on the Statistica Enterprise Server platform.
Extract results into pre-formatted standard reports.
Final comments
The features provided in Statistica for integration with R are quite flexible. Using them, thousands of highly
specialized R functions and features become available to all Statistica solutions. However, if you plan to
exploit these features, please consider the following system limitations of the R platform.
Error handling
Most error conditions generated within the R environment are intercepted and handled by Statistica. These
include syntax and runtime errors caused by an R script, as well as problems with the integration support
libraries, such as a broken R installation or missing components. Developers can use the error handling
facilities available in either environment.
EXAMPLE: On Error handlers in SVB macros calling R scripts.
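On the R side, a comparable effect can be sketched with the standard tryCatch() function. The helper name safe_fit() and the deliberately invalid formula are assumptions for illustration. The sketch shows only that a runtime error can be caught inside the script before it reaches the calling environment.

```r
# safe_fit() is a hypothetical helper: it returns the fitted model,
# or NULL when fitting raises an R error.
safe_fit <- function(formula, data) {
  tryCatch(
    lm(formula, data = data),
    error = function(e) {
      # Report the problem and return NULL instead of aborting the script.
      message("Model fitting failed: ", conditionMessage(e))
      NULL
    }
  )
}

ok  <- safe_fit(mpg ~ wt, mtcars)              # valid model on a built-in data set
bad <- safe_fit(mpg ~ no_such_column, mtcars)  # unknown variable: error is caught
```

A handler of this kind deals only with errors that R itself signals.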
However, R programs can occasionally crash or hang. In the latter case, program control does not return to
Statistica. Therefore, careful validation of the respective R scripts is crucial for enterprise-level deployment of
R analysis templates.
About Dell
Dell listens to customers and delivers worldwide innovative technology, business solutions and services they
trust and value. For more information, visit www.software.dell.com.
Contacting Dell
Technical Support:
Online Support
Product Questions and Sales:
(800) 306-9329