\documentclass{article} \usepackage{url,Sweave} %\VignetteIndexEntry{Using car functions inside user functions} \newcommand{\R}{{\normalfont\textsf{R}}{}} \newcommand{\car}{\texttt{car}} \newcommand{\effects}{\texttt{effects}} \newcommand{\code}[1]{\texttt{#1}} \usepackage[authoryear,round]{natbib} \bibliographystyle{plainnat} <>= options(width=80, digits=4, useFancyQuotes=FALSE, prompt=" ", continue=" ") @ \title{Using \car{} Functions in Other Functions} \author{John Fox\footnote{Department of Sociology, McMaster University} \&{} Sanford Weisberg\footnote{ School of Statistics, University of Minnesota}} \date{\today} \begin{document} \maketitle \begin{abstract} The \car{} package \citep{FoxWeisberg11} provides many functions that are applied to a fitted regression model, perform additional calculations on the model or possibly compute a different model, and then return values and graphs. In some cases, users may wish to write functions that call functions in \car{} for a particular purpose. Because of the scoping rules used in \R{}, several functions in \car{} that work when called from the command prompt may fail when called inside another function. We discuss how users can modify their programs to avoid this problem. \end{abstract} \section{\code{deltaMethod}} The \car{} package includes many functions that require an object created by a modeling function like \code{lm}, \code{glm} or \code{nls} as input. For a simple example, the function \code{deltaMethod} uses the delta method \citep[Sec.~4.4.6]{FoxWeisberg11} to estimate the value and standard error of a nonlinear combination of parameter estimates. For example <<>>= library(car) m1 <- lm(time ~ t1 + t2, Transact) deltaMethod(m1, "t1/(t2 + 2)") @ Here \code{deltaMethod} returns the standard error of the estimate of $\beta_1/(\beta_2+2)$, where $\beta_j$ is the parameter corresponding to the regressor \texttt{t}$_j$. The code <<>>= ans <- NULL for (z in 1:4) { ans <- rbind(ans, deltaMethod(m1, "t1/(t2 + z)", func = gsub("z", z, "t1/(t1+z)"))) } ans @ also works as expected. The \code{func} argument uses \code{gsub} to get the right row labels. Consider the function: <<>>= f1 <- function(mod) { ans <- NULL for (x in 1:4) { ans <- rbind(ans, deltaMethod(mod, "t1/(t2 + x)", func = gsub("x", x, "t1/(t1+x)")) )} ans } @ which simply puts the code used above into a function. Executing this function fails: \begin{Schunk} \begin{Sinput} f1(m1) \end{Sinput} \begin{Soutput} Error in eval(expr, envir, enclos) : object 'x' not found \end{Soutput} \end{Schunk} Worse yet, if \texttt{x} is defined in the same environment as \texttt{m1}, this function gives the wrong answer: <<>>= x <- 10 f1(m1) @ The core of the problem is the way that \R{} does scoping. The regression object \texttt{m1} was created in the global environment, whereas the argument \texttt{z} in the \texttt{f1} function is created in the local environment of the function. The call to \code{deltaMethod} is evaluated in the global environment where \texttt{m1} is defined, leading to the error message if \texttt{z} does not exist in the global environment, and to wrong answers if it does exist. For \code{deltaMethod}, there is an additional argument \texttt{constants} that can be used to fix the problem: <<>>= f2 <- function(mod) { ans <- NULL for (x in 1:4) { ans <- rbind(ans, deltaMethod(mod, "t1/(t2 + x)", func = gsub("x", x, "t1/(t1+x)"), constants=list(x=x)) )} ans } f2(m1) @ The \texttt{constants} argument is a named list of quantities defined in the local function that are needed in the evaluation of \code{deltaMethod}. \section{\code{ncvTest}} The function \code{ncvTest} \citep[Sec.~6.5.2]{FoxWeisberg11} computes tests for non-constant variance in linear models as a function of the mean, the default, or any other linear function of regressors, even for regressors not part of the mean function. For example, <<>>= m2 <- lm(prestige ~ education, Prestige) ncvTest(m2, ~ income) @ fits \texttt{prestige} as a linear function of \texttt{education}, and tests for nonconstant variance as a function of \texttt{income}, another regressor in the data set \texttt{Prestige}. Embedding this in a function fails: <>= f3 <- function(meanmod, dta, varmod) { m3 <- lm(meanmod, dta) ncvTest(m3, varmod) } f3(prestige ~ education, Prestige, ~ income) @ \begin{Schunk} \begin{Soutput} Error in is.data.frame(data) : object 'dta' not found \end{Soutput} \end{Schunk} In this case the model \texttt{m3} is defined in the environment of the function, and the argument \texttt{dta} is defined in the global environment, and is therefore invisible when \code{ncvTest} is called. A solution is to copy \code{dta} to the global environment. <<>>= f4 <- function(meanmod, dta, varmod) { assign(".dta", dta, envir=.GlobalEnv) assign(".meanmod", meanmod, envir=.GlobalEnv) m1 <- lm(.meanmod, .dta) ans <- ncvTest(m1, varmod) remove(".dta", envir=.GlobalEnv) remove(".meanmod", envir=.GlobalEnv) ans } f4(prestige ~ education, Prestige, ~income) f4(prestige ~ education, Prestige, ~income) @ The \code{assign} function copies the \code{dta} and \code{meanmod} arguments to the global environment where \code{ncvTest} will be evaluated, and the \code{remove} function removes them before exiting the function. This is an inherently problematic strategy, because an object assigned in the global environment will replace an existing object of the same name. Consequently we renamed the \code{dta} argument \code{.dta}, with an initial period, but this is not a \emph{guarantee} that there was no preexisting object with this name. This same method can be used with functions in the \code{effects} package. Suppose, for example, you want to write a function that will fit a model, provide printed summaries and also draw a effects plot. The following function will fail: <>= library(effects) fc <- function(dta, formula, terms) { print(m1 <- lm(formula, .dta)) Effect(terms, m1) } form <- prestige ~ income*type + education terms <- c("income", "type") fc(Duncan, form, terms) @ As with \code{ncvTest}, \code{dta} will not be in the correct environment when \code{Effect} is evaluated. The solution is to copy \code{dta} to the global environment: <>= library(effects) fc.working <- function(dta, formula, terms) { assign(".dta", dta, env=.GlobalEnv) print(m1 <- lm(formula, .dta)) Effect(terms, m1) remove(".dta", envir=.GlobalEnv) } fc.working(Duncan, form, terms) @ Assigning \code{formula} to the global environment is not necessary here because it is used by \code{lm} but not by \code{Effect}. \section{\code{Boot}} The \code{Boot} function in \car{} provides a convenience front-end for the function \code{boot} in the \texttt{boot} package \citep{cantyRipley13,FoxWeisberg12}. With no arguments beyond the name of a regression object and the number of replications \texttt{R}, \code{Boot} creates the proper arguments for \code{boot} for case resampling bootstraps, and returns the coefficient vector for each sample: <<>>= m1 <- lm(time ~ t1 + t2, Transact) b1 <- Boot(m1, R=999) summary(b1) @ The returned object \texttt{b1} is of class \texttt{"boot"}, as are objects created directly from the \texttt{boot} function, so helper functions in the \texttt{boot} package and in \car{} can be used on these objects, e.g., <<>>= confint(b1) @ The \code{Boot} function would have scoping problems even without the user embedding it in a function because the \code{boot} function called by \code{Boot} tries to evaluate the model defined in the global environment in a local environment. In \code{car} we define an environment <>= .carEnv <- new.env(parent=emptyenv()) @ and then evaluate the model in the environment \code{.carEnv}. This environment is not exported, so to see that it exists you would need to enter <<>>= car:::.carEnv @ We use this same trick in the \code{Boot.default} function so that \code{.carEnv} is globally visible. Here is a copy of \code{Boot.default} to show how this works. <>= Boot.default <- function(object, f=coef, labels=names(coef(object)), R=999, method=c("case", "residual")) { if(!(require(boot))) stop("The 'boot' package is missing") f0 <- f(object) if(length(labels) != length(f0)) labels <- paste("V", seq(length(f0)), sep="") method <- match.arg(method) if(method=="case") { boot.f <- function(data, indices, .fn) { assign(".boot.indices", indices, envir=car:::.carEnv) mod <- update(object, subset=get(".boot.indices", envir=car:::.carEnv)) if(mod$qr$rank != object$qr$rank){ out <- .fn(object) out <- rep(NA, length(out)) } else {out <- .fn(mod)} out } } else { boot.f <- function(data, indices, .fn) { first <- all(indices == seq(length(indices))) res <- if(first) object$residuals else residuals(object, type="pearson")/sqrt(1 - hatvalues(object)) res <- if(!first) (res - mean(res)) else res val <- fitted(object) + res[indices] if (!is.null(object$na.action)){ pad <- object$na.action attr(pad, "class") <- "exclude" val <- naresid(pad, val) } assign(".y.boot", val, envir=car:::.carEnv) mod <- update(object, get(".y.boot", envir=car:::.carEnv) ~ .) if(mod$qr$rank != object$qr$rank){ out <- .fn(object) out <- rep(NA, length(out)) } else {out <- .fn(mod)} out } } b <- boot(data.frame(update(object, model=TRUE)$model), boot.f, R, .fn=f) colnames(b$t) <- labels if(exists(".y.boot", envir=car:::.carEnv)) remove(".y.boot", envir=car:::.carEnv) if(exists(".boot.indices", envir=car:::.carEnv)) remove(".boot.indices", envir=car:::.carEnv) b } @ The was also fixed in \code{bootCase}. \bibliography{embedding} \end{document}