\documentclass{article}
\usepackage[margin=1in]{geometry}
\usepackage{mathpazo}
\usepackage{fancyvrb}
\DefineShortVerb{\"}
\def\pkg#1{\textsf{#1}}
\def\lsm{\pkg{lsmeans}}
\def\code{\texttt}

\title{Changes in \lsm, Version 2.00}
\author{Russell V.~Lenth}

%\VignetteIndexEntry{Changes in the lsmeans package}
%\VignetteDepends{lsmeans}
%\VignetteKeywords{least-squares means}
%\VignettePackage{lsmeans}

\begin{document}
\SweaveOpts{concordance=TRUE}
\maketitle{}

\section{Introduction}
Versions of \lsm{} up through 1.10 were based on a lot of ``spaghetti code'' that worked but was increasingly difficult to maintain. So, starting with version 2.00, the package has undergone a complete overhaul: the code is now much more modular and extensible, leaving the package far better prepared for future development.

Past users of \lsm{} can use it in much the same way as before---but not entirely, and that, of course, is the catch, especially when it comes to doing something later with an object created by the "lsmeans" function. The purpose of this document is to explain the changes before the package is released, so that users may be prepared for them.

\subsection{Availability of old functionality}
For a while, the \lsm{} package will include a function ".old.lsmeans", which is the old "lsmeans" from version 1.10-4. Users should adapt to the new "lsmeans" function as quickly as possible; however, in a pinch, the old one may still be used. We use it several times in this document to illustrate the differences.

\section{Changes that could break existing code that uses \lsm}
\subsection{In a nutshell}
If you have existing code that extracts or manipulates the result of "lsmeans" in its old manifestation (i.e., treats it as a list of "data.frame"s), or that uses the arguments "cov.reduce", "fac.reduce", "conf", "glhargs", "lf", or "mlf", then your code will break as-is. Details follow.

\subsection{Returned objects}
Probably the most problematic change for past users is that "lsmeans" used to return a list of "data.frame"s (except sometimes a "glht" object was thrown in), but now it returns a single object of a new class, "lsmobj"---or a list thereof.
<<>>=
library(lsmeans)

### OLD
warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
warp.oldlsm <- .old.lsmeans(warp.lm, ~ tension | wool)
class(warp.oldlsm)
@
<<>>=
class(warp.oldlsm[[1]])
@
% I'm faking it to avoid the "envir" listing
<<results=hide>>=
### NEW
warp.lsmobj <- lsmeans(warp.lm, ~ tension | wool)
class(warp.lsmobj)
@
<<echo=FALSE>>=
"lsmobj"
@
Look at the results obtained the new way:
<<>>=
warp.lsmobj
@
Unlike the old display (not shown), this one pays attention to the "| wool" part of the specification.

In the old version, users could access or manipulate the results by taking advantage of the fact that they inherited from "data.frame":
<<>>=
### OLD
warp.oldlsm[[1]]$lsmean
@
<<>>=
Try <- function(expr) tryCatch(expr, error = function(e) cat("Oops!\n"))

### NEW
Try(warp.lsmobj$lsmean)
@
The "show" method for an "lsmobj" displays its "summary", and "summary" does indeed produce an object that inherits from "data.frame". So if you need to access the values that you see displayed, call "summary" first:
<<>>=
### NEW
summary(warp.lsmobj)$lsmean
@
In casting to "data.frame", note that the ``by'' variable ("wool" in this case) is included:
<<>>=
as.data.frame(summary(warp.lsmobj))
@
If there is also a contrast specification, then "lsmeans" does return a list (and not an extension thereof); but each element is of class "lsmobj", not "data.frame".
<<>>=
### NEW
warp.l2 <- lsmeans(warp.lm, pairwise ~ tension)
class(warp.l2)
sapply(warp.l2, class)
@

\subsection{Changes to \code{cov.reduce} and \code{fac.reduce}}
In the old version, the functions supplied as the "cov.reduce" and "fac.reduce" arguments to "lsmeans" were required to accept a second argument giving the name of the variable being reduced. This is awkward and, in the case of "fac.reduce", doesn't even make sense if you think about it. But if you have existing code that uses these functions, you will have to change it.

In the new version, "cov.reduce" may be a function of a single numeric variable, or a named list of such functions. The default is "mean". If it is a named list, then a covariate matching a name in the list is reduced using that function, and any unmatched covariates are reduced using "mean"; for example, "cov.reduce = list(nitro = median)" would reduce "nitro" using its median and all other covariates using their means. As before, "cov.reduce" may also be logical: "TRUE" is equivalent to "mean", and "FALSE" is equivalent to "function(x) sort(unique(x))".

"fac.reduce" must now be a function of one matrix argument. Its default is "function(X) apply(X, 2, mean)". To override it (at least sensibly), you must provide a function that reduces the rows of the matrix to a single vector with one element per column.

\subsection{Arguments no longer provided}
The "lsmeans" arguments "conf", "glhargs", "lf", and "mlf" are no longer supported. The needs they served are now met via "lsmobj" methods or slots.

Continuing with the "warp.lm" example and the returned object "warp.lsmobj", the "conf" functionality is replaced by the "confint" method:
<<>>=
confint(warp.lsmobj, level = .90)
@
The "glhargs" capability is replaced by an "as.glht" method that creates a "glht" object for use with the \pkg{multcomp} package:
<<>>=
library(multcomp)
summary(as.glht(warp.lsmobj))
@
In lieu of "lf", simply access the "linfct" slot:
<<>>=
warp.lsmobj@linfct
@
The "mlf" argument was new and gave only rudimentary support for multivariate responses. Now, a multivariate response causes "lsmeans" to create one or more additional factors that can be specified in the "lsmeans" specs. More on this later.

\section{Corrections}
A few bugs turned up in the course of discovering that new results did not match old ones---and the new ones were right! Of course, there could well be undiscovered new bugs.

\subsection{Degrees of freedom}
\lsm{} uses the \pkg{pbkrtest} package to obtain degrees of freedom for models fitted using the \pkg{lme4} package. These depend on both the adjusted and unadjusted covariance matrices, but it turns out that the old "lsmeans" supplied the adjusted one for both. This does not always make a difference:
<<>>=
library(lme4)
data(Oats, package = "nlme")
Oats.lmer <- lmer(yield ~ factor(nitro) + Variety + (1|Block/Variety),
    data = Oats, subset = -c(1,2,3,5,8,13,21,34,55))

### OLD
.old.lsmeans(Oats.lmer, pairwise ~ Variety)

### NEW
lsmeans(Oats.lmer, pairwise ~ Variety)
@
The discrepancies are not huge, but they are there. Without the "subset" that created unbalanced data, the results essentially agree.

\subsection{Processing \code{at}}
In models whose formulas contain "factor" or "ordered" calls (like "Oats.lmer"), any "at" specification was ignored by the old version. The new version handles this correctly, including the omission of inappropriate levels.
<<>>=
### OLD
.old.lsmeans(Oats.lmer, ~ nitro, at = list(nitro = c(.1,.2,.3)))

### NEW
lsmeans(Oats.lmer, ~ nitro, at = list(nitro = c(.1,.2,.3)))
@

\section{New object structure}
The more recent vignettes for \lsm{} have explained least-squares means as predictions on a ``reference grid,'' or marginal averages thereof.
By default, the reference grid consists of all combinations of the factor levels, along with the means of the numeric predictors; but this can be changed via "at" or "cov.reduce". The new design of \lsm{} uses a reference-grid object explicitly. For example:
<<>>=
(Oats.rg <- ref.grid(Oats.lmer))
Oats.quad <- update(Oats.lmer, yield ~ Variety + poly(nitro,2) + (1|Block/Variety))
ref.grid(Oats.quad)
ref.grid(Oats.quad, at = list(nitro = c(.1,.2,.3)))
@
The "ref.grid" function calls two other functions, "recover.data" (to reproduce the dataset) and "lsm.basis" (to get the model matrix, coefficients, etc.), each of which has S3 methods for popular model objects such as "lm", "mlm", "gls", "lmer", etc. This allows the capabilities of "ref.grid" to be extended easily to model objects not yet supported.

"ref.grid" serves as a constructor for an S4 object of class "ref.grid", which encapsulates all the information needed to compute---and make inferences on---least-squares means, independently of the model object itself. The "lsmeans" function now consists of S4 methods for a variety of signatures, one of which corresponds to the old usage, where "object" is a model object and "specs" is a formula. So, for example, we may call "lsmeans" with an existing "ref.grid" and provide specifications in place of the old formula interface:
<<>>=
(Oats.lsm <- lsmeans(Oats.rg, "nitro", by = "Variety"))
@
Moreover, "lsmobj" is in fact an extension of "ref.grid", and we can use it as such:
<<>>=
str(Oats.lsm)
(Oats.n <- lsmeans(Oats.lsm, "nitro"))
@

\subsection{Slots}
The classes "ref.grid" and "lsmobj" are essentially identical in structure, with "lsmobj" being a minor extension having the same slots.
<<>>=
slotNames(Oats.lsm)
@
"model.info" has the call and terms. "roles" lists the names of predictors and responses. "grid" is a "data.frame" consisting of all combinations of the variables in the list "levels"; the rows of "grid" are in one-to-one correspondence with those of "linfct", which contains the linear coefficients associated with each LS~mean (or reference-grid combination). "matlevs" has summary information for any matrices in the dataset. "bhat" holds the regression coefficients. "nbasis" holds information for determining non-estimability in rank-deficient situations. "V" is the covariance matrix for "bhat". "ddfm" is a function that returns the degrees of freedom for a linear function of "bhat"; it is passed the contents of the list "misc", thus allowing for additional parameters. "misc" is also used for bookkeeping tasks such as remembering "by" variables, labels, "adjust" settings, etc.

\section{New functions and methods}
There are numerous methods for "lsmobj" objects. The "summary" method produces what you see in a listing; its result is an extension of "data.frame", but it is printed with different formatting and with added messages about adjustments, confidence levels, etc. You can also display the results differently. For example:
<<results=hide>>=
summary(Oats.lsm, by = "nitro")
@
This call (results not shown) will group the "Variety" means separately for each "nitro" value, rather than as displayed above. There is also an "infer" argument for flagging whether confidence intervals and/or tests are displayed:
<<>>=
summary(Oats.n, infer = c(TRUE,TRUE))
@
Most other methods are S3 ones, as those are suitable to our needs and often extend existing S3 methods. The "as.glht" method is illustrated earlier in this document.
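Another function worth mentioning here is "lsmip" (it comes up again in the section on transformations), which displays LS~means in interaction-plot style. The following is only a sketch and is not evaluated in this document; it assumes the "Oats.lsm" object created above:
<<eval=FALSE>>=
# Not run: plot the nitro LS means, with a separate curve for each Variety
lsmip(Oats.lsm, Variety ~ nitro)
@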
The "confint" and "test" methods are really courtesy methods for "summary" with the "infer" argument set to "c(TRUE,FALSE)" and "c(FALSE,TRUE)", respectively. An important method is "contrast":
<<>>=
(warp.con <- contrast(warp.lsmobj, method = "poly"))
@
These methods all return new objects of class "lsmobj"; hence they may be further analyzed or re-analyzed. For example, suppose we now want to compare the two linear and the two quadratic contrasts in the above:
<<>>=
contrast(warp.con, "revpairwise", by = "contrast")
@
The "pairs" method is equivalent to "contrast" with \verb|method = "pairwise"|. Closely related is the new "cld" method, which produces a compact letter display in which means sharing a grouping symbol are not significantly different in pairwise comparisons:
<<>>=
cld(Oats.n, sort = FALSE)
@
Finally, there is the "lstrends" function for estimating the fitted trends (slopes) of a covariate that interacts with a factor. Like the other methods, it returns an "lsmobj" object, subject to further analysis. To illustrate, the R-provided dataset "ChickWeight" has data on the growth of chicks given different diets. We will fit a random-slopes model and compare the mean slope for each diet. In addition, we'll choose symbols for the display that mimic the grouping lines that some people use.
<<>>=
chick.lmer <- lmer(weight ~ Time * Diet + (0 + Time | Chick), data = ChickWeight)
chick.lst <- lstrends(chick.lmer, ~ Diet, var = "Time")
cld(chick.lst, Letters = "|||||")
@
Chicks fed Diet~3 seem to grow faster than chicks on the other diets, and Diet~1 is the worst. "lstrends" uses a difference quotient to do its work, and there is an optional "delta" argument that can be used to change its increment. It requires a model object---there is no "ref.grid" method for it. The "var" argument may specify a function call, e.g., "var=sqrt(Time)", in which case the chain rule is applied.

\section{Support for multivariate models}
\lsm{} now provides for models with multivariate responses, by way of defining factor levels that index the responses. Thus, linear functions of the multivariate response are available for inference. As an example, consider the package-provided dataset "MOats", which is the same as "Oats" except that each observation is a whole plot, with the yields for the four "nitro" levels as responses.
<<>>=
head(MOats)
@
Let's fit a model and obtain the reference grid:
<<>>=
MOats.mlm <- lm(yield ~ Block + Variety, data = MOats)
(MOats.rg <- ref.grid(MOats.mlm, mult.levs = list(nitro = c(0,.2,.4,.6))))
@
(The "mult.levs" argument gives a name and levels for later use; had it been absent, the multivariate response would have been named "rep.meas", with levels "1,2,3,4".) We may now use "nitro" just as we would in the univariate case:
<<>>=
lsmeans(MOats.rg, ~ nitro)
lsmeans(MOats.rg, ~ Variety)
@
We can verify that the latter is exactly the same as if we had averaged the responses:
<<>>=
MOats <- transform(MOats, avg.yield = apply(yield, 1, mean))
lsmeans(lm(avg.yield ~ Block + Variety, data = MOats), ~ Variety)
@

\section{Support for more models}
Several more model types are supported, including "survreg", "coxph", "coxme", and "polr" models.
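As a quick illustration of one of these, an ordinal-response model fitted with "polr" can be handled in the same way as any other supported model. The following sketch is not evaluated here; it assumes the \pkg{MASS} package and its "housing" dataset, and the particular model and specification are chosen only for illustration:
<<eval=FALSE>>=
# Not run: fit an ordinal model with MASS::polr,
# then obtain LS means for the Infl factor
library(MASS)
housing.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
lsmeans(housing.plr, ~ Infl)
@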
Here's an example for a Cox proportional-hazards model for the "cgd" dataset in the \pkg{survival} package:
<<>>=
library(survival)
cgd.ph <- coxph(Surv(tstart, tstop, status) ~ treat * inherit + sex + age + cluster(id),
    data = cgd)
(cgd.lsm <- lsmeans(cgd.ph, ~ treat | inherit))
@

\section{Transformations}
\lsm{} tries to discover response transformations and link functions, and provides a "type" argument in "summary", "lsmip", and "predict" that allows inverting the transformation. For example, consider the Cox model just fitted.
<<>>=
summary(cgd.lsm, type = "response")
@
As another example, suppose we transform the response in "warp.lm":
<<>>=
logwarp.rg <- ref.grid(update(warp.lm, log(breaks) ~ .))
summary(logwarp.rg)
summary(logwarp.rg, type = "response")
@

\end{document}