R-statistics blog

Seeking a New Maintainer for the Popular R Package installr

Tal Galili — Fri, 28 Apr 2023 14:00:40 +0000

TL;DR

I’m seeking someone to take over maintenance of the popular R package installr (GitHub), as I have moved away from Windows OS. The package has been downloaded over 3.3 million times and currently gets around 61k downloads a month. The ideal candidate should have experience with Windows OS, be an experienced R developer, and be passionate about helping fellow R users.

Interested? Please leave a comment on the GitHub issue here.

Details

Background

Dear R Community,

As many of you may know, the installr package has been providing valuable functions for installing and updating software on Windows OS, with a particular focus on allowing R users to update R itself from within R (using the updateR() function). Over the years, this package has gained significant traction, and I’m grateful for the support and appreciation the community has shown.

To give you some insight into its popularity, installr has been downloaded over 3.3 million times, with a current download rate of around 61k times a month. It’s wonderful to see the positive impact this package has had on the R community, making it easier to keep R updated and install essential software for development and reproducible research.

The GitHub repository for the installr package is at https://github.com/talgalili/installr and its CRAN page is at https://cran.r-project.org/web/packages/installr/index.html (the current version of the package is 0.23.4).

Looking for a maintainer

However, as the current maintainer of the installr package, I have a personal update to share. About five years ago, I made the switch from Windows to Linux Mint, and I no longer see myself returning to Windows. Consequently, my ability to effectively maintain the installr package has become increasingly limited.

This is why I am reaching out to the community with a request. I am searching for someone enthusiastic and dedicated to take over the maintenance of the installr package (or at least to co-maintain it; I can still handle uploading to CRAN). If you are an experienced R developer with a strong background in Windows OS, and you are passionate about helping fellow R users, this might be a great opportunity for you to make a significant contribution to the community.

Responsibilities of the maintainer would include:

Addressing user-reported issues (see list here)
Updating and improving the package to keep up with the evolving R ecosystem
Ensuring compatibility with new software versions and Windows updates
Expanding the installr package’s functionality, if desired

To express your interest or to learn more about the role, please get in touch with me by leaving a comment on the GitHub issue here. I would be more than happy to discuss the role further and answer any questions you might have.

Thank you for your continued support, and I look forward to finding a new maintainer to carry the torch and ensure the installr package continues to be a valuable resource for the R community.

Best regards,

Tal Galili

The post Seeking a New Maintainer for the Popular R Package installr first appeared on R-statistics blog.

How to install R 3.6.3 (NOT 4+) on Linux MINT 19.x (19.1, 19.2, 19.3)

Tal Galili — Sun, 03 Jul 2022 19:14:36 +0000

tl;dr

On Linux Mint 19.2, I was only able to properly install R 3.6.3 (but NOT R 4+)
The correct repos need to be updated in at least 2 files; only then can R be installed
Did I miss any tips? Please leave me a comment.

Background

I tried, and failed, to install R version 4+ on my Linux Mint 19.2. The following are the steps I took to remove my old versions of R and successfully install R 3.6.3 instead.

Some sources that helped me write this up:

Step 1: remove old R + wrong repos

Open a terminal window (ctrl+alt+T) and paste (shift + insert) the following:

sudo apt purge 'r-base*' r-recommended 'r-cran-*'
sudo apt autoremove
sudo apt update

Once it is all removed, we should make sure we don’t have any wrongly defined repos.

For that, make sure you have gedit installed. If not, install it from the terminal using:

sudo apt-get install gedit

Once it is installed, check for wrongly defined repos by running the following:

sudo gedit /etc/apt/sources.list.d/additional-repositories.list

And look for (and delete) lines that look like this:

deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/
deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran35/
deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/

After deleting all of these lines, save the file (this is why gedit has to be run with sudo). Then also open the following file:

sudo gedit /etc/apt/sources.list

Once there, remove any redundant r-project lines. After all these lines are gone, add only the following line (notice the bionic-cran35 at the end):

deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/

Exit and save. Now we can install R.
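If you would rather double-check these edits from R than by eye, the check can be scripted. The sketch below is a hypothetical helper (`find_cran_repo_lines()` is not part of any package); it assumes the two file paths used above, reports every active line pointing at cloud.r-project.org, and flags the ones that do not end in `bionic-cran35/`.

```r
# Hypothetical helper (not from any package): list active CRAN apt repo lines
# in the given files and flag those that are not the "bionic-cran35/" suite.
find_cran_repo_lines <- function(paths = c(
                                   "/etc/apt/sources.list",
                                   "/etc/apt/sources.list.d/additional-repositories.list"
                                 )) {
  results <- lapply(paths[file.exists(paths)], function(path) {
    lines <- readLines(path, warn = FALSE)
    # Commented-out lines start with "#" and are therefore not matched
    hits <- grep("^[[:space:]]*deb .*cloud\\.r-project\\.org", lines, value = TRUE)
    if (length(hits) == 0) return(NULL)
    data.frame(
      file = path,
      line = trimws(hits),
      # TRUE only for the suite that worked for R 3.6.x on Mint 19.x
      ok = grepl("bionic-cran35/?[[:space:]]*$", hits),
      stringsAsFactors = FALSE
    )
  })
  # Returns NULL when no matching lines (or no files) are found
  do.call(rbind, results)
}
```

Running `find_cran_repo_lines()` with the default paths lists what apt will see; any row with `ok == FALSE` is a line to delete before running `sudo apt update`.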

Step 2: install R (+RStudio)

Open a terminal window (ctrl+alt+T) and paste (shift + insert):

sudo apt update
sudo apt install r-base

Once this is done, open R by typing “R” in the terminal. You can then get most of the packages you might want by simply running in R:

install.packages("tidyverse")
install.packages("Rcpp")

Lastly, the simplest way to get RStudio is to go to their download page, and choose the “Ubuntu 18+/Debian 10+” option. Download and run the file.

Appendix – what could go wrong?

If for some reason you have the wrong repo (ending with 40 instead of 35), such as:

deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/

Then, when trying to install R, you might get error messages such as:

The following packages have unmet dependencies:
 r-base-core : Depends: libc6 (>= 2.29) but 2.27-3ubuntu1 is to be installed
               Depends: libicu66 (>= 66.1-1~) but it is not installable
               Depends: libreadline8 (>= 6.0) but it is not installable

The post How to install R 3.6.3 (NOT 4+) on Linux MINT 19.x (19.1, 19.2, 19.3) first appeared on R-statistics blog.

heatmaply 1.0.0 – beautiful interactive cluster heatmaps in R

Tal Galili — Wed, 08 Jan 2020 14:32:01 +0000

I’m excited to announce that heatmaply version 1.0.0 has been published to CRAN! (A getting-started vignette is available here.)

What is heatmaply?

heatmaply is an R package for easily creating interactive cluster heatmaps that can be shared online as a stand-alone HTML file. Interactivity includes a tooltip display of values when hovering over cells, as well as the ability to zoom in to specific sections of the figure from the data matrix, the side dendrograms, or annotated labels.

The package aims to be compatible with gplots::heatmap.2, so you can take code written for it, change the heatmap.2 command to heatmaply, and get the interactive version of the plot (although with slightly different, improved, defaults for colors and dendrogram ordering). Thanks to the synergistic relationship between heatmaply and other R packages, users get fine-grained control over the statistical and visual aspects of the heatmap layout.

What makes heatmaply great?

The jump from version 0.16.0 to version 1.0.0 signals the maturity of the package. It reflects the following facts:

The first version of heatmaply (0.1.0) was released on 2016-05-14. Since then, the package has had over 16 version releases (see the NEWS page for changes across versions).
The package gets around 5,000 monthly downloads, and has been downloaded over 140,000 times as of today.
We published an academic paper on heatmaply in the journal Bioinformatics: heatmaply: an R package for creating interactive cluster heatmaps for online publishing. The paper is open-access under a CC-BY license. As of today, the paper has been cited 47 times.
The package has unit tests with 90% code coverage.
This package relies primarily on the packages plotly and dendextend. Both are very mature packages.
The package is maintained by two authors, Tal Galili (me), and Alan O’Callaghan (who has been the main reason this package has gotten this far, providing a huge number of improvements and bug fixes!)

What can heatmaply do?

Many things! You can learn about the various options in the online vignette.

For example, running the following code will produce an interactive cluster heatmap of the mtcars dataset (after ranking the columns and normalizing them to range from 0 to 1):

# install.packages("heatmaply")
library(heatmaply)
mtcars_2 <- percentize(mtcars)
heatmaply(mtcars_2, k_row = 4, k_col = 2)
# For a static (non-interactive) version of the plot, use ggheatmap() instead of heatmaply()
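To make the "ranking and normalizing" step concrete: as far as I understand, `percentize()` maps each column through its empirical cumulative distribution function. The base-R sketch below is an approximation under that assumption (`percentize_sketch()` is a hypothetical name, not heatmaply's implementation); it uses `ecdf()` so that each value becomes the share of values in its column that are less than or equal to it.

```r
# Hypothetical sketch of a per-column empirical-percentile transform,
# approximating what heatmaply::percentize() is described as doing.
percentize_sketch <- function(df) {
  out <- lapply(df, function(col) {
    # ecdf(col) returns a step function; evaluating it at col gives, for each
    # value, the proportion of values in the column that are <= it
    ecdf(col)(col)
  })
  # Keep the original row names (car models for mtcars)
  as.data.frame(out, row.names = rownames(df))
}

mtcars_2_sketch <- percentize_sketch(mtcars)
summary(mtcars_2_sketch$mpg)
```

Because every column ends up on the same (0, 1] scale, no single variable (such as `disp`, measured in the hundreds) dominates the color scale or the distances used for clustering.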

Stay in touch

The official package homepage is: https://talgalili.github.io/heatmaply/ where you can see the recent NEWS, documentation, and vignette.
Help us make the package better:
- Ask questions on https://stackoverflow.com/questions/tagged/heatmaply
- Submit suggestions and bug reports at: https://github.com/talgalili/heatmaply/issues
- Send a pull request on: https://github.com/talgalili/heatmaply/
If you use the package in an academic publication, please cite our paper.

We hope you’ll enjoy heatmaply

The post heatmaply 1.0.0 – beautiful interactive cluster heatmaps in R first appeared on R-statistics blog.

Registration for eRum 2018 closes in two days!

Tal Galili — Fri, 27 Apr 2018 12:15:47 +0000

Why I’m going to eRum this year instead of useR!

I have attended the useR! conference every year for the past 9 years, and loved it! However, this year I’m saddened that I won’t be able to go. The conference will be held in Australia, and going there would require me to be away from home for at least 8 days (my heart goes out to the people of Australia, who have had a hard time getting to useR! all these years). Ordinarily I would do it, but given that my wife and I have a sweet 8-month-old baby (called Maya), I’m very reluctant to be away from home for that long.

The eRum 2018 conference

Fortunately for me, and for many other R users out there, we have a backup plan called eRum (a.k.a. the European R Users Meeting). It is an international conference, similar to useR!, that occurs every two years (specifically, in the years in which useR! takes place outside of Europe), and is organized by Gergely Daroczi and others.

About the plan for this year:

Time and location: the conference will take place on May 14-16, 2018 @ Budapest, Hungary
Crowd size: The expectation is for ~500 R users from mostly Europe (you can see a visual breakdown of people’s country of origin here)
Content: The program has 5 keynote speakers, 12 invited speakers, 7 tracks of workshops and 2 tracks for contributed talks (picked after sifting through over 150 abstracts). Knowing some of the people in the program, I can vouch for its high quality.
The registration closes this Sunday, so hurry up and register! (Prices are relatively low, starting from 80 Euro for students and going up to 275 Euro for industry.)

If you see me around, feel free to come and say hi!

The post Registration for eRum 2018 closes in two days! first appeared on R-statistics blog.

R 3.5.0 is released! (major release with many new features)

Tal Galili — Tue, 24 Apr 2018 06:10:44 +0000

R 3.5.0 (codename “Joy in Playing”) was released yesterday. You can get the latest binaries from here (or the .tar.gz source code from here).

This is a major release with many new features and bug fixes, the full list is provided below.

Upgrading R on Windows and Mac

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code in Rgui:

install.packages("installr") # install 
setInternet2(TRUE) # only for R versions older than 3.3.0
installr::updateR() # updating R.
# If you wish it to go faster, run: installr::updateR(fast = TRUE)

Running “updateR()” will detect if there is a new R version available, and if so it will download and install it (etc.). There is also a step-by-step tutorial (with screenshots) on how to upgrade R on Windows using the installr package. If you only see the option to upgrade to an older version of R, change your mirror or try again in a few hours (it usually takes around 24 hours for all CRAN mirrors to get the latest version of R).
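The version comparison behind "is there a newer R?" can be illustrated in a few lines of base R. This is a hedged sketch, not installr's actual code: `needs_r_update()` is a hypothetical helper, and it expects you to supply the latest version string yourself rather than looking it up on CRAN.

```r
# Hypothetical helper (not part of installr): TRUE if `current` is older
# than the version string supplied as `latest`.
needs_r_update <- function(latest, current = getRversion()) {
  # numeric_version compares component-wise, so "3.10.0" > "3.5.0",
  # which a plain character comparison would get wrong
  numeric_version(as.character(current)) < numeric_version(latest)
}

needs_r_update("3.5.0", current = "3.4.4")
needs_r_update("3.5.0", current = "3.5.0")
```

Comparing version objects rather than strings is the main design point here; the real updateR() additionally handles the download, installation and package migration.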

If you are using Mac you can easily upgrade to the latest version of R using Andrea Cirillo’s updateR package. The package is not on CRAN, so you’ll need to run the following code in R:

install.packages("devtools")
devtools::install_github("AndreaCirilloAC/updateR")
updateR::updateR(admin_password = "PASSWORD") # Where "PASSWORD" stands for your system password

Later this year Andrea and I intend to merge the updateR package into installr so that the updateR function will work seamlessly on both Windows and Mac. Stay tuned!

CHANGES IN R 3.5.0

SIGNIFICANT USER-VISIBLE CHANGES

All packages are by default byte-compiled on installation. This makes the installed packages larger (usually marginally so) and may affect the format of messages and tracebacks (which often exclude .Call and similar).

NEW FEATURES

factor() now uses order() to sort its levels, rather than sort.list(). This allows factor() to support custom vector-like objects if methods for the appropriate generics are defined. It has the side effect of making factor() succeed on empty or length-one non-atomic vector(-like) types (e.g., "list"), where it failed before.
diag() gets an optional names argument: this may require updates to packages defining S4 methods for it.
chooseCRANmirror() and chooseBioCmirror() no longer have a useHTTPS argument, not needed now all R builds support https:// downloads.
New summary() method for warnings() with a (somewhat experimental) print() method.
(methods package.) .self is now automatically registered as a global variable when registering a reference class method.
tempdir(check = TRUE) recreates the tempdir() directory if it is no longer valid (e.g. because some other process has cleaned up the ‘/tmp’ directory).
New askYesNo() function and "askYesNo" option to ask the user binary response questions in a customizable but consistent way. (Suggestion of PR#17242.)
New low level utilities ...elt(n) and ...length() for working with ... parts inside a function.
isTRUE() is more tolerant and now true in
```
   x <- rlnorm(99)
   isTRUE(median(x) == quantile(x)["50%"])
```
New function isFALSE() defined analogously to isTRUE().
The default symbol table size has been increased from 4119 to 49157; this may improve the performance of symbol resolution when many packages are loaded. (Suggested by Jim Hester.)
line() gets a new option iter = 1.
Reading from connections in text mode is buffered, significantly improving the performance of readLines(), as well as scan() and read.table(), at least when specifying colClasses.
order() is smarter about picking a default sort method when its arguments are objects.
available.packages() has two new arguments which control if the values from the per-session repository cache are used (default true, as before) and if so how old cached values can be to be used (default one hour). These arguments can be passed from install.packages(), update.packages() and functions calling that: to enable this available.packages(), packageStatus() and download.file() gain a ... argument.
packageStatus()‘s upgrade() method no longer ignores its ... argument but passes it to install.packages().
installed.packages() gains a ... argument to allow arguments (including noCache) to be passed from new.packages(), old.packages(), update.packages() and packageStatus().
factor(x, levels, labels) now allows duplicated labels (not duplicated levels!). Hence you can map different values of x to the same level directly.
Attempting to use names<-() on an S4 derivative of a basic type no longer emits a warning.
The list method of within() gains an option keepAttrs = FALSE for some speed-up.
system() and system2() now allow the specification of a maximum elapsed time (‘timeout’).
debug() supports debugging of methods on any object of S4 class "genericFunction", including group generics.
Attempting to increase the length of a variable containing NULL using length()<- still has no effect on the target variable, but now triggers a warning.
type.convert() becomes a generic function, with additional methods that operate recursively over list and data.frame objects. Courtesy of Arni Magnusson (PR#17269).
lower.tri(x) and upper.tri(x) only needing dim(x) now work via new functions .row() and .col(), so no longer call as.matrix() by default in order to work efficiently for all kind of matrix-like objects.
print() methods for "xgettext" and "xngettext" now use encodeString() which keeps, e.g. "\n", visible. (Wish of PR#17298.)
package.skeleton() gains an optional encoding argument.
approx(), spline(), splinefun() and approxfun() also work for long vectors.
deparse() and dump() are more useful for S4 objects, dput() now using the same internal C code instead of its previous imperfect workaround R code. S4 objects now typically deparse perfectly, i.e., can be recreated identically from deparsed code.
dput(), deparse() and dump() now print the names() information only once, using the more readable (tag = value) syntax, notably for list()s, i.e., including data frames.
These functions gain a new control option "niceNames" (see .deparseOpts()), which when set (as by default) also uses the (tag = value) syntax for atomic vectors. On the other hand, without deparse options "showAttributes" and "niceNames", names are no longer shown also for lists. as.character(list(c(one = 1))) now includes the name, as as.character(list(list(one = 1))) has always done.

m:n now also deparses nicely when m > n.

The "quoteExpressions" option, also part of "all", no longer quote()s formulas as that may not re-parse identically. (PR#17378)
If the option setWidthOnResize is set and TRUE, R run in a terminal using a recent readline library will set the width option when the terminal is resized. Suggested by Ralf Goertz.
If multiple on.exit() expressions are set using add = TRUE then all expressions will now be run even if one signals an error.
mclapply() gets an option affinity.list which allows more efficient execution with heterogeneous processors, thanks to Helena Kotthaus.
The character methods for as.Date() and as.POSIXlt() are more flexible via new arguments tryFormats and optional: see their help pages.
on.exit() gains an optional argument after with default TRUE. Using after = FALSE with add = TRUE adds an exit expression before any existing ones. This way the expressions are run in a first-in last-out fashion. (From Lionel Henry.)
On Windows, file.rename() internally retries the operation in case of error to attempt to recover from possible anti-virus interference.
Command line completion on :: now also includes lazy-loaded data.
If the TZ environment variable is set when date-time functions are first used, it is recorded as the session default and so will be used rather than the default deduced from the OS if TZ is subsequently unset.
There is now a [ method for class "DLLInfoList".
glm() and glm.fit get the same singular.ok = TRUE argument that lm() has had forever. As a consequence, in glm(*, method = ), user specified methods need to accept a singular.ok argument as well.
aspell() gains a filter for Markdown (‘.md’ and ‘.Rmd’) files.
intToUtf8(multiple = FALSE) gains an argument to allow surrogate pairs to be interpreted.
The maximum number of DLLs that can be loaded into R e.g. via dyn.load() has been increased up to 614 when the OS limit on the number of open files allows.
Sys.timezone() on a Unix-alike caches the value at first use in a session: inter alia this means that setting TZ later in the session affects only the current time zone and not the system one.
Sys.timezone() is now used to find the system timezone to pass to the code used when R is configured with --with-internal-tzcode.
When tar() is used with an external command which is detected to be GNU tar or libarchive tar (aka bsdtar), a different command-line is generated to circumvent line-length limits in the shell.
system(*, intern = FALSE), system2() (when not capturing output), file.edit() and file.show() now issue a warning when the external command cannot be executed.
The “default” ("lm" etc) methods of vcov() have gained new optional argument complete = TRUE which makes the vcov() methods more consistent with the coef() methods in the case of singular designs. The former (back-compatible) behavior is given by vcov(*, complete = FALSE).
coef() methods (for lm etc) also gain a complete = TRUE optional argument for consistency with vcov().
For "aov", both coef() and vcov() methods remain back-compatibly consistent, using the other default, complete = FALSE.
attach(*, pos = 1) is now an error instead of a warning.
New function getDefaultCluster() in package parallel to get the default cluster set via setDefaultCluster().
str(x) for atomic objects x now treats both cases of is.vector(x) similarly, and hence much less often prints "atomic". This is a slight non-back-compatible change producing typically both more informative and shorter output.
write.dcf() gets optional argument useBytes.
New, partly experimental packageDate() which tries to get a valid "Date" object from a package ‘DESCRIPTION’ file, thanks to suggestions in PR#17324.
tools::resaveRdaFiles() gains a version argument, for use when packages should remain compatible with earlier versions of R.
ar.yw(x) and hence by default ar(x) now work when x has NAs, mostly thanks to a patch by Pavel Krivitsky in PR#17366. The ar.yw.default()‘s AIC computations have become more efficient by using determinant().
New warnErrList() utility (from package nlme, improved).
By default the (arbitrary) signs of the loadings from princomp() are chosen so the first element is non-negative.
If --default-packages is not used, then Rscript now checks the environment variable R_SCRIPT_DEFAULT_PACKAGES. If this is set, then it takes precedence over R_DEFAULT_PACKAGES. If default packages are not specified on the command line or by one of these environment variables, then Rscript now uses the same default packages as R. For now, the previous behavior of not including methods can be restored by setting the environment variable R_SCRIPT_LEGACY to yes.
When a package is found more than once, the warning from find.package(*, verbose=TRUE) lists all library locations.
POSIXt objects can now also be rounded or truncated to month or year.
stopifnot() can be used alternatively via new argument exprs which is nicer and useful when testing several expressions in one call.
The environment variable R_MAX_VSIZE can now be used to specify the maximal vector heap size. On macOS, unless specified by this environment variable, the maximal vector heap size is set to the maximum of 16GB and the available physical memory. This is to avoid having the R process killed when macOS over-commits memory.
sum(x) and sum(x1,x2,..,x) with many or long logical or integer vectors no longer overflows (and returns NA with a warning), but returns double numbers in such cases.
Single components of "POSIXlt" objects can now be extracted and replaced via [ indexing with 2 indices.
S3 method lookup now searches the namespace registry after the top level environment of the calling environment.
Arithmetic sequences created by 1:n, seq_along, and the like now use compact internal representations via the ALTREP framework. Coercing integer and numeric vectors to character also now uses the ALTREP framework to defer the actual conversion until first use.
Finalizers are now run with interrupts suspended.
merge() gains new option no.dups and by default suffixes the second of two duplicated column names, thanks to a proposal by Scott Ritchie (and Gabe Becker).
scale.default(x, center, scale) now also allows center or scale to be “numeric-alike”, i.e., such that as.numeric(.) coerces them correctly. This also eliminates a wrong error message in such cases.
par*apply and par*applyLB gain an optional argument chunk.size which allows to specify the granularity of scheduling.
Some as.data.frame() methods, notably the matrix one, are now more careful in not accepting duplicated or NA row names, and by default produce unique non-NA row names. This is based on new function .rowNamesDF(x, make.names = *) <- rNms where the logical argument make.names allows to specify how invalid row names rNms are handled. .rowNamesDF() is a “workaround” compatible default.
R has new serialization format (version 3) which supports custom serialization of ALTREP framework objects. These objects can still be serialized in format 2, but less efficiently. Serialization format 3 also records the current native encoding of unflagged strings and converts them when de-serialized in R running under different native encoding. Format 3 comes with new serialization magic numbers (RDA3, RDB3, RDX3). Format 3 can be selected by version = 3 in save(), serialize() and saveRDS(), but format 2 remains the default for all serialization and saving of the workspace. Serialized data in format 3 cannot be read by versions of R prior to version 3.5.0.
The "Date" and “date-time” classes "POSIXlt" and "POSIXct" now have a working `length<-` method, as wished in PR#17387.
optim(*, control = list(warn.1d.NelderMead = FALSE)) allows to turn off the warning when applying the default "Nelder-Mead" method to 1-dimensional problems.
matplot(.., panel.first = .) etc now work, as log becomes explicit argument and ... is passed to plot() unevaluated, as suggested by Sebastian Meyer in PR#17386.
Interrupts can be suspended while evaluating an expression using suspendInterrupts. Subexpression can be evaluated with interrupts enabled using allowInterrupts. These functions can be used to make sure cleanup handlers cannot be interrupted.
R 3.5.0 includes a framework that allows packages to provide alternate representations of basic R objects (ALTREP). The framework is still experimental and may undergo changes in future R releases as more experience is gained. For now, documentation is provided in https://svn.r-project.org/R/branches/ALTREP/ALTREP.html.
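A few of the new features listed above can be tried directly in any R >= 3.5.0 session. The snippet below is only an illustration: the helper names (`count_and_second()`, `record_exits()`, `exit_log`) and the toy data are made up for the example.

```r
# Requires R >= 3.5.0
stopifnot(getRversion() >= "3.5.0")

# ...length() and ...elt(n): inspect the dots inside a function
count_and_second <- function(...) {
  c(n = ...length(), second = ...elt(2))
}
dots_demo <- count_and_second(10, 20, 30)   # n = 3, second = 20

# isFALSE(), defined analogously to isTRUE(): only a single FALSE qualifies
false_demo <- c(isFALSE(FALSE), isFALSE(NA), isFALSE(c(FALSE, FALSE)))

# factor() with duplicated labels maps several values to one level
sizes <- factor(c("S", "M", "L", "XL"),
                levels = c("S", "M", "L", "XL"),
                labels = c("small", "medium", "large", "large"))
levels(sizes)   # "small" "medium" "large"

# on.exit(after = FALSE) puts an exit expression before existing ones
exit_log <- character()
record_exits <- function() {
  on.exit(exit_log <<- c(exit_log, "registered first"), add = TRUE)
  on.exit(exit_log <<- c(exit_log, "registered second"), add = TRUE, after = FALSE)
  invisible(NULL)
}
record_exits()
exit_log        # "registered second" runs before "registered first"

# stopifnot(exprs = ) checks several conditions in one block
x <- c(2, 4, 6)
stopifnot(exprs = {
  is.numeric(x)
  length(x) == 3
  all(x %% 2 == 0)
})

# type.convert() is now generic and works column-wise on data frames
raw_df <- data.frame(id = c("1", "2"), score = c("3.5", "4"),
                     stringsAsFactors = FALSE)
converted <- type.convert(raw_df, as.is = TRUE)   # id integer, score double

# coef() / vcov() with complete = TRUE keep NA entries for aliased terms
d <- data.frame(y = c(1, 3, 2, 5), x1 = c(1, 2, 3, 4))
d$x2 <- 2 * d$x1                   # exactly collinear with x1
fit <- lm(y ~ x1 + x2, data = d)
coef(fit)                          # x2 is NA
coef(fit, complete = FALSE)        # x2 dropped
dim(vcov(fit))                     # 3 x 3, with an NA row/column
dim(vcov(fit, complete = FALSE))   # 2 x 2

# Serialization format 3 (cannot be read by R versions before 3.5.0)
path <- tempfile(fileext = ".rds")
saveRDS(sizes, path, version = 3)
round_trip <- readRDS(path)
```

Note that `saveRDS()` still defaults to format 2 in R 3.5.0, so `version = 3` has to be requested explicitly, as the serialization item above states.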

UTILITIES

install.packages() for source packages now has the possibility to set a ‘timeout’ (elapsed-time limit). For serial installs this uses the timeout argument of system2(): for parallel installs it requires the timeout utility command from GNU coreutils.
It is now possible to set ‘timeouts’ (elapsed-time limits) for most parts of R CMD check via environment variables documented in the ‘R Internals’ manual.
The ‘BioC extra’ repository which was dropped from Bioconductor 3.6 and later has been removed from setRepositories(). This changes the mapping for 6–8 used by setRepositories(ind=).
R CMD check now also applies the settings of environment variables _R_CHECK_SUGGESTS_ONLY_ and _R_CHECK_DEPENDS_ONLY_ to the re-building of vignettes.
R CMD check with environment variable _R_CHECK_DEPENDS_ONLY_ set to a true value makes test-suite-management packages available and (for the time being) works around a common omission of rmarkdown from the VignetteBuilder field.

INSTALLATION on a UNIX-ALIKE

Support for a system Java on macOS has been removed — install a fairly recent Oracle Java (see ‘R Installation and Administration’ §C.3.2).
configure works harder to set additional flags in SAFE_FFLAGS only where necessary, and to use flags which have little or no effect on performance. In rare circumstances it may be necessary to override the setting of SAFE_FFLAGS.
C99 functions expm1, hypot, log1p and nearbyint are now required.
configure sets a -std flag for the C++ compiler for all supported C++ standards (e.g., -std=gnu++11 for the C++11 compiler). Previously this was not done in a few cases where the default standard passed the tests made (e.g. clang 6.0.0 for C++11).

C-LEVEL FACILITIES

‘Writing R Extensions’ documents macros MAYBE_REFERENCED, MAYBE_SHARED and MARK_NOT_MUTABLE that should be used by package C code instead of NAMED or SET_NAMED.
The object header layout has been changed to support merging the ALTREP branch. This requires re-installing packages that use compiled code.
‘Writing R Extensions’ now documents the R_tryCatch, R_tryCatchError, and R_UnwindProtect functions.
NAMEDMAX has been raised to 3 to allow protection of intermediate results from (usually ill-advised) assignments in arguments to BUILTIN functions. Package C code using SET_NAMED may need to be revised.

DEPRECATED AND DEFUNCT

Sys.timezone(location = FALSE) is defunct, and is ignored (with a warning).
methods:::bind_activation() is defunct now; it typically has been unneeded for years.
The undocumented ‘hidden’ objects .__H__.cbind and .__H__.rbind in package base are deprecated (in favour of cbind and rbind).
The declaration of pythag() in ‘Rmath.h’ has been removed — the entry point has not been provided since R 2.14.0.

BUG FIXES

printCoefmat() now also works without column names.
The S4 methods on Ops() for the "structure" class no longer cause infinite recursion when the structure is not an S4 object.
nlm(f, ..) for the case where f() has a "hessian" attribute now computes LL’ = H + µI correctly. (PR#17249).
An S4 method that “rematches” to its generic and overrides the default value of a generic formal argument to NULL no longer drops the argument from its formals.
Rscript can now accept more than one argument given on the #! line of a script. Previously, one could only pass a single argument on the #! line in Linux.
Connections are now written correctly with encoding "UTF-16LE". (PR#16737).
Evaluation of ..0 now signals an error. When ..1 is used and ... is empty, the error message is more appropriate.
(Windows mainly.) Unicode code points which require surrogate pairs in UTF-16 are now handled. All systems should properly handle surrogate pairs, even those systems that do not need to make use of them. (PR#16098)
stopifnot(e, e2, ...) now evaluates the expressions sequentially and in case of an error or warning shows the relevant expression instead of the full stopifnot(..) call.
path.expand() on Windows now accepts paths specified as UTF-8-encoded character strings even if not representable in the current locale. (PR#17120)
line(x, y) now correctly computes the medians of the left and right group’s x-values and in all cases reproduces straight lines.
Extending S4 classes with slots corresponding to special attributes like dim and dimnames now works.
Fix for legend() when fill has multiple values the first of which is NA (all colours used to default to par(fg)). (PR#17288)
installed.packages() did not remove the cached value for a library tree that had been emptied (but would not use the old value, just waste time checking it).
The documentation for installed.packages(noCache = TRUE) incorrectly claimed it would refresh the cache.
aggregate() no longer uses spurious names in some cases. (PR#17283)
object.size() now also works for long vectors.
packageDescription() tries harder to solve re-encoding issues, notably seen in some Windows locales. This fixes the citation() issue in PR#17291.
poly(<matrix>, 3) now works, thanks to prompting by Marc Schwartz.
readLines() no longer segfaults on very large files with embedded '\0' (aka ‘nul’) characters. (PR#17311)
ns() (package splines) now also works for a single observation. interpSpline() gives a more friendly error message when the number of points is less than four.
dist(x, method = "canberra") now uses the correct definition; the result may only differ when x contains values of differing signs, e.g. not for 0-1 data.
methods:::cbind() and methods:::rbind() avoid deep recursion, thanks to Suharto Anggono via PR#17300.
Arithmetic with zero-column data frames now works more consistently; issue raised by Bill Dunlap. Arithmetic with data frames gives a data frame for ^ (which previously gave a numeric matrix).
pretty(x, n) for large n or large diff(range(x)) now works better (though it was never meant for large n); internally it uses the same rounding fuzz (1e-10) as seq.default() — as it did up to 2010-02-03 when both were 1e-7.
Internal C-level R_check_class_and_super() and hence R_check_class_etc() now also consider non-direct super classes and hence return a match in more cases. This e.g., fixes behaviour of derived classes in package Matrix.
Reverted unintended change in behavior of return calls in on.exit expressions introduced by stack unwinding changes in R 3.3.0.
Attributes on symbols are now detected and prevented; attempt to add an attribute to a symbol results in an error.
fisher.test(*, workspace = <N>) now may also increase the internal stack size, which allows larger problems to be solved, fixing PR#1662.
The methods package no longer directly copies slots (attributes) into a prototype that is of an “abnormal” (reference) type, like a symbol.
The methods package no longer attempts to call length<-() on NULL (during the bootstrap process).
The methods package correctly shows methods when there are multiple methods with the same signature for the same generic (still not fully supported, but at least the user can see them).
sys.on.exit() is now always evaluated in the right frame. (From Lionel Henry.)
seq.POSIXt(*, by = "<n> DSTdays") now should work correctly in all cases and is faster. (PR#17342)
.C() when returning a logical vector now always maps values other than FALSE and NA to TRUE (as documented).
Subassignment with zero length vectors now coerces as documented (PR#17344).
Further, x <- numeric(); x[1] <- character() now signals an error ‘replacement has length zero’ (or a translation of that) instead of doing nothing.
(Package parallel.) mclapply(), pvec() and mcparallel() (when mccollect() is used to collect results) no longer leave zombie processes behind.
R CMD INSTALL now produces the intended error message when, e.g., the LazyData field is invalid.
as.matrix(dd) now works when the data frame dd contains a column which is a data frame or matrix, including a 0-column matrix/d.f. .
mclapply(X, mc.cores) now follows its documentation and calls lapply() in case mc.cores = 1 also in the case mc.preschedule is false. (PR#17373)
aggregate(<data.frame>, drop=FALSE) no longer calls the function on <empty> parts but sets corresponding results to NA. (Thanks to Suharto Anggono’s patches in PR#17280).
The duplicated() method for data frames is now based on the list method (instead of string coercion). Consequently unique() is better distinguishing data frame rows, fixing PR#17369 and PR#17381. The methods for matrices and arrays are changed accordingly.
Calling names() on an S4 object derived from "environment" behaves (by default) like calling names() on an ordinary environment.
read.table() with a non-default separator now supports quotes following a non-whitespace character, matching the behavior of scan().
parLapplyLB and parSapplyLB have been fixed to do load balancing (dynamic scheduling). This also means that results of computations depending on random number generators will now really be non-reproducible, as documented.
Indexing a list using dollar and empty string (l$"") returns NULL.
Using \usage{ data(<name>, package="<pkg>") } no longer produces R CMD check warnings.
match.arg() more carefully chooses the environment for constructing default choices, fixing PR#17401 as proposed by Duncan Murdoch.
Deparsing of consecutive ! calls is now consistent with deparsing unary - and + calls and creates code that can be reparsed exactly; thanks to a patch by Lionel Henry in PR#17397. (As a side effect, this uses fewer parentheses in some other deparsing involving ! calls.)
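A few of the behaviours listed above can be checked directly from the console. This is a minimal sketch, assuming R 3.5.0 or later; check() is a hypothetical helper used only to record which expressions stopifnot() actually evaluates, and the small vectors and data frames are made up for illustration.

```r
## 1. stopifnot() evaluates its arguments one at a time and reports the failing one.
evaluated <- character()
check <- function(label, value) {   # hypothetical helper: records each evaluation
  evaluated <<- c(evaluated, label)
  value
}
msg <- tryCatch(
  stopifnot(check("first", TRUE), check("second", FALSE), check("third", TRUE)),
  error = conditionMessage
)
evaluated  # "first" "second": the third expression is never evaluated
msg        # the message names check("second", FALSE) rather than the whole call

## 2. Canberra distance is sum(|u_i - v_i| / (|u_i| + |v_i|)); mixed signs on purpose.
u <- c(1, -2, 3)
v <- c(-3, 4, 1)
canberra_manual <- sum(abs(u - v) / (abs(u) + abs(v)))  # 1 + 1 + 0.5 = 2.5
canberra_dist <- as.numeric(dist(rbind(u, v), method = "canberra"))

## 3. `^` on a data frame now returns a data frame instead of a numeric matrix.
df <- data.frame(a = 1:2, b = 3:4)
squared <- df ^ 2

## 4. Assigning a zero-length value into x[1] is now an error, not a silent no-op.
zero_len <- tryCatch({ x <- numeric(); x[1] <- character(); "no error" },
                     error = function(e) "error")

## 5. duplicated()/unique() on data frames compare the values themselves.
dd <- data.frame(p = c(0.1 + 0.2, 0.3), q = c(1, 1))
duplicated(dd)  # FALSE FALSE: 0.1 + 0.2 and 0.3 differ in their last bits

## 6. line() reproduces an exact straight line (intercept 3, slope 2).
xs <- 1:10
fit_line <- line(xs, 3 + 2 * xs)
coef(fit_line)
```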

The post R 3.5.0 is released! (major release with many new features) first appeared on R-statistics blog.

R 3.4.3 is released (a bug-fix release)

Tal Galili — Fri, 08 Dec 2017 18:32:48 +0000

R 3.4.3 (codename “Kite-Eating Tree”) was released last week. You can get the latest binaries from here (or the .tar.gz source code from here).

As mentioned by David Smith, R 3.4.3 is primarily a bug-fix release:

It fixes an issue with incorrect time zones on MacOS High Sierra, and some issues with handling Unicode characters. (Incidentally, representing international and special characters is something that R takes great care in handling properly. It’s not an easy task: a 2003 essay by Joel Spolsky describes the minefield that is character representation, and not much has changed since then.)

The full list of bug fixes and new features is provided below.

Upgrading to R 3.4.3 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code in Rgui:

install.packages("installr") # install 
setInternet2(TRUE) # only for R versions older than 3.3.0
installr::updateR() # updating R.
# If you wish it to go faster, run: installr::updateR(T)

I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package, you are invited to open an issue on the GitHub page.

CHANGES IN R 3.4.3

INSTALLATION on a UNIX-ALIKE

A workaround has been added for the changes in location of time-zone files in macOS 10.13 ‘High Sierra’ and again in 10.13.1, so the default time zone is deduced correctly from the system setting when R is configured with --with-internal-tzcode (the default on macOS).
R CMD javareconf has been updated to recognize the use of a Java 9 SDK on macOS.

BUG FIXES

raw(0) & raw(0) and raw(0) | raw(0) again return raw(0) (rather than logical(0)).
intToUtf8() converts integers corresponding to surrogate code points to NA rather than invalid UTF-8, as well as values larger than the current Unicode maximum of 0x10FFFF. (This aligns with the current RFC3629.)
Fix calling of methods on S4 generics that dispatch on ... when the call contains ....
Following Unicode ‘Corrigendum 9’, the UTF-8 representations of U+FFFE and U+FFFF are now regarded as valid by utf8ToInt().
range(c(TRUE, NA), finite = TRUE) and similar no longer return NA. (Reported by Lukas Stadler.)
The self starting function attr(SSlogis, "initial") now also works when the y values have exact minimum zero and is slightly changed in general, behaving symmetrically in the y range.
The printing of named raw vectors is now formatted nicely as for other such atomic vectors, thanks to Lukas Stadler.
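Several of these R 3.4.3 fixes are easy to see at the console. A minimal sketch, assuming R 3.4.3 or later; the variable names are only for illustration.

```r
raw_and        <- raw(0) & raw(0)                 # raw(0) again, not logical(0)
lone_surrogate <- intToUtf8(0xD800)               # NA: a surrogate code point on its own
beyond_max     <- intToUtf8(0x110000)             # NA: above the Unicode maximum 0x10FFFF
nonchar        <- utf8ToInt(intToUtf8(0xFFFF))    # U+FFFF is now accepted as valid UTF-8
finite_range   <- range(c(TRUE, NA), finite = TRUE)  # no longer NA
```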

The post R 3.4.3 is released (a bug-fix release) first appeared on R-statistics blog.

heatmaply: an R package for creating interactive cluster heatmaps for online publishing

Tal Galili — Mon, 30 Oct 2017 14:07:54 +0000

This post on the heatmaply package is based on my recent paper from the journal Bioinformatics (a link to a stable DOI). The paper was published just last week, and since it is released as CC-BY, I am permitted (and delighted) to republish it here in full. My co-authors for this paper are Jonathan Sidi, Alan O’Callaghan, and Carson Sievert.

Summary: heatmaply is an R package for easily creating interactive cluster heatmaps that can be shared online as a stand-alone HTML file. Interactivity includes a tooltip display of values when hovering over cells, as well as the ability to zoom in to specific sections of the figure from the data matrix, the side dendrograms, or annotated labels. Thanks to the synergistic relationship between heatmaply and other R packages, the user gains refined control over the statistical and visual aspects of the heatmap layout.

Availability: The heatmaply package is available under the GPL-2 Open Source license. It comes with a detailed vignette, and is freely available from: http://cran.r-project.org/package=heatmaply

Contact: Tal.Galili@math.tau.ac.il

Introduction

A cluster heatmap is a popular graphical method for visualizing high dimensional data. In it, a table of numbers is scaled and encoded as a tiled matrix of colored cells. The rows and columns of the matrix are ordered to highlight patterns and are often accompanied by dendrograms and extra columns of categorical annotation. The ongoing development of this iconic visualization, spanning more than a century, has provided the foundation for one of the most widely used of all bioinformatics displays (Wilkinson and Friendly, 2009). When using the R language for statistical computing (R Core Team, 2016), there are many available packages for producing static heatmaps, such as stats, gplots, heatmap3, fheatmap, pheatmap, and others. Recently released packages also allow for more complex layouts; these include gapmap, superheat, and ComplexHeatmap (Gu et al., 2016). The next evolutionary step has been to create interactive cluster heatmaps, and several solutions are already available. However, these solutions, such as the idendro R package (Sieger et al., 2017), are often focused on providing an interactive output that can be explored only on the researcher’s personal computer. Some solutions do exist for creating shareable interactive heatmaps. However, these are either dependent on a specific online provider, such as XCMS Online, or require JavaScript knowledge to operate, such as InCHlib. In practice, when publishing in academic journals, the reader is left with a static figure only (often in a png or pdf format).

To fill this gap, we have developed the heatmaply R package for easily creating a shareable HTML file that contains an interactive cluster heatmap. The interactivity is based on a client-side JavaScript code that is generated based on the user’s data, after running the following command:

install.packages("heatmaply")
library(heatmaply)
heatmaply(data, file = "my_heatmap.html") # 'data' stands for your own numeric matrix or data frame

The HTML file contains a publication-ready, interactive figure that allows the user to zoom in as well as see values when hovering over the cells. This self-contained HTML file can be made available to interested readers by uploading it to the researcher’s homepage or as a supplementary material in the journal’s server. Concurrently, this interactive figure can be displayed in RStudio’s viewer pane, included in a Shiny application, or embedded in a knitr/RMarkdown HTML document.

The rest of this paper offers guidelines for creating effective cluster heatmap visualizations. Figure 1 demonstrates the suggestions from this section on data from project Tycho (van Panhuis et al., 2013), while the online supplementary information includes the interactive version, as well as several examples of using the package on real-world biological data.

Fig. 1. The (square root) number of people infected by Measles in 50 states, from 1928 to 2003. Vaccines were introduced in 1963


An interactive version of the measles heatmap (embedded in the post using iframe)

I uploaded measles_heatmaply.html to GitHub and embedded it in the post using an iframe.

heatmaply – a simple example

The generation of cluster heatmaps is a subtle process (Gehlenborg and Wong, 2012; Weinstein, 2008), requiring the user to make many decisions along the way. The major decisions to be made deal with the data matrix and the dendrogram. The raw data often need to be transformed in order to have a meaningful and comparable scale, while an appropriate color palette should be picked. The clustering of the data requires us to decide on a distance measure between the observations, a linkage function, as well as a rotation and coloring of branches that manage to highlight interpretable clusters. Each such decision can have consequences for the patterns and interpretations that emerge. In this section, we go through some of the arguments in the function heatmaply, aiming to make it easy for the user to tune these important statistical and visual parameters. Our toy example visualizes the effect of vaccines on measles infection. The output is given in the static Fig. 1, while an interactive version is available online in the supplementary file “measles.html”. Both were created using:

heatmaply(x = sqrt(measles),
           color = viridis, # the default
           Colv = NULL,
           hclust_method = "average", k_row = NA, # ...
           file = c("measles.html", "measles.png") )

The first argument of the function (x) accepts a matrix of the data. In the measles data, each row corresponds to a state, each column to a year (from 1928 to 2003), and each cell to the number of people infected with measles per 100,000 people. In this example, the data were scaled twice. First, instead of giving the raw number of measles cases, the counts are expressed relative to 100,000 people, which makes it easier to compare states. Second, the square root of the values is taken. This was done because all the values in the data represent the same unit of measure but come from a right-tailed distribution of count data with some extreme observations. Taking the square root brings extreme observations closer to one another, which helps prevent a few extreme observations from masking the general pattern. Other transformations worth considering come from the Box-Cox or Yeo-Johnson families of power transformations.

If each column of the data were to represent a different unit of measure, then leaving the values unchanged would often make the entire figure unusable, because the column with the largest range of values would take over most of the colors in the figure. Possible per-column transformations include the scale function, suitable for data that are relatively normal. The normalize and percentize functions bring each column to a comparable 0 to 1 scale. The normalize function preserves the shape of each column’s distribution by subtracting the column’s minimum and dividing by its range (maximum minus minimum). The percentize function is similar to ranking but with the simpler interpretation of each value being replaced by the percent of observations that have that value or below; it uses the empirical cumulative distribution function of each variable evaluated on its own values. The sparseness of the dataset can be explored using is.na10.
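The two per-column transformations described above can be sketched in a few lines of base R. This is a minimal sketch under the stated definitions, not heatmaply's own code: normalize_col() and percentize_col() are hypothetical helpers, and the toy data frame is made up so that column b has one extreme value.

```r
# Hypothetical helpers mirroring the definitions in the text (not heatmaply's functions).
normalize_col <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])   # minimum -> 0, maximum -> 1, shape preserved
}
percentize_col <- function(x) {
  ecdf(x)(x)                         # share of observations <= each value
}

df <- data.frame(a = c(10, 20, 30, 40),
                 b = c(1, 1, 2, 100))  # b has one extreme observation
normalized  <- as.data.frame(lapply(df, normalize_col))
percentized <- as.data.frame(lapply(df, percentize_col))
normalized
percentized
```

Note how normalizing column b keeps the outlier's dominance (three of its four values end up near 0), while percentizing spreads the values out by rank, which is the trade-off between the two options.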

Once the data are adequately scaled, it is important to choose a good color palette for the data. Other than being pretty, an ideal color palette should have three (somewhat conflicting) properties: (1) Colorful, spanning as wide a palette as possible so as to make differences easy to see; (2) Perceptually uniform, so that values close to each other have similar-appearing colors compared with values that are far away, consistently across the range of values; and (3) Robust to colorblindness, so that the above properties hold true for people with common forms of colorblindness, as well as printing well in grey scale. The default passed to the color argument in heatmaply is viridis, a sequential color palette that offers a good balance of these properties. A divergent color scale should be preferred when visualizing a correlation matrix, as it is important to make the low and high ends of the range visually distinct. A helpful divergent palette available in the package is cool_warm (other alternatives in the package include RdBu, BrBG, or RdYlBu, based on the RColorBrewer package). It is also advisable to set the limits argument to range from -1 to 1.
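Why fixed limits of -1 and 1 matter for a divergent palette can be shown without heatmaply at all. This minimal base-R sketch assumes a simple blue-white-red ramp (not cool_warm) and the built-in mtcars data: with breaks spread symmetrically over [-1, 1], a correlation of 0 always lands on the neutral middle color, whatever range the observed correlations happen to cover.

```r
cm <- cor(mtcars)
cm <- pmin(pmax(cm, -1), 1)   # guard against rounding just outside [-1, 1]

pal <- colorRampPalette(c("blue", "white", "red"))(11)  # simple divergent palette
breaks <- seq(-1, 1, length.out = length(pal) + 1)      # 11 equal bins over [-1, 1]
colour_index <- function(v) cut(v, breaks, include.lowest = TRUE, labels = FALSE)

# One color per cell, in the same layout as the correlation matrix
cell_colours <- matrix(pal[colour_index(cm)], nrow = nrow(cm), dimnames = dimnames(cm))
zero_colour <- pal[colour_index(0)]   # the middle (white) color
```

Had the breaks instead been derived from range(cm), the neutral color would shift whenever the observed correlations were not symmetric around zero.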

Passing NULL to the Colv argument, in our example, removed the column dendrogram (since we wish to keep the order of the columns, relating to the years). The row dendrogram is automatically calculated using hclust with a Euclidean distance measure and the average linkage function. The user can choose to use an alternative clustering function (hclustfun), distance measure (dist_method), or linkage function (hclust_method), or to have a dendrogram only in the rows/columns or none at all (through the dendrogram argument). Users can also supply their own dendrogram objects into the Rowv (or Colv) arguments. The preparation of the dendrograms can be made easier using the dendextend R package (Galili, 2015) for comparing and adjusting dendrograms. These choices are all left for the user to decide. Setting the k_col/k_row argument to NA makes the function search for the number of clusters (from 2 to 10) by which to color the branches of the dendrogram. The number picked is the one that yields the highest average silhouette coefficient (based on the find_k function from dendextend). Lastly, the heatmaply function uses the seriation package to find an “optimal” ordering of rows and columns (Hahsler et al., 2008). This is controlled using the seriation argument where the default is “OLO” (optimal-leaf-order), which rotates the branches so that the sum of distances between each adjacent leaf (label) will be minimized (i.e., it optimizes the Hamiltonian path length that is restricted by the dendrogram structure). The other arguments in the example were omitted since they are self-explanatory; the exact code is available in the supplementary material.
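The "pick k by the highest average silhouette" idea behind k_row = NA can be sketched in base R. This is an illustration of the criterion only, not dendextend's find_k implementation: avg_silhouette() is a hypothetical helper, and scaled mtcars stands in for real data. It uses a Euclidean distance and average linkage, as in the example above, and searches k from 2 to 10.

```r
# Hypothetical helper: mean silhouette width of a clustering `cl` given distances `d`.
avg_silhouette <- function(cl, d) {
  dm <- as.matrix(d)
  widths <- vapply(seq_along(cl), function(i) {
    same <- cl == cl[i]
    same[i] <- FALSE
    if (!any(same)) return(0)                   # singleton clusters get width 0
    a <- mean(dm[i, same])                      # mean distance within own cluster
    b <- min(vapply(setdiff(unique(cl), cl[i]), # nearest other cluster
                    function(k) mean(dm[i, cl == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(widths)
}

x  <- scale(mtcars)
d  <- dist(x, method = "euclidean")
hc <- hclust(d, method = "average")

ks <- 2:10
avg_sil <- vapply(ks, function(k) avg_silhouette(cutree(hc, k = k), d), numeric(1))
best_k  <- ks[which.max(avg_sil)]   # number of clusters used to color the branches
best_k
```

The cluster package's silhouette() computes the same widths; a hand-written version is used here only to keep the sketch free of extra dependencies.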

In order to make some of the above easier, we created the shinyHeatmaply package (available on CRAN), which offers a GUI to guide the researcher through heatmap construction, with functionality to export the heatmap as an HTML file and to summarize the parameter specifications needed to reproduce the heatmap with heatmaply. For a more detailed, step-by-step demonstration of using heatmaply on biological datasets, see the heatmaplyExamples package (at github.com/talgalili/heatmaplyExamples).

The following biological examples are available and fully reproducible from within the package. You may also view them online in the following links (the html files also include the R code for producing the figures):

Introduction to heatmaply
General biological examples
- Using heatmaply with the measles data set (the final output)
- Using heatmaply with famous data sets
Reproducing heatmaps from papers published in Nature
- Using heatmaply to reproduce Nature (2015) Kotsyfakis et al.
- Using heatmaply to reproduce Nature (2015) Alfano et al.
Using heatmaply with gene expression data
General examples
- Using heatmaply for visualizing glmnet coefficient path

Acknowledgements

The heatmaply package was made possible by leveraging many wonderful R packages, including ggplot2 (Wickham, 2009), plotly (Sievert et al., 2016), dendextend (Galili, 2015) and many others. We would also like to thank Yoav Benjamini, Madeline Bauer, and Marilyn Friedes for their helpful comments, as well as Joe Cheng for initiating the collaboration with Tal Galili on d3heatmap, which helped lay the foundation for heatmaply.

Funding: This work was supported in part by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 604102 (Human Brain Project).

Conflict of Interest: none declared.

References

Galili,T. (2015) dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics, 31, 3718–3720.
Gehlenborg,N. and Wong,B. (2012) Points of view: Heat maps. Nat. Methods, 9, 213–213.
Gu,Z. et al. (2016) Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics, 32, 2847–2849.
Hahsler,M. et al. (2008) Getting Things in Order: An Introduction to the R Package seriation. J. Stat. Softw., 25, 1–27.
van Panhuis,W.G. et al. (2013) Contagious Diseases in the United States from 1888 to the Present. N. Engl. J. Med., 369, 2152–2158.
R Core Team (2016) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
Sieger,T. et al. (2017) Interactive Dendrograms: The R Packages idendro and idendr0. J. Stat. Softw., 76.
Sievert,C. et al. (2016) plotly: Create Interactive Web Graphics via ‘plotly.js’.
Weinstein,J.N. (2008) BIOCHEMISTRY: A Postgenomic Visual Icon. Science, 319, 1772–1773.
Wickham,H. (2009) ggplot2: Elegant Graphics for Data Analysis.
Wilkinson,L. and Friendly,M. (2009) The History of the Cluster Heat Map. Am. Stat., 63, 179–184.

The post heatmaply: an R package for creating interactive cluster heatmaps for online publishing first appeared on R-statistics blog.

R 3.4.2 is released (with several bug fixes and a few performance improvements)

Tal Galili — Fri, 29 Sep 2017 17:46:38 +0000

R 3.4.2 (codename “Short Summer”) was released yesterday. You can get the latest binaries from here (or the .tar.gz source code from here).

As mentioned by David Smith, R 3.4.2 includes a performance improvement for names:

c() and unlist() are now more efficient in constructing the names(.) of their return value, thanks to a proposal by Suharto Anggono. (PR#17284)

The full list of bug fixes and new features is provided below.

Thank you, Duncan Murdoch!

On a related note, following the announcement on R 3.4.2, Duncan Murdoch wrote yesterday:

I’ve just finished the Windows build of R 3.4.2. It will make it to CRAN and its mirrors over the next few hours.

This is the last binary release that I will be producing. I’ve been building them for about 15 years, and it’s time to retire. Builds using different tools and scripts are available from https://mran.microsoft.com/download/. I’ll be putting my own scripts on CRAN soon in case anyone wants to duplicate them.

Nightly builds of R-patched and R-devel will continue to run on autopilot for the time being, without maintenance.

I will also be retiring from maintenance of the Rtools collection.

I am grateful to Duncan for contributing so much of his time and expertise throughout the years. And I am confident that other R users, using the binaries for the Windows OS, share this sentiment.

Upgrading to R 3.4.2 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code in Rgui:

install.packages("installr") # install 
setInternet2(TRUE) # only for R versions older than 3.3.0
installr::updateR() # updating R.
# If you wish it to go faster, run: installr::updateR(T)

I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package, you are invited to open an issue on the GitHub page.

CHANGES IN R 3.4.2

NEW FEATURES

Setting the LC_ALL category in Sys.setlocale() invalidates any cached locale-specific day/month names and the AM/PM indicator for strptime() (as setting LC_TIME has since R 3.1.0).
The version of LAPACK included in the sources has been updated to 3.7.1, a bug-fix release.
The default for tools::write_PACKAGES(rds_compress=) has been changed to "xz" to match the compression used by CRAN.
c() and unlist() are now more efficient in constructing the names(.) of their return value, thanks to a proposal by Suharto Anggono. (PR#17284)
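The names that c() and unlist() construct follow the usual rules; R 3.4.2 only makes building them faster. A small sketch with made-up inputs, showing what those names look like:

```r
parts <- list(a = 1, b = 2:3, 4)
unlist_names <- names(unlist(parts))               # "a" "b1" "b2" ""
c_names <- names(c(x = 1, y = c(p = 2, q = 3)))    # "x" "y.p" "y.q"
```

Unnamed elements contribute an empty name, multi-element components get numbered suffixes, and nested names are joined with a dot.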

UTILITIES

R CMD check checks for and R CMD build corrects CRLF line endings in shell scripts configure and cleanup (even on Windows).

INSTALLATION on a UNIX-ALIKE

The order of selection of OpenMP flags has been changed: Oracle Developer Studio 12.5 accepts -fopenmp and -xopenmp but only the latter enables OpenMP so it is now tried first.

BUG FIXES

within(List, rm(x1, x2)) works correctly again, including when List[["x2"]] is NULL.
regexec(pattern, text, *) now applies as.character(.) to its first two arguments, as documented.
write.table() and related functions, writeLines(), and perhaps other functions writing text to connections did not signal errors when the writes failed, e.g. due to a disk being full. Errors will now be signalled if detected during the write, warnings if detected when the connection is closed. (PR#17243)
rt() assumed the ncp parameter was a scalar. (PR#17306)
menu(choices) with more than 10 choices which easily fit into one getOption("width")-line no longer erroneously repeats choices. (PR#17312)
length()<- on a pairlist succeeds. (https://stat.ethz.ch/pipermail/r-devel/2017-July/074680.html)
Language objects such as quote(("\n")) or R functions are correctly printed again, where R 3.4.1 accidentally duplicated the backslashes.
Construction of names() for very large objects in c() and unlist() now works, thanks to Suharto Anggono’s patch proposals in PR#17292.
Resource leaks (and similar) reported by Steve Grubb fixed. (PR#17314, PR#17316, PR#17317, PR#17318, PR#17319, PR#17320)
model.matrix(~1, mf) now gets the row names from mf also when they differ from 1:nrow(mf), fixing PR#14992 thanks to the suggestion by Sebastian Meyer.
sigma(fm) now takes the correct denominator degrees of freedom for a fitted model with NA coefficients. (PR#17313)
hist(x, "FD") no longer “dies” with a somewhat cryptic error message when x has extreme outliers or IQR() zero: nclass.FD(x) tries harder to find a robust bin width h in the latter case, and hist.default(*, breaks) now checks and corrects a too large breaks number. (PR#17274)
callNextMethod() works for ... methods.
qr.coef(qd, y) now has correct names also when qd is a complex QR or stems from qr(*, LAPACK=TRUE).
Setting options(device = *) to an invalid function no longer segfaults when plotting is initiated. (PR#15883)
encodeString() no longer segfaults. (PR#15885)
It is again possible to use configure --enable-maintainer-mode without having installed notangle (it was required in R 3.4.[01]).
S4 method dispatch on ... calls the method by name instead of .Method (for consistency with default dispatch), and only attempts to pass non-missing arguments from the generic.
readRDS(textConnection(.)) works again. (PR#17325)
(1:n)[-n] no longer segfaults for n <- 2.2e9 (on a platform with enough RAM).
x <- 1:2; tapply(x, list(x, x), function(x) "")[1,2] now correctly returns NA. (PR#17333)
Running of finalizers after explicit GC request moved from the R interface do_gc to the C interface R_gc. This helps with reclaiming inaccessible connections.
help.search(topic) and ??topic matching topics in vignettes with multiple file name extensions (e.g., ‘*.md.rsp’ but not ‘*.Rmd’) failed with an error when using options(help_type = "html").
The X11 device no longer uses the Xlib backing store (PR#16497).
array(character(), 1) now gives (a 1D array with) NA as has been documented for a long time as in the other cases of zero-length array initialization and also compatibly with matrix(character(), *). As mentioned there, this also fixes PR#17333.
splineDesign(.., derivs = 4) no longer segfaults.
fisher.test(*, hybrid=TRUE) now (again) will use the hybrid method when Cochran’s conditions are met, fixing PR#16654.
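A few of the R 3.4.2 fixes above can be reproduced with tiny inputs. This is a minimal sketch, assuming R 3.4.2 or later; the list, the collinear regression data and the variable names are made up for illustration.

```r
# within() on a list, removing two elements, one of which is NULL
lst <- list(x1 = 1, x2 = NULL, x3 = 3)
trimmed <- within(lst, rm(x1, x2))     # only x3 remains

# sigma() for a model with an NA coefficient (x2 is an exact multiple of x1)
x1 <- 1:10
x2 <- 2 * x1
y  <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2)
fit <- lm(y ~ x1 + x2)
coef(fit)                              # the x2 coefficient is NA
sigma(fit)                             # uses n - rank residual degrees of freedom
summary(fit)$sigma                     # should agree with sigma(fit)

# Empty tapply() cells and zero-length array initialization give NA
g <- 1:2
tab <- tapply(g, list(g, g), function(v) "")
tab[1, 2]
array(character(), 1)
```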

The post R 3.4.2 is released (with several bug fixes and a few performance improvements) first appeared on R-statistics blog.

R 3.4.1 is released – with some Windows related bug-fixes

Tal Galili — Tue, 11 Jul 2017 07:02:27 +0000

R 3.4.1 (codename “Single Candle”) was released several days ago. You can get the latest binaries from here (or the .tar.gz source code from here).

As mentioned last week by David Smith, R 3.4.1 includes several Windows-related bug fixes:

including an issue sometimes encountered when attempting to install packages on Windows, and problems displaying functions including Unicode characters (like “日本語”) in the Windows GUI.

The full list of bug fixes and new features is provided below.

Upgrading to R 3.4.1 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code in Rgui:

install.packages("installr") # install 
setInternet2(TRUE) # only for R versions older than 3.3.0
installr::updateR() # updating R.
# If you wish it to go faster, run: installr::updateR(T)

I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package, you are invited to open an issue on the GitHub page.

CHANGES IN R 3.4.1

INSTALLATION on a UNIX-ALIKE

The deprecated support for PCRE versions older than 8.20 has been removed.

BUG FIXES

getParseData() gave incorrect column information when code contained multi-byte characters. (PR#17254)
Asking for help using expressions like ?stats::cor() did not work. (PR#17250)
readRDS(url(....)) now works.
R CMD Sweave again returns status = 0 on successful completion.
Vignettes listed in ‘.Rbuildignore’ were not being ignored properly. (PR#17246)
file.mtime() no longer returns NA on Windows when the file or directory is being used by another process. This affected installed.packages(), which is now protected against this.
R CMD INSTALL Windows .zip file obeys --lock and --pkglock flags.
(Windows only) The choose.files() function could return incorrect results when called with multi = FALSE. (PR#17270)
aggregate(<data.frame>, drop = FALSE) now also works in case of near-equal numbers in by. (PR#16918)
fourfoldplot() could encounter integer overflow when calculating the odds ratio. (PR#17286)
parse() no longer gives spurious warnings when extracting srcrefs from a file not encoded in the current locale.
This was seen from R CMD check with ‘inst/doc/*.R’ files, and check has some additional protection for such files.
print.noquote(x) now always returns its argument x (invisibly).
Non-UTF-8 multibyte character sets were not handled properly in source references. (PR#16732)
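The print.noquote() change is easy to confirm: printing a noquote object should hand back the very same object, invisibly. A minimal sketch, assuming R 3.4.1 or later; capture.output() is used only to keep the printed line out of the way.

```r
nq <- noquote(c("alpha", "beta"))
printed <- capture.output(res <- withVisible(print(nq)))
printed                   # "[1] alpha beta"
res$visible               # FALSE: the value is returned invisibly
identical(res$value, nq)  # TRUE: the argument itself is returned
```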

The post R 3.4.1 is released – with some Windows related bug-fixes first appeared on R-statistics blog.

R 3.4.0 is released – with new speed upgrades and bug-fixes

Tal Galili — Mon, 24 Apr 2017 09:14:22 +0000

R 3.4.0 (codename “You Stupid Darkness”) was released 3 days ago. You can get the latest binaries from here (or the .tar.gz source code from here). The full list of bug fixes and new features is provided below.

As mentioned two months ago by David Smith, R 3.4.0 includes several major changes aimed at improving the performance of R in various ways. These include:

The JIT (‘Just In Time’) byte-code compiler is now enabled by default at its level 3. This means functions will be compiled on first or second use and top-level loops will be compiled and then run. (Thanks to Tomas Kalibera for extensive work to make this possible.) For now, the compiler will not compile code containing explicit calls to browser(): this is to support single stepping from the browser() call. JIT compilation can be disabled for the rest of the session using compiler::enableJIT(0) or by setting environment variable R_ENABLE_JIT to 0.
Matrix products now consistently bypass BLAS when the inputs have NaN/Inf values. Performance of the check of inputs has been improved. Performance when BLAS is used is improved for matrix/vector and vector/matrix multiplication (DGEMV is now used instead of DGEMM). One can now choose from alternative matrix product implementations via options(matprod = ). The “internal” implementation is not optimized for speed but consistent in precision with other summations in R (using long double accumulators where available). “blas” calls BLAS directly for best speed, but usually with undefined behavior for inputs with NaN/Inf.
Speedup in simplify2array() and hence sapply() and mapply() (for the case of names and common length > 1), thanks to Suharto Anggono’s PR#17118.
Accumulating vectors in a loop is faster: assigning to an element of a vector beyond the current length now over-allocates by a small fraction. The new vector is marked internally as growable, and the true length of the new vector is stored in the truelength field. This makes building up a vector result by assigning to the next element beyond the current length more efficient, though pre-allocating is still preferred. The implementation is subject to change and not intended to be used in packages at this time.
C-LEVEL FACILITIES have been extended.
Radix sort (which can be more efficient in some cases) is now chosen by method = "auto" for sort.int() for double vectors (and hence used for sort() for unclassed double vectors), excluding ‘long’ vectors. sort.int(method = "radix") no longer rounds double vectors. The default method until R 3.2.0 was "shell". A minimal comparison between the two shows that for very short vectors (100 values), "shell" performs better. At 1,000 values they are comparable, and for larger vectors "radix" is 2-3 times faster (which is probably the case we care about most). More about this can be read in ?sort.int

```
> library(microbenchmark)
> set.seed(2017-04-24)
> x <- …   # a numeric vector of length 100
> microbenchmark(shell = sort.int(x, method = "shell"), radix = sort.int(x, method = "radix"))
Unit: microseconds
  expr    min     lq     mean median     uq    max neval cld
 shell 15.775 16.606 17.80971 17.989 18.543 33.211   100  a 
 radix 32.657 34.595 35.67700 35.148 35.702 88.561   100   b
> set.seed(2017-04-24)
> x <- …   # a numeric vector of length 1,000
> microbenchmark(shell = sort.int(x, method = "shell"), radix = sort.int(x, method = "radix"))
Unit: microseconds
  expr    min     lq     mean median      uq    max neval cld
 shell 53.414 55.074 56.54395 56.182 57.0120 96.034   100   b
 radix 45.665 46.772 48.04222 47.325 48.1555 78.598   100  a 
> set.seed(2017-04-24)
> x <- …   # a much larger numeric vector
> microbenchmark(shell = sort.int(x, method = "shell"), radix = sort.int(x, method = "radix"))
Unit: milliseconds
  expr      min       lq      mean    median        uq      max neval cld
 shell 93.33140 95.94478 107.75347 103.02756 115.33709 221.0800   100   b
 radix 38.18241 39.01516  46.47038  41.45722  47.49596 159.3518   100  a 
```
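For a comparison that needs only base R, here is a minimal sketch using system.time() instead of microbenchmark. The standard-normal data and the length of 1e6 are assumptions, and the timings will vary by machine. It also checks that both methods return exactly the same result, since radix sort no longer rounds doubles.

```r
# Compare sort.int()'s "shell" and "radix" methods on a large double vector.
# Assumption: 1e6 standard-normal values; any unclassed double vector works.
set.seed(2017-04-24)
x <- rnorm(1e6)

t_shell <- system.time(s_shell <- sort.int(x, method = "shell"))
t_radix <- system.time(s_radix <- sort.int(x, method = "radix"))

# Elapsed seconds for each method (machine-dependent)
print(c(shell = t_shell[["elapsed"]], radix = t_radix[["elapsed"]]))

# Radix sort no longer rounds doubles, so both results should be identical
print(identical(s_shell, s_radix))
```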

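The over-allocation change makes growing a vector element by element cheaper, although pre-allocating is still preferred. A minimal sketch comparing the two patterns; grow() and prealloc() are hypothetical helper names used only for illustration, and the timings depend on the machine.

```r
# Hypothetical helper: grows x one element at a time (benefits from over-allocation)
grow <- function(n) {
  x <- numeric(0)                    # start empty; each x[i] <- ... extends x
  for (i in seq_len(n)) x[i] <- i^2
  x
}

# Hypothetical helper: allocates the full length up front (still the preferred pattern)
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[i] <- i^2
  x
}

n <- 1e5
print(system.time(grow(n)))
print(system.time(prealloc(n)))
print(identical(grow(n), prealloc(n)))  # same result either way
```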
More about the changes in R can be read in the nice post by David Smith, or in the list of changes given below.

Upgrading to R 3.4.0 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code in Rgui:

```
install.packages("installr") # install
setInternet2(TRUE) # only needed for R versions older than 3.3.0 (defunct since R 3.4.0)
installr::updateR() # update R
# If you wish it to go faster, run: installr::updateR(TRUE)
```

I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package, you are invited to open an issue on the GitHub page.

CHANGES IN R 3.4.0

SIGNIFICANT USER-VISIBLE CHANGES

(Unix-alike) The default methods for download.file() and url() now choose "libcurl" except for file:// URLs. There will be small changes in the format and wording of messages, including in rare cases if an issue is a warning or an error. For example, when HTTP re-direction occurs, some messages refer to the final URL rather than the specified one. Those who use proxies should check that their settings are compatible (see ?download.file: the most commonly used forms work for both "internal" and "libcurl").
table() has been amended to be more internally consistent and to be backward compatible with R <= 2.7.2 again. Consequently, table(1:2, exclude = NULL) no longer contains a zero count for <NA>, but useNA = "always" continues to do so.
summary.default() no longer rounds, but its print method does, resulting in less extraneous rounding, notably of numbers in the ten thousands.
factor(x, exclude = L) behaves more rationally when x or L are character vectors. Further, exclude = <factor> now behaves as documented for long.
Arithmetic, logic (&, |) and comparison (aka ‘relational’, e.g., <, ==) operations with arrays now behave consistently, notably for arrays of length zero. Arithmetic between length-1 arrays and longer non-arrays had silently dropped the array attributes and recycled. This now gives a warning and will signal an error in the future, as it has always for logic and comparison operations in these cases (e.g., compare matrix(1,1) + 2:3 and matrix(1,1) < 2:3).
The JIT (‘Just In Time’) byte-code compiler is now enabled by default at its level 3. This means functions will be compiled on first or second use and top-level loops will be compiled and then run. (Thanks to Tomas Kalibera for extensive work to make this possible.) For now, the compiler will not compile code containing explicit calls to browser(): this is to support single stepping from the browser() call. JIT compilation can be disabled for the rest of the session using compiler::enableJIT(0) or by setting environment variable R_ENABLE_JIT to 0.
xtabs() handles NAs more consistently; in particular, it no longer sets them to 0 in its result. Further, a new logical option addNA allows counting NAs where appropriate. Additionally, for the case sparse = TRUE, the result’s dimnames are identical to the default case’s.
Matrix products now consistently bypass BLAS when the inputs have NaN/Inf values. Performance of the check of inputs has been improved. Performance when BLAS is used is improved for matrix/vector and vector/matrix multiplication (DGEMV is now used instead of DGEMM). One can now choose from alternative matrix product implementations via options(matprod = ). The "internal" implementation is not optimized for speed but consistent in precision with other summations in R (using long double accumulators where available). "blas" calls BLAS directly for best speed, but usually with undefined behavior for inputs with NaN/Inf.
factor() now uses order() to sort its levels, not sort.list(). This makes factor() support custom vector-like objects if methods for the appropriate generics are defined. This change has the side effect of making factor() succeed on empty or length-one non-atomic vector(-like) types (e.g., list), where it failed before.
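The revised NA handling in table() is easy to see on a small vector. This is a minimal sketch; the vector x is illustrative, and the comments describe the behaviour in R >= 3.4.0.

```r
x <- c(1, 2, 2)

# exclude = NULL now implies useNA = "ifany": no <NA> cell when x has no NAs
t1 <- table(x, exclude = NULL)

# useNA = "always" still adds an <NA> cell, here with a zero count
t2 <- table(x, useNA = "always")

# With an actual NA present, exclude = NULL tabulates it
t3 <- table(c(x, NA), exclude = NULL)

print(t1)
print(t2)
print(t3)
```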

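Switching the matrix product implementation is done through options(). A minimal sketch, assuming R >= 3.4.0; the matrix A (which contains a NaN) and the vector b are illustrative. It computes a product with the "internal" implementation and then restores the previous setting.

```r
A <- matrix(c(1, NaN, 3, 4), nrow = 2)  # column-major: rows are (1, 3) and (NaN, 4)
b <- c(1, 2)

# Switch to the "internal" implementation, remembering the old setting
old <- options(matprod = "internal")
r_internal <- A %*% b    # row 1: 1*1 + 3*2 = 7; row 2: NaN*1 + 4*2 = NaN
options(old)             # restore the previous matprod setting

print(r_internal)
print(getOption("matprod"))
```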
NEW FEATURES

User errors such as integrate(f, 0:1, 2) are now caught.
Add signature argument to debug(), debugonce(), undebug() and isdebugged() for more conveniently debugging S3 and S4 methods. (Based on a patch by Gabe Becker.)
Add utils::debugcall() and utils::undebugcall() for debugging the function that would be called by evaluating the given expression. When the call is to an S4 generic or standard S3 generic, debugcall() debugs the method that would be dispatched. A number of internal utilities were added to support this, most notably utils::isS3stdGeneric(). (Based on a patch by Gabe Becker.)
Add utils::strcapture(). Given a character vector and a regular expression containing capture expressions, strcapture() will extract the captured tokens into a tabular data structure, typically a data.frame.
str() and strOptions() get a new option drop.deparse.attr with improved but changed default behaviour for expressions. For expression objects x, str(x) now may remove extraneous white space and truncate long lines.
str() is no longer very slow; inspired by Mikko Korpela’s proposal in PR#16527.
str(x)’s default method is more “accurate” and hence somewhat more generous in displaying character vectors; this will occasionally change R outputs (and need changes to some ‘*.Rout(.save)’ files).
For a classed integer vector such as x <- xtabs(~ c(1,9,9,9)), str(x) now shows both the class and "int", instead of only the latter.
isSymmetric(m) is much faster for large asymmetric matrices m via pre-tests and a new option tol1 (with which strict back compatibility is possible but not the default).
The result of eigen() now is of class "eigen" in the default case when eigenvectors are computed.
Zero-length date and date-time objects (of classes "POSIX[cl]?t") now print() “recognizably”.
xy.coords() and xyz.coords() get a new setLab option.
The method argument of sort.list(), order() and sort.int() gains an "auto" option (the default) which should behave the same as before when method was not supplied.
stopifnot(E, ..) now reports differences when E is a call to all.equal() and that is not true.
boxplot(<formula>, *) gains optional arguments drop, sep, and lex.order to pass to split.default(), which itself gains an argument lex.order to pass to interaction() for more flexibility.
The plot() method for ppr() has enhanced default labels (xlab and main).
sample.int() gains an explicit useHash option (with a back compatible default).
identical() gains an ignore.srcref option which drops "srcref" and similar attributes when true (as by default).
diag(x, nrow = n) now preserves typeof(x), also for logical, integer and raw x (and as previously for complex and numeric).
smooth.spline() now allows direct specification of lambda, gets a hatvalues() method and keeps tol in the result, and optionally parts of the internal matrix computations.
addNA() is faster now, e.g. when applied twice. (Part of PR#16895.)
New option rstandard(<lm>, type = "predicted") provides the “PRESS”-related leave-one-out cross-validation errors for linear models.
After seven years of deprecation, duplicated factor levels now produce a warning when printed and an error in levels<- instead of a warning.
Invalid factors, e.g., with duplicated levels (invalid but constructable) now give a warning when printed, via new function .valid.factor().
sessionInfo() has been updated for Apple’s change in OS naming as from ‘10.12’ (‘macOS Sierra’ vs ‘OS X El Capitan’). Its toLatex() method now includes the running component.
options(interrupt=) can be used to specify a default action for user interrupts. For now, if this option is not set and the error option is set, then an unhandled user interrupt invokes the error option. (This may be dropped in the future as interrupt conditions are not error conditions.)
In most cases user interrupt handlers will be called with a "resume" restart available. Handlers can invoke this restart to resume computation. At the browser prompt the r command will invoke a "resume" restart if one is available. Some read operations cannot be resumed properly when interrupted and do not provide a "resume" restart.
Radix sort is now chosen by method = "auto" for sort.int() for double vectors (and hence used for sort() for unclassed double vectors), excluding ‘long’ vectors. sort.int(method = "radix") no longer rounds double vectors.
The default and data.frame methods for stack() preserve the names of empty elements in the levels of the ind column of the return value. Set the new drop argument to TRUE for the previous behavior.
Speedup in simplify2array() and hence sapply() and mapply() (for the case of names and common length > 1), thanks to Suharto Anggono’s PR#17118.
table(x, exclude = NULL) now sets useNA = "ifany" (instead of "always"). Together with the bug fixes for this case, this recovers more consistent behaviour compatible to older versions of R. As a consequence, summary() for a logical vector no longer reports (zero) counts for NA when there are no NAs.
dump.frames() gets a new option include.GlobalEnv which allows the global environment to be dumped as well, thanks to Andreas Kersting’s proposal in PR#17116.
system.time() now uses message() instead of cat() when terminated early, such that suppressMessages() has an effect; suggested by Ben Bolker.
citation() supports ‘inst/CITATION’ files from package source trees, with lib.loc pointing to the directory containing the package.
try() gains a new argument outFile with a default that can be modified via options(try.outFile = .), useful notably for Sweave.
The unexported low-level functions in package parallel for passing serialized R objects to and from forked children now support long vectors on 64-bit platforms. This removes some limits on higher-level functions such as mclapply() (but returning gigabyte results from forked processes via serialization should be avoided if at all possible).
Connections now print() without error even if invalid, e.g. after having been destroyed.
apropos() and find(simple.words = FALSE) no longer match object names starting with . which are known to be internal objects (such as .__S3MethodsTable__.).
Convenience function hasName() has been added; it is intended to replace the common idiom !is.null(x$name) without the usually unintended partial name matching.
strcapture() no longer fixes column names nor coerces strings to factors (suggested by Bill Dunlap).
strcapture() returns NA for non-matching values in x (suggested by Bill Dunlap).
source() gets new optional arguments, notably exprs; this is made use of in the new utility function withAutoprint().
sys.source() gets a new toplevel.env argument. This argument is useful for frameworks running package tests; contributed by Tomas Kalibera.
Sys.setFileTime() and file.copy(copy.date = TRUE) will set timestamps with fractions of seconds on platforms/filesystems which support this.
(Windows only.) file.info() now returns file timestamps including fractions of seconds; it has done so on other platforms since R 2.14.0. (NB: some filesystems do not record modification and access timestamps to sub-second resolution.)
The license check enabled by options(checkPackageLicense = TRUE) is now done when the package’s namespace is first loaded.
ppr() and supsmu() get an optional trace argument, and ppr(.., sm.method = ..spline) is no longer limited to sample size n <= 2500.
The POSIXct method for print() gets optional tz and usetz arguments, thanks to a report from Jennifer S. Lyon.
New function check_packages_in_dir_details() in package tools for analyzing package-check log files to obtain check details.
Package tools now exports function CRAN_package_db() for obtaining information about current packages in the CRAN package repository, and several functions for obtaining the check status of these packages.
The (default) Stangle driver Rtangle allows annotate to be a function and gets a new drop.evalFALSE option.
The default method for quantile(x, prob) should now be monotone in prob, even in border cases, see PR#16672.
bug.report() now tries to extract an email address from a BugReports field, and if there is none, from a Contacts field.
The format() and print() methods for object.size() results get new options standard and digits; notably, standard = "IEC" and standard = "SI" allow more standard (but less common) abbreviations than the default ones, e.g. for kilobytes. (From contributions by Henrik Bengtsson.)
If a reference class has a validity method, validObject will be called automatically from the default initialization method for reference classes.
tapply() gets a new option default = NA, allowing the previously hard-coded value to be changed.
read.dcf() now consistently interprets any ‘whitespace’ to be stripped to include newlines.
The maximum number of DLLs that can be loaded into R e.g. via dyn.load() can now be increased by setting the environment variable R_MAX_NUM_DLLS before starting R.
Assigning to an element of a vector beyond the current length now over-allocates by a small fraction. The new vector is marked internally as growable, and the true length of the new vector is stored in the truelength field. This makes building up a vector result by assigning to the next element beyond the current length more efficient, though pre-allocating is still preferred. The implementation is subject to change and not intended to be used in packages at this time.
Loading the parallel package namespace no longer sets or changes the .Random.seed, even if R_PARALLEL_PORT is unset. NB: This can break reproducibility of output, and did for a CRAN package.
Methods "wget" and "curl" for download.file() now give an R error rather than a non-zero return value when the external command has a non-zero status.
Encoding name "utf8" is mapped to "UTF-8". Many implementations of iconv accept "utf8", but not GNU libiconv (including the late 2016 version 1.15).
sessionInfo() shows the full paths to the library or executable files providing the BLAS/LAPACK implementations currently in use (not available on Windows).
The binning algorithm used by bandwidth selectors bw.ucv(), bw.bcv() and bw.SJ() switches to a version linear in the input size n for n > nb/2. (The calculations are the same, but for larger n/nb it is worth doing the binning in advance.)
There is a new option PCRE_study which controls when grep(perl = TRUE) and friends ‘study’ the compiled pattern. Previously this was done for 11 or more input strings: it now defaults to 10 or more (but most examples need many more for the difference from studying to be noticeable).
grep(perl = TRUE) and friends can now make use of PCRE’s Just-In-Time mechanism, for PCRE >= 8.20 on platforms where JIT is supported. It is used by default whenever the pattern is studied (see the previous item). (Based on a patch from Mikko Korpela.) This is controlled by a new option PCRE_use_JIT. Note that in general this makes little difference to the speed, and may take a little longer: its benefits are most evident on strings of thousands of characters. As a side effect it reduces the chances of C stack overflow in the PCRE library on very long strings (millions of characters, but see next item).
Warning: segfaults were seen using PCRE with JIT enabled on 64-bit Sparc builds.
There is a new option PCRE_limit_recursion for grep(perl = TRUE) and friends to set a recursion limit taking into account R’s estimate of the remaining C stack space (or 10000 if that is not available). This reduces the chance of C stack overflow, but because it is conservative may report a non-match (with a warning) in examples that matched before. By default it is enabled if any input string has 1000 or more bytes. (PR#16757)
getGraphicsEvent() now works on X11(type = "cairo") devices. Thanks to Frederick Eaton (for reviving an earlier patch).
There is a new argument onIdle for getGraphicsEvent(), which allows an R function to be run whenever there are no pending graphics events. This is currently only supported on X11 devices. Thanks to Frederick Eaton.
The deriv() and similar functions now can compute derivatives of log1p(), sinpi() and similar one-argument functions, thanks to a contribution by Jerry Lewis.
median() gains a formal ... argument, so methods with extra arguments can be provided.
strwrap() reduces indent if it is more than half width rather than giving an error. (Suggested by Bill Dunlap.)
When the condition code in if(.) or while(.) is not of length one, an error instead of a warning may be triggered by setting an environment variable, see the help page.
Formatting and printing of bibliography entries (bibentry) is more flexible and better documented. Apart from setting options(citation.bibtex.max = 99), you can also use print(<citation>, bibtex=TRUE) (or format(..)) to get the BibTeX entries in the case of more than one entry. This also affects citation(). Contributions to enable style = "html+bibtex" are welcome.
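As an illustration of strcapture() together with its later refinements (no factor coercion, NA rows for non-matching values), here is a minimal sketch; the genomic-range strings, the pattern and the column names in proto are made up for the example.

```r
x <- c("chr1:1-1000", "chr2:2-2000", "no match here")

# One capture group per column; proto fixes the names and types of the result
pattern <- "([^:]+):([[:digit:]]+)-([[:digit:]]+)"
proto <- data.frame(chr = character(), start = integer(), end = integer(),
                    stringsAsFactors = FALSE)

res <- utils::strcapture(pattern, x, proto)
print(res)
str(res)   # chr stays character; start and end are integer; row 3 is all NA
```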

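hasName() is meant to replace the idiom !is.null(x$name), which can be fooled by partial name matching and by elements that are NULL. A minimal sketch; the list x is illustrative.

```r
x <- list(value = 1, other = NULL)   # list() keeps the NULL element

print(!is.null(x$val))            # TRUE: `$` partially matches "val" to "value"
print(utils::hasName(x, "val"))   # FALSE: hasName() uses exact matching
print(utils::hasName(x, "value")) # TRUE
print(utils::hasName(x, "other")) # TRUE, although x$other is NULL
```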
C-LEVEL FACILITIES

Entry points R_MakeExternalPtrFn and R_ExternalPtrFn are now declared in header ‘Rinternals.h’ to facilitate creating and retrieving an R external pointer from a C function pointer without ISO C warnings about the conversion of function pointers.
There was an exception for the native Solaris C++ compiler to the dropping (in R 3.3.0) of legacy C++ headers from headers such as ‘R.h’ and ‘Rmath.h’ — this has now been removed. That compiler has strict C++98 compliance hence does not include extensions in its (non-legacy) C++ headers: some packages will need to request C++11 or replace non-C++98 calls such as lgamma: see §1.6.4 of ‘Writing R Extensions’. Because it is needed by about 70 CRAN packages, headers ‘R.h’ and ‘Rmath.h’ still declare
```
using namespace std;
```
When included from C++, the R headers now use forms such as std::FILE directly rather than including the line
```
using std::FILE;
```
C++ code including these headers might be relying on the latter.
Headers ‘R_ext/BLAS.h’ and ‘R_ext/Lapack.h’ have many improved declarations including const for double-precision complex routines. Inter alia this avoids warnings when passing ‘string literal’ arguments from C++11 code.
Headers for Unix-only facilities ‘R_ext/GetX11Image.h’, ‘R_ext/QuartzDevice.h’ and ‘R_ext/eventloop.h’ are no longer installed on Windows.
No-longer-installed headers ‘GraphicsBase.h’, ‘RGraphics.h’, ‘Rmodules/RX11.h’ and ‘Rmodules/Rlapack.h’, which had an LGPL license, no longer do so.
HAVE_UINTPTR_T is now defined where appropriate by Rconfig.h so that it can be included before Rinterface.h when CSTACK_DEFNS is defined and a C compiler (not C++) is in use. Rinterface.h now includes C header ‘stdint.h’ or C++11 header ‘cstdint’ where needed.
Package tools has a new function package_native_routine_registration_skeleton() to assist adding native-symbol registration to a package. See its help and §5.4.1 of ‘Writing R Extensions’ for how to use it. (At the time it was added it successfully automated adding registration to over 90% of CRAN packages which lacked it. Many of the failures were newly-detected bugs in the packages, e.g. 50 packages called entry points with varying numbers of arguments and 65 packages called entry points not in the package.)

INSTALLATION on a UNIX-ALIKE

readline headers (and not just the library) are required unless configuring with --with-readline=no.
configure now adds a compiler switch for C++11 code, even if the compiler supports C++11 by default. (This ensures that g++ 6.x uses C++11 mode and not its default mode of C++14 with ‘GNU extensions’.) The tests for C++11 compliance are now much more comprehensive. For gcc < 4.8, the tests from R 3.3.0 are used in order to maintain the same behaviour on Linux distributions with long-term support.
An alternative compiler for C++11 is now specified with CXX11, not CXX1X. Likewise C++11 flags are specified with CXX11FLAGS and the standard (e.g., -std=gnu++11) is specified with CXX11STD.
configure now tests for a C++14-compliant compiler by testing some basic features. This by default tries flags for the compiler specified by CXX11, but an alternative compiler, options and standard can be specified by variables CXX14, CXX14FLAGS and CXX14STD (e.g., -std=gnu++14).
There is a new macro CXXSTD to help specify the standard for C++ code, e.g. -std=c++98. This makes it easier to work with compilers which default to a later standard: for example, with CXX=g++6 CXXSTD=-std=c++98 configure will select commands for g++ 6.x which conform to C++11 and C++14 where specified but otherwise use C++98.
Support for the defunct IRIX and OSF/1 OSes and Alpha CPU has been removed.
configure checks that the compiler specified by $CXX $CXXFLAGS is able to compile C++ code.
configure checks for the required header ‘sys/select.h’ (or ‘sys/time.h’ on legacy systems) and system call select and aborts if they are not found.
If available, the POSIX 2008 system call utimensat will be used by Sys.setFileTime() and file.copy(copy.date = TRUE). This may result in slightly more accurate file times. (It is available on Linux and FreeBSD but not macOS.)
The minimum version requirement for libcurl has been reduced to 7.22.0, although at least 7.28.0 is preferred and earlier versions are little tested. (This is to support Debian 7 ‘Wheezy’ LTS and Ubuntu ‘Precise’ 12.04 LTS, although the latter is close to end-of-life.)
configure tests for a C++17-compliant compiler. The tests are experimental and subject to change in the future.

INCLUDED SOFTWARE

(Windows only) Tcl/Tk version 8.6.4 is now included in the binary builds. The ‘tcltk*.chm’ help file is no longer included; please consult the online help at http://www.tcl.tk/man/ instead.
The version of LAPACK included in the sources has been updated to 3.7.0: no new routines have been added to R.

PACKAGE INSTALLATION

There is support for compiling C++14 or C++17 code in packages on suitable platforms: see ‘Writing R Extensions’ for how to request this.
The order of flags when LinkingTo other packages has been changed so their include directories come earlier, before those specified in CPPFLAGS. This will only have an effect if non-system include directories are included with -I flags in CPPFLAGS (and so not the default -I/usr/local/include which is treated as a system include directory on most platforms).
Packages which register native routines for .C or .Fortran need to be re-installed for this version (unless installed with R-devel SVN revision r72375 or later).
Make variables with names containing CXX1X are deprecated in favour of those using CXX11, but for the time being are still made available via file ‘etc/Makeconf’. Packages using them should be converted to the new forms and made dependent on R (>= 3.4.0).

UTILITIES

Running R CMD check --as-cran with _R_CHECK_CRAN_INCOMING_REMOTE_ false now skips tests that require remote access. The remaining (local) tests typically run quickly compared to the remote tests.
R CMD build will now give priority to vignettes produced from files in the ‘vignettes’ directory over those in the ‘inst/doc’ directory, with a warning that the latter are being ignored.
R CMD config gains a --all option for printing names and values of all basic configure variables. It now knows about all the variables used for the C++98, C++11 and C++14 standards.
R CMD check now checks that output files in ‘inst/doc’ are newer than the source files in ‘vignettes’.
For consistency with other package subdirectories, files named ‘*.r’ in the ‘tests’ directory are now recognized as tests by R CMD check. (Wish of PR#17143.)
R CMD build and R CMD check now use the union of R_LIBS and .libPaths(). They may not be equivalent, e.g., when the latter is determined by R_PROFILE.
R CMD build now preserves dates when it copies files in preparing the tarball. (Previously on Windows it changed the dates on all files; on Unix, it changed some dates when installing vignettes.)
The new option R CMD check --no-stop-on-test-error allows running the remaining tests (under ‘tests/’) even if one gave an error.
Check customization via environment variables to detect side effects of .Call() and .External() calls which alter their arguments is described in §8 of the ‘R Internals’ manual.
R CMD check now checks any BugReports field to be non-empty and a suitable single URL.
R CMD check --as-cran now NOTEs if the package does not register its native routines or does not declare its intentions on (native) symbol search. (This will become a WARNING in due course.)

DEPRECATED AND DEFUNCT

(Windows only) Function setInternet2() is defunct.
Installation support for readline emulations based on editline (aka libedit) is deprecated.
Use of the C/C++ macro NO_C_HEADERS is defunct and silently ignored.
unix.time(), a traditional synonym for system.time(), has been deprecated.
structure(NULL, ..) is now deprecated as you cannot set attributes on NULL.
Header ‘Rconfig.h’ no longer defines SUPPORT_OPENMP; instead use _OPENMP (as documented for a long time).
(C-level Native routine registration.) The deprecated styles member of the R_CMethodDef and R_FortranMethodDef structures has been removed. Packages using these will need to be re-installed for R 3.4.0.
The deprecated support for PCRE versions older than 8.20 will be removed in R 3.4.1. (Versions 8.20–8.31 will still be accepted but remain deprecated.)

BUG FIXES

Getting or setting body() or formals() on non-functions for now signals a warning and may become an error for setting.
match(x, t), duplicated(x) and unique(x) work as documented for complex numbers with NAs or NaNs, where all those containing NA do match, whereas in the case of NaNs both real and imaginary parts must match, compatibly with how print() and format() work for complex numbers.
deparse(, options = "digits17") prints more nicely now, mostly thanks to a suggestion by Richie Cotton.
Rotated symbols in plotmath expressions are now positioned correctly on x11(type = "Xlib"). (PR#16948)
as<-() avoids an infinite loop when a virtual class is interposed between a subclass and an actual superclass.
Fix level propagation in unlist() when the list contains zero-length lists or factors.
Fix S3 dispatch on S4 objects when the methods package is not attached.
Internal S4 dispatch sets .Generic in the method frame for consistency with standardGeneric(). (PR#16929)
Fix order(x, decreasing = TRUE) when x is an integer vector containing MAX_INT. Ported from a fix Matt Dowle made to data.table.
Fix caching by callNextMethod(), resolves PR#16973 and PR#16974.
grouping() puts NAs last, to be consistent with the default behavior of order().
Point mass limit cases: qpois(-2, 0) now gives NaN with a warning and qgeom(1, 1) is 0. (PR#16972)
table() no longer drops an "NaN" factor level, and better obeys exclude = , thanks to Suharto Anggono’s patch for PR#16936. Also, in the case of exclude = NULL and NAs, these are tabulated correctly (again). Further, table(1:2, exclude = 1, useNA = "ifany") no longer erroneously reports counts. Additionally, all cases of empty exclude are equivalent, and useNA is not overwritten when specified (as it was by exclude = NULL).
wilcox.test(x, conf.int=TRUE) no longer errors out in cases where the confidence interval is not available, such as for x = 0:2.
droplevels(f) now keeps <NA> levels when present.
In integer arithmetic, NULL is now treated as integer(0) whereas it was previously treated as double(0).
The radix sort considers NA_real_ and NaN to be equivalent in rank (like the other sort algorithms).
When index.return=TRUE is passed to sort.int(), the radix sort treats NAs like sort.list() does (like the other sort algorithms).
When in tabulate(bin, nbin) length(bin) is larger than the maximal integer, the result is now of type double and hence no longer silently overflows to wrong values. (PR#17140)
as.character.factor() respects S4 inheritance when checking the type of its argument. (PR#17141)
The factor method for print() no longer sets the class of the factor to NULL, which would violate a basic constraint of an S4 object.
formatC(x, flag = f) allows two new flags, and signals an error for invalid flags also in the case of character formatting.
Reading from file("stdin") now also closes the connection and hence no longer leaks memory when reading from a full pipe, thanks to Gábor Csárdi, see thread starting at https://stat.ethz.ch/pipermail/r-devel/2016-November/073360.html.
Failure to create file in tempdir() for compressed pdf() graphics device no longer errors (then later segfaults). There is now a warning instead of error and compression is turned off for the device. Thanks to Alec Wysoker (PR#17191).
Asking for methods() on "|" returns only S3 methods. See https://stat.ethz.ch/pipermail/r-devel/2016-December/073476.html.
dev.capture() using Quartz Cocoa device (macOS) returned invalid components if the back-end chose to use ARGB instead of RGBA image format. (Reported by Noam Ross.)
seq("2", "5") now works too, equivalently to "2":"5" and seq.int().
seq.int(to = 1, by = 1) is now correct, other cases are integer (instead of double) when seq() is integer too, and the “non-finite” error messages are consistent between seq.default() and seq.int(), no longer mentioning NaN etc.
rep(x, times) and rep.int(x, times) now work when times is larger than the largest value representable in an integer vector. (PR#16932)
download.file(method = "libcurl") does not check for URL existence before attempting downloads; this is more robust to servers that do not support HEAD or range-based retrieval, but may create empty or incomplete files for aborted download requests.
Bandwidth selectors bw.ucv(), bw.bcv() and bw.SJ() now avoid integer overflow for large sample sizes.
str() no longer shows "list output truncated", in cases that list was not shown at all. Thanks to Neal Fultz (PR#17219)
Fix for cairo_pdf() (and svg() and cairo_ps()) when replaying a saved display list that contains a mix of grid and graphics output. (Report by Yihui Xie.)
The str() and as.hclust() methods for "dendrogram" now also work for deeply nested dendrograms thanks to non-recursive implementations by Bradley Broom.
sample() now uses two uniforms for added precision when the uniform generator is Knuth-TAOCP, Knuth-TAOCP-2002, or a user-defined generator and the population size is 2^25 or greater.
If a vignette in the ‘vignettes’ directory is listed in ‘.Rbuildignore’, R CMD build would not include it in the tarball, but would include it in the vignette database, leading to a check warning. (PR#17246)
tools::latexToUtf8() infinite looped on certain inputs. (PR#17138)
terms.formula() ignored argument names when determining whether two terms were identical. (PR#17235)
callNextMethod() was broken when called from a method that augments the formal arguments of a primitive generic.
Coercion of an S4 object to a vector during sub-assignment into a vector failed to dispatch through the as.vector() generic (often leading to a segfault).
Fix problems in command completion: Crash (PR#17222) and junk display in Windows, handling special characters in filenames on all systems.
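Two of these fixes can be checked directly in a session, assuming R >= 3.4.0: integer arithmetic with NULL, and seq() on numbers given as character strings. A minimal sketch:

```r
# Integer arithmetic with NULL: NULL now acts like integer(0), not double(0)
r_null <- 1L + NULL
print(typeof(r_null))   # "integer"

# seq() now accepts numbers given as character strings, like "2":"5"
r_seq <- seq("2", "5")
print(r_seq)
print(all(r_seq == "2":"5"))
```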

The post R 3.4.0 is released – with new speed upgrades and bug-fixes first appeared on R-statistics blog.