The name of the language, R, reflects both its status as a successor to the S language and the shared first letter of its authors' names, Ross and Robert.[10] In August 1993, Ihaka and Gentleman posted a binary of R on StatLib, a data archive website. At the same time, they announced the posting on the s-news mailing list.[11] On December 5, 1997, R became a GNU project when version 0.60 was released.[12] On February 29, 2000, the first official 1.0 version was released.[13]
R packages are collections of functions, documentation, and data that expand R.[14] For example, packages add report features such as RMarkdown, Quarto, knitr and Sweave. Easy package installation and use have contributed to the language's adoption in data science.[15]
The Task Views on the CRAN website list packages in fields such as finance, genetics, high-performance computing, machine learning, medical imaging, meta-analysis, social sciences, and spatial statistics.
An example package is the tidyverse package. It focuses on providing a common interface for accessing and processing data contained in a data frame, a two-dimensional table of rows and columns organized as "tidy data".[20] Each function in the package is designed to work together with the other functions in the package.[21]
Installing a package occurs only once. To install tidyverse:[21]
> install.packages("tidyverse")
To instantiate the functions, data, and documentation of a package, execute the library() function. To instantiate tidyverse:[a]
> library(tidyverse)
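The difference between installing a package (once, onto disk) and attaching it (in every session) can be sketched as follows. This is only an illustration: it assumes the splines package, which is bundled with standard R distributions, as a stand-in for tidyverse so that the code runs without a network connection, and the variable names are hypothetical.

```r
pkg <- "splines"  # stand-in package (assumption); bundled with standard R installs

# TRUE when the package is installed and its namespace can be loaded;
# this alone does not attach the package to the search path
is_installed <- requireNamespace(pkg, quietly = TRUE)

# library() attaches the package for the current session only
library(splines)
is_attached <- paste0("package:", pkg) %in% search()

print(c(installed = is_installed, attached = is_attached))
```

A new R session would need the library() call again, but not a new installation.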
The R Core Team was founded in 1997 to maintain the R source code. The R Foundation for Statistical Computing was founded in April 2003 to provide financial support. The R Consortium is a Linux Foundation project to develop R infrastructure.
The R Journal is an open-access academic journal that features short to medium-length articles on the use and development of R. It includes articles on packages, programming tips, CRAN news, and foundation news.
The R community hosts many conferences and in-person meetups. These groups include:
UseR!: an annual international R user conference
Directions in Statistical Computing (DSC)
The following examples illustrate the basic syntax of the language and use of the command-line interface. (An expanded list of standard language features can be found in the R manual, "An Introduction to R".[25])
In R, the generally preferred assignment operator is an arrow made from two characters <-, although = can be used in some cases.[26]
> x <- 1:6 # Create a numeric vector in the current environment
> y <- x^2 # Create vector based on the values in x.
> print(y) # Print the vector’s contents.
[1]  1  4  9 16 25 36

> z <- x + y # Create a new vector that is the sum of x and y
> z # Print the contents of z.
[1]  2  6 12 20 30 42

> z_matrix <- matrix(z, nrow = 3) # Create a new matrix that turns the vector z into a 3x2 matrix object
> z_matrix
     [,1] [,2]
[1,]    2   20
[2,]    6   30
[3,]   12   42

> 2 * t(z_matrix) - 2 # Transpose the matrix, multiply every element by 2, subtract 2 from each element in the matrix, and return the results to the terminal.
     [,1] [,2] [,3]
[1,]    2   10   22
[2,]   38   58   82

> new_df <- data.frame(t(z_matrix), row.names = c("A", "B")) # Create a new data.frame object that contains the data from a transposed z_matrix, with row names 'A' and 'B'
> names(new_df) <- c("X", "Y", "Z") # Set the column names of new_df as X, Y, and Z.
> print(new_df) # Print the current results.
   X  Y  Z
A  2  6 12
B 20 30 42

> new_df$Z # Output the Z column
[1] 12 42

> all(new_df$Z == new_df['Z']) && all(new_df[3] == new_df$Z) # The data.frame column Z can be accessed using $Z, ['Z'], or [3] syntax and the values are the same. all() reduces each elementwise comparison to a single value, as && requires.
[1] TRUE

> attributes(new_df) # Print attributes information about the new_df object
$names
[1] "X" "Y" "Z"

$row.names
[1] "A" "B"

$class
[1] "data.frame"

> attributes(new_df)$row.names <- c("one", "two") # Access and then change the row.names attribute; can also be done using rownames()
> new_df
     X  Y  Z
one  2  6 12
two 20 30 42
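The remark that = can stand in for <- only "in some cases" can be illustrated with a short sketch. Inside a function call, = names an argument rather than assigning a variable, whereas <- always assigns. The function area and the variable side_len are hypothetical names chosen for this illustration.

```r
# Hypothetical helper used only for this illustration
area <- function(side_len) side_len^2

top_level <- 5   # at top level, `<-` assigns ...
also_top = 5     # ... and so does `=`

# Inside a call, `=` only names the argument; no variable is created
a1 <- area(side_len = 3)
made_by_equals <- exists("side_len", inherits = FALSE)

# Inside a call, `<-` assigns side_len in the calling environment,
# then passes its value to the function
a2 <- area(side_len <- 3)
made_by_arrow <- exists("side_len", inherits = FALSE)

print(c(equals = made_by_equals, arrow = made_by_arrow))
```

Both calls return 9, but only the second leaves a variable named side_len behind.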
One of R's strengths is the ease of creating new functions.[27] Objects in the function body remain local to the function, and any data type may be returned. In R, almost all functions and all user-defined functions are closures.[28]
Create a function:
# The input parameters are x and y.
# The function returns a linear combination of x and y.
f <- function(x, y) {
  z <- 3 * x + 4 * y

  # this return() statement is optional
  return(z)
}
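A brief usage sketch may help here. It redefines f so that the block is self-contained, calls it with scalar and vector arguments, and adds a small closure example; make_scaler and triple are hypothetical names introduced only for illustration.

```r
# Same linear combination as above; the value of the last expression
# is returned, so an explicit return() is not needed
f <- function(x, y) {
  3 * x + 4 * y
}

scalar_result <- f(1, 2)                      # 11
vector_result <- f(c(1, 2, 3), c(5, 3, 4))    # 23 18 25 (arithmetic is vectorized)

# A closure: the returned function remembers k from its defining environment
make_scaler <- function(k) {
  function(v) k * v
}
triple <- make_scaler(3)
tripled <- triple(1:3)                        # 3 6 9

print(scalar_result)
print(vector_result)
print(tripled)
```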
In R version 4.1.0, a native pipe operator, |>, was introduced.[30] This operator allows users to chain functions together one after another, instead of nesting function calls.
> nrow(subset(mtcars, cyl == 4)) # Nested without the pipe character
[1] 11
> mtcars |> subset(cyl == 4) |> nrow() # Using the pipe character
[1] 11
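The equivalence of the two forms can be checked directly, since the native pipe is handled when the code is parsed. The sketch below assumes R 4.1.0 or later; the variable names are illustrative.

```r
# Requires R 4.1.0 or later for the |> operator
piped  <- quote(mtcars |> subset(cyl == 4) |> nrow())
nested <- quote(nrow(subset(mtcars, cyl == 4)))

# The parser rewrites the piped form into the nested call
same_call <- identical(piped, nested)

print(piped)
print(same_call)
```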
Besides the pipe operator, another alternative to nested function calls is to use intermediate objects. However, some argue that using the pipe operator will produce code that is easier to read.[21]
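The intermediate-object style can be sketched with the same mtcars computation. The object names four_cyl and n_four_cyl are hypothetical.

```r
# Step 1: keep only the four-cylinder cars in a named intermediate object
four_cyl <- subset(mtcars, cyl == 4)

# Step 2: count the rows of that object
n_four_cyl <- nrow(four_cyl)

# The nested form gives the same answer without naming the intermediate result
n_nested <- nrow(subset(mtcars, cyl == 4))

print(n_four_cyl)
```

The trade-off is that each intermediate result needs a name and stays in the environment after use.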
The R language has native support for object-oriented programming. There are two native frameworks, the so-called S3 and S4 systems. The former, being more informal, supports single dispatch on the first argument, and objects are assigned to a class simply by setting a "class" attribute on each object. The latter is a Common Lisp Object System (CLOS)-like system of formal classes (also derived from S) and generic methods that supports multiple dispatch and multiple inheritance.[31]
In the following example, summary is a generic function that dispatches to different methods depending on whether its argument is a character vector or a "factor":
> data <- c("a", "b", "c", "a", NA)
> summary(data)
   Length     Class      Mode
        5 character character
> summary(as.factor(data))
   a    b    c NA's
   2    1    1    1
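User-defined classes in both systems can be sketched as follows. This is a minimal illustration, not a recommended design: the generic describe, the class circle, the S4 classes Dog and Cat, and the generic meet are all hypothetical names.

```r
library(methods)  # S4 support; attached by default in most R sessions

# S3: a class is just an attribute, and dispatch uses the first argument only
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "no specific method"
describe.circle <- function(obj, ...) paste("circle with radius", obj$r)

shape <- structure(list(r = 2), class = "circle")
s3_result <- describe(shape)    # dispatches to describe.circle
s3_fallback <- describe(42)     # falls back to describe.default

# S4: formal class definitions, and dispatch on more than one argument
setClass("Dog", representation(name = "character"))
setClass("Cat", representation(name = "character"))
setGeneric("meet", function(a, b) standardGeneric("meet"))
setMethod("meet", signature("Dog", "Cat"),
          function(a, b) paste(a@name, "chases", b@name))
setMethod("meet", signature("Cat", "Dog"),
          function(a, b) paste(a@name, "hisses at", b@name))

rex <- new("Dog", name = "Rex")
tom <- new("Cat", name = "Tom")
s4_dog_cat <- meet(rex, tom)    # method chosen from the classes of both arguments
s4_cat_dog <- meet(tom, rex)

print(c(s3_result, s3_fallback, s4_dog_cat, s4_cat_dog))
```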
The R language has built-in support for data modeling and graphics. The following example shows how R can generate and plot a linear model with residuals.
# Create x and y values
x <- 1:6
y <- x^2

# Linear regression model y = A + B * x
model <- lm(y ~ x)

# Display an in-depth summary of the model
summary(model)

# Create a 2 by 2 layout for figures
par(mfrow = c(2, 2))

# Output diagnostic plots of the model
plot(model)
Output:
Residuals:
      1       2       3       4       5       6
 3.3333 -0.6667 -2.6667 -2.6667 -0.6667  3.3333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -9.3333     2.8441  -3.282 0.030453 *
x             7.0000     0.7303   9.585 0.000662 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.055 on 4 degrees of freedom
Multiple R-squared:  0.9583, Adjusted R-squared:  0.9478
F-statistic: 91.88 on 1 and 4 DF,  p-value: 0.000662
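The estimates in this output can be reproduced by hand, which may clarify what lm() computes for a single predictor. The sketch below uses the textbook least-squares formulas for simple linear regression; slope, intercept, and manual_resid are illustrative names.

```r
x <- 1:6
y <- x^2
model <- lm(y ~ x)

# Least-squares estimates for one predictor
slope <- cov(x, y) / var(x)               # 7
intercept <- mean(y) - slope * mean(x)    # about -9.3333

# Residuals are observed values minus fitted values
manual_resid <- y - (intercept + slope * x)

print(coef(model))
print(round(manual_resid, 4))
```

The same quantities are available from the fitted object through coef(model) and residuals(model).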
This Mandelbrot set example highlights the use of complex numbers. It models the first 20 iterations of the equation z = z^2 + c, where c represents different complex constants.
Install the package that provides the write.gif() function beforehand:
install.packages("caTools")
R source code:
library(caTools)

jet.colors <- colorRampPalette(
  c("green", "pink", "#007FFF", "cyan", "#7FFF7F",
    "white", "#FF7F00", "red", "#7F0000"))

dx <- 1500 # define width
dy <- 1400 # define height

C <- complex(
  real = rep(seq(-2.2, 1.0, length.out = dx), each = dy),
  imag = rep(seq(-1.2, 1.2, length.out = dy), times = dx)
)

# reshape as matrix of complex numbers
C <- matrix(C, dy, dx)

# initialize output 3D array
X <- array(0, c(dy, dx, 20))

Z <- 0

# loop with 20 iterations
for (k in 1:20) {
  # the recurrence z = z^2 + c, applied to every point at once
  Z <- Z^2 + C

  # capture the results
  X[, , k] <- exp(-abs(Z))
}

write.gif(X, "Mandelbrot.gif", col = jet.colors, delay = 100)
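The same recurrence can be followed for a single constant without any extra package. The sketch below uses a simple escape-time test (stop once the modulus exceeds 2), which is a different technique from the exp(-abs(Z)) colouring used above; escape_iteration and c0 are hypothetical names.

```r
# Returns the iteration at which |z| first exceeds 2,
# or NA if that does not happen within max_iter iterations
escape_iteration <- function(c0, max_iter = 20) {
  z <- 0 + 0i
  for (k in seq_len(max_iter)) {
    z <- z^2 + c0
    if (Mod(z) > 2) {
      return(k)
    }
  }
  NA_integer_
}

escapes_fast <- escape_iteration(1 + 0i)    # z runs 1, 2, 5: escapes at iteration 3
stays_bounded <- escape_iteration(-1 + 0i)  # z alternates between -1 and 0: NA
also_bounded <- escape_iteration(0 + 1i)    # z settles into a two-value cycle: NA

print(c(escapes_fast, stays_bounded, also_bounded))
```

Constants for which the sequence stays bounded belong to the Mandelbrot set; a finite iteration limit can only approximate that test.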
All R version releases from 2.14.0 onward have codenames that make reference to Peanuts comics and films.[32][33][34]
In 2018, core R developer Peter Dalgaard presented a history of R releases since 1997.[35] Some notable early releases before the named releases include:
Version 1.0.0, released on February 29, 2000, a leap day
Version 2.0.0, released on October 4, 2004, "which at least had a nice ring to it"[35]
The idea of naming R version releases was inspired by the Debian and Ubuntu version naming system. Dalgaard also noted that another reason for using Peanuts references as R codenames is that "everyone in statistics is a P-nut".[35]
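The codename of the running release can be inspected from within R. This is a minimal sketch, assuming a release recent enough to carry a nickname entry in the built-in R.version list; the printed values depend on the installed version.

```r
# Full version string of the running R release
version_string <- R.version.string

# Codename (nickname) of the running release
codename <- R.version$nickname

print(version_string)
print(codename)
```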
Wickham, Hadley; Çetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for data science: import, tidy, transform, visualize, and model data (2nd ed.). Beijing Boston Farnham Sebastopol Tokyo: O'Reilly. ISBN 978-1-4920-9740-2.
^ This displays to standard error a listing of all the packages that tidyverse depends upon. It may also display warnings showing two namespace conflicts, which can typically be ignored.
^ Hornik, Kurt; The R Core Team (12 April 2022). "R FAQ". The Comprehensive R Archive Network. 3.3 What are the differences between R and S?. Archived from the original on 28 December 2022. Retrieved 27 December 2022.
^ Wickham, Hadley; Cetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for Data Science, Second Edition. O'Reilly. p. xvii. ISBN 978-1-492-09740-2.
^ Chambers, John M. (2020). "S, R, and Data Science". The R Journal. 12 (1): 462–476. doi:10.32614/RJ-2020-028. ISSN 2073-4859. The R language and related software play a major role in computing for data science. ... R packages provide tools for a wide range of purposes and users.
^ Wickham, Hadley; Cetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for Data Science, Second Edition. O'Reilly. ISBN 978-1-492-09740-2.
^ Talbot, Justin; DeVito, Zachary; Hanrahan, Pat (1 January 2012). "Riposte: A trace-driven compiler and parallel VM for vector code in R". Proceedings of the 21st international conference on Parallel architectures and compilation techniques. ACM. pp. 43–52. doi:10.1145/2370816.2370825. ISBN 9781450311823. S2CID 1989369.