The post Binary Classification via dce-GMDH Algorithm in R appeared first on Universe of Data Science.

In this tutorial, we will work with the dce-GMDH type neural network approach for binary classification. Before we start, we need to divide the data into three parts: train, validation and test sets. We use the train set for model building and the validation set for neuron selection. Finally, we show the performance of the model on the test set.

**Check Out:** *Feature Selection and Classification via GMDH Algorithm in R*

In this tutorial, we will implement the algorithm on the urine dataset, also used by Dag et al. (2022) and available in the boot package (Canty and Ripley, 2020). Before we go ahead, we load the dataset and start to process the data.

```
data(urine, package = "boot")
```

After loading the dataset, let’s exclude missing values to work on the complete dataset.

```
data <- na.exclude(urine)
head(data)
## r gravity ph osmo cond urea calc
## 2 0 1.017 5.74 577 20.0 296 4.49
## 3 0 1.008 7.20 321 14.9 101 2.36
## 4 0 1.011 5.51 408 12.6 224 2.15
## 5 0 1.005 6.52 187 7.5 91 1.16
## 6 0 1.020 5.27 668 25.3 252 3.34
## 7 0 1.012 5.62 461 17.4 195 1.40
```

**Also Check:** *How to Handle Missing Values in R*

We need to define the output variable as factor and input variables as matrix.

```
x <- data.matrix(data[,2:7])
y <- as.factor(data[,1])
```

We need to divide the data into three sets: train (60%), validation (20%) and test (20%). Then, we obtain the number of observations in each set.

```
nobs <- dim(data)[1]
ntrain <- round(nobs*0.6,0)
nvalid <- round(nobs*0.2,0)
ntest <- nobs-(ntrain+nvalid)
```

Now let’s obtain the indices of train, validation and test sets. Before we obtain the indices, we shuffle the indices to prevent any bias based on order. For reproducibility of results, let’s fix the seed number to 1234.

```
set.seed(1234)
indices <- sample(1:nobs)
train.indices <- sort(indices[1:ntrain])
valid.indices <- sort(indices[(ntrain+1):(ntrain+nvalid)])
test.indices <- sort(indices[(ntrain+nvalid+1):nobs])
```
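As a quick sanity check, the split logic above can be wrapped in a small function and tested for overlap. This is only a sketch: `split_indices` is a hypothetical helper name, and the row count of 77 is an assumed value used for illustration.

```r
# Hypothetical helper: split the row numbers 1..nobs into train/validation/test sets.
split_indices <- function(nobs, p_train = 0.6, p_valid = 0.2, seed = 1234) {
  set.seed(seed)
  ntrain <- round(nobs * p_train)
  nvalid <- round(nobs * p_valid)
  shuffled <- sample(seq_len(nobs))
  list(
    train = sort(shuffled[seq_len(ntrain)]),
    valid = sort(shuffled[ntrain + seq_len(nvalid)]),
    test  = sort(shuffled[(ntrain + nvalid + 1):nobs])
  )
}

idx <- split_indices(77)   # 77 is an assumed row count for illustration
lengths(idx)               # train 46, valid 15, test 16

# The three sets should be disjoint and together cover every row exactly once.
stopifnot(identical(sort(unlist(idx, use.names = FALSE)), seq_len(77)))
```

Because the test set takes whatever is left after rounding, its size can differ by one or two rows from an exact 20% share.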

We can construct the train, validation and test sets.

```
x.train <- x[train.indices,]
y.train <- y[train.indices]
x.valid <- x[valid.indices,]
y.valid <- y[valid.indices]
x.test <- x[test.indices,]
y.test <- y[test.indices]
```

After obtaining the train, validation and test sets, we can use the dce-GMDH type neural network algorithm. The dce-GMDH algorithm is available in the GMDH2 package (Dag et al., 2019).

```
library(GMDH2)
model <- dceGMDH(x.train, y.train, x.valid, y.valid, alpha = 0.6, maxlayers = 10, maxneurons = 15, exCriterion ="MSE", verbose = TRUE)
## Structure :
##
## Layer Neurons Selected neurons Min MSE
## 0 5 5 0.141036573711885
## 1 10 1 0.139424256676092
##
## External criterion : Mean Square Error
##
## Classifiers ensemble : 2 out of 5 classifiers are assembled.
##
## naiveBayes
## cv.glmnet
```

**Also Check:** *How to Clean Data in R*

Now, let’s obtain performance measures on test set.

```
y.test_pred <- predict(model, x.test, type = "class")
confMat(y.test_pred, y.test, positive = "1")
## Confusion Matrix and Statistics
##
## reference
## data 1 0
## 1 4 2
## 0 2 8
##
##
## Accuracy : 0.75
## No Information Rate : 0.625
## Kappa : 0.4667
## Matthews Corr Coef : 0.4667
## Sensitivity : 0.6667
## Specificity : 0.8
## Positive Pred Value : 0.6667
## Negative Pred Value : 0.8
## Prevalence : 0.375
## Balanced Accuracy : 0.7333
## Youden Index : 0.4667
## Detection Rate : 0.25
## Detection Prevalence : 0.375
## Precision : 0.6667
## Recall : 0.6667
## F1 : 0.6667
##
## Positive Class : 1
```

In this model, the dce-GMDH algorithm assembles two classification algorithms, naive Bayes and elastic net logistic regression, which contribute to the classification performance. This ensemble classified 75.0% of the individuals in the test set correctly. Sensitivity and specificity are 0.6667 and 0.8, respectively.
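To see where these figures come from, the main measures can be recomputed by hand from the four cells of the confusion matrix printed above. The variable names below are ours and are not part of the GMDH2 package.

```r
# Counts copied from the confusion matrix above
# (rows = predicted class, columns = reference class, positive class = 1).
tp <- 4; fp <- 2
fn <- 2; tn <- 8

accuracy    <- (tp + tn) / (tp + fp + fn + tn)   # 12 / 16 = 0.75
sensitivity <- tp / (tp + fn)                    # 4 / 6  = 0.6667
specificity <- tn / (tn + fp)                    # 8 / 10 = 0.8

round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity), 4)
```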

A walkthrough of this code is available on our YouTube channel.

**Don’t forget to check:** *6 Ways of Subsetting Data in R*

**References**

Dag, O., Karabulut, E., Alpar, R. (2019). GMDH2: Binary Classification via GMDH-Type Neural Network Algorithms – R Package and Web-Based Tool. International Journal of Computational Intelligence Systems, 12:2, 649-660.

Dag, O., Kasikci, M., Karabulut, E., Alpar, R. (2022). Diverse Classifiers Ensemble Based on GMDH-Type Neural Network Algorithm for Binary Classification. Communications in Statistics – Simulation and Computation, 51:5, 2440-2456.

Canty, A., Ripley, B. (2020). boot: Bootstrap R (S-Plus) Functions. R package version 1.3-25.

The post How to Create Dummy Variables Based on Variable Class in R Data Frame appeared first on Universe of Data Science.

In this tutorial, we learn the usage of the dummy.data.frame() function available in the dummies package (Brown, 2012). Firstly, we learn how to create dummy variables for categorical variables. Secondly, we go over how to create dummy variables for a specified class. Finally, we learn how to create dummy variables for all variables in an R data frame.

In this tutorial, we do not discuss why k-1 dummy variables are used when a categorical variable has k levels. That topic is covered in a separate post.

Let’s construct a data frame containing variables of four different classes in R.

```
w <- factor(rep(c("apple","banana","carrot"), each = 2))
x <- rep(c("A","B","C"), 2)
y <- rep(1:3, 2)
z <- rep(c(0.4,0.8), 3)
data <- data.frame(w, x, y, z)
data
## w x y z
## 1 apple A 1 0.4
## 2 apple B 2 0.8
## 3 banana C 3 0.4
## 4 banana A 1 0.8
## 5 carrot B 2 0.4
## 6 carrot C 3 0.8
sapply(data, class)
## w x y z
## "factor" "character" "integer" "numeric"
```

**Check Out:** *How to Convert Categorical Variables into Dummy Variables in R*

In this part, we use the dummy.data.frame() function with default arguments. It converts the variables with factor and character classes to dummy variables. If we set all = FALSE, it returns only the dummy variables and drops the other variables.

```
library(dummies)
dummy.data.frame(data)
## wapple wbanana wcarrot xA xB xC y z
## 1 1 0 0 1 0 0 1 0.4
## 2 1 0 0 0 1 0 2 0.8
## 3 0 1 0 0 0 1 3 0.4
## 4 0 1 0 1 0 0 1 0.8
## 5 0 0 1 0 1 0 2 0.4
## 6 0 0 1 0 0 1 3 0.8
dummy.data.frame(data, all = FALSE)
## wapple wbanana wcarrot xA xB xC
## 1 1 0 0 1 0 0
## 2 1 0 0 0 1 0
## 3 0 1 0 0 0 1
## 4 0 1 0 1 0 0
## 5 0 0 1 0 1 0
## 6 0 0 1 0 0 1
```

**Also Check:** *How to Merge Data Frames in R*

We can create dummy variables by specifying the variable class with the dummy.classes argument. In this part, we set dummy.classes to “factor”, “character”, “numeric” and “integer” in turn.

```
dummy.data.frame(data, dummy.classes = "factor")
## wapple wbanana wcarrot x y z
## 1 1 0 0 A 1 0.4
## 2 1 0 0 B 2 0.8
## 3 0 1 0 C 3 0.4
## 4 0 1 0 A 1 0.8
## 5 0 0 1 B 2 0.4
## 6 0 0 1 C 3 0.8
dummy.data.frame(data, dummy.classes = "character")
## w xA xB xC y z
## 1 apple 1 0 0 1 0.4
## 2 apple 0 1 0 2 0.8
## 3 banana 0 0 1 3 0.4
## 4 banana 1 0 0 1 0.8
## 5 carrot 0 1 0 2 0.4
## 6 carrot 0 0 1 3 0.8
dummy.data.frame(data, dummy.classes = "numeric")
## w x y z0.4 z0.8
## 1 apple A 1 1 0
## 2 apple B 2 0 1
## 3 banana C 3 1 0
## 4 banana A 1 0 1
## 5 carrot B 2 1 0
## 6 carrot C 3 0 1
dummy.data.frame(data, dummy.classes = "integer")
## w x y1 y2 y3 z
## 1 apple A 1 0 0 0.4
## 2 apple B 0 1 0 0.8
## 3 banana C 0 0 1 0.4
## 4 banana A 1 0 0 0.8
## 5 carrot B 0 1 0 0.4
## 6 carrot C 0 0 1 0.8
```

**Also Check:** *How to Remove Outliers from Data in R*

We can create dummy variables for all variables by setting dummy.classes to “ALL”.

```
dummy.data.frame(data, dummy.classes = "ALL")
## wapple wbanana wcarrot xA xB xC y1 y2 y3 z0.4 z0.8
## 1 1 0 0 1 0 0 1 0 0 1 0
## 2 1 0 0 0 1 0 0 1 0 0 1
## 3 0 1 0 0 0 1 0 0 1 1 0
## 4 0 1 0 1 0 0 1 0 0 0 1
## 5 0 0 1 0 1 0 0 1 0 1 0
## 6 0 0 1 0 0 1 0 0 1 0 1
```
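If the dummies package is not installed, a similar result for a factor column can be sketched in base R with model.matrix(). This is an alternative technique, not what dummy.data.frame() does internally; the data below is a reduced copy of the example above so that the block runs on its own.

```r
# Reduced copy of the example data (only the factor w and the numeric z).
w <- factor(rep(c("apple", "banana", "carrot"), each = 2))
z <- rep(c(0.4, 0.8), 3)
data <- data.frame(w, z)

# "~ w - 1" drops the intercept, so every level of w gets its own 0/1 column.
w_dummies <- model.matrix(~ w - 1, data = data)

# Bind the dummy columns back to the untouched numeric column.
cbind(as.data.frame(w_dummies), z = data$z)
```

The column names (wapple, wbanana, wcarrot) follow the same variable-plus-level pattern as the output shown earlier.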

A walkthrough of this code is available on our YouTube channel.

**Don’t forget to check:** *How to Reinstall All Packages After Updating R*

**References**

Brown, C. (2012). dummies: Create dummy/indicator variables flexibly and efficiently. R package version 1.5.6.

The post How to Convert Categorical Variables into Dummy Variables in R appeared first on Universe of Data Science.

Sometimes, researchers use integer encoding to put a nominal variable in a regression model. Integer encoding assigns a unique integer to each level of a categorical variable. However, integer encoding alone is misleading for a nominal variable, since it lets the model assume a natural ordering between categories. This can cause unexpected results and poor performance.

If we have a nominal variable and want to put it in the model, we need to create dummy variables for each nominal variable, i.e. one-hot encoding. If a categorical variable has k levels, k new dummy variables are created. Each dummy variable has a value of either 0 or 1, representing absence or presence of that feature, respectively.

If a categorical variable has k levels and we create k new dummy variables, we may fall into the dummy variable trap. The dummy variable trap is a situation in which one variable can be predicted exactly from the values of the other variables (perfect multicollinearity): with an intercept in the model, the k dummy variables always sum to 1. Therefore, we need to exclude one dummy variable while constructing a regression model. **As a result, if we have k levels of a categorical variable, we need to create k-1 dummy variables.**
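The trap can be made visible by checking the rank of the design matrix. The sketch below uses only base R; the object names are ours, and the intercept column is added by hand to mimic what a regression model would do.

```r
x <- factor(rep(c("apple", "banana", "carrot"), each = 2))

# k = 3 dummy columns, one per level (no intercept in this matrix).
dummies_k <- model.matrix(~ x - 1)

# Adding an intercept reproduces the trap: the three dummies sum to the intercept.
design_trap <- cbind(intercept = 1, dummies_k)
qr(design_trap)$rank   # 3, although the matrix has 4 columns

# Dropping one dummy (k - 1 columns) restores full column rank.
design_ok <- cbind(intercept = 1, dummies_k[, -1])
qr(design_ok)$rank     # 3, equal to its number of columns
```

A rank lower than the number of columns means the regression coefficients are not uniquely determined.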

In this tutorial, we learn the usage of the dummy_cols() function available in the fastDummies package (Kaplan, 2020). Firstly, we learn how to create dummy variables. Secondly, we go over how to remove the nominal variables from the data after creating dummy variables. Finally, we learn how to avoid the dummy variable trap.

Let’s construct a data frame involving two categorical variables in which no ordinal relation exists.

```
x <- factor(rep(c("apple","banana","carrot"), each = 2))
y <- factor(rep(c("A","B","C"), 2))
data <- data.frame(x, y)
data
## x y
## 1 apple A
## 2 apple B
## 3 banana C
## 4 banana A
## 5 carrot B
## 6 carrot C
```

**Check Out:** *How to Merge Data Frames in R*

In this part, we use select_columns argument to define which variables are converted into dummy variables.

```
library(fastDummies)
dummy_cols(data, select_columns = c("x","y"))
## x y x_apple x_banana x_carrot y_A y_B y_C
## 1 apple A 1 0 0 1 0 0
## 2 apple B 1 0 0 0 1 0
## 3 banana C 0 1 0 0 0 1
## 4 banana A 0 1 0 1 0 0
## 5 carrot B 0 0 1 0 1 0
## 6 carrot C 0 0 1 0 0 1
```

**Also Check:** *How to Remove Outliers from Data in R*

We can remove the original categorical variables from the data after the dummy variables are created by setting the remove_selected_columns argument to TRUE.

```
dummy_cols(data, select_columns = c("x","y"), remove_selected_columns = TRUE)
## x_apple x_banana x_carrot y_A y_B y_C
## 1 1 0 0 1 0 0
## 2 1 0 0 0 1 0
## 3 0 1 0 0 0 1
## 4 0 1 0 1 0 0
## 5 0 0 1 0 1 0
## 6 0 0 1 0 0 1
```

**Also Check:** *How to Create Dummy Variables Based on Variable Class in R Data Frame*

Finally, we can avoid the dummy variable trap by setting the remove_first_dummy argument to TRUE.

```
dummy_cols(data, select_columns = c("x","y"), remove_selected_columns = TRUE, remove_first_dummy = TRUE)
## x_banana x_carrot y_B y_C
## 1 0 0 0 0
## 2 0 0 1 0
## 3 1 0 0 1
## 4 1 0 0 0
## 5 0 1 1 0
## 6 0 1 0 1
```

A walkthrough of this code is available on our YouTube channel.

**Don’t forget to check:** *Missing Data Imputations in R – Mean, Median, Mode*

**References**

Kaplan, J. (2020). fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables. R package version 1.6.3.

The post How to Reinstall All Packages After Updating R appeared first on Universe of Data Science.

Managing R packages is an important part of working with R, since many tools are available in separate R packages. Firstly, we learn how to get a list of installed packages. Secondly, we pull the names of the installed packages. Then, we learn how to save the package names to a file. After that, you can update R. Next, we read the saved package names back into the R console. Finally, we go over how to reinstall all packages in R.

We can see the installed packages with installed.packages() function.

```
packages <- as.data.frame(installed.packages())
rownames(packages) <- NULL
```

**Check Out:** *How to List Installed Packages with Versions in R*

After we obtain the list of installed packages, we pull the names of the packages.

```
out <- packages[,"Package"]
```

Let’s see the head of the package names available in R.

```
head(out)
## [1] "A3" "ABCanalysis" "abind" "ada" "admisc" "AER"
```

**Also Check:** *How to Install and Load a Package in R*

In this part, we save the names of the packages available in R with write.table() function.

```
write.table(out, file = "Package_List.txt", sep = "\t", row.names = FALSE, col.names = FALSE)
```

**Also Check:** *How to Clean Data in R*

Then, we can update R. After installing the new version of R, we need to read the names of the R packages we saved in the .txt file. We read the package names with the read.table() function.

```
List <- read.table("Package_List.txt")
```

The List object is a data frame with a single column holding the package names. Let’s see the head of the package names.

```
head(List[,1])
## [1] "A3" "ABCanalysis" "abind" "ada" "admisc" "AER"
```

At last, we can install multiple R packages with install.packages() function.

```
install.packages(List[,1], repos = "https://cloud.r-project.org")
```
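Some of the saved packages, such as the ones that ship with R, are already present after the update. A possible refinement is to install only the names that are missing. The helper `missing_packages` and the example names below are hypothetical; in practice `List[,1]` would be passed as `saved`.

```r
# Hypothetical helper: return the saved package names that are not installed yet.
missing_packages <- function(saved, installed = rownames(installed.packages())) {
  setdiff(saved, installed)
}

# Illustration with made-up input: two packages that ship with R and one fake name.
saved <- c("stats", "utils", "notARealPackage123")
to_install <- missing_packages(saved)
to_install   # "notARealPackage123"

# Uncomment to install only what is missing:
# install.packages(to_install, repos = "https://cloud.r-project.org")
```

Packages that were installed from sources other than CRAN will not be found by install.packages() and need to be reinstalled from their original source.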

A walkthrough of this code is available on our YouTube channel.

**Don’t forget to check:** *Missing Data Imputations in R – Mean, Median, Mode*

The post How to List Installed Packages with Versions in R appeared first on Universe of Data Science.

Managing R packages is essential for R users. Firstly, we learn how to get a list of installed packages. Secondly, we go over finding the version of a package. Thirdly, we learn the paths where R packages are installed. Finally, we check whether a package is installed or not.

In this article, we answer the following questions.

- How can I get a list of installed packages?
- How to find out which package version is loaded in R?
- How do I find where R packages are installed?
- How do you check if an R package has been installed?

We can see the installed packages with installed.packages() function. Then, we pull the packages and their versions.

```
packINFO <- as.data.frame(installed.packages())[,c("Package", "Version")]
rownames(packINFO) <- NULL
```

Let’s see the head of installed packages and their versions.

```
head(packINFO)
## Package Version
## 1 A3 1.0.0
## 2 ABCanalysis 1.2.1
## 3 abind 1.4-5
## 4 ada 2.0-5
## 5 admisc 0.30
## 6 AER 1.2-10
```

**Check Out:** *How to Install and Load a Package in R*

In this section, we learn how to find the package version in R. For instance, let’s find the version of onewaytests package (Dag et al., 2018).

```
packINFO[packINFO$Package == "onewaytests",]
## Package Version
## 300 onewaytests 2.7
packageVersion("onewaytests")
## [1] ‘2.7’
getNamespaceVersion("onewaytests")
## version
## "2.7"
```

**Also Check:** *How to Change Working Directory in R*

We can find the paths of the libraries where R packages are installed with the .libPaths() function.

```
.libPaths()
## [1] "C:/Users/osmandag/Documents/R/win-library/4.0"
## [2] "C:/Program Files/R/R-4.0.2/library"
```

**Also Check:** *How to Import Data into R*

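In this part, we learn how to check whether an R package is installed or not. For example, let’s check whether the onewaytests package (Dag et al., 2018) is installed with the system.file() function. It returns the installation path of the package if the package is installed and an empty string otherwise.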

```
system.file(package = "onewaytests")
## [1] "C:/Users/osmandag/Documents/R/win-library/4.0/onewaytests"
```
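Since system.file() returns an empty string for a package that is not installed, the check can be turned into a TRUE/FALSE answer. The helper name `is_installed` and the fake package name are ours, chosen for illustration.

```r
# Hypothetical helper: TRUE when system.file() finds the package directory.
is_installed <- function(pkg) nzchar(system.file(package = pkg))

is_installed("stats")                # TRUE: stats ships with R
is_installed("notARealPackage123")   # FALSE: system.file() returned ""

# requireNamespace() answers the same question, but also loads the
# namespace when the package is found.
requireNamespace("stats", quietly = TRUE)
```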

A walkthrough of this code is available on our YouTube channel.

**Don’t forget to check:** *How to Download and Install R for Windows*

**References**

Dag, O., Dolgun, A., Konar, N.M. (2018). onewaytests: An R Package for One-Way Tests in Independent Groups Designs. *R Journal*, 10(1), 175-199.

The post Missing Data Imputations in R – Mean, Median, Mode appeared first on Universe of Data Science.

In this tutorial, we learn three simple imputation methods in R. Firstly, we learn how to impute missing data with the mean. Secondly, we go over median imputation. Finally, we learn how to perform mode imputation in R.

In our example, we create a vector including a missing observation. We find the position of the missing observation with the is.na() function. After that, we use the mean() function with na.rm = TRUE to find the mean by excluding missing observations. In our example, the mean of the vector is 225 after excluding missing observations.

```
data <- c(100, 200, 300, 300, NA)
data[is.na(data)] <- mean(data, na.rm = TRUE)
data
## [1] 100 200 300 300 225
```

**Check Out:** *How to Remove Outliers from Data in R*

In this section, we learn how to conduct median imputation in R. We compute the median of the vector with the median() function, leaving the missing observations out. For our example data, the median is 250 after excluding NAs.

```
data <- c(100, 200, 300, 300, NA)
data[is.na(data)] <- median(data, na.rm = TRUE)
data
## [1] 100 200 300 300 250
```

**Also Check:** *How to Handle Missing Values in R*

In this part, we go over how to implement mode imputation in R. This imputation type is generally used for categorical variables. We need the mode of the variable. For this purpose, we find the frequency of each value using the table() function, which removes NAs by default. After finding the frequencies, we use the which.max() function to find the position of the highest frequency. Then, we use the names() function to extract the mode, but it returns the output with “character” class. Therefore, we use the as.numeric() function to convert the output to numeric. In our example, the mode of the variable is 300 after leaving the missing observation out.

```
data <- c(100, 200, 300, 300, NA)
data[is.na(data)] <- as.numeric(names(which.max(table(data))))
data
## [1] 100 200 300 300 300
```
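The as.numeric() step only suits numeric data, while mode imputation is mostly used for categorical variables. A type-preserving variant could look like the sketch below; `impute_mode` is a hypothetical helper name, and ties are resolved in favour of the value that appears first.

```r
# Hypothetical helper: fill NAs with the most frequent observed value,
# keeping the type of the input vector (character or numeric).
impute_mode <- function(v) {
  observed <- v[!is.na(v)]
  values <- unique(observed)
  # tabulate() counts how often each unique value occurs.
  counts <- tabulate(match(observed, values))
  v[is.na(v)] <- values[which.max(counts)]
  v
}

impute_mode(c("red", "blue", "blue", NA))   # NA becomes "blue"
impute_mode(c(100, 200, 300, 300, NA))      # NA becomes 300
```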

**Also Check:** *How to Clean Data in R*

A walkthrough of this code is available on our YouTube channel.

**Don’t forget to check:** *How to Merge Data Frames in R*

The post 4 Ways of Finding Unique Values in R appeared first on Universe of Data Science.

In this tutorial, we learn how to find unique values in R. Firstly, we go over the unique() function. Secondly, we learn how to use the duplicated() function to obtain unique values. Thirdly, we learn how to use the distinct() function available in the dplyr package (Wickham et al., 2022). Finally, we go over the names() function to find unique values in R.

Let’s construct an example data including duplicated observations to illustrate how to find unique values in R.

```
data <- c("apple","banana","banana","carrot","carrot","carrot")
data
## [1] "apple" "banana" "banana" "carrot" "carrot" "carrot"
```

**Check Out:** *How to Clean Data in R*

In this part, we use unique() function to find unique values in R.

```
unique(data)
## [1] "apple" "banana" "carrot"
```

**Also Check:** *How to Use apply Functions in R*

In this section, we use the duplicated() function to flag duplicated values. Then, we negate its result with the ! operator and use it to subset the data, which leaves the unique values.

```
data[!duplicated(data)]
## [1] "apple" "banana" "carrot"
```

**Also Check:** *How to Recode Character Variables in R*

In this section, we use the distinct() function available in the dplyr package (Wickham et al., 2022) to obtain unique values in R.

```
library(dplyr)
distinct(data.frame(data))[,1]
## [1] "apple" "banana" "carrot"
```

Finally, we construct a frequency table with the table() function. Then, we read the names of the table entries with the names() function to find the unique values in R.

```
names(table(data))
## [1] "apple" "banana" "carrot"
```
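The four approaches agree on the example above because the data is already sorted. With unsorted input they can differ in order and in type, as the small base R illustration below shows (the vector `fruit` is made up for this purpose).

```r
fruit <- c("carrot", "apple", "carrot", "banana")

unique(fruit)               # order of first appearance: "carrot" "apple" "banana"
fruit[!duplicated(fruit)]   # same order as unique()
names(table(fruit))         # sorted: "apple" "banana" "carrot"

# names(table()) always returns character, even for numeric input.
names(table(c(3, 1, 3)))    # "1" "3"
```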

A walkthrough of this code is available on our YouTube channel.

**Don’t forget to check:** *What are Data Types in R?*

**References**

Wickham, H., Francois, R., Henry, L., Muller, K. (2022). dplyr: A Grammar of Data Manipulation. R package version 1.0.10.

The post How to Use apply Functions in R appeared first on Universe of Data Science.

In this tutorial, we learn how to use the apply functions in R. Firstly, we go over the apply() function. Secondly, we learn how to use the tapply() function to obtain results by group. Thirdly, we learn how to use the lapply() function to obtain results as a list. Also, we use the sapply() function to obtain results as a vector. Moreover, we learn the use of the vapply() function. Finally, we go over the mapply() function.

The apply() function takes a matrix or array as input; a data frame is converted to a matrix first. We can apply a function by row (margin 1) or by column (margin 2). First, we find the sum of the observations by row. Then, we calculate the sum of the values by column.

```
mydata <- data.frame(x = 1:3, y = 11:13, z = 101:103)
apply(mydata, 1, sum)
## [1] 113 116 119
apply(mydata, 2, sum)
## x y z
## 6 36 306
```
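Two side notes on apply(), sketched in base R: for plain sums the dedicated rowSums() and colSums() functions give the same numbers, and the conversion to a matrix matters as soon as the data frame has a non-numeric column. The `mixed` data frame is made up for illustration.

```r
mydata <- data.frame(x = 1:3, y = 11:13, z = 101:103)

# Same values as apply(mydata, 1, sum) and apply(mydata, 2, sum).
rowSums(mydata)
colSums(mydata)

# apply() converts the data frame with as.matrix(); one character column
# turns the whole matrix into character, so numeric functions no longer work.
mixed <- data.frame(x = 1:3, label = c("a", "b", "c"))
class(as.matrix(mixed)[1, 1])   # "character"
```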

**Check Out:** *How to Merge Data Frames in R*

In this section, we go over the tapply() function. This function is used to obtain results by group. For this purpose, let’s construct a data frame. Then, we use tapply() to obtain the sum of the observations for each group.

```
mydata <- data.frame(x = c("a","a","b","b"), y = 11:14)
tapply(mydata$y, mydata$x, sum)
## a b
## 23 27
```

In this part, we learn how to use the lapply() function. Firstly, we construct a list as an example. Then, we obtain the sum of the observations in each list element and return the result as a list.

```
mylist <- list(1:3, 11:13, 101:103)
lapply(mylist, sum)
## [[1]]
## [1] 6
##
## [[2]]
## [1] 36
##
## [[3]]
## [1] 306
```

**Also Check:** *How to Remove Outliers from Data in R*

In this section, we use the sapply() function. First, we construct a list as an example. Then, we obtain the sum of the observations in each list element and return the result as a vector.

```
mylist <- list(1:3, 11:13, 101:103)
sapply(mylist, sum)
## [1] 6 36 306
```

**Also Check:** *How to Find Class of Each Column in R Data Frame*

The vapply() function is similar to the sapply() function, but vapply() requires the type and length of each result to be declared. In our example, we declare each result as a single numeric value.

```
mylist <- list(1:3, 11:13, 101:103)
vapply(mylist, sum, numeric(1))
## [1] 6 36 306
```
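The declared type is what makes vapply() stricter than sapply(). The sketch below shows two consequences in base R: a mismatching declaration stops with an error (caught here with tryCatch() and replaced by our own message), and empty input still returns the declared type.

```r
mylist <- list(1:3, 11:13, 101:103)

# sum() of integers returns an integer, so declaring character(1) is a mismatch.
result <- tryCatch(
  vapply(mylist, sum, character(1)),
  error = function(e) "vapply() stopped: values are not character(1)"
)
result

# On an empty list, sapply() returns list() while vapply() keeps the declared type.
sapply(list(), sum)               # list()
vapply(list(), sum, numeric(1))   # numeric(0)
```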

The mapply() function requires a function and its inputs. We construct a data frame and define our own function, although any existing R function can be used as well. Then, we call mapply() with our function followed by the inputs; the function is applied to the first elements of the inputs, then to the second elements, and so on.

```
mydata <- data.frame(x = 1:3, y = 11:13, z = 101:103)
myfunction <- function(x, y, z){x+y+z}
mapply(myfunction, mydata$x, mydata$y, mydata$z)
## [1] 113 116 119
```

A walkthrough of this code is available on our YouTube channel.

**Don’t forget to check:** *How to Convert All Columns of Data Frame to Numeric in R*

The post How to Merge Data Frames in R appeared first on Universe of Data Science.

In this tutorial, we cover how to merge data frames in different ways. We learn how to combine data frames by common ids, first data frame ids, second data frame ids and all ids. There are three commonly used ways of merging data frames in R. Firstly, we learn how to join data frames by using the merge() function. Secondly, we use the dplyr package (Wickham et al., 2022) to merge data frames in R. Finally, we use the tidyverse package (Wickham et al., 2019) to combine data frames in R.

Let’s construct two data frames to illustrate how to merge data frames in R.

```
data1 <- data.frame(id = 1:4, x1 = 101:104)
data1
## id x1
## 1 1 101
## 2 2 102
## 3 3 103
## 4 4 104
data2 <- data.frame(id = 3:6, x2 = 13:16)
data2
## id x2
## 1 3 13
## 2 4 14
## 3 5 15
## 4 6 16
```

**Check Out:** *How to Convert All Columns of Data Frame to Numeric in R*

In this part, we use merge() function to combine the data frames by common ids, first data frame ids, second data frame ids and all ids, respectively.

```
merge(data1, data2, by = "id")
## id x1 x2
## 1 3 103 13
## 2 4 104 14
merge(data1, data2, by = "id", all.x = TRUE)
## id x1 x2
## 1 1 101 NA
## 2 2 102 NA
## 3 3 103 13
## 4 4 104 14
merge(data1, data2, by = "id", all.y = TRUE)
## id x1 x2
## 1 3 103 13
## 2 4 104 14
## 3 5 NA 15
## 4 6 NA 16
merge(data1, data2, by = "id", all.x = TRUE, all.y = TRUE)
## id x1 x2
## 1 1 101 NA
## 2 2 102 NA
## 3 3 103 13
## 4 4 104 14
## 5 5 NA 15
## 6 6 NA 16
```
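The examples above rely on both data frames using the same key name. When the key columns are named differently, merge() accepts by.x and by.y. The data frames below are rebuilt for illustration, and the column name `key` is a made-up example.

```r
left  <- data.frame(id = 1:4, x1 = 101:104)
right <- data.frame(key = 3:6, x2 = 13:16)   # hypothetical column name "key"

# by.x / by.y name the key column in each data frame;
# the merged result keeps the name given in by.x.
merged <- merge(left, right, by.x = "id", by.y = "key")
merged
```

The all.x and all.y arguments shown earlier combine with by.x and by.y in the same way.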

**Also Check:** *How to Find Class of Each Column in R Data Frame*

In this section, we use the inner_join(), left_join(), right_join() and full_join() functions available in the dplyr package (Wickham et al., 2022) to merge data frames by common ids, first data frame ids, second data frame ids and all ids, respectively.

```
library(dplyr)
data1 %>% inner_join(data2, by = 'id')
## id x1 x2
## 1 3 103 13
## 2 4 104 14
data1 %>% left_join(data2, by = 'id')
## id x1 x2
## 1 1 101 NA
## 2 2 102 NA
## 3 3 103 13
## 4 4 104 14
data1 %>% right_join(data2, by = 'id')
## id x1 x2
## 1 3 103 13
## 2 4 104 14
## 3 5 NA 15
## 4 6 NA 16
data1 %>% full_join(data2, by = 'id')
## id x1 x2
## 1 1 101 NA
## 2 2 102 NA
## 3 3 103 13
## 4 4 104 14
## 5 5 NA 15
## 6 6 NA 16
```

**Also Check:** *How to Round Data Frame Containing Character Variables in R*

In this part, we first need to put the data frames in a list. Then, we use the reduce() function, which comes from the purrr package loaded with tidyverse. Inside reduce(), we pass inner_join, left_join, right_join or full_join to merge the data frames by common ids, first data frame ids, second data frame ids and all ids, respectively.

```
library(tidyverse)
data_list <- list(data1, data2)
data_list %>% reduce(inner_join, by = 'id')
## id x1 x2
## 1 3 103 13
## 2 4 104 14
data_list %>% reduce(left_join, by = 'id')
## id x1 x2
## 1 1 101 NA
## 2 2 102 NA
## 3 3 103 13
## 4 4 104 14
data_list %>% reduce(right_join, by = 'id')
## id x1 x2
## 1 3 103 13
## 2 4 104 14
## 3 5 NA 15
## 4 6 NA 16
data_list %>% reduce(full_join, by = 'id')
## id x1 x2
## 1 1 101 NA
## 2 2 102 NA
## 3 3 103 13
## 4 4 104 14
## 5 5 NA 15
## 6 6 NA 16
```
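The list-based approach pays off with more than two data frames. As a sketch without extra packages, base R's Reduce() combined with merge() follows the same idea; `data3` is a hypothetical third data frame added for illustration.

```r
data1 <- data.frame(id = 1:4, x1 = 101:104)
data2 <- data.frame(id = 3:6, x2 = 13:16)
data3 <- data.frame(id = 4:7, x3 = c(0.4, 0.5, 0.6, 0.7))   # hypothetical third data frame

# Reduce() merges the first two data frames, then merges that result with the third.
merged_all <- Reduce(function(a, b) merge(a, b, by = "id"),
                     list(data1, data2, data3))
merged_all   # only id 4 is present in all three data frames
```

Passing all = TRUE inside the merge() call would keep every id instead of only the common ones.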

A walkthrough of this code is available on our YouTube channel.

**Don’t forget to check:** *How to Sort a Data Frame by Single and Multiple Columns in R*

**References**

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D. A., François, R., … & Yutani, H. (2019). Welcome to the Tidyverse. *Journal of open source software*, *4*(43), 1686.

Wickham, H., Francois, R., Henry, L., Muller, K. (2022). dplyr: A Grammar of Data Manipulation. R package version 1.0.10.

The post How to Convert All Columns of Data Frame to Numeric in R appeared first on Universe of Data Science.

In this tutorial, we learn three ways of converting all data frame columns to numeric in R. Firstly, we use the dplyr package to convert the columns of a data frame to numeric. Secondly, we work with the sapply() function to convert all columns to numeric. Finally, we learn how to convert all columns of a data frame to numeric using the apply() function.

Let’s construct a data frame including the variables with different classes as an example data frame.

```
a <- c(1,3,5,-4)
b <- c("5","3","1","4")
c <- c(3L,-2L,4L,7L)
data <- data.frame(a,b,c)
data
## a b c
## 1 1 5 3
## 2 3 3 -2
## 3 5 1 4
## 4 -4 4 7
sapply(data, class)
## a b c
## "numeric" "character" "integer"
```

**Check Out:** *How to Find Class of Each Column in R Data Frame*

In this part, we use the mutate_at() function available in the dplyr package to convert the columns of the data frame to numeric.

```
library(dplyr)
data2 <- data %>% mutate_at(1:3, as.numeric)
data2
## a b c
## 1 1 5 3
## 2 3 3 -2
## 3 5 1 4
## 4 -4 4 7
sapply(data2, class)
## a b c
## "numeric" "numeric" "numeric"
```
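In dplyr 1.0.0 and later, mutate_at() is marked as superseded in favour of across(). An equivalent call could look like the sketch below, which assumes a dplyr version that provides across(); the example data frame is rebuilt so that the block runs on its own.

```r
library(dplyr)

data <- data.frame(a = c(1, 3, 5, -4),
                   b = c("5", "3", "1", "4"),
                   c = c(3L, -2L, 4L, 7L))

# across(everything(), ...) applies as.numeric to every column.
data2 <- data %>% mutate(across(everything(), as.numeric))
sapply(data2, class)   # all three columns are "numeric"
```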

**Also Check:** *How to Clean Data in R*

In this section, we use the sapply() function to change the classes of all data frame columns to numeric in R. sapply() returns a matrix rather than a data frame. Therefore, we need to convert the result back to a data frame with the as.data.frame() function.

```
data2 <- sapply(data, as.numeric)
data2 <- as.data.frame(data2)
data2
## a b c
## 1 1 5 3
## 2 3 3 -2
## 3 5 1 4
## 4 -4 4 7
sapply(data2, class)
## a b c
## "numeric" "numeric" "numeric"
```

**Also Check:** *How to Remove Outliers from Data in R*

In this part, we use the apply() function to change the classes of all data frame columns to numeric in R. apply() also returns a matrix rather than a data frame. Therefore, we need to convert the result back to a data frame with the as.data.frame() function.

```
data2 <- apply(data, 2, function(x) as.numeric(x))
data2 <- as.data.frame(data2)
data2
## a b c
## 1 1 5 3
## 2 3 3 -2
## 3 5 1 4
## 4 -4 4 7
sapply(data2, class)
## a b c
## "numeric" "numeric" "numeric"
```
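A base R variant that skips the matrix round trip is to assign the converted columns back into the data frame with lapply(). The sketch below also shows a common pitfall with factor columns; the factor `f` is made up for illustration.

```r
data <- data.frame(a = c(1, 3, 5, -4),
                   b = c("5", "3", "1", "4"),
                   c = c(3L, -2L, 4L, 7L))

# Assigning into data2[] replaces the columns but keeps the data frame structure.
data2 <- data
data2[] <- lapply(data2, as.numeric)
sapply(data2, class)   # all three columns are "numeric"

# Caution for factor columns: as.numeric() returns the internal level codes,
# so convert through character first.
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # 1 2 1 (level codes)
as.numeric(as.character(f))   # 10 20 10
```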

A walkthrough of this code is available on our YouTube channel.

**Don’t forget to check:** *How to Sort a Data Frame by Single and Multiple Columns in R*

**References**

Wickham, H., Francois, R., Henry, L., Muller, K. (2022). dplyr: A Grammar of Data Manipulation. R package version 1.0.10.
