The R language can also be used for commercial applications. In this article, I describe how R can be used for a commercial application such as payroll. For this exercise, let us assume that in a typical Indian company every employee receives two allowances, Dearness Allowance (DA) and House Rent Allowance (HRA), in addition to the basic salary. Let us assume that DA and HRA are 22% and 30% of the basic salary respectively. The company also deducts income tax, taken here as 10% of the basic salary. The gross salary is the sum of the basic salary, DA and HRA; the net salary is the gross salary minus the income tax. The task is to perform these calculations for each employee and to generate a payslip for each employee in a specified format showing all these details.
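As a quick check of the rules above, the arithmetic for a single employee can be sketched as follows. The basic salary of 10000 is a hypothetical figure chosen only for illustration; it is not taken from the sample data.

```r
# Worked example of the payroll rules for one employee.
# The basic salary is a hypothetical value, not from the article's data.
basic <- 10000
da    <- basic * 0.22          # Dearness Allowance: 22% of basic
hra   <- basic * 0.30          # House Rent Allowance: 30% of basic
gross <- basic + da + hra      # gross salary
itax  <- basic * 0.10          # income tax: 10% of basic
net   <- gross - itax          # net salary

slip <- c(DA = da, HRA = hra, Gross = gross, ITax = itax, Net = net)
print(slip)
```

With these rates the gross salary is always 1.52 times the basic salary and the net salary 1.42 times, so a basic salary of 10000 gives a gross of 15200 and a net of 14200.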

Sample Data

Sample data for four employees is shown below:

Calculations and Printing of Payslips

The following user-defined function payroll() performs all the calculations and returns the results as a list of class "payroll":

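    # Compute allowances, gross pay, tax and net pay for every employee in df.
    # df is expected to have the columns eno, name, dept and salary.
    payroll <- function(df) {
      lst <- list()
      lst$eno    <- df$eno
      lst$name   <- df$name
      lst$dept   <- df$dept
      lst$salary <- df$salary
      lst$da     <- df$salary * 0.22              # Dearness Allowance
      lst$hra    <- df$salary * 0.3               # House Rent Allowance
      lst$gpay   <- df$salary + lst$da + lst$hra  # gross pay
      lst$itax   <- df$salary * 0.1               # income tax
      lst$npay   <- lst$gpay - lst$itax           # net pay
      class(lst) <- "payroll"                     # enables dispatch to print.payroll()
      lst
    }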

To print the payslips, a print method for the class "payroll" is defined as follows:

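    # Print method for class "payroll": one payslip per employee,
    # each followed by a separator line of 40 "=" characters.
    print.payroll <- function(x, ...) {
      n <- length(x$eno)   # number of employees, taken from the object itself
      cat("\n")
      for (i in seq_len(n)) {
        cat("Eno : ", x$eno[i], "\n")
        cat("Name : ", x$name[i], "\n")
        cat("Dept : ", x$dept[i], "\n")
        cat("Salary : ", x$salary[i], "\n")
        cat("DA : ", x$da[i], "\n")
        cat("HRA : ", x$hra[i], "\n")
        cat("Gross Pay : ", x$gpay[i], "\n")
        cat("ITax : ", x$itax[i], "\n")
        cat("Net Salary : ", x$npay[i], "\n\n")
        cat(strrep("=", 40), "\n", sep = "")
      }
      invisible(x)
    }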

The following R code reads the data from the CSV file payroll.csv, performs all the calculations and prints the payslips for all the employees.

    df <- read.csv("g:/RExercises/OOP/payroll.csv", header = TRUE, sep = ",",
                   stringsAsFactors = FALSE)
    n <- nrow(df)        # number of employees
    res <- payroll(df)
    res                  # auto-printing dispatches to print.payroll()
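For readers who do not have payroll.csv, the same flow can be tried on an in-memory data frame. This is only a sketch: the two employees below are invented placeholders and not the article's sample data, and payroll() is restated compactly so that the block runs on its own.

```r
# Hypothetical stand-in for payroll.csv: the values below are invented
# for illustration and are NOT the article's sample data.
df <- data.frame(
  eno    = c(101, 102),
  name   = c("Employee A", "Employee B"),
  dept   = c("Sales", "Accounts"),
  salary = c(20000, 30000),
  stringsAsFactors = FALSE
)

# Compact restatement of the article's payroll() so this block runs alone.
payroll <- function(df) {
  lst <- list(eno = df$eno, name = df$name, dept = df$dept, salary = df$salary)
  lst$da   <- df$salary * 0.22
  lst$hra  <- df$salary * 0.30
  lst$gpay <- df$salary + lst$da + lst$hra
  lst$itax <- df$salary * 0.10
  lst$npay <- lst$gpay - lst$itax
  class(lst) <- "payroll"
  lst
}

res <- payroll(df)

# Dropping the class gives a plain list, which converts to an ordinary table.
tab <- as.data.frame(unclass(res), stringsAsFactors = FALSE)
print(tab)
```

Viewing the result as a data frame is a convenient cross-check of the figures before they are formatted as payslips.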

Payslips printing

The payslips generated by the above code are given below:

Conclusions

This article described an R program for a commercial application such as payroll.


There are many packages available in R, such as data.table, tables and psych, that provide descriptive statistics such as the mean and standard deviation group-wise (factor-wise) for numeric variables. In this article, an attempt is made to generate similar tabulated results using only the functions available in base R and the object-oriented system available in R. The main purpose of the exercise is to illustrate how the object-oriented system of R can be used to present output in the form we require. For illustration, the iris data is used, which contains four variables, Sepal Length, Sepal Width, Petal Length and Petal Width, for three species (the levels of the factor Species). The following algorithm and R code calculate the mean and standard deviation of these four variables for each species and generate tabulated results similar to those obtained from the above packages.

**Algorithm and Code**

1. To generate the mean and standard deviation of any vector, a user-defined function meansd() is defined as follows:

    # Return the mean and standard deviation of a numeric vector as a list.
    meansd <- function(x) {
      l <- list()
      l$Mean <- mean(x)
      l$SD <- sd(x)
      return(l)
    }

The above function receives any vector as input, calculates the mean and standard deviation and returns the results as a list l.
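A minimal usage sketch follows, applying the function to one column of the built-in iris data. The function is restated in a compact but equivalent form so that the block runs on its own.

```r
# Compact, equivalent restatement of meansd() from the article.
meansd <- function(x) {
  list(Mean = mean(x), SD = sd(x))
}

# Mean and standard deviation of Sepal.Length over all 150 flowers.
s <- meansd(iris$Sepal.Length)
str(s)
```

The result is a list with the two named elements Mean and SD, which is the shape the later steps rely on.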

2. Execution starts by calling a function with just two arguments: (i) the data frame containing all the variables for which the mean and standard deviation are required and (ii) a vector containing the factor variable. A new function basstat() is therefore defined with these two arguments. For the iris data, it can be called as given below.

res<-basstat(iris[,1:4],iris[,5])

3. The basstat() function splits the iris data by species into a list of three sub data frames, one per species. It then uses lapply(), which calls another function, result(), on each of these sub data frames and collects the aggregated results in the variable res. To print these aggregated results in a neat tabular form, we use the object-oriented features of R: the class of res is set to "myclass" and the object is returned. The code of the basstat() function is given below:

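    # Split df by the factor f, summarise each group, and tag the result
    # with class "myclass" so that print.myclass() is used for printing.
    basstat <- function(df, f) {
      l <- split(df, f)
      res <- lapply(l, result)
      class(res) <- "myclass"
      return(res)
    }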

4. The lapply() call in step 3 calls the function result() with each sub data frame as its input argument. The result() function contains a call to sapply(), which in turn calls meansd() on each column of the sub data frame, one at a time, and receives the mean and standard deviation of every variable in that sub data frame. It captures them in the object "tres" and returns these results to the calling lapply(). The code of the result() function is given below:

    # Apply meansd() to every column of the sub data frame x.
    result <- function(x) {
      tres <- sapply(x, meansd)
      return(tres)
    }

Through all these function calls, all the results are now available in the object res, which is of class "myclass". At this stage the results are available species-wise, but not in a neat, compact tabular form, as shown below.
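To see what this intermediate structure looks like, the three functions can be run together and the class dropped before inspecting the result. This is a sketch that restates the article's functions compactly; the name raw is introduced here only for illustration.

```r
# Compact restatements of the article's functions so the block runs alone.
meansd  <- function(x) list(Mean = mean(x), SD = sd(x))
result  <- function(x) sapply(x, meansd)
basstat <- function(df, f) {
  res <- lapply(split(df, f), result)
  class(res) <- "myclass"
  res
}

res <- basstat(iris[, 1:4], iris[, 5])

# 'raw' is a hypothetical helper name: the same results without the class,
# i.e. a plain list with one element per species.
raw <- unclass(res)
names(raw)

# Each element is a 2 x 4 matrix of mode list: rows Mean and SD,
# one column per variable.
dim(raw$setosa)
raw$setosa[["Mean", "Sepal.Length"]]
```

Because sapply() receives a list from every call to meansd(), each per-species block is a list-matrix rather than a numeric matrix, which is why a custom print method is helpful.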

5. To print the results in a compact, neat tabular form, a print method for myclass is defined as follows. This function column-binds all the results, performs the required string manipulations and finally prints the table. The code of the print.myclass() function is given below:

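    # Print method for class "myclass": a header line with the group names,
    # followed by one table with a Mean and an SD column per group.
    print.myclass <- function(x, ...) {
      nm <- names(x)
      old <- options(digits = 4)   # limit the printed digits
      on.exit(options(old))        # restore the user's setting afterwards
      # Bind the transposed blocks of all groups side by side
      finres <- vector()
      for (i in seq_along(x)) {
        finres <- cbind(finres, t(x[[i]]))
      }
      # Header line: a leading indent sized by the longest group name
      cat(" ")
      tsp <- max(nchar(names(x)))
      isp <- paste(rep(" ", tsp), collapse = "")
      cat(isp)
      for (i in seq_along(nm)) {
        tt <- nchar(nm[i])
        esp <- if (tt < 12) 12 - tt else 1   # pad short names with spaces
        rsp <- paste(rep(" ", esp), collapse = "")
        nm[i] <- paste(nm[i], rsp)
        cat(nm[i])
      }
      cat("\n")
      print(finres)
      invisible(x)
    }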

6. These results can be printed by just typing the name of the res object created in step 2:

>res

The code for using the basstat() function and the results obtained are given below:

**Some more Results**

i). Descriptive statistics of the six variables mpg, disp, hp, drat, wt and qsec for the factor cyl, with the levels/groups 4, 6 and 8 cylinders, of the dataset mtcars (which ships with base R in the datasets package, not MASS)

res1<-basstat(mtcars[,c(1,3,4,5,6,7)],mtcars[,2])

res1

ii). Descriptive statistics of the two variables Prewt and Postwt for the groups CBT, Cont and FT of the dataset anorexia of the MASS package

iii). Descriptive statistics of the three variables Price, MPG.city and MPG.highway for the groups Compact, Large, Midsize, Small, Sporty and Van of the dataset Cars93 of the MASS package
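The calls for examples ii) and iii) are not listed above; they might look like the following sketch. It assumes that the grouping columns are Treat in anorexia and Type in Cars93, and the object names res2 and res3 are chosen here for illustration. The article's functions are restated compactly so that the block runs on its own.

```r
library(MASS)   # provides the anorexia and Cars93 datasets

# Compact restatements of the article's functions so the block runs alone.
meansd  <- function(x) list(Mean = mean(x), SD = sd(x))
result  <- function(x) sapply(x, meansd)
basstat <- function(df, f) {
  res <- lapply(split(df, f), result)
  class(res) <- "myclass"
  res
}

# ii) Prewt and Postwt by treatment group (assumed grouping column: Treat)
res2 <- basstat(anorexia[, c("Prewt", "Postwt")], anorexia$Treat)

# iii) Price, MPG.city and MPG.highway by car type (assumed column: Type)
res3 <- basstat(Cars93[, c("Price", "MPG.city", "MPG.highway")], Cars93$Type)

names(res2)
names(res3)
```

With print.myclass() defined as in step 5, typing res2 or res3 at the prompt prints the corresponding table.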

**Conclusions**

The tabulated output obtained in all the above examples is similar to that obtained from the data.table and tables packages. We could achieve this by using the object-oriented concepts of the R language. In this exercise, I have obtained the mean and standard deviation of a number of variables group-wise. It is also possible to modify the program to obtain other statistics, such as the minimum, maximum, median and 1st and 3rd quartiles, for a number of variables group-wise.
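One way such a modification might look is sketched below. The function name morestat() is hypothetical and is not part of the original program; it simply takes the place of meansd() in the same split/lapply/sapply pattern.

```r
# Hypothetical replacement for meansd(): five order statistics per vector.
morestat <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  list(Min = min(x), Q1 = q[1], Median = q[2], Q3 = q[3], Max = max(x))
}

# Same pattern as basstat(): split by group, then summarise each column.
out <- lapply(split(iris[, 1:4], iris[, 5]),
              function(d) sapply(d, morestat))

# One 5 x 4 block per species; show the block for setosa.
out$setosa
```

A print method for these results would need a wider header than print.myclass(), since each group now contributes five columns instead of two.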

1. RcmdrPlugin.bca: Business and Customer Analytics

2. RcmdrPlugin.depthTools: implements different statistical tools for the description and analysis of gene expression data based on the concept of data depth

3. RcmdrPlugin.DoE: Design of Experiments

4. RcmdrPlugin.EBM: Evidence Based Medicine plug-in

5. RcmdrPlugin.epack: plug-in for time series

6. RcmdrPlugin.EZR: adds a variety of statistical functions, including survival analyses, ROC analyses, meta-analyses, sample size calculation, and so on

7. RcmdrPlugin.FactoMineR: dedicated to multivariate data analysis

8. RcmdrPlugin.KMggplot2: Kaplan-Meier plots and other plots using the ggplot2 package

9. RcmdrPlugin.NMBU: extends linear models and provides new extended interfaces for PCA, PLS, LDA, QDA, clustering of variables, tests, plots etc.

10. RcmdrPlugin.sampling: provides tools for calculating sample sizes and selecting samples using various sampling designs

11. RcmdrPlugin.survival: interface to the survival package, with dialogs for Cox models, parametric survival regression models, estimation of survival curves etc.

12. RcmdrPlugin.temis: provides an integrated solution to perform a series of text mining tasks


The following is the CSV file containing the patients' treatment data.

1. Read the CSV file and convert it into a matrix "trt". Next, extract the four rows of the matrix and convert them into vectors.

2. Create a list from these vectors and then, using the stack() function, convert this list into a data frame "df".

3. The stack() function creates the data frame df with two columns, values and ind. The column ind is a categorical variable (a factor) and the column values contains the Hb percentage levels. Rename the column ind to "Treatments" and perform a one-way ANOVA.
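The three steps can be sketched as follows. The CSV file is not reproduced here, so the matrix is filled with simulated numbers that merely stand in for the Hb percentages; the treatment labels T1 to T4 and the number of patients per treatment are likewise assumptions made for illustration.

```r
# Simulated placeholder values: NOT the article's patient data.
# Four rows = four treatments, five hypothetical patients per treatment.
set.seed(1)
trt <- matrix(rnorm(20, mean = 12, sd = 1), nrow = 4)

# Step 1: extract the four rows of the matrix as vectors.
# Step 2: collect them in a named list and stack it into a data frame.
lst <- list(T1 = trt[1, ], T2 = trt[2, ], T3 = trt[3, ], T4 = trt[4, ])
df  <- stack(lst)     # columns: values (numeric) and ind (factor)

# Step 3: rename ind to Treatments and fit the one-way ANOVA.
names(df)[names(df) == "ind"] <- "Treatments"
fit <- aov(values ~ Treatments, data = df)
summary(fit)
```

With the real data, the matrix would instead come from as.matrix() applied to the data frame returned by read.csv().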


**1. Mean and standard deviation for all the four variables species-wise using the data.table package**

**2. Mean and standard deviation for all the four variables species-wise using the tables package**

**3. Mean and standard deviation for all the four variables species-wise using the psych package**