Making diagrams with R
Part 1: Starting with R – installation : http://msprogrammer.serviciipeweb.ro/2016/10/03/starting-with-r-as-in-sqlserver-r-server-and-vs-package/
Part 2: Starting with R – language and data http://msprogrammer.serviciipeweb.ro/2016/10/10/starting-with-r-language-and-data/
Part 3: Starting with R – basic and statistics http://msprogrammer.serviciipeweb.ro/2016/10/17/starting-with-r-basic-and-statistics/
Part 4: Making diagrams with R ( and transformation of data) : http://msprogrammer.serviciipeweb.ro/2016/10/31/making-diagrams-with-r/
Part 5: Forecasting currency exchange with R http://msprogrammer.serviciipeweb.ro/2016/11/06/forecasting-eurbnr-currency-exchange/
I have a site that list currency exchange data from National Bank of Romania ( www.infovalutar.ro ).
I was having a problem about how looks EUR data for each year that I recorded
My data was in a table structured like this
IDMoneda,Valoare,DataValorii
USD,0.8121,1998-01-05 00:00:00
USD,0.8203,1998-01-06 00:00:00
USD,0.8236,1998-01-07 00:00:00
USD,0.8269,1998-01-08 00:00:00
USD,0.836,1998-01-09 00:00:00
USD,0.842,1998-01-12 00:00:00
USD,0.8415,1998-01-13 00:00:00
USD,0.8407,1998-01-14 00:00:00
USD,0.8418,1998-01-15 00:00:00
USD,0.8423,1998-01-16 00:00:00
USD,0.8418,1998-01-19 00:00:00
….
EUR,1.3062,1999-01-04 00:00:00
EUR,1.3169,1999-01-05 00:00:00
EUR,1.3155,1999-01-06 00:00:00
EUR,1.3112,1999-01-07 00:00:00
EUR,1.3142,1999-01-08 00:00:00
EUR,1.3053,1999-01-11 00:00:00
EUR,1.2872,1999-01-12 00:00:00
EUR,1.3109,1999-01-13 00:00:00
EUR,1.3093,1999-01-14 00:00:00
EUR,1.3162,1999-01-15 00:00:00
EUR,1.3192,1999-01-18 00:00:00
EUR,1.3326,1999-01-19 00:00:00
….
And so on . The first column, IDMoneda, it is a ID of the currency (USD , EUR). The second, Valoare, is the Value. The third, DataValorii, is the value date( You can find all the data at https://1drv.ms/u/s!Aordxu1LWKDZgf1i0-y8cDIDoKNx2w as a 7z file)
The task that I want is to have a diagram with the values for each year for EUR , in order to figure how it is goinf.
I have started small, with importing data:
BNR <- read.csv(file= "e:\\bnr.csv")
Then I have display the data
head(bnr,12)
Ok. So far so good. But I need the data in this form
EUR, DataValorii ( DateValue)
1.3062,1999-01-04 00:00:00
1.3169,1999-01-05 00:00:00
and EUR is on the row, not on the column. In SqlServer it is called PIVOT(https://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx)
In R I have discovered the following link http://garrettgman.github.io/tidying/ and a library:
install.packages(‘tidyr’)
library(tidyr)
And then I have executed
bnr1<-spread(BNR, IDMoneda, Valoare)
head(bnr1,12)
and voila – it is pivoting over the IDMoneda column
DataValorii AED ATS AUD BEF ….
I want to exclude NA values ( EUR was not from the first date, so some rows with EUR are nulls)
eur <-na.omit( data.frame(bnr1$EUR, bnr1$DataValorii))
And want to put better names of the columns
colnames(eur)[1] <- "EUR"
colnames(eur)[2] <- "Data"
Now we can see
head(eur,10)
And now in dataframe eur I have
EUR Data
256 1.3062 1999-01-04 00:00:00
257 1.3169 1999-01-05 00:00:00
258 1.3155 1999-01-06 00:00:00
259 1.3112 1999-01-07 00:00:00
260 1.3142 1999-01-08 00:00:00
261 1.3053 1999-01-11 00:00:00
262 1.2872 1999-01-12 00:00:00
263 1.3109 1999-01-13 00:00:00
264 1.3093 1999-01-14 00:00:00
265 1.3162 1999-01-15 00:00:00
Now I want to transform second column into a date ( we can look with str(eur) to the structure of the dataframe)
eur$RealDate<- as.Date(eur$Data)
And add a year column
eur$Year<-as.numeric(format(eur$RealDate,’%Y’))
Now
head(eur, 10)
looks like
EUR Data RealDate Year
256 1.3062 1999-01-04 00:00:00 1999-01-04 1999
257 1.3169 1999-01-05 00:00:00 1999-01-05 1999
258 1.3155 1999-01-06 00:00:00 1999-01-06 1999
259 1.3112 1999-01-07 00:00:00 1999-01-07 1999
260 1.3142 1999-01-08 00:00:00 1999-01-08 1999
261 1.3053 1999-01-11 00:00:00 1999-01-11 1999
262 1.2872 1999-01-12 00:00:00 1999-01-12 1999
263 1.3109 1999-01-13 00:00:00 1999-01-13 1999
264 1.3093 1999-01-14 00:00:00 1999-01-14 1999
265 1.3162 1999-01-15 00:00:00 1999-01-15 1999
And the structure
str(eur)
looks like
‘data.frame’: 3834 obs. of 4 variables:
$ EUR : num 1.31 1.32 1.32 1.31 1.31 …
$ Data : Factor w/ 4119 levels "1998-01-05 00:00:00",..: 256 257 258 259 260 261 262 263 264 265 …
$ RealDate: Date, format: "1999-01-04" "1999-01-05" "1999-01-06" "1999-01-07" …
$ Year : num 1999 1999 1999 1999 1999 …
– attr(*, "na.action")=Class ‘omit’ Named int [1:285] 1 2 3 4 5 6 7 8 9 10 …
.. ..- attr(*, "names")= chr [1:285] "1" "2" "3" "4" …
Now we want to filter the data based on the column Year, for example, for this year:
install.packages("dplyr")
library("dplyr")
thisYear= as.numeric(format(Sys.Date(),’%Y’))
eurThisYear <-filter(eur,Year == thisYear)
head(eurThisYear,10)
And the result is
EUR Data RealDate Year
1 4.4629 2016-03-09 00:00:00 2016-03-09 2016
2 4.4624 2016-03-11 00:00:00 2016-03-11 2016
3 4.4705 2016-03-15 00:00:00 2016-03-15 2016
4 4.4775 2016-03-16 00:00:00 2016-03-16 2016
5 4.4765 2016-03-17 00:00:00 2016-03-17 2016
6 4.4694 2016-03-22 00:00:00 2016-03-22 2016
7 4.4641 2016-03-23 00:00:00 2016-03-23 2016
8 4.4655 2016-03-24 00:00:00 2016-03-24 2016
9 4.4639 2016-03-25 00:00:00 2016-03-25 2016
10 4.4619 2016-03-28 00:00:00 2016-03-28 2016
Now we want to extract the distinct years from column YEAR from dataframe eur and for each year generate a plot
install.packages(‘foreach’)
library(‘foreach’)
foreach(n = unique(eur$Year)) %do% {
fileName= paste(n,"BNR_img.jpg",sep="_")
eurThisYear <-filter(eur,Year == n)
plot(eurThisYear$RealDate, eurThisYear$EUR ,main=paste("EUR in year",n,sep=" "), sub="National Bank of Romania", xlab="Months", ylab="Value 1 EUR in RON")
dev.copy(jpeg,filename=fileName)
dev.off()
}
The results can be found here https://1drv.ms/f/s!Aordxu1LWKDZgf1h_iMeIEbEWhgNtQ
or here: https://1drv.ms/a/s!Aordxu1LWKDZgf13J4aeGHLEo2sIvQ
Total time spent (without knowing R, but just searching help) : 1 hour.
R is a wonderfull data scripting !
Leave a Reply