Category: r

Pivot / Transform rows from one column into multiple columns with R in SqlServer

Let’s say we have a table  [CurrencyValues] with the following datas

CurrencyID CurrencyValue ValueDate
CAD 2.9929 9/27/2016 0:00
EUR 4.4495 9/27/2016 0:00
GBP 5.1205 9/27/2016 0:00
USD 3.9567 9/27/2016 0:00
CAD 3.0006 9/26/2016 0:00
EUR 4.4504 9/26/2016 0:00
GBP 5.1213 9/26/2016 0:00
USD 3.9589 9/26/2016 0:00
CAD 3.0397 9/23/2016 0:00
EUR 4.447 9/23/2016 0:00

We want to transform the CurrencyID into columns( i.e. a column with CAD, other with EUR, other with GBP and so on) .

We can utilize R from SqlServer (https://msdn.microsoft.com/en-us/library/mt604885.aspx )

We have 2 prerequisites;

1. R is running inside SqlServer

2. The library tidyR is downloaded

( see the final of this post to verify)

Then we execute the following script:

 

 

execute sp_execute_exterhttp://msprogrammer.serviciipeweb.ro/wp-admin/widgets.phpnal_script
@language = N'R'
, @script = N'
library(tidyr);
model<-spread(InputDataSet, CurrencyID, CurrencyValue);
OutputDataSet <- model'
, @input_data_1 = N'select   CurrencyID, CurrencyValue, ValueDate  from CurrencyValues '

WITH RESULT SETS undefined

The output is this:

(No column name) (No column name) (No column name) (No column name) (No column name)
00:00.0 3.0397 4.447 NULL NULL
00:00.0 3.0006 4.4504 5.1213 3.9589
00:00.0 2.9929 4.4495 5.1205 3.9567

The problem is that we have columns without names (spread returns data frame columns names, but SqlServer does not … yet, I hope)

So we add the column names:

execute sp_execute_external_script
@language = N'R'
, @script = N'
library(tidyr);
model<-spread(InputDataSet, CurrencyID, CurrencyValue);
cols<-paste(colnames(model),collapse = ",");
model<-cbind(ColumnNames = cols, model);

OutputDataSet <- model'
, @input_data_1 = N'select   CurrencyID, CurrencyValue, ValueDate  from CurrencyValues '

WITH RESULT SETS undefined

and  the result is slightly better:

(No column name) (No column name) (No column name) (No column name) (No column name) (No column name)
ValueDate,CAD,EUR,GBP,USD 00:00.0 3.0397 4.447 NULL NULL
ValueDate,CAD,EUR,GBP,USD 00:00.0 3.0006 4.4504 5.1213 3.9589
ValueDate,CAD,EUR,GBP,USD 00:00.0 2.9929 4.4495 5.1205 3.9567

What I want? Sql Server should return column names in data frame by default …

And that is all  –  execute sp_execute_external_script have the possibility of parameters, so you can make general.

End!

——————————————————————-

Note 1: R is running inside SqlServer

Verify

exec sp_execute_external_script  
  @language =N'R',    
  @script=N'OutputDataSet<-InputDataSet',      
  @input_data_1 =N'select 1 as hello'    
  with result sets (([hello] int not null));    
go

If it gives error , then follow https://msdn.microsoft.com/en-us/library/mt696069.aspx  – basic is

Exec sp_configure  'external scripts enabled', 1  
Reconfigure  with  override

and restart SqlServer

 

Note 2: Installing package tidyr

Follow https://msdn.microsoft.com/en-us/library/mt591989.aspx : basic is starting R with admin rights and executing

lib.SQL <- "C:\\Program Files\\Microsoft SQL Server\\MSSQL13.MSSQLSERVER\\R_SERVICES\\library"
install.packages("tidyr", lib = lib.SQL)

and that will be all ( ok, you may find some additional libraries to install)

Forecasting EUR–BNR currency exchange

Part 1: Starting with R – installation : http://msprogrammer.serviciipeweb.ro/2016/10/03/starting-with-r-as-in-sqlserver-r-server-and-vs-package/

Part 2: Starting with R – language and data http://msprogrammer.serviciipeweb.ro/2016/10/10/starting-with-r-language-and-data/

Part 3: Starting with R – basic and statistics http://msprogrammer.serviciipeweb.ro/2016/10/17/starting-with-r-basic-and-statistics/

Part 4: Making diagrams with R ( and transformation of data) : http://msprogrammer.serviciipeweb.ro/2016/10/31/making-diagrams-with-r/

Part 5:  Forecasting currency exchange with R   http://msprogrammer.serviciipeweb.ro/2016/11/06/forecasting-eurbnr-currency-exchange/

Let’s suppose that we have data from previous post

The form of the data was:

head(EUR,10)

      EUR                Data   RealDate Year
1  4.4629 2016-03-09 00:00:00 2016-03-09 2016
2  4.4624 2016-03-11 00:00:00 2016-03-11 2016
3  4.4705 2016-03-15 00:00:00 2016-03-15 2016
4  4.4775 2016-03-16 00:00:00 2016-03-16 2016
5  4.4765 2016-03-17 00:00:00 2016-03-17 2016
6  4.4694 2016-03-22 00:00:00 2016-03-22 2016
7  4.4641 2016-03-23 00:00:00 2016-03-23 2016
8  4.4655 2016-03-24 00:00:00 2016-03-24 2016
9  4.4639 2016-03-25 00:00:00 2016-03-25 2016
10 4.4619 2016-03-28 00:00:00 2016-03-28 2016

After reading materials

http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_intro.pdf

http://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html

https://www.otexts.org/fpp/8/7

http://stats.stackexchange.com/questions/14742/auto-arima-with-daily-data-how-to-capture-seasonality-periodicity

https://www.r-bloggers.com/time-series-analysis-using-r-forecast-package/

http://www.statmethods.net/advstats/timeseries.html

and understanding that time series is not for beginners, I decide to find R packages that can do the job for me “automatically” . I mean, I give the pervious data –give me the forecast

So this link proved to be useful: http://blog.revolutionanalytics.com/2013/06/learning-time-series-with-r.html 

So this is the code  to importing forecast package( to forecast) and xts – extended time series – to can specify dates .

install.packages("forecast")
install.packages("xts")
library("forecast")
library("xts")

Then the real code:

myts <- xts(eur$EUR, eur$RealDate)

To see the data

head(myts,3)

and the result:

             [,1]
1999-01-04 1.3062
1999-01-05 1.3169
1999-01-06 1.3155

Data from backwards

tail(myts,3)

and the result

            [,1]
2016-09-23 4.4470
2016-09-26 4.4504
2016-09-27 4.4495

Now to the forecasting: tried to generate a model with auto.arima
fit<- auto.arima(myts)

auto.arima result was a warning also

Warning message:
In auto.arima(myts) :
  Unable to fit final model using maximum likelihood. AIC value approximated

So I tried

fit <- ets(myts)

that worked without warnings.

So let’s try a forecast for 5 days:

forecast(fit,5)

and the result

   Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
3835       4.449745 4.428025 4.471464 4.416527 4.482962
3836       4.449989 4.419210 4.480769 4.402916 4.497062
3837       4.450234 4.412485 4.487983 4.392502 4.507966
3838       4.450478 4.406836 4.494120 4.383734 4.517223
3839       4.450723 4.401874 4.499572 4.376014 4.525431

( below the arima result

     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
3835       4.449287 4.427748 4.470825 4.416347 4.482226
3836       4.448930 4.416600 4.481259 4.399485 4.498374
3837       4.448847 4.409238 4.488455 4.388270 4.509423
3838       4.448763 4.403960 4.493565 4.380244 4.517282
3839       4.448666 4.399278 4.498054 4.373133 4.524198

)

And this is the situation

RealValues

(RV)

ETS

Dif ETS

(ETS-RV)/RV*100

arima

Dif Arima

(A-RV)/RV*100

4.4493

4.449745

0.0100%

4.449287

0.0003%

4.4514

4.449989

0.0317%

4.448930

0.0555%

4.4523

4.450234

0.0464%

4.448847

0.0776%

4.4484

4.450478

0.0467%

4.448763

0.0082%

4.4576

4.450723

0.1543%

4.448666

0.2004%

Apparently , ETS is pretty ok ( with the exception of first result) 

Making diagrams with R

Part 1: Starting with R – installation : http://msprogrammer.serviciipeweb.ro/2016/10/03/starting-with-r-as-in-sqlserver-r-server-and-vs-package/

Part 2: Starting with R – language and data http://msprogrammer.serviciipeweb.ro/2016/10/10/starting-with-r-language-and-data/

Part 3: Starting with R – basic and statistics http://msprogrammer.serviciipeweb.ro/2016/10/17/starting-with-r-basic-and-statistics/

Part 4: Making diagrams with R ( and transformation of data) : http://msprogrammer.serviciipeweb.ro/2016/10/31/making-diagrams-with-r/

Part 5:  Forecasting currency exchange with R   http://msprogrammer.serviciipeweb.ro/2016/11/06/forecasting-eurbnr-currency-exchange/

I have a site that list currency exchange data from National Bank of Romania ( www.infovalutar.ro ).

I was having a problem about how looks EUR data for each year that I recorded

My data was in a table structured like this

IDMoneda,Valoare,DataValorii

USD,0.8121,1998-01-05 00:00:00
USD,0.8203,1998-01-06 00:00:00
USD,0.8236,1998-01-07 00:00:00
USD,0.8269,1998-01-08 00:00:00
USD,0.836,1998-01-09 00:00:00
USD,0.842,1998-01-12 00:00:00
USD,0.8415,1998-01-13 00:00:00
USD,0.8407,1998-01-14 00:00:00
USD,0.8418,1998-01-15 00:00:00
USD,0.8423,1998-01-16 00:00:00
USD,0.8418,1998-01-19 00:00:00

….

EUR,1.3062,1999-01-04 00:00:00
EUR,1.3169,1999-01-05 00:00:00
EUR,1.3155,1999-01-06 00:00:00
EUR,1.3112,1999-01-07 00:00:00
EUR,1.3142,1999-01-08 00:00:00
EUR,1.3053,1999-01-11 00:00:00
EUR,1.2872,1999-01-12 00:00:00
EUR,1.3109,1999-01-13 00:00:00
EUR,1.3093,1999-01-14 00:00:00
EUR,1.3162,1999-01-15 00:00:00
EUR,1.3192,1999-01-18 00:00:00
EUR,1.3326,1999-01-19 00:00:00

….

And so on . The first column, IDMoneda, it is a ID of the currency (USD , EUR). The second, Valoare, is the Value. The third, DataValorii, is the value date( You can find all the data at https://1drv.ms/u/s!Aordxu1LWKDZgf1i0-y8cDIDoKNx2w  as a 7z file)

The task that I want is to have a diagram with the values for each year for EUR , in order to figure how it is goinf.

I have started small, with importing data:

BNR <- read.csv(file= "e:\\bnr.csv")

Then I have display the data

head(bnr,12)

Ok. So far so good. But I need the data in this form

EUR, DataValorii ( DateValue)

1.3062,1999-01-04 00:00:00

1.3169,1999-01-05 00:00:00

and EUR is on the row, not on the column. In SqlServer it is called PIVOT(https://technet.microsoft.com/en-us/library/ms177410(v=sql.105).aspx)

In R I have discovered the following link http://garrettgman.github.io/tidying/ and a library:

install.packages(‘tidyr’)

library(tidyr)

And then I have executed

 bnr1<-spread(BNR, IDMoneda, Valoare)

head(bnr1,12)

and voila – it is pivoting over the IDMoneda column

DataValorii AED    ATS    AUD    BEF  ….

I want to exclude NA values ( EUR was not from the first date, so some rows with EUR are nulls)

eur <-na.omit( data.frame(bnr1$EUR, bnr1$DataValorii))

And want to put better names of the columns

colnames(eur)[1] <- "EUR"
colnames(eur)[2] <- "Data"

Now we can see

head(eur,10)

And now in dataframe eur I have

  EUR                Data  
256 1.3062 1999-01-04 00:00:00
257 1.3169 1999-01-05 00:00:00
258 1.3155 1999-01-06 00:00:00
259 1.3112 1999-01-07 00:00:00
260 1.3142 1999-01-08 00:00:00
261 1.3053 1999-01-11 00:00:00
262 1.2872 1999-01-12 00:00:00
263 1.3109 1999-01-13 00:00:00
264 1.3093 1999-01-14 00:00:00
265 1.3162 1999-01-15 00:00:00

Now I want to transform second column into a date ( we can look with str(eur) to the structure of the dataframe)

eur$RealDate<- as.Date(eur$Data)

And add a year column

eur$Year<-as.numeric(format(eur$RealDate,’%Y’))

Now

head(eur, 10)

looks like

    EUR                Data   RealDate Year
256 1.3062 1999-01-04 00:00:00 1999-01-04 1999
257 1.3169 1999-01-05 00:00:00 1999-01-05 1999
258 1.3155 1999-01-06 00:00:00 1999-01-06 1999
259 1.3112 1999-01-07 00:00:00 1999-01-07 1999
260 1.3142 1999-01-08 00:00:00 1999-01-08 1999
261 1.3053 1999-01-11 00:00:00 1999-01-11 1999
262 1.2872 1999-01-12 00:00:00 1999-01-12 1999
263 1.3109 1999-01-13 00:00:00 1999-01-13 1999
264 1.3093 1999-01-14 00:00:00 1999-01-14 1999
265 1.3162 1999-01-15 00:00:00 1999-01-15 1999

And the structure

str(eur)

looks like

‘data.frame’:   3834 obs. of  4 variables:
$ EUR     : num  1.31 1.32 1.32 1.31 1.31 …
  $ Data    : Factor w/ 4119 levels "1998-01-05 00:00:00",..: 256 257 258 259 260 261 262 263 264 265 …
  $ RealDate: Date, format: "1999-01-04" "1999-01-05" "1999-01-06" "1999-01-07" …
  $ Year    : num  1999 1999 1999 1999 1999 …
  – attr(*, "na.action")=Class ‘omit’  Named int [1:285] 1 2 3 4 5 6 7 8 9 10 …
  .. ..- attr(*, "names")= chr [1:285] "1" "2" "3" "4" …

Now we want to filter the data based on the column Year, for example, for this year:

install.packages("dplyr")
library("dplyr")

thisYear= as.numeric(format(Sys.Date(),’%Y’))

eurThisYear <-filter(eur,Year == thisYear)

head(eurThisYear,10)

And the result is

      EUR                Data   RealDate Year
1  4.4629 2016-03-09 00:00:00 2016-03-09 2016
2  4.4624 2016-03-11 00:00:00 2016-03-11 2016
3  4.4705 2016-03-15 00:00:00 2016-03-15 2016
4  4.4775 2016-03-16 00:00:00 2016-03-16 2016
5  4.4765 2016-03-17 00:00:00 2016-03-17 2016
6  4.4694 2016-03-22 00:00:00 2016-03-22 2016
7  4.4641 2016-03-23 00:00:00 2016-03-23 2016
8  4.4655 2016-03-24 00:00:00 2016-03-24 2016
9  4.4639 2016-03-25 00:00:00 2016-03-25 2016
10 4.4619 2016-03-28 00:00:00 2016-03-28 2016

Now we want to extract the distinct years from column YEAR from dataframe eur and for each year generate a plot

install.packages(‘foreach’)
library(‘foreach’)

foreach(n = unique(eur$Year)) %do% {
  fileName= paste(n,"BNR_img.jpg",sep="_")
  eurThisYear <-filter(eur,Year == n)
plot(eurThisYear$RealDate, eurThisYear$EUR ,main=paste("EUR in year",n,sep=" "), sub="National Bank of Romania",       xlab="Months", ylab="Value 1 EUR in RON")
  dev.copy(jpeg,filename=fileName)
  dev.off()
}

The results can be found here https://1drv.ms/f/s!Aordxu1LWKDZgf1h_iMeIEbEWhgNtQ 

or here: https://1drv.ms/a/s!Aordxu1LWKDZgf13J4aeGHLEo2sIvQ

Total time spent (without knowing R, but just searching help) : 1 hour.

R is a wonderfull data scripting !

Starting with R – language and data

Part 1: Starting with R – installation : http://msprogrammer.serviciipeweb.ro/2016/10/03/starting-with-r-as-in-sqlserver-r-server-and-vs-package/

Part 2: Starting with R – language and data http://msprogrammer.serviciipeweb.ro/2016/10/10/starting-with-r-language-and-data/

Part 3: Starting with R – basic and statistics http://msprogrammer.serviciipeweb.ro/2016/10/17/starting-with-r-basic-and-statistics/

Part 4: Making diagrams with R ( and transformation of data) : http://msprogrammer.serviciipeweb.ro/2016/10/31/making-diagrams-with-r/

Part 5:  Forecasting currency exchange with R   http://msprogrammer.serviciipeweb.ro/2016/11/06/forecasting-eurbnr-currency-exchange/

The http://tryr.codeschool.com/ also features a gentle introduction to R – if you do not know programming  .

As in other programming languages, you can store values into a variable to access it later. Type x <- 42 to store a value in x.

x <- 42

I have typed :

> x<-7

And site answers by:

`x` should be set to the number `42`. Try again!  # this is the site answer ….

> x<-42 # made by me

( and site goes to next exercise – although you can trick to strings …)

The next  best tutorial ( comprised on six parts) I have found here:

http://www.computerworld.com/article/2497164/business-intelligence/business-intelligence-beginner-s-guide-to-r-get-your-data-into-r.html

( did you know about data() command ?)

Anyway, those are the six parts

After that , if you have a Pluralsight subscription, this course is worth looking : http://www.pluralsight.com/courses/r-understanding-machine-learning  -  it gives you a short introduction to MachineLearning with R

Starting with R (as in SqlServer R server and VS package)

Part 1: Starting with R – installation : http://msprogrammer.serviciipeweb.ro/2016/10/03/starting-with-r-as-in-sqlserver-r-server-and-vs-package/

Part 2: Starting with R – language and data http://msprogrammer.serviciipeweb.ro/2016/10/10/starting-with-r-language-and-data/

Part 3: Starting with R – basic and statistics http://msprogrammer.serviciipeweb.ro/2016/10/17/starting-with-r-basic-and-statistics/

Part 4: Making diagrams with R ( and transformation of data) : http://msprogrammer.serviciipeweb.ro/2016/10/31/making-diagrams-with-r/

Part 5:  Forecasting currency exchange with R   http://msprogrammer.serviciipeweb.ro/2016/11/06/forecasting-eurbnr-currency-exchange/

First, start with installing Sql Server 2016 Developer ( free from https://www.microsoft.com/en-us/cloud-platform/sql-server-editions-developers )

Ensure that R is selected when installing.

Start SSMS ( Sql Server Management Studio) and ensure that R is configured and running ( https://msdn.microsoft.com/en-us/library/mt590884.aspx  or more detailed https://msdn.microsoft.com/en-us/library/mt696069.aspx )

Then install R Client from http://aka.ms/rclient/download ( see https://msdn.microsoft.com/en-us/library/mt696067.aspx ) . It features also an package for Visual Studio

Once you install R Client, start RGUI ( press Windows key and type RGUI ) .

In the window of RCLient, type getwd() ( R is case sensitive – does not work GetWD() )

( The same thing you can do in Visual Studio, from R Tools =>Windows=>R interactive)

Then you can follow the tutorial from https://msdn.microsoft.com/microsoft-r/scaler-getting-started# -  it is ok and easy to  reproduce.