Pivot / Transform rows from one column into multiple columns with R in SqlServer
Let’s say we have a table [CurrencyValues] with the following datas
CurrencyID | CurrencyValue | ValueDate |
CAD | 2.9929 | 9/27/2016 0:00 |
EUR | 4.4495 | 9/27/2016 0:00 |
GBP | 5.1205 | 9/27/2016 0:00 |
USD | 3.9567 | 9/27/2016 0:00 |
CAD | 3.0006 | 9/26/2016 0:00 |
EUR | 4.4504 | 9/26/2016 0:00 |
GBP | 5.1213 | 9/26/2016 0:00 |
USD | 3.9589 | 9/26/2016 0:00 |
CAD | 3.0397 | 9/23/2016 0:00 |
EUR | 4.447 | 9/23/2016 0:00 |
We want to transform the CurrencyID into columns( i.e. a column with CAD, other with EUR, other with GBP and so on) .
We can utilize R from SqlServer (https://msdn.microsoft.com/en-us/library/mt604885.aspx )
We have 2 prerequisites;
1. R is running inside SqlServer
2. The library tidyR is downloaded
( see the final of this post to verify)
Then we execute the following script:
1 2 3 4 5 6 7 8 9 | @language = N 'R' , @script = N ' library(tidyr); model<-spread(InputDataSet, CurrencyID, CurrencyValue); OutputDataSet <- model' , @input_data_1 = N 'select CurrencyID, CurrencyValue, ValueDate from CurrencyValues ' WITH RESULT SETS undefined |
The output is this:
(No column name) | (No column name) | (No column name) | (No column name) | (No column name) |
00:00.0 | 3.0397 | 4.447 | NULL | NULL |
00:00.0 | 3.0006 | 4.4504 | 5.1213 | 3.9589 |
00:00.0 | 2.9929 | 4.4495 | 5.1205 | 3.9567 |
The problem is that we have columns without names (spread returns data frame columns names, but SqlServer does not … yet, I hope)
So we add the column names:
01 02 03 04 05 06 07 08 09 10 11 12 | execute sp_execute_external_script @language = N 'R' , @script = N ' library(tidyr); model<-spread(InputDataSet, CurrencyID, CurrencyValue); cols<-paste(colnames(model),collapse = ","); model<-cbind(ColumnNames = cols, model); OutputDataSet <- model' , @input_data_1 = N 'select CurrencyID, CurrencyValue, ValueDate from CurrencyValues ' WITH RESULT SETS undefined |
and the result is slightly better:
(No column name) | (No column name) | (No column name) | (No column name) | (No column name) | (No column name) |
ValueDate,CAD,EUR,GBP,USD | 00:00.0 | 3.0397 | 4.447 | NULL | NULL |
ValueDate,CAD,EUR,GBP,USD | 00:00.0 | 3.0006 | 4.4504 | 5.1213 | 3.9589 |
ValueDate,CAD,EUR,GBP,USD | 00:00.0 | 2.9929 | 4.4495 | 5.1205 | 3.9567 |
What I want? Sql Server should return column names in data frame by default …
And that is all – execute sp_execute_external_script have the possibility of parameters, so you can make general.
End!
——————————————————————-
Note 1: R is running inside SqlServer
Verify
1 2 3 4 5 6 | exec sp_execute_external_script @language =N 'R' , @script=N 'OutputDataSet<-InputDataSet' , @input_data_1 =N 'select 1 as hello' with result sets (([hello] int not null )); go |
If it gives error , then follow https://msdn.microsoft.com/en-us/library/mt696069.aspx – basic is
1 2 | Exec sp_configure 'external scripts enabled' , 1 Reconfigure with override |
and restart SqlServer
Note 2: Installing package tidyr
Follow https://msdn.microsoft.com/en-us/library/mt591989.aspx : basic is starting R with admin rights and executing
lib.SQL <- "C:\\Program Files\\Microsoft SQL Server\\MSSQL13.MSSQLSERVER\\R_SERVICES\\library"
install.packages("tidyr", lib = lib.SQL)
and that will be all ( ok, you may find some additional libraries to install)
Leave a Reply