Pivot / Transform rows from one column into multiple columns with R in SqlServer
Let’s say we have a table [CurrencyValues] with the following datas
CurrencyID | CurrencyValue | ValueDate |
CAD | 2.9929 | 9/27/2016 0:00 |
EUR | 4.4495 | 9/27/2016 0:00 |
GBP | 5.1205 | 9/27/2016 0:00 |
USD | 3.9567 | 9/27/2016 0:00 |
CAD | 3.0006 | 9/26/2016 0:00 |
EUR | 4.4504 | 9/26/2016 0:00 |
GBP | 5.1213 | 9/26/2016 0:00 |
USD | 3.9589 | 9/26/2016 0:00 |
CAD | 3.0397 | 9/23/2016 0:00 |
EUR | 4.447 | 9/23/2016 0:00 |
We want to transform the CurrencyID into columns( i.e. a column with CAD, other with EUR, other with GBP and so on) .
We can utilize R from SqlServer (https://msdn.microsoft.com/en-us/library/mt604885.aspx )
We have 2 prerequisites;
1. R is running inside SqlServer
2. The library tidyR is downloaded
( see the final of this post to verify)
Then we execute the following script:
execute sp_execute_exterhttp://msprogrammer.serviciipeweb.ro/wp-admin/widgets.phpnal_script @language = N'R' , @script = N' library(tidyr); model<-spread(InputDataSet, CurrencyID, CurrencyValue); OutputDataSet <- model' , @input_data_1 = N'select CurrencyID, CurrencyValue, ValueDate from CurrencyValues ' WITH RESULT SETS undefined
The output is this:
(No column name) | (No column name) | (No column name) | (No column name) | (No column name) |
00:00.0 | 3.0397 | 4.447 | NULL | NULL |
00:00.0 | 3.0006 | 4.4504 | 5.1213 | 3.9589 |
00:00.0 | 2.9929 | 4.4495 | 5.1205 | 3.9567 |
The problem is that we have columns without names (spread returns data frame columns names, but SqlServer does not … yet, I hope)
So we add the column names:
execute sp_execute_external_script @language = N'R' , @script = N' library(tidyr); model<-spread(InputDataSet, CurrencyID, CurrencyValue); cols<-paste(colnames(model),collapse = ","); model<-cbind(ColumnNames = cols, model); OutputDataSet <- model' , @input_data_1 = N'select CurrencyID, CurrencyValue, ValueDate from CurrencyValues ' WITH RESULT SETS undefined
and the result is slightly better:
(No column name) | (No column name) | (No column name) | (No column name) | (No column name) | (No column name) |
ValueDate,CAD,EUR,GBP,USD | 00:00.0 | 3.0397 | 4.447 | NULL | NULL |
ValueDate,CAD,EUR,GBP,USD | 00:00.0 | 3.0006 | 4.4504 | 5.1213 | 3.9589 |
ValueDate,CAD,EUR,GBP,USD | 00:00.0 | 2.9929 | 4.4495 | 5.1205 | 3.9567 |
What I want? Sql Server should return column names in data frame by default …
And that is all – execute sp_execute_external_script have the possibility of parameters, so you can make general.
End!
——————————————————————-
Note 1: R is running inside SqlServer
Verify
exec sp_execute_external_script @language =N'R', @script=N'OutputDataSet<-InputDataSet', @input_data_1 =N'select 1 as hello' with result sets (([hello] int not null)); go
If it gives error , then follow https://msdn.microsoft.com/en-us/library/mt696069.aspx – basic is
Exec sp_configure 'external scripts enabled', 1 Reconfigure with override
and restart SqlServer
Note 2: Installing package tidyr
Follow https://msdn.microsoft.com/en-us/library/mt591989.aspx : basic is starting R with admin rights and executing
lib.SQL <- "C:\\Program Files\\Microsoft SQL Server\\MSSQL13.MSSQLSERVER\\R_SERVICES\\library"
install.packages("tidyr", lib = lib.SQL)
and that will be all ( ok, you may find some additional libraries to install)
Leave a Reply