Let’s say we have a table [CurrencyValues] with the following datas
CurrencyID | CurrencyValue | ValueDate |
CAD | 2.9929 | 9/27/2016 0:00 |
EUR | 4.4495 | 9/27/2016 0:00 |
GBP | 5.1205 | 9/27/2016 0:00 |
USD | 3.9567 | 9/27/2016 0:00 |
CAD | 3.0006 | 9/26/2016 0:00 |
EUR | 4.4504 | 9/26/2016 0:00 |
GBP | 5.1213 | 9/26/2016 0:00 |
USD | 3.9589 | 9/26/2016 0:00 |
CAD | 3.0397 | 9/23/2016 0:00 |
EUR | 4.447 | 9/23/2016 0:00 |
We want to transform the CurrencyID into columns( i.e. a column with CAD,other with EUR,other with GBP and so on) .
We can utilize R from SqlServer (https://msdn.microsoft.com/en-us/library/mt604885.aspx )
We have 2 prerequisites;
1. R is running inside SqlServer
2. The library tidyR is downloaded
( see the final of this post to verify)
Then we execute the following script:
execute sp_execute_exterhttp://msprogrammer.serviciipeweb.ro/wp-admin/widgets.phpnal_script @language = N'R' ,@script = N' library(tidyr); model<-spread(InputDataSet,CurrencyID,CurrencyValue); OutputDataSet <- model' ,@input_data_1 = N'select CurrencyID,CurrencyValue,ValueDate from CurrencyValues ' WITH RESULT SETS undefined
The output is this:
(No column name) | (No column name) | (No column name) | (No column name) | (No column name) |
00:00.0 | 3.0397 | 4.447 | NULL | NULL |
00:00.0 | 3.0006 | 4.4504 | 5.1213 | 3.9589 |
00:00.0 | 2.9929 | 4.4495 | 5.1205 | 3.9567 |
The problem is that we have columns without names (spread returns data frame columns names,but SqlServer does not … yet,I hope)
So we add the column names:
execute sp_execute_external_script @language = N'R' ,@script = N' library(tidyr); model<-spread(InputDataSet,CurrencyID,CurrencyValue); cols<-paste(colnames(model),collapse = ","); model<-cbind(ColumnNames = cols,model); OutputDataSet <- model' ,@input_data_1 = N'select CurrencyID,CurrencyValue,ValueDate from CurrencyValues ' WITH RESULT SETS undefined
and the result is slightly better:
(No column name) | (No column name) | (No column name) | (No column name) | (No column name) | (No column name) |
ValueDate,CAD,EUR,GBP,USD | 00:00.0 | 3.0397 | 4.447 | NULL | NULL |
ValueDate,CAD,EUR,GBP,USD | 00:00.0 | 3.0006 | 4.4504 | 5.1213 | 3.9589 |
ValueDate,CAD,EUR,GBP,USD | 00:00.0 | 2.9929 | 4.4495 | 5.1205 | 3.9567 |
What I want? Sql Server should return column names in data frame by default …
And that is all – execute sp_execute_external_script have the possibility of parameters,so you can make general.
End!
——————————————————————-
Note 1: R is running inside SqlServer
Verify
exec sp_execute_external_script @language =N'R', @script=N'OutputDataSet<-InputDataSet', @input_data_1 =N'select 1 as hello' with result sets (([hello] int not null)); go
If it gives error,then follow https://msdn.microsoft.com/en-us/library/mt696069.aspx – basic is
Exec sp_configure 'external scripts enabled',1 Reconfigure with override
and restart SqlServer
Note 2: Installing package tidyr
Follow https://msdn.microsoft.com/en-us/library/mt591989.aspx : basic is starting R with admin rights and executing
lib.SQL <- "C:\\Program Files\\Microsoft SQL Server\\MSSQL13.MSSQLSERVER\\R_SERVICES\\library"
install.packages("tidyr",lib = lib.SQL)
and that will be all ( ok,you may find some additional libraries to install)
Leave a Reply