Fertility Rate in European Countries with R

This is the source, from https://notebooks.azure.com/ignatandrei/libraries/EuropeStatData – FertilityRate

Preload packages

Pacman

First install pacman, which then installs and loads the other packages in one step

In [1]:
if (!require("pacman")) install.packages("pacman")
              

Loading required package: pacman
              

Load the necessary packages

In [2]:
pacman::p_load('XML', "magrittr","ggplot2", "RCurl", "rlist", "rvest", "dplyr", "devtools","assertthat","XLConnect","tidyr")
              

Attach the packages so their functions are directly available (p_load above already attached the listed ones; stringr is loaded here in addition)

In [3]:
library(stringr)
library(magrittr)
library(XML)
library(RCurl)
library(rlist)
library(ggplot2)
library(rvest)
library(tidyr)
              

Primary data

Download fertility rate for regions / counties

In [4]:
theUrl<-"http://ec.europa.eu/eurostat/web/products-datasets/-/demo_r_frate3"
content <- read_html(theUrl)
urlsDownload <- html_nodes(x = content, xpath = '//a') %>%  # find the <a> tags
                html_attr("href") %>%                       # extract the href attribute
                .[grepl(glob2rx("*BulkDownload*"), .)]      # glob2rx turns the wildcard pattern into a regular expression

#print(urlsDownload)
assert_that(length(urlsDownload) == 1, msg = paste("urlsDownload must have length 1! actual length:", length(urlsDownload)))
urlFile<-urlsDownload[1]
print(paste("Downloading from ",urlFile))
              

TRUE

[1] "Downloading from  http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=data/demo_r_frate3.tsv.gz"
              

Now download the file from the URL found above, so we can inspect it

In [5]:
#print(getwd())
dest<-file.path(getwd(),"file.gz")
download.file(urlFile, dest, mode = "wb", cacheOK = F)  # binary mode
assert_that(file.exists(dest))
              

TRUE

Read data

In [6]:
data <- as.data.frame(read.table(gzfile(dest), fill = TRUE, header = TRUE))
keeps <- c("unit.geo.time", "X2015")
print(head(data))
              

  unit.geo.time X2015 X2014
1         NR,AL  1.67  1.79
2        NR,AL0  1.67  1.76
3         NR,AT  1.50  1.47
4        NR,AT1  1.45  1.43
5       NR,AT11  1.37  1.33
6      NR,AT111  1.39  1.39
              

Tidying the data

Split the first column

In [7]:
data <- data[keeps] %>%
    separate(unit.geo.time, c("unit", "geo"), ",")

print(head(data))
              

  unit   geo X2015
1   NR    AL  1.67
2   NR   AL0  1.67
3   NR    AT  1.50
4   NR   AT1  1.45
5   NR  AT11  1.37
6   NR AT111  1.39
              

Dictionary of metadata

Download dictionary

In [8]:
theUrlNuts<-"http://ec.europa.eu/eurostat/ramon/documents/nuts/NUTS_2013.zip"
destNuts<-file.path(getwd(),"nuts.zip")
download.file(theUrlNuts, destNuts, mode = "wb", cacheOK = F)  # binary mode
assert_that(file.exists(destNuts))
              

TRUE

Reading into memory

In [9]:
nuts3<-as.data.frame(readWorksheetFromFile(unzip(destNuts),sheet = 1))
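Note: unzip() extracts the archive and returns the path(s) of the extracted file(s), so the call above works only while the zip contains a single file. A slightly more defensive sketch (assuming the NUTS spreadsheet is the only .xls file in the archive):

files <- unzip(destNuts)                     # extract everything, get the paths back
xlsFile <- files[grepl("\\.xls", files)][1]  # keep the first Excel file
nuts3 <- as.data.frame(readWorksheetFromFile(xlsFile, sheet = 1))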
              

Tidy the data

In [10]:
keeps <- c("NUTS.CODE", "NUTS.LABEL","NUTS.LEVEL","COUNTRY.CODE")
nuts3<- nuts3[keeps]
print(head(nuts3))
              

  NUTS.CODE                                                     NUTS.LABEL NUTS.LEVEL COUNTRY.CODE
1        BE                                                BELGIQUE-BELGIË          0           BE
2       BE1 RÉGION DE BRUXELLES-CAPITALE / BRUSSELS HOOFDSTEDELIJK GEWEST           1           BE
3      BE10 Région de Bruxelles-Capitale / Brussels Hoofdstedelijk Gewest           2           BE
4     BE100        Arr. de Bruxelles-Capitale / Arr. van Brussel-Hoofdstad           3           BE
5       BE2                                                  VLAAMS GEWEST          1           BE
6      BE21                                                Prov. Antwerpen          2           BE


Merging the data with metadata

In [11]:
allData <- merge(nuts3,data,by.x="NUTS.CODE",by.y="geo")
allData$Val <- with(allData, as.numeric(as.character(X2015)))
print(head(allData))
              

  NUTS.CODE       NUTS.LABEL NUTS.LEVEL COUNTRY.CODE unit X2015  Val
1        AT       ÖSTERREICH          0           AT   NR  1.50 1.50
2       AT1    OSTÖSTERREICH          1           AT   NR  1.45 1.45
3      AT11       Burgenland          2           AT   NR  1.37 1.37
4     AT111 Mittelburgenland          3           AT   NR  1.39 1.39
5     AT112   Nordburgenland          3           AT   NR  1.36 1.36
6     AT113    Südburgenland          3           AT   NR  1.38 1.38
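Why as.numeric(as.character(...)) instead of plain as.numeric()? read.table() typically imports X2015 as a factor (Eurostat files mix numbers with flags), and converting a factor directly returns the internal level indices rather than the values. A small illustration:

f <- factor(c("1.50", "1.45"))
as.numeric(f)                # 2 1  -- the level indices, not the values
as.numeric(as.character(f))  # 1.50 1.45 -- the actual numbers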
              

Data for countries

Filtering data

In [12]:
allCountries <- allData %>%
    filter(NUTS.LEVEL == 0) %>%
    select(NUTS.CODE, Val) %>%
    as.data.frame

print(allCountries)
              

   NUTS.CODE  Val
1         AT 1.50
2         BE 1.70
3         BG 1.53
4         CY 1.33
5         CZ 1.57
6         DE 1.50
7         DK 1.70
8         EE 1.58
9         EL 1.33
10        ES 1.34
11        FI 1.65
12        FR 1.96
13        HR 1.40
14        HU 1.45
15        IE 1.94
16        IT 1.35
17        LT 1.71
18        LU 1.47
19        LV 1.71
20        MT 1.45
21        NL 1.66
22        PL 1.33
23        PT 1.30
24        RO 1.58
25        SE 1.85
26        SI 1.58
27        SK 1.41
28        UK 1.81
              

Show data in graphic form

In [13]:
sapply(allCountries, mode)
png(filename="allCountries.png", width=600, height=600)
ggplot(allCountries, aes(x=reorder(NUTS.CODE,-Val), y=Val))+geom_point()
dev.off()
              

  NUTS.CODE         Val 
"character"   "numeric" 

png: 2

Image for all countries

[Image: allCountries.png – fertility rate per country, sorted descending]

Display data for a chosen country

Choosing the country

In [14]:
countryCode <- "RO"
              

Filtering the data

Filtering data for the chosen country code

In [15]:
#head(allData)
Country <- allData %>%
    filter(COUNTRY.CODE == countryCode) %>%
    filter(NUTS.LEVEL == 3) %>%
    select(NUTS.LABEL, NUTS.CODE, Val)

Country <- as.data.frame(Country)
#print(Country)


Show data in graphic form

In [16]:
#sapply(Country, mode)
file <- paste('Country', countryCode, '.png', sep='')
print(file)
png(filename=file, width=600, height=600)
ggplot(Country, aes(x=Val, y=reorder(NUTS.LABEL,-Val))) + geom_point()
dev.off()
              

Show the graphic for the chosen country

[Image: CountryRO.png – fertility rate per NUTS 3 region]

Download large files with R and RCurl

First, read the documentation. Explore the available options with:

#listCurlOptions()
#getCurlOptionsConstants()
#getCurlOptionTypes(opts = getCurlOptionsConstants())
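For example, to check that a given option name is supported (a quick sketch):

"http.version" %in% listCurlOptions()   # should print TRUE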

Then, this is the code (pretty self-explanatory):

downloadFile <- function(url, file){
    curl <- getCurlHandle()
    # use HTTP 1.0 and print what libcurl is doing
    curlSetOpt(.opts = list(http.version = HTTP_VERSION_1_0, verbose = TRUE),
               curl = curl)

    f <- CFILE(file, mode = "wb")   # open the destination file in binary mode
    a <- curlPerform(url = url, curl = curl, writedata = f@ref, noprogress = FALSE)
    close(f)
}
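A usage sketch, reusing the Eurostat bulk-download URL (urlFile) from the notebook above:

downloadFile(urlFile, file.path(getwd(), "file.gz"))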

GitHub as a repository for a database (SQLite)

In making http://isthistaxilegal.apphb.com I needed a database. I decided on SQLite, since it needs just a file and the DLLs. (Side note: the right DLLs for SQLite and .NET Core are not the original ones, which have a dependency on the .NET Framework, but Microsoft.Data.Sqlite.) But how to deploy this file every time?

Simplest way: at startup, verify whether the file exists and, if not, download it from GitHub.

The code is:

var url ="https://raw.githubusercontent.com/ignatandrei/IsThisTaxiLegal/master/datas/taxis.sqlite3";
using (var client = new HttpClient())
{

    using (var result = await client.GetAsync(url))
    {
        if (result.IsSuccessStatusCode)
        {
            var bytes=await result.Content.ReadAsByteArrayAsync();
            File.WriteAllBytes("taxis.sqlite3",bytes);
        }

    }
}

And the idea is not bad at all: keeping the database on GitHub and downloading it when needed works well.

Interpreting PDF to database (SQLite)

In making http://isthistaxilegal.apphb.com I had to parse PDF files that contain tables, which is a daunting task. I investigated the existing software; the most used one was https://github.com/tabulapdf, yet even it did not come close. The problem was not transforming the PDF into text. The problem was transforming the tables in the PDF into CSV or something similar. One of the reasons this is difficult is stated here: https://stackoverflow.com/questions/31579922/extracting-tables-lines-and-coordinates-from-pdf-using-pdfsharp-c :

The (my) solution
Starting from a PDF reader (iTextSharp) you need to:
1. read the lines (hopefully only vertical and horizontal lines);
2. join the lines (a line of a table could be several lines, for example one per cell);
3. understand where the tables are (sometimes making some hypothesis based on your needs);
4. optionally find the text outside the tables (better to keep all the text) and insert it in paragraphs;
5. Insert text inside the cells of the table

So I tried the old automation solution: Word. Word can read PDF files. Even better, the tables are imported as real tables! So the next task is to transform the Word tables into CSV. Word macro to the rescue:

Sub ExportTables()
    Dim d As Document
    Set d = ActiveDocument
    MsgBox d.Tables.Count
    Dim t As Table
    Dim r As Row
    Dim iRow As Integer
    Dim iCell As Integer
    Dim str As String
    Dim start As Integer
    start = 1
    For Each t In d.Tables

        'str = str + t.ConvertToText(Separator:=wdSeparateByCommas).Text
        Dim nrRows As Integer
        nrRows = t.Rows.Count
        For iRow = start To nrRows

            Set r = t.Rows.Item(iRow)
            Dim nrCells As Integer
            nrCells = r.Cells.Count
            For iCell = 1 To nrCells
                ' strip the cell-end markers (CR, LF, Chr(7)) and separate the cells with |
                str = str & Replace(Replace(Replace(r.Cells.Item(iCell).Range.Text, vbCr, ""), vbLf, ""), Chr(7), "") & "|"
            Next iCell

            str = str & vbCrLf

        Next iRow

        If start = 1 Then start = 2 ' do not read headers for each table
    Next t
    ' write the accumulated rows next to the document
    Open d.Path & "\Cluj.csv" For Output As #1
    Print #1, str
    Close #1
End Sub


Then, to get from CSV into SQLite, I used sqlite3.exe with the following script (sqlite3's default separator is "|", which matches the macro's output):

.open ../taxis.sqlite3
.import ../bucuresti.csv bucuresti
.quit
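If you prefer to stay in R (the language of the notebook above), the same import can be sketched with the RSQLite package; the file name and the pipe separator are assumptions matching the macro's output:

library(RSQLite)

con <- dbConnect(SQLite(), "taxis.sqlite3")          # open (or create) the database file
bucuresti <- read.delim("bucuresti.csv", sep = "|")  # read the pipe-separated export
dbWriteTable(con, "bucuresti", bucuresti)            # create the table and load the rows
dbDisconnect(con)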

And the image shows the flow:

[Image: the flow from PDF through Word and CSV to SQLite]

License plate number from image

One feature to implement is recognizing the license plate number from an image: instead of typing the number, the user takes a picture of the plate and sends it to the application. So I decided to do this. Roxana Beloiu pointed me to https://github.com/openalpr/openalpr, which can recognize the plate number. I had some problems with the application in .NET (the latest releases are better than the source code). However, it was built for .NET 4.6, and my site runs on .NET Core. First idea: mix the two frameworks. No!

Second idea: why not port it from .NET 4.6 to .NET Core? No: too many things are not yet in .NET Core (e.g. Image).

Third idea: why not make a serverless function in Azure?

And this kept me struggling for 4 hours. The problem? The .NET alpr DLL depends on C libraries and other data files that are copied into the bin folder when you reference it from a console application, but not when you reference it from a VS Azure Functions project.

Solution? Old times to the rescue:

1. Make a console application that receives the file name and outputs the plate number.

2. Upload this console application and all the necessary DLLs with Kudu.

3. In the Azure function, receive a base64 string, convert it to bytes, write it to a file, run the console application, delete the file, and send the text back.

Not the most elegant solution, but hey, it works.
