Category: isthistaxilegal

Github as a repository for database (sqlite)

In making http://isthistaxilegal.apphb.com I neede to have a database. I decided to have sqlite, since it needs just a file and the dll’s.( Side Note: The good dll’s for sqlite and .net core are not the original ones, that have a dependency of .NET Framework – but Microsoft.Data.Sqlite ) . But how to deploy this every time?

Simplest way: Verify at start if the file exists, and , if not, download from GitHub

The code is

var url ="https://raw.githubusercontent.com/ignatandrei/IsThisTaxiLegal/master/datas/taxis.sqlite3";
using (var client = new HttpClient())
{

    using (var result = await client.GetAsync(url))
    {
        if (result.IsSuccessStatusCode)
        {
            var bytes=await result.Content.ReadAsByteArrayAsync();
            File.WriteAllBytes("taxis.sqlite3",bytes);
        }

    }
}

And the idea is not so bad  – having the database at github and download every time is a good idea…

Interpreting pdf to database(sqlite)

In making the http://isthistaxilegal.apphb.com I have had the need to parse pdf files, that contains tables inside. A more daunting task is hard. I have investigating the existing software – and the most used one was https://github.com/tabulapdf – however, not even him was closed to achieve . The problem was not transforming the pdf into text. The problem was transforming tables in PDF to csv or something similar. One of the reasons that is difficult is stated here: https://stackoverflow.com/questions/31579922/extracting-tables-lines-and-coordinates-from-pdf-using-pdfsharp-c :

The (my) solution
Starting from a PDF reader (iTextSharp) you need to:
1. read the lines (hopefully only vertical and horizontal lines);
2. join the lines (a line of a table could be several lines, for example one per cell);
3. understand where the tables are (sometimes making some hypothesis based on your needs);
4. optionally find the text outside the tables (better to keep all the text) and insert it in paragraphs;
5. Insert text inside the cells of the table

So I tried the old automation solution –Word. And word can read PDF files. More ,the tables are transferred as real tables! So – the next task to do is to transform Word tables into CSV.Word Macro to the rescue:

Sub ExportTables()
    Dim d As Document
    Set d = ActiveDocument
    MsgBox d.Tables.Count
    Dim t As Table
    Dim r As Row
    Dim iRow As Integer
    Dim iCell As Integer
    Dim str As String
    Dim start As Integer
    start = 1
    For Each t In d.Tables
        
        'str = str + t.ConvertToText(Separator:=wdSeparateByCommas).Text
        Dim nrRows As Integer
        nrRows = t.Rows.Count
        For iRow = start To nrRows
            
            Set r = t.Rows.Item(iRow)
            Dim nrCells As Integer
            nrCells = r.Cells.Count
            For iCell = 1 To nrCells
                str = str & Replace(Replace(Replace(r.Cells.Item(iCell).Range.Text, vbCr, ""), vbLf, ""), Chr(7), "") & "|"
            Next iCell
            
            str = str & vbCrLf
            
        Next iRow
            
        If start = 1 Then start = 2 ' do not read headers for each table
    Next t
    Open d.Path & "\Cluj.csv" For Output As #1
    Print #1, str
    Close #1
End Sub


Then from CSV to sqlite I have used sqlite3.exe with the following code

.open ../taxis.sqlite3
.import ../bucuresti.csv bucuresti
.quit

And the image shows the flow:

License plate number from image

One feature to implement is to have the plate number recognized from image – the user, instead of entering the number, take a picture of the plate and sends to the application. So I decided to do this. Roxana Beloiu pointed me to https://github.com/openalpr/openalpr   –  that can recognize the plate number. I have had some problems with the application in .NET- the latest releases are better than the source code. However, this was done in .NET 4.6 – and my site has .NET Core. First idea: having a mix between those frameworks – no!

Second idea: why not port from .NET 4.6 to .NET Core ? No – too many things that are not yet into .NET Core( e.g. Image)

Third idea: why do not make a serverless function in Azure ?

And this struggle me for 4 hours . The problem? The .NET Dll alpr depends on C libraries and other data – that are copied in the bin folder when you are making a reference from  a console application – but not when you are making a reference from a VS Azure function project.

Solution? Old times to the rescue

Making a console application that receives the file name and send as output the plate number

Uploading this console application and all the necessary dlls with Kudu

Receiving a base64 array in the Azure function, converting to byte, write to file, running the console, delete the file, sending text back.

Not the most advantageous, but hey,. it works.

Making IsThisTaxiLegal–evolving from Proof of Concept

What we need first is automating some tasks – like downloading the authorizations pdf ( and parsing after) from http://www.pmb.ro/adrese_utile/transport_urban/autorizatii_taxi/autorizatii_TAXI.php .

For this I want to use R – because C# I know already and is a good occasion to learn R.

The file is on https://github.com/ignatandrei/IsThisTaxiLegal/blob/master/applications/RDownloadFile/Script.R 

Some ideas from here from a C# developer:

  • Very easy to work with R – as soon as you grasp the concepts
  • Nuget from C# = CRAN from R
  • Does not provide in VS something like nuget, but pacman is perfect

if (!require(“pacman”)) install.packages(“pacman”)
pacman::p_load(‘XML’, “magrittr”, “RCurl”, “rlist”, “rvest”, “pdftools”, “dplyr”, “devtools”)
#if (!require(XML)) install.packages(‘XML’)
#install.packages(“magrittr”)
#install.packages(“RCurl”)
#install.packages(“rlist”)
#install.packages(“rvest”)
library(stringr)
library(magrittr)
library(XML)
library(RCurl)
library(rlist)
library(rvest)

  • R is supports normal programing and  functional programming  and magrittr helps

Code example:

print(head(splitData))  # normal

splitData %>% head(10) %>% print # showing data # function

  • Some WTF ( paste function is concatenate strings  ,  return in function must have () )
  • Many , many WTF when you try to make something in R until you grasp it. ( partially)
  • CRAN have many libraries free – someones requires java on your PC …. C# has a good point here , because he has now Linux + Mac + Windows natively and self contained

You will find code at https://github.com/ignatandrei/IsThisTaxiLegal/blob/master/applications/RDownloadFile/Script.R

Making IsThisTaxiLegal–thinking about Proof of Concept

It can be improved so many ways:

  1. Making OCR ( to recognize the plates number)
  2. Making application Android / iPhone
  3. Making Windows (Store ) application ( not really necessary, but anyway)
  4. Finding if another cities have taxi licenses available
  5. Publicity
  6. Improving with ratings for taxi drivers
  7. Find a lawyer that knows what to do next
  8. Improving the documentation
  9. Passing from csv to sql lite
  10. Downloading automatically the pdf with licences from www.pmb.ro/adrese_utile/transport_urban/autorizatii_taxi/autorizatii_TAXI.php
  11. Making all this points on https://github.com/ignatandrei/IsThisTaxiLegal as issues
  12. Running the test automatically

And it is a small application!

If someone wants to help / or come with new ideas, p[lease leave a message here or on https://github.com/ignatandrei/IsThisTaxiLegal

Making Is This Taxi Legal–Proof of concept

This werer the steps for making a Proof of concept for Is This Taxi Legal application http://isthistaxilegal.apphb.com

Finding PDF about taxi licence for Bucuresti : PDF at http://www.pmb.ro/adrese_utile/transport_urban/autorizatii_taxi/autorizatii_TAXI.php

Thinking about application + web site about this -with a simple search and possible OCR ( take image of the car plate and recognizing number) . Also for international ( GPS for phones to know the city). Also IOT for monitoring the illegal taxi movement( could be also illegal to do this)

The Proof of Concept: A WebAPI + WebSite( that can be accessed also by mobile) to enter a plate number and find if it is legal. Maybe later an Android App.

Step 1 : acquiring data from http://www.pmb.ro/adrese_utile/transport_urban/autorizatii_taxi/autorizatii_TAXI.php

. Download situatia_autorizatiilor_taxi_20171208.PDF

Step 2 : cleaning data – read with Word and transform into CSV.

Trying <table>. ConvertToText – not good, it preserves the return character and data can not be read safely after that.

The solution: read row by row and replace CR/LF with empty for each cell.

After this, problem with Bell character – replace this also the 7 character

After this, problem with repeating headers for each table in the csv

This is the VBA code

( maybe doing same in R : https://datascienceplus.com/extracting-tables-from-pdfs-in-r-using-the-tabulizer-package/ )

Step 3 : Making objects to support

First creating objects to support this . Car, City, LicenseState, Licensee, TaxiAutorization.

Creating test – the City should be unique – even if Bucarest is multiple times, the City object should be the same.

Step 4: Consolidating data from CSV to objects

Creating objects to mimic CSV data . Return to step 2 and put separator | instead of ,

Problem with parsing data

– a date could not exist, so it will be nullable

– could be either dd.mm.yyyy , either d.mm.yyyy either 27.02.202

Separate the lines with errors from the lines without errors . And return the result as a tuple

Making test in order to see the errors

Step 5: Application Web + WebAPI to display data

Making WEBAPI for seeing all and some taxis

See that valid taxis are not parsed correctly ( Validat vs Valid) . Modify test

Discover there are some licenses with state…

Step 6: Deploy sources to GitHub

Easy to do – create and submit at https://github.com/ignatandrei/IsThisTaxiLegal

Step 7: Create an application visible on internet

You can create an account at appharbor.com , integrate with GitHub and deploy there : http://isthistaxilegal.apphb.com

 

Step 8: Document the API

Swagger / Swashbuckle is the easy way to do this. Deployed at http://isthistaxilegal.apphb.com/swagger/

Step 9 : Remake the documentation

Mention all the documentation in all places ( API, GitHub, others)

Mention the contact name if something is wrong

Add API for enums

 

This was a work of 8 hours –  and it is just a proof of concept .