How it started:
At a .NET meeting in Romania, one of the presenters loaded the MaxMind GeoIP database to see where a user comes from.
He loaded the data from the CSV file into a List&lt;GeoLocation&gt; at runtime.
He insisted on the speed of the algorithm used to find the matching IP range.
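(The lookup itself is typically a binary search over IP ranges sorted by their start address. A sketch of that kind of lookup; the IpRange type and its field names are my own illustration, not MaxMind's actual schema:

    struct IpRange { public long Start, End; public int LocationId; }

    static class IpLookup
    {
        // Binary search over ranges sorted by Start; returns the matching
        // location id, or -1 if the IP falls in no range.
        public static int FindLocation(long ip, IpRange[] ranges)
        {
            int lo = 0, hi = ranges.Length - 1;
            while (lo <= hi)
            {
                int mid = lo + (hi - lo) / 2;
                if (ip < ranges[mid].Start) hi = mid - 1;
                else if (ip > ranges[mid].End) lo = mid + 1;
                else return ranges[mid].LocationId; // ip is inside this range
            }
            return -1;
        }
    }
)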
And I was thinking: OK, the algorithm's speed is one argument.
But what if, instead of loading the CSV file at runtime and putting the values into a List&lt;GeoLocation&gt;, we generated the whole list at compile time?
So I made a .tt (T4) file that parses the MaxMind CSV file and generates something like this:
    public class Location : List<GeoLocation>
    {
        public Location() : base()
        {
            this.Add(new GeoLocation(1, "O1", "", "", "", 0f, 0f));
            this.Add(new GeoLocation(2, "AP", "", "", "", 35f, 105f));
            this.Add(new GeoLocation(3, "EU", "", "", "", 47f, 8f));
            this.Add(new GeoLocation(4, "AD", "", "", "", 42.5f, 1.5f));
            this.Add(new GeoLocation(5, "AE", "", "", "", 24f, 54f));
            this.Add(new GeoLocation(6, "AF", "", "", "", 33f, 65f));
            this.Add(new GeoLocation(7, "AG", "", "", "", 17.05f, -61.8f));
            this.Add(new GeoLocation(8, "AI", "", "", "", 18.25f, -63.1667f));
            this.Add(new GeoLocation(9, "AL", "", "", "", 41f, 20f));
            this.Add(new GeoLocation(10, "AM", "", "", "", 40f, 45f));
            this.Add(new GeoLocation(11, "AN", "", "", "", 12.25f, -68.75f));
            this.Add(new GeoLocation(12, "AO", "", "", "", -12.5f, 18.5f));
            this.Add(new GeoLocation(13, "AQ", "", "", "", -90f, 0f));
            // and so on
        }
    }
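For completeness, here is a minimal sketch of what such a .tt template might look like. The file name, the two skipped header lines and the naive Split(',') are all assumptions about the GeoLiteCity-Location.csv layout (locId, country, region, city, postalCode, latitude, longitude), not the exact template I used:

    <#@ template debug="false" hostspecific="true" language="C#" #>
    <#@ output extension=".cs" #>
    <#@ import namespace="System.IO" #>
    <#@ import namespace="System.Linq" #>
    using System.Collections.Generic;

    public class Location : List<GeoLocation>
    {
        public Location() : base()
        {
    <#
        // Assumption: the copyright line and the column header are the first two lines.
        foreach (var line in File.ReadAllLines(Host.ResolvePath("GeoLiteCity-Location.csv")).Skip(2))
        {
            // Naive parsing: strips the surrounding quotes, but does not
            // handle commas inside quoted fields.
            var p = line.Split(',').Select(s => s.Trim('"')).ToArray();
    #>
            this.Add(new GeoLocation(<#= p[0] #>, "<#= p[1] #>", "<#= p[2] #>", "<#= p[3] #>", "<#= p[4] #>", <#= p[5] #>f, <#= p[6] #>f));
    <#
        }
    #>
        }
    }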
The generated .cs file is about 30 MB. The compiled EXE of a test console application is about 19 MB.
I put each approach into its own console application, and the results were:
The console application with the CSV parser loads in 1 second. This is the version that loads the file at runtime.
The console application with the class holding all the predefined data has still not loaded after 1 minute, and its memory usage keeps growing. This is the version with the contents baked in at compile time.
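For reference, the timing in each console application is just a Stopwatch around the loading step; a minimal sketch, where Location is the generated class above and CsvLoader.LoadFromCsv is a hypothetical name for the runtime parser:

    using System;
    using System.Diagnostics;

    class Program
    {
        static void Main()
        {
            var sw = Stopwatch.StartNew();

            // Compile-time version: the data is baked into the constructor.
            var locations = new Location();

            // Runtime version (hypothetical parser name):
            // var locations = CsvLoader.LoadFromCsv("GeoLiteCity-Location.csv");

            sw.Stop();
            Console.WriteLine("Loaded {0} locations in {1}", locations.Count, sw.Elapsed);
        }
    }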
And this is the puzzle: after all, the CSV parser ends up loading the very same data into memory, and on top of that it pays for the hard drive access time. So the compiled one should be faster, right?
Not so fast. The JIT comes into action: it has to compile the huge generated code in the EXE before it can run it. So it takes MORE time.
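One way to convince yourself that it really is the JIT: force compilation of the huge constructor up front with RuntimeHelpers.PrepareMethod and time the two phases separately. A sketch, assuming the generated Location class from above:

    using System;
    using System.Diagnostics;
    using System.Runtime.CompilerServices;

    class JitDemo
    {
        static void Main()
        {
            var ctor = typeof(Location).GetConstructor(Type.EmptyTypes);

            var sw = Stopwatch.StartNew();
            RuntimeHelpers.PrepareMethod(ctor.MethodHandle); // forces JIT compilation now
            Console.WriteLine("JIT time: {0}", sw.Elapsed);

            sw.Restart();
            var data = new Location(); // executes the already-compiled code
            Console.WriteLine("Execution time: {0} ({1} items)", sw.Elapsed, data.Count);
        }
    }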
I submitted the question to a mailing list, and two (big) suggestions came back:
Suggestion 1: NGEN the exe (ngen install). NGEN-ing ran for 1 hour on my PC (x64, 4 GB of RAM, 4 cores) and did not finish, with all 4 GB of RAM in use. Apparently not a good idea.
Suggestion 2: use a struct instead of a class. Same time…
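My reading of that suggestion is to make GeoLocation a value type, so the list holds values instead of heap objects. A sketch; only the constructor signature is taken from the generated code above, the field names are my own guess:

    public struct GeoLocation
    {
        public readonly int Id;
        public readonly string Country, Region, City, PostalCode;
        public readonly float Latitude, Longitude;

        public GeoLocation(int id, string country, string region, string city,
                           string postalCode, float latitude, float longitude)
        {
            Id = id; Country = country; Region = region; City = city;
            PostalCode = postalCode; Latitude = latitude; Longitude = longitude;
        }
    }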
If you want to try it yourself, please download:
http://msprogrammer.serviciipeweb.ro/wp-content/uploads/runcomp.7z
However, the final question arises: above what amount of data should we load at runtime instead of compile time?
(For 1 item, compile time is better. For 2, the same. … For all the data in the CSV, runtime is required.)
What I expect is a function that takes a parameter (data that is x bytes long) and says:
loading < 1000 records is faster at compile time than from the hard disk (runtime)
loading > 2000 records is faster from the hard disk (runtime) than at compile time
from 1000 to 2000 records, it depends on RAM, disk RPM and other factors
(or some algorithm for that)
How do we calculate this number?
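What I am imagining is something like the sketch below; the 1000 and 2000 thresholds are just the placeholders from the list above, not measured values:

    static class LoadStrategy
    {
        // Assumptions: the real thresholds depend on record size, RAM,
        // disk speed and JIT cost on the target machine.
        const int LowerBound = 1000; // below this, compile time wins
        const int UpperBound = 2000; // above this, runtime wins

        public static bool ShouldBakeIntoCompileTime(int records)
        {
            if (records < LowerBound) return true;  // compile time is faster
            if (records > UpperBound) return false; // runtime is faster
            // Gray zone: the only safe answer is to measure both
            // on the target machine.
            return false;
        }
    }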
(Note that this is a purely mathematical question of minimizing the total time by combining compile time and runtime. It does not matter much in practice, since the runtime loading time is so small.)