Tuesday, July 19, 2011

Reading Linestrings in CSV

I've had some troubles to find information about reading a CSV file containing polygons or linestrings as the geometry field.So I will explain an example, taking the information from the new Opendata portal from my city, Barcelona. Here you can download all the files used.

As we saw in the latest post, OGR can read a CSV file interpreting the geographic information if you add a VRT file describing the fields. When the geometry information is not a point, the way to do it is a little tricky.

So first, download the data from the city opendata website (they will have an English version soon):
http://w20.bcn.cat/opendata/DonaRecurs.aspx?arbre=general&recurs=TRANSIT_RELACIO_TRAMS&fitxer=3001
This data is an Excel file containing some of the city streets. There is an other file containing the real-time traffic information, identifying stretches with the same id
So the example could be used to generate a map showing the traffic status in real-time.
Then, we save the EXCEL file as a CSV file. We use ; as the field separator, since the geometry of each stretch contains commas, so we can't use them as the field separator.

Now it comes the problem. Nearly never a user will store the data in a WKT format. WKT stands for Well Known Text, and is the format that OGR can understand in CSV files when you don't have simple points.
In our example, the first record is this one:
1;"Diagonal (Ronda de Dalt a Doctor Marañón)";"2.11203535639414,41.3841912394771,0 2.101502862881051,41.3816307921222,0"
They have separated the points in the linestring using spaces, while the coordinates are separated using commas. WKT separates points using commas and coordinates using spaces. The good format will look like that:
 1,Diagonal (Ronda de Dalt a Doctor Marañón),"LINESTRING (2.11203535639414 41.3841912394771,2.101502862881051 41.3816307921222)"
Notice how are used the commas and the quotes. Since the fields and the points are both separated by commas, the quotes are mandatory in the geometry field.
So we have to change the CSV file. The following sample code does it for you:

<?PHP
$separator = ";"; //Put the separator used when exporting to csv here
$fp = fopen("TRANSIT_RELACIO_TRAMS.csv","r");
$fp_out = fopen("TRANSIT_RELACIO_TRAMS_FORMATED.csv","w");
fgets($fp); //Do not process the header!
fwrite($fp_out,"Tram,Descripcio,Coordenades\n");

while ($line = rtrim(str_replace('"','',fgets($fp)))){
  list($id,$name,$geom) = split($separator,$line);
  $geom_points = split(" ",$geom);
  $out_geom = "";
  foreach ($geom_points as $point) {
    list($lon,$lat,$z) = split(",",$point);
    $out_geom.= $lon." ".$lat.",";

  }
  fwrite ($fp_out, $id.','.$name.',"LINESTRING ('.rtrim($out_geom,',').')"');
  fwrite ($fp_out,"\n");
}
fclose($fp);
fclose($fp_out)
?>

Now we have the correct CSV. This step will change in every situation, of course, but I wanted to do the example with real data.

The VRT file, named TRANSIT_RELACIO_TRAMS_FORMATED.VRT will be

    
        TRANSIT_RELACIO_TRAMS_FORMATED.csv
        TRANSIT_RELACIO_TRAMS_FORMATED
        
 Tram
 wkbLineString
        EPSG:4326
    

Notice the wkbLineString as the geometrytype and the Coordenades as the geomertyfield. This is actually the only difference of the common example using points that you will find around there. It is also very important to format properly the CSV file.

Now, check the file by executing
ogrinfo -ro -al TRANSIT_RELACIO_TRAMS_FORMATED.VRT
If everything works properly, the screen will show all the rows detecting the geometry column.
It is also possible to open the file with any software able to use OGR, such as QGIS.
The example will work for any WKT, including poines, polygons, etc.




No comments:

Post a Comment