Tuesday, July 19, 2011

Reading Linestrings in CSV

I've had some troubles to find information about reading a CSV file containing polygons or linestrings as the geometry field.So I will explain an example, taking the information from the new Opendata portal from my city, Barcelona. Here you can download all the files used.

As we saw in the latest post, OGR can read a CSV file interpreting the geographic information if you add a VRT file describing the fields. When the geometry information is not a point, the way to do it is a little tricky.

So first, download the data from the city opendata website (they will have an English version soon):
http://w20.bcn.cat/opendata/DonaRecurs.aspx?arbre=general&recurs=TRANSIT_RELACIO_TRAMS&fitxer=3001
This data is an Excel file containing some of the city streets. There is an other file containing the real-time traffic information, identifying stretches with the same id
So the example could be used to generate a map showing the traffic status in real-time.
Then, we save the EXCEL file as a CSV file. We use ; as the field separator, since the geometry of each stretch contains commas, so we can't use them as the field separator.

Now it comes the problem. Nearly never a user will store the data in a WKT format. WKT stands for Well Known Text, and is the format that OGR can understand in CSV files when you don't have simple points.
In our example, the first record is this one:
1;"Diagonal (Ronda de Dalt a Doctor Marañón)";"2.11203535639414,41.3841912394771,0 2.101502862881051,41.3816307921222,0"
They have separated the points in the linestring using spaces, while the coordinates are separated using commas. WKT separates points using commas and coordinates using spaces. The good format will look like that:
 1,Diagonal (Ronda de Dalt a Doctor Marañón),"LINESTRING (2.11203535639414 41.3841912394771,2.101502862881051 41.3816307921222)"
Notice how are used the commas and the quotes. Since the fields and the points are both separated by commas, the quotes are mandatory in the geometry field.
So we have to change the CSV file. The following sample code does it for you:

<?PHP
$separator = ";"; //Put the separator used when exporting to csv here
$fp = fopen("TRANSIT_RELACIO_TRAMS.csv","r");
$fp_out = fopen("TRANSIT_RELACIO_TRAMS_FORMATED.csv","w");
fgets($fp); //Do not process the header!
fwrite($fp_out,"Tram,Descripcio,Coordenades\n");

while ($line = rtrim(str_replace('"','',fgets($fp)))){
  list($id,$name,$geom) = split($separator,$line);
  $geom_points = split(" ",$geom);
  $out_geom = "";
  foreach ($geom_points as $point) {
    list($lon,$lat,$z) = split(",",$point);
    $out_geom.= $lon." ".$lat.",";

  }
  fwrite ($fp_out, $id.','.$name.',"LINESTRING ('.rtrim($out_geom,',').')"');
  fwrite ($fp_out,"\n");
}
fclose($fp);
fclose($fp_out)
?>

Now we have the correct CSV. This step will change in every situation, of course, but I wanted to do the example with real data.

The VRT file, named TRANSIT_RELACIO_TRAMS_FORMATED.VRT will be

    
        TRANSIT_RELACIO_TRAMS_FORMATED.csv
        TRANSIT_RELACIO_TRAMS_FORMATED
        
 Tram
 wkbLineString
        EPSG:4326
    

Notice the wkbLineString as the geometrytype and the Coordenades as the geomertyfield. This is actually the only difference of the common example using points that you will find around there. It is also very important to format properly the CSV file.

Now, check the file by executing
ogrinfo -ro -al TRANSIT_RELACIO_TRAMS_FORMATED.VRT
If everything works properly, the screen will show all the rows detecting the geometry column.
It is also possible to open the file with any software able to use OGR, such as QGIS.
The example will work for any WKT, including poines, polygons, etc.




Sunday, July 10, 2011

Converting an EXCEL file into a PostGIS table

A very typical way to give information from non GIS users is an EXCEL file. But we want to play a little with this data,map it, etc. We will use OGR to read the data and some SQL to write in the correct place.
All the files used in this example are available here

So let's say that we have an EXCEL file called bars.xls with some data fields in some columns and a longitude column and a latitude column. I have taken some OSM data to do the example. 
First, we export the data as a CSV file. We will get something like that:
Starbucks Rambla de Sant Josep 2.12 41.23
The first thing that we have to make sure is that it has a header line. If not, OGR will not be able to read it. The CSV file will have a header line, and look like:

"Name","Type","longitude","latitude"
"McDonald's","Fast Food","2.1695839","41.3853409"
"Burguer King","Fast Food","2.1697341","41.3853329"
"Nuria","Restaurant","2.1698748","41.3851596"
"Jules Verne","Pub","2.1700193","41.385022"
"Cafe Zurich","Cafe","2.1695839","41.3855744"
"Pastafiore","Restaurant","2.170138","41.3848666"

But OGR can't read a file like this. We have to create a VRT file to describe our CSV so the library can decode it. The file in our example will be the following:


    
        bars.csv
        wkbPoint
        EPSG:4326
        
    



We save the file as bars.vrt (or whatever we want, because we can specify the name in the SrcDataSource field)
If we try to execute ogrinfo -ro -al bars.vrt and everything is right, we will se an output like:
So OGR is now recognizing the file, and you can work with it as if it was any other vector file recognized by OGR. You can convert it to a shapefile with ogr2ogr bars.shp bars.vrt, open it with qgis, etc.
Now, we have to upload it to our database using ogr2ogr. Execute
ogr2ogr -f PostgreSQL PG:"host=localhost user=postgres dbname=geoexamples" bars.vrt
This will create a table with the fields in the CSV file and the data contained in it. As you can see in the picture, the coordinate fields appear as in the csv file, but a geometry field and the spatial index are also created.



Usually, you will want to put the data in a predefined structure, to relate it with other tables, to add different sets of data, etc. Let's put an example:

We want to store different types of points of interest in a table named "city_facilities". The type of the POI will be stored in a table named "city_facilities_types". We must be able to make several imports to the same table, from different sources.

Let's create the table to store the type of each point:

CREATE TABLE city_facilities_types 
( 
  idint integer NOT NULL, 
  "name" character varying, 
  CONSTRAINT pk_city_facilities_type_idint PRIMARY KEY (idint) 
) 
WITH ( 
  OIDS=TRUE 
); 
ALTER TABLE city_facilities_types OWNER TO postgres; 

and the table for the points too:

CREATE TABLE city_facilities 
( 
  idint bigint NOT NULL, 
  "name" character varying, 
  "type" integer, 
  geometry geometry, 
  CONSTRAINT pk_city_dacilities_idint PRIMARY KEY (idint), 
  CONSTRAINT fk_city_facilities_type FOREIGN KEY ("type") 
      REFERENCES city_facilities_types (idint) MATCH SIMPLE 
      ON UPDATE NO ACTION ON DELETE NO ACTION 
) 
WITH ( 
  OIDS=TRUE 
); 
ALTER TABLE city_facilities OWNER TO postgres; 

We create the sequence to be able to add the points without caring for the id:

CREATE SEQUENCE seq_city_facilities_idint 
  INCREMENT 1 
  MINVALUE 1 
  MAXVALUE 9223372036854775807 
  START 1 
  CACHE 1; 
ALTER TABLE seq_city_facilities_idint OWNER TO postgres; 

We must add the entryto the "geometry_columns" table, so it's a well done PostGIS table:

INSERT INTO geometry_columns (f_table_catalog,f_table_schema,f_table_name,f_geometry_column,coord_dimension, srid, type) VALUES ('','public','city_facilities','geometry',2,4326,'POINT') 

You could also do the step using the PostGIS command  AddGeometryColumn

Now the final step. We insert the data in the target table from a select in the source table:

INSERT INTO city_facilities 
(idint,name,type,geometry) 
select  
nextval('seq_city_facilities_idint') as idint, 
name as name, 
type::Integer as type, 
wkb_geometry as geometry 
from bars 

Notice that, during the import, the fields are stored as character varying, despite of their actual type. This can be solved either in the VRT file or casting the type as we have done with the type field in the example.


tips:

In the VRT file, the layer name and the file name must be the same, otherwise, it didn't work for me.

Tuesday, July 5, 2011

Hello

It was necessary a new blog?

Well, not really... but I was storing some information in my hard drive, hidden from everybody, even difficult to organize and be used by myself. Reading geospatialpython, which is a blog I recommend to everybody, I though that my "investigations" could be interesting for somebody. At least would be easier to find by me.

Next week, the first post!