Sunday, July 10, 2011

Converting an EXCEL file into a PostGIS table

A very typical way to give information from non GIS users is an EXCEL file. But we want to play a little with this data,map it, etc. We will use OGR to read the data and some SQL to write in the correct place.
All the files used in this example are available here

So let's say that we have an EXCEL file called bars.xls with some data fields in some columns and a longitude column and a latitude column. I have taken some OSM data to do the example. 
First, we export the data as a CSV file. We will get something like that:
Starbucks Rambla de Sant Josep 2.12 41.23
The first thing that we have to make sure is that it has a header line. If not, OGR will not be able to read it. The CSV file will have a header line, and look like:

"McDonald's","Fast Food","2.1695839","41.3853409"
"Burguer King","Fast Food","2.1697341","41.3853329"
"Jules Verne","Pub","2.1700193","41.385022"
"Cafe Zurich","Cafe","2.1695839","41.3855744"

But OGR can't read a file like this. We have to create a VRT file to describe our CSV so the library can decode it. The file in our example will be the following:


We save the file as bars.vrt (or whatever we want, because we can specify the name in the SrcDataSource field)
If we try to execute ogrinfo -ro -al bars.vrt and everything is right, we will se an output like:
So OGR is now recognizing the file, and you can work with it as if it was any other vector file recognized by OGR. You can convert it to a shapefile with ogr2ogr bars.shp bars.vrt, open it with qgis, etc.
Now, we have to upload it to our database using ogr2ogr. Execute
ogr2ogr -f PostgreSQL PG:"host=localhost user=postgres dbname=geoexamples" bars.vrt
This will create a table with the fields in the CSV file and the data contained in it. As you can see in the picture, the coordinate fields appear as in the csv file, but a geometry field and the spatial index are also created.

Usually, you will want to put the data in a predefined structure, to relate it with other tables, to add different sets of data, etc. Let's put an example:

We want to store different types of points of interest in a table named "city_facilities". The type of the POI will be stored in a table named "city_facilities_types". We must be able to make several imports to the same table, from different sources.

Let's create the table to store the type of each point:

CREATE TABLE city_facilities_types 
  idint integer NOT NULL, 
  "name" character varying, 
  CONSTRAINT pk_city_facilities_type_idint PRIMARY KEY (idint) 
ALTER TABLE city_facilities_types OWNER TO postgres; 

and the table for the points too:

CREATE TABLE city_facilities 
  idint bigint NOT NULL, 
  "name" character varying, 
  "type" integer, 
  geometry geometry, 
  CONSTRAINT pk_city_dacilities_idint PRIMARY KEY (idint), 
  CONSTRAINT fk_city_facilities_type FOREIGN KEY ("type") 
      REFERENCES city_facilities_types (idint) MATCH SIMPLE 
ALTER TABLE city_facilities OWNER TO postgres; 

We create the sequence to be able to add the points without caring for the id:

CREATE SEQUENCE seq_city_facilities_idint 
  MAXVALUE 9223372036854775807 
  START 1 
  CACHE 1; 
ALTER TABLE seq_city_facilities_idint OWNER TO postgres; 

We must add the entryto the "geometry_columns" table, so it's a well done PostGIS table:

INSERT INTO geometry_columns (f_table_catalog,f_table_schema,f_table_name,f_geometry_column,coord_dimension, srid, type) VALUES ('','public','city_facilities','geometry',2,4326,'POINT') 

You could also do the step using the PostGIS command  AddGeometryColumn

Now the final step. We insert the data in the target table from a select in the source table:

INSERT INTO city_facilities 
nextval('seq_city_facilities_idint') as idint, 
name as name, 
type::Integer as type, 
wkb_geometry as geometry 
from bars 

Notice that, during the import, the fields are stored as character varying, despite of their actual type. This can be solved either in the VRT file or casting the type as we have done with the type field in the example.


In the VRT file, the layer name and the file name must be the same, otherwise, it didn't work for me.

No comments:

Post a Comment