Showing posts with label CSV.

Friday, July 6, 2012

Testing inverse of the distance interpolation with AEMet pressure data

I like testing the things I learn with actual data. Some time ago, I wrote an entry about how to use the inverse of the distance to interpolate data from scattered points. Now, I will show how to use it to draw a sea level pressure map from the data provided by the Spanish Meteorology Agency (AEMet). The Agency has a nice Open Data section that is worth a visit if you are interested in meteorology, as I am.

As usual, I have uploaded all the files you need to check the entry.
Getting the data
The station data from the AEMet is available on their FTP server.
In the example, an FTP connection to the AEMet server is opened, and only the sea level pressure is parsed. If the actual station pressure were used, the interpolation would be wrong, since the altitude wouldn't be considered. AEMet gives the corrected (sea level) pressure directly.
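As an illustration of what that correction involves (AEMet already applies it, so the script never needs to), a simplified sea level reduction using the hypsometric formula could look like this sketch (function name and the constant-temperature assumption are mine):

```python
import math

def to_sea_level(p_station, altitude_m, temp_c=15.0):
    """Reduce station pressure to sea level with a simplified
    hypsometric formula (g = 9.80665 m/s2, Rd = 287.05 J/(kg K)),
    assuming a constant air temperature in the column below."""
    t_kelvin = temp_c + 273.15
    return p_station * math.exp(9.80665 * altitude_m / (287.05 * t_kelvin))

# A station at 700 m reading 925 hPa reduces to roughly 1005 hPa
print(round(to_sea_level(925.0, 700.0), 1))
```

Real reductions also account for the temperature profile, which is why using the agency's corrected values is the safe choice.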
The file format is quite easy to understand and parse. It's a compressed CSV file, with the variable name indicated in each value. Not all the stations have pressure data, and the file's directory changes its name with the date, so the script has to do some checks:


from ftplib import FTP
import gzip


def get_data():
    '''Gets the latest sea level pressure values from the AEMet service.'''
    ftp = FTP('ftpdatos.aemet.es')  # connect to host, default port
    ftp.login()  # user anonymous, passwd anonymous@
    ftp.cwd('datos_observacion/observaciones_diezminutales')  # the 10-minute data directory

    files = ftp.nlst()  # list directory contents
    # Get the newest day directory. It must include the word "diezminutales".
    # The name includes the date, so after ordering by name, the last one
    # is also the newest one.
    directories = []
    for file_name in files:
        if file_name.endswith('diezminutales'):
            directories.append(file_name)
    directories.sort()
    ftp.cwd(directories.pop())

    # Get the last file, ordered by name.
    files = ftp.nlst()
    data_files = []
    for file_name in files:
        if file_name.endswith('gz'):
            data_files.append(file_name)
    data_files.sort()

    data_file = data_files.pop()
    ftp.retrbinary("RETR " + data_file, open(data_file, 'wb').write)

    # The files are compressed with gzip. Read the contents as text.
    f = gzip.open(data_file, 'rt')

    data = {'lon': [], 'lat': [], 'press': []}
    # Parse the file line by line
    for line in f:
        station_values = line.split(',')
        # Some values are always in the same position
        lat = float(station_values[2])
        lon = float(station_values[3])
        # Other values must be found using their label name,
        # and not all the stations give pressure data
        press = None
        for var in station_values:
            if 'PRES_nmar=' in var and 'QPRES_nmar=' not in var:
                press = float(var.replace('PRES_nmar=', ''))
        if press is not None:
            data['lon'].append(lon)
            data['lat'].append(lat)
            data['press'].append(press)
    f.close()

    return data
  • The script defines a function so it can be used from other files, as we will do in the next section.
  • We want the latest data, so it is necessary to sort the directory and file names to know which file to download. To understand it better, just connect to the FTP server and check how it is organized.
  • The data is compressed, so I use gzip to open it.
  • Finally, only one of the values is used, so it is selected. Other maps could be drawn from the same data set, although interpolating some variables, like temperature, this way doesn't make sense.
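The interpolation itself comes from the inverse of the distance entry linked above. As a reminder of how it weights the stations, here is a minimal pure-Python sketch of the idea (the function name is mine; the real invdistgis code evaluates this over a whole raster grid):

```python
import math

def idw_point(x, y, xs, ys, values, power=2, smoothing=0.0):
    """Interpolate a value at (x, y) from scattered stations using
    inverse distance weighting: closer stations weigh more."""
    weights = []
    for i, (sx, sy) in enumerate(zip(xs, ys)):
        d = math.sqrt((sx - x) ** 2 + (sy - y) ** 2 + smoothing ** 2)
        if d == 0:
            return values[i]  # the point sits exactly on a station
        weights.append(1.0 / d ** power)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Two stations at 1000 and 1020 hPa; the midpoint averages to 1010
print(idw_point(0.5, 0.0, [0.0, 1.0], [0.0, 0.0], [1000.0, 1020.0]))  # 1010.0
```

The smoothing parameter softens the spikes that appear around each station when it is left at zero.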

Viewing the data 

To represent the data in the example, we will create a GeoTIFF layer for the interpolated raster, and a Shapefile for the point data. Then, using QGIS, we will draw the map. The coastline has been downloaded from the NOAA coastline extractor.
from get_data import get_data
from invdistgis import invDist
from osgeo import osr
from osgeo import ogr


def writeShape(data, proj, outFile):
    driverName = "ESRI Shapefile"
    drv = ogr.GetDriverByName(driverName)
    dsOut = drv.CreateDataSource(outFile)
    # Pass the spatial reference so the .prj file gets written
    outLayer = dsOut.CreateLayer('PRESS', proj, ogr.wkbPoint)
    field = ogr.FieldDefn("PRESS", ogr.OFTReal)
    outLayer.CreateField(field)
    for i in range(len(data['press'])):
        geometry = ogr.CreateGeometryFromWkt(
            "POINT(" + str(data['lon'][i]) + " " + str(data['lat'][i]) + ")")
        feature = ogr.Feature(outLayer.GetLayerDefn())
        feature.SetField('PRESS', data['press'][i])
        feature.SetGeometry(geometry)
        outLayer.CreateFeature(feature)

    dsOut = None


    
def createMap():
    # Getting the data from the function
    data = get_data()
    driverName = "GTiff"
    outFile = "press.tiff"
    power = 2
    smoothing = 5
    xSize = 200
    ySize = 200
    proj = osr.SpatialReference()
    proj.SetFromUserInput("EPSG:4326")
    geotransform = [-10.35, 0.07185, 0, 44.28, 0, -0.0458]
    writeShape(data, proj, "press.shp")
    invDist(data['lon'], data['lat'], data['press'], geotransform, proj,
            xSize, ySize, power, smoothing, driverName, outFile)
  
    
if __name__ == "__main__":
    createMap()
  • I have created a separate function to write the shapefile, so it's easier to understand how to do it.
  • If you run the script and the shapefile already exists, GDAL will throw an error. I think that is a kind of bug... or feature!
  • To get the data, the script uses the function described above.
  • The map creation uses the inverse of the distance function from the entry that explained how to do it.
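The geotransform follows the usual GDAL convention: the first and fourth values are the coordinates of the upper-left corner, and the second and sixth are the pixel sizes in each direction. A quick way to check the area the 200x200 grid covers (the helper function is mine):

```python
def pixel_to_coords(gt, col, row):
    """Apply a GDAL-style geotransform: origin plus per-pixel offsets."""
    lon = gt[0] + col * gt[1] + row * gt[2]
    lat = gt[3] + col * gt[4] + row * gt[5]
    return lon, lat

gt = [-10.35, 0.07185, 0, 44.28, 0, -0.0458]
print(pixel_to_coords(gt, 0, 0))      # upper-left corner: (-10.35, 44.28)
print(pixel_to_coords(gt, 200, 200))  # lower-right corner of the 200x200 raster
```

The output shows the raster spans roughly from 10.35W to 4E and from 44.3N down to 35.1N, which covers the Iberian Peninsula.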
The final result looks like this:

Tuesday, July 19, 2011

Reading Linestrings in CSV

I've had some trouble finding information about reading a CSV file that contains polygons or linestrings as the geometry field. So I will explain an example, taking the information from the new open data portal of my city, Barcelona. Here you can download all the files used.

As we saw in the last post, OGR can read a CSV file, interpreting the geographic information, if you add a VRT file describing the fields. When the geometry is not a point, the way to do it is a little tricky.

So first, download the data from the city opendata website (they will have an English version soon):
http://w20.bcn.cat/opendata/DonaRecurs.aspx?arbre=general&recurs=TRANSIT_RELACIO_TRAMS&fitxer=3001
This data is an Excel file containing some of the city streets. There is another file containing the real-time traffic information, identifying the stretches by the same id, so the example could be used to generate a map showing the traffic status in real time.
Then, we save the Excel file as a CSV file, using ; as the field separator, since the geometry of each stretch contains commas, so we can't use those as the field separator.

Now comes the problem. Users will almost never store the data in WKT format. WKT stands for Well Known Text, and it is the format that OGR understands in CSV files when you don't have simple points.
In our example, the first record is this one:
1;"Diagonal (Ronda de Dalt a Doctor Marañón)";"2.11203535639414,41.3841912394771,0 2.101502862881051,41.3816307921222,0"
They have separated the points in the linestring using spaces, while the coordinates are separated using commas. WKT does the opposite: it separates points using commas and coordinates using spaces. The correct format looks like this:
 1,Diagonal (Ronda de Dalt a Doctor Marañón),"LINESTRING (2.11203535639414 41.3841912394771,2.101502862881051 41.3816307921222)"
Notice how the commas and the quotes are used. Since the fields and the points are both separated by commas, the quotes are mandatory in the geometry field.
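The rule itself is easy to express in code. A tiny Python sketch (the function name is mine) that builds a WKT linestring from (lon, lat) pairs:

```python
def to_wkt_linestring(points):
    """Build a WKT LINESTRING: coordinates separated by spaces,
    consecutive points separated by commas."""
    coords = ",".join("%s %s" % (lon, lat) for lon, lat in points)
    return "LINESTRING (" + coords + ")"

pts = [(2.11203535639414, 41.3841912394771), (2.101502862881051, 41.3816307921222)]
print(to_wkt_linestring(pts))
```

Running it on the first record's coordinates produces exactly the geometry shown above.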
So we have to change the CSV file. The following sample code does it for you:

<?php
$separator = ";"; //Put the separator used when exporting to CSV here
$fp = fopen("TRANSIT_RELACIO_TRAMS.csv", "r");
$fp_out = fopen("TRANSIT_RELACIO_TRAMS_FORMATED.csv", "w");
fgets($fp); //Do not process the header!
fwrite($fp_out, "Tram,Descripcio,Coordenades\n");

while ($line = rtrim(str_replace('"', '', fgets($fp)))) {
  list($id, $name, $geom) = explode($separator, $line);
  $geom_points = explode(" ", $geom);
  $out_geom = "";
  foreach ($geom_points as $point) {
    list($lon, $lat, $z) = explode(",", $point);
    $out_geom .= $lon . " " . $lat . ",";
  }
  fwrite($fp_out, $id . ',' . $name . ',"LINESTRING (' . rtrim($out_geom, ',') . ')"');
  fwrite($fp_out, "\n");
}
fclose($fp);
fclose($fp_out);
?>

Now we have the correct CSV. This step will change in every situation, of course, but I wanted to do the example with real data.

The VRT file, named TRANSIT_RELACIO_TRAMS_FORMATED.VRT, will be

<OGRVRTDataSource>
    <OGRVRTLayer name="TRANSIT_RELACIO_TRAMS_FORMATED">
        <SrcDataSource>TRANSIT_RELACIO_TRAMS_FORMATED.csv</SrcDataSource>
        <SrcLayer>TRANSIT_RELACIO_TRAMS_FORMATED</SrcLayer>
        <FID>Tram</FID>
        <GeometryType>wkbLineString</GeometryType>
        <LayerSRS>EPSG:4326</LayerSRS>
        <GeometryField encoding="WKT" field="Coordenades"/>
    </OGRVRTLayer>
</OGRVRTDataSource>

Notice the wkbLineString as the GeometryType and Coordenades as the GeometryField. This is actually the only difference from the common example using points that you will find out there. It is also very important to format the CSV file properly.

Now, check the file by executing
ogrinfo -ro -al TRANSIT_RELACIO_TRAMS_FORMATED.VRT
If everything works properly, the screen will show all the rows, with the geometry column detected.
It is also possible to open the file with any software able to use OGR, such as QGIS.
The example will work for any WKT geometry, including points, polygons, etc.




Sunday, July 10, 2011

Converting an EXCEL file into a PostGIS table

A very typical way to receive information from non-GIS users is an Excel file. But we want to play a little with this data, map it, etc. We will use OGR to read the data and some SQL to write it in the correct place.
All the files used in this example are available here

So let's say that we have an Excel file called bars.xls with some data fields in some columns, plus a longitude column and a latitude column. I have taken some OSM data for the example.
First, we export the data as a CSV file. We will get something like this:
Starbucks Rambla de Sant Josep 2.12 41.23
The first thing we have to make sure of is that it has a header line. If not, OGR will not be able to read it. The CSV file, with its header line, will look like:

"Name","Type","longitude","latitude"
"McDonald's","Fast Food","2.1695839","41.3853409"
"Burguer King","Fast Food","2.1697341","41.3853329"
"Nuria","Restaurant","2.1698748","41.3851596"
"Jules Verne","Pub","2.1700193","41.385022"
"Cafe Zurich","Cafe","2.1695839","41.3855744"
"Pastafiore","Restaurant","2.170138","41.3848666"

But OGR can't read a file like this directly. We have to create a VRT file describing our CSV so the library can decode it. The file in our example will be the following:

<OGRVRTDataSource>
    <OGRVRTLayer name="bars">
        <SrcDataSource>bars.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <LayerSRS>EPSG:4326</LayerSRS>
        <GeometryField encoding="PointFromColumns" x="longitude" y="latitude"/>
    </OGRVRTLayer>
</OGRVRTDataSource>

We save the file as bars.vrt (or whatever we want, because we can specify the CSV name in the SrcDataSource field)
If we try to execute ogrinfo -ro -al bars.vrt and everything is right, we will see an output listing the fields and all the records.
So OGR now recognizes the file, and you can work with it as if it were any other vector file recognized by OGR. You can convert it to a shapefile with ogr2ogr bars.shp bars.vrt, open it with QGIS, etc.
Now, we have to upload it to our database using ogr2ogr. Execute
ogr2ogr -f PostgreSQL PG:"host=localhost user=postgres dbname=geoexamples" bars.vrt
This will create a table with the fields in the CSV file and the data contained in it. The coordinate fields appear as in the CSV file, but a geometry field and the spatial index are also created.



Usually, you will want to put the data in a predefined structure, to relate it to other tables, to add different sets of data, etc. Let's look at an example:

We want to store different types of points of interest in a table named "city_facilities". The type of the POI will be stored in a table named "city_facilities_types". We must be able to make several imports to the same table, from different sources.

Let's create the table to store the type of each point:

CREATE TABLE city_facilities_types 
( 
  idint integer NOT NULL, 
  "name" character varying, 
  CONSTRAINT pk_city_facilities_type_idint PRIMARY KEY (idint) 
) 
WITH ( 
  OIDS=TRUE 
); 
ALTER TABLE city_facilities_types OWNER TO postgres; 

and the table for the points too:

CREATE TABLE city_facilities 
( 
  idint bigint NOT NULL, 
  "name" character varying, 
  "type" integer, 
  geometry geometry, 
  CONSTRAINT pk_city_facilities_idint PRIMARY KEY (idint), 
  CONSTRAINT fk_city_facilities_type FOREIGN KEY ("type") 
      REFERENCES city_facilities_types (idint) MATCH SIMPLE 
      ON UPDATE NO ACTION ON DELETE NO ACTION 
) 
WITH ( 
  OIDS=TRUE 
); 
ALTER TABLE city_facilities OWNER TO postgres; 

We create the sequence to be able to add the points without caring for the id:

CREATE SEQUENCE seq_city_facilities_idint 
  INCREMENT 1 
  MINVALUE 1 
  MAXVALUE 9223372036854775807 
  START 1 
  CACHE 1; 
ALTER TABLE seq_city_facilities_idint OWNER TO postgres; 

We must add the entry to the "geometry_columns" table, so it is a properly registered PostGIS table:

INSERT INTO geometry_columns (f_table_catalog,f_table_schema,f_table_name,f_geometry_column,coord_dimension, srid, type) VALUES ('','public','city_facilities','geometry',2,4326,'POINT') 

You could also do this step using the PostGIS function AddGeometryColumn.

Now the final step. We insert the data in the target table from a select in the source table:

INSERT INTO city_facilities 
(idint,name,type,geometry) 
select  
nextval('seq_city_facilities_idint') as idint, 
name as name, 
type::Integer as type, 
wkb_geometry as geometry 
from bars 

Notice that, during the import, the fields are stored as character varying, regardless of their actual type. This can be solved either in the VRT file or by casting the type, as we have done with the type field in the example.
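To solve it on the OGR side, the VRT layer can declare the field types explicitly. A sketch following the OGR VRT schema (the type values are an assumption about the source data):

```xml
<OGRVRTDataSource>
    <OGRVRTLayer name="bars">
        <SrcDataSource>bars.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <LayerSRS>EPSG:4326</LayerSRS>
        <GeometryField encoding="PointFromColumns" x="longitude" y="latitude"/>
        <!-- Declared types override the CSV driver's all-strings default -->
        <Field name="Name" type="String"/>
        <Field name="Type" type="Integer"/>
    </OGRVRTLayer>
</OGRVRTDataSource>
```

With the types declared here, ogr2ogr creates the PostgreSQL columns with the right types and the cast in the final INSERT is no longer needed.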


Tips:

In the VRT file, the layer name and the file name must be the same; otherwise it didn't work for me.