Monday, December 10, 2012

Raster calculations with GDAL and numpy: calculating GFS wid speed

Performing raster calculations is a frequent need. GDAL python bindings come with a utility program,, that does that, but many times is useful to code them.
In this example, we will be able to calculate the wind speed field from the GFS model, from the given u and v speed.
 The calculated wind field. America is easily seen at the right side of the image

The data

In this example, we will use the GFS global model data, which is available at their web site. Just navigate to a date that pleases you and download a .grb file.
Files are in grib format, with a little problem when trying to view them in QGis: Coordinates are from 0 to 360º in longitude, while the usual thing is to put them from -180º to 180º. Everything will work without changing this, but you can do it after taking a look to this post.

Inside the grib file, there are many bands (hundreds). You can see their definition using the gdal command gdalifo, or the file with the extension *inv at the same place you downloaded the grib file. There are many wind fields, that correspond to different altitudes, all them indicated as u (east) and v components (north). Just take note of the band number of both at the desired level.


The best of using the python gdal bindings, compared from those of the other programs, is that converting a raster file to a NumPy matrix is now very easy. 
NumPy is a python extension that allows to perform matrix calculations in a really easy and efficient way. It should be installed in your computer if GDAL python is installed.
Once you have two matrices A and B in NumPy, adding element by element to a C matrix would be as easy as:
C = A + B
and multiplying:
C = A * B
Note that is not a matrix multiplication (more info here), but this is what we usually need when working with raster images.

The code

The wind speed is the modulus of the two components, so 
spd = sqrt(u^2+v^2)
The "traditional way" to calculate it would be opening two bands and loop all the elements to create the new band. Instead, this can be done:
from osgeo import gdal
from osgeo.gdalnumeric import *
from osgeo.gdalconst import *

fileName = "gfs.tiff"
bandNum1 = 199
bandNum2 = 200

outFile = "out.tiff"

#Open the dataset
ds1 = gdal.Open(fileName, GA_ReadOnly )
band1 = ds1.GetRasterBand(bandNum1)
band2 = ds1.GetRasterBand(bandNum2)

#Read the data into numpy arrays
data1 = BandReadAsArray(band1)
data2 = BandReadAsArray(band2)

#The actual calculation
dataOut = numpy.sqrt(data1*data1+data2*data2)

#Write the out file
driver = gdal.GetDriverByName("GTiff")
dsOut = driver.Create("out.tiff", ds1.RasterXSize, ds1.RasterYSize, 1, band1.DataType)
BandWriteArray(bandOut, dataOut)

#Close the datasets
band1 = None
band2 = None
ds1 = None
bandOut = None
dsOut = None

Note that:
  1. gdalnumeric is used, because it's the easiest way to dump a raster into a NumPy matrix using BandReadAsArray and the inverse function with BandWriteArray.
  2. In this case, we are reading the bands 199 and 200, but you should check which bands are the ones to read in the file you download.
  3. CopyDatasetInfo will copy all the input file metadata into the output file. This includes the geotransform, the projection, the GCPs (there aren't in this case) and the other metadata.
  4. Don't forget to set the bands and datasets to None at the end to free them. In longer scripts this may be critical.

If you want to use the command line function to do the same, you can try with: -A gfs.tiff --A_band 199 -B gfs.tiff --B_band 200 --outfile out.tiff --calc="sqrt(A*A+B*B)"
It gives the same result.

Thursday, November 29, 2012

kartograph tutorial II: going interactive

In my last post I showed how to create a cool SVG map using Kartograph, to be opened and edited with an Inkscape-like program.
The elections were celebrated and CiU won again, but with less votes. I could either update the map or play with the same data. The fun with Kartograph is using the generated SVG to create interactive web pages using Kartograph.js, so this is what I'll do.
The web page will be able to:
  • Generate the map not only for the first position, but also for the others
  • Query every municipality to know which were the results there.
As usual, the code is available to download, and at the working web page too.

This second example has a continuation post: Kartograph tutorial III: Symbols
Changes to the SVG
In the other post, we used the SVG only to draw, but now we need to add data to it, so we can query it.
So change the municipalities layer at the file elections.json from:
       "src": "./mun_out.shp"

       "src": "./mun_out.shp",
       "attributes": {"winner": "Winner","ciu": "CiU","psc":"PSC-PSOE","erc":"ERC","pp":"PP","icv":"ICV-EUiA","cs":"C's","name":"Name"}

This will add the attributes set in the shapefile geometries to the SVG geometries.
To generate the SVG again, just type:
kartograph -o elections.svg elections.json

Viewing the map in a web page

Even though is quite simple, I haven't found any really basic working example, so here it is:

        <script src="jquery-1.8.3.min.js"></script>
        <script src="raphael-min.js"></script>
        <script src="kartograph.min.js"></script>
 <script language="JavaScript">
function loadMap(){
var map ='#map', 600, 0);

map.loadMap('elections.svg', function() {

}, { padding: -30 })
    <body onLoad="loadMap()">

        <div id="map"></div>

  1. Include jquery, raphael and kartograph
  2. The map must be created when the page loads, so the function loadMap() is called at the body tag , using the onLoad event.
  3.'#map', 600, 0) creates the map at the div tag with the id map. X size will be 600 px, and the aspect ratio will be respected, because the y size is 0. When an other y is set, strange things happen with the png borders.
  4. loadMap draws the map from the file elections.svg. Then, executes a function. In this case, the function loads all the layers one by one, using the method addLayer(LayerNameInTheSVG)
  5. padding: -30 removes the strange lines from the vectors in the borders by padding the image. Try without using it to understand what happens.

This renders the SVG without any styling or interaction. But is the basic way to start any Kartograph.js project.

CSS Styling

There are two ways to add styles to the map. We will use both, to show them.
The first one, is to use a separate css file, as in the last post.
Adding the css file is quite tricky, because of the IExplorer compatibility. Now, the loadMap call would look like:

map.loadMap('elections.svg', function() {
 map.loadCSS('elections.css', function() {
So, the only thing that the loadMap method does, is calling the loadCss method, which will do all the stuff.

The css file is also different than the one used before. To style the layer world, the css section will be:
#map svg {
stroke: #fff;
fill: #cccccf;
fill-opacity: .7;

Tip1: Be careful, Chrome doesn't reload your css even though you use ctrl-F5!
Tip2: Do not separate the CSS elements with comas (look at the example). With the python example, this didn't matter.
The other way is using the style method from the layer, as we will see in the final part of the post.

Adding tooltips

If we want the web page to show a tooltip with the result when the mouse is over a municipality, just add a tooltip to the desired layer (municipalities, in this case). The function in loadCSS will be now:
map.loadCSS('elections.css', function() {

$ = 'qtip-light';

 map.getLayer('municipalities').tooltips(function(data) {

return [, 'CiU: <b>' + data.ciu + '</b><br/>PSC-PSOE: <b>'+ data.psc + '</b><br/>ERC: <b>' + data.erc+'</b><br/>PP: <b>'+data.pp+'</b><br/>ICV-EUA: <b>'+data.icv+'</b><br/>C\'s: <b>'+data.cs+'</b>'];

A function is called as a parameter to the tooltips method of the layer municipalities. This function returns what we want to be shown in the tooltip. The data parameter has all the attributes in the SVG geometries. In this case, a list with the results is shown.
The other line in the code just sets the style of the tooltip. The official documentation has more information about the options.
Also, the tooltips styles and javascript must be included in the header, adding:

<script src="jquery.qtip.js"></script>
 <link rel="stylesheet" type="text/css" href="jquery.qtip.css"/>

Setting the colors in the map

Finally, the map will be colored. By default, depending on the winner party, but with a combo box to choose the position.
So first, a function is created to sort the results in every municipality, and set the style with a different color for every party:

function setColors(value){
 map.getLayer('municipalities').style('fill', function(data) {

    var positions = new Array();
    positions[0] = parseInt(data.ciu);
    positions[1] = parseInt(data.psc);
    positions[2] = parseInt(data.erc);
    positions[3] = parseInt(data.pp);
    positions[4] = parseInt(data.icv);
    positions[5] = parseInt(data.cs);

    positions.sort(function(a,b) { return b-a; });

    var bgColor;
    if (positions[value-1] == parseInt(data.ciu)){
        bgColor = "#99edff";
    } else if (positions[value-1] == parseInt(data.psc)){
        bgColor = "#ff9999";
    } else if (positions[value-1] == parseInt(data.erc)){
        bgColor = "#EDE61A";
    } else if (positions[value-1] == parseInt(data.pp)){
        bgColor = "#005aff";
    } else if (positions[value-1] == parseInt(data.icv)){
        bgColor = "#74ff74";
    } else if (positions[value-1] == parseInt(data.cs)){
        bgColor = "#ff7b00";
    return bgColor;
First, we set the style to the layer with the method style. the first argument is the property to change, fill in this case. The second parameter is the value to set. In this case, it will depend of the function that chooses the position. Things to look at:
  • value is the position to show (from 1 to 5)
  • To sort the positions in the array in descending order, use the sort() method, with a function to order the elements as numbers. See why here.
  • Then, check which party corresponds to the number of votes at the desired position, and set the color.
This function will be called at the beginning to show the first position by default:
and a select element is created to select the position and change the colors when the option is changed:
<select onChange="changePos(this)">

<option value="1">1st position</option>

<option value="2">2nd position</option>
<option value="3">3rd position</option>
<option value="4">4th position</option>
<option value="5">5th position</option>
An auxiliar function is created to get the option value:
function changePos(sel){
 var value = sel.options[sel.selectedIndex].value;
That's all!
The best is to download the code or look at the working web.
Of course, some styling is still needed to do a deliverable web page, but the main idea is already done.

Saturday, November 24, 2012

Kartograph tutorial: Electoral map

Tomorrow I will be all the day at the polling station, since I've been chosen as a member.
The last two weeks I was playing with the amazing Kartograph software, so it was a good moment to experiment with electoral maps (the first time for me).

 In this example, I will explain step by step how to create the map above. It's quite similar to this tutorial, but I want to continue in a new post, going interactive.
Kartograph creates vector SVG images, which you can edit later with Inkscape or Illustrator, so gives much more flexibility than systems generating PNG like files, much more difficult to modify.
As in all the posts, you can get all the files used in the example.
This example has two continuation posts: 

Getting the data

As usual, getting the data is not so easy. The Catalan government has a good open data web site, where I found:
  • A file with a really descriptive name,, with all the administrative boundaries (provinces, comarques, and municipalities).
  • Lots of files with election results. I choose the 2010 elections, since they where to the Catalan parliament, like the ones tomorrow. As you can see on the map, the party CiU won with a very big majority, so the map is not as interesting as it could be.
I have used the municipalities to draw the map because the result is more diverse than using bigger zones. Actually, the real  constituency is the province, but CiU won everywhere, and a plain blue map is quite boring.
So I've had to join the two files to get one file with the geometries and the results. The process is quite long and dirty (why didn't they use an id? I had to join with the names), so I won't explain how to do it, but put the result at the data files. You can find this file here.

Then, to decorate the map, I used the following files
  • World boundaries from Natural Earth (, to draw the coast line outside Catalonia
  • From VMAP, the layers Trees, Crops, and DepthContours, to decorate the map outside the electoral  constituencies.
Since the layers are worldwide, so very big, I have used these ogr commands to clip:
 ogr2ogr -clipsrc -3 37 4 44 Trees2.shp Trees.shp
and to simplify:
 ogr2ogr -simplify 10 munis.shp bm50mv33sh1fpm1r170.shp
Doing so, the time to generate the map is divided by five or more.

Installing Kartograph

Since we only need for this tutorial,  first, download it from the github page clicking at the zip icon.
In a linux system, uncompress and execute 
python setup install
as a super user.
That's all, if you have the GDAL python bindings installed.

Creating the map

To create a map with Kartograph, you will need a configuration file in JSON format, which will have three basic sections:


To set the projection, there used to be a web page named Visual map configurator, that doesn't work any more. But don't worry, you can use the Map Projections page. Just choose the projection that fits you more, change the parameters and click the gear icon:
A dialog will open, and the lines that are interesting in this case are, in the image example, like:
         "proj": "sinusoidal",
        "lon0": 20
This will be translated in our json file as:
     "proj": {
            "id": "sinusoidal",
           "lon0": 20


The part of the world we want to represent is set here. It's quite well explained at the documentation, but it can be a bit confusing, and not all the options work with all the projections.
In our example, I have used:
  "bounds": {
    "mode": "bbox",
    "data": [-0, 40, 4, 43],
    "crop": [-3, 37, 5, 44]

  • mode: How the bounds are expressed. BBOX is the basic option, but you can also set it defining the points you want to enter in the map, or even the features in a layer. If the layers are in different projections, other modes can be a little tricky.
  • data: In our case, the bounding box. In other modes, the layer name, the points, or whatever.
  • crop: Is an optional label. Our svg will be clipped at the bounds set at data, but all the data in the files will be processed. If the files include all the world, this takes a long time, and generates much bigger SVG outputs. With crop, only the features inside the BBOX will be included.


As the name suggests, the layers to include. 
The shapefiles are added as:
       "src": "./mun_out.shp"
There are also two special layer, graticule and sea. The first draws the meridians ans parallels, while the second does nothing more than giving a feature to draw the background:
   "background": {"special": "sea"},
   "graticule":{ "special": "graticule", "latitudes": 1, "longitudes": 1}

All the layers  will be drawn in the order indicated at the json file, so this must be well chosen to select which layer hides what.


This is the nice part. Without styling, the SVG can be used directly with Inkscape or Kartograph.js, but is possible to generate styled maps directly with
You can give the style either in the json file or in a separate css file, which seems cleaner. The names given to the layer are the ones to be used in the css as the id. So to give a style to the municipalities layer, add
#municipalities {
 fill: #FFF;
 stroke: #882222;
 stroke-width: 0.5px;
 stroke-opacity: 0.4;
The general options are at the documentation again. CSS for SVG is a little different from the one used in traditional html.
Since we want to paint the municipalities in a different color depending of the party who won the elections, we will use filters, like this one:
 fill: #99edff;

It would be nice to compare different fields i.e. CiU > PSOE, but this is not possible (at least, I haven't found how to do it), so I had to calculate the winner and put it in a field (called Winner, as you can see in the example)


There are two options to draw the map. A command line program is installed with the setup, called kartograph. 
To draw the styled map, just type
   kartograph elections.json --style elections.css -o elections.svg
But you can also include all this in a python program, so could generate the data and then the map. In our case, the code would be
from kartograph import Kartograph
from kartograph.options import read_map_descriptor
import sys
K = Kartograph()
css = open("elections.css").read()
cfg = read_map_descriptor(open("elections.json"))
K.generate(cfg, outfile='elections.svg', format='svg', stylesheet=css) 
Finally, I edited the svg file with Inkscape to put the titles and legend. Is just to show that the idea is generating a base svg and from there, draw the pretty final map.

Configuration files

To draw the map in the example, I have used the following files:
"proj": {
        "id": "sinusoidal",
        "lon0": 20
   "layers": {
   "background": {"special": "sea"},
   "graticule":{ "special": "graticule", "latitudes": 1, "longitudes": 1, "styles": { "stroke-width": "0.3px" } },
       "src": "data/ne_10m_admin_0_countries2.shp"
      "src": "data/Trees2.shp",
      "simplify": true
      "src": "data/Crops2.shp",
      "simplify": true
   "depth": {
       "src": "data/DepthContours2.shp",
       "simplify": true
       "src": "./mun_out.shp"
  "bounds": {
    "mode": "bbox",
    "data": [-0, 40, 4, 43],
    "crop": [-3, 37, 5, 44]

#background {
 fill: #e8f9fb;
 stroke: none;
#world {
 fill: #f5f3f2;
 stroke: none;
#graticule {
 stroke-width: 0.3px;
#municipalities {
 fill: #FFF;
 stroke: #882222;
 stroke-width: 0.5px;
 stroke-opacity: 0.4;
#municipalities-label {
 font-family: Arial;
 font-size: 13px;
 fill: #99edff;
 fill: #ff9999;
 fill: #EDE61A;
#depth {
 stroke: #223366;
 stroke-width: 0.5px;
 stroke-opacity: 0.4;
#trees {
  fill: #d2f8c0;
  stroke: none;
#crops {
  fill: #fcf8d8;
  stroke: none;

What's next

If I have time, I'll try my first Kartograph.js example. From the svg generated, is possible to create cool interactive maps.

Saturday, November 10, 2012

Fast tip: Changing the geotransform with GDAL python

Preparing the next post, I have found a file with a wrong geotransform, but not an easy tool to do it.
Coding it is as easy as opening the datasource with the update option and setting the new geotransform as follows:
ds = gdal.Open( fileIn, GA_Update )
ds = None
Where the geotransform must be a tuple like (-180.5, 1.0, 0.0, 90.5, 0.0, -1.0). Take a look to the documentation for more information.

I have created a small program to make it even easier, called
from osgeo import gdal
from osgeo.gdalconst import *
import sys

def changeGeotransform(fileIn, geotransform):
    Takes a dataset, and changes its original geotransform to an arbitrary geotransform
    ds = gdal.Open( fileIn, GA_Update )
    #Exit if there is an error opening the dataset. GDAL will output the error message
    if ds is None: 
    ds = None

def helpText():
    print "To change the file Geotransform, the command is python changeGeotransform  , where the new geotransform must be like '(left_value,delta_x,rotation_x,top_value,rotation_y,delta_y)'"
def geotransformError(inputGeotransform):
    print "Error reading the geotransform: " + inputGeotransform + " it must be like (-180.5, 1.0, 0.0, 90.5, 0.0, -1.0)"

if __name__ == "__main__":
    Program mainline
    # Parse command line arguments.
    argv = gdal.GeneralCmdLineProcessor( sys.argv )
    if len(argv) != 3:
    fileIn = argv[1]
        geotransform = eval(argv[2])
    except Exception, ex:
    if type(geotransform) != type(()):
    changeGeotransform(fileIn, geotransform)
To execute, just type:
python changeGeotransform file_name new_geotransform
 where the new geotransform must be like:
The program basically tests if the inputs are correct and executes the code I have commented before.
Of course, only the files with drivers able to write the format can be changed. Type  gdalinfo --formats to know which drivers do you have installed.

Tuesday, October 9, 2012

IRIS driver for GDAL

IRIS is a software by VAISALA (formerly by Sigmet) to manage and visualize meteorological radar data. It has its own data format, which is pretty well documented at their web page.

Since I've worked for a long time at the SMC, I've had to work with these kind of stuff many times, developing scripts and programs to change its format so it was possible to draw the radar data.

Besides, I started using GDAL quite a lot, and wondered why it wasn't possible to open IRIS data files directly. Since nobody did a driver for it, I decide to do it myself after reading the GDAL driver tutorial, which seemed easier than what it actually was to me.

But finally I've managed to create the driver. When compiled into the GDAL distribution, the driver allows to change IRIS data format to GTiff or any other using gdal_translate, opening the file using QGIS, serving the data with Mapserver, etc. So your life will become much easier if you have to work with IRIS data files, without having to code the format translation yourself.

This first version comes with a CAPPI example file to test it,  and supports the following data types:
  • PPI (reflectivity and speed): Plan position indicator
  • CAPPI: Constant Altitude Plan position indicator
  • RAIN1: Hourly rainfall accumulation
  • RAINN: N-Hour rainfall accumulation
  • TOPS: Height for selectable dBZ contour
  • VIL: Vertically integrated liquid for selected layer
  • MAX: Column Max Z WF W/NS Sections

I've submitted the driver at the GDAL trac so they can include it to the software if they find it convenient: and commented it at their mailing list.
 But if you are impatient (or the driver isn't finally included), it is possible to add it to your GDAL after re-compiling it. The instructions:
  • Download the latest GDAL source and the driver source code.
  • Copy the iris folder into the frmts folder in your GDAL distribution.
  • Then following the steps at the GDAL driver tutorial:
    •  Edit the file gcore/gdal_frmts.h  and add the driver at the end of the list, which will look like:
      void CPL_DLL GDALRegister_IRIS(void);
    • Edit the file frmts/gdalallregister.cpp and register the driver adding a piece of code:
      #ifdef FRMT_iris
    • Edit the files GDALmake.opt and and look for the line containing GDAL_FORMATS =, adding the word iris to the formats list in the variable.
    • Edit frmts/ and change the EXTRAFLAGS = list adding
  •  Once all this is done, just compile GDAL as usual:
    ./configure (add the flags you need here, IRIS format does not need any extra flag)
    make install
 And that's all. As you can see, it would be much easier if the driver was included in the GDAL distribution!
If you have any comment or problem, please write it in the comments.
The latest SMC radar image processed from the IRIS data format

Friday, July 6, 2012

Testing inverse of the distance interpolation with AEMet pressure data

I like testing the things I learn with actual data. Some time ago, I wrote an entry about how to use the inverse of the distance to interpolate data from scattered points. Now, I will show how to use it to draw a sea level pressure map from the data given by the Spanish Meteorology Agency. The Agency has a nice Open Data section that it's worth a visit if you are interested in meteorology as I am.

As usual, I have uploaded all the files you can need to check the entry.
Getting the data
The station data from the AEMet is available in their ftp server. 
In the example, an FTP connection to the AEMet server has to be done, and only the sea level pressure is parsed. If the actual pressure is used, the interpolation will be wrong, since the altitude won't be considered. AEMet gives the corrected pressure directly.
The file format is quite easy to understand and parse. It's a compressed CSV file, with the variable names indicated in each value. Not all the stations have pressure data, and the file's directory changes it's name with the date, so the script has to do some checks:

from ftplib import FTP
import gzip

Gets the last sea level pressure values from the AEMet service.
def get_data():
 ftp = FTP('')   # connect to host, default port
 ftp.login()               # user anonymous, passwd anonymous@
 ftp.cwd('datos_observacion/observaciones_diezminutales') #Getting in the 10 minute datadirectory

 files = ftp.nlst()  # list directory contents
 #Getting the newest day directory. It must include the word "diezminutales". The name includes the date, so ordering it by name, the last one is also the newest one.
 directories = []
 for file_name in files:
     if file_name.endswith('diezminutales'):


        #Getting the last file, ordered by name.
 files = ftp.nlst()
 data_files = []
 for file_name in files:
     if file_name.endswith('gz'):


 data_file = data_files.pop()
 ftp.retrbinary("RETR " + data_file, open(data_file, 'wb').write)
 #The files are compressed with gzip. Getting the contents into a variable.
 f =, 'rb')

 data = {'lon':[],'lat':[],'press':[]}
        #Parsing the file line by line
 for line in f:
     station_values = line.split(',')
     #Some valeus are always in the same position
     lat = float(station_values[2])
     lon = float(station_values[3])
     #Other values must be found using their label name, and not all the stations give pressure data
     press = None
     for var in station_values:
  if ('PRES_nmar=' in var) is True and ('QPRES_nmar=' in var) is False:
      press = float(var.replace('PRES_nmar=',''))
     if press != None:

 return data 
  • The script defines a function for using it from other files, as we will in the next section.
  • We will use the latest data, so is necessary to order the directory and file names to guess which file to download. To understand it better, just connect to the ftp and check how is organized. 
  • The data is compressed, so I use gzip to open it.
  • Finally, only one of the values is used, so it's chosen. Other maps could be done from the same data set, although interpolating some variables, like temperature,  this way doesn't make sense.

Viewing the data 

To represent the data in the example, we will create a GeoTiff layer for the interpolated raster, and a Shapefile fot the point data. Then, using QGIS, we will draw the map. The coasline has been downloaded from the NOAA coastline extractor.
from get_data import get_data
from invdistgis import invDist
from osgeo import osr
from osgeo import ogr

def writeShape(data,proj,outFile):
    driverName = "ESRI Shapefile"
    drv = ogr.GetDriverByName( driverName )
    dsOut = drv.CreateDataSource( outFile )  
    outLayer = dsOut.CreateLayer('PRESS', None, ogr.wkbPoint)  
    field = ogr.FieldDefn( "PRESS", ogr.OFTReal )
    outLayer.CreateField ( field )
    for i in range(0,len(data['press'])):
        #print str(i) + " - " + str(data['lon'][i])+" "+str(data['lat'][i])
        geometry = ogr.CreateGeometryFromWkt("POINT("+str(data['lon'][i])+" "+str(data['lat'][i])+")")
        feature = ogr.Feature( outLayer.GetLayerDefn())
        feature.SetField( 'PRESS', data['press'][i] )


    dsOut = None

def createMap():
    #Getting the data from the function
    data = get_data()
    proj = osr.SpatialReference()
if __name__ == "__main__":
  • I have created a function to draw the shapefile, so it's easier to understand how to do it.
  • If you run the script and a shapefile already exists, the GDAL will throw an error. I think that is a kind of bug... or feature!
  • To get the data, the script is using the function described above.
  • The map creation uses the inverse of the distance function used in the entry that explained how to do it.
The final result looks like that:

Thursday, June 14, 2012

Density maps using GDAL/OGR python

Having a cloud of points is quite common. This tutorial shows how to calculate the density of points in a set of polygons.
As always, you can download all the data and scripts.

The data

Seeking some data for the tutorial, I found  the  position of all the tornado touchdowns in the period 1950-2004. Calculating the "tornado density" in the US seemed interesting enough.
You can find the data at the site.

The density can be calculated in a regular grid or a set of polygons. We will do both things, using the US counties as the polygons, which I found at

So the downloaded data looks like that when opened in QGIS:
It's easy to see that most of the tornadoes occur in the east half of the USA, but it's still difficult to see actual density differences.

Tornado density in every US county 

First, a script to assign to every county a "tornado touchdown density" value. What the script will do is basically:
  1. Open the input datasets: The tornado points file, and the counties file
  2. Create the output file, and the fields we will use (name and state name, plus the density field)
  3. For each county, we use the method Within for all the points representing the tornadoes. 
  4. We calculate the density. Since the file is in latlon projection, the GetArea method would be in square degrees, which is incorrect when comparing, because the actual area would be much smaller in northern regions.
    So we re project the county geometry to the EPSG:900913 projection, which is in meters. I think that is actually incorrect, that the UTM would be better, but the zone changes too much and it's out of the scope of the entry.
    The density is simply the number of tornadoes divided by the county area.
  5. We write the output file.
So the code looks like:
from osgeo import ogr
from osgeo import osr
import sys

def density(pointsFile,polygonsFile,outFile):
    #Opening the input files
    dsPolygons = ogr.Open(polygonsFile)
    if dsPolygons is None:
       raise Exception('Could not open ' + pointsFile)
    dsPoints = ogr.Open(pointsFile)
    if dsPoints is None:
       raise Exception('Could not open ' + pointsFile)
    polygonsLayer = dsPolygons.GetLayer()
    pointsLayer = dsPoints.GetLayer()

    #Creating the output file

    driverName = "ESRI Shapefile"
    drv = ogr.GetDriverByName( driverName )
    dsOut = drv.CreateDataSource( outFile )  
    outLayer = dsOut.CreateLayer('DENSITY', None, ogr.wkbPolygon)  
    field = ogr.FieldDefn( "Density", ogr.OFTReal )
    outLayer.CreateField ( field )
    field = ogr.FieldDefn( "Name", ogr.OFTString )
    outLayer.CreateField ( field )
    field = ogr.FieldDefn( "State_name", ogr.OFTString )
    outLayer.CreateField ( field )
    #Preparing the coordinate transformation
    srsQuery = osr.SpatialReference()
    srsArea = osr.SpatialReference()

    transf = osr.CoordinateTransformation(srsQuery,srsArea)

    #Iterating all the polygons
    polygonFeature = polygonsLayer.GetNextFeature()
    while polygonFeature:
        k = k + 1
        print  "processing " + polygonFeature.GetField("NAME") + " ("+polygonFeature.GetField("STATE_NAME")+") - " + str(k) + " of " + str(polygonsLayer.GetFeatureCount())
        geometry = polygonFeature.GetGeometryRef()
        numCounts = 0.0
        pointFeature = pointsLayer.GetNextFeature()
        #Iterating all the points
        while pointFeature:
            if pointFeature.GetGeometryRef().Within(geometry):
                numCounts = numCounts + 1
            pointFeature = pointsLayer.GetNextFeature()
        #Calculating the actual area
        geometryArea = geometry.Clone()
        polygonArea = geometryArea.GetArea()/(1000000.0)
        density = numCounts/polygonArea

        #Writting the fields in the output layer
        feature = ogr.Feature( outLayer.GetLayerDefn())
        feature.SetField( "Density", density )
        feature.SetField( "Name", polygonFeature.GetField("NAME") )
        feature.SetField( "State_name", polygonFeature.GetField("STATE_NAME") )


        polygonFeature = polygonsLayer.GetNextFeature()

    dsOut = None
    dsPolygons = None
    dsPoints = None

if __name__ == "__main__":
    points = sys.argv[1]
    polygons = sys.argv[2]
    outfile = sys.argv[3]


To run the script, just type:
python "data/tornadx020.shp" "data/UScounties.shp" "out.shp"

Note that the script isn't for any general case. Different input projections should be handled, and the input fields should be different according to the file. Besides, no geometry checks are done.

The result, opening and colouring the file with QGIS looks like that:
 Now is easy to see that the central USA and Florida have the highest tornado density.

Tornado density in a grid

Sometimes we don't have a particular set of polygons of interest, but a raster where every pixel has the density value is an interesting data.
In this case, the output is created using GDAL instead of OGR, and is a regular grid defined by the user. But the steps are very similar.
As in the example before, the areas in each pixel are in square degrees, and this must be changed to square kilometers.
The code is:
from osgeo import ogr
from osgeo import osr
from osgeo import gdal
from struct import pack
import sys

def density(pointsFile,outFile,x0,y0,x1,y1,samples,lines):
    deltax = (x1 - x0) / samples  
    deltay = (y1 - y0) / lines

    dsPoints = ogr.Open(pointsFile)
    if dsPoints is None:
       raise Exception('Could not open ' + pointsFile)
    pointsLayer = dsPoints.GetLayer()

    #Creating the output file

    driverName = "ESRI Shapefile"
    drv = ogr.GetDriverByName( driverName )
    srsQuery = osr.SpatialReference()
    srsArea = osr.SpatialReference()

    transf = osr.CoordinateTransformation(srsQuery,srsArea)

    driverName = gdal.GetDriverByName( 'GTiff' )
    dsOut = driverName.Create( outFile, samples, lines, 1, gdal.GDT_Float32)

    #Iterating all the pixels to write the raster values
    dataString = ''
    for j in range(0,lines):
      for i in range(0,samples):
 print "sample " + str(i) + " line " + str(j)
        px = x0 + deltax * i
        py = y0 + deltay * j
        #The pixel geometry must be created to execute the within method, and to calculate the actual area
        wkt = "POLYGON(("+str(px)+" "+str(py)+","+str(px + deltax)+" "+str(py)+","+str(px + deltax)+" "+str(py+deltay)+","+str(px)+" "+str(py+deltay)+","+str(px)+" "+str(py)+"))"
        geometry = ogr.CreateGeometryFromWkt(wkt)
        numCounts = 0.0
        pointFeature = pointsLayer.GetNextFeature()
        #Iterating all the points
        while pointFeature:
            if pointFeature.GetGeometryRef().Within(geometry):
                numCounts = numCounts + 1
            pointFeature = pointsLayer.GetNextFeature()
        geometryArea = geometry.Clone()
        polygonArea = geometryArea.GetArea()/(1000000.0)
        density = numCounts/polygonArea
        dataString = dataString + pack('f',density)
    #Writting the raster
    dsOut.GetRasterBand(1).WriteRaster( 0, 0, samples, lines, dataString ) 

    dsOut = None
    dsPoints = None
if __name__ == "__main__":
    points = sys.argv[1]
    outfile = sys.argv[2]


To run the script, type:

python "data/tornadx020.shp" "grid.tiff" -127.0 51.0 -65.0 24.0 50 50

Where the numbers are the coordinates and the size of the output raster as explained in the code.

Again, the script lasts quite a lot to run.

The result seen with QGIS looks like that:
Is similar to the counties map, because both pixels and counties have similar sizes. The result may be easier to handle in this way.

Final thoughts

In the example data, executing the script takes a long time, since 53000 points are given, and there are about 3500 counties. Using PostGIS or so would be faster, but the example doesn't require more installations than GDAL and python.

It would be nice improve the code, creating all the functionalities in a class, checking the input values, accepting different projections, etc.

Remember that all the example is available in a zip file.

Saturday, May 26, 2012

Creating a grid from scattered data using inverse of the distance with python (gdal_grid approach)

OK, I have to admit that I was so happy when I found the  scipy rbf function that I went too fast writing the entry about inverse of the distance.
I swear that the images are real (you can check it yourself), but when you change a few parameters or the point position, the result is really strange, definitely not the inverse of the distance algorithm I was thinking about.
Looking a little more around the gdal documentation, I found this explanation, which is very easy and clear:
The problem is that, at least in my python gdal version, is not possible to call the function directly, so I've been coding a little to get it.
All the example code and files can be downloaded in a zip file.

First version: the algorithm

Putting all the code directly makes things difficult to understand, so first,  I'll reproduce the same program I wanted to show in the entrance, using matplotlib to show the results:

from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x,y,power,smoothing,xv,yv,values):
    for i in range(0,len(values)):
        dist = sqrt((x-xv[i])*(x-xv[i])+(y-yv[i])*(y-yv[i])+smoothing*smoothing);
        #If the point is really close to one of the data points, return the data point value to avoid singularities
            return values[i]
    #Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator/denominator
        value = -9999
    return value

def invDist(xv,yv,values,xsize=100,ysize=100,power=2,smoothing=0):
    valuesGrid = np.zeros((ysize,xsize))
    for x in range(0,xsize):
        for y in range(0,ysize):
            valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
    return valuesGrid

if __name__ == "__main__":

    #Creating some data, with each coodinate and the values stored in separated lists
    xv = [10,60,40,70,10,50,20,70,30,60]
    yv = [10,20,30,30,40,50,60,70,80,90]
    values = [1,2,2,3,4,6,7,7,8,10]
    #Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)

    #Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv,yv,values,100,100,power,smoothing)

    # Plotting the result
    n = plt.normalize(0.0, 100.0)
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
Let's explain it step by step:
  1. The main method defines some sample data, calls the interpolation function and draws it using matplotlib. Just copy&paste, since it's long to explain.
  2. invDist function: For each point in the coordinates matrix, a value needs to be calculated. First, a xSize*ySize zeroes values is calculated. Then for each point coordinate (the pixel position, in this simple case), a function that calculates the value is called.
  3. This is the important part: We apply the formula: $V=\frac{\sum\limits_{i=1}^n \frac{v_{i}}{d_{i}^{p}}}{\sum\limits_{i=1}^n \frac{1}{d_{i}^{p}}}$ for each pixel.
    • V is the interpolated value. 
    • n is the number of points
    • d is the distance from the point to the pixel
    • v is the value of the point
    • p is the power we want to apply to the distance o weight it.
    The distance is the Cartesian one, plus the smoothing factor (it gives more distance than the actual one, lowering the influence around the points). If the distance is close to the precision of the float numbers, we give the data value instead of the interpolated one, to avoid strange results.
And that's all. Here, the results (you can compare them with the ones in the other entry) which are consistent to the inverse of the distance method:
 If the smoothing is zero and the power is high, the interpolation changes a lot around the points to give them their exact value.
If the smoothing is high and the power is one, the result is much smoother, but the values at the points are not maintained.

Adding the GIS stuff:

The example above doesn't write to a GIS file. The function invDist must be changed so it creates the file using a gdal driver:
def invDist(xv,yv,values,geotransform,proj,xSize,ySize,power,smoothing,driverName,outFile):
    #Transform geographic coordinates to pixels
    for i in range(0,len(xv)):
         xv[i] = (xv[i]-geotransform[0])/geotransform[1]
    for i in range(0,len(yv)):
         yv[i] = (yv[i]-geotransform[3])/geotransform[5]
    #Creating the file
    driver = gdal.GetDriverByName( driverName )
    ds = driver.Create( outFile, xSize, ySize, 1, gdal.GDT_Float32)
    if proj is not None:
    valuesGrid = np.zeros((ySize,xSize))
    #Getting the interpolated values
    for x in range(0,xSize):
        for y in range(0,ySize):
            valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
    ds = None
    return valuesGrid
  1. Note that the function will  need the geographic parameters geotransform and proj, as well as the output file name and driver.
  2. First, the point positions must be changed to pixel locations in order to calculate distances
  3. Then, the output file is created with the size and projection asked by the user
  4. The values are calculated in the same way as in the other example
  5. The file is written
The data in the first example was set into three lists, but the correct way to do it would be read it from a geometry file with the OGR library. Another function is defined to do that:

def readPoints(dataFile,Zfield='Z'):
    data = {}
    ds = ogr.Open(dataFile)
    if ds is None:
       raise Exception('Could not open ' + dataFile)
    layer = ds.GetLayer()
    proj = layer.GetSpatialRef()
    extent = layer.GetExtent()

    feature = layer.GetNextFeature()
    if feature.GetFieldIndex(zField) == -1:
         raise Exception('zField is not valid: ' + zField)

    while feature:
        geometry = feature.GetGeometryRef()

        feature = layer.GetNextFeature()
    data['extent'] = extent 
    data['proj'] = proj
    ds = None
    return data

  1. The function reads the point position and values from the given file and field name.
  2. A dictionary is returned, to help retrieving all the values we may need to create the output raster. If no projection or geotransform is passed by the user, the input file extents and projection are used.
  3. In the example files, a file called points.shp has some sample points to test the functions.
Finally, the main method must get the  parameters from the user, and call the other two methods to create the raster. I have copied some of the input options from the gdal_grid program, so it's easier to test the results. Of course, not all the options are implemented, but it shouldn't be difficult to do it.

if __name__ == "__main__":
    #Setting the default values

    geotransform = None

    #Parsing the command line
    argv = gdal.GeneralCmdLineProcessor( sys.argv )
    i = 1
    while i < len(argv):
        arg = argv[i]
        if arg == '-out_format':
         driverName = argv[i+1]
         driverName = driverName.replace("'","")
         driverName = driverName.replace('"','')
         i = i + 1
        elif arg == '-zfield':
         zField = argv[i+1]
         zField = zField.replace("'","")
         zField = zField.replace('"','')
         i = i + 1
        elif arg == '-a_srs':
         proj = argv[i+1]
         proj = proj.replace("'","")
         proj = proj.replace('"','')
         i = i + 1
        elif arg == '-outsize':
         xSize = argv[i+1]
         xSize = xSize.replace("'","")
         xSize = int(xSize.replace('"',''))
         ySize = argv[i+2]
         ySize = ySize.replace("'","")
         ySize = int(ySize.replace('"',''))
         i = i + 2
        elif arg == '-txe':
         xMin = argv[i+1]
         xMin = xMin.replace("'","")
         xMin = float(xMin.replace('"',''))
         xMax = argv[i+2]
         xMax = xMax.replace("'","")
         xMax = float(xMax.replace('"',''))
         i = i + 2
        elif arg == '-tye':
         yMin = argv[i+1]
         yMin = yMin.replace("'","")
         yMin = float(yMin.replace('"',''))
         yMax = argv[i+2]
         yMax = yMax.replace("'","")
         yMax = float(yMax.replace('"',''))
         i = i + 2
        elif dataFile is None:
         dataFile = arg
         dataFile = dataFile.replace("'","")
         dataFile = dataFile.replace('"','')
        elif outFile is None:
         outFile = arg
         outFile = outFile.replace("'","")
         outFile = outFile.replace('"','')
        i = i + 1

    if dataFile is None or outFile is None:
        data = readPoints(dataFile,zField)
    except Exception,ex:
        print ex
    if xMin is None:
    if yMin is None:


    if proj is None:
        proj = data['proj'] 
            proj = osr.SpatialReference()
        except Exception,ex:
            print ex 
    #Creating the interpolation function and populating the output matrix value
        ZI = invDist(data['xv'],data['yv'],data['values'],geotransform,proj,xSize,ySize,power,smoothing,driverName,outFile)
    except Exception,ex:
        print ex

  1. First, the default parameter values are set. Some of them are None, since they are mandatory.
  2. Then, the input arguments are read.
  3. After having the input arguments, all the parameters that can be set, are set.
  4. The function that reads the input file is run, and with it's result, the raster is created. Note that the geotransform is defined from the input file if no one is set by the user.
And the result is something like this when opened with QGis or any GIS:
The points are not coloured, but are the ones in the original file.
The whole code comes in the example file, with all the data you need to run the example.
To generate the picture above, the command is:
 python -zfield value points.shp out.tiff

What's next

The original gdal_grid implements more interpolation methods (nearest value, and average). Doing the same in the script is as easy as adding other functions to use instead of pointValue.
Also, in the original program, an ellipse can be set to search only in some area around each pixel. This would be a little more difficult do implement, but is easy to find it in the gdal source code in a file called gdalgrid.cpp. I've put it in the example data.

As you can see, implementing this kind of stuff in python is not as difficult as it may seem at the beginning.

Thursday, May 24, 2012

Running GDAL Java

The GDAL Java bindings work very well, but the documentation is very very limited.
The API documentation is much better that the one in python, but it's quite difficult to find how to run a simple example.

Installation (Ubuntu)

In the Ubuntu distribution, there is a GDAL package, but the java bindings are not included. So it's necessary to compile the libraries to have it. 
If you have the gdal libraries already installed from the packages, is not necessary to remove them, since you can use just the compiled jar with the old libraries.
Before compiling, make sure that swig, libgeos-dev and proj4 packages are installed.
The, just run:
./configure --with-java --with-static-proj4=[]
Go to  the /path_to_gdal/swig/java directory and edit the file java.opt to set the correct JAVA_HOME variable, which is set to a Windows installation by default.
Then type make.
The gdal.jar file will be in the /path_to_gdal/swig/java directory.
The official docs to install the GDAL Java bindings are here:

Running the classes

GDAL java bindings use the native GDAL installation, so they make use of JNI. The Java Virtual Machine must find the GDAL binaries. To do so, an environment variable must be set. Froma the command line in linux, the order would be:
 export LD_LIBRARY_PATH=/path_to/gdal-1.9.0/swig/java
where the path is the one you have compiled.
If any coordinate transformation is used in the code, GDAL must find a file named gcs.csv. To find where is this file, just run:
gdal-config --datadir
Then, set the environment variable:
export GDAL_DATA= /path_to_gdal_data
or, even better, execute directly:
export GDAL_DATA=`gdal-config --datadir`
Don't forget to include the file gdal.jar in the classpath, and the class will run properly.
Anyway, is much easier to use an IDE to code in JAVA. Next two sections explain how.

 Running in Eclipse

Using the Eclipse IDE, is possible to configure the environment variables from the configuration window. To do it, click the small arrow next to the play button, and choose Run Configurations...
Once there, choose the Environment tab and add both GDAL_DATA and LD_LIBRARY_PATH using the new button.

Running in Netbeans

I haven't found a graphical solution for configuring both LD_LIBRARY_PATH and GDAL_DATA with Netbeans.
To set these environment variables, locate and open the file netbeans.conf. In my case (ubuntu 12.04), it was located at the path: /usr/local/netbeans-7.0.1/etc/
Add at the end of the file, the lines

export GDAL_DATA=/path_to/gdal/1.9
export LD_LIBRARY_PATH=/path_to/gdal-1.9.0/swig/java

Running in Apache Tomcat

If you are using Apache Tomcat to serve web pages, reading GIS files with GDAL can be a good idea. As in the other cases,  the environment variables must be set.
Edit /path_to_tomcat/bin/ (the path may change depending of your installation), and, as in NetBeans, add the lines:

export GDAL_DATA=/path_to/gdal/1.9
export LD_LIBRARY_PATH=/path_to/gdal-1.9.0/swig/java

The libraries must be installed as shared libraries, or the loading will crash if two instances are generated (a java.lang.UnsatisfiedLinkError will be launched).   So:
  • Create the directory: $CATALINA_HOME/shared/lib
  • Put into this directory the contents of  gdal_home/swig/java, which includes the gdal.jar file
  • Edit (in my case was the one under the conf directory) and change the shared.loader variable as:

Using Maven

gdal.jar is not in the maven repositories (as far as I know), so it must be added manually. Besides, if gdal has to be used in Tomcat in a specific directory, different from the one in maven.
So the pom.xml file has to be edited in the gdal section like this:

  Example class

If you want to check how to run a GDAL java class, you can use this example. The class prints the number of bands of the GIS file passed as a parameter:
import org.gdal.gdal.Dataset;
import org.gdal.gdal.gdal;
import org.gdal.gdalconst.gdalconstConstants;

 * Test class for GDAL java bindings 
public class Test {
 Dataset hDataset;
 int numBands;
 public Test(String filename){
  hDataset = gdal.Open(filename, gdalconstConstants.GA_ReadOnly);
  this.numBands = hDataset.getRasterCount();
  * @param args
 public static void main(String[] args) {
  if(args.length == 0){
   System.out.println("You must pass the file name as an argument");
  } else {
  Test instance = new Test(args[0]);

Saturday, March 3, 2012

Creating a grid from scattered data using inverse of the distance with python

Attention: I've found some problems in this post. Please read the new one about the same topic.

A usual problem is drawing a map from some scattered data. So it's necessary to set a value for each point in the map from the data in the points already known.
There are different methods used for this purpose, such as kriging or inverse of the distance. The second one is the one used in the example.
The python library scipy has a function called RBF that does that. The code:

import numpy as np
from scipy.interpolate import Rbf

import matplotlib.pyplot as plt

#Creating some data, with each coordinate and the values stored in separated lists
x = [10,60,40,70,10,50,20,70,30,60]
y = [10,20,30,30,40,50,60,70,80,90]
values = [1,2,2,3,4,6,7,7,8,10]

#Creating the output grid (100x100, in the example)
ti = np.linspace(0, 100.0, 100)
XI, YI = np.meshgrid(ti, ti)

#Creating the interpolation function and populating the output matrix value
rbf = Rbf(x, y, values, function='inverse')
ZI = rbf(XI, YI)

# Plotting the result
n = plt.normalize(0.0, 100.0)
plt.subplot(1, 1, 1)
plt.pcolor(XI, YI, ZI)
plt.scatter(x, y, 100, values)
plt.title('RBF interpolation')
plt.xlim(0, 100)
plt.ylim(0, 100)

When executing the script, the output is an image like this:
The Rbf function  takes as arguments the x and y axes, and a list of the values in the points.  A function to use has to be set as well. In our example, inverse of the distance is used, but there are other options, and even a custom function can be used.
Then an output matrix has to be defined. To do it, the new x and y axes must be defined. In our example, a 100x100 pixel matrix has been defined with the meshgrid function.
Finally, the output matrix is created with the function that was th output of the Rbf function. This ZI matrix is what we can use to draw a map.
In this example, there is only a simple plot, but in the next post a real case will be shown.

Sunday, February 19, 2012

Colorize raster with GDAL python

As I was telling in my last post, what we usually want is to take a raster file, classify it, and output a png in a color scale.
The easiest way to do it, is using gdaldem with the color-relief option.
But is not in python.
So, if you want to integrate it in a script, or modify it's behavior easily, here you have a solution.
All the example data, including the whole script, dataset, etc, can be downloaded here.

The script:

The first thing to do is reading the color file. As in the gdaldem application, the file is plain text with four columns, value, red value, green value, blue value, and, optionally, the alpha value. We will consider alpha value to be 255 if is not set. The comments are indicated with the hash. This format is the one used in the GRASS r.colors utility
So a function is defined:

def readColorTable(color_file):
    color_table = {}
    if exists(color_file) is False:
        raise Exception("Color file " + color_file + " does not exist")
    fp = open(color_file, "r")
    for line in fp:
        if line.find('#') == -1 and line.find('/') == -1:
            entry = line.split()
            if len(entry) == 5:
                alpha = int(entry[4])
    return color_table

Once the table is read, is necessary to read the band in the raster file:

from os.path import exists
from osgeo import gdal
from osgeo.gdalconst import GA_ReadOnly
from struct import unpack

data_types ={'Byte':'B','UInt16':'H','Int16':'h','UInt32':'I','Int32':'i','Float32':'f','Float64':'d'}
if exists(raster_file) is False:
    raise Exception('[Errno 2] No such file or directory: \'' + raster_file + '\'')    
dataset = gdal.Open(raster_file, GA_ReadOnly )
if dataset == None:
    raise Exception("Unable to read the data file")
band = dataset.GetRasterBand(raster_band)
values = band.ReadRaster( 0, 0, band.XSize, band.YSize, band.XSize, band.YSize, band.DataType )
values = unpack(data_types[gdal.GetDataTypeName(band.DataType)]*band.XSize*band.YSize,values)

Notice that the data types in the struct command and in the GDAL library haven't got the same names, so a dict has to be created to translate them. (Maybe there's a better way to do it, but this one works) A band has to be set. If the file has got only one band (as in our example), the value must be one. Then, two images must be created, one for the colors, an the other for the alpha channel:

import Image
import ImageDraw
base = 'RGBA', (band.XSize,band.YSize) )
base_draw = ImageDraw.Draw(base)
alpha_mask ='L', (band.XSize,band.YSize), 255)
alpha_draw = ImageDraw.Draw(alpha_mask)

Now, the classification is done. There are two modes, the one that gives the exact color defined in the color file, and the one that calculates the color according to the value, and the two nearest colors in the palette. Here, to make more readable, only the exact color is shown:

import Image
color_table = readColorTable(color_file)
classification_values = color_table.keys()
for pos in range(len(values)):
        y = pos/band.XSize
        x = pos - y * band.XSize
        for index in range(len(classification_values)):

            if values[pos] <= classification_values[index] or index == len(classification_values)-1:
                if index == 0:
                        index = 1
                    elif index == len(classification_values)-1 and values[pos] >= classification_values[index]:
                        index = index + 1
                    color = color_table[classification_values[index-1]]
                    base_draw.point((x,y), (color[0],color[1],color[2]))
                    base_draw.point((x,y), (int(r),int(g),int(b)))

The first thing to do is getting the values of the color table. Then, for each pixel, the pixel position is calculated. To know the color value to be used, is necessary to iterate all the values in the color table to see which one is just over the pixel value. There is a special case if the pixel value is under the minimum value of the scale. Then, the pixel is set in the color image and the alpha channel. See the break statement to exit the loop.
Once the image and the alpha channel are calculated, the alpha channel has to be blend with the color channel before saving the result:

color_layer ='RGBA', base.size, (255, 255, 255, 0))
base = Image.composite(color_layer, base, alpha_mask)

And that's all.
In the example, everything is packed in a function, and is possible to call it from the command line.
To calculate the color in the -nearest_color_entry mode, the code would be like this for each channel:

r = color_table[classification_values[index-1]][0] + (values[pos] - classification_values[index-1])*(color_table[classification_values[index]][0] - color_table[classification_values[index-1]][0])/(classification_values[index]-classification_values[index-1])

Of course, there is also the special case when the pixel value is lower than the lowest value in the scale or higher than the higher one. The solution is in the script.

Running the code:

We will use the same data of the last post (elevation data from the USGS).
All the sample data and the script is here.
A color file example could be this one (for our elevation data)

1600 166 122 59
1900 180 165 98
2200 188 189 123
2500 74 156 74
2800 123 189 83
3100 165 205 132
3300 255 255 255
To create the file, just type:
python -exact_color_entry w001001.adf colorfile.clr out.png
python -nearest_color_entry w001001.adf colorfile.clr out.png

Sample output with the -exact_color_entry option
Sample output with the default behaviour
It is possible to call the function from another python script just by typing

from colorize import raster2png
except Exception, ex:
    print "Error: " + str(ex) 

If discrete is True, the output is with the exact color in the palette. If is False, with the continuous mode.

Monday, February 6, 2012

Raster classification with GDAL Python

There is a new entry about this topic, with a much more efficient code:

Classifying a raster means assigning a set of discrete values from the original continuous raster data. You can think about it as colorizing a raster file using data intervals, but it has many other uses, of course.

We will use the  Seamless Data Warehouse page from the USGS to get some data. Just open the following page and use the download section tools to get the data from the place you prefer:
The result is a file in the sub folder with a number.

So be sure that the GDAL python bindings are installed and execute the script:

#! /usr/bin/python

#Change the value with your raster filename here
raster_file = 'w001001.adf'
output_file = 'classified.tiff'

classification_values = [0,500,1000,1500,2000,2500,3000,3500,4000] #The interval values to classify
classification_output_values = [10,20,30,40,50,60,70,80,90] #The value assigned to each interval

from osgeo import gdal
from osgeo.gdalconst import *
import numpy
import struct

#Opening the raster file
dataset = gdal.Open(raster_file, GA_ReadOnly )
band = dataset.GetRasterBand(1)
#Reading the raster properties
projectionfrom = dataset.GetProjection()
geotransform = dataset.GetGeoTransform()
xsize = band.XSize
ysize = band.YSize
datatype = band.DataType

#Reading the raster values
values = band.ReadRaster( 0, 0, xsize, ysize, xsize, ysize, datatype )
#Conversion between GDAL types and python pack types (Can't use complex integer or float!!)
data_types ={'Byte':'B','UInt16':'H','Int16':'h','UInt32':'I','Int32':'i','Float32':'f','Float64':'d'}
values = struct.unpack(data_types[gdal.GetDataTypeName(band.DataType)]*xsize*ysize,values)

#Now that the raster is into an array, let's classify it
out_str = ''
for value in values:
    index = 0
    for cl_value in classification_values:
        if value <= cl_value:
            out_str = out_str + struct.pack('B',classification_output_values[index])
        index = index + 1
#Once classified, write the output raster
#In the example, it's not possible to use the same output format than the input file, because GDAL is not able to write this file format. Geotiff will be used instead
gtiff = gdal.GetDriverByName('GTiff') 
output_dataset = gtiff.Create(output_file, xsize, ysize, 4)

output_dataset.GetRasterBand(1).WriteRaster( 0, 0, xsize, ysize, out_str ) 
output_dataset = None

  • To classify the values is necessary to check for each classification value, until it fits to the interval. 
  • Look also how the projection is kept. The same for the geotransform.
  • If the input file format can be written by GDAL, it's possible to get it with:
    driver = dataset.GetDriver()
  • Look how the GDAL data types are managed. The names are different from the python types.
To see what has occurred,  the best is using QGIS.
Just open both files and you will see that the original file looks something like that once coloured with pseudocolor:
 And how the classified file looks like.

A nice exercice would be generating the coloured png's directly from the python script from a color file. Maybe the next post.

Tuesday, January 3, 2012

Creating files with OGR, GDAL and python

Although  the way to create files is quite well documented in the official gdal tutorials, it's quite useful to have it summarized.

Creating a GDAL file

To create a raster file using GDAL is possible to use different techniques. The simplest is to use the Create statement.

from osgeo import gdal
from osgeo import osr
from osgeo.gdalconst import *
driver = gdal.GetDriverByName( 'GTiff' )
ds = driver.Create( 'out_file.tiff', 255, 255, 1, gdal.GDT_Int32)
proj = osr.SpatialReference()
proj.SetWellKnownGeogCS( "EPSG:4326" )
geotransform = (1,0.1,0,40,0,0.1)
ds = None
The three first lines are the library imports. First, a Driver must be declared. This determines the format of the created file. The official GDAL page has the complete supported formats list. Look if the creation option is 'YES' The second step is to create the datasource. The first parameter is the ouput file name. The next two are the raster dimensions. Then, the number of layers. The last parameter is the data type of the raster. The next steps configure the projection and are optional, but highly recommended. The easyest way to declare a projection, in my opinion, is to use the EPSG code. The geotransform defines the relation between the raster coordinates x, y and the geographic coordinates, using the following definition:

Xgeo = geotransform[0] + Xpixel*geotransform[1] + Yline*geotransform[2]
Ygeo = geotransform[3] + Xpixel*geotransform[4] + Yline*geotransform[5]
The first and fourth parameters define the origin of the upper left pixel The second and sixth parameters define the pixels size. The third and fifth parameters define the rotation of the raster. After doing all the operations, is a good practice to set the datasource as None to close it properly.

Creating an OGR file

Creating a vector file is as easy as creating a raster:

from osgeo import ogr
from osgeo import osr
drv = ogr.GetDriverByName( 'ESRI Shapefile ' )
dst_ds = drv.CreateDataSource( 'out_file.shp' )
proj = osr.SpatialReference()
proj.SetWellKnownGeogCS( "EPSG:4326" )
dst_layer = dst_ds.CreateLayer('TOWNS', srs = proj )
dst_ds = None
After the imports, we declare de Driver, choosing the format from the official list (be careful to use a format with the creation option set to YES). To create the datasource, only the output file name is needed. Sometimes it will fail if an other file with the same name already exists, so it's better to check it before executing this statement. Is good, but not mandatory, to set the projection information. As in the raster case, the easiest way is from the EPSG code. The projection is set to each layer with the CreateLayer method. After doing all the operations, is a good practice to set the datasource as None to close it properly.

Creating temporary memory files 

Sometimes is useful to have a file in memory. If you want to import some vector data and analyze it with the ogr functions, you may not want to create the actual file. This can be done in a raster file with GDAL or a vector file in OGR.

When using GDAL, the Driver name must be declared as 'MEM', and the data source must be created with a null name , using ''. The example creates a 255x255 raster with integer values:

driver = gdal.GetDriverByName( 'MEM' )
ds = driver.Create( '', 255, 255, 1, gdal.GDT_Int32)

In OGR, you must declare the Driver used as 'Memory', and when creating the Datasource, use the name 'out':

drv = ogr.GetDriverByName( 'Memory' )
dst_ds = drv.CreateDataSource( 'out' )