Thursday, June 14, 2012

Density maps using GDAL/OGR python

Having a cloud of points is quite common. This tutorial shows how to calculate the density of points in a set of polygons.
As always, you can download all the data and scripts.

The data

Seeking some data for the tutorial, I found  the  position of all the tornado touchdowns in the period 1950-2004. Calculating the "tornado density" in the US seemed interesting enough.
You can find the data at the data.gov site.

The density can be calculated in a regular grid or a set of polygons. We will do both things, using the US counties as the polygons, which I found at wingis.org

So the downloaded data looks like that when opened in QGIS:
It's easy to see that most of the tornadoes occur in the east half of the USA, but it's still difficult to see actual density differences.

Tornado density in every US county 

First, a script to assign to every county a "tornado touchdown density" value. What the script will do is basically:
  1. Open the input datasets: The tornado points file, and the counties file
  2. Create the output file, and the fields we will use (name and state name, plus the density field)
  3. For each county, we use the method Within for all the points representing the tornadoes. 
  4. We calculate the density. Since the file is in latlon projection, the GetArea method would be in square degrees, which is incorrect when comparing, because the actual area would be much smaller in northern regions.
    So we re project the county geometry to the EPSG:900913 projection, which is in meters. I think that is actually incorrect, that the UTM would be better, but the zone changes too much and it's out of the scope of the entry.
    The density is simply the number of tornadoes divided by the county area.
  5. We write the output file.
So the code looks like:
from osgeo import ogr
from osgeo import osr
import sys


def density(pointsFile,polygonsFile,outFile):
    #Opening the input files
    dsPolygons = ogr.Open(polygonsFile)
    if dsPolygons is None:
       raise Exception('Could not open ' + pointsFile)
    dsPoints = ogr.Open(pointsFile)
    if dsPoints is None:
       raise Exception('Could not open ' + pointsFile)
    
    polygonsLayer = dsPolygons.GetLayer()
    pointsLayer = dsPoints.GetLayer()


    #Creating the output file

    driverName = "ESRI Shapefile"
    drv = ogr.GetDriverByName( driverName )
    dsOut = drv.CreateDataSource( outFile )  
    outLayer = dsOut.CreateLayer('DENSITY', None, ogr.wkbPolygon)  
    field = ogr.FieldDefn( "Density", ogr.OFTReal )
    outLayer.CreateField ( field )
    field = ogr.FieldDefn( "Name", ogr.OFTString )
    outLayer.CreateField ( field )
    field = ogr.FieldDefn( "State_name", ogr.OFTString )
    outLayer.CreateField ( field )
    
    #Preparing the coordinate transformation
    srsQuery = osr.SpatialReference()
    srsQuery.ImportFromEPSG(4326)    
    srsArea = osr.SpatialReference()
    srsArea.ImportFromEPSG(900913)  

    transf = osr.CoordinateTransformation(srsQuery,srsArea)

    #Iterating all the polygons
    polygonFeature = polygonsLayer.GetNextFeature()
    k=0
    while polygonFeature:
        k = k + 1
        print  "processing " + polygonFeature.GetField("NAME") + " ("+polygonFeature.GetField("STATE_NAME")+") - " + str(k) + " of " + str(polygonsLayer.GetFeatureCount())
         
        geometry = polygonFeature.GetGeometryRef()
        numCounts = 0.0
        pointsLayer.ResetReading()
        pointFeature = pointsLayer.GetNextFeature()
        #Iterating all the points
        while pointFeature:
            if pointFeature.GetGeometryRef().Within(geometry):
                numCounts = numCounts + 1
            pointFeature = pointsLayer.GetNextFeature()
        
        #Calculating the actual area
        geometryArea = geometry.Clone()
        geometryArea.Transform(transf)
        polygonArea = geometryArea.GetArea()/(1000000.0)
        density = numCounts/polygonArea

        #Writting the fields in the output layer
        feature = ogr.Feature( outLayer.GetLayerDefn())
        feature.SetField( "Density", density )
        feature.SetField( "Name", polygonFeature.GetField("NAME") )
        feature.SetField( "State_name", polygonFeature.GetField("STATE_NAME") )

        feature.SetGeometry(geometry)
        outLayer.CreateFeature(feature)        

        polygonFeature = polygonsLayer.GetNextFeature()

    dsOut = None
    dsPolygons = None
    dsPoints = None


if __name__ == "__main__":
    points = sys.argv[1]
    polygons = sys.argv[2]
    outfile = sys.argv[3]

    density(points,polygons,outfile)

To run the script, just type:
python density.py "data/tornadx020.shp" "data/UScounties.shp" "out.shp"

Note that the script isn't for any general case. Different input projections should be handled, and the input fields should be different according to the file. Besides, no geometry checks are done.


The result, opening and colouring the file with QGIS looks like that:
 Now is easy to see that the central USA and Florida have the highest tornado density.


Tornado density in a grid

Sometimes we don't have a particular set of polygons of interest, but a raster where every pixel has the density value is an interesting data.
In this case, the output is created using GDAL instead of OGR, and is a regular grid defined by the user. But the steps are very similar.
As in the example before, the areas in each pixel are in square degrees, and this must be changed to square kilometers.
The code is:
from osgeo import ogr
from osgeo import osr
from osgeo import gdal
from struct import pack
import sys

def density(pointsFile,outFile,x0,y0,x1,y1,samples,lines):
    
    deltax = (x1 - x0) / samples  
    deltay = (y1 - y0) / lines

    dsPoints = ogr.Open(pointsFile)
    if dsPoints is None:
       raise Exception('Could not open ' + pointsFile)
    
    pointsLayer = dsPoints.GetLayer()

    #Creating the output file

    driverName = "ESRI Shapefile"
    drv = ogr.GetDriverByName( driverName )
    
    srsQuery = osr.SpatialReference()
    srsQuery.ImportFromEPSG(4326)    
    srsArea = osr.SpatialReference()
    srsArea.ImportFromEPSG(900913)  

    transf = osr.CoordinateTransformation(srsQuery,srsArea)

    driverName = gdal.GetDriverByName( 'GTiff' )
    dsOut = driverName.Create( outFile, samples, lines, 1, gdal.GDT_Float32)
    dsOut.SetProjection(srsPeticio.ExportToWkt())
    dsOut.SetGeoTransform([x0,deltax,0,y0,0,deltay])
    

    #Iterating all the pixels to write the raster values
    dataString = ''
    for j in range(0,lines):
      for i in range(0,samples):
 print "sample " + str(i) + " line " + str(j)
        px = x0 + deltax * i
        py = y0 + deltay * j
        #The pixel geometry must be created to execute the within method, and to calculate the actual area
        wkt = "POLYGON(("+str(px)+" "+str(py)+","+str(px + deltax)+" "+str(py)+","+str(px + deltax)+" "+str(py+deltay)+","+str(px)+" "+str(py+deltay)+","+str(px)+" "+str(py)+"))"
        
        geometry = ogr.CreateGeometryFromWkt(wkt)
        
        numCounts = 0.0
        pointsLayer.ResetReading()
        pointFeature = pointsLayer.GetNextFeature()
        
        #Iterating all the points
        while pointFeature:
            if pointFeature.GetGeometryRef().Within(geometry):
                numCounts = numCounts + 1
            pointFeature = pointsLayer.GetNextFeature()
        
        geometryArea = geometry.Clone()
        geometryArea.Transform(transf)
        polygonArea = geometryArea.GetArea()/(1000000.0)
        density = numCounts/polygonArea
        dataString = dataString + pack('f',density)
    
    #Writting the raster
    dsOut.GetRasterBand(1).WriteRaster( 0, 0, samples, lines, dataString ) 

    dsOut = None
    dsPoints = None
if __name__ == "__main__":
    points = sys.argv[1]
    outfile = sys.argv[2]
    x0=float(sys.argv[3])
    y0=float(sys.argv[4])
    x1=float(sys.argv[5])
    y1=float(sys.argv[6])
    samples=int(sys.argv[7])
    lines=int(sys.argv[8])

  
    density(points,outfile,x0,y0,x1,y1,samples,lines)

To run the script, type:

python density_raster.py "data/tornadx020.shp" "grid.tiff" -127.0 51.0 -65.0 24.0 50 50

Where the numbers are the coordinates and the size of the output raster as explained in the code.

Again, the script lasts quite a lot to run.

The result seen with QGIS looks like that:
Is similar to the counties map, because both pixels and counties have similar sizes. The result may be easier to handle in this way.

Final thoughts

In the example data, executing the script takes a long time, since 53000 points are given, and there are about 3500 counties. Using PostGIS or so would be faster, but the example doesn't require more installations than GDAL and python.

It would be nice improve the code, creating all the functionalities in a class, checking the input values, accepting different projections, etc.

Remember that all the example is available in a zip file.