## Thursday, June 14, 2012

### Density maps using GDAL/OGR python

Having a cloud of points is quite common. This tutorial shows how to calculate the density of points in a set of polygons.

### The data

Seeking some data for the tutorial, I found  the  position of all the tornado touchdowns in the period 1950-2004. Calculating the "tornado density" in the US seemed interesting enough.
You can find the data at the data.gov site.

The density can be calculated in a regular grid or a set of polygons. We will do both things, using the US counties as the polygons, which I found at wingis.org

It's easy to see that most of the tornadoes occur in the east half of the USA, but it's still difficult to see actual density differences.

### Tornado density in every US county

First, a script to assign to every county a "tornado touchdown density" value. What the script will do is basically:
1. Open the input datasets: The tornado points file, and the counties file
2. Create the output file, and the fields we will use (name and state name, plus the density field)
3. For each county, we use the method Within for all the points representing the tornadoes.
4. We calculate the density. Since the file is in latlon projection, the GetArea method would be in square degrees, which is incorrect when comparing, because the actual area would be much smaller in northern regions.
So we re project the county geometry to the EPSG:900913 projection, which is in meters. I think that is actually incorrect, that the UTM would be better, but the zone changes too much and it's out of the scope of the entry.
The density is simply the number of tornadoes divided by the county area.
5. We write the output file.
So the code looks like:
from osgeo import ogr
from osgeo import osr
import sys

def density(pointsFile,polygonsFile,outFile):
#Opening the input files
dsPolygons = ogr.Open(polygonsFile)
if dsPolygons is None:
raise Exception('Could not open ' + pointsFile)
dsPoints = ogr.Open(pointsFile)
if dsPoints is None:
raise Exception('Could not open ' + pointsFile)

polygonsLayer = dsPolygons.GetLayer()
pointsLayer = dsPoints.GetLayer()

#Creating the output file

driverName = "ESRI Shapefile"
drv = ogr.GetDriverByName( driverName )
dsOut = drv.CreateDataSource( outFile )
outLayer = dsOut.CreateLayer('DENSITY', None, ogr.wkbPolygon)
field = ogr.FieldDefn( "Density", ogr.OFTReal )
outLayer.CreateField ( field )
field = ogr.FieldDefn( "Name", ogr.OFTString )
outLayer.CreateField ( field )
field = ogr.FieldDefn( "State_name", ogr.OFTString )
outLayer.CreateField ( field )

#Preparing the coordinate transformation
srsQuery = osr.SpatialReference()
srsQuery.ImportFromEPSG(4326)
srsArea = osr.SpatialReference()
srsArea.ImportFromEPSG(900913)

transf = osr.CoordinateTransformation(srsQuery,srsArea)

#Iterating all the polygons
polygonFeature = polygonsLayer.GetNextFeature()
k=0
while polygonFeature:
k = k + 1
print  "processing " + polygonFeature.GetField("NAME") + " ("+polygonFeature.GetField("STATE_NAME")+") - " + str(k) + " of " + str(polygonsLayer.GetFeatureCount())

geometry = polygonFeature.GetGeometryRef()
numCounts = 0.0
pointFeature = pointsLayer.GetNextFeature()
#Iterating all the points
while pointFeature:
if pointFeature.GetGeometryRef().Within(geometry):
numCounts = numCounts + 1
pointFeature = pointsLayer.GetNextFeature()

#Calculating the actual area
geometryArea = geometry.Clone()
geometryArea.Transform(transf)
polygonArea = geometryArea.GetArea()/(1000000.0)
density = numCounts/polygonArea

#Writting the fields in the output layer
feature = ogr.Feature( outLayer.GetLayerDefn())
feature.SetField( "Density", density )
feature.SetField( "Name", polygonFeature.GetField("NAME") )
feature.SetField( "State_name", polygonFeature.GetField("STATE_NAME") )

feature.SetGeometry(geometry)
outLayer.CreateFeature(feature)

polygonFeature = polygonsLayer.GetNextFeature()

dsOut = None
dsPolygons = None
dsPoints = None

if __name__ == "__main__":
points = sys.argv[1]
polygons = sys.argv[2]
outfile = sys.argv[3]

density(points,polygons,outfile)


To run the script, just type:
python density.py "data/tornadx020.shp" "data/UScounties.shp" "out.shp"

Note that the script isn't for any general case. Different input projections should be handled, and the input fields should be different according to the file. Besides, no geometry checks are done.

The result, opening and colouring the file with QGIS looks like that:
Now is easy to see that the central USA and Florida have the highest tornado density.

### Tornado density in a grid

Sometimes we don't have a particular set of polygons of interest, but a raster where every pixel has the density value is an interesting data.
In this case, the output is created using GDAL instead of OGR, and is a regular grid defined by the user. But the steps are very similar.
As in the example before, the areas in each pixel are in square degrees, and this must be changed to square kilometers.
The code is:
from osgeo import ogr
from osgeo import osr
from osgeo import gdal
from struct import pack
import sys

def density(pointsFile,outFile,x0,y0,x1,y1,samples,lines):

deltax = (x1 - x0) / samples
deltay = (y1 - y0) / lines

dsPoints = ogr.Open(pointsFile)
if dsPoints is None:
raise Exception('Could not open ' + pointsFile)

pointsLayer = dsPoints.GetLayer()

#Creating the output file

driverName = "ESRI Shapefile"
drv = ogr.GetDriverByName( driverName )

srsQuery = osr.SpatialReference()
srsQuery.ImportFromEPSG(4326)
srsArea = osr.SpatialReference()
srsArea.ImportFromEPSG(900913)

transf = osr.CoordinateTransformation(srsQuery,srsArea)

driverName = gdal.GetDriverByName( 'GTiff' )
dsOut = driverName.Create( outFile, samples, lines, 1, gdal.GDT_Float32)
dsOut.SetProjection(srsPeticio.ExportToWkt())
dsOut.SetGeoTransform([x0,deltax,0,y0,0,deltay])

#Iterating all the pixels to write the raster values
dataString = ''
for j in range(0,lines):
for i in range(0,samples):
print "sample " + str(i) + " line " + str(j)
px = x0 + deltax * i
py = y0 + deltay * j
#The pixel geometry must be created to execute the within method, and to calculate the actual area
wkt = "POLYGON(("+str(px)+" "+str(py)+","+str(px + deltax)+" "+str(py)+","+str(px + deltax)+" "+str(py+deltay)+","+str(px)+" "+str(py+deltay)+","+str(px)+" "+str(py)+"))"

geometry = ogr.CreateGeometryFromWkt(wkt)

numCounts = 0.0
pointFeature = pointsLayer.GetNextFeature()

#Iterating all the points
while pointFeature:
if pointFeature.GetGeometryRef().Within(geometry):
numCounts = numCounts + 1
pointFeature = pointsLayer.GetNextFeature()

geometryArea = geometry.Clone()
geometryArea.Transform(transf)
polygonArea = geometryArea.GetArea()/(1000000.0)
density = numCounts/polygonArea
dataString = dataString + pack('f',density)

#Writting the raster
dsOut.GetRasterBand(1).WriteRaster( 0, 0, samples, lines, dataString )

dsOut = None
dsPoints = None
if __name__ == "__main__":
points = sys.argv[1]
outfile = sys.argv[2]
x0=float(sys.argv[3])
y0=float(sys.argv[4])
x1=float(sys.argv[5])
y1=float(sys.argv[6])
samples=int(sys.argv[7])
lines=int(sys.argv[8])

density(points,outfile,x0,y0,x1,y1,samples,lines)


To run the script, type:

python density_raster.py "data/tornadx020.shp" "grid.tiff" -127.0 51.0 -65.0 24.0 50 50

Where the numbers are the coordinates and the size of the output raster as explained in the code.

Again, the script lasts quite a lot to run.

The result seen with QGIS looks like that:
Is similar to the counties map, because both pixels and counties have similar sizes. The result may be easier to handle in this way.

### Final thoughts

In the example data, executing the script takes a long time, since 53000 points are given, and there are about 3500 counties. Using PostGIS or so would be faster, but the example doesn't require more installations than GDAL and python.

It would be nice improve the code, creating all the functionalities in a class, checking the input values, accepting different projections, etc.

Remember that all the example is available in a zip file.

1. Thank you. Not just for this post but every post on your blog is excellent and is what i've been googling to find for a while because i'm new to programming and started to learn Python and ogr/gdal
Thanks!

2. Nice examples, very useful! I did find that the syntax to run the first example was slightly wrong. The script wanted a directory (rather than the name of a shapefile) for the last argument. I used the following:

mkdir out && python density.py "data/tornadx020.shp" "data/UScounties.shp" out

1. Thank you Andy. Actually, the problem is that gdal is not able to overwrite an existing shp file, and doesn't give a good error message.
If I understood well, this is because gdal is intended to append the new data in these cases instead of removing everything and adding the new data.
So you can either create a new directory as you suggest (different each time you run the script) or remove or rename the existing out.* files.
Or even better, add a os.path.exists() in the script and remove the existing file if the result is true.

3. Hi. Useful codes here, thanks. Have searched the blogs and communities, postgis, qgis, ogr for some time trying to find an easy method for line density calculations like what can be done in arcpy for Spatial Analyst with one line of code and half a minute of computation. Best found is this http://lists.osgeo.org/pipermail/postgis-users/2012-December/036136.html
Do you have any alternative?

4. Hi Karl,
If I understand the message, you want to do a similar thing than the explained here, but changing points for lines, isn't it?
I haven't got an example, but the grid greation would be the same. Only this "line density" would make to change the code a little. If "line density" means how many lines pass through the selected pixel, the Crosses function could be used.

I don't know if this is "one line of code", but it should work.

5. Hi. I am having task with the inverse problem of your tutorial which is to create random points with known density. What do you think the best approach? Using python and gdal/ogr of course

6. Hi. I saw an example done with D3js, but I can't find it again...
Without thinking it better, I would get the bounding box for each polygon, and get a random location there x = random.uniform(xmin, xmax), test if it's inside the polygon with the ogr within function, and draw it or look for an other if it's outside, until you arrive to the number of points, that will depend on the area of the polygon and the density.
Good luck!