Having a cloud of points is quite common. This tutorial shows how to calculate the density of points in a set of polygons.
As always, you can download all the data and scripts.
Note that the script isn't for any general case. Different input projections should be handled, and the input fields should be different according to the file. Besides, no geometry checks are done.
The result, opening and colouring the file with QGIS looks like that:
In the example data, executing the script takes a long time, since 53000
points are given, and there are about 3500 counties. Using PostGIS or so
would be faster, but the example doesn't require more installations than GDAL and python.
It would be nice improve the code, creating all the functionalities in a class, checking the input values, accepting different projections, etc.
Remember that all the example is available in a zip file.
As always, you can download all the data and scripts.
The data
Seeking some data for the tutorial, I found the position of all the tornado touchdowns in the period 1950-2004. Calculating the "tornado density" in the US seemed interesting enough.
You can find the data at the data.gov site.
The density can be calculated in a regular grid or a set of polygons. We will do both things, using the US counties as the polygons, which I found at wingis.org
So the downloaded data looks like that when opened in QGIS:
It's easy to see that most of the tornadoes occur in the east half of the USA, but it's still difficult to see actual density differences.
The density can be calculated in a regular grid or a set of polygons. We will do both things, using the US counties as the polygons, which I found at wingis.org
So the downloaded data looks like that when opened in QGIS:
It's easy to see that most of the tornadoes occur in the east half of the USA, but it's still difficult to see actual density differences.
Tornado density in every US county
First, a script to assign to every county a "tornado touchdown density" value. What the script will do is basically:
- Open the input datasets: The tornado points file, and the counties file
- Create the output file, and the fields we will use (name and state name, plus the density field)
- For each county, we use the method Within for all the points representing the tornadoes.
- We calculate the density. Since the file is in latlon projection, the GetArea method would be in square degrees, which is incorrect when comparing, because the actual area would be much smaller in northern regions.
So we re project the county geometry to the EPSG:900913 projection, which is in meters. I think that is actually incorrect, that the UTM would be better, but the zone changes too much and it's out of the scope of the entry.
The density is simply the number of tornadoes divided by the county area. - We write the output file.
So the code looks like:
from osgeo import ogr from osgeo import osr import sys def density(pointsFile,polygonsFile,outFile): #Opening the input files dsPolygons = ogr.Open(polygonsFile) if dsPolygons is None: raise Exception('Could not open ' + pointsFile) dsPoints = ogr.Open(pointsFile) if dsPoints is None: raise Exception('Could not open ' + pointsFile) polygonsLayer = dsPolygons.GetLayer() pointsLayer = dsPoints.GetLayer() #Creating the output file driverName = "ESRI Shapefile" drv = ogr.GetDriverByName( driverName ) dsOut = drv.CreateDataSource( outFile ) outLayer = dsOut.CreateLayer('DENSITY', None, ogr.wkbPolygon) field = ogr.FieldDefn( "Density", ogr.OFTReal ) outLayer.CreateField ( field ) field = ogr.FieldDefn( "Name", ogr.OFTString ) outLayer.CreateField ( field ) field = ogr.FieldDefn( "State_name", ogr.OFTString ) outLayer.CreateField ( field ) #Preparing the coordinate transformation srsQuery = osr.SpatialReference() srsQuery.ImportFromEPSG(4326) srsArea = osr.SpatialReference() srsArea.ImportFromEPSG(900913) transf = osr.CoordinateTransformation(srsQuery,srsArea) #Iterating all the polygons polygonFeature = polygonsLayer.GetNextFeature() k=0 while polygonFeature: k = k + 1 print "processing " + polygonFeature.GetField("NAME") + " ("+polygonFeature.GetField("STATE_NAME")+") - " + str(k) + " of " + str(polygonsLayer.GetFeatureCount()) geometry = polygonFeature.GetGeometryRef() numCounts = 0.0 pointsLayer.ResetReading() pointFeature = pointsLayer.GetNextFeature() #Iterating all the points while pointFeature: if pointFeature.GetGeometryRef().Within(geometry): numCounts = numCounts + 1 pointFeature = pointsLayer.GetNextFeature() #Calculating the actual area geometryArea = geometry.Clone() geometryArea.Transform(transf) polygonArea = geometryArea.GetArea()/(1000000.0) density = numCounts/polygonArea #Writting the fields in the output layer feature = ogr.Feature( outLayer.GetLayerDefn()) feature.SetField( "Density", density ) feature.SetField( "Name", polygonFeature.GetField("NAME") ) feature.SetField( "State_name", polygonFeature.GetField("STATE_NAME") ) feature.SetGeometry(geometry) outLayer.CreateFeature(feature) polygonFeature = polygonsLayer.GetNextFeature() dsOut = None dsPolygons = None dsPoints = None if __name__ == "__main__": points = sys.argv[1] polygons = sys.argv[2] outfile = sys.argv[3] density(points,polygons,outfile)
To run the script, just type:
python density.py "data/tornadx020.shp" "data/UScounties.shp" "out.shp"
Note that the script isn't for any general case. Different input projections should be handled, and the input fields should be different according to the file. Besides, no geometry checks are done.
The result, opening and colouring the file with QGIS looks like that:
Now is easy to see that the central USA and Florida have the highest tornado density.
Tornado density in a grid
Sometimes we don't have a particular set of polygons of interest, but a raster where every pixel has the density value is an interesting data.
In this case, the output is created using GDAL instead of OGR, and is a regular grid defined by the user. But the steps are very similar.
As in the example before, the areas in each pixel are in square degrees, and this must be changed to square kilometers.
The code is:
In this case, the output is created using GDAL instead of OGR, and is a regular grid defined by the user. But the steps are very similar.
As in the example before, the areas in each pixel are in square degrees, and this must be changed to square kilometers.
The code is:
from osgeo import ogr from osgeo import osr from osgeo import gdal from struct import pack import sys def density(pointsFile,outFile,x0,y0,x1,y1,samples,lines): deltax = (x1 - x0) / samples deltay = (y1 - y0) / lines dsPoints = ogr.Open(pointsFile) if dsPoints is None: raise Exception('Could not open ' + pointsFile) pointsLayer = dsPoints.GetLayer() #Creating the output file driverName = "ESRI Shapefile" drv = ogr.GetDriverByName( driverName ) srsQuery = osr.SpatialReference() srsQuery.ImportFromEPSG(4326) srsArea = osr.SpatialReference() srsArea.ImportFromEPSG(900913) transf = osr.CoordinateTransformation(srsQuery,srsArea) driverName = gdal.GetDriverByName( 'GTiff' ) dsOut = driverName.Create( outFile, samples, lines, 1, gdal.GDT_Float32) dsOut.SetProjection(srsPeticio.ExportToWkt()) dsOut.SetGeoTransform([x0,deltax,0,y0,0,deltay]) #Iterating all the pixels to write the raster values dataString = '' for j in range(0,lines): for i in range(0,samples): print "sample " + str(i) + " line " + str(j) px = x0 + deltax * i py = y0 + deltay * j #The pixel geometry must be created to execute the within method, and to calculate the actual area wkt = "POLYGON(("+str(px)+" "+str(py)+","+str(px + deltax)+" "+str(py)+","+str(px + deltax)+" "+str(py+deltay)+","+str(px)+" "+str(py+deltay)+","+str(px)+" "+str(py)+"))" geometry = ogr.CreateGeometryFromWkt(wkt) numCounts = 0.0 pointsLayer.ResetReading() pointFeature = pointsLayer.GetNextFeature() #Iterating all the points while pointFeature: if pointFeature.GetGeometryRef().Within(geometry): numCounts = numCounts + 1 pointFeature = pointsLayer.GetNextFeature() geometryArea = geometry.Clone() geometryArea.Transform(transf) polygonArea = geometryArea.GetArea()/(1000000.0) density = numCounts/polygonArea dataString = dataString + pack('f',density) #Writting the raster dsOut.GetRasterBand(1).WriteRaster( 0, 0, samples, lines, dataString ) dsOut = None dsPoints = None if __name__ == "__main__": points = sys.argv[1] outfile = sys.argv[2] x0=float(sys.argv[3]) y0=float(sys.argv[4]) x1=float(sys.argv[5]) y1=float(sys.argv[6]) samples=int(sys.argv[7]) lines=int(sys.argv[8]) density(points,outfile,x0,y0,x1,y1,samples,lines)
To run the script, type:
Where the numbers are the coordinates and the size of the output raster as explained in the code.
Again, the script lasts quite a lot to run.
The result seen with QGIS looks like that:
python density_raster.py "data/tornadx020.shp" "grid.tiff" -127.0 51.0 -65.0 24.0 50 50
Where the numbers are the coordinates and the size of the output raster as explained in the code.
Again, the script lasts quite a lot to run.
The result seen with QGIS looks like that:
Is similar to the counties map, because both pixels and counties have similar sizes. The result may be easier to handle in this way.
Final thoughts
It would be nice improve the code, creating all the functionalities in a class, checking the input values, accepting different projections, etc.
Remember that all the example is available in a zip file.
Thank you. Not just for this post but every post on your blog is excellent and is what i've been googling to find for a while because i'm new to programming and started to learn Python and ogr/gdal
ReplyDeleteThanks!
Thank you!
ReplyDeleteNice examples, very useful! I did find that the syntax to run the first example was slightly wrong. The script wanted a directory (rather than the name of a shapefile) for the last argument. I used the following:
ReplyDeletemkdir out && python density.py "data/tornadx020.shp" "data/UScounties.shp" out
Thank you Andy. Actually, the problem is that gdal is not able to overwrite an existing shp file, and doesn't give a good error message.
DeleteIf I understood well, this is because gdal is intended to append the new data in these cases instead of removing everything and adding the new data.
So you can either create a new directory as you suggest (different each time you run the script) or remove or rename the existing out.* files.
Or even better, add a os.path.exists() in the script and remove the existing file if the result is true.
Hi. Useful codes here, thanks. Have searched the blogs and communities, postgis, qgis, ogr for some time trying to find an easy method for line density calculations like what can be done in arcpy for Spatial Analyst with one line of code and half a minute of computation. Best found is this http://lists.osgeo.org/pipermail/postgis-users/2012-December/036136.html
ReplyDeleteDo you have any alternative?
Hi Karl,
ReplyDeleteIf I understand the message, you want to do a similar thing than the explained here, but changing points for lines, isn't it?
I haven't got an example, but the grid greation would be the same. Only this "line density" would make to change the code a little. If "line density" means how many lines pass through the selected pixel, the Crosses function could be used.
I don't know if this is "one line of code", but it should work.
Hi. I am having task with the inverse problem of your tutorial which is to create random points with known density. What do you think the best approach? Using python and gdal/ogr of course
ReplyDeleteHi. I saw an example done with D3js, but I can't find it again...
ReplyDeleteWithout thinking it better, I would get the bounding box for each polygon, and get a random location there x = random.uniform(xmin, xmax), test if it's inside the polygon with the ogr within function, and draw it or look for an other if it's outside, until you arrive to the number of points, that will depend on the area of the polygon and the density.
Good luck!