Having a cloud of points is quite common. This tutorial shows how to calculate the density of points in a set of polygons.
As always, you can download all the data and scripts.
Note that the script isn't for any general case. Different input projections should be handled, and the input fields should be different according to the file. Besides, no geometry checks are done.
The result, opening and colouring the file with QGIS looks like that:
In the example data, executing the script takes a long time, since 53000
points are given, and there are about 3500 counties. Using PostGIS or so
would be faster, but the example doesn't require more installations than GDAL and python.
It would be nice improve the code, creating all the functionalities in a class, checking the input values, accepting different projections, etc.
Remember that all the example is available in a zip file.
As always, you can download all the data and scripts.
The data
Seeking some data for the tutorial, I found the position of all the tornado touchdowns in the period 1950-2004. Calculating the "tornado density" in the US seemed interesting enough.
You can find the data at the data.gov site.
The density can be calculated in a regular grid or a set of polygons. We will do both things, using the US counties as the polygons, which I found at wingis.org
So the downloaded data looks like that when opened in QGIS:
It's easy to see that most of the tornadoes occur in the east half of the USA, but it's still difficult to see actual density differences.
The density can be calculated in a regular grid or a set of polygons. We will do both things, using the US counties as the polygons, which I found at wingis.org
So the downloaded data looks like that when opened in QGIS:
It's easy to see that most of the tornadoes occur in the east half of the USA, but it's still difficult to see actual density differences.
Tornado density in every US county
First, a script to assign to every county a "tornado touchdown density" value. What the script will do is basically:
- Open the input datasets: The tornado points file, and the counties file
- Create the output file, and the fields we will use (name and state name, plus the density field)
- For each county, we use the method Within for all the points representing the tornadoes.
- We calculate the density. Since the file is in latlon projection, the GetArea method would be in square degrees, which is incorrect when comparing, because the actual area would be much smaller in northern regions.
So we re project the county geometry to the EPSG:900913 projection, which is in meters. I think that is actually incorrect, that the UTM would be better, but the zone changes too much and it's out of the scope of the entry.
The density is simply the number of tornadoes divided by the county area. - We write the output file.
So the code looks like:
from osgeo import ogr from osgeo import osr import sys def density(pointsFile,polygonsFile,outFile): #Opening the input files dsPolygons = ogr.Open(polygonsFile) if dsPolygons is None: raise Exception('Could not open ' + pointsFile) dsPoints = ogr.Open(pointsFile) if dsPoints is None: raise Exception('Could not open ' + pointsFile) polygonsLayer = dsPolygons.GetLayer() pointsLayer = dsPoints.GetLayer() #Creating the output file driverName = "ESRI Shapefile" drv = ogr.GetDriverByName( driverName ) dsOut = drv.CreateDataSource( outFile ) outLayer = dsOut.CreateLayer('DENSITY', None, ogr.wkbPolygon) field = ogr.FieldDefn( "Density", ogr.OFTReal ) outLayer.CreateField ( field ) field = ogr.FieldDefn( "Name", ogr.OFTString ) outLayer.CreateField ( field ) field = ogr.FieldDefn( "State_name", ogr.OFTString ) outLayer.CreateField ( field ) #Preparing the coordinate transformation srsQuery = osr.SpatialReference() srsQuery.ImportFromEPSG(4326) srsArea = osr.SpatialReference() srsArea.ImportFromEPSG(900913) transf = osr.CoordinateTransformation(srsQuery,srsArea) #Iterating all the polygons polygonFeature = polygonsLayer.GetNextFeature() k=0 while polygonFeature: k = k + 1 print "processing " + polygonFeature.GetField("NAME") + " ("+polygonFeature.GetField("STATE_NAME")+") - " + str(k) + " of " + str(polygonsLayer.GetFeatureCount()) geometry = polygonFeature.GetGeometryRef() numCounts = 0.0 pointsLayer.ResetReading() pointFeature = pointsLayer.GetNextFeature() #Iterating all the points while pointFeature: if pointFeature.GetGeometryRef().Within(geometry): numCounts = numCounts + 1 pointFeature = pointsLayer.GetNextFeature() #Calculating the actual area geometryArea = geometry.Clone() geometryArea.Transform(transf) polygonArea = geometryArea.GetArea()/(1000000.0) density = numCounts/polygonArea #Writting the fields in the output layer feature = ogr.Feature( outLayer.GetLayerDefn()) feature.SetField( "Density", density ) feature.SetField( "Name", polygonFeature.GetField("NAME") ) feature.SetField( "State_name", polygonFeature.GetField("STATE_NAME") ) feature.SetGeometry(geometry) outLayer.CreateFeature(feature) polygonFeature = polygonsLayer.GetNextFeature() dsOut = None dsPolygons = None dsPoints = None if __name__ == "__main__": points = sys.argv[1] polygons = sys.argv[2] outfile = sys.argv[3] density(points,polygons,outfile)
To run the script, just type:
python density.py "data/tornadx020.shp" "data/UScounties.shp" "out.shp"
Note that the script isn't for any general case. Different input projections should be handled, and the input fields should be different according to the file. Besides, no geometry checks are done.
The result, opening and colouring the file with QGIS looks like that:
Now is easy to see that the central USA and Florida have the highest tornado density.
Tornado density in a grid
Sometimes we don't have a particular set of polygons of interest, but a raster where every pixel has the density value is an interesting data.
In this case, the output is created using GDAL instead of OGR, and is a regular grid defined by the user. But the steps are very similar.
As in the example before, the areas in each pixel are in square degrees, and this must be changed to square kilometers.
The code is:
In this case, the output is created using GDAL instead of OGR, and is a regular grid defined by the user. But the steps are very similar.
As in the example before, the areas in each pixel are in square degrees, and this must be changed to square kilometers.
The code is:
from osgeo import ogr from osgeo import osr from osgeo import gdal from struct import pack import sys def density(pointsFile,outFile,x0,y0,x1,y1,samples,lines): deltax = (x1 - x0) / samples deltay = (y1 - y0) / lines dsPoints = ogr.Open(pointsFile) if dsPoints is None: raise Exception('Could not open ' + pointsFile) pointsLayer = dsPoints.GetLayer() #Creating the output file driverName = "ESRI Shapefile" drv = ogr.GetDriverByName( driverName ) srsQuery = osr.SpatialReference() srsQuery.ImportFromEPSG(4326) srsArea = osr.SpatialReference() srsArea.ImportFromEPSG(900913) transf = osr.CoordinateTransformation(srsQuery,srsArea) driverName = gdal.GetDriverByName( 'GTiff' ) dsOut = driverName.Create( outFile, samples, lines, 1, gdal.GDT_Float32) dsOut.SetProjection(srsPeticio.ExportToWkt()) dsOut.SetGeoTransform([x0,deltax,0,y0,0,deltay]) #Iterating all the pixels to write the raster values dataString = '' for j in range(0,lines): for i in range(0,samples): print "sample " + str(i) + " line " + str(j) px = x0 + deltax * i py = y0 + deltay * j #The pixel geometry must be created to execute the within method, and to calculate the actual area wkt = "POLYGON(("+str(px)+" "+str(py)+","+str(px + deltax)+" "+str(py)+","+str(px + deltax)+" "+str(py+deltay)+","+str(px)+" "+str(py+deltay)+","+str(px)+" "+str(py)+"))" geometry = ogr.CreateGeometryFromWkt(wkt) numCounts = 0.0 pointsLayer.ResetReading() pointFeature = pointsLayer.GetNextFeature() #Iterating all the points while pointFeature: if pointFeature.GetGeometryRef().Within(geometry): numCounts = numCounts + 1 pointFeature = pointsLayer.GetNextFeature() geometryArea = geometry.Clone() geometryArea.Transform(transf) polygonArea = geometryArea.GetArea()/(1000000.0) density = numCounts/polygonArea dataString = dataString + pack('f',density) #Writting the raster dsOut.GetRasterBand(1).WriteRaster( 0, 0, samples, lines, dataString ) dsOut = None dsPoints = None if __name__ == "__main__": points = sys.argv[1] outfile = sys.argv[2] x0=float(sys.argv[3]) y0=float(sys.argv[4]) x1=float(sys.argv[5]) y1=float(sys.argv[6]) samples=int(sys.argv[7]) lines=int(sys.argv[8]) density(points,outfile,x0,y0,x1,y1,samples,lines)
To run the script, type:
Where the numbers are the coordinates and the size of the output raster as explained in the code.
Again, the script lasts quite a lot to run.
The result seen with QGIS looks like that:
python density_raster.py "data/tornadx020.shp" "grid.tiff" -127.0 51.0 -65.0 24.0 50 50
Where the numbers are the coordinates and the size of the output raster as explained in the code.
Again, the script lasts quite a lot to run.
The result seen with QGIS looks like that:
Is similar to the counties map, because both pixels and counties have similar sizes. The result may be easier to handle in this way.
Final thoughts
It would be nice improve the code, creating all the functionalities in a class, checking the input values, accepting different projections, etc.
Remember that all the example is available in a zip file.