Showing posts with label QGIS. Show all posts

Friday, July 6, 2012

Testing inverse of the distance interpolation with AEMet pressure data

I like testing the things I learn with actual data. Some time ago, I wrote an entry about how to use the inverse of the distance to interpolate data from scattered points. Now I will show how to use it to draw a sea level pressure map from the data published by the Spanish Meteorology Agency (AEMet). The Agency has a nice Open Data section that is worth a visit if you are interested in meteorology, as I am.

As usual, I have uploaded all the files you need to follow the entry.
Getting the data
The station data from the AEMet is available on their FTP server.
In the example, an FTP connection to the AEMet server is opened, and only the sea level pressure is parsed. If the station pressure were used, the interpolation would be wrong, since the altitude would not be taken into account. AEMet provides the pressure corrected to sea level directly.
The file format is quite easy to understand and parse. It's a gzip-compressed CSV file, with the variable name attached to each value. Not all the stations report pressure data, and the name of the file's directory changes with the date, so the script has to do some checks:


from ftplib import FTP
import gzip


def get_data():
    '''
    Gets the latest sea level pressure values from the AEMet service.
    '''
    ftp = FTP('ftpdatos.aemet.es')  # connect to host, default port
    ftp.login()  # user anonymous, passwd anonymous@
    ftp.cwd('datos_observacion/observaciones_diezminutales')  # 10-minute data directory

    files = ftp.nlst()  # list directory contents
    # Getting the newest day directory. It must end with the word
    # "diezminutales". The name includes the date, so after sorting
    # by name, the last one is also the newest one.
    directories = []
    for file_name in files:
        if file_name.endswith('diezminutales'):
            directories.append(file_name)

    directories.sort()
    ftp.cwd(directories.pop())

    # Getting the last file, ordered by name.
    files = ftp.nlst()
    data_files = []
    for file_name in files:
        if file_name.endswith('gz'):
            data_files.append(file_name)

    data_files.sort()

    data_file = data_files.pop()
    with open(data_file, 'wb') as local_file:
        ftp.retrbinary("RETR " + data_file, local_file.write)

    # The files are compressed with gzip. Reading the contents as text.
    f = gzip.open(data_file, 'rt')

    data = {'lon': [], 'lat': [], 'press': []}
    # Parsing the file line by line
    for line in f:
        station_values = line.split(',')
        # Some values are always in the same position
        lat = float(station_values[2])
        lon = float(station_values[3])
        # Other values must be found by their label, and not all the
        # stations report pressure data
        press = None
        for var in station_values:
            if 'PRES_nmar=' in var and 'QPRES_nmar=' not in var:
                press = float(var.replace('PRES_nmar=', ''))
        if press is not None:
            data['lon'].append(lon)
            data['lat'].append(lat)
            data['press'].append(press)
    f.close()

    return data
  • The script defines a function so it can be used from other files, as we will do in the next section.
  • We use the latest data, so it is necessary to sort the directory and file names to work out which file to download. To understand it better, just connect to the FTP server and check how it is organized.
  • The data is compressed, so gzip is used to open it.
  • Finally, only one of the values is used, so it is selected. Other maps could be made from the same data set, although interpolating some variables, like temperature, this way doesn't make sense.
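The "sort by name" trick in the second point works because the directory and file names embed the date in a fixed-width format, so lexicographic order matches chronological order. A minimal sketch (these names are made up to illustrate the pattern, not actual AEMet names):

```python
# Hypothetical date-stamped directory names, AEMet style: because the
# date is fixed-width and leads the name, sorting them as plain strings
# also sorts them chronologically.
names = [
    '20120704_diezminutales',
    '20120706_diezminutales',
    '20120705_diezminutales',
]
names.sort()
newest = names.pop()  # the last name is also the newest one
```

This is why the scripts can simply call `sort()` and `pop()` instead of parsing any dates.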

Viewing the data 

To represent the data in the example, we will create a GeoTIFF layer for the interpolated raster, and a Shapefile for the point data. Then, using QGIS, we will draw the map. The coastline has been downloaded from the NOAA coastline extractor.
from get_data import get_data
from invdistgis import invDist
from osgeo import osr
from osgeo import ogr


def writeShape(data, proj, outFile):
    driverName = "ESRI Shapefile"
    drv = ogr.GetDriverByName(driverName)
    dsOut = drv.CreateDataSource(outFile)
    outLayer = dsOut.CreateLayer('PRESS', proj, ogr.wkbPoint)
    field = ogr.FieldDefn("PRESS", ogr.OFTReal)
    outLayer.CreateField(field)
    for i in range(len(data['press'])):
        geometry = ogr.CreateGeometryFromWkt(
            "POINT(" + str(data['lon'][i]) + " " + str(data['lat'][i]) + ")")
        feature = ogr.Feature(outLayer.GetLayerDefn())
        feature.SetField('PRESS', data['press'][i])
        feature.SetGeometry(geometry)
        outLayer.CreateFeature(feature)

    dsOut = None


    
def createMap():
    #Getting the data from the function
    data = get_data()
    driverName = "GTiff"
    outFile = "press.tiff"
    power = 2
    smoothing = 5
    xSize = 200
    ySize = 200
    proj = osr.SpatialReference()
    proj.SetFromUserInput("EPSG:4326")
    geotransform = [-10.35, 0.07185, 0, 44.28, 0, -0.0458]
    writeShape(data, proj, "press.shp")
    invDist(data['lon'], data['lat'], data['press'], geotransform, proj,
            xSize, ySize, power, smoothing, driverName, outFile)
  
    
if __name__ == "__main__":
    createMap()
  • I have created a separate function to write the shapefile, so it's easier to understand how to do it.
  • If you run the script and the shapefile already exists, GDAL will throw an error. I think that is a kind of bug... or feature!
  • To get the data, the script uses the function described above.
  • The map creation uses the inverse of the distance function from the entry that explained how to do it.
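A side note on the geotransform used in createMap: it maps pixel indices to coordinates as x = gt[0] + i·gt[1] and y = gt[3] + j·gt[5] (the rotation terms gt[2] and gt[4] are zero). A quick sanity check shows that the 200×200 grid covers the area around Spain:

```python
geotransform = [-10.35, 0.07185, 0, 44.28, 0, -0.0458]

def pixel_to_coords(gt, i, j):
    """Map the pixel (i, j) to map coordinates with a GDAL-style geotransform."""
    x = gt[0] + i * gt[1] + j * gt[2]
    y = gt[3] + i * gt[4] + j * gt[5]
    return x, y

# Corners of the 200x200 grid used in createMap
upper_left = pixel_to_coords(geotransform, 0, 0)       # (-10.35, 44.28)
lower_right = pixel_to_coords(geotransform, 200, 200)  # roughly (4.02, 35.12)
```

Note that gt[5] is negative because raster rows grow downwards while latitude grows upwards.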
The final result looks like this:

Thursday, June 14, 2012

Density maps using GDAL/OGR python

Having a cloud of points is quite common. This tutorial shows how to calculate the density of points in a set of polygons.
As always, you can download all the data and scripts.

The data

Seeking some data for the tutorial, I found the positions of all the tornado touchdowns in the period 1950-2004. Calculating the "tornado density" in the US seemed interesting enough.
You can find the data at the data.gov site.

The density can be calculated in a regular grid or in a set of polygons. We will do both things, using the US counties as the polygons, which I found at wingis.org.

So the downloaded data looks like this when opened in QGIS:
It's easy to see that most of the tornadoes occur in the eastern half of the USA, but it's still difficult to see actual density differences.

Tornado density in every US county 

First, a script to assign a "tornado touchdown density" value to every county. What the script basically does is:
  1. Open the input datasets: the tornado points file and the counties file
  2. Create the output file and the fields we will use (name and state name, plus the density field)
  3. For each county, run the Within method against all the points representing tornadoes.
  4. Calculate the density. Since the file is in a latlon projection, the GetArea method would return square degrees, which distorts the comparison, because the actual area is much smaller in northern regions.
    So we reproject the county geometry to the EPSG:900913 projection, which uses meters. I think that is still not quite correct, and that UTM would be better, but the zone changes too much and it's out of the scope of the entry.
    The density is simply the number of tornadoes divided by the county area.
  5. Write the output file.
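The point-in-polygon test in step 3 is done by OGR's Within method. As a dependency-free illustration of what that test does, here is a minimal ray-casting sketch; the "county" ring and the touchdown points are made-up toy data, not part of the tutorial dataset:

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a horizontal ray
    starting at (x, y) crosses; an odd count means the point is inside."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# A 1x1 degree square "county" and three tornado touchdowns
county = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
points = [(0.5, 0.5), (0.2, 0.9), (3.0, 3.0)]
count = sum(point_in_polygon(x, y, county) for x, y in points)
# count is 2; the density of step 4 would then be count / area_km2
```

OGR's Within does the same kind of test (plus boundary handling) on real county geometries, which is why the script below is slow: it runs it for every point against every county.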
So the code looks like:
from osgeo import ogr
from osgeo import osr
import sys


def density(pointsFile,polygonsFile,outFile):
    #Opening the input files
    dsPolygons = ogr.Open(polygonsFile)
    if dsPolygons is None:
        raise Exception('Could not open ' + polygonsFile)
    dsPoints = ogr.Open(pointsFile)
    if dsPoints is None:
        raise Exception('Could not open ' + pointsFile)
    
    polygonsLayer = dsPolygons.GetLayer()
    pointsLayer = dsPoints.GetLayer()


    #Creating the output file

    driverName = "ESRI Shapefile"
    drv = ogr.GetDriverByName( driverName )
    dsOut = drv.CreateDataSource( outFile )  
    outLayer = dsOut.CreateLayer('DENSITY', None, ogr.wkbPolygon)  
    field = ogr.FieldDefn( "Density", ogr.OFTReal )
    outLayer.CreateField ( field )
    field = ogr.FieldDefn( "Name", ogr.OFTString )
    outLayer.CreateField ( field )
    field = ogr.FieldDefn( "State_name", ogr.OFTString )
    outLayer.CreateField ( field )
    
    #Preparing the coordinate transformation
    srsQuery = osr.SpatialReference()
    srsQuery.ImportFromEPSG(4326)    
    srsArea = osr.SpatialReference()
    srsArea.ImportFromEPSG(900913)  

    transf = osr.CoordinateTransformation(srsQuery,srsArea)

    #Iterating all the polygons
    polygonFeature = polygonsLayer.GetNextFeature()
    k=0
    while polygonFeature:
        k = k + 1
        print("processing " + polygonFeature.GetField("NAME") + " (" +
              polygonFeature.GetField("STATE_NAME") + ") - " + str(k) +
              " of " + str(polygonsLayer.GetFeatureCount()))
         
        geometry = polygonFeature.GetGeometryRef()
        numCounts = 0.0
        pointsLayer.ResetReading()
        pointFeature = pointsLayer.GetNextFeature()
        #Iterating all the points
        while pointFeature:
            if pointFeature.GetGeometryRef().Within(geometry):
                numCounts = numCounts + 1
            pointFeature = pointsLayer.GetNextFeature()
        
        #Calculating the actual area
        geometryArea = geometry.Clone()
        geometryArea.Transform(transf)
        polygonArea = geometryArea.GetArea()/(1000000.0)
        density = numCounts/polygonArea

        #Writing the fields in the output layer
        feature = ogr.Feature( outLayer.GetLayerDefn())
        feature.SetField( "Density", density )
        feature.SetField( "Name", polygonFeature.GetField("NAME") )
        feature.SetField( "State_name", polygonFeature.GetField("STATE_NAME") )

        feature.SetGeometry(geometry)
        outLayer.CreateFeature(feature)        

        polygonFeature = polygonsLayer.GetNextFeature()

    dsOut = None
    dsPolygons = None
    dsPoints = None


if __name__ == "__main__":
    points = sys.argv[1]
    polygons = sys.argv[2]
    outfile = sys.argv[3]

    density(points,polygons,outfile)

To run the script, just type:
python density.py "data/tornadx020.shp" "data/UScounties.shp" "out.shp"

Note that the script isn't written for the general case. Different input projections should be handled, and the field names would differ for other input files. Besides, no geometry checks are done.


The result, opened and coloured with QGIS, looks like this:
Now it is easy to see that the central USA and Florida have the highest tornado density.


Tornado density in a grid

Sometimes we don't have a particular set of polygons of interest, but a raster where every pixel holds the density value is still interesting data.
In this case, the output is created using GDAL instead of OGR, and it is a regular grid defined by the user. But the steps are very similar.
As in the example before, the area of each pixel is in square degrees, and it must be converted to square kilometers.
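An alternative to reprojecting each pixel would be to approximate the cell area on a spherical Earth with A = R² · Δλ · |sin φ₂ − sin φ₁|. This is not what the script below does (it reprojects to EPSG:900913, as before), but it shows how much the ground area of a one-degree pixel shrinks between the southern and northern edges of the example grid:

```python
import math

def cell_area_km2(lon0, lat0, lon1, lat1, radius_km=6371.0):
    """Approximate area of a lon/lat cell on a spherical Earth."""
    dlon = math.radians(abs(lon1 - lon0))
    band = abs(math.sin(math.radians(lat1)) - math.sin(math.radians(lat0)))
    return radius_km ** 2 * dlon * band

# The same 1x1 degree pixel near the south and north edges of the grid
south = cell_area_km2(-90.0, 24.0, -89.0, 25.0)
north = cell_area_km2(-90.0, 50.0, -89.0, 51.0)
# a pixel near 50N covers roughly 30% less ground than one near 24N
```

This is why dividing the counts by a constant "pixel area in degrees" would bias the densities towards the north.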
The code is:
from osgeo import ogr
from osgeo import osr
from osgeo import gdal
from struct import pack
import sys

def density(pointsFile,outFile,x0,y0,x1,y1,samples,lines):
    
    deltax = (x1 - x0) / samples  
    deltay = (y1 - y0) / lines

    dsPoints = ogr.Open(pointsFile)
    if dsPoints is None:
       raise Exception('Could not open ' + pointsFile)
    
    pointsLayer = dsPoints.GetLayer()

    #Preparing the coordinate transformation
    
    srsQuery = osr.SpatialReference()
    srsQuery.ImportFromEPSG(4326)    
    srsArea = osr.SpatialReference()
    srsArea.ImportFromEPSG(900913)  

    transf = osr.CoordinateTransformation(srsQuery,srsArea)

    driver = gdal.GetDriverByName('GTiff')
    dsOut = driver.Create(outFile, samples, lines, 1, gdal.GDT_Float32)
    dsOut.SetProjection(srsQuery.ExportToWkt())
    dsOut.SetGeoTransform([x0,deltax,0,y0,0,deltay])
    

    #Iterating all the pixels to write the raster values
    dataString = b''
    for j in range(0, lines):
        for i in range(0, samples):
            print("sample " + str(i) + " line " + str(j))
            px = x0 + deltax * i
            py = y0 + deltay * j
            #The pixel geometry must be created to run the Within method
            #and to calculate the actual area
            wkt = ("POLYGON((" + str(px) + " " + str(py) + "," +
                   str(px + deltax) + " " + str(py) + "," +
                   str(px + deltax) + " " + str(py + deltay) + "," +
                   str(px) + " " + str(py + deltay) + "," +
                   str(px) + " " + str(py) + "))")

            geometry = ogr.CreateGeometryFromWkt(wkt)

            numCounts = 0.0
            pointsLayer.ResetReading()
            pointFeature = pointsLayer.GetNextFeature()

            #Iterating all the points
            while pointFeature:
                if pointFeature.GetGeometryRef().Within(geometry):
                    numCounts = numCounts + 1
                pointFeature = pointsLayer.GetNextFeature()

            geometryArea = geometry.Clone()
            geometryArea.Transform(transf)
            polygonArea = geometryArea.GetArea() / 1000000.0
            density = numCounts / polygonArea
            dataString = dataString + pack('f', density)

    #Writing the raster
    dsOut.GetRasterBand(1).WriteRaster(0, 0, samples, lines, dataString)

    dsOut = None
    dsPoints = None
if __name__ == "__main__":
    points = sys.argv[1]
    outfile = sys.argv[2]
    x0=float(sys.argv[3])
    y0=float(sys.argv[4])
    x1=float(sys.argv[5])
    y1=float(sys.argv[6])
    samples=int(sys.argv[7])
    lines=int(sys.argv[8])

  
    density(points,outfile,x0,y0,x1,y1,samples,lines)

To run the script, type:

python density_raster.py "data/tornadx020.shp" "grid.tiff" -127.0 51.0 -65.0 24.0 50 50

Where the numbers are the corner coordinates and the size of the output raster, as explained in the code.

Again, the script takes quite a long time to run.

The result seen with QGIS looks like this:
It is similar to the counties map, because the pixels and the counties have similar sizes. The result may be easier to handle in this form.

Final thoughts

In the example data, executing the script takes a long time, since there are about 53000 points and about 3500 counties, and every point is tested against every polygon. Using PostGIS would be faster, but the example doesn't require installing anything more than GDAL and Python.
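One cheap speedup that needs no extra installations would be to bin the points into a coarse grid in a single pass, and then run the expensive Within test only on the points whose bin overlaps each polygon's bounding box. A sketch of the binning step (the points are made up, and bin_points is a hypothetical helper, not part of the scripts above):

```python
from collections import defaultdict

def bin_points(points, x0, y0, cell):
    """Assign each (x, y) point to an integer grid cell, in one pass."""
    bins = defaultdict(list)
    for x, y in points:
        key = (int((x - x0) // cell), int((y - y0) // cell))
        bins[key].append((x, y))
    return bins

# Three imaginary touchdowns binned into 1-degree cells over the US grid
points = [(-100.2, 35.1), (-100.4, 35.3), (-80.1, 27.9)]
bins = bin_points(points, x0=-127.0, y0=24.0, cell=1.0)
# Only the bins covering a county's bounding box need the Within test
```

This turns the all-points scan per polygon into a lookup of a few nearby bins, which is essentially what a spatial index in PostGIS does for you.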

It would be nice to improve the code: wrapping all the functionality in a class, checking the input values, accepting different projections, etc.

Remember that all the example files are available in a zip file.