Friday, July 6, 2012

Testing inverse of the distance interpolation with AEMet pressure data

I like testing the things I learn with actual data. Some time ago, I wrote an entry about how to use the inverse of the distance to interpolate data from scattered points. Now, I will show how to use it to draw a sea level pressure map from the data given by the Spanish Meteorology Agency. The Agency has a nice Open Data section that is worth a visit if you are interested in meteorology, as I am.

As usual, I have uploaded all the files you need to check the entry.
Getting the data
The station data from the AEMet is available on their FTP server.
In the example, an FTP connection to the AEMet server is made, and only the sea level pressure is parsed. If the actual pressure were used, the interpolation would be wrong, since the altitude wouldn't be considered. AEMet gives the corrected pressure directly.
The file format is quite easy to understand and parse. It's a compressed CSV file, with the variable name indicated in each value. Not all the stations have pressure data, and the file's directory changes its name with the date, so the script has to do some checks:


from ftplib import FTP
import gzip


def get_data():
    '''
    Gets the last sea level pressure values from the AEMet service.
    '''
    ftp = FTP('ftpdatos.aemet.es')  # connect to host, default port
    ftp.login()  # user anonymous, passwd anonymous@
    # Getting into the 10-minute data directory
    ftp.cwd('datos_observacion/observaciones_diezminutales')

    files = ftp.nlst()  # list directory contents
    # Getting the newest day directory. It must include the word
    # "diezminutales". The name includes the date, so after ordering
    # by name, the last one is also the newest one.
    directories = []
    for file_name in files:
        if file_name.endswith('diezminutales'):
            directories.append(file_name)

    directories.sort()
    ftp.cwd(directories.pop())

    # Getting the last file, ordered by name
    files = ftp.nlst()
    data_files = []
    for file_name in files:
        if file_name.endswith('gz'):
            data_files.append(file_name)

    data_files.sort()

    data_file = data_files.pop()
    ftp.retrbinary("RETR " + data_file, open(data_file, 'wb').write)

    # The files are compressed with gzip. Opening in text mode to
    # iterate over the lines as strings.
    f = gzip.open(data_file, 'rt')

    data = {'lon': [], 'lat': [], 'press': []}
    # Parsing the file line by line
    for line in f:
        station_values = line.split(',')
        # Some values are always in the same position
        lat = float(station_values[2])
        lon = float(station_values[3])
        # Other values must be found using their label name, and not
        # all the stations give pressure data
        press = None
        for var in station_values:
            if 'PRES_nmar=' in var and 'QPRES_nmar=' not in var:
                press = float(var.replace('PRES_nmar=', ''))
        if press is not None:
            data['lon'].append(lon)
            data['lat'].append(lat)
            data['press'].append(press)
    f.close()

    return data
  • The script defines a function so it can be used from other files, as we will do in the next section.
  • We will use the latest data, so it is necessary to sort the directory and file names to find which file to download. To understand it better, just connect to the FTP server and check how it is organized.
  • The data is compressed, so I use gzip to open it.
  • Finally, only the sea level pressure value is used. Other maps could be drawn from the same data set, although interpolating some variables, like temperature, this way doesn't make sense.
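The label-based lookup can be tried on a single line without connecting to the FTP server. The sample line below is made up for illustration; only the fields the script actually uses (latitude, longitude and PRES_nmar) matter:

```python
# A made-up line following the AEMet 10-minute CSV layout: fixed
# positions for coordinates, labelled values for the variables
line = 'IDSTATION,B012,39.55,2.62,PRES=1010.2,PRES_nmar=1013.4,QPRES_nmar=5'

station_values = line.split(',')
lat = float(station_values[2])
lon = float(station_values[3])

press = None
for var in station_values:
    # The quality flag QPRES_nmar also contains the label PRES_nmar,
    # so it has to be excluded explicitly
    if 'PRES_nmar=' in var and 'QPRES_nmar=' not in var:
        press = float(var.replace('PRES_nmar=', ''))

print(lat, lon, press)  # 39.55 2.62 1013.4
```

Note that the second condition is needed: a simple substring test for PRES_nmar= would also match the QPRES_nmar quality flag.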

Viewing the data 

To represent the data in the example, we will create a GeoTIFF layer for the interpolated raster, and a Shapefile for the point data. Then, using QGIS, we will draw the map. The coastline has been downloaded from the NOAA coastline extractor.
from get_data import get_data
from invdistgis import invDist
from osgeo import osr
from osgeo import ogr


def writeShape(data, proj, outFile):
    driverName = "ESRI Shapefile"
    drv = ogr.GetDriverByName(driverName)
    dsOut = drv.CreateDataSource(outFile)
    # Passing the spatial reference so the layer gets a .prj file
    outLayer = dsOut.CreateLayer('PRESS', proj, ogr.wkbPoint)
    field = ogr.FieldDefn("PRESS", ogr.OFTReal)
    outLayer.CreateField(field)
    for i in range(len(data['press'])):
        geometry = ogr.CreateGeometryFromWkt(
            "POINT(" + str(data['lon'][i]) + " " + str(data['lat'][i]) + ")")
        feature = ogr.Feature(outLayer.GetLayerDefn())
        feature.SetField('PRESS', data['press'][i])
        feature.SetGeometry(geometry)
        outLayer.CreateFeature(feature)

    dsOut = None


    
def createMap():
    # Getting the data from the function
    data = get_data()
    driverName = "GTiff"
    outFile = "press.tiff"
    power = 2
    smoothing = 5
    xSize = 200
    ySize = 200
    proj = osr.SpatialReference()
    proj.SetFromUserInput("EPSG:4326")
    geotransform = [-10.35, 0.07185, 0, 44.28, 0, -0.0458]
    writeShape(data, proj, "press.shp")
    invDist(data['lon'], data['lat'], data['press'], geotransform, proj,
            xSize, ySize, power, smoothing, driverName, outFile)
  
    
if __name__ == "__main__":
    createMap()
  • I have created a separate function to write the shapefile, so it's easier to see how it is done.
  • If you run the script and the shapefile already exists, GDAL will throw an error. I think that is a kind of bug... or feature!
  • To get the data, the script uses the function described above.
  • The map creation uses the inverse of the distance function from the entry that explained how to implement it.
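The invDist function itself lives in the invdistgis module from that earlier entry, but the core of the method can be sketched in a few lines of NumPy for a single point. The power and smoothing parameters play the same role as in the call above; this is an illustration of the technique, not the actual invdistgis code:

```python
import numpy as np


def idw_point(x, y, xs, ys, values, power=2, smoothing=5):
    """Inverse distance weighted estimate at (x, y) from the
    scattered samples (xs, ys, values)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    values = np.asarray(values, dtype=float)
    # The smoothing term keeps the distance from reaching zero at the
    # sample points, which softens the bull's-eye effect
    dist = np.sqrt((xs - x) ** 2 + (ys - y) ** 2 + smoothing ** 2)
    weights = 1.0 / dist ** power
    return np.sum(weights * values) / np.sum(weights)


# With identical sample values, the estimate is that same value
print(idw_point(0, 0, [-1, 1], [0, 0], [1013.0, 1013.0]))  # 1013.0
```

The raster version simply evaluates this estimate for every cell of the grid defined by the geotransform and the x and y sizes.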
The final result looks like this: