OK, I have to admit that I was so happy when I found the scipy rbf function that I rushed the entry about inverse distance interpolation.
I swear that the images are real (you can check it yourself), but when you change a few parameters or the point position, the result is really strange, definitely not the inverse of the distance algorithm I was thinking about.
Looking a little more around the GDAL documentation, I found this explanation, which is simple and clear:
http://www.gdal.org/grid_tutorial.html
The problem is that, at least in my Python GDAL version, it is not possible to call the function directly, so I've been coding a little to get it.
All the example code and files can be downloaded in a zip file.
Let's explain it step by step:
The data in the first example is stored in three lists, but the correct way to do it is to read it from a geometry file with the OGR library, so a function that does that (readPoints) is defined further down.
First version: the algorithm
Putting all the code at once makes things difficult to understand, so first I'll reproduce the same program I wanted to show in the previous entry, using matplotlib to show the results:
    from math import pow
    from math import sqrt

    import numpy as np
    import matplotlib.pyplot as plt


    def pointValue(x, y, power, smoothing, xv, yv, values):
        numerator = 0
        denominator = 0
        for i in range(0, len(values)):
            dist = sqrt((x - xv[i]) * (x - xv[i]) + (y - yv[i]) * (y - yv[i]) + smoothing * smoothing)
            # If the point is really close to one of the data points,
            # return the data point value to avoid singularities
            if dist < 0.0000000001:
                return values[i]
            numerator = numerator + (values[i] / pow(dist, power))
            denominator = denominator + (1 / pow(dist, power))
        # Return NODATA if the denominator is zero
        if denominator > 0:
            value = numerator / denominator
        else:
            value = -9999
        return value


    def invDist(xv, yv, values, xsize=100, ysize=100, power=2, smoothing=0):
        valuesGrid = np.zeros((ysize, xsize))
        for x in range(0, xsize):
            for y in range(0, ysize):
                valuesGrid[y][x] = pointValue(x, y, power, smoothing, xv, yv, values)
        return valuesGrid


    if __name__ == "__main__":
        power = 1
        smoothing = 20

        # Creating some data, with each coordinate and the values stored in separate lists
        xv = [10, 60, 40, 70, 10, 50, 20, 70, 30, 60]
        yv = [10, 20, 30, 30, 40, 50, 60, 70, 80, 90]
        values = [1, 2, 2, 3, 4, 6, 7, 7, 8, 10]

        # Creating the output grid (100x100, in the example)
        ti = np.linspace(0, 100, 100)
        XI, YI = np.meshgrid(ti, ti)

        # Creating the interpolation function and populating the output matrix value
        ZI = invDist(xv, yv, values, 100, 100, power, smoothing)

        # Plotting the result
        plt.subplot(1, 1, 1)
        plt.pcolor(XI, YI, ZI)
        plt.scatter(xv, yv, 100, values)
        plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
        plt.xlim(0, 100)
        plt.ylim(0, 100)
        plt.colorbar()
        plt.show()
- The main block defines some sample data, calls the interpolation function and draws the result using matplotlib. Just copy and paste it, since it would take long to explain.
- invDist function: a value needs to be calculated for each point in the output matrix. First, a ySize x xSize array of zeros is created. Then, for each point coordinate (the pixel position, in this simple case), the function that calculates the value is called.
- pointValue function: this is the important part. For each pixel we apply the formula $V=\frac{\sum\limits_{i=1}^n \frac{v_{i}}{d_{i}^{p}}}{\sum\limits_{i=1}^n \frac{1}{d_{i}^{p}}}$, where:
- $V$ is the interpolated value.
- $n$ is the number of points.
- $d_{i}$ is the distance from point $i$ to the pixel (in the code, the smoothing is added to it under the square root).
- $v_{i}$ is the value of point $i$.
- $p$ is the power we want to apply to the distance to weight it.
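As a quick check of the formula, here is a minimal sketch for a single pixel and two data points. The helper name `idw_value` and the sample numbers are made up for the illustration, and it takes the distances already computed instead of coordinates.

```python
def idw_value(distances, values, power):
    """Inverse distance weighted mean for one pixel.

    distances[i] is d_i (must be > 0), values[i] is v_i, power is p.
    Hypothetical helper for illustration; it is not part of the script.
    """
    weights = [1.0 / d ** power for d in distances]
    weighted_sum = sum(w * v for w, v in zip(weights, values))
    return weighted_sum / sum(weights)


# One pixel, two data points: value 0 at distance 1, value 10 at distance 3.
distances = [1.0, 3.0]
values = [0.0, 10.0]

v_p1 = idw_value(distances, values, power=1)  # weights 1 and 1/3
v_p2 = idw_value(distances, values, power=2)  # weights 1 and 1/9
print(v_p1, v_p2)
```

With power 1 the result is about 2.5, and with power 2 it drops to about 1.0: the higher the power, the more the closest point dominates.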
If the smoothing is zero and the power is high, the interpolation changes a lot around the points to give them their exact value.
If the smoothing is high and the power is one, the result is much smoother, but the values at the points are not maintained.
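The effect of the smoothing on the data points themselves can be checked with a small sketch. It uses the same distance definition as pointValue, but the function name `smoothed_idw`, the tuple layout and the two sample points are assumptions made for the example.

```python
from math import sqrt


def smoothed_idw(x, y, points, power, smoothing):
    """IDW at (x, y); points is a list of (px, py, value) tuples.

    The smoothing is added under the square root, as in pointValue,
    so the distance can never be smaller than the smoothing itself.
    """
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        dist = sqrt((x - px) ** 2 + (y - py) ** 2 + smoothing ** 2)
        if dist < 1e-10:
            return value  # exactly on a data point and no smoothing
        numerator += value / dist ** power
        denominator += 1.0 / dist ** power
    return numerator / denominator


points = [(0, 0, 0.0), (4, 0, 10.0)]  # made-up sample data

# Evaluate on top of the first data point, whose value is 0
exact = smoothed_idw(0, 0, points, power=1, smoothing=0)
smoothed = smoothed_idw(0, 0, points, power=1, smoothing=3)  # distances 3 and 5
print(exact, smoothed)
```

Without smoothing the pixel keeps the point value (0.0). With a smoothing of 3 the same pixel gets about 3.75, because the second point now contributes as well.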
Adding the GIS stuff:
The example above doesn't write to a GIS file. The function invDist must be changed so it creates the file using a gdal driver:
    import numpy as np
    from osgeo import gdal


    def invDist(xv, yv, values, geotransform, proj, xSize, ySize, power, smoothing, driverName, outFile):
        # Transform geographic coordinates to pixels (the input lists are modified in place)
        for i in range(0, len(xv)):
            xv[i] = (xv[i] - geotransform[0]) / geotransform[1]
        for i in range(0, len(yv)):
            yv[i] = (yv[i] - geotransform[3]) / geotransform[5]

        # Creating the file
        driver = gdal.GetDriverByName(driverName)
        ds = driver.Create(outFile, xSize, ySize, 1, gdal.GDT_Float32)
        if proj is not None:
            ds.SetProjection(proj.ExportToWkt())
        ds.SetGeoTransform(geotransform)

        # Getting the interpolated values
        valuesGrid = np.zeros((ySize, xSize))
        for x in range(0, xSize):
            for y in range(0, ySize):
                valuesGrid[y][x] = pointValue(x, y, power, smoothing, xv, yv, values)

        # Writing the band; setting the dataset to None closes the file
        ds.GetRasterBand(1).WriteArray(valuesGrid)
        ds = None
        return valuesGrid
- Note that the function will need the geographic parameters geotransform and proj, as well as the output file name and driver.
- First, the point positions must be converted to pixel locations in order to calculate the distances.
- Then, the output file is created with the size and projection requested by the user.
- The values are calculated in the same way as in the other example
- The file is written
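The first step of the list, converting map coordinates to pixel positions, can be tried on its own. This is a minimal sketch that assumes a north-up geotransform (rotation terms equal to zero), as the script does. The function name `world_to_pixel` and the sample raster are hypothetical.

```python
def world_to_pixel(x, y, geotransform):
    """Convert map coordinates to (fractional) pixel coordinates.

    Assumes geotransform[2] and geotransform[4] (the rotation terms)
    are zero, which is the case for the geotransform built in the script.
    """
    col = (x - geotransform[0]) / geotransform[1]
    row = (y - geotransform[3]) / geotransform[5]
    return col, row


# Hypothetical raster: upper-left corner at (500000, 4600000), 10 m pixels.
geotransform = [500000.0, 10.0, 0.0, 4600000.0, 0.0, -10.0]

col, row = world_to_pixel(500250.0, 4599900.0, geotransform)
print(col, row)
```

The point lies 250 m east and 100 m south of the corner, so it lands on column 25 and row 10. The pixel height is negative because rows grow downwards while the y coordinate grows upwards.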
    from osgeo import ogr


    def readPoints(dataFile, zField='Z'):
        data = {}
        xv = []
        yv = []
        values = []

        ds = ogr.Open(dataFile)
        if ds is None:
            raise Exception('Could not open ' + dataFile)

        layer = ds.GetLayer()
        proj = layer.GetSpatialRef()
        extent = layer.GetExtent()

        feature = layer.GetNextFeature()
        if feature is None:
            raise Exception('No features found in ' + dataFile)
        if feature.GetFieldIndex(zField) == -1:
            raise Exception('zField is not valid: ' + zField)

        # Reading the position and the value of every point
        while feature:
            geometry = feature.GetGeometryRef()
            xv.append(geometry.GetX())
            yv.append(geometry.GetY())
            values.append(feature.GetField(zField))
            feature = layer.GetNextFeature()

        data['extent'] = extent
        data['xv'] = xv
        data['yv'] = yv
        data['values'] = values
        data['proj'] = proj
        ds = None
        return data
- The function reads the point position and values from the given file and field name.
- A dictionary is returned, which makes it easy to retrieve all the values we may need to create the output raster. If no projection or geotransform is passed by the user, the input file extent and projection are used.
- In the example files, a file called points.shp has some sample points to test the functions.
    import sys

    from osgeo import gdal, osr

    if __name__ == "__main__":
        # Setting the default values
        power = 2
        smoothing = 0
        zField = 'Z'
        dataFile = None
        outFile = None
        driverName = 'GTiff'
        proj = None
        geotransform = None
        xMin = None
        xMax = None
        yMin = None
        yMax = None
        xSize = 100
        ySize = 100

        # Parsing the command line (quotes around the values are stripped)
        argv = gdal.GeneralCmdLineProcessor(sys.argv)
        i = 1
        while i < len(argv):
            arg = argv[i]
            if arg == '-out_format':
                driverName = argv[i+1].replace("'", "").replace('"', '')
                i = i + 1
            elif arg == '-zfield':
                zField = argv[i+1].replace("'", "").replace('"', '')
                i = i + 1
            elif arg == '-a_srs':
                proj = argv[i+1].replace("'", "").replace('"', '')
                i = i + 1
            elif arg == '-outsize':
                xSize = int(argv[i+1].replace("'", "").replace('"', ''))
                ySize = int(argv[i+2].replace("'", "").replace('"', ''))
                i = i + 2
            elif arg == '-txe':
                xMin = float(argv[i+1].replace("'", "").replace('"', ''))
                xMax = float(argv[i+2].replace("'", "").replace('"', ''))
                i = i + 2
            elif arg == '-tye':
                yMin = float(argv[i+1].replace("'", "").replace('"', ''))
                yMax = float(argv[i+2].replace("'", "").replace('"', ''))
                i = i + 2
            elif dataFile is None:
                dataFile = arg.replace("'", "").replace('"', '')
            elif outFile is None:
                outFile = arg.replace("'", "").replace('"', '')
            i = i + 1

        # usage() is defined in the full script that comes with the example files
        if dataFile is None or outFile is None:
            usage()

        try:
            data = readPoints(dataFile, zField)
        except Exception as ex:
            print(ex)
            sys.exit(0)

        # If no extent is given, the extent of the input file is used
        if xMin is None:
            xMin = data['extent'][0]
            xMax = data['extent'][1]
        if yMin is None:
            yMin = data['extent'][2]
            yMax = data['extent'][3]

        geotransform = []
        geotransform.append(xMin)
        geotransform.append((xMax - xMin) / xSize)
        geotransform.append(0)
        geotransform.append(yMax)
        geotransform.append(0)
        geotransform.append((yMin - yMax) / ySize)

        if proj is None:
            proj = data['proj']
        else:
            try:
                # Keep the user string before replacing proj with the SpatialReference
                userProj = proj
                proj = osr.SpatialReference()
                proj.SetFromUserInput(str(userProj))
            except Exception as ex:
                print(ex)
                sys.exit(0)

        # Creating the interpolation function and populating the output matrix value
        try:
            ZI = invDist(data['xv'], data['yv'], data['values'], geotransform, proj,
                         xSize, ySize, power, smoothing, driverName, outFile)
        except Exception as ex:
            print(ex)
            sys.exit(0)
- First, the default parameter values are set. Some of them are None, since they are mandatory.
- Then, the input arguments are read.
- Once the input arguments have been read, all the parameters that depend on them are set.
- The function that reads the input file is run and, with its result, the raster is created. Note that the geotransform is defined from the input file extent if none is set by the user.
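The geotransform construction from the last point can be isolated in a few lines. This is only a sketch of the same arithmetic the main block does. The function name `geotransform_from_extent` and the extent values are made up, and the extent is assumed to follow the order returned by OGR's `GetExtent()`: (xMin, xMax, yMin, yMax).

```python
def geotransform_from_extent(x_min, x_max, y_min, y_max, x_size, y_size):
    """Build a north-up GDAL-style geotransform covering the extent."""
    return [
        x_min,                     # x of the upper-left corner
        (x_max - x_min) / x_size,  # pixel width
        0,                         # no rotation
        y_max,                     # y of the upper-left corner
        0,                         # no rotation
        (y_min - y_max) / y_size,  # pixel height, negative for north-up
    ]


# Made-up extent in the (xMin, xMax, yMin, yMax) order, for a 100x100 raster
extent = (0.0, 200.0, 1000.0, 1500.0)
gt = geotransform_from_extent(extent[0], extent[1], extent[2], extent[3], 100, 100)
print(gt)
```

For this extent each pixel is 2 units wide and 5 units high, and the origin is the upper-left corner (0, 1500).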
The points are not coloured, but are the ones in the original file.
The whole code comes in the example file, with all the data you need to run the example.
To generate the picture above, the command is:
python invdistgis.py -zfield value points.shp out.tiff
What's next
The original gdal_grid implements more interpolation methods (nearest value, and average). Doing the same in the script is as easy as adding other functions to use instead of pointValue.
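As a rough idea of what those replacement functions could look like, here is a sketch that keeps the pointValue signature so they can be swapped in without touching invDist. The names `nearestValue` and `averageValue` are my own, and the average shown is the plain mean of all the points, which is a simplification and not a copy of what gdal_grid does.

```python
from math import sqrt


def nearestValue(x, y, power, smoothing, xv, yv, values):
    """Value of the closest data point; power and smoothing are ignored."""
    best_dist = None
    best_value = -9999  # NODATA when there are no points
    for i in range(len(values)):
        dist = sqrt((x - xv[i]) ** 2 + (y - yv[i]) ** 2)
        if best_dist is None or dist < best_dist:
            best_dist = dist
            best_value = values[i]
    return best_value


def averageValue(x, y, power, smoothing, xv, yv, values):
    """Plain mean of all the data points, the same for every pixel."""
    if len(values) == 0:
        return -9999
    return sum(values) / float(len(values))


# Made-up sample data
xv = [10, 60, 40]
yv = [10, 20, 30]
values = [1, 2, 6]

nearest = nearestValue(55, 22, 2, 0, xv, yv, values)  # closest point is (60, 20)
average = averageValue(55, 22, 2, 0, xv, yv, values)
print(nearest, average)
```

Without a search area the average is constant over the whole raster, so it only becomes interesting once the points are filtered by distance.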
Also, in the original program, an ellipse can be set to search only in some area around each pixel. This would be a little more difficult to implement, but it is easy to find in the GDAL source code, in a file called gdalgrid.cpp. I've put it in the example data.
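A simplified version of the search ellipse could look like the sketch below. It is not a translation of gdalgrid.cpp: the ellipse is axis-aligned (no rotation angle), the function name `pointValueInEllipse` is made up, and the `radius1` and `radius2` parameters (semi-axes along x and y, in pixels) are only named after the gdal_grid options.

```python
from math import sqrt


def pointValueInEllipse(x, y, power, smoothing, xv, yv, values, radius1, radius2):
    """IDW using only the points inside an axis-aligned search ellipse."""
    numerator = 0.0
    denominator = 0.0
    for i in range(len(values)):
        dx = x - xv[i]
        dy = y - yv[i]
        # Skip the points that fall outside the ellipse centred on the pixel
        if (dx / float(radius1)) ** 2 + (dy / float(radius2)) ** 2 > 1:
            continue
        dist = sqrt(dx * dx + dy * dy + smoothing * smoothing)
        if dist < 1e-10:
            return values[i]
        numerator += values[i] / dist ** power
        denominator += 1.0 / dist ** power
    # NODATA when no point falls inside the ellipse
    if denominator > 0:
        return numerator / denominator
    return -9999


# Made-up sample data
xv = [10, 60]
yv = [10, 20]
values = [1, 2]

near = pointValueInEllipse(12, 10, 2, 0, xv, yv, values, 20, 10)  # only the first point is inside
far = pointValueInEllipse(90, 90, 2, 0, xv, yv, values, 20, 10)   # no point is inside
print(near, far)
```

The first pixel only sees the point at (10, 10), so it gets its value, and the second one gets the NODATA value because the ellipse is empty.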
As you can see, implementing this kind of stuff in python is not as difficult as it may seem at the beginning.