Saturday, May 26, 2012

Creating a grid from scattered data using inverse of the distance with python (gdal_grid approach)

OK, I have to admit that I was so happy when I found the scipy rbf function that I wrote the entry about inverse of the distance too quickly.
I swear that the images are real (you can check it yourself), but when you change a few parameters or the point positions, the result is really strange, definitely not the inverse of the distance algorithm I had in mind.
Looking a little more around the gdal documentation, I found this explanation, which is very easy and clear:
The problem is that, at least in my python gdal version, it is not possible to call the function directly, so I've coded it myself.
All the example code and files can be downloaded in a zip file.

First version: the algorithm

Putting all the code directly makes things difficult to understand, so first, I'll reproduce the same program I wanted to show in the previous entry, using matplotlib to show the results:

from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

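def pointValue(x,y,power,smoothing,xv,yv,values):
    nominator = 0
    denominator = 0
    for i in range(0,len(values)):
        dist = sqrt((x-xv[i])*(x-xv[i])+(y-yv[i])*(y-yv[i])+smoothing*smoothing)
        #If the point is really close to one of the data points, return the data point value to avoid singularities
        #(the threshold used here is an assumed value, close to the float precision)
        if dist < 0.0000000001:
            return values[i]
        #Accumulating both sums of the inverse of the distance formula
        nominator = nominator + (values[i]/pow(dist,power))
        denominator = denominator + (1/pow(dist,power))
    #Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator/denominator
    else:
        value = -9999
    return value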

def invDist(xv,yv,values,xsize=100,ysize=100,power=2,smoothing=0):
    valuesGrid = np.zeros((ysize,xsize))
    for x in range(0,xsize):
        for y in range(0,ysize):
            valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
    return valuesGrid

if __name__ == "__main__":

    #Creating some data, with each coordinate and the values stored in separate lists
    xv = [10,60,40,70,10,50,20,70,30,60]
    yv = [10,20,30,30,40,50,60,70,80,90]
    values = [1,2,2,3,4,6,7,7,8,10]
    #Interpolation parameters (the invDist defaults; change them to compare results)
    power = 2
    smoothing = 0
    #Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)

    #Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv,yv,values,100,100,power,smoothing)

    # Plotting the result
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()

Let's explain it step by step:
  1. The main method defines some sample data, calls the interpolation function and draws the result using matplotlib. Just copy & paste it, since it would take long to explain.
  2. invDist function: A value needs to be calculated for each point in the coordinates matrix. First, a ysize x xsize matrix of zeros is created. Then, for each point coordinate (the pixel position, in this simple case), the function that calculates the value is called.
  3. pointValue function: This is the important part. We apply the formula: $V=\frac{\sum\limits_{i=1}^n \frac{v_{i}}{d_{i}^{p}}}{\sum\limits_{i=1}^n \frac{1}{d_{i}^{p}}}$ for each pixel.
    • V is the interpolated value.
    • n is the number of points
    • d is the distance from the point to the pixel
    • v is the value of the point
    • p is the power we want to apply to the distance to weight it.
    The distance is the Cartesian one, enlarged by the smoothing factor, which is added in quadrature: d = sqrt(dx^2 + dy^2 + smoothing^2). This gives more distance than the actual one, lowering the influence around the points. If the distance is close to the precision of the float numbers, we return the data value instead of the interpolated one, to avoid strange results.
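As a quick check of the formula, here is a minimal worked example in plain Python. The function name idw_value and the two sample points are made up for illustration; they are not part of the program above, and the sketch assumes the pixel never coincides with a data point.

```python
from math import sqrt

def idw_value(px, py, points, power=2):
    """Inverse distance weighted value at the pixel (px, py).

    points is a list of (x, y, v) tuples (an illustrative layout).
    No smoothing and no singularity check: the pixel is assumed
    not to sit exactly on a data point.
    """
    nominator = 0.0
    denominator = 0.0
    for (x, y, v) in points:
        d = sqrt((px - x) ** 2 + (py - y) ** 2)
        nominator += v / d ** power
        denominator += 1.0 / d ** power
    return nominator / denominator

# Two points, at distances 5 and 10 from the pixel (0, 0)
sample = [(3, 4, 10.0), (6, 8, 20.0)]
print(idw_value(0, 0, sample, power=2))
```

With power 2 the weights are 1/25 and 1/100, so V = (10/25 + 20/100) / (1/25 + 1/100) = 12: the closer point dominates. With power 1 the same data gives 40/3, closer to the plain average of both values.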
And that's all. Here are the results (you can compare them with the ones in the other entry), which are consistent with the inverse of the distance method:
If the smoothing is zero and the power is high, the interpolation changes a lot around the points to give them their exact value.
If the smoothing is high and the power is one, the result is much smoother, but the values at the points are not maintained.
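The effect of the smoothing can also be checked numerically. The following sketch re-implements the point calculation under the hypothetical name idw_at so it runs on its own, and evaluates it exactly on top of a data point; the two sample points and the threshold are assumptions chosen for the example.

```python
from math import sqrt

def idw_at(px, py, xv, yv, values, power, smoothing):
    """Interpolated value at (px, py), with the smoothing added in quadrature."""
    nominator = 0.0
    denominator = 0.0
    for x, y, v in zip(xv, yv, values):
        dist = sqrt((px - x) ** 2 + (py - y) ** 2 + smoothing ** 2)
        if dist < 1e-10:  # assumed threshold for "on top of a data point"
            return v
        nominator += v / dist ** power
        denominator += 1.0 / dist ** power
    return nominator / denominator

xv = [10, 60]
yv = [10, 20]
values = [1.0, 2.0]

# Evaluating on the first data point, whose value is 1.0
exact = idw_at(10, 10, xv, yv, values, power=2, smoothing=0)
smoothed = idw_at(10, 10, xv, yv, values, power=1, smoothing=20)
print(exact, smoothed)
```

Without smoothing the data value 1.0 is returned unchanged. With a smoothing of 20 the distance to the point is never zero, so the second point pulls the result above 1.0 and the original value is no longer maintained.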

Adding the GIS stuff:

The example above doesn't write to a GIS file. The function invDist must be changed so it creates the file using a gdal driver:
from osgeo import gdal, ogr, osr
import numpy as np
import sys

def invDist(xv,yv,values,geotransform,proj,xSize,ySize,power,smoothing,driverName,outFile):
    #Transform geographic coordinates to pixels
    for i in range(0,len(xv)):
         xv[i] = (xv[i]-geotransform[0])/geotransform[1]
    for i in range(0,len(yv)):
         yv[i] = (yv[i]-geotransform[3])/geotransform[5]
    #Creating the file
    driver = gdal.GetDriverByName( driverName )
    ds = driver.Create( outFile, xSize, ySize, 1, gdal.GDT_Float32)
    #proj is expected to be an osr.SpatialReference (or None)
    if proj is not None:
        ds.SetProjection(proj.ExportToWkt())
    ds.SetGeoTransform(geotransform)
    valuesGrid = np.zeros((ySize,xSize))
    #Getting the interpolated values
    for x in range(0,xSize):
        for y in range(0,ySize):
            valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
    #Writing the values into the first band and closing the file
    ds.GetRasterBand(1).WriteArray(valuesGrid)
    ds = None
    return valuesGrid
  1. Note that the function needs the geographic parameters geotransform and proj, as well as the output file name and driver.
  2. First, the point positions must be changed to pixel locations in order to calculate distances
  3. Then, the output file is created with the size and projection requested by the user
  4. The values are calculated in the same way as in the other example
  5. The file is written
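The coordinate conversion in step 2 can be tried without GDAL. This is a minimal sketch, assuming a north-up raster (the rotation terms of the geotransform are zero); the helper name geo_to_pixel and the sample geotransform are made up for the example.

```python
def geo_to_pixel(x, y, geotransform):
    """Convert geographic coordinates to (column, row).

    geotransform follows the GDAL order:
    (x origin, pixel width, rotation, y origin, rotation, pixel height).
    The rotation terms are assumed to be zero.
    """
    col = (x - geotransform[0]) / geotransform[1]
    row = (y - geotransform[3]) / geotransform[5]
    return col, row

# Hypothetical raster: upper-left corner at (100, 500), 10-unit pixels
gt = (100.0, 10.0, 0.0, 500.0, 0.0, -10.0)
print(geo_to_pixel(150.0, 450.0, gt))  # (5.0, 5.0)
```

The pixel height is negative because rows grow downwards while the y coordinate grows upwards, which is why dividing by geotransform[5] gives a positive row number.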
The data in the first example was stored in three lists, but the correct way to do it would be to read it from a geometry file with the OGR library. Another function is defined to do that:

def readPoints(dataFile,zField='Z'):
    data = {}
    xv = []
    yv = []
    values = []
    ds = ogr.Open(dataFile)
    if ds is None:
       raise Exception('Could not open ' + dataFile)
    layer = ds.GetLayer()
    proj = layer.GetSpatialRef()
    extent = layer.GetExtent()

    feature = layer.GetNextFeature()
    if feature.GetFieldIndex(zField) == -1:
         raise Exception('zField is not valid: ' + zField)

    #Storing the position and the value of each point
    while feature:
        geometry = feature.GetGeometryRef()
        xv.append(geometry.GetX())
        yv.append(geometry.GetY())
        values.append(feature.GetField(zField))

        feature = layer.GetNextFeature()
    data['extent'] = extent
    data['xv'] = xv
    data['yv'] = yv
    data['values'] = values
    data['proj'] = proj
    ds = None
    return data

  1. The function reads the point positions and values from the given file and field name.
  2. A dictionary is returned, to make it easier to retrieve all the values we may need to create the output raster. If no projection or geotransform is passed by the user, the input file extent and projection are used.
  3. In the example files, a file called points.shp has some sample points to test the functions.
Finally, the main method must get the parameters from the user, and call the other two methods to create the raster. I have copied some of the input options from the gdal_grid program, so it's easier to test the results. Of course, not all the options are implemented, but it shouldn't be difficult to add them.

if __name__ == "__main__":
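    #Setting the default values (the values below are assumed; mandatory parameters start as None)
    power = 2
    smoothing = 0
    zField = 'Z'
    dataFile = None
    outFile = None
    driverName = 'GTiff'
    proj = None
    xSize = 100
    ySize = 100
    xMin = None
    xMax = None
    yMin = None
    yMax = None

    geotransform = None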

    #Parsing the command line
    argv = gdal.GeneralCmdLineProcessor( sys.argv )
    i = 1
    while i < len(argv):
        arg = argv[i]
        if arg == '-out_format':
         driverName = argv[i+1]
         driverName = driverName.replace("'","")
         driverName = driverName.replace('"','')
         i = i + 1
        elif arg == '-zfield':
         zField = argv[i+1]
         zField = zField.replace("'","")
         zField = zField.replace('"','')
         i = i + 1
        elif arg == '-a_srs':
         proj = argv[i+1]
         proj = proj.replace("'","")
         proj = proj.replace('"','')
         i = i + 1
        elif arg == '-outsize':
         xSize = argv[i+1]
         xSize = xSize.replace("'","")
         xSize = int(xSize.replace('"',''))
         ySize = argv[i+2]
         ySize = ySize.replace("'","")
         ySize = int(ySize.replace('"',''))
         i = i + 2
        elif arg == '-txe':
         xMin = argv[i+1]
         xMin = xMin.replace("'","")
         xMin = float(xMin.replace('"',''))
         xMax = argv[i+2]
         xMax = xMax.replace("'","")
         xMax = float(xMax.replace('"',''))
         i = i + 2
        elif arg == '-tye':
         yMin = argv[i+1]
         yMin = yMin.replace("'","")
         yMin = float(yMin.replace('"',''))
         yMax = argv[i+2]
         yMax = yMax.replace("'","")
         yMax = float(yMax.replace('"',''))
         i = i + 2
        elif dataFile is None:
         dataFile = arg
         dataFile = dataFile.replace("'","")
         dataFile = dataFile.replace('"','')
        elif outFile is None:
         outFile = arg
         outFile = outFile.replace("'","")
         outFile = outFile.replace('"','')
        i = i + 1

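    if dataFile is None or outFile is None:
        print('Both the input points file and the output raster file must be given')
        sys.exit(1)

    try:
        data = readPoints(dataFile,zField)
    except Exception as ex:
        print(ex)
        sys.exit(1)

    #Using the input file extent if the user did not set -txe or -tye
    if xMin is None:
        xMin = data['extent'][0]
        xMax = data['extent'][1]
    if yMin is None:
        yMin = data['extent'][2]
        yMax = data['extent'][3]

    #Defining the geotransform from the extent and the output size
    geotransform = [xMin, (xMax-xMin)/xSize, 0, yMax, 0, (yMin-yMax)/ySize]

    if proj is None:
        proj = data['proj'] 
    else:
        try:
            #Transforming the -a_srs string into a SpatialReference
            srs = osr.SpatialReference()
            srs.SetFromUserInput(proj)
            proj = srs
        except Exception as ex:
            print(ex)
            sys.exit(1)

    #Creating the interpolation function and populating the output matrix value
    try:
        ZI = invDist(data['xv'],data['yv'],data['values'],geotransform,proj,xSize,ySize,power,smoothing,driverName,outFile)
    except Exception as ex:
        print(ex)
        sys.exit(1)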

  1. First, the default parameter values are set. Some of them are None, since they are mandatory.
  2. Then, the input arguments are read.
  3. Once the input arguments are read, every parameter that still has no value gets one, either a default or a value taken from the input file.
  4. The function that reads the input file is run, and with its result, the raster is created. Note that the geotransform is defined from the input file extent if none is set by the user.
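How the geotransform comes out of the extent and the output size can be sketched in plain Python. The helper name geotransform_from_extent is made up for the example, and the extent is assumed to be in the (xMin, xMax, yMin, yMax) order that the OGR GetExtent() method returns.

```python
def geotransform_from_extent(extent, x_size, y_size):
    """Build a north-up geotransform covering extent with x_size * y_size pixels."""
    x_min, x_max, y_min, y_max = extent
    return [
        x_min,                            # x of the upper-left corner
        (x_max - x_min) / float(x_size),  # pixel width
        0.0,                              # no rotation
        y_max,                            # y of the upper-left corner
        0.0,                              # no rotation
        (y_min - y_max) / float(y_size),  # pixel height (negative, north-up)
    ]

# A 100 x 50 units extent rendered on a 100 x 50 pixels raster
print(geotransform_from_extent((0.0, 100.0, 0.0, 50.0), 100, 50))
# [0.0, 1.0, 0.0, 50.0, 0.0, -1.0]
```

The origin is the upper-left corner, so the y origin is the maximum y of the extent and the pixel height is negative.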
And the result is something like this when opened with QGIS or any other GIS:
The points are not coloured, but they are the ones in the original file.
The whole code comes in the example file, with all the data you need to run the example.
To generate the picture above, run the script with these arguments (the script file name goes right after python):
 python -zfield value points.shp out.tiff

What's next

The original gdal_grid implements more interpolation methods (nearest value and average). Doing the same in the script is as easy as adding other functions to use instead of pointValue.
Also, in the original program, an ellipse can be set to search only in some area around each pixel. This would be a little more difficult to implement, but it is easy to find in the gdal source code, in a file called gdalgrid.cpp. I've put it in the example data.
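As a rough idea of what such replacement functions could look like, here is a sketch of a nearest value and an average method. The names nearestValue and averageValue are hypothetical, their signatures drop the power and smoothing parameters of pointValue, and a simple circle of a given radius stands in for the search ellipse, so this is not the gdal_grid implementation.

```python
from math import sqrt

def nearestValue(x, y, xv, yv, values):
    """Value of the data point closest to the pixel (x, y)."""
    best_value = None
    best_dist = None
    for i in range(len(values)):
        dist = sqrt((x - xv[i]) ** 2 + (y - yv[i]) ** 2)
        if best_dist is None or dist < best_dist:
            best_dist = dist
            best_value = values[i]
    return best_value

def averageValue(x, y, xv, yv, values, radius):
    """Mean of the data points within radius of (x, y); -9999 (NODATA) if none."""
    inside = [values[i] for i in range(len(values))
              if sqrt((x - xv[i]) ** 2 + (y - yv[i]) ** 2) <= radius]
    if not inside:
        return -9999
    return sum(inside) / float(len(inside))

xv = [10, 60, 40]
yv = [10, 20, 30]
values = [1, 2, 3]
print(nearestValue(12, 11, xv, yv, values))      # 1
print(averageValue(50, 25, xv, yv, values, 15))  # 2.5
```

In the second call only the points at (60, 20) and (40, 30) fall inside the circle, so the result is the mean of 2 and 3.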

As you can see, implementing this kind of stuff in python is not as difficult as it may seem at the beginning.

Thursday, May 24, 2012

Running GDAL Java

The GDAL Java bindings work very well, but the documentation is very limited.
The API documentation is much better than the one for Python, but it's quite difficult to find out how to run a simple example.

Installation (Ubuntu)

In the Ubuntu distribution, there is a GDAL package, but the Java bindings are not included, so it's necessary to compile the libraries to get them.
If you already have the gdal libraries installed from the packages, it is not necessary to remove them, since you can use just the compiled jar with the old libraries.
Before compiling, make sure that the swig, libgeos-dev and proj4 packages are installed.
Then, just run:
./configure --with-java --with-static-proj4=[]
Go to the /path_to_gdal/swig/java directory and edit the file java.opt to set the correct JAVA_HOME variable, which points to a Windows installation by default.
Then type make.
The gdal.jar file will be in the /path_to_gdal/swig/java directory.
The official docs to install the GDAL Java bindings are here:

Running the classes

GDAL Java bindings use the native GDAL installation through JNI, so the Java Virtual Machine must be able to find the GDAL binaries. To do so, an environment variable must be set. From the command line in Linux, the command would be:
 export LD_LIBRARY_PATH=/path_to/gdal-1.9.0/swig/java
where the path is the one you have compiled.
If any coordinate transformation is used in the code, GDAL must find a file named gcs.csv. To find where this file is, just run:
gdal-config --datadir
Then, set the environment variable:
export GDAL_DATA=/path_to_gdal_data
or, even better, execute directly:
export GDAL_DATA=`gdal-config --datadir`
Don't forget to include the file gdal.jar in the classpath, and the class will run properly.
Anyway, it is much easier to use an IDE to code in Java. The next two sections explain how.

Running in Eclipse

Using the Eclipse IDE, it is possible to configure the environment variables from the configuration window. To do it, click the small arrow next to the play button, and choose Run Configurations...
Once there, choose the Environment tab and add both GDAL_DATA and LD_LIBRARY_PATH using the New button.

Running in Netbeans

I haven't found a graphical way of configuring LD_LIBRARY_PATH and GDAL_DATA in Netbeans.
To set these environment variables, locate and open the file netbeans.conf. In my case (Ubuntu 12.04), it was located at the path: /usr/local/netbeans-7.0.1/etc/
Add these lines at the end of the file:

export GDAL_DATA=/path_to/gdal/1.9
export LD_LIBRARY_PATH=/path_to/gdal-1.9.0/swig/java

Running in Apache Tomcat

If you are using Apache Tomcat to serve web pages, reading GIS files with GDAL can be a good idea. As in the other cases, the environment variables must be set.
Edit /path_to_tomcat/bin/ (the path may change depending on your installation), and, as in NetBeans, add the lines:

export GDAL_DATA=/path_to/gdal/1.9
export LD_LIBRARY_PATH=/path_to/gdal-1.9.0/swig/java

The libraries must be installed as shared libraries, or the loading will crash if two instances are generated (a java.lang.UnsatisfiedLinkError will be thrown). So:
  • Create the directory: $CATALINA_HOME/shared/lib
  • Put into this directory the contents of gdal_home/swig/java, which includes the gdal.jar file
  • Edit (in my case was the one under the conf directory) and change the shared.loader variable as:

Using Maven

gdal.jar is not in the Maven repositories (as far as I know), so it must be added manually. Besides, when gdal has to be used in Tomcat, the jar must be taken from a specific directory, different from the one in Maven.
So the pom.xml file has to be edited in the gdal section like this:

  Example class

If you want to check how to run a GDAL java class, you can use this example. The class prints the number of bands of the GIS file passed as a parameter:
import org.gdal.gdal.Dataset;
import org.gdal.gdal.gdal;
import org.gdal.gdalconst.gdalconstConstants;

/**
 * Test class for GDAL java bindings
 */
public class Test {
 Dataset hDataset;
 int numBands;

 public Test(String filename){
  // Registering the GDAL drivers before opening any file
  gdal.AllRegister();
  hDataset = gdal.Open(filename, gdalconstConstants.GA_ReadOnly);
  this.numBands = hDataset.getRasterCount();
 }

 /**
  * @param args
  */
 public static void main(String[] args) {
  if(args.length == 0){
   System.out.println("You must pass the file name as an argument");
  } else {
  Test instance = new Test(args[0]);