GeoExamples: numpy

Showing posts with label numpy. Show all posts

Wednesday, November 26, 2014

Basemap Tutorial

Basemap is a great tool for creating maps using python in a simple way. It's a `matplotlib <http://matplotlib.org/>`_ extension, so it has got all its features to create data visualizations, and adds the geographical projections and some datasets to be able to plot coast lines, countries, and so on directly from the library.

Basemap has got `some documentation <http://matplotlib.org/basemap/index.html>`_, but some things are a bit more difficult to find. I started a readthedocs page to extend a little the original documentation and examples, but it grew a little, and now covers many of the basemap possibilities.

Some of the examples from the tutorial

The tutorial can be found at http://basemaptutorial.readthedocs.org/, and all the examples and its source code, at GitHub and it's available for sharing or being modified by adding the attribution.

The tutorial covers:

An introductory chapter, showing the basic functionalities, and how to work with the included methods for reading shapefiles
The explanation and an example for each of the basemap methods
A section with more advanced examples, such as 3D maps, multiplots or WRF examples

I would really appreciate some feedback, the comments are open!

Monday, March 24, 2014

Shaded relief images using GDAL python

After showing how to colour a DEM file, classifying it, and calculating its isobands, this post shows how to create a shaded relief image from it.

The resulting image

A shaded relief image simulates the shadow thrown upon a relief map. This shadow is usually blended with some colouring, related to the altitude, a terrain classification, etc.
The shadow is usually drawn considering that the sun is at 315 degrees of azimuth and 45 degrees over the horizon, which never happens at the north hemisphere. This values avoid strange perceptions, such as seeing the mountain tops as the bottom of a valley.

In this example, the script calculates the hillshade image, a coloured image, and blends them into the shaded relief image.

As usual, all the code, plus the sample DEM file, can be found at GitHub.

The hillshade image

I didn't know how to create a shaded relief image using numpy. Eric Gayer helped me with some samples, and I found some other information here.

The script is:

"""
Creates a shaded relief file from a DEM.
"""

from osgeo import gdal
from numpy import gradient
from numpy import pi
from numpy import arctan
from numpy import arctan2
from numpy import sin
from numpy import cos
from numpy import sqrt
from numpy import zeros
from numpy import uint8
import matplotlib.pyplot as plt

def hillshade(array, azimuth, angle_altitude):
        
    x, y = gradient(array)
    slope = pi/2. - arctan(sqrt(x*x + y*y))
    aspect = arctan2(-x, y)
    azimuthrad = azimuth*pi / 180.
    altituderad = angle_altitude*pi / 180.
     
 
    shaded = sin(altituderad) * sin(slope)\
     + cos(altituderad) * cos(slope)\
     * cos(azimuthrad - aspect)
    return 255*(shaded + 1)/2

ds = gdal.Open('w001001.tiff')  
band = ds.GetRasterBand(1)  
arr = band.ReadAsArray()

hs_array = hillshade(arr,315, 45)
plt.imshow(hs_array,cmap='Greys')
plt.show()

The script draws the image using matplotlib, to make it easy
The hillshade function starts calculating the gradient for the x and y directions using the numpy.gradient function. The result are two matrices of the same size than the original, one for each direction.
From the gradient, the aspect and slope can be calculated. The aspect will give the mountain orientation, which will be illuminated depending on the azimuth angle. The slopewill change the illumination depending on the altitude angle.
Finally, the hillshade is calculated.

shaded_relief.py

The shaded relief image is calculated using the algorithm explained in the post Colorize PNG from a raster file and the hillshade.
As in the coloring post, the image is read by blocks to improve the performance, because it uses a lot of arrays, and doing it at once with a big image can take a lot of resources.
I will coment the code block by block, to make it easier. The full code is here.

The main function, called shaded_relief, is the most important, and calls the different algorithms:

def shaded_relief(in_file, raster_band, color_file, out_file_name,
    azimuth=315, angle_altitude=45):
    '''
    The main function. Reads the input image block by block to improve the performance, and calculates the shaded relief image
    '''

    if exists(in_file) is False:
            raise Exception('[Errno 2] No such file or directory: \'' + in_file + '\'')    
    
    dataset = gdal.Open(in_file, GA_ReadOnly )
    if dataset == None:
        raise Exception("Unable to read the data file")
    
    band = dataset.GetRasterBand(raster_band)

    block_sizes = band.GetBlockSize()
    x_block_size = block_sizes[0]
    y_block_size = block_sizes[1]

    #If the block y size is 1, as in a GeoTIFF image, the gradient can't be calculated, 
    #so more than one block is used. In this case, using8 lines gives a similar 
    #result as taking the whole array.
    if y_block_size < 8:
        y_block_size = 8

    xsize = band.XSize
    ysize = band.YSize

    max_value = band.GetMaximum()
    min_value = band.GetMinimum()

    #Reading the color table
    color_table = readColorTable(color_file)
    #Adding an extra value to avoid problems with the last & first entry
    if sorted(color_table.keys())[0] > min_value:
        color_table[min_value - 1] = color_table[sorted(color_table.keys())[0]]

    if sorted(color_table.keys())[-1] < max_value:
        color_table[max_value + 1] = color_table[sorted(color_table.keys())[-1]]
    #Preparing the color table
    classification_values = color_table.keys()
    classification_values.sort()

    max_value = band.GetMaximum()
    min_value = band.GetMinimum()

    if max_value == None or min_value == None:
        stats = band.GetStatistics(0, 1)
        max_value = stats[1]
        min_value = stats[0]

    out_array = zeros((3, ysize, xsize), 'uint8')

    #The iteration over the blocks starts here
    for i in range(0, ysize, y_block_size):
        if i + y_block_size < ysize:
            rows = y_block_size
        else:
            rows = ysize - i
        
        for j in range(0, xsize, x_block_size):
            if j + x_block_size < xsize:
                cols = x_block_size
            else:
                cols = xsize - j

            dem_array = band.ReadAsArray(j, i, cols, rows)
            
            hs_array = hillshade(dem_array, azimuth, 
                angle_altitude)

            rgb_array = values2rgba(dem_array, color_table, 
                classification_values, max_value, min_value)

            hsv_array = rgb_to_hsv(rgb_array[:, :, 0], 
                rgb_array[:, :, 1], rgb_array[:, :, 2]) 

            hsv_adjusted = asarray( [hsv_array[0], 
                hsv_array[1], hs_array] )          

            shaded_array = hsv_to_rgb( hsv_adjusted )
            
            out_array[:,i:i+rows,j:j+cols] = shaded_array
    
    #Saving the image using the PIL library
    im = fromarray(transpose(out_array, (1,2,0)), mode='RGB')
    im.save(out_file_name)

After opening the file, at line 20 comes the first interesting point. If the image is read block by block, some times the blocks will have only one line, as in the GeoTIFF images. With this situation, the y gradient won't be calculated, so the hillshade function will fail. I've seen that taking only two lines gives coarse results, and with lines the result is more or less the same as taking the whole array. The performance won't be as good as using only one block, but works faster anyway.
Lines 32 to 51 read the color table and file maximim and minumum. This has to be outside the values2rgba function, since is needed only once.
Lines 54 to 66 control the block reading. For each iteration, a small array will be read (line 67). This is what will be processed. The result will be written in the output array defined at line 52, that has the final size.
Now the calculations start:

At line 69, the hillshade is calculated
At line 72, the color array is calculated
At line 75, the color array is changed from rgb values to hsv.
At line 78, the value (the v in hsv) is changed to the hillshade value. This will blend both images. I took the idea from this post.
Then the image is transformed to rgb again (line 81) and written into the output array (line 83)

Finally, the array is transformed to a png image using the PIL library. The numpy.transpose function is used to re-order the matrix, since the original values are with the shape (3, height, width), and the Image.fromarray function needs (height, width, 3). An other way to do this is using scipy.misc.imsave (that would need scipy installed just for that), or the Image.merge function.

The colouring funcion is taken from the post Colorize PNG from a raster file, but modifying it so the colors are only continuous, since the discrete option doesn't give nice results in this case:

def values2rgba(array, color_table, classification_values, max_value, min_value):
    '''
    This function calculates a the color of an array given a color table. 
    The color is interpolated from the color table values.
    '''
    rgba = zeros((array.shape[0], array.shape[1], 4), dtype = uint8)

    for k in range(len(classification_values) - 1):
        if classification_values[k] < max_value and (classification_values[k + 1] > min_value ):
            mask = logical_and(array >= classification_values[k], array < classification_values[k + 1])

            v0 = float(classification_values[k])
            v1 = float(classification_values[k + 1])

            rgba[:,:,0] = rgba[:,:,0] + mask * (color_table[classification_values[k]][0] + (array - v0)*(color_table[classification_values[k + 1]][0] - color_table[classification_values[k]][0])/(v1-v0) )
            rgba[:,:,1] = rgba[:,:,1] + mask * (color_table[classification_values[k]][1] + (array - v0)*(color_table[classification_values[k + 1]][1] - color_table[classification_values[k]][1])/(v1-v0) )
            rgba[:,:,2] = rgba[:,:,2] + mask * (color_table[classification_values[k]][2] + (array - v0)*(color_table[classification_values[k + 1]][2] - color_table[classification_values[k]][2])/(v1-v0) )
            rgba[:,:,3] = rgba[:,:,3] + mask * (color_table[classification_values[k]][3] + (array - v0)*(color_table[classification_values[k + 1]][3] - color_table[classification_values[k]][3])/(v1-v0) )
    return rgba

The hillshade function is the same explained at the first point
The functions rgb_to_hsv and hsv_to_rgb are taken from this post, and change the image mode from rgb to hsv and hsv to rgb.

Saturday, May 26, 2012

Creating a grid from scattered data using inverse of the distance with python (gdal_grid approach)

OK, I have to admit that I was so happy when I found the scipy rbf function that I went too fast writing the entry about inverse of the distance.
I swear that the images are real (you can check it yourself), but when you change a few parameters or the point position, the result is really strange, definitely not the inverse of the distance algorithm I was thinking about.
Looking a little more around the gdal documentation, I found this explanation, which is very easy and clear:
http://www.gdal.org/grid_tutorial.html
The problem is that, at least in my python gdal version, is not possible to call the function directly, so I've been coding a little to get it.
All the example code and files can be downloaded in a zip file.

First version: the algorithm

Putting all the code directly makes things difficult to understand, so first, I'll reproduce the same program I wanted to show in the entrance, using matplotlib to show the results:

from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x,y,power,smoothing,xv,yv,values):
    nominator=0
    denominator=0
    for i in range(0,len(values)):
        dist = sqrt((x-xv[i])*(x-xv[i])+(y-yv[i])*(y-yv[i])+smoothing*smoothing);
        #If the point is really close to one of the data points, return the data point value to avoid singularities
        if(dist<0.0000000001):
            return values[i]
        nominator=nominator+(values[i]/pow(dist,power))
        denominator=denominator+(1/pow(dist,power))
    #Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator/denominator
    else:
        value = -9999
    return value

def invDist(xv,yv,values,xsize=100,ysize=100,power=2,smoothing=0):
    valuesGrid = np.zeros((ysize,xsize))
    for x in range(0,xsize):
        for y in range(0,ysize):
            valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
    return valuesGrid
    

if __name__ == "__main__":
    power=1
    smoothing=20

    #Creating some data, with each coodinate and the values stored in separated lists
    xv = [10,60,40,70,10,50,20,70,30,60]
    yv = [10,20,30,30,40,50,60,70,80,90]
    values = [1,2,2,3,4,6,7,7,8,10]
    
    #Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)

    #Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv,yv,values,100,100,power,smoothing)


    # Plotting the result
    n = plt.normalize(0.0, 100.0)
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()

    plt.show()

Let's explain it step by step:

The main method defines some sample data, calls the interpolation function and draws it using matplotlib. Just copy&paste, since it's long to explain.
invDist function: For each point in the coordinates matrix, a value needs to be calculated. First, a xSize*ySize zeroes values is calculated. Then for each point coordinate (the pixel position, in this simple case), a function that calculates the value is called.
This is the important part: We apply the formula: $V=\frac{\sum\limits_{i=1}^n \frac{v_{i}}{d_{i}^{p}}}{\sum\limits_{i=1}^n \frac{1}{d_{i}^{p}}}$ for each pixel.
- V is the interpolated value.
- n is the number of points
- d is the distance from the point to the pixel
- v is the value of the point
- p is the power we want to apply to the distance o weight it.
The distance is the Cartesian one, plus the smoothing factor (it gives more distance than the actual one, lowering the influence around the points). If the distance is close to the precision of the float numbers, we give the data value instead of the interpolated one, to avoid strange results.

And that's all. Here, the results (you can compare them with the ones in the other entry) which are consistent to the inverse of the distance method:

If the smoothing is zero and the power is high, the interpolation changes a lot around the points to give them their exact value.

If the smoothing is high and the power is one, the result is much smoother, but the values at the points are not maintained.

Adding the GIS stuff:

The example above doesn't write to a GIS file. The function invDist must be changed so it creates the file using a gdal driver:

def invDist(xv,yv,values,geotransform,proj,xSize,ySize,power,smoothing,driverName,outFile):
    #Transform geographic coordinates to pixels
    for i in range(0,len(xv)):
         xv[i] = (xv[i]-geotransform[0])/geotransform[1]
    for i in range(0,len(yv)):
         yv[i] = (yv[i]-geotransform[3])/geotransform[5]
    #Creating the file
    driver = gdal.GetDriverByName( driverName )
    ds = driver.Create( outFile, xSize, ySize, 1, gdal.GDT_Float32)
    if proj is not None:
        ds.SetProjection(proj.ExportToWkt())
    ds.SetGeoTransform(geotransform)
    valuesGrid = np.zeros((ySize,xSize))
    #Getting the interpolated values
    for x in range(0,xSize):
        for y in range(0,ySize):
            valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
    
    ds.GetRasterBand(1).WriteArray(valuesGrid)
    ds = None
    return valuesGrid

Note that the function will need the geographic parameters geotransform and proj, as well as the output file name and driver.
First, the point positions must be changed to pixel locations in order to calculate distances
Then, the output file is created with the size and projection asked by the user
The values are calculated in the same way as in the other example
The file is written

The data in the first example was set into three lists, but the correct way to do it would be read it from a geometry file with the OGR library. Another function is defined to do that:

def readPoints(dataFile,Zfield='Z'):
    data = {}
    xv=[]
    yv=[]
    values=[]
    ds = ogr.Open(dataFile)
    if ds is None:
       raise Exception('Could not open ' + dataFile)
    
    layer = ds.GetLayer()
    proj = layer.GetSpatialRef()
    extent = layer.GetExtent()

    feature = layer.GetNextFeature()
    if feature.GetFieldIndex(zField) == -1:
         raise Exception('zField is not valid: ' + zField)

    while feature:
        geometry = feature.GetGeometryRef()
        xv.append(geometry.GetX())
        yv.append(geometry.GetY())
        values.append(feature.GetField(zField))

        feature = layer.GetNextFeature()
    data['extent'] = extent 
    data['xv']=xv
    data['yv']=yv
    data['values']=values
    data['proj'] = proj
    ds = None
    return data

The function reads the point position and values from the given file and field name.
A dictionary is returned, to help retrieving all the values we may need to create the output raster. If no projection or geotransform is passed by the user, the input file extents and projection are used.
In the example files, a file called points.shp has some sample points to test the functions.

Finally, the main method must get the parameters from the user, and call the other two methods to create the raster. I have copied some of the input options from the gdal_grid program, so it's easier to test the results. Of course, not all the options are implemented, but it shouldn't be difficult to do it.

if __name__ == "__main__":
    #Setting the default values
    power=2
    smoothing=0
    
    zField='Z'
    dataFile=None
    outFile=None
    driverName='GTiff'
    proj=None

    geotransform = None
    xMin=None
    xMax=None
    yMin=None
    yMax=None
    xSize=100
    ySize=100

    #Parsing the command line
    argv = gdal.GeneralCmdLineProcessor( sys.argv )
    
    i = 1
    while i < len(argv):
        arg = argv[i]
        if arg == '-out_format':
         driverName = argv[i+1]
         driverName = driverName.replace("'","")
         driverName = driverName.replace('"','')
         i = i + 1
        elif arg == '-zfield':
         zField = argv[i+1]
         zField = zField.replace("'","")
         zField = zField.replace('"','')
         i = i + 1
        elif arg == '-a_srs':
         proj = argv[i+1]
         proj = proj.replace("'","")
         proj = proj.replace('"','')
         i = i + 1
        elif arg == '-outsize':
         xSize = argv[i+1]
         xSize = xSize.replace("'","")
         xSize = int(xSize.replace('"',''))
         ySize = argv[i+2]
         ySize = ySize.replace("'","")
         ySize = int(ySize.replace('"',''))
         i = i + 2
        elif arg == '-txe':
         xMin = argv[i+1]
         xMin = xMin.replace("'","")
         xMin = float(xMin.replace('"',''))
         xMax = argv[i+2]
         xMax = xMax.replace("'","")
         xMax = float(xMax.replace('"',''))
         i = i + 2
        elif arg == '-tye':
         yMin = argv[i+1]
         yMin = yMin.replace("'","")
         yMin = float(yMin.replace('"',''))
         yMax = argv[i+2]
         yMax = yMax.replace("'","")
         yMax = float(yMax.replace('"',''))
         i = i + 2
        elif dataFile is None:
         dataFile = arg
         dataFile = dataFile.replace("'","")
         dataFile = dataFile.replace('"','')
        elif outFile is None:
         outFile = arg
         outFile = outFile.replace("'","")
         outFile = outFile.replace('"','')
        i = i + 1



    if dataFile is None or outFile is None:
        usage()
    try:
        data = readPoints(dataFile,zField)
    except Exception,ex:
        print ex
        sys.exit(0)
    if xMin is None:
        xMin=data['extent'][0]
        xMax=data['extent'][1]
    if yMin is None:
        yMin=data['extent'][2]
        yMax=data['extent'][3]

    geotransform=[]
    geotransform.append(xMin)
    geotransform.append((xMax-xMin)/xSize)
    geotransform.append(0)
    geotransform.append(yMax)
    geotransform.append(0)
    geotransform.append((yMin-yMax)/ySize)

    if proj is None:
        proj = data['proj'] 
    else:
        try:
            proj = osr.SpatialReference()
            proj.SetFromUserInput(str(proj))  
        except Exception,ex:
            print ex 
            sys.exit(0)
    #Creating the interpolation function and populating the output matrix value
    try:
        ZI = invDist(data['xv'],data['yv'],data['values'],geotransform,proj,xSize,ySize,power,smoothing,driverName,outFile)
    except Exception,ex:
        print ex
        sys.exit(0)

First, the default parameter values are set. Some of them are None, since they are mandatory.
Then, the input arguments are read.
After having the input arguments, all the parameters that can be set, are set.
The function that reads the input file is run, and with it's result, the raster is created. Note that the geotransform is defined from the input file if no one is set by the user.

And the result is something like this when opened with QGis or any GIS:

The points are not coloured, but are the ones in the original file.
The whole code comes in the example file, with all the data you need to run the example.
To generate the picture above, the command is:
python invdistgis.py -zfield value points.shp out.tiff

What's next

The original gdal_grid implements more interpolation methods (nearest value, and average). Doing the same in the script is as easy as adding other functions to use instead of pointValue.

Also, in the original program, an ellipse can be set to search only in some area around each pixel. This would be a little more difficult do implement, but is easy to find it in the gdal source code in a file called gdalgrid.cpp. I've put it in the example data.

As you can see, implementing this kind of stuff in python is not as difficult as it may seem at the beginning.