
Saturday, October 11, 2014

Basemap raster clipping with a shapefile

Basemap is a great library for making maps faster than other python options, but there are some common tasks I couldn't find explained anywhere. Clipping a raster using a shape is one of them. Here's how I do it.
The output
As usual, all the code can be found at GitHub

Getting some data

The example plots some elevation data, taken from the SRTM. After looking for some options, the easiest to work with was this one: http://srtm.csi.cgiar.org/SELECTION/inputCoord.asp
The shapefile will be the border of Andorra, taken from Natural Earth
The result is a little poor because the resolution is low, but it works well for the example.

The script

from mpl_toolkits.basemap import Basemap
from matplotlib.path import Path
from matplotlib.patches import PathPatch
import matplotlib.pyplot as plt
from osgeo import gdal
import numpy
import shapefile

fig = plt.figure()
ax = fig.add_subplot(111)

sf = shapefile.Reader("ne_10m_admin_0_countries")

for shape_rec in sf.shapeRecords():
    if shape_rec.record[3] == 'Andorra':
        vertices = []
        codes = []
        pts = shape_rec.shape.points
        prt = list(shape_rec.shape.parts) + [len(pts)]
        for i in range(len(prt) - 1):
            for j in range(prt[i], prt[i+1]):
                vertices.append((pts[j][0], pts[j][1]))
            codes += [Path.MOVETO]
            codes += [Path.LINETO] * (prt[i+1] - prt[i] -2)
            codes += [Path.CLOSEPOLY]
        clip = Path(vertices, codes)
        clip = PathPatch(clip, transform=ax.transData)


m = Basemap(llcrnrlon=1.4,
    llcrnrlat=42.4,
    urcrnrlon=1.77,
    urcrnrlat=42.7,
    resolution = None, 
    projection = 'cyl')

ds = gdal.Open('srtm_37_04.tif')
data = ds.ReadAsArray()

gt = ds.GetGeoTransform()
x = numpy.linspace(gt[0], gt[0] + gt[1] * data.shape[1], data.shape[1])
y = numpy.linspace(gt[3], gt[3] + gt[5] * data.shape[0], data.shape[0])

xx, yy = numpy.meshgrid(x, y)

cs = m.contourf(xx,yy,data,range(0, 3600, 200))

for contour in cs.collections:
        contour.set_clip_path(clip)

plt.show()
  • I used the pyshp library for reading the shapefile, since Fiona and GDAL don't work well together, and doing it with OGR would have taken more code
  • Lines 14 to 27 create the path. A Matplotlib path is made of two arrays: one with the points (called vertices in the script), and another with the drawing instruction for every point (called codes); a minimal standalone example follows this list
    • In our case, only straight lines have to be used, so there will be one MOVETO to indicate the beginning of the polygon, many LINETO to create the segments, and one CLOSEPOLY for closing it
    • Of course, only the polygon for Andorra has to be used. I get it from the shapefile attributes
    • The prt array is for managing multipolygons, which is not the case here, but the code will create correct clipping paths for multipolygons too
    • The path is created using the Path function, and then added to a PathPatch, to be able to use it as a closed polygon. Note the transform=ax.transData attribute. This assumes the polygon coordinates are the ones used in the data (longitudes and latitudes in our case). More information here
  • The next code lines draw the map as usual. I have used a latlon projection, so all the values for the raster and the shapefile can be used directly. If the map used another projection, the shapefile coordinates should be appended to the path in the output projection (m(pts[j][0], pts[j][1]))
  • The x and y coordinates are calculated from the GDAL geotransform, and then turned into matrices using meshgrid
  • The clipping itself happens in lines 48 and 49: for each drawn element, the set_clip_path method is applied
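To make the vertices/codes pairing clearer, here is a minimal standalone sketch using a simple made-up square instead of the Andorra border; each vertex gets exactly one code:

from matplotlib.path import Path

vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]
codes = [Path.MOVETO, Path.LINETO, Path.LINETO, Path.CLOSEPOLY]
square = Path(vertices, codes)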

Monday, March 24, 2014

Shaded relief images using GDAL python

After showing how to colour a DEM file, classifying it, and calculating its isobands, this post shows how to create a shaded relief image from it.
The resulting image
A shaded relief image simulates the shadow thrown upon a relief map. This shadow is usually blended with some colouring, related to the altitude, a terrain classification, etc.
The shadow is usually drawn considering that the sun is at 315 degrees of azimuth and 45 degrees over the horizon, which never actually happens in the northern hemisphere. These values avoid strange perceptions, such as seeing the mountain tops as the bottom of a valley.

In this example, the script calculates the hillshade image, a coloured image, and blends them into the shaded relief image.

As usual, all the code, plus the sample DEM file, can be found at GitHub.

The hillshade image

I didn't know how to create a shaded relief image using numpy. Eric Gayer helped me with some samples, and I found some other information here.
The script is:
"""
Creates a shaded relief file from a DEM.
"""

from osgeo import gdal
from numpy import gradient
from numpy import pi
from numpy import arctan
from numpy import arctan2
from numpy import sin
from numpy import cos
from numpy import sqrt
from numpy import zeros
from numpy import uint8
import matplotlib.pyplot as plt

def hillshade(array, azimuth, angle_altitude):
        
    x, y = gradient(array)
    slope = pi/2. - arctan(sqrt(x*x + y*y))
    aspect = arctan2(-x, y)
    azimuthrad = azimuth*pi / 180.
    altituderad = angle_altitude*pi / 180.
     
 
    shaded = sin(altituderad) * sin(slope)\
     + cos(altituderad) * cos(slope)\
     * cos(azimuthrad - aspect)
    return 255*(shaded + 1)/2

ds = gdal.Open('w001001.tiff')  
band = ds.GetRasterBand(1)  
arr = band.ReadAsArray()

hs_array = hillshade(arr,315, 45)
plt.imshow(hs_array,cmap='Greys')
plt.show()

  • The script draws the image using matplotlib, to keep things simple
  • The hillshade function starts by calculating the gradient in the x and y directions using the numpy.gradient function. The result is two matrices of the same size as the original, one for each direction.
  • From the gradient, the aspect and slope can be calculated. The aspect gives the mountain orientation, which will be illuminated depending on the azimuth angle. The slope changes the illumination depending on the altitude angle.
  • Finally, the hillshade is calculated (see the sanity check after this list).
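As a quick sanity check of the formula, a flat DEM should come out as a uniform grey. This is just a sketch, assuming the hillshade function above is already defined:

import numpy

#On flat terrain the gradient is zero, so slope = pi/2, aspect = 0, and the
#formula reduces to sin(altituderad): every pixel gets the same grey value
flat = numpy.zeros((5, 5))
print(hillshade(flat, 315, 45))  #every cell ~ 255*(sin(pi/4)+1)/2 ~ 217.7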

 shaded_relief.py

The shaded relief image is calculated using the algorithm explained in the post Colorize PNG from a raster file, plus the hillshade.
As in the colouring post, the image is read by blocks to improve the performance, since the process uses a lot of arrays, and doing it all at once with a big image can take a lot of resources.
I will comment the code block by block, to make it easier. The full code is here.

The main function, called shaded_relief, is the most important, and calls the different algorithms:
def shaded_relief(in_file, raster_band, color_file, out_file_name,
    azimuth=315, angle_altitude=45):
    '''
    The main function. Reads the input image block by block to improve the performance, and calculates the shaded relief image
    '''

    if exists(in_file) is False:
            raise Exception('[Errno 2] No such file or directory: \'' + in_file + '\'')    
    
    dataset = gdal.Open(in_file, GA_ReadOnly )
    if dataset == None:
        raise Exception("Unable to read the data file")
    
    band = dataset.GetRasterBand(raster_band)

    block_sizes = band.GetBlockSize()
    x_block_size = block_sizes[0]
    y_block_size = block_sizes[1]

    #If the block y size is 1, as in a GeoTIFF image, the gradient can't be calculated,
    #so more than one block is used. In this case, using 8 lines gives a similar
    #result as taking the whole array.
    if y_block_size < 8:
        y_block_size = 8

    xsize = band.XSize
    ysize = band.YSize

    max_value = band.GetMaximum()
    min_value = band.GetMinimum()

    #Reading the color table
    color_table = readColorTable(color_file)
    #Adding an extra value to avoid problems with the last & first entry
    if sorted(color_table.keys())[0] > min_value:
        color_table[min_value - 1] = color_table[sorted(color_table.keys())[0]]

    if sorted(color_table.keys())[-1] < max_value:
        color_table[max_value + 1] = color_table[sorted(color_table.keys())[-1]]
    #Preparing the color table
    classification_values = color_table.keys()
    classification_values.sort()

    max_value = band.GetMaximum()
    min_value = band.GetMinimum()

    if max_value == None or min_value == None:
        stats = band.GetStatistics(0, 1)
        max_value = stats[1]
        min_value = stats[0]

    out_array = zeros((3, ysize, xsize), 'uint8')

    #The iteration over the blocks starts here
    for i in range(0, ysize, y_block_size):
        if i + y_block_size < ysize:
            rows = y_block_size
        else:
            rows = ysize - i
        
        for j in range(0, xsize, x_block_size):
            if j + x_block_size < xsize:
                cols = x_block_size
            else:
                cols = xsize - j

            dem_array = band.ReadAsArray(j, i, cols, rows)
            
            hs_array = hillshade(dem_array, azimuth, 
                angle_altitude)

            rgb_array = values2rgba(dem_array, color_table, 
                classification_values, max_value, min_value)

            hsv_array = rgb_to_hsv(rgb_array[:, :, 0], 
                rgb_array[:, :, 1], rgb_array[:, :, 2]) 

            hsv_adjusted = asarray( [hsv_array[0], 
                hsv_array[1], hs_array] )          

            shaded_array = hsv_to_rgb( hsv_adjusted )
            
            out_array[:,i:i+rows,j:j+cols] = shaded_array
    
    #Saving the image using the PIL library
    im = fromarray(transpose(out_array, (1,2,0)), mode='RGB')
    im.save(out_file_name)
  • After opening the file, at line 20 comes the first interesting point. If the image is read block by block, sometimes the blocks will have only one line, as in GeoTIFF images. In that situation the y gradient can't be calculated, so the hillshade function will fail. I've seen that taking only two lines gives coarse results, while with 8 lines the result is more or less the same as taking the whole array. The performance won't be as good as using only one block, but it is still fast.
  • Lines 32 to 51 read the color table and the file maximum and minimum. This has to be outside the values2rgba function, since it is needed only once.
  • Lines 54 to 66 control the block reading. For each iteration, a small array is read (line 67). This is what will be processed. The result is written into the output array defined at line 52, which has the final size.
  • Now the calculations start:
    • At line 69, the hillshade is calculated
    • At line 72, the color array is calculated
    • At line 75, the color array is changed from rgb values to hsv. 
    • At line 78, the value (the v in hsv) is changed to the hillshade value. This blends both images (a minimal sketch of the idea appears after this list). I took the idea from this post.
    • Then the image is transformed to rgb again (line 81) and written into the output array (line 83)
  • Finally, the array is transformed into a png image using the PIL library. The numpy.transpose function is used to re-order the matrix, since the original values have the shape (3, height, width), and the Image.fromarray function needs (height, width, 3). Another way to do this is using scipy.misc.imsave (which would need scipy installed just for that), or the Image.merge function.
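Here is a minimal sketch of this value-replacement blending, using matplotlib's own converters instead of the custom rgb_to_hsv and hsv_to_rgb functions of the script, and assuming both the colour image and the hillshade have been rescaled to 0..1:

import numpy
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

rgb = numpy.random.random((10, 10, 3))  #the coloured image, values 0..1
shade = numpy.random.random((10, 10))   #the hillshade, rescaled to 0..1

hsv = rgb_to_hsv(rgb)
hsv[:, :, 2] = shade                    #replace the value channel
blended = hsv_to_rgb(hsv)               #back to rgb, ready to save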

The colouring function is taken from the post Colorize PNG from a raster file, but modified so the colours are only continuous, since the discrete option doesn't give nice results in this case:
def values2rgba(array, color_table, classification_values, max_value, min_value):
    '''
    This function calculates a the color of an array given a color table. 
    The color is interpolated from the color table values.
    '''
    rgba = zeros((array.shape[0], array.shape[1], 4), dtype = uint8)

    for k in range(len(classification_values) - 1):
        if classification_values[k] < max_value and (classification_values[k + 1] > min_value ):
            mask = logical_and(array >= classification_values[k], array < classification_values[k + 1])

            v0 = float(classification_values[k])
            v1 = float(classification_values[k + 1])

            rgba[:,:,0] = rgba[:,:,0] + mask * (color_table[classification_values[k]][0] + (array - v0)*(color_table[classification_values[k + 1]][0] - color_table[classification_values[k]][0])/(v1-v0) )
            rgba[:,:,1] = rgba[:,:,1] + mask * (color_table[classification_values[k]][1] + (array - v0)*(color_table[classification_values[k + 1]][1] - color_table[classification_values[k]][1])/(v1-v0) )
            rgba[:,:,2] = rgba[:,:,2] + mask * (color_table[classification_values[k]][2] + (array - v0)*(color_table[classification_values[k + 1]][2] - color_table[classification_values[k]][2])/(v1-v0) )
            rgba[:,:,3] = rgba[:,:,3] + mask * (color_table[classification_values[k]][3] + (array - v0)*(color_table[classification_values[k + 1]][3] - color_table[classification_values[k]][3])/(v1-v0) )
    return rgba
   
The hillshade function is the same one explained in the first section.
The functions rgb_to_hsv and hsv_to_rgb are taken from this post, and convert the image between the rgb and hsv modes.


Tuesday, February 25, 2014

3D terrain visualization with python and Mayavi2

I have always wanted to draw these 3D terrains like those in www.shadedrelief.com, which are amazing. But the examples were all using software I don't use, so I tried to do it with python.
The final result


As usual, you can get all the source code and data at my GitHub page.

 Getting the data

After trying different locations, I decided to use the mountain of Montserrat, close to Barcelona, since it has nice stone towers that are a good test for the DEM and the 3D visualization. An actual picture of the zone used is this one:
Montserrat monastery
The building is a good reference, since the stone-only areas make testing the result much more difficult.
All the data has been downloaded from the ICGC servers:
  • The DEM data was downloaded from the Vissir3 service, going to the section catàleg i descàrregues -> MDE 5x5. The file is named met5v10as0f0392Amr1r020.txt, but I cut out a small part of it to make Mayavi2 work more smoothly, using:

    gdalwarp -te 401620 4604246 403462 4605867 -s_srs EPSG:25831 -t_srs EPSG:25831 met5v10as0f0392Amr1r020.txt dem.tiff
  • The picture to drape over the DEM file can be downloaded using the WMS service provided by the ICGC:

    http://geoserveis.icc.cat/icc_mapesbase/wms/service?REQUEST=GetMap&VERSION=1.1.0&SERVICE=WMS&SRS=EPSG:25831&BBOX=401620.0,4604246.0,403462.0,4605867.0&WIDTH=1403&HEIGHT=1146&LAYERS=orto5m&STYLES=&FORMAT=JPEG&BGCOLOR=0xFFFFFF&TRANSPARENT=TRUE&EXCEPTION=INIMAGE
It's not as automatic as I would like, but as long as it's possible to download a DEM and the corresponding image, it's possible to create the 3D image.

Creating the image

First, let's plot the DEM file in 3D using mayavi2:
"""
Plotting the terrain DEM with Mayavi2
"""

from osgeo import gdal
from mayavi import mlab

ds = gdal.Open('dem.tiff')
data = ds.ReadAsArray()

mlab.figure(size=(640, 800), bgcolor=(0.16, 0.28, 0.46))

mlab.surf(data, warp_scale=0.2) 
mlab.show()
  • First, we import gdal as usual, plus the mlab library from Mayavi, which lets us work with the Mayavi canvas.
  • The data is read, as usual, with the gdal ReadAsArray method.
  • The figure is created. This works like creating the Image object in the PIL library: it creates the canvas where the data will be drawn. In this case, the size is 640 x 800 pixels; making the figure bigger can affect the performance on some older computers. bgcolor sets the blue colour as the background.
  • The surf method will plot the surface. The input has to be a 2D numpy array, which is what we have.
    • The warp_scale argument sets the vertical scale. Leaving the default value (1) creates a really exaggerated effect, so it's better to play with it a little to get a more realistic result.
    • The colors depend on the Z value at each point, and can be changed using the color or colormap option.
  • The show() method keeps the image visible when running the example from a script. If you use ipython, you don't need this step.
  • If you want to save the figure as a png, you can either use the icon in the mayavi window or call the method mlab.savefig('image_name')
  • If you want to move the camera (change the perspective), you can use the roll/yaw/pitch methods:
    f = mlab.gcf()
    camera = f.scene.camera
    camera.yaw(45)

The plotted DEM
Now, let's put an aerial image over the 3D visualization:
"""
Draping an image over a terrain surface
"""
from osgeo import gdal
from tvtk.api import tvtk
from mayavi import mlab
import Image

ds = gdal.Open('dem.tiff')
data = ds.ReadAsArray()
im1 = Image.open("ortofoto.jpg")
im2 = im1.rotate(90)
im2.save("/tmp/ortofoto90.jpg")
bmp1 = tvtk.JPEGReader()
bmp1.file_name="/tmp/ortofoto90.jpg" #any jpeg file

my_texture=tvtk.Texture()
my_texture.interpolate=0
my_texture.set_input(0,bmp1.get_output())


mlab.figure(size=(640, 800), bgcolor=(0.16, 0.28, 0.46))

surf = mlab.surf(data, color=(1,1,1), warp_scale=0.2) 
surf.actor.enable_texture = True
surf.actor.tcoord_generator_mode = 'plane'
surf.actor.actor.texture = my_texture

mlab.show()

  • The most important new import is tvtk. TVTK is a python api that allows working with VTK objects. Actually, my knowledge of Mayavi2 is very limited, but I see TVTK as an extension.
  • The DEM data is read the same way, using the ReadAsArray method.
  • The aerial image, named ortofoto.jpg, is not in the correct orientation. It took me a long time to figure out what was happening. I rotate it by opening the image with the PIL library and using the rotate method (lines 11 to 13)
  • Then, the tvtk object with the texture is created, loading the image with a JPEGReader object, and assigning it to the Texture object (lines 14 to 19). 
  • The figure and the 3D surface are created as in the other example (lines 22 and 24)
  • Then, the surface is modified to show the image over it (lines 25 to 28). 
The final result

The result is correct, but the aerial image is shifted slightly from where it should be, so, since the terrain is really steep, part of the buildings end up drawn on the stone walls! I edited the WMS coordinates a little so the result is slightly better. Anyway, the method is correct.

Links

  • GitHub: The source code and example data
  • Mayavi2: 3D scientific data visualization and plotting 
  • This entry to a mailing list gave me the tip to create the example
  • If you want to do more or less the same, but using JavaScript, Bjørn Sandvik posted this excellent example.
  • ShadedRelief: Cool 3D and shaded relief examples.

Sunday, November 24, 2013

Fast tip: Drawing raster NO DATA values with MapServer

MapServer is a great way to draw web maps, using it either as a WMS server or creating image files directly.
If the raster has a no data value, MapServer reads it from the GDAL library and doesn't draw the pixels with that value. But what happens if you want to draw the pixels with no data values?

Fast answer 

The solution, that I found here, is setting the LAYER property PROCESSING in this way:

PROCESSING "NODATA=OFF"
which disables the regular MapServer behaviour which is, quite logically, not drawing the pixels with the NODATA value.
Once this is done, just create a CLASS matching the no data value in its EXPRESSION, so those pixels are coloured as you prefer.
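Putting both pieces together, a minimal LAYER sketch could look like this (the file name and the -9999 no data value are hypothetical; check the raster EXPRESSION syntax against your MapServer version):

LAYER
  NAME "radar"
  TYPE RASTER
  DATA "radar.tif"
  PROCESSING "NODATA=OFF"
  CLASS
    EXPRESSION ([pixel] = -9999)
    STYLE
      COLOR 190 190 190
    END
  END
END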

Why did I need this stuff

Maybe this is not a common need, but when drawing weather radar images, it's a good idea to indicate the areas not covered by the radar in some colour, so it's easy to distinguish the zones where there is no rain from the zones where no data is available. Like in these examples:

The Catalan Weather Service radar in La Panadella, with the range visible in gray, using the no data values
http://www.aemet.es/ca/eltiempo/observacion/radar?w=0
The Spanish meteorological agency's radar network, with the uncovered areas in gray.

Strangely, finding the solution was quite difficult, so I'm putting it in this post. Maybe somebody, someday, will find it faster than I did.


Thursday, September 5, 2013

Reading WRF NetCDF files with GDAL python

Since I work at a meteorological service, I have to deal quite often with numerical weather prediction models. I usually work with GRIB files that have the data in the format I need to draw simple maps.
But sometimes I need to calculate things that require the original data, since GRIB files are usually reprojected and simplified in their vertical levels.
In the case of the WRF model, the output is in NetCDF format. The data is in sigma levels instead of the pressure levels I need, and variables such as temperature don't come directly in the file, but need to be calculated.
The post is actually written to organize the things I've been using myself, but I've added it to the blog because I haven't found step by step instructions elsewhere.

Getting the data

As usual, you can download the script from GitHub
I have used some data from the model run at the SMC, where I work, but you can get a WRF output from the ucar.edu web site. The projection is different from the one I have used, and it has 300 sigma levels instead of 30, but it should work.

The NetCDF format


Every NetCDF file contains a collection of datasets with different sizes and types. Each dataset can have one or more layers, can contain one-dimensional arrays, etc. So the "traditional" GDAL data model can't work directly, since it assumes that a file has multiple layers of the same size. That's why "subdatasets" are introduced. Opening the file directly (or running gdalinfo) only gives some metadata about all the datasets; no actual data can be read or consulted. To do that, a subdataset must be opened instead.

Opening a subdataset

Since opening the file directly doesn't open the actual dataset, but the "container", a different notation is used while keeping the Open method: the filename string must be preceded by NETCDF: and followed by :VAR_NAME.
To open the XLONG variable subdataset, the call would be:
gdal.Open('NETCDF:"'+datafile+'":XLONG')
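To see which subdatasets are available in the first place, this sketch (assuming datafile holds the path to the WRF output) lists them:

from osgeo import gdal

ds = gdal.Open(datafile)
for name, description in ds.GetSubDatasets():
    print(name)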

ARWPost

ARWPost is a FORTRAN program that reads WRF outputs. I have used its code to learn how to calculate the actual model variables from the model NetCDF file.
There is a file for calculating each variable, some for the interpolation, etc., so it has helped me a lot in understanding how everything works.
The problem with ARWPost is that its usage is quite complicated (changing a configuration file and executing it), so it was not very practical for me.

Sigma Levels

Instead of using constant pressure levels, the WRF model works with the Sigma coordinate system.
This makes extracting the data at a constant pressure a bit more difficult, since for every point in the horizontal grid, a different level position is needed to get the value.

Besides, not all the variables use the same sigma levels. There are two different level sets known as "half levels" and "full levels", as shown in the picture:
Image adapted from the Wikipedia Sigma Levels entry

Of course, this makes things a bit more complicated.

A sigma value at a half level is the average of the full levels directly above and below it.
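In NumPy terms the relation is a one-liner. A sketch, with made-up level values (the znw/znu names are just illustrative):

import numpy

znw = numpy.array([1.0, 0.99, 0.97, 0.94])  #full levels
znu = 0.5 * (znw[:-1] + znw[1:])            #half levels, one entry fewer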

Getting the geopotential height at a pressure level, step by step

Setting the right projection

NetCDF files can hold the following projections: 'lambert', 'polar', 'mercator', 'lat-lon' and 'rotated_ll'. The projection is defined by the MAP_PROJ label in the original file's metadata (not the subdatasets' metadata). To get all the file metadata, just use:
ds_in = gdal.Open(datafile)
metadata = ds_in.GetMetadata()

All the metadata comes with the prefix NC_GLOBAL#
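So, continuing from the snippet above, the projection parameters can be read like this (a sketch):

map_proj = int(metadata['NC_GLOBAL#MAP_PROJ'])
stand_lon = float(metadata['NC_GLOBAL#STAND_LON'])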

In my case, I have MAP_PROJ=2, so the projection is polar stereographic (the code is the position in the projection list, starting at 1). I'll set it with:

proj_out = osr.SpatialReference()
proj_out.SetPS(90, -1.5, 1, 0, 0)

where -1.5 is the STAND_LON label at the original file metadata and 90 is the POLE_LAT label.

Finding the geotransform

I'm not sure the method I use is the best one, but the data I get is consistent.
The metadata given in the main file doesn't set the geotransform properly (or, at least, I'm not able to get it), since it sets only the center point coordinates and the deltax and deltay. But these deltas are not referenced to the same point, so getting the real geotransform is not easy.

Fortunately, the files come with the fields XLONG and XLAT, which contain the longitude and latitude for each point in the matrix. So by taking the corners it is possible to get the geotransform, although the coordinates have to be reprojected first:
ds_lon = gdal.Open('NETCDF:"'+datafile+'":XLONG')
ds_lat = gdal.Open('NETCDF:"'+datafile+'":XLAT')

longs = ds_lon.GetRasterBand(1).ReadAsArray()
lats = ds_lat.GetRasterBand(1).ReadAsArray()
ds_lon = None
ds_lat = None
proj_gcp = osr.SpatialReference()
proj_gcp.ImportFromEPSG(4326)
transf = osr.CoordinateTransformation(proj_gcp, proj_out)
ul = transf.TransformPoint(float(longs[0][0]), float(lats[0][0]))
lr = transf.TransformPoint(float(longs[len(longs)-1][len(longs[0])-1]), float(lats[len(longs)-1][len(longs[0])-1]))
ur = transf.TransformPoint(float(longs[0][len(longs[0])-1]), float(lats[0][len(longs[0])-1]))
ll = transf.TransformPoint(float(longs[len(longs)-1][0]), float(lats[len(longs)-1][0]))
gt0 = ul[0]
gt1 = (ur[0] - gt0) / len(longs[0])
gt2 = (lr[0] - gt0 - gt1*len(longs[0])) / len(longs)
gt3 = ul[1]
gt5 = (ll[1] - gt3) / len(longs)
gt4 = (lr[1] - gt3 - gt5*len(longs) ) / len(longs[0])
geotransform = (gt0,gt1,gt2,gt3,gt4,gt5)
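A quick way to verify the result is to apply the standard GDAL geotransform equations to a corner pixel and compare with the reprojected corner computed above; a sketch:

def pixel_to_map(gt, col, row):
    #The standard GDAL geotransform: x grows with columns, y with rows
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

print(pixel_to_map(geotransform, 0, 0))  #should be very close to ul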

Getting the geopotential height

From the ARWPost code, I know that the geopotential height is:

H = (PH + PHB)/9.81

So the PH and PHB subdatasets must be added together and divided by the gravity to get the actual geopotential height.

The code:
ds_ph = gdal.Open('NETCDF:"'+datafile+'":PH')
ds_phb = gdal.Open('NETCDF:"'+datafile+'":PHB')
num_bands = ds_ph.RasterCount
data_hgp = numpy.zeros(((num_bands, ds_ph.RasterYSize, ds_ph.RasterXSize)))
for i in range(num_bands):
    data_hgp[i,:,:] = ( ds_ph.GetRasterBand(num_bands - i).ReadAsArray() + ds_phb.GetRasterBand(num_bands - i).ReadAsArray() ) / 9.81
ds_ph = None
ds_phb = None

Note that the bands are read in inverted order and that, in the array, the order of the elements is [layer][y][x]. This is needed later to calculate the layer positions.

Calculating the geopotential height for a given Air Pressure

At this moment, we have the geopotential height for every sigma level, which is not very useful for working with the data. People prefer to have the geopotential height at a given pressure (850 hPa, 500 hPa, and so on).

The model gives the pressure values at every point, calculated doing:
Press = P + PB
in Pa or
Press = (P + PB ) / 100
in hPa

The code for reading the pressure:
ds_p = gdal.Open('NETCDF:"'+datafile+'":P')
ds_pb = gdal.Open('NETCDF:"'+datafile+'":PB')
num_bands = ds_p.RasterCount
data_p = numpy.zeros(((num_bands, y_size, x_size)))
for i in range(num_bands):
    data_p[i,:,:] = ( ds_p.GetRasterBand(num_bands - i).ReadAsArray() + ds_pb.GetRasterBand(num_bands - i).ReadAsArray() ) / 100
ds_p = None
ds_pb = None

Now we can calculate at which layer of the pressure subdataset the given pressure would sit.
NumPy has a function, called searchsorted, that calculates at which position a value would fall in an ordered array: http://docs.scipy.org/doc/numpy/reference/generated/numpy.searchsorted.html
numpy.searchsorted([1,2,3,4,5], 2.5)

would give
2, since the value 2.5 is between 2 and 3 (positions 1 and 2).
I needed two tricks to use it:
  • The function needs an ascending ordered array. That's why the layer order is reversed in the code: air pressure decreases with height, and we need the opposite to use this function.
  • The function can only run on 1-d arrays. To use it on our 3-d array, the numpy.apply_along_axis function is used:
numpy.apply_along_axis(lambda a: a.searchsorted(p_ref), axis = 0, arr = data_p)
where p_ref is the pressure we want for the geopotential height.
That's why the layer number is the first dimension of the array: it is the only way to make apply_along_axis work.
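A tiny demonstration of both tricks on a toy 3-layer "pressure" stack (the values are made up): apply_along_axis runs searchsorted along the 0th (layer) axis for every horizontal point:

import numpy

data_p = numpy.array([[[700.]], [[850.]], [[900.]]])  #already reversed: ascending
pos = numpy.apply_along_axis(lambda a: a.searchsorted(800.), axis=0, arr=data_p)
print(pos)  #[[1]]: 800 hPa falls between layer 0 (700) and layer 1 (850)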


The final code for the calculation:
p_ref = 500
data_positions = numpy.apply_along_axis(lambda a: a.searchsorted(p_ref), axis = 0, arr = data_p)

Now we know between which layers a given pressure would be for every point. To know the actual pressures at the upper and lower level for each point, the numpy.choose function can be used:
p0 = numpy.choose(data_positions - 1, data_p)
p1 = numpy.choose(data_positions, data_p)
layer_p = (data_positions - 1) + (p_ref - p0) / (p1 - p0)

Now we know the position within the pressure layers, i.e. if we want 850 hPa and, for one point, layer 3 is at 900 hPa and layer 2 is at 800 hPa, the value would be 2.5.

But for the geopotential height, the sigma level is not the same, since the pressure is referenced to the half levels, and the geopotential to the full levels.
So we will add 0.5 layers to the pressure layer to get the geopotential height layer:

layer_h = layer_p + 0.5


Then the final value will be:
h0 = numpy.floor(layer_h).astype(int)
h1 = numpy.ceil(layer_h).astype(int)
data_out = ( numpy.choose(h0, data_hgp) * (h1 - layer_h) ) + ( numpy.choose(h1, data_hgp) * (layer_h - h0) )

Note that the h0 and h1 (the positions of the layers above and below the final position) must be integers to be used with the choose function.

Extrapolation

If the wanted pressure is 500 hPa, a data layer above and below will always be found. But closer to the ground, like at 850 hPa (another typical pressure for showing the geopotential height), at some points the wanted pressure is higher than all the pressure layers of the model. So an extrapolation is needed, and the previous code, which assumed that the value would always lie between two layers, will fail.

So the data calculated above must be filtered for each case. For the points that don't need extrapolation:
data_out = data_out * (data_positions < num_bands_p)

The extrapolation is defined in the file module_interp.f90 from the ARWPost source code. There are two defined cases:
  1. The height is below the lowest sigma level, but above the ground height defined for that point
  2. The height is below the ground level

The first case is quite simple, because the surface data for many variables is available, so it only changes the layer used to calculate the interpolation:
##If the pressure is between the surface pressure and the lowest layer
array_filter = numpy.logical_and(data_positions == num_bands_p, p_ref*100 < data_psfc)
zlev = ( ( (100.*p_ref - 100*data_p[-1,:,:])*data_hgt + (data_psfc - 100.*p_ref)*data_hgp[-1,:,:] ) / (data_psfc - 100*data_p[-1,:,:])) 
data_out = data_out + (zlev * array_filter )

The second case needs to extrapolate below the ground level. The code in the file module_interp.f90 iterates over each element in the array, but using numpy increases the speed a lot, so let's see how it is done:
expon=287.04*0.0065/9.81
ptarget = (data_psfc*0.01) - 150.0

kupper = numpy.apply_along_axis(lambda a: numpy.where(a == numpy.min(a))[0][0] + 1, axis = 0, arr = numpy.absolute(data_p - ptarget))

pbot=numpy.maximum(100*data_p[-1,:,:], data_psfc)
zbot=numpy.minimum(data_hgp[-1,:,:], data_hgt)

tbotextrap=numpy.choose(kupper, data_tk) * pow((pbot/(100*numpy.choose(kupper, data_p))),expon)
tvbotextrap=tbotextrap*(0.622+data_qv[-1,:,:])/(0.622*(1+data_qv[-1,:,:]))

zlev = zbot+((tvbotextrap/0.0065)*(1.0-pow(100.0*p_ref/pbot,expon)))

data_out = data_out + ( zlev * array_filter )

  • Expon and ptarget are calculated directly (ptarget is an array, but the operations are simple)
  • kupper is the level where the absolute value of pressure - ptarget is smallest. Since we have a 3d array, with the 0th dimension being the level, we can calculate the values of the 2d xy matrix by using the apply_along_axis method. This method takes every 1-d slice along the given axis and runs on it the function defined in the first parameter. In our case, the position (where) of the minimum value (min). We add the +1 to get the same result as the original code.
    So in one line, we do what the original code needs three nested loops for!
    The value is inverse to the one in the original code, because we reversed the pressure order.
  • pbot and zbot just take the value closer to the ground, between the lowest model level and the ground level defined by the model
  • tbotextrap is the extrapolated temperature. The choose function is used to get the value at the calculated kupper level for each element in the matrix.
  • tvbotextrap is the virtual temperature for tbotextrap. The original code uses an external function, but I have joined everything. The formula for the virtual temperature is virtual=tmp*(0.622+rmix)/(0.622*(1.+rmix))
  • zlev can be calculated directly with the values calculated before, and is added to the output after applying the filter.

Drawing the map using Basemap

The output file can be opened with QGIS, or any other GIS able to read GeoTIFF files.

But it is also possible to draw it directly from the script using Basemap:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
m = Basemap(projection='cyl',llcrnrlat=40.5,urcrnrlat=43,\
            llcrnrlon=0,urcrnrlon=3.5,resolution='h')


cs = m.contourf(longs,lats,data_out,numpy.arange(1530,1600,10),latlon=True)
m.drawcoastlines(color='gray')
m.drawcountries(color='gray')

cbar = m.colorbar(cs,location='bottom',pad="5%")
cbar.set_label('m')

plt.title('Geopotential height at 850hPa')
plt.show()

  • First, we initialize the projection (a cylindrical latlon projection in this case), setting the map limits.
  • contourf will draw the isobands every 10 m.
  • Then, just draw the borders and coastlines and show the map.
This is how the map at the header was made.

Notes

To compile ARWPost on my Ubuntu, the file src/Makefile needs a change: -lnetcdf becomes -lnetcdff.
It was very hard for me to find this!

Monday, August 5, 2013

Creating vectorial isobands with Python

GDAL has a utility program, gdal_contour, that generates contour lines from a raster file (take a look at this example post). GDAL provides access to this method through its API, so contour lines can be generated inside your GDAL python scripts.

Unfortunately, there is no way to generate isobands, that is, filled contours that use polygons instead of polylines. Isobands make it easy to create nice contour maps, colouring the zones between two values:
 
 
Using contour lines, the result is this one:



Both visualizations are drawn directly from QGIS.

So I created two scripts to get the result. As usual, you can download them from my GitHub account.
There are two scripts because the one I like most uses matplotlib, which is not always convenient to install. The other has fewer requirements (only GDAL python), but the code is dirtier.

Both have the same usage, which is very similar to the gdal_contour usage:

    Usage: isobands_matplotlib.py [-h] [-b band] [-off offset] [-i interval]
    [-nln layer_name] [-a attr_name] [-f formatname]
    src_file out_file
    Calculates the isobands from a raster into a vector file

    positional arguments:
      src_file         The raster source file
      out_file         The vectorial out file
    optional arguments:
      -h, --help       show this help message and exit
      -b band          The band in the source file to process (default 1)
      -off offset      The offset to start the isobands (default 0)
      -i interval      The interval (default 0)
      -nln layer_name  The out layer name (default bands)
      -a attr_name     The out layer attribute name (default h)
      -f formatname    The output file format name (default ESRI Shapefile)
Just choose the input and output files, the input file band to process (1 by default), and the interval, and you will get the vector file in the GIS format of your choice.
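For example, to get isobands every 10 units from a raster into a shapefile (the file names are hypothetical):

    python isobands_matplotlib.py -i 10 dem.tif isobands.shp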

Now, the code of both scripts, step by step.

isobands_matplotlib.py

As its name suggests, this version uses matplotlib, and more specifically the contourf function, to create the isobands.

The script has a function called isobands, that actually generates the data, and a main part that reads the user input values:
if __name__ == "__main__":

    PARSER = ArgumentParser(
        description="Calculates the isobands from a raster into a vector file")
    PARSER.add_argument("src_file", help="The raster source file")
    PARSER.add_argument("out_file", help="The vectorial out file")
    PARSER.add_argument("-b", 
        help="The band in the source file to process (default 1)", 
        type=int, default = 1, metavar = 'band')
    PARSER.add_argument("-off", 
        help="The offset to start the isobands (default 0)", 
        type=float, default = 0.0, metavar = 'offset')
    PARSER.add_argument("-i", 
        help="The interval  (default 0)", 
        type=float, default = 0.0, metavar = 'interval')
    PARSER.add_argument("-nln", 
        help="The out layer name  (default bands)", 
        default = 'bands', metavar = 'layer_name')
    PARSER.add_argument("-a", 
        help="The out layer attribute name  (default h)", 
        default = 'h', metavar = 'attr_name')
    PARSER.add_argument("-f", 
        help="The output file format name  (default ESRI Shapefile)", 
        default = 'ESRI Shapefile', metavar = 'formatname')
    ARGS = PARSER.parse_args()

    isobands(ARGS.src_file, ARGS.b, ARGS.out_file, ARGS.f, ARGS.nln, ARGS.a, 
        ARGS.off, ARGS.i)
I used argparse.ArgumentParser to read the script arguments, since it is easy to use and comes with most up-to-date Python distributions (since Python 2.7, I think).
This part of the code is quite easy to understand: it simply sets the arguments with their default values, sets whether they are mandatory, etc.

The isobands function creates the vector file:
def isobands(in_file, band, out_file, out_format, layer_name, attr_name, 
    offset, interval, min_level = None):
    '''
    The method that calculates the isobands
    '''
    ds_in = gdal.Open(in_file)
    band_in = ds_in.GetRasterBand(band)
    xsize_in = band_in.XSize
    ysize_in = band_in.YSize

    geotransform_in = ds_in.GetGeoTransform()

    srs = osr.SpatialReference()
    srs.ImportFromWkt( ds_in.GetProjectionRef() )  

    #Creating the output vectorial file
    drv = ogr.GetDriverByName(out_format)

    if exists(out_file):
        remove(out_file)
    dst_ds = drv.CreateDataSource( out_file )
       
    dst_layer = dst_ds.CreateLayer(layer_name, geom_type = ogr.wkbPolygon, 
        srs = srs)

    fdef = ogr.FieldDefn( attr_name, ogr.OFTReal )
    dst_layer.CreateField( fdef )



    x_pos = arange(geotransform_in[0], 
        geotransform_in[0] + xsize_in*geotransform_in[1], geotransform_in[1])
    y_pos = arange(geotransform_in[3], 
        geotransform_in[3] + ysize_in*geotransform_in[5], geotransform_in[5])
    x_grid, y_grid = meshgrid(x_pos, y_pos)

    raster_values = band_in.ReadAsArray(0, 0, xsize_in, ysize_in)

    stats = band_in.GetStatistics(True, True)
    if min_level == None:
        min_value = stats[0]
        min_level = offset + interval * floor((min_value - offset)/interval)
   
    max_value = stats[1]
    #Due to range issues, a level is added
    max_level = offset + interval * (1 + ceil((max_value - offset)/interval)) 

    levels = arange(min_level, max_level, interval)

    contours = plt.contourf(x_grid, y_grid, raster_values, levels)

    

    for level in range(len(contours.collections)):
        paths = contours.collections[level].get_paths()
        for path in paths:

            feat_out = ogr.Feature( dst_layer.GetLayerDefn())
            feat_out.SetField( attr_name, contours.levels[level] )
            pol = ogr.Geometry(ogr.wkbPolygon)


            ring = None            
            
            for i in range(len(path.vertices)):
                point = path.vertices[i]
                if path.codes[i] == 1:
                    if ring != None:
                        pol.AddGeometry(ring)
                    ring = ogr.Geometry(ogr.wkbLinearRing)
                    
                ring.AddPoint_2D(point[0], point[1])
            

            pol.AddGeometry(ring)
            
            feat_out.SetGeometry(pol)
            if dst_layer.CreateFeature(feat_out) != 0:
                print "Failed to create feature in shapefile.\n"
                exit( 1 )

            
            feat_out.Destroy()
  • Lines 26 to 49 open the input file and create the output vector file.
  • Lines 53 to 57 set two matrices, containing the coordinates for each pixel in the input file. To do it, two vectors are created first with the coordinates calculated from the input geotransform, and then the meshgrid is called to generate the 2d matrices, needed by matplotlib to make its calculations.
  • Lines 59 to 70 read the input file data and calculates the levels inside it, from the interval and the offset. Since calculating isobands for levels not present in the file isn't efficient, the minimum and maximum values in the input file are calculated before the isobands.
  • The line 72 is where the isobands are actually calculated using the contourf function.
  •  Lines 76 to 105 create the polygons at the output vector file.
    • The generated contour has a property called collections, containing every isoband.
    • Each element in the collections has the paths of the isolines.
    • The codes property of the path indicates whether a new ring has to be started (a code of 1 is a MOVETO).
    • The remaining lines generate the OGR features

isobands_gdal.py

This script is quite a bit dirtier, since I used a trick to get the isobands out of the ContourGenerate function.
Closing the contour lines, which is what has to be done to transform isolines into isobands, is not an easy task.
Looking at some D3js examples, I got the idea from Jason Davies' conrec.js. I can't find the original example now, but basically, the idea is to add some pixels around the image with a value lower than the lowest value, so all the isolines get closed. That gets close to the solution, but some problems must still be solved:
  • Convert the isolines to polygons
  • Clip the vectors to the original extent
  • Subtract the overlapping polygons, since when the lines are converted to polygons, all the area above the isoline value is contained in the polygon, even when other lines with higher values are inside
When not clipped, the isolines will create strange effects at the borders

The main part of the code is exactly the same as in the other script, and the isobands function is:

def isobands(in_file, band, out_file, out_format, layer_name, attr_name, 
    offset, interval, min_level = None):
    '''
    The method that calculates the isobands
    '''    
    #Loading the raster file
    ds_in = gdal.Open(in_file)
    band_in = ds_in.GetRasterBand(band)
    xsize_in = band_in.XSize
    ysize_in = band_in.YSize

    stats = band_in.GetStatistics(True, True)
    
    if min_level == None:
        min_value = stats[0]
        min_level = ( offset + interval * 
            (floor((min_value - offset)/interval) - 1) )
    nodata_value = min_level - interval    



    geotransform_in = ds_in.GetGeoTransform()
    
    srs = osr.SpatialReference()
    srs.ImportFromWkt( ds_in.GetProjectionRef() )  

    data_in = band_in.ReadAsArray(0, 0, xsize_in, ysize_in)


    #The contour memory
    contour_ds = ogr.GetDriverByName('Memory').CreateDataSource('')
    contour_lyr = contour_ds.CreateLayer('contour', 
        geom_type = ogr.wkbLineString25D, srs = srs )
    field_defn = ogr.FieldDefn('ID', ogr.OFTInteger)
    contour_lyr.CreateField(field_defn)
    field_defn = ogr.FieldDefn('elev', ogr.OFTReal)
    contour_lyr.CreateField(field_defn)

    #The in memory raster band, with new borders to close all the polygons
    driver = gdal.GetDriverByName( 'MEM' )
    xsize_out = xsize_in + 2
    ysize_out = ysize_in + 2

    column = numpy.ones((ysize_in, 1)) * nodata_value
    line = numpy.ones((1, xsize_out)) * nodata_value

    data_out = numpy.concatenate((column, data_in, column), axis=1)
    data_out = numpy.concatenate((line, data_out, line), axis=0)

    ds_mem = driver.Create( '', xsize_out, ysize_out, 1, band_in.DataType)
    ds_mem.GetRasterBand(1).WriteArray(data_out, 0, 0)
    ds_mem.SetProjection(ds_in.GetProjection())
    #We have added the buffer!
    ds_mem.SetGeoTransform((geotransform_in[0]-geotransform_in[1],
        geotransform_in[1], 0, geotransform_in[3]-geotransform_in[5], 
        0, geotransform_in[5]))
    gdal.ContourGenerate(ds_mem.GetRasterBand(1), interval, 
        offset, [], 0, 0, contour_lyr, 0, 1)

    #Creating the output vectorial file
    drv = ogr.GetDriverByName(out_format)
    if exists(out_file):
        remove(out_file)
    dst_ds = drv.CreateDataSource( out_file )

      
    dst_layer = dst_ds.CreateLayer(layer_name, 
        geom_type = ogr.wkbPolygon, srs = srs)

    fdef = ogr.FieldDefn( attr_name, ogr.OFTReal )
    dst_layer.CreateField( fdef )


    contour_lyr.ResetReading()

    geometry_list = {}
    for feat_in in contour_lyr:
        value = feat_in.GetFieldAsDouble(1)

        geom_in = feat_in.GetGeometryRef()
        points = geom_in.GetPoints()

        if ( (value >= min_level and points[0][0] == points[-1][0]) and 
            (points[0][1] == points[-1][1]) ):
            if (value in geometry_list) is False:
                geometry_list[value] = []

            pol = ogr.Geometry(ogr.wkbPolygon)
            ring = ogr.Geometry(ogr.wkbLinearRing)

            for point in points:

                p_y = point[1]
                p_x = point[0]
                          
                if p_x < (geotransform_in[0] + 0.5*geotransform_in[1]):
                    p_x = geotransform_in[0] + 0.5*geotransform_in[1]
                elif p_x > ( (geotransform_in[0] + 
                    (xsize_in - 0.5)*geotransform_in[1]) ):
                    p_x = ( geotransform_in[0] + 
                        (xsize_in - 0.5)*geotransform_in[1] )
                if p_y > (geotransform_in[3] + 0.5*geotransform_in[5]):
                    p_y = geotransform_in[3] + 0.5*geotransform_in[5]
                elif p_y < ( (geotransform_in[3] + 
                    (ysize_in - 0.5)*geotransform_in[5]) ):
                    p_y = ( geotransform_in[3] + 
                        (ysize_in - 0.5)*geotransform_in[5] )

                ring.AddPoint_2D(p_x, p_y)
                
  
            pol.AddGeometry(ring)
            geometry_list[value].append(pol)



    values = sorted(geometry_list.keys())

    geometry_list2 = {}

    for i in range(len(values)):
        geometry_list2[values[i]] = []
        interior_rings = []
        for j in range(len(geometry_list[values[i]])):
            if (j in interior_rings) == False:
                geom = geometry_list[values[i]][j]
                
                for k in range(len(geometry_list[values[i]])):
                    
                    if ((k in interior_rings) == False and 
                        (j in interior_rings) == False):
                        geom2 = geometry_list[values[i]][k]
                        
                        if j != k and geom2 != None and geom != None:
                            if geom2.Within(geom) == True:
                                
                                geom3 = geom.Difference(geom2)
                                interior_rings.append(k)
                                geometry_list[values[i]][j] = geom3
                                                            
                            elif geom.Within(geom2) == True:
                                
                                geom3 = geom2.Difference(geom)
                                interior_rings.append(j)
                                geometry_list[values[i]][k] = geom3
                    
        for j in range(len(geometry_list[values[i]])):
            if ( (j in interior_rings) == False and 
                geometry_list[values[i]][j] != None ):
                geometry_list2[values[i]].append(geometry_list[values[i]][j])
    

    for i in range(len(values)):
        value = values[i]
        if value >= min_level:
            for geom in geometry_list2[values[i]]:
                
                if i < len(values)-1:

                    for geom2 in geometry_list2[values[i+1]]:
                        if geom.Intersects(geom2) is True:
                            geom = geom.Difference(geom2)
                
                feat_out = ogr.Feature( dst_layer.GetLayerDefn())
                feat_out.SetField( attr_name, value )
                feat_out.SetGeometry(geom)
                if dst_layer.CreateFeature(feat_out) != 0:
                    print "Failed to create feature in shapefile.\n"
                    exit( 1 )
                feat_out.Destroy()
  • Lines 23 to 44 read the input file, including the data, geotransform, etc.
  • Lines 47 to 54 create a vector file where the contour lines will be written. Since the data must be clipped later, the file is a temporary memory file.
  • Lines 56 to 71 create a memory raster file with the input data, plus a buffer around it with a value lower than the lowest in the input data, so the contour lines get closed.
  • Line 74 calls the ContourGenerate function, that will add the contour lines to the vectorial memory file.
  • Lines 77 to 91 create the actual output file
  • Lines 93 to 130 create a collection of geometries with the polygons.
    • The lines are transformed to polygons by creating a ring from the polyline and adding it to a polygon.
    • Lines 113 to 123 clip the polygons to the original geotransform, by moving the coordinates of the points outside the geotransform to the border.
  • Lines 138 to 167 store the differences between overlapping polygons, so the output polygons contain only the zones where the values are in the interval.
  • Lines 170 to 187 write the final polygons to the output file, the same way as in the other script.

Notes:

Another option would be, of course, creating the polygons myself. I tried to do it but, as you can see on the marching squares Wikipedia page, the algorithm is not easy when building polygons, and the matplotlib version of the script stays quite clean.

The scripts have a problem when there is a nodata zone, since I haven't added the code to handle it.

Finally, as asked in the GDAL-dev mailing list, the best option would be modifying the ContourGenerate function to create isobands, but I think that is out of my scope.
The discussion at the GDAL-dev mailing list where I posted the first version. 
 
The data used for the examples is the same I used at the raster classification post.
 
Some links I have used:
http://matplotlib.org/api/path_api.html
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.contourf
http://matplotlib.org/basemap/api/basemap_api.html#mpl_toolkits.basemap.Basemap.contourf
http://fossies.org/dox/matplotlib-1.2.0/classmatplotlib_1_1contour_1_1QuadContourSet.html

Saturday, June 29, 2013

GDAL Performance II: Create a PNG from a raster file

After improving the raster classification script, this entry turns to the PNG creation script.

As usual, all the code is at GitHub

Problems in the original script

  • The PIL python module was used to draw the PNG image pixel by pixel. This can be changed using the SciPy function scipy.misc.imsave. Using it, all the iterations, comparisons, etc. are done at a lower level, so the script is cleaner and much faster.
  • GDAL blocks weren't used, so the whole image was loaded into one huge array when the original raster was big. Using blocks as in the last entry's example, performance improves a lot.
  • All the values in the color file were evaluated. If the minimum and maximum values of the raster are known, many of these comparisons can be avoided, so the performance there is also improved.

The new script

The new script works exactly the same way as the old one at the user level.
'''
Script to generate a png from a raster file.
It can be used from the command line or from an other python script
'''
from os.path import exists
from osgeo import gdal
from osgeo.gdalconst import GA_ReadOnly
from sys import argv
from sys import exit
from numpy import logical_and
from numpy import zeros
from numpy import uint8
import scipy

'''
Main function.
Requires the input file, the color file and the output file name.
The raster band is 1 by default.
If nothing else is asked, the discrete color scale image is created.
'''
def raster2png(raster_file, color_file, out_file_name, raster_band=1, discrete=True):

    if exists(raster_file) is False:
            raise Exception('[Errno 2] No such file or directory: \'' + raster_file + '\'')    
    
    dataset = gdal.Open(raster_file, GA_ReadOnly )
    if dataset == None:
        raise Exception("Unable to read the data file")
    
    band = dataset.GetRasterBand(raster_band)

    block_sizes = band.GetBlockSize()
    x_block_size = block_sizes[0]
    y_block_size = block_sizes[1]

    xsize = band.XSize
    ysize = band.YSize

    max_value = band.GetMaximum()
    min_value = band.GetMinimum()

    if max_value == None or min_value == None:
        stats = band.GetStatistics(0, 1)
        max_value = stats[1]
        min_value = stats[0]

    #Reading the color table
    color_table = readColorTable(color_file)
    #Adding an extra value to avoid problems with the last & first entry
    if sorted(color_table.keys())[0] > min_value:
        color_table[min_value - 1] = color_table[sorted(color_table.keys())[0]]

    if sorted(color_table.keys())[-1] < max_value:
        color_table[max_value + 1] = color_table[sorted(color_table.keys())[-1]]
    #Preparing the color table and the output file
    classification_values = color_table.keys()
    classification_values.sort()
    

    rgb = zeros((ysize, xsize, 4), dtype = uint8)

    for i in range(0, ysize, y_block_size):
        if i + y_block_size < ysize:
            rows = y_block_size
        else:
            rows = ysize - i
        
        for j in range(0, xsize, x_block_size):
            if j + x_block_size < xsize:
                cols = x_block_size
            else:
                cols = xsize - j
                
            
            values = band.ReadAsArray(j, i, cols, rows)
            r = zeros((rows, cols), dtype = uint8)
            g = zeros((rows, cols), dtype = uint8)
            b = zeros((rows, cols), dtype = uint8)
            a = zeros((rows, cols), dtype = uint8)

            for k in range(len(classification_values) - 1):
                #print classification_values[k]
                if classification_values[k] < max_value and (classification_values[k + 1] > min_value ):
                    mask = logical_and(values >= classification_values[k], values < classification_values[k + 1])
                    if discrete == True:
                        r = r + color_table[classification_values[k]][0] * mask
                        g = g + color_table[classification_values[k]][1] * mask
                        b = b + color_table[classification_values[k]][2] * mask
                        a = a + color_table[classification_values[k]][3] * mask
                    else:
                        v0 = float(classification_values[k])
                        v1 = float(classification_values[k + 1])

                        r = r + mask * (color_table[classification_values[k]][0] + (values - v0)*(color_table[classification_values[k + 1]][0] - color_table[classification_values[k]][0])/(v1-v0) )
                        g = g + mask * (color_table[classification_values[k]][1] + (values - v0)*(color_table[classification_values[k + 1]][1] - color_table[classification_values[k]][1])/(v1-v0) )
                        b = b + mask * (color_table[classification_values[k]][2] + (values - v0)*(color_table[classification_values[k + 1]][2] - color_table[classification_values[k]][2])/(v1-v0) )
                        a = a + mask * (color_table[classification_values[k]][3] + (values - v0)*(color_table[classification_values[k + 1]][3] - color_table[classification_values[k]][3])/(v1-v0) )
                     
            rgb[i:i+rows,j:j+cols, 0] = r 
            rgb[i:i+rows,j:j+cols, 1] = g 
            rgb[i:i+rows,j:j+cols, 2] = b
            rgb[i:i+rows,j:j+cols, 3] = a   

    scipy.misc.imsave(out_file_name, rgb)
    

def readColorTable(color_file):
    '''
    The method for reading the color file.
    * If alpha is not defined, a 255 value is set (no transparency).
    '''    

    color_table = {}
    if not exists(color_file):
        raise Exception("Color file " + color_file + " does not exist")
    
    fp = open(color_file, "r")
    for line in fp:
        if line.find('#') == -1 and line.find('/') == -1:
            entry = line.split()
            if len(entry) == 5:
                alpha = int(entry[4])
            else:
                alpha=255
            color_table[float(entry[0])]=[int(entry[1]),int(entry[2]),int(entry[3]),alpha] #float() instead of eval(), which is safer
    fp.close()
    
    return color_table

'''
Usage explanation
'''
def Usage():
    print "Not enough arguments." 
    print "Usage:"
    print argv[0] + ' [-exact_color_entry] [-band band_number] data_file color_file output_file'
    exit()

'''
Program Mainline
'''
if __name__ == "__main__":
    
    file_name = None
    colorfile_name = None
    out_file_name = None
    discrete = False
    band = 1

    i = 1
    while i < len(argv):
        arg = argv[i]
        if arg == '-exact_color_entry':
            discrete = True
        elif arg == '-band':
            band = int(argv[i+1])
            i = i + 1
        elif file_name is None:
            file_name = arg
            file_name = file_name.replace("'","")
            file_name = file_name.replace('"','')
        elif colorfile_name is None:
            colorfile_name = arg
            colorfile_name = colorfile_name.replace("'","")
            colorfile_name = colorfile_name.replace('"','')
        elif out_file_name is None:
            out_file_name = arg
            out_file_name = out_file_name.replace("'","")
            out_file_name = out_file_name.replace('"','')
        i = i + 1   



    if len(argv) == 1 or file_name is None or colorfile_name is None or out_file_name is None:
        Usage()
    '''
    try:
        raster2png(file_name,colorfile_name,out_file_name,band,discrete)
    except Exception, ex:
        print "Error: " + str(ex)
    '''
    raster2png(file_name,colorfile_name,out_file_name,band,discrete)
  • The part of the code that reads the input (lines 130-182) hasn't changed, nor has the readColorTable function, which parses the color file. Only the main function, raster2png, has changed.
  • Lines 26 to 45 read the raster file metadata, including the extreme values in the file.
  • Then the color file is read, and extra entries are added if the raster has values above the color file's maximum or below its minimum (lines 47-57). Without this, strange nodata zones appear.
  • At line 60, an array is created, where the four bands (red, green, blue and alpha) of the output png will be stored.
  • Then every block of data is processed (after line 62). Empty arrays are created for each band, and for every entry in the color file, the affected pixels are selected at line 84.
  • Depending on the user's choice, the color is assigned either directly from the color file entry or by interpolating between the two closest colors (lines 85-97; see the sketch after this list). Lines 91 and 92 cast to float, which is essential to avoid integer division and get the right value.
  • Lines 99-103 assign the block to the right region of the output array (note how a slice of a NumPy array is selected)
  • Line 104 creates the png file.
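To make the continuous-color interpolation easier to follow, here is a minimal, self-contained sketch of the same formula on a toy array (the two color entries and the values are invented for the example):

import numpy
from numpy import logical_and, uint8

#Toy data: one interval, from value 0 (red = 0) to value 1000 (red = 200)
values = numpy.array([[0, 250, 500], [750, 999, 1000]], dtype=numpy.float32)
v0, v1 = 0.0, 1000.0 #Interval limits, cast to float as in the script
r0, r1 = 0, 200 #Red component at each limit

#Pixels inside the interval; the last pixel (1000) belongs to the next one
mask = logical_and(values >= v0, values < v1)
#Linear interpolation, the same expression used in the script
r = (mask * (r0 + (values - v0) * (r1 - r0) / (v1 - v0))).astype(uint8)
print r #[[  0  50 100] [150 199   0]]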

Benchmarks

To check the results, I used the same raster as in the original post and, for bigger tests, the rasters I downloaded for the last post (see it for how the big file was created).

Old script

File                           with -exact_color_entry   Continuous colors
w001001.adf (1126x576px)       9.363s                    13.329s
srtm_54_05.tif (6001x6001px)   9m19s                     12m44s
big.tif (12001x12001px)        Error                     Error

New script

File                           with -exact_color_entry   Continuous colors
w001001.adf (1126x576px)       0.822s                    1.653s
srtm_54_05.tif (6001x6001px)   22.503s                   1m07s
big.tif (12001x12001px)        1m46s                     4m58s

The improvement is huge and, for the biggest file, the old version simply crashes because it runs out of memory.

What's next

I'm still not satisfied with the way the image is saved using SciPy, because the whole image must be held in a single array instead of being written by blocks. With really big images, even the new version of the script will fail, so there may be a more efficient alternative.
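One possible direction, as an untested sketch: write each block into a temporary GDAL dataset as it is computed, and let the PNG driver make the copy at the end, so the full RGBA image never has to live in a single NumPy array. The temporary file name is just an example, and xsize, ysize and out_file_name are the variables already used in raster2png:

from osgeo import gdal

#Hypothetical replacement for the big rgb array and scipy.misc.imsave
tmp_ds = gdal.GetDriverByName('GTiff').Create('tmp_rgba.tif', xsize, ysize, 4, gdal.GDT_Byte)

#Inside the block loop, instead of filling rgb[i:i+rows, j:j+cols, ...]:
#    tmp_ds.GetRasterBand(1).WriteArray(r, j, i)
#    tmp_ds.GetRasterBand(2).WriteArray(g, j, i)
#    tmp_ds.GetRasterBand(3).WriteArray(b, j, i)
#    tmp_ds.GetRasterBand(4).WriteArray(a, j, i)

tmp_ds.FlushCache()
#The PNG driver only supports CreateCopy, not Create
gdal.GetDriverByName('PNG').CreateCopy(out_file_name, tmp_ds)
tmp_ds = None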

Monday, June 3, 2013

GDAL performance: raster classification using NumPy

I wrote some time ago two posts about raster classification and how to colorize a raster using GDAL. Both examples work well, but they are terribly inefficient and won't process a really big raster.
In this post, I'll show how to work properly with rasters, especially with big ones or when processing time matters, such as in on-the-fly raster generation.
Before continuing, I recommend reading this course about Geoprocessing by Chris Garrard, which helped me a lot.

The classified raster seen on QGIS
As usual, you can find all the code at GitHub

The original code

To learn about the major errors in my former posts, the best is to look at the code:
#! /usr/bin/python

#Change the value with your raster filename here
raster_file = 'w001001.adf'
output_file = 'classified.tiff'

classification_values = [0,500,1000,1500,2000,2500,3000,3500,4000] #The interval values to classify
classification_output_values = [10,20,30,40,50,60,70,80,90] #The value assigned to each interval

from osgeo import gdal
from osgeo.gdalconst import *
import numpy
import struct

#Opening the raster file
dataset = gdal.Open(raster_file, GA_ReadOnly )
band = dataset.GetRasterBand(1)
#Reading the raster properties
projectionfrom = dataset.GetProjection()
geotransform = dataset.GetGeoTransform()
xsize = band.XSize
ysize = band.YSize
datatype = band.DataType

#Reading the raster values
values = band.ReadRaster( 0, 0, xsize, ysize, xsize, ysize, datatype )
#Conversion between GDAL types and python pack types (Can't use complex integer or float!!)
data_types ={'Byte':'B','UInt16':'H','Int16':'h','UInt32':'I','Int32':'i','Float32':'f','Float64':'d'}
values = struct.unpack(data_types[gdal.GetDataTypeName(band.DataType)]*xsize*ysize,values)

#Now that the raster is into an array, let's classify it
out_str = ''
for value in values:
    index = 0
    for cl_value in classification_values:
        if value <= cl_value:
            out_str = out_str + struct.pack('B',classification_output_values[index])
            break
        index = index + 1
#Once classified, write the output raster
#In this example, it's not possible to use the same output format as the input file, because GDAL can't write it. GeoTIFF is used instead
gtiff = gdal.GetDriverByName('GTiff') 
output_dataset = gtiff.Create(output_file, xsize, ysize, 4)
output_dataset.SetProjection(projectionfrom)
output_dataset.SetGeoTransform(geotransform)

output_dataset.GetRasterBand(1).WriteRaster( 0, 0, xsize, ysize, out_str ) 
output_dataset = None
  1. Reading the raster without using the ReadAsArray function (lines 26 to 29). This is less efficient and much more complicated.
  2. Iterating pixel by pixel to calculate the classified values, instead of doing it all at once using NumPy array operations.
  3. Classifying all the possible values, instead of finding the raster's maximum and minimum values and classifying only within that interval.
  4. Not reading the raster by blocks, which is much more efficient with big rasters and indispensable for the really big ones.
I'll explain how to change that point by point.

New code

The following code has all the changes together, but at the GitHub you can find different versions applying them one by one:

import numpy
from numpy import zeros
from numpy import logical_and
from osgeo import gdal

classification_values = [0,500,1000,1500,2000,2500,3000,3500,4000,4500,5000,5500,6000,6500,7000,7500,8000] #The interval values to classify  
classification_output_values = [10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170] #The value assigned to each interval  
  
in_file = "YourInputFileHere.tif"

ds = gdal.Open(in_file)
band = ds.GetRasterBand(1)

block_sizes = band.GetBlockSize()
x_block_size = block_sizes[0]
y_block_size = block_sizes[1]

xsize = band.XSize
ysize = band.YSize

max_value = band.GetMaximum()
min_value = band.GetMinimum()

if max_value is None or min_value is None:
    stats = band.GetStatistics(0, 1)
    max_value = stats[1]
    min_value = stats[0]

format = "GTiff"
driver = gdal.GetDriverByName( format )
dst_ds = driver.Create("original_blocks.tif", xsize, ysize, 1, gdal.GDT_Byte )
dst_ds.SetGeoTransform(ds.GetGeoTransform())
dst_ds.SetProjection(ds.GetProjection())

for i in range(0, ysize, y_block_size):
    if i + y_block_size < ysize:
        rows = y_block_size
    else:
        rows = ysize - i
    for j in range(0, xsize, x_block_size):
        if j + x_block_size < xsize:
            cols = x_block_size
        else:
            cols = xsize - j

        data = band.ReadAsArray(j, i, cols, rows)
        r = zeros((rows, cols), numpy.uint8)

        for k in range(len(classification_values) - 1):
            if classification_values[k] <= max_value and (classification_values[k + 1] > min_value ):
                r = r + classification_output_values[k] * logical_and(data >= classification_values[k], data < classification_values[k + 1])
        if classification_values[k + 1] < max_value:
            r = r + classification_output_values[k+1] * (data >= classification_values[k + 1])

        dst_ds.GetRasterBand(1).WriteArray(r,j,i)

dst_ds = None


  1. At line 46, you can see how ReadAsArray works. No conversion to a NumPy array is needed, and no types have to be specified. The example works by blocks, but you can read the whole array at once just by changing the parameters to ReadAsArray(0, 0, xsize, ysize).
  2. This is the main change.
    1. At line 47, an empty array is created, specifying that the output will be an array of unsigned bytes (the most efficient for classifications up to 256 values)
    2. For every possible value in the classification (line 49), the input raster is evaluated using the NumPy function logical_and. This creates a boolean array set to true for each pixel that meets the condition. Then the array is multiplied by the output classification value and added to the output array (a vectorized alternative using numpy.digitize is sketched after this list).
      Once this is done for every value in the scale, the classified output raster is obtained.
    3. To see better how to do it, take a look at the file classification_numpy_arrays.py, since working by blocks makes things a bit more difficult to understand.
  3. At lines 21 to 27, the maximum and minimum values are calculated. Note the if block: some formats store these values, and others, like GeoTIFF, don't. But GDAL can calculate them, and it stores the result in a file with the .aux.xml extension, so the calculation isn't needed the next time.
    All this takes some time, but it makes the check at line 50 possible, which speeds up the process a lot when many of the classification values fall outside the image range.
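As a side note, the whole per-interval loop could probably be replaced by a single call to numpy.digitize, which assigns each pixel the index of the interval it falls in. This is only an alternative sketch with a hypothetical helper name, not the code from the repository; note that, unlike the loop, it also assigns the first class to values below the first limit:

import numpy

def classify_block(data, classification_values, classification_output_values):
    #digitize returns, for each pixel, the index of its interval:
    #0 for values below the first limit, len(bins) for values above the last
    bins = classification_values[1:]
    indices = numpy.digitize(data.ravel(), bins).reshape(data.shape)
    #Look-up table mapping interval indices to output values
    lut = numpy.array(classification_output_values, dtype=numpy.uint8)
    return lut[indices]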

Performance improvements

To see how the new code affects performance, I needed bigger rasters than in the old example. The files I downloaded are from the CGIAR SRTM elevation data; I chose a region of the Himalaya (srtm_54_05.tif).


The results for this file are:
  1. classification_original.py:  2m22.047s
  2. classification_numpy_array.py:  0m32.359s
  3. classification_blocks.py:  0m17.900s
  4. classification_blocks_minmax.py: 0m14.223s

To measure the time, I just ran each script using time:
time python classification_original.py

So the biggest improvement comes from using NumPy arrays instead of reading the file pixel by pixel. Using blocks instead of reading the whole array improves the performance by more than 40%, since the file is quite big (69MB).
Using the maximum and minimum can change the efficiency, depending on the raster's value range.

But to see the importance of using blocks, I have merged four images from the same source, using:
gdal_merge.py -o big.tiff srtm*.tif

So the resulting GeoTIFF is 275MB, too big for my old computer to read at once. The results are:
  1. classification_original.py: 4m41.438s + Memory Error
  2. classification_numpy_array.py:  1m5.858s + Memory Error
  3. classification_blocks.py: 1m11.053s
  4. classification_blocks_minmax.py: 0m50.891s
As you can see, not reading by blocks can make the script crash, so if you may have to handle big files, blocks are mandatory.
And with all the improvements, the script takes less than half the time the original needed to process a raster 75% smaller!

In the next post, I will explain how to color rasters using color tables and SciPy to generate PNGs the fast way.

Saturday, May 25, 2013

Drawing wind barbs: GDAL python

Wind on a map is usually represented using wind barbs, a graphical way to show both speed and direction. The problem is that most GIS software is not prepared to draw them directly.
In this entry, I'll show how I generate PNG files from the GFS meteorological model, the same one used in the entry showing how to do some raster calculations.

As usual, the code is available at GitHub.

First, the file that generates the png:
'''
Draws wind barbs on a png from a GRIB file using PIL
'''

import Image
import ImageDraw
from osgeo import gdal
from osgeo import osr
from osgeo.gdalconst import GA_ReadOnly
from math import floor
from math import ceil
from math import sqrt
from math import atan2
from math import degrees


def find_band_number(dataset, variable, level):
    '''
    Finds the band number inside the GRIB file, given the variable and the level names
    '''
    for i in range(1,dataset.RasterCount + 1):
        band = dataset.GetRasterBand(i)
        metadata = band.GetMetadata()
        band_level = metadata['GRIB_SHORT_NAME']
        band_variable = metadata['GRIB_ELEMENT']
        if (variable == band_variable) and (level == band_level):
            return i
    return None

def draw_barb(size, module, angle, color):
    extra_pixels = 10 #Avoid cutting the barb when rotating
    barb = Image.new('RGBA', (size + extra_pixels, size + extra_pixels))
    barb_draw = ImageDraw.Draw(barb)
    size = float(size) #Force it to float  
    separation = size/6
    module5 = int(round(module/5)) #Speed in 5 kt units

    x,v = divmod(module5,2) #v is 1 if a 5 kt half barb is needed
    l,x = divmod(x,5) #l: number of 50 kt pennants, x: number of 10 kt full barbs

    barb_draw.line([(extra_pixels,extra_pixels+size/2),(size+extra_pixels,extra_pixels+size/2)],color)
    pos = 0
    for nl in range(0,l,1): #50 kt triangles
        barb_draw.polygon([(extra_pixels+pos,extra_pixels+size/2),(extra_pixels+pos + extra_pixels+size/8,0),(extra_pixels+pos+size/4,extra_pixels+size/2),(extra_pixels+pos,extra_pixels+size/2)],color)

        pos = pos + size/4 + separation
    for nx in range(0,x,1):
        barb_draw.line([(extra_pixels+pos,extra_pixels+size/2),(extra_pixels+pos,extra_pixels+0)],color)

        pos = pos + separation
    if pos == 0: #advance a little for 5kt barbs
        pos = pos + separation
    if v == 1: # Only 0 or 1 are possible
        barb_draw.line([(extra_pixels+pos,extra_pixels+size/2),(extra_pixels+pos,extra_pixels+size/4)],color)

    barb = barb.rotate(angle, Image.BICUBIC)

    return barb

def draw_wind_barbs(xsize, ysize, out_file, data_file, epsg, geotransform, u_var, v_var, level, separation, barb_size, barb_color):
    out_img = Image.new('RGBA', (xsize, ysize) )

    dataset = gdal.Open(data_file, GA_ReadOnly )
    u_band_id = find_band_number(dataset, u_var, level)
    v_band_id = find_band_number(dataset, v_var, level)

    band_u = dataset.GetRasterBand(u_band_id)
    band_v = dataset.GetRasterBand(v_band_id)

    geotransform_in = dataset.GetGeoTransform()   

    xsize_in = band_u.XSize
    ysize_in = band_u.YSize

    values_u = band_u.ReadAsArray(0, 0, xsize_in, ysize_in)
    values_v = band_v.ReadAsArray(0, 0, xsize_in, ysize_in)
    

    proj_in = osr.SpatialReference()
    proj_in.SetFromUserInput(dataset.GetProjection())
    proj_out = osr.SpatialReference()
    proj_out.ImportFromEPSG(epsg)

    transf = osr.CoordinateTransformation(proj_out,proj_in)

    for px in range(0,xsize + separation,separation):
        x_out = geotransform[0] + px * geotransform[1]
        for py in range(0,ysize + separation,separation):
            y_out = geotransform[3] + py * geotransform[5]
            point = transf.TransformPoint(x_out, y_out,0)
            px_in = (float(point[0]) - geotransform_in[0]) / geotransform_in[1]
            py_in = (float(point[1]) - geotransform_in[3]) / geotransform_in[5]
           
            d1 = 1/sqrt( pow((px_in - floor(px_in)), 2) + pow((py_in - floor(py_in)), 2) )
            d2 = 1/sqrt( pow((ceil(px_in) - px_in), 2) + pow((py_in - floor(py_in)), 2) )
            d3 = 1/sqrt( pow((ceil(px_in) - px_in), 2) + pow((ceil(py_in) - py_in), 2) )
            d4 = 1/sqrt( pow((px_in - floor(px_in)), 2) + pow((ceil(py_in) - py_in), 2) ) 
            d_sum = d1 + d2 + d3 + d4   

            u = (d1*values_u[floor(py_in)][floor(px_in)] + d2*values_u[floor(py_in)][ceil(px_in)] + d3*values_u[ceil(py_in)][ceil(px_in)] + d4*values_u[ceil(py_in)][floor(px_in)]) / d_sum
            v = (d1*values_v[floor(py_in)][floor(px_in)] + d2*values_v[floor(py_in)][ceil(px_in)] + d3*values_v[ceil(py_in)][ceil(px_in)] + d4*values_v[ceil(py_in)][floor(px_in)]) / d_sum

            module = 1.943844494 * sqrt((u*u)+(v*v)) #in knots
           
            angle = degrees(atan2(v,u)) #Angle like in trigonometry, not wind direction where 0 is northern wind
            try:
                barb = draw_barb(barb_size, module, angle, barb_color)
                out_img.paste(barb, (px-(barb_size/2),py-(barb_size/2)), barb)
            except Exception, ex:
                raise Exception("Cannot draw the barb: " + str(ex))
   
    out_img.save( out_file )

    #Creating the pngw worldfile
    fp = open(out_file.replace("png","pngw"),"w")
    fp.write(str(geotransform[1])+"\n0\n0\n"+str(geotransform[5])+"\n"+str(geotransform[0])+"\n"+str(geotransform[3]))
    fp.close()

if __name__ == '__main__':
   
    draw_wind_barbs(600, 400, 'wind.png', 'gfs_3_20130524_0000_000.grb', 4326, [0, 0.1, 0, 70, 0, -0.1], 'UGRD', 'VGRD', '850-ISBL', 20, 20, '#000000')
  • The last part of the file (line 122) shows how to call the main function draw_wind_barbs
  • draw_wind_barbs takes the following arguments:
    • xsize, ysize: The output image size
    • out_file: the output png file. Another file with the pngw extension will be created, as the worldfile.
    • data_file: The input file from the model (actually, any GRIB file with the wind in U and V components)
    • epsg, geotransform: The output projection parameters.
    • u_var, v_var, level: The wind component variable names (south-north is v, east-west is u) as they appear in the GRIB_ELEMENT metadata in the GRIB file. Level as it appears in the GRIB_SHORT_NAME metadata.
    • separation, barb_size, barb_color: The barb separation and size, in pixels, in the output file (so it's possible to change the barb density), and the barb color
  • From lines 61 to 84, the function reads the input file features and creates the output image as requested in the parameters
    • The number of the band corresponding to the u and v components is calculated with the function find_band_number, that looks into the file metadata to match the asked values with some of the existing bands. Lines 17-28
  • From lines 86 to 110, the barbs are drawn. Every n pixels, as set by the separation parameter, a barb is calculated. To do so, the position in the input file must be found (lines 91 to 98) using the projection transformation
  • Then the wind value is interpolated from the four closest pixels in the original file, using the inverse distance weighting method (lines 94-102; see the sketch after this list)
  • Once the u and v components are known, the module and angle are calculated. Note that the original speeds are in m/s, but barbs use knots, so the conversion is done here. The angle is measured as in trigonometry (counter-clockwise from east), not as a wind direction where 0 means a northerly wind; this is fine for drawing, but a bit confusing
  • With the angle and module, the barb can be drawn, using the function draw_barb, and the result pasted into the output file.
  • The draw_barb function decomposes the module into 50 kt pennants, 10 kt full barbs and a 5 kt half barb (e.g. 65 kt gives one pennant, one full barb and one half barb), and draws them with the PIL polygon and line methods.
  • The file is written and the worldfile is created at lines 114-118
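To see the inverse distance interpolation in isolation, here is a minimal sketch of the same four-pixel logic as a standalone function. The function name and the eps guard are additions: eps avoids a division by zero when the point falls exactly on a pixel, a case the original code doesn't handle:

from math import floor, ceil, sqrt

def interpolate_idw4(values, px, py, eps=1e-9):
    '''Inverse distance interpolation from the four pixels around (px, py).
    values is a 2D array indexed as values[row][col].'''
    x0, x1 = int(floor(px)), int(ceil(px))
    y0, y1 = int(floor(py)), int(ceil(py))
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    #The weight of each corner is the inverse of its distance to the point
    weights = [1.0 / (sqrt((px - cx)**2 + (py - cy)**2) + eps)
               for cx, cy in corners]
    total = sum(weights)
    return sum(w * values[cy][cx] for w, (cx, cy) in zip(weights, corners)) / total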
To download the data files and draw the png directly, I wrote the following script:
from urllib2 import urlopen
from urllib2 import HTTPError
from sys import exit
from datetime import datetime
from barbs import draw_wind_barbs

def download_file(year,month,day,hour,valid_time):
    filename = "gfs_3_%d%02d%02d_%02d00_%03d.grb"%(year, month, day, hour, valid_time)
    url = "http://nomads.ncdc.noaa.gov/data/gfs-avn-hi/%d%02d/%d%02d%02d/gfs_3_%d%02d%02d_%02d00_%03d.grb"%(year, month, year, month, day, year, month, day, hour, valid_time)
    try: 
        req = urlopen(url)
    except HTTPError:
        print "File not found: May be the date is incorrect?"
        exit()

    f = open(filename, 'wb') #Binary mode, since GRIB is a binary format
    f.write(req.read())
    f.close()
    return filename

if __name__ == "__main__":
    now = datetime.now()
    year = raw_input('Type year ('+str(now.year)+'): ')
    month = raw_input('Type month ('+str(now.month)+'): ')
    day = raw_input('Type day ('+str(now.day)+'): ')
    hour = raw_input('Type hour (0): ')
    valid_time = raw_input('Type valid time (0): ')
    if (year == ''):
        year = now.year
    if (month == ''):
        month = now.month
    if (day == ''):
        day = now.day  
    if (hour == ''):
        hour = 0
    if (valid_time == ''):
        valid_time = 0     
    filename = download_file(int(year),int(month),int(day),int(hour),int(valid_time))
    out_filename = filename.replace("grb","png")

    draw_wind_barbs(600, 400, out_filename, filename, 4326, [0, 0.1, 0, 70, 0, -0.1], 'UGRD', 'VGRD', '850-ISBL', 20, 20, '#000000')
  • The main part asks the user for a date, defaulting to the current day at 00h.
  • After that, it calls the download_file function to fetch the GRIB file from the server.
  • Then it calls the draw_wind_barbs function to draw the map. Try changing the geotransform and image size to see how the output changes, as in the sketch below.
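For instance, here is a hypothetical call covering western Europe instead of the default window (the extent and output file name are made up for the example):

#900x500 px window over western Europe at 0.05 degrees per pixel:
#longitude from -15 to 30, latitude from 60 down to 35
draw_wind_barbs(900, 500, 'wind_europe.png', filename, 4326,
    [-15, 0.05, 0, 60, 0, -0.05],
    'UGRD', 'VGRD', '850-ISBL', 25, 20, '#000000')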

Finally, to draw the file with a map in the background, I opened the png with QGIS (possible since it has the worldfile) and added a Natural Earth background before exporting the image as a png.