
Wednesday, November 26, 2014

Basemap Tutorial

Basemap is a great tool for creating maps using Python in a simple way. It's a matplotlib (http://matplotlib.org/) extension, so it has all of matplotlib's features for creating data visualizations, and it adds geographical projections and some datasets so you can plot coastlines, countries, and so on directly from the library.

Basemap has some documentation (http://matplotlib.org/basemap/index.html), but some things are a bit difficult to find. I started a readthedocs page to extend the original documentation and examples a little, but it grew, and it now covers many of Basemap's possibilities.

Some of the examples from the tutorial

The tutorial can be found at http://basemaptutorial.readthedocs.org/, and all the examples and their source code are on GitHub, available for sharing or modification as long as attribution is added.
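
As a taste of what the library does, here is a minimal sketch (not taken from the tutorial itself) that draws a world map with coastlines and filled continents:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

#An orthographic projection centered over the Mediterranean
m = Basemap(projection='ortho', lat_0=40, lon_0=0, resolution='l')
m.drawcoastlines()
m.fillcontinents(color='coral', lake_color='aqua')
m.drawmapboundary(fill_color='aqua')
plt.show()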

The tutorial covers the library from the basic usage to many more advanced topics. I would really appreciate some feedback; the comments are open!


Saturday, October 11, 2014

Basemap raster clipping with a shapefile

Basemap is a great library for mapping faster than with other Python options, but there are some common tasks I couldn't find out how to do. Clipping a raster using a shape is one of them. Here's how I do it.
The output
As usual, all the code can be found at GitHub

Getting some data

The example plots some elevation data, taken from the SRTM. After looking for some options, the easiest to work with was this one: http://srtm.csi.cgiar.org/SELECTION/inputCoord.asp
The shapefile will be the border of Andorra, taken from Natural Earth
The result is a little poor because the resolution is low, but works well for the example.

The script

from mpl_toolkits.basemap import Basemap
from matplotlib.path import Path
from matplotlib.patches import PathPatch
import matplotlib.pyplot as plt
from osgeo import gdal
import numpy
import shapefile

fig = plt.figure()
ax = fig.add_subplot(111)

sf = shapefile.Reader("ne_10m_admin_0_countries")

for shape_rec in sf.shapeRecords():
    if shape_rec.record[3] == 'Andorra':
        vertices = []
        codes = []
        pts = shape_rec.shape.points
        prt = list(shape_rec.shape.parts) + [len(pts)]
        for i in range(len(prt) - 1):
            for j in range(prt[i], prt[i+1]):
                vertices.append((pts[j][0], pts[j][1]))
            codes += [Path.MOVETO]
            codes += [Path.LINETO] * (prt[i+1] - prt[i] -2)
            codes += [Path.CLOSEPOLY]
        clip = Path(vertices, codes)
        clip = PathPatch(clip, transform=ax.transData)


m = Basemap(llcrnrlon=1.4,
    llcrnrlat=42.4,
    urcrnrlon=1.77,
    urcrnrlat=42.7,
    resolution = None, 
    projection = 'cyl')

ds = gdal.Open('srtm_37_04.tif')
data = ds.ReadAsArray()

gt = ds.GetGeoTransform()
x = numpy.linspace(gt[0], gt[0] + gt[1] * data.shape[1], data.shape[1])
y = numpy.linspace(gt[3], gt[3] + gt[5] * data.shape[0], data.shape[0])

xx, yy = numpy.meshgrid(x, y)

cs = m.contourf(xx,yy,data,range(0, 3600, 200))

for contour in cs.collections:
    contour.set_clip_path(clip)

plt.show()
  • I used the pyshp library for reading the shapefile, since Fiona and GDAL don't work well together in the same script, and using OGR would have been longer
  • The first loop creates the clipping path. A matplotlib Path is made of two arrays: one with the points (called vertices in the script), and the other with the drawing instruction for every point (called codes)
    • In our case, only straight lines are needed, so there is a MOVETO to indicate the beginning of the polygon, many LINETO to create the segments, and one CLOSEPOLY to close it
    • Of course, only the polygon for Andorra has to be used. I get it from the shapefile attributes
    • The prt array is there to manage multipolygons; Andorra isn't one, but the code will create a correct clipping path for multipolygons too
    • The path is created using the Path function, and then wrapped in a PathPatch, to be able to use it as a closed polygon. Note the transform=ax.transData attribute. This assumes the polygon coordinates are the ones used in the data (longitudes and latitudes in our case). More information here
  • The next lines draw the map as usual. I have used the cyl projection, so all the values from the raster and the shapefile can be used directly. If the map were in another projection, the shapefile coordinates would have to be appended to the path in the map projection (m(pts[j][0], pts[j][1]))
  • The x and y coordinates are calculated from the GDAL geotransform, and then turned into matrices using meshgrid
  • The clipping itself happens in the final loop: the set_clip_path method is applied to each drawn element


Saturday, August 16, 2014

Shortest distance to a geometry in a specified direction using Python

Looking at this map, I wondered how to calculate which geometry in a set is the closest to a point in a given direction.
Usually, the problem is finding the closest geometry in general, which is easy using the distance function, but I couldn't find a solution for this variant.

So I set myself this problem: which is the closest country in each direction, given my geographical coordinates?
All the source code is, as usual, at GitHub

The algorithm

The main idea is:
  1. Create an infinite line from the point towards the desired direction.
  2. Calculate the difference geometry between the line and each polygon
    1. If the polygon and the line actually intersect, the result will be a multi-line. The length of the first line of the multi-line is the distance we are looking for
So this would be the initial situation:
 And the distance to the polygon 1 would be calculated as:
The main problem is how to calculate the difference between the two geometries, but fortunately, shapely comes with this function, so coding it is not so difficult:
from shapely.geometry import Polygon
from shapely.geometry import LineString
from math import cos
from math import sin
from math import pi

def closest_polygon(x, y, angle, polygons, dist = 10000):

    angle = angle * pi / 180.0
    line = LineString([(x, y), (x + dist * sin(angle), y + dist * cos(angle))])

    dist_min = None
    closest_polygon = None
    for i in range(len(polygons)):
        difference = line.difference(polygons[i])
        if difference.geom_type == 'MultiLineString':
            dist = list(difference.geoms)[0].length
            if dist_min is None or dist_min > dist:
                dist_min = dist
                closest_polygon = i

    return {'closest_polygon': closest_polygon, 'distance': dist_min}


if __name__ == '__main__':

    polygons = []
    polygons.append(Polygon([(4, 2), (4, 4), (6, 4), (6, 2)]))
    polygons.append(Polygon([(7, 2), (7, 4), (9, 4), (9, 2)]))
   
    
    print closest_polygon(3, 3, 90, polygons)
  • The main section creates the two squares using shapely
  • The closest_polygon function calculates the closest polygon and its distance:
    • A LineString towards the desired direction is created. dist is in the units used by the polygons. An infinite line isn't possible, so the distance must be larger than the distance to the furthest polygon
    • For each of the polygons to analyze, the difference is calculated using the shapely difference method
    • Then, if the line and the polygon intersect (and the line is long enough), the result of the difference operation is a MultiLineString. The first LineString in it is the one that connects our point with the polygon, and its length is the distance we are looking for (see the snippet below)
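
A toy check of that behaviour, with made-up coordinates:

from shapely.geometry import LineString, Polygon

square = Polygon([(4, 2), (4, 4), (6, 4), (6, 2)])
line = LineString([(3, 3), (20, 3)])  #long enough to cross the square

diff = line.difference(square)
print(diff.geom_type)              #MultiLineString
print(list(diff.geoms)[0].length)  #1.0, the gap between (3, 3) and the square
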
The example schema, drawn with the script draw_closest.py


Calculating the closest country in each direction

After getting the function that calculates the closest polygon, the next step would be using it for something. So:
Which country do I have in each direction?
To create the script, some things have to be considered:
  1. The projection should be azimuthal equidistant, so the distances can be compared in all directions from the given point
  2. I've used the Basemap library to draw the maps. I find it a bit tricky to use, but the code is shorter this way
The script is used this way:

usage: closest_country.py [-h] [-n num_angles] [-o out_file] [-wf zoom_factor]
                          lon lat

Creates a map with the closest country in each direction

positional arguments:
  lon              The point longitude
  lat              The point latitude

optional arguments:
  -h, --help       show this help message and exit
  -n num_angles    Number of angles
  -o out_file      Out file. If present, saves the file instead of showing it
  -wf zoom_factor  The width factor. Use it to zoom in and out. Use > 1 to
                   draw a bigger area, and <1 for a smaller one. By default is
                   1


For example:
python closest_country.py -n 100 -wf 2.0 5 41
The code has some functions, but the main one is draw_map:
def draw_map(self, num_angles = 360, width_factor = 1.0):

        #Create the map, with no countries
        self.map = Basemap(projection='aeqd',
                    lat_0=self.center_lat,lon_0=self.center_lon,resolution =None)
        #Iterate over all the angles:
        self.read_shape()
        results = {}
        distances = []
        for num in range(num_angles):
            angle = num * 360./num_angles
            closest, dist = self.closest_polygon(angle)
            if closest is not None:
                distances.append(dist)
                if self.names[closest] not in results:
                    results[self.names[closest]] = []

                results[self.names[closest]].append(angle)
        
        #The map zoom is calculated here, 
        #taking the 90% of the distances to be drawn by default       
        width = width_factor * sorted(distances)[
                int(-1 * round(len(distances)/10.))]

        #Create the figure so a legend can be added
        plt.close()
        fig = plt.figure()
        ax = fig.add_subplot(111)
        cmap = plt.get_cmap('Paired')
        
       
        self.map = Basemap(projection='aeqd', width=width, height=width,
                    lat_0=self.center_lat,lon_0=self.center_lon,resolution =None)
        self.read_shape()
        
        #Fill background.
        self.map.drawmapboundary(fill_color='aqua')

        #Draw parallels and meridians to give some references
        self.map.drawparallels(range(-80, 100, 20))
        self.map.drawmeridians(range(-180, 200, 20))

           
        #Draw a black dot at the center.
        xpt, ypt = self.map(self.center_lon, self.center_lat)
        self.map.plot([xpt],[ypt],'ko')
    
        #Draw the sectors
        for i in range(len(results.keys())):
            for angle in results[results.keys()[i]]:
                anglerad = float(angle) * pi / 180.0
                anglerad2 = float(angle + 360./num_angles) * pi / 180.0
                polygon = Polygon([(xpt, ypt), (xpt + width * sin(anglerad), ypt + width * cos(anglerad)), (xpt + width * sin(anglerad2), ypt + width * cos(anglerad2))])
                patch2b = PolygonPatch(polygon, fc=cmap(float(i)/(len(results) - 1)), ec=cmap(float(i)/(len(results) - 1)), alpha=1., zorder=1)
                ax.add_patch(patch2b)
        

        #Draw the countries
        for polygon in self.polygons:
            patch2b = PolygonPatch(polygon, fc='#555555', ec='#787878', alpha=1., zorder=2)
            ax.add_patch(patch2b)

        #Draw the legend
        cmap = self.cmap_discretize(cmap, len(results.keys()))
        mappable = cm.ScalarMappable(cmap=cmap)
        mappable.set_array([])
        mappable.set_clim(0, len(results))
        colorbar = plt.colorbar(mappable, ticks= [x + 0.5 for x in range(len(results.keys()))])
        colorbar.ax.set_yticklabels(results.keys())

        plt.title('Closest country')

  • The first steps calculate the closest country in each direction, storing the result in a dict. The distance is calculated using the closest_polygon method, explained in the previous section.
  • The actual map size is then calculated, taking the distance within which 90% of the calculated distances fit. The width_factor can change this, because sometimes the result isn't pretty enough: sometimes there is too much zoom and sometimes too little. Note that the aeqd (Azimuthal Equidistant) projection is used, since it is the one that makes the distances in all directions comparable.
  • The next steps actually draw the map:
    • The sectors (the colors indicating the closest country) are drawn using the descartes library and its PolygonPatch
    • The legend needs the color map changed to a discrete color map. I used a function called cmap_discretize, found here, to do it
    • The legend is created using the examples found in this cookbook
Some outputs:



Next steps: What's across the ocean

Well, my original idea was creating a map like this one, showing the closest country when you are at the beach. Given a point and a direction (east or west in the example), calculating the country is easy, and doing it for each point on the coast is easy too. The problem is that doing it automatically is far more difficult, since you have to know the best direction (not easy in many places, like islands), which countries to take as the origin, etc.
Another good thing would be doing the same, but with d3js, since the point position could then become interactive. I found some libraries like shapely.js or jsts, but I think they still don't implement the difference operation that we need.


Monday, March 31, 2014

Slides for the workshop "Introduction to Python for geospatial uses"

On the 26th, 27th and 28th of March, the 8as Jornadas SIG Libre were held in Girona, where I had the opportunity to give a workshop about Python for geospatial uses.



The slides in Spanish:
http://rveciana.github.io/introduccion-python-geoespacial

The slides in English:
http://rveciana.github.io/introduccion-python-geoespacial/index_en.html

The example files in both languages:
https://github.com/rveciana/introduccion-python-geoespacial

The meeting was awesome. If you have the opportunity and understand Spanish, come next year!

Monday, March 24, 2014

Shaded relief images using GDAL python

After showing how to colour a DEM file, classifying it, and calculating its isobands, this post shows how to create a shaded relief image from it.
The resulting image
A shaded relief image simulates the shadow thrown upon a relief map. This shadow is usually blended with some colouring, related to the altitude, a terrain classification, etc.
The shadow is usually drawn considering that the sun is at 315 degrees of azimuth and 45 degrees over the horizon, which never actually happens in the northern hemisphere. These values avoid strange perceptions, such as seeing the mountain tops as the bottom of a valley.

In this example, the script calculates the hillshade image, a coloured image, and blends them into the shaded relief image.

As usual, all the code, plus the sample DEM file, can be found at GitHub.

The hillshade image

I didn't know how to create a shaded relief image using numpy. Eric Gayer helped me with some samples, and I found some other information here.
The script is:
"""
Creates a shaded relief file from a DEM.
"""

from osgeo import gdal
from numpy import gradient
from numpy import pi
from numpy import arctan
from numpy import arctan2
from numpy import sin
from numpy import cos
from numpy import sqrt
from numpy import zeros
from numpy import uint8
import matplotlib.pyplot as plt

def hillshade(array, azimuth, angle_altitude):
        
    x, y = gradient(array)
    slope = pi/2. - arctan(sqrt(x*x + y*y))
    aspect = arctan2(-x, y)
    azimuthrad = azimuth*pi / 180.
    altituderad = angle_altitude*pi / 180.
     
 
    shaded = sin(altituderad) * sin(slope)\
     + cos(altituderad) * cos(slope)\
     * cos(azimuthrad - aspect)
    return 255*(shaded + 1)/2

ds = gdal.Open('w001001.tiff')  
band = ds.GetRasterBand(1)  
arr = band.ReadAsArray()

hs_array = hillshade(arr,315, 45)
plt.imshow(hs_array,cmap='Greys')
plt.show()

  • The script draws the image using matplotlib, to keep it simple
  • The hillshade function starts by calculating the gradient in the x and y directions using the numpy.gradient function. The result is two matrices of the same size as the original, one for each direction.
  • From the gradient, the aspect and slope can be calculated. The aspect gives the mountain orientation, which will be illuminated depending on the azimuth angle. The slope changes the illumination depending on the altitude angle.
  • Finally, the hillshade is calculated.

shaded_relief.py

The shaded relief image is calculated using the algorithm explained in the post Colorize PNG from a raster file, plus the hillshade.
As in the colouring post, the image is read in blocks to improve the performance, because the process uses a lot of arrays, and doing it all at once with a big image can take a lot of resources.
I will comment on the code block by block, to make it easier. The full code is here.

The main function, called shaded_relief, is the most important, and calls the different algorithms:
def shaded_relief(in_file, raster_band, color_file, out_file_name,
    azimuth=315, angle_altitude=45):
    '''
    The main function. Reads the input image block by block to improve the performance, and calculates the shaded relief image
    '''

    if exists(in_file) is False:
            raise Exception('[Errno 2] No such file or directory: \'' + in_file + '\'')    
    
    dataset = gdal.Open(in_file, GA_ReadOnly )
    if dataset == None:
        raise Exception("Unable to read the data file")
    
    band = dataset.GetRasterBand(raster_band)

    block_sizes = band.GetBlockSize()
    x_block_size = block_sizes[0]
    y_block_size = block_sizes[1]

    #If the block y size is 1, as in a GeoTIFF image, the gradient can't be calculated,
    #so more than one block is used. In this case, using 8 lines gives a similar
    #result to taking the whole array.
    if y_block_size < 8:
        y_block_size = 8

    xsize = band.XSize
    ysize = band.YSize

    max_value = band.GetMaximum()
    min_value = band.GetMinimum()

    #Reading the color table
    color_table = readColorTable(color_file)
    #Adding an extra value to avoid problems with the last & first entry
    if sorted(color_table.keys())[0] > min_value:
        color_table[min_value - 1] = color_table[sorted(color_table.keys())[0]]

    if sorted(color_table.keys())[-1] < max_value:
        color_table[max_value + 1] = color_table[sorted(color_table.keys())[-1]]
    #Preparing the color table
    classification_values = color_table.keys()
    classification_values.sort()

    max_value = band.GetMaximum()
    min_value = band.GetMinimum()

    if max_value == None or min_value == None:
        stats = band.GetStatistics(0, 1)
        max_value = stats[1]
        min_value = stats[0]

    out_array = zeros((3, ysize, xsize), 'uint8')

    #The iteration over the blocks starts here
    for i in range(0, ysize, y_block_size):
        if i + y_block_size < ysize:
            rows = y_block_size
        else:
            rows = ysize - i
        
        for j in range(0, xsize, x_block_size):
            if j + x_block_size < xsize:
                cols = x_block_size
            else:
                cols = xsize - j

            dem_array = band.ReadAsArray(j, i, cols, rows)
            
            hs_array = hillshade(dem_array, azimuth, 
                angle_altitude)

            rgb_array = values2rgba(dem_array, color_table, 
                classification_values, max_value, min_value)

            hsv_array = rgb_to_hsv(rgb_array[:, :, 0], 
                rgb_array[:, :, 1], rgb_array[:, :, 2]) 

            hsv_adjusted = asarray( [hsv_array[0], 
                hsv_array[1], hs_array] )          

            shaded_array = hsv_to_rgb( hsv_adjusted )
            
            out_array[:,i:i+rows,j:j+cols] = shaded_array
    
    #Saving the image using the PIL library
    im = fromarray(transpose(out_array, (1,2,0)), mode='RGB')
    im.save(out_file_name)
  • After opening the file comes the first interesting point. If the image is read block by block, sometimes the blocks will have only one line, as in GeoTIFF images. In that situation the y gradient can't be calculated, so the hillshade function would fail. I've seen that taking only two lines gives coarse results, while with eight lines the result is more or less the same as taking the whole array. The performance won't be as good as using a single block, but it is still fast.
  • The next block reads the color table and the file maximum and minimum. This has to be outside the values2rgba function, since it is needed only once.
  • The two nested loops control the block reading. For each iteration, a small array is read; this is what gets processed. The result is written into the output array defined before the loops, which has the final size.
  • Now the calculations start:
    • The hillshade is calculated first
    • Then the color array is calculated
    • The color array is changed from rgb values to hsv
    • The value (the v in hsv) is replaced with the hillshade value. This blends both images. I took the idea from this post.
    • Then the image is transformed back to rgb and written into the output array
  • Finally, the array is transformed into a png image using the PIL library. The numpy.transpose function is used to re-order the matrix, since the original values have the shape (3, height, width), and the Image.fromarray function needs (height, width, 3); a toy shape check follows this list. Another way to do this is using scipy.misc.imsave (which would need scipy installed just for that), or the Image.merge function.
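
A quick sketch of that reshaping, with made-up sizes:

import numpy

out_array = numpy.zeros((3, 100, 200), dtype='uint8')  #(bands, height, width)
print(numpy.transpose(out_array, (1, 2, 0)).shape)     #(100, 200, 3)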

The colouring function is taken from the post Colorize PNG from a raster file, but modified so the colors are only continuous, since the discrete option doesn't give nice results in this case:
def values2rgba(array, color_table, classification_values, max_value, min_value):
    '''
    This function calculates the color of an array given a color table.
    The color is interpolated from the color table values.
    '''
    rgba = zeros((array.shape[0], array.shape[1], 4), dtype = uint8)

    for k in range(len(classification_values) - 1):
        if classification_values[k] < max_value and (classification_values[k + 1] > min_value ):
            mask = logical_and(array >= classification_values[k], array < classification_values[k + 1])

            v0 = float(classification_values[k])
            v1 = float(classification_values[k + 1])

            rgba[:,:,0] = rgba[:,:,0] + mask * (color_table[classification_values[k]][0] + (array - v0)*(color_table[classification_values[k + 1]][0] - color_table[classification_values[k]][0])/(v1-v0) )
            rgba[:,:,1] = rgba[:,:,1] + mask * (color_table[classification_values[k]][1] + (array - v0)*(color_table[classification_values[k + 1]][1] - color_table[classification_values[k]][1])/(v1-v0) )
            rgba[:,:,2] = rgba[:,:,2] + mask * (color_table[classification_values[k]][2] + (array - v0)*(color_table[classification_values[k + 1]][2] - color_table[classification_values[k]][2])/(v1-v0) )
            rgba[:,:,3] = rgba[:,:,3] + mask * (color_table[classification_values[k]][3] + (array - v0)*(color_table[classification_values[k + 1]][3] - color_table[classification_values[k]][3])/(v1-v0) )
    return rgba
   
The hillshade function is the same one explained in the first section.
The functions rgb_to_hsv and hsv_to_rgb are taken from this post, and change the image mode from rgb to hsv and from hsv back to rgb.
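
As an aside, matplotlib itself ships an equivalent pair in matplotlib.colors that could replace those helpers; a minimal sketch, assuming the image is a (height, width, 3) float array with values in [0, 1]:

import numpy
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

rgb = numpy.random.random((4, 4, 3))  #a made-up tiny image
hsv = rgb_to_hsv(rgb)
hsv[..., 2] = 0.5                     #replace the value channel, e.g. with a hillshade
blended = hsv_to_rgb(hsv)
print(blended.shape)                  #(4, 4, 3)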


Tuesday, February 25, 2014

3D terrain visualization with python and Mayavi2

I have always wanted to draw 3D terrains like those at www.shadedrelief.com, which are amazing. But the examples all use software I don't work with, so I tried to do it with Python.
The final result


As usual, you can get all the source code and data at my GitHub page.

 Getting the data

After trying different locations, I decided to use the mountain of Montserrat, close to Barcelona, since it has nice stone towers that are a good test for the DEM and the 3D visualization. An actual picture of the zone used is this one:
Montserrat monastery
The building is a good reference, since in the stone-only areas, checking the result would be much more difficult.
All the data has been downloaded from the ICGC servers:
  • The DEM data was downloaded from the Vissir3 service, going to the section catàleg i descàrregues (catalog and downloads) -> MDE 5x5. The file is named met5v10as0f0392Amr1r020.txt, but I cut out a small part of it, to make mayavi2 work more smoothly, using:

    gdalwarp -te 401620 4604246 403462 4605867 -s_srs EPSG:25831 -t_srs EPSG:25831 met5v10as0f0392Amr1r020.txt dem.tiff
  • The picture to drape over the DEM file can be downloaded using the WMS service provided by the ICGC:

    http://geoserveis.icc.cat/icc_mapesbase/wms/service?REQUEST=GetMap&VERSION=1.1.0&SERVICE=WMS&SRS=EPSG:25831&BBOX=401620.0,4604246.0,403462.0,4605867.0&WIDTH=1403&HEIGHT=1146&LAYERS=orto5m&STYLES=&FORMAT=JPEG&BGCOLOR=0xFFFFFF&TRANSPARENT=TRUE&EXCEPTION=INIMAGE
It's not as automatic as I would like, but as long as it's possible to download a DEM and the corresponding image, it's possible to create the 3D image.

Creating the image

First, let's plot the DEM file in 3D using mayavi2:
"""
Plotting the terrain DEM with Mayavi2
"""

from osgeo import gdal
from mayavi import mlab

ds = gdal.Open('dem.tiff')
data = ds.ReadAsArray()

mlab.figure(size=(640, 800), bgcolor=(0.16, 0.28, 0.46))

mlab.surf(data, warp_scale=0.2) 
mlab.show()
  • First, we import gdal as usual, plus the mlab library from mayavi, which lets us set up the mayavi canvas.
  • The data is read, as usual, with the gdal ReadAsArray method.
  • The figure is created. This is like creating the Image object in the PIL library: it creates the canvas where the data will be drawn. In this case, the size is 640 x 800 pixels; making the figure bigger can affect the performance on some old computers. bgcolor sets the blue color as the background.
  • The surf method plots the surface. The input has to be a 2D numpy array, which is what we have.
    • The warp_scale argument sets the vertical scale. In this case, leaving the default value (1?) creates a really exaggerated effect, so it's better to play with it a little to get a more realistic effect.
    • The colors depend on the Z value at each point, and can be changed using the color or colormap option.
  • The show() method makes the image stay visible when running the example from a script. If you use ipython, you don't need this step.
  • If you want to save the figure as a png, you can either use the icon in the mayavi window or call the method mlab.savefig('image_name')
  • If you want to move the camera (change the perspective), you can use the roll/yaw/pitch methods:
    f = mlab.gcf()
    camera = f.scene.camera
    camera.yaw(45)

The plotted DEM
Now, let's put an aerial image over the 3D visualization:
"""
Draping an image over a terrain surface
"""
from osgeo import gdal
from tvtk.api import tvtk
from mayavi import mlab
import Image

ds = gdal.Open('dem.tiff')
data = ds.ReadAsArray()
im1 = Image.open("ortofoto.jpg")
im2 = im1.rotate(90)
im2.save("/tmp/ortofoto90.jpg")
bmp1 = tvtk.JPEGReader()
bmp1.file_name="/tmp/ortofoto90.jpg" #any jpeg file

my_texture=tvtk.Texture()
my_texture.interpolate=0
my_texture.set_input(0,bmp1.get_output())


mlab.figure(size=(640, 800), bgcolor=(0.16, 0.28, 0.46))

surf = mlab.surf(data, color=(1,1,1), warp_scale=0.2) 
surf.actor.enable_texture = True
surf.actor.tcoord_generator_mode = 'plane'
surf.actor.actor.texture = my_texture

mlab.show()

  • The most important new import is tvtk. TVTK is a python api that allows working with VTK objects. Actually, my knowledge of Mayavi2 is very limited, but I see TVTK as an extension of it.
  • The DEM data is read the same way, using the ReadAsArray method.
  • The aerial image, named ortofoto.jpg, is not in the correct orientation. It took me a long time to figure out what was happening. I rotate it by opening the image with the PIL library and using the rotate method
  • Then, the tvtk object with the texture is created, loading the image with a JPEGReader object and assigning it to the Texture object
  • The figure and the 3D surface are created as in the other example
  • Then, the surface is modified to show the image over it
The final result

The result is correct, but the aerial image is slightly offset from where it should be, and since the terrain is really steep, part of the buildings are drawn on the stone walls! I edited the WMS coordinates a little so the result is slightly better. Anyway, the method is correct.

Links

  • GitHub: The source code and example data
  • Mayavi2: 3D scientific data visualization and plotting 
  • This entry to a mailing list gave me the tip to create the example
  • If you want to do more or less the same, but using JavaScript, Bjørn Sandvik posted this excellent example.
  • ShadedRelief: Cool 3D and shaded relief examples.

Saturday, October 5, 2013

Castor project's earthquakes map with D3js

The Castor project is a submarine natural gas storage facility located off the eastern coast of the Iberian Peninsula. The idea is to store Algerian gas inside an old oilfield cavity. At least this is what I understood (sorry, geologists).
Somehow, when the facility started working, a series of earthquakes started to occur. At the beginning, the platform owners said it wasn't related to their activity, but now everybody agrees that it is, and the activity has stopped, but not the earthquakes.

I didn't find a nice map about the earthquakes epicenters, so I thought that D3js would be a good option to do it.


The star represents the approximate platform location, and the circles the epicenters. It's easy to see why they are related to the platform activity.



The animated map is at my bl.ocks.org page. The explanations are in Catalan, but basically say the same as here.

Getting the data

The data about the significant earthquakes around Catalonia can be found at the Catalan Geologic Institute web site, but the format of the reports is not very convenient for extracting the data, so I made this python script to do it:

# -*- coding: utf-8 -*-
#http://jramosgarcia.wordpress.com/2013/10/01/que-es-el-proyecto-castor/
import urllib
import json
import dateutil.parser
import datetime

def get_data(url):
    filehandle = urllib.urlopen(url)
    html = filehandle.read()
    filehandle.close()

    lines = html.splitlines()

    for i in range(len(lines)):
    
        if lines[i].find('Latitud') > 0:
            lat = lines[i].strip().split(" ")[1].replace("º","")
        if lines[i].find('Longitud') > 0:
            lon = lines[i].strip().split(" ")[1].replace("º","")   
        if lines[i].find('mol del dia') > 0:
            date = lines[i + 1].strip().replace(" </div>","")
        if lines[i].find('Hora origen') > 0:
            hour = lines[i].strip().split(" ")[4]                   
        if lines[i].find('Magnitud') > 0:
            magnitude = lines[i+1].strip().split(" ")[0]

    date_array  = date.split("/")
    hour_array = hour.split(":")

    date_time = datetime.datetime(int(date_array[2]),int(date_array[1]),int(date_array[0]),int(hour_array[0]), int(hour_array[1]))

    data = {'lat':lat, 'lon':lon, 'datetime': date_time.isoformat(), 'magnitude': magnitude}
    return json.dumps(data)

if __name__ == "__main__":
    url_list = [
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130910175510/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130913095842/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130918142943/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130920104607/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130924091301/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130925125029/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130929084140/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130929192416/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130930005900/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130930051316/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131001045206/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131001055709/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131002121626/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131002232928/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131003012732/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131003031301/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131004113222/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131004120323/comact.html'
    ]

    f = open("data.json","w")
    f.write("[")
    json_data = ""
    for url in url_list:
        json_data = json_data + get_data( url ) + ", "

    f.write(json_data[0:-2])
    f.write("]")
    f.close()
   


  • The report web pages are in a hard-coded list, since the interesting reports have to be chosen one by one. It would be nice to have a better way to do it.
  • Then, the get_data function just parses the text in a way that handles all the reports properly. The data is stored in a json file to make its use from D3js easier.

Using D3js to visualize the data

I used this example by Mike Bostock to create the background map. The tiles origin has been changed, because the example tiles are not available at this zoom level, and to have more points of interest to situate the earthquake locations.

This is the code:





<script>
/**
Based on Mike Bostock's http://bl.ocks.org/mbostock/4150951
*/
var width = 960,
    height = 500;

var projection = d3.geo.mercator()
    .center([0.5491, 40.4942])
    .scale(20000);

var path = d3.geo.path()
    .projection(projection);

var tile = d3.geo.tile()
    .scale(projection.scale() * 2 * Math.PI)
    .translate(projection([0, 0]))
    .zoomDelta((window.devicePixelRatio || 1) - .5);

var svg = d3.select("body").append("svg")
    .attr("width", width)
    .attr("height", height);

  var tiles = tile();

  var defs = svg.append("defs");

var magnitudeScale = d3.scale.linear().domain([2,5]).range([5,30]);
d3.json("data.json", function(error, locations) {

  svg.append("g")
      
    .selectAll("image")
      .data(tiles)
    .enter().append("image")
      .attr("xlink:href", function(d) { return "http://" + ["a", "b", "c", "d"][Math.random() * 4 | 0] + ".tiles.mapbox.com/v3/examples.map-vyofok3q/" + d[2] + "/" + d[0] + "/" + d[1] + ".png"; })
      .attr("width", Math.round(tiles.scale) + 1)
      .attr("height", Math.round(tiles.scale) + 1)
      .attr("x", function(d) { return Math.round((d[0] + tiles.translate[0]) * tiles.scale); })
      .attr("y", function(d) { return Math.round((d[1] + tiles.translate[1]) * tiles.scale); });

  svg.append("g")
      .append('path')
      .attr("d","m 0,0 -8.47858,-5.22254 -8.31623,5.47756 2.34696,-9.67752 -7.77927,-6.21653 9.92909,-0.75852 3.50829,-9.31953 3.78972,9.20873 9.94748,0.45679 -7.58687,6.44982 z")
      .attr("stroke","black")
      .attr("stroke-width",2)
      .style("fill", d3.rgb(90, 90, 90))
      .attr("transform", "translate("+projection([0.66879, 40.33503])[0]+","+projection([0.66879, 40.33503])[1]+")");
  
  var locationsG = svg.append("g");
  

  function addLocation(loc){
    locationsG.append('circle')
      .attr('class','location')
      .attr("r", 5)
      .attr("cx", projection([loc.lon, loc.lat])[0])
      .attr("cy", projection([loc.lon, loc.lat])[1])
      .style("fill", d3.rgb(255, 0, 0).darker(2))
      .style("opacity", 0.8)
      .transition()
      .duration(1000)
      .attr("r", magnitudeScale(loc.magnitude))
      .transition()
      .delay(2000)
      .duration(2000)
      .style("opacity",0.3);

    locationsG
      .append("text")
      .text(loc.magnitude)
      .attr("x", projection([loc.lon, loc.lat])[0] - 10)
      .attr("y", projection([loc.lon, loc.lat])[1] + 5)
      .attr("font-family", "sans-serif")
      .attr("font-size", "12px")
      .attr("fill", "black")
      .style("opacity",0)
      .transition()
      .duration(1000)
      .style("opacity",1)
      .transition()
      .delay(2000)
      .duration(2000)
      .style("opacity",0);
  }

  //addLocation({"lat": "40.43", "magnitude": "2.7", "lon": "0.7", "datetime": "2013-10-09T06:43:16"});
  
  var intDate = new Date("2013-09-10T00:00:00Z");
  var maxDate = new Date("2013-10-04T00:00:00Z");
  var usedLocations = new Array();

  var dateTitle = svg
    .append("text")
    .attr("id", "dataTitle")
    .text(intDate.toLocaleDateString())
    .attr("x", 70)
    .attr("y", 20)
    .attr("font-family", "sans-serif")
    .attr("font-size", "20px")
    .attr("fill", "black");

  var interval = setInterval(function() {

    dateTitle.text(intDate.toLocaleDateString());

    for (i = 0; i < locations.length; i++){
      var locDate = new Date(locations[i].datetime);

      if (locDate < intDate && usedLocations.indexOf(i) < 0){
        addLocation(locations[i]);
        usedLocations.push(i);
      }
      if (intDate > maxDate){
        clearInterval(interval);
      }
    }
    
    intDate.setDate(intDate.getDate() + 1);
  }, 1000);
  
});



  • The tile definition and the block that appends the tile images are the most important parts for creating the tile map. In this case, no zoom or pan is set, so the code is quite simple.
  • The path element with the long d attribute creates the star indicating the platform location. I made the star using Inkscape and captured the path data by looking at the generated svg. To situate the symbol on the map, a transform attribute is used, translating the symbol to the pixel obtained by projecting the location coordinates.
  • To add the earthquake epicenters:
    • The addLocation function adds the circle and the text indicating the earthquake magnitude. To place them, the projection function is used again. Two transitions make the circle grow and then become more transparent; the same is done to make the text disappear.
    • An interval is created to change the date (one day per second). On every iteration, the date is evaluated and the earthquakes that occurred up to that day are plotted using the addLocation function. I didn't find a "more D3js" solution. The earthquakes already plotted are stored in an array to avoid plotting them more than once.
      The date shown on the map is also updated on every iteration.


Thursday, September 5, 2013

Reading WRF NetCDF files with GDAL python

Since I work at a meteorological service, I have to deal quite often with numerical weather prediction models. I usually work with GRIB files that have the data in the format I need to draw simple maps.
But sometimes I need to calculate things that require the original data, since GRIB files are usually reprojected and simplified in vertical levels.
In the case of the WRF model, the output is in NetCDF format. The data is in sigma levels instead of the pressure levels I need, and variables such as temperature don't come directly in the file, but have to be calculated.
The post is actually written to organize the things I've been using myself, but I've added it to the blog because I haven't found step by step instructions elsewhere.

Getting the data

As usual, you can download the script from GitHub
I have used some data from the model run at the SMC, where I work, but you can get a WRF output from the ucar.edu web site. The projection is different from the one I have used, and it has 300 sigma levels instead of 30, but it should work.

The NetCDF format


Every NetCDF file contains a collection of datasets with different sizes and types. Each dataset can have one or more layers, can contain unidimensional arrays, etc. So the "traditional" GDAL data model can't work directly, since it assumes that a file has multiple layers of the same size. That's why "subdatasets" are introduced. Opening the file directly (or running gdalinfo on it) only gives some metadata about all the datasets; no actual data can be read. To do that, a subdataset must be opened instead.

Opening a subdataset

Since opening the file directly doesn't open the actual dataset, but the "container", a different notation is used while keeping the Open method: the filename string must be preceded by NETCDF: and followed by :VAR_NAME
To open the XLONG variable subdataset, the call would be:
gdal.Open('NETCDF:"'+datafile+'":XLONG')
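
To list which subdatasets a file actually contains, the GetSubDatasets method can be used. A small sketch (the file name is made up):

from osgeo import gdal

ds = gdal.Open('wrfout.nc')  #hypothetical WRF output file
for name, description in ds.GetSubDatasets():
    print(name + ' -> ' + description)

#Any of the printed names can be passed back to gdal.Open:
ds_xlong = gdal.Open('NETCDF:"wrfout.nc":XLONG')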

ARWPost

ARWPost is a FORTRAN program that reads WRF outputs. I have used its code to learn how to calculate the actual model variables from the model NetCDF file.
There is a file for calculating each variable, some for the interpolation, etc., so it has helped me a lot when trying to understand how everything works.
The problem with ARWPost is that the usage is quite complicated (changing a configuration file and executing it), so it was not very practical for me.

Sigma Levels

Instead of using constant pressure levels, the WRF model works using the sigma coordinate system.
This makes extracting the data at a constant pressure a bit more difficult, since for every point in the horizontal grid, a different level position is needed to get the value.

Besides, not all the variables use the same sigma levels. There are two different level sets known as "half levels" and "full levels", as shown in the picture:
Image adapted from the Wikipedia Sigma Levels entry

Of course, this makes things a bit more complicated.

A layer's sigma value at a half level is the average of the full levels above and below it.
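
In numpy terms, the relationship is just this (the sigma values are made up):

import numpy

full_levels = numpy.array([1.0, 0.99, 0.97, 0.94])       #hypothetical full-level sigmas
half_levels = (full_levels[:-1] + full_levels[1:]) / 2.0 #[0.995, 0.98, 0.955]
print(half_levels)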

Getting the geopotential height at a pressure level, step by step

Setting the right projection

NetCDF files can hold the following projections: 'lambert', 'polar', 'mercator', 'lat-lon' and 'rotated_ll'. The projection is defined by the MAP_PROJ label in the original file's metadata (not the subdatasets' metadata). To get the whole file metadata, just use:
ds_in = gdal.Open(datafile)
metadata = ds_in.GetMetadata()

All the metadata comes with the prefix NC_GLOBAL#
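
So, for instance, the projection code can be read like this (a sketch that follows the GetMetadata call above):

#keys look like 'NC_GLOBAL#MAP_PROJ', 'NC_GLOBAL#STAND_LON', 'NC_GLOBAL#POLE_LAT', ...
print(metadata['NC_GLOBAL#MAP_PROJ'])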

In my case, I have MAP_PROJ=2, so the projection is polar stereographic (the code is the position in the projection list, starting at 1). I'll set it with:

proj_out = osr.SpatialReference()
proj_out.SetPS(90, -1.5, 1, 0, 0)

where -1.5 is the STAND_LON label at the original file metadata and 90 is the POLE_LAT label.

Finding the geotransform

I'm not sure if the method I use is the best one, but the data I get is consistent.
The metadata given in the main file doesn't set the geotransform properly (or, at least, I'm not able to get it to), since it sets only the center point coordinates and the deltax and deltay. But these deltas are not referenced to the same point, so getting the real geotransform is not easy.

Fortunately, the files come with the fields XLONG and XLAT, which contain the longitude and latitude of each point in the matrix. So by taking the corners it is possible to get the geotransform, although the coordinates have to be reprojected first:
ds_lon = gdal.Open('NETCDF:"'+datafile+'":XLONG')
ds_lat = gdal.Open('NETCDF:"'+datafile+'":XLAT')

longs = ds_lon.GetRasterBand(1).ReadAsArray()
lats = ds_lat.GetRasterBand(1).ReadAsArray()
ds_lon = None
ds_lat = None
proj_gcp = osr.SpatialReference()
proj_gcp.ImportFromEPSG(4326)
transf = osr.CoordinateTransformation(proj_gcp, proj_out)
ul = transf.TransformPoint(float(longs[0][0]), float(lats[0][0]))
lr = transf.TransformPoint(float(longs[len(longs)-1][len(longs[0])-1]), float(lats[len(longs)-1][len(longs[0])-1]))
ur = transf.TransformPoint(float(longs[0][len(longs[0])-1]), float(lats[0][len(longs[0])-1]))
ll = transf.TransformPoint(float(longs[len(longs)-1][0]), float(lats[len(longs)-1][0]))
#Upper-left corner as the origin, then pixel sizes and rotation
#terms derived from the other projected corners
gt0 = ul[0]
gt1 = (ur[0] - gt0) / len(longs[0])
gt2 = (lr[0] - gt0 - gt1*len(longs[0])) / len(longs)
gt3 = ul[1]
gt5 = (ll[1] - gt3) / len(longs)
gt4 = (lr[1] - gt3 - gt5*len(longs) ) / len(longs[0])
geotransform = (gt0,gt1,gt2,gt3,gt4,gt5)

Getting the geopotential height

From the ARWPost code, I know that the geopotential height is:

H = (PH + PHB)/9.81

So the PH and PHB subdatasets must be added together and divided by the gravity to get the actual geopotential height.

The code:
ds_ph = gdal.Open('NETCDF:"'+datafile+'":PH')
ds_phb = gdal.Open('NETCDF:"'+datafile+'":PHB')
num_bands = ds_ph.RasterCount
data_hgp = numpy.zeros(((num_bands, ds_ph.RasterYSize, ds_ph.RasterXSize)))
for i in range(num_bands):
    data_hgp[i,:,:] = ( ds_ph.GetRasterBand(num_bands - i).ReadAsArray() + ds_phb.GetRasterBand(num_bands - i).ReadAsArray() ) / 9.81

#Release the datasets once the whole loop is done
ds_ph = None
ds_phb = None

Note that the band order is reversed, and that in the array the element order is [layer][y][x]. This is needed later to calculate the layer positions.

Calculating the geopotential height for a given Air Pressure

At this moment, we have the geopotential height at every sigma level, which is not very useful for working with the data. People prefer to have the geopotential height at a given pressure (850 hPa, 500 hPa, and so on).

The model gives the pressure values at every point, calculated doing:
Press = P + PB
in Pa or
Press = (P + PB ) / 100
in hPa

The code for reading the pressure:
ds_p = gdal.Open('NETCDF:"'+datafile+'":P')
ds_pb = gdal.Open('NETCDF:"'+datafile+'":PB')
num_bands = ds_p.RasterCount
data_p = numpy.zeros(((num_bands, y_size, x_size)))
for i in range(num_bands):
    data_p[i,:,:] = ( ds_p.GetRasterBand(num_bands - i).ReadAsArray() + ds_pb.GetRasterBand(num_bands - i).ReadAsArray() ) / 100

#Release the datasets once the whole loop is done
ds_p = None
ds_pb = None

Now we can calculate at which layer of the pressure subdataset the given pressure would fall.
NumPy has a function called searchsorted that calculates at which position a value would be inserted in an ordered array: http://docs.scipy.org/doc/numpy/reference/generated/numpy.searchsorted.html
numpy.searchsorted([1,2,3,4,5], 2.5)

would give
2, since the value 2.5 is between 2 and 3 (positions 1 and 2).
I needed two tricks to use it:
  • The function needs an array ordered in ascending order. That's why the layer order is reversed in the code: the air pressure decreases with height, and we need the opposite to use this function.
  • The function can only run on 1-d arrays. To use it on our 3-d array, the numpy.apply_along_axis function is used:
numpy.apply_along_axis(lambda a: a.searchsorted(p_ref), axis = 0, arr = data_p)
where p_ref is the pressure we want the geopotential height for.
That's why the layer number is the first element in the array: it is the only way to get apply_along_axis to work.


The final code for the calculation:
p_ref = 500
data_positions = numpy.apply_along_axis(lambda a: a.searchsorted(p_ref), axis = 0, arr = data_p)

Now we know between which layers a given pressure falls for every point. To get the actual pressures at the upper and lower levels for each point, the numpy.choose function can be used:
p0 = numpy.choose(data_positions - 1, data_p)
p1 = numpy.choose(data_positions, data_p)
layer_p = (data_positions - 1) + (p_ref - p0) / (p1 - p0)

Now we know the position within the pressure layers, i.e. if we want 850 hPa, and at one point layer 3 is 900 hPa and layer 2 is 800 hPa, the value would be 2.5
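
To make choose concrete, here is a toy sketch with made-up values, where data has the shape (levels, y, x):

import numpy

data = numpy.array([[[800., 810.], [805., 815.]],
                    [[900., 910.], [905., 915.]]])  #two levels of a 2x2 grid
positions = numpy.array([[1, 1], [1, 1]])           #the level chosen at each point

#picks data[positions[y, x], y, x] for every grid point
print(numpy.choose(positions, data))                #[[900. 910.] [905. 915.]]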

But for the geopotential height, the sigma level is not the same, since the pressure is referenced to the half levels, and the geopotential to the full levels.
So we will add 0.5 layers to the pressure layer to get the geopotential height layer:

layer_h = layer_p + 0.5


Then the final value will be:
h0 = numpy.floor(layer_h).astype(int)
h1 = numpy.ceil(layer_h).astype(int)
data_out = ( numpy.choose(h0, data_hgp) * (h1 - layer_h) ) + ( numpy.choose(h1, data_hgp) * (layer_h - h0) )

Note that the h0 and h1 (the positions of the layers above and below the final position) must be integers to be used with the choose function.

Extrapolation

If the wanted pressure is 500 hPa, a data layer above and below will always be found. But when we get closer to the ground, as at 850 hPa (another typical pressure for showing the geopotential height), we find that at some points the pressure value is above all the pressure layers of the model. So an extrapolation is needed, and the previous code, which assumed that the value would always be between two layers, will fail.

So the data calculated above must be filtered for each case. First, the interpolated data is kept only at the points where a layer above the wanted pressure was actually found:
data_out = data_out * (data_positions < num_bands_p)

The extrapolation is defined in the file module_interp.f90 from the ARWPost source code. There are two defined cases:
  1. The height is below the lowest sigma level, but above the ground height defined for that point
  2. The height is below the ground level

The first case is quite simple, because the surface data for many variables is available, so it only changes the layer used to calculate the interpolation:
##If the pressure is between the surface pressure and the lowest layer
array_filter = numpy.logical_and(data_positions == num_bands_p, p_ref*100 < data_psfc)
zlev = ( ( (100.*p_ref - 100*data_p[-1,:,:])*data_hgt + (data_psfc - 100.*p_ref)*data_hgp[-1,:,:] ) / (data_psfc - 100*data_p[-1,:,:])) 
data_out = data_out + (zlev * array_filter )

The second case needs to extrapolate below the ground level. The code in the file module_interp.f90 iterates over each element in the array, but using numpy increases the speed a lot, so let's see how it's done:
expon=287.04*0.0065/9.81
ptarget = (data_psfc*0.01) - 150.0

kupper = numpy.apply_along_axis(lambda a: numpy.where(a == numpy.min(a))[0][0] + 1, axis = 0, arr = numpy.absolute(data_p - ptarget))

pbot=numpy.maximum(100*data_p[-1,:,:], data_psfc)
zbot=numpy.minimum(data_hgp[-1,:,:], data_hgt)

tbotextrap=numpy.choose(kupper, data_tk) * pow((pbot/(100*numpy.choose(kupper, data_p))),expon)
tvbotextrap=tbotextrap*(0.622+data_qv[-1,:,:])/(0.622*(1+data_qv[-1,:,:]))

zlev = zbot+((tvbotextrap/0.0065)*(1.0-pow(100.0*p_ref/pbot,expon)))

data_out = data_out + ( zlev * array_filter )

  • expon and ptarget are calculated directly (ptarget is an array, but the operations are simple)
  • kupper is the level where the absolute value of pressure - ptarget reaches its minimum. Since we have a 3d array, with the 0th dimension being the level, we can calculate the values of the 2d xy matrix by using the apply_along_axis method. This method takes all the levels and runs the function given as the first parameter on them (see the sketch after this list). In our case, that function finds the position (where) of the minimum value (min). We add the +1 to get the same result as the original code.
    So in one line, we do what the original code needs three nested loops for!
    The value is inverse to the one in the original code, because we changed the pressure order.
  • pbot and zbot just take the value closer to the ground, between the lowest level and the ground level defined by the model
  • tbotextrap is the extrapolated temperature. The choose function is used to get the value at the calculated kupper level for each element in the matrix.
  • tvbotextrap is the virtual temperature for tbotextrap. The original code uses an external function, but I have joined everything. The formula for the virtual temperature is virtual = tmp*(0.622+rmix)/(0.622*(1.+rmix))
  • zlev can be calculated directly from the values calculated before, and is added to the output after applying the filter.
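
A toy sketch of that apply_along_axis trick, with made-up numbers and the shape (levels, y, x):

import numpy

arr = numpy.array([[[3., 1.], [2., 5.]],
                   [[1., 4.], [6., 0.]]])

#for every x-y point, the index (plus one) of the minimum along the level axis
kupper = numpy.apply_along_axis(lambda a: numpy.where(a == numpy.min(a))[0][0] + 1,
                                axis=0, arr=arr)
print(kupper)  #[[2 1] [1 2]]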

Drawing the map using Basemap

The output file can be opened with QGIS, or any other GIS able to read GeoTIFF files.

But it's also possible to draw it directly from the script using Basemap:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
m = Basemap(projection='cyl',llcrnrlat=40.5,urcrnrlat=43,\
            llcrnrlon=0,urcrnrlon=3.5,resolution='h')


cs = m.contourf(longs,lats,data_out,numpy.arange(1530,1600,10),latlon=True)
m.drawcoastlines(color='gray')
m.drawcountries(color='gray')

cbar = m.colorbar(cs,location='bottom',pad="5%")
cbar.set_label('m')

plt.title('Geopotential height at 850hPa')
plt.show()

  • First, we initialize the projection ('cyl', so longitudes and latitudes can be used directly in this case), setting the map limits.
  • contourf draws an isoband every 10 m.
  • Then, we just draw the borders and coast lines and show the map.
This is how the map at the header was made.

Notes

To compile ARWPost on my Ubuntu, the file src/Makefile needs a change: -lnetcdf has to become -lnetcdff
It was very hard for me to find that!

Monday, August 5, 2013

Creating vectorial isobands with Python

GDAL has a utility program, gdal_contour, that generates contour lines from a raster file (take a look at this example post). GDAL provides access to this method through its API, so contour lines can be generated inside your GDAL python scripts.

Unfortunately, there is no way to generate isobands, that is, filled contour lines that use polygons instead of polylines. Isobands allow nice contour maps to be created easily, colouring the zones between two values:
 
 
Using contour lines, the result is this one:



Both visualizations are drawn directly from QGIS.

So I created two scripts to get the result. As usual, you can download them from my GitHub account.
There are two scripts because the one I like most uses matplotlib, which sometimes isn't convenient to install. The other has fewer requirements (just GDAL python), but the code is dirtier.

Both have the same usage, which is very similar to the gdal_contour usage:

    Usage: isobands_matplotlib.py [-h] [-b band] [-off offset] [-i interval]
    [-nln layer_name] [-a attr_name] [-f formatname]
    src_file out_file
    Calculates the isobands from a raster into a vector file

    positional arguments:
      src_file         The raster source file
      out_file         The vectorial out file
    optional arguments:
      -h, --help       show this help message and exit
      -b band          The band in the source file to process (default 1)
      -off offset      The offset to start the isobands (default 0)
      -i interval      The interval (default 0)
      -nln layer_name  The out layer name (default bands)
      -a attr_name     The out layer attribute name (default h)
      -f formatname    The output file format name (default ESRI Shapefile)
Just choose the input and output files, the input file band to process (1 by default), and the interval, and you will get the vector file in the GIS format of your choice.
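
For example, a hypothetical invocation (the file names are made up) that creates an isoband every 10 units from the first band:
python isobands_matplotlib.py -i 10 dem.tiff isobands.shp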

Now, the code of both script, step by step.

isobands_matplotlib.py

As its name suggests, this version uses matplotlib, and more specifically the contourf function, to create the isobands.

The script has a function called isobands, that actually generates the data, and a main part that reads the user input values:
if __name__ == "__main__":

    PARSER = ArgumentParser(
        description="Calculates the isobands from a raster into a vector file")
    PARSER.add_argument("src_file", help="The raster source file")
    PARSER.add_argument("out_file", help="The vectorial out file")
    PARSER.add_argument("-b", 
        help="The band in the source file to process (default 1)", 
        type=int, default = 1, metavar = 'band')
    PARSER.add_argument("-off", 
        help="The offset to start the isobands (default 0)", 
        type=float, default = 0.0, metavar = 'offset')
    PARSER.add_argument("-i", 
        help="The interval  (default 0)", 
        type=float, default = 0.0, metavar = 'interval')
    PARSER.add_argument("-nln", 
        help="The out layer name  (default bands)", 
        default = 'bands', metavar = 'layer_name')
    PARSER.add_argument("-a", 
        help="The out layer attribute name  (default h)", 
        default = 'h', metavar = 'attr_name')
    PARSER.add_argument("-f", 
        help="The output file format name  (default ESRI Shapefile)", 
        default = 'ESRI Shapefile', metavar = 'formatname')
    ARGS = PARSER.parse_args()

    isobands(ARGS.src_file, ARGS.b, ARGS.out_file, ARGS.f, ARGS.nln, ARGS.a, 
        ARGS.off, ARGS.i)
I used argparse.ArgumentParser to read the script arguments, since it is easy to use and it should come with most up-to-date python distributions (since python 2.7, I think).
This part of the code is quite easy to understand; it simply sets the arguments with their default values, whether they are mandatory or not, etc.

The isobands function creates the vector file:
def isobands(in_file, band, out_file, out_format, layer_name, attr_name, 
    offset, interval, min_level = None):
    '''
    The method that calculates the isobands
    '''
    ds_in = gdal.Open(in_file)
    band_in = ds_in.GetRasterBand(band)
    xsize_in = band_in.XSize
    ysize_in = band_in.YSize

    geotransform_in = ds_in.GetGeoTransform()

    srs = osr.SpatialReference()
    srs.ImportFromWkt( ds_in.GetProjectionRef() )  

    #Creating the output vectorial file
    drv = ogr.GetDriverByName(out_format)

    if exists(out_file):
        remove(out_file)
    dst_ds = drv.CreateDataSource( out_file )
       
    dst_layer = dst_ds.CreateLayer(layer_name, geom_type = ogr.wkbPolygon, 
        srs = srs)

    fdef = ogr.FieldDefn( attr_name, ogr.OFTReal )
    dst_layer.CreateField( fdef )



    x_pos = arange(geotransform_in[0], 
        geotransform_in[0] + xsize_in*geotransform_in[1], geotransform_in[1])
    y_pos = arange(geotransform_in[3], 
        geotransform_in[3] + ysize_in*geotransform_in[5], geotransform_in[5])
    x_grid, y_grid = meshgrid(x_pos, y_pos)

    raster_values = band_in.ReadAsArray(0, 0, xsize_in, ysize_in)

    stats = band_in.GetStatistics(True, True)
    if min_level is None:
        min_value = stats[0]
        min_level = offset + interval * floor((min_value - offset)/interval)
   
    max_value = stats[1]
    #A level is added because arange excludes its stop value, and the top
    #isoband still needs an upper boundary above the maximum
    max_level = offset + interval * (1 + ceil((max_value - offset)/interval))

    levels = arange(min_level, max_level, interval)

    contours = plt.contourf(x_grid, y_grid, raster_values, levels)

    

    for level in range(len(contours.collections)):
        paths = contours.collections[level].get_paths()
        for path in paths:

            feat_out = ogr.Feature( dst_layer.GetLayerDefn())
            feat_out.SetField( attr_name, contours.levels[level] )
            pol = ogr.Geometry(ogr.wkbPolygon)


            ring = None            
            
            for i in range(len(path.vertices)):
                point = path.vertices[i]
                if path.codes[i] == 1:  # 1 is Path.MOVETO: a new ring starts
                    if ring is not None:
                        pol.AddGeometry(ring)
                    ring = ogr.Geometry(ogr.wkbLinearRing)
                    
                ring.AddPoint_2D(point[0], point[1])
            

            pol.AddGeometry(ring)
            
            feat_out.SetGeometry(pol)
            if dst_layer.CreateFeature(feat_out) != 0:
                print "Failed to create feature in shapefile.\n"
                exit( 1 )

            
            feat_out.Destroy()
  • Lines 26 to 49 open the input file and create the output vector file.
  • Lines 53 to 57 set two matrices containing the coordinates of each pixel in the input file. To do it, two vectors are created first with the coordinates calculated from the input geotransform, and then meshgrid is called to generate the 2D matrices that matplotlib needs for its calculations.
  • Lines 59 to 70 read the input file data and calculate the levels inside it, from the interval and the offset. Since calculating isobands for levels not present in the file isn't efficient, the minimum and maximum values in the input file are computed first.
  • Line 72 is where the isobands are actually calculated, using the contourf function.
  • Lines 76 to 105 create the polygons in the output vector file (see the sketch after this list):
    • The generated contour set has a property called collections, containing one element per isoband.
    • Each element in collections holds the paths of its isolines.
    • The codes property of each path indicates when a new ring has to be started.
    • The remaining lines generate an OGR feature for each polygon and write it to the output layer.
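To make the collections/paths/codes structure more tangible, here is a minimal standalone sketch (the data is synthetic, just for illustration):

import numpy
import matplotlib.pyplot as plt
from matplotlib.path import Path

#A synthetic bump, so contourf has something to fill
x, y = numpy.meshgrid(numpy.linspace(-3, 3, 50), numpy.linspace(-3, 3, 50))
data = numpy.exp(-(x**2 + y**2))

contours = plt.contourf(x, y, data, [0.2, 0.4, 0.6, 0.8])

for level, collection in zip(contours.levels, contours.collections):
    for path in collection.get_paths():
        #A MOVETO code (value 1) marks the start of a new ring
        rings = sum(1 for code in path.codes if code == Path.MOVETO)
        print level, len(path.vertices), rings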

isobands_gdal.py

This script is quite a bit dirtier, since I used a trick to get the isobands from the ContourGenerate function.
Closing the contour lines, which is what has to be done to transform isolines into isobands, is not an easy task.
Looking at some D3js examples, I got the idea from Jason Davies' conrec.js. I can't find the original example now, but basically, the idea is to add some pixels around the image with a value lower than the lowest one, so all the isolines get closed (a minimal sketch of this padding is shown below). This gets closer to the goal, but some problems must still be solved:
  • Convert the isolines to polygons
  • Clip the vectors to the original extent
  • Subtract the overlapping polygons, since when the lines are converted to polygons, each polygon contains all the area above its isoline value, even where lines with higher values lie inside it
When not clipped, the isolines create strange effects at the borders.
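The padding trick itself is simple; a minimal sketch (the values are made up) could be:

import numpy

data = numpy.array([[5., 6.], [7., 8.]])
nodata_value = 4.  # anything below the lowest real value works

#Surround the data with a one-pixel frame of low values, so every
#contour line generated on it will be closed
padded = numpy.pad(data, 1, mode='constant', constant_values=nodata_value)

The script below does the same thing with numpy.concatenate, adding one column on each side and one line at the top and bottom.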

The code of the main part is exactly the same as in the other script, and the isobands function is:

def isobands(in_file, band, out_file, out_format, layer_name, attr_name, 
    offset, interval, min_level = None):
    '''
    The method that calculates the isobands
    '''    
    #Loading the raster file
    ds_in = gdal.Open(in_file)
    band_in = ds_in.GetRasterBand(band)
    xsize_in = band_in.XSize
    ysize_in = band_in.YSize

    stats = band_in.GetStatistics(True, True)
    
    if min_level is None:
        min_value = stats[0]
        min_level = ( offset + interval * 
            (floor((min_value - offset)/interval) - 1) )
    nodata_value = min_level - interval    



    geotransform_in = ds_in.GetGeoTransform()
    
    srs = osr.SpatialReference()
    srs.ImportFromWkt( ds_in.GetProjectionRef() )  

    data_in = band_in.ReadAsArray(0, 0, xsize_in, ysize_in)


    #The contour memory
    contour_ds = ogr.GetDriverByName('Memory').CreateDataSource('')
    contour_lyr = contour_ds.CreateLayer('contour', 
        geom_type = ogr.wkbLineString25D, srs = srs )
    field_defn = ogr.FieldDefn('ID', ogr.OFTInteger)
    contour_lyr.CreateField(field_defn)
    field_defn = ogr.FieldDefn('elev', ogr.OFTReal)
    contour_lyr.CreateField(field_defn)

    #The in memory raster band, with new borders to close all the polygons
    driver = gdal.GetDriverByName( 'MEM' )
    xsize_out = xsize_in + 2
    ysize_out = ysize_in + 2

    column = numpy.ones((ysize_in, 1)) * nodata_value
    line = numpy.ones((1, xsize_out)) * nodata_value

    data_out = numpy.concatenate((column, data_in, column), axis=1)
    data_out = numpy.concatenate((line, data_out, line), axis=0)

    ds_mem = driver.Create( '', xsize_out, ysize_out, 1, band_in.DataType)
    ds_mem.GetRasterBand(1).WriteArray(data_out, 0, 0)
    ds_mem.SetProjection(ds_in.GetProjection())
    #We have added the buffer!
    ds_mem.SetGeoTransform((geotransform_in[0]-geotransform_in[1],
        geotransform_in[1], 0, geotransform_in[3]-geotransform_in[5], 
        0, geotransform_in[5]))
    gdal.ContourGenerate(ds_mem.GetRasterBand(1), interval, 
        offset, [], 0, 0, contour_lyr, 0, 1)

    #Creating the output vectorial file
    drv = ogr.GetDriverByName(out_format)
    if exists(out_file):
        remove(out_file)
    dst_ds = drv.CreateDataSource( out_file )

      
    dst_layer = dst_ds.CreateLayer(layer_name, 
        geom_type = ogr.wkbPolygon, srs = srs)

    fdef = ogr.FieldDefn( attr_name, ogr.OFTReal )
    dst_layer.CreateField( fdef )


    contour_lyr.ResetReading()

    geometry_list = {}
    for feat_in in contour_lyr:
        value = feat_in.GetFieldAsDouble(1)

        geom_in = feat_in.GetGeometryRef()
        points = geom_in.GetPoints()

        if (value >= min_level and points[0][0] == points[-1][0] and
            points[0][1] == points[-1][1]):
            if value not in geometry_list:
                geometry_list[value] = []

            pol = ogr.Geometry(ogr.wkbPolygon)
            ring = ogr.Geometry(ogr.wkbLinearRing)

            for point in points:

                p_y = point[1]
                p_x = point[0]

                #Clamp points that fall on the added frame back to the
                #original raster extent
                if p_x < (geotransform_in[0] + 0.5*geotransform_in[1]):
                    p_x = geotransform_in[0] + 0.5*geotransform_in[1]
                elif p_x > ( (geotransform_in[0] + 
                    (xsize_in - 0.5)*geotransform_in[1]) ):
                    p_x = ( geotransform_in[0] + 
                        (xsize_in - 0.5)*geotransform_in[1] )
                if p_y > (geotransform_in[3] + 0.5*geotransform_in[5]):
                    p_y = geotransform_in[3] + 0.5*geotransform_in[5]
                elif p_y < ( (geotransform_in[3] + 
                    (ysize_in - 0.5)*geotransform_in[5]) ):
                    p_y = ( geotransform_in[3] + 
                        (ysize_in - 0.5)*geotransform_in[5] )

                ring.AddPoint_2D(p_x, p_y)
                
  
            pol.AddGeometry(ring)
            geometry_list[value].append(pol)



    values = sorted(geometry_list.keys())

    geometry_list2 = {}

    for i in range(len(values)):
        geometry_list2[values[i]] = []
        interior_rings = []
        for j in range(len(geometry_list[values[i]])):
            if j not in interior_rings:
                geom = geometry_list[values[i]][j]

                for k in range(len(geometry_list[values[i]])):

                    if (k not in interior_rings and
                        j not in interior_rings):
                        geom2 = geometry_list[values[i]][k]

                        if j != k and geom2 is not None and geom is not None:
                            if geom2.Within(geom):
                                #geom2 is a hole of geom
                                geom3 = geom.Difference(geom2)
                                interior_rings.append(k)
                                geometry_list[values[i]][j] = geom3

                            elif geom.Within(geom2):
                                #geom is a hole of geom2
                                geom3 = geom2.Difference(geom)
                                interior_rings.append(j)
                                geometry_list[values[i]][k] = geom3

        for j in range(len(geometry_list[values[i]])):
            if (j not in interior_rings and
                geometry_list[values[i]][j] is not None):
                geometry_list2[values[i]].append(geometry_list[values[i]][j])
    

    for i in range(len(values)):
        value = values[i]
        if value >= min_level:
            for geom in geometry_list2[values[i]]:
                
                if i < len(values)-1:

                    for geom2 in geometry_list2[values[i+1]]:
                        if geom.Intersects(geom2):
                            geom = geom.Difference(geom2)
                
                feat_out = ogr.Feature( dst_layer.GetLayerDefn())
                feat_out.SetField( attr_name, value )
                feat_out.SetGeometry(geom)
                if dst_layer.CreateFeature(feat_out) != 0:
                    print "Failed to create feature in shapefile.\n"
                    exit( 1 )
                feat_out.Destroy()
  • Lines 23 to 44 read the input file, including the data, the geotransform, etc.
  • Lines 47 to 54 create a vector layer where the contour lines will be written. Since the data must be clipped later, it lives in a temporary in-memory file.
  • Lines 56 to 71 create an in-memory raster file with the input data, plus a buffer around it with a value lower than the lowest one in the input data, so the contour lines get closed.
  • Line 74 calls the ContourGenerate function, which adds the contour lines to the in-memory vector file.
  • Lines 77 to 91 create the actual output file.
  • Lines 93 to 130 build a collection of geometries with the polygons:
    • The lines are transformed into polygons by creating a ring from each polyline and adding it to a polygon.
    • Lines 113 to 123 clip the polygons to the original geotransform, by moving the points that fall outside it to the border.
  • Lines 138 to 167 store the difference of the polygons that overlap, so the output polygons contain only the zones where the values are within the interval (see the sketch after this list).
  • Lines 170 to 187 write the final polygons to the output file, the same way as in the other script.
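The hole-carving logic in lines 138 to 167 boils down to a simple OGR pattern; here is a minimal sketch with two hand-made squares (the coordinates are invented for the example):

from osgeo import ogr

#Two nested squares, built from WKT just for the example
outer = ogr.CreateGeometryFromWkt(
    'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))')
inner = ogr.CreateGeometryFromWkt(
    'POLYGON((4 4, 6 4, 6 6, 4 6, 4 4))')

if inner.Within(outer):
    #Carve the inner polygon out, leaving a polygon with a hole
    outer = outer.Difference(inner)

print outer.GetGeometryName(), outer.GetArea()

The script applies the same Within/Difference pair to every pair of polygons sharing a level, keeping track of which ones turned out to be interior rings.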

Notes:

Another option would be, of course, creating the polygons myself. I tried to do it, but as you can see on the marching squares Wikipedia page, the algorithm is not easy when working with polygons, while the matplotlib version of the script stays quite clean.

The scripts have a problem when there is a nodata zone, since I haven't added the code to handle it. A sketch of a possible workaround for the matplotlib version follows.
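This is an untested sketch, assuming the nodata value can be read from the band; contourf accepts masked arrays, so masking the nodata cells before calling it should be enough:

import numpy

nodata = band_in.GetNoDataValue()
if nodata is not None:
    raster_values = numpy.ma.masked_equal(raster_values, nodata)
#contourf will then leave the masked zones empty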

Finally, as discussed in the GDAL-dev mailing list, the best option would be modifying the ContourGenerate function to create isobands directly, but I think that is out of my scope.
You can also check the discussion at the GDAL-dev mailing list where I posted the first version.
 
The data used for the examples is the same I used in the raster classification post.
 
Some links I have used:
http://matplotlib.org/api/path_api.html
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.contourf
http://matplotlib.org/basemap/api/basemap_api.html#mpl_toolkits.basemap.Basemap.contourf
http://fossies.org/dox/matplotlib-1.2.0/classmatplotlib_1_1contour_1_1QuadContourSet.html