Sunday, February 19, 2012

Colorize raster with GDAL python

As I mentioned in my last post, what we usually want is to take a raster file, classify it, and output a PNG in a color scale.
The easiest way to do it is using gdaldem with the color-relief option.
But that is not Python.
So, if you want to integrate it in a script, or modify its behavior easily, here is a solution.
All the example data, including the whole script, dataset, etc., can be downloaded here.

The script:

The first thing to do is to read the color file. As in the gdaldem application, the file is plain text with four columns: value, red, green, and blue, plus an optional fifth column for alpha. We will consider the alpha value to be 255 if it is not set. Comments are marked with the hash character. This is the format used by the GRASS r.colors utility.
So a function is defined:

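from os.path import exists

def readColorTable(color_file):
    # Maps each classification value to its [red, green, blue, alpha] color
    color_table = {}
    if exists(color_file) is False:
        raise Exception("Color file " + color_file + " does not exist")
    fp = open(color_file, "r")
    for line in fp:
        # Skip comment lines
        if line.find('#') == -1 and line.find('/') == -1:
            entry = line.split()
            if len(entry) >= 4:
                # Alpha defaults to 255 (opaque) when the fifth column is missing
                alpha = 255
                if len(entry) == 5:
                    alpha = int(entry[4])
                color_table[float(entry[0])] = [int(entry[1]), int(entry[2]), int(entry[3]), alpha]
    fp.close()
    return color_table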

Once the table is read, it is necessary to read the band from the raster file:

from os.path import exists
from osgeo import gdal
from osgeo.gdalconst import GA_ReadOnly
from struct import unpack

# Map GDAL data type names to struct format characters
data_types = {'Byte':'B','UInt16':'H','Int16':'h','UInt32':'I','Int32':'i','Float32':'f','Float64':'d'}
if not exists(raster_file):
    raise Exception('[Errno 2] No such file or directory: \'' + raster_file + '\'')
dataset = gdal.Open(raster_file, GA_ReadOnly)
if dataset is None:
    raise Exception("Unable to read the data file")
band = dataset.GetRasterBand(raster_band)
# Read the whole band as a raw byte string, then unpack it into a tuple of numbers
values = band.ReadRaster(0, 0, band.XSize, band.YSize, band.XSize, band.YSize, band.DataType)
values = unpack(data_types[gdal.GetDataTypeName(band.DataType)]*band.XSize*band.YSize, values)
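
The unpacking step can be tried without GDAL. The following is a minimal sketch, written for Python 3, that packs a few numbers the way a raw band buffer would hold them and unpacks them again with the same type table. The helper name `unpack_band` and the sample values are assumptions made for the illustration; they are not part of the script.

```python
from struct import pack, unpack, calcsize

# Same table as in the script: GDAL data type name -> struct format character
data_types = {'Byte': 'B', 'UInt16': 'H', 'Int16': 'h', 'UInt32': 'I',
              'Int32': 'i', 'Float32': 'f', 'Float64': 'd'}

def unpack_band(raw, gdal_type_name, xsize, ysize):
    """Hypothetical helper: turn a raw buffer into a tuple of xsize*ysize numbers."""
    fmt = data_types[gdal_type_name] * (xsize * ysize)
    if len(raw) != calcsize(fmt):
        raise ValueError("Buffer size does not match the raster size and type")
    return unpack(fmt, raw)

# Stand-in for what band.ReadRaster() would return for a 3x2 Int16 band
raw = pack('h' * 6, 1600, 1900, 2200, 2500, 2800, 3100)
values = unpack_band(raw, 'Int16', 3, 2)
print(values)
```

The size check is an addition of this sketch: it fails early when the chosen type does not match the buffer, instead of letting `unpack` raise a less specific error.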

Notice that the struct module and the GDAL library use different names for the data types, so a dict has to be created to translate between them. (Maybe there's a better way to do it, but this one works.) The band number has to be set as well. If the file has only one band (as in our example), the value must be 1.

Then, two images must be created, one for the colors and the other for the alpha channel:

from PIL import Image, ImageDraw
# Color image, plus a single-channel mask that will hold the transparency
base = Image.new('RGBA', (band.XSize, band.YSize))
base_draw = ImageDraw.Draw(base)
alpha_mask = Image.new('L', (band.XSize, band.YSize), 255)
alpha_draw = ImageDraw.Draw(alpha_mask)

Now the classification can be done. There are two modes: one gives the exact color defined in the color file, and the other calculates the color from the pixel value and the two nearest colors in the palette. To keep it readable, only the exact color mode is shown here:

from PIL import Image
color_table = readColorTable(color_file)
# Dictionary keys come in no particular order, so sort them first
classification_values = sorted(color_table.keys())
for pos in range(len(values)):
    # Pixel position from the index in the flat tuple
    y = pos // band.XSize
    x = pos - y * band.XSize
    for index in range(len(classification_values)):
        if values[pos] <= classification_values[index] or index == len(classification_values)-1:
            if index == 0:
                index = 1
            elif index == len(classification_values)-1 and values[pos] >= classification_values[index]:
                index = index + 1
            color = color_table[classification_values[index-1]]
            base_draw.point((x,y), (color[0],color[1],color[2]))
            # Image.composite() takes color_layer where the mask is 255,
            # so the mask stores the inverted alpha value
            alpha_draw.point((x,y), 255 - color[3])
            break
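
The lookup done by the inner loop can also be isolated in a small function that needs neither GDAL nor PIL. This is only a sketch under a few assumptions: it is written for Python 3, `exact_color` and `pixel_position` are made-up names, and it swaps the linear scan for the standard `bisect` module. It returns the color of the highest palette entry that is not above the pixel value, so a value equal to an entry takes that entry's color, and values outside the scale are clamped to the first or last entry. The boundary handling may therefore differ slightly from the loop above.

```python
from bisect import bisect_right

def exact_color(value, color_table):
    """Hypothetical helper: color of the highest entry not above value."""
    keys = sorted(color_table)
    index = bisect_right(keys, value) - 1
    # Values below the scale use the first entry
    index = max(index, 0)
    return color_table[keys[index]]

def pixel_position(pos, xsize):
    """Row-major buffer position -> (x, y), as in the loop above."""
    y, x = divmod(pos, xsize)
    return x, y

# A few entries from the sample color file, with alpha set to 255
palette = {1600: (166, 122, 59, 255),
           1900: (180, 165, 98, 255),
           2200: (188, 189, 123, 255)}

print(exact_color(1700, palette), pixel_position(7, 3))
```

Sorting the keys once outside the per-pixel call would be the obvious next step for a full raster; it is kept inside the function here only to keep the sketch short.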

The first thing to do is to get the values of the color table. Then, for each pixel, the pixel position is calculated. To find the color to use, it is necessary to iterate over the values in the color table to see which one is just above the pixel value. There is a special case when the pixel value is under the minimum value of the scale. Then, the pixel is set in the color image and in the alpha channel. Note the break statement that exits the loop.
Once the image and the alpha channel are calculated, the alpha channel has to be blended with the color image before saving the result:

color_layer = Image.new('RGBA', base.size, (255, 255, 255, 0))
base = Image.composite(color_layer, base, alpha_mask)

And that's all.
In the example, everything is packed in a function, and it is possible to call it from the command line.
To calculate the color in the -nearest_color_entry mode, each channel is interpolated linearly between the two nearest palette entries. For the red channel, the code would be like this:

r = color_table[classification_values[index-1]][0] + (values[pos] - classification_values[index-1])*(color_table[classification_values[index]][0] - color_table[classification_values[index-1]][0])/(classification_values[index]-classification_values[index-1])

Of course, there is also the special case when the pixel value is lower than the lowest value in the scale or higher than the highest one. The solution is in the script.
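
The interpolation can be tried on its own with plain Python. The function below is a hedged sketch, not the code from the script: it is written for Python 3, the name `interpolate_color` is made up, and clamping out-of-range values to the first or last color is an assumption about how the special cases could be handled. Channels are truncated with `int()`.

```python
def interpolate_color(value, color_table):
    """Hypothetical helper: linear interpolation between the two nearest entries."""
    keys = sorted(color_table)
    # Assumed handling of the special cases: clamp to the ends of the scale
    if value <= keys[0]:
        return tuple(color_table[keys[0]])
    if value >= keys[-1]:
        return tuple(color_table[keys[-1]])
    # Find the first entry that is at or above the value
    upper = next(i for i, key in enumerate(keys) if value <= key)
    low, high = keys[upper - 1], keys[upper]
    fraction = (value - low) / float(high - low)
    return tuple(int(c0 + fraction * (c1 - c0))
                 for c0, c1 in zip(color_table[low], color_table[high]))

# A few entries from the sample color file, with alpha set to 255
palette = {1600: (166, 122, 59, 255),
           1900: (180, 165, 98, 255),
           2200: (188, 189, 123, 255)}

print(interpolate_color(1750, palette))
```

Dividing by `float(high - low)` avoids the integer division that would truncate the result when both the pixel values and the palette values are integers.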

Running the code:

We will use the same data as in the last post (elevation data from the USGS).
All the sample data and the script are here.
A color file example could be this one (for our elevation data):

1600 166 122 59
1900 180 165 98
2200 188 189 123
2500 74 156 74
2800 123 189 83
3100 165 205 132
3300 255 255 255
To create the file, just type:
python colorize.py -exact_color_entry w001001.adf colorfile.clr out.png
python colorize.py -nearest_color_entry w001001.adf colorfile.clr out.png

Sample output with the -exact_color_entry option
Sample output with the default behaviour
It is possible to call the function from another Python script just by typing:

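from colorize import raster2png
try:
    # Argument names and order are assumed here; check the signature in colorize.py
    raster2png(raster_file, color_file, out_file_name, raster_band, discrete)
except Exception as ex:
    print("Error: " + str(ex))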

If discrete is True, the output uses the exact colors in the palette. If it is False, the continuous mode is used.

Monday, February 6, 2012

Raster classification with GDAL Python

There is a new entry about this topic, with much more efficient code:

Classifying a raster means replacing the original continuous raster data with a set of discrete values. You can think about it as colorizing a raster file using data intervals, but it has many other uses, of course.
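
As a small illustration of the idea, the sketch below classifies a handful of made-up elevation values with the same interval and output values as the script in this post. It is written for Python 3 and uses the standard `bisect` module instead of a nested loop; the `classify` name and the rule that values above the last limit fall into the last class are assumptions.

```python
from bisect import bisect_left

classification_values = [0, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000]
classification_output_values = [10, 20, 30, 40, 50, 60, 70, 80, 90]

def classify(value):
    """Hypothetical helper: output value of the first interval with value <= limit."""
    index = bisect_left(classification_values, value)
    # Values above the last limit fall into the last class (an assumption)
    index = min(index, len(classification_output_values) - 1)
    return classification_output_values[index]

# Made-up elevation samples, in the same units as the interval limits
elevations = [120.0, 730.5, 2600.0, 4500.0]
classified = [classify(v) for v in elevations]
print(classified)
```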

We will use the Seamless Data Warehouse page from the USGS to get some data. Just open the following page and use the download section tools to get the data for the place you prefer:
The result is a file inside a sub folder named with a number.

So make sure that the GDAL Python bindings are installed and run the script:

#! /usr/bin/python

#Change the value with your raster filename here
raster_file = 'w001001.adf'
output_file = 'classified.tiff'

classification_values = [0,500,1000,1500,2000,2500,3000,3500,4000] #The interval values to classify
classification_output_values = [10,20,30,40,50,60,70,80,90] #The value assigned to each interval

from osgeo import gdal
from osgeo.gdalconst import *
import numpy
import struct

#Opening the raster file
dataset = gdal.Open(raster_file, GA_ReadOnly )
band = dataset.GetRasterBand(1)
#Reading the raster properties
projectionfrom = dataset.GetProjection()
geotransform = dataset.GetGeoTransform()
xsize = band.XSize
ysize = band.YSize
datatype = band.DataType

#Reading the raster values
values = band.ReadRaster( 0, 0, xsize, ysize, xsize, ysize, datatype )
#Conversion between GDAL types and python pack types (Can't use complex integer or float!!)
data_types ={'Byte':'B','UInt16':'H','Int16':'h','UInt32':'I','Int32':'i','Float32':'f','Float64':'d'}
values = struct.unpack(data_types[gdal.GetDataTypeName(band.DataType)]*xsize*ysize,values)

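#Now that the raster is in a tuple, let's classify it
out_str = ''
for value in values:
    index = 0
    for cl_value in classification_values:
        if value <= cl_value:
            out_str = out_str + struct.pack('B',classification_output_values[index])
            #Stop at the first interval that fits, so each pixel writes exactly one byte
            break
        index = index + 1
    else:
        #Values above the last interval take the last output value
        out_str = out_str + struct.pack('B',classification_output_values[-1])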
#Once classified, write the output raster
#In the example, it's not possible to use the same output format as the input file,
#because GDAL is not able to write this file format. GeoTIFF will be used instead
gtiff = gdal.GetDriverByName('GTiff')
output_dataset = gtiff.Create(output_file, xsize, ysize, 1, GDT_Byte)
#Keep the projection and the geotransform of the input raster
output_dataset.SetProjection(projectionfrom)
output_dataset.SetGeoTransform(geotransform)

output_dataset.GetRasterBand(1).WriteRaster( 0, 0, xsize, ysize, out_str )
#Closing the dataset flushes it to disk
output_dataset = None

  • To classify the values, it is necessary to check each classification value in turn until the pixel value fits the interval.
  • Note how the projection is kept. The same goes for the geotransform.
  • If the input file format can be written by GDAL, it's possible to get its driver with:
    driver = dataset.GetDriver()
  • Note how the GDAL data types are managed. The names are different from the Python struct ones.
To see what has happened, the best option is to use QGIS.
Just open both files and you will see that the original file looks something like this once coloured with pseudocolor:
And this is how the classified file looks.

A nice exercise would be generating the coloured PNGs directly from the Python script using a color file. Maybe in the next post.