Monday, February 6, 2012

Raster classification with GDAL Python

There is a new entry about this topic, with a much more efficient code:


Classifying a raster means assigning a set of discrete values from the original continuous raster data. You can think about it as colorizing a raster file using data intervals, but it has many other uses, of course.

We will use the  Seamless Data Warehouse page from the USGS to get some data. Just open the following page and use the download section tools to get the data from the place you prefer:
http://seamless.usgs.gov/website/seamless/viewer.htm
The result is a file in the sub folder with a number.

So be sure that the GDAL python bindings are installed and execute the script:

#! /usr/bin/python

#Change the value with your raster filename here
raster_file = 'w001001.adf'
output_file = 'classified.tiff'

classification_values = [0,500,1000,1500,2000,2500,3000,3500,4000] #The interval values to classify
classification_output_values = [10,20,30,40,50,60,70,80,90] #The value assigned to each interval

from osgeo import gdal
from osgeo.gdalconst import *
import numpy
import struct

#Opening the raster file
dataset = gdal.Open(raster_file, GA_ReadOnly )
band = dataset.GetRasterBand(1)
#Reading the raster properties
projectionfrom = dataset.GetProjection()
geotransform = dataset.GetGeoTransform()
xsize = band.XSize
ysize = band.YSize
datatype = band.DataType

#Reading the raster values
values = band.ReadRaster( 0, 0, xsize, ysize, xsize, ysize, datatype )
#Conversion between GDAL types and python pack types (Can't use complex integer or float!!)
data_types ={'Byte':'B','UInt16':'H','Int16':'h','UInt32':'I','Int32':'i','Float32':'f','Float64':'d'}
values = struct.unpack(data_types[gdal.GetDataTypeName(band.DataType)]*xsize*ysize,values)

#Now that the raster is into an array, let's classify it
out_str = ''
for value in values:
    index = 0
    for cl_value in classification_values:
        if value <= cl_value:
            out_str = out_str + struct.pack('B',classification_output_values[index])
            break
        index = index + 1
#Once classified, write the output raster
#In the example, it's not possible to use the same output format than the input file, because GDAL is not able to write this file format. Geotiff will be used instead
gtiff = gdal.GetDriverByName('GTiff') 
output_dataset = gtiff.Create(output_file, xsize, ysize, 4)
output_dataset.SetProjection(projectionfrom)
output_dataset.SetGeoTransform(geotransform)

output_dataset.GetRasterBand(1).WriteRaster( 0, 0, xsize, ysize, out_str ) 
output_dataset = None

  • To classify the values is necessary to check for each classification value, until it fits to the interval. 
  • Look also how the projection is kept. The same for the geotransform.
  • If the input file format can be written by GDAL, it's possible to get it with:
    driver = dataset.GetDriver()
  • Look how the GDAL data types are managed. The names are different from the python types.
To see what has occurred,  the best is using QGIS.
Just open both files and you will see that the original file looks something like that once coloured with pseudocolor:
 And how the classified file looks like.

A nice exercice would be generating the coloured png's directly from the python script from a color file. Maybe the next post.




2 comments:

  1. This is the greatest blog i've encountered in several years. Exactly what I have been googling for for so long :)

    /Martin

    ReplyDelete
  2. This comment has been removed by a blog administrator.

    ReplyDelete