Monday, April 1, 2013

Fast tip: Filtering features using OGR Python

Actualization

Take a look to the -where option in ogr2ogr. It does more or less the same, although coding it gives much more flexibility.


Original post

Playing with some files I found that I didn't know how to get only the features in a vectorial file fitting some condition.
So I made a little script.
You can find it at GitHub for download, or take a look at the code:

'''
Writes an ogr file from an input file, filtering the content
Roger Veciana, 23/mar/2013
'''
from osgeo import ogr
from os.path import exists
from os.path import basename
from os.path import splitext
from os import remove
import sys

def extract(in_file, out_file, filter_field, filter_values):
    '''
    Opens the input file, copies it into the oputput file, checking 
    the filter.
    '''
    print "Extracting data"
    
    in_ds = ogr.Open( in_file )
    if in_ds is None:
        print "Open failed.\n"
        sys.exit( 1 )
    in_lyr = in_ds.GetLayerByName( splitext(basename(in_file))[0] )
    if in_lyr is None:
        print "Error opening layer"
        sys.exit( 1 )


    ##Creating the output file, with its projection
    if exists(out_file):
        remove(out_file)
    driver_name = "ESRI Shapefile"
    drv = ogr.GetDriverByName( driver_name )
    if drv is None:
        print "%s driver not available.\n" % driver_name
        sys.exit( 1 )
    out_ds = drv.CreateDataSource( out_file )
    if out_ds is None:
        print "Creation of output file failed.\n"
        sys.exit( 1 )
    proj = in_lyr.GetSpatialRef()
    ##Creating the layer with its fields
    out_lyr = out_ds.CreateLayer( 
        splitext(basename(out_file))[0], proj, ogr.wkbMultiPolygon )
    lyr_def = in_lyr.GetLayerDefn ()
    for i in range(lyr_def.GetFieldCount()):
        out_lyr.CreateField ( lyr_def.GetFieldDefn(i) )
    
    
    ##Writing all the features
    in_lyr.ResetReading()

    for feat in in_lyr:
        value = feat.GetFieldAsString(feat.GetFieldIndex(filter_field))
        if filter_func(value, filter_values):
            out_lyr.CreateFeature(feat)
       

    in_ds = None
    out_ds = None


def filter_func(value, filter_values):
    '''
    The custom filter function. In this case, we check that the value is in the
    value list, stripping the white spaces. In the case of numeric values, a
    comparaison could be done
    '''
    if value.strip() in filter_values:
        return True
    else:
        return False

if __name__ == '__main__':
    extract('./recintos_etrs89.shp', './municipis.shp',
     'provincia', ('8', '25', '17', '43'))

The data I have used is from the CNIG, downloading the file LINEAS LIMITES MUNICIPALES. The file contains all the Spanish municipalities. The script will write the output file using the input data that matches as condition that the field provincia is one of the values 8, 25, 17 or 43, corresponding to the Catalan provinces.

How does the script works:

  • The main method just calls the function extract. (filter is a reserved name for python, so I haven't used it here)
  • The function opens the input file, and creates the output file with the same projection.
  • Then, at line 42, the first layer of the input file is replicated at the output file. This should be changed if more than one layer exists.
  •  At line 50, for each feature in the input layer, the filter function filter_func is applied, and if the result is true, the feature is written in the ouput file.
  • The filter function can be changed depending of the needs (comparing numbers, strings, or whatever). In this case, the white spaces are stripped (if not, some leading spaces that come with the file make the comparison fail), and then, the strings are compared. Quite simple.
Maybe there is better tool (a command-line tool, not a desktop GIS), but I didn't find it. If there is one, please tell me! If there is no one, may be could be a good idea to create a serious one, like gdal_calc.py for the raster calculations.

4 comments:

  1. Thanks for sharing. This is awesome and made my day.
    I am developing some code for England maps. It works! Just a note: at line 75 put './recintos_etrs89.shp' instead of './recintos_ETRS89.shp' to make it working with your shapefile.
    Jacopo

    ReplyDelete
    Replies
    1. Thank you!
      Now it should work properly.

      Delete
    2. Hello, it's Jacopo again. I am working with Virtualenv on the current project and I have encountered some difficulties in getting the library 'ogr' working with Virtualenv. Then I have found out about Fiona, that seems to be much more straight forward (it has a much clearer API too!), and wrote up a little script that does the same as your code. You can check it out here: https://github.com/JacopoHirschstein/filter_shapefile
      Regards,
      Jacopo

      Delete
  2. The post is very helpful to me. Thank you!
    But after exporting a subset of features into a new shapefile, it seems that the extent is not updated. Do you have any idea?

    ReplyDelete