Monday, December 30, 2013

D3 map Styling tutorial I: Preparing the data

This is the first post of a series to explain how to draw styled maps with paths in them, using D3js.
The final map will be an animated map of the Haiyan typhoon track:
These entries expand on my last post, La Belle France, which had a part with filters and another with paths that weren't explained very clearly.
So first, the typhoon path data is needed, as well as a base map to draw under it.

Base Map

To draw the maps for this tutorial, I will use the world map at 50m resolution that appears in most of mbostock's examples. The base map is taken from this example and can be seen, as usual, at bl.ocks.org.







The basic difference from the Mercator example is the projection variable:
var projection = d3.geo.mercator()
    .scale(5*(width + 1) / 2 / Math.PI)
    .translate([width / 2, height / 2])
    .rotate([-125, -15, 0])
    .precision(.1);

  • The scale is multiplied by five, to zoom to the typhoon zone
  • A rotation is added, to center the map on the typhoon zone

Typhoon Haiyan track

It's not easy to find the actual data for hurricanes and typhoons in a convenient format. After looking into several sites, the one I like the most is this one from Japan. The data is in a list that can be copied, but typing all the points into an array by hand is not an amusing activity. So here is the script I've used to transform the data into a JSON:
f = open("track.txt","r")
out = "["
for line in f:
    day = int(line[12:16])
    hour = int(line[16:18])
    lat = float(line[20:24])
    lon = float(line[28:34])
    h_class = int(line[48:49])

    out += '{"day":%d, "hour":%d, "lat":%.1f, "lon":%.1f, "class": %d},' %(
        day, hour, lat, lon, h_class)
out = out[0:-1] + "]"
f.close()

print out
Where track.txt is the file where I pasted the data copied from the web page.
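A variant of the same idea (a sketch, assuming the same fixed-width columns) builds a list of dictionaries and lets the json module handle the serialization, avoiding the manual comma bookkeeping:
import json

records = []
f = open("track.txt", "r")
for line in f:
    # Same fixed-width column slices as the script above
    records.append({
        "day": int(line[12:16]),
        "hour": int(line[16:18]),
        "lat": float(line[20:24]),
        "lon": float(line[28:34]),
        "class": int(line[48:49]),
    })
f.close()

print(json.dumps(records))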

Links

The base map for the next posts
La Belle France - The blog entry that made me write this post
D3 Mercator world map example
Haiyan typhoon track data, from the Japan Meteorological Agency Best Track Data
Digital Typhoon in English, with tracks from many current and past typhoons
Hong Kong observatory Haiyan typhoon track data





Tuesday, December 3, 2013

La Belle France - Map styling with D3js

When I looked at vector web mapping tools for the first time, I was completely carried away by this example made with Kartograph by the software creator Gregor Aisch.
This example was, actually, the reason I learned Kartograph and published some posts here.
Later, I learned D3js, which was also amazing, but I didn't find a map like La Bella Italia. So I tried to do a similar one myself.

See the working web page.
To make the map, I had to reproduce the result of the Kartograph example to understand how the effects were achieved, then get some base data to make the map, and finally code the example.

Getting the data

The map has, basically, data about the land masses, the countries and the French regions.
I tried data from different places (one of them Eurostat, which ended up in this example), until I decided to use the Natural Earth data. After some attempts, I settled on the 1:10m data. The files are:

  1. ne_10m_admin_0_countries.shp, clipped using:
    ogr2ogr -clipsrc -10 35 15 60 countries.shp ne_10m_admin_0_countries.shp
  2. ne_10m_admin_1_states_provinces.shp, clipped using:
    ogr2ogr -clipsrc -10 35 15 60 -where "adm0_a3 = 'FRA'" regions.shp ne_10m_admin_1_states_provinces.shp
  3. ne_10m_land.shp, downloaded from github, since the official version gave errors when converted to TopoJSON. Clipped using:
    ogr2ogr -clipsrc -10 35 15 60  land.shp ne_10m_land.shp
With that, the land, countries and regions are available. To merge them into a single TopoJSON file, I used:
topojson -o ../data.json countries.shp regions.shp land.shp

The html code

Since the code is quite long, and I think I will write some more posts about specific parts of the technique, the comments are a bit shorter than usual.

CSS

Note how the font AquilineTwo.ttf is loaded:
@font-face {
    font-family: 'AquilineTwoRegular';
    src: url('AquilineTwo-webfont.eot');
    src: url('AquilineTwo-webfont.eot?#iefix') format('embedded-opentype'),
         url('AquilineTwo-webfont.woff') format('woff'),
         url('AquilineTwo-webfont.ttf') format('truetype'),
         url('AquilineTwo-webfont.svg#AquilineTwoRegular') format('svg');
    font-weight: normal;
    font-style: normal;

}

Later, the font can be set using .attr("font-family","AquilineTwoRegular")

Loading the layers

To achieve the effects, some layers are loaded more than once, so different filters can be applied to get the shades and blurred borders:
svg.selectAll(".bgback")
    .data(topojson.feature(data, data.objects.land).features)
  .enter()
    .append("path")
      .attr("class", "bgback")
      .attr("d", path)
      .style("filter","url(#oglow)")
      .style("stroke", "#999")
      .style("stroke-width", 0.2);
In this case, the land masses are drawn, applying the effect named oglow, which looks like:
var oglow = defs.append("filter")
  .attr("id","oglow");
  oglow.append("feColorMatrix")
    .attr("in","SourceGraphic")
    .attr("type", "matrix")
    .attr("values", "0 0 0 0 0   0 0 0 0 0   0 0 0 0 0   0 0 0 1 0")
    .attr("result","mask");
  oglow.append("feMorphology")
    .attr("in","mask")
    .attr("radius","1")
    .attr("operator","dilate")
    .attr("result","mask");
  oglow.append("feColorMatrix")
    .attr("in","mask")
    .attr("type", "matrix")
    .attr("values", "0 0 0 0 0.6 0 0 0 0 0.5333333333333333 0 0 0 0 0.5333333333333333  0 0 0 1 0")
    .attr("result","r0");
  oglow.append("feGaussianBlur")
    .attr("in","r0")
    .attr("stdDeviation","4")
    .attr("result","r1");
  oglow.append("feComposite")
    .attr("operator","out")
    .attr("in","r1")
    .attr("in2","mask")
    .attr("result","comp");
Many pages explain how SVG filters work. I got these ones by looking at the HTML generated by the Kartograph example.

Adding the labels

The labels aren't inside the TopoJSON (although they could be!), so I chose the labels to add and put them into an array:
var cities = [ 
                {'pos': [2.351, 48.857], 'name': 'Paris'},
                {'pos':[5.381, 43.293], 'name': 'Marseille'},
                {'pos':[3.878, 43.609], 'name': 'Montpellier'},
                {'pos':[4.856, 45.756], 'name': 'Lyon'}, 
                {'pos':[1.436, 43.602], 'name': 'Toulouse'},
                {'pos':[-0.566, 44.841], 'name': 'Bordeaux'},
                {'pos':[-1.553, 47.212], 'name': 'Nantes'},
                {'pos':[8.737, 41.925], 'name': 'Ajaccio'},
              ];
Then, adding them to the map is easy:
var city_labels = svg.selectAll(".city_label")
    .data(cities)
    .enter();

  city_labels
    .append("text")
    .attr("class", "city_label")
    .text(function(d){return d.name;})
    .attr("font-family", "AquilineTwoRegular")
    .attr("font-size", "18px")
    .attr("fill", "#544")
    .attr("x",function(d){return projection(d.pos)[0];})
    .attr("y",function(d){return projection(d.pos)[1];});


  city_labels
    .append("circle")
    .attr("r", 3)
    .attr("fill", "black")
    .attr("cx",function(d){return projection(d.pos)[0];})
    .attr("cy",function(d){return projection(d.pos)[1];});
Note that the positions must be calculated transforming the longitude and latitude using the d3js projection functions.

The ship

To draw the ship, two things are necessary: the path and the ship itself.
To draw the path:
var ferry_path = [[8.745, 41.908],
                  [8.308, 41.453],
                  [5.559, 43.043], 
                  [5.268, 43.187], 
                  [5.306, 43.289]
                  ];
  var shipPathLine = d3.svg.line()
    .interpolate("cardinal")
    .x(function(d) { return projection(d)[0]; })
    .y(function(d) { return projection(d)[1]; });

  var shipPath = svg.append("path")
    .attr("d",shipPathLine(ferry_path))
    .attr("stroke","#000")
    .attr("class","ferry_path");
Basically, d3.svg.line is used to interpolate the points, making the line smoother. This is easier than the Kartograph way with geopaths, where the Bézier control points have to be calculated. d3.svg.line is more amazing than I thought.
I don't know if the way to calculate the projected points is the best one, since I do it twice for each point, which is ugly.
To move the ship, a ship image is appended, and then moved with a setInterval:
  var shipPathEl = shipPath.node();
  var shipPathElLen = shipPathEl.getTotalLength();

  var pt = shipPathEl.getPointAtLength(0);
  var shipIcon = svg.append("image")
          .attr("xlink:href","ship.png")
          .attr("x", pt.x - 10)
          .attr("y", pt.y - 5.5)
          .attr("width", 15)
          .attr("height", 8);

  var i = 0;
  var delta = 0.05;
  var dist_ease = 0.2;
  var delta_ease = 0.9;
  setInterval(function(){
    
    pt = shipPathEl.getPointAtLength(i*shipPathElLen);
    shipIcon
      .transition()
      .ease("linear")
      .duration(1000)
      .attr("x", pt.x - 10)
      .attr("y", pt.y - 5.5);
    
    //i = i + delta;

    if (i < dist_ease){
      i = i + delta * ((1-delta_ease) + i*delta_ease/dist_ease);
    }else if (i > 1 - dist_ease){
      i = i + delta * (1 - ((i - (1 - dist_ease)) * (delta_ease/dist_ease)));
    }else{
      i = i + delta;
    }
    if (i+0.0001 >= 1 || i-0.0001 <= 0)
      delta = -1 * delta;
  },1000);
The ship position is calculated every second and moved with a d3js transition to make it smooth (calculating everything more often didn't give this smooth effect).
The speed of the ship is changed depending on the proximity to the harbour, to avoid the strange effect of the ship crashing into it. The effect is controlled by the dist_ease and delta_ease parameters, which set the distance over which the speed changes and the amount it changes by.
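To see the effect of the two parameters, here is the speed factor re-expressed in Python with the default values from the code above (just a sketch to show the shape, not part of the example's code):
def speed_factor(i, dist_ease=0.2, delta_ease=0.9):
    '''Fraction of the full speed at path position i (0 = start, 1 = end).'''
    if i < dist_ease:                    # speeding up when leaving the harbour
        return (1 - delta_ease) + i * delta_ease / dist_ease
    elif i > 1 - dist_ease:              # slowing down when arriving
        return 1 - (i - (1 - dist_ease)) * (delta_ease / dist_ease)
    return 1.0                           # full speed in open water

for i in (0.0, 0.1, 0.5, 0.9, 1.0):
    print("%.1f -> %.2f" % (i, speed_factor(i)))  # 0.10, 0.55, 1.00, 0.55, 0.10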

What's next

  • The SVG filters should be explained in a better way, maybe packing them into functions as Kartograph does.
  • SVG rendering takes quite a long time on my computer. The same happens with Kartograph, so the problem comes from the SVG rendering itself. Anyway, it could be improved.
  • A canvas version would be nice.

Links

La Bella Italia -- The example I have used as a model
Gregor Aisch's home page
Natural Earth



Sunday, November 24, 2013

Fast tip: Drawing raster NO DATA values with MapServer

MapServer is a great way to draw web maps, using it either as a WMS server or creating image files directly.
If the raster has a NODATA value, MapServer reads it from the GDAL library and doesn't draw the pixels with that value. But what happens if you want to draw the pixels with NODATA values?

Fast answer 

The solution, that I found here, is setting the LAYER property PROCESSING in this way:

PROCESSING "NODATA=OFF"
which disables the regular MapServer behaviour, which is, quite logically, not drawing the pixels with the NODATA value.
Once this is done, just create a CLASS with the NODATA value in the EXPRESSION tag, so the pixels are coloured as you prefer.
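Putting both pieces together, the LAYER definition would look something like this (a minimal sketch; the layer name, file name, NODATA value and colour are hypothetical):
LAYER
  NAME "radar"
  TYPE RASTER
  STATUS ON
  DATA "radar.tif"
  PROCESSING "NODATA=OFF"
  CLASS
    # Paint the pixels holding the NODATA value (assumed -999 here) in gray
    EXPRESSION ([pixel] = -999)
    STYLE
      COLOR 128 128 128
    END
  END
END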

Why did I need this stuff

Maybe this is not a common need, but when drawing weather radar images, it's a good idea to indicate the areas not covered by the radar in some colour, so it's easy to distinguish the zones where there is no rain from the zones where no data is available. Like in these examples:

The Catalan Weather Service radar in La Panadella, with the range visible in gray using the NODATA values
http://www.aemet.es/ca/eltiempo/observacion/radar?w=0
The Spanish Meteorological Agency radar network, with the non-covered areas in gray.

Strangely, finding the solution was quite difficult, so I've put it in this post. Maybe somebody, someday, will find it faster than I did.


Saturday, October 26, 2013

Using EUROSTAT's data with D3.js

EUROSTAT is the European Union office for statistics. Its goal is to provide statistics at European level that enable comparisons between regions and countries.

The NUTS classification (Nomenclature of territorial units for statistics) is a hierarchical system for dividing up the economic territory of the EU. Much of the EUROSTAT data is relative to these regions, and this data is a good candidate to be mapped.

The problem is that EUROSTAT provides shapefiles on the one hand, without the actual names for the regions or any other information, and tables with the name and population for every region on the other. These have to be joined to be able to map properly.

In this example, the EUROSTAT criminality statistics are mapped using these provided files.

You can see the working example at bl.ocks.org, and get all the source code at GitHub.
There is also a second example to see how to change the data easily.

Organizing the data

At their site, EUROSTAT provides different Shapefiles for different scales. I think that the best option for maps of all Europe is to use NUTS_2010_10M_SH.zip. Using the 1:60 million scale could be an option, but the file doesn't contain the NUTS3 polygons, so only data at country or big region level is possible.

After downloading the polygons, you will see that no names or population are inside the fields, only the NUTS code. So to make a map, we have to label every polygon. To do it, I downloaded the NUTS2010.zip file, which contains a CSV file with the names for every NUTS region. On the site, it's also possible to find the population in a file, looking for the text Annual average population (1 000) by sex - NUTS 3 regions. Then, I made the following script:

'''
Joins a shp file from EUROSTAT with the xls with the region names so
the resulting shp file has the name of every region and the code.
Roger Veciana, oct 2013
'''
from osgeo import ogr
from os.path import exists
from os.path import basename
from os.path import splitext
from os import remove
import sys

def create_shp(in_shp, out_shp, csv_data):
    print "Extracting data"
    
    in_ds = ogr.Open( in_shp )
    if in_ds is None:
        print "Open failed.\n"
        sys.exit( 1 )
    in_lyr = in_ds.GetLayerByName( splitext(basename(in_shp))[0] )
    if in_lyr is None:
        print "Error opening layer"
        sys.exit( 1 )

    if exists(out_shp):
        remove(out_shp)
    driver_name = "ESRI Shapefile"
    drv = ogr.GetDriverByName( driver_name )
    if drv is None:
        print "%s driver not available.\n" % driver_name
        sys.exit( 1 )
    out_ds = drv.CreateDataSource( out_shp )
    if out_ds is None:
        print "Creation of output file failed.\n"
        sys.exit( 1 )
    proj = in_lyr.GetSpatialRef()
    ##Creating the layer with its fields
    out_lyr = out_ds.CreateLayer( 
        splitext(basename(out_shp))[0], proj, ogr.wkbMultiPolygon )
    lyr_def = in_lyr.GetLayerDefn ()
    for i in range(lyr_def.GetFieldCount()):
        out_lyr.CreateField ( lyr_def.GetFieldDefn(i) )

    field_defn = ogr.FieldDefn( "NAME", ogr.OFTString )
    out_lyr.CreateField ( field_defn )

    field_defn = ogr.FieldDefn( "COUNTRY", ogr.OFTString )
    out_lyr.CreateField ( field_defn )

    field_defn = ogr.FieldDefn( "COUNTRY_CO", ogr.OFTString )
    out_lyr.CreateField ( field_defn )

    field_defn = ogr.FieldDefn( "POPULATION", ogr.OFTInteger )
    out_lyr.CreateField ( field_defn )
    

    ##Writing all the features
    in_lyr.ResetReading()
    for feat_in in in_lyr:
        value = feat_in.GetFieldAsString(feat_in.GetFieldIndex('NUTS_ID'))
        if value in csv_data:
            #print csv_data[value]
            feat_out = ogr.Feature( out_lyr.GetLayerDefn())
            feat_out.SetField( 'NUTS_ID', value )
            feat_out.SetField( 'NAME', csv_data[value]['label'] )
            feat_out.SetField( 'POPULATION', csv_data[value]['population'] )
            feat_out.SetField( 'COUNTRY_CO', csv_data[value]['id_country'] )
            feat_out.SetField( 'COUNTRY', csv_data[csv_data[value]['id_country']]['label'] )
            feat_out.SetField( 'STAT_LEVL_', csv_data[value]['level'] )
            feat_out.SetField( 'SHAPE_AREA', feat_in.GetFieldAsString(feat_in.GetFieldIndex('SHAPE_AREA')) )
            feat_out.SetField( 'SHAPE_LEN', feat_in.GetFieldAsString(feat_in.GetFieldIndex('SHAPE_LEN')) )

            feat_out.SetGeometry(feat_in.GetGeometryRef())
            out_lyr.CreateFeature(feat_out)
    in_ds = None
    out_ds = None   

def read_population(csv_file):
    '''
    Reads the NUTS csv population file and returns the data in a dict
    '''
    csv_data = {}
    f = open(csv_file, "r")
    f.readline(); #Skip header
    for line in f:
        line_data = line.split(";")
        try:
            csv_data[line_data[0]] = int(float(line_data[4]) * 1000)
        except Exception, e:
            csv_data[line_data[0]] = -9999
    f.close()

    return csv_data
def read_names(csv_file, population_data):
    '''
    Reads a NUTS csv file and returns the data in a dict
    '''
    csv_data = {}
    f = open(csv_file, "r")
    f.readline(); #Skip header
    for line in f:
        line_data = line.split(";")
        csv_data[line_data[0]] = {'label': line_data[1],
            'level': line_data[2],
            'id_country': line_data[3],
            'code_country': line_data[4],
            'population': 0}
        if line_data[0] in population_data:
            csv_data[line_data[0]]['population'] = population_data[line_data[0]]

    f.close()

    return csv_data

if __name__ == '__main__':
    population_data = read_population('demo_r_d3avg_1_Data.csv')
    csv_data = read_names('NUTS_2010.csv', population_data)
    create_shp('NUTS_2010_10M_SH/Data/NUTS_RG_10M_2010.shp', 'regions.shp', csv_data)
  • Basically, we iterate over every polygon in the shp file and look into the CSV data for the attributes to add.
  • Population data and region names are in two separate files. To read them, there are two functions: read_population and read_names.
  • The create_shp function reads the geometries, joins them with the population and name, and finally writes the output file.
  • Notice that NUTS regions overlap. There are levels 1, 2 and 3. The first one is at country level, the second takes big areas of the countries, and the third holds the actual departments of every country. To map any data, it's necessary to choose the level at which the map has to be represented, or to mix the different levels if necessary, as in this case.
To create a map, I chose the criminality data. The file is again at EUROSTAT, under the text Crimes recorded by the police - NUTS 3 regions. Despite its name, some regions are NUTS3 and some NUTS2. Most of the files from the site labelled with NUTS regions should work with only small changes in the map code.

Creating a D3.js map to show the data

To draw the data in a map I've used a similar solution as the one in the post D3js Electoral Map:

Crimes recorded by the police - NUTS 3 regions

  • Lines 5 to 18 give style to the map background and the tooltip
  • Lines 29 to 43 set up the map, with its size, projection and svg element
  • Line 45 sets the color scale (this is different from the electoral map example). The colors will vary from blue to red. Discrete colors are another option. Just by changing this line, you can change the whole map's colors.
  • Lines 47 to 53 read and parse the csv with the criminality data into an object (we will only use 2010 data)
  • Line 55 downloads the TopoJSON containing the NUTS regions
  • Then, in lines 56 to 78, the map is actually drawn. Note:
    • Line 60 filters only the regions that have data in the CSV file. For some countries, this is the NUTS2 region (see Italy) and for others, NUTS3 (like Germany). The following lines are only run if there is data (see D3js filter)
    • Line 66 sets the color of the region. Note that it calls the scale defined at line 45, and calculates the value by multiplying the crimes by 100000 and dividing by the region population, to get the rate per 100,000 inhabitants
    • Line 71 sets the region opacity to 0 if there is no data for it. With the filter, this could be avoided
    • Lines 78 and 81 set the tooltip box, taken from this example.
  • Line 98 calls the function that draws the legend. The function is in a separate file to make it easy to reuse.
 
var legend = function(accessor){
    return function(selection){
      
      //Draw the legend for the map
      var legend = selection.append("g");
      var legendText = accessor.title;
      var numSquares = accessor.elements;
      var xLegend = 0;
      var yLegend = 0;
      var widthLegend = 400;
      
      var title_g = legend.append("g");
      var values_g = legend.append("g");
      
      var legendTitle = title_g.append("text")
        .text("Legend")
        .attr("font-family", "sans-serif")
        .attr("font-size", "18px")
        .attr("fill", "black");
        var bbox = legendTitle.node().getBBox();
        legendTitle.attr("x", xLegend + widthLegend/2 - bbox.width/2);
        legendTitle.attr("y", yLegend + 20);
        
      var legendTextEl = title_g.append("text")
        .text(legendText)
        .attr("y", yLegend + 75)
        .attr("font-family", "sans-serif")
        .attr("font-size", "14px")
        .attr("fill", "black");
      var bbox = legendTextEl.node().getBBox();
      legendTextEl.attr("x", xLegend + widthLegend/2 - bbox.width/2);

      for (i=0; i < numSquares; i++){
        values_g.append("rect")
          .attr("x", xLegend + (i+0.1*i/numSquares)*(widthLegend/numSquares))
          .attr("y", yLegend + 25)
          .attr("width", (widthLegend/numSquares)*0.9)
          .attr("height", 20)
          .attr("fill", color(accessor.domain[0] + i * accessor.domain[1]/(numSquares-1)));
        var value_text = values_g.append("text")
          .text(accessor.domain[0] + i * accessor.domain[1]/(numSquares-1))
          .attr("y", yLegend + 55)
          .attr("font-family", "sans-serif")
          .attr("font-size", "12px")
          .attr("fill", "black");
        var bbox = value_text.node().getBBox();
        value_text
          .attr("x", xLegend + (i+0.1*i/numSquares)*(widthLegend/numSquares) + (widthLegend/numSquares)*(0.9/2)- bbox.width/2);

      }
      var scale = accessor.width / 400;
      legend.attr("transform","translate("+accessor.posx+","+accessor.posy+") scale("+scale+") ");

};
};

  • Note how the function is called. The argument accessor is the one that has the object properties (if called from a geometry, for instance), and selection is the node the function is called from.
  • The code is not very D3js-like, using things like a for loop, but it works and can be reused for drawing other legends from a D3js scale.

Saturday, October 5, 2013

Castor project's earthquakes map with D3js

The Castor project is a submarine natural gas storage facility located in front of the eastern Iberian Peninsula coast. The idea is to store the Algerian gas inside an old oilfield cavity. At least this is what I understood (sorry, geologists).
Somehow, when the facility started working, a series of earthquakes started to occur. At the beginning, the platform owners said it wasn't related to their activity, but now everybody agrees that it is. The activity has stopped, but not the earthquakes.

I didn't find a nice map of the earthquake epicenters, so I thought that D3js would be a good option to make one.


The star represents the approximate platform location, and the circles, the epicenters. It's easy to see why they are related to the platform activity.



The animated map is at my bl.ocks.org page. The explanations are in Catalan, but basically say the same as here.

Getting the data

The data about the significant earthquakes around Catalonia can be found at the Catalan Geologic Institute web site, but the format of the reports is not very convenient for extracting the data, so I made this Python script to get it:

# -*- coding: utf-8 -*-
#http://jramosgarcia.wordpress.com/2013/10/01/que-es-el-proyecto-castor/
import urllib
import json
import dateutil.parser
import datetime

def get_data(url):
    filehandle = urllib.urlopen(url)
    html = filehandle.read()
    filehandle.close()

    lines = html.splitlines()

    for i in range(len(lines)):
    
        if lines[i].find('Latitud') > 0:
            lat = lines[i].strip().split(" ")[1].replace("º","")
        if lines[i].find('Longitud') > 0:
            lon = lines[i].strip().split(" ")[1].replace("º","")   
        if lines[i].find('mol del dia') > 0:
            date = lines[i + 1].strip().replace("</div>","")
        if lines[i].find('Hora origen') > 0:
            hour = lines[i].strip().split(" ")[4]                   
        if lines[i].find('Magnitud') > 0:
            magnitude = lines[i+1].strip().split(" ")[0]

    date_array  = date.split("/")
    hour_array = hour.split(":")

    date_time = datetime.datetime(int(date_array[2]),int(date_array[1]),int(date_array[0]),int(hour_array[0]), int(hour_array[1]))

    data = {'lat':lat, 'lon':lon, 'datetime': date_time.isoformat(), 'magnitude': magnitude}
    return json.dumps(data)

if __name__ == "__main__":
    url_list = [
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130910175510/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130913095842/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130918142943/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130920104607/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130924091301/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130925125029/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130929084140/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130929192416/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130930005900/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130930051316/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131001045206/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131001055709/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131002121626/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131002232928/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131003012732/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131003031301/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131004113222/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131004120323/comact.html'
    ]

    f = open("data.json","w")
    f.write("[")
    json_data = ""
    for url in url_list:
        json_data = json_data + get_data( url ) + ", "
       
    f.write(json_data[0:-2])
    f.write("]")
    f.close()
   


  • The web pages are in the list, since the interesting reports have to be chosen one by one. It would be nice to have a better way to do it.
  • Then, the get_data function just parses the text in a way that handles all the reports properly. The data is stored in a JSON file to make it easier to use from D3js.

Using D3js to visualize the data

I used this example by Mike Bostock to create the background map. The tiles origin has been changed because the example tiles are not available at this zoom level, and to have more points of interest to locate the earthquake epicenters.

This is the code:





<script>
/**
Based on Mike Bostock's http://bl.ocks.org/mbostock/4150951
*/
var width = 960,
    height = 500;

var projection = d3.geo.mercator()
    .center([0.5491, 40.4942])
    .scale(20000);

var path = d3.geo.path()
    .projection(projection);

var tile = d3.geo.tile()
    .scale(projection.scale() * 2 * Math.PI)
    .translate(projection([0, 0]))
    .zoomDelta((window.devicePixelRatio || 1) - .5);

var svg = d3.select("body").append("svg")
    .attr("width", width)
    .attr("height", height);

  var tiles = tile();

  var defs = svg.append("defs");

var magnitudeScale = d3.scale.linear().domain([2,5]).range([5,30]);
d3.json("data.json", function(error, locations) {

  svg.append("g")
      
    .selectAll("image")
      .data(tiles)
    .enter().append("image")
      .attr("xlink:href", function(d) { return "http://" + ["a", "b", "c", "d"][Math.random() * 4 | 0] + ".tiles.mapbox.com/v3/examples.map-vyofok3q/" + d[2] + "/" + d[0] + "/" + d[1] + ".png"; })
      .attr("width", Math.round(tiles.scale) + 1)
      .attr("height", Math.round(tiles.scale) + 1)
      .attr("x", function(d) { return Math.round((d[0] + tiles.translate[0]) * tiles.scale); })
      .attr("y", function(d) { return Math.round((d[1] + tiles.translate[1]) * tiles.scale); });

  svg.append("g")
      .append('path')
      .attr("d","m 0,0 -8.47858,-5.22254 -8.31623,5.47756 2.34696,-9.67752 -7.77927,-6.21653 9.92909,-0.75852 3.50829,-9.31953 3.78972,9.20873 9.94748,0.45679 -7.58687,6.44982 z")
      .attr("stroke","black")
      .attr("stroke-width",2)
      .style("fill", d3.rgb(90, 90, 90))
      .attr("transform", "translate("+projection([0.66879, 40.33503])[0]+","+projection([0.66879, 40.33503])[1]+")");
  
  var locationsG = svg.append("g");
  

  function addLocation(loc){
    locationsG.append('circle')
      .attr('class','location')
      .attr("r", 5)
      .attr("cx", projection([loc.lon, loc.lat])[0])
      .attr("cy", projection([loc.lon, loc.lat])[1])
      .style("fill", d3.rgb(255, 0, 0).darker(2))
      .style("opacity", 0.8)
      .transition()
      .duration(1000)
      .attr("r", magnitudeScale(loc.magnitude))
      .transition()
      .delay(2000)
      .duration(2000)
      .style("opacity",0.3);

    locationsG
      .append("text")
      .text(loc.magnitude)
      .attr("x", projection([loc.lon, loc.lat])[0] - 10)
      .attr("y", projection([loc.lon, loc.lat])[1] + 5)
      .attr("font-family", "sans-serif")
      .attr("font-size", "12px")
      .attr("fill", "black")
      .style("opacity",0)
      .transition()
      .duration(1000)
      .style("opacity",1)
      .transition()
      .delay(2000)
      .duration(2000)
      .style("opacity",0);
  }

  //addLocation({"lat": "40.43", "magnitude": "2.7", "lon": "0.7", "datetime": "2013-10-09T06:43:16"});
  
  var intDate = new Date("2013-09-10T00:00:00Z");
  var maxDate = new Date("2013-10-04T00:00:00Z");
  var usedLocations = new Array();

  var dateTitle = svg
    .append("text")
    .attr("id", "dataTitle")
    .text(intDate.toLocaleDateString())
    .attr("x", 70)
    .attr("y", 20)
    .attr("font-family", "sans-serif")
    .attr("font-size", "20px")
    .attr("fill", "black");

  var interval = setInterval(function() {

    dateTitle.text(intDate.toLocaleDateString());

    for (i = 0; i < locations.length; i++){
      var quakeDate = new Date(locations[i].datetime);
      // Plot the earthquakes that occurred up to the current animation date,
      // skipping the ones already plotted
      if (quakeDate <= intDate && usedLocations.indexOf(i) < 0){
        addLocation(locations[i]);
        usedLocations.push(i);
      }
      if (intDate > maxDate){
        clearInterval(interval);
      }

    }
    
    intDate.setDate(intDate.getDate() + 1);
  }, 1000);
  
});



  • Lines 30 and 48 are the most important ones for creating the tile map. In this case, no zoom or pan is set, so the code is quite simple.
  • Line 57 creates the star indicating the platform location. I made the star using Inkscape and captured the path code by looking at the generated SVG. To situate the symbol on the map, a transform attribute is used, translating the symbol to the pixel calculated by projecting the location coordinates.
  • To add the earthquake epicenters:
    • The function addLocation (line 68) adds the circle and the text indicating the earthquake magnitude. To locate them, the projection function is used again. Two transitions are used to make the circle grow and then fade. The same is done to make the text disappear.
    • An interval is created (line 118) to change the date (one day per second). On every iteration, the date is evaluated and the earthquakes that occurred up to that day are plotted using the addLocation function. I didn't find a "more D3js" solution. The earthquakes already plotted are stored in an array to avoid plotting them more than once.
      The date on the map is also updated on every iteration.


Thursday, September 5, 2013

Reading WRF NetCDF files with GDAL python

Since I work at a meteorological service, I have to deal quite often with numerical weather prediction models. I usually work with GRIB files, which have the data in the format I need to draw simple maps.
But sometimes I need to calculate stuff that requires the original data, since GRIB files are usually reprojected and simplified in vertical levels.
In the case of the WRF model, the output is in NetCDF format. The data is in sigma levels instead of the pressure levels I need, and variables such as temperature don't come directly in the file, but need to be calculated.
This post is actually written to organize the things I've been using myself, but I've added it to the blog because I haven't found step by step instructions elsewhere.

Getting the data

As usual, you can download the script from GitHub
I have used some data from the model run at the SMC, where I work, but you can get a WRF output from the ucar.edu web site. The projection is different from the one I have used, and it has 300 sigma levels instead of 30, but it should work.

The NetCDF format


Every NetCDF file contains a collection of datasets with different sizes and types. Each dataset can have one or more layers, can have unidimensional arrays in it, etc. So the "traditional" GDAL data model can't work directly, since it assumes that a file has multiple layers of the same size. That's why "subdatasets" are introduced. Opening the file directly (or running gdalinfo on it) only gives some metadata about all the datasets, but no actual data can be read. To do that, a subdataset must be opened instead.

Opening a subdataset

Since opening the file directly doesn't open the actual dataset, but the "container", a different notation is used, keeping the Open method: the filename string must be preceded by NETCDF: and followed by :VAR_NAME.
To open the XLONG variable subdataset, the call would be:
gdal.Open('NETCDF:"'+datafile+'":XLONG')
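For instance, to list all the subdatasets before opening one (a short sketch, where datafile is the path to the WRF output file):
from osgeo import gdal

ds = gdal.Open(datafile)  # opens the "container", not the actual data
for name, description in ds.GetSubDatasets():
    print(name + " -> " + description)

# Open a single subdataset using the special notation
ds_lon = gdal.Open('NETCDF:"' + datafile + '":XLONG')
print(ds_lon.RasterCount)  # one band per time step, in this case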

ARWPost

ARWPost is a FORTRAN program that reads WRF outputs. I have used its code to learn how to calculate the actual model variables from the model NetCDF file.
There is a file for calculating each variable, some for the interpolation, etc., so it has helped me a lot when trying to understand how everything works.
The problem with ARWPost is that its usage is quite complicated (changing a configuration file and executing it), so it was not very practical for me.

Sigma Levels

Instead of using constant pressure levels, the WRF model works using the sigma coordinate system.
This makes it a bit more difficult to extract the data at a constant pressure, since for every point in the horizontal grid, a different level position is needed to get the value.

Besides, not all the variables use the same sigma levels. There are two different level sets known as "half levels" and "full levels", as shown in the picture:
Image adapted from the Wikipedia Sigma Levels entry

Of course, this makes things a bit more complicated.

A layer's sigma value at a half level is the average of the full levels above and below it.
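For instance, with NumPy the half levels can be derived from the full levels like this (a sketch with made-up sigma values):
import numpy

# Hypothetical full-level sigma values, from the surface (1.0) upwards
full_levels = numpy.array([1.0, 0.99, 0.97, 0.94, 0.90])

# Each half level is the average of the two full levels around it
half_levels = 0.5 * (full_levels[:-1] + full_levels[1:])
print(half_levels)  # [ 0.995  0.98   0.955  0.92 ]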

Getting the geopotential height at a pressure level, step by step

Setting the right projection

The WRF NetCDF files can use the following projections:
'lambert', 'polar', 'mercator', 'lat-lon' and 'rotated_ll'. The projection is defined by the MAP_PROJ label in the original file's metadata (not the subdatasets' metadata). To get all the file metadata, just use:
ds_in = gdal.Open(datafile)
metadata = ds_in.GetMetadata()

All the metadata comes with the prefix NC_GLOBAL#.

In my case, I have MAP_PROJ=2, so the projection is polar stereographic (the code is the position in the projection list, starting at 1). I'll set it with:

proj_out = osr.SpatialReference()
proj_out.SetPS(90, -1.5, 1, 0, 0)

where -1.5 is the STAND_LON label in the original file metadata and 90 is the POLE_LAT label.
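To translate the code programmatically, using the projection list above and the metadata dictionary from the previous snippet (just a convenience sketch):
# The MAP_PROJ code is the 1-based position in the projection list
projections = ['lambert', 'polar', 'mercator', 'lat-lon', 'rotated_ll']
proj_name = projections[int(metadata['NC_GLOBAL#MAP_PROJ']) - 1]
print(proj_name)  # 'polar' when MAP_PROJ=2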

Finding the geotransform

I'm not sure if the method I use is the best one, but the data I get is consistent.
The metadata in the main file doesn't set the geotransform properly (or, at least, I'm not able to get it from there), since it only gives the center point coordinates and the deltax and deltay. But these deltas are not referenced to the same point, so getting the real geotransform from them is not easy.

Fortunately, the files come with the fields XLONG and XLAT, which contain the longitude and latitude for each point in the matrix. So taking the corners, it's possible to compute the geotransform, although the coordinates have to be reprojected first:
ds_lon = gdal.Open('NETCDF:"'+datafile+'":XLONG')
ds_lat = gdal.Open('NETCDF:"'+datafile+'":XLAT')

longs = ds_lon.GetRasterBand(1).ReadAsArray()
lats = ds_lat.GetRasterBand(1).ReadAsArray()
ds_lon = None
ds_lat = None
proj_gcp = osr.SpatialReference()
proj_gcp.ImportFromEPSG(4326)
transf = osr.CoordinateTransformation(proj_gcp, proj_out)
ul = transf.TransformPoint(float(longs[0][0]), float(lats[0][0]))
lr = transf.TransformPoint(float(longs[len(longs)-1][len(longs[0])-1]), float(lats[len(longs)-1][len(longs[0])-1]))
ur = transf.TransformPoint(float(longs[0][len(longs[0])-1]), float(lats[0][len(longs[0])-1]))
ll = transf.TransformPoint(float(longs[len(longs)-1][0]), float(lats[len(longs)-1][0]))
gt0 = ul[0]
gt1 = (ur[0] - gt0) / len(longs[0])
gt2 = (lr[0] - gt0 - gt1*len(longs[0])) / len(longs)
gt3 = ul[1]
gt5 = (ll[1] - gt3) / len(longs)
gt4 = (lr[1] - gt3 - gt5*len(longs) ) / len(longs[0])
geotransform = (gt0,gt1,gt2,gt3,gt4,gt5)

Getting the geopotential height

From the ARWPost code, I know that the geopotential height is:

H = (PH + PHB)/9.81

So the PH and PHB subdatasets must be added and divided by the gravity to get the actual geopotential height

The code:
ds_ph = gdal.Open('NETCDF:"'+datafile+'":PH')
ds_phb = gdal.Open('NETCDF:"'+datafile+'":PHB')
num_bands = ds_ph.RasterCount
data_hgp = numpy.zeros(((num_bands, ds_ph.RasterYSize, ds_ph.RasterXSize)))
for i in range(num_bands):
    data_hgp[i,:,:] = ( ds_ph.GetRasterBand(num_bands - i).ReadAsArray() + ds_phb.GetRasterBand(num_bands - i).ReadAsArray() ) / 9.81
ds_ph = None
ds_phb = None

Note that the bands are read in inverted order and that, in the raster, the order of the elements is [layer][y][x]. This is needed later to calculate the layer positions.

Calculating the geopotential height for a given Air Pressure

At this moment, we have the geopotential height for every sigma level, which is not very useful for working with the data. People prefer to have the geopotential height at a given pressure (850 hPa, 500 hPa, and so on).

The model gives the pressure values at every point, calculated doing:
Press = P + PB
in Pa or
Press = (P + PB ) / 100
in hPa

The code for reading the pressure:
ds_p = gdal.Open('NETCDF:"'+datafile+'":P')
ds_pb = gdal.Open('NETCDF:"'+datafile+'":PB')
num_bands = ds_p.RasterCount
data_p = numpy.zeros(((num_bands, y_size, x_size)))
for i in range(num_bands):
    data_p[i,:,:] = ( ds_p.GetRasterBand(num_bands - i).ReadAsArray() + ds_pb.GetRasterBand(num_bands - i).ReadAsArray() ) / 100
ds_p = None
ds_pb = None

Now we can calculate at which layer of the pressure subdataset the given pressure would fall.
NumPy has a function, searchsorted, that calculates at which position a value would fall in a sorted array: http://docs.scipy.org/doc/numpy/reference/generated/numpy.searchsorted.html
numpy.searchsorted([1,2,3,4,5], 2.5)

would give
2, since the value 2.5 is between 2 and 3 (positions 1 and 2).
I needed two tricks to use it:
  • The function needs an array sorted in ascending order. That's why the layer order is reversed in the code: the air pressure descends with height, and we need the opposite to use this function.
  • The function can only run on 1-d arrays. To use it on our 3-d array, the numpy.apply_along_axis function is used:
numpy.apply_along_axis(lambda a: a.searchsorted(p_ref), axis = 0, arr = data_p)
where p_ref is the pressure we want the geopotential height for.
That's also why the layer number is the first element in the array: it's the only way to get apply_along_axis to work.


The final code for the calculation:
p_ref = 500
data_positions = numpy.apply_along_axis(lambda a: a.searchsorted(p_ref), axis = 0, arr = data_p)

Now we know between which layers a given pressure falls for every point. To get the actual pressures at the upper and lower levels for each point, the numpy.choose function can be used:
p0 = numpy.choose(data_positions - 1, data_p)
p1 = numpy.choose(data_positions, data_p)
layer_p = (data_positions - 1) + (p_ref - p0) / (p1 - p0)

Now we know the position within the pressure layers, i.e. if we want 850 hPa, and for one point the upper layer 3 is 900 hPa and the lower layer 2 is 800 hPa, the value would be 2.5.
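The same calculation for a single point, with the numbers from this example (the pressures are already reversed, so they ascend with the index):
import numpy

pressures = numpy.array([600.0, 700.0, 800.0, 900.0, 1000.0])  # hPa, ascending
p_ref = 850.0

position = pressures.searchsorted(p_ref)          # 3: 850 falls between 800 and 900
p0 = pressures[position - 1]                      # 800 hPa, at layer 2
p1 = pressures[position]                          # 900 hPa, at layer 3
print((position - 1) + (p_ref - p0) / (p1 - p0))  # 2.5, as in the text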

But for the geopotential height, the sigma level is not the same, since the pressure is referenced to the half levels, and the geopotential to the full levels.
So we will add 0.5 layers to the pressure layer to get the geopotential height layer:

layer_h = layer_p + 0.5


Then the final value will be:
h0 = numpy.floor(layer_h).astype(int)
h1 = numpy.ceil(layer_h).astype(int)
data_out = ( numpy.choose(h0, data_hgp) * (h1 - layer_h) ) + ( numpy.choose(h1, data_hgp) * (layer_h - h0) )

Note that the h0 and h1 (the positions of the layers above and below the final position) must be integers to be used with the choose function.

Extrapolation

If the wanted pressure is 500 hPa, a data layer above and below will always be found. But closer to the ground, like at 850 hPa (another typical pressure at which to show the geopotential height), we find that at some points the pressure value is higher than any of the model's pressure layers. So an extrapolation is needed, and the previous code, which assumed that the value would always be between two layers, will fail.

So the data calculated above must be filtered for each case, keeping the interpolated values only where a layer above was found:
data_out = data_out * (data_positions < num_bands_p)

The extrapolation is defined in the file module_interp.f90 from the ARWPost source code. There are two defined cases:
  1. The height is below the lowest sigma level, but above the ground height defined for that point
  2. The height is below the ground level

The first case is quite simple, because the surface data is available for many variables, so it only changes the layer used to calculate the interpolation:
##If the pressure is between the surface pressure and the lowest layer
array_filter = numpy.logical_and(data_positions == num_bands_p, p_ref*100 < data_psfc)
zlev = ( ( (100.*p_ref - 100*data_p[-1,:,:])*data_hgt + (data_psfc - 100.*p_ref)*data_hgp[-1,:,:] ) / (data_psfc - 100*data_p[-1,:,:])) 
data_out = data_out + (zlev * array_filter )

The second case needs to extrapolate below the ground level. The code in the file module_interp.f90 iterates over each element of the array, but using NumPy increases the speed a lot, so let's see how it's done:
expon=287.04*0.0065/9.81
ptarget = (data_psfc*0.01) - 150.0

kupper = numpy.apply_along_axis(lambda a: numpy.where(a == numpy.min(a))[0][0] + 1, axis = 0, arr = numpy.absolute(data_p - ptarget))

pbot=numpy.maximum(100*data_p[-1,:,:], data_psfc)
zbot=numpy.minimum(data_hgp[-1,:,:], data_hgt)

tbotextrap=numpy.choose(kupper, data_tk) * pow((pbot/(100*numpy.choose(kupper, data_p))),expon)
tvbotextrap=tbotextrap*(0.622+data_qv[-1,:,:])/(0.622*(1+data_qv[-1,:,:]))

zlev = zbot+((tvbotextrap/0.0065)*(1.0-pow(100.0*p_ref/pbot,expon)))

data_out = data_out + ( zlev * array_filter )

  • expon and ptarget are calculated directly (ptarget is an array, but the operations are simple)
  • kupper is the level where the absolute value of the pressure minus ptarget is at its minimum. Since we have a 3d array, with the 0th dimension being the level, we can calculate the values in the 2d xy matrix by using the apply_along_axis method. This method takes all the levels and runs the function defined in the first parameter, passing the second parameter as the argument. In our case, the position (where) of the minimum value (min). We add the +1 to get the same result as the original code.
    So in one line, we do what the original code needs three nested loops for!
    The value is inverse to the one in the original code, because we changed the pressure order.
  • pbot and zbot just take the value closer to the ground between the lowest level and the ground level defined by the model
  • tbotextrap is the extrapolated temperature. The choose function is used to get the value at the calculated kupper level for each element in the matrix.
  • tvbotextrap is the virtual temperature for tbotextrap. The original code uses an external function, but I have joined everything. The formula for the virtual temperature is virtual=tmp*(0.622+rmix)/(0.622*(1.+rmix))
  • zlev can be calculated directly from the values calculated before, and is added to the output after applying the filter.

Drawing the map using Basemap

The output file can be opened with QGIS, or any other GIS capable of reading GeoTIFF files.

But it's possible to draw it directly from the script using Basemap:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
m = Basemap(projection='cyl',llcrnrlat=40.5,urcrnrlat=43,\
            llcrnrlon=0,urcrnrlon=3.5,resolution='h')


cs = m.contourf(longs,lats,data_out,numpy.arange(1530,1600,10),latlon=True)
m.drawcoastlines(color='gray')
m.drawcountries(color='gray')

cbar = m.colorbar(cs,location='bottom',pad="5%")
cbar.set_label('m')

plt.title('Geopotential height at 850hPa')
plt.show()

  • First, we initialize the projection (lat-lon directly in this case), setting the map limits.
  • contourf draws the isobands, one every 10 m.
  • Then, just draw the borders and coast lines and show the map.
This is how the map at the header was made.

Notes

To compile ARWPost on my Ubuntu, the file src/Makefile needs a change: -lnetcdf must be replaced with -lnetcdff.
It was very hard for me to find this!

Wednesday, August 28, 2013

Flag map with D3js

In this example I'll show how to draw a map filling the polygons with images; in this case, Western Africa countries filled with their flags:

The zone represented is this one:

As usual, you can get all the code at GitHub:


Canvas

The code basically uses Mike Bostock's queue.js to download all the flag images from Wikipedia, the world borders and the country names. Once everything is downloaded, a function is triggered that draws the flags, clipping them by the country shape. The first version is written using Canvas instead of SVG, just to see how the clipping and loading of external images is done.

The code:



  • I make an array with all the image locations, indexed by name to make it easier to understand, even though the map file doesn't contain names, only indexes.
  • Lines 37 to 47 configure the map projection and the Canvas element.
  • Lines 54 to 77 load all the elements. Since loading an image is not a function in D3js, a custom function has to be created, returning the error or the success value. If the images are not preloaded, the code will try to paste them anyway, and the country flag won't be drawn.
  • The function ready is triggered when everything is loaded, and it actually draws the map.
    • The error is handled in the first lines (very poorly, I know; a nodata image could be loaded, or something like that)
    • Until line 97, the base map is loaded (the land gets hidden later by the flags).
    • Lines 100 to 109 get the countries from the list, look for them in the countries file using a filter function, and create the images we are going to use later.
    • At line 111, all the flags are pasted on the map:
      • context.save(); saves what has been drawn so far, so new things can happen
      • Then, the country border is drawn in a path, without using any color or fill, and context.clip(); is called to use this path as the clip path.
      • The image is pasted using the drawImage method. The bounds are recalculated first for the countries that appear cut off at the map edge.
      • context.restore(); frees the clipping.
    • Finally, at line 130, the borders are drawn to make the map prettier.

SVG

The SVG version is simpler, because no image preloading is needed: the SVG contains only the image source, and the browser draws it when the image loads.
 


  • The code is almost the same, except that it doesn't preload the images with queue.js.
  • Obviously, the canvas tag is replaced by SVG, and the usual SVG mode is used to draw.
  • To clip, a clipPath is defined with the country shape and an id, so it can be assigned to an image.
  • The image is added using an image tag:
    • xlink:href sets the image link
    • clip-path sets the path used to clip the image
    • preserveAspectRatio has to be set to none, so the image ratio can be changed to make the flag fit the bounding box.

Notes:

To preload images using queue.js I asked this question at StackOverflow, kindly
answered by explunit.

Wednesday, August 14, 2013

Representing population density with D3js


Choropleth maps represent variables by colouring (or filling with a pattern) areas in the map.
We were discussing a map with the local OSGeo group in Barcelona, and we all noticed that it had a problem.
The map was about the percentage of people in Catalonia whose home language is Catalan. It showed most of the areas coloured as mostly Catalan-speaking zones, but since the greatest share of the population in Catalonia is around the city of Barcelona, where Spanish is more often the primary language, the map could give a wrong impression to people who don't know the population distribution in Catalonia.

Cartograms are meant to solve this problem, but the author of the map didn't like them, because when the population is very unevenly distributed, the result is really difficult to understand:
Map taken from Wikipedia's cartogram entry

So I tried a couple of things to see if the result was better. The example code is in a gist at GitHub, so it can be visualized at http://bl.ocks.org:

Resizing polygons according to the population density

Every comarca (the regions Catalonia is divided into) is coloured according to the percentage of people able to speak Catalan (yes, I couldn't find the number of people who have Catalan as their first language). When the button is pressed, the coloured polygons shrink according to the population density of the region:
The only comarca that keeps its original size is Barcelonès, since it's the one with the highest population density. You can see that the dark-green zones are sparsely populated, so they shrink a lot.
It's very difficult for me to see the color when a polygon gets too small, even though I didn't shrink them linearly but smoothed the scaling with a square root, so I don't think this is the best solution.
Since the code is very similar to the one in Mike Bostock's example, I won't comment on it.
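One way to write the square-root smoothing (a sketch; the density values are hypothetical):
from math import sqrt

def shrink_factor(density, max_density):
    '''Scale factor for a comarca: 1 for the densest one, smaller elsewhere.
    The square root softens the shrinking of the sparsely populated ones.'''
    return sqrt(float(density) / max_density)

print(shrink_factor(16000, 16000))  # 1.0 -> Barcelonès keeps its size
print(shrink_factor(160, 16000))    # 0.1 -> a sparsely populated comarca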

Filling with points according to the population density

This example on the Kartograph web page gave me the idea for the second example, although I changed it a little by using colors.
In this case, every polygon is filled with circles. The color of the circle represents the magnitude we are showing (% of people able to speak Catalan, in this case), and the size represents the population density.
I think the result is clearer in this case. The separation between circles and the maximum circle radius can be adjusted to get better results.
I will comment the code, since I had to write it from scratch.

Basically, for each polygon the bounding box is calculated and filled with a grid of dots; then the polygon is used to clip the rectangle of circles.
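The same idea, sketched in Python with shapely instead of SVG (not the example's actual code, just the geometry of it):
from shapely.geometry import Point, Polygon

def fill_with_points(polygon, separation):
    '''Cover the polygon bounding box with a grid and keep the inside points.'''
    minx, miny, maxx, maxy = polygon.bounds
    points = []
    y = miny
    while y <= maxy:
        x = minx
        while x <= maxx:
            if polygon.contains(Point(x, y)):
                points.append((x, y))
            x += separation
        y += separation
    return points

# A hypothetical triangle, filled with a point every 0.5 units
triangle = Polygon([(0, 0), (4, 0), (2, 3)])
print(len(fill_with_points(triangle, 0.5)))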

Point density according to the population density







  • The first lines set the pointSeparation and maxPointSize values.
  • Lines 20 to 31 set the projection and the SVG element.
  • Lines 32 to 47 get the data about the number of speakers and the population, and set the maximum and minimum values to be able to adjust colors and circle sizes.
  • Line 49 sets the color scale.
  • Lines 53 to 60 set the masks for each polygon by appending a clipPath.
  • Then, from 63 to 69, the points are drawn by calling the function draw_circles for each polygon. To call the function, the each method is used. The clip-path attribute links the masks set before.
  • Lines 74 to 93 contain the function draw_circles.
    • The bounding box and centroid are needed: the first one to get the width and height, and the second to move the grid to the region center.
    • Once the size is set, the grid is drawn with the two loops.
  • Lines 95 to 100 draw the borders.
  • Lines 102 to 120 draw the legend.

Notes

I created a gist to store all the data for both examples, since it's easier to re-use it.

Monday, August 5, 2013

Creating vectorial isobands with Python

GDAL has a utility program, gdal_contour, that generates contour lines from a raster file (take a look at this example post). GDAL provides access to this method through its API, so contour lines can be generated inside your GDAL Python scripts.

Unfortunately, there is no equivalent for generating isobands, that is, filled contours that use polygons instead of polylines. Isobands allow the easy creation of nice contour maps, colouring the zones between two values:
 
 
Using contour lines, the result is this one:



Both visualizations are drawn directly from QGIS.

So I created two scripts to get the result. As usual, you can download them from my GitHub account.
I created two scripts because the one I like the most uses matplotlib, which is sometimes not convenient to install. The other one has fewer requirements (only GDAL Python), but the code is dirtier.

Both have the same usage, which is very similar to the gdal_contour usage:

    Usage: isobands_matplotlib.py [-h] [-b band] [-off offset] [-i interval]
    [-nln layer_name] [-a attr_name] [-f formatname]
    src_file out_file
    Calculates the isobands from a raster into a vector file

    positional arguments:
      src_file         The raster source file
      out_file         The vectorial out file
    optional arguments:
      -h, --help       show this help message and exit
      -b band          The band in the source file to process (default 1)
      -off offset      The offset to start the isobands (default 0)
      -i interval      The interval (default 0)
      -nln layer_name  The out layer name (default bands)
      -a attr_name     The out layer attribute name (default h)
      -f formatname    The output file format name (default ESRI Shapefile)
Just choose the input and output files, the band of the input file to process (1 by default), and the interval, and you will get the vector file in the GIS format of your choice.
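For instance, to create the isobands from the first band of a raster with an interval of 10 (the file names are hypothetical):
python isobands_matplotlib.py -i 10 dem.tif isobands.shp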

Now, the code of both scripts, step by step.

isobands_matplotlib.py

As its name suggests, this version uses matplotlib, and more specifically the contourf function, to create the isobands.

The script has a function called isobands, which actually generates the data, and a main part that reads the user input values:
if __name__ == "__main__":

    PARSER = ArgumentParser(
        description="Calculates the isobands from a raster into a vector file")
    PARSER.add_argument("src_file", help="The raster source file")
    PARSER.add_argument("out_file", help="The vectorial out file")
    PARSER.add_argument("-b", 
        help="The band in the source file to process (default 1)", 
        type=int, default = 1, metavar = 'band')
    PARSER.add_argument("-off", 
        help="The offset to start the isobands (default 0)", 
        type=float, default = 0.0, metavar = 'offset')
    PARSER.add_argument("-i", 
        help="The interval  (default 0)", 
        type=float, default = 0.0, metavar = 'interval')
    PARSER.add_argument("-nln", 
        help="The out layer name  (default bands)", 
        default = 'bands', metavar = 'layer_name')
    PARSER.add_argument("-a", 
        help="The out layer attribute name  (default h)", 
        default = 'h', metavar = 'attr_name')
    PARSER.add_argument("-f", 
        help="The output file format name  (default ESRI Shapefile)", 
        default = 'ESRI Shapefile', metavar = 'formatname')
    ARGS = PARSER.parse_args()

    isobands(ARGS.src_file, ARGS.b, ARGS.out_file, ARGS.f, ARGS.nln, ARGS.a, 
        ARGS.off, ARGS.i)
I used argparse.ArgumentParser to read the script arguments, since it is easy to use and it comes with the standard library since Python 2.7.
This part of the code is quite easy to understand: it simply declares each argument, with its default value, whether it is mandatory, and so on.

The isobands function creates the vector file:
#The rest of the script's imports are assumed here: gdal, ogr and osr,
#exists and remove, arange, meshgrid, floor and ceil, and
#matplotlib.pyplot as plt
def isobands(in_file, band, out_file, out_format, layer_name, attr_name, 
    offset, interval, min_level = None):
    '''
    The method that calculates the isobands
    '''
    ds_in = gdal.Open(in_file)
    band_in = ds_in.GetRasterBand(band)
    xsize_in = band_in.XSize
    ysize_in = band_in.YSize

    geotransform_in = ds_in.GetGeoTransform()

    srs = osr.SpatialReference()
    srs.ImportFromWkt( ds_in.GetProjectionRef() )  

    #Creating the output vectorial file
    drv = ogr.GetDriverByName(out_format)

    if exists(out_file):
        remove(out_file)
    dst_ds = drv.CreateDataSource( out_file )
       
    dst_layer = dst_ds.CreateLayer(layer_name, geom_type = ogr.wkbPolygon, 
        srs = srs)

    fdef = ogr.FieldDefn( attr_name, ogr.OFTReal )
    dst_layer.CreateField( fdef )

    #The coordinates of every pixel, needed by matplotlib to calculate
    #the contours
    x_pos = arange(geotransform_in[0], 
        geotransform_in[0] + xsize_in*geotransform_in[1], geotransform_in[1])
    y_pos = arange(geotransform_in[3], 
        geotransform_in[3] + ysize_in*geotransform_in[5], geotransform_in[5])
    x_grid, y_grid = meshgrid(x_pos, y_pos)

    raster_values = band_in.ReadAsArray(0, 0, xsize_in, ysize_in)

    stats = band_in.GetStatistics(True, True)
    if min_level is None:
        min_value = stats[0]
        min_level = offset + interval * floor((min_value - offset)/interval)
   
    max_value = stats[1]
    #Due to range issues, a level is added
    max_level = offset + interval * (1 + ceil((max_value - offset)/interval)) 

    levels = arange(min_level, max_level, interval)

    contours = plt.contourf(x_grid, y_grid, raster_values, levels)

    #Each collection is an isoband, and each path in it, a polygon
    for level in range(len(contours.collections)):
        paths = contours.collections[level].get_paths()
        for path in paths:

            feat_out = ogr.Feature( dst_layer.GetLayerDefn())
            feat_out.SetField( attr_name, contours.levels[level] )
            pol = ogr.Geometry(ogr.wkbPolygon)
            ring = None

            for i in range(len(path.vertices)):
                point = path.vertices[i]
                #A code 1 (MOVETO) starts a new ring
                if path.codes[i] == 1:
                    if ring is not None:
                        pol.AddGeometry(ring)
                    ring = ogr.Geometry(ogr.wkbLinearRing)
                    
                ring.AddPoint_2D(point[0], point[1])

            pol.AddGeometry(ring)
            feat_out.SetGeometry(pol)
            if dst_layer.CreateFeature(feat_out) != 0:
                print "Failed to create feature in shapefile.\n"
                exit( 1 )

            feat_out.Destroy()
  • Lines 26 to 49 open the input file and create the output vector file.
  • Lines 53 to 57 set two matrices containing the coordinates of each pixel in the input file. To do it, two vectors are created first, with the coordinates calculated from the input geotransform, and then meshgrid is called to generate the 2D matrices that matplotlib needs for its calculations.
  • Lines 59 to 70 read the input file data and calculate the levels present in it, from the interval and the offset. Since calculating isobands for levels not present in the file isn't efficient, the minimum and maximum values of the input file are found first.
  • Line 72 is where the isobands are actually calculated, using the contourf function.
  • Lines 76 to 105 create the polygons in the output vector file.
    • The generated contour set has a property called collections, containing one entry per isoband.
    • Each element in collections holds the paths of its isolines.
    • The codes property of each path indicates when a new ring has to be started.
    • The remaining lines generate an OGR feature from each polygon.
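Since all the work is done by the isobands function, it can also be called from another Python script, skipping the command line. A possible call, with invented file names, would be:

from isobands_matplotlib import isobands

#Bands every 50 units, starting at 0, written as a shapefile
isobands('dem.tif', 1, 'isobands.shp', 'ESRI Shapefile', 'bands', 'h', 0, 50)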

isobands_gdal.py

This script is quite a bit dirtier, since I used a trick to get the isobands out of the ContourGenerate function.
Closing the contour lines, which is what has to be done to transform isolines into isobands, is not an easy task.
Looking at some D3js examples, I got the idea from Jason Davies' conrec.js. I can't find the original example now, but basically, the idea is to add a border of pixels around the image with a value lower than the lowest value in it, so all the isolines get closed. This gets close to the solution, but some problems must still be solved:
  • Convert the isolines to polygons
  • Clip the vectors to the original extent
  • Subtract the overlapping polygons, since when the lines are converted to polygons, all the area above the isoline value is contained in the polygon, even though other lines with higher values are inside
When not clipped, the isolines create strange effects at the borders.
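Isolated from the script, the border trick is just this (a minimal numpy sketch with invented values):

import numpy

data_in = numpy.array([[3., 5.], [4., 6.]])  #the original raster values
nodata_value = 0.  #lower than any real value, so every isoline closes

column = numpy.ones((data_in.shape[0], 1)) * nodata_value
line = numpy.ones((1, data_in.shape[1] + 2)) * nodata_value
data_out = numpy.concatenate((column, data_in, column), axis=1)
data_out = numpy.concatenate((line, data_out, line), axis=0)
#data_out is now 4x4, with a one-pixel border of nodata_value around the data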

The main part of the code is exactly the same as in the other script, and the isobands function is:

def isobands(in_file, band, out_file, out_format, layer_name, attr_name, 
    offset, interval, min_level = None):
    '''
    The method that calculates the isobands
    '''    
    #Loading the raster file
    ds_in = gdal.Open(in_file)
    band_in = ds_in.GetRasterBand(band)
    xsize_in = band_in.XSize
    ysize_in = band_in.YSize

    stats = band_in.GetStatistics(True, True)
    
    if min_level is None:
        min_value = stats[0]
        min_level = ( offset + interval * 
            (floor((min_value - offset)/interval) - 1) )
    #The value for the added border, lower than any value in the file
    nodata_value = min_level - interval

    geotransform_in = ds_in.GetGeoTransform()
    
    srs = osr.SpatialReference()
    srs.ImportFromWkt( ds_in.GetProjectionRef() )  

    data_in = band_in.ReadAsArray(0, 0, xsize_in, ysize_in)


    #The in-memory vector file where the contour lines will be written
    contour_ds = ogr.GetDriverByName('Memory').CreateDataSource('')
    contour_lyr = contour_ds.CreateLayer('contour', 
        geom_type = ogr.wkbLineString25D, srs = srs )
    field_defn = ogr.FieldDefn('ID', ogr.OFTInteger)
    contour_lyr.CreateField(field_defn)
    field_defn = ogr.FieldDefn('elev', ogr.OFTReal)
    contour_lyr.CreateField(field_defn)

    #The in memory raster band, with new borders to close all the polygons
    driver = gdal.GetDriverByName( 'MEM' )
    xsize_out = xsize_in + 2
    ysize_out = ysize_in + 2

    column = numpy.ones((ysize_in, 1)) * nodata_value
    line = numpy.ones((1, xsize_out)) * nodata_value

    data_out = numpy.concatenate((column, data_in, column), axis=1)
    data_out = numpy.concatenate((line, data_out, line), axis=0)

    ds_mem = driver.Create( '', xsize_out, ysize_out, 1, band_in.DataType)
    ds_mem.GetRasterBand(1).WriteArray(data_out, 0, 0)
    ds_mem.SetProjection(ds_in.GetProjection())
    #We have added the buffer!
    ds_mem.SetGeoTransform((geotransform_in[0]-geotransform_in[1],
        geotransform_in[1], 0, geotransform_in[3]-geotransform_in[5], 
        0, geotransform_in[5]))
    gdal.ContourGenerate(ds_mem.GetRasterBand(1), interval, 
        offset, [], 0, 0, contour_lyr, 0, 1)

    #Creating the output vectorial file
    drv = ogr.GetDriverByName(out_format)
    if exists(out_file):
        remove(out_file)
    dst_ds = drv.CreateDataSource( out_file )

    dst_layer = dst_ds.CreateLayer(layer_name, 
        geom_type = ogr.wkbPolygon, srs = srs)

    fdef = ogr.FieldDefn( attr_name, ogr.OFTReal )
    dst_layer.CreateField( fdef )


    contour_lyr.ResetReading()

    geometry_list = {}
    for feat_in in contour_lyr:
        value = feat_in.GetFieldAsDouble(1)

        geom_in = feat_in.GetGeometryRef()
        points = geom_in.GetPoints()

        #Keep only the closed lines (first point == last point) with a
        #value above the minimum level
        if ( value >= min_level and points[0][0] == points[-1][0] and 
            points[0][1] == points[-1][1] ):
            if value not in geometry_list:
                geometry_list[value] = []

            pol = ogr.Geometry(ogr.wkbPolygon)
            ring = ogr.Geometry(ogr.wkbLinearRing)

            for point in points:

                p_y = point[1]
                p_x = point[0]

                #Points outside the original extent are moved to its border
                if p_x < (geotransform_in[0] + 0.5*geotransform_in[1]):
                    p_x = geotransform_in[0] + 0.5*geotransform_in[1]
                elif p_x > ( (geotransform_in[0] + 
                    (xsize_in - 0.5)*geotransform_in[1]) ):
                    p_x = ( geotransform_in[0] + 
                        (xsize_in - 0.5)*geotransform_in[1] )
                if p_y > (geotransform_in[3] + 0.5*geotransform_in[5]):
                    p_y = geotransform_in[3] + 0.5*geotransform_in[5]
                elif p_y < ( (geotransform_in[3] + 
                    (ysize_in - 0.5)*geotransform_in[5]) ):
                    p_y = ( geotransform_in[3] + 
                        (ysize_in - 0.5)*geotransform_in[5] )

                ring.AddPoint_2D(p_x, p_y)

            pol.AddGeometry(ring)
            geometry_list[value].append(pol)

    values = sorted(geometry_list.keys())

    geometry_list2 = {}

    #For each level, look for the rings contained inside another ring of
    #the same level: they are interior rings, and must be subtracted
    for i in range(len(values)):
        geometry_list2[values[i]] = []
        interior_rings = []
        for j in range(len(geometry_list[values[i]])):
            if j not in interior_rings:
                geom = geometry_list[values[i]][j]

                for k in range(len(geometry_list[values[i]])):

                    if (k not in interior_rings and 
                        j not in interior_rings):
                        geom2 = geometry_list[values[i]][k]

                        if j != k and geom2 is not None and geom is not None:
                            if geom2.Within(geom):
                                geom3 = geom.Difference(geom2)
                                interior_rings.append(k)
                                geometry_list[values[i]][j] = geom3

                            elif geom.Within(geom2):
                                geom3 = geom2.Difference(geom)
                                interior_rings.append(j)
                                geometry_list[values[i]][k] = geom3

        for j in range(len(geometry_list[values[i]])):
            if ( j not in interior_rings and 
                geometry_list[values[i]][j] is not None ):
                geometry_list2[values[i]].append(geometry_list[values[i]][j])


    #Finally, subtract from each polygon the polygons of the next level,
    #so each feature covers only its own interval
    for i in range(len(values)):
        value = values[i]
        if value >= min_level:
            for geom in geometry_list2[values[i]]:

                if i < len(values)-1:

                    for geom2 in geometry_list2[values[i+1]]:
                        if geom.Intersects(geom2):
                            geom = geom.Difference(geom2)

                feat_out = ogr.Feature( dst_layer.GetLayerDefn())
                feat_out.SetField( attr_name, value )
                feat_out.SetGeometry(geom)
                if dst_layer.CreateFeature(feat_out) != 0:
                    print "Failed to create feature in shapefile.\n"
                    exit( 1 )
                feat_out.Destroy()
  • Lines 23 to 44 read the input file, including the data, the geotransform, etc.
  • Lines 47 to 54 create a vector file where the contour lines will be written. Since the data must be clipped later, it is a temporary in-memory file.
  • Lines 56 to 71 create an in-memory raster file with the input data, plus a one-pixel buffer around it with a value lower than the lowest one in the input, so all the contour lines get closed.
  • Line 74 calls the ContourGenerate function, which adds the contour lines to the in-memory vector file.
  • Lines 77 to 91 create the actual output file.
  • Lines 93 to 130 build a collection of geometries with the polygons.
    • The lines are transformed into polygons by creating a ring from each polyline and adding it to a polygon.
    • Lines 113 to 123 clip the polygons to the original extent, by moving the points that fall outside it to the border.
  • Lines 138 to 167 store the differences between overlapping polygons, so each output polygon contains only the zone where the values fall within its interval.
  • Lines 170 to 187 write the final polygons to the output file, the same way as in the other script.
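The nesting logic relies on the Within and Difference geometry operations. To see them in isolation, here is a tiny sketch with two invented squares:

from osgeo import ogr

outer = ogr.CreateGeometryFromWkt('POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))')
inner = ogr.CreateGeometryFromWkt('POLYGON ((4 4, 6 4, 6 6, 4 6, 4 4))')

if inner.Within(outer):
    donut = outer.Difference(inner)  #the outer square with a hole in it
    print donut.ExportToWkt()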

Notes:

Another option would be, of course, creating the polygons myself. I tried to do it but, as you can see on the marching squares Wikipedia page, the algorithm is not easy when building polygons, and the matplotlib version of the script is already quite clean.

The scripts have a problem when there is a nodata zone, since I haven't added the code to handle it.
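In the matplotlib version, a possible way to handle it (untested, just a sketch) would be masking the nodata cells before calling contourf, since contourf skips masked values; band_in and raster_values are the variables from the isobands function:

from numpy import ma

nodata = band_in.GetNoDataValue()
if nodata is not None:
    #The masked zones will be left empty by contourf
    raster_values = ma.masked_equal(raster_values, nodata)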

Finally, as came up in the GDAL-dev mailing list discussion, the best option would be modifying the ContourGenerate function itself to create isobands, but I think that is out of my scope.
The discussion on the GDAL-dev mailing list where I posted the first version.

The data used for the examples is the same I used in the raster classification post.
 
Some links I have used:
http://matplotlib.org/api/path_api.html
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.contourf
http://matplotlib.org/basemap/api/basemap_api.html#mpl_toolkits.basemap.Basemap.contourf
http://fossies.org/dox/matplotlib-1.2.0/classmatplotlib_1_1contour_1_1QuadContourSet.html