Saturday, October 26, 2013

Using EUROSTAT's data with D3.js

EUROSTAT is the European Union office for statistics. Its goal is to provide statistics at European level that enable comparisons between regions and countries.

The NUTS classification (Nomenclature of territorial units for statistics) is a hierarchical system for dividing up the economic territory of the EU. Many of the EUROSTAT data is relative to these regions, and this data is a good candidate to be mapped.

The problem is that EUROSTAT provides some shapefiles in one hand, but without the actual names for the regions, nor other information, and some tables  with the name and population for every region. This data has to be joined to be able to map properly.

In this example, he EUROSTAT's criminality statistics are mapped using these provided files.

You can see the working example at bl.ocks.org, and get all the source code at GitHub.
There is also a second example to see how to change the data easily.

Organizing the data

At their site, EUROSTAT provides different Shapefiles for different scales. I think that the best option for maps of all Europe is to use NUTS_2010_10M_SH.zip. Using the 1:60 million scale could be an option, but the file doesn't contain the NUTS3 polygons, so only data at country or big region level is possible.

After downloading the polygons, you will see that no names or population are inside the fields, only the NUTS code. So to make a map, we have to label every polygon. To do it, I downloaded  the NUTS2010.zip file, which contains a csv file with the names for every NUTS region. At the web, it's also possible to find the population in a file, looking for the text Annual average population (1 000) by sex - NUTS 3 regions. Then, I made the following script:

'''
Joins a shp file from EUROSTAT with the xls with the region names so
the resulting shp file has the name of every region and the code.
Roger Veciana, oct 2013
'''
from osgeo import ogr
from os.path import exists
from os.path import basename
from os.path import splitext
from os import remove

def create_shp(in_shp, out_shp, csv_data):
    print "Extracting data"
    
    in_ds = ogr.Open( in_shp )
    if in_ds is None:
        print "Open failed.\n"
        sys.exit( 1 )
    in_lyr = in_ds.GetLayerByName( splitext(basename(in_shp))[0] )
    if in_lyr is None:
        print "Error opening layer"
        sys.exit( 1 )

    if exists(out_shp):
        remove(out_shp)
    driver_name = "ESRI Shapefile"
    drv = ogr.GetDriverByName( driver_name )
    if drv is None:
        print "%s driver not available.\n" % driver_name
        sys.exit( 1 )
    out_ds = drv.CreateDataSource( out_shp )
    if out_ds is None:
        print "Creation of output file failed.\n"
        sys.exit( 1 )
    proj = in_lyr.GetSpatialRef()
    ##Creating the layer with its fields
    out_lyr = out_ds.CreateLayer( 
        splitext(basename(out_shp))[0], proj, ogr.wkbMultiPolygon )
    lyr_def = in_lyr.GetLayerDefn ()
    for i in range(lyr_def.GetFieldCount()):
        out_lyr.CreateField ( lyr_def.GetFieldDefn(i) )

    field_defn = ogr.FieldDefn( "NAME", ogr.OFTString )
    out_lyr.CreateField ( field_defn )

    field_defn = ogr.FieldDefn( "COUNTRY", ogr.OFTString )
    out_lyr.CreateField ( field_defn )

    field_defn = ogr.FieldDefn( "COUNTRY_CO", ogr.OFTString )
    out_lyr.CreateField ( field_defn )

    field_defn = ogr.FieldDefn( "POPULATION", ogr.OFTInteger )
    out_lyr.CreateField ( field_defn )
    

    ##Writing all the features
    in_lyr.ResetReading()
    for feat_in in in_lyr:
        value = feat_in.GetFieldAsString(feat_in.GetFieldIndex('NUTS_ID'))
        if value in csv_data:
            #print csv_data[value]
            feat_out = ogr.Feature( out_lyr.GetLayerDefn())
            feat_out.SetField( 'NUTS_ID', value )
            feat_out.SetField( 'NAME', csv_data[value]['label'] )
            feat_out.SetField( 'POPULATION', csv_data[value]['population'] )
            feat_out.SetField( 'COUNTRY_CO', csv_data[value]['id_country'] )
            feat_out.SetField( 'COUNTRY', csv_data[csv_data[value]['id_country']]['label'] )
            feat_out.SetField( 'STAT_LEVL_', csv_data[value]['level'] )
            feat_out.SetField( 'SHAPE_AREA', feat_in.GetFieldAsString(feat_in.GetFieldIndex('SHAPE_AREA')) )
            feat_out.SetField( 'SHAPE_LEN', feat_in.GetFieldAsString(feat_in.GetFieldIndex('SHAPE_LEN')) )

            feat_out.SetGeometry(feat_in.GetGeometryRef())
            out_lyr.CreateFeature(feat_out)
    in_ds = None
    out_ds = None   

def read_population(csv_file):
    '''
    Reads the NUTS csv population file and returns the data in a dict
    '''
    csv_data = {}
    f = open(csv_file, "r")
    f.readline(); #Skip header
    for line in f:
        line_data = line.split(";")
        try:
            csv_data[line_data[0]] = int(float(line_data[4]) * 1000)
        except Exception, e:
            csv_data[line_data[0]] = -9999
    f.close

    return csv_data
def read_names(csv_file, population_data):
    '''
    Reads a NUTS csv file and returns the data in a dict
    '''
    csv_data = {}
    f = open(csv_file, "r")
    f.readline(); #Skip header
    for line in f:
        line_data = line.split(";")
        csv_data[line_data[0]] = {'label': line_data[1],
            'level': line_data[2],
            'id_country': line_data[3],
            'code_country': line_data[4],
            'population': 0}
        if line_data[0] in population_data:
            csv_data[line_data[0]]['population'] = population_data[line_data[0]]

    f.close

    return csv_data

if __name__ == '__main__':
    population_data = read_population('demo_r_d3avg_1_Data.csv')
    csv_data = read_names('NUTS_2010.csv', population_data)
    create_shp('NUTS_2010_10M_SH/Data/NUTS_RG_10M_2010.shp', 'regions.shp', csv_data)
  • Basically, we iterate for every polygon in the shp file, and look into the csv data for the attributes to add. 
  • Population data and region name are in tow separate files. To read them, there are two functions: read_population and read_names.
  • create_shp function reads the geometries, joins them with the population and name, and finally writes the output file.
  • Notice that NUTS regions overlap. There are levels 1, 2 and 3. The first one is at country level, the second takes big areas from the countries, and the third one are the actual departments in every country. To map any data is necessary to choose the level at which the map has to be represented, or mix the different levels if it's necessary, as in this case.
To create a map, I chose the criminality data. The file is again at EUROSTAT, under the text Crimes recorded by the police - NUTS 3 regions. Despite its name, some regions are NUTS3, and some NUTS2. Most of  the files from the site labeled with NUTS regions should work with only little changes in the map code.

Creating a D3.js map to show the data

To draw the data in a map I've used a similar solution as the one in the post D3js Electoral Map:

Crimes recorded by the police - NUTS 3 regions


Crimes recorded by the police - NUTS 3 regions

  • Lines 5 to 18 give style to the map background and the tooltip
  • Lies 29 to 43 set up the map, with its size, projection and svg element
  • Line 45 sets the color scale (this is different to the electoral map example). The colors will vary from blue to red. Discrete colors are an other option. Just changing this line, you can change the whole map colors.
  • Lines 47 to 53 read and parse the csv with the criminality data into  an object (we will only use 2010 data)
  • Line 55 downloads the TopoJSON containing the NUTS regions
  • Then lines 56 to 78, the map is actually drawn. Note
    • Line 60 filters only the regions that have data in the CSV file. For some countries, this is the NUTS2 region (see Italy) and for others, NUTS3 (like Germany). The following lines are only run if there is data (see D3js filter)
    • Line 66 sets the color of the region. Note that calls the scale defined at line 45, and calculates the number by multiplying the crimes by 100000 and dividing by the region population to get the density
    • Line 71 sets the region opacity to 0 if there is no data for it. With the filter could be avoided
    • Lines 78 and 81 set the tooltip box, taken from this example.
  • Line 98 calls the function that draws the legend. The function is in a separate file to make it easy to reuse
 
var legend = function(accessor){
    return function(selection){
      
      //Draw the legend for the map
      var legend = selection.append("g");
      var legendText = accessor.title;
      var numSquares = accessor.elements;
      var xLegend = 0;
      var yLegend = 0;
      var widthLegend = 400;
      
      var title_g = legend.append("g");
      var values_g = legend.append("g");
      
      var legendTitle = title_g.append("text")
        .text("Legend")
        .attr("font-family", "sans-serif")
        .attr("font-size", "18px")
        .attr("fill", "black");
        var bbox = legendTitle.node().getBBox();
        legendTitle.attr("x", xLegend + widthLegend/2 - bbox.width/2);
        legendTitle.attr("y", yLegend + 20);
        
      var legendTextEl = title_g.append("text")
        .text(legendText)
        .attr("y", yLegend + 75)
        .attr("font-family", "sans-serif")
        .attr("font-size", "14px")
        .attr("fill", "black");
      var bbox = legendTextEl.node().getBBox();
      legendTextEl.attr("x", xLegend + widthLegend/2 - bbox.width/2);

      for (i=0; i < numSquares; i++){
        values_g.append("rect")
          .attr("x", xLegend + (i+0.1*i/numSquares)*(widthLegend/numSquares))
          .attr("y", yLegend + 25)
          .attr("width", (widthLegend/numSquares)*0.9)
          .attr("height", 20)
          .attr("fill", color(accessor.domain[0] + i * accessor.domain[1]/(numSquares-1)));
        var value_text = values_g.append("text")
          .text(accessor.domain[0] + i * accessor.domain[1]/(numSquares-1))
          .attr("y", yLegend + 55)
          .attr("font-family", "sans-serif")
          .attr("font-size", "12px")
          .attr("fill", "black");
        var bbox = value_text.node().getBBox();
        value_text
          .attr("x", xLegend + (i+0.1*i/numSquares)*(widthLegend/numSquares) + (widthLegend/numSquares)*(0.9/2)- bbox.width/2);

      }
      var scale = accessor.width / 400;
      legend.attr("transform","translate("+accessor.posx+","+accessor.posy+") scale("+scale+") ");

};
};

  •  Note how the function is called. The argument accessor is the one thas have the object properties (if called from a geometry, for instance), and selection is the node where the function is called from.
  • The code is not very D3jsnic, using things as a for loop but it works and can be reused for drawing other legends from a D3js scale.

Saturday, October 5, 2013

Castor project's earthquakes map with D3js

The Castor project is a submarine natural gas storage facility located in front of the eastern Iberian Peninsula coast. The idea is to store the Algerian gas inside an old oilfield cavity. At least this is what I understood (sorry, geologists).
Somehow, when the facility started working, a series of earthquakes have started to occur. At the beginning, the platform owners said it wasn't related to their activity, but now everybody agrees that it is, and the activity has stopped, but not the earthquakes.

I didn't find a nice map about the earthquakes epicenters, so I thought that D3js would be a good option to do it.


The star represents the aproximate platform location, and the circles, the epicenters. It's easy to see why they are related to the platform activity.



The animated map is at my bl.ocks.org page. The explanations are in Catalan, but basically say the same as here.

Getting the data

The data about the significant earthquakes around Catalonia can be found at the Catalan Geologic Institute web site, but the format for the reports is not very convenient to get the data, so I made this python script to get it:

# -*- coding: utf-8 -*-
#http://jramosgarcia.wordpress.com/2013/10/01/que-es-el-proyecto-castor/
import urllib
import json
import dateutil.parser
import datetime

def get_data(url):
    filehandle = urllib.urlopen(url)
    html = filehandle.read()
    filehandle.close()

    lines = html.splitlines()

    for i in range(len(lines)):
    
        if lines[i].find('Latitud') > 0:
            lat = lines[i].strip().split(" ")[1].replace("º","")
        if lines[i].find('Longitud') > 0:
            lon = lines[i].strip().split(" ")[1].replace("º","")   
        if lines[i].find('mol del dia') > 0:
            date = lines[i + 1].strip().replace(" >/div<","")
        if lines[i].find('Hora origen') > 0:
            hour = lines[i].strip().split(" ")[4]                   
        if lines[i].find('Magnitud') > 0:
            magnitude = lines[i+1].strip().split(" ")[0]

    date_array  = date.split("/")
    hour_array = hour.split(":")

    date_time = datetime.datetime(int(date_array[2]),int(date_array[1]),int(date_array[0]),int(hour_array[0]), int(hour_array[1]))

    data = {'lat':lat, 'lon':lon, 'datetime': date_time.isoformat(), 'magnitude': magnitude}
    return json.dumps(data)

if __name__ == "__main__":
    url_list = [
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130910175510/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130913095842/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130918142943/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130920104607/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130924091301/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130925125029/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130929084140/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130929192416/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130930005900/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20130930051316/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131001045206/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131001055709/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131002121626/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131002232928/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131003012732/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131003031301/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131004113222/comact.html',
    'http://www.igc.cat/web/gcontent/ca/sismologia/sismescomact/comhistcat/20131004120323/comact.html'
    ]

    f = open("data.json","w")
    f.write("[")
    json_data = ""
    for url in url_list:
     json_data = json_data + get_data( url ) + ", "
       
    f.write(json_data[0:-2])
    f.write("]")
    f.close()
   


  • The web pages are in the list, since the interesting reports have to be chosen one by one. It would be nice to have a better way to do it.
  • Then, the get_data function just parses the text in the way that all the reports are parsed properly.The data is stored in a json file to make easier it's use from D3js.

Using D3js to visualize the data

I used this example by Mike Bostock to create the background  map. The tiles origin has been changed because the example tiles are not available at this zoom level, and to have more points of interest to situate the earthquake locations.

This is the code:





<script>
/**
Based on Mike Bostock's http://bl.ocks.org/mbostock/4150951
*/
var width = 960,
    height = 500;

var projection = d3.geo.mercator()
    .center([0.5491, 40.4942])
    .scale(20000);

var path = d3.geo.path()
    .projection(projection);

var tile = d3.geo.tile()
    .scale(projection.scale() * 2 * Math.PI)
    .translate(projection([0, 0]))
    .zoomDelta((window.devicePixelRatio || 1) - .5);

var svg = d3.select("body").append("svg")
    .attr("width", width)
    .attr("height", height);

  var tiles = tile();

  var defs = svg.append("defs");

var magnitudeScale = d3.scale.linear().domain([2,5]).range([5,30]);
d3.json("data.json", function(error, locations) {

  svg.append("g")
      
    .selectAll("image")
      .data(tiles)
    .enter().append("image")
      .attr("xlink:href", function(d) { return "http://" + ["a", "b", "c", "d"][Math.random() * 4 | 0] + ".tiles.mapbox.com/v3/examples.map-vyofok3q/" + d[2] + "/" + d[0] + "/" + d[1] + ".png"; })
      .attr("width", Math.round(tiles.scale) + 1)
      .attr("height", Math.round(tiles.scale) + 1)
      .attr("x", function(d) { return Math.round((d[0] + tiles.translate[0]) * tiles.scale); })
      .attr("y", function(d) { return Math.round((d[1] + tiles.translate[1]) * tiles.scale); });

  svg.append("g")
      .append('path')
      .attr("d","m 0,0 -8.47858,-5.22254 -8.31623,5.47756 2.34696,-9.67752 -7.77927,-6.21653 9.92909,-0.75852 3.50829,-9.31953 3.78972,9.20873 9.94748,0.45679 -7.58687,6.44982 z")
      .attr("stroke","black")
      .attr("stroke-width",2)
      .style("fill", d3.rgb(90, 90, 90))
      .attr("transform", "translate("+projection([0.66879, 40.33503])[0]+","+projection([0.66879, 40.33503])[1]+")");
  
  var locationsG = svg.append("g");
  

  function addLocation(loc){
    locationsG.append('circle')
      .attr('class','location')
      .attr("r", 5)
      .attr("cx", projection([loc.lon, loc.lat])[0])
      .attr("cy", projection([loc.lon, loc.lat])[1])
      .style("fill", d3.rgb(255, 0, 0).darker(2))
      .style("opacity", 0.8)
      .transition()
      .duration(1000)
      .attr("r", magnitudeScale(loc.magnitude))
      .transition()
      .delay(2000)
      .duration(2000)
      .style("opacity",0.3);

    locationsG
      .append("text")
      .text(loc.magnitude)
      .attr("x", projection([loc.lon, loc.lat])[0] - 10)
      .attr("y", projection([loc.lon, loc.lat])[1] + 5)
      .attr("font-family", "sans-serif")
      .attr("font-size", "12px")
      .attr("fill", "black")
      .style("opacity",0)
      .transition()
      .duration(1000)
      .style("opacity",1)
      .transition()
      .delay(2000)
      .duration(2000)
      .style("opacity",0);
  }

  //addLocation({"lat": "40.43", "magnitude": "2.7", "lon": "0.7", "datetime": "2013-10-09T06:43:16"});
  
  var intDate = new Date("2013-09-10T00:00:00Z");
  var maxDate = new Date("2013-10-04T00:00:00Z");
  var usedLocations = new Array();

  var dateTitle = svg
    .append("text")
    .attr("id", "dataTitle")
    .text(intDate.toLocaleDateString())
    .attr("x", 70)
    .attr("y", 20)
    .attr("font-family", "sans-serif")
    .attr("font-size", "20px")
    .attr("fill", "black");

  var interval = setInterval(function() {

    dateTitle.text(intDate.toLocaleDateString());

    for (i = 0; imaxDate){
        clearInterval(interval);
      }

    }
    
    intDate.setDate(intDate.getDate() + 1);
  }, 1000);
  
});



  •  Lines 30 and 48 are the most important ones to create the tile map. In this case, no zoom or pan is set, so the code is quite simple.
  • The line 57 creates the star indicating the platform location. I made the star using Inkscape and captured the code using the tool to see the generated svg. To situate the symbol on the map, a transform attribute is used, translating the symbol to the pixel calculated projecting the location coordinates.
  • To add the earthquake epicenters:
    • The function addLocation (line 68) adds the circle and the text indicating the earthquake magnitude. To locate them, the projection function is used again. Two transitions are used to make the circle grow and then get brighter. The same for making the text disappear.
    • An interval is created (line 118) to change the date (one day per second). For every iteration, the date is evaluated and the earthquakes occured during this day plotted using the addLocation function. I didn't find a "more D3js" solution. The earthquakes already plotted are stored in an array to avoid plotting them more than once.
      The date on the map is also changed for every iteration.

Some related posts and pages: