## Saturday, October 26, 2013

### Using EUROSTAT's data with D3.js

EUROSTAT is the European Union office for statistics. Its goal is to provide statistics at European level that enable comparisons between regions and countries.

The NUTS classification (Nomenclature of territorial units for statistics) is a hierarchical system for dividing up the economic territory of the EU. Many of the EUROSTAT data is relative to these regions, and this data is a good candidate to be mapped.

The problem is that EUROSTAT provides some shapefiles in one hand, but without the actual names for the regions, nor other information, and some tables  with the name and population for every region. This data has to be joined to be able to map properly.

In this example, he EUROSTAT's criminality statistics are mapped using these provided files.

You can see the working example at bl.ocks.org, and get all the source code at GitHub.
There is also a second example to see how to change the data easily.

### Organizing the data

At their site, EUROSTAT provides different Shapefiles for different scales. I think that the best option for maps of all Europe is to use NUTS_2010_10M_SH.zip. Using the 1:60 million scale could be an option, but the file doesn't contain the NUTS3 polygons, so only data at country or big region level is possible.

After downloading the polygons, you will see that no names or population are inside the fields, only the NUTS code. So to make a map, we have to label every polygon. To do it, I downloaded  the NUTS2010.zip file, which contains a csv file with the names for every NUTS region. At the web, it's also possible to find the population in a file, looking for the text Annual average population (1 000) by sex - NUTS 3 regions. Then, I made the following script:

'''
Joins a shp file from EUROSTAT with the xls with the region names so
the resulting shp file has the name of every region and the code.
Roger Veciana, oct 2013
'''
from osgeo import ogr
from os.path import exists
from os.path import basename
from os.path import splitext
from os import remove

def create_shp(in_shp, out_shp, csv_data):
print "Extracting data"

in_ds = ogr.Open( in_shp )
if in_ds is None:
print "Open failed.\n"
sys.exit( 1 )
in_lyr = in_ds.GetLayerByName( splitext(basename(in_shp))[0] )
if in_lyr is None:
print "Error opening layer"
sys.exit( 1 )

if exists(out_shp):
remove(out_shp)
driver_name = "ESRI Shapefile"
drv = ogr.GetDriverByName( driver_name )
if drv is None:
print "%s driver not available.\n" % driver_name
sys.exit( 1 )
out_ds = drv.CreateDataSource( out_shp )
if out_ds is None:
print "Creation of output file failed.\n"
sys.exit( 1 )
proj = in_lyr.GetSpatialRef()
##Creating the layer with its fields
out_lyr = out_ds.CreateLayer(
splitext(basename(out_shp))[0], proj, ogr.wkbMultiPolygon )
lyr_def = in_lyr.GetLayerDefn ()
for i in range(lyr_def.GetFieldCount()):
out_lyr.CreateField ( lyr_def.GetFieldDefn(i) )

field_defn = ogr.FieldDefn( "NAME", ogr.OFTString )
out_lyr.CreateField ( field_defn )

field_defn = ogr.FieldDefn( "COUNTRY", ogr.OFTString )
out_lyr.CreateField ( field_defn )

field_defn = ogr.FieldDefn( "COUNTRY_CO", ogr.OFTString )
out_lyr.CreateField ( field_defn )

field_defn = ogr.FieldDefn( "POPULATION", ogr.OFTInteger )
out_lyr.CreateField ( field_defn )

##Writing all the features
for feat_in in in_lyr:
value = feat_in.GetFieldAsString(feat_in.GetFieldIndex('NUTS_ID'))
if value in csv_data:
#print csv_data[value]
feat_out = ogr.Feature( out_lyr.GetLayerDefn())
feat_out.SetField( 'NUTS_ID', value )
feat_out.SetField( 'NAME', csv_data[value]['label'] )
feat_out.SetField( 'POPULATION', csv_data[value]['population'] )
feat_out.SetField( 'COUNTRY_CO', csv_data[value]['id_country'] )
feat_out.SetField( 'COUNTRY', csv_data[csv_data[value]['id_country']]['label'] )
feat_out.SetField( 'STAT_LEVL_', csv_data[value]['level'] )
feat_out.SetField( 'SHAPE_AREA', feat_in.GetFieldAsString(feat_in.GetFieldIndex('SHAPE_AREA')) )
feat_out.SetField( 'SHAPE_LEN', feat_in.GetFieldAsString(feat_in.GetFieldIndex('SHAPE_LEN')) )

feat_out.SetGeometry(feat_in.GetGeometryRef())
out_lyr.CreateFeature(feat_out)
in_ds = None
out_ds = None

'''
Reads the NUTS csv population file and returns the data in a dict
'''
csv_data = {}
f = open(csv_file, "r")
for line in f:
line_data = line.split(";")
try:
csv_data[line_data[0]] = int(float(line_data[4]) * 1000)
except Exception, e:
csv_data[line_data[0]] = -9999
f.close

return csv_data
'''
Reads a NUTS csv file and returns the data in a dict
'''
csv_data = {}
f = open(csv_file, "r")
for line in f:
line_data = line.split(";")
csv_data[line_data[0]] = {'label': line_data[1],
'level': line_data[2],
'id_country': line_data[3],
'code_country': line_data[4],
'population': 0}
if line_data[0] in population_data:
csv_data[line_data[0]]['population'] = population_data[line_data[0]]

f.close

return csv_data

if __name__ == '__main__':
create_shp('NUTS_2010_10M_SH/Data/NUTS_RG_10M_2010.shp', 'regions.shp', csv_data)

• Basically, we iterate for every polygon in the shp file, and look into the csv data for the attributes to add.
• Population data and region name are in tow separate files. To read them, there are two functions: read_population and read_names.
• create_shp function reads the geometries, joins them with the population and name, and finally writes the output file.
• Notice that NUTS regions overlap. There are levels 1, 2 and 3. The first one is at country level, the second takes big areas from the countries, and the third one are the actual departments in every country. To map any data is necessary to choose the level at which the map has to be represented, or mix the different levels if it's necessary, as in this case.
To create a map, I chose the criminality data. The file is again at EUROSTAT, under the text Crimes recorded by the police - NUTS 3 regions. Despite its name, some regions are NUTS3, and some NUTS2. Most of  the files from the site labeled with NUTS regions should work with only little changes in the map code.

### Creating a D3.js map to show the data

To draw the data in a map I've used a similar solution as the one in the post D3js Electoral Map:

Crimes recorded by the police - NUTS 3 regions

.background {
fill: #fff;
stroke: #ccc;
}

.tooltip{ background-color:rgba(200,200,200,0.5);;
margin: 10px;
height: 90px;
width: 150px;
}

Crimes recorded by the police - NUTS 3 regions

var width = 600,
height = 600;

var projection = d3.geo.stereographic()
.center([3.9,43.0])
.scale(1000)
.translate([width / 4 , height / 2]);

var path = d3.geo.path()
.projection(projection);

var svg = d3.select("#map").append("svg")
.attr("width", 960)
.attr("height", 500);

var domain = [0,5]
var color = d3.scale.linear().domain(domain).range(['blue', 'red']);

d3.csv("crim_gen_reg_1_Data.csv", function(stats) {
data = {};
stats.forEach(function(d) {
if (d.TIME == 2010){
data[d.GEO] = d.Value;
}
});

d3.json("/rveciana/raw/5919944/regions.json", function(error, europe) {
svg.selectAll(".region")
.data(topojson.feature(europe, europe.objects.regions).features)
.enter()
.append("path")
.filter(function(d) { return !isNaN(parseFloat(data[d.properties.NUTS_ID])); })
.attr("class", "region")
.attr("d", path)
.style("stroke", "#999")
.style("stroke-width", 0.2)
.style("fill", function(d) {
if (!isNaN(parseFloat(data[d.properties.NUTS_ID])))
return color(100000*data[d.properties.NUTS_ID]/d.properties.POPULATION);
else
return "#999";
})
.style("opacity", function(d){
if (!isNaN(parseFloat(data[d.properties.NUTS_ID])))
return 1;
else
return 0;
})

function tooltipText(d){
if (isNaN(parseFloat(data[d.properties.NUTS_ID]))) {
var crimes = "No Data";
} else {
var crimes = data[d.properties.NUTS_ID];
}
return "<b>" + d.properties.NAME + "</b>"
+ "<br/> pop: " + d.properties.POPULATION
+ "<br/> crimes: " + crimes;
}
svg.call(legend({width:300, posx: 100, posy:400, elements: 6, domain: domain, title:"Number of crimes / 100 000 inhabitants"}));
});
});

Source: eurostat


• Lines 5 to 18 give style to the map background and the tooltip
• Lies 29 to 43 set up the map, with its size, projection and svg element
• Line 45 sets the color scale (this is different to the electoral map example). The colors will vary from blue to red. Discrete colors are an other option. Just changing this line, you can change the whole map colors.
• Lines 47 to 53 read and parse the csv with the criminality data into  an object (we will only use 2010 data)
• Then lines 56 to 78, the map is actually drawn. Note
• Line 60 filters only the regions that have data in the CSV file. For some countries, this is the NUTS2 region (see Italy) and for others, NUTS3 (like Germany). The following lines are only run if there is data (see D3js filter)
• Line 66 sets the color of the region. Note that calls the scale defined at line 45, and calculates the number by multiplying the crimes by 100000 and dividing by the region population to get the density
• Line 71 sets the region opacity to 0 if there is no data for it. With the filter could be avoided
• Lines 78 and 81 set the tooltip box, taken from this example.
• Line 98 calls the function that draws the legend. The function is in a separate file to make it easy to reuse

var legend = function(accessor){
return function(selection){

//Draw the legend for the map
var legend = selection.append("g");
var legendText = accessor.title;
var numSquares = accessor.elements;
var xLegend = 0;
var yLegend = 0;
var widthLegend = 400;

var title_g = legend.append("g");
var values_g = legend.append("g");

var legendTitle = title_g.append("text")
.text("Legend")
.attr("font-family", "sans-serif")
.attr("font-size", "18px")
.attr("fill", "black");
var bbox = legendTitle.node().getBBox();
legendTitle.attr("x", xLegend + widthLegend/2 - bbox.width/2);
legendTitle.attr("y", yLegend + 20);

var legendTextEl = title_g.append("text")
.text(legendText)
.attr("y", yLegend + 75)
.attr("font-family", "sans-serif")
.attr("font-size", "14px")
.attr("fill", "black");
var bbox = legendTextEl.node().getBBox();
legendTextEl.attr("x", xLegend + widthLegend/2 - bbox.width/2);

for (i=0; i < numSquares; i++){
values_g.append("rect")
.attr("x", xLegend + (i+0.1*i/numSquares)*(widthLegend/numSquares))
.attr("y", yLegend + 25)
.attr("width", (widthLegend/numSquares)*0.9)
.attr("height", 20)
.attr("fill", color(accessor.domain[0] + i * accessor.domain[1]/(numSquares-1)));
var value_text = values_g.append("text")
.text(accessor.domain[0] + i * accessor.domain[1]/(numSquares-1))
.attr("y", yLegend + 55)
.attr("font-family", "sans-serif")
.attr("font-size", "12px")
.attr("fill", "black");
var bbox = value_text.node().getBBox();
value_text
.attr("x", xLegend + (i+0.1*i/numSquares)*(widthLegend/numSquares) + (widthLegend/numSquares)*(0.9/2)- bbox.width/2);

}
var scale = accessor.width / 400;
legend.attr("transform","translate("+accessor.posx+","+accessor.posy+") scale("+scale+") ");

};
};


•  Note how the function is called. The argument accessor is the one thas have the object properties (if called from a geometry, for instance), and selection is the node where the function is called from.
• The code is not very D3jsnic, using things as a for loop but it works and can be reused for drawing other legends from a D3js scale.

### Related posts and sites

1. Hi Roger,
Have you considered James Fee over at planetgs.com to link to your very useful blog?
JAMES.FEE@GMAIL.COM

Karl Zinglersen

2. Thank you for your idea, Karl

3. Hi Roger,
Thanks for sharing your methodology. In order to make web maps based on NUTS regions, you could also consider using this: https://github.com/jgaffuri/Nuts2json
DB

1. Thank you for the link, I think it's much easier with it.