{"id":342,"date":"2015-05-01T15:52:43","date_gmt":"2015-05-01T15:52:43","guid":{"rendered":"http:\/\/www.reades.com\/?p=342"},"modified":"2015-05-01T15:52:43","modified_gmt":"2015-05-01T15:52:43","slug":"hex-binning-land-registry-data","status":"publish","type":"post","link":"http:\/\/www.reades.com\/wp\/?p=342","title":{"rendered":"Hex Binning Land Registry Data"},"content":{"rendered":"<p>\t\t\t\tOne of the known problems with choropleth maps is that small zones, even if they contain very significant values, tend to get lost in amongst much larger zones. A current example is that the ridings in London are much smaller than those outside of London, so it can be hard to tell what&#8217;s happening in the capital if you are looking at a map of the entire UK.\u00a0One solution to this is the hexagonal bin.<!--more--><\/p>\n<p>The idea behind spatial &#8216;binning&#8217; (<em>i.e.<\/em> grouping things together into &#8216;buckets&#8217; or &#8216;bins&#8217;)\u00a0is that, rather than try to show everything at once, you impose a regular grid on top of your map and calculate a useful metric to summarise whatever fell within. This way,\u00a0every zone on your map is the same size and it&#8217;s easier to see the overall pattern. My former colleague, Duncan Smith of UCL, has done a very\u00a0nice job with hexagonal bins on the <a href=\"http:\/\/luminocity3d.org\/index.html#population_density_2011\/10\/51.4489\/-0.1284\" target=\"_blank\" rel=\"noopener\">Luminocity3D<\/a> web site.<\/p>\n<p><a href=\"http:\/\/www.reades.com\/wp-content\/uploads\/2015\/05\/Screen-Shot-2015-05-01-at-16.18.34.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-344\" src=\"http:\/\/www.reades.com\/wp-content\/uploads\/2015\/05\/Screen-Shot-2015-05-01-at-16.18.34.png\" alt=\"Luminocity3D\" width=\"483\" height=\"339\" \/><\/a><\/p>\n<p>Of course, this isn&#8217;t a perfect solution since we still haven&#8217;t resolved the problem that a large zone might cover several hexagons and so <em>still<\/em> look much more important than the single hexagon containing, for example, the City of London and the West End. <a href=\"http:\/\/www.theguardian.com\/politics\/ng-interactive\/2015\/apr\/20\/election-2015-constituency-map\" target=\"_blank\" rel=\"noopener\">The Guardian<\/a> has a particularly neat solution in the context of the election since they&#8217;ve combined it with a cartogram, in which zones are scaled according to some property.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-343\" src=\"http:\/\/www.reades.com\/wp-content\/uploads\/2015\/05\/Screen-Shot-2015-05-01-at-16.18.57.png\" alt=\"Guardian Cartogram\" width=\"383\" height=\"570\" \/><\/p>\n<p>So I&#8217;ve been wanting to experiment with this approach for a while&#8230; below are the results for property affordability in 1997 and 2012 using postcode-level data and 2,500m hexagons. Production\u00a0details below.<\/p>\n<p><a href=\"http:\/\/www.reades.com\/wp-content\/uploads\/2015\/05\/1997-and-2012-2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-345\" src=\"http:\/\/www.reades.com\/wp-content\/uploads\/2015\/05\/1997-and-2012-2-1024x724.png\" alt=\"Affordability Changes\" width=\"560\" height=\"396\" \/><\/a><\/p>\n<h3>Production Info<\/h3>\n<p>These maps were produced using a mix of QGIS, Python, and Postgres\/PostGIS.<\/p>\n<h4>Overlay Grid<\/h4>\n<p>Using the MMQGIS plug-in for QGIS select &#8220;Create&#8221; then &#8220;Create Grid Lines Layer&#8221;. Since I&#8217;m working with the Ordnance Survey&#8217;s British National Grid I could specify my Left X and Bottom Y as 0, and the width and height to the entire UK (roughly: 667500 and 1224000). Then set the H Spacing to your desired &#8216;bin size&#8217; in metres.<\/p>\n<p>You can then clip the bins to the UK shoreline by doing a spatial join (an intersection) either directly within QGIS or within the Postgres database. The latter is much, much faster. For these maps I used 2,500m bins. I am currently trying to clip a 750m bin that would give me higher resolution but that doesn&#8217;t seem to have completed as of 4 hours ago.<\/p>\n<h4>Adding Pricing Data<\/h4>\n<p>Although the Land Registry provides unit address-level data, the easiest resolution to work with is the postcode (<em>e.g.<\/em> WC2R 2LS). The Ordnance Survey now provides a <a href=\"https:\/\/www.ordnancesurvey.co.uk\/business-and-government\/products\/code-point-open.html\" target=\"_blank\" rel=\"noopener\">Code-Point Open<\/a> data set that contains postcode centroids which is rather useful for this task. We will &#8216;map&#8217; every transaction to a postcode centroid, and then calculate a median for all transactions falling within each hexagonal bin. This isn&#8217;t perfect, but it&#8217;s the only computationally feasible one.<\/p>\n<p>I suppose that, if you were a very patient person, you could do this step within QGIS as well but I am\u00a0simply\u00a0not <em>that<\/em> patient. I&#8217;m still working on documenting and releasing some ETL code to process LandReg data, but for now that&#8217;s beyond the scope of this post so I&#8217;ll just assume that you&#8217;ve managed to load the data in such a way as to preserve the columns and can now query and group your data by postcode and year.<\/p>\n<h4>Affordability<\/h4>\n<p>The measure of affordability I&#8217;m using is derived from Median Gross Household Income for the entire UK. This is a reasonable measure since it&#8217;s closest to what the banks are looking at when they&#8217;re thinking about what to lend to a household in order to purchase a property. Historically, 3 times gross income was considered a fairly safe bet, but more recently these ratios have climbed rather steeply.<\/p>\n<p>The 1997 and 2012 incomes\u00a0have to be inflation-adjusted since the Land Registry doesn&#8217;t inflation-adjust the price paid. I wrote a function to do this for me directly within Postgres:<\/p>\n<p>[code]<\/p>\n<p>CREATE OR REPLACE FUNCTION RPIConversion (<br \/>\ni_price float8,<br \/>\ni integer,<br \/>\nj integer<br \/>\n)<br \/>\nRETURNS numeric<br \/>\nAS $$<br \/>\nDECLARE<br \/>\nadjusted numeric := 0; &#8212; total area<br \/>\nBEGIN<br \/>\nSELECT i_price * (SELECT rpi_idx FROM inflation.rpi_fct WHERE yr=j)\/(SELECT rpi_idx FROM inflation.rpi_fct WHERE yr=i) INTO adjusted;<br \/>\nRETURN ROUND(adjusted::numeric,2);<br \/>\nEND;<br \/>\n$$ LANGUAGE plpgsql;<br \/>\nALTER FUNCTION RPIConversion(float8, integer, integer) OWNER TO postgres;<br \/>\n[\/code]<\/p>\n<p>Note that this version of the function assumes that your database has an\u00a0<em>inflation<\/em> schema with a RPI (<code>rpi_fct<\/code>) table with the following layout:<\/p>\n<p>[code]<\/p>\n<p>yr (smallint)<\/p>\n<p>rpi_idx (numeric)<\/p>\n<p>[\/code]<\/p>\n<h4>Materialised Views<\/h4>\n<p>They&#8217;re not often used, but materialised views are a rather handy feature of good databases \u2013 the look and act like real tables, but they are dynamically created\u00a0using a SQL statement instead of the full CREATE TABLE command. The principal difference between Materialised and Non-Materialised (Immaterialised?) Views is that the latter are slower but update dynamically. In other words, if you add more data to one of the underlying tables involved in a Non-Materialised View then the view is also updated.<\/p>\n<p>Anyway, these play nicely with QGIS&#8217; Postgres tools, and the results can be added as layers that are as quick to work with as a local shape file. I&#8217;ve parameterised the query below so that I can create annual views for the entire data set by simply running a Python script (<em>i.e.<\/em> all &#8216;{}&#8217; are substituted with the &#8216;current&#8217; year&#8217;): to process all 20 years&#8217; worth of data took approximately 10 minutes and the results can be added to QGIS in seconds.<\/p>\n<p>[code]<br \/>\nCREATE MATERIALIZED VIEW viz.hex{} AS<br \/>\nSELECT<br \/>\nrow_number() OVER() as id,<br \/>\nextract(year from pp.completion_dt) as yr,<br \/>\nquantile(pp.price_int, 0.5) as median_price,<br \/>\nRPIConversion(lf.income_amt*52::numeric, 2012, {}) AS median_income,<br \/>\nround(quantile(pp.price_int, 0.5)::numeric \/ RPIConversion(lf.income_amt*52::numeric, 2012, {}), 2) AS affordability,<br \/>\nh.gid,<br \/>\nh.geom<br \/>\nFROM<br \/>\nlandreg.price_paid_fct pp,<br \/>\ninflation.hh_income_fct lf,<br \/>\nosopen.hex_mapping_dim hd,<br \/>\nosopen.hex_2500m h<br \/>\nWHERE extract(year from pp.completion_dt) = {}<br \/>\nAND lf.year = {}<br \/>\nAND lf.metric_nm::text = &#8216;Median&#8217;::text<br \/>\nAND lf.region_nm::text = &#8216;All households&#8217;::text<br \/>\nAND pp.pc = hd.pc<br \/>\nAND hd.h2500m = h.gid<br \/>\nGROUP BY yr, h.gid, lf.income_amt;<\/p>\n<p>[\/code]<\/p>\n<p>In the worst tradition of university lecturers: I&#8217;ll leave the remainder of the workflow\u00a0as an exercise for the reader.\t\t<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the known problems with choropleth maps is that small zones, even if they contain very significant values, tend to get lost in amongst much larger zones. A current example is that the ridings in London are much smaller than those outside of London, so it can be hard to tell what&#8217;s happening in [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":345,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,5,8,13],"tags":[19,60,81],"class_list":["post-342","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-economic-geography","category-mapping","category-postgis","category-visualisation","tag-affordability","tag-land-registry","tag-price-paid"],"_links":{"self":[{"href":"http:\/\/www.reades.com\/wp\/index.php?rest_route=\/wp\/v2\/posts\/342","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.reades.com\/wp\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.reades.com\/wp\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.reades.com\/wp\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.reades.com\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=342"}],"version-history":[{"count":0,"href":"http:\/\/www.reades.com\/wp\/index.php?rest_route=\/wp\/v2\/posts\/342\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.reades.com\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=342"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.reades.com\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=342"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.reades.com\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=342"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}