Address not found: can we really trust geocoded crime data?
The ease with which crime statistics can now be mapped online is one of the few indisputable achievements that the otherwise dubious era of Web 2.0 can claim on the social front.
The pioneering site ChicagoCrime (now defunct, at least in its original form) was one of the first to expose us to the elegance of mash-ups by showing how crime statistics gathered by police departments all over the world could be neatly plotted on online maps like those provided by Google. Back in the day, there were hardly any shortcuts for doing this, so the data had to be hard-coded – and yet, it all worked beautifully – thanks to the genius of developers like Adrian Holovaty.
By 2009, projects like ChicagoCrime had either become part of the traditional media establishment, integrated into the websites of local and regional newspapers, or simply morphed into something bigger in scale (ChicagoCrime followed the latter path, evolving into a very ambitious site called EveryBlock, which tracks many other kinds of data – not just crime – for a number of cities). The process seemed almost glitch-free: as long as the data gathered by police could be trusted, there was hardly any reason to suspect that something might be wrong with the back-end technology.
A lengthy article in Sunday’s Los Angeles Times – investigating an unlikely spike in crime in a particular area of Los Angeles – sheds some light on what can go wrong with the geocoding technology that is widely used to plot police reports on a city map (LAPD’s map of crime in Los Angeles is available here). Geocoding is actually less ominous than it sounds; it simply denotes the process of deriving latitude and longitude from other geographic data like street addresses or zip codes. No computer system, as we know, is perfect; often, it can’t see that “68th ST” and “W 68th St” might refer to the same neighborhood, and any geocoder – a piece of software that helps with the process – can make an occasional mistake.
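To make the “68th ST” versus “W 68th St” problem concrete, here is a minimal Python sketch of the kind of normalization a geocoder might apply before comparing street names. The lookup tables and the function names `street_key` and `may_match` are hypothetical, invented for illustration; nothing here reflects how the LAPD's system or any particular geocoder actually works.

```python
import re

# Hypothetical lookup tables; a real geocoder would rely on a far larger gazetteer.
SUFFIXES = {
    "ST": "ST", "STREET": "ST",
    "AVE": "AVE", "AVENUE": "AVE",
    "BLVD": "BLVD", "BOULEVARD": "BLVD",
}
DIRECTIONS = {"N", "S", "E", "W", "NORTH", "SOUTH", "EAST", "WEST"}


def street_key(raw: str) -> str:
    """Reduce a street string to a comparable key.

    Ignores case, punctuation and directional words, and maps
    common suffix spellings to one abbreviation.
    """
    tokens = re.sub(r"[^\w\s]", "", raw).upper().split()
    core = [SUFFIXES.get(t, t) for t in tokens if t not in DIRECTIONS]
    return " ".join(core)


def may_match(a: str, b: str) -> bool:
    """True if two street strings could refer to the same street."""
    return street_key(a) == street_key(b)


print(may_match("68th ST", "W 68th St"))
```

Dropping the direction is a deliberate simplification: it tells us only that two records might belong together, not which stretch of the street is meant, so a match of this kind is better treated as a flag for review than as a final answer.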
The situation gets even more complicated in cities as big and complex as New York or Los Angeles, which often have similarly named streets both uptown and downtown or spread across different boroughs. This may be acceptable for a casual user, but geocoded crime data is increasingly used for policy-making and allocating resources, which demand much higher quality standards (much of this data is also fed into aggregators like EveryBlock, which help even more people discover it; the problem is that these aggregators do not check the validity of this data, trusting that governments have done their job).
The LAT piece is worth reading in full. It also discusses the dangerous implications of relying on similar automated geocoding techniques to map data about sex offenders, and the role that sites like EveryBlock play in the process. Below is a short excerpt:
…Unable to parse the intersection of Paloma Street and Adams Boulevard, for instance, the computer used a default point for Los Angeles, roughly 1st and Spring streets.
Mistakes could have the effect of masking real crime spikes as well as creating false ones.
In the six months from last October through March, the LAPD placed 1,380 crimes — 4% of all crimes mapped — at the Civic Center point, a rate of nearly eight a day, a Times analysis found. The Times discovered the problems while developing its own crime website that will feature the LAPD data. After finding the Civic Center error and others, The Times is developing a strategy to geocode addresses with a higher degree of accuracy.
In the LAPD map, many of the crimes placed downtown actually occurred in the San Fernando Valley, the Westside or South L.A. Sometimes, L.A. crimes were placed outside the city, as far away as Lancaster and Catalina Island. In hundreds of cases, crimes that took place in South Los Angeles migrated north dozens of blocks.
Alerted to the findings, Lightray Productions, the contractor that designed the LAPD site at a cost of at least $362,000, has promised to fix the problems
…One reason the errors were not caught earlier may be that the LAPD site retains crimes for only six months and allows viewers to see only a seven-day period at a time. The presentation makes some trends, such as the large accumulation of crimes mapped at Civic Center, more difficult to spot
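The pile-up at a default point described in the excerpt is the sort of error a simple audit could surface. The following is a rough Python sketch, not the method used by The Times or the LAPD: it counts how many records share exactly the same coordinates and reports any point that holds an implausibly large share of the total. The function name `suspicious_points`, the 2% threshold and the sample coordinates are all assumptions made for illustration.

```python
from collections import Counter


def suspicious_points(coords, threshold=0.02):
    """Return (point, share) pairs whose share of all records exceeds threshold.

    coords is a list of (latitude, longitude) tuples. The threshold is an
    arbitrary illustrative cut-off, not an established standard.
    """
    total = len(coords)
    if total == 0:
        return []
    counts = Counter(coords)
    return [
        (point, n / total)
        for point, n in counts.most_common()
        if n / total > threshold
    ]


# Made-up data: 4 of 100 records land on one fallback point,
# while the other 96 are spread over distinct locations.
fallback = (34.05, -118.24)
sample = [fallback] * 4 + [(34.0 + i * 0.01, -118.3) for i in range(96)]

for point, share in suspicious_points(sample):
    print(point, f"{share:.0%}")
```

Because the check runs over the whole data set at once, it does not depend on anyone spotting the cluster in a seven-day window on a map.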
Photo by wwworks/Flickr