In a research paper and technical report presented at the USENIX Networked Systems Design and Implementation (NSDI) conference at the beginning of April, researchers from Northwestern University described new methods for estimating the physical location of an IP address tens or hundreds of times more accurately than previously thought possible. The technique builds on existing approaches but adds a new element: it uses local businesses, government agencies, and educational institutions as landmarks, helping it achieve a median accuracy of just 690m, less than half a mile.
The researchers, led by Yong Wang, used a variety of statistical techniques to combine data from 163 public ping servers and 136 traceroute servers into a precise estimate of the range of possible physical locations for a particular IP address. They state that, despite the large number of data sources they need to combine, their technique is capable of real-time use, giving results in just one or two seconds in real-world applications. The novel technique uses several iterations to successively home in on a target’s location.
How it works
Step one: a signal travels through optical cables at about two-thirds the speed of light, which drops down to about four-ninths the speed of light once you account for queuing at uncongested routers. The researchers’ first iteration takes advantage of this fact by pinging the targeted address from multiple servers, then recording how long each signal takes to return. The return time puts an upper limit on how far the signal can have traveled. Since the servers have known locations, this absolute timing yields a set of circles, one around each ping server, and the target must lie within the area where all of these circles overlap.
At this point, the researchers have a pretty good idea of the general area of the target address (to within several miles), so they can start homing in by looking for local landmarks.
Step two: a set of points within the possible area is chosen, and these geographic points are converted into their corresponding postal ZIP codes. For each ZIP code found, a commercial mapping service is used to find businesses, schools, and other institutions in the area. The researchers are looking for locations that publish their street address on their website and also host their website from that same physical address. The websites of the candidate businesses are scraped, looking for a street address.
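To make the arithmetic in step one concrete, here is a minimal Python sketch of how a ping time can be turned into a circle and how a candidate point can be tested against several circles. It is not the researchers' code: the function names, the use of half the round-trip time as the one-way delay, and the haversine distance formula are assumptions made for illustration. Only the four-ninths figure comes from the description above.

```python
import math

C_KM_PER_MS = 299.792458      # speed of light in a vacuum, km per millisecond
EFFECTIVE_FRACTION = 4 / 9    # effective signal speed as a fraction of c (figure cited above)
EARTH_RADIUS_KM = 6371.0      # mean Earth radius (assumption for the distance formula)


def max_distance_km(rtt_ms):
    """Upper bound on server-to-target distance, assuming one-way delay = RTT / 2."""
    return (rtt_ms / 2) * C_KM_PER_MS * EFFECTIVE_FRACTION


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_feasible_region(point, measurements):
    """True if `point` (lat, lon) lies inside every server's circle.

    `measurements` is a list of (server_lat, server_lon, rtt_ms) tuples.
    """
    lat, lon = point
    return all(
        haversine_km(lat, lon, s_lat, s_lon) <= max_distance_km(rtt_ms)
        for s_lat, s_lon, rtt_ms in measurements
    )
```

For example, a 9 ms round trip bounds the target to within roughly 600 km of the server under these assumptions. Each additional server adds a circle, and the overlap shrinks accordingly.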
Meanwhile, a couple of clever techniques are used to weed out websites that are hosted by a CDN, on a shared hosting service, or otherwise located away from the physical address. The resulting places are very important landmarks, because they combine a known location on the network with a precise geographic point.
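The scraping part of step two can be pictured with a small Python sketch. The researchers' actual checks are not spelled out here, so everything in it is an assumption: the function names are hypothetical, the regular expression handles only US ZIP codes, and the sample page text is made up. It only shows the basic idea of confirming that a candidate site publishes an address in the expected ZIP code.

```python
import re

# Matches a 5-digit US ZIP code, optionally followed by a ZIP+4 suffix.
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")


def zip_codes_in_text(page_text):
    """Return the set of 5-digit ZIP codes that appear in the page text."""
    return {match.group(0)[:5] for match in ZIP_RE.finditer(page_text)}


def page_confirms_zip(page_text, candidate_zip):
    """True if the page mentions the ZIP code the candidate landmark should be in."""
    return candidate_zip in zip_codes_in_text(page_text)


# Made-up sample page text for illustration only.
sample = "Visit us at 123 Example St, Springfield, IL 62701-1234. Call 217-555-0100."
print(page_confirms_zip(sample, "62701"))
```

A pattern this simple will also match any other five-digit number on the page, such as a long street number, so a real system would need stricter address parsing before trusting a landmark.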
Step three: now that the researchers have reliable pairs of IP and physical addresses, they can start searching for Internet backbone routers in the vicinity. They send traceroute requests from as many servers as possible to both the nearby landmarks and to the target IP address. Comparing some of these traces and the geographic locations of the known landmarks, they can deduce which nearby routers are connected to both the target and the landmark.
Then, using timing data from the pings, they eliminate congested routers that add too much delay to be reliable sources of distance data. The time it takes these nearby routers to ping the target yields another, more fine-grained set of circles that constrain the target’s location again, this time down to an area of just a few city blocks.
It turns out that physical distances vary in close proportion with relative ping times of nearby landmarks. The researchers can look at a particular router and see how long it takes pings through that router to reach landmarks and the target. The relative ping times can then be translated into quite accurate local distances. Now, the research team can guess how close the target is to the small number of landmarks which remain in the possible area, and associate its physical location with that of the nearest, most reliable landmark.
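The final association step can be illustrated with a short Python sketch. It assumes one simple way of comparing relative ping times: for each router that appears on the path to both the target and a landmark, add the two delays and pick the landmark with the smallest total. The function name, the data layout, and the sum-of-delays rule are illustrative assumptions rather than a description of the researchers' exact method.

```python
def closest_landmark(target_delays, landmark_delays):
    """Pick the landmark with the smallest combined delay via a shared router.

    target_delays:   {router: delay_ms from that router to the target}
    landmark_delays: {landmark: {router: delay_ms from that router to the landmark}}

    Returns (landmark, combined_delay_ms), or None if no router is shared.
    """
    best = None
    for landmark, delays in landmark_delays.items():
        # Only routers seen on the paths to both the target and this landmark count.
        for router in target_delays.keys() & delays.keys():
            combined = target_delays[router] + delays[router]
            if best is None or combined < best[1]:
                best = (landmark, combined)
    return best


# Hypothetical measurements in milliseconds.
target = {"r1": 0.4, "r2": 1.2}
landmarks = {
    "library": {"r1": 0.3},
    "school": {"r1": 2.0, "r2": 0.1},
    "cafe": {"r3": 0.05},  # shares no router with the target, so it is ignored
}
print(closest_landmark(target, landmarks))
```

In this made-up data the library wins because its combined delay through router r1 is the smallest, so the target would be mapped to the library's street address.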
This final analysis gives a very good guess at the target’s location: the median estimate is about 690m away from the target’s actual position. That’s almost close enough to send in the black helicopters—or the lawyers.
Here come the ads
The most important part of the research is that the method described is completely client-independent: it doesn’t require any particular software on (or even permission from) the computer being targeted. This makes it particularly valuable to advertisers, who can now choose to target ads for the burger joint down the street or the record shop a block over.
But the technique also has some serious privacy implications. Before this, turning an IP address into a truly accurate location required a lot of work and some human interaction. With this method, the barriers to accessing real location data are considerably lower.