bloghow2doitde

A blog about computer science and a life between Heidelberg and Munich

Working and analysing BGP data always contains a bit of geographical knowledge. So I was in need of a geolocation database to look up certain IPs and different ranges and their location. This article is about creating such a small database and about how to get information out of it.

Creating the databases

The first thing we have to do is to download a free Geolocation-Database like GeoLite City of MaxMind. If the uncompressing is done we can start creating the mySQL database. There are now two CSV files in your directory. GeoLiteCity-Blocks.csv contains the decoded IP ranges and GeoLiteCity-Location.csv the locations with some metadata related to them.

The easiest part is now to create the database.

1
2
CREATE DATABASE geoip;
USE geoip;

Creating the database table for the IP Ranges mapped to a certain location (in form of an ID) is a simple thing. The only strange thing here is that MaxMind stores the first and the last IP address of an IP range instead of range information in a syntax like CIDR. I suggest to use a primary key containing both IPs as they are the ones you access when you want to locate a certain IP address.

1
2
3
4
5
6
CREATE TABLE blocks (
  startIP INT(10) UNSIGNED NOT NULL,
  endIP INT(10) UNSIGNED NOT NULL,
  locationID INT(10) UNSIGNED NOT NULL,
  PRIMARY KEY (startIP, endIP)
);

To import the csv’s into your newly create table. In the most scenarios it’s no need to worry if one or two out of 2255500 entries are corrput.

1
LOAD DATA LOCAL INFILE '/path/to/GeoLiteCity/GeoLiteCity-Blocks.csv' INTO TABLE geoip.blocks FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';

We can now create the table containing the location information. I had a look at the data and found that all fields marked with NOT NULL are actually never empty in the given dataset of MaxMind. Therefore I denied null values within the table.

1
2
3
4
5
6
7
8
9
10
11
12
CREATE TABLE location (
  locationID INT(10) UNSIGNED NOT NULL,
  country CHAR(2) NOT NULL,
  region CHAR(2) NOT NULL,
  city VARCHAR(50),
  postalcode VARCHAR(10),
  latitude FLOAT NOT NULL,
  longitude FLOAT NOT NULL,
  dmacode INTEGER NOT NULL,
  areacode INTEGER NOT NULL,
  PRIMARY KEY(locationID)
);

Now it’s time to import the data from the CSV location file. This can be done similar to the blocks file of course.

1
LOAD DATA LOCAL INFILE '/path/to/GeoLiteCity/GeoLiteCity-Location.csv' INTO TABLE geoip.location FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';

The last step is to create an account to access the tables. As we do not change anything we only need SELECT right on mySQL for this very user.

1
GRANT SELECT on geoip.* TO 'geoipuser'@'%' IDENTIFIED BY 'mySweetLittlePassword';

Usage of the geolocation database

It’s time for having some fun now. Let’s have a look where for example the website of the Technical University of Munich is located. The IP of the www.tum.de resolves to 129.187.255.228. We have to use a light JOIN to get all the information needed, but as this is a very small one there is no need to worry about cryptographic SQL statements. And we also forget about more complex issues like anycasts - we’re just keeping it clear and simple.

But the tricky part is to encode this IP. As mySQL has no special field-types as PostgreSQL does the whole IPs are stored in a decoded way. The CSV files already include all IP data in an encoded way. The MaxMind website includes information about how the encoding is done. So we do it this way and get 2176581604 which is calculated by 2176581604 = 16777216 * 129 + 65536 * 187 + 256 * 255 + 228. Fairly trivial stuff after reading the website of MaxMind.

Another important part of the query is to remember that we’ve got an starting and ending IP address of a whole rang. In most cases the IP we’ve got to lookup lies somewhere in between. So we need to use greater than or less than combined with equal as comparing functions. A working SQL statement therefore looks like this:

1
2
3
4
5
6
7
8
SELECT b.startIP, b.endIP, l.country, l.region, l.city FROM location l JOIN blocks b ON (l.locationID = b.locationID) WHERE 2176581604 >= b.startIP AND 2176581604 <= b.endIP;

+------------+------------+---------+--------+----------+
| startIP    | endIP      | country | region | city     |
+------------+------------+---------+--------+----------+
| 2176577536 | 2176581631 | DE      | 02     | Garching |
+------------+------------+---------+--------+----------+
1 row in set (0.12 sec)
posted in: Geolocation, Linux, mySQL