Tag Archives: Python

How To Python import GeoIP in OS X


I recently bought a MacBook Pro and started doing some Python/Django development on it. I reached a point where I needed MaxMind.com’s GeoIP module, and setting GeoIP up on OS X turned out to be a bit more involved than on Ubuntu, though not too bad.

On Ubuntu you can essentially run sudo apt-get install python-geoip and things start working, but OS X takes slightly more effort.

 

To do this on Mac OS X you need to install two things:

1. The MaxMind GeoIP C library, http://www.maxmind.com/app/c
./configure
make
make check
make install

2. The Python MaxMind GeoIP library, http://www.maxmind.com/app/python
python2 setup.py build
python2 setup.py install
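
Once both installs finish, a quick way to confirm that Python can see the module is to ask the import system for it. This is only a minimal sketch: the helper name module_available is mine, not part of MaxMind's library, and it assumes a Python 3 interpreter (importlib.util is not available in the Python 2 used above).

```python
import importlib.util


def module_available(name):
    """Return True if Python can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "GeoIP" is the module name imported elsewhere on this page.
    print("GeoIP importable:", module_available("GeoIP"))
```

If this reports False, the Python bindings were probably installed for a different interpreter than the one you are running.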

 

For further information here’s a great page on MaxMind.com http://www.maxmind.com/app/geoip_resources

Python GeoIP to TimeZone! Plus Daylight Savings Time calculation!


Some of you may have read my article about python-geoip for mapping an IP to a city.

If you provide a service which relies on timezones or even just want to display a message at that time then being able to map an IP address to a timezone can be valuable.

This assumes that you have the GeoLiteCity.dat database set up and working. My python-geoip cities tutorial covers how to do that.

Once that is working, all you have to do to get the timezone is pass the country code and region to the GeoIP library function GeoIP.time_zone_by_country_and_region().

Here’s an example…

>>> import GeoIP
>>> gi = GeoIP.open("/usr/share/GeoIP/GeoIPCity.dat", GeoIP.GEOIP_STANDARD)
>>> record = gi.record_by_addr("74.125.93.106")
>>> GeoIP.time_zone_by_country_and_region(record['country_code'], record['region'])
'America/Los_Angeles'
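
If you do this lookup in more than one place, it may be worth wrapping it in a small helper. The sketch below is one possible shape, not part of the GeoIP library: the name timezone_name_for_ip is hypothetical, and the guard assumes that an address missing from the database yields an empty or None record. The database object and the timezone lookup are passed in as arguments, so the helper itself does not import GeoIP.

```python
def timezone_name_for_ip(gi, ip, tz_lookup):
    """Return a timezone name for `ip`, or None if it cannot be resolved.

    gi        -- an opened GeoIP city database (anything with record_by_addr)
    tz_lookup -- a callable taking (country_code, region), for example
                 GeoIP.time_zone_by_country_and_region
    """
    record = gi.record_by_addr(ip)  # one lookup instead of two
    if not record:
        return None  # assumption: unknown addresses give no record
    return tz_lookup(record["country_code"], record["region"])


# Usage, assuming the city database lives at this path:
#   import GeoIP
#   gi = GeoIP.open("/usr/share/GeoIP/GeoIPCity.dat", GeoIP.GEOIP_STANDARD)
#   timezone_name_for_ip(gi, "74.125.93.106",
#                        GeoIP.time_zone_by_country_and_region)
```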

PyTZ

PyTZ is a nice library that you can use to manipulate timezones in Python.

To work with the timezone for an IP address, for example to calculate DST, pass the timezone name that GeoIP returns to PyTZ.

>>> from pytz import timezone
>>> tz = timezone(GeoIP.time_zone_by_country_and_region(gi.record_by_addr("74.125.93.106")['country_code'], gi.record_by_addr("74.125.93.106")['region']))

Calculating Daylight Savings Time (DST)

>>> from datetime import datetime
>>> loc_dt = tz.localize(datetime(2011,5,26,16,0,0))
>>> loc_dt.dst()
datetime.timedelta(0, 3600)
>>> loc_dt = tz.localize(datetime(2011,12,26,16,0,0))
>>> loc_dt.dst()
datetime.timedelta(0)

Here you can see that on May 26th 2011 the DST offset is 3600 seconds (an hour), and on December 26th 2011 the DST offset is 0.
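
The same check can be wrapped in a pair of small functions. This is a sketch under a few assumptions: the names dst_offset and is_dst are mine, the code targets Python 3, and it treats a zone whose dst() returns None (such as plain UTC) as having no DST. It uses localize() when the zone object offers it, as pytz zones do, and otherwise attaches the zone directly.

```python
from datetime import datetime, timedelta, timezone


def dst_offset(tz, naive_dt):
    """Return the DST offset in effect at local time `naive_dt` in zone `tz`."""
    if hasattr(tz, "localize"):
        # pytz zones need localize() to pick the right UTC offset.
        aware = tz.localize(naive_dt)
    else:
        # Other tzinfo objects (for example datetime.timezone) attach directly.
        aware = naive_dt.replace(tzinfo=tz)
    # dst() may return None for zones that carry no DST information.
    return aware.dst() or timedelta(0)


def is_dst(tz, naive_dt):
    """True when daylight saving time is in effect at that local time."""
    return dst_offset(tz, naive_dt) != timedelta(0)


# UTC never observes DST, so this is a safe smoke test without pytz.
print(is_dst(timezone.utc, datetime(2011, 5, 26, 16)))

# With pytz installed:
#   from pytz import timezone as pytz_timezone
#   is_dst(pytz_timezone("America/Los_Angeles"), datetime(2011, 5, 26, 16))
```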

Summary

So here’s the complete code for figuring out daylight savings time from an IP…

>>> import GeoIP
>>> from pytz import timezone
>>> from datetime import datetime

>>> gi = GeoIP.open("/usr/share/GeoIP/GeoIPCity.dat", GeoIP.GEOIP_STANDARD)
>>> record = gi.record_by_addr("74.125.93.106")
>>> googles_tz = GeoIP.time_zone_by_country_and_region(record['country_code'], record['region'])
>>> tz = timezone(googles_tz)
>>> loc_dt = tz.localize(datetime(2011,5,26,16,0,0))
>>> loc_dt.dst()
datetime.timedelta(0, 3600)
>>> loc_dt = tz.localize(datetime(2011,12,26,16,0,0))
>>> loc_dt.dst()
datetime.timedelta(0)

JUnit and PyUnit, @Ignore and @expectedFailure


What’s @Ignore and @expectedFailure?

I think that JUnit’s description says it best, “Sometimes you want to temporarily disable a test.”

Here’s the whole thing for JUnit…

“Sometimes you want to temporarily disable a test. Methods annotated with Test that are also annotated with @Ignore will not be executed as tests. Native JUnit 4 test runners should report the number of ignored tests along with the number of tests that ran and the number of tests that failed.” - http://api.dpml.net/junit/4.2/org/junit/Ignore.html

And for Python these two links seem to begin to explain some of the thought behind the decisions to add this feature…

http://bugs.python.org/issue1399935
http://mail.python.org/pipermail/python-dev/2006-January/059503.html

My concern with this feature

I think this is one of those features that got added on a slow day, without much thought about the impact it has on the code and how it reflects on the language. It makes sense, right? You run your tests, you see some failing, and you decide that you expected those to fail for a little while during development, so you want them to stop sounding the alarm and blowing whistles!

This seems innocent enough at first but what are the long term implications?
Well, firstly, I think you should ask the question “Why do I have failing tests that I want to suppress?” Personally, I can’t think of any good reason. If you’re implementing something so huge that you’ve written tests you can’t make pass before you test it on an integration server, then you’ve probably bitten off more than you can chew, and your feature will change or get removed before you can make the tests pass anyway.

Secondly, writing tests should be as simple as possible. Adding something extra to a testing framework, something that also defeats the purpose of the framework, makes things far more complicated than they have to be. A unit testing framework is supposed to tell you when things pass or fail. If you don’t want tests to show up as failures, and they don’t indicate that something is broken, then remove them.

Thirdly, maintenance… who’s going to clean this all up? Some places warn users of these features to make sure they remove them before they finally commit or merge to trunk. Why not avoid the worry by not using them at all?

If anyone has a really great use case for this please share in the comments! I may have my opinions/rants but I’m mutable :)

Update:

Twitter response about @Ignore and @expectedFailure

This is an interesting perspective on the use of @expectedFailure. Basically, it can allow you to manage bugs in third-party libraries. For instance, if you find a new bug in some library that you’re using, you can write a test for that bug, and when it gets fixed the test will turn into an “unexpected success” and you can then clean it up.
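
To make that use case concrete, here is a minimal sketch using Python's unittest module. The function third_party_parse is a hypothetical stand-in for a library call with a known bug; it is not from any real package. While the bug exists the test is reported as an expected failure, and once the library is fixed it would show up as an unexpected success instead.

```python
import unittest


def third_party_parse(text):
    """Hypothetical stand-in for a library call with a known bug:
    it should strip surrounding whitespace but currently does not."""
    return text


class ThirdPartyBugTest(unittest.TestCase):
    @unittest.expectedFailure
    def test_parse_strips_whitespace(self):
        # Documents the bug; fails today, so it counts as an expected failure.
        self.assertEqual(third_party_parse("  42 "), "42")


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ThirdPartyBugTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_suite()
    print("expected failures:", len(result.expectedFailures))
    print("unexpected successes:", len(result.unexpectedSuccesses))
```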

For me, I still wonder how often this is truly useful. I haven’t seen a lot of tests written for bugs in third-party software. Also, if you have to use the part of the third-party library that has the bug, then your code depends on how that bug behaves, so you should already have tests covering its outcome.
Those tests cover “your” use of the third-party software. Obviously, if the third-party software has a severe bug in it then you may decide not to continue using it.

Thanks Andrzej Krzywda! Great article on your blog.

unittest-xml-reporting multi database django 1.2

Background

I’ve recently started playing with continuous integration platforms like Hudson and CruiseControl. Currently most of my projects are written in Python using Django. There’s just one problem… the Django unit test runner doesn’t output XML. These continuous integration platforms need JUnit XML output so that they can parse the test results.

There are a few test runner projects out there that seem pretty good. I haven’t seen any yet that support both XML output and multiple databases in Django (there could be some, I didn’t look super far).

What I did

I forked danielfm’s project (https://github.com/danielfm/unittest-xml-reporting), updated the run_tests() function, and converted it to the new class-based form that Django 1.2 expects for the TEST_RUNNER setting.

This article on django test runners explains the change pretty clearly http://docs.djangoproject.com/en/dev/topics/testing/?from=olddocs#defining-a-test-runner.

Changed in Django 1.2: Prior to 1.2, test runners were a single function, not a class.
A test runner is a class defining a run_tests() method. Django ships with a DjangoTestSuiteRunner class that defines the default Django testing behavior. This class defines the run_tests() entry point, plus a selection of other methods that are used to by run_tests() to set up, execute and tear down the test suite.

I also replaced the create_test_db() calls with the new setup_databases() and teardown_databases() methods.

How you can use it

Here is my fork of danielfm’s unittest-xml-reporting project with the updated multiple database support.

https://github.com/daspecster/unittest-xml-reporting

You can clone this project…

git clone git@github.com:daspecster/unittest-xml-reporting.git

Or download it from the url above. Just make sure that you get the master branch.

To start using this code in your project you will need to install this code and then include the django TEST_RUNNER information in your settings.py.

  1. After you’ve gotten the code from git, you’ll need to go into the unittest-xml-reporting directory and run $ python setup.py install,  as root. This will install the project into your python path.
  2. Now you need to tell django that you want to use this. You’ll need to add these lines to your django project’s settings.py file
    TEST_RUNNER = 'xmlrunner.extra.djangotestrunner.XMLTestRunner'
    TEST_OUTPUT_VERBOSE = True
    TEST_OUTPUT_DESCRIPTIONS = True
    TEST_OUTPUT_DIR = 'xmlrunner'
  3. Now you should be able to run $ python manage.py test and it will create a folder called xmlrunner that has all your XML for the results of your tests!
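
To sanity-check a report before wiring it into a CI server, you can total up the counts yourself. The sketch below is an illustration only: the sample XML is made up, and it assumes the report follows the common JUnit layout with tests, failures and errors attributes on each testsuite element. I have not verified the exact output of this runner against it.

```python
import xml.etree.ElementTree as ET


def summarize_junit_xml(xml_text):
    """Return (tests, failures, errors) totals from one JUnit-style XML report."""
    root = ET.fromstring(xml_text)
    # A report may be a single <testsuite> or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.iter("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals["tests"], totals["failures"], totals["errors"]


# Made-up sample report, not real output from the runner.
SAMPLE = """<testsuite name="example" tests="3" failures="1" errors="0">
  <testcase classname="example.Tests" name="test_a"/>
  <testcase classname="example.Tests" name="test_b">
    <failure message="boom"/>
  </testcase>
  <testcase classname="example.Tests" name="test_c"/>
</testsuite>"""

print(summarize_junit_xml(SAMPLE))
```

To use it on real output, read each file in the xmlrunner folder and pass its contents to summarize_junit_xml().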

Getting Started with GoogleCode and Mercurial

I put a few of my Python projects on GoogleCode last week and learned quite a few things that I thought I would share.

Setting It Up

If you want to start a GoogleCode project you’ll need the project creation page. Enter a project name (all lower case), summary, and description. For version control, select Mercurial, and go with whichever license suits you best. Add a few labels and create your project. You should then be taken to your project page, which you’ll probably want to bookmark.

The second piece you’ll need (if you don’t have it already) is Mercurial for your source code version control. Before I had projects on GoogleCode, I used Mercurial in single-player fashion, saving revisions of my own projects so I didn’t have to worry about making major or possibly risky changes.

Mercurial

Presuming you are putting an existing project on GoogleCode, the first thing you’ll want is a repository to push to the servers. Using the command line (or DOS prompt if you’re using Windows), navigate to the folder containing your code. You’ll need to create a repository, add your code to it, and commit it. You can use the following commands:

hg init
hg add
hg commit -m "Initial commit."

There are options for the add command, such as hg add -X [pattern], which excludes files matching a certain pattern. I typically don’t include .pyc files in my repositories, so often I’ll use hg add -X "*.pyc" (quoted so that the shell doesn’t expand the pattern before Mercurial sees it).

If you haven’t used Mercurial before, you’ll need to set up an hgrc file somewhere. There are several places you can put the file (see the link), and you can even have multiple files that Mercurial will combine (more on this later). For now, create a new file in your home directory called “.hgrc” (again, see the link above for details). Put the following information in the file:

[ui]
username = Your Name <youremail@server.com>
verbose = True

This tells Mercurial who made a commit (useful when looking through your revision log) and instructs Mercurial to show more output when doing things rather than less.

When working with GoogleCode, user authentication is required to push changes. You’ll need your Google username and the code/password generated by GoogleCode (go to the Sources tab and click the link for “googlecode.com password”). If you don’t do the following steps to do authentication automatically, you’ll have to enter your username and password each time you push, which, given the kind of generated password GoogleCode produces, is a pain.

To authenticate automatically, add an authorization section to your hgrc file.

[auth]
project.prefix = https://projectname.googlecode.com/hg/
project.username = your.username
project.password = geNeRAteDpasSWOrd

You can put multiple prefixes, usernames, and passwords here, using something else instead of “project.”. One irritating thing is that you have to repeat your name and password for each project, even if they’re the same. It would be nice if there was a way to reference a variable containing the data, but if there is, I haven’t found it. If anyone knows how, let me know (but it’s not a big deal).
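
One way around the repetition is to generate the section with a short script and paste the result into your hgrc. This is just a sketch and does not use any Mercurial API: the function name auth_section and the project names are hypothetical, and the key names (prefix, username, password) are the ones shown above.

```python
def auth_section(projects, username, password):
    """Build the text of an hgrc [auth] section for several GoogleCode projects."""
    lines = ["[auth]"]
    for name in projects:
        # Each project gets its own key group: <name>.prefix, <name>.username, ...
        lines.append("%s.prefix = https://%s.googlecode.com/hg/" % (name, name))
        lines.append("%s.username = %s" % (name, username))
        lines.append("%s.password = %s" % (name, password))
    return "\n".join(lines) + "\n"


# Hypothetical project names; substitute your own.
print(auth_section(["projectone", "projecttwo"],
                   "your.username", "geNeRAteDpasSWOrd"))
```

Keep a script like this out of your repository if it contains the real password.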

If you decide to do this, it’s important to make sure your hgrc file is not publicly visible, as in, stored in your repository somewhere. You can also put an hgrc file in the .hg directory in your repository, but putting your password in there isn’t a good idea.

Speaking of the hgrc file in your repository, let’s make one. Create a file named “hgrc” in the .hg folder inside the folder containing your code. This is another convenience step, meant to make pushing your code easier. Without it, you’ll have to issue the full command hg push https://yourproject.googlecode.com/hg/ every time. Adding this section to the repository’s hgrc file lets you use simply hg push:

[paths]
default-push = https://yourproject.googlecode.com/hg/

You should now be able to push your code to the GoogleCode servers. One thing I learned when I first did this is that your entire version history will go on GoogleCode, not simply your latest revision. In retrospect, this makes perfect sense since version control is simply a history of changes, all of which you’ll need to create the most recent version.

Added Bonus: Externals and Subrepos

One of the issues I discovered with serving code from a repository is how to manage external dependencies. I have several modules I use on a regular basis, for instance, a module of commonly used decorators. None of these modules, however, warrants a project that I could upload and list as a dependency for all my other projects, and of course, you don’t want to simply copy the module and add it to each repository that uses it (maintaining the same code in different places is not only a pain, but a bad design idea in general).

Enter subrepositories, an experimental feature in Mercurial version 1.3 and up. I’ve used it a little, and I’m still seeing how well it works, but at first glance, it seems to be the right idea. Here’s how I resolved my problem.

Put your external dependencies in a folder and make it a repository with the simple sequence hg init; hg add; hg commit -m "Log". You don’t need to worry about the hgrc file this time, since, if you put your username in the hgrc in your home directory, Mercurial will look there for it.

Now, navigate back to your main project’s folder. I keep my projects in a Code folder, among them a folder named “externals” with the code I often reuse between different projects. If you have a different scheme, you’ll have to adjust the following instructions accordingly.

You’ll have to do four things: create the .hgsub file, clone the externals repository, add the subrepository to the project, and commit. First, create the “.hgsub” file in the main project directory and put in it the line externals = ../externals, which tells Mercurial where to check for updates. For the other three steps, execute the following commands:

hg clone ../externals externals
hg add
hg commit -m "Added externals subrepository."

When you issue the “add” command, you might not see anything happen. That’s okay. I know I started trying to troubleshoot when I didn’t see anything happen, but it does the work on the commit, so you don’t have to worry. (Mercurial will also create a .hgsubstate file that it needs but you don’t need to worry about.)

The last thing to do is push your updated repository. Mercurial will pull changes for the subrepository from the repository listed in the .hgsub file. (I’m not sure if it does this on commit or push.) You can include multiple subrepositories by repeating the process.

Conclusion

That’s my experience putting projects on GoogleCode. Let us know yours in the comments.

Python code metrics

Recently at work I’ve been pushing to start tracking some metrics on our python code base. There are a few tools out there like pygenie, pychecker and pylint. These seem to be the leading code metric tools for python at the moment. Where I work we have eagerly adopted pylint in our daily use of python.

Pylint seems to be the most useful for the day-to-day developer. It’s fast and flexible: you can run it on the file you’re working in, a group of files, or the entire project.

There are some problems with all of these tools when it comes to frameworks. I’ve been using Django a lot lately, and the way it integrates its settings.py file kind of tricks these lint tools. So you may see errors like this…

ImportError: Settings cannot be imported, because environment variable DJANGO_SETTINGS_MODULE is undefined.

It appears there are a couple of django-lint checkers.

django-lint is a project that appears to have died. I’ve tried to use this a little bit, but I haven’t really had much success.

It appears however that the project has moved or branched http://chris-lamb.co.uk/projects/django-lint.

I haven’t tried this version but I imagine it would be better since it was last updated in March 2010.

You can also set the DJANGO_SETTINGS_MODULE environment variable like this…

$ export DJANGO_SETTINGS_MODULE=mysite.settings
$ django-admin.py runserver
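
If you launch your lint tool from a Python script rather than a shell, the same variable can be set on the environment you hand to the child process. This is a minimal sketch: the helper name lint_env is mine, and the commented usage assumes pylint is on your PATH and that "myapp" is your package name.

```python
import os


def lint_env(settings_module, base_env=None):
    """Return a copy of the environment with DJANGO_SETTINGS_MODULE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["DJANGO_SETTINGS_MODULE"] = settings_module
    return env


# Usage, assuming pylint is on your PATH and "myapp" is your package:
#   import subprocess
#   subprocess.call(["pylint", "myapp"], env=lint_env("mysite.settings"))
```

Working on a copy means the calling process's own environment is left untouched.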

So here’s my story:

Some surprising things came out of looking for lint tools, and I think they apply to Python or any other programming language you’re working in.

One was that my team latched on to the pylint scoring mechanism instantly, and it became a game of who could get a perfect score over all the code they were working in. This is good for a few reasons, and I challenge the naysayers out there.

Warning: POINT ahead!

Ok yes, maybe “your” coding style isn’t accommodated in pylint… you know what? It doesn’t matter. The POINT is that everyone is meeting a STANDARD.

That’s the point! I don’t care if no one can get a perfect score unless all their variables start with “i” (which I assume is Apple’s policy).  The power that comes with everyone driving, pushing and accelerating in the same direction is far far far more valuable than “your” coding style.

Python GeoIP (python-geoip) cities tutorial

Using geolocation is something that people are doing a lot lately. You may have noticed twitter.com in Firefox showing a little bar at the top asking if it’s alright to share your location. This is so that when you tweet, your location can show up there, and you can keep track of where you tweet from. There are many uses for this kind of information, and these days there are a lot of free resources out there to help you get started with your geolocation project.

Installing GeoIP

Requirements/Dependencies:

  • Python 2.4+
  • python-geoip
  • libc6
  • libgeoip1

More on dependencies here http://ns2.canonical.com/es/karmic/python-geoip. But don’t worry about these since most of them get installed by the package anyway.

A couple of basic principles before we get started.

  1. Geolocation is gathered from an IP address.
  2. There has to be a database that connects the IP address to a geographical location

I’m going to be using Python here because frankly it’s powerful, easy and has awesome libraries for geolocation. Which brings me to the GeoIP library! I’m using Ubuntu 9.10 so most of these libraries will just take an apt-get to install.

Then you can install python-geoip with

sudo apt-get install python-geoip

Or you can get the source from http://geolite.maxmind.com/download/geoip/api/python/

Now that you have this installed you can test it by entering the following code in the Python interpreter.

>>> import GeoIP
>>> gi = GeoIP.new(GeoIP.GEOIP_MEMORY_CACHE)
>>> print gi.country_code_by_addr("203.195.93.0")

I got that from MaxMind’s tutorial http://www.maxmind.com/app/python. At this point you have the ability to track IPs down to the country level. What you probably really want is to go down to the city level.

Geolocation – Cities

If you call some of the other functions on the GeoIP class, like record_by_addr(), you’ll get an error like this

“Invalid database type GeoIP Country Edition, expected GeoIP City Edition, Rev 1”
