Linux Follies: python


2011-12-07

Free book: Natural Language Processing with Python

Here's a free online book (with supplementary material): Natural Language Processing with Python, by Bird, Klein, and Loper. It's based on the Natural Language Toolkit (NLTK), a Python library. You can fork both NLTK and the book at GitHub.

2011-06-08

Some Python introspection

Python has very powerful introspection capabilities. These let you examine any object (and everything is an object in Python) and extract information about it.

I am currently writing a script to monitor licenses for Matlab, and found occasion to use introspection. I have a string which is the name of a local variable in a function, and I want to extract the value of that local variable. (The variable is really an argument passed to the function.) Having the name of the variable as a string, how do I get its value? Use the locals() function, which returns a dictionary mapping the names of the local variables to their values.

    def check_licenses(checked_out=0, warning=10, critical=15): 
        # checked_out, warning, and critical are now local variables
        levels = ('warning', 'critical')
        for l in levels:
            if checked_out >= locals()[l]:
                print 'Licenses checked out are above threshold %s' % (l)

I found this mailing list post very useful.
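
The same lookup can be sketched in Python 3 syntax (so print is a function, unlike the Python 2 snippet above). This is only an illustration: returning a list of exceeded levels instead of printing them, and the name `args`, are my own assumptions, not part of the original script.

```python
def check_licenses(checked_out=0, warning=10, critical=15):
    """Return the names of the thresholds that checked_out has reached."""
    # Snapshot the local variables once; at this point the dictionary
    # holds exactly the three arguments.
    args = locals()
    exceeded = []
    for level in ('warning', 'critical'):
        # Look up the argument's value by its name.
        if checked_out >= args[level]:
            exceeded.append(level)
    return exceeded


print(check_licenses(checked_out=12))  # ['warning']
print(check_licenses(checked_out=20))  # ['warning', 'critical']
```

Calling locals() once at the top of the function keeps later helper variables (such as the loop variable) out of the dictionary being searched.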

2010-08-11

Python Imaging Library: building from source

The Python Imaging Library (PIL) provides image processing capabilities to Python. If, like me, you need several versions of Python available, you will need to compile PIL from source.

If you run on a 64-bit version of Linux (I use Fedora 13), the locations of libraries are different from what PIL expects. So, you have to edit the setup.py file to point to the correct directories. Near the top of setup.py, put in:

    FREETYPE_ROOT = "/usr/lib64", "/usr/include"
    JPEG_ROOT = "/usr/lib64", "/usr/include"
    TIFF_ROOT = "/usr/lib64", "/usr/include"
    ZLIB_ROOT = "/usr/lib64", "/usr/include"
    TCL_ROOT = "/usr/lib64", "/usr/include"

If you had an unsuccessful build, you will need to remove the build directory before trying again.

2010-08-03

Python's __dict__

The data members of a Python class instance are really stored in a dictionary (mapping), available as the instance's __dict__ attribute. For example,

    class Foo:
        def __init__(self, name=''):
            self.name = name

You can access the name member:

    In [2]: f = Foo('vito')

    In [3]: f.name
    Out[3]: 'vito'

You can also do:

    In [4]: f.__dict__['name']
    Out[4]: 'vito'

In fact, you can see all the data members:

    In [5]: f.__dict__
    Out[5]: {'name': 'vito'}

This gives us a quick way of creating an object at run time, say when parsing a text file. For a very contrived example, we have a text file that looks like this:

    name=John,age=35,hobby=philately
    name=Sally,age=28
    name=Vito,age=18,sex=male
    name=Maria,age=58

We can grab all the data into a bunch of objects like this:

    class Person:
        def __init__(self, name=''):
            self.name = name

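    if __name__ == '__main__':
        # open the data file read-only
        f = open('people.dat', 'r')
        people = []
        for l in f.readlines():
            people.append(Person())
            # split the line into 'key=value' fields
            lsp = l.strip().split(',')
            p = []
            for i in lsp:
                # each field becomes a [key, value] pair
                p.append(i.split('='))
            # replace the instance's attribute dictionary wholesale
            people[-1].__dict__ = dict(p)
        f.close()

        for p in people:
            print p.__dict__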

And the output is:

    {'hobby': 'philately', 'age': '35', 'name': 'John'}
    {'age': '28', 'name': 'Sally'}
    {'age': '18', 'name': 'Vito', 'sex': 'male'}
    {'age': '58', 'name': 'Maria'}

You could do something fancier in the Person.__str__() (or __unicode__()) method:

    def __str__(self):
        retstr = ''
        for k,v in self.__dict__.iteritems():
            retstr += '%s: %s\n' % (k, v)
        return retstr
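
For comparison, here is a minimal Python 3 sketch of the same idea that reads from a list of strings rather than from people.dat. The helper name `parse_people`, the keyword-argument constructor, and the sorted output order are assumptions of mine for illustration, not the code from the post.

```python
class Person:
    def __init__(self, **fields):
        # Store every key=value pair as an instance attribute.
        self.__dict__.update(fields)

    def __str__(self):
        # Sort the keys so the output order is predictable.
        return '\n'.join('%s: %s' % (k, v)
                         for k, v in sorted(self.__dict__.items()))


def parse_people(lines):
    """Build one Person per non-empty line of 'key=value,key=value' text."""
    people = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split each field on the first '=' only.
        pairs = [field.split('=', 1) for field in line.split(',')]
        people.append(Person(**dict(pairs)))
    return people


sample = ['name=John,age=35,hobby=philately', 'name=Sally,age=28']
people = parse_people(sample)
print(people[0].hobby)  # philately
print(people[1])
```

Updating __dict__ instead of replacing it keeps any attributes that were already set on the instance.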

2010-04-23

Python lexical closures

I just learnt some nuances about Python’s lexical closures. Having not had experience with closures in other languages, particularly Perl (which has different behavior), I was a little bit ahead in not having to unlearn something. Anyway, the key point is that Python binds to the -- watch, I’m going to screw up the terminology -- variable rather than the value.

In particular, I wanted to loop over a bunch of strings, each of which was to be a simple regexp pattern. From each pattern, I wanted to create a substitution function. Basically, I wanted to anonymize an IRC log file, so I predefined a dictionary mapping real handles to anonymized handles.

My problem was that the naive way of doing things gave me only the last substitution in my dictionary.

 
    import re

    lines = ['Adam: hello Charlie', 'Barbara: hi Adam', 'Charlie: howdy Barbara']
    subs = {'Adam': 'Nobody', 'Barbara': 'Somebody', 'Charlie': 'Dr. Who'}

    relist = []
    for k,v in subs.iteritems():
        relist.append(lambda x: re.compile(k).sub(v,x))

    newlines = []
    for l in lines:
        nl = l
        for s in relist:
            nl = s(nl)
        newlines.append(nl)

    for l in newlines: print l

Which produces this output:

 
    Nobody: hello Charlie
    Barbara: hi Nobody
    Charlie: howdy Barbara

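Obviously wrong. Only one substitution worked. Each lambda captured the variables k and v themselves, not the values they held when the lambda was created. By the time the lambdas were called, the loop over subs had exited, so every one of them used the last value of (k,v).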
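
The effect is easier to see in a stripped-down case. The following is a small Python 3 sketch with made-up names (`late`, `early`), unrelated to the IRC script, showing the late lookup and one common way around it.

```python
# Each lambda refers to the variable i, not to the value i had when the
# lambda was created, so all three see the final value.
late = [lambda: i for i in range(3)]
print([f() for f in late])   # [2, 2, 2]

# A default argument is evaluated when the lambda is defined, so each
# function keeps its own value.
early = [lambda i=i: i for i in range(3)]
print([f() for f in early])  # [0, 1, 2]
```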

Here are two correct ways to do this. You have to force the current value of (k,v) to be bound at the time each substitution function is created.

 
    import re

    def dosub(relist, lines):
        newlines = []
        for l in lines:
            nl = l
            for s in relist:
                nl = s(nl)
            newlines.append(nl)
        return newlines

    def right(lines):
        subs = {'Adam': 'Nobody', 'Barbara': 'Somebody', 'Charlie': 'Dr. Who'}

        # the long and clear way
        relist = []
        for k,v in subs.iteritems():
            def subber(x,k=k,v=v):
                return re.compile(k).sub(v, x)
        
            relist.append(subber)

        newlines = dosub(relist, lines)

        for l in newlines: print l
        print '==================='

        # the short and opaque way
        relist = [(lambda kv: lambda x: re.compile(kv[0]).sub(kv[1], x))((k,v)) for k,v in subs.iteritems()]

        newlines = dosub(relist, lines)
        for l in newlines: print l

    if __name__ == '__main__':
        lines = ['Adam: hello Charlie', 'Barbara: hi Adam', 'Charlie: howdy Barbara']
        for l in lines: print l
        print '==================='
        right(lines)

Which gives the desired output:

    Adam: hello Charlie
    Barbara: hi Adam
    Charlie: howdy Barbara
    ===================
    Nobody: hello Dr. Who
    Somebody: hi Nobody
    Dr. Who: howdy Somebody
    ===================
    Nobody: hello Dr. Who
    Somebody: hi Nobody
    Dr. Who: howdy Somebody

Credit is due to the contributor piro at Stack Overflow who responded to a question.
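
A third option, not from the Stack Overflow answer, is functools.partial from the standard library. The Python 3 sketch below is my own variation: the helper name `anonymize` and the use of re.escape (so that handles are matched literally) are assumptions.

```python
import re
from functools import partial

subs = {'Adam': 'Nobody', 'Barbara': 'Somebody', 'Charlie': 'Dr. Who'}
lines = ['Adam: hello Charlie', 'Barbara: hi Adam', 'Charlie: howdy Barbara']

# partial() stores the compiled pattern's sub method together with the
# replacement string when it is created, so nothing is looked up later
# through the loop variables k and v.
relist = [partial(re.compile(re.escape(k)).sub, v) for k, v in subs.items()]


def anonymize(line):
    """Apply every substitution function to one line."""
    for substitute in relist:
        line = substitute(line)
    return line


newlines = [anonymize(l) for l in lines]
for l in newlines:
    print(l)
```

Each element of relist is called as substitute(line), which is equivalent to pattern.sub(replacement, line).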

Elsewhere, someone has done a nice study on various methods of concatenating a list of strings in Python. It turns out that the one-liner using a list comprehension is the most efficient.
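
As a rough Python 3 illustration of the two styles being compared (the sample data and variable names are mine, and no timings are claimed here), both approaches below produce the same string:

```python
words = ['spam', 'eggs', 'ham']

# Repeated += may build a new string on each pass through the loop.
joined_loop = ''
for w in words:
    joined_loop += '%s\n' % w

# str.join over a list comprehension assembles the result in one call.
joined_once = ''.join(['%s\n' % w for w in words])

print(joined_loop == joined_once)  # True
```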