Python notes

From Noah.org
Jump to: navigation, search


Viva el pitón!

Pexpect

I loved Expect, but I hated TCL, so I wrote this 100% pure Python module that does the same thing. Click here for more details:

Pexpect -- An Expect-like module in pure Python

'rm -rf' in Python

The rmtree function in shutil almost gets you there, but it actually doesn't like to remove single regular files. The following does the right thing;

import os
import shutil


def rm_r(path):
    if os.path.isdir(path):
        shutil.rmtree(path)
    elif os.path.exists(path):
        os.remove(path)

factor with a generator

This factors a given number. When used as a script it prints the prime factorization in standard form. Note that the output is compatible with Bash arithmetic expressions.

The following examples show how the script works. The second integer is the product of the first 100 prime numbers. Factoring this number is actually fairly quick. Compare this to the running time of the third example, factoring a 20-digit prime number, which must check over 3.5 trillion factors before determining that the number is prime. Factoring this number took over 15 minutes on a Intel i5-3320M 2.6 GHz (circa 2013).

./factor.py 64794774807
3**2 * 7**3 * 3769 * 5569

./factor.py 4711930799906184953162487834760260422020574773409675520188634839616415335845034221205289256705544681972439104097777157991804380284218315038719444943990492579030720635990538452312528339864352999310398481791730017201031090
2 * 3 * 5 * 7 * 11 * 13 * 17 * 19 * 23 * 29 * 31 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71 * 73 * 79 * 83 * 89 * 97 * 101 * 103 * 107 * 109 * 113 * 127 * 131 * 137 * 139 * 149 * 151 * 157 * 163 * 167 * 173 * 179 * 181 * 191 * 193 * 197 * 199 * 211 * 223 * 227 * 229 * 233 * 239 * 241 * 251 * 257 * 263 * 269 * 271 * 277 * 281 * 283 * 293 * 307 * 311 * 313 * 317 * 331 * 337 * 347 * 349 * 353 * 359 * 367 * 373 * 379 * 383 * 389 * 397 * 401 * 409 * 419 * 421 * 431 * 433 * 439 * 443 * 449 * 457 * 461 * 463 * 467 * 479 * 487 * 491 * 499 * 503 * 509 * 521 * 523 * 541

./factor.py 12764787846358441471
12764787846358441471

factor.py

#!/usr/bin/env python


def prime_factors(nn):

    ii = 2
    limit = nn ** 0.5
    while ii <= limit:
        if nn % ii == 0:
            nn = nn / ii
            limit = nn ** 0.5
            yield ii
        else:
            ii += 1
    yield nn


def standard_form(nn):

    pf = list(prime_factors(nn))
    pf_sf = [(ff, pf.count(ff)) for ff in set(pf)]
    pf_sf.sort()
    return pf_sf


def format_sf(nn):

    sf = standard_form(nn)
    ss = ''
    for (ff, cc) in sf:
        if cc > 1:
            ss += '%d**%d * ' % (ff, cc)
        else:
            ss += '%d * ' % ff
    ss = ss[:-3]
    return ss


if __name__ == '__main__':

    import sys
    nn = int(sys.argv[1])
    print (format_sf(nn))

The the product of the first 100 prime numbers is 4711930799906184953162487834760260422020574773409675520188634839616415335845034221205289256705544681972439104097777157991804380284218315038719444943990492579030720635990538452312528339864352999310398481791730017201031090. 2 * 3 * 5 * 7 * 11 * 13 * 17 * 19 * 23 * 29 * 31 * 37 * 41 * 43 * 47 * 53 * 59 * 61 * 67 * 71 * 73 * 79 * 83 * 89 * 97 * 101 * 103 * 107 * 109 * 113 * 127 * 131 * 137 * 139 * 149 * 151 * 157 * 163 * 167 * 173 * 179 * 181 * 191 * 193 * 197 * 199 * 211 * 223 * 227 * 229 * 233 * 239 * 241 * 251 * 257 * 263 * 269 * 271 * 277 * 281 * 283 * 293 * 307 * 311 * 313 * 317 * 331 * 337 * 347 * 349 * 353 * 359 * 367 * 373 * 379 * 383 * 389 * 397 * 401 * 409 * 419 * 421 * 431 * 433 * 439 * 443 * 449 * 457 * 461 * 463 * 467 * 479 * 487 * 491 * 499 * 503 * 509 * 521 * 523 * 541 = 4711930799906184953162487834760260422020574773409675520188634839616415335845034221205289256705544681972439104097777157991804380284218315038719444943990492579030720635990538452312528339864352999310398481791730017201031090

flatten a nested list (a 2D list, or a list of lists)

In this example, L is [(1, 4, 7), (2, 5, 8), (3, 6, 9)]. The last line is a list comprehension which will convert L to [1, 4, 7, 2, 5, 8, 3, 6, 9]. This can also be done with itertools.chain.from_iterable.

A=[1,2,3]
B=[4,5,6]
C=[7,8,9]
L=zip(a,b,c)
list(I for II in L for I in II)

Yielding blocks -- a sub-list generator

Often it is necessary to work with chunks of data from a bigger list. This generator will yield sequential blocks of a given size from a list.

def blocks(biglist, block_size):
    for ii in range(0, len(biglist), block_size):
        yield biglist[ii:ii+block_size]

poll directory for changes

This is a code snippet I sometimes use. The "better" way to do this would be to use inotify, but polling is simple, portable, and robust.

The idle_time adds a delay between each poll check. At 250 ms this script barely increases the load on a laptop.

import sys
import os
import time

def poll_path (target_path='.', timeout=10, idle_time=0.250):

    timeout_mark = time.time() + timeout
    before_timestamp = os.stat(target_path).st_mtime
    before_list = os.listdir(target_path)
    while True:
        time.sleep(idle_time)
        after_timestamp = os.stat(target_path).st_mtime
        if after_timestamp != before_timestamp:
            break
        if time.time() > timeout_mark:
            return None
    after_list = os.listdir(target_path)
    new_files = [ii for ii in after_list if ii not in before_list]
    old_files = [ii for ii in before_list if ii not in after_list]
    return new_files, old_files

Free Python software

You can find other free software I wrote (mostly in Python) here: Free Software

Python Style

PEP-8 gives the recommended style for Python code. It isn't very strict and none of that mixed-case stuff they do in Java: http://www.python.org/dev/peps/pep-0008/

A short summary:

Class Names
class names use the CapWords convention. Classes for internal use have a leading underscore in addition.
Function Names
function names should be lowercase, with words separated by underscores as necessary to improve readability.
Line Length
limit all lines to a maximum of 79 characters.
Indentation
use 4 spaces per indentation level. Avoid hard tabs.
Encoding
UTF-8 is preferred over Latin-1.

.vimrc for Python

I like Vim. This is how I set my .vimrc file.

This adds sensible tab settings for Python. It also defines simple folding. I never liked the built-in indent folding for Python because it folds too much -- every indent becomes a fold! For code folding I just want to see class names and method names. I don't need every single nested loop folded many levels deep... In this .vimrc see the section on "folding using /search/ pattern". This maps the normal mode key sequence, 'zff' to set the search pattern to find all class and method names. This also maps '\z' to refold every search pattern. The neat thing is that this pattern also works on PHP code! The '\z' mapping is also handy for editing other documents.

Python Cookbook

I have a few recipes on ActiveState's fun and useful Python cookbook.

Thread synchronization -- mutex section

In the example below, the code under the with section will run exclusively.

This is based on an idea by Raymond Hettinger to create a simple way to synchronize threads in a small section of code. It is intended more for spot checking and debugging. The problem with this code is that the interpreter could give another thread execution context between the two instructions, sys.setcheckinterval(sys.maxint) and yield. The results would be disaster since the other thread would be running in the interpreter with thread switching 2 billion times less than normal, so it would be the only thread ever allowed to run. The original thread running other_threads_suspended would never be given a chance to run the finally block necessary to return the thread switching interval back to normal.

import sys
from contextlib import contextmanager
@contextmanager
def other_threads_suspended():
    checkinterval_original = sys.getcheckinterval()
    try:
        sys.setcheckinterval(sys.maxint)
        yield None
    finally:
        sys.setcheckinterval(checkinterval_original)

# Example usage:
with other_threads_suspended():
    print "Here we might log system state info that we do no want changed by"
    print "    another thread until we are finished logging. When logging is"
    print "    complete then other threads will be allowed to run as normal."
    print "    If the system state was entirely encapsulated by an object"
    print "    then exclusive thread access could be managed there; however,"
    print "    few systems are so clean that you can guarantee this."

GUI toolkits

I hope tkinter will be deprecated someday. I think wxPython is the best alternative at the moment: http://wxpython.org/quotes.php which is a Python interface to wxWidgets.

Other libraries to consider: GTK+ used by Gnome; Qt used by KDE; Pygame wrapper on SDL.

hashlib versus sha (python2.4 vs. python2.5 vs. python3.0)

Python 2.4:

import sha
p = 'password'
sha.new(p).hexdigest()

Python 2.5:

import hashlib
p = 'password'
hashlib.sha1(p).hexdigest()

Python 3.0:

import hashlib
p = 'password'
hashlib.sha1(bytes(p,'utf-8')).hexdigest()

SHA1 for any version of Python

This works on any version of Python that has at least 'sha' or 'hashlib':

def any_sha1_hex_digest (s):

    try:
        # Python <= 2.4
        import sha
        return sha.new(s).hexdigest()
    except ImportError:
        try:
            import hashlib
            if 'bytes' in __builtins__.__dict__:
                # Python >= 3.0
                return hashlib.sha1(bytes(s,'utf-8')).hexdigest()
            else:
                # Python >= 2.5
                return hashlib.sha1(s).hexdigest()
        except ImportError:
            raise ImportError("No module 'sha' or 'hashlib'. You must " +
                    "have at least one of these modules.")

Parse Boolean strings

This will convert boolean strings into Boolean types. This is useful for parsing conf files and the like. This will handle strings like "yes", "1", "true", "no", "false", "0" with any sort of capitalization and whitespace padding. It will raise a ValueError if the string cannot be parsed.

def ParseBoolean (b):
    # Handle case where b is already a Boolean type.
    if b == False or b == True:
        return b
    b = b.strip()
    if len(b) < 1:
        raise ValueError ('Cannot parse empty string into boolean.')
    b = b[0].lower()
    if b == 't' or b == 'y' or b == '1':
        return True
    if b == 'f' or b == 'n' or b == '0':
        return False
    raise ValueError ('Cannot parse string into boolean.')

WTF?

This will test for primality with a regular expression. No, I'm serious...

    import re
    def is_prime (num):
        return re.match(r"^1?$|^(11+?)\1+$",'1'*num) is None