The Unweekly Review, 15/05/2011

After two months of silence, I have some great stuff for you today:

PyCuriosities, part 2

This is a follow-up to my first blog entry on curiosities in Python - PyCuriosities, part 1

Next set of tips:

  • Reversing a string or a list (or, more generally, any sequence) is as simple as copying it with a negative step: sequence[::-1], which is equivalent to sequence[-1::-1] (see: Extended slices)
  • The built-in debugger (pdb) is great, but the IPython-powered one (ipdb) is much better
  • The built-in SMTP module, smtplib, can raise not only its own SMTPException subclasses but also socket.error. The same surprise awaits when using e.g. urllib2: it can raise its own exceptions, httplib exceptions and socket.error.
  • In a try..finally block, the code in the finally clause is always executed. This can lead to surprises when such a block sits inside a function, because a return in finally overrides a return in try:
    • >>> def f():
    • ...     try:
    • ...         return 1
    • ...     finally:
    • ...         return 2
    • >>> f()
    • 2
  • dict.setdefault(key, val) works like dict.get(key, val), except that when key is not present in the dict, it first sets dict[key] to val. As a result, a later dict[key] lookup doesn't raise KeyError.
  • collections.defaultdict is like dict.setdefault, but taken to the next level: missing keys are filled in automatically by calling a factory function.
  • The Zip Trick (matrix transposition):
    • >>> a = [1, 2, 3]
    • >>> b = ['a', 'b', 'c']
    • >>> x = zip(a, b)
    • >>> x
    • [(1, 'a'), (2, 'b'), (3, 'c')]
    • >>> zip(*x)
    • [(1, 2, 3), ('a', 'b', 'c')] 
  • help('modules') lists all installed Python modules
  • You can create a Python class manually using type(name, base_classes, class_members_dict). This can be used e.g. to create Django forms on the fly, like here.
  • Python supports so-called metaclasses. You might never need them, but it's good to be aware of what they are:
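
A few of the tips above (negative-step slices, setdefault vs. defaultdict, and type() as a class factory) can be sketched as follows. This is only an illustration, written so that it should also run on Python 3; the sample data and the names words, describe and Point are made up for the example.

```python
from collections import defaultdict

# Reversing any sequence with a negative slice step.
reversed_text = "abc"[::-1]
reversed_list = [1, 2, 3][::-1]

words = ["apple", "avocado", "banana"]  # hypothetical sample data

# Grouping with dict.setdefault: the empty list is stored on first access.
by_letter = {}
for word in words:
    by_letter.setdefault(word[0], []).append(word)

# The same grouping with collections.defaultdict: missing keys are
# created by calling the factory (here: list).
by_letter_dd = defaultdict(list)
for word in words:
    by_letter_dd[word[0]].append(word)


# Creating a class on the fly with type(name, bases, members).
def describe(self):
    return "Point(%d, %d)" % (self.x, self.y)


Point = type("Point", (object,), {"x": 0, "y": 0, "describe": describe})
point = Point()
point.x = 3
```

With defaultdict the grouping loop no longer needs to mention the default at every call site, which is the main practical difference from setdefault.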

PyCuriosities, part 1

Here are some interesting, non-obvious things about the Python language:

  • The logical operators "and" and "or" do not return a boolean value - they return the last evaluated operand. For example:
    • ('a' and 'b') == 'b'
    • ('' and 'b') == ''
    • ('a' or 'b') == 'a'
    • ('' or 'b') == 'b'
  • a == b is not the same as a is b. The former tests for equality and is just syntactic sugar for a.__eq__(b) (well, not quite, but read this http://stackoverflow.com/questions/2281222/why-when-in-python-does-x-y-call-y-eq-x for details). The latter is syntactic sugar for something like id(a) == id(b) (at least in the 2.x line of CPython), i.e. it checks whether both a and b refer to the same object in memory.
  • Small integers are "cached" in the 2.x line of CPython; for details read this: http://stackoverflow.com/questions/306313/python-is-operator-behaves-unexpectedly-with-integers
  • Python 2.x has a built-in profiler called Hotshot, whose output file is compatible with KCacheGrind, a profile data visualization program
  • The default recursion limit in CPython 2.x is set to 1000 (http://docs.python.org/library/sys.html#sys.getrecursionlimit)
  • To check if a given object is a string (str or unicode), just use isinstance(obj, basestring), where basestring is the base class of both the str and unicode types
  • Python has a set of interesting string codecs. Some examples:
    • s.encode('base64')
    • s.encode('zip')
    • s.encode('rot13')
    • s.encode('string_escape')
  • If you need to embed a multi-line string in a nested block of code, you probably don't want to start each line of that string at the left margin - you'd rather keep it indented like the surrounding code. Still, you probably want it to print without the leading blanks. In such a case textwrap.dedent is your friend
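
Here is a minimal sketch of three of the points above: the value returned by and/or, equality vs. identity, and textwrap.dedent. It is meant as an illustration only and should also run on Python 3; the function usage and the help text in it are invented for the example.

```python
import textwrap

# "and"/"or" return one of their operands, not necessarily True/False.
first_truthy = "" or "fallback"   # left side is falsy, so the right side is returned
last_checked = "a" and "b"        # both are truthy, so the last one is returned

# Equality vs. identity: two equal lists are still two distinct objects.
left = [1, 2, 3]
right = [1, 2, 3]
same_value = left == right
same_object = left is right


def usage():
    # The literal is indented to match the code around it; dedent()
    # strips the leading whitespace common to all of its lines.
    return textwrap.dedent("""\
        Usage: tool [options]
          -h  show help
        """)
```

Note that dedent() removes only the common margin, so the extra two spaces in front of "-h" survive.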

The Unweekly Review

Slow urllib2

On several occasions I noticed that urllib2 (as well as the lower-level httplib) was slow at opening remote pages, which was strange, as GET <url> usually performed much better. A few days ago, tired of this strange problem, I said "enough!" to myself.

A quick look at what was going on behind the scenes revealed that the DNS lookup in the Python library was the bottleneck, and that the whole problem had been known for some time:

Fortunately, according to this message, it should be possible to fix the problem by messing with DNS configuration.
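
One way to check whether name resolution is the slow part is to time it separately from the HTTP request. The helper below is a hypothetical sketch (time_dns_lookup is not part of any library); it simply wraps socket.getaddrinfo and lets you pass in a different resolver for experiments.

```python
import socket
from timeit import default_timer


def time_dns_lookup(host, port=80, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, result) for a single name resolution.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    that another callable can be swapped in for comparison.
    """
    start = default_timer()
    result = resolver(host, port)
    return default_timer() - start, result
```

Calling time_dns_lookup with the host name of a slow URL and comparing the elapsed time with the duration of the whole request should show how much of the delay comes from DNS.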

 

Python's logging.TimedRotatingFileHandler quirk

If you're a logging module addict like me, you might want (at some point) to use its TimedRotatingFileHandler with the when parameter set to "midnight". Unfortunately, this setup doesn't work in the general case, because the rollover happens only when the log file is written to on a later date than the one on which it was opened!

Details of this problem are described here:

One of the possible solutions (described here) is to use logrotate for rolling the logs over, and then make use of WatchedFileHandler (which works in a similar way to tail -F).
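
The following is a minimal sketch of the WatchedFileHandler approach, assuming a Unix-like system. The rename stands in for what logrotate would do; the temporary directory, the file name app.log and the logger name watched_demo are made up for the demonstration.

```python
import logging
import logging.handlers
import os
import tempfile

# Hypothetical log location, used only for this demonstration.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "app.log")

logger = logging.getLogger("watched_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger

handler = logging.handlers.WatchedFileHandler(log_path)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("before rotation")

# Simulate logrotate: move the current log file out of the way.
rotated_path = log_path + ".1"
os.rename(log_path, rotated_path)

# On the next write the handler notices that the file at log_path is
# gone (or is a different file) and reopens it before writing.
logger.info("after rotation")
handler.close()
```

After this runs, the first message should be in app.log.1 and the second one in a freshly created app.log, without the application having to know anything about the rotation schedule.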

Emacs (and Emacs config) update

Today I upgraded my Emacs from 23.1, which is the default version in Ubuntu 10.04, to the latest 23.2 release. The whole process was very smooth, thanks to Michael Olson & Co., who did all the hard work and shared the results here: https://launchpad.net/~ubuntu-elisp/+archive/ppa.

My overall impression is that the new Emacs feels "snappier" and has some small additional goodies here and there (or maybe they are just bug fixes?). It also comes with a bundled JavaScript major mode, which works great for my own scripts. With such a nice working environment in place, I did my homework and updated my Emacs config to fix a few problems I had been living with for too long - here's the link: https://github.com/tomaszzielinski/My-Emacs-config/commit/85560cb7e8da2ba7fe7b827d0ea16606f50f7749

 

 

The Weekly Review

Irregular as usual, but hot as always:

Changing MySQL/InnoDB log file size (lifesaver hint)

The default configuration of MySQL is far from optimal. That's why one of my first steps after the installation is to tweak /etc/mysql/my.cnf.

You can find nice reference configurations on a few pages around the web, and one of my favourite ones is: http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/

One of the parameters they advise tuning is innodb_log_file_size (not to be confused with innodb_log_buffer_size). But there is a catch you have to be aware of - if you change that parameter's value, you also have to stop the MySQL service, remove the InnoDB log files: http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/#comment-364739, and then start the service again.

Happy tuning!