python csv

From Noah.org
Revision as of 04:59, 23 May 2011 by Root



It annoys me that the Python csv module's reader has no support for skipping comment lines and blank lines. To work around this you have to add the following ugly bit of filter code. Usually when I want to read a CSV file I am working on a quick script, and I don't want to have to think about the csv module or read its documentation. If I were working on a large app that needed to parse one specific CSV dialect, I would write my own parser. When I need to do something quick, I reach for the standard library csv module because I expect it to be flexible and to handle a wide variety of use-cases. Unfortunately, this is not the case. It handles a limited number of use-cases, and most of those involve dealing with Microsoft formats, which rarely interest me.
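To see the behavior in question, feed csv.reader a small in-memory file. In this sketch io.StringIO stands in for a real file and the sample data is made up. A blank line comes back as an empty list, and a comment line comes back as ordinary data, split on any commas it happens to contain:

```python
import csv
import io

# Made-up sample data: a header, a blank line, and a comment line.
sample = "date,count\n\n# sensor was recalibrated, ignore spike\n2011-05-23,42\n"

rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    print(row)
```

Every caller that wants to ignore those rows has to filter them out itself.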

Parsing any given CSV is really easy. The trouble begins when you want to parse different dialects of CSV. It seems to me that the main reason to have a CSV module at all is to have a single place that gathers and organizes years of collected experience in handling all the quirky CSV dialects out there. Python is supposed to come with batteries included; I shouldn't have to stop what I'm doing and walk down to the corner store just to pick up yet another battery.
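For reference, what the csv module does offer for dialects is a set of format parameters (delimiter, quotechar, skipinitialspace and a few others) that can be bundled under a name with csv.register_dialect. The built-in names are "excel", "excel-tab" and, since Python 3.2, "unix". A minimal sketch; the dialect name "semicolon" and the sample line are made up for illustration:

```python
import csv
import io

# Hypothetical dialect: semicolon-separated, with a space after each delimiter.
csv.register_dialect("semicolon", delimiter=";", quotechar='"', skipinitialspace=True)

line = 'a; "b;c"; d\n'
row = next(csv.reader(io.StringIO(line), dialect="semicolon"))
print(row)

# Built-in dialects plus the one registered above.
print(csv.list_dialects())
```

None of the dialect parameters says anything about comment lines or blank lines, which is exactly the gap complained about here.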

The CSV module should be full of messy details, because CSV parsing is messy. The purpose of the module should be to organize and isolate these messy bits in one place where bugs can be fixed once for everybody. Handling comments and blank lines is hardly an obscure or idiosyncratic use-case.

Anyway... here is the ugly wrapper that filters comments and blank lines out of the file before the csv module sees them:


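import csv


class CSVFileCleaner:
    """Iterate over a file, skipping blank lines and comment lines.

    csv.reader() accepts any iterator that yields strings, so this
    wrapper can sit between the open file and the reader.
    """
    def __init__(self, file_to_wrap, comment_starts=('#', '//', '-- ')):
        self.wrapped_file = file_to_wrap
        self.comment_starts = comment_starts

    def __next__(self):
        # Keep reading until a line is neither blank nor a comment.
        # StopIteration from the wrapped file ends the iteration.
        while True:
            line = next(self.wrapped_file)
            stripped = line.strip()
            if stripped and not stripped.startswith(self.comment_starts):
                return line

    def __iter__(self):
        return self


if __name__ == '__main__':
    # Python 3: the csv docs recommend opening files with newline=''.
    with open("ionization-event-log.csv", newline='') as f:
        radiation_events = csv.reader(CSVFileCleaner(f))
        # Print the date-time-stamp (column 0) and the
        # Exponentially Weighted Counts Per Second (column 3):
        for row in radiation_events:
            print(row[0], row[3])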
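The same filter can also be written as a generator function, which drops the class boilerplate. This is a sketch, not the original code: the name skip_comments_and_blanks and the sample log data are made up. Like the class, it tests each physical line, so a quoted field spanning several lines, one of which happens to start with "#", would be mangled.

```python
import csv
import io


def skip_comments_and_blanks(lines, comment_starts=('#', '//', '-- ')):
    """Yield only the lines that are neither blank nor comments.

    The comment test is applied to the stripped line, so indented
    comments are skipped too.
    """
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_starts):
            yield line


# Made-up sample log with a commented header, a blank line and an indented comment.
sample = (
    "# date,event,raw,ewcps\n"
    "\n"
    "2011-05-23 04:59,beta,12,0.8\n"
    "   // detector warmed up\n"
    "2011-05-23 05:00,gamma,15,0.9\n"
)

rows = list(csv.reader(skip_comments_and_blanks(io.StringIO(sample))))
for row in rows:
    print(row[0], row[3])
```

Because csv.reader only needs an iterable of strings, the generator plugs in exactly where the class did; which form to use is a matter of taste.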