Difference between revisions of "python csv"

From Noah.org

Latest revision as of 05:59, 23 May 2011


It annoys me that the Python csv module has no support for skipping comment lines and blank lines. To work around this, one must add the ugly bit of filter code below. Usually when I want to read a CSV file I am working on a quick script. I don't want to have to think about the csv module or read the documentation. If I were working on a large app that needed to parse a specific CSV dialect, then I would write my own parser. When I need to do something quick, I reach for the standard library csv module because I expect it to be flexible and to handle a wide variety of use-cases. Unfortunately, this is not the case. It handles a limited number of use-cases, and most of those involve dealing with Microsoft formats, which rarely interest me.

Parsing any given CSV is really easy. The trouble begins when you want to parse different dialects of CSV. It seems to me that the main reason to even have a CSV module is to have a single place that gathers and organizes the years of collected experience in handling all the various quirky CSV dialects. This way I don't have to stop what I'm doing and walk down to the corner store just to pick up yet another battery.

The CSV module should be filled with messy details. CSV parsing is messy. The purpose of the module should be to organize and isolate these messy bits in one place where bugs can be fixed once for everybody. Handling comments and blank lines is hardly an obscure and idiosyncratic use-case.
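To make the complaint concrete, here is a small sketch of what a plain csv.reader does with such lines. It assumes Python 3; the sample data and the helper name rows_without_filter are made up for illustration and are not part of the csv module.

```python
import csv
import io

# Made-up sample data: one comment line, one blank line, two data rows.
SAMPLE = "# sensor log\n\n2011-05-23,1,2,3.5\n2011-05-24,4,5,6.5\n"


def rows_without_filter(text):
    """Parse text with a plain csv.reader and return every row it yields."""
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    # The comment line comes back as an ordinary one-field row and the
    # blank line comes back as an empty list, so the caller has to
    # filter both out by hand.
    for row in rows_without_filter(SAMPLE):
        print(row)
```

Any code that indexes into a row, such as row[3], would fail on the comment row and on the empty row, which is why some kind of filter is needed in front of the reader.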

Anyway... here is the ugly wrapper to the CSV module to take care of comments and blank lines:


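class CSVFileCleaner:
    """Iterator wrapper that skips comment lines and blank lines."""
    def __init__ (self, file_to_wrap, comment_starts=('#', '//', '-- ')):
        self.wrapped_file = file_to_wrap
        # str.startswith() accepts a tuple of prefixes.
        self.comment_starts = comment_starts
    def __next__ (self):
        line = next(self.wrapped_file)
        # Keep reading until a line is neither a comment nor blank.
        # StopIteration from the wrapped file ends the iteration.
        while line.strip().startswith(self.comment_starts) or len(line.strip())==0:
            line = next(self.wrapped_file)
        return line
    next = __next__ # Python 2 name for the same method
    def __iter__ (self):
        return self

if __name__ == "__main__":
    import csv
    with open("ionization-event-log.csv") as log_file:
        radiation_events = csv.reader(CSVFileCleaner(log_file))
        # Print date-time-stamp and Exponentially Weighted Counts Per Second:
        for row in radiation_events:
            print(row[0], row[3]) # prints columns 0 and 3 of each row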
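The same filter can also be sketched as a generator function, which avoids writing the iterator methods by hand. This is only one possible variant, assuming Python 3; the name skip_comments_and_blanks and the sample line are hypothetical, and the default comment prefixes are copied from the class above.

```python
import csv
import io


def skip_comments_and_blanks(lines, comment_starts=('#', '//', '-- ')):
    """Yield only the lines that are neither blank nor comments."""
    for line in lines:
        stripped = line.strip()
        # str.startswith() accepts a tuple of prefixes.
        if stripped and not stripped.startswith(comment_starts):
            yield line


if __name__ == "__main__":
    # Made-up sample standing in for a real log file.
    sample = io.StringIO("# header comment\n\n2011-05-23 05:59,a,b,12.5\n")
    for row in csv.reader(skip_comments_and_blanks(sample)):
        print(row[0], row[3])
```

csv.reader accepts any iterable of strings, so the generator can be passed to it directly, just like the wrapper class.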