Parsing a CSV with a non-CSV header #393
I'm trying to parse a 'CSV' file with some non-CSV header before it. That said I've been tearing my hair out and I just can't get it to parse, so I'm hoping someone can help 🙏🏻 I'll post my whole parser, in case there's any other Bad Things. My input file looks like this:
And here's what I've come up with (incl. workaround for #392).

import pyparsing as pp
from pyparsing.unicode import unicode_set, UnicodeRangeList

class BasicMultilingualPlane(unicode_set):
    "Unicode set for Basic Multilingual Plane (BMP)"
    _ranges: UnicodeRangeList = [(0x0000, 0xFFFF)]

start = pp.Suppress(pp.Literal('Some fixed title'))
foo = pp.Keyword('Foo:') + pp.Word(pp.alphas + '_')('foo')
number = pp.common.real.setParseAction(pp.common.convert_to_float)
num = pp.Keyword('Some number (N):') + number('num')
key = pp.Word(BasicMultilingualPlane.printables, exclude_chars=':')
other = pp.Suppress(key + pp.FollowedBy(':') + pp.rest_of_line)
header = start + (foo | num | other)[...]

Now into the 'CSV' part. My CSV header parser works, but only because the

csv_header_word = (pp.Word(BasicMultilingualPlane.alphas + '�()/% '))
csv_header = pp.delimitedList(csv_header_word, allow_trailing_delim=True)('csv_header')

Here's where I've spent hours and am getting nowhere:

csv_body_word = pp.Word(pp.nums + ':-.')
csv_row = pp.delimitedList(csv_body_word, allow_trailing_delim=True).setWhitespaceChars('')
csv_body = pp.Group(csv_row[...]('csv_body'))

I feel like
Replies: 1 comment 1 reply
I think I only changed code from csv_body_word on down.
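For anyone following along, here is a minimal sketch of one common way to keep the rows separate; it is not necessarily the change made in this reply. By default pyparsing skips newlines as whitespace, so a comma-delimited row with a trailing delimiter can run straight on into the next line. The sketch restricts the default whitespace to spaces and tabs and ends each row with an explicit `LineEnd`. The `sample` text and the names `naive_row`, `merged`, `NL` and `rows` are made up for illustration, since the real input file isn't shown.

```python
import pyparsing as pp

# Hypothetical sample rows; the real input file is not shown in the thread.
sample = "2021-03-01,12:00:05,1.5,\n2021-03-01,12:00:10,2.25,\n"

# With the default whitespace (which includes "\n"), a single "row"
# keeps going across the line break and swallows both lines.
naive_row = pp.delimitedList(pp.Word(pp.nums + ":-."), allow_trailing_delim=True)
merged = naive_row.parse_string(sample).as_list()

# Skip only spaces and tabs from here on, so newlines stay significant.
# Expressions created after this call pick up the new setting.
pp.ParserElement.set_default_whitespace_chars(" \t")

NL = pp.LineEnd().suppress()
csv_body_word = pp.Word(pp.nums + ":-.")
# One Group per row, terminated by an explicit end of line.
csv_row = pp.Group(pp.delimitedList(csv_body_word, allow_trailing_delim=True)) + NL
csv_body = pp.Group(csv_row[1, ...])("csv_body")

rows = csv_body.parse_string(sample, parse_all=True)["csv_body"].as_list()
print(merged)
print(rows)
```

Since `set_default_whitespace_chars` is a global setting, expressions that rely on newlines being skipped (the header part, presumably) may need to be defined before that call or be given explicit line endings of their own.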