Releases: pyparsing/pyparsing
Pyparsing 3.1.2
-
Support for Python 3.13.
-
Added ieee_float expression to pyparsing.common, which parses float values, plus "NaN", "Inf", "Infinity". PR submitted by Bob Peterson (#538).
-
Updated pep8 synonym wrappers for better type checking compatibility. PR submitted by Ricardo Coccioli (#507).
-
Fixed empty error message bug, PR submitted by InSync (#534). This should return pyparsing's exception messages to a former, more helpful form. If you have code that parses the exception messages returned by pyparsing, this may require some code changes.
-
Added unit tests to test for exception message contents, with enhancement to pyparsing.testing.assertRaisesParseException to accept an expected exception message.
-
Updated example select_parser.py to use PEP8 names and added Groups for better retrieval of parsed values from multiple SELECT clauses.
-
Added example email_address_parser.py, as suggested by John Byrd (#539).
-
Added example directx_x_file_parser.py to parse DirectX template definitions, and generate a Pyparsing parser from a template to parse .x files.
-
Some code refactoring to reduce code nesting, PRs submitted by InSync.
-
All internal string expressions using '%' string interpolation and str.format() converted to f-strings.
Pyparsing 3.1.1
-
Fixed regression in Word(min), reported by Ricardo Coccioli, good catch! (Issue #502)
-
Fixed bug in bad exception messages raised by Forward expressions. PR submitted by Kyle Sunden, thanks for your patience and collaboration on this (#493).
-
Fixed regression in SkipTo, where ignored expressions were not checked when looking for the target expression. Reported by catcombo, Issue #500.
-
Fixed type annotation for enable_packrat, PR submitted by Mike Urbach, thanks! (Issue #498)
-
Some general internal code cleanup. (Instigated by Michal Čihař, Issue #488)
Pyparsing 3.1.0
NOTE: In the future release 3.2.0, use of many of the pre-PEP8 methods (such as ParserElement.parseString) will start to raise DeprecationWarnings. 3.2.0 should get released some time later in 2023. I currently plan to completely drop the pre-PEP8 methods in pyparsing 4.0, though we won't see that release until at least late 2023 if not 2024. So there is plenty of time to convert existing parsers to the new function names before the old functions are completely removed. (Big help from Devin J. Pohly in structuring the code to enable this peaceful transition.)
Version 3.2.0 will also discontinue support for Python versions 3.6 and 3.7.
Version 3.1.0 - June, 2023
API CHANGES
-
A slight change has been implemented when unquoting a quoted string parsed using the QuotedString class. Formerly, when unquoting and processing whitespace markers such as \t and \n, these substitutions would occur first, and then any additional '\' escaping would be done on the resulting string. This would parse "\\n" as "\<newline>". Now escapes and whitespace markers are all processed in a single pass working left to right, so the quoted string "\\n" would get unquoted to "\n" (a backslash followed by "n"). Fixes issue #474 raised by jakeanq, thanks!
-
Reworked delimited_list function into the new DelimitedList class. DelimitedList has the same constructor interface as delimited_list, and in this release, delimited_list changes from a function to a synonym for DelimitedList. delimited_list and the older delimitedList method will be deprecated in a future release, in favor of DelimitedList.
-
ParserElement.validate() is deprecated. It predates the support for left-recursive parsers, and was prone to false positives (warning that a grammar was invalid when it was in fact valid). It will be removed in a future pyparsing release. In its place, developers should use debugging and analytical tools, such as ParserElement.set_debug() and ParserElement.create_diagram(). (Raised in Issue #444, thanks Andrea Micheli!)
NEW FEATURES AND ENHANCEMENTS
-
Optional(expr) may now be written as expr | ""
This will make this code:
"{" + Optional(Literal("A") | Literal("a")) + "}"
writable as:
"{" + (Literal("A") | Literal("a") | "") + "}"
Some related changes implemented as part of this work:
Literal("") now internally generates an Empty() (and no longer raises an exception).
Empty is now a subclass of Literal.
Suggested by Antony Lee (issue #412), PR (#413) by Devin J. Pohly.
-
Added new class method ParserElement.using_each, to simplify code that creates a sequence of Literals, Keywords, or other ParserElement subclasses.
For instance, to define suppressible punctuation, you would previously write:
LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")
You can now write:
LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")
using_each will also accept optional keyword args, which it will pass through to the class initializer. Here is an expression for single-letter variable names that might be used in an algebraic expression:
algebra_var = MatchFirst(
    Char.using_each(string.ascii_lowercase, as_keyword=True)
)
-
Added new builtin python_quoted_string, which will match any form of single-line or multiline quoted strings defined in Python. (Inspired by discussion with Andreas Schörgenhumer in Issue #421.)
-
Extended expr[] notation for repetition of expr to accept a slice, where the slice's stop value indicates a stop_on expression:
test = "BEGIN aaa bbb ccc END"
BEGIN, END = Keyword.using_each("BEGIN END".split())
body_word = Word(alphas)
expr = BEGIN + Group(body_word[...:END]) + END
# equivalent to
# expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END
print(expr.parse_string(test))
Prints:
['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']
-
Added named field "url" to pyparsing.common.url, returning the entire parsed URL string.
-
Added bool embed argument to ParserElement.create_diagram(). When passed as True, the resulting diagram will omit the <DOCTYPE>, <HEAD>, and <BODY> tags so that it can be embedded in other HTML source. (Useful when embedding a call to create_diagram() in a PyScript HTML page.)
-
Added recurse argument to ParserElement.set_debug to set the debug flag on an expression and all of its sub-expressions. Requested by multimeric in Issue #399.
-
Added '·' (Unicode MIDDLE DOT) to the set of Latin1.identbodychars.
-
ParseResults now has a new method deepcopy(), in addition to the current copy() method. copy() only makes a shallow copy - any contained ParseResults are copied as references - changes in the copy will be seen as changes in the original. In many cases, a shallow copy is sufficient, but some applications require a deep copy. deepcopy() makes a deeper copy: any contained ParseResults or other mappings or containers are built with copies from the original, and do not get changed if the original is later changed. Addresses issue #463, reported by Bryn Pickering.
-
Added new class property identifier to all Unicode set classes in pyparsing.unicode, using the class's values for cls.identchars and cls.identbodychars. Now Unicode-aware parsers that formerly wrote:
ppu = pyparsing.unicode
ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
can now write:
ident = ppu.Greek.identifier
# or
# ident = ppu.Ελληνικά.identifier
-
Error messages from MatchFirst and Or expressions will try to give more details if one of the alternatives matches better than the others, but still fails. Question raised in Issue #464 by msdemlei, thanks!
BUG FIXES AND GENERAL CHANGES
-
Added support for Python 3.12.
-
Updated ci.yml permissions to limit default access to source - submitted by Joyce Brum of Google. Thanks so much!
-
Updated create_diagram() code to be compatible with railroad-diagrams package version 3.0. Fixes Issue #477 (railroad diagrams generated with black bars), reported by Sam Morley-Short.
-
Fixed bug in NotAny, where parse actions on the negated expr were not being run. This could cause NotAny to incorrectly fail if the expr would normally match, but would fail to match if a condition used as a parse action returned False. Fixes Issue #482, raised by byaka, thank you!
-
Fixed create_diagram() to accept keyword args, to be passed through to the template.render() method to generate the output HTML (PR submitted by Aussie Schnore, good catch!)
-
Fixed bug in python_quoted_string regex.
-
Fixed bug in which the results name was not saved when a parse action returned an empty string for an expression that had a results name. That is:
expr = Literal("X").add_parse_action(lambda tokens: "")("value")
result = expr.parse_string("X")
print(result["value"])
would raise a KeyError. Now empty strings will be saved with the associated results name. Raised in Issue #470 by Nicco Kunzmann, thank you.
-
Fixed bug in SkipTo where ignore expressions were not properly handled while scanning for the target expression. Issue #475, reported by elkniwt, thanks (this bug has been there for a looooong time!).
-
Fixed bug in Word when max=2. Also added performance enhancement when specifying exact argument. Reported in issue #409 by panda-34, nice catch!
-
Word arguments are now validated if min and max are both given, that min <= max; raises ValueError if values are invalid.
-
Fixed bug in srange, when parsing escaped '/' and '\' inside a range set.
-
Fixed exception messages for some ParserElements with custom names, which instead showed their contained expression names.
-
Fixed bug in pyparsing.common.url, when input URL is not alone on an input line. Fixes Issue #459, reported by David Kennedy.
-
Multiple added and corrected type annotations. With much help from Stephen Rosen, thanks!
-
Some documentation and error message clarifications on pyparsing's keyword logic, cited by Basil Peace.
-
General docstring cleanup for Sphinx doc generation, PRs submitted by Devin J. Pohly. A dirty job, but someone has to do it - much appreciated!
EXAMPLE UPDATES
-
Added tag_emitter.py to examples. This example demonstrates how to insert tags into your parsed results that are not part of the original parsed text.
-
Added bf.py Brainf*ck parser/executor example. Illustrates using a pyparsing grammar to parse language syntax, and attach executable AST nodes to the parsed results.
-
invRegex.py example renamed to inv_regex.py and updated to PEP-8 variable and method naming. PR submitted by Ross J. Duff, thanks!
-
Removed examples sparser.py and pymicko.py, since each included its own GPL license in the header. Since this conflicts with pyparsing's MIT license, they were removed from the distribution to avoid confusion among those making use of them in their own projects.
-
Updated the lucene_grammar.py example (better support for '*' and '?' wildcards) and corrected the test cases - brought to my attention by Elijah Nicol, good catch!
Pyparsing 3.1.0b2
-
Updated create_diagram() code to be compatible with railroad-diagrams package version 3.0. Fixes Issue #477 (railroad diagrams generated with black bars), reported by Sam Morley-Short.
-
Fixed bug in NotAny, where parse actions on the negated expr were not being run. This could cause NotAny to incorrectly fail if the expr would normally match, but would fail to match if a condition used as a parse action returned False. Fixes Issue #482, raised by byaka, thank you!
-
Fixed create_diagram() to accept keyword args, to be passed through to the template.render() method to generate the output HTML (PR submitted by Aussie Schnore, good catch!)
-
Fixed bug in python_quoted_string regex.
-
Added examples/bf.py Brainf*ck parser/executor example. Illustrates using a pyparsing grammar to parse language syntax, and attach executable AST nodes to the parsed results.
Pyparsing 3.1.0b1
-
Added support for Python 3.12.
-
API CHANGE: A slight change has been implemented when unquoting a quoted string parsed using the QuotedString class. Formerly, when unquoting and processing whitespace markers such as \t and \n, these substitutions would occur first, and then any additional '\' escaping would be done on the resulting string. This would parse "\\n" as "\<newline>". Now escapes and whitespace markers are all processed in a single pass working left to right, so the quoted string "\\n" would get unquoted to "\n" (a backslash followed by "n"). Fixes issue #474 raised by jakeanq, thanks!
-
Added named field "url" to pyparsing.common.url, returning the entire parsed URL string.
-
Fixed bug in which the results name was not saved when a parse action returned an empty string for an expression that had a results name. That is:
expr = Literal("X").add_parse_action(lambda tokens: "")("value")
result = expr.parse_string("X")
print(result["value"])
would raise a KeyError. Now empty strings will be saved with the associated results name. Raised in Issue #470 by Nicco Kunzmann, thank you.
-
Fixed bug in SkipTo where ignore expressions were not properly handled while scanning for the target expression. Issue #475, reported by elkniwt, thanks (this bug has been there for a looooong time!).
-
Updated ci.yml permissions to limit default access to source - submitted by Joyce Brum of Google. Thanks so much!
-
Updated the lucene_grammar.py example (better support for '*' and '?' wildcards) and corrected the test cases - brought to my attention by Elijah Nicol, good catch!
Pyparsing 3.1.0a1
NOTE: In the future release 3.2.0, use of many of the pre-PEP8 methods (such as ParserElement.parseString) will start to raise DeprecationWarnings. 3.2.0 should get released some time later in 2023. I currently plan to completely drop the pre-PEP8 methods in pyparsing 4.0, though we won't see that release until at least late 2023 if not 2024. So there is plenty of time to convert existing parsers to the new function names before the old functions are completely removed. (Big help from Devin J. Pohly in structuring the code to enable this peaceful transition.)
Version 3.2.0 will also discontinue support for Python versions 3.6 and 3.7.
-
API ENHANCEMENT: Optional(expr) may now be written as expr | ""
This will make this code:
"{" + Optional(Literal("A") | Literal("a")) + "}"
writable as:
"{" + (Literal("A") | Literal("a") | "") + "}"
Some related changes implemented as part of this work:
Literal("") now internally generates an Empty() (and no longer raises an exception).
Empty is now a subclass of Literal.
Suggested by Antony Lee (issue #412), PR (#413) by Devin J. Pohly.
-
Added new class property identifier to all Unicode set classes in pyparsing.unicode, using the class's values for cls.identchars and cls.identbodychars. Now Unicode-aware parsers that formerly wrote:
ppu = pyparsing.unicode
ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
can now write:
ident = ppu.Greek.identifier
# or
# ident = ppu.Ελληνικά.identifier
-
Reworked delimited_list function into the new DelimitedList class. DelimitedList has the same constructor interface as delimited_list, and in this release, delimited_list changes from a function to a synonym for DelimitedList. delimited_list and the older delimitedList method will be deprecated in a future release, in favor of DelimitedList.
-
Added new class method ParserElement.using_each, to simplify code that creates a sequence of Literals, Keywords, or other ParserElement subclasses.
For instance, to define suppressible punctuation, you would previously write:
LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")
You can now write:
LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")
using_each will also accept optional keyword args, which it will pass through to the class initializer. Here is an expression for single-letter variable names that might be used in an algebraic expression:
algebra_var = MatchFirst(
    Char.using_each(string.ascii_lowercase, as_keyword=True)
)
-
Added new builtin python_quoted_string, which will match any form of single-line or multiline quoted strings defined in Python. (Inspired by discussion with Andreas Schörgenhumer in Issue #421.)
-
Extended expr[] notation for repetition of expr to accept a slice, where the slice's stop value indicates a stop_on expression:
test = "BEGIN aaa bbb ccc END"
BEGIN, END = Keyword.using_each("BEGIN END".split())
body_word = Word(alphas)
expr = BEGIN + Group(body_word[:END]) + END
# equivalent to
# expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END
print(expr.parse_string(test))
Prints:
['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']
-
ParserElement.validate() is deprecated. It predates the support for left-recursive parsers, and was prone to false positives (warning that a grammar was invalid when it was in fact valid). It will be removed in a future pyparsing release. In its place, developers should use debugging and analytical tools, such as ParserElement.set_debug() and ParserElement.create_diagram(). (Raised in Issue #444, thanks Andrea Micheli!)
-
Added bool embed argument to ParserElement.create_diagram(). When passed as True, the resulting diagram will omit the <DOCTYPE>, <HEAD>, and <BODY> tags so that it can be embedded in other HTML source. (Useful when embedding a call to create_diagram() in a PyScript HTML page.)
-
Added recurse argument to ParserElement.set_debug to set the debug flag on an expression and all of its sub-expressions. Requested by multimeric in Issue #399.
-
Added '·' (Unicode MIDDLE DOT) to the set of pp.unicode.Latin1.identbodychars.
-
Fixed bug in Word when max=2. Also added performance enhancement when specifying exact argument. Reported in issue #409 by panda-34, nice catch!
-
Word arguments are now validated if min and max are both given, that min <= max; raises ValueError if values are invalid.
-
Fixed bug in srange, when parsing escaped '/' and '\' inside a range set.
-
Fixed exception messages for some ParserElements with custom names, which instead showed their contained expression names.
-
Fixed bug in pyparsing.common.url, when input URL is not alone on an input line. Fixes Issue #459, reported by David Kennedy.
-
Multiple added and corrected type annotations. With much help from Stephen Rosen, thanks!
-
Some documentation and error message clarifications on pyparsing's keyword logic, cited by Basil Peace.
-
General docstring cleanup for Sphinx doc generation, PRs submitted by Devin J. Pohly. A dirty job, but someone has to do it - much appreciated!
-
invRegex.py example renamed to inv_regex.py and updated to PEP-8 variable and method naming. PR submitted by Ross J. Duff, thanks!
-
Removed examples sparser.py and pymicko.py, since each included its own GPL license in the header. Since this conflicts with pyparsing's MIT license, they were removed from the distribution to avoid confusion among those making use of them in their own projects.
pyparsing 3.0.9
-
Added Unicode set BasicMultilingualPlane (may also be referenced as BMP) representing the Basic Multilingual Plane (Unicode characters up to code point 65535). Can be used to parse most language characters, but omits emojis, wingdings, etc. Raised in discussion with Dave Tapley (issue #392).
-
To address mypy confusion of pyparsing.Optional and typing.Optional resulting in error: "_SpecialForm" not callable message reported in issue #365, fixed the import in exceptions.py. Nice sleuthing by Iwan Aucamp and Dominic Davis-Foster, thank you! (Removed definitions of OptionalType, DictType, and IterableType and replaced them with typing.Optional, typing.Dict, and typing.Iterable throughout.)
-
Fixed typo in jinja2 template for railroad diagrams, thanks for the catch Nioub (issue #388).
-
Removed use of deprecated pkg_resources package in railroad diagramming code (issue #391).
-
Updated bigquery_view_parser.py example to parse examples at https://cloud.google.com/bigquery/docs/reference/legacy-sql
pyparsing 3.0.8
Version 3.0.8 -
-
API CHANGE: modified pyproject.toml to require Python version 3.6.8 or later for pyparsing 3.x. Earlier minor versions of 3.6 fail in evaluating the version_info class (implemented using typing.NamedTuple). If you are using an earlier version of Python 3.6, you will need to use pyparsing 2.4.7.
-
Improved pyparsing import time by deferring regex pattern compiles. PR submitted by Anthony Sottile to fix issue #362, thanks!
-
Updated build to use flit, PR by Michał Górny, added BUILDING.md doc and removed old Windows build scripts - nice cleanup work!
-
More type-hinting added for all arithmetic and logical operator methods in ParserElement. PR from Kazantcev Andrey, thank you.
-
Fixed infix_notation's definitions of lpar and rpar, to accept parse expressions such that they do not get suppressed in the parsed results. PR submitted by Philippe Prados, nice work.
-
Fixed bug in railroad diagramming with expressions containing Combine elements. Reported by Jeremy White, thanks!
-
Added show_groups argument to create_diagram to highlight grouped elements with an unlabeled bounding box.
-
Added unicode_denormalizer.py to the examples as a demonstration of how Python's interpreter will accept Unicode characters in identifiers, but normalizes them back to ASCII so that identifiers print and 𝕡𝓻ᵢ𝓃𝘁 and 𝖕𝒓𝗂𝑛ᵗ are all equivalent.
-
Removed imports of deprecated sre_constants module for catching exceptions when compiling regular expressions. PR submitted by Serhiy Storchaka, thank you.
pyparsing 3.0.7
-
Fixed bug #345, in which delimitedList changed expressions in place using expr.streamline(). Reported by Kim Gräsman, thanks!
-
Fixed bug #346, when a string of word characters was passed to WordStart or WordEnd instead of just taking the default value. Originally posted as a question by Parag on StackOverflow, good catch!
-
Fixed bug #350, in which White expressions could fail to match due to unintended whitespace-skipping. Reported by Fu Hanxi, thank you!
-
Fixed bug #355, when a QuotedString is defined with characters in its quoteChar string containing regex-significant characters such as ., *, ?, [, ], etc.
-
Fixed bug in ParserElement.run_tests where comments would be displayed using with_line_numbers.
-
Added optional "min" and "max" arguments to delimited_list. PR submitted by Marius, thanks!
-
Added new API change note in whats_new_in_pyparsing_3_0_0, regarding a bug fix in the bool() behavior of ParseResults.
Prior to pyparsing 3.0.x, the ParseResults class implementation of __bool__ would return False if the ParseResults item list was empty, even if it contained named results. In 3.0.0 and later, ParseResults will return True if either the item list is not empty or if the named results dict is not empty.
# generate an empty ParseResults by parsing a blank string with
# a ZeroOrMore
result = Word(alphas)[...].parse_string("")
print(result.as_list())
print(result.as_dict())
print(bool(result))
# add a results name to the result
result["name"] = "empty result"
print(result.as_list())
print(result.as_dict())
print(bool(result))
Prints:
[]
{}
False
[]
{'name': 'empty result'}
True
In previous versions, the second call to bool() would return False.
-
Minor enhancement to Word generation of internal regular expression, to emit consecutive characters in range, such as "ab", as "ab", not "a-b".
-
Fixed character ranges for search terms using non-Western characters in booleansearchparser, PR submitted by tc-yu, nice work!
-
Additional type annotations on public methods.
pyparsing 3.0.6
-
Added suppress_warning() method to individually suppress a warning on a specific ParserElement. Used to refactor original_text_for to preserve internal results names, which, while undocumented, had been adopted by some projects.
-
Fix bug when delimited_list was called with a str literal instead of a parse expression.