Releases: pyparsing/pyparsing
Pyparsing 2.4.1
For a minor point release, this release contains many new features!
- A new shorthand notation has been added for repetition expressions: expr[min, max], with '...' valid as a min or max value:
  - expr[...] is equivalent to OneOrMore(expr)
  - expr[0, ...] is equivalent to ZeroOrMore(expr)
  - expr[1, ...] is equivalent to OneOrMore(expr)
  - expr[n, ...] or expr[n,] is equivalent to expr*n + ZeroOrMore(expr) (read as "n or more instances of expr")
  - expr[..., n] is equivalent to expr*(0, n)
  - expr[m, n] is equivalent to expr*(m, n)
  Note that expr[..., n] and expr[m, n] do not raise an exception if more than n exprs exist in the input stream. If this behavior is desired, then write expr[..., n] + ~expr.
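A minimal sketch of the repetition notation, assuming pyparsing 2.4.1 or later is importable as pp. The names integer and strict_at_most_two are illustrative only, and the sketch spells out the minimum count explicitly rather than relying on expr[...].

```python
import pyparsing as pp

integer = pp.Word(pp.nums)

# expr[1, ...] behaves like OneOrMore(expr)
one_or_more = integer[1, ...].parseString("1 2 3").asList()

# expr[0, ...] behaves like ZeroOrMore(expr), so empty input still parses
zero_or_more = integer[0, ...].parseString("").asList()

# expr[..., 2] matches at most two integers and leaves the rest unparsed
at_most_two = integer[..., 2].parseString("1 2 3").asList()

# adding ~expr turns a leftover third integer into a parse failure
strict_at_most_two = integer[..., 2] + ~integer
try:
    strict_at_most_two.parseString("1 2 3")
    too_many_rejected = False
except pp.ParseException:
    too_many_rejected = True

print(one_or_more, zero_or_more, at_most_two, too_many_rejected)
```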
- '...' can also be used as shorthand for SkipTo when used in adding parse expressions to compose an And expression.

      Literal('start') + ... + Literal('end')
      And(['start', ..., 'end'])

  are both equivalent to:

      Literal('start') + SkipTo('end')("_skipped*") + Literal('end')

  The '...' form has the added benefit of not requiring repeating the skip target expression. Note that the skipped text is returned with '_skipped' as a results name, and that the contents of _skipped will contain a list of text from all '...'s in the expression.
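A short sketch of the '...' skip shorthand in use, assuming pyparsing 2.4.1 or later is importable as pp. The input string is made up for illustration.

```python
import pyparsing as pp

# '...' between two expressions skips ahead to the expression that follows it
expr = pp.Literal("start") + ... + pp.Literal("end")
result = expr.parseString("start 123 456 end")

# the skipped text appears both as a token and under the '_skipped' results name
print(result.asList())
print(result["_skipped"])
```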
- '...' can also be used as a "skip forward in case of error" expression:

      expr = "start" + (Word(nums).setName("int") | ...) + "end"

      expr.parseString("start 456 end")
      ['start', '456', 'end']

      expr.parseString("start 456 foo 789 end")
      ['start', '456', 'foo 789 ', 'end']
      - _skipped: ['foo 789 ']

      expr.parseString("start foo end")
      ['start', 'foo ', 'end']
      - _skipped: ['foo ']

      expr.parseString("start end")
      ['start', '', 'end']
      - _skipped: ['missing <int>']

  Note that in all the error cases, the '_skipped' results name is present, showing a list of the extra or missing items. This form is only valid when used with the '|' operator.
- Improved exception messages to show what was actually found, not just what was expected.

      word = pp.Word(pp.alphas)
      pp.OneOrMore(word).parseString("aaa bbb 123", parseAll=True)

  Former exception message:

      pyparsing.ParseException: Expected end of text (at char 8), (line:1, col:9)

  New exception message:

      pyparsing.ParseException: Expected end of text, found '1' (at char 8), (line:1, col:9)
- Added diagnostic switches to help detect and warn about common parser construction mistakes, or enable additional parse debugging. Switches are attached to the pyparsing.__diag__ namespace object:
  - warn_multiple_tokens_in_named_alternation - flag to enable warnings when a results name is defined on a MatchFirst or Or expression with one or more And subexpressions (default=True)
  - warn_ungrouped_named_tokens_in_collection - flag to enable warnings when a results name is defined on a containing expression with ungrouped subexpressions that also have results names (default=True)
  - warn_name_set_on_empty_Forward - flag to enable warnings when a Forward is defined with a results name, but has no contents defined (default=False)
  - warn_on_multiple_string_args_to_oneof - flag to enable warnings when oneOf is incorrectly called with multiple str arguments (default=True)
  - enable_debug_on_named_expressions - flag to auto-enable debug on all subsequent calls to ParserElement.setName() (default=False)
  warn_multiple_tokens_in_named_alternation is intended to help those who currently have set __compat__.collect_all_And_tokens to False as a workaround for using the pre-2.3.1 code with named MatchFirst or Or expressions containing an And expression.
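As a sketch of how a switch might be turned on, assuming the pyparsing.__diag__ namespace object described above accepts plain attribute assignment (set the flag before building the parser it should check):

```python
import pyparsing as pp

# this switch is off by default according to the list above
pp.__diag__.warn_name_set_on_empty_Forward = True

# expressions created after this point are subject to the check
flag_is_on = pp.__diag__.warn_name_set_on_empty_Forward
print(flag_is_on)
```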
- Added ParseResults.from_dict classmethod, to simplify creation of a ParseResults with results names using a dict, which may be nested. This makes it easy to add a sub-level of named items to the parsed tokens in a parse action.
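A small sketch of from_dict with a nested dict, assuming pyparsing 2.4.1 or later is importable as pp. The keys and values are invented for illustration.

```python
import pyparsing as pp

# a nested dict becomes a nested level of results names
result = pp.ParseResults.from_dict(
    {"shape": "circle", "center": {"x": 3, "y": 4}}
)

print(result["shape"])
print(result["center"]["x"], result["center"]["y"])
```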
- Added asKeyword argument (default=False) to oneOf, to force keyword-style matching on the generated expressions.
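The difference can be sketched as follows, assuming pyparsing 2.4.1 or later is importable as pp. The word "iffy" is just a convenient input that starts with one of the alternatives.

```python
import pyparsing as pp

plain = pp.oneOf("if else")
keyword_style = pp.oneOf("if else", asKeyword=True)

# plain matching accepts "if" as the leading part of a longer word
prefix_match = plain.parseString("iffy").asList()

# keyword-style matching requires the match to end at a word boundary
try:
    keyword_style.parseString("iffy")
    keyword_rejected = False
except pp.ParseException:
    keyword_rejected = True

print(prefix_match, keyword_rejected)
```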
- ParserElement.runTests now accepts an optional 'file' argument to redirect test output to a file-like object (such as a StringIO, or opened file). Default is to write to sys.stdout.
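A minimal sketch of capturing runTests output in memory, assuming pyparsing 2.4.1 or later is importable as pp. The test strings are arbitrary.

```python
import io
import pyparsing as pp

integer = pp.Word(pp.nums)

# collect the report in a StringIO instead of printing to sys.stdout
buffer = io.StringIO()
success, results = integer.runTests("""
    123
    456
    """, file=buffer)

report = buffer.getvalue()
print(success, len(report) > 0)
```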
- conditionAsParseAction is a helper method for constructing a parse action method from a predicate function that simply returns a boolean result. Useful for those places where a predicate cannot be added using addCondition, but must be converted to a parse action (such as in infixNotation). May be used as a decorator if default message and exception types can be used. See ParserElement.addCondition for more details about the expected signature and behavior for predicate condition methods.
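A hedged sketch of wrapping a predicate, assuming pyparsing 2.4.1 or later is importable as pp. The predicate is_even and the message text are hypothetical, chosen only for this example.

```python
import pyparsing as pp

def is_even(tokens):
    # predicate: returns a plain boolean, as a condition function would
    return int(tokens[0]) % 2 == 0

# convert the predicate into a parse action that raises on False
even_check = pp.conditionAsParseAction(is_even, message="expected an even integer")
even_integer = pp.Word(pp.nums).addParseAction(even_check)

accepted = even_integer.parseString("42").asList()
try:
    even_integer.parseString("7")
    odd_rejected = False
    odd_message = ""
except pp.ParseException as exc:
    odd_rejected = True
    odd_message = exc.msg

print(accepted, odd_rejected, odd_message)
```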
- While investigating issue #93, I found that Or and addCondition could interact to select an alternative that is not the longest match. This is because Or first checks all alternatives for matches without running attached parse actions or conditions, orders by longest match, and then rechecks for matches with conditions and parse actions. Some expressions, when checking with conditions, may end up matching on a shorter token list than originally matched, but would be selected because of its original priority. This matching code has been expanded to do more extensive searching for matches when a second-pass check matches a smaller list than in the first pass.
- Fixed issue #87, a regression in indented block. Reported by Renz Bagaporo, who submitted a very nice repro example, which makes the bug-fixing process a lot easier, thanks!
- Fixed MemoryError issue #85 and #91 with str generation for Forwards. Thanks decalage2 and Harmon758 for your patience.
- Modified setParseAction to accept None as an argument, indicating that all previously-defined parse actions for the expression should be cleared.
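For illustration, a sketch of clearing parse actions, assuming pyparsing 2.4.1 or later is importable as pp:

```python
import pyparsing as pp

# convert matched digits to int with a parse action
integer = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))
converted = integer.parseString("42").asList()

# passing None clears the previously defined parse actions
integer.setParseAction(None)
unconverted = integer.parseString("42").asList()

print(converted, unconverted)
```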
- Modified pyparsing_common.real and sci_real to parse reals without leading integer digits before the decimal point, consistent with Python real number formats. Original PR #98 submitted by ansobolev.
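A brief sketch of the relaxed real-number forms, assuming pyparsing 2.4.1 or later is importable as pp. The sample inputs are arbitrary.

```python
import pyparsing as pp

ppc = pp.pyparsing_common

# no digits are required before the decimal point
half = ppc.real.parseString(".5")[0]
scaled = ppc.sci_real.parseString(".5e3")[0]

print(half, scaled)
```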
- Modified runTests to call postParse function before dumping out the parsed results - allows for postParse to add further results, such as indications of additional validation success/failure.
- Updated statemachine example: refactored state transitions to use overridden classmethods; added <statename>Mixin class to simplify definition of application classes that "own" the state object and delegate to it to model state-specific properties and behavior.
- Added example nested_markup.py, showing a simple wiki markup with nested markup directives, and illustrating the use of '...' for skipping over input to match the next expression. (This example uses syntax that is not valid under Python 2.)
- Rewrote delta_time.py example (renamed from deltaTime.py) to fix some omitted formats and upgrade to latest pyparsing idioms, beginning with writing an actual BNF.
- With the help and encouragement from several contributors, including Matej Cepl and Cengiz Kaygusuz, I've started cleaning up the internal coding styles in core pyparsing, bringing it up to modern coding practices from pyparsing's early development days dating back to 2003. Whitespace has been largely standardized along PEP8 guidelines, removing extra spaces around parentheses, and adding them around arithmetic operators and after colons and commas. I was going to hold off on doing this work until after 2.4.1, but after cleaning up a few trial classes, the difference was so significant that I continued on to the rest of the core code base. This should facilitate future work and submitted PRs, allowing them to focus on substantive code changes, and not get sidetracked by whitespace issues.
- NOTE: Deprecated functions and features that will be dropped in pyparsing 2.5.0 (planned next release):
  - support for Python 2 - ongoing users running with Python 2 can continue to use pyparsing 2.4.1
  - ParseResults.asXML() - if used for debugging, switch to using ParseResults.dump(); if used for data transfer, use ParseResults.asDict() to convert to a nested Python dict, which can then be converted to XML or JSON or other transfer format
  - operatorPrecedence synonym for infixNotation - convert to calling infixNotation
  - commaSeparatedList - convert to using pyparsing_common.comma_separated_list
  - upcaseTokens and downcaseTokens - convert to using pyparsing_common.upcaseTokens and downcaseTokens
  - __compat__.collect_all_And_tokens will not be settable to False to revert to pre-2.3.1 results name behavior - review use of names for MatchFirst and Or expressions containing And expressions, as they will return the complete list of parsed tokens, not just the first one. Use __diag__.warn_multiple_tokens_in_named_alternation to help identify those expressions in your parsers that will have changed as a result.
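As one possible migration away from asXML() for data transfer, the sketch below converts named results to a dict and then to JSON. It assumes pyparsing is importable as pp; the grammar and input are invented for illustration.

```python
import json
import pyparsing as pp

# a tiny grammar with results names on the interesting tokens
assignment = pp.Word(pp.alphas)("key") + pp.Suppress("=") + pp.Word(pp.nums)("value")
result = assignment.parseString("width = 80")

# asDict() gives a plain Python dict that standard serializers can handle
as_dict = result.asDict()
as_json = json.dumps(as_dict, sort_keys=True)

print(as_json)
```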
Pyparsing 2.4.0
- Well, it looks like the API change that was introduced in 2.3.1 was more drastic than expected, so for a friendlier forward upgrade path, this release:
  . Bumps the current version number to 2.4.0, to reflect this incompatible change.
  . Adds a pyparsing.__compat__ object for specifying compatibility with future breaking changes.
  . Conditionalizes the API-breaking behavior, based on the value pyparsing.__compat__.collect_all_And_tokens. By default, this value will be set to True, reflecting the new bugfixed behavior. To set this value to False, add to your code:

        import pyparsing
        pyparsing.__compat__.collect_all_And_tokens = False

  . User code that is dependent on the pre-bugfix behavior can restore it by setting this value to False.
  In 2.5 and later versions, the conditional code will be removed and setting the flag to True or False in these later versions will have no effect.
- Updated unitTests.py and simple_unit_tests.py to be compatible with "python setup.py test". To run tests using setup, do:

      python setup.py test
      python setup.py test -s unitTests.suite
      python setup.py test -s simple_unit_tests.suite

  Prompted by issue #83 and PR submitted by bdragon28, thanks.
- Fixed bug in ParserElement.runTests handling '\n' literals in quoted strings.
- Added tag_body attribute to the start tag expressions generated by makeHTMLTags, so that you can avoid using SkipTo to roll your own tag body expression:

      a, aEnd = pp.makeHTMLTags('a')
      link = a + a.tag_body("displayed_text") + aEnd
      # html_page is the HTML source text to search
      for t in link.searchString(html_page):
          print(t.displayed_text, '->', t.startA.href)
- indentedBlock failure handling was improved; PR submitted by TMiguelT, thanks!
- Addressed Py2 incompatibility in simple_unit_tests, plus explain() and Forward str() cleanup; PRs graciously provided by eswald.
- Fixed docstring with embedded '\w', which creates SyntaxWarnings in Py3.8, issue #80.
- Examples:
  - Added example parser for rosettacode.org tutorial compiler.
  - Added example to show how an HTML table can be parsed into a collection of Python lists or dicts, one per row.
  - Updated SimpleSQL.py example to handle nested selects, reworked 'where' expression to use infixNotation.
  - Added include_preprocessor.py, similar to macroExpander.py.
  - Examples using makeHTMLTags use new tag_body expression when retrieving a tag's body text.
  - Updated examples that are runnable as unit tests:

        python setup.py test -s examples.antlr_grammar_tests
        python setup.py test -s examples.test_bibparse
Pyparsing 2.3.1
New features in Pyparsing 2.3.1:
- ParseException.explain() method, to convert a raw Python traceback into a list of the parse expressions leading up to a parse mismatch.
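A minimal sketch of calling explain() on a caught exception, assuming pyparsing is importable as pp. It is invoked through the class, passing the exception explicitly; the grammar and input are illustrative.

```python
import pyparsing as pp

expr = pp.Word(pp.nums) + pp.Literal("=") + pp.Word(pp.alphas)

try:
    expr.parseString("123 = 456")
    explanation = None
except pp.ParseException as pe:
    # pass the caught exception to explain() to get a readable report
    explanation = pp.ParseException.explain(pe)

print(explanation)
```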
- New unicode sets Latin-A and Latin-B, and the ability to define custom sets using multiple inheritance.

      class Turkish_set(pp.pyparsing_unicode.Latin1, pp.pyparsing_unicode.LatinA):
          pass

      turkish_word = pp.Word(Turkish_set.alphas)

- State machine examples, showing how to extend Python with your own pyparsing-enabled syntax. The examples implement a 'statemachine' keyword to define a set of classes and transition attributes to implement a State pattern:

      statemachine TrafficLightState:
          Red -> Green
          Green -> Yellow
          Yellow -> Red

  Transitions can be named also:

      statemachine LibraryBookState:
          New -(shelve)-> Available
          Available -(reserve)-> OnHold
          OnHold -(release)-> Available
          Available -(checkout)-> CheckedOut
          CheckedOut -(checkin)-> Available

- Example parser for decaf language. This language is commonly used in university CS compiler classes.
- Fixup of docstrings to Sphinx format, so pyparsing docs are now available on readthedocs.io! (https://pyparsing-docs.readthedocs.io/en/latest/)
Pyparsing 2.3.0
- NEW SUPPORT FOR UNICODE CHARACTER RANGES
This release introduces the pyparsing_unicode namespace class, defining
a series of language character sets to simplify the definition of alphas,
nums, alphanums, and printables in the following language sets:
. Arabic
. Chinese
. Cyrillic
. Devanagari
. Greek
. Hebrew
. Japanese (including Kanji, Katakana, and Hiragana subsets)
. Korean
. Latin1 (includes 7 and 8-bit Latin characters)
. Thai
. CJK (combination of Chinese, Japanese, and Korean sets)
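A small sketch of using one of these sets, assuming pyparsing 2.3.0 or later is importable as pp. The Greek sample text is arbitrary.

```python
import pyparsing as pp

# build a Word expression from the Greek set's alphas
greek = pp.pyparsing_unicode.Greek
greek_word = pp.Word(greek.alphas)

words = pp.OneOrMore(greek_word).parseString("αβγ δεζ").asList()
print(words)
```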
POSSIBLE API CHANGES:
. IndexErrors raised in parse actions are now wrapped in ParseExceptions
. ParseResults have had several bugfixes which remove erroneous nesting levels
See the CHANGES file for more details.
New classes:
. PrecededBy - lookbehind match
. Char - single character match (similar to Word(exact=1))
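A brief sketch of Char only (PrecededBy is not shown here), assuming pyparsing 2.3.0 or later is importable as pp:

```python
import pyparsing as pp

# Char matches exactly one character from the given set
digit = pp.Char(pp.nums)

first_digit = digit.parseString("42").asList()
all_digits = pp.OneOrMore(digit).parseString("42").asList()

print(first_digit, all_digits)
```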
pyparsing_2.2.2
Version 2.2.2 - September, 2018
- Fixed bug in SkipTo, if a SkipTo expression that was skipping to
  an expression that returned a list (such as an And), and the
  SkipTo was saved as a named result, the named result could be
  saved as a ParseResults - should always be saved as a string.
  Issue #28, reported by seron.
- Added simple_unit_tests.py, as a collection of easy-to-follow unit
  tests for various classes and features of the pyparsing library.
  Primary intent is more to be instructional than actually rigorous
  testing. Complex tests can still be added in the unitTests.py file.
- New features added to the Regex class:
  - optional asGroupList parameter, returns all the capture groups as
    a list
  - optional asMatch parameter, returns the raw re.match result
  - new sub(repl) method, which adds a parse action calling
    re.sub(pattern, repl, parsed_result). Simplifies creating
    Regex expressions to be used with transformString. Like re.sub,
    repl may be an ordinary string (similar to using pyparsing's
    replaceWith), or may contain references to capture groups by group
    number, or may be a callable that takes an re match group and
    returns a string.

    For instance:

        expr = pp.Regex(r"([Hh]\d):\s*(.*)").sub(r"<\1>\2</\1>")
        expr.transformString("h1: This is the title")

    will return

        <h1>This is the title</h1>
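The asMatch parameter can be sketched in a similar way, assuming pyparsing 2.2.2 or later is importable as pp. The date pattern and input are made up for this example.

```python
import pyparsing as pp

# with asMatch=True the single returned token is the re match object
date = pp.Regex(r"(\d{4})-(\d{2})-(\d{2})", asMatch=True)
match = date.parseString("2018-09-30")[0]

year, month, day = match.group(1), match.group(2), match.group(3)
print(year, month, day)
```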
- Fixed omission of LICENSE file in source tarball, also added
  CODE_OF_CONDUCT.md per GitHub community standards.
  Issue #31
pyparsing_2.2.1
- Updates to migrate source repo to GitHub
- Fix deprecation warning in Python 3.7 re: importing collections.abc
- Fix Literal/Keyword bug raising IndexError instead of ParseException