Releases: pyparsing/pyparsing
Pyparsing 3.0.0b1
-
API CHANGE
Diagnostic flags have been moved to an enum, pyparsing.Diagnostics, and they are enabled through module-level methods:
. pyparsing.enable_diag()
. pyparsing.disable_diag()
. pyparsing.enable_all_warnings()
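A minimal sketch of these calls, assuming pyparsing 3.0.0b1 or later is installed. The flag used here, warn_multiple_tokens_in_named_alternation, is simply one member of the enum picked for illustration.

```python
import pyparsing as pp

# pick one diagnostic flag from the Diagnostics enum
flag = pp.Diagnostics.warn_multiple_tokens_in_named_alternation

# switch a single diagnostic on, then off again
pp.enable_diag(flag)
pp.disable_diag(flag)

# the enum can be iterated to list every available flag name
diag_names = [d.name for d in pp.Diagnostics]
print(diag_names)
```

Calling pp.enable_all_warnings() instead would switch on all of the warning flags in one step.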
-
API CHANGE
Most previous SyntaxWarnings that were issued when using pyparsing classes incorrectly have been converted to TypeError and ValueError exceptions, consistent with Python calling conventions. All warnings raised by diagnostic flags have been converted from SyntaxWarnings to UserWarnings.
-
To support parsers that are intended to generate native Python collection types such as lists and dicts, the Group and Dict classes now accept an additional boolean keyword argument, aslist and asdict respectively. See the jsonParser.py example in the pyparsing/examples source directory for how to return types as ParseResults and as Python collection types, and the distinctions in working with the different types.
In addition, parse actions that must return a value of list type (which would normally be converted internally to a ParseResults) can override this default behavior by returning their list wrapped in the new ParseResults.List class:
    from pyparsing import ParseResults

    # this parse action tries to return a list, but pyparsing
    # will convert to a ParseResults
    def return_as_list_but_still_get_parse_results(tokens):
        return tokens.asList()

    # this parse action returns the tokens as a list, and pyparsing will
    # maintain its list type in the final parsing results
    def return_as_list(tokens):
        return ParseResults.List(tokens.asList())
This is the mechanism used internally by the Group class when defined using aslist=True.
-
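As a rough sketch of the aslist argument to Group described in the previous item (assuming pyparsing 3.0.0b1 or later; the integer grammar and the variable names are made up for illustration), the two forms can be compared side by side:

```python
import pyparsing as pp

integer = pp.Word(pp.nums)

# default: the grouped tokens come back as a nested ParseResults
as_results = pp.Group(integer[1, ...]).parseString("1 2 3")

# aslist=True: the grouped tokens come back as a Python list
as_list = pp.Group(integer[1, ...], aslist=True).parseString("1 2 3")

print(type(as_results[0]).__name__)
print(type(as_list[0]).__name__)
```
-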
A new IndentedBlock class is introduced, to eventually replace the current indentedBlock helper method. The interface is largely the same; however, the new class manages its own internal indentation stack, so it is no longer necessary to maintain an external indentStack variable.
-
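A small sketch of IndentedBlock as described in the previous item, assuming a recent pyparsing 3.x release (the constructor details may have differed in 3.0.0b1). The grammar, the sample text, and the names header and item are hypothetical.

```python
import pyparsing as pp

text = """\
fruits:
    apple
    banana
"""

# a header line followed by an indented block of single-word items
header = pp.Word(pp.alphas) + pp.Suppress(":")
item = pp.Word(pp.alphas)

# no external indentStack is passed in; the class tracks indentation itself
section = header + pp.IndentedBlock(item)

result = section.parseString(text)
print(result.dump())
```
-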
API CHANGE
Added cache_hit keyword argument to debug actions. Previously, if packrat parsing was enabled, the debug methods were not called in the event of cache hits. Now these methods will be called, with an added argument cache_hit=True.
If you are using packrat parsing and enable debug on expressions using a custom debug method, you can add the cache_hit=False keyword argument, and your method will be called on packrat cache hits. If you choose not to add this keyword argument, the debug methods will fail silently, behaving as they did previously.
-
When using setDebug with packrat parsing enabled, packrat cache hits will now be included in the output, shown with a leading '*'. (Previously, cache hits and responses were not included in debug output.) For those using custom debug actions, see the previous item regarding an optional API change for those methods.
-
setDebug output will also show more details about what expression is about to be parsed (the current line of text being parsed, and the current parse position):
    Match integer at loc 0(1,1)
      1 2 3
      ^
    Matched integer -> ['1']
The current debug location will also be indicated after whitespace has been skipped (was previously inconsistent, reported in Issue #244, by Frank Goyens, thanks!).
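For custom debug actions, the optional cache_hit keyword mentioned in the items above could be accepted as in the following sketch. It assumes pyparsing 3.x; the function names on_try, on_match and on_fail and the events list are hypothetical. Note that enablePackrat() changes global parser state.

```python
import pyparsing as pp

pp.ParserElement.enablePackrat()

events = []

# each custom debug action accepts the optional cache_hit argument
def on_try(instring, loc, expr, cache_hit=False):
    events.append(("try", loc, cache_hit))

def on_match(instring, startloc, endloc, expr, toks, cache_hit=False):
    events.append(("match", startloc, cache_hit))

def on_fail(instring, loc, expr, exc, cache_hit=False):
    events.append(("fail", loc, cache_hit))

integer = pp.Word(pp.nums).setName("integer")
integer.setDebugActions(on_try, on_match, on_fail)

# integer is attempted twice at the same location, so the second
# attempt can be answered from the packrat cache
expr = (integer + "a") | (integer + "b")
expr.parseString("1 b")

for event in events:
    print(event)
```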
-
Modified the repr() output for ParseResults to include the class name as part of the output. This is to clarify for new pyparsing users who misread the repr output as a tuple of a list and a dict. pyparsing results will now read like:
    ParseResults(['abc', 'def'], {'qty': 100})
instead of just:
    (['abc', 'def'], {'qty': 100})
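A quick way to see the new repr, assuming pyparsing 3.0.0b1 or later; the small grammar here is invented for the example:

```python
import pyparsing as pp

word = pp.Word(pp.alphas)
qty = pp.Word(pp.nums)("qty")

# one or more words followed by a named quantity
result = (word[1, ...] + qty).parseString("abc def 100")

text = repr(result)
print(text)
```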
-
Fixed bugs in Each when passed OneOrMore or ZeroOrMore expressions:
. first expression match could be enclosed in an extra nesting level
. out-of-order expressions now handled correctly if mixed with required expressions
. results names are maintained correctly for these expressions
-
Fixed traceback trimming, and added ParserElement.verbose_traceback save/restore to reset_pyparsing_context().
-
Default strings for Word expressions now also include indications of min and max length specification, if applicable, similar to regex length specifications:
    Word(alphas)             -> "W:(A-Za-z)"
    Word(nums)               -> "W:(0-9)"
    Word(nums, exact=3)      -> "W:(0-9){3}"
    Word(nums, min=2)        -> "W:(0-9){2,...}"
    Word(nums, max=3)        -> "W:(0-9){1,3}"
    Word(nums, min=2, max=3) -> "W:(0-9){2,3}"
For expressions of the Char class (similar to Word(..., exact=1)), the expression is simply the character range in parentheses:
    Char(nums)   -> "(0-9)"
    Char(alphas) -> "(A-Za-z)"
-
Removed copy() override in Keyword class which did not preserve definition of ident chars from the original expression. PR #233 submitted by jgrey4296, thanks!
-
In addition to pyparsing.__version__, there is now also a pyparsing.__version_info__, following the same structure and field names as in sys.version_info.
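A short sketch of a version check using the new attribute, assuming pyparsing 3.0.0b1 or later is installed; the variable names are illustrative only.

```python
import sys
import pyparsing as pp

info = pp.__version_info__

# compare the leading fields as a tuple, as is commonly done
# with sys.version_info
is_v3_or_later = info[:2] >= (3, 0)

print(pp.__version__, info.major, info.minor, info.micro)
print(sys.version_info.major, sys.version_info.minor, sys.version_info.micro)
```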
Pyparsing 3.0.0a2
Version 3.0.0a2 - June, 2020
-
Summary of changes for 3.0.0 can be found in "What's New in Pyparsing 3.0.0" documentation.
-
API CHANGE
Changed result returned when parsing using countedArray; the array items are no longer returned in a doubly-nested list.
-
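To illustrate the countedArray change in the previous item, here is a small sketch assuming a pyparsing 3.x release; the input string is made up. Wrap the expression in Group if a nested list is still wanted.

```python
import pyparsing as pp

# the leading integer gives the number of items that follow
counted_words = pp.countedArray(pp.Word(pp.alphas))

# only the first two words belong to the array; "ef" is left unparsed
result = counted_words.parseString("2 ab cd ef")
items = result.asList()
print(items)
```
-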
An excellent new enhancement is the new railroad diagram generator for documenting pyparsing parsers:
    import pyparsing as pp
    from pyparsing.diagram import to_railroad, railroad_to_html
    from pathlib import Path

    # define a simple grammar for parsing street addresses such
    # as "123 Main Street"
    #     number word...
    number = pp.Word(pp.nums).setName("number")
    name = pp.Word(pp.alphas).setName("word")[1, ...]

    parser = number("house_number") + name("street")
    parser.setName("street address")

    # construct railroad track diagram for this parser and
    # save as HTML
    rr = to_railroad(parser)
    Path('parser_rr_diag.html').write_text(railroad_to_html(rr))
Very nice work provided by Michael Milton, thanks a ton!
-
Enhanced default strings created for Word expressions, now showing string ranges if possible. Word(alphas) would formerly print as W:(ABCD...), now prints as W:(A-Za-z).
-
Added ignoreWhitespace(recurse:bool = True) and added a recurse argument to leaveWhitespace, both added to provide finer control over pyparsing's whitespace skipping. Also contributed by Michael Milton.
-
The unicode range definitions for the various languages were recalculated by interrogating the unicodedata module by character name, selecting characters that contained that language in their Unicode name. (Issue #227)
Also, pyparsing_unicode.Korean was renamed to Hangul (Korean is also defined as a synonym for compatibility).
-
Enhanced ParseResults dump() to show both results names and list subitems. Fixes bug where adding a results name would hide lower-level structures in the ParseResults.
-
Added new __diag__ warnings:
"warn_on_parse_using_empty_Forward" - warns that a Forward has been included in a grammar, but no expression was attached to it using '<<=' or '<<'
"warn_on_assignment_to_Forward" - warns that a Forward has been created, but was probably later overwritten by erroneously using '=' instead of '<<=' (this is a common mistake when using Forwards) (currently not working on PyPy)
-
Added ParserElement.recurse() method to make it simpler for grammar utilities to navigate through the tree of expressions in a pyparsing grammar.
-
Fixed bug in ParseResults repr() which showed all matching entries for a results name, even if listAllMatches was set to False when creating the ParseResults originally. Reported by Nicholas42 on GitHub, good catch! (Issue #205)
-
Modified refactored modules to use relative imports, as pointed out by setuptools project member jaraco, thank you!
-
Off-by-one bug found in the roman_numerals.py example, a bug that has been there for about 14 years! PR submitted by Jay Pedersen, nice catch!
-
A simplified Lua parser has been added to the examples (lua_parser.py).
-
Added make_diagram.py to the examples directory to demonstrate creation of railroad diagrams for selected pyparsing examples. Also restructured some examples to make their parsers importable without running their embedded tests.
Pyparsing 2.4.7
Version 2.4.7 - April, 2020
- Backport of selected fixes from 3.0.0 work:
. Each bug with Regex expressions
. And expressions not properly constructing with generator
. Traceback abbreviation
. Bug in delta_time example
. Fix regexen in pyparsing_common.real and .sci_real
. Avoid FutureWarning on Python 3.7 or later
. Cleanup output in runTests if comments are embedded in test string
Pyparsing 2.4.6
Version 2.4.6 - December, 2019
-
Fixed typos in White mapping of whitespace characters, to use
correct "\u" prefix instead of "u".
-
Fix bug in left-associative ternary operators defined using
infixNotation. First reported on StackOverflow by user Jeronimo.
-
Backport of pyparsing_test namespace from 3.0.0, including
TestParseResultsAsserts mixin class defining unittest-helper
methods:
. def assertParseResultsEquals(
self, result, expected_list=None, expected_dict=None, msg=None)
. def assertParseAndCheckList(
self, expr, test_string, expected_list, msg=None, verbose=True)
. def assertParseAndCheckDict(
self, expr, test_string, expected_dict, msg=None, verbose=True)
. def assertRunTestResults(
self, run_tests_report, expected_parse_results=None, msg=None)
. def assertRaisesParseException(self, exc_type=ParseException, msg=None)
To use the methods in this mixin class, declare your unittest classes as:
    import unittest
    from pyparsing import pyparsing_test as ppt

    class MyParserTest(ppt.TestParseResultsAsserts, unittest.TestCase):
        ...
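A fuller sketch of such a test class, assuming pyparsing 2.4.6 or later; the class name IntegerListTest and the grammar under test are invented for illustration. The test is run programmatically here so the example is self-contained.

```python
import unittest

import pyparsing as pp
from pyparsing import pyparsing_test as ppt

class IntegerListTest(ppt.TestParseResultsAsserts, unittest.TestCase):
    def test_integers(self):
        expr = pp.Word(pp.nums)[1, ...]
        # parses the string and compares the resulting token list
        self.assertParseAndCheckList(
            expr, "1 2 3", ["1", "2", "3"], verbose=False
        )

# load and run the test case without relying on unittest.main()
suite = unittest.defaultTestLoader.loadTestsFromTestCase(IntegerListTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```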
Pyparsing 2.4.5
Version 2.4.5 - November, 2019
- Fixed encoding when setup.py reads README.rst to include the
project long description when uploading to PyPI. A stray
unicode space in README.rst prevented the source install on
systems whose default encoding is not 'utf-8'.
Pyparsing 2.4.4
Check-in bug in Pyparsing 2.4.3 that raised UserWarnings was masked by stdout buffering in unit tests - fixed.
Pyparsing 2.4.3
Version 2.4.3 - November, 2019
(Backport of selected critical items from 3.0.0 development branch.)
-
Fixed a bug in ParserElement.__eq__ that would for some parsers create a recursion error at parser definition time. Thanks to Michael Clerx for the assist. (Addresses issue #123)
-
Fixed bug in indentedBlock where a block that ended at the end of the input string could cause pyparsing to loop forever. Raised as part of discussion on StackOverflow with geckos.
-
Backports from pyparsing 3.0.0:
. __diag__.enable_all_warnings()
. Fixed bug in PrecededBy which caused infinite recursion, issue #127
. support for using regex-compiled RE to construct Regex expressions
Pyparsing 2.4.2
Version 2.4.2 - July, 2019
-
Updated the shorthand notation that has been added for repetition
expressions: expr[min, max], with '...' valid as a min or max value:
  - expr[...] and expr[0, ...] are equivalent to ZeroOrMore(expr)
  - expr[1, ...] is equivalent to OneOrMore(expr)
  - expr[n, ...] or expr[n,] is equivalent
    to expr*n + ZeroOrMore(expr)
    (read as "n or more instances of expr")
  - expr[..., n] is equivalent to expr*(0, n)
  - expr[m, n] is equivalent to expr*(m, n)
Note that expr[..., n] and expr[m, n] do not raise an exception
if more than n exprs exist in the input stream. If this
behavior is desired, then write expr[..., n] + ~expr.
Better interpretation of [...] as ZeroOrMore raised by crowsonkb,
thanks for keeping me in line!
If upgrading from 2.4.1 or 2.4.1.1 and you have used expr[...] for OneOrMore(expr), it must be updated to expr[1, ...].
-
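The repetition notation from the previous item can be tried out with a sketch like the following, assuming pyparsing 2.4.2 or later; the integer expression and variable names are illustrative.

```python
import pyparsing as pp

integer = pp.Word(pp.nums)

zero_or_more = integer[...]       # same as ZeroOrMore(integer)
one_or_more = integer[1, ...]     # same as OneOrMore(integer)
at_most_two = integer[..., 2]     # same as integer*(0, 2)

# add a negative lookahead to reject input with more than two integers
strict_at_most_two = integer[..., 2] + ~integer

r_empty = zero_or_more.parseString("").asList()
r_all = one_or_more.parseString("1 2 3").asList()
r_two = at_most_two.parseString("1 2 3").asList()

try:
    strict_at_most_two.parseString("1 2 3")
    strict_rejected = False
except pp.ParseException:
    strict_rejected = True

print(r_empty, r_all, r_two, strict_rejected)
```
-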
The defaults on all the __diag__ switches have been set to False,
to avoid getting alarming warnings. To use these diagnostics, set
them to True after importing pyparsing.
Example:
    import pyparsing as pp
    pp.__diag__.warn_multiple_tokens_in_named_alternation = True
-
Fixed bug introduced by the use of __getitem__ for repetition,
overlooking Python's legacy implementation of iteration
by sequentially calling __getitem__ with increasing numbers until
getting an IndexError. Found during investigation of problem
reported by murlock, merci!
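The legacy iteration protocol behind this bug can be shown in plain Python, without pyparsing. In this sketch the classes LegacySequence and Repeatable are hypothetical; defining __iter__ to raise TypeError is one common way to opt out of the fallback, and the note above does not say which form the actual fix took.

```python
class LegacySequence:
    """Iterable only through the old __getitem__ protocol."""

    def __getitem__(self, index):
        # iteration stops when IndexError is raised
        if index >= 3:
            raise IndexError(index)
        return index * index


class Repeatable:
    """Uses __getitem__ for notation, so iteration is refused explicitly."""

    def __getitem__(self, key):
        # never raises IndexError, so legacy iteration would not terminate
        return ("repeat", key)

    def __iter__(self):
        raise TypeError(f"{type(self).__name__!r} object is not iterable")


# Python calls __getitem__ with 0, 1, 2, ... until IndexError
squares = list(LegacySequence())

try:
    iter(Repeatable())
    refused = False
except TypeError:
    refused = True

print(squares, refused)
```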
Pyparsing 2.4.1.1
This is a re-release of version 2.4.1 to restore the release history
in PyPI, since the 2.4.1 release was deleted.
There are 3 known issues in this release, which are fixed in
the upcoming 2.4.2:
-
API change adding support for expr[...] - the original
code in 2.4.1 incorrectly implemented this as OneOrMore.
Code using this feature under this release should explicitly
use expr[0, ...] for ZeroOrMore and expr[1, ...] for
OneOrMore. In 2.4.2 you will be able to write expr[...]
equivalent to ZeroOrMore(expr).
-
Bug if composing And, Or, MatchFirst, or Each expressions
using a single expression. This only affects code which uses
explicit expression construction using the And, Or, etc.
classes instead of using overloaded operators '+', '^', and
so on. If constructing an And using a single expression,
you may get an error that "cannot multiply ParserElement by
0 or (0, 0)" or a Python IndexError. Change code like
    cmd = Or(Word(alphas))
to
    cmd = Or([Word(alphas)])
(Note that this is not the recommended style for constructing
Or expressions.)
-
Some newly-added __diag__ switches are enabled by default,
which may give rise to noisy user warnings for existing parsers.
You can disable them using:
    import pyparsing as pp
    pp.__diag__.warn_multiple_tokens_in_named_alternation = False
    pp.__diag__.warn_ungrouped_named_tokens_in_collection = False
    pp.__diag__.warn_name_set_on_empty_Forward = False
    pp.__diag__.warn_on_multiple_string_args_to_oneof = False
    pp.__diag__.enable_debug_on_named_expressions = False
In 2.4.2 these will all be set to False by default.
Pyparsing 2.4.2a1
Release candidate for 2.4.2:
- Fixes incorrect implementation of expr[...] as OneOrMore, changed to ZeroOrMore
- Fixes __getitem__-induced iterability for ParserElement class
- __diag__ flags are now all False by default