
Pyparsing 3.1.0a1

Pre-release
@ptmcg ptmcg released this 08 Mar 03:53

NOTE: In the future release 3.2.0, use of many of the pre-PEP8 methods (such as ParserElement.parseString) will start to raise DeprecationWarnings. 3.2.0 should be released some time later in 2023. I currently plan to drop the pre-PEP8 methods completely in pyparsing 4.0, though that release will not arrive until late 2023 at the earliest, and possibly 2024. So there is plenty of time to convert existing parsers to the new function names before the old functions are removed. (Big help from Devin J. Pohly in structuring the code to enable this peaceful transition.)

Version 3.2.0 will also discontinue support for Python versions 3.6 and 3.7.
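
As a rough illustration of the conversion described in the note above, the sketch below calls the same parser through a pre-PEP8 name and through its PEP8 replacement. The `greeting` grammar and the variable names are made up for this example; only `parseString` and `parse_string` come from the note.

```python
import pyparsing as pp

# A small illustrative grammar: a greeting such as "Hello, World!"
greeting = pp.Word(pp.alphas) + "," + pp.Word(pp.alphas) + "!"

# Pre-PEP8 name: still accepted in this release, slated to warn in 3.2.0
legacy_result = greeting.parseString("Hello, World!")

# PEP8 name: the spelling to migrate to
pep8_result = greeting.parse_string("Hello, World!")

print(pep8_result.as_list())
```

Both calls should return the same tokens, so converting a parser is mostly a matter of renaming method calls.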

  • API ENHANCEMENT: Optional(expr) may now be written as expr | ""

    This will make this code:

    "{" + Optional(Literal("A") | Literal("a")) + "}"
    

    writable as:

    "{" + (Literal("A") | Literal("a") | "") + "}"
    

    Some related changes implemented as part of this work:

    • Literal("") now internally generates an Empty() (and no longer raises an exception)
    • Empty is now a subclass of Literal

    Suggested by Antony Lee (issue #412), PR (#413) by Devin J. Pohly.

  • Added new class property identifier to all Unicode set classes in pyparsing.unicode, using the class's values for cls.identchars and cls.identbodychars. Unicode-aware parsers that formerly wrote:

    ppu = pyparsing.unicode
    ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
    

    can now write:

    ident = ppu.Greek.identifier
    # or
    # ident = ppu.Ελληνικά.identifier
    
  • Reworked the delimited_list function into the new DelimitedList class. DelimitedList has the same constructor interface as delimited_list, and in this release delimited_list changes from a function to a synonym for DelimitedList. Both delimited_list and the older delimitedList function will be deprecated in a future release in favor of DelimitedList.

  • Added new class method ParserElement.using_each, to simplify code that creates a sequence of Literals, Keywords, or other ParserElement subclasses.

    For instance, to define suppressible punctuation, you would previously write:

    LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")
    

    You can now write:

    LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")
    

    using_each will also accept optional keyword args, which it will pass through to the class initializer. Here is an expression for single-letter variable names that might be used in an algebraic expression:

    algebra_var = MatchFirst(
        Char.using_each(string.ascii_lowercase, as_keyword=True)
    )
    
  • Added new builtin python_quoted_string, which will match any form of single-line or multiline quoted strings defined in Python. (Inspired by discussion with Andreas Schörgenhumer in Issue #421.)

  • Extended expr[] notation for repetition of expr to accept a slice, where the slice's stop value indicates a stop_on expression:

    test = "BEGIN aaa bbb ccc END"
    BEGIN, END = Keyword.using_each("BEGIN END".split())
    body_word = Word(alphas)
    
    expr = BEGIN + Group(body_word[:END]) + END
    # equivalent to
    # expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END
    
    print(expr.parse_string(test))
    

    Prints:

    ['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']
    
  • ParserElement.validate() is deprecated. It predates the support for left-recursive parsers, and was prone to false positives (warning that a grammar was invalid when it was in fact valid). It will be removed in a future pyparsing release. In its place, developers should use debugging and analytical tools, such as ParserElement.set_debug() and ParserElement.create_diagram(). (Raised in Issue #444, thanks Andrea Micheli!)

  • Added bool embed argument to ParserElement.create_diagram(). When passed as True, the resulting diagram will omit the <DOCTYPE>, <HEAD>, and <BODY> tags so that it can be embedded in other HTML source. (Useful when embedding a call to create_diagram() in a PyScript HTML page.)

  • Added recurse argument to ParserElement.set_debug to set the debug flag on an expression and all of its sub-expressions. Requested by multimeric in Issue #399.

  • Added '·' (Unicode MIDDLE DOT) to the set of pp.unicode.Latin1.identbodychars.

  • Fixed bug in Word when max=2. Also added performance enhancement when specifying exact argument. Reported in issue #409 by panda-34, nice catch!

  • Word now validates that min <= max when both arguments are given, and raises ValueError if they are not.

  • Fixed bug in srange, when parsing escaped '/' and '\' inside a range set.

  • Fixed exception messages for some ParserElements with custom names, which previously showed the names of their contained expressions instead of the custom name.

  • Fixed bug in pyparsing.common.url, when input URL is not alone on an input line. Fixes Issue #459, reported by David Kennedy.

  • Added and corrected multiple type annotations. With much help from Stephen Rosen, thanks!

  • Some documentation and error message clarifications on pyparsing's keyword logic, cited by Basil Peace.

  • General docstring cleanup for Sphinx doc generation, PRs submitted by Devin J. Pohly. A dirty job, but someone has to do it - much appreciated!

  • invRegex.py example renamed to inv_regex.py and updated to PEP-8 variable and method naming. PR submitted by Ross J. Duff, thanks!

  • Removed examples sparser.py and pymicko.py, since each included its own GPL license in the header. Since this conflicts with pyparsing's MIT license, they were removed from the distribution to avoid confusion among those making use of them in their own projects.