Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subclasses of pathlib.PurePosixPath never call __init__ or __new__ #85281

Closed
conchylicultor mannequin opened this issue Jun 25, 2020 · 14 comments
Closed

subclasses of pathlib.PurePosixPath never call __init__ or __new__ #85281

conchylicultor mannequin opened this issue Jun 25, 2020 · 14 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir topic-pathlib type-bug An unexpected behavior, bug, or error

Comments

@conchylicultor
Copy link
Mannequin

conchylicultor mannequin commented Jun 25, 2020

BPO 41109
Nosy @pitrou, @websurfer5, @Conchylicultor, @Louis-Vincent
PRs
  • bpo-41109: subclasses of pathlib.Path and pathlib.PurePath now call the subclass's __init__() and __new__() functions when returning new objects #21920
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-06-25.02:51:23.231>
    labels = ['type-bug', '3.8', '3.9', '3.10', '3.7', 'library']
    title = 'subclasses of pathlib.PurePosixPath never call __init__ or __new__'
    updated_at = <Date 2020-08-19.06:37:57.142>
    user = 'https://github.com/conchylicultor'

    bugs.python.org fields:

    activity = <Date 2020-08-19.06:37:57.142>
    actor = 'Jeffrey.Kintscher'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2020-06-25.02:51:23.231>
    creator = 'conchylicultor'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 41109
    keywords = ['patch']
    message_count = 12.0
    messages = ['372297', '372298', '375030', '375031', '375041', '375047', '375162', '375332', '375333', '375334', '375364', '375638']
    nosy_count = 4.0
    nosy_names = ['pitrou', 'Jeffrey.Kintscher', 'conchylicultor', 'louis-vincent.boudre']
    pr_nums = ['21920']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue41109'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10']

    @conchylicultor
    Copy link
    Mannequin Author

    conchylicultor mannequin commented Jun 25, 2020

    I have a subclass GithubPath of PurePosixPath.

    class GithubPath(pathlib.PurePosixPath):
    
      def __new__(cls, *args, **kwargs):
        print('New')
        return super().__new__(cls, *args, **kwargs)
    
      def __init__(self, *args, **kwargs):
        print('Init')
        super().__init__()
    

    Calling child.parent create a new GithubPath but without ever calling new nor init. So my subclass is never notified it is created.

    p = GithubPath()  # Print "New", "Init"
    
    p.parent  # Create a new GithubPath but bypass the constructors
    

    The reason seems to be that parent calls _from_parts which create a new object through object.__new__(cls):

    def _from_parts(cls, args, init=True):

    A hack is to subclass _init but it seems hacky as it relies on internal implementation detail.

    @conchylicultor conchylicultor mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes type-bug An unexpected behavior, bug, or error labels Jun 25, 2020
    @conchylicultor
    Copy link
    Mannequin Author

    conchylicultor mannequin commented Jun 25, 2020

    Note that this likely affect all methods which returns new Path by calling _from_parts or _from_parsed_parts, like .absolute, .resolve,...

    @websurfer5
    Copy link
    Mannequin

    websurfer5 mannequin commented Aug 8, 2020

    The workaround is to override _init(), which I agree is not desirable.

    This is the relevant code in PurePath, which is the super class of PurePosixPath:

        @classmethod
        def _from_parsed_parts(cls, drv, root, parts, init=True):
            self = object.__new__(cls)
            self._drv = drv
            self._root = root
            self._parts = parts
            if init:
                self._init()
            return self

    ...

        def _init(self):
            # Overridden in concrete Path
            pass

    To me, the clean way to get the desired behavior seems like it would be to have _init() call self.__init__().

        def _init(self):
            # Overridden in concrete Path
            self.__init__()

    This fixes p.parent, but GithubPath() ends up calling GithubPath.__init__() twice – the first time by PurePath.__new__() calling PurePath._init() and the second time by the GithubPath object creation.

    @websurfer5
    Copy link
    Mannequin

    websurfer5 mannequin commented Aug 8, 2020

    Clarification:

    PurePath.__new__() calls PurePath._from_parts(), which then calls PurePath._init()

    @conchylicultor
    Copy link
    Mannequin Author

    conchylicultor mannequin commented Aug 8, 2020

    Before solving this issue, I think it would be best to think on a more generic solution on how to make Pathlib more extensible. Related to: https://discuss.python.org/t/make-pathlib-extensible/3428

    For instance, if childs created with p.parent(), p / 'subdir' need to forward some state (e.g. RemotePath(path, password=, user=)).

    Rather than __init__, maybe there should be some __post_init__ like dataclasses.

    @Louis-Vincent
    Copy link
    Mannequin

    Louis-Vincent mannequin commented Aug 8, 2020

    Path.new should not call _from_parts because it breaks the specialization by directly using object.new.
    This is why _from_parts has to be duplicated in each subclass's __new__.

    My suggestion (first draft) is to do the parsing of arguments inside an __init__ in the Path class hierarchy
    and deprecate _from_parts.

    class PurePath:
       
         def __new__(cls, *args):
             if cls is PurePath:
                 cls = PureWindowsPath if os.name == 'nt' else PurePosixPath
             super().__new__(cls, *args) # Here we remove call to from_parts
    
         def __init__(self, *args):
             # We should have an __init__ in the hierarchy.
             drv, root, parts = self._parse_args(args)  # this would get the proper _flavour.
             self._drv = drv
             self._root = root
             self._parts = parts
    
         ...
    
    class Path(PurePath):
         def __new__(cls, *args, **kwargs):
            if cls is Path:
                cls = WindowsPath if os.name == 'nt' else PosixPath
    
            # REMOVE THIS LINE: self = cls._from_parts(args, init=False) #
    
            if not self._flavour.is_supported:
                raise NotImplementedError("cannot instantiate %r on your system"
                                          % (cls.__name__,))
            return super().__new__(cls, *args, **kwargs) # Use super
    
    

    I don't know what is the purpose of _init and if it could be replaced by an extra keyword argument in init.

    The class method _from_parts would become deprecated since the argument parsing would be now handled by __init__.

    By using init and avoid the direct call to object.__new__ we would respect class specialization and custom class could be implemented intuitively.

    @websurfer5
    Copy link
    Mannequin

    websurfer5 mannequin commented Aug 11, 2020

    The purpose of the _init() function in PurePath is to allow PurePath methods to call the Path subclass override _init() function when initializing a Path object.

    @websurfer5
    Copy link
    Mannequin

    websurfer5 mannequin commented Aug 13, 2020

    Adding __init__() to PurePath complicates things and doesn't provide any benefit. A subclass that calls super.__init__() ends up invoking object.__init__(), which is perfectly fine.

    I was able to find a solution by calling type(self)() instead of object.__new__() in most cases. I am working on a PR.

    @Louis-Vincent
    Copy link
    Mannequin

    Louis-Vincent mannequin commented Aug 13, 2020

    This is not true, because the classmethod use the library shortcuts the
    class mro order, to prevent infinite loop inthe __new__. However, it was
    using __init__ before hand, we would
    Not have this issue

    Le jeu. 13 août 2020 à 15:27, Jeffrey Kintscher <report@bugs.python.org> a
    écrit :

    Jeffrey Kintscher <websurfer@surf2c.net> added the comment:

    Adding __init__() to PurePath complicates things and doesn't provide any
    benefit. A subclass that calls super.__init__() ends up invoking
    object.__init__(), which is perfectly fine.

    I was able to find a solution by calling type(self)() instead of
    object.__new__() in most cases. I am working on a PR.

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue41109\>


    @Louis-Vincent
    Copy link
    Mannequin

    Louis-Vincent mannequin commented Aug 13, 2020

    This is not true, because the classmethod used the library shortcuts the
    class mro order, this is to prevent infinite loop inthe __new__. However,
    If it was using __init__ before hand, we would
    Not have this issue

    Le jeu. 13 août 2020 à 15:31, Louis-Vincent Boudreault <
    lv.boudreault95@gmail.com> a écrit :

    This is not true, because the classmethod use the library shortcuts the
    class mro order, to prevent infinite loop inthe __new__. However, it was
    using __init__ before hand, we would
    Not have this issue

    Le jeu. 13 août 2020 à 15:27, Jeffrey Kintscher <report@bugs.python.org>
    a écrit :

    >
    > Jeffrey Kintscher <websurfer@surf2c.net> added the comment:
    >
    > Adding __init__() to PurePath complicates things and doesn't provide any
    > benefit. A subclass that calls super.__init__() ends up invoking
    > object.__init__(), which is perfectly fine.
    >
    > I was able to find a solution by calling type(self)() instead of
    > object.__new__() in most cases. I am working on a PR.
    >
    > ----------
    >
    > _______________________________________
    > Python tracker <report@bugs.python.org>
    > <https://bugs.python.org/issue41109\>
    > _______________________________________
    >

    @websurfer5
    Copy link
    Mannequin

    websurfer5 mannequin commented Aug 14, 2020

    The current implementation calls object.__new__(cls), where cls is the child class type, from within a class method (@classmethod). This is fine for Path.__new__() and PurePath.__new__(), which are called by the child class's __new__(), because we don't want them to recursively call the child class's __new__() when the child class is created. This all works as expected when the child class is instantiated outside of Path and PurePath, and the child's __init__() gets called as expected. I don't see any point in making changes to this behavior.

    When one of approximately 20 PurePath and Path functions and properties instantiate a new child class object the same way PurePath.__new__() and Path.__new__() do, the child class's __new__() and __init__() functions are not called. This is the problem we are trying to solve.

    My fix is to add normal functions (i.e. not decorated with @classmethod) to instantiate child class objects using

    obj = type(self)()

    This creates a child class instance, and the child's __new__() and __init__() functions get called.

    Once I have finished re-plumbing Path and PurePath to use the new functions and created the necessary unit tests (to make sure I didn't break anything), I will also look at adding
    a proper __init__() function to the two classes instead of having __new__() initialize the member variables. I didn't mean to imply that __init__() isn't useful. It is required to allow the child class to initialize its own variable. I just meant it isn't required to force calling __init__() and __new__() in the child class.

    @websurfer5
    Copy link
    Mannequin

    websurfer5 mannequin commented Aug 19, 2020

    I created a PR that should provide the desired behavior: __init__() and __new__() get called in subclass objects that are created by Path and PurePath. Also, Path and PurePath now have __init__() functions, and the __new__() functions only return new objects and rely upon __init__() to perform the object initialization.

    @websurfer5 websurfer5 mannequin added stdlib Python modules in the Lib dir labels Aug 19, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @barneygale
    Copy link
    Contributor

    I believe #100481 would address this.

    barneygale added a commit to barneygale/cpython that referenced this issue Feb 4, 2023
    `PurePath` now normalises and splits paths only when necessary, e.g. when
    `.name` or `.parent` is accessed. The result is cached. This speeds up path
    object construction by around 4x.
    
    `PurePath.__fspath__()` now returns an unnormalised path, which should be
    transparent to filesystem APIs (else pathlib's normalisation is broken!).
    This extends the earlier performance improvement to most impure `Path`
    methods, and also speeds up pickling, `p.joinpath('bar')` and `p / 'bar'`.
    
    This also fixes pythonGH-76846 and pythonGH-85281 by unifying path constructors and
    adding an `__init__()` method.
    barneygale added a commit that referenced this issue Apr 3, 2023
    …lasses (GH-102789)
    
    Fix an issue where `__new__()` and `__init__()` were not called on subclasses of `pathlib.PurePath` and `Path` in some circumstances.
    
    Paths are now normalized on-demand. This speeds up path construction, `p.joinpath(q)`, and `p / q`.
    
    Co-authored-by: Steve Dower <steve.dower@microsoft.com>
    @barneygale
    Copy link
    Contributor

    Fixed in Python 3.12 / #102789 / 11c3020

    gaogaotiantian pushed a commit to gaogaotiantian/cpython that referenced this issue Apr 8, 2023
    …pathlib subclasses (pythonGH-102789)
    
    Fix an issue where `__new__()` and `__init__()` were not called on subclasses of `pathlib.PurePath` and `Path` in some circumstances.
    
    Paths are now normalized on-demand. This speeds up path construction, `p.joinpath(q)`, and `p / q`.
    
    Co-authored-by: Steve Dower <steve.dower@microsoft.com>
    warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
    …pathlib subclasses (pythonGH-102789)
    
    Fix an issue where `__new__()` and `__init__()` were not called on subclasses of `pathlib.PurePath` and `Path` in some circumstances.
    
    Paths are now normalized on-demand. This speeds up path construction, `p.joinpath(q)`, and `p / q`.
    
    Co-authored-by: Steve Dower <steve.dower@microsoft.com>
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir topic-pathlib type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants