`Path` cannot add or init integers, only strings and other Paths #146

MestreLion · 2021-10-21T05:43:35Z

Path.__add__() (and __radd__), as well as __init__(), accepts str and Path, but not int. Considering it can handle ints just fine in subscriptors, why not add support for + too?

Currently we have:

Path("a")["b"]["c"] == Path("a.b.c")
"a" + Path("b") + "c" == Path("a.b.c")
Path("a")[1]["c"] == Path("a[1].c")
Path("[0]")[1]["c"] == Path("[0][1].c")

It could also allow the following:

"a" + Path("[1]") + "b" + 2 == Path("a[1].b[2]")
0 + Path("b") + 1 == Path("[0].b[1]")
0 + Path()[1] + 2 == Path("[0][1][2]")
"a" + Path(1)["b"] + Path(3)[4] + 5 + "c" == Path("a[1].b[3][4][5].c")

I could send a PR if you approve the idea

The text was updated successfully, but these errors were encountered:

vberlier · 2021-10-23T04:08:10Z

The idea is that path objects and strings representing path objects should be interchangeable. When you instantiate or concatenate a path with a string, the string represents a full nbt path, not an individual key, the string actually gets parsed into a path object. But when you index into a path you extend it by an individual key. That's why you can use numbers too, the string you use to subscript is a key. Path + str and Path[str] are not equivalent.

>>> Path("a.b") + Path("c.d")
Path('a.b.c.d')
>>> Path("a.b") + "c.d"
Path('a.b.c.d')
>>> Path("a.b")["c.d"]
Path('a.b."c.d"')

To me it makes sense to keep the meaning clear. We can magically make it work like you suggested but it will lead to inconsistencies.

MestreLion · 2021-10-23T20:01:17Z

I really *love Path's design and features, it's undoubtedly one of the most amazing features of nbtlib. But I don't see by your example and explanation why wouldn't int fit that design or create any inconsistency.

If I understood correctly, the design principles you mentioned are:

Path(x) constructor takes a full NBT path x, represented by a string or Path.
path + x extends a path by concatenating it with another full NBT path x, represented by a string or Path.
path[x] indexing appends a single key (or index) x, represented by a string (or int)

Is that correct?

What about indexing using another full NBT path? Currently we have Path("a.b")[Path("c.d")] == Path("a.b.c.d"). But anyway, that's not my point.

My question is: what inconsistency would emerge from allowing an int, say 1, to be treated like "[1]"? Sure, an int is only able to represent a single component, and always an index (not a key). So what? That's a limitation of int, and Path shouldn't care. Just like any single component is accepted as a full path, be it "a", "[1]", or even "", so could an int.

My proposal is to handle integers the same way you handle "[x]", no more, no less. You can even convert 1 to "[1]" upfront in __new__, __add__ and __radd__, and let it be parsed as a string.

I believe such approach would not break the API in any way, nor create any inconsistency. The key here (no pun intended) is that an int is always an index

As an `int` can only represent a single index, we can use handle `P + int` the same way we handle `P[int]`: `Path(...) + 1 == Path(...)[1]` `1 + Path(...) == Path()[1][Path(...)]` An alternate implementation would be to convert `1` to `"[1]"` and let it parse as a string. Corresponding tests will be updated in the folowing commits. The case for `Path(int)` is independent of this and is not addressed in this commit.

This effectively treats an integer `x` as if it were the string `"[x]"`, allowing `Path(x)` and `x + Path(...) + x` in addition to current `Path(...)[x]`. There are several possible implementation approaches, including a straightforward `if isinstance(x, int): x = f"[{int(x)}]"` in __new__, __add__ and __radd__, then letting it parse as an actual string. However, not only that approach is very inneficient, as it calls the parser and tokenization for a known outcome, but also this was not the approach taken by __getitem__, currently the only method that can be used as reference on how to properly handle integers. It is also important be consistent with the curent _semantics_ of `Path(x)` and `Path + x`, as they're different from `Path[x]` for strings and Paths, so do not converge to using `Path[x]` for both. Even if for integers it ends up being the same result, as integers can only be a single index, do not rely on that and keep semantics distinct and consistent with the string code path. So in __new__, instead of resorting to a `return cls()[x]` shortcut, we call the same classmethod as str and Path do, `cls.from_accessors()`, passing the same argument as __getitem__ uses: a tuple with a single element `ListIndex(index=x)`, which is also, not coincidentaly, the same result of parsing `"[x]"`. As for __add__/__radd__, again we use the same semantics as used by str: `Path(...) + x == Path(...)[Path(x)]`, and `x + Path(...) == Path(x)[Path(...)]` And, as we just defined `Path(x)` for an integer x, we can simply extend the `isinstance(x, str)` check to include `int`, without creating a new case, thus enforcing consistent semantics.

This effectively treats an integer `x` as if it were the string `"[x]"`, allowing `Path(x)` and `x + Path(...) + x` in addition to current `Path(...)[x]`. There are several possible implementation approaches, including a straightforward `if isinstance(x, int): x = f"[{int(x)}]"` in `__new__`, `__add__` and `__radd__`, then letting it parse as an actual string. However, not only that approach is very inefficient, as it calls the parser and tokenization for a known outcome, but also this was not the approach taken by `__getitem__`, currently the only method that can be used as reference on how to properly handle integers. It is also important be consistent with the current _semantics_ of `Path(x)` and `Path + x`, as they're different from `Path[x]` for strings and Paths, so do not converge to using `Path[x]` for both. Even if for integers it ends up being the same result, as integers can only be a single index, do not rely on that and keep semantics distinct and consistent with the string code path. So in `__new__`, instead of resorting to a `return cls()[x]` shortcut, we call the same classmethod as str and Path do, `cls.from_accessors()`, passing the same argument as __getitem__ uses: a tuple with a single element `ListIndex(index=x)`, which is also, not coincidentally, the same result of parsing `"[x]"`. As for `__add__`/`__radd__`, again we use the same semantics as used by str: `Path(...) + x == Path(...)[Path(x)]`, and `x + Path(...) == Path(x)[Path(...)]` And, as we just defined `Path(x)` for an integer x, we can simply extend the `isinstance(x, str)` check to include `int`, without creating a new case, thus enforcing consistent semantics.

vberlier closed this as completed Oct 23, 2021

vberlier reopened this Nov 2, 2021

vberlier mentioned this issue Nov 2, 2021

nbtlib 2.0 #156

Open

9 tasks

MestreLion mentioned this issue Nov 2, 2021

Add Path.from_parts() or similar to allow tuple constructor like pathlib.Path #157

Open

vberlier closed this as completed in 061e095 Nov 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Path` cannot add or init integers, only strings and other Paths #146

`Path` cannot add or init integers, only strings and other Paths #146

MestreLion commented Oct 21, 2021

vberlier commented Oct 23, 2021

MestreLion commented Oct 23, 2021

Path cannot add or init integers, only strings and other Paths #146

Path cannot add or init integers, only strings and other Paths #146

Comments

MestreLion commented Oct 21, 2021

vberlier commented Oct 23, 2021

MestreLion commented Oct 23, 2021

`Path` cannot add or init integers, only strings and other Paths #146

`Path` cannot add or init integers, only strings and other Paths #146