Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #249 - Add strip to get() and getall() #260

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

felipeboffnunes
Copy link
Member

@felipeboffnunes felipeboffnunes commented Oct 28, 2022

Resolves #249

@codecov
Copy link

codecov bot commented Oct 28, 2022

Codecov Report

Merging #260 (7ce8789) into master (1913fb7) will not change coverage.
The diff coverage is n/a.

❗ Current head 7ce8789 differs from pull request most recent head b044b76. Consider uploading reports for the commit b044b76 to get more accurate results

@@            Coverage Diff            @@
##            master      #260   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         4           
  Lines          138       138           
  Branches        29        29           
=========================================
  Hits           138       138           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Comment on lines 223 to 225
@typing.overload
def get(self, strip: bool) -> str:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? (I am not saying it is not)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is, I just saw the overloads above this line and thought it was required

@typing.overload
def get(self, default: None = None) -> Optional[str]:
    pass

@typing.overload
def get(self, default: str) -> str:
    pass

I removed it on the last push. I don't understand the whole overload completely, but I feel those 2 are also unnecessary?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is related to typing, when parameter types affect output type, but I do not think it applies here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removing it broke the pipeline for type checks.

No overload variant of "get" of "SelectorList" matches argument type "bool"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you added a new argument to the method you need to add them to all overloads too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And those two overloads are needed because the method either can return None if default wasn't passed or it returns the default instead, so this describes the typing info more clearly.

tests/test_selector.py Outdated Show resolved Hide resolved
@@ -200,12 +200,15 @@ def re_first(
return el
return default

def getall(self) -> List[str]:
def getall(self, strip: bool = False) -> List[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s make it keyword-only.

Suggested change
def getall(self, strip: bool = False) -> List[str]:
def getall(self, *, strip: bool = False) -> List[str]:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same on get.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we apply this to typing.overload of get as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried adding it to get but then this would require some type check on default entry to validate it is a str, either way the behavior looks fine without it for get

Comment on lines 223 to 225
@typing.overload
def get(self, strip: bool) -> str:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you added a new argument to the method you need to add them to all overloads too.

Comment on lines 223 to 225
@typing.overload
def get(self, strip: bool) -> str:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And those two overloads are needed because the method either can return None if default wasn't passed or it returns the default instead, so this describes the typing info more clearly.

"""
Call the ``.get()`` method for each element is this list and return
their results flattened, as a list of strings.
"""
return [x.get() for x in self]
data = [x.get() for x in self]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just pass strip to get() instead of essentially reimplementing the new code from get() here?
(granted, the code is almost trivial, but still)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x is a Selector, at that point neither default nor strip exist anymore
Only SelectorList recognizes those two fields

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About adding the strip argument to the other overloads, wouldn't that remove the purpose of the overload on those?
Wouldn't having an overloaded default + strip be just going back to the non-overloaded method signature?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x is a Selector, at that point neither default nor strip exist anymore

Aren’t you also adding strip to Selector? Can’t you use data = [x.get(strip=True) for x in self]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That ended up looking harder than it should when I tried.
Also as far as I could see, default also is not on the Selector, so for standardization, I believe adding it would also be required. And in that case I don't understand why it wasn't added up until now, it gives me the feeling it is not a trivial task given the current implementation of get on Selector.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are changing Selector.get() to have strip and it already has default so I think there shouldn't be any problems with using it, and it indeed doesn't look too complicated to me :)

Also if you have issues with typing I can hep, I'm working on improving parsel typing right now.

this_doesnt_work = sel.xpath("//ul/li[position()>1]")[0].get(default="6") # Selector doesnt have default field

sel.xpath(...).get(default=...) works in normal parsel without your changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About adding the strip argument to the other overloads, wouldn't that remove the purpose of the overload on those?

No, the purpose of those overloads is to show that the type of the return value depends on whether default is None or not.

Wouldn't having an overloaded default + strip be just going back to the non-overloaded method signature?

I believe just adding strip to the existing overloads is enough, no new overloads are needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sel.xpath(...).get(default=...) uses SelectorList, not Selector. The problem is that by indexing [0] then you go to a Selector, and it does not have default .
but either way I still have to give it a look

Also, it seems a little unintuitive to be using the [0] when doing a get operation, as you are already asking the SelectorList to get the first instance it found. I used it on my tests because I saw some of the tests use this kind of indexing to prove Selector behaves correctly, and in this case, I believe you should be able to do both default and strip directly on a Selector, but you can't for a Selector at the moment so maybe this needed further discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, you are right. Selector.get() is much simpler than SelectorList.get(). Well, I think it makes sense and is easy to add strip= to Selector.get().

it seems a little unintuitive to be using the [0] when doing a get operation, as you are already asking the SelectorList to get the first instance it found

Yeah, it's probably unusual to write <SelectorList>[0].get(). It's OK to have a Selector and call a get) on it though.

I believe you should be able to do both default and strip directly on a Selector, but you can't for a Selector at the moment so maybe this needed further discussion.

Not so sure about default=, but maybe there are use cases where it's useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep this as is, strip for Selector.get() is less useful and can be implemented separately.

parsel/selector.py Outdated Show resolved Hide resolved
@@ -200,30 +200,36 @@ def re_first(
return el
return default

def getall(self) -> List[str]:
def getall(self, *, strip: bool = False) -> List[str]:
"""
Call the ``.get()`` method for each element is this list and return
their results flattened, as a list of strings.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the new option needs to be mentioned in the docstring.

def get(self, default: Optional[str] = None) -> Optional[str]:
def get(
self, default: Optional[str] = None, strip: bool = False
) -> Optional[str]:
"""
Return the result of ``.get()`` for the first element in this list.
If the list is empty, return the default value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new option needs to be mentioned in the docstring

@kmike
Copy link
Member

kmike commented Nov 1, 2022

Hey! Is there a use case for this feature if #127 is merged, do you have some examples?

I'm asking because sel.get(text=True) would also be stripping text by default, so strip=True won't be useful with text argument. It may be also hard to support strip=False, text=True. And I wonder if there are real-world cases where you want strip=True, but don't want text=True.

@felipeboffnunes
Copy link
Member Author

@kmike By the time I saw #127 I had most of the bulk of the strip logic coded. I do0n't think there is a common use case for this PR compared to 127 besides maybe being a more straight forward pre processing step of the text where only one thing happens (strip).
Either way I felt that #127 has been around for a good while and not finished, so a simpler alternative was presented just to fix the issue #249
I guess we can close this one, mention #127 in #249 and try to close the features presented there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding a strip kwarg to get() and getall()
4 participants