Add a custom :contains-regexp() pseudo class? #117

Open
facelessuser opened this issue Feb 22, 2019 · 5 comments
Labels
C: css-custom CSS custom selectors. P: maybe Pending approval of low priority request. T: feature Feature.

Comments

@facelessuser
Owner

This is currently open as an exploratory idea: a custom pseudo-class that would allow regular expression searches of content. The idea would probably not be to include regular expressions directly in the selector pattern, but most likely references to compiled patterns:

import re
import soupsieve as sv

pattern = re.compile(r'some .*? pattern')
regexp = {'content_pattern': pattern}
sv.compile('p:-regex(content_pattern)', regexp=regexp)  # proposed syntax, not an existing API

Do we make this like :contains() and have it search the text of p and all of its descendants for the pattern, or do we constrain it to the text of the target element p itself? Or do we have two variants, one that searches all descendants and one that searches only the target: :-regexp() and :-regexp-direct() (or some other name that gets the idea across)?

Anyways this is just an idea, but maybe in the future (if we flesh this out enough), we can implement this.
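
To make the two variants concrete, here is a rough sketch that emulates them today by filtering the results of a normal selector. The helper names `select_regexp` and `select_regexp_direct` are made up for illustration and are not part of soupsieve; only `sv.select` and the Beautiful Soup calls are real.

```python
import re

import soupsieve as sv
from bs4 import BeautifulSoup, NavigableString

HTML = """
<div>
  <p id="a">some nice pattern</p>
  <p id="b">nothing here <span>some nested pattern</span></p>
  <p id="c">no match at all</p>
</div>
"""

pattern = re.compile(r'some .*? pattern')


def select_regexp(select, pattern, soup):
    """Hypothetical helper: search the element's text including descendants."""
    return [el for el in sv.select(select, soup) if pattern.search(el.get_text())]


def select_regexp_direct(select, pattern, soup):
    """Hypothetical helper: search only text nodes that are direct children."""
    matches = []
    for el in sv.select(select, soup):
        own_text = ''.join(c for c in el.children if isinstance(c, NavigableString))
        if pattern.search(own_text):
            matches.append(el)
    return matches


soup = BeautifulSoup(HTML, 'html.parser')
all_ids = [el['id'] for el in select_regexp('p', pattern, soup)]
direct_ids = [el['id'] for el in select_regexp_direct('p', pattern, soup)]
```

With this input, the descendant variant picks up both `a` and `b`, while the direct variant only picks up `a`, since the matching text in `b` lives inside the nested span.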

@facelessuser facelessuser added T: feature Feature. selectors C: css-custom CSS custom selectors. P: maybe Pending approval of low priority request. labels Feb 22, 2019
@facelessuser
Owner Author

It's important to note that Beautiful Soup already provides regex support, so we don't strictly need this, but it might be nice to incorporate regex into selectors in some way as well. We just need to decide whether we are willing to pay the cost of committing to a solution, and what that solution should look like.
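
For reference, this is roughly what the existing Beautiful Soup route looks like. The markup is invented for the example; `find_all` with a compiled pattern for `string` or an attribute value is standard Beautiful Soup behavior, and the pattern is applied as a search rather than a full match.

```python
import re

from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p data-item="123">test-abc</p><p data-item="999">other</p>',
    'html.parser',
)

# Tags whose single string child matches the pattern.
by_text = soup.find_all('p', string=re.compile(r'test-[a-z\d]+', re.I))

# Tags whose attribute value matches the pattern.
by_attr = soup.find_all('p', attrs={'data-item': re.compile(r'1[0-9]{2}')})
```

What this cannot do is combine the regex test with combinators and the rest of a selector in a single expression, which is the gap the proposal is about.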

@facelessuser
Owner Author

If we do this, a name like :contains-regexp() might be more descriptive and make more sense.

When defining regex keywords, should we require them to be in the form of custom CSS variables: --regex-key? As far as I know, we will never really have a need for regex variables in our scheme. Maybe we should require some other kind of variable prefix, such as $key 🤷‍♂️.

Or maybe we could extend custom? If you give it a regex pattern instead of a selector string, it searches a tag's content? Just some ideas.

@facelessuser
Owner Author

Thinking about this more, we really could use custom selectors to do regex. Currently we take a string for a given custom pseudo-class, but we could accept a custom pseudo-class object as well. The object could take a selector and a text search value (a regex or a string). You could even extend it to allow attribute values as well.

So just thinking out loud here, assuming the custom pseudo-class object is hashable:

import soupsieve as sv
import re

custom = {
    ':--custom-pseudo': sv.CustomPseudo(
        'p.class',
        text=re.compile(r'test-[a-z\d]+', re.I),
        attr={'data-item': re.compile(r'1[0-9]{2}')}
    )
}

sv.compile('article div > :--custom-pseudo', custom=custom)
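
As a rough illustration of what such an object could look like, here is a sketch of a hashable stand-in. `CustomPseudo` and its `matches` method are hypothetical and do not exist in soupsieve; only `sv.match` and `sv.select` are real calls. One detail it surfaces: the attribute patterns are stored as a tuple of pairs, because a plain dict would make the object unhashable.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

import soupsieve as sv
from bs4 import BeautifulSoup


@dataclass(frozen=True)
class CustomPseudo:
    """Hypothetical stand-in; soupsieve does not ship this class."""

    selector: str
    text: Optional[re.Pattern] = None
    # Tuple of (attribute name, pattern) pairs, since a dict is not hashable.
    attr: Tuple[Tuple[str, re.Pattern], ...] = ()

    def matches(self, el):
        """Return True if the element passes the selector and regex checks."""
        if not sv.match(self.selector, el):
            return False
        if self.text is not None and not self.text.search(el.get_text()):
            return False
        return all(
            isinstance(el.get(name), str) and pattern.search(el.get(name))
            for name, pattern in self.attr
        )


pseudo = CustomPseudo(
    'p.class',
    text=re.compile(r'test-[a-z\d]+', re.I),
    attr=(('data-item', re.compile(r'1[0-9]{2}')),),
)

soup = BeautifulSoup(
    '<article><div>'
    '<p class="class" data-item="150">Test-A1</p>'
    '<p class="class" data-item="42">test-b2</p>'
    '</div></article>',
    'html.parser',
)

# Emulate 'article div > :--custom-pseudo' by filtering the child candidates.
matched = [el for el in sv.select('article div > *', soup) if pseudo.matches(el)]
```

Because the dataclass is frozen and every field is hashable, an instance could be used as part of a cache key, which is the property the idea above depends on.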

It may even be possible to allow a custom function, but I'm not sure yet. As long as things remain hashable and pickle-able, it would be doable, but I imagine it may not always behave properly when a function is sent in, because the compiled patterns get cached. Caching a pattern with a function does not guarantee you'd get the same behavior... I think I'd pass on functions for now.
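
A small standard-library sketch of both concerns. `compile_selector` is a made-up stand-in for a cached compile step, not soupsieve's actual cache; the point is only that a function hashes by identity, so a cache cannot see when its behavior changes.

```python
import pickle
import re
from functools import lru_cache

# A compiled pattern survives a pickle round trip.
pattern = re.compile(r'test-[a-z\d]+', re.I)
restored = pickle.loads(pickle.dumps(pattern))
pattern_ok = (restored.pattern, restored.flags) == (pattern.pattern, pattern.flags)

# A lambda is hashable, but it cannot be pickled.
try:
    pickle.dumps(lambda text: 'test' in text)
    lambda_pickles = True
except (pickle.PicklingError, AttributeError):
    lambda_pickles = False


@lru_cache(maxsize=None)
def compile_selector(selector, matcher):
    """Hypothetical stand-in for a cached compile step."""
    return (selector, matcher)


limits = {'min_length': 3}


def long_enough(text):
    # Behavior depends on mutable state that the cache key cannot see.
    return len(text) >= limits['min_length']


first = compile_selector('p', long_enough)
before = first[1]('abcd')

limits['min_length'] = 10
second = compile_selector('p', long_enough)  # cache hit: same object as first
after = second[1]('abcd')
```

The second call returns the cached object, yet the same input now gives a different answer, which is the kind of surprise that makes passing functions risky.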

@facelessuser
Owner Author

Another possibility is to extend :contains() and the attribute equality case ([attr=value]) to accept custom template variables: $var.

You would define regular expressions with custom variable names, which could be any valid identifier with a $ prefix.

import re
import soupsieve as sv

regexp = {
    'content-pattern': re.compile(r'test-[a-z\d]+', re.I),
    'attr-pattern': re.compile(r'1[0-9]{2}')
}

sv.compile('p:contains($content-pattern)[data-item=$attr-pattern]', regexp=regexp)

Maybe this is the most straightforward approach? If nothing else, it is another option. Custom patterns may still need a way to provide regex when defining them.
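
A minimal sketch of how the $var references might be resolved against the regexp mapping before matching. `resolve_variables` and the token shape it accepts are assumptions made for illustration; a real implementation would live in the selector parser and would need to ignore `$` inside quoted strings.

```python
import re

# Assumed token shape: "$" followed by an identifier that may contain hyphens.
VAR_TOKEN = re.compile(r'\$([A-Za-z_][\w-]*)')


def resolve_variables(selector, regexp):
    """Map each $name found in the selector to its compiled pattern.

    Naive scan: it does not skip "$" that appears inside quoted strings.
    """
    names = VAR_TOKEN.findall(selector)
    missing = [name for name in names if name not in regexp]
    if missing:
        raise KeyError('undefined regex variables: ' + ', '.join(missing))
    return {name: regexp[name] for name in names}


regexp = {
    'content-pattern': re.compile(r'test-[a-z\d]+', re.I),
    'attr-pattern': re.compile(r'1[0-9]{2}'),
}

selector = 'p:contains($content-pattern)[data-item=$attr-pattern]'
resolved = resolve_variables(selector, regexp)
```

Failing early on an undefined name keeps the error at compile time rather than at match time.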

@facelessuser facelessuser changed the title from "Add a custom :-regex() pseudo class?" to "Add a custom :contains-regexp() pseudo class?" Mar 7, 2019
@gir-bot gir-bot removed the selectors label Nov 1, 2019
@facelessuser
Owner Author

If we end up doing #175, this would not be needed.
