Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate simplifying static checks for initialized Crawler #6136

Open
wRAR opened this issue Nov 5, 2023 · 7 comments
Open

Investigate simplifying static checks for initialized Crawler #6136

wRAR opened this issue Nov 5, 2023 · 7 comments

Comments

@wRAR
Copy link
Member

wRAR commented Nov 5, 2023

After we changed the Crawler logic to initialize components such as Crawler.stats later, static checkers (specifically mypy) don't know that when code like self.crawler.stats.inc_value() is called, self.crawler.stats is already initialized, so we need to add asserts in all methods that do this. We should try to find a way to do the same with fewer asserts. This also applies to 3rd-party code (middlewares etc.).

@antarahebbar
Copy link

Hi! @yueric985 and I are students are we are working on a project for a class that requires us to make an open-source contribution to a library. We will be working on this issue as an example.

@sa2415
Copy link
Contributor

sa2415 commented Dec 5, 2023

Hi! Can I work on this?
"We should try to find a way to do the same with fewer asserts." --> Does this mean we'd need to reduce the assert checks in certain places or keep the assertion checks as is and simplify the way these checks are done?

This was referenced Dec 5, 2023
@wRAR
Copy link
Member Author

wRAR commented Dec 5, 2023

Hi! Can I work on this?

Sure.

Does this mean we'd need to reduce the assert checks in certain places or keep the assertion checks as is and simplify the way these checks are done?

This means doing fewer asserts.

@derekk10
Copy link

Question about the issue. I understand that several components of Crawler are initialized later. However, is it ensured that these components such as self.crawler.stats is always initialized before calling a method such as self.crawler.stats.inc_value()? If so, would doing something like self.stats = cast(StatsCollector, self.stats) be a valid solution? This would tell mypy to treat self.stats as an ainstance of StatsCollector for the rest of the method so we wouldn't have to use an assert statement before accessing self.stats.

@wRAR
Copy link
Member Author

wRAR commented Dec 11, 2023

would doing something like self.stats = cast(StatsCollector, self.stats) be a valid solution?

I don't think this code does anything?

Also, where would you put it?

This would tell mypy to treat self.stats as an ainstance of StatsCollector for the rest of the method

Oh, you mean in each method? That's the same amount of code as an assert.

@derekk10
Copy link

derekk10 commented Dec 12, 2023

Yes, I see the issue. I have another proposed solution to use a proxy object for crawler.stats that initializes the actual stats object the first time it is accessed. This way, we would not need to use assrts or if statements to check if stats is initialized.
It might look something like this:

`class StatsProxy:
def init(self, crawler):
self._crawler = crawler
self._stats = None

def __getattr__(self, name):
    if self._stats is None:
        self._stats = self._crawler.initialize_stats()
    return getattr(self._stats, name)

class Crawler:
def init(self):
self.stats = StatsProxy(self)

def initialize_stats(self):
    #init logic
    return Stats()`

StatsProxy intercepts all attribute accesses to crawler.stats and initializes stats the first time it's accessed. This way we can be sure that stats is always initialized before use without using asserts. I have a sample initialize_stats() method that I haven't filled out. Please let me know if this make sense and I'd appreciate feedback and your thoughts on this solution.

@wRAR
Copy link
Member Author

wRAR commented Dec 19, 2023

initializes the actual stats object the first time it is accessed

That's the problem, it shouldn't be initialized on access, it's initialized by a separate method and the question of this ticket is how can we tell the checkers that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants