Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid imports inside functions #60

Merged
merged 12 commits into from Mar 18, 2022
Merged

Avoid imports inside functions #60

merged 12 commits into from Mar 18, 2022

Conversation

elacuesta
Copy link
Member

@elacuesta elacuesta commented Mar 12, 2022

Related to #59 (not sure if it fixes it completely)

Tasks:

  • Restore test coverage (the solution I found is a bit hacky: blocking some imports, also deleting them from sys.modules if they were imported already)
  • Remove additional non top-level imports in itemadapter.adapter

@codecov
Copy link

codecov bot commented Mar 12, 2022

Codecov Report

Merging #60 (fdf1339) into master (817054d) will increase coverage by 0.37%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           master       #60      +/-   ##
===========================================
+ Coverage   99.62%   100.00%   +0.37%     
===========================================
  Files           3         4       +1     
  Lines         266       285      +19     
===========================================
+ Hits          265       285      +20     
+ Misses          1         0       -1     
Impacted Files Coverage Δ
itemadapter/_imports.py 100.00% <100.00%> (ø)
itemadapter/adapter.py 100.00% <100.00%> (ø)
itemadapter/utils.py 100.00% <100.00%> (+1.53%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 817054d...fdf1339. Read the comment docs.

@elacuesta elacuesta marked this pull request as ready for review March 13, 2022 17:48
@Gallaecio
Copy link
Member

(did not check the performance change, though)

@Gallaecio
Copy link
Member

Performance-wise, it is 16.5 times slower than it used to be in 3bc56f2 (the commit I happen to have in the main branch of my fork), but 29 times faster than it currently is.

The speed up is amazing, although I wonder whether or not it warrants closing #59, given the performance difference with 3bc56f2 (which is probably explained by support for new item types, I did not check what that commit supported and did not support).

from timeit import timeit

from itemadapter import is_item


def a():
    return is_item({})

print(timeit(a, number=1000000))

3bc56f2: 0.08102401599990117
f985219: 1.3361113370001476
817054d: 38.81687346000035

@elacuesta
Copy link
Member Author

That's awesome, thanks!

I just pushed eee23cf, which improves performance greatly by removing unnecessary duplicated calls to get Scrapy item classes, that was also being done on each item check.

Two additional things that are improving performance for me:

  • Use ItemAdapter.is_item instead of itemadapter.is_item as the former actually imports the latter upon being called
  • One thing I found when comparing master with the commit you shared is that we used to check for dict first, so in this particular case it works faster if you put the dict adapter first: ItemAdapter.ADAPTER_CLASSES = deque([DictAdapter, ScrapyItemAdapter]). I probably changed that to optimize for Scrapy items over dicts, thinking that they were used more often; we could of course revisit that decision.

@elacuesta
Copy link
Member Author

elacuesta commented Mar 17, 2022

Re fdf1339, seems to me like the extra access to the __class__ attribute slows things down a little

@elacuesta elacuesta merged commit 6d937d7 into scrapy:master Mar 18, 2022
@elacuesta elacuesta deleted the avoid-imports-in-predicates branch March 18, 2022 19:48
@elacuesta elacuesta mentioned this pull request Mar 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants