Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue #6323: add SpiderLoggerAdapter, change Spider.logger to return SpiderLoggerAdapter #6324

Merged
merged 7 commits into from
May 13, 2024
5 changes: 3 additions & 2 deletions scrapy/spiders/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@

from scrapy import signals
from scrapy.http import Request, Response
from scrapy.utils.spider_logger_adapter import SpiderLoggerAdapter
from scrapy.utils.trackref import object_ref
from scrapy.utils.url import url_is_from_spider

Expand Down Expand Up @@ -42,9 +43,9 @@ def __init__(self, name: Optional[str] = None, **kwargs: Any):
self.start_urls: List[str] = []

@property
def logger(self) -> logging.LoggerAdapter:
def logger(self) -> SpiderLoggerAdapter:
logger = logging.getLogger(self.name)
return logging.LoggerAdapter(logger, {"spider": self})
return SpiderLoggerAdapter(logger, {"spider": self})
bloodforcream marked this conversation as resolved.
Show resolved Hide resolved

def log(self, message: Any, level: int = logging.DEBUG, **kw: Any) -> None:
"""Log the given message at the given log level
Expand Down
15 changes: 15 additions & 0 deletions scrapy/utils/spider_logger_adapter.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should go into scrapy.utils.log

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. That's where I tried to put it first but I get a circular import error.

..\scrapy\__init__.py:15: in <module>
    from scrapy.spiders import Spider
..\scrapy\spiders\__init__.py:16: in <module>
    from scrapy.utils.log import SpiderLoggerAdapter
..\scrapy\utils\log.py:13: in <module>
    from scrapy.settings import Settings
..\scrapy\settings\__init__.py:23: in <module>
    from scrapy.settings import default_settings
..\scrapy\settings\default_settings.py:322: in <module>
    USER_AGENT = f'Scrapy/{import_module("scrapy").__version__} (+https://scrapy.org)'
E   AttributeError: partially initialized module 'scrapy' has no attribute '__version__' (most likely due to a circular import)

Maybe i'm wrong and it's solvable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't like having a separate module for this, I see these options:

  1. Move it to scrapy.spiders (probably no technical downsides as it's not useful anywhere else?).
  2. Remove the SpiderLoggerAdapter import from the scrapy.spiders top-level, moving it under if TYPE_CHECKING and inside def logger().
  3. (likely a worse idea) Change some other imports in the loop.

I would go with 2 in general but probably with 1 in this case.

@Gallaecio second opinion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done the second option

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest importing SpiderLoggerAdapter on every single log call looks a bit inefficient, imho.

But it's obviously up to you guys

Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
import logging
from typing import Any, MutableMapping, Tuple


class SpiderLoggerAdapter(logging.LoggerAdapter):
def process(
self, msg: str, kwargs: MutableMapping[str, Any]
) -> Tuple[str, MutableMapping[str, Any]]:
"""Method that augments logging with additional 'extra' data"""
if isinstance(kwargs.get("extra"), MutableMapping):
kwargs["extra"].update(self.extra)
else:
kwargs["extra"] = self.extra

return msg, kwargs
41 changes: 41 additions & 0 deletions tests/test_spider_logger_adapter.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
import logging
from typing import Any, Dict, Mapping, MutableMapping

import pytest

from scrapy.utils.spider_logger_adapter import SpiderLoggerAdapter


@pytest.mark.parametrize(
("base_extra", "log_extra", "expected_extra"),
(
(
{"spider": "test"},
{"extra": {"log_extra": "info"}},
{"extra": {"log_extra": "info", "spider": "test"}},
),
(
{"spider": "test"},
{"extra": None},
{"extra": {"spider": "test"}},
),
(
{"spider": "test"},
{"extra": {"spider": "test2"}},
{"extra": {"spider": "test"}},
),
),
)
def test_spider_logger_adapter_process(
base_extra: Mapping[str, Any], log_extra: MutableMapping, expected_extra: Dict
):
logger = logging.getLogger("test")
spider_logger_adapter = SpiderLoggerAdapter(logger, base_extra)

log_message = "test_log_message"
result_message, result_kwargs = spider_logger_adapter.process(
log_message, log_extra
)

assert result_message == log_message
assert result_kwargs == expected_extra