IN PROGRESS Logging setup #842
base: master
Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Hi @sylwiamm!
Big thanks for the awesome work on switching us over to Python's logging module. It's a solid move, and your code is super clean and well thought out. Really, it's clear you've put in a ton of effort, and it shows :)
I’ve dropped a bunch of comments here and there, mostly nitpicks and minor fixes, because the core of your PR is pretty good. So, don’t let the number of comments psych you out; it’s all about polishing up something that’s already great.
So, onto the big chat – the logging setup. Right now, we're rolling with a single gandlf logger across the library. However, logging can be configured in two ways:
- using the root logger and basicConfig: simple and clean, though not very flexible, and against the "Python logging best practices" that Google turns up;
- using separate loggers and a pile of configuration: complex, non-obvious, and error-prone, but really flexible.
And there is no middle ground, thanks to logging.
What do you think about sticking with the root logger to keep things simple, or do you believe it's better to juggle multiple loggers for more control? I’m kinda torn, and your take would be super helpful.
It’s all about finding that sweet spot where our logging is easy to use but still does everything we need. Keen to hear your thoughts and figure out the best move forward.
Thanks again for your hard work – it’s making a real difference. Let’s nail down this logging thing together:)
    logger_dir (str): directory for the logs
    logger_name (str): name of the logger
    """
    logs_dir = f"{output_dir}/logs"
I'd propose using pathlib here so we aren't bothered by some weird cases:
- logs_dir = f"{output_dir}/logs"
+ logs_dir = Path(output_dir) / "logs"
Moreover, you're already using Path a few lines further down.
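For instance, here is a quick illustration of one of those weird cases (a trailing slash in output_dir), using only the standard library:

```python
from pathlib import Path

# a trailing slash produces a double separator with the f-string approach...
assert f"{'out/'}/logs" == "out//logs"
# ...while pathlib normalizes it away
assert Path("out/") / "logs" == Path("out") / "logs"
```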
@log_capture()
def test_generic_logging(capture):
    print("52: Testing logging setup")

    logger, logs_dir, logger_name = setup_logger(outputDir, verbose=True)

    assert os.path.isdir(logs_dir), "Directory for logs was not generated"
    assert (
        logger_name in logging.root.manager.loggerDict
    ), f"Logger with name {logger_name} was not created"

    logger.info("a message")
    logger.error("an error")

    capture.check(
        (logger_name, "INFO", "a message"), (logger_name, "ERROR", "an error")
    )

    sanitize_outputDir()

    print("passed")
As far as I can see, this test checks only that a logger was created with the appropriate name. However, you do have a bunch of other stuff in setup_logger: setting the warnings format, the file handler, verbosity, etc.
I understand they may be quite hard to test. If you have the resources and ideas for how to cover them with tests, that would be great. If not, still OK. For my part, I'll also look at how we can improve this.
    + "' could not be loaded due to the following exception: {}".format(
        type(exception).__name__
    )
    + "; message: {}".format(exception)
In some places you left prints; is that intentional?
    logger = logging.getLogger(params["logger_name"])
else:
    logger, params["logs_dir"], params["logger_name"] = setup_logger(
        output_dir=params["output_dir"], verbose=params.get("verbose", False)
    )
This code is duplicated in multiple places. Can we deduplicate or simplify it somehow? The only idea I have in mind is to move it to a separate function in utils, like
logger, params = get_or_setup_logger(params)
That's not elegant though, because without going inside it's hard to tell how much of the global params is changed... What do you think?
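A minimal sketch of that hypothetical helper. The setup_logger below is a stand-in that only mimics the PR's signature, purely to make the sketch runnable; the real one lives in the PR:

```python
import logging
from typing import Tuple


def setup_logger(output_dir: str, verbose: bool = False) -> Tuple[logging.Logger, str, str]:
    # stand-in for the PR's setup_logger, only to make this sketch self-contained
    logger_name = "gandlf"
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    return logger, f"{output_dir}/logs", logger_name


def get_or_setup_logger(params: dict) -> logging.Logger:
    # reuse the already-configured logger if params carries its name,
    # otherwise set one up and record its name/dir back into params
    if "logger_name" in params:
        return logging.getLogger(params["logger_name"])
    logger, params["logs_dir"], params["logger_name"] = setup_logger(
        output_dir=params["output_dir"], verbose=params.get("verbose", False)
    )
    return logger
```

The downside mentioned above still holds: the helper mutates params in place, which is not obvious from the call site.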
|===========================================================================| \n
| CPU Utilization | \n
| Load_Percent : {psutil.cpu_percent(interval=None)} | \n
| MemUtil_Percent: {psutil.virtual_memory()[2]} | \n
Are you sure we need the additional \n inside? Doesn't it create empty lines?
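A quick check of that hunch, with the psutil values hard-coded for illustration:

```python
# each row of the multi-line f-string already ends with a literal newline,
# so the explicit "\n" doubles it and renders as an empty line after each row
stats = f"""| CPU Utilization | \n
| Load_Percent : {42.0} | \n
"""
assert "\n\n" in stats  # blank line after every row
```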
logger.debug(
    "Full image " + mode + ":: Loss: ",
    final_loss,
    "; Metric: ",
    final_metric,
)
Suggested change:
logger.debug(f"Full image {mode}:: Loss: {final_loss}; Metric: {final_metric}")
Because, you know, logging calls aren't like print: they fail if you pass multiple values instead of printing them together (the extra arguments are treated as %-format parameters for the message).
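To illustrate with a standalone snippet (not GaNDLF code): extra positional arguments to a logging call are %-style format parameters, not concatenated like print does:

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("demo")

# print-style calls break: a message without matching %-placeholders
# makes the handler record a formatting error instead of the values
# log.debug("Loss: ", 0.42)  # -> "TypeError: not all arguments converted..."

# correct: lazy %-style formatting...
log.debug("Loss: %s; Metric: %s", 0.42, 0.99)
# ...or an eager f-string
log.debug(f"Loss: {0.42}; Metric: {0.99}")
```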
logger.debug(
    "=== Validation shapes : label:",
    label.shape,
    ", image:",
    image.shape,
)
Suggested change:
logger.debug(f"=== Validation shapes : label: {label.shape}, image: {image.shape}")
logger.debug(
    "=== Current patch:",
    current_patch,
    ", time : ",
    get_date_time(),
    ", location :",
    patches_batch[torchio.LOCATION],
)
Suggested change:
logger.debug(f"=== Current patch: {current_patch}, time : {get_date_time()}, location : {patches_batch[torchio.LOCATION]}")
Or maybe we don't even need the time here, as it is already included in the logging format.
@@ -115,8 +122,7 @@ def validate_network(
     for batch_idx, (subject) in enumerate(
         tqdm(valid_dataloader, desc="Looping over " + mode + " data")
     ):
-        if params["verbose"]:
-            print("== Current subject:", subject["subject_id"], flush=True)
+        logger.debug("== Current subject:", subject["subject_id"])
Suggested change:
logger.debug(f"== Current subject: {subject['subject_id']}")
    logger_name (str): name of the logger
    """
    logs_dir = f"{output_dir}/logs"
    logger_name = "gandlf"
So, all the loggers would be created with the gandlf logger_name. I.e., you actually create only one logger that is used by all the code. Is that intentional?
That makes the create-a-logger-in-every-module logic somewhat excessive, as there is one and only one logger.
I see a couple of solutions:
One logger
If you believe one logger is enough, I'd propose using the root logger. It simplifies working with the logging module A LOT.
Say, we can get rid of these pieces of code:
if "logger_name" in parameters:
    logger = logging.getLogger(parameters["logger_name"])
else:
    logger, parameters["logs_dir"], parameters["logger_name"] = setup_logger(
        output_dir=parameters["output_dir"],
        verbose=parameters.get("verbose", False),
    )
and it's much easier to use the logger in sub-utils:
def csv_processing(df: DataFrame):
    logging.info(f"df processing: {df.shape}")
    train_df, val_df = split_df(df)
    ...

def split_df(df: DataFrame):
    train_df = df[:1000]
    val_df = df[1000:]
    logging.info(f"len of train/val: train={len(train_df)}, val={len(val_df)}")  # <- you see, here we didn't pass any loggers, but still use the already configured root logger
    ...
And it's much easier to configure via logging.basicConfig(). Configuring loggers outside of basicConfig is always a pain; say, right now logging does not work properly (at least for me) because logger.addHandler(fh) effectively removes the stdout handler, so nothing is printed to the console anymore 🤦 (that's a facepalm at logging configuration, not at your code, FYI).
However, I should warn that logging best practices advise avoiding the root logger, as it then becomes hard to configure different loggers / log streams in a flexible way.
- If you prefer to stay with best practices, maybe you do want to make different loggers for different modules as well.
- Actually, keeping the existing logger logic is also possible: "yes, right now we have only one logger, but it can easily be updated in the future if anybody needs a separate logger".
What do you think?
If we use the root logger, all the configuration simplifies significantly. If we keep separate loggers, then the configuration needs to be checked / fixed (I can help you there).
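For the record, a sketch of what the root-logger variant of setup_logger could look like. The function name, log format, and file layout are assumptions for illustration, not the PR's actual code:

```python
import logging
import sys
from pathlib import Path


def setup_root_logger(output_dir: str, verbose: bool = False) -> Path:
    """Configure the root logger once; every module then just calls logging.info(...)."""
    logs_dir = Path(output_dir) / "logs"
    logs_dir.mkdir(parents=True, exist_ok=True)
    log_file = logs_dir / "gandlf.log"
    logging.basicConfig(
        level=logging.DEBUG if verbose else logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        handlers=[
            logging.FileHandler(log_file),      # keep the file handler...
            logging.StreamHandler(sys.stdout),  # ...without losing the console one
        ],
        force=True,  # Python 3.8+: replace any previously attached handlers
    )
    return log_file
```

Passing both handlers to basicConfig avoids the addHandler pitfall mentioned above, since the console handler is never dropped.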
Hey! I agree that using the root logger would be the most straightforward option (I wanted to do it myself), but I was reluctant to write something that goes against "best practices" :) I'll update the code ASAP!
I also have a question :) What do you think is more developer-friendly / readable: configuring the logger in a function, or having it in a JSON/dict config format? (https://docs.python.org/3/howto/logging-cookbook.html#custom-handling-of-levels)
Definitely a function :) JSON / INI configs are even more mind-blowing and error-prone... They may be useful for bigger production services, but in the case of GaNDLF, easier is better IMO.
If we are exploring configuration of the logger, might I recommend we do it within the current configuration mechanism? This will ensure anything that can be set by the user is in a single place, and hence easier to maintain.
have a logs path independent from output_dir
Where do you want to place them? ./gandlf.log or something like that?
This can actually go into parameters["output_dir"]/logs. To make it robust, we can do something like: log_dir = os.path.join(parameters.get("output_dir", ${some_meaningful_default}), "logs").
Sorry, I didn't get it. parameters: do you mean the model config file, or the CLI command parameters? Also, we could think about a common GaNDLF config file / config setup (like ~/.gandlf_config.yaml), but that's definitely out of this PR's scope.
all loggers with .getLogger(name) in modules
It means that it would be the root logger that defines all handlers, formatters, filters, etc., right? We create a new logger in each module, but all of them would process messages in the same way, right? (As would third-party libraries; say, logs from PyTorch would be processed by the root logger in the same way.)
This will need extremely robust developer notes and documentation, and an associated PR template update, to ensure that reviewers/contributors have this in mind.
Hm, you're right, that should be noted in the developer notes. Though I don't see many problems here: logging usage is much simpler than logging configuration. Like, a developer would just need to:
import blabla
import foobar
import logging  # <- add this (1)

logger = logging.getLogger("my-new-awesome-command")  # <- add this (2)

def my_awesome_logic(input_data: str):
    ...
    logger.info(f"input data: {input_data}")  # <- just use logger.info instead of print (3)
    ...
and it's even simpler for improving existing modules (where the logger is already defined).
And we can simplify it a bit more with logger = logging.getLogger(__name__), so the developer doesn't need to bother with the logger name.
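A tiny illustration of how the __name__-based loggers hang together (the module names here are made up):

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(name)s %(levelname)s: %(message)s")

# inside gandlf/__init__.py and gandlf/compute.py this would simply be
# logger = logging.getLogger(__name__)
parent = logging.getLogger("gandlf")
child = logging.getLogger("gandlf.compute")

assert child.parent is parent  # the hierarchy is derived from the dotted name
child.info("records propagate up to the root handlers, no extra config needed")
```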
Can we somehow structure tqdm to output to stdout instead?
Yep, there is a file param that can be set to stdout.
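Something like this. A StringIO buffer stands in for the target here just to show the redirection; sys.stdout or an opened progress file works the same way:

```python
import io
from tqdm import tqdm

buffer = io.StringIO()  # stands in for sys.stdout or an opened progress file
for _ in tqdm(range(3), desc="looping", file=buffer):
    pass

assert "looping" in buffer.getvalue()  # the bar went to `file`, not stderr
```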
And I feel that as long as warnings are present in a separate file, users can always check and take appropriate action. IMO, filtering some information from the main output (which is currently usually noise) does make sense.
Sorry, I didn't get it. What do you mean by "main output"? And what (and where) do you want to filter from it?
Discussed everything together outside of the PR:
- The log is split into four parts: debug log file, stdout, stderr, tqdm progress file.
- The debug log file catches everything DEBUG+. In particular, it should also catch errors, warnings, etc. (thus, both the stdout and stderr content is also caught by the debug log file).
- stdout displays INFO messages. Not lower, not higher.
- stderr displays WARNING+ messages.
- tqdm is redirected to the tqdm progress file (btw, how, @sylwiamm? tqdm(.., file=tqdm_file_path)? A separate logger for tqdm? A filter that catches tqdm messages?)
- Excessive warnings from third-party libs make a mess:
  - y_pred contains classes not in y_true: solve in [BUG] Trying to train a model for classification gives a weird warning #848
  - The ``converters`` are currently experimental.: either patch ACSConv, or hide it with a separate filter on our logging side.
- The debug log file is located somewhere like TMP_FOLDER/gandlf/gandlf.log, with file rotation. It may be difficult to print the specific path in case of an exception, so I propose printing it once at command start (if that is possible at all?).
Please add anything if I forgot / understood something the wrong way, fabs
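A possible shape for that four-way split, sketched with level filters on the root logger. The rotation sizes and the MaxLevelFilter helper are assumptions for illustration, not agreed details (the tqdm progress file is left out, pending the question above):

```python
import logging
import sys
from logging.handlers import RotatingFileHandler


class MaxLevelFilter(logging.Filter):
    """Pass only records at or below max_level (keeps WARNING+ out of stdout)."""

    def __init__(self, max_level: int):
        super().__init__()
        self.max_level = max_level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno <= self.max_level


def setup_split_logging(log_file: str) -> None:
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)

    file_handler = RotatingFileHandler(log_file, maxBytes=10_000_000, backupCount=3)
    file_handler.setLevel(logging.DEBUG)  # the debug file catches everything

    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.setLevel(logging.INFO)                   # INFO and up...
    stdout_handler.addFilter(MaxLevelFilter(logging.INFO))  # ...but not WARNING+

    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setLevel(logging.WARNING)  # WARNING+ goes to stderr

    for handler in (file_handler, stdout_handler, stderr_handler):
        root.addHandler(handler)
```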
Fixes #755
Proposed Changes
Checklist
- The CONTRIBUTING guide has been followed.
- typing is used to provide type hints, including and not limited to using Optional if a variable has a pre-defined value.
- If applicable (e.g., if a new pip install step is needed for the PR to be functional), please ensure it is reflected in all the files that control the CI, namely: python-test.yml, and all docker files [1,2,3].