Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

printing dataset_id and important filenames/values in blue #356

Merged

Conversation

MalteEbner
Copy link
Contributor

@MalteEbner MalteEbner commented May 5, 2021

closes #332
closes https://github.com/lightly-ai/lightly-core/issues/352

Description

  • My change is breaking
  • The lightly-upload command now also prints the dataset_id if the new_dataset_name was specified.
  • Many important filenames/values are now printed in blue.
  • The lightly-magic command now prints e.g. ########## Starting to embed your dataset. before each of its three steps
  • More unittests for lightly-download to also cover new printings in blue
  • added helper to print missing cli api arguments as yellow warnings, but without text

lightly-magic: (cut off parts of the start)
image
image

lightly-download:
image
image

Tests

  • My change is covered by existing tests
  • My change needs new tests
  • I have added/adapted tests accordingly.
  • I have manually tested the change.

If applicable, describe the manual test procedure, e.g:

pip uninstall lightly
export BRANCH_NAME="332-Return-the-dataset_id-when-creating-a-new-dataset-me"
pip install "git+https://github.com/lightly-ai/lightly.git@$BRANCH_NAME"
lightly-magic new_dataset_name='new_dataset_name_xyz' token=MY_TOKEN trainer.max_epochs=1 input_dir='my_input_dir'
lightly-download token= MY_TOKEN dataset_id='was_in_blue_in_CLI_print' tag_name='initial-tag' input_dir='my_input_dir' output_dir='my_output_dir'

Documentation

  • I have added docstrings to all changed/added public functions/methods.
  • My change requires a change to the documentation ( .rst files).
  • I have updated the documentation accordingly.
  • The autodocs update the documentation accordingly.`

Copy link
Contributor

@japrescott japrescott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good with my limited knowledge :)

Questions;

  • Can we use a custom blue (like our lightly blue? #2BA3D1 :)) )
  • Can we color the upload/progress bar different from the rest of the information? e.g white?

Suggestions;

  • Maybe we would also want to show warnings/errors in an orange/red color to make it more eye-popping when it occurs?

@@ -61,10 +73,30 @@ def test_download_tag_name_exclude_parent(self):
self.parse_cli_string(cli_string)
lightly.cli.download_cli(self.cfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since every test has this line as well, it would call this multiple times, no?

Copy link
Contributor Author

@MalteEbner MalteEbner May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is called once for every test. It runs against the mocked server.
We could put those two lines into an extra function which is called directly with the cli_string. However, this would only save one line per test, thus it is not worth it in my eyes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, I just realised its called twice thats all. If that is intended behaviour 1) to set it up and 2) to then call it correctly, then all is good

lightly/cli/download_cli.py Show resolved Hide resolved
@MalteEbner
Copy link
Contributor Author

MalteEbner commented May 5, 2021

  • Can we use a custom blue (like our lightly blue? #2BA3D1 :)) )
  • Maybe we would also want to show warnings/errors in an orange/red color to make it more eye-popping when it occurs?

We can also have it blink, change the background and font and even more:
https://stackoverflow.com/questions/4842424/list-of-ansi-color-escape-sequences

However, I would put those ideas into a new issue, as they need a discussion about which design languages to follow. We should not overwhelm our users with a rainbow of colours.

@MalteEbner
Copy link
Contributor Author

MalteEbner commented May 5, 2021

  • Can we color the upload/progress bar different from the rest of the information? e.g white?

We use tqdm for the bar, so this is possible with some tricks: tqdm/tqdm#450
However, I would not do it, as other common frameworks (pytorch, tensorflow) don't do it either and consider it as distracting.

Copy link
Contributor

@IgorSusmelj IgorSusmelj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@japrescott
Copy link
Contributor

japrescott commented May 5, 2021

I thought you wanted an open discussion of how we want to handle it, since you also added the ########## "styling" instead of just colouring info text differently.

Which was requested in https://github.com/lightly-ai/lightly-core/issues/352 as far as I understood "clear separation between sections"

However, I had a closer look at our warnings and found that they were inconsistent: Some were in the warnings.warn() format, others just printed. I aligned that by introducing a helper function for printing as warnings, but in yellow and without giving the exact line of the warning.

Copy link
Contributor

@philippmwirth philippmwirth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! The only thing I don't like is that there seems to be no linebreak before printing the dataset id.

Other than that I would love it if you could add one print statement (or two) for the embedding upload.

@codecov
Copy link

codecov bot commented May 6, 2021

Codecov Report

Merging #356 (6906efd) into develop (f58144f) will increase coverage by 0.81%.
The diff coverage is 97.67%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #356      +/-   ##
===========================================
+ Coverage    84.21%   85.03%   +0.81%     
===========================================
  Files           70       70              
  Lines         2408     2432      +24     
===========================================
+ Hits          2028     2068      +40     
+ Misses         380      364      -16     
Impacted Files Coverage Δ
lightly/cli/_helpers.py 80.28% <90.00%> (+1.59%) ⬆️
lightly/cli/download_cli.py 94.73% <100.00%> (+18.46%) ⬆️
lightly/cli/embed_cli.py 90.32% <100.00%> (+0.15%) ⬆️
lightly/cli/lightly_cli.py 96.29% <100.00%> (+0.64%) ⬆️
lightly/cli/train_cli.py 85.71% <100.00%> (+0.18%) ⬆️
lightly/cli/upload_cli.py 94.82% <100.00%> (+1.07%) ⬆️
lightly/data/dataset.py 94.38% <0.00%> (+6.74%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f58144f...6906efd. Read the comment docs.

@MalteEbner
Copy link
Contributor Author

MalteEbner commented May 6, 2021

Looks good! The only thing I don't like is that there seems to be no linebreak before printing the dataset id.

This is due to tqdm, which creates those artifacts due to race conditions of the multiple workers. I added one line to solve it:

print(f"Finished the upload of the dataset.")

Other than that I would love it if you could add one print statement (or two) for the embedding upload.

I added those statements:

print("Starting upload of embeddings.")
api_workflow_client.upload_embeddings(
path_to_embeddings_csv=path_to_embeddings, name=name
)
print("Finished upload of embeddings.")

@MalteEbner MalteEbner force-pushed the 332-Return-the-dataset_id-when-creating-a-new-dataset-me branch from a03a876 to 6906efd Compare May 6, 2021 12:20
@MalteEbner MalteEbner merged commit 4553c9f into develop May 6, 2021
@MalteEbner MalteEbner deleted the 332-Return-the-dataset_id-when-creating-a-new-dataset-me branch May 6, 2021 12:29
IgorSusmelj added a commit that referenced this pull request May 7, 2021
* Extract exif data (#331)

* Add lightly_utils to requirements and use it for exif

* Add tests for corrupt image upload

* Use lightly-utils for thumbnail and metadata

* Add tests for remaining utils

* Github actions: remove testing on python 3.7 and 3.8 (#346)

* refactor NTXent loss (#340)

- For the unittests: wrote down an alternative calculation of the NTXent loss following exactly the pseudocode in the paper.
https://github.com/lightly-ai/lightly/blob/05bb9bd6c93c537763e8be8dc1f776db7ef4b793/tests/loss/test_NTXentLoss.py#L30-L31
Then it is checked whether the torch implementation and manual implementations with for-loops and numpy yield the same results.
- Deleted unused functions in the NTXent loss class
- Made the code a bit simpler and more performant and calculated the similarity matrix with Einstein summation.
- Added a higher number of comments to make it easier to understand what is going on.

* 284 object detection tutorial (#347)

* Add first version of new AL tutorial

* Update tutorial

* Update tutorial and make sure it runs properly

* Add new tutorial

* Implemented feedback

* Fix bad reference

* Update tutorial based on feedback

* Replace `unlabeled_set` with `query_set`

* Update docs

* Remove docker routes and adapt tests (#352)

* 309 semantic segmentation scorer pw (#355)

* First implementation of semseg scorer

* Add tests

* Update documentation

* clean up of platform tutorial 2 active learning (#344)

- use LinearRegression instead of kNN and describe that this equals using the embedding model as backbone and only fine-tuning a classification head on top of it.
- use `lightly-magic` to make usage easier
- use always the same dataset_name everywhere
- more and better comments

* printing dataset_id and important filenames/values in blue (#356)

- The `lightly-upload` command now also prints the `dataset_id` if the `new_dataset_name` was specified.
- Many important filenames/values are now printed in blue.
- The `lightly-magic` command now prints e.g. `########## Starting to embed your dataset.` before each of its three steps
- More unittests for `lightly-download` to also cover new printings in blue
- added helper to print missing cli api arguments as yellow warnings, but without text

* Added BYOL model and Test cases (#348)

* Added BYOL model and Test cases

* Remove return_features and fix tests

* Add byol model

* Add module info. Add note that implementation is experimental.

Co-authored-by: philippmwirth <philipp.m.wirth@gmail.com>
Co-authored-by: IgorSusmelj <igor_7779@hotmail.com>

* Bump version to 1.1.8

* Remove pytorch_lightning>=1.3.0 (#360)

Co-authored-by: Philipp Wirth <65946090+philippmwirth@users.noreply.github.com>
Co-authored-by: MalteEbner <malte.ebner@gmail.com>
Co-authored-by: pranavsinghps1 <46050869+pranavsinghps1@users.noreply.github.com>
Co-authored-by: philippmwirth <philipp.m.wirth@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Return the dataset_id when creating a new dataset through the CLI
4 participants