build: bump nltk to 3.6.7 for security and performance #130

tianjianjiang · 2022-01-21T15:26:28Z

@shanyas10 and @SaulLu I'm going to merge this since it is based on a Dependabot update. If it makes website description preprocessing slower or even unusable, we can always revert it.

nltk 3.6.6: GHSA-f8m6-h2c7-8h9x and GHSA-rqjh-jp2r-59cj
nltk 3.6.7: Resolve IndexError in sent_tokenize nltk/nltk#2922
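As a rough illustration of the version floor implied by the notes above, here is a minimal sketch that checks whether a requirements file pins nltk at 3.6.7 or later. The helper names (`parse_version`, `pinned_nltk_is_safe`) and the `nltk==X.Y.Z` pin format are assumptions for this example, not something taken from the repository.

```python
import re

# Assumption: 3.6.7 is the minimum we want, per the release notes listed above.
MIN_NLTK = (3, 6, 7)


def parse_version(text):
    """Keep the first three numeric components, e.g. '3.6.7' -> (3, 6, 7)."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def pinned_nltk_is_safe(requirements_text):
    """Return True if a line of the form 'nltk==X.Y.Z' pins at least MIN_NLTK.

    Hypothetical helper: returns False when no exact nltk pin is found.
    """
    for line in requirements_text.splitlines():
        match = re.match(r"\s*nltk\s*==\s*([\w.]+)", line, flags=re.IGNORECASE)
        if match:
            return parse_version(match.group(1)) >= MIN_NLTK
    return False
```

A check like this could be run against the contents of requirements.txt after a manual sync, though a proper version parser would be a safer choice for pre-release or unusual version strings.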

Two side notes: 1. Python 3.7.11 is for Colab; 2. Poetry is optional for managing venv and dependencies, but syncing with requirements(-dev).txt must be done manually for the time being.

Bumps [nltk](https://github.com/nltk/nltk) from 3.6.5 to 3.6.6.
- [Release notes](https://github.com/nltk/nltk/releases)
- [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
- [Commits](nltk/nltk@3.6.5...3.6.6)

---
updated-dependencies:
- dependency-name: nltk
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* master: (141 commits)
  build: bump nltk to 3.6.7 for security and performance (bigscience-workshop#130)
  build: bump nltk to 3.6.7 for security and performance (#5)
  Add fp16, multi-GPU training script (toy dataset) (bigscience-workshop#123)
  create dataset with html, timestamp, url, datasource, generation length and website description metadata and tittles, footers and headers from HTML (bigscience-workshop#119)
  remove `#SBATCH --gres=gpu:0 ` from `03_create_dataset.slurm` (bigscience-workshop#121)
  Add joint training slurm script (bigscience-workshop#111)
  Add features types for the metadata to extract and test multiprocessing (bigscience-workshop#118)
  feat: add a feature to choose where to extract metadata (bigscience-workshop#116)
  Use dateutil to parse date (bigscience-workshop#117)
  feat: change how the entity extraction process use ids (bigscience-workshop#115)
  add `path_or_url_flair_ner_model` in order to execute the entity extraction on a partition without internet (bigscience-workshop#106)
  delete old submodule
  delete ds_store
  style check
  style & quality
  imports
  handle IndexError for `wikipedia_desc_utils` (bigscience-workshop#102)
  handle the comment specific type not recognized by pyarrow (bigscience-workshop#83)
  quality check
  Change torch version + make it optional (bigscience-workshop#82)
  ...

# Conflicts:
#	bsmetadata/metadata_utils.py

SaulLu

Let's try it 🚀

tianjianjiang and others added 11 commits September 6, 2021 22:18

build: set to py37 with an optional Poetry env

7853657

Two side notes: 1. Python 3.7.11 is for Colab; 2. Poetry is optional for managing venv and dependencies, but syncing with requirements(-dev).txt must be done manually for the time being.

Merge branch 'bigscience-workshop:master' into master

aa57ad6

refactor: set position of tqdm for multithread eval without newline (#3)

c67eb68

Merge branch 'bigscience-workshop:master' into master

2c55192

Merge branch 'bigscience-workshop:master' into master

e7f3b3a

Merge branch 'bigscience-workshop:master' into master

256927a

Merge branch 'bigscience-workshop:master' into master

3d1ec52

Merge branch 'bigscience-workshop:master' into master

3460c8d

Merge branch 'bigscience-workshop:master' into master

3d600dd

build: bump nltk to 3.6.7 for security and speed

50bc714

tianjianjiang changed the title ~~build: bump nltk o 3.6.7 for security and performance~~ build: bump nltk to 3.6.7 for security and performance Jan 21, 2022

tianjianjiang self-assigned this Jan 21, 2022

tianjianjiang added the bug Something isn't working label Jan 21, 2022

dependabot bot deleted the dependabot/pip/nltk-3.6.6 branch January 21, 2022 15:38

tianjianjiang marked this pull request as ready for review January 21, 2022 15:39

tianjianjiang requested review from SaulLu and shanyas10 January 21, 2022 15:40

tianjianjiang merged commit 9382b4f into bigscience-workshop:master Jan 21, 2022

SaulLu reviewed Jan 21, 2022
