build: bump nltk to 3.6.7 for security and performance #130

Merged

Conversation

tianjianjiang
Collaborator

@tianjianjiang tianjianjiang commented Jan 21, 2022

@shanyas10 and @SaulLu I'm going to merge this since it is based on a Dependabot PR. If it makes website description preprocessing slower or even unusable, we can always revert it.

@tianjianjiang tianjianjiang changed the title from "build: bump nltk o 3.6.7 for security and performance" to "build: bump nltk to 3.6.7 for security and performance" Jan 21, 2022
@tianjianjiang tianjianjiang self-assigned this Jan 21, 2022
@tianjianjiang tianjianjiang added the bug ("Something isn't working") label Jan 21, 2022
@dependabot dependabot bot deleted the dependabot/pip/nltk-3.6.6 branch January 21, 2022 15:38
@tianjianjiang tianjianjiang marked this pull request as ready for review January 21, 2022 15:39
@tianjianjiang tianjianjiang merged commit 9382b4f into bigscience-workshop:master Jan 21, 2022
tianjianjiang added a commit to tianjianjiang/bigscience-metadata that referenced this pull request Jan 21, 2022
* master: (141 commits)
  build: bump nltk to 3.6.7 for security and performance (bigscience-workshop#130)
  build: bump nltk to 3.6.7 for security and performance (#5)
  Add fp16, multi-GPU training script (toy dataset) (bigscience-workshop#123)
  create dataset with html, timestamp, url, datasource, generation length and website description metadata and tittles, footers and headers from HTML (bigscience-workshop#119)
  remove `#SBATCH --gres=gpu:0 ` from `03_create_dataset.slurm` (bigscience-workshop#121)
  Add joint training slurm script (bigscience-workshop#111)
  Add features types for the metadata to extract and test multiprocessing (bigscience-workshop#118)
  feat: add a feature to choose where to extract metadata (bigscience-workshop#116)
  Use dateutil to parse date (bigscience-workshop#117)
  feat: change how the entity extraction process use ids (bigscience-workshop#115)
  add `path_or_url_flair_ner_model` in order to execute the entity extraction on a partition without internet (bigscience-workshop#106)
  delete old submodule
  delete ds_store
  style check
  style & quality
  imports
  handle IndexError for `wikipedia_desc_utils` (bigscience-workshop#102)
  handle the comment specific type not recognized by pyarrow (bigscience-workshop#83)
  quality check
  Change torch version + make it optional (bigscience-workshop#82)
  ...

# Conflicts:
#	bsmetadata/metadata_utils.py
Collaborator

@SaulLu SaulLu left a comment

Let's try it 🚀

Labels
bug: Something isn't working
Projects
Status: Closed

2 participants