A seq2seq model (a BERT-based encoder-decoder, built with HuggingFace) was trained on the Yelp Open Dataset to generate Yelp reviews.
- ~1.5M reviews (subset of the ~8M reviews from Yelp Open Dataset)
- ~100k businesses
- stars: the business's rating (1-5)
- funny: number of funny votes received
- elite level: total number of years the reviewer held elite status
- name: the business's name
- city: the city where the business is located
- categories: the business's categories
The six input features are concatenated into a single string for each example. The model's target output is the corresponding Yelp review text.
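A minimal sketch of the feature concatenation step described above. The field order, field names, and separator here are illustrative assumptions, not taken from the project's notebooks:

```python
def build_input(example):
    """Concatenate the six input features into a single string.

    The field order and the " | " separator are assumptions for
    illustration; the training notebook may format inputs differently.
    """
    fields = [
        str(example["stars"]),
        str(example["funny"]),
        str(example["elite_level"]),
        example["name"],
        example["city"],
        example["categories"],
    ]
    return " | ".join(fields)

example = {
    "stars": 5,
    "funny": 2,
    "elite_level": 3,
    "name": "Joe's Diner",
    "city": "Las Vegas",
    "categories": "Restaurants, Diners",
}
print(build_input(example))
# -> 5 | 2 | 3 | Joe's Diner | Las Vegas | Restaurants, Diners
```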
During training, the Yelp review is truncated to 128 tokens. During inference, the model generates tokens until it emits an EOS token or reaches a predetermined maximum length.
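The inference stopping rule can be sketched as a simple decoding loop. Here `next_token` is a hypothetical stand-in for the real model's per-step prediction, used only to show the EOS/max-length termination logic:

```python
EOS = "<eos>"  # assumed end-of-sequence marker

def generate(next_token, max_length=128):
    """Generate tokens until EOS is produced or max_length is reached."""
    tokens = []
    while len(tokens) < max_length:
        tok = next_token(tokens)
        if tok == EOS:  # stop early when the model signals end of sequence
            break
        tokens.append(tok)
    return tokens

# Dummy stand-in "model": emits "word" five times, then EOS.
def dummy_next_token(tokens):
    return "word" if len(tokens) < 5 else EOS

print(generate(dummy_next_token))
# -> ['word', 'word', 'word', 'word', 'word']
```

In the actual notebooks this loop is handled by the HuggingFace generation utilities rather than written by hand.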
To train a new model, use the seq2seq_train notebook. The Yelp Open Dataset needs to be downloaded first.
To generate reviews, use the seq2seq_predict notebook. The model trained in this project is downloaded automatically within the notebook, so you don't need to provide your own trained model.