Errors in Train with Datasets Tensorflow code section on Huggingface.co #4084

Closed
blackhat-coder opened this issue Apr 1, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@blackhat-coder

Describe the bug

Hi

Error 1

Running the TensorFlow code on Huggingface.co gives a TypeError: __init__() got an unexpected keyword argument 'return_tensors'

Error 2

DataCollatorWithPadding isn't imported

Steps to reproduce the bug

import tensorflow as tf
from datasets import load_dataset
from transformers import AutoTokenizer
dataset = load_dataset('glue', 'mrpc', split='train')
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
dataset = dataset.map(lambda e: tokenizer(e['sentence1'], truncation=True, padding='max_length'), batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")
train_dataset = dataset["train"].to_tf_dataset(
  columns=['input_ids', 'token_type_ids', 'attention_mask', 'label'],
  shuffle=True,
  batch_size=16,
  collate_fn=data_collator,
)

This is the same code on Huggingface.co

Actual results

TypeError: __init__() got an unexpected keyword argument 'return_tensors'
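
One way to confirm that an error like this comes from the installed class rather than from the calling code is to inspect the constructor's signature. The sketch below uses only the standard library. `OldCollator` and `NewCollator` are hypothetical stand-ins (not real transformers classes) for a collator before and after a `return_tensors` parameter exists, and `accepts_keyword` is a helper name made up for illustration.

```python
import inspect


def accepts_keyword(cls, name):
    """Return True if the constructor of `cls` accepts `name` as a keyword argument."""
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        # Positional-only parameters cannot be passed by keyword.
        return params[name].kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # A **kwargs catch-all accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class OldCollator:
    """Hypothetical collator without a return_tensors parameter."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer


class NewCollator:
    """Hypothetical collator that accepts return_tensors."""

    def __init__(self, tokenizer, return_tensors="pt"):
        self.tokenizer = tokenizer
        self.return_tensors = return_tensors
```

With transformers installed, the same helper can be pointed at the real class, for example `accepts_keyword(DataCollatorWithPadding, "return_tensors")`, to see whether the installed version supports the argument.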

Environment info

  • datasets version: 2.0.0
  • Platform: Windows-10-10.0.19044-SP0
  • Python version: 3.9.7
  • PyArrow version: 6.0.0
  • Pandas version: 1.4.1
@blackhat-coder blackhat-coder added the bug Something isn't working label Apr 1, 2022
@albertvillanova albertvillanova self-assigned this Apr 4, 2022
@albertvillanova
Member

albertvillanova commented Apr 4, 2022

Hi @blackhat-coder, thanks for reporting.

Please note that the transformers library updated its data collators API last year (version 4.10.0), which now requires passing the return_tensors argument when instantiating a data collator.

We therefore also updated all the examples in the datasets library documentation that use transformers data collators.
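
For reference, here is a sketch of the reported snippet with both problems addressed, assuming transformers 4.10.0 or later is installed. It adds the missing `DataCollatorWithPadding` import. It also drops the `dataset["train"]` indexing, because `split='train'` already returns a single split; that second change is an assumption and is not discussed in this thread. The function name `build_train_dataset` is made up for illustration, and the imports sit inside it so that defining the function has no side effects.

```python
def build_train_dataset(batch_size=16):
    """Build a tf.data.Dataset from GLUE MRPC (downloads data and a tokenizer when called)."""
    # Imports are local so that merely defining this function needs no extra packages.
    from datasets import load_dataset
    from transformers import AutoTokenizer, DataCollatorWithPadding

    # split='train' returns a single Dataset, not a DatasetDict.
    dataset = load_dataset("glue", "mrpc", split="train")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    dataset = dataset.map(
        lambda e: tokenizer(e["sentence1"], truncation=True, padding="max_length"),
        batched=True,
    )

    # return_tensors is only accepted by newer transformers releases (see the comment above).
    data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")

    # Assumption: index the Dataset directly instead of dataset["train"].
    return dataset.to_tf_dataset(
        columns=["input_ids", "token_type_ids", "attention_mask", "label"],
        shuffle=True,
        batch_size=batch_size,
        collate_fn=data_collator,
    )
```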

If you would like to follow our examples, please update your installed transformers version:

pip install -U transformers
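
To check whether an upgrade is needed before running the documentation example, the installed version can be compared against 4.10.0, the release mentioned above. This is a minimal sketch using only the standard library. The helper names `parse_version`, `needs_upgrade`, and `installed_transformers_version` are made up for illustration, and the parser only handles the leading numeric part of versions such as `4.9.2` or `4.18.0.dev0`.

```python
from importlib import metadata

# Release cited in this thread as the one that changed the data collators API.
MIN_VERSION = (4, 10, 0)


def parse_version(text):
    """Turn a string like '4.18.0.dev0' into a (major, minor, patch) tuple of ints."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_upgrade(installed, minimum=MIN_VERSION):
    """Return True if the installed version string is older than `minimum`."""
    return parse_version(installed) < minimum


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None
```

A typical use is to call `installed_transformers_version()` and, if the result is not `None`, pass it to `needs_upgrade` to decide whether to run the `pip install -U transformers` command above.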
