Error "Invalid data access for a virtual column" when creating datatable.Frame() from pandas.DataFrame #3470

Open
ooooona opened this issue May 31, 2023 · 0 comments

ooooona commented May 31, 2023

  • Did you find a bug in datatable, or maybe the bug found you?
    Tell us what it is.

Hi, I found a bug when converting a pandas.DataFrame to a datatable.Frame.

**pandas.DataFrame that converts successfully**
age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome
0 23 self-employed single secondary no -921 no no telephone 2 jan 9 1 56 unknown success
1 26 entrepreneur married secondary no -1206 no no cellular 5 apr 16 9 56 unknown other
2 25 admin. single primary no -932 no no telephone 1 jun 14 5 1 unknown other
3 24 retired divorced secondary no -701 no no cellular 6 may 11 1 1 unknown failure
4 28 entrepreneur single primary no -932 yes yes telephone 5 jan 15 1 69 unknown other
5 29 self-employed single secondary no -701 yes yes cellular 10 may 7 2 2 unknown success
6 21 housemaid divorced primary no -679 yes yes telephone 3 aug 16 1 85 unknown other
7 27 services married secondary no -665 yes yes cellular 10 may 9 4 81 unknown success
8 29 admin. married primary no -710 no no telephone 2 nov 14 10 73 unknown success
9 26 technician divorced primary no -921 yes yes telephone 4 may 11 2 81 unknown success
10 28 admin. divorced primary no -701 no no telephone 6 dec 10 10 -1 unknown failure
11 20 housemaid divorced tertiary no -679 yes yes telephone 10 apr 9 4 81 unknown success
12 25 entrepreneur married primary no -710 no no cellular 4 dec 12 8 73 unknown other
13 20 housemaid married tertiary no -679 yes yes telephone 1 dec 12 10 85 unknown other
14 29 blue-collar married primary no -932 no no telephone 5 feb 7 4 64 unknown failure

**pandas.DataFrame that fails to convert**
age job marital education default balance housing loan contact day month duration campaign pdays previous poutcome
0 23.0 self-employed single secondary no -921.0 no no telephone 2.0 jan 9.0 1.0 56.0 unknown success
1 26.0 entrepreneur married secondary no -1206.0 no no cellular 5.0 apr 16.0 9.0 56.0 unknown other
2 25.0 admin. single primary no -932.0 no no telephone 1.0 jun 14.0 5.0 1.0 unknown other
3 24.0 retired divorced secondary no -701.0 no no cellular 6.0 may 11.0 1.0 1.0 unknown failure
4 28.0 entrepreneur single primary no -932.0 yes yes telephone 5.0 jan 15.0 1.0 69.0 unknown other
5 29.0 self-employed single secondary no -701.0 yes yes cellular 10.0 may 7.0 2.0 2.0 unknown success
6 21.0 housemaid divorced primary no -679.0 yes yes telephone 3.0 aug 16.0 1.0 85.0 unknown other
7 27.0 services married secondary no -665.0 yes yes cellular 10.0 may 9.0 4.0 81.0 unknown success
8 29.0 admin. married primary no -710.0 no no telephone 2.0 nov 14.0 10.0 73.0 unknown success
9 26.0 technician divorced primary no -921.0 yes yes telephone 4.0 may 11.0 2.0 81.0 unknown success
10 28.0 admin. divorced primary no -701.0 no no telephone 6.0 dec 10.0 10.0 -1.0 unknown failure
11 20.0 housemaid divorced tertiary no -679.0 yes yes telephone 10.0 apr 9.0 4.0 81.0 unknown success
12 25.0 entrepreneur married primary no -710.0 no no cellular 4.0 dec 12.0 8.0 73.0 unknown other
13 20.0 housemaid married tertiary no -679.0 yes yes telephone 1.0 dec 12.0 10.0 85.0 unknown other
14 29.0 blue-collar married primary no -932.0 no no telephone 5.0 feb 7.0 4.0 64.0 unknown failure

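The second DataFrame differs from the first only in that its numeric columns hold floats rather than integers. However, if I reduce the batch_size for the second one from 15 to 1, the conversion works.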

Could you please help me solve this? Thanks so much!

  • How to reproduce the bug?
  1. Paste the following into a CSV file:
    """csv
    age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome
    23.0,self-employed,single,secondary,no,-921.0,no,no,telephone,2.0,jan,9.0,1.0,56.0,unknown,success
    26.0,entrepreneur,married,secondary,no,-1206.0,no,no,cellular,5.0,apr,16.0,9.0,56.0,unknown,other
    """

  2. Load the CSV file as a pandas.DataFrame with dataframe = pandas.read_csv(${csv_path}).

  3. Then execute table = datatable.Frame(dataframe); the core dump happens at this call.
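
The three steps above can be combined into one script. This is only a minimal sketch, assuming pandas and datatable are both installed; the helper names `write_repro_csv`, `convert_csv` and `main`, and the file name `repro.csv`, are made up for illustration and are not part of either library.

```python
import os
import tempfile

# Header plus the two data rows from step 1; numeric columns are written as floats.
CSV_TEXT = (
    "age,job,marital,education,default,balance,housing,loan,contact,"
    "day,month,duration,campaign,pdays,previous,poutcome\n"
    "23.0,self-employed,single,secondary,no,-921.0,no,no,telephone,"
    "2.0,jan,9.0,1.0,56.0,unknown,success\n"
    "26.0,entrepreneur,married,secondary,no,-1206.0,no,no,cellular,"
    "5.0,apr,16.0,9.0,56.0,unknown,other\n"
)


def write_repro_csv(directory):
    """Step 1: write the sample CSV into `directory` and return its path."""
    path = os.path.join(directory, "repro.csv")  # hypothetical file name
    with open(path, "w", newline="") as f:
        f.write(CSV_TEXT)
    return path


def convert_csv(csv_path):
    """Steps 2 and 3: load the CSV with pandas, then build a datatable.Frame."""
    # Imported here so that write_repro_csv works without either library.
    import pandas
    import datatable

    dataframe = pandas.read_csv(csv_path)
    print(dataframe.dtypes)  # shows which columns pandas parsed as floats
    return datatable.Frame(dataframe)  # the call where the failure is reported


def main():
    """Run all three steps against a temporary copy of the CSV."""
    with tempfile.TemporaryDirectory() as tmp:
        table = convert_csv(write_repro_csv(tmp))
        print(table)
```

Calling `main()` runs the whole reproduction. Printing `dataframe.dtypes` before the conversion makes it easy to compare against the DataFrame that converts successfully.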

  • What was the expected behavior?
    In case it is not obvious, please tell us what result should your code
    produce.

I think it should generate a datatable.Frame rather than dump core.

  • Your environment?
    Linux #40~20.04.1-Ubuntu SMP Tue Apr 11 02:49:52 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  • Tag the issue with [bug] or [segfault] (depending on whether it crashes
    Python or not).

  • Thank you for contributing, and sorry for the inconvenience.
