
numpy throws munmap_chunk(): invalid pointer when freeing its cache #14102

Closed
siddharth-singh-mindtickle opened this issue Jul 24, 2019 · 4 comments
Labels: 00 - Bug, 57 - Close? (Issues which may be closable unless discussion continued)
Comments

@siddharth-singh-mindtickle

Python code running in a Docker container (Ubuntu bionic, Python 3.7.3) aborts with munmap_chunk(): invalid pointer. I used memory profiling to monitor the amount of memory used; it peaks at roughly 160 MB, while the Docker container has 1024 MB assigned to it.
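
(The report does not name the profiler that produced the 160 MB figure; a minimal sketch along these lines, assuming the third-party memory_profiler package, would report the peak usage of the processing function. Illustrative only, not part of the original report.)

# hypothetical profiling harness, not from the report
from memory_profiler import memory_usage

def process():
    # placeholder for the pandas pipeline shown further below
    pass

# run process() and record memory usage every 0.1 s, then take the peak
peak = max(memory_usage((process, (), {}), interval=0.1))
print("peak memory: %.1f MiB" % peak)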

Through GDB I obtained the following backtrace, which points to NumPy's cache-freeing code.

Relevant source code in NumPy:

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/alloc.c, lines 264 and 104 (deallocation is called here)

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/arrayobject.c, lines 533 and 520 (cache free operations)

Error message:

Thread 1 "python3.7" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51    ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff7a24801 in __GI_abort () at abort.c:79
#2  0x00007ffff7a6d897 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7b9ab9a "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007ffff7a7490a in malloc_printerr (str=str@entry=0x7ffff7b9c7a8 "munmap_chunk(): invalid pointer") at malloc.c:5350
#4  0x00007ffff7a7becc in munmap_chunk (p=0x1737a90) at malloc.c:2846
#5  __GI___libc_free (mem=0x1737aa0) at malloc.c:3117
#6  0x00007ffff64b3ecc in PyDataMem_FREE (ptr=0x1737aa0) at numpy/core/src/multiarray/alloc.c:264
#7  _npy_free_cache (dealloc=0x7ffff64b4130 <PyDataMem_FREE>, cache=0x7ffff6a292a0 <datacache>, msz=1024, nelem=<optimized out>, p=0x1737aa0) at numpy/core/src/multiarray/alloc.c:104
#8  npy_free_cache (p=0x1737aa0, sz=<optimized out>) at numpy/core/src/multiarray/alloc.c:139
#9  0x00007ffff64b7627 in array_dealloc (self=0x7fffdee1ab70) at numpy/core/src/multiarray/arrayobject.c:533
#10 0x00007ffff64b7747 in array_dealloc (self=0x7fffdee1a3a0) at numpy/core/src/multiarray/arrayobject.c:520
#11 0x000000000056aa70 in ?? ()
#12 0x000000000056a5ca in ?? ()
#13 0x000000000056aa70 in ?? ()
#14 0x000000000058decd in _PyObject_GenericSetAttrWithDict ()
#15 0x00000000005435db in ?? ()
#16 0x00000000004d0336 in ?? ()
#17 0x00000000005b1d2b in _PyObject_FastCallKeywords ()
#18 0x00000000005cc8f1 in ?? ()
#19 0x000000000051add9 in _PyEval_EvalFrameDefault ()
#20 0x00000000005b2257 in _PyFunction_FastCallDict ()
#21 0x0000000000541053 in ?? ()
#22 0x00000000005a409b in PyObject_SetAttr ()
#23 0x000000000051710d in _PyEval_EvalFrameDefault ()
#24 0x00000000005cd202 in _PyEval_EvalCodeWithName ()
#25 0x00000000005b11a2 in _PyFunction_FastCallKeywords ()
#26 0x0000000000516b9a in _PyEval_EvalFrameDefault ()
#27 0x00000000005b0eac in _PyFunction_FastCallKeywords ()
#28 0x00000000005cc6ea in ?? ()
#29 0x000000000051add9 in _PyEval_EvalFrameDefault ()
#30 0x00000000005cd202 in _PyEval_EvalCodeWithName ()
#31 0x00000000005b11a2 in _PyFunction_FastCallKeywords ()
#32 0x00000000005cc6ea in ?? ()
#33 0x000000000051add9 in _PyEval_EvalFrameDefault ()
#34 0x00000000005cd202 in _PyEval_EvalCodeWithName ()
#35 0x00000000005b2437 in _PyFunction_FastCallDict ()
#36 0x00000000005425b3 in ?? ()
#37 0x000000000056acb1 in ?? ()
#38 0x00000000005b4266 in PyObject_Call ()
#39 0x0000000000518041 in _PyEval_EvalFrameDefault ()
#40 0x00000000005cd202 in _PyEval_EvalCodeWithName ()
#41 0x00000000005b2437 in _PyFunction_FastCallDict ()
#42 0x0000000000518041 in _PyEval_EvalFrameDefault ()
#43 0x00000000005cd202 in _PyEval_EvalCodeWithName ()
#44 0x00000000005b11a2 in _PyFunction_FastCallKeywords ()
#45 0x00000000005cc6ea in ?? ()
#46 0x0000000000517a9e in _PyEval_EvalFrameDefault ()
#47 0x00000000005cd202 in _PyEval_EvalCodeWithName ()
#48 0x00000000005b11a2 in _PyFunction_FastCallKeywords ()
#49 0x00000000005cc6ea in ?? ()
#50 0x0000000000517a9e in _PyEval_EvalFrameDefault ()
#51 0x00000000005b0eac in _PyFunction_FastCallKeywords ()
#52 0x0000000000516b9a in _PyEval_EvalFrameDefault ()
#53 0x00000000005cd202 in _PyEval_EvalCodeWithName ()
#54 0x0000000000516673 in PyEval_EvalCode ()
#55 0x0000000000629bb2 in ?? ()
#56 0x0000000000629c6a in PyRun_FileExFlags ()
#57 0x000000000062a950 in PyRun_SimpleFileExFlags ()
#58 0x00000000006569c5 in ?? ()
#59 0x0000000000656d2e in _Py_UnixMain ()
#60 0x00007ffff7a05b97 in __libc_start_main (main=0x4c0450 <main>, argc=3, argv=0x7fffffff1c58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffff1c48) at ../csu/libc-start.c:310
#61 0x00000000005e864a in _start ()

NumPy/Python version information:

Python: 3.7.3
numpy: 1.16.4

A much more detailed trace from a Debian Stretch based image is attached:

podlogs.txt

@seberg
Member

seberg commented Jul 24, 2019

@siddharth-singh-mindtickle what is the code you are running? My first reaction is to think there is either something specific to what you do, or something wrong with the setup itself.

@siddharth-singh-mindtickle
Author

@siddharth-singh-mindtickle what is the code you are running? My first reaction is to think there is either something specific to what you do, or something wrong with the setup itself.

Nothing fancy, just reading a JSON response, processing it in pandas, and saving it. I tried two ways, first with to_csv and then with a loop, but I still hit the issue:

# imports added for completeness; r_admin_project, required_data_new_project and
# required_data_new_project_admin_data come from earlier parts of the script (not shown)
import csv
import pandas as pd

result_json_admin_project = pd.read_json(r_admin_project.text)
required_data_admin_project = result_json_admin_project[['key']].rename(index=str, columns={'key':'All Data'})
required_data_admin_project = required_data_admin_project["All Data"].apply(pd.Series).rename(index=str, columns={0:'Name',1:'Time',2:'Browser',3:'Browser-version',4:'City',5:'Country',6:'Region',7:'Company',8:'User',9:'First Load',10:'Page Name',11:'Page Loading Time',12:'Stream',13:'Company Type',14:'Company-1',15:'Current URL'})
required_data_admin_project["Admin"] = ["admin.mindtickle" in i for i in required_data_admin_project["Current URL"]]
required_data_admin_project = required_data_admin_project[required_data_admin_project['Admin']==True]
#required_data_admin_project['Page Name'] = required_data_admin_project['Page Name'].fillna(required_data_admin_project['Page Name-1'])
required_data_admin_project['Company'] = required_data_admin_project['Company'].fillna(required_data_admin_project['Company-1'])
required_data_admin_project = required_data_admin_project.drop(['Company-1'],axis=1)
required_data_admin_project["UM"] = ["/ui/profile-management/" in i for i in required_data_admin_project["Current URL"]]
required_data_admin_project = required_data_admin_project[required_data_admin_project['Admin']==True]
required_data_admin_project.loc[required_data_admin_project['UM'] == True, 'Page Name'] = 'User Management'
required_data_admin_project = required_data_admin_project.drop('UM',axis=1)
required_data_admin_project.loc[required_data_admin_project['Page Name'] == 'Profile Managment Page', 'Stream'] = 'User Management'
company_type_data = required_data_new_project[['Company','Company Type']].append(required_data_admin_project[['Company','Company Type']]).drop_duplicates().dropna()
required_data_admin_project = required_data_admin_project.drop('Company Type',axis=1)
required_data_admin_project = required_data_admin_project.merge(company_type_data,on='Company',how='left')
required_data_admin_project_customer_data = required_data_admin_project[(required_data_admin_project['Company Type']!='QA') & (required_data_admin_project['Company Type']!='DEV')]
#required_data_admin_project_customer_data[required_data_admin_project_customer_data['Company Type'].isnull()][['Company']].drop_duplicates()
required_data_admin_project_customer_data = required_data_admin_project_customer_data[(required_data_admin_project_customer_data['Company']!='ilt.mindtickle.com')&(required_data_admin_project_customer_data['Company']!='alicia.mindtickle.com')&(required_data_admin_project_customer_data['Company']!='csgtesting.mindtickle.com')&(required_data_admin_project_customer_data['Company']!='localhost:3000')]
total_data = required_data_new_project_admin_data.append(required_data_admin_project_customer_data)
total_data = total_data.drop('Company Type',axis=1).drop_duplicates()
total_data['Time'] = (total_data['Time']/1000).astype(int)
total_data['Date and Time'] = pd.to_datetime(total_data['Time'],unit='s')
total_data['Date'] = total_data['Date and Time'].map(lambda x: x.strftime('%m-%d'))
total_data_summary = total_data[['Date','Page Loading Time']]
total_data_summary['Page Loading Time'] = total_data_summary['Page Loading Time']/1000
total_data_summary_mean = round(total_data_summary.groupby(['Date'])['Page Loading Time'].mean(),2).reset_index()
total_data_summary_mean['Stream'] = 'Overall Average'
total_data_summary_mean = pd.pivot_table(total_data_summary_mean,index='Stream',columns='Date',values=['Page Loading Time']).fillna(0).reset_index()
total_data_summary_median = round(total_data_summary.groupby(['Date'])['Page Loading Time'].median(),2).reset_index()
total_data_summary_median['Stream'] = 'Overall Median'
total_data_summary_median = pd.pivot_table(total_data_summary_median,index='Stream',columns='Date',values=['Page Loading Time']).fillna(0).reset_index()
total_data_summary_90percentile = round(total_data_summary.groupby(['Date'])['Page Loading Time'].quantile(0.9),2).reset_index()
total_data_summary_90percentile['Stream'] = 'Overall 90th Percentile'
total_data_summary_90percentile = pd.pivot_table(total_data_summary_90percentile,index='Stream',columns='Date',values=['Page Loading Time']).fillna(0).reset_index()
Overall_pages_summary_dashboard = total_data_summary_mean.append([total_data_summary_median,total_data_summary_90percentile])
total_data_stream_summary = total_data[['Stream','Date','Page Loading Time']]
total_data_stream_summary['Page Loading Time'] = total_data_stream_summary['Page Loading Time']/1000
total_data_stream_summary_mean = round(total_data_stream_summary.groupby(['Stream','Date'])['Page Loading Time'].mean(),2).reset_index()
total_data_stream_summary_mean = pd.pivot_table(total_data_stream_summary_mean,index='Stream',columns='Date',values=['Page Loading Time']).fillna(0).reset_index()
total_data_stream_summary_median = round(total_data_stream_summary.groupby(['Stream','Date'])['Page Loading Time'].median(),2).reset_index()
total_data_stream_summary_median = pd.pivot_table(total_data_stream_summary_median,index='Stream',columns='Date',values=['Page Loading Time']).fillna(0).reset_index()
total_data_stream_summary_90percentile = round(total_data_stream_summary.groupby(['Stream','Date'])['Page Loading Time'].quantile(0.9),2).reset_index()
total_data_stream_summary_90percentile = pd.pivot_table(total_data_stream_summary_90percentile,index='Stream',columns='Date',values=['Page Loading Time']).fillna(0).reset_index()
overall_pages_streamwise = total_data_stream_summary.groupby(['Stream']).size().reset_index(name='Total Pages')
total_data_stream_summary_mean = total_data_stream_summary_mean.merge(overall_pages_streamwise,left_on=[('Stream', '')],right_on='Stream').drop('Stream',axis=1)
total_data_stream_summary_median = total_data_stream_summary_median.merge(overall_pages_streamwise,left_on=[('Stream', '')],right_on='Stream').drop('Stream',axis=1)
total_data_stream_summary_90percentile = total_data_stream_summary_90percentile.merge(overall_pages_streamwise,left_on=[('Stream', '')],right_on='Stream').drop('Stream',axis=1)
abc = total_data_stream_summary_90percentile.transpose().reset_index()
abc.columns = abc.iloc[0]
abc[abc.columns[0]].iloc[1:-1] = abc[abc.columns[0]].iloc[1:-1].str[1]
new_df = pd.DataFrame(abc.values[1:-1], columns=abc.columns)
#overall_data = total_data_summary_90percentile.append(total_data_stream_summary_90percentile)
# dropped this line because of the error and replaced it with the loop below, but to no avail
# new_df.to_csv(out, index=False)

csv.register_dialect('myDialect', quoting=csv.QUOTE_ALL, skipinitialspace=True)
with open("custom.csv", 'w') as f:
    writer = csv.writer(f, dialect='myDialect')
    writer.writerow(new_df.columns)
    for row in new_df.fillna(0.0).values:
        writer.writerow(row)
f.close()  # redundant: the with block has already closed the file
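
(For reference, the manual writer loop above should be roughly equivalent to a single to_csv call with the same quoting settings; this is a hedged sketch, not code from the report.)

# hypothetical one-liner equivalent of the csv.writer loop above
new_df.fillna(0.0).to_csv("custom.csv", index=False, quoting=csv.QUOTE_ALL)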

@seberg
Member

seberg commented Aug 22, 2019

@siddharth-singh-mindtickle just found this again. Can you make sure to give a full example? (Please try to make it minimal, i.e. if you do not need to put up a big txt file to load, that would be better.)

It will be impossible for anyone here to debug numpy unless we can reproduce the issue, and we may have to just close it then unfortunately.
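
(For illustration, a minimal self-contained example in that spirit could substitute synthetic data for the JSON response so the same pandas pipeline can be run by anyone; the column names and sizes below are hypothetical, not taken from the report.)

# hypothetical minimal reproducer: synthetic data standing in for the JSON response
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    "Stream": np.random.choice(["A", "B", "C"], size=1000),
    "Date": np.random.choice(["07-01", "07-02", "07-03"], size=1000),
    "Page Loading Time": np.random.randint(100, 5000, size=1000),
})

# same shape of operations as the reported pipeline: groupby, quantile, pivot, write out
summary = round(df.groupby(["Stream", "Date"])["Page Loading Time"].quantile(0.9), 2).reset_index()
pivot = pd.pivot_table(summary, index="Stream", columns="Date", values=["Page Loading Time"]).fillna(0).reset_index()
pivot.to_csv("custom.csv", index=False)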

@seberg seberg added 57 - Close? Issues which may be closable unless discussion continued 00 - Bug labels Aug 22, 2019
@rgommers
Member

No follow-up for 2.5 years, so I'll close this.
