Crash in MDArray API when opening same file from multiple threads #6253

Closed
lnicola opened this issue Aug 23, 2022 · 4 comments

@lnicola
Contributor

lnicola commented Aug 23, 2022

Expected behavior and actual behavior.

The code under "Steps to reproduce the problem" crashes with SIGSEGV in libhdf5.

The trace below comes from an ASan build, but the crash also happens without it:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==1537122==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7fd4d28fae10 bp 0x7fd4cb9fd73f sp 0x7fd4cb9fd4f8 T2)
==1537122==The signal is caused by a READ memory access.
==1537122==Hint: address points to the zero page.
    #0 0x7fd4d28fae10 in H5F_addr_decode (/usr/lib/libhdf5.so.200+0xfae10)
    #1 0x7fd4d2aeede8 in H5VL__native_blob_specific (/usr/lib/libhdf5.so.200+0x2eede8)
    #2 0x7fd4d2adfb97  (/usr/lib/libhdf5.so.200+0x2dfb97)
    #3 0x7fd4d2ae747c in H5VL_blob_specific (/usr/lib/libhdf5.so.200+0x2e747c)
    #4 0x7fd4d2ad4153  (/usr/lib/libhdf5.so.200+0x2d4153)
    #5 0x7fd4d2a5f4b2 in H5T__conv_vlen (/usr/lib/libhdf5.so.200+0x25f4b2)
    #6 0x7fd4d2a508f0 in H5T_convert (/usr/lib/libhdf5.so.200+0x2508f0)
    #7 0x7fd4d28c7f94 in H5D_get_create_plist (/usr/lib/libhdf5.so.200+0xc7f94)
    #8 0x7fd4d2aef860 in H5VL__native_dataset_get (/usr/lib/libhdf5.so.200+0x2ef860)
    #9 0x7fd4d2ad4d47  (/usr/lib/libhdf5.so.200+0x2d4d47)
    #10 0x7fd4d2adcd31 in H5VL_dataset_get (/usr/lib/libhdf5.so.200+0x2dcd31)
    #11 0x7fd4d28a087c in H5Dget_create_plist (/usr/lib/libhdf5.so.200+0xa087c)
    #12 0x7fd4d2cad1a3 in nc4_get_var_meta (/usr/lib/libnetcdf.so.19+0xad1a3)
    #13 0x7fd4d2cad920 in nc4_hdf5_find_grp_var_att (/usr/lib/libnetcdf.so.19+0xad920)
    #14 0x7fd4d2cb3f37 in NC4_HDF5_inq_var_all (/usr/lib/libnetcdf.so.19+0xb3f37)
    #15 0x7fd4d2c2fc96 in nc_inq_var (/usr/lib/libnetcdf.so.19+0x2fc96)
    #16 0x7fd4d2c2fcd7 in nc_inq_varname (/usr/lib/libnetcdf.so.19+0x2fcd7)
    #17 0x7fd4d447ce1d  (/usr/lib/gdalplugins/gdal_netCDF.so+0x62e1d)
    #18 0x7fd4d4475b63  (/usr/lib/gdalplugins/gdal_netCDF.so+0x5bb63)
    #19 0x7fd4d447c711  (/usr/lib/gdalplugins/gdal_netCDF.so+0x62711)
    #20 0x558a18608657 in main::{lambda()#1}::operator()() const (/home/grayshade/gdal-threads/gdal-threads+0x2657)
    #21 0x558a186094c9 in void std::__invoke_impl<void, main::{lambda()#1}>(std::__invoke_other, main::{lambda()#1}&&) (/home/grayshade/gdal-threads/gdal-threads+0x34c9)
    #22 0x558a1860944f in std::__invoke_result<main::{lambda()#1}>::type std::__invoke<main::{lambda()#1}>(main::{lambda()#1}&&) (/home/grayshade/gdal-threads/gdal-threads+0x344f)
    #23 0x558a186093a9 in void std::thread::_Invoker<std::tuple<main::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (/home/grayshade/gdal-threads/gdal-threads+0x33a9)
    #24 0x558a18609351 in std::thread::_Invoker<std::tuple<main::{lambda()#1}> >::operator()() (/home/grayshade/gdal-threads/gdal-threads+0x3351)
    #25 0x558a18609319 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::{lambda()#1}> > >::_M_run() (/home/grayshade/gdal-threads/gdal-threads+0x3319)
    #26 0x7fd4d94d62f2 in execute_native_thread_routine /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
    #27 0x7fd4d929f78c  (/usr/lib/libc.so.6+0x8678c)
    #28 0x7fd4d93208e3 in __clone (/usr/lib/libc.so.6+0x1078e3)

Important note: it only seems to happen when opening the same file. If I make a copy of the .nc under another name and use that in the second thread, it doesn't crash any more.

The 2D API might be affected too; I haven't tried it.

Steps to reproduce the problem.

// g++ gdal-threads.cpp -o gdal-threads -pthread -lgdal -fsanitize=address && ./gdal-threads

#include <cstdlib>  // exit
#include <memory>   // std::unique_ptr
#include <thread>

#include "gdal_priv.h"

int main()
{
    GDALAllRegister();
    for (int i = 0; i < 1000; i++) {
        std::thread t1([] {
            auto poDataset = std::unique_ptr<GDALDataset>(
                GDALDataset::Open("alldatatypes.nc", GDAL_OF_MULTIDIM_RASTER));
            if (!poDataset) {
                exit(1);
            }
            auto poRootGroup = poDataset->GetRootGroup();
            if (!poRootGroup) {
                exit(1);
            }
            auto poVar = poRootGroup->OpenMDArray("string_var");
            if (!poVar) {
                exit(1);
            }
        });
        std::thread t2([] {
            std::unique_ptr<GDALDataset>(
                GDALDataset::Open("alldatatypes.nc", GDAL_OF_MULTIDIM_RASTER));
        });
        t1.join();
        t2.join();
    }
    return 0;
}

Operating system

Arch Linux x64

GDAL version and provenance

  • gdal 3.5.1
  • netcdf 4.9.0
  • hdf5 1.12.2

All of the above are distro packages. I think Arch carries a couple of patches, but the ones for netcdf don't seem relevant.

alldatatypes.zip

@lnicola
Contributor Author

lnicola commented Aug 23, 2022

This also happens in osgeo/gdal:ubuntu-full-3.5.1, but not in osgeo/gdal:latest. It might have been fixed, either in GDAL, or in one of the other libraries.

EDIT: actually no, it still crashes in latest.

ChristianBeilschmidt added a commit to ChristianBeilschmidt/rust-gdal that referenced this issue Aug 24, 2022
@lnicola changed the title from "Crash when in MDArray API when opening same file from multiple threads" to "Crash in MDArray API when opening same file from multiple threads" on Sep 2, 2022
@rouault self-assigned this on Sep 3, 2022
@rouault
Member

rouault commented Sep 3, 2022

A reproducer involving only the netCDF API has been provided in Unidata/netcdf-c#2496. I can't see any reasonable workaround on the GDAL side for this.

@lnicola
Contributor Author

lnicola commented Sep 3, 2022

Thanks a lot for looking into this.

We're actually seeing a similar crash even when opening different files, so it might run deeper than this 🥲.

@rouault
Member

rouault commented Sep 3, 2022

#6311 seems to be a workaround for the reproducer of that ticket. (I didn't manage to reproduce an issue when opening 2 different files.)

@rouault closed this as completed in f3392bc on Sep 6, 2022
rouault added a commit that referenced this issue on Sep 6, 2022
netCDF (multidim): workaround crash with using same file in 2 differents threads (each thread with its own dataset object) (fixes #6253)