MPI error #8447

Open
mikilterribile opened this issue May 3, 2024 · 6 comments
Labels
triage Please triage and relabel this issue

Comments

@mikilterribile

I have created a very complex application in Python that uses many libraries. I am now trying to freeze it with PyInstaller, but I get two errors. The first concerns GDAL, and I don't think it is too important; the second concerns MPI and aborts the whole process when I try to open my .exe file. The errors are the following:

"Microsoft Windows [Version 10.0.18362.959]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\miche>cd C:\Users\miche\PycharmProjects\pythonProjectMichele\dist\SOLO_GIS

C:\Users\miche\PycharmProjects\pythonProjectMichele\dist\SOLO_GIS>SOLO_GIS.exe
pyogrio\core.py:23: RuntimeWarning: Could not detect GDAL data files. Set GDAL_DATA environment variable to the correct path.
Abort(1090191) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
MPIR_Init_thread(189)........:
MPID_Init(1561)..............:
MPIDI_OFI_mpi_init_hook(1546):
(unknown)(): Unknown error class
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1090191
:
system msg for write_line failure : No error
Abort(1090191) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
MPIR_Init_thread(189)........:
MPID_Init(1561)..............:
MPIDI_OFI_mpi_init_hook(1546):
(unknown)(): Unknown error class
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1090191
:
system msg for write_line failure : No error"

I just want to add some information: I think this error is due to the anuga library and its MPI sub-library. Anuga is a hydraulic library that uses MPI multiprocessing to speed up the computation.
Thank you very much in advance for any kind of help and support. Best regards, Michele Zucchelli.

@mikilterribile mikilterribile added the triage Please triage and relabel this issue label May 3, 2024
@rokm
Member

rokm commented May 3, 2024

Can you provide a minimal example that triggers this error?

@mikilterribile
Author

mikilterribile commented May 3, 2024

Yes, this error occurs when I try to open the .exe file previously created using the following .spec file:

# -*- mode: python ; coding: utf-8 -*-

block_cipher = None


a = Analysis(
    ['SOLO_GIS.py'],
    pathex=['C:\\Users\\miche\\anaconda3\\envs\\ambiente_anuga_completo\\Lib\\site-packages'],
    binaries=[('C:\\Users\\miche\\anaconda3\\envs\\ambiente_anuga_completo\\Lib\\site-packages\\NumbaMinpack', 'NumbaMinpack'), ('C:\\Users\\miche\\anaconda3\\envs\\ambiente_anuga_completo\\Library\\bin\\gdal.dll', 'bin')],
    datas=[('C:\\Users\\miche\\anaconda3\\envs\\ambiente_anuga_completo/Lib/site-packages/numpy-1.26.4.dist-info/*', 'numpy.dist-info'),('C:\\Users\\miche\\PycharmProjects\\pythonProjectMichele\\program_data\\locale\\', 'locale\\'), ('C:\\Users\\miche\\PycharmProjects\\pythonProjectMichele\\program_data\\locale\\it\\LC_MESSAGES', 'locale\\it\\LC_MESSAGES'),('C:\\Users\\miche\\PycharmProjects\\pythonProjectMichele\\program_data\\images\\3D_icon.png', 'program_data\\images'),....(many others)........'program_data\\images'),('C:\\Users\\miche\\PycharmProjects\\pythonProjectMichele\\program_data\\images\\zoom_rectangle_icon.png', 'program_data\\images')],
    hiddenimports=["rasterio.sample", 'rasterio._shim', 'openpyxl.cell._writer', 'rasterio._version', 'pyogrio._geometry', 'anuga', "mpi4py"],
    hookspath=[],
    hooksconfig={},
    runtime_hooks=[],
    excludes=[],
    noarchive=False,
)



a.datas += Tree('C:\\Users\\miche\\anaconda3\\envs\\ambiente_anuga_completo\\Lib\\site-packages\\osgeo\\', prefix='osgeo')
a.datas += Tree('C:\\Users\\miche\\anaconda3\\envs\\ambiente_anuga_completo\\Lib\\site-packages\\osgeo_utils', 'osgeo_utils')
a.datas += Tree('C:\\Users\\miche\\anaconda3\\envs\\ambiente_anuga_completo\\Lib\\site-packages\\rasterio\\', prefix='rasterio')
a.datas += Tree('C:\\Users\\miche\\anaconda3\\envs\\ambiente_anuga_completo\\Lib\\xml', prefix='xml')



pyz = PYZ(a.pure)

exe = EXE(
    pyz,
    a.scripts,
    [],
    exclude_binaries=True,
    name='SOLO_GIS',
    debug=False,
    bootloader_ignore_signals=False,
    strip=False,
    upx=True,
    console=True,
    disable_windowed_traceback=False,
    argv_emulation=False,
    target_arch=None,
    codesign_identity=None,
    entitlements_file=None,
)
coll = COLLECT(
    exe,
    a.binaries,
    a.datas,
    strip=False,
    upx=True,
    upx_exclude=[],
    name='SOLO_GIS',
)

The .exe file seems to be generated correctly in the "dist" folder, but once I try to open it the above error occurs.
If you need further information, here is the YAML file of my conda environment:

name: ambiente_anuga_completo
channels:
  - conda-forge
  - defaults
dependencies:
  - affine=2.4.0=pyhd8ed1ab_0
  - anuga=3.1.9=py310hbbfc1a7_1
  - asttokens=2.4.1=pyhd8ed1ab_0
  - aws-c-auth=0.7.16=h7613915_8
  - aws-c-cal=0.6.10=hf6fcf4e_2
  - aws-c-common=0.9.14=hcfcfb64_0
  - aws-c-compression=0.2.18=hf6fcf4e_2
  - aws-c-event-stream=0.4.2=h3df98b0_6
  - aws-c-http=0.8.1=h4e3df0f_7
  - aws-c-io=0.14.6=hf0b8b6f_2
  - aws-c-mqtt=0.10.3=h96fac68_2
  - aws-c-s3=0.5.5=h08df315_0
  - aws-c-sdkutils=0.1.15=hf6fcf4e_2
  - aws-checksums=0.1.18=hf6fcf4e_2
  - aws-crt-cpp=0.26.4=h944602d_3
  - aws-sdk-cpp=1.11.267=hfaf0dd0_4
  - azure-core-cpp=1.11.1=h249a519_1
  - azure-storage-blobs-cpp=12.10.0=h91493d7_1
  - azure-storage-common-cpp=12.5.0=h91493d7_4
  - blosc=1.21.5=hdccc3a2_0
  - brotli=1.1.0=hcfcfb64_1
  - brotli-bin=1.1.0=hcfcfb64_1
  - bzip2=1.0.8=hcfcfb64_5
  - c-ares=1.28.1=hcfcfb64_0
  - ca-certificates=2024.2.2=h56e8100_0
  - cairo=1.18.0=h1fef639_0
  - certifi=2024.2.2=pyhd8ed1ab_0
  - cfitsio=4.4.0=h9b0cee5_0
  - cftime=1.6.3=py310h3e78b6c_0
  - colorama=0.4.6=pyhd8ed1ab_0
  - contourpy=1.2.0=py310h232114e_0
  - cycler=0.12.1=pyhd8ed1ab_0
  - cython=3.0.10=py310h00ffb61_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - dill=0.3.8=pyhd8ed1ab_0
  - exceptiongroup=1.2.0=pyhd8ed1ab_2
  - executing=2.0.1=pyhd8ed1ab_0
  - expat=2.6.2=h63175ca_0
  - fmt=10.2.1=h181d51b_0
  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
  - font-ttf-inconsolata=3.000=h77eed37_0
  - font-ttf-source-code-pro=2.038=h77eed37_0
  - font-ttf-ubuntu=0.83=h77eed37_1
  - fontconfig=2.14.2=hbde0cde_0
  - fonts-conda-ecosystem=1=0
  - fonts-conda-forge=1=0
  - fonttools=4.50.0=py310h8d17308_0
  - freetype=2.12.1=hdaf720e_2
  - freexl=2.0.0=h8276f4a_0
  - future=1.0.0=pyhd8ed1ab_0
  - gdal=3.8.4=py310h7028bf2_5
  - geos=3.12.1=h1537add_0
  - geotiff=1.7.1=hbf5ca3a_15
  - gettext=0.21.1=h5728263_0
  - glib=2.80.0=h39d0aa6_1
  - glib-tools=2.80.0=h0a98069_1
  - gst-plugins-base=1.22.9=h001b923_1
  - gstreamer=1.22.9=hb4038d2_1
  - hdf4=4.2.15=h5557f11_7
  - hdf5=1.14.3=nompi_h73e8ff5_100
  - icu=73.2=h63175ca_0
  - impi_rt=2021.11.0=h57928b3_49500
  - iniconfig=2.0.0=pyhd8ed1ab_0
  - intel-openmp=2024.1.0=h57928b3_964
  - ipython=8.22.2=pyh7428d3b_0
  - jedi=0.19.1=pyhd8ed1ab_0
  - kealib=1.5.3=hd248416_0
  - kiwisolver=1.4.5=py310h232114e_1
  - krb5=1.21.2=heb0366b_0
  - lcms2=2.16=h67d730c_0
  - lerc=4.0.0=h63175ca_0
  - libabseil=20240116.1=cxx17_h63175ca_2
  - libaec=1.1.3=h63175ca_0
  - libarchive=3.7.2=h313118b_1
  - libblas=3.9.0=21_win64_mkl
  - libboost-headers=1.84.0=h57928b3_2
  - libbrotlicommon=1.1.0=hcfcfb64_1
  - libbrotlidec=1.1.0=hcfcfb64_1
  - libbrotlienc=1.1.0=hcfcfb64_1
  - libcblas=3.9.0=21_win64_mkl
  - libclang13=18.1.2=default_hf64faad_1
  - libcrc32c=1.1.2=h0e60522_0
  - libcurl=8.7.1=hd5e4a3a_0
  - libdeflate=1.20=hcfcfb64_0
  - libexpat=2.6.2=h63175ca_0
  - libffi=3.4.2=h8ffe710_5
  - libgdal=3.8.4=hf83a0e2_5
  - libglib=2.80.0=h39d0aa6_1
  - libgoogle-cloud=2.22.0=h9cad5c0_1
  - libgoogle-cloud-storage=2.22.0=hb581fae_1
  - libgrpc=1.62.1=h5273850_0
  - libhwloc=2.9.3=default_haede6df_1009
  - libiconv=1.17=hcfcfb64_2
  - libjpeg-turbo=3.0.0=hcfcfb64_1
  - libkml=1.3.0=haf3e7a6_1018
  - liblapack=3.9.0=21_win64_mkl
  - libnetcdf=4.9.2=nompi_h07c049d_113
  - libogg=1.3.4=h8ffe710_1
  - libpng=1.6.43=h19919ed_0
  - libpq=16.2=hdb24f17_1
  - libprotobuf=4.25.3=h503648d_0
  - libre2-11=2023.09.01=hf8d8778_2
  - librttopo=1.1.0=h94c4f80_15
  - libspatialite=5.1.0=hf2f0abc_4
  - libsqlite=3.45.2=hcfcfb64_0
  - libssh2=1.11.0=h7dfc565_0
  - libtiff=4.6.0=hddb2be6_3
  - libvorbis=1.3.7=h0e60522_0
  - libwebp-base=1.3.2=hcfcfb64_0
  - libxcb=1.15=hcd874cb_0
  - libxml2=2.12.6=hc3477c8_1
  - libzip=1.10.1=h1d365fa_3
  - libzlib=1.2.13=hcfcfb64_5
  - lz4-c=1.9.4=hcfcfb64_0
  - lzo=2.10=he774522_1000
  - m2w64-gcc-libgfortran=5.3.0=6
  - m2w64-gcc-libs=5.3.0=7
  - m2w64-gcc-libs-core=5.3.0=7
  - m2w64-gmp=6.1.0=2
  - m2w64-libwinpthread-git=5.0.0.4634.697f757=2
  - matplotlib=3.8.3=py310h5588dad_0
  - matplotlib-base=3.8.3=py310hc9baf74_0
  - matplotlib-inline=0.1.6=pyhd8ed1ab_0
  - meshpy=2022.1.3=py310hecd3228_1
  - metis=5.1.1=h63175ca_2
  - minizip=4.0.5=h5bed578_0
  - mkl=2024.0.0=h66d3029_49657
  - mpi=1.0=impi
  - mpi4py=3.1.5=py310headf037_1
  - msys2-conda-epoch=20160418=1
  - munkres=1.1.4=pyh9f0ad1d_0
  - netcdf4=1.6.5=nompi_py310h6477780_100
  - numpy=1.26.4=py310hf667824_0
  - openjpeg=2.5.2=h3d672ee_0
  - openssl=3.2.1=hcfcfb64_1
  - packaging=24.0=pyhd8ed1ab_0
  - parso=0.8.3=pyhd8ed1ab_0
  - pcre2=10.43=h17e33f8_0
  - pickleshare=0.7.5=py_1003
  - pillow=10.3.0=py310hf5d6e66_0
  - pip=24.0=pyhd8ed1ab_0
  - pixman=0.43.4=h63175ca_0
  - platformdirs=4.2.0=pyhd8ed1ab_0
  - pluggy=1.4.0=pyhd8ed1ab_0
  - ply=3.11=py_1
  - pmw=2.0.1=py310h5588dad_1008
  - poppler=24.03.0=hc2f3c52_0
  - poppler-data=0.4.12=hd8ed1ab_0
  - postgresql=16.2=h94c9ec1_1
  - proj=9.3.1=he13c7e8_0
  - prompt-toolkit=3.0.42=pyha770c72_0
  - pthread-stubs=0.4=hcd874cb_1001
  - pthreads-win32=2.9.1=hfa6e2cd_3
  - pure_eval=0.2.2=pyhd8ed1ab_0
  - pygments=2.17.2=pyhd8ed1ab_0
  - pymetis=2023.1.1=py310h5fd4015_2
  - pyparsing=3.1.2=pyhd8ed1ab_0
  - pyproj=3.6.1=py310h05d47c7_5
  - pyqt=5.15.9=py310h1fd54f2_5
  - pyqt5-sip=12.12.2=py310h00ffb61_5
  - pytest=8.1.1=pyhd8ed1ab_0
  - python=3.10.14=h4de0772_0_cpython
  - python-dateutil=2.9.0=pyhd8ed1ab_0
  - python_abi=3.10=4_cp310
  - pytools=2024.1.1=pyhd8ed1ab_0
  - pytz=2024.1=pyhd8ed1ab_0
  - qt-main=5.15.8=h9e85ed6_20
  - re2=2023.09.01=hd3b24a8_2
  - scipy=1.12.0=py310hf667824_2
  - setuptools=69.2.0=pyhd8ed1ab_0
  - sip=6.7.12=py310h00ffb61_0
  - six=1.16.0=pyh6c4a22f_0
  - snappy=1.1.10=hfb803bf_0
  - spdlog=1.12.0=h64d2f7d_2
  - sqlite=3.45.2=hcfcfb64_0
  - stack_data=0.6.2=pyhd8ed1ab_0
  - tbb=2021.11.0=h91493d7_1
  - tiledb=2.21.1=h25b666a_1
  - tk=8.6.13=h5226925_1
  - toml=0.10.2=pyhd8ed1ab_0
  - tomli=2.0.1=pyhd8ed1ab_0
  - tornado=6.4=py310h8d17308_0
  - traitlets=5.14.2=pyhd8ed1ab_0
  - typing_extensions=4.10.0=pyha770c72_0
  - tzdata=2024a=h0c530f3_0
  - ucrt=10.0.22621.0=h57928b3_0
  - unicodedata2=15.1.0=py310h8d17308_0
  - uriparser=0.9.7=h1537add_1
  - utm=0.7.0=pyhd8ed1ab_0
  - vc=14.3=hcf57466_18
  - vc14_runtime=14.38.33130=h82b7239_18
  - vs2015_runtime=14.38.33130=hcb4865c_18
  - wcwidth=0.2.13=pyhd8ed1ab_0
  - wheel=0.43.0=pyhd8ed1ab_1
  - xerces-c=3.2.5=h63175ca_0
  - xorg-libxau=1.0.11=hcd874cb_0
  - xorg-libxdmcp=1.1.3=hcd874cb_0
  - xz=5.2.6=h8d14728_0
  - zlib=1.2.13=hcfcfb64_5
  - zstd=1.5.5=h12be248_0
  - cartopy
  - contextily
  - datashader
  - fiona
  - geopandas
  - geopy
  - gettext
  - mercantile
  - mpmath
  - numba
  - owslib
  - pandas
  - pyflwdir
  - pyinstaller
  - pyinstaller-hooks-contrib
  - pykrige
  - pyogrio
  - rasterio
  - rasterstats
  - rioxarray
  - shapely
  - wxpython
  - xarray
  - dask
  - pip:
      - netlicensing-client==0.0.5
      - numbaminpack==0.1.3

@rokm
Member

rokm commented May 3, 2024

I meant a minimal code example, so I can try reproducing the error.

Because the basic mpi4py Point-to-Point communication example seems to run fine if I freeze it.

And so does the basic anuga example.

@mikilterribile
Author

mikilterribile commented May 3, 2024

Ah OK, sorry for the misunderstanding. Here is a minimal example of the code that gives the same error:

import os
import time as mtime
import sys
#from osgeo import gdal
import netCDF4
import anuga
from anuga import distribute, myid, numprocs, finalize, barrier  # myid is a unique identifier of the process number (in multiprocessing)
from anuga.operators.anuga_sed import Sed_transport_operator
from anuga import Inflow
from numpy import allclose
import numpy as np
from mpi4py import MPI
import warnings
from anuga.structures import inlet
import shutil

print("hello")

and the error:

"C:\Users\miche\PycharmProjects\pythonProjectMichele\dist\prova>C:\Users\miche\PycharmProjects\pythonProjectMichele\dist\prova\prova.exe
Abort(1090447) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
MPIR_Init_thread(192)........:
MPID_Init(1645)..............:
MPIDI_OFI_mpi_init_hook(1574):
(unknown)(): Unknown error class
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1090447
:
system msg for write_line failure : No error
Abort(1090447) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack:
MPIR_Init_thread(192)........:
MPID_Init(1645)..............:
MPIDI_OFI_mpi_init_hook(1574):
(unknown)(): Unknown error class
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1090447
:
system msg for write_line failure : No error"

@rokm
Member

rokm commented May 3, 2024

Turns out I could not reproduce the error because I was using mpi4py 3.1.4 from the conda main channel, which uses msmpi, whereas your environment has mpi4py 3.1.5 from conda-forge, which uses impi_rt. The latter requires the dynamically-loaded Library/bin/libfabric/libfabric.dll to be collected.
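
To check which case applies to a given environment, something like the following could be used. This is only a rough sketch and not part of PyInstaller: the helper name `find_libfabric` is made up for illustration, and it assumes the DLL lives somewhere under `Library/bin` of the conda prefix, as in the path mentioned above.

```python
import os
import sys
from pathlib import Path


def find_libfabric(prefix):
    """Return paths of any libfabric.dll found under <prefix>/Library/bin."""
    bin_dir = Path(prefix) / "Library" / "bin"
    if not bin_dir.is_dir():
        return []
    # Search recursively, because the DLL sits in a `libfabric` subdirectory.
    return sorted(p for p in bin_dir.rglob("*") if p.name.lower() == "libfabric.dll")


if __name__ == "__main__":
    # CONDA_PREFIX is set by `conda activate`; fall back to the interpreter prefix.
    prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    hits = find_libfabric(prefix)
    if hits:
        for hit in hits:
            print("found:", hit)
    else:
        print("no libfabric.dll under", prefix)
```

If the script reports a hit, the environment most likely uses the impi_rt-based build and the extra collection step applies.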

In the same directory as you have the .spec file, create a directory called extra-hooks, and a hook file in it (called hook-mpi4py.py):

# extra-hooks/hook-mpi4py.py
from PyInstaller.utils.hooks import get_installer, logger

binaries = []

if get_installer('mpi4py') == 'conda':
    from PyInstaller.utils.hooks import conda
    
    # conda-forge builds of `mpi4py` depend on `impi_rt`, from which we need to collect dynamically-loaded
    # `Library/bin/libfabric/libfabric.dll`. The main conda channel builds depend on `msmpi`, which does not seem to
    # require any extra collection steps.
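    try:
        impi_rt_files = conda.files('impi_rt')
    except ImportError:
        # `impi_rt` is not available as a conda package (e.g., an msmpi-based build); nothing to collect.
        impi_rt_files = []
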
    for impi_rt_file in impi_rt_files:
        if impi_rt_file.name != 'libfabric.dll':
            continue
        impi_rt_file = impi_rt_file.locate()
        logger.info("hook-mpi4py: collecting %r", str(impi_rt_file))
        binaries += [(str(impi_rt_file), '.')]

In the spec file, add the (relative) path to this extra-hooks directory via the hookspath argument:

a = Analysis(
    ...
    hookspath=['extra-hooks'],
    ...
)

Then try to rebuild using the modified spec file (preferably with the --clean command-line option added).

After the rebuild, you should see libfabric.dll collected into the top-level application directory, and you should be able to get past that error.
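
A quick way to confirm the DLL was collected is to search the build output for it. The snippet below is a sketch under a few assumptions: the output directory `dist/SOLO_GIS` is taken from the spec file earlier in this thread, and the helper name `collected_libfabric` is hypothetical. It searches recursively because, as far as I know, PyInstaller 6.x onedir builds place collected files under an `_internal` subdirectory rather than next to the executable.

```python
from pathlib import Path


def collected_libfabric(app_dir):
    """Return every libfabric.dll found anywhere below the frozen app directory."""
    app_dir = Path(app_dir)
    if not app_dir.is_dir():
        return []
    return sorted(app_dir.rglob("libfabric.dll"))


if __name__ == "__main__":
    # Assumed output location, matching name='SOLO_GIS' in the spec file.
    app_dir = Path("dist") / "SOLO_GIS"
    hits = collected_libfabric(app_dir)
    if hits:
        for hit in hits:
            print("collected:", hit)
    else:
        print("libfabric.dll not found under", app_dir)
```

If nothing is found, the build log should show whether the `hook-mpi4py: collecting ...` message from the hook was emitted at all.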

@mikilterribile
Author

wow, it seems to be working correctly on a quick test. Thank you very much!!!!!
