Segfault when using sentence-transformers (2.7.0) with Sklearn (1.3.0) #2631

delip opened this issue May 7, 2024 · 7 comments


delip commented May 7, 2024

I have a strange failure case.

  1. The following code works fine:
    [Screenshot: CleanShot 2024-05-07 at 18 22 24]

  2. If I add a single import from sklearn, I get a segfault!
    [Screenshot: CleanShot 2024-05-07 at 18 21 15]

Any idea what could be happening? I have tried this in a fresh environment and still see it.
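
One way to narrow this down is a small harness that runs each variant in a fresh interpreter with Python's built-in `faulthandler` enabled (`python -X faulthandler`). The crash then kills only the child process, and the Python-level traceback it prints can be pasted into the issue. This is a sketch under assumptions: `BASELINE`, `SKLEARN_IMPORT`, and the model name are hypothetical placeholders, not the code from the screenshots, and the return-code check assumes a POSIX system such as macOS or Linux.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str) -> subprocess.CompletedProcess:
    """Run `snippet` in a fresh interpreter with faulthandler enabled.

    `-X faulthandler` makes CPython print a Python-level traceback to stderr
    when the process receives a fatal signal such as SIGSEGV.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )


def died_from_segfault(proc: subprocess.CompletedProcess) -> bool:
    # On POSIX, a return code of -N means the child was killed by signal N.
    return proc.returncode == -signal.SIGSEGV


# Hypothetical placeholders: paste the exact code from the first screenshot
# into BASELINE and the extra sklearn line into SKLEARN_IMPORT.
BASELINE = (
    "from sentence_transformers import SentenceTransformer\n"
    "model = SentenceTransformer('all-MiniLM-L6-v2')\n"
    "print(model.encode(['hello world']).shape)\n"
)
SKLEARN_IMPORT = "import sklearn\n"


def compare(baseline: str = BASELINE, extra_import: str = SKLEARN_IMPORT) -> None:
    # Where the sklearn import sits in the failing script is not known,
    # so both placements are tried alongside the baseline.
    cases = {
        "baseline": baseline,
        "sklearn first": extra_import + baseline,
        "sklearn last": baseline + extra_import,
    }
    for label, snippet in cases.items():
        proc = run_isolated(snippet)
        status = "SEGFAULT" if died_from_segfault(proc) else f"exit {proc.returncode}"
        print(f"{label:<14} {status}")
        if proc.returncode != 0:
            # Tail of stderr: the faulthandler traceback or a normal error.
            print(proc.stderr[-2000:])
```

Calling `compare()` after filling in the real snippets prints one status line per case. Running each case in its own process keeps a crash in one variant from hiding the results of the others.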

Environment details:

$ python --version
Python 3.12.2

$ pip list
Package                           Version
--------------------------------- ------------
aiobotocore                       2.7.0
aiohttp                           3.9.3
aioitertools                      0.7.1
aiosignal                         1.2.0
alabaster                         0.7.12
altair                            5.0.1
annotated-types                   0.6.0
anyio                             4.2.0
appdirs                           1.4.4
applaunchservices                 0.3.0
appnope                           0.1.3
appscript                         1.1.2
argon2-cffi                       21.3.0
argon2-cffi-bindings              21.2.0
arrow                             1.2.3
astroid                           2.14.2
astropy                           5.3.4
asttokens                         2.0.5
async-lru                         2.0.4
atomicwrites                      1.4.0
attrs                             23.1.0
Automat                           20.2.0
autopep8                          2.0.4
Babel                             2.11.0
backcall                          0.2.0
bcrypt                            3.2.0
beautifulsoup4                    4.12.2
binaryornot                       0.4.4
black                             23.11.0
bleach                            4.1.0
blinker                           1.6.2
bokeh                             3.3.4
botocore                          1.31.64
Bottleneck                        1.3.7
Brotli                            1.0.9
cachetools                        4.2.2
certifi                           2024.2.2
cffi                              1.16.0
chardet                           4.0.0
charset-normalizer                2.0.4
click                             8.1.7
cloudpickle                       2.2.1
colorama                          0.4.6
colorcet                          3.1.0
comm                              0.2.1
constantly                        23.10.4
contourpy                         1.2.0
cookiecutter                      2.6.0
cryptography                      42.0.5
cssselect                         1.2.0
cycler                            0.11.0
cytoolz                           0.12.2
dask                              2023.11.0
datasets                          2.19.1
datashader                        0.16.0
debugpy                           1.6.7
decorator                         5.1.1
defusedxml                        0.7.1
diff-match-patch                  20200713
dill                              0.3.8
distributed                       2023.11.0
distro                            1.9.0
docopt                            0.6.2
docstring-to-markdown             0.11
docutils                          0.18.1
entrypoints                       0.4
et-xmlfile                        1.1.0
executing                         0.8.3
fastjsonschema                    2.16.2
filelock                          3.13.1
flake8                            7.0.0
Flask                             2.2.5
fonttools                         4.25.0
frozenlist                        1.4.0
fsspec                            2023.10.0
gensim                            4.3.2
gitdb                             4.0.7
GitPython                         3.1.37
greenlet                          3.0.1
h11                               0.14.0
h5py                              3.9.0
HeapDict                          1.0.1
holoviews                         1.18.3
httpcore                          1.0.5
httpx                             0.27.0
huggingface-hub                   0.23.0
hvplot                            0.9.2
hyperlink                         21.0.0
idna                              3.4
imagecodecs                       2023.1.23
imageio                           2.33.1
imagesize                         1.4.1
imbalanced-learn                  0.11.0
importlib-metadata                7.0.1
incremental                       22.10.0
inflection                        0.5.1
iniconfig                         1.1.1
intake                            0.6.8
intervaltree                      3.1.0
ipykernel                         6.28.0
ipython                           8.12.3
ipython-genutils                  0.2.0
ipywidgets                        7.8.1
isort                             5.9.3
itemadapter                       0.3.0
itemloaders                       1.1.0
itsdangerous                      2.0.1
jaraco.classes                    3.2.1
jedi                              0.18.1
jellyfish                         1.0.1
Jinja2                            3.1.3
jmespath                          1.0.1
joblib                            1.2.0
json5                             0.9.6
jsonschema                        4.19.2
jsonschema-specifications         2023.7.1
jupyter                           1.0.0
jupyter_client                    8.6.0
jupyter-console                   6.6.3
jupyter_core                      5.5.0
jupyter-events                    0.8.0
jupyter-lsp                       2.2.0
jupyter_server                    2.10.0
jupyter_server_terminals          0.4.4
jupyterlab                        4.0.11
jupyterlab-pygments               0.1.2
jupyterlab_server                 2.25.1
jupyterlab-widgets                1.0.0
keyring                           24.3.1
kiwisolver                        1.4.4
lazy_loader                       0.3
lazy-object-proxy                 1.6.0
lckr_jupyterlab_variableinspector 3.1.0
linkify-it-py                     2.0.0
litellm                           1.35.38
llvmlite                          0.42.0
lmdb                              1.4.1
locket                            1.0.0
lxml                              4.9.3
lz4                               4.3.2
Markdown                          3.4.1
markdown-it-py                    2.2.0
MarkupSafe                        2.1.3
matplotlib                        3.8.0
matplotlib-inline                 0.1.6
mccabe                            0.7.0
mdit-py-plugins                   0.3.0
mdurl                             0.1.0
mistune                           2.0.4
more-itertools                    10.1.0
mpmath                            1.3.0
msgpack                           1.0.3
multidict                         6.0.4
multipledispatch                  0.6.0
multiprocess                      0.70.16
munkres                           1.1.4
mypy                              1.8.0
mypy-extensions                   1.0.0
nbclient                          0.8.0
nbconvert                         7.16.4
nbformat                          5.9.2
nest-asyncio                      1.6.0
networkx                          3.1
nltk                              3.8.1
notebook                          7.0.8
notebook_shim                     0.2.3
numba                             0.59.0
numexpr                           2.8.7
numpy                             1.26.4
numpydoc                          1.5.0
openai                            1.25.2
openpyxl                          3.0.10
overrides                         7.4.0
packaging                         23.2
pandas                            2.1.4
pandocfilters                     1.5.0
panel                             1.3.8
param                             2.1.0
parsel                            1.8.1
parso                             0.8.3
partd                             1.4.1
pathspec                          0.10.3
patsy                             0.5.3
pexpect                           4.8.0
pickleshare                       0.7.5
pillow                            10.2.0
pip                               23.3.1
pipreqs                           0.5.0
platformdirs                      3.10.0
plotly                            5.19.0
pluggy                            1.0.0
ply                               3.11
prometheus-client                 0.14.1
prompt-toolkit                    3.0.43
Protego                           0.1.16
protobuf                          3.20.3
psutil                            5.9.0
ptyprocess                        0.7.0
pure-eval                         0.2.2
py-cpuinfo                        9.0.0
pyarrow                           14.0.2
pyarrow-hotfix                    0.6
pyasn1                            0.4.8
pyasn1-modules                    0.2.8
pycodestyle                       2.11.1
pycparser                         2.21
pyct                              0.5.0
pycurl                            7.45.2
pydantic                          2.7.1
pydantic_core                     2.18.2
pydeck                            0.8.0
PyDispatcher                      2.0.5
pydocstyle                        6.3.0
pyerfa                            2.0.0
pyflakes                          3.2.0
Pygments                          2.15.1
pylint                            2.16.2
pylint-venv                       3.0.3
pyls-spyder                       0.4.0
pyobjc-core                       10.1
pyobjc-framework-Cocoa            10.1
pyobjc-framework-CoreServices     10.1
pyobjc-framework-FSEvents         10.1
pyodbc                            5.0.1
pyOpenSSL                         24.0.0
pyparsing                         3.0.9
PyQt5                             5.15.10
PyQt5-sip                         12.13.0
PyQtWebEngine                     5.15.6
PySocks                           1.7.1
pytest                            7.4.0
python-dateutil                   2.8.2
python-dotenv                     1.0.1
python-json-logger                2.0.7
python-lsp-black                  2.0.0
python-lsp-jsonrpc                1.1.2
python-lsp-server                 1.10.0
python-slugify                    5.0.2
python-snappy                     0.6.1
pytoolconfig                      1.2.6
pytz                              2023.3.post1
pyviz_comms                       3.0.2
pywavelets                        1.5.0
PyYAML                            6.0.1
pyzmq                             25.1.2
QDarkStyle                        3.2.3
qstylizer                         0.2.2
QtAwesome                         1.2.2
qtconsole                         5.5.1
QtPy                              2.4.1
queuelib                          1.6.2
referencing                       0.30.2
regex                             2023.10.3
requests                          2.31.0
requests-file                     1.5.1
rfc3339-validator                 0.1.4
rfc3986-validator                 0.1.1
rich                              13.3.5
rope                              1.12.0
rpds-py                           0.10.6
Rtree                             1.0.1
s3fs                              2023.10.0
safetensors                       0.4.3
scikit-image                      0.22.0
scikit-learn                      1.3.0
scipy                             1.11.4
Scrapy                            2.11.1
seaborn                           0.12.2
Send2Trash                        1.8.2
sentence-transformers             2.7.0
service-identity                  18.1.0
setuptools                        68.2.2
sip                               6.7.12
six                               1.16.0
smart-open                        5.2.1
smmap                             4.0.0
sniffio                           1.3.0
snowballstemmer                   2.2.0
sortedcontainers                  2.4.0
soupsieve                         2.5
Sphinx                            5.0.2
sphinxcontrib-applehelp           1.0.2
sphinxcontrib-devhelp             1.0.2
sphinxcontrib-htmlhelp            2.0.0
sphinxcontrib-jsmath              1.0.1
sphinxcontrib-qthelp              1.0.3
sphinxcontrib-serializinghtml     1.1.5
spyder                            5.5.1
spyder-kernels                    2.5.0
SQLAlchemy                        2.0.25
stack-data                        0.2.0
statsmodels                       0.14.0
streamlit                         1.32.0
sympy                             1.12
tables                            3.9.2
tabulate                          0.9.0
tblib                             1.7.0
tenacity                          8.2.2
terminado                         0.17.1
text-unidecode                    1.3
textdistance                      4.2.1
threadpoolctl                     2.2.0
three-merge                       0.1.1
tifffile                          2023.4.12
tiktoken                          0.6.0
tinycss2                          1.2.1
tldextract                        3.2.0
tokenizers                        0.19.1
toml                              0.10.2
tomli                             2.0.1
tomlkit                           0.11.1
toolz                             0.12.0
torch                             2.3.0
tornado                           6.3.3
tqdm                              4.65.0
traitlets                         5.7.1
transformers                      4.40.2
Twisted                           23.10.0
typing_extensions                 4.9.0
tzdata                            2023.3
uc-micro-py                       1.0.1
ujson                             5.4.0
Unidecode                         1.2.0
urllib3                           2.0.3
w3lib                             2.1.2
watchdog                          2.1.6
wcwidth                           0.2.5
webencodings                      0.5.1
websocket-client                  0.58.0
Werkzeug                          2.2.3
whatthepatch                      1.0.2
wheel                             0.41.2
widgetsnbextension                3.6.6
wrapt                             1.14.1
wurlitzer                         3.0.2
xarray                            2023.6.0
xlwings                           0.29.1
xxhash                            3.4.1
xyzservices                       2022.9.0
yapf                              0.40.2
yarg                              0.1.9
yarl                              1.9.3
zict                              3.0.0
zipp                              3.17.0
zope.interface                    5.4.0

da03 commented May 8, 2024

Works fine for me using Python 3.9.16 though:

$ python --version
Python 3.9.16

$ pip list
Package                       Version
----------------------------- ------------------
accelerate                    0.24.1
aiohttp                       3.8.4
aiosignal                     1.3.1
appdirs                       1.4.4
async-timeout                 4.0.2
attrs                         23.1.0
bibtexparser                  1.4.1
blis                          0.7.11
brotlipy                      0.7.0
cachetools                    5.3.3
catalogue                     2.0.10
certifi                       2022.12.7
cffi                          1.15.1
charset-normalizer            2.0.4
click                         8.1.3
cloudpathlib                  0.16.0
colorama                      0.4.6
confection                    0.1.3
contourpy                     1.0.7
cryptography                  39.0.1
cupy-cuda113                  10.6.0
curated-tokenizers            0.0.8
curated-transformers          0.1.1
cycler                        0.11.0
cymem                         2.0.8
datasets                      2.19.0
DAWG-Python                   0.7.2
de-core-news-lg               3.7.0
deepspeed                     0.9.5
dill                          0.3.6
docker-pycreds                0.4.0
docopt-ng                     0.9.0
einops                        0.7.0
en-core-web-lg                3.7.0
en-core-web-trf               3.7.2
es-core-news-lg               3.7.0
evaluate                      0.4.0
fastrlock                     0.8.2
filelock                      3.9.0
fonttools                     4.39.3
fr-core-news-lg               3.7.0
frozenlist                    1.3.3
fsspec                        2023.10.0
gitdb                         4.0.10
GitPython                     3.1.31
gmpy2                         2.1.2
google-api-core               2.17.1
google-auth                   2.28.1
google-cloud-aiplatform       1.43.0
google-cloud-bigquery         3.17.2
google-cloud-core             2.4.1
google-cloud-resource-manager 1.12.2
google-cloud-storage          2.14.0
google-crc32c                 1.5.0
google-resumable-media        2.7.0
googleapis-common-protos      1.62.0
grpc-google-iam-v1            0.13.0
grpcio                        1.62.0
grpcio-status                 1.62.0
hjson                         3.1.0
huggingface-hub               0.22.2
idna                          3.4
importlib-resources           5.12.0
it-core-news-lg               3.7.0
ja-core-news-sm               3.7.0
ja-core-news-trf              3.7.2
Jinja2                        3.1.2
joblib                        1.2.0
kiwisolver                    1.4.4
ko-core-news-lg               3.7.0
ko-core-news-sm               3.7.0
langcodes                     3.3.0
latexcodec                    2.0.1
lxml                          4.9.3
MarkupSafe                    2.1.1
matplotlib                    3.7.1
mkl-fft                       1.3.1
mkl-random                    1.2.2
mkl-service                   2.4.0
mpmath                        1.2.1
multidict                     6.0.4
multiprocess                  0.70.14
murmurhash                    1.0.10
networkx                      2.8.4
ninja                         1.11.1
numpy                         1.23.5
openai                        0.28.0
packaging                     23.1
pandas                        2.0.1
pathtools                     0.1.2
pdfminer.six                  20221105
pdfplumber                    0.10.2
phonenumbers                  8.13.24
Pillow                        9.4.0
pip                           23.0.1
portalocker                   2.8.2
preshed                       3.0.9
presidio-analyzer             2.2.350
presidio-anonymizer           2.2.350
proto-plus                    1.23.0
protobuf                      4.25.3
psutil                        5.9.5
pt-core-news-lg               3.7.0
py-cpuinfo                    9.0.0
pyarrow                       14.0.1
pyarrow-hotfix                0.6
pyasn1                        0.5.1
pyasn1-modules                0.3.0
pybtex                        0.24.0
pycparser                     2.21
pycryptodome                  3.19.0
pydantic                      1.10.11
pylatexenc                    2.10
pymorphy3                     1.2.1
pymorphy3-dicts-ru            2.4.417150.4580142
pyOpenSSL                     23.0.0
pyparsing                     3.0.9
pypdfium2                     4.22.0
PySocks                       1.7.1
python-dateutil               2.8.2
pytz                          2023.3
PyYAML                        6.0
rebiber                       1.1.3
regex                         2023.3.23
requests                      2.28.1
requests-file                 1.5.1
responses                     0.18.0
rsa                           4.9
ru-core-news-lg               3.7.0
sacrebleu                     2.3.3
safetensors                   0.4.2
scikit-learn                  1.3.0
scipy                         1.10.1
sentence-transformers         2.7.0
sentencepiece                 0.1.99
sentry-sdk                    1.24.0
setproctitle                  1.3.2
setuptools                    66.0.0
shapely                       2.0.3
six                           1.16.0
sklearn                       0.0.post4
smart-open                    6.4.0
smmap                         5.0.0
spacy                         3.7.2
spacy-curated-transformers    0.2.0
spacy-legacy                  3.0.12
spacy-loggers                 1.0.5
spacy-pkuseg                  0.0.33
srsly                         2.4.8
SudachiDict-core              20230927
SudachiPy                     0.6.7
sympy                         1.11.1
tabulate                      0.9.0
tenacity                      8.2.3
termcolor                     2.3.0
thinc                         8.2.1
threadpoolctl                 3.1.0
tiktoken                      0.4.0
tldextract                    5.1.0
tokenizers                    0.15.1
torch                         2.0.0
torchaudio                    2.0.0
torchvision                   0.15.0
tqdm                          4.65.0
transformers                  4.38.0.dev0
triton                        2.0.0
tsv                           1.2
typer                         0.9.0
typing_extensions             4.5.0
tzdata                        2023.3
Unidecode                     1.3.7
urllib3                       1.26.15
wandb                         0.15.3
wasabi                        1.1.2
weasel                        0.3.4
wheel                         0.38.4
xxhash                        3.2.0
yarl                          1.9.2
zh-core-web-lg                3.7.0
zh-core-web-trf               3.7.2
zipp                          3.15.0

delip commented May 8, 2024

Could this be a Python 3.12 problem in sentence-transformers?

It worked for me, apart from a FutureWarning. I am using Python 3.12.0.

$ pip list
Package               Version
--------------------- ---------
certifi               2024.2.2
charset-normalizer    3.3.2
filelock              3.14.0
fsspec                2024.3.1
huggingface-hub       0.23.0
idna                  3.7
Jinja2                3.1.4
joblib                1.4.2
MarkupSafe            2.1.5
mpmath                1.3.0
networkx              3.3
numpy                 1.26.4
packaging             24.0
pillow                10.3.0
pip                   23.2.1
PyYAML                6.0.1
regex                 2024.4.28
requests              2.31.0
safetensors           0.4.3
scikit-learn          1.4.2
scipy                 1.13.0
sentence-transformers 2.7.0
sympy                 1.12
threadpoolctl         3.5.0
tokenizers            0.19.1
torch                 2.3.0
tqdm                  4.66.4
transformers          4.40.2
typing_extensions     4.11.0
urllib3               2.2.1

da03 commented May 8, 2024

It also works for me on Python 3.12.2 (the exact same version):

$ python --version
Python 3.12.2

$ pip list
Package                  Version
------------------------ ----------
certifi                  2024.2.2
charset-normalizer       3.3.2
filelock                 3.14.0
fsspec                   2024.3.1
huggingface-hub          0.23.0
idna                     3.7
Jinja2                   3.1.4
joblib                   1.4.2
MarkupSafe               2.1.5
mpmath                   1.3.0
networkx                 3.3
numpy                    1.26.4
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.1.105
packaging                24.0
pillow                   10.3.0
pip                      24.0
PyYAML                   6.0.1
regex                    2024.4.28
requests                 2.31.0
safetensors              0.4.3
scikit-learn             1.3.0
scipy                    1.13.0
sentence-transformers    2.7.0
setuptools               69.5.1
sympy                    1.12
threadpoolctl            3.5.0
tokenizers               0.19.1
torch                    2.3.0
torchaudio               2.3.0
torchvision              0.18.0
tqdm                     4.66.4
transformers             4.40.2
typing_extensions        4.11.0
urllib3                  2.2.1
wheel                    0.43.0

delip commented May 9, 2024

Forgot to add: I am on macOS Sonoma 14.4 Beta (23E5205c).
Which OS are you seeing this work on, @da03 @chottuthejimmy?
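
To compare setups across the thread (OS, Python version, and the handful of packages involved) without pasting a full `pip list`, everyone could run something like the sketch below. It only reads installed metadata through the standard-library `platform` and `importlib.metadata` modules; the package selection in `PACKAGES` is an assumption and can be adjusted.

```python
import platform
from importlib import metadata

# Packages that seem most relevant to this report (an assumed selection).
PACKAGES = (
    "sentence-transformers",
    "scikit-learn",
    "torch",
    "transformers",
    "numpy",
    "scipy",
    "threadpoolctl",
)


def environment_report(packages=PACKAGES) -> dict[str, str]:
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),  # e.g. arm64 vs x86_64 on macOS
    }
    mac_release = platform.mac_ver()[0]
    if mac_release:  # empty string on non-macOS systems
        report["macos"] = mac_release
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key:<24} {value}")
```

Including `machine` may help separate Apple Silicon from Intel Macs, which the version numbers alone do not show.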

@delip I can hop on a call tomorrow if you think there's more to the problem.

