Torchaudio + tensorflow + CUDA 11.0 = segfault #1595

Closed
zpapakipos opened this issue Jun 21, 2021 · 15 comments

@zpapakipos

🐛 Bug

Importing torchaudio after tensorflow-gpu while using CUDA 11.0 causes a segfault. This issue was originally reported in the AugLy repo: facebookresearch/AugLy#28.

To Reproduce

Steps to reproduce the behavior:

Only happens on CUDA 11.0, so we haven't been able to reproduce this error.

import tensorflow
import augly.audio as audaugs

Output:

2021-06-18 18:42:04.241048: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Segmentation fault (core dumped)

Expected behavior

No segfault.

Environment

  • What commands did you use to install torchaudio (conda/pip/build from source)?

  • pip install -r requirements.txt (AugLy's requirements.txt pins torch==1.8.1 and torchaudio==0.8.1)

  • PyTorch Version (e.g., 1.0):

  • OS (e.g., Linux):

  • How you installed PyTorch (conda, pip, source):

  • Build command you used (if compiling from source):

  • Python version:

  • CUDA/cuDNN version: 11.0

  • GPU models and configuration:

  • Any other relevant information:

Additional context

This problem doesn't happen on our (AugLy's maintainers') environments, only on one user's.

@mthrok
Collaborator

mthrok commented Jun 21, 2021

Hi @zpapakipos

Which PyTorch version is this? torchaudio==0.8 contains no CUDA-related code, so the segfault is more likely triggered by importing torch, for example if two different CUDA versions end up loaded in the same process.

Can you verify that it's not PyTorch?

@mcanan

mcanan commented Jun 21, 2021

The error is happening to me as I reported in this issue: facebookresearch/AugLy#28
In my environment it can be reproduced with these commands:

python3 -m venv venv
source venv/bin/activate
pip3 install wheel
pip3 install tensorflow-gpu==2.4.1
pip3 install augly
python3 -c "import tensorflow; import augly.audio as audaugs"

To answer your question, the PyTorch version is the one installed by AugLy: torch==1.8.1.

Doing a:

strace python3 -c "import tensorflow; import augly.audio as audaugs"

The last file opened before the segfault is: python3.8/site-packages/torchaudio/_internal/fft.py
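
strace shows the last file opened, but not the Python frame that was executing at the time. A minimal sketch of a complementary check, assuming CPython's standard faulthandler module and a POSIX system: run the failing import in a child interpreter with -X faulthandler so that a Python traceback is written to stderr when the process receives SIGSEGV. The helper name run_with_faulthandler is hypothetical and is not part of torchaudio or AugLy.

```python
import subprocess
import sys


def run_with_faulthandler(snippet):
    """Run `snippet` in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative return code -N means
    the child was terminated by signal N.
    """
    proc = subprocess.run(
        # -X faulthandler installs handlers that dump the Python traceback
        # to stderr on fatal signals such as SIGSEGV.
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr
```

Calling run_with_faulthandler("import tensorflow; import augly.audio as audaugs") in the affected environment should then return the traceback of the crashing import in the second element, without taking down the calling process.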

More information about my environment:

  • OS: Linux Ubuntu 20.04
  • Python version: 3.8.5
  • tensorflow version: 2.4.1
  • CUDA version: 11.0
  • Nvidia Driver Version: 450.119.03.
  • GPU: Quadro P5200

Please let me know if you need additional information.
Thank you.

@mthrok
Collaborator

mthrok commented Jun 21, 2021

@mcanan Thanks. Can you replace import augly.audio as audaugs with import torch and see what happens?

@mcanan

mcanan commented Jun 21, 2021

It doesn't fail:

python3 -c "import tensorflow; import torch"
2021-06-21 10:18:58.630042: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0

In the other case the output is this:

python3 -c "import tensorflow; import augly.audio as audaugs"
2021-06-21 10:18:45.234129: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Segmentation fault (core dumped)
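
The comparison above (tensorflow followed by torch works, tensorflow followed by augly.audio crashes) can be repeated for other import orders without crashing the calling process. A rough sketch, assuming a POSIX system where a child killed by a signal reports a negative return code; the names import_outcome, compare and CASES are hypothetical, and the "tf then torchaudio" case is an extra combination that was not tried in this thread.

```python
import signal
import subprocess
import sys


def import_outcome(statements):
    """Run the statements in a fresh interpreter and classify the result."""
    proc = subprocess.run(
        [sys.executable, "-c", "; ".join(statements)],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX a negative code means the child died from a signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "python error (exit %d)" % proc.returncode


def compare(cases):
    """Map each case name to the outcome of its import sequence."""
    return {name: import_outcome(stmts) for name, stmts in cases.items()}


# Import orders to compare; only the first and last come from this thread.
CASES = {
    "tf then torch": ["import tensorflow", "import torch"],
    "tf then torchaudio": ["import tensorflow", "import torchaudio"],
    "tf then augly.audio": ["import tensorflow", "import augly.audio"],
}
```

Running compare(CASES) in the affected virtual environment gives one outcome per import order, which narrows down which package triggers the crash.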

@mthrok
Collaborator

mthrok commented Jun 21, 2021

Oh that's interesting. Can you run python -m 'torch.utils.collect_env' and report the output?

@mcanan

mcanan commented Jun 21, 2021

Collecting environment information...
PyTorch version: 1.8.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Quadro P5200
Nvidia driver version: 450.119.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.8.1
[pip3] torchaudio==0.8.1
[conda] Could not collect
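
The report above shows PyTorch 1.8.1+cu102 (built against CUDA 10.2), while the TensorFlow log shows libcudart.so.11.0 being loaded. As a small illustration of reading that build tag, here is a sketch that splits the suffix after "+" from a version string. The function names are hypothetical, and the mapping from a tag such as cu102 to "10.2" (last digit taken as the minor version) is an assumption based on the tags seen in this thread, not a documented rule.

```python
def split_build_tag(version):
    """Split '1.8.1+cu102' into ('1.8.1', 'cu102'); the tag is None if absent."""
    release, _, tag = version.partition("+")
    return release, tag or None


def describe_build(version):
    """Return a one-line, human-readable description of a torch version string."""
    release, tag = split_build_tag(version)
    if tag is None:
        return f"{release}: no build tag in version string"
    if tag == "cpu":
        return f"{release}: CPU-only build"
    digits = tag[2:]
    if tag.startswith("cu") and digits.isdigit() and len(digits) >= 2:
        # Assumed convention: the last digit is the CUDA minor version.
        return f"{release}: built against CUDA {digits[:-1]}.{digits[-1]}"
    return f"{release}: unrecognized build tag {tag!r}"
```

With PyTorch installed, describe_build(torch.__version__) reports the same information for the active environment.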

@mthrok
Collaborator

mthrok commented Jun 21, 2021

Do you use CUDA from both TF and PyTorch?

The original issue description says "Only happens on CUDA 11.0", which I am interpreting as "the issue does not happen with the CPU-only version of PyTorch".
If you do not use PyTorch for DL-related work, a possible workaround is to replace the CUDA-enabled PyTorch build with a CPU-only one.

@mcanan

mcanan commented Jun 21, 2021

We don't use PyTorch. We use only TF for DL.
We wanted to test the new AugLy library to test the audio augmentations and we installed it. PyTorch is installed during the AugLy installation.
How can we replace the CUDA-enabled PyTorch with the CPU-only one?
Shouldn't that be done during the AugLy installation?

@mthrok
Collaborator

mthrok commented Jun 21, 2021

First uninstall PyTorch and torchaudio (pip uninstall torch torchaudio), then install the right versions of torch and torchaudio.

To install torch, something like pip3 install torch==1.8.1+cpu torchaudio==0.8.1 should work, but the best way to find the correct command is to go to https://pytorch.org/get-started/locally/#start-locally and choose the right configuration for you.

@mcanan

mcanan commented Jun 21, 2021

I installed the cpu version and I still have the same segfault.
I can reproduce it from scratch following these steps:

python3 -m venv venv
source venv/bin/activate
pip3 install tensorflow-gpu==2.4.1
pip3 install augly
pip3 uninstall torch torchaudio
pip3 install torch==1.8.1+cpu torchaudio==0.8.1 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
python3 -c "import tensorflow; import augly.audio as audaugs"

The output is:

python3 -c "import tensorflow; import augly.audio as audaugs"
2021-06-21 14:05:53.719005: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Segmentation fault (core dumped)

The output of python -m 'torch.utils.collect_env' is:

Collecting environment information...
PyTorch version: 1.8.1+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3

Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Quadro P5200
Nvidia driver version: 450.119.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.8.1+cpu
[pip3] torchaudio==0.8.1
[conda] Could not collect

Thank you

@mthrok
Collaborator

mthrok commented Jun 21, 2021

Thanks for the report. One more thing: can you try torch==1.9.0 and torchaudio==0.9.0?
It's unlikely that we can change a past release (1.8/0.8), but if we can fix it on master, then we can do a minor release, 0.9.1.

@mcanan

mcanan commented Jun 21, 2021

It doesn't fail now. I followed these steps from scratch:

python3 -m venv venv
source venv/bin/activate
pip3 install tensorflow-gpu==2.4.1
pip3 install augly
pip3 uninstall torch torchaudio
pip3 install torch==1.9.0+cpu torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
python3 -c "import tensorflow; import augly.audio as audaugs"

The only error messages I had were these during torch and torchaudio installation:

ERROR: augly 0.1.1 has requirement torch==1.8.1, but you'll have torch 1.9.0+cpu which is incompatible.
ERROR: augly 0.1.1 has requirement torchaudio==0.8.1, but you'll have torchaudio 0.9.0 which is incompatible.
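
The two pip errors above come from AugLy 0.1.1 pinning exact versions. A minimal sketch of checking an environment against such pins with the standard library's importlib.metadata (Python 3.8+) follows. The names installed_version, pin_conflicts and AUGLY_0_1_1_PINS are hypothetical; the pinned versions are taken from the error messages. Dropping the local build tag (+cpu, +cu102) before comparing is a simplification, and this is not a full version-specifier parser.

```python
from importlib import metadata


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def pin_conflicts(pins, lookup=installed_version):
    """Return {name: (pinned, found)} for every exact pin that is not met."""
    conflicts = {}
    for name, pinned in pins.items():
        found = lookup(name)
        # Compare only the public part of the version, e.g. '1.9.0' of '1.9.0+cpu'.
        release = found.split("+", 1)[0] if found else None
        if release != pinned:
            conflicts[name] = (pinned, found)
    return conflicts


# Pins reported by pip for augly 0.1.1 in the messages above.
AUGLY_0_1_1_PINS = {"torch": "1.8.1", "torchaudio": "0.8.1"}
```

Calling pin_conflicts(AUGLY_0_1_1_PINS) after the upgrade would list both packages, which matches what pip reports; the lookup parameter exists only so the check can be exercised without the packages installed.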

@mthrok
Collaborator

mthrok commented Jun 21, 2021

@mcanan Thanks. Glad that it works.

@zpapakipos @jbitton Not sure if the version requirements should be bumped up in AugLy's requirements.txt, but at least this seems to work.

@jbitton

jbitton commented Jun 21, 2021

@mthrok thank you for helping debug this issue! I'll check the unit tests for AugLy audio with torch 1.9.0 and torchaudio 0.9.0 and see if we're still getting the same expected results :)

jbitton added a commit to jbitton/AugLy that referenced this issue Jun 22, 2021
Summary:
As called out in facebookresearch#28, there are some conflicting dependencies between `torchaudio`/`torch` 0.8.1/1.8.1 and `tensorflow-gpu`.

However, as discovered in pytorch/audio#1595, upgrading to v0.9 etc actually resolve this issue.

Thus, I update the torchaudio/torch versions in our `requirements.txt` and I also updated our `numpy` requirement so there are no conflicting dependencies between `tf-gpu` and `augly` :)

I verified on my side that all unit tests still pass and that `setup.py` finishes as expected with no errors. I also update `setup.py` to add our README to our PyPI page.

Differential Revision: D29292956

fbshipit-source-id: e07f8b3d6d2d8bc9b21af166307f2ae00dbca663
zpapakipos added a commit to facebookresearch/AugLy that referenced this issue Jun 22, 2021
* Update `torchaudio` to 0.9 for `tensorflow-gpu` compatibility

Summary:
As called out in #28, there are some conflicting dependencies between `torchaudio`/`torch` 0.8.1/1.8.1 and `tensorflow-gpu`.

However, as discovered in pytorch/audio#1595, upgrading to v0.9 etc actually resolve this issue.

Thus, I update the torchaudio/torch versions in our `requirements.txt` and I also updated our `numpy` requirement so there are no conflicting dependencies between `tf-gpu` and `augly` :)

I verified on my side that all unit tests still pass and that `setup.py` finishes as expected with no errors. I also update `setup.py` to add our README to our PyPI page.

Differential Revision: D29292956

fbshipit-source-id: e07f8b3d6d2d8bc9b21af166307f2ae00dbca663

* Update setup.py

Co-authored-by: Zoe Papakipos <[email protected]>
@mthrok
Collaborator

mthrok commented Jun 22, 2021

Closing this issue, as it does not happen with the recent release (0.9) or the master branch.
The reason it fails with 0.8 is still unknown, but we do not update past releases, so we recommend that users upgrade to 0.9.

@mthrok mthrok closed this as completed Jun 22, 2021
tanujdhiman pushed a commit to tanujdhiman/AugLy that referenced this issue Jul 23, 2021
…okresearch#43)

tanujdhiman pushed a commit to tanujdhiman/AugLy that referenced this issue Oct 16, 2021
…okresearch#43)
