Move from xcode v12 to v14 #6218

Merged
merged 1 commit into from Jun 29, 2022
Conversation

datumbox
Contributor

Fixes #6210

Collaborator

@vfdev-5 vfdev-5 left a comment

LGTM, thanks @datumbox

nit: the CircleCI docs say that 14.0.0 is a beta (Xcode 14 Beta 1)

@datumbox datumbox changed the title Move from xcode v12 to v14 Move from xcode v12 to latest stable Jun 29, 2022
@datumbox
Contributor Author

@vfdev-5 good callout. Shall I fall back to the latest stable instead (v13)?

@vfdev-5
Collaborator

vfdev-5 commented Jun 29, 2022

Let's keep it as 14.0, then. I hope CircleCI will pick up stable 14.0.X once it is available.

@datumbox datumbox changed the title Move from xcode v12 to latest stable Move from xcode v12 to v14 Jun 29, 2022
@datumbox datumbox merged commit 7ba9719 into pytorch:main Jun 29, 2022
@datumbox datumbox deleted the ci/xcode_update branch June 29, 2022 11:26
facebook-github-bot pushed a commit that referenced this pull request Jul 6, 2022
Reviewed By: jdsgomes

Differential Revision: D37643912

fbshipit-source-id: 88f548557a9bb5cb6b65817556cabc4b8d72b892
atalman pushed a commit to atalman/vision that referenced this pull request Aug 3, 2022
Reviewed By: jdsgomes

Differential Revision: D37643912

fbshipit-source-id: 88f548557a9bb5cb6b65817556cabc4b8d72b892
atalman pushed a commit to atalman/vision that referenced this pull request Aug 3, 2022
atalman added a commit that referenced this pull request Aug 3, 2022
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
facebook-github-bot pushed a commit to pytorch/audio that referenced this pull request Aug 16, 2022
Summary:
Similar to pytorch/vision#6218
Fixing MacOS builds

Pull Request resolved: #2622

Reviewed By: weiwangmeta

Differential Revision: D38722983

Pulled By: atalman

fbshipit-source-id: 4cef85c97dc270fc812bc289592c4f3815f73c85
mthrok pushed a commit to pytorch/audio that referenced this pull request Aug 18, 2022
Summary:
Similar to pytorch/vision#6218
Fixing MacOS builds

Pull Request resolved: #2622

Reviewed By: weiwangmeta

Differential Revision: D38722983

Pulled By: atalman

fbshipit-source-id: 4cef85c97dc270fc812bc289592c4f3815f73c85
BriansIDP pushed a commit to BriansIDP/audio that referenced this pull request Jan 21, 2023
parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674296079 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674296047 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295932 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295795 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295664 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295524 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295462 +0000

Fix stylecheck (#2606)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2606

Reviewed By: nateanl

Differential Revision: D38502666

Pulled By: carolineechen

fbshipit-source-id: 1e279996fff3621835a07882c63328856fe38f3a

Add NNLM support to CTC Decoder (#2528)

Summary:
Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs.

The `ctc_decoder` API is as follows
- To decode with KenLM, pass in the KenLM language model path to the `lm` variable
- To decode with a custom LM, create a Python class that subclasses `CTCDecoderLM`, and pass in the class to the `lm` variable. Additionally, create a file of LM words listed in order of the LM index, with one word per line, and pass in the file to `lm_path`.
- To decode without a language model, set `lm` to `None` (default)
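
The word-list file described above can be produced with a few lines of plain Python. This is only a sketch: the `word_to_index` mapping, the helper name `write_lm_words`, and the output file name are hypothetical, and the only requirement taken from the summary is one word per line in LM index order.

```python
import os
import tempfile


def write_lm_words(word_to_index, path):
    """Write one word per line, ordered by each word's LM index."""
    ordered = sorted(word_to_index.items(), key=lambda item: item[1])
    with open(path, "w", encoding="utf-8") as f:
        for word, _ in ordered:
            f.write(word + "\n")


# Hypothetical vocabulary of a custom LM (word -> LM index).
word_to_index = {"world": 2, "<unk>": 0, "hello": 1}

lm_words_path = os.path.join(tempfile.mkdtemp(), "lm_words.txt")
write_lm_words(word_to_index, lm_words_path)

# Read the file back to confirm the line order follows the LM index.
with open(lm_words_path, encoding="utf-8") as f:
    lm_words = f.read().splitlines()
```

The resulting path is what the summary says to pass as `lm_path`.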

Validated against fairseq w2l decoder on sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests to validate custom implementations of ZeroLM and KenLM, and also using a biased LM.

Follow ups:
- Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory

cc jacobkahn

Pull Request resolved: https://github.com/pytorch/audio/pull/2528

Reviewed By: mthrok

Differential Revision: D38243802

Pulled By: carolineechen

fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7

Fix dataset docs parsing issue with extra spaces (#2607)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2607

Reviewed By: carolineechen, nateanl

Differential Revision: D38522606

Pulled By: skim0514

fbshipit-source-id: 2c38b8dcb343bcf624bfda1bfa2afd91abf2e668

Fixed argument validation in TorchAudio filtering (#2609)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2609

Converted argument validations in torchaudio/functional/filtering from assert based validation to the preferred if-then raise validation. Added specific error messages in all cases.
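
As an illustration of the pattern (not the actual torchaudio code), the sketch below shows an `assert`-style check rewritten as an explicit `if` / `raise`. The function name `validate_cutoff` and its parameters are hypothetical. One practical reason to prefer this form is that Python removes `assert` statements when run with `-O`, so assert-based validation can silently disappear.

```python
def validate_cutoff(cutoff_freq: float, sample_rate: int) -> None:
    """Hypothetical validator using explicit raise instead of assert."""
    # Before (assert-based, stripped under `python -O`):
    #     assert 0 < cutoff_freq < sample_rate / 2
    # After (always executed, with a specific error message):
    if not 0 < cutoff_freq < sample_rate / 2:
        raise ValueError(
            "cutoff_freq must be in (0, sample_rate / 2); "
            f"got cutoff_freq={cutoff_freq}, sample_rate={sample_rate}"
        )


validate_cutoff(1000.0, 16000)  # valid input passes silently

try:
    validate_cutoff(9000.0, 16000)  # above Nyquist, so it is rejected
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```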

Reviewed By: mthrok

Differential Revision: D38515029

fbshipit-source-id: 6c644a042f86c6feb2bbe8bd02fdb484fe27fae9

Fix bug in Conformer RNN-T recipe (#2611)

Summary:
https://github.com/pytorch/audio/issues/2535 modified the Conformer RNN-T Lightning module to accept a SentencePiece model instance rather than a file path. This PR makes changes to account for this in the train script.

Pull Request resolved: https://github.com/pytorch/audio/pull/2611

Reviewed By: carolineechen

Differential Revision: D38578892

Pulled By: hwangjeff

fbshipit-source-id: ec3b9823ad30ffb730baa13d10d8b79020866aac

Add additive noise function (#2608)

Summary:
Adds function `add_noise`, which computes and returns the sum of a waveform and scaled noise.
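
A minimal pure-Python sketch of "waveform plus scaled noise" follows. It is not the torchaudio implementation: the helper name `add_scaled_noise` is hypothetical, and choosing the scale from a target signal-to-noise ratio in dB is an assumption made for illustration, since the summary only says the noise is scaled.

```python
import math


def add_scaled_noise(waveform, noise, snr_db):
    """Return waveform + scale * noise, with scale chosen to hit snr_db.

    Assumption: the scale is derived from a target SNR in decibels.
    """
    signal_energy = sum(x * x for x in waveform)
    noise_energy = sum(n * n for n in noise)
    # SNR = signal_energy / (scale**2 * noise_energy), expressed in dB.
    scale = math.sqrt(signal_energy / (noise_energy * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(waveform, noise)]


waveform = [1.0, -1.0]   # energy 2.0
noise = [0.5, 0.5]       # energy 0.5
noisy = add_scaled_noise(waveform, noise, snr_db=0.0)  # scale is 2.0 here
```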

Pull Request resolved: https://github.com/pytorch/audio/pull/2608

Reviewed By: nateanl

Differential Revision: D38557141

Pulled By: hwangjeff

fbshipit-source-id: 1457fa213f43ca5b4333d3c7580971655d4260a0

Introducing pytorch-cuda metapackage (#2612)

Summary:
Introducing pytorch-cuda metapackage

Same as: https://github.com/pytorch/vision/pull/6371
Following PR: https://github.com/pytorch/builder/pull/1094
Adds a CUDA metapackage called pytorch-cuda. This way we can make sure to install the correct version of the CUDA dependencies and don't depend on conda-forge.

Pull Request resolved: https://github.com/pytorch/audio/pull/2612

Reviewed By: hwangjeff, seemethere, nateanl

Differential Revision: D38633332

Pulled By: atalman

fbshipit-source-id: 78a6115bb252ebdb6d66a57d7d2c4a4978ddb501

Remove outdated doc (#2617)

Summary:
`ctc_decoder` has become beta, remove it from prototype documents.

Pull Request resolved: https://github.com/pytorch/audio/pull/2617

Reviewed By: hwangjeff

Differential Revision: D38706869

Pulled By: nateanl

fbshipit-source-id: 41679f4e65a584b6b882af4551a50123f1dcef02

Update doc version selector link (#2605)

Summary:
The link to the version selector has been an absolute link, which has been
a trap when reviewing a gh-pages deployment from a fork.

This commit changes that to relative link.

Pull Request resolved: https://github.com/pytorch/audio/pull/2605

Test Plan:
- https://mthrok.github.io/audio/main/index.html -> click version selector -> https://mthrok.github.io/audio/versions.html
- https://mthrok.github.io/audio/0.12.1/index.html -> click version selector -> https://pytorch.org/audio/versions.html
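
The difference between the two link styles can be reproduced with the standard library. In this sketch the literal `href` values are assumptions (the actual template strings are not shown here). The page URL is taken from the test plan above.

```python
from urllib.parse import urljoin

# Page served from a fork's gh-pages deployment (from the test plan).
page = "https://mthrok.github.io/audio/main/index.html"

# Assumed relative href: resolves against whichever site serves the page.
relative_target = urljoin(page, "../versions.html")

# Assumed absolute href: always points at the upstream site.
absolute_target = urljoin(page, "https://pytorch.org/audio/versions.html")
```

With the relative form, the selector stays on the fork, which is what a reviewer of a forked deployment needs.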

Reviewed By: carolineechen, nateanl

Differential Revision: D38695645

Pulled By: mthrok

fbshipit-source-id: 91132ac19b8c61f39d304a162435b9c6599ef2b2

Fix anaconda upload (#2621)

Summary:
Same as:
https://github.com/pytorch/vision/pull/6422

Testing:
```
export ANACONDA_PATH=$(conda info --base)/bin
echo $ANACONDA_PATH
/opt/homebrew/Caskroom/miniconda/base/bin
$ANACONDA_PATH/anaconda -V
anaconda Command line client (version 1.10.0)
```
Failure: https://github.com/pytorch/audio/runs/7837085749?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/audio/pull/2621

Reviewed By: weiwangmeta, seemethere

Differential Revision: D38714324

Pulled By: atalman

fbshipit-source-id: 55342cf69006e9250403c955202846bab4516f3e

Move xcode to 14 from 12.5 (#2622)

Summary:
Similar to https://github.com/pytorch/vision/pull/6218
Fixing MacOS builds

Pull Request resolved: https://github.com/pytorch/audio/pull/2622

Reviewed By: weiwangmeta

Differential Revision: D38722983

Pulled By: atalman

fbshipit-source-id: 4cef85c97dc270fc812bc289592c4f3815f73c85

Added example for MelScale transform (#2616)

Summary:
Added example for MelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2616

Reviewed By: carolineechen

Differential Revision: D38743145

Pulled By: nateanl

fbshipit-source-id: e24ca92f5317a0ea5a141418bf084b12cfb22486

Added example for AmplitudeToDB transform (#2615)

Summary:
Added example for AmplitudeToDB transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2615

Reviewed By: carolineechen

Differential Revision: D38743117

Pulled By: nateanl

fbshipit-source-id: bf0f760299f4777a4bca65da86359faa00b16207

Use double quotes for string in functional and transforms (#2618)

Summary:
To make the code consistent, we should use double quotation marks for all strings. This PR makes such changes in functional and transforms.

Pull Request resolved: https://github.com/pytorch/audio/pull/2618

Reviewed By: carolineechen

Differential Revision: D38744137

Pulled By: nateanl

fbshipit-source-id: 74213a24d9f66c306cc92019d77dcb2a877f94bd

Fix doc warning (#2627)

Summary:
Resolves the following warning

```
/torchaudio/docs/source/transforms.rst:94: WARNING: Title underline too short.

:hidden:`Loudness`
-----------------
```
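
For reference, this warning is emitted when a reStructuredText section underline is shorter than its title text. The sketch below is a hypothetical checker, not part of the docs build. It only compares lengths for the title from the warning.

```python
def underline_ok(title: str, underline: str) -> bool:
    """A section underline must be at least as long as the title text."""
    return len(underline) >= len(title)


title = ":hidden:`Loudness`"

# An underline one character short of the title triggers the warning.
too_short = "-" * (len(title) - 1)
# An underline that matches the title length is accepted.
fixed = "-" * len(title)

results = (underline_ok(title, too_short), underline_ok(title, fixed))
```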

Pull Request resolved: https://github.com/pytorch/audio/pull/2627

Reviewed By: carolineechen

Differential Revision: D38814802

Pulled By: mthrok

fbshipit-source-id: 5dfaf2d7bae22dba0f4a14f04ca63f28d6b2a749

Fix Sphinx-gallery display and pin sphinx-related packages (#2629)

Summary:
This commit fixes the issue with the recent Sphinx-Gallery update.
Also it pins the versions of Sphinx-related packages.

Before:

<img width="256" alt="Screen Shot 2022-08-17 at 10 02 23 PM" src="https://user-images.githubusercontent.com/855818/185140952-28f2d98a-b586-424c-a003-b69089f48eb9.png">

After:

https://user-images.githubusercontent.com/855818/185271889-bd4f86a0-986b-43bb-8121-bd77750d74f0.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2629

Reviewed By: carolineechen

Differential Revision: D38816417

Pulled By: mthrok

fbshipit-source-id: 11ee3f9121d9a302772ee1f461dacae52eb28852

Tweak tutorials (#2630)

Summary:
Resolves the following warnings

```
/torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:195: WARNING: Unexpected indentation.
/torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:446: WARNING: Unexpected indentation.
/torchaudio/docs/source/tutorials/audio_io_tutorial.rst:559: WARNING: Content block expected for the "note" directive; none found.
/torchaudio/docs/source/tutorials/mvdr_tutorial.rst:338: WARNING: Bullet list ends without a blank line; unexpected unindent.
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2630

Reviewed By: nateanl

Differential Revision: D38816632

Pulled By: mthrok

fbshipit-source-id: 135ded4e064d136be67ce24439e96f5e9c9ce635

Update notes around nightly build and third parties (#2632)

Summary:
Google Colab now has torchaudio 0.12 pre-installed.
This commit removes the note about nightly build.

Pull Request resolved: https://github.com/pytorch/audio/pull/2632

Reviewed By: carolineechen

Differential Revision: D38827632

Pulled By: mthrok

fbshipit-source-id: ac769780868b741c3012357d589ec0019d9af6eb

Added example for InverseMelScale transform (#2635)

Summary:
Added example for InverseMelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2635

Reviewed By: carolineechen

Differential Revision: D38830318

Pulled By: nateanl

fbshipit-source-id: fd26a700d495f6755db0767625aa8577cb89bd83

Update ASR inference tutorial (#2631)

Summary:
* Use download_asset
* Remove notes around nightly
* Print versions first
* Remove duplicated import

Pull Request resolved: https://github.com/pytorch/audio/pull/2631

Reviewed By: carolineechen

Differential Revision: D38830395

Pulled By: mthrok

fbshipit-source-id: c9259df33562defe249734d1ed074dac0fddc2f6

Update README.md (#2633)

Summary:
Update compatibility matrix

Pull Request resolved: https://github.com/pytorch/audio/pull/2633

Reviewed By: nateanl

Differential Revision: D38827670

Pulled By: mthrok

fbshipit-source-id: 5c66bf60a06e37919ee725a5f4adf571e6c89100

Refactor sox pybind source code (#2636)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2636

At the early stage of torchaudio extension module,
`torchaudio/csrc/pybind` directory was created so that
all the code defining Python interface would be placed
there and there will be only one extension module called
`torchaudio._torchaudio`.

However, the codebase has evolved in a way that separate
extensions are defined for each feature (third party
dependency) for the sake of more modular file organization.

What is left in `csrc/pybind` is libsox Python bindings.
This commit moves it under `csrc/sox`.

Follow-up rename `torchaudio._torchaudio` to `torchaudio._torchaudio_sox`.

Reviewed By: carolineechen

Differential Revision: D38829253

fbshipit-source-id: 3554af45a2beb0f902810c5548751264e093f28d

Added example for MFCC transform (#2637)

Summary:
Added example for MFCC transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Note: Python formatter package `black` uses double quotes for the string dict keys (e.g. in `melkwargs` for this example). Please let me know if there is a different linter/format/convention that is preferred!

Pull Request resolved: https://github.com/pytorch/audio/pull/2637

Reviewed By: carolineechen

Differential Revision: D38873729

Pulled By: nateanl

fbshipit-source-id: 2e8fe2930671e7c5d02c0c37cf1ca5cc8c5079e3

Added example for Loudness transform (#2641)

Summary:
Added example for Loudness transform (implemented in PR https://github.com/pytorch/audio/issues/2472) as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2641

Reviewed By: nateanl

Differential Revision: D38907782

Pulled By: carolineechen

fbshipit-source-id: fd2bcc4bac3095a626ea9cf36cb70cb2bf003d63

Update Sphinx-gallery to 0.11.1 (#2638)

Summary:
The minor release fixes some gallery issues, which allows us to remove
some of the customization we had in https://github.com/pytorch/audio/issues/2629

https://output.circle-artifacts.com/output/job/553a9b98-8260-4cb4-a681-20ef97d2c33e/artifacts/0/docs/pipelines.html#torchaudio.pipelines.Wav2Vec2ASRBundle

Pull Request resolved: https://github.com/pytorch/audio/pull/2638

Reviewed By: carolineechen, nateanl

Differential Revision: D38909097

Pulled By: mthrok

fbshipit-source-id: 78346d93b54fca2a19b28991c224324ef53221c9

[Nova] Added draft calling GHA workflow for building linux wheels (#2548)

Summary:
As part of Project Nova, we are consolidating CI/CD workflows and infra, making them reusable across PyTorch ecosystem libraries. https://github.com/pytorch/test-infra/pull/460 introduces a general-purpose reusable workflow to build linux wheels for python libraries. This PR introduces a caller workflow that triggers the reusable workflow. Details around modular env setup, passing input args across workflows, etc. are still being worked out.

Using reusable workflow defined in https://github.com/pytorch/test-infra/pull/506

Pull Request resolved: https://github.com/pytorch/audio/pull/2548

Reviewed By: osalpekar

Differential Revision: D38947733

Pulled By: mehtanirav

fbshipit-source-id: 03ab88cef973a092f5c5d1ff8c74ec7ae7e46d01

Added example for LFCC transform (#2640)

Summary:
Added example for LFCC transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2640

Reviewed By: carolineechen

Differential Revision: D38908975

Pulled By: nateanl

fbshipit-source-id: ffdd994390db7f27556b011a8050a65eef9cd09d

Add StreamWriter (#2628)

Summary:
This commit adds FFmpeg-based encoder StreamWriter class.
StreamWriter is pretty much the opposite of the StreamReader class, and
it supports:

* Encoding audio / still image / video
* Exporting to local file / streaming protocol / devices etc...
* File-like object support (in later commit)
* HW video encoding (in later commit)

See also: https://fburl.com/gslide/z85kn5a9 (Meta internal)

Pull Request resolved: https://github.com/pytorch/audio/pull/2628

Reviewed By: nateanl

Differential Revision: D38816650

Pulled By: mthrok

fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8

[Nova] Build Linux Conda Binaries using reusable workflow (#2626)

Summary:
Calling the reusable workflow introduced in https://github.com/pytorch/test-infra/pull/546 to build conda binaries on linux.

Pull Request resolved: https://github.com/pytorch/audio/pull/2626

Reviewed By: mehtanirav

Differential Revision: D39028057

Pulled By: osalpekar

fbshipit-source-id: d74ea3771967d0ee2b0ad28a8f811a95145b2183

Replace bg_iterator in examples (#2645)

Summary:
`bg_iterator` was deprecated in 0.11 because it was known to have issues (deadlock) without speed up. Remove instances of `bg_iterator` used in torchaudio examples.

Resolves https://github.com/pytorch/audio/issues/2642

Pull Request resolved: https://github.com/pytorch/audio/pull/2645

Reviewed By: nateanl

Differential Revision: D38954292

Pulled By: carolineechen

fbshipit-source-id: 2333ab5228c2b8511ff532057543aaf9d02b2789

[Nova] Use pkg-helpers to modularize GHA Linux Conda Builds (#2650)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2650

Reviewed By: mehtanirav

Differential Revision: D39040559

Pulled By: osalpekar

fbshipit-source-id: df39e23d7c246728793aab969b8dc1070af88d75

add CUDA 11.7 builds (#2623)

Summary:
CC atalman

Pull Request resolved: https://github.com/pytorch/audio/pull/2623

Reviewed By: hwangjeff, nateanl

Differential Revision: D39036432

Pulled By: atalman

fbshipit-source-id: cd74a1bf8f74e31bd2c32c80d32c617f4b1766e8

Add file-like object support to StreamWriter (#2648)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2648

Reviewed By: nateanl

Differential Revision: D38976874

Pulled By: mthrok

fbshipit-source-id: 0541dea2a633d97000b4b8609ff6b83f6b82c864

Add CUDA HW encoding support to StreamWriter (#2505)

Summary:
This commit adds CUDA hardware encoding to StreamWriter.
For certain video formats, it can encode video directly from
CUDA Tensor, without needing to move the data to host CPU.

Pull Request resolved: https://github.com/pytorch/audio/pull/2505

Reviewed By: hwangjeff

Differential Revision: D37446830

Pulled By: mthrok

fbshipit-source-id: eee6424f01a99a3b611dcad45ed58f86cba4672a

Remove obsolete examples (#2655)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2655

Removed obsolete example and the corresponding test

Reviewed By: mthrok

Differential Revision: D39260253

fbshipit-source-id: 0bde71ffd75dd0c94a5cc4a9940f4648a5d61bd7

Add metadata function for LibriSpeech (#2653)

Summary:
Adding support for metadata mode, requested in https://github.com/pytorch/audio/issues/2539, by adding a public `get_metadata()` function in the dataset. This function can be used directly by users to fetch metadata for individual dataset indices, or users can subclass the dataset and override `__getitem__` with `get_metadata` to create a dataset class that directly handles metadata mode.
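
The subclassing pattern described above can be sketched with a toy dataset. Everything here is a hypothetical stand-in (class names, tuple layout, the `loads` counter), not the LibriSpeech implementation. The point is only that overriding `__getitem__` with `get_metadata` skips audio decoding.

```python
class ToyDataset:
    """Hypothetical dataset: entries are (path, sample_rate, transcript)."""

    def __init__(self, entries):
        self._entries = entries
        self.loads = 0  # counts how many times audio was "decoded"

    def get_metadata(self, n):
        # Cheap: returns file path and labels without touching audio data.
        return self._entries[n]

    def __getitem__(self, n):
        path, sample_rate, transcript = self.get_metadata(n)
        self.loads += 1
        waveform = [0.0] * 4  # stand-in for decoded audio samples
        return waveform, sample_rate, transcript


class ToyMetadataDataset(ToyDataset):
    """Metadata mode: indexing returns metadata only."""

    def __getitem__(self, n):
        return self.get_metadata(n)


entries = [("a.flac", 16000, "hello"), ("b.flac", 16000, "world")]
meta_ds = ToyMetadataDataset(entries)
first = meta_ds[0]
```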

Pull Request resolved: https://github.com/pytorch/audio/pull/2653

Reviewed By: nateanl, mthrok

Differential Revision: D39105114

Pulled By: carolineechen

fbshipit-source-id: 6f26f1402a053dffcfcc5d859f87271ed5923348

Fix random Gaussian generation (#2639)

Summary:
This PR is meant to address the bug raised in issue https://github.com/pytorch/audio/issues/2634.

In particular, previously the Box Muller transform was used to generate Gaussian variates for dithering based on `torch.rand` uniform variates, but it was incorrectly implemented (e.g. the same uniform variate was used as input to the transform, rather than two different uniform variates), which led to a different (non-Gaussian) distribution. This PR instead uses `torch.randn` to generate the Gaussian variates.
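
To make the bug concrete, here is a pure-Python sketch (using `random` rather than `torch`, as an illustration only) of the Box-Muller transform with two independent uniform variates, next to the flawed variant that reuses one variate. The function names are hypothetical. Reusing the variate makes the output a deterministic function of a single number, so its negative values are bounded and the result is not Gaussian.

```python
import math
import random


def box_muller(rng):
    """Correct form: two independent uniform variates."""
    u1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def box_muller_same_variate(rng):
    """Flawed form: the same uniform variate feeds both factors."""
    u = 1.0 - rng.random()
    return math.sqrt(-2.0 * math.log(u)) * math.cos(2.0 * math.pi * u)


rng = random.Random(0)
correct = [box_muller(rng) for _ in range(20000)]
flawed = [box_muller_same_variate(rng) for _ in range(20000)]

# The simpler fix mirrors the PR: draw Gaussian variates directly
# (random.gauss here plays the role that torch.randn plays in the PR).
direct = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# The flawed samples stay above -1.3, while Gaussian samples routinely
# fall below -2.
lowest = (min(correct), min(flawed), min(direct))
```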

Pull Request resolved: https://github.com/pytorch/audio/pull/2639

Reviewed By: mthrok

Differential Revision: D39101144

Pulled By: carolineechen

fbshipit-source-id: 691e49679f6598ef0a1675f6f4ee721ef32215fd

Tweak documentation (#2656)

Summary:
1. Override class `__module__` attribute in `conf.py` so that no manual override is necessary
2. Fix SourceSeparationBundle member attribute
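
One way such an override could work is sketched below. The actual `conf.py` code is not shown here, so the module names, the `StreamReader` stand-in class, and the helper `override_module` are all hypothetical. The idea is to rewrite `__module__` on every public name so that documentation tools report the public import path rather than the private defining module.

```python
import types

# Hypothetical layout: a class defined in a private module and
# re-exported from a public one.
private = types.ModuleType("mypkg._internal.reader")
public = types.ModuleType("mypkg.io")


class StreamReader:
    pass


StreamReader.__module__ = private.__name__  # where the class "lives"
public.StreamReader = StreamReader
public.__all__ = ["StreamReader"]


def override_module(module):
    """Point each exported object's __module__ at the public module."""
    for name in module.__all__:
        getattr(module, name).__module__ = module.__name__


override_module(public)
reported_path = f"{StreamReader.__module__}.{StreamReader.__name__}"
```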

Pull Request resolved: https://github.com/pytorch/audio/pull/2656

Reviewed By: carolineechen

Differential Revision: D39293053

Pulled By: mthrok

fbshipit-source-id: 2b8d6be1aee517d0e692043c26ac2438a787adc6

Fix LibriSpeech Conformer RNN-T eval script (#2666)

Summary:
`ConformerRNNTModule`'s initializer now accepts a SentencePiece model rather than a path to a model as input. This PR corrects `eval.py` accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2666

Reviewed By: carolineechen

Differential Revision: D39386968

Pulled By: hwangjeff

fbshipit-source-id: 95a94dd898263d648650f7376c29810b1456d6c1

[Nova] Remove the old caller GitHub Actions Linux wheels/conda Build Workflows (#2660)

Summary:
We moved over to a new design for release workflows that encompass all the build logic in the test-infra repo (apart from custom pre-build and post-build scripts). Thus, we no longer need these caller workflows in the audio repo. This PR removes them entirely.

Pull Request resolved: https://github.com/pytorch/audio/pull/2660

Reviewed By: seemethere

Differential Revision: D39392456

Pulled By: osalpekar

fbshipit-source-id: a8bdeb4738b91666abcdc883f6f8f1bf359f1d42

Move hybrid demucs model out of prototype (#2668)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2668

Reviewed By: nateanl, mthrok

Differential Revision: D39433671

Pulled By: carolineechen

fbshipit-source-id: 3545a5b4019832861c34fd8c05e5f8600fd80d5c

Do not use nested namespaces in torchaudio/sox (#2663)

Summary:
As it is a C++17 feature, and PyTorch and its extensions must still be C++14 compatible, as also specified in the top level CMakeLists.txt:
https://github.com/pytorch/audio/blob/8a0d7b36f7821fe55175f0d4e3ca6299b3817a6c/CMakeLists.txt#L30

Otherwise, it pollutes build logs with noisy warnings such as
```
/Users/runner/work/test-infra/test-infra/pytorch/audio/torchaudio/csrc/sox/pybind/io.cpp:12:21: warning: nested namespace definition is a C++17 extension; define each namespace separately [-Wc++17-extensions]
namespace torchaudio::sox_io {
                    ^~~~~~~~
                     { namespace sox_io
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2663

Reviewed By: atalman, nateanl

Differential Revision: D39362842

Pulled By: malfet

fbshipit-source-id: f9659d4420f1cc0194990d531455cf59b66c26b9

[Bootcamp] Fix Typo (#2661)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2661

Fixed typo in `audio_data_augmentation_tutorial.py`

Reviewed By: malfet, mthrok

Differential Revision: D39352353

fbshipit-source-id: aea35dab03fb7422421948bd26716e10a8d65f92

Move SourceSeparationBundle and pre-trained ConvTasNet pipeline into Beta (#2669)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2669

Reviewed By: carolineechen, mthrok

Differential Revision: D39433560

Pulled By: nateanl

fbshipit-source-id: 5b652b31c00badb37b27a32ac25b422a5bcc74cb

CUDA 11.3 remove. New Stable version is 11.6 (#2670)

Summary:
Removing CUDA 11.3.

Core PR: https://github.com/pytorch/pytorch/pull/84866
cc malfet ptrblck

Pull Request resolved: https://github.com/pytorch/audio/pull/2670

Reviewed By: malfet, osalpekar

Differential Revision: D39449263

Pulled By: atalman

fbshipit-source-id: f86bb119685ead3ffcabd92c4bb8076aecde4095

Move Hybrid Demucs pipeline to beta (#2673)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2673

Reviewed By: mthrok

Differential Revision: D39507612

Pulled By: carolineechen

fbshipit-source-id: 3a9ee53f72cabd6e3085c76867017be4a6ed7f53

Add Decoder LM Docs (#2658)

Summary:
modifications to ctc decoder LM docstrings on top of https://github.com/pytorch/audio/issues/2657

Pull Request resolved: https://github.com/pytorch/audio/pull/2658

Reviewed By: mthrok

Differential Revision: D39468921

Pulled By: carolineechen

fbshipit-source-id: c5497cc2fa22fb98a304d037e27c91bf68a9ad6a

Tweak badge link URL generation (#2677)

Summary:
Currently, the way feature badges are generated assumes that both the documentation pages and the supported features page are at the same level relative to the root.

This does not work when we introduce `:autosummary:` which generates individual documentation pages one level below.

This commit changes it so that links to the supported features page are properly relative from the documentation level.

There is no appearance change from this commit.
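
The link computation can be illustrated with `posixpath.relpath`. This is a sketch under assumptions: the helper name `relative_link` and the file name `supported_features.html` are hypothetical, and the real badge-generation code may compute the path differently.

```python
import posixpath


def relative_link(target, page):
    """Link from `page` to `target`, both given relative to the docs root."""
    start = posixpath.dirname(page) or "."
    return posixpath.relpath(target, start=start)


# A page at the docs root links to a sibling file directly.
top_level = relative_link("supported_features.html", "transforms.html")

# A page generated one level below needs to go up one directory.
generated = relative_link(
    "supported_features.html",
    "generated/torchaudio.io.StreamReader.html",
)
```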

Pull Request resolved: https://github.com/pytorch/audio/pull/2677

Reviewed By: carolineechen

Differential Revision: D39507451

Pulled By: mthrok

fbshipit-source-id: f18da4201f0eb747586be21c8bd9a958217aebc2

Move conv_tasnet_base doc out of prototype (#2675)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2675

Reviewed By: carolineechen

Differential Revision: D39515996

Pulled By: nateanl

fbshipit-source-id: 5824375f6a758af21b6ad6c635dd06081663644f

Consolidate bibliography / reference (#2676)

Summary:
Preparation for the adoption of `autosummary`.

Replace `:footcite:` with `:cite:` and introduce dedicated reference page, as `:footcite:` does not work well with `autosummary`.

Example:

https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/datasets.html#cmuarctic

https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/references.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2676

Reviewed By: carolineechen

Differential Revision: D39509431

Pulled By: mthrok

fbshipit-source-id: e6003dd01ec3eff3d598054690f61de8ee31ac9a

Update doc theme to the latest (#2679)

Summary:
Follows the changes related to the move to the Linux Foundation.

(we are still pinning the theme version so that our customization does not break randomly.)

Pull Request resolved: https://github.com/pytorch/audio/pull/2679

Reviewed By: carolineechen

Differential Revision: D39531566

Pulled By: mthrok

fbshipit-source-id: 64353577d05f9dbda00dd9d10b9ebcedddfdce5b

Update Sphinx to 5.1.1 (#2678)

Summary:
Previous versions of Sphinx reported the wrong path for the return class. This issue is fixed in the latest Sphinx.

This allows us to remove the patch we apply in `conf.py`. This is essential for the adoption of `:autosummary:`, as it won't render correctly with the patch.

https://output.circle-artifacts.com/output/job/19d93ede-08de-4b9e-9d66-67ca5dab964e/artifacts/0/docs/pipelines.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2678

Reviewed By: carolineechen

Differential Revision: D39509447

Pulled By: mthrok

fbshipit-source-id: e104bc6a87f32cba6c549a9fe8f2d1e489ee27e4

Switch to use conda install action for m1 builds (#2674)

Summary:
Use the setup-miniconda action for M1 builds.
We want to try to address space issues on M1. The following action:
```
pytorch/test-infra/.github/actions/setup-miniconda@main
```

sets up miniconda in a temp folder, which should be cleaned between runs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2674

Reviewed By: jeanschmidt

Differential Revision: D39540481

Pulled By: atalman

fbshipit-source-id: 0596598ab6b2f99c775aa0c9e14a3a388533068d

Adopt `:autosummary:` in `torchaudio.io` module doc (#2681)

Summary:
This commit adopts the `:autosummary:` directive in the `torchaudio.io` module doc.
It adds a table of contents at the `torchaudio.io` level.

https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/io.html
<img width="1094" alt="Screen Shot 2022-09-16 at 7 33 32 AM" src="https://user-images.githubusercontent.com/855818/190520248-27e469f8-7689-4dc2-b591-7b3f08bb4dff.png">

https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader
<img width="1108" alt="Screen Shot 2022-09-16 at 7 33 59 AM" src="https://user-images.githubusercontent.com/855818/190520292-d090fed0-2f18-4961-b9f3-9e4808fd437e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2681

Reviewed By: carolineechen

Differential Revision: D39560459

Pulled By: mthrok

fbshipit-source-id: 3de5f22b8d8d0834dfd8bac8619fbfaa44c5f4dd

Adopt `:autosummary:` in `torchaudio.models.decoder` module doc (#2684)

Summary:
* Adopts `:autosummary:` in decoder module doc
* Hide the constructor signature of `CTCDecoder` as `ctc_decoder` function is the one client code is supposed to be using.
* Introduce `children` property to `CTCDecoderLMState` otherwise it does not show up in the doc.

https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/models.decoder.html

<img width="748" alt="Screen Shot 2022-09-16 at 5 23 22 PM" src="https://user-images.githubusercontent.com/855818/190592409-0c2ec8a4-d2cf-4d76-a965-8a570faaeb1a.png">

https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder

<img width="723" alt="Screen Shot 2022-09-16 at 5 23 53 PM" src="https://user-images.githubusercontent.com/855818/190592501-3fad1e07-ae3e-44f5-93be-f33181025390.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2684

Reviewed By: carolineechen

Differential Revision: D39574272

Pulled By: mthrok

fbshipit-source-id: d977660bd46f5cf98c535adbf2735be896b28773

Adopt `:autosummary:` in `torchaudio.transforms` module doc (#2683)

Summary:
* Introduce the mini-index at `torchaudio.transforms` page.
* Add "Augmentations" subsection.
* Also updated the overall introduction.

https://output.circle-artifacts.com/output/job/1b65246a-403c-4d2c-b97d-d1b582d8b4e5/artifacts/0/docs/transforms.html

<img width="721" alt="Screen Shot 2022-09-16 at 5 20 08 PM" src="https://user-images.githubusercontent.com/855818/190591795-97c169db-a95b-480a-8d3c-d80072efa045.png">

<img width="755" alt="Screen Shot 2022-09-16 at 5 20 28 PM" src="https://user-images.githubusercontent.com/855818/190591828-03026918-febd-4194-91aa-7d8f704e17cc.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2683

Reviewed By: carolineechen

Differential Revision: D39574255

Pulled By: mthrok

fbshipit-source-id: a4beed7cacbb5184bad96efa903a3a1123dab627

[Nova] Remove Extraneous Build Scripts (#2695)

Summary:
There is a single pre/post script needed for building torchaudio. This PR:
1. Removes the old conda-specific build script
2. Renames the wheel script to be a general name

Pull Request resolved: https://github.com/pytorch/audio/pull/2695

Reviewed By: kit1980

Differential Revision: D39631971

Pulled By: osalpekar

fbshipit-source-id: 52b49a6e792536b6264228c01ac356d247b18ea8

Update nightly wheels to ROCm5.2 (#2672)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2672

Reviewed By: atalman

Differential Revision: D39468320

Pulled By: mthrok

fbshipit-source-id: 0e7bd4fd922ba0db51700e140b95328a5b687a6f

Adopt `:autosummary:` in `torchaudio.functional` module doc (#2693)

Summary:
https://output.circle-artifacts.com/output/job/b23174d2-5cee-4ee9-be39-3228b9ae4abe/artifacts/0/docs/functional.html

<img width="1133" alt="Screen Shot 2022-09-20 at 11 19 23 AM" src="https://user-images.githubusercontent.com/855818/191152824-96c5b16c-bd38-4656-b1ae-0b58699dbd62.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2693

Reviewed By: carolineechen

Differential Revision: D39650930

Pulled By: mthrok

fbshipit-source-id: 28b5b03d21b922e37e02bfddda2bf1dea696cc18

Add Speech Commands metadata function (#2687)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2687

Reviewed By: mthrok

Differential Revision: D39647596

Pulled By: carolineechen

fbshipit-source-id: 8ff874fc1e828130f6754e83ce1f702ca13dfac0

Adopt `:autosummary:` in `torchaudio.models` module doc (#2690)

Summary:
* Introduce the mini-index at `torchaudio.models` page.

https://output.circle-artifacts.com/output/job/25e59810-3866-4ece-b1b7-8a10c7a2286d/artifacts/0/docs/models.html

<img width="1042" alt="Screen Shot 2022-09-20 at 1 20 50 PM" src="https://user-images.githubusercontent.com/855818/191166816-83314ad1-8b67-475b-aa10-d4cc59126295.png">

<img width="1048" alt="Screen Shot 2022-09-20 at 1 20 58 PM" src="https://user-images.githubusercontent.com/855818/191166829-1ceb65e0-9506-4328-9a2f-8b75b4e54404.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2690

Reviewed By: carolineechen

Differential Revision: D39654948

Pulled By: mthrok

fbshipit-source-id: 703d1526617596f647c85a7148f41ca55fffdbc8

Support in-memory decoding via Tensor wrapper in StreamReader (#2694)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2694

This commit adds Tensor type as input to `StreamReader`.
The Tensor is interpreted as byte string buffer.

Reviewed By: hwangjeff

Differential Revision: D39467630

fbshipit-source-id: 6369eed5e16fbb657568bf6bb80d703483d72f8e

Add StreamReader Tensor Binding to src (#2699)

Summary:
In https://github.com/pytorch/audio/issues/2694 CMakeLists.txt was not properly updated, so the tests are failing. This commit fixes it.

Pull Request resolved: https://github.com/pytorch/audio/pull/2699

Reviewed By: carolineechen

Differential Revision: D39687409

Pulled By: mthrok

fbshipit-source-id: 2e14f3c478f1f8a112a03839f2dbcca51215fed7

Adopt `:autosummary:` in `torchaudio.pipelines` module doc (#2689)

Summary:
* Introduce the mini-index at `torchaudio.pipelines` page.
* Add introductions
* Update pipeline tutorials

https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/pipelines.html

<img width="1163" alt="Screen Shot 2022-09-20 at 1 23 29 PM" src="https://user-images.githubusercontent.com/855818/191167049-98324e93-2e16-41db-8538-3b5b54eb8224.png">

<img width="1115" alt="Screen Shot 2022-09-20 at 1 23 49 PM" src="https://user-images.githubusercontent.com/855818/191167071-4770f594-2540-43a4-a01c-e983bf59220f.png">

https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle

<img width="1108" alt="Screen Shot 2022-09-20 at 1 24 18 PM" src="https://user-images.githubusercontent.com/855818/191167123-51b33a5f-c30c-46bc-b002-b05d2d0d27b7.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2689

Reviewed By: carolineechen

Differential Revision: D39691253

Pulled By: mthrok

fbshipit-source-id: ddf5fdadb0b64cf2867b6271ba53e8e8c0fa7e49

Add metadata mode for various datasets (#2697)

Summary:
Add metadata mode for the following SUPERB benchmark datasets
- QUESST14
- Fluent Speech Commands
- VoxCeleb1

follow ups:
- Add metadata mode for LibriMix -- waiting for unit tests to merge
- Add IEMOCAP + SNIPS datasets

Pull Request resolved: https://github.com/pytorch/audio/pull/2697

Reviewed By: mthrok

Differential Revision: D39666809

Pulled By: carolineechen

fbshipit-source-id: 3a8f07627acceed70f960f47e694efad75b108c2

Update and fix tutorials (#2701)

Summary:
* Fix Sphinx warning
* Update asset management

Pull Request resolved: https://github.com/pytorch/audio/pull/2701

Reviewed By: carolineechen

Differential Revision: D39714126

Pulled By: mthrok

fbshipit-source-id: a5b04cfbf8bedce67c621b6bfe1dcd975b343313

Adopt `:autosummary:` in `torchaudio.datasets` module doc (#2692)

Summary:
* Introduce the mini-index at `torchaudio.datasets` page.
* Standardize the format of return type docstring.

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/datasets.html

<img width="936" alt="Screen Shot 2022-09-21 at 6 56 52 PM" src="https://user-images.githubusercontent.com/855818/191475141-a97f2bea-705f-49bc-8c34-6ec869e76793.png">

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/generated/torchaudio.datasets.CMUDict.html#torchaudio.datasets.CMUDict

<img width="1069" alt="Screen Shot 2022-09-21 at 6 57 32 PM" src="https://user-images.githubusercontent.com/855818/191475293-e3302528-27ea-4212-9c12-fd6d900fdf3e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2692

Reviewed By: carolineechen

Differential Revision: D39687463

Pulled By: mthrok

fbshipit-source-id: 4175fc15388817d2fe76206188618dd1576281df

Introduce IO section to getting started tutorials (#2703)

Summary:
Since new tutorials for StreamWriter are being added, there are now more tutorials for media IO than for the rest.
So this commit introduces sub-index for IO tutorials.

Pull Request resolved: https://github.com/pytorch/audio/pull/2703

Reviewed By: carolineechen

Differential Revision: D39769049

Pulled By: mthrok

fbshipit-source-id: 19a3981bc624fdce1d5d703c67e28a751a15e812

[Nova] Moving Linux Wheels over to Nova (#2702)

Summary:
This does 2 things:

1. Comments out Linux Wheels-related jobs in CircleCI so that they are not run on nightlies/releases.
2. Adds a GHA workflow that calls the build workflow in pytorch/test-infra.

Testing:
Verified that the builds are triggered by this workflow, and all builds are green: https://github.com/pytorch/audio/actions/runs/3109635749/jobs/5040029155

Pull Request resolved: https://github.com/pytorch/audio/pull/2702

Reviewed By: seemethere

Differential Revision: D39756852

Pulled By: osalpekar

fbshipit-source-id: 7e222d80ca0720e3be43b929f1e55f5c0166b947

[perf][5/5] Replace IValue::toString()->string() with IValue::toStringRef() (#2700)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2700

ATT for pytorch/audio

Reviewed By: mthrok

Differential Revision: D39707243

fbshipit-source-id: 1dc2a5a9fe913a9071e6df679e39d632b75212fb

Add CUDA version check (#2707)

Summary:
Adds check to ensure that TorchAudio and PyTorch versions use the same CUDA version.

Pull Request resolved: https://github.com/pytorch/audio/pull/2707

Reviewed By: mthrok

Differential Revision: D39791154

Pulled By: hwangjeff

fbshipit-source-id: de00889c7bac897c6b8762502f9d37797016b71d

Fix CUDA check (#2710)

Summary:
`torch.version.cuda` can return a string of form X.X or X.X.X. This PR modifies the CUDA version check to account for this.
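
A minimal sketch of a comparison that tolerates both forms, assuming only the major and minor components matter. The helper names `parse_cuda_version` and `cuda_versions_match` are hypothetical and are not taken from the TorchAudio source:

```python
from typing import Optional, Tuple


def parse_cuda_version(version: Optional[str]) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a string such as "11.6" or "11.6.2".

    Hypothetical helper: any patch component is ignored.
    """
    if version is None:  # e.g. a CPU-only build reports no CUDA version
        return None
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"Unexpected CUDA version string: {version!r}")
    return int(parts[0]), int(parts[1])


def cuda_versions_match(torch_cuda: Optional[str], ext_cuda: Optional[str]) -> bool:
    """Compare two version strings on major and minor only."""
    return parse_cuda_version(torch_cuda) == parse_cuda_version(ext_cuda)
```

With this shape, `"11.7.1"` and `"11.7"` compare equal, while `"11.7"` and `"11.6"` do not.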

Pull Request resolved: https://github.com/pytorch/audio/pull/2710

Reviewed By: carolineechen, nateanl

Differential Revision: D39796810

Pulled By: hwangjeff

fbshipit-source-id: b483bd8200195844d65d0caddebaf1b10f939b64

Remove linux wheel from circleci (#2714)

Summary:
Remove linux wheel from circleci

Pull Request resolved: https://github.com/pytorch/audio/pull/2714

Reviewed By: weiwangmeta

Differential Revision: D39816121

Pulled By: atalman

fbshipit-source-id: a3c99b530896888d7b4271d8b3f27f3c986b3480

Fix windows tests related to old conda on circleci (#2704)

Summary:
The conda version on CircleCI prints the following message:
```
==> WARNING: A newer version of conda exists. <==
  current version: 4.6.14
  latest version: 4.14.0
```
and as a result this error occurs:

```
+ /c/tools/miniconda3/Scripts/conda.exe install -v -y -c pytorch-nightly -c nvidia pytorch numpy ffmpeg pytorch-cuda=11.6
Collecting package metadata: ...working... done
Solving environment: ...working...

Too long with no output (exceeded 30m0s): context deadline exceeded
```

This should update the conda version running on the system and allow us to install pytorch and run some tests.

Pull Request resolved: https://github.com/pytorch/audio/pull/2704

Reviewed By: weiwangmeta

Differential Revision: D39820037

Pulled By: atalman

fbshipit-source-id: 4a82a7a6cbe3dc1a5807ac669e2fa79f454037fa

[Nova] Add build-type argument for when upload should be triggered (#2706)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2706

Reviewed By: kit1980

Differential Revision: D39786253

Pulled By: osalpekar

fbshipit-source-id: 2a0c427f57e5c70ff1cf419b7e0c2316e5f0e16c

Back out "[audio][PR] [Nova] Moving Linux Wheels over to Nova" (#2718)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2718

Original commit changeset: 7e222d80ca07

Original Phabricator Diff: D39756852 (https://github.com/pytorch/audio/commit/7ba7cf4d24a2967b8fa4aaff437116524281f8fd)

Reviewed By: weiwangmeta

Differential Revision: D39839899

fbshipit-source-id: f5605eb9882f7c7f0008e88338ab711131b29404

Fix mismatched cuda version in smoke tests on windows wheels (#2721)

Summary:
Example job that was failing previously:
https://app.circleci.com/pipelines/github/pytorch/audio/12796/workflows/ae96794a-6df4-4a2a-84df-ada7a7250045/jobs/927709

The failure:
```
"Detected that PyTorch and TorchAudio were compiled with different CUDA versions. "
RuntimeError: Detected that PyTorch and TorchAudio were compiled with different CUDA versions. PyTorch has CUDA version 11.7 whereas TorchAudio has CUDA version 11.6. Please install the TorchAudio version that matches your PyTorch version.
```

Has install command:
```
pip install $(ls ~/workspace/torchaudio*.whl) -f "https://download.pytorch.org/whl/${UPLOAD_CHANNEL}/torch_${UPLOAD_CHANNEL}.html"

pip install /c/Users/circleci/workspace/torchaudio-0.13.0.dev20220927+cu116-cp37-cp37m-win_amd64.whl -f https://download.pytorch.org/whl/nightly/torch_nightly.html
```

The Linux job (which succeeds) uses a different "-f" (find links) URL that includes the specific CUDA version:
https://app.circleci.com/pipelines/github/pytorch/audio/12809/workflows/aadca2ab-5a00-4a0a-ab6a-4a1b7a503713/jobs/927861

Command:
```
pip install $(ls ~/workspace/torchaudio*.whl) -f "https://download.pytorch.org/whl/${UPLOAD_CHANNEL}/${CU_VERSION}/torch_${UPLOAD_CHANNEL}.html"

 pip install /root/workspace/torchaudio-0.13.0.dev20220927+cu116-cp37-cp37m-linux_x86_64.whl -f https://download.pytorch.org/whl/nightly/cu116/torch_nightly.html

```

This PR makes Windows installation match the linux one.
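
The URL pattern from the Linux command above can be sketched as a small helper. The function name `find_links_url` is hypothetical; the parameters mirror the `UPLOAD_CHANNEL` and `CU_VERSION` variables in the CI command:

```python
def find_links_url(upload_channel: str, cu_version: str) -> str:
    """Build the CUDA-specific "-f" (find links) URL used by the Linux job.

    Hypothetical helper mirroring the shell template shown above.
    """
    return (
        f"https://download.pytorch.org/whl/{upload_channel}/"
        f"{cu_version}/torch_{upload_channel}.html"
    )


# Reproduces the expanded URL from the Linux job log.
print(find_links_url("nightly", "cu116"))
```

Including the CUDA version in the path is what keeps pip from resolving a PyTorch wheel built against a different CUDA version.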

Testing:
* verified command manually on Circle CI:
```
>>> import torch
>>> import torchaudio
C:\tools\miniconda3\lib\site-packages\torchaudio\compliance\kaldi.py:22: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_numpy.cpp:77.)
  EPSILON = torch.tensor(torch.finfo(torch.float).eps)
C:\tools\miniconda3\lib\site-packages\torchaudio\backend\utils.py:62: UserWarning: No audio backend is available.
  warnings.warn("No audio backend is available.")
```

Co-authored: weiwangmeta

Pull Request resolved: https://github.com/pytorch/audio/pull/2721

Reviewed By: hwangjeff

Differential Revision: D39870805

Pulled By: izaitsevfb

fbshipit-source-id: 2957cba4f53d00783a5c07099f24050ce15e7d1c

Removing cuda102 (#2715)

Summary:
Removing cuda102

Pull Request resolved: https://github.com/pytorch/audio/pull/2715

Reviewed By: hwangjeff

Differential Revision: D39823444

Pulled By: atalman

fbshipit-source-id: c11d798ab86cf9a6d5ed3804958b4a0c2f8a87ff

Revert "Removing cuda102 (#2715)" (#2723)

Summary:
Revert this for now until docker is updated

Pull Request resolved: https://github.com/pytorch/audio/pull/2723

Reviewed By: nateanl

Differential Revision: D39900382

Pulled By: atalman

fbshipit-source-id: f8701e359bc11e8f9f3a29144f7e7da336a470da

Cuda 102 deprecation (#2724)

Summary:
Cuda 10.2 deprecation, migration of unit tests from cuda 10.2 to cuda 11.6

Pull Request resolved: https://github.com/pytorch/audio/pull/2724

Reviewed By: weiwangmeta

Differential Revision: D39912484

Pulled By: atalman

fbshipit-source-id: e760b630375eae94384cda68d24f83ef46ada6d9

Delete packaging/README.md (#2730)

Summary:
The file looks hopelessly outdated.

Pull Request resolved: https://github.com/pytorch/audio/pull/2730

Reviewed By: mthrok

Differential Revision: D39993805

Pulled By: kit1980

fbshipit-source-id: f5ad97c83873061175455cc7b129ec71a9ec3d7d

Add citation for MuST-C dataset in Emformer RNNT pipeline. (#2728)

Summary:
The MuST-C reference is added in https://github.com/pytorch/audio/pull/2689. This PR adds the citation to the RNNT pipeline documentation.

Pull Request resolved: https://github.com/pytorch/audio/pull/2728

Reviewed By: carolineechen

Differential Revision: D39990882

Pulled By: nateanl

fbshipit-source-id: 011057952dd8aa30a4cb7c7af0ac75123e329d7e

Adopt :autosummary: to multiple modules (#2664)

Summary:
Adopt `:autosummary:` to various modules

    * torchaudio.compliance.kaldi
    * torchaudio.sox_effects
    * torchaudio.utils

Pull Request resolved: https://github.com/pytorch/audio/pull/2664

Reviewed By: nateanl

Differential Revision: D39841873

Pulled By: mthrok

fbshipit-source-id: ff4fa6976324fca5f35b737b715f976e2a722bac

Add StreamWriter media device/streaming tutorial (#2708)

Summary:
https://output.circle-artifacts.com/output/job/213c71c8-c9b5-4516-af92-a2f8dab2c9fd/artifacts/0/docs/tutorials/streamwriter_advanced.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2708

Reviewed By: carolineechen

Differential Revision: D40013310

Pulled By: mthrok

fbshipit-source-id: 7226b021ce2fe951b3bf0bd41e93a6bbcf696124

Tweak tutorials (#2733)

Summary:
* Port downstream change https://github.com/pytorch/tutorials/pull/2060
* Fix inter-tutorial links and references

Pull Request resolved: https://github.com/pytorch/audio/pull/2733

Reviewed By: hwangjeff

Differential Revision: D40086902

Pulled By: hwangjeff

fbshipit-source-id: 00b04c6a1b68fb9fadd52b610b26ecaab15d52d8

Increase CircleCi no_output_timeout for `install binaries` steps (#2734)

Summary:
The goal is to reduce the number of job failures due to timeouts, see https://app.circleci.com/pipelines/github/pytorch/audio/12882/workflows/f99da1a5-32e6-4bac-8ceb-fbf36d693e2d/jobs/936363?invite=true#step-105-105 for example.

Pull Request resolved: https://github.com/pytorch/audio/pull/2734

Reviewed By: weiwangmeta, atalman

Differential Revision: D40077578

fbshipit-source-id: 573f43a4d088a7086fa6925ac5ba1fdd1e8f39ec

Torchaudio load library path fix for windows python 3.8 (#2735)

Summary:
Torchaudio load library path fix for windows and python = 3.8

Fixes: https://github.com/pytorch/audio/issues/2726

Fixes following issue:

```
>>> import torchaudio
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 128, in <module>
    _init_extension()
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 98, in _init_extension
    _load_lib("libtorchaudio")
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 52, in _load_lib
    torch.ops.load_library(path)
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torch\_ops.py", line 573, in load_library
    ctypes.CDLL(path)
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\ctypes\__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\atalman\miniconda3\envs\mywin38\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
>>>
```

Caused by dlls not being found in the conda environment
```
C:\Users\atalman\miniconda3\envs\mywin38\bin\
```

While this environment is set correctly in PATH, it is ignored with Python = 3.8.
Please refer to: https://stackoverflow.com/questions/59330863/cant-import-dll-module-in-python
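
One common workaround for this behaviour is to register the directory explicitly with `os.add_dll_directory`, which exists on Windows in Python 3.8 and later. The sketch below illustrates that approach only; it is not necessarily what this PR does, and the helper name `register_dll_directory` is hypothetical:

```python
import os
import sys
from pathlib import Path


def register_dll_directory(directory: str) -> bool:
    """Add `directory` to the DLL search path on Windows with Python >= 3.8.

    Returns True when the directory was registered and False otherwise
    (non-Windows platform, older Python, or missing directory).
    """
    if sys.platform != "win32" or not hasattr(os, "add_dll_directory"):
        return False
    path = Path(directory)
    if not path.is_dir():
        return False
    # os.add_dll_directory requires an absolute path.
    os.add_dll_directory(str(path.resolve()))
    return True
```

A caller would run this before loading the extension library, passing the directory that holds the dependent DLLs.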

Pull Request resolved: https://github.com/pytorch/audio/pull/2735

Reviewed By: carolineechen

Differential Revision: D40112293

Pulled By: carolineechen

fbshipit-source-id: c7fc9bb49fc3ec4a2855c6ea473f36808103ed1e

Add StreamWriter tutorial (#2698)

Summary:
Add a tutorial for basic usage of torchaudio.io.StreamWriter.

https://output.circle-artifacts.com/output/job/55d9a495-af7a-483c-84cb-de9a08cfd2f3/artifacts/0/docs/tutorials/streamwriter_basic_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2698

Reviewed By: carolineechen

Differential Revision: D40133007

Pulled By: carolineechen

fbshipit-source-id: 141f692c32343981bfb228357f21562ffe36f623

Fix sphinx gallery list in io doc (#2736)

Summary:
Specifying multiple objects in `:minigallery:` directive shows duplicated tutorials.

This commit fixes it by listing tutorials based on module used.

https://output.circle-artifacts.com/output/job/c3da2a22-40d5-4e2d-b73a-28b39e712817/artifacts/0/docs/io.html

Before:
<img width="694" alt="Screen Shot 2022-10-07 at 7 04 35 AM" src="https://user-images.githubusercontent.com/855818/194427092-ca1202e7-0731-4c18-b48b-24923d692a4a.png">

After:

<img width="648" alt="Screen Shot 2022-10-07 at 7 03 14 AM" src="https://user-images.githubusercontent.com/855818/194426950-5b780458-2bf0-43ef-b020-fcbbfdf8d41b.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2736

Reviewed By: carolineechen

Differential Revision: D40160247

Pulled By: carolineechen

fbshipit-source-id: 547496f9b569ff7a4d70db97e90f3ea503344477

Modify `info_audio` to compute and return number of frames if not found in stream info (#2740)

Summary:
Modifies `info_audio` to compute and return number of frames if not found in stream info. This resolves the `num_frames == 0` issue for mp3 that's cited in https://github.com/pytorch/audio/issues/2524.

Pull Request resolved: https://github.com/pytorch/audio/pull/2740

Reviewed By: nateanl

Differential Revision: D40168639

Pulled By: nateanl

fbshipit-source-id: bb45baa0f9cd56844315b04e40ab9835d825fc24

Update sox info docstring to account for mp3 frame count handling (#2742)

Summary:
Updates sox info docstring to account for mp3 frame count handling fix introduced in https://github.com/pytorch/audio/issues/2740.

Pull Request resolved: https://github.com/pytorch/audio/pull/2742

Reviewed By: nateanl

Differential Revision: D40189846

Pulled By: nateanl

fbshipit-source-id: d6371418d7d4867dd0b97ee72ebf846d5c93dc30

Update HW video processing tutorial (#2739)

Summary:
* Add HW encoding to HW tutorial

https://colab.research.google.com/drive/1DDah_IaGULEO66CfQWltRqaVheBkiXdN#scrollTo=eXzKSVrHk1vS

Pull Request resolved: https://github.com/pytorch/audio/pull/2739

Reviewed By: hwangjeff

Differential Revision: D40197086

Pulled By: hwangjeff

fbshipit-source-id: 1780a5419f6705f7c24ba96bd46c3310438af7db

Add IEMOCAP dataset (#2732)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2732

Reviewed By: nateanl

Differential Revision: D40186996

Pulled By: nateanl

fbshipit-source-id: a0ad325b7153c9e580dad2c515730dadbe8840c4

Fix HuBERT docstring (#2746)

Summary:
The docstring of `wav2vec2` argument is wrong. Fix it in this PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/2746

Reviewed By: carolineechen

Differential Revision: D40225995

Pulled By: nateanl

fbshipit-source-id: 770e9c928ebebd7b6307e181601eb64625d668da

Add unit test for LibriMix dataset (#2659)

Summary:
Besides the unit test, the PR also addresses these issues:
- The original `LibriMix` dataset only supports "min" mode, which means the audio length is the minimum of all clean sources. It is the default for the source separation task. Users may also want to use "max" mode, which allows for end-to-end separation and recognition. The PR adds the ``mode`` argument to let users decide which dataset they want to use.
- If the task is ``"enh_both"``, the target is the audio in ``mix_clean`` instead of separate clean sources. The PR fixes it to use ``mix_clean`` as target.
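
To illustrate the length semantics of the two modes, here is a small pure-Python sketch. It is not the dataset implementation (the dataset reads pre-generated files); `align_sources` is a hypothetical helper, and zero-padding in "max" mode is an assumption about how the longer variant is formed:

```python
from typing import List


def align_sources(sources: List[List[float]], mode: str = "min") -> List[List[float]]:
    """Bring all sources to a common length.

    "min": truncate every source to the shortest one.
    "max": zero-pad every source to the longest one (assumed behaviour).
    """
    if mode == "min":
        length = min(len(s) for s in sources)
        return [s[:length] for s in sources]
    if mode == "max":
        length = max(len(s) for s in sources)
        return [s + [0.0] * (length - len(s)) for s in sources]
    raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
```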

Pull Request resolved: https://github.com/pytorch/audio/pull/2659

Reviewed By: carolineechen

Differential Revision: D40229227

Pulled By: nateanl

fbshipit-source-id: fc07e0d88a245e1367656d3767cf98168a799235

Add Snips Dataset (#2738)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2738

Reviewed By: carolineechen

Differential Revision: D40238099

Pulled By: nateanl

fbshipit-source-id: c5cc94c2a348a6ef34c04b8dd26114ecb874d73e

Fix windows python 3.8 loading path (#2747)

Summary:
Fix windows python 3.8 loading path

Pull Request resolved: https://github.com/pytorch/audio/pull/2747

Reviewed By: nateanl

Differential Revision: D40264326

Pulled By: nateanl

fbshipit-source-id: f4a24757de7b48c63a7481034eb11fc3ff174327

Add metadata for Librimix (#2751)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2751

Reviewed By: nateanl

Differential Revision: D40267874

Pulled By: carolineechen

fbshipit-source-id: 4e45a02c650ed65c05cde82289a400a3be877927

Increase inactivity timeout for binary build jobs (#2754)

Summary:
Increase inactivity timeout for binary build jobs

Pull Request resolved: https://github.com/pytorch/audio/pull/2754

Reviewed By: carolineechen

Differential Revision: D40275368

Pulled By: atalman

fbshipit-source-id: 5e682bb78bda640d615f874fbdf0e650b5a38ee0

Skip hubert xlarge torchscript test (#2758)

Summary:
A couple of CircleCI unit tests are failing during the HuBERT xlarge TorchScript test, which has been known to fail on Windows in the past (#65776). This PR disables this test on CircleCI.

cc atalman

Pull Request resolved: https://github.com/pytorch/audio/pull/2758

Reviewed By: mthrok

Differential Revision: D40290535

Pulled By: carolineechen

fbshipit-source-id: 5c5fb43434a517b6c439a8cb8e853015d1550a57

Improve wav2vec2/hubert model for pre-training (#2716)

Summary:
This PR improves the Wav2Vec2/HuBERT model regarding model pre-training.

- The model initialization of positional embedding and transformer module is essential to model pre-training. The accuracy of unmasked frames should be higher than that of masked frames, as it is an easier task, but without the initialization, the accuracy of masked frames is higher than that of unmasked frames.
  Compared the performance after two epochs with 16 GPUs.
  - With model initialization, the accuracies of masked/unmasked frames are 0.08/0.11.
  - Without model initialization, the accuracies of masked/unmasked frames are 0.06/0.04.
- After adding the model initialization, the gradient overflows easily (i.e. becomes `nan`). In the paper [Self-Supervised Learning for speech recognition with Intermediate layer supervision](https://arxiv.org/abs/2112.08778), the authors propose a simple but effective method to mitigate the overflow issue: scale down the product of query and key, and subtract the maximum value from it (subtracting a constant value does not change the output of softmax). This guarantees that the value does not overflow.
- In the original fairseq, the mask indices are generated by `numpy.random.choice`. Here, `torch.multinomial` is replaced with `torch.randperm`. (cc carolineechen).
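
The max-subtraction part of the overflow mitigation can be sketched in plain Python. This shows only why subtracting a constant leaves softmax unchanged while keeping the exponentials in range; it is not the model code, and `stable_softmax` is a hypothetical name:

```python
import math
from typing import List


def stable_softmax(scores: List[float]) -> List[float]:
    """Softmax computed after subtracting the maximum score.

    exp(s - m) / sum(exp(s_j - m)) equals exp(s) / sum(exp(s_j)) for any
    constant m, so the result is unchanged, but every exponent is <= 0
    and math.exp cannot overflow.
    """
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# A direct math.exp(1000.0) would raise OverflowError; the shifted form does not.
probs = stable_softmax([1000.0, 1001.0, 1002.0])
```

The result for `[1000.0, 1001.0, 1002.0]` matches the result for `[0.0, 1.0, 2.0]`, since the two inputs differ only by a constant.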

Other improvements within training scripts will be included in a separate PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/2716

Reviewed By: xiaohui-zhang

Differential Revision: D39832189

Pulled By: nateanl

fbshipit-source-id: f4d2a473a79ad63add2dd16624bd155d5ce4de27

Improve hubert recipe for pre-training and fine-tuning (#2744)

Summary:
Follow-up to https://github.com/pytorch/audio/issues/2716
- For preprocessing
  - The HuBERT features take a lot of memory, which may not fit on some machines. Enable using a subset of the features for training a k-means model.

- For pre-training
  - Normalize the loss based on the total number of masked frames across all GPUs.
  - Use mixed precision training. fp16 is not well supported in pytorch_lightning.
  - Log accuracies of masked/unmasked frames during training.
  - Clip the gradients with norm `10.0`.

- For ASR fine-tuning
  - Normalize the loss based on the total number of batches across all GPUs, same as in the conformer recipe of TorchAudio.
  - Use mixed precision training.
  - Add "|" after the end of transcription to capture the silence/word termination, same as in fairseq recipe.

- Update the WER results on LibriSpeech dev and test sets.

|                   | WER% (Viterbi)|  WER% (KenLM) |
|:-----------------:|--------------:|--------------:|
| dev-clean         |       10.9    |       4.2     |
| dev-other         |       17.5    |       9.4     |
| test-clean        |       10.9    |       4.4     |
| test-other        |       17.8    |       9.5     |
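
The pre-training loss normalization listed above (dividing by the total number of masked frames across all GPUs) can be illustrated without any distributed setup. In this sketch each worker contributes a summed loss and a frame count; in real training those totals would be gathered with a collective operation, which is assumed here and not shown. `normalized_loss` is a hypothetical name:

```python
from typing import List, Tuple


def normalized_loss(per_worker: List[Tuple[float, int]]) -> float:
    """Global loss per masked frame.

    per_worker holds one (summed loss, number of masked frames) pair per GPU.
    """
    total_loss = sum(loss for loss, _ in per_worker)
    total_frames = sum(frames for _, frames in per_worker)
    return total_loss / total_frames


# Two workers with unequal frame counts.
workers = [(6.0, 3), (4.0, 1)]
global_value = normalized_loss(workers)                        # 10.0 / 4 frames
mean_of_means = sum(l / n for l, n in workers) / len(workers)  # (2.0 + 4.0) / 2
```

The two quantities differ whenever workers hold different numbers of masked frames; the global form weights every frame equally.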

Pull Request resolved: https://github.com/pytorch/audio/pull/2744

Reviewed By: carolineechen

Differential Revision: D40282322

Pulled By: nateanl

fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90

Fix typos in tacotron2 tutorial (#2761)

Summary:
`publishe`->`published`

Also, not sure if it should be `pre-trained weight is published` or `pre-trained weights are published`

Pull Request resolved: https://github.com/pytorch/audio/pull/2761

Reviewed By: carolineechen

Differential Revision: D40313042

Pulled By: malfet

fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b

Add gtzan download note (#2763)

Summary:
GTZAN download link is no longer working, so the torchaudio download functionality for GTZAN does not work properly, per https://github.com/pytorch/audio/issues/2743. Add a note in the docs to reflect this discovery.

Pull Request resolved: https://github.com/pytorch/audio/pull/2763

Reviewed By: nateanl, mthrok

Differential Revision: D40315071

Pulled By: carolineechen

fbshipit-source-id: 3250326c45d227546a9c62b33ba890199ad19242

Update tutorial author information (#2764)

Summary:
Adding and updating author information.

Pull Request resolved: https://github.com/pytorch/audio/pull/2764

Reviewed By: carolineechen

Differential Revision: D40332427

Pulled By: mthrok

fbshipit-source-id: 4f04c7351386c122e3b0a45c2ed1757a04b7dc9a

Add custom lm example to decoder tutorial (#2762)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2762

Reviewed By: mthrok

Differential Revision: D40332603

Pulled By: carolineechen

fbshipit-source-id: 2de51265adc81b4728f4d6798d287bd2eccf5251

Fix CTCDecoder doc (#2766)

Summary:
* Document `__call__` instead of `__init__`
* List CTCHypothesis first as it is used in combination with CTCDecoder
* Fix indentation of score method docstring

Pull Request resolved: https://github.com/pytorch/audio/pull/2766

Reviewed By: carolineechen

Differential Revision: D40349388

Pulled By: mthrok

fbshipit-source-id: 5e512e6c2b29d3533eb62d09b289154ccd1abf4c

Fix fading in hybrid demucs tutorial (#2769)

Summary:
The separation is applied to chunks of audio to avoid OOM. The combination of consecutive chunks is described in the graph:

![image](https://user-images.githubusercontent.com/8653221/195691886-002844e6-4ec5-41de-8910-df8046553998.png)

In the last audio chunk, there is no future chunk to be combined, hence the overlap on the right side doesn't need to be faded.
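
A pure-Python sketch of this overlap-add scheme, assuming linear fades and list-based signals. The tutorial itself operates on tensors; the names `chunk_weights` and `overlap_add` are hypothetical, and the per-chunk processing step is omitted so that only the fading logic is visible:

```python
from typing import List


def chunk_weights(length: int, overlap: int, fade_in: bool, fade_out: bool) -> List[float]:
    """Per-sample weights for one chunk with optional linear fades."""
    weights = [1.0] * length
    for j in range(overlap):
        ramp = (j + 1) / (overlap + 1)
        if fade_in:
            weights[j] = ramp
        if fade_out:
            weights[length - overlap + j] = 1.0 - ramp
    return weights


def overlap_add(signal: List[float], chunk_len: int, overlap: int) -> List[float]:
    """Split `signal` into overlapping chunks and recombine them with cross-fades."""
    if not 0 < overlap <= chunk_len // 2:
        raise ValueError("overlap must be positive and at most half the chunk length")
    hop = chunk_len - overlap
    out = [0.0] * len(signal)
    start, first = 0, True
    while start < len(signal):
        end = min(start + chunk_len, len(signal))
        last = end == len(signal)
        chunk = signal[start:end]
        # The first chunk has no predecessor and the last chunk has no successor,
        # so those edges are left unfaded.
        weights = chunk_weights(len(chunk), overlap, fade_in=not first, fade_out=not last)
        for i, (x, w) in enumerate(zip(chunk, weights)):
            out[start + i] += x * w
        if last:
            break
        first = False
        start += hop
    return out
```

Because the fade-out of one chunk and the fade-in of the next sum to one at every overlapping sample, recombining unmodified chunks reproduces the input. Fading the right edge of the last chunk would instead attenuate the end of the output.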

Pull Request resolved: https://github.com/pytorch/audio/pull/2769

Reviewed By: carolineechen

Differential Revision: D40358382

Pulled By: nateanl

fbshipit-source-id: ec8be895d7a67acb257e2693b64922397163ed5e

Fix leaking matplotlib figure (#2771)

Summary:
In StreamWriter basic usage tutorial, matplotlib is used to generate raster images of waveforms, and the figure used is left unshown in the resulting tutorial with the use of ``sphinx_gallery_defer_figures`` command.

It turned out that this figure is shown in the next code block executed by Sphinx Gallery, and the figure is placed in a totally unrelated place. https://pytorch.org/audio/main/tutorials/audio_feature_extractions_tutorial.html

<img width="951" alt="Screen Shot 2022-10-14 at 10 06 58 PM" src="https://user-images.githubusercontent.com/855818/195855124-ecd9be49-5085-4acd-9a93-608d9d1ee9ce.png">

This commit fixes it by closing the figure.

Pull Request resolved: https://github.com/pytorch/audio/pull/2771

Reviewed By: nateanl

Differential Revision: D40382076

Pulled By: mthrok

fbshipit-source-id: 015f2bab8492d3b4fbe70e1174c7776a5aa2679a

Update resampling tutorial (#2773)

Summary:
* Refactor benchmark script
* Rename `time` variable to avoid a (potential) conflict with the `time` module
* Fix `beta` parameter in benchmark (it was not used previously)
* Use `timeit` module for benchmark
* Add plot
* Move the comment on result at the end
* Add link to an explanation of aliasing
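
A minimal sketch of a `timeit`-based benchmark in the spirit of the changes above. The workload is a stand-in rather than the resampling code from the tutorial, and `benchmark` and `workload` are hypothetical names. The result is stored in `elapsed` rather than `time`, so the `time` module name is not shadowed:

```python
import timeit


def benchmark(func, repeat: int = 5, number: int = 100) -> float:
    """Return the best per-call time in seconds over `repeat` runs."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def workload() -> int:
    # Stand-in for the function being measured.
    return sum(i * i for i in range(1000))


elapsed = benchmark(workload)
print(f"{elapsed * 1e6:.1f} microseconds per call")
```

Taking the minimum over several repeats reduces the influence of unrelated system noise on the reported figure.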

https://output.circle-artifacts.com/output/job/20b57d2f-3614-4161-a18e-e0c1a537739c/artifacts/0/docs/tutorials/audio_resampling_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2773

Reviewed By: carolineechen

Differential Revision: D40421337

Pulled By: mthrok

fbshipit-source-id: b402f84d4517695daeca75fb84ad876ef9354b3a

Update description of HDemucs pipelines (#2774)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2774

Reviewed By: carolineechen

Differential Revision: D40445274

Pulled By: nateanl

fbshipit-source-id: 6388323a5fa5c548a86829cb3f7cafee5382d18d

Add file_name to the returned item in Snips dataset (#2775)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2775

Reviewed By: carolineechen

Differential Revision: D40481144

Pulled By: nateanl

fbshipit-source-id: 5d0fb2478767704603a3ec28d74160e7892d4d0e

Update download path for speechcommands (#2777)

Summary:
The previous download link for v0.02 did not download the entire dataset, but only the training dataset, resulting in issues when trying to access the testing or validation data.

Pull Request resolved: https://github.com/pytorch/audio/pull/2777

Reviewed By: nateanl

Differential Revision: D40480605

Pulled By: carolineechen

fbshipit-source-id: a594506b4ccfb548a7d5043b716c58463480c103

Add notes on file structure in Voxceleb1 based datasets (#2776)

Summary:
The file structure of VoxCeleb1 is as follows:
```
root/
└── wav/
    └── speaker_id folders
```
Users who use [Kaldi](https://github.com/kaldi-asr/kaldi/blob/f6f4ccaf213f0fe8b26e633a7dc0c802150626a0/egs/voxceleb/v1/local/make_voxceleb1_v2.pl) to get the VoxCeleb1 dataset have "dev" and "test" folders above the "wav" folder. However, the file lists such as https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt or https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/iden_split.txt make no such differentiation, so it is not necessary to put the extracted files into separate folders.

This PR adds notes in the `VoxCeleb1Identification` and `VoxCeleb1Verification` datasets to inform users of the file structure.

Pull Request resolved: https://github.com/pytorch/audio/pull/2776

Reviewed By: carolineechen

Differential Revision: D40483707

Pulled By: nateanl

fbshipit-source-id: ccd1780a72a5b53f0300c2466c3073a293ad7b8d

[Nova] New GHA Workflow for Docstring Sync (#2720)

Summary:
Create a standalone GitHub Actions workflow for Docstring Sync. This job (https://app.circleci.com/pipelines/github/pytorch/audio/12625/workflows/96223ad2-0fcd-4dae-a045-d530aaf9b55c/jobs/907466) currently depends on linux wheels builds, which creates a dependency that makes the migration to Nova trickier. This PR creates a fresh standalone workflow for this job that is triggered per-PR and before nightly/release cuts.

Pull Request resolved: https://github.com/pytorch/audio/pull/2720

Reviewed By: izaitsevfb, seemethere

Differential Revision: D39863574

Pulled By: osalpekar

fbshipit-source-id: 8599dc006693242278857a3dedeb4fddc1eed14b

[Nova] Clean commit for Enabling Nova Linux Wheels Workflows (#2719)

Summary:
Creating this fresh PR since we're reverting the older commit that removed build configs from the CircleCI file. This does not change the existing builds/uploads in CircleCI, and should not break any existing jobs/workflows. This is just to add back workflows to build the Linux Wheels with Nova, upload them to GH artifacts (NOT to the actual nightly channels), and ensure that they produce the same binaries as CircleCI. TO CLARIFY: this does not upload anything to nightly channels, so this PR has no effect on any existing jobs or distributed binaries.

We will create a workflow (most likely in test-infra) that does this comparison between the binaries to ensure there is parity between the binaries before we start uploading with Nova.

Pull Request resolved: https://github.com/pytorch/audio/pull/2719

Reviewed By: hwangjeff, weiwangmeta

Differential Revision: D39866440

Pulled By: osalpekar

fbshipit-source-id: 9ebf0402214fcd97cc519801276d85d336617410

Add iemocap variants (#2778)

Summary:
add ability to load only improvised or only scripted utterances.

Pull Request resolved: https://github.com/pytorch/audio/pull/2778

Reviewed By: nateanl

Differential Revision: D40511865

Pulled By: carolineechen

fbshipit-source-id: e1fe3908ac2aa306ad30c242ddd25762b2268539

Bump version to 0.14 (#2779)

Summary:
Bump version to 0.14

Pull Request resolved: https://github.com/pytorch/audio/pull/2779

Reviewed By: carolineechen

Differential Revision: D40523034

Pulled By: atalman

fbshipit-source-id: 325e6ffcac4763a7d83ba600c2c3d9eadae03c31

Fix doc in torchaudio.backend (#2781)

Summary:
address https://github.com/pytorch/audio/issues/2780

Pull Request resolved: https://github.com/pytorch/audio/pull/2781

Reviewed By: carolineechen, mthrok

Differential Revision: D40556794

Pulled By: nateanl

fbshipit-source-id: b24912489d41e5663b4b4dcfb8be743fb962097e

Remove archive file in gh-pages branch (#2786)

Summary:
The motivation for generating `artifact.tar.gz` in the `build_docs` job is to make it easy to add documentation in each stable release. However, it is committed to the `gh-pages` branch, which makes the git repository very large (see https://github.com/pytorch/audio/issues/2783). This PR removes the tar file from the commit.

Pull Request resolved: https://github.com/pytorch/audio/pull/2786

Reviewed By: carolineechen

Differential Revision: D40591152

Pulled By: nateanl

fbshipit-source-id: 47df60c2ec7bcdcc40e2b6…
BriansIDP pushed a commit to BriansIDP/audio that referenced this pull request Jan 21, 2023
first commit BrianSun

Conformer RNN-T with TCPGen for biasing

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674296079 +0000

Fix stylecheck (#2606)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2606

Reviewed By: nateanl

Differential Revision: D38502666

Pulled By: carolineechen

fbshipit-source-id: 1e279996fff3621835a07882c63328856fe38f3a

Add NNLM support to CTC Decoder (#2528)

Summary:
Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs.

The `ctc_decoder` API is as follows
- To decode with KenLM, pass in the KenLM language model path to the `lm` variable
- To decode with a custom LM, create a Python class that subclasses `CTCDecoderLM`, and pass in the class to the `lm` variable. Additionally, create a file of LM words listed in order of the LM index, with one word per line, and pass in the file to `lm_path`.
- To decode without a language model, set `lm` to `None` (default)

Validated against fairseq w2l decoder on sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests to validate custom implementations of ZeroLM and KenLM, and also using a biased LM.

Follow ups:
- Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory

cc jacobkahn

Pull Request resolved: https://github.com/pytorch/audio/pull/2528

Reviewed By: mthrok

Differential Revision: D38243802

Pulled By: carolineechen

fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7

Fix dataset docs parsing issue with extra spaces (#2607)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2607

Reviewed By: carolineechen, nateanl

Differential Revision: D38522606

Pulled By: skim0514

fbshipit-source-id: 2c38b8dcb343bcf624bfda1bfa2afd91abf2e668

Fixed argument validation in TorchAudio filtering (#2609)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2609

Converted argument validations in torchaudio/functional/filtering from assert based validation to the preferred if-then raise validation. Added specific error messages in all cases.
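
A minimal sketch of the conversion described above. `validate_gain` and its rule are hypothetical and only illustrate the style; they are not taken from `torchaudio/functional/filtering`. An `assert` is stripped when Python runs with `-O` and carries no message by default, while an explicit check always runs and can report the offending value.

```python
def validate_gain(gain: float) -> float:
    """Hypothetical validator illustrating the if-then-raise style."""
    # Before: `assert gain >= 0` (skipped under `python -O`, no error message).
    # After: an explicit check that always runs and explains the failure.
    if gain < 0:
        raise ValueError(f"Expected gain to be non-negative. Found: {gain}")
    return gain
```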

Reviewed By: mthrok

Differential Revision: D38515029

fbshipit-source-id: 6c644a042f86c6feb2bbe8bd02fdb484fe27fae9

Fix bug in Conformer RNN-T recipe (#2611)

Summary:
https://github.com/pytorch/audio/issues/2535 modified the Conformer RNN-T Lightning module to accept a SentencePiece model instance rather than a file path. This PR makes changes to account for this in the train script.

Pull Request resolved: https://github.com/pytorch/audio/pull/2611

Reviewed By: carolineechen

Differential Revision: D38578892

Pulled By: hwangjeff

fbshipit-source-id: ec3b9823ad30ffb730baa13d10d8b79020866aac

Add additive noise function (#2608)

Summary:
Adds function `add_noise`, which computes and returns the sum of a waveform and scaled noise.
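
A rough pure-Python sketch of "waveform plus scaled noise". It assumes the scale is chosen to reach a target signal-to-noise ratio in dB, which the summary above does not state. `add_scaled_noise` and its parameters are hypothetical and do not reproduce the signature of the added function.

```python
import math
from typing import List


def add_scaled_noise(waveform: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Return waveform + scale * noise, with scale chosen to reach `snr_db`."""
    if len(waveform) != len(noise):
        raise ValueError("waveform and noise must have the same length")
    signal_energy = sum(x * x for x in waveform)
    noise_energy = sum(n * n for n in noise)
    if signal_energy == 0 or noise_energy == 0:
        raise ValueError("waveform and noise must both have non-zero energy")
    # SNR(dB) = 10 * log10(signal_energy / (scale**2 * noise_energy))
    scale = math.sqrt(signal_energy / (noise_energy * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(waveform, noise)]
```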

Pull Request resolved: https://github.com/pytorch/audio/pull/2608

Reviewed By: nateanl

Differential Revision: D38557141

Pulled By: hwangjeff

fbshipit-source-id: 1457fa213f43ca5b4333d3c7580971655d4260a0

Introducing pytorch-cuda metapackage (#2612)

Summary:
Introducing pytorch-cuda metapackage

Same as: https://github.com/pytorch/vision/pull/6371
Following PR: https://github.com/pytorch/builder/pull/1094
Adds a CUDA metapackage called `pytorch-cuda`. This way we can make sure to install the correct version of the CUDA dependencies and not depend on conda-forge.

Pull Request resolved: https://github.com/pytorch/audio/pull/2612

Reviewed By: hwangjeff, seemethere, nateanl

Differential Revision: D38633332

Pulled By: atalman

fbshipit-source-id: 78a6115bb252ebdb6d66a57d7d2c4a4978ddb501

Remove outdated doc (#2617)

Summary:
`ctc_decoder` has become beta, so remove it from the prototype documentation.

Pull Request resolved: https://github.com/pytorch/audio/pull/2617

Reviewed By: hwangjeff

Differential Revision: D38706869

Pulled By: nateanl

fbshipit-source-id: 41679f4e65a584b6b882af4551a50123f1dcef02

Update doc version selector link (#2605)

Summary:
The link to the version selector has been an absolute link, which has been
a trap when reviewing gh-pages deployments from a fork.

This commit changes that to relative link.

Pull Request resolved: https://github.com/pytorch/audio/pull/2605

Test Plan:
- https://mthrok.github.io/audio/main/index.html -> click version selector -> https://mthrok.github.io/audio/versions.html
- https://mthrok.github.io/audio/0.12.1/index.html -> click version selector -> https://pytorch.org/audio/versions.html

Reviewed By: carolineechen, nateanl

Differential Revision: D38695645

Pulled By: mthrok

fbshipit-source-id: 91132ac19b8c61f39d304a162435b9c6599ef2b2

Fix anaconda upload (#2621)

Summary:
Same as:
https://github.com/pytorch/vision/pull/6422

Testing:
```
export ANACONDA_PATH=$(conda info --base)/bin
echo $ANACONDA_PATH
/opt/homebrew/Caskroom/miniconda/base/bin
$ANACONDA_PATH/anaconda -V
anaconda Command line client (version 1.10.0)
```
Failure: https://github.com/pytorch/audio/runs/7837085749?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/audio/pull/2621

Reviewed By: weiwangmeta, seemethere

Differential Revision: D38714324

Pulled By: atalman

fbshipit-source-id: 55342cf69006e9250403c955202846bab4516f3e

Move xcode to 14 from 12.5 (#2622)

Summary:
Similar to https://github.com/pytorch/vision/pull/6218
Fixing MacOS builds

Pull Request resolved: https://github.com/pytorch/audio/pull/2622

Reviewed By: weiwangmeta

Differential Revision: D38722983

Pulled By: atalman

fbshipit-source-id: 4cef85c97dc270fc812bc289592c4f3815f73c85

Added example for MelScale transform (#2616)

Summary:
Added example for MelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2616

Reviewed By: carolineechen

Differential Revision: D38743145

Pulled By: nateanl

fbshipit-source-id: e24ca92f5317a0ea5a141418bf084b12cfb22486

Added example for AmplitudeToDB transform (#2615)

Summary:
Added example for AmplitudeToDB transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2615

Reviewed By: carolineechen

Differential Revision: D38743117

Pulled By: nateanl

fbshipit-source-id: bf0f760299f4777a4bca65da86359faa00b16207

Use double quotes for string in functional and transforms (#2618)

Summary:
To make the code consistent, we should use double quotation marks for all strings. This PR makes such changes in functional and transforms.

Pull Request resolved: https://github.com/pytorch/audio/pull/2618

Reviewed By: carolineechen

Differential Revision: D38744137

Pulled By: nateanl

fbshipit-source-id: 74213a24d9f66c306cc92019d77dcb2a877f94bd

Fix doc warning (#2627)

Summary:
Resolves the following warning

```
/torchaudio/docs/source/transforms.rst:94: WARNING: Title underline too short.

:hidden:`Loudness`
-----------------
```
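
As an illustration of the rule behind this warning, here is a small sketch with a hypothetical helper `rst_heading`. Docutils expects a title underline to be at least as long as the title text, so deriving the underline from the title length avoids the mismatch.

```python
def rst_heading(title: str, char: str = "-") -> str:
    """Return an RST section title with an underline of matching length."""
    return f"{title}\n{char * len(title)}"


if __name__ == "__main__":
    print(rst_heading(":hidden:`Loudness`"))
```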

Pull Request resolved: https://github.com/pytorch/audio/pull/2627

Reviewed By: carolineechen

Differential Revision: D38814802

Pulled By: mthrok

fbshipit-source-id: 5dfaf2d7bae22dba0f4a14f04ca63f28d6b2a749

Fix Sphinx-gallery display and pin sphinx-related packages (#2629)

Summary:
This commit fixes the issue with the recent Sphinx-Gallery update.
Also it pins the versions of Sphinx-related packages.

Before:

<img width="256" alt="Screen Shot 2022-08-17 at 10 02 23 PM" src="https://user-images.githubusercontent.com/855818/185140952-28f2d98a-b586-424c-a003-b69089f48eb9.png">

After:

https://user-images.githubusercontent.com/855818/185271889-bd4f86a0-986b-43bb-8121-bd77750d74f0.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2629

Reviewed By: carolineechen

Differential Revision: D38816417

Pulled By: mthrok

fbshipit-source-id: 11ee3f9121d9a302772ee1f461dacae52eb28852

Tweak tutorials (#2630)

Summary:
Resolves the following warnings

```
/torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:195: WARNING: Unexpected indentation.
/torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:446: WARNING: Unexpected indentation.
/torchaudio/docs/source/tutorials/audio_io_tutorial.rst:559: WARNING: Content block expected for the "note" directive; none found.
/torchaudio/docs/source/tutorials/mvdr_tutorial.rst:338: WARNING: Bullet list ends without a blank line; unexpected unindent.
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2630

Reviewed By: nateanl

Differential Revision: D38816632

Pulled By: mthrok

fbshipit-source-id: 135ded4e064d136be67ce24439e96f5e9c9ce635

Update notes around nightly build and third parties (#2632)

Summary:
Google Colab now has torchaudio 0.12 pre-installed.
This commit removes the note about nightly build.

Pull Request resolved: https://github.com/pytorch/audio/pull/2632

Reviewed By: carolineechen

Differential Revision: D38827632

Pulled By: mthrok

fbshipit-source-id: ac769780868b741c3012357d589ec0019d9af6eb

Added example for InverseMelScale transform (#2635)

Summary:
Added example for InverseMelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2635

Reviewed By: carolineechen

Differential Revision: D38830318

Pulled By: nateanl

fbshipit-source-id: fd26a700d495f6755db0767625aa8577cb89bd83

Update ASR inference tutorial (#2631)

Summary:
* Use download_asset
* Remove notes around nightly
* Print versions first
* Remove duplicated import

Pull Request resolved: https://github.com/pytorch/audio/pull/2631

Reviewed By: carolineechen

Differential Revision: D38830395

Pulled By: mthrok

fbshipit-source-id: c9259df33562defe249734d1ed074dac0fddc2f6

Update README.md (#2633)

Summary:
Update compatibility matrix

Pull Request resolved: https://github.com/pytorch/audio/pull/2633

Reviewed By: nateanl

Differential Revision: D38827670

Pulled By: mthrok

fbshipit-source-id: 5c66bf60a06e37919ee725a5f4adf571e6c89100

Refactor sox pybind source code (#2636)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2636

At the early stage of torchaudio extension module,
`torchaudio/csrc/pybind` directory was created so that
all the code defining Python interface would be placed
there and there will be only one extension module called
`torchaudio._torchaudio`.

However, the codebase has evolved in a way that separate
extensions are defined for each feature (third party
dependency) for the sake of a more modular file organization.

What is left in `csrc/pybind` is libsox Python bindings.
This commit moves it under `csrc/sox`.

Follow-up: rename `torchaudio._torchaudio` to `torchaudio._torchaudio_sox`.

Reviewed By: carolineechen

Differential Revision: D38829253

fbshipit-source-id: 3554af45a2beb0f902810c5548751264e093f28d

Added example for MFCC transform (#2637)

Summary:
Added example for MFCC transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Note: Python formatter package `black` uses double quotes for the string dict keys (e.g. in `melkwargs` for this example). Please let me know if there is a different linter/format/convention that is preferred!

Pull Request resolved: https://github.com/pytorch/audio/pull/2637

Reviewed By: carolineechen

Differential Revision: D38873729

Pulled By: nateanl

fbshipit-source-id: 2e8fe2930671e7c5d02c0c37cf1ca5cc8c5079e3

Added example for Loudness transform (#2641)

Summary:
Added example for Loudness transform (implemented in PR https://github.com/pytorch/audio/issues/2472) as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2641

Reviewed By: nateanl

Differential Revision: D38907782

Pulled By: carolineechen

fbshipit-source-id: fd2bcc4bac3095a626ea9cf36cb70cb2bf003d63

Update Sphinx-gallery to 0.11.1 (#2638)

Summary:
The minor release fixes some gallery issues, which allows us to remove
some of the customization we had in https://github.com/pytorch/audio/issues/2629

https://output.circle-artifacts.com/output/job/553a9b98-8260-4cb4-a681-20ef97d2c33e/artifacts/0/docs/pipelines.html#torchaudio.pipelines.Wav2Vec2ASRBundle

Pull Request resolved: https://github.com/pytorch/audio/pull/2638

Reviewed By: carolineechen, nateanl

Differential Revision: D38909097

Pulled By: mthrok

fbshipit-source-id: 78346d93b54fca2a19b28991c224324ef53221c9

[Nova] Added draft calling GHA workflow for building linux wheels (#2548)

Summary:
As part of Project Nova, we are consolidating CI/CD workflows and infra, making them reusable across PyTorch ecosystem libraries. https://github.com/pytorch/test-infra/pull/460 introduces a general-purpose reusable workflow to build linux wheels for python libraries. This PR introduces a caller workflow that triggers the reusable workflow. Details around modular env setup, passing input args across workflows, etc. are still being worked out.

Using reusable workflow defined in https://github.com/pytorch/test-infra/pull/506

Pull Request resolved: https://github.com/pytorch/audio/pull/2548

Reviewed By: osalpekar

Differential Revision: D38947733

Pulled By: mehtanirav

fbshipit-source-id: 03ab88cef973a092f5c5d1ff8c74ec7ae7e46d01

Added example for LFCC transform (#2640)

Summary:
Added example for LFCC transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2640

Reviewed By: carolineechen

Differential Revision: D38908975

Pulled By: nateanl

fbshipit-source-id: ffdd994390db7f27556b011a8050a65eef9cd09d

Add StreamWriter (#2628)

Summary:
This commit adds FFmpeg-based encoder StreamWriter class.
StreamWriter is pretty much the opposite of StreamReader class, and
it supports;

* Encoding audio / still image / video
* Exporting to local file / streaming protocol / devices etc...
* File-like object support (in later commit)
* HW video encoding (in later commit)

See also: https://fburl.com/gslide/z85kn5a9 (Meta internal)

Pull Request resolved: https://github.com/pytorch/audio/pull/2628

Reviewed By: nateanl

Differential Revision: D38816650

Pulled By: mthrok

fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8

[Nova] Build Linux Conda Binaries using reusable workflow (#2626)

Summary:
Calling the reusable workflow introduced in https://github.com/pytorch/test-infra/pull/546 to build conda binaries on linux.

Pull Request resolved: https://github.com/pytorch/audio/pull/2626

Reviewed By: mehtanirav

Differential Revision: D39028057

Pulled By: osalpekar

fbshipit-source-id: d74ea3771967d0ee2b0ad28a8f811a95145b2183

Replace bg_iterator in examples (#2645)

Summary:
`bg_iterator` was deprecated in 0.11 because it was known to have issues (deadlock) without speed up. Remove instances of `bg_iterator` used in torchaudio examples.

Resolves https://github.com/pytorch/audio/issues/2642

Pull Request resolved: https://github.com/pytorch/audio/pull/2645

Reviewed By: nateanl

Differential Revision: D38954292

Pulled By: carolineechen

fbshipit-source-id: 2333ab5228c2b8511ff532057543aaf9d02b2789

[Nova] Use pkg-helpers to modularize GHA Linux Conda Builds (#2650)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2650

Reviewed By: mehtanirav

Differential Revision: D39040559

Pulled By: osalpekar

fbshipit-source-id: df39e23d7c246728793aab969b8dc1070af88d75

add CUDA 11.7 builds (#2623)

Summary:
CC atalman

Pull Request resolved: https://github.com/pytorch/audio/pull/2623

Reviewed By: hwangjeff, nateanl

Differential Revision: D39036432

Pulled By: atalman

fbshipit-source-id: cd74a1bf8f74e31bd2c32c80d32c617f4b1766e8

Add file-like object support to StreamWriter (#2648)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2648

Reviewed By: nateanl

Differential Revision: D38976874

Pulled By: mthrok

fbshipit-source-id: 0541dea2a633d97000b4b8609ff6b83f6b82c864

Add CUDA HW encoding support to StreamWriter (#2505)

Summary:
This commit adds CUDA hardware encoding to StreamWriter.
For certain video formats, it can encode video directly from
CUDA Tensor, without needing to move the data to host CPU.

Pull Request resolved: https://github.com/pytorch/audio/pull/2505

Reviewed By: hwangjeff

Differential Revision: D37446830

Pulled By: mthrok

fbshipit-source-id: eee6424f01a99a3b611dcad45ed58f86cba4672a

Remove obsolete examples (#2655)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2655

Removed obsolete example and the corresponding test

Reviewed By: mthrok

Differential Revision: D39260253

fbshipit-source-id: 0bde71ffd75dd0c94a5cc4a9940f4648a5d61bd7

Add metadata function for LibriSpeech (#2653)

Summary:
Adding support for metadata mode, requested in https://github.com/pytorch/audio/issues/2539, by adding a public `get_metadata()` function in the dataset. This function can be used directly by users to fetch metadata for individual dataset indices, or users can subclass the dataset and override `__getitem__` with `get_metadata` to create a dataset class that directly handles metadata mode.
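
A sketch of the two usage patterns described above, using a toy dataset rather than the real LibriSpeech class. `ToyDataset`, `ToyMetadataDataset`, `_load_audio`, and the tuple layout are all hypothetical; only the `get_metadata` name and the override of `__getitem__` come from the description.

```python
from typing import List, Tuple


class ToyDataset:
    """Hypothetical dataset; `_load_audio` stands in for expensive decoding."""

    def __init__(self, entries: List[Tuple[str, int, str]]) -> None:
        self._entries = entries  # (relative path, sample rate, transcript)

    def __len__(self) -> int:
        return len(self._entries)

    def get_metadata(self, n: int) -> Tuple[str, int, str]:
        """Return metadata for item `n` without touching the audio."""
        return self._entries[n]

    def _load_audio(self, path: str) -> List[float]:
        return [0.0] * 4  # placeholder samples instead of real decoding

    def __getitem__(self, n: int):
        path, sample_rate, transcript = self.get_metadata(n)
        return self._load_audio(path), sample_rate, transcript


class ToyMetadataDataset(ToyDataset):
    """Metadata mode: indexing returns metadata only."""

    def __getitem__(self, n: int) -> Tuple[str, int, str]:
        return self.get_metadata(n)
```

Calling `get_metadata(n)` directly covers one-off lookups, while the subclass is convenient when the whole pipeline should skip audio loading.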

Pull Request resolved: https://github.com/pytorch/audio/pull/2653

Reviewed By: nateanl, mthrok

Differential Revision: D39105114

Pulled By: carolineechen

fbshipit-source-id: 6f26f1402a053dffcfcc5d859f87271ed5923348

Fix random Gaussian generation (#2639)

Summary:
This PR is meant to address the bug raised in issue https://github.com/pytorch/audio/issues/2634.

In particular, previously the Box Muller transform was used to generate Gaussian variates for dithering based on `torch.rand` uniform variates, but it was incorrectly implemented (e.g. the same uniform variate was used as input to the transform, rather than two different uniform variates), which led to a different (non-Gaussian) distribution. This PR instead uses `torch.randn` to generate the Gaussian variates.
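
For reference, a minimal sketch of a correct Box-Muller transform in plain Python (not the torchaudio code, which this PR replaces with `torch.randn`). The function name `box_muller` is hypothetical. The key point is that `u1` and `u2` are two independent uniform draws; feeding the same variate into both the radius and the angle is the kind of mistake described above.

```python
import math
import random


def box_muller(rng: random.Random) -> float:
    """Draw one standard-normal variate from two independent uniforms."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is finite
    u2 = rng.random()        # a second, independent draw (not u1 again)
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [box_muller(rng) for _ in range(10000)]
    mean = sum(samples) / len(samples)
    print(f"sample mean: {mean:.3f}")
```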

Pull Request resolved: https://github.com/pytorch/audio/pull/2639

Reviewed By: mthrok

Differential Revision: D39101144

Pulled By: carolineechen

fbshipit-source-id: 691e49679f6598ef0a1675f6f4ee721ef32215fd

Tweak documentation (#2656)

Summary:
1. Override class `__module__` attribute in `conf.py` so that no manual override is necessary
2. Fix SourceSeparationBundle member attribute
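
A rough sketch of the idea in item 1, assuming the override amounts to rewriting `__module__` on classes whose implementation lives in a private submodule. The helper `set_public_module` and the names `pkg._impl.stream` and `pkg.io` are hypothetical; the actual `conf.py` logic is not shown here.

```python
from typing import Iterable


def set_public_module(classes: Iterable[type], public_name: str) -> None:
    """Make each class report `public_name` as the module it belongs to."""
    for cls in classes:
        cls.__module__ = public_name


class StreamThing:
    """Hypothetical class, pretending to be defined in a private module."""


StreamThing.__module__ = "pkg._impl.stream"  # simulate the private location
set_public_module([StreamThing], "pkg.io")   # documentation now sees pkg.io
```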

Pull Request resolved: https://github.com/pytorch/audio/pull/2656

Reviewed By: carolineechen

Differential Revision: D39293053

Pulled By: mthrok

fbshipit-source-id: 2b8d6be1aee517d0e692043c26ac2438a787adc6

Fix LibriSpeech Conformer RNN-T eval script (#2666)

Summary:
`ConformerRNNTModule`'s initializer now accepts a SentencePiece model rather than a path to a model as input. This PR corrects `eval.py` accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2666

Reviewed By: carolineechen

Differential Revision: D39386968

Pulled By: hwangjeff

fbshipit-source-id: 95a94dd898263d648650f7376c29810b1456d6c1

[Nova] Remove the old caller GitHub Actions Linux wheels/conda Build Workflows (#2660)

Summary:
We moved over to a new design for release workflows that encompass all the build logic in the test-infra repo (apart from custom pre-build and post-build scripts). Thus, we no longer need these caller workflows in the audio repo. This PR removes them entirely.

Pull Request resolved: https://github.com/pytorch/audio/pull/2660

Reviewed By: seemethere

Differential Revision: D39392456

Pulled By: osalpekar

fbshipit-source-id: a8bdeb4738b91666abcdc883f6f8f1bf359f1d42

Move hybrid demucs model out of prototype (#2668)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2668

Reviewed By: nateanl, mthrok

Differential Revision: D39433671

Pulled By: carolineechen

fbshipit-source-id: 3545a5b4019832861c34fd8c05e5f8600fd80d5c

Do not use nested namespaces in torchaudio/sox (#2663)

Summary:
Nested namespaces are a C++17 feature, and PyTorch and its extensions must still be C++14 compatible, as also specified in the top-level CMakeLists.txt:
https://github.com/pytorch/audio/blob/8a0d7b36f7821fe55175f0d4e3ca6299b3817a6c/CMakeLists.txt#L30

Otherwise, it pollutes build logs with noisy
```
/Users/runner/work/test-infra/test-infra/pytorch/audio/torchaudio/csrc/sox/pybind/io.cpp:12:21: warning: nested namespace definition is a C++17 extension; define each namespace separately [-Wc++17-extensions]
namespace torchaudio::sox_io {
                    ^~~~~~~~
                     { namespace sox_io
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2663

Reviewed By: atalman, nateanl

Differential Revision: D39362842

Pulled By: malfet

fbshipit-source-id: f9659d4420f1cc0194990d531455cf59b66c26b9

[Bootcamp] Fix Typo (#2661)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2661

Fixed typo in `audio_data_augmentation_tutorial.py`

Reviewed By: malfet, mthrok

Differential Revision: D39352353

fbshipit-source-id: aea35dab03fb7422421948bd26716e10a8d65f92

Move SourceSeparationBundle and pre-trained ConvTasNet pipeline into Beta (#2669)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2669

Reviewed By: carolineechen, mthrok

Differential Revision: D39433560

Pulled By: nateanl

fbshipit-source-id: 5b652b31c00badb37b27a32ac25b422a5bcc74cb

CUDA 11.3 remove. New Stable version is 11.6 (#2670)

Summary:
Removing CUDA 11.3.

Core PR: https://github.com/pytorch/pytorch/pull/84866
cc malfet ptrblck

Pull Request resolved: https://github.com/pytorch/audio/pull/2670

Reviewed By: malfet, osalpekar

Differential Revision: D39449263

Pulled By: atalman

fbshipit-source-id: f86bb119685ead3ffcabd92c4bb8076aecde4095

Move Hybrid Demucs pipeline to beta (#2673)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2673

Reviewed By: mthrok

Differential Revision: D39507612

Pulled By: carolineechen

fbshipit-source-id: 3a9ee53f72cabd6e3085c76867017be4a6ed7f53

Add Decoder LM Docs (#2658)

Summary:
modifications to ctc decoder LM docstrings on top of https://github.com/pytorch/audio/issues/2657

Pull Request resolved: https://github.com/pytorch/audio/pull/2658

Reviewed By: mthrok

Differential Revision: D39468921

Pulled By: carolineechen

fbshipit-source-id: c5497cc2fa22fb98a304d037e27c91bf68a9ad6a

Tweak badge link URL generation (#2677)

Summary:
Currently, the way feature badges are generated assumes that both the documentation pages and the supported features page are on the same level from the root.

This does not work when we introduce `:autosummary:` which generates individual documentation pages one level below.

This commit changes it so that links to the supported features page are properly relative from the documentation level.
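
A minimal sketch of computing such a relative link, using `posixpath` from the standard library. The helper `relative_link` and the page names are hypothetical; the actual badge generation code may differ.

```python
import posixpath


def relative_link(from_page: str, to_page: str) -> str:
    """Return the path of `to_page` relative to the directory of `from_page`."""
    start = posixpath.dirname(from_page) or "."
    return posixpath.relpath(to_page, start=start)


if __name__ == "__main__":
    # A page at the root and a generated page one level below.
    print(relative_link("transforms.html", "supported_features.html"))
    print(relative_link("generated/torchaudio.io.StreamReader.html", "supported_features.html"))
```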

There is no appearance change from this commit.

Pull Request resolved: https://github.com/pytorch/audio/pull/2677

Reviewed By: carolineechen

Differential Revision: D39507451

Pulled By: mthrok

fbshipit-source-id: f18da4201f0eb747586be21c8bd9a958217aebc2

Move conv_tasnet_base doc out of prototype (#2675)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2675

Reviewed By: carolineechen

Differential Revision: D39515996

Pulled By: nateanl

fbshipit-source-id: 5824375f6a758af21b6ad6c635dd06081663644f

Consolidate bibliography / reference (#2676)

Summary:
Preparation for the adoption of `autosummary`.

Replace `:footcite:` with `:cite:` and introduce dedicated reference page, as `:footcite:` does not work well with `autosummary`.

Example:

https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/datasets.html#cmuarctic

https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/references.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2676

Reviewed By: carolineechen

Differential Revision: D39509431

Pulled By: mthrok

fbshipit-source-id: e6003dd01ec3eff3d598054690f61de8ee31ac9a

Update doc theme to the latest (#2679)

Summary:
To follow the change related to Linux Foundation movement.

(we are still pinning the theme version so that our customization does not break randomly.)

Pull Request resolved: https://github.com/pytorch/audio/pull/2679

Reviewed By: carolineechen

Differential Revision: D39531566

Pulled By: mthrok

fbshipit-source-id: 64353577d05f9dbda00dd9d10b9ebcedddfdce5b

Update Sphinx to 5.1.1 (#2678)

Summary:
Previous versions of Sphinx reported the wrong path for the return class. This issue is fixed in the latest Sphinx.

This allows us to remove the patch we apply in `conf.py`, which is essential for the adoption of `:autosummary:`, as it won't render correctly with the patch.

https://output.circle-artifacts.com/output/job/19d93ede-08de-4b9e-9d66-67ca5dab964e/artifacts/0/docs/pipelines.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2678

Reviewed By: carolineechen

Differential Revision: D39509447

Pulled By: mthrok

fbshipit-source-id: e104bc6a87f32cba6c549a9fe8f2d1e489ee27e4

Switch to use conda install action for m1 builds (#2674)

Summary:
Use the setup-miniconda action for M1 builds.
We want to try to address space issues on m1. The following action:
```
pytorch/test-infra/.github/actions/setup-miniconda@main
```

Sets up miniconda in temp folder which should be cleaned between runs

Pull Request resolved: https://github.com/pytorch/audio/pull/2674

Reviewed By: jeanschmidt

Differential Revision: D39540481

Pulled By: atalman

fbshipit-source-id: 0596598ab6b2f99c775aa0c9e14a3a388533068d

Adopt `:autosummary:` in `torchaudio.io` module doc (#2681)

Summary:
This commit adopts the `:autosummary:` directive in the `torchaudio.io` module doc.
It adds table of contents on `torchaudio.io` level.

https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/io.html
<img width="1094" alt="Screen Shot 2022-09-16 at 7 33 32 AM" src="https://user-images.githubusercontent.com/855818/190520248-27e469f8-7689-4dc2-b591-7b3f08bb4dff.png">

https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader
<img width="1108" alt="Screen Shot 2022-09-16 at 7 33 59 AM" src="https://user-images.githubusercontent.com/855818/190520292-d090fed0-2f18-4961-b9f3-9e4808fd437e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2681

Reviewed By: carolineechen

Differential Revision: D39560459

Pulled By: mthrok

fbshipit-source-id: 3de5f22b8d8d0834dfd8bac8619fbfaa44c5f4dd

Adopt `:autosummary:` in `torchaudio.models.decoder` module doc (#2684)

Summary:
* Adopts `:autosummary:` in decoder module doc
* Hide the constructor signature of `CTCDecoder` as `ctc_decoder` function is the one client code is supposed to be using.
* Introduce `children` property to `CTCDecoderLMState` otherwise it does not show up in the doc.

https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/models.decoder.html

<img width="748" alt="Screen Shot 2022-09-16 at 5 23 22 PM" src="https://user-images.githubusercontent.com/855818/190592409-0c2ec8a4-d2cf-4d76-a965-8a570faaeb1a.png">

https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder

<img width="723" alt="Screen Shot 2022-09-16 at 5 23 53 PM" src="https://user-images.githubusercontent.com/855818/190592501-3fad1e07-ae3e-44f5-93be-f33181025390.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2684

Reviewed By: carolineechen

Differential Revision: D39574272

Pulled By: mthrok

fbshipit-source-id: d977660bd46f5cf98c535adbf2735be896b28773

Adopt `:autosummary:` in `torchaudio.transforms` module doc (#2683)

Summary:
* Introduce the mini-index at `torchaudio.transforms` page.
* Add "Augmentations" subsection.
* Also updated the overall introduction.

https://output.circle-artifacts.com/output/job/1b65246a-403c-4d2c-b97d-d1b582d8b4e5/artifacts/0/docs/transforms.html

<img width="721" alt="Screen Shot 2022-09-16 at 5 20 08 PM" src="https://user-images.githubusercontent.com/855818/190591795-97c169db-a95b-480a-8d3c-d80072efa045.png">

<img width="755" alt="Screen Shot 2022-09-16 at 5 20 28 PM" src="https://user-images.githubusercontent.com/855818/190591828-03026918-febd-4194-91aa-7d8f704e17cc.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2683

Reviewed By: carolineechen

Differential Revision: D39574255

Pulled By: mthrok

fbshipit-source-id: a4beed7cacbb5184bad96efa903a3a1123dab627

[Nova] Remove Extraneous Build Scripts (#2695)

Summary:
There is a single pre/post script needed for building torchaudio. This PR:
1. Removes the old conda-specific build script
2. Renames the wheel script to be a general name

Pull Request resolved: https://github.com/pytorch/audio/pull/2695

Reviewed By: kit1980

Differential Revision: D39631971

Pulled By: osalpekar

fbshipit-source-id: 52b49a6e792536b6264228c01ac356d247b18ea8

Update nightly wheels to ROCm5.2 (#2672)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2672

Reviewed By: atalman

Differential Revision: D39468320

Pulled By: mthrok

fbshipit-source-id: 0e7bd4fd922ba0db51700e140b95328a5b687a6f

Adopt `:autosummary:` in `torchaudio.functional` module doc (#2693)

Summary:
https://output.circle-artifacts.com/output/job/b23174d2-5cee-4ee9-be39-3228b9ae4abe/artifacts/0/docs/functional.html

<img width="1133" alt="Screen Shot 2022-09-20 at 11 19 23 AM" src="https://user-images.githubusercontent.com/855818/191152824-96c5b16c-bd38-4656-b1ae-0b58699dbd62.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2693

Reviewed By: carolineechen

Differential Revision: D39650930

Pulled By: mthrok

fbshipit-source-id: 28b5b03d21b922e37e02bfddda2bf1dea696cc18

Add Speech Commands metadata function (#2687)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2687

Reviewed By: mthrok

Differential Revision: D39647596

Pulled By: carolineechen

fbshipit-source-id: 8ff874fc1e828130f6754e83ce1f702ca13dfac0

Adopt `:autosummary:` in `torchaudio.models` module doc (#2690)

Summary:
* Introduce the mini-index at `torchaudio.models` page.

https://output.circle-artifacts.com/output/job/25e59810-3866-4ece-b1b7-8a10c7a2286d/artifacts/0/docs/models.html

<img width="1042" alt="Screen Shot 2022-09-20 at 1 20 50 PM" src="https://user-images.githubusercontent.com/855818/191166816-83314ad1-8b67-475b-aa10-d4cc59126295.png">

<img width="1048" alt="Screen Shot 2022-09-20 at 1 20 58 PM" src="https://user-images.githubusercontent.com/855818/191166829-1ceb65e0-9506-4328-9a2f-8b75b4e54404.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2690

Reviewed By: carolineechen

Differential Revision: D39654948

Pulled By: mthrok

fbshipit-source-id: 703d1526617596f647c85a7148f41ca55fffdbc8

Support in-memory decoding via Tensor wrapper in StreamReader (#2694)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2694

This commit adds Tensor type as input to `StreamReader`.
The Tensor is interpreted as byte string buffer.

Reviewed By: hwangjeff

Differential Revision: D39467630

fbshipit-source-id: 6369eed5e16fbb657568bf6bb80d703483d72f8e

Add StreamReader Tensor Binding to src (#2699)

Summary:
In https://github.com/pytorch/audio/issues/2694, CMakeLists.txt was not properly updated, so the tests are failing. This commit fixes it.

Pull Request resolved: https://github.com/pytorch/audio/pull/2699

Reviewed By: carolineechen

Differential Revision: D39687409

Pulled By: mthrok

fbshipit-source-id: 2e14f3c478f1f8a112a03839f2dbcca51215fed7

Adopt `:autosummary:` in `torchaudio.pipelines` module doc (#2689)

Summary:
* Introduce the mini-index at `torchaudio.pipelines` page.
* Add introductions
* Update pipeline tutorials

https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/pipelines.html

<img width="1163" alt="Screen Shot 2022-09-20 at 1 23 29 PM" src="https://user-images.githubusercontent.com/855818/191167049-98324e93-2e16-41db-8538-3b5b54eb8224.png">

<img width="1115" alt="Screen Shot 2022-09-20 at 1 23 49 PM" src="https://user-images.githubusercontent.com/855818/191167071-4770f594-2540-43a4-a01c-e983bf59220f.png">

https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle

<img width="1108" alt="Screen Shot 2022-09-20 at 1 24 18 PM" src="https://user-images.githubusercontent.com/855818/191167123-51b33a5f-c30c-46bc-b002-b05d2d0d27b7.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2689

Reviewed By: carolineechen

Differential Revision: D39691253

Pulled By: mthrok

fbshipit-source-id: ddf5fdadb0b64cf2867b6271ba53e8e8c0fa7e49

Add metadata mode for various datasets (#2697)

Summary:
Add metadata mode for the following SUPERB benchmark datasets
- QUESST14
- Fluent Speech Commands
- VoxCeleb1

follow ups:
- Add metadata mode for LibriMix -- waiting for unit tests to merge
- Add IEMOCAP + SNIPS datasets

Pull Request resolved: https://github.com/pytorch/audio/pull/2697

Reviewed By: mthrok

Differential Revision: D39666809

Pulled By: carolineechen

fbshipit-source-id: 3a8f07627acceed70f960f47e694efad75b108c2

Update and fix tutorials (#2701)

Summary:
* Fix Sphinx warning
* Update asset management

Pull Request resolved: https://github.com/pytorch/audio/pull/2701

Reviewed By: carolineechen

Differential Revision: D39714126

Pulled By: mthrok

fbshipit-source-id: a5b04cfbf8bedce67c621b6bfe1dcd975b343313

Adopt `:autosummary:` in `torchaudio.datasets` module doc (#2692)

Summary:
* Introduce the mini-index at `torchaudio.datasets` page.
* Standardize the format of return type docstring.

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/datasets.html

<img width="936" alt="Screen Shot 2022-09-21 at 6 56 52 PM" src="https://user-images.githubusercontent.com/855818/191475141-a97f2bea-705f-49bc-8c34-6ec869e76793.png">

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/generated/torchaudio.datasets.CMUDict.html#torchaudio.datasets.CMUDict

<img width="1069" alt="Screen Shot 2022-09-21 at 6 57 32 PM" src="https://user-images.githubusercontent.com/855818/191475293-e3302528-27ea-4212-9c12-fd6d900fdf3e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2692

Reviewed By: carolineechen

Differential Revision: D39687463

Pulled By: mthrok

fbshipit-source-id: 4175fc15388817d2fe76206188618dd1576281df

Introduce IO section to getting started tutorials (#2703)

Summary:
Since new tutorials for StreamWriter are being added, there are more tutorials for media IO than for the rest.
So this commit introduces a sub-index for IO tutorials.

Pull Request resolved: https://github.com/pytorch/audio/pull/2703

Reviewed By: carolineechen

Differential Revision: D39769049

Pulled By: mthrok

fbshipit-source-id: 19a3981bc624fdce1d5d703c67e28a751a15e812

[Nova] Moving Linux Wheels over to Nova (#2702)

Summary:
This does 2 things:

1. Comments out Linux Wheels-related jobs in CircleCI so that they are not run on nightlies/releases.
2. Adds a GHA workflow that calls the build workflow in pytorch/test-infra.

Testing:
Verified that the builds are triggered by this workflow, and all builds are green: https://github.com/pytorch/audio/actions/runs/3109635749/jobs/5040029155

Pull Request resolved: https://github.com/pytorch/audio/pull/2702

Reviewed By: seemethere

Differential Revision: D39756852

Pulled By: osalpekar

fbshipit-source-id: 7e222d80ca0720e3be43b929f1e55f5c0166b947

[perf][5/5] Replace IValue::toString()->string() with IValue::toStringRef() (#2700)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2700

ATT for pytorch/audio

Reviewed By: mthrok

Differential Revision: D39707243

fbshipit-source-id: 1dc2a5a9fe913a9071e6df679e39d632b75212fb

Add CUDA version check (#2707)

Summary:
Adds check to ensure that TorchAudio and PyTorch versions use the same CUDA version.

Pull Request resolved: https://github.com/pytorch/audio/pull/2707

Reviewed By: mthrok

Differential Revision: D39791154

Pulled By: hwangjeff

fbshipit-source-id: de00889c7bac897c6b8762502f9d37797016b71d

Fix CUDA check (#2710)

Summary:
`torch.version.cuda` can return a string of form X.X or X.X.X. This PR modifies the CUDA version check to account for this.
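
A minimal sketch of a comparison that tolerates both forms, assuming only the major and minor components need to match. The function names are hypothetical and are not the ones used in TorchAudio:

```python
def cuda_major_minor(version: str) -> tuple:
    """Return (major, minor) from a string such as '11.6' or '11.6.2'."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"Unexpected CUDA version string: {version!r}")
    return int(parts[0]), int(parts[1])


def same_cuda_version(torch_cuda: str, torchaudio_cuda: str) -> bool:
    # Compare major and minor only; a trailing patch component is ignored.
    return cuda_major_minor(torch_cuda) == cuda_major_minor(torchaudio_cuda)
```

With this helper, `same_cuda_version("11.6", "11.6.2")` is true, while `same_cuda_version("11.7", "11.6")` is false.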

Pull Request resolved: https://github.com/pytorch/audio/pull/2710

Reviewed By: carolineechen, nateanl

Differential Revision: D39796810

Pulled By: hwangjeff

fbshipit-source-id: b483bd8200195844d65d0caddebaf1b10f939b64

Remove linux wheel from circleci (#2714)

Summary:
Remove linux wheel from circleci

Pull Request resolved: https://github.com/pytorch/audio/pull/2714

Reviewed By: weiwangmeta

Differential Revision: D39816121

Pulled By: atalman

fbshipit-source-id: a3c99b530896888d7b4271d8b3f27f3c986b3480

Fix windows tests related to old conda on circleci (#2704)

Summary:
The Conda version on CircleCI prints the following message:
```
==> WARNING: A newer version of conda exists. <==
  current version: 4.6.14
  latest version: 4.14.0
```
and, as a result, the install step fails with this error:

```
+ /c/tools/miniconda3/Scripts/conda.exe install -v -y -c pytorch-nightly -c nvidia pytorch numpy ffmpeg pytorch-cuda=11.6
Collecting package metadata: ...working... done
Solving environment: ...working...

Too long with no output (exceeded 30m0s): context deadline exceeded
```

This should update the conda version running on the system and allow us to install pytorch and run some tests.

Pull Request resolved: https://github.com/pytorch/audio/pull/2704

Reviewed By: weiwangmeta

Differential Revision: D39820037

Pulled By: atalman

fbshipit-source-id: 4a82a7a6cbe3dc1a5807ac669e2fa79f454037fa

[Nova] Add build-type argument for when upload should be triggered (#2706)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2706

Reviewed By: kit1980

Differential Revision: D39786253

Pulled By: osalpekar

fbshipit-source-id: 2a0c427f57e5c70ff1cf419b7e0c2316e5f0e16c

Back out "[audio][PR] [Nova] Moving Linux Wheels over to Nova" (#2718)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2718

Original commit changeset: 7e222d80ca07

Original Phabricator Diff: D39756852 (https://github.com/pytorch/audio/commit/7ba7cf4d24a2967b8fa4aaff437116524281f8fd)

Reviewed By: weiwangmeta

Differential Revision: D39839899

fbshipit-source-id: f5605eb9882f7c7f0008e88338ab711131b29404

Fix mismatched cuda version in smoke tests on windows wheels (#2721)

Summary:
Example job that was failing previously:
https://app.circleci.com/pipelines/github/pytorch/audio/12796/workflows/ae96794a-6df4-4a2a-84df-ada7a7250045/jobs/927709

The failure:
```
"Detected that PyTorch and TorchAudio were compiled with different CUDA versions. "
RuntimeError: Detected that PyTorch and TorchAudio were compiled with different CUDA versions. PyTorch has CUDA version 11.7 whereas TorchAudio has CUDA version 11.6. Please install the TorchAudio version that matches your PyTorch version.
```

Has install command:
```
pip install $(ls ~/workspace/torchaudio*.whl) -f "https://download.pytorch.org/whl/${UPLOAD_CHANNEL}/torch_${UPLOAD_CHANNEL}.html"

pip install /c/Users/circleci/workspace/torchaudio-0.13.0.dev20220927+cu116-cp37-cp37m-win_amd64.whl -f https://download.pytorch.org/whl/nightly/torch_nightly.html
```

The Linux job (which succeeds) uses a different "-f" (find links) URL that includes the specific CUDA version:
https://app.circleci.com/pipelines/github/pytorch/audio/12809/workflows/aadca2ab-5a00-4a0a-ab6a-4a1b7a503713/jobs/927861

Command:
```
pip install $(ls ~/workspace/torchaudio*.whl) -f "https://download.pytorch.org/whl/${UPLOAD_CHANNEL}/${CU_VERSION}/torch_${UPLOAD_CHANNEL}.html"

 pip install /root/workspace/torchaudio-0.13.0.dev20220927+cu116-cp37-cp37m-linux_x86_64.whl -f https://download.pytorch.org/whl/nightly/cu116/torch_nightly.html

```

This PR makes Windows installation match the linux one.
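
The URL pattern shared by both platforms after this change can be sketched as below. The helper name is hypothetical; the pattern itself is taken from the Linux command above:

```python
def find_links_url(upload_channel: str, cu_version: str) -> str:
    """Build the CUDA-specific "-f" (find links) URL used by pip install."""
    return (
        "https://download.pytorch.org/whl/"
        f"{upload_channel}/{cu_version}/torch_{upload_channel}.html"
    )
```

For example, `find_links_url("nightly", "cu116")` yields the URL shown in the expanded Linux command.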

Testing:
* verified command manually on Circle CI:
```
>>> import torch
>>> import torchaudio
C:\tools\miniconda3\lib\site-packages\torchaudio\compliance\kaldi.py:22: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_numpy.cpp:77.)
  EPSILON = torch.tensor(torch.finfo(torch.float).eps)
C:\tools\miniconda3\lib\site-packages\torchaudio\backend\utils.py:62: UserWarning: No audio backend is available.
  warnings.warn("No audio backend is available.")
```

Co-authored: weiwangmeta

Pull Request resolved: https://github.com/pytorch/audio/pull/2721

Reviewed By: hwangjeff

Differential Revision: D39870805

Pulled By: izaitsevfb

fbshipit-source-id: 2957cba4f53d00783a5c07099f24050ce15e7d1c

Removing cuda102 (#2715)

Summary:
Removing cuda102

Pull Request resolved: https://github.com/pytorch/audio/pull/2715

Reviewed By: hwangjeff

Differential Revision: D39823444

Pulled By: atalman

fbshipit-source-id: c11d798ab86cf9a6d5ed3804958b4a0c2f8a87ff

Revert "Removing cuda102 (#2715)" (#2723)

Summary:
Revert this for now until docker is updated

Pull Request resolved: https://github.com/pytorch/audio/pull/2723

Reviewed By: nateanl

Differential Revision: D39900382

Pulled By: atalman

fbshipit-source-id: f8701e359bc11e8f9f3a29144f7e7da336a470da

Cuda 102 deprecation (#2724)

Summary:
Cuda 10.2 deprecation, migration of unit tests from cuda 10.2 to cuda 11.6

Pull Request resolved: https://github.com/pytorch/audio/pull/2724

Reviewed By: weiwangmeta

Differential Revision: D39912484

Pulled By: atalman

fbshipit-source-id: e760b630375eae94384cda68d24f83ef46ada6d9

Delete packaging/README.md (#2730)

Summary:
The file looks hopelessly outdated.

Pull Request resolved: https://github.com/pytorch/audio/pull/2730

Reviewed By: mthrok

Differential Revision: D39993805

Pulled By: kit1980

fbshipit-source-id: f5ad97c83873061175455cc7b129ec71a9ec3d7d

Add citation for MuST-C dataset in Emformer RNNT pipeline. (#2728)

Summary:
The MuST-C reference is added in https://github.com/pytorch/audio/pull/2689. This PR adds the citation to the RNNT pipeline documentation.

Pull Request resolved: https://github.com/pytorch/audio/pull/2728

Reviewed By: carolineechen

Differential Revision: D39990882

Pulled By: nateanl

fbshipit-source-id: 011057952dd8aa30a4cb7c7af0ac75123e329d7e

Adopt :autosummary: to multiple modules (#2664)

Summary:
Adopt `:autosummary:` to various modules

    * torchaudio.compliance.kaldi
    * torchaudio.sox_effects
    * torchaudio.utils

Pull Request resolved: https://github.com/pytorch/audio/pull/2664

Reviewed By: nateanl

Differential Revision: D39841873

Pulled By: mthrok

fbshipit-source-id: ff4fa6976324fca5f35b737b715f976e2a722bac

Add StreamWriter media device/streaming tutorial (#2708)

Summary:
https://output.circle-artifacts.com/output/job/213c71c8-c9b5-4516-af92-a2f8dab2c9fd/artifacts/0/docs/tutorials/streamwriter_advanced.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2708

Reviewed By: carolineechen

Differential Revision: D40013310

Pulled By: mthrok

fbshipit-source-id: 7226b021ce2fe951b3bf0bd41e93a6bbcf696124

Tweak tutorials (#2733)

Summary:
* Port downstream change https://github.com/pytorch/tutorials/pull/2060
* Fix inter-tutorial links and references

Pull Request resolved: https://github.com/pytorch/audio/pull/2733

Reviewed By: hwangjeff

Differential Revision: D40086902

Pulled By: hwangjeff

fbshipit-source-id: 00b04c6a1b68fb9fadd52b610b26ecaab15d52d8

Increase CircleCi no_output_timeout for `install binaries` steps (#2734)

Summary:
The goal is to reduce the number of job failures due to timeouts, see https://app.circleci.com/pipelines/github/pytorch/audio/12882/workflows/f99da1a5-32e6-4bac-8ceb-fbf36d693e2d/jobs/936363?invite=true#step-105-105 for example.

Pull Request resolved: https://github.com/pytorch/audio/pull/2734

Reviewed By: weiwangmeta, atalman

Differential Revision: D40077578

fbshipit-source-id: 573f43a4d088a7086fa6925ac5ba1fdd1e8f39ec

Torchaudio load library path fix for windows python 3.8 (#2735)

Summary:
Torchaudio load library path fix for windows and python = 3.8

Fixes: https://github.com/pytorch/audio/issues/2726

Fixes following issue:

```
>>> import torchaudio
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 128, in <module>
    _init_extension()
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 98, in _init_extension
    _load_lib("libtorchaudio")
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 52, in _load_lib
    torch.ops.load_library(path)
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torch\_ops.py", line 573, in load_library
    ctypes.CDLL(path)
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\ctypes\__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\atalman\miniconda3\envs\mywin38\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
>>>
```

Caused by dlls not being found in the conda environment
```
C:\Users\atalman\miniconda3\envs\mywin38\bin\
```

While this directory is set correctly in PATH, it is ignored with Python = 3.8.
Please refer to: https://stackoverflow.com/questions/59330863/cant-import-dll-module-in-python
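
Since Python 3.8, Windows no longer consults PATH when resolving the dependencies of extension modules, and `os.add_dll_directory` is the supported way to extend the search path. The following is a minimal sketch of that approach, not necessarily the exact change in this PR. The helper name is hypothetical:

```python
import os
import sys


def register_dll_directories(directories):
    """Add each existing directory to the DLL search path on Windows.

    Returns the handles from os.add_dll_directory, or an empty list on
    platforms or Python versions where that function is unavailable.
    """
    handles = []
    if sys.platform != "win32" or not hasattr(os, "add_dll_directory"):
        return handles
    for directory in directories:
        # os.add_dll_directory requires an absolute path to an existing directory.
        absolute = os.path.abspath(directory)
        if os.path.isdir(absolute):
            handles.append(os.add_dll_directory(absolute))
    return handles
```

A caller would pass the directory holding the dependent DLLs (for a conda environment, a location such as the `bin` folder shown above) before loading the extension library.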

Pull Request resolved: https://github.com/pytorch/audio/pull/2735

Reviewed By: carolineechen

Differential Revision: D40112293

Pulled By: carolineechen

fbshipit-source-id: c7fc9bb49fc3ec4a2855c6ea473f36808103ed1e

Add StreamWriter tutorial (#2698)

Summary:
Add a tutorial for basic usage of torchaudio.io.StreamWriter.

https://output.circle-artifacts.com/output/job/55d9a495-af7a-483c-84cb-de9a08cfd2f3/artifacts/0/docs/tutorials/streamwriter_basic_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2698

Reviewed By: carolineechen

Differential Revision: D40133007

Pulled By: carolineechen

fbshipit-source-id: 141f692c32343981bfb228357f21562ffe36f623

Fix sphinx gallery list in io doc (#2736)

Summary:
Specifying multiple objects in the `:minigallery:` directive shows duplicated tutorials.

This commit fixes it by listing tutorials based on module used.

https://output.circle-artifacts.com/output/job/c3da2a22-40d5-4e2d-b73a-28b39e712817/artifacts/0/docs/io.html

Before:
<img width="694" alt="Screen Shot 2022-10-07 at 7 04 35 AM" src="https://user-images.githubusercontent.com/855818/194427092-ca1202e7-0731-4c18-b48b-24923d692a4a.png">

After:

<img width="648" alt="Screen Shot 2022-10-07 at 7 03 14 AM" src="https://user-images.githubusercontent.com/855818/194426950-5b780458-2bf0-43ef-b020-fcbbfdf8d41b.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2736

Reviewed By: carolineechen

Differential Revision: D40160247

Pulled By: carolineechen

fbshipit-source-id: 547496f9b569ff7a4d70db97e90f3ea503344477

Modify `info_audio` to compute and return number of frames if not found in stream info (#2740)

Summary:
Modifies `info_audio` to compute and return number of frames if not found in stream info. This resolves the `num_frames == 0` issue for mp3 that's cited in https://github.com/pytorch/audio/issues/2524.

Pull Request resolved: https://github.com/pytorch/audio/pull/2740

Reviewed By: nateanl

Differential Revision: D40168639

Pulled By: nateanl

fbshipit-source-id: bb45baa0f9cd56844315b04e40ab9835d825fc24

Update sox info docstring to account for mp3 frame count handling (#2742)

Summary:
Updates sox info docstring to account for mp3 frame count handling fix introduced in https://github.com/pytorch/audio/issues/2740.

Pull Request resolved: https://github.com/pytorch/audio/pull/2742

Reviewed By: nateanl

Differential Revision: D40189846

Pulled By: nateanl

fbshipit-source-id: d6371418d7d4867dd0b97ee72ebf846d5c93dc30

Update HW video processing tutorial (#2739)

Summary:
* Add HW encoding to HW tutorial

https://colab.research.google.com/drive/1DDah_IaGULEO66CfQWltRqaVheBkiXdN#scrollTo=eXzKSVrHk1vS

Pull Request resolved: https://github.com/pytorch/audio/pull/2739

Reviewed By: hwangjeff

Differential Revision: D40197086

Pulled By: hwangjeff

fbshipit-source-id: 1780a5419f6705f7c24ba96bd46c3310438af7db

Add IEMOCAP dataset (#2732)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2732

Reviewed By: nateanl

Differential Revision: D40186996

Pulled By: nateanl

fbshipit-source-id: a0ad325b7153c9e580dad2c515730dadbe8840c4

Fix HuBERT docstring (#2746)

Summary:
The docstring of `wav2vec2` argument is wrong. Fix it in this PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/2746

Reviewed By: carolineechen

Differential Revision: D40225995

Pulled By: nateanl

fbshipit-source-id: 770e9c928ebebd7b6307e181601eb64625d668da

Add unit test for LibriMix dataset (#2659)

Summary:
Besides the unit test, the PR also addresses these issues:
- The original `LibriMix` dataset only supports "min" mode, which means the audio length is the minimum of all clean sources. It is the default for the source separation task. Users may also want to use "max" mode, which allows for end-to-end separation and recognition. The PR adds a ``mode`` argument to let users decide which dataset they want to use.
- If the task is ``"enh_both"``, the target should be the audio in ``mix_clean`` instead of the separate clean sources. The PR fixes it to use ``mix_clean`` as the target.
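
The difference between the two modes can be illustrated with a small sketch. It assumes that "max" mode uses the length of the longest clean source; the source only states the "min" behaviour explicitly. The function is hypothetical and is not part of the dataset class:

```python
def mixture_length(source_lengths, mode="min"):
    """Return the mixture length in samples for the given clean source lengths."""
    if mode == "min":
        # The mixture ends when the shortest source ends.
        return min(source_lengths)
    if mode == "max":
        # Assumption: the mixture spans the longest source.
        return max(source_lengths)
    raise ValueError(f"Unsupported mode: {mode!r}")
```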

Pull Request resolved: https://github.com/pytorch/audio/pull/2659

Reviewed By: carolineechen

Differential Revision: D40229227

Pulled By: nateanl

fbshipit-source-id: fc07e0d88a245e1367656d3767cf98168a799235

Add Snips Dataset (#2738)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2738

Reviewed By: carolineechen

Differential Revision: D40238099

Pulled By: nateanl

fbshipit-source-id: c5cc94c2a348a6ef34c04b8dd26114ecb874d73e

Fix windows python 3.8 loading path (#2747)

Summary:
Fix windows python 3.8 loading path

Pull Request resolved: https://github.com/pytorch/audio/pull/2747

Reviewed By: nateanl

Differential Revision: D40264326

Pulled By: nateanl

fbshipit-source-id: f4a24757de7b48c63a7481034eb11fc3ff174327

Add metadata for Librimix (#2751)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2751

Reviewed By: nateanl

Differential Revision: D40267874

Pulled By: carolineechen

fbshipit-source-id: 4e45a02c650ed65c05cde82289a400a3be877927

Increase inactivity timeout for binary build jobs (#2754)

Summary:
Increase inactivity timeout for binary build jobs

Pull Request resolved: https://github.com/pytorch/audio/pull/2754

Reviewed By: carolineechen

Differential Revision: D40275368

Pulled By: atalman

fbshipit-source-id: 5e682bb78bda640d615f874fbdf0e650b5a38ee0

Skip hubert xlarge torchscript test (#2758)

Summary:
A couple of CircleCI unit tests are failing during the HuBERT xlarge TorchScript test, which has been known to fail on Windows in the past (#65776). This PR disables this test on CircleCI.

cc atalman

Pull Request resolved: https://github.com/pytorch/audio/pull/2758

Reviewed By: mthrok

Differential Revision: D40290535

Pulled By: carolineechen

fbshipit-source-id: 5c5fb43434a517b6c439a8cb8e853015d1550a57

Improve wav2vec2/hubert model for pre-training (#2716)

Summary:
This PR improves the Wav2Vec2/HuBERT model regarding model pre-training.

- The model initialization of positional embedding and transformer module is essential to model pre-training. The accuracy of unmasked frames should be higher than that of masked frames, as it is an easier task, but without the initialization, the accuracy of masked frames is higher than that of unmasked frames.
  Compared the performance after two epochs with 16 GPUs.
  - With model initialization, the accuracies of masked/unmasked frames are 0.08/0.11.
  - Without model initialization, the accuracies of masked/unmasked frames are 0.06/0.04.
- After adding the model initialization, the gradient overflows easily (i.e. `nan` gradients). In the paper [Self-Supervised Learning for speech recognition with Intermediate layer supervision](https://arxiv.org/abs/2112.08778), the authors propose a simple but effective method to mitigate the overflow issue: scale down the product of query and key and subtract the maximum value from it (subtracting a constant value won't change the output of softmax). This guarantees that the value won't overflow.
- In the original fairseq, the mask indices are generated by `numpy.random.choice`. This PR replaces `torch.multinomial` with `torch.randperm`. (cc carolineechen).
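
The softmax invariance mentioned above can be checked with a short sketch. It uses plain Python floats rather than fp16 tensors and covers only the max-subtraction step, not the scaling of the query-key product:

```python
import math


def stable_softmax(scores):
    """Softmax that subtracts the maximum score before exponentiating."""
    highest = max(scores)
    # Every exponent is <= 0, so math.exp cannot overflow.
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


large = stable_softmax([1000.0, 1001.0, 1002.0])
small = stable_softmax([0.0, 1.0, 2.0])
```

Here `large` and `small` are the same distribution because the inputs differ only by a constant, whereas calling `math.exp(1000.0)` directly would raise `OverflowError`.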

Other improvements within training scripts will be included in a separate PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/2716

Reviewed By: xiaohui-zhang

Differential Revision: D39832189

Pulled By: nateanl

fbshipit-source-id: f4d2a473a79ad63add2dd16624bd155d5ce4de27

Improve hubert recipe for pre-training and fine-tuning (#2744)

Summary:
Follow-up to https://github.com/pytorch/audio/issues/2716.
- For preprocessing
  - The HuBERT feature takes lots of memory, which may not fit on some machines. Enable using a subset of the features for training a k-means model.

- For pre-training
  - Normalize the loss based on the total number of masked frames across all GPUs.
  - Use mixed precision training. fp16 is not well supported in pytorch_lightning.
  - Log accuracies of masked/unmasked frames during training.
  - Clip the gradients with norm `10.0`.

- For ASR fine-tuning
  - Normalize the loss based on the total number of batches across all GPUs, same as in the conformer recipe of TorchAudio.
  - Use mixed precision training.
  - Add "|" after the end of transcription to capture the silence/word termination, same as in fairseq recipe.

- Update the WER results on LibriSpeech dev and test sets.

|                   | WER% (Viterbi)|  WER% (KenLM) |
|:-----------------:|--------------:|--------------:|
| dev-clean         |       10.9    |       4.2     |
| dev-other         |       17.5    |       9.4     |
| test-clean        |       10.9    |       4.4     |
| test-other        |       17.8    |       9.5     |
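
The pre-training loss normalization listed above (dividing by the total number of masked frames across all GPUs) can be illustrated with a sketch. It simulates the per-GPU values with plain lists instead of a distributed all-reduce, and the function name is hypothetical:

```python
def globally_normalized_loss(per_gpu_loss_sums, per_gpu_masked_frames):
    """Divide the summed loss by the masked frame count over all GPUs."""
    total_frames = sum(per_gpu_masked_frames)
    if total_frames == 0:
        return 0.0
    return sum(per_gpu_loss_sums) / total_frames


# Two simulated GPUs with different numbers of masked frames.
loss_sums = [30.0, 30.0]
masked_frames = [10, 30]
global_loss = globally_normalized_loss(loss_sums, masked_frames)
mean_of_means = sum(l / n for l, n in zip(loss_sums, masked_frames)) / len(loss_sums)
```

The two quantities differ whenever GPUs see different numbers of masked frames, which is why the normalization is done over the global count.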

Pull Request resolved: https://github.com/pytorch/audio/pull/2744

Reviewed By: carolineechen

Differential Revision: D40282322

Pulled By: nateanl

fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90

Fix typos in tacotron2 tutorial (#2761)

Summary:
`publishe`->`published`

Also, not sure if it should be `pre-trained weight is published` or `pre-trained weights are published`

Pull Request resolved: https://github.com/pytorch/audio/pull/2761

Reviewed By: carolineechen

Differential Revision: D40313042

Pulled By: malfet

fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b

Add gtzan download note (#2763)

Summary:
GTZAN download link is no longer working, so the torchaudio download functionality for GTZAN does not work properly, per https://github.com/pytorch/audio/issues/2743. Add a note in the docs to reflect this discovery.

Pull Request resolved: https://github.com/pytorch/audio/pull/2763

Reviewed By: nateanl, mthrok

Differential Revision: D40315071

Pulled By: carolineechen

fbshipit-source-id: 3250326c45d227546a9c62b33ba890199ad19242

Update tutorial author information (#2764)

Summary:
Adding and updating author information.

Pull Request resolved: https://github.com/pytorch/audio/pull/2764

Reviewed By: carolineechen

Differential Revision: D40332427

Pulled By: mthrok

fbshipit-source-id: 4f04c7351386c122e3b0a45c2ed1757a04b7dc9a

Add custom lm example to decoder tutorial (#2762)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2762

Reviewed By: mthrok

Differential Revision: D40332603

Pulled By: carolineechen

fbshipit-source-id: 2de51265adc81b4728f4d6798d287bd2eccf5251

Fix CTCDecoder doc (#2766)

Summary:
* Document `__call__` instead of `__init__`
* List CTCHypothesis first as it is used in combination with CTCDecoder
* Fix indentation of score method docstring

Pull Request resolved: https://github.com/pytorch/audio/pull/2766

Reviewed By: carolineechen

Differential Revision: D40349388

Pulled By: mthrok

fbshipit-source-id: 5e512e6c2b29d3533eb62d09b289154ccd1abf4c

Fix fading in hybrid demucs tutorial (#2769)

Summary:
The separation is applied to chunks of audio to avoid OOM. The combination of consecutive chunks is described in the graph:

![image](https://user-images.githubusercontent.com/8653221/195691886-002844e6-4ec5-41de-8910-df8046553998.png)

In the last audio chunk, there is no future chunk to be combined, hence the overlap on the right side doesn't need to be faded.
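
A minimal sketch of the per-chunk weights, assuming a linear fade and using plain Python lists instead of tensors. The helper is hypothetical and is not the code from the tutorial:

```python
def chunk_fade_weights(chunk_len, overlap, is_first, is_last):
    """Return per-sample weights for one chunk with linear cross-fades."""
    weights = [1.0] * chunk_len
    for i in range(overlap):
        ramp = i / overlap
        if not is_first:
            # Fade in over the region shared with the previous chunk.
            weights[i] *= ramp
        if not is_last:
            # Fade out over the region shared with the next chunk.
            weights[chunk_len - overlap + i] *= 1.0 - ramp
    return weights


middle = chunk_fade_weights(8, 4, is_first=False, is_last=False)
last = chunk_fade_weights(8, 4, is_first=False, is_last=True)
```

The tail of `middle` and the head of `last` sum to one sample by sample, while the tail of `last` stays at full weight because no later chunk overlaps it.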

Pull Request resolved: https://github.com/pytorch/audio/pull/2769

Reviewed By: carolineechen

Differential Revision: D40358382

Pulled By: nateanl

fbshipit-source-id: ec8be895d7a67acb257e2693b64922397163ed5e

Fix leaking matplotlib figure (#2771)

Summary:
In StreamWriter basic usage tutorial, matplotlib is used to generate raster images of waveforms, and the figure used is left unshown in the resulting tutorial with the use of ``sphinx_gallery_defer_figures`` command.

It turned out that this figure is shown in the next code block executed by Sphinx Gallery, so the figure ends up in a totally unrelated place. https://pytorch.org/audio/main/tutorials/audio_feature_extractions_tutorial.html

<img width="951" alt="Screen Shot 2022-10-14 at 10 06 58 PM" src="https://user-images.githubusercontent.com/855818/195855124-ecd9be49-5085-4acd-9a93-608d9d1ee9ce.png">

This commit fixes it by closing the figure.

Pull Request resolved: https://github.com/pytorch/audio/pull/2771

Reviewed By: nateanl

Differential Revision: D40382076

Pulled By: mthrok

fbshipit-source-id: 015f2bab8492d3b4fbe70e1174c7776a5aa2679a

Update resampling tutorial (#2773)

Summary:
* Refactor benchmark script
* Rename `time` variable to avoid a (potential) conflict with the `time` module
* Fix `beta` parameter in benchmark (it was not used previously)
* Use `timeit` module for benchmark
* Add plot
* Move the comment on the results to the end
* Add link to an explanation of aliasing
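
A minimal sketch of a `timeit`-based benchmark helper in the spirit of the changes listed above. The `decimate` workload and the `benchmark` helper are hypothetical stand-ins and are not the code from the tutorial; the point is only the use of `timeit` and a result variable that does not shadow the `time` module.

```python
import timeit


def decimate(samples, factor):
    """Toy workload: keep every `factor`-th sample (no anti-aliasing filter)."""
    return samples[::factor]


def benchmark(func, *args, number=200):
    """Return the average duration of one call to `func`, in milliseconds."""
    # Named `elapsed` rather than `time`, so the `time` module is not shadowed.
    elapsed = timeit.timeit(lambda: func(*args), number=number)
    return elapsed / number * 1000.0


samples = [float(i % 100) for i in range(48_000)]
print(f"decimate: {benchmark(decimate, samples, 3):.4f} ms per call")
```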

https://output.circle-artifacts.com/output/job/20b57d2f-3614-4161-a18e-e0c1a537739c/artifacts/0/docs/tutorials/audio_resampling_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2773

Reviewed By: carolineechen

Differential Revision: D40421337

Pulled By: mthrok

fbshipit-source-id: b402f84d4517695daeca75fb84ad876ef9354b3a

Update description of HDemucs pipelines (#2774)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2774

Reviewed By: carolineechen

Differential Revision: D40445274

Pulled By: nateanl

fbshipit-source-id: 6388323a5fa5c548a86829cb3f7cafee5382d18d

Add file_name to the returned item in Snips dataset (#2775)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2775

Reviewed By: carolineechen

Differential Revision: D40481144

Pulled By: nateanl

fbshipit-source-id: 5d0fb2478767704603a3ec28d74160e7892d4d0e

Update download path for speechcommands (#2777)

Summary:
The previous download link for v0.02 did not download the entire dataset, only the training set, which caused issues when trying to access the testing or validation data.

Pull Request resolved: https://github.com/pytorch/audio/pull/2777

Reviewed By: nateanl

Differential Revision: D40480605

Pulled By: carolineechen

fbshipit-source-id: a594506b4ccfb548a7d5043b716c58463480c103

Add notes on file structure in Voxceleb1 based datasets (#2776)

Summary:
The file structure of VoxCeleb1 is as follows:
```
root/
└── wav/
    └── speaker_id folders
```
Users who use [Kaldi](https://github.com/kaldi-asr/kaldi/blob/f6f4ccaf213f0fe8b26e633a7dc0c802150626a0/egs/voxceleb/v1/local/make_voxceleb1_v2.pl) to get the VoxCeleb1 dataset have "dev" and "test" folders above the "wav" folder. However, file lists such as https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt or https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/iden_split.txt make no such distinction, so it is not necessary to put the extracted files into separate folders.
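
To make the layout concrete, here is a small sketch of how a file-list entry could be resolved against that structure. The entry format (`<label> <speaker_id>/<video_id>/<file>.wav`), the sample entry and both helper names are assumptions for illustration only; they are not taken from the dataset classes.

```python
from pathlib import Path


def parse_list_entry(line):
    """Split a hypothetical '<label> <relative path>' line into its two parts."""
    label, relative_path = line.split(maxsplit=1)
    return int(label), relative_path.strip()


def resolve_audio_path(root, relative_path):
    """Map a relative path from a file list to root/wav/<relative path>."""
    # No "dev" or "test" folder is inserted between root and "wav".
    return Path(root) / "wav" / relative_path


label, relative_path = parse_list_entry("1 id00001/abc/00001.wav")
print(label, resolve_audio_path("voxceleb1", relative_path))
```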

This PR adds notes to the `VoxCeleb1Identification` and `VoxCeleb1Verification` datasets to inform users of the expected file structure.

Pull Request resolved: https://github.com/pytorch/audio/pull/2776

Reviewed By: carolineechen

Differential Revision: D40483707

Pulled By: nateanl

fbshipit-source-id: ccd1780a72a5b53f0300c2466c3073a293ad7b8d

[Nova] New GHA Workflow for Docstring Sync (#2720)

Summary:
Create a standalone GitHub Actions workflow for Docstring Sync. This job (https://app.circleci.com/pipelines/github/pytorch/audio/12625/workflows/96223ad2-0fcd-4dae-a045-d530aaf9b55c/jobs/907466) currently depends on linux wheels builds, which creates a dependency that makes the migration to Nova trickier. This PR creates a fresh standalone workflow for this job that is triggered per-PR and before nightly/release cuts.

Pull Request resolved: https://github.com/pytorch/audio/pull/2720

Reviewed By: izaitsevfb, seemethere

Differential Revision: D39863574

Pulled By: osalpekar

fbshipit-source-id: 8599dc006693242278857a3dedeb4fddc1eed14b

[Nova] Clean commit for Enabling Nova Linux Wheels Workflows (#2719)

Summary:
Creating this fresh PR since we're reverting the older commit that removed build configs from the CircleCI file. This does not change the existing builds/uploads in CircleCI, and should not break any existing jobs/workflows. This is just to add back workflows to build the Linux Wheels with Nova, upload them to GH artifacts (NOT to the actual nightly channels), and ensure that they produce the same binaries as CircleCI. TO CLARIFY: this does not upload anything to nightly channels, so this PR has no effect on any existing jobs or distributed binaries.

We will create a workflow (most likely in test-infra) that does this comparison between the binaries to ensure there is parity between the binaries before we start uploading with Nova.

Pull Request resolved: https://github.com/pytorch/audio/pull/2719

Reviewed By: hwangjeff, weiwangmeta

Differential Revision: D39866440

Pulled By: osalpekar

fbshipit-source-id: 9ebf0402214fcd97cc519801276d85d336617410

Add iemocap variants (#2778)

Summary:
Add the ability to load only improvised or only scripted utterances.

Pull Request resolved: https://github.com/pytorch/audio/pull/2778

Reviewed By: nateanl

Differential Revision: D40511865

Pulled By: carolineechen

fbshipit-source-id: e1fe3908ac2aa306ad30c242ddd25762b2268539

Bump version to 0.14 (#2779)

Summary:
Bump version to 0.14

Pull Request resolved: https://github.com/pytorch/audio/pull/2779

Reviewed By: carolineechen

Differential Revision: D40523034

Pulled By: atalman

fbshipit-source-id: 325e6ffcac4763a7d83ba600c2c3d9eadae03c31

Fix doc in torchaudio.backend (#2781)

Summary:
Addresses https://github.com/pytorch/audio/issues/2780

Pull Request resolved: https://github.com/pytorch/audio/pull/2781

Reviewed By: carolineechen, mthrok

Differential Revision: D40556794

Pulled By: nateanl

fbshipit-source-id: b24912489d41e5663b4b4dcfb8be743fb962097e

Remove archive file in gh-pages branch (#2786)

Summary:
The motivation for generating `artifact.tar.gz` in the `build_docs` job is to make it easy to add documentation for each stable release. But it is committed to the `gh-pages` branch, which makes the git repository very large (see https://github.com/pytorch/audio/issues/2783). This PR removes the tar file from the commit.

Pull Request resolved: https://github.com/pytorch/audio/pull/2786

Reviewed By: carolineechen

Differential Revision: D40591152…
BriansIDP pushed a commit to BriansIDP/audio that referenced this pull request Jan 23, 2023
Conformer RNN-T with TCPGen for biasing

first commit BrianSun

Conformer RNN-T with TCPGen for biasing

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674296079 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674296047 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295932 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295795 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295664 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295524 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295462 +0000

Fix stylecheck (#2606)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2606

Reviewed By: nateanl

Differential Revision: D38502666

Pulled By: carolineechen

fbshipit-source-id: 1e279996fff3621835a07882c63328856fe38f3a

Add NNLM support to CTC Decoder (#2528)

Summary:
Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs.

The `ctc_decoder` API is as follows:
- To decode with KenLM, pass the KenLM language model path to the `lm` variable
- To decode with a custom LM, create a Python class that subclasses `CTCDecoderLM`, and pass it to the `lm` variable. Additionally, create a file of LM words listed in order of the LM index, one word per line, and pass the file to `lm_path`.
- To decode without a language model, set `lm` to `None` (default)
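
The word-list file mentioned for the custom LM case could be produced along these lines. This is only a sketch: the `write_lm_words` helper, the vocabulary and the file name are made up for illustration, and it assumes the LM exposes a word-to-index mapping.

```python
import os
import tempfile


def write_lm_words(word_to_index, path):
    """Write one word per line, ordered by the word's index in the LM."""
    words = sorted(word_to_index, key=word_to_index.get)
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")


# Hypothetical vocabulary of a custom LM.
vocab = {"cat": 1, "the": 0, "sat": 2}
with tempfile.TemporaryDirectory() as tmp_dir:
    words_path = os.path.join(tmp_dir, "lm_words.txt")
    write_lm_words(vocab, words_path)
    with open(words_path, encoding="utf-8") as f:
        print(f.read().splitlines())
```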

Validated against fairseq w2l decoder on sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests to validate custom implementations of ZeroLM and KenLM, and also using a biased LM.

Follow ups:
- Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory

cc jacobkahn

Pull Request resolved: https://github.com/pytorch/audio/pull/2528

Reviewed By: mthrok

Differential Revision: D38243802

Pulled By: carolineechen

fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7

Fix dataset docs parsing issue with extra spaces (#2607)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2607

Reviewed By: carolineechen, nateanl

Differential Revision: D38522606

Pulled By: skim0514

fbshipit-source-id: 2c38b8dcb343bcf624bfda1bfa2afd91abf2e668

Fixed argument validation in TorchAudio filtering (#2609)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2609

Converted argument validations in torchaudio/functional/filtering from assert-based validation to the preferred if-then-raise validation. Added specific error messages in all cases.
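
The pattern can be sketched as follows. `validate_q_factor` is a hypothetical helper, not one of the functions touched by this change; it only shows the shape of the conversion. One general reason to prefer explicit raises is that `assert` statements are skipped when Python runs with `-O`.

```python
def validate_q_factor(q):
    """Return `q` as a float, raising a descriptive error if it is not usable."""
    # Before (assert-based): `assert q > 0`, which carries no message.
    if not isinstance(q, (int, float)):
        raise TypeError(f"Expected q to be a number, but got {type(q).__name__}.")
    if q <= 0:
        raise ValueError(f"Expected q to be positive, but got {q}.")
    return float(q)
```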

Reviewed By: mthrok

Differential Revision: D38515029

fbshipit-source-id: 6c644a042f86c6feb2bbe8bd02fdb484fe27fae9

Fix bug in Conformer RNN-T recipe (#2611)

Summary:
https://github.com/pytorch/audio/issues/2535 modified the Conformer RNN-T Lightning module to accept a SentencePiece model instance rather than a file path. This PR makes changes to account for this in the train script.

Pull Request resolved: https://github.com/pytorch/audio/pull/2611

Reviewed By: carolineechen

Differential Revision: D38578892

Pulled By: hwangjeff

fbshipit-source-id: ec3b9823ad30ffb730baa13d10d8b79020866aac

Add additive noise function (#2608)

Summary:
Adds function `add_noise`, which computes and returns the sum of a waveform and scaled noise.
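
A plain-Python sketch of the idea, assuming the scale is chosen so that the result has a requested signal-to-noise ratio in decibels. The name `add_scaled_noise` and the list-based signature are illustrative only and do not mirror the torchaudio function.

```python
import math


def add_scaled_noise(waveform, noise, snr_db):
    """Return waveform + scale * noise, with scale chosen to reach `snr_db`."""
    if len(waveform) != len(noise):
        raise ValueError("waveform and noise must have the same length")
    energy_signal = sum(x * x for x in waveform)
    energy_noise = sum(n * n for n in noise)
    if energy_noise == 0:
        raise ValueError("noise must not be all zeros")
    # SNR (dB) = 10 * log10(energy_signal / (scale**2 * energy_noise))
    scale = math.sqrt(energy_signal / (energy_noise * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(waveform, noise)]


tone = [math.sin(0.1 * i) for i in range(1000)]
hiss = [((i * 37) % 11 - 5) / 5.0 for i in range(1000)]
noisy_tone = add_scaled_noise(tone, hiss, snr_db=10.0)
```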

Pull Request resolved: https://github.com/pytorch/audio/pull/2608

Reviewed By: nateanl

Differential Revision: D38557141

Pulled By: hwangjeff

fbshipit-source-id: 1457fa213f43ca5b4333d3c7580971655d4260a0

Introducing pytorch-cuda metapackage (#2612)

Summary:
Introducing pytorch-cuda metapackage

Same as: https://github.com/pytorch/vision/pull/6371
Following PR: https://github.com/pytorch/builder/pull/1094
Adds a CUDA metapackage called pytorch-cuda. This way we can make sure the correct version of the CUDA dependencies is installed without depending on conda-forge.

Pull Request resolved: https://github.com/pytorch/audio/pull/2612

Reviewed By: hwangjeff, seemethere, nateanl

Differential Revision: D38633332

Pulled By: atalman

fbshipit-source-id: 78a6115bb252ebdb6d66a57d7d2c4a4978ddb501

Remove outdated doc (#2617)

Summary:
`ctc_decoder` has become beta, so remove it from the prototype documentation.

Pull Request resolved: https://github.com/pytorch/audio/pull/2617

Reviewed By: hwangjeff

Differential Revision: D38706869

Pulled By: nateanl

fbshipit-source-id: 41679f4e65a584b6b882af4551a50123f1dcef02

Update doc version selector link (#2605)

Summary:
The link to the version selector was an absolute link, which was
a trap when reviewing a gh-pages deployment from a fork.

This commit changes it to a relative link.

Pull Request resolved: https://github.com/pytorch/audio/pull/2605

Test Plan:
- https://mthrok.github.io/audio/main/index.html -> click version selector -> https://mthrok.github.io/audio/versions.html
- https://mthrok.github.io/audio/0.12.1/index.html -> click version selector -> https://pytorch.org/audio/versions.html

Reviewed By: carolineechen, nateanl

Differential Revision: D38695645

Pulled By: mthrok

fbshipit-source-id: 91132ac19b8c61f39d304a162435b9c6599ef2b2

Fix anaconda upload (#2621)

Summary:
Same as:
https://github.com/pytorch/vision/pull/6422

Testing:
```
export ANACONDA_PATH=$(conda info --base)/bin
echo $ANACONDA_PATH
/opt/homebrew/Caskroom/miniconda/base/bin
$ANACONDA_PATH/anaconda -V
anaconda Command line client (version 1.10.0)
```
Failure: https://github.com/pytorch/audio/runs/7837085749?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/audio/pull/2621

Reviewed By: weiwangmeta, seemethere

Differential Revision: D38714324

Pulled By: atalman

fbshipit-source-id: 55342cf69006e9250403c955202846bab4516f3e

Move xcode to 14 from 12.5 (#2622)

Summary:
Similar to https://github.com/pytorch/vision/pull/6218
Fixing MacOS builds

Pull Request resolved: https://github.com/pytorch/audio/pull/2622

Reviewed By: weiwangmeta

Differential Revision: D38722983

Pulled By: atalman

fbshipit-source-id: 4cef85c97dc270fc812bc289592c4f3815f73c85

Added example for MelScale transform (#2616)

Summary:
Added example for MelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2616

Reviewed By: carolineechen

Differential Revision: D38743145

Pulled By: nateanl

fbshipit-source-id: e24ca92f5317a0ea5a141418bf084b12cfb22486

Added example for AmplitudeToDB transform (#2615)

Summary:
Added example for AmplitudeToDB transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2615

Reviewed By: carolineechen

Differential Revision: D38743117

Pulled By: nateanl

fbshipit-source-id: bf0f760299f4777a4bca65da86359faa00b16207

Use double quotes for string in functional and transforms (#2618)

Summary:
To make the code consistent, we should use double quotation marks for all strings. This PR makes such changes in functional and transforms.

Pull Request resolved: https://github.com/pytorch/audio/pull/2618

Reviewed By: carolineechen

Differential Revision: D38744137

Pulled By: nateanl

fbshipit-source-id: 74213a24d9f66c306cc92019d77dcb2a877f94bd

Fix doc warning (#2627)

Summary:
Resolves the following warning

```
/torchaudio/docs/source/transforms.rst:94: WARNING: Title underline too short.

:hidden:`Loudness`
-----------------
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2627

Reviewed By: carolineechen

Differential Revision: D38814802

Pulled By: mthrok

fbshipit-source-id: 5dfaf2d7bae22dba0f4a14f04ca63f28d6b2a749

Fix Sphinx-gallery display and pin sphinx-related packages (#2629)

Summary:
This commit fixes the issue with the recent Sphinx-Gallery update.
Also it pins the versions of Sphinx-related packages.

Before:

<img width="256" alt="Screen Shot 2022-08-17 at 10 02 23 PM" src="https://user-images.githubusercontent.com/855818/185140952-28f2d98a-b586-424c-a003-b69089f48eb9.png">

After:

https://user-images.githubusercontent.com/855818/185271889-bd4f86a0-986b-43bb-8121-bd77750d74f0.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2629

Reviewed By: carolineechen

Differential Revision: D38816417

Pulled By: mthrok

fbshipit-source-id: 11ee3f9121d9a302772ee1f461dacae52eb28852

Tweak tutorials (#2630)

Summary:
Resolves the following warnings

```
/torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:195: WARNING: Unexpected indentation.
/torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:446: WARNING: Unexpected indentation.
/torchaudio/docs/source/tutorials/audio_io_tutorial.rst:559: WARNING: Content block expected for the "note" directive; none found.
/torchaudio/docs/source/tutorials/mvdr_tutorial.rst:338: WARNING: Bullet list ends without a blank line; unexpected unindent.
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2630

Reviewed By: nateanl

Differential Revision: D38816632

Pulled By: mthrok

fbshipit-source-id: 135ded4e064d136be67ce24439e96f5e9c9ce635

Update notes around nightly build and third parties (#2632)

Summary:
Google Colab now has torchaudio 0.12 pre-installed.
This commit removes the note about nightly build.

Pull Request resolved: https://github.com/pytorch/audio/pull/2632

Reviewed By: carolineechen

Differential Revision: D38827632

Pulled By: mthrok

fbshipit-source-id: ac769780868b741c3012357d589ec0019d9af6eb

Added example for InverseMelScale transform (#2635)

Summary:
Added example for InverseMelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2635

Reviewed By: carolineechen

Differential Revision: D38830318

Pulled By: nateanl

fbshipit-source-id: fd26a700d495f6755db0767625aa8577cb89bd83

Update ASR inference tutorial (#2631)

Summary:
* Use download_asset
* Remove notes around nightly
* Print versions first
* Remove duplicated import

Pull Request resolved: https://github.com/pytorch/audio/pull/2631

Reviewed By: carolineechen

Differential Revision: D38830395

Pulled By: mthrok

fbshipit-source-id: c9259df33562defe249734d1ed074dac0fddc2f6

Update README.md (#2633)

Summary:
Update compatibility matrix

Pull Request resolved: https://github.com/pytorch/audio/pull/2633

Reviewed By: nateanl

Differential Revision: D38827670

Pulled By: mthrok

fbshipit-source-id: 5c66bf60a06e37919ee725a5f4adf571e6c89100

Refactor sox pybind source code (#2636)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2636

In the early stages of the torchaudio extension module,
the `torchaudio/csrc/pybind` directory was created so that
all the code defining the Python interface would be placed
there and there would be only one extension module, called
`torchaudio._torchaudio`.

However, the codebase has evolved such that separate
extensions are defined for each feature (third party
dependency) for the sake of more modular file organization.

What is left in `csrc/pybind` is the libsox Python bindings.
This commit moves them under `csrc/sox`.

Follow-up: rename `torchaudio._torchaudio` to `torchaudio._torchaudio_sox`.

Reviewed By: carolineechen

Differential Revision: D38829253

fbshipit-source-id: 3554af45a2beb0f902810c5548751264e093f28d

Added example for MFCC transform (#2637)

Summary:
Added example for MFCC transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Note: Python formatter package `black` uses double quotes for the string dict keys (e.g. in `melkwargs` for this example). Please let me know if there is a different linter/format/convention that is preferred!

Pull Request resolved: https://github.com/pytorch/audio/pull/2637

Reviewed By: carolineechen

Differential Revision: D38873729

Pulled By: nateanl

fbshipit-source-id: 2e8fe2930671e7c5d02c0c37cf1ca5cc8c5079e3

Added example for Loudness transform (#2641)

Summary:
Added example for Loudness transform (implemented in PR https://github.com/pytorch/audio/issues/2472) as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2641

Reviewed By: nateanl

Differential Revision: D38907782

Pulled By: carolineechen

fbshipit-source-id: fd2bcc4bac3095a626ea9cf36cb70cb2bf003d63

Update Sphinx-gallery to 0.11.1 (#2638)

Summary:
The minor release fixes some gallery issues, which allows us to remove
some of the customization we had in https://github.com/pytorch/audio/issues/2629

https://output.circle-artifacts.com/output/job/553a9b98-8260-4cb4-a681-20ef97d2c33e/artifacts/0/docs/pipelines.html#torchaudio.pipelines.Wav2Vec2ASRBundle

Pull Request resolved: https://github.com/pytorch/audio/pull/2638

Reviewed By: carolineechen, nateanl

Differential Revision: D38909097

Pulled By: mthrok

fbshipit-source-id: 78346d93b54fca2a19b28991c224324ef53221c9

[Nova] Added draft calling GHA workflow for building linux wheels (#2548)

Summary:
As part of Project Nova, we are consolidating CI/CD workflows and infra, making them reusable across PyTorch ecosystem libraries. https://github.com/pytorch/test-infra/pull/460 introduces a general-purpose reusable workflow to build linux wheels for python libraries. This PR introduces a caller workflow that triggers the reusable workflow. Details around modular env setup, passing input args across workflows, etc. are still being worked out.

Using reusable workflow defined in https://github.com/pytorch/test-infra/pull/506

Pull Request resolved: https://github.com/pytorch/audio/pull/2548

Reviewed By: osalpekar

Differential Revision: D38947733

Pulled By: mehtanirav

fbshipit-source-id: 03ab88cef973a092f5c5d1ff8c74ec7ae7e46d01

Added example for LFCC transform (#2640)

Summary:
Added example for LFCC transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2640

Reviewed By: carolineechen

Differential Revision: D38908975

Pulled By: nateanl

fbshipit-source-id: ffdd994390db7f27556b011a8050a65eef9cd09d

Add StreamWriter (#2628)

Summary:
This commit adds the FFmpeg-based encoder class StreamWriter.
StreamWriter is pretty much the opposite of the StreamReader class, and
it supports:

* Encoding audio / still image / video
* Exporting to local file / streaming protocol / devices etc...
* File-like object support (in later commit)
* HW video encoding (in later commit)

See also: https://fburl.com/gslide/z85kn5a9 (Meta internal)

Pull Request resolved: https://github.com/pytorch/audio/pull/2628

Reviewed By: nateanl

Differential Revision: D38816650

Pulled By: mthrok

fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8

[Nova] Build Linux Conda Binaries using reusable workflow (#2626)

Summary:
Calling the reusable workflow introduced in https://github.com/pytorch/test-infra/pull/546 to build conda binaries on linux.

Pull Request resolved: https://github.com/pytorch/audio/pull/2626

Reviewed By: mehtanirav

Differential Revision: D39028057

Pulled By: osalpekar

fbshipit-source-id: d74ea3771967d0ee2b0ad28a8f811a95145b2183

Replace bg_iterator in examples (#2645)

Summary:
`bg_iterator` was deprecated in 0.11 because it was known to have issues (deadlock) without providing a speed-up. Remove instances of `bg_iterator` used in torchaudio examples.

Resolves https://github.com/pytorch/audio/issues/2642

Pull Request resolved: https://github.com/pytorch/audio/pull/2645

Reviewed By: nateanl

Differential Revision: D38954292

Pulled By: carolineechen

fbshipit-source-id: 2333ab5228c2b8511ff532057543aaf9d02b2789

[Nova] Use pkg-helpers to modularize GHA Linux Conda Builds (#2650)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2650

Reviewed By: mehtanirav

Differential Revision: D39040559

Pulled By: osalpekar

fbshipit-source-id: df39e23d7c246728793aab969b8dc1070af88d75

add CUDA 11.7 builds (#2623)

Summary:
CC atalman

Pull Request resolved: https://github.com/pytorch/audio/pull/2623

Reviewed By: hwangjeff, nateanl

Differential Revision: D39036432

Pulled By: atalman

fbshipit-source-id: cd74a1bf8f74e31bd2c32c80d32c617f4b1766e8

Add file-like object support to StreamWriter (#2648)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2648

Reviewed By: nateanl

Differential Revision: D38976874

Pulled By: mthrok

fbshipit-source-id: 0541dea2a633d97000b4b8609ff6b83f6b82c864

Add CUDA HW encoding support to StreamWriter (#2505)

Summary:
This commit adds CUDA hardware encoding to StreamWriter.
For certain video formats, it can encode video directly from
a CUDA Tensor, without needing to move the data to host CPU.

Pull Request resolved: https://github.com/pytorch/audio/pull/2505

Reviewed By: hwangjeff

Differential Revision: D37446830

Pulled By: mthrok

fbshipit-source-id: eee6424f01a99a3b611dcad45ed58f86cba4672a

Remove obsolete examples (#2655)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2655

Removed obsolete example and the corresponding test

Reviewed By: mthrok

Differential Revision: D39260253

fbshipit-source-id: 0bde71ffd75dd0c94a5cc4a9940f4648a5d61bd7

Add metadata function for LibriSpeech (#2653)

Summary:
Adding support for metadata mode, requested in https://github.com/pytorch/audio/issues/2539, by adding a public `get_metadata()` function in the dataset. This function can be used directly by users to fetch metadata for individual dataset indices, or users can subclass the dataset and override `__getitem__` with `get_metadata` to create a dataset class that directly handles metadata mode.
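
The subclassing approach described above can be sketched without torchaudio. `ToyDataset` is a made-up stand-in for a dataset that exposes `get_metadata()`; its fields, the `decode_calls` counter and the fake decoding step are assumptions for illustration only.

```python
class ToyDataset:
    """Stand-in for a dataset whose items normally require decoding audio."""

    def __init__(self, entries):
        self._entries = list(entries)  # (path, sample_rate, transcript) tuples
        self.decode_calls = 0

    def get_metadata(self, n):
        """Return the metadata of item `n` without touching the audio."""
        return self._entries[n]

    def _load_audio(self, path):
        # Placeholder for the expensive step (file I/O and decoding).
        self.decode_calls += 1
        return [0.0, 0.0, 0.0, 0.0]

    def __getitem__(self, n):
        path, sample_rate, transcript = self.get_metadata(n)
        return self._load_audio(path), sample_rate, transcript

    def __len__(self):
        return len(self._entries)


class MetadataOnlyDataset(ToyDataset):
    """Dataset variant that always operates in metadata mode."""

    def __getitem__(self, n):
        return self.get_metadata(n)
```

Indexing `MetadataOnlyDataset` never reaches `_load_audio`, so iterating over it to collect paths or transcripts skips the decoding cost.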

Pull Request resolved: https://github.com/pytorch/audio/pull/2653

Reviewed By: nateanl, mthrok

Differential Revision: D39105114

Pulled By: carolineechen

fbshipit-source-id: 6f26f1402a053dffcfcc5d859f87271ed5923348

Fix random Gaussian generation (#2639)

Summary:
This PR is meant to address the bug raised in issue https://github.com/pytorch/audio/issues/2634.

In particular, the Box-Muller transform was previously used to generate Gaussian variates for dithering from `torch.rand` uniform variates, but it was implemented incorrectly (e.g. the same uniform variate was used as input to the transform, rather than two different uniform variates), which led to a different (non-Gaussian) distribution. This PR instead uses `torch.randn` to generate the Gaussian variates.
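
For reference, a standard-library sketch of the transform with two independent uniform variates, next to a variant that reuses one variate. The second function is only one reading of the description above and is not the original torchaudio code.

```python
import math
import random


def box_muller(rng):
    """Return one standard-normal variate from two independent uniforms."""
    u1 = 1.0 - rng.random()  # in (0, 1], which avoids log(0)
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def box_muller_same_variate(rng):
    """Illustration of the mistake: one uniform feeds both the radius and the angle."""
    u = 1.0 - rng.random()
    return math.sqrt(-2.0 * math.log(u)) * math.cos(2.0 * math.pi * u)


rng = random.Random(0)
draws = [box_muller(rng) for _ in range(20_000)]
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"mean={mean:.3f} variance={variance:.3f}")
```

In the second function the radius and the angle are no longer independent, so the output is not normally distributed. Outside of a demonstration, the standard library's `random.gauss` serves the same purpose.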

Pull Request resolved: https://github.com/pytorch/audio/pull/2639

Reviewed By: mthrok

Differential Revision: D39101144

Pulled By: carolineechen

fbshipit-source-id: 691e49679f6598ef0a1675f6f4ee721ef32215fd

Tweak documentation (#2656)

Summary:
1. Override class `__module__` attribute in `conf.py` so that no manual override is necessary
2. Fix SourceSeparationBundle member attribute

Pull Request resolved: https://github.com/pytorch/audio/pull/2656

Reviewed By: carolineechen

Differential Revision: D39293053

Pulled By: mthrok

fbshipit-source-id: 2b8d6be1aee517d0e692043c26ac2438a787adc6

Fix LibriSpeech Conformer RNN-T eval script (#2666)

Summary:
`ConformerRNNTModule`'s initializer now accepts a SentencePiece model rather than a path to a model as input. This PR corrects `eval.py` accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2666

Reviewed By: carolineechen

Differential Revision: D39386968

Pulled By: hwangjeff

fbshipit-source-id: 95a94dd898263d648650f7376c29810b1456d6c1

[Nova] Remove the old caller GitHub Actions Linux wheels/conda Build Workflows (#2660)

Summary:
We moved over to a new design for release workflows that encompass all the build logic in the test-infra repo (apart from custom pre-build and post-build scripts). Thus, we no longer need these caller workflows in the audio repo. This PR removes them entirely.

Pull Request resolved: https://github.com/pytorch/audio/pull/2660

Reviewed By: seemethere

Differential Revision: D39392456

Pulled By: osalpekar

fbshipit-source-id: a8bdeb4738b91666abcdc883f6f8f1bf359f1d42

Move hybrid demucs model out of prototype (#2668)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2668

Reviewed By: nateanl, mthrok

Differential Revision: D39433671

Pulled By: carolineechen

fbshipit-source-id: 3545a5b4019832861c34fd8c05e5f8600fd80d5c

Do not use nested namespaces in torchaudio/sox (#2663)

Summary:
Nested namespaces are a C++17 feature, and PyTorch and its extensions must still be C++14 compatible, as also specified in the top-level CMakeLists.txt:
https://github.com/pytorch/audio/blob/8a0d7b36f7821fe55175f0d4e3ca6299b3817a6c/CMakeLists.txt#L30

Otherwise, they pollute the build logs with noisy warnings such as:
```
/Users/runner/work/test-infra/test-infra/pytorch/audio/torchaudio/csrc/sox/pybind/io.cpp:12:21: warning: nested namespace definition is a C++17 extension; define each namespace separately [-Wc++17-extensions]
namespace torchaudio::sox_io {
                    ^~~~~~~~
                     { namespace sox_io
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2663

Reviewed By: atalman, nateanl

Differential Revision: D39362842

Pulled By: malfet

fbshipit-source-id: f9659d4420f1cc0194990d531455cf59b66c26b9

[Bootcamp] Fix Typo (#2661)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2661

Fixed typo in `audio_data_augmentation_tutorial.py`

Reviewed By: malfet, mthrok

Differential Revision: D39352353

fbshipit-source-id: aea35dab03fb7422421948bd26716e10a8d65f92

Move SourceSeparationBundle and pre-trained ConvTasNet pipeline into Beta (#2669)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2669

Reviewed By: carolineechen, mthrok

Differential Revision: D39433560

Pulled By: nateanl

fbshipit-source-id: 5b652b31c00badb37b27a32ac25b422a5bcc74cb

CUDA 11.3 remove. New Stable version is 11.6 (#2670)

Summary:
Removing CUDA 11.3.

Core PR: https://github.com/pytorch/pytorch/pull/84866
cc malfet ptrblck

Pull Request resolved: https://github.com/pytorch/audio/pull/2670

Reviewed By: malfet, osalpekar

Differential Revision: D39449263

Pulled By: atalman

fbshipit-source-id: f86bb119685ead3ffcabd92c4bb8076aecde4095

Move Hybrid Demucs pipeline to beta (#2673)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2673

Reviewed By: mthrok

Differential Revision: D39507612

Pulled By: carolineechen

fbshipit-source-id: 3a9ee53f72cabd6e3085c76867017be4a6ed7f53

Add Decoder LM Docs (#2658)

Summary:
Modifications to the CTC decoder LM docstrings on top of https://github.com/pytorch/audio/issues/2657

Pull Request resolved: https://github.com/pytorch/audio/pull/2658

Reviewed By: mthrok

Differential Revision: D39468921

Pulled By: carolineechen

fbshipit-source-id: c5497cc2fa22fb98a304d037e27c91bf68a9ad6a

Tweak badge link URL generation (#2677)

Summary:
Currently, the way feature badges are generated assumes that both the documentation pages and the supported features page are at the same level from the root.

This does not work when we introduce `:autosummary:`, which generates individual documentation pages one level below.

This commit changes it so that links to the supported features page are properly relative from the documentation level.
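
A small sketch of computing such a relative link with the standard library. The `relative_link` helper and the `supported_features.html` page name are assumptions for illustration and are not the code in the docs configuration.

```python
import posixpath


def relative_link(from_page, to_page):
    """Return the link to `to_page` relative to the directory of `from_page`."""
    start = posixpath.dirname(from_page) or "."
    return posixpath.relpath(to_page, start=start)


# A page at the root of the docs links to its sibling directly.
print(relative_link("io.html", "supported_features.html"))
# A generated page one level below has to go up one directory first.
print(relative_link("generated/torchaudio.io.StreamReader.html", "supported_features.html"))
```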

There is no appearance change from this commit.

Pull Request resolved: https://github.com/pytorch/audio/pull/2677

Reviewed By: carolineechen

Differential Revision: D39507451

Pulled By: mthrok

fbshipit-source-id: f18da4201f0eb747586be21c8bd9a958217aebc2

Move conv_tasnet_base doc out of prototype (#2675)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2675

Reviewed By: carolineechen

Differential Revision: D39515996

Pulled By: nateanl

fbshipit-source-id: 5824375f6a758af21b6ad6c635dd06081663644f

Consolidate bibliography / reference (#2676)

Summary:
Preparation for the adoption of `autosummary`.

Replace `:footcite:` with `:cite:` and introduce dedicated reference page, as `:footcite:` does not work well with `autosummary`.

Example:

https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/datasets.html#cmuarctic

https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/references.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2676

Reviewed By: carolineechen

Differential Revision: D39509431

Pulled By: mthrok

fbshipit-source-id: e6003dd01ec3eff3d598054690f61de8ee31ac9a

Update doc theme to the latest (#2679)

Summary:
To follow the changes related to the move to the Linux Foundation.

(we are still pinning the theme version so that our customization does not break randomly.)

Pull Request resolved: https://github.com/pytorch/audio/pull/2679

Reviewed By: carolineechen

Differential Revision: D39531566

Pulled By: mthrok

fbshipit-source-id: 64353577d05f9dbda00dd9d10b9ebcedddfdce5b

Update Sphinx to 5.1.1 (#2678)

Summary:
Previous versions of Sphinx reported the wrong path for the return class. This issue is fixed in the latest Sphinx.

This allows us to remove the patch we apply in `conf.py`. This is essential for the adoption of `:autosummary:`, as it won't render correctly with the patch.

https://output.circle-artifacts.com/output/job/19d93ede-08de-4b9e-9d66-67ca5dab964e/artifacts/0/docs/pipelines.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2678

Reviewed By: carolineechen

Differential Revision: D39509447

Pulled By: mthrok

fbshipit-source-id: e104bc6a87f32cba6c549a9fe8f2d1e489ee27e4

Switch to use conda install action for m1 builds (#2674)

Summary:
Use the setup-miniconda action for M1 builds.
We want to try to address space issues on M1. The following action:
```
pytorch/test-infra/.github/actions/setup-miniconda@main
```

sets up miniconda in a temp folder, which should be cleaned between runs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2674

Reviewed By: jeanschmidt

Differential Revision: D39540481

Pulled By: atalman

fbshipit-source-id: 0596598ab6b2f99c775aa0c9e14a3a388533068d

Adopt `:autosummary:` in `torchaudio.io` module doc (#2681)

Summary:
This commit adopts the `:autosummary:` directive in the `torchaudio.io` module doc.
It adds a table of contents at the `torchaudio.io` level.

https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/io.html
<img width="1094" alt="Screen Shot 2022-09-16 at 7 33 32 AM" src="https://user-images.githubusercontent.com/855818/190520248-27e469f8-7689-4dc2-b591-7b3f08bb4dff.png">

https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader
<img width="1108" alt="Screen Shot 2022-09-16 at 7 33 59 AM" src="https://user-images.githubusercontent.com/855818/190520292-d090fed0-2f18-4961-b9f3-9e4808fd437e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2681

Reviewed By: carolineechen

Differential Revision: D39560459

Pulled By: mthrok

fbshipit-source-id: 3de5f22b8d8d0834dfd8bac8619fbfaa44c5f4dd

Adopt `:autosummary:` in `torchaudio.models.decoder` module doc (#2684)

Summary:
* Adopts `:autosummary:` in decoder module doc
* Hide the constructor signature of `CTCDecoder`, as the `ctc_decoder` function is the one client code is supposed to use.
* Introduce a `children` property on `CTCDecoderLMState`; otherwise it does not show up in the doc.

https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/models.decoder.html

<img width="748" alt="Screen Shot 2022-09-16 at 5 23 22 PM" src="https://user-images.githubusercontent.com/855818/190592409-0c2ec8a4-d2cf-4d76-a965-8a570faaeb1a.png">

https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder

<img width="723" alt="Screen Shot 2022-09-16 at 5 23 53 PM" src="https://user-images.githubusercontent.com/855818/190592501-3fad1e07-ae3e-44f5-93be-f33181025390.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2684

Reviewed By: carolineechen

Differential Revision: D39574272

Pulled By: mthrok

fbshipit-source-id: d977660bd46f5cf98c535adbf2735be896b28773

Adopt `:autosummary:` in `torchaudio.transforms` module doc (#2683)

Summary:
* Introduce the mini-index at `torchaudio.transforms` page.
* Add "Augmentations" subsection.
* Also updated the overall introduction.

https://output.circle-artifacts.com/output/job/1b65246a-403c-4d2c-b97d-d1b582d8b4e5/artifacts/0/docs/transforms.html

<img width="721" alt="Screen Shot 2022-09-16 at 5 20 08 PM" src="https://user-images.githubusercontent.com/855818/190591795-97c169db-a95b-480a-8d3c-d80072efa045.png">

<img width="755" alt="Screen Shot 2022-09-16 at 5 20 28 PM" src="https://user-images.githubusercontent.com/855818/190591828-03026918-febd-4194-91aa-7d8f704e17cc.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2683

Reviewed By: carolineechen

Differential Revision: D39574255

Pulled By: mthrok

fbshipit-source-id: a4beed7cacbb5184bad96efa903a3a1123dab627

[Nova] Remove Extraneous Build Scripts (#2695)

Summary:
There is a single pre/post script needed for building torchaudio. This PR:
1. Removes the old conda-specific build script
2. Renames the wheel script to be a general name

Pull Request resolved: https://github.com/pytorch/audio/pull/2695

Reviewed By: kit1980

Differential Revision: D39631971

Pulled By: osalpekar

fbshipit-source-id: 52b49a6e792536b6264228c01ac356d247b18ea8

Update nightly wheels to ROCm5.2 (#2672)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2672

Reviewed By: atalman

Differential Revision: D39468320

Pulled By: mthrok

fbshipit-source-id: 0e7bd4fd922ba0db51700e140b95328a5b687a6f

Adopt `:autosummary:` in `torchaudio.functional` module doc (#2693)

Summary:
https://output.circle-artifacts.com/output/job/b23174d2-5cee-4ee9-be39-3228b9ae4abe/artifacts/0/docs/functional.html

<img width="1133" alt="Screen Shot 2022-09-20 at 11 19 23 AM" src="https://user-images.githubusercontent.com/855818/191152824-96c5b16c-bd38-4656-b1ae-0b58699dbd62.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2693

Reviewed By: carolineechen

Differential Revision: D39650930

Pulled By: mthrok

fbshipit-source-id: 28b5b03d21b922e37e02bfddda2bf1dea696cc18

Add Speech Commands metadata function (#2687)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2687

Reviewed By: mthrok

Differential Revision: D39647596

Pulled By: carolineechen

fbshipit-source-id: 8ff874fc1e828130f6754e83ce1f702ca13dfac0

Adopt `:autosummary:` in `torchaudio.models` module doc (#2690)

Summary:
* Introduce the mini-index at `torchaudio.models` page.

https://output.circle-artifacts.com/output/job/25e59810-3866-4ece-b1b7-8a10c7a2286d/artifacts/0/docs/models.html

<img width="1042" alt="Screen Shot 2022-09-20 at 1 20 50 PM" src="https://user-images.githubusercontent.com/855818/191166816-83314ad1-8b67-475b-aa10-d4cc59126295.png">

<img width="1048" alt="Screen Shot 2022-09-20 at 1 20 58 PM" src="https://user-images.githubusercontent.com/855818/191166829-1ceb65e0-9506-4328-9a2f-8b75b4e54404.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2690

Reviewed By: carolineechen

Differential Revision: D39654948

Pulled By: mthrok

fbshipit-source-id: 703d1526617596f647c85a7148f41ca55fffdbc8

Support in-memory decoding via Tensor wrapper in StreamReader (#2694)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2694

This commit adds the Tensor type as an input to `StreamReader`.
The Tensor is interpreted as a byte string buffer.

Reviewed By: hwangjeff

Differential Revision: D39467630

fbshipit-source-id: 6369eed5e16fbb657568bf6bb80d703483d72f8e

Add StreamReader Tensor Binding to src (#2699)

Summary:
In https://github.com/pytorch/audio/issues/2694 CMakeLists.txt was not properly updated, so the tests are failing. This commit fixes it.

Pull Request resolved: https://github.com/pytorch/audio/pull/2699

Reviewed By: carolineechen

Differential Revision: D39687409

Pulled By: mthrok

fbshipit-source-id: 2e14f3c478f1f8a112a03839f2dbcca51215fed7

Adopt `:autosummary:` in `torchaudio.pipelines` module doc (#2689)

Summary:
* Introduce the mini-index at `torchaudio.pipelines` page.
* Add introductions
* Update pipeline tutorials

https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/pipelines.html

<img width="1163" alt="Screen Shot 2022-09-20 at 1 23 29 PM" src="https://user-images.githubusercontent.com/855818/191167049-98324e93-2e16-41db-8538-3b5b54eb8224.png">

<img width="1115" alt="Screen Shot 2022-09-20 at 1 23 49 PM" src="https://user-images.githubusercontent.com/855818/191167071-4770f594-2540-43a4-a01c-e983bf59220f.png">

https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle

<img width="1108" alt="Screen Shot 2022-09-20 at 1 24 18 PM" src="https://user-images.githubusercontent.com/855818/191167123-51b33a5f-c30c-46bc-b002-b05d2d0d27b7.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2689

Reviewed By: carolineechen

Differential Revision: D39691253

Pulled By: mthrok

fbshipit-source-id: ddf5fdadb0b64cf2867b6271ba53e8e8c0fa7e49

Add metadata mode for various datasets (#2697)

Summary:
Add metadata mode for the following SUPERB benchmark datasets
- QUESST14
- Fluent Speech Commands
- VoxCeleb1

follow ups:
- Add metadata mode for LibriMix -- waiting for unit tests to merge
- Add IEMOCAP + SNIPS datasets

Pull Request resolved: https://github.com/pytorch/audio/pull/2697

Reviewed By: mthrok

Differential Revision: D39666809

Pulled By: carolineechen

fbshipit-source-id: 3a8f07627acceed70f960f47e694efad75b108c2

Update and fix tutorials (#2701)

Summary:
* Fix Sphinx warning
* Update asset management

Pull Request resolved: https://github.com/pytorch/audio/pull/2701

Reviewed By: carolineechen

Differential Revision: D39714126

Pulled By: mthrok

fbshipit-source-id: a5b04cfbf8bedce67c621b6bfe1dcd975b343313

Adopt `:autosummary:` in `torchaudio.datasets` module doc (#2692)

Summary:
* Introduce the mini-index at `torchaudio.datasets` page.
* Standardize the format of return type docstring.

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/datasets.html

<img width="936" alt="Screen Shot 2022-09-21 at 6 56 52 PM" src="https://user-images.githubusercontent.com/855818/191475141-a97f2bea-705f-49bc-8c34-6ec869e76793.png">

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/generated/torchaudio.datasets.CMUDict.html#torchaudio.datasets.CMUDict

<img width="1069" alt="Screen Shot 2022-09-21 at 6 57 32 PM" src="https://user-images.githubusercontent.com/855818/191475293-e3302528-27ea-4212-9c12-fd6d900fdf3e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2692

Reviewed By: carolineechen

Differential Revision: D39687463

Pulled By: mthrok

fbshipit-source-id: 4175fc15388817d2fe76206188618dd1576281df

Introduce IO section to getting started tutorials (#2703)

Summary:
Since new tutorials for StreamWriter are being added, there are more tutorials for media IO than for the rest.
So this commit introduces a sub-index for IO tutorials.

Pull Request resolved: https://github.com/pytorch/audio/pull/2703

Reviewed By: carolineechen

Differential Revision: D39769049

Pulled By: mthrok

fbshipit-source-id: 19a3981bc624fdce1d5d703c67e28a751a15e812

[Nova] Moving Linux Wheels over to Nova (#2702)

Summary:
This does 2 things:

* Comments out Linux Wheels-related jobs in CircleCI so that they are not run on nightlies/releases.
* Adds a GHA workflow that calls the build workflow in pytorch/test-infra.

Testing:
Verified that the builds are triggered by this workflow, and all builds are green: https://github.com/pytorch/audio/actions/runs/3109635749/jobs/5040029155

Pull Request resolved: https://github.com/pytorch/audio/pull/2702

Reviewed By: seemethere

Differential Revision: D39756852

Pulled By: osalpekar

fbshipit-source-id: 7e222d80ca0720e3be43b929f1e55f5c0166b947

[perf][5/5] Replace IValue::toString()->string() with IValue::toStringRef() (#2700)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2700

ATT for pytorch/audio

Reviewed By: mthrok

Differential Revision: D39707243

fbshipit-source-id: 1dc2a5a9fe913a9071e6df679e39d632b75212fb

Add CUDA version check (#2707)

Summary:
Adds check to ensure that TorchAudio and PyTorch versions use the same CUDA version.

Pull Request resolved: https://github.com/pytorch/audio/pull/2707

Reviewed By: mthrok

Differential Revision: D39791154

Pulled By: hwangjeff

fbshipit-source-id: de00889c7bac897c6b8762502f9d37797016b71d

Fix CUDA check (#2710)

Summary:
`torch.version.cuda` can return a string of form X.X or X.X.X. This PR modifies the CUDA version check to account for this.
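
A minimal sketch of a comparison that tolerates both forms, assuming only the major and minor components matter. `major_minor` and `check_cuda_versions` are hypothetical names used for illustration, not the actual TorchAudio helpers, and the version strings are passed in explicitly rather than read from the libraries.

```python
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Reduce a version string of the form 'X.Y' or 'X.Y.Z' to (X, Y)."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"Unexpected CUDA version string: {version!r}")
    return int(parts[0]), int(parts[1])


def check_cuda_versions(torch_cuda: Optional[str], audio_cuda: Optional[str]) -> None:
    """Raise if both builds report a CUDA version and major.minor differ."""
    if torch_cuda is None or audio_cuda is None:
        return  # at least one side is a CPU-only build; nothing to compare
    if major_minor(torch_cuda) != major_minor(audio_cuda):
        raise RuntimeError(
            "PyTorch and TorchAudio were compiled with different CUDA versions: "
            f"{torch_cuda} vs {audio_cuda}."
        )


# 'X.Y.Z' and 'X.Y' are treated as the same version when X.Y agrees.
check_cuda_versions("11.6.2", "11.6")
```

Comparing parsed tuples rather than raw strings is the design choice here: `"11.6.2" != "11.6"` as strings, but both reduce to `(11, 6)`.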

Pull Request resolved: https://github.com/pytorch/audio/pull/2710

Reviewed By: carolineechen, nateanl

Differential Revision: D39796810

Pulled By: hwangjeff

fbshipit-source-id: b483bd8200195844d65d0caddebaf1b10f939b64

Remove linux wheel from circleci (#2714)

Summary:
Remove linux wheel from circleci

Pull Request resolved: https://github.com/pytorch/audio/pull/2714

Reviewed By: weiwangmeta

Differential Revision: D39816121

Pulled By: atalman

fbshipit-source-id: a3c99b530896888d7b4271d8b3f27f3c986b3480

Fix windows tests related to old conda on circleci (#2704)

Summary:
The conda version on CircleCI prints the following message:
```
==> WARNING: A newer version of conda exists. <==
  current version: 4.6.14
  latest version: 4.14.0
```
and, as a result, the install step fails with this error:

```
+ /c/tools/miniconda3/Scripts/conda.exe install -v -y -c pytorch-nightly -c nvidia pytorch numpy ffmpeg pytorch-cuda=11.6
Collecting package metadata: ...working... done
Solving environment: ...working...

Too long with no output (exceeded 30m0s): context deadline exceeded
```

This should update the conda version running on the system and allow us to install pytorch and run some tests.

Pull Request resolved: https://github.com/pytorch/audio/pull/2704

Reviewed By: weiwangmeta

Differential Revision: D39820037

Pulled By: atalman

fbshipit-source-id: 4a82a7a6cbe3dc1a5807ac669e2fa79f454037fa

[Nova] Add build-type argument for when upload should be triggered (#2706)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2706

Reviewed By: kit1980

Differential Revision: D39786253

Pulled By: osalpekar

fbshipit-source-id: 2a0c427f57e5c70ff1cf419b7e0c2316e5f0e16c

Back out "[audio][PR] [Nova] Moving Linux Wheels over to Nova" (#2718)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2718

Original commit changeset: 7e222d80ca07

Original Phabricator Diff: D39756852 (https://github.com/pytorch/audio/commit/7ba7cf4d24a2967b8fa4aaff437116524281f8fd)

Reviewed By: weiwangmeta

Differential Revision: D39839899

fbshipit-source-id: f5605eb9882f7c7f0008e88338ab711131b29404

Fix mismatched cuda version in smoke tests on windows wheels (#2721)

Summary:
Example job that was failing previously:
https://app.circleci.com/pipelines/github/pytorch/audio/12796/workflows/ae96794a-6df4-4a2a-84df-ada7a7250045/jobs/927709

The failure:
```
"Detected that PyTorch and TorchAudio were compiled with different CUDA versions. "
RuntimeError: Detected that PyTorch and TorchAudio were compiled with different CUDA versions. PyTorch has CUDA version 11.7 whereas TorchAudio has CUDA version 11.6. Please install the TorchAudio version that matches your PyTorch version.
```

The failing job uses this install command:
```
pip install $(ls ~/workspace/torchaudio*.whl) -f "https://download.pytorch.org/whl/${UPLOAD_CHANNEL}/torch_${UPLOAD_CHANNEL}.html"

pip install /c/Users/circleci/workspace/torchaudio-0.13.0.dev20220927+cu116-cp37-cp37m-win_amd64.whl -f https://download.pytorch.org/whl/nightly/torch_nightly.html
```

The Linux job (which succeeds) uses a different "-f" (find links) URL that includes the specific CUDA version:
https://app.circleci.com/pipelines/github/pytorch/audio/12809/workflows/aadca2ab-5a00-4a0a-ab6a-4a1b7a503713/jobs/927861

Command:
```
pip install $(ls ~/workspace/torchaudio*.whl) -f "https://download.pytorch.org/whl/${UPLOAD_CHANNEL}/${CU_VERSION}/torch_${UPLOAD_CHANNEL}.html"

 pip install /root/workspace/torchaudio-0.13.0.dev20220927+cu116-cp37-cp37m-linux_x86_64.whl -f https://download.pytorch.org/whl/nightly/cu116/torch_nightly.html

```

This PR makes the Windows installation match the Linux one.

Testing:
* verified command manually on Circle CI:
```
>>> import torch
>>> import torchaudio
C:\tools\miniconda3\lib\site-packages\torchaudio\compliance\kaldi.py:22: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_numpy.cpp:77.)
  EPSILON = torch.tensor(torch.finfo(torch.float).eps)
C:\tools\miniconda3\lib\site-packages\torchaudio\backend\utils.py:62: UserWarning: No audio backend is available.
  warnings.warn("No audio backend is available.")
```

Co-authored: weiwangmeta

Pull Request resolved: https://github.com/pytorch/audio/pull/2721

Reviewed By: hwangjeff

Differential Revision: D39870805

Pulled By: izaitsevfb

fbshipit-source-id: 2957cba4f53d00783a5c07099f24050ce15e7d1c

Removing cuda102 (#2715)

Summary:
Removing cuda102

Pull Request resolved: https://github.com/pytorch/audio/pull/2715

Reviewed By: hwangjeff

Differential Revision: D39823444

Pulled By: atalman

fbshipit-source-id: c11d798ab86cf9a6d5ed3804958b4a0c2f8a87ff

Revert "Removing cuda102 (#2715)" (#2723)

Summary:
Revert this for now until docker is updated

Pull Request resolved: https://github.com/pytorch/audio/pull/2723

Reviewed By: nateanl

Differential Revision: D39900382

Pulled By: atalman

fbshipit-source-id: f8701e359bc11e8f9f3a29144f7e7da336a470da

Cuda 102 deprecation (#2724)

Summary:
CUDA 10.2 deprecation: migrate unit tests from CUDA 10.2 to CUDA 11.6

Pull Request resolved: https://github.com/pytorch/audio/pull/2724

Reviewed By: weiwangmeta

Differential Revision: D39912484

Pulled By: atalman

fbshipit-source-id: e760b630375eae94384cda68d24f83ef46ada6d9

Delete packaging/README.md (#2730)

Summary:
The file looks hopelessly outdated.

Pull Request resolved: https://github.com/pytorch/audio/pull/2730

Reviewed By: mthrok

Differential Revision: D39993805

Pulled By: kit1980

fbshipit-source-id: f5ad97c83873061175455cc7b129ec71a9ec3d7d

Add citation for MuST-C dataset in Emformer RNNT pipeline. (#2728)

Summary:
The MuST-C reference is added in https://github.com/pytorch/audio/pull/2689. This PR adds the citation to the RNNT pipeline documentation.

Pull Request resolved: https://github.com/pytorch/audio/pull/2728

Reviewed By: carolineechen

Differential Revision: D39990882

Pulled By: nateanl

fbshipit-source-id: 011057952dd8aa30a4cb7c7af0ac75123e329d7e

Adopt :autosummary: to multiple modules (#2664)

Summary:
Adopt `:autosummary:` to various modules

    * torchaudio.compliance.kaldi
    * torchaudio.sox_effects
    * torchaudio.utils

Pull Request resolved: https://github.com/pytorch/audio/pull/2664

Reviewed By: nateanl

Differential Revision: D39841873

Pulled By: mthrok

fbshipit-source-id: ff4fa6976324fca5f35b737b715f976e2a722bac

Add StreamWriter media device/streaming tutorial (#2708)

Summary:
https://output.circle-artifacts.com/output/job/213c71c8-c9b5-4516-af92-a2f8dab2c9fd/artifacts/0/docs/tutorials/streamwriter_advanced.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2708

Reviewed By: carolineechen

Differential Revision: D40013310

Pulled By: mthrok

fbshipit-source-id: 7226b021ce2fe951b3bf0bd41e93a6bbcf696124

Tweak tutorials (#2733)

Summary:
* Port downstream change https://github.com/pytorch/tutorials/pull/2060
* Fix inter-tutorial links and references

Pull Request resolved: https://github.com/pytorch/audio/pull/2733

Reviewed By: hwangjeff

Differential Revision: D40086902

Pulled By: hwangjeff

fbshipit-source-id: 00b04c6a1b68fb9fadd52b610b26ecaab15d52d8

Increase CircleCi no_output_timeout for `install binaries` steps (#2734)

Summary:
The goal is to reduce the number of job failures due to timeouts, see https://app.circleci.com/pipelines/github/pytorch/audio/12882/workflows/f99da1a5-32e6-4bac-8ceb-fbf36d693e2d/jobs/936363?invite=true#step-105-105 for example.

Pull Request resolved: https://github.com/pytorch/audio/pull/2734

Reviewed By: weiwangmeta, atalman

Differential Revision: D40077578

fbshipit-source-id: 573f43a4d088a7086fa6925ac5ba1fdd1e8f39ec

Torchaudio load libary path fix for windows python 3.8 (#2735)

Summary:
Torchaudio load library path fix for Windows and Python = 3.8

Fixes: https://github.com/pytorch/audio/issues/2726

Fixes following issue:

```
>>> import torchaudio
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 128, in <module>
    _init_extension()
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 98, in _init_extension
    _load_lib("libtorchaudio")
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 52, in _load_lib
    torch.ops.load_library(path)
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torch\_ops.py", line 573, in load_library
    ctypes.CDLL(path)
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\ctypes\__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\atalman\miniconda3\envs\mywin38\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
>>>
```

This is caused by DLLs not being found in the conda environment directory:
```
C:\Users\atalman\miniconda3\envs\mywin38\bin\
```

While this directory is set correctly in PATH, it is ignored with Python = 3.8.
Please refer to: https://stackoverflow.com/questions/59330863/cant-import-dll-module-in-python
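
On Windows, Python 3.8 and later no longer consult `PATH` when resolving the dependencies of a DLL loaded through `ctypes` or as an extension module; extra directories have to be registered with `os.add_dll_directory`. The sketch below shows that general mechanism only. It is not necessarily the change made in this PR, and `register_dll_dirs` is a hypothetical helper name.

```python
import os
import sys
from pathlib import Path
from typing import List


def register_dll_dirs(candidates: List[Path]) -> List[Path]:
    """Register existing directories for DLL lookup on Windows with Python 3.8+.

    Returns the directories that were registered. On other platforms, or on
    older Python versions, nothing is registered and the list is empty.
    """
    registered: List[Path] = []
    if sys.platform != "win32" or not hasattr(os, "add_dll_directory"):
        return registered
    for path in candidates:
        resolved = path.resolve()
        if resolved.is_dir():
            # os.add_dll_directory requires an absolute path to an existing directory.
            os.add_dll_directory(str(resolved))
            registered.append(resolved)
    return registered


# Example: the directory names here are assumptions for illustration.
env_root = Path(sys.prefix)
register_dll_dirs([env_root / "bin", env_root / "Library" / "bin"])
```

Directories that do not exist are skipped, so the call is safe to make unconditionally before loading the extension library.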

Pull Request resolved: https://github.com/pytorch/audio/pull/2735

Reviewed By: carolineechen

Differential Revision: D40112293

Pulled By: carolineechen

fbshipit-source-id: c7fc9bb49fc3ec4a2855c6ea473f36808103ed1e

Add StreamWriter tutorial (#2698)

Summary:
Add a tutorial for basic usage of torchaudio.io.StreamWriter.

https://output.circle-artifacts.com/output/job/55d9a495-af7a-483c-84cb-de9a08cfd2f3/artifacts/0/docs/tutorials/streamwriter_basic_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2698

Reviewed By: carolineechen

Differential Revision: D40133007

Pulled By: carolineechen

fbshipit-source-id: 141f692c32343981bfb228357f21562ffe36f623

Fix sphinx gallery list in io doc (#2736)

Summary:
Specifying multiple objects in the `:minigallery:` directive shows duplicated tutorials.

This commit fixes it by listing tutorials based on module used.

https://output.circle-artifacts.com/output/job/c3da2a22-40d5-4e2d-b73a-28b39e712817/artifacts/0/docs/io.html

Before:
<img width="694" alt="Screen Shot 2022-10-07 at 7 04 35 AM" src="https://user-images.githubusercontent.com/855818/194427092-ca1202e7-0731-4c18-b48b-24923d692a4a.png">

After:

<img width="648" alt="Screen Shot 2022-10-07 at 7 03 14 AM" src="https://user-images.githubusercontent.com/855818/194426950-5b780458-2bf0-43ef-b020-fcbbfdf8d41b.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2736

Reviewed By: carolineechen

Differential Revision: D40160247

Pulled By: carolineechen

fbshipit-source-id: 547496f9b569ff7a4d70db97e90f3ea503344477

Modify `info_audio` to compute and return number of frames if not found in stream info (#2740)

Summary:
Modifies `info_audio` to compute and return number of frames if not found in stream info. This resolves the `num_frames == 0` issue for mp3 that's cited in https://github.com/pytorch/audio/issues/2524.

Pull Request resolved: https://github.com/pytorch/audio/pull/2740

Reviewed By: nateanl

Differential Revision: D40168639

Pulled By: nateanl

fbshipit-source-id: bb45baa0f9cd56844315b04e40ab9835d825fc24

Update sox info docstring to account for mp3 frame count handling (#2742)

Summary:
Updates sox info docstring to account for mp3 frame count handling fix introduced in https://github.com/pytorch/audio/issues/2740.

Pull Request resolved: https://github.com/pytorch/audio/pull/2742

Reviewed By: nateanl

Differential Revision: D40189846

Pulled By: nateanl

fbshipit-source-id: d6371418d7d4867dd0b97ee72ebf846d5c93dc30

Update HW video processing tutorial (#2739)

Summary:
* Add HW encoding to HW tutorial

https://colab.research.google.com/drive/1DDah_IaGULEO66CfQWltRqaVheBkiXdN#scrollTo=eXzKSVrHk1vS

Pull Request resolved: https://github.com/pytorch/audio/pull/2739

Reviewed By: hwangjeff

Differential Revision: D40197086

Pulled By: hwangjeff

fbshipit-source-id: 1780a5419f6705f7c24ba96bd46c3310438af7db

Add IEMOCAP dataset (#2732)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2732

Reviewed By: nateanl

Differential Revision: D40186996

Pulled By: nateanl

fbshipit-source-id: a0ad325b7153c9e580dad2c515730dadbe8840c4

Fix HuBERT docstring (#2746)

Summary:
The docstring of `wav2vec2` argument is wrong. Fix it in this PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/2746

Reviewed By: carolineechen

Differential Revision: D40225995

Pulled By: nateanl

fbshipit-source-id: 770e9c928ebebd7b6307e181601eb64625d668da

Add unit test for LibriMix dataset (#2659)

Summary:
Besides the unit test, the PR also addresses these issues:
- The original `LibriMix` dataset only supports "min" mode, which means the audio length is the minimum of all clean sources. It is the default for the source separation task. Users may also want to use "max" mode, which allows for end-to-end separation and recognition. The PR adds a ``mode`` argument to let users decide which dataset they want to use.
- If the task is ``"enh_both"``, the target is the audio in ``mix_clean`` instead of the separate clean sources. The PR fixes it to use ``mix_clean`` as the target.
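
To illustrate the length convention behind the two modes, here is a small sketch on plain Python lists. It assumes that "max" mode zero-pads shorter sources to the longest one. `align_sources` is a hypothetical helper for illustration; the dataset itself reads pre-generated files and does not align audio this way at load time.

```python
from typing import List


def align_sources(sources: List[List[float]], mode: str = "min") -> List[List[float]]:
    """Bring all sources to a common length.

    "min": truncate every source to the shortest length.
    "max": zero-pad every source to the longest length (assumed convention).
    """
    lengths = [len(s) for s in sources]
    if mode == "min":
        target = min(lengths)
        return [s[:target] for s in sources]
    if mode == "max":
        target = max(lengths)
        return [s + [0.0] * (target - len(s)) for s in sources]
    raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")


speaker_1 = [0.1, 0.2, 0.3]
speaker_2 = [0.4, 0.5]
print(align_sources([speaker_1, speaker_2], mode="min"))
print(align_sources([speaker_1, speaker_2], mode="max"))
```

In "min" mode the tail of the longer utterance is dropped, which is why "max" mode is the better fit when the full utterance is needed for recognition.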

Pull Request resolved: https://github.com/pytorch/audio/pull/2659

Reviewed By: carolineechen

Differential Revision: D40229227

Pulled By: nateanl

fbshipit-source-id: fc07e0d88a245e1367656d3767cf98168a799235

Add Snips Dataset (#2738)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2738

Reviewed By: carolineechen

Differential Revision: D40238099

Pulled By: nateanl

fbshipit-source-id: c5cc94c2a348a6ef34c04b8dd26114ecb874d73e

Fix windows python 3.8 loading path (#2747)

Summary:
Fix windows python 3.8 loading path

Pull Request resolved: https://github.com/pytorch/audio/pull/2747

Reviewed By: nateanl

Differential Revision: D40264326

Pulled By: nateanl

fbshipit-source-id: f4a24757de7b48c63a7481034eb11fc3ff174327

Add metadata for Librimix (#2751)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2751

Reviewed By: nateanl

Differential Revision: D40267874

Pulled By: carolineechen

fbshipit-source-id: 4e45a02c650ed65c05cde82289a400a3be877927

Increase inactivity timeout for binary build jobs (#2754)

Summary:
Increase inactivity timeout for binary build jobs

Pull Request resolved: https://github.com/pytorch/audio/pull/2754

Reviewed By: carolineechen

Differential Revision: D40275368

Pulled By: atalman

fbshipit-source-id: 5e682bb78bda640d615f874fbdf0e650b5a38ee0

Skip hubert xlarge torchscript test (#2758)

Summary:
A couple of CircleCI unit tests are failing during the HuBERT xlarge TorchScript test, which has been known to fail on Windows in the past (#65776). This PR disables this test on CircleCI.

cc atalman

Pull Request resolved: https://github.com/pytorch/audio/pull/2758

Reviewed By: mthrok

Differential Revision: D40290535

Pulled By: carolineechen

fbshipit-source-id: 5c5fb43434a517b6c439a8cb8e853015d1550a57

Improve wav2vec2/hubert model for pre-training (#2716)

Summary:
This PR improves the Wav2Vec2/HuBERT model regarding model pre-training.

- The initialization of the positional embedding and transformer module is essential to model pre-training. The accuracy on unmasked frames should be higher than on masked frames, as it is an easier task, but without the initialization the accuracy on masked frames is higher than on unmasked frames.
  The performance was compared after two epochs with 16 GPUs.
  - With model initialization, the accuracies of masked/unmasked frames are 0.08/0.11.
  - Without model initialization, the accuracies of masked/unmasked frames are 0.06/0.04.
- After adding the model initialization, the gradients overflow easily (i.e. become `nan`). The paper [Self-Supervised Learning for speech recognition with Intermediate layer supervision](https://arxiv.org/abs/2112.08778) proposes a simple but effective method to mitigate the overflow: scale down the product of query and key, then subtract its maximum value (subtracting a constant does not change the output of softmax). This guarantees that the values do not overflow.
- In the original fairseq, the mask indices are generated by `numpy.random.choice`. This PR replaces `torch.multinomial` with `torch.randperm`. (cc carolineechen).
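
The overflow mitigation described in the list above relies on softmax being invariant to a constant shift. A minimal plain-Python sketch of that idea follows; `scaled_stable_softmax` and its `scale` parameter are hypothetical names, the scale value is an assumption, and this is not the code added to the model.

```python
import math
from typing import List


def scaled_stable_softmax(scores: List[float], scale: float = 1.0) -> List[float]:
    """Softmax over scaled scores, shifted so the largest exponent is zero."""
    scaled = [s * scale for s in scores]
    peak = max(scaled)
    # After the shift every exponent is <= 0, so math.exp cannot overflow.
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# A naive math.exp(1000.0) raises OverflowError; the shifted form does not.
weights = scaled_stable_softmax([1000.0, 1001.0])
print(weights)
```

Because the shift cancels in the ratio, `scaled_stable_softmax([1.0, 2.0, 3.0])` and `scaled_stable_softmax([101.0, 102.0, 103.0])` give the same weights.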

Other improvements within training scripts will be included in a separate PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/2716

Reviewed By: xiaohui-zhang

Differential Revision: D39832189

Pulled By: nateanl

fbshipit-source-id: f4d2a473a79ad63add2dd16624bd155d5ce4de27

Improve hubert recipe for pre-training and fine-tuning (#2744)

Summary:
Follow-up to https://github.com/pytorch/audio/issues/2716.
- For preprocessing
  - The HuBERT features take a lot of memory, which may not fit on some machines. Allow using a subset of the features to train the k-means model.

- For pre-training
  - Normalize the loss based on the total number of masked frames across all GPUs.
  - Use mixed precision training. fp16 is not well supported in pytorch_lightning.
  - Log accuracies of masked/unmasked frames during training.
  - Clip the gradients with norm `10.0`.

- For ASR fine-tuning
  - Normalize the loss based on the total number of batches across all GPUs, same as in the conformer recipe of TorchAudio.
  - Use mixed precision training.
  - Add "|" after the end of transcription to capture the silence/word termination, same as in fairseq recipe.

- Update the WER results on LibriSpeech dev and test sets.

|                   | WER% (Viterbi)|  WER% (KenLM) |
|:-----------------:|--------------:|--------------:|
| dev-clean         |       10.9    |       4.2     |
| dev-other         |       17.5    |       9.4     |
| test-clean        |       10.9    |       4.4     |
| test-other        |       17.8    |       9.5     |
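
The pre-training loss normalization listed above (dividing by the total number of masked frames across all GPUs) differs from averaging each GPU's own mean loss. The following plain-Python sketch simulates the workers as a list of `(summed loss, masked frame count)` pairs with made-up numbers; the function names are hypothetical, and a real distributed run would sum the counts with a collective operation, which is omitted here.

```python
from typing import List, Tuple


def globally_normalized_loss(per_worker: List[Tuple[float, int]]) -> float:
    """Divide the total summed loss by the total masked frames of all workers."""
    total_loss = sum(loss for loss, _ in per_worker)
    total_frames = sum(frames for _, frames in per_worker)
    if total_frames == 0:
        raise ValueError("no masked frames in this step")
    return total_loss / total_frames


def mean_of_local_means(per_worker: List[Tuple[float, int]]) -> float:
    """Average of each worker's own per-frame loss, shown for contrast."""
    return sum(loss / frames for loss, frames in per_worker) / len(per_worker)


# Illustrative values only: worker 0 has few masked frames, worker 1 has many.
workers = [(10.0, 5), (2.0, 20)]
print(globally_normalized_loss(workers))
print(mean_of_local_means(workers))
```

With global normalization every masked frame carries the same weight regardless of which GPU processed it, whereas the mean of local means overweights frames on workers with few masked frames.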

Pull Request resolved: https://github.com/pytorch/audio/pull/2744

Reviewed By: carolineechen

Differential Revision: D40282322

Pulled By: nateanl

fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90

Fix typos in tacotron2 tutorial (#2761)

Summary:
`publishe`->`published`

Also, not sure if it should be `pre-trained weight is published` or `pre-trained weights are published`

Pull Request resolved: https://github.com/pytorch/audio/pull/2761

Reviewed By: carolineechen

Differential Revision: D40313042

Pulled By: malfet

fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b

Add gtzan download note (#2763)

Summary:
GTZAN download link is no longer working, so the torchaudio download functionality for GTZAN does not work properly, per https://github.com/pytorch/audio/issues/2743. Add a note in the docs to reflect this discovery.

Pull Request resolved: https://github.com/pytorch/audio/pull/2763

Reviewed By: nateanl, mthrok

Differential Revision: D40315071

Pulled By: carolineechen

fbshipit-source-id: 3250326c45d227546a9c62b33ba890199ad19242

Update tutorial author information (#2764)

Summary:
Adding and updating author information.

Pull Request resolved: https://github.com/pytorch/audio/pull/2764

Reviewed By: carolineechen

Differential Revision: D40332427

Pulled By: mthrok

fbshipit-source-id: 4f04c7351386c122e3b0a45c2ed1757a04b7dc9a

Add custom lm example to decoder tutorial (#2762)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2762

Reviewed By: mthrok

Differential Revision: D40332603

Pulled By: carolineechen

fbshipit-source-id: 2de51265adc81b4728f4d6798d287bd2eccf5251

Fix CTCDecoder doc (#2766)

Summary:
* Document `__call__` instead of `__init__`
* List CTCHypothesis first as it is used in combination with CTCDecoder
* Fix indentation of score method docstring

Pull Request resolved: https://github.com/pytorch/audio/pull/2766

Reviewed By: carolineechen

Differential Revision: D40349388

Pulled By: mthrok

fbshipit-source-id: 5e512e6c2b29d3533eb62d09b289154ccd1abf4c

Fix fading in hybrid demucs tutorial (#2769)

Summary:
The separation is applied to chunks of audio to avoid OOM. The way consecutive chunks are combined is shown in the figure:

![image](https://user-images.githubusercontent.com/8653221/195691886-002844e6-4ec5-41de-8910-df8046553998.png)

In the last audio chunk, there is no future chunk to be combined, hence the overlap on the right side doesn't need to be faded.
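The chunk-combination rule described above can be sketched in plain Python. This is only an illustration, assuming a linear cross-fade over the overlap and list-based signals; the names `fade_weights` and `separate_in_chunks` are hypothetical and are not the tutorial's code. The default `model` is an identity function standing in for the separation model.

```python
def fade_weights(length, overlap, fade_in, fade_out):
    """Per-sample weights for one chunk, using a linear ramp over the overlap."""
    weights = [1.0] * length
    for i in range(min(overlap, length)):
        ramp = (i + 1) / (overlap + 1)
        if fade_in:
            weights[i] *= ramp
        if fade_out:
            weights[length - 1 - i] *= ramp
    return weights


def separate_in_chunks(signal, chunk_len, overlap, model=lambda chunk: chunk):
    """Process `signal` chunk by chunk and cross-fade the overlapping regions."""
    if chunk_len <= 0 or overlap < 0 or chunk_len < 2 * overlap:
        raise ValueError("expected chunk_len > 0 and 0 <= 2 * overlap <= chunk_len")
    hop = chunk_len - overlap
    out = [0.0] * len(signal)
    start = 0
    while start < len(signal):
        end = min(start + chunk_len, len(signal))
        is_first = start == 0
        is_last = end == len(signal)
        processed = model(signal[start:end])
        # The first chunk has no past chunk and the last chunk has no future
        # chunk, so those sides are left unfaded.
        weights = fade_weights(
            len(processed), overlap, fade_in=not is_first, fade_out=not is_last
        )
        for i, (x, w) in enumerate(zip(processed, weights)):
            out[start + i] += x * w
        if is_last:
            break
        start += hop
    return out


signal = [float(i) for i in range(23)]
restored = separate_in_chunks(signal, chunk_len=8, overlap=3)
```

With the identity model the cross-faded output reproduces the input, which is a quick way to check that the fade-in and fade-out weights sum to one in every overlap and that the tail of the last chunk is not attenuated.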

Pull Request resolved: https://github.com/pytorch/audio/pull/2769

Reviewed By: carolineechen

Differential Revision: D40358382

Pulled By: nateanl

fbshipit-source-id: ec8be895d7a67acb257e2693b64922397163ed5e

Fix leaking matplotlib figure (#2771)

Summary:
In the StreamWriter basic usage tutorial, matplotlib is used to generate raster images of waveforms, and the figure is kept out of the rendered tutorial with the ``sphinx_gallery_defer_figures`` command.

It turned out that the deferred figure is shown by the next code block that Sphinx Gallery executes, so it ends up in a totally unrelated place. https://pytorch.org/audio/main/tutorials/audio_feature_extractions_tutorial.html

<img width="951" alt="Screen Shot 2022-10-14 at 10 06 58 PM" src="https://user-images.githubusercontent.com/855818/195855124-ecd9be49-5085-4acd-9a93-608d9d1ee9ce.png">

This commit fixes it by closing the figure.

Pull Request resolved: https://github.com/pytorch/audio/pull/2771

Reviewed By: nateanl

Differential Revision: D40382076

Pulled By: mthrok

fbshipit-source-id: 015f2bab8492d3b4fbe70e1174c7776a5aa2679a

Update resampling tutorial (#2773)

Summary:
* Refactor benchmark script
* Rename `time` variable to avoid a potential conflict with the `time` module
* Fix `beta` parameter in benchmark (it was not used previously)
* Use `timeit` module for benchmark
* Add plot
* Move the comment on the results to the end
* Add link to an explanation of aliasing
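
A benchmark helper built on `timeit` might look like the sketch below. It is not the tutorial's script: `benchmark` and `decimate` are hypothetical names, and `decimate` is a trivial stand-in for a resampling call. Keyword arguments are forwarded to the function under test, so a parameter such as `beta` cannot be silently dropped.

```python
import timeit


def benchmark(fn, *args, number=100, repeat=3, **kwargs):
    """Return the best average wall time per call, in milliseconds."""
    timer = timeit.Timer(lambda: fn(*args, **kwargs))
    runs = timer.repeat(repeat=repeat, number=number)
    return min(runs) / number * 1000.0


def decimate(samples, factor):
    """Placeholder workload: keep every `factor`-th sample."""
    return samples[::factor]


# The result is stored in `elapsed_ms`, not `time`, to avoid shadowing the module.
elapsed_ms = benchmark(decimate, list(range(48000)), factor=3, number=20)
print(f"{elapsed_ms:.4f} ms per call")
```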

https://output.circle-artifacts.com/output/job/20b57d2f-3614-4161-a18e-e0c1a537739c/artifacts/0/docs/tutorials/audio_resampling_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2773

Reviewed By: carolineechen

Differential Revision: D40421337

Pulled By: mthrok

fbshipit-source-id: b402f84d4517695daeca75fb84ad876ef9354b3a

Update description of HDemucs pipelines (#2774)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2774

Reviewed By: carolineechen

Differential Revision: D40445274

Pulled By: nateanl

fbshipit-source-id: 6388323a5fa5c548a86829cb3f7cafee5382d18d

Add file_name to the returned item in Snips dataset (#2775)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2775

Reviewed By: carolineechen

Differential Revision: D40481144

Pulled By: nateanl

fbshipit-source-id: 5d0fb2478767704603a3ec28d74160e7892d4d0e

Update download path for speechcommands (#2777)

Summary:
The previous download link for v0.02 did not download the entire dataset, only the training data, which caused issues when trying to access the testing or validation data.

Pull Request resolved: https://github.com/pytorch/audio/pull/2777

Reviewed By: nateanl

Differential Revision: D40480605

Pulled By: carolineechen

fbshipit-source-id: a594506b4ccfb548a7d5043b716c58463480c103

Add notes on file structure in Voxceleb1 based datasets (#2776)

Summary:
The file structure of VoxCeleb1 is as follows:
```
root/
└── wav/
    └── speaker_id folders
```
Users who use [Kaldi](https://github.com/kaldi-asr/kaldi/blob/f6f4ccaf213f0fe8b26e633a7dc0c802150626a0/egs/voxceleb/v1/local/make_voxceleb1_v2.pl) to get the VoxCeleb1 dataset have "dev" and "test" folders above the "wav" folder. However, the file lists such as https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt or https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/iden_split.txt make no such distinction, so it is not necessary to put the extracted files into separate folders.

This PR adds notes to the `VoxCeleb1Identification` and `VoxCeleb1Verification` datasets to inform users of the expected file structure.
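
As an illustration of why no "dev"/"test" level is needed, the sketch below resolves one file-list entry directly under `root/wav/`. It assumes each line of the verification list has the form `<label> <path1> <path2>` with paths relative to `wav/`; the helper name `resolve_pair` is hypothetical and is not part of the dataset classes.

```python
from pathlib import Path


def resolve_pair(root, line):
    """Map one verification-list line to (label, first_wav, second_wav)."""
    label, first, second = line.split()
    wav_dir = Path(root) / "wav"  # speaker_id folders live directly under wav/
    return int(label), wav_dir / first, wav_dir / second


label, first_wav, second_wav = resolve_pair(
    "root", "1 id10270/5r0dWxy17C8/00001.wav id10270/x6uYqmx31kE/00001.wav"
)
```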

Pull Request resolved: https://github.com/pytorch/audio/pull/2776

Reviewed By: carolineechen

Differential Revision: D40483707

Pulled By: nateanl

fbshipit-source-id: ccd1780a72a5b53f0300c2466c3073a293ad7b8d

[Nova] New GHA Workflow for Docstring Sync (#2720)

Summary:
Create a standalone GitHub Actions workflow for Docstring Sync. This job (https://app.circleci.com/pipelines/github/pytorch/audio/12625/workflows/96223ad2-0fcd-4dae-a045-d530aaf9b55c/jobs/907466) currently depends on linux wheels builds, which creates a dependency that makes the migration to Nova trickier. This PR creates a fresh standalone workflow for this job that is triggered per-PR and before nightly/release cuts.

Pull Request resolved: https://github.com/pytorch/audio/pull/2720

Reviewed By: izaitsevfb, seemethere

Differential Revision: D39863574

Pulled By: osalpekar

fbshipit-source-id: 8599dc006693242278857a3dedeb4fddc1eed14b

[Nova] Clean commit for Enabling Nova Linux Wheels Workflows (#2719)

Summary:
Creating this fresh PR since we're reverting the older commit that removed build configs from the CircleCI file. This does not change the existing builds/uploads in CircleCI, and should not break any existing jobs/workflows. This is just to add back workflows to build the Linux Wheels with Nova, upload them to GH artifacts (NOT to the actual nightly channels), and ensure that they produce the same binaries as CircleCI. TO CLARIFY: this does not upload anything to nightly channels, so this PR has no effect on any existing jobs or distributed binaries.

We will create a workflow (most likely in test-infra) that does this comparison between the binaries to ensure there is parity between the binaries before we start uploading with Nova.

Pull Request resolved: https://github.com/pytorch/audio/pull/2719

Reviewed By: hwangjeff, weiwangmeta

Differential Revision: D39866440

Pulled By: osalpekar

fbshipit-source-id: 9ebf0402214fcd97cc519801276d85d336617410

Add iemocap variants (#2778)

Summary:
Add the ability to load only improvised or only scripted utterances.

Pull Request resolved: https://github.com/pytorch/audio/pull/2778

Reviewed By: nateanl

Differential Revision: D40511865

Pulled By: carolineechen

fbshipit-source-id: e1fe3908ac2aa306ad30c242ddd25762b2268539

Bump version to 0.14 (#2779)

Summary:
Bump version to 0.14

Pull Request resolved: https://github.com/pytorch/audio/pull/2779

Reviewed By: carolineechen

Differential Revision: D40523034

Pulled By: atalman

fbshipit-source-id: 325e6ffcac4763a7d83ba600c2c3d9eadae03c31

Fix doc in torchaudio.backend (#2781)

Summary:
Addresses https://github.com/pytorch/audio/issues/2780

Pull Request resolved: https://github.com/pytorch/audio/pull/2781

Reviewed By: carolineechen, mthrok

Differential Revision: D40556794

Pulled By: nateanl

fbshipit-source-id: b24912489d41e5663b4b4dcfb8be743fb962097e

Remove archive file in gh-pages branch (#2786)

Summary:
The motivation for generating `artifact.tar.gz` in the `build_docs` job is to make it easy to add documentation for each stable release. However, the file is committed to the `gh-pages` branch, which makes the git repository very large (see https://github.com/pytorch/audio/issues/2783). This PR removes the tar file from the commit.

Pull Request resolved: https://github.com/pytorch/audio/pull/2786

Reviewed By: caroli…
BriansIDP pushed a commit to BriansIDP/audio that referenced this pull request Jan 23, 2023
Conformer RNN-T with TCPGen for biasing

first commit BrianSun

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674296079 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674296047 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295932 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295795 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295664 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295524 +0000

parent c68625152ad84f9ea4e881fac695f7d98ee326a9
author Caroline Chen <carolinechen@fb.com> 1659983982 -0700
committer G. Sun <gs534@login-e-3.data.cluster> 1674295462 +0000

Fix stylecheck (#2606)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2606

Reviewed By: nateanl

Differential Revision: D38502666

Pulled By: carolineechen

fbshipit-source-id: 1e279996fff3621835a07882c63328856fe38f3a

Add NNLM support to CTC Decoder (#2528)

Summary:
Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs.

The `ctc_decoder` API is as follows:
- To decode with KenLM, pass the path to the KenLM language model to the `lm` variable
- To decode with a custom LM, create a Python class that subclasses `CTCDecoderLM`, and pass it to the `lm` variable. Additionally, create a file of LM words listed in order of the LM index, one word per line, and pass the file to `lm_path`.
- To decode without a language model, set `lm` to `None` (default)
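
The word file mentioned for the custom-LM case can be produced with a few lines of standard-library Python. This is a sketch under the assumption that the LM exposes a word-to-index mapping with contiguous indices starting at 0; `write_lm_words` and the sample vocabulary are hypothetical.

```python
import os
import tempfile


def write_lm_words(word_to_index, path):
    """Write one word per line so that line i holds the word with LM index i."""
    if sorted(word_to_index.values()) != list(range(len(word_to_index))):
        raise ValueError("LM indices must be exactly 0..N-1")
    ordered = sorted(word_to_index, key=word_to_index.get)
    with open(path, "w", encoding="utf-8") as f:
        for word in ordered:
            f.write(word + "\n")
    return ordered


with tempfile.TemporaryDirectory() as tmp:
    words_path = os.path.join(tmp, "lm_words.txt")
    write_lm_words({"cat": 1, "the": 0, "sat": 2}, words_path)
    with open(words_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
```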

Validated against fairseq w2l decoder on sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests to validate custom implementations of ZeroLM and KenLM, and also using a biased LM.

Follow ups:
- Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory

cc jacobkahn

Pull Request resolved: https://github.com/pytorch/audio/pull/2528

Reviewed By: mthrok

Differential Revision: D38243802

Pulled By: carolineechen

fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7

Fix dataset docs parsing issue with extra spaces (#2607)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2607

Reviewed By: carolineechen, nateanl

Differential Revision: D38522606

Pulled By: skim0514

fbshipit-source-id: 2c38b8dcb343bcf624bfda1bfa2afd91abf2e668

Fixed argument validation in TorchAudio filtering (#2609)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2609

Converted argument validations in torchaudio/functional/filtering from assert based validation to the preferred if-then raise validation. Added specific error messages in all cases.
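
The difference between the two validation styles can be shown with a small sketch. The function and parameter below are hypothetical and are not taken from `torchaudio/functional/filtering`; the point is that `assert` statements are stripped when Python runs with `-O`, while an explicit `raise` always fires and carries a specific message.

```python
def validate_q_assert(q):
    # Assert-based check: removed entirely under `python -O`.
    assert q > 0
    return q


def validate_q(q):
    # If-then-raise check: always active, with a specific error message.
    if q <= 0:
        raise ValueError(f"Expected q to be positive. Found: {q}")
    return q
```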

Reviewed By: mthrok

Differential Revision: D38515029

fbshipit-source-id: 6c644a042f86c6feb2bbe8bd02fdb484fe27fae9

Fix bug in Conformer RNN-T recipe (#2611)

Summary:
https://github.com/pytorch/audio/issues/2535 modified the Conformer RNN-T Lightning module to accept a SentencePiece model instance rather than a file path. This PR makes changes to account for this in the train script.

Pull Request resolved: https://github.com/pytorch/audio/pull/2611

Reviewed By: carolineechen

Differential Revision: D38578892

Pulled By: hwangjeff

fbshipit-source-id: ec3b9823ad30ffb730baa13d10d8b79020866aac

Add additive noise function (#2608)

Summary:
Adds function `add_noise`, which computes and returns the sum of a waveform and scaled noise.
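
A plain-Python sketch of "waveform plus scaled noise" follows. It assumes the scale is chosen so that the result has a requested signal-to-noise ratio in dB, which is an assumption of this example; the name `add_scaled_noise` and its list-based signature are hypothetical and do not reproduce the `add_noise` API.

```python
import math


def add_scaled_noise(waveform, noise, snr_db):
    """Return waveform + scale * noise, with scale set from the target SNR."""
    if len(waveform) != len(noise):
        raise ValueError("waveform and noise must have the same length")
    signal_energy = sum(x * x for x in waveform)
    noise_energy = sum(n * n for n in noise)
    if noise_energy == 0:
        raise ValueError("noise must not be all zeros")
    # Solve 10 * log10(signal_energy / (scale**2 * noise_energy)) = snr_db.
    scale = math.sqrt(signal_energy / (noise_energy * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(waveform, noise)]


clean = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy = add_scaled_noise(clean, noise, snr_db=20.0)
```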

Pull Request resolved: https://github.com/pytorch/audio/pull/2608

Reviewed By: nateanl

Differential Revision: D38557141

Pulled By: hwangjeff

fbshipit-source-id: 1457fa213f43ca5b4333d3c7580971655d4260a0

Introducing pytorch-cuda metapackage (#2612)

Summary:
Introducing pytorch-cuda metapackage

Same as: https://github.com/pytorch/vision/pull/6371
Following PR: https://github.com/pytorch/builder/pull/1094
Adds a CUDA metapackage called `pytorch-cuda`. This way we can make sure the correct versions of the CUDA dependencies are installed without depending on conda-forge.

Pull Request resolved: https://github.com/pytorch/audio/pull/2612

Reviewed By: hwangjeff, seemethere, nateanl

Differential Revision: D38633332

Pulled By: atalman

fbshipit-source-id: 78a6115bb252ebdb6d66a57d7d2c4a4978ddb501

Remove outdated doc (#2617)

Summary:
`ctc_decoder` has been promoted to beta, so remove it from the prototype documentation.

Pull Request resolved: https://github.com/pytorch/audio/pull/2617

Reviewed By: hwangjeff

Differential Revision: D38706869

Pulled By: nateanl

fbshipit-source-id: 41679f4e65a584b6b882af4551a50123f1dcef02

Update doc version selector link (#2605)

Summary:
The link to the version selector was an absolute link, which was
a trap when reviewing gh-pages deployments from a fork.

This commit changes that to relative link.

Pull Request resolved: https://github.com/pytorch/audio/pull/2605

Test Plan:
- https://mthrok.github.io/audio/main/index.html -> click version selector -> https://mthrok.github.io/audio/versions.html
- https://mthrok.github.io/audio/0.12.1/index.html -> click version selector -> https://pytorch.org/audio/versions.html

Reviewed By: carolineechen, nateanl

Differential Revision: D38695645

Pulled By: mthrok

fbshipit-source-id: 91132ac19b8c61f39d304a162435b9c6599ef2b2

Fix anaconda upload (#2621)

Summary:
Same as:
https://github.com/pytorch/vision/pull/6422

Testing:
```
export ANACONDA_PATH=$(conda info --base)/bin
echo $ANACONDA_PATH
/opt/homebrew/Caskroom/miniconda/base/bin
$ANACONDA_PATH/anaconda -V
anaconda Command line client (version 1.10.0)
```
Failure: https://github.com/pytorch/audio/runs/7837085749?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/audio/pull/2621

Reviewed By: weiwangmeta, seemethere

Differential Revision: D38714324

Pulled By: atalman

fbshipit-source-id: 55342cf69006e9250403c955202846bab4516f3e

Move xcode to 14 from 12.5 (#2622)

Summary:
Similar to https://github.com/pytorch/vision/pull/6218
Fixing MacOS builds

Pull Request resolved: https://github.com/pytorch/audio/pull/2622

Reviewed By: weiwangmeta

Differential Revision: D38722983

Pulled By: atalman

fbshipit-source-id: 4cef85c97dc270fc812bc289592c4f3815f73c85

Added example for MelScale transform (#2616)

Summary:
Added example for MelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2616

Reviewed By: carolineechen

Differential Revision: D38743145

Pulled By: nateanl

fbshipit-source-id: e24ca92f5317a0ea5a141418bf084b12cfb22486

Added example for AmplitudeToDB transform (#2615)

Summary:
Added example for AmplitudeToDB transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2615

Reviewed By: carolineechen

Differential Revision: D38743117

Pulled By: nateanl

fbshipit-source-id: bf0f760299f4777a4bca65da86359faa00b16207

Use double quotes for string in functional and transforms (#2618)

Summary:
To make the code consistent, we should use double quotation marks for all strings. This PR makes such changes in functional and transforms.

Pull Request resolved: https://github.com/pytorch/audio/pull/2618

Reviewed By: carolineechen

Differential Revision: D38744137

Pulled By: nateanl

fbshipit-source-id: 74213a24d9f66c306cc92019d77dcb2a877f94bd

Fix doc warning (#2627)

Summary:
Resolves the following warning

```
/torchaudio/docs/source/transforms.rst:94: WARNING: Title underline too short.

:hidden:`Loudness`
-----------------
```
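
The warning above is emitted when a title's underline is shorter than the title text. A minimal check, assuming ASCII titles so that `len` matches the displayed width; the helper names are hypothetical.

```python
def underline_is_long_enough(title, underline):
    """reStructuredText requires the underline to be at least as long as the title."""
    return len(underline) >= len(title)


def make_underline(title, char="-"):
    """Build an underline that exactly matches the title length."""
    return char * len(title)


title = ":hidden:`Loudness`"
too_short = "-" * 17  # one character shorter than the 18-character title
fixed = make_underline(title)
```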

Pull Request resolved: https://github.com/pytorch/audio/pull/2627

Reviewed By: carolineechen

Differential Revision: D38814802

Pulled By: mthrok

fbshipit-source-id: 5dfaf2d7bae22dba0f4a14f04ca63f28d6b2a749

Fix Sphinx-gallery display and pin sphinx-related packages (#2629)

Summary:
This commit fixes the issue with the recent Sphinx-Gallery update.
Also it pins the versions of Sphinx-related packages.

Before:

<img width="256" alt="Screen Shot 2022-08-17 at 10 02 23 PM" src="https://user-images.githubusercontent.com/855818/185140952-28f2d98a-b586-424c-a003-b69089f48eb9.png">

After:

https://user-images.githubusercontent.com/855818/185271889-bd4f86a0-986b-43bb-8121-bd77750d74f0.mov

Pull Request resolved: https://github.com/pytorch/audio/pull/2629

Reviewed By: carolineechen

Differential Revision: D38816417

Pulled By: mthrok

fbshipit-source-id: 11ee3f9121d9a302772ee1f461dacae52eb28852

Tweak tutorials (#2630)

Summary:
Resolves the following warnings

```
/torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:195: WARNING: Unexpected indentation.
/torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:446: WARNING: Unexpected indentation.
/torchaudio/docs/source/tutorials/audio_io_tutorial.rst:559: WARNING: Content block expected for the "note" directive; none found.
/torchaudio/docs/source/tutorials/mvdr_tutorial.rst:338: WARNING: Bullet list ends without a blank line; unexpected unindent.
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2630

Reviewed By: nateanl

Differential Revision: D38816632

Pulled By: mthrok

fbshipit-source-id: 135ded4e064d136be67ce24439e96f5e9c9ce635

Update notes around nightly build and third parties (#2632)

Summary:
Google Colab now has torchaudio 0.12 pre-installed.
This commit removes the note about nightly build.

Pull Request resolved: https://github.com/pytorch/audio/pull/2632

Reviewed By: carolineechen

Differential Revision: D38827632

Pulled By: mthrok

fbshipit-source-id: ac769780868b741c3012357d589ec0019d9af6eb

Added example for InverseMelScale transform (#2635)

Summary:
Added example for InverseMelScale transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2635

Reviewed By: carolineechen

Differential Revision: D38830318

Pulled By: nateanl

fbshipit-source-id: fd26a700d495f6755db0767625aa8577cb89bd83

Update ASR inference tutorial (#2631)

Summary:
* Use download_asset
* Remove notes around nightly
* Print versions first
* Remove duplicated import

Pull Request resolved: https://github.com/pytorch/audio/pull/2631

Reviewed By: carolineechen

Differential Revision: D38830395

Pulled By: mthrok

fbshipit-source-id: c9259df33562defe249734d1ed074dac0fddc2f6

Update README.md (#2633)

Summary:
Update compatibility matrix

Pull Request resolved: https://github.com/pytorch/audio/pull/2633

Reviewed By: nateanl

Differential Revision: D38827670

Pulled By: mthrok

fbshipit-source-id: 5c66bf60a06e37919ee725a5f4adf571e6c89100

Refactor sox pybind source code (#2636)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2636

At the early stage of torchaudio extension module,
`torchaudio/csrc/pybind` directory was created so that
all the code defining Python interface would be placed
there and there will be only one extension module called
`torchaudio._torchaudio`.

However, the codebase has evolved so that separate
extensions are defined for each feature (third party
dependency) for the sake of more modular file organization.

What is left in `csrc/pybind` is libsox Python bindings.
This commit moves it under `csrc/sox`.

Follow-up: rename `torchaudio._torchaudio` to `torchaudio._torchaudio_sox`.

Reviewed By: carolineechen

Differential Revision: D38829253

fbshipit-source-id: 3554af45a2beb0f902810c5548751264e093f28d

Added example for MFCC transform (#2637)

Summary:
Added example for MFCC transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Note: Python formatter package `black` uses double quotes for the string dict keys (e.g. in `melkwargs` for this example). Please let me know if there is a different linter/format/convention that is preferred!

Pull Request resolved: https://github.com/pytorch/audio/pull/2637

Reviewed By: carolineechen

Differential Revision: D38873729

Pulled By: nateanl

fbshipit-source-id: 2e8fe2930671e7c5d02c0c37cf1ca5cc8c5079e3

Added example for Loudness transform (#2641)

Summary:
Added example for Loudness transform (implemented in PR https://github.com/pytorch/audio/issues/2472) as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2641

Reviewed By: nateanl

Differential Revision: D38907782

Pulled By: carolineechen

fbshipit-source-id: fd2bcc4bac3095a626ea9cf36cb70cb2bf003d63

Update Sphinx-gallery to 0.11.1 (#2638)

Summary:
The minor release fixes some gallery issues, which allows us to remove
some of the customization we had in https://github.com/pytorch/audio/issues/2629

https://output.circle-artifacts.com/output/job/553a9b98-8260-4cb4-a681-20ef97d2c33e/artifacts/0/docs/pipelines.html#torchaudio.pipelines.Wav2Vec2ASRBundle

Pull Request resolved: https://github.com/pytorch/audio/pull/2638

Reviewed By: carolineechen, nateanl

Differential Revision: D38909097

Pulled By: mthrok

fbshipit-source-id: 78346d93b54fca2a19b28991c224324ef53221c9

[Nova] Added draft calling GHA workflow for building linux wheels (#2548)

Summary:
As part of Project Nova, we are consolidating CI/CD workflows and infra, making them reusable across PyTorch ecosystem libraries. https://github.com/pytorch/test-infra/pull/460 introduces a general-purpose reusable workflow to build linux wheels for python libraries. This PR introduces a caller workflow that triggers the reusable workflow. Details around modular env setup, passing input args across workflows, etc. are still being worked out.

Using reusable workflow defined in https://github.com/pytorch/test-infra/pull/506

Pull Request resolved: https://github.com/pytorch/audio/pull/2548

Reviewed By: osalpekar

Differential Revision: D38947733

Pulled By: mehtanirav

fbshipit-source-id: 03ab88cef973a092f5c5d1ff8c74ec7ae7e46d01

Added example for LFCC transform (#2640)

Summary:
Added example for LFCC transform as mentioned in issue https://github.com/pytorch/audio/issues/1564.

Pull Request resolved: https://github.com/pytorch/audio/pull/2640

Reviewed By: carolineechen

Differential Revision: D38908975

Pulled By: nateanl

fbshipit-source-id: ffdd994390db7f27556b011a8050a65eef9cd09d

Add StreamWriter (#2628)

Summary:
This commit adds the FFmpeg-based encoder class StreamWriter.
StreamWriter is pretty much the opposite of the StreamReader class, and
it supports:

* Encoding audio / still image / video
* Exporting to local file / streaming protocol / devices etc...
* File-like object support (in later commit)
* HW video encoding (in later commit)

See also: https://fburl.com/gslide/z85kn5a9 (Meta internal)

Pull Request resolved: https://github.com/pytorch/audio/pull/2628

Reviewed By: nateanl

Differential Revision: D38816650

Pulled By: mthrok

fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8

[Nova] Build Linux Conda Binaries using reusable workflow (#2626)

Summary:
Calling the reusable workflow introduced in https://github.com/pytorch/test-infra/pull/546 to build conda binaries on linux.

Pull Request resolved: https://github.com/pytorch/audio/pull/2626

Reviewed By: mehtanirav

Differential Revision: D39028057

Pulled By: osalpekar

fbshipit-source-id: d74ea3771967d0ee2b0ad28a8f811a95145b2183

Replace bg_iterator in examples (#2645)

Summary:
`bg_iterator` was deprecated in 0.11 because it was known to cause issues (deadlock) without providing any speedup. Remove instances of `bg_iterator` used in torchaudio examples.

Resolves https://github.com/pytorch/audio/issues/2642

Pull Request resolved: https://github.com/pytorch/audio/pull/2645

Reviewed By: nateanl

Differential Revision: D38954292

Pulled By: carolineechen

fbshipit-source-id: 2333ab5228c2b8511ff532057543aaf9d02b2789

[Nova] Use pkg-helpers to modularize GHA Linux Conda Builds (#2650)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2650

Reviewed By: mehtanirav

Differential Revision: D39040559

Pulled By: osalpekar

fbshipit-source-id: df39e23d7c246728793aab969b8dc1070af88d75

add CUDA 11.7 builds (#2623)

Summary:
CC atalman

Pull Request resolved: https://github.com/pytorch/audio/pull/2623

Reviewed By: hwangjeff, nateanl

Differential Revision: D39036432

Pulled By: atalman

fbshipit-source-id: cd74a1bf8f74e31bd2c32c80d32c617f4b1766e8

Add file-like object support to StreamWriter (#2648)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2648

Reviewed By: nateanl

Differential Revision: D38976874

Pulled By: mthrok

fbshipit-source-id: 0541dea2a633d97000b4b8609ff6b83f6b82c864

Add CUDA HW encoding support to StreamWriter (#2505)

Summary:
This commit adds CUDA hardware encoding to StreamWriter.
For certain video formats, it can encode video directly from
a CUDA Tensor, without needing to move the data to the host CPU.

Pull Request resolved: https://github.com/pytorch/audio/pull/2505

Reviewed By: hwangjeff

Differential Revision: D37446830

Pulled By: mthrok

fbshipit-source-id: eee6424f01a99a3b611dcad45ed58f86cba4672a

Remove obsolete examples (#2655)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2655

Removed obsolete example and the corresponding test

Reviewed By: mthrok

Differential Revision: D39260253

fbshipit-source-id: 0bde71ffd75dd0c94a5cc4a9940f4648a5d61bd7

Add metadata function for LibriSpeech (#2653)

Summary:
Adding support for metadata mode, requested in https://github.com/pytorch/audio/issues/2539, by adding a public `get_metadata()` function in the dataset. This function can be used directly by users to fetch metadata for individual dataset indices, or users can subclass the dataset and override `__getitem__` with `get_metadata` to create a dataset class that directly handles metadata mode.
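
The subclassing pattern described above can be sketched with a toy dataset. `ToyDataset` is a hypothetical stand-in for the real dataset class, and its tuple layout is made up for illustration; only the idea of routing `__getitem__` to `get_metadata` comes from the description.

```python
class ToyDataset:
    """Stand-in dataset: each entry is (relative_path, sample_rate, transcript)."""

    def __init__(self, entries):
        self._entries = list(entries)

    def __len__(self):
        return len(self._entries)

    def get_metadata(self, n):
        # Returns the file path instead of decoded audio, so no file is read.
        return self._entries[n]

    def _load(self, path):
        raise RuntimeError("audio decoding is outside the scope of this sketch")

    def __getitem__(self, n):
        path, sample_rate, transcript = self.get_metadata(n)
        return self._load(path), sample_rate, transcript


class MetadataOnlyDataset(ToyDataset):
    """Dataset whose indexing returns metadata only."""

    def __getitem__(self, n):
        return self.get_metadata(n)


dataset = MetadataOnlyDataset([("spk1/utt1.flac", 16000, "HELLO WORLD")])
item = dataset[0]
```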

Pull Request resolved: https://github.com/pytorch/audio/pull/2653

Reviewed By: nateanl, mthrok

Differential Revision: D39105114

Pulled By: carolineechen

fbshipit-source-id: 6f26f1402a053dffcfcc5d859f87271ed5923348

Fix random Gaussian generation (#2639)

Summary:
This PR is meant to address the bug raised in issue https://github.com/pytorch/audio/issues/2634.

In particular, the Box-Muller transform was previously used to generate Gaussian variates for dithering from `torch.rand` uniform variates, but it was implemented incorrectly (the same uniform variate was used for both inputs to the transform, rather than two different uniform variates), which led to a different (non-Gaussian) distribution. This PR instead uses `torch.randn` to generate the Gaussian variates.
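
For reference, a correct Box-Muller transform draws two independent uniform variates per pair of outputs. The sketch below uses the standard-library `random` module rather than `torch`, and the function name `box_muller_pair` is hypothetical; it is not the code that was replaced.

```python
import math
import random


def box_muller_pair(rng):
    """Turn two independent uniform variates into two standard normal variates."""
    u1 = 1.0 - rng.random()  # in (0, 1], which keeps log(u1) finite
    u2 = rng.random()        # a second, independent draw (reusing u1 is the bug)
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


rng = random.Random(0)
samples = [z for _ in range(10000) for z in box_muller_pair(rng)]
mean = sum(samples) / len(samples)
variance = sum((z - mean) ** 2 for z in samples) / len(samples)
```

The sample mean and variance should come out close to 0 and 1 respectively.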

Pull Request resolved: https://github.com/pytorch/audio/pull/2639

Reviewed By: mthrok

Differential Revision: D39101144

Pulled By: carolineechen

fbshipit-source-id: 691e49679f6598ef0a1675f6f4ee721ef32215fd

Tweak documentation (#2656)

Summary:
1. Override class `__module__` attribute in `conf.py` so that no manual override is necessary
2. Fix SourceSeparationBundle member attribute

Pull Request resolved: https://github.com/pytorch/audio/pull/2656

Reviewed By: carolineechen

Differential Revision: D39293053

Pulled By: mthrok

fbshipit-source-id: 2b8d6be1aee517d0e692043c26ac2438a787adc6

Fix LibriSpeech Conformer RNN-T eval script (#2666)

Summary:
`ConformerRNNTModule`'s initializer now accepts a SentencePiece model rather than a path to a model as input. This PR corrects `eval.py` accordingly.

Pull Request resolved: https://github.com/pytorch/audio/pull/2666

Reviewed By: carolineechen

Differential Revision: D39386968

Pulled By: hwangjeff

fbshipit-source-id: 95a94dd898263d648650f7376c29810b1456d6c1

[Nova] Remove the old caller GitHub Actions Linux wheels/conda Build Workflows (#2660)

Summary:
We moved over to a new design for release workflows that encompass all the build logic in the test-infra repo (apart from custom pre-build and post-build scripts). Thus, we no longer need these caller workflows in the audio repo. This PR removes them entirely.

Pull Request resolved: https://github.com/pytorch/audio/pull/2660

Reviewed By: seemethere

Differential Revision: D39392456

Pulled By: osalpekar

fbshipit-source-id: a8bdeb4738b91666abcdc883f6f8f1bf359f1d42

Move hybrid demucs model out of prototype (#2668)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2668

Reviewed By: nateanl, mthrok

Differential Revision: D39433671

Pulled By: carolineechen

fbshipit-source-id: 3545a5b4019832861c34fd8c05e5f8600fd80d5c

Do not use nested namespaces in torchaudio/sox (#2663)

Summary:
As it is a C++17 feature, and PyTorch and its extensions must still be C++14 compatible, as also specified in the top level CMakeLists.txt:
https://github.com/pytorch/audio/blob/8a0d7b36f7821fe55175f0d4e3ca6299b3817a6c/CMakeLists.txt#L30

Otherwise, it pollutes build logs with noisy warnings such as:
```
/Users/runner/work/test-infra/test-infra/pytorch/audio/torchaudio/csrc/sox/pybind/io.cpp:12:21: warning: nested namespace definition is a C++17 extension; define each namespace separately [-Wc++17-extensions]
namespace torchaudio::sox_io {
                    ^~~~~~~~
                     { namespace sox_io
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2663

Reviewed By: atalman, nateanl

Differential Revision: D39362842

Pulled By: malfet

fbshipit-source-id: f9659d4420f1cc0194990d531455cf59b66c26b9

[Bootcamp] Fix Typo (#2661)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2661

Fixed typo in `audio_data_augmentation_tutorial.py`

Reviewed By: malfet, mthrok

Differential Revision: D39352353

fbshipit-source-id: aea35dab03fb7422421948bd26716e10a8d65f92

Move SourceSeparationBundle and pre-trained ConvTasNet pipeline into Beta (#2669)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2669

Reviewed By: carolineechen, mthrok

Differential Revision: D39433560

Pulled By: nateanl

fbshipit-source-id: 5b652b31c00badb37b27a32ac25b422a5bcc74cb

CUDA 11.3 remove. New Stable version is 11.6 (#2670)

Summary:
Removing CUDA 11.3.

Core PR: https://github.com/pytorch/pytorch/pull/84866
cc malfet ptrblck

Pull Request resolved: https://github.com/pytorch/audio/pull/2670

Reviewed By: malfet, osalpekar

Differential Revision: D39449263

Pulled By: atalman

fbshipit-source-id: f86bb119685ead3ffcabd92c4bb8076aecde4095

Move Hybrid Demucs pipeline to beta (#2673)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2673

Reviewed By: mthrok

Differential Revision: D39507612

Pulled By: carolineechen

fbshipit-source-id: 3a9ee53f72cabd6e3085c76867017be4a6ed7f53

Add Decoder LM Docs (#2658)

Summary:
Modifications to the CTC decoder LM docstrings on top of https://github.com/pytorch/audio/issues/2657

Pull Request resolved: https://github.com/pytorch/audio/pull/2658

Reviewed By: mthrok

Differential Revision: D39468921

Pulled By: carolineechen

fbshipit-source-id: c5497cc2fa22fb98a304d037e27c91bf68a9ad6a

Tweak badge link URL generation (#2677)

Summary:
Currently, the way feature badges are generated assumes that the documentation pages and the supported features page are at the same level from the root.

This does not work when we introduce `:autosummary:` which generates individual documentation pages one level below.

This commit changes it so that links to the supported features page are properly relative from the documentation level.
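
The idea of making the link relative to the documentation level can be illustrated with `posixpath.relpath`. The page names below are hypothetical examples, and this sketch is not the code used in the docs build.

```python
import posixpath


def relative_link(from_page, to_page):
    """Return a link to `to_page` that is valid from the directory of `from_page`."""
    start = posixpath.dirname(from_page) or "."
    return posixpath.relpath(to_page, start=start)


# A page at the root level links to its sibling directly.
top_level = relative_link("transforms.html", "feature_classifications.html")
# A generated page one level below needs to go up one directory first.
nested = relative_link("generated/some_class.html", "feature_classifications.html")
```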

There is no appearance change from this commit.

Pull Request resolved: https://github.com/pytorch/audio/pull/2677

Reviewed By: carolineechen

Differential Revision: D39507451

Pulled By: mthrok

fbshipit-source-id: f18da4201f0eb747586be21c8bd9a958217aebc2

Move conv_tasnet_base doc out of prototype (#2675)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2675

Reviewed By: carolineechen

Differential Revision: D39515996

Pulled By: nateanl

fbshipit-source-id: 5824375f6a758af21b6ad6c635dd06081663644f

Consolidate bibliography / reference (#2676)

Summary:
Preparation for the adoption of `autosummary`.

Replace `:footcite:` with `:cite:` and introduce a dedicated reference page, as `:footcite:` does not work well with `autosummary`.

Example:

https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/datasets.html#cmuarctic

https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/references.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2676

Reviewed By: carolineechen

Differential Revision: D39509431

Pulled By: mthrok

fbshipit-source-id: e6003dd01ec3eff3d598054690f61de8ee31ac9a

Update doc theme to the latest (#2679)

Summary:
Follows the change related to the move to the Linux Foundation.

(We are still pinning the theme version so that our customization does not break randomly.)

Pull Request resolved: https://github.com/pytorch/audio/pull/2679

Reviewed By: carolineechen

Differential Revision: D39531566

Pulled By: mthrok

fbshipit-source-id: 64353577d05f9dbda00dd9d10b9ebcedddfdce5b

Update Sphinx to 5.1.1 (#2678)

Summary:
Previous versions of Sphinx reported the wrong path for the return class. This issue is fixed in the latest Sphinx.

This allows us to remove the patch we apply in `conf.py`. This is essential for the adoption of `:autosummary:`, as it won't render correctly with the patch.

https://output.circle-artifacts.com/output/job/19d93ede-08de-4b9e-9d66-67ca5dab964e/artifacts/0/docs/pipelines.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2678

Reviewed By: carolineechen

Differential Revision: D39509447

Pulled By: mthrok

fbshipit-source-id: e104bc6a87f32cba6c549a9fe8f2d1e489ee27e4

Switch to use conda install action for m1 builds (#2674)

Summary:
Use the setup-miniconda action for the M1 build.
We want to try to address space issues on M1. The following action:
```
pytorch/test-infra/.github/actions/setup-miniconda@main
```

sets up miniconda in a temp folder, which should be cleaned between runs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2674

Reviewed By: jeanschmidt

Differential Revision: D39540481

Pulled By: atalman

fbshipit-source-id: 0596598ab6b2f99c775aa0c9e14a3a388533068d

Adopt `:autosummary:` in `torchaudio.io` module doc (#2681)

Summary:
This commit adopts the `:autosummary:` directive in the `torchaudio.io` module doc.
It adds a table of contents at the `torchaudio.io` level.

https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/io.html
<img width="1094" alt="Screen Shot 2022-09-16 at 7 33 32 AM" src="https://user-images.githubusercontent.com/855818/190520248-27e469f8-7689-4dc2-b591-7b3f08bb4dff.png">

https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader
<img width="1108" alt="Screen Shot 2022-09-16 at 7 33 59 AM" src="https://user-images.githubusercontent.com/855818/190520292-d090fed0-2f18-4961-b9f3-9e4808fd437e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2681

Reviewed By: carolineechen

Differential Revision: D39560459

Pulled By: mthrok

fbshipit-source-id: 3de5f22b8d8d0834dfd8bac8619fbfaa44c5f4dd

Adopt `:autosummary:` in `torchaudio.models.decoder` module doc (#2684)

Summary:
* Adopt `:autosummary:` in the decoder module doc.
* Hide the constructor signature of `CTCDecoder`, as the `ctc_decoder` function is the one client code is supposed to use.
* Introduce a `children` property to `CTCDecoderLMState`; otherwise it does not show up in the doc.

https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/models.decoder.html

<img width="748" alt="Screen Shot 2022-09-16 at 5 23 22 PM" src="https://user-images.githubusercontent.com/855818/190592409-0c2ec8a4-d2cf-4d76-a965-8a570faaeb1a.png">

https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder

<img width="723" alt="Screen Shot 2022-09-16 at 5 23 53 PM" src="https://user-images.githubusercontent.com/855818/190592501-3fad1e07-ae3e-44f5-93be-f33181025390.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2684

Reviewed By: carolineechen

Differential Revision: D39574272

Pulled By: mthrok

fbshipit-source-id: d977660bd46f5cf98c535adbf2735be896b28773

Adopt `:autosummary:` in `torchaudio.transforms` module doc (#2683)

Summary:
* Introduce the mini-index at `torchaudio.transforms` page.
* Add "Augmentations" subsection.
* Also updated the overall introduction.

https://output.circle-artifacts.com/output/job/1b65246a-403c-4d2c-b97d-d1b582d8b4e5/artifacts/0/docs/transforms.html

<img width="721" alt="Screen Shot 2022-09-16 at 5 20 08 PM" src="https://user-images.githubusercontent.com/855818/190591795-97c169db-a95b-480a-8d3c-d80072efa045.png">

<img width="755" alt="Screen Shot 2022-09-16 at 5 20 28 PM" src="https://user-images.githubusercontent.com/855818/190591828-03026918-febd-4194-91aa-7d8f704e17cc.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2683

Reviewed By: carolineechen

Differential Revision: D39574255

Pulled By: mthrok

fbshipit-source-id: a4beed7cacbb5184bad96efa903a3a1123dab627

[Nova] Remove Extraneous Build Scripts (#2695)

Summary:
There is a single pre/post script needed for building torchaudio. This PR:
1. Removes the old conda-specific build script
2. Renames the wheel script to be a general name

Pull Request resolved: https://github.com/pytorch/audio/pull/2695

Reviewed By: kit1980

Differential Revision: D39631971

Pulled By: osalpekar

fbshipit-source-id: 52b49a6e792536b6264228c01ac356d247b18ea8

Update nightly wheels to ROCm5.2 (#2672)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2672

Reviewed By: atalman

Differential Revision: D39468320

Pulled By: mthrok

fbshipit-source-id: 0e7bd4fd922ba0db51700e140b95328a5b687a6f

Adopt `:autosummary:` in `torchaudio.functional` module doc (#2693)

Summary:
https://output.circle-artifacts.com/output/job/b23174d2-5cee-4ee9-be39-3228b9ae4abe/artifacts/0/docs/functional.html

<img width="1133" alt="Screen Shot 2022-09-20 at 11 19 23 AM" src="https://user-images.githubusercontent.com/855818/191152824-96c5b16c-bd38-4656-b1ae-0b58699dbd62.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2693

Reviewed By: carolineechen

Differential Revision: D39650930

Pulled By: mthrok

fbshipit-source-id: 28b5b03d21b922e37e02bfddda2bf1dea696cc18

Add Speech Commands metadata function (#2687)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2687

Reviewed By: mthrok

Differential Revision: D39647596

Pulled By: carolineechen

fbshipit-source-id: 8ff874fc1e828130f6754e83ce1f702ca13dfac0

Adopt `:autosummary:` in `torchaudio.models` module doc (#2690)

Summary:
* Introduce the mini-index at `torchaudio.models` page.

https://output.circle-artifacts.com/output/job/25e59810-3866-4ece-b1b7-8a10c7a2286d/artifacts/0/docs/models.html

<img width="1042" alt="Screen Shot 2022-09-20 at 1 20 50 PM" src="https://user-images.githubusercontent.com/855818/191166816-83314ad1-8b67-475b-aa10-d4cc59126295.png">

<img width="1048" alt="Screen Shot 2022-09-20 at 1 20 58 PM" src="https://user-images.githubusercontent.com/855818/191166829-1ceb65e0-9506-4328-9a2f-8b75b4e54404.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2690

Reviewed By: carolineechen

Differential Revision: D39654948

Pulled By: mthrok

fbshipit-source-id: 703d1526617596f647c85a7148f41ca55fffdbc8

Support in-memory decoding via Tensor wrapper in StreamReader (#2694)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2694

This commit adds the Tensor type as an input to `StreamReader`.
The Tensor is interpreted as a byte string buffer.

Reviewed By: hwangjeff

Differential Revision: D39467630

fbshipit-source-id: 6369eed5e16fbb657568bf6bb80d703483d72f8e

Add StreamReader Tensor Binding to src (#2699)

Summary:
In https://github.com/pytorch/audio/issues/2694, CMakeLists.txt was not properly updated, so the tests are failing. This commit fixes it.

Pull Request resolved: https://github.com/pytorch/audio/pull/2699

Reviewed By: carolineechen

Differential Revision: D39687409

Pulled By: mthrok

fbshipit-source-id: 2e14f3c478f1f8a112a03839f2dbcca51215fed7

Adopt `:autosummary:` in `torchaudio.pipelines` module doc (#2689)

Summary:
* Introduce the mini-index at `torchaudio.pipelines` page.
* Add introductions
* Update pipeline tutorials

https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/pipelines.html

<img width="1163" alt="Screen Shot 2022-09-20 at 1 23 29 PM" src="https://user-images.githubusercontent.com/855818/191167049-98324e93-2e16-41db-8538-3b5b54eb8224.png">

<img width="1115" alt="Screen Shot 2022-09-20 at 1 23 49 PM" src="https://user-images.githubusercontent.com/855818/191167071-4770f594-2540-43a4-a01c-e983bf59220f.png">

https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle

<img width="1108" alt="Screen Shot 2022-09-20 at 1 24 18 PM" src="https://user-images.githubusercontent.com/855818/191167123-51b33a5f-c30c-46bc-b002-b05d2d0d27b7.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2689

Reviewed By: carolineechen

Differential Revision: D39691253

Pulled By: mthrok

fbshipit-source-id: ddf5fdadb0b64cf2867b6271ba53e8e8c0fa7e49

Add metadata mode for various datasets (#2697)

Summary:
Add metadata mode for the following SUPERB benchmark datasets
- QUESST14
- Fluent Speech Commands
- VoxCeleb1

follow ups:
- Add metadata mode for LibriMix -- waiting for unit tests to merge
- Add IEMOCAP + SNIPS datasets

Pull Request resolved: https://github.com/pytorch/audio/pull/2697

Reviewed By: mthrok

Differential Revision: D39666809

Pulled By: carolineechen

fbshipit-source-id: 3a8f07627acceed70f960f47e694efad75b108c2

Update and fix tutorials (#2701)

Summary:
* Fix Sphinx warning
* Update asset management

Pull Request resolved: https://github.com/pytorch/audio/pull/2701

Reviewed By: carolineechen

Differential Revision: D39714126

Pulled By: mthrok

fbshipit-source-id: a5b04cfbf8bedce67c621b6bfe1dcd975b343313

Adopt `:autosummary:` in `torchaudio.datasets` module doc (#2692)

Summary:
* Introduce the mini-index at `torchaudio.datasets` page.
* Standardize the format of return type docstring.

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/datasets.html

<img width="936" alt="Screen Shot 2022-09-21 at 6 56 52 PM" src="https://user-images.githubusercontent.com/855818/191475141-a97f2bea-705f-49bc-8c34-6ec869e76793.png">

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/generated/torchaudio.datasets.CMUDict.html#torchaudio.datasets.CMUDict

<img width="1069" alt="Screen Shot 2022-09-21 at 6 57 32 PM" src="https://user-images.githubusercontent.com/855818/191475293-e3302528-27ea-4212-9c12-fd6d900fdf3e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2692

Reviewed By: carolineechen

Differential Revision: D39687463

Pulled By: mthrok

fbshipit-source-id: 4175fc15388817d2fe76206188618dd1576281df

Introduce IO section to getting started tutorials (#2703)

Summary:
Since new tutorials for StreamWriter are being added, there are more tutorials for media IO than for the rest.
So this commit introduces a sub-index for the IO tutorials.

Pull Request resolved: https://github.com/pytorch/audio/pull/2703

Reviewed By: carolineechen

Differential Revision: D39769049

Pulled By: mthrok

fbshipit-source-id: 19a3981bc624fdce1d5d703c67e28a751a15e812

[Nova] Moving Linux Wheels over to Nova (#2702)

Summary:
This does 2 things:

1. Comments out Linux Wheels-related jobs in CircleCI so that they are not run on nightlies/releases.
2. Adds a GHA workflow that calls the build workflow in pytorch/test-infra.

Testing:
Verified that the builds are triggered by this workflow, and all builds are green: https://github.com/pytorch/audio/actions/runs/3109635749/jobs/5040029155

Pull Request resolved: https://github.com/pytorch/audio/pull/2702

Reviewed By: seemethere

Differential Revision: D39756852

Pulled By: osalpekar

fbshipit-source-id: 7e222d80ca0720e3be43b929f1e55f5c0166b947

[perf][5/5] Replace IValue::toString()->string() with IValue::toStringRef() (#2700)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2700

ATT for pytorch/audio

Reviewed By: mthrok

Differential Revision: D39707243

fbshipit-source-id: 1dc2a5a9fe913a9071e6df679e39d632b75212fb

Add CUDA version check (#2707)

Summary:
Adds a check to ensure that TorchAudio and PyTorch were built with the same CUDA version.

Pull Request resolved: https://github.com/pytorch/audio/pull/2707

Reviewed By: mthrok

Differential Revision: D39791154

Pulled By: hwangjeff

fbshipit-source-id: de00889c7bac897c6b8762502f9d37797016b71d

Fix CUDA check (#2710)

Summary:
`torch.version.cuda` can return a string of form X.X or X.X.X. This PR modifies the CUDA version check to account for this.
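A minimal sketch of a comparison that tolerates both string forms is shown below. It is not the actual TorchAudio implementation: the helper names are hypothetical, and both versions are passed in as plain strings here for simplicity. Only the major and minor components are compared, so `11.6` and `11.6.2` are treated as the same version.

```python
from typing import Optional, Tuple


def major_minor(version: Optional[str]) -> Optional[Tuple[int, int]]:
    """Parse 'X.Y' or 'X.Y.Z' into (X, Y). Return None for CPU-only builds."""
    if version is None:
        return None
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"Unexpected CUDA version string: {version!r}")
    return int(parts[0]), int(parts[1])


def check_cuda_versions(torch_cuda: Optional[str], ext_cuda: Optional[str]) -> None:
    """Raise if the two builds report different CUDA major/minor versions."""
    t, e = major_minor(torch_cuda), major_minor(ext_cuda)
    if t is None or e is None:
        return  # nothing to compare when either side is CPU-only
    if t != e:
        raise RuntimeError(
            "Detected that PyTorch and TorchAudio were compiled with different CUDA versions. "
            f"PyTorch has CUDA version {t[0]}.{t[1]} whereas TorchAudio has CUDA version {e[0]}.{e[1]}."
        )


check_cuda_versions("11.6", "11.6.2")  # passes: the patch component is ignored
```

In a real check the first argument would come from `torch.version.cuda`; how the extension reports its own CUDA version is left out of this sketch.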

Pull Request resolved: https://github.com/pytorch/audio/pull/2710

Reviewed By: carolineechen, nateanl

Differential Revision: D39796810

Pulled By: hwangjeff

fbshipit-source-id: b483bd8200195844d65d0caddebaf1b10f939b64

Remove linux wheel from circleci (#2714)

Summary:
Remove linux wheel from circleci

Pull Request resolved: https://github.com/pytorch/audio/pull/2714

Reviewed By: weiwangmeta

Differential Revision: D39816121

Pulled By: atalman

fbshipit-source-id: a3c99b530896888d7b4271d8b3f27f3c986b3480

Fix windows tests related to old conda on circleci (#2704)

Summary:
The Conda version on CircleCI prints the following message:
```
==> WARNING: A newer version of conda exists. <==
  current version: 4.6.14
  latest version: 4.14.0
```
and, as a result, the install step fails with this error:

```
+ /c/tools/miniconda3/Scripts/conda.exe install -v -y -c pytorch-nightly -c nvidia pytorch numpy ffmpeg pytorch-cuda=11.6
Collecting package metadata: ...working... done
Solving environment: ...working...

Too long with no output (exceeded 30m0s): context deadline exceeded
```

This should update the conda version running on the system and allow us to install pytorch and run some tests.

Pull Request resolved: https://github.com/pytorch/audio/pull/2704

Reviewed By: weiwangmeta

Differential Revision: D39820037

Pulled By: atalman

fbshipit-source-id: 4a82a7a6cbe3dc1a5807ac669e2fa79f454037fa

[Nova] Add build-type argument for when upload should be triggered (#2706)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2706

Reviewed By: kit1980

Differential Revision: D39786253

Pulled By: osalpekar

fbshipit-source-id: 2a0c427f57e5c70ff1cf419b7e0c2316e5f0e16c

Back out "[audio][PR] [Nova] Moving Linux Wheels over to Nova" (#2718)

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2718

Original commit changeset: 7e222d80ca07

Original Phabricator Diff: D39756852 (https://github.com/pytorch/audio/commit/7ba7cf4d24a2967b8fa4aaff437116524281f8fd)

Reviewed By: weiwangmeta

Differential Revision: D39839899

fbshipit-source-id: f5605eb9882f7c7f0008e88338ab711131b29404

Fix mismatched cuda version in smoke tests on windows wheels (#2721)

Summary:
Example job that was failing previously:
https://app.circleci.com/pipelines/github/pytorch/audio/12796/workflows/ae96794a-6df4-4a2a-84df-ada7a7250045/jobs/927709

The failure:
```
"Detected that PyTorch and TorchAudio were compiled with different CUDA versions. "
RuntimeError: Detected that PyTorch and TorchAudio were compiled with different CUDA versions. PyTorch has CUDA version 11.7 whereas TorchAudio has CUDA version 11.6. Please install the TorchAudio version that matches your PyTorch version.
```

The failing job uses this install command:
```
pip install $(ls ~/workspace/torchaudio*.whl) -f "https://download.pytorch.org/whl/${UPLOAD_CHANNEL}/torch_${UPLOAD_CHANNEL}.html"

pip install /c/Users/circleci/workspace/torchaudio-0.13.0.dev20220927+cu116-cp37-cp37m-win_amd64.whl -f https://download.pytorch.org/whl/nightly/torch_nightly.html
```

The Linux job (which succeeds) uses a different "-f" (find links) URL that includes the specific CUDA version:
https://app.circleci.com/pipelines/github/pytorch/audio/12809/workflows/aadca2ab-5a00-4a0a-ab6a-4a1b7a503713/jobs/927861

Command:
```
pip install $(ls ~/workspace/torchaudio*.whl) -f "https://download.pytorch.org/whl/${UPLOAD_CHANNEL}/${CU_VERSION}/torch_${UPLOAD_CHANNEL}.html"

 pip install /root/workspace/torchaudio-0.13.0.dev20220927+cu116-cp37-cp37m-linux_x86_64.whl -f https://download.pytorch.org/whl/nightly/cu116/torch_nightly.html

```

This PR makes the Windows installation match the Linux one.
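The difference between the two commands comes down to how the find-links URL is built. The sketch below reproduces the Linux form of the URL from the channel and CUDA tag; the function name is made up for illustration and does not appear in the CI scripts.

```python
def find_links_url(upload_channel: str, cu_version: str) -> str:
    """Build the CUDA-specific find-links URL used by the Linux smoke test."""
    return (
        f"https://download.pytorch.org/whl/{upload_channel}/"
        f"{cu_version}/torch_{upload_channel}.html"
    )


# Matches the URL in the Linux command quoted above.
print(find_links_url("nightly", "cu116"))
```

Including the CUDA tag in the URL restricts pip to wheels built for that CUDA version, which is what keeps the installed PyTorch and TorchAudio builds consistent.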

Testing:
* verified command manually on Circle CI:
```
>>> import torch
>>> import torchaudio
C:\tools\miniconda3\lib\site-packages\torchaudio\compliance\kaldi.py:22: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_numpy.cpp:77.)
  EPSILON = torch.tensor(torch.finfo(torch.float).eps)
C:\tools\miniconda3\lib\site-packages\torchaudio\backend\utils.py:62: UserWarning: No audio backend is available.
  warnings.warn("No audio backend is available.")
```

Co-authored: weiwangmeta

Pull Request resolved: https://github.com/pytorch/audio/pull/2721

Reviewed By: hwangjeff

Differential Revision: D39870805

Pulled By: izaitsevfb

fbshipit-source-id: 2957cba4f53d00783a5c07099f24050ce15e7d1c

Removing cuda102 (#2715)

Summary:
Removing cuda102

Pull Request resolved: https://github.com/pytorch/audio/pull/2715

Reviewed By: hwangjeff

Differential Revision: D39823444

Pulled By: atalman

fbshipit-source-id: c11d798ab86cf9a6d5ed3804958b4a0c2f8a87ff

Revert "Removing cuda102 (#2715)" (#2723)

Summary:
Revert this for now until Docker is updated.

Pull Request resolved: https://github.com/pytorch/audio/pull/2723

Reviewed By: nateanl

Differential Revision: D39900382

Pulled By: atalman

fbshipit-source-id: f8701e359bc11e8f9f3a29144f7e7da336a470da

Cuda 102 deprecation (#2724)

Summary:
Deprecates CUDA 10.2 and migrates the unit tests from CUDA 10.2 to CUDA 11.6.

Pull Request resolved: https://github.com/pytorch/audio/pull/2724

Reviewed By: weiwangmeta

Differential Revision: D39912484

Pulled By: atalman

fbshipit-source-id: e760b630375eae94384cda68d24f83ef46ada6d9

Delete packaging/README.md (#2730)

Summary:
The file looks hopelessly outdated.

Pull Request resolved: https://github.com/pytorch/audio/pull/2730

Reviewed By: mthrok

Differential Revision: D39993805

Pulled By: kit1980

fbshipit-source-id: f5ad97c83873061175455cc7b129ec71a9ec3d7d

Add citation for MuST-C dataset in Emformer RNNT pipeline. (#2728)

Summary:
The MuST-C reference was added in https://github.com/pytorch/audio/pull/2689. This PR adds the citation to the RNNT pipeline documentation.

Pull Request resolved: https://github.com/pytorch/audio/pull/2728

Reviewed By: carolineechen

Differential Revision: D39990882

Pulled By: nateanl

fbshipit-source-id: 011057952dd8aa30a4cb7c7af0ac75123e329d7e

Adopt :autosummary: to multiple modules (#2664)

Summary:
Adopt `:autosummary:` to various modules

    * torchaudio.compliance.kaldi
    * torchaudio.sox_effects
    * torchaudio.utils

Pull Request resolved: https://github.com/pytorch/audio/pull/2664

Reviewed By: nateanl

Differential Revision: D39841873

Pulled By: mthrok

fbshipit-source-id: ff4fa6976324fca5f35b737b715f976e2a722bac

Add StreamWriter media device/streaming tutorial (#2708)

Summary:
https://output.circle-artifacts.com/output/job/213c71c8-c9b5-4516-af92-a2f8dab2c9fd/artifacts/0/docs/tutorials/streamwriter_advanced.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2708

Reviewed By: carolineechen

Differential Revision: D40013310

Pulled By: mthrok

fbshipit-source-id: 7226b021ce2fe951b3bf0bd41e93a6bbcf696124

Tweak tutorials (#2733)

Summary:
* Port downstream change https://github.com/pytorch/tutorials/pull/2060
* Fix inter-tutorial links and references

Pull Request resolved: https://github.com/pytorch/audio/pull/2733

Reviewed By: hwangjeff

Differential Revision: D40086902

Pulled By: hwangjeff

fbshipit-source-id: 00b04c6a1b68fb9fadd52b610b26ecaab15d52d8

Increase CircleCi no_output_timeout for `install binaries` steps (#2734)

Summary:
The goal is to reduce the number of job failures due to timeouts, see https://app.circleci.com/pipelines/github/pytorch/audio/12882/workflows/f99da1a5-32e6-4bac-8ceb-fbf36d693e2d/jobs/936363?invite=true#step-105-105 for example.

Pull Request resolved: https://github.com/pytorch/audio/pull/2734

Reviewed By: weiwangmeta, atalman

Differential Revision: D40077578

fbshipit-source-id: 573f43a4d088a7086fa6925ac5ba1fdd1e8f39ec

Torchaudio load libary path fix for windows python 3.8 (#2735)

Summary:
Fixes the TorchAudio library load path on Windows with Python 3.8.

Fixes: https://github.com/pytorch/audio/issues/2726

Fixes following issue:

```
>>> import torchaudio
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\__init__.py", line 1, in <module>
    from torchaudio import (  # noqa: F401
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 128, in <module>
    _init_extension()
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 98, in _init_extension
    _load_lib("libtorchaudio")
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torchaudio\_extension.py", line 52, in _load_lib
    torch.ops.load_library(path)
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\site-packages\torch\_ops.py", line 573, in load_library
    ctypes.CDLL(path)
  File "C:\Users\atalman\miniconda3\envs\mywin38\lib\ctypes\__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\atalman\miniconda3\envs\mywin38\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
>>>
```

This is caused by DLLs not being found in the conda environment directory:
```
C:\Users\atalman\miniconda3\envs\mywin38\bin\
```

While this directory is set correctly in PATH, it is ignored with Python 3.8.
Please refer to: https://stackoverflow.com/questions/59330863/cant-import-dll-module-in-python
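For context, Python 3.8 on Windows no longer consults PATH when resolving the dependencies of extension modules and `ctypes` libraries; directories have to be registered with `os.add_dll_directory` instead. The sketch below shows that general mechanism and is not necessarily what this PR implements. The helper name is hypothetical.

```python
import os
import sys
from pathlib import Path
from typing import Iterable, List


def register_dll_dirs(dirs: Iterable[str]) -> List[object]:
    """Register existing directories for DLL lookup (Windows, Python >= 3.8).

    Returns the handles from ``os.add_dll_directory``; on other platforms
    or older Python versions this is a no-op that returns an empty list.
    """
    handles: List[object] = []
    if sys.platform != "win32" or not hasattr(os, "add_dll_directory"):
        return handles
    for d in dirs:
        path = Path(d)
        if path.is_dir():
            # add_dll_directory expects an absolute path.
            handles.append(os.add_dll_directory(str(path.resolve())))
    return handles


# Example: the environment's "bin" directory, if it exists.
register_dll_dirs([os.path.join(sys.prefix, "bin")])
```

The returned handles should be kept alive for as long as the directories need to stay on the search path, since closing a handle removes its directory again.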

Pull Request resolved: https://github.com/pytorch/audio/pull/2735

Reviewed By: carolineechen

Differential Revision: D40112293

Pulled By: carolineechen

fbshipit-source-id: c7fc9bb49fc3ec4a2855c6ea473f36808103ed1e

Add StreamWriter tutorial (#2698)

Summary:
Add a tutorial for basic usage of torchaudio.io.StreamWriter.

https://output.circle-artifacts.com/output/job/55d9a495-af7a-483c-84cb-de9a08cfd2f3/artifacts/0/docs/tutorials/streamwriter_basic_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2698

Reviewed By: carolineechen

Differential Revision: D40133007

Pulled By: carolineechen

fbshipit-source-id: 141f692c32343981bfb228357f21562ffe36f623

Fix sphinx gallery list in io doc (#2736)

Summary:
Specifying multiple objects in the `:minigallery:` directive shows duplicated tutorials.

This commit fixes it by listing tutorials based on the module used.

https://output.circle-artifacts.com/output/job/c3da2a22-40d5-4e2d-b73a-28b39e712817/artifacts/0/docs/io.html

Before:
<img width="694" alt="Screen Shot 2022-10-07 at 7 04 35 AM" src="https://user-images.githubusercontent.com/855818/194427092-ca1202e7-0731-4c18-b48b-24923d692a4a.png">

After:

<img width="648" alt="Screen Shot 2022-10-07 at 7 03 14 AM" src="https://user-images.githubusercontent.com/855818/194426950-5b780458-2bf0-43ef-b020-fcbbfdf8d41b.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2736

Reviewed By: carolineechen

Differential Revision: D40160247

Pulled By: carolineechen

fbshipit-source-id: 547496f9b569ff7a4d70db97e90f3ea503344477

Modify `info_audio` to compute and return number of frames if not found in stream info (#2740)

Summary:
Modifies `info_audio` to compute and return number of frames if not found in stream info. This resolves the `num_frames == 0` issue for mp3 that's cited in https://github.com/pytorch/audio/issues/2524.

Pull Request resolved: https://github.com/pytorch/audio/pull/2740

Reviewed By: nateanl

Differential Revision: D40168639

Pulled By: nateanl

fbshipit-source-id: bb45baa0f9cd56844315b04e40ab9835d825fc24

Update sox info docstring to account for mp3 frame count handling (#2742)

Summary:
Updates sox info docstring to account for mp3 frame count handling fix introduced in https://github.com/pytorch/audio/issues/2740.

Pull Request resolved: https://github.com/pytorch/audio/pull/2742

Reviewed By: nateanl

Differential Revision: D40189846

Pulled By: nateanl

fbshipit-source-id: d6371418d7d4867dd0b97ee72ebf846d5c93dc30

Update HW video processing tutorial (#2739)

Summary:
* Add HW encoding to HW tutorial

https://colab.research.google.com/drive/1DDah_IaGULEO66CfQWltRqaVheBkiXdN#scrollTo=eXzKSVrHk1vS

Pull Request resolved: https://github.com/pytorch/audio/pull/2739

Reviewed By: hwangjeff

Differential Revision: D40197086

Pulled By: hwangjeff

fbshipit-source-id: 1780a5419f6705f7c24ba96bd46c3310438af7db

Add IEMOCAP dataset (#2732)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2732

Reviewed By: nateanl

Differential Revision: D40186996

Pulled By: nateanl

fbshipit-source-id: a0ad325b7153c9e580dad2c515730dadbe8840c4

Fix HuBERT docstring (#2746)

Summary:
The docstring of the `wav2vec2` argument is wrong. This PR fixes it.

Pull Request resolved: https://github.com/pytorch/audio/pull/2746

Reviewed By: carolineechen

Differential Revision: D40225995

Pulled By: nateanl

fbshipit-source-id: 770e9c928ebebd7b6307e181601eb64625d668da

Add unit test for LibriMix dataset (#2659)

Summary:
Besides the unit test, the PR also addresses these issues:
- The original `LibriMix` dataset only supports "min" mode, which means the audio length is the minimum over all clean sources. This is the default for the source separation task. Users may also want to use "max" mode, which allows for end-to-end separation and recognition. The PR adds a ``mode`` argument to let users decide which dataset they want to use.
- If the task is ``"enh_both"``, the target is the audio in ``mix_clean`` instead of the separate clean sources. The PR fixes the dataset to use ``mix_clean`` as the target.
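To make the difference between the two modes concrete, here is a toy sketch on plain Python lists. It only illustrates the length convention, under the assumption that "max" mode zero-pads the shorter sources; the dataset class itself selects pre-generated files rather than trimming or padding at load time, and `align_sources` is a made-up name.

```python
from typing import List, Sequence


def align_sources(sources: Sequence[Sequence[float]], mode: str = "min") -> List[List[float]]:
    """Bring all sources to a common length.

    "min": trim every source to the shortest one.
    "max": zero-pad every source to the longest one (assumed convention).
    """
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
    if mode == "min":
        length = min(len(s) for s in sources)
        return [list(s[:length]) for s in sources]
    length = max(len(s) for s in sources)
    return [list(s) + [0.0] * (length - len(s)) for s in sources]


speaker_1 = [0.1, 0.2, 0.3, 0.4]
speaker_2 = [0.5, 0.6]
print(align_sources([speaker_1, speaker_2], mode="min"))
print(align_sources([speaker_1, speaker_2], mode="max"))
```

With "min" the tail of the longer utterance is dropped, which is fine for separation but loses words; "max" keeps every utterance whole, which is why it suits recognition.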

Pull Request resolved: https://github.com/pytorch/audio/pull/2659

Reviewed By: carolineechen

Differential Revision: D40229227

Pulled By: nateanl

fbshipit-source-id: fc07e0d88a245e1367656d3767cf98168a799235

Add Snips Dataset (#2738)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2738

Reviewed By: carolineechen

Differential Revision: D40238099

Pulled By: nateanl

fbshipit-source-id: c5cc94c2a348a6ef34c04b8dd26114ecb874d73e

Fix windows python 3.8 loading path (#2747)

Summary:
Fix windows python 3.8 loading path

Pull Request resolved: https://github.com/pytorch/audio/pull/2747

Reviewed By: nateanl

Differential Revision: D40264326

Pulled By: nateanl

fbshipit-source-id: f4a24757de7b48c63a7481034eb11fc3ff174327

Add metadata for Librimix (#2751)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2751

Reviewed By: nateanl

Differential Revision: D40267874

Pulled By: carolineechen

fbshipit-source-id: 4e45a02c650ed65c05cde82289a400a3be877927

Increase inactivity timeout for binary build jobs (#2754)

Summary:
Increase inactivity timeout for binary build jobs

Pull Request resolved: https://github.com/pytorch/audio/pull/2754

Reviewed By: carolineechen

Differential Revision: D40275368

Pulled By: atalman

fbshipit-source-id: 5e682bb78bda640d615f874fbdf0e650b5a38ee0

Skip hubert xlarge torchscript test (#2758)

Summary:
A couple of CircleCI unit tests are failing during the HuBERT xlarge TorchScript test, which has been known to fail on Windows in the past (#65776). This PR disables this test on CircleCI.

cc atalman

Pull Request resolved: https://github.com/pytorch/audio/pull/2758

Reviewed By: mthrok

Differential Revision: D40290535

Pulled By: carolineechen

fbshipit-source-id: 5c5fb43434a517b6c439a8cb8e853015d1550a57

Improve wav2vec2/hubert model for pre-training (#2716)

Summary:
This PR improves the Wav2Vec2/HuBERT model regarding model pre-training.

- The initialization of the positional embedding and transformer module is essential to model pre-training. The accuracy on unmasked frames should be higher than on masked frames, as it is an easier task, but without the initialization the accuracy on masked frames is higher than on unmasked frames.
  The comparison below shows the performance after two epochs with 16 GPUs.
  - With model initialization, the accuracies of masked/unmasked frames are 0.08/0.11.
  - Without model initialization, the accuracies of masked/unmasked frames are 0.06/0.04.
- After adding the model initialization, the gradient overflows easily (i.e., becomes `nan`). In the paper [Self-Supervised Learning for speech recognition with Intermediate layer supervision](https://arxiv.org/abs/2112.08778), the authors propose a simple but effective method to mitigate the overflow issue: scale down the product of query and key and subtract its maximum value (subtracting a constant does not change the output of softmax). This guarantees that the value does not overflow.
- In the original fairseq, the mask indices are generated by `numpy.random.choice`. This PR replaces `torch.multinomial` with `torch.randperm`. (cc carolineechen).
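The remark that subtracting a constant does not change the softmax output can be checked with a small standalone example. This is a scalar, standard-library sketch of the subtract-the-maximum trick only; it does not reproduce the attention code in this PR, and the scaling of the query-key product is not shown.

```python
import math
from typing import List, Sequence


def naive_softmax(scores: Sequence[float]) -> List[float]:
    # math.exp raises OverflowError once a score is large enough.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(scores: Sequence[float]) -> List[float]:
    # Shift by the maximum so that every exponent is <= 0.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


large_scores = [1000.0, 999.0]
try:
    naive_softmax(large_scores)
except OverflowError:
    print("naive softmax overflowed")

# Same result as for the shifted inputs [1.0, 0.0].
print(stable_softmax(large_scores))
print(stable_softmax([1.0, 0.0]))
```

Because only the differences between scores matter, the shifted version is mathematically identical while keeping every intermediate value within range.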

Other improvements within training scripts will be included in a separate PR.

Pull Request resolved: https://github.com/pytorch/audio/pull/2716

Reviewed By: xiaohui-zhang

Differential Revision: D39832189

Pulled By: nateanl

fbshipit-source-id: f4d2a473a79ad63add2dd16624bd155d5ce4de27

Improve hubert recipe for pre-training and fine-tuning (#2744)

Summary:
Follow-up to https://github.com/pytorch/audio/issues/2716.
- For preprocessing
  - The HuBERT features take a lot of memory, which may not fit on some machines. This PR makes it possible to train the k-means model on a subset of the features.

- For pre-training
  - Normalize the loss based on the total number of masked frames across all GPUs.
  - Use mixed precision training. fp16 is not well supported in pytorch_lightning.
  - Log accuracies of masked/unmasked frames during training.
  - Clip the gradients with norm `10.0`.

- For ASR fine-tuning
  - Normalize the loss based on the total number of batches across all GPUs, same as in the conformer recipe of TorchAudio.
  - Use mixed precision training.
  - Add "|" after the end of transcription to capture the silence/word termination, same as in fairseq recipe.

- Update the WER results on LibriSpeech dev and test sets.

|                   | WER% (Viterbi)|  WER% (KenLM) |
|:-----------------:|--------------:|--------------:|
| dev-clean         |       10.9    |       4.2     |
| dev-other         |       17.5    |       9.4     |
| test-clean        |       10.9    |       4.4     |
| test-other        |       17.8    |       9.5     |

Pull Request resolved: https://github.com/pytorch/audio/pull/2744

Reviewed By: carolineechen

Differential Revision: D40282322

Pulled By: nateanl

fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90

Fix typos in tacotron2 tutorial (#2761)

Summary:
`publishe`->`published`

Also, not sure if it should be `pre-trained weight is published` or `pre-trained weights are published`

Pull Request resolved: https://github.com/pytorch/audio/pull/2761

Reviewed By: carolineechen

Differential Revision: D40313042

Pulled By: malfet

fbshipit-source-id: c22085ca0b1125a06aa04bf38231d0a9fbfed00b

Add gtzan download note (#2763)

Summary:
The GTZAN download link no longer works, so the torchaudio download functionality for GTZAN does not work properly, per https://github.com/pytorch/audio/issues/2743. Add a note in the docs to reflect this.

Pull Request resolved: https://github.com/pytorch/audio/pull/2763

Reviewed By: nateanl, mthrok

Differential Revision: D40315071

Pulled By: carolineechen

fbshipit-source-id: 3250326c45d227546a9c62b33ba890199ad19242

Update tutorial author information (#2764)

Summary:
Adding and updating author information.

Pull Request resolved: https://github.com/pytorch/audio/pull/2764

Reviewed By: carolineechen

Differential Revision: D40332427

Pulled By: mthrok

fbshipit-source-id: 4f04c7351386c122e3b0a45c2ed1757a04b7dc9a

Add custom lm example to decoder tutorial (#2762)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2762

Reviewed By: mthrok

Differential Revision: D40332603

Pulled By: carolineechen

fbshipit-source-id: 2de51265adc81b4728f4d6798d287bd2eccf5251

Fix CTCDecoder doc (#2766)

Summary:
* Document `__call__` instead of `__init__`
* List CTCHypothesis first as it is used in combination with CTCDecoder
* Fix indentation of score method docstring

Pull Request resolved: https://github.com/pytorch/audio/pull/2766

Reviewed By: carolineechen

Differential Revision: D40349388

Pulled By: mthrok

fbshipit-source-id: 5e512e6c2b29d3533eb62d09b289154ccd1abf4c

Fix fading in hybrid demucs tutorial (#2769)

Summary:
Separation is applied to chunks of audio to avoid OOM. The figure below shows how consecutive chunks are combined:

![image](https://user-images.githubusercontent.com/8653221/195691886-002844e6-4ec5-41de-8910-df8046553998.png)

In the last audio chunk, there is no future chunk to be combined, hence the overlap on the right side doesn't need to be faded.
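
The chunk combination described above can be sketched in plain Python as an overlap-add with linear fades. This is a minimal sketch under stated assumptions, not the tutorial's code: `separate_in_chunks`, its parameters, and the identity `model` default are all hypothetical, and the signal is a flat list of samples rather than a tensor.

```python
def separate_in_chunks(signal, chunk_len, overlap, model=lambda chunk: chunk):
    """Apply `model` chunk by chunk and cross-fade the overlapping regions.

    The first chunk gets no fade-in and the last chunk gets no fade-out,
    because there is no neighbouring chunk to blend with on that side.
    """
    if overlap <= 0 or 2 * overlap > chunk_len:
        raise ValueError("need 0 < overlap <= chunk_len / 2")
    hop = chunk_len - overlap
    n = len(signal)
    out = [0.0] * n
    start = 0
    while True:
        end = min(start + chunk_len, n)
        is_first = start == 0
        is_last = end == n
        chunk = model(signal[start:end])
        for i, value in enumerate(chunk):
            gain = 1.0
            if not is_first and i < overlap:
                # Fade-in ramp over the region shared with the previous chunk.
                gain = (i + 1) / (overlap + 1)
            if not is_last and i >= hop:
                # Fade-out ramp; complements the next chunk's fade-in.
                gain = 1.0 - (i - hop + 1) / (overlap + 1)
            out[start + i] += gain * value
        if is_last:
            break
        start += hop
    return out


mix = [1.0] * 10
restored = separate_in_chunks(mix, chunk_len=4, overlap=1)
print(restored)
```

Because the fade-in and fade-out gains sum to one at every overlapping sample, an identity `model` should reproduce the input. Fading out the last chunk as well would attenuate the tail of the output, since no later chunk contributes the missing gain.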

Pull Request resolved: https://github.com/pytorch/audio/pull/2769

Reviewed By: carolineechen

Differential Revision: D40358382

Pulled By: nateanl

fbshipit-source-id: ec8be895d7a67acb257e2693b64922397163ed5e

Fix leaking matplotlib figure (#2771)

Summary:
In the StreamWriter basic usage tutorial, matplotlib is used to generate raster images of waveforms, and the figure is deliberately not shown in the resulting tutorial by using the ``sphinx_gallery_defer_figures`` command.

It turned out that this figure is shown in the next code block executed by Sphinx Gallery, so the figure ends up in a totally unrelated place. https://pytorch.org/audio/main/tutorials/audio_feature_extractions_tutorial.html

<img width="951" alt="Screen Shot 2022-10-14 at 10 06 58 PM" src="https://user-images.githubusercontent.com/855818/195855124-ecd9be49-5085-4acd-9a93-608d9d1ee9ce.png">

This commit fixes it by closing the figure.

Pull Request resolved: https://github.com/pytorch/audio/pull/2771

Reviewed By: nateanl

Differential Revision: D40382076

Pulled By: mthrok

fbshipit-source-id: 015f2bab8492d3b4fbe70e1174c7776a5aa2679a

Update resampling tutorial (#2773)

Summary:
* Refactor benchmark script
* Rename the `time` variable to avoid a potential conflict with the `time` module
* Fix `beta` parameter in benchmark (it was not used previously)
* Use `timeit` module for benchmark
* Add plot
* Move the comment on the results to the end
* Add link to an explanation of aliasing
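
For readers unfamiliar with the `timeit`-based approach listed above, here is a minimal sketch using only the standard library. The `benchmark` helper and the `decimate` workload are hypothetical stand-ins (naive decimation is not a proper resampler); the tutorial's own benchmark code may look different.

```python
import timeit


def benchmark(fn, *args, number=100, repeat=3):
    """Return the best average runtime of fn(*args), in milliseconds."""
    timer = timeit.Timer(lambda: fn(*args))
    # Each entry is the total time for `number` calls; take the best run.
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total * 1000 / number


def decimate(samples, factor):
    # Hypothetical workload: keep every `factor`-th sample.
    return samples[::factor]


# The result is named `elapsed_ms`, not `time`, so it cannot shadow the
# standard `time` module.
elapsed_ms = benchmark(decimate, list(range(48000)), 3)
print(f"decimate: {elapsed_ms:.4f} ms per call")
```

Taking the minimum over several repeats is the usual `timeit` convention, since slower runs mostly reflect interference from other processes rather than the code under test.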

https://output.circle-artifacts.com/output/job/20b57d2f-3614-4161-a18e-e0c1a537739c/artifacts/0/docs/tutorials/audio_resampling_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2773

Reviewed By: carolineechen

Differential Revision: D40421337

Pulled By: mthrok

fbshipit-source-id: b402f84d4517695daeca75fb84ad876ef9354b3a

Update description of HDemucs pipelines (#2774)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2774

Reviewed By: carolineechen

Differential Revision: D40445274

Pulled By: nateanl

fbshipit-source-id: 6388323a5fa5c548a86829cb3f7cafee5382d18d

Add file_name to the returned item in Snips dataset (#2775)

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2775

Reviewed By: carolineechen

Differential Revision: D40481144

Pulled By: nateanl

fbshipit-source-id: 5d0fb2478767704603a3ec28d74160e7892d4d0e

Update download path for speechcommands (#2777)

Summary:
The previous download link for v0.02 downloaded only the training dataset rather than the entire dataset, resulting in issues when trying to access the testing or validation data.

Pull Request resolved: https://github.com/pytorch/audio/pull/2777

Reviewed By: nateanl

Differential Revision: D40480605

Pulled By: carolineechen

fbshipit-source-id: a594506b4ccfb548a7d5043b716c58463480c103

Add notes on file structure in Voxceleb1 based datasets (#2776)

Summary:
The file structure of VoxCeleb1 is as follows:
```
root/
└── wav/
    └── speaker_id folders
```
Users who use [Kaldi](https://github.com/kaldi-asr/kaldi/blob/f6f4ccaf213f0fe8b26e633a7dc0c802150626a0/egs/voxceleb/v1/local/make_voxceleb1_v2.pl) to get the VoxCeleb1 dataset have "dev" and "test" folders above the "wav" folder. However, file lists such as https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/veri_test.txt or https://www.robots.ox.ac.uk/~vgg/data/voxceleb/meta/iden_split.txt make no such distinction, so it is not necessary to put the extracted files into separate folders.

This PR adds notes to the `VoxCeleb1Identification` and `VoxCeleb1Verification` datasets to inform users of the file structure.
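
A quick way to check a local copy against the layout described above is sketched below with `pathlib`. The helper `find_speaker_dirs` and the speaker folder names in the demo are hypothetical; this is not part of torchaudio.

```python
import tempfile
from pathlib import Path


def find_speaker_dirs(root):
    """Return the speaker folder names found directly under root/wav."""
    wav_dir = Path(root) / "wav"
    if not wav_dir.is_dir():
        # A Kaldi-style tree (root/dev/wav, root/test/wav) ends up here.
        raise FileNotFoundError(f"expected a 'wav' folder directly under {root}")
    return sorted(p.name for p in wav_dir.iterdir() if p.is_dir())


# Demo on a throwaway directory; the speaker IDs are made-up examples.
with tempfile.TemporaryDirectory() as tmp:
    for speaker in ("id10001", "id10002"):
        (Path(tmp) / "wav" / speaker).mkdir(parents=True)
    speakers = find_speaker_dirs(tmp)
    print(speakers)
```

If the check fails on a Kaldi-style download, merging the contents of the "dev" and "test" "wav" folders into a single `root/wav` should give the flat layout shown above.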

Pull Request resolved: https://github.com/pytorch/audio/pull/2776

Reviewed By: carolineechen

Differential Revision: D40483707

Pulled By: nateanl

fbshipit-source-id: ccd1780a72a5b53f0300c2466c3073a293ad7b8d

[Nova] New GHA Workflow for Docstring Sync (#2720)

Summary:
Create a standalone GitHub Actions workflow for Docstring Sync. This job (https://app.circleci.com/pipelines/github/pytorch/audio/12625/workflows/96223ad2-0fcd-4dae-a045-d530aaf9b55c/jobs/907466) currently depends on linux wheels builds, which creates a dependency that makes the migration to Nova trickier. This PR creates a fresh standalone workflow for this job that is triggered per-PR and before nightly/release cuts.

Pull Request resolved: https://github.com/pytorch/audio/pull/2720

Reviewed By: izaitsevfb, seemethere

Differential Revision: D39863574

Pulled By: osalpekar

fbshipit-source-id: 8599dc006693242278857a3dedeb4fddc1eed14b

[Nova] Clean commit for Enabling Nova Linux Wheels Workflows (#2719)

Summary:
Creating this fresh PR since we're reverting the older commit that removed build configs from the CircleCI file. This does not change the existing builds/uploads in CircleCI, and should not break any existing jobs/workflows. This is just to add back workflows to build the Linux Wheels with Nova, upload them to GH artifacts (NOT to the actual nightly channels), and ensure that they produce the same binaries as CircleCI. TO CLARIFY: this does not upload anything to nightly channels, so this PR has no effect on any existing jobs or distributed binaries.

We will create a workflow (most likely in test-infra) that does this comparison between the binaries to ensure there is parity between the binaries before we start uploading with Nova.

Pull Request resolved: https://github.com/pytorch/audio/pull/2719

Reviewed By: hwangjeff, weiwangmeta

Differential Revision: D39866440

Pulled By: osalpekar

fbshipit-source-id: 9ebf0402214fcd97cc519801276d85d336617410

Add iemocap variants (#2778)

Summary:
Add the ability to load only improvised or only scripted utterances.

Pull Request resolved: https://github.com/pytorch/audio/pull/2778

Reviewed By: nateanl

Differential Revision: D40511865

Pulled By: carolineechen

fbshipit-source-id: e1fe3908ac2aa306ad30c242ddd25762b2268539

Bump version to 0.14 (#2779)

Summary:
Bump version to 0.14

Pull Request resolved: https://github.com/pytorch/audio/pull/2779

Reviewed By: carolineechen

Differential Revision: D40523034

Pulled By: atalman

fbshipit-source-id: 325e6ffcac4763a7d83ba600c2c3d9eadae03c31

Fix doc in torchaudio.backend (#2781)

Summary:
Address https://github.com/pytorch/audio/issues/2780

Pull Request resolved: https://github.com/pytorch/audio/pull/2781

Reviewed By: carolineechen, mthrok

Differential Revision: D40556794

Pulled By: nateanl

fbshipit-source-id: b24912489d41e5663b4b4dcfb8be743fb962097e

Remove archive file in gh-pages branch (#2786)

Summary:
The motivation for generating `artifact.tar.gz` in the `build_docs` job is to make it easy to add documentation for each stable release. But the file is committed to the `gh-pages` branch, which makes the git repository very large (see https://github.com/pytorch/audio/issues/2783). This PR removes the tar file from the commit.

Pull Request resolved: https://github.com/pytorch/audio/pull/2786

Reviewed By: caroli…