docs: update readme, quickstart (#116)
* update & quick start
* update autoencoder and other docs
* add a lighter progress bar callback to make training faster and almost on par with native torch

Signed-off-by: Avik Basu <avikbasu93@gmail.com>
Co-authored-by: Vigith Maurice <vigith@gmail.com>
ab93 and vigith committed Jan 5, 2023
1 parent e8b5304 commit 2735d72
Showing 10 changed files with 805 additions and 479 deletions.
2 changes: 1 addition & 1 deletion .flake8
@@ -1,5 +1,5 @@
[flake8]
ignore = E203, F821
exclude = .git,__pycache__,docs/source/conf.py,old,build,dist
exclude = .git,__pycache__,docs/source/conf.py,old,build,dist,venv
max-complexity = 10
max-line-length = 100
2 changes: 2 additions & 0 deletions .gitignore
@@ -165,3 +165,5 @@ cython_debug/

# Mac related
*.DS_Store

.python-version
73 changes: 43 additions & 30 deletions docs/autoencoders.md
@@ -2,47 +2,60 @@

An Autoencoder is a type of Artificial Neural Network, used to learn efficient data representations (encoding) of unlabeled data.

It mainly consist of 2 components: an encoder and a decoder. The encoder compresses the input into a lower dimensional code, the decoder then reconstructs the input only using this code.
It mainly consists of 2 components: an encoder and a decoder. The encoder compresses the input into a lower dimensional code, the decoder then reconstructs the input only using this code.

### Autoencoder Pipelines
## Datamodules
PyTorch Lightning datamodules abstract and separate the data functionality from the model and the training loop.
Numalogic provides `TimeseriesDataModule` to help set up and load dataloaders.

Numalogic provides two types of pipelines for Autoencoders. These pipelines serve as a wrapper around the base network models, making it easier to train, predict and generate scores. Also, this module follows the sklearn API.
```python
import numpy as np
from numalogic.tools.data import TimeseriesDataModule

train_data = np.random.randn(100, 3)
datamodule = TimeseriesDataModule(12, train_data, batch_size=128)
```
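As a quick sketch of how the datamodule is consumed (assuming `TimeseriesDataModule` implements the usual LightningDataModule hooks, which the trainer below normally calls for you), the training dataloader can be inspected directly:

```python
# Assumed: standard LightningDataModule hooks; AutoencoderTrainer invokes these during fit()
datamodule.setup(stage="fit")
train_loader = datamodule.train_dataloader()
batch = next(iter(train_loader))  # typically windows of shape (batch, seq_len, n_features)
```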

#### AutoencoderPipeline
## Autoencoder Trainer

Here we are using `VanillAE`, a Vanilla Autoencoder model.
Numalogic provides a subclass of the PyTorch Lightning Trainer module specifically for Autoencoders.
This trainer provides a mechanism to train, validate and infer on data, and supports all the parameters of the Lightning Trainer.

Here we are using `VanillaAE`, a Vanilla Autoencoder model.

```python
from numalogic.models.autoencoder.variants import Conv1dAE
from numalogic.models.autoencoder import SparseAEPipeline
from numalogic.models.autoencoder.variants import VanillaAE
from numalogic.models.autoencoder import AutoencoderTrainer

model = AutoencoderPipeline(
model=VanillaAE(signal_len=12, n_features=3), seq_len=seq_len
)
model.fit(X_train)
model = VanillaAE(seq_len=12, n_features=3)
trainer = AutoencoderTrainer(max_epochs=50, enable_progress_bar=True)
trainer.fit(model, datamodule=datamodule)
```
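For inference, the trainer follows the usual Lightning `predict` flow. Below is a minimal sketch, assuming `StreamingDataset` from `numalogic.tools.data` (which windows a 2-D array into sequences) can be used to build a test dataloader:

```python
import numpy as np
from torch.utils.data import DataLoader
from numalogic.tools.data import StreamingDataset

test_data = np.random.randn(50, 3)
# Assumed signature: StreamingDataset(data, seq_len) yields sliding windows of length 12
test_loader = DataLoader(StreamingDataset(test_data, 12), batch_size=64)

# Output is a list of per-batch predict_step results (assumed here to be the
# reconstruction error for numalogic's autoencoder modules)
recon_err = trainer.predict(model, dataloaders=test_loader)
```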

#### SparseAEPipeline
## Autoencoder Variants

A Sparse Autoencoder is a type of autoencoder that employs sparsity to achieve an information bottleneck. Specifically the loss function is constructed so that activations are penalized within a layer.
Numalogic supports 2 variants of Autoencoders currently.
More details can be found [here](https://www.deeplearningbook.org/contents/autoencoders.html).

So, by adding a sparsity regularization, we will be able to stop the neural network from copying the input and reduce overfitting.
### 1. Undercomplete autoencoders

```python
from numalogic.models.autoencoder.variants import Conv1dAE
from numalogic.models.autoencoder import SparseAEPipeline
This is the simplest version of autoencoders, where the latent dimension is kept
smaller than the encoding and decoding dimensions.

model = SparseAEPipeline(
model=VanillaAE(signal_len=12, n_features=3), seq_len=36, num_epochs=30
)
model.fit(X_train)
```
Examples would be `VanillaAE`, `Conv1dAE`, `LSTMAE` and `TransformerAE`

### 2. Sparse autoencoders
A Sparse Autoencoder is a type of autoencoder that employs sparsity to achieve an information bottleneck.
Specifically, the loss function is constructed so that activations within a layer are penalized.
By adding this sparsity regularization, the network is discouraged from simply copying the input, which reduces overfitting.

Examples would be `SparseVanillaAE`, `SparseConv1dAE`, `SparseLSTMAE` and `SparseTransformerAE`
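As a sketch (assuming the sparse variants accept a regularization weight `beta`, as the `SparseConv1dAE` example further down does), a sparse model is constructed much like its undercomplete counterpart:

```python
from numalogic.models.autoencoder.variants import SparseVanillaAE

# beta (assumed keyword, mirroring the SparseConv1dAE example below) weighs the
# sparsity penalty that is added to the reconstruction loss
model = SparseVanillaAE(seq_len=12, n_features=3, beta=1e-3)
```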

### Autoencoder Variants
## Network architectures

Numalogic supports the following variants of Autoencoders
Numalogic currently supports the following architectures.

#### VanillaAE
#### Fully Connected

Vanilla Autoencoder model comprising only fully connected layers.

@@ -52,17 +65,17 @@ from numalogic.models.autoencoder.variants import VanillaAE
model = VanillaAE(seq_len=12, n_features=2)
```

#### Conv1dAE
#### 1d Convolutional

Conv1dAE is a one dimensional Convolutional Autoencoder with multichannel support.

```python
from numalogic.models.autoencoder.variants import Conv1dAE
from numalogic.models.autoencoder.variants import SparseConv1dAE

model=Conv1dAE(in_channels=3, enc_channels=8)
model = SparseConv1dAE(beta=1e-2, seq_len=12, in_channels=3, enc_channels=8)
```

#### LSTMAE
#### LSTM

An LSTM (Long Short-Term Memory) Autoencoder is an implementation of an autoencoder for sequence data using an Encoder-Decoder LSTM architecture.

@@ -73,7 +86,7 @@ model = LSTMAE(seq_len=12, no_features=2, embedding_dim=15)

```

#### TransformerAE
#### Transformer

The transformer-based Autoencoder model was inspired by the [Attention is all you need](https://arxiv.org/abs/1706.03762) paper.

15 changes: 14 additions & 1 deletion docs/post-processing.md
@@ -3,7 +3,20 @@
The post-processing step is again optional; here we normalize the anomaly scores to a 0-10 range, mainly to make the scores easier to interpret.

```python
import numpy as np
from numalogic.postprocess import tanh_norm

test_anomaly_score_norm = tanh_norm(test_anomaly_score)
raw_anomaly_score = np.random.randn(10, 2)
test_anomaly_score_norm = tanh_norm(raw_anomaly_score)
```

A scikit-learn compatible API is also available.
```python
import numpy as np
from numalogic.postprocess import TanhNorm

raw_score = np.random.randn(10, 2)

norm = TanhNorm(scale_factor=10, smooth_factor=10)
norm_score = norm.fit_transform(raw_score)
```
4 changes: 2 additions & 2 deletions examples/numalogic-simple-pipeline/src/udf/inference.py
@@ -16,8 +16,8 @@
def inference(_: str, datum: Datum) -> Messages:
r"""
Here inference is done on the data, given, the ML model is present
in the registry. If a model does not exist, it moves on Otherwise, conditional forward the inferred data
to postprocess vertex for generating anomaly score for the payload.
in the registry. If a model does not exist, the payload is flagged for training.
It then passes to the threshold vertex.
For more information about the arguments, refer:
https://github.com/numaproj/numaflow-python/blob/main/pynumaflow/function/_dtypes.py
454 changes: 343 additions & 111 deletions examples/quick-start.ipynb

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions numalogic/models/autoencoder/trainer.py
@@ -2,6 +2,8 @@

import pytorch_lightning as pl
import torch

from numalogic.tools.callbacks import ProgressDetails
from numalogic.tools.data import TimeseriesDataModule
from pytorch_lightning import Trainer
from torch import Tensor
@@ -20,8 +22,12 @@ def __init__(
        enable_progress_bar=False,
        enable_model_summary=False,
        limit_val_batches=0,
        callbacks=None,
        **trainer_kw
    ):
        if (not callbacks) and enable_progress_bar:
            callbacks = ProgressDetails()

        super().__init__(
            logger=logger,
            max_epochs=max_epochs,
@@ -31,6 +37,7 @@ def __init__(
            enable_progress_bar=enable_progress_bar,
            enable_model_summary=enable_model_summary,
            limit_val_batches=limit_val_batches,
            callbacks=callbacks,
            **trainer_kw
        )

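A usage sketch for the new `callbacks` parameter (hypothetical values): an explicitly passed callback takes precedence over the auto-attach branch above, and any extra keyword arguments are still forwarded to the underlying Lightning Trainer.

```python
from numalogic.models.autoencoder import AutoencoderTrainer
from numalogic.tools.callbacks import ProgressDetails

# Explicit callback: the auto-attach branch above is skipped
trainer = AutoencoderTrainer(
    max_epochs=30,
    enable_progress_bar=True,
    callbacks=[ProgressDetails(log_freq=10)],
    accelerator="cpu",  # example of a kwarg forwarded via **trainer_kw
)
```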
34 changes: 34 additions & 0 deletions numalogic/tools/callbacks.py
@@ -0,0 +1,34 @@
import logging

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ProgressBarBase


_LOGGER = logging.getLogger(__name__)


class ProgressDetails(ProgressBarBase):
    r"""
    A lightweight training progress detail producer.
    Args:
        log_freq: Interval of epochs to log
    """

    def __init__(self, log_freq: int = 5):
        super().__init__()
        self._log_freq = log_freq
        self._enable = True

    def enable(self) -> None:
        self._enable = True

    def disable(self):
        self._enable = False

    def on_train_epoch_end(self, trainer: pl.Trainer, pl_module: pl.LightningModule) -> None:
        super().on_train_epoch_end(trainer, pl_module)
        metrics = self.get_metrics(trainer, pl_module)
        curr_epoch = trainer.current_epoch
        if curr_epoch % self._log_freq == 0:
            _LOGGER.info("epoch %s, loss: %s", curr_epoch, metrics["loss"])
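One practical note: `ProgressDetails` emits its per-epoch summary through the standard `logging` module at INFO level, so the messages only appear if the application's log configuration allows it, e.g.:

```python
import logging

# The root logger defaults to WARNING; raise it so the epoch/loss lines show up
logging.basicConfig(level=logging.INFO)
```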
