Update v2 docs with new changes #132

Merged
34 commits merged on Oct 19, 2021
Commits
57472e0
clarify labels filepath columns
klwetstone Oct 14, 2021
b9ab825
start updating video loader config list
klwetstone Oct 14, 2021
5141f47
update configs
klwetstone Oct 14, 2021
860f4d6
Update video organization reqs
klwetstone Oct 14, 2021
a3dc816
video_width/height and num_workers
klwetstone Oct 14, 2021
2400fbf
filepaths
klwetstone Oct 14, 2021
14f99b9
default num_workers
klwetstone Oct 14, 2021
c2b49a1
update num_workers in CLI
klwetstone Oct 14, 2021
4ad727c
writing out train_config and predict_config yamls
klwetstone Oct 14, 2021
cfe7d8a
splits.csv
klwetstone Oct 14, 2021
16d38c4
updates based on netlify preview
klwetstone Oct 14, 2021
502f642
add screencast video demo
klwetstone Oct 14, 2021
f5da007
fix linting fail
klwetstone Oct 15, 2021
581ee25
updates for caching PR 131
klwetstone Oct 15, 2021
7aa9ee4
typo
klwetstone Oct 15, 2021
ffd6243
use best terminal video from asciinema
klwetstone Oct 15, 2021
8d45102
specific explanation of frame_selection_height
klwetstone Oct 18, 2021
c49700d
correct default batch size
klwetstone Oct 18, 2021
c583bf0
add frame_selection_width v model_input_width
klwetstone Oct 18, 2021
dffe86e
talk about num_workers more
klwetstone Oct 18, 2021
308b70a
flake8 fix
klwetstone Oct 19, 2021
dc94abd
PR feedback
klwetstone Oct 19, 2021
5dc1fa4
update home page language
klwetstone Oct 19, 2021
3c95d89
fix typo and change from sections to expand
klwetstone Oct 19, 2021
50779df
reduce toc depth to 2 to allow expand
klwetstone Oct 19, 2021
c3900c2
use megadetectorlite consistently
klwetstone Oct 19, 2021
ded14b2
move api reference to end
klwetstone Oct 19, 2021
666c8ad
add train data size recs
klwetstone Oct 19, 2021
d0b518c
Update docs/docs/train-tutorial.md
klwetstone Oct 19, 2021
c115932
Enable nav index page for models
jayqi Oct 19, 2021
e0cf326
Merge pull request #136 from drivendataorg/500-docs-updates-section-i…
klwetstone Oct 19, 2021
4abaab4
make contribute section header
klwetstone Oct 19, 2021
97b9729
Edit MDLite language
ejm714 Oct 19, 2021
a81fa39
put yolox back in
ejm714 Oct 19, 2021
185 changes: 110 additions & 75 deletions docs/docs/configurations.md

Large diffs are not rendered by default.

22 changes: 10 additions & 12 deletions docs/docs/debugging.md
@@ -36,33 +36,31 @@ The dry run will also catch any GPU memory errors. If you hit a GPU memory error

#### Decreasing video size

Resize video frames to be smaller before they are passed to the model. The default for all three models is 224x224 pixels. `video_height` and `video_width` cannot be passed directly to the command line, so if you are using the CLI these must be specified in a [YAML file](yaml-config.md).
Resize video frames to be smaller before they are passed to the model. The default for all three models is 240x426 pixels. `model_input_height` and `model_input_width` cannot be passed directly to the command line, so if you are using the CLI these must be specified in a [YAML file](yaml-config.md).

=== "YAML file"
```yaml
video_loader_config:
video_height: 100
video_width: 100
model_input_height: 100
model_input_width: 100
total_frames: 16 # total_frames is always required
```
=== "Python"
```python
video_loader_config = VideoLoaderConfig(
video_height=100, video_width=100, total_frames=16
model_input_height=100, model_input_width=100, total_frames=16
) # total_frames is always required
```

#### Reducing `num_workers`

Reduce the number of workers (subprocesses) used for data loading. By default, `num_workers` will be set to either one less than the number of CPUs in the system, or one if there is only one CPU in the system. `num_workers` cannot be passed directly to the command line, so if you are using the CLI it must be specified in a [YAML file](yaml-config.md).
Reduce the number of workers (subprocesses) used for data loading. By default `num_workers` will be set to 3. The minimum value is 0, which means that the data will be loaded in the main process, and the maximum is one less than the number of CPUs in the system.

=== "YAML file"
In a YAML file, add `num_workers` to `predict_config` or `train_config`:
```yaml
train_config:
data_directory: "example_vids/" # required
labels: "example_labels.csv" # required
num_workers: 1
=== "CLI"
```console
$ zamba predict --data-dir example_vids/ --num-workers 1

$ zamba train --data-dir example_vids/ --labels example_labels.csv --num-workers 1
```
=== "Python"
In Python, add `num_workers` to [`PredictConfig`](configurations.md#prediction-arguments) or [`TrainConfig`](configurations.md#training-arguments):
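A minimal sketch of what that looks like (the file paths are placeholders, and `TrainConfig` is assumed to be importable from the same module as `PredictConfig`):
```python
from zamba.models.config import PredictConfig, TrainConfig

# Reduce data-loading subprocesses for inference
predict_config = PredictConfig(
    data_directory="example_vids/",
    num_workers=1,
)

# Reduce data-loading subprocesses for training
train_config = TrainConfig(
    data_directory="example_vids/",
    labels="example_labels.csv",
    num_workers=1,
)
```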
70 changes: 55 additions & 15 deletions docs/docs/extra-options.md
@@ -31,24 +31,32 @@ The options for `weight_download_region` are `us`, `eu`, and `asia`. Once a mode

## Video size

When `zamba` loads videos prior to either inference or training, it resizes all of the video frames before feeding them into a model. Higher resolution videos will lead to more detailed accuracy in prediction, but will use more memory and take longer to either predict on or train from. The default video loading configuration for all three pretrained models resizes images to 224x224 pixels.
When `zamba` loads videos prior to either inference or training, it resizes all of the video frames before feeding them into a model. Higher resolution videos will lead to more detailed accuracy in prediction, but will use more memory and take longer to either predict on or train from. The default video loading configuration for all three pretrained models resizes images to 240x426 pixels.

Say that you have a large number of videos, and you are more concerned with detecting blank v. non-blank videos than with identifying different species. In this case, you may not need a very high resolution and iterating through all of your videos with a high resolution would take a very long time. To resize all images to 50x50 pixels instead of the default 224x224:
Say that you have a large number of videos, and you are more concerned with detecting blank v. non-blank videos than with identifying different species. In this case, you may not need a very high resolution and iterating through all of your videos with a high resolution would take a very long time. To resize all images to 50x50 pixels instead of the default 240x426:

=== "YAML file"
```yaml
video_loader_config:
video_height: 50
video_width: 50
model_input_height: 50
model_input_width: 50
total_frames: 16 # total_frames must always be specified
```
=== "Python"
In Python, video resizing can be specified when `VideoLoaderConfig` is instantiated:

```python
```python hl_lines="6 7 8"
from zamba.models.model_manager import predict_model
from zamba.models.config import PredictConfig
from zamba.data.video import VideoLoaderConfig

predict_config = PredictConfig(data_directory="example_vids/")
video_loader_config = VideoLoaderConfig(
video_height=50, video_width=50, total_frames=16
model_input_height=50, model_input_width=50, total_frames=16
) # total_frames must always be specified
predict_model(
predict_config=predict_config, video_loader_config=video_loader_config
)
```

## Frame selection
@@ -64,8 +72,8 @@ Some camera traps begin recording a video when movement is detected. If this is
=== "YAML File"
```yaml
video_loader_config:
early_bias: True
# ... other parameters
early_bias: True
# ... other parameters
```
=== "Python"
In Python, `early_bias` is specified when `VideoLoaderConfig` is instantiated:
@@ -78,7 +86,7 @@ This method was used by the winning solution of the [Pri-matrix Factorization](h

### Evenly distributed frames

A simple option is to sample frames that are evenly distributed throughout a video. For example, to select 32 evenly distributed frames, add the following to a [YAML configuration file](yaml-config.md):
A simple option is to sample frames that are evenly distributed throughout a video. For example, to select 32 evenly distributed frames:

=== "YAML file"
```yaml
@@ -99,9 +107,9 @@ A simple option is to sample frames that are evenly distributed throughout a vid
)
```

### MegadetectorLiteYoloX
### MegadetectorLite

You can use a pretrained object detection model called [MegadetectorLiteYoloX](models.md#megadetectorliteyolox) to select only the frames that are mostly likely to contain an animal. This is the default strategy for all three pretrained models. The parameter `megadetector_lite_config` is used to specify any arguments that should be passed to the megadetector model. If `megadetector_lite_config` is None, the MegadetectorLiteYoloX model will not be used.
You can use a pretrained object detection model called [MegadetectorLite](models.md#megadetectorlite) to select only the frames that are most likely to contain an animal. This is the default strategy for all three pretrained models. The parameter `megadetector_lite_config` is used to specify any arguments that should be passed to the MegadetectorLite model. If `megadetector_lite_config` is None, the MegadetectorLite model will not be used.

For example, to take the 16 frames with the highest probability of detection:

@@ -117,8 +125,8 @@ For example, to take the 16 frames with the highest probability of detection:
In Python, these can be specified in the `megadetector_lite_config` argument passed to `VideoLoaderConfig`:
```python hl_lines="6 7 8 9 10"
video_loader_config = VideoLoaderConfig(
video_height=224,
video_width=224,
model_input_height=240,
model_input_width=426,
crop_bottom_pixels=50,
ensure_total_frames=True,
megadetector_lite_config={
@@ -134,6 +142,38 @@ For example, to take the 16 frames with the highest probability of detection:
train_model(video_loader_config=video_loader_config, train_config=train_config)
```

To see all of the options that can be passed to `MegadetectorLiteYoloX`, see the `MegadetectorLiteYoloXConfig` class. <!-- TODO: add link to github code><!-->
If you are using the [MegadetectorLite](models.md#megadetectorlite) for frame selection, there are two ways that you can specify frame resizing:

- `frame_selection_width` and `frame_selection_height` resize images *before* they are input to the frame selection method. If both are `None`, the full size images will be used during frame selection. Using full size images for selection is recommended for better detection of smaller species, but will slow down training and inference.
- `model_input_height` and `model_input_width` resize images *after* frame selection. These specify the image size that is passed to the actual model.

You can specify both of the above at once, just one, or neither. The example code feeds full-size images to MegadetectorLite, and then resizes images before running them through the neural network.
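As an illustrative sketch (the values mirror the defaults shown elsewhere in these docs and are not mandatory), the following keeps full-size frames for detection and resizes only the frames that are passed to the classifier:

```python
from zamba.data.video import VideoLoaderConfig

video_loader_config = VideoLoaderConfig(
    # Leave frame_selection_height/width as None so MegadetectorLite
    # sees full-size frames (better for small species, but slower)
    frame_selection_height=None,
    frame_selection_width=None,
    # After frame selection, resize the chosen frames for the model itself
    model_input_height=240,
    model_input_width=426,
    total_frames=16,  # total_frames must always be specified
)
```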

To see all of the options that can be passed to the MegadetectorLite, see the `MegadetectorLiteYoloXConfig` class. <!-- TODO: add link to github code><!-->

## Speed up training

Training will run faster if you increase `num_workers` and/or increase `batch_size`. `num_workers` is the number of subprocesses to use for data loading. The minimum is 0, meaning the data will be loaded in the main process, and the maximum is one less than the number of CPUs in your system. By default `num_workers` is set to 3 and `batch_size` is set to 2. Increasing either of these will use more GPU memory, and could raise an error if the memory required is more than your machine has available.

Both can be specified in either [`predict_config`](configurations.md#prediction-arguments) or [`train_config`](configurations.md#training-arguments). For example, to increase `num_workers` to 5 and `batch_size` to 4 for inference:

=== "YAML file"
```yaml
predict_config:
data_directory: example_vids/
num_workers: 5
batch_size: 4
# ... other parameters
```
=== "Python"
```python
predict_config = PredictConfig(
data_directory="example_vids/",
num_workers=5,
batch_size=4,
# ... other parameters
)
```


And that's just the tip of the iceberg! See the [All Optional Arguments](configurations.md) page for more possibilities.
And that's just the tip of the iceberg! See the [All Optional Arguments](configurations.md) page for more possibilities.
7 changes: 1 addition & 6 deletions docs/docs/index.md
@@ -8,12 +8,7 @@ Welcome to zamba's documentation!

*Zamba means "forest" in the Lingala language.*


Zamba is a tool built in Python to automatically identify the species seen
in camera trap videos from sites in Africa and Europe. Using the combined
input of various deep learning models, the tool makes predictions for 42
common species in these videos (as well as blank, or, "no species present").
Zamba can be accessed as both a command-line tool and a Python package.
Zamba is a tool built in Python to automatically detect and classify the species seen in camera trap videos. Using state-of-the-art computer vision and machine learning, the tool is trained to identify 42 common species from sites in Africa and Europe (as well as blank, or "no species present"). Users can also input their own labeled videos to finetune a model and make predictions for new species or new contexts. `zamba` can be accessed as both a command-line tool and a Python package.

Zamba ships with three model options. `time_distributed` and `slowfast` are
trained on 32 common species from central and west Africa. `european` is trained
52 changes: 26 additions & 26 deletions docs/docs/models.md
@@ -118,13 +118,13 @@ See](https://www.chimpandsee.org/). The data included camera trap videos from:

<!-- TODO: add link to yaml file><!-->

By default, an efficient object detection model called [MegadetectorLiteYoloX](#megadetectorliteyolox) is run on all frames to determine which are the most likely to contain an animal. Then `time_distributed` is run on only the 16 frames with the highest predicted probability of detection. By default, videos are resized to 224x224 pixels.
By default, an efficient object detection model called [MegadetectorLite](#megadetectorlite) is run on all frames to determine which are the most likely to contain an animal. Then `time_distributed` is run on only the 16 frames with the highest predicted probability of detection. By default, videos are resized to 240x426 pixels.

The full default video loading configuration is:
```yaml
video_loader_config:
video_height: 224
video_width: 224
model_input_height: 240
model_input_width: 426
crop_bottom_pixels: 50
ensure_total_frames: True
megadetector_lite_config:
@@ -140,15 +140,15 @@ The above is pulled in by default if `time_distributed` is used in the command l
=== "YAML file"
```yaml
video_loader_config:
video_height: # any integer
video_width: # any integer
model_input_height: # any integer
model_input_width: # any integer
total_frames: 16
```
=== "Python"
```python
video_loader_config = VideoLoaderConfig(
video_height=..., # any integer
video_width=..., # any integer
model_input_height=..., # any integer
model_input_width=..., # any integer
total_frames=16
)
```
@@ -177,14 +177,14 @@ The `slowfast` model was trained using the same data as the [`time_distributed`

<!-- TODO: add link to yaml file><!-->

By default, an efficient object detection model called [MegadetectorLiteYoloX](#megadetectorliteyolox) is run on all frames to determine which are the most likely to contain an animal. Then `slowfast` is run on only the 32 frames with the highest predicted probability of detection. By default, videos are resized to 224x224 pixels.
By default, an efficient object detection model called [MegadetectorLite](#megadetectorlite) is run on all frames to determine which are the most likely to contain an animal. Then `slowfast` is run on only the 32 frames with the highest predicted probability of detection. By default, videos are resized to 240x426 pixels.

The full default video loading configuration is:

```yaml
video_loader_config:
video_height: 224
video_width: 224
model_input_height: 240
model_input_width: 426
crop_bottom_pixels: 50
ensure_total_frames: True
megadetector_lite_config:
@@ -200,15 +200,15 @@ The above is pulled in by default if `slowfast` is used in the command line. If
=== "YAML file"
```yaml
video_loader_config:
video_height: # any integer >= 200
video_width: # any integer >= 200
model_input_height: # any integer >= 200
model_input_width: # any integer >= 200
total_frames: 32
```
=== "Python"
```python
video_loader_config = VideoLoaderConfig(
video_height=..., # any integer >= 200
video_width=..., # any integer >= 200
model_input_height=..., # any integer >= 200
model_input_width=..., # any integer >= 200
total_frames=32
)
```
@@ -234,13 +234,13 @@ Evolutionary Anthropology](https://www.eva.mpg.de/index.html). The finetuning da

<!-- TODO: add link to yaml file><!-->

By default, an efficient object detection model called [MegadetectorLiteYoloX](#megadetectorliteyolox) is run on all frames to determine which are the most likely to contain an animal. Then `european` is run on only the 16 frames with the highest predicted probability of detection. By default, videos are resized to 224x224 pixels.
By default, an efficient object detection model called [MegadetectorLite](#megadetectorlite) is run on all frames to determine which are the most likely to contain an animal. Then `european` is run on only the 16 frames with the highest predicted probability of detection. By default, videos are resized to 240x426 pixels.

The full default video loading configuration is:
```yaml
video_loader_config:
video_height: 224
video_width: 224
model_input_height: 240
model_input_width: 426
crop_bottom_pixels: 50
ensure_total_frames: True
megadetector_lite_config:
@@ -257,28 +257,28 @@ The above is pulled in by default if `european` is used in the command line. If
=== "YAML file"
```yaml
video_loader_config:
video_height: # any integer
video_width: # any integer
model_input_height: # any integer
model_input_width: # any integer
total_frames: 16
```
=== "Python"
```python
video_loader_config = VideoLoaderConfig(
video_height=..., # any integer
video_width=..., # any integer
model_input_height=..., # any integer
model_input_width=..., # any integer
total_frames=16
)
```

<a id='megadetectorliteyolox'></a>
<a id='megadetectorlite'></a>

## MegadetectorLiteYoloX
## MegadetectorLite

Running any of the three models that ship with `zamba` on all frames of a video would be incredibly time consuming and computationally intensive. Instead, `zamba` uses a more efficient object detection model called MegadetectorLiteYoloX to determine the likelihood that each frame contains an animal. Then, only the frames with the highest probability of detection can be passed to the model.
Running any of the three models that ship with `zamba` on all frames of a video would be incredibly time consuming and computationally intensive. Instead, `zamba` uses a more efficient object detection model called MegadetectorLite to determine the likelihood that each frame contains an animal. Then, only the frames with the highest probability of detection can be passed to the model.

MegadetectorLiteYoloX combines two open-source models:
MegadetectorLite combines two open-source models:

* [Megadetector](https://github.com/microsoft/CameraTraps/blob/master/megadetector.md) is a pretrained image model designed to detect animals, people, and vehicles in camera trap videos.
* [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) is a high-performance, lightweight object detection model that is much less computationally intensive than Megadetector.

Megadetector is much better at identifying frames with animals than YOLOX, but too computationally intensive to run on every frame. MegadetectorLiteYoloX was created by training the YOLOX model using the predictions of the Megadetector as ground truth - this method is called [student-teacher training](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764).
The Megadetector is much better at identifying frames with animals than YOLOX, but too computationally intensive to run on every frame. MegadetectorLite was created by training the YOLOX model using the predictions of the Megadetector as ground truth - this method is called [student-teacher training](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764).
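
To make this concrete, here is a minimal sketch of turning frame selection on or off through `megadetector_lite_config` (the `n_frames` key is an assumption for illustration and may not match the actual `MegadetectorLiteYoloXConfig` field names):

```python
from zamba.data.video import VideoLoaderConfig

# With frame selection: MegadetectorLite scores every frame and the
# top-scoring frames are kept (n_frames is an assumed parameter name).
with_selection = VideoLoaderConfig(
    model_input_height=240,
    model_input_width=426,
    megadetector_lite_config={"n_frames": 16},
    total_frames=16,
)

# Without frame selection: passing None skips MegadetectorLite entirely.
without_selection = VideoLoaderConfig(
    model_input_height=240,
    model_input_width=426,
    megadetector_lite_config=None,
    total_frames=16,
)
```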