Update v2 docs with new changes #132

Merged
34 commits merged on Oct 19, 2021
Commits
57472e0
clarify labels filepath columns
klwetstone Oct 14, 2021
b9ab825
start updating video loader config list
klwetstone Oct 14, 2021
5141f47
update configs
klwetstone Oct 14, 2021
860f4d6
Update video organization reqs
klwetstone Oct 14, 2021
a3dc816
video_width/height and num_workers
klwetstone Oct 14, 2021
2400fbf
filepaths
klwetstone Oct 14, 2021
14f99b9
default num_workers
klwetstone Oct 14, 2021
c2b49a1
update num_workers in CLI
klwetstone Oct 14, 2021
4ad727c
writing out train_config and predict_config yamls
klwetstone Oct 14, 2021
cfe7d8a
splits.csv
klwetstone Oct 14, 2021
16d38c4
updates based on netlify preview
klwetstone Oct 14, 2021
502f642
add screencast video demo
klwetstone Oct 14, 2021
f5da007
fix linting fail
klwetstone Oct 15, 2021
581ee25
updates for caching PR 131
klwetstone Oct 15, 2021
7aa9ee4
typo
klwetstone Oct 15, 2021
ffd6243
use best terminal video from asciinema
klwetstone Oct 15, 2021
8d45102
specific explanation of frame_selection_height
klwetstone Oct 18, 2021
c49700d
correct default batch size
klwetstone Oct 18, 2021
c583bf0
add frame_selection_width v model_input_width
klwetstone Oct 18, 2021
dffe86e
talk about num_workers more
klwetstone Oct 18, 2021
308b70a
flake8 fix
klwetstone Oct 19, 2021
dc94abd
PR feedback
klwetstone Oct 19, 2021
5dc1fa4
update home page language
klwetstone Oct 19, 2021
3c95d89
fix typo and change from sections to expand
klwetstone Oct 19, 2021
50779df
reduce toc depth to 2 to allow expand
klwetstone Oct 19, 2021
c3900c2
use megadetectorlite consistently
klwetstone Oct 19, 2021
ded14b2
move api reference to end
klwetstone Oct 19, 2021
666c8ad
add train data size recs
klwetstone Oct 19, 2021
d0b518c
Update docs/docs/train-tutorial.md
klwetstone Oct 19, 2021
c115932
Enable nav index page for models
jayqi Oct 19, 2021
e0cf326
Merge pull request #136 from drivendataorg/500-docs-updates-section-i…
klwetstone Oct 19, 2021
4abaab4
make contribute section header
klwetstone Oct 19, 2021
97b9729
Edit MDLite language
ejm714 Oct 19, 2021
a81fa39
put yolox back in
ejm714 Oct 19, 2021
185 changes: 110 additions & 75 deletions docs/docs/configurations.md

Large diffs are not rendered by default.

22 changes: 10 additions & 12 deletions docs/docs/debugging.md
@@ -36,33 +36,31 @@ The dry run will also catch any GPU memory errors. If you hit a GPU memory error

#### Decreasing video size

Resize video frames to be smaller before they are passed to the model. The default for all three models is 224x224 pixels. `video_height` and `video_width` cannot be passed directly to the command line, so if you are using the CLI these must be specified in a [YAML file](yaml-config.md).
Resize video frames to be smaller before they are passed to the model. The default for all three models is 240x426 pixels. `model_input_height` and `model_input_width` cannot be passed directly to the command line, so if you are using the CLI these must be specified in a [YAML file](yaml-config.md).

=== "YAML file"
```yaml
video_loader_config:
video_height: 100
video_width: 100
model_input_height: 100
model_input_width: 100
total_frames: 16 # total_frames is always required
```
=== "Python"
```python
video_loader_config = VideoLoaderConfig(
video_height=100, video_width=100, total_frames=16
model_input_height=100, model_input_width=100, total_frames=16
) # total_frames is always required
```

#### Reducing `num_workers`

Reduce the number of workers (subprocesses) used for data loading. By default, `num_workers` will be set to either one less than the number of CPUs in the system, or one if there is only one CPU in the system. `num_workers` cannot be passed directly to the command line, so if you are using the CLI it must be specified in a [YAML file](yaml-config.md).
Reduce the number of workers (subprocesses) used for data loading. By default `num_workers` will be set to 3. The minimum value is 0, which means that the data will be loaded in the main process, and the maximum is one less than the number of CPUs in the system.

=== "YAML file"
In a YAML file, add `num_workers` to `predict_config` or `train_config`:
```yaml
train_config:
data_directory: "example_vids/" # required
labels: "example_labels.csv" # required
num_workers: 1
=== "CLI"
```console
$ zamba predict --data-dir example_vids/ --num-workers 1

$ zamba train --data-dir example_vids/ --labels example_labels.csv --num-workers 1
```
=== "Python"
In Python, add `num_workers` to [`PredictConfig`](configurations.md#prediction-arguments) or [`TrainConfig`](configurations.md#training-arguments):
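A minimal sketch of what that looks like (the file paths are placeholders, and `TrainConfig` is assumed to be importable from the same module as `PredictConfig`):
```python
from zamba.models.config import PredictConfig, TrainConfig

# Reduce data-loading subprocesses for inference
predict_config = PredictConfig(
    data_directory="example_vids/",
    num_workers=1,
)

# Reduce data-loading subprocesses for training
train_config = TrainConfig(
    data_directory="example_vids/",
    labels="example_labels.csv",
    num_workers=1,
)
```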
70 changes: 55 additions & 15 deletions docs/docs/extra-options.md
@@ -31,24 +31,32 @@ The options for `weight_download_region` are `us`, `eu`, and `asia`. Once a mode

## Video size

When `zamba` loads videos prior to either inference or training, it resizes all of the video frames before feeding them into a model. Higher resolution videos will lead to more detailed accuracy in prediction, but will use more memory and take longer to either predict on or train from. The default video loading configuration for all three pretrained models resizes images to 224x224 pixels.
When `zamba` loads videos prior to either inference or training, it resizes all of the video frames before feeding them into a model. Higher resolution videos will lead to more detailed accuracy in prediction, but will use more memory and take longer to either predict on or train from. The default video loading configuration for all three pretrained models resizes images to 240x426 pixels.

Say that you have a large number of videos, and you are more concerned with detecting blank v. non-blank videos than with identifying different species. In this case, you may not need a very high resolution and iterating through all of your videos with a high resolution would take a very long time. To resize all images to 50x50 pixels instead of the default 224x224:
Say that you have a large number of videos, and you are more concerned with detecting blank v. non-blank videos than with identifying different species. In this case, you may not need a very high resolution and iterating through all of your videos with a high resolution would take a very long time. To resize all images to 50x50 pixels instead of the default 240x426:

=== "YAML file"
```yaml
video_loader_config:
video_height: 50
video_width: 50
model_input_height: 50
model_input_width: 50
total_frames: 16 # total_frames must always be specified
```
=== "Python"
In Python, video resizing can be specified when `VideoLoaderConfig` is instantiated:

```python
```python hl_lines="6 7 8"
from zamba.models.model_manager import predict_model
from zamba.models.config import PredictConfig
from zamba.data.video import VideoLoaderConfig

predict_config = PredictConfig(data_directory="example_vids/")
video_loader_config = VideoLoaderConfig(
video_height=50, video_width=50, total_frames=16
model_input_height=50, model_input_width=50, total_frames=16
) # total_frames must always be specified
predict_model(
predict_config=predict_config, video_loader_config=video_loader_config
)
```

## Frame selection
@@ -64,8 +72,8 @@ Some camera traps begin recording a video when movement is detected. If this is
=== "YAML File"
```yaml
video_loader_config:
early_bias: True
# ... other parameters
early_bias: True
# ... other parameters
```
=== "Python"
In Python, `early_bias` is specified when `VideoLoaderConfig` is instantiated:
@@ -78,7 +86,7 @@ This method was used by the winning solution of the [Pri-matrix Factorization](h

### Evenly distributed frames

A simple option is to sample frames that are evenly distributed throughout a video. For example, to select 32 evenly distributed frames, add the following to a [YAML configuration file](yaml-config.md):
A simple option is to sample frames that are evenly distributed throughout a video. For example, to select 32 evenly distributed frames:

=== "YAML file"
```yaml
@@ -99,9 +107,9 @@ A simple option is to sample frames that are evenly distributed throughout a vid
)
```

### MegadetectorLiteYoloX
### MegadetectorLite

You can use a pretrained object detection model called [MegadetectorLiteYoloX](models.md#megadetectorliteyolox) to select only the frames that are mostly likely to contain an animal. This is the default strategy for all three pretrained models. The parameter `megadetector_lite_config` is used to specify any arguments that should be passed to the megadetector model. If `megadetector_lite_config` is None, the MegadetectorLiteYoloX model will not be used.
You can use a pretrained object detection model called [MegadetectorLite](models.md#megadetectorlite) to select only the frames that are most likely to contain an animal. This is the default strategy for all three pretrained models. The parameter `megadetector_lite_config` is used to specify any arguments that should be passed to the MegadetectorLite model. If `megadetector_lite_config` is None, the MegadetectorLite model will not be used.

For example, to take the 16 frames with the highest probability of detection:

@@ -117,8 +125,8 @@ For example, to take the 16 frames with the highest probability of detection:
In Python, these can be specified in the `megadetector_lite_config` argument passed to `VideoLoaderConfig`:
```python hl_lines="6 7 8 9 10"
video_loader_config = VideoLoaderConfig(
video_height=224,
video_width=224,
model_input_height=240,
model_input_width=426,
crop_bottom_pixels=50,
ensure_total_frames=True,
megadetector_lite_config={
@@ -134,6 +142,38 @@ For example, to take the 16 frames with the highest probability of detection:
train_model(video_loader_config=video_loader_config, train_config=train_config)
```

To see all of the options that can be passed to `MegadetectorLiteYoloX`, see the `MegadetectorLiteYoloXConfig` class. <!-- TODO: add link to github code><!-->
If you are using the [MegadetectorLite](models.md#megadetectorlite) for frame selection, there are two ways that you can specify frame resizing:

- `frame_selection_width` and `frame_selection_height` resize images *before* they are input to the frame selection method. If both are `None`, the full size images will be used during frame selection. Using full size images for selection is recommended for better detection of smaller species, but will slow down training and inference.
- `model_input_height` and `model_input_width` resize images *after* frame selection. These specify the image size that is passed to the actual model.

You can specify both of the above at once, just one, or neither. The example code feeds full-size images to MegadetectorLite, and then resizes images before running them through the neural network.
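As an illustrative sketch (the values mirror the defaults shown elsewhere in these docs and are not mandatory), the following keeps full-size frames for detection and resizes only the frames that are passed to the classifier:

```python
from zamba.data.video import VideoLoaderConfig

video_loader_config = VideoLoaderConfig(
    # Leave frame_selection_height/width as None so MegadetectorLite
    # sees full-size frames (better for small species, but slower)
    frame_selection_height=None,
    frame_selection_width=None,
    # After frame selection, resize the chosen frames for the model itself
    model_input_height=240,
    model_input_width=426,
    total_frames=16,  # total_frames must always be specified
)
```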

To see all of the options that can be passed to the MegadetectorLite, see the `MegadetectorLiteYoloXConfig` class. <!-- TODO: add link to github code><!-->

## Speed up training

Training will run faster if you increase `num_workers` and/or increase `batch_size`. `num_workers` is the number of subprocesses to use for data loading. The minimum is 0, meaning the data will be loaded in the main process, and the maximum is one less than the number of CPUs in your system. By default `num_workers` is set to 3 and `batch_size` is set to 2. Increasing either of these will use more GPU memory, and could raise an error if the memory required is more than your machine has available.

Both can be specified in either [`predict_config`](configurations.md#prediction-arguments) or [`train_config`](configurations.md#training-arguments). For example, to increase `num_workers` to 5 and `batch_size` to 4 for inference:

=== "YAML file"
```yaml
predict_config:
data_directory: example_vids/
num_workers: 5
batch_size: 4
# ... other parameters
```
=== "Python"
```python
predict_config = PredictConfig(
data_directory="example_vids/",
num_workers=5,
batch_size=4,
# ... other parameters
)
```


And that's just the tip of the iceberg! See the [All Optional Arguments](configurations.md) page for more possibilities.
And that's just the tip of the iceberg! See the [All Optional Arguments](configurations.md) page for more possibilities.
7 changes: 1 addition & 6 deletions docs/docs/index.md
@@ -8,12 +8,7 @@ Welcome to zamba's documentation!

*Zamba means "forest" in the Lingala language.*


Zamba is a tool built in Python to automatically identify the species seen
in camera trap videos from sites in Africa and Europe. Using the combined
input of various deep learning models, the tool makes predictions for 42
common species in these videos (as well as blank, or, "no species present").
Zamba can be accessed as both a command-line tool and a Python package.
Zamba is a tool built in Python to automatically detect and classify the species seen in camera trap videos. Using state-of-the-art computer vision and machine learning, the tool is trained to identify 42 common species from sites in Africa and Europe (as well as blank, or "no species present"). Users can also input their own labeled videos to finetune a model and make predictions for new species or new contexts. `zamba` can be accessed as both a command-line tool and a Python package.

Zamba ships with three model options. `time_distributed` and `slowfast` are
trained on 32 common species from central and west Africa. `european` is trained
52 changes: 26 additions & 26 deletions docs/docs/models.md
@@ -118,13 +118,13 @@ See](https://www.chimpandsee.org/). The data included camera trap videos from:

<!-- TODO: add link to yaml file><!-->

By default, an efficient object detection model called [MegadetectorLiteYoloX](#megadetectorliteyolox) is run on all frames to determine which are the most likely to contain an animal. Then `time_distributed` is run on only the 16 frames with the highest predicted probability of detection. By default, videos are resized to 224x224 pixels.
By default, an efficient object detection model called [MegadetectorLite](#megadetectorlite) is run on all frames to determine which are the most likely to contain an animal. Then `time_distributed` is run on only the 16 frames with the highest predicted probability of detection. By default, videos are resized to 240x426 pixels.

The full default video loading configuration is:
```yaml
video_loader_config:
video_height: 224
video_width: 224
model_input_height: 240
model_input_width: 426
crop_bottom_pixels: 50
ensure_total_frames: True
megadetector_lite_config:
@@ -140,15 +140,15 @@ The above is pulled in by default if `time_distributed` is used in the command l
=== "YAML file"
```yaml
video_loader_config:
video_height: # any integer
video_width: # any integer
model_input_height: # any integer
model_input_width: # any integer
total_frames: 16
```
=== "Python"
```python
video_loader_config = VideoLoaderConfig(
video_height=..., # any integer
video_width=..., # any integer
model_input_height=..., # any integer
model_input_width=..., # any integer
total_frames=16
)
```
@@ -177,14 +177,14 @@ The `slowfast` model was trained using the same data as the [`time_distributed`

<!-- TODO: add link to yaml file><!-->

By default, an efficient object detection model called [MegadetectorLiteYoloX](#megadetectorliteyolox) is run on all frames to determine which are the most likely to contain an animal. Then `slowfast` is run on only the 32 frames with the highest predicted probability of detection. By default, videos are resized to 224x224 pixels.
By default, an efficient object detection model called [MegadetectorLite](#megadetectorlite) is run on all frames to determine which are the most likely to contain an animal. Then `slowfast` is run on only the 32 frames with the highest predicted probability of detection. By default, videos are resized to 240x426 pixels.

The full default video loading configuration is:

```yaml
video_loader_config:
video_height: 224
video_width: 224
model_input_height: 240
model_input_width: 426
crop_bottom_pixels: 50
ensure_total_frames: True
megadetector_lite_config:
@@ -200,15 +200,15 @@ The above is pulled in by default if `slowfast` is used in the command line. If
=== "YAML file"
```yaml
video_loader_config:
video_height: # any integer >= 200
video_width: # any integer >= 200
model_input_height: # any integer >= 200
model_input_width: # any integer >= 200
total_frames: 32
```
=== "Python"
```python
video_loader_config = VideoLoaderConfig(
video_height=..., # any integer >= 200
video_width=..., # any integer >= 200
model_input_height=..., # any integer >= 200
model_input_width=..., # any integer >= 200
total_frames=32
)
```
@@ -234,13 +234,13 @@ Evolutionary Anthropology](https://www.eva.mpg.de/index.html). The finetuning da

<!-- TODO: add link to yaml file><!-->

By default, an efficient object detection model called [MegadetectorLiteYoloX](#megadetectorliteyolox) is run on all frames to determine which are the most likely to contain an animal. Then `european` is run on only the 16 frames with the highest predicted probability of detection. By default, videos are resized to 224x224 pixels.
By default, an efficient object detection model called [MegadetectorLite](#megadetectorlite) is run on all frames to determine which are the most likely to contain an animal. Then `european` is run on only the 16 frames with the highest predicted probability of detection. By default, videos are resized to 240x426 pixels.

The full default video loading configuration is:
```yaml
video_loader_config:
video_height: 224
video_width: 224
model_input_height: 240
model_input_width: 426
crop_bottom_pixels: 50
ensure_total_frames: True
megadetector_lite_config:
@@ -257,28 +257,28 @@ The above is pulled in by default if `european` is used in the command line. If
=== "YAML file"
```yaml
video_loader_config:
video_height: # any integer
video_width: # any integer
model_input_height: # any integer
model_input_width: # any integer
total_frames: 16
```
=== "Python"
```python
video_loader_config = VideoLoaderConfig(
video_height=..., # any integer
video_width=..., # any integer
model_input_height=..., # any integer
model_input_width=..., # any integer
total_frames=16
)
```

<a id='megadetectorliteyolox'></a>
<a id='megadetectorlite'></a>

## MegadetectorLiteYoloX
## MegadetectorLite

Running any of the three models that ship with `zamba` on all frames of a video would be incredibly time consuming and computationally intensive. Instead, `zamba` uses a more efficient object detection model called MegadetectorLiteYoloX to determine the likelihood that each frame contains an animal. Then, only the frames with the highest probability of detection can be passed to the model.
Running any of the three models that ship with `zamba` on all frames of a video would be incredibly time consuming and computationally intensive. Instead, `zamba` uses a more efficient object detection model called MegadetectorLite to determine the likelihood that each frame contains an animal. Then, only the frames with the highest probability of detection can be passed to the model.

MegadetectorLiteYoloX combines two open-source models:
MegadetectorLite combines two open-source models:

* [Megadetector](https://github.com/microsoft/CameraTraps/blob/master/megadetector.md) is a pretrained image model designed to detect animals, people, and vehicles in camera trap videos.
* [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) is a high-performance, lightweight object detection model that is much less computationally intensive than Megadetector.

Megadetector is much better at identifying frames with animals than YOLOX, but too computationally intensive to run on every frame. MegadetectorLiteYoloX was created by training the YOLOX model using the predictions of the Megadetector as ground truth - this method is called [student-teacher training](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764).
The Megadetector is much better at identifying frames with animals than YOLOX, but too computationally intensive to run on every frame. MegadetectorLite was created by training the YOLOX model using the predictions of the Megadetector as ground truth - this method is called [student-teacher training](https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764).
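
To make this concrete, here is a minimal sketch of turning frame selection on or off through `megadetector_lite_config` (the `n_frames` key is an assumption for illustration and may not match the actual `MegadetectorLiteYoloXConfig` field names):

```python
from zamba.data.video import VideoLoaderConfig

# With frame selection: MegadetectorLite scores every frame and the
# top-scoring frames are kept (n_frames is an assumed parameter name).
with_selection = VideoLoaderConfig(
    model_input_height=240,
    model_input_width=426,
    megadetector_lite_config={"n_frames": 16},
    total_frames=16,
)

# Without frame selection: passing None skips MegadetectorLite entirely.
without_selection = VideoLoaderConfig(
    model_input_height=240,
    model_input_width=426,
    megadetector_lite_config=None,
    total_frames=16,
)
```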