REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

PPO Implementation Changes to the REVERIE Code Base

This section describes the changes made to REVERIE and explains the flow of the code. Below it is the original README for installing and using REVERIE.

Each of the main files I had to change has a copy, e.g. follower.py -> follower_cp.py.

First, here is a breakdown of how the code progresses:
trainFast.py is the entry point. It creates the environment with the R2RBatch class from Env.py and the agent with the Seq2Seq class from follower.py, then enters the train function in trainFast.py. Training runs mainly through follower.py until completion. There are 10466 instructions and we use batches of 64. The code saves every 100 iterations, so I run it for 16400 iterations so that a save happens after each batch. Each PPO learning step is computed after a batch.
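The 16400 figure is the number of batches needed to cover every instruction once, multiplied by the 100-iteration save interval. This sketch assumes the final partial batch is rounded up to a full batch. The constant names are illustrative and are not variables from trainFast.py.

```python
import math

# Values quoted in the text above; the names are illustrative only.
NUM_INSTRUCTIONS = 10466   # training instructions
BATCH_SIZE = 64            # agents (instructions) per batch
SAVE_EVERY = 100           # iterations between checkpoints

# Round up so the last, partial batch is counted.
batches_per_pass = math.ceil(NUM_INSTRUCTIONS / BATCH_SIZE)
n_iters = batches_per_pass * SAVE_EVERY

print(batches_per_pass, n_iters)  # 164 16400
```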

An explanation of the main parts and changes is as follows:

follower.py edit overview: This script hosts the Seq2Seq class, which creates our batch of agents. The call agent.train(...) in trainFast.py runs train -> rollout -> rollout_with_loss in follower.py. rollout_with_loss runs our batch of 64 agents through episodes on 64 instruction sets. Each agent/instruction pair has a maximum episode length of 20 steps. I set this value in trainFast.py; it was originally 10. The longest shortest path over all instructions is under 10 steps, so a limit of 20 gives the RL environment more time to explore and more state-action-reward data to train on.
More follower.py edits: Beyond the above, the main change to follower.py is that it saves the obs, log_probs, and actions of all 64 agents during one training loop. This gives at most a 64 x 20 set of obs, actions, etc., which I use to run the PPO learning step and update the decoder weights. AgentMemory, a class I created in follower.py, is what lets the agents be trained with PPO.
AgentMemory in follower.py: The trickiest part of this code is storing the history and generating mixed mini-batches. The code is commented, but in short: I store the history sequentially so that the advantages can be computed. Because the original code works on batches, agents are not guaranteed to have the same number of possible actions at time step t; the original code pads to account for this. I store the history and sort the obs by action length to build all mini-batches.
trainFast.py is explained above. The main change is to the number of iterations: I set the maximum to 16400, i.e. (number of instructions / batch size) x save interval (100), so that examples are not repeated.
Env.py: This script hosts the R2RBatch class, which creates our environment and handles batches of obs. The main change here is in the method '_next_minibatch'; see the comments in '_next_minibatch'.
ModelFast: This script hosts the decoder and all other models. The decoder used is CogroundDecoderLSTM. The main change here is the addition of a value network within the decoder. This is possibly flawed: the state spaces of the actor and critic may differ (see the comments above CogroundDecoderLSTM), and the computational graph needs to be traced further.
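The sketch below illustrates the kind of bookkeeping AgentMemory is described as doing. It is not the code in follower.py, and it rests on several assumptions. History is stored per agent in time order so that advantages can be computed backwards. Here they are computed with generalized advantage estimation (GAE), which is an assumption because the text does not name the estimator. Steps are then sorted by their number of candidate actions before being chunked into mini-batches. The update uses the standard PPO clipped surrogate. RolloutMemory, Step, and ppo_clip_loss are hypothetical names. With 64 agents and at most 20 steps each, one update sees at most 64 x 20 = 1280 steps.

```python
import math
import random
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Step:
    obs: Any          # observation features (opaque here)
    n_actions: int    # candidate actions available at this step, before padding
    action: int       # index of the chosen action
    log_prob: float   # log pi_old(action | obs) recorded during the rollout
    value: float      # critic estimate V(obs)
    reward: float
    done: bool        # True on the agent's final step


class RolloutMemory:
    """Hypothetical stand-in for AgentMemory: one time-ordered history per agent."""

    def __init__(self, n_agents: int):
        self.history: List[List[Step]] = [[] for _ in range(n_agents)]

    def add(self, agent: int, step: Step) -> None:
        self.history[agent].append(step)

    def advantages(self, gamma: float = 0.99, lam: float = 0.95) -> List[List[float]]:
        """GAE computed backwards through each agent's history.

        The value after an agent's last stored step is taken as 0, i.e. a
        truncated episode is treated like a terminated one (a simplification).
        """
        all_adv = []
        for steps in self.history:
            adv = [0.0] * len(steps)
            gae, next_value = 0.0, 0.0
            for t in reversed(range(len(steps))):
                s = steps[t]
                mask = 0.0 if s.done else 1.0
                delta = s.reward + gamma * next_value * mask - s.value
                gae = delta + gamma * lam * mask * gae
                adv[t] = gae
                next_value = s.value
            all_adv.append(adv)
        return all_adv

    def minibatches(self, size: int, gamma: float = 0.99, lam: float = 0.95,
                    seed: int = 0) -> List[List[Tuple[Step, float, float]]]:
        """Flatten all agents' steps, sort by n_actions, and chunk into mini-batches.

        Each item is (step, advantage, return); return = advantage + value is
        the critic's regression target.
        """
        advs = self.advantages(gamma, lam)
        items = [(s, a, a + s.value)
                 for steps, agent_adv in zip(self.history, advs)
                 for s, a in zip(steps, agent_adv)]
        items.sort(key=lambda item: item[0].n_actions)  # similar widths end up together
        batches = [items[i:i + size] for i in range(0, len(items), size)]
        random.Random(seed).shuffle(batches)             # shuffle batch order, not contents
        return batches


def ppo_clip_loss(new_log_prob: float, old_log_prob: float,
                  advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate loss (to be minimized)."""
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return -min(ratio * advantage, clipped * advantage)
```

Within a mini-batch, candidate lists still have to be padded to that batch's largest n_actions. Sorting first keeps that padding small, which is one way to handle the unequal action counts described above.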

Final Comments: This is the final push for the REVERIE PPO project. The implementation should be correct, but the model did not learn. Possible causes: the choice of state space may be wrong, the model may be overfitting, or there is an error I have not found.

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

🌟 Results of The 1st REVERIE Challenge on ACL Workshop 2020! More details see here.

Here are the pre-released code and data for the CVPR 2020 paper REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

Table of Contents

1. Definition of the REVERIE Task
2. Install without Docker
3. Install with Docker
4. Train and Test the Model
5. Data Organization of the REVERIE Task
6. Integrate into Your Existing Project
7. Result File Format
8. Acknowledgements
9. Reference

1. Definition of the REVERIE Task

As shown in the above figure, a robot agent is given a natural language instruction referring to a remote object (here in the red bounding box) in a photo-realistic 3D environment. The agent must navigate to an appropriate location and identify the object from multiple distracting candidates. The blue discs indicate nearby navigable viewpoints provided by the simulator.

2. Install without Docker

Note* This section prepares everything needed to run or train our Navigator-Pointer model. If you are familiar with R2R and just want to do the REVERIE task, you can go directly to Section 6.

Note** If you have a fresh Ubuntu system, the following instructions should work well. If not, they may break your existing project environments, and we recommend trying Section 3. Install with Docker instead.

Prerequisites

A C++ compiler with C++11 support is required. Matterport3D Simulator has several dependencies:

Ubuntu 14.04, 16.04, 18.04
OpenCV >= 2.4 including 3.x
OpenGL
OSMesa
GLM
Numpy
pybind11 for Python bindings
Doxygen for building documentation

E.g. installing dependencies on Ubuntu:

sudo apt-get install libopencv-dev python-opencv freeglut3 freeglut3-dev libglm-dev libjsoncpp-dev doxygen libosmesa6-dev libosmesa6 libglew-dev

If some packages are still missing when running cmake/make or our code, refer to the contents of the Dockerfile.

2.1. Clone Repo

Clone the REVERIE repository:

git clone https://github.com/YuankaiQi/REVERIE.git
cd REVERIE

Note that our repository is based on the v0.1 version Matterport3DSimulator, which was originally proposed with the Room-to-Room dataset.

2.2. MAttNet3 Download

Download our pre-trained mini MAttNet3 from Google Drive or Baidu Yun (code: qts6); it is modified from MAttNet to support our model training. Unzip it into the MAttNet3 folder. It is used as our Pointer model.

2.3. Dataset Download

You need to download RGB images and house segmentation files of the Matterport3D dataset. The following data types are required:

matterport_skybox_images
house_segmentations

The metadata is also required. Organize the data as follows:

Matterport
|--v1
   |--metadata
   |--scans

Then set 'matterportDir' in trainFast.py to the path of the Matterport folder.
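Before pointing matterportDir at the folder, a quick sanity check of the layout above can save a failed run. This is a minimal sketch using a hypothetical helper that is not part of the repository. It only checks that the v1/metadata and v1/scans directories exist, not their contents.

```python
from pathlib import Path
from typing import List


def missing_matterport_dirs(matterport_dir: str) -> List[str]:
    """Return the expected sub-directories that are missing under matterport_dir.

    Hypothetical helper: expects <matterport_dir>/v1/metadata and
    <matterport_dir>/v1/scans, as in the layout above.
    """
    v1 = Path(matterport_dir) / "v1"
    return [str(v1 / name) for name in ("metadata", "scans")
            if not (v1 / name).is_dir()]
```

For example, missing_matterport_dirs("Matterport") returns an empty list when the layout is complete.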

2.4. Pre-computed Image Features Download

Download the tsv files from Matterport3DSimulator and extract them into the img_features directory. You will only need the ImageNet features to replicate our results.

ResNet-152-imagenet features [380K/2.9GB]

2.5. Installation with PyTorch

Let us get things ready to run experiments.

2.5.1. Create Anaconda Environment

# change "rog" (remote object grounding) to any name you prefer
conda create -n rog python=3.6

Activate the environment you just created:

conda activate rog

2.5.2. Install Special Requirements

pip install -r tasks/REVERIE/requirements.txt

2.5.3. Install PyTorch

# with CUDA 9.0
conda install pytorch=0.4.0 cuda90 -c pytorch
conda install torchvision=0.2.0 -c pytorch

If you use a newer version, you will need to modify the code that loads the pretrained models.

2.6. Compile the Matterport3D Simulator

Let us compile the simulator so that we can call its functions in python.

Build EGL version using CMake:

cd build
cmake -DEGL_RENDERING=ON ..

# Double-check that CMake finds the proper path to your Python.
# If not, remove the generated build files and rerun cmake with the option below instead:
rm -rf *
cmake -DEGL_RENDERING=ON -DPYTHON_EXECUTABLE:FILEPATH=/path/to/your/bin/python ..

make
cd ../

Note There are three rendering options, which are selected using cmake options during the build process:

Off-screen GPU rendering using EGL: cmake -DEGL_RENDERING=ON ..
Off-screen CPU rendering using OSMesa: cmake -DOSMESA_RENDERING=ON ..
GPU rendering using OpenGL (requires an X server): cmake ..

The recommended (fast) approach for training agents is using off-screen GPU rendering (EGL).

2.7. Compile MAttNet3

2.7.1. Compile pytorch-faster-rcnn

cd MAttNet3/pyutils/mask-faster-rcnn/lib

You may need to change the -arch version in Makefile to compile the cuda code:

GPU model	Architecture
TitanX (Maxwell/Pascal)	sm_52
GTX 960M	sm_50
GTX 1080 (Ti)	sm_61
Grid K520 (AWS g2.2xlarge)	sm_30
Tesla K80 (AWS p2.xlarge)	sm_37

Compile the CUDA-based nms and roi_pooling using the following command:

make

2.7.2. Compile refer

cd ../../refer
make

It will generate _mask.c and _mask.so in external/ folder.

3. Install with Docker

We find that the success rate is slightly lower than that obtained using environments built without Docker.

Prerequisites

Nvidia GPU with driver >= 384
Install docker
Install nvidia-docker2.0
Note: CUDA / CuDNN toolkits do not need to be installed (these are provided by the docker image)

3.1. Clone Repo

Clone the REVERIE repository:

git clone https://github.com/YuankaiQi/REVERIE.git
cd REVERIE

3.2. Dataset Download

First download the files as described in Section 2.3. Then set an environment variable to the location of the dataset, where <PATH> is the full absolute path (not a relative path or symlink) to the directory 'v1':

export MATTERPORT_DATA_DIR=<PATH>

And set the 'matterportDir' parameter to 'data' in the trainFast.py file.

Note that if <PATH> is a remote sshfs mount, you will need to mount it with the -o allow_root option, or the docker container won't be able to access this directory.
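Because the container only sees MATTERPORT_DATA_DIR through a bind mount, a wrong value usually surfaces later as missing data. The following is a minimal sketch of a pre-flight check, using a hypothetical helper that is not part of the repository. It encodes the three requirements stated above: an absolute path, not a symlink, and pointing at 'v1'. Note that os.path.islink only inspects the final path component.

```python
import os
from typing import List


def check_data_dir(path: str) -> List[str]:
    """Return a list of problems with a MATTERPORT_DATA_DIR value (hypothetical helper)."""
    problems = []
    if not os.path.isabs(path):
        problems.append("not an absolute path")
    if os.path.islink(path):  # only the final path component is checked
        problems.append("is a symlink")
    if os.path.basename(os.path.normpath(path)) != "v1":
        problems.append("does not point at the 'v1' directory")
    if not os.path.isdir(path):
        problems.append("does not exist or is not a directory")
    return problems
```

Usage: check_data_dir(os.environ.get("MATTERPORT_DATA_DIR", "")) returns an empty list when the variable looks usable.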

3.3. Dataset Preprocess

To make data loading faster and to reduce memory usage we preprocess the matterport_skybox_images by downscaling and combining all cube faces into a single image using the following script:

./scripts/downsize_skybox.py

This will take a while depending on the number of processes used. By default images are downscaled by 50% and 20 processes are used.

3.4. Build Simulator

Build the docker image:

docker build -t reverie .

Run the docker container, mounting both the git repo and the dataset:

nvidia-docker run -it --mount type=bind,source=$MATTERPORT_DATA_DIR,target=/root/mount/Matterport3DSimulator/data/v1,readonly --volume `pwd`:/root/mount/Matterport3DSimulator reverie

Now (from inside the docker container), build the simulator and run the unit tests:

cd /root/mount/Matterport3DSimulator
mkdir build && cd build
cmake -DEGL_RENDERING=ON ..
make
cd ../

Note There are three rendering options, which are selected using cmake options during the build process (by varying line 3 in the build commands immediately above):

Off-screen GPU rendering using EGL: cmake -DEGL_RENDERING=ON ..
Off-screen CPU rendering using OSMesa: cmake -DOSMESA_RENDERING=ON ..
GPU rendering using OpenGL (requires an X server): cmake ..

The recommended (fast) approach for training agents is using off-screen GPU rendering (EGL).

3.5. Compile MAttNet3

3.5.1. Compile pytorch-faster-rcnn

cd MAttNet3/pyutils/mask-faster-rcnn/lib

You may need to change the -arch version in Makefile to compile the cuda code:

GPU model	Architecture
TitanX (Maxwell/Pascal)	sm_52
GTX 960M	sm_50
GTX 1080 (Ti)	sm_61
Grid K520 (AWS g2.2xlarge)	sm_30
Tesla K80 (AWS p2.xlarge)	sm_37

Compile the CUDA-based nms and roi_pooling using the following command:

make

3.5.2. Compile refer

cd ../../refer
make

It will generate _mask.c and _mask.so in external/ folder.

3.6. Enter Simulator with X server

Run the docker container while sharing the host's X server and DISPLAY environment variable with the container:

xhost +
nvidia-docker run -it -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --mount type=bind,source=$MATTERPORT_DATA_DIR,target=/root/mount/Matterport3DSimulator/data/v1,readonly --volume `pwd`:/root/mount/Matterport3DSimulator reverie
cd /root/mount/Matterport3DSimulator

If you get an error like Error: BadShmSeg (invalid shared segment parameter) 128 you may also need to include -e="QT_X11_NO_MITSHM=1" in the docker run command above.

4. Train and Test the Model

For training: you can download our pre-trained models from Google Drive or Baidu Yun. To train the model yourself, run the following command:

python tasks/REVERIE/trainFast.py --feedback_method sample2step --experiment_name releaseCheck

For testing: first obtain the navigation results by running

python tasks/REVERIE/run_search.py

Then run the following command to obtain the grounded object

python tasks/REVERIE/groundingAfterNav.py

Now, you should get results in the 'experiment/releaseCheck/results/' folder.

Note that the results might differ slightly due to different dependency package versions or GPUs.

5. Data Organization of the REVERIE Task

In the tasks/REVERIE/data folder, you will find four files, REVERIE_train.json, REVERIE_val_seen.json, REVERIE_val_unseen.json, and REVERIE_test.json, which provide the instructions, paths, and target object of each task (except for REVERIE_test.json). In the tasks/REVERIE/data/BBox folder, you will find json files that record the objects observed within 3 meters of each viewpoint.

Example of a train/val_seen/val_unseen.json file

[
  {
    "distance" : 11.65, # distance to the goal viewpoint
    "ix": 208,  # reserved data, not used
    "scan": "qoiz87JEwZ2", # building ID
    "heading": 4.59, # initial heading of the agent (radians)
    "path_id": 1357, # inherited from the R2R dataset
    "objId": 66, # the unique object ID in the current building
    "id": "1357_66", # task id
    "instructions":[ # collected instructions for REVERIE
        "Go to the entryway and clean the coffee table",
        "Go to the foyer and wipe down the coffee table",
        "Go to the foyer on level 1 and pull out the coffee table further from the chair"
    ],
    "path": [ # inherited from the R2R dataset
        "bdb1023cb7cc4ebd8245b9291fcbc1a2",
        "a6ba3f53b7964464b23341896d3c75fa",
        "c407e34577aa4724b7e5d447a5d859d1",
        "9f68b19f50d14f5d8371447f73c3a2e3",
        "150763c717894adc8ccbbbe640fa67ef",
        "59b190857cfe47f691bf0d866f1e5aeb",
        "267a7e2459054db7952fc1e3e45e98fa"
    ],
    "instructions_l":[ # inherited from the R2R dataset and provided just for convenience
        "Walk into the dining room and continue past the table. Turn left when you xxx ",
        ...
    ]
  },
  ...
]
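The '#' annotations above are for explanation only; a real split file is plain JSON and loads directly with json.load. The sketch below pulls out the fields an agent typically needs from one task entry. The helper names are hypothetical, and it assumes (as in R2R) that "path" runs from the start viewpoint to the goal viewpoint.

```python
import json
from typing import Any, Dict, List


def summarize_task(item: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the key fields out of one REVERIE task entry (hypothetical helper)."""
    return {
        "id": item["id"],                  # "<path_id>_<objId>", e.g. "1357_66"
        "scan": item["scan"],              # building ID
        "start": item["path"][0],          # assumed start viewpoint
        "goal": item["path"][-1],          # assumed goal viewpoint
        "heading": item["heading"],
        "objId": item["objId"],            # target object in this building
        "n_instructions": len(item["instructions"]),
    }


def load_tasks(json_path: str) -> List[Dict[str, Any]]:
    """Load one split, e.g. tasks/REVERIE/data/REVERIE_val_seen.json."""
    with open(json_path) as f:
        return [summarize_task(item) for item in json.load(f)]
```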

Example of json file in the bbox folder

File name format: ScanID_ViewpointID.json, e.g., VzqfbhrpDEA_57fba128d2f042f7a59793c665a3f587.json

{ # note that the top level is a dict, not a list
  "57fba128d2f042f7a59793c665a3f587":{ # this key is the viewpoint ID
    "827":{ # this key is the object ID
      "name": "toilet",
      "visible_pos":[
        6,7,8,9,19,20  # indices (0~35) of the views that contain the object; consistent with the R2R view indexing
        ],
      "bbox2d":[
        [585,382,55,98], # [x,y,w,h], corresponding to the views listed in "visible_pos"
        ...
       ]
    },
    "833": {
       ...
    },
    ...
  }
}
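The sketch below looks up the objects visible in one of the 36 views of a loaded BBox file. It assumes that bbox2d[i] is the box for view visible_pos[i]. It also assumes the standard R2R discretization: 12 headings 30 degrees apart, with views 0-11 at -30 degrees elevation, 12-23 at 0, and 24-35 at +30. Both helper names are hypothetical.

```python
import math
from typing import Any, Dict, List, Tuple


def objects_in_view(bbox_json: Dict[str, Any], viewpoint_id: str,
                    view_index: int) -> List[Tuple[str, str, List[int]]]:
    """List (objId, name, [x, y, w, h]) for every object visible in one view."""
    found = []
    for obj_id, obj in bbox_json[viewpoint_id].items():
        # Assumed pairing: bbox2d[i] was annotated in view visible_pos[i].
        for pos, box in zip(obj["visible_pos"], obj["bbox2d"]):
            if pos == view_index:
                found.append((obj_id, obj["name"], box))
    return found


def view_index_to_angles(view_index: int) -> Tuple[float, float]:
    """Map a view index (0-35) to (heading, elevation) in radians (R2R convention assumed)."""
    heading = (view_index % 12) * math.radians(30)
    elevation = (view_index // 12 - 1) * math.radians(30)
    return heading, elevation
```

With the file above loaded as bbox_json, objects_in_view(bbox_json, "57fba128d2f042f7a59793c665a3f587", 6) would include the toilet (object 827) with box [585,382,55,98].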

6. Integrate into Your Existing Project

The easiest way to integrate REVERIE into your project is to preload all the objects' bounding_box/label/visible_pos with the loadObjProposals() function, as in the eval_release.py file. You can then access the visible objects using ScanID_ViewpointID as the key, and use any referring expression method to match objects to an instruction.
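If you want the same kind of lookup table without the repository code, the sketch below preloads every BBox file into a dict keyed by ScanID_ViewpointID. load_bbox_dir is a hypothetical stand-in, not loadObjProposals(), whose actual return format may differ. It assumes scan IDs contain no underscore, so the file name can be split at the first '_'.

```python
import json
import os
from typing import Any, Dict, Tuple


def split_bbox_filename(fname: str) -> Tuple[str, str, str]:
    """'ScanID_ViewpointID.json' -> (key, scan_id, viewpoint_id)."""
    key = fname[: -len(".json")]
    scan_id, viewpoint_id = key.split("_", 1)
    return key, scan_id, viewpoint_id


def load_bbox_dir(bbox_dir: str) -> Dict[str, Dict[str, Any]]:
    """Preload every BBox file: {ScanID_ViewpointID: {objId: {...}}} (hypothetical helper)."""
    proposals = {}
    for fname in sorted(os.listdir(bbox_dir)):
        if not fname.endswith(".json"):
            continue
        key, _, viewpoint_id = split_bbox_filename(fname)
        with open(os.path.join(bbox_dir, fname)) as f:
            proposals[key] = json.load(f)[viewpoint_id]  # file content is keyed by viewpoint ID
    return proposals
```

Usage: proposals = load_bbox_dir("tasks/REVERIE/data/BBox"), then proposals["VzqfbhrpDEA_57fba128d2f042f7a59793c665a3f587"] gives the objects seen from that viewpoint.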

Note: the number of instructions may vary across the dataset, so we recommend indexing an instruction as follows:

instrType = "instructions"
self.instr_ids += ['%s_%d' % (str(item['id']),i) for i in range(len(item[instrType]))]
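The task id itself contains an underscore ("1357_66" is path_id_objId), so the instruction index has to be split off from the right when an instr_id is read back, e.g. "4774_267_1" in the result-file sample. This is a minimal sketch with hypothetical function names. make_instr_ids repeats the indexing scheme above for a single task entry, and parse_instr_id inverts it.

```python
from typing import Any, Dict, List, Tuple


def make_instr_ids(item: Dict[str, Any], instr_type: str = "instructions") -> List[str]:
    """One ID per instruction: '<task id>_<instruction index>' (same scheme as above)."""
    return ['%s_%d' % (str(item['id']), i) for i in range(len(item[instr_type]))]


def parse_instr_id(instr_id: str) -> Tuple[str, int]:
    """Invert make_instr_ids by splitting off the trailing instruction index."""
    task_id, index = instr_id.rsplit("_", 1)
    return task_id, int(index)
```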

7. Result File Format

Just add the "'predObjId': int value" pair into your navigation results. That's it!

Below is a toy sample:

[
  {
    "trajectory": [
      [
        "a68b5ae6571e4a66a4727573b88227e4", 
        3.141592653589793, 
        0.0
      ], 
      ...
     ],
     "instr_id": "4774_267_1", 
     "predObjId": 402
  },
  ...
]
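A minimal sketch of producing such a file follows. It assumes you already hold the navigation results as a list of dicts and a mapping from instr_id to your predicted object ID. add_pred_obj_ids, write_results, and that mapping are hypothetical names. The value is cast to int because the format expects an int.

```python
import json
from typing import Any, Dict, List


def add_pred_obj_ids(nav_results: List[Dict[str, Any]],
                     pred_obj_ids: Dict[str, int]) -> List[Dict[str, Any]]:
    """Return copies of the navigation results with an integer 'predObjId' added."""
    out = []
    for result in nav_results:
        entry = dict(result)  # leave the caller's dicts untouched
        entry["predObjId"] = int(pred_obj_ids[entry["instr_id"]])
        out.append(entry)
    return out


def write_results(path: str, results: List[Dict[str, Any]]) -> None:
    """Write the result list as JSON."""
    with open(path, "w") as f:
        json.dump(results, f)
```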

8. Acknowledgements

We would like to thank Matterport for allowing the Matterport3D dataset to be used by the academic community. We also thank Philip Roberts, Zheng Liu, Zizheng Pan, and Sam Bahrami for their great help in building the dataset. This project is supported by the Australian Centre for Robotic Vision.

9. Reference

The REVERIE task and dataset are described in the following paper:

@inproceedings{reverie,
  title={REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments},
  author={Yuankai Qi and Qi Wu and Peter Anderson and Xin Wang and William Yang Wang and Chunhua Shen and Anton van den Hengel},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}
