**Context-driven self-supervised visual learning: Harnessing the environment as a data source**

A contrastive learning model that uses spatiotemporal context as a similarity signal, together with a pipeline that constructs image datasets using an embodied sampling agent. The datasets have been deposited at https://osf.io/w98gq/.

**Content list by directory**

**/checkpoint:** contrastive-training and downstream-classification checkpoints for all experiments based on the MoCo v2 model. The following table shows each subfolder's name and the experiments corresponding to that folder.

|Folder name|Experiments|
| --- | --- |
|threshold|changing thresholds on Apt14K, House100K, and House14K|
|longterm|long-term training on House14K|
|weighted|adding weights to each positive pair|
|augmentation|using lighting conditions as an augmentation|
|sensitivity\_analysis|analyzing the influence of different hyperparameters on the results|

Each experiment is named **"(prefix\_)dataset\_moco-model\_positive-pair\_thres-pos\_thres-rot\_run"**; all parameter names are listed in the table below. For a pretraining checkpoint, the file name is experiment name + "pre\_" + pretraining-epoch. For a downstream checkpoint, the file name is experiment name + "pre\_" + pretraining-epoch + "\_down\_checkpoint\_" + downstream-epoch. (An illustrative sketch of this naming scheme, together with the pair-weighting formula listed in the prefix table, appears after the tables below.)

|Parameter|Description|
| --- | --- |
|dataset|**'14kHouse'**: House14K; **'14kApt'**: Apt14K; **'100kHouse'**: House100K; **'100kHouse_skybox'**: House100KLighting|
|moco-model|**'standard'**: baseline, original MoCo v2; **'space'**: calculate similarity from spatial information (ESS)|
|positive-pair|**'one'**: use one positive pair; **'multi'**: use all the positive pairs in the dictionary (M)|
|weighted|**'binary'**: label image pairs as 0 or 1 (B); **'weighted'**: label image pairs with a continuous number from 0 to 1 (W)|
|thres-rot|the rotation threshold for positive pairs|
|thres-pos|the position threshold for positive pairs|
|run|index i of the repeated run of this experiment|
|proj-name|the project name in Weights & Biases|

Some experiment names carry a prefix, listed below:

|Prefix name|Folder|Meaning|
| --- | --- | --- |
|weighted\_$\alpha$\_$\beta$|weighted|the model is trained with weighted positive pairs, with weights $w_{i,j}=\frac{1}{\exp\left(\alpha\left(\Delta_\text{rot.}/\beta+\Delta_\text{pos.}\right)\right)}$|
|noaug|augmentation|the model is trained without transformations on the input images|
|temp0.1|sensitivity\_analysis|the pretraining temperature coefficient is set to 0.1|
|temp0.4|sensitivity\_analysis|the pretraining temperature coefficient is set to 0.4|
|dict2048|sensitivity\_analysis|the pretraining dictionary size is set to 2048|
|dict8192|sensitivity\_analysis|the pretraining dictionary size is set to 8192|
|bs128|sensitivity\_analysis|the pretraining batch size is set to 128|
|bs512|sensitivity\_analysis|the pretraining batch size is set to 512|
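The sketch referenced above is given here as a minimal Python illustration of the naming convention and the pair-weighting formula. The function names, argument order, and separators are assumptions for illustration only; check the files in /checkpoint and the repository code for the exact format.

```python
import math

def experiment_name(dataset, moco_model, positive_pair, thres_pos, thres_rot, run, prefix=None):
    """Assemble an experiment name following the convention above.
    The exact set and order of fields, and any zero-padding, should be
    checked against the actual files in /checkpoint."""
    name = "_".join([dataset, moco_model, positive_pair, str(thres_pos), str(thres_rot), str(run)])
    return f"{prefix}_{name}" if prefix else name

def checkpoint_name(exp_name, pre_epoch, down_epoch=None):
    """Pretraining checkpoint: experiment name + 'pre_' + pretraining-epoch;
    downstream checkpoint adds '_down_checkpoint_' + downstream-epoch."""
    pre = f"{exp_name}pre_{pre_epoch}"
    return pre if down_epoch is None else f"{pre}_down_checkpoint_{down_epoch}"

def pair_weight(delta_rot, delta_pos, alpha, beta):
    """Continuous positive-pair weight: w = 1 / exp(alpha * (delta_rot / beta + delta_pos))."""
    return 1.0 / math.exp(alpha * (delta_rot / beta + delta_pos))
```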
**/comparison:** each folder under it contains the implementation code, checkpoints, and running commands for ESS-MB in the corresponding model. For more information about the code, refer to the Readme.md in the corresponding folder. The run.sh script includes commands to run the original model and the model with the ESS-MB approach. The checkpoint folder contains the results of three runs of the original model (standard) and the model with the ESS-MB approach (space) on House100K. 'checkpoint_0199.pth.tar' is the pretraining checkpoint and 'down_checkpoint_0049.pth.tar' is the downstream checkpoint. Some models also have "down_model_best.pth.tar" (the downstream model with the best test accuracy), "training.log" (pretraining results), and "down_results.csv" (downstream results).

**/contrastive\_learning:** main code for the contrastive training, adapted from the original MoCo v2 code (He et al. 2020).

**/data\_generation\_pipeline:** pipelines for collecting datasets from trajectories in the House and Apartment environments using ThreeDWorld version 1.8.29. JSON files define the characteristics and positions of the added furniture. One Python script records a trajectory, and the other renders images from a trajectory.

**/evaluation:** "main\_lincls.py" uses the pre-trained model for the downstream classification task. "label\_room.py" marks the room type of each step using the bounding box of the corresponding room in the House environment. "room\_cls.py" predicts each image's room label with the pre-trained model. "spatial\_info" predicts each image's spatial information with the pre-trained model. "evaluation.py" generates the t-SNE results for contrastive-training checkpoints.

**Prerequisites**

Install PyTorch, the ImageNet dataset, and our datasets.

**Train the model**

This implementation is adapted from the MoCo v2 implementation and only supports multi-GPU, DistributedDataParallel training; single-GPU or DataParallel training is not supported. To train the model, run:

```
python3 main_moco.py --mlp --moco-t 0.2 --aug-plus --cos --multiprocessing-distributed --rank 0 --world-size 1 --lr 0.3 --batch-size 256 --pre-epochs 200 --dataset 100kHouse --moco-model standard --positive-pair multi --weighted binary --thres-pos 0.8 --thres-rot 12 --run i --proj-name YourProjName
```

We log the training process to Weights & Biases through calls to wandb; if you do not want this, remove those calls. All of the new parameters have the same names as in the table above.
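For orientation, the sketch below shows one plausible reading of how the position and rotation thresholds passed above (--thres-pos, --thres-rot) could turn agent poses into binary positive-pair labels in the 'space' (ESS) setting: two frames count as a positive pair when both their positional and rotational differences fall below the thresholds. This is an illustrative reconstruction based on the parameter descriptions, not the code in main_moco.py; the distance measures and names are assumptions.

```python
import torch

def positive_pair_mask(positions, rotations, thres_pos=0.8, thres_rot=12.0):
    """Assumed ESS criterion: frames i and j form a positive pair when both their
    positional distance and rotational difference fall below the thresholds.

    positions: (N, 3) agent positions; rotations: (N,) yaw angles in degrees.
    Returns an (N, N) boolean mask with the diagonal excluded."""
    # Pairwise Euclidean distance between agent positions.
    d_pos = torch.cdist(positions, positions)                       # (N, N)
    # Pairwise absolute angular difference, wrapped to [0, 180] degrees.
    d_rot = (rotations[:, None] - rotations[None, :]).abs() % 360.0
    d_rot = torch.minimum(d_rot, 360.0 - d_rot)
    mask = (d_pos < thres_pos) & (d_rot < thres_rot)
    mask.fill_diagonal_(False)
    return mask
```

In the 'weighted' setting, the same position and rotation differences would instead be mapped to continuous labels using the weighting formula given in the prefix table above.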
**Evaluate the model**

With a pre-trained model, train a supervised linear **classifier on ImageNet** on top of the frozen weights by running:

```
python3 main_lincls.py -a resnet50 --multiprocessing-distributed --world-size 1 --rank 0 --lr 30.0 --batch-size 256 --epochs 50 --dataset '100kHouse' --moco-model space --positive-pair multi --weighted binary --thres-pos 0.8 --thres-rot 12 --pre-epoch '0199' --pre-lr 0.3 --pre-bs 256 --run i --proj-name YourProjName
```

With a pre-trained model, train a supervised linear **room classifier** (e.g., on Apt14K) by running:

```
python3 room_cls.py -a resnet50 --multiprocessing-distributed --world-size 1 --rank 0 --lr 0.3 --batch-size 256 --epochs 20 --dataset '100kHouse' --moco-model 'space' --positive-pair 'multi' --thr-rot 0.8 --thr-dis 12 --run 0 --proj-name YourProjName --train-data DatasetName --room-num i
```

where train-data is the downstream dataset; train-data and room-num relate as follows:

|train-data|room-num|
| --- | --- |
|14kApt|9|
|14kHouse|8|
|100kHouse|8|

With a pre-trained model, train a supervised linear **position and rotation predictor** on Apt14K by running:

```
python3 room_reg.py -a resnet50 --world-size 1 --rank 0 --lr 0.3 --batch-size 256 --epochs 100 --proj-name 'revision_reg' --train-data '14kApt' --moco-model 'space' --positive-pair 'multi' --run 0 --gpu 0
```

Test a specific model's t-SNE result on House14K:

```
python3 evaluation.py --run i --dataset 14k --moco-model space --positive-pair multi --weighted binary --rot-dis reason --eval 'TDWTsne'
```

(A minimal sketch of the frozen-backbone linear-probe idea behind these evaluations appears at the end of this page.)

**License**

This code is a modification of the original MoCo implementation from this repository: [https://github.com/facebookresearch/moco](https://github.com/facebookresearch/moco). That code was released under a Creative Commons non-commercial attribution license, and the terms of that license can be read in that repository. Parts of this software are copyright Facebook, Inc. (now Meta), and this notice is retained in our Python scripts.
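As referenced above, the following is a minimal, generic sketch of the frozen-backbone linear-probe pattern that the evaluation scripts follow: a linear head is trained on top of a frozen pre-trained encoder. It is illustrative only; the checkpoint-key handling and class count below are assumptions modeled on the standard MoCo linear-evaluation recipe, not this repository's exact code.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Illustrative sketch (not the repository's code): load a MoCo-style pretraining
# checkpoint, keep only the query-encoder backbone, freeze it, and train a linear head.
model = models.resnet50(num_classes=8)  # e.g., 8 room classes; see the room-num table above

checkpoint = torch.load("checkpoint_0199.pth.tar", map_location="cpu")
state_dict = checkpoint["state_dict"]
for k in list(state_dict.keys()):
    # Keep query-encoder weights and drop the projection head (key naming follows standard MoCo).
    if k.startswith("module.encoder_q") and not k.startswith("module.encoder_q.fc"):
        state_dict[k[len("module.encoder_q."):]] = state_dict[k]
    del state_dict[k]
model.load_state_dict(state_dict, strict=False)

# Freeze everything except the newly initialized linear classifier.
for name, param in model.named_parameters():
    if name not in ("fc.weight", "fc.bias"):
        param.requires_grad = False

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=30.0, momentum=0.9
)
criterion = nn.CrossEntropyLoss()
# ...then train fc on the downstream data (ImageNet classes, room labels, or pose targets).
```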