본문 바로가기

관리 메뉴

HAMA 블로그

Caffe 에서 이미 제공하는 모델 ( Model Zoo ) 정리 본문

통계 & 머신러닝 & 딥러닝

Caffe 에서 이미 제공하는 모델 ( Model Zoo ) 정리

[하마] 이승현 (wowlsh93@gmail.com) 2016. 3. 13. 14:43

* 미리 만들어 놓은 모델 모음집

 

Network in Network model

이 모델은 여기 자세히 나와있다.  ICLR-2014 paper:

Network In Network
M. Lin, Q. Chen, S. Yan
International Conference on Learning Representations, 2014 (arXiv:1409.1556)

please cite the paper if you use the models.


Models from the BMVC-2014 paper "Return of the Devil in the Details: Delving Deep into Convolutional Nets"

이 모델은 ILSVRC-2012 데이타셋으로 학습되었다. 자세한건 project page 와 BMVC-2014 paper:

Return of the Devil in the Details: Delving Deep into Convolutional Nets
K. Chatfield, K. Simonyan, A. Vedaldi, A. Zisserman
British Machine Vision Conference, 2014 (arXiv ref. cs1405.3531)

Please cite the paper if you use the models.


Models used by the VGG team in ILSVRC-2014

이 모델은 LSVRC-2014 에서 VGG 팀에 의해 사용된 모델의 강화 버전. 참고 : project page  arXiv paper:

Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan, A. Zisserman
arXiv:1409.1556

Please cite the paper if you use the models.


Places-CNN model from MIT.

Places CNN is described in the following NIPS 2014 paper:

B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva
Learning Deep Features for Scene Recognition using Places Database.
Advances in Neural Information Processing Systems 27 (NIPS) spotlight, 2014.

The project page is here


GoogLeNet GPU implementation from Princeton.

We implemented GoogLeNet using a single GPU. Our main contribution is an effective way to initialize the network and a trick to overcome the GPU memory constraint by accumulating gradients over two training iterations.


Fully Convolutional Semantic Segmentation Models (FCN-Xs)

These models are described in the paper:

Fully Convolutional Models for Semantic Segmentation
Jonathan Long, Evan Shelhamer, Trevor Darrell
CVPR 2015
arXiv:1411.4038


CaffeNet fine-tuned for Oxford flowers dataset

https://gist.github.com/jimgoo/0179e52305ca768a601f

The is the reference CaffeNet (modified AlexNet) fine-tuned for the Oxford 102 category flower dataset. The number of outputs in the inner product layer has been set to 102 to reflect the number of flower categories. Hyperparameter choices reflect those in Fine-tuning CaffeNet for Style Recognition on “Flickr Style” Data. The global learning rate is reduced while the learning rate for the final fully connected is increased relative to the other layers.


CNN Models for Salient Object Subitizing.

CNN models described in the following CVPR'15 papger "Salient Object Subitizing":

Salient Object Subitizing
J. Zhang, S. Ma, M. Sameki, S. Sclaroff, M. Betke, Z. Lin, X. Shen, B. Price and R. Mech. 
CVPR, 2015.

Deep Learning of Binary Hash Codes for Fast Image Retrieval

We present an effective deep learning framework to create the hash-like binary codes for fast image retrieval. The details can be found in the following "CVPRW'15 paper":

Deep Learning of Binary Hash Codes for Fast Image Retrieval
K. Lin, H.-F. Yang, J.-H. Hsiao, C.-S. Chen
CVPR 2015, DeepVision workshop


Places_CNDS_models on Scene Recognition

The details of training this model are described in the following report. Please cite this work if the model is useful for you.

Training Deeper Convolutional Networks with Deep Supervision
L.Wang, C.Lee, Z.Tu, S. Lazebnik, arXiv:1505.02496, 2015 


Models for Age and Gender Classification.


GoogLeNet_cars on car model classification

GoogLeNet_cars is the GoogLeNet model pre-trained on ImageNet classification task and fine-tuned on 431 car models in CompCars dataset. It is described in the technical report. Please cite the following work if the model is useful for you.

A Large-Scale Car Dataset for Fine-Grained Categorization and Verification
L. Yang, P. Luo, C. C. Loy, X. Tang, arXiv:1506.08959, 2015


ParseNet: Looking wider to see better

These models are described in the paper:

ParseNet: Looking Wider to See Better
Wei Liu, Andrew Rabinovich, Alexander C. Berg
arXiv:1506.04579


SegNet and Bayesian SegNet

SegNet is a real-time semantic segmentation architecture for scene understanding. Code and trained models for SegNet and Bayesian SegNet are available.

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla
arXiv preprint arXiv:1511.00561, 2015

Conditional Random Fields as Recurrent Neural Networks

Code (with Matlab/Python API) and model are described in the ICCV 2015 paper

Conditional Random Fields as Recurrent Neural Networks
S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, P. Torr
ICCV 2015.


Holistically-Nested Edge Detection

The model and code provided are described in the ICCV 2015 paper:

Holistically-Nested Edge Detection
Saining Xie and Zhuowen Tu
ICCV 2015


Translating Videos to Natural Language

These models are described in this NAACL-HLT 2015 paper.

Translating Videos to Natural Language Using Deep Recurrent Neural Networks 
S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, K. Saenko   
NAACL-HLT 2015

More details can be found on this project page.


VGG Face CNN descriptor

These models are described in this BMVC 2015 paper.

Deep Face Recognition 
Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman    
BMVC 2015

More details can be found on this project page.


Yearbook Photo Dating

Model from the ICCV 2015 Extreme Imaging Workshop paper:

A Century of Portraits: Exploring the Visual Historical Record of American High School Yearbooks 
Shiry Ginosar, Kate Rakelly, Brian Yin, Sarah Sachs, Alyosha Efros
ICCV Workshop 2015

Model and prototxt files: Yearbook


CCNN: Constrained Convolutional Neural Networks for Weakly Supervised Segmentation

These models are described in the ICCV 2015 paper.

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation
Deepak Pathak, Philipp Krähenbühl, Trevor Darrell
ICCV 2015
arXiv:1506.03648

These are pre-release models. They do not run in any current version of BVLC/caffe, as they require unmerged PRs. Full details, source code, models, prototxts are available here: CCNN.


Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns

We provide models for facial emotion classification for different image representation obtained using mapped binary patterns. See the Project page for more details.

The models are described in the following paper:

Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns
Gil Levi and Tal Hassner
Proc. ACM International Conference on Multimodal Interaction (ICMI), Seattle, Nov. 2015

If you find our models useful, please add suitable reference to our paper in your work.


Facial Landmark Detection with Tweaked Convolutional Neural Networks

We provide source code and model for article: Yue Wu and Tal Hassner, "Facial Landmark Detection with Tweaked Convolutional Neural Networks", arXiv preprint arXiv:1511.04031, 12 Nov. 2015. See project page for more information about this project.

Written by Ishay Tubi

This software is provided as is, without any warranty, with no legal constraints. If you find our models useful, please add suitable reference to our paper in your work.


Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Download pre-computed Faster R-CNN detectors cd $FRCN_ROOT
./data/scripts/fetch_faster_rcnn_models.sh This will populate the $FRCN_ROOT/data folder with faster_rcnn_models. See data/README.md for details. These models were trained on VOC 2007 trainval.


Sequence to Sequence - Video to Text

These models are described in this ICCV 2015 paper.

Sequence to Sequence - Video to Text
S. Venugopalan, M. Rohrbach, J. Donahue, T. Darrell, R. Mooney, K. Saenko
The IEEE International Conference on Computer Vision (ICCV) 2015

More details can be found on this project page.

Model:
S2VT_VGG_RGB:
This is the S2VT (RGB) model described in the ICCV 2015 paper. It uses video frame features from the VGG-16 layer model. This is trained only on the Youtube video dataset.

Compatibility:
These are pre-release models. They do not run in any current version of BVLC/caffe, as they require unmerged PRs. The models are currently supported by the recurrent branch of the Caffe fork provided at https://github.com/jeffdonahue/caffe/tree/recurrent andhttps://github.com/vsubhashini/caffe/tree/recurrent.


ResNets: Deep Residual Networks from MSRA at ImageNet and COCO 2015

This repository contains the original models (ResNet-50, ResNet-101, and ResNet-152) described in the paper "Deep Residual Learning for Image Recognition" (http://arxiv.org/abs/1512.03385). These models are those used in ILSVRC and COCO 2015 competitions, which won the 1st places in: ImageNet classification, ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

More instructions with prototxt and binary weight files are in:https://github.com/KaimingHe/deep-residual-networks



Comments