(R-CNN) Rich feature hierarchies for accurate object detection and semantic segmentation
Highlights of R-CNN Paper
- Combines region proposals with CNNs (R-CNN: Regions with CNN features).
- First to show that a CNN can achieve substantially higher object detection performance on PASCAL VOC than methods based on HOG-like features.
- Improves mAP by more than 30% (relative) over the previous SOTA on VOC 2012.
- Presents insights on what the network learns $\rightarrow$ rich hierarchy of image features.
Methodology
- Module 1 - Category-independent region proposals
- Module 2 - A CNN that extracts a fixed-length feature vector from each region
- Module 3 - Class-specific linear SVMs
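The three modules above compose into a simple per-region loop. The following is a minimal sketch of that flow, not the authors' implementation: the names `detect`, `toy_propose`, and `toy_extract` are hypothetical, the toy stand-ins replace Selective Search and the CNN, and later steps such as non-maximum suppression are left out.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2), an assumed convention
Image = List[List[float]]         # toy grayscale image as nested lists


def detect(
    image: Image,
    propose: Callable[[Image], Sequence[Box]],
    extract: Callable[[Image, Box], Sequence[float]],
    class_weights: Dict[str, Tuple[Sequence[float], float]],
) -> List[Tuple[Box, str, float]]:
    """Score every proposed region against every class."""
    detections = []
    for box in propose(image):                        # Module 1: region proposals
        feat = extract(image, box)                    # Module 2: feature vector
        for label, (w, b) in class_weights.items():   # Module 3: linear SVM score w.x + b
            score = sum(wi * fi for wi, fi in zip(w, feat)) + b
            detections.append((box, label, score))
    return detections


# Toy stand-ins (assumptions for illustration only).
def toy_propose(image: Image) -> List[Box]:
    h, w = len(image), len(image[0])
    return [(0, 0, w, h), (0, 0, w // 2, h // 2)]


def toy_extract(image: Image, box: Box) -> List[float]:
    x1, y1, x2, y2 = box
    pixels = [image[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return [sum(pixels) / len(pixels), float(len(pixels))]


image = [[1.0, 2.0], [3.0, 4.0]]
dets = detect(image, toy_propose, toy_extract, {"cat": ([1.0, 0.0], -2.0)})
```

Passing the modules in as callables keeps the sketch independent of any particular proposal method or network, which mirrors the modular design described in the list.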
Module 1 - Category-independent region proposals
Prior work offers many methods for generating category-independent region proposals. The authors use Selective Search so that their results can be compared directly with prior detection work.
Module 2 - Feature Extraction
AlexNet [3] is used to extract a 4096-dimensional feature vector from each region proposal. Because AlexNet requires a fixed 227$\times$227 input, each region is warped to that size regardless of its aspect ratio.
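To make the warping step concrete, here is a small sketch of anisotropic resizing: the crop is stretched independently along each axis until it is 227$\times$227, so the aspect ratio is not preserved. The function name `warp_region`, the `(x1, y1, x2, y2)` box convention, and the nearest-neighbour interpolation are assumptions chosen for simplicity. The sketch also omits any context padding around the box, and a real pipeline would more likely call an image library's resize routine.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), exclusive on x2/y2 (assumed)


def warp_region(image: List[List[float]], box: Box, out_size: int = 227) -> List[List[float]]:
    """Stretch the crop `box` to out_size x out_size, ignoring aspect ratio."""
    x1, y1, x2, y2 = box
    crop_w, crop_h = x2 - x1, y2 - y1
    warped = []
    for row in range(out_size):
        # Map each output row back to a source row (nearest neighbour, floor).
        src_y = y1 + row * crop_h // out_size
        warped.append([
            # Columns use their own scale factor, so width and height
            # are stretched by different amounts.
            image[src_y][x1 + col * crop_w // out_size]
            for col in range(out_size)
        ])
    return warped


# A 4x8 toy image whose pixel value encodes its position: value = 10*y + x.
image = [[float(10 * y + x) for x in range(8)] for y in range(4)]
warped = warp_region(image, (2, 1, 8, 3))  # a wide 6x2 region becomes 227x227
```

In this example a 6-pixel-wide, 2-pixel-tall region is stretched far more vertically than horizontally, which is the distortion that warping regardless of aspect ratio implies.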