YOLO9000
YOLO suffers from a variety of shortcomings, such as a high number of localization errors and low recall compared to region proposal-based methods. YOLO9000 (YOLOv2) therefore focuses mainly on improving recall and localization while maintaining classification accuracy.
YOLO9000 adopts several techniques to improve the model:
Batch Normalization: adding batch normalization improves mAP by about 2%, regularizes the model, reduces overfitting, and makes weight initialization easier.
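High Resolution Classifier: The original YOLO trains the classifier at 224x224 and switches to 448x448 only for detection. Unlike YOLO, we first fine-tune the classification network at the full 448x448 resolution and then fine-tune the resulting network on detection.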
Using anchor boxes: YOLOv1 predicts the coordinates of bounding boxes directly using fully connected layers, while Faster R-CNN predicts offsets and confidences for anchor boxes using convolutional layers. We therefore adopt anchor boxes and predict class and objectness for every anchor box. Note that this method is refined further below.
Unlike Faster R-CNN, we don’t use hand-picked priors. Instead of defining the widths and heights manually, we run k-means on the training set bounding boxes to learn the priors. We choose k = 5 as a tradeoff between model complexity and high recall. Standard k-means with Euclidean distance would produce larger errors for big boxes than for small boxes, so we redefine the distance metric as d = 1 - IOU(box, centroid). Here’s a code analysis: k-means yolov2
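As a rough illustration of the clustering step, here is a minimal pure-Python sketch of k-means with d = 1 - IOU(box, centroid). It is not the paper's code. The names iou_wh and kmeans_priors, the random initialization, and the iteration cap are assumptions made for this example. Boxes are (width, height) pairs, and IOU is computed as if both boxes shared the same corner, so only shape matters.

```python
import random


def iou_wh(box, centroid):
    """IOU of two (width, height) boxes aligned at the same corner."""
    w, h = box
    cw, ch = centroid
    inter = min(w, cw) * min(h, ch)
    union = w * h + cw * ch - inter
    return inter / union


def kmeans_priors(boxes, k=5, iters=100, seed=0):
    """Cluster (width, height) pairs using d = 1 - IOU(box, centroid).

    boxes is a list of (w, h) tuples; returns k centroids sorted by area.
    Initialization by random sampling is an assumption of this sketch.
    """
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    assignment = None
    for _ in range(iters):
        # Assign each box to the centroid with the smallest 1 - IOU distance.
        new_assignment = [
            min(range(k), key=lambda j: 1.0 - iou_wh(b, centroids[j]))
            for b in boxes
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing, so we have converged
        assignment = new_assignment
        # Move each centroid to the mean width and height of its members.
        for j in range(k):
            members = [b for b, a in zip(boxes, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                )
    return sorted(centroids, key=lambda c: c[0] * c[1])


if __name__ == "__main__":
    toy_boxes = [(10, 10), (11, 11), (10, 11), (100, 100), (101, 99), (99, 101)]
    print(kmeans_priors(toy_boxes, k=2))
```

Because the distance depends on overlap rather than absolute size, a 10-pixel error on a large box costs far less than the same error on a small box, which is the imbalance the IOU-based metric is meant to remove.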
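Faster R-CNN uses the predicted offsets (tx, ty) to compute the box center (x, y), where (X, Y) is the anchor center and (w, h) are the anchor width and height:
x = (tx * w) + X
y = (ty * h) + Y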
This formulation is unconstrained, so an anchor box can end up at any location in the image, which makes the model unstable.
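To make the instability concrete, the sketch below decodes a box center from offsets, assuming the standard additive Faster R-CNN form x = tx * w + X, y = ty * h + Y. The function name decode_center and the sample anchor values are hypothetical and chosen only for illustration.

```python
def decode_center(tx, ty, anchor):
    """Decode a box center from offsets relative to an anchor.

    anchor is (X, Y, w, h): anchor center and anchor size in the same units.
    Assumes the additive form x = tx * w + X, y = ty * h + Y.
    """
    X, Y, w, h = anchor
    x = tx * w + X
    y = ty * h + Y
    return x, y


if __name__ == "__main__":
    anchor = (8.0, 8.0, 4.0, 2.0)  # hypothetical anchor: center (8, 8), size 4 x 2
    # Zero offsets leave the center on the anchor.
    print(decode_center(0.0, 0.0, anchor))
    # tx = 1 shifts the center right by exactly one anchor width.
    print(decode_center(1.0, 0.0, anchor))
    # Nothing bounds tx or ty, so a large raw output moves the box far away.
    print(decode_center(-25.0, 40.0, anchor))
```

Since tx and ty are raw network outputs with no bound, a randomly initialized network can place a box far from the anchor that produced it, and training has to spend time learning sensible offsets.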