
YOLOv10

(Figure: comparison of YOLO models, where latency is measured including post-processing.)

1. Abstract

 

We aim to further advance the performance-efficiency boundary of YOLOs from both the post-processing and the model architecture.

 

  • We first tackle the problem of redundant predictions in post-processing by presenting a consistent dual assignments strategy for NMS-free YOLOs: dual label assignments (a one-to-many head and a one-to-one head) combined with a consistent matching metric.
    • It allows the model to enjoy rich and harmonious supervision during training while eliminating the need for NMS during inference, leading to competitive performance with high efficiency.
  • We introduce a holistic efficiency-accuracy driven model design strategy for YOLOs.
    • For efficiency, we propose the lightweight classification head, spatial-channel decoupled downsampling, and rank-guided block design (a sketch of the decoupled downsampling follows this list).
    • For accuracy, we explore large-kernel convolution and present the effective partial self-attention (PSA) module to enhance the model capability.
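A minimal PyTorch sketch of the spatial-channel decoupled downsampling idea, assuming a pointwise 1x1 conv for the channel change followed by a stride-2 depthwise 3x3 conv for the spatial reduction; the class and argument names are illustrative, not the official YOLOv10 implementation.

```python
import torch
import torch.nn as nn

class DecoupledDownsample(nn.Module):
    """Sketch of spatial-channel decoupled downsampling (illustrative).

    Instead of one stride-2 3x3 conv that changes channels and resolution at
    the same time, the channel change (pointwise conv) and the spatial
    downsampling (stride-2 depthwise conv) are done by two cheaper ops.
    """

    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        # 1x1 conv: channel transformation only
        self.pw = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)
        # stride-2 depthwise 3x3 conv: spatial downsampling only
        self.dw = nn.Conv2d(c_out, c_out, kernel_size=3, stride=2,
                            padding=1, groups=c_out, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.dw(self.pw(x))))

x = torch.randn(1, 64, 80, 80)
print(DecoupledDownsample(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])
```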

2. Our Method

1. Consistent dual assignments (one-to-many head and one-to-one head) for NMS-free training

Uniform matching metric: both heads use the same form of metric,

m(α, β) = s · p^α · IoU(b̂, b)^β,

where p is the classification score, b̂ and b are the predicted and ground-truth boxes, s is the spatial prior (whether the anchor point lies inside the instance), and α, β balance the classification and localization terms.
Matching metric scores are assigned per head branch: m_o2m = m(α_o2m, β_o2m) for the one-to-many head and m_o2o = m(α_o2o, β_o2o) for the one-to-one head; the one-to-one branch keeps only the prediction with the largest matching score (m_o2o,i = m*_o2o).
A denotes the supervision gap between the two branches, i.e. the distance between their classification targets. It is minimized when the prediction with the largest score is the same in both branches and receives the same target, while the other one-to-many positives receive smaller targets. With the consistent matching metric (α_o2o = α_o2m, β_o2o = β_o2m), m_o2o becomes a monotone function of m_o2m, so the best positive of the one-to-many head is automatically the best for the one-to-one head and the two branches are pushed to output the same results.
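To make the consistent matching metric concrete, here is a small PyTorch sketch with toy scores (the hyperparameter values α = 0.5, β = 6 and all numbers are only illustrative, not taken from the paper): when both heads share the same α and β, the single prediction selected by the one-to-one metric is guaranteed to be the top-ranked sample of the one-to-many metric.

```python
import torch

def matching_metric(p, iou, s, alpha, beta):
    """m(alpha, beta) = s * p**alpha * IoU**beta, computed per prediction.

    p   : classification score of the predicted class
    iou : IoU between the predicted box and the ground-truth box
    s   : spatial prior (1 if the anchor point lies inside the instance, else 0)
    """
    return s * p.pow(alpha) * iou.pow(beta)

# Toy scores for 6 candidate predictions against one ground-truth instance.
p   = torch.tensor([0.2, 0.9, 0.6, 0.8, 0.3, 0.7])
iou = torch.tensor([0.5, 0.7, 0.9, 0.6, 0.4, 0.8])
s   = torch.tensor([1.0, 1.0, 1.0, 1.0, 0.0, 1.0])

# One-to-many branch: the top-k candidates become positive samples.
m_o2m = matching_metric(p, iou, s, alpha=0.5, beta=6.0)
positives = m_o2m.topk(3).indices

# One-to-one branch with the consistent metric (same alpha and beta, r = 1):
# its single positive is simply the argmax of the same score.
m_o2o = matching_metric(p, iou, s, alpha=0.5, beta=6.0)
best = m_o2o.argmax()

# Because m_o2o is a monotone function of m_o2m, the one-to-one positive is
# always the best sample inside the one-to-many positive set.
print(positives.tolist(), best.item(), best.item() in positives.tolist())
```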

 

  • In dual label assignments, the one-to-many branch provides much richer supervisory signals than the one-to-one branch. Intuitively, if we can harmonize the supervision of the one-to-one head with that of the one-to-many head, we can optimize the one-to-one head towards the direction of the one-to-many head's optimization. As a result, the one-to-one head can provide improved quality of samples during inference, leading to better performance (see the structural sketch below).
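The sketch below shows how such a dual-head detector can be wired up, only to illustrate the training/inference split; the stand-in backbone, head shapes, and output layout are assumptions, not the paper's actual architecture. Both heads see the same features, both are supervised during training, and only the one-to-one head runs at inference, so its duplicate-free outputs need no NMS.

```python
import torch
import torch.nn as nn

class DualHeadYOLO(nn.Module):
    """Illustrative dual label assignment skeleton (not the official YOLOv10 code)."""

    def __init__(self, channels: int = 64, num_outputs: int = 84):  # 84 = 4 box + 80 classes (illustrative)
        super().__init__()
        self.backbone = nn.Conv2d(3, channels, 3, stride=2, padding=1)  # stand-in for backbone + neck
        self.o2m_head = nn.Conv2d(channels, num_outputs, 1)  # trained with one-to-many (top-k) positives
        self.o2o_head = nn.Conv2d(channels, num_outputs, 1)  # trained with exactly one positive per object

    def forward(self, x):
        feats = self.backbone(x)
        if self.training:
            # Both branches are returned during training; the detection loss is
            # computed on each with its own label assignment and then summed.
            return self.o2m_head(feats), self.o2o_head(feats)
        # Inference uses only the one-to-one branch: its predictions are already
        # duplicate-free, so no NMS step is required afterwards.
        return self.o2o_head(feats)

model = DualHeadYOLO().eval()
preds = model(torch.randn(1, 3, 64, 64))
print(preds.shape)  # torch.Size([1, 84, 32, 32])
```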

2. Holistic Efficiency-Accuracy Driven Model Design

 

Large-kernel convolution, MHSA (multi-head self-attention), and FFN (feed-forward network) are used only in the deepest stage, where the feature maps have the lowest resolution, so the attention can capture global features at low cost (a sketch of the partial self-attention block follows).
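A minimal PyTorch sketch of a partial self-attention (PSA) style block, assuming the channels are split in half after a 1x1 conv, MHSA and an FFN are applied to one half only, and the two halves are fused by another 1x1 conv; layer sizes and the number of heads are illustrative, not the official module.

```python
import torch
import torch.nn as nn

class PSA(nn.Module):
    """Partial self-attention sketch (illustrative, not the official module).

    Attention runs on only half of the channels and only on the deepest,
    lowest-resolution feature map, so the quadratic cost of self-attention
    stays small while global context is still captured.
    """

    def __init__(self, c: int, num_heads: int = 4):
        super().__init__()
        self.pre = nn.Conv2d(c, c, 1)
        self.attn = nn.MultiheadAttention(c // 2, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Conv2d(c // 2, c, 1), nn.SiLU(), nn.Conv2d(c, c // 2, 1))
        self.post = nn.Conv2d(c, c, 1)

    def forward(self, x):
        a, b = self.pre(x).chunk(2, dim=1)          # split channels into two halves
        n, c2, h, w = b.shape
        t = b.flatten(2).transpose(1, 2)            # (N, H*W, C/2) token sequence
        t, _ = self.attn(t, t, t)                   # multi-head self-attention
        b = b + t.transpose(1, 2).reshape(n, c2, h, w)
        b = b + self.ffn(b)                         # feed-forward network
        return self.post(torch.cat([a, b], dim=1))  # fuse the two halves

x = torch.randn(1, 256, 20, 20)   # deepest-stage feature map (low resolution)
print(PSA(256)(x).shape)          # torch.Size([1, 256, 20, 20])
```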

3. Experiments