Since October 2010, the IMRE Group has been working on MPEG Compact Descriptors for Visual Search (CDVS), standardizing techniques that enable the efficient and interoperable design of visual search applications. As an editor of the MPEG CDVS standard, our major contributions focus on visual search core techniques, including local feature descriptor aggregation, low-complexity interest point detection, local feature descriptor compression, and highly efficient indexing in a Hamming space of binary descriptors. More than 10 patents on MPEG CDVS core techniques have been filed.
MPEG CDVS aims to define the format of compact visual descriptors, as well as the feature extraction and visual search pipeline, to enable the interoperable design of visual search applications. The CDVS Ad Hoc Group established a competitive evaluation framework in line with the set requirements. In developing key techniques, MPEG experts set up eight core experiments (CEs) to investigate proposals on a competitive and collaborative platform. Table 1 lists the key technologies adopted in developing the MPEG CDVS Test Model.
The CDVS evaluation framework involves two types of experiments: retrieval and pairwise matching. The retrieval experiment evaluates descriptor performance in image retrieval; here, mean average precision and the top match success rate are measured. The pairwise matching experiment evaluates performance in matching image pairs, measured by the success rate (that is, the true positive rate) at a given false alarm rate (say, 1 percent), as well as by localization precision. In particular, the approach evaluates descriptor scalability by reporting performance at six operating points (that is, different descriptor lengths): 512 bytes and 1, 2, 4, 8, and 16 Kbytes. Aiming toward interoperability, a descriptor generated at any operating point shall allow matching with descriptors generated at the other operating points.
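The two evaluation measures above can be sketched in a few lines. The following is a minimal illustration (not the official CDVS evaluation code): mean average precision over a set of ranked result lists, and the true positive rate at a fixed false alarm rate, where the decision threshold is chosen from the non-matching pairs' scores.

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance is a 0/1 sequence over the
    ranked results (1 = relevant reference image at that rank)."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(runs):
    """Mean of the per-query APs over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)

def tpr_at_far(match_scores, nonmatch_scores, far=0.01):
    """Success (true positive) rate at a given false alarm rate.
    The threshold is set so that at most `far` of the non-matching
    pairs score above it."""
    thresh = np.quantile(nonmatch_scores, 1.0 - far)
    return float(np.mean(np.asarray(match_scores) > thresh))
```

For example, a single query whose relevant images appear at ranks 1 and 3 of the result list yields an AP of (1/1 + 2/3)/2 ≈ 0.833.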
The MPEG CDVS benchmark is a million-scale image dataset for evaluating the performance of compact descriptors. It covers a variety of visual objects, including graphics, paintings, video frames, landmarks, and common objects from the UKBench dataset. The Graphics dataset depicts CD, DVD, and book covers, text documents, and business cards, with 1,500 queries and 1,000 reference images. The Painting dataset contains 400 queries and 100 reference images of paintings. The Frame dataset contains 400 queries and 100 reference images of video frames. The Landmark dataset contains 3,499 queries and 9,599 reference images drawn from building benchmarks. The UKBench dataset contains 2,550 objects, each with 4 images taken from different viewpoints. To study performance at large scale, a Flickr1M dataset containing 1 million Flickr images was merged with the reference datasets as distractors.
Figures 5a and 5b report the retrieval performance over the different dataset types. As the descriptor rate increases, retrieval performance (mean average precision and top match rate) improves progressively, especially at the lower operating points. However, retrieval performance remains stable beyond 4 Kbytes; in other words, a bit budget of 4 Kbytes suffices for encoding the discriminative information. Figures 5a and 5b also show that, compared to planar objects, searching nonplanar objects is more challenging: large photometric and geometric distortions hinder the correct matching of query images with their closest matches in the reference databases. In particular, because landmark objects are located outdoors, additional challenges arise from clutter, shadows on buildings, reflections on windows, and severe perspective at extreme angles.
Figure 5c gives the pairwise matching performance. Likewise, matching performance increases with the descriptor code rate, and nonplanar objects remain the most challenging. In addition, interoperability is demonstrated by pairwise matching between different operating points, for example, 1 versus 4 Kbytes and 2 versus 4 Kbytes.
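One simple way to realize such cross-rate matching, sketched below purely for illustration, is to make each lower-rate binary descriptor a prefix of the higher-rate one, so that descriptors from different operating points can be compared over their shared prefix in Hamming space. This prefix assumption is a common scalability design; the actual CDVS bitstream layout is more involved than shown here.

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two equal-length binary (0/1) arrays."""
    return int(np.count_nonzero(a != b))

def cross_rate_match(desc_a, desc_b, max_dist):
    """Match two binary descriptors extracted at (possibly) different
    operating points. Assumes the shorter descriptor is a prefix of the
    longer one (hypothetical scalable layout), so only the shared prefix
    is compared in Hamming space."""
    n = min(len(desc_a), len(desc_b))
    return hamming(desc_a[:n], desc_b[:n]) <= max_dist
```

Under this scheme, a 1-Kbyte descriptor would match against the first 1 Kbyte of a 4-Kbyte descriptor, which is one way the interoperability requirement across operating points could be met.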