MPEG CDVS Standard

Since October 2010, the IMRE Group has been working on MPEG Compact Descriptors for Visual Search (CDVS), which standardizes techniques that enable efficient and interoperable design of visual search applications. Serving as an editor of the MPEG CDVS standard, we have focused our major contributions on core visual search techniques, including local feature descriptor aggregation, low-complexity interest point detection, local feature descriptor compression, and highly efficient indexing of binary descriptors in a Hamming space. Over 10 patents on core techniques of the MPEG CDVS standard have been filed.

Key Proposals

  • Jie Chen, Ling-Yu Duan*, Tiejun Huang, Wen Gao, Alex C. Kot, Massimo Balestri, Gianluca Francini, Skjalg Lepsoy. CDVS CE1: A low complexity detector ALP_BFLoG. ISO/IEC JTC1/SC29/WG11/M33159, MPEG 108th, Valencia, Mar. 2014.
  • Zhe Wang, Ling-Yu Duan*, Jie Lin, Tiejun Huang, Wen Gao, Alex C. Kot. Response to CE2: Improved SCFV. ISO/IEC JTC1/SC29/WG11/M32261, MPEG 108th, Valencia, Mar. 2014.
  • Jie Lin, Ling-Yu Duan*, Zhe Wang, Tiejun Huang, Wen Gao. Peking Univ. Response to CE2: The Improved SCFV Global Descriptor. ISO/IEC JTC1/SC29/WG11/M32261, MPEG 107th, San Jose, USA, Jan. 2014.
  • Jie Lin, Ling-Yu Duan*, Zhe Wang, Tiejun Huang, Wen Gao. Peking Univ. Response to CE2: Improvements of the SCFV Global Descriptor. ISO/IEC JTC1/SC29/WG11/M31401, MPEG 106th, Geneva, Oct. 2013.
  • Zhe Wang, Ling-Yu Duan*, Jie Lin, Tiejun Huang, Wen Gao. MBIT: An indexing structure to speed up retrieval. ISO/IEC JTC1/SC29/WG11/M28893, MPEG 104th, Incheon, Korea, Apr. 2013.
  • Fangkun Wang, Ling-Yu Duan*, Jie Chen, Tiejun Huang, Wen Gao. Peking University Response to CE2: Frequency Domain Interest Point Detector. ISO/IEC JTC1/SC29/WG11/M28991, MPEG 104th, Incheon, Korea, Apr. 2013.
  • Jie Lin, Ling-Yu Duan*, Shuang Yang, Jie Chen, Tiejun Huang, Alex C. Kot, Wen Gao. Peking University Response to CE1: Performance Improvements of the Scalable Compressed Fisher Codes (SCFV). ISO/IEC JTC1/SC29/WG11/M28061, MPEG 103rd, Geneva, Jan. 2013.
  • Ling-Yu Duan*, Jie Lin, Jie Chen, Tiejun Huang, Wen Gao. Peking Univ. Extended Results on CE1 and Comments on CE1, CE2 and CE6. ISO/IEC JTC1/SC29/WG11/M26729, MPEG 102nd, Shanghai, Oct. 2012.
  • Jie Lin, Ling-Yu Duan*, Jie Chen, Tiejun Huang, Wen Gao. Peking Univ. Response to CE1: A Scalable Low-Memory Global Descriptor. ISO/IEC JTC1/SC29/WG11/M26726, MPEG 102nd, Shanghai, Oct. 2012.
  • Jie Lin, Ling-Yu Duan*, Tiejun Huang, Wen Gao. CE1: Improvements to Test Model with a Low Bit Rate Global Descriptor. ISO/IEC JTC1/SC29/WG11/M24781, MPEG 100th, Geneva, Apr. 2012.

Selected Papers on MPEG CDVS

  • Compact Descriptors for Visual Search, Ling-Yu Duan, Jie Lin, Jie Chen, Tiejun Huang, and Wen Gao. IEEE Multimedia Magazine, Jul. - Sep. 2014.
  • Rate-adaptive Compact Fisher Codes for Mobile Visual Search, Jie Lin, Ling-Yu Duan*, Yaping Huang, Siwei Luo, Tiejun Huang, and Wen Gao. IEEE Signal Processing Letters, vol. 21, no. 2, Feb. 2014.
  • Component Hashing of Variable-Length Binary Aggregated Descriptors for Fast Image Search, Zhe Wang, Ling-Yu Duan*, Jie Lin, Tiejun Huang, Wen Gao, Miroslaw Bober. Proc. ICIP'14, Paris, France, Oct. 2014.
  • Robust Fisher Codes for Large Scale Image Retrieval, Jie Lin, Ling-Yu Duan*, Tiejun Huang, Wen Gao. Proc. ICASSP'13, Vancouver, Canada, May 2013.
  • Learning from Mobile Contexts to Minimize the Mobile Location Search Latency, Ling-Yu Duan, Rongrong Ji, Jie Chen, Hongxun Yao, Tiejun Huang, Wen Gao. Sig. Proc.: Image Comm. 28(4): 368-385 (2013)
  • Location Discriminative Vocabulary Coding for Mobile Landmark Search, Rongrong Ji, Ling-Yu Duan*, Jie Chen, Hongxun Yao, Junsong Yuan, Yong Rui, Wen Gao. International Journal of Computer Vision 96(3): 290-314 (2012)

Evaluation Framework

MPEG CDVS aims to define the format of compact visual descriptors, as well as the feature extraction and visual search pipeline, to enable interoperable design of visual search applications. The CDVS Ad Hoc Group devised a competitive evaluation framework in line with the requirements set for the standard. To develop the key techniques, MPEG experts set up eight core experiments (CEs) to investigate proposals on a competitive and collaborative platform. Table 1 lists the key technologies adopted in developing the MPEG CDVS Test Model.


The CDVS evaluation framework involves two types of experiments: retrieval and pairwise matching. The retrieval experiment evaluates descriptor performance in image retrieval, measured by mean average precision and the top match success rate. The pairwise matching experiment evaluates performance in matching image pairs, measured by the success rate (that is, the true positive rate) at a given false alarm rate (say, 1 percent) as well as by the localization precision. In particular, the framework evaluates descriptor scalability by reporting performance at six operating points (that is, six descriptor lengths): 512 bytes and 1, 2, 4, 8, and 16 Kbytes. To ensure interoperability, a descriptor generated at any operating point shall allow matching with descriptors generated at the other operating points.
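The three measures named above can be sketched in a few lines of Python. This is a minimal illustration using the common textbook definitions, not the code of the CDVS evaluation software; the function names, the input layout (one ranked result list and one set of relevant reference images per query), and the rule for choosing the score threshold are all assumptions made for the example.

```python
from typing import Hashable, Sequence, Set


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one query: mean of precision@k over the
    ranks k at which a relevant reference image is returned."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(rankings, relevants) -> float:
    """Mean of the per-query average precision values."""
    aps = [average_precision(r, g) for r, g in zip(rankings, relevants)]
    return sum(aps) / len(aps)


def top_match_rate(rankings, relevants) -> float:
    """Fraction of queries whose first returned image is relevant."""
    hits = sum(1 for r, g in zip(rankings, relevants) if r and r[0] in g)
    return hits / len(rankings)


def tpr_at_far(match_scores, nonmatch_scores, far=0.01) -> float:
    """True positive rate at a given false alarm rate.

    Assumed rule: a pair is accepted when its score is strictly above a
    threshold, and the threshold is set so that at most a fraction `far`
    of the non-matching pairs is accepted.
    """
    ordered = sorted(nonmatch_scores, reverse=True)
    # Number of false alarms allowed; the small epsilon guards against
    # floating-point round-off in the product.
    allowed = int(far * len(ordered) + 1e-9)
    threshold = ordered[allowed] if allowed < len(ordered) else float("-inf")
    accepted = sum(1 for s in match_scores if s > threshold)
    return accepted / len(match_scores)


if __name__ == "__main__":
    # Toy data: two queries with hypothetical image identifiers.
    rankings = [["a", "b", "c"], ["x", "y", "z"]]
    relevants = [{"a", "c"}, {"y"}]
    print(mean_average_precision(rankings, relevants))
    print(top_match_rate(rankings, relevants))
    print(tpr_at_far([0.95, 0.5, 0.25, 0.8], [0.1, 0.2, 0.3, 0.9], far=0.25))
```

In a scalability study, the same functions would be run once per operating point, so that each descriptor length yields its own mean average precision, top match rate, and true positive rate.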

The MPEG CDVS benchmark is a million-scale image dataset for evaluating the performance of compact descriptors. It covers a variety of visual objects: Graphics, Painting, Video Frame, Landmark, and common objects from the UKBench dataset. The Graphics dataset depicts CD, DVD, and book covers, text documents, and business cards, with 1,500 queries and 1,000 reference images. The Painting dataset contains 400 queries and 100 reference images of paintings. The Video Frame dataset contains 400 queries and 100 reference images of video frames. The Landmark dataset contains 3,499 queries and 9,599 reference images from building benchmarks. The UKBench dataset contains 2,550 objects, each captured in 4 images taken from different viewpoints. To study performance on a large dataset, the Flickr1M dataset of 1 million Flickr images was merged with the reference datasets as distractors.


Performance

Figures 5a and 5b report the retrieval performance over different dataset types. As the descriptor rate increases, retrieval performance (mean average precision and top match) improves progressively, especially at the lower operating points. However, retrieval performance remains stable beyond 4 Kbytes; in other words, a bit budget of 4 Kbytes suffices for encoding the discriminative information. Figures 5a and 5b also show that searching nonplanar objects is more challenging than searching planar objects. Large photometric and geometric distortions make it harder to match query images correctly with their closest matches in the reference databases. In particular, because landmark objects are located outdoors, further challenges can arise from clutter, shadows on buildings, reflections on windows, and severe perspective at extreme angles.

Figure 5c gives the pairwise matching performance. As in retrieval, matching performance increases with the descriptor code rate, and nonplanar objects remain the most challenging. In addition, interoperability is demonstrated by pairwise matching between different operating points, for example, 1 versus 4 Kbytes and 2 versus 4 Kbytes.


Contact

Lingyu Duan
Email: lingyu@pku.edu.cn
Tel: +86 10 6275 8116
Fax: +86 10 6275 1638