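Several papers from PAI, Alibaba Cloud's machine learning platform, were recently accepted to ICCV 2023. ICCV, the International Conference on Computer Vision, is a research conference held every two years by the Institute of Electrical and Electronics Engineers (IEEE). Together with CVPR and ECCV, it is regarded as one of the top conferences in computer vision. ICCV 2023 will be held in Paris, France, from October 2 to October 6. ICCV brings together scholars, engineers, and researchers from around the world to share the latest computer vision research and technical advances. The conference covers every direction in the field, including image processing, pattern recognition, machine learning, and artificial intelligence. Its papers and talks draw wide attention, making it an important venue for exchange and collaboration in computer vision.
Alibaba Cloud PAI has three papers accepted to ICCV 2023 in total. Two of them, the foundation model SMT and a fingerprint protection technique for image restoration models, came out of the joint training program between Alibaba Cloud and South China University of Technology. The third, the object detector Stable-DINO, came out of a collaboration between Alibaba Cloud and Lei Zhang's team at IDEA-CVR. The acceptance of these three papers means that Alibaba Cloud PAI has further raised its influence in the international computer vision community.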
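Paper Summaries
When Scale-Aware Modulation Meets Transformer
In recent years, vision foundation models based on Transformers and CNNs have achieved great success. Many studies have gone further and combined Transformer structures with CNN architectures to design more efficient hybrid CNN-Transformer networks, but their accuracy remains unsatisfactory. This paper introduces a new foundation model, SMT (Scale-Aware Modulation Transformer), which delivers substantial performance gains with fewer parameters (params) and lower computational cost (FLOPs).
Unlike other CNN-Transformer hybrids, SMT uses convolution to build a novel lightweight unit, Scale-Aware Modulation (SAM), which captures multi-scale features while enlarging the receptive field, further strengthening convolutional modulation. In addition, SMT proposes an Evolutionary Hybrid Network (EHN), which effectively models the shift from capturing local dependencies to capturing global dependencies as the network goes from shallow to deep, and thereby achieves better performance. The model's effectiveness is verified on ImageNet, COCO, and ADE20K. Notably, after pre-training on ImageNet-22k, SMT reaches 88.1% accuracy on ImageNet-1k with only 80.5M parameters.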
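The idea of convolutional modulation with several receptive fields can be sketched in a few lines. This is a toy illustration only, not the SMT implementation: it works on 1D feature rows, uses fixed averaging filters in place of learned depthwise kernels, and averages the branches where the real model would use learned aggregation. The function names and the kernel sizes are assumptions made for the example.

```python
def conv1d_same(signal, kernel_size):
    """Zero-padded averaging filter; a stand-in for a learned depthwise kernel."""
    half = kernel_size // 2
    n = len(signal)
    out = []
    for i in range(n):
        # Positions outside the signal contribute zero (zero padding).
        window = [signal[j] for j in range(i - half, i + half + 1) if 0 <= j < n]
        out.append(sum(window) / kernel_size)
    return out


def scale_aware_modulation(heads, kernel_sizes=(3, 5, 7, 9)):
    """heads: list of 1D feature rows, one per head (hypothetical layout)."""
    assert len(heads) == len(kernel_sizes)
    # Multi-scale branch: each head is filtered with a different kernel size,
    # so each one sees a different receptive field.
    multi_scale = [conv1d_same(h, k) for h, k in zip(heads, kernel_sizes)]
    # Aggregation (assumed here to be a plain average across heads).
    n = len(heads[0])
    modulator = [sum(m[i] for m in multi_scale) / len(multi_scale) for i in range(n)]
    # Modulation: element-wise product of the modulator with each head's values.
    return [[m * v for m, v in zip(modulator, h)] for h in heads]


if __name__ == "__main__":
    features = [[1.0] * 8, [1.0] * 8]
    for row in scale_aware_modulation(features, kernel_sizes=(3, 5)):
        print([round(x, 3) for x in row])
```

With constant input, interior positions stay at 1.0 while border positions shrink, because the larger kernels reach into the zero padding. The point of the sketch is only the structure: several kernel sizes, one aggregated modulator, and an element-wise product.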
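Overall, in exploring backbones for vision foundation models, we have the following outlook for the future:
Take vision Transformers as an example. Apart from self-supervised learning and similar pre-training regimes, which still rely on plain Vision Transformers such as ViT, most vision foundation models adopt hierarchical architectures such as Swin and PVT as their basic design paradigm. The problem this paradigm has to solve is how to design more efficient attention computation in the shallow stages, so as to relieve the computational burden caused by the quadratic complexity of self-attention. Whether a better computing module can replace SAM or MSA is a path we need to keep exploring. In 2023, more vision Transformer models and large CNN foundation models have been proposed, and they keep overtaking one another on the major leaderboards, which shows that CNNs still have a place in CV. If Transformers cannot fully replace CNNs in CV, is combining the strengths of both the better choice? We therefore hope that SMT can serve as a new baseline for the hybrid CNN-Transformer direction and drive progress in this field.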
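To make the quadratic burden in shallow stages concrete, here is some rough arithmetic. It assumes a 224x224 input and a four-stage hierarchical backbone with strides 4, 8, 16, and 32; these numbers are illustrative assumptions, not figures from the paper. Global self-attention over N tokens compares every token with every other one, so the attention map has N squared entries.

```python
def attention_pairs(image_size, stride):
    """Return (tokens, token_pairs) for global self-attention at one stage."""
    side = image_size // stride      # tokens along one spatial side
    tokens = side * side             # N
    return tokens, tokens * tokens   # N and N^2


if __name__ == "__main__":
    for stage, stride in enumerate((4, 8, 16, 32), start=1):
        tokens, pairs = attention_pairs(224, stride)
        print(f"stage {stage}: {tokens} tokens, {pairs} attention pairs")
```

Under these assumptions the first stage has 3136 tokens and about 9.8 million token pairs, while the last stage has 49 tokens and 2401 pairs, which is why the shallow stages are where cheaper alternatives to full self-attention matter most.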
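A Stable Matching Strategy Raises the Ceiling of Detection Transformers
This paper points out that the unstable matching problem in DETR is caused by multiple optimization paths, and that the problem becomes more pronounced under DETR's one-to-one matching. We show that simply introducing a position metric into the classification loss is enough to alleviate it. Building on this principle, we use position metric information to propose two simple and effective designs that apply to all DETR-family models: a position-supervised loss and a position-modulated matching cost. In addition, we propose dense memory fusion to strengthen the encoder and backbone features.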
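One way to read "introducing a position metric into the classification loss" is to supervise the class score of each matched query with its box IoU instead of a hard label of 1. The sketch below shows that reading in its simplest form. It is an assumption-laden illustration rather than the paper's formula: it uses plain binary cross-entropy, takes the raw IoU as the target, and all function names are hypothetical. The published loss may add further weighting or apply a transform to the IoU.

```python
import math


def bce(p, target, eps=1e-12):
    """Binary cross-entropy for one probability and a soft target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def position_supervised_loss(probs, ious, matched):
    """Mean BCE where matched queries are supervised with their IoU.

    probs:   predicted class probability per query
    ious:    IoU between each query's box and its assigned ground truth
    matched: True for queries matched to a ground-truth object
    """
    total = 0.0
    for p, iou, is_pos in zip(probs, ious, matched):
        target = iou if is_pos else 0.0  # unmatched queries are pushed to 0
        total += bce(p, target)
    return total / len(probs)


if __name__ == "__main__":
    # Same confidence (0.9), different localization quality.
    well_localized = position_supervised_loss([0.9], [0.9], [True])
    poorly_localized = position_supervised_loss([0.9], [0.2], [True])
    print(well_localized, poorly_localized)
```

In this sketch a confident prediction with a poorly localized box is penalized more than an equally confident prediction with an accurate box, so classification scores are tied to localization quality. The same soft target could in principle be reused inside the matching cost, which is the role the source assigns to the position-modulated matching cost.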
We validated the method on a range of DETR models. With a ResNet-50 backbone, our Stable-DINO reaches 50.4 AP and 51.5 AP under the standard 1x and 2x settings, respectively. The method also scales well: with Swin-Large and Focal-Huge backbones, Stable-DINO reaches 63.8 AP and 64.8 AP on COCO test-dev, respectively.
Although our method performs well, we have only validated it on DETR-like image object detection and segmentation. Further exploration, such as 3D object detection, is left as future work. In addition, we focus only on the classification part of the loss and the matching, and leave the localization part unchanged. Analysis of the localization part is likewise left as future work.
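Fingerprint Protection for Image Restoration Models
Deep learning has become a prominent tool for solving computer vision problems. Sharing pre-trained DNN models in the open-source community is now common practice, and many companies and institutions also offer paid commercial pre-trained model services. This creates a strong incentive for illegitimate users to copy or steal models, for example through malware infection or insider leaks, in order to avoid the expensive training process. Both the community and companies therefore have a strong need to protect the intellectual property of their DNN models. One popular approach is model watermarking, which invasively embeds specific information, called a watermark, into the source model and then checks for its presence in a suspect model. However, invasive embedding modifies the model weights and may therefore degrade the model's utility, which makes it less than ideal in practice.
Recently, a non-invasive approach called model fingerprinting has drawn attention. Unlike watermarking, fingerprinting does not modify any model parameter; it extracts a unique feature from the model, called a fingerprint, to identify ownership. Ownership is verified by comparing the fingerprint of the source model with that of the suspect model. Most existing deep model fingerprinting schemes focus only on image classification, for example by using decision-boundary points as fingerprints, and no fingerprinting scheme for deep image restoration networks has been published. Image restoration models are already widely used, for example in image denoising, super-resolution, and deblurring. To explore non-invasive model protection for image restoration tasks, we therefore propose the first fingerprinting scheme for deep image restoration models.
As shown in the figure below, the overall steps of our method are as follows:
Step 1. Extract a fingerprint from the source model.
Step 2. Extract a fingerprint from the suspect model. The suspect model may be an attacked model obtained illegitimately, or an unrelated innocent model; the goal of fingerprint verification is to tell the two apart.
Step 3. Verify the similarity of the two sets of fingerprints: extract features from each set, and make the decision based on a theft probability computed at the feature level and the statistical level.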
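The comparison in the final step can be pictured with a toy example. Everything here is an assumption made for illustration: fingerprints are represented as small feature vectors, similarity is the mean cosine similarity over paired vectors, and the 0.9 threshold is arbitrary. The paper's actual features and its feature-level and statistical-level theft probability are not reproduced.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def looks_stolen(source_fps, suspect_fps, threshold=0.9):
    """Compare paired fingerprint features; return (mean_similarity, verdict).

    The threshold is a hypothetical value chosen for this example.
    """
    sims = [cosine(s, t) for s, t in zip(source_fps, suspect_fps)]
    mean_sim = sum(sims) / len(sims)
    return mean_sim, mean_sim >= threshold


if __name__ == "__main__":
    source = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
    fine_tuned_copy = [[1.1, 0.0, 2.0], [0.0, 0.9, 1.0]]   # slightly perturbed
    unrelated = [[0.0, 3.0, 0.0], [1.0, 0.0, 0.0]]          # independent model
    print(looks_stolen(source, fine_tuned_copy))
    print(looks_stolen(source, unrelated))
```

The slightly perturbed copy stays close to the source fingerprints and is flagged, while the unrelated model is not. A real verifier would replace the single threshold with the probability estimate described in the steps above.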
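Fingerprint extraction is based mainly on the idea of model inversion: with the model fixed, we optimize an image to find a critical image at which the model's restoration difficulty is exactly balanced, as illustrated below:
Strengths and Limitations
Compared with model watermarking, the biggest advantage of our fingerprinting scheme is that it does not change the parameters of the deep image restoration network at all, and therefore has no effect on model performance; experiments also show that it withstands common model attacks. However, our fingerprint verification currently requires access to the model's gradient information. In other words, compared with earlier black-box watermark verification procedures, the verifier needs a higher level of access. Improving the verification stage will therefore be one of our future directions.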
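Open-Source Release
To better serve the open-source community, the source code of the two algorithms above has been released. We are also developing a framework for easily training, running inference with, and deploying these algorithms on PAI; it is expected to launch around October, so stay tuned.
GitHub: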
https://github.com/AFeng-x/SMT
ModelScope:
https://modelscope.cn/models/PAI/SMT/summary
Multiple Papers from Alibaba Cloud Machine Learning Platform PAI Accepted to ICCV 2023
● Paper title:
Scale-Aware Modulation Meet Transformer
¡ñ ÂÛÎÄ×÷Õߣº
ÁÖì¿·á¡¢Îâè÷ºã¡¢³Â¼ÑÓí¡¢»Æ¿¡¡¢½ðÁ¬ÎÄ
¡ñ ÂÛÎÄPDFÁ´½Ó£º
https://arxiv.org/pdf/2307.08579.pdf
¡ñ ÂÛÎıêÌ⣺
Detection Transformer with Stable Matching
¡ñ ÂÛÎÄ×÷Õߣº
ÁõÊÀ¡¡¢ÈÎÌìºÍ¡¢³Â¼ÑÓí¡¢ÔøÕ×Ñô¡¢Õźơ¢Àî·å¡¢ÀîºëÑ󡢻ƿ¡¡¢ËÕº½¡¢Öì¾ü¡¢ÕÅÀÚ
¡ñ ÂÛÎÄPDFÁ´½Ó£º
https://arxiv.org/pdf/2304.04742.pdf
¡ñ ÂÛÎıêÌ⣺
Fingerprinting Deep Image Restoration Models
¡ñ ÂÛÎÄ×÷Õߣº
È«ÓîêÍ¡¢ëø御¢ÐíÈôÌΡ¢»Æ¿¡¡¢¼Í»Ô
¡ñ ÂÛÎÄPDFÁ´½Ó£º
https://csyhquan.github.io/manuscript/23-iccv-Fingerprinting%20D