Summary
Synopsis and Commentary
ArXiv Computer Vision research for Wednesday, June 12, 2024.
00:20: FaithFill: Faithful Inpainting for Object Completion Using a Single Reference Image
01:21: Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
02:49: Unveiling the Power of Wavelets: A Wavelet-based Kolmogorov-Arnold Network for Hyperspectral Image Classification
04:26: Flexible Music-Conditioned Dance Generation with Style Description Prompts
05:52: Robust 3D Face Alignment with Multi-Path Neural Architecture Search
07:00: Small Scale Data-Free Knowledge Distillation
08:48: KernelWarehouse: Rethinking the Design of Dynamic Convolution
10:31: A Comprehensive Survey on Machine Learning Driven Material Defect Detection: Challenges, Solutions, and Future Prospects
12:34: Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation
14:02: IFTD: Image Feature Triangle Descriptor for Loop Detection in Driving Scenes
14:54: Multi-Teacher Multi-Objective Meta-Learning for Zero-Shot Hyperspectral Band Selection
16:30: DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera
18:10: Spatial-Frequency Dual Progressive Attention Network For Medical Image Segmentation
20:07: Accurate Explanation Model for Image Classifiers using Class Association Embedding
21:55: Real-world Image Dehazing with Coherence-based Label Generator and Cooperative Unfolding Network
23:11: SimSAM: Simple Siamese Representations Based Semantic Affinity Matrix for Unsupervised Image Segmentation
24:06: Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
25:34: OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding
26:58: Generalizable Disaster Damage Assessment via Change Detection with Vision Foundation Model
28:26: Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models
29:52: Deep Learning for Slum Mapping in Remote Sensing Images: A Meta-analysis and Review
31:49: LVBench: An Extreme Long Video Understanding Benchmark
33:14: Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking
34:48: A Robust Pipeline for Classification and Detection of Bleeding Frames in Wireless Capsule Endoscopy using Swin Transformer and RT-DETR
36:23: 3D CBCT Challenge 2024: Improved Cone Beam CT Reconstruction using SwinIR-Based Sinogram and Image Enhancement
37:29: MWIRSTD: A MWIR Small Target Detection Dataset
38:34: CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
40:27: A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder
42:35: Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams
44:26: Identification of Conversation Partners from Egocentric Video