ArXiv Computer Vision research for Thursday, June 13, 2024.
00:21: FouRA: Fourier Low Rank Adaptation
01:41: Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
03:18: Few-Shot Anomaly Detection via Category-Agnostic Registration Learning
04:57: Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting
06:46: ToSA: Token Selective Attention for Efficient Vision Transformers
08:00: Computer vision-based model for detecting turning lane features on Florida's public roadways
09:08: Improving Adversarial Robustness via Feature Pattern Consistency Constraint
10:52: Research on Deep Learning Model of Feature Extraction Based on Convolutional Neural Network
12:10: NeRF Director: Revisiting View Selection in Neural Volume Rendering
13:36: Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency
15:03: Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
16:40: COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing
18:16: Fusion of regional and sparse attention in Vision Transformers
19:26: Zoom and Shift are All You Need
20:17: EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding
21:49: The Penalized Inverse Probability Measure for Conformal Classification
23:24: OpenMaterial: A Comprehensive Dataset of Complex Materials for 3D Reconstruction
24:47: Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation
26:30: Computer Vision Approaches for Automated Bee Counting Application
27:17: Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding
28:16: A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras
29:43: Multiple Prior Representation Learning for Self-Supervised Monocular Depth Estimation via Hybrid Transformer
31:25: Neural NeRF Compression
32:29: Preserving Identity with Variational Score for General-purpose 3D Editing
33:50: AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings
34:51: Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition
36:10: Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation
37:34: AMSA-UNet: An Asymmetric Multiple Scales U-net Based on Self-attention for Deblurring
38:49: Cross-Modal Learning for Anomaly Detection in Fused Magnesium Smelting Process: Methodology and Benchmark
40:45: A PCA based Keypoint Tracking Approach to Automated Facial Expressions Encoding
42:02: Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious?
43:28: FacEnhance: Facial Expression Enhancing with Recurrent DDPMs
45:11: How structured are the representations in transformer-based vision encoders? An analysis of multi-object representations in vision-language models
47:08: Suitability of KANs for Computer Vision: A preliminary investigation