• Ep. 247 - Part 1 - June 13, 2024

  • 2024/06/15
  • Duration: 48 min
  • Podcast

  • Summary

  • ArXiv Computer Vision research for Thursday, June 13, 2024.


    00:21: FouRA: Fourier Low Rank Adaptation

    01:41: Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

    03:18: Few-Shot Anomaly Detection via Category-Agnostic Registration Learning

    04:57: Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting

    06:46: ToSA: Token Selective Attention for Efficient Vision Transformers

    08:00: Computer vision-based model for detecting turning lane features on Florida's public roadways

    09:08: Improving Adversarial Robustness via Feature Pattern Consistency Constraint

    10:52: Research on Deep Learning Model of Feature Extraction Based on Convolutional Neural Network

    12:10: NeRF Director: Revisiting View Selection in Neural Volume Rendering

    13:36: Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency

    15:03: Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

    16:40: COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing

    18:16: Fusion of regional and sparse attention in Vision Transformers

    19:26: Zoom and Shift are All You Need

    20:17: EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding

    21:49: The Penalized Inverse Probability Measure for Conformal Classification

    23:24: OpenMaterial: A Comprehensive Dataset of Complex Materials for 3D Reconstruction

    24:47: Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation

    26:30: Computer Vision Approaches for Automated Bee Counting Application

    27:17: Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding

    28:16: A Label-Free and Non-Monotonic Metric for Evaluating Denoising in Event Cameras

    29:43: Multiple Prior Representation Learning for Self-Supervised Monocular Depth Estimation via Hybrid Transformer

    31:25: Neural NeRF Compression

    32:29: Preserving Identity with Variational Score for General-purpose 3D Editing

    33:50: AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings

    34:51: Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition

    36:10: Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation

    37:34: AMSA-UNet: An Asymmetric Multiple Scales U-net Based on Self-attention for Deblurring

    38:49: Cross-Modal Learning for Anomaly Detection in Fused Magnesium Smelting Process: Methodology and Benchmark

    40:45: A PCA based Keypoint Tracking Approach to Automated Facial Expressions Encoding

    42:02: Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious?

    43:28: FacEnhance: Facial Expression Enhancing with Recurrent DDPMs

    45:11: How structured are the representations in transformer-based vision encoders? An analysis of multi-object representations in vision-language models

    47:08: Suitability of KANs for Computer Vision: A preliminary investigation


