• Ep. 246 - Part 3 - June 12, 2024

  • 2024/06/13
  • Duration: 44 min
  • Podcast

  • Summary

  • ArXiv Computer Vision research for Wednesday, June 12, 2024.


    00:20: From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition

    02:09: APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

    03:57: 2.5D Multi-view Averaging Diffusion Model for 3D Medical Image Translation: Application to Low-count PET Reconstruction with CT-less Attenuation Correction

    05:47: DDR: Exploiting Deep Degradation Response as Flexible Image Descriptor

    06:58: Eyes Wide Unshut: Unsupervised Mistake Detection in Egocentric Video by Detecting Unpredictable Gaze

    08:02: LaneCPP: Continuous 3D Lane Detection using Physical Priors

    09:23: FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

    11:10: VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

    12:46: MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    14:39: OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    16:49: AWGUNET: Attention-Aided Wavelet Guided U-Net for Nuclei Segmentation in Histopathology Images

    18:15: Diffusion Soup: Model Merging for Text-to-Image Diffusion Models

    19:58: Coherent Optical Modems for Full-Wavefield Lidar

    21:32: Transformation-Dependent Adversarial Attacks

    22:45: PixMamba: Leveraging State Space Models in a Dual-Level Architecture for Underwater Image Enhancement

    24:10: GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

    25:57: ConceptHash: Interpretable Fine-Grained Hashing via Concept Discovery

    27:26: Self-supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement

    28:51: Real2Code: Reconstruct Articulated Objects via Code Generation

    30:02: Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models

    31:42: RMem: Restricted Memory Banks Improve Video Object Segmentation

    33:12: What If We Recaption Billions of Web Images with LLaMA-3?

    34:42: Real3D: Scaling Up Large Reconstruction Models with Real-World Images

    36:07: Enhancing End-to-End Autonomous Driving with Latent World Model

    37:12: Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation

    38:43: On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models

    40:16: Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

    42:15: ICE-G: Image Conditional Editing of 3D Gaussian Splats


