• Ep. 245 - Part 3 - June 11, 2024

  • 2024/06/13
  • Duration: 38 min
  • Podcast

  • Summary

  • ArXiv Computer Vision research for Tuesday, June 11, 2024.


    00:21: DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses

    01:44: Beware of Aliases -- Signal Preservation is Crucial for Robust Image Restoration

    02:49: Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning

    04:04: OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

    06:01: 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

    07:24: VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

    08:58: Image Neural Field Diffusion Models

    10:11: Comparing Deep Learning Models for Rice Mapping in Bhutan Using High Resolution Satellite Imagery

    12:29: GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection

    14:26: ReduceFormer: Attention with Tensor Reduction by Summation

    15:23: Trim 3D Gaussian Splatting for Accurate Geometry Representation

    16:44: SPIN: Spacecraft Imagery for Navigation

    18:24: Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

    20:00: Understanding Visual Concepts Across Models

    21:12: Instant 3D Human Avatar Generation using Image Diffusion Models

    22:47: Neural Gaffer: Relighting Any Object via Diffusion

    24:19: Autoregressive Pretraining with Mamba in Vision

    25:51: Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

    27:19: Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

    28:50: Situational Awareness Matters in 3D Vision Language Reasoning

    30:10: Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

    31:46: Zero-shot Image Editing with Reference Imitation

    33:08: Image and Video Tokenization with Binary Spherical Quantization

    34:18: An Image is Worth 32 Tokens for Reconstruction and Generation

    36:28: Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring


