• Ep. 247 - Part 2 - June 13, 2024

  • 2024/06/15
  • Duration: 53 min
  • Podcast

  • Summary

  • ArXiv Computer Vision research for Thursday, June 13, 2024.


    00:21: INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

    02:11: Large-Scale Evaluation of Open-Set Image Classification Techniques

    03:43: PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation

    05:00: MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era

    06:41: Auto-Vocabulary Segmentation for LiDAR Points

    07:30: AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring

    08:43: EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

    10:23: Fine-Grained Domain Generalization with Feature Structuralization

    12:03: SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution

    14:13: ReMI: A Dataset for Reasoning with Multiple Images

    15:41: A Large-scale Universal Evaluation Benchmark For Face Forgery Detection

    17:26: Thoracic Surgery Video Analysis for Surgical Phase Recognition

    18:58: Reducing Task Discrepancy of Text Encoders for Zero-Shot Composed Image Retrieval

    20:40: Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

    22:26: CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification

    24:22: Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024

    25:21: Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns

    26:30: WildlifeReID-10k: Wildlife re-identification dataset with 10k individual animals

    27:44: MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction

    29:28: Comparison Visual Instruction Tuning

    30:51: MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

    32:14: Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV

    33:10: Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

    34:33: Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

    36:04: StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning

    37:30: Parameter-Efficient Active Learning for Foundational models

    38:31: Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

    40:22: Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    42:38: Towards AI Lesion Tracking in PET/CT Imaging: A Siamese-based CNN Pipeline applied on PSMA PET/CT Scans

    44:36: Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis

    46:19: Instance-level quantitative saliency in multiple sclerosis lesion segmentation

    48:37: CMC-Bench: Towards a New Paradigm of Visual Signal Compression

    50:05: Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

    52:05: CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models
