Ep. 247 - Part 2 - June 13, 2024

カートのアイテムが多すぎます

ご購入は五十タイトルがカートに入っている場合のみです。

カートに追加できませんでした。

しばらく経ってから再度お試しください。

ウィッシュリストに追加できませんでした。

しばらく経ってから再度お試しください。

ほしい物リストの削除に失敗しました。

しばらく経ってから再度お試しください。

ポッドキャストのフォローに失敗しました

ポッドキャストのフォロー解除に失敗しました

Ep. 247 - Part 2 - June 13, 2024

無料で聴く

ポッドキャストの詳細を見る

このコンテンツについて

ArXiv Computer Vision research for Thursday, June 13, 2024.

00:21: INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

02:11: Large-Scale Evaluation of Open-Set Image Classification Techniques

03:43: PC-LoRA: Low-Rank Adaptation for Progressive Model Compression with Knowledge Distillation

05:00: MMRel: A Relation Understanding Dataset and Benchmark in the MLLM Era

06:41: Auto-Vocabulary Segmentation for LiDAR Points

07:30: AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring

08:43: EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

10:23: Fine-Grained Domain Generalization with Feature Structuralization

12:03: SR-CACO-2: A Dataset for Confocal Fluorescence Microscopy Image Super-Resolution

14:13: ReMI: A Dataset for Reasoning with Multiple Images

15:41: A Large-scale Universal Evaluation Benchmark For Face Forgery Detection

17:26: Thoracic Surgery Video Analysis for Surgical Phase Recognition

18:58: Reducing Task Discrepancy of Text Encoders for Zero-Shot Composed Image Retrieval

20:40: Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

22:26: CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification

24:22: Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024

25:21: Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns

26:30: WildlifeReID-10k: Wildlife re-identification dataset with 10k individual animals

27:44: MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction

29:28: Comparison Visual Instruction Tuning

30:51: MirrorCheck: Efficient Adversarial Defense for Vision-Language Models

32:14: Deep Transformer Network for Monocular Pose Estimation of Ship-Based UAV

33:10: Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

34:33: Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

36:04: StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning

37:30: Parameter-Efficient Active Learning for Foundational models

38:31: Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

40:22: Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

42:38: Towards AI Lesion Tracking in PET/CT Imaging: A Siamese-based CNN Pipeline applied on PSMA PET/CT Scans

44:36: Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis

46:19: Instance-level quantitative saliency in multiple sclerosis lesion segmentation

48:37: CMC-Bench: Towards a New Paradigm of Visual Signal Compression

50:05: Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

52:05: CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models