arXiv LLM Analyzer

EventHub: Data Factory for Generalizable Event-Based Stereo Networks without Active Sensors

Published: 2026-04-02 | Authors: Luca Bartolomei, Fabio Tosi, Matteo Poggi, Stefano Mattoccia, Guillermo Gallego | Tags: cs.CV

Abstract

We propose EventHub, a novel framework for training deep-event stereo networks without ground truth annotations from costly active sensors, relying instead on standard color images. From these images, we derive either proxy annotations and proxy events through state-of-the-art novel view synthesis techniques, or simply proxy annotations when images are already paired with event data. Using the training set generated by our data factory, we repurpose state-of-the-art stereo models from RGB literature to process event data, obtaining new event stereo models with unprecedented generalization capabilities. Experiments on widely used event stereo datasets support the effectiveness of EventHub and show how the same data distillation mechanism can improve the accuracy of RGB stereo foundation mod...
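The abstract mentions deriving proxy events from rendered views. As a rough illustration of how a frame sequence is commonly turned into events, the sketch below implements the idealized event-camera model: a pixel emits an event each time its log intensity moves by one contrast threshold away from the level at which it last fired. The function name `simulate_events`, the threshold value, the `eps` offset, and stamping events with the frame time (instead of interpolating per-event times) are assumptions for illustration; the abstract does not say which event simulator EventHub uses.

```python
import math


def simulate_events(frames, timestamps, threshold=0.2, eps=1e-3):
    """Idealized event-camera model (illustrative assumption, not EventHub's pipeline).

    frames: list of 2D lists of linear intensities, one per timestamp.
    Returns a list of (t, x, y, polarity) tuples with polarity in {+1, -1}.
    """
    height, width = len(frames[0]), len(frames[0][0])
    # Per-pixel reference log intensity: the level at which the last event fired.
    ref = [[math.log(frames[0][y][x] + eps) for x in range(width)]
           for y in range(height)]
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        for y in range(height):
            for x in range(width):
                diff = math.log(frame[y][x] + eps) - ref[y][x]
                # One event per full threshold crossing, brighter (+1) or darker (-1).
                n = int(abs(diff) // threshold)
                polarity = 1 if diff > 0 else -1
                events.extend((t, x, y, polarity) for _ in range(n))
                # Move the reference by the amount "consumed" by the emitted events.
                ref[y][x] += polarity * n * threshold
    return events
```

For example, a single pixel going from intensity 1.0 to 3.0 and then to 2.0 with `threshold=0.25` yields four positive events at the second frame and one negative event at the third. Real simulators additionally model noise, refractory periods, and sub-frame timing.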

Key Methods AI

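Builds a data factory from standard color images: it derives proxy stereo annotations and, via novel view synthesis, proxy events (or only proxy annotations when the images are already paired with event data). The generated training set is used to repurpose state-of-the-art RGB stereo models for event input, and the same distillation mechanism is also applied to RGB stereo foundation models.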

Datasets Used AI

Evaluated on widely used event stereo datasets (not named in the abstract). Training data come from standard color images, optionally paired with real event data; no ground truth from active sensors is required.

Future Work AI

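Citation: Bartolomei, L., et al. (2026). EventHub: Data Factory for Generalizable Event-Based Stereo Networks without Active Sensors. arXiv.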

Hot Rocks Survey V: Secondary Eclipse Photometry of GJ 3473 b with JWST/MIRI

Published: 2026-04-02 | Authors: Måns Holmberg, Hannah Diamond-Lowe, João M. Mendonça, Daniel Kitzmann, Néstor Espinoza | Tags: astro-ph.EP

Abstract

JWST is transforming our ability to characterise small exoplanets, from sub-Neptunes to rocky worlds. A key open question is whether highly irradiated rocky planets can retain atmospheres or are stripped bare by stellar irradiation, a boundary that remains to be mapped observationally. Here we present the first JWST secondary eclipse observations of the rocky exoplanet GJ 3473 b, obtained with MIRI F1500W photometry. Using four visits, we confidently detect the eclipse at an average depth of 186 ± 45 ppm, somewhat lower than expected for a blackbody. We test a wide range of data reduction and analysis assumptions and provide new insights into MIRI detector settling behaviour that will benefit future observations. We model a suite of airless surfaces with varied compositions, textures, ...
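The phrase "lower than expected for a blackbody" can be made concrete with the standard thermal-emission estimate for an airless, zero-albedo dayside with no heat redistribution, T_day = T_star (R_star/a)^(1/2) (2/3)^(1/4), and an eclipse depth of (R_p/R_star)^2 B_λ(T_day) / B_λ(T_star) near 15 µm, the nominal wavelength of the MIRI F1500W filter. The sketch below evaluates these expressions. The function names and the stellar and planetary parameters in the usage block are placeholders, not the values for GJ 3473 b, and the significance line simply divides the reported depth by its uncertainty.

```python
import math

H = 6.62607015e-34   # Planck constant [J s]
C = 2.99792458e8     # speed of light [m/s]
K_B = 1.380649e-23   # Boltzmann constant [J/K]


def planck(wavelength_m, temp_k):
    """Blackbody spectral radiance B_lambda(T) in W m^-3 sr^-1."""
    x = H * C / (wavelength_m * K_B * temp_k)
    return 2 * H * C**2 / wavelength_m**5 / math.expm1(x)


def bare_rock_dayside_temp(t_star, r_star_over_a, bond_albedo=0.0, f=2 / 3):
    """Dayside temperature; f = 2/3 is the no-redistribution (bare rock) limit."""
    return t_star * math.sqrt(r_star_over_a) * (f * (1 - bond_albedo)) ** 0.25


def eclipse_depth_ppm(rp_over_rs, t_day, t_star, wavelength_m=15e-6):
    """Secondary eclipse depth in ppm, treating planet and star as blackbodies."""
    ratio = planck(wavelength_m, t_day) / planck(wavelength_m, t_star)
    return 1e6 * rp_over_rs**2 * ratio


if __name__ == "__main__":
    # Placeholder parameters (hypothetical, NOT those of GJ 3473 b).
    t_star, r_star_over_a, rp_over_rs = 3300.0, 0.05, 0.035
    t_day = bare_rock_dayside_temp(t_star, r_star_over_a)
    depth = eclipse_depth_ppm(rp_over_rs, t_day, t_star)
    print(f"T_day ~ {t_day:.0f} K, blackbody depth ~ {depth:.0f} ppm")
    # Face-value significance of the reported depth (186 +/- 45 ppm).
    print(f"significance ~ {186 / 45:.1f} sigma")
```

A measured depth below the bare-rock blackbody prediction can point to a non-blackbody surface emissivity or to some heat redistribution; distinguishing these is what the suite of airless-surface models is for.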

Key Methods AI

Datasets Used AI

Future Work AI

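Not stated in the abstract; the authors expect their findings on MIRI detector settling behaviour to benefit future observations.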

ActionParty: Multi-Subject Action Binding in Generative Video Games

Published: 2026-04-02 | Authors: Alexander Pondaven, Ziyi Wu, Igor Gilitschenski, Philip Torr, Sergey Tulyakov | Tags: cs.CV, cs.AI, cs.LG

Abstract

Recent advances in video diffusion have enabled the development of "world models" capable of simulating interactive environments. However, these models are largely restricted to single-agent settings, failing to control multiple agents simultaneously in a scene. In this work, we tackle a fundamental issue of action binding in existing video diffusion models, which struggle to associate specific actions with their corresponding subjects. For this purpose, we propose ActionParty, an action controllable multi-subject world model for generative video games. It introduces subject state tokens, i.e. latent variables that persistently capture the state of each subject in the scene. By jointly modeling state tokens and video latents with a spatial biasing mechanism, we disentangle global video fra...
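The abstract does not describe how the spatial biasing mechanism works. One generic way to bias attention spatially is to add a location-dependent term to the attention logits before the softmax, so that a token attends more strongly to patches near a given position. The sketch below shows that idea for a single query; the Gaussian form of the bias, `sigma`, and the function names are hypothetical and are not taken from ActionParty.

```python
import math


def spatial_bias(subject_xy, patch_xys, sigma=1.0):
    """Gaussian log-bias: 0 at the subject, increasingly negative with distance."""
    sx, sy = subject_xy
    return [-((x - sx) ** 2 + (y - sy) ** 2) / (2 * sigma ** 2) for x, y in patch_xys]


def biased_attention(query, keys, values, bias):
    """Scaled dot-product attention for one query with an additive per-key bias."""
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + b
              for key, b in zip(keys, bias)]
    # Subtract the max logit for a numerically stable softmax.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is the attention-weighted sum of the value vectors.
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights
```

For instance, with a zero query, `spatial_bias((0, 0), [(0, 0), (1, 0), (3, 0)])` makes the attention weights decrease with distance from the subject, so the patch under the subject dominates. Because the bias is additive in logit space, it nudges rather than hard-masks attention; a hard mask would correspond to a bias of negative infinity outside a region.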

Key Methods AI

Datasets Used AI

No datasets are named in the abstract.

Future Work AI

The abstract does not discuss future work.

Modulate-and-Map: Crossmodal Feature Mapping with Cross-View Modulation for 3D Anomaly Detection

Published: 2026-04-02 | Authors: Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti, Luigi Di Stefano | Tags: cs.CV

Abstract

We present ModMap, a natively multiview and multimodal framework for 3D anomaly detection and segmentation. Unlike existing methods that process views independently, our method draws inspiration from the crossmodal feature mapping paradigm to learn to map features across both modalities and views, while explicitly modelling view-dependent relationships through feature-wise modulation. We introduce a cross-view training strategy that leverages all possible view combinations, enabling effective anomaly scoring through multiview ensembling and aggregation. To process high-resolution 3D data, we train and publicly release a foundational depth encoder tailored to industrial datasets. Experiments on SiM3D, a recent benchmark that introduces the first multiview and multimodal setup for 3D anomaly...
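The crossmodal feature mapping idea can be sketched as follows: a mapper predicts one modality's features from the other's, and a point's anomaly score is the distance between the predicted and the observed feature. View-dependent behaviour is injected by a feature-wise (FiLM-style) scale and shift per view, and per-view scores are averaged. The linear mapper, applying the modulation to the mapper's input, mean aggregation, and all names below are assumptions for illustration, not ModMap's architecture.

```python
import math
from statistics import mean


def modulate(features, gamma, beta):
    """Feature-wise (FiLM-style) modulation: gamma * h + beta, element by element."""
    return [g * h + b for h, g, b in zip(features, gamma, beta)]


def linear_map(weights, features):
    """Stand-in for a learned crossmodal mapper: a plain matrix-vector product."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def point_anomaly_score(feat_2d, feat_3d, weights, gamma, beta):
    """Distance between the 3D feature predicted from 2D and the observed 3D feature."""
    predicted = linear_map(weights, modulate(feat_2d, gamma, beta))
    return math.dist(predicted, feat_3d)


def multiview_score(views, weights, view_params):
    """Average the per-view scores of one 3D point (a simple ensembling rule).

    views: {view_id: (feat_2d, feat_3d)}
    view_params: {view_id: (gamma, beta)}, the hypothetical per-view modulation.
    """
    scores = []
    for view_id, (feat_2d, feat_3d) in views.items():
        gamma, beta = view_params[view_id]
        scores.append(point_anomaly_score(feat_2d, feat_3d, weights, gamma, beta))
    return mean(scores)
```

The intuition is that a mapper trained only on normal samples reproduces the other modality well on normal regions and poorly on anomalous ones, so large mapping errors flag anomalies. Averaging is only one option; taking the maximum over views would favour recall for defects visible from a single viewpoint.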

Key Methods AI

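Crossmodal feature mapping across both modalities and views, with view-dependent relationships modelled through feature-wise modulation; a cross-view training strategy over all view combinations; multiview ensembling and aggregation for anomaly scoring; and a publicly released foundational depth encoder for high-resolution industrial 3D data.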

Datasets Used AI

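SiM3D, a recent benchmark that introduces the first multiview and multimodal setup for 3D anomaly detection; the depth encoder is trained on industrial datasets not named in the abstract.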

Future Work AI

Steerable Visual Representations

Published: 2026-04-02 | Authors: Jona Ruthardt, Manu Gaur, Deva Ramanan, Makarand Tapaswi, Yuki M. Asano | Tags: cs.CV, cs.AI

Abstract

Pretrained Vision Transformers (ViTs) such as DINOv2 and MAE provide generic image features that can be applied to a variety of downstream tasks such as retrieval, classification, and segmentation. However, such representations tend to focus on the most salient visual cues in the image, with no way to direct them toward less prominent concepts of interest. In contrast, Multimodal LLMs can be guided with textual prompts, but the resulting representations tend to be language-centric and lose their effectiveness for generic visual tasks. To address this, we introduce Steerable Visual Representations, a new class of visual representations, whose global and local features can be steered with natural language. While most vision-language models (e.g., CLIP) fuse text with visual features after en...

Key Methods AI

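Visual representations whose global and local features can be steered with natural-language prompts toward less salient concepts, while remaining effective for generic visual tasks such as retrieval, classification, and segmentation.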

Datasets Used AI


Future Work AI

Not discussed in the abstract.