Some collaborators and I recently released a first version of a benchmark that we think highlights a critical gap in recent AI models: understanding real-world causality beyond physics alone.
Everyday environments are rich in tangible control interfaces (TCIs), like light switches, appliance panels, and embedded GUIs. These are designed for humans and demand commonsense and physics reasoning, but also causal prediction and outcome verification in time and space (e.g., delayed heating, remote lights).
Paper: SWITCH: Benchmarking Modeling and Handling of Tangible Interfaces in Long-horizon Embodied Scenarios
Data and leaderboard are on Hugging Face.
Feedback, suggestions, and collaborators are very welcome!
For a while I shared an old apartment with 3 others. Old building, 10th floor, cloth-wrapped wiring (no modern plastic/rubber/etc.), windows that didn't close, a condemned fenced-off balcony, and occasional rat visitors that could reach the 10th floor.
My team and I are working on embodied AI. More specifically, we focus on legged humanoid robots for long-horizon tasks that combine navigation with manipulation and interaction.