Video ID: 8RRzvri051s
Category: speech_dominant
As Sarah stands behind the dark wooden podium emblazoned with the UNHCR logo, wearing a white blazer and multi-strand pearl necklace, what specific action does she perform just before the audience shifts from lingering applause to rapt attention?
Annotation
When the man in the dark suit and red tie who introduced her earlier stands at the same podium, what specific action does he perform immediately after announcing the new spokesperson?
Misleading Information
Category: person_appearance
Description: This tests whether the model can distinguish between the two main speakers (Sarah vs. the introducer) and their respective actions. A lazy model might associate 'adjusting the microphone' or 'standing at the podium' with the wrong person based on general conference tropes rather than on specific visual tracking of who is speaking at that exact timestamp.
Annotation
While Sarah delivers her speech regarding the fifty million refugees and funding shortfalls, what distinct sound abruptly cuts off her sentence mid-word?
Annotation
After the collective energy dissipates into near silence following the initial introduction, what distinct sound punctuates the moment as guests shift their chairs?
Misleading Information
Category: sound_type
Description: This requires distinguishing between multiple specific sounds occurring at different times: the musical sting at the start, the mechanical click after applause, and the electronic beep during the speech. A model relying on pattern matching might confuse the 'click' (which happens after applause) with the 'beep' (which happens during the speech), failing to track the temporal sequence of audio events.