Ready annotator_b
Video ID: 8RRzvri051s
Category: speech_dominant
Question types: Standard Vision, Misleading Vision, Standard Audio, Misleading Audio
As Sarah stands behind the dark wooden podium emblazoned with the UNHCR logo, wearing a white blazer and multi-strand pearl necklace, what specific action does she perform just before the audience shifts from lingering applause to rapt attention?
A. She grips the edge of the podium and clears her throat nervously.
B. She sweeps her eyes across the audience and adjusts her stance.
C. She adjusts the microphone with precise grace while meeting the audience's gaze. ✓ Correct
D. She gestures emphatically toward the banner behind her.
E. The visual detail in the question is incorrect
F. The audio detail in the question is incorrect
Answer timestamp: [20s-30s] Modality: vision Category: cross_modality

When the man in the dark suit and red tie who introduced her earlier stands at the same podium, what specific action does he perform immediately after announcing the new spokesperson?
A. She grips the edge of the podium and clears her throat nervously.
B. She sweeps her eyes across the audience and adjusts her stance.
C. She adjusts the microphone with precise grace while meeting the audience's gaze.
D. She gestures emphatically toward the banner behind her.
E. The visual detail in the question is incorrect ✓ Correct
F. The audio detail in the question is incorrect
Answer timestamp: [20s-30s] Modality: vision Category: cross_modality
Misleading Information
Category: person_appearance
Description: This tests if the model can distinguish between the two main speakers (Sarah vs. the introducer) and their respective actions. A lazy model might associate 'adjusting the microphone' or 'standing at the podium' with the wrong person based on general conference tropes rather than specific visual tracking of who is speaking at that exact timestamp.

While Sarah delivers her speech regarding the fifty million refugees and funding shortfalls, what distinct sound abruptly cuts off her sentence mid-word?
A. A single, sustained note from a bowed string instrument like a cello.
B. The faint rustle of fabric as guests lean closer to listen.
C. A loud, high-pitched electronic beep that sharply interrupts the speech. ✓ Correct
D. A low-frequency thump marking a shift in weight against the podium.
E. The visual detail in the question is incorrect
F. The audio detail in the question is incorrect
Answer timestamp: [50s-60s] Modality: audio Category: cross_modality

After the collective energy dissipates into near silence following the initial introduction, what distinct sound punctuates the moment as guests shift their chairs?
A. A single, sustained note from a bowed string instrument like a cello.
B. The faint rustle of fabric as guests lean closer to listen.
C. A loud, high-pitched electronic beep that sharply interrupts the speech.
D. A low-frequency thump marking a shift in weight against the podium.
E. The visual detail in the question is incorrect
F. The audio detail in the question is incorrect ✓ Correct
Answer timestamp: [50s-60s] Modality: audio Category: cross_modality
Misleading Information
Category: sound_type
Description: This requires distinguishing between multiple specific sounds occurring at different times: the musical sting at the start, the mechanical click after applause, and the electronic beep during the speech. A model relying on pattern matching might confuse the 'click' (which happens after applause) with the 'beep' (which happens during the speech), failing to track the temporal sequence of audio events.
