Ready annotator_b
Video ID: 2baUXj3vrEs
Category: speech_dominant
During the moment when the elderly Asian vendor with a weary expression leans over his hanging scale near the young blonde girl, what specific object does the man with long hair reach toward?
A. The silver pitcher on the table
B. The woven baskets on the cart ✓ Correct
C. The green check emblazoned with The American Mercury
D. The brass fittings of the gramophone
E. The visual detail in the question is incorrect
F. The audio detail in the question is incorrect
Answer timestamp: [80s-90s] Modality: vision Category: cross_modality

As the bald vendor in the crisp white apron measures weights under the striped awnings, what action is he performing with his hands?
A. The silver pitcher on the table
B. The woven baskets on the cart
C. The green check emblazoned with The American Mercury
D. The brass fittings of the gramophone
E. The visual detail in the question is incorrect ✓ Correct
F. The audio detail in the question is incorrect
Answer timestamp: [80s-90s] Modality: vision Category: cross_modality
Misleading Information
Category: person_action
Description: This variant misleads models by swapping the vendor's identity (the elderly Asian vendor becomes a bald vendor) along with the associated action context. A model might conflate the two vendors appearing in the same market scene, or rely on the generic concept of a 'vendor measuring', instead of tracking the specific visual details of the Asian vendor scene, where the basket interaction occurs.

Following the exchange where a hesitant voice calls 'Mr. Hoya?' and receives a curt reply, what specific phrase is whispered immediately after?
A. Like to say thank you
B. Like to say thank you ✓ Correct
C. All right, amigos… ya conozco en paz…
D. She's come to feel…
E. The visual detail in the question is incorrect
F. The audio detail in the question is incorrect
Answer timestamp: [80s-90s] Modality: audio Category: cross_modality

When the gravelly narrator describes the bullet-faced vendor who keeps someone alive for a nickel, what specific detail about the vendor's demeanor is mentioned?
A. Like to say thank you
B. Like to say thank you
C. All right, amigos… ya conozco en paz…
D. She's come to feel…
E. The visual detail in the question is incorrect
F. The audio detail in the question is incorrect ✓ Correct
Answer timestamp: [80s-90s] Modality: audio Category: cross_modality
Misleading Information
Category: speech_content
Description: This variant misleads models by substituting the dialogue source (the main narrator in place of the overlapping character voices). Both audio elements occur in the same [80s-90s] segment. A model taking a shortcut might grab the first spoken line it hears (the narrator) or confuse the overlapping sounds, and so fail to isolate the specific conversational exchange requested.
