← Dashboard -tx14nfI_eY
Ready annotator_b
Video ID: -tx14nfI_eY
Category: scene_dominant
Standard Vision ○
Misleading Vision ○
Standard Audio ○
Misleading Audio ○
During the intense exchange of gunfire on the elevated track, when the woman with her hair pulled back in a messy ponytail jams a pistol against the window frame, what specific action does she perform?
A.She squeezes off rounds into the night, causing molten sparks to erupt from the police vehicle. ✓ Correct
B.She leans out the open window to gesture frantically at the pursuing fleet.
C.She reaches for the door handle to attempt an escape into the alleyway.
D.She grabs the steering wheel to wrestle control away from the male passenger.
E.The visual detail in the question is incorrect
F.The audio detail in the question is incorrect
Answer timestamp: [140s-150s]s Modality: vision Category: existence

Annotation

As the vehicle swerves violently to avoid the pursuing cruiser marked '2182', while the man with a bandaged cheekbone grips the steering wheel until his knuckles bleach white, what specific action does he perform?
A.She squeezes off rounds into the night, causing molten sparks to erupt from the police vehicle.
B.She leans out the open window to gesture frantically at the pursuing fleet.
C.She reaches for the door handle to attempt an escape into the alleyway.
D.She grabs the steering wheel to wrestle control away from the male passenger.
E.The visual detail in the question is incorrect ✓ Correct
F.The audio detail in the question is incorrect
Answer timestamp: [140s-150s]s Modality: vision Category: existence
Misleading Information
Category: person_position
Description: By shifting focus from the female passenger (who is shooting) to the male driver (who is driving), the question tests if the model can distinguish between the actions of two characters occupying the same space but performing different tasks. A lazy model might associate 'action' with the person holding the wheel or simply guess based on common tropes rather than visual evidence.

Annotation

Amidst the thunderous crescendo of engine roars and electronic whines, when the woman's voice cuts raw and urgent shouting 'Keep firing!', what is the immediate auditory result described in the scene?
A.Bullets crack against the metal flank of the police vehicle, erupting into showers of molten sparks. ✓ Correct
B.A robotic officer's plasma rifle grazes the frame, sending sparks flying like fireflies.
C.The rearview mirror fractures further under the weight of impending danger.
D.A calm, authoritative male voice cuts through the din stating 'Repeat, he has headed to suspension three.'
E.The visual detail in the question is incorrect
F.The audio detail in the question is incorrect
Answer timestamp: [140s-150s]s Modality: audio Category: existence

Annotation

Following the sudden lurch of the car as the magnetic lock warning flashes, once the driver yells over the roar of engines asking 'What the hell are you doing?', what is the immediate auditory result described in the scene?
A.Bullets crack against the metal flank of the police vehicle, erupting into showers of molten sparks.
B.A robotic officer's plasma rifle grazes the frame, sending sparks flying like fireflies.
C.The rearview mirror fractures further under the weight of impending danger.
D.A calm, authoritative male voice cuts through the din stating 'Repeat, he has headed to suspension three.'
E.The visual detail in the question is incorrect
F.The audio detail in the question is incorrect ✓ Correct
Answer timestamp: [140s-150s]s Modality: audio Category: existence
Misleading Information
Category: speech_tone
Description: The correct premise involves an urgent command ('Keep firing!') linked to gunfire sounds, while the wrong premise involves a confused/angry question ('What the hell...') linked to a mechanical failure. The distractors mix audio events from different timestamps. This tests if the model links the specific emotional tone and content of the dialogue to the correct concurrent sound effect.

Annotation