Video ID: -tx14nfI_eY
Category: scene_dominant
During the intense exchange of gunfire on the elevated track, when the woman with her hair pulled back in a messy ponytail jams a pistol against the window frame, what specific action does she perform?
Annotation
As the vehicle swerves violently to avoid the pursuing cruiser marked '2182', while the man with a bandaged cheekbone grips the steering wheel until his knuckles bleach white, what specific action does he perform?
Misleading Information
Category: person_position
Description: By shifting focus from the female passenger (who is shooting) to the male driver (who is driving), the question tests whether the model can distinguish between the actions of two characters who occupy the same vehicle but perform different tasks. A lazy model might associate 'action' with the person holding the wheel, or simply guess based on common tropes rather than visual evidence.
Annotation
Amidst the thunderous crescendo of engine roars and electronic whines, when the woman's voice cuts raw and urgent shouting 'Keep firing!', what is the immediate auditory result described in the scene?
Annotation
Following the sudden lurch of the car as the magnetic lock warning flashes, once the driver yells over the roar of engines asking 'What the hell are you doing?', what is the immediate auditory result described in the scene?
Misleading Information
Category: speech_tone
Description: The correct premise involves an urgent command ('Keep firing!') linked to gunfire sounds, while the wrong premise involves a confused, angry question ('What the hell...') linked to a mechanical failure. The distractors mix audio events from different timestamps. This tests whether the model links the specific emotional tone and content of the dialogue to the correct concurrent sound effect.