Video analysis models for action recognition, temporal understanding, and video content classification