MLAdaptiveIntelligence/LLaVAction-0.5B
Video-Text-to-Text • 0.9B • Updated • 22 • 2
LLaVAction: evaluating and training multi-modal large language models for action recognition