rxc205 committed on
Commit f2a5a0d · verified · 1 Parent(s): cd83f5d

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -45,7 +45,7 @@ We presents Pelican-VL 1.0, a new family of open-source embodied brain models wi
 
  * **Spatio-Temporal Cognition**: The model’s training includes tens of thousands of hours of video and dynamic scene question-answering, enabling it to understand continuous temporal sequences. When processing video frames, Pelican-VL captures object motion and the temporal order of actions, allowing it to make coherent inferences about complex sequential tasks—for instance, determining “which item should be moved first before operating the next.”
 
- * **Embodied Interaction Capabilities**: In robotic tasks such as object grasping, navigation, and collaboration, Pelican-VL not only comprehends task goals but also generates detailed action plans and evaluates the feasibility of each step. This means that upon receiving an instruction, it can design joint movement trajectories, grasping points, and operation strategies for robots. Its multi-task abilities span grasping, navigation, and human-robot interaction, demonstrating strong cross-task generalization.
+ * **Embodied Interaction Capabilities**: In robotic tasks such as object grasping, navigation, and collaborative manipulation, Pelican-VL not only understands high-level task goals but also produces detailed action sequences along with feasibility assessments for each step. This means that upon receiving an instruction, the model can determine appropriate grasp points and devise corresponding manipulation strategies. Its multi-task proficiency spans grasping, navigation, and human-robot interaction, demonstrating strong cross-task generalization.
 
  * **Self-Correction and Iterative Learning**: Through DPPO cyclic training, Pelican-VL exhibits a “self-correcting” capability. After each reinforcement learning cycle, the model automatically generates new challenging samples for retraining—similar to repeated practice and reflection. Over time, its weaknesses are gradually addressed, and its abilities continuously improve. This process mirrors the concept of “deliberate practice,” allowing Pelican-VL to advance iteratively and achieve performance on par with top-tier proprietary systems.
 