You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

You agree to not use the model to for government surveillance or law enforcement purposes.

Crashout: Real-time Human Gait Analysis

Crashout is a hybrid deep learning system for real-time human gait analysis, built entirely in Rust using the Burn deep learning framework. The system combines computer vision (pose detection) with temporal sequence modeling (LSTM + Transformer) to analyze human walking patterns for medical, sports, and research applications.

Architecture Overview

Input Video Stream → Pose Detection → LSTM Temporal Processing → Transformer Attention → Gait Analysis

Data Flow:

Video Input: RGB frames [batch, 3, 640, 640]
Pose Detection: Extract 17 keypoints → [batch, seq_len, 17, 3] (x, y, confidence)
Sequential Processing: LSTM processes flattened poses [batch, seq_len, 51]
Attention Layer: Transformer attends to important temporal moments
Gait Analysis: Classification, quality scoring, and feature extraction

Quick Start

Building the Library

# Build the library
cargo build --lib

# Check for compilation errors
cargo check --lib

# Run tests (including doctests)
cargo test

# Build with optimizations
cargo build --release --lib

# Run clippy for linting
cargo clippy

# Format code
cargo fmt

Training Pipeline

Train gait analysis models directly on video data with CSV labels:

# Build the training binary
cargo build --bin crashout

# Train a quality scoring model
cargo run --bin crashout train --data ./training_data --model-type quality --epochs 50

# Train a pathology classification model
cargo run --bin crashout train --data ./training_data --model-type classification --num-classes 5 --epochs 100

# Train a multi-task model with custom parameters
cargo run --bin crashout train \
  --data ./training_data \
  --model-type multi-task \
  --num-classes 5 \
  --epochs 200 \
  --batch-size 8 \
  --learning-rate 0.0005 \
  --max-seq-len 150 \
  --val-split 0.15 \
  --output ./models \
  --model-name gait_classifier

Video Resolution Standardization

All videos are automatically resized to 640x640 for consistent processing:

Input: Any size MP4 video (720p, 1080p, 4K, etc.)
Processing: Frames resized to 640x640 during extraction
Pose Detection: Operates on 640x640 frames (optimal input size)
Crashout Model: Processes 640x640 frames (consistent with pose detection)
Output: Labeled video at 640x640 resolution

This ensures:

✅ Consistent model performance across all input videos
✅ Optimal processing speed and memory usage
✅ No resolution mismatches between components
✅ Standardized training data format

CLI Options

The training command supports extensive customization:

# Full training command with all options
cargo run --bin crashout train \
  --data ./training_data \
  --model-type multi-task \
  --num-classes 5 \
  --epochs 200 \
  --batch-size 8 \
  --learning-rate 0.0005 \
  --max-seq-len 150 \
  --val-split 0.15 \
  --output ./models \
  --model-name gait_classifier \
  --skip-frames 2 \
  --video-extensions mp4,avi,mov

Available Commands:

train: Train gait analysis models on video data
inference: Run inference on new videos (coming soon)

Real-Time Streaming Analysis (Experimental)

Crashout's architecture is designed to support real-time streaming gait analysis using buffered sequences, though this capability is currently untested. The system could theoretically process live video streams with sufficient buffering to capture complete gait cycles.

Streaming Potential

Supported Stream Types (theoretical):

RTMP/RTMPS live streams
WebRTC video streams
Direct camera feeds (/dev/video0)
Network video streams (HTTP/HTTPS)

Minimum Buffer Requirements:

Technical minimum: ~30-60 frames (1-2 seconds at 30fps) for basic gait detection
Recommended: ~90-150 frames (3-5 seconds at 30fps) for robust analysis
Optimal: ~120-180 frames (4-6 seconds at 30fps) for highest accuracy

Expected Performance (untested):

Latency: 3-6 seconds (buffer fill time + inference)
Update frequency: Every 0.5-1 seconds using sliding windows
Memory per stream: ~50-100MB buffer overhead

Implementation Approach

The streaming system would use a sliding window buffer:

Continuous buffering: Maintain rolling window of recent frames
Gait cycle capture: Buffer length ensures complete step cycles are captured
Overlapped inference: Run predictions on overlapping windows for smooth output
Real-time pose extraction: Use pose detection on each incoming frame

Potential Applications

Security monitoring: Real-time person identification at checkpoints
Healthcare monitoring: Continuous gait quality assessment in facilities
Sports analysis: Live biomechanical feedback during training
Accessibility: Real-time mobility assistance and fall prevention

Current Status

⚠️ This functionality is theoretical and untested. The current implementation focuses on offline video analysis. Real-time streaming would require:

Streaming video input integration
Sliding window buffer implementation
Real-time inference pipeline optimization
Latency and throughput testing

The existing tensor pipeline and variable sequence length handling provide a solid foundation for future streaming implementation.

Pose Detection Architecture

Crashout implements a dual approach for pose detection to maximize flexibility:

🔥 Internal Pose Detection (Rust/Burn)

Purpose: Core gait analysis pipeline for real-time inference
Implementation: YOLOv5-inspired architecture in pure Rust using Burn
Backend: Burn deep learning framework with WGPU acceleration
Use case: Production gait analysis inference with full control over the pipeline

🌐 External YOLOv11 (ONNX)

Purpose: High-quality pose data extraction for training dataset creation
Implementation: Latest Ultralytics YOLOv11 models via ONNX Runtime
Models: YOLOv11n, YOLOv11s, YOLOv11m, YOLOv11l, YOLOv11x pose models
Use case: Preprocessing videos to create training datasets (default behavior)

This design allows you to:

Extract training data using state-of-the-art YOLOv11 models (default)
Train your gait models on high-quality pose sequences
Deploy for inference using the fast, self-contained Rust implementation

Implementation Status

✅ Completed Components

Real-Time Video Processing: Live MP4 → labeled MP4 pipeline with pose visualization
640x640 Standardization: Automatic video resizing for consistent model input
YOLOv11 Pose Detection: External YOLO model integration via ONNX Runtime
Frame Labeling System: Keypoint and skeleton overlay on video frames
Streaming Pipeline: Decode → detect → label → encode without intermediate files
LSTM Temporal Processing: Bidirectional LSTM with sequence-to-sequence support
Transformer Attention Layers: Multi-head self-attention with positional encoding
End-to-End Gait Model: Complete pipeline from pose data to gait predictions
Video Training Pipeline: Direct video → pose → training without preprocessing
Multi-Command CLI: Extract, train, and inference commands with full parameter control
Model Downloading: Automatic YOLOv11 model download and caching system
Person Tracking: Spatial proximity-based tracking across video frames
Self-Contained Processing: Pure Rust implementation using FFmpeg-next

🔧 Architecture Features

Dual Pose Detection: Internal architecture (Rust/Burn) + External YOLOv11 (ONNX) for data extraction
Input Size: Correctly configured for 51 flattened pose features (17 keypoints × 3 values)
Bidirectional LSTM: Captures both past and future temporal context
Transformer Integration: LSTM outputs feed directly into transformer attention layers
Flexible Prediction Heads: Configurable for different gait analysis tasks
Device Support: Full WGPU backend support for GPU acceleration
Memory Efficient: Optimized for real-time inference with <50ms target latency
Self-Contained: No system dependencies required for video processing

📊 Model Configurations

Quality Scoring Model:

LSTM: 256 hidden units, 2 layers, bidirectional
Transformer: 512 d_model, 8 heads, 4 layers
Output: Single quality score (0.0-1.0)

Pathology Classification Model:

LSTM: 256 hidden units, 3 layers, bidirectional
Transformer: 512 d_model, 8 heads, 6 layers
Output: Multi-class pathology probabilities

Multi-Task Model:

LSTM: 320 hidden units, 3 layers, bidirectional
Transformer: 640 d_model, 8 heads, 6 layers
Output: Quality scores + classification + feature vectors

Dataset Format

Crashout uses a video-based training pipeline that processes videos directly with CSV label files. No preprocessing or JSON conversion is needed.

Directory Structure

Organize your training data like this:

training_data/
├── quality_scores.csv    # Optional: video_name,quality_score
├── pathology_labels.csv  # Optional: video_name,pathology_class
├── class_names.txt       # Optional: class names, one per line
└── videos/
    ├── subject1_walk1.mp4
    ├── subject2_walk1.mp4
    ├── subject3_walk2.mp4
    └── ...

Label Files

quality_scores.csv (for quality scoring models):

video_name,quality_score
subject1_walk1.mp4,0.85
subject2_walk1.mp4,0.92
subject3_walk2.mp4,0.73

pathology_labels.csv (for classification models):

video_name,pathology_class
subject1_walk1.mp4,0
subject2_walk1.mp4,2
subject3_walk2.mp4,1

class_names.txt (for human-readable class names):

normal
limp
parkinson
arthritis
post_surgery

Video Requirements

Format: MP4, AVI, MOV (any size)
Resolution: Automatically resized to 640x640 during processing
Content: Walking sequences with clearly visible people
Duration: Variable length (handled automatically with padding/truncation)
Quality: Higher quality videos improve pose detection accuracy

COCO-17 Keypoint Format

Crashout uses the standard COCO-17 keypoint format:

Index	Keypoint	Description
0	nose	Face center
1	left_eye	Left eye
2	right_eye	Right eye
3	left_ear	Left ear
4	right_ear	Right ear
5	left_shoulder	Left shoulder
6	right_shoulder	Right shoulder
7	left_elbow	Left elbow
8	right_elbow	Right elbow
9	left_wrist	Left wrist
10	right_wrist	Right wrist
11	left_hip	Left hip
12	right_hip	Right hip
13	left_knee	Left knee
14	right_knee	Right knee
15	left_ankle	Left ankle
16	right_ankle	Right ankle

Each keypoint is represented as [x, y, confidence] where:

x, y: Pixel coordinates in the original video frame
confidence: Detection confidence score (0.0-1.0)

Lower body keypoints (hips, knees, ankles) are particularly important for gait analysis.

Training Process

Crashout handles the complete training pipeline automatically:

Video Loading: Reads MP4/AVI/MOV files from the videos/ directory
Pose Extraction: Runs pose detection on each frame
Sequence Creation: Groups consecutive frames into walking sequences
Data Augmentation: Handles variable sequence lengths with padding/truncation
Model Training: Uses LSTM + Transformer architecture with multi-task loss

Automatic Handling

Variable lengths: Sequences are automatically padded to max_seq_len or truncated
Missing frames: Gaps in pose detection are handled gracefully
Quality filtering: Low-confidence poses are filtered automatically
Batch processing: Efficient batching for GPU training
Validation split: Automatic train/validation splitting

Usage Examples

Creating a Gait Quality Model

use crashout::model::gait_model::{GaitModelConfig, utils};
use burn::backend::wgpu::WgpuDevice;
use burn::backend::Wgpu;

let device = WgpuDevice::default();

// Create a model optimized for gait quality scoring
let model = utils::create_quality_model::<Wgpu>(&device)?;

// Or create with custom configuration
let config = GaitModelConfig::quality_scoring();
let model = config.init::<Wgpu>(&device)?;

Gait-Based Person Identification

Crashout can be configured for person identification using gait as a biometric. Each person's walking pattern is unique, making this suitable for security, healthcare monitoring, and behavioral analysis applications.

How It Works

Gait biometrics leverage unique characteristics in how people walk:

Temporal patterns: Walking rhythm, step frequency, cadence
Spatial patterns: Stride length, step width, body movement
Biomechanical signatures: Joint angles, limb coordination, balance
Individual variations: Height, leg length, muscle strength, injuries

Training Data Structure for Person ID

person_labels.csv:
video_name,person_id,session,environment
subject001_session1.mp4,person_001,indoor_treadmill,controlled
subject001_session2.mp4,person_001,outdoor_natural,uncontrolled
subject002_session1.mp4,person_002,indoor_treadmill,controlled
subject003_session1.mp4,person_003,outdoor_natural,uncontrolled
...

Each person should have multiple walking sequences recorded across different:

Sessions: Different days/times to capture consistency
Environments: Indoor/outdoor, treadmill/natural walking
Conditions: Normal speed, fast walking, different clothing

Model Configuration for Person ID

use crashout::model::gait_model::GaitModelConfig;

// Person identification model (100 people)
let config = GaitModelConfig {
    // LSTM Configuration
    lstm_hidden_size: 256,
    lstm_num_layers: 3,
    lstm_bidirectional: true,

    // Transformer Configuration
    transformer_d_model: 512,
    transformer_num_heads: 8,
    transformer_num_layers: 6,

    // Classification head for person IDs
    enable_quality_head: false,
    enable_classification_head: true,
    num_classes: 100, // Number of unique people

    // Feature extraction for similarity matching
    enable_feature_head: true,
    feature_dim: 256, // Gait embeddings

    final_dropout: 0.1,
};

let person_id_model = config.init::<Wgpu>(&device)?;

Training for Person Identification

use crashout::model::training::{VideoGaitDataset, TrainingConfig, GaitTrainer};

// Create dataset with person ID labels
let mut dataset_config = TrainingDatasetConfig::default();
dataset_config.data_root = PathBuf::from("./person_id_data");

let dataset = VideoGaitDataset::from_directory(dataset_config)?;

// Training focused on classification accuracy
let training_config = TrainingConfig {
    num_epochs: 150,
    learning_rate: 1e-4,
    loss_config: LossConfig {
        task_weights: TaskWeights {
            quality: 0.0,          // Disable quality loss
            classification: 1.0,   // Focus on person ID classification
            temporal_consistency: 0.1, // Smooth gait patterns
        },
        use_focal_loss: true,      // Handle person ID imbalance
        focal_alpha: 0.25,
        focal_gamma: 2.0,
    },
    ..Default::default()
};

let mut trainer = GaitTrainer::new(person_id_model, dataset, training_config, device);
let metrics = trainer.train()?;

Inference for Person Recognition

// Single person identification
let prediction = model.predict_single(&unknown_sequence, &device, 100);

if let Some(class_probs) = prediction.class_probabilities {
    let person_id = class_probs.argmax(1).into_scalar();
    let confidence = class_probs.max_dim(1).into_scalar();

    println!("Identified as person_{:03}: {:.2}% confidence",
             person_id, confidence * 100.0);
}

// Feature-based similarity matching
if let Some(features) = prediction.features {
    // Compare against known person embeddings
    let similarities = compute_cosine_similarity(features, known_embeddings);
    let most_similar = similarities.argmax(0).into_scalar();

    println!("Most similar to person_{:03}", most_similar);
}

Multi-Task: Health + Identity

Combine person identification with health monitoring:

// Multi-task model: identify person AND assess their gait health
let config = GaitModelConfig {
    enable_quality_head: true,     // Health assessment
    enable_classification_head: true, // Person ID
    enable_feature_head: true,     // Similarity matching
    num_classes: 50,               // 50 people in system
    // ... other config
};

let health_id_model = config.init::<Wgpu>(&device)?;

// Inference provides both identification and health status
let prediction = health_id_model.predict_single(&sequence, &device, 100);

if let (Some(person_probs), Some(quality)) =
    (prediction.class_probabilities, prediction.quality_score) {

    let person_id = person_probs.argmax(1).into_scalar();
    let health_score = quality.into_scalar();

    println!("Person {}: Health score {:.3}/1.0", person_id, health_score);

    // Track health changes over time
    if health_score < previous_scores[person_id] - 0.1 {
        println!("⚠️  Health decline detected for person {}", person_id);
    }
}

Real-World Applications

Security & Access Control:

Identify individuals at security checkpoints without face visibility
Long-range person recognition for perimeter security
Continuous authentication while walking through facilities

Healthcare Monitoring:

Track specific patients' gait changes over time
Early detection of mobility issues or neurological conditions
Personalized rehabilitation progress monitoring

Research Applications:

Longitudinal studies of gait changes with age
Biomechanical analysis for sports performance
Population health studies with privacy preservation

Privacy Considerations:

Gait data can identify individuals - ensure proper data protection
Consider anonymization techniques for research applications
Implement access controls for person identification databases

Performance Expectations

Training Requirements:

Minimum: 10-20 walking sequences per person across multiple sessions
Recommended: 50+ sequences per person in varied conditions
Training time: 2-4 hours for 100 people on modern GPU

Identification Accuracy:

Controlled environment: High accuracy for known individuals
Natural conditions: Degraded but high accuracy with environmental variation
Degradation factors: Clothing changes, injuries, extreme weather

Real-time Performance (to be tested):

Identification latency: <100ms for sequence classification
Memory usage: ~500MB for 100-person model
Throughput: 10+ simultaneous video streams on GPU

Direct Video Inference

use crashout::model::gait_model::GaitModel;
use crashout::video_processor::FrameIterator;

// Load trained model
let model: GaitModel<Wgpu> = GaitModel::load_from_file("./models/gait_model.burn", &device)?;

// Process video directly
let mut frame_iter = FrameIterator::new("./test_video.mp4")?;
let mut pose_sequence = Vec::new();

while let Some(frame) = frame_iter.decode_frame()? {
    // Extract pose and add to sequence
    // (pose extraction details handled internally during training)
}

// Make prediction on video
let prediction = model.forward(pose_tensor, Some(&sequence_lengths), false);

match prediction.quality_score {
    Some(score) => println!("Gait quality: {:.3}", score.into_scalar()),
    None => println!("Quality head not enabled"),
}

Multi-Task Learning with Video Data

// Train multi-task model on video dataset
let config = GaitModelConfig::multi_task(5); // 5 pathology classes

// Use the CLI for easy training
cargo run --bin crashout train \
  --data ./medical_videos \
  --model-type multi-task \
  --num-classes 5 \
  --epochs 100 \
  --output ./trained_models

Key Design Decisions

Why Video-Based Training?

No preprocessing: Direct video input eliminates intermediate steps
Real-time pipeline: Same pose detection used for training and inference
Simple setup: Just videos + CSV labels - no complex data preparation
Flexible labeling: Easy to add new label types with CSV files

Why 51 Features?

17 keypoints × 3 values each (x, y, confidence) = 51 features
Flattened format is optimal for LSTM input
Preserves all spatial and confidence information

Why Per-Video Training?

Natural units: Each video represents one complete walking sequence
Variable lengths: Videos have different durations, handled automatically
Efficient processing: Batch multiple videos for GPU training
Simple labeling: One label per video file

Tensor Shape Flow

Understanding the tensor transformations through the video training pipeline:

Video Frames:       [height, width, 3] → MP4/AVI/MOV files
↓
Pose Detection:     Extracts keypoints → [[x,y,c], [x,y,c], ...] × 17
↓
Flattened:          [x,y,c,x,y,c,x,y,c,...] → 51 features per frame
↓
Sequence:           [[51], [51], [51], ...] → [seq_len, 51] per video
↓
Batched:            Multiple videos → [batch, seq_len, 51]
↓
LSTM:               Temporal processing → [batch, seq_len, hidden_size]
↓
Transformer:        Attention mechanism → [batch, seq_len, d_model]
↓
Gait Analysis:      Final prediction → [batch, output_size]

Contributing

Ensure your changes maintain tensor shape compatibility
Add tests for new data structures
Update documentation for any format changes
Run cargo test before submitting PRs

Citation

[Add citation information if this becomes a research project]

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

Keypoint Detection

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support