Crashout: Real-time Human Gait Analysis
Crashout is a hybrid deep learning system for real-time human gait analysis, built entirely in Rust using the Burn deep learning framework. The system combines computer vision (pose detection) with temporal sequence modeling (LSTM + Transformer) to analyze human walking patterns for medical, sports, and research applications.
Architecture Overview
Input Video Stream → Pose Detection → LSTM Temporal Processing → Transformer Attention → Gait Analysis
Data Flow:
- Video Input: RGB frames [batch, 3, 640, 640]
- Pose Detection: Extract 17 keypoints → [batch, seq_len, 17, 3] (x, y, confidence)
- Sequential Processing: LSTM processes flattened poses [batch, seq_len, 51]
- Attention Layer: Transformer attends to important temporal moments
- Gait Analysis: Classification, quality scoring, and feature extraction
Quick Start
Building the Library
# Build the library
cargo build --lib
# Check for compilation errors
cargo check --lib
# Run tests (including doctests)
cargo test
# Build with optimizations
cargo build --release --lib
# Run clippy for linting
cargo clippy
# Format code
cargo fmt
Training Pipeline
Train gait analysis models directly on video data with CSV labels:
# Build the training binary
cargo build --bin crashout
# Train a quality scoring model
cargo run --bin crashout train --data ./training_data --model-type quality --epochs 50
# Train a pathology classification model
cargo run --bin crashout train --data ./training_data --model-type classification --num-classes 5 --epochs 100
# Train a multi-task model with custom parameters
cargo run --bin crashout train \
--data ./training_data \
--model-type multi-task \
--num-classes 5 \
--epochs 200 \
--batch-size 8 \
--learning-rate 0.0005 \
--max-seq-len 150 \
--val-split 0.15 \
--output ./models \
--model-name gait_classifier
Video Resolution Standardization
All videos are automatically resized to 640x640 for consistent processing (a standalone sketch of the resize step follows the lists below):
- Input: Any size MP4 video (720p, 1080p, 4K, etc.)
- Processing: Frames resized to 640x640 during extraction
- Pose Detection: Operates on 640x640 frames (optimal input size)
- Crashout Model: Processes 640x640 frames (consistent with pose detection)
- Output: Labeled video at 640x640 resolution
This ensures:
- Consistent model performance across all input videos
- Optimal processing speed and memory usage
- No resolution mismatches between components
- Standardized training data format
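The resize itself happens inside the video pipeline (FFmpeg-next); the following standalone sketch, using the `image` crate, is illustration only and not Crashout's internal code:

```rust
use image::{imageops::FilterType, DynamicImage};

/// Illustrative only: squash any decoded frame to the 640x640 input size,
/// ignoring aspect ratio, as described above.
fn to_model_input(frame: &DynamicImage) -> DynamicImage {
    frame.resize_exact(640, 640, FilterType::Triangle)
}
```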
CLI Options
The training command supports extensive customization:
# Full training command with all options
cargo run --bin crashout train \
--data ./training_data \
--model-type multi-task \
--num-classes 5 \
--epochs 200 \
--batch-size 8 \
--learning-rate 0.0005 \
--max-seq-len 150 \
--val-split 0.15 \
--output ./models \
--model-name gait_classifier \
--skip-frames 2 \
--video-extensions mp4,avi,mov
Available Commands:
- train: Train gait analysis models on video data
- inference: Run inference on new videos (coming soon)
Real-Time Streaming Analysis (Experimental)
Crashout's architecture is designed to support real-time streaming gait analysis using buffered sequences, though this capability is currently untested. The system could theoretically process live video streams with sufficient buffering to capture complete gait cycles.
Streaming Potential
Supported Stream Types (theoretical):
- RTMP/RTMPS live streams
- WebRTC video streams
- Direct camera feeds (/dev/video0)
- Network video streams (HTTP/HTTPS)
Minimum Buffer Requirements:
- Technical minimum: ~30-60 frames (1-2 seconds at 30fps) for basic gait detection
- Recommended: ~90-150 frames (3-5 seconds at 30fps) for robust analysis
- Optimal: ~120-180 frames (4-6 seconds at 30fps) for highest accuracy
Expected Performance (untested):
- Latency: 3-6 seconds (buffer fill time + inference)
- Update frequency: Every 0.5-1 seconds using sliding windows
- Memory per stream: ~50-100MB buffer overhead
Implementation Approach
The streaming system would use a sliding window buffer (sketched after this list):
- Continuous buffering: Maintain rolling window of recent frames
- Gait cycle capture: Buffer length ensures complete step cycles are captured
- Overlapped inference: Run predictions on overlapping windows for smooth output
- Real-time pose extraction: Use pose detection on each incoming frame
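A minimal sketch of such a sliding-window buffer (hypothetical; nothing like this ships in the crate yet):

```rust
use std::collections::VecDeque;

/// Hypothetical rolling buffer of flattened poses for streaming inference.
struct PoseWindow {
    frames: VecDeque<[f32; 51]>, // one flattened pose per frame
    capacity: usize,             // e.g. 150 frames ≈ 5 s at 30 fps
}

impl PoseWindow {
    fn new(capacity: usize) -> Self {
        Self { frames: VecDeque::with_capacity(capacity), capacity }
    }

    /// Push the newest pose, dropping the oldest once the window is full.
    fn push(&mut self, pose: [f32; 51]) {
        if self.frames.len() == self.capacity {
            self.frames.pop_front();
        }
        self.frames.push_back(pose);
    }

    /// Returns a full window for inference; calling this every 0.5-1 s yields
    /// the overlapping predictions described above.
    fn window(&self) -> Option<Vec<[f32; 51]>> {
        (self.frames.len() == self.capacity).then(|| self.frames.iter().copied().collect())
    }
}
```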
Potential Applications
- Security monitoring: Real-time person identification at checkpoints
- Healthcare monitoring: Continuous gait quality assessment in facilities
- Sports analysis: Live biomechanical feedback during training
- Accessibility: Real-time mobility assistance and fall prevention
Current Status
⚠️ This functionality is theoretical and untested. The current implementation focuses on offline video analysis. Real-time streaming would require:
- Streaming video input integration
- Sliding window buffer implementation
- Real-time inference pipeline optimization
- Latency and throughput testing
The existing tensor pipeline and variable sequence length handling provide a solid foundation for future streaming implementation.
Pose Detection Architecture
Crashout implements a dual approach for pose detection to maximize flexibility:
Internal Pose Detection (Rust/Burn)
- Purpose: Core gait analysis pipeline for real-time inference
- Implementation: YOLOv5-inspired architecture in pure Rust using Burn
- Backend: Burn deep learning framework with WGPU acceleration
- Use case: Production gait analysis inference with full control over the pipeline
External YOLOv11 (ONNX)
- Purpose: High-quality pose data extraction for training dataset creation
- Implementation: Latest Ultralytics YOLOv11 models via ONNX Runtime
- Models: YOLOv11n, YOLOv11s, YOLOv11m, YOLOv11l, YOLOv11x pose models
- Use case: Preprocessing videos to create training datasets (default behavior)
This design allows you to:
- Extract training data using state-of-the-art YOLOv11 models (default)
- Train your gait models on high-quality pose sequences
- Deploy for inference using the fast, self-contained Rust implementation
Implementation Status
Completed Components
- Real-Time Video Processing: Live MP4 → labeled MP4 pipeline with pose visualization
- 640x640 Standardization: Automatic video resizing for consistent model input
- YOLOv11 Pose Detection: External YOLO model integration via ONNX Runtime
- Frame Labeling System: Keypoint and skeleton overlay on video frames
- Streaming Pipeline: Decode → detect → label → encode without intermediate files
- LSTM Temporal Processing: Bidirectional LSTM with sequence-to-sequence support
- Transformer Attention Layers: Multi-head self-attention with positional encoding
- End-to-End Gait Model: Complete pipeline from pose data to gait predictions
- Video Training Pipeline: Direct video → pose → training without preprocessing
- Multi-Command CLI: Extract, train, and inference commands with full parameter control
- Model Downloading: Automatic YOLOv11 model download and caching system
- Person Tracking: Spatial proximity-based tracking across video frames
- Self-Contained Processing: Pure Rust implementation using FFmpeg-next
Architecture Features
- Dual Pose Detection: Internal architecture (Rust/Burn) + External YOLOv11 (ONNX) for data extraction
- Input Size: Correctly configured for 51 flattened pose features (17 keypoints × 3 values)
- Bidirectional LSTM: Captures both past and future temporal context
- Transformer Integration: LSTM outputs feed directly into transformer attention layers
- Flexible Prediction Heads: Configurable for different gait analysis tasks
- Device Support: Full WGPU backend support for GPU acceleration
- Memory Efficient: Optimized for real-time inference with <50ms target latency
- Self-Contained: No system dependencies required for video processing
Model Configurations
Quality Scoring Model:
- LSTM: 256 hidden units, 2 layers, bidirectional
- Transformer: 512 d_model, 8 heads, 4 layers
- Output: Single quality score (0.0-1.0)
Pathology Classification Model:
- LSTM: 256 hidden units, 3 layers, bidirectional
- Transformer: 512 d_model, 8 heads, 6 layers
- Output: Multi-class pathology probabilities
Multi-Task Model:
- LSTM: 320 hidden units, 3 layers, bidirectional
- Transformer: 640 d_model, 8 heads, 6 layers
- Output: Quality scores + classification + feature vectors
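For orientation, the quality-scoring preset above corresponds roughly to a configuration like the following (field names are taken from the GaitModelConfig example later in this README; the authoritative defaults live in the source and may differ):

```rust
use crashout::model::gait_model::GaitModelConfig;

// Approximate quality-scoring configuration; consult GaitModelConfig::quality_scoring()
// for the authoritative values.
let quality_config = GaitModelConfig {
    lstm_hidden_size: 256,
    lstm_num_layers: 2,
    lstm_bidirectional: true,
    transformer_d_model: 512,
    transformer_num_heads: 8,
    transformer_num_layers: 4,
    enable_quality_head: true,         // single 0.0-1.0 quality score
    enable_classification_head: false,
    num_classes: 0,                    // ignored while the head is disabled (assumption)
    enable_feature_head: false,
    feature_dim: 0,                    // ignored while the head is disabled (assumption)
    final_dropout: 0.1,
};
```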
Dataset Format
Crashout uses a video-based training pipeline that processes videos directly with CSV label files. No preprocessing or JSON conversion is needed.
Directory Structure
Organize your training data like this:
training_data/
├── quality_scores.csv      # Optional: video_name,quality_score
├── pathology_labels.csv    # Optional: video_name,pathology_class
├── class_names.txt         # Optional: class names, one per line
└── videos/
    ├── subject1_walk1.mp4
    ├── subject2_walk1.mp4
    ├── subject3_walk2.mp4
    └── ...
Label Files
quality_scores.csv (for quality scoring models):
video_name,quality_score
subject1_walk1.mp4,0.85
subject2_walk1.mp4,0.92
subject3_walk2.mp4,0.73
pathology_labels.csv (for classification models):
video_name,pathology_class
subject1_walk1.mp4,0
subject2_walk1.mp4,2
subject3_walk2.mp4,1
class_names.txt (for human-readable class names):
normal
limp
parkinson
arthritis
post_surgery
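The label files are plain CSV with a header row. The trainer parses them for you; the following hypothetical helper, built on the csv crate, only illustrates the expected format:

```rust
use std::collections::HashMap;
use std::error::Error;

/// Read quality_scores.csv into a video_name -> quality_score map.
fn load_quality_scores(path: &str) -> Result<HashMap<String, f32>, Box<dyn Error>> {
    let mut reader = csv::Reader::from_path(path)?; // header row is skipped by default
    let mut scores = HashMap::new();
    for record in reader.records() {
        let record = record?;
        // Columns: video_name,quality_score
        scores.insert(record[0].to_string(), record[1].parse::<f32>()?);
    }
    Ok(scores)
}
```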
Video Requirements
- Format: MP4, AVI, MOV (any size)
- Resolution: Automatically resized to 640x640 during processing
- Content: Walking sequences with clearly visible people
- Duration: Variable length (handled automatically with padding/truncation)
- Quality: Higher quality videos improve pose detection accuracy
COCO-17 Keypoint Format
Crashout uses the standard COCO-17 keypoint format:
| Index | Keypoint | Description |
|---|---|---|
| 0 | nose | Face center |
| 1 | left_eye | Left eye |
| 2 | right_eye | Right eye |
| 3 | left_ear | Left ear |
| 4 | right_ear | Right ear |
| 5 | left_shoulder | Left shoulder |
| 6 | right_shoulder | Right shoulder |
| 7 | left_elbow | Left elbow |
| 8 | right_elbow | Right elbow |
| 9 | left_wrist | Left wrist |
| 10 | right_wrist | Right wrist |
| 11 | left_hip | Left hip |
| 12 | right_hip | Right hip |
| 13 | left_knee | Left knee |
| 14 | right_knee | Right knee |
| 15 | left_ankle | Left ankle |
| 16 | right_ankle | Right ankle |
Each keypoint is represented as [x, y, confidence] where:
- x, y: Pixel coordinates in the original video frame
- confidence: Detection confidence score (0.0-1.0)
Lower body keypoints (hips, knees, ankles) are particularly important for gait analysis.
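Crashout flattens keypoints internally; as a standalone sketch, the 17 × 3 → 51 conversion amounts to:

```rust
/// One COCO-17 detection for a single frame: 17 keypoints of (x, y, confidence).
type Pose = [[f32; 3]; 17];

/// Flatten a pose into the 51-value feature vector fed to the LSTM
/// (keypoint order follows the table above: nose first, right_ankle last).
fn flatten_pose(pose: &Pose) -> [f32; 51] {
    let mut features = [0.0f32; 51];
    for (i, kp) in pose.iter().enumerate() {
        features[i * 3] = kp[0];     // x (pixels)
        features[i * 3 + 1] = kp[1]; // y (pixels)
        features[i * 3 + 2] = kp[2]; // confidence (0.0-1.0)
    }
    features
}
```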
Training Process
Crashout handles the complete training pipeline automatically:
- Video Loading: Reads MP4/AVI/MOV files from the videos/ directory
- Pose Extraction: Runs pose detection on each frame
- Sequence Creation: Groups consecutive frames into walking sequences
- Data Augmentation: Handles variable sequence lengths with padding/truncation
- Model Training: Uses LSTM + Transformer architecture with multi-task loss
Automatic Handling
- Variable lengths: Sequences are automatically padded to max_seq_len or truncated (see the sketch after this list)
- Missing frames: Gaps in pose detection are handled gracefully
- Quality filtering: Low-confidence poses are filtered automatically
- Batch processing: Efficient batching for GPU training
- Validation split: Automatic train/validation splitting
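A sketch of the padding/truncation step referenced above (the trainer does this internally; zero-padding is an assumption here):

```rust
/// Pad or truncate one video's pose sequence to max_seq_len.
/// Returns the fitted sequence plus its valid length (usable as a sequence length at inference).
fn fit_sequence(mut seq: Vec<[f32; 51]>, max_seq_len: usize) -> (Vec<[f32; 51]>, usize) {
    let valid_len = seq.len().min(max_seq_len);
    seq.truncate(max_seq_len);           // long videos: drop trailing frames
    seq.resize(max_seq_len, [0.0; 51]);  // short videos: zero-pad (assumed padding value)
    (seq, valid_len)
}
```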
Usage Examples
Creating a Gait Quality Model
use crashout::model::gait_model::{GaitModelConfig, utils};
use burn::backend::wgpu::WgpuDevice;
use burn::backend::Wgpu;
let device = WgpuDevice::default();
// Create a model optimized for gait quality scoring
let model = utils::create_quality_model::<Wgpu>(&device)?;
// Or create with custom configuration
let config = GaitModelConfig::quality_scoring();
let model = config.init::<Wgpu>(&device)?;
Gait-Based Person Identification
Crashout can be configured for person identification using gait as a biometric. Each person's walking pattern is unique, making this suitable for security, healthcare monitoring, and behavioral analysis applications.
How It Works
Gait biometrics leverage unique characteristics in how people walk:
- Temporal patterns: Walking rhythm, step frequency, cadence
- Spatial patterns: Stride length, step width, body movement
- Biomechanical signatures: Joint angles, limb coordination, balance
- Individual variations: Height, leg length, muscle strength, injuries
Training Data Structure for Person ID
person_labels.csv:
video_name,person_id,session,environment
subject001_session1.mp4,person_001,indoor_treadmill,controlled
subject001_session2.mp4,person_001,outdoor_natural,uncontrolled
subject002_session1.mp4,person_002,indoor_treadmill,controlled
subject003_session1.mp4,person_003,outdoor_natural,uncontrolled
...
Each person should have multiple walking sequences recorded across different:
- Sessions: Different days/times to capture consistency
- Environments: Indoor/outdoor, treadmill/natural walking
- Conditions: Normal speed, fast walking, different clothing
Model Configuration for Person ID
use crashout::model::gait_model::GaitModelConfig;
// Person identification model (100 people)
let config = GaitModelConfig {
// LSTM Configuration
lstm_hidden_size: 256,
lstm_num_layers: 3,
lstm_bidirectional: true,
// Transformer Configuration
transformer_d_model: 512,
transformer_num_heads: 8,
transformer_num_layers: 6,
// Classification head for person IDs
enable_quality_head: false,
enable_classification_head: true,
num_classes: 100, // Number of unique people
// Feature extraction for similarity matching
enable_feature_head: true,
feature_dim: 256, // Gait embeddings
final_dropout: 0.1,
};
let person_id_model = config.init::<Wgpu>(&device)?;
Training for Person Identification
use std::path::PathBuf;
// LossConfig, TaskWeights, and TrainingDatasetConfig are assumed to live alongside the
// other training types; adjust the path to match the crate layout.
use crashout::model::training::{VideoGaitDataset, TrainingDatasetConfig, TrainingConfig, GaitTrainer, LossConfig, TaskWeights};
// Create dataset with person ID labels
let mut dataset_config = TrainingDatasetConfig::default();
dataset_config.data_root = PathBuf::from("./person_id_data");
let dataset = VideoGaitDataset::from_directory(dataset_config)?;
// Training focused on classification accuracy
let training_config = TrainingConfig {
num_epochs: 150,
learning_rate: 1e-4,
loss_config: LossConfig {
task_weights: TaskWeights {
quality: 0.0, // Disable quality loss
classification: 1.0, // Focus on person ID classification
temporal_consistency: 0.1, // Smooth gait patterns
},
use_focal_loss: true, // Handle person ID imbalance
focal_alpha: 0.25,
focal_gamma: 2.0,
},
..Default::default()
};
let mut trainer = GaitTrainer::new(person_id_model, dataset, training_config, device);
let metrics = trainer.train()?;
Inference for Person Recognition
// Single person identification
let prediction = model.predict_single(&unknown_sequence, &device, 100);
if let Some(class_probs) = prediction.class_probabilities {
let person_id = class_probs.clone().argmax(1).into_scalar(); // clone: argmax consumes the tensor
let confidence = class_probs.max_dim(1).into_scalar();
println!("Identified as person_{:03}: {:.2}% confidence",
person_id, confidence * 100.0);
}
// Feature-based similarity matching
if let Some(features) = prediction.features {
// Compare against known person embeddings; compute_cosine_similarity is a
// user-provided helper (a minimal sketch follows this example)
let similarities = compute_cosine_similarity(features, known_embeddings);
let most_similar = similarities.argmax(0).into_scalar();
println!("Most similar to person_{:03}", most_similar);
}
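A minimal sketch of the compute_cosine_similarity helper referenced above, written over plain f32 slices (a real implementation would likely operate on Burn tensors and a stored embedding gallery):

```rust
/// Cosine similarity between two gait embeddings.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

/// Index and score of the most similar known embedding, if any.
fn best_match(query: &[f32], gallery: &[Vec<f32>]) -> Option<(usize, f32)> {
    gallery
        .iter()
        .enumerate()
        .map(|(i, g)| (i, cosine_similarity(query, g)))
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal))
}
```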
Multi-Task: Health + Identity
Combine person identification with health monitoring:
// Multi-task model: identify person AND assess their gait health
let config = GaitModelConfig {
enable_quality_head: true, // Health assessment
enable_classification_head: true, // Person ID
enable_feature_head: true, // Similarity matching
num_classes: 50, // 50 people in system
// ... other config
};
let health_id_model = config.init::<Wgpu>(&device)?;
// Inference provides both identification and health status
let prediction = health_id_model.predict_single(&sequence, &device, 100);
if let (Some(person_probs), Some(quality)) =
(prediction.class_probabilities, prediction.quality_score) {
let person_id = person_probs.argmax(1).into_scalar();
let health_score = quality.into_scalar();
println!("Person {}: Health score {:.3}/1.0", person_id, health_score);
// Track health changes over time (previous_scores: caller-maintained per-person
// history of the last quality score)
if health_score < previous_scores[person_id as usize] - 0.1 {
println!("⚠️ Health decline detected for person {}", person_id);
}
}
Real-World Applications
Security & Access Control:
- Identify individuals at security checkpoints without face visibility
- Long-range person recognition for perimeter security
- Continuous authentication while walking through facilities
Healthcare Monitoring:
- Track specific patients' gait changes over time
- Early detection of mobility issues or neurological conditions
- Personalized rehabilitation progress monitoring
Research Applications:
- Longitudinal studies of gait changes with age
- Biomechanical analysis for sports performance
- Population health studies with privacy preservation
Privacy Considerations:
- Gait data can identify individuals - ensure proper data protection
- Consider anonymization techniques for research applications
- Implement access controls for person identification databases
Performance Expectations
Training Requirements:
- Minimum: 10-20 walking sequences per person across multiple sessions
- Recommended: 50+ sequences per person in varied conditions
- Training time: 2-4 hours for 100 people on modern GPU
Identification Accuracy:
- Controlled environment: High accuracy for known individuals
- Natural conditions: Accuracy degrades with environmental variation but remains high
- Degradation factors: Clothing changes, injuries, extreme weather
Real-time Performance (to be tested):
- Identification latency: <100ms for sequence classification
- Memory usage: ~500MB for 100-person model
- Throughput: 10+ simultaneous video streams on GPU
Direct Video Inference
use crashout::model::gait_model::GaitModel;
use crashout::video_processor::FrameIterator;
// Load trained model
let model: GaitModel<Wgpu> = GaitModel::load_from_file("./models/gait_model.burn", &device)?;
// Process video directly
let mut frame_iter = FrameIterator::new("./test_video.mp4")?;
let mut pose_sequence = Vec::new();
while let Some(frame) = frame_iter.decode_frame()? {
// Extract a pose for this frame and push it onto pose_sequence
// (pose extraction details are handled internally during training);
// after the loop, build pose_tensor [1, seq_len, 51] and sequence_lengths from pose_sequence
}
// Make prediction on video
let prediction = model.forward(pose_tensor, Some(&sequence_lengths), false);
match prediction.quality_score {
Some(score) => println!("Gait quality: {:.3}", score.into_scalar()),
None => println!("Quality head not enabled"),
}
Multi-Task Learning with Video Data
// Train multi-task model on video dataset
let config = GaitModelConfig::multi_task(5); // 5 pathology classes
// Use the CLI for easy training
cargo run --bin crashout train \
--data ./medical_videos \
--model-type multi-task \
--num-classes 5 \
--epochs 100 \
--output ./trained_models
Key Design Decisions
Why Video-Based Training?
- No preprocessing: Direct video input eliminates intermediate steps
- Real-time pipeline: Same pose detection used for training and inference
- Simple setup: Just videos + CSV labels - no complex data preparation
- Flexible labeling: Easy to add new label types with CSV files
Why 51 Features?
- 17 keypoints × 3 values each (x, y, confidence) = 51 features
- Flattened format is optimal for LSTM input
- Preserves all spatial and confidence information
Why Per-Video Training?
- Natural units: Each video represents one complete walking sequence
- Variable lengths: Videos have different durations, handled automatically
- Efficient processing: Batch multiple videos for GPU training
- Simple labeling: One label per video file
Tensor Shape Flow
Understanding the tensor transformations through the video training pipeline:
Video Frames: [height, width, 3] ← MP4/AVI/MOV files
↓
Pose Detection: Extracts keypoints → [[x,y,c], [x,y,c], ...] × 17
↓
Flattened: [x,y,c,x,y,c,x,y,c,...] → 51 features per frame
↓
Sequence: [[51], [51], [51], ...] → [seq_len, 51] per video
↓
Batched: Multiple videos → [batch, seq_len, 51]
↓
LSTM: Temporal processing → [batch, seq_len, hidden_size]
↓
Transformer: Attention mechanism → [batch, seq_len, d_model]
↓
Gait Analysis: Final prediction → [batch, output_size]
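For orientation, assembling the batched [batch, seq_len, 51] tensor from already-padded sequences might look like this on a recent Burn release (TensorData::new / Tensor::from_data; the exact API varies between Burn versions, so treat this as a sketch):

```rust
use burn::backend::wgpu::WgpuDevice;
use burn::backend::Wgpu;
use burn::tensor::{Tensor, TensorData};

/// Hypothetical helper: stack padded pose sequences into [batch, seq_len, 51].
fn batch_sequences(sequences: Vec<Vec<[f32; 51]>>, device: &WgpuDevice) -> Tensor<Wgpu, 3> {
    let batch = sequences.len();
    let seq_len = sequences[0].len(); // assumes all sequences were padded to max_seq_len
    let flat: Vec<f32> = sequences.into_iter().flatten().flatten().collect();
    Tensor::from_data(TensorData::new(flat, [batch, seq_len, 51]), device)
}
```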
Contributing
- Ensure your changes maintain tensor shape compatibility
- Add tests for new data structures
- Update documentation for any format changes
- Run cargo test before submitting PRs
Citation
[Add citation information if this becomes a research project]