Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference
Junyan Li
senfu
Models (27)
senfu/Llama-3.1-8B-Instruct-CommVQ-2bit-codebook • Updated
senfu/Llama-3.1-8B-Instruct-CommVQ-1bit-codebook • Updated
senfu/Llama-3.1-8B-Instruct-CommVQ-1bit • 8B • Updated • 2
senfu/DeepSeek-R1-Distill-Qwen-32B-BG • Text Generation • 33B • Updated
senfu/Qwen3-8B-BG • Text Generation • 9B • Updated
senfu/DeepSeek-R1-Distill-Qwen-7B-BG • Text Generation • 8B • Updated • 4 • 1
senfu/Llama-3.1-8B-Instruct-CommVQ-2bit • 9B • Updated • 1
senfu/llava-v1.5-7b-flexattn • Text Generation • 8B • Updated • 1 • 1
senfu/test_7b • Updated
senfu/covlm-2.8b • Updated • 1