Dynamically Sacrificing Accuracy for Reduced Computation: Cascaded Inference Based on Softmax Confidence Paper • 1805.10982 • Published May 28, 2018
Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding Paper • 2502.08363 • Published Feb 12, 2025
Identifying and Exploiting Sparse Branch Correlations for Optimizing Branch Prediction Paper • 2207.14033 • Published Jul 28, 2022
Physics-Informed Deep Neural Network Method for Limited Observability State Estimation Paper • 1910.06401 • Published Oct 14, 2019
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights Paper • 2509.22944 • Published Sep 26, 2025