Mike Ravkine (mike-ravkine)
39 followers · 53 following
the-crypt-keeper
AI & ML interests
LLM Research / Development / Evaluation
Recent Activity

Posted an update (about 16 hours ago):
Spooky season is coming 👻 and there's nothing scarier than poor LLM evaluation results, right? The ReasonScape m12x dataset, explorer, and leaderboard have been updated with 12 additional models, bringing the total to *54* models and over *7.1B thinking tokens*, and the groups filter has been split in two, family and size, which lets us take a peek at our first small-models-only reasoning result!

The top performer, https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507, has an enormous overthinking problem, so I would actually give the crown to https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507: simply ask it to think step-by-step to bring out its latent hybrid-CoT behavior and enjoy stellar performance at a fraction of the tokens!

Right below the Qwen3-4B set, and sitting just above https://huggingface.co/Qwen/Qwen3-1.7B, is a model from a somewhat underrated family: https://huggingface.co/tencent/Hunyuan-4B-Instruct. Note that vLLM 0.11.0 has trouble with Hunyuan dense models (it seems to only support the MoE variants), so I used 0.10.2 for my evaluations.

In 7th place, and worth a mention, is https://huggingface.co/HuggingFaceTB/SmolLM3-3B, a very efficient smaller thinker, but its Achilles' heel was the strawberry test: it fails both letter counting and word sorting.

The last model worth discussing in this context is https://huggingface.co/google/gemma-3-4b-it, which is obviously not a reasoning model, but when asked to think step-by-step it demonstrated tolerable performance across several tasks with incredibly low token utilization compared to most of these little guys.

Would love to hear from the community! Do you use any of these models in your day-to-day? Did I miss any? Let me know!

Full leaderboard @ https://reasonscape.com/m12x/leaderboard/
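If you want to try the step-by-step trick yourself, here's a minimal sketch (not the ReasonScape harness; the endpoint URL, API key, and prompt wording are my own illustrative assumptions) that prompts Qwen3-4B-Instruct-2507 through an OpenAI-compatible server such as vLLM:

```python
# Sketch: elicit step-by-step reasoning from an instruct model served behind
# an OpenAI-compatible endpoint (e.g. `vllm serve Qwen/Qwen3-4B-Instruct-2507`).
# The localhost URL and system prompt are assumptions, not the eval's exact setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-4B-Instruct-2507",
    messages=[
        {"role": "system", "content": "Think step-by-step before answering."},
        {"role": "user", "content": "Sort these words alphabetically: pear, fig, apple."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```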
Reacted to DmitryRyumin's post with 🔥 (about 16 hours ago):
New Research Alert: ICCV 2025 (Oral)!
Title: Variance-based Pruning for Accelerating and Compressing Trained Networks
Description: The one-shot pruning method efficiently compresses networks, reducing computation and memory usage while retaining almost full performance and requiring minimal fine-tuning.
Authors: Uranik Berisha, Jens Mehnert, and Alexandru Paul Condurache
Conference: ICCV, 19–23 Oct 2025, Honolulu, Hawai'i, USA
Paper: https://huggingface.co/papers/2507.12988
ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers
Added to the Efficient Learning section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md
More Papers: more cutting-edge research presented at other conferences in https://huggingface.co/spaces/DmitryRyumin/NewEraAI-Papers, curated by @DmitryRyumin
Keywords: #VarianceBasedPruning #NetworkCompression #ModelAcceleration #EfficientDeepLearning #VisionTransformers #AI #ICCV2025 #ResearchHighlight
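To make the general idea concrete, here is a hedged sketch of variance-based channel selection, NOT the paper's actual algorithm: rank the output channels of a trained layer by the variance of their activations on calibration data and keep only the high-variance ones. The layer sizes, random calibration batch, and 50% keep ratio are all made-up assumptions.

```python
# Generic illustration of variance-based channel pruning (one-shot):
# low-variance channels carry little signal, so drop them.
import torch

layer = torch.nn.Linear(64, 32)   # stands in for a trained layer
calib = torch.randn(512, 64)      # stands in for real calibration data

with torch.no_grad():
    acts = layer(calib)                        # (512, 32) activations
    var = acts.var(dim=0)                      # per-channel activation variance
    keep = var.argsort(descending=True)[:16]   # keep the top 50% of channels

    pruned = torch.nn.Linear(64, 16)           # smaller, faster layer
    pruned.weight.copy_(layer.weight[keep])    # carry over surviving rows
    pruned.bias.copy_(layer.bias[keep])
```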
Posted an update (2 days ago):
Spatial reasoning is a domain where LLMs struggle surprisingly hard. A new paper, "Stuck in the Matrix: Probing Spatial Reasoning in Large Language Models", compares performance on a handful of spatial reasoning tasks and finds all SOTA LLMs breaking down and hallucinating their faces off when the grids get large. https://arxiv.org/html/2510.20198v1

The word search task is especially revealing: notice the bias towards detecting "horizontal" while struggling with "vertical". LLMs only understand simple, linear relationships; add a stride for 2D and it's basically over.
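You can reproduce a probe in this spirit in a few lines (this is my own sketch, not the paper's benchmark; the grid size, planted words, and prompt wording are arbitrary assumptions):

```python
# Plant one horizontal and one vertical word in a random uppercase grid,
# then ask a model to locate both; compare hit rates by orientation.
import random
import string

def make_grid(n=10, h_word="CAT", v_word="DOG", seed=0):
    rng = random.Random(seed)
    grid = [[rng.choice(string.ascii_uppercase) for _ in range(n)] for _ in range(n)]
    # horizontal word, left to right
    r, c = rng.randrange(n), rng.randrange(n - len(h_word) + 1)
    for i, ch in enumerate(h_word):
        grid[r][c + i] = ch
    # vertical word, top to bottom, in a column the horizontal word doesn't touch
    c2 = rng.choice([j for j in range(n) if j < c or j >= c + len(h_word)])
    r2 = rng.randrange(n - len(v_word) + 1)
    for i, ch in enumerate(v_word):
        grid[r2 + i][c2] = ch
    return "\n".join(" ".join(row) for row in grid)

print(f"Find the words CAT and DOG in this grid and give their coordinates:\n{make_grid()}")
```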
Organizations
None yet
mike-ravkine's datasets (3)
mike-ravkine/AlteredWorlds • Viewer • Updated Aug 31, 2024 • 447 • 21 • 3
mike-ravkine/rosettacode-parsed • Viewer • Updated Jun 20, 2023 • 4.26k • 56 • 10
mike-ravkine/can-ai-code_junior-dev_v1 • Viewer • Updated May 30, 2023 • 24 • 6