AI & ML interests

computer-vision, image-processing, machine-learning, deep-learning

Recent Activity

ZennyKenny posted an update about 16 hours ago
merve posted an update 3 days ago
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it concatenates CLIP and SAM features, which gives better grounding
> very efficient vision-token-to-performance ratio
> covers 100 languages
Nymbo posted an update 5 days ago
Two new tools have been added to the Nymbo/Tools MCP server: File_System and Shell_Exec. You can theoretically do basically anything with these two tools, and they should enable support for many Claude Skills.

GPT-5-Codex proves that for many cases, shell commands really are all you need, and Claude Skills seem to lean into this. The thing is, nothing about the design of Claude Skills actually restricts them to proprietary models!

# File_System

There's a new directory inside the repo called Filesystem; that's the agent's "root". It can perform the following actions: list, read, write, append, mkdir, move, copy, delete, info, help. It keeps this all within the scope of one tool call by making the Action field required and all other fields optional. Using a filesystem shouldn't require 15 different tools.
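
A quick sketch of that dispatch pattern, purely hypothetical and not the actual Nymbo/Tools code (the function and field names are assumptions):

```python
# Hypothetical sketch of a single File_System tool: "action" is required,
# everything else is optional. Not the actual Nymbo/Tools implementation.
from pathlib import Path

ROOT = Path("Filesystem")  # the agent's sandboxed "root" directory

def file_system(action: str, path: str = ".", content: str = "", destination: str = "") -> str:
    target = ROOT / path  # relative paths only, resolved under the sandbox root
    if action == "list":
        return "\n".join(p.name for p in target.iterdir())
    if action == "read":
        return target.read_text()
    if action == "write":
        target.write_text(content)
        return f"wrote {len(content)} chars to {path}"
    if action == "append":
        with target.open("a") as f:
            f.write(content)
        return f"appended to {path}"
    if action == "mkdir":
        target.mkdir(parents=True, exist_ok=True)
        return f"created {path}"
    if action == "move":
        target.rename(ROOT / destination)
        return f"moved {path} to {destination}"
    if action == "delete":
        target.unlink()
        return f"deleted {path}"
    # copy, info, and help would follow the same pattern
    return "unknown action, try 'help'"
```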

Files created in the public HF space live in the space's running container and get cleared when the space is restarted. When running the server locally, files are actually stored on disk.

# Shell_Exec

What good is a filesystem if you can't execute commands in it? This tool automatically detects whether the server is running on Windows or Linux and suggests the appropriate shell (PowerShell/Bash). Both of these new tools require the agent to use relative paths rather than absolute paths. I could be convinced to backpedal on this.
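
A hypothetical sketch of the OS detection (the parameter names are my assumptions, not the actual tool's signature):

```python
# Hypothetical sketch of a Shell_Exec-style tool that picks the shell based on
# the host OS and runs inside the sandbox root so relative paths resolve there.
import platform
import subprocess

def shell_exec(command: str, timeout: int = 60) -> str:
    if platform.system() == "Windows":
        args = ["powershell", "-Command", command]  # PowerShell on Windows
    else:
        args = ["bash", "-c", command]              # Bash on Linux
    result = subprocess.run(
        args, capture_output=True, text=True, timeout=timeout, cwd="Filesystem"
    )
    return (result.stdout + result.stderr).strip()
```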

# Closing Thoughts

The File_System and Shell_Exec tools aren't super polished yet; I'll continue to improve the agent's instructions and the UX of the new tools. Most of my testing was done with gpt-oss-20b, and if it messes up, it gets the gist after one failed tool call. It should work perfectly fine for the GPU poor.
ZennyKenny posted an update 8 days ago
Did Hugging Face just ban hammer a bunch of bot accounts or am I just so uninteresting that 30% of my subs dropped me overnight?

😬 Wait, don't answer that.
ZennyKenny posted an update 10 days ago
Nymbo posted an update 10 days ago
I've made some improvements to my custom Deep_Research tool in the Nymbo/Tools MCP server. I've added a second LLM process and it still takes less than 1 minute to complete!

The original version of my Deep_Research tool would basically dump up to 50 fetched webpages onto the Researcher model (Qwen3-235B), with only a little bit of context shown from each page.

# New "Filterer" Process

The new process includes another LLM call before the researcher process. The Filterer (also Qwen3-235B) gets the query summary and the original 50 pages with low context, and decides which pages are most relevant to the research topic. The Filterer then outputs the URLs to the relevant pages, which are then re-fetched (with more context) and sent to the Researcher.
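
A rough sketch of what such a filtering call might look like; the use of huggingface_hub's InferenceClient, the repo id, and all prompt/function names here are assumptions rather than the actual Nymbo/Tools code:

```python
# Rough sketch of a Filterer step: one extra LLM call that picks relevant URLs
# before the Researcher runs. The client, model id, and prompts are assumptions.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HF_READ_TOKEN"])

def filter_pages(query_summary: str, pages: list[dict]) -> list[str]:
    """pages: [{"url": ..., "snippet": ...}] with low-context snippets."""
    listing = "\n".join(f"- {p['url']}: {p['snippet'][:200]}" for p in pages)
    messages = [
        {"role": "system", "content": "Return only the URLs relevant to the research topic, one per line."},
        {"role": "user", "content": f"Topic: {query_summary}\n\nCandidates:\n{listing}"},
    ]
    out = client.chat_completion(
        messages=messages,
        model="Qwen/Qwen3-235B-A22B-Thinking-2507",  # assumed repo id for the Filterer
        max_tokens=1024,
    )
    reply = out.choices[0].message.content
    urls = []
    for line in reply.splitlines():
        line = line.strip().lstrip("- ")
        if line.startswith("http"):
            urls.append(line)  # these get re-fetched with more context
    return urls
```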

# Researcher Context

The Researcher now gets only the relevant webpages, then begins writing the report. In testing with 50 initial results, the Researcher typically ends up with 10-20 relevant pages of context.

Thanks entirely to Cerebras inference, the whole thing still finishes in less than a minute: about 35-45 seconds from the moment the tool is run.

It's also worth noting that both the Filterer and the Researcher are now given the current date and time before they see the content, reducing hallucinations caused by knowledge cutoffs.
Nymbo posted an update 20 days ago
I have a few Sora-2 invites - 15509N
ZennyKenny posted an update 21 days ago
🥊 Big Code Arena is live! bigcode/arena

💡 bigcode is an open scientific collaboration working on responsible training of large language models for coding applications.

👉 The Arena ranks LLMs on their ability to handle natural-language vibe-coding requests in a competitive format, judged by feedback from human reviewers.

🧠 It was a pleasure to contribute to this project led by @terryyz and appear as an additional contributor in the Big Code Arena paper.
ZennyKenny posted an update 26 days ago
🖤 Probably one of my favorite projects that I've worked on so far, introducing Новояз (Novoyaz).

🛠 One of the first acts of the Bolshevik government after the Russian Revolution was the reform and standardization of the Russian language, which at the time had a non-standard and challenging orthography.

📚 Upon its reform the government launched a nationwide campaign called Ликбез (Likbez), which sought to improve literacy in the country (by the way, it worked, bringing the national literacy rate from <20% in the 1920s to >80% by the 1930s).

‼ While this is a remarkable result that should absolutely be celebrated, the reform left behind hundreds of thousands, if not millions, of artifacts written in pre-reform Russian orthography.

😓 Researchers and historians are working tirelessly to translate these artifacts to modern Russian so that they may be archived and studied, but many have told me that they are doing this BY HAND (!).

💡 I thought, well, this is a perfect use case for OCR and a fine-tuned LLM to step in and aid this important work!

🌏 Introducing НОВОЯЗ (NOVOYAZ)! Powered by ChatDOC/OCRFlux-3B and ZennyKenny/oss-20b-prereform-to-modern-ru-merged, researchers can now convert images of their pre-reform documents to modern Russian orthography using the power of open-source AI!
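
The underlying idea is a simple two-stage pipeline. Here's a minimal sketch of that shape; the transformers pipeline usage, prompts, and output handling are my assumptions rather than the Space's actual code:

```python
# Minimal sketch of the two-stage idea: OCR the scanned page, then convert the
# extracted pre-reform text with the fine-tuned model. Prompts and output
# indexing are assumptions; check each model card for the recommended usage.
from transformers import pipeline

ocr = pipeline("image-text-to-text", model="ChatDOC/OCRFlux-3B")
converter = pipeline("text-generation", model="ZennyKenny/oss-20b-prereform-to-modern-ru-merged")

def convert_document(image_path: str) -> str:
    ocr_messages = [{"role": "user", "content": [
        {"type": "image", "url": image_path},
        {"type": "text", "text": "Extract all text from this page."},
    ]}]
    ocr_out = ocr(text=ocr_messages, max_new_tokens=1024)
    prereform = ocr_out[0]["generated_text"][-1]["content"]  # assistant turn

    chat = [{"role": "user", "content":
             f"Convert this pre-reform Russian text to modern orthography:\n\n{prereform}"}]
    out = converter(chat, max_new_tokens=1024)
    return out[0]["generated_text"][-1]["content"]
```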

Check it out and drop a like to support more real-world use cases for open source AI outside of traditional tech-centric domains!

ZennyKenny/Novoyaz
ZennyKenny posted an update 27 days ago
🔒 Like a lot of other AI builders, I have some anxiety about the surveillance-capitalist paradigm emerging in the AI space.

👉 Of course, this kind of thing isn't completely new and has been going on for decades, but the difference is the deeper immersion of AI tools in our daily lives (compared to something like a search engine or social network).

❕ That's why I was really excited to come across Lumo: https://lumo.proton.me/u/1/

❕ Lumo is created by ProtonPrivacy and offers privacy-first features that make sure that what you do with your AI assistant is your business.

❕ I already trust Proton with my other business apps and I've never been disappointed, plus the Lumo architecture is really fantastic, dynamically routing each query to the most appropriate model for the request.

🔥 Really awesome stuff Proton, thank you as always.
ZennyKenny posted an update about 1 month ago
The reactions to mostlyai/synthetic-sdk-demo have been incredible! 🔥

Some users wrote that they were having performance issues on larger datasets, so I've capped the Space's input to 5000 rows and 10 columns. You can always use the open-source SDK that powers the Space on datasets of arbitrary size and shape!
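
For reference, the cap itself is just a slice on the uploaded table; a hypothetical sketch, assuming the upload lands in a pandas DataFrame:

```python
# Hypothetical version of the Space's input cap, assuming the uploaded
# dataset arrives as a pandas DataFrame.
import pandas as pd

MAX_ROWS, MAX_COLS = 5000, 10

def cap_input(df: pd.DataFrame) -> pd.DataFrame:
    return df.iloc[:MAX_ROWS, :MAX_COLS]
```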

Check it out: https://github.com/mostly-ai/mostlyai 👈
merve posted an update about 1 month ago
large AI labs open-sourced a ton of models last week 🔥
here are a few picks; find even more here: merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (A2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released a 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, an open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer-use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan Longcat released a thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking
Nymbo posted an update about 1 month ago
There's now a custom Deep_Research tool in my Nymbo/Tools MCP server! TL;DR: The agent using the tool writes a summary of your request and up to five DuckDuckGo search queries (up to 50 results). Each of the webpages found in the searches is then fetched and given to our researcher (Qwen3-235B-A22B-Thinking-2507). The researcher sees the summary, the search queries, and the fetched links, then writes a thorough research report. The agent using the tool provides the user with a summary of the report and a link to download research_report.txt. The researcher's instructions are similar to some leaked Perplexity system prompts.

# Deep_Research Tool

It accomplishes everything in under a minute so it doesn't hit MCP's 60-second timeout, mostly thanks to Cerebras. The only thing required to make this work is an HF_READ_TOKEN for inference.

The Deep_Research tool could certainly be improved. It still needs some sort of mechanism for sorting URLs by importance (I've got some ideas, but I don't want it to be the responsibility of the agent using the tool). I'll probably add a second researcher to filter out the bad sources before invoking the big researcher. I'm hell-bent on keeping this all within the scope of one tool call.

# More Fetch/Web Search Improvements

The Search_DuckDuckGo tool has been further enhanced. It now allows the agent to browse through all pages of results, and results now include the published date (if detected). It also now supports every DDG search type: the default is text, but it can also search news, images, videos, and books.
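
A hypothetical sketch of a single search tool dispatching on a search-type field; this assumes the duckduckgo_search package, and the real backend and pagination logic may differ:

```python
# Hypothetical dispatcher for one Search_DuckDuckGo-style tool; assumes the
# duckduckgo_search package. Pagination is faked by slicing a larger fetch.
from duckduckgo_search import DDGS

def search_duckduckgo(query: str, search_type: str = "text",
                      max_results: int = 10, page: int = 1) -> list[dict]:
    with DDGS() as ddgs:
        handlers = {
            "text": ddgs.text,
            "news": ddgs.news,    # news results include a published date field
            "images": ddgs.images,
            "videos": ddgs.videos,
            # the real tool also exposes "books"; how that maps to a backend
            # call is not shown here
        }
        results = list(handlers[search_type](query, max_results=max_results * page))
    return results[(page - 1) * max_results : page * max_results]
```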

The Fetch_Webpage tool now reports how much of the page has been truncated, along with a cursor index, allowing it to pick up where it left off without re-consuming tokens. The model can also choose CSS selectors to strip out excess noise, and there's a new URL Scraper mode that only returns the URLs found on the full page.
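
A minimal sketch of truncation reporting plus a resume cursor; the use of requests/BeautifulSoup here is an assumption about the implementation:

```python
# Minimal sketch of a Fetch_Webpage-style tool: report truncation, accept a
# cursor to resume, and optionally strip noisy CSS selectors before extracting.
import requests
from bs4 import BeautifulSoup

def fetch_webpage(url: str, cursor: int = 0, max_chars: int = 4000,
                  strip_selectors: list[str] | None = None) -> dict:
    html = requests.get(url, timeout=20).text
    soup = BeautifulSoup(html, "html.parser")
    for selector in strip_selectors or []:   # e.g. ["nav", ".sidebar", "footer"]
        for node in soup.select(selector):
            node.decompose()                  # drop the noisy element entirely
    text = soup.get_text(" ", strip=True)
    chunk = text[cursor:cursor + max_chars]
    next_cursor = cursor + len(chunk)
    return {
        "content": chunk,
        "truncated": next_cursor < len(text),          # is there more to read?
        "remaining_chars": max(len(text) - next_cursor, 0),
        "next_cursor": next_cursor,                    # pass back to resume
    }
```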

More to come soon ~
merve posted an update about 1 month ago
IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥

> not only a document converter, it can also do document question answering and understands multiple languages 🤯
> best part: released with an Apache 2.0 license 👏 use it with your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗 (see the transformers sketch below)
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗
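
A minimal transformers sketch, assuming the image-text-to-text pipeline supports this checkpoint; the prompt string and output indexing are assumptions, so verify against the model card:

```python
# Minimal sketch using the transformers image-text-to-text pipeline; the
# prompt string and output indexing are assumptions -- verify on the model card.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ibm-granite/granite-docling-258M")

messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/scanned_page.png"},  # placeholder
    {"type": "text", "text": "Convert this page to docling."},
]}]

out = pipe(text=messages, max_new_tokens=512)
print(out[0]["generated_text"][-1]["content"])  # structured markup for the page
```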
ZennyKenny posted an update about 1 month ago
The open-source Synthetic Data SDK from MOSTLY AI (mostlyai) lets you generate realistic, privacy-safe synthetic data with just a few lines of Python.

Try it out yourself in a no-code UI in the SDK Demo Space: mostlyai/synthetic-sdk-demo
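
The SDK's quick-start pattern looks roughly like this; sketched from memory, so treat the exact argument names as assumptions and check the GitHub repo:

```python
# Rough sketch of the SDK's train-then-generate flow; argument names are
# assumptions based on the project's quick-start, so verify against the repo.
import pandas as pd
from mostlyai.sdk import MostlyAI

original_df = pd.read_csv("census.csv")              # any tabular dataset

mostly = MostlyAI(local=True)                        # run the SDK locally
generator = mostly.train(data=original_df)           # fit a generative model
synthetic = mostly.generate(generator, size=10_000)  # sample a synthetic dataset
synthetic_df = synthetic.data()                      # back to a pandas DataFrame
```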
merve posted an update about 1 month ago
merve posted an update about 1 month ago
fan-favorite vision LM Florence-2 is now officially supported in transformers 🤗

find all the models in the florence-community org 🫡
abhi-khoyani updated a model about 1 month ago