Hello,
I have a taste for AI: I've worked a bit with Stable Diffusion, then on face recognition using InsightFace.
I'm looking for a bleeding-edge tech problem I can apply my brains to cracking.
Any thoughts?
What do you mean? This is very vague.
Hello,
I've built a face-based event registration and attendance system using InsightFace (GPU), where users self-register via camera. I've also implemented an agent to automate onboarding and attendance workflows.
I now want to move beyond a POC into a technically challenging problem with real-world complexity.
Some directions I've explored:
Moving from image-based (2D) recognition to video-based (temporal / 3D understanding)
Robustness under real-world conditions (lighting, occlusion, motion blur)
Real-time multi-camera identity tracking across a venue
Using LLMs to analyze event data (engagement, behavior patterns, etc.)
However, these feel like incremental extensions.
What I'm really looking for is a hard, open problem at the intersection of vision systems, real-time inference, and agentic AI: something where current approaches break down in practice, not just on benchmarks.
For example:
Where do current face recognition / tracking systems fail at scale in real deployments?
Are there unsolved challenges in combining vision models with LLM-based agents for real-world decision-making?
Any known gaps between research and production systems in this space?
Would appreciate pointers to concrete, technically challenging problems worth tackling.
Thanks
The Problem: Current local RAG (Retrieval-Augmented Generation) systems like AnythingLLM are limited by storage overhead. If you want to index 100GB of chat history and technical files, your vector database explodes in size, and searches slow down as the CPU/GPU has to scan it.
The Mission:
High-Density Archiving: Create a pipeline that takes raw data (chat logs, PDF libraries, codebases) and compresses it into a .ZIM file (highly efficient, indexed, offline storage).
AI-Enriched Indexing: Before compression, an LLM "agent" acts as a librarian, adding metadata and concise summaries to the data.
The API "Hole-Punch": Develop a script/API that allows AnythingLLM (or any local agent) to query the .ZIM file directly as if it were an active database.
Resource Management: The script must dynamically allocate VRAM/RAM/HDD based on the query. If a user asks a deep history question, the system "hot-loads" only that specific .ZIM cluster into memory.
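To make the pipeline concrete, here is a minimal stdlib-only sketch of the idea, not the real libzim API: documents are packed into independently compressed "clusters" plus a small uncompressed summary index, so a query decompresses only the one cluster it needs. All function names (`summarize`, `build_archive`, `query`) are illustrative, and the librarian step is a placeholder rather than a real LLM call.

```python
import json
import zlib

def summarize(text: str) -> str:
    # Placeholder for the LLM "librarian"; a real pipeline would call a model
    # to produce metadata and a concise summary.
    return text[:80]

def build_archive(docs, cluster_size=2):
    """Pack docs into compressed clusters and build a summary index."""
    index = {}      # doc_id -> (cluster_no, summary)
    clusters = []   # zlib-compressed JSON blobs, one per cluster
    items = list(docs.items())
    for cluster_no, start in enumerate(range(0, len(items), cluster_size)):
        chunk = dict(items[start:start + cluster_size])
        clusters.append(zlib.compress(json.dumps(chunk).encode()))
        for doc_id, text in chunk.items():
            index[doc_id] = (cluster_no, summarize(text))
    return index, clusters

def query(index, clusters, doc_id):
    """'Hole-punch' read: decompress only the cluster holding doc_id."""
    cluster_no, _summary = index[doc_id]
    chunk = json.loads(zlib.decompress(clusters[cluster_no]))
    return chunk[doc_id]

docs = {
    "chat1": "notes on InsightFace tuning",
    "pdf7": "overview of the ZIM archive format",
    "code3": "vector store loader script",
}
index, clusters = build_archive(docs)
print(query(index, clusters, "pdf7"))
```

A real implementation would swap zlib/JSON for the ZIM container and its cluster layout, but the access pattern (small always-resident index, lazily decompressed clusters) is the core of the proposal.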
If someone can "crack" this, it shifts the local AI world in three massive ways:
1. The "Petabyte Partner": Currently, a local AI is limited by what fits on your SSD. With .ZIM compression (which can shrink Wikipedia down to a fraction of its size), a home user could carry thousands of times more data in their AI's "long-term memory" than is currently possible. Your AI wouldn't just know your recent chats; it would have instant access to every book you've ever read and every line of code you've ever written.
2. Near-Zero Latency with Massive Scale: By using the "Librarian" approach (AI-generated summaries inside the ZIM), the model doesn't have to read the whole file. It reads the compressed summary layer first. This would give local users "Google-speed" search across their private data without needing a $10,000 server.
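The two-tier lookup can be sketched in a few lines (again a stand-in, not libzim): a cheap pass scans only the small summary layer, and only the single matching document is decompressed. The document names and keyword match are purely illustrative.

```python
import json
import zlib

# Small, always-in-memory summary layer.
summaries = {
    "doc1": "meeting notes about gpu budgets",
    "doc2": "wikipedia dump of physics articles",
}

# Full documents, stored compressed (stand-in for ZIM clusters).
full = {
    "doc1": zlib.compress(json.dumps("full meeting notes ...").encode()),
    "doc2": zlib.compress(json.dumps("full physics text ...").encode()),
}

def search(term):
    """Cheap pass over summaries first; decompress only the hit."""
    for doc_id, summary in summaries.items():
        if term in summary:
            return json.loads(zlib.decompress(full[doc_id]))
    return None

print(search("physics"))
```

A production version would replace the naive keyword scan with an embedding search over the summary layer, but the cost structure is the same: the expensive decompression only happens for the winning document.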
3. Hardware Independence: By controlling the "spillover" between VRAM, System RAM, and HDD via script, this tech would make high-end AI usable on "budget" hardware (like a 3060 Ti). It turns the local HDD (usually too slow for AI) into a high-speed library by using the .ZIM indexing logic.
"Can you build the bridge that allows an LLM to perform 'Direct-to-ZIM' writes and reads? If we can treat a compressed ZIM file as a live, editable vector-lite database, we solve the local AI storage bottleneck forever."