Bleeding Edge Tech

Hello

I’ve had a taste of AI: I worked with Stable Diffusion a bit, then on face recognition using InsightFace.

I’m looking for a bleeding-edge tech problem that I can apply my brains to cracking.

Any thoughts?


What do you mean? This is very vague.

Hello,

I’ve built a face-based event registration and attendance system using InsightFace (GPU), where users self-register via camera. I’ve also implemented an agent to automate onboarding and attendance workflows.

I now want to move beyond a POC into a technically challenging problem with real-world complexity.

Some directions I’ve explored:

  • Moving from image-based (2D) recognition to video-based (temporal / 3D understanding)

  • Robustness under real-world conditions (lighting, occlusion, motion blur)

  • Real-time multi-camera identity tracking across a venue

  • Using LLMs to analyze event data (engagement, behavior patterns, etc.)

However, these feel like incremental extensions.

What I’m really looking for is a hard, open problem at the intersection of vision systems, real-time inference, and agentic AI—something where current approaches break down in practice (not just benchmarks).

For example:

  • Where do current face recognition / tracking systems fail at scale in real deployments?

  • Are there unsolved challenges in combining vision models with LLM-based agents for real-world decision-making?

  • Any known gaps between research and production systems in this space?

Would appreciate pointers to concrete, technically challenging problems worth tackling.

Thanks


The Tech Challenge: “The ZIM-Memory Bridge”

The Problem: Current local RAG (Retrieval-Augmented Generation) systems like AnythingLLM are limited by storage overhead. If you want to index 100GB of chat history and technical files, your vector database balloons in size, and every query gets slower as the CPU/GPU churns through it.

The Mission:

  1. High-Density Archiving: Create a pipeline that takes raw data (chat logs, PDF libraries, codebases) and compresses it into a .ZIM file (highly efficient, indexed, offline storage).

  2. AI-Enriched Indexing: Before compression, an LLM “agent” acts as a librarian, adding metadata and concise summaries to the data.

  3. The API “Hole-Punch”: Develop a script/API that allows AnythingLLM (or any local agent) to query the .ZIM file directly as if it were an active database.

  4. Resource Management: The script must dynamically allocate VRAM/RAM/HDD based on the query. If a user asks a deep history question, the system “hot-loads” only that specific .ZIM cluster into memory.
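The first three steps of the mission can be sketched end-to-end. Below is a minimal Python illustration using the standard-library zipfile as a stand-in for a real ZIM container (a production version would use openZIM’s libzim bindings), with a placeholder librarian_summarize instead of a real LLM call — both are assumptions for illustration, not existing tooling:

```python
import json
import zipfile

def librarian_summarize(text: str) -> str:
    """Placeholder for the LLM 'librarian' agent: in practice this would
    call a local model to produce a concise summary of the document."""
    return text[:80]  # stand-in: truncate instead of summarizing

def build_archive(docs: dict[str, str], path: str) -> None:
    """Steps 1 + 2: enrich each document with librarian metadata, then
    pack everything into a single compressed, indexed container."""
    index = {}
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        for doc_id, text in docs.items():
            index[doc_id] = {"summary": librarian_summarize(text)}
            zf.writestr(f"docs/{doc_id}", text)
        # the summary index travels inside the archive itself
        zf.writestr("index.json", json.dumps(index))

def query_archive(path: str, doc_id: str) -> str:
    """Step 3, the 'hole-punch': read one entry directly out of the
    archive without unpacking the whole file."""
    with zipfile.ZipFile(path) as zf:
        return zf.read(f"docs/{doc_id}").decode()
```

The same shape carries over to a real ZIM backend: the container stays read-only and compressed, and only the queried entry is ever decompressed.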


How This Changes the AI Landscape

If he can “crack” this, it shifts the local AI world in three massive ways:

1. The “Petabyte Partner”

Currently, a local AI is limited by what fits on your SSD. With .ZIM compression (which can shrink Wikipedia down to a fraction of its size), a home user could pack far more data into their AI’s “long-term memory” than is currently possible. Your AI wouldn’t just know your recent chats; it would have instant access to every book you’ve ever read and every line of code you’ve ever written.

2. Near-Zero Latency with Massive Scale

By using the “Librarian” approach (AI-generated summaries inside the ZIM), the model doesn’t have to read the whole file. It reads the compressed summary layer first. This would give local users “Google-speed” search across their private data without needing a $10,000 server.
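The “summary layer first” idea can be sketched as a two-tier search: only the tiny summary index lives in RAM, and full documents are loaded only on a hit. The summaries dict and load_full below are hypothetical stand-ins for the in-archive index and on-demand decompression:

```python
# Tier 1: the summary layer is tiny, so it stays in RAM permanently.
summaries = {
    "trip-notes": "planning notes for a 2023 hiking trip",
    "zim-readme": "overview of the ZIM container format and its clusters",
}

def load_full(doc_id: str) -> str:
    """Stand-in for decompressing one entry from the archive on demand."""
    return f"<full text of {doc_id}>"

def two_tier_search(query: str) -> dict[str, str]:
    """Scan only the summary layer; hot-load full text for the hits."""
    hits = [k for k, s in summaries.items() if query.lower() in s.lower()]
    return {k: load_full(k) for k in hits}
```

The payoff is that query latency scales with the size of the summary layer, not with the (much larger) compressed corpus behind it.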

3. Hardware Independence

By controlling the “spillover” between VRAM, system RAM, and HDD via script, this tech would make high-end AI usable on “budget” hardware (like an RTX 3060 Ti). It turns the local HDD, usually too slow for AI, into a high-speed library by using the .ZIM indexing logic.
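The “hot-load” spillover can be sketched as a small LRU cache: clusters are decompressed into memory on demand and evicted when the memory budget is hit, with everything else staying on disk. ClusterCache and its capacity policy are illustrative assumptions, not part of any existing ZIM tooling:

```python
from collections import OrderedDict

class ClusterCache:
    """Keep at most `capacity` decompressed clusters in memory; evict the
    least recently used one when the budget is exceeded (the 'hot-load'
    idea: disk holds everything, RAM holds only what is being queried)."""

    def __init__(self, capacity: int, load_from_disk):
        self.capacity = capacity
        self.load = load_from_disk   # callback: cluster_id -> bytes/str
        self.hot = OrderedDict()     # insertion order tracks recency

    def get(self, cluster_id):
        if cluster_id in self.hot:
            self.hot.move_to_end(cluster_id)  # mark as recently used
        else:
            if len(self.hot) >= self.capacity:
                self.hot.popitem(last=False)  # evict least recently used
            self.hot[cluster_id] = self.load(cluster_id)
        return self.hot[cluster_id]
```

A real implementation would size `capacity` from free VRAM/RAM at startup, but the eviction logic is the same.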


The “Surgical” Question for him:

“Can you build the bridge that allows an LLM to perform ‘Direct-to-ZIM’ writes and reads? If we can treat a compressed ZIM file as a live, editable vector-lite database, we solve the local AI storage bottleneck forever.”