Dingo Tested It for You: Nano Banana Delivers Strong Results

Community Article · Published October 3, 2025

Running a "Blind Aesthetic Evaluation" with Dingo × ArtiMuse

Everyone’s been experimenting with nano banana’s image generation capabilities lately, and we gave it a try too. But we didn’t just ask “Can it generate images?”—we wanted to know “Are these images actually good? And why?”

So we treated nano banana as a sample factory: first, we generated a batch of images in diverse styles, then fed them into ArtiMuse for an overall score plus an 8-dimensional critique. Dingo orchestrated the entire evaluation pipeline—applying consistent thresholds, making pass/fail decisions, and producing a clear, visual report.

Tools used:

Dingo: https://github.com/MigoXLab/dingo

ArtiMuse: https://github.com/thunderbolt215/ArtiMuse

Full documentation: https://github.com/MigoXLab/dingo/blob/main/docs/artimuse.md


Why Dingo × ArtiMuse?

ArtiMuse is essentially an “aesthetic engine that sees and explains.” It doesn’t just spit out a score—it breaks down each image across eight dimensions: composition, visual elements, technical execution, originality, thematic expression, emotional impact, gestalt coherence, and overall assessment. After seeing the score, users know exactly where to improve: what to crop, what elements to simplify, or which details to enhance.

To make this “explanatory” capability reliable, the ArtiMuse team built ArtiMuse-10K, a dataset of 10,000 images spanning photography, illustration, design, and AIGC. Each image was annotated by experienced reviewers with an overall score and eight-dimensional labels across a wide range of styles and subjects—ensuring robust generalization. Instead of using traditional bucketing-and-mapping approaches, ArtiMuse treats scores as continuous values (akin to a Token-as-Score paradigm), resulting in stable, reproducible evaluations that can distinguish subtle but critical differences. In benchmark tests, ArtiMuse correlates with human judgments on standard aesthetic datasets more closely than prior scoring approaches—and crucially, it explains why a score was given, enabling efficient image selection, iteration, and quality control.
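The continuous-score idea is easy to picture. In scoring schemes of this family, the model's probability mass over discrete score tokens is typically collapsed into a probability-weighted expectation rather than a single hard bucket. The toy sketch below illustrates that general reduction; it is our own assumption about how such continuous-scoring heads usually work, not code from the ArtiMuse repository.

    # Illustrative only: a generic "expected value over score tokens" reduction.
    # This is an assumed stand-in for the general technique, NOT ArtiMuse's code.

    def continuous_score(token_probs: dict[float, float]) -> float:
        """Collapse a distribution over discrete score tokens into one continuous score.

        token_probs maps candidate scores (e.g. 0..10) to the model's probability
        of emitting that score token; the result is the probability-weighted mean.
        """
        total = sum(token_probs.values())
        return sum(score * p for score, p in token_probs.items()) / total


    # Example: mass split between neighbouring scores yields 7.3, a distinction
    # a hard 7-vs-8 bucket would lose.
    print(continuous_score({7.0: 0.7, 8.0: 0.3}))  # -> 7.3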

Try it online: https://artimuse.intern-ai.org.cn/

ArtiMuse Interface

In engineering practice, Dingo turns this “aesthetic intelligence” into a smooth pipeline: images are submitted, and Dingo’s built-in RuleImageArtimuse creates an HTTP task, polls for completion at a steady pace, retrieves the score_overall, and applies a local threshold (default: 6.0) to classify each image as Good or Bad. The full granular data from ArtiMuse is preserved in the reason field, enabling transparent reporting and traceability. When needed, Dingo can also run additional rules in parallel—for example:

  • Invalid aspect ratio (e.g., >4 or <0.25) → auto-reject
  • Sharpness check via NIMA (common threshold: 5.5)
  • Duplicate detection using PHash + CNN
  • Text-image alignment scored by CLIP (or sentence-level embeddings for higher stability)

The result? Not just a number—but a structured evidence chain and actionable improvement directions. With standardized criteria and automated workflows, team alignment becomes effortless.
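To make that flow concrete, here is a minimal sketch of the logic described above: submit an image as an HTTP task, poll at a steady pace until the score is ready, apply the 6.0 threshold, and run one of the cheap parallel checks. The endpoint paths, payload fields other than score_overall, and helper names are hypothetical placeholders rather than Dingo's or ArtiMuse's actual API; in a real setup the built-in RuleImageArtimuse does this work for you.

    # Sketch only: endpoint paths, payload shapes, and helper names are assumed;
    # the score_overall field, 6.0 threshold, and Good/Bad labels come from the article.
    import time
    import requests
    from PIL import Image

    ARTIMUSE_URL = "https://example.com/artimuse"  # placeholder, not the real service
    THRESHOLD = 6.0       # Dingo's default local threshold
    POLL_INTERVAL_S = 5   # conservative polling pace
    TIMEOUT_S = 120       # per-image ceiling (assumed)

    def aesthetic_verdict(image_path: str) -> dict:
        # 1) Create an HTTP scoring task for the image.
        with open(image_path, "rb") as f:
            task = requests.post(f"{ARTIMUSE_URL}/tasks", files={"image": f}).json()

        # 2) Poll at a steady pace until the task finishes or the deadline passes.
        deadline = time.time() + TIMEOUT_S
        while time.time() < deadline:
            result = requests.get(f"{ARTIMUSE_URL}/tasks/{task['id']}").json()
            if result.get("status") == "done":
                break
            time.sleep(POLL_INTERVAL_S)
        else:
            return {"label": "Bad", "reason": "ArtiMuse task timed out"}

        # 3) Apply the local threshold; keep the full response for traceability.
        score = result["score_overall"]
        return {"label": "Good" if score >= THRESHOLD else "Bad",
                "score": score, "reason": result}

    def aspect_ratio_ok(image_path: str) -> bool:
        # Cheap parallel rule from the list above: reject extreme aspect ratios.
        width, height = Image.open(image_path).size
        return 0.25 <= width / height <= 4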

Dingo Evaluation Report

Enough talk—let’s look at the test results.
We evaluated 20 samples using the default threshold of 6.0:

  • 15 passed (75%)
  • 5 were rejected
  • End-to-end runtime: ~2 minutes 6 seconds (~6.3 seconds per image, due to conservative polling)

High-scoring images consistently showed: clear subject focus, natural lighting, preserved details, and strong narrative clarity.
Low-scoring ones typically suffered from: overly dominant logos/stickers, excessive stylization causing detail loss, or flat composition lacking visual tension.

Let’s examine a few representative cases to see how each image was evaluated.


Case 1: A Neon-Lit City at Night

Neon City

Score: 7.73
ArtiMuse’s feedback:

The image has exceptional depth and visual flow—streets and buildings guide the eye firmly toward the center. Neon signs and reflections create rich atmosphere. Technically, exposure and sharpness are excellent, with vivid details in wet pavement reflections. A minor flaw: highlights are slightly blown out; toning down the brightest areas would further elevate completeness.


Case 2: An Elderly Craftsperson at Work

Craftsman

Score: 7.42
ArtiMuse’s feedback:

Warm lighting beautifully illuminates the artisan and his work. The composition is stable and storytelling is strong; background pottery is softly blurred but not messy. Technically, focus is precise, details are rich, and color tones are harmonious. The only slight limitation is in originality—the theme is traditional, offering less surprise or novelty.


Case 3: A Cute Sticker-Style Bear

Cute Bear

Score: 4.82
ArtiMuse’s feedback:

Despite clean lines and clear details, the image lacks visual depth and narrative weight. The composition is flat, layers are thin, and decorative appeal outweighs storytelling. It’s perfectly suitable as a sticker or emoji—but in an aesthetic evaluation context, it falls short on gestalt and emotional resonance.


Case 4: A Minimalist Logo Graphic

Minimal Logo

Score: 5.68
ArtiMuse’s feedback:

Highly functional and recognizable—but scores low on originality, emotional impact, and artistic merit. Such graphics are acceptable in compliance or branding contexts, but they don’t qualify as “visually compelling” standalone images.


If you’d like to upgrade from “gut-feeling selection” to “standardized, explainable curation,” the workflow is straightforward:

  1. ArtiMuse scores each image across 8 dimensions and returns a detailed rationale.
  2. Dingo applies a unified threshold (default: 6.0, adjustable with a single setting) to classify images as Good/Bad.
  3. Additional rules (aspect ratio, NIMA sharpness, PHash/CNN deduplication, CLIP alignment) run in parallel.
  4. The result: a multi-dimensional “health report” per image (a sketch of its shape follows below).
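For orientation, this is roughly what such a merged per-image report can look like once the aesthetic score and the rule checks are combined. The field names are our own illustration, not a fixed Dingo output schema.

    # Hypothetical report shape; field names are illustrative, not Dingo's schema.
    report = {
        "image": "neon_city.png",
        "artimuse": {
            "score_overall": 7.73,  # Case 1 above
            "label": "Good",        # 7.73 >= the 6.0 threshold
            "reason": "...full eight-dimension critique preserved here...",
        },
        "rules": {
            "aspect_ratio": "pass",    # within the 0.25-4 range
            "nima_sharpness": "pass",  # above the 5.5 threshold
            "dedup_phash": "pass",     # no near-duplicate found
            "clip_alignment": "pass",  # prompt and image agree
        },
    }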

Our internal experience? Selection efficiency soars, and team debates about “what looks good” vanish—especially when aligning on shared aesthetic standards.


Practical Threshold Recommendations:

  • Brand/public-facing content: Set threshold to 6.5–7.0 (quality over quantity).
  • Daily content production: Use 5.8–6.2 for a balance of speed and quality.
  • Creative iteration or teaching: Lower to 5.5–6.0 to retain edge cases for learning.

For faster turnaround, tighten the polling interval from 5s to 2–3s and set a 90-second timeout—batch processing time drops significantly.
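Relative to the sketch shown earlier, that tuning amounts to changing the two polling constants (again, our placeholder names rather than official Dingo configuration keys):

    # Faster batches: poll sooner and cap each image at 90 seconds.
    # With 5 s polling, up to ~5 s of idle wait is added after a task actually
    # finishes; tightening to 3 s caps that dead time at ~3 s per image.
    POLL_INTERVAL_S = 3   # was 5
    TIMEOUT_S = 90        # give up on a single image after 90 seconds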


Quick Reference Guide:

  • For brand QA or content gatekeeping: Enable ArtiMuse + sharpness/aspect/dedup/CLIP in Dingo → get a one-page compliance report.
  • For bulk curation: Set threshold to 6.0 → auto-approve Good, flag Bad for manual review or editing.
  • For creative coaching: Study ArtiMuse’s dimension breakdowns and iterate using the “composition → lighting → element reduction → narrative” framework.

Note: All sample images above were generated/edited by nano banana and used solely for technical evaluation.

Want our evaluation template, rule configurations, and reproducible scripts? Comment or DM us “Dingo Review”—we’ll send you the full toolkit. We’ll also keep publishing comparative results across more scenarios (e.g., character consistency, multi-image blending, product demos) to turn “good-looking” into a repeatable, team-wide pipeline.
