| Metric / Model | FlyMy.AI Agent M1 | ByteDance Bagel | OpenAI DALL-E | OpenAI Image 1 |
|---|---|---|---|---|
| Image generation quality (GenEval) | 0.83 | 0.82 | 0.67 | N/A |
| Semantic alignment (WISE) | 0.83 | 0.80 | 0.69 | 0.87 |
| Editing accuracy (GEdit-Average) | 6.85 | 6.90 | N/A | 7.62 |
| Face-preserving editing (Face ID Similarity) | 0.917 | 0.636 | 0.390 | Low (N/A) |
| Advanced video generation | ✅ | ❌ | ❌ | ❌ |

| Type | Model | Single Obj. | Two Obj. | Counting | Colors | Position | Color Attr. | Overall |
|---|---|---|---|---|---|---|---|---|
| Gen. Only | PixArt-α [9] | 0.98 | 0.50 | 0.44 | 0.80 | 0.08 | 0.07 | 0.48 |
| | SDv2.1 [61] | 0.98 | 0.51 | 0.44 | 0.85 | 0.07 | 0.17 | 0.50 |
| | DALL-E 2 [60] | 0.94 | 0.66 | 0.49 | 0.77 | 0.10 | 0.19 | 0.52 |
| | Emu3-Gen [79] | 0.98 | 0.71 | 0.34 | 0.81 | 0.17 | 0.21 | 0.54 |
| | SDXL [58] | 0.98 | 0.74 | 0.39 | 0.85 | 0.15 | 0.23 | 0.55 |
| | DALL-E 3 [5] | 0.96 | 0.87 | 0.47 | 0.83 | 0.43 | 0.45 | 0.67 |
| | SD3-Medium [19] | 0.99 | 0.94 | 0.72 | 0.89 | 0.33 | 0.60 | 0.74 |
| | FLUX.1-dev† [35] | 0.98 | 0.93 | 0.75 | 0.93 | 0.68 | 0.65 | 0.82 |
| Unified | Chameleon [70] | - | - | - | - | - | - | 0.39 |
| | LWM [42] | 0.93 | 0.41 | 0.46 | 0.79 | 0.09 | 0.15 | 0.47 |
| | SEED-X [23] | 0.97 | 0.58 | 0.26 | 0.80 | 0.19 | 0.14 | 0.49 |
| | TokenFlow-XL [59] | 0.95 | 0.60 | 0.41 | 0.81 | 0.16 | 0.24 | 0.55 |
| | ILLUME [76] | 0.99 | 0.86 | 0.45 | 0.71 | 0.39 | 0.28 | 0.61 |
| | Janus [83] | 0.97 | 0.68 | 0.30 | 0.84 | 0.46 | 0.42 | 0.61 |
| | Transfusion [102] | - | - | - | - | - | - | 0.63 |
| | Emu3-Gen [79] | 0.99 | 0.81 | 0.42 | 0.80 | 0.49 | 0.45 | 0.66 |
| | Show-o [88] | 0.98 | 0.80 | 0.66 | 0.84 | 0.31 | 0.50 | 0.68 |
| | Janus-Pro-7B [1] | 0.99 | 0.89 | 0.59 | 0.90 | 0.79 | 0.66 | 0.80 |
| | MetaQuery-XL† [57] | - | - | - | - | - | - | 0.80 |
| | BAGEL | 0.99 | 0.94 | 0.81 | 0.88 | 0.64 | 0.63 | 0.82 |
| | FlyMy AI M1 | 1.00 | 0.98 | 0.79 | 0.91 | 0.60 | 0.72 | 0.83 |
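
The Overall column in the GenEval table above appears consistent with an unweighted mean of the six category scores, to within rounding of the published two-decimal values. The short check below is a sketch of that arithmetic, not the benchmark's official scoring script. The dictionary names and the `overall_score` helper are introduced here for illustration only; the numbers are copied from four rows of the table.

```python
# Category scores copied from the table above, in column order:
# Single Obj., Two Obj., Counting, Colors, Position, Color Attr.
GENEVAL_SUBSCORES = {
    "FLUX.1-dev": [0.98, 0.93, 0.75, 0.93, 0.68, 0.65],
    "Janus-Pro-7B": [0.99, 0.89, 0.59, 0.90, 0.79, 0.66],
    "BAGEL": [0.99, 0.94, 0.81, 0.88, 0.64, 0.63],
    "FlyMy AI M1": [1.00, 0.98, 0.79, 0.91, 0.60, 0.72],
}

# Overall values as reported in the same rows.
REPORTED_OVERALL = {
    "FLUX.1-dev": 0.82,
    "Janus-Pro-7B": 0.80,
    "BAGEL": 0.82,
    "FlyMy AI M1": 0.83,
}


def overall_score(subscores):
    """Unweighted mean of the per-category scores (assumed aggregation)."""
    return sum(subscores) / len(subscores)


for model, scores in GENEVAL_SUBSCORES.items():
    mean = overall_score(scores)
    print(f"{model}: mean={mean:.3f}, reported={REPORTED_OVERALL[model]:.2f}")
```

Because the inputs are already rounded, the recomputed mean can differ from the reported value by a few thousandths.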

| Type | Model | Cultural | Time | Space | Biology | Physics | Chemistry | Overall |
|---|---|---|---|---|---|---|---|---|
| Gen-Only | SDv1.5 | 0.34 | 0.35 | 0.32 | 0.28 | 0.29 | 0.21 | 0.32 |
| | SDXL | 0.43 | 0.48 | 0.47 | 0.44 | 0.45 | 0.27 | 0.43 |
| | SD3.5-large | 0.44 | 0.50 | 0.58 | 0.44 | 0.52 | 0.31 | 0.46 |
| | PixArt-Alpha | 0.45 | 0.50 | 0.48 | 0.49 | 0.56 | 0.34 | 0.47 |
| | playground-v2.5 | 0.49 | 0.58 | 0.55 | 0.43 | 0.48 | 0.33 | 0.49 |
| | FLUX.1-dev | 0.48 | 0.58 | 0.62 | 0.42 | 0.51 | 0.35 | 0.50 |
| Unified | Janus | 0.16 | 0.26 | 0.35 | 0.28 | 0.30 | 0.14 | 0.23 |
| | VILA-U | 0.26 | 0.33 | 0.37 | 0.35 | 0.39 | 0.23 | 0.31 |
| | Show-o-512 | 0.28 | 0.40 | 0.48 | 0.30 | 0.46 | 0.30 | 0.35 |
| | Janus-Pro-7B | 0.30 | 0.37 | 0.49 | 0.36 | 0.42 | 0.26 | 0.35 |
| | Emu3 | 0.34 | 0.45 | 0.48 | 0.41 | 0.45 | 0.27 | 0.39 |
| | MetaQuery-XL | 0.56 | 0.55 | 0.62 | 0.49 | 0.63 | 0.41 | 0.55 |
| | GPT-4o** | 0.81 | 0.71 | 0.89 | 0.83 | 0.79 | 0.74 | 0.80 |
| | BAGEL | 0.44 | 0.55 | 0.68 | 0.44 | 0.60 | 0.39 | 0.52 |
| | BAGEL w/ Self-CoT | 0.76 | 0.69 | 0.75 | 0.65 | 0.75 | 0.58 | 0.70 |
| | FlyMy AI M1 | 0.791 | 0.926 | 0.876 | 0.838 | 0.910 | 0.841 | 0.864 |

| API | Overall score (Face ID Similarity) | Advantage |
|---|---|---|
| FlyMyAI | 0.917 ⭐ | +44% vs Bagel/Edit, +135% vs OpenAI |
| Bagel/Edit | 0.636 | +63% vs OpenAI |
| OpenAI | 0.390 | Baseline |

Dataset: 8,832 face transformation pairs generated from 50 FFHQ images, covering emotion, age, hair, and accessory transformations.

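
The percentages in the Advantage column follow from the overall scores as relative differences against the lower score. A minimal sketch of that arithmetic is shown below; the `relative_advantage` helper and the `overall` dictionary are names introduced here for illustration, and the scores are copied from the table above.

```python
def relative_advantage(score, baseline):
    """Percentage by which `score` exceeds `baseline`."""
    return (score - baseline) / baseline * 100.0


# Overall scores copied from the table above.
overall = {"FlyMyAI": 0.917, "Bagel/Edit": 0.636, "OpenAI": 0.390}

pairs = [
    ("FlyMyAI", "Bagel/Edit"),
    ("FlyMyAI", "OpenAI"),
    ("Bagel/Edit", "OpenAI"),
]

for model, baseline in pairs:
    pct = relative_advantage(overall[model], overall[baseline])
    print(f"{model} vs {baseline}: {pct:+.0f}%")

# Prints:
# FlyMyAI vs Bagel/Edit: +44%
# FlyMyAI vs OpenAI: +135%
# Bagel/Edit vs OpenAI: +63%
```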
| Category | FlyMyAI Best | Bagel/Edit Best | OpenAI Best | Category Winner |
|---|---|---|---|---|
| Emotions | 0.977 (maximal) | 0.907 (simple) | 0.401 (mid) | FlyMyAI |
| Age | 0.915 (mid) | 0.720 (simple) | 0.404 (mid) | FlyMyAI |
| Hair | 0.899 (maximal) | 0.845 (simple) | 0.398 (mid) | FlyMyAI |
| Accessories | 0.930 (mid) | 0.955 (simple) | 0.402 (mid) | Bagel/Edit |

| API | Score (simple → complex prompts) | Trend |
|---|---|---|
| FlyMyAI | 0.903 → 0.929 | Improves +3% ⬆️ |
| Bagel/Edit | 0.857 → 0.457 | Degrades -47% ⬇️ |
| OpenAI | 0.385 → 0.383 | Stable (poor) → |

- FlyMyAI leads in 3 of 4 categories, and its score improves with more complex prompts (0.903 → 0.929).
- Bagel/Edit is competitive only in accessories with simple prompts, where it edges out FlyMyAI (0.955 vs 0.930).
- Complex prompting advantage: only FlyMyAI improves with detailed instructions; Bagel/Edit degrades by 47% and OpenAI stays flat.
- Production recommendation: use FlyMyAI for identity-critical face transformations.

📁 Detailed results: Face Identity Benchmark
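
This page does not state how Face ID Similarity is computed. A common way to score identity preservation is the cosine similarity between face-embedding vectors of the original and edited images, and the sketch below assumes that definition. The 4-dimensional vectors are hypothetical stand-ins (real face encoders produce much longer embeddings), and `cosine_similarity` is an illustrative helper, not part of any FlyMy.AI API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, for illustration only.
original = [0.20, 0.70, 0.10, 0.60]
edited_same_identity = [0.25, 0.65, 0.15, 0.55]   # close to the original
edited_identity_drift = [0.90, 0.10, 0.50, -0.20]  # far from the original

print(f"identity kept:    {cosine_similarity(original, edited_same_identity):.3f}")
print(f"identity drifted: {cosine_similarity(original, edited_identity_drift):.3f}")
```

Under this assumed definition, a score near 1.0 means the edited face stays close to the source identity, and lower scores indicate identity drift.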
If you use Media agent M1 in your research or projects, please cite:
@article{timoni2025m1,
author = {Denis Timonin and Arseny Shahmatov and Nazar Annanazarov and Valentin Kovalev and Alexey Buzovkin},
title = {Media agent M1: how open-source is all you need},
year = {2025},
note = {Available at https://github.com/yourusername/your-repo},
}