🎨 Omni Editor 2.0

Please give us a ❤️ if you find this Space helpful. Free trials are refreshed every few days.


🎨 Unlimited AI Image Generation & Editing

Access unlimited image generation, video creation, and advanced editing features: no time limits, no ads, no watermarks.

🎨 What Omni Model Can Do

This Space demonstrates just a small fraction of what the Omni model can do. Beyond what you see here, it can also perform:

👗 Virtual Try-On 🌅 Background Replacement 💇 Hairstyle Changer 📄 Poster Editing 🎨 Style Transfer + dozens more...

No fine-tuning required - just modify the prompt and input parameters!

🤖 Omni Creator 2.0: 8B Unified Multi-Modal Diffusion Transformer

An 8B-parameter native MM-DiT that unifies T2I generation, pixel-level editing, and I2V generation. It uses CLIP/T5-style text encoders and visual conditioners to deliver high-fidelity multi-modal results on a shared transformer backbone.

RoPE + AdaLN-Zero DiT Blocks

Multi-head attention with timestep-conditioned modulation

Adaptive Multi-Modal Gating

Learned fusion of text, up to three reference images, and temporal context

HPC-Ready Optimization

FP8 + RoPE + RMSNorm for production-scale inference

📊 AdaLN-Zero + AMG (Full Stack)
# 1) Modal fusion (text / up to 3 images / temporal)
α = Softmax(MLP([c_txt; c_img; c_tmp]));  C = α_txt·c_txt + Σᵢ α_img,i·c_img,i + α_tmp·c_tmp
# 2) Timestep-conditioned modulation
h = TEmbed(t) ⊕ C;  (γ, β, λ) = MLP(h)
# 3) RoPE self-attention + gated residual
Q = RoPE(W_q·LN(x));  K = RoPE(W_k·LN(x));  V = W_v·LN(x)
A = Softmax(QKᵀ/√d + B_rel)·V
x ← x + λ ⊙ A·(1 + γ) + β + CrossAttn(x, C)
# 4) AdaLN-Zero-modulated SwiGLU FFN
u = SwiGLU(W₁·LN(x));  x ← x + λ ⊙ (W₂·u)·(1 + γ) + β
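The gating and modulation steps above can be sketched in a few lines of NumPy. This is a minimal toy sketch, not the model's implementation: all weight matrices, the sizes, and the sinusoidal timestep embedding are illustrative placeholders, a single image context stands in for the up-to-three image slots, and RoPE, cross-attention, and the FFN are omitted. The zero-initialized modulation MLP shows the "Zero" in AdaLN-Zero: at initialization γ = β = λ = 0, so the block starts as an identity mapping.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16          # model width (toy size)
n = 8           # number of latent tokens

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# --- 1) Adaptive multi-modal gating: learned weights over modality contexts ---
c_txt, c_img, c_tmp = (rng.normal(size=d) for _ in range(3))
W_gate = rng.normal(size=(3, 3 * d)) * 0.02          # stand-in for the gating MLP
alpha = softmax(W_gate @ np.concatenate([c_txt, c_img, c_tmp]))
C = alpha[0] * c_txt + alpha[1] * c_img + alpha[2] * c_tmp

# --- 2) Timestep-conditioned modulation (AdaLN-Zero) ---
t_embed = np.sin(np.arange(d) * 0.5)                 # toy timestep embedding
h = t_embed + C
W_mod = np.zeros((3 * d, d))                         # zero-init => identity block at start
gamma, beta, lam = np.split(W_mod @ h, 3)

# --- 3) Self-attention with gated residual (RoPE omitted for brevity) ---
x = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
xn = layer_norm(x)
Q, K, V = xn @ Wq, xn @ Wk, xn @ Wv
A = softmax(Q @ K.T / np.sqrt(d)) @ V
x = x + lam * A * (1.0 + gamma) + beta               # AdaLN-Zero gated residual

print(alpha)        # modality weights, sum to 1
print(x.shape)      # (8, 16)
```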
📅 Product Timeline
Sep 2025 Omni Creator 1.0 Released

Supported single-image and multi-image editing.

Dec 15 2025 Omni Creator 2.0 Released

Unified image editing, T2I, T2V, I2V, face swap, watermark removal, and more into one full-modal generation & editing suite.

Planned: Omni Creator 3.0

Bring video generation into the unified stack for true end-to-end multimodal creation.

⚡ OmniScheduler: Unified Hybrid Flow-Diffusion Sampler

A sampling framework that couples flow matching, Karras sigma schedules, and Heun/RK4 ODE solvers for fast 4–8-step generation.

🎯 Few-Step Flow Matching

Supports velocity/epsilon/sample prediction with flow-to-velocity conversion, enabling 4–8 step inference.

🔄 Multi-Stage Sampling

Coarse-to-fine pipeline: ~70% of the steps produce a coarse draft, with optional refine passes for details.

📈 RK4 Hybrid ODE

A 4th-order Runge-Kutta solver combined with flow-matching conditioning for accurate integration of the sampling trajectory.
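The RK4 update that the refine stage relies on is the classic one and can be sketched generically; here `v` stands in for the learned velocity field v_θ, and the sanity check uses dx/dt = x (whose exact solution is eᵗ) rather than a real model.

```python
import numpy as np

def rk4_step(v, x, t, dt):
    """One classic Runge-Kutta 4 step for dx/dt = v(x, t)."""
    k1 = v(x, t)
    k2 = v(x + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = v(x + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = v(x + dt * k3, t + dt)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Sanity check on dx/dt = x: integrate from t=0 to t=1 in only 4 steps
x, t, dt = np.array([1.0]), 0.0, 0.25
for _ in range(4):
    x = rk4_step(lambda x, t: x, x, t, dt)
    t += dt
print(x)   # ≈ e, even with this coarse step size
```

The 4th-order accuracy is what makes a few refine steps enough: with just 4 evaluations per step the result already matches e to about four decimal places.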

🔢 Flow Matching Formulation

Given data distribution p₁(x) and noise distribution p₀(z) = N(0, I), the Rectified Flow defines:

x_t = (1 - t) · z + t · x    where t ∈ [0, 1]

The velocity field v_θ(x_t, t) is trained to match the conditional velocity:

L_FM = E_{t,x,z}[ ||v_θ(x_t, t) - (x - z)||² ]
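A Monte Carlo estimate of this objective is a one-liner; the sketch below is illustrative (the `fm_loss` name and the toy batch are not from the source), and the "oracle" velocity field peeks at the (x, z) pairing purely to show that the loss vanishes at the optimum.

```python
import numpy as np

rng = np.random.default_rng(0)

def fm_loss(v_theta, x, z, t):
    """Monte Carlo estimate of L_FM = E[ ||v_theta(x_t, t) - (x - z)||^2 ]."""
    x_t = (1.0 - t)[:, None] * z + t[:, None] * x    # rectified-flow interpolant
    target = x - z                                    # conditional velocity
    pred = v_theta(x_t, t)
    return np.mean(np.sum((pred - target) ** 2, axis=-1))

# Toy check: the oracle velocity field drives the loss to exactly zero
x = rng.normal(size=(32, 4))                          # "data" batch
z = rng.normal(size=(32, 4))                          # paired noise
t = rng.uniform(size=32)
oracle = lambda x_t, t: x - z                         # peeks at the pairing
print(fm_loss(oracle, x, z, t))    # 0.0
```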

At inference, we solve the ODE: dx/dt = v_θ(x_t, t) from t=0 to t=1
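Solving this ODE with plain Euler steps gives the simplest few-step sampler. The sketch below is a generic Euler integrator, not OmniScheduler itself; the constant-velocity "model" in the check is a toy that transports z₀ exactly onto a target x₁ along the straight rectified-flow path.

```python
import numpy as np

def euler_sample(v_theta, z0, steps=8):
    """Integrate dx/dt = v_theta(x_t, t) from t=0 (noise) to t=1 (data) with Euler."""
    x, dt = z0.copy(), 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = x + dt * v_theta(x, t)
    return x

# Toy check: a constant velocity field v = x1 - z0 transports z0 exactly onto x1
rng = np.random.default_rng(0)
z0 = rng.normal(size=(4,))
x1 = rng.normal(size=(4,))
out = euler_sample(lambda x, t: x1 - z0, z0, steps=8)
print(np.allclose(out, x1))   # True
```

The constant-velocity case is exactly the regime rectified flow aims for: the straighter the learned trajectories, the fewer steps a first-order solver needs.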

π π-Flow Policy Network (Coarse Trajectory)
Instead of evaluating the model dozens of times, a lightweight policy network can predict a multi-step velocity trajectory in one forward pass.
# One-shot trajectory prediction (coarse stage)
v_{0:S-1} = π_φ(z₀, c, t_grid)
x_{k+1} = x_k + v_k · Δt   (k = 0..S-1)
How it integrates with OmniScheduler:
- Stage 1 (coarse): apply the predicted velocities directly (policy rollout) to rapidly move along the flow-matching path.
- Stage 2 (refine): optionally switch to Heun/RK4 higher-order updates for detail recovery and stability.
- Multi-modal conditioning: the policy is conditioned on the aggregated text/visual context plus the time embedding, and outputs velocity fields matching the latent shape.
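The two-stage scheme above can be sketched as a policy rollout followed by Heun refinement. This is a toy sketch under stated assumptions: `policy_rollout` and `heun_refine` are hypothetical names, the "policy" simply tiles one constant velocity instead of running a network, and the 70/30 split of the time grid mirrors the coarse/refine ratio described above.

```python
import numpy as np

def policy_rollout(policy, z0, t_grid):
    """Stage 1 (coarse): apply all S predicted velocities from one forward pass."""
    vs = policy(z0, t_grid)                 # (S, *latent_shape) in a single call
    x = z0.copy()
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        x = x + dt * vs[k]                  # x_{k+1} = x_k + v_k * dt
    return x

def heun_refine(v_theta, x, t_grid):
    """Stage 2 (refine): second-order Heun updates with the full model."""
    for k in range(len(t_grid) - 1):
        t, dt = t_grid[k], t_grid[k + 1] - t_grid[k]
        v0 = v_theta(x, t)
        x_pred = x + dt * v0                # Euler predictor
        v1 = v_theta(x_pred, t + dt)
        x = x + 0.5 * dt * (v0 + v1)        # trapezoidal corrector
    return x

# Toy run: a "policy" that tiles one constant velocity along the grid
rng = np.random.default_rng(0)
z0, x1 = rng.normal(size=(4,)), rng.normal(size=(4,))
coarse_grid = np.linspace(0.0, 0.7, 5)      # ~70% of the path, policy rollout
refine_grid = np.linspace(0.7, 1.0, 3)      # remaining ~30%, Heun refinement
policy = lambda z, tg: np.tile(x1 - z0, (len(tg), 1))
x = policy_rollout(policy, z0, coarse_grid)
x = heun_refine(lambda x, t: x1 - z0, x, refine_grid)
print(np.allclose(x, x1))   # True
```

Because the policy's velocities are consumed without further model calls, stage 1 costs one forward pass regardless of S; only the short refine grid pays per-step model evaluations.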
🎲 Multi-Stage Sampling Pipeline

📷 Input (Text + Images) → Stage 1: Coarse (≈70%, Euler/Heun, few steps) → 💎 Stage 2: Refine (≈30%, optional RK4) → 🎨 Output (HD Image/Video)

📈 State-of-the-Art Model Comparison (2025)

| Model | Params | Architecture | Training | Inference | NFE | Acceleration |
|---|---|---|---|---|---|---|
| FLUX.2-Dev | 32B | DiT + MM | Flow Matching | Euler/DPM | 50 | FP8 + FlashAttn |
| Qwen-Image | 20B | DiT + MLLM | Rectified Flow | FlowMatch Euler | 30–50 | Lightning LoRA |
| Qwen-Image-Edit | 20B | DiT + Dual-Branch | Flow Matching | Euler | 28–50 | Lightning LoRA |
| HunyuanVideo | 13B+ | AsymmDiT | Diffusion | Multi-step | 50+ | FP8 + Multi-frame |
| Wan2.2 | 5B/14B | DiT + MoE | Diffusion | Multi-step | 30–50 | MoE Routing + FP8 |
| Z-Image-Turbo | 6B | Distilled DiT | Progressive Distill | Few-step | 4–8 | Distillation |
| Mochi | 10B | Video DiT | Diffusion | Multi-step | 50+ | ComfyUI Parallel |
| ⭐ Omni Creator 2.0 | 8B | MM-DiT + AMG | π-Flow + FM | RK4 Hybrid | 4–8 | Policy Distill + Multi-Stage |
Abbreviations: DiT = Diffusion Transformer | MM = Multi-Modal | MLLM = Multimodal LLM | MoE = Mixture of Experts | FM = Flow Matching | NFE = Number of Function Evaluations | AMG = Adaptive Multi-Modal Gating