KDD '26 · Jeju, Republic of Korea · August 9–13, 2026

Cosmo3DFlow: Wavelet Flow Matching for
Spatial-to-Spectral Compression in
Reconstructing the Early Universe

Md. Khairul Islam¹, Zeyu Xia¹, Ryan Goudjil¹, Jialu Wang², Arya Farahi², Judy Fox¹
¹ University of Virginia  ·  ² University of Texas at Austin

Abstract

We present Cosmo3DFlow, a generative framework that combines a 3D Discrete Wavelet Transform (DWT) with flow matching to reconstruct early-Universe initial conditions from present-day observations. The wavelet transform addresses the void problem (approximately 63.7% of cosmic volume is near-empty) by converting spatial sparsity into spectral sparsity, which enables stable ODE integration with large step sizes. At 128³ resolution, Cosmo3DFlow samples roughly 50× faster than score-based diffusion (5.2 seconds vs. 243 seconds, a ratio of about 47), with better reconstruction quality and about half the peak memory (2.1 GB vs. 4.0 GB).
Fig. 1 — Overview of Cosmo3DFlow: a wavelet-domain flow matching framework for reconstructing cosmological initial conditions from present-day observations.

The Void Problem

Approximately 63.7% of cosmic volume is occupied by near-empty voids that hold only 16.2% of the dark matter mass, yet voxel-based models allocate equal compute to every cell. A single-level 3D Haar DWT converts this spatial sparsity into spectral sparsity: voids collapse to near-zero high-frequency coefficients, while filaments and halos retain fine-grained detail. The network then works on a grid with 8× fewer spatial positions (the eight sub-bands are stacked as channels), at ~5× lower per-step compute cost.
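The spatial-to-spectral conversion can be illustrated on a toy field. The following is a minimal NumPy sketch, not the authors' implementation: `haar_dwt3d` and `haar_idwt3d` are hypothetical helper names, the sub-band ordering is an arbitrary choice, and the toy "void" is a perfectly uniform background rather than simulation data.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt3d(x):
    """Single-level orthonormal 3D Haar DWT (illustrative helper).

    x has shape (N, N, N) with even N. Returns shape (8, N/2, N/2, N/2):
    band 0 is the low-pass approximation, bands 1..7 are detail bands.
    """
    bands = [np.asarray(x, dtype=float)]
    for axis in range(3):
        split = []
        for b in bands:
            even = np.take(b, np.arange(0, b.shape[axis], 2), axis=axis)
            odd = np.take(b, np.arange(1, b.shape[axis], 2), axis=axis)
            split.append((even + odd) / SQRT2)  # low-pass along this axis
            split.append((even - odd) / SQRT2)  # high-pass along this axis
        bands = split
    return np.stack(bands)


def haar_idwt3d(coeffs):
    """Exact inverse of haar_dwt3d."""
    bands = list(coeffs)
    for axis in (2, 1, 0):  # undo the axes in reverse order
        merged = []
        for lo, hi in zip(bands[0::2], bands[1::2]):
            shape = list(lo.shape)
            shape[axis] *= 2
            out = np.empty(shape)
            even_idx = [slice(None)] * 3
            odd_idx = [slice(None)] * 3
            even_idx[axis] = slice(0, None, 2)
            odd_idx[axis] = slice(1, None, 2)
            out[tuple(even_idx)] = (lo + hi) / SQRT2
            out[tuple(odd_idx)] = (lo - hi) / SQRT2
            merged.append(out)
        bands = merged
    return bands[0]


# Toy field: a uniform low-density "void" with one small structured region.
rng = np.random.default_rng(0)
field = np.full((32, 32, 32), 0.1)
field[8:16, 8:16, 8:16] += rng.random((8, 8, 8))

coeffs = haar_dwt3d(field)
detail_zero_fraction = float(np.mean(np.abs(coeffs[1:]) < 1e-12))
roundtrip_error = float(np.max(np.abs(haar_idwt3d(coeffs) - field)))
```

In this toy case every voxel is nonzero, yet almost all detail coefficients vanish because the background is constant inside each 2×2×2 block. The transform is invertible and keeps the total number of coefficients unchanged; what shrinks is the spatial grid, from 32³ to 16³ with 8 channels.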

Fig. 2 — Voxel (left) vs. wavelet (right) representation at 128³. The DWT makes voids explicit as near-zero coefficients while preserving filament detail.

Method

Cosmo3DFlow operates entirely in wavelet space: (1) a 3D Haar DWT compresses the input field to 8 coefficient tensors at half spatial resolution (8× compression); (2) a linear flow path interpolates between Gaussian noise and the target in wavelet space; (3) the flow matching loss is minimized jointly with a power spectrum regularizer (λ = 0.01); and (4) at inference, 100 Euler steps in wavelet space followed by an inverse DWT recover the full-resolution density field.
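Steps (2) and (4) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the paper's code: noise is placed at t = 0 and the target at t = 1 (the text does not fix the direction), the function names are hypothetical, an oracle velocity stands in for the trained U-Net, and the power spectrum regularizer is omitted.

```python
import numpy as np


def linear_path(x0, x1, t):
    """Point on the straight line from noise x0 (t = 0) to target x1 (t = 1)."""
    return (1.0 - t) * x0 + t * x1


def flow_matching_loss(v_pred, x0, x1):
    """Mean squared error against the constant path velocity x1 - x0."""
    return float(np.mean((v_pred - (x1 - x0)) ** 2))


def euler_sample(velocity_fn, x0, n_steps=100):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


# Toy check with an oracle velocity for one known target (no neural network).
rng = np.random.default_rng(0)
shape = (8, 4, 4, 4)                      # 8 wavelet bands on a tiny grid
x1 = rng.normal(size=shape)               # stand-in for target coefficients
x0 = rng.normal(size=shape)               # Gaussian noise sample


def oracle_velocity(x, t):
    # On the linear path, (x1 - x_t) / (1 - t) equals x1 - x0.
    return (x1 - x) / (1.0 - t)


x_mid = linear_path(x0, x1, 0.5)
loss_at_optimum = flow_matching_loss(oracle_velocity(x_mid, 0.5), x0, x1)
sample = euler_sample(oracle_velocity, x0, n_steps=100)
```

Because the path is a straight line, the ideal velocity is constant along each trajectory, which is why coarse Euler steps are enough in this idealized setting. A learned velocity field is only approximately straight, so the step count remains a quality knob in practice.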

Wavelet-Aware 3D U-Net

The backbone is a 3D U-Net adapted for wavelet inputs. A 16-channel input (8ch wavelet noise + 8ch conditioned observation) passes through encoder–decoder blocks with a fixed 8³ bottleneck. Scale-specific conditioning injects per-level wavelet features at each resolution via 1×1×1 convolutions, and cross-scale skip connections bridge encoder features to non-corresponding decoder levels for multi-scale information flow. Residual blocks follow the BigGAN design with GroupNorm, SiLU, and Gaussian Fourier time embeddings.
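The tensor sizes implied by this description can be checked with a short calculation. The sketch below assumes that each encoder stage halves the spatial size (stride-2 downsampling) and that tensors are channels-first; neither detail is stated in the text, and the function names are hypothetical.

```python
def downsampling_stages(grid_size, bottleneck=8):
    """Count stride-2 stages from the wavelet grid down to the bottleneck."""
    size = grid_size // 2          # single-level DWT halves each axis
    stages = 0
    while size > bottleneck:
        if size % 2:
            raise ValueError("grid size must reach the bottleneck by halving")
        size //= 2
        stages += 1
    if size != bottleneck:
        raise ValueError("wavelet grid is smaller than the bottleneck")
    return stages


def network_input_shape(grid_size, channels=16):
    """Channels-first input shape: 8 noise bands + 8 conditioning bands."""
    half = grid_size // 2
    return (channels, half, half, half)


stages_by_resolution = {n: downsampling_stages(n) for n in (32, 64, 128)}
```

Under these assumptions a fixed 8³ bottleneck means the encoder depth grows with resolution: one halving stage at 32³, two at 64³, and three at 128³.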

Fig. 3 — Wavelet-aware 3D U-Net with scale-specific conditioning and cross-scale skip connections.

Training

AdamW lr=1e-4 · ReduceLROnPlateau (patience=5, ×0.5) · grad clip 1.0 · EMA 0.999 · 100 epochs (best val-loss) · batch 16/8/4 (32³/64³/128³) · A100 80 GB
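Two of these settings, the plateau schedule and the weight EMA, are simple enough to sketch in plain Python. This is a simplified re-implementation of the idea for illustration only: `PlateauHalver` and `ema_update` are hypothetical names, and PyTorch's own `ReduceLROnPlateau` has more options (thresholds, cooldown) and may differ in detail.

```python
def ema_update(ema_value, new_value, decay=0.999):
    """Exponential moving average of a parameter, applied after each step."""
    return decay * ema_value + (1.0 - decay) * new_value


class PlateauHalver:
    """Multiply the learning rate by `factor` after more than `patience`
    consecutive epochs without a new best validation loss."""

    def __init__(self, lr=1e-4, patience=5, factor=0.5):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


# A validation loss that improves once and then stalls.
scheduler = PlateauHalver()
history = [scheduler.step(loss) for loss in [1.0] + [1.0] * 6]
```

With a decay of 0.999 the EMA averages over roughly the last thousand updates, which smooths the weights used for evaluation.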

Results

Qualitative Reconstruction

Each row shows a 2D slice from a held-out Standard LH test simulation. Cosmo3DFlow recovers sharp cosmic filaments and halo positions that the diffusion baseline blurs, achieving a 21% lower VRMSE at 128³.
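The page does not spell out how VRMSE is computed. The sketch below assumes the common variance-scaled definition (RMSE divided by the standard deviation of the target), so the function and its `eps` parameter are assumptions, not the authors' code. It also reproduces the 21% figure from the two quoted 128³ scores.

```python
import numpy as np


def vrmse(pred, true, eps=1e-7):
    """Variance-scaled RMSE (assumed definition): RMSE / std of the target."""
    mse = np.mean((pred - true) ** 2)
    return float(np.sqrt(mse / (np.var(true) + eps)))


rng = np.random.default_rng(0)
true = rng.normal(size=(16, 16, 16))
perfect = vrmse(true, true)                               # exact match
mean_only = vrmse(np.full_like(true, true.mean()), true)  # constant guess

# Relative VRMSE reduction implied by the 128^3 numbers quoted above.
reduction = (0.63 - 0.50) / 0.63
```

Under this definition a score of 0 is a perfect reconstruction and a score near 1 is no better than predicting the mean field, which gives the reported values of 0.50 and 0.63 a natural scale.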

Fig. 4 — Columns: present-day observation (z = 0), ground-truth ICs, diffusion baseline, Cosmo3DFlow, and absolute error maps (darker = lower error).

Computational Efficiency

Cosmo3DFlow is 4.4× faster per ODE step thanks to the 8× wavelet compression, and it converges to a lower VRMSE at 100 steps than diffusion reaches at 1,000. End to end, this is a roughly 50× speedup (5.2 seconds vs. 243 seconds, a ratio of about 47) with better quality.
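The headline speedup can be cross-checked from the quoted numbers alone. The short calculation below uses only figures stated on this page; the variable names are illustrative.

```python
# Numbers quoted in the text (128^3).
diffusion_seconds, flow_seconds = 243.0, 5.2
diffusion_steps, flow_steps = 1_000, 100
per_step_speedup_quoted = 4.4

# End-to-end ratio of the two wall-clock times.
end_to_end = diffusion_seconds / flow_seconds

# Decomposition: per-step speedup times the reduction in step count.
step_reduction = diffusion_steps / flow_steps
from_per_step = per_step_speedup_quoted * step_reduction

print(f"end-to-end from timings: {end_to_end:.1f}x")     # 46.7x
print(f"per-step x step count:   {from_per_step:.1f}x")  # 44.0x
```

Both estimates land in the mid-40s, so the 50× figure is best read as a rounded value. Most of the gain comes from the tenfold reduction in step count, with the per-step saving contributing the rest.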

Fig. 5 — VRMSE vs. wall-clock sampling time at 128³ for varying ODE step counts.

The table below summarizes the head-to-head comparison at 128³. Cosmo3DFlow outperforms the diffusion baseline on every metric while using half the peak memory.

Table 1 — Head-to-head comparison at 128³

Metric @ 128³        Cosmo3DFlow    Diffusion
Sampling time        5.2 seconds    243 seconds
VRMSE                0.50           0.63
Cross-correlation    0.88           0.82
Power spectrum R²    0.99           0.70
Peak memory          2.1 GB         4.0 GB
ODE steps            100            1,000

Convergence

Cosmo3DFlow reaches its best quality at 100 Euler steps and plateaus; diffusion requires 1,000 steps to approach a higher error floor. The deterministic ODE trajectory in flat wavelet space enables stable large-step integration.

Fig. 6 — Top: reconstructed slices at 10 / 50 / 100 / 500 steps. Bottom: VRMSE vs. step count.

Physics Validation

Three spectral statistics — power spectrum P(k), cross-correlation C(k), and transfer function T(k) — are evaluated vs. wavenumber k on the Standard LH test set at 128³. Cosmo3DFlow achieves near-perfect agreement with ground truth across all scales (P(k) R² = 0.99 vs. 0.70 for diffusion).
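For readers unfamiliar with these statistics, here is a minimal NumPy sketch using their conventional definitions: P(k) as the shell-averaged squared Fourier amplitude, C(k) = P_cross / sqrt(P_pred · P_true), and T(k) = sqrt(P_pred / P_true). The page does not state the exact formulas or binning, so these definitions, the nearest-integer shells, the arbitrary normalization, and all function names are assumptions.

```python
import numpy as np


def shell_index(n):
    """Nearest-integer |k| (in grid units) for every Fourier mode of an n^3 grid."""
    freqs = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    return np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()


def cross_power(a, b):
    """Shell-averaged cross power of two cubic fields (arbitrary normalization)."""
    n = a.shape[0]
    product = (np.fft.fftn(a) * np.conj(np.fft.fftn(b))).real.ravel()
    shells = shell_index(n)
    k = np.arange(1, n // 2 + 1)               # skip k = 0 (the mean)
    totals = np.bincount(shells, weights=product)
    counts = np.bincount(shells)
    return k, totals[k] / counts[k]


def spectral_stats(pred, true):
    """Return k, P_pred(k), P_true(k), C(k), and T(k)."""
    k, p_pred = cross_power(pred, pred)
    _, p_true = cross_power(true, true)
    _, p_cross = cross_power(pred, true)
    corr = p_cross / np.sqrt(p_pred * p_true)   # 1 means phases agree
    transfer = np.sqrt(p_pred / p_true)         # 1 means amplitudes agree
    return k, p_pred, p_true, corr, transfer


# Toy check: a prediction that is exactly twice the truth.
rng = np.random.default_rng(0)
true = rng.normal(size=(16, 16, 16))
k, _, _, corr_scaled, transfer_scaled = spectral_stats(2.0 * true, true)
```

In the toy check the phases match perfectly, so C(k) is 1 at every k, while T(k) is 2 because the amplitude is doubled. The two statistics are complementary: C(k) is insensitive to amplitude errors and T(k) is insensitive to phase errors.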

Fig. 7 — P(k), C(k), and T(k) vs. wavenumber k. Cosmo3DFlow (blue) vs. diffusion (red) vs. ground truth (dashed).

Full Results (each cell: Cosmo3DFlow / Diffusion)

Results across all three dataset suites and resolutions. Cosmo3DFlow matches or outperforms the diffusion baseline on every reported metric; in these tables the VRMSE and correlation gaps are widest at 32³.

Table 2 — Standard Latin Hypercube (2,000 simulations)

Resolution    VRMSE ↓        Corr ↑         PS R² ↑        Transfer Function ↑
128³          0.50 / 0.63    0.88 / 0.82    0.99 / 0.70    0.99 / 0.80
64³           0.47 / 0.68    0.92 / 0.89    0.98 / 0.59    0.98 / 0.59
32³           0.34 / 0.82    0.96 / 0.85    0.95 / 0.48    0.95 / 0.48

Table 3 — Big Sobol Sequence (1,000 simulations)

Resolution    VRMSE ↓        Corr ↑         PS R² ↑        Transfer Function ↑
128³          0.62 / 0.64    0.80 / 0.79    0.99 / 0.84    0.95 / 0.88
64³           0.53 / 0.65    0.88 / 0.88    0.98 / 0.83    0.94 / 0.81
32³           0.37 / 0.79    0.95 / 0.85    0.95 / 0.48    0.94 / 0.71

Table 4 — Non-Gaussian fNL LH (1,000 simulations)

Resolution    VRMSE ↓        Corr ↑         PS R² ↑        Transfer Function ↑
128³          0.56 / 0.59    0.86 / 0.83    1.00 / 1.00    0.98 / 0.98
64³           0.47 / 0.57    0.93 / 0.89    1.00 / 1.00    0.99 / 0.99
32³           0.31 / 0.67    0.97 / 0.87    1.00 / 0.98    0.99 / 0.98

Dataset

All experiments use Quijote N-body simulations in a (1,000 h⁻¹ Mpc)³ box with 512³ particles, evaluated at grid resolutions 32³, 64³, and 128³. Each sample pairs the initial conditions at redshift z = 127 with the present-day field at z = 0. Three suites are used: Standard Latin Hypercube (2,000 simulations, 5 ΛCDM parameters, split 1,800/100/100); Big Sobol Sequence (1,000 simulations, 8:1:1 split); and Non-Gaussian fNL LH (1,000 simulations, fNL ∈ [−300, 300], 8:1:1 split).

Citation

@article{islam2026cosmo3dflow,
  title   = {Cosmo3DFlow: Wavelet Flow Matching for Spatial-to-Spectral Compression in Reconstructing the Early Universe},
  author  = {Islam, Md Khairul and Xia, Zeyu and Goudjil, Ryan and Wang, Jialu and Farahi, Arya and Fox, Judy},
  journal = {arXiv preprint arXiv:2602.10172},
  year    = {2026}
}

Acknowledgments

We acknowledge support from the National Science Foundation under Cooperative Agreement 2421782 and the Simons Foundation award MPS-AI-00010515 and Seed Grant AWD-006703 (UVA00002858-AS-ASTR-NSF Simons CosmicAI). We thank the Quijote team for making their N-body suite publicly available. We are grateful for the UVA Research Computing resources and support.