SYSTEM OVERVIEW
This project combines latent diffusion models with real-time WebGL rendering to generate and explore procedural terrain. The neural network runs inference server-side (Python/FastAPI), streaming heightmap tiles to a Three.js client that renders them with physically-based shading.
ARCHITECTURE
┌─────────────────────────────────────────────────────┐
│                      PIPELINE                       │
├─────────────────┬───────────────┬───────────────────┤
│ DIFFUSION MODEL │  TILE SERVER  │   WEBGL CLIENT    │
│   (Python/HF)   │   (FastAPI)   │    (Three.js)     │
│                 │               │                   │
│ • SDXL base     │ • REST API    │ • Heightmap mesh  │
│ • ControlNet    │ • WebSocket   │ • PBR shading     │
│ • LoRA finetune │ • Tile cache  │ • Fly camera      │
└─────────────────┴───────────────┴───────────────────┘
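The tile server's cache can be sketched as a memoized lookup keyed by tile coordinates. This is an illustrative stand-in, not the project's actual code: `generate_tile` here is a placeholder returning deterministic noise where the real server would run diffusion-model inference.

```python
from functools import lru_cache

import numpy as np

TILE_SIZE = 512  # matches the 512x512 generation resolution


@lru_cache(maxsize=64)
def get_tile(x: int, y: int, seed: int = 0) -> bytes:
    """Return a cached heightmap tile, generating it only on a cache miss."""
    return generate_tile(x, y, seed).tobytes()


def generate_tile(x: int, y: int, seed: int) -> np.ndarray:
    # Placeholder: deterministic noise instead of real model inference.
    rng = np.random.default_rng(hash((x, y, seed)) & 0xFFFFFFFF)
    return rng.random((TILE_SIZE, TILE_SIZE), dtype=np.float32)
```

Repeated requests for the same `(x, y, seed)` hit the cache instead of re-running inference, which matters given the per-tile generation times reported below.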
TRAINING PIPELINE
The LoRA was trained on a custom dataset of 12,000 DEM (Digital Elevation Model) images scraped from USGS, combined with their corresponding satellite imagery. Training took ~18 hours on a single A100.
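The adapter setup can be sketched with Hugging Face `peft`. The hyperparameters below are assumptions for illustration; the actual rank, alpha, and target modules used in training are not recorded in this document.

```python
from peft import LoraConfig

# Illustrative values only; the real training hyperparameters are assumptions.
lora_config = LoraConfig(
    r=16,               # adapter rank (assumed)
    lora_alpha=16,
    lora_dropout=0.05,
    # SDXL attention projection layers, a common choice of target modules:
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
```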
Key decisions:
- Used ControlNet conditioned on sketch maps to allow user-guided generation
- Tile-based generation to support arbitrarily large terrains
- Seam blending using overlapping inference windows with feathered edges
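The seam-blending decision can be sketched as follows: each tile is multiplied by a separable feather mask that ramps linearly over the overlap region, accumulated into a shared canvas, and normalized by the summed weights. The tile size and overlap width below are illustrative defaults, not the project's recorded settings.

```python
import numpy as np


def feather_weights(size: int, overlap: int) -> np.ndarray:
    """1-D weight profile: linear ramps over the overlap, flat in the middle."""
    w = np.ones(size, dtype=np.float32)
    ramp = np.linspace(0.0, 1.0, overlap, dtype=np.float32)
    w[:overlap] = ramp
    w[-overlap:] = ramp[::-1]
    return w


def blend_tiles(tiles, positions, out_shape, tile_size=512, overlap=64):
    """Accumulate feather-weighted tiles, then normalize to hide seams."""
    acc = np.zeros(out_shape, dtype=np.float32)
    wsum = np.zeros(out_shape, dtype=np.float32)
    w1d = feather_weights(tile_size, overlap)
    w2d = np.outer(w1d, w1d)  # separable 2-D feather mask
    for tile, (y, x) in zip(tiles, positions):
        acc[y:y + tile_size, x:x + tile_size] += tile * w2d
        wsum[y:y + tile_size, x:x + tile_size] += w2d
    return acc / np.maximum(wsum, 1e-8)  # guard against zero weight at edges
```

Inside an overlap the two tiles' ramps sum to a constant, so a smooth terrain value passes through unchanged while hard seams are cross-faded away.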
RESULTS
Terrain tiles are generated at 512×512 resolution in ~1.4 s each on CPU and ~0.3 s on GPU. The WebGL viewer sustains 60 fps on mid-range hardware with 16 active tiles.
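The heightmap-to-mesh step on the client can be sketched in Python for clarity (the actual viewer does the equivalent in Three.js by displacing a PlaneGeometry's vertices): each H×W heightmap becomes a grid of vertices plus two triangles per cell. The `cell` and `z_scale` parameters are illustrative world-space scales, not values taken from the project.

```python
import numpy as np


def heightmap_to_vertices(hm: np.ndarray, cell: float = 1.0,
                          z_scale: float = 40.0) -> np.ndarray:
    """Turn an HxW heightmap into an (H*W, 3) vertex array for a grid mesh."""
    h, w = hm.shape
    xs, ys = np.meshgrid(np.arange(w) * cell, np.arange(h) * cell)
    # Y-up convention: height drives the vertical axis.
    return np.stack([xs, hm * z_scale, ys], axis=-1).reshape(-1, 3)


def grid_indices(h: int, w: int) -> np.ndarray:
    """Triangle indices (two per cell) for an HxW vertex grid."""
    idx = np.arange(h * w).reshape(h, w)
    a, b = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    c, d = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    return np.stack([a, b, c, b, d, c], axis=-1).reshape(-1, 3)
```

A 512×512 tile thus yields 262,144 vertices and 511×511×2 triangles, which is why the viewer caps the number of simultaneously active tiles.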
REFERENCES
- Ho et al., 2020 — Denoising Diffusion Probabilistic Models
- Zhang et al., 2023 — Adding Conditional Control to Text-to-Image Diffusion Models