SYSTEM OVERVIEW
This project combines latent diffusion models with real-time WebGL rendering to generate and explore procedural terrain. The neural network runs inference server-side (Python/FastAPI), streaming heightmap tiles to a Three.js client that renders them with physically-based shading.
ARCHITECTURE
┌─────────────────────────────────────────────────────┐
│                      PIPELINE                       │
├─────────────────┬───────────────┬───────────────────┤
│ DIFFUSION MODEL │  TILE SERVER  │   WEBGL CLIENT    │
│   (Python/HF)   │   (FastAPI)   │    (Three.js)     │
│                 │               │                   │
│ • SDXL base     │ • REST API    │ • Heightmap mesh  │
│ • ControlNet    │ • WebSocket   │ • PBR shading     │
│ • LoRA finetune │ • Tile cache  │ • Fly camera      │
└─────────────────┴───────────────┴───────────────────┘
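The tile server's cache can be sketched as a memoized lookup keyed by tile coordinates. This is an illustrative stand-in, not the project's actual code: `generate_tile` here is a placeholder returning deterministic noise where the real server would run diffusion-model inference.

```python
from functools import lru_cache

import numpy as np

TILE_SIZE = 512  # matches the 512x512 generation resolution


@lru_cache(maxsize=64)
def get_tile(x: int, y: int, seed: int = 0) -> bytes:
    """Return a cached heightmap tile, generating it only on a cache miss."""
    return generate_tile(x, y, seed).tobytes()


def generate_tile(x: int, y: int, seed: int) -> np.ndarray:
    # Placeholder: deterministic noise instead of real model inference.
    rng = np.random.default_rng(hash((x, y, seed)) & 0xFFFFFFFF)
    return rng.random((TILE_SIZE, TILE_SIZE), dtype=np.float32)
```

Repeated requests for the same `(x, y, seed)` hit the cache instead of re-running inference, which matters given the per-tile generation times reported below.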
TRAINING PIPELINE
The LoRA was trained on a custom dataset of 12,000 DEM (Digital Elevation Model) images scraped from USGS, combined with their corresponding satellite imagery. Training took ~18 hours on a single A100.
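The adapter setup can be sketched with Hugging Face `peft`. The hyperparameters below are assumptions for illustration; the actual rank, alpha, and target modules used in training are not recorded in this document.

```python
from peft import LoraConfig

# Illustrative values only; the real training hyperparameters are assumptions.
lora_config = LoraConfig(
    r=16,               # adapter rank (assumed)
    lora_alpha=16,
    lora_dropout=0.05,
    # SDXL attention projection layers, a common choice of target modules:
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
```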
Key decisions:
- Used ControlNet conditioned on sketch maps to allow user-guided generation
- Tile-based generation to support arbitrarily large terrains
- Seam blending using overlapping inference windows with feathered edges
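The seam-blending decision can be sketched as follows: each tile is multiplied by a separable feather mask that ramps linearly over the overlap region, accumulated into a shared canvas, and normalized by the summed weights. The tile size and overlap width below are illustrative defaults, not the project's recorded settings.

```python
import numpy as np


def feather_weights(size: int, overlap: int) -> np.ndarray:
    """1-D weight profile: linear ramps over the overlap, flat in the middle."""
    w = np.ones(size, dtype=np.float32)
    ramp = np.linspace(0.0, 1.0, overlap, dtype=np.float32)
    w[:overlap] = ramp
    w[-overlap:] = ramp[::-1]
    return w


def blend_tiles(tiles, positions, out_shape, tile_size=512, overlap=64):
    """Accumulate feather-weighted tiles, then normalize to hide seams."""
    acc = np.zeros(out_shape, dtype=np.float32)
    wsum = np.zeros(out_shape, dtype=np.float32)
    w1d = feather_weights(tile_size, overlap)
    w2d = np.outer(w1d, w1d)  # separable 2-D feather mask
    for tile, (y, x) in zip(tiles, positions):
        acc[y:y + tile_size, x:x + tile_size] += tile * w2d
        wsum[y:y + tile_size, x:x + tile_size] += w2d
    return acc / np.maximum(wsum, 1e-8)  # guard against zero weight at edges
```

Inside an overlap the two tiles' ramps sum to a constant, so a smooth terrain value passes through unchanged while hard seams are cross-faded away.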
RESULTS
Terrain tiles are generated at 512×512 resolution in ~1.4 s each on CPU and ~0.3 s on GPU. The WebGL viewer sustains 60 fps on mid-range hardware with 16 active tiles.
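The heightmap-to-mesh step on the client can be sketched in Python for clarity (the actual viewer does the equivalent in Three.js by displacing a PlaneGeometry's vertices): each H×W heightmap becomes a grid of vertices plus two triangles per cell. The `cell` and `z_scale` parameters are illustrative world-space scales, not values taken from the project.

```python
import numpy as np


def heightmap_to_vertices(hm: np.ndarray, cell: float = 1.0,
                          z_scale: float = 40.0) -> np.ndarray:
    """Turn an HxW heightmap into an (H*W, 3) vertex array for a grid mesh."""
    h, w = hm.shape
    xs, ys = np.meshgrid(np.arange(w) * cell, np.arange(h) * cell)
    # Y-up convention: height drives the vertical axis.
    return np.stack([xs, hm * z_scale, ys], axis=-1).reshape(-1, 3)


def grid_indices(h: int, w: int) -> np.ndarray:
    """Triangle indices (two per cell) for an HxW vertex grid."""
    idx = np.arange(h * w).reshape(h, w)
    a, b = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    c, d = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    return np.stack([a, b, c, b, d, c], axis=-1).reshape(-1, 3)
```

A 512×512 tile thus yields 262,144 vertices and 511×511×2 triangles, which is why the viewer caps the number of simultaneously active tiles.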
REFERENCES
- Ho et al., 2020 — Denoising Diffusion Probabilistic Models
- Zhang et al., 2023 — Adding Conditional Control to Text-to-Image Diffusion Models