Why we run background removal in your browser, not on a server

Server-side AI costs money per call. WebAssembly doesn't. Here's how we ship AI tools for free using ONNX Runtime Web.

Jan Stepien · 2026-05-12

Every mainstream background-removal service charges per image: typically $0.02–$0.15 per call after the free tier. For a small utility site serving thousands of users, that math breaks quickly — you either cap usage aggressively, show ads to cover the bill, or find a different architecture entirely. We chose a different architecture: run the AI model entirely in the user's browser using WebAssembly and ONNX Runtime Web. The server cost per image is exactly $0.00.

How background removal AI works

Modern background removal typically uses a segmentation model: a neural network trained to estimate, for every pixel, how likely it is to be foreground (a person or object) rather than background. The most widely used architecture for this is U-Net and its variants, which use an encoder-decoder structure with skip connections to produce pixel-level masks at full resolution.

The specific model we use is RMBG-1.4 from BRIA AI, released under a source-available licence that permits non-commercial use (commercial use requires a separate agreement with BRIA). It is a lightweight U-Net-style network, built on the IS-Net architecture and trained on a diverse dataset of people, products, and objects. It takes a 1024×1024 input and has roughly 44 million parameters: small enough to run in a browser, large enough to produce production-quality masks on most subjects.

Inference happens in three stages: preprocess the image (resize it to 1024×1024 and normalise pixel values to the range the model expects), run the forward pass through the model to produce an alpha mask, then postprocess (resize the mask back to the original image dimensions and apply it as the image's alpha channel in a PNG). The entire pipeline runs in under 2 seconds on a modern desktop and in 4–6 seconds on a mid-range mobile phone.
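The preprocessing stage is mostly a change of memory layout: canvas pixel data is interleaved RGBA, while the model takes planar RGB channels, rescaled to [0, 1] and normalised with mean 0.5 and std 1 per channel (the values RMBG-1.4 expects). Here is a minimal sketch of that conversion, assuming the image has already been drawn at 1024×1024; the function and constant names are illustrative, not taken from any library.

```javascript
// Sketch: interleaved RGBA bytes -> planar, normalised Float32Array (CHW order).
// Assumes the pixels were already resized to the model's input size.
const MEAN = [0.5, 0.5, 0.5];
const STD = [1, 1, 1];

/**
 * @param {Uint8ClampedArray} rgba  interleaved RGBA bytes, length width*height*4
 * @param {number} width
 * @param {number} height
 * @returns {Float32Array} planar RGB values, length 3*width*height
 */
function rgbaToNormalizedCHW(rgba, width, height) {
  const plane = width * height;
  if (rgba.length !== plane * 4) {
    throw new RangeError(`expected ${plane * 4} bytes, got ${rgba.length}`);
  }
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      // Rescale 0..255 to 0..1, then normalise: (x - mean) / std.
      // The alpha byte (offset 3) is ignored.
      out[c * plane + i] = (rgba[i * 4 + c] / 255 - MEAN[c]) / STD[c];
    }
  }
  return out;
}
```

In a browser you would pass it ctx.getImageData(0, 0, 1024, 1024).data and wrap the result as new ort.Tensor('float32', input, [1, 3, 1024, 1024]) for ONNX Runtime Web.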

What is ONNX Runtime Web?

ONNX (Open Neural Network Exchange) is a vendor-neutral format for ML models. PyTorch, TensorFlow, and JAX can all export to ONNX. ONNX Runtime is Microsoft's inference engine for ONNX models — it's what powers inference in many Azure services and Windows ML.

ONNX Runtime Web is the browser port of ONNX Runtime, compiled to WebAssembly via Emscripten. It exposes the same JavaScript API as the Node.js version, runs entirely client-side with no native dependencies, and supports WebGL and WebGPU backends for GPU acceleration where available. The core WASM binary is ~5 MB; the RMBG model file adds ~176 MB (compressed: ~88 MB) — loaded once and cached by the browser's Cache API across sessions.
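The ~176 MB figure is simply the parameter count multiplied by the size of each weight. A back-of-the-envelope check, assuming plain float32 storage and ignoring the small overhead of the ONNX graph itself (the dtype table is our own illustration):

```javascript
// Raw weight size = parameter count x bytes per parameter.
// 44 million is the approximate RMBG-1.4 parameter count quoted above.
const RMBG_PARAMS = 44e6;

const BYTES_PER_PARAM = { float32: 4, float16: 2, uint8: 1 };

function weightMegabytes(params, dtype) {
  const bytes = BYTES_PER_PARAM[dtype];
  if (bytes === undefined) throw new Error(`unknown dtype: ${dtype}`);
  return (params * bytes) / 1e6; // decimal megabytes
}

for (const dtype of Object.keys(BYTES_PER_PARAM)) {
  // prints: float32 176 MB, float16 88 MB, uint8 44 MB (one per line)
  console.log(dtype, weightMegabytes(RMBG_PARAMS, dtype), 'MB');
}
```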

The architecture in detail

Here is the exact sequence when a user drops an image into quickhelp.dev's Background Remover:

1. The first time the user interacts with the tool, the ONNX Runtime Web WASM module and the RMBG model file are fetched from a CDN (the Hugging Face CDN used by Transformers.js) and stored with the browser's Cache API. Subsequent uses load both from the cache and can run fully offline.
2. The image is drawn onto an HTMLCanvasElement at 1024×1024. Pixel data is extracted via getImageData(), and its interleaved RGBA bytes are converted into a planar Float32Array tensor of shape [1, 3, 1024, 1024] (batch, channels, height, width). Each value is rescaled to [0, 1] and then normalised with mean=[0.5, 0.5, 0.5] and std=[1, 1, 1], as the model expects.
3. The tensor is passed to session.run() on the ONNX Runtime session. This runs the forward pass through the 44M-parameter network inside WASM, optionally accelerated by WebGL or WebGPU.
4. The output is a single-channel mask tensor of shape [1, 1, 1024, 1024] with values in [0, 1]. We resize it back to the original image dimensions using bilinear interpolation on the canvas, then write it into the alpha channel of the original pixel data.
5. The result is exported as PNG (preserving the alpha channel) via canvas.toBlob() and offered as a download. No pixel of the user's image ever leaves their device.
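The canvas handles the resize and alpha steps above natively. For readers who want to do them by hand (for example inside a worker without a canvas), here is a plain-JavaScript sketch. The function names are ours, and the pixel-centre alignment is an assumption; a browser's own canvas scaling may differ slightly at the edges.

```javascript
/**
 * Bilinear resize of a single-channel float image stored row-major.
 * Pixel centres are aligned (source coord = (dst + 0.5) * scale - 0.5).
 */
function resizeBilinear(src, sw, sh, dw, dh) {
  const dst = new Float32Array(dw * dh);
  const sx = sw / dw;
  const sy = sh / dh;
  for (let y = 0; y < dh; y++) {
    // Map the destination row centre into the source, clamped to the image.
    const fy = Math.min(Math.max((y + 0.5) * sy - 0.5, 0), sh - 1);
    const y0 = Math.floor(fy);
    const y1 = Math.min(y0 + 1, sh - 1);
    const wy = fy - y0;
    for (let x = 0; x < dw; x++) {
      const fx = Math.min(Math.max((x + 0.5) * sx - 0.5, 0), sw - 1);
      const x0 = Math.floor(fx);
      const x1 = Math.min(x0 + 1, sw - 1);
      const wx = fx - x0;
      // Interpolate horizontally on two rows, then vertically between them.
      const top = src[y0 * sw + x0] * (1 - wx) + src[y0 * sw + x1] * wx;
      const bottom = src[y1 * sw + x0] * (1 - wx) + src[y1 * sw + x1] * wx;
      dst[y * dw + x] = top * (1 - wy) + bottom * wy;
    }
  }
  return dst;
}

/** Write a [0, 1] mask into the alpha byte of interleaved RGBA pixels, in place. */
function applyAlphaMask(rgba, mask) {
  if (rgba.length !== mask.length * 4) {
    throw new RangeError('mask and image sizes do not match');
  }
  for (let i = 0; i < mask.length; i++) {
    // Uint8ClampedArray rounds (half to even) and clamps, so 0.5 * 255 becomes 128.
    rgba[i * 4 + 3] = mask[i] * 255;
  }
  return rgba;
}
```

In the pipeline above, src would be the 1024×1024 mask (sw = sh = 1024) and dw, dh the original image size; the RGBA buffer can then go back onto a canvas with putImageData().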

Why this approach is better for users

Privacy: The image never leaves the browser. With server-side processing, your image is transmitted to an external server, processed, and (in most services) retained for quality monitoring, abuse detection, or model improvement. Running client-side removes that exposure altogether. This matters for personal photos, confidential product images, and any context where you would not want a third party to see the image.

No throttling: Server-based APIs throttle free users. WASM inference in the browser is limited only by the user's hardware — process one image or a hundred, at full quality, with no rate limits or queue wait times.

Offline capable: After the first load, the model and runtime are cached. You can drop images and get results with no network connection — useful on planes, in areas with poor connectivity, or in enterprise environments with network egress restrictions.
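Offline use rests on a cache-first lookup: check the Cache API before touching the network. A minimal sketch, assuming you manage the cache yourself rather than relying on a library's built-in caching; the cache name is hypothetical, and fetchFn is a parameter only so the function can be tested without a network.

```javascript
// Hypothetical cache name; bump the suffix when the model file changes.
const MODEL_CACHE = 'model-cache-v1';

/**
 * Return a cached Response for `url` if one exists; otherwise fetch it and
 * store a copy so the next call (even offline) is served from the cache.
 */
async function cacheFirst(url, cache, fetchFn = fetch) {
  const hit = await cache.match(url);
  if (hit) return hit; // offline path: no network request at all
  const res = await fetchFn(url);
  if (res.ok) {
    // A Response body can only be read once, so store a clone and return the original.
    await cache.put(url, res.clone());
  }
  return res;
}
```

In a browser this would be called as cacheFirst(modelUrl, await caches.open(MODEL_CACHE)), where modelUrl is whatever URL the model is served from.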

The tradeoffs

This architecture is not free of tradeoffs. The 88 MB first-load download is the most significant — we mitigate it by loading the model only when the user first interacts with the tool (not on page load), showing a clear progress indicator, and caching aggressively. On slow connections this can take 30+ seconds; we show an estimated download time so users can decide whether to wait.
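Both mitigations are small pieces of code. A sketch, assuming loader is whatever function creates your inference session (both helper names are illustrative): lazyOnce defers loading until the first call and shares one download between concurrent callers, and estimateDownloadSeconds turns a bandwidth figure into a wait time.

```javascript
/** Run `loader` on the first call only; concurrent and later calls share its promise. */
function lazyOnce(loader) {
  let promise = null;
  return function get() {
    if (promise === null) {
      promise = loader().catch((err) => {
        promise = null; // forget the failure so the next call can retry
        throw err;
      });
    }
    return promise;
  };
}

/** Seconds to download `bytes` at `megabitsPerSecond` (decimal units: 1 MB = 8 Mbit). */
function estimateDownloadSeconds(bytes, megabitsPerSecond) {
  if (!(megabitsPerSecond > 0)) return Infinity; // unknown or zero bandwidth
  return (bytes * 8) / (megabitsPerSecond * 1e6);
}

// 88 MB at 10 Mbit/s: 704 Mbit / 10 Mbit/s = 70.4 s
console.log(estimateDownloadSeconds(88e6, 10).toFixed(1)); // "70.4"
```

Where the Network Information API is available (Chromium-based browsers), navigator.connection.downlink reports an estimated bandwidth in megabits per second that can be passed straight in; elsewhere, the rate can be measured from the bytes received so far.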

Model quality is a second tradeoff. RMBG-1.4 is excellent but not state-of-the-art — commercial APIs like remove.bg use larger models updated continuously. For hair strands, complex fur, and smoke, the WASM model produces slightly rougher edges. For the majority of common use cases (product photos, profile pictures, simple object isolation), the quality is indistinguishable from commercial alternatives.

CPU usage: A 1024×1024 inference pass keeps a CPU core busy for 1–3 seconds. On laptops this is fine; on budget phones, running it on the main thread would briefly freeze the UI. We therefore run inference in a Web Worker so the main thread stays responsive.
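With inference in a worker, the page talks to it through messages. A minimal sketch of the main-thread side, assuming the worker replies with { id, result } or { id, error }; that message shape and the helper name are our own convention, not a browser API.

```javascript
/**
 * Wrap a Worker (or anything with addEventListener/postMessage) in a
 * promise-based request function that matches replies to requests by id.
 */
function createWorkerClient(worker) {
  let nextId = 0;
  const pending = new Map(); // id -> { resolve, reject }

  worker.addEventListener('message', (event) => {
    const { id, result, error } = event.data;
    const entry = pending.get(id);
    if (!entry) return; // not a reply to one of our requests
    pending.delete(id);
    if (error !== undefined) entry.reject(new Error(error));
    else entry.resolve(result);
  });

  return function request(payload, transfer = []) {
    const id = nextId++;
    return new Promise((resolve, reject) => {
      pending.set(id, { resolve, reject });
      // Listing a buffer in `transfer` moves it to the worker instead of copying it.
      worker.postMessage({ id, payload }, transfer);
    });
  };
}
```

For example, with data from getImageData(), request({ width, height, pixels: data.buffer }, [data.buffer]) hands the pixels to the worker without a copy.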

How to build this yourself

The implementation is straightforward with the Transformers.js library, which wraps ONNX Runtime Web and handles model loading from Hugging Face:

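import { AutoModel, AutoProcessor, RawImage } from '@huggingface/transformers';

// RMBG-1.4 uses a custom architecture, so it is loaded as a generic model
const model = await AutoModel.from_pretrained('briaai/RMBG-1.4', {
  config: { model_type: 'custom' },
  device: 'wasm',   // or 'webgpu' if available
});
// Resize to 1024x1024, rescale to [0, 1], normalise with mean 0.5 / std 1
const processor = await AutoProcessor.from_pretrained('briaai/RMBG-1.4', {
  config: {
    feature_extractor_type: 'ImageFeatureExtractor', do_pad: false,
    do_resize: true, size: { width: 1024, height: 1024 }, resample: 2,
    do_rescale: true, rescale_factor: 1 / 255,
    do_normalize: true, image_mean: [0.5, 0.5, 0.5], image_std: [1, 1, 1],
  },
});

const image = await RawImage.fromURL(imageUrl);
const { pixel_values } = await processor(image);   // shape [1, 3, 1024, 1024]
const { output } = await model({ input: pixel_values });

// output[0] is a [1, 1024, 1024] mask in [0, 1]: scale it to 8-bit and resize
// it to the original dimensions (mask.data then holds one byte per pixel)
const mask = await RawImage.fromTensor(output[0].mul(255).to('uint8'))
  .resize(image.width, image.height);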

The device: 'wasm' option uses ONNX Runtime Web's WASM backend, which works in all modern browsers. Switching to 'webgpu' runs the model on the GPU in Chrome 113+ and can cut inference time to under 500 ms on discrete GPUs.
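"If available" hides a subtlety: navigator.gpu can exist while requestAdapter() still resolves to null when no suitable GPU is available. A small sketch that checks for a real adapter before choosing the device; pickDevice is our own helper name, and falling back to 'wasm' on any failure is an assumption about what you want.

```javascript
/**
 * Choose 'webgpu' only when WebGPU is exposed and an adapter is actually
 * available; otherwise fall back to the WASM backend. `nav` is a parameter
 * (defaulting to the global navigator) so the helper can be tested.
 */
async function pickDevice(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return 'wasm';
  try {
    const adapter = await nav.gpu.requestAdapter(); // resolves to null if unavailable
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    return 'wasm';
  }
}
```

The result can be passed as the device option, e.g. { device: await pickDevice() }.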

Try it now

quickhelp.dev's Background Remover runs the full RMBG-1.4 model in your browser. Drop any image — JPEG, PNG, WebP — and download the result as a transparent PNG. No account, no upload, no charge.