Image Loading on the Web

You would think that loading images in the browser is a solved problem. After all, websites are basically composed of text and images.

Turns out that loading images from JavaScript to use in WebGL/WebGPU applications is not nearly as simple or well supported as it should be. There are several APIs for it, and the right choice depends on which browser you are targeting. They each have different performance characteristics, and some have unexpected behaviors or outright bugs.
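To give a flavor of the problem, here is an illustrative sketch of the fallback logic such a loader ends up needing. The capability object and helper name are hypothetical, not a real API; in a browser you would populate it with feature checks like `typeof createImageBitmap === "function"`.

```javascript
// Hypothetical helper: pick an image-decoding strategy from the
// capabilities the current browser exposes.
function pickImageLoader(caps) {
  if (caps.createImageBitmap && caps.premultiplyOption) {
    // createImageBitmap can decode off the main thread and lets you
    // control premultiplied alpha and Y-flip at decode time.
    return "createImageBitmap";
  }
  if (caps.imgDecode) {
    // HTMLImageElement.decode() resolves once the image is ready,
    // avoiding a synchronous decode stall at texture-upload time.
    return "img-decode";
  }
  // Last resort: wait for the onload event and hope decoding is cheap.
  return "img-onload";
}
```

The point is not the specific branches but that the "right" path differs per browser, which is exactly the mess the post digs into.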

This is surprising considering that loading images is one of the most fundamental things a browser does.

Continue reading →

Metal Lossy Compression Format

The Apple Lossy compression format was introduced in the A15 and M2 chipsets (which share the same GPU generation) and it enables sampling and rendering to textures with a 1:2 compression ratio. The format is transparent to the application: the GPU handles compression and decompression automatically, and the block layout is never exposed through the API.

A few weeks ago I wrote about it in Hardware Image Compression and gave a brief overview of how it’s enabled at the API level and how the results compare to other hardware compression implementations. In this blog post I’ll take a deeper look at the underlying block format and how the hardware decoder is implemented.

Unlike traditional GPU compression formats, the lossy format is not documented. There’s no spec that you can read to understand how the format works. In order to do that I employed the same technique I described in Writing to Compressed Textures in Metal.

Continue reading →

The True Size of ASTC Textures

In my previous post Writing to Compressed Textures in Metal, I showed how Metal heaps can be used to alias buffers and textures to inspect their actual memory layout. Something that caught my attention was how much memory is wasted with non-power-of-two textures. On many devices the texture dimensions get rounded up to the next power of two, and you end up paying for padding memory you never use.

You may think that non-power-of-two textures do not have much use outside of render targets, but it turns out they are far more prevalent than most people realize. When allocating ASTC textures with power-of-two pixel dimensions, the resulting textures often have non-power-of-two block dimensions, and it's the block dimensions, not the pixel dimensions, that determine how much memory the texture occupies.

Figure: ASTC 6×6 block alignment for a 64×64 texture, nominal vs pow2-aligned layout. Left: the exact 11×11 block grid (ceil(64/6) = 11), 121 blocks × 16 bytes = 1,936 bytes, 3.56 bpp nominal. Right: each dimension rounds up from 11 to 16 blocks, so the memory layout holds 16×16 = 256 blocks × 16 bytes = 4,096 bytes, 8.00 bpp, a 2.25× overhead, with the extra blocks as wasted padding.

How bad is it in practice? I ran a test on several mobile devices to find out. In the tables below I compare the nominal bpp, that is, the exact memory footprint with no alignment padding, against the actual bpp once alignment is taken into account.
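The arithmetic behind that comparison fits in a few lines. This is a sketch, not the measurement code from the post: it assumes (as the heap experiments suggested) that each block-grid dimension is rounded up to the next power of two. Every ASTC block is 16 bytes regardless of block size.

```javascript
// Nominal vs pow2-aligned footprint for an ASTC texture.
function astcFootprint(width, height, blockDim) {
  const ceilDiv = (a, b) => Math.ceil(a / b);
  const pow2 = (n) => 2 ** Math.ceil(Math.log2(n));
  const blocksW = ceilDiv(width, blockDim);  // e.g. ceil(64/6) = 11
  const blocksH = ceilDiv(height, blockDim);
  const nominalBytes = blocksW * blocksH * 16;                 // 121 * 16 = 1936
  const alignedBytes = pow2(blocksW) * pow2(blocksH) * 16;     // 256 * 16 = 4096
  return {
    nominalBytes,
    alignedBytes,
    nominalBpp: (16 * 8) / (blockDim * blockDim),              // format rate: ~3.56 for 6x6
    alignedBpp: (alignedBytes * 8) / (width * height),         // 8.00 for the 64x64 case
  };
}
```

For the 64×64 ASTC 6×6 example this reproduces the figures above: 1,936 bytes nominally vs 4,096 bytes allocated, a 2.25× overhead in bpp terms.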

Continue reading →

Writing to Compressed Textures in Metal

About a year ago I wrote GPU Texture Compression Everywhere, a post in which, among other things, I lamented that Metal did not have support for writing to compressed textures.

Unlike Vulkan or D3D12, Metal doesn’t support resource casting. There’s no way to write to a compressed texture through an uncompressed view. The only way we can do that is by using a blit operation, so we need to output our results to a temporary buffer and then copy the contents of the buffer to the texture. This requires a temporary memory allocation that needs to be managed, and if the buffer is reused for multiple uploads, hazard tracking may add some synchronization overhead.

I requested this feature from Apple when I started working on Spark, more than 3 years ago. Since then not much progress has been made, and support is still not available, so I decided it was time to take matters into my own hands.

Continue reading →

Hardware Image Compression

One of the things I've always lamented about hardware image formats is the slow pace of innovation. Developers were usually unwilling to ship textures in a new format unless that format was widely available. That is, the format had to be supported by the majority of the hardware they were targeting, and across all vendors.

For example, even though ATI introduced the 3Dc formats in 2004 with the Radeon X800 (R420) and exposed them through D3D9 extensions, their use did not become widespread when Direct3D 10 standardized them as BC4 and BC5 in 2007; it only did once Direct3D 10 became the minimum hardware requirement.

Crysis was the first major game to ship with BC5 textures, but most games were not willing to impose such a steep hardware requirement until many years later. To avoid these adoption delays, the BC6 & BC7 formats were designed in collaboration between ATI and NVIDIA for inclusion in Direct3D 11.

Hardware development cycles are already long, and for a new format to gain adoption it needs to be proposed for standardization, which often makes the process even longer.

This is one of the reasons why I find real-time texture compression so exciting. When the encoder runs in real-time it’s a lot easier to introduce new hardware formats, because adopting a new format no longer requires waiting for content to be created targeting it.

Continue reading →

Announcing spark.js 0.1

I’m excited to announce spark.js 0.1, now with WebGL support!

spark.js has been evolving since I released it last summer. Since then, the WebGPU ecosystem has matured considerably. WebGPU is now more stable and widely supported across browsers and platforms. However, users kept telling me the same thing: even though targeting WebGPU is practical today, most teams have codebases that still rely on WebGL, and that made adoption difficult. For that reason I committed to adding WebGL support.

This felt like the right moment to bump the version number to 0.1 and signal that spark.js is production ready, not just experimental. That said, I expect the API to continue evolving based on the features developers need and the friction points they encounter.

Continue reading →

Runtime Mipmap Generation

One of the advantages of runtime texture compression is that mipmaps can also be computed at runtime and do not need to be transmitted. In contrast, offline codecs must precompute mipmaps so they can be compressed and transmitted alongside the base image.

Each mip level requires one quarter of the memory of its parent, so a full mipmap chain increases the total texture size by approximately 33%. In principle, offline codecs could attempt to exploit correlation between mip levels, but in practice they rarely do. In fact, rate-distortion optimized codecs often exceed the theoretical 33% overhead, because lower-resolution mip levels tend to have higher entropy and compress less efficiently.

By generating mipmaps on the device, we eliminate the need to transmit them entirely, reducing the total transmitted texture data by 25%. This is a dramatic reduction by image codec standards, and can be achieved without any loss of quality. The tradeoff is that mipmaps must be generated at runtime, but fortunately this is a task that GPUs excel at.
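The 33% and 25% figures fall out of a geometric series; here is a quick sketch of the arithmetic (the 4096×4096 example is just an illustration):

```javascript
// Each mip level has one quarter of the pixels of its parent, so the
// chain adds 1/4 + 1/16 + 1/64 + ..., converging to 1/3 (~33%) of the
// base image's size.
function mipOverhead(levels) {
  let sum = 0;
  for (let i = 1; i <= levels; i++) sum += 0.25 ** i;
  return sum; // fraction of the base image added by the mip chain
}

// A 4096x4096 texture has 12 mip levels below the base. Dropping them
// from the payload removes 1/3 out of a total of 4/3, i.e. a 25% cut.
const overhead = mipOverhead(12);            // ~0.3333
const reduction = overhead / (1 + overhead); // ~0.25
```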

Continue reading →

Normal Map Compression Revisited

Normal maps are one of the most widely used texture types in real-time rendering, but they’re also a bit unusual. They don’t represent color, rely on geometric assumptions, and small encoding or decoding details can lead to subtle artifacts.

This article takes a practical look at how normal maps are commonly compressed today, the tradeoffs involved, and a few pitfalls that are easy to overlook. We'll look at these details and see how they're handled in practice in the context of spark.js and web 3D development.
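One detail of this kind: when only X and Y are stored (as in two-channel setups like BC5), Z must be reconstructed at decode time. Below is a minimal sketch assuming an 8-bit unorm [0, 255] mapping and tangent-space normals with Z ≥ 0; both assumptions are illustrative conventions, not necessarily what spark.js does.

```javascript
// Decode a two-channel normal: x and y come in as [0, 255] values,
// z is reconstructed from the unit-length constraint x^2+y^2+z^2 = 1.
function decodeNormal(rx, ry) {
  const x = (rx / 255) * 2 - 1;
  const y = (ry / 255) * 2 - 1;
  // Clamp: compression error can push x^2 + y^2 slightly above 1, and
  // the square root of a negative value would yield NaN.
  const z = Math.sqrt(Math.max(0, 1 - x * x - y * y));
  // Renormalize to absorb the remaining quantization error.
  const len = Math.hypot(x, y, z) || 1;
  return [x / len, y / len, z / len];
}
```

Skipping the clamp or the renormalization is exactly the kind of small decoding detail that shows up later as subtle shading artifacts.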

Continue reading →

Choosing Texture Formats for WebGPU Applications

This is a supplement to Don McCurdy’s excellent article, Choosing texture formats for WebGL and WebGPU applications. Don covers the tradeoffs between modern image formats like WebP and AVIF, and GPU optimized container formats like KTX2, with practical guidance on when each makes sense.

Since that article was published, a new option has emerged: spark.js, a real-time GPU texture compression library for WebGPU, which makes it possible to ship textures as standard web images while still ending up with efficient block-compressed textures in GPU memory.

Prior to the arrival of spark.js, developers wanting to load textures in their applications effectively had three options:

  1. use traditional image formats like PNG, JPEG, WebP, or AVIF,
  2. use GPU-specialized block compression formats like BC7 or ASTC, or
  3. use universal GPU texture formats like KTX2.
Continue reading →