Web Performance Optimization: Part 1 - Network and Assets Optimization

The article discusses optimizing web performance by focusing on network and assets. It highlights the importance of using CDNs to cache resources closer to users, reducing HTTP requests, and addressing issues with HTTP/1.1 like head-of-line blocking and TCP slow start. HTTP/2's multiplexing and HTTP/3's QUIC protocol are also covered for their performance benefits. The article emphasizes reducing static asset sizes through techniques like code splitting, tree shaking, and compression. It also explores caching strategies, including HTTP caching, service worker caching, and resource hints like preload and prefetch. Lazy loading techniques, such as Intersection Observer and native image lazy loading, are also discussed.

In my article Web Performance Metrics, I introduced some crucial metrics that are useful for enhancing user experience.

To make performance optimization more structured, and in line with the metrics in Web Vitals, I split the work of improving web performance into two main areas: network and assets optimization, and page rendering optimization. Network and assets optimization focuses on the network and static resources, with the key goal of letting pages fetch critical static assets quickly under good network conditions. Page rendering optimization aims to get the browser to render the page for the user as swiftly as possible.

Let's dive into the first part of web performance enhancement, which is the main focus of this article: network and assets optimization.

CDN

CDN stands for Content Delivery Network. CDN providers cache resources from the original servers to high-performance nodes around the world. When a user requests web resources, they are directed to the nearest node, and the IP of that node is returned to the user. This ensures the user receives the content from the closest node, resulting in faster and more stable transmission. The CDN has two core concepts: caching and fetching from the source:

  • Cache: Resources requested from the source server are cached as needed.

  • Back-to-source: If CDN nodes don't have the required resources (either because they were never cached or the cache has expired), they will retrieve them from the original server.

There are several CDN providers to choose from. I've shared a method to use CDN in Next.js with Vercel in my post: A hands-on SSG implementation using the Next.js App router.

HTTP & TCP

Optimizing the HTTP application layer and the TCP transport layer is crucial for front-end performance. We all know that merging (reducing) HTTP requests for static resources (e.g. image sprites) is a common optimization guideline. But what's the rationale behind it? Is it really necessary to strictly reduce HTTP requests? Let's explore the principles below.

Disadvantages of HTTP/1.1

HTTP/1.1 Head-of-Line Blocking Issue

We all know that HTTP/1.1 uses persistent connections, which allow a single TCP pipeline to be shared, but only one request can be processed in that pipeline at a time; until the current request completes, all subsequent requests are blocked. This means that if a request stalls for 10 seconds, every queued request behind it is delayed by 10 seconds as well. This head-of-line blocking prevents requests from being processed in parallel, which is why browsers like Chrome open multiple TCP connections per domain (six by default), distributing requests across separate connections to achieve parallel processing.

TCP Slow Start

TCP slow start is a strategy for TCP congestion control. When a TCP connection is first established, it goes through a slow-start phase in which the number of packets sent increases gradually: each acknowledgment the sender receives grows the congestion window by one segment. The performance issue is that even a small, critical page resource must go through this slow-start process, which can significantly delay page rendering.

Therefore, considering the drawbacks of HTTP/1.1, the image sprites mentioned earlier are genuinely useful. A single TCP connection can handle only one HTTP request at a time, and the browser limits the number of TCP connections per domain, so a page with many resources loads slowly. In this situation, combining several small images into one large image to reduce HTTP requests is highly advisable.

Advantages and Disadvantages of HTTP/2

Multiplexing

HTTP/2 adopts a multiplexing mechanism.

HTTP/2 introduces a binary framing layer, where the browser converts each request into multiple frames with request ID numbers. After the server receives all the frames, it merges frames with the same ID into a complete request. After processing the request, it converts the response into multiple frames with response ID numbers, and the browser merges the frame data based on the ID numbers. Through this mechanism, HTTP/2 achieves parallel transmission of resources. Additionally, since HTTP/2 uses only one TCP connection per domain, it solves the head-of-line blocking issue of HTTP/1.1 and also addresses the bandwidth competition problem caused by multiple TCP connections.

Previously, we discussed CSS sprites. With HTTP/2, multiple requests are no longer a significant performance issue. Optimizing image formats and sizes, like using WebP, offers better performance improvements. As a result, using sprite images in the frontend is no longer considered a best practice with HTTP/2.

Other Advantages

  • Priorities: Based on the binary framing layer, HTTP/2 can also set request priorities, solving the resource priority issue.

  • Server push: allows the server to send key resources to the browser before it requests them, so the resources on the critical rendering path are available as soon as the HTML is parsed. (In practice server push saw little adoption, and Chrome has since removed support for it.)

  • Header compression: HTTP/2 compresses request and response headers.

Disadvantage: TCP Head-of-Line Blocking Issue in HTTP/2

HTTP/2 solves the application layer head-of-line blocking issue but does not change the TCP transport layer protocol, which is the same as HTTP/1.1. We know that TCP is a reliable, connection-oriented (one-to-one single connection) communication protocol. If a data packet is lost or delayed, the entire TCP connection will be paused, waiting for the lost or delayed data packet to be retransmitted. Moreover, in HTTP/2, only one TCP connection is used per domain, so all requests run on a single long TCP connection. If one data stream experiences packet loss, it will block all requests in that TCP connection, thus affecting the transmission efficiency of HTTP/2.

Prospects of HTTP/3

QUIC Protocol

Browser support: Chrome 87, Edge 87, Firefox 88; partial support in Safari.

The main difference between HTTP/3 and HTTP/2 is that HTTP/3 uses QUIC, which handles streams natively at the transport layer, whereas HTTP/2 implements streams at the HTTP layer on top of TCP.

QUIC can be seen as a protocol that integrates "TCP + HTTP/2 multiplexing + TLS". Its fast handshake feature (based on UDP) allows connections to be established using 0-RTT or 1-RTT, significantly improving the speed of the first page load. However, currently, HTTP/3 has compatibility issues with browsers, and Safari does not support this feature by default.

Preconnect & DNS-Prefetch

Both preconnect and dns-prefetch are part of the Resource Hints standard and are used as follows:

```html
<link rel="preconnect" href="https://example.com">
<link rel="dns-prefetch" href="https://example.com">
```

  • preconnect (browser support: Chrome 46, Edge 79, Firefox 115, Safari 11.1)

It pre-completes the DNS lookup, TCP handshake, and TLS handshake (for HTTPS) before a resource is requested, so when the request is finally made it can be sent immediately over the already-established connection. Although straightforward, it consumes valuable CPU time, especially on secure connections, and if the connection isn't used within 10 seconds the browser closes it, wasting all the setup work.

  • dns-prefetch (browser support: Chrome 4, Edge 12, Firefox 127, Safari 5)

Browser compatibility is better, but it only handles DNS queries.

Reducing static asset size

Let's explore how to optimize the size of common resources. Most of these resources can be optimized during the webpack packaging stage, and additional compression can be applied at the HTTP level.

HTML

The initial HTML should be kept under 14 KB so that it fits within TCP's initial congestion window and can be delivered in the first round trip; anything larger costs additional round trips that delay first-screen rendering. The webpack plugin html-webpack-plugin has a minify: true option to enable HTML compression, as sketched below. Additionally, avoid overusing inline CSS styles and JS scripts.
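
A minimal sketch of that configuration (the template path is illustrative):

```js
// webpack.config.js: minify: true collapses whitespace, strips comments,
// and removes redundant attributes in the emitted HTML
const HtmlWebpackPlugin = require('html-webpack-plugin');

module.exports = {
  plugins: [
    new HtmlWebpackPlugin({
      template: './src/index.html', // illustrative template path
      minify: true,
    }),
  ],
};
```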

JS

Let's highlight some cool things we can do with Webpack.

  • scope hoisting

    In webpack, scope hoisting detects whether a chain of imports can be inlined into a single module scope, reducing unnecessary module wrapper code. It is implemented by the ModuleConcatenationPlugin, which is enabled by default in production mode.

  • code splitting

    Webpack's code splitting separates code into different bundles, which can then be loaded on demand or in parallel, yielding smaller bundles and shorter load times. Common approaches include (see the sketch after this list):

    1. The out-of-the-box SplitChunksPlugin configuration can automatically split chunks.

    2. Dynamic import can achieve on-demand code loading.

    3. Using the entry configuration can set up multiple entry points for code bundling, allowing for manual code separation.

  • tree shaking

    Tree shaking is the JavaScript term for removing unused (dead) code. Webpack 2 shipped built-in support for ES2015 modules along with detection of unused module exports. Webpack 4 extended this feature with the "sideEffects" property in package.json, a hint telling the compiler which files in the project are "pure" and therefore safe to remove if unused.

  • minify

    Webpack 4+ uses the TerserPlugin for code compression by default in production environments.
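
As promised above, here is a minimal dynamic-import sketch; the module path and button element are illustrative. Webpack emits ./math.js as a separate chunk that is fetched only on first use:

```js
// Clicking the button triggers the first (and only) download of the math chunk
const button = document.querySelector('#calc'); // illustrative element
button.addEventListener('click', async () => {
  const { add } = await import('./math.js'); // split into its own chunk
  console.log(add(1, 2));
});
```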

CSS

  • Inline Critical CSS

    Inlining critical CSS can speed up page rendering. CSS blocks JavaScript execution, and JavaScript blocks DOM construction, which in turn blocks rendering, so CSS can indirectly delay rendering as well. Some CSS-in-JS solutions, like styled-components, are critical-CSS friendly: styled-components tracks the components rendered on a page and injects their styles inline instead of referencing CSS files via links. Combined with component-level code splitting, this ensures that only the styles actually needed are loaded.

  • Asynchronous CSS Loading

    For large CSS files, you can split them into multiple purpose-specific files and annotate each link with a media attribute; a stylesheet whose media query doesn't match the current environment is fetched at low priority and doesn't block rendering.

  • CSS File Compression

    You can extract CSS into its own files with webpack's mini-css-extract-plugin and compress it with css-minimizer-webpack-plugin, as sketched below.
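
A minimal sketch of that setup, assuming webpack 5 (the '...' entry keeps webpack's default JS minimizer active alongside the CSS one):

```js
// webpack.config.js
const MiniCssExtractPlugin = require('mini-css-extract-plugin');
const CssMinimizerPlugin = require('css-minimizer-webpack-plugin');

module.exports = {
  module: {
    rules: [
      // Extract CSS into its own files instead of injecting <style> tags
      { test: /\.css$/i, use: [MiniCssExtractPlugin.loader, 'css-loader'] },
    ],
  },
  plugins: [new MiniCssExtractPlugin()],
  optimization: {
    minimizer: ['...', new CssMinimizerPlugin()],
  },
};
```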

Image Compression

Webpack's img-loader supports various image compression plugins. If an image is very small, we can also inline it as a base64 data URI using url-loader, saving a request.
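
A minimal url-loader sketch, assuming an 8 KB inlining threshold (in webpack 5, the built-in asset modules achieve the same with type: 'asset'):

```js
// webpack.config.js
module.exports = {
  module: {
    rules: [
      {
        test: /\.(png|jpe?g|gif)$/i,
        use: [
          {
            loader: 'url-loader',
            // Images under 8 KB become inline base64 data URIs;
            // larger images are emitted as separate files
            options: { limit: 8192 },
          },
        ],
      },
    ],
  },
};
```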

HTTP Layer Resource Compression

This mainly uses the Content-Encoding entity header to compress data of specific media types. With an Nginx configuration, we can achieve this:

```nginx
# Enable gzip compression
gzip on;
# Set the minimum HTTP protocol version required for gzip (HTTP/1.1, HTTP/1.0)
gzip_http_version 1.1;
# Set the compression level; the higher the level, the longer compression takes (1-9)
gzip_comp_level 4;
# Set the minimum number of bytes to compress, read from the Content-Length header
gzip_min_length 1000;
# Set the types of files to compress (text/html is always compressed)
gzip_types text/plain application/javascript text/css;
```

Caching in assets

HTTP caching

Fresh and stale based on age

Stored HTTP responses can be either fresh or stale. A fresh response is still valid and can be reused, while a stale response means the cached data has expired. The freshness of a response is determined by two related headers: Expires and Cache-Control's max-age.

  • Expires

    Its value is a GMT timestamp returned by the server, describing an absolute expiry time. Because Expires relies on the client's local clock, a modified or skewed local time can make the cache expire incorrectly.

  • Cache-Control

    There are several main values:

    • no-cache: the browser must re-validate the cached version of the URL with the server every time before using it.

    • no-store: browsers and other intermediate caches (e.g. CDNs) never store any version of a file.

    • private: browsers can cache files, but intermediate caches cannot.

    • public: the response can be cached and stored by any cache, including intermediaries.

    • max-age: the duration of the cache, used to specify a relative amount of time.

If both of these headers appear together, Cache-Control's max-age takes precedence over Expires.
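
For illustration, a minimal Node/Express sketch of setting these headers (the routes and cache lifetimes here are assumptions):

```js
const express = require('express');
const app = express();

// Hashed static assets: safe for any cache to store for a long time
app.use('/static', express.static('dist', {
  setHeaders: (res) => res.set('Cache-Control', 'public, max-age=31536000'),
}));

// The HTML shell: cacheable, but must be revalidated on every use
app.get('/', (req, res) => {
  res.set('Cache-Control', 'no-cache');
  res.send('<!doctype html><html>...</html>');
});

app.listen(3000);
```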

Validation

Stale responses are not immediately discarded. HTTP has a mechanism to transform a stale response into a fresh one by asking the origin server. This is called validation, or sometimes, revalidation. The server will check whether the relevant resources are modified and updated, and return a 304 status code if there is no update, or return the latest resources and a 200 status code if there is a modification and update.

Validation is done by two pairs of headers: Last-Modified/If-Modified-Since and ETag/If-None-Match. Last-Modified indicates when the server considers the resource to have last been modified. The browser sends this value back as If-Modified-Since in the next request, allowing the server to check whether the resource has changed. ETag and If-None-Match work the same way, but generating the ETag value is more involved; it is usually a hash of the content or of the last-modified timestamp.

ETag has a higher priority than Last-Modified.

Using Long-term Caching

To utilize caching more effectively, we usually set a long cache lifetime for static resources. On each production build, to ensure the browser gets the latest resources, the filenames of modified static resources receive a new version hash, such as main.8e0d62a03.js becoming main.2h124j36.js. We can use webpack to generate such hash file fingerprints; it offers the following three kinds of hashes (a config sketch follows the list):

  • hash: Related to the entire project build. As long as the project is modified, the hash value of the entire project build will change.

  • chunkhash: Related to the chunk packaged by webpack. Different entries will generate different chunkhash values.

  • contenthash: Defined based on the file content. If the file content does not change, the contenthash remains the same.
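
A minimal output configuration using contenthash, the usual choice for long-term caching:

```js
// webpack.config.js: filenames change only when file contents change
module.exports = {
  output: {
    filename: '[name].[contenthash].js',
    chunkFilename: '[name].[contenthash].chunk.js',
  },
};
```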

Service worker caching

Service Workers intercept HTTP requests and use caching strategies to decide which resources to return to the browser. While both Service Worker caching and HTTP caching aim to improve performance, Service Worker caching provides more advanced features, such as detailed control over what gets cached and how the caching process is managed.

Here are some common Service Worker caching strategies (also a few out-of-the-box caching strategies provided by Workbox):

  • Network only: Always fetch the latest content from the network.

  • Network falling back to cache: Aim to provide the latest content. However, if the network fails or is unstable, slightly older content can be provided.

  • Stale-while-revalidate: Serve cached (possibly stale) content immediately while revalidating in the background, so future requests get updated content (sketched after this list).

  • Cache first, fall back to network: Prioritize serving content from the cache to improve performance, but the Service Worker should occasionally check for updates.

  • Cache only: Use only the cache.
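
To make the strategies concrete, here is a minimal hand-rolled stale-while-revalidate sketch (Workbox's StaleWhileRevalidate strategy provides this out of the box); the cache name is an assumption:

```js
// sw.js: answer from the cache immediately, refresh the cache in the background
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.open('assets-v1').then(async (cache) => {
      const cached = await cache.match(event.request);
      const network = fetch(event.request)
        .then((response) => {
          cache.put(event.request, response.clone()); // revalidate for next time
          return response;
        })
        .catch(() => cached); // fall back to the cached copy when offline
      return cached || network; // stale now, fresh later
    })
  );
});
```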

stale-while-revalidate

Browser support: Chrome 75, Edge 79, Firefox 68, Safari 14.

In the service worker strategies above, we mentioned stale-while-revalidate. It is also an HTTP cache-invalidation strategy, defined in RFC 5861. The strategy first returns data from the cache (which may be stale) while sending a fetch request to revalidate, eventually obtaining the latest data. Its usage is similar to max-age:

Cache-Control: max-age=1, stale-while-revalidate=59

If a request is repeated within the next 1 second, the previously cached value will still be the latest and will be used as is, without any revalidation. If the request is repeated between 1 and 59 seconds later, although the cache is stale, it can still be used directly while performing an asynchronous revalidation. After 59 seconds, the cache is completely stale and a network request is required.

Additionally, building on the stale-while-revalidate idea, there is a React Hooks data-fetching library called SWR. It lets components load cached data immediately while asynchronously revalidating the subscribed data, so components continuously and automatically receive updates.
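
A minimal SWR usage sketch; the /api/user endpoint and the fetcher are assumptions:

```jsx
import useSWR from 'swr';

const fetcher = (url) => fetch(url).then((res) => res.json());

function Profile() {
  // Cached data renders immediately; SWR revalidates in the background
  const { data, error } = useSWR('/api/user', fetcher);
  if (error) return <div>Failed to load</div>;
  if (!data) return <div>Loading...</div>;
  return <div>Hello, {data.name}!</div>;
}
```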

Above we covered the browser's HTTP cache and service worker cache. Let's briefly sort out the order (highest priority first) in which the browser consults caches when requesting a resource:

  1. memory cache (if any)

  2. service worker cache

  3. HTTP cache

  4. Server or CDN

Resource hints

preload

```html
<link rel="preload" href="sintel-short.mp4" as="video" type="video/mp4">
```

The rel attribute of the link element with the value preload allows you to declare fetch requests in the HTML <head>, indicating that the resource will be used soon. This way, the browser can load the resource early (also increasing its priority). This ensures that the resource is available sooner and is less likely to block the rendering of the page, thereby improving performance.

  • Scenarios for using preload

    The basic use of preload is to start loading, early, resources that would otherwise be discovered late. Although the browser's preloader can discover most resources in the HTML markup early, not all resources live in the HTML: some are hidden in CSS and JavaScript, and the browser cannot discover and download them ahead of time. In many cases, these resources end up delaying the first render or the loading of critical parts of the page.

    Font resources are very suitable for optimization using preload. In most cases, fonts are crucial for rendering text on the page, and their usage is deeply embedded in CSS. Even if the browser's preloader parses the CSS, it cannot determine whether they are needed.

    ```html
    <link rel="preload" href="font.woff2" as="font" type="font/woff2" crossorigin>
    ```

    (Note that font preloads must include the crossorigin attribute, because fonts are fetched in anonymous CORS mode; without it the browser fetches the font twice.)

    By using preload, we can increase the priority of font resources so that the browser can preload them early. There are cases showing that using preload for font loading can reduce the overall page load time by half.

  • Considerations for using preload

    1. Although the benefits of preload are obvious, overuse can waste users' bandwidth. Additionally, if the preloaded resource is not used within 3 seconds, a warning will be displayed in the browser's console.

    2. Do not omit the as attribute. Omitting the as attribute or using an invalid value will make the preload request equivalent to an XHR request, and the browser will not know what it is fetching, thus fetching it with a relatively low priority.

prefetch

```html
<link rel="prefetch" href="/library.js" as="script">
```

The usage of prefetch is similar to preload, but its purpose is quite different: it tells the browser to fetch resources that might be needed for the next navigation. Such resources are fetched at very low priority, because the browser knows that everything needed for the current page matters more than resources that might be needed for the next one; prefetch therefore speeds up the next navigation rather than the current one. Browser compatibility is good, with prefetch supported as far back as IE11. Note also how prefetched and preloaded resources are stored: if a resource is cacheable (for example, it has a valid cache-control header), it is stored in the HTTP cache and also placed in the browser's memory cache; if it is not cacheable, it is kept only in the memory cache until it is used.

Using Webpack for prefetch and preload

Webpack v4.6.0+ adds support for prefetching and preloading.

```js
import(/* webpackPrefetch: true */ './path/to/LoginModal.js');
```

This generates <link rel="prefetch" href="login-modal-chunk.js" as="script"> and appends it to the page head, instructing the browser to prefetch login-modal-chunk.js during idle time. Webpack injects the prefetch hint as soon as the parent chunk has finished loading.

```js
import(/* webpackPreload: true */ 'ChartingLibrary');
```

In contrast, a preloaded chunk starts loading in parallel with its parent chunk, whereas a prefetched chunk starts only after the parent chunk has finished loading.

quicklink

quicklink is a very small (less than 1KB minified/gzipped) library developed by Google Chrome Labs. It aims to speed up subsequent page loads by prefetching links within the viewport during idle times.

Its main principles are:

  • Detecting links within the viewport (using Intersection Observer API)

  • Waiting for the browser to be idle to prefetch page resources (using requestIdleCallback)

  • Checking if the user is on a slow connection (using navigator.connection.effectiveType) or has enabled data saving (using navigator.connection.saveData)

  • Prefetching the URL of the links (using <link rel=prefetch> or XHR). It provides some control over request priority: defaulting to low priority with rel=prefetch or XHR, and for high-priority resources, attempting to use fetch() or falling back to XHR.

A demo provided by Quicklink shows that using Quicklink can improve page load performance by 4 seconds.
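
Basic usage is tiny; a minimal sketch assuming quicklink v2's listen() API:

```js
import { listen } from 'quicklink';

// After the page finishes loading, observe in-viewport links and
// prefetch them during browser idle time
window.addEventListener('load', () => {
  listen();
});
```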

Lazy loading

The subsections above introduced several preloading techniques; next, let's talk about lazy loading. Lazy loading defers loading, which greatly reduces requests for resources the user may never need and thus improves page performance. The core idea is that resources outside the current viewport (or off the critical rendering path) don't need to be loaded yet. At the code level, lazy loading is most commonly applied to third-party libraries (dynamic import) or components (React's React.lazy), both of which boil down to the dynamic import() syntax. Here's how to lazy load other kinds of resources:

Intersection Observer

The Intersection Observer API lets us know when observed elements enter or exit the browser's viewport. By leveraging this, we can avoid loading resources that are not currently within the viewport, as in the sketch below.
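
A minimal sketch, assuming images carry their real URL in a data-src attribute:

```js
// Swap in the real image source only when the image nears the viewport
const observer = new IntersectionObserver((entries, obs) => {
  entries.forEach((entry) => {
    if (!entry.isIntersecting) return;
    const img = entry.target;
    img.src = img.dataset.src; // start the real download
    obs.unobserve(img); // each image only needs to load once
  });
});

document.querySelectorAll('img[data-src]').forEach((img) => observer.observe(img));
```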

Among the many third-party lazy loading libraries is Lozad.js, a high-performance, lightweight JavaScript option. It supports lazy loading of images, picture elements, iframes, videos, audio, responsive images, background images, and multiple background images. Unlike lazy loading libraries that hook into browser scroll events or periodically call getBoundingClientRect() on lazy elements, forcing the browser to re-layout the entire page and causing jank, Lozad.js uses the non-blocking Intersection Observer API.
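
Basic Lozad.js usage looks like this; elements with the default .lozad class and a data-src attribute are loaded as they approach the viewport (the loaded hook is optional):

```js
import lozad from 'lozad';

const observer = lozad('.lozad', {
  loaded: (el) => el.classList.add('loaded'), // optional callback after load
});
observer.observe();
```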

Native image lazy loading

Browser support: Chrome 77, Edge 79, Firefox 75, Safari 15.4.

```html
<img src="image.png" loading="lazy" alt="…" width="200" height="200">
```

The above code enables native image lazy loading (in Chromium-based browsers and Firefox; browsers that don't support the loading attribute simply ignore it). This way, we don't need a JavaScript library to lazy load images.

The loading attribute has three values:

  1. auto: Uses the browser's default loading behavior, which is the same as not using the loading attribute.

  2. lazy: Delays loading the resource until it reaches a threshold distance from the viewport.

  3. eager: Loads the resource immediately, regardless of its position on the page.

How to understand the threshold distance from the viewport when loading=lazy?

Chromium's lazy loading implementation tries to ensure that off-screen images load early enough so that they are fully loaded when the user scrolls near them. By fetching image resources before they become visible in the viewport, it maximizes the chances that they are loaded when they become visible. So, how early should the images be loaded? In other words, at what distance from the current viewport will the browser start loading the subsequent images? The answer is that Chromium's threshold distance is not fixed and depends on the following factors:

  1. The type of image resource.

  2. Whether Data Saver is enabled.

  3. The current network conditions (effective connection type).

Based on these three factors, Chromium continuously refines its threshold-distance algorithm, ensuring images finish loading by the time the user scrolls to them while also avoiding unnecessary downloads.

Recap

In this article, I explored the first part of web performance optimization, focusing on network and assets optimization. I discussed the importance of using CDNs to cache resources closer to users, the differences between HTTP/1.1, HTTP/2, and HTTP/3, and how each protocol impacts performance. I also covered techniques for reducing static assets, such as HTML, JS, CSS, and image compression, and the significance of HTTP caching and service worker caching. Additionally, I examined resource hints like preload and prefetch, and the benefits of lazy loading to improve page performance. By implementing these strategies, you can significantly enhance the speed and efficiency of your web applications.