SSR, Streaming, and CSS-in-JS

How does CSS-in-JS exactly work with Server-Side Rendering (SSR), and streaming SSR? Let’s take Next.js and Stitches as an example and dive into the details in this article. Note that this is not targeting a specific framework or library, but a general architecture discussion.

Say you have these two components to render a simple page, and you use CSS-in-JS to style the content:

// page.jsx
export default function Page () {
  return <h1>
    Hello, <User/>
  </h1>
}

// user.jsx
import { styled } from '@stitches/react'

const UsernameContainer = styled('span', {
  fontSize: '2rem',
  variants: {
    type: {
      admin: { color: 'red' }
    }
  }
})

function User() {
  const user = { name: 'John Doe', type: 'admin' }

  return <UsernameContainer type={user.type}>
    {user.name}
  </UsernameContainer>
}

The application renders “Hello, John Doe” as the heading, the name “John Doe” will be in red because it matches the type: 'admin' variant.

#Server-Side Rendering

During SSR, Next.js:

Renders the page to a string (ReactDOMServer.renderToString(<Page/>)). This gives us a result like:

<h1>Hello, <span className="stitches-xyz stitches-xyz-type-admin">John Doe</span></h1>

From step 1., Stitches knows which components were rendered, what styles were used and what classname/variant hashes (in our case, "stitches-xyz stitches-xyz-type-admin") were generated.
Next.js then renders the HTML “shell” element (page/_document.js):

<Html lang="en">
  <Head>
    <style id="stitches" dangerouslySetInnerHTML={{
      __html: stitches.getCssText()
    }}/>
  </Head>
  <body>
    <div id="_next">__PLACEHOLDER__</div>
  </body>
</Html>

Since Stitches already knows which CSS rules are used (as step 2.), it can then return it via getCssText(). And the rendered HTML string will look like:

<html lang="en">
  <head>
    <style id="stitches">
      .stitches-xyz {
        font-size: 2rem;
      }
      .stitches-xyz-type-admin {
        color: red;
      }
    </style>
  </head>
  <body>
    <div id="_next">__PLACEHOLDER__</div>
  </body>
</html>

Then, Next.js replaces __PLACEHOLDER__ above with the rendered app content from step 1.: <h1>Hello, <span className="stitches-xyz stitches-xyz-type-admin">John Doe</span></h1>. Now everything is finished and all styles are server-side rendered.
Send the result to the browser.

Similarly, if the user isn’t an admin, the output won’t contain "stitches-xyz-type-admin" and corresponding CSS rules.

This 2-pass rendering strategy ensures that all the needed information inside the page <body> will be collected, and will then be inserted into the page <head>.

#Streaming

Unlike the final step above which sends everything to the browser all together, "streaming" means the HTML will be intentionally sent to the client piece by piece. The client can render the received parts while the server is still working on the remaining parts.

With the same example, if we use Suspense to fetch user and enable streaming (ReactDOMServer.renderToReadableStream) in SSR:

// page.jsx
export default function Page () {
  return <h1>
    Hello, <Suspense fallback="loading...">
      <User/>
    </Suspense>
  </h1>
}

// user.jsx
import { styled } from '@stitches/react'

const UsernameContainer = styled('span', {
  fontSize: '2rem',
  variants: {
    type: {
      admin: { color: 'red' }
    }
  }
})

function User() {
  // Suspense-based data fetching.
  const user = readUser()

  return <UsernameContainer type={user.type}>
    {user.name}
  </UsernameContainer>
}

...then it becomes more tricky to render the correct styles.

To render all necessary styles inside <head>, we have to render the full content of <body> first. However, we need to send <head> before <body> in the HTML response. This results in a conflict in the order which makes it impossible to do streaming.

The solution is to give up the idea of putting all the styles in <head>:

Instead, we directly render the full page, which including the “shell” to the stream. And there is no <head> anymore:

ReactDOMServer.renderToReadableStream(
  <Html lang="en">
    <body>
      <div id="_next">
        <Page/>
      </div>
    </body>
  </Html>
)

Since we are doing streaming SSR, this first part will be rendered and sent to the browser immediately:

<html lang="en"><body><div id="_next"><h1>Hello,

Then, React hits the Suspense boundary. So it appends the fallback <div id="suspense-1-fallback">loading...</div> to the stream, and waits for the data to resolve. At this point the user can already see “Hello, loading...” showing on their screen because the browser now receives:

<html lang="en"><body><div id="_next"><h1>Hello, 
<div id="suspense-1-fallback">loading...</div>

(Waiting)
...finally, readUser() finishes and the Suspense boundary resolves. React can render the actual data, but also it needs to replace the fallback element with the data. However, the fallback is already sent to the browser. As a result, what React can do is to first append the content to the stream, with the hidden attribute:

<div hidden id="suspense-1-content">
  <span className="stitches-xyz stitches-xyz-type-admin">John Doe</span>
</div>

It needs to be hidden because otherwise, at this momenet both fallback and content will be displayed.

Do you remember that we are no longer injecting styles into <head>? Since we already rendered the resolved content, the CSS-in-JS library can know which styles are generated, and we can inject these styles as an inlined <style> tag into the stream too:

<style>
  .stitches-xyz {
    font-size: 2rem;
  }
  .stitches-xyz-type-admin {
    color: red;
  }
</style>

Next, React sents a small inlined script tag to hide the fallback and reveal the content (pseudo code):

<script>
const fallbackContainer = document.getElementById('suspense-1-fallback')
const contentContainer = document.getElementById('suspense-1-content')
swap(contentContainer, fallbackContainer)
remove(fallbackContainer)
makeVisible(contentContainer)
</script>

Then, the browser can immediately execute the script, and the content will be displayed and the fallback will be gone.

We render and append the remaining HTML to the stream: </h1></div></body></html>.

The key of streaming is step 3, 4, 5. The full HTML, when finished, will be like this:

<html lang="en"><body><div id="_next"><h1>Hello,

<div id="suspense-1-fallback">loading...</div>

<style>
  .stitches-xyz {
    font-size: 2rem;
  }
  .stitches-xyz-type-admin {
    color: red;
  }
</style>

<div hidden id="suspense-1-content">
  <span className="stitches-xyz stitches-xyz-type-admin">John Doe</span>
</div>

<script>
const fallbackContainer = document.getElementById('suspense-1-fallback')
const contentContainer = document.getElementById('suspense-1-content')
swap(contentContainer, fallbackContainer)
remove(fallbackContainer)
makeVisible(contentContainer)
</script>

</h1></div></body></html>

Which is totally valid. Browser receives and executes each chunk in order.

#Conclusion

As you might already noticed, there are a couple of downsides of this approach:

You’ll need to put <style> tags inside <body>, which is not recommended in the WHATWG spec.
CSS can’t be extracted out from the HTML (as .css files) and shared/cached between different pages.

But there are other solutions too. If we can know all the possible styles before rendering <Page/>, we can avoid the two problems above. The old way of writing fully static CSS files is still great. Or something in the middle, such as Tailwind CSS or similar tools, to collect necessary styles at build time so you also have some flexibility.

As always, we get more power by adding more dynamism, with a cost.