2020-02-02

The RikVerse rebuild: The client is not the server

tl;dr: In which Rik investigates how to stop visitors to The RikVerse website receiving the dread “404 Not Found” error

The user encounters a 404 not found error

This is the third in a series of blog posts detailing my journey to rebuild my poetry website - The RikVerse - using Svelte.js, Page.js and Tailwind CSS.

The code developed in this blogpost can be found here on GitHub.

Where were we?

In the previous post I started to build the new RikVerse site using the Svelte site builder, Page.js routing, Tailwind CSS and PostCSS pre-processing. At the end of that post I discovered that Page.js is a client-side router, which led to a major issue: how do we deal with users who choose to refresh a RikVerse page in their browser, or arrive at the site on a non-landing-page route?

Also: a quick reminder of my twelve(ish) goals for this site rebuild:

[New] Avoid 404 Page Not Found errors at all costs!
The landing page should display a randomly selected poem
Add in cookie consent (meeting UK ICO guidance for explicit opt-in)
Get rid of the database!
Index of all the poems available to read
Tag filtering functionality on the poems index page
Easy for the user to access a poem’s associated media files (images, audio, video)
Make each publication page that book’s keystone page
Let people read the books in-site
Keep the donate button; make it More Fun To Use
Add in poetry-and-writing-related blog posts
~~[Done] Simpler, more minimal design; better fonts~~
~~[Done] Navigation needs to be massively simplified~~

So, what’s the best way to avoid “404 Page Not Found” errors?

404 errors happen when a browser asks a server for a page (or resource, like an image), but that page doesn’t exist on the server. All the server can do is tell the browser it doesn’t have the goods, which it does by returning an HTTP 404 error response.

Sometimes the server will be kind enough to include a custom error page in the response. Sometimes it won’t bother. Servers, huh?
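
As a rough illustration of that exchange, here is a minimal sketch of the decision a server makes for each request. The `pages` lookup table and the `respond` helper are invented for this example (they are not part of the RikVerse code), and a real server would be checking files on disk rather than an object in memory.

```javascript
// Hypothetical lookup table standing in for the files a server has on disk.
const pages = {
    '/': '<h1>The RikVerse</h1>',
    '/index.html': '<h1>The RikVerse</h1>',
};

// Decide what to send back for a requested path.
const respond = (pathname) => {

    if (Object.prototype.hasOwnProperty.call(pages, pathname)) {
        return { status: 200, body: pages[pathname] };
    }

    // Nothing at this path: a 404 status, plus a friendlier custom error page.
    return { status: 404, body: '<h1>404 Not Found</h1><p>No poem lives here.</p>' };
};

console.log(respond('/').status);     // 200
console.log(respond('/blog').status); // 404
```

In a Node `http.createServer` handler, the returned values could be passed to `res.writeHead(status)` and `res.end(body)`.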

There are two main ways we can stop users receiving the dread 404:

Make sure there is a page at every possible end point, ready and keen to be returned to the browser by the server (the static site solution); or
Trick the server into returning something other than a 404 response, using various rules to redirect the server’s search to an end point we know it can return (the URL rewrite solution).

Neither solution is perfect; you can guarantee that some user, somewhere in the world, will find a way to request a page that doesn’t exist and, at the same time, manage to confuse the rewrite rules into sending an unwanted 404 response. In those cases, all you can do is make sure the response includes an entertaining error page.

I decided to look at both solutions, to see if either of them could help handle the situation Svelte and Page.js had cooked up for my new site. But before I report on my findings, let’s ask a more profound question:

Why do I want/need routes in the new RikVerse website?

The old RikVerse website doesn’t have a routing problem. Every page on that site is served from the root URL - .../index.html. Hover the cursor over any of the links in the site, and you will see that the URL looks different - .../index.html?display=xquote - but the page being served remains the same.

There are two ways to handle navigation from a single page: the #hash trick; or the ?query trick. The old RikVerse site uses the ?query trick, where the PHP backend checks to see if the requested URL includes a query string (?display=xquote) and adjusts the response it sends back accordingly.
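
Both tricks boil down to reading a page key out of the URL. The old site does this in PHP on the server; the sketch below shows the same idea in Javascript, purely for illustration. The `pageFromUrl` name, the `home` fallback and the example.com addresses are assumptions made for this example.

```javascript
// Work out which page to show from a single-page URL.
// Checks the ?display= query first, then the #hash, then falls back to a default.
const pageFromUrl = (href, fallback = 'home') => {

    const url = new URL(href);

    const fromQuery = url.searchParams.get('display');
    if (fromQuery) return fromQuery;

    const fromHash = url.hash.replace(/^#/, '');
    return fromHash || fallback;
};

console.log(pageFromUrl('https://example.com/index.html?display=xquote')); // xquote
console.log(pageFromUrl('https://example.com/index.html#xquote'));         // xquote
console.log(pageFromUrl('https://example.com/index.html'));                // home
```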

There’s nothing wrong with this approach - it works!

No. The real reason why I want routing on the new RikVerse site is because … I want clean URLs - no #hashes or ?queries in the address bar. Just a plain, simple /path/to/a/poem/or/article URL in the address bar.

It looks prettier, okay?

Let’s make lots of index pages - the “static site” approach

I Googled “best static site generators 2019” and got this blog post in response. As the article says: “Essentially, static site generators take a set of source files, manipulate them, and then generate a set of output files which is the static site itself.”

A selection of static site generator landing pages

Most of the site generators mentioned on that list use the classic approach of taking a set of source files, slotting their content into templates using some sort of templating engine, and spitting out the resulting static html pages into a destination folder ready for deployment.

I’ve played with a few of these generators (this blog site runs on Hexo, second in that list) - but they don’t offer solutions to the problem I’m facing. Svelte - as far as I can tell - doesn’t use a templating engine; Svelte files are their own templates.

Two of the offered solutions - Gatsby (using React), and Nuxt.js (using Vue) - seemed closer to what I was looking for.

I checked through the Gatsby repository on GitHub. I understood almost nothing there, but the impression I got from the exercise was that Gatsby is a very, very refined version of a classic static site generator. It takes in files (often in Markdown format), manipulates them in various ways, and spits out a set of React-based html pages ready for deployment.

React sites are effectively (Rik says, generalizing madly) single page applications which hide all their React goodness in associated Javascript files. It’s not difficult to serve a (simpler) React site from an AWS S3 bucket.

… But none of this helps me work out how to generate static pages from my site’s Svelte files.

My examination of the NuxtJS codebase on GitHub was, however, more useful. The main package.json file for that repository included a dev dependency for puppeteer-core.

Now Puppeteer is a wondrous piece of tech! It’s basically a web browser that we can control and manipulate through code. Node code. Javascript code. As the site says, we can use Puppeteer to “Crawl a Single-Page Application and generate pre-rendered content” - this is heady (or headless) stuff indeed!

I quickly checked Sapper’s GitHub codebase. It, too, lists Puppeteer as a dev dependency in its package.json file. Which strongly suggests to me that if I want to generate a complete, static site from my Svelte code, then I’ll have to implement a terminal-based toolchain which uses Puppeteer to scrape my locally-hosted site, so it can then output index.html files for every internal link it finds.
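
To give a feel for what such a toolchain involves, here is a rough sketch of the crawl loop only. It is not how Sapper or Nuxt do it. The `crawl` function and the `render` callback are hypothetical: `render` takes a path and resolves to the rendered html plus the internal links found on that page. A canned `fakeRender` stands in for the browser so the sketch runs on its own.

```javascript
// Crawl a site breadth-first from a start path, collecting rendered HTML per path.
// `render` is a hypothetical async function: given a path, it resolves to
// { html, links }, where links are the internal paths found on that page.
const crawl = async (startPath, render) => {

    const seen = new Set([startPath]);
    const queue = [startPath];
    const output = {};

    while (queue.length > 0) {

        const path = queue.shift();
        const { html, links } = await render(path);
        output[path] = html;

        // Queue each internal link once, so circular links don't loop forever
        for (const link of links) {
            if (!seen.has(link)) {
                seen.add(link);
                queue.push(link);
            }
        }
    }
    return output;
};

// Stand-in renderer with canned data, so the sketch runs without a browser.
const fakeSite = {
    '/': { html: '<h1>Home</h1>', links: ['/about', '/blog'] },
    '/about': { html: '<h1>About</h1>', links: ['/'] },
    '/blog': { html: '<h1>Blog</h1>', links: ['/', '/about'] },
};
const fakeRender = async (path) => fakeSite[path];

crawl('/', fakeRender).then(pages => console.log(Object.keys(pages)));
// logs [ '/', '/about', '/blog' ]
```

With Puppeteer, `render` could plausibly be built around `page.goto()` and `page.content()`, with each html string then written out to an index.html file in a directory matching its path.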

… That, as they say, feels like a lot of work just to 404-error proof a small, not-very-important website.

Let’s make the server lie to the browser - the “URL rewrite” approach

Web servers (I was surprised to discover a few years back) are not wondrous yet ineffable constructions of chrome and flashing lights that sit in pristine rows in vast concrete caverns deep underground where no mortal human may walk unbowed.

A web server is, in fact, just another piece of software which performs its strange networking stuff on a computer. And - like any software - it can be manipulated to run in various different ways.

Each different type of web server - for there are many species of the beast - has different ways of being manipulated. If we manipulate a server in the wrong way, it will collapse and die - for servers can be delicate and temperamental beasts!

One kind of manipulation common to most species of web server is URL rewriting. This is where the server receives a request for an end point, checks for any rules associated with that type of end point (often involving some form of regex hand-wavium) and, when a rule identifies a qualifying URL path, rewrites that offending path to something it finds a lot more acceptable.
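
Stripped of server-specific syntax, the idea looks something like the sketch below. The `rules` array and `rewritePath` function are made up for illustration; real servers each have their own rule format, and this is not a drop-in for any of them.

```javascript
// Hypothetical rewrite rules: each has a regex to match and a function
// that returns the path the server should serve instead.
const rules = [
    // Leave requests for real files (anything ending in an extension) alone.
    { match: /\.[a-z0-9]+$/i, rewrite: (path) => path },
    // Send every other path to the single-page app's entry point.
    { match: /^\/.*/, rewrite: () => '/index.html' },
];

// Return the path the server should actually serve for a requested path.
// The first matching rule wins; unmatched paths pass through unchanged.
const rewritePath = (path) => {

    const rule = rules.find(r => r.match.test(path));
    return rule ? rule.rewrite(path) : path;
};

console.log(rewritePath('/blog'));            // /index.html
console.log(rewritePath('/build/bundle.js')); // /build/bundle.js
console.log(rewritePath('/about/'));          // /index.html
```

The browser still sees the URL it asked for; only the server's idea of which file to return has changed.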

Apache uses mod_rewrite rules to weave this magic
Nginx has nginx rewrite rules
Microsoft IIS uses url-rewrite to perform the trick
AWS CloudFront supplies us with LinkFormat rules
Cloudflare gives us Page Rules
… and so on, etc.

What these solutions all have in common is they assume we have access to the remote production server in some way - either directly, or through a host-provided cloud console.

No access == no rules!

The RikVerse lives on a third party shared hosting server. I do not have access to the shared server, and my third party hosting supplier does not offer me a way to add URL rewrite rules to it via their console. And I’m not about to change hosts (or self-host) just so I can Have Fun With URL Rewrites!

Sadly, URL rewrites are not the solution I am looking for (in this instance).

… And the RikVerse “404 Not Found” solution is?

None of the above!

Well … actually it’s a mixed approach involving static site generation and something I’ve not yet mentioned: web page redirection - specifically a Javascript-mediated redirection (as <meta http-equiv="refresh" content="N; URL=other-web-address"> is no longer considered “cool”).

To achieve this solution, I needed to:

Create a file containing metadata for each of the affected pages
Create a Node.js terminal-based tool which uses that data to generate appropriate directories and index.html files in the ./public directory
Update ./src/App.svelte (which is the target for all these page redirects) so that it can handle URLs containing ?query strings and take appropriate action to load the correct page.

And - as a bonus - I also updated the other .svelte files to make use of the page metadata file.

Step 1: This is the code for the page metadata file ./src/data/pageData.mjs - note that the file needs to have an .mjs extension so that I can get terminal-based Node to treat it like a proper ES6 module (as opposed to Node’s preferred CommonJS version of modules).

const pageData = [
    {
        id: 'about',
        directory: './public/about',
        title: 'RikVerse Author page',
        description: 'Details about Rik Roots',
    }, 
    {
        id: 'blog',
        directory: './public/blog',
        title: 'RikVerse blog index',
        description: 'Blog post listings and links',
    }, 

    // [... other page objects go here]

    {
        id: 'publications',
        directory: './public/publications',
        title: 'RikVerse Publications index page',
        description: 'Listing books written by Rik Roots',
    } 
];

export default pageData;

Step 2: The file that Node runs - ./pageBuilder.mjs - is a little bit more complicated because it gets to play with creating directories and files and stuff. Also: Promises!

console.log('page builder called');

import fs from 'fs';
import pageData from './src/data/pageData.mjs';

// index.html template
const buildIndexFile = (data) => {

    // This is the all-important redirect part of the page
    let indexfileLocation = '`${location.origin}/?p=${location.pathname.substring(1)}`';

    return `<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">

    <title>${data.title}</title>

    <!-- to prevent caching -->
    <meta http-equiv="cache-control" content="max-age=0">
    <meta http-equiv="cache-control" content="no-cache">
    <meta http-equiv="expires" content="-1">
    <meta http-equiv="expires" content="Tue, 01 Jan 1980 11:00:00 GMT">
    <meta http-equiv="pragma" content="no-cache">

    <!-- general page metadata -->
    <meta name="author" content="Rik Roots">
    <meta name="description" content="${data.title} - ${data.description}">

    <!-- Facebook metadata -->

    <!-- Twitter metadata -->

</head>
<body>
    <script>
        // The page redirection is triggered here
        location.href = ${indexfileLocation};
    </script>
</body>
</html>`;
};

// This function creates the directory if it doesn't already exist
const checkDirectory = (dir) => {

    return new Promise((resolve, reject) => {

        fs.access(dir, fs.constants.F_OK, (err) => {

            if (err) fs.mkdir(dir, { recursive: false }, (err) => {

                if (err) reject(`failed to create ${dir}`);
                else resolve(`created ${dir}`);
            });

            else resolve(`${dir} already exists and can access`);
        });
    });
};

// This function generates index pages
// - fs.writeFile creates the file if it is missing and overwrites it otherwise,
//   so there is no need to open the file first (which also left a file
//   descriptor open and carried on writing after a rejected open error)
const writeIndexFile = (data) => {

    return new Promise((resolve, reject) => {

        fs.writeFile(`${data.directory}/index.html`, buildIndexFile(data), 'utf8', (writeError) => {

            if (writeError) reject(`failed to write ${data.directory}/index.html file: ${writeError.code}, ${writeError.message}`);

            else resolve(`${data.directory}/index.html file updated`);
        });
    });
};

// Process the router base pages' index files
// - wait for every page to finish before reporting completion
Promise.all(pageData.map(page => {

    return checkDirectory(page.directory)
    .then(() => writeIndexFile(page))
    .then(res => console.log(res))
    .catch(err => console.log(err));
}))
.then(() => console.log('\npage builder completed'));
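
To see what the redirect script in each generated index.html works out at runtime, here is the same expression wrapped in a small function. The `redirectTarget` name and the example.com addresses are made up for this sketch; it takes a location-like object so that it can run outside a browser.

```javascript
// Mirrors the expression embedded in the generated index.html:
//   `${location.origin}/?p=${location.pathname.substring(1)}`
// - substring(1) drops the leading slash from the pathname
const redirectTarget = (loc) => `${loc.origin}/?p=${loc.pathname.substring(1)}`;

console.log(redirectTarget({ origin: 'https://example.com', pathname: '/about/' }));
// https://example.com/?p=about/

console.log(redirectTarget({ origin: 'https://example.com', pathname: '/blog/index.html' }));
// https://example.com/?p=blog/index.html
```

Note that the value of `p` keeps everything after the leading slash, so a trailing slash or an explicit index.html in the address bar is carried through to the landing page.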

Step 3: Updating ./src/App.svelte to manage the redirects.

<script>
    // Svelte variables
    let page, params;

    // Page.js routing functionality
    import {router, startRouter, routes } from './routes.js';

    // Build Page.js routes
    routes.forEach(route => {

        router(
            route.path, 

            (ctx, next) => {
                params = ctx.params;
                next();
            },

            () => page = route.component
        );
    });

    // Start the Page.js router and watch for changes
    startRouter();

    import Navigation from './components/Navigation.svelte';
    import Footer from './components/Footer.svelte';

    // this handles external links into the site
    // - because the site is essentially a single page app
    // - the server will give a '404' not found error if browser tries to load 
    //       http://site.com/blog
    // - thus external links should be in the form 
    //       http://site.com/?p=blog
    let loc = window.location;
    if (loc.search) {

        let searchParams = new URLSearchParams(loc.search.substring(1)),
            redirect = searchParams.get('p');

        // Only redirect when the query string actually carries a ?p= value
        // - otherwise an unrelated query string would send us to '/null'
        if (redirect) {

            // Page.js router is listening for anchor clicks
            // - so use it to trigger a redirect to the correct path
            // - create anchor; add it to the DOM; click it
            let a = document.createElement('a');
            a.href = `/${redirect}`;

            document.body.appendChild(a);
            a.click();
        }
    }
</script>
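
The query-string handling at the end of that script can be tried out in isolation. This is a standalone sketch, not code from the site: the `pathFromSearch` helper is a hypothetical name, and it returns null when there is no `p` value so that a caller can skip the redirect.

```javascript
// Turn a query string such as '?p=blog' into the path to hand to the router.
// Returns null when there is no usable `p` value.
const pathFromSearch = (search) => {

    // URLSearchParams ignores a leading '?', so location.search can be passed as-is
    const target = new URLSearchParams(search).get('p');
    return target ? `/${target}` : null;
};

console.log(pathFromSearch('?p=blog'));   // /blog
console.log(pathFromSearch('?p=about/')); // /about/
console.log(pathFromSearch('?utm=123'));  // null
console.log(pathFromSearch(''));          // null
```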

To run the page builder, we need to go to our terminal and invoke the following command. (Be aware: this won’t work for Node versions < 12).

$> node --experimental-modules pageBuilder.mjs

… this is, of course, tedious. So I adapted the project’s ./package.json file to automate the process a bit:

{
  "name": "rikverse-poetry-site",
  "scripts": {
    "build": "node --experimental-modules pageBuilder.mjs && rollup -c",
    "dev": "rollup -c -w",
    "start": "sirv public --single"
  },
  "devDependencies": {
    [... stuff]
  },
  "dependencies": {
    [... more stuff]
  }
}

Now everything will happen when I invoke $> yarn build in the terminal command line.

Bonus code, and next steps

We have metadata for our route pages. It makes sense to me to use that metadata. This is nice and simple to achieve using Svelte. Using ./src/pages/About.svelte as an example:

<script>
    import pageData from '../data/pageData.mjs';

    // Get this page's metadata - title, description, etc.
    let pageMetadata = pageData.find(item => item.id === 'about');
</script>

<style></style>

<svelte:head>
    <title>{pageMetadata.title}</title>
</svelte:head>

<h3>This will be the Author biography page</h3>

Now when a user clicks on the About Rik link in the navigation bar, not only will they see my biographical details (remember: “Every ass loves to hear himself bray”) but the browser tab will update to let the user know that they’re reading all about me!

For my next post, I shall make the new RikVerse site a bit prettier, and set up the blog index and blog post stuff.
