SEO might be the biggest reason people are afraid to take advantage of modern JavaScript frameworks. Google and other search engines cannot reliably execute complex JavaScript when they crawl your website. Instead of your well-crafted website, Google may just see an empty page. What Google can’t see, it can’t index, so your site won’t be listed in Google. Have no fear. If you want to be in Google (and most of us do), you have a few options.

Render templates on the server

Use your favorite server-side framework to render the page the first time. Then use your favorite client-side framework for all future navigation. Your users will get the initial page without waiting on any JavaScript to load. After the initial load, the JavaScript will take care of everything without having to wait on the server. It’s the best of both worlds!
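To make the idea concrete, here is a minimal sketch in Python, not tied to any particular framework. The function name `render_initial_page`, the `window.INITIAL_STATE` variable, and the `/static/app.js` path are all made up for illustration. The server renders the list as plain HTML and also embeds the same data as JSON, so a client-side framework could pick up from there without another request.

```python
import html
import json


def render_initial_page(title, items):
    """Render the first page view on the server and embed the same data for the client."""
    # Plain HTML that users and crawlers can read without running any script.
    list_html = "".join(f"<li>{html.escape(item)}</li>" for item in items)
    # Escape "<" so that item text cannot close the script element early.
    state_json = json.dumps({"title": title, "items": items}).replace("<", "\\u003c")
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(title)}</title></head>\n"
        f'<body><ul id="app">{list_html}</ul>\n'
        # Hypothetical hand-off point: the client code would read this variable.
        f"<script>window.INITIAL_STATE = {state_json};</script>\n"
        '<script src="/static/app.js"></script>\n'
        "</body></html>"
    )
```

Notice that the list markup now lives in this function and, presumably, again in the client code. That duplication is exactly the cost described next.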

Of course, there is one major drawback: you basically have to code your entire website twice. Depending on your frameworks, you may be able to reuse your templates and some code. At first, you might think it won’t be too bad. However, reusing templates and model classes is a little trickier than it seems.

For every difficulty there is a fix, but soon you are dying from the stings of a thousand little differences. Differences between the rendered versions from the server and client can confuse users and make every bug twice as hard to track down.

Render noscript tags on the server

Instead of going all out and rendering everything on the server, you can just render the basic structure of your important pages. If you stick that basic content in the <noscript> tag, Google will see it, but your normal users will just see the JavaScript version.

The nice thing about this approach is that you don’t have to worry about making your server-generated HTML identical to your client-generated HTML. It is important that you keep the content the same, or else you have taken a wrong turn into the black hat art of cloaking and you might find yourself banned from Google. However, you are safe to ignore the UI chrome, auxiliary links, stats, and sidebars that are nice for humans but of no use to search engines.
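As a rough sketch of what that server-side piece might look like, here is a small Python helper. The name `render_noscript_fallback` and its parameters are invented for this example. It emits only the heading and body text, with none of the surrounding chrome.

```python
import html


def render_noscript_fallback(heading, paragraphs):
    """Wrap the essential page content in a <noscript> block."""
    # Only the core content goes in; sidebars, stats, and other chrome are left out.
    parts = [f"<h1>{html.escape(heading)}</h1>"]
    parts.extend(f"<p>{html.escape(text)}</p>" for text in paragraphs)
    return "<noscript>\n" + "\n".join(parts) + "\n</noscript>"
```

The text passed in should be the same content the JavaScript version shows, which is what keeps this on the right side of the cloaking line.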

The drawbacks are pretty much the same as they were for option one, just tamer since you are duplicating some code but not all of it. It will start out easy, as your HTML will be simple and you will only have a couple of different pages that need to be crawlable. Over time, the scope is sure to expand. As the site changes, you’ll still have to maintain both the server-side code and client-side code.

Use PhantomJS in realtime

If you don’t want the dual maintenance, then there’s another path you can take. Since Google can’t run the JavaScript, you can just run it for them.

PhantomJS is a “headless browser”. That means that it is a web browser just like Firefox or Chrome. The only difference is that it doesn’t actually display anything on the screen. Instead of interacting with your mouse and keyboard, you interact with PhantomJS programmatically. You can use PhantomJS to open your web pages. It will run all the JavaScript and manipulate the DOM. Then, you can send the resulting DOM to Google instead of sending them the normal files that you send to users.

When you get a request from Google, you’ll short-circuit your normal routing and launch PhantomJS instead. PhantomJS, in turn, will open your web page. Once the page is completely loaded and the JavaScript is done, capture the entire DOM with PhantomJS. This is a static version of your web page. It looks just like what your users see after all your JavaScript executes. Respond to Google’s request with the static version. They will be able to parse it without a problem. (Google actually recommends this approach.)
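The routing decision might look something like the Python sketch below. It assumes you recognize crawlers by their User-Agent header, which is one common way to do it but not the only one. The token list is an illustrative assumption, and `render_snapshot` and `serve_app` are hypothetical callables standing in for your PhantomJS step and your normal routing.

```python
# Assumption: substrings that identify crawler User-Agent headers.
CRAWLER_TOKENS = ("googlebot", "bingbot")


def is_crawler(user_agent):
    """Return True if the User-Agent header looks like a search engine crawler."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def handle_request(url, user_agent, render_snapshot, serve_app):
    """Send crawlers the rendered snapshot and everyone else the normal app."""
    if is_crawler(user_agent):
        # Short-circuit normal routing: hand the URL to the headless browser step.
        return render_snapshot(url)
    return serve_app(url)
```

Passing the two handlers in as arguments keeps the decision separate from the rendering, so you can swap in a different renderer later without touching the routing.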

Of course, nothing is as easy as it sounds. Integrating PhantomJS with your app will require some server work, and Phantom has this nasty habit of making things harder than they should be. PhantomJS has great power, but with great power comes great memory leaks and crashes with no explanation. Unfortunately, with great power does not come great speed. Launching Phantom, downloading and processing your page, running the JavaScript, and saving the result can take several seconds. Speed is a factor in Google’s ranking algorithm. Additionally, each PhantomJS instance can use a couple hundred MB of RAM and a good chunk of processing power. So, a few simultaneous requests can really strain your server.
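One way to keep a burst of crawler requests from flattening your server is to cap how many renders run at once and to give up on any that hang. The Python sketch below is a guess at how that could look, not a tested production setup. The limit of two, the ten-second timeout, and the name `render_with_limit` are all assumptions chosen for illustration.

```python
import subprocess
import threading

# Assumption: allow at most two headless browser processes at a time.
MAX_CONCURRENT_RENDERS = 2
_render_slots = threading.BoundedSemaphore(MAX_CONCURRENT_RENDERS)


def render_with_limit(command, timeout_seconds=10.0):
    """Run a render command and return its stdout, or None if it fails or times out."""
    # Wait for a free slot, but not forever.
    if not _render_slots.acquire(timeout=timeout_seconds):
        return None
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except (subprocess.TimeoutExpired, OSError):
        # The process hung, or the executable could not be started.
        return None
    finally:
        _render_slots.release()
    if result.returncode != 0:
        # Treat a crash or error exit as "no snapshot available".
        return None
    return result.stdout
```

You might call it as `render_with_limit(["phantomjs", "snapshot.js", url])`, where `snapshot.js` is a hypothetical PhantomJS script that opens the URL and prints the finished DOM to standard output.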

Use PhantomJS and store snapshots

You can easily improve on the previous method using the oldest trick in the book: caching. Instead of waiting for Google to come calling before you fire off PhantomJS, you can do all the work in advance. You’ll need to open literally every page on your website one by one with PhantomJS. Then you’ll follow the process described above. However, instead of sending the result straight to Google, you’ll store it. When Google asks for the page, send them the page you have already pre-generated.

This takes care of the speed problem, and you can generate pages at a constant rate instead of having to respond to request surges. Of course, there’s added complexity to reach these gains, and you’ll have to store all those static pages. If you make site-wide changes, you’ll need to regenerate all your pages. With a large site, that process could take a few days, even with a few dozen servers grinding away.
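Here is a minimal Python sketch of the store-and-serve part, assuming snapshots are kept as files on disk. `SnapshotStore` and `pregenerate` are names invented for this example, and `render` is a hypothetical callable that returns the finished HTML for a URL, or None if rendering failed.

```python
import hashlib
from pathlib import Path


class SnapshotStore:
    """Keep one pre-rendered HTML file per URL in a directory."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path_for(self, url):
        # Hash the URL so that any URL maps to a safe file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.html"

    def save(self, url, html_text):
        self._path_for(url).write_text(html_text, encoding="utf-8")

    def load(self, url):
        """Return the stored snapshot, or None if this URL has not been rendered."""
        path = self._path_for(url)
        if not path.exists():
            return None
        return path.read_text(encoding="utf-8")


def pregenerate(urls, render, store):
    """Render every URL ahead of time and return the ones that failed."""
    failed = []
    for url in urls:
        html_text = render(url)
        if html_text is None:
            failed.append(url)
        else:
            store.save(url, html_text)
    return failed
```

When a crawler request comes in, you would call `store.load(url)` and respond with the result. Returning the list of failed URLs gives you something to retry later, which matters if the renderer crashes partway through a long run.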

Use a third party service

If you want to use stored snapshots but don’t want to go through the trouble of setting up PhantomJS and a cache, check out BromBone. It takes care of the messy part for you. They process your pages and store the rendered snapshots. When Google crawls your site, you just have to fetch the snapshot from BromBone and pass the page on to Google.

SEO is a bit of a challenge for client-side, JavaScript-powered websites. However, it is entirely doable.

  • btrager

    Here is an interesting project for prerendering JavaScript

  • Derek Brown

    I’ve been using to create crawlable snapshots of my site. So far so good – HTML matches up with what I see in Chrome dev tools