
16 Feb 2011

Hashbang URIs – They’re not as bad as you think; really.

The internet has been abuzz since the suite of Gawker sites switched from regular old links to Fragment URIs. Long story short, a JavaScript bug caused all content, on all of their sites, to be inaccessible to the entire world. Ouch, I know. But just because one developer made a farce of a piece of technology doesn't make it as horrendous as everyone's saying it is.

A Brief Introduction

So what's the big guffaw all about? Wikipedia uses Fragment Identifiers to jump around the page, doesn't it? Well yeah, but that's just static content. The big dilemma is over loading remote content whenever the fragment identifier changes. It's a nice way to load remote content while still allowing the user to use the "Back" button properly. Why would anyone want to do this, you might ask? Well, let me tell you.

Some Numbers

When loading any website, there are two things you pull from the server: the markup (HTML), which structures the page, and your helpers (CSS and JavaScript), which pretty the beast up and make him shine. I took a look at 4 sites that are kind of a big deal, to see what was being loaded every time I hit their homepage:

Reddit.com
  HTML.: 37.6 KB
  CSS..: 20.2 KB
  JS...: 47.7 KB
  Total: 105.5 KB, 35.6% Content

Google.com
  HTML.: 12.2 KB
  CSS..: 0 KB (Included in HTML)
  JS...: 73.1 KB
  Total: 85.3 KB, 14.3% Content

Youtube
  HTML.: 20.9 KB
  CSS..: 1.3 KB
  JS...: 17.1 KB
  Total: 39.3 KB, 53.2% Content

Github.com
  HTML.: 7.3 KB
  CSS..: 61.5 KB
  JS...: 127.9 KB
  Total: 196.2 KB, 3.7% Content

It looks like nearly every site is loading a lot of stuff other than content, especially Github. Every time a request hits the server, it has to serve this additional content if it's not cached. If that's the case, taking away the constant loading of the JavaScript and CSS can significantly decrease bandwidth and server load. In addition to not needing to fetch the external resources, you also cut down on the amount of HTML loaded with every request: the DOM structure is only loaded once, and only the content needs to be fetched after that. In the case of Reddit.com, the majority of the HTML loaded is DOM structure and static content, with only 5.9 KB of the 37.6 KB being real, interesting content. The example below is a rudimentary way to accomplish this, with very little overhead in cost, time, or complexity.
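To put rough numbers on it, here's a back-of-the-envelope sketch using the Reddit.com figures above. The ten-page-view session, and the assumption that each fragment-driven request returns only the ~5.9 KB of actual content, are illustrative rather than measurements:

// Back-of-the-envelope comparison using the Reddit.com numbers above.
// Assumes a cold cache on every traditional page view (the worst case the
// article describes) and that each fragment-driven request returns only the
// ~5.9 KB of actual content. Both assumptions are illustrative.
var fullPageLoad = 105.5;  // KB: HTML + CSS + JS for a normal request
var contentOnly  = 5.9;    // KB: just the interesting content
var pageViews    = 10;     // a hypothetical browsing session

var traditional = fullPageLoad * pageViews;                      // 1055 KB
var hashbang    = fullPageLoad + contentOnly * (pageViews - 1);  // ~158.6 KB

console.log("Traditional: " + traditional.toFixed(1) + " KB");
console.log("Hashbang:    " + hashbang.toFixed(1) + " KB");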

Server Side

There are three things you need to do to make this work, and that's it. First, make all internal links begin with the root of your site, sans the domain name. We want things that "/look/like/this". Next, modify all of your content and views to host the majority of your markup in a Layout of some sort. Basically, your controller action should only return a block of content that will be placed into the body tag of the layout. Finally, modify your controller actions to return the content sans layout when AJAX comes calling. In Rails, it would be as simple as:

class WidgetController < ApplicationController
  def index
    @widgets = Widget.all
    respond_to do |wants|
      # Normal requests render the index template inside the layout as usual.
      wants.html {}
      # AJAX (.js) requests get the same template with no layout wrapped around it.
      wants.js { render :layout => false }
    end
  end
end

That's it.  Now, when AJAX hits any of your existing controller actions, they'll return the content without the layout.
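If you're not on Rails, the same idea applies anywhere you can tell an AJAX request apart from a normal one. Here's a rough sketch of what that might look like in Node with Express, checking the X-Requested-With header that jQuery's $.get sends; the renderWidgetList and wrapInLayout helpers are hypothetical stand-ins for your real templates, not anything from this post:

// Rough sketch, not the setup described above: skip the layout for AJAX.
var express = require("express");
var app = express();

// Hypothetical helpers standing in for your real templates.
function renderWidgetList() {
    return "<ul><li>Widget one</li><li>Widget two</li></ul>";
}
function wrapInLayout(content) {
    return "<html><body><div id=\"contents\">" + content + "</div></body></html>";
}

app.get("/widgets", function(req, res){
    var content = renderWidgetList();
    // req.xhr is true when the X-Requested-With header says XMLHttpRequest,
    // which jQuery sets on its AJAX calls.
    res.send(req.xhr ? content : wrapInLayout(content));
});

app.listen(3000);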

Client Side

There are two things to do here.  First, you're going to intercept the click of every "a" tag on your site.  This is easy with jQuery:

$("a").live("click", function(event){
    var href = $(this).attr("href");
    // Only intercept links to local resources (an href starting with "/").
    // The href check also guards against anchors that have no href at all.
    if(href && href.charAt(0) == "/"){
        event.preventDefault();
        // Swap the path into the fragment; the router below picks up the change.
        window.location.hash = "#!" + href;
    }
});

The code block above checks to see if the link goes to a local resource, and if so, intercepts it, stops the link from acting, and changes the Fragment Identifier instead. Now, using a library like PathJS, listen for route changes and act accordingly. With PathJS, it's as simple as:

Path.default(function(){
    $.get(window.location.hash.replace("#!", ""), function(data){
        $("#contents").html(data);
    });
});

Path.listen();

Now, whenever a local resource link is clicked, it will be picked up and routed to the default function. This function makes an AJAX call for the new content, which comes back without the layout thanks to our controller action changes, and injects it into the DOM. Fast and easy. If a user hits your site with JavaScript disabled, the links will never be intercepted, and will go to your regular routes, returned with the layout intact.
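You aren't limited to the catch-all default route, either. If some pages need their own behaviour, PathJS lets you map individual routes to handlers. The sketch below assumes PathJS's Path.map(...).to(...) API and the same "#contents" container as above; "/widgets" is a made-up example path:

// Sketch only: explicit routes alongside (or instead of) the default above.
Path.map("#!/widgets").to(function(){
    $.get("/widgets", function(data){
        $("#contents").html(data);
    });
});

// Routes can capture parameters, available on this.params inside the handler.
Path.map("#!/widgets/:id").to(function(){
    $.get("/widgets/" + this.params["id"], function(data){
        $("#contents").html(data);
    });
});

Path.listen();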

But what about...

There are a few things people complain about when confronted with as simple a solution as this:

It doesn't make use of the HTML5 History API!

The example above does not, but the onus is on you as the application developer to build in support for that. As time progresses, libraries like PathJS will be updated to integrate with new technologies, but the HTML5 spec is ever changing, and the majority of browser market share doesn't support a lot of these new technologies.
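That said, progressively enhancing towards the History API doesn't take much code where it's supported. Here's a rough sketch, not part of PathJS, that keeps the same layout-free AJAX responses but uses pushState for clean URLs and falls back to normal navigation everywhere else; the "#contents" container is the same one used above:

// Rough sketch: only intercept links when pushState is actually available.
if (window.history && window.history.pushState) {
    $("a").live("click", function(event){
        var href = $(this).attr("href");
        if (href && href.charAt(0) == "/") {
            event.preventDefault();
            // Update the address bar without a hash, then pull in the content.
            window.history.pushState({path: href}, "", href);
            $.get(href, function(data){
                $("#contents").html(data);
            });
        }
    });

    // Back/Forward fire popstate; re-fetch the content for the current URL.
    window.onpopstate = function(){
        $.get(window.location.pathname, function(data){
            $("#contents").html(data);
        });
    };
}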

If I have JavaScript enabled, and send my friend (who has JavaScript disabled) a "Hashbang" URI, he won't load the content!

You're right.  This is the primary drawback of an approach like this, and you'll need to figure out if that's acceptable for you.  Are you building a website, or a web application?  Will users ever share links?  What if you're developing an Admin Panel for an internal company tool, an authentication-locked wiki system, or any application that relies on JavaScript to function (image editing, data plotting, etc)?  In those cases, users won't be sharing links, and it's a moot point.  On the other end of the spectrum, you could be building a website, not an application, where users can often share links.  Well, what is your target audience?  Are they likely to have JavaScript disabled?  Is the edge-case of a broken shared link worth the benefits an approach like this provides?  These are all questions that you and your development team need to answer before making a decision.

Won't browser caching reduce server load already?

Yes and no. In a perfect world, that would be the case. However, there are a lot of things that can cause the server to send the additional contents on every request, such as improperly set headers, browsers with caching turned off or unsupported, clients that don't respect server headers, or any of a multitude of other possible scenarios. In cases like these, a solution like "Hashbang URIs" makes sense.

Edit: In addition to the cases above, which are admittedly unlikely in most scenarios (as was pointed out to me by friends and strangers alike), some browsers have a very low cache limit, namely those in the mobile browser market.
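For context, "properly set headers" just means telling the browser it can hang on to the static assets. A minimal sketch of what that looks like, assuming a plain Node server and an arbitrary one-year max-age (neither of which comes from the original post):

// Minimal sketch: serve one static asset with long-lived cache headers so
// repeat visitors don't re-download it. The path and max-age are illustrative.
var http = require("http");
var fs = require("fs");

http.createServer(function(req, res){
    if (req.url === "/application.js") {
        res.writeHead(200, {
            "Content-Type": "application/javascript",
            "Cache-Control": "public, max-age=31536000"  // one year
        });
        fs.createReadStream(__dirname + "/application.js").pipe(res);
    } else {
        res.writeHead(404);
        res.end();
    }
}).listen(3000);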

Conclusion

No, it's not perfect.  No, it's not a solution for everyone. Yes, it has drawbacks and downsides.  So what?  It's pushing the boundaries of the browser, and treating the web as an Application Platform, rather than a dumb terminal for reading content and data entry.  The "Hashbang URI" makes your application feel like an application, instead of a website.  The drawbacks are things that need to be weighed against the benefits in your unique case, and the decision will be different for everyone.

Just because it doesn't make sense for you to use it, doesn't mean it's a stupid idea.  You are not the center of the Universe.

Comments (30) Trackbacks (7)
  1. The issue is that Google proposed and supports a pretty bad implementation of this. Why use a hashbang that really sucks, compared to a much easier and cleaner alternative?

    Imagine this url:
    http://foo.com/#!/some/page

    Why would you want that rewritten to this url?:
    http://foo.com/?_escaped_fragment_=/some/page

    When you can have this url:
    http://foo.com/some/page

    Which gets decorated by Javascript to be:
    http://foo.com/#/some/page

    And ajaxified? It’s so much cleaner to have the markup be “crawlable” using sane and clean urls that represent the structure of your site and use Javascript to add ajax features on top of it.

    Doing it the other way around IS stupid – but only when you compare it to the better alternative.

    My hope is that people don’t actually implement Google’s #! suggestion and just build web pages that don’t require such trickery.

    • I understand, and agree, that what Google does with the “escaped fragment” stuff is a horrendous blight. I just used the “hashbang” route rather than the “hash only” route in my examples because many people want Google to index their remote/ajax content. Everything I posted, including the PathJS library stuff, works perfectly fine without the “!”. In fact, if you do the method discussed in my blog post, Google will see your regular links, and index them instead of the “remote versions”, which is actually the better solution, in my opinion.

      • Yea, we’re in agreement. I just thought the stir on the web was more about baking the hash into the url and reverse engineering it for crawlability – which I agree is the suboptimal solution.

      • Actually, the way Google does this allows an incredible power for developing and crawling websites. Have users interact with this:

        http://foo.com/#!/some/page

        But when a spider crawls the site use this:
        http://foo.com/?_escaped_fragment_=/some/page

        and have the page serve up a straight-up HTML view, optimized to show whatever SEO content you want.

        • Google doesn’t like sites that show users and Googlebot different content and have reduced sites’ search ranking for that reason.

          No one knows or can predict what factors Google ranks (and they change over time). So we should make these architecture decisions so the URLs are the same for bot and human.

          Just in case.

  2. It really is causing mass breakage across the web.

    No longer do the URLs you expose to your users represent a resource, as they have since the dawn of the web. No, they simply point to a JS application which knows how to load the actual resource. Sounds like a pedantic distinction, but it's really not. Many common workflows are now totally busted; the link-sharing one you mentioned is one, but there are others too:

    curl 'http://twitter.com/#!/vitriolix/status/37953022277455872'

    This now fails to return the actual content entirely.

    Now try this, want to follow a twitter user via the resources rss feed? You used to be able to just submit ‘http://twitter.com/vitriolix’ to Google Reader and it would scrape out the feed link for you, try submitting ‘http://twitter.com/#!/vitriolix’ and Reader has no way to know what to do.

    This is major, major breakage.

    • It’s not breaking the web, it’s changing the web. It’s allowing us to treat Web Applications the same way we would treat any other application. Yes, your URLs now point to what is essentially a JS application. That’s the point though. Now you can write your consumption/creation services purely as an API, and provide it to all platforms; mobile, browser, what have you. Just because something is a new paradigm doesn’t mean it’s broken, it just means it’s different.

      • Web applications were already treated like others. The whole REST movement was about allowing script-heavy web apps to be accessible to standard HTTP methods.

        This Hashbang nonsense flies in the face of REST. REST was a good thing, making things more readable, more accessible, more reusable. Hashbangs are not – they’re a selfish thing by the owners of sites too lazy to figure out how to use REST instead. They provide nothing over REST, add a lot of complexity to applications and browsers/downloaders, and generally spit in the face of the HTTP spec.

    • I completely agree.
      The web is URIs + Links.

      By using these javascript entrypoints we are transforming the web in a big SOAP-Like (à la web-service) RPC based mechanism. Just the opposite of what a REST-oriented architecture should be.

  3. I've worked on a project where we used a similar method to this successfully; the main problem was people sharing the links – we didn't know about the special Google tricks, so I doubt the shared links, or the links in Twitter and blog posts, were helping our page rank.

    I wouldn’t do this method for an entire web site again. It’s just not how the internet works and confuses users. I’d use this for tabs or in a rich content exploring iPad web app. It’s a lot of effort for little gain IMHO.

    Also, you left out the fact that the JS and CSS are most likely cached, and some will come from a CDN. Which means your argument that sites that load a lot of JS and CSS should use hashbangs is not very strong.

    • I’ve made an edit to the post that talks about how that is a weak, and perhaps moot point. The biggest strength to that argument is that Mobile browsers have little to no cache space (I believe Mobile Safari actually has none), and mobile sites that target them can make great use of this style of “client side routing” to reduce bandwidth for themselves, and lighten the data load on their users.

      I’m also in agreement with you on your point about not using this for an entire site. This kind of approach doesn’t work well for everyone, and especially so for something that is clean-cut defined as a “Web Site”. It’s more of an approach to provide your users with a “Web Application”. One of the best features of it is the ability to wrap your AJAX content calls with fancy transitions (jQuery fade/slide, etc), to give an extremely rich UI to your users with, again, very little overhead in development costs.

  4. I too am not a center of the universe, that's for sure. But I have to disagree with this practice, too. What Twitter and Gawker did is a nice attempt at innovation and change, which I, as a JavaScript guy, welcome. But the web is not ready for that yet. For now, it's a breakage. This is a good practice for Gmail, Google Reader or whatnot, but not for blogs, and not even for Twitter, I think. Now every time someone posts a link like http://twitter.com/#!/CNN or lifehacker.com/#!5759186, people with JavaScript turned off (F12 in Opera, NoScript in Firefox, etc) or people on handsets with Opera Mini or some lower-end mobile browsers will not be able to get to any content by following it. Even your system for highlighting links breaks on it, which of course may be fixed easily, but isn't that ironic? ;-)

  5. You are missing the point. Everything you describe can be done without hashbang/fragments. There is nothing to be gained by using them except breaking the javascript-disabled web users.

    • Gotta agree with See. The benefits you explain have to do with using Ajax to load in page content, not with the hashbang. However, the problems that come with the hashbang are still there, and they can be avoided easily.

      For an article that explains how to achieve all the benefits of a RIA without inheriting the problems associated with the hashbang see:
      https://github.com/balupton/History.js/wiki/Intelligent-State-Handling

      So, in conclusion, the hashbang doesn't offer any benefits; Rich Internet Applications do (and those are the benefits you described). Those benefits can be achieved without using the hashbang and without inheriting the hashbang's problems.

      • Don't you ever stop promoting your scripts? You must search Google every day. Your script is overly complicated. For a website with 200+ pages this script is not optimal to implement, is it?

  6. Moreover and simpler, it breaks HTTP.

    If one requests http://uri.com/#!/path/to/resource through say curl, one will receive a content-type, a response code, most likely 200 or 302, and a variety of other headers.

    The problem with this is that the response is no longer representative of the content. I will receive response 302 Found for something that is completely different than the actual content. This is where the problem lies to me.

    Furthermore, the #! hashbang is now used as a more and more common path fragment but, once again, it doesn’t exist in the protocol: http://www.ietf.org/rfc/rfc2396.txt

    Ask yourself where your referrers are, as well. Something Gawker seemed to have learnt the hard way :)

    On the other hand, I see the utility, don’t agree with the implementation that is broken by design, but I can see its utility. And I can’t wait for pushState and popState to be common enough so people don’t have to do such silly trickery.

  7. Mobile Safari nowadays *does* cache even pretty big stylesheets and javascript if normal link and script tags are used. Please test again before spreading outdated information.

    Another drawback of hash URLs: the initial page load is slower, as first the partial page is sent to the browser, JavaScript examines the hash part, and then the real content is requested. So two round-trips instead of one.

    Plus history back/forward compatibility can also be achieved without hash URLs. The only thing lost is the illusion of having an URL you can send to friends with state preserved.

    Summary: Let's petition IE9 and Opera to support pushState/popState (Chrome, Safari and Firefox 4 already do) and skip the kludge known as hash URLs until wider adoption of "modern" browsers.

    • @cschneid: The two-round-trip theory is false. Notice how Twitter accomplishes it. Click a hashbang within a page and an AJAX call is made, but if you visit that exact same URL (hashbang and everything), the dynamic page is loaded initially. I assume it's as simple as using mod_rewrite to load the full content from the start for the hashbang example.

  8. Good article, but I have to chime in with the crowd, especially cschneid. You seemed to skim over HTML5 History — pushState() and onpopstate — but that’s the real answer to all this.

    The issue is, we want the app to feel app-like, i.e. fast, responsive, etc., but we also don’t want to break the web (HTTP/REST/cURL/etc., doesn’t work w/ JS disabled, no Referer header). HTML5 History gets you that best of both worlds.

    You say the majority of browser market share doesn’t support HTML5 History, but the majority of browser market share is IE6-8. On the other hand, Firefox 4, Chrome, Safari, iPhone, iPad, Android all support HTML5 History.

    So what it boils down to is: do you break the web in order to make your app feel app-like in IE6-8? Or do you sacrifice the app-like feel in IE6-8 and progressively enhance the experience for your users in modern browsers, thereby keeping the web intact? I’m a fan of the latter approach.

    http://github.com/ is a great example of this — navigate any source tree there in Chrome/Safari/etc. and see the nice transitions between folders. IE6-8 see page refreshes. Everywhere, the URLs remain nice and clean and linkable. Win-win.

    Your mileage may vary, of course — but to me, HTML5 History is clearly the future, and IE6-8 are history (pun intended) and shouldn’t hold us back.

    A couple of caveats: IE9 also won’t have HTML5 History, and unfortunately Firefox 3.6 doesn’t at the moment.

  9. The argument that this is not RESTful in the browser is a poor point. REST isn't supported by most browsers, as most browsers only support GET and POST.

    Routing on the client is a great thing, and more effort should be put into supporting it properly. As with anything browser related, when new technologies emerge in new browsers, we rush to implement these new features. However, we are left with a major population still on old browsers, such as IE 6, that we may still be demanded to support. Therefore we have to come up with creative ways to support these features in these legacy browsers. A couple of examples that come to mind of this happening in the past are the emergence of AJAX and the use of ActiveX in IE 6 to support it and the use of CSS Layout and the many many hacks needed to support IE 5/6.

    As far as SEO goes, any web crawler that can't see AJAX content is outdated. We have the capability to run JavaScript on the server now and should take advantage of this. We must also recognize that a significant portion of content these days is not rendered on page load, and compensate for this.

  10. Expecting browsers to have JavaScript disabled these days is plain silly. JavaScript is an integral part of the working web.

  11. I’ve been browsing in firefox with NoScript for at least 3 years blocking 99% of javascript except when I want to run it.

    It’s worked, but by letting programmers get influential, the web is now an application. Which means I have to program code in order to get on the effing web?

    What happened to MARKUP in HTML?

    I share links – most of which now FAIL with hashbangs.

    Stop being so pleased with yourself for breaking something that worked better than anything invented ever before. Saying it is new and better is at best elitist and at worst masturbation.

    • The reality is that businesses want more responsive websites. Developers are forced to create hacks that support these objectives. Your choice is to name call the “elitists” and not accept this new reality, or you can be part of the solution and improve the technology so we don’t have to use hacky hashbangs. It’s simple supply and demand. Who’s the real elitist here?

    • The web has been about programming since cgi programs were introduced. Like it or not, the rest of us are working hard for you to have a good experience on the web.

      Do you actually remember what the internet was like 15 years ago? It wasn’t a very good experience compared to today. Why should developers double their efforts just because you are paranoid about javascript (something which you very clearly don’t understand).

      Maybe more people should just stick to wordpress blogs…

  12. @PaulZ nobody said it was better… like everything there is a time and a place for everything…

    I'm in the middle of making an internal HTML5 iPad app that HAS to have that app feel. Using hashbang URLs with PathJS allows us to dynamically load content without moving away from the page, giving that app feel our client wants.

    The links also preserve state – yes, it's two requests – but the main page is loaded, then the hash is handled. It being internal, we don't worry too much about the multiple requests …

  13. Hashbangs (#!) and (_escaped_fragment_) are specifically about supplying search engines with static versions of the page, and allowing the search engine to understand the equivalence of the #! links and the static content.

    The end user should never see the "_escaped_fragment_" URL, so the idea that it's messy is not really relevant. It could be "_sausage_" for all it matters; it's just a string to identify that what follows is a... um... escaped fragment.

    History tokens/fragments by themselves, meanwhile, are a perfectly fine way to exchange and store state without needing cookies or servers to know what you're looking at.
    If you really need universal links in your app that work even for people without JavaScript, then you can do it... but your app would need to supply a "?=" style link, which could be auto-converted to a # when people have JavaScript on.

    Honestly though, the idea that the net was somehow "better" before apps like Google Maps or Gmail is just nuts.

    Streaming data makes a lot of sense for many different uses, and HTML as a layout/presentation layer can work just dandy with data being streamed to it.

    "Which means I have to program code in order to get on the effing web?"

    No, it doesn't. I don't even know what you mean by that statement.

  14. With this, we have made a click handler, but I see a problem… suppose people go to localhost/page instead of localhost/#!/page. Then we need a system or a function that checks links and places the #! wherever required; otherwise localhost/page will show something else, or an error, instead of loading #!/page.
    How would you set up that listener and the state saving for this?


