Get your hash — the bulletproof way

This is probably one of those things that everyone thinks they know how to do, but many end up doing wrong. After coming across yet another super fragile snippet of code for this, I decided a blog post was in order.

The problem

You want to remove the pound sign (#) from location.hash. For example, when the hash is "#foo", you want to get a string containing "foo". That’s really simple, right?

Tricky cases

What most developers seem to miss is that in modern, JavaScript-heavy applications, a hash can contain any Unicode character. It doesn’t necessarily have to correspond to the value of an actual id attribute in the page. And even when it does, id attributes can now contain almost any Unicode character. Another thing sometimes forgotten is that there might be no hash in the URL at all. Even in a URL that ends in #, location.hash is actually equal to "" (the empty string) and not "#".
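The empty-hash behaviour can be checked outside a browser with the URL class, whose hash property is specified with the same rule as location.hash. This is only a minimal sketch: it assumes a runtime with a global URL (current browsers, Node.js), and example.com is a placeholder address, not something from the post.

```javascript
// example.com is a placeholder; URL#hash is used here as a stand-in for location.hash.
const noHash = new URL("http://example.com/page").hash;
const bareHash = new URL("http://example.com/page#").hash;
const realHash = new URL("http://example.com/page#foo").hash;

console.log(JSON.stringify(noHash));   // ""
console.log(JSON.stringify(bareHash)); // "" (not "#")
console.log(JSON.stringify(realHash)); // "#foo"
```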

Naive approaches

This one is the most recent, found in a book I was tech reviewing:

var hash = location.hash.match(/#(\w+)/)[1];

which has quite a few issues:

  • Returns wrong results when there is any non-latin or non-alphanumeric character in the hash. For example, for the hash #foo@o#bar$%huh hello, just "foo" would be returned.
  • Throws a TypeError when location.hash is empty, since .match() will return null.
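Both failures are easy to reproduce on plain strings. In the sketch below, naiveMatch is a hypothetical helper name (not from the book) that applies the same pattern to a string standing in for location.hash, so it can run outside a browser.

```javascript
// Hypothetical helper: the pattern from the book, applied to any string
// instead of location.hash.
function naiveMatch(hashString) {
  return hashString.match(/#(\w+)/)[1];
}

// \w+ stops at the first character that is not a letter, digit or underscore.
console.log(naiveMatch("#foo@o#bar$%huh hello")); // foo

// With an empty hash, .match() returns null and indexing it throws.
try {
  naiveMatch("");
} catch (error) {
  console.log(error instanceof TypeError); // true
}
```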

Other variations of this pattern I’ve seen include using explicitly defined character classes instead of \w, adding an anchor (^) before the pound sign (which is an excellent idea for performance) and checking if .match() actually returned something before using its result. However, they usually still suffer from at least one of the two aforementioned issues.

Another approach a friend of mine once used was this:

var hash = location.hash.split('#')[1];

This also has its issues, which are ironically fewer than those of the first one, even though it seems a far more naive approach.

  • With the same test hash, it would at least get the "foo@o" part, which means it only fails when the hash contains a pound sign.
  • When there’s no hash, it doesn’t throw an error, although it returns undefined instead of the empty string.
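The two points above can be seen on plain strings. As before, this is just a sketch: splitHash is a hypothetical helper name that applies the split-based approach to a string standing in for location.hash.

```javascript
// Hypothetical helper: the split-based approach applied to a plain string.
function splitHash(hashString) {
  return hashString.split("#")[1];
}

console.log(splitHash("#foo"));                  // foo
console.log(splitHash("#foo@o#bar$%huh hello")); // foo@o (cut at the second #)
console.log(splitHash(""));                      // undefined, not ""
```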

Getting it right

The approach I usually use is far simpler than both of the above and probably looks too loose:

var hash = location.hash.substring(1);

However, let’s examine it a bit:

  • With our weird test hash, it actually returns the correct result: “foo@o#bar$%huh hello”
  • When no hash exists, it correctly returns the empty string

“But it assumes there’s a pound sign at the start of the string!” I almost hear some of you cry. Well, that could be a real concern if we were dealing with an arbitrary string. In that case, we would have to check whether there’s actually a pound sign first, or whether the string even exists. However, with location.hash, the only case where that is not true is when there is no hash. And we’ve got that case covered. ;)

Edit: As pointed out in the comments, you may also use location.hash.slice(1) instead of substring. I kinda prefer it, since it’s 4 bytes shorter.
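To check this against the same inputs as the naive approaches, here is a small sketch on plain strings. stripHash and testHash are hypothetical names used only for illustration; in a page you would call the method on location.hash directly.

```javascript
// Hypothetical helper: drop the first character of a location.hash-style string.
function stripHash(hashString) {
  return hashString.substring(1);
}

const testHash = "#foo@o#bar$%huh hello";

console.log(stripHash(testHash));           // foo@o#bar$%huh hello
console.log(JSON.stringify(stripHash(""))); // ""

// slice(1) gives the same result for these inputs.
console.log(testHash.slice(1) === stripHash(testHash)); // true
console.log("".slice(1) === stripHash(""));             // true
```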

If however you’re obsessed with RegExps and want to do it with them no matter what, this is just as bulletproof and almost as short:

var hash = location.hash.replace(/^#/, '');

If for some reason (OCD?) you want to do it with .match() no matter what, you could do this:

var match = location.hash.match(/^#?(.*)$/)[1];

In that case, since the pound sign is optional, .match() never returns null. And no, the pound sign never erroneously becomes part of the returned hash: the quantifier in #? is greedy, so the regex engine consumes a leading pound sign before the capture group gets a chance to match it.
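Here is a quick sketch that runs both regex variants over the cases discussed in this post. The helper names viaReplace and viaMatch are hypothetical, and plain strings stand in for location.hash.

```javascript
// Hypothetical helpers wrapping the two regex variants.
function viaReplace(hashString) {
  return hashString.replace(/^#/, "");
}

function viaMatch(hashString) {
  // #? is optional, so even the empty string matches and [1] is always defined.
  return hashString.match(/^#?(.*)$/)[1];
}

for (const sample of ["", "#", "#foo", "#foo@o#bar$%huh hello"]) {
  // Each line prints the same value twice, e.g. "foo" "foo" for "#foo".
  console.log(JSON.stringify(viaReplace(sample)), JSON.stringify(viaMatch(sample)));
}
```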

“This is too basic, what a waste of my time!”

Sorry for that. I know that for some of you, this is elementary. But the guy who wrote that book is very knowledgeable (the book is really good, apart from that code snippet), so I figured there must be many good developers out there who get this wrong, and that this post needed to be written. If you’re not one of them, you can take it as a compliment.

“Hey, you missed something too!”

In that case, I’d love to find out what it is, so please leave a comment! :)

  • http://www.jbird-design.com Jordan Burke

     While this is basic, it’s most certainly not a waste of my time. I swear, I learn something new every time I visit your blog, but not just new – relevant as well.  Thanks for that!

  • http://twitter.com/jeef3 Jeff K

     As soon as I read “The Problem” I thought “oh no, what’s wrong with my location.hash.substring(1) idea”

    • http://twitter.com/floriancargoet Florian Cargoet

       I had the exact same thought!

  • http://twitter.com/myerscb Corey Myers

    Do not apologize for being a smart cookie helping out all the noobies out there.  Anything you publish is pure gold as far as I’m concerned.

    • http://leaverou.me Lea Verou

      Awww, that’s one of the nicest things I’ve heard for my blog! Thank you! 

      • http://dan.cx/ Daniel15

        It’s definitely true! You’re a genius and all your blog posts are great. :)

  • http://profiles.google.com/bowdowntonoone Marek Stasikowski

    Didn’t quite get it with the crying about the assumption about a pound sign. What is that all about?

    • http://leaverou.me Lea Verou

      If you don’t get it, good for you. I get all kinds of weird questions so I try to answer them before they’re asked, just in case.

  • http://twitter.com/stephband stephband

    Hmm. It’s a bit of an aside, but it’s worth noting that if you’re working on a webapp, it’s likely you’ll be dealing with hash bangs, the constructs that allow google to index paths that you put in the hash. A case in point from new Twitter:

    https://twitter.com/#!/LeaVerou

    As a front end, I’m almost never interested in these, but I am usually interested if there is an id in the hash. Ids can’t start with numbers (http://www.w3.org/TR/CSS21/syndata.html#characters – which is referenced from the CSS3 recommendation as well, so I’m assuming it’s current. I may be wrong, please point me in the right direction…). Given all that, I find that something like 

    /^#([^0-9!].*)/

    …is a reasonable filter.

    • http://leaverou.me Lea Verou

      In HTML5 there are no such restrictions. The only restriction is that the id doesn’t contain whitespace: http://www.w3.org/TR/html5/elements.html#the-id-attribute

      • http://twitter.com/stephband stephband

        Interestingly that spec also says there are “no additional restrictions on the tokens authors can use in the class attribute”…

        http://www.w3.org/TR/html5/elements.html#classes

        But I tried classes beginning with numbers yesterday, and they didn’t work :(

        • http://mathiasbynens.be/ Mathias

          @stephband In the article, Lea links to http://mathiasbynens.be/notes/html5-id-class , which explains classnames (and IDs, as of HTML5) allow pretty much everything. You’ll just need to write your CSS a little differently than you would expect. See the section on “How about the CSS then?” in said article for more information.

  • http://cheeaun.com/ cheeaun

    I use `var hash = location.hash.slice(1);` but oh well… :)

    • http://leaverou.me Lea Verou

      Nice! Added in the post. Better than mine, since it’s shorter. 

  • Anonymous

     Ah, I would usually use 

    location.hash.replace(/^#/, '');

    but slice(1)/substring(1) methods are actually much, much sexier.

  • skylamer

    i like your design, lea :)

  • http://twitter.com/segdeha Andrew Hedges

    I settled on the slice method in my post from 2008 on the subject: http://ahedg.es/15

    • http://leaverou.me Lea Verou

      Nice, thanks! Didn’t know that there were browser inconsistencies with the empty hash!

  • http://beben-koben.myopenid.com/ Beben Koben

    this web show posted master class level..WOW
    finally landing here :D
    Beben Koben si bloglang anu ganteng kalem tea m/

  • Steve

    Why do you call it a pound sign? £ is a pound and # is a hash?

    • http://leaverou.me Lea Verou

      Haha, yeah, I was informed about that by my British boyfriend. Turns out that it’s only called a “pound sign” in the US.

  • Nate Cavanaugh

    Or, if you want to have a good balance of grokability and size, go for:
    location.hash.substr(1)

    Not as short as slice, but I’d wager the method name is more understandable to a wider audience than knowing what slice may do (which is typically used with arrays and not strings). Of course, this is pretty debatable, but I’ll take the 1 extra char over slice for its clarity.

  • http://jquery4u.com/ jQuery4u

    Thanks
