3 posts on Web Almanac • Lea Verou

Parsel: A tiny, permissive CSS selector parser

7 September 2020 3 min read 0 comments Report broken page

I’ve posted before about my work for the Web Almanac this year. To make it easier to calculate the stats about CSS selectors, we looked to use an existing selector parser, but most were too big and/or had dependencies or didn’t account for all selectors we wanted to parse, and we’d need to write our own walk and specificity methods anyway. So I did what I usually do in these cases: I wrote my own!

You can find it here: https://projects.verou.me/parsel/

It not only parses CSS selectors, but also includes methods to walk the AST produced, as well as calculate specificity as an array and convert it to a number for easy comparison.

It is one of my first libraries released as an ES module, and there are instructions about both using it as a module, and as a global, for those who would rather not deal with ES modules yet, because convenient as ESM are, I wouldn’t want to exclude those less familiar with modern JS.

Please try it out and report any bugs! We plan to use it for Almanac stats in the next few days, so if you can spot bugs sooner rather than later, you can help that volunteer effort. I’m primarily interested in (realistic) valid selectors that are parsed incorrectly. I’m aware there are many invalid selectors that are parsed weirdly, but that’s not a focus (hence the “permissive” aspect, there are many invalid selectors it won’t throw on, and that’s by design to keep the code small, the logic simple, and the functionality future-proof).

How it works

If you’re just interested in using this selector parser, read no further. This section is about how the parser works, for those interested in this kind of thing. :)

I first started by writing a typical parser, with character-by-character gobbling and different modes, with code somewhat inspired by my familiarity with jsep. I quickly realized that was a more fragile approach for what I wanted to do, and would result in a much larger module. I also missed the ease and flexibility of doing things with regexes.

However, since CSS selectors include strings and parens that can be nested, parsing them with regexes is a fool’s errand. Nested structures are not regular languages as my CS friends know. You cannot use a regex to find the closing parenthesis that corresponds to an opening parenthesis, since you can have other nested parens inside it. And it gets even more complex when there are other tokens that can nest, such as strings or comments. What if you have an opening paren that contains a string with a closing paren, like e.g. ("foo)")? A regex would match the closing paren inside the string. In fact, parsing the language of nested parens (strings like (()(()))) with regexes is one of the typical (futile) exercises in a compilers course. Students struggle to do it because it’s an impossible task, and learn the hard way that not everything can be parsed with regexes.

Unlike a typical programming language with lots of nested structures however, the language of CSS selectors is more limited. There are only two nested structures: strings and parens, and they only appear in specific types of selectors (namely attribute selectors, pseudo-classes and pseudo-elements). Once we get those out of the way, everything else can be easily parsed by regexes. So I decided to go with a hybrid approach: The selector is first looked at character-by-character, to extract strings and parens. We only extract top-level parens, since anything inside them can be parsed separately (when it’s a selector), or not at all. The strings are replaced by a single character, as many times as the length of the string, so that any character offsets do not change, and the strings themselves are stored in a stack. Same with parens.

After that point, this modified selector language is a regular language that can be parsed with regexes. To do so, I follow an approach inspired by the early days of Prism: An object literal of tokens in the order they should be matched in, and a function that tokenizes a string by iteratively matching tokens from an object literal. In fact, this function was taken from an early version of Prism and modified.

After we have the list of tokens as a flat array, we can restore strings and parens, and then nest them appropriately to create an AST.

Also note that the token regexes use the new-ish named capture groups feature in ES2018, since it’s now supported pretty widely in terms of market share. For wider support, you can transpile :)

Introspecting CSS via the CSS OM: Get supported properties, shorthands, longhands

27 July 2020 3 min read 0 comments Report broken page

For some of the statistics we are going to study for this year’s Web Almanac we may end up needing a list of CSS shorthands and their longhands. Now this is typically done by maintaining a data structure by hand or guessing based on property name structure. But I knew that if we were going to do it by hand, it’s very easy to miss a few of the less popular ones, and the naming rule where shorthands are a prefix of their longhands has failed to get standardized and now has even more exceptions than it used to. And even if we do an incredibly thorough job, next year the data structure will be inaccurate, because CSS and its implementations evolve fast. The browser knows what the shorthands are, surely we should be able to get the information from it …right? Then we could use it directly if this is a client-side library, or in the case of the Almanac, where code needs to be fast because it will run on millions of websites, paste the precomputed result into whatever script we run.

There are essentially two steps for this:

Get a list of all CSS properties
Figure out how to test if a given property is a shorthand and how to get its longhands if so.

I decided to tell this story in the inverse order. In my exploration, I first focused on figuring out shorthands (2), because I had coded getting a list of properties many times before, but since (1) is useful in its own right (and probably in more use cases), I felt it makes more sense to examine that first.

Note: I’m using document.body instead of a dummy element in these examples, because I like to experiment in about:blank, and it’s just there and because this way you can just copy stuff to the console and try it wherever, even right here while reading this post. However, if you use this as part of code that runs on a real website, it goes without saying that you should create and test things on a dummy element instead!

Getting a list of all CSS properties from the browser

In Chrome and Safari, this is as simple as Object.getOwnPropertyNames(document.body.style). However, in Firefox, this doesn’t work. Why is that? To understand this (and how to work around it), we need to dig a bit deeper.

In Chrome and Safari, element.style is a CSSStyleDeclaration instance. In Firefox however, it is a CSS2Properties instance, which inherits from CSSStyleDeclaration. CSS2Properties is an older interface, defined in the DOM 2 Specification, which is now obsolete. In the current relevant specification, CSS2Properties is gone, and has been merged with CSSStyleDeclaration. However, Firefox hasn’t caught up yet.

Firefox on the left, Safari on the right. Chrome behaves like Safari.

Since the properties are on CSSStyleDeclaration, they are not own properties of element.style, so Object.getOwnPropertyNames() fails to return them. However, we can extract the CSSStyleDeclaration instance by using __proto__ or Object.getPrototypeOf(), and then Object.getOwnPropertyNames(Object.getPrototypeOf(document.body.style)) gives us what we want!

So we can combine the two to get a list of properties regardless of browser:

let properties = Object.getOwnPropertyNames(
	style.hasOwnProperty("background")?
	style : style.__proto__
);

And then, we just drop non-properties, and de-camelCase:

properties = properties.filter(p => style[p] === "") // drop functions etc
	.map(prop => { // de-camelCase
		prop = prop.replace(/[A-Z]/g, function($0) { return '-' + $0.toLowerCase() });
if (prop.indexOf("webkit-") > -1) {
prop = "-" + prop;
}
return prop;
});

You can see a codepen with the result here:

https://codepen.io/leaverou/pen/eYJodjb?editors=0010

Testing if a property is a shorthand and getting a list of longhands

The main things to note are:

When you set a shorthand on an element’s inline style, you are essentially setting all its longhands.
element.style is actually array-like, with numerical properties and .length that gives you the number of properties set on it. This means you can use the spread operator on it:

> document.body.style.background = "red";
> [...document.body.style]
< [
	"background-image",
	"background-position-x",
	"background-position-y",
	"background-size",
	"background-repeat-x",
	"background-repeat-y",
	"background-attachment",
	"background-origin",
	"background-clip",
	"background-color"
]

Interestingly, document.body.style.cssText serializes to background: red and not all the longhands.

There is one exception: The all property. In Chrome, it does not quite behave as a shorthand:

> document.body.style.all = "inherit";
> [...document.body.style]
< ["all"]

Whereas in Safari and Firefox, it actually returns every single property that is not a shorthand!

Firefox and Safari expand all to literally all non-shorthand properties.

While this is interesting from a trivia point of view, it doesn’t actually matter for our use case, since we don’t typically care about all when constructing a list of shorthands, and if we do we can always add or remove it manually.

So, to recap, we can easily get the longhands of a given shorthand:

function getLonghands(property) {
	let style = document.body.style;
	style[property] = "inherit"; // a value that works in every property
	let ret = [...style];
	style.cssText = ""; // clean up
	return ret;
}

Putting the pieces together

You can see how all the pieces fit together (and the output!) in this codepen:

https://codepen.io/leaverou/pen/gOPEJxz?editors=0010

How many of these shorthands did you already know?

Releasing MaVoice: A free app to vote on repo issues

11 July 2020 2 min read 0 comments Report broken page

First off, some news: I agreed to be this year’s CSS content lead for the Web Almanac! One of the first things to do is to flesh out what statistics we should study to answer the question “What is the state of CSS in 2020?”. You can see last year’s chapter to get an idea of what kind of statistics could help answer that question.

Of course, my first thought was “We should involve the community! People might have great ideas of statistics we could study!”. But what should we use to vote on ideas and make them rise to the top?

I wanted to use a repo to manage all this, since I like all the conveniences for managing issues. However, there is not much on Github for voting. You can add 👍 reactions, but not sort by them, and voting itself is tedious: you need to open the comment, click on the reaction, then go back to the list of issues, rinse and repeat. Ideally, I wanted something like UserVoice™️, which lets you vote with one click, and sorts proposals by votes.

And then it dawned on me: I’ll just build a Mavo app on top of the repo issues, that displays them as proposals to be voted on and sorts by 👍 reactions, UserVoice™️-style but without the UserVoice™️ price tag. 😎 In fact, I had started such a Mavo app a couple years ago, and never finished or released it. So, I just dug it up and resurrected it from its ashes! It’s — quite fittingly I think — called MaVoice.

You can set it to any repo via the repo URL parameter, and any label via the labels URL param (defaults to enhancement) to create a customized URL for any repo you want in seconds! For example, here’s the URL for the css-almanac repo, which only displays issues with the label “proposed stat”: https://projects.verou.me/mavoice/?repo=leaverou/css-almanac&labels=proposed%20stat

While this did need some custom JS, unlike other Mavo apps which need none, I’m still pretty happy I could spin up this kind of app with < 100 lines of JS :)

Yes, it’s still rough around the edges, and I’m sure you can find many things that could be improved, but it does the job for now, and PRs are always welcome 🤷🏽‍♀️

The main caveat if you decide to use this for your own repo: Because (to my knowledge) Github API still does not provide a way to sort issues by 👍 reactions, or even reactions in general (in either the v3 REST API, or the GraphQL API), issues are instead requested sorted by comment count, and are sorted by 👍 reactions client-side, right before render. Due to API limitations, this API call can only fetch the top 100 results. This means that if you have more than 100 issues to display (i.e. more than 100 open issues with the given label), it could potentially be inaccurate, especially if you have issues with many reactions and few comments.

Another caveat is that because this is basically reactions on Github issues, there is no limit on how many issues someone can vote on. In theory, if they’re a bad actor (or just overexcited), they can just vote on everything. But I suppose that’s an intrinsic problem with using reactions to vote for things, having a UI for it just reveals the existing issue, it doesn’t create it.

Hope you enjoy, and don’t forget to vote on which CSS stats we should study!