Making simple things easy and complex things possible is not enough
The holy grail of good API design is making complex things possible and simple things easy. But is it enough?
One of my favorite product design principles is Alan Kay’s “Simple things should be simple, complex things should be possible”. [1]
However, in the years since, I’ve come to realize that making simple things easy and complex things possible is a good first step, but for most things, it’s not enough.
Not just about APIs
Since Alan Kay was a computer scientist, his adage is typically framed as an API design principle. However, it’s a good rule of thumb for pretty much any creative tool, any user interface designed to help people create artifacts. APIs are only an example of such an interface.
The line between creative tools and transactional processes [2] is blurry. While APIs or design and development tools are squarely in the creative tool category, what about something like Google Calendar? If you squint, it could be argued that Google Calendar is a creative tool where the artifact being created is a calendar event.
Indeed, Kay’s maxim has clearly been used in its design. Everything has sensible defaults that can be tweaked, so if all we want is to add an hour-long event at a specific date and time, we can do that with a single click at the right place in the calendar. Can’t really get simpler than that. We can drag an edge to make it shorter or longer, or drag & drop to reschedule it (direct manipulation), and we can tweak most details from the popup itself, or click “More Options” and get even more control (e.g. set guest permissions). Simple things are easy, and complex things are possible.
Which things are simple?
For a long time, I used to argue that the principle should be “Common things should be easy, uncommon things should be possible”. Often, the most common use cases are not at all simple!
Let’s take the HTML <video> element as an example.
Simple things are certainly easy: all we need to get a nice sleek toolbar that works well is a single attribute: controls. We just slap it on our <video> element and bam, we’re done with a single line of HTML:
<video src="videos/cat.mp4" controls></video>
Now let’s suppose use case complexity increases juuuust a little bit. Maybe I want to add buttons to jump 10 seconds back or forwards. Or a language picker for subtitles. Or key moment indicators, like YouTube. Or just to hide the volume control on a video that has no audio track. None of these are particularly niche, but the default controls are all-or-nothing: the only way to change them is to reimplement the whole toolbar from scratch, which takes hundreds of lines of HTML, CSS, and JavaScript to do well.
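To get a sense of the cliff, here is just the logic for one of those "slightly more complex" asks, a clamped 10-second skip, as a minimal sketch (the function name and the stubbed wiring are illustrative; a real reimplementation would also need the markup, styling, and state handling for every other control):

```javascript
// Clamped seek: jump forward or backward without ever seeking
// before the start or past the end of the media.
function skip(video, seconds) {
  video.currentTime = Math.min(
    Math.max(video.currentTime + seconds, 0),
    video.duration
  );
}

// Browser-only wiring (assumes #back/#forward buttons exist):
// document.querySelector("#back")
//   .addEventListener("click", () => skip(video, -10));
// document.querySelector("#forward")
//   .addEventListener("click", () => skip(video, 10));
```

And that is the easy part: since the built-in controls are all-or-nothing, adding this one button means rebuilding play/pause, the timeline, volume, and fullscreen too.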
The user experience of HTML <video> has a usability cliff.
Simple things are easy and complex things are possible. But once use case complexity crosses a certain (low) threshold, user effort abruptly shoots up.
This is called a usability cliff, and is common when products make simple things easy and complex things possible by providing two distinct interfaces: a very high level one that caters to the most common use case, and a very low-level one that lets users do whatever but they have to reimplement everything from scratch.
For delightful user experiences, making simple things easy and complex things possible is not enough — the transition between the two should also be smooth. The user effort required to achieve incremental value should be proportional to the value gained. There should be no point where incremental value requires orders of magnitude more user effort.
You can visualize it like this:
Apply the principle recursively
One good way to avoid cliffs is to ask yourself: among the use cases I considered “complex”, which ones are most common? Then make simple things easy for them too.
This was a big reason why PrismJS, a syntax highlighting library I wrote in 2012, became so popular, reaching over 2 billion downloads on npm and being used on some pretty huge websites [3].
Simple things were easy: highlighting code on a website that used good HTML took nothing more than including a CSS file and a script tag. Because its only hook was regular HTML, and there was no Prism-specific “handshake” in the markup, it was able to work across a large variety of toolchains, even tooling where authors had little control over the HTML produced (e.g. Markdown).
Complex things were possible: it included a simple, yet extensive system of hooks that allowed plugin authors to modify its internal implementation to do whatever by basically inserting arbitrary code at certain points and modifying state.
But beyond these two extremes, the principle was applied recursively: common complex things were also easier than uncommon complex things. For example, while adding a new language definition required more knowledge than simply using the library, a lot of effort went into reducing both the effort and the knowledge required to do so. Theming required nothing more than plain CSS, styling simple, readable class names. And as a result, the ecosystem flourished with hundreds of contributed languages and themes.
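As a sketch of how low that bar was: a Prism language definition is essentially an ordered map of token names to regexes, and those token names double as the class names that themes style. The language and tokens below are invented for illustration; real definitions (e.g. Prism.languages.javascript) follow the same shape, with more cases.

```javascript
// A toy language definition in the Prism style: token name → regex.
// Earlier entries take precedence, so comments are matched before keywords.
const mylang = {
  comment: /#.*/,
  keyword: /\b(?:let|if|else|fn)\b/,
  number: /\b\d+\b/,
};

// In a real plugin this object would be assigned to Prism.languages.mylang,
// and a theme would style .token.comment, .token.keyword, .token.number.
```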
This is a very common pattern for designing extensible software: a powerful low-level plugin architecture, with easier shortcuts for common extensibility points.
Malleable shortcuts
A corollary of Incremental user effort for incremental value is that if the interface provides a simple way to accomplish part of a complex use case, users should be able to take advantage of it to get a headstart for more complex use cases, rather than having to recreate the solution from scratch using a more complex interface.
At their core, all ways to smoothen this curve revolve around tweaking: Making sure that the solution to simple cases is sufficiently flexible that it takes a lot of use case complexity before users need to recreate the solution from scratch using lower-level primitives, if they need to at all.
This is the core issue with the <video> example: the way it makes simple things easy is completely inflexible. There are no extensibility points, no way to customize anything. It’s take it or leave it.
While web components are not typically the poster child of good user experiences, there is one aspect of web component APIs that allows them to provide a very smooth power-to-effort curve: slots.
Slots are predefined insertion points with defaults.
If I’m writing a <my-video> component, I can define its default play button like this:
<button id="play">
<slot name="play-button-icon">▶️</slot>
</button>
And now, a component consumer can use <my-video src="cat.mp4"> and get a default play button, or slot in their own icon:
<my-video src="cat.mp4">
<i slot="play-button-icon" class="fa-solid fa-play"></i>
</my-video>
But the best thing about slots is that they can be nested.
This means that component authors can defensively wrap parts of the UI in slots, and component consumers can override just the parts they need, at the granularity they need.
For example, <my-video> could also wrap the default play button itself in a slot:
<slot name="play-button">
<button id="play">
<slot name="play-button-icon">▶️</slot>
</button>
</slot>
And then, component consumers can still only override the icon, or override the whole button:
<my-video src="cat.mp4">
<button slot="play-button">
<i slot="play-button-icon" class="fa-solid fa-play"></i>
</button>
</my-video>
Slots can also be empty, acting as pure insertion points.
For example, the <my-video> component author could support inserting controls before or after the play button like so:
<slot name="play-button-before"></slot>
<slot name="play-button">
<button id="play">
<slot name="play-button-icon">▶️</slot>
</button>
</slot>
<slot name="play-button-after"></slot>
And then, component consumers can use them to add additional controls:
<my-video src="cat.mp4">
<button slot="play-button-before" class="skip-backwards"><svg>…</svg></button>
<button slot="play-button-after" class="skip-forwards"><svg>…</svg></button>
</my-video>
Given enough extension points, users would only need to resort to building custom controls from scratch when they truly have a very complex use case that cannot be implemented as a delta over the default controls. That smoothens out the curve, which may look more like this:
Let’s get back to Google Calendar for another example. Suppose we want to create a recurring event. Even within the less simple use case of creating a recurring event, there are simpler use cases (e.g. repeat every week), and more complex ones (e.g. every third week, on the third Sunday of the month, twice a week etc.).
Google Calendar has used tweakable presets to make simple things easy and complex things possible at the micro-interaction level. Simple things are easy: you just pick a preset. But these presets are not just shortcuts for common cases. They also serve as entrypoints into the more “advanced” interface that can be used to set up almost any rule — with enough user effort.
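For context, recurrence rules like these are commonly serialized as iCalendar RRULE strings (RFC 5545), and that grammar is roughly the spectrum the UI spans: the presets correspond to trivial rules, while the advanced dialog exposes more of the grammar. A few examples (annotations added for illustration):

```text
RRULE:FREQ=WEEKLY                # repeat every week
RRULE:FREQ=WEEKLY;INTERVAL=3     # every third week
RRULE:FREQ=MONTHLY;BYDAY=3SU     # the third Sunday of every month
RRULE:FREQ=WEEKLY;BYDAY=TU,TH    # twice a week, on Tuesday and Thursday
```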
Tweakable presets smoothen the curve precisely because they confine the additional user effort to the delta between the user’s use case and the simpler use case the interface is optimized for. In doing so, they also become a teaching tool for the more advanced interface, one that is much more effective than help text, which is typically skimmed or ignored.


A hierarchy of abstractions
So far, both of the malleable shortcuts we have seen revolve around extensibility and customization: making the solution to simple use cases more flexible, so that it can support medium-complexity use cases through customization.
The version of this on steroids is defining low-level primitives as building blocks, and then composing them into high-level abstractions.
My favorite end-user facing product that does this is Coda. If you haven’t heard of Coda, imagine it as a cross between a spreadsheet, a database, and a document editor.
Coda implements its own formula language, which is a way for end users to express complex logic through formulas. Think spreadsheet formulas, but a lot better. For many things, the formula language is its lowest level primitive.
Then, to make simple things easy, Coda provides a UI for common cases, but here’s the twist: The UI is generating formulas behind the scenes. Whenever users need to go a little beyond what the UI provides, they can switch to the formula editor and tweak the generated formula, which is infinitely easier than starting from scratch.
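The pattern generalizes beyond Coda. Here is a generic sketch of "the UI generates a formula behind the scenes" (not Coda’s actual internals; the function and field names are invented): each filter row the user adds in the UI becomes a clause, and the rows are joined into a single editable expression.

```javascript
// Turn UI filter rows into one formula string the user can later tweak.
// Each row is { column, op, value }; combinator joins the clauses.
function buildFilter(rows, combinator = "AND") {
  return rows
    .map(({ column, op, value }) => `${column} ${op} ${JSON.stringify(value)}`)
    .join(` ${combinator} `);
}

const formula = buildFilter([
  { column: "Status", op: "=", value: "Done" },
  { column: "Priority", op: "=", value: "High" },
]);
// The user can now edit the generated string (e.g. add parentheses,
// swap AND for OR) instead of writing the whole thing from scratch.
```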
Let’s take the filtering interface as an example, which I have written about before. At first, the filtering UI is pretty high level, designed around common use cases:
Another nice touch: “And” is not just communicating how multiple filters are combined, but is also a control that lets users edit the logic.
For the vast majority of use cases, the high-level UI is perfectly sufficient. If you don’t need additional flexibility, you may not even notice the little f button on the top right. But for those who need additional power, it can be a lifesaver. That little f indicates that behind the scenes, the UI is actually generating a formula for filtering. Clicking it opens a formula editor, where you can edit the formula directly:
I suspect that even for the use cases that require that escape hatch, a small tweak to the generated formula is all that is necessary. The user may not have been able to write the formula from scratch, but tweaking is easier. As one data point, the one time I used this, it was just about using parentheses to combine AND and OR differently than the UI allowed.
Smoothening the curve is not just about minimizing user effort for a theoretical user that understands your interface perfectly (efficiency), it’s also about minimizing the effort required to get there (learnability). The fewer primitives there are, the better. Defining high-level abstractions in terms of low-level primitives is also a great way to simplify the user’s mental model and keep cognitive load at bay. It’s an antipattern when users need to build multiple different mental models for accomplishing subtly different things.
When high-level abstractions are defined as predefined configurations of the existing low-level primitives, there is only one mental model users need to build. The high level primitives explain how the low-level primitives work, and allow users to get a headstart for addressing more complex use cases via tweaking rather than recreating. And from a product design perspective, it makes it much easier to achieve smooth power-to-effort curves because you can simply define intermediate abstractions rather than having to design entirely separate solutions ad hoc.
For the Web Platform, this was basically the core point of the Extensible Web Manifesto, which those of you who have been around for a while may remember: It aimed to convince standards editors and browsers to ship low-level primitives that explain how the existing high-level abstractions worked.
Low-level doesn’t mean low implementation effort
Low-level primitives are building blocks that can be composed to solve a wider variety of user needs, whereas high-level abstractions focus on eliminating friction for a small set of user needs. Think of it that way: a freezer meal of garlic butter shrimp is a high-level abstraction, whereas butter, garlic, and raw shrimp are some of the low-level primitives that go into it.
The low-level vs high-level distinction refers to the user experience, not the underlying implementation. Low-level primitives are not necessarily easier to implement, and are often much harder. Since they can be composed in many different ways, there is a much larger surface area that needs to be designed, tested, documented, and supported. It’s much easier to build a mortgage calculator than a spreadsheet application.
As an extreme example, a programming language is one of the most low-level primitives possible: with enough effort it can be used to build anything, and it is not optimized for any particular use case. Compare the monumental effort needed to design and implement a programming language to that needed to implement e.g. a weather app, which is a high-level abstraction that is optimized for a specific use case and can be prototyped in a day.
As another extreme example, it could even be argued that an AI agent like ChatGPT is actually a low-level primitive from a UX perspective, despite the tremendous engineering effort that went into it. It is not optimized for any particular use case, but with the right prompt, it can be used to effectively replace many existing applications. The floor and ceiling model also explains what is so revolutionary about AI agents: despite having a very high ceiling, their floor is as low as it gets.
Reveal complexity progressively
Another corollary of Incremental user effort should produce incremental value is that things that produce no value should not incur user effort. Complexity should be tucked away until it’s needed. Users should not have to deal with complexity that is not relevant to them. Enterprise software, I’m looking at you.
For example, individual user accounts should not need to set up “workspaces” separately from setting up their account, or designate representatives for different business functions (legal, accounting, etc.). This is complexity that belongs to complex use cases leaking out to simple ones. Any concepts exposed through a UI should add user-facing value. If a concept does not add user-facing value, it should not be exposed to users.
And for APIs, this emphasizes the importance of sensible defaults, so that users don’t need to make a ton of micro-decisions that may be entirely irrelevant to them.
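As a minimal sketch of what that looks like in an API (the function and option names are invented for illustration): when every option has a sensible default, the simple call stays a one-liner, and callers pay only for the delta they actually care about.

```javascript
// Every option is optional; the common case needs none of them.
function createEvent(
  title,
  { duration = 60, visibility = "default", guestsCanModify = false } = {}
) {
  return { title, duration, visibility, guestsCanModify };
}

// Simple thing, easy:
createEvent("Coffee chat");

// Complex thing, possible, with effort proportional to the delta:
createEvent("Board meeting", { duration: 90, guestsCanModify: true });
```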
When a shorter curve is the right call
Every design principle is a rule of thumb, not a hard and fast law. Sometimes, there are good reasons not to make the curve extend across the entire spectrum.
When not motivated by user needs
Some products are deliberately framed around only one end of the spectrum. While they could extend their curve a little, their entire value proposition lies at that end, so it doesn’t make much sense to invest resources in improving the other.
Professional tools are an example where focusing around complex things being possible may be acceptable, such as airplane cockpits, or Photoshop. Tools that require a high level of domain expertise can typically afford to require some training, as said training often happens at the same time as acquiring the domain expertise. For example, a pilot learns how an airplane cockpit works while also learning how to fly.
For many of these tools, use cases are so variable that making simple things significantly easier would turn them into a different product. For example, Photoshop is a professional-grade graphics editor that can be used for a large variety of graphics-related tasks. Focusing on a specific subset of use cases, say photo manipulation, doesn’t give us a better Photoshop UI; it gives us Lightroom. Is there a way to combine the two into a single product so that users don’t need to understand when to use which tool, without making both actively worse? Perhaps, but it’s not at all obvious.
On the other hand, something like Instagram’s photo editor makes it trivial to perform simple photo manipulations that look good with very little user effort and no domain expertise (low floor), but is quite limited in its capabilities; there are many things it simply cannot do (low ceiling). While there is a lot of room for improvement, making significantly more complex things possible is largely out of scope as beyond a certain point it would require domain expertise that Instagram’s target audience lacks.
Security & privacy
Sometimes, decomposing a high-level solution into low-level primitives can introduce security & privacy issues that a more tightly coupled high-level solution can avoid.
When I was in the TAG, at some point we reviewed a proposal for a low-level API which would allow websites to read the list of fonts installed on the user’s system. This raised huge red flags about user privacy and fingerprinting. However, upon closer inspection, it turned out that the use cases were highly concentrated, nearly all of them variations of the same scenario: letting end-users select a font from their system to apply it to a given artifact (e.g. a document, a graphic, etc.). A high-level font picker form control, where the browser takes care of displaying the list of fonts and only communicates the selected font back to the application, would both address privacy concerns and make the API easier to use.
Performance
Sometimes, design decisions are driven by performance considerations, rather than usability principles.
For example, CSS selectors got :focus-within to match elements that contain a focused element long before :has() was added, which allows targeting ancestors in a much more generic way. There was no question that :has() would have been a better solution, but it was considered impossible to implement performantly at the time :focus-within was designed. And even today, browsers apply certain optimizations to :focus-within that make it perform better than :has().
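To make the comparison concrete, here is the same styling expressed with each selector (the .field class is illustrative):

```css
/* Highlight a form field when any control inside it is focused. */

/* The narrow, performance-friendly pseudo-class: */
.field:focus-within {
  outline: 2px solid royalblue;
}

/* The generic ancestor-targeting version, once :has() became available: */
.field:has(:focus) {
  outline: 2px solid royalblue;
}
```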
Other times, sensible defaults are not possible because the common case is also the slowest.
This is the reason why inherits is mandatory when registering CSS properties: the default that would be best for users (true) is also the slowest, but making false the default would mean that registered properties behave differently from native properties by default. Instead, it was decided to not have a default, and make the descriptor mandatory.
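The same requirement is visible in the declarative @property syntax, where the inherits descriptor must always be spelled out (the property name and values below are illustrative):

```css
@property --accent {
  syntax: "<color>";
  inherits: false; /* mandatory: omitting it makes the rule invalid */
  initial-value: rebeccapurple;
}
```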
Which comes first, convenience or capability?
Alan Kay’s maxim only deals with what to do, not when to do it. There is no discussion around prioritization. But in the real world, the when is just as important as the what.
Sure, let’s make simple things easy and complex things possible. But which solution do you ship first? Which one do you design first?
Stay tuned for Part 2, which will cover exactly this!
Kay himself replied on Quora and provided background on this quote. Don’t you just love the internet? ↩︎
a distinction I first read about in Joe McLean’s brilliant post on overfitting ↩︎
The other one being that it was the only one at the time that made syntax highlighting actually look good ↩︎