Skip to content

Patterns and related features #546

@munificent

Description

@munificent

We're starting to investigate a couple of related language changes around tuples, pattern matching, and destructuring. These features touch on other things like "data classes", so I wanted to kick things off by trying to sketch out my understanding of the space, how I think the various features relate to each other, and the relevant issues people have filed.

Draft feature specification:
https://github.com/dart-lang/language/blob/master/accepted/future-releases/0546-patterns/feature-specification.md

Implementation issue: dart-lang/sdk#50585

Concepts

As I understand it, the core concepts are:

Patterns

Languages with pattern matching revolve around patterns. "Expression" and "statement" are both syntactic categories in the grammar. Patterns form a third category. Most languages that have pattern matching have a variety of different kinds of patterns they support. The basic idea with a pattern is that it:

  • Can be tested against some value to determine if the pattern matches. Some kinds of patterns always match.
  • If (and only if) it does match, the pattern may bind a new variable in some scope.

Languages with patterns use them in a variety of places. They can be how you declare variables and catch exceptions. Some languages use them for defining parameter lists or "overloads" of a function. Every language with patterns has some kind of explicit pattern matching expression or statement...

Pattern matching

Once you have patterns, it makes sense to introduce a control flow structure that uses them. A pattern matching statement takes a value that it matches against. Then it has a series of pairs of patterns and bodies. At runtime, the implementation tests each pattern against the value in turn. The first pattern that matches has its body executed. Any variables the pattern binds are only available in the corresponding body.

If you really want a functional style, an even more powerful form is a pattern matching expression that lets you do pattern matching in the middle of a larger expression and have it evaluate to a value. In order to do that in a sound way, though, you need the next concept...

Exhaustiveness

So you're executing a pattern matching construct, walking through all of the patterns to find a match. What happens if none of the cases match? If the pattern matching construct is an expression, this is a real problem because what value do you evaluate to in that case? Even when you are using pattern matching as a statement, users still likely want to know if it's possible for a value to not match any of the cases.

To help with that, many languages with pattern matching also do exhaustiveness checking. The static checker will examine all of the patterns and determine if it's possible for a value to sneak through and match none of them. If so, the compiler reports a warning or error to let the user know. Dart does a limited form of this now with switch statements on enum types. If you miss one of the enum cases, Dart gives you a warning to let you know. Exhaustiveness checking in pattern matching takes that concept and applies it to much larger, more complex patterns.

Destructuring

A list literal expression takes a number of smaller values, the list element expressions, and composes a new larger value out of them, the list. Patterns go in the other direction. For example, a list pattern contains a series of nested patterns for elements. When the list pattern is matched against a list, it destructures the list by walking the elements and matching them against the corresponding element patterns.

This gives you a really nice terse way to pull pieces of aggregate objects like lists, maps, and even instances of classes. Any time you have a pattern that contains nested subpatterns, you're usually doing some kind of destructuring.

Tuples

Lists are a great way to collect a series of values of the same type, but they work poorly when you want to, say return a number and a string from a function. Since lists only have a single element type, the type of each separate element is lost.

Tuples fix that. A tuple is sort of like a fixed-size list where each element has its own distinct type. The type system can see "into" the tuple. A tuple of an int followed by a String has a different static type than a tuple of two booleans. Syntactically, tuples are usually a comma-separated list of expressions surrounded by parentheses, like (1, "string").

Tuples are useful in a statically-typed language for things like multiple return values because they maintain precise types. Once you have a tuple, you eventually need to pull the elements back out of it. You can provide number-like getters, but that gets tedious. Languages with tuples almost always have some kind of tuple pattern so that you can destructure them.

Algebraic data types, sum types, unions with fields

Pattern matching comes to us from the ML language family. Another key feature of those languages that ties into pattern matching are algebraic data types, also known as sum types, discriminated unions, or tagged unions. To make matters more confusing, these are quite different from both union types and (untagged) unions in C and C++. Algebraic data types are sometimes abbreviated ADT, which is again distinct from abstract data types. Thanks, computer scientists.

Sum types are like superpowered unions. You have a type that contains a fixed, closed set of cases. Unlike Dart enums, each case may also have its own fields containing extra data. For example, you might have a Shape type, with cases for Rect and Circle. Rect would have four coordinates for its corners. Circle would have a center and radius.

In an object-oriented language like Dart, you'd model this using subclasses. You'd have an abstract Shape class and Rect and Circle subclasses. Subclasses and ADTs are sort of duals. Indeed, Scala's case classes are like ADTs but are subclasses under the hood.

These come into discussions of pattern matching because the typical way to define behavior on specific cases in a sum type is by using a pattern match on the different cases. Likewise, the way to extract each case's fields is by using a case pattern to destructure them. Something like:

someShape match {
  case Rect(left, top, right, bottom) => ...
  case Circle(x, y, radius) => ...
}

It would be very strange to add algebraic data types to a language without a nice way to switch on and decompose them like this.

"Data classes" and "value types"

One of the most frequent feature requests for Dart is some kind of data classes or value types. The former is inspired by Kotlin's corresponding feature. The latter means different things to different people, but common attributes seem to be:

  • A lightweight way of defining a named type with some fields without having to explicitly declare a constructor that initializes each field.

  • An automatically provided implementation of hashCode and operator ==() so that the resulting object has "value semantics".

  • Some kind of immutability. The fields can't be modified. Some ask for deep immutability—the values of all fields must themselves be immutable. This usually implies that the data class can't be extended either.

  • Often some easy way to clone or make a new copy of an object with some fields changed.

Data classes are involved with patterns and pattern matching because users also often request that a data class also implicitly provides destructuring support. (Kotlin data classes give you this.) That means some kind of pattern to let you pull apart the fields in an object.

Also, part of the motivation for both algebraic data types and data classes is a nicer notation for defining a new composite type without all of the boilerplate overhead of a full class declaration. In other words, a good ADT feature might subsume data classes or vice versa.

Type patterns

The last concept which has mostly come up internally but exposes a capability the language doesn't otherwise offer is some kind of way to expose the runtime type argument of an object. If you have patterns and pattern matching, a natural solution is a type pattern that matches an object of some generic type and binds variables to the object's type arguments.

Structure

We are very early in our investigation. This is a family of features that are very close to my heart. I wrote a doc in 2011 requesting that we add pattern matching to Dart. At the same time, as you can see, this is a big sprawling feature and we currently have our hands full with extension methods and non-nullable types (both very large features in their own right!).

That being said, here is how I am interested in approaching the space and what I hope we can do:

  1. Define a set of pattern syntax and semantics. I think patterns are the most important core to all of these features. You obviously can't do pattern matching, destructuring, and exhaustiveness checking at all without them. You technically can do tuples, sum types, and data classes, but I think they are significantly hampered without them.

    Also, while it may not be obvious, I believe there are some real tricky design challenges in this area. Patterns are a dense grammatical region where we want to be able to express a wide variety of behaviors in a small amount of syntax. We need to be able to destructure lists, maps, sets, and tuples. Do runtime type tests. Pull fields out of, at least, sum types or data classes. And, of course, test for literals and constant values for simple switch-like cases.

    We are constrained by the expectations users bring from other languages, the desire for patterns to mirror their corresponding expression forms, and (hardest) Dart's existing notation for variable declarations, since those are already a limited form of "pattern" in that they introduce bindings.

    I think getting this right is key.

  2. Define a set of language constructs where patterns can be used. Some new kind of pattern matching statement is the obvious one. But also retrofitting them into variable declarations so that you can destructure in a declaration. Can we re-engineer catch clauses to be pattern based so that you can have more sophisticated catches? Should parameter lists use patterns? Do we want an expression-based pattern match? Some kind of if-let-like statement? This is the fun stuff. Once users have internalized how patterns work, the more places they can use that knowledge the better.

  3. Add tuples. This can happen in parallel with the previous one. Much like adding sets, this is a whole feature in its own right with new syntax, object representation, runtime behavior, and type checking.

    @lrhn has already put some work into defining this.

  4. User-defined destructuring. My favorite language features are ones that add new expressive syntax whose semantics can be programmed by an API designer. The for-in loop syntax is baked into the language, but you get to decide what it means in your implementation of Iterable.

    I want the same thing for destructuring. I think a syntax something like Name(subpattern1, subpattern2) (an identifier followed by a parenthesized list of subpatterns) should desugar to calling some user-defined destructuring operation on the matched value. That operation can define whether the value matches or not and, if it does, what subvalues the subpatterns are matched against. There are obviously many details to work out here, but also prior are in Scala's unapply() method and Kotlin's componentN() methods.

  1. Data classes. The least-defined (in my mind) goal is around improving the user experience for easily defining value-like types. This could mean some new explicit "data class" syntax, or a family of smaller features like allowing implicit constructors to take parameters.

    If we allow these two be subclasses of a superclass, then that gets pretty close to sum types once you include the previously-mentioned ability to pattern match on types and destructure them. Likewise, once you have general user-defined destructuring, then any lightweight data class-like syntax can just be syntactic sugar for a class declaration that includes a declaration of its own destructuring support.

As I said, this is all still very early and tentative, but I wanted to collect what I think are the related issues into one place and then begin sketching out a map through this territory.

Metadata

Metadata

Assignees

Labels

featureProposed language feature that solves one or more problemspatternsIssues related to pattern matching.recordsIssues related to records.

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions