DANNY YANG


The Right Way To Pipe

28 Dec 2023 - 1568 words - 7 minute read

Are you bored over the holidays and itching to bikeshed over programming language syntax? Well, today’s your lucky day!

In this post, I go over a few ways that different languages pipe data through a sequence of functions, ending with the one I think is the best. I’m only going to cover the left-to-right style of chaining functions because that’s the most widespread and also because I don’t like reading right-to-left.

The Problem

Let’s start with the problem that we’re trying to solve. Deeply nested parentheses are bad.

a(b(c(d)))

Having to name a bunch of use-once variables for each step of a sequence of operations is bad (cough cough Erlang).

let x = c(d);
let y = b(x);
let z = a(y);

Method Chaining

The most common way we avoid this in a lot of mainstream and object-oriented languages is method chaining, something like this:

let numbers = [1, 2, 3, 4, 5];

let result = numbers
    .map(x => x * 2)
    .filter(x => x > 5);

In languages with anonymous functions this pattern is usually adequate when working with built-in collections, but it’s inflexible and has limited utility outside of that unless you structure your code a specific way.
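To make the inflexibility concrete, here’s a minimal sketch in Python (not one of the languages discussed in this post). Chain and total are hypothetical names I made up for illustration: Chain is a small wrapper that supports chaining, and total is an ordinary function that isn’t one of its methods.

```python
class Chain:
    """Hypothetical wrapper that gives a list chainable map/filter methods."""

    def __init__(self, items):
        self.items = list(items)

    def map(self, func):
        return Chain(func(x) for x in self.items)

    def filter(self, pred):
        return Chain(x for x in self.items if pred(x))


def total(chain):
    """A free function: it is not a method of Chain, so it cannot be chained."""
    return sum(chain.items)


# The chain reads left to right, but total() has to wrap the whole thing,
# so we are back to nesting as soon as a step is not a method.
result = total(
    Chain([1, 2, 3, 4, 5])
    .map(lambda x: x * 2)
    .filter(lambda x: x > 5)
)
```

The only way to get total into the chain is to add it to Chain itself, which is the "structure your code a specific way" part.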

One improvement on the basic pattern is letting users add methods to existing types, like extension methods in C#, extensions in Swift, and traits in Rust. By allowing the user to extend existing classes/structs with methods, method chaining can be used in more places.

Pipe First, Pipe Last

Some languages have a pipe operator, which is a piece of syntax that allows you to pass the return value of one function as a parameter for another.

This allows programmers to model data flow between functions intuitively, and is probably influenced or inspired by UNIX pipes.

The pipe operator is pretty prevalent in functional languages, especially those in the ML family, but it’s also found in R and has been proposed for languages like C#.

Examples include:

Edit: Another list that’s slightly more comprehensive but by no means exhaustive can be found here

let double x = x * 2
let square x = x * x
let increment x = x + 1

let result =
  5
  |> double
  |> square
  |> increment

Compared to method chaining, this is obviously more flexible since you can use it with any function, not just specific methods.

When chaining functions with multiple parameters, we now need to decide which parameter the piped value goes into. In R, Elixir, and ReScript, it’s the first one (pipe-first); for everything else in the list above, it’s the last one (pipe-last).
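Here’s a rough sketch of the difference, written in Python since it has no pipe operator of its own. pipe_first and pipe_last are hypothetical helpers, not part of any library; each step is a tuple of a function plus its other arguments.

```python
def pipe_first(value, *steps):
    """Thread value through steps, passing it as the FIRST argument."""
    for func, *args in steps:
        value = func(value, *args)
    return value


def pipe_last(value, *steps):
    """Thread value through steps, passing it as the LAST argument."""
    for func, *args in steps:
        value = func(*args, value)
    return value


def subtract(a, b):
    return a - b


def scale(a, b):
    return a * b


# Same steps, different slot for the piped value:
first = pipe_first(10, (subtract, 3), (scale, 2))  # scale(subtract(10, 3), 2)
last = pipe_last(10, (subtract, 3), (scale, 2))    # scale(2, subtract(3, 10))
```

With a non-commutative function like subtract, the two conventions give different answers (14 and -14 here), which is why the parameter order of your functions matters so much.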

The whole debate over pipe-first versus pipe-last is a huge can of worms that I won’t dive into here, but briefly: pipe-first makes IDE integration, good error messages, and type inference easier, while pipe-last is better for function composition in languages that embrace a more purely functional programming style. This excellent blog post goes into more detail on the topic.
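As a small illustration of the composition point, here’s a Python sketch where functools.partial stands in for currying. The helpers compose, map_last, and filter_last are hypothetical names of mine; the only thing that matters is that they take their data as the last parameter.

```python
from functools import partial, reduce


def compose(*funcs):
    """Left-to-right composition of one-argument functions."""
    return lambda value: reduce(lambda acc, f: f(acc), funcs, value)


def map_last(func, items):
    """Data-last map: the collection is the final parameter."""
    return [func(x) for x in items]


def filter_last(pred, items):
    """Data-last filter: the collection is the final parameter."""
    return [x for x in items if pred(x)]


# Fixing the leading arguments leaves a function of just the data,
# which is exactly the shape that composition (and pipe-last) wants.
double_all = partial(map_last, lambda x: x * 2)
keep_big = partial(filter_last, lambda x: x > 5)

pipeline = compose(double_all, keep_big)
result = pipeline([1, 2, 3, 4, 5])
```

If the data came first instead, partial application would fix the wrong argument and each stage would need a wrapper lambda.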

Both of these can work, but they still require you to write your functions a certain way to take advantage of them. This can be limiting, especially in languages that don’t support both forms of piping, and it’s not immediately intuitive whether/how piping interacts with named or labeled parameters.

Uniform Function Call Syntax

There’s also this thing called Uniform Function Call Syntax, which allows any function to be called as if it were a method for its first parameter. It exists in D and Nim and has been proposed for C++.

So the following two ways of invoking f would be equivalent, even if f is not a method:

f(a, b, c)
a.f(b, c)

I consider this to be a worse version of pipe-first, since it does the same thing but the lack of a unique operator makes the . syntax ambiguous.

Pipe Anywhere

Ultimately I think the debate of pipe-first vs pipe-last is a false dichotomy. Those are not the only two ways to do piping.

There are a few examples of existing languages that allow users to pipe into any parameter, and I think this is the best way to implement piping since it provides the best balance of clarity and flexibility.

The first example is Hack, an offshoot of PHP, which requires the programmer to mark the location of the piped argument with $$.

$x = vec[2,1,3]
  |> Vec\map($$, $a ==> $a * $a)
  |> Vec\sort($$);

Leaving aside your thoughts on the aesthetics of $ for a moment, we can see how this solves the flexibility problem by making everything explicit. Functions and methods have separate syntax, and any function can be piped into any other function regardless of how they were written.

If Hack had named parameters (it sadly does not), the $$ would make dealing with them pretty intuitive, like foo(bar=$$).
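To show what that could look like, here’s a minimal Python sketch of an always-explicit placeholder, including a named parameter. This is not how Hack implements its pipes; PIPED, step, pipe, and describe are all hypothetical names made up for this example.

```python
class _Placeholder:
    """Marker type for the piped value (a stand-in for Hack's $$)."""

    def __repr__(self):
        return "$$"


PIPED = _Placeholder()


def step(func, *args, **kwargs):
    """Build one pipeline stage; PIPED marks where the piped value goes."""
    slots = list(args) + list(kwargs.values())
    if not any(slot is PIPED for slot in slots):
        # Like Hack, refuse stages that do not say where the value goes.
        raise ValueError("stage must mark the piped argument with PIPED")

    def run(value):
        new_args = [value if a is PIPED else a for a in args]
        new_kwargs = {k: (value if v is PIPED else v) for k, v in kwargs.items()}
        return func(*new_args, **new_kwargs)

    return run


def pipe(value, *stages):
    """Feed value through each stage from left to right."""
    for stage in stages:
        value = stage(value)
    return value


def describe(label, values):
    return f"{label}: {values}"


squares = pipe(
    [2, 1, 3],
    step(map, lambda a: a * a, PIPED),  # piped into the second positional slot
    step(sorted, PIPED),                # piped into the first positional slot
)
labelled = pipe(squares, step(describe, "squares", values=PIPED))  # named slot
```

Neither map nor sorted had to be written with piping in mind, and the named-parameter case needs no extra rules.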

Before the introduction of R’s natively supported pipes, programmers had been using a custom pipe operator %>% defined in the magrittr package and automatically imported when you load the popular Tidyverse packages.

The following are equivalent:

x %>% f(y)
f(x, y)

This operator acts like pipe-first by default, but if you want to pipe the argument into another positional or labeled argument you can use a dot (.) as a placeholder. The following pair of function calls is equivalent:

y %>% f(x, .)
f(x, y)

R has labeled arguments, and this way of piping works nicely:

z %>% f(x, y, arg = .)
f(x, y, arg = z)
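The three cases above can be modeled with a short Python sketch. This is a simplified model and not magrittr’s exact rules (which have more cases, such as a dot inside a nested call); DOT and pipe_into are hypothetical names.

```python
DOT = object()  # hypothetical stand-in for magrittr's `.` placeholder


def pipe_into(value, func, *args, **kwargs):
    """Pipe-first by default; if DOT appears, substitute the value there."""
    used_dot = any(a is DOT for a in args) or any(
        v is DOT for v in kwargs.values()
    )
    if not used_dot:
        # Implicit behavior: the piped value becomes the first argument.
        return func(value, *args, **kwargs)
    new_args = [value if a is DOT else a for a in args]
    new_kwargs = {k: (value if v is DOT else v) for k, v in kwargs.items()}
    return func(*new_args, **new_kwargs)


def f(a, b, arg=None):
    return (a, b, arg)


default_first = pipe_into("x", f, "y")              # f("x", "y")
positional = pipe_into("y", f, "x", DOT)            # f("x", "y")
labeled = pipe_into("z", f, "x", "y", arg=DOT)      # f("x", "y", arg="z")
```

Note that the same helper has two modes depending on whether a placeholder shows up anywhere in the call, which is the implicit behavior I’m wary of.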

I have a slight preference for always requiring an explicit marker like Hack does, because default/implicit behavior can cause confusion (or because I have Stockholm syndrome from writing so much Hack during the last few years).

As for possible downsides of this approach, I’m not sure how it would work with currying/partial application, but I think I’d rather have no currying + good pipes than currying + bad/confusing pipes.

I don’t have any data to back this up, but my hunch is that in the majority of real-world codebases piping is going to be a lot more prevalent/useful than partial application (and thus its ergonomic impact is greater), especially given the performance downsides of the latter. But if someone has data on this I’m happy to be proven wrong.

Further Reading

Suggested by u/Tubthumber8 on Reddit: there’s a proposal for adding Hack-style pipes to JavaScript that’s currently making its way through TC39, after several previous proposals for F#-style pipes were rejected. There’s a lot of very heated discussion on the pros and cons of each type of pipe, though after reading the discussions I remain squarely in the Hack-pipe camp.

Edit: a footnote on ReScript

ReScript is another interesting example. Like R, it’s pipe-first by default but you can use an underscore as a placeholder for the piped argument like so:

let addTo7 = (x) => add3(3, x, 4)
let addTo7 = add3(3, _, 4)

getName(input)
  ->namePerson(~person=personDetails, ~name=_)

ReScript actually has both pipe-first -> and pipe-last |> operators, the latter owing to its roots in OCaml. I wouldn’t recommend having both at the same time, since it’s impossible to write a standard library where both types of piping are well-supported. In ReScript, pipe-last is deprecated/discouraged in favor of pipe-first, and the standard library was rewritten with pipe-first in mind (for example, by making the list the first argument of List.map).


