C.A.R. Hoare’s work “Recursive data structures” motivates a simple rule that good language design should abide by:
(1) Implementation details that don’t affect program correctness should be syntactically inexpressible.
Codd’s relational model attempts to realize this: you declare what you want and the implementation is determined by the query optimizer. For example,
SELECT age FROM users WHERE age > 21
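The plan that actually runs is the optimizer's choice, and it can be inspected without changing the query. In PostgreSQL, for instance (a sketch, assuming the same users table; whether a sequential scan or an index scan is chosen depends on which indexes and statistics happen to exist):

-- Shows the plan the optimizer picked; the query's meaning is unchanged
EXPLAIN SELECT age FROM users WHERE age > 21;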
The declarative language specifies nothing about index usage, join order, or memory allocation. In practice, relational database programmers still reason about implementation details, shaping their queries so that the optimizer produces good plans; (1) is only approximated. Most programming languages abandon (1) entirely, giving us:
(2) Implementation details that don’t affect program correctness are syntactically expressible.
Imperative languages like C are the obvious case, but even Haskell qualifies: the programmer can make explicit decisions about memory representation:
-- boxed (heap-allocated, thunk-able)
foo :: Int -> Int
foo x = x + 1

-- unboxed (raw machine integer); Int#, (+#), and the # literal syntax
-- come from GHC.Exts and require the MagicHash extension
foo :: Int# -> Int#
foo x = x +# 1#
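To see that the choice is only about representation, here is a minimal sketch (the two versions are renamed fooBoxed and fooUnboxed so both fit in one module): unboxing the argument, applying the primitive version, and reboxing the result yields a function that behaves exactly like the boxed one.

{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#))

fooBoxed :: Int -> Int
fooBoxed x = x + 1

fooUnboxed :: Int# -> Int#
fooUnboxed x = x +# 1#

-- unbox, apply the primitive version, rebox: observably identical to fooBoxed
fooViaUnboxed :: Int -> Int
fooViaUnboxed (I# x) = I# (fooUnboxed x)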
Both versions of foo add 1 to x, but with different data layouts. The choice is semantically invisible[1] but syntactically explicit, and the unboxed version exists solely for performance reasons. Scheduling languages like Halide and TACO take a different path:
(3) Implementation details that don’t affect program correctness are confined to a separate language.
In Halide, you write the algorithm once as a pure functional description, and then separately write a schedule specifying how the program should run. The schedule does not change the program semantics, only the execution strategy. This separation is enforced syntactically.
// algorithm
f(x) = a(x) * b(x);
// schedule
f.parallel(x);
The algorithm f is a simple element-wise multiplication of two dense arrays a and b, and the schedule specifies that f should be computed in parallel across x.
Why not just achieve (1) with a sufficiently clever compiler? Because determining which details “don’t affect correctness” requires deciding program equivalence, which Rice’s theorem tells us is undecidable in general. Languages abiding by (3) sidestep the problem by narrowing the domain to something tractable. Further, (3) doesn’t require the compiler to discover every semantics-preserving transformation[2]; instead, it gives the programmer a language for expressing them, while guaranteeing that only semantics-preserving transformations are expressible.
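In Halide, for instance, those transformations are the schedule directives themselves: loop splitting, vectorization, and parallelization are expressed against f without touching its definition. A sketch (the split factor and the loop variables xo, xi are illustrative):

// further scheduling of the same algorithm, f(x) = a(x) * b(x)
Var xo, xi;
f.split(x, xo, xi, 8)   // split x into an outer and an inner loop
 .vectorize(xi)         // vectorize the inner loop
 .parallel(xo);         // run the outer loop in parallel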
So, why does (3) exist? (1) provides clarity but surrenders performance control, and (2) provides control at the cost of entanglement. Programmers in performance-critical domains want both semantic clarity and performance control. Scheduling languages deliver both: reason about correctness in one language, optimize in the other. This separation of concerns is one motivation for scheduling languages. There’s more to say (e.g., tractable search space, encapsulation benefits, ergonomic costs), but that’s for another post.
Thank you to Rohan Yadav and AJ Root for their valuable feedback.
[1] This is not true in general: the domain of unboxed types has no bottom element, which means unboxed values cannot be lazily evaluated. However, the point remains.
[2] Usually, scheduling languages also involve a compiler that realizes the more “trivially” discoverable optimizations, e.g., constant folding and dead code elimination.