BYO-Closures For Performance

Some may be familiar with the idea of closures, which, in short, are local functions that capture state from their enclosing context. Closures are of course supported in Julia, where they often just look like anonymous functions. The manual section on those mentions that "Functions in Julia are first-class objects", and one way to think about that is that functions behave as if they were defined in the language itself. Indeed, we could take the example of the exponent method for IEEEFloat in Base, which is defined (slightly abbreviated) as:

function exponent(x::T) where T<:IEEEFloat
    xs = reinterpret(Unsigned, x) & ~sign_mask(T)
    k = Int(xs >> significand_bits(T))
    if k == 0 # x is subnormal
        m = leading_zeros(xs) - exponent_bits(T)
        k = 1 - m
    end
    return k - exponent_bias(T)
end

And think of this method definition being "lowered" to:

struct exponentFunction <: Function
end

function (f::exponentFunction)(x::T) where T<:IEEEFloat
    xs = reinterpret(Unsigned, x) & ~sign_mask(T)
    k = Int(xs >> significand_bits(T))
    if k == 0 # x is subnormal
        m = leading_zeros(xs) - exponent_bits(T)
        k = 1 - m
    end
    return k - exponent_bias(T)
end

const exponent = exponentFunction()

So here we're defining a struct exponentFunction, which is a subtype of Function, the abstract type that all function types inherit from (you can check this yourself by querying supertype(typeof(Base.exponent))). Then we're defining a method with some unusual syntax, function (f::exponentFunction)(x::T), which makes instances of exponentFunction callable, like exponentFunction()(3.14). Finally, we declare our const exponent to just be an instance of exponentFunction; a callable object like this is often known as a "functor". (Functors are covered in more detail in the Julia manual.)
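To make the functor idea a bit more concrete, here's a minimal sketch of a callable struct that carries some state. The Adder type and the add3 name are made up for illustration; they're not part of Base or of the lowering described above.

```julia
# Hypothetical functor: a struct that carries state and can be called like a function.
struct Adder <: Function
    n::Int
end

# The "unusual syntax": define a call method on instances of Adder.
(f::Adder)(x) = x + f.n

add3 = Adder(3)

# Instances work anywhere a function is expected.
result_single = add3(4)
result_mapped = map(add3, [1, 2, 3])

# Ordinary functions are also instances of subtypes of Function.
exponent_super = supertype(typeof(Base.exponent))
```

An anonymous function like x -> x + n behaves much the same way: the captured n plays the role of the field on Adder.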

Ok, so why start off a blog post going over a bunch of stuff from the Julia manual? Well, in a recent refactoring, I ran into a decently well-known performance issue with closures, for which a few different solutions have been suggested, but none quite fit my use-case. Now, I have to admit to not fully understanding the fundamental language issue causing the performance hit here. What I do understand from my own refactoring is that when a variable captured by a closure is assigned again, either inside the closure or in the enclosing scope after the closure is defined, Julia ends up creating a Core.Box object to hold the variable's value. That massively affects performance, because the variable's inferred type is essentially Any, so every use relies on runtime (dynamic) dispatch.
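Here's a small sketch of the kind of code that runs into this, as I understand it. The function names sum_boxed and sum_ref are hypothetical, and the Ref version is just one common way to sidestep the box, shown for contrast rather than as the approach taken in this post.

```julia
# Hypothetical example: `total` is captured by the closure AND reassigned
# inside it, so lowering is expected to wrap it in a Core.Box.
function sum_boxed(xs)
    total = 0
    foreach(x -> (total += x), xs)
    return total
end

# Contrast: the binding `total` is assigned exactly once; only the contents
# of the Ref are mutated, so the closure captures a concretely typed Ref{Int}.
function sum_ref(xs)
    total = Ref(0)
    foreach(x -> (total[] += x), xs)
    return total[]
end

boxed_result = sum_boxed([1, 2, 3])
ref_result = sum_ref([1, 2, 3])

# What inference concludes about the Ref version's return type.
ref_return_type = Base.return_types(sum_ref, (Vector{Int},))[1]
```

Both versions compute the same answer. In the REPL, running @code_warntype sum_boxed([1, 2, 3]) should flag total as a Core.Box, while the Ref version infers cleanly.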

Part of my aforementioned refactoring involved moving some common code into higher-order functions, for example ones that apply a functor argument to each field of a struct, which meant my code was now relying on closures passed to those higher-order functions. Luckily, I tend to lean on my favorite JuliaLang feature, its code-inspection tools (see @code_typed, for example), to see what core functions are getting inferred/lowered to, and I noticed a bunch of red flags in the form of Core.Box for key variables. Digging a little further, it became clear that I was a victim of issue #15276 and would need to figure out a solution. One solution suggested in the issue thread was to use let blocks, which declare the captured state variables up front and make it explicit which variables will be captured. In my case, however, I needed the variables to be updated within the closure, and I needed those updated values afterwards: my parsing functions pass the current parsing state down into the closures and then need to pass it along to parse the next object.
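For reference, the let-block workaround looks roughly like the sketch below. The function names are hypothetical, and the example is deliberately tiny. It also shows why the trick doesn't help when the closure needs to write back: the let introduces a fresh binding, so updates to the captured value wouldn't be visible to the outer variable.

```julia
# Hypothetical example: `r` is reassigned and then captured, which is the
# pattern that is expected to produce a Core.Box.
function scale_boxed(xs, r)
    r = abs(r)
    return map(x -> x * r, xs)
end

# let-block workaround: `let r = r` creates a new binding that is assigned
# exactly once, so the closure captures a plain, concretely typed value.
function scale_let(xs, r)
    r = abs(r)
    f = let r = r
        x -> x * r
    end
    return map(f, xs)
end

boxed_scaled = scale_boxed([1, 2, 3], -2)
let_scaled = scale_let([1, 2, 3], -2)

# What inference concludes about the let version's return type.
let_return_type = Base.return_types(scale_let, (Vector{Int}, Int))[1]
```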

So my solution? Well, a distinctive feature of the Julia language is how much of the language is written in itself, and how many major constructs are true first-class citizens. So I decided to write my own closure object!

mutable struct StructClosure{T, KW}
    buf::T
    pos::Int
    len::Int
    kw::KW
end

@inline function (f::StructClosure)(i, nm, TT)
    pos_i, x_i = readvalue(f.buf, f.pos, f.len, TT; f.kw...)
    f.pos = pos_i
    return x_i
end

So, similar to our exponent example before, we define a StructClosure functor object. This time it has a few fields, which represent the closure-captured state variables that we need access to inside the actual function body. Also note that we made our functor a mutable struct because, in our function, we actually want to update the position after we've read a value (f.pos = pos_i).

We end up using our home-grown closure like:

@inline function read(::Struct, buf, pos, len, b, ::Type{T}; kw...) where {T}
    if b != UInt8('{')
        error = ExpectedOpeningObjectChar
        @goto invalid
    end
    pos += 1
    @eof
    b = getbyte(buf, pos)
    @wh
    if b == UInt8('}')
        pos += 1
        return pos, T()
    elseif b != UInt8('"')
        error = ExpectedOpeningQuoteChar
        @goto invalid
    end
    pos += 1
    @eof
    c = StructClosure(buf, pos, len, kw)
    x = StructTypes.construct(c, T)
    return c.pos, x

@label invalid
    invalid(error, buf, pos, T)
end

So we first create an instance of our closure, c = StructClosure(buf, pos, len, kw), and then pass it to the higher-order function, x = StructTypes.construct(c, T). Finally, you'll notice how we return the updated position from our closure object at the end with return c.pos, x. How's the performance? Back in line with our fully unrolled, pre-higher-order-function code. Ultimately, this felt like a pretty simple, even clever, solution to clean up my code and use some common, well-tested higher-order functions to do some fancier code unrolling.
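If you want to play with the pattern without the rest of the parsing machinery, here's a stripped-down, self-contained sketch. Everything in it is a stand-in I made up for illustration: readdigit plays the role of readvalue, DigitClosure the role of StructClosure, and construct_fields is only a rough imitation of what a higher-order function like StructTypes.construct might do, not its actual implementation.

```julia
# Hypothetical stand-in for `readvalue`: reads one ASCII digit at `pos`
# and returns (next position, value).
readdigit(buf, pos) = (pos + 1, Int(buf[pos] - UInt8('0')))

# Hand-written closure: the fields are the "captured" state.
mutable struct DigitClosure{T}
    buf::T
    pos::Int
end

# Called once per field with (index, name, type), like the post's functor.
function (f::DigitClosure)(i, nm, TT)
    pos_i, x_i = readdigit(f.buf, f.pos)
    f.pos = pos_i            # state update survives the call
    return convert(TT, x_i)
end

struct Point
    x::Int
    y::Float64
end

# Hypothetical higher-order function: calls `f` for each field of T in order
# and passes the results to T's constructor.
function construct_fields(f, ::Type{T}) where {T}
    vals = ntuple(i -> f(i, fieldname(T, i), fieldtype(T, i)), fieldcount(T))
    return T(vals...)
end

c = DigitClosure(codeunits("47"), 1)
p = construct_fields(c, Point)
next_pos = c.pos             # updated position is read back off the functor
```

Because pos lives in a concretely typed field rather than in a captured local, there's nothing for the compiler to box, and the caller can still read the updated position afterwards.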

As always, hit me up on Twitter with any comments or questions and I'm happy to discuss further.