Tricksy Tuple Types
Some of you may be aware of my obsession with JSON libraries, and it's true: there's something about simple data formats that sends my brain into endless brainstorming of ways to optimize reading, writing, and object-mapping in the Julia language. JSON3.jl is my latest attempt at a couple of new ideas for JSON <=> Julia fun. The package is almost ready for a public release, and I promise I'll talk through some of the fun ideas going on there, but today I just wanted to point out a tricky performance issue that took a bit of sleuthing to track down.
Here's the scenario: we have a string of JSON like {"a": 1}, super simple right? In the standard Julia JSON.jl library, you just call JSON.parse(str) and get back a Dict{String, Any}. In JSON3.jl, we have a similar "plain parse" option, JSON3.read(str), which returns a custom JSON3.Object type that I can talk about in more detail in another post. Another option in JSON3.jl is to call JSON3.read(str, Dict{String, Any}), i.e. we can specify the type we'd like to parse from any string of JSON. While doing some quick benchmarking to make sure things looked reasonable, I noticed that JSON3.read(str, Dict{String, Any}) was about 2x slower than both JSON.parse and JSON3.read(str, Dict{String, Int}). Hmmm, what's going on here??
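A comparison along these lines can be sketched as follows. This is only a rough setup, assuming BenchmarkTools.jl and its @btime macro for timing (the timing tool is an assumption, any approach would do); absolute numbers will vary by machine and package version.

```julia
using JSON, JSON3
using BenchmarkTools  # assumed timing tool; any timing approach would do

str = "{\"a\": 1}"

# JSON.jl's plain parse
@btime JSON.parse($str)

# JSON3.jl's plain parse, returning a JSON3.Object
@btime JSON3.read($str)

# JSON3.jl parsing directly into a requested type
@btime JSON3.read($str, Dict{String, Any})
@btime JSON3.read($str, Dict{String, Int})
```

The `$str` interpolation is BenchmarkTools' way of keeping the global variable lookup out of the measurement.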
I first turned to profiling, and used the wonderful StatProfilerHTML.jl package to inspect my profiling results. That's when I noticed that around 40% of the time was spent on a seemingly simple line of code:
Hmmmm......a return statement with a simple ifelse call? Seems fishy. Luckily, there's a fun little project called Cthulhu.jl, which combines debugger-style "stepping" with Julia's unparalleled code inspection tools (@code_lowered, @code_typed, @code_llvm, etc.). As I "descended into madness" to take a look at the @code_typed of this line of code, I found this:
%1865 = (JSON3.ifelse)(%1864, %1857, %1851)::Union{Float64, Int64}
%1866 = (Core.tuple)(%1853, %1865)::Tuple{Int64,Union{Float64, Int64}}
Ruh-roh Shaggy.......the issue here is this Tuple{Int64,Union{Float64,Int64}} return type. It's not concrete, which leads to worse type inference in later code that accesses the tuple's second element. This is also undesirable because we know that the value should be either an Int64 or a Float64, so ideally we could structure things so that code generation does a single branch and generates nice clean code the rest of the way down. If we change the code to:
Let's take another cthulic descent and check out the generated code:
%1863 = (%1857 === %1862)::Bool
│ │ @ float.jl:484 within `==' @ float.jl:482
│ │┌ @ bool.jl:40 within `&'
│ ││ %1864 = (Base.and_int)(%1861, %1863)::Bool
│ └└
└──── goto #691 if not %1864
@ /Users/jacobquinn/.julia/dev/JSON3/src/structs.jl:330 within `read' @ /Users/jacobquinn/.julia/dev/JSON3/src/structs.jl:99
690 ─ %1866 = (Core.tuple)(%1853, %1857)::Tuple{Int64,Int64}
└──── goto #693
@ /Users/jacobquinn/.julia/dev/JSON3/src/structs.jl:330 within `read' @ /Users/jacobquinn/.julia/dev/JSON3/src/structs.jl:101
691 ─ %1868 = (Core.tuple)(%1853, %1851)::Tuple{Int64,Float64}
└──── goto #693
Ah, much better! Though there are a few more steps, we can now see we're getting what we're after: our return type will be Tuple{Int64,Int64} or Tuple{Int64,Float64} instead of Tuple{Int64,Union{Int64,Float64}}. And the final performance results? Faster than JSON.jl!
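To make the difference concrete, here is a minimal sketch of the two shapes of code discussed above. It is not the actual JSON3.jl source: the function names number_ifelse and number_branch, and the arguments pos, int, and float, are hypothetical stand-ins for the parser's number-reading step. Base.return_types asks the compiler what it infers for each version.

```julia
# Hypothetical stand-ins, not the JSON3.jl source: `int` and `float` are the
# same number parsed two ways, and `pos` is the next position in the input.

# Selecting the value with `ifelse` means the second tuple element is inferred
# as a Union, so the tuple type itself is not concrete.
function number_ifelse(pos::Int64, int::Int64, float::Float64)
    return pos, ifelse(int == float, int, float)
end

# Returning from each arm of an explicit branch lets every return statement
# build a tuple with a concrete type.
function number_branch(pos::Int64, int::Int64, float::Float64)
    if int == float
        return pos, int
    else
        return pos, float
    end
end

argtypes = (Int64, Int64, Float64)

# Ask the compiler what it infers for each version.
println(Base.return_types(number_ifelse, argtypes))
println(Base.return_types(number_branch, argtypes))

# Both versions return the same values at runtime; only the inferred types differ.
@assert number_ifelse(10, 3, 3.0) === (10, 3)
@assert number_branch(10, 3, 3.0) === (10, 3)
@assert number_ifelse(10, 3, 3.5) === (10, 3.5)
@assert number_branch(10, 3, 3.5) === (10, 3.5)
```

The first println should report a single tuple type whose second parameter is a Union, and the second a Union of two concrete tuple types, in line with the @code_typed output above, though the exact printing may differ between Julia versions.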
Thanks for reading and I'll try to get things polished up in JSON3.jl soon so you can take it for a spin.
Feel free to follow me on twitter, ask questions, or discuss this post there :)
Cheers.