A while back, Matt Dees blogged about our upcoming change to LivePHP in 11.28. Specifically, he mentioned the use of JSON. In this article I will briefly illustrate why this change was made. The decision process, as you’ll see, wasn’t exactly straightforward, but it ended in a solid compromise.
I’m not going to expound on the benefits or drawbacks of XML vs. JSON as a format. We’ve seen those debates for a while now in various online (and offline) venues. Instead, I’d like to talk about our choice to use JSON within LivePHP, based on data from a three-point comparison we did.
First, let’s go over some background information. The LivePHP class was originally written for PHP 4 compatibility. Those were the days when we couldn’t depend on having a parser for XML or JSON readily available within a given PHP binary. One of the best options when we initially wrote LivePHP was to ‘include’ a pure PHP XML parser. Thankfully, PHP 4 has nearly gone the way of the dinosaurs, and cPanel’s internal build of PHP has been 5.2.x for a while now. This gave us the opportunity to refactor the LivePHP class.
Once we decided to refactor the LivePHP class, we boiled our wish list down to two key points:
- Reduce resource consumption.
- Maintain backwards compatibility for anyone who relied on an external parser or any legacy code within cPanel that produces a particular output string.
Maintaining backwards compatibility for the cPanel strings was a fairly easy task. We knew that XML strings were always sent in the legacy code, so as long as we could detect a “pure” XML structure, we could use SimpleXML or the pure XML parsing functions that were already being shipped. But which one — SimpleXML or pure PHP? SimpleXML seemed like the obvious answer until we explored the full meaning of this compatibility goal.
Since we not only wanted compatibility on the string level, but also from the utility level, the idea of changing the external parsing functions needed more consideration and, as you’ll see, compromise.
The likelihood that someone uses those pure PHP functions directly (in their own, neighboring code) is small. However, if we were to suddenly remove them or rewrite their internals, the potential risk of breaking someone’s application seemed too great. And thus, we struck a compromise: we would ship the parsing functions as they had always been and simply create a logic flow within the class that would fall back to the old functions if a legacy-style string happened to come across the wire.
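The fallback flow described above can be sketched roughly as follows. This is not the LivePHP source: `legacy_parse_xml()` is a hypothetical stand-in for the shipped pure PHP parsing functions, `decode_response()` is an invented name, and the "starts with `<`" check is only an assumed way of spotting a legacy-style XML string.

```php
<?php
// Hypothetical stand-in for the legacy pure PHP XML parser that still ships.
// The real function names and return shape are not given in this article.
function legacy_parse_xml($xml)
{
    return array('format' => 'xml', 'raw' => $xml);
}

// Decide how to decode a response string coming across the wire.
function decode_response($payload)
{
    $trimmed = ltrim($payload);

    // Assumption: a legacy-style string is "pure" XML, so it starts with '<'.
    if ($trimmed !== '' && $trimmed[0] === '<') {
        return legacy_parse_xml($trimmed);
    }

    // Otherwise treat the payload as JSON and decode it to an associative array.
    $decoded = json_decode($trimmed, true);
    if (!is_array($decoded)) {
        throw new InvalidArgumentException('Payload is neither XML nor a JSON object/array');
    }
    return $decoded;
}
```

The point of the design is that callers only ever see one entry point, while the old functions stay untouched for anyone who calls them directly.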
The other goal was to reduce resource consumption. The refactoring process was not only about optimizing the code, but also about making it “better.” While evaluating this class, we found a ton of stuff that we wanted to change. In the end, we had to ask ourselves, “How can we cover the most ground while making the fewest changes?” Sure, there were plenty of other changes, like more robust logging, conforming to standard PHP 5 class conventions, and a more PEAR-ish coding style, that would eventually make it into the final revision of LivePHP. But really, the true gain would ultimately come down to dealing with the overhead of encoding and decoding the data stream. This is where a few simple tests helped us decide that PHP’s native JSON functions would work best for us.
For brevity’s sake, I won’t include the actual code in this article. Fortunately, it’s straightforward enough to describe the tests and present the results in a few charts.
We used two datasets, both based on real output from the Email::listpopswithdisk API2 function. This function’s return value is an itemized list of email account details for a cPanel account. The first set is very small; only two email accounts exist for the cPanel user. The second set is large, but at 5000 email accounts for the cPanel user, not uncommon.
We’ll compare three scenarios that all produce a PHP associative array:
- JSON string decoded with “json_decode($str, true)” [“true” produces the array]
- XML string decoded with the legacy pure PHP functions
- XML string decoded with “simplexml_load_string($str)” [in a recursive callback to get an array]
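To make the first and third scenarios concrete, here is a minimal sketch. It is not the test code we ran: the `user` and `diskused` fields are hypothetical sample data rather than real Email::listpopswithdisk output, and `xml_to_array()` is an illustrative helper that ignores attributes and collects repeated sibling tags into a list.

```php
<?php
// Recursively convert a SimpleXMLElement into a nested PHP array.
// Simplification: attributes are ignored; repeated sibling tags become a list.
function xml_to_array(SimpleXMLElement $node)
{
    if ($node->count() === 0) {
        return (string) $node; // leaf element: return its text content
    }
    $result = array();
    foreach ($node->children() as $name => $child) {
        $value = xml_to_array($child);
        if (array_key_exists($name, $result)) {
            // Second occurrence of a tag: promote the first value to a list.
            if (!is_array($result[$name]) || !isset($result[$name][0])) {
                $result[$name] = array($result[$name]);
            }
            $result[$name][] = $value;
        } else {
            $result[$name] = $value;
        }
    }
    return $result;
}

// Hypothetical two-account sample in both formats.
$json = '{"accounts":[{"user":"alice","diskused":"1.5"},{"user":"bob","diskused":"0.2"}]}';
$xml  = '<data><account><user>alice</user><diskused>1.5</diskused></account>'
      . '<account><user>bob</user><diskused>0.2</diskused></account></data>';

// Scenario 1: native JSON decoding; "true" asks for associative arrays.
$fromJson = json_decode($json, true);

// Scenario 3: SimpleXML plus a recursive callback to get an array.
$fromXml = xml_to_array(simplexml_load_string($xml));
```

Note that the JSON path is a single native call, while the SimpleXML path needs an extra userland pass over the whole tree before the caller has a plain array.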
The first, small dataset yielded mildly interesting results. The data makes it pretty clear that SimpleXML takes less memory at compile and execution time and during the life of the script. It also responds just as quickly as JSON.
However, in the large dataset, the results are a bit more intriguing. The JSON version uses more memory at the start of execution, but once the script is done, it has consumed the least memory over time. As one might guess, the SimpleXML parsing consumes a noticeably larger amount of memory during its lifetime. The pure PHP XML functions perform well on memory, but as a whole they just aren’t as good as the native JSON parsing.
And then there’s the execution time on the larger dataset…wow! The pure PHP XML functions are just atrocious in comparison to the other two methods. SimpleXML, while consuming 42% more memory, took twice as long as the JSON.
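For readers who want to try a comparison like this themselves, the following is a rough sketch of how time and memory figures could be collected. It is not our original test harness: `build_payload()`, `measure()`, and `decode_json()` are invented names, and the payload is synthetic data with hypothetical fields, not real API2 output.

```php
<?php
// Build a synthetic JSON payload with $n hypothetical account records.
function build_payload($n)
{
    $accounts = array();
    for ($i = 0; $i < $n; $i++) {
        $accounts[] = array('user' => "user$i", 'diskused' => (string) ($i % 100));
    }
    return json_encode(array('accounts' => $accounts));
}

// Decoder under test: native JSON into an associative array.
function decode_json($payload)
{
    return json_decode($payload, true);
}

// Run one decoder once and report elapsed time and memory figures.
function measure($decoder, $payload)
{
    $memBefore = memory_get_usage();
    $start = microtime(true);
    $result = call_user_func($decoder, $payload);
    $elapsed = microtime(true) - $start;

    return array(
        'seconds'    => $elapsed,
        // Memory still held while the decoded array is alive.
        'bytes'      => memory_get_usage() - $memBefore,
        // Peak for the whole script so far, not just this call.
        'peak_bytes' => memory_get_peak_usage(),
        'items'      => count($result['accounts']),
    );
}

// Mirror the two dataset sizes used in the article.
foreach (array(2, 5000) as $size) {
    $stats = measure('decode_json', build_payload($size));
    printf("%d accounts: %.6f s, %d bytes held\n", $size, $stats['seconds'], $stats['bytes']);
}
```

A single run like this is noisy; repeating each measurement several times and running each decoder in its own process would give figures that are easier to compare.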
Ultimately, we decided that PHP’s native JSON parsing was the best fit. We had several debates about the various permutations of these scenarios and dataset sizes. With really small datasets, for instance, SimpleXML is the obvious tool of choice. We also did some analogous Perl testing to see how the other end of the system would perform. The results were conclusively in favor of JSON (we predominantly use JSON::XS nowadays). So, in the end, we came to the consensus that datasets tend to be larger rather than smaller and that execution time is the hardest resource to compromise on. This makes JSON the best all-around solution for LivePHP.