While preparing for the next revision of the Depth Controller firmware, I found that the web interface would start to show significant lag when a large number of zones were changing. Some profiling narrowed this down to the interface between the depth camera backend (written in C) and the web frontend (written in Ruby):
Results (267.6715s elapsed):

Task | Count | Time (s) | Each (s) | Overall |
---|---|---|---|---|
Zone.new | 22162 | 110.78 | 0.005 | 41.39% |
ovh_plot | 693 | 61.45 | 0.089 | 22.96% |
ovh_png | 693 | 53.60 | 0.077 | 20.03% |
kvp | 22015 | 48.95 | 0.002 | 18.29% |
Zone.normalize! | 22162 | 28.21 | 0.001 | 10.54% |
get_zones | 123 | 13.55 | 0.110 | 5.06% |

analysis
The Depth Controller web interface uses the same text-based protocol on port 14308 that is made available to users for custom integration. As such, every time a zone's contents change, the web frontend has to parse a line of key-value pairs describing the change. The Zone.new line measures the total amount of time spent parsing zone information from the backend. This total includes time spent in kvp, which is the step of parsing a line of key-value pairs into a Ruby hash, and Zone.normalize!, which makes sure all of a zone's attributes have the right data type.
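To make the two nested steps concrete, here is a minimal sketch of a regex-based kvp followed by a normalize pass. The wire format (space-separated `key=value` pairs with optional double-quoted, backslash-escaped values), the attribute names, and the `ZONE_TYPES` table are all assumptions made for illustration; the actual protocol and Zone attributes are not shown in this post.

```ruby
# Hypothetical line format: key=value pairs separated by spaces, where a
# value may be double-quoted and use backslash escapes. Not the real protocol.
KVP_PATTERN = /(\w+)=("(?:[^"\\]|\\.)*"|\S*)/

# Step 1 (kvp): split one line into a hash of string keys and string values.
def kvp(line)
  pairs = {}
  line.scan(KVP_PATTERN) do |key, raw|
    if raw.length >= 2 && raw.start_with?('"') && raw.end_with?('"')
      # Drop the surrounding quotes, then undo the backslash escapes.
      raw = raw[1..-2].gsub(/\\(.)/) { $1 }
    end
    pairs[key] = raw
  end
  pairs
end

# Hypothetical attribute types, used only for this example.
ZONE_TYPES = {
  'name'     => :string,
  'occupied' => :bool,
  'pop'      => :integer,
  'xmin'     => :float
}.freeze

# Step 2 (normalize!): coerce each string value to its expected type in place.
def normalize!(attrs)
  attrs.each do |key, value|
    attrs[key] = case ZONE_TYPES[key]
                 when :integer then value.to_i
                 when :float   then value.to_f
                 when :bool    then value == '1'
                 else value
                 end
  end
  attrs
end

zone_attrs = normalize!(kvp('name="Front \"door\"" occupied=1 pop=42 xmin=-0.5'))
p zone_attrs
```

Every value is touched twice in this arrangement, once by the regex scan and once by the type coercion, which is why both steps show up separately in the profile.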
The numbers show that, even with the web interface open and watching the overhead view, zone parsing is taking more time than either of the image processing steps for generating the overhead view (41.39% for zone parsing vs. 22.96% and 20.03% for overhead generation and compression)! This code had already gone through a couple of rounds of optimization before it ever reached the public, but it has become a bottleneck once again. Since kvp and Zone.normalize! together account for almost 29 of those 41 percentage points, finding a way to attack both would eliminate most of the zone parsing cost. Additionally, the get_zones task, which executes rarely but runs for 110ms straight, includes some zone processing, so improving the speed of Zone.new will also shorten get_zones and cut down on intermittent jitter.
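The "almost 29%" figure comes straight from the profile above. As a quick check, using only the numbers from the first results table:

```ruby
# Shares of total run time (percent) and times (seconds) from the first profile.
zone_new_pct  = 41.39
kvp_pct       = 18.29
normalize_pct = 10.54
kvp_secs       = 48.95
normalize_secs = 28.21

# kvp and Zone.normalize! both run inside Zone.new, so their shares are
# already included in the Zone.new total.
target_pct  = kvp_pct + normalize_pct
target_secs = kvp_secs + normalize_secs
other_pct   = zone_new_pct - target_pct

puts format('kvp + normalize!: %.2f%% of run time (%.2fs)', target_pct, target_secs)
puts format('rest of Zone.new: %.2f%% of run time', other_pct)
```

The remainder is the part of Zone.new that a faster parser cannot remove by itself.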
To understand how to remedy the situation, I prepared benchmarks of each of the methods I could use to speed up kvp. In addition to different kvp implementations, I tested JSON, YAML, and Ruby's eval function, as well as a standalone Zone class that didn't rely on hashes at all. After some testing, I wrote a simple state machine-based version of kvp in C that avoided all use of regular expressions. It also parsed integers, floating point numbers, and strings directly, thus eliminating the need for Zone.normalize!. These are the results on my desktop i7 (ARM is slower, but the proportional relations hold):
Task ↓ / Implementation → | Baseline kvp: Ruby regex, C unescape | Hash-free Zone class | kvp 2: C regex, C unescape | kvp 3: C state machine, C unescape | JSON.parse | YAML.load | Ruby language eval() |
---|---|---|---|---|---|---|---|
Parse full zone line | 12804/s | N/A | 7191/s (0.56x) | 65832/s | 46910/s (3.66x) | 11707/s (0.91x) | 24014/s (1.88x) |
Parse partial zone line | 23485/s | N/A | 18003/s (0.77x) | 152892/s | N/A | N/A | N/A |
Update zone with new data | 935222/s | 83390/s | N/A | N/A | N/A | N/A | N/A |

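The winning implementation was written in C and is not reproduced here. The following Ruby sketch only illustrates the state machine idea: walk the line one character at a time, with no regular expressions, and emit typed values directly so that no separate normalize pass is needed. The line format (space-separated `key=value` pairs, optional double quotes, backslash escapes) and the names `kvp_fsm` and `typed_value` are assumptions for illustration, and malformed input is not handled.

```ruby
# Convert an unquoted token to Integer or Float when it parses as one,
# otherwise leave it as a String.
def typed_value(token)
  Integer(token, 10)
rescue ArgumentError
  begin
    Float(token)
  rescue ArgumentError
    token
  end
end

# Parse one line of key=value pairs with an explicit state machine.
def kvp_fsm(line)
  pairs = {}
  state = :between
  key = +''
  value = +''

  line.each_char do |ch|
    case state
    when :between                  # skipping spaces before the next key
      next if ch == ' '
      key = ch.dup
      state = :key
    when :key                      # collecting the key up to '='
      if ch == '='
        state = :value_start
      else
        key << ch
      end
    when :value_start              # first character picks quoted or bare
      if ch == '"'
        value = +''
        state = :quoted
      elsif ch == ' '
        pairs[key] = ''
        state = :between
      else
        value = ch.dup
        state = :bare
      end
    when :bare                     # unquoted value ends at the next space
      if ch == ' '
        pairs[key] = typed_value(value)
        state = :between
      else
        value << ch
      end
    when :quoted                   # quoted values always stay strings
      if ch == '\\'
        state = :escape
      elsif ch == '"'
        pairs[key] = value
        state = :between
      else
        value << ch
      end
    when :escape                   # take the escaped character literally
      value << ch
      state = :quoted
    end
  end

  # Flush a value that runs to the end of the line.
  pairs[key] = typed_value(value) if state == :bare
  pairs[key] = '' if state == :value_start
  pairs
end

p kvp_fsm('name="Front \"door\"" occupied=1 pop=42 xmin=-0.5')
```

Because each character is examined exactly once and numbers are converted as soon as their token ends, the parse and the type coercion collapse into a single pass.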
conclusion
The C-language state machine-based kvp is the clear winner, resulting in as much as a 6.5x speed improvement (when parsing partial zone lines). With all overhead accounted for, the new key-value parser can process zone updates 2.4 times faster than the original code. The final word will come from the results on the controller itself:
Results (274.1677s elapsed):

Task | Count | Time (s) | Each (s) | Overall |
---|---|---|---|---|
ovh_plot | 1044 | 74.07 | 0.071 | 27.02% |
ovh_png | 1044 | 60.15 | 0.058 | 21.94% |
Zone.new | 20687 | 38.20 | 0.002 | 13.93% |
unpack | 1049 | 19.26 | 0.018 | 7.03% |
kvp | 21324 | 11.68 | 0.001 | 4.26% |
get_zones | 131 | 5.89 | 0.045 | 2.15% |

There you have it. With the new and improved key-value pair parser, Zone.new finds its rightful place below the CPU-intensive image processing tasks, and the web UI feels much snappier. We gained nearly all of our expected 29 percentage points, and all of this was accomplished without changing the backend protocol. There are still optimizations that could be made in the new kvp code, but if zone processing becomes a bottleneck again, I will switch to something like MessagePack (initial testing shows a further 2x speedup over the new kvp).
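For a rough sense of the realized gain, the Zone.new rows from the two controller profiles can be compared per call. This is only arithmetic on the published figures; the two runs did not process identical workloads, so treat the ratio as approximate.

```ruby
# Zone.new rows from the two controller profiles (time in seconds).
before = { time: 110.78, count: 22162, share: 41.39 }
after  = { time: 38.20,  count: 20687, share: 13.93 }

# Average cost of one Zone.new call in milliseconds.
ms_before = before[:time] / before[:count] * 1000
ms_after  = after[:time] / after[:count] * 1000

per_call_speedup = ms_before / ms_after
share_recovered  = before[:share] - after[:share]   # percentage points

puts format('Zone.new per call: %.2f ms before, %.2f ms after', ms_before, ms_after)
puts format('per-call speedup: %.2fx', per_call_speedup)
puts format('run time share recovered: %.2f points', share_recovered)
```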