Solving slowdowns in the Depth Controller web interface

Lately I've been working on adding some new features to the Depth Camera Controller. Unfortunately, I found that with these new features and xAP enabled, the controller's frame rate would gradually decrease, its responses would become significantly delayed, and eventually the entire system would grind to a halt.

With a bit of profiling I discovered that the controller's web server was spending all its time in the xAP support code, specifically the Treetop-based xAP parser.


xAP

First, a bit of background on xAP.  xAP is a horribly inefficient protocol for smart home devices that just happens to be supported by a lot of DIY home automation platforms.  Reasons why xAP is inefficient:

  1. Everything is all broadcast, all the time.  Each xAP message is sent via UDP broadcast to the entire network, so all xAP devices and applications have to deal with every single xAP message on the network, even messages destined for other devices.
  2. xAP messages are text-based. Parsing values from text can be slow.
  3. xAP Basic Status and Control, the schema used by the Depth Camera Controller and supported by automation packages like HomeSeer, sends only a single binary and/or numeric value in each packet. That's 100+ bytes just to send a single bit.
  4. xAP has some unusual multi-device address wildcarding, which can be time consuming to process.
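To make points 2 and 3 concrete, here is a rough sketch of what a single-value BSC event looks like on the wire. The source address, UID, and block name are made up for illustration, and `read_xap` is a hypothetical line-based reader, not the Treetop parser the controller actually uses.

```ruby
# Illustrative xAP BSC event. The address, UID and values are invented.
SAMPLE_XAP = <<~MSG
  xap-header
  {
  v=12
  hop=1
  uid=FF123400
  class=xAPBSC.event
  source=acme.depthcam.demo:zone1
  }
  input.state
  {
  state=on
  }
MSG

# Hypothetical, naive reader: returns { "block name" => { "key" => "value" } }.
# It does no validation and is only meant to show the message structure.
def read_xap(text)
  blocks = {}
  current = nil
  name = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    case line
    when '{' then current = (blocks[name] = {})
    when '}' then current = nil
    else
      if current
        key, value = line.split('=', 2)
        current[key.downcase] = value
      else
        name = line.downcase
      end
    end
  end
  blocks
end

message = read_xap(SAMPLE_XAP)
puts "#{SAMPLE_XAP.bytesize} bytes on the wire"
puts "payload: state=#{message['input.state']['state']}"
```

The only information the receiver cares about is the one-bit `state` value; everything else is framing that every listener on the network still has to read.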

Reason number 1 is probably the most significant factor in the Depth Camera Controller's performance problems: every message we send, we also receive and have to process. However, we can't be sure until we've actually tested the parser's performance. So, I threw together a quick benchmark of just the xAP parser, and tested it on my PC and a controller:


original parser - x64

$ ./parser_bench.rb 10000
Running 10000 iterations of ParseXap.parse
10000 iterations: cpu=9.690000 clock=(9.696733)
1031.2751730063692 per second

Running 10000 iterations of node.to_hash
10000 iterations: cpu=0.280000 clock=(0.275799)
36258.3475033779 per second
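For anyone who wants to run a similar measurement, a harness that prints output in this shape can be built on Ruby's standard `Benchmark` module. This is only a sketch of the idea, not the actual parser_bench.rb: the `bench` helper and the stand-in workload are assumptions, and you would replace the block with a call to the real parser.

```ruby
require 'benchmark'

# Hypothetical helper: times `iterations` calls of the given block and prints
# total CPU time, wall-clock time, and the resulting rate.
def bench(label, iterations)
  puts "Running #{iterations} iterations of #{label}"
  tms = Benchmark.measure { iterations.times { yield } }
  # %t is total CPU time; %r is elapsed real time, printed in parentheses.
  puts "#{iterations} iterations: #{tms.format('cpu=%t clock=%r')}"
  puts "#{iterations / tms.real} per second"
  puts
  tms
end

iterations = (ARGV[0] || 1000).to_i
sample = "xap-header\n{\nv=12\nhop=1\n}\n"

# Stand-in workload; swap in the parser call being measured.
bench('String#split (stand-in)', iterations) { sample.split("\n") }
```

Comparing the CPU figure with the wall-clock figure also shows how much of the elapsed time the process actually spent on the CPU.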

For whatever reason, the benchmark ran significantly slower when run with realtime priority via chrt or schedtool (perhaps something to examine some other day). Here are the numbers for the controller itself:


original parser - ARM

$ ./parser_bench.rb 1000
Running 1000 iterations of ParseXap.parse
1000 iterations: cpu=19.690000 clock=(25.032248)
39.94846963781098 per second

Running 1000 iterations of node.to_hash
1000 iterations: cpu=0.660000 clock=(0.733073)
1364.121077297844 per second

Yikes! Some of the new features I'm working on, which were designed to reduce average CPU usage and response delays, could generate 300 (30fps times 10 zones) xAP messages per second or more. Obviously the frame rate will drop as CPU usage increases, but if we're only parsing 40 messages per second at 80% CPU usage, it's no wonder the system is getting bogged down. The EventMachine event queue in the controller's web server is filling up with unprocessed xAP packets faster than it's being emptied by the parser.
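The arithmetic behind that conclusion, using the ARM numbers above and the 300 messages per second worst case, works out as follows. The variable names are mine, and the model is deliberately simple: a constant arrival rate and a constant parse rate.

```ruby
# Figures from the ARM benchmark run above.
iterations    = 1000
clock_seconds = 25.032248
cpu_seconds   = 19.69

# Worst-case load mentioned above: 30 fps times 10 zones.
arrival_rate = 30 * 10

parse_rate      = iterations / clock_seconds   # messages parsed per second
cpu_share       = cpu_seconds / clock_seconds  # fraction of wall time on CPU
seconds_needed  = arrival_rate / parse_rate    # parse time per second of traffic
backlog_per_sec = arrival_rate - parse_rate    # queue growth per second

puts format('parse rate:     %.1f msg/s', parse_rate)
puts format('CPU share:      %.1f%%', cpu_share * 100)
puts format('parse time:     %.2f s per second of traffic', seconds_needed)
puts format('backlog growth: %.1f msg/s', backlog_per_sec)
```

In other words, one second of worst-case traffic needs about 7.5 seconds of parsing, so the queue grows by roughly 260 messages every second and never drains.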


To be continued

With the problem identified, I have some optimizing to do. I'll write a follow-up post detailing how I resolve the performance problem with the xAP parser.

Update Jul. 12, 2012: I have written a new trivial parser for xAP messages that improves performance by a factor of 28 or more. Details are in my next blog post.