I was actually looking for something similar. I was disappointed when I searched for a good toolkit to produce presentation videos programmatically. I ended up using HTML5 and PhantomJS. It was a total hack, with no control over timing. I would like to see how Gizeh handles text, fonts and text effects.
The animations aren't as nice-n-smooth as the David Whyte ones that inspired this. Is it just the frame rate (easy to fix) or is there something deeper causing this?
It's not just the frame rate; David Whyte is doing 4x temporal supersampling too, so for each output frame, 4 frames are rendered, accumulated into an integer buffer (lines 20..29) and the output is their average (lines 31..35).
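For anyone who wants to try the idea outside Processing, here is a rough pure-Python sketch of the accumulate-and-average step. It is not the original code: `render` is a made-up stand-in renderer (a white bar moving across one row of pixels), and the names `supersampled_frame` and `SAMPLES` are my own. With Gizeh you would presumably replace `render` with whatever draws your frame at time `t`.

```python
SAMPLES = 4  # sub-frames averaged per output frame (the 4x mentioned above)


def render(t, width=8, height=1):
    """Hypothetical renderer: a one-pixel white bar moving left to right.

    t is the animation time in [0, 1). Returns rows of integer
    intensities in 0..255.
    """
    x = int(t * width) % width
    return [[255 if col == x else 0 for col in range(width)]
            for _ in range(height)]


def supersampled_frame(frame_index, num_frames, samples=SAMPLES,
                       width=8, height=1):
    """Average `samples` sub-frames spread evenly over one output frame."""
    # Integer accumulation buffer, one entry per pixel.
    acc = [[0] * width for _ in range(height)]
    for s in range(samples):
        # Sub-frame times cover the interval owned by this output frame.
        t = (frame_index + s / samples) / num_frames
        sub = render(t, width, height)
        for y in range(height):
            for x in range(width):
                acc[y][x] += sub[y][x]
    # The output frame is the per-pixel average of the sub-frames.
    return [[value // samples for value in row] for row in acc]


if __name__ == "__main__":
    print(supersampled_frame(0, 4, samples=1))  # single sharp bar
    print(supersampled_frame(0, 4))             # bar smeared along its path
```

With one sample per frame the bar jumps two pixels between frames; with four samples each frame contains a dimmer smear over the pixels the bar crossed, which is what reads as motion blur.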
Interestingly, the color mapping is also done afterwards (lines 38..41), based only on the red value of the original image. (One could do HDR and other tone mapping here, too.)
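A small sketch of that kind of post-hoc mapping, again in plain Python and not taken from the original: each pixel's red value is treated as a 0..255 intensity and mapped onto a two-color gradient. The function name `color_map` and the two endpoint colors are arbitrary choices of mine.

```python
def color_map(rows, dark=(20, 10, 40), light=(255, 220, 160)):
    """Recolor an image using only each pixel's red channel.

    rows is a list of rows of (r, g, b) tuples with components in 0..255.
    The red value picks a point on a linear gradient from `dark` to `light`;
    the green and blue components of the input are ignored.
    """
    out = []
    for row in rows:
        new_row = []
        for pixel in row:
            u = pixel[0] / 255  # red value as a 0..1 intensity
            new_row.append(tuple(
                round(d + (l - d) * u) for d, l in zip(dark, light)
            ))
        out.append(new_row)
    return out


if __name__ == "__main__":
    averaged = [[(0, 0, 0), (127, 127, 127), (255, 255, 255)]]
    print(color_map(averaged))
```

Doing the mapping after averaging means the blur is computed on plain intensities, and the palette can be swapped (or replaced with a tone-mapping curve) without re-rendering anything.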