At Stack Exchange, our monitoring system Bosun (http://bosun.org) can use different time series databases as long as they can be bent into a tag key/tag value model. Currently it works best with OpenTSDB, but it also supports Graphite (and Elasticsearch populated by Logstash). InfluxDB query support is in a branch, but we don't want to merge it until we have a dedicated Bosun+InfluxDB maintainer, since we don't currently use InfluxDB at Stack.
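As a rough illustration of what bending a backend into a tag key/value model means, here is a minimal Python sketch that maps a dotted Graphite path onto a metric name plus tags. The `bend_graphite_path` helper and its positional template are hypothetical; this is not Bosun's actual API or configuration format.

```python
from typing import Dict, List, Tuple


def bend_graphite_path(path: str, template: List[str]) -> Tuple[str, Dict[str, str]]:
    """Map a dotted Graphite path onto (metric, tags) using a positional template.

    Each template entry names the tag for the path segment at the same
    position; the special name "metric" marks segments that are joined
    back together to form the metric name.
    """
    segments = path.split(".")
    if len(segments) != len(template):
        raise ValueError(f"path {path!r} does not match template {template!r}")
    metric_parts: List[str] = []
    tags: Dict[str, str] = {}
    for name, value in zip(template, segments):
        if name == "metric":
            metric_parts.append(value)
        else:
            tags[name] = value
    return ".".join(metric_parts), tags
```

For example, `bend_graphite_path("ny-web01.cpu.user", ["host", "metric", "metric"])` returns `("cpu.user", {"host": "ny-web01"})`. A positional template only works when every path for a metric follows the same layout, which is part of why Graphite data has to be "bent" rather than mapped directly.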
Based on that experience, plus conversations at Monitorama the other week, here is what I think the current state of various TSDBs is. Some of this might just be lies or rumor, so take it as such:
* OpenTSDB: Requires HBase behind it, which can be a pain for people. Maintenance is sparse; it seems like the project is short on contributors with the time needed. Stability isn't great (connection errors from time to time; having alerts based on querying OpenTSDB highlights this). Aggregation and downsampling don't always behave as expected. For example, rate derivatives happen too late in the order of operations, and linear interpolation can be strange. Also, querying a metric with many tag combinations over anything more than a recent interval (say a month or more) is basically impossible: OpenTSDB's memory blows up and GC dominates. The workaround is to create additional, denormalized metrics for those queries. That is kind of okay because OpenTSDB is incredibly storage efficient at storing time series data. No support for NaN. OpenTSDB has quite a few serious users https://github.com/OpenTSDB/opentsdb/wiki/Companies-using-Op.... It can ingest a lot of metrics at a high rate without issue.
* KairosDB: Not much experience here. From what I gather, it is like OpenTSDB but backed by Cassandra. Someone mentioned they thought they had heard that some core devs have gone to work at Influx, which might be concerning, but I don't know if that is true. It also has the same issue of having to run Cassandra if you don't already.
* Graphite: Very rich query language, but currently not a key/value model. It is also not very storage efficient, so the usual approach is to roll data up after a certain period of time, which is generally problematic for forecasting.
* InfluxDB: Looks promising, but at Monitorama I heard from multiple people: "Tried influxdb - was cool but all my data corrupted and I couldn't recover it". The general concern at Monitorama was that the project is currently overstating its stability for production environments. Based on some basic testing at Stack, we found it to be much slower and to take up a lot more space than OpenTSDB.
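To illustrate why it matters where rate conversion sits in the order of operations, here is a minimal Python sketch. It is not OpenTSDB code: the `rate` and `sum_series` helpers are hypothetical, the two counter series are made-up values, and a counter reset is assumed to restart from zero. Summing first and then taking the rate misreads one series' reset as a change in the total, while taking per-series rates first keeps the result sensible.

```python
from typing import List


def rate(values: List[float], interval: float, counter: bool = False) -> List[float]:
    """Per-second rate between consecutive points spaced `interval` seconds apart.

    With counter=True, a drop in value is treated as a counter reset to zero,
    so the increase since the reset is taken to be the new value itself.
    """
    out: List[float] = []
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        if counter and delta < 0:
            delta = cur  # assume the counter restarted from 0
        out.append(delta / interval)
    return out


def sum_series(series: List[List[float]]) -> List[float]:
    """Point-wise sum of series that are already aligned on the same timestamps."""
    return [sum(points) for points in zip(*series)]


a = [100.0, 110.0, 120.0, 130.0]
b = [500.0, 510.0, 5.0, 15.0]  # counter reset between the 2nd and 3rd points

# Aggregate first, rate last: the reset in `b` looks like a jump in the total.
late = rate(sum_series([a, b]), 10.0, counter=True)    # [2.0, 12.5, 2.0]

# Rate each series first, then aggregate: the reset is handled per series.
early = sum_series([rate(a, 10.0, counter=True),
                    rate(b, 10.0, counter=True)])      # [2.0, 1.5, 2.0]
```

Without counter-reset handling, the late ordering is even worse: the middle value becomes -49.5 per second.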
In summary, there is no great choice today; it's more of a pick-your-pain, best-fit situation. But I'm really curious what people with actual experience in these technologies can add about the tradeoffs, and I'm hopeful for the future.
InfluxDB CEO here. Those data corruption problems were with the 0.8 line of releases. But to be honest, there are people who have been running 0.8 and 0.7 in production for almost a year without problems. Your mileage may vary, but we're not supporting any releases prior to the 0.9 line.
The 0.9 line of releases is what we're supporting going forward. There are some queries that cause the server to crash, but as far as I know, there are no problems that corrupt the database or cause data loss.
We'll be releasing 0.9.1 tomorrow. Every 3 weeks after that, we'll release a new point release in the 0.9 line that will be a drop-in replacement.
Each of these releases will fix bugs, improve performance, and add clustering features (the clustering work starts with 0.9.2).
We're starting work on the on-disk size in the 0.9.2 release cycle. If it's ready, it'll be in that release in 3 weeks.
Basically, it works now for some use cases and scales. Over the next 3 months we'll be adding features and optimizing to make it useful for larger scales and more use cases.
Overall it's still alpha software, which is why we haven't put anything out there that's called a 1.0 release. However, we're trying very hard to not make any breaking API changes going forward between now and whenever we get to 1.0.
I performed some OpenTSDB vs. InfluxDB comparisons and found that, for an identical data set, InfluxDB used almost 20x the storage space and was 3x slower than OpenTSDB. The speed isn't that big of a problem, and I'm convinced it will get faster (especially with the write improvements in 0.9.1), but the space issue is harder to swallow.
HBase, which backs OpenTSDB, can compress data using LZO or Snappy. Uncompressed, our test data was 4GB, but it went down to about 400MB after HBase compressed it. InfluxDB was using 8GB. OpenTSDB has done a lot of work to be byte efficient, and it's paid off. We hope InfluxDB will get to a similar place.
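The "almost 20x" figure follows from the approximate sizes quoted. A quick check, treating 1GB as 1000MB for simplicity (the variable names are just illustrative):

```python
# Approximate sizes reported for the identical test data set, in megabytes.
opentsdb_raw_mb = 4000.0         # OpenTSDB/HBase data before compression
opentsdb_compressed_mb = 400.0   # after HBase compression
influxdb_mb = 8000.0             # InfluxDB on-disk size

hbase_compression_ratio = opentsdb_raw_mb / opentsdb_compressed_mb  # 10.0
influx_vs_compressed = influxdb_mb / opentsdb_compressed_mb         # 20.0
influx_vs_raw = influxdb_mb / opentsdb_raw_mb                       # 2.0
```

So most of the gap comes from HBase's roughly 10x compression; even against uncompressed OpenTSDB data, InfluxDB used about twice the space.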
Yep, compression work is starting in the 0.9.2 release cycle. We'll be testing those compression methods along with other techniques like delta encoding.
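As a rough sketch of the general idea behind delta encoding (not InfluxDB's actual implementation), regularly spaced timestamps turn into small, repetitive integers, and taking deltas a second time (delta-of-delta) mostly yields zeros. Those small values can then be packed into far fewer bytes, for example with a variable-length integer encoding, which this sketch does not show. The helper names and timestamps are made up for illustration.

```python
from itertools import accumulate
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each value's difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


# Hypothetical Unix timestamps sampled roughly every 10 seconds.
timestamps = [1434000000, 1434000010, 1434000020, 1434000030, 1434000041]

encoded = delta_encode(timestamps)        # [1434000000, 10, 10, 10, 11]
second_order = delta_encode(encoded[1:])  # delta-of-delta: [10, 0, 0, 1]
```

Decoding is lossless: `delta_decode(encoded)` gives back the original timestamps.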