The new model combines tags, which are indexed, with fields, which are not indexed. A measurement can have up to 255 different fields, all of which can be written for a single data point.
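To make the tag/field split concrete, here's a rough sketch of a single data point as a plain Python dict. The names and layout are illustrative only, not InfluxDB's actual wire format:

```python
# Illustrative sketch of the new tag/field model. All names here are
# made up for illustration; this is not InfluxDB's write protocol.
point = {
    "measurement": "cpu_load",
    "tags": {  # indexed: cheap to filter and group by
        "host": "server01",
        "region": "us-west",
    },
    "fields": {  # not indexed: the measured values themselves
        "value": 0.64,
        "user": 0.22,
        "system": 0.11,
    },
}

# All of a point's fields can be written together in one shot,
# up to the 255-fields-per-measurement limit mentioned above.
assert len(point["fields"]) <= 255
```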
As we push things forward we'll be adding more analytics queries in. But for the time being it's more aptly suited for metrics and sensor data.
With 0.9.0 you should be able to use a combination of fields and tags to get some fairly sophisticated queries. Where clauses also work on both tags and field values.
It might be easier to discuss on the InfluxDB Google Group. I'd like to hear more about your specific use case, the data you're writing in, and the kinds of questions you're asking of that data.
Interesting. So we'd see a lot of benefit from the new model.
Except our documents may easily have more than 255 fields. We would never need to query that many, mind you, but we don't know ahead of time which ones we'd need to query.
A nice thing about InfluxDB compared to other solutions is that it's schemaless and can still aggregate data pretty fast. ElasticSearch is much faster (than 0.8), but it has a big problem with indexes needing to be predefined (auto-creating mappings is highly flawed).
I might drop by the Google group with more questions.
Because ElasticSearch just assigns index settings based on the first value it gets.
For trivial values such as numbers, that's usually okay, unless it happens that the field is polymorphic (not a good schema design, of course).
But it doesn't know how to set up any of the mappings; it doesn't know whether something is a full-text field (which often requires analyzers) or an atomic, enum-type string.
It also doesn't know about dates. If you index pure JSON documents, it will simply store them as strings.
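To illustrate the failure mode, here's a toy sketch of first-value type inference. This is a deliberate simplification, not ElasticSearch's actual dynamic-mapping logic:

```python
# Toy sketch of "first value wins" type inference, roughly the
# behavior being described above (simplified; not the real ES code).
def infer_type(value):
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"  # a date arrives as a JSON string, so it lands here

mapping = {}

def index_doc(doc):
    for key, value in doc.items():
        # the first value seen locks in the type for the whole index
        mapping.setdefault(key, infer_type(value))

index_doc({"count": 1, "created": "2015-06-11T20:46:02Z"})
index_doc({"count": 2.5})  # polymorphic field: a float shows up later,
                           # but the mapping is already locked to "long"

assert mapping["count"] == "long"
assert mapping["created"] == "string"  # the date was never recognized
```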
This would all have been a non-problem if updating mappings in ES were simple, but it's not. A mapping is generally append-only; if you want to change the type of a mapping, or its analyzer, or most other settings, you have to create a new index and repopulate it. Schema migrations in ES are a lot more painful than in, say, PostgreSQL.
Analytics is usually based on aggregating discrete events, which form irregular time series. Metrics and sensor data are usually regular time series: series where you have samples taken at regular intervals of time, like once every 10 seconds.
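A quick sketch of the distinction, with synthetic data standing in for real measurements:

```python
import random

random.seed(0)

# Regular time series: one sample at a fixed interval, e.g. every 10 s.
# The values here are synthetic stand-ins for sensor readings.
regular = [(t, 20.0 + random.random()) for t in range(0, 120, 10)]

# Irregular time series: discrete events (page views, purchases, ...)
# landing at arbitrary moments; timestamps drawn at random here.
irregular = sorted((random.uniform(0, 120), 1) for _ in range(8))

# The gap between consecutive points is constant only for the regular one.
regular_gaps = {b[0] - a[0] for a, b in zip(regular, regular[1:])}
assert regular_gaps == {10}
```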
When it comes to querying regular time series, you don't have a huge number of points to aggregate across a single series, whereas in analytics a single series you're looking at can have millions of points.
Then there are other types of queries that you need in analytics that don't make sense for metrics, like getting sessions and calculating funnels.
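Sessionization is a good example of the kind of query that falls outside plain metric aggregation. Here's a minimal Python sketch, assuming a simple gap-based session definition (split whenever two events are more than a timeout apart):

```python
# Toy sessionization: split a user's event timestamps into sessions
# whenever the gap between events exceeds a timeout (30 min here).
# A hypothetical helper for illustration, not any product's actual API.
def sessionize(timestamps, timeout=1800):
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)  # continue the current session
        else:
            sessions.append([t])    # gap too large: start a new session
    return sessions

# Six events (seconds): three close together, two close together, one alone.
events = [0, 60, 120, 4000, 4030, 9000]
assert [len(s) for s in sessionize(events)] == [3, 2, 1]
```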
InfluxDB is still useful for analytics today; it's just that in some instances it's more basic and crude compared to what you can do with things like Mixpanel.
Is it possible that a regular time series could have better read performance (particularly in aggregations) vs an irregular one due to determinism/randomness - or is that irrelevant to the underlying implementation?