The new model combines tags, which are indexed, with fields, which are not indexed. A measurement can have up to 255 different fields, all of which can be written for a single data point.
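To make the tag/field split concrete, here's a rough sketch of a single data point as a plain Python dict. The names and layout are illustrative only, not InfluxDB's actual wire format:

```python
# Illustrative sketch of the new tag/field model. All names here are
# made up for illustration; this is not InfluxDB's write protocol.
point = {
    "measurement": "cpu_load",
    "tags": {  # indexed: cheap to filter and group by
        "host": "server01",
        "region": "us-west",
    },
    "fields": {  # not indexed: the measured values themselves
        "value": 0.64,
        "user": 0.22,
        "system": 0.11,
    },
}

# All of a point's fields can be written together in one shot,
# up to the 255-fields-per-measurement limit mentioned above.
assert len(point["fields"]) <= 255
```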
As we push things forward we'll be adding more analytics queries in. But for the time being it's more aptly suited for metrics and sensor data.
With 0.9.0 you should be able to use a combination of fields and tags to get some fairly sophisticated queries. Where clauses also work on both tags and field values.
It might be easier to discuss on the InfluxDB Google Group. I'd like to hear more about your specific use case, the data you're writing in, and the kinds of questions you're asking of that data.
Interesting. So we'd see a lot of benefit from the new model.
Except our documents may easily have more than 255 fields. We would never need to query that many, mind you, but we don't know ahead of time which ones we'd need to query.
A nice thing about InfluxDB compared to other solutions is that it's schemaless and can still aggregate data pretty fast. ElasticSearch is much faster (than 0.8), but it has a big problem with indexes needing to be predefined (auto-creating mappings is highly flawed).
I might drop by the Google group with more questions.
Because ElasticSearch just assigns index settings based on the first value it gets.
For trivial values such as numbers, that's usually okay, unless it happens that the field is polymorphic (not a good schema design, of course).
But it doesn't know how to set up any of the mappings; it doesn't know whether something is a full-text field (which often requires analyzers) or an atomic, enum-type string.
It also doesn't know about dates. If you index pure JSON documents, it will simply store them as strings.
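To illustrate the failure mode, here's a toy sketch of first-value type inference. This is a deliberate simplification, not ElasticSearch's actual dynamic-mapping logic:

```python
# Toy sketch of "first value wins" type inference, roughly the
# behavior being described above (simplified; not the real ES code).
def infer_type(value):
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"  # a date arrives as a JSON string, so it lands here

mapping = {}

def index_doc(doc):
    for key, value in doc.items():
        # the first value seen locks in the type for the whole index
        mapping.setdefault(key, infer_type(value))

index_doc({"count": 1, "created": "2015-06-11T20:46:02Z"})
index_doc({"count": 2.5})  # polymorphic field: a float shows up later,
                           # but the mapping is already locked to "long"

assert mapping["count"] == "long"
assert mapping["created"] == "string"  # the date was never recognized
```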
This would all have been a non-problem if updating mappings in ES were simple, but it's not. A mapping is generally append-only; if you want to change the type of a mapping, or its analyzer, or most other settings, you have to create a new index and repopulate it. Schema migrations in ES are a lot more painful than in, say, PostgreSQL.
Analytics is usually based on aggregating discrete events, which form irregular time series. Metrics and sensor data are usually regular time series: series where you have samples taken at regular intervals of time, like once every 10 seconds.
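A quick sketch of the distinction, with synthetic data standing in for real measurements:

```python
import random

random.seed(0)

# Regular time series: one sample at a fixed interval, e.g. every 10 s.
# The values here are synthetic stand-ins for sensor readings.
regular = [(t, 20.0 + random.random()) for t in range(0, 120, 10)]

# Irregular time series: discrete events (page views, purchases, ...)
# landing at arbitrary moments; timestamps drawn at random here.
irregular = sorted((random.uniform(0, 120), 1) for _ in range(8))

# The gap between consecutive points is constant only for the regular one.
regular_gaps = {b[0] - a[0] for a, b in zip(regular, regular[1:])}
assert regular_gaps == {10}
```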
When it comes to querying regular time series, you don't have a huge number of points to aggregate across a single series, whereas in analytics a single series you're looking at can have millions of points.
Then there are other types of queries that you need in analytics that don't make sense for metrics, like getting sessions and calculating funnels.
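Sessionization is a good example of the kind of query that falls outside plain metric aggregation. Here's a minimal Python sketch, assuming a simple gap-based session definition (split whenever two events are more than a timeout apart):

```python
# Toy sessionization: split a user's event timestamps into sessions
# whenever the gap between events exceeds a timeout (30 min here).
# A hypothetical helper for illustration, not any product's actual API.
def sessionize(timestamps, timeout=1800):
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)  # continue the current session
        else:
            sessions.append([t])    # gap too large: start a new session
    return sessions

# Six events (seconds): three close together, two close together, one alone.
events = [0, 60, 120, 4000, 4030, 9000]
assert [len(s) for s in sessionize(events)] == [3, 2, 1]
```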
InfluxDB is still useful for analytics today; it's just that in some instances it's more basic and crude compared to what you can do with things like Mixpanel.
Is it possible that a regular time series could have better read performance (particularly in aggregations) vs an irregular one due to determinism/randomness - or is that irrelevant to the underlying implementation?