Interesting. Do you think your use case could be helped by running only the training remotely on a single GPU or GPU cluster, and doing the rest of the development work on a cheaper machine? (Basically the equivalent of estimator `train()` runs on a faster machine that quits afterward.)
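As a rough illustration of that split, here's a minimal sketch of a helper that dispatches a training script to a GPU host over SSH and blocks until it exits. The host name, script path, and `--output` flag are all hypothetical placeholders, not part of any real tool; the idea is just that the expensive machine runs training and quits, while checkpoints land somewhere the cheap dev machine can fetch them.

```python
import subprocess
from typing import List

def build_remote_train_command(host: str, script: str, out_dir: str) -> List[str]:
    """Build the ssh command that runs a training script on a remote GPU host.

    The remote machine only runs training and then exits; weights are written
    to out_dir so the cheaper dev machine can pull them down afterward.
    (host/script/out_dir are illustrative placeholders.)
    """
    remote_cmd = f"python {script} --output {out_dir}"
    return ["ssh", host, remote_cmd]

def run_remote_train(host: str, script: str, out_dir: str) -> int:
    # Blocks until the remote train() run finishes, freeing the GPU box after.
    result = subprocess.run(build_remote_train_command(host, script, out_dir))
    return result.returncode
```

Everything except the `train()` call itself (model definition, data prep, evaluation) would stay on the local machine in this setup.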
I wrote a tool that does that with Keras but I'm not sure if it's actually useful for real-world use cases.