I don't have great measurements for this, but we have done optimizations to reduce the amount of data flowing over the Go<->C interface. One measurement that does have a big impact on performance is how often we need to upload data to the GPUs (data that isn't already there), so that's something we have worked on reducing through buffer reuse and compression. We also have a series of caches on the C side, so we aren't drawing more than we need to. It's hard for me to tease apart how many of these optimizations (and others) are ultimately aimed at the cgo overhead and how many are just typical performance work. The data we work with is cumbersome, and my intuition is that there's probably still a lot of room for optimization in our drawing, regardless of cgo. I wouldn't be surprised if a direct C/C++ port of the rendering pipeline were significantly faster than ours at getting data into and out of the GPUs. But a big part of the project is also data storage, networking, serving, and caching, and Go has bridged the gap for us (a small team that needs to build reasonably fast things reasonably quickly :)).
That's interesting. The cgo overhead was the only thing holding me back from considering Go for games: I didn't want to write a lot of C wrappers around the C libraries I want to use just to make calling them more efficient. That's a shame, since Go is pretty nice, apart from the C interop in some cases.
When I used to frequent gonuts, I raised the question of why they didn't go the FFI route, as D, Rust, .NET, Delphi and FreePascal do, but sadly they preferred cgo as the solution.