I'm going to be the first to point out the one major flaw in this comparison: "plain C using GLib" is not comparable to "C++ standard library only" --- what should be compared is "C++ standard library only" and "C standard library only". Reimplementing pkg-config in pure C without GLib would be necessary for that.
As for the "memory leaks" --- I haven't looked at the source, but something whose runtime is very short-lived, like pkg-config, may be very well justified in allocating and never freeing, letting the process exit itself be the "ultimate free". I've seen and done this many times myself.
I've seen projects that turned from simple and straightforward to buggy (and harder to debug), slow, and bloated because someone decided they wanted to "use C++" and would try to make use of as many "modern C++" features as they could.
Converting an existing C program into C++ can yield programs that are as fast, have fewer dependencies and consume less memory. The downsides include a slightly bigger executable and slower compilation times.
I have heard of calculating a "memory budget" and pre-allocating that, but calculating a "leak budget" and doubling that doesn't seem like hygienic programming.
We wouldn't want the programmer to spend too much money/time on fixing errors. After all, it's not like people would die if there's an error in the missile software: http://www.gao.gov/mobile/products/IMTEC-92-26
Not to mention they added even more HW to work around the leaks. No wonder those projects always run over budget.
>We wouldn't want the programmer to spend too much money/time on fixing errors. After all, it's not like people would die if there's an error in the missile software
When exactly did pkg-config become missile software?
As they say in meme-land, "that escalated quickly".
Here's a novel idea: how about the appropriate level of effort/time/YAGNI-stuff based on the domain?
Or do you write one-time scripts with MISRA rules?
The sub-thread I replied to was linking to an anecdote about missile software.
But I still think not freeing is pretty lame even in short-lived tools. If one is using a no-op deallocator at least the code is designed properly and could be repurposed.
The problem with counting leaks is that it counts memory you need until the very end of your program. There is no point calling free as you are exiting anyway, and I have seen programs where freeing everything took 0.5 seconds as the program was closing.
As someone whose startup was probably hindered by an almost religious adherence to testing with Valgrind (hardware was way too slow), I'd still say finding the real leaks and bugs is a lot easier when you don't have the noise of "expected" leaks.
> As for the "memory leaks" --- I haven't looked at the source, but something whose runtime is very short-lived, like pkg-config, may be very well justified in allocating and never freeing, letting the process exit itself be the "ultimate free". I've seen and done this many times myself.
This makes sense, and then somebody has a vision for a use case beyond the original imagination, goes to turn the code into a library, and spends countless hours smartening up lazy resource management. Whether the original author should be "more responsible" is open for debate, but I've personally run into the above situation, and only mention it for another perspective on The Life of Code.
That's the classic KISS/YAGNI vs. (not sure if there is an initialism for it) robust-extensible-modular-best-practices debate, over which there has been much bikeshedding from both sides.
>This makes sense, and then somebody has a vision for a use case beyond the original imagination, goes to turn the code into a library and spends countless hours smartening up lazy resource management.
That's the question. Don't you think it's easy to think of reasons, though? What if the person porting the code to a library was a later version of the original author?
> As for the "memory leaks" --- I haven't looked at the source, but something whose runtime is very short-lived, like pkg-config...
To be fair in the other direction, it's a minority of interesting software projects that don't care about memory leaks. Also, there are ways to "skip" freeing memory in C++ safely using appropriate allocators. In actuality, you'd have your allocator, in its destructor, clean up itself and all its objects at the same time. It's nearly the same performance without sacrificing correctness or lowering standards with respect to leaking memory.
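A minimal sketch of that allocator approach using C++17's `std::pmr` (the `count_words` function is just an invented example):

```cpp
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

// Everything is allocated from one arena and nothing is freed individually;
// the arena's destructor releases it all in one shot when we return.
std::size_t count_words(const std::string& text) {
    std::pmr::monotonic_buffer_resource arena;        // deallocate() is a no-op
    std::pmr::vector<std::pmr::string> words(&arena); // allocates from the arena
    std::pmr::string current(&arena);
    for (char c : text) {
        if (c == ' ') {
            if (!current.empty()) { words.push_back(current); current.clear(); }
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) words.push_back(current);
    return words.size();
}   // arena goes out of scope here: one bulk release, no per-object frees
```

You get nearly the skip-the-frees performance, but the code still expresses ownership correctly and could be reused in a long-lived process.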
"Using C++" does not mean "Using lots and lots of features of C++ just because". That some people do that is hardly a criticism of C++, more of the lousy programmers that write software like that.
Unfortunately it seems that the majority of C++ code out there is like that. It could be said that C++ makes it far easier than C to introduce unnecessary abstraction and indirection, without realising the true costs.
I've noticed that using a lower-level language, where abstractions have to be built explicitly, tends to cause one to rethink the problem and often come up with an even simpler and more efficient solution, by approaching it from another direction which using a higher-level language may not even allow.
An example of this I encountered several years ago was with several coworkers who were trying with utmost effort to optimise a piece of code which the profiler had indicated was taking a substantial amount of time --- and a lot of it consisted of memory allocation and copying. They tried lots of "classic" tricks like unrolling, inlining, even reorganising the layout of several classes in an attempt to be more cache-friendly. I looked at the algorithm and realised rather quickly that the code in question was not necessary at all; some trivial modifications to code elsewhere which was using it and deleting that code completely resulted in 30x faster performance and 1/10 memory usage. Due to their background, my coworkers were stuck in the mindset that it was necessary to perform all that convoluted processing, and neglected to see the bigger picture.
> Unfortunately it seems that the majority of C++ code out there is like that. It could be said that C++ makes it far easier than C to introduce unnecessary abstraction and indirection, without realising the true costs.
Citation needed? Better developers are more aware, in any language. There are some cases where idiomatic C++ may introduce more indirection over C (though I can't think of any); there are plenty where idiomatic C introduces more indirection than C++. However, with a little more effort and awareness, the faster and more maintainable solution is always accessible.
> I've noticed that using a lower-level language, where abstractions have to be built explicitly, tends to cause one to rethink the problem and often come up with an even simpler and more efficient solution, by approaching it from another direction which using a higher-level language may not even allow.
Having lots of developers re-implement many things mostly just results in much buggier code. Getting things exactly right is hard. Having a bigger standard library and safer abstractions is a huge edge.
I don't think your anecdote has anything to do with C vs C++. I think basically some negative experiences with so-so C++ devs has colored your thinking rather than technical reasons.
I've only written patches of C++ in a couple of tiny projects, and something I've had trouble with in both cases is trying to figure out the somewhat objectively "right" or "best" way to write it. It's not a language I'm fluent in (the vast majority of my experience has been in Objective-C and Swift) so I find myself spending a good chunk of time doing research trying to figure out what the most widely accepted/correct way to do [insert thing] is in C++.
So if what's written here is true, I may be unwittingly baking bad practices into my C++ knowledge as a direct result of trying to accomplish the exact opposite…
Which leads me to the question: what is the "right way"? I've seen highly vocal critics of writing C++ as "C with extras", so I assume some middleground is where I need to target?
I would read books like Scott Meyers' Effective Modern C++, watch cppcon talks. There's a lot of very high quality C++ content out there these days.
Generally there's a lot of emphasis on RAII, clear ownership semantics, leveraging more of the standard library as it's grown, using lambdas, avoiding shared mutable state, judicious but not excessive use of inheritance (in particular avoiding implementation inheritance), and encapsulation.
It's not so much a middle ground in the sense you are thinking. The people I'm talking about don't advocate developers, particularly non-experts, going crazy with templates. People do that on their own.
Lots of people who have produced super high quality, money-making software in C++ ignore almost everything advised by Stroustrup and other modern C++ advocates. Don't worry about it, just solve your own problems.
The right way is, to speak in philosophical terms, totally a pragmatic choice. If nothing in C++ is working for you, don't use it. If a really obscure technology is making you go really fast, use it.
The one and only thing that muddies this picture is other people. Once you are collaborating on a program (whether on the same team, through end-user code, or through a library or API call), all your tricks, preferences, and conventions are subject to other people's inept groping and misunderstanding. And that is where you get into standardized best practices. They are basically guaranteed not to actually be the best practice, but they're the ones you can compromise on.
My experience has been the opposite of yours. Freeing is always a good idea so you can use tools like Valgrind to find memory bugs without needing to sort through all the false positives. Your short-lived program today can become a service tomorrow. C++ is worth it just for RAII.
True, the defaults are pretty lame, but my point is if you don't free, and you start using data you intended to free, that's a memory bug valgrind won't be able to help you with unless you free your memory.
FWIW, this is in my bashrc along with a myriad of other programs that have lame defaults:
alias valgrind='valgrind --leak-check=full --show-reachable=yes --track-origins=yes --track-fds=yes --error-limit=no'
> As for the "memory leaks" --- I haven't looked at the source, but something whose runtime is very short-lived, like pkg-config, may be very well justified in allocating and never freeing, letting the process exit itself be the "ultimate free".
But sometimes the code grows and what was once a stand-alone executable is about to become a component in a larger executable. With C++ you get a correct component out of the box (if you use RAII consistently), but with C you have to audit all of the code and clean up the leaks.
Converting the code to pure C without GLib would increase the code size considerably, since GLib provides utilities, such as hashtables, trees, and better string handling, that the standard C++ library provides but the standard C library does not.
Never freeing is sloppy, it means resource handling was likely not thought through.
I would be concerned about non-memory resource handling in particular.
If one must use this "trick", it's better to think things through, add the appropriate release calls and then somehow replace the release function with a no-op.
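One way to do that swap (the `xyz_free` name and the `XYZ_EXIT_FREES` flag are invented for illustration): keep the release calls in the code so ownership stays documented, and compile them down to a no-op in the short-lived-tool build.

```cpp
#include <cstdlib>

static int g_release_calls = 0;  // only here to make the behaviour observable

#ifdef XYZ_EXIT_FREES
// Short-lived-tool build: deliberately leak; the OS reclaims it all at exit.
static inline void xyz_free(void*) { ++g_release_calls; }
#else
// Normal build: really free, so the code can be repurposed as a library.
static inline void xyz_free(void* p) { ++g_release_calls; std::free(p); }
#endif

static int demo() {
    void* p = std::malloc(64);
    xyz_free(p);  // the release point is still written down either way
    return g_release_calls;
}
```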
Anyway, you're welcome to take a C++ project and translate it to a fast C project with fewer dependencies and lower memory consumption. Then we can discuss facts instead of your personal opinions.
Never freeing is common practice in embedded systems where you should be pre-allocating all data (after worst-case analysis). This removes any possibility of fragmentation issues, etc. Many coding standards forbid dynamic allocation for embedded & real-time systems also for safety & reliability reasons.
It is not unusual to have allocate-only heaps for exactly these reasons.
I agree, but the C idiom is to just not call free, without pre-allocating anything. It's an optimisation to avoid waiting for the resources to be released, with the idea that they will be anyway when the process is killed.
The minor discussion on the C version's memory leaks reminded me of a neat trick. If you're developing a short lived application, like pkg-config, you can opt to never deallocate. i.e. leak everything. In lightweight, short lived applications there's usually not a lot of incentive to deallocate; your application will never use much memory anyway and the deallocations waste time.
You can think of it like treating C as a garbage collected language, except the garbage collection cycle occurs only once at the end of the program :P
It really can be an effective trick. Deallocation isn't free, and under certain loads can be quite expensive.
The 1000+ leaks in the C version might actually be what's giving it the slight run-time advantage.
I remember one of those sysadmin stories, where a multi-terabyte `cp` command had seemingly completed all of its work, but was sticking around for days; slowly and pointlessly free()ing the 17GB hash table of hardlinks that it had built up; when it could have just exited and let the OS reclaim the memory.
It's stop-the-world garbage collection, where collection only happens when the world ends.
Side note wrt deallocations being somewhat slow: could something like Boehm conservative GC speed that up, by grouping all the deallocations together, or by doing them on a separate thread?
Or take that even further: allocate a slab that's big enough to last the entire runtime and send malloc on vacation. It's worth repeating: with today's focus on web frameworks, cloud providers, and dogma, these ideas are slipping into obscurity, which is a shame given how beneficial they can be if your program fits the use case.
C++ actually supports this approach in the standard library and doing this is a really standard technique in some domains. That's what the allocator template parameter in vector and other containers is for.
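As a sketch of what that looks like, here's a deliberately minimal bump allocator over a fixed slab, plugged in through the allocator template parameter (no bounds checking, a single global slab; not production code):

```cpp
#include <cstddef>
#include <vector>

// One global slab; allocation is just an aligned pointer bump, and
// deallocation is a no-op because the slab outlives the containers.
struct Arena {
    static inline std::byte slab[1 << 16];
    static inline std::size_t used = 0;
};

template <class T>
struct SlabAlloc {
    using value_type = T;
    SlabAlloc() = default;
    template <class U> SlabAlloc(const SlabAlloc<U>&) {}

    T* allocate(std::size_t n) {
        Arena::used = (Arena::used + alignof(T) - 1) & ~(alignof(T) - 1);
        T* p = reinterpret_cast<T*>(Arena::slab + Arena::used);
        Arena::used += n * sizeof(T);  // no overflow check: this is a sketch
        return p;
    }
    void deallocate(T*, std::size_t) {}  // no-op by design
};
template <class T, class U>
bool operator==(const SlabAlloc<T>&, const SlabAlloc<U>&) { return true; }
template <class T, class U>
bool operator!=(const SlabAlloc<T>&, const SlabAlloc<U>&) { return false; }
```

Then `std::vector<int, SlabAlloc<int>> v;` behaves like any vector, but all its growth comes out of the slab and "freeing" is one bulk release when the slab goes away.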
And, instead of using pointers into said memory, use indices so that the slab can be reallocated, or just mmap more pages following the slab if you own the process address space. Also, align properly. And having guard pages is always nice. Freeing on exit too, one single deallocation is pretty cheap.
I've seen programs building ASTs with hundreds of millions of nodes, where all the nodes were allocated by a separate malloc call, and ref-counted... More than one-third of the startup time (which was counted in minutes) was calls to malloc and free. Some optimizations were made, but in the end we ended up reducing the size of the AST instead of fixing the allocations.
Last time I did any serious parser work I used a pool allocator so I could free all the nodes at once, so allocation was just a compare + increment operation. Although that was forced on me by the difficulties of error recovery in yacc.
  Language   Design      Implementation   Relative performance
  either     any         inline           1 (fastest)
  C++        polystate   non-inline       1.56 x fastest
  C++        bundled     non-inline       1.65 x fastest
  C          polystate   non-inline       1.70 x fastest
  C          bundled     non-inline       1.79 x fastest
  C++        unbundled   non-inline       1.82 x fastest
  C          unbundled   non-inline       1.95 x fastest
He furthermore argued that the biggest mistake C++ developers made, the one that killed the adoption of C++ by C programmers, was to diverge from the previous line of "C++ is a better C" to "if you're using C++ as a better C you're doing it wrong".
This is not in the sense of tossing away C coded programs wholesale and rewriting it in D, but incrementally using D here and there for parts of a C program. That way, you've always got a working, usable program.
Sure but there is just no reason at all to copy unbraced blocks. All you do is invite bugs for zero benefit.
Maybe there are bigger issues with C? But that's a different discussion. I want to know why you copied something as simultaneously horrendous and useless as unbraced blocks? If you just didn't think it through and that's the way languages syntactically similar to C have always done it, ok. I'm sure I've made worse mistakes. But please call it one way or the other.
> "if you're using C++ as a better C you're doing it wrong"
As far as correctness and safety goes, this is still true. It's difficult to scale systems-level programming to large teams. C++ gives the opportunity for more explicit semantics and more aggressive compile-time checks. C can scale well and can be used safely, but you need to do a lot more through convention (always call xyz_Create and xyz_Destroy in pairs!) and through runtime checks (calls to assert, unit testing).
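To make the contrast concrete (the `xyz_*` names are hypothetical): the C convention leans on caller discipline, while the C++ destructor is inserted by the compiler on every exit path.

```cpp
// C convention: the caller must pair create/destroy on every path,
// including early returns; only discipline enforces it.
struct xyz { int handle; };
xyz* xyz_create() { return new xyz{42}; }
void xyz_destroy(xyz* p) { delete p; }

// C++ RAII: the destructor is the "destroy" half, and the compiler
// runs it on every exit path automatically.
class Xyz {
public:
    Xyz() : handle_(42) {}
    ~Xyz() {}  // release the real resource here
    int handle() const { return handle_; }
private:
    int handle_;
};

int use_raii() {
    Xyz x;              // no matching "destroy" call to forget
    return x.handle();  // destructor runs here, even on exceptions
}
```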
D, Rust, OCaml, and a few other projects are interesting in this space since they provide some of the same benefits as C++ with respect to correctness and safety. Some are plausibly better in theory, though I'm not aware of huge, say, Rust projects that approach the size of huge C++ ones.
> D, Rust, OCaml, and a few other projects are interesting in this space since they provide some of the same benefits as C++ with respect to correctness and safety.
Can you name a correctness and safety benefit that C++ has that these programming languages do not?
I didn't mean to imply that C++ was safer somehow. I just meant that those languages are also competing in that feature space in a way that C doesn't.
And, on a pedantic level, C++ competitors can't provide exactly the same benefits of C++ because they took different approaches.
What's a key design difference among these languages? Well, C++ can mostly just #include a C header file and go with it. The other languages provide FFI mechanisms, but they each require declarations of the FFI to match the compiled C code. So theoretically there's a little more room for errors in that translation, though I doubt that's a big concern on the whole. Each of those languages has a more mature module system, which should more than make up for keeping FFI interfaces in sync with C headers.
Agreed.
It's the size and type of the project that determines the choice of language more than the newness of the language. C++ was designed to meet the requirements of certain types of projects that were coming into vogue at the time it was designed. It was not a replacement for C.
Some of the structural advantages of C++ over C can be achieved in C by using generative programming, for example, and building in automatic mechanisms to ensure there are no memory leaks. In other words, the C++ approach to structuring programs is not the only way to achieve the benefits that that structuring implies. It's just really easy to do it that way.
It's true that generative programming can make up for shortcomings of C. But you're pretty much writing in two languages at that point, C and whatever spec generated the rest of the C code. It's not an apples-to-apples comparison to C++.
Well, some parts of the generative programming could be built into a tool and you probably wouldn't have to rewrite a generative program each time. So it's not really writing in 2 languages but using code generation to assist the programming process to reduce potential for error as a more flexible alternative to creating fixed constructs in a purpose built language.
You can get 90% of that benefit by using snippets in your favorite editor. I guess you could consider that "code generation", but "generative programming" means, to me, "check in the specification, not the production code".
You're assuming generative programming is used to completely replace direct programming. This doesn't have to be the case. It could also be used merely in an assistive role, to supplement the ability to write code.
I'm not. That's what I'm referring to as "snippets", though other forms of scaffolding do apply. If your position is that people should use more sophisticated editing/authoring tools on a regular basis, that's not that controversial a statement.
I was just saying that full-blown code generation isn't merely writing in the same language but adopting a DSL as well, so we're not strictly comparing languages at that point.
Right. Code generation has typically been associated with DSLs.
Here's one way it could be done simply. Let's say you wanted to automate the process of memory allocation and deallocation. You would need a way to describe to the code generator the memory requirements of your structure. For that you would need a description outside of C, but that description could be embedded in the comments of your code, and your code generator could be designed to parse those comments to determine what needed to be done.
Knuth also came up with the idea of Literate Programming, in which the description of a program is embedded as TeX in the code. This could work in a similar way. So, while you would use a DSL, the description would be inline with your code, and the authoring process would be integrated rather than two-stream.
The aside about Rust is a total straw man. Obviously no one with even a modicum of knowledge of programming languages would think Rust is the only memory safe language. Googling the supposed quote also turns up no results but this post.
It is nonsensical to consider memory leaks as reported by Valgrind on a program that uses GLib. GLib allocates and builds a whole context system and never frees it, which confuses Valgrind. I am pretty certain almost all of the 'leaks' come from there; libglib should be put in a Valgrind suppression file.
It was a very bad choice to choose a program based on Glib for this kind of experiment.
I think what you have said here sums up the problem quite concisely. They might as well add core-foundation, qt-core, and stdlib based executables to the test for 'C' vs 'C++' to give a better cross-section.
FWIW, Donald Knuth was a proponent of using C over C++ at the time it first came out. He equated C++ with the use of frameworks in writing programs which he thought were a bad idea for the profession as it would dumb it down. C++ does make code reuse a lot easier.
As an example of C++ making code reuse a lot easier, consider the Windows platform from a developer's perspective. Before C++, there was this huge library called Win32, in C, that contained several hundred functions and data structures to access the services of the platform. Since it was not object oriented, there was a fat book by Charles Petzold, which was like a Bible for Windows programmers, that described how each of the functions related to the others, in what sequence to call them, and a bunch of stuff that was not even documented by Microsoft.
Once C++ came, there was a library called MFC which was object oriented and hence a lot better documented and organised and now there's .NET.
The organization of functions into objects makes it a lot easier to understand systems software, especially if it's very large. Also, the ability to subclass means you can take the base functionality of "template" classes provided by a library and subclass them to extend them with what you need. This was not as easy in C, where you had to rely on sample code for this purpose. The Petzold book had a ton of sample code.
Not really: with ABI issues and compiler incompatibilities, widely used C++ libs are either header-only or have an "extern C" version of the public API. I'd say C++ makes reuse much harder.
I'm no expert on C++ and I've been considering using it for several projects.
An important thing for my needs is being able to define classes in one shared object and create new subtypes of those classes in another, possibly defining overrides on virtual methods and such.
A good friend of mine has said similar things as you - that the ABI issue has not been a major obstacle for some time.
And yet, as much as I search, I still find the same-old advice: Don't use STL types in your interfaces or throw exceptions across module boundaries.
If all the compilers used for a given platform follow the same ABI, would using a separate and specific STL implementation (say, STLport) instead alleviate that particular issue?
Sorry if this question seems a bit rambly, but I'd really love to find out how to use C++ in the way I've mentioned.
If you want to distribute dynamic-link binaries for windows, use MSVC.
If you want to distribute dynamic-link binaries for OS X, use Xcode.
If you want to distribute dynamic-link binaries for linux, you are SOL regardless of whether or not you are using C++, but if you use the same compiler and flags that the latest LTS version of Ubuntu uses, then it will work on Ubuntu, and will be made to work anywhere that Steam works.
It used to be that there were at least two C++ compilers for each *nix (typically GNU and something cfront based), so ABI was a much bigger deal.
When "Modern C++ Design" came out, famously none of the compilers could correctly compile all of the sample code. Since then things are much better; not that all compilers are bug-free of course, but they are good enough that if you report a bug, you can expect it to be fixed.
[EDIT]
"Don't use STL Types in your interfaces" is not advice I've heard in like 15 years; I more often hear "If you're using a C array instead of a Vector, you're doing it wrong"
"Don't throw exceptions across module boundaries" seems similarly odd. Unless your constructors are inlined, no modern code-base will follow that rule because RAII relies so strongly on exceptions.
There are coding styles that are opposed to exceptions as part of an external interface, but that's due to exceptions not being checked as part of the type system, and is not what I would call a majority opinion.
To clarify "module boundaries", I mean "separate shared objects."
As for Linux, I'm not too concerned with creating a single binary that works for all distributions.
I'm more concerned with someone being able to build a set of shared libraries on their distribution of choice and those shared libraries being able to interact naturally regardless of which compiler s/he uses to build each of them.
Say, LibA is built using LLVM. LibB is built using G++ and LibC is built using ICC.
LibA defines several classes. LibB creates some subtypes. LibC instantiates types from both LibA and LibB.
All the functions present in LibA, LibB, LibC make use of STL types such as std::string, std::vector, etc. Some may throw exceptions, whatever.
With respect to MSVC, I've read that compatibility between Debug and Release builds is kind of suspect, especially if you're using STL types. Not to mention differences in MSVC version. Is this still a concern?
> I'm more concerned with someone being able to build a set of shared libraries on their distribution of choice and those shared libraries being able to interact naturally regardless of which compiler s/he uses to build each of them.
Sorry, but this is an unreasonable standard. Literally no language, including C, supports this. With C it only works inasmuch as the C compiler authors work really hard to make it work, and even then it sometimes breaks (if your compiler inlines a call to malloc, and you free that pointer in code compiled with a different C compiler that inlined a different malloc implementation, it can break horribly. Yes, I've seen this happen.)
Some languages support cross-version linking (or whatever the language's equivalent of "linking" is), but I'm not aware of any that specify a complete ABI for unrelated implementations to support. IPC libraries do typically support this though.
[edit]
I don't want to go on a shared-library rant, but I am fairly strongly opposed to them (except perhaps in cases like how nixos manages it). You can take a statically linked binary from 1997 and run it unmodified on your linux machine today. It is a virtual guarantee that any dynamically-linked binary more than 2 years old will not work correctly. Linus puts a huge amount of effort into backwards compatibility, and it is completely destroyed by dynamic linking.
> If you use the same compiler, ABI is a non-issue
And yet... microsoft releases a new compiler every two years or so, and not every library you use is going to update at the same time. This is a huge frustration for a lot of people.
I write C++ professionally and I've seen people waste weeks on these things. Most of the libraries we wrote had plain C interfaces, because being able to use other languages to call into the code was important, and C++ is a nightmare for that.
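A sketch of that plain-C boundary approach (the `mylib_greet` name and API are invented): internals use C++ freely, but only C types cross the boundary, so no name mangling, std::string layout, or exceptions leak to callers in other languages.

```cpp
#include <cstddef>
#include <string>

namespace impl {
    // Internals are free to use C++.
    std::string greet(const std::string& name) { return "hello, " + name; }
}

extern "C" {
    // The boundary exposes only C types; errors would be return codes,
    // never exceptions. (No cap == 0 handling: this is a sketch.)
    std::size_t mylib_greet(const char* name, char* out, std::size_t cap) {
        std::string s = impl::greet(name);
        std::size_t n = s.size() < cap - 1 ? s.size() : cap - 1;
        s.copy(out, n);
        out[n] = '\0';
        return n;
    }
}
```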
Yeah, MSVC breaking ABI is somewhat annoying, but I am also used to keeping the most recent half-dozen MSVC's installed.
VS 6.0 is getting very hard to source legally these days, and I wish MS made it easier to get.
As far as having high-level languages call directly into C++, yes that's quite a pain (nearly impossible without something like https://github.com/rpav/c2ffi). Note also that calling into non-C ABI functions in any language is hard (and most HLLs don't support anything like extern "C" to make it easy).
Partly this depends on the platform.
C++ is well supported on Microsoft's .NET platform where you can access all the functionality of the .NET libraries through C++.
STL, I guess, is more used on Linux.
I would advise against trying to use portable libraries and instead using libraries designed for the platform you are targeting.
Having said that, a good portable UI library is the open-source wxWidgets, which is accessible through C++ on OS X, Linux, and Windows.
1) Ever try to upgrade MSVC versions? It's always a huge problem if you're using libraries you don't have source for. Not to mention the 50 million linker issues if one library is linked statically and the rest dynamically. There are still people on something like MSVC 6 because of this.
2) Header-only libraries are horrible for compile times, especially heavily templated ones (and if you use C++ generics it basically has to be a header-only library). The reason Boost is banned from a lot of C++ projects isn't because the library is bad, it's because of compile time.
The C++ version uses many memory allocations. Using custom allocators in the C++ program would certainly cut down on the number of allocations. It would also be interesting to see whether doing so improved performance.
Similarly, it would be interesting to see if using the C++17 string_view (or the gsl version if C++17 isn't available to you) instead of `const string &` parameters affected performance.
Finally, I see that in most (all?) cases, objects are returned by value, not through reference parameters or pointers. It's interesting that that choice didn't compare poorly to the C implementation.
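For reference, the kind of change being suggested: a `const std::string&` parameter forces callers holding a `char*` (say, from a C API) to materialise a temporary `std::string`, which allocates, while `std::string_view` just wraps the existing bytes. A minimal sketch (the `has_prefix` function is an invented example):

```cpp
#include <string_view>

// Accepts std::string, string literals, and char* alike, without
// constructing a temporary std::string (no allocation at the call site).
bool has_prefix(std::string_view s, std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
}
```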
I knew C++ compilation was slow, but 30x slower? I'm sure compile-time memory usage will show a similar trend. It would be interesting if somebody could explain the reason for this disparity.
Depending on your dependencies and programming style it might be more or less, but it doesn't sound unreasonable.
I wrote the same program in two environments once: C++ with the stdlib and Boost, and C++ purely using Qt abstractions. The second one compiled in 1/5th of the time. I guess that's because with Qt the headers are small, since most implementations are hidden behind pointers (PIMPL), while with Boost you often pull in lots of code through headers and compile dozens of specializations of similar types to avoid indirection costs.
Not exactly a fair comparison when the C program is using 1.5MB of pre-compiled dependencies while the C++ program reimplements all of the functionality from the dependencies in the code being compiled.
It's a well known fact. Just by including some headers from the standard library the compiler has to go over huge chunks of library template code, and it usually needs quite a few complicated phases to slowly morph those templates into executable code.
The problem, by and large, is that C++ is heavily dependent on header files to implement the Standard Library. It's largely templated, which means there's no way to make a pre-compiled version, the code generated varies wildly depending on the types involved.
C has relatively simple header files: they usually contain structs, function signatures, and a bunch of macros. They're easy to parse and apply by comparison, and they don't tend to be as deeply nested.
If C++ ever adopts the Pascal-style "module" extensions that have been kicking around in various proposals compile times could shrink by several orders of magnitude.
> If C++ ever adopts the Pascal-style "module" extensions that have been kicking around in various proposals compile times could shrink by several orders of magnitude.
I'm skeptical. Modules don't avoid the need for template instantiation.
Template instantiation surely only requires type substitution and re-running some analysis, though. What makes C++ compilation slow is reparsing headers again and again and again, because the C preprocessor means that every time they are encountered they may have new semantics.
The motivation for modules in C++ is similar to that of developing a Binary AST for Javascript, discussed on HN recently.
> What makes C++ compilation slow is reparsing headers again and again and again, because the C preprocessor means that every time they are encountered they may have new semantics.
Really? And I thought that this is why C and C++ headers are typically wrapped in an #ifndef-#define-#endif block, so they only produce whitespace after preprocessing on second inclusion.
Yes, this happens inside a single translation unit (.cpp file). However if you have multiple .cpp files which include the same header file you have to reparse it each time. This is because before the inclusion of that header different #defines might have been set (e.g. through other headers), and therefore the content of the header file might be different.
It's not the second inclusion that's a problem but the way any given template might behave completely differently depending on what order they're loaded in.
That is, including a, b, c is not necessarily the same as a, c, b or b, a, c. This is not true with proper modules: they're order invariant, and as such you can make a ton of optimizations.
If modules ever happen, wow, C++ is going to feel like a whole new language. I remember large Pascal codebases compiling in as little time as it took to press the key, and this was in the era of computers with mere megabytes of memory.
That's probably not just modules though. Pascal, being one of Niklaus Wirth's languages, was specifically designed to be easy to compile, generally not even requiring building an AST (though a particular compiler still might, especially if it added extensions to the language).
In this case it's likely header size. In native C++ programs you usually have more complex syntax to process (e.g. namespaces are no longer global, so each identifier has to be resolved within a hierarchy of namespaces) and, sometimes, compile-time evaluation via templates.
"The C++ version has no pointers but instead uses value types. This means that all data is stored twice: once in the array and a second time in the hash table."
This is interesting. Are they using modern C++ and making use of moves and perfect forwarding? Or are they just throwing std::strings around and doing millions of copies in the process (remember that std::vector elements must be CopyConstructible, so copy constructors and operator= get invoked)? That would perhaps explain the allocations in C++ being higher, particularly if they're using the "wrong" containers. Why not use unique_ptr or shared_ptr?
It is worth remembering that move constructors and move assignment operators only get used in very specific places, and you have to ensure that any move constructors you write yourself are explicitly noexcept (otherwise containers like std::vector will fall back to copying during reallocation).
> Every manual resource deallocation call is a potential bug. This is confirmed by the number of memory leaks as reported by Valgrind. There are more than 1000 of them, several dozen of which are marked as "definitely lost".
You can't compare performance if one program doesn't free memory, which obviously "saves" time. Valgrind can tell you where non-freed heap blocks were allocated, and a fix should not be complicated.
In general, comparing apples to apples, there is no way a C program will be larger than its equivalent C++ version, whether statically linked, with shared libraries included, or otherwise.
Today, sure, but there's no assurance that this will be true in the future. If more compiler-friendly extensions are added to C++ to help it generate tighter, more nimble machine code because it's given more leeway in optimizations, then the C++ code could be substantially smaller. C doesn't seem as interested in adopting some of the C++ paradigms that could make optimization better, tools like formalized iterators and such.
There's been various attempts at pre-compiling the headers over the years, but the results have always been, for various reasons, less than perfect.
If you ignore the protection mechanisms and the class hierarchy built into C++, then a C++ class is like a C struct that can contain function pointers. For many programs this is all that's needed, and the overhead involved in using such an approach to creating objects is obviously lower. So there's no question C will always be faster. It's only when you need protection and class hierarchies that C++ benefits you. That benefit is mainly one of better code organisation.
> a C++ class is like a C struct that can contain function pointers.
No, it's not. Calling an ordinary class member function in C++ has exactly the same overhead as calling a function in C. Even virtual functions in C++ are not the same as putting function pointers in a C struct (they live in a separate data structure called the vtable).
> the protection mechanisms and the class hierarchy
All C++ protection mechanisms occur at compile time and have no runtime overhead. Non-virtual inheritance hierarchies have the same overhead as C struct composition (because under the covers the memory layout is the same).
Hi Colin. This isn't an optimisation though, it's a guarantee. Member function calls are resolved statically at compile-time. Replacing indirect calls off a function pointer with direct calls (devirtualization) is an optimisation that applies to both languages equally and requires whole-program / link-time optimization.
Doesn't a vtable imply an extra level of indirection? You have to find where the vtable is in the object, then the function within the vtable, right? Is that not slower?
In common implementations the vtable pointer is always the first word inside the object. Given an arbitrary pointer to an object, the offset of this vtable pointer relative to what you're pointing to is always computed statically at compile time. Unless you're using multiple inheritance, this offset is usually zero because derived object pointers in a single-inheritance hierarchy actually always point to their base.
If you're using multiple inheritance then an object can have multiple vtable pointers, but again which one you need to use is known at compile-time based on which class the virtual function you're calling is declared within, and the type of pointer you have.
Once you have the vtable you then have to locate the function pointer for the function you're calling. Again, this is usually a compile-time constant offset from the start of the vtable. This ceases to be true when you have 'virtual inheritance' (not to be confused with virtual functions), when another indirection to find this function pointer is required.
You'll notice that the get_square() function, which returns a member function pointer to the virtual square function, doesn't even return any memory addresses, just metadata and an offset.
My point is simply this: adding protection mechanisms and inheritance to classes necessitates adding more complexity to the structure used to represent them (such as a vtable), which does add performance overhead. If you don't need those features, you can go leaner and faster with a C structure that includes function pointers to give you the basic packaging of data and functions that an object has.
That is absolutely false. C++ member functions are zero-cost abstractions, i.e. they have the same cost as any other function call. Member functions are _not_ function pointers that reside inside the struct. They don't take up space, and they don't need dereferencing to call. They are just "syntactic sugar" to group functions more logically.
> Member functions are _not_ function pointers that reside inside the struct.
...as long as they're not virtual functions, this is correct. Add a virtual table and this is less correct (but the optimizer may still make it correct if it can prove the types match).
Yes, that is correct, but then again virtual functions give you new functionality, namely dynamic dispatching. Stick with static inheritance and you won't have this overhead.
Except that the poster above you showed that the opposite was true. The exact same program was written multiple times with C structs and with C++ classes, and:
"Except for the non-inline unbundled monostate in C++, every non-inline C++ implementation outperformed every non-inline C implementation."
Protection implies a much more complex structure to represent an object, and class hierarchies and inheritance imply the need for a runtime. Both of these overheads come at a cost. It's the nature of the program you are writing that determines whether you will come out ahead. If you were writing a codec, say, you would not use C++.
Well, if you override the implementation of a function in a subclass, the runtime has to determine that and load it in at runtime, when you instantiate an object of the subclass. In C there is no runtime.
This is done at compile time. The call is indirect, which only means the call destination is decoupled from the generated calling code. This does not entail the runtime loading anything.
Yes, but if your design requires the virtual function then you'll be using a function pointer in the C implementation as well, which has the same indirection.
If your design requires inheritance or virtual functions, C++ is the right choice. But in many cases, when it doesn't, if you still use C++, you'll pay the price of an extra indirection and a much larger memory structure to hold your objects. If your function were processing the inner loop of a video codec, that would unnecessarily slow you down.
Unless you explicitly type `virtual`, your C++ classes will have the exact same overhead as C structs. Even with inheritance. The memory layout of
struct A { int x; };
struct B : A { int y; };
is the same as if you had written
struct B { int x; int y; };
Public/private/protected inheritance and access control do not add overhead. It's literally only if you opt in by typing `virtual` do you get class hierarchy overhead.
> Public/private/protected inheritance and access control do not add overhead
That's interesting. So does the compiler just put the functions in different parts of the vtable to remember the access control rules? There's no such thing as a free lunch, and you're adding information here; it has to be stored somewhere.
Access control rules are all checked at compile time. There's literally nothing to store. If you want proof, check the output of your compiler. The only thing a non-virtual struct/class might do is reorder member variables if they're of different access controls, but if you're just using a C-style struct but with private member variables and public non-virtual member functions, it has literally the same memory layout as it would in C.
You only pay for the indirection for virtual functions. You don't pay it simply for choosing C++. There is no "much larger memory structure" either. The vtable is per class. The per object cost is one pointer. C++ compilers are pretty smart.
> Converting an existing C program into C++ can yield programs that are as fast, have fewer dependencies and consume less memory. The downsides include a slightly bigger executable and slower compilation times.
My experience has been the complete opposite.