I'm going to be the first to point out the one major flaw in this comparison: "plain C using GLib" is not comparable to "C++ standard library only" --- what should be compared is "C++ standard library only" and "C standard library only". Reimplementing pkg-config in pure C without GLib would be necessary for that.
As for the "memory leaks" --- I haven't looked at the source, but something whose runtime is very short-lived, like pkg-config, may be very well justified in allocating and never freeing, letting the process exit itself be the "ultimate free". I've seen and done this many times myself.
I've seen projects that turned from simple and straightforward to buggy (and harder to debug), slow, and bloated because someone decided they wanted to "use C++" and would try to make use of as many "modern C++" features as they could.
Converting an existing C program into C++ can yield programs that are as fast, have fewer dependencies and consume less memory. The downsides include a slightly bigger executable and slower compilation times.
I have heard of calculating a "memory budget" and pre-allocating that, but calculating a "leak budget" and doubling that doesn't seem like hygienic programming.
We wouldn't want the programmer to spend too much money/time on fixing errors. After all, it's not like people would die if there's an error in the missile software: http://www.gao.gov/mobile/products/IMTEC-92-26
Not to mention they added even more HW to work around the leaks. No wonder those projects always run over budget.
>We wouldn't want the programmer to spend too much money/time on fixing errors. After all, it's not like people would die if there's an error in the missile software
When exactly did pkg-config become missile software?
As they say in meme-land, "that escalated quickly".
Here's a novel idea: how about the appropriate level of effort/time/YAGNI-stuff based on the domain?
Or do you write one-time scripts with MISRA rules?
The sub-thread I replied to was linking to an anecdote about missile software.
But I still think not freeing is pretty lame even in short-lived tools. If one is using a no-op deallocator at least the code is designed properly and could be repurposed.
The problem with counting leaks is that it counts memory you need until the very end of your program. There is no point calling free as you are exiting anyway, and I have seen programs where freeing everything took 0.5 seconds as the program was closing.
As someone whose startup was probably hindered by an almost religious adherence to testing with Valgrind (hardware was way too slow), I'd still say finding the real leaks and bugs is a lot easier when you don't have the noise of "expected" leaks.
> As for the "memory leaks" --- I haven't looked at the source, but something whose runtime is very short-lived, like pkg-config, may be very well justified in allocating and never freeing, letting the process exit itself be the "ultimate free". I've seen and done this many times myself.
This makes sense, and then somebody has a vision for a use case beyond the original imagination, goes to turn the code into a library, and spends countless hours smartening up lazy resource management. Whether the original author should be "more responsible" is open for debate, but I've personally run into the above situation, and only mention it for another perspective on The Life of Code.
That's the classic KISS/YAGNI vs. (not sure if there is an initialism for it) robust-extensible-modular-best-practices debate, over which there has been much bikeshedding from both sides.
>This makes sense, and then somebody has a vision for a use case beyond the original imagination, goes to turn the code into a library and spends countless hours smartening up lazy resource management.
That's the question. Don't you think it's easy to think of reasons, though? What if the person porting the code to a library was a later version of the original author?
> As for the "memory leaks" --- I haven't looked at the source, but something whose runtime is very short-lived, like pkg-config...
To be fair in the other direction, it's a minority of interesting software projects that don't care about memory leaks. Also, there are ways to "skip" freeing memory in C++ safely using appropriate allocators. In actuality, you'd have your allocator, in its destructor, clean up itself and all its objects at the same time. It's nearly the same performance without sacrificing correctness or lowering standards with respect to leaking memory.
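A minimal sketch of that allocator approach using C++17's `std::pmr` (the `count_words` function is just an invented example):

```cpp
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

// Everything is allocated from one arena and nothing is freed individually;
// the arena's destructor releases it all in one shot when we return.
std::size_t count_words(const std::string& text) {
    std::pmr::monotonic_buffer_resource arena;        // deallocate() is a no-op
    std::pmr::vector<std::pmr::string> words(&arena); // allocates from the arena
    std::pmr::string current(&arena);
    for (char c : text) {
        if (c == ' ') {
            if (!current.empty()) { words.push_back(current); current.clear(); }
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) words.push_back(current);
    return words.size();
}   // arena goes out of scope here: one bulk release, no per-object frees
```

You get nearly the skip-the-frees performance, but the code still expresses ownership correctly and could be reused in a long-lived process.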
"Using C++" does not mean "Using lots and lots of features of C++ just because". That some people do that is hardly a criticism of C++, more of the lousy programmers that write software like that.
Unfortunately it seems that the majority of C++ code out there is like that. It could be said that C++ makes it far easier than C to introduce unnecessary abstraction and indirection, without realising the true costs.
I've noticed that using a lower-level language, where abstractions have to be built explicitly, tends to cause one to rethink the problem and often come up with an even simpler and more efficient solution, by approaching it from another direction which using a higher-level language may not even allow.
An example of this I encountered several years ago was with several coworkers who were trying with utmost effort to optimise a piece of code which the profiler had indicated was taking a substantial amount of time --- and a lot of it consisted of memory allocation and copying. They tried lots of "classic" tricks like unrolling, inlining, even reorganising the layout of several classes in an attempt to be more cache-friendly. I looked at the algorithm and realised rather quickly that the code in question was not necessary at all; some trivial modifications to code elsewhere which was using it and deleting that code completely resulted in 30x faster performance and 1/10 memory usage. Due to their background, my coworkers were stuck in the mindset that it was necessary to perform all that convoluted processing, and neglected to see the bigger picture.
> Unfortunately it seems that the majority of C++ code out there is like that. It could be said that C++ makes it far easier than C to introduce unnecessary abstraction and indirection, without realising the true costs.
Citation needed? Better developers are more aware, in any language. There are some cases where idiomatic C++ may introduce more indirection over C (though I can't think of any); there are plenty where idiomatic C introduces more indirection than C++. However, with a little more effort and awareness, the faster and more maintainable solution is always accessible.
> I've noticed that using a lower-level language, where abstractions have to be built explicitly, tends to cause one to rethink the problem and often come up with an even simpler and more efficient solution, by approaching it from another direction which using a higher-level language may not even allow.
Having lots of developers re-implement many things mostly just results in much buggier code. Getting things exactly right is hard. Having a bigger standard library and safer abstractions is a huge edge.
I don't think your anecdote has anything to do with C vs C++. I think basically some negative experiences with so-so C++ devs has colored your thinking rather than technical reasons.
I've only written patches of C++ in a couple of tiny projects, and something I've had trouble with in both cases is trying to figure out the somewhat objectively "right" or "best" way to write it. It's not a language I'm fluent in (the vast majority of my experience has been in Objective-C and Swift) so I find myself spending a good chunk of time doing research trying to figure out what the most widely accepted/correct way to do [insert thing] is in C++.
So if what's written here is true, I may be unwittingly baking bad practices into my C++ knowledge as a direct result of trying to accomplish the exact opposite…
Which leads me to the question: what is the "right way"? I've seen highly vocal critics of writing C++ as "C with extras", so I assume some middleground is where I need to target?
I would read books like Scott Meyers' Effective Modern C++, watch cppcon talks. There's a lot of very high quality C++ content out there these days.
Generally there's a lot of emphasis on RAII, clear ownership semantics, leveraging more of the standard library as it's grown, using lambdas, avoiding shared mutable state, judicious but not excessive use of inheritance (in particular avoiding implementation inheritance), and encapsulation.
It's not so much a middle ground in the sense you are thinking. The people I'm talking about don't advocate developers, particularly non-experts, going crazy with templates. People do that on their own.
Lots of people who have produced super high quality, money-making software in C++ ignore almost everything advised by Stroustrup and other modern C++ advocates. Don't worry about it, just solve your own problems.
The right way is, to speak in philosophical terms, totally a pragmatic choice. If nothing in C++ is working for you, don't use it. If a really obscure technology is making you go really fast, use it.
The one and only thing that muddies this picture is other people. Once you are collaborating on a program (whether on the same team, through end-user code, or through a library or API call), all your tricks, preferences, and conventions are subject to other people's inept groping and misunderstanding. And that is where you get into standardized best practices. They are basically guaranteed not to actually be the best practice, but they're the ones you can compromise on.
My experience has been the opposite of yours. Freeing is always a good idea so you can use tools like Valgrind to find memory bugs without needing to sort through all the false positives. Your short-lived program today can become a service tomorrow. C++ is worth it just for RAII.
True, the defaults are pretty lame, but my point is if you don't free, and you start using data you intended to free, that's a memory bug valgrind won't be able to help you with unless you free your memory.
FWIW, this is in my bashrc along with a myriad of other programs that have lame defaults:
alias valgrind='valgrind --leak-check=full --show-reachable=yes --track-origins=yes --track-fds=yes --error-limit=no'
> As for the "memory leaks" --- I haven't looked at the source, but something whose runtime is very short-lived, like pkg-config, may be very well justified in allocating and never freeing, letting the process exit itself be the "ultimate free".
But sometimes the code grows and what was once a stand-alone executable is about to become a component in a larger executable. With C++ you get a correct component out of the box (if you use RAII consistently), but with C you have to audit all of the code and clean up the leaks.
Converting the code to pure C without GLib would increase the code size considerably, since GLib provides utilities, such as hashtables, trees, and better string handling, that the standard C++ library provides but the standard C library does not.
Never freeing is sloppy, it means resource handling was likely not thought through.
I would be concerned about non-memory resource handling in particular.
If one must use this "trick", it's better to think things through, add the appropriate release calls and then somehow replace the release function with a no-op.
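One way to do that swap (the `xyz_free` name and the `XYZ_EXIT_FREES` flag are invented for illustration): keep the release calls in the code so ownership stays documented, and compile them down to a no-op in the short-lived-tool build.

```cpp
#include <cstdlib>

static int g_release_calls = 0;  // only here to make the behaviour observable

#ifdef XYZ_EXIT_FREES
// Short-lived-tool build: deliberately leak; the OS reclaims it all at exit.
static inline void xyz_free(void*) { ++g_release_calls; }
#else
// Normal build: really free, so the code can be repurposed as a library.
static inline void xyz_free(void* p) { ++g_release_calls; std::free(p); }
#endif

static int demo() {
    void* p = std::malloc(64);
    xyz_free(p);  // the release point is still written down either way
    return g_release_calls;
}
```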
Anyway, you're welcome to take a C++ project and translate it to a fast C project with fewer dependencies and lower memory consumption. Then we can discuss facts instead of your personal opinions.
Never freeing is common practice in embedded systems where you should be pre-allocating all data (after worst-case analysis). This removes any possibility of fragmentation issues, etc. Many coding standards forbid dynamic allocation for embedded & real-time systems also for safety & reliability reasons.
It is not unusual to have allocate-only heaps for exactly these reasons.
I agree, but the C idiom is to just not call free, without pre-allocating anything. It's an optimisation to avoid waiting for the resources to be released, with the idea that they will be anyway when the process is killed.
The minor discussion on the C version's memory leaks reminded me of a neat trick. If you're developing a short lived application, like pkg-config, you can opt to never deallocate. i.e. leak everything. In lightweight, short lived applications there's usually not a lot of incentive to deallocate; your application will never use much memory anyway and the deallocations waste time.
You can think of it like treating C as a garbage collected language, except the garbage collection cycle occurs only once at the end of the program :P
It really can be an effective trick. Deallocation isn't free, and under certain loads can be quite expensive.
The 1000+ leaks in the C version might actually be what's giving it the slight run-time advantage.
I remember one of those sysadmin stories, where a multi-terabyte `cp` command had seemingly completed all of its work, but was sticking around for days; slowly and pointlessly free()ing the 17GB hash table of hardlinks that it had built up; when it could have just exited and let the OS reclaim the memory.
It's stop-the-world garbage collection, where collection only happens when the world ends.
Side note wrt deallocations being somewhat slow: could something like Boehm conservative GC speed that up, by grouping all the deallocations together, or by doing them on a separate thread?
Or take that even further: allocate a slab that's big enough to last the entire runtime and send malloc on vacation. It's worth repeating: with today's focus on web frameworks, cloud providers, and dogma, these ideas are slipping into obscurity, which is a shame given how beneficial they can be if your program fits the use case.
C++ actually supports this approach in the standard library and doing this is a really standard technique in some domains. That's what the allocator template parameter in vector and other containers is for.
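As a sketch of what that looks like, here's a deliberately minimal bump allocator over a fixed slab, plugged in through the allocator template parameter (no bounds checking, a single global slab; not production code):

```cpp
#include <cstddef>
#include <vector>

// One global slab; allocation is just an aligned pointer bump, and
// deallocation is a no-op because the slab outlives the containers.
struct Arena {
    static inline std::byte slab[1 << 16];
    static inline std::size_t used = 0;
};

template <class T>
struct SlabAlloc {
    using value_type = T;
    SlabAlloc() = default;
    template <class U> SlabAlloc(const SlabAlloc<U>&) {}

    T* allocate(std::size_t n) {
        Arena::used = (Arena::used + alignof(T) - 1) & ~(alignof(T) - 1);
        T* p = reinterpret_cast<T*>(Arena::slab + Arena::used);
        Arena::used += n * sizeof(T);  // no overflow check: this is a sketch
        return p;
    }
    void deallocate(T*, std::size_t) {}  // no-op by design
};
template <class T, class U>
bool operator==(const SlabAlloc<T>&, const SlabAlloc<U>&) { return true; }
template <class T, class U>
bool operator!=(const SlabAlloc<T>&, const SlabAlloc<U>&) { return false; }
```

Then `std::vector<int, SlabAlloc<int>> v;` behaves like any vector, but all its growth comes out of the slab and "freeing" is one bulk release when the slab goes away.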
And, instead of using pointers into said memory, use indices so that the slab can be reallocated, or just mmap more pages following the slab if you own the process address space. Also, align properly. And having guard pages is always nice. Freeing on exit too, one single deallocation is pretty cheap.
I've seen programs building ASTs with hundreds of millions of nodes, where all the nodes were allocated by a separate malloc call, and ref-counted... More than one-third of the startup time (which was counted in minutes) was calls to malloc and free. Some optimizations were made, but in the end we ended up reducing the size of the AST instead of fixing the allocations.
Last time I did any serious parser work I used a pool allocator so I could free all the nodes at once, so allocation was just a compare + increment operation. Although that was forced on me by the difficulties of error recovery in yacc.
  Language   Design      Implementation   Relative performance
  either     any         inline           1 (fastest)
  C++        polystate   non-inline       1.56 x fastest
  C++        bundled     non-inline       1.65 x fastest
  C          polystate   non-inline       1.70 x fastest
  C          bundled     non-inline       1.79 x fastest
  C++        unbundled   non-inline       1.82 x fastest
  C          unbundled   non-inline       1.95 x fastest
He furthermore argued that the biggest mistake C++ developers made, the one that killed the adoption of C++ by C programmers, was to diverge from the previous line of "C++ is a better C" to "if you're using C++ as a better C you're doing it wrong".
This is not in the sense of tossing away C coded programs wholesale and rewriting it in D, but incrementally using D here and there for parts of a C program. That way, you've always got a working, usable program.
Sure but there is just no reason at all to copy unbraced blocks. All you do is invite bugs for zero benefit.
Maybe there are bigger issues with C? But that's a different discussion. I want to know why you copied something as simultaneously horrendous and useless as unbraced blocks? If you just didn't think it through and that's the way languages syntactically similar to C have always done it, ok. I'm sure I've made worse mistakes. But please call it one way or the other.
> "if you're using C++ as a better C you're doing it wrong"
As far as correctness and safety goes, this is still true. It's difficult to scale systems-level programming to large teams. C++ gives the opportunity for more explicit semantics and more aggressive compile-time checks. C can scale well and can be used safely, but you need to do a lot more through convention (always call xyz_Create and xyz_Destroy in pairs!) and through runtime checks (calls to assert, unit testing).
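To make the contrast concrete (the `xyz_*` names are hypothetical): the C convention leans on caller discipline, while the C++ destructor is inserted by the compiler on every exit path.

```cpp
// C convention: the caller must pair create/destroy on every path,
// including early returns; only discipline enforces it.
struct xyz { int handle; };
xyz* xyz_create() { return new xyz{42}; }
void xyz_destroy(xyz* p) { delete p; }

// C++ RAII: the destructor is the "destroy" half, and the compiler
// runs it on every exit path automatically.
class Xyz {
public:
    Xyz() : handle_(42) {}
    ~Xyz() {}  // release the real resource here
    int handle() const { return handle_; }
private:
    int handle_;
};

int use_raii() {
    Xyz x;              // no matching "destroy" call to forget
    return x.handle();  // destructor runs here, even on exceptions
}
```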
D, Rust, OCaml, and a few other projects are interesting in this space since they provide some of the same benefits as C++ with respect to correctness and safety. Some are plausibly better in theory, though I'm not aware of huge, say, Rust projects that approach the size of huge C++ ones.
> D, Rust, OCaml, and a few other projects are interesting in this space since they provide some of the same benefits as C++ with respect to correctness and safety.
Can you name a correctness and safety benefit that C++ has that these programming languages do not?
I didn't mean to imply that C++ was safer somehow. I just meant that those languages are also competing in that feature space in a way that C doesn't.
And, on a pedantic level, C++ competitors can't provide exactly the same benefits of C++ because they took different approaches.
What's a key design difference among these languages? Well, C++ can mostly just #include a C header file and go with it. The other languages provide FFI mechanisms, but they each require declarations of the FFI to match the compiled C code. So theoretically there's a little more room for errors in that translation, though I doubt that's a big concern on the whole. Each of those languages has a more mature module system, which should more than make up for keeping FFI interfaces in sync with C headers.
Agreed.
It's the size and type of the project that determines the choice of language more than the newness of the language. C++ was designed to meet the requirements of certain types of projects that were coming into vogue at the time it was designed. It was not a replacement for C.
Some of the structural advantages of C++ over C can be achieved in C by using generative programming, for example, and building in automatic mechanisms to ensure there are no memory leaks. In other words, the C++ approach to structuring programs is not the only way to achieve the benefits that that structuring implies. It's just really easy to do it that way.
It's true that generative programming can make up for shortcomings of C. But you're pretty much writing in two languages at that point, C and whatever spec generated the rest of the C code. It's not an apples-to-apples comparison to C++.
Well, some parts of the generative programming could be built into a tool and you probably wouldn't have to rewrite a generative program each time. So it's not really writing in 2 languages but using code generation to assist the programming process to reduce potential for error as a more flexible alternative to creating fixed constructs in a purpose built language.
You can get 90% of that benefit by using snippets in your favorite editor. I guess you could consider that "code generation", but "generative programming" means, to me, "check in the specification, not the production code".
You're assuming generative programming is used to completely replace direct programming. This doesn't have to be the case. It could also be used merely in an assistive role, to supplement the ability to write code.
I'm not. That's what I'm referring to as "snippets", though other forms of scaffolding do apply. If your position is that people should use more sophisticated editing/authoring tools on a regular basis, that's not that controversial a statement.
I was just saying that full-blown code generation isn't merely writing in the same language but adopting a DSL as well, so we're not strictly comparing languages at that point.
Right. Code generation has typically been associated with DSLs.
Here's one way it could be done simply. Let's say you wanted to automate the process of memory allocation and deallocation. You would need a way to describe to the code generator the memory requirements of your structure. For that you would need a description outside of C, but that description could be embedded in the comments of your code, and your code generator could be designed to parse those comments to determine what needed to be done.
Knuth also came up with the idea of Literate Programming, in which the description of a program is embedded as TeX in the code. This could work in a similar way. So, while you would use a DSL, the description would be inline with your code, and the authoring process would be integrated rather than two-stream.
The aside about Rust is a total straw man. Obviously no one with even a modicum of knowledge of programming languages would think Rust is the only memory safe language. Googling the supposed quote also turns up no results but this post.
It is nonsensical to consider memory leaks as reported by Valgrind on a program that uses GLib. GLib allocates and builds a whole context system and never frees it, which confuses Valgrind. I am pretty certain almost all of the 'leaks' come from there; libglib should be put in a Valgrind suppression file.
It was a very bad choice to choose a program based on Glib for this kind of experiment.
I think what you have said here sums up the problem quite concisely. They might as well add core-foundation, qt-core, and stdlib based executables to the test for 'C' vs 'C++' to give a better cross-section.
FWIW, Donald Knuth was a proponent of using C over C++ at the time it first came out. He equated C++ with the use of frameworks in writing programs which he thought were a bad idea for the profession as it would dumb it down. C++ does make code reuse a lot easier.
As an example of C++ making code reuse a lot easier, consider the Windows platform from a developer's perspective. Before C++, there was this huge library called Win32, in C, that contained several hundred functions and data structures to access the services of the platform. Since it was not object oriented, there was a fat book by Charles Petzold, which was like a Bible for Windows programmers, that described how each of the functions related to the others, in what sequence to call them, and a bunch of stuff that was not even documented by Microsoft.
Once C++ came, there was a library called MFC which was object oriented and hence a lot better documented and organised and now there's .NET.
The organization of functions into objects makes it a lot easier to understand systems software, especially if it's very large. Also, the ability to subclass means you can take the base functionality of "template" classes provided by a library and subclass them to extend them with what you need. This was not as easy in C, where you had to rely on sample code for this purpose. The Petzold book had a ton of sample code.
Not really: with ABI issues and compiler incompatibilities, widely used C++ libs are either header-only or have an "extern C" version of the public API. I'd say C++ makes reuse much harder.
I'm no expert on C++ and I've been considering using it for several projects.
An important thing for my needs is being able to define classes in one shared object and create new subtypes of those classes in another, possibly defining overrides on virtual methods and such.
A good friend of mine has said similar things as you - that the ABI issue has not been a major obstacle for some time.
And yet, as much as I search, I still find the same-old advice: Don't use STL types in your interfaces or throw exceptions across module boundaries.
If all the compilers used for a given platform follow the same ABI, would using a separate and specific STL implementation (say, STLport) instead alleviate that particular issue?
Sorry if this question seems a bit rambly, but I'd really love to find out how to use C++ in the way I've mentioned.
If you want to distribute dynamic-link binaries for windows, use MSVC.
If you want to distribute dynamic-link binaries for OS X, use Xcode.
If you want to distribute dynamic-link binaries for linux, you are SOL regardless of whether or not you are using C++, but if you use the same compiler and flags that the latest LTS version of Ubuntu uses, then it will work on Ubuntu, and will be made to work anywhere that Steam works.
It used to be that there were at least two C++ compilers for each *nix (typically GNU and something cfront based), so ABI was a much bigger deal.
When "Modern C++ Design" came out, famously none of the compilers could correctly compile all of the sample code. Since then things are much better; not that all compilers are bug-free of course, but they are good enough that if you report a bug, you can expect it to be fixed.
[EDIT]
"Don't use STL Types in your interfaces" is not advice I've heard in like 15 years; I more often hear "If you're using a C array instead of a Vector, you're doing it wrong"
"Don't throw exceptions across module boundaries" seems similarly odd. Unless your constructors are inlined, no modern code-base will follow that rule because RAII relies so strongly on exceptions.
There are coding styles that are opposed to exceptions as part of an external interface, but that's due to exceptions not being checked as part of the type system, and is not what I would call a majority opinion.
To clarify "module boundaries", I mean "separate shared objects."
As for Linux, I'm not too concerned with creating a single binary that works for all distributions.
I'm more concerned with someone being able to build a set of shared libraries on their distribution of choice and those shared libraries being able to interact naturally regardless of which compiler s/he uses to build each of them.
Say, LibA is built using LLVM. LibB is built using G++ and LibC is built using ICC.
LibA defines several classes. LibB creates some subtypes. LibC instantiates types from both LibA and LibB.
All the functions present in LibA, LibB, LibC make use of STL types such as std::string, std::vector, etc. Some may throw exceptions, whatever.
With respect to MSVC, I've read that compatibility between Debug and Release builds is kind of suspect, especially if you're using STL types. Not to mention differences in MSVC version. Is this still a concern?
> I'm more concerned with someone being able to build a set of shared libraries on their distribution of choice and those shared libraries being able to interact naturally regardless of which compiler s/he uses to build each of them.
Sorry, but this is an unreasonable standard. Literally no language, including C, supports this. With C it only works inasmuch as the C compiler authors work really hard to make it work, and even then it sometimes breaks (if your compiler inlines a call to malloc, and you free that pointer in code compiled with a different C compiler that inlined a different malloc implementation, it can break horribly. Yes, I've seen this happen.)
Some languages support cross-version linking (or whatever the language's equivalent of "linking" is), but I'm not aware of any that specify a complete ABI for unrelated implementations to support. IPC libraries do typically support this though.
[edit]
I don't want to go on a shared-library rant, but I am fairly strongly opposed to them (except perhaps in cases like how nixos manages it). You can take a statically linked binary from 1997 and run it unmodified on your linux machine today. It is a virtual guarantee that any dynamically-linked binary more than 2 years old will not work correctly. Linus puts a huge amount of effort into backwards compatibility, and it is completely destroyed by dynamic linking.
> If you use the same compiler, ABI is a non-issue
And yet... microsoft releases a new compiler every two years or so, and not every library you use is going to update at the same time. This is a huge frustration for a lot of people.
I write C++ professionally and I've seen people waste weeks on these things. Most of the libraries we wrote had plain C interfaces, because being able to use other languages to call into the code was important, and C++ is a nightmare for that.
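A sketch of that plain-C boundary approach (the `mylib_greet` name and API are invented): internals use C++ freely, but only C types cross the boundary, so no name mangling, std::string layout, or exceptions leak to callers in other languages.

```cpp
#include <cstddef>
#include <string>

namespace impl {
    // Internals are free to use C++.
    std::string greet(const std::string& name) { return "hello, " + name; }
}

extern "C" {
    // The boundary exposes only C types; errors would be return codes,
    // never exceptions. (No cap == 0 handling: this is a sketch.)
    std::size_t mylib_greet(const char* name, char* out, std::size_t cap) {
        std::string s = impl::greet(name);
        std::size_t n = s.size() < cap - 1 ? s.size() : cap - 1;
        s.copy(out, n);
        out[n] = '\0';
        return n;
    }
}
```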
Yeah, MSVC breaking ABI is somewhat annoying, but I am also used to keeping the most recent half-dozen MSVC's installed.
VS 6.0 is getting very hard to source legally these days, and I wish MS made it easier to get.
As far as having high-level languages call directly into C++, yes that's quite a pain (nearly impossible without something like https://github.com/rpav/c2ffi). Note also that calling into non-C ABI functions in any language is hard (and most HLLs don't support anything like extern "C" to make it easy).
Partly this depends on the platform.
C++ is well supported on Microsoft's .NET platform where you can access all the functionality of the .NET libraries through C++.
STL, I guess, is more used on Linux.
I would advise against trying to use portable libraries and instead using libraries designed for the platform you are targeting.
Having said that, a good portable UI library is the open-source wxWidgets, which is accessible through C++ on OS X, Linux, and Windows.
1) Ever try to upgrade MSVC versions? It's always a huge problem if you're using libraries you don't have source for. Not to mention the 50 million linker issues if one library is linked statically and the rest dynamically. There are still people on something like MSVC 6 because of this.
2) Header-only libraries are horrible for compile times, especially heavily templated ones (and if you use C++ generics it basically has to be a header-only library). The reason Boost is banned from a lot of C++ projects isn't because the library is bad, it's because of compile time.
The C++ version uses many memory allocations. Using custom allocators in the C++ program would certainly cut down on the number of allocations. It would also be interesting to see whether doing so improved performance.
Similarly, it would be interesting to see if using the C++17 string_view (or the gsl version if C++17 isn't available to you) instead of `const string &` parameters affected performance.
Finally, I see that in most (all?) cases, objects are returned by value, not through reference parameters or pointers. It's interesting that that choice didn't compare poorly to the C implementation.
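For reference, the kind of change being suggested: a `const std::string&` parameter forces callers holding a `char*` (say, from a C API) to materialise a temporary `std::string`, which allocates, while `std::string_view` just wraps the existing bytes. A minimal sketch (the `has_prefix` function is an invented example):

```cpp
#include <string_view>

// Accepts std::string, string literals, and char* alike, without
// constructing a temporary std::string (no allocation at the call site).
bool has_prefix(std::string_view s, std::string_view prefix) {
    return s.substr(0, prefix.size()) == prefix;
}
```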
I knew C++ compilation was slow, but 30x slower? I'm sure compile-time memory usage will show a similar trend. It would be interesting if somebody could explain the reason for this disparity.
Depending on your dependencies and programming style it might be more or less, but it doesn't sound unreasonable.
I wrote the same program in two environments once: C++ with the stdlib and Boost, and C++ purely using Qt abstractions. The second one compiled in 1/5th of the time. I guess that's because with Qt the headers are small, since most implementations are hidden behind pointers (PIMPL), while with Boost you often pull in lots of code through headers and compile dozens of specializations of similar types to avoid indirection costs.
Not exactly a fair comparison when the C program is using 1.5MB of pre-compiled dependencies while the C++ program reimplements all of the functionality from the dependencies in the code being compiled.
It's a well known fact. Just by including some headers from the standard library the compiler has to go over huge chunks of library template code, and it usually needs quite a few complicated phases to slowly morph those templates into executable code.
The problem, by and large, is that C++ is heavily dependent on header files to implement the Standard Library. It's largely templated, which means there's no way to make a pre-compiled version, the code generated varies wildly depending on the types involved.
C has relatively simple header files: they usually contain structs, function signatures, and a bunch of macros. They're easy to parse and apply by comparison, and they don't tend to be as deeply nested.
If C++ ever adopts the Pascal-style "module" extensions that have been kicking around in various proposals compile times could shrink by several orders of magnitude.
> If C++ ever adopts the Pascal-style "module" extensions that have been kicking around in various proposals compile times could shrink by several orders of magnitude.
I'm skeptical. Modules don't avoid the need for template instantiation.
Template instantiation surely only requires type substitution and re-running some analysis, though. What makes C++ compilation slow is reparsing headers again and again and again, because the C preprocessor means that every time they are encountered they may have new semantics.
The motivation for modules in C++ is similar to that of developing a Binary AST for Javascript, discussed on HN recently.
> What makes C++ compilation slow is reparsing headers again and again and again, because the C preprocessor means that every time they are encountered they may have new semantics.
Really? And I thought that this is why C and C++ headers are typically wrapped in an #ifndef-#define-#endif block, so they only produce whitespace after preprocessing on second inclusion.
Yes, this happens inside a single translation unit (.cpp file). However if you have multiple .cpp files which include the same header file you have to reparse it each time. This is because before the inclusion of that header different #defines might have been set (e.g. through other headers), and therefore the content of the header file might be different.
It's not the second inclusion that's a problem but the way any given template might behave completely differently depending on what order they're loaded in.
That is, including a, b, c is not necessarily the same as a, c, b or b, a, c. This is not true with proper modules: they're order invariant, and as such you can make a ton of optimizations.
If modules ever happen, wow, C++ is going to feel like a whole new language. I remember large Pascal codebases compiling in as little time as it took to press the key, and this was in the era of computers with mere megabytes of memory.
That's probably not just modules though. Pascal, being one of Niklaus Wirth's languages, was specifically designed to be easy to compile, generally not even requiring building an AST (though a particular compiler still might, especially if it added extensions to the language).
In this case it's likely header size. In native C++ programs you usually have more complex syntax to process (e.g. namespaces are no longer global, so each identifier has to be resolved within a hierarchy of namespaces) and, sometimes, compile-time evaluation via templates.
"The C++ version has no pointers but instead uses value types. This means that all data is stored twice: once in the array and a second time in the hash table."
This is interesting. Are they using modern C++ and making use of moves and perfect forwarding? Or are they just throwing std::strings around and doing millions of copies in the process (remember that std::vector elements must be CopyConstructible, so copy constructors and operator= get invoked)? That would perhaps explain the allocations in C++ being higher, particularly if they're using the "wrong" containers. Why not use unique_ptr or shared_ptr?
It is worth remembering that move constructors and move assignment operators only get used in very specific places, and you have to ensure that any move constructors you write yourself are explicitly noexcept (otherwise containers like std::vector will fall back to copying during reallocation).
> Every manual resource deallocation call is a potential bug. This is confirmed by the number of memory leaks as reported by Valgrind. There are more than 1000 of them, several dozen of which are marked as "definitely lost".
You can't compare performance if one program doesn't free memory, which obviously "saves" time. Valgrind can tell you where non-freed heap blocks were allocated, and a fix should not be complicated.
In general, comparing apples to apples, there is no way a C program will be larger than its equivalent C++ version, whether statically linked, with shared libraries included, or otherwise.
Today, sure, but there's no assurance that this will be true in the future. If more compiler-friendly extensions are added to C++ to help it generate tighter, more nimble machine code because it's given more leeway in optimizations, then the C++ code could be substantially smaller. C doesn't seem as interested in adopting some of the C++ paradigms that could make optimization better, tools like formalized iterators and such.
There's been various attempts at pre-compiling the headers over the years, but the results have always been, for various reasons, less than perfect.
If you ignore the protection mechanisms and the class hierarchy built into C++, then a C++ class is like a C struct that can contain function pointers. For many programs this is all that's needed, and the overhead involved in using such an approach to creating objects is obviously lower. So there's no question C will always be faster. It's only when you need protection and class hierarchies that C++ benefits you. That benefit is mainly one of better code organisation.
> a C++ class is like a C struct that can contain function pointers.
No, it's not. Calling an ordinary class member function in C++ has exactly the same overhead as calling a function in C. Even virtual functions in C++ are not the same as putting function pointers in a C struct (they live in a separate data structure called the vtable).
> the protection mechanisms and the class hierarchy
All C++ protection mechanisms occur at compile time and have no runtime overhead. Non-virtual inheritance hierarchies have the same overhead as C struct composition (because under the covers the memory layout is the same).
Hi Colin. This isn't an optimisation though, it's a guarantee. Member function calls are resolved statically at compile-time. Replacing indirect calls off a function pointer with direct calls (devirtualization) is an optimisation that applies to both languages equally and requires whole-program / link-time optimization.
Doesn't a vtable imply an extra level of indirection? You have to find where the vtable is in the object, then the function within the vtable, right? Is that not slower?
In common implementations the vtable pointer is always the first word inside the object. Given an arbitrary pointer to an object, the offset of this vtable pointer relative to what you're pointing to is always computed statically at compile time. Unless you're using multiple inheritance, this offset is usually zero because derived object pointers in a single-inheritance hierarchy actually always point to their base.
If you're using multiple inheritance then an object can have multiple vtable pointers, but again which one you need to use is known at compile-time based on which class the virtual function you're calling is declared within, and the type of pointer you have.
Once you have the vtable you then have to locate the function pointer for the function you're calling. Again, this is usually a compile-time constant offset from the start of the vtable. This ceases to be true when you have 'virtual inheritance' (not to be confused with virtual functions), when another indirection to find this function pointer is required.
You'll notice that the get_square() function, which returns a member function pointer to the virtual square function, doesn't even return any memory addresses, just metadata and an offset.
My point is simply this: adding protection mechanisms and inheritance to classes necessitates adding more complexity to the structure used to represent them (such as a vtable), which does add performance overhead. If you don't need those features, you can go leaner and faster with a C structure that includes function pointers to give you the basic packaging of data and functions that an object has.
That is absolutely false. C++ member functions are zero-cost abstractions, i.e. they have the same cost as any other function call. Member functions are _not_ function pointers that reside inside the struct. They don't take up space, and they don't need dereferencing to call. They are just "syntactic sugar" to group functions more logically.
> Member functions are _not_ function pointers that reside inside the struct.
...as long as they're not virtual functions, this is correct. Add a virtual table and this is less correct (but the optimizer may still make it correct if it can prove the types match).
Yes, that is correct, but then again virtual functions give you new functionality, namely dynamic dispatching. Stick with static inheritance and you won't have this overhead.
Except that the poster above you showed that the opposite was true. The exact same program was written multiple times with C structs and with C++ classes, and:
"Except for the non-inline unbundled monostate in C++, every non-inline C++ implementation outperformed every non-inline C implementation."
Protection implies a much more complex structure to represent an object, and class hierarchies and inheritance imply the need for a runtime. Both of these overheads come at a cost. It's the nature of the program you are writing that determines whether you will come out ahead. If you were writing a codec, say, you would not use C++.
Well, if you override the implementation of a function in a subclass, the runtime has to determine that and load it in at runtime, when you instantiate an object of the subclass. In C there is no runtime.
This is done at compile time. The call is indirect, which only means the call destination is decoupled from the generated calling code. This does not entail the runtime loading anything.
Yes, but if your design requires the virtual function then you'll be using a function pointer in the C implementation as well, which has the same indirection.
If your design requires inheritance or virtual functions, C++ is the right choice. But in many cases, when it doesn't, if you still use C++, you'll pay the price of an extra indirection and a much larger memory structure to hold your objects. If your function were processing the inner loop of a video codec, that would unnecessarily slow you down.
Unless you explicitly type `virtual`, your C++ classes will have the exact same overhead as C structs. Even with inheritance. The memory layout of
struct A { int x; };
struct B : A { int y; };
is the same as if you had written
struct B { int x; int y; };
Public/private/protected inheritance and access control do not add overhead. It's literally only if you opt in by typing `virtual` do you get class hierarchy overhead.
> Public/private/protected inheritance and access control do not add overhead
That's interesting. So does the compiler just put the functions in different parts of the vtable to remember the access control rules? There's no such thing as a free lunch, and you're adding information here; it has to be stored somewhere.
Access control rules are all checked at compile time. There's literally nothing to store. If you want proof, check the output of your compiler. The only thing a non-virtual struct/class might do is reorder member variables if they're of different access controls, but if you're just using a C-style struct but with private member variables and public non-virtual member functions, it has literally the same memory layout as it would in C.
You only pay for the indirection for virtual functions. You don't pay it simply for choosing C++. There is no "much larger memory structure" either. The vtable is per class. The per object cost is one pointer. C++ compilers are pretty smart.
> Converting an existing C program into C++ can yield programs that are as fast, have fewer dependencies and consume less memory. The downsides include a slightly bigger executable and slower compilation times.
My experience has been the complete opposite.