
> while writing in Rust is usually less effort than the original unsafe code took to write, thanks to Rust's ergonomic improvements over C.

Absolutely, but do you think that rewriting billions of lines of C code in Rust is the best, most cost-effective way to increase confidence in them? Also, using verification tools can be done much more gradually (say, function by function) than rewriting in a new language (which may require a lot of ffi if done piecemeal), and there's no need to fear new bugs.

> Besides, most safe languages for embedded use (SPARK, etc) get their safety by disallowing dynamic allocation, which is great for avionics systems and not so great for libc or any consumer or server software.

True. I don't actually suggest using SCADE or SPARK for libc :) Astrée wouldn't work, either, as it also assumes no dynamic memory allocation. But something like SLAyer (an MSR tool that proves separation-logic properties through abstract interpretation) or Facebook's Infer should work. I would guess that making those tools work well on more and more real-world code would be a much smaller undertaking than rewriting in Rust and proving no new bugs have been introduced.

I'm generally skeptical of significant, real-world, bottom line benefits new languages provide, but Rust has convinced me that, at least potentially, it can have some very significant benefits over C. But a wholesale rewrite of a near infinite amount of code, much of which may work well enough, plus the risk of adding new bugs in the process, without the means to detect them??? That's like draining the ocean with a spoon in the hope of finding a sunken treasure. Software engineers often speak of using the best tool for the job; rewriting vast amounts of code (with very incomplete test suites) in a new language is certainly far from the best tool for the job of increasing confidence in its correctness (unless a particular piece of code is buggy beyond repair).

Same goes for physical infrastructure, BTW. If your infrastructure is old and you fear some bridges may collapse, you inspect them all. You rebuild the crumbling ones and strengthen those with cracks. What you don't do is rebuild all the bridges in the country. That's not only wasteful and comes at the expense of new bridges you could have built, but may end up increasing the danger in some cases, rather than decreasing it.




I would say this is more like a new building material coming along which allows for safer construction. No, you don't want to rebuild everything at once, and maybe you never want to rebuild everything, but sometimes old yet serviceable things may be considered for replacement because not only can they be made safer, but there may be other benefits at the same time (less room, more likely to withstand 1-in-X natural disasters, etc.).

Additionally, I never took the "rewrite everything" camp, as presented here, to really mean "everything" so much as "a stack from the bottom up, so there are Rust versions to use". Java has a lot of this, by virtue of being a contained system that doesn't like to share its memory, so there are Java versions of just about everything. Rust can use C with zero runtime cost; I think people just don't want Rust to be forced to use C for certain things.


A big part of writing about this was to share one potential strategy for reducing risk in rewrites by doing it function by function. That's a strategy which is available to Rust with zero overhead FFI, so I'm not sure why you're holding that up as a benefit exclusive to static analysis and verification tools.


Because -- and I may be wrong about this -- the FFI has zero runtime overhead, not zero coding effort overhead (i.e., FFI code isn't identical to native Rust code). So you'd want to rewrite again (the FFI code to native code) as you translate more functions.
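To make the coding-effort point concrete, here's a minimal sketch of the intermediate state of a piecemeal port (it uses libc's real `strlen` as the C side; the wrapper names are mine): the FFI declaration has to be kept in sync with the C header by hand, and every call site goes through `unsafe`, which is exactly the code you'd later rewrite again as native Rust.

```rust
// The hand-maintained mirror of the C declaration. If the C side changes,
// nothing checks this stays in sync.
extern "C" {
    fn strlen(s: *const std::os::raw::c_char) -> usize; // real libc function
}

// Calling into C: every call site needs `unsafe`, and the caller must
// uphold C's invariants (valid, NUL-terminated pointer).
fn len_via_ffi(s: &std::ffi::CStr) -> usize {
    unsafe { strlen(s.as_ptr()) }
}

// The same operation once the surrounding code is native Rust: no unsafe,
// no hand-maintained signature.
fn len_native(s: &str) -> usize {
    s.len()
}

fn main() {
    let c = std::ffi::CString::new("hello").unwrap();
    assert_eq!(len_via_ffi(&c), len_native("hello"));
    println!("both paths agree");
}
```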


How does that differ from what's been reported is extensive work to appease static analysis tools?


It differs by being a far more expensive (and probably less effective) way to achieve the goal. Those verification tools were designed to be spectacularly less expensive than a rewrite in a new language (which is often prohibitively expensive), or they wouldn't have been designed in the first place (they're all much newer than safe systems languages like Ada).

I said that using verification tools is expensive, but I'd be surprised if it's not at least one, if not two, orders of magnitude less expensive than a rewrite (I can think of few if any more expensive ways of increasing confidence in such large amounts of legacy code than rewriting it). It can provide similar -- sometimes worse, sometimes better -- guarantees, requires much less creativity, it automatically focuses you on places in the code that are likely to be buggy (or, alternatively, shows you which code is likely not buggy) and cannot introduce new bugs that you have absolutely no way of detecting before pushing the new code to production systems.

(BTW, the chief complaint against those tools -- and what makes them expensive to use -- is that they have too many false-positive reports of potential bugs, but even "too many" is far fewer than every line, which is what you'll need to consider and study when rewriting.)


> and cannot introduce new bugs that you have absolutely no way of detecting before pushing the new code to production systems.

Yes, they can. Remember the Debian OpenSSH vulnerability that arose from blindly making changes that Valgrind suggested?

> It can provide similar -- sometimes worse, sometimes better

Much worse, unless you're talking about systems that verify memory safety problems by disallowing dynamic allocation.

> Those verification tools were designed to be spectacularly less expensive than a rewrite in a new language (which is often prohibitively expensive), or they wouldn't have been designed in the first place (they're all much newer than safe systems languages like Ada).

I don't agree. Are you talking about things like Coverity? Coverity hasn't been effective at stemming the tide of use-after-free and whatnot.

I strongly encourage you to familiarize yourself with Rust's ownership and borrowing discipline and to understand how it prevents problems like use-after-free. You'll find that it's very hard to retrofit onto C, and that's why no practical static analyses do it.
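For readers unfamiliar with the discipline, here's a minimal sketch of the mechanism (the function name is mine): a reference's lifetime is tied to what it borrows from, so the use-after-free pattern is rejected at compile time rather than found by a tool after the fact.

```rust
// Returns a reference into `v`; the borrow checker ties the reference's
// lifetime to `v`, so `v` cannot be freed while the reference is live.
fn first(v: &Vec<i32>) -> &i32 {
    &v[0]
}

fn main() {
    let v = vec![1, 2, 3];
    let f = first(&v); // f borrows from v
    // drop(v);        // uncommenting this fails to compile:
                       //   "cannot move out of `v` because it is borrowed"
                       // -- the whole use-after-free class, caught statically.
    assert_eq!(*f, 1); // the borrow is still live here
    drop(v);           // fine: the borrow has ended
    println!("no dangling references possible");
}
```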


I agree that Rust is a superior solution if we're talking use-after-free detection. I also agree on tool immaturity. However, recent results might make you reconsider how practical those tools are.

http://research.microsoft.com/en-us/um/people/marron/selectp...

https://dl.acm.org/citation.cfm?id=2662394&dl=ACM&coll=DL&CF...

IMDEA's Undangle seems to have nailed it totally, with one other doing pretty well. You're the compiler expert, though. Does the Undangle paper seem to be missing anything as far as UAF detection, or is it as total as it appears?


What I think is relevant in this case is not whether an external tool for C can reach the same levels of safety presented by Rust, but whether an external, optional tool will ever be able to provide the same level of assurance, given that you can't ensure conformity on the (source) consumer end without getting all the same tools and rerunning them (which is the situation you have when it's built into the compiler).

Another way of stating this is "code verification tools for C are great! What level of market penetration do you think we can achieve? Oh. That's disappointing..." :/


That's a great point but a bit orthogonal. It's very important if one is aiming for mass adoption. Hence why C++ was built on C and Java combined a C-like syntax with a marketing technique I call "throw money at it." We still need to consider the techniques in isolation, though.

Here's why. One type of tool needs to either not force conformity or blend seamlessly into everyone's workflow, achieving wide adoption through awesome compatibility, safety, efficiency, and price tradeoffs. That's HARD. The other is just there for anyone who chooses to use it, recognizing the value of quality in a lifecycle. It just needs to be efficient, effective, have low to zero false positives, work with what they're using, be affordable, and ideally plug into an automated build process. These C or C++ techniques for safety largely fall into category number 2.

I totally agree with you on overall uptake potential. There's almost none. Most of the market ends up producing sub-optimal code with quality non-existent or as an after-thought. Those that do quality in general follow the leader on it. It's a rare organization or individual that's carefully researching tools, assessing their value, and incorporating everything usable into the workflow. Nothing I can really do about this except push solutions that are already mainstreaming with the right qualities. Rust, Go, and Visual Basic 6 [1] come to mind.

[1] When hell freezes over...


> Yes, they can. Remember the Debian OpenSSH vulnerability that arose from blindly making changes that Valgrind suggested?

Obviously, I'm not suggesting to blindly make any code changes. But do you honestly think the risk is anywhere at the same level as rewriting the whole thing? At least concentrate the effort where there likely is a problem rather than everywhere.

> Much worse, unless you're talking about systems that verify memory safety problems by disallowing dynamic allocation.

No. I am talking both about tools that are designed to uncover errors somewhat similar to Rust (SLAyer, Infer) and about tools like CUTE and American fuzzy lop, which have completely different strengths. The former may be much worse; the latter may be much better. But let me suggest this: how about using tools like CUTE or AFL to generate more tests first, just so you at least have a better idea of which code is more problematic than the rest? You're suggesting rebuilding every bridge before even conducting an inspection that would give you an idea of where most problems lie. Those tools have been proven effective (see a very partial list here: https://news.ycombinator.com/item?id=11891635) and they require orders of magnitude less work than a manual rewrite (less effort than even reading the code, let alone translating it). Shouldn't they at least guide the effort and help us concentrate our force where it would yield the biggest advantage?
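As a sketch of how little work such tools demand: AFL drives an ordinary binary by feeding it mutated inputs on stdin, so a harness is just a `main` that reads input and calls the code under test. The `parse_header` function below is a made-up stand-in for whatever legacy routine you want exercised; any panic or crash it triggers is a finding.

```rust
use std::io::Read;

// Hypothetical function under test: rejects malformed 4-byte headers
// ("HX" magic followed by a big-endian u16 length).
fn parse_header(buf: &[u8]) -> Result<u16, &'static str> {
    if buf.len() < 4 || &buf[0..2] != b"HX" {
        return Err("bad magic");
    }
    Ok(u16::from_be_bytes([buf[2], buf[3]]))
}

// AFL-style harness: read the (mutated) input from stdin and feed it to
// the target. afl-fuzz runs this binary in a loop with evolving inputs;
// writing the harness takes minutes, not a rewrite.
fn main() {
    let mut data = Vec::new();
    if std::io::stdin().read_to_end(&mut data).is_ok() {
        let _ = parse_header(&data); // a panic here is a bug report
    }
}
```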

> I strongly encourage you to familiarize yourself with Rust's ownership and borrowing discipline and to understand how it prevents problems like use-after-free. You'll find that it's very hard to retrofit onto C, and that's why no practical static analyses do it.

I am familiar with Rust's borrowing and I absolutely fucking love it! It is freakin' awesome! I recommend it to every C/C++ developer, and I sing Rust's praises whenever I have the chance; I really (really really) do. I think that Rust is definitely one of the most significant contributions to systems programming in the past couple of decades (if not more), and it is one of the most interesting (and potentially significantly useful) languages of any kind. I love and admire what you've done. Just the other day I told a friend that I'm sad I don't do as much low-level systems programming as I used to, just so that I could have a chance to work closely with Rust rather than merely admire it.

But that has little to do with what I'm saying. First, I am not saying that the two approaches would squash the same bugs (see above). Second, suppose you're right about your analysis of the quality of the result (even though I disagree, as above). The question is not which would yield a better result overall (and I might agree that a Rust translation would, if only by ensuring that every line of code out there gets read and analyzed) but whether the result is worth the undertaking, and whether the best, most cost-effective way to improve the safety/correctness of legacy code of unknown, probably widely uneven quality, with little to non-existent test coverage is by rewriting the whole thing whenever a new, safer language comes along. I must say that I am truly surprised that you suggest that to be the most worthwhile, economically reasonable approach. Rebuilding every bridge when a stronger building material is invented really is terrific (provided that the work is infallible); but is that the most economically sensible approach to improve infrastructure? Isn't the effort best spent elsewhere?


I think that since you're making concrete claims about actual cost and benefit, perhaps it's appropriate to suggest that you cite some sources (postmortems, white papers, journal articles) which have found these assertions to be true?


You're suggesting that rewriting billions of lines of code that probably contain lots of bugs somewhere every time a new safer language comes along is the most sensible investment of effort by the software industry in order to improve the quality of our software infrastructure, while I'm saying, wait, maybe we should use some inspection tools first before we decide on an approach, so at least we know where the worst of the problem is, and you think I'm the one who needs to cite sources? Alternatively, do you think that "rewrite all legacy code!" (an effort that would probably cost billions of dollars) does not need to be justified by concrete claims about actual cost and benefit?

But fine, here's a taste (look for what is required to use the tool as well as results):

* http://mir.cs.illinois.edu/marinov/publications/SenETAL05CUT...

* http://lcamtuf.coredump.cx/afl/ (see the "The bug-o-rama trophy case")

* http://research.microsoft.com/pubs/144848/2011%20CAV%20SLAye...

* https://code.facebook.com/posts/1648953042007882

* http://link.springer.com/chapter/10.1007/978-3-319-17524-9_1

Note that there are some strengths, some weaknesses, but I hope those sources make it clear that using those tools first is at least more sensible than "rewriting everything", as they require orders of magnitude less effort, and they do uncover bugs (you're free to guess what percentage of those bugs would be found in a rewrite, and whether the bug density demands a rewrite). We may decide that a rewrite of some libraries is the best approach once we have more information.

Now, do you have any sources suggesting that a complete rewrite is cost-effective?


The biggest problem I (not being the people you've been conversing with so far, for the most part) have with relying on static analysis tools on top of C, instead of a rewrite, is that the analysis tools are not enforceable. You can do all the analysis you want, and put in all the procedures you want, but in the end people are submitting changes to these libraries, and you need to make sure that these people are ensuring the tools get run prior to release. Additionally, from the perspective of a user of this code, my ability to look at the source and know that it's been scanned ranges from uncertain to non-existent. Switching to a compilation system where these constraints are enforced (such as a new language that enforces them) has a benefit in that respect. You could get a similar benefit by releasing a specialized C compiler that enforced some static analysis tools and required a very small amount of syntax change such that the code would not compile on a vanilla C compiler, and then I think there would be much less of a case for a rewrite in Rust from a safety point of view (even if they would handle slightly different aspects of safety, as you've pointed out).

That said, I'm not sure it makes sense to rewrite "everything" in Rust, but it would be nice to see one or two representatives of each thing in rust, so you can rely on the assurances Rust provides as much as possible as far down the stack as you can, if that's desired. In other words, I don't think we need every compression library ported to Rust, but I would like to see one or two choices for each type of compression, as applicable. An easy way to jump-start this is to take a library and port it.


1. We are not talking about code that is under heavy development. I specifically said that rewriting code may be sensible if you intend to make big changes.

2. Obviously, verification tools (and I'm not just talking about static analysis, but also whitebox fuzzing and concolic testing) do have their downside. The question is not which approach yields the best overall result, but which is more economically sensible given the goal.

> it would be nice to see one or two representatives of each thing in rust

Absolutely, as long as it's done while analyzing the cost vs. benefit. What isn't so nice is to have the software industry waste precious efforts rewriting instead of developing new software, without analyzing where a rewrite would yield the most benefit. Obviously, anyone is free to choose what they want to spend their time on, but I think that as an industry we should at least encourage projects with higher impact first.


> What isn't so nice is to have the software industry waste precious efforts rewriting instead of developing new software, without analyzing where a rewrite would yield the most benefit.

That's entirely subjective, based on how short-term or long-term your thinking is, and whether you think C will still be the dominant language for some things in 10 or 20 years.

I think C is a local maximum: it has already surpassed its limits as a language (we're now talking about extra-language tools to enforce non-language constraints) and is approaching the limit of its potential. We can put X time into making it slightly better and get closer to the asymptotic limit of C's capabilities, or we can spend some (small) multiple of that time to reimplement in something that is by its nature more capable immediately, and with possibly much more future potential.

Fixing up the old Pinto year after year because it still works and gets you from here to there always looks like a good decision when you are just looking at your monthly outlay. But when you consider environmental impact, safety to yourself and those around you, and comfort, spending that extra money to get the new Focus starts to look really appealing. Sure, look at all that money you're throwing out by buying a newer car, but it's a fallacy to think you got nothing for it, and would have been just as well served by the Pinto. You might have been, or tomorrow might have been the day you got in an accident. What would you rather be driving when that happens?

Note: This doesn't really mean that Rust needs to be the next thing, but I do think it's a fallacy to look purely at time spent when considering language rewrites, when there are so many important factors that that view ignores.


But you don't need to fix anything year after year. You only need to fix each bug once. If you decide to significantly modify the code or add new features -- by all means, rewrite if you think it's worth it. But correct untouched code is correct regardless of the language used to write it. And this is what we're talking about here: mostly untouched code. The limits of C as a language are entirely irrelevant, as the vast majority of this code interests us mostly in its machine representation. It's not under heavy development, so who cares what language it's written in if it's correct? And even if you do care, do you care enough to invest so much money to that particular end, money that could be invested elsewhere?

Do you honestly think that the benefits of rewriting mostly correct, mostly untouched code in a new language is the best use of those resources? Do you even think that's the most cost-effective way of catching bugs (if so, I don't think you're aware of the capabilities of modern verification tools)?

> or we can spend some (small) multiple of that time to reimplement in something that is by its nature slightly more capable immediately, and with possibly much more future potential.

I think we're talking about at least a 10x larger effort, possibly 100x, but we won't know until we measure the correctness of said code to the best of our abilities using tools that can approximately do that rather effectively.

> as we're now talking about extra-language tools to enforce non-language constraints

External tools are always necessary if you want to improve correctness, regardless of the language you use. Rust (and virtually every industry language out there) cannot prove most interesting functional properties, either.


> But you don't need to fix anything year after year. You only need to fix each bug once.

I was talking at a language meta-level, not for a particular program.

> But correct untouched code is correct regardless of the language used to write it.

A tautology, but largely irrelevant to the discussion; if we could easily determine correctness for anything other than the most trivial code, this would be a non-issue. In practice, C has ended up being a poor choice for attempting to write correct code, as evidenced by the last few decades.

> Do you honestly think that the benefits of rewriting mostly correct, mostly untouched code in a new language is the best use of those resources? Do you even think that's the most cost-effective way of catching bugs (if so, I don't think you're aware of the capabilities of modern verification tools)?

The good thing is it doesn't matter. If it's done a little, and seems effective and is prized by users (users being both developers and end users), then it will happen. There isn't some central authority of programming that we need to make a case to, there's a free market to feel around the edges of the problem, and emergent behavior to actually make a choice. The good thing is that I think on average it's a few magnitudes more likely to make a good choice than anyone here as to what's the more useful and effective way to move forward.

> I think we're talking about at least a 10x larger effort, possibly 100x, but we won't know until we measure the correctness of said code to the best of our abilities using tools that can approximately do that rather effectively.

I don't think it's anywhere near that high, given the level at which both languages operate and the similarity of their syntax. I think you would only approach those levels for more complex and esoteric pointer use, in small subsets of the code base. And we aren't measuring against rewriting in a language with the exact same capabilities; we're measuring against C with special analyzers and techniques used to reduce or remove problems, which I thought would more than likely require a rewrite of those sections anyway to see any benefit.

> External tools are always necessary if you want to improve correctness, regardless of the language you use. Rust (and virtually every industry language out there) cannot prove most interesting functional properties, either.

That is easily proven false. Rust can be seen as a language that does not enforce memory correctness with a memory correctness tool bundled into the compiler. All the tools for C could be bundled into the compiler and a new language called Cfoo could be defined by what that supports. Every external tool can be bundled into the compiler (or the runtime, as applicable, by the compiler including it in the generated binary or its existence as part of the VM).

In the end, the same arguments regarding correctness apply to varying degrees to things written in assembly, and even raw machine code. We could be applying varying levels of analysis techniques to our assembly at this point, but a shift happened in the past when people saw languages that provided enough benefits that they saw the point of switching. That we mostly settled on C and haven't moved past it yet (depending on how much you consider C++ a C, and how incremental you see the changes) is largely an artifact of momentum. I don't think we should be discouraging people from trying to make that jump out of C for our infrastructure. I think that's the equivalent of you or me sitting in our comfortable homes and offices, carpeted, heated, sturdily built out of wood, steel and concrete, and noting that yes, while the majority of people either work or live in mud huts, it's a waste of resources for them to attempt to replace them with more modern housing when they can just retrofit their mud huts with a heater and portable stove and be just as "serviceable". The best people to assess whether it's worthwhile are those doing the work, not you or I.


> Rust can be seen as a language that does not enforce memory correctness with a memory correctness tool bundled into the compiler.

What you call "memory correctness" is extremely valuable, but isn't correctness (in the software verification sense). Correctness is an external property of an algorithm. There are very few languages that can even express correctness properties. Correctness means something like: "every request will retrieve the relevant user data"; not: "the request will not fail due to a dangling pointer reference".
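A minimal illustration of that distinction (the `lookup_user` function and its bug are invented for the example): the code below violates no memory-safety rule the compiler enforces, yet it fails exactly the kind of correctness property described above.

```rust
use std::collections::HashMap;

// Memory-safe in Rust's sense: no dangling pointers, no data races.
// Functionally incorrect: an off-by-one on the key means every request
// "succeeds" with the wrong user's data. No borrow checker catches this;
// only a specification of what the function *should* do could.
fn lookup_user(db: &HashMap<u32, String>, id: u32) -> Option<&String> {
    db.get(&(id + 1)) // bug: wrong key, but perfectly memory-safe
}

fn main() {
    let mut db = HashMap::new();
    db.insert(1, "alice".to_string());
    db.insert(2, "bob".to_string());
    // Asks for user 1, gets user 2's data: a correctness bug, not a safety bug.
    assert_eq!(lookup_user(&db, 1), Some(&"bob".to_string()));
    println!("memory-safe, yet wrong");
}
```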

> I don't think we should be discouraging people from trying to make that jump out of C for our infrastructure

But I'm not discouraging people to move out of the C infrastructure. I want people to write new infrastructure. I just think it's a waste of effort to rewrite old infrastructure in new languages. To use your analogy, people want to rebuild the same mud huts just out of better mud. I say to them: what do you want? Do you want better mud huts? Then reinforce what you have; it will be a lot cheaper. But if you want to build something from scratch, why not build something new? Why spend the development budget on what is essentially maintenance? I don't know if that would be worth it, but at least it seems more worthwhile than re-implementing 1960-70s design with 201x materials, when we can make sure those mud huts stand long enough until we do come up with something new.


> Correctness is an external property of an algorithm.

While true (depending on what you think I meant by correct in that context), it's largely irrelevant to my point. Any external tooling can be merged with the compiler or VM.

> I just think it's a waste of effort to rewrite old infrastructure in new languages.

As you've stated multiple times here. Can I get a few examples of what you mean? I suspect there's a mismatch in our interpretation of the meaning of that statement.

Where does taking a libc implementation and converting with light refactoring fall, given there exists no other implementation in rust?

What about a new libc implementation from scratch in rust, using rust, given that there exist (non-rust) libc implementations?

What about a new libc implementation from scratch in rust, when there exists another rust libc implementation that isn't just a conversion?

What about an SSL library conversion (OpenSSL or NSS)?

What about a new implementation of SSL that conforms to OpenSSL's API?

What about a new implementation of SSL functions that has a completely new API to achieve the same operations?


> Any external tooling can be merged with the compiler or VM.

Not so simple. There is a debate whether it's better to specify correctness in the language or externally.

> Where does taking a libc implementation and converting with light refactoring fall, given there exists no other implementation in rust?

Probably not worth it. But how about writing a Rust standard library in Rust that doesn't implement libc's API?

> What about a new implementation of SSL that conforms to OpenSSL's API?

May be worth it, given OpenSSL's state and importance.

> What about a new implementation of SSL functions that has a completely new API to achieve the same operations?

Go for it!


> Not so simple. There is a debate whether it's better to specify correctness in the language or externally.

What's a benefit of having an external tool for that compared to something built in? Nothing comes to my mind that wouldn't be better handled as a per-location way to turn specific rules on and off (similar to Rust's unsafe blocks).
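For comparison, here's a minimal sketch of how Rust already scopes its rules per location (the `read_via_raw` name is mine): a lint toggles per item, and `unsafe` opts out of specific checks for one audited block while everything around it stays fully checked.

```rust
// A lint can be switched off for exactly one item rather than globally,
// similar in spirit to per-location rule toggles in an analyzer's config.
#[allow(dead_code)]
fn unused_helper() {}

// The raw-pointer dereference requires `unsafe`; the opt-out is confined
// to this one block, so the rest of the program keeps full checking.
fn read_via_raw(x: &u32) -> u32 {
    let p = x as *const u32;
    unsafe { *p }
}

fn main() {
    assert_eq!(read_via_raw(&42), 42);
    println!("checked everywhere except one audited block");
}
```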

> Probably not worth it. But how about writing a Rust standard library in Rust that doesn't implement libc's API?

Well, yeah, that would be a good thing, but it's largely irrelevant to my point. In this case libc is just a wildly popular library we are considering presenting a new version of (whether it be largely a "fork" and refactor into rust of an existing one or a new implementation).

> May be worth it, given OpenSSL's state and importance.

Then I don't really think we are in disagreement on the "rewrite everything in rust" stance, as much as how we interpret that statement is different. I take it as far less literal, and more of a shorthand for "let's get rust implementations of critical infrastructure built so we can have some assurance a certain class of bugs can be close to eliminated. If the initial version of that is in some cases a conversion, that's not optimal, but it does provide some benefit, increase the corpus of rust work to refer to, prove rust's capability in the problem space, and provide a springboard for future projects".

That's a far cry from "let's rewrite every line of billions of lines of C code into rust because we want to rewrite it all".


> What's a benefit of having an external tool for that compared to something built in?

Because 1/ building full specification power into the language may make it unnecessarily complex, and 2/ software verification and PL design often progress at different paces, so it's better to keep them orthogonal.

It is unclear how such a language should work, and it's not clear that anyone has come up with a decent idea yet. For example, the general dependent-type approach used by Coq and Agda (and Idris and ATS) doesn't seem to work well so far for software development. So perhaps theoretically the two may one day be combined, but we don't know how to do it just yet. You could call it an open research question.


> 1/ building full specification power into the language may make it unnecessarily complex

In that case, I'm not sure how you square that with the school of thought that you can use these external tools to solve the problems of C, or that it's at least more cost-effective than something new, like Rust. If it's unnecessarily complex, does it matter whether it's compiler+external tool or compiler+integrated tool? I think the same argument cuts both ways, except that in C it just means the tool won't be used, so the software will be of lower quality.

> 2/ software verification and PL design often progress at different paces, so it's better to have them orthogonal.

Having some verification integrated does not preclude further verification techniques from being developed. Types are a verification technique, and they exist in both C and Rust (at different levels). I'm not sure why it's okay for C to have some included verification, but not for languages to evolve further in this vein.


> If it's unnecessarily complex, does it matter if it's compiler+external tool or compiler+integrated tool? I think the same argument cuts both ways, except that in C it just means the tool won't be used, so the software will be of lower quality.

I'm not arguing in favor of one approach over another. My original point in this thread was solely about legacy codebases that are well established and no longer under active development, hence your arguments about enforcement do not apply. Now I'm talking in general. I don't know what you mean by an integrated tool, but there is a difference between making a language able to specify programs (like Idris), which may make it very difficult to use, and using a language that is relatively easy to use plus a specification language that's also easy to use.

> I'm not sure why it's okay for C to have some included verification, but not for languages to evolve further in this vein.

Again, my original point was only about codebases that aren't heavily developed, like libc. A language like Rust should and will have external verification tools, too, because its type system isn't powerful enough to specify correctness, and that's a good thing (because we don't yet know how to build friendly type systems that are capable of that). All I said was that if the codebase is huge, and if it is established and largely correct, and if it is no longer being heavily maintained, and if the language has good verification tools, and if there are no tests that will catch new bugs you will introduce in the translation, then it is more effective and far cheaper to use those tools than to rewrite in another language.


I understand your original point (though I still don't agree entirely); I was addressing some topics you brought up as tangents. I understand a language will always develop external tools if it's used enough, but I don't think that precludes migrating tools that have proved themselves into a language if the trade-off is clearly a good one, or at least a good one for a subset of problems (I think Rust fits this description).

To take a different tack on why I think it's not a waste to focus on rewrites of well established, mature code bases, consider the following: Rust is young. There aren't a lot of projects out there using it (comparatively), and almost none in certain sub-areas where it is supposedly a good fit (low level systems programs). A good way to find pitfalls, develop best practices and idioms, and find the pain points of the language is to use it to reimplement a well known, mature, stable project and see how it turns out.

You can say "sure, but we shouldn't focus on just that!", which is true, but also a non sequitur, as I'm not sure anyone is actually doing so, or even suggesting it be done exclusively that way. I know what you are arguing; I'm just not sure why you feel the need to argue it, since I don't see anyone actually holding the position opposite yours (the "we need to rebuild every project in Rust" view).


> but I don't think that precludes migrating tools that have proved themselves into a language if the trade-off is clearly a good one, or at least a good one for a subset of problems

Sure.

> A good way to find pitfalls, develop best practices and idioms, and find the pain points of the language is to use it to reimplement a well known, mature, stable project and see how it turns out.

Sure. I said that if the goal is to experiment with Rust, then rewriting some old code is a good exercise. But if the goal is to improve the quality of ~50MLOC, there are cheaper, less risky approaches.


> You can do all the analysis you want, and put in all the procedures you want, but in the end people are submitting changes to these libraries, and you need to make sure that these people are ensuring the tools get run prior to release. Additionally

This sounds like a problem that's easily solved with tooling, including continuous integration and mandatory pre-release testing.


Tooling is specifically not the problem. Since you can't enforce that clients run the tooling, you can't ensure the code is actually conformant. You can put policies and tooling in place so that it should be, but there's no way to guarantee it. If you're both the developer and the user, sure, you can get pretty close, but what about an open source project that's freely available? You aren't going to get anywhere close to making sure the tools are run for most compiles, and if you can't ensure that, you can't really protect against someone subverting your tooling, on purpose or by accident, and getting non-conforming code into the repo.


> You're suggesting that rewriting billions of lines of code that probably contain lots of bugs somewhere every time a new safer language comes along is the most sensible investment of effort by the software industry in order to improve the quality of our software infrastructure

From my own post:

> ... there are a bunch of calls to just rewrite the world in Rust, but even given unlimited resources, is it actually a reasonable thing to do? How well can Rust actually replace C in C’s native domains? What are best practices for managing the risks inherent in a project like that (injecting new bugs/vulnerabilities, missing important edge case functionality, etc.)?

> I don’t really know good answers to these questions...

So I'm not sure exactly where you're getting that argument. This also isn't "every time a new safer language comes along": Rust (AFAIK) is one of very few languages that are generally usable, memory safe, and without garbage collection.

> ... do you think that "rewrite all legacy code!" (an effort that would probably cost billions of dollars) does not need to be justified by concrete claims about actual cost and benefit?

Absolutely not. But I don't see anywhere that I or others are making that claim. Part of what interests me here is to explore possible avenues of these rewrites.

> so at least we know where the worst of the problem is

I think that CVEs already give us a pretty good idea of places we might start if we were to seriously undertake this endeavor (as opposed to the hobbyist exploration I've started).

> But fine, here's a taste (look for what is required to use the tool as well as results): ...

Some of these look quite promising (and I suspect most on HN are familiar these days with AFL), but the only one that purports to present a cost/benefit analysis or postmortem requires paying $30 for the chapter. I'm not saying that analysis tools are a bad thing; I would argue they are badly needed, because no matter how much we want to, we could never rewrite "everything" in Rust. I'm not saying that they aren't useful, or that they don't improve existing projects. I'm just saying that I don't yet see evidence that they are always strictly more cost-effective for the needed improvements than a safe-language rewrite. I just don't know. I think the software community barely knows how to properly quantify these things, so it's no surprise that there aren't any good metrics for making these decisions.

> Now, do you have any sources suggesting that a complete rewrite is cost-effective?

It's a small part of what I'm investigating with this project, specifically with regard to Rust. I think that looking into techniques for managing risk in any project (whether the changes happen due to static-analysis warnings or due to a rewrite) is useful.

Although, regarding Rust, one company (MaidSafe; they talk about it here: https://www.youtube.com/watch?v=t8z5rA3A1RA) has already reported very promising results from a C++ to Rust rewrite. Dropbox rewrote a core piece of its Magic Pocket infrastructure from Go to Rust to get a handle on memory usage and has also generally reported positive results (Wired article: http://www.wired.com/2016/03/epic-story-dropboxs-exodus-amaz..., subsequent HN conversation with Dropbox engineers: https://news.ycombinator.com/item?id=11282948).

All that said, I wouldn't generally advocate for complete ground-up rewrites without an incremental plan. Which is why I'm experimenting with ways to do so successfully and ergonomically between C and Rust.
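As a sketch of what one incremental step might look like (the function name and contract here are invented for illustration, not taken from any real libc): a single C function is reimplemented in Rust and exported with the C ABI, so the surrounding C code links against it unchanged.

```rust
// Sketch of one incremental C-to-Rust migration step: reimplement a
// single C function in Rust and export it with the C ABI so the rest
// of the C codebase links against it unchanged. The name and contract
// here are hypothetical examples.
#[no_mangle]
pub extern "C" fn checked_strlen(s: *const u8, max: usize) -> usize {
    if s.is_null() {
        return 0;
    }
    // Walk at most `max` bytes, so a missing NUL can't run off the end.
    let mut len = 0;
    while len < max {
        // SAFETY: the caller guarantees `s` points to at least `max`
        // readable bytes -- the same implicit contract the C version had,
        // now confined to one audited unsafe block.
        if unsafe { *s.add(len) } == 0 {
            break;
        }
        len += 1;
    }
    len
}

fn main() {
    let buf = b"hello\0world";
    assert_eq!(checked_strlen(buf.as_ptr(), buf.len()), 5);
    println!("ok");
}
```

Built as a `staticlib` or `cdylib`, a function like this can replace its C counterpart one symbol at a time, keeping the rest of the build untouched while each step stays testable against the old behavior.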


> From my own post: ...

I acknowledged what you wrote -- and emphatically agreed with it -- in my original comment. I'm not sure what we're arguing about here (I guess I attributed to you a position that disagrees with what I had said).

> Rust (AFAIK) is one of very few languages which is generally usable, memory safe, and without a garbage collection.

Similar arguments could have been (and maybe were) made when Ada was introduced, and will be made again if and when ATS or Idris are ever production-quality. Every good new language -- like Rust and like Ada (though I have serious doubts about Idris/ATS) -- improves on its predecessors. But being better -- even much better -- does not imply that a rewrite of legacy code that is not undergoing regular maintenance is the best possible use of resources (even if it were the gold standard, which I'm not sure it is, considering the risks involved on top of the effort). I'm convinced that Rust is enough of an improvement over C to warrant the effort of switching languages for new code in many, and maybe even most, C shops. But to rewrite legacy code that is not undergoing regular improvement?

> I think that CVE's already give us a pretty good idea of places we might start if we were to seriously undertake this endeavor

Great, then we're in full agreement. My only warning was about deciding to rewrite before weighing alternatives and without having a good handle on precisely where the effort would yield the greatest benefit.

> I'm just saying that I don't yet see evidence that they are strictly always more cost-effective for needed improvements than a safe-language rewrite.

First you need to know what those needed improvements are. Second, some of those tools require very little effort as opposed to a rewrite, which requires a lot of effort, as you pointed out so well. Wouldn't giving them a try first be a more sensible thing to do (if the goal is to improve the quality of legacy code, obviously not if the goal is an experiment in Rust)?

> Although, regarding Rust...

Both of these concern code that is under heavy active development, something that I specifically said may warrant a rewrite (hopefully after weighing alternatives).

> All that said, I wouldn't generally advocate for complete ground-up rewrites without an incremental plan. Which is why I'm experimenting with ways to do so successfully and ergonomically between C and Rust.

Excellent, which is why I agreed with your post and said so. If you want to make your interesting work more impactful on the industry, please keep track of every legacy bug you find in the process, as well as the number of hours you work and how that time breaks down into tasks (learning, translating, etc.). Such a report could be invaluable.



