I don't see any rationale or explanation of the thinking. Is it purely an exercise? Exploration? Is there some algorithm space in which it has an advantage over binary?
Is there a compiler?
How does it compare on Dhrystone or Coremark per LUT compared to a RISC-V core of similar size on the same FPGA?
There are many reasons.
The main one is that no architecture exists that models such a complex ternary processor.
At most, there are "on paper" implementations of much less complex architectures; no one has addressed the problem at this level.
Having a complete architecture and its hardware implementation now allows us to start developing software on something other than an emulator.
> I don't see any rationale or explanation of the thinking. Is it purely an exercise? Exploration? Is there some algorithm space in which it has an advantage over binary?
No, it's not just an exercise: it's a processor that's available now, both as the current hardware implementation (on FPGA) and as the Verilog/VHDL description for implementation on other targets (ASIC?), as well as the specifications, made available under license.
> Is there a compiler?
Hmm, I did mention this in the paper: a working cross-assembler exists (obviously), and a high-level language based on Rust is being designed/built.
> How does it compare on Dhrystone or Coremark per LUT compared to a RISC-V core of similar size on the same FPGA?
This is an interesting question; the answer is: we don't have such numbers yet.
We intend to provide comparative tests of this type soon.
The "G" extension, covering everything you want for running shrink-wrapped binaries on a standard OS, has been there since the May 7 2014 "User Level ISA, Version 2.0", which is before RISC-V started to be promoted outside of Berkeley, e.g. at Hot Chips 26 in August 2014 and the first RISC-V workshop in January 2015 in Monterey.
The name "G" has (along with the C extension) now morphed into "RVA20", which led to "RVA22" and "RVA23", but the principle is unchanged.
"An integer base plus these four standard extensions (“IMAFD”) is given the abbreviation “G” and provides a general-purpose scalar instruction set. RV32G and RV64G are currently the default target of our compiler toolchains."
> RISC-V hardware with slow misaligned mem ops does exist to non-insignificant extent
Only U74 and P550, old RV64GC CPUs.
SiFive's RVA23 cores have fast misaligned accesses, as do all THead and SpacemiT cores.
I can't imagine that all the Tenstorrent and Ventana and so forth people doing massively OoO 8-wide cores won't also have fast misaligned accesses.
As a previous poster said: if you're targeting RVA23 then just assume misaligned is fast and if someone one day makes one that isn't then sucks to be them.
P550 is, like, what, only a year old? I suppose there has been some laughing at it at least.
Also Kendryte K230 / C908, but only on vector mem ops, which adds a whole other mess onto this.
I'd hope all the massive OoO cores will have fast misaligned mem ops; anything else would immediately cause infinite pain for decades.
But of course there'll be plenty of RVA23 hardware that's much smaller eventually too, once it becomes a general expectation instead of "cool thing for the very-top-end to have".
I do agree that it'd be reasonable to just assume fast misaligned ops, but for whatever reason gcc and clang just don't, and that's what we have for defaults.
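For what it's worth, the portable way to express a possibly-misaligned access is the `memcpy` idiom, and the compiler tuning being discussed is exactly what decides how it lowers. A sketch in C (the helper name is mine):

```c
#include <stdint.h>
#include <string.h>

/* Portable unaligned 32-bit load: the compiler lowers the memcpy to a
 * single (possibly misaligned) load on targets whose tuning assumes
 * misaligned accesses are fast, or to byte loads plus shifts where the
 * defaults are conservative -- which is what gcc and clang currently
 * assume for generic RISC-V targets. */
static uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Whether that compiles to one `lw` or four `lbu`s plus shifts is precisely the heuristic choice in question.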
It has taken a while for this core to appear in an SoC suitable for SBCs; Intel was originally announced as doing that, and got as far as showing a working SoC/board at the Intel Innovation 2022 event in September 2022.
Someone who attended that event was able to download the source code for my primes benchmark and compile and run it, at the show, and was kind enough to send me the results. They were fine.
For reasons known only to Intel, they subsequently cancelled mass production of the chip.
ESWIN stepped up and made the EIC7700X, as used in the Milk-V Megrez and SiFive HiFive Premier P550, which did indeed ship just over a year ago.
But technically we could have had boards with the Intel chip three years ago.
Heck we should have had the far better/faster Milk-V Oasis with the P670 core (and 16 of them!) two years ago. Again, that was business/politics that prevented it, not technology.
> No, it was released to customers in June 2021, almost five years ago.
Ah, okay. (still, like, at least a couple decades newer than the last x86-64 chip with slow unaligned mem ops, if such ever existed at all? Haven't heard of / can't find anything saying any aarch64 ever had problems with them either, so still much worse for the RISC-V side).
Well, I suppose we can hope that business/politics messes will never happen again and won't affect anything RVA23.
> I do agree that it'd be reasonable to just assume fast misaligned ops, but for whatever reason gcc and clang just don't, and that's what we have for defaults.
This very much has a "for now" on it. Once there is actually widespread hardware with the feature, I would be very surprised if the compilers don't update their heuristics (at least for RVA23 chips).
Indeed, we shall hope the heuristics update; but of course if no compilers emit misaligned accesses, hardware has no reason to actually bother making them fast, so it's primed for going wrong.
Hardware devs have traditionally been pretty good at helping the compiler teams with things like this (because it's a lot cheaper to improve the compiler than your chip).
It's a good solid reliable board, but over three years old at this point (in a fast-moving industry) and the maximum 8 GB RAM is quite challenging for some builds.
Binutils is fine, but building recent versions of gcc links four binaries at the same time, with each link using 4 GB RAM. I've found this fails on my 16 GB P550 Megrez with swap disabled, but works quickly and uses maybe 50 or 100 MB of swap if I enable it.
On the VisionFive 2 you'd need to use `-j1` (or `-j2` with swap enabled), which will roughly double or quadruple the build time.
Or use a better linker than `ld`.
At least the LLVM build system lets you set the number of parallel link jobs separately to the number of C/C++ jobs.
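For reference, the knob in question is `LLVM_PARALLEL_LINK_JOBS`, a real LLVM CMake option (it requires the Ninja generator); the configure line below is a sketch, with the path being mine:

```shell
# Configure an LLVM build so only one link job runs at a time, while
# compile jobs still use all cores. This keeps peak RAM bounded on
# small-memory boards like an 8 GB VisionFive 2.
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_PARALLEL_LINK_JOBS=1 \
      ../llvm
```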
> I've found this fails on my 16 GB P550 Megrez with swap disabled but works quickly and uses maybe 50 or 100 MB of swap if I enable it.
I see, I don't have a Megrez at my desk, only in the build system. I only have P550 as my "workhorse".
PS: I made a typo above -- the P550 I was referring to was the SiFive "HiFive Premier P550". But based on your HN profile text, you must've guessed as much :)
Out of interest I tried running my Primes benchmark [1] on the x86_64 and x86 Alpine images and the riscv64 Buildroot image, all in Chrome on an M1 Mac Mini. All results are from a 2nd run, so that all needed code is already cached locally.
x86_64:
localhost:~# time gcc -O primes.c -o primes
real 0m 3.18s
user 0m 1.30s
sys 0m 1.47s
localhost:~# time ./primes
Starting run
3713160 primes found in 456995 ms
245 bytes of code in countPrimes()
real 7m 37.97s
user 7m 36.98s
sys 0m 0.00s
localhost:~# uname -a
Linux localhost 6.19.3 #17 PREEMPT_DYNAMIC Mon Mar 9 17:12:35 CET 2026 x86_64 Linux
x86 (i.e. 32 bit):
localhost:~# time gcc -O primes.c -o primes
real 0m 2.08s
user 0m 1.43s
sys 0m 0.64s
localhost:~# time ./primes
Starting run
3713160 primes found in 348424 ms
301 bytes of code in countPrimes()
real 5m 48.46s
user 5m 37.55s
sys 0m 10.86s
localhost:~# uname -a
Linux localhost 4.12.0-rc6-g48ec1f0-dirty #21 Fri Aug 4 21:02:28 CEST 2017 i586 Linux
riscv64:
[root@localhost ~]# time gcc -O primes.c -o primes
real 0m 2.08s
user 0m 1.13s
sys 0m 0.93s
[root@localhost ~]# time ./primes
Starting run
3713160 primes found in 180893 ms
216 bytes of code in countPrimes()
real 3m 0.90s
user 3m 0.89s
sys 0m 0.00s
[root@localhost ~]# uname -a
Linux localhost 4.15.0-00049-ga3b1e7a-dirty #11 Thu Nov 8 20:30:26 CET 2018 riscv64 GNU/Linux
Conclusion: as seen also in QEMU (also started by Bellard!), RISC-V is a *lot* easier to emulate than x86. If you're building code specifically to run in emulation, use RISC-V: builds faster, smaller code, runs faster.
Note: quite different gcc versions, with x86_64 being 15.2.0, x86 9.3.0, and riscv64 7.3.0.
MIPS (the arch of which RISC-V is mostly a copy) is even easier to emulate: unlike RV, it does not scatter immediate bits all over the instruction word, making it easier for an emulator to extract immediates. If you need emulated perf, MIPS is the easiest of all.
There are two interesting ISA differences between MIPS and RISC-V: MIPS does not have branch on condition, only on zero/non-zero; and MIPS has 16-bit immediates with appropriate sign extension (all zeroes for ORI, all ones for ANDI). The first difference makes MIPS programs about 10% larger, and the second makes MIPS programs smaller (RISC-V immediates are effectively 11.5 bits due to mandatory sign extension, while 13 bits would be required to cover 95% of immediates in a MIPS-like scheme) -- by a percent or so, I think.
ANDI is not 1-extended (though that would be nice); it is 0-extended. MIPS only has two extension modes for immediates: sign-extended and zero-extended. All logical ops are 0-extended, all arith ops are sign-extended.
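To make the "scattered immediate bits" point concrete, here is a sketch in C of decoding a RISC-V S-type (store) immediate versus a MIPS I-type immediate (the helper names are mine):

```c
#include <stdint.h>

/* RISC-V S-type: imm[11:5] lives in inst[31:25], imm[4:0] in inst[11:7],
 * and the result is sign-extended from bit 11. The arithmetic right
 * shift on a signed value does the sign extension here. */
static int32_t rv_s_imm(uint32_t inst) {
    return ((int32_t)(inst & 0xFE000000u) >> 20)   /* imm[11:5], sign-extended */
         | (int32_t)((inst >> 7) & 0x1Fu);         /* imm[4:0] */
}

/* MIPS I-type: one contiguous 16-bit field in inst[15:0]. */
static int32_t mips_simm(uint32_t inst) {  /* sign-extended (arith ops) */
    return (int32_t)(int16_t)(inst & 0xFFFFu);
}

static uint32_t mips_zimm(uint32_t inst) { /* zero-extended (logical ops) */
    return inst & 0xFFFFu;
}
```

The MIPS decode is a single mask (plus an extension); the RISC-V S-type needs two extracts and an OR, and the B and J formats scatter bits even further.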
> If you're building code specifically to run in emulation, use RISC-V: builds faster, smaller code, runs faster.
I don't really think this bears out in practice. RISC-V is easy to emulate, but that does not make it fast to emulate: emulation performance is largely dominated by other factors, in which RISC-V has no unique advantage.
Interesting to see the gcc version gap between the targets. The x86_64 image shipping gcc 15.2.0 vs 7.3.0 on riscv64 makes the performance comparison less apples-to-apples than it looks - newer gcc versions have significantly better optimization passes, especially for register allocation.
Making insider/true expert information public more quickly in the form of influencing prices in a toy market is THE ENTIRE POINT of prediction markets.
It might've been the original purpose but in practice prediction markets have turned into a tool for gambling.
It also creates weird incentives. If I want to pay a politician to do something, bribing them would generally be illegal. But what if I instead bet lots of money that they won't do it?
This may be the purpose of a prediction market for an outside observer. But the outside observer and the US government (or any org that holds private information) have different purposes - it's an adversarial mechanism.
In particular: the government is free to just publish any insider/true information that it wants the public to know about. If it shared that purpose then the market wouldn't need to exist.
True experts need not be the people with the ultimate ability to effect change. Professional sports organizations ban their players from betting on games because it creates bad incentives to throw a winnable game. Banning elected representatives from gambling on prediction markets doesn’t make it impossible for insider information to surface, but it does prevent the governance equivalent of match fixing.
> Making insider/true expert information public more quickly in the form of influencing prices in a toy market is THE ENTIRE POINT of prediction markets
American taxpayers pay a lot of money for a military and intelligence advantage. It's not clear it's in our interest for that knowledge to be made "public more quickly."
What we don’t want, and what we should enforce, is participants in prediction markets influencing the events they’re betting on (like the recent basketball betting scandal).
Same. Been rocking Sonoma on my M1 Mac for years at this point and it's been great. There have been almost zero upsides to upgrading macOS versions lately.
It's not all that slow as a concept, at a time when RAM speeds were as fast as CPU speeds. I think it's just that TI's implementation of the concept in that particular cost-optimised home computer was pretty bad -- the actual registers were in 256 bytes of fast static RAM, but the rest of the system memory (both ROM and RAM) was accessed very inefficiently: not only one byte at a time on a 16-bit machine, but also with something like 4 wait states for every byte.
The 6502 is not very different, with a very small number of registers and Zero Page used for most of what a modern machine would use registers for. For example (unlike the Z80) there is no register-to-register add, subtract, or compare -- you can only add/sub/cmp/and/or/xor a memory location with the accumulator. Also, pointers can only be formed using a pair of adjacent Zero Page locations.
As long as you were using data in those in-RAM registers the TI-99/4 was around four times faster than a 1 MHz 6502 for 16 bit arithmetic -- and with a single 2-byte instruction doing what needed 7 instructions and 13 bytes of code on 6502 -- and it was also twice as fast on 8 bit arithmetic.
It was just the cheap-ass main memory (and I/O) implementation that crippled it.