WebAssembly and RISC-V are both relatively new Instruction Set Architectures (ISAs) that emerged within the last decade or so. Quoting the introduction to WebAssembly on webassembly.org:

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

and the introduction to RISC-V on riscv.org:

RISC-V is a free and open ISA enabling a new era of processor innovation through open standard collaboration. Born in academia and research, RISC-V ISA delivers a new level of free, extensible software and hardware freedom on architecture, paving the way for the next 50 years of computing design and innovation.

At first glance these two ISAs look quite different. WebAssembly primarily targets JIT compilation, particularly in browsers, while RISC-V is designed for hardware, and one of its goals is efficient implementation on FPGAs and in silicon. They resemble Java bytecode and x86 in the old days: a bytecode format and a hardware instruction set. How can the two be related?

Well, the boundary between bytecode and real hardware instruction sets is actually somewhat blurred. The x86 instruction set, for example, was once executed directly by the circuits that perform control and arithmetic operations. On newer Intel and AMD CPUs, however, x86 instructions are decoded on the fly into internal RISC-like micro-operations, which are then executed by a RISC-like core. Isn't that a kind of hardware "JIT compilation", similar to what the JVM does for Java? Meanwhile, there are also experimental hardware implementations of Java bytecode and WebAssembly, though they are not as efficient as what a JIT compiler can achieve on a modern, mature CPU.

In fact, people are already using RISC-V in a role usually filled by WebAssembly: smart contracts on blockchains. One example is CKB VM. Across the blurred line between the WebAssembly bytecode format and the RISC-V hardware instruction set, what are the similarities, and what are the differences?

Let's do a comparison.

Feature table

| Feature | WebAssembly | RISC-V |
|---|---|---|
| Open standard | Yes | Yes |
| Memory architecture | Load/Store | Load/Store |
| Floating point | Yes | Yes (in extension) |
| SIMD | Yes | Yes (in extension) |
| Separated code and data | Yes | No |
| Pointer width (bits) | 32 | 32 and 64 |
| Maximum integer width (bits) | 64 | 32 and 64 |
| Typing | Weak | None |
| Control flow | Restricted | Arbitrary |
| Machine model | Stack | Register |
| Memory layout | Linear | Paged |
| Memory protection | None | RWX |
| Synchronization primitive | CAS (cmpxchg) | LR/SC |
| Instruction encoding | Variable length | Fixed length (2 or 4 bytes) |
| Interaction with environment | Imported functions | System calls (ecall) |
| Executable image format | WebAssembly binary module | ELF |
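To make the "Instruction encoding" row more concrete, here is a minimal Python sketch. WebAssembly opcodes are followed by immediates encoded in LEB128, a variable-length integer format with 7 payload bits per byte; RISC-V instead encodes an instruction's length in its lowest bits (a 16-bit compressed instruction when the two lowest bits are not `11`, a 32-bit one otherwise). The function names are hypothetical, only unsigned LEB128 is shown, and the longer RISC-V encodings reserved by the specification are deliberately not handled.

```python
from __future__ import annotations


def decode_uleb128(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer starting at `offset`.

    Returns (value, number_of_bytes_consumed). WebAssembly uses this
    encoding for indices and memory offsets; signed immediates such as
    the one of i32.const use the signed variant, not shown here.
    """
    value = 0
    shift = 0
    pos = offset
    while True:
        byte = data[pos]
        value |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        pos += 1
        if (byte & 0x80) == 0:  # high bit clear: this was the last byte
            return value, pos - offset
        shift += 7


def riscv_insn_length(low_halfword: int) -> int:
    """Length in bytes of a RISC-V instruction, given its first 16 bits."""
    if (low_halfword & 0b11) != 0b11:
        return 2  # compressed instruction (C extension)
    if ((low_halfword >> 2) & 0b111) != 0b111:
        return 4  # standard 32-bit instruction
    raise ValueError("48-bit and longer encodings are not handled in this sketch")


print(decode_uleb128(bytes([0xE5, 0x8E, 0x26])))  # (624485, 3)
print(riscv_insn_length(0x0513))  # 4: low half of `addi a0, a0, 1` (0x00150513)
print(riscv_insn_length(0x0505))  # 2: `c.addi a0, 1`
```

A decoder for WebAssembly therefore has to walk each immediate byte by byte, while a RISC-V fetch unit can tell the instruction length from the first two bytes alone.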

Most of the similarities are shared by many other ISAs as well, so I will focus on the differences here.

1. Separated code and data

Most modern architectures, including RISC-V, use a single address space shared by code and data. WebAssembly does not; in fact, running code is not even given a way to access its own instructions. Possible reasons:

  • Simplicity for JIT compilation. If code can modify itself, the JIT compiler must detect every modification and regenerate the corresponding machine code, which is very complex.
  • WebAssembly assumes a capable host environment. Linking and similar tasks are done entirely by the host, so the program never needs to bootstrap itself.
  • Security. Letting a program generate or modify code at runtime is dangerous, since it opens the door to code-injection attacks.
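The JIT argument can be illustrated with a toy model. The sketch below uses hypothetical class names and a dictionary that stands in for a JIT's cache of translated code; it contrasts a machine whose code is loaded into ordinary memory with one whose code lives outside the addressable data. In the first, every store must be checked against the translation cache; in the second, a translation can never go stale.

```python
class SharedSpaceMachine:
    """RISC-V-like toy: code is loaded into the same memory as data."""

    def __init__(self, program: bytes, mem_size: int = 64):
        self.mem = bytearray(mem_size)
        self.mem[: len(program)] = program
        self.translated = {}  # pc -> "compiled" instruction (stands in for JIT output)

    def translate(self, pc: int):
        if pc not in self.translated:
            self.translated[pc] = ("op", self.mem[pc])  # pretend to compile one byte
        return self.translated[pc]

    def store(self, addr: int, value: int) -> None:
        self.mem[addr] = value
        # Any store may overwrite code, so each one must drop stale translations.
        self.translated.pop(addr, None)


class SeparatedMachine:
    """Wasm-like toy: code lives outside the addressable linear memory."""

    def __init__(self, program: bytes, mem_size: int = 64):
        self.code = bytes(program)  # immutable and unreachable through store()
        self.mem = bytearray(mem_size)
        self.translated = {}

    def translate(self, pc: int):
        if pc not in self.translated:
            self.translated[pc] = ("op", self.code[pc])
        return self.translated[pc]

    def store(self, addr: int, value: int) -> None:
        self.mem[addr] = value  # cannot touch self.code, so no invalidation is needed
```

After `translate(0)` followed by `store(0, 9)`, the shared-space machine executes the new byte at pc 0, while the separated machine keeps executing the original code. Real instructions span several bytes, so a real JIT would have to invalidate whole address ranges, which is where the complexity comes from.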

2. Typing and control flow

WebAssembly is very "structured": before a module runs, it must pass validation, which requires all function calls, loops, jumps and value types to conform to structural constraints. For example, you can't pass two values to a function that takes three arguments, you can't jump to a location inside another function, and you can't apply f32.add to two i32 values.

RISC-V, however, has none of these constraints. Registers hold untyped bits, jumps can target any address, and it's up to the programmer to decide how everything is used.
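The typing constraint can be sketched as a type-only pass over straight-line code that tracks a stack of value types instead of values. This covers a hypothetical handful of opcodes and no control flow, so it illustrates the idea rather than reproducing the validation algorithm in the WebAssembly specification. A RISC-V machine has no equivalent check: nothing stops a program from moving integer bits into a floating-point register with fmv.w.x and adding them.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical subset of WebAssembly opcodes: name -> (operand types, result types).
# Immediates (e.g. the constant of i32.const) are omitted; only types matter here.
SIGNATURES = {
    "i32.const": ((), ("i32",)),
    "f32.const": ((), ("f32",)),
    "i32.add": (("i32", "i32"), ("i32",)),
    "f32.add": (("f32", "f32"), ("f32",)),
    "f32.convert_i32_s": (("i32",), ("f32",)),
}


def check(body: list[str], results: tuple[str, ...]) -> Optional[str]:
    """Type-check a straight-line instruction sequence; return an error or None."""
    stack: list[str] = []  # types only; the actual values are unknown at this point
    for op in body:
        params, outs = SIGNATURES[op]
        if len(stack) < len(params):
            return f"{op}: stack underflow"
        top = tuple(stack[len(stack) - len(params):])
        if top != params:
            return f"{op}: expected {params}, found {top}"
        del stack[len(stack) - len(params):]  # pop the operands
        stack.extend(outs)                    # push the results
    if tuple(stack) != results:
        return f"end: expected {results}, found {tuple(stack)}"
    return None


print(check(["i32.const", "i32.const", "f32.add"], ("f32",)))
# f32.add: expected ('f32', 'f32'), found ('i32', 'i32')
```

Because the check runs on types alone, it can reject the program before any instruction executes.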

3. Machine model

WebAssembly is built upon the stack machine model, while RISC-V (along with most other hardware architectures) is a register machine.

Semantically, every WebAssembly instruction pops its operands (if any) from the value stack, performs its operation, and pushes its results (if any) back onto the value stack. However, the shape of the value stack at every point in the program can be determined statically, and this is not an optional optimization but a mandatory validation rule: a module that violates it is rejected before it runs. (The JVM's bytecode verifier enforces a comparable rule on Java bytecode.)
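To make the contrast concrete, here is a minimal sketch that evaluates `(a + b) * c` both ways. The interpreter functions are hypothetical, and tuples stand in for real instruction encodings. The stack version needs five instructions with implicit operands; the register version needs two, but each names its source and destination registers explicitly.

```python
MASK32 = 0xFFFFFFFF  # both models use 32-bit wrap-around arithmetic


def run_stack(code, locals_):
    """Evaluate Wasm-style stack code: each op pops operands and pushes results."""
    stack = []
    for op, *imm in code:
        if op == "local.get":
            stack.append(locals_[imm[0]])
        elif op in ("i32.add", "i32.mul"):
            rhs, lhs = stack.pop(), stack.pop()  # operands come off in reverse order
            result = lhs + rhs if op == "i32.add" else lhs * rhs
            stack.append(result & MASK32)
        else:
            raise ValueError(f"unsupported op {op}")
    return stack


def run_register(code, regs):
    """Evaluate RISC-V-style three-register code: operands are named explicitly."""
    for op, rd, rs1, rs2 in code:
        if op == "add":
            value = regs[rs1] + regs[rs2]
        elif op == "mul":  # requires the M extension on real RISC-V
            value = regs[rs1] * regs[rs2]
        else:
            raise ValueError(f"unsupported op {op}")
        if rd != 0:  # x0 is hardwired to zero
            regs[rd] = value & MASK32
    return regs


# (a + b) * c with a = 2, b = 3, c = 4
stack_code = [("local.get", 0), ("local.get", 1), ("i32.add",),
              ("local.get", 2), ("i32.mul",)]
print(run_stack(stack_code, [2, 3, 4]))  # [20]

regs = [0] * 32
regs[10], regs[11], regs[12] = 2, 3, 4  # a0, a1, a2
reg_code = [("add", 5, 10, 11),  # add t0, a0, a1
            ("mul", 10, 5, 12)]  # mul a0, t0, a2
print(run_register(reg_code, regs)[10])  # 20
```

The stack form is more compact to encode, while the register form maps more directly onto hardware, which is one reason JIT compilers for WebAssembly spend effort turning stack slots back into registers.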

4. Memory

While both ISAs define an untyped, byte-addressable memory, the details differ. Memory in WebAssembly is literally a large byte array: addresses start at zero and span contiguously up to an upper limit, which can be raised at runtime with the memory.grow instruction. RISC-V, in contrast, supports virtual memory, in which page tables map virtual addresses to the underlying physical memory.
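The two models can be sketched as follows (a minimal Python illustration; class and method names are hypothetical). Linear memory needs only a bounds check against its current size, and growing it appends pages at the end; paged memory translates each virtual page number through a page table, so mapped pages need not be contiguous. A flat dictionary replaces RISC-V's real multi-level page tables to keep the sketch short.

```python
from __future__ import annotations

WASM_PAGE_SIZE = 64 * 1024  # WebAssembly memory grows in 64 KiB pages
RISCV_PAGE_SIZE = 4 * 1024  # base page size of RISC-V virtual memory


class Trap(Exception):
    """Raised where a Wasm engine would trap or a RISC-V hart would page-fault."""


class LinearMemory:
    """Wasm-style memory: one contiguous byte array starting at address 0."""

    def __init__(self, pages: int):
        self.data = bytearray(pages * WASM_PAGE_SIZE)

    def grow(self, delta: int) -> int:
        old_pages = len(self.data) // WASM_PAGE_SIZE
        self.data.extend(bytes(delta * WASM_PAGE_SIZE))
        return old_pages  # like memory.grow, return the previous size in pages

    def load8(self, addr: int) -> int:
        if not 0 <= addr < len(self.data):  # a single bounds check suffices
            raise Trap(f"out-of-bounds access at {addr:#x}")
        return self.data[addr]


class PagedMemory:
    """RISC-V-style memory: a page table maps virtual pages to physical frames.

    Real RISC-V page tables (Sv32/Sv39/Sv48) are multi-level trees; a flat
    dict is used here only for brevity.
    """

    def __init__(self):
        self.page_table: dict[int, bytearray] = {}

    def map_page(self, vpn: int, frame: bytearray) -> None:
        self.page_table[vpn] = frame

    def load8(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, RISCV_PAGE_SIZE)
        frame = self.page_table.get(vpn)
        if frame is None:  # unmapped pages can sit anywhere in the address space
            raise Trap(f"page fault at {vaddr:#x}")
        return frame[offset]
```

Note that `LinearMemory(1).load8(0)` succeeds, while `PagedMemory().load8(0)` faults unless page 0 has been mapped.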

Memory layouts

WebAssembly's memory model, while simple to implement and efficient when JIT-compiled, has two drawbacks compared to RISC-V's:

  • Address zero is valid, so programs that expect dereferencing a null pointer to crash will instead silently read or write memory.
  • The address space can't contain unmapped "gaps". Therefore the guard-page trick, where an unmapped page below each thread's stack turns a stack overflow into a fault, doesn't work.

Virtual memory usually comes with memory protection, where each page has a combination of read, write and execute permissions. This is an important security mechanism because it blocks writes and execution in memory regions where such operations aren't expected. RISC-V supports memory protection but WebAssembly does not. Perhaps it matters less in an architecture where executable code isn't addressable?
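Here is a sketch, using assumed names and a hypothetical address-space layout, of how per-page permissions give a paged address space the properties discussed above: an unmapped null page, a code page that cannot be written, and a guard page that turns a stack overflow into a fault. The bit values are arbitrary and do not follow the real RISC-V page-table-entry layout. In WebAssembly's linear memory, every in-bounds address would be both readable and writable, and there is no execute permission to check at all.

```python
from __future__ import annotations

PAGE_SIZE = 4096
R, W, X = 1, 2, 4  # arbitrary permission bits for this sketch

# Hypothetical layout: virtual page number -> permissions.
LAYOUT = {
    # page 0 is left unmapped, so dereferencing a null pointer faults
    1: R | X,  # code: readable and executable, not writable
    2: R,      # read-only data
    # page 3 is left unmapped as a guard page below the stack
    4: R | W,  # stack (grows downward from the top of page 5)
    5: R | W,
}


def allowed(layout: dict[int, int], addr: int, access: int) -> bool:
    """True if every permission bit in `access` is granted for `addr`'s page."""
    perms = layout.get(addr // PAGE_SIZE)  # None means the page is unmapped
    return perms is not None and (perms & access) == access


print(allowed(LAYOUT, 0x0000, R))  # False: null page
print(allowed(LAYOUT, 0x1000, W))  # False: code is not writable
print(allowed(LAYOUT, 0x3FF8, W))  # False: stack overflow hits the guard page
print(allowed(LAYOUT, 0x5FF8, W))  # True: ordinary stack write
```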

5. Synchronization

Roughly speaking, a computing machine needs at least one conditional branch instruction to be Turing complete. Similarly, a multiprocessor architecture needs at least one "atomic conditional branch" instruction for proper synchronization. In WebAssembly this is i{32,64}.atomic.rmw.cmpxchg (compare-and-swap), and in RISC-V it is the LR/SC (load-reserved/store-conditional) instruction pair.
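The retry loop that such a primitive enables can be sketched in Python with hypothetical names: an atomic add built only from compare-and-swap. Python has no hardware atomics, so a lock stands in for the atomicity of the cmpxchg instruction itself; it is not part of the technique. On RISC-V the same loop would be written with lr.w, add and sc.w, branching back with bnez when sc.w reports failure.

```python
import threading


class AtomicCell:
    """A 32-bit cell whose only read-modify-write operation is compare-and-swap,
    like WebAssembly's i32.atomic.rmw.cmpxchg. The lock merely emulates the
    atomicity that hardware would provide."""

    def __init__(self, value: int = 0):
        self._value = value
        self._lock = threading.Lock()

    def load(self) -> int:
        with self._lock:
            return self._value

    def cmpxchg(self, expected: int, replacement: int) -> int:
        with self._lock:
            old = self._value
            if old == expected:
                self._value = replacement
            return old  # like Wasm's cmpxchg, return the value that was read


def fetch_add(cell: AtomicCell, delta: int) -> int:
    """Atomic add built from a CAS retry loop; returns the previous value."""
    while True:
        old = cell.load()
        # The "atomic conditional branch": retry if another thread wrote in between.
        if cell.cmpxchg(old, (old + delta) & 0xFFFFFFFF) == old:
            return old


counter = AtomicCell()


def worker() -> None:
    for _ in range(1000):
        fetch_add(counter, 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())  # 4000
```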

LR/SC has stronger semantics than cmpxchg. cmpxchg suffers from the well-known ABA problem: it only checks that the value is unchanged, so it cannot tell whether the location was modified and then restored in the meantime. LR/SC is not affected, because a store from another hart to the reserved address makes the SC fail. This also means that emulating LR/SC on a cmpxchg-only architecture is much harder than the other way around; something like a global lock or transactional memory has to be used, greatly increasing the complexity. RISC-V is designed for hardware anyway, but if you ever want to write a software implementation, you will have to work around this issue.
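The ABA difference can be replayed deterministically with a toy model. The class and method names are hypothetical, and the interleaving is written out by hand instead of using real threads: hart 1 changes a word from A to B and back to A while hart 0 is in the middle of an update. CAS compares only values, so it succeeds; the SC fails because the intervening stores cleared hart 0's reservation.

```python
from __future__ import annotations


class ToyMemory:
    """Hypothetical word-addressed memory offering both CAS and LR/SC."""

    def __init__(self):
        self.cells: dict[int, int] = {}
        self.reservations: dict[int, int] = {}  # hart id -> reserved address

    def load(self, addr: int) -> int:
        return self.cells.get(addr, 0)

    def store(self, addr: int, value: int) -> None:
        self.cells[addr] = value
        # Any store to a reserved address breaks every reservation on it.
        # (Real RISC-V tracks reservation sets and may also fail an SC spuriously.)
        self.reservations = {h: a for h, a in self.reservations.items() if a != addr}

    def cas(self, addr: int, expected: int, new: int) -> bool:
        if self.load(addr) == expected:
            self.store(addr, new)
            return True
        return False

    def lr(self, hart: int, addr: int) -> int:
        self.reservations[hart] = addr
        return self.load(addr)

    def sc(self, hart: int, addr: int, value: int) -> int:
        if self.reservations.pop(hart, None) != addr:
            return 1  # like sc.w: nonzero means failure
        self.store(addr, value)
        return 0  # zero means success


A, B, NEW = 1, 2, 3
ADDR = 0x100

# CAS: hart 0 reads A, hart 1 changes A -> B -> A, then hart 0's CAS still succeeds.
m = ToyMemory()
m.store(ADDR, A)
seen = m.load(ADDR)
m.store(ADDR, B)
m.store(ADDR, A)
print(m.cas(ADDR, seen, NEW))  # True: the intervening writes went unnoticed

# LR/SC: the same interleaving makes the store-conditional fail.
m = ToyMemory()
m.store(ADDR, A)
seen = m.lr(0, ADDR)
m.store(ADDR, B)
m.store(ADDR, A)
print(m.sc(0, ADDR, NEW))  # 1: the reservation was lost, so hart 0 must retry
```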

Side note: I have been working on both a RISC-V interpreter, FlatRv, and a full WebAssembly JIT runtime, Wasmer. Take a look at them, and star them if you like my work :)