Architectures/RISC-V/64ILP32: Difference between revisions

Revision as of 09:30, 21 August 2023

Why 32-bit Linux?

The motivation for using a 32-bit Linux kernel is to reduce the memory footprint and fit within small DDR and cache capacities (e.g., a 64/128MB SiP SoC).

The size of ilp32's long and pointer types is half that of lp64's (lp64 is the rv64 default ABI, where longs and pointers are all 64-bit). This difference in data type size leads to different memory and cache footprint costs. Here is a comparison of s32ilp32, s64ilp32, and s64lp64 measured in the same 128MB qemu system environment:

Rootfs:

u32ilp32 - Using the same 32-bit userspace rootfs.ext2 (UXL=32) binary from buildroot 2023.02-rc3, qemu_riscv32_virt_defconfig

Linux:

s32ilp32 - Linux version 6.3.0-rc1 (124MB) rv32_defconfig:

           $(Q)$(MAKE) -f $(srctree)/Makefile defconfig 32-bit.config

s64lp64 - Linux version 6.3.0-rc1 (126MB) defconfig:

           $(Q)$(MAKE) -f $(srctree)/Makefile defconfig

s64ilp32 - Linux version 6.3.0-rc1 (126MB) rv64ilp32_defconfig:

          $(Q)$(MAKE) -f $(srctree)/Makefile defconfig 64ilp32.config

Opensbi:

m64lp64 - (2MB) OpenSBI v1.2-80-g4b28afc98bbe
m32ilp32 - (4MB) OpenSBI v1.2-80-g4b28afc98bbe

This is a rough measurement based on the current default configs without any modification. The 32-bit kernels (s32ilp32, s64ilp32) save more than 16% of memory compared with the 64-bit kernel (s64lp64). s32ilp32 and s64ilp32 have a similar memory footprint (about 0.33% difference), which means s64ilp32 has a good chance of replacing s32ilp32 on 64-bit machines.

Why s64ilp32?

RISC-V currently has the RVA20S64, RVA22S64, and RVA23S64 (ongoing) profiles [4], but no RVA**S32 profile exists and none is planned. That means a vendor who wants to produce a 32-bit S-mode RISC-V application processor has no profile to follow. Therefore, many low-cost RISC-V chips have come out that follow the RVA2xS64 profiles instead, such as Allwinner D1/D1s/F133 [5], SOPHGO CV1800B [6], Canaan Kendryte k230 [7], and Bouffalo Lab BL808, which target product scenarios typically served by Cortex-A7/A35/A53. The D1, CV1800B, and BL808 don't support UXL=32 (32-bit U-mode), so they need a new u64ilp32 userspace ABI, which currently has no software ecosystem. Thus, s64ilp32 would land first on the Canaan Kendryte k230, whose c908 core has rv64gcv and a compat user mode (sstatus.uxl=32/64) and can therefore support the existing rv32 userspace software ecosystem.

Another reason for inventing s64ilp32 is that it offers performance benefits and simplifies 64-bit CPU hardware design (vs. s32ilp32).

Why does s64ilp32 have better performance?

Generally speaking, running 32-bit Linux on a 64-bit processor requires either a 32-bit hardware S-mode (such as Linux-arm32 on Cortex-A53) or restricting the 64-bit machine to the old 32ilp32 ABI (such as MIPS SYS_SUPPORTS_32BIT_KERNEL). Neither approach can reuse the performance-related features and instructions of the 64-bit hardware, such as the 64-bit ALU, AMO, and LD/SD, which causes significant performance gaps in many Linux features:

- memcpy/memset/strcmp (s64ilp32 needs half the instruction count
  and has double the load/store bandwidth of s32ilp32.)

- eBPF is a 64-bit virtual ISA, so its JIT is not suitable
  for mapping to s32ilp32.

- Atomic64 (s64ilp32 has the exact native instructions mapping as
  s64lp64, but s32ilp32 only uses generic_atomic64, a tradeoff &
  limited software solution.)

- 64-bit native arithmetic instructions for "long long" type

- Support cmpxchg_double for slub (s64ilp32 is the 2nd 32-bit Linux
  to support the feature; the 1st is i386.)

- ...
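The memcpy point in the list above can be illustrated with a small arithmetic sketch. It is an idealized count only: the helper name copy_ops and the 4096-byte buffer are assumptions made for illustration, and real memcpy implementations also handle alignment, tails, and loop overhead.

```shell
#!/bin/sh
# Idealized number of load/store pairs needed to copy a buffer.
# $1 = buffer size in bytes, $2 = access width in bytes
# (4 for lw/sw on s32ilp32, 8 for ld/sd on s64ilp32).
# Rounds up so a partial trailing word still counts as one access.
copy_ops() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

bytes=4096   # hypothetical buffer size, e.g. one 4KB page

echo "lw/sw (4-byte accesses): $(copy_ops "$bytes" 4) load/store pairs"
echo "ld/sd (8-byte accesses): $(copy_ops "$bytes" 8) load/store pairs"
```

Under these assumptions the 8-byte version needs half as many load/store pairs for the same buffer, which is where the halved instruction count comes from.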

Compared with the user-space ecosystem, the 32-bit Linux kernel has a greater need for 64ilp32 to improve performance, because the kernel can't use the floating-point/vector features of the ISA.

Let's look at performance from another perspective (s64ilp32 vs. s64lp64). As the first section said, the pointer size of ilp32 is half that of lp64, which reduces the size of critical data structures (e.g., page, list, ...). That means a cache of the same capacity can hold up to twice as much of such pointer-heavy data under ilp32 as under lp64, which is a natural advantage of 32-bit.
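As a rough illustration of the cache argument, consider a doubly linked list node made of two pointers (next and prev), as in the kernel's struct list_head. The sketch below is simple arithmetic; the helper names and the 64-byte cache line are assumptions for illustration, not measurements from any of the SoCs mentioned here.

```shell
#!/bin/sh
# Size in bytes of a struct that holds only pointers.
# $1 = number of pointers, $2 = pointer width in bytes
struct_size() {
    echo $(( $1 * $2 ))
}

# How many structs of $2 bytes fit in a cache line of $1 bytes.
per_cache_line() {
    echo $(( $1 / $2 ))
}

line_bytes=64   # assumption: 64-byte cache line

ilp32_size=$(struct_size 2 4)   # two 4-byte pointers (next, prev)
lp64_size=$(struct_size 2 8)    # two 8-byte pointers (next, prev)

echo "ilp32 list node: ${ilp32_size} bytes, $(per_cache_line "$line_bytes" "$ilp32_size") per cache line"
echo "lp64  list node: ${lp64_size} bytes, $(per_cache_line "$line_bytes" "$lp64_size") per cache line"
```

Structures that mix pointers with fixed-width fields shrink by less than half, so the doubling is an upper bound rather than a typical figure.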

Why does s64ilp32 simplify CPU design?

Historically, there are many examples of running 32-bit Linux on 64-bit hardware, such as Arm Cortex-A35/A53/A55, which implement 32-bit EL1/EL2/EL3 hardware modes to support 32-bit Linux. We could follow Arm's approach, but RISC-V can choose a better way. Compared to UXL=32, MXL=SXL=32 involves many CSR-related hardware functions, and mixing them into 64-bit hardware takes a lot of effort. s64ilp32 runs in MXL=SXL=64 mode, so CPU vendors needn't implement 32-bit machine and supervisor modes.

How does s64ilp32 work?

From a hardware view, s64ilp32 is the same as the s64lp64 compat mode, i.e., MXL=SXL=64 + UXL=32. Because s64ilp32 uses Linux's CONFIG_32BIT, it only supports u32ilp32 ABI user space, which is the current standard rv32 software ecosystem, and it can't work with the u64lp64 ABI (I don't want that complex and useless stuff). It may work with u64ilp32 in the future, but for now s64ilp32 depends on the UXL=32 feature of the hardware.

The 64ilp32 gcc still uses sign-extending lw & auipc to generate addresses, because inserting zero-extension instructions to mask the upper 32 bits would cause significant code-size and performance problems. Thus, we invented an OS-level approach to solve the problem:

- When satp=bare and the start physical address is < 2GB, there is no
  sign-extended address problem.
- When satp=bare and the start physical address is > 2GB, we need Zjpm-like
  hardware extensions to mask the upper 32 bits.
  (Fortunately, all existing SoCs (D1/D1s/F133, CV1800B, k230, BL808)
   have a start physical address < 2GB.)
- When satp=sv39, we introduce double mapping so that the sign-extended
  virtual address is equivalent to the zero-extended virtual address.
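The 2GB boundary in the cases above follows from bit 31: a 32-bit address below 0x80000000 has bit 31 clear, so sign extension and zero extension give the same 64-bit value. The sketch below shows this with shell arithmetic. The helper names and the two sample addresses are hypothetical and are not taken from any particular SoC memory map.

```shell
#!/bin/sh
# Sign-extend a 32-bit address to 64 bits, as a sign-extending load
# would on rv64. The high and low words are printed separately so that
# no negative numbers are passed to printf.
sext32() {
    low=$(( $1 & 0xffffffff ))
    if [ $(( low & 0x80000000 )) -ne 0 ]; then
        high=$(( 0xffffffff ))   # bit 31 set: upper word becomes all ones
    else
        high=0                   # bit 31 clear: upper word stays zero
    fi
    printf '0x%08x%08x\n' "$high" "$low"
}

# Zero-extend a 32-bit address to 64 bits.
zext32() {
    printf '0x%08x%08x\n' 0 $(( $1 & 0xffffffff ))
}

# Succeeds when both extensions agree, i.e. the address is below 2GB.
extension_agrees() {
    [ "$(sext32 "$1")" = "$(zext32 "$1")" ]
}

# Hypothetical sample addresses: one below 2GB, one at the 2GB boundary.
for addr in 0x40000000 0x80000000; do
    if extension_agrees "$addr"; then
        echo "$addr: $(sext32 "$addr") (no sign-extension problem)"
    else
        echo "$addr: sext=$(sext32 "$addr") zext=$(zext32 "$addr") (needs masking or double mapping)"
    fi
done
```

For the second address the two results differ only in the upper 32 bits, which is the part that a Zjpm-like mask would discard in bare mode and that double mapping would cover under sv39.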

How to run s64ilp32?

GNU toolchain

Please use the riscv64-linux-gnu- toolchain in Fedora.

Opensbi

git clone https://github.com/riscv-software-src/opensbi.git
CROSS_COMPILE=riscv64-linux-gnu- make PLATFORM=generic

Linux kernel

git clone https://github.com/guoren83/linux.git -b s64ilp32
cd linux
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- rv64ilp32_defconfig
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- menuconfig
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- all

Fedora Rootfs

WIP

Qemu

git clone https://github.com/plctlab/plct-qemu.git -b plct-s64ilp32-dev
cd plct-qemu
mkdir build
cd build
../configure --target-list="riscv64-softmmu riscv32-softmmu"
make

Run

./qemu-system-riscv64 -cpu rv64 -M virt -m 128m -nographic \
    -bios fw_dynamic.bin -kernel Image \
    -drive file=rootfs.ext2,format=raw,id=hd0 \
    -device virtio-blk-device,drive=hd0 \
    -append "rootwait root=/dev/vda ro console=ttyS0 earlycon=sbi" \
    -netdev user,id=net0 -device virtio-net-device,netdev=net0

