From Fedora Project Wiki
{{lang|en|zh-cn|page=Architectures/RISC-V/64ILP32}}


== Why 32-bit Linux? ==
== What is rv64ilp32? ==
The advancement of AIoT technology has driven demand for more computing power in microcontrollers (MCUs) and application processors (APs), exposing the limitations of 32-bit architectures. 32-bit memory-access and atomic instructions struggle to meet the requirements of modern systems, which is pushing designs toward 64-bit architectures. This transition has its own cost: running 32-bit software on 64-bit hardware hurts performance because pointer width and register width no longer match. To address this, the Xuantie team at Alibaba DAMO Academy proposed the Relaxed-Addressing Mode and, in collaboration with the PLCT Lab of the Institute of Software, Chinese Academy of Sciences, released the rv64ilp32 toolchain, described as the industry's first product-grade open-source toolchain for the new RISC-V 32-bit ABI. It is designed for firmware, RTOSes, and the Linux kernel, and aims to optimize both performance and cost. The new 32-bit Linux kernel significantly outperforms traditional solutions, with a reported 300% improvement in eBPF performance and a 17% increase in iperf TCP throughput.


The motivation for using a 32-bit Linux kernel is to reduce the memory footprint so that it fits systems with small DDR and cache capacity (e.g., 64/128MB SiP SoCs).
The Fedora community has a rich software ecosystem for RISC-V. In the measurements on this page, the traditional 64-bit build uses about 39% more memory than the new 32-bit build, which makes Fedora RISC-V applicable to a wider range of embedded systems. Seeing the potential of the new 32-bit ABI, we started the new 32-bit Fedora Remix project, which can now run on the K230 development board:


In ilp32, long and pointer are half the size they are in lp64 (the default rv64 ABI, in which longs and pointers are 64-bit). This difference in data-type size leads to different memory and cache footprints. Here is a comparison of s32ilp32, s64ilp32, and s64lp64 in the same 128MB QEMU system environment:
[[File:K230.jpg|thumb]]


=== Rootfs: ===
== Quick Start ==
* u32ilp32 - Using the same 32-bit userspace rootfs.ext2 (UXL=32) binary from buildroot 2023.02-rc3, qemu_riscv32_virt_defconfig
CanMV-K230 Fedora Firmware download:
 
[https://github.com/ruyisdk/mkimg-k230-rv64ilp32/releases/tag/2024.03.03-128m Release 2024.03.03-128m · ruyisdk/mkimg-k230-rv64ilp32]


=== Linux: ===
* s32ilp32 - Linux version 6.3.0-rc1 (124MB) rv32_defconfig:
<pre>
<pre>
          $(Q)$(MAKE) -f $(srctree)/Makefile defconfig 32-bit.config
- rv64-canmv-rv64 (s64lp64+u64lp64)
- rv32-canmv-rv64 (s64lp64+u32ilp32)
- rv32-canmv-rv64ilp32 (s64ilp32 + u32ilp32)
</pre>
</pre>


* s64lp64  - Linux version 6.3.0-rc1 (126MB) defconfig:  
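Below is a comparison of memory overhead for each version (compared with the traditional 64-bit k230, the new 32-bit Linux avoids 39% of the memory overhead):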
s64lp64 + u64lp64:
<pre>
- free -h
-      total used free shared buff/cache available
- Mem: 107Mi 39Mi 15Mi 1.0Mi 52Mi      53Mi
</pre>
s64lp64 + u32ilp32:
<pre>
free -h
    total used free shared buff/cache available
Mem: 107Mi 33Mi 31Mi 1.0Mi  41Mi      67Mi
</pre>
s64ilp32 + u32ilp32:
<pre>
<pre>
          $(Q)$(MAKE) -f $(srctree)/Makefile defconfig
free -h
    total used free shared buff/cache available
Mem: 108Mi 28Mi 41Mi 1.0Mi  38Mi      73Mi
</pre>
</pre>
(used: 39Mi -> 33Mi -> 28Mi; the traditional 64-bit setup uses about 39% more memory than s64ilp32 + u32ilp32)


* s64ilp32 - Linux version 6.3.0-rc1 (126MB)  rv64ilp32_defconfig:
== Flash firmware ==
Decompress the .zst file:
<pre>
<pre>
          $(Q)$(MAKE) -f $(srctree)/Makefile defconfig 64ilp32.config
zstd -d k230-sdcard-fedora_rv32-canmv-rv64ilp32.img.zst
</pre>
</pre>


=== Opensbi: ===
Then write the decompressed image to the SD card (/dev/sdb in this example; check the device name on your system first):


* m64lp64  - (2MB) OpenSBI v1.2-80-g4b28afc98bbe
<pre>
* m32ilp32 - (4MB) OpenSBI v1.2-80-g4b28afc98bbe
wipefs -a /dev/sdb
dd if=k230-sdcard-fedora_rv32-canmv-rv64ilp32.img of=/dev/sdb bs=1M status=progress
sync
eject
</pre>


== Build Linux kernel ==
Get the toolchain: https://github.com/ruyisdk/riscv-gnu-toolchain-rv64ilp32


[[File:64ilp32.png|center|1024px]]
Get the Linux kernel:
 
 
This is a rough measurement based on the current default configs without
any modification. The 32-bit kernels (s32ilp32, s64ilp32) save more than
16% of memory compared with the 64-bit kernel (s64lp64), while s32ilp32 and
s64ilp32 have a similar memory footprint (about 0.33% difference), so
s64ilp32 has a good chance of replacing s32ilp32 on 64-bit machines.
 
 
 
== Why s64ilp32? ==
 
RISC-V currently has the RVA20S64, RVA22S64, and RVA23S64 (ongoing)
profiles [4], but no RVA**S32 profile exists and none is planned. That
means a vendor who wants to produce a 32-bit s-mode RISC-V Application
Processor has no profile to follow. As a result, many cheap RISC-V chips
have come out that follow the RVA2xS64 profiles, such as Allwinner
D1/D1s/F133 [5], SOPHGO CV1800B [6], Canaan Kendryte k230 [7], and
Bouffalo Lab BL808, which target typical Cortex-A7/A35/A53 product
scenarios. The D1, CV1800B, and BL808 don't support UXL=32 (32-bit U-mode),
so they need a new u64ilp32 userspace ABI, which currently has no software
ecosystem. Thus, s64ilp32 would land first on the Canaan Kendryte k230,
which has a c908 core with rv64gcv and compat user mode
(sstatus.uxl=32/64) and can therefore support the existing rv32 userspace
software ecosystem.

Another reason for inventing s64ilp32 is that it brings performance
benefits and simplifies 64-bit CPU hardware design (vs. s32ilp32).
 
== Why does s64ilp32 have better performance? ==

Generally speaking, running 32-bit Linux on a 64-bit processor requires
either a 32-bit hardware s-mode (such as Linux-arm32 on Cortex-A53) or the
old 32ilp32 ABI on a 64-bit machine (such as MIPS
SYS_SUPPORTS_32BIT_KERNEL). Neither can reuse the performance-related
features and instructions of the 64-bit hardware, such as the 64-bit ALU,
AMO, and LD/SD, which causes significant performance gaps in many
Linux features:
 
- memcpy/memset/strcmp (s64ilp32 needs half the instruction count
  and gets double the load/store bandwidth of s32ilp32.)

- The eBPF JIT targets a 64-bit virtual ISA, which does not map well
  onto s32ilp32.

- Atomic64 (s64ilp32 has exactly the same native instruction mapping as
  s64lp64, but s32ilp32 can only use generic_atomic64, a limited software
  tradeoff.)

- 64-bit native arithmetic instructions for the "long long" type

- Support for cmpxchg_double in slub (this makes it the second 32-bit
  Linux port to support the feature; the first is i386.)
 
- ...
 
Compared with the user-space ecosystem, the 32-bit Linux kernel has a
stronger need for 64ilp32 to improve performance, because the kernel
can't use the floating-point/vector features of the ISA.

Let's look at performance from another perspective (s64ilp32 vs.
s64lp64). As noted in the first section, pointers in ilp32 are half the
size of those in lp64, which shrinks critical data structures
(e.g., page, list, ...). For pointer-heavy structures, a cache of the same
capacity can therefore hold up to twice as much data under ilp32 as under
lp64, which is a natural advantage of 32-bit.
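
As a rough illustration of the cache argument, the sketch below computes the size of a node made of two pointers (the layout of the kernel's struct list_head) and how many such nodes fit in one cache line. The 64-byte cache line and the helper name nodes_per_line are assumptions chosen for the example, not values taken from the k230 documentation.

```shell
#!/bin/sh
# nodes_per_line PTR_BYTES LINE_BYTES
# Prints the size of a two-pointer node (like struct list_head) and the
# number of such nodes that fit in one cache line of LINE_BYTES bytes.
nodes_per_line() {
    ptr_bytes="$1"
    line_bytes="$2"
    node_bytes=$(( 2 * ptr_bytes ))
    echo "$node_bytes $(( line_bytes / node_bytes ))"
}

# Assumed 64-byte cache line; pointers are 8 bytes in lp64, 4 bytes in ilp32.
echo "lp64:  $(nodes_per_line 8 64)"   # prints "lp64:  16 4"
echo "ilp32: $(nodes_per_line 4 64)"   # prints "ilp32: 8 8"
```

Real kernel structures mix pointers with fixed-width fields, so the gain for a given structure is usually smaller than the factor of two shown for this pure-pointer case.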
 
== Why does s64ilp32 simplify CPU design? ==

Yes, there are many historical examples of running 32-bit Linux on 64-bit
hardware, such as Arm Cortex-A35/A53/A55, which implement 32-bit
EL1/EL2/EL3 hardware modes to support 32-bit Linux. We could follow Arm's
approach, but RISC-V can choose a better way. Compared to UXL=32,
MXL=SXL=32 involves a lot of CSR-related hardware functionality, and
mixing that into 64-bit hardware takes significant effort. s64ilp32
runs in MXL=SXL=64 mode, so CPU vendors needn't implement 32-bit
machine and supervisor modes.
 
== How does s64ilp32 work? ==
 
From a hardware point of view, s64ilp32 is the same as s64lp64 compat
mode, i.e., MXL=SXL=64 + UXL=32. Because s64ilp32 uses Linux's
CONFIG_32BIT, it only supports u32ilp32 ABI user space, the current
standard rv32 software ecosystem, and it can't work with the u64lp64 ABI
(I don't want that complex and useless stuff). It may work with u64ilp32
in the future; for now, s64ilp32 depends on the hardware's UXL=32 feature.
 
The 64ilp32 gcc still uses sign-extending lw & auipc to generate address
values, because inserting zero-extension instructions to mask the upper
32 bits would cause significant code-size and performance problems. Thus,
we invented an OS-level approach to solve the problem:
- When satp=bare and the start physical address is < 2GB, there is no
  sign-extended address problem.
- When satp=bare and the start physical address is > 2GB, we need Zjpm-like
  hardware extensions to mask the upper 32 bits.
  (Fortunately, all existing SoCs (D1/D1s/F133, CV1800B, k230, BL808)
    have a start physical address < 2GB.)
- When satp=sv39, we use double mapping so that the sign-extended
  virtual address resolves to the same memory as the zero-extended one.
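
To make the 2GB boundary concrete, here is a small sketch of what sign extension does to a 32-bit address once it sits in a 64-bit register. The helper name sext32 and the two sample addresses are illustrative assumptions; the function only mimics the arithmetic and is not part of the toolchain or kernel.

```shell
#!/bin/sh
# sext32 ADDR
# Prints the 64-bit value obtained by sign-extending the low 32 bits of
# ADDR, which is what a sign-extending 32-bit load leaves in a register.
sext32() {
    low=$(( $1 & 0xffffffff ))
    if [ $(( low & 0x80000000 )) -ne 0 ]; then
        high=ffffffff   # bit 31 set: the upper 32 bits become all ones
    else
        high=00000000   # bit 31 clear: the upper 32 bits stay zero
    fi
    printf '0x%s%08x\n' "$high" "$low"
}

sext32 0x40000000   # below 2GB: prints 0x0000000040000000
sext32 0x80000000   # at 2GB:    prints 0xffffffff80000000
```

Addresses below 2GB come out unchanged, which is why the satp=bare case needs no extra handling there. At or above 2GB the sign-extended and zero-extended forms differ, so either hardware masking or a second mapping is needed for both forms to reach the same memory.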
 
== How to run s64ilp32? ==
 
 
=== GNU toolchain ===
Please use the riscv64-linux-gnu- toolchain from Fedora.
 
=== Opensbi ===
<pre>
<pre>
git clone https://github.com/riscv-software-src/opensbi.git
git clone https://github.com/ruyisdk/k230-rv64ilp32-linux-kernel.git -b k230-6.6-ilp32-128M --depth=1
CROSS_COMPILE=riscv64-linux-gnu- make PLATFORM=generic
cd k230-rv64ilp32-linux-kernel
</pre>
</pre>
=== Linux kernel ===
<pre>
git clone https://github.com/guoren83/linux.git -b s64ilp32
cd linux
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- rv64ilp32_defconfig
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- menuconfig
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- all
</pre>
=== Fedora Rootfs ===
WIP


=== Qemu ===
Build the traditional s64lp64 Linux kernel:
<pre>
<pre>
git clone https://github.com/plctlab/plct-qemu.git -b plct-s64ilp32-dev
make ARCH=riscv CROSS_COMPILE=<YOUR PATH>/riscv/bin/riscv64-unknown-elf- k230_evb_linux_enable_vector_defconfig all
cd plct-qemu
mkdir build
cd build
../qemu/configure --target-list="riscv64-softmmu riscv32-softmmu"
make
</pre>
</pre>
=== Run ===
Build the new Linux kernel; just append 64ilp32.config, for example:
<pre>
<pre>
./qemu-system-riscv64 -cpu rv64 -M virt -m 128m -nographic -bios fw_dynamic.bin -kernel Image -drive file=rootfs.ext2,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -append "rootwait root=/dev/vda ro console=ttyS0 earlycon=sbi" -netdev user,id=net0 -device virtio-net-device,netdev=net0
make ARCH=riscv CROSS_COMPILE=<YOUR PATH>/riscv/bin/riscv64-unknown-elf- k230_evb_linux_enable_vector_defconfig 64ilp32.config all
</pre>
</pre>

Revision as of 18:12, 5 March 2024

What is rv64ilp32?

The advancement of AIoT technology has driven demand for more computing power in microcontrollers (MCUs) and application processors (APs), exposing the limitations of 32-bit architectures. 32-bit memory-access and atomic instructions struggle to meet the requirements of modern systems, which is pushing designs toward 64-bit architectures. This transition has its own cost: running 32-bit software on 64-bit hardware hurts performance because pointer width and register width no longer match. To address this, the Xuantie team at Alibaba DAMO Academy proposed the Relaxed-Addressing Mode and, in collaboration with the PLCT Lab of the Institute of Software, Chinese Academy of Sciences, released the rv64ilp32 toolchain, described as the industry's first product-grade open-source toolchain for the new RISC-V 32-bit ABI. It is designed for firmware, RTOSes, and the Linux kernel, and aims to optimize both performance and cost. The new 32-bit Linux kernel significantly outperforms traditional solutions, with a reported 300% improvement in eBPF performance and a 17% increase in iperf TCP throughput.

The Fedora community has a rich software ecosystem for RISC-V. In the measurements on this page, the traditional 64-bit build uses about 39% more memory than the new 32-bit build, which makes Fedora RISC-V applicable to a wider range of embedded systems. Seeing the potential of the new 32-bit ABI, we started the new 32-bit Fedora Remix project, which can now run on the K230 development board:

K230.jpg

Quick Start

CanMV-K230 Fedora Firmware download:

[Release 2024.03.03-128m · ruyisdk/mkimg-k230-rv64ilp32](https://github.com/ruyisdk/mkimg-k230-rv64ilp32/releases/tag/2024.03.03-128m)

- rv64-canmv-rv64 (s64lp64+u64lp64)
- rv32-canmv-rv64 (s64lp64+u32ilp32)
- rv32-canmv-rv64ilp32 (s64ilp32 + u32ilp32)

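Below is a comparison of memory overhead for each version (compared with the traditional 64-bit k230, the new 32-bit Linux avoids 39% of the memory overhead):

s64lp64 + u64lp64: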

- free -h
-      total used free shared buff/cache available
- Mem: 107Mi 39Mi 15Mi 1.0Mi  52Mi       53Mi

s64lp64 + u32ilp32:

free -h
     total used free shared buff/cache available
Mem: 107Mi 33Mi 31Mi 1.0Mi  41Mi       67Mi

s64ilp32 + u32ilp32:

free -h
     total used free shared buff/cache available
Mem: 108Mi 28Mi 41Mi 1.0Mi  38Mi       73Mi

(used: 39Mi -> 33Mi -> 28Mi; the traditional 64-bit setup uses about 39% more memory than s64ilp32 + u32ilp32)
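
The 39% figure can be read two ways, so a short calculation helps. This sketch uses only the "used" values from the free -h output above; the helper names pct_more and pct_saved are made up for the example.

```shell
#!/bin/sh
# pct_more A B: how much larger A is than B, in percent (one decimal).
pct_more() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

# pct_saved A B: how much smaller B is than A, in percent (one decimal).
pct_saved() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# 39Mi used by s64lp64 + u64lp64, 28Mi used by s64ilp32 + u32ilp32.
pct_more 39 28    # prints 39.3: the 64-bit setup uses about 39% more
pct_saved 39 28   # prints 28.2: the new 32-bit setup uses about 28% less
```

So the quoted 39% is the extra memory the 64-bit setup needs relative to the new 32-bit one; measured against the 64-bit baseline, the saving is about 28%.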

Flash firmware

Decompress the .zst file:

zstd -d k230-sdcard-fedora_rv32-canmv-rv64ilp32.img.zst

Then write the decompressed image to the SD card (/dev/sdb in this example; check the device name on your system first):

wipefs -a /dev/sdb
dd if=k230-sdcard-fedora_rv32-canmv-rv64ilp32.img of=/dev/sdb bs=1M status=progress
sync
eject
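
Because dd overwrites whatever device it is pointed at, a guard before the write step can be useful. The following is a minimal sketch, assuming a Linux host with /proc/mounts; check_target is a hypothetical helper, not part of the released firmware tooling.

```shell
#!/bin/sh
# check_target DEV
# Hypothetical safety check: succeed only if DEV is a block device and
# neither it nor any of its partitions is listed in /proc/mounts.
check_target() {
    dev="$1"
    if [ ! -b "$dev" ]; then
        echo "error: $dev is not a block device" >&2
        return 1
    fi
    if grep -q "^$dev" /proc/mounts 2>/dev/null; then
        echo "error: $dev (or a partition on it) is mounted" >&2
        return 1
    fi
    return 0
}

# Example use (TARGET is an assumption; confirm it with lsblk first):
#   TARGET=/dev/sdb
#   check_target "$TARGET" && dd if=k230-sdcard-fedora_rv32-canmv-rv64ilp32.img of="$TARGET" bs=1M status=progress
```

The check is deliberately conservative: it refuses to proceed rather than trying to unmount anything on the user's behalf.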

Build Linux kernel

Get the toolchain: https://github.com/ruyisdk/riscv-gnu-toolchain-rv64ilp32

Get the Linux kernel:

git clone https://github.com/ruyisdk/k230-rv64ilp32-linux-kernel.git -b k230-6.6-ilp32-128M --depth=1
cd k230-rv64ilp32-linux-kernel

Build the traditional s64lp64 Linux kernel:

make ARCH=riscv CROSS_COMPILE=<YOUR PATH>/riscv/bin/riscv64-unknown-elf- k230_evb_linux_enable_vector_defconfig all

Build the new Linux kernel; just append 64ilp32.config, for example:

make ARCH=riscv CROSS_COMPILE=<YOUR PATH>/riscv/bin/riscv64-unknown-elf- k230_evb_linux_enable_vector_defconfig 64ilp32.config all