Commit cb3abb4

Merge pull request #2608 from ArmDeveloperEcosystem/main
prod update
2 parents d545ced + 01b791a commit cb3abb4

34 files changed: +948 -350 lines changed

Lines changed: 13 additions & 8 deletions
@@ -1,19 +1,24 @@
11
---
2-
title: Overview
2+
title: Profile Linux kernel modules with Arm Streamline
33
weight: 2
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

9-
## Linux kernel profiling with Arm Streamline
9+
## Overview
10+
Performance tuning is just as important for kernel modules as it is for user-space applications. Arm Streamline is a powerful profiling tool that helps you find performance bottlenecks, hotspots, and memory issues - even inside the Linux kernel. In this Learning Path, you'll learn how to use Arm Streamline to profile a simple kernel module on Arm-based systems. You'll see how profiling can reveal optimization opportunities and help you improve your module's efficiency.
11+
## Benefits of profiling Linux kernel modules with Arm Streamline
1012

11-
Performance tuning is not limited to user-space applications—kernel modules can also benefit from careful analysis. [Arm Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer) is a powerful software profiling tool that helps developers understand performance bottlenecks, hotspots, and memory usage, even inside the Linux kernel. This learning path explains how to use Arm Streamline to profile a simple kernel module.
13+
Kernel modules often operate in performance-critical paths, such as device drivers or networking subsystems. Even a small inefficiency in a module can affect the overall system performance.
1214

13-
### Why profile a kernel module?
15+
Profiling enables you to do the following:
1416

15-
Kernel modules often operate in performance-critical paths, such as device drivers or networking subsystems. Even a small inefficiency in a module can affect the overall system performance. Profiling enables you to:
17+
- Identify hotspots (functions consuming most CPU cycles)
18+
- Measure cache and memory behavior
19+
- Understand call stacks for debugging performance issues
20+
- Detect synchronization bottlenecks, such as lock contention
21+
- Quantify the impact of code changes on system latency and throughput
22+
- Validate that optimizations improve performance on Arm-based systems
1623

17-
- Identify hotspots (functions consuming most CPU cycles)
18-
- Measure cache and memory behavior
19-
- Understand call stacks for debugging performance issues
24+
By using Arm Streamline, you gain visibility into how your kernel module interacts with the rest of the system. This insight helps you make data-driven decisions to optimize code paths, reduce resource contention, and ensure your module performs efficiently on Arm platforms. Profiling is especially valuable when porting modules from x86 to Arm, as architectural differences can reveal new optimization opportunities or highlight previously hidden issues.
Lines changed: 42 additions & 25 deletions
@@ -1,45 +1,49 @@
11
---
2-
title: Build Linux image
2+
title: Set up your environment
33
weight: 3
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

9-
## Install packages
9+
## Prepare to build a Linux image with Buildroot
1010

11-
```
11+
Before you build a Linux image with [Buildroot](https://github.com/buildroot/buildroot), make sure your development environment includes all the required packages. These tools and libraries are essential for compiling, configuring, and assembling embedded Linux images on Arm platforms. Installing them now helps you avoid build errors and ensures a smooth workflow.
12+
13+
## Install the required packages for Buildroot
14+
15+
Run the following commands on your AArch64-based Linux system to update your package list and install the necessary dependencies:
16+
17+
```bash
1218
sudo apt update
13-
sudo apt install -y which sed make binutils build-essential diffutils gcc g++ bash patch gzip \
19+
sudo apt install -y which sed make binutils build-essential diffutils gcc g++ bash patch gzip \
1420
bzip2 perl tar cpio unzip rsync file bc findutils gawk libncurses-dev python-is-python3 \
1521
gcc-arm-none-eabi
1622
```
1723

24+
These packages ensure that Buildroot can configure, compile, and assemble all the components needed for your custom Linux image. If you encounter missing package errors during the build process, check your distribution's documentation for any additional dependencies specific to your environment.
25+
26+
1827
## Build a debuggable kernel image
1928

20-
For this learning path you will be using [Buildroot](https://github.com/buildroot/buildroot) to build a Linux image for Raspberry Pi 3B+ with a debuggable Linux kernel. You will profile Linux kernel modules built out-of-tree and Linux device drivers built in the Linux source code tree.
29+
For this Learning Path you'll build a Linux image for Raspberry Pi 3B+ with a debuggable Linux kernel. You'll profile Linux kernel modules built out-of-tree and Linux device drivers built in the Linux source code tree.
30+
31+
{{% notice Note on using a different board%}}
32+
If you're not using a Raspberry Pi 3B+ for this Learning Path, select the default configuration that matches your hardware. Replace `raspberrypi3_64_defconfig` with the appropriate file from the `$(BUILDROOT_HOME)/configs` directory. This ensures Buildroot generates an image compatible with your target board.{{% /notice %}}
2133

22-
1. Clone the Buildroot Repository and initialize the build system with the default configurations.
34+
Start by cloning the Buildroot repository and initializing the build system with the default configuration:
2335

2436
```bash
2537
git clone https://github.com/buildroot/buildroot.git
2638
cd buildroot
2739
export BUILDROOT_HOME=$(pwd)
2840
make raspberrypi3_64_defconfig
29-
```
30-
{{% notice Using a different board %}}
31-
If you're not using a Raspberry Pi 3 for this Learning Path, change the `raspberrypi3_64_defconfig` to the option that matches your hardware in `$(BUILDROOT_HOME)/configs`
32-
{{% /notice %}}
33-
34-
2. You will use `menuconfig` to configure the setup. Invoke it with the following command:
35-
36-
```
3741
make menuconfig
3842
```
3943

40-
![Menuconfig UI for Buildroot configuration](./images/menuconfig.png)
44+
![Buildroot menuconfig interface showing configuration options. The screen displays a blue dialog box with white text and a highlighted menu. The main menu lists Build options, System configuration, Kernel, and Target packages. The Build options section is selected, and sub-options include build packages with debugging symbols, gcc debug level set to debug level 3, and build packages with runtime debugging info. The environment is a text-based terminal interface, typical for embedded Linux development. The tone is technical and instructional, guiding users through enabling debugging features in Buildroot. alt-text#center](./images/menuconfig.png "Buildroot menuconfig interface showing configuration options")
4145

42-
Change Buildroot configurations to enable debugging symbols and SSH access.
46+
Now change the Buildroot configuration to enable debugging symbols and SSH access:
4347

4448
```plaintext
4549
Build options --->
@@ -61,38 +65,51 @@ Target packages --->
6165
[*] openssh
6266
[*] server
6367
[*] key utilities
64-
```
65-
66-
You might also need to change your default `sshd_config` file according to your network settings. To do that, you need to modify System configuration→ Root filesystem overlay directories to add a directory that contains your modified `sshd_config` file.
68+
You might need to update your default `sshd_config` file to match your network requirements. To do this, set the **Root filesystem overlay directories** option in the **System configuration** menu. Add a directory containing your customized `sshd_config` file. This ensures your SSH server uses the correct settings when the image boots.
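
As a rough illustration of the overlay approach, the sketch below creates a directory that mirrors the target root filesystem and places a custom `sshd_config` in it. The helper name `make_ssh_overlay`, the overlay location, and the two `sshd_config` settings are assumptions chosen for the example, not values from this Learning Path. It also assumes the SSH server reads its configuration from `/etc/ssh/sshd_config` on the target.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a root filesystem overlay that carries a
# custom sshd_config. The overlay tree mirrors the target's root filesystem.
make_ssh_overlay() {
    local overlay_dir=$1

    # Assumed location of the SSH server configuration on the target.
    mkdir -p "$overlay_dir/etc/ssh"

    # Example settings only; adjust them to your own network requirements.
    cat > "$overlay_dir/etc/ssh/sshd_config" <<'EOF'
PermitRootLogin yes
PasswordAuthentication yes
EOF
}

# Example call (the path is an assumption, any directory works):
# make_ssh_overlay "$HOME/rpi-overlay"
```

You would then enter the overlay directory path in the **Root filesystem overlay directories** option so that Buildroot copies its contents over the generated root filesystem.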
6769
68-
3. By default the Linux kernel images are stripped. You will need to make the image debuggable as you'll be using it later.
70+
By default, Linux kernel images are stripped of debugging information. To make the image debuggable for profiling, you need to adjust the kernel build settings.
6971
70-
Invoke `linux-menuconfig` and uncheck the option as shown.
72+
Open the kernel configuration menu using the following command:
7173
7274
```bash
7375
make linux-menuconfig
7476
```
7577

78+
In the menu, navigate to **Kernel hacking** and ensure that debugging options are enabled. Uncheck any option that reduces debugging information. This step preserves the symbols needed for effective kernel debugging and profiling:
79+
7680
```plaintext
7781
Kernel hacking --->
7882
-*- Kernel debugging
7983
Compile-time checks and compiler options --->
8084
Debug information (Rely on the toolchain's implicit default DWARF version)
8185
[ ] Reduce debugging information # un-check
8286
```
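
If you want to double-check the result after saving, one option is to inspect the kernel `.config` file that the menu writes. The sketch below is a minimal check, assuming the menu entries above map to the kernel configuration symbols `CONFIG_DEBUG_INFO` and `CONFIG_DEBUG_INFO_REDUCED`. The helper name `check_debug_config` is hypothetical, and the location of the `.config` file inside your Buildroot output directory depends on the kernel version being built.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether a kernel .config keeps full debug info.
# Usage: check_debug_config /path/to/kernel/.config
check_debug_config() {
    local config=$1

    # Debug info must be enabled and the "reduced" option must not be set.
    if grep -q '^CONFIG_DEBUG_INFO=y' "$config" &&
       ! grep -q '^CONFIG_DEBUG_INFO_REDUCED=y' "$config"; then
        echo "full debug info enabled"
    else
        echo "debug info missing or reduced"
        return 1
    fi
}
```

The function returns a non-zero status when the check fails, so it can also be used as a guard in a build script.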
87+
Now you're ready to build the Linux image and flash it to your SD card for use with the Raspberry Pi.
8388

84-
4. Now you can build the Linux image and flash it to the the SD card to run it on the Raspberry Pi.
89+
Run the following command to start the build process:
8590

8691
```bash
8792
make -j$(nproc)
8893
```
8994

90-
It will take some time to build the Linux image. When it completes, the output will be in `$BUILDROOT_HOME/output/images/sdcard.img`:
95+
This step can take a while, depending on your system's performance. When the build finishes, you'll find the generated image at `$BUILDROOT_HOME/output/images/sdcard.img`.
96+
97+
To confirm that Buildroot created the SD card image, list the contents of the output directory:
9198

9299
```bash
93100
ls $BUILDROOT_HOME/output/images/ | grep sdcard.img
94101
```
95102

96-
For details on flashing the SD card image, see [this helpful article](https://www.ev3dev.org/docs/tutorials/writing-sd-card-image-ubuntu-disk-image-writer/).
103+
The expected output is:
104+
105+
```output
106+
sdcard.img
107+
```
108+
109+
If you see `sdcard.img` listed, the image is ready. You can now flash it to your SD card and boot your Raspberry Pi with a debuggable Linux kernel.
110+
111+
For details on flashing the SD card image, see the article [Writing an SD Card Image Using Ubuntu Disk Image Writer](https://www.ev3dev.org/docs/tutorials/writing-sd-card-image-ubuntu-disk-image-writer/).
112+
113+
## What you've accomplished and what's next
97114

98-
Now that you have a target running Linux with a debuggable kernel image, you can start writing your kernel module that you want to profile.
115+
You've now successfully built a custom Linux image with debugging features enabled and flashed it to your Raspberry Pi. This milestone means your development board is now running a kernel that's ready for profiling and advanced debugging. Great job! Setting up a debuggable environment is a significant step in embedded Linux development. Next, you'll write and profile your own kernel module, building on the solid foundation you've established.

content/learning-paths/embedded-and-microcontrollers/streamline-kernel-module/3_oot_module.md

Lines changed: 37 additions & 37 deletions
@@ -1,18 +1,20 @@
11
---
2-
title: Build out-of-tree kernel module
2+
title: Build the out-of-tree kernel module
33
weight: 4
44

55
### FIXED, DO NOT MODIFY
66
layout: learningpathall
77
---
88

9-
## Creating the Linux Kernel Module
9+
## Develop a cache-unfriendly Linux kernel module to analyze with Arm Streamline
1010

11-
You will now create an example Linux kernel module (Character device) that demonstrates a cache miss issue caused by traversing a 2D array in column-major order. This access pattern is not cache-friendly, as it skips over most of the neighboring elements in memory during each iteration.
11+
You'll create a simple Linux kernel module that acts as a character device and intentionally causes cache misses. It does this by traversing a two-dimensional array in column-major order, which is inefficient for the CPU cache because it jumps between non-adjacent memory locations. This pattern helps you see how cache-unfriendly code can slow down performance, making it easier to analyze with Arm Streamline.
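
To get a feel for why this access pattern is expensive, the following back-of-the-envelope sketch computes how far apart consecutive accesses land in memory. It rests on assumptions that are not taken from the module source: rows stored contiguously, 4-byte elements, and 64-byte cache lines. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Assumptions (illustrative only): 4-byte elements, 64-byte cache lines,
# and an n x n array whose rows are stored one after another in memory.

# Bytes between consecutive accesses when walking down a single column.
column_stride_bytes() {
    local n=$1 elem_size=${2:-4}
    echo $(( n * elem_size ))
}

# Elements that fit in one cache line. A row-major walk reuses each
# fetched line this many times before it needs the next one.
elems_per_cache_line() {
    local elem_size=${1:-4} line_size=${2:-64}
    echo $(( line_size / elem_size ))
}

column_stride_bytes 10000   # prints 40000
elems_per_cache_line        # prints 16
```

Under these assumptions, each step down a column of a 10000 x 10000 array jumps 40000 bytes, far more than one cache line, so nearly every access needs a different line. A row-major walk would instead use 16 consecutive elements from each line it fetches.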
1212

13-
To build the Linux kernel module, start by creating a new directory, for example `example_module`. Inside this directory, add two files: `mychardrv.c` and `Makefile`.
13+
To build the Linux kernel module, start by creating a new directory, for example `example_module`. Inside this directory, add two files: `Makefile` and `mychardrv.c`.
1414

15-
**Makefile**
15+
## Makefile
16+
17+
To build your Linux kernel module for Arm, you need a Makefile that instructs the build system how to compile and link your code against the kernel source. The Makefile below is designed for use with Buildroot and cross-compiles your module for the aarch64 architecture. Update the `BUILDROOT_OUT` variable to match your Buildroot output directory before running the build:
1618

1719
```makefile
1820
obj-m += mychardrv.o
@@ -28,11 +30,10 @@ clean:
2830
$(MAKE) -C $(KDIR) M=$(PWD) clean
2931
```
3032

31-
{{% notice Note %}}
32-
Change **BUILDROOT_OUT** to the correct buildroot output directory on your host machine.
33-
{{% /notice %}}
3433

35-
**mychardrv.c**
34+
## mychardrv.c
35+
36+
The `mychardrv.c` file contains the source code for your custom Linux kernel module. This module implements a simple character device that demonstrates cache-unfriendly behavior by traversing a two-dimensional array in column-major order. The code below allocates and initializes the array, performs the cache-unfriendly traversal, and exposes a write interface for user input. You’ll use this module to generate measurable performance bottlenecks for analysis with Arm Streamline.
3637

3738
```c
3839
// SPDX-License-Identifier: GPL-2.0
@@ -203,55 +204,54 @@ MODULE_DESCRIPTION("A simple char driver with cache misses issue");
203204
204205
The module above receives the size of a 2D array as a string through the `char_dev_write()` function, converts it to an integer, and passes it to the `char_dev_cache_traverse()` function. This function then creates the 2D array, initializes it with simple data, traverses it in column-major (cache-unfriendly) order, computes the sum of its elements, and prints the result to the kernel log. The cache-unfriendly access pattern lets you inspect a bottleneck using Streamline in the next section.
205206
206-
## Building and Running the Kernel Module
207+
## Build and run the kernel module
207208
208-
1. To compile the kernel module, run make inside the example_module directory. This will generate the output file `mychardrv.ko`.
209+
To compile the kernel module, run `make` inside the `example_module` directory. This generates the output file `mychardrv.ko`.
209210
210-
2. Transfer the .ko file to the target using scp command and then insert it using insmod command. After inserting the module, you create a character device node using mknod command. Finally, you can test the module by writing a size value (e.g., 10000) to the device file and measuring the time taken for the operation using the `time` command.
211+
Transfer the kernel module (`mychardrv.ko`) to your target device using the `scp` command. Then, insert the module with `insmod` and create the character device node with `mknod`. To test the module, write a size value (such as 10000) to the device file and use the `time` command to measure how long the operation takes. This lets you see the performance impact of the cache-unfriendly access pattern in a clear, hands-on way:
211212
212-
```bash
213+
```bash
213214
scp mychardrv.ko root@<target-ip>:/root/
214-
```
215+
```
215216

216-
{{% notice Note %}}
217-
Replace \<target-ip> with your target's IP address
218-
{{% /notice %}}
217+
{{% notice Note %}} Replace `<target-ip>` with your target's IP address.
218+
{{% /notice %}}
219219

220-
3. SSH onto your target device:
220+
SSH onto your target device:
221221

222-
```bash
222+
```bash
223223
ssh root@<your-target-ip>
224-
```
224+
```
225225

226-
4. Execute the following commands on the target to run the module:
227-
```bash
226+
Execute the following commands on the target to run the module:
227+
```bash
228228
insmod /root/mychardrv.ko
229229
mknod /dev/mychardrv c 42 0
230-
```
230+
```
231+
232+
{{% notice Note %}}
233+
42 and 0 are the major and minor numbers specified in the module code above.
234+
{{% /notice %}}
231235

232-
{{% notice Note %}}
233-
42 and 0 are the major and minor number specified in the module code above
234-
{{% /notice %}}
236+
To confirm that your kernel module is loaded and running, use the `dmesg` command. You should see a message like this in the output:
235237

236-
4. To verify that the module is active, run `dmesg` and the output should match the below:
237238

238-
```bash
239+
```bash
239240
dmesg
240-
```
241+
```
241242

242-
```output
243+
```output
243244
[12381.654983] mychardrv is open - Major(42) Minor(0)
244-
```
245+
```
245246

246-
5. To make sure it's working as expected you can use the following command:
247+
To make sure it's working as expected, run the following command:
247248

248-
```bash { output_lines = "2-4" }
249+
```bash { output_lines = "2-4" }
249250
time echo '10000' > /dev/mychardrv
250251
# real 0m 38.04s
251252
# user 0m 0.00s
252253
# sys 0m 38.03s
253-
```
254-
255-
The command above passes 10000 to the module, which specifies the size of the 2D array to be created and traversed. The **echo** command takes a long time to complete (around 38 seconds) due to the cache-unfriendly traversal implemented in the `char_dev_cache_traverse()` function.
254+
```
255+
The command above sends the value 10000 to your kernel module, which sets the size of the 2D array it creates and traverses. Because the module accesses the array in a cache-unfriendly way, the `echo` command takes a noticeable amount of time to finish - it's about 38 seconds in this example. This delay is expected and highlights the performance impact of inefficient memory access patterns, making it easy to analyze with Arm Streamline.
256256

257-
With the kernel module built, the next step is to profile it using Arm Streamline. You will use it to capture runtime behavior, highlight performance bottlenecks, and help identifying issues such as the cache-unfriendly traversal in your module.
257+
You have successfully built and run your kernel module. In the next section, you'll profile it using Arm Streamline to capture runtime behavior, identify performance bottlenecks, and observe the effects of cache-unfriendly access patterns. This analysis will help you understand and optimize your code.
