---
title: Profile Linux kernel modules with Arm Streamline
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---


## Overview

Performance tuning is just as important for kernel modules as it is for user-space applications. [Arm Streamline](https://developer.arm.com/Tools%20and%20Software/Streamline%20Performance%20Analyzer) is a software profiling tool that helps you find performance bottlenecks, hotspots, and memory issues, even inside the Linux kernel. In this Learning Path, you'll use Arm Streamline to profile a simple kernel module on an Arm-based system and see how profiling reveals optimization opportunities that improve your module's efficiency.

## Benefits of profiling Linux kernel modules with Arm Streamline

Kernel modules often operate in performance-critical paths, such as device drivers or networking subsystems. Even a small inefficiency in a module can affect overall system performance.

Profiling enables you to do the following:

- Identify hotspots (functions consuming the most CPU cycles)
- Measure cache and memory behavior
- Understand call stacks for debugging performance issues
- Detect synchronization bottlenecks, such as lock contention
- Quantify the impact of code changes on system latency and throughput
- Validate that optimizations improve performance on Arm-based systems

By using Arm Streamline, you gain visibility into how your kernel module interacts with the rest of the system. This insight helps you make data-driven decisions to optimize code paths, reduce resource contention, and ensure that your module performs efficiently on Arm platforms. Profiling is especially valuable when porting modules from x86 to Arm, because architectural differences can reveal new optimization opportunities or expose previously hidden issues.


Before you build a Linux image with [Buildroot](https://github.com/buildroot/buildroot), make sure your development environment includes all the required packages. These tools and libraries are needed to compile, configure, and assemble embedded Linux images for Arm platforms, and installing them now helps you avoid build errors later.

## Install the required packages for Buildroot

Run the following commands on your AArch64-based Linux system to update your package list and install the necessary dependencies:

```bash
sudo apt update
sudo apt install -y which sed make binutils build-essential diffutils gcc g++ bash patch gzip \
bzip2 perl tar cpio unzip rsync file bc findutils gawk libncurses-dev python-is-python3 \
gcc-arm-none-eabi
```

These packages let Buildroot configure, compile, and assemble all the components of your custom Linux image. If the build reports missing packages, check your distribution's documentation for additional dependencies specific to your environment.

## Build a debuggable kernel image

For this Learning Path, you'll use Buildroot to build a Linux image for the Raspberry Pi 3B+ with a debuggable Linux kernel. You'll profile Linux kernel modules built out-of-tree and Linux device drivers built in the Linux source code tree.

{{% notice Note on using a different board %}}
If you're not using a Raspberry Pi 3B+ for this Learning Path, select the default configuration that matches your hardware. Replace `raspberrypi3_64_defconfig` with the appropriate file from the `$BUILDROOT_HOME/configs` directory. This ensures that Buildroot generates an image compatible with your target board.
{{% /notice %}}

Start by cloning the Buildroot repository and initializing the build system with the default configuration for your board.

Next, open the Buildroot configuration menu with `menuconfig`:

```bash
make menuconfig
```

Now change the Buildroot configuration to enable debugging symbols and SSH access:

```plaintext
Build options --->
Target packages --->
[*] openssh
[*] server
[*] key utilities
```

You might need to update the default `sshd_config` file to match your network requirements. To do this, set the **Root filesystem overlay directories** option in the **System configuration** menu to a directory that contains your customized file, placed at `etc/ssh/sshd_config` relative to that directory. Buildroot copies the overlay onto the root filesystem, so the SSH server uses your settings when the image boots.

By default, Linux kernel images are stripped of debugging information. To make the image debuggable for profiling, adjust the kernel build settings.

Open the kernel configuration menu:

```bash
make linux-menuconfig
```

In the menu, navigate to **Kernel hacking** and make sure that kernel debugging is enabled. Then uncheck **Reduce debugging information** so that the build keeps the full symbols needed for kernel debugging and profiling:

```plaintext
Kernel hacking --->
-*- Kernel debugging
Compile-time checks and compiler options --->
Debug information (Rely on the toolchain's implicit default DWARF version)
[ ] Reduce debugging information # uncheck
```

Now you're ready to build the Linux image and flash it to your SD card for use with the Raspberry Pi.

Run the following command to start the build:

```bash
make -j$(nproc)
```

The build can take a while, depending on your system's performance. When it finishes, you'll find the generated image at `$BUILDROOT_HOME/output/images/sdcard.img`.

To confirm that Buildroot created the SD card image, list the contents of the output directory:

```bash
ls $BUILDROOT_HOME/output/images/ | grep sdcard.img
```

The expected output is:

```output
sdcard.img
```

If `sdcard.img` is listed, your image is ready. Flash it to your SD card and boot your Raspberry Pi with the debuggable Linux kernel.

For details on flashing the SD card image, see the article [Writing an SD Card Image Using Ubuntu Disk Image Writer](https://www.ev3dev.org/docs/tutorials/writing-sd-card-image-ubuntu-disk-image-writer/).

## What you've accomplished and what's next

You've built a custom Linux image with kernel debugging information enabled and flashed it to your Raspberry Pi, so your board now runs a kernel that's ready for profiling and debugging. Next, you'll write the kernel module that you'll profile.

---
title: Build the out-of-tree kernel module
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Develop a cache-unfriendly Linux kernel module to analyze with Arm Streamline

You'll create a simple Linux kernel module that acts as a character device and intentionally causes cache misses. It does this by traversing a two-dimensional array in column-major order. C stores such an array row by row, so each step down a column jumps to a non-adjacent memory location and uses little of the cache line the CPU just fetched. This pattern shows how cache-unfriendly code slows down performance and makes the effect easy to analyze with Arm Streamline.

To build the Linux kernel module, start by creating a new directory, for example `example_module`. Inside this directory, add two files: `Makefile` and `mychardrv.c`.

## Makefile

To build your Linux kernel module for Arm, you need a Makefile that tells the kernel build system how to compile and link your code against the kernel source. The Makefile below is designed for use with Buildroot and cross-compiles your module for the AArch64 architecture. Before running the build, update the `BUILDROOT_OUT` variable to match the Buildroot output directory on your host machine:

```makefile
obj-m += mychardrv.o
clean:
	$(MAKE) -C $(KDIR) M=$(PWD) clean
```

## mychardrv.c

The `mychardrv.c` file contains the source code for your custom Linux kernel module. This module implements a simple character device that demonstrates cache-unfriendly behavior by traversing a two-dimensional array in column-major order. The code below allocates and initializes the array, performs the cache-unfriendly traversal, and exposes a write interface for user input. You'll use this module to generate measurable performance bottlenecks for analysis with Arm Streamline.

The module above receives the size of a 2D array as a string through the `char_dev_write()` function, converts it to an integer, and passes it to the `char_dev_cache_traverse()` function. That function creates the 2D array, initializes it with simple data, traverses it in column-major (cache-unfriendly) order, computes the sum of its elements, and prints the result to the kernel log. This cache-unfriendly access pattern creates the bottleneck that you'll inspect with Streamline in the next section.

## Build and run the kernel module

To compile the kernel module, run `make` inside the `example_module` directory. This generates the output file `mychardrv.ko`.

Transfer the kernel module (`mychardrv.ko`) to your target device using `scp`. Then insert the module with `insmod` and create the character device node with `mknod`. To test the module, write a size value (such as 10000) to the device file and use the `time` command to measure how long the operation takes. This shows the performance impact of the cache-unfriendly access pattern directly:

```bash
scp mychardrv.ko root@<target-ip>:/root/
```

{{% notice Note %}}
Replace `<target-ip>` with your target's IP address.
{{% /notice %}}

SSH onto your target device:

```bash
ssh root@<target-ip>
```

Run the following commands on the target to load the module and create the device node:

```bash
insmod /root/mychardrv.ko
mknod /dev/mychardrv c 42 0
```

{{% notice Note %}}
42 and 0 are the major and minor numbers specified in the module code above.
{{% /notice %}}

To confirm that your kernel module is loaded, run `dmesg`:

```bash
dmesg
```

You should see a message like this in the output:

```output
[12381.654983] mychardrv is open - Major(42) Minor(0)
```

To check that the module works as expected, write a size value to the device and time the operation:

```bash { output_lines = "2-4" }
time echo '10000' > /dev/mychardrv
# real 0m 38.04s
# user 0m 0.00s
# sys 0m 38.03s
```

The command above sends the value 10000 to your kernel module, which sets the size of the 2D array that it creates and traverses. Because the module accesses the array in a cache-unfriendly order, the `echo` command takes a noticeable time to finish: about 38 seconds in this example. Almost all of that time is reported as `sys` time, which shows that it is spent in kernel code rather than in user space. This delay is expected, and it makes the cost of inefficient memory access easy to analyze with Arm Streamline.

You have successfully built and run your kernel module. In the next section, you'll profile it with Arm Streamline to capture its runtime behavior, identify performance bottlenecks, and observe the effects of the cache-unfriendly access pattern. This analysis helps you understand and optimize your code.