Commit 7e14df9

Merge pull request #2611 from jasonrandrews/review
remove backticks in heading causing very small font
2 parents d37f57a + e47091a commit 7e14df9

7 files changed: +32 -27 lines changed

content/learning-paths/cross-platform/adler32/summary-10.md

Lines changed: 4 additions & 4 deletions
@@ -47,13 +47,13 @@ The project includes:

 ### Implementations

-#### 1. Simple Implementation (`adler32-simple.c`)
+#### 1. Simple Implementation

-This is a straightforward C implementation following the standard Adler-32 algorithm definition. It processes the input data byte by byte, updating two 16-bit accumulators (`a` and `b`) modulo 65521 (the largest prime smaller than 2^16).
+The code in `adler32-simple.c` is a straightforward C implementation following the standard Adler-32 algorithm definition. It processes the input data byte by byte, updating two 16-bit accumulators (`a` and `b`) modulo 65521 (the largest prime smaller than 2^16).

-#### 2. NEON-Optimized Implementation (`adler32-neon.c`)
+#### 2. NEON-Optimized Implementation

-This implementation leverages ARM NEON SIMD (Single Instruction, Multiple Data) instructions to accelerate the checksum calculation. Key aspects include:
+The code in `adler32-neon.c` leverages ARM NEON SIMD (Single Instruction, Multiple Data) instructions to accelerate the checksum calculation. Key aspects include:

 * Processing data in blocks (16 bytes at a time).
 * Using NEON intrinsics (`vld1q_u8`, `vmovl_u8`, `vaddq_u16`, `vpaddlq_u16`, `vmulq_u16`, etc.) to perform parallel operations on data vectors.
 * Calculating the sums `S1` (sum of bytes) and `S2` (weighted sum) for each block using vector operations.
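The standard definition described in this hunk, and the per-block `S1`/`S2` sums listed for the NEON version, can be illustrated with a minimal portable C sketch. It is not the code from `adler32-simple.c` or `adler32-neon.c`: the function names `adler32_simple` and `adler32_blocked` are hypothetical, and the blocked variant is a scalar rendering of the block recurrence (a' = a + S1, b' = b + n*a + S2, all modulo 65521) that a SIMD version can vectorize, assuming 16-byte blocks as described above.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime smaller than 2^16 */

/* Byte-by-byte Adler-32: a starts at 1, b at 0; for each byte,
   a += byte and then b += a, both modulo 65521. */
uint32_t adler32_simple(const uint8_t *data, size_t len) {
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Scalar sketch of the block formulation a SIMD version can exploit.
   For a block d[0..n-1] and running sums (a, b):
     S1 = sum of d[i]               (sum of bytes)
     S2 = sum of (n - i) * d[i]     (weighted sum)
     b' = b + n * a + S2            (uses a from before the block)
     a' = a + S1                    (all modulo 65521) */
uint32_t adler32_blocked(const uint8_t *data, size_t len) {
    const size_t BLOCK = 16;
    uint32_t a = 1, b = 0;
    size_t pos = 0;
    while (pos < len) {
        size_t n = (len - pos < BLOCK) ? len - pos : BLOCK;
        uint32_t s1 = 0, s2 = 0;
        for (size_t i = 0; i < n; i++) {
            s1 += data[pos + i];
            s2 += (uint32_t)(n - i) * data[pos + i];
        }
        /* Update b first: it depends on the value of a before this block. */
        b = (b + (uint32_t)n * a + s2) % ADLER_MOD;
        a = (a + s1) % ADLER_MOD;
        pos += n;
    }
    return (b << 16) | a;
}
```

Both functions return 0x11E60398 for the 9-byte ASCII string "Wikipedia". The per-block bounds (S1 at most 16 * 255, S2 at most 255 * 136) keep every intermediate value well inside 32 bits before the modulo reduction.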

content/learning-paths/embedded-and-microcontrollers/cmsis_rtx_vs/initialize.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ When setting up the project's Run-Time Environment, ensure you add the appropria

 Once this is done, the `RTX5` initialization code is typically the same. It involves setting up the `SysTick` timer with the [SystemCoreClockUpdate()](https://www.keil.com/pack/doc/CMSIS/Core/html/group__system__init__gr.html#gae0c36a9591fe6e9c45ecb21a794f0f0f) function, then initializing and starting the RTOS.

-## Create `main()`
+## Create the main() function

 Return to the `CMSIS` view.

content/learning-paths/embedded-and-microcontrollers/cmsis_rtx_vs/threads.md

Lines changed: 2 additions & 1 deletion
@@ -11,7 +11,7 @@ In this step, you will implement the main RTOS thread (`app_main`), which is pri

 You will create three threads. The number and naming of the threads are flexible, so feel free to adjust as needed.

-## Create `app_main`
+## Create the app_main() function

 Click on the `+` icon within the `Source Files` Group, and add a new file `app_main.c`. Populate with the below.

@@ -28,6 +28,7 @@ void app_main (void *argument) {
 osThreadNew(thread3, NULL, NULL); // Create thread3
 }
 ```
+
 ## Create Threads

 Now you can implement the functionality of the threads themselves. Start with a simple example. Each thread will say hello, and then pause for a period, forever.

content/learning-paths/laptops-and-desktops/windowsperf_sampling_cpython/windowsperf_sampling_cpython_example_2.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ title: WindowsPerf record example
 weight: 4
 ---

-## Example 2: Using the `record` command to simplify things
+## Example 2: Simplify the steps using the record command

 The `record` command spawns the process and pins it to the core specified by the `-c` option. You can either use `--pe_file` to let WindowsPerf know which process to spawn or simply add the process to spawn at the very end of the `wperf` command.

content/learning-paths/mobile-graphics-and-gaming/android_opencv_kleidicv/process-images.md

Lines changed: 9 additions & 5 deletions
@@ -18,7 +18,8 @@ This Learning Path uses a [cameraman image](https://github.com/antimatter15/came

 For easier navigation between files in Android Studio, use the **Project** menu option from the project browser pane.

-## `ImageOperation`
+## Create the ImageOperation class
+
 You will now create an enum class, which is an enumeration, for a set of image processing operations in an application that uses the OpenCV library.

 In the `src/main/java/com/arm/arm64kleidicvdemo` file directory, add the `ImageOperation.kt` file, and modify it as follows:
@@ -89,7 +90,8 @@ Generally, only single-channel images are supported; with Gaussian blur being an

 There is also the companion object that provides a utility method `fromDisplayName`. This function maps the string `displayName` to its corresponding enum constant by iterating through the list of all enum values, and returns null if no match is found.

-## `ImageProcessor`
+## Create the ImageProcessor class
+
 Now add the `ImageProcessor.kt`:

 ```Kotlin
@@ -110,7 +112,7 @@ The `ImageProcessor` class acts as a simple orchestrator for image processing ta

 This design is clean and modular, allowing developers to easily add new processing operations or reuse the `ImageProcessor` in different parts of an application. It aligns with object-oriented principles by promoting encapsulation and reducing processing logic complexity.

-## `PerformanceMetrics`
+## Create the PerformanceMetrics class
 Now supplement the project with the `PerformanceMetrics.kt` file:

 ```Kotlin
@@ -150,7 +152,8 @@ The `PerformanceMetrics` class analyzes and summarizes performance measurements,

 By encapsulating the raw data `durationsNano` and exposing only meaningful metrics through computed properties, the class ensures clear separation of data and functionality. The overridden `toString` method makes it easy to generate a human-readable summary for reporting or debugging purposes. You can use this method to report the performance metrics to the user.

-## `MainActivity`
+## Create the MainActivity
+
 You can now move on to modify `MainActivity.kt` as follows:

 ```Kotlin
@@ -332,7 +335,8 @@ The activity also implements several helper methods:
 7. `measureOperationTime` - Measures the execution time of an operation in nanoseconds using System.nanoTime().
 8. `displayProcessedImage`. This method converts the processed Mat back to a Bitmap for display and updates the ImageView with the processed image.

-## `Databinding`
+## Add Databinding
+
 Finally, modify `build.gradle.kts` by adding the databinding under build features:

 ```JSON

content/learning-paths/mobile-graphics-and-gaming/using-neon-intrinsics-to-optimize-unity-on-android/10-appendix.md

Lines changed: 10 additions & 10 deletions
@@ -5,60 +5,60 @@ weight: 11
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
-## The Neon intrinsics we used
+## The Neon intrinsics used
 Here is a breakdown of some of the Neon intrinsics that were used to optimize the AABB collision detection in the function called `NeonAABBObjCollisionDetectionUnrolled`. It can be found at line **718** in _CollisionCalculationScript.cs_.

 `NeonAABBObjCollisionDetectionUnrolled` performs the collision detection between the characters and the walls. The outer loop iterates through all of the characters, while the inner loop iterates through the walls. The result is an array of boolean values (**true** denotes a collision has occurred) which tells us which characters have collided with which walls.

-### `Unity.Burst.Intrinsics.v64` (loading data into a vector register)
+`Unity.Burst.Intrinsics.v64` (loading data into a vector register)
 ```
 Line 721: var tblindex1 = new Unity.Burst.Intrinsics.v64((byte)0, 4, 8, 12, 255, 255, 255, 255)
 ```
 Create a 64-bit vector with 8 8-bit elements with values (0, 4, 8, 12, 255, 255, 255, 255). This is used as a lookup table on _line 741_.

-### `vdupq_n_f32`
+`vdupq_n_f32`
 ```
 Line 728: charMaxXs = vdupq_n_f32(*(characters + c))
 ```
 Duplicate floating point values into all 4 lanes of the 128-bit returned vector. The returned vector will contain 4 copies of a single _Max X_ value (of character bounds).

-### `vld1q_f32`
+`vld1q_f32`
 ```
 Line 736: wallMinXs = vld1q_f32(walls + w)
 ```
 Load multiple floating point values from memory into a single vector register. The returned vector will contain _Min X_ values from 4 different walls.

-### `vcgeq_f32`
+`vcgeq_f32`
 ```
 Line 741: vcgeq_f32(wallMinXs, charMaxXs)
 ```
 Floating point comparisons (greater-than or equal). It compares 4 walls at once with a character's _Max X_. Each of the four results will either be all ones (true) or all zeros (false).

-### `vorrq_u32`
+`vorrq_u32`
 ```
 Line 741: vorrq_u32(vcgeq_f32(wallMinXs, charMaxXs), vcgeq_f32(wallMinYs, charMaxYs))
 ```
 Bitwise inclusive OR. The nested calls to `vcgeq\_f32` are comparing the walls (Min X and Min Y) against the characters' Max X and Max Y. The four comparison results are combined with a bitwise OR.

-### `vqtbl1_u8`
+`vqtbl1_u8`
 ```
 Line 741: results = vqtbl1_u8(_result of ORs_, tblindex1)
 ```
 Table lookup function that selects elements from an array based on the indices provided. The result of the OR operations will be treated as an array of 8-bit values. The values from _tblindex1_ (0, 4, 8 and 12) ensure that we select the most significant bytes from each u32 OR result. So 4 character-wall comparisons are being merged into one 128-bit vector along with 4 dummies (because of the _out of range_ values in tblindex1) that will be replaced later with the 64-bit value of the next 4 wall comparisons (from _wmvn_u8_).

-### `vqtbx1_u8`
+`vqtbx1_u8`
 ```
 Line 751: vqtbx1_u8(results, …)
 ```
 Table lookup function except when an index is out of range as it leaves the existing data alone. This has the effect of selecting 4 bytes (using indices from _tblindex2_) from the results of the last 4 comparisons and combines them with the previous 4 results. _results_ will now contain the results of 8 wall-character comparisons.

-### `vmvn_u8`
+`vmvn_u8`
 ```
 Line 751: results = vmvn_u8(...)
 ```
 Bitwise NOT operation. This negates each of the 8 character-wall comparisons. It is effectively the _!_ (NOT) in our [AABB intersection function](/learning-paths/mobile-graphics-and-gaming/using-neon-intrinsics-to-optimize-unity-on-android/5-the-optimizations#the-aabb-intersection-function) except that it is working on 8 results instead of 1.

-### `Unity.Burst.Intrinsics.v64` (storing to memory)
+`Unity.Burst.Intrinsics.v64` (storing to memory)
 ```
 Line 755: *(Unity.Burst.Intrinsics.v64*)(collisions + (c * numWalls + w - 4)) = results;
 ```
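The lane-level behavior described in this appendix (compare, OR, table lookup with out-of-range indices, and NOT) can be modeled in portable scalar C, which may help when checking the Burst code by hand. This is a sketch, not the C# in _CollisionCalculationScript.cs_: the helper names `tbl1_u8`, `tbx1_u8`, `separated_masks` and `pack_results` are hypothetical, `separated_masks` models only the two `vcgeq_f32` comparisons shown for line 741, and `tblindex2` is assumed to be (255, 255, 255, 255, 0, 4, 8, 12) because its values are not listed here.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of vqtbl1_u8: out[i] = table[idx[i]], or 0 when idx[i] >= 16. */
void tbl1_u8(const uint8_t table[16], const uint8_t idx[8], uint8_t out[8]) {
    for (int i = 0; i < 8; i++)
        out[i] = (idx[i] < 16) ? table[idx[i]] : 0;
}

/* Scalar model of vqtbx1_u8: like tbl1_u8, but a lane whose index is
   out of range keeps its existing value instead of becoming 0. */
void tbx1_u8(uint8_t out[8], const uint8_t table[16], const uint8_t idx[8]) {
    for (int i = 0; i < 8; i++)
        if (idx[i] < 16)
            out[i] = table[idx[i]];
}

/* Model of vorrq_u32(vcgeq_f32(...), vcgeq_f32(...)) for one character
   against 4 walls: a lane is all ones if wallMinX >= charMaxX OR
   wallMinY >= charMaxY, otherwise all zeros. */
void separated_masks(float char_max_x, float char_max_y,
                     const float wall_min_x[4], const float wall_min_y[4],
                     uint32_t mask[4]) {
    for (int i = 0; i < 4; i++) {
        uint32_t gx = (wall_min_x[i] >= char_max_x) ? 0xFFFFFFFFu : 0u;
        uint32_t gy = (wall_min_y[i] >= char_max_y) ? 0xFFFFFFFFu : 0u;
        mask[i] = gx | gy;
    }
}

/* Pack two groups of 4 lane masks into 8 bytes (tbl, then tbx),
   then invert every byte as vmvn_u8 does. */
void pack_results(const uint32_t first[4], const uint32_t second[4],
                  uint8_t results[8]) {
    static const uint8_t tblindex1[8] = {0, 4, 8, 12, 255, 255, 255, 255};
    /* Assumed layout: fill the 4 lanes that tblindex1 left as dummies. */
    static const uint8_t tblindex2[8] = {255, 255, 255, 255, 0, 4, 8, 12};
    uint8_t t1[16], t2[16];
    memcpy(t1, first, sizeof t1);   /* view 4 x u32 as 16 x u8 */
    memcpy(t2, second, sizeof t2);
    tbl1_u8(t1, tblindex1, results);
    tbx1_u8(results, t2, tblindex2);
    for (int i = 0; i < 8; i++)
        results[i] = (uint8_t)~results[i];
}
```

Because each compared lane is either all ones or all zeros, every byte of a lane carries the same result, so indices 0, 4, 8 and 12 simply pick one byte per 32-bit lane. With masks {all ones, 0, 0, all ones} followed by {0, 0, all ones, 0}, `pack_results` produces {0x00, 0xFF, 0xFF, 0x00, 0xFF, 0xFF, 0x00, 0xFF}.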

content/learning-paths/servers-and-cloud-computing/memory_consistency/litmus_syntax.md

Lines changed: 5 additions & 5 deletions
@@ -68,7 +68,7 @@ Now inspect this litmus file to gain a better understanding of the assembly code
 - "On `P1`, is it possible to observe register `W0` (the flag) set to 1 **AND** register `W2` (the payload) set to 0?"
 - Wait...but the condition uses register names `X0` and `X2`, not `W0` and `W2`. See the note below for more.
 - In this condition check syntax, `/\` is a logical **AND**, while `\/` is a logical **OR**.
-#### Note on `X` and `W` Registers:
+#### Note on X and W Registers:
 - Notice you are using `X` registers for storing addresses and for doing the condition check, but `W` registers for everything else.
 - Addresses need to be stored as 64-bit values, hence the need to use `X` registers for the addresses because they are 64-bit. `W` registers are 32-bit. In fact, register `Wn` is the lower 32-bits of register `Xn`.
 - Writing the litmus tests this way is simpler than using all `X` registers. If all `X` registers are used, the data type of each register needs to be declared on additional lines. For this reason, most tests are written as shown above. The way this is done may be changed in the future to reduce potential confusion around the mixed use of `W` and `X` registers, but all of this is functionally correct.
@@ -77,7 +77,7 @@ Before you run this test with `herd7` and `litmus7`, you can hypothesize on what

 Further, if you interleave these instructions in all possible permutations, you can figure out all of the possible valid outcomes of registers `X0` (flag) and `X2` (payload) on `P1`. For the example test above, the possible valid outcomes of `(X0,X2)` (or `(flag,data)`) are `(0,0)`, `(0,1)`, & `(1,1)`. Some permutations that result in these valid outcomes are shown below. These are not all the possible instruction permutations for this test. Listing them all would make this section needlessly long.

-#### A Permutation That Results in `(0,0)`:
+#### A Permutation That Results in (0,0):

 ```output
 (P1) LDR W0, [X1] # P1 reads flag, gets 0
@@ -89,7 +89,7 @@ Further, if you interleave these instructions in all possible permutations, you
 ```
 In this permutation of the test execution, `P1` runs to completion before `P0` even starts its execution. For this reason, `P1` observes the initial values of 0 for both the flag and payload.

-#### A Permutation That Results in `(0,1)`:
+#### A Permutation That Results in (0,1):

 ```output
 (P1) LDR W0, [X1] # P1 reads flag, gets 0
@@ -101,7 +101,7 @@ In this permutation of the test execution, `P1` runs to completion before `P0` e
 ```
 In this permutation of the test execution, `P1` reads the initial value of the flag (the first line) because this instruction is executed before `P0` writes the flag (the last list). However `P1` reads the payload value of 1 because it executes after `P0` writes the payload to 1 (third and forth lines).

-#### A Permutation that Results in `(1,1)`:
+#### A Permutation that Results in (1,1):

 ```output
 (P0) MOV W0, #1
@@ -142,7 +142,7 @@ The Arm memory model tends to be considered a Relaxed Consistency model, which m

 In a Release Consistency model, ordinary memory accesses like `STR` and `LDR` do not need to follow program order. This relaxation in the ordering rules expands the list of instruction permutations in the litmus test above. It is these additional instruction permutations allowed by the Relaxed Consistency model that yield at least one permutation that results in `(1,0)`. Below is one such example of a permutation. For this permutation, the `LDR` instructions in `P1` are reordered.

-#### One Possible Permutation Resulting in `(1,0)`:
+#### One Possible Permutation Resulting in (1,0):

 ```output
 (P1) LDR W2, [X3] # P1 reads payload, gets 0
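The interleaving argument in this file can be checked mechanically with a small C model. This is a sketch, not herd7 or litmus7: it tracks only the two memory locations, assumes `P0` writes the payload before the flag and `P1` reads the flag before the payload (as the permutations above suggest), and enumerates the six interleavings that keep each thread in program order. The names `enumerate_outcomes` and `bit_count` are hypothetical. Passing `reorder_p1_loads = true` swaps `P1`'s two loads, the reordering described for the `(1,0)` permutation.

```c
#include <stdbool.h>

/* Hypothetical two-location model of the message-passing litmus test. */
enum { PAYLOAD = 0, FLAG = 1 };

static int bit_count(unsigned x) {
    int n = 0;
    for (; x; x >>= 1) n += (int)(x & 1u);
    return n;
}

/* Enumerate every interleaving of P0's two stores (payload = 1, then
   flag = 1) with P1's two loads. Each observed outcome (flag, payload)
   is recorded in seen[flag * 2 + payload]. When reorder_p1_loads is
   true, P1 issues its loads in the opposite order. */
void enumerate_outcomes(bool reorder_p1_loads, bool seen[4]) {
    const int p0_stores[2] = {PAYLOAD, FLAG};
    const int p1_loads[2] = {reorder_p1_loads ? PAYLOAD : FLAG,
                             reorder_p1_loads ? FLAG : PAYLOAD};
    for (int i = 0; i < 4; i++) seen[i] = false;

    /* A 4-bit mask with exactly two bits set picks the slots where P0 runs;
       the remaining slots run P1. This yields all 6 interleavings. */
    for (unsigned mask = 0; mask < 16u; mask++) {
        if (bit_count(mask) != 2) continue;
        int mem[2] = {0, 0};   /* initial payload and flag are 0 */
        int regs[2] = {0, 0};  /* values P1 observed, indexed by location */
        int next0 = 0, next1 = 0;
        for (int slot = 0; slot < 4; slot++) {
            if (mask & (1u << slot)) {
                mem[p0_stores[next0++]] = 1;       /* P0 stores 1 */
            } else {
                int loc = p1_loads[next1++];
                regs[loc] = mem[loc];              /* P1 loads */
            }
        }
        seen[regs[FLAG] * 2 + regs[PAYLOAD]] = true;
    }
}
```

With `reorder_p1_loads = false`, only `(0,0)`, `(0,1)` and `(1,1)` are marked, matching the valid outcomes listed above; with `true`, `seen[2]` (the `(1,0)` outcome) is also set.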
