Merge pull request #2611 from jasonrandrews/review

jasonrandrews · web-flow · commit 7e14df981d3c · 2025-12-02T09:47:59.000-06:00
remove backticks in heading causing very small font
diff --git a/content/learning-paths/cross-platform/adler32/summary-10.md b/content/learning-paths/cross-platform/adler32/summary-10.md
@@ -47,13 +47,13 @@ The project includes:
 
 ### Implementations
 
-#### 1. Simple Implementation (`adler32-simple.c`)
+#### 1. Simple Implementation
 
-This is a straightforward C implementation following the standard Adler-32 algorithm definition. It processes the input data byte by byte, updating two 16-bit accumulators (`a` and `b`) modulo 65521 (the largest prime smaller than 2^16).
+The code in `adler32-simple.c` is a straightforward C implementation following the standard Adler-32 algorithm definition. It processes the input data byte by byte, updating two 16-bit accumulators (`a` and `b`) modulo 65521 (the largest prime smaller than 2^16).
 
-#### 2. NEON-Optimized Implementation (`adler32-neon.c`)
+#### 2. NEON-Optimized Implementation
 
-This implementation leverages ARM NEON SIMD (Single Instruction, Multiple Data) instructions to accelerate the checksum calculation. Key aspects include:
+The code in `adler32-neon.c` leverages ARM NEON SIMD (Single Instruction, Multiple Data) instructions to accelerate the checksum calculation. Key aspects include:
 *   Processing data in blocks (16 bytes at a time).
 *   Using NEON intrinsics (`vld1q_u8`, `vmovl_u8`, `vaddq_u16`, `vpaddlq_u16`, `vmulq_u16`, etc.) to perform parallel operations on data vectors.
 *   Calculating the sums `S1` (sum of bytes) and `S2` (weighted sum) for each block using vector operations.
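
Reviewer note (not part of the patch): the byte-by-byte algorithm described for the simple implementation can be sketched as below. This is a minimal illustration written for this note, not the contents of `adler32-simple.c`; the names `adler32_sketch`, `adler32_sketch_self_check`, and `ADLER32_MOD` are hypothetical, and the accumulators are held in 32-bit variables so the additions cannot overflow before the modulo is applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER32_MOD 65521u /* largest prime smaller than 2^16 */

/* Hypothetical name and signature; the real file may differ. */
uint32_t adler32_sketch(const uint8_t *data, size_t len)
{
    uint32_t a = 1; /* running sum of bytes, starts at 1 */
    uint32_t b = 0; /* running sum of the successive values of a */

    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % ADLER32_MOD;
        b = (b + a) % ADLER32_MOD;
    }
    /* b occupies the high 16 bits, a the low 16 bits. */
    return (b << 16) | a;
}

/* Small self-check against well-known reference values. */
void adler32_sketch_self_check(void)
{
    static const uint8_t msg[] = "Wikipedia";
    assert(adler32_sketch(msg, 0) == 1u);          /* empty input */
    assert(adler32_sketch(msg, 9) == 0x11E60398u); /* "Wikipedia" */
}
```

Taking the modulo on every byte keeps the sketch simple; optimized versions usually defer it across many bytes, which is the kind of restructuring a block-based NEON version would rely on.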
diff --git a/content/learning-paths/embedded-and-microcontrollers/cmsis_rtx_vs/initialize.md b/content/learning-paths/embedded-and-microcontrollers/cmsis_rtx_vs/initialize.md
@@ -13,7 +13,7 @@ When setting up the project's Run-Time Environment, ensure you add the appropria
 
 Once this is done, the `RTX5` initialization code is typically the same. It involves setting up the `SysTick` timer with the [SystemCoreClockUpdate()](https://www.keil.com/pack/doc/CMSIS/Core/html/group__system__init__gr.html#gae0c36a9591fe6e9c45ecb21a794f0f0f) function, then initializing and starting the RTOS.
 
-## Create `main()`
+## Create the main() function 
 
 Return to the `CMSIS` view.
 
diff --git a/content/learning-paths/embedded-and-microcontrollers/cmsis_rtx_vs/threads.md b/content/learning-paths/embedded-and-microcontrollers/cmsis_rtx_vs/threads.md
@@ -11,7 +11,7 @@ In this step, you will implement the main RTOS thread (`app_main`), which is pri
 
 You will create three threads. The number and naming of the threads are flexible, so feel free to adjust as needed.
 
-## Create `app_main`
+## Create the app_main() function
 
 Click on the `+` icon within the `Source Files` Group, and add a new file `app_main.c`. Populate with the below.
 
@@ -28,6 +28,7 @@ void app_main (void *argument) {
 	osThreadNew(thread3, NULL, NULL);	// Create thread3
 }
 ```
+
 ## Create Threads
 
 Now you can implement the functionality of the threads themselves. Start with a simple example. Each thread will say hello, and then pause for a period, forever.
diff --git a/content/learning-paths/laptops-and-desktops/windowsperf_sampling_cpython/windowsperf_sampling_cpython_example_2.md b/content/learning-paths/laptops-and-desktops/windowsperf_sampling_cpython/windowsperf_sampling_cpython_example_2.md
@@ -4,7 +4,7 @@ title: WindowsPerf record example
 weight: 4
 ---
 
-## Example 2: Using the `record` command to simplify things
+## Example 2: Simplify the steps using the record command 
 
 The `record` command spawns the process and pins it to the core specified by the `-c` option. You can either use `--pe_file` to let WindowsPerf know which process to spawn or simply add the process to spawn at the very end of the `wperf` command. 
 
diff --git a/content/learning-paths/mobile-graphics-and-gaming/android_opencv_kleidicv/process-images.md b/content/learning-paths/mobile-graphics-and-gaming/android_opencv_kleidicv/process-images.md
@@ -18,7 +18,8 @@ This Learning Path uses a [cameraman image](https://github.com/antimatter15/came
 
 For easier navigation between files in Android Studio, use the **Project** menu option from the project browser pane.
 
-## `ImageOperation`
+## Create the ImageOperation class
+
 You will now create an enum class that lists the image processing operations available in an application that uses the OpenCV library.
 
 In the `src/main/java/com/arm/arm64kleidicvdemo` file directory, add the `ImageOperation.kt` file, and modify it as follows:
@@ -89,7 +90,8 @@ Generally, only single-channel images are supported; with Gaussian blur being an
 
 There is also the companion object that provides a utility method `fromDisplayName`. This function maps the string `displayName` to its corresponding enum constant by iterating through the list of all enum values, and returns null if no match is found.
 
-## `ImageProcessor`
+## Create the ImageProcessor class
+
 Now add the `ImageProcessor.kt`:
 
 ```Kotlin
@@ -110,7 +112,7 @@ The `ImageProcessor` class acts as a simple orchestrator for image processing ta
 
 This design is clean and modular, allowing developers to easily add new processing operations or reuse the `ImageProcessor` in different parts of an application. It aligns with object-oriented principles by promoting encapsulation and reducing processing logic complexity.
 
-## `PerformanceMetrics`
+## Create the PerformanceMetrics class
 Now supplement the project with the `PerformanceMetrics.kt` file:
 
 ```Kotlin
@@ -150,7 +152,8 @@ The `PerformanceMetrics` class analyzes and summarizes performance measurements,
 
 By encapsulating the raw data `durationsNano` and exposing only meaningful metrics through computed properties, the class ensures clear separation of data and functionality. The overridden `toString` method makes it easy to generate a human-readable summary for reporting or debugging purposes. You can use this method to report the performance metrics to the user.
 
-## `MainActivity`
+## Create the MainActivity
+
 You can now move on to modify `MainActivity.kt` as follows:
 
 ```Kotlin
@@ -332,7 +335,8 @@ The activity also implements several helper methods:
 7. `measureOperationTime` - Measures the execution time of an operation in nanoseconds using System.nanoTime().
 8. `displayProcessedImage` - Converts the processed Mat back to a Bitmap for display and updates the ImageView with the processed image.
 
-## `Databinding`
+## Add Databinding
+
 Finally, modify `build.gradle.kts` by adding the databinding under build features:
 
 ```JSON
diff --git a/content/learning-paths/mobile-graphics-and-gaming/using-neon-intrinsics-to-optimize-unity-on-android/10-appendix.md b/content/learning-paths/mobile-graphics-and-gaming/using-neon-intrinsics-to-optimize-unity-on-android/10-appendix.md
@@ -5,60 +5,60 @@ weight: 11
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
-## The Neon intrinsics we used
+## The Neon intrinsics used
 Here is a breakdown of some of the Neon intrinsics that were used to optimize the AABB collision detection in the function called `NeonAABBObjCollisionDetectionUnrolled`. It can be found at line **718** in _CollisionCalculationScript.cs_.
 
 `NeonAABBObjCollisionDetectionUnrolled` performs the collision detection between the characters and the walls. The outer loop iterates through all of the characters, while the inner loop iterates through the walls. The result is an array of boolean values (**true** denotes a collision has occurred) which tells us which characters have collided with which walls.
 
-### `Unity.Burst.Intrinsics.v64` (loading data into a vector register)
+`Unity.Burst.Intrinsics.v64` (loading data into a vector register)
 ```
 Line 721: var tblindex1 = new Unity.Burst.Intrinsics.v64((byte)0, 4, 8, 12, 255, 255, 255, 255)
 ```
 Create a 64-bit vector with 8 8-bit elements with values (0, 4, 8, 12, 255, 255, 255, 255). This is used as a lookup table on _line 741_.
 
-### `vdupq_n_f32`
+`vdupq_n_f32`
 ```
 Line 728: charMaxXs = vdupq_n_f32(*(characters + c))
 ```
 Duplicate floating point values into all 4 lanes of the 128-bit returned vector. The returned vector will contain 4 copies of a single _Max X_ value (of character bounds).
 
-### `vld1q_f32`
+`vld1q_f32`
 ```
 Line 736: wallMinXs = vld1q_f32(walls + w)
 ```
 Load multiple floating point values from memory into a single vector register. The returned vector will contain _Min X_ values from 4 different walls.
 
-### `vcgeq_f32`
+`vcgeq_f32`
 ```
 Line 741: vcgeq_f32(wallMinXs, charMaxXs)
 ```
 Floating point comparisons (greater-than or equal). It compares 4 walls at once with a character's _Max X_. Each of the four results will either be all ones (true) or all zeros (false).
 
-### `vorrq_u32`
+`vorrq_u32`
 ```
 Line 741: vorrq_u32(vcgeq_f32(wallMinXs, charMaxXs), vcgeq_f32(wallMinYs, charMaxYs))
 ```
 Bitwise inclusive OR. The nested calls to `vcgeq_f32` are comparing the walls (Min X and Min Y) against the characters' Max X and Max Y. The four comparison results are combined with a bitwise OR.
 
-### `vqtbl1_u8`
+`vqtbl1_u8`
 ```
 Line 741: results = vqtbl1_u8(_result of ORs_, tblindex1)
 ```
 Table lookup function that selects elements from an array based on the indices provided. The result of the OR operations will be treated as an array of 8-bit values. The values from _tblindex1_ (0, 4, 8 and 12) ensure that we select the most significant bytes from each u32 OR result. So 4 character-wall comparisons are being merged into one 128-bit vector along with 4 dummies (because of the _out of range_ values in tblindex1) that will be replaced later with the 64-bit value of the next 4 wall comparisons (from _vmvn_u8_).
 
-### `vqtbx1_u8`
+`vqtbx1_u8`
 ```
 Line 751: vqtbx1_u8(results, …)
 ```
 Table lookup extension. It behaves like a table lookup, except that when an index is out of range it leaves the existing data alone. This has the effect of selecting 4 bytes (using indices from _tblindex2_) from the results of the last 4 comparisons and combining them with the previous 4 results. _results_ will now contain the results of 8 wall-character comparisons.
 
-### `vmvn_u8`
+`vmvn_u8`
 ```
 Line 751: results = vmvn_u8(...)
 ```
 Bitwise NOT operation. This negates each of the 8 character-wall comparisons. It is effectively the _!_ (NOT) in our [AABB intersection function](/learning-paths/mobile-graphics-and-gaming/using-neon-intrinsics-to-optimize-unity-on-android/5-the-optimizations#the-aabb-intersection-function) except that it is working on 8 results instead of 1.
 
-### `Unity.Burst.Intrinsics.v64` (storing to memory)
+`Unity.Burst.Intrinsics.v64` (storing to memory)
 ```
 Line 755: *(Unity.Burst.Intrinsics.v64*)(collisions + (c * numWalls + w - 4)) = results;
 ```
diff --git a/content/learning-paths/servers-and-cloud-computing/memory_consistency/litmus_syntax.md b/content/learning-paths/servers-and-cloud-computing/memory_consistency/litmus_syntax.md
@@ -68,7 +68,7 @@ Now inspect this litmus file to gain a better understanding of the assembly code
   - "On `P1`, is it possible to observe register `W0` (the flag) set to 1 **AND** register `W2` (the payload) set to 0?"
     - Wait...but the condition uses register names `X0` and `X2`, not `W0` and `W2`. See the note below for more.
   - In this condition check syntax, `/\` is a logical **AND**, while `\/` is a logical **OR**.
-#### Note on `X` and `W` Registers:
+#### Note on X and W Registers:
   - Notice you are using `X` registers for storing addresses and for doing the condition check, but `W` registers for everything else.
     - Addresses need to be stored as 64-bit values, hence the need to use `X` registers for the addresses because they are 64-bit. `W` registers are 32-bit. In fact, register `Wn` is the lower 32 bits of register `Xn`.
     - Writing the litmus tests this way is simpler than using all `X` registers. If all `X` registers are used, the data type of each register needs to be declared on additional lines. For this reason, most tests are written as shown above. The way this is done may be changed in the future to reduce potential confusion around the mixed use of `W` and `X` registers, but all of this is functionally correct.
@@ -77,7 +77,7 @@ Before you run this test with `herd7` and `litmus7`, you can hypothesize on what
 
 Further, if you interleave these instructions in all possible permutations, you can figure out all of the possible valid outcomes of registers `X0` (flag) and `X2` (payload) on `P1`. For the example test above, the possible valid outcomes of `(X0,X2)` (or `(flag,data)`) are `(0,0)`, `(0,1)`, & `(1,1)`. Some permutations that result in these valid outcomes are shown below. These are not all the possible instruction permutations for this test. Listing them all would make this section needlessly long.
 
-#### A Permutation That Results in `(0,0)`:
+#### A Permutation That Results in (0,0):
 
 ```output
 (P1)  LDR  W0,  [X1]  # P1 reads flag, gets 0
@@ -89,7 +89,7 @@ Further, if you interleave these instructions in all possible permutations, you
 ```
 In this permutation of the test execution, `P1` runs to completion before `P0` even starts its execution. For this reason, `P1` observes the initial values of 0 for both the flag and payload.
 
-#### A Permutation That Results in `(0,1)`:
+#### A Permutation That Results in (0,1):
 
 ```output
 (P1)  LDR  W0,  [X1]  # P1 reads flag, gets 0
@@ -101,7 +101,7 @@ In this permutation of the test execution, `P1` runs to completion before `P0` e
 ```
 In this permutation of the test execution, `P1` reads the initial value of the flag (the first line) because this instruction is executed before `P0` writes the flag (the last line). However, `P1` reads the payload value of 1 because it executes after `P0` writes the payload to 1 (third and fourth lines).
 
-#### A Permutation that Results in `(1,1)`:
+#### A Permutation that Results in (1,1):
 
 ```output
 (P0)  MOV  W0,  #1
@@ -142,7 +142,7 @@ The Arm memory model tends to be considered a Relaxed Consistency model, which m
 
 In a Relaxed Consistency model, ordinary memory accesses like `STR` and `LDR` do not need to follow program order. This relaxation in the ordering rules expands the list of instruction permutations in the litmus test above. It is these additional instruction permutations allowed by the Relaxed Consistency model that yield at least one permutation that results in `(1,0)`. Below is one such example of a permutation. For this permutation, the `LDR` instructions in `P1` are reordered.
 
-#### One Possible Permutation Resulting in `(1,0)`:
+#### One Possible Permutation Resulting in (1,0):
 
 ```output
 (P1)  LDR  W2,  [X3]  # P1 reads payload, gets 0