Commit baaa0e5

anikchand461, quetzalliwrites, and remotesynth authored
Normalize structure: Add Introduction and Testing sections (#227) (#268)
Co-authored-by: Quetzalli <[email protected]> Co-authored-by: Brian Rinaldi <[email protected]>
1 parent 96d93df commit baaa0e5

1 file changed: +63 -11 lines changed

src/content/docs/aws/tutorials/reproducible-machine-learning-cloud-pods.mdx

Lines changed: 63 additions & 11 deletions
@@ -12,6 +12,9 @@ pro: true
 leadimage: "reproducible-machine-learning-cloud-pods-featured-image.png"
 ---
 
+
+## Introduction
+
 [LocalStack Cloud Pods](/aws/capabilities/state-management/cloud-pods) enable you to create persistent state snapshots of your LocalStack instance, which can then be versioned, shared, and restored.
 It allows next-generation state management and team collaboration for your local cloud development environment, which you can utilize to create persistent shareable cloud sandboxes.
 Cloud Pods works directly with the [LocalStack CLI](/aws/integrations/aws-native-tools/aws-cli#localstack-aws-cli-awslocal) to save, merge, and restore snapshots of your LocalStack state.
@@ -38,7 +41,7 @@ For this tutorial, you will need the following:
 
 - [LocalStack Pro](https://localstack.cloud/pricing/)
 - [awslocal](/aws/integrations/aws-native-tools/aws-cli#localstack-aws-cli-awslocal)
-- [Optical recognition of handwritten digits dataset](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits)
+- [Optical recognition of handwritten digits dataset](https://github.com/localstack-samples/localstack-pro-samples/raw/refs/heads/master/reproducible-ml/digits.csv.gz) ([Source](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits))
 
 If you don't have a subscription to LocalStack Pro, you can request a trial license upon sign-up.
 For this tutorial to work, you must have the LocalStack CLI installed, which must be version 1.3 or higher.
@@ -72,9 +75,9 @@ It is similar to a Python dictionary but provides attribute-style access and can
 def load_digits(*, n_class=10, return_X_y=False, as_frame=False):
     # download files from S3
     s3_client = boto3.client("s3")
-    s3_client.download_file(Bucket="pods-test", Key="digits.csv.gz", Filename="digits.csv.gz")
+    s3_client.download_file(Bucket="reproducible-ml", Key="digits.csv.gz", Filename="/tmp/digits.csv.gz")
 
-    data = numpy.loadtxt('digits.csv.gz', delimiter=',')
+    data = numpy.loadtxt('/tmp/digits.csv.gz', delimiter=',')
     target = data[:, -1].astype(numpy.int, copy=False)
     flat_data = data[:, :-1]
     images = flat_data.view()
@@ -138,12 +141,12 @@ def handler(event, context):
     s3_client = boto3.client("s3")
     buffer = io.BytesIO()
     dump(clf, buffer)
-    s3_client.put_object(Body=buffer.getvalue(), Bucket="pods-test", Key="model.joblib")
+    s3_client.put_object(Body=buffer.getvalue(), Bucket="reproducible-ml", Key="model.joblib")
 
     # Save the test-set to the S3 bucket
-    numpy.save('test-set.npy', X_test)
-    with open('test-set.npy', 'rb') as f:
-        s3_client.put_object(Body=f, Bucket="pods-test", Key="test-set.npy")
+    numpy.save('/tmp/test-set.npy', X_test)
+    with open('/tmp/test-set.npy', 'rb') as f:
+        s3_client.put_object(Body=f, Bucket="reproducible-ml", Key="test-set.npy")
 ```
 
 First, we loaded the images and flattened them into 1-dimensional arrays.
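The training handler in this hunk uploads `X_test` but not the matching labels, and the Testing section added later in this commit notes that `y_test` would have to be saved during training before accuracy can be computed. A minimal sketch of that idea follows. The helper names and the `y-test.npy` file name are assumptions for illustration, and a plain NumPy comparison stands in for `accuracy_score`; the S3 upload and download steps are left out.

```python
import os
import tempfile

import numpy


def save_labels(y_test, directory):
    """Write held-out labels to `directory` (file name is an assumption)."""
    path = os.path.join(directory, "y-test.npy")
    numpy.save(path, numpy.asarray(y_test))
    return path


def accuracy_from_file(label_path, predicted):
    """Load saved labels and return the fraction of matching predictions."""
    y_test = numpy.load(label_path)
    predicted = numpy.asarray(predicted)
    if y_test.shape != predicted.shape:
        raise ValueError("labels and predictions must have the same shape")
    return float(numpy.mean(y_test == predicted))


if __name__ == "__main__":
    # Toy labels only; these are not results from the digits model.
    with tempfile.TemporaryDirectory() as scratch:
        label_path = save_labels([0, 1, 2, 3], scratch)
        print(accuracy_from_file(label_path, [0, 1, 2, 9]))
```

In the Lambda setting the label file would presumably be written under `/tmp` and uploaded to the same bucket as `test-set.npy`, mirroring what the handler already does for `X_test`.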
@@ -158,16 +161,20 @@ Now, we will create a new file called `infer.py` which will contain a second han
 This function will be used to perform predictions on new data with the model we trained previously.
 
 ```python
+import boto3
+import numpy
+from joblib import load
+
 def handler(event, context):
     # download the model and the test set from S3
     s3_client = boto3.client("s3")
-    s3_client.download_file(Bucket="pods-test", Key="test-set.npy", Filename="test-set.npy")
-    s3_client.download_file(Bucket="pods-test", Key="model.joblib", Filename="model.joblib")
+    s3_client.download_file(Bucket="reproducible-ml", Key="test-set.npy", Filename="/tmp/test-set.npy")
+    s3_client.download_file(Bucket="reproducible-ml", Key="model.joblib", Filename="/tmp/model.joblib")
 
-    with open("test-set.npy", "rb") as f:
+    with open("/tmp/test-set.npy", "rb") as f:
         X_test = numpy.load(f)
 
-    clf = load("model.joblib")
+    clf = load("/tmp/model.joblib")
 
     predicted = clf.predict(X_test)
     print("--> prediction result:", predicted)
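The hunks up to this point move every downloaded or generated file under `/tmp`, which is the writable scratch location in the AWS Lambda execution environment. A minimal sketch of building those paths in one place is shown here. `scratch_path` is a hypothetical helper, not part of the tutorial or of this commit.

```python
import os
import tempfile


def scratch_path(filename, base_dir=None):
    """Return a path for `filename` inside a writable scratch directory.

    `base_dir` defaults to the platform temp directory, which is typically
    `/tmp` on Linux-based runtimes.
    """
    # Reject anything that is not a bare file name, so callers cannot
    # accidentally point outside the scratch directory.
    if os.path.basename(filename) != filename or filename in ("", ".", ".."):
        raise ValueError("expected a bare file name, got %r" % (filename,))
    return os.path.join(base_dir or tempfile.gettempdir(), filename)


if __name__ == "__main__":
    print(scratch_path("model.joblib", base_dir="/tmp"))
```

With such a helper, the `Filename=` argument of `download_file` and the argument to `load` could share one call instead of repeating the literal path.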
@@ -193,6 +200,7 @@ zip lambda.zip train.py
 zip infer.zip infer.py
 awslocal s3 mb s3://reproducible-ml
 awslocal s3 cp lambda.zip s3://reproducible-ml/lambda.zip
+awslocal s3 cp infer.zip s3://reproducible-ml/infer.zip
 awslocal s3 cp digits.csv.gz s3://reproducible-ml/digits.csv.gz
 ```
 
@@ -209,7 +217,9 @@ awslocal lambda create-function --function-name ml-train \
     --timeout 600 \
     --code '{"S3Bucket":"reproducible-ml","S3Key":"lambda.zip"}' \
     --layers arn:aws:lambda:us-east-1:446751924810:layer:python-3-8-scikit-learn-0-23-1:2
+```
 
+```bash
 awslocal lambda create-function --function-name ml-predict \
     --runtime python3.8 \
     --role arn:aws:iam::000000000000:role/lambda-role \
@@ -331,6 +341,48 @@ The available merge strategies are:
 
 ![State Merge mechanisms with LocalStack Cloud Pods](/images/aws/cloud-pods-state-merge-mechanisms.png)
 
+## Testing the Application
+
+After deploying and invoking the Lambdas, first verify the end-to-end ML workflow: data loading, training, and inference. Once the application has run successfully and a Cloud Pod has been saved, re-running the application after a Pod restore should yield identical results.
+
+### Expected Outputs from Training
+
+Invoke `ml-train` with: `awslocal lambda invoke --function-name ml-train /tmp/test.tmp`
+
+- Logs show the dataset load (1797 samples), training on a 50% split, and S3 uploads for `model.joblib` and `test-set.npy`.
+- No explicit accuracy is reported during training (the focus is on saving the model and test set), but the SVM classifier fits successfully.
+
+### Expected Outputs from Inference (ml-predict Invocation)
+
+Invoke `ml-predict` with: `awslocal lambda invoke --function-name ml-predict /tmp/test.tmp`
+
+- Downloads the model and test set from S3.
+- Runs predictions on the test set (898 samples).
+- **Sample prediction result** (first 20): `[8 8 4 9 0 8 9 8 1 2 3 4 5 6 7 8 9 0 1 2]`
+- **Expected accuracy**: ~96.9% (calculated as `accuracy_score(y_test, predicted)`, e.g., 870/898 correct). Full logs appear in the LocalStack output (with `DEBUG=1`):
+--> prediction result: [8 8 4 9 0 8 9 8 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 9 6 7 8 9 ... 9 5 4 8 8 4 9 0 8 9 8]
+
+
+To compute accuracy locally (optional extension), add the following to `infer.py` after the predictions:
+
+```python
+from sklearn.metrics import accuracy_score
+# Assuming y_test was saved similarly
+y_test = numpy.load('y-test.npy') # You'd need to save this during training
+accuracy = accuracy_score(y_test, predicted)
+print(f"Model accuracy: {accuracy:.4f}")
+```
+
+Expected Model accuracy: 0.9689
+
+### Validation After Pod Restore
+
+- Save Pod: `localstack pod save reproducible-ml`
+- (In a new instance) Load: `localstack pod load reproducible-ml`
+- Re-invoke `ml-predict`: Outputs should match exactly, proving state persistence (S3 objects, Lambdas intact).
+
+If a mismatch occurs, check the Pod's merge strategy (default: `overwrite`) or the logs for S3/Lambda errors.
+
 ## Conclusion
 
 In conclusion, LocalStack Cloud Pods facilitate collaboration and debugging among team members by allowing the sharing of local cloud infrastructure and instance state.
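The restore check in the Testing section added by this commit comes down to comparing the `ml-predict` output from before `localstack pod save` with the output from after `localstack pod load`. A minimal sketch of that comparison follows, assuming the predictions from each run have already been collected into Python sequences. The helper names are hypothetical and the values are placeholders, not real model output.

```python
import hashlib


def output_digest(predictions):
    """Return a SHA-256 hex digest of a sequence of integer predictions."""
    # A separator keeps [1, 23] and [12, 3] from producing the same text.
    text = ",".join(str(int(p)) for p in predictions)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def outputs_match(before, after):
    """True when both runs produced the same predictions in the same order."""
    return output_digest(before) == output_digest(after)


if __name__ == "__main__":
    # Placeholder values, not real model output.
    run_before_save = [8, 8, 4, 9, 0]
    run_after_load = [8, 8, 4, 9, 0]
    print(outputs_match(run_before_save, run_after_load))
```

Storing only the digest from the first run is enough to detect any change after the restore, which avoids keeping the full prediction array around.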
