Commit 802c921

Merge branch 'main' into chienyuanchang/test_github_actions
2 parents: 7d32934 + f9834e7

13 files changed (+1481, -216 lines)
New GitHub Actions workflow

Lines changed: 73 additions & 0 deletions

```yaml
name: Auto Review Documentation

on:
  push:
    branches:
      - main

permissions:
  id-token: write
  contents: read

jobs:
  auto-review-merge:
    runs-on: ubuntu-latest
    environment: MMI-Samples
    env:
      AZURE_OPENAI_ENDPOINT: ${{ secrets.AZURE_OPENAI_ENDPOINT }}
      AZURE_OPENAI_DEPLOYMENT: ${{ secrets.GPT_4_1_MINI }}
      AZURE_OPENAI_API_VERSION: "2024-12-01-preview"
      GITHUB_TOKEN: ${{ secrets.TOKEN_GITHUB }}
      REPO_NAME: "Azure-Samples/azure-ai-content-understanding-python"
      DOC_FILE_FILTER: '\.md$|README|\.ipynb$'

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Required to compare two SHAs in full history

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r tools/review_file/requirements.txt

      - name: Get changed files
        id: changed-docs
        run: |
          # List the files changed between the previous commit and the current SHA
          # (--diff-filter=d excludes deleted files)
          git diff --name-only --diff-filter=d ${{ github.event.before }} ${{ github.sha }} > changed_files.txt
          # Keep only documentation files (-i: ignore case)
          grep -i -E "${DOC_FILE_FILTER}" changed_files.txt > changed_docs.txt || > changed_docs.txt

          echo "Files changed with DOC_FILE_FILTER (${DOC_FILE_FILTER}) — could be empty:"
          echo "---------------------------------------------------------------"
          cat changed_docs.txt
          echo "---------------------------------------------------------------"

      - name: Azure Login
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Run review script on each file
        run: |
          if [ ! -s changed_docs.txt ]; then
            echo "No documentation files changed matching DOC_FILE_FILTER (${DOC_FILE_FILTER}). Skipping review step."
          else
            while read file; do
              if [ -n "$file" ]; then
                echo "Running review script on file: $file"
                INPUT_FILE_PATH="$file" python tools/review_file/review_file.py
              fi
            done < changed_docs.txt
          fi
```
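The `DOC_FILE_FILTER` regex drives the whole pipeline, so it is worth sanity-checking locally. A minimal Python sketch of the same filter, with hypothetical file names; `re.IGNORECASE` and `re.search` mirror `grep -i -E`:

```python
# Reproduce the workflow's DOC_FILE_FILTER matching locally.
# The file names below are hypothetical examples, not repo contents.
import re

DOC_FILE_FILTER = re.compile(r"\.md$|README|\.ipynb$", re.IGNORECASE)

changed_files = ["README.md", "docs/guide.md", "notebooks/demo.ipynb", "src/app.py"]
changed_docs = [f for f in changed_files if DOC_FILE_FILTER.search(f)]
print(changed_docs)  # ['README.md', 'docs/guide.md', 'notebooks/demo.ipynb']
```

Note the `|| > changed_docs.txt` fallback in the grep step: `grep` exits non-zero when nothing matches, which would otherwise fail the job, so the redirection swallows the error and leaves an empty `changed_docs.txt` for the later emptiness check.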

README.md

Lines changed: 128 additions & 101 deletions
Large diffs are not rendered by default.

docs/set_env_for_training_data_and_reference_doc.md

Lines changed: 37 additions & 13 deletions
````diff
@@ -6,23 +6,47 @@ Folders [document_training](../data/document_training/) and [field_extraction_pr
 2. *Install Azure Storage Explorer:* Azure Storage Explorer is a tool that makes it easy to work with Azure Storage data. Install it and log in with your credentials, following the [guide](https://aka.ms/download-and-install-Azure-Storage-Explorer).
 3. *Create or Choose a Blob Container:* Create a blob container from Azure Storage Explorer or use an existing one.
    <img src="./create-blob-container.png" width="600" />
-4. *Generate a Shared Access Signature (SAS) URL:*
-   - Right-click on the blob container and select `Get Shared Access Signature...` in the menu.
-   - Check the required permissions: `Read`, `Write` and `List`
-   - Click the `Create` button.
-   <img src="./get-access-signature.png" height="600" /> <img src="./choose-signature-options.png" height="600" />
-5. *Copy the SAS URL:* After creating the SAS, click `Copy` to get the URL with token. This will be used as the value for **TRAINING_DATA_SAS_URL** or **REFERENCE_DOC_SAS_URL** when running the sample code.
-   <img src="./copy-access-signature.png" width="600" />
-6. *Set Environment Variables in ".env" File:* Depending on the sample that you will run, you will need to set required environment variables in [.env](../notebooks/.env).
-   > NOTE: **REFERENCE_DOC_SAS_URL** can be the same as **TRAINING_DATA_SAS_URL** to re-use the same blob container
-   - [analyzer_training](../notebooks/analyzer_training.ipynb): Add the SAS URL as the value of **TRAINING_DATA_SAS_URL**, and a prefix for **TRAINING_DATA_PATH**. You can choose any folder name you like for **TRAINING_DATA_PATH**. For example, you could use "training_files".
+4. *Set SAS URL Related Environment Variables in ".env" File:* Depending on the sample that you will run, you will need to set required environment variables in [.env](../notebooks/.env). There are two options for setting up the environment variables for the required Shared Access Signature (SAS) URL.
+   - Option A - Generate a SAS URL manually in Azure Storage Explorer
+     - Right-click on the blob container and select `Get Shared Access Signature...` in the menu.
+     - Check the required permissions: `Read`, `Write` and `List`
+       - We will need `Write` for uploading, modifying, or appending blobs
+     - Click the `Create` button.
+       <img src="./get-access-signature.png" height="600" /> <img src="./choose-signature-options.png" height="600" />
+     - *Copy the SAS URL:* After creating the SAS, click `Copy` to get the URL with token. This will be used as the value for **TRAINING_DATA_SAS_URL** or **REFERENCE_DOC_SAS_URL** when running the sample code.
+       <img src="./copy-access-signature.png" width="600" />
+
+     - Set the following in [.env](../notebooks/.env).
+       > NOTE: **REFERENCE_DOC_SAS_URL** can be the same as **TRAINING_DATA_SAS_URL** to re-use the same blob container
+       - For [analyzer_training](../notebooks/analyzer_training.ipynb): Add the SAS URL as the value of **TRAINING_DATA_SAS_URL**.
+         ```env
+         TRAINING_DATA_SAS_URL=<Blob container SAS URL>
+         ```
+       - For [field_extraction_pro_mode](../notebooks/field_extraction_pro_mode.ipynb): Add the SAS URL as the value of **REFERENCE_DOC_SAS_URL**.
+         ```env
+         REFERENCE_DOC_SAS_URL=<Blob container SAS URL>
+         ```
+   - Option B - Auto-generate the SAS URL via code in the sample notebooks
+     - Instead of manually creating a SAS URL, you can set storage account and container information, and let the code generate a temporary SAS URL at runtime.
+       > NOTE: **TRAINING_DATA_STORAGE_ACCOUNT_NAME** and **TRAINING_DATA_CONTAINER_NAME** can be the same as **REFERENCE_DOC_STORAGE_ACCOUNT_NAME** and **REFERENCE_DOC_CONTAINER_NAME** to re-use the same blob container
+     - For [analyzer_training](../notebooks/analyzer_training.ipynb): Add the storage account name as `TRAINING_DATA_STORAGE_ACCOUNT_NAME` and the container name under that storage account as `TRAINING_DATA_CONTAINER_NAME`.
+       ```env
+       TRAINING_DATA_STORAGE_ACCOUNT_NAME=<your-storage-account-name>
+       TRAINING_DATA_CONTAINER_NAME=<your-container-name>
+       ```
+     - For [field_extraction_pro_mode](../notebooks/field_extraction_pro_mode.ipynb): Add the storage account name as `REFERENCE_DOC_STORAGE_ACCOUNT_NAME` and the container name under that storage account as `REFERENCE_DOC_CONTAINER_NAME`.
+       ```env
+       REFERENCE_DOC_STORAGE_ACCOUNT_NAME=<your-storage-account-name>
+       REFERENCE_DOC_CONTAINER_NAME=<your-container-name>
+       ```
+
+5. *Set Folder Prefix in ".env" File:* Depending on the sample that you will run, you will need to set required environment variables in [.env](../notebooks/.env).
+   - For [analyzer_training](../notebooks/analyzer_training.ipynb): Add a prefix for **TRAINING_DATA_PATH**. You can choose any folder name you like for **TRAINING_DATA_PATH**. For example, you could use "training_files".
    ```env
-   TRAINING_DATA_SAS_URL=<Blob container SAS URL>
    TRAINING_DATA_PATH=<Designated folder path under the blob container>
    ```
-   - [field_extraction_pro_mode](../notebooks/field_extraction_pro_mode.ipynb): Add the SAS URL as the value of **REFERENCE_DOC_SAS_URL**, and a prefix for **REFERENCE_DOC_PATH**. You can choose any folder name you like for **REFERENCE_DOC_PATH**. For example, you could use "reference_docs".
+   - For [field_extraction_pro_mode](../notebooks/field_extraction_pro_mode.ipynb): Add a prefix for **REFERENCE_DOC_PATH**. You can choose any folder name you like for **REFERENCE_DOC_PATH**. For example, you could use "reference_docs".
    ```env
-   REFERENCE_DOC_SAS_URL=<Blob container SAS URL>
    REFERENCE_DOC_PATH=<Designated folder path under the blob container>
    ```
````
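In Option B, the notebooks delegate SAS creation to `AzureContentUnderstandingClient.generate_temp_container_sas_url` (see the notebook diff below). The repo's actual implementation is not shown in this commit; a minimal sketch of how such a helper could be built with `azure-identity` and `azure-storage-blob`, assuming a user delegation SAS signed via your Azure AD identity, might look like this:

```python
# Sketch only: the repo ships its own generate_temp_container_sas_url helper;
# this is one plausible implementation, not the actual code.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.storage.blob import (
    BlobServiceClient,
    ContainerSasPermissions,
    generate_container_sas,
)


def generate_temp_container_sas_url(
    account_name: str,
    container_name: str,
    permissions: ContainerSasPermissions,
    expiry_hours: int = 1,
) -> str:
    account_url = f"https://{account_name}.blob.core.windows.net"
    service = BlobServiceClient(account_url, credential=DefaultAzureCredential())
    start = datetime.now(timezone.utc)
    expiry = start + timedelta(hours=expiry_hours)
    # A user delegation key signs the SAS with Azure AD credentials,
    # so no storage account key is ever handled.
    delegation_key = service.get_user_delegation_key(start, expiry)
    sas_token = generate_container_sas(
        account_name=account_name,
        container_name=container_name,
        user_delegation_key=delegation_key,
        permission=permissions,
        start=start,
        expiry=expiry,
    )
    return f"{account_url}/{container_name}?{sas_token}"


# Example call; Read/Write/List matches the permissions Option A asks
# you to check in Storage Explorer. Account and container names are placeholders.
url = generate_temp_container_sas_url(
    "mystorageaccount",
    "training-data",
    ContainerSasPermissions(read=True, write=True, list=True),
)
```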

notebooks/analyzer_training.ipynb

Lines changed: 34 additions & 18 deletions
```diff
@@ -23,12 +23,11 @@
     "\n",
     "## Prerequisites\n",
     "1. Ensure Azure AI service is configured following [steps](../README.md#configure-azure-ai-service-resource)\n",
-    "1. Follow steps in [Set env for training data](../docs/set_env_for_training_data_and_reference_doc.md) to add training data related env variables `TRAINING_DATA_SAS_URL` and `TRAINING_DATA_PATH` into the [.env](./.env) file.\n",
-    "   - `TRAINING_DATA_SAS_URL`: SAS URL for your Azure Blob container.\n",
-    "   - `TRAINING_DATA_PATH`: Folder path within the container to upload training data.\n",
-    "1. Install packages needed to run the sample\n",
-    "\n",
-    "\n"
+    "2. Follow steps in [Set env for training data](../docs/set_env_for_training_data_and_reference_doc.md) to add training data related environment variables into the [.env](./.env) file.\n",
+    "   - You can either set `TRAINING_DATA_SAS_URL` directly with the SAS URL for your Azure Blob container,\n",
+    "   - Or set both `TRAINING_DATA_STORAGE_ACCOUNT_NAME` and `TRAINING_DATA_CONTAINER_NAME`, so the SAS URL can be generated automatically during one of the later steps.\n",
+    "   - Also set `TRAINING_DATA_PATH` to specify the folder path within the container where training data will be uploaded.\n",
+    "3. Install packages needed to run the sample\n"
     ]
    },
    {
@@ -119,11 +118,12 @@
    "metadata": {},
    "source": [
     "## Prepare labeled data\n",
-    "In this step, we will\n",
-    "- Check whether document files in the local folder have corresponding `.labels.json` and `.result.json` files\n",
-    "- Upload these files to the designated Azure blob storage.\n",
-    "\n",
-    "We use **TRAINING_DATA_SAS_URL** and **TRAINING_DATA_PATH** that are set in the Prerequisites step."
+    "In this step, we will\n",
+    "- Use `TRAINING_DATA_PATH` and SAS URL related environment variables that were set in the Prerequisites step.\n",
+    "- Try to get the SAS URL from the environment variable `TRAINING_DATA_SAS_URL`.\n",
+    "If this is not set, we attempt to generate the SAS URL automatically using the environment variables `TRAINING_DATA_STORAGE_ACCOUNT_NAME` and `TRAINING_DATA_CONTAINER_NAME`.\n",
+    "- Verify that document files in the local folder have corresponding `.labels.json` and `.result.json` files\n",
+    "- Upload these files to the Azure Blob storage container specified by the environment variables."
     ]
    },
    {
@@ -132,10 +132,26 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "TRAINING_DATA_SAS_URL = os.getenv(\"TRAINING_DATA_SAS_URL\")\n",
-    "TRAINING_DATA_PATH = os.getenv(\"TRAINING_DATA_PATH\")\n",
-    "\n",
-    "await client.generate_training_data_on_blob(training_docs_folder, TRAINING_DATA_SAS_URL, TRAINING_DATA_PATH)"
+    "training_data_sas_url = os.getenv(\"TRAINING_DATA_SAS_URL\")\n",
+    "if not training_data_sas_url:\n",
+    "    TRAINING_DATA_STORAGE_ACCOUNT_NAME = os.getenv(\"TRAINING_DATA_STORAGE_ACCOUNT_NAME\")\n",
+    "    TRAINING_DATA_CONTAINER_NAME = os.getenv(\"TRAINING_DATA_CONTAINER_NAME\")\n",
+    "    if not (TRAINING_DATA_STORAGE_ACCOUNT_NAME and TRAINING_DATA_CONTAINER_NAME):\n",
+    "        raise ValueError(\n",
+    "            \"Please set either TRAINING_DATA_SAS_URL or both TRAINING_DATA_STORAGE_ACCOUNT_NAME and TRAINING_DATA_CONTAINER_NAME environment variables.\"\n",
+    "        )\n",
+    "    from azure.storage.blob import ContainerSasPermissions\n",
+    "    # We will need \"Write\" for uploading, modifying, or appending blobs\n",
+    "    training_data_sas_url = AzureContentUnderstandingClient.generate_temp_container_sas_url(\n",
+    "        account_name=TRAINING_DATA_STORAGE_ACCOUNT_NAME,\n",
+    "        container_name=TRAINING_DATA_CONTAINER_NAME,\n",
+    "        permissions=ContainerSasPermissions(read=True, write=True, list=True),\n",
+    "        expiry_hours=1,\n",
+    "    )\n",
+    "\n",
+    "training_data_path = os.getenv(\"TRAINING_DATA_PATH\")\n",
+    "\n",
+    "await client.generate_training_data_on_blob(training_docs_folder, training_data_sas_url, training_data_path)"
     ]
    },
    {
@@ -145,7 +161,7 @@
     "## Create analyzer with defined schema\n",
     "Before creating the analyzer, you should fill in the constant ANALYZER_ID with a name relevant to your task. Here, we generate a unique suffix so this cell can be run multiple times to create different analyzers.\n",
     "\n",
-    "We use **TRAINING_DATA_SAS_URL** and **TRAINING_DATA_PATH** that are set up in the [.env](./.env) file and used in the previous step."
+    "We use **training_data_sas_url** and **training_data_path** that are set up in the [.env](./.env) file and used in the previous step."
     ]
    },
    {
@@ -160,8 +176,8 @@
    "response = client.begin_create_analyzer(\n",
    "    CUSTOM_ANALYZER_ID,\n",
    "    analyzer_template_path=analyzer_template,\n",
-    "    training_storage_container_sas_url=TRAINING_DATA_SAS_URL,\n",
-    "    training_storage_container_path_prefix=TRAINING_DATA_PATH,\n",
+    "    training_storage_container_sas_url=training_data_sas_url,\n",
+    "    training_storage_container_path_prefix=training_data_path,\n",
    ")\n",
    "result = client.poll_result(response)\n",
    "if result is not None and \"status\" in result and result[\"status\"] == \"Succeeded\":\n",
```
0 commit comments
