|
23 | 23 | "\n", |
24 | 24 | "## Prerequisites\n", |
25 | 25 | "1. Ensure Azure AI service is configured following [steps](../README.md#configure-azure-ai-service-resource)\n", |
26 | | - "1. Follow steps in [Set env for trainging data](../docs/set_env_for_training_data_and_reference_doc.md) to add training data related env variables `TRAINING_DATA_SAS_URL` and `TRAINING_DATA_PATH` into the [.env](./.env) file.\n", |
27 | | - " - `TRAINING_DATA_SAS_URL`: SAS URL for your Azure Blob container. \n", |
28 | | - " - `TRAINING_DATA_PATH`: Folder path within the container to upload training data. \n", |
29 | | - "1. Install packages needed to run the sample\n", |
30 | | - "\n", |
31 | | - "\n" |
| 26 | + "2. Follow steps in [Set env for training data](../docs/set_env_for_training_data_and_reference_doc.md) to add training data related environment variables into the [.env](./.env) file.\n", |
| 27 | + "    - You can either set `TRAINING_DATA_SAS_URL` directly to the SAS URL for your Azure Blob container,\n", |
| 28 | + "    - Or set both `TRAINING_DATA_STORAGE_ACCOUNT_NAME` and `TRAINING_DATA_CONTAINER_NAME`, so the SAS URL can be generated automatically in a later step.\n", |
| 29 | + " - Also set `TRAINING_DATA_PATH` to specify the folder path within the container where training data will be uploaded.\n", |
| 30 | + "3. Install packages needed to run the sample\n" |
32 | 31 | ] |
33 | 32 | }, |
34 | 33 | { |
|
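The prerequisite above allows two environment-variable combinations. That precondition can be checked up front with the standard library alone; a minimal sketch, where `check_training_env` is a hypothetical helper and not part of the sample:

```python
import os


def check_training_env(env=os.environ) -> str:
    """Return which SAS mode the notebook will use, or raise if misconfigured."""
    if env.get("TRAINING_DATA_SAS_URL"):
        # Option 1: the SAS URL is provided directly.
        mode = "direct-sas"
    elif env.get("TRAINING_DATA_STORAGE_ACCOUNT_NAME") and env.get("TRAINING_DATA_CONTAINER_NAME"):
        # Option 2: the SAS URL will be generated from account + container name.
        mode = "generated-sas"
    else:
        raise ValueError(
            "Set TRAINING_DATA_SAS_URL, or both TRAINING_DATA_STORAGE_ACCOUNT_NAME "
            "and TRAINING_DATA_CONTAINER_NAME."
        )
    # TRAINING_DATA_PATH is required in either mode.
    if not env.get("TRAINING_DATA_PATH"):
        raise ValueError("Set TRAINING_DATA_PATH.")
    return mode
```

Running this right after loading the `.env` file surfaces a misconfiguration before any upload is attempted.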
119 | 118 | "metadata": {}, |
120 | 119 | "source": [ |
121 | 120 | "## Prepare labeled data\n", |
122 | | - "In this step, we will \n", |
123 | | - "- Check whether document files in local folder have corresponding `.labels.json` and `.result.json` files\n", |
124 | | - "- Upload these files to the designated Azure blob storage.\n", |
125 | | - "\n", |
126 | | - "We use **TRAINING_DATA_SAS_URL** and **TRAINING_DATA_PATH** that's set in the Prerequisites step." |
| 121 | + "In this step, we will\n", |
| 122 | + "- Use `TRAINING_DATA_PATH` and the SAS URL related environment variables that were set in the Prerequisites step.\n", |
| 123 | + "- Try to get the SAS URL from the environment variable `TRAINING_DATA_SAS_URL`.\n", |
| 124 | + "  If it is not set, we attempt to generate the SAS URL automatically from the environment variables `TRAINING_DATA_STORAGE_ACCOUNT_NAME` and `TRAINING_DATA_CONTAINER_NAME`.\n", |
| 125 | + "- Verify that document files in the local folder have corresponding `.labels.json` and `.result.json` files.\n", |
| 126 | + "- Upload these files to the Azure Blob storage container specified by the environment variables." |
127 | 127 | ] |
128 | 128 | }, |
129 | 129 | { |
|
132 | 132 | "metadata": {}, |
133 | 133 | "outputs": [], |
134 | 134 | "source": [ |
135 | | - "TRAINING_DATA_SAS_URL = os.getenv(\"TRAINING_DATA_SAS_URL\")\n", |
136 | | - "TRAINING_DATA_PATH = os.getenv(\"TRAINING_DATA_PATH\")\n", |
137 | | - "\n", |
138 | | - "await client.generate_training_data_on_blob(training_docs_folder, TRAINING_DATA_SAS_URL, TRAINING_DATA_PATH)" |
| 135 | + "training_data_sas_url = os.getenv(\"TRAINING_DATA_SAS_URL\")\n", |
| 136 | + "if not training_data_sas_url:\n", |
| 137 | + " TRAINING_DATA_STORAGE_ACCOUNT_NAME = os.getenv(\"TRAINING_DATA_STORAGE_ACCOUNT_NAME\")\n", |
| 138 | + " TRAINING_DATA_CONTAINER_NAME = os.getenv(\"TRAINING_DATA_CONTAINER_NAME\")\n", |
| 139 | + "    if not TRAINING_DATA_STORAGE_ACCOUNT_NAME or not TRAINING_DATA_CONTAINER_NAME:\n", |
| 140 | + " raise ValueError(\n", |
| 141 | + "            \"Please set either TRAINING_DATA_SAS_URL or both TRAINING_DATA_STORAGE_ACCOUNT_NAME and TRAINING_DATA_CONTAINER_NAME environment variables.\"\n", |
| 142 | + " )\n", |
| 143 | + " from azure.storage.blob import ContainerSasPermissions\n", |
| 144 | + " # We will need \"Write\" for uploading, modifying, or appending blobs\n", |
| 145 | + " training_data_sas_url = AzureContentUnderstandingClient.generate_temp_container_sas_url(\n", |
| 146 | + " account_name=TRAINING_DATA_STORAGE_ACCOUNT_NAME,\n", |
| 147 | + " container_name=TRAINING_DATA_CONTAINER_NAME,\n", |
| 148 | + " permissions=ContainerSasPermissions(read=True, write=True, list=True),\n", |
| 149 | + " expiry_hours=1,\n", |
| 150 | + " )\n", |
| 151 | + "\n", |
| 152 | + "training_data_path = os.getenv(\"TRAINING_DATA_PATH\")\n", |
| 153 | + "\n", |
| 154 | + "await client.generate_training_data_on_blob(training_docs_folder, training_data_sas_url, training_data_path)" |
139 | 155 | ] |
140 | 156 | }, |
141 | 157 | { |
|
145 | 161 | "## Create analyzer with defined schema\n", |
146 | 162 | "Before creating the analyzer, you should fill in the constant CUSTOM_ANALYZER_ID with a name relevant to your task. Here, we generate a unique suffix so this cell can be run multiple times to create different analyzers.\n", |
147 | 163 | "\n", |
148 | | - "We use **TRAINING_DATA_SAS_URL** and **TRAINING_DATA_PATH** that's set up in the [.env](./.env) file and used in the previous step." |
| 164 | + "We use `training_data_sas_url` and `training_data_path`, which were derived in the previous step from the environment variables set up in the [.env](./.env) file." |
149 | 165 | ] |
150 | 166 | }, |
151 | 167 | { |
|
160 | 176 | "response = client.begin_create_analyzer(\n", |
161 | 177 | " CUSTOM_ANALYZER_ID,\n", |
162 | 178 | " analyzer_template_path=analyzer_template,\n", |
163 | | - " training_storage_container_sas_url=TRAINING_DATA_SAS_URL,\n", |
164 | | - " training_storage_container_path_prefix=TRAINING_DATA_PATH,\n", |
| 179 | + " training_storage_container_sas_url=training_data_sas_url,\n", |
| 180 | + " training_storage_container_path_prefix=training_data_path,\n", |
165 | 181 | ")\n", |
166 | 182 | "result = client.poll_result(response)\n", |
167 | 183 | "if result is not None and \"status\" in result and result[\"status\"] == \"Succeeded\":\n", |
|