@baasitsharief
Collaborator

  • Added Excel (.xlsx) file support for document ingestion
  • Enhanced indexing performance for tabular documents (CSV, Excel)
  • Updated dependencies and internal improvements
  • Minor UI changes for consistency
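
The commit notes in this PR mention adjusting indexer batch sizes by reader type to avoid Qdrant timeouts on tabular documents. A minimal sketch of that idea follows. The constants, the `batch_size_for` and `batched` helpers, and the assumption that tabular readers get the smaller batch are all hypothetical illustrations, not the values or code merged here.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical sizes for illustration only; the PR does not state the real ones.
DEFAULT_BATCH_SIZE = 100
TABULAR_BATCH_SIZE = 20

# Reader names taken from the commit messages; the lookup itself is an assumption.
TABULAR_READERS = {"CSVReader", "ExcelReader"}


def batch_size_for(reader_name: str) -> int:
    """Pick a smaller batch for tabular readers (assumed policy)."""
    if reader_name in TABULAR_READERS:
        return TABULAR_BATCH_SIZE
    return DEFAULT_BATCH_SIZE


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(45)]
    for chunk in batched(nodes, batch_size_for("ExcelReader")):
        # Each chunk would be embedded and upserted in one request.
        print(len(chunk))
```

Keeping the size decision in one function means a timeout fix only touches one place, which seems to be the direction of the later "simplify batch size logic" commit.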

mliu-cloudera and others added 10 commits October 7, 2025 14:30
#313)

* Convert Session into Pydantic

* Remove creation date and username

* Enable field validation

* Swap lines to make a bit more sequential sense
* re-add prebuilt-artifacts to the lfs support

* prebuilt_artifacts/fe-dist.tar.gz,prebuilt_artifacts/node-dist.tar.gz,prebuilt_artifacts/rag-api.jar: convert to Git LFS

---------

Co-authored-by: Michael Liu <[email protected]>
* Add RAG Studio documentation for UI guide and quickstart

* docs: Added settings configurations to UI guide

* docs: Add changelog and formatted UI guide and quickstart documentation

* docs: minor correction

* docs: Reorganize and refine RAG Studio UI guide images and content

* docs: Remove heading markers from image references

* Revise tool selection and settings descriptions

Updated the UI guide to improve clarity on tool selection and settings.

* docs: Update README.md
* Extend LFS support to prebuilt_artifacts/models

* Move EasyOCR model artifacts to Git LFS

* fix: EasyOCR model installation script

* fix: update EasyOCR model paths in installation script

* fix: simplify EasyOCR model installation script by removing existing file check

* Fix easyocr model installation script to work with LFS models

* Update EasyOCR model installation script print statement for clarity
* fix: minor UI fixes

* fix: reverting changes to label names
* feat: add support for reading XLSX files

* feat: add pandas excel dependencies

* fix: prevent lambda function from capturing loop variable in EmbeddingIndexer

* Use Executor.submit() args

* feat: renamed XlsxReader with ExcelReader for broader Excel file support

* refactor: renaming XlsxSplitter and fixing mypy errors

* refactor: rename config classes for consistency

* Update release version to dev-testing

* Cast TextNodes directly

* Simplify model_source if-else

* Remove implicit port conversion in config_to_env that stringified None

* Improve Qdrant configuration and performance with environment variables and gRPC support

* Remove unused environment variables and hardcode embedding concurrency and boto3 max pool connections

* minor fixes for configuration

* Reduce EmbeddingIndexer batch size and add botocore config to BedrockModelProvider

* Adjust batch size in EmbeddingIndexer based on reader type to prevent Qdrant timeouts

* Add support for CSVReader in EmbeddingIndexer and adjust batch size accordingly

* Refactor batch sizes and sampling for EmbeddingIndexer and SummaryIndexer to improve performance with tabular documents

* Enhance ExcelReader to handle empty workbooks and ensure JSON serialization compatibility

* Enhance ExcelReader to handle null dataframes and improve JSON serialization

* fix: Use non-deprecated `map` function over `applymap` in ExcelReader

* Refactor batch sizes in EmbeddingIndexer and SummaryIndexer to use Qdrant-safe batches

* Adjust batch sizes in LlamaIndexQdrantVectorStore

* fix: mypy errors

* Update release version to dev-testing

* Refactor Qdrant configuration and ExcelReader for improved performance and compatibility

* fix: more mypy issues

* Update release version to dev-testing

* Enable Git LFS for prebuilt artifacts

* merge origin/main

* Update prebuilt artifacts with new versions

* Update batch sizes for Qdrant vector store and indexing

* fix: Increase memory for application to allow excel use cases

* Update llm-service/app/ai/indexing/readers/base_reader.py

Co-authored-by: mliu-cloudera <[email protected]>

* Update llm-service/app/config.py

Co-authored-by: mliu-cloudera <[email protected]>

* Update llm-service/app/ai/vector_stores/qdrant.py

Co-authored-by: mliu-cloudera <[email protected]>

* Update llm-service/app/ai/indexing/embedding_indexer.py

Co-authored-by: mliu-cloudera <[email protected]>

* fix: minor fixes and adjustments for consistency

* refactor: simplify batch size logic in embedding and summary indexers

* Update llm-service/app/ai/indexing/readers/base_reader.py

Co-authored-by: mliu-cloudera <[email protected]>

* Update llm-service/app/ai/indexing/readers/base_reader.py

Co-authored-by: mliu-cloudera <[email protected]>

* Update llm-service/app/ai/indexing/summary_indexer.py

Co-authored-by: mliu-cloudera <[email protected]>

* refactor: reverting variable name batch_size to max_samples

* Update .DS_Store file in llm-service directory

---------

Co-authored-by: Michael Liu <[email protected]>
Co-authored-by: actions-user <[email protected]>
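
Two of the commits above ("prevent lambda function from capturing loop variable" and "Use Executor.submit() args") refer to Python's late-binding closures. The sketch below reproduces the general pitfall and the fix with `concurrent.futures`. The `embed` function and both wrappers are hypothetical stand-ins, not the actual `EmbeddingIndexer` code.

```python
from concurrent.futures import ThreadPoolExecutor


def embed(batch_index: int) -> int:
    # Stand-in for an embedding call; it simply returns its argument.
    return batch_index


def submit_with_lambda(n: int) -> list[int]:
    # Pitfall: each lambda looks up `i` when it runs, not when it is created.
    # All lambdas exist before any of them runs, so each sees the final `i`.
    thunks = [lambda: embed(i) for i in range(n)]
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(thunk) for thunk in thunks]
        return [future.result() for future in futures]


def submit_with_args(n: int) -> list[int]:
    # Fix: pass the value as an argument to submit(), which binds it
    # at submission time instead of at call time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(embed, i) for i in range(n)]
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(submit_with_lambda(3))  # [2, 2, 2]
    print(submit_with_args(3))    # [0, 1, 2]
```

Passing arguments through `Executor.submit(fn, *args)` avoids the closure entirely, which is usually easier to read than the `lambda i=i: ...` default-argument workaround.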
@baasitsharief baasitsharief merged commit f264171 into release/1 Oct 23, 2025
3 checks passed
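
Several ExcelReader commits in this PR concern null values and JSON serialization. As a rough illustration of why that matters, spreadsheet cells often hold values such as NaN or dates that do not map to valid JSON. The standard-library sketch below is an assumption about the kind of cleanup involved; `to_json_safe` and `rows_to_json` are hypothetical names, and the merged reader may work differently (the commits indicate it uses pandas and applies a per-cell function with `DataFrame.map`).

```python
import json
import math
from datetime import date, datetime
from typing import Any


def to_json_safe(value: Any) -> Any:
    """Convert one cell value into something json.dumps can emit as valid JSON."""
    if value is None:
        return None
    # NaN and infinities are not valid JSON, so treat them as missing.
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        return None
    # datetime is a subclass of date; both serialize as ISO 8601 strings.
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, (str, int, float, bool)):
        return value
    # Fall back to the string form for anything else (Decimal, bytes, ...).
    return str(value)


def rows_to_json(rows: list[dict[str, Any]]) -> str:
    """Serialize rows (one dict per spreadsheet row) after cleaning each cell."""
    cleaned = [{str(k): to_json_safe(v) for k, v in row.items()} for row in rows]
    return json.dumps(cleaned)


if __name__ == "__main__":
    sample = [{"amount": float("nan"), "day": date(2025, 10, 7), "note": "ok"}]
    print(rows_to_json(sample))
    print(rows_to_json([]))  # an empty sheet serializes to an empty list
```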

5 participants