Distributed Conversational AI Pipeline for Legacy CPU Clusters

Last updated: 2025-11-06

This project provides a complete solution for deploying a high-performance, low-latency conversational AI pipeline on a cluster of legacy, resource-constrained desktop computers. It uses Ansible for automated provisioning, Nomad for cluster orchestration, and a state-of-the-art AI stack to create a responsive, streaming, and embodied voice agent. For a detailed technical description of the system's layers, see the Holistic Project Architecture document.

1. System Requirements

Cluster Nodes: 3 to 20 legacy desktop computers (Intel Core 2 Duo or similar, 8GB RAM, SSD recommended).
Control Node: A machine to run Ansible for provisioning.
Recommended OS: Debian Trixie, minimal install with SSH server.

2. Project Structure

A brief overview of the key directories in this repository:

/ansible: Contains all Ansible playbooks, roles, and templates for provisioning and deploying the entire system.
- /ansible/roles: Individual, reusable components for managing specific parts of the system (e.g., nomad, consul, pipecatapp).
- /ansible/roles/pipecatapp/files: The core Python source code for the conversational agent, including app.py, memory.py, and the tools directory.
/prompt_engineering: Scripts and tools for evaluating and improving the AI's prompts using evolutionary algorithms.
/reflection: Scripts related to the agent's self-reflection and self-healing capabilities.
/scripts: Utility and linting scripts for maintaining code quality.
/testing: Contains unit and integration tests for the various components of the project.
/*.yaml: Top-level Ansible playbook files (e.g., playbook.yaml, heal_cluster.yaml).
/group_vars: Ansible configuration files that apply to all hosts, such as all.yaml and models.yaml.

3. Initial Machine Setup

Setting up a new cluster involves two steps: a one-time manual setup for the first node, followed by a fully automated setup for all subsequent nodes.

3.1. Manual Setup (First Node / PXE Server)

The first node in your cluster requires a manual OS installation. This node will later be configured by Ansible to act as the PXE/iPXE boot server for all other nodes.

Install Debian Trixie: Perform a standard, minimal installation of Debian Trixie with an SSH server.
Clone this repository: git clone <repo_url>
Configure Initial Settings: Enter the initial-setup directory and edit the setup.conf file. You must provide the machine's desired HOSTNAME, a static IP address, and the CONTROL_NODE_IP (which should be the static IP of this same machine, as it will become the control node).
Run Setup Script: Execute the script with root privileges: sudo bash setup.sh
Reboot.

After rebooting, this node is ready for Ansible provisioning (see Section 4). It should be designated as both a controller_node and your pxe_server in the Ansible inventory.

3.2. Automated Setup (All Other Nodes)

Once your first node has been provisioned by Ansible and the pxe_server role has been applied to it, you can automatically install Debian on all other bare-metal machines in your cluster.

This system uses an advanced iPXE-over-HTTP method that is significantly faster and more reliable than traditional PXE. For detailed instructions on how to apply the Ansible role and prepare the client machines for network booting, see the iPXE Boot Server Setup Guide.

4. Easy Bootstrap (Single-Server Setup)

For development, testing, or bootstrapping the very first node of a new cluster, you can use the provided bootstrap script. This is the recommended method for getting started.

On your control node, install Git: sudo apt install git -y
Clone this repository.
Run the Bootstrap Script: This script is a powerful wrapper around Ansible that handles all necessary steps to configure the local machine as a fully-functional, standalone agent, a cluster controller, or a new worker node.

Basic Usage:
```
./bootstrap.sh
```
- What this does: By default, the script runs the complete end-to-end process to configure the local machine as a standalone agent and control node. It invokes a series of Ansible playbooks that install and configure all necessary system components (Consul, Nomad, Docker) and deploy the AI agent services.
- You will be prompted for your sudo password, as the script needs administrative privileges to install and configure software.
Common Flags for Customizing the Bootstrap Process:

You can control the behavior of the bootstrap script with the following flags:
- --role <role>: Specify the role for the node.
  - all (default): Full setup for a standalone control node.
  - controller: Sets up only the core infrastructure services (Consul, Nomad, etc.).
  - worker: Configures the node as a worker and requires --controller-ip.
- --controller-ip <ip>: The IP address of the main controller node. Required when --role is worker.
- --tags <tag1,tag2>: Run only specific parts of the Ansible playbook (e.g., --tags nomad would only run the Nomad configuration tasks).
- --external-model-server: Skips the download and build steps for large language models. This is ideal for development or if you are using a remote model server.
- --purge-jobs: Stops and purges all running Nomad jobs before starting the bootstrap process, ensuring a clean deployment.
- --clean: Use with caution. This will permanently delete all untracked files in the repository (git clean -fdx), restoring it to a pristine state.
- --debug: Enables verbose Ansible logging (-vvvv) and saves the full output to playbook_output.log.
- --continue: If a previous bootstrap run failed, this flag will resume the process from the last successfully completed playbook, saving significant time.

This single node is now ready to be used as a standalone conversational AI agent. It can also serve as the primary "seed" node for a larger cluster. To expand your cluster, see the advanced guide below.

4. Advanced: Multi-Node Cluster Provisioning

If you are setting up a multi-node cluster, you will need to work with the Ansible inventory directly.

Configure Initial Inventory (inventory.yaml): Edit the inventory.yaml file to define your initial controller nodes. While new worker nodes will be added to the cluster automatically, you must define the initial seed nodes for the control plane here.
- Create a host group named controller_nodes. This group must contain at least one node that will act as the primary control node and Nomad server.
- Create an empty host group named worker_nodes. This group will be populated automatically as new nodes join the cluster.
Run the Main Playbook: Run the following command from the root of this repository. This will configure the initial control node(s) and prepare the cluster for auto-expansion.
```
ansible-playbook -i inventory.yaml playbook.yaml --ask-become-pass
```
- --ask-become-pass: This flag is important. It will prompt you for your sudo password, which Ansible needs to perform administrative tasks.
- What this does: This playbook not only configures the cluster services (Consul, Nomad, etc.) but also automatically bootstraps the primary control node into a fully autonomous AI agent by deploying the necessary AI services.

5. Expanding the Control Plane (Adding Controllers)

This cluster is designed for resilience and scalability. As your needs grow, you may need to add more controller nodes to the control plane for higher availability. This process is fully automated.

To promote an existing worker node to a controller:

Ensure the worker is part of the cluster: The node you wish to promote must already be a provisioned worker and visible in nomad node status.

Run the promotion playbook:

ansible-playbook promote_controller.yaml

Enter the hostname: You will be prompted to enter the exact hostname of the worker node you want to promote (e.g., worker1).

The playbook will handle everything:

It safely modifies the inventory.yaml file to move the node from the worker_nodes group to the controller_nodes group.
It stops the services on the target node, cleans up the old worker-specific state, and re-runs the consul and nomad configuration roles to re-provision it as a server.
The node will automatically rejoin the cluster as a controller, strengthening the control plane.
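
The inventory change performed during promotion can be pictured with a small sketch. This is not the code used by promote_controller.yaml; it assumes the inventory has already been parsed into a dictionary with the common all/children/<group>/hosts layout, and the function name promote_worker is hypothetical.

```python
import copy


def _hosts(children: dict, group: str) -> dict:
    """Return the hosts mapping of a group, creating it if it is missing or empty."""
    body = children.get(group) or {}
    children[group] = body
    hosts = body.get("hosts") or {}
    body["hosts"] = hosts
    return hosts


def promote_worker(inventory: dict, hostname: str) -> dict:
    """Return a copy of the inventory with `hostname` moved to controller_nodes.

    Assumption: groups are named worker_nodes and controller_nodes and live
    under all -> children, as in a typical Ansible YAML inventory.
    """
    updated = copy.deepcopy(inventory)  # leave the caller's data untouched
    children = updated["all"]["children"]
    workers = _hosts(children, "worker_nodes")
    controllers = _hosts(children, "controller_nodes")
    if hostname not in workers:
        raise KeyError(f"{hostname} is not in worker_nodes")
    # Host variables (e.g. ansible_host) travel with the host entry.
    controllers[hostname] = workers.pop(hostname)
    return updated
```

Working on a copy keeps the original inventory intact if the promotion fails partway, which is one way to make the edit "safe".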

6. Agent Architecture: The `TwinService`

The core of this application is the TwinService, a custom service that acts as the agent's "brain." It orchestrates the agent's responses, memory, and tool use.

6.1. Memory

Short-Term: Remembers the last 10 conversational turns in a simple list.
Long-Term: Uses a FAISS vector store (long_term_memory.faiss) to remember key facts. It performs a semantic search over this memory to retrieve relevant context for new conversations.
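
The two memory tiers can be illustrated with a minimal, standard-library-only sketch. It is not the code in memory.py: the class names are hypothetical, a bounded deque stands in for the short-term list, and a toy bag-of-words cosine similarity stands in for the FAISS index and a real embedding model.

```python
import math
from collections import Counter, deque


class ShortTermMemory:
    """Keeps only the most recent conversational turns."""

    def __init__(self, max_turns: int = 10):
        self.turns = deque(maxlen=max_turns)  # oldest turns fall off automatically

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))


def _embed(text: str) -> Counter:
    # Toy embedding (word counts). A real system would use a dense vector model.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LongTermMemory:
    """Stores facts and returns the ones most similar to a query."""

    def __init__(self):
        self.facts = []

    def add(self, fact: str) -> None:
        self.facts.append((fact, _embed(fact)))

    def search(self, query: str, k: int = 3) -> list:
        q = _embed(query)
        ranked = sorted(self.facts, key=lambda item: _cosine(q, item[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]
```

With a real vector store, the search method would embed the query and ask the index for its nearest neighbours instead of scoring every fact in a loop.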

6.2. Tool Use

The agent can use tools to perform actions and gather information. The TwinService dynamically provides the list of available tools to the LLM in its prompt, enabling the LLM to decide which tool to use based on the user's query.

Available Tools

SSH (ssh): Executes commands on remote machines.
Master Control Program (mcp): Provides agent introspection and self-control (e.g., status checks, memory management).
Vision (vision): Gets a real-time description of what is visible via the webcam.
Desktop Control (desktop_control): Provides full control over the desktop environment, including taking screenshots and performing mouse/keyboard actions.
Code Runner (code_runner): Executes Python code in a secure, sandboxed environment.
Web Browser (web_browser): Enables web navigation and content interaction.
Ansible (ansible): Runs Ansible playbooks to manage the cluster.
Power (power): Controls the cluster's power management policies.
Summarizer (summarizer): Summarizes conversation history.
Term Everything (term_everything): Provides a terminal interface for interacting with the system.
RAG (rag): Searches the project's documentation to answer questions.
Home Assistant (ha): Controls smart home devices via Home Assistant.
Git (git): Interacts with Git repositories.
Orchestrator (orchestrator): Dispatches high-priority, complex jobs to the cluster.
LLxprt Code (llxprt_code): A specialized tool for code-related tasks.
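
As a rough illustration of how a tool list might be injected into the LLM prompt, the sketch below renders a registry of tools into a prompt section. The Tool dataclass, the render_tool_prompt function, and the prompt wording are assumptions made for this example, not the TwinService implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str          # identifier the LLM uses to call the tool, e.g. "ssh"
    description: str   # one-line summary shown to the LLM


def render_tool_prompt(tools) -> str:
    """Build the prompt section that lists the currently available tools."""
    lines = ["You can use the following tools:"]
    for tool in sorted(tools, key=lambda t: t.name):
        lines.append(f"- {tool.name}: {tool.description}")
    return "\n".join(lines)


if __name__ == "__main__":
    registry = [
        Tool("ssh", "Executes commands on remote machines."),
        Tool("rag", "Searches the project's documentation to answer questions."),
    ]
    print(render_tool_prompt(registry))
```

Because the section is rebuilt from the registry on each call, adding or removing a tool changes what the LLM sees without editing the prompt text by hand.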

6.3. Mixture of Experts (MoE) Routing

The agent is designed to function as a "Mixture of Experts." The primary pipecat agent acts as a router, classifying the user's query and routing it to a specialized backend expert if appropriate.

How it Works: The TwinService prompt instructs the main agent to first classify the user's query. If it determines the query is best handled by a specialist (e.g., a 'coding' expert), it uses the route_to_expert tool. This tool call is intercepted by the TwinService, which then discovers the expert's API endpoint via Consul and forwards the query.
Configuration: Deploying these specialized experts is done using the deploy_expert.yaml Ansible playbook. For detailed instructions, see the Advanced AI Service Deployment section below.
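
The interception and discovery step can be sketched as follows. This is a hedged illustration rather than the TwinService code: the shape of the tool call (name, arguments.expert, arguments.query), the function names, and the local Consul address are assumptions. The Consul health endpoint /v1/health/service/<name> and its passing filter are part of Consul's HTTP API.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

CONSUL_URL = "http://127.0.0.1:8500"  # assumption: local Consul agent on its default HTTP port


def fetch_json(url: str):
    """Fetch and decode a JSON document over HTTP."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def resolve_expert(expert: str, fetch=fetch_json):
    """Return the base URL of a healthy expert instance, or None if none is registered."""
    service = f"llama-api-{expert}"  # naming convention taken from this README
    entries = fetch(f"{CONSUL_URL}/v1/health/service/{quote(service)}?passing=true")
    if not entries:
        return None
    first = entries[0]
    # Consul leaves Service.Address empty when the service uses the node's address.
    address = first["Service"]["Address"] or first["Node"]["Address"]
    return f"http://{address}:{first['Service']['Port']}"


def handle_tool_call(call: dict, fetch=fetch_json):
    """Intercept route_to_expert calls; return None for any other tool."""
    if call.get("name") != "route_to_expert":
        return None
    args = call["arguments"]
    base_url = resolve_expert(args["expert"], fetch)
    if base_url is None:
        return {"error": f"no healthy expert named {args['expert']!r}"}
    return {"forward_to": base_url, "query": args["query"]}
```

The fetch parameter exists so the routing logic can be exercised without a running Consul agent.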

6.4. Configuring Agent Personas

The personality and instructions for the main router agent and each expert agent are defined in simple text files located in the ansible/roles/pipecatapp/files/prompts/ directory. You can edit these files to customize the behavior of each agent. For example, you can edit coding_expert.txt to give it a different programming specialty.

7. Interacting with the Agent

There are two primary ways to interact with the conversational agent: the web interface and the Gemini CLI extension.

7.1. Web Interface

Navigate to the IP address of any node in your cluster on port 8000 (e.g., http://192.168.1.101:8000). The web UI provides real-time conversation logs, a request-approval interface, and the ability to save and load the agent's memory state.

7.2. Gemini CLI Extension

For command-line users, a Gemini CLI extension is provided to send messages directly to the agent.

7.2.1. First-Time Setup

Install the Gemini CLI:
```
npm install -g @google/gemini-cli
```
Navigate to the extension directory:
```
cd pipecat-agent-extension
```
Install dependencies and build the extension:
```
npm install
npm run build
```
Link the extension to your Gemini CLI installation:
```
gemini extensions link .
```

7.2.2. Sending a Message

Once the extension is linked, you can use the custom /pipecat:send command to send a message to the agent:

gemini /pipecat:send "Your message here"

Example:

gemini /pipecat:send "Can you write a python script to list files in a directory?"

The agent will process this message as if you had typed it in the web UI.

8. AI Service Deployment

The system is designed to be self-bootstrapping. The bootstrap.sh script (or the main playbook.yaml) handles the deployment of the core AI services on the primary control node. This includes a default instance of the llama-expert job and the pipecat voice agent.

8.1. Start, Restart, or "Heal" Your Core Services

If a job has been stopped, or you just want to verify that everything is running as it should, use the lightweight heal_cluster.yaml playbook. It skips all system setup and only manages the Nomad jobs.

ansible-playbook heal_cluster.yaml

If you make a change to a job file or need to restart the services from a clean state, it's best to purge the old jobs before running the start script again.

nomad job stop -purge llamacpp-rpc
nomad job stop -purge pipecat-app

8.2. Advanced: Deploying Additional AI Experts

The true power of this architecture is the ability to deploy multiple, specialized AI experts that the main pipecat agent can route queries to. With the new unified llama-expert.nomad job template, deploying a new expert is handled through a dedicated Ansible playbook.

Define a Model List for Your Expert: First, open group_vars/models.yaml and create a new list of models for your expert. For example, to create a creative-writing expert, you could add:
```
creative_writing_models:
  - name: "phi-3-mini-instruct"
    # ... other model details
```
Deploy the Expert with Ansible: Use the deploy_expert.yaml playbook to render the Nomad job with your custom parameters and launch it. You pass variables on the command line using the -e flag.
- Example: Deploying a creative-writing expert to the creative namespace:
```
ansible-playbook deploy_expert.yaml -e "job_name=creative-expert service_name=llama-api-creative namespace=creative model_list={{ creative_writing_models }} worker_count=2"
```

The TwinService in the pipecatapp will automatically discover any service registered in Consul with the llama-api- prefix and make it available for routing.
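
Prefix-based discovery can be approximated with the sketch below. It assumes the input is the decoded response of Consul's /v1/catalog/services endpoint, which maps service names to their tags; the function name list_experts is hypothetical and not taken from the project's code.

```python
EXPERT_PREFIX = "llama-api-"


def list_experts(catalog: dict) -> list:
    """Return the expert names found in a Consul service catalog.

    `catalog` maps service names to tag lists, as returned by
    GET /v1/catalog/services.
    """
    return sorted(
        name[len(EXPERT_PREFIX):]
        for name in catalog
        if name.startswith(EXPERT_PREFIX) and len(name) > len(EXPERT_PREFIX)
    )


if __name__ == "__main__":
    sample = {"consul": [], "nomad": ["http"], "llama-api-creative": [], "llama-api-coding": []}
    print(list_experts(sample))
```

With this convention, the service_name passed to deploy_expert.yaml (llama-api-creative in the example above) is what makes a new expert visible to the router.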

9. Advanced System Features

9.1. Power Management

To optimize resource usage on legacy hardware, this project includes an intelligent power management system.

How it Works: A Python service, power_agent.py, uses an eBPF program (traffic_monitor.c) to monitor network traffic to specific services at the kernel level with minimal overhead.
Sleep/Wake: If a monitored service is idle for a configurable period, the power agent automatically stops the corresponding Nomad job. When new traffic is detected, the agent restarts the job.
Configuration: The agent can configure this behavior using the power.set_idle_threshold tool.
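
The sleep/wake decision can be illustrated with a small sketch. It models only the idle-threshold bookkeeping. The eBPF traffic monitor and the Nomad calls that stop or restart jobs are left out, and the IdleTracker class and its method names are assumptions, not the contents of power_agent.py.

```python
import time


class IdleTracker:
    """Tracks the last traffic seen per service and decides when to sleep or wake it."""

    def __init__(self, idle_threshold_s: float, clock=time.monotonic):
        self.idle_threshold_s = idle_threshold_s
        self._clock = clock          # injectable so the logic can be tested without waiting
        self._last_seen = {}
        self._stopped = set()

    def record_traffic(self, service: str):
        """Note new traffic. Returns "start" if the service was asleep, else None."""
        self._last_seen[service] = self._clock()
        if service in self._stopped:
            self._stopped.discard(service)
            return "start"
        return None

    def services_to_stop(self) -> list:
        """Return running services idle for at least the threshold and mark them stopped."""
        now = self._clock()
        idle = [
            service
            for service, seen in self._last_seen.items()
            if service not in self._stopped and now - seen >= self.idle_threshold_s
        ]
        self._stopped.update(idle)
        return idle
```

A power agent built this way would call record_traffic whenever the monitor reports packets for a service and poll services_to_stop on a timer, stopping or restarting the matching Nomad job in response.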

9.2. Mission Control Web UI

This project includes a web-based dashboard for real-time display and debugging. To access it, navigate to the IP address of any node in your cluster on port 8000 (e.g., http://192.168.1.101:8000). The UI provides:

Real-time conversation logs.
A request-approval interface for sensitive tool actions.
The ability to save and load the agent's memory state.

10. Testing and Verification

Check Cluster Status: nomad node status
Check Job Status: nomad job status
View Logs: nomad alloc logs <allocation_id> or use the Mission Control Web UI.

Cluster Health Check

A dedicated health check job exists to verify the status of all running LLM experts. This provides a quick way to ensure the entire cluster is operational.

Run the check: ansible-playbook run_health_check.yaml
View results: nomad job logs health-check
Manual Test Scripts: A set of scripts for manual testing of individual components is available in the testing/ directory.

10.1. Code Quality and Linting

This project uses a suite of linters to ensure code quality and consistency. For detailed instructions on how to install the development dependencies and run the checks, please see the Linting Documentation.

To run all linters, use the following command:

npm run lint

11. Performance Tuning & Service Selection

Model Selection: The llama-expert.nomad job is configured via Ansible variables in group_vars/models.yaml. You can define different model lists for different experts.
Network: Wired gigabit ethernet is strongly recommended over Wi-Fi for reduced latency.
VAD Tuning: The RealtimeSTT sensitivity can be tuned in app.py for better performance in noisy environments.
STT/TTS Service Selection: You can choose which Speech-to-Text and Text-to-Speech services to use by setting environment variables in the pipecatapp.nomad job file.

12. Benchmarking

This project includes two types of benchmarks.

12.1. Real-Time Latency Benchmark

Measures the end-to-end latency of a live conversation. Enable it by setting BENCHMARK_MODE = "true" in the env section of the pipecatapp.nomad job file. Results are printed to the job logs.
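
A latency measurement gated by this variable could look roughly like the sketch below. It is not the instrumentation in app.py; the helper names benchmark_enabled and timed are hypothetical, and only the BENCHMARK_MODE variable comes from this README.

```python
import os
import time
from contextlib import contextmanager


def benchmark_enabled(env=os.environ) -> bool:
    """True when BENCHMARK_MODE is set to "true" (case-insensitive)."""
    return env.get("BENCHMARK_MODE", "").lower() == "true"


@contextmanager
def timed(label: str, results: dict, clock=time.perf_counter):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = clock()
    try:
        yield
    finally:
        results[label] = clock() - start


if __name__ == "__main__":
    results = {}
    if benchmark_enabled():
        with timed("turn", results):
            time.sleep(0.05)  # placeholder for one full conversational turn
        print(f"end-to-end latency: {results['turn'] * 1000:.1f} ms")
```

Using a monotonic high-resolution clock such as time.perf_counter avoids errors from system clock adjustments during a measurement.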

12.2. Standardized Performance Benchmark

Uses llama-bench to measure the raw inference speed (tokens/sec) of the deployed LLM backend. Run the benchmark.nomad job to test the performance of the default model.

nomad job run /opt/nomad/jobs/benchmark.nomad

View results in the job logs: nomad job logs llama-benchmark

13. Advanced Development: Prompt Evolution

For advanced users, this project includes a workflow for automatically improving the agent's core prompt using evolutionary algorithms. See prompt_engineering/PROMPT_ENGINEERING.md for details.
