Skip to content

Conversation

@wxdwfc
Copy link
Collaborator

@wxdwfc wxdwfc commented Jul 27, 2025

This PR mainly do the following:

  1. Provide a base image (phos-base-113) to simplify building, e.g., now developer can use make build to quickly build PhOS (w/o building dependencies from scractch) and make server-run to start PhOS workspace without in an interactive container.

  2. Separate the build of some packages (e.g., g++) to a separate flag in phos build system to simplify building the container.

  3. Move the doc of build from source to a separate markdown file to make the README more concise.

The build has been tested on an A800 server.

@wxdwfc wxdwfc requested review from Copilot and zobinHuang July 27, 2025 12:07
@wxdwfc wxdwfc added documentation Improvements or additions to documentation enhancement New feature or request labels Jul 27, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a base Docker image (phos-base-113) to simplify PhOS building, separates package installation from core building through new flags, and reorganizes documentation by moving build instructions to a dedicated markdown file.

  • Adds a new Dockerfile and Makefile for containerized builds with pre-built dependencies
  • Introduces -p and -b flags to separate package installation from PhOS building
  • Moves comprehensive build documentation from README to a separate markdown file

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
scripts/utils/get_root_dir.sh Enhanced to handle non-git environments gracefully
scripts/build_scripts/main.go Added new flags for controlling package installation and building phases
scripts/build_scripts/cuda.go Moved NCCL installation logic and separated package vs build phases
scripts/build_scripts/pos.go Commented out NCCL installation (moved to cuda.go)
dockerfiles/build_113.Dockerfile New Dockerfile for creating base image with pre-built dependencies
Makefile New Makefile providing convenient targets for containerized operations
README.md Simplified to reference external documentation
docs/docs/getting_started/build_from_source.md Comprehensive build documentation moved from README

func CRIB_PhOS_CUDA(cmdOpt CmdOptions, buildConf BuildConfigs, logger *log.Logger) {
if cmdOpt.DoBuild {
if cmdOpt.DoPackage {
// ==================== Prepare ====================
Copy link

Copilot AI Jul 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The condition changed from cmdOpt.DoBuild to cmdOpt.DoPackage but this affects a large block of package installation code. Consider adding a comment explaining this separation of concerns between package installation and building phases.

Suggested change
// ==================== Prepare ====================
// ==================== Prepare ====================
// The following block installs necessary packages and tools required for the packaging phase.
// This separation ensures that package installation is only performed when packaging is requested,
// avoiding unnecessary installations during the building phase.

Copilot uses AI. Check for mistakes.
Comment on lines +32 to +39
//utils.CheckAndInstallMultiPackagesViaOsPkgManager([]string{
// "libnccl2=2.26.5-1+cuda12.9", "libnccl-dev=2.26.5-1+cuda12.9",
//}, logger)
//if !utils.CheckPackageViaOsPkgManager("libnccl-dev", logger) {
// utils.UnInstallPackageViaOsPkgManager("libnccl2", logger)
//}
//utils.CheckAndInstallPackageViaOsPkgManager("libnccl2=2.26.5-1+cuda12.9", logger)
//utils.CheckAndInstallPackageViaOsPkgManager("libnccl-dev=2.26.5-1+cuda12.9", logger)
Copy link

Copilot AI Jul 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Large blocks of commented code should be removed rather than left as comments. If this code needs to be preserved for reference, consider adding a brief comment explaining why it's disabled and where the functionality moved to.

Suggested change
//utils.CheckAndInstallMultiPackagesViaOsPkgManager([]string{
// "libnccl2=2.26.5-1+cuda12.9", "libnccl-dev=2.26.5-1+cuda12.9",
//}, logger)
//if !utils.CheckPackageViaOsPkgManager("libnccl-dev", logger) {
// utils.UnInstallPackageViaOsPkgManager("libnccl2", logger)
//}
//utils.CheckAndInstallPackageViaOsPkgManager("libnccl2=2.26.5-1+cuda12.9", logger)
//utils.CheckAndInstallPackageViaOsPkgManager("libnccl-dev=2.26.5-1+cuda12.9", logger)
// The NCCL installation logic has been disabled temporarily.
// Future updates will handle NCCL installation based on CUDA version.

Copilot uses AI. Check for mistakes.
repo_root=$(git rev-parse --show-toplevel)
echo "$repo_root"
else
echo "Not in a git repo."
Copy link

Copilot AI Jul 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message 'Not in a git repo.' is written to stdout, which could be problematic if the script output is used by other scripts expecting a directory path. Consider writing error messages to stderr or returning an error code.

Suggested change
echo "Not in a git repo."
echo "Not in a git repo." >&2
exit 1

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants