@kakkoyun
Member

@kakkoyun kakkoyun commented Nov 4, 2025

Overview

Adds a complete observability infrastructure stack for demo applications with production-ready monitoring, tracing, and metrics capabilities.

Key Features

Infrastructure Components

  • Jaeger v2.11.0 - Distributed tracing with native OTLP support and SPM (Service Performance Monitoring)
  • Prometheus v3.7.3 - Metrics collection with native OTLP receiver
  • OpenTelemetry Collector v0.139.0 - Telemetry data pipeline
  • Grafana v12.2.1 - Observability dashboards with metric-to-trace correlation
  • k6 v1.3.0 - Load testing for HTTP and gRPC endpoints

Pre-configured Dashboards

  • Go runtime metrics for smoke testing
  • OpenTelemetry Collector health monitoring
  • APM service overview dashboard
  • Metric-to-trace correlation enabled

Developer Experience

  • Docker Compose setup with 40+ Makefile commands
  • Dockerfiles for gRPC demo applications
  • k6 load testing scripts (HTTP/gRPC)
  • Dockerfile linting with hadolint
  • Comprehensive documentation

Quick Start

cd demo/infrastructure/docker-compose
make quickstart

Access points:

What's Included

Commits Summary

  1. Initial observability infrastructure setup
  2. Updated all components to latest stable versions (2024-2025)
  3. Added Dockerfile static checking with hadolint
  4. Created Dockerfiles for demo applications
  5. Fixed configuration issues
  6. Added Go runtime metrics dashboards
  7. Added APM dashboard for service monitoring
  8. Enabled Jaeger SPM for service performance metrics

Testing

All infrastructure components tested and verified working together with the gRPC demo applications.

@github-actions github-actions bot added the conventional-commit/chore Something that needs to be taken care of but not very appealing label Nov 4, 2025
@kakkoyun kakkoyun changed the title chore(demo): add observability infrastructure WIP: chore(demo): add observability infrastructure Nov 4, 2025
@github-actions

github-actions bot commented Nov 4, 2025

The title of this pull request does not match the conventional commits format.
Please update the title, as appropriate.

Refer to the CONTRIBUTING.md file for more information.
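
For contributors who want to check a title locally before pushing, the rule the bot enforces can be approximated with a regular expression. This is a minimal sketch, not the repository's actual check: the function name `is_conventional_title` and the list of accepted types (taken from the common Conventional Commits convention) are assumptions, and CONTRIBUTING.md remains the authority.

```shell
#!/bin/sh
# Sketch only: approximates a Conventional Commits title check.
# The accepted types below are an assumption based on the common convention,
# not this repository's actual configuration.
is_conventional_title() {
  # Format: <type>(<optional scope>)<optional !>: <description>
  printf '%s\n' "$1" | grep -Eq \
    '^(build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test)(\([A-Za-z0-9_./-]+\))?!?: .+'
}
```

Under this pattern, `chore(demo): add observability infrastructure` passes, while `WIP: chore(demo): add observability infrastructure` fails because `WIP` is not one of the listed type prefixes.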


@kakkoyun kakkoyun changed the title WIP: chore(demo): add observability infrastructure chore(demo): add observability infrastructure - WIP Nov 4, 2025
@kakkoyun kakkoyun force-pushed the add_demo_plumbing branch 2 times, most recently from 163118d to a21a31f Compare November 6, 2025 13:55
@kakkoyun kakkoyun changed the title chore(demo): add observability infrastructure - WIP chore(demo): add observability infrastructure Nov 6, 2025
@kakkoyun kakkoyun marked this pull request as ready for review November 6, 2025 19:37
@kakkoyun kakkoyun requested a review from a team as a code owner November 6, 2025 19:37
@echo "Linting YAML files..."
yamlfmt -lint -dstar '**/*.yml' '**/*.yaml'

lint/dockerfile: ## Lint Dockerfiles
Contributor

I don't think we need a Dockerfile linter and a GitHub Actions check for Dockerfiles. This isn't a Dockerfile-heavy project, and the benefit doesn't justify the added complexity.

Member Author

Does it hurt having one? Where is the complexity?

It can save development time by catching trivial errors.

Contributor

@y1yang0 y1yang0 Nov 7, 2025

Good point about catching trivial errors. My concern isn't about the linter itself, but about integrating it into the global Makefile and CI pipeline (Makefile changes + .hadolint.yaml + lint-dockerfil.yaml).

Since we only have a couple of Dockerfiles that rarely change, adding a mandatory check for everyone feels like a bit of an over-optimization. It adds a dependency and a step to our shared workflow for a very specific use case.

Perhaps this would be better as a recommendation for local development for those who touch the Dockerfiles? That way, we get the benefit without adding overhead to the central build process.

Member Author

We can compromise and not run it in CI.
Or we can run it only when a Dockerfile has actually changed.
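
The second option could look roughly like the sketch below: list the files changed against a base ref, keep only Dockerfiles, and run hadolint on those. The function names, the `origin/main` default base ref, and the filename pattern are assumptions for illustration; the PR's actual Makefile target and workflow are not shown here.

```shell
#!/bin/sh
# Sketch only: lint Dockerfiles only when they changed.
# Function names, the default base ref, and the filename pattern are assumptions.

# Read paths on stdin and print only those that look like Dockerfiles
# (Dockerfile, Dockerfile.<suffix>, or <name>.Dockerfile).
filter_dockerfiles() {
  grep -E '(^|/)(Dockerfile(\.[^/]+)?|[^/]+\.Dockerfile)$' || true
}

# Run hadolint on every Dockerfile added, copied, modified, or renamed
# relative to the base ref (first argument, default origin/main).
lint_changed_dockerfiles() {
  base="${1:-origin/main}"
  changed="$(git diff --name-only --diff-filter=ACMR "$base"...HEAD | filter_dockerfiles)"
  if [ -z "$changed" ]; then
    echo "No Dockerfile changes; skipping hadolint."
    return 0
  fi
  status=0
  # Iterate line by line so paths containing spaces are handled.
  while IFS= read -r file; do
    hadolint "$file" || status=1
  done <<EOF
$changed
EOF
  return "$status"
}
```

Calling `lint_changed_dockerfiles` from a local target keeps the check out of everyone's default workflow. In CI, a similar effect can be had with a path filter on the workflow trigger, so the job never starts unless a Dockerfile is touched.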

Contributor

Okay, I'll put away my magnifying glass for now ;) It's not a critical issue, but I'd still like to see this removed in the future if the Dockerfiles prove to be stable with few changes. The number of make all targets is already getting a bit intimidating...

Member Author

For the Go tooling, I plan to introduce another approach and clean up the Makefile soon.
I believe static checks and developer convenience tooling shouldn't intimidate us. This is an open-source project and there will be a lot of contributors (I hope), so automated tools are better than reviewers trying to enforce the standards.

@kakkoyun kakkoyun marked this pull request as draft November 7, 2025 10:26
Adds complete observability stack for demo applications with:
- Docker Compose setup for Jaeger, Prometheus, OTel Collector, Grafana
- Pre-configured Grafana dashboards (Go metrics, OTel Collector, Services)
- k6 load testing scripts for HTTP and gRPC
- Dockerfiles for gRPC demo applications
- Makefile with 40+ convenience commands
- Comprehensive documentation

Infrastructure is production-ready with latest 2024-2025 versions:
- Jaeger v2.0.0 (native OTLP)
- Prometheus v3.0.1 (native OTLP receiver)
- OpenTelemetry Collector v0.138.0
- Grafana with metric-to-trace correlation

Located under demo/infrastructure/ with future Kubernetes support planned.

Signed-off-by: Kemal Akkoyun <[email protected]>
Updates all observability stack components to latest stable versions:
- Jaeger: 2.0.0 → 2.11.0
- Prometheus: v3.0.1 → v3.7.3
- OpenTelemetry Collector: 0.138.0 → 0.139.0
- Grafana: latest → 12.2.1 (explicit version)
- k6: latest → 1.3.0 (explicit version)

Signed-off-by: Kemal Akkoyun <[email protected]>
@kakkoyun kakkoyun requested a review from y1yang0 November 7, 2025 12:31
@kakkoyun kakkoyun marked this pull request as ready for review November 7, 2025 12:31
Contributor

@y1yang0 y1yang0 left a comment

Style and quality LGTM. @NameHaibinZhang @ralf0131 @pdelewski can you take a look at the functional side?


@pdelewski
Member

All the tooling seems very useful. However, at the moment I'm experiencing the following issue:
[image]

@y1yang0
Contributor

y1yang0 commented Nov 11, 2025

> All the tooling seems very useful. However, at the moment I'm experiencing the following issue: [image]

Maybe rm .otel-build and build again?
