@marimeireles marimeireles commented Nov 10, 2025

Introduces analytics.

This is just an MVP in which I've added analytics for the search and tried to add it for email. I'm posting it early in case this is not going in the direction you imagined, Brian.

  • I've created a new container for MySQL, as I said in the issue, so we can have persistent data across runs. Let me know if you think there might be a better solution for this.
  • Analytics uses HTTP to avoid SSL certificate issues when loading cross-origin scripts
  • The large files are there so the user gets a preset golden state of Matomo

Known issues:

  • SnappyMail doesn't allow analytics because of its CSP (not sure how to go about it)

Let me know what you think, I hope to have more time to work on this this week!
Best,

replace "</body>" "<script src='https://performance.zoo/shared.js' async defer></script></body>"
replace "</BODY>" "<script src='https://performance.zoo/shared.js' async defer></script></BODY>"
replace "</body>" "<script src='http://performance.zoo/shared.js' async defer></script></body>"
Owner

I wonder if we could just use //performance.zoo/shared.js here so it inherits from the page it's being embedded in. Would this solve the snappymail issue you mention @marimeireles or is that separate?

Also, are the cert errors you're seeing due to the browser not having the cert imported (so no https works at all, even page loads), or is it specific to this?
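
To illustrate the protocol-relative suggestion: a `//host/path` reference takes its scheme from the page that embeds it. The sketch below uses the standard `URL` constructor to mimic that resolution; `resolveScriptUrl` is a hypothetical helper name, and the two page URLs are only examples.

```javascript
// Resolve a script reference against the embedding page's URL,
// the same way a browser resolves a <script src> attribute.
function resolveScriptUrl(src, pageUrl) {
  return new URL(src, pageUrl).href;
}

const sharedScript = "//performance.zoo/shared.js";

// On an HTTPS page the script is fetched over HTTPS...
console.log(resolveScriptUrl(sharedScript, "https://classifieds.zoo/"));
// ...and on an HTTP page it is fetched over HTTP.
console.log(resolveScriptUrl(sharedScript, "http://postmill.zoo/"));
```

This only changes which scheme is requested; it would not by itself make an untrusted certificate trusted.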

Author

HTTPS seems to work: pages load as long as I manually "accept the risk". Am I missing something in the config?
Changing the URL didn't fix the certificate problem.

My understanding is that the problem I was having with SnappyMail was a CSP one.
I was able to remove the CSP from the page, and now I'm getting cache issues! :/ I can't seem to get the page to load the injected shared.js. I'm not sure what the problem is: I've cleared the SnappyMail cache manually and still get the cached page.
I'll take a look at it later. I guess I'll focus on adding the other pages next.
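
For anyone debugging the CSP side, here is a rough sketch of checking whether a `Content-Security-Policy` header value would let a given script load. It is deliberately simplified: it only handles `*`, exact origins, bare hosts, and scheme sources, and ignores `'self'`, nonces, hashes, wildcard hosts, and `script-src-elem`. The helper names and the policy strings are hypothetical, not SnappyMail's actual policy.

```javascript
// Parse a CSP header value into a { directive: [sources...] } object.
function parseCsp(headerValue) {
  const directives = {};
  for (const part of headerValue.split(";")) {
    const [name, ...sources] = part.trim().split(/\s+/);
    if (name) directives[name.toLowerCase()] = sources;
  }
  return directives;
}

// Simplified check (assumption: no 'self', nonce, hash, or wildcard-host support).
function allowsScript(headerValue, scriptUrl) {
  const csp = parseCsp(headerValue);
  // script-src falls back to default-src when it is absent.
  const sources = csp["script-src"] ?? csp["default-src"];
  if (!sources) return true; // no applicable directive, so scripts are unrestricted
  const { origin, host, protocol } = new URL(scriptUrl);
  return sources.some(
    (s) => s === "*" || s === origin || s === host || s === protocol
  );
}

// Hypothetical policies, for illustration only.
const strictPolicy = "default-src 'self'; script-src 'self' 'unsafe-inline'";
const relaxedPolicy = "default-src 'self'; script-src 'self' https://performance.zoo";

console.log(allowsScript(strictPolicy, "https://performance.zoo/shared.js"));
console.log(allowsScript(relaxedPolicy, "https://performance.zoo/shared.js"));
```

If the real header looks like the first policy, adding the analytics origin to `script-src` is the usual alternative to removing the CSP entirely.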

(() => {
const u = "//analytics.zoo/";
_paq.push(['setTrackerUrl', `${u}matomo.php`]);
_paq.push(['setSiteId', '2']);
Owner

Where does this site ID come from? Is this something you created inside the golden state? And what does it actually look like inside Matomo, e.g. will it automatically separate reporting for classifieds.zoo and postmill.zoo based on the hostname? Or would we need to configure each site individually?

Author

[Screenshot 2025-11-13 at 9 50 51 PM]

This is how the "Websites" tab looks: you can click on a website and it takes you to an individual dashboard for each tracker. We do need to configure each site individually, but that's what the golden state does. I'll pre-configure everything we currently have (mail, Git, search, etc.), and the information will be tracked automatically and saved to MySQL. Since my latest update the data is no longer persisted across runs, so it has to be recovered by the harness. The ID is how Matomo keeps track of which website is which.
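
As a rough sketch of the per-site configuration described above, the shared script could look up the site ID from the page's hostname and queue the matching tracker commands. The hostname-to-ID table below is made up for illustration (the real IDs would come from the golden state), and `buildTrackerCommands` is a hypothetical helper; the command names (`setTrackerUrl`, `setSiteId`, `trackPageView`, `enableLinkTracking`) are standard Matomo tracker calls.

```javascript
// Hypothetical hostname -> Matomo site ID table; real IDs live in the golden state.
const SITE_IDS = new Map([
  ["classifieds.zoo", 1],
  ["postmill.zoo", 2],
]);

// Build the _paq commands for one hostname, or nothing if the site is unknown.
function buildTrackerCommands(hostname, siteIds, trackerBase = "//analytics.zoo/") {
  const siteId = siteIds.get(hostname);
  if (siteId === undefined) return []; // unconfigured site: track nothing
  return [
    ["setTrackerUrl", `${trackerBase}matomo.php`],
    ["setSiteId", String(siteId)],
    ["trackPageView"],
    ["enableLinkTracking"],
  ];
}

// In a browser, push the commands onto Matomo's queue; elsewhere, just print them.
const commands = buildTrackerCommands("postmill.zoo", SITE_IDS);
if (typeof window !== "undefined") {
  const _paq = (window._paq = window._paq || []);
  for (const command of commands) _paq.push(command);
} else {
  console.log(commands);
}
```

Keeping the table in one place would make it easier to see which sites the golden state is expected to contain.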

Owner

bgrins commented Nov 11, 2025

Thanks! I'm inclined not to worry about persistence (and instead create the analytics db just like any other in https://github.com/bgrins/the_zoo/blob/ae844f927bfb529e067674fa808ac66ba3a5fe94/core/mysql/init-databases.sh). In cases where persistence is important, an external harness could manage dumping and aggregating results across restarts. That would simplify the patch and marginally help with resource usage at runtime by avoiding an extra db container.

```javascript
} else if (currentDomain === 'example.zoo') {
  var _paq = window._paq = window._paq || [];
  _paq.push(['trackPageView']);
```
Owner

Lots of interesting options to consider here; see trackAllContentImpressions, for example: https://developer.matomo.org/api-reference/tracking-javascript
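
A minimal sketch of what opting into content tracking might look like, assuming the page marks its content blocks the way Matomo expects (for example with a `data-track-content` attribute). `buildContentTrackingCommands` and its `visibleOnly` option are hypothetical; `trackAllContentImpressions` and `trackVisibleContentImpressions` are the Matomo tracker methods listed in the reference linked above.

```javascript
// Build the queue entries for a page view plus content impressions.
// visibleOnly is a hypothetical option: when true, only impressions for
// blocks that are actually visible in the viewport are requested.
function buildContentTrackingCommands({ visibleOnly = false } = {}) {
  return [
    ["trackPageView"],
    visibleOnly
      ? ["trackVisibleContentImpressions"]
      : ["trackAllContentImpressions"],
  ];
}

// In a browser, push onto Matomo's queue; elsewhere, just print the commands.
const contentCommands = buildContentTrackingCommands({ visibleOnly: true });
if (typeof window !== "undefined") {
  const _paq = (window._paq = window._paq || []);
  for (const command of contentCommands) _paq.push(command);
} else {
  console.log(contentCommands);
}
```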

Author

Will look into these tomorrow! : )

Author

Makes sense! I'm working on this rn and will probably have an up-to-date PR by the end of my CET day!

"**/tests/*.skip.js",
"**/tests/tools/**",
"**/tests/fresh/**",
"**/tests/playwright/**",
Owner

I think the other Playwright tests might be broken. We can work on that separately, but in the meantime I suggest you keep the exclusion here.

Owner

Feel free to keep the new playwright test in the patch, we can figure out getting it running later

- MATOMO_DATABASE_DBNAME=analytics_db
- MATOMO_DATABASE_TABLES_PREFIX=matomo_
- PHP_MEMORY_LIMIT=256M
volumes:
Owner

I've typically not mounted volumes like these since we don't want data persisted across restarts. Have you found this to be necessary?

echo " Size: $(du -h core/mysql/sql/analytics_seed.sql | cut -f1)"

echo ""
echo "2️⃣ Capturing Matomo application data..."
Owner

Does this config folder contain seed-specific state? We have to do that for Gitea because it stores the actual repos on disk, but for many sites it's not necessary (if the web server is more or less stateless).

Owner

I see some specific stuff in config.ini.php. If that doesn't get auto-generated at startup, maybe that's all we need to include in the golden dir? I'm fine to land and optimize later if we know that at least the config is necessary to capture.
