A discovery portal for Princeton research data.
Please note: while this is open-source software, we discourage anyone from simply checking it out and running it. Princeton specifics, from styling to authentication and authorization, are hard-coded, and we have not invested any time in the kind of configurability that would be needed for use at another institution. Instead, take it as an example of breaking a monolithic project into separate components and developing iteratively in response to local user feedback.
- Ruby: 3.1.0
- nodejs: 12.18.3
- yarn: 1.22.10
- postgres: `brew install postgresql@14; brew services start postgresql@14`
- Lando: 3.0.0
Update the file config/banner.yml. Note that each environment can have its own banner text.
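For illustration, here is a minimal sketch of how per-environment banner text can be read, assuming `config/banner.yml` is keyed by Rails environment (the actual file structure may differ):

```ruby
require "yaml"

# Hypothetical sketch: look up the banner text for the current environment.
# The key layout of config/banner.yml is an assumption here.
banner = YAML.load_file(Rails.root.join("config", "banner.yml"))
text = banner.fetch(Rails.env, nil) # e.g. banner["staging"]
```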
- Check out the code
- `bundle install`
- `yarn install`
We use Lando to run the services required for both test and development environments.

Start and initialize the Solr and database services with:

```
bundle exec rake servers:start
```

To stop the Solr and database services, run `bundle exec rake servers:stop` or `lando stop`.
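For context, a hedged sketch of what a `servers:start` style task can wrap, assuming Lando is on the PATH (the real task may do more, such as loading the Solr configuration):

```ruby
# Hypothetical sketch of a servers:start style rake task.
namespace :servers do
  desc "Start Lando services and prepare the databases"
  task :start do
    system("lando start") || abort("lando failed to start")
    system("bundle exec rake db:create db:migrate")
    system("RAILS_ENV=test bundle exec rake db:migrate")
  end
end
```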
- Fast: `bundle exec rspec spec`
- Run in browser: `RUN_IN_BROWSER=true bundle exec rspec spec`
We use Rubocop for our Ruby code and Prettier for our JavaScript.
- To run Rubocop, run `bundle exec rubocop`
- To autocorrect errors, run `bundle exec rubocop -a`
- To run Prettier via yarn lint, run `yarn lint`
- To run Prettier by itself to see more details on errors, run `yarn prettier app/javascript`
- To run Prettier to autocorrect errors, run `yarn prettier --write app/javascript`
- Terminal one: `bin/rails s -p 3000`
- Access pdc_discovery at http://localhost:3000/
To create a tagged release, use the steps in the RDSS handbook.
PDC Discovery indexes data from PDC Describe via the following rake task:

```
rake index:research_data
```

This rake task is scheduled to run every 60 minutes on the production and staging servers.
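As one way to express that hourly schedule, here is a hedged sketch using the whenever gem's DSL; the servers may use plain cron or another scheduler instead:

```ruby
# config/schedule.rb (hypothetical): run the indexing task every hour.
every 1.hour do
  rake "index:research_data"
end
```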
In production and staging we use SolrCloud to manage our Solr index. Our configuration uses a Solr alias that points to the current Solr collection we are using. For example, in staging the alias pdc-discovery-staging points to the pdc-discovery-staging-new collection. Our code points to the alias.
At the end of the indexing process we delete any Solr documents that were not touched during the indexing. The delete operation makes sure we don't keep records in PDC Discovery that are no longer in the source (PDC Describe).
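A sketch of how such a cleanup can be expressed with RSolr, assuming each run stamps documents with an indexed-at timestamp (the field name and environment variable are illustrative):

```ruby
require "rsolr"

# Hypothetical sketch: anything indexed before the run started was not
# touched during this run, so it can be deleted.
run_started_at = Time.now.utc # would be captured before indexing begins
solr = RSolr.connect(url: ENV["SOLR_URL"])
solr.delete_by_query("indexed_at_dtsi:[* TO #{run_started_at.iso8601}]")
solr.commit
```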
NOTE: We used to use two Solr collections (e.g. pdc-discovery-staging-1 and pdc-discovery-staging-2) and toggle between them. We do not use this approach anymore.
To make changes to the Solr schema in production/staging you need to update the files in the pul_solr repository and deploy them. The basic steps are:
Getting your changes into the pul_solr configuration for PDC Discovery
- Copy your configuration updates to pul_solr (this command assumes all your projects live in one folder on your machine):

  ```
  cp solr/conf/* ../pul_solr/solr_configs/pdc-discovery/conf/
  ```

- Create a draft PR in pul_solr with your changes (`<branch-name>` is the name of your new branch for the PR)
- Connect to the VPN.
- Optional: you can tunnel to the machine running Solr if you want to see your current configuration (e.g. `solrconfig.xml` or `schema.xml`):

  ```
  ssh -L 8983:localhost:8983 pulsys@lib-solr-staging4
  ```

- Make sure you are on the pul_solr repo.
- Deploy the changes, e.g. `BRANCH=<branch-name> bundle exec cap staging deploy`.
- Verify your changes have worked and mark your PR ready for review.
- Once the PR has been merged, coordinate a time to deploy the changes to production:

  ```
  bundle exec cap production deploy
  ```
You can see the list of Capistrano environments here.
The deploy will update the configuration for all Solr collections in the given environment, but it does not cause downtime. If you need to manually reload a configuration for a given Solr collection you can do it via the Solr Admin UI.
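If you prefer a scriptable alternative to the Admin UI, the same reload is exposed through Solr's Collections API; a hypothetical sketch (host and collection name are illustrative, and assume the SSH tunnel above):

```ruby
require "net/http"
require "uri"

# Reload one collection; this is the same action the Admin UI performs.
uri = URI("http://localhost:8983/solr/admin/collections" \
          "?action=RELOAD&name=pdc-discovery-staging-new")
puts Net::HTTP.get(uri)
```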
You can view the Honeybadger Uptime check. Currently it checks every minute and will report downtime when two checks fail in a row (i.e. we should know within 2 minutes).
To be notified of downtime, enable notifications in Honeybadger under: Settings + Alerts & Integrations + Email (Edit). Enable notifications for "Uptime Events" for "PDC Discovery Production". Note that email notification settings are per project.
Mailcatcher is a gem that can also be installed locally. See the mailcatcher documentation for how to run it on your machine.
To see mail that has been sent on the staging servers, you can use Capistrano to open both MailCatcher consoles in your browser:

```
cap staging mailcatcher:console
```

Look in your default browser for the consoles.
Emails on production are sent via Pony Express.
There is a data feed at /pppl_reporting_feed.json.
It provides a feed of the full JSON blob from PDC Describe for every object tagged as belonging to the Princeton Plasma Physics Laboratory group, sorted by most recently updated first. This is so PPPL can harvest data sets to report to OSTI.
This feed can be paged through using the parameters `per_page` and `page`, like this:
https://pdc-discovery-staging.princeton.edu/discovery/pppl_reporting_feed.json?per_page=2&page=3
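For example, a minimal sketch of harvesting the feed page by page; the page range and stopping condition here are illustrative, since they depend on the feed's actual response shape:

```ruby
require "json"
require "net/http"
require "uri"

base = "https://pdc-discovery-staging.princeton.edu/discovery/pppl_reporting_feed.json"
(1..3).each do |page| # illustrative; page until the feed returns no documents
  uri = URI("#{base}?per_page=100&page=#{page}")
  payload = JSON.parse(Net::HTTP.get(uri))
  puts payload
end
```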
There are two rake tasks that produce CSV files with information about the datasets.
- `bundle exec rake export:summary` generates a file that includes the list of datasets and their size (one line per dataset).
- `bundle exec rake export:details` generates a file that includes the list of datasets and their files (one line per file).
The generated file will be output to the `ENV["DATASET_FILE_TALLY_DIR"]` folder and will be named with today's timestamp.
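For illustration, a hedged sketch of how the output location and name can be derived; the file-name pattern and CSV columns are assumptions, not the tasks' actual output:

```ruby
require "csv"

# Hypothetical sketch: write a summary CSV named with today's timestamp
# into the directory given by DATASET_FILE_TALLY_DIR.
dir = ENV.fetch("DATASET_FILE_TALLY_DIR")
path = File.join(dir, "dataset_summary_#{Time.now.strftime('%Y-%m-%d')}.csv")
CSV.open(path, "w") do |csv|
  csv << ["dataset", "size"] # columns are illustrative
end
```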