Solr API
LabCAS uses the Solr search engine to store, search, and retrieve metadata for the science data in the EDRN Cancer Biomarker Commons. LabCAS also provides an API that lets authenticated EDRN users run searches against Solr. This document will help you get started with this API.
The intent of this documentation is not to replace Solr's documentation. You are encouraged to read the Solr Common Query Parameters documentation to learn how to construct queries for Solr. Some example queries will be given within this document. Please note LabCAS uses Solr version 6.6.
This document uses Postman to make queries to the LabCAS Solr API, as it takes care of formulating URLs, quoting parameters, and so forth. Postman can also generate code for Python, C#, Java, etc., as well as for the curl command, so it serves as a nice umbrella technology.
If you prefer to write code instead of using Postman, you can craft queries for the LabCAS Solr API yourself. Two example programs (in Python) demonstrate this capability:
- cibbbcd_events.py — this program extracts event IDs from the Solr "datasets" core for the LabCAS collection "Combined Imaging and Blood Biomarkers for Breast Cancer Diagnosis"
- events_by_blind.py — this program displays event IDs from the Solr "files" core given a blinded site ID as a parameter
You can read over the source code for these, or install the example programs onto your system for direct execution; see the README titled "Data Access API: Examples" and the source code for more information.
The remainder of this document will show how to use the Solr API directly using Postman.
First, download and install Postman. Postman is free to use. There is also a web version, but for this document we'll use the desktop version.
Launch Postman for the first time, and in the lower-right, in the bottom status bar, click "⌂ Vault". The "vault" is where we'll store your EDRN username and password.
The first time you do this, you'll be prompted to encrypt your vault. That's a good security measure, so go ahead and click the Encrypt button. You'll get a "vault key" which you can use to unlock your vault in the future. You can save this key (a long hexadecimal number) in a safe place such as your password manager. Finally, press "Open Vault".
In the table of vault secrets, click "Add new secret" and name it edrn_username. For the value, put in your EDRN username. Under "Allowed Domains", enter https://edrn-labcas.jpl.nasa.gov.
Repeat this for a second secret named edrn_password. For the value, put in your EDRN password. Use the same "Allowed Domains" value.
Finally, close the vault by clicking the ⤫ in the tab bar at the top.
We have created a Postman Collection that describes the LabCAS Solr API. With this, you won't have to worry about setting the URL, authorization, or query parameters.
Download the Postman Collection for the LabCAS Solr API.
Once downloaded, import it into your Postman from the "File → Import" menu.
Once you've got the Postman Collection imported, you should have a new item in your Postman Workspace, "LabCAS Solr API". You can expand the collection and see the three endpoints:
- Collections — describes the high-level science data collections in LabCAS
- Datasets — organizes the data in collections into groupings, typically associated with parts of a study (case versus control) or participants, or by other logical separation. Datasets can contain either other datasets (forming a hierarchy) or files in LabCAS
- Files — represents the metadata for individual files of scientific data, such as DICOM files. This core lets you retrieve the metadata for files. Note that downloading the actual files is handled by a separate API, which is not described here
To use these endpoints:
- Select Collections, Datasets, or Files
- Click the "Params" tab if it's not already visible
- Enter a Solr query in the `q` parameter; fill in other parameters as needed
- Press "Send"
As a test, try this:
- Select "Collections"
- In the "Params" tab, type "biomarker" into the `q` parameter (meaning "show all collections with the word `biomarker` in them")
- Leave all other parameters at their defaults
- Press "Send"
In the lower half of the screen, make sure the "Pretty" and "JSON" formats are selected. You should see around 16 collections that match. Feel free to try the other options, AI formatting features, code conversions, etc.
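If you'd like to see roughly what Postman sends under the hood, the sketch below issues the same "biomarker" test query with Python's requests library. The endpoint URL here is an assumption modeled on the files endpoint shown later in this document (the Collections endpoint may use a .../collections/select path); check the imported Postman Collection for the exact URL, and substitute your own EDRN credentials.

```python
# A minimal sketch of the "biomarker" test query outside Postman.
# Assumption: the Collections endpoint follows the same pattern as the files
# endpoint shown later in this document; verify the exact URL in the Postman
# Collection before relying on it. Substitute your EDRN credentials.
import requests

COLLECTIONS_SELECT = "https://edrn-labcas.jpl.nasa.gov/data-access-api/collections/select"

response = requests.get(
    COLLECTIONS_SELECT,
    params={"q": "biomarker", "wt": "json"},  # same q value typed into Postman
    auth=("EDRNUSERNAME", "EDRNPASSWORD"),    # HTTP Basic with your EDRN account
)
response.raise_for_status()
results = response.json()["response"]
print(results["numFound"], "matching collections")
for doc in results["docs"]:
    print(doc.get("CollectionName", doc.get("id")))
```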
The following are a few queries you can try (a Python sketch of the first one appears after this list):
- "Return
eventIDfor files withCollectionNameofLung Team Project 2 Imagesin JSON format"- Use the "Files" endpoint
- Set
qtoCollectionName:"Lung Team Project 2 Images"— note the quotes since the name has spaces - Set
fltoeventID - Set
wttojson - Set
rowsto999999— adjust this as needed, or userows+startto paginate
- "All details of collections with
SpecimenTypeofSerumin XML format"- Use the "Collections" endpoint
- Set
qtoSpecimenType:Serum - Set
rowsto99999— adjust this as needed, or userows+startto paginate - Set
wttoxml
- "Top 10
LeadPInames andLeadPIIdIDs of all datasets withCollectionNameofLung Team Project 2 Imagesin JSON format"- Use the "Datasets" endpoint
- Set
qtoCollectionName:"Lung Team Project 2 Images" - Set
fltoLeadPI,LeadPIId - Set
rowsto10 - Set
wttojson
- "The ID, data custodian, and data custodian email of the top 100 files with
City_of_Hopein their IDs in CSV format"- Use the "Files" endpoint
- Set
qtoid:*City_of_Hope* - Set
fltoid,DataCustodian,DataCustodianEmail - Set
rowsto100 - Set
wttocsv
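Outside Postman, the first query above can be expressed in a few lines of Python. The sketch below also uses the `rows` + `start` pagination mentioned above rather than one very large `rows` value; the endpoint URL is the files endpoint given in the curl example near the end of this document, and the credentials are placeholders for your own.

```python
# A sketch of the first example query: eventID values for files in the
# "Lung Team Project 2 Images" collection, paginated with rows + start.
# Substitute your EDRN credentials.
import requests

FILES_SELECT = "https://edrn-labcas.jpl.nasa.gov/data-access-api/files/select"
AUTH = ("EDRNUSERNAME", "EDRNPASSWORD")

def event_ids(collection_name, page_size=500):
    """Yield eventID values for every file in the named collection."""
    start = 0
    while True:
        params = {
            "q": f'CollectionName:"{collection_name}"',  # quotes: the name has spaces
            "fl": "eventID",
            "wt": "json",
            "rows": page_size,
            "start": start,
        }
        response = requests.get(FILES_SELECT, params=params, auth=AUTH)
        response.raise_for_status()
        docs = response.json()["response"]["docs"]
        if not docs:
            break
        for doc in docs:
            if "eventID" in doc:
                yield doc["eventID"]
        start += page_size

for event_id in event_ids("Lung Team Project 2 Images"):
    print(event_id)
```

The other queries follow the same pattern; only the endpoint and the `q`, `fl`, `rows`, and `wt` values change.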
Please note that the Postman Collection provided above includes only a subset of the parameters that Solr supports. If you're confident in your programming skills, curl command usage, etc., feel free to use advanced parameters like `fq`, `facet`, `facet.field`, and so forth.
Consulting the Solr query documentation can be helpful in these cases, as can Solr client libraries for your programming language of choice.
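For instance, a faceted query can count documents per value of a field without retrieving the documents themselves. The sketch below is one hand-written example of the `facet` and `facet.field` parameters mentioned above; it again assumes the files endpoint shown later in this document and placeholder credentials, and the field chosen for faceting (`CollectionName`) is just an illustration.

```python
# A sketch of a faceted query: count files per CollectionName without
# fetching the file documents themselves. Assumes the files endpoint shown
# later in this document; substitute your EDRN credentials.
import requests

FILES_SELECT = "https://edrn-labcas.jpl.nasa.gov/data-access-api/files/select"

response = requests.get(
    FILES_SELECT,
    params={
        "q": "*:*",                    # match everything...
        "rows": 0,                     # ...but return no documents
        "facet": "true",
        "facet.field": "CollectionName",
        "wt": "json",
    },
    auth=("EDRNUSERNAME", "EDRNPASSWORD"),
)
response.raise_for_status()
counts = response.json()["facet_counts"]["facet_fields"]["CollectionName"]
# Solr returns facets as a flat [value, count, value, count, ...] list.
for value, count in zip(counts[::2], counts[1::2]):
    print(f"{value}: {count}")
```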
ChatGPT or other large language models can be instrumental in helping you express the `q`, `fq`, and other parameters without needing to fully understand Solr's query language.
As an example, this prompt presented to the ChatGPT "4o" model:
Write a curl command for Solr at https://edrn-labcas.jpl.nasa.gov/data-access-api/files/select that takes HTTP Basic username EDRNUSERNAME with password EDRNPASSWORD to return the non-empty "eventID" fields for the first 100 files where the "CollectionName" field is "Lung Team Project 2 Images"
produces a valid curl command as of this writing.
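For comparison, here is a hand-written Python equivalent of that request (a sketch, not model output): it restricts results to files whose `eventID` field is non-empty by adding a filter query alongside the parameters used earlier, with placeholder credentials.

```python
# A hand-written equivalent of the query described in the prompt above:
# the first 100 files in "Lung Team Project 2 Images" with a non-empty eventID.
import requests

response = requests.get(
    "https://edrn-labcas.jpl.nasa.gov/data-access-api/files/select",
    params={
        "q": 'CollectionName:"Lung Team Project 2 Images"',
        "fq": "eventID:[* TO *]",  # filter query: keep only docs where eventID has a value
        "fl": "eventID",
        "rows": 100,
        "wt": "json",
    },
    auth=("EDRNUSERNAME", "EDRNPASSWORD"),  # HTTP Basic, as in the prompt
)
response.raise_for_status()
for doc in response.json()["response"]["docs"]:
    print(doc["eventID"])
```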