Replies: 3 comments 1 reply
We haven't added citation traversal yet because when you're working in a folder of your own PDFs, the chance of a traversed citation being there is low. That being said, if we add a real paper search (not just one searching local PDFs), then citation traversal becomes relevant. One question is: if you add a paper search, how do you get the papers in the search results? Will you build scraping functionality too? If you feel like adding a paper search or citation traversal tool, feel free to open a PR, just make sure good unit tests are added.
Thanks for the reply! I put together a quick prototype over the weekend because I needed paper search urgently. Right now it is a small wrapper around OpenAlex with a basic PDF resolver (using the HTML or direct PDF URL). It works, but it still needs several improvements before it is really solid, for example:
- better handling of JATS, full-text XML, and structured HTML
- stronger PDF detection (redirects, cookies, multiple candidate links)
- smarter prioritization using license status, Crossref metadata, PMC links, etc.
- source-specific logic for tricky providers; for instance, for some PMC links I had to bypass a PoW gate, and I ended up using a loop with an AI agent (gpt-5-codex) that kept trying download strategies until it succeeded, then I asked it to output the final working script
Given there are around 42.5k peer-reviewed journals (STM 2018), ~22k of them full OA, plus preprint servers, indices like PubMed, and repositories like Europe PMC, I am pretty sure we can automate more of this and gradually build a growing list of supported sources, especially if the open-source community contributes. I am going to keep improving my prototype. Do you think it makes more sense to open an early PR and iterate together, or would you rather I wait until the paper search and fetching are more complete and robust before submitting a PR?
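To make the "multiple candidate links" and "prioritization using license status" points concrete, here is a minimal sketch, not the prototype's actual code. It assumes an OpenAlex work record shaped like the public API's JSON (`best_oa_location`, `locations`, and per-location `is_oa`, `license`, `pdf_url`, `landing_page_url`). The function names, the `LICENSE_RANK` ordering, and the example URLs are hypothetical choices made for illustration.

```python
from typing import Any

# Hypothetical preference order (an assumption, not an OpenAlex rule):
# lower rank is tried first, unknown or missing licenses go last.
LICENSE_RANK = {"cc0": 0, "cc-by": 0, "cc-by-sa": 1, "cc-by-nc": 2}
UNKNOWN_LICENSE_RANK = 3


def candidate_pdf_urls(work: dict[str, Any]) -> list[str]:
    """Return de-duplicated candidate URLs for one work record, best first."""
    locations = []
    if work.get("best_oa_location"):
        locations.append(work["best_oa_location"])
    locations.extend(work.get("locations") or [])

    scored = []
    for position, loc in enumerate(locations):
        if not loc.get("is_oa"):
            continue  # skip locations that are not open access
        license_rank = LICENSE_RANK.get(loc.get("license") or "", UNKNOWN_LICENSE_RANK)
        # Direct PDF links (kind_rank 0) sort ahead of landing pages (kind_rank 1).
        for kind_rank, key in enumerate(("pdf_url", "landing_page_url")):
            url = loc.get(key)
            if url:
                scored.append((kind_rank, license_rank, position, url))

    ordered, seen = [], set()
    for _kind, _license, _position, url in sorted(scored):
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered


def looks_like_pdf(body: bytes) -> bool:
    """Check the payload itself instead of trusting the Content-Type header.

    PDF files carry a "%PDF-" header; readers commonly accept it anywhere
    in the first 1024 bytes, so the same window is used here.
    """
    return b"%PDF-" in body[:1024]


# Example record with made-up URLs, mimicking the assumed OpenAlex shape.
example_work = {
    "best_oa_location": {
        "is_oa": True, "license": "cc-by",
        "pdf_url": "https://a.example/p.pdf",
        "landing_page_url": "https://a.example/p",
    },
    "locations": [
        {"is_oa": True, "license": None,
         "pdf_url": "https://b.example/q.pdf",
         "landing_page_url": "https://b.example/q"},
        {"is_oa": False, "license": None,
         "pdf_url": None,
         "landing_page_url": "https://c.example/r"},
    ],
}

if __name__ == "__main__":
    for url in candidate_pdf_urls(example_work):
        print(url)
```

A downloader could walk this list in order, follow redirects for each URL, and keep the first response for which `looks_like_pdf` returns true. Landing pages would still need HTML parsing to find the real PDF link, which is where the source-specific logic comes in.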
Btw, I didn't know anything about the literature/OA ecosystem before this weekend, so I'm sure I'm missing many known "tricks" or methods.
Hello,
I understand that internally you already have a real paper search function and citation traversal. I assume the reason these aren’t open-sourced yet may be related to licensing or legal constraints.
If I implement these features myself, can I open a pull request to the main repository, or should I keep it in a separate repo?
Thanks