Description
Attach (recommended) or Link to PDF file
The PDF file isn't relevant. This issue is about a URL parsing/handling issue.
Web browser and its version
Chrome 141
Operating system and its version
macOS 15.7.1
PDF.js version
5.4.394
Is the bug present in the latest PDF.js version?
Yes
Is a browser extension
No
Steps to reproduce the problem
- Have a PDF file served via a route on the same host that requires query params to fetch.
- Pass that as a relative URL to the default viewer via the file= param, using URI encoding as recommended in the docs. For example:
let filePath = '/document/open?docPtr=cvhsyvhs7ev1b'
filePath = encodeURIComponent(filePath)
iframeElem.src = '/pdfjs-5.4.394-dist/web/viewer.html?file=' + filePath
- The file fails to open with a network error: the request goes to an invalid route and returns a 404-type error.
What is the expected behavior?
I don't know what the intention was behind adding that URL() call, so it's hard to say how to fix it. But relative URLs and protocol-less URLs worked in 5.3.31 and earlier, so this seems like a regression.
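Until the intended behavior is clear, one possible caller-side workaround is to resolve the relative path to an absolute URL before encoding it, so that a URL() call without a base can parse it. This is a minimal sketch and has not been tested against the viewer. The helper name toViewerFileParam and the origin https://example.com are placeholders, not part of PDF.js; in a browser the origin would typically be window.location.origin.

```javascript
// Hypothetical helper (not part of PDF.js): resolve a same-host path
// against an origin, then URI-encode the result for the file= param.
function toViewerFileParam(filePath, origin) {
  const absolute = new URL(filePath, origin).href;
  return encodeURIComponent(absolute);
}

// "https://example.com" is a placeholder origin for illustration only.
const fileParam = toViewerFileParam(
  "/document/open?docPtr=cvhsyvhs7ev1b",
  "https://example.com"
);
const viewerSrc = "/pdfjs-5.4.394-dist/web/viewer.html?file=" + fileParam;
console.log(viewerSrc);
```

Because the decoded value is now an absolute URL, new URL(decodeURIComponent(fileParam)) no longer throws, so the catch branch would be skipped.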
What went wrong?
I did some digging into what was happening. Version 5.3.31 is the last version that works without issue for me. After that, the following code was added to the async run(config) function in viewer.mjs:
try {
file = new URL(decodeURIComponent(file)).href;
} catch {
file = encodeURIComponent(file).replaceAll("%2F", "/");
}
URL(), when called without a base argument, can only handle fully formed URLs with protocol and domain, e.g. https://google.com/search?q=abc.
It cannot handle URLs without a protocol, e.g. google.com/search?q=abc,
or relative URLs on the same domain as the host, e.g. /search?q=abc.
For those you get a TypeError with an invalid URL message.
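The three cases can be checked with a short script. This is a small sketch; parsesWithoutBase is a helper name made up for this example.

```javascript
// Returns true if `new URL(input)` (no base argument) accepts the string,
// false if it throws a TypeError.
function parsesWithoutBase(input) {
  try {
    new URL(input);
    return true;
  } catch {
    return false;
  }
}

const samples = [
  "https://google.com/search?q=abc", // fully formed: protocol + domain
  "google.com/search?q=abc",         // no protocol
  "/search?q=abc",                   // relative to the current host
];

const results = samples.map(parsesWithoutBase);
console.log(results); // [ true, false, false ]
```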
Because of that, when those types of paths are passed to the file param, execution falls into the catch block, which URI-encodes the already URI-encoded URL and then does the replaceAll.
So the URL from the reproduction steps above:
/document/open?docPtr=cvhsyvhs7ev1b
is already encoded once when added to the file param:
%2Fdocument%2Fopen%3FdocPtr%3Dcvhsyvhs7ev1b
And is then encoded again by that catch code:
%252Fdocument%252Fopen%253FdocPtr%253Dcvhsyvhs7ev1b
At some point before the file is actually opened, the double-encoded URL seems to be decoded once, so the final network request to fetch the file is:
%2Fdocument%2Fopen%3FdocPtr%3Dcvhsyvhs7ev1b
Which fails because characters such as ? and & are encoded. They are reserved characters: to keep their special meaning they must not be in encoded form when they arrive at the server, otherwise they are treated as literal ? and & text rather than as delimiters.
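The chain described above can be reproduced outside the viewer. This sketch makes two assumptions taken from the description rather than verified against the PDF.js source: that the value reaching the try/catch is still the once-encoded string, and that exactly one decode happens afterwards. normalizeFile is a made-up wrapper around the snippet quoted earlier.

```javascript
// Made-up wrapper around the try/catch quoted from viewer.mjs.
function normalizeFile(file) {
  try {
    return new URL(decodeURIComponent(file)).href;
  } catch {
    return encodeURIComponent(file).replaceAll("%2F", "/");
  }
}

const original = "/document/open?docPtr=cvhsyvhs7ev1b";

// Step 1: the caller encodes the path for the file= param.
const onceEncoded = encodeURIComponent(original);

// Step 2: URL() throws on the decoded relative path, so the catch branch
// encodes again. "%2F" is now "%252F", so replaceAll finds nothing to replace.
const twiceEncoded = normalizeFile(onceEncoded);

// Step 3 (assumed): a single decode before the network request.
const requested = decodeURIComponent(twiceEncoded);

console.log(onceEncoded);  // %2Fdocument%2Fopen%3FdocPtr%3Dcvhsyvhs7ev1b
console.log(twiceEncoded); // %252Fdocument%252Fopen%253FdocPtr%253Dcvhsyvhs7ev1b
console.log(requested);    // %2Fdocument%2Fopen%3FdocPtr%3Dcvhsyvhs7ev1b
```

The requested string no longer contains a literal ?, so the server sees one long path segment instead of a path plus a query string.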
Link to a viewer
No response
Additional context
It seems likely that this other issue has the same underlying cause:
#20218