[Bug]: Passing relative or protocol-less URL to viewer file param fails to open file correctly #20420

@leebickmtu

Description

Attach (recommended) or Link to PDF file

The PDF file isn't relevant. This issue is about a URL parsing/handling issue.

Web browser and its version

Chrome 141

Operating system and its version

macOS 15.7.1

PDF.js version

5.4.394

Is the bug present in the latest PDF.js version?

Yes

Is a browser extension

No

Steps to reproduce the problem

  1. Serve a PDF file from a route on the same host that requires query params to fetch.
  2. Pass that route as a relative URL to the default viewer via the file= param, URI-encoded as recommended in the docs.
  3. For example:
     let filePath = '/document/open?docPtr=cvhsyvhs7ev1b'
     filePath = encodeURIComponent(filePath)
     iframeElem.src = '/pdfjs-5.4.394-dist/web/viewer.html?file=' + filePath
  4. The file fails to open: the network request hits an invalid route and returns a 404-type error.

What is the expected behavior?

I don't know what the intention was behind adding that URL() call, so it is hard to say how to fix it. But relative URLs and protocol-less URLs worked in 5.3.31 and earlier, so this looks like a regression.
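Until the intended behavior is settled, one possible caller-side workaround is to resolve the relative path to an absolute URL before encoding it, so that a bare URL() call can parse it. This is an untested sketch, not something the PDF.js docs prescribe: buildViewerSrc is a hypothetical helper, example.com is a placeholder host, and whether the rest of the viewer accepts the result has not been checked here.

```javascript
// Hypothetical helper: resolve a relative path against the embedding page's URL
// so the viewer receives an absolute URL in its file= param.
function buildViewerSrc(filePath, pageUrl, viewerPath) {
  // new URL(relative, base) resolves the relative path against the base.
  const absolute = new URL(filePath, pageUrl).href;
  return viewerPath + "?file=" + encodeURIComponent(absolute);
}

// In a browser, pageUrl would typically be window.location.href.
const src = buildViewerSrc(
  "/document/open?docPtr=cvhsyvhs7ev1b",
  "https://example.com/app/index.html", // placeholder page URL
  "/pdfjs-5.4.394-dist/web/viewer.html"
);
console.log(src);
```

In the embedding page this would be used as iframeElem.src = buildViewerSrc(filePath, window.location.href, viewerPath).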

What went wrong?

I did some digging into what was happening. Version 5.3.31 is the last version that works without issue for me. After that, the following code was added to the async run(config) function in viewer.mjs:

try {
    file = new URL(decodeURIComponent(file)).href;
} catch {
    file = encodeURIComponent(file).replaceAll("%2F", "/");
}

When called without a base argument, URL() can only parse fully formed URLs with a protocol and domain, e.g. https://google.com/search?q=abc.
It cannot parse URLs without a protocol, e.g. google.com/search?q=abc,
or relative URLs on the same domain as the host, e.g. /search?q=abc.
In those cases it throws a TypeError with an invalid URL message.
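The parsing behavior can be checked with a small sketch. tryParse is a hypothetical helper and example.com is a placeholder base; the only API used is the standard URL constructor, whose optional second argument is the base against which a relative input is resolved.

```javascript
// Hypothetical helper: return the parsed href, or the error name if parsing throws.
function tryParse(input, base) {
  try {
    return new URL(input, base).href;
  } catch (err) {
    return err.name;
  }
}

// Fully formed URL: parses without a base.
console.log(tryParse("https://google.com/search?q=abc"));
// Protocol-less and relative URLs: throw when no base is given.
console.log(tryParse("google.com/search?q=abc"));
console.log(tryParse("/search?q=abc"));
// The same relative URL parses once a base is supplied.
console.log(tryParse("/search?q=abc", "https://example.com/web/viewer.html"));
```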

Because of that, passing either kind of path to the file param falls into the catch block, which URI-encodes the already URI-encoded URL and then runs the replaceAll.
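The fall-through can be reproduced outside the viewer by wrapping the quoted snippet in a function. This is only a sketch: applyViewerSnippet is a hypothetical name, and it is an assumption that the viewer hands the snippet the value in one of the two forms below (still encoded, or already decoded once). In both cases the ? and = end up percent-encoded.

```javascript
// Hypothetical wrapper around the snippet quoted above; not the viewer's full logic.
function applyViewerSnippet(file) {
  try {
    file = new URL(decodeURIComponent(file)).href;
  } catch {
    file = encodeURIComponent(file).replaceAll("%2F", "/");
  }
  return file;
}

const relative = "/document/open?docPtr=cvhsyvhs7ev1b";

// Case 1: the snippet sees the still-encoded value, so every % becomes %25.
const fromEncoded = applyViewerSnippet(encodeURIComponent(relative));
console.log(fromEncoded);

// Case 2: the snippet sees the decoded value; slashes survive, ? and = do not.
const fromDecoded = applyViewerSnippet(relative);
console.log(fromDecoded);

// An absolute URL takes the try branch and is left intact.
console.log(applyViewerSnippet("https://example.com/document/open?docPtr=cvhsyvhs7ev1b"));
```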

So the URL from the reproduction steps above:
/document/open?docPtr=cvhsyvhs7ev1b
is already encoded once when added to the file param:
%2Fdocument%2Fopen%3FdocPtr%3Dcvhsyvhs7ev1b
And is then encoded again by that catch code:
%252Fdocument%252Fopen%253FdocPtr%253Dcvhsyvhs7ev1b

At some point before the file is actually opened, the double-encoded URL appears to be decoded once, so the final network request to fetch the file is:
%2Fdocument%2Fopen%3FdocPtr%3Dcvhsyvhs7ev1b

This fails because reserved characters such as ? and & are encoded. To keep their special meaning they must arrive at the server unencoded; otherwise they are treated as literal text rather than as delimiters.
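A short sketch of why the encoded form misses the route: with the ? percent-encoded, a URL parser sees one long path and no query string at all. splitRequest is a hypothetical helper and example.com is a placeholder host; a real server's router may differ in detail, but it receives the same path/query split.

```javascript
// Hypothetical helper: show how a URL splits into path and query.
function splitRequest(url) {
  const u = new URL(url);
  return {
    pathname: u.pathname,
    search: u.search,
    docPtr: u.searchParams.get("docPtr"),
  };
}

// Unencoded delimiters: the route is /document/open and docPtr is a query param.
const good = splitRequest("https://example.com/document/open?docPtr=cvhsyvhs7ev1b");
console.log(good);

// Encoded delimiters: everything is part of the path, and docPtr is missing.
const bad = splitRequest("https://example.com/document/open%3FdocPtr%3Dcvhsyvhs7ev1b");
console.log(bad);
```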

Link to a viewer

No response

Additional context

It seems likely that this other issue has the same underlying cause:
#20218
