Clean document and encoding for mailto: protocol results in unexpected output. #268

@TasikBeyond

Bug Report

The Cleaner's clean(document) function percent-encodes characters twice. This only happens when the original HTML contains both a %20 and a [ or ].

How to Reproduce

let html = #"<a href="mailto:[email protected]?subject=Job%20Requisition[NID]">Send</a></body></html>"#

let document = try SwiftSoup.parse(html)
let outputSettings = OutputSettings()
outputSettings.prettyPrint(pretty: false)
document.outputSettings(outputSettings)

let headWhitelist: Whitelist = {
    do {
        let customWhitelist = Whitelist.none()
        try customWhitelist
            .addTags("a")
            .addAttributes("a", "href")
            .addProtocols("a", "href", "mailto")
        return customWhitelist
    } catch {
        fatalError("Couldn't init head whitelist")
    }
}()

print("Original Document (before clean): ", document)
let cleaned = try Cleaner(headWhitelist: headWhitelist, bodyWhitelist: headWhitelist).clean(document)
print("Original Document (after clean): ", document)
print("Clean Document: ", cleaned)

Expected Behavior

Cleaning the document parsed from

let html = #"<a href="mailto:[email protected]?subject=Job%20Requisition[NID]">Send</a></body></html>"#

should result in:

<html>
 <head></head>
 <body>
  <a href="mailto:[email protected]?subject=Job%20Requisition%5BNID%5D">Send</a>
 </body>
</html>

Actual Behavior

<html>
 <head></head>
 <body>
  <a href="mailto:[email protected]?subject=Job%2520Requisition%5BNID%5D">Send</a>
 </body>
</html>

Note: %2520 appears to be the original %20 encoded a second time (the % itself becomes %25).
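
To illustrate the suspected double encoding, here is a minimal sketch using only Foundation. It is not SwiftSoup's actual code: `naiveEncode` and `encodePreservingEscapes` are hypothetical helpers written for this report. The first encodes every `%` it sees, which reproduces the `%2520` above; the second leaves existing `%XX` escapes alone, which matches the expected output.

```swift
import Foundation

// Characters left untouched by both helpers. "[", "]" and "%" are removed
// explicitly so the behavior does not depend on the platform's definition
// of urlQueryAllowed.
let allowed: CharacterSet = {
    var set = CharacterSet.urlQueryAllowed
    set.remove(charactersIn: "[]%")
    return set
}()

// Hypothetical helper: encodes everything outside `allowed`, including the
// "%" of an escape that is already present (this is the double encoding).
func naiveEncode(_ s: String) -> String {
    return s.addingPercentEncoding(withAllowedCharacters: allowed) ?? s
}

// Hypothetical helper: same encoding, but an existing "%XX" escape
// (a "%" followed by two ASCII hex digits) is copied through unchanged.
func encodePreservingEscapes(_ s: String) -> String {
    let chars = Array(s)
    let isHex: (Character) -> Bool = { $0.isASCII && $0.isHexDigit }
    var out = ""
    var i = 0
    while i < chars.count {
        let c = chars[i]
        if c == "%", i + 2 < chars.count, isHex(chars[i + 1]), isHex(chars[i + 2]) {
            out.append(contentsOf: chars[i...i + 2])
            i += 3
        } else {
            out += String(c).addingPercentEncoding(withAllowedCharacters: allowed) ?? String(c)
            i += 1
        }
    }
    return out
}

let subject = "Job%20Requisition[NID]"
print(naiveEncode(subject))             // Job%2520Requisition%5BNID%5D
print(encodePreservingEscapes(subject)) // Job%20Requisition%5BNID%5D
```

If the cleaner behaves like `naiveEncode` whenever the URL contains a character that needs escaping (here `[` or `]`), that would explain why the problem only shows up when both `%20` and a bracket are present.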

Environment

SwiftSoup Version: 2.6.1
Xcode Version: 15.3

Additional Notes

I print the original document before and after calling clean(document) because both the original document and the cleaned document appear to be modified.

print("Original Document (before clean): ", document)
let cleaned = try Cleaner(headWhitelist: headWhitelist, bodyWhitelist: headWhitelist).clean(document)
print("Original Document (after clean): ", document)
