Open
Description
Bug Report
The Cleaner's clean(document) function percent-encodes characters twice. This only happens when the original HTML contains both a %20 and a [ or ].
How to Reproduce
let html = #"<a href="mailto:[email protected]?subject=Job%20Requisition[NID]">Send</a></body></html>"#
let document = try SwiftSoup.parse(html)
let outputSettings = OutputSettings()
outputSettings.prettyPrint(pretty: false)
document.outputSettings(outputSettings)
let headWhitelist: Whitelist = {
do {
let customWhitelist = Whitelist.none()
try customWhitelist
.addTags("a")
.addAttributes("a", "href")
.addProtocols("a", "href", "mailto")
return customWhitelist
} catch {
fatalError("Couldn't init head whitelist")
}
}()
print("Original Document (before clean): ", document)
let cleaned = try Cleaner(headWhitelist: headWhitelist, bodyWhitelist: headWhitelist).clean(document)
print("Original Document (after clean): ", document)
print("Clean Document: ", cleaned)
Expected Behavior
Cleaning the input
let html = #"<a href="mailto:[email protected]?subject=Job%20Requisition[NID]">Send</a></body></html>"#
should result in:
<html>
<head></head>
<body>
<a href="mailto:[email protected]?subject=Job%20Requisition%5BNID%5D">Send</a>
</body>
</html>
Actual Behavior
<html>
<head></head>
<body>
<a href="mailto:[email protected]?subject=Job%2520Requisition%5BNID%5D">Send</a>
</body>
</html>
Note: %2520 appears to be the existing %20 being encoded a second time, with its % escaped as %25.
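To illustrate the suspected mechanism, here is a minimal sketch in plain Foundation. It does not use SwiftSoup and is not taken from its source; `naiveEncode` and `encodePreservingEscapes` are hypothetical helpers written only for this report. The first escapes every character outside an unreserved set, including the % of an existing escape, and reproduces the %2520 seen above. The second leaves well-formed %XX sequences alone and produces the expected string.

```swift
import Foundation

// Unreserved characters (letters, digits, "-._~"); everything else is escaped.
let unreserved = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"

// Hypothetical helper: escapes everything outside the unreserved set,
// so the "%" of an existing escape such as "%20" becomes "%25".
func naiveEncode(_ s: String) -> String {
    let allowed = CharacterSet(charactersIn: unreserved)
    return s.addingPercentEncoding(withAllowedCharacters: allowed) ?? s
}

// True for the ASCII bytes 0-9, A-F, a-f.
func isHexDigit(_ b: UInt8) -> Bool {
    return (48...57).contains(b) || (65...70).contains(b) || (97...102).contains(b)
}

// Hypothetical helper: same escaping, but a "%" followed by two hex digits
// is treated as an existing escape and copied through unchanged.
func encodePreservingEscapes(_ s: String) -> String {
    let allowedBytes = Set(unreserved.utf8)
    let bytes = Array(s.utf8)
    var out = ""
    var i = 0
    while i < bytes.count {
        let b = bytes[i]
        if b == 37, i + 2 < bytes.count, isHexDigit(bytes[i + 1]), isHexDigit(bytes[i + 2]) {
            // Existing escape: keep all three bytes as they are.
            out += String(decoding: bytes[i...(i + 2)], as: UTF8.self)
            i += 3
        } else if allowedBytes.contains(b) {
            out.append(Character(UnicodeScalar(b)))
            i += 1
        } else {
            // Escape the byte as two uppercase hex digits.
            let hex = String(b, radix: 16, uppercase: true)
            out += "%" + (hex.count == 1 ? "0" + hex : hex)
            i += 1
        }
    }
    return out
}

let subject = "Job%20Requisition[NID]"
print(naiveEncode(subject))             // Job%2520Requisition%5BNID%5D
print(encodePreservingEscapes(subject)) // Job%20Requisition%5BNID%5D
```

Whether the Cleaner actually re-encodes an already-encoded URL in this way is an assumption; the sketch only shows that a single escape-unaware pass is enough to turn %20 into %2520.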
Environment
SwiftSoup Version: 2.6.1
Xcode Version: 15.3
Additional Notes
I print the original document before and after calling clean(document) because both the original document and the cleaned document appear to be modified.
print("Original Document: ", document)
let cleaned = try Cleaner(headWhitelist: headWhitelist, bodyWhitelist: headWhitelist).clean(document)
print("Original Document: ", document)