Commit 97ac3c4

100% code and documentation coverage.
1 parent ab61883 commit 97ac3c4

12 files changed: +884 -25 lines changed
guides/getting-started/readme.md

Lines changed: 21 additions & 0 deletions
@@ -149,6 +149,27 @@ Protocol::URL::Path.expand("/a/b/file.html", "other.html", false)
 # => "/a/b/file.html/other.html"
 ```
 
+### Converting to Local File System Paths
+
+Convert URL paths to local file system paths safely:
+
+``` ruby
+# Convert URL path to local file system path:
+Protocol::URL::Path.to_local_path("/documents/report.pdf")
+# => "/documents/report.pdf"
+
+# Handles percent-encoded characters:
+Protocol::URL::Path.to_local_path("/files/My%20Document.txt")
+# => "/files/My Document.txt"
+
+# Security: Preserves percent-encoded path separators
+# This prevents directory traversal attacks:
+Protocol::URL::Path.to_local_path("/folder/safe%2Fname/file.txt")
+# => "/folder/safe%2Fname/file.txt"
+# %2F (/) and %5C (\) are NOT decoded, preventing them from creating
+# additional path components in the file system
+```
+
 ## Working with References
 
 {ruby Protocol::URL::Reference} extends relative URLs with query parameters and fragments. For detailed information on working with references, see the [Working with References](../working-with-references/) guide.
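
To see why leaving `%2F` and `%5C` encoded matters, here is a minimal standalone sketch. It does not call the library: `PathSketch` is a hypothetical module whose `unescape_path` mirrors the method added to `lib/protocol/url/encoding.rb` in this commit, and `naive_unescape` is included only for contrast.

```ruby
# Hypothetical standalone module; it mirrors the `unescape_path` logic shown in this commit.
module PathSketch
	# Decode %XX escapes, but leave %2F (/) and %5C (\) encoded.
	def self.unescape_path(string, encoding = string.encoding)
		string.b.gsub(/%(\h\h)/) do |hex|
			byte = Integer($1, 16)
			(byte == 0x2F || byte == 0x5C) ? hex : byte.chr
		end.force_encoding(encoding)
	end
	
	# Naive decoder for comparison: decodes every %XX escape, including separators.
	def self.naive_unescape(string, encoding = string.encoding)
		string.b.gsub(/%(\h\h)/) {Integer($1, 16).chr}.force_encoding(encoding)
	end
	
	# Count the non-empty components of a slash-separated path.
	def self.component_count(path)
		path.split("/").reject(&:empty?).size
	end
end

path = "/folder/safe%2Fname/file.txt"

safe = PathSketch.unescape_path(path)
naive = PathSketch.naive_unescape(path)

puts safe   # the encoded separator is kept, so "safe%2Fname" stays one component
puts PathSketch.component_count(safe)
puts naive  # the decoded separator splits "safe/name" into two components
puts PathSketch.component_count(naive)
```

With the separator preserved the path keeps three components; the naive decoder yields four, which is the boundary change the guide text warns about.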

guides/working-with-references/readme.md

Lines changed: 75 additions & 12 deletions
@@ -10,7 +10,21 @@ This guide explains how to use {ruby Protocol::URL::Reference} for managing URLs
 
 You can create references in several ways:
 
-### Constructing from Components
+### Parsing External URLs (Untrusted Data)
+
+Use {ruby Protocol::URL.parse} or {ruby Protocol::URL.[]} to parse URL strings from external sources (user input, APIs, web pages). These methods validate and decode the input:
+
+``` ruby
+# Parse a reference with query and fragment:
+reference = Protocol::URL["/api/users?active=true&role=admin#list"]
+reference.path # => "/api/users"
+reference.query # => "active=true&role=admin"
+reference.fragment # => "list"
+```
+
+### Constructing from Known Values (Trusted Data)
+
+Use {ruby Protocol::URL::Reference.new} when you have known good values from your code. This method doesn't validate and expects unencoded values:
 
 ``` ruby
 require "protocol/url"
@@ -23,25 +37,58 @@ reference.to_s # => "/api/users"
 reference = Protocol::URL::Reference.new("/search", "q=ruby&page=2")
 reference.to_s # => "/search?q=ruby&page=2"
 
-# Reference with fragment:
-reference = Protocol::URL::Reference.new("/docs", nil, "section-3")
-reference.to_s # => "/docs#section-3"
-
 # Reference with all components:
 reference = Protocol::URL::Reference.new("/api/users", "status=active", "results")
 reference.to_s # => "/api/users?status=active#results"
+
+# Using parameters (recommended for query strings):
+reference = Protocol::URL::Reference.new("/search", nil, nil, {q: "ruby", page: 2})
+reference.to_s # => "/search?q=ruby&page=2"
 ```
 
-### Parsing from Strings
+## Understanding Encoding
+
+References use different encoding strategies depending on how they're constructed:
+
+### With parse() - Decodes Input
 
-Use {ruby Protocol::URL.[]} to parse complete URL strings:
+`parse()` expects already-encoded URLs and decodes them for internal storage:
 
 ``` ruby
-# Parse a reference with query and fragment:
-reference = Protocol::URL["/api/users?active=true&role=admin#list"]
-reference.path # => "/api/users"
-reference.query # => "active=true&role=admin"
-reference.fragment # => "list"
+ref = Protocol::URL::Reference.parse("path%20with%20spaces?foo=bar#frag%20ment")
+ref.path # => "path with spaces" (decoded)
+ref.fragment # => "frag ment" (decoded)
+ref.to_s # => "path%20with%20spaces?foo=bar#frag%20ment" (re-encoded)
+```
+
+### With new() - Expects Unencoded Input
+
+`new()` expects raw, unencoded values and encodes them during output:
+
+``` ruby
+ref = Protocol::URL::Reference.new("path with spaces", "foo=bar", "frag ment")
+ref.path # => "path with spaces"
+ref.fragment # => "frag ment"
+ref.to_s # => "path%20with%20spaces?foo=bar#frag%20ment"
+```
+
+**Warning**: Passing encoded values to `new()` causes double-encoding:
+
+``` ruby
+# Wrong - will double-encode:
+ref = Protocol::URL::Reference.new("path%20with%20spaces")
+ref.to_s # => "path%2520with%2520spaces" (double-encoded!)
+
+# Correct - use parse() for encoded input:
+ref = Protocol::URL::Reference.parse("path%20with%20spaces")
+ref.to_s # => "path%20with%20spaces"
+```
+
+Unicode and special characters are handled automatically:
+
+``` ruby
+ref = Protocol::URL::Reference.new("I/❤️/UNICODE")
+ref.to_s # => "I/%E2%9D%A4%EF%B8%8F/UNICODE"
 ```
 
 ## Accessing Components
@@ -227,6 +274,22 @@ result.to_s # => "/search?q=ruby&lang=en#result-5"
 - Use {ruby Protocol::URL::Relative} for simple path-only URLs
 - Use {ruby Protocol::URL::Absolute} for complete URLs with scheme and host
 
+### parse() vs new()
+
+Choose the right method based on your data source:
+
+- **Use `parse()` or `[]`** for external/untrusted data (user input, URLs from web pages, API responses). These methods validate and decode the URL.
+- **Use `new()`** for known good values from your code. This is more efficient since it skips validation and expects unencoded values.
+
+``` ruby
+# External data - use parse():
+user_input = "/search?q=ruby%20gems"
+reference = Protocol::URL[user_input] # Validates and decodes
+
+# Internal data - use new():
+reference = Protocol::URL::Reference.new("/api/users", "status=active") # Direct construction
+```
+
 ### Query String Management
 
 The library provides built-in parameter handling through the `parameters` attribute:
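
The double-encoding warning in the guide diff above can be reproduced without the library. The following is a rough sketch under stated assumptions: `EscapeSketch` is a hypothetical module, the pattern is copied from `NON_PATH_CHARACTER_PATTERN` in `lib/protocol/url/encoding.rb`, and the block body is assumed to match the one shown for `Encoding.escape`, because the diff cuts off the body of `escape_path`.

```ruby
# Hypothetical standalone module, not the library itself.
module EscapeSketch
	# Copied from the diff: characters outside this set must be percent-encoded in a path.
	NON_PATH_CHARACTER_PATTERN = /([^a-zA-Z0-9_\-\.~!$&'()*+,;=:@\/]+)/.freeze
	
	# Assumption: the replacement block mirrors the one shown for `Encoding.escape`.
	def self.escape_path(path)
		encoding = path.encoding
		path.b.gsub(NON_PATH_CHARACTER_PATTERN) do |m|
			"%" + m.unpack("H2" * m.bytesize).join("%").upcase
		end.force_encoding(encoding)
	end
end

raw = "path with spaces"
once = EscapeSketch.escape_path(raw)
twice = EscapeSketch.escape_path(once)

puts once   # spaces become %20
puts twice  # "%" is not an allowed path character, so each %20 becomes %2520
```

Because `%` itself falls outside the allowed set, escaping an already-escaped string encodes the percent signs again. That is why encoded input belongs with `parse()` and raw input with `new()`.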

lib/protocol/url/absolute.rb

Lines changed: 22 additions & 0 deletions
@@ -33,6 +33,18 @@ def authority?
 #
 # @parameter other [String, Relative, Reference, Absolute] The reference to resolve.
 # @returns [Absolute, String] The resolved absolute URL.
+#
+# @example Resolve a relative path.
+# base = Absolute.new("https", "example.com", "/documents/reports/")
+# relative = Relative.new("summary.pdf")
+# result = base + relative
+# result.to_s # => "https://example.com/documents/reports/summary.pdf"
+#
+# @example Navigate to parent directory.
+# base = Absolute.new("https", "example.com", "/documents/reports/2024/")
+# relative = Relative.new("../../archive/")
+# result = base + relative
+# result.to_s # => "https://example.com/documents/archive/"
 def +(other)
 	case other
 	when Absolute
@@ -92,6 +104,16 @@ def append(buffer = String.new)
 # @parameter fragment [String, nil] The fragment to use.
 # @parameter pop [Boolean] Whether to pop the last path component before merging.
 # @returns [Absolute] A new Absolute URL with the modified components.
+#
+# @example Change the scheme.
+# url = Absolute.new("http", "example.com", "/page")
+# secure = url.with(scheme: "https")
+# secure.to_s # => "https://example.com/page"
+#
+# @example Update the query string.
+# url = Absolute.new("https", "example.com", "/search", "query=ruby")
+# updated = url.with(query: "query=python")
+# updated.to_s # => "https://example.com/search?query=python"
 def with(scheme: @scheme, authority: @authority, path: nil, query: @query, fragment: @fragment, pop: true)
 	self.class.new(scheme, authority, Path.expand(@path, path, pop), query, fragment)
 end
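
As a cross-check on the two `+` examples documented above, the same relative references can be resolved with Ruby's standard `uri` library. This sketch does not use `Protocol::URL`; `resolve` is a hypothetical helper around `URI.join`, and it only shows that the documented results agree with standard RFC 3986 reference resolution.

```ruby
require "uri"

# Hypothetical helper: resolve a relative reference against a base URL using the standard library.
def resolve(base, relative)
	URI.join(base, relative).to_s
end

# A relative file name is appended to the base directory:
puts resolve("https://example.com/documents/reports/", "summary.pdf")

# Each ".." removes one directory from the base path before "archive/" is appended:
puts resolve("https://example.com/documents/reports/2024/", "../../archive/")
```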

lib/protocol/url/encoding.rb

Lines changed: 76 additions & 0 deletions
@@ -11,6 +11,14 @@ module Encoding
 #
 # @parameter string [String] The string to escape.
 # @returns [String] The escaped string.
+#
+# @example Escape spaces and special characters.
+# Encoding.escape("hello world!")
+# # => "hello%20world%21"
+#
+# @example Escape unicode characters.
+# Encoding.escape("café")
+# # => "caf%C3%A9"
 def self.escape(string, encoding = string.encoding)
 	string.b.gsub(/([^a-zA-Z0-9_.\-]+)/) do |m|
 		"%" + m.unpack("H2" * m.bytesize).join("%").upcase
@@ -21,12 +29,48 @@ def self.escape(string, encoding = string.encoding)
 #
 # @parameter string [String] The string to unescape.
 # @returns [String] The unescaped string.
+#
+# @example Unescape spaces and special characters.
+# Encoding.unescape("hello%20world%21")
+# # => "hello world!"
+#
+# @example Unescape unicode characters.
+# Encoding.unescape("caf%C3%A9")
+# # => "café"
 def self.unescape(string, encoding = string.encoding)
 	string.b.gsub(/%(\h\h)/) do |hex|
 		Integer($1, 16).chr
 	end.force_encoding(encoding)
 end
 
+# Unescapes a percent encoded path component, preserving encoded path separators.
+#
+# This method unescapes percent-encoded characters except for path separators
+# (forward slash `/` and backslash `\`). This prevents encoded separators like
+# `%2F` or `%5C` from being decoded into actual path separators, which could
+# allow bypassing path component boundaries.
+#
+# @parameter string [String] The path component to unescape.
+# @returns [String] The unescaped string with separators still encoded.
+#
+# @example
+# Encoding.unescape_path("hello%20world") # => "hello world"
+# Encoding.unescape_path("safe%2Fname") # => "safe%2Fname" (%2F not decoded)
+# Encoding.unescape_path("name%5Cfile") # => "name%5Cfile" (%5C not decoded)
+def self.unescape_path(string, encoding = string.encoding)
+	string.b.gsub(/%(\h\h)/) do |hex|
+		byte = Integer($1, 16)
+		char = byte.chr
+
+		# Don't decode forward slash (0x2F) or backslash (0x5C)
+		if byte == 0x2F || byte == 0x5C
+			hex # Keep as %2F or %5C
+		else
+			char
+		end
+	end.force_encoding(encoding)
+end
+
 # Matches characters that are not allowed in a URI path segment. According to RFC 3986 Section 3.3 (https://tools.ietf.org/html/rfc3986#section-3.3), a valid path segment consists of "pchar" characters. This pattern identifies characters that must be percent-encoded when included in a URI path segment.
 NON_PATH_CHARACTER_PATTERN = /([^a-zA-Z0-9_\-\.~!$&'()*+,;=:@\/]+)/.freeze

@@ -37,6 +81,10 @@ def self.unescape(string, encoding = string.encoding)
 #
 # @parameter path [String] The path to escape.
 # @returns [String] The escaped path.
+#
+# @example Escape spaces while preserving path separators.
+# Encoding.escape_path("/documents/my reports/summary.pdf")
+# # => "/documents/my%20reports/summary.pdf"
 def self.escape_path(path)
 	encoding = path.encoding
 	path.b.gsub(NON_PATH_CHARACTER_PATTERN) do |m|
@@ -59,6 +107,14 @@ def self.escape_fragment(fragment)
 #
 # @parameter value [Hash | Array | Nil] The value to encode.
 # @parameter prefix [String] The prefix to use for keys.
+#
+# @example Encode simple parameters.
+# Encoding.encode({"name" => "Alice", "age" => "30"})
+# # => "name=Alice&age=30"
+#
+# @example Encode nested parameters.
+# Encoding.encode({"user" => {"name" => "Alice", "role" => "admin"}})
+# # => "user[name]=Alice&user[role]=admin"
 def self.encode(value, prefix = nil)
 	case value
 	when Array
@@ -105,21 +161,33 @@ def self.split(name)
 
 # Assign a value to a nested hash.
 #
+# This method handles building nested data structures from query string parameters, including arrays of objects. When processing array elements (empty key like `[]`), it intelligently decides whether to add to the last array element or create a new one.
+#
 # @parameter keys [Array(String)] The parts of the key.
 # @parameter value [Object] The value to assign.
 # @parameter parent [Hash] The parent hash.
+#
+# @example Building an array of objects.
+# # Query: items[][name]=a&items[][value]=1&items[][name]=b&items[][value]=2
+# # When "name" appears again, it creates a new array element
+# # Result: {"items" => [{"name" => "a", "value" => "1"}, {"name" => "b", "value" => "2"}]}
 def self.assign(keys, value, parent)
 	top, *middle = keys
 
 	middle.each_with_index do |key, index|
 		if key.nil? or key.empty?
+			# Array element (e.g., items[]):
 			parent = (parent[top] ||= Array.new)
 			top = parent.size
 
+			# Check if we should reuse the last array element or create a new one. If there's a nested key coming next, and the last array element already has that key, then we need a new array element. Otherwise, add to the existing one.
 			if nested = middle[index+1] and last = parent.last
+				# If the last element doesn't include the nested key, reuse it (decrement index).
+				# If it does include the key, keep current index (creates new element).
 				top -= 1 unless last.include?(nested)
 			end
 		else
+			# Hash key (e.g., user[name]):
 			parent = (parent[top] ||= Hash.new)
 			top = key
 		end
@@ -134,6 +202,14 @@ def self.assign(keys, value, parent)
 # @parameter maximum [Integer] The maximum number of keys in a path.
 # @parameter symbolize_keys [Boolean] Whether to symbolize keys.
 # @returns [Hash] The decoded query string.
+#
+# @example Decode simple parameters.
+# Encoding.decode("name=Alice&age=30")
+# # => {"name" => "Alice", "age" => "30"}
+#
+# @example Decode nested parameters.
+# Encoding.decode("user[name]=Alice&user[role]=admin")
+# # => {"user" => {"name" => "Alice", "role" => "admin"}}
 def self.decode(string, maximum = 8, symbolize_keys: false)
 	parameters = {}
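
The array-of-objects rule documented on `assign` can be traced with a small standalone sketch. `QuerySketch` is a hypothetical module, not the library. Its loop follows the `assign` code shown in the diff, while three parts are assumptions: the final `parent[top] = value` line (the diff cuts off before the end of the method), the `split_key` helper, and the simplified `decode`, which skips percent-decoding and the `maximum` and `symbolize_keys` options.

```ruby
# Hypothetical standalone module illustrating the array-element reuse rule.
module QuerySketch
	# Assumed helper: "items[][name]" => ["items", "", "name"]
	def self.split_key(name)
		name.scan(/([^\[]+)|\[(.*?)\]/).map {|plain, bracketed| plain || bracketed}
	end
	
	# The loop follows the diff; the final assignment is an assumption.
	def self.assign(keys, value, parent)
		top, *middle = keys
		
		middle.each_with_index do |key, index|
			if key.nil? or key.empty?
				# Array element (e.g. items[]): default to appending a new element.
				parent = (parent[top] ||= Array.new)
				top = parent.size
				
				# Reuse the last element unless it already has the upcoming nested key.
				if nested = middle[index+1] and last = parent.last
					top -= 1 unless last.include?(nested)
				end
			else
				# Hash key (e.g. user[name]).
				parent = (parent[top] ||= Hash.new)
				top = key
			end
		end
		
		parent[top] = value
	end
	
	# Simplified decoder: no percent-decoding and no depth limit.
	def self.decode(string)
		parameters = {}
		
		string.split("&").each do |pair|
			name, value = pair.split("=", 2)
			assign(split_key(name), value, parameters)
		end
		
		parameters
	end
end

query = "items[][name]=a&items[][value]=1&items[][name]=b&items[][value]=2"
p QuerySketch.decode(query)
```

The second `items[][name]` finds that the last element already has a `name` key, so it starts a new element; each `items[][value]` finds no `value` key on the last element and is merged into it.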
