Commit 18357a7

Correct file path and sanitization in Windows
Not only were we not normalizing the file directories, we were also aggressively sanitizing incorrect characters, which led to some funny stuff on Windows. Fixes #16
1 parent 3fdfd70 commit 18357a7

1 file changed: +35 -19 lines changed

lib/wayback_machine_downloader.rb

Lines changed: 35 additions & 19 deletions
@@ -131,7 +131,11 @@ def initialize params
     validate_params(params)
     @base_url = params[:base_url]
     @exact_url = params[:exact_url]
-    @directory = params[:directory]
+    if params[:directory]
+      @directory = File.expand_path(params[:directory])
+    else
+      @directory = nil
+    end
     @all_timestamps = params[:all_timestamps]
     @from_timestamp = params[:from_timestamp].to_i
     @to_timestamp = params[:to_timestamp].to_i
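
Note on this hunk: `File.expand_path` resolves a relative directory against the current working directory, collapses `..` segments, and drops a trailing separator, so the result needs no further trailing-slash handling. A minimal sketch of that behavior, assuming a POSIX-style filesystem; `normalize_directory` is a hypothetical helper used only for illustration, not part of the library:

```ruby
# Hypothetical helper mirroring the new assignment in `initialize`.
def normalize_directory(directory)
  directory ? File.expand_path(directory) : nil
end

p normalize_directory(nil)              # nil, so backup_path uses its default
p normalize_directory('/tmp/site/')     # trailing separator is dropped
p normalize_directory('/tmp/a/../site') # ".." segment is collapsed
p normalize_directory('site')           # resolved against Dir.pwd
```

Keeping `nil` as `nil` matters here: `File.expand_path(nil)` would raise a `TypeError`, which is presumably why the commit guards the call.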
@@ -165,13 +169,11 @@ def backup_name

   def backup_path
     if @directory
-      if @directory[-1] == '/'
-        @directory
-      else
-        @directory + '/'
-      end
+      # @directory is already an absolute, normalized path, so we return it as is
+      @directory
     else
-      'websites/' + backup_name + '/'
+      # ensure the default path is absolute and normalized
+      File.expand_path(File.join('websites', backup_name))
     end
   end
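
Note on this hunk: `backup_path` no longer ends in a separator, so callers have to build child paths with `File.join` rather than string concatenation. A short sketch of the difference, using a made-up directory as a stand-in for the method's return value:

```ruby
# Hypothetical backup_path result: absolute, no trailing '/'.
base = '/tmp/websites/example.com'

concatenated = base + 'index.html'            # separator is lost
joined       = File.join(base, 'index.html')  # separator inserted exactly once

# File.join also avoids doubling a separator that is already present.
joined_from_slash = File.join(base + '/', 'index.html')

puts concatenated
puts joined
puts joined_from_slash
```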
@@ -638,21 +640,35 @@ def download_file (file_remote_info, http)
     file_url = file_remote_info[:file_url].encode(current_encoding)
     file_id = file_remote_info[:file_id]
     file_timestamp = file_remote_info[:timestamp]
-    file_path_elements = file_id.split('/')
+
+    # sanitize file_id to ensure it is a valid path component
+    raw_path_elements = file_id.split('/')
+
+    sanitized_path_elements = raw_path_elements.map do |element|
+      if Gem.win_platform?
+        # for Windows, we need to sanitize path components to avoid invalid characters
+        # this prevents issues with file names that contain characters not allowed in
+        # Windows file systems. See https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file#naming-conventions
+        # (block form of gsub: passing a lambda as the replacement argument raises a TypeError)
+        element.gsub(/[:\*?"<>\|\&\=\/\\]/) { |match| '%' + match.ord.to_s(16).upcase }
+      else
+        element
+      end
+    end
+
+    current_backup_path = backup_path

     if file_id == ""
-      dir_path = backup_path
-      file_path = backup_path + 'index.html'
-    elsif file_url[-1] == '/' or not file_path_elements[-1].include? '.'
-      dir_path = backup_path + file_path_elements[0..-1].join('/')
-      file_path = backup_path + file_path_elements[0..-1].join('/') + '/index.html'
+      dir_path = current_backup_path
+      file_path = File.join(dir_path, 'index.html')
+    elsif file_url[-1] == '/' || (sanitized_path_elements.last && !sanitized_path_elements.last.include?('.'))
+      # if file_id is a directory, we treat it as such
+      dir_path = File.join(current_backup_path, *sanitized_path_elements)
+      file_path = File.join(dir_path, 'index.html')
     else
-      dir_path = backup_path + file_path_elements[0..-2].join('/')
-      file_path = backup_path + file_path_elements[0..-1].join('/')
-    end
-    if Gem.win_platform?
-      dir_path = dir_path.gsub(/[:*?&=<>\\|]/) {|s| '%' + s.ord.to_s(16) }
-      file_path = file_path.gsub(/[:*?&=<>\\|]/) {|s| '%' + s.ord.to_s(16) }
+      # if file_id is a file, we treat it as such
+      filename = sanitized_path_elements.pop
+      dir_path = File.join(current_backup_path, *sanitized_path_elements)
+      file_path = File.join(dir_path, filename)
     end

     # check existence *before* download attempt
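
Note on this hunk: the per-component escaping can be tried in isolation. The sketch below uses the block form of `String#gsub` and takes the platform check as a keyword argument so it can be exercised on any OS. `sanitize_element`, its `windows:` parameter, and the sample paths are hypothetical, chosen for illustration only:

```ruby
# Hypothetical helper: percent-encode characters from the commit's regex
# in a single path component, leaving it untouched off Windows.
def sanitize_element(element, windows: Gem.win_platform?)
  return element unless windows
  element.gsub(/[:*?"<>|&=\/\\]/) { |char| '%' + char.ord.to_s(16).upcase }
end

elements  = 'blog/post?id=1'.split('/')
sanitized = elements.map { |e| sanitize_element(e, windows: true) }
path      = File.join('/tmp/site', *sanitized)

puts path
```

Escaping each component before joining means the separators added by `File.join` and the base directory itself are never rewritten. The removed code ran `gsub` over the whole path, and its character class included `:` and `\`, so an absolute Windows directory such as `C:\backups` would have been encoded as well.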
