Support Net::HTTP::Persistent.write_timeout #586

Open: wants to merge 42 commits into base: main

Commits (42):
e9deb2b
updates Linux Firefox user-agent to rev94
ncs1 Nov 9, 2021
b898f47
Merge pull request #587 from ncs1/update_linux_firefox_ua
flavorjones Nov 11, 2021
4a0dfe5
version bump to v2.8.3
flavorjones Nov 11, 2021
1c099a6
use safe_load when using Psych >= 3.1
flavorjones Jan 17, 2022
ec9af73
Merge pull request #589 from sparklemotion/flavorjones-use-psych-safe…
flavorjones Jan 17, 2022
8302ec5
ci: update to cover Ruby 3.1
flavorjones Jan 17, 2022
c8b9d79
Merge pull request #588 from sparklemotion/flavorjones-update-ci-to-r…
flavorjones Jan 17, 2022
70ebc34
version bump to v2.8.4
flavorjones Jan 17, 2022
907c778
fix: clear credentials when redirecting to a different port
flavorjones Jun 6, 2022
c7fe699
Merge pull request #600 from sparklemotion/flavorjones-redirect-headers
flavorjones Jun 9, 2022
c1091fd
version bump to v2.8.5
flavorjones Jun 9, 2022
ab33104
ci: include ruby 3.2
flavorjones Feb 7, 2023
1cf35a5
ci: run CI in a cron job and cancel concurrent jobs
flavorjones Feb 7, 2023
802ce5c
test: skip NTLM test if openssl doesn't support MD4
flavorjones Feb 7, 2023
23f8a16
test: simplify jruby skips
flavorjones Feb 7, 2023
848084a
ci: pin jruby to 9.3 for now
flavorjones Feb 7, 2023
5c2b2b8
Merge pull request #606 from sparklemotion/flavorjones-update-ci-with…
flavorjones Feb 7, 2023
84ca53e
example: fix wikipedia script
flavorjones Feb 7, 2023
5dd0b84
Merge branch 'flavorjones-fix-wikipedia-script'
flavorjones Feb 7, 2023
427f9be
ci: update to jruby 9.4
flavorjones Mar 5, 2023
cb9da39
ci: use modern bundler
flavorjones Mar 5, 2023
70db881
Merge pull request #607 from sparklemotion/flavorjones-fix-ci-ruby-2.5
flavorjones Mar 6, 2023
3584eeb
drop Ruby 2.5 support
flavorjones Mar 14, 2023
4c8d3d2
Merge pull request #608 from sparklemotion/flavorjones-another-ruby25…
flavorjones Mar 15, 2023
96a28e7
fix: stop force_encoding the page body
Apr 6, 2023
18ca167
Merge pull request #610 from sparklemotion/flavorjones-drop-force-enc…
flavorjones Apr 7, 2023
cfa06d2
doc: update CHANGELOG
Apr 7, 2023
77ab71c
ci: update to actions/checkout@v3
Apr 7, 2023
b8f52c0
Merge pull request #611 from sparklemotion/flavorjones-update-ci-actions
flavorjones Apr 7, 2023
cf041d7
version bump to v2.9.0
Apr 7, 2023
78791c5
add example to fetch latest user agents
takatea Apr 14, 2023
96d21aa
update user agent strings for agent aliases
takatea Apr 14, 2023
18b7efd
doc: update docstring for AGENT_ALIASES and CHANGELOG
flavorjones Apr 15, 2023
4ab81ef
Merge pull request #612 from takatea/add-example-to-fetch-latest-user…
flavorjones Apr 15, 2023
59cc064
version bump to v2.9.1
flavorjones Apr 17, 2023
762df0c
test: work around libxml2 encoding changes
flavorjones Jun 7, 2023
74574f9
Merge pull request #614 from sparklemotion/613-update-tests-for-libxm…
flavorjones Jun 7, 2023
0ba09f4
test: work around libxml2 encoding changes
flavorjones Aug 11, 2023
4c82b11
Merge pull request #622 from sparklemotion/613-more-changes
flavorjones Aug 11, 2023
37785c0
ci: test_mechanize_page_link requires nokogiri before using it
flavorjones Sep 16, 2023
119d13e
Merge pull request #623 from sparklemotion/flavorjones-fix-require-order
flavorjones Sep 16, 2023
0dfaacb
Support Net::HTTP::Persistent.write_timeout
maurycy Aug 8, 2021

20 changes: 14 additions & 6 deletions .github/workflows/ci-test.yml
@@ -1,5 +1,9 @@
name: "ci"

concurrency:
group: "${{github.workflow}}-${{github.ref}}"
cancel-in-progress: true

on:
push:
branches:
@@ -8,15 +12,17 @@ on:
types: [opened, synchronize]
branches:
- main
schedule:
- cron: "0 8 * * 5" # At 08:00 on Friday # https://crontab.guru/#0_8_*_*_5

jobs:
rubocop:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- uses: ruby/setup-ruby@v1
with:
ruby-version: "3.0"
ruby-version: "3.2"
bundler-cache: true
- run: bundle exec rake rubocop

@@ -25,15 +31,16 @@ jobs:
strategy:
fail-fast: false
matrix:
ruby-version: ["2.5", "2.6", "2.7", "3.0", "jruby", "truffleruby-head"]
ruby-version: ["2.6", "2.7", "3.0", "3.1", "3.2", "head", "jruby-9.4", "truffleruby-head"]

runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- uses: ruby/setup-ruby@v1
with:
ruby-version: ${{matrix.ruby-version}}
bundler-cache: true
bundler: 2.3.26 # https://github.com/rubygems/rubygems/issues/6435
- run: bundle exec rake test

test-platform:
@@ -45,9 +52,10 @@ jobs:

runs-on: ${{matrix.platform}}
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- uses: ruby/setup-ruby@v1
with:
ruby-version: "3.0"
ruby-version: "3.2"
bundler-cache: true
bundler: latest
- run: bundle exec rake test
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,45 @@
# Mechanize CHANGELOG

## 2.9.1 / 2023-04-17

### Update

* Updated User-Agent strings to represent modern browser versions. (#612) Thank you, @takatea!


## 2.9.0 / 2023-04-07

### Requirements

* Mechanize now requires Ruby 2.6 or newer.


### Improvement

* Mechanize can now parse frozen strings. (#610)
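The commit behind this entry is titled "fix: stop force_encoding the page body", which suggests why frozen strings used to fail. The sketch below is not that change; it only illustrates, with plain Ruby, that `String#force_encoding` mutates its receiver and therefore raises on a frozen string, while `String#b` returns a re-tagged copy. The variable names are illustrative.

```ruby
# A frozen UTF-8 string standing in for a page body.
body = "caf\u00e9".freeze

# force_encoding modifies the receiver, so a frozen string raises FrozenError.
raised = begin
  body.force_encoding(Encoding::BINARY)
  false
rescue FrozenError
  true
end

# String#b returns a new, unfrozen copy tagged as binary (ASCII-8BIT);
# the original string and its encoding are left untouched.
binary_copy = body.b

puts raised
puts binary_copy.encoding
puts body.encoding
```

Working on a copy, or not re-tagging the encoding at all, avoids the error without requiring callers to pass unfrozen input.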


## 2.8.5 / 2022-06-09

### Security

Fixes low-severity CVE-2022-31033, "Authorization header leak on port redirect." See [GHSA-64qm-hrgp-pgr9](https://github.com/sparklemotion/mechanize/security/advisories/GHSA-64qm-hrgp-pgr9) for more details.
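The idea behind the fix ("clear credentials when redirecting to a different port") can be sketched with the standard library alone. This is a simplified illustration, not Mechanize's implementation: the helper names `same_origin?` and `headers_for_redirect` and the `SENSITIVE_HEADERS` list are hypothetical, and the comparison assumes that scheme, host, and port together identify the destination.

```ruby
require 'uri'

# Hypothetical list of headers that should not follow a cross-origin redirect.
SENSITIVE_HEADERS = %w[authorization].freeze

# Two URLs share an origin when scheme, host, and port all match.
# URI fills in the default port (80 for http, 443 for https) when none is given.
def same_origin?(from, to)
  a = URI(from)
  b = URI(to)
  [a.scheme, a.host, a.port] == [b.scheme, b.host, b.port]
end

# Returns the headers that are safe to send to the redirect target.
def headers_for_redirect(headers, from, to)
  return headers if same_origin?(from, to)

  headers.reject { |name, _value| SENSITIVE_HEADERS.include?(name.downcase) }
end

headers = { 'Authorization' => 'Basic dXNlcjpwYXNz', 'Accept' => '*/*' }
p headers_for_redirect(headers, 'http://example.com:8080/a', 'http://example.com:9090/b')
```

Comparing the port as well as the host is the point of the advisory: a redirect to the same host on a different port may reach a different service.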


## 2.8.4 / 2022-01-17

### Fix

* `Mechanize::CookieJar#load` calls `Psych.safe_load` when using Psych >= 3.1
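The version gate used for this fix in `lib/mechanize/cookie_jar.rb` relies on RubyGems' version comparison. A minimal sketch of that check in isolation follows; the constant and method names are hypothetical, and only the `">= 3.1"` requirement comes from the diff.

```ruby
require 'yaml'

# Requirement string taken from the diff; the constant name is illustrative.
SAFE_LOAD_REQUIREMENT = Gem::Requirement.new('>= 3.1')

# True when the given Psych version accepts the keyword form of safe_load.
def keyword_safe_load?(psych_version)
  SAFE_LOAD_REQUIREMENT.satisfied_by?(Gem::Version.new(psych_version))
end

puts keyword_safe_load?(Psych::VERSION)
```

Using `Gem::Version` rather than comparing strings keeps the ordering correct for versions such as `3.10.0`, which sorts before `3.2.0` as plain text.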


## 2.8.3 / 2021-11-11

### Update

* Update the "Linux Firefox" user agent string to rev94 (#587) Thank you, @ncs1!


## 2.8.2 / 2021-08-06

### Dependencies
2 changes: 1 addition & 1 deletion README.md
@@ -13,7 +13,7 @@ The Mechanize library is used for automating interaction with websites. Mechaniz

## Dependencies

* Ruby >= 2.5
* Ruby >= 2.6
* Gems:
* `addressable`
* `domain_name`
91 changes: 91 additions & 0 deletions examples/latest_user_agents.rb
@@ -0,0 +1,91 @@
require 'mechanize'

class LatestUAFetcher
attr_reader :user_agents

BASE_URL = 'https://www.whatismybrowser.com/guides/the-latest-user-agent'

def initialize
@agent = Mechanize.new.tap { |a| a.user_agent_alias = 'Mac Firefox' }
@user_agents = {}
end

def run
sleep_time = 1

puts 'get chrome UA...'
chrome
puts "sleeping... (#{sleep_time}s)"
sleep sleep_time

puts 'get firefox UA...'
firefox
puts "sleeping... (#{sleep_time}s)"
sleep sleep_time

puts 'get safari UA...'
safari
puts "sleeping... (#{sleep_time}s)"
sleep sleep_time

puts 'get edge UA...'
edge
end

private

def edge
page = @agent.get("#{BASE_URL}/edge")

windows_dom = page.css("h2:contains('Latest Edge on Windows User Agents')")
@user_agents[:edge] = {
windows: windows_dom.css('+ .listing-of-useragents .code').first.text
}
end

def firefox
page = @agent.get("#{BASE_URL}/firefox")

desktop_dom = page.css("h2:contains('Latest Firefox on Desktop User Agents')")
table_dom = desktop_dom.css('+ .listing-of-useragents')

@user_agents[:firefox] = {
windows: table_dom.css('td:contains("Windows")').css('+ td .code').text,
macOS: table_dom.css('td:contains("Macos")').css('+ td .code').text,
linux: table_dom.css('td:contains("Linux")').css("+ td .code:contains('Ubuntu; Linux x86_64')").text
}
end

def safari
page = @agent.get("#{BASE_URL}/safari")

macos_dom = page.css("h2:contains('Latest Safari on macOS User Agents')")
ios_dom = page.css("h2:contains('Latest Safari on iOS User Agents')")

@user_agents[:safari] = {
mac_os: macos_dom.css('+ .listing-of-useragents .code').first.text,
iphone: ios_dom.css('+ .listing-of-useragents').css("tr:contains('Iphone') .code").text,
ipad: ios_dom.css('+ .listing-of-useragents').css("tr:contains('Ipad') .code").text
}
end

def chrome
page = @agent.get("#{BASE_URL}/chrome")

windows_dom = page.css("h2:contains('Latest Chrome on Windows 10 User Agents')")
linux_dom = page.css("h2:contains('Latest Chrome on Linux User Agents')")
macos_dom = page.css("h2:contains('Latest Chrome on macOS User Agents')")
android_dom = page.css("h2:contains('Latest Chrome on Android User Agents')")

@user_agents[:chrome] = {
windows: windows_dom.css('+ .listing-of-useragents .code').first.text,
linux: linux_dom.css('+ .listing-of-useragents .code').first.text,
mac_os: macos_dom.css('+ .listing-of-useragents .code').first.text,
android: android_dom.css('+ .listing-of-useragents .code').first.text
}
end
end

agent = LatestUAFetcher.new
agent.run
p agent.user_agents
11 changes: 5 additions & 6 deletions examples/wikipedia_links_to_philosophy.rb
@@ -58,10 +58,10 @@ def finished?
# the article.

def follow_first_link
puts @title
puts "#{@title} (#{@page.uri})"

# > p > a rejects italics
links = @page.root.css('.mw-content-ltr > p > a[href^="/wiki/"]')
links = @page.root.css('.mw-content-ltr p > a[href^="/wiki/"]')

# reject disambiguation and special pages, images and files
links = links.reject do |link_node|
@@ -74,10 +74,9 @@ def follow_first_link

link = links.first

unless link then
# disambiguation page? try the first item in the list
link =
@page.root.css('.mw-content-ltr > ul > li > a[href^="/wiki/"]').first
if link.nil?
puts "Could not parse #{@page.uri}"
exit 1
end

# convert a Nokogiri HTML element back to a mechanize link
94 changes: 64 additions & 30 deletions lib/mechanize.rb
@@ -87,54 +87,73 @@ class Error < RuntimeError
# description in parenthesis is for informative purposes and is not part of
# the alias name.
#
# * Linux Firefox (43.0 on Ubuntu Linux)
# * Linux Konqueror (3)
# * Linux Mozilla
# * Mac Firefox (43.0)
# * Mac Mozilla
# * Mac Safari (9.0 on OS X 10.11.2)
# * Mac Safari 4
# * Mechanize (default)
# * Windows IE 6
# * Windows IE 7
# * Windows IE 8
# * Windows IE 9
# * Windows IE 10 (Windows 8 64bit)
# * Windows IE 11 (Windows 8.1 64bit)
# * Windows Edge
# * Windows Mozilla
# * Windows Firefox (43.0)
# * iPhone (iOS 9.1)
# * iPad (iOS 9.1)
# * Android (5.1.1)
# The default User-Agent alias:
#
# * "Mechanize"
#
# Linux User-Agent aliases:
#
# * "Linux Firefox"
# * "Linux Konqueror"
# * "Linux Mozilla"
#
# Mac User-Agent aliases:
#
# * "Mac Firefox"
# * "Mac Mozilla"
# * "Mac Safari 4"
# * "Mac Safari"
#
# Windows User-Agent aliases:
#
# * "Windows Edge"
# * "Windows Firefox"
# * "Windows IE 6"
# * "Windows IE 7"
# * "Windows IE 8"
# * "Windows IE 9"
# * "Windows IE 10"
# * "Windows IE 11"
# * "Windows Mozilla"
#
# Mobile User-Agent aliases:
#
# * "Android"
# * "iPad"
# * "iPhone"
#
# Example:
#
# agent = Mechanize.new
# agent.user_agent_alias = 'Mac Safari'

#
AGENT_ALIASES = {
# TODO: use output from examples/latest_user_agents.rb as the underlying data structure
'Mechanize' => "Mechanize/#{VERSION} Ruby/#{ruby_version} (http://github.com/sparklemotion/mechanize/)",
'Linux Firefox' => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:43.0) Gecko/20100101 Firefox/43.0',

'Linux Firefox' => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:112.0) Gecko/20100101 Firefox/112.0',
'Linux Konqueror' => 'Mozilla/5.0 (compatible; Konqueror/3; Linux)',
'Linux Mozilla' => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624',
'Mac Firefox' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0',

'Mac Firefox' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 13.3; rv:112.0) Gecko/20100101 Firefox/112.0',
'Mac Mozilla' => 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4a) Gecko/20030401',
'Mac Safari 4' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; de-at) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10',
'Mac Safari' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
'Windows Chrome' => 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.125 Safari/537.36',
'Mac Safari' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_3_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.4 Safari/605.1.15',

'Windows Chrome' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36',
'Windows Edge' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.46',
'Windows Firefox' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:112.0) Gecko/20100101 Firefox/112.0',
'Windows IE 6' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
'Windows IE 7' => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
'Windows IE 8' => 'Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
'Windows IE 9' => 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
'Windows IE 10' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)',
'Windows IE 11' => 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko',
'Windows Edge' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586',
'Windows Mozilla' => 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6',
'Windows Firefox' => 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0',
'iPhone' => 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B5110e Safari/601.1',
'iPad' => 'Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1',
'Android' => 'Mozilla/5.0 (Linux; Android 5.1.1; Nexus 7 Build/LMY47V) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.76 Safari/537.36',

'Android' => 'Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.5615.48 Mobile Safari/537.36',
'iPad' => 'Mozilla/5.0 (iPad; CPU OS 16_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.4 Mobile/15E148 Safari/604.1',
'iPhone' => 'Mozilla/5.0 (iPhone; CPU iPhone OS 16_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.4 Mobile/15E148 Safari/604.1',
}

AGENT_ALIASES.default_proc = proc { |hash, key|
@@ -933,6 +952,21 @@ def read_timeout= read_timeout
@agent.read_timeout = read_timeout
end

##
# Length of time to wait for data to be sent to the server

def write_timeout
@agent.write_timeout
end

##
# Sets the timeout for each chunk of data to be sent to the server to
# +write_timeout+. A single request may write many chunks of data.

def write_timeout= write_timeout
@agent.write_timeout = write_timeout
end
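The two accessors above forward to `@agent`. As a rough illustration of the setting they expose, the sketch below uses only the standard library's `Net::HTTP#write_timeout=` (available in Ruby 2.6 and newer). `TimeoutFacade` is a hypothetical stand-in for the delegating wrapper, not a Mechanize class, and no request is sent.

```ruby
require 'net/http'

# Hypothetical wrapper that forwards the timeout to an underlying
# connection object, mirroring the delegation pattern in the diff.
class TimeoutFacade
  def initialize(http)
    @http = http
  end

  # Seconds to wait for each chunk of request data to be written.
  def write_timeout
    @http.write_timeout
  end

  def write_timeout=(seconds)
    @http.write_timeout = seconds
  end
end

# Net::HTTP.new only builds the object; it does not open a socket.
http = Net::HTTP.new('example.com', 443)
facade = TimeoutFacade.new(http)
facade.write_timeout = 5

puts facade.write_timeout
```

Because the timeout applies per chunk written, a large upload can take longer than the configured value overall and still succeed.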

##
# Controls how mechanize deals with redirects. The following values are
# allowed:
14 changes: 13 additions & 1 deletion lib/mechanize/cookie_jar.rb
@@ -149,7 +149,7 @@ def load(input, *options)
return super(input, opthash) if opthash[:format] != :yaml

begin
data = YAML.load(input) # rubocop:disable Security/YAMLLoad
data = load_yaml(input)
rescue ArgumentError
@logger.warn "unloadable YAML cookie data discarded" if @logger
return self
@@ -174,6 +174,18 @@ def load(input, *options)
return self
end
end

private

if YAML.name == "Psych" && Gem::Requirement.new(">= 3.1").satisfied_by?(Gem::Version.new(Psych::VERSION))
def load_yaml(yaml)
YAML.safe_load(yaml, aliases: true, permitted_classes: ["Mechanize::Cookie", "Time"])
end
else
def load_yaml(yaml)
YAML.load(yaml) # rubocop:disable Security/YAMLLoad
end
end
end
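To show what the `safe_load` branch above changes in practice, here is a small standalone sketch, assuming Psych 3.1 or newer for the keyword arguments. The sample data is made up and does not match the real cookie jar format; only `permitted_classes:` and `aliases:` come from the diff.

```ruby
require 'yaml'

# Illustrative data only; a real cookie jar file has a different shape.
yaml = YAML.dump({ 'name' => 'session', 'expires' => Time.utc(2023, 4, 7, 12, 0, 0) })

# Without an allow-list, safe_load refuses to instantiate a Time object.
rejected = begin
  YAML.safe_load(yaml)
  false
rescue Psych::DisallowedClass
  true
end

# Permitting Time explicitly (and aliases, as the diff does) lets it load.
data = YAML.safe_load(yaml, permitted_classes: [Time], aliases: true)

puts rejected
puts data['expires'].class
```

Listing the permitted classes keeps untrusted YAML from instantiating arbitrary objects, which is the reason `YAML.load` carries the `Security/YAMLLoad` rubocop exemption in the fallback branch.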

class ::HTTP::CookieJar