[#7] [Backend] As a User, I can query a single keyword and get its Google search results parsed #39

merged 80 commits into from Jul 1, 2021
074f17a
[#6] Add basic create action and form to perform a google query
malparty Jun 14, 2021
96843ba
[#5] Add lastname and firstname to registration views
malparty Jun 9, 2021
5a0848a
Rebase from develop pull
malparty Jun 16, 2021
6372b09
root commit
malparty Jun 11, 2021
604d04b
[#5] Rename lastname/firstname into last_name first_name
malparty Jun 14, 2021
9477eab
[#5] Merge email (citext) and renaming names into previous db migrations
malparty Jun 14, 2021
82c30ab
[#6] Handle error while query google - not colorized
malparty Jun 16, 2021
37c6c63
root commit
malparty Jun 11, 2021
520bd20
Rebase to develop - pull
malparty Jun 17, 2021
4e1ea37
root commit
malparty Jun 11, 2021
b317fd5
[#7] Instal nokogiri Html Parser
malparty Jun 15, 2021
e3a3539
[#7] Define parserservice interface
malparty Jun 15, 2021
21ed3d4
[#7] Implement ads_top_count
malparty Jun 15, 2021
8a350b3
[#7] Implement ads_page_count
malparty Jun 15, 2021
25be192
[#7] Implement ads_top_url
malparty Jun 15, 2021
31b709a
[#7] Add all other parsing methods
malparty Jun 16, 2021
844e5fe
root commit
malparty Jun 11, 2021
d1f1b05
[#5] Rename lastname/firstname into last_name first_name
malparty Jun 14, 2021
42d55fe
[#5] Merge email (citext) and renaming names into previous db migrations
malparty Jun 14, 2021
c397bd3
[#6] Declare vcr inline to improve code readability
malparty Jun 16, 2021
1f697b5
[#6] Replace instance variables by local variables inside view
malparty Jun 16, 2021
045ff30
[#6] Colorize logs from Rails.logger
malparty Jun 16, 2021
b6610c5
root commit
malparty Jun 11, 2021
2ae9256
root commit
malparty Jun 11, 2021
d35b731
root commit
malparty Jun 11, 2021
1a57cda
[#5] Update email db type to citext
malparty Jun 14, 2021
354e7da
[#5] Rename lastname/firstname into last_name first_name
malparty Jun 14, 2021
d30e668
[#5] Merge email (citext) and renaming names into previous db migrations
malparty Jun 14, 2021
9be2612
[#7] Fix rubocop warnings
malparty Jun 17, 2021
84d4330
[#7] Refactor tests with inline vcr
malparty Jun 17, 2021
b5abfeb
[#7] Add error handling into google parser
malparty Jun 18, 2021
12dd01f
root commit
malparty Jun 11, 2021
5248a3d
root commit
malparty Jun 11, 2021
e18701d
root commit
malparty Jun 11, 2021
954dab6
root commit
malparty Jun 11, 2021
27b18c2
[#6] Remove create action and index form
malparty Jun 21, 2021
ec6623e
root commit
malparty Jun 11, 2021
5f73d93
root commit
malparty Jun 11, 2021
635d495
root commit
malparty Jun 11, 2021
4b9601f
root commit
malparty Jun 11, 2021
66ee33a
[#7] Update rspec method name change to call
malparty Jun 22, 2021
9d31857
[#7] Use keyword argument to initislize service
malparty Jun 22, 2021
5462897
[#7] Rename GoogleService namespace in Google
malparty Jun 22, 2021
71f477c
[#7] Rename GoogleService namespace in Google missing folder
malparty Jun 22, 2021
9ca34f0
[#7] Update service tests to call with keyword arguments
malparty Jun 22, 2021
ff18ae1
[#7] Update parse_into method to be bang method and return the object
malparty Jun 22, 2021
b1d2594
root commit
malparty Jun 11, 2021
cf86c54
root commit
malparty Jun 11, 2021
e0e1eb4
root commit
malparty Jun 11, 2021
0bf96eb
[#293] Setup DoorKeeper to embbed OAuth2
malparty Jun 9, 2021
f578260
root commit
malparty Jun 11, 2021
1ca527a
[#6] Remove keywords#create action from routes
malparty Jun 22, 2021
847de8f
root commit
malparty Jun 11, 2021
708b704
root commit
malparty Jun 11, 2021
294cf8e
root commit
malparty Jun 11, 2021
5431dac
root commit
malparty Jun 11, 2021
50aeb0a
root commit
malparty Jun 11, 2021
5eed069
root commit
malparty Jun 11, 2021
ec48425
root commit
malparty Jun 11, 2021
1e82221
root commit
malparty Jun 11, 2021
a817125
root commit
malparty Jun 11, 2021
b751213
root commit
malparty Jun 11, 2021
4e8c6d6
root commit
malparty Jun 11, 2021
a4e09d0
root commit
malparty Jun 11, 2021
9ed9447
[#7] Rename GoogleService namespace in Google
malparty Jun 22, 2021
ca3c0a7
[#7] Rename GoogleService namespace in Google missing folder
malparty Jun 22, 2021
6c43759
[#7] Update service tests to call with keyword arguments
malparty Jun 22, 2021
c34874f
[#7] Update client service to match rebase code
malparty Jun 22, 2021
dff824b
[#7] Fix rubocop indentation
malparty Jun 22, 2021
edcb48d
[#7] Remove useless require nokogiri statment
malparty Jun 23, 2021
55783b5
[#7] Use constants to DRY parser css selectors
malparty Jun 23, 2021
438507e
[#7] Rebase from search-raw after user-login-api merge
malparty Jun 23, 2021
aa3a43f
[#7] Rebase from search-raw remove missed file
malparty Jun 23, 2021
3688817
[#7] Remove unwanted form in keyword#index view
malparty Jun 24, 2021
88e9a61
[#7] Fix parser service with call mthod and return hash of attribute
malparty Jun 28, 2021
bdd4d8a
[#7] Rename vcr to simpler google_search/xx name
malparty Jun 28, 2021
070abd5
[#7] Migrate DB after rebase
malparty Jun 28, 2021
e0a42da
[#7] Migrate DB after rebase - fix missing forzen string literal
malparty Jun 28, 2021
d8e6042
[#7] Add attr_reader for gogole parser service
malparty Jun 30, 2021
41d05a6
[#7] Wrap specs with describe #method_name
malparty Jun 30, 2021
1 change: 1 addition & 0 deletions Gemfile
@@ -15,6 +15,7 @@ gem 'bootsnap', require: false # Reduces boot times through caching; required in
 gem 'i18n-js', '3.5.1' # A library to provide the I18n translations on the Javascript
 gem 'jsonapi-serializer' # A fast JSON:API serializer for Ruby Objects.
 gem 'httparty' # A HTTP client for Ruby.
+gem 'nokogiri' # Nokogiri makes it easy and painless to work with XML and HTML from Ruby

 # Authentications & Authorizations
 gem 'devise' # Authentication solution for Rails with Warden
1 change: 1 addition & 0 deletions Gemfile.lock
@@ -506,6 +506,7 @@ DEPENDENCIES
   letter_opener
   listen (= 3.1.5)
   mini_magick
+  nokogiri
   pagy
   pg
   pry-byebug
69 changes: 69 additions & 0 deletions app/services/google/parser_service.rb
@@ -0,0 +1,69 @@
+# frozen_string_literal: true
+
+module Google
+  class ParserService
+    NON_ADS_RESULT_SELECTOR = 'a[data-ved]:not([role]):not([jsaction]):not(.adwords):not(.footer-links)'
+    AD_CONTAINER_ID = 'tads'
+    ADWORDS_CLASS = 'adwords'
+
+    def initialize(html_response:)
+      raise ArgumentError, 'response.body cannot be blank' if html_response.body.blank?
+
+      @html = html_response
+
+      @document = Nokogiri::HTML.parse(html_response)
+
+      # Add a class to all AdWords link for easier manipulation
+      document.css('div[data-text-ad] a[data-ved]').add_class(ADWORDS_CLASS)
+
+      # Mark footer links to identify them
+      document.css('#footcnt a').add_class('footer-links')
+    end
+
+    # Parse html data and return a hash with the results
+    def call
+      {
+        ads_top_count: ads_top_count,
+        ads_page_count: ads_page_count,
+        ads_top_url: ads_top_url,
+        ads_page_url: ads_page_url,
+        non_ads_result_count: non_ads_result_count,
+        non_ads_url: non_ads_url,
+        total_link_count: total_link_count,
+        html: html
+      }
+    end
+
+    private
+
+    attr_reader :html, :document
+
+    def ads_top_count
+      document.css("##{AD_CONTAINER_ID} .#{ADWORDS_CLASS}").count
+    end
+
+    def ads_page_count
+      document.css(".#{ADWORDS_CLASS}").count
+    end
+
+    def ads_top_url
+      document.css("##{AD_CONTAINER_ID} .#{ADWORDS_CLASS}").map { |a_tag| a_tag['href'] }
+    end
+
+    def ads_page_url
+      document.css(".#{ADWORDS_CLASS}").map { |a_tag| a_tag['href'] }
+    end
+
+    def non_ads_result_count
+      document.css(NON_ADS_RESULT_SELECTOR).count
+    end
+
+    def non_ads_url
+      document.css(NON_ADS_RESULT_SELECTOR).map { |a_tag| a_tag['href'] }
+    end
+
+    def total_link_count
+      document.css('a').count
+    end
+  end
+end
3,430 changes: 3,430 additions & 0 deletions spec/fixtures/vcr/google_search/top_ads_1.yml

Large diffs are not rendered by default.

343 changes: 343 additions & 0 deletions spec/fixtures/vcr/google_search/top_ads_6.yml

Large diffs are not rendered by default.

40 changes: 21 additions & 19 deletions spec/services/google/client_service_spec.rb
@@ -3,34 +3,36 @@
 require 'rails_helper'

 RSpec.describe Google::ClientService, type: :service do
-  context 'when querying a simple keyword' do
-    it 'returns an HTTParty Response', vcr: 'google_search' do
-      result = described_class.new(keyword: FFaker::Lorem.word).call
+  describe '#call' do
+    context 'when querying a simple keyword' do
+      it 'returns an HTTParty Response', vcr: 'google_search/base' do
+        result = described_class.new(keyword: FFaker::Lorem.word).call

-      expect(result).to be_an_instance_of(HTTParty::Response)
-    end
+        expect(result).to be_an_instance_of(HTTParty::Response)
+      end

-    it 'queries Google Search', vcr: 'google_search' do
-      path = described_class.new(keyword: FFaker::Lorem.word).call.request.path
+      it 'queries Google Search', vcr: 'google_search/base' do
+        path = described_class.new(keyword: FFaker::Lorem.word).call.request.path

-      expect(path.to_s).to start_with(described_class::BASE_SEARCH_URL)
+        expect(path.to_s).to start_with(described_class::BASE_SEARCH_URL)
+      end
     end
-  end

-  context 'when google returns an HTTP error' do
-    it 'returns false', vcr: 'google_warn' do
-      result = described_class.new(keyword: FFaker::Lorem.word).call
+    context 'when google returns an HTTP error' do
+      it 'returns false', vcr: 'google_search/too_many_requests' do
+        result = described_class.new(keyword: FFaker::Lorem.word).call

-      expect(result).to eq(false)
-    end
+        expect(result).to eq(false)
+      end

-    it 'logs a warning with the escaped keyword', vcr: 'google_warn' do
-      allow(Rails.logger).to receive(:warn)
+      it 'logs a warning with the escaped keyword', vcr: 'google_search/too_many_requests' do
+        allow(Rails.logger).to receive(:warn)

-      word = FFaker::Lorem.word
-      described_class.new(keyword: word).call
+        word = FFaker::Lorem.word
+        described_class.new(keyword: word).call

-      expect(Rails.logger).to have_received(:warn).with(/#{CGI.escape(word)}/)
+        expect(Rails.logger).to have_received(:warn).with(/#{CGI.escape(word)}/)
+      end
     end
   end
 end
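Reviewer note: the last expectation interpolates `CGI.escape(word)` straight into a regular expression. That is fine for the single words `FFaker::Lorem.word` produces, but it would likely stop matching for multi-word keywords, because `CGI.escape` turns spaces into `+`, which is a regex quantifier. A minimal sketch of the difference, assuming a made-up log message (the real message written by `Google::ClientService` is not part of this diff):

```ruby
require 'cgi'

keyword = 'best vpn'
escaped = CGI.escape(keyword) # spaces become "+"

# Hypothetical log line; the service's actual wording is not shown in this PR.
log_line = "Google query failed for #{escaped}"

# Interpolating the escaped keyword as-is lets "+" act as a quantifier.
naive_pattern = /#{escaped}/
# Regexp.escape makes the pattern match the escaped keyword literally.
safe_pattern = /#{Regexp.escape(escaped)}/

naive_match = naive_pattern.match?(log_line)
safe_match = safe_pattern.match?(log_line)
```

If the spec ever moves to multi-word keywords, wrapping the interpolation in `Regexp.escape` (or matching with a plain string via `include`) would keep it stable.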
54 changes: 54 additions & 0 deletions spec/services/google/parser_service_spec.rb
@@ -0,0 +1,54 @@
+# frozen_string_literal: true
+
+require 'rails_helper'
+
+RSpec.describe Google::ParserService, type: :service do
+  describe '#call' do
+    context 'when parsing a page having 1 top ad' do
+      it 'counts exactly 1 top ad', vcr: 'google_search/top_ads_1' do
+        result = Google::ClientService.new(keyword: 'squarespace').call
+
+        expect(described_class.new(html_response: result).call[:ads_top_count]).to eq(1)
+      end
+    end
+
+    context 'when parsing a page having 3 top ads, 3 bottom ads and 14 non ad links' do
+      it 'counts exactly 3 top ads', vcr: 'google_search/top_ads_6' do
+        result = Google::ClientService.new(keyword: 'vpn').call
+
+        expect(described_class.new(html_response: result).call[:ads_top_count]).to eq(3)
+      end
+
+      it 'counts exactly 6 ads in total', vcr: 'google_search/top_ads_6' do
+        result = Google::ClientService.new(keyword: 'vpn').call
+
+        expect(described_class.new(html_response: result).call[:ads_page_count]).to eq(6)
+      end
+
+      it 'finds exactly the 3 top ads urls', vcr: 'google_search/top_ads_6' do
+        result = Google::ClientService.new(keyword: 'vpn').call
+
+        expect(described_class.new(html_response: result).call[:ads_top_url]).to contain_exactly('https://cloud.google.com/free', 'https://www.expressvpn.com/', 'https://www.top10vpn.com/best-vpn-for-vietnam/')
+      end
+
+      it 'counts exactly 14 non ad results', vcr: 'google_search/top_ads_6' do
+        result = Google::ClientService.new(keyword: 'vpn').call
+
+        expect(described_class.new(html_response: result).call[:non_ads_result_count]).to eq(14)
+      end
+
+      it 'gets 14 results', vcr: 'google_search/top_ads_6' do
+        result = Google::ClientService.new(keyword: 'vpn').call
+
+        expect(described_class.new(html_response: result).call[:non_ads_url].count).to eq(14)
+      end
+
+      it 'gets exactly 113 links', vcr: 'google_search/top_ads_6' do
+        # Counted from cassette html raw code
+        result = Google::ClientService.new(keyword: 'vpn').call
+
+        expect(described_class.new(html_response: result).call[:total_link_count]).to eq(113)
+      end
+    end
+  end
+end
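Reviewer note: these specs replay a recorded cassette whatever keyword is passed, which follows from `match_requests_on: [:path]` in `spec/support/vcr.rb`: only the URI path is compared, so the query string is ignored. The sketch below is a rough stand-in for that comparison, not VCR's own code, and the URLs are assumed shapes (the real `BASE_SEARCH_URL` is not shown in this diff).

```ruby
require 'uri'

# Approximation of a path-only request matcher: the query string is ignored.
def same_path?(recorded_uri, incoming_uri)
  URI(recorded_uri).path == URI(incoming_uri).path
end

# Assumed URL shapes, for illustration only.
recorded = 'https://www.google.com/search?q=vpn'
random_keyword = 'https://www.google.com/search?q=lorem'
other_endpoint = 'https://www.google.com/maps?q=vpn'

replays_for_random_keyword = same_path?(recorded, random_keyword)
replays_for_other_endpoint = same_path?(recorded, other_endpoint)
```

The trade-off is that the keyword given in a spec documents intent but does not select the response; the cassette name in the `vcr:` metadata does.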
4 changes: 4 additions & 0 deletions spec/support/vcr.rb
@@ -10,6 +10,10 @@
   c.ignore_request do |request|
     URI(request.uri).port == 9200
   end
+  # Uncomment when need to record a cassette with readable Html
+  # c.before_record do |i|
+  # i.response.body.force_encoding('UTF-8')
+  # end
   c.default_cassette_options = { record: :none, match_requests_on: [:path] }
 end

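Reviewer note: the commented-out `before_record` hook matters because of how Ruby's YAML serializer treats string encodings. A body tagged as binary (ASCII-8BIT) that contains non-ASCII bytes is dumped as a base64 `!binary` scalar, while the same bytes tagged as UTF-8 are dumped as text. The sketch below shows this with the standard `yaml` library only; the sample body and the `'string'` key are illustrative, and whether a given HTTP response arrives binary-tagged depends on the client and response headers.

```ruby
require 'yaml'

body = '<p>Thé quick brown fox</p>'

# Same bytes, tagged as binary, as a network library might return them.
binary_body = body.b
# Same bytes again, re-tagged as UTF-8 (what the commented hook would do).
utf8_body = binary_body.dup.force_encoding('UTF-8')

binary_yaml = YAML.dump('string' => binary_body)
utf8_yaml = YAML.dump('string' => utf8_body)

binary_is_base64 = binary_yaml.include?('!binary')
utf8_is_base64 = utf8_yaml.include?('!binary')
round_trips = YAML.load(utf8_yaml)['string'] == body
```

`force_encoding` only changes the encoding tag and never the bytes, so it is safe when the body really is UTF-8 and misleading when it is not, which may be why the hook is left commented out by default.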