This repository has been archived by the owner on Jan 3, 2024. It is now read-only.

selenium-wire with proxy timing-out inside AWS Lambda #716

Open
FelipeLagare opened this issue Oct 4, 2023 · 0 comments


Apparently, newer versions of Selenium-Wire won't work inside Lambda, so I'm using the following config to deploy my Lambda function:

Versions
[headless-chromium] = 1.0.0-57
[chromedriver] = 86.0.4240.22
urllib3==1.26.6
selenium==3.141.0
pyopenssl==22.0.0
cryptography==38.0.4
selenium-wire==4.0.4

Lambda settings
Runtime = Python 3.7
Memory = 10240 MB
Timeout = 300 seconds

My Lambda handler has the code below. After running some tests in Lambda, it's clear that webdriver.Chrome(), driver.get() and driver.quit() are the calls that take the longest. Without the proxy, that isn't a problem.
On my local machine, the proxy works fine.

Is there any way to work around this? Do you have similar cases where newer versions of Selenium-Wire or Python work inside Lambda?

from seleniumwire import webdriver
from selenium.webdriver.common.by import By
import time, json

def lambda_handler(event, context):
    page_url = event['url']
    
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--disable-extensions')
    options.add_argument("--window-size=1024,768")  # Chrome expects width,height separated by a comma
    options.add_argument("--disable-application-cache")
    options.add_argument("--user-data-dir=/tmp/user-data")
    options.add_argument('--disable-software-rasterizer')
    options.add_argument("--no-cache")
    options.add_argument("--disable-infobars")
    options.add_argument("--no-sandbox")
    options.add_argument("--hide-scrollbars")
    options.add_argument("--enable-logging")
    options.add_argument("--log-level=0")
    options.add_argument("--v=99")
    options.add_argument("--single-process")
    options.add_argument("--data-path=/tmp/data-path")
    options.add_argument("--ignore-certificate-errors")
    options.add_argument("--homedir=/tmp")
    options.add_argument("--remote-debugging-port=9222")
    options.add_argument("--disk-cache-dir=/tmp/cache-dir")
    options.add_argument("--user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36")
    options.binary_location = "./bin/headless-chromium"
    
    proxy_options = {
        'request_storage_base_dir': '/tmp',
        'exclude_hosts': '',
        'proxy': {
            'http': '**proxy_url**',
            'https': '**proxy_url**',
        }
    }
    driver = webdriver.Chrome(executable_path='./bin/chromedriver', options=options, seleniumwire_options=proxy_options)

    driver.get(page_url)
    time.sleep(2)

    graph = driver.find_element(By.CSS_SELECTOR, 'graph-element')
    elements = graph.find_elements(By.XPATH, ".//*")
    tooltip = elements[2]

    driver.execute_script(
        "arguments[0].scrollIntoView({'block':'center','inline':'center'})", 
        graph
    )

    text = []
    for offset in range(40, 730):
        action = webdriver.ActionChains(driver)
        action.move_to_element_with_offset(graph, offset, 200)
        action.perform()
        text.append(tooltip.text)
    
    driver.quit()

    response = {
        "statusCode": 200,
        "body": json.dumps(text)
    }
    return response
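
For reference, the per-call timings mentioned above could be collected with a small helper along these lines. This is only a sketch: `timed` and `timings` are hypothetical names that are not part of the handler above, and the `time.sleep` call is a stand-in for the real Selenium calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Store the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so a timed-out call still shows up.
        timings[label] = time.perf_counter() - start


timings = {}  # hypothetical container: label -> seconds

with timed("example_step", timings):
    # Stand-in for webdriver.Chrome(...), driver.get(...) or driver.quit().
    time.sleep(0.05)

# In Lambda, printed output ends up in the function's CloudWatch logs.
print({label: round(seconds, 3) for label, seconds in timings.items()})
```

Wrapping each of the three calls in its own `with timed(...)` block, once with the proxy and once without, would give numbers that can be compared directly.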