Understanding Proxies in Web Automation
Proxies serve as the clandestine agents of the internet, masking your IP address and enabling you to traverse digital frontiers with subtlety. In the context of web automation—where Selenium and Puppeteer pirouette in the browser’s theatre—proxies are indispensable for circumventing rate limits, geo-restrictions, and surveillance. Free proxies, though capricious and ephemeral, can suffice for lightweight, non-critical scraping or testing scenarios.
Types of Proxies and Their Characteristics
| Proxy Type | Anonymity Level | Protocols Supported | Typical Use-Case | Reliability |
|---|---|---|---|---|
| HTTP | Low to Medium | HTTP, HTTPS | Simple web scraping | Low |
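| SOCKS4/5 | High | Any TCP traffic (SOCKS5 adds UDP and username/password authentication) | Non-HTTP protocols, HTTPS | Medium |
| Transparent | None (reveals IP) | HTTP, HTTPS | Caching, internal use | Very Low |
| Elite/Anonymous | High | HTTP, HTTPS | Bypassing geo-blocks | Medium |

The first two rows classify proxies by protocol and the last two by how much they reveal, so a given proxy belongs to one group from each. A transparent proxy forwards your IP (typically in an X-Forwarded-For header), an anonymous proxy hides your IP but announces itself (for example with a Via header), and an elite proxy hides both.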
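To find out which anonymity level a proxy really provides, you can request a plain-HTTP page through it from a server you control that echoes back every request header, then inspect what arrived. The sketch below assumes exactly that setup: over HTTPS the proxy only tunnels encrypted bytes and cannot add headers, and some public echo services hide forwarding headers. `classify_anonymity` and the header list are illustrative choices, not a standard.

```python
import re
from typing import Mapping

# Headers that commonly betray a proxy (illustrative, not exhaustive)
PROXY_HEADERS = ("via", "x-forwarded-for", "forwarded", "x-real-ip", "proxy-connection")


def _tokens(value: str) -> set:
    """Split a header value such as 'for=1.2.3.4;proto=http' into comparable pieces."""
    return set(re.split(r'[\s,;="]+', value))


def classify_anonymity(real_ip: str, received_headers: Mapping[str, str]) -> str:
    """Heuristically label a proxy from the headers the target server received."""
    headers = {name.lower(): value for name, value in received_headers.items()}
    if any(real_ip in _tokens(value) for value in headers.values()):
        return "transparent"  # your own IP leaked through
    if any(name in headers for name in PROXY_HEADERS):
        return "anonymous"  # IP hidden, but the proxy announces itself
    return "elite"  # no visible trace of the proxy
```

Comparing whole tokens rather than substrings keeps an address like 203.0.113.5 from matching 203.0.113.50.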
For a compendium of free proxy lists, peruse https://free-proxy-list.net/ or https://www.sslproxies.org/.
Using Free Proxies with Selenium (Python)
1. Installing Dependencies
```shell
pip install selenium
```
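Selenium 4.6 and later include Selenium Manager, which downloads a ChromeDriver matching your installed Chrome automatically. On older versions, download the ChromeDriver that matches your Chrome version yourself and put it on your PATH.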
2. Configuring a Proxy in Selenium
The browser, that digital marionette, can be commanded thus:
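```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

proxy = "186.121.235.66:8080"  # Replace with a live free proxy (HOST:PORT)

options = Options()
# Route all of Chrome's traffic through the proxy
options.add_argument(f'--proxy-server=http://{proxy}')

driver = webdriver.Chrome(options=options)
try:
    # httpbin echoes the IP address it sees; it should be the proxy's
    driver.get('https://httpbin.org/ip')
    print(driver.page_source)
finally:
    driver.quit()  # close the browser even if the page load fails
```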
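Printing page_source only dumps raw HTML. A small check, sketched below with hypothetical helper names, confirms that the proxy actually took effect by looking for the proxy's host in the origin httpbin reports. It assumes the proxy exits from the same IP you connect to. That is not true of every proxy, so treat a mismatch as a hint rather than proof of failure.

```python
import re


def origin_from_page_source(html: str) -> list:
    """Extract the IP list from httpbin's /ip JSON as it appears in Chrome's page_source."""
    match = re.search(r'"origin"\s*:\s*"([^"]*)"', html)
    if match is None:
        raise ValueError("no httpbin 'origin' field found in page source")
    # httpbin may report several comma-separated addresses
    return [ip.strip() for ip in match.group(1).split(",") if ip.strip()]


def proxy_in_effect(html: str, proxy: str) -> bool:
    """True if the proxy's host (from HOST:PORT) appears among the reported origins."""
    host = proxy.rsplit(":", 1)[0]
    return host in origin_from_page_source(html)
```

In the script, `proxy_in_effect(driver.page_source, proxy)` would go right after `driver.get(...)`.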
Table: Common Chrome Proxy Switches
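| Option | Description |
|---|---|
| --proxy-server=http://IP:PORT | Route all web traffic through an HTTP proxy |
| --proxy-server=https=IP:PORT | Use the proxy only for https:// URLs |
| --proxy-server=socks5://IP:PORT | Route all web traffic through a SOCKS5 proxy |
| --proxy-bypass-list=localhost;127.0.0.1 | Exclude hosts from the proxy (semicolon-separated) |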
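When proxies come from a list, it helps to build these switches in one place. Here is a minimal sketch that assumes a Chrome-family browser and the schemes from the table. `chrome_proxy_args` is a hypothetical helper, and it validates its input before producing any switch.

```python
ALLOWED_SCHEMES = {"http", "socks4", "socks5"}


def chrome_proxy_args(host, port, scheme="http", bypass=("localhost", "127.0.0.1")):
    """Return Chrome command-line switches that route traffic through one proxy."""
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {scheme!r}")
    if not host or "://" in host:
        raise ValueError("host must be a bare hostname or IP address")
    if not 0 < int(port) < 65536:
        raise ValueError(f"port out of range: {port}")
    args = [f"--proxy-server={scheme}://{host}:{int(port)}"]
    if bypass:
        # Chrome expects bypass entries separated by semicolons
        args.append("--proxy-bypass-list=" + ";".join(bypass))
    return args
```

In Selenium, each returned string goes to `options.add_argument(...)`. In Puppeteer, the same strings go into the `args` array.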
3. Using Proxies with Authentication
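Free proxies with authentication are rare gems, but should you chance upon one, note that Chrome's --proxy-server switch has no way to pass a username and password. The usual workaround is a small extension that sets the proxy and answers its authentication challenge. This extension uses Manifest V2, which recent Chrome releases are phasing out, so it may not load in current browser versions: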
```python
import zipfile

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Chrome's --proxy-server switch ignores credentials, so a small extension
# sets the proxy and answers the authentication challenge instead.
# Note: this is a Manifest V2 extension, which recent Chrome releases are phasing out.
proxy_host = 'proxy.example.com'
proxy_port = 8000
proxy_user = 'user'
proxy_pass = 'pass'

manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    }
}
"""

background_js = f"""
var config = {{
    mode: "fixed_servers",
    rules: {{
        singleProxy: {{
            scheme: "http",
            host: "{proxy_host}",
            port: {proxy_port}
        }},
        bypassList: ["localhost"]
    }}
}};

chrome.proxy.settings.set({{value: config, scope: "regular"}}, function() {{}});

// Answer only proxy challenges (407), so the credentials never go to ordinary sites
function callbackFn(details) {{
    if (!details.isProxy) {{
        return {{}};
    }}
    return {{
        authCredentials: {{
            username: "{proxy_user}",
            password: "{proxy_pass}"
        }}
    }};
}}

chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {{urls: ["<all_urls>"]}},
    ['blocking']
);
"""

# Package the extension as a zip archive that ChromeDriver can load
pluginfile = 'proxy_auth_plugin.zip'
with zipfile.ZipFile(pluginfile, 'w') as zp:
    zp.writestr("manifest.json", manifest_json)
    zp.writestr("background.js", background_js)

chrome_options = Options()
chrome_options.add_extension(pluginfile)
driver = webdriver.Chrome(options=chrome_options)
try:
    driver.get('https://httpbin.org/ip')
    print(driver.page_source)
finally:
    driver.quit()
```
Reference: Selenium Proxy Authentication (GitHub Gist)
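One weakness of building background.js with f-strings is that a quote or backslash in the username or password produces broken JavaScript. The variation sketched below makes the same Manifest V2 assumption but serializes every value with `json.dumps`, so each one is always a valid literal. It also builds the archive in memory. `build_proxy_auth_extension` is a hypothetical name.

```python
import io
import json
import zipfile


def build_proxy_auth_extension(host: str, port: int, user: str, password: str) -> bytes:
    """Return the bytes of a zipped Chrome extension that sets and authenticates a proxy."""
    manifest = {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                        "<all_urls>", "webRequest", "webRequestBlocking"],
        "background": {"scripts": ["background.js"]},
    }
    config = {
        "mode": "fixed_servers",
        "rules": {
            "singleProxy": {"scheme": "http", "host": host, "port": int(port)},
            "bypassList": ["localhost"],
        },
    }
    credentials = {"authCredentials": {"username": user, "password": password}}
    # json.dumps output is a valid JavaScript literal, so special characters are escaped
    background_js = (
        f"chrome.proxy.settings.set({{value: {json.dumps(config)}, scope: 'regular'}}, function() {{}});\n"
        "chrome.webRequest.onAuthRequired.addListener(\n"
        f"  function(details) {{ return details.isProxy ? {json.dumps(credentials)} : {{}}; }},\n"
        "  {urls: ['<all_urls>']},\n"
        "  ['blocking']\n"
        ");\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
        archive.writestr("background.js", background_js)
    return buffer.getvalue()
```

To use it with Selenium, write the returned bytes to proxy_auth_plugin.zip and pass that path to `chrome_options.add_extension`.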
Using Free Proxies with Puppeteer (Node.js)
1. Installing Puppeteer
```shell
npm install puppeteer
```
2. Launching Puppeteer with a Proxy
Let the browser don its new mask:
```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    args: ['--proxy-server=http://186.121.235.66:8080']
  });
  const page = await browser.newPage();
  await page.goto('https://httpbin.org/ip');
  const body = await page.content();
  console.log(body);
  await browser.close();
})();
```
3. Handling Proxy Authentication
When the gatekeeper demands credentials:
```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    args: ['--proxy-server=http://proxy.example.com:8000']
  });
  const page = await browser.newPage();
  await page.authenticate({
    username: 'user',
    password: 'pass'
  });
  await page.goto('https://httpbin.org/ip');
  const body = await page.content();
  console.log(body);
  await browser.close();
})();
```
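At the HTTP level, a proxy that wants credentials replies with 407 Proxy Authentication Required, and the client retries with a Proxy-Authorization header. `page.authenticate` lets Chromium take care of that exchange for you. For the common Basic scheme (RFC 7617), the header value is just the base64 encoding of user:pass, as this small sketch with a hypothetical helper shows:

```python
import base64


def basic_proxy_authorization(username: str, password: str) -> str:
    """Build a Proxy-Authorization header value for the Basic scheme (RFC 7617)."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

Base64 is an encoding, not encryption. Basic credentials sent to a plain-HTTP proxy are therefore readable by anyone on the path, which is one more reason to keep such proxies away from anything sensitive.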
4. Rotating Proxies in Puppeteer
A ballet of ephemeral identities, orchestrated thus:
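```javascript
const puppeteer = require('puppeteer');

// Placeholder addresses: substitute real HOST:PORT values from your list
const proxies = [
  'http://proxy1:port',
  'http://proxy2:port',
  'http://proxy3:port'
];

(async () => {
  for (const proxy of proxies) {
    // A fresh browser per proxy, because --proxy-server is fixed at launch
    const browser = await puppeteer.launch({
      args: [`--proxy-server=${proxy}`]
    });
    try {
      const page = await browser.newPage();
      await page.goto('https://httpbin.org/ip', { timeout: 15000 });
      const body = await page.content();
      console.log(`Proxy: ${proxy}\n${body}\n`);
    } catch (err) {
      // Dead or slow free proxies are common; log the failure and move on
      console.error(`Proxy ${proxy} failed: ${err.message}`);
    } finally {
      await browser.close();
    }
  }
})();
```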
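Cycling through a fixed list keeps wasting launches on proxies that have already failed. The bookkeeping for a smarter rotation does not depend on the browser library. The sketch below uses a hypothetical `ProxyPool` class, written in Python like the Selenium examples. It hands out proxies round-robin and retires any proxy that fails several times in a row.

```python
from collections import deque


class ProxyPool:
    """Round-robin proxy rotation that retires proxies after repeated failures."""

    def __init__(self, proxies, max_failures=2):
        self._queue = deque(proxies)
        self._failures = {proxy: 0 for proxy in self._queue}
        self.max_failures = max_failures

    def acquire(self):
        """Return the next live proxy and move it to the back of the line."""
        if not self._queue:
            raise LookupError("no live proxies left")
        proxy = self._queue.popleft()
        self._queue.append(proxy)
        return proxy

    def report(self, proxy, ok):
        """Record the outcome of a request made through `proxy`."""
        if ok:
            self._failures[proxy] = 0  # a success resets the streak
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self.max_failures and proxy in self._queue:
            self._queue.remove(proxy)  # too many consecutive failures: retire it

    def __len__(self):
        return len(self._queue)
```

Ported to the Puppeteer loop, `acquire()` would choose the proxy before `puppeteer.launch`, and `report()` would run in both the success path and the `catch` block.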
Free Proxy Sources
| Name | URL | Features |
|---|---|---|
| Free Proxy List | https://free-proxy-list.net/ | Large HTTP/S list, updated |
| SSL Proxies | https://www.sslproxies.org/ | HTTPS proxies, fast update |
| ProxyScrape | https://proxyscrape.com/free-proxy-list | Multiple protocols |
| Spys.one | http://spys.one/en/ | Advanced filtering |
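These lists usually end up as plain IPv4:PORT lines, whether copied from the page or downloaded. That export format is an assumption, so check it for each site. Since such lists often contain blanks, junk, and duplicates, a defensive parser is worth having. The sketch below uses a hypothetical `parse_proxy_list` helper:

```python
import ipaddress


def parse_proxy_list(text):
    """Turn 'IP:PORT' lines into unique (ip, port) tuples, skipping anything malformed."""
    proxies = []
    seen = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        host, sep, port_text = line.rpartition(":")
        if not sep or not port_text.isdigit():
            continue  # no port, or a non-numeric one
        try:
            ip = str(ipaddress.IPv4Address(host))
        except ValueError:
            continue  # not a valid IPv4 address
        port = int(port_text)
        if not 0 < port < 65536 or (ip, port) in seen:
            continue  # out-of-range port or duplicate entry
        seen.add((ip, port))
        proxies.append((ip, port))
    return proxies
```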
Best Practices and Limitations
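- Ephemeral Nature: Free proxies often vanish without notice; monitor their liveness with a proxy checker and re-test each one shortly before use.
- Speed & Reliability: Expect latency, timeouts, and the occasional dead end, so always set explicit timeouts.
- Security: Never use free proxies for sensitive accounts. The operator can read and alter any plain-HTTP traffic and can see which hosts you visit even over HTTPS, so man-in-the-middle attacks are a real risk.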
- Legal & Ethical Considerations: Always respect robots.txt and terms of service.
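For the robots.txt part, Python's standard library ships `urllib.robotparser`. The sketch below parses the rules from a string so that it runs offline; against a live site you would call `set_url(...)` and `read()` on the parser instead. `is_allowed` is a hypothetical helper, and honouring robots.txt does not replace reading the site's terms of service.

```python
from urllib import robotparser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as text."""
    parser = robotparser.RobotFileParser()
    parser.modified()  # record a fetch time; can_fetch() denies everything until one is set
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```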
Proxy Validation Example (Python)
Before invoking your browser, test the proxy’s pulse:
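```python
import requests

proxy = "186.121.235.66:8080"  # HOST:PORT of the proxy to test
# An HTTP proxy serves both schemes; HTTPS is tunnelled through it with CONNECT
proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}

try:
    response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=5)
    response.raise_for_status()
    print(response.json())  # "origin" should show the proxy's IP, not yours
except requests.RequestException as e:
    print(f"Proxy failed: {e}")
```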
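Checking dozens of free proxies one at a time is slow, because each dead proxy burns the full timeout. The sketch below checks them in parallel using only the standard library, with urllib standing in for requests so the example has no dependencies. `check_proxy` and `live_proxies` are hypothetical names, and only `check_proxy` touches the network.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def check_proxy(proxy, url="https://httpbin.org/ip", timeout=5.0):
    """Return True if `url` loads through the HTTP proxy `proxy` (HOST:PORT)."""
    handler = urllib.request.ProxyHandler({"http": f"http://{proxy}", "https": f"http://{proxy}"})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException, ValueError):
        # URLError, timeouts, refused connections, and garbled replies all mean "dead"
        return False


def live_proxies(proxies, check=check_proxy, workers=20):
    """Check many proxies concurrently; return the ones that pass, in input order."""
    proxies = list(proxies)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, proxies))
    return [proxy for proxy, ok in zip(proxies, results) if ok]
```

Threads suit this job because the work is almost entirely waiting on the network. The `check` parameter lets you substitute a stricter test, such as one that also compares the reported origin.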