Using a proxy server with Python requests can be very useful for a variety of reasons. Proxies allow you to route your web traffic through an intermediary server, which can help you access blocked or restricted resources. Proxies can also provide privacy and anonymity when making requests.
In this comprehensive guide, I’ll explain what proxies are, the different types of proxies, and how to use them with the Python requests module. I’ll provide code examples for setting up both HTTP and SOCKS proxies in Python. I’ll also cover how to handle proxy authentication and some best practices when working with proxies in Python.
What is a Proxy Server?
A proxy server acts as an intermediary between your computer and the wider internet. When you connect through a proxy server, the proxy handles sending requests and receiving responses on your behalf.
Your computer connects to the proxy server, and the proxy server connects to the target server you want to communicate with. All your network traffic gets routed through the proxy instead of communicating directly with servers.
There are several advantages to using a proxy:
- Access to restricted resources – Proxies can allow you to bypass firewalls or geographic restrictions and access blocked content. For example, using a proxy located in another country can help you view geographically restricted content.
- Anonymity – Using a proxy hides your origin IP address, providing more anonymity. Public proxies often mask who the actual requester is.
- Caching – Proxy servers can cache content like web pages and images, providing faster access if the content has been requested before.
- Security – Proxies can add a layer of security by hiding your IP address from the servers you contact, which makes tracking and targeted attacks more difficult. A proxy does not encrypt traffic by itself; use HTTPS for that.
- Load balancing – Organizations can use proxies to distribute requests across multiple servers. This provides scalability and resilience.
There are different types of proxy servers, which I’ll cover next.
Types of Proxy Servers
There are a few common types of proxy servers:
HTTP Proxy
An HTTP proxy handles HTTP and HTTPS requests specifically. It sits between your computer and web servers. There is no single standard port, but 8080 and 3128 are common choices. All your web traffic gets forwarded through the proxy.
HTTP proxies are easy to set up and use with most internet activities like web browsing and API access. However, they won’t work for non-HTTP/HTTPS traffic.
SOCKS Proxy
A SOCKS proxy (SOCKetS proxy) works at the TCP level rather than the HTTP layer. This means a SOCKS proxy can handle nearly any type of internet traffic including HTTP/HTTPS, email, FTP, etc.
SOCKS proxies conventionally listen on port 1080. SOCKS5 also supports UDP traffic, whereas HTTP proxies typically only handle TCP.
Transparent Proxy
A transparent proxy intercepts traffic at the network level without any special client configuration required. The client is unaware traffic is being proxied. This approach is commonly used by enterprises and ISPs for caching content or policy enforcement.
Reverse Proxy
A reverse proxy sits in front of a web server and handles requests before passing them to the real servers for processing. This can provide benefits like load balancing, increased security, and caching without requiring changes to the web server configuration.
For the purposes of making requests from Python code, we are generally interested in using forward HTTP or SOCKS proxies that we explicitly configure in our code.
Now that we’ve covered the proxy basics, let’s look at how to use proxies with Python requests.
Using a Proxy with Python Requests
The Python requests module provides easy ways to configure both HTTP and SOCKS proxies.
To use a proxy with requests, you set the proxy properties in a dictionary, then pass this dictionary as the proxies parameter when making requests:
import requests

# Map each URL scheme to the proxy that should handle it
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080'
}
requests.get('http://example.org', proxies=proxies)
You can also embed authentication credentials in the proxy URL. Timeouts and other request options are not part of the proxies dictionary; they are passed as separate arguments, such as timeout.
Let’s look at HTTP and SOCKS proxies in more detail.
HTTP Proxy
To use an HTTP proxy with requests, configure the http and/or https keys in the proxies dictionary:
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080'
}
requests.get('https://example.com', proxies=proxies)
This will route all HTTP and HTTPS traffic through the proxy server at 10.10.1.10 on the specified ports.
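When many requests share the same proxy, it may be more convenient to set it once on a Session than to pass proxies on every call. The following is a minimal sketch, not a definitive setup: make_proxied_session is a hypothetical helper name, and the addresses are the same placeholder values used above.

```python
import requests

def make_proxied_session(http_proxy, https_proxy):
    """Return a Session whose requests use the given proxies by default."""
    session = requests.Session()
    # Session.proxies is a plain dict; entries set here apply to requests made
    # with this session unless a call passes its own proxies argument.
    session.proxies.update({'http': http_proxy, 'https': https_proxy})
    return session

# Placeholder addresses, as in the examples above
session = make_proxied_session('http://10.10.1.10:3128', 'http://10.10.1.10:1080')
```

A call such as session.get('https://example.com') would then go through the proxy. The requests documentation warns that proxies taken from environment variables can override session.proxies, so passing proxies per request remains the more predictable option.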
Many public HTTP proxies do not require authentication. But if your proxy does require a username and password, you can specify them like this:
proxies = {
    'http': 'http://user:password@10.10.1.10:3128/',
}
Some popular sources of public HTTP proxies include:
- hidester.com
- spys.one
- proxyscrape.com
There are also many public proxy lists online you can use for testing.
SOCKS Proxy
To use a SOCKS proxy with requests, first install the optional SOCKS support (pip install requests[socks]), then configure the http and https keys to point to a SOCKS URL:
proxies = {
    'http': 'socks5://user:password@10.10.1.10:1080',
    'https': 'socks5://user:password@10.10.1.10:1080'
}
The format is socks5://user:password@host:port. Requests will then make all HTTP and HTTPS requests through the SOCKS proxy.
Two common SOCKS protocols are:
- SOCKS5 – The newer SOCKS5 supports UDP, authentication, and IPv6.
- SOCKS4 – SOCKS4 is older but supports most TCP traffic.
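Requests does not switch between these protocols automatically; it uses whichever one the URL scheme names: socks4://, socks4a://, socks5://, or socks5h://. With socks5, hostnames are resolved on your machine, while socks5h asks the proxy to resolve them.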
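Because the scheme decides where DNS resolution happens, it can help to build the SOCKS URL in one place. This is a small sketch under stated assumptions: socks_proxies and its remote_dns parameter are hypothetical names, the address is the placeholder used above, and actually sending a request with the result assumes SOCKS support is installed.

```python
def socks_proxies(host, port, remote_dns=True):
    """Build a proxies dict that sends HTTP and HTTPS through one SOCKS5 proxy.

    remote_dns=True selects the socks5h scheme, so hostnames are resolved by
    the proxy; remote_dns=False selects socks5, which resolves them locally.
    """
    scheme = 'socks5h' if remote_dns else 'socks5'
    url = f'{scheme}://{host}:{port}'
    return {'http': url, 'https': url}

# Placeholder address; resolves hostnames on the proxy side
proxies = socks_proxies('10.10.1.10', 1080)
```

Resolving names on the proxy side is often what you want when the goal is anonymity, since local lookups can reveal which hosts you are contacting.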
Some popular sources of public SOCKS5 proxies include:
- hidester.com
- proxy-sale.com
- spys.one
Proxy Authentication
If your proxy requires authentication, you can pass the username and password in the URL, as shown in the examples above:
proxies = {
    'http': 'http://user:password@10.10.1.10:3128/',
    'https': 'socks5://user:password@10.10.1.10:1080'
}
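Credentials that contain characters such as @, : or / need to be percent-encoded before they go into the URL, otherwise they are read as URL delimiters. A minimal sketch of one way to do that follows; proxy_url is a hypothetical helper, and the password is an invented example value.

```python
from urllib.parse import quote

def proxy_url(scheme, user, password, host, port):
    """Build scheme://user:password@host:port with percent-encoded credentials."""
    # safe='' makes quote() encode '/', ':' and '@' as well, since these would
    # otherwise be parsed as part of the URL structure.
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"

# 'p@ss:word' is an invented password containing reserved characters
proxies = {
    'http': proxy_url('http', 'user', 'p@ss:word', '10.10.1.10', 3128),
    'https': proxy_url('http', 'user', 'p@ss:word', '10.10.1.10', 3128),
}
```

The same helper works for SOCKS URLs by passing socks5 or socks5h as the scheme.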
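Alternatively, you can send the credentials yourself in a Proxy-Authorization header. The header value is the word Basic followed by the Base64-encoded user:password pair. This only reaches the proxy for plain http:// URLs; for https:// URLs, the headers you pass travel inside the encrypted tunnel to the target server, so credentials in the proxy URL are the better choice there:
import base64
import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080'
}
auth = 'user:password'
# Build the Basic credentials: "Basic " + base64("user:password")
basic_auth = 'Basic ' + base64.b64encode(auth.encode()).decode()
r = requests.get('http://example.com',
                 proxies=proxies,
                 headers={'Proxy-Authorization': basic_auth})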
This can be useful if you need to dynamically change the credentials in your code.
Proxy Chaining
Proxy chaining refers to using multiple proxies at once by connecting through a sequence of different proxies.
This provides an additional layer of anonymity and makes your traffic harder to track. It also allows you to combine the benefits of different proxies you have access to.
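Requests has no built-in support for chaining: the proxies dictionary maps each scheme to a single proxy URL, so a list of proxies cannot be given for one key. A chain has to be set up outside requests, for example by configuring the first proxy to forward to the next one. Your code then points only at the first hop:
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:3128',
}
The first proxy then forwards traffic through the remaining hops according to its own configuration.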
Best Practices for Proxies
Here are some best practices to follow when using proxies with Python requests:
- Test your proxies first to verify they work and provide the anonymity you expect. Many public proxies are already detected or blocked.
- Avoid reusing the same proxies. Mix up the ones you use to prevent tracking.
- Handle proxy errors correctly instead of failing open. Proxies often fail and your code should handle it gracefully.
- Use proxy rotation to constantly cycle through different proxies automatically. Don’t stick with the same proxies.
- Use proxy authentication when possible. Open, unauthenticated proxies are more likely to be abused and blocked.
- Limit requests through a single proxy to avoid getting detected and blocked.
- Take care when passing sensitive information like credentials through proxies. Use HTTPS if possible.
- Set reasonable timeouts on your proxied requests in case of poor network conditions.
By following these best practices you can effectively leverage proxies for privacy, access to blocked resources, and resiliency.
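The rotation and timeout advice above can be sketched as follows. This is an illustrative outline rather than a complete solution: ProxyRotator and TIMEOUT are hypothetical names, the timeout values are assumptions, and the second proxy address is an invented placeholder.

```python
import itertools

class ProxyRotator:
    """Hand out proxies dicts in round-robin order from a fixed list of proxy URLs."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError('at least one proxy URL is required')
        # cycle() repeats the list endlessly, so every call gets the next proxy
        self._cycle = itertools.cycle(proxy_urls)

    def next_proxies(self):
        url = next(self._cycle)
        return {'http': url, 'https': url}

# (connect timeout, read timeout) in seconds; assumed values
TIMEOUT = (5, 15)

# Placeholder addresses
rotator = ProxyRotator(['http://10.10.1.10:3128', 'http://10.10.1.11:3128'])
```

Each request would then be made as requests.get(url, proxies=rotator.next_proxies(), timeout=TIMEOUT), so consecutive requests leave through different proxies and none of them can hang indefinitely.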
Proxy Use Cases
Some common use cases for using proxies with Python requests include:
- Accessing API endpoints and web services that block certain countries or regions.
- Scraping websites that would otherwise blacklist your origin IP address after too many requests.
- Crawling through pages anonymously without tracking user activity.
- Checking your website or application from different proxy locations to test geo-blocking or location-specific content.
- Gathering data from sources that rate limit by IP address by spreading requests across multiple proxies.
- Hiding intrusion attempts or reconnaissance by attackers by masking the originating location.
- Avoiding regional internet censorship and access restrictions.
- Improving performance by caching commonly accessed content through proxy servers.
Conclusion
Being able to route traffic through proxies enables many useful applications with Python’s requests module. Configuring HTTP and SOCKS proxies is straightforward by passing the proxies dictionary to requests.
Key points to remember:
- Use the proxies dictionary to configure HTTP and SOCKS proxies.
- HTTP proxies handle only HTTP/HTTPS requests while SOCKS proxies support nearly any traffic.
- Chain or rotate multiple proxies for additional anonymity and failover.
- Handle proxy authentication, failures, and usage best practices for robust code.
- Test proxies thoroughly and use them ethically for authorized purposes only.
By leveraging proxies you can open up many possibilities like accessing blocked APIs, distributing requests across IPs, and anonymizing your traffic. Proxies provide a very useful tool but should be utilized carefully and legally.
Frequently Asked Questions
How do I find open proxies to use?
There are many proxy aggregators and lists online you can find with search engines. Some popular sites include free-proxy-list.net, spys.one, and proxy-list.download. Remember to actually test proxies before using to verify they work, provide anonymity, and suit your particular needs.
Are proxies legal to use?
Proxies are generally legal to use in most regions. However, how you utilize them and what activities you perform through proxies could be illegal. Always obey laws and use proxies in an authorized, ethical manner.
Is it safe to send sensitive data through proxy servers?
In general, it's best not to pass highly sensitive information such as credentials, financial data, or personal details through proxy servers unless they are trusted and you can verify the encryption level used. Many public proxies are run by unknown parties, so the confidentiality and integrity they provide is questionable. Use proxies to anonymize browsing, but take precautions around truly sensitive data.
How can I test response times through different proxies?
You can write a Python script that iterates through different proxies, makes a request through each, and records the latency. This can help you find the fastest proxies to use out of a pool of options. Latency can guide you to picking proxies physically closer to the destination or with better network connectivity.
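Such a script might look roughly like the sketch below. The function names measure_latency and fastest are hypothetical, the default timeout is an assumption, and the get parameter exists only so the HTTP call can be swapped out, for instance in tests.

```python
import time
import requests

def measure_latency(url, proxy_urls, timeout=10, get=requests.get):
    """Return {proxy_url: seconds or None}; None marks a request that failed."""
    results = {}
    for proxy_url in proxy_urls:
        proxies = {'http': proxy_url, 'https': proxy_url}
        start = time.perf_counter()
        try:
            get(url, proxies=proxies, timeout=timeout)
        except requests.exceptions.RequestException:
            # Covers proxy errors, connection errors, and timeouts
            results[proxy_url] = None
        else:
            results[proxy_url] = time.perf_counter() - start
    return results

def fastest(results):
    """Return the proxy URL with the lowest latency, or None if all failed."""
    ok = {proxy: secs for proxy, secs in results.items() if secs is not None}
    return min(ok, key=ok.get) if ok else None
```

A single measurement per proxy is noisy, so in practice it is probably worth repeating the run a few times before discarding a proxy as slow.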
How do I handle proxy errors in Python requests?
Wrap your proxied requests in try/except blocks and handle exceptions such as ConnectTimeout, ProxyError, and ConnectionError from requests.exceptions. You can also implement a proxy “pool” with logic that rotates to the next available proxy when errors occur. This provides failover and avoids your code crashing due to flaky proxies.
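A minimal sketch of that failover logic is shown below. get_with_failover and RETRYABLE are hypothetical names, the default timeout is an assumption, and the get parameter is there only so the HTTP call can be replaced when testing.

```python
import requests

# Errors that suggest the proxy, not the request itself, is the problem.
# ProxyError and ConnectTimeout are subclasses of ConnectionError in requests;
# they are listed separately to mirror the text above.
RETRYABLE = (
    requests.exceptions.ProxyError,
    requests.exceptions.ConnectTimeout,
    requests.exceptions.ConnectionError,
)

def get_with_failover(url, proxy_urls, timeout=10, get=requests.get):
    """Try each proxy in turn and return the first successful response."""
    last_error = None
    for proxy_url in proxy_urls:
        proxies = {'http': proxy_url, 'https': proxy_url}
        try:
            return get(url, proxies=proxies, timeout=timeout)
        except RETRYABLE as exc:
            last_error = exc  # remember the failure and move on to the next proxy
    raise RuntimeError(f'all {len(proxy_urls)} proxies failed') from last_error
```

Read timeouts are deliberately not retried here, since the target server may already have processed the request; whether to retry those depends on how safe it is to repeat the request.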
How do I route only some requests through a proxy?
You can conditionally set the proxies parameter to control which requests use a proxy vs connect directly. For example:
if use_proxy:
    proxies = {...}
else:
    # No explicit proxy; proxies set through environment variables may still apply
    proxies = None
requests.get(url, proxies=proxies)
This allows you to proxy only certain requests that need anonymity or access restrictions bypassed.
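One way to make that decision per request is to base it on the target host. The sketch below assumes a hypothetical set of hosts that should be proxied; PROXIES, PROXIED_HOSTS, and proxies_for are illustrative names, not part of requests.

```python
from urllib.parse import urlparse

# Placeholder proxy addresses, as in the earlier examples
PROXIES = {'http': 'http://10.10.1.10:3128', 'https': 'http://10.10.1.10:1080'}

# Hypothetical list of hosts whose traffic should go through the proxy
PROXIED_HOSTS = {'example.com'}

def proxies_for(url):
    """Return PROXIES for hosts in PROXIED_HOSTS, otherwise None."""
    host = urlparse(url).hostname
    return PROXIES if host in PROXIED_HOSTS else None
```

A request would then be written as requests.get(url, proxies=proxies_for(url)), which keeps the routing rule in one place.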