How to follow redirects using cURL?
Learn to manage HTTP redirects with cURL, including handling credentials, maintaining POST methods, and navigating HTML or JavaScript redirects.
- What is cURL?
- Permanent and temporary
- Tell curl to follow redirects
- GET or POST?
- Decide what method to use in redirects
- Redirecting to other hostnames
- Non-HTTP redirects
- HTML redirects
- JavaScript redirects
- Conclusion
A “redirect” is a basic part of HTTP, documented in the first HTTP specification (RFC 1945), published in 1996. It works just the way it sounds: instead of returning what was asked for, the server tells the client to go somewhere else to find it.
Not all redirects are the same. They can differ in how long they last and what method your computer should use to ask again.
Every redirect response must also include a Location: header. It tells the client the new address to request, which can be an absolute URL or a relative one that the client resolves against the URL it originally asked for.
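To make the absolute-versus-relative distinction concrete, here is a minimal Python sketch of how a client can turn a Location: value into the next URL to request. The helper name resolve_location and the example.com URLs are made up for illustration. This shows the general resolution rule, not curl's own code.

```python
from urllib.parse import urljoin


def resolve_location(request_url: str, location: str) -> str:
    """Return the absolute URL a client should request next."""
    # urljoin leaves an absolute Location untouched and resolves a
    # relative one against the URL that produced the redirect.
    return urljoin(request_url, location)


base = "https://example.com/docs/old-page"
print(resolve_location(base, "https://example.org/new"))  # already absolute
print(resolve_location(base, "/moved/here"))              # relative to the site root
print(resolve_location(base, "new-page"))                 # relative to /docs/
```

The three calls print https://example.org/new, https://example.com/moved/here, and https://example.com/docs/new-page.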
What is cURL?
Imagine having a toolbox that helps you move data back and forth from the web effortlessly. That’s cURL for you - a free, open-source tool that speaks the language of the internet, whether it’s HTTP for web browsing, FTP for file transfers, or SMTP for sending emails. It’s like a digital Swiss Army knife, perfect for trying out APIs, downloading content, and automating web tasks. For those interested in leveraging cURL for search engine data retrieval, the Google SERP API provides a practical application of these principles.
Permanent and temporary
Redirects can be permanent or temporary, guiding how users or their browsers move from one resource to another. If you want a redirect to last forever, telling users to go from resource A to B with a GET request, you use a 301 code. This tells the browser to remember this change and use the new address for any future requests to the original URL.
For a temporary move, use a 302 code. This means the server wants the client to visit resource B just for now, without remembering this route for future visits to the original URL.
Both the 301 and 302 codes cause browsers to use GET for the next request, even if the original was a POST. This switch exists for historical reasons, but it’s how things still work today, affecting most online behavior.
The 303 code acts like a 302 but is specifically for situations where the server answers the request indirectly: the client is expected to fetch the result from the new URL with a GET.
Of these three, 301 and 302 date back to the HTTP/1.0 specification; 303 arrived with HTTP/1.1.
However, tools like cURL keep no memory of earlier redirects, so they treat permanent and temporary ones the same. This understanding of web navigation can be enhanced by exploring how web crawlers manage redirects, a topic elaborated upon through the Google Crawl API.
Tell curl to follow redirects
In keeping with its approach of sticking to the basics, cURL doesn’t automatically follow HTTP redirects unless you instruct it to. By using the -L or --location option, you can enable cURL to follow redirects. Once this feature is activated, cURL will follow up to 30 redirects as a standard setting. This limit is mainly there to prevent it from getting stuck in infinite loops. If you find that 30 redirects aren’t enough for your needs, you can adjust this limit by using the --max-redirs option to specify a different maximum number of redirects cURL should follow.
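If you drive cURL from a script, the two options above can be assembled as in the following sketch. It only builds and prints the command line and does not contact any server. The function name follow_redirects_command and the example URL are assumptions for illustration; --location and --max-redirs are the options described above.

```python
import shlex
from typing import List, Optional


def follow_redirects_command(url: str, max_redirs: Optional[int] = None) -> List[str]:
    """Build a curl argument list that follows HTTP redirects."""
    args = ["curl", "--location"]  # --location is the long form of -L
    if max_redirs is not None:
        # Replace curl's default cap on the number of redirects followed.
        args += ["--max-redirs", str(max_redirs)]
    args.append(url)
    return args


command = follow_redirects_command("https://example.com/old", max_redirs=5)
print(shlex.join(command))
```

Running it prints curl --location --max-redirs 5 https://example.com/old, the same command you would type in a terminal. The list form can be passed to subprocess.run when you want to execute it.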
GET or POST?
The response codes 301, 302 and 303 lead to the client making a GET request for the new URL, even if the original request was a POST. This is a key point, especially when you’re working with actions that don’t use GET.
If a server wants to redirect the client to a new URL but needs the client to keep the same method (like POST) in the follow-up request, it uses different codes. For example, to tell the client that the URL it POSTed to has permanently moved to a new location (let’s call it B) and that it should POST there from now on, the server uses the 308 response code. However, since the 308 code was only defined in 2014, older clients might not recognize it. In that case, the only other option is the older 307 response code, which tells the client to POST again to the new location, but only temporarily: for future requests, the client goes back to POSTing to the original location (A). The 307 code came with HTTP/1.1.
And just for clarity, redirects function the same way in HTTP/2 as they did in HTTP/1.1.
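The method rules described in this section can be summed up in a few lines of Python. This is a sketch of typical default client behavior, not curl's source code, and the function name method_after_redirect is made up for illustration. It assumes that 301 and 302 only rewrite POST, and that 303 turns every method except HEAD into GET.

```python
def method_after_redirect(status: int, method: str) -> str:
    """Return the HTTP method a typical client uses for the follow-up request."""
    method = method.upper()
    if status == 303 and method != "HEAD":
        return "GET"  # 303 asks the client to fetch the result with GET
    if status in (301, 302) and method == "POST":
        return "GET"  # historical behavior: POST becomes GET
    return method     # 307 and 308 (and everything else) keep the method


for status in (301, 302, 303, 307, 308):
    print(status, "POST ->", method_after_redirect(status, "POST"))
```

The loop shows POST turning into GET for 301, 302, and 303, and staying POST for 307 and 308.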
Decide what method to use in redirects
Some web services receive a POST at the original URL and respond with a 301, 302, or 303 redirect code but still expect the next request to be a POST. However, browsers and cURL, by default, won’t do this.
Given that these situations are not uncommon, cURL provides options to handle them differently.
You can prevent cURL from switching a POST to GET after receiving a 30x response. Use the --post301, --post302, and --post303 options to keep the POST method after the corresponding redirect. If you’re working with a libcurl-based application, you can achieve the same effect with the CURLOPT_POSTREDIR option.
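As an illustration, the sketch below assembles a cURL command that sends a POST, follows redirects, and keeps the POST method for chosen status codes. Nothing is sent over the network. The helper name post_command, the form data, and the URL are placeholders; the --data option (which makes cURL send a POST body) is not mentioned above but is a standard cURL option.

```python
import shlex
from typing import Iterable, List

# One curl flag per redirect code that would otherwise turn POST into GET.
KEEP_POST_FLAGS = {301: "--post301", 302: "--post302", 303: "--post303"}


def post_command(url: str, data: str, keep_post_for: Iterable[int] = ()) -> List[str]:
    """Build a curl POST command that keeps POST after the given redirect codes."""
    args = ["curl", "--location", "--data", data]  # --data sends a POST body
    for status in sorted(set(keep_post_for)):
        args.append(KEEP_POST_FLAGS[status])  # KeyError for unsupported codes
    args.append(url)
    return args


command = post_command("https://example.com/submit", "name=value", keep_post_for=[301, 302])
print(shlex.join(command))
```

This prints curl --location --data name=value --post301 --post302 https://example.com/submit.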
Redirecting to other hostnames
When using cURL, you might need to enter a username and password for a site. However, if there’s an HTTP redirect to a different host, cURL, by default, won’t send your credentials to this new host during the same transfer, for security reasons.
If you trust the new host and know it’s safe to send your credentials there, you can use the --location-trusted option with cURL. This tells cURL it’s okay to pass your login details to the new host, even if it’s different from the original one you were interacting with.
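The rule can be pictured with a simplified Python model. It compares hostnames only, which is an assumption made for this sketch; cURL's real check may take more than the hostname into account. The function name forwards_credentials and the URLs are invented for illustration.

```python
from urllib.parse import urlsplit


def forwards_credentials(original_url: str, redirect_url: str,
                         location_trusted: bool = False) -> bool:
    """Simplified model: are login details passed on to the redirect target?"""
    if location_trusted:
        return True  # corresponds to running curl with --location-trusted
    # Without the option, credentials stay with the host they were given for.
    return urlsplit(original_url).hostname == urlsplit(redirect_url).hostname


start = "https://example.com/login"
print(forwards_credentials(start, "https://example.com/home"))        # same host
print(forwards_credentials(start, "https://other.example.net/home"))  # different host
print(forwards_credentials(start, "https://other.example.net/home", location_trusted=True))
```

The three calls print True, False, and True.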
Non-HTTP redirects
Browsers have various methods for handling redirects, which can complicate things for cURL users because cURL doesn’t support or recognize some of these methods.
HTML redirects
Besides the methods already mentioned, websites can also redirect browsers using plain HTML, with a <meta> tag. This becomes tricky with cURL because cURL doesn’t parse HTML, so it doesn’t recognize or follow this type of redirect. A typical example:
<meta http-equiv="refresh" content="0; url=http://example.com/">
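Since cURL won’t act on such a tag, a script that wraps cURL has to find the target itself and then request it. The following sketch uses Python’s standard html.parser to pull the URL out of a downloaded page. The class and function names are made up for illustration, and it only handles the common "seconds; url=..." form of the content attribute.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaRefreshFinder(HTMLParser):
    """Remember the URL of the first <meta http-equiv="refresh"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/".
        _, _, rest = (attributes.get("content") or "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def find_meta_refresh(html: str) -> Optional[str]:
    """Return the redirect target of a meta refresh tag, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.target


page = '<html><head><meta http-equiv="refresh" content="0; url=http://example.com/"></head></html>'
print(find_meta_refresh(page))
```

For the sample page this prints http://example.com/, which the script could then hand to a second cURL call.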
JavaScript redirects
The modern web uses a lot of JavaScript, a programming language that browsers run when you visit websites. JavaScript can also tell the browser to go to a different site, essentially redirecting you. cURL doesn’t execute JavaScript, so it can’t follow these redirects either.
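There is no reliable way to follow a JavaScript redirect without running the script. A rough workaround when scripting around cURL is to scan the downloaded page for the most common literal patterns. The sketch below is only a heuristic: the function name guess_js_redirect and the regular expression are illustrative assumptions, and it misses any URL that is computed at run time.

```python
import re
from typing import Optional

# Recognizes simple literal redirects such as:
#   window.location = "..."        location.href = '...'
#   location.assign("...")         location.replace("...")
_JS_REDIRECT = re.compile(
    r"""(?:window\.|document\.)?location
        (?:\.href)?
        \s*
        (?:=\s*|\.(?:assign|replace)\s*\(\s*)
        (["'])(?P<url>.+?)\1""",
    re.VERBOSE,
)


def guess_js_redirect(html: str) -> Optional[str]:
    """Return the first redirect URL found by the heuristic, or None."""
    match = _JS_REDIRECT.search(html)
    return match.group("url") if match else None


page = '<script>window.location.href = "https://example.com/next";</script>'
print(guess_js_redirect(page))
```

For the sample page this prints https://example.com/next. Pages that build the address dynamically need a real browser or a JavaScript engine instead.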
Conclusion
Handling redirects is crucial for web navigation, and cURL offers tools to manage this process. Knowing the difference between permanent and temporary redirects, as well as those that keep the same method, helps you decide how to deal with online resources. cURL also provides a way to send your credentials to new hosts if you trust them. There are limitations, though, especially with redirects triggered by HTML or JavaScript, because cURL can’t process HTML or run JavaScript. Understanding these details helps you navigate the web more effectively with cURL.