cURL: follow locations with safe_mode enabled or open_basedir set

cURL is a tool for connecting to a remote server and loading data from it. It supports schemes such as HTTP, HTTPS, FTP, gopher, telnet, DICT, FILE, LDAP and more for the request URI.

PHP has built-in support for it, providing its users a layer on top of the underlying libcurl library. Here is an example of how cURL is used in PHP:

$ch = curl_init("http://example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);

The option CURLOPT_RETURNTRANSFER advises PHP to return the result from curl_exec() on success and FALSE on failure. If this option is omitted, curl_exec() prints the result directly and returns TRUE instead of the result.

At some point you need to follow a location. This is the case when the server you are connecting to replies with a location redirect. In an HTTP response the server replies with a 301 or 302 status code and the HTTP header Location pointing to the new URI. In the code, the option CURLOPT_FOLLOWLOCATION needs to be set to allow libcurl to follow the redirect.

$ch = curl_init("http://example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$data = curl_exec($ch);
curl_close($ch);
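
The redirect description above names 301 and 302, but HTTP defines further redirect status codes (303, 307 and 308) that also carry a Location header. The following is a minimal sketch of such a check; the function name is_redirect_status is a hypothetical helper for illustration and not part of PHP or libcurl.

```php
<?php
// Hypothetical helper (not part of PHP or libcurl): returns true if the
// given HTTP status code announces a redirect via a Location header.
function is_redirect_status($code) {
  // 301 Moved Permanently, 302 Found, 303 See Other,
  // 307 Temporary Redirect, 308 Permanent Redirect
  return in_array((int) $code, array(301, 302, 303, 307, 308), true);
}

var_dump(is_redirect_status(302)); // bool(true)
var_dump(is_redirect_status(200)); // bool(false)
```

A check like this could be applied to the value returned by curl_getinfo($ch, CURLINFO_HTTP_CODE) when redirects are followed by hand.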

However, if your web server has safe_mode activated or open_basedir set, CURLOPT_FOLLOWLOCATION won’t have any effect. The warning below will appear and libcurl won’t follow the new location.

Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION 
cannot be activated when in safe_mode or an open_basedir is set in ...
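
Whether this restriction applies can be checked at runtime before setting the option. The following is a rough sketch under the assumption that the two ini values are passed in as returned by ini_get(); the function name follow_location_restricted is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: decides from the raw ini values whether PHP would
// refuse CURLOPT_FOLLOWLOCATION. Pass in ini_get('open_basedir') and
// ini_get('safe_mode').
function follow_location_restricted($open_basedir, $safe_mode) {
  // open_basedir holds a path list, so any non-empty value means "set"
  $basedir_set = trim((string) $open_basedir) !== '';
  // safe_mode is a boolean ini value: '', '0' and 'off' mean disabled
  $safe_mode_on = filter_var($safe_mode, FILTER_VALIDATE_BOOLEAN);
  return $basedir_set || $safe_mode_on;
}

$restricted = follow_location_restricted(
  ini_get('open_basedir'),
  ini_get('safe_mode')
);
```

The two settings are treated differently on purpose: open_basedir is a list of paths rather than a boolean, so a boolean filter would not recognise a value such as /var/www as "set".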

A blog post by Olaf shows a workaround that changes the server configuration and puts the loader script into a directory defined in open_basedir. Unfortunately, many websites are hosted on a shared host, so most people can’t just alter something in a configuration file and need a user-space solution instead.

One solution is to follow redirects manually by examining the server response and sending the request again to the new location. The next sample code does exactly this. The function curl_exec_follow is passed two arguments: the cURL handle and the maximum number of allowed redirects. If a server response contains a redirect location, the script also checks whether it’s a URL or a relative path to the resource.

function curl_exec_follow($ch, &$maxredirect = null) {
  
  // we emulate a browser here since some websites detect
  // us as a bot and don't let us do our job
  $user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)".
                " Gecko/20041107 Firefox/1.0";
  curl_setopt($ch, CURLOPT_USERAGENT, $user_agent );

  $mr = $maxredirect === null ? 5 : intval($maxredirect);

  // open_basedir holds a path list, so it counts as set when non-empty;
  // safe_mode is a boolean ini value ('', '0' or 'off' mean disabled)
  if (ini_get('open_basedir') == ''
      && filter_var(ini_get('safe_mode'), FILTER_VALIDATE_BOOLEAN) === false
  ) {

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $mr > 0);
    curl_setopt($ch, CURLOPT_MAXREDIRS, $mr);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

  } else {
    
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);

    if ($mr > 0)
    {
      $original_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
      $newurl = $original_url;
      
      $rch = curl_copy_handle($ch);
      
      curl_setopt($rch, CURLOPT_HEADER, true);
      curl_setopt($rch, CURLOPT_NOBODY, true);
      curl_setopt($rch, CURLOPT_FORBID_REUSE, false);
      do
      {
        curl_setopt($rch, CURLOPT_URL, $newurl);
        $header = curl_exec($rch);
        if (curl_errno($rch)) {
          $code = 0;
        } else {
          $code = curl_getinfo($rch, CURLINFO_HTTP_CODE);
          if ($code == 301 || $code == 302) {
            preg_match('/Location:(.*?)\n/i', $header, $matches);
            $newurl = trim(array_pop($matches));
            
            // if no scheme is present then the new url is a
            // relative path and thus needs some extra care
            if(!preg_match("/^https?:/i", $newurl)){
              $newurl = $original_url . $newurl;
            }   
          } else {
            $code = 0;
          }
        }
      } while ($code && --$mr);
      
      curl_close($rch);
      
      if (!$mr)
      {
        if ($maxredirect === null)
        trigger_error('Too many redirects.', E_USER_WARNING);
        else
        $maxredirect = 0;
        
        return false;
      }
      curl_setopt($ch, CURLOPT_URL, $newurl);
    }
  }
  return curl_exec($ch);
}
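
The function above appends the Location value to the original URL, which works as long as the original URL has no path of its own. The following is a rough sketch of a more careful resolution step. Both function names (extract_location and resolve_redirect_url) are hypothetical helpers, not part of PHP or libcurl, and the sketch assumes that $base is an absolute URL with a scheme and a host.

```php
<?php
// Hypothetical helpers, not part of PHP or libcurl.

// Extracts the Location header value from a raw header block, or returns
// null if there is none. Header names are matched case-insensitively.
function extract_location($header) {
  if (preg_match('/^Location:[ \t]*(.*?)[ \t\r]*$/mi', $header, $matches)) {
    return $matches[1];
  }
  return null;
}

// Turns a redirect target into an absolute URL, using $base (the URL that
// was requested) for the parts the target leaves out.
// Simplified: no "../" normalisation and no query-only targets.
function resolve_redirect_url($base, $location) {
  // already absolute, e.g. "https://example.org/x"
  if (preg_match('/^[a-z][a-z0-9+.-]*:\/\//i', $location)) {
    return $location;
  }
  $p = parse_url($base);
  $root = $p['scheme'] . '://' . $p['host']
        . (isset($p['port']) ? ':' . $p['port'] : '');
  // scheme-relative, e.g. "//cdn.example.org/x"
  if (substr($location, 0, 2) === '//') {
    return $p['scheme'] . ':' . $location;
  }
  // absolute path, e.g. "/x"
  if (substr($location, 0, 1) === '/') {
    return $root . $location;
  }
  // relative path: replace the last segment of the base path
  $path = isset($p['path']) ? $p['path'] : '/';
  $dir = substr($path, 0, strrpos($path, '/') + 1);
  return $root . $dir . $location;
}

$raw = "HTTP/1.1 302 Found\r\nlocation: /login\r\n\r\n";
echo resolve_redirect_url('http://example.com/app/start', extract_location($raw));
// prints http://example.com/login
```

Inside the redirect loop, the base would be the URL of the request that produced the redirect rather than the very first URL, so that chained relative redirects resolve against the right location.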

This function is used in place of curl_exec() and, unlike Olaf’s workaround in the blog post linked above, requires no extra user privileges. Here’s how curl_exec_follow() is used:

$ch = curl_init("http://example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec_follow($ch);
curl_close($ch);

Hopefully this helped you.

40 Comments

  1. atharvan

    Syntax error in
    “curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $mr &gt; 0);”
    “if ($mr &gt; 0)”

  2. slopjong

    Thanks. It looks like something went wrong when I published the code. ‘>’ turned into ‘&gt;’.

  3. Doug

    Thank you! This helped a ton.

  4. Mike

    I can’t make this work on a server that has safe_mode on and register_globals off. I’m trying to write a cURL login script that uses preg_match to check whether the login attempt was successful. It seems that PHP and cURL are making things very hard. Eventually all the hosts will upgrade to PHP 5.3.19 and the latest libcurl 7.28.1.

  5. slopjong

    Please allow PHP to report error messages as follows

    error_reporting(E_ALL);

    You have to put that before the curl_exec_follow(…) call. If no error shows up, add some debugging messages and track down the place where things go wrong. Also be aware that the cURL options may need to be adapted in your case.

  6. I found it useful to mimic a browser by adding before the first IF
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0');

    By doing this, I could access a website that was otherwise seeing me as malware and was therefore denying me.

  7. slopjong

    Yes, I also have the impression, that emulating a browser makes life easier and avoids additional debugging. I’m adding that line. Thanks for your report.

  8. The code window on this page is too narrow for the whole CURLOPT_USERAGENT line to show…

  9. slopjong

    I’m aware of that. I didn’t split up the string because I copied while not testing it. I just wanted to avoid any stupid error being introduced. I’ve done it now but it’s still untested. It looks error-free but could you test it to be sure that everything’s fine?

  10. It seems to be working well, although in the original string, there is a space between “…rv:1.7.5)” and “Gecko…”
    I don’t know if that really matters. I tested it with Firefox.

  11. Pan

    I’m getting a 502 error, and a single request publishes the event many times, even if I set the second parameter of the curl_exec_follow function. I’m trying to use this with the Google Calendar API.

  12. slopjong

    Please try the modified version below and tell me if it works. I wrapped the curl_exec() call in an output buffer.

    function curl_exec_follow($ch, &$maxredirect = null) {
      
      // we emulate a browser here since some websites detect
      // us as a bot and don't let us do our job
      $user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)".
                    " Gecko/20041107 Firefox/1.0";
      curl_setopt($ch, CURLOPT_USERAGENT, $user_agent );
    
      $mr = $maxredirect === null ? 5 : intval($maxredirect);
    
      if (ini_get('open_basedir') == '' && ini_get('safe_mode') == 'Off') {
    
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $mr > 0);
        curl_setopt($ch, CURLOPT_MAXREDIRS, $mr);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    
      } else {
        
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    
        if ($mr > 0)
        {
          $original_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
          $newurl = $original_url;
          
          $rch = curl_copy_handle($ch);
          
          curl_setopt($rch, CURLOPT_HEADER, true);
          curl_setopt($rch, CURLOPT_NOBODY, true);
          curl_setopt($rch, CURLOPT_FORBID_REUSE, false);
          do
          {
            curl_setopt($rch, CURLOPT_URL, $newurl);
    
            ob_start();
            curl_exec($rch);
            $header = ob_get_contents();
            ob_end_flush();
    
            if (curl_errno($rch)) {
              $code = 0;
            } else {
              $code = curl_getinfo($rch, CURLINFO_HTTP_CODE);
              if ($code == 301 || $code == 302) {
                preg_match('/Location:(.*?)\n/', $header, $matches);
                $newurl = trim(array_pop($matches));
                
                // if no scheme is present then the new url is a
                // relative path and thus needs some extra care
                if(!preg_match("/^https?:/i", $newurl)){
                  $newurl = $original_url . $newurl;
                }   
              } else {
                $code = 0;
              }
            }
          } while ($code && --$mr);
          
          curl_close($rch);
          
          if (!$mr)
          {
            if ($maxredirect === null)
            trigger_error('Too many redirects.', E_USER_WARNING);
            else
            $maxredirect = 0;
            
            return false;
          }
          curl_setopt($ch, CURLOPT_URL, $newurl);
        }
      }
    
      ob_start();
      curl_exec($ch);
      $ret = ob_get_contents();
      ob_end_flush(); 
    
      return $ret;
    }
    
  13. slopjong

    Little update. I made a tiny mistake in the modified script. I’ve corrected it so use the snippet from the previous comment again.

  14. Pan

    Many thanks. Unfortunately I’m still getting 502. It’s because of the host. On another server your function works almost fine: the task is added to Google Calendar, but often many times in one request.

  15. slopjong

    Try to compare the configuration of both hosts you tried it on (with the phpinfo() function). If you want to get it running on that one server but aren’t able to do so, I can spend some spare time on it, but that will go beyond free support.

  16. Pan

    Thanks, but open_basedir is set. I can’t use ini_set to change this because it is a PHP_INI_SYSTEM value.

  17. slopjong

    The above script in fact is meant to be used on hosts where open_basedir is set which I am pretty sure has nothing to do with your issue. You would get this kind of message instead:

    Warning: curl_setopt() [function.curl-setopt]: CURLOPT_XXX 
    cannot be activated when in safe_mode or an open_basedir is set in ...
    

    502 means that a gateway or proxy has received an invalid response from the upstream server. This can have various reasons, e.g. a file that is too big being sent with the CURLOPT_POST option. I’d say your issue is more cURL-related than PHP-related.

  18. Hello,

    I encountered the same problem and I would like to use your solution. However, I have no idea where to put the code. Can you give me a hint?

  19. slopjong

    This highly depends on your CMS/theme or whatever other system you use.

    The function

    function curl_exec_follow($ch, &$maxredirect = null) { 
      ...
    }
    

    can be put anywhere in your php code, even after the actual usage (at least in the same source file).

    Then put the following four lines where you want to curl data from the internet:

    $ch = curl_init("http://example.com");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec_follow($ch);
    curl_close($ch);
    

    If you put the function in one source file and the four lines into another be sure you include the source file with the function in the source file where you call it. To include a source file use one of these php functions:

    • include
    • include_once
    • require
    • require_once
  20. Ok…
    I use Joomla. This is way beyond my knowledge. I guess I must try to persuade the web hoster to adjust the settings on his side…
    Thanks for your time!

  21. I’m trying to log in to a web page using cURL, but after searching for solutions I keep getting

    HTTP/1.1 200 OK 
    Date: Fri, 22 Mar 2013 18:00:24 GMT 
    Server: Microsoft-IIS/6.0 
    X-Powered-By: ASP.NET 
    X-AspNet-Version: 2.0.50727 
    X-dynaTrace: PT=720777;PA=680855120;RS=EBSCOhost/20130322122352_0.session;PS=-1981931378 
    dynaTrace: PT=720777;PA=680855120;RS=EBSCOhost/20130322122352_0.session;PS=-1981931378 
    Cache-Control: no-cache, no-store 
    Pragma: no-cache 
    Expires: -1 
    Content-Type: text/html; charset=utf-8
    Content-Length: 11495
    Object moved to here.
    

    Does anybody know why this keeps happening? I’m using
    curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata) to log in, so maybe that’s the problem.
    Thanks!

  22. Ale

    Looks like there is a small typo in the script, shouldn’t this:
    if (ini_get('open_basedir') == '' && ini_get('safe_mode' == 'Off')) {
    be:
    if (ini_get('open_basedir') == '' && ini_get('safe_mode') == 'Off') {

  23. slopjong

    Nice catch! You are absolutely right. I didn’t notice it all this time because my web server(s) didn’t complain about it; syntactically it’s correct.

    PHP should complain about ini_get(false) ;-)

  24. vikas

    Thanks a lot!!! It works well on my end…

  25. AmyFaiz

    Hello Slopjong

    I got a PHP script from the internet and tested it with http://adlz.tk/ and files hosted with 000webhost.

    But it shows error as
    Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION cannot be activated when safe_mode is enabled or an open_basedir is set in /home/xxxxxxx/public_html/jackdata/function.php on line 45

    I don’t know php programming
    My function.php file is at
    https://drive.google.com/file/d/0B1QK8PSN44W0eUR1Y3lZNkRaWWs/edit?usp=sharing

    Could you please tell how to solve this

  26. slopjong

    That’s a known problem with open_basedir and safe_mode. You can fix this by adding the function curl_exec_follow to your function.php and changing your get_data as follows.

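    function get_data($url)
    {
      // pass the URL to the handle; curl_init() without it has no target
      $ch = curl_init($url);
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
      $data = curl_exec_follow($ch);
      curl_close($ch);
      return $data;
    }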
    

    Please note that in the original get_data you have additional headers set. I guess that the header for your remote IP matters and needs to be set in curl_exec_follow.

    As a quick fix you can change

    function curl_exec_follow($ch, &$maxredirect = null) {
      $user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)".
                    " Gecko/20041107 Firefox/1.0";
      curl_setopt($ch, CURLOPT_USERAGENT, $user_agent );
      ...
    

    to the following code accepting an IP argument:

    function curl_exec_follow($ch, &$maxredirect = null, $ip='') {
      $user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)".
                    " Gecko/20041107 Firefox/1.0";
      curl_setopt($ch, CURLOPT_USERAGENT, $user_agent );
      if (!empty($ip)) {
        curl_setopt($ch, CURLOPT_HTTPHEADER, array("REMOTE_ADDR: $ip", "HTTP_X_FORWARDED_FOR: $ip"));
      }
      ...
    

    With this adaptation you need to use curl_exec_follow as follows. In order to use your global variable $ip you must make it global in your function scope.

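    function get_data($url)
    {
      // make the global $ip visible inside this function
      global $ip;
      $ch = curl_init($url);
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
      $data = curl_exec_follow($ch, null, $ip);
      curl_close($ch);
      return $data;
    }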
    

    Let me know if that worked for you.

  27. slopjong

    According to the Q&A “Custom IP in cURL”, you can’t spoof the IP.

  28. AmyFaiz

    Hello slopjong

    I rewrote get_data in function.php as below

    function get_data($url)
    {
      $ch = curl_init();
      $data = curl_follow_exec($ch);
      curl_close($ch);
      return $data;
    }
    function curl_exec_follow($ch, &$maxredirect = null) 
    {
      $user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)".
                    " Gecko/20041107 Firefox/1.0";
      curl_setopt($ch, CURLOPT_USERAGENT, $user_agent );
    }
    

    Then it gives error as
    Fatal error: Call to undefined function curl_follow_exec() in /home/xxxxxxx/public_html/jackdata/function.php on line 41

    please help thanking you in advance

  29. AmyFaiz

    Secondly with global $ip argument

    I rewrote my function.php as below

    function get_data($url)
    {
      global $ip;
      $ch = curl_init();
      $data = curl_follow_exec($ch, null, $ip);
      curl_close($ch);
      return $data;
    }
    function curl_exec_follow($ch, &$maxredirect = null, $ip='') {
      $user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)".
                    " Gecko/20041107 Firefox/1.0";
      curl_setopt($ch, CURLOPT_USERAGENT, $user_agent );
      if (!empty($ip)) {
        curl_setopt($ch, CURLOPT_HTTPHEADER, array("REMOTE_ADDR: $ip", "HTTP_X_FORWARDED_FOR: $ip"));
      }
    }
    

    Then also it gives error as
    Fatal error: Call to undefined function curl_follow_exec()

    Please help to resolve this problem thanking you in advance

  30. slopjong

    Put curl_exec_follow before get_data and you won’t get this error message.

    Please use the code tag for code blocks in your comments. I have to reformat your comments manually otherwise which consumes unpaid time, unless you pay me a virtual beer ;-)

  31. slopjong

    You must also copy the full curl_exec_follow from the article.

  32. Laurent

    Hello,

    the function works on my Ubuntu VM in development but not in production at my hosting provider (it returns “false”).

  33. Petrina

    I’ve added the curl_exec_follow function to my code and called it instead of curl_exec_follow, but I still get the same error message “CURLOPT_FOLLOWLOCATION cannot be activated when safe_mode…”. Any ideas?

  34. Petrina

    Sorry, I meant to say ‘instead of curl_exec’.

  35. James

    The preg_match should include the i modifier. HTTP headers are case-insensitive* and, at the least, twitter’s t.co shortening service returns “location: “.

    * RFC 2616: Each header field consists of a name followed by a colon (“:”) and the field value. Field names are case-insensitive.

  36. Thanks a lot! I am at work right now and this solved the problem with my current project! Finally it works!

  37. kindox

    The condition

    ini_get('safe_mode') == 'Off'
    

    would be something like:

    in_array(strtolower(ini_get('safe_mode')), array('off', '', '0'))
    

    because:

    A boolean ini value of off will be returned as an empty string or “0” while a boolean ini value of on will be returned as “1”. The function can also return the literal string of INI value.

    (from http://php.net/manual/en/function.ini-get.php)

  38. slopjong

    Things changed in PHP 5.3.

    Another solution is filter_var, which turns 'off', '' and '0' into false with the boolean validation flag.

    filter_var(ini_get('safe_mode'), FILTER_VALIDATE_BOOLEAN) === false
    
  39. slopjong

    I’ve updated the article. Thx @James & @kindox.

  40. nabi

    Your code has a bug in the open_basedir check with ini_get().
    This code is better: http://php.net/manual/en/function.curl-setopt.php#102121
