cURL: follow locations with safe_mode enabled or open_basedir set

cURL is a tool for connecting to a remote server and loading data from it. Many schemes are supported for the request URI, including HTTP, HTTPS, FTP, gopher, telnet, DICT, FILE and LDAP.

PHP supports cURL by providing a layer on top of the underlying libcurl library. Here is an example of how cURL is used in PHP:

$ch = curl_init("http://example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);

The option CURLOPT_RETURNTRANSFER tells PHP to return the response as a string from curl_exec() on success and FALSE on failure. If this option is omitted, curl_exec() prints the response directly and returns TRUE on success instead of the result.
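Because curl_exec() can return FALSE, it is worth checking the result with a strict comparison before using it. The following is a minimal sketch of such a check; the helper name fetch_url() is an assumption made for this example and is not part of the cURL extension.

```php
<?php
// fetch_url() is a hypothetical helper name used only for this sketch.
// It returns the response body as a string, or null if the transfer failed.
function fetch_url($url) {
  $ch = curl_init($url);
  // return the body from curl_exec() instead of printing it
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  $data = curl_exec($ch);

  // an empty body ("") is a valid result, so compare with === false
  if ($data === false) {
    error_log("cURL error " . curl_errno($ch) . ": " . curl_error($ch));
    return null;
  }
  // the handle is released when $ch goes out of scope
  return $data;
}
```

A loose check such as if (!$data) would also treat an empty but successful response as an error, which is why the sketch uses === false.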

At some point you will need to follow a redirect. This is the case when the server you are connecting to replies with a location redirect: in an HTTP response, the server sends a redirect status code such as 301 or 302 together with the HTTP header Location, which points to the new URI. In your code, the option CURLOPT_FOLLOWLOCATION needs to be set to allow libcurl to follow the redirect.

$ch = curl_init("http://example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$data = curl_exec($ch);
curl_close($ch);
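To make the redirect response more concrete, here is a small sketch that reads the status code and the Location header from a raw response header. The helper name parse_redirect() and the sample response are assumptions for illustration; libcurl does this work internally when CURLOPT_FOLLOWLOCATION is enabled.

```php
<?php
// parse_redirect() is a hypothetical helper for illustration only.
// Given the raw header text of one HTTP response, it returns the
// Location value if the status is 301 or 302, otherwise null.
function parse_redirect($rawHeader) {
  // the status line looks like "HTTP/1.1 301 Moved Permanently"
  if (!preg_match('#^HTTP/\S+\s+(\d{3})#', $rawHeader, $m)) {
    return null;
  }
  $status = (int) $m[1];
  if (!in_array($status, array(301, 302), true)) {
    return null;
  }
  // header names are case-insensitive, hence the "i" modifier;
  // "m" lets ^ and $ match at the start and end of each line
  if (!preg_match('/^Location:\s*(.*?)\s*$/mi', $rawHeader, $m)) {
    return null;
  }
  return $m[1];
}

// made-up sample response, not taken from a real server
$response = "HTTP/1.1 301 Moved Permanently\r\n"
          . "Location: http://www.example.com/\r\n"
          . "Content-Length: 0\r\n\r\n";
$location = parse_redirect($response);
```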

However, if your web server has safe_mode activated or open_basedir set, CURLOPT_FOLLOWLOCATION won’t have any effect. The warning below will appear and libcurl won’t follow the new location.

Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION 
cannot be activated when in safe_mode or an open_basedir is set in ...
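Whether this restriction applies can be checked at runtime with ini_get(). The sketch below assumes that any non-empty open_basedir value and any "truthy" safe_mode value trigger the warning; the helper name follow_location_is_restricted() is hypothetical.

```php
<?php
// follow_location_is_restricted() is a hypothetical helper for this sketch.
// It takes the raw ini values so the logic can be checked in isolation.
function follow_location_is_restricted($openBasedir, $safeMode) {
  // open_basedir holds a list of directories; any non-empty value counts
  $hasBasedir = trim((string) $openBasedir) !== '';
  // safe_mode is a boolean flag ("1", "On", ...); where the setting does
  // not exist ini_get() returns false, which is treated as "off" here
  $hasSafeMode = filter_var($safeMode, FILTER_VALIDATE_BOOLEAN);
  return $hasBasedir || $hasSafeMode;
}

$restricted = follow_location_is_restricted(
  ini_get('open_basedir'),
  ini_get('safe_mode')
);
```

Note that open_basedir is a path list, not a boolean, so it is compared against the empty string instead of being passed through FILTER_VALIDATE_BOOLEAN.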

This blog post shows a workaround: change the server configuration and put the loader script into a directory listed in open_basedir. Unfortunately, many websites run on a shared host, so most people can’t alter the server configuration and need a user-space solution instead.

One solution is to follow redirects manually by examining the server response and sending the request again to the new location. The following sample code does exactly this. The function curl_exec_follow() takes two arguments: the cURL handle and the maximum number of allowed redirects. If a server response contains a redirect location, the script also checks whether it is an absolute URL or a relative path to the resource.

function curl_exec_follow($ch, &$maxredirect = null) {
  
  // we emulate a browser here since some websites detect
  // us as a bot and don't let us do our job
  $user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)".
                " Gecko/20041107 Firefox/1.0";
  curl_setopt($ch, CURLOPT_USERAGENT, $user_agent );

  $mr = $maxredirect === null ? 5 : intval($maxredirect);

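  // open_basedir holds a directory list, so any non-empty value means it
  // is set; safe_mode is a boolean flag (ini_get() returns false where
  // the setting does not exist)
  if (trim((string) ini_get('open_basedir')) === ''
      && !filter_var(ini_get('safe_mode'), FILTER_VALIDATE_BOOLEAN)
  ) {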

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $mr > 0);
    curl_setopt($ch, CURLOPT_MAXREDIRS, $mr);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
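    // NOTE: this turns off TLS certificate verification; remove this line
    // unless you really need to reach hosts with self-signed certificates
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);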

  } else {
    
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);

    if ($mr > 0)
    {
      $original_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
      $newurl = $original_url;
      
      $rch = curl_copy_handle($ch);
      
      curl_setopt($rch, CURLOPT_HEADER, true);
      curl_setopt($rch, CURLOPT_NOBODY, true);
      curl_setopt($rch, CURLOPT_FORBID_REUSE, false);
      do
      {
        curl_setopt($rch, CURLOPT_URL, $newurl);
        $header = curl_exec($rch);
        if (curl_errno($rch)) {
          $code = 0;
        } else {
          $code = curl_getinfo($rch, CURLINFO_HTTP_CODE);
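          // 303, 307 and 308 are redirects as well, and a redirect
          // without a Location header cannot be followed
          if (in_array($code, array(301, 302, 303, 307, 308))
              && preg_match('/Location:(.*?)\n/i', $header, $matches)) {
            $location = trim($matches[1]);

            // if no scheme is present then the new url is a
            // relative path and thus needs some extra care: resolve
            // it against the url that sent the redirect
            if (!preg_match("/^https?:/i", $location)) {
              $base = parse_url($newurl);
              if (!isset($base['scheme'], $base['host'])) {
                // current url has no scheme: fall back to concatenation
                $location = $newurl . $location;
              } else {
                $root = $base['scheme'] . '://' . $base['host']
                      . (isset($base['port']) ? ':' . $base['port'] : '');
                $path = isset($base['path']) ? $base['path'] : '/';
                $dir = substr($path, 0, strrpos($path, '/') + 1);
                // "/x" is relative to the host, "x" to the current directory
                $location = $root
                          . (substr($location, 0, 1) === '/' ? '' : $dir)
                          . $location;
              }
            }
            $newurl = $location;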
          } else {
            $code = 0;
          }
        }
      } while ($code && --$mr);
      
      curl_close($rch);
      
      if (!$mr)
      {
        if ($maxredirect === null) {
          trigger_error('Too many redirects.', E_USER_WARNING);
        } else {
          $maxredirect = 0;
        }

        return false;
      }
      curl_setopt($ch, CURLOPT_URL, $newurl);
    }
  }
  return curl_exec($ch);
}

curl_exec_follow() is used in place of curl_exec() and, unlike Olaf’s workaround in the blog post linked above, requires no extra user privileges. Here’s how curl_exec_follow() is used:

$ch = curl_init("http://example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec_follow($ch);
curl_close($ch);
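The trickiest part of following redirects by hand is turning a relative Location value into an absolute URL. The sketch below isolates that step so it can be tried without any network access. The helper name resolve_location() is hypothetical, and the logic is deliberately simplified: it does not normalize "../" segments and does not handle protocol-relative values such as "//host/path".

```php
<?php
// resolve_location() is a hypothetical helper for illustration only.
// $base is the absolute URL that answered with the redirect,
// $location is the raw value of its Location header.
function resolve_location($base, $location) {
  // already absolute: nothing to do
  if (preg_match('/^https?:/i', $location)) {
    return $location;
  }
  $parts = parse_url($base);
  $root = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');

  // "/path" is relative to the host root
  if (substr($location, 0, 1) === '/') {
    return $root . $location;
  }
  // "path" is relative to the directory of the current document
  $path = isset($parts['path']) ? $parts['path'] : '/';
  $dir = substr($path, 0, strrpos($path, '/') + 1);
  return $root . $dir . $location;
}
```

For example, resolving "c.html" against "http://example.com/a/b.html" keeps the directory "/a/" and replaces only the file name, while resolving "/c" replaces the whole path.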

Hopefully this helped you.
