cURL: follow locations with safe_mode enabled or open_basedir set

cURL is a tool for connecting to a remote server and loading data from it. The request URI can use schemes such as HTTP, HTTPS, FTP, gopher, telnet, DICT, FILE, LDAP and more.

PHP has built-in support for it through a layer on top of the underlying libcurl library. Here is an example of how cURL is used in PHP:

$ch = curl_init("http://example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);

The option CURLOPT_RETURNTRANSFER tells PHP to return the result from curl_exec() on success and FALSE on failure. If this option is omitted, curl_exec() prints the result directly and returns TRUE on success instead.

At some point you need to follow a location. This is the case when the server you are connecting to replies with a redirect: in an HTTP response it sends a 301 or 302 status code and the HTTP header Location pointing to the new URI. In the code, the option CURLOPT_FOLLOWLOCATION needs to be set to allow libcurl to follow the redirect.

$ch = curl_init("http://example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$data = curl_exec($ch);
curl_close($ch);

However, if your web server has safe_mode activated or open_basedir set, CURLOPT_FOLLOWLOCATION won’t have any effect. The warning below will appear and libcurl won’t follow the new location.

Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION
cannot be activated when in safe_mode or an open_basedir is set in ...

This blog post shows you a workaround by changing the server configuration and putting the loader script into a directory listed in open_basedir. Unfortunately many websites are hosted on a shared host. Hence most people can’t just alter something in a configuration file but rather need a user-space solution.

One solution is to follow redirects manually by examining the server response and sending the request again to the new location. The next sample code does exactly this. The function curl_exec_follow is passed two arguments: the cURL handle and the maximum number of allowed redirects. If a server response contains a redirect location, the script also checks whether it’s a full URL or a relative path to the resource.

function curl_exec_follow($ch, &$maxredirect = null) {

  $mr = $maxredirect === null ? 5 : intval($maxredirect);

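  // safe_mode may be reported as '', '0' or 'Off' (or false where it no longer exists)
  $safe_mode = filter_var(ini_get('safe_mode'), FILTER_VALIDATE_BOOLEAN);
  if (ini_get('open_basedir') == '' && !$safe_mode) {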

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $mr > 0);
    curl_setopt($ch, CURLOPT_MAXREDIRS, $mr);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

  } else {

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);

    if ($mr > 0)
    {
      $original_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
      $newurl = $original_url;

      $rch = curl_copy_handle($ch);

      curl_setopt($rch, CURLOPT_HEADER, true);
      curl_setopt($rch, CURLOPT_NOBODY, true);
      curl_setopt($rch, CURLOPT_FORBID_REUSE, false);
      do
      {
        curl_setopt($rch, CURLOPT_URL, $newurl);
        $header = curl_exec($rch);
        if (curl_errno($rch)) {
          $code = 0;
        } else {
          $code = curl_getinfo($rch, CURLINFO_HTTP_CODE);
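          if ($code == 301 || $code == 302) {
            // header names are case-insensitive and lines end with \r\n
            preg_match('/^Location:(.*?)\r?$/mi', $header, $matches);
            $location = trim(array_pop($matches));

            // if no scheme is present then the new url is relative and
            // must be resolved against the url that sent the redirect
            if (!preg_match("/^https?:/i", $location)) {
              $base = parse_url($newurl);
              $root = $base['scheme'] . '://' . $base['host']
                    . (isset($base['port']) ? ':' . $base['port'] : '');
              if (substr($location, 0, 1) != '/') {
                // path-relative: keep the directory of the current path
                $path = isset($base['path']) ? $base['path'] : '/';
                $location = substr($path, 0, strrpos($path, '/') + 1) . $location;
              }
              $location = $root . $location;
            }
            $newurl = $location;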
          } else {
            $code = 0;
          }
        }
      } while ($code && --$mr);

      curl_close($rch);

      if (!$mr)
      {
        if ($maxredirect === null) {
          trigger_error('Too many redirects.', E_USER_WARNING);
        } else {
          $maxredirect = 0;
        }

        return false;
      }
      curl_setopt($ch, CURLOPT_URL, $newurl);
    }
  }
  return curl_exec($ch);
}

This function is used in place of curl_exec() and, unlike a change to the server configuration, doesn’t require any extra privileges.

$ch = curl_init("http://example.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec_follow($ch);
curl_close($ch);
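
To see what the manual branch has to work with, here is a small sketch that reads the Location header from a raw response header. It needs no network access: the header block is a made-up sample of what curl_exec() returns when CURLOPT_HEADER and CURLOPT_NOBODY are set, and find_location() is a hypothetical helper name, not part of the function above.

```php
<?php
// Hypothetical raw header block, as curl_exec() would return it with
// CURLOPT_HEADER and CURLOPT_NOBODY enabled.
$header = "HTTP/1.1 301 Moved Permanently\r\n"
        . "location: /new/place\r\n"
        . "Content-Length: 0\r\n\r\n";

// Returns the value of the Location header, or null if there is none.
// The i modifier ignores the case of the header name, the m modifier
// lets ^ and $ match at the start and end of every header line.
function find_location($header) {
  if (preg_match('/^Location:[ \t]*(.*?)[ \t]*\r?$/mi', $header, $matches)) {
    return $matches[1];
  }
  return null;
}

var_dump(find_location($header));                    // string(10) "/new/place"
var_dump(find_location("HTTP/1.1 200 OK\r\n\r\n"));  // NULL
```

Servers are free to write the header name in any case, which is why the sketch does not rely on the exact spelling "Location:".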

Hopefully this helped you.


ID3 batch editing

Some of my music had no interpreter (artist) and title tags set, so I wrote a little script to batch edit these ID3 tags. The filenames of the songs looked like “01 Interpret - Song Title.mp3”. In a few moments I wrote the following PHP script [1], which splits the filename into the interpreter and the title.

All the Python scripts listed here use the library pytagger.

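<?php
if ($handle = opendir('.'))
{
   while (false !== ($file = readdir($handle)))
   {
      if ($file != "." && $file != "..")
      {
         $file_name = $file;
         // split at the first hyphen only, so titles may contain hyphens
         $file = explode("-", $file, 2);

         if (count($file) < 2)
            continue;

         $interpret = trim($file[0]);
         $title = trim(str_replace(".mp3", "", $file[1]));

         echo "interpret: $interpret , title: $title \n";
         // execute the python script id3.py to set the tags;
         // escapeshellarg() keeps quotes and spaces from breaking the command
         exec("./id3.py " . escapeshellarg($file_name) . " "
            . escapeshellarg($title) . " " . escapeshellarg($interpret));
      }
   }

   // close the handle only if opendir() succeeded
   closedir($handle);
}
?>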

Unfortunately I forgot to strip the digits so all the interpreters had two leading digits afterwards.

I had to iterate over the files again. As id3.py creates new tag frames I had to write another script (fix_interpreters.py) that reads the old frames to get the old values.

#!/usr/bin/env python

from tagger import *
import sys, os, fnmatch, pickle, re

filename = sys.argv[1]

print "Processing '", filename, "'"

try:
  id3 = ID3v2(filename)

  if id3.tag_exists():
    # frame id of the interpret (artist) frame; ID3v2.2 uses 3-letter ids
    interpretfid = 'TPE1'
    if id3.version == 2.2:
      interpretfid = 'TP1'

    interpret_frame = None
    for frame in id3.frames:
      if frame.fid == interpretfid:
        interpret_frame = frame
        break

    # nothing to repair if the file has no interpret frame
    if interpret_frame is not None:
      interpret = interpret_frame.strings[0]
      repaired = re.sub(r'^\d\d\s', '', interpret)

      print "Repairing '", interpret, "' => '", repaired, "'"
      interpret_frame.set_text(repaired)

      # replace interpret frame
      id3.frames = [frame for frame in id3.frames if frame.fid != interpretfid]
      id3.frames.append(interpret_frame)
      id3.commit(pretend=0)

except ID3Exception, e:
  print("ID3 exception: %s" % str(e))

I didn’t want to rewrite the PHP script above because I thought that typing a loop in the shell would be easier (or at least faster).

for file in `ls -1 *.mp3`; do ./fix_interpreters.py $file; done

I tried it with this loop, but it doesn’t hand each mp3 file to the script in one piece as expected. Because the output of ls -1 *.mp3 is not quoted, the shell splits the filenames at each whitespace. Thanks to Greg Miller’s post about Handling Filenames With Spaces this was no problem anymore. I changed the loop into

find ./ -name '*.mp3' | while IFS= read -r filename; do ls -ld "$filename"; ./fix_interpreters.py "$filename"; done

and ta-da, it worked.
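
In hindsight the second pass could have been avoided by stripping the track number while splitting the filename. The following is only a sketch of that idea: split_song_filename() is a hypothetical helper, and it assumes every file follows the “NN Interpret - Title.mp3” scheme described above.

```php
<?php
// Hypothetical helper: splits "NN Interpret - Title.mp3" into its parts
// and drops the leading track number. Returns null for other names.
function split_song_filename($file_name) {
  if (!preg_match('/^\d+\s+(.+?)\s+-\s+(.+)\.mp3$/i', $file_name, $m)) {
    return null; // name does not follow the expected scheme
  }
  return array('interpret' => $m[1], 'title' => $m[2]);
}

print_r(split_song_filename("01 Interpret - Song Title.mp3"));
```

Because the first group is lazy, the split happens at the first “ - ”, so a title that itself contains a hyphen stays in one piece.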

  1. PHP scripts can be executed in a shell. Use php -f script.php to execute a file or use -r instead of -f to run inline code.

Removing line numbers from a source code

At the moment I’m learning to develop Eclipse plugins for my work. Therefore I had to copy some code [1] from this tutorial. Unfortunately I had to remove the stupid line numbers first. It’s best to let a script remove them. Because I spent a lot of time on WordPress plugin development a few weeks ago, I chose PHP to do the work.

<?php
$code = '
1.   package com.myplugin.rmp.views;
2.   import java.util.ArrayList;
3.   import org.eclipse.core.resources.IFile;
4.   import org.eclipse.core.resources.IFolder;
5.   import org.eclipse.core.resources.IProject;
6.   import org.eclipse.core.resources.IResource;
7.   import org.eclipse.core.resources.IWorkspace;
8.   import org.eclipse.core.resources.ResourcesPlugin;
9.   import org.eclipse.core.runtime.IAdaptable;
10.  import org.eclipse.jface.action.Action;
11.  import org.eclipse.jface.action.MenuManager;
12.  import org.eclipse.jface.dialogs.MessageDialog;

...

199.         public void setFocus() {
200.                  viewer.getControl().setFocus();
201.         }
202. }
';

echo "<pre>". preg_replace("/\d+\./", "", $code) ."</pre>";
?>

The script is self-explanatory, but be sure to adapt the regular expression to your needs. The regex I used is not anchored to the start of a line, so it also destroys code containing constructs like anyObject12.doAnything();
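
One way around that limitation is to anchor the pattern to the beginning of each line. The snippet below is a minimal sketch of this variation, not the script I actually ran; the three input lines are made up for the demonstration.

```php
<?php
// Made-up sample: numbered lines, one of which contains "item12." in the code.
$code = "1.   int item12 = 0;\n2.   item12.toString();\n10.  return;";

// With the m modifier ^ matches at the start of every line, so only the
// leading "<digits>." prefix is removed and "item12." is left alone.
$clean = preg_replace('/^\d+\./m', '', $code);

echo "<pre>" . $clean . "</pre>";
```

The padding after the number is kept, so the relative indentation of the code survives. A line of code that itself starts with digits followed by a dot would still be damaged.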

  1. Copy & paste is bad but 200 lines are too many to rewrite ;-)

Using sub-patterns in regular expressions

Sub-patterns let you extract just a part of some content. They are patterns within patterns. In the following example I show you how to get the text “This text is what I want to filter.”

<html>
<body>
This is a <u>sample text</u> to show you how to filter
some text surrounded by quotes. "Here's a dummy text to
make our example more complicated". Now here's the text we
are interested in: "This text is what I want to filter."
</body>
</html>

There are several PHP functions that handle regular expressions, but preg_match_all is the most interesting one for us now.

Let us consider the sub-pattern first. We’re looking for any characters surrounded by quotes that are different from the quote itself.

"([^"]+)"
() represents a group
[^"] any character that is not the quote
+ tells the regex engine to match the preceding element one or more times

We’re not interested in the quotes themselves, so we put them to the left and right of the opening and closing parentheses. Thus the quotes don’t appear in the final result.

Now we have to specify which match of the sub-pattern we’d like to filter. This is quite easy. Just take some text in front of it, for example we are interested in:

we are interested in: "([^"]+)"

In this example there’s only one space between the colon and the first quote. If the number of spaces is unknown we have to use the meta-character \s, which stands for any whitespace character, followed by an asterisk, which lets it appear zero or more times. So the final expression looks like

we are interested in:\s*"([^"]+)"

And now some php code to get the final result ;-)

$pattern = '/we are interested in:\s*"([^"]+)"/';
preg_match_all( $pattern, $input, $matches );
$final_result = $matches[1][0];

The pattern has to be surrounded by delimiters, which have to be non-alphanumeric. preg_match_all looks for all matches and stores them in the array $matches. $matches[0] contains the matches of the full expression and $matches[1] those of the sub-pattern. They can be accessed by a further index. In this example $matches[1][0] contains This text is what I want to filter.
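
Here is a self-contained sketch that can be run as it is. It differs from the snippet above in one assumption: the sample text is kept wrapped as shown in the HTML, so a line break sits between “we” and “are”. A literal space in the pattern would not match that line break, which is why this variation uses \s+ between the words.

```php
<?php
// Sample input, shortened from the HTML above (note the line wrap after "we").
$input = <<<'TEXT'
some text surrounded by quotes. "Here's a dummy text to
make our example more complicated". Now here's the text we
are interested in: "This text is what I want to filter."
TEXT;

// \s+ between the words tolerates spaces as well as line breaks.
$pattern = '/we\s+are\s+interested\s+in:\s*"([^"]+)"/';
$count = preg_match_all($pattern, $input, $matches);

if ($count > 0) {
  echo $matches[1][0], "\n"; // This text is what I want to filter.
}
```

The first quoted sentence is skipped because it is not preceded by the text in front of the sub-pattern.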


Call-time pass-by-reference has been deprecated

I’ve just tested my new WordPress plugin and I’ve got this warning:

Warning: Call-time pass-by-reference has been deprecated - argument passed by value

My plugin is based on the WordPress plugin framework. It seems that it contains a little bug.

  Line 467:  $this->_UpdatePluginOptions( &$_REQUEST );
  Line 718:  function _UpdatePluginOptions( &$requestArray )

There’s one “&” too many, just before $_REQUEST in the call on line 467. The two examples below show how you get that warning and how to avoid it. You may not get the warning if you still use PHP4.

Bad:

  function ping( $pong ) {}
  $ding = "dong";
  ping( &$ding );

Good:

  function ping( &$pong ) {}
  $ding = "dong";
  ping( $ding );
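
To show that the “Good” form really passes the variable by reference, here is a small runnable sketch. The function name append_suffix is made up for this example. Note that from PHP 5.4 on, the “Bad” form is no longer a warning but a fatal error.

```php
<?php
// The ampersand belongs in the function signature only.
function append_suffix(&$text) {
  $text .= "!";
}

$ding = "dong";
append_suffix($ding);   // no ampersand at the call site
echo $ding, "\n";       // dong!
```

The caller’s variable is changed although the call itself looks like a normal pass-by-value call.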