PHP – parsing multipart/form-data the correct way when using non-POST methods

If you are here, then you are probably familiar with the problem. You want to be a good netizen and make a RESTful web API in PHP, but you want it to be robust and handle typical POST-style data sent with other HTTP methods: namely PUT, PATCH, and DELETE.

This helpful walkthrough of making a REST API in PHP suggests using parse_str. However, parse_str only works on application/x-www-form-urlencoded data, so the other common MIME type, multipart/form-data, is not supported. What does this mean? Well, complex data, file uploads in particular, cannot be transmitted using simple key-value encoding, so those requests need multipart handling.
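For reference, the parse_str approach for a url-encoded body looks roughly like the sketch below. The helper name parseUrlEncodedBody is made up for illustration, and the body is passed in as a string so it can be tried without a web server; in a real request it would be read from php://input.

```php
<?php
// Hypothetical helper: decode an application/x-www-form-urlencoded body.
function parseUrlEncodedBody($rawBody) {
    $fields = array();
    parse_str($rawBody, $fields);
    return $fields;
}

// In a real PUT/PATCH request: $rawBody = file_get_contents('php://input');
$fields = parseUrlEncodedBody('title=Hello+World&tags%5B%5D=php&tags%5B%5D=rest');
print_r($fields);
```

This handles nested names such as tags[] for free, but it has no notion of parts, boundaries, or files, which is why a multipart body needs separate handling.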

So you put on your programming hat and you have these steps already:

  1. Check $_SERVER['REQUEST_METHOD']
  2. If not POST, check $_SERVER['CONTENT_TYPE']
  3. If application/x-www-form-urlencoded, parse_str
  4. If multipart/form-data, handle multipart (but how?!)
  5. If anything else, treat it as a binary stream upload.
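The steps above can be sketched as a small dispatch function. This is only an illustration: the name classifyRequestBody and its return values are invented here, and the method and content type are passed as arguments so the function can be tried outside a web request.

```php
<?php
// Hypothetical helper: decide how a request body should be handled.
// Returns 'native' (let PHP fill $_POST/$_FILES), 'urlencoded',
// 'multipart', or 'stream'.
function classifyRequestBody($method, $contentType) {
    if (strtoupper($method) === 'POST') {
        return 'native';
    }

    // CONTENT_TYPE may carry parameters ("; charset=...", "; boundary=..."),
    // so only compare the part before the first semicolon.
    $parts = explode(';', (string) $contentType, 2);
    $mimeType = strtolower(trim($parts[0]));

    switch ($mimeType) {
        case 'application/x-www-form-urlencoded':
            return 'urlencoded';
        case 'multipart/form-data':
            return 'multipart';
        default:
            return 'stream';
    }
}

// In a real request the arguments would be $_SERVER['REQUEST_METHOD']
// and $_SERVER['CONTENT_TYPE'].
echo classifyRequestBody('PUT', 'multipart/form-data; boundary=abc123'), "\n";
```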

That is exactly where I was.  I could have given up, which would have been the smart thing to do, but instead I banged my head against it until it worked.

At this point, let’s break and talk about testing. You need software to test your API with, otherwise you’ll go mad. I found RESTClient.

Back to the task at hand: you may have noticed that checking CONTENT_TYPE was annoying, since it sometimes carries more than just the content type (a charset or boundary parameter, for example). For file uploads, we will also need to read name and filename from Content-Disposition. To handle that, I’ve come up with the following little overly-complicated code:

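function HttpParseHeaderValue($line) {
    $retval = array();
    // Named groups: name, quotedValue (inside double quotes), value (bare token).
    $regex  = <<<'EOD'
                /
            (?:^|;)\s*
                (?<name>[^=,;\s"]*)
                (?:
                    (?:="
                        (?<quotedValue>[^"\\]*(?:\\.[^"\\]*)*)
                        ")
                    |(?:=(?<value>[^=,;\s"]*))
                )?
                /mx
EOD;

    $matches = null;
    preg_match_all($regex, $line, $matches, PREG_SET_ORDER);

    for($i = 0; $i < count($matches); $i++) {
        $match = $matches[$i];
        $name = $match['name'];
        // Groups that did not take part in the match may be missing entirely.
        $quotedValue = isset($match['quotedValue']) ? $match['quotedValue'] : '';
        if($quotedValue === '') {
            $value = isset($match['value']) ? $match['value'] : '';
        } else {
            $value = stripcslashes($quotedValue);
        }
        // The first token has no "=" (e.g. "form-data"); store it under 'value'.
        if($value === '' && $i == 0) {
            $value = $name;
            $name = 'value';
        }
        $retval[$name] = $value;
    }
    return $retval;
}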

Great! Step 1 achieved! We can now go back and use this on CONTENT_TYPE. If you’re like me, you probably already saw some incorrect or ugly answers on StackOverflow for the multipart step. One involved a regex that would load copious amounts of data into memory; another was less flawed and closer to the right track.
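If all you need from CONTENT_TYPE is the multipart boundary, a narrower standalone sketch is shown here. It does not use the general header parser; extractBoundary is a hypothetical name, and the regex only covers the plain and double-quoted forms of the boundary parameter.

```php
<?php
// Hypothetical helper: pull the boundary parameter out of a Content-Type value.
// Returns null when no boundary parameter is present.
function extractBoundary($contentType) {
    if (preg_match('/boundary=(?:"([^"]+)"|([^;\s]+))/i', (string) $contentType, $m)) {
        // Group 1 is the quoted form, group 2 the bare token.
        return $m[1] !== '' ? $m[1] : $m[2];
    }
    return null;
}

var_dump(extractBoundary('multipart/form-data; boundary=----WebKitFormBoundaryX'));
```

Keep in mind that the delimiter lines inside the body are this value prefixed with "--", and the closing delimiter also has "--" appended.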

If you’re anything like me, you’ll rewrite the following parser anyway, but at least it’s a little better than those answers:

function HttpParseMultipart($stream, $boundary, array &$variables, array &$files) {
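    if($stream == null) {
        $stream = fopen('php://input', 'rb');
    }

    $partInfo = null;

    // The body opens with a delimiter line: "--" followed by the boundary.
    // The helpers below match against that full line, so take it from here
    // when the caller's $boundary is missing or lacks the leading "--".
    $lineN = fgets($stream);
    if($lineN !== false && strpos($lineN, '--') === 0
            && (empty($boundary) || strpos($lineN, $boundary) !== 0)) {
        $boundary = rtrim($lineN);
    }

    while(($lineN = fgets($stream)) !== false) {
        if(strpos($lineN, '--') === 0) {
            // A stray delimiter between parts; nothing to parse on this line.
            continue;
        }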

        $line = rtrim($lineN);

        if($line == '') {
            if(!empty($partInfo['Content-Disposition']['filename'])) {
                HttpParseMultipartFile($stream, $boundary, $partInfo, $files);
            } else {
                HttpParseMultipartVariable($stream, $boundary, $partInfo['Content-Disposition']['name'], $variables);
            }
            $partInfo = null;
            continue;
        }

        $delim = strpos($line, ':');
        if($delim === false) {
            // Not a "Key: value" header line; skip it.
            continue;
        }

        $headerKey = substr($line, 0, $delim);
        $headerVal = ltrim(substr($line, $delim + 1));
        $partInfo[$headerKey] = HttpParseHeaderValue($headerVal);
    }
    fclose($stream);
}

function HttpParseMultipartVariable($stream, $boundary, $name, &$array) {
    $fullValue = '';
    $lastLine = null;
    while(($lineN = fgets($stream)) !== false && strpos($lineN, $boundary) !== 0) {
        if($lastLine != null) {
            $fullValue .= $lastLine;
        }
        $lastLine = $lineN;
    }

    if($lastLine != null) {
        $fullValue .= rtrim($lastLine, "\r\n");
    }

    $array[$name] = $fullValue;
}

function HttpParseMultipartFile($stream, $boundary, $info, &$array) {
    $tempdir = sys_get_temp_dir();
    // we should technically 'clean' name - replace '.' with _, etc
    // http://stackoverflow.com/questions/68651/get-php-to-stop-replacing-characters-in-get-or-post-arrays
    $name = $info['Content-Disposition']['name'];
    $fileStruct['name'] = $info['Content-Disposition']['filename'];
    $fileStruct['type'] = $info['Content-Type']['value'];

    $array[$name] = &$fileStruct;

    if(empty($tempdir)) {
        $fileStruct['error'] = UPLOAD_ERR_NO_TMP_DIR;
        return;
    }

    $tempname = tempnam($tempdir, 'php');
    $outFP = fopen($tempname, 'wb');

    $fileStruct['tmp_name'] = $tempname;
    if($outFP === false) {
        $fileStruct['error'] = UPLOAD_ERR_CANT_WRITE;
        return;
    }

    $lastLine = null;
    while(($lineN = fgets($stream, 8096)) !== false && strpos($lineN, $boundary) !== 0) {
        if($lastLine !== null) {
            if(fwrite($outFP, $lastLine) === false) {
                $fileStruct['error'] = UPLOAD_ERR_CANT_WRITE;
                fclose($outFP);
                return;
            }
        }
        $lastLine = $lineN;
    }

    if($lastLine !== null) {
        // Drop the line break that precedes the boundary; it is not file data.
        if(fwrite($outFP, rtrim($lastLine, "\r\n")) === false) {
            $fileStruct['error'] = UPLOAD_ERR_CANT_WRITE;
            fclose($outFP);
            return;
        }
    }
    // Close (and flush) the temp file before asking for its size.
    fclose($outFP);
    $fileStruct['error'] = UPLOAD_ERR_OK;
    $fileStruct['size'] = filesize($tempname);
}
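Both part readers above hold each line back by one iteration, because the line break just before a delimiter belongs to the multipart framing rather than to the value. The standalone sketch below isolates that idea on an in-memory stream; readPartBody is a hypothetical name and the sample body is made up.

```php
<?php
// Hypothetical helper: read one part's body from $stream up to the next
// delimiter line. Each line is held back by one iteration so that the
// line break in front of the delimiter can be dropped from the value.
function readPartBody($stream, $delimiter) {
    $body = '';
    $held = null;
    while (($line = fgets($stream)) !== false && strpos($line, $delimiter) !== 0) {
        if ($held !== null) {
            $body .= $held;
        }
        $held = $line;
    }
    if ($held !== null) {
        $body .= rtrim($held, "\r\n");
    }
    return $body;
}

// Try it on an in-memory stream instead of php://input.
$stream = fopen('php://memory', 'w+b');
fwrite($stream, "line one\r\nline two\r\n--abc123--\r\n");
rewind($stream);
var_dump(readPartBody($stream, '--abc123'));
fclose($stream);
```

Because the comparison is a prefix match, the closing delimiter (the boundary followed by "--") ends the part as well.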

The only remaining caveats that I can find:

  • HttpParseMultipartVariable does not populate indexed variables
  • HttpParseMultipartFile does not replicate PHP’s POST behavior 100%
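For the first caveat, one possible workaround is to collect the raw field names and values, then let parse_str expand names like tags[] or user[name] in a single pass. This is a sketch under that assumption, not part of the parser above; expandFieldNames and the sample pairs are hypothetical.

```php
<?php
// Hypothetical helper: turn flat (name, value) pairs into the nested array
// PHP would build for the same field names in $_POST.
function expandFieldNames(array $pairs) {
    $query = array();
    foreach ($pairs as $pair) {
        list($name, $value) = $pair;
        // Encode both sides so brackets, "&" and "=" in the data survive.
        $query[] = urlencode($name) . '=' . urlencode($value);
    }
    $result = array();
    parse_str(implode('&', $query), $result);
    return $result;
}

$variables = expandFieldNames(array(
    array('tags[]', 'php'),
    array('tags[]', 'rest'),
    array('user[name]', 'Ada Lovelace'),
));
print_r($variables);
```

Since parse_str applies PHP's usual name handling, dots and spaces in field names become underscores, which also covers the 'clean' name comment in HttpParseMultipartFile for plain variables. The trade-off is that every value is url-encoded and decoded once more, so this suits ordinary form fields better than very large values.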

 

Good luck and have fun!

 

Comments (2)

  1. Misiek wrote::

    Your code doesn’t work with headers. Regex is wrong.

    Friday, August 8, 2014 at 9:19 am #
  2. fleeb wrote::

    Misiek noted that the code doesn’t quite work as-is. And that is true.

    When xtravar posted this, I suspect the parser interpreted parts of his regular expression as HTML tags, and stripped them.

    Three of the regex operations that start with ‘?[‘ should be ‘?’, then a less-than symbol, then a name, followed by a greater-than symbol.

    The first one has the variable ‘name’.

    The second one has the variable ‘quotedValue’.

    The last one has the variable ‘value’.

    The only other problem involves the last line of the content… it will have two extra - characters at the end to indicate that no more variables exist. This code doesn’t account for that.

    To correct this, look within HttpParseMultipartVariable for the while loop. Add:

    && strpos($lineN, $boundary . '--') !== 0

    to the test, and it should address that problem. Without this, your last variable’s value will contain the tag after a newline.

    I hope this helps someone. Sorry I can’t post the entire code here.

    Thursday, March 24, 2016 at 12:08 pm #