Extending werkzeug for no sane reason
This blog post was originally written with Flask 1.x in mind, alongside the corresponding Werkzeug version. This commit for Flask 2.x may assist going forward.
Recently, I've found myself working on reverse engineering the Digicam Print Channel, a Japanese-only channel designed to allow you to print images. As it turns out, this channel is not as Japanese-specific as once thought. Much of the channel's internals contain overly verbose information about ongoing operations. Multi-language support is apparent from the numerous functions that switch on region and language. They even went as far as to set Accept-Language on all requests, as this is not a default within NHTTP. (I consider all of this a treat. Insight into inner workings is always appreciated!)
NHTTP is the library commonly used by Nintendo on the Wii for HTTP operations. It still lives on in various forms to this day on various consoles, though in a much improved state. Most headers are sprintf'd directly to a buffer, and are repeated across various libraries. I do not think there was any sort of unified development team writing this code at any point within Nintendo.
As part of development within WiiLink24, we use Flask for all projects, which in turn uses werkzeug to interface with WSGI-related functionality. Unfortunately, in this situation NHTTP's quirks were too much for werkzeug. By default, it separates multipart/form-data request data into two types: form, which you might use for a submitted text field, and files, which is better suited to file uploads.
Our glorious friends at Nintendo send over a multipart form in the following format (abridged):
    --t9Sf4yfjf1RtvDu3AA
    Content-Disposition: form-data; name="imageFileName"

    example.jpeg
    --t9Sf4yfjf1RtvDu3AA
    Content-Disposition: form-data; name="jpegData"
    Content-Type: application/octet-stream
    Content-Transfer-Encoding: binary

    [binary contents follow]
Notice that the jpegData part lacks any explicit mention of filename="example.jpeg". The filename is instead sent in an entirely separate field! After investigating the source, it appears that werkzeug requires a filename for a part to become a FileStorage. While this surprisingly does not seem to be required by any RFC, many if not all browsers send the filename this way, and it is the most sensible course of action in my opinion.
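The filename rule can be illustrated with the standard library alone. The sketch below is not werkzeug's implementation; classify_part is a hypothetical helper that only mimics the decision described above, where a part counts as a file when its Content-Disposition header carries a filename parameter.

```python
from email.message import Message


def classify_part(content_disposition: str) -> str:
    """Return "files" if the header names a file, otherwise "form".

    Hypothetical helper: it mirrors the rule described in the post,
    not werkzeug's actual code.
    """
    part = Message()
    part["Content-Disposition"] = content_disposition
    filename = part.get_param("filename", header="Content-Disposition")
    return "files" if filename is not None else "form"


# What NHTTP sends: no filename, so the JPEG is treated as a form value.
print(classify_part('form-data; name="jpegData"'))
# What a browser would typically send for a file input.
print(classify_part('form-data; name="jpegData"; filename="example.jpeg"'))
```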
Unfortunately, this meant that jpegData became part of the form data, where all values are returned as strings. One might think that surely this is okay, since we can simply convert a string back to bytes, right? Unfortunately, they would be wrong.
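By default, werkzeug decodes form values with a default charset of UTF-8. It handled our data by taking something like FFD8FFDB 00430001 (the start of an example JPEG) and replacing it with EFBFBDEF BFBDEFBF BDEFBFBD 00430001 01, which... looks an awful lot like a UTF-8 BOM. It is not quite that: the BOM is EF BB BF, whereas EF BF BD is the UTF-8 encoding of U+FFFD, the replacement character. werkzeug coerces the value to a UTF-8 string once it has been read from the upload stream as bytes, and every byte that is not valid UTF-8 is swapped for U+FFFD, so the original data is gone. Trying a default charset of ascii (a single-byte encoding) left us in worse shape, as it expects every byte to fall within the proper ASCII range, which binary JPEG data does not.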
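The damage is easy to reproduce with plain Python. This is a small sketch, assuming the field is decoded as UTF-8 with invalid bytes replaced (which is what the output above suggests); the input is the example JPEG prefix plus the trailing 01 seen in the output.

```python
jpeg_start = bytes.fromhex("FFD8FFDB0043000101")

# Decode the way a text form field would be decoded, replacing invalid bytes.
as_text = jpeg_start.decode("utf-8", errors="replace")
round_tripped = as_text.encode("utf-8")

# Each of the four leading bytes has become U+FFFD (EF BF BD in UTF-8).
print(round_tripped.hex().upper())
print("replacement characters:", as_text.count("\ufffd"))

# A strict ASCII decode refuses the data outright.
try:
    jpeg_start.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as error:
    ascii_ok = False
    print("ascii failed:", error.reason)
```

Once the bytes have been replaced there is no way to recover them from the string, which is why converting the form value back to bytes cannot work.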
This left me searching for a proper answer. An initial solution was to attempt intercepting the initial parse method. We could read its stream, replace name="jpegData" with name="jpegData"; filename="jpegData" (suitable for our requirements), set the position to 0, write, and set the position to 0 once more.
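On a stream that does support seeking, that idea looks roughly like the sketch below. It uses io.BytesIO as a stand-in for the request stream, and add_filename_in_place is a hypothetical name, not anything from werkzeug or cam-server.

```python
import io
from typing import BinaryIO

TO_FIND = b'name="jpegData"'
TO_REPLACE = b'name="jpegData"; filename="jpegData"'


def add_filename_in_place(stream: BinaryIO) -> None:
    """Rewrite the jpegData header inside a seekable stream (hypothetical helper)."""
    body = stream.read()
    patched = body.replace(TO_FIND, TO_REPLACE, 1)
    stream.seek(0)      # Back to the start before overwriting.
    stream.write(patched)
    stream.truncate()   # Drop any leftover bytes past the new end.
    stream.seek(0)      # Rewind so the parser reads from the beginning.


body = (
    b"--t9Sf4yfjf1RtvDu3AA\r\n"
    b'Content-Disposition: form-data; name="jpegData"\r\n'
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    b"\xff\xd8\xff\xdb\r\n"
    b"--t9Sf4yfjf1RtvDu3AA--\r\n"
)
stream = io.BytesIO(body)
add_filename_in_place(stream)
print(stream.getvalue().splitlines()[1])
```

Note that the patched body is longer than the original, so anything relying on the declared Content-Length would also need adjusting.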
This once again surfaced more werkzeug internals: a type named LimitedStream. It wraps an io.IOBase, which hopefully should have read. However, at runtime it turned out that the stream given to LimitedStream by other WSGI code was an _io.BufferedReader which lacks seek support.
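The failure mode can be reproduced without a WSGI server. In this sketch, OneWayRaw is a hypothetical raw stream that, like a socket, only reads forward; wrapping it in io.BufferedReader gives an object of the same type as the one encountered above.

```python
import io


class OneWayRaw(io.RawIOBase):
    """Hypothetical raw stream that can only be read forward, like a socket."""

    def __init__(self, data: bytes) -> None:
        super().__init__()
        self._source = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return False

    def readinto(self, buffer) -> int:
        return self._source.readinto(buffer)


reader = io.BufferedReader(
    OneWayRaw(b'Content-Disposition: form-data; name="jpegData"')
)
print(type(reader).__name__, "seekable:", reader.seekable())

reader.read()  # Consume the stream once.
try:
    reader.seek(0)  # Attempt to rewind for a second pass.
    rewound = True
except io.UnsupportedOperation:
    rewound = False
print("rewound:", rewound)
```

Once the data has been read there is no going back, so the read, patch, and rewind approach has nothing to rewind.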
Eventually, I decided to investigate what else I could monkeypatch. I found a function named parse_multipart_headers. This meant that we could now have one module dedicated to calling the real parse_multipart_headers, and create another function called parse_multipart_headers_fix that would loop through the headers and fix the faulty Content-Disposition header. The function is called shortly before the actual data separation, which lets us easily insert a filename and have this file truly be treated as a file.
You can see roughly what was done below, available in a more verbose form on the cam-server repository itself:
    # Import locations as of Werkzeug 1.x.
    from werkzeug import exceptions
    from werkzeug.formparser import parse_multipart_headers

    to_find = 'form-data; name="jpegData"'
    to_replace = 'form-data; name="jpegData"; filename="jpegData"'


    def parse_multipart_headers_fix(iterable):
        headers = parse_multipart_headers(iterable)
        if not headers:
            # No headers is silly and will break us later on.
            raise exceptions.BadRequest()

        # Look for our faulty "jpegData" data.
        for key, value in headers:
            if value == to_find:
                # Make "jpegData" have a filename.
                headers[key] = to_replace

        return headers
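To see the monkeypatch wiring in isolation, here is a sketch that runs without werkzeug installed. Everything in it is a stand-in: formparser is a SimpleNamespace playing the role of the library module, and stand_in_parse_headers is a hypothetical, much simplified parser. Only the pattern (keep the real function, wrap it, assign the wrapper back onto the module) reflects what is described above.

```python
from types import SimpleNamespace

TO_FIND = 'form-data; name="jpegData"'
TO_REPLACE = 'form-data; name="jpegData"; filename="jpegData"'


def stand_in_parse_headers(lines):
    """Hypothetical stand-in parser: turns 'Key: value' lines into a dict."""
    headers = {}
    for line in lines:
        if not line.strip():
            break  # A blank line ends the header block.
        key, _, value = line.partition(":")
        headers[key.strip()] = value.strip()
    return headers


# Stand-in for the library module whose function gets swapped out.
formparser = SimpleNamespace(parse_multipart_headers=stand_in_parse_headers)

# Keep a reference to the real function before replacing it.
real_parse = formparser.parse_multipart_headers


def parse_multipart_headers_fix(lines):
    headers = real_parse(lines)
    for key, value in list(headers.items()):
        if value == TO_FIND:
            headers[key] = TO_REPLACE
    return headers


# The monkeypatch itself: later lookups on the module get the fixed version.
formparser.parse_multipart_headers = parse_multipart_headers_fix

part_headers = [
    'Content-Disposition: form-data; name="jpegData"',
    "Content-Type: application/octet-stream",
    "",
]
patched = formparser.parse_multipart_headers(part_headers)
print(patched["Content-Disposition"])
```

This only works when callers look the function up on the module at call time; code that imported the original name directly beforehand would keep using the unpatched version.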
I’m not sure how to end this as I’m now 8 hours lesser than I once was. Learning the internals of everything involved was great fun, and I hope to continue!