HTML forms that support file uploads are a staple of many modern web applications. For those interested in a little World Wide Web history, the form tag was first standardized in HTML 2.0 (RFC 1866), which was published as a Proposed Standard in November 1995, around the time when the table tag and many other tags commonly used today were being specified. The ability to upload files through form submissions was formally proposed in 1995 through RFC 1867, submitted by E. Nebel and L. Masinter.
The capability to collect information from website visitors, and especially the ability for users to upload files to the web server, would forever transform the World Wide Web from a static medium for posting pages into an information management tool. From online checkbooks to online email clients, from file sharing to social media applications, form submission and file uploading through a web browser have transformed life in the modern world.
How does it work?
Up until Web Sockets and XMLHttpRequest Level 2 (XHR2), file uploads could only be accomplished through a POST of the form data using the built-in form submission capabilities of a web browser.
To submit a simple HTML form to the web server, a developer would need to add a form tag and form content to the web page. The form content could be a mixture of input, select, and textarea elements combined with a submit button. When clicked, the submit button would submit the entered form data to the web server URL specified by the action attribute, using the HTTP method specified by the method attribute of the form tag.
<form action="index.php" method="post">
  <div>
    <label for="first-name">First Name</label>
    <input type="text" id="first-name" name="first-name">
  </div>
  <div>
    <label for="last-name">Last Name</label>
    <input type="text" id="last-name" name="last-name">
  </div>
  <div>
    <label for="age">Age</label>
    <input type="text" id="age" name="age">
  </div>
  <input type="submit">
</form>
While the method attribute could be set to GET or POST, it would typically be set to POST. GET requests are meant to retrieve data without modifying it, so a GET request can be executed multiple times in a row and return the same result. POST requests are used when data will be created or modified, because repeating such a request may repeat the change. When the method is set to POST, the form content is transformed into an ampersand (&) delimited string of URL-encoded name-value pairs and posted in the body of the HTTP POST request to the web server. For the form above, with the values Bob, Smith, and 45 entered, the data posted in the request body would look similar to the following:
first-name=Bob&last-name=Smith&age=45
For more information on GET and POST, please see http://www.w3schools.com/tags/ref_httpmethods.asp.
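The URL-encoding step can be reproduced outside the browser. The following is a minimal sketch in Python using the standard library's urllib.parse.urlencode; the field names come from the form above, while the values are illustrative assumptions.

```python
from urllib.parse import urlencode

# Field names match the form above; the values are made up for illustration.
fields = [("first-name", "Bob"), ("last-name", "Smith"), ("age", "45")]

# Name-value pairs are percent-encoded and joined with "&".
body = urlencode(fields)
print(body)  # first-name=Bob&last-name=Smith&age=45

# Reserved characters are escaped and spaces become "+".
escaped = urlencode([("first-name", "Mary Ann"), ("last-name", "O'Neil & Sons")])
print(escaped)  # first-name=Mary+Ann&last-name=O%27Neil+%26+Sons
```

Note that the ampersand inside a value is escaped as %26, which is what keeps it from being confused with the pair delimiter.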
The web server would then receive the request, parse the request body, and, depending upon the server-side language/platform being used (such as Perl, PHP, Classic ASP, ASP.NET, or Java), the form field values would be made available to the web server application.
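What that server-side parsing amounts to can be sketched in a few lines. The example below is an assumption-laden illustration in Python, not how any particular platform implements it; the helper name parse_form_body is hypothetical, and it keeps only the last value when a field name repeats.

```python
from typing import Dict
from urllib.parse import parse_qsl


def parse_form_body(body: str) -> Dict[str, str]:
    """Decode an application/x-www-form-urlencoded body into a dict.

    Hypothetical helper: if a name appears more than once, the last value wins.
    """
    # parse_qsl splits on "&" and "=", then reverses the percent/plus encoding.
    return dict(parse_qsl(body, keep_blank_values=True))


form = parse_form_body("first-name=Bob&last-name=Smith&age=45")
print(form["first-name"], form["last-name"], form["age"])  # Bob Smith 45
```

All decoded values are strings, so a field such as age still needs to be converted and validated by the application.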
The form configuration above works great for forms that do not support file uploads. However, if a form needs to allow the user to upload a file, an additional attribute is needed on the form element. For file uploads, the form tag’s method attribute needs to be set to POST and the enctype attribute needs to be set to “multipart/form-data”. In RFC 1866, the enctype attribute was added, but only supported one value: “application/x-www-form-urlencoded”. This value (which also serves as the default value) is responsible for the URL-encoding for the form data described already.
To support file uploads, RFC 1867 added the new multipart value for uploading files. In addition to this change to the HTML specification and other web browser enhancements, server-side languages also needed to be updated to support the new “multipart/form-data” enctype. Some languages, such as PHP, supported file uploads natively, while other languages, such as Classic ASP, required a third-party plugin or extension to properly handle file uploads.
<form action="index.php" method="post" enctype="multipart/form-data">
  <div>
    <label for="first-name">First Name</label>
    <input type="text" id="first-name" name="first-name">
  </div>
  <div>
    <label for="last-name">Last Name</label>
    <input type="text" id="last-name" name="last-name">
  </div>
  <div>
    <label for="age">Age</label>
    <input type="text" id="age" name="age">
  </div>
  <div>
    <label for="resume">Upload Resume</label>
    <input type="file" id="resume" name="resume">
  </div>
  <input type="submit">
</form>
Once the form tag is configured correctly, an input element with the type attribute set to “file” can be added to the form. The type attribute was added in RFC 1866, but the value “file” was added in RFC 1867. Per RFC 1867, for the “file” input element, the web browser will display a UI widget that will allow the user to select a file (or multiple files) to upload. After the file is selected (and the rest of the form is filled out as appropriate), the user can then click on submit to post the form data and upload the file to the server.
When “multipart/form-data” content is posted to the server, the Content-Type header is set to “multipart/form-data” and a boundary string used to separate the multiple parts is specified.
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryK8NOU66bOhWkzidB
When the request body is posted, the boundary is used between each field of data being uploaded, including non-file fields. Within each part, additional headers such as Content-Disposition and Content-Type may be specified.
------WebKitFormBoundaryK8NOU66bOhWkzidB
Content-Disposition: form-data; name="first-name"

Bob
------WebKitFormBoundaryK8NOU66bOhWkzidB
Content-Disposition: form-data; name="last-name"

Smith
------WebKitFormBoundaryK8NOU66bOhWkzidB
Content-Disposition: form-data; name="age"

45
------WebKitFormBoundaryK8NOU66bOhWkzidB
Content-Disposition: form-data; name="resume"; filename="test-text-file.txt"
Content-Type: text/plain

Test text file for testing file uploads.
------WebKitFormBoundaryK8NOU66bOhWkzidB--
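To make the structure of the request body above concrete, here is a minimal Python sketch that assembles a body of the same shape by hand. The function name build_multipart, its parameters, and the generated boundary prefix are all hypothetical, and the sketch assumes field names and filenames contain no quotes or line breaks. Each part starts with two hyphens plus the boundary, headers are separated from content by a blank line, and the final delimiter carries two trailing hyphens.

```python
import uuid

CRLF = "\r\n"


def build_multipart(fields, files, boundary=None):
    """Return (content_type, body_bytes) for a multipart/form-data request.

    fields: list of (name, value) text pairs
    files:  list of (name, filename, content_type, data_bytes) tuples
    Illustrative only: names and filenames are not escaped.
    """
    boundary = boundary or "----SketchBoundary" + uuid.uuid4().hex
    parts = []
    for name, value in fields:
        # Plain fields carry only a Content-Disposition header.
        parts.append(
            (
                f"--{boundary}{CRLF}"
                f'Content-Disposition: form-data; name="{name}"{CRLF}{CRLF}'
                f"{value}{CRLF}"
            ).encode("utf-8")
        )
    for name, filename, content_type, data in files:
        # File parts add a filename and their own Content-Type header.
        header = (
            f"--{boundary}{CRLF}"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"{CRLF}'
            f"Content-Type: {content_type}{CRLF}{CRLF}"
        ).encode("utf-8")
        parts.append(header + data + CRLF.encode("ascii"))
    # The closing delimiter ends with two extra hyphens.
    parts.append(f"--{boundary}--{CRLF}".encode("ascii"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


content_type, body = build_multipart(
    [("first-name", "Bob"), ("last-name", "Smith"), ("age", "45")],
    [("resume", "test-text-file.txt", "text/plain",
      b"Test text file for testing file uploads.")],
)
print(content_type)
print(body.decode("utf-8"))
```

File content is appended as raw bytes, which mirrors the way the browser sends files without any text encoding.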
As you can see, the amount of data sent per field is significantly greater using “multipart/form-data” than it is with the default “application/x-www-form-urlencoded” enctype. Therefore, the “multipart/form-data” enctype should only be used when uploading files. Nevertheless, the multipart format easily accommodates multiple files being uploaded.
When RFC 1867 was first implemented by web browsers, multiple files could be uploaded simultaneously through the use of multiple input elements of type “file”. In modern web browsers, a single input element can be used to select multiple files. HTML5 added a new attribute named multiple that is used to specify that a “file” input can accept multiple files.
<div>
  <label for="resumes">Upload Resumes</label>
  <input type="file" id="resumes" name="resumes" multiple>
</div>
The web browser’s widget for file uploads allows the user to select multiple files, and then displays on the web page the number of files selected. When the submit button for the form is clicked, there will be a part in the multipart submission for each file selected. The field name for all of the files will be the same.
------WebKitFormBoundaryCZykT6Xc9CmIjJYK
Content-Disposition: form-data; name="first-name"

Bob
------WebKitFormBoundaryCZykT6Xc9CmIjJYK
Content-Disposition: form-data; name="last-name"

Smith
------WebKitFormBoundaryCZykT6Xc9CmIjJYK
Content-Disposition: form-data; name="age"

45
------WebKitFormBoundaryCZykT6Xc9CmIjJYK
Content-Disposition: form-data; name="resumes"; filename="test-text-file-2.txt"
Content-Type: text/plain

Test text file for testing file uploads 2.
------WebKitFormBoundaryCZykT6Xc9CmIjJYK
Content-Disposition: form-data; name="resumes"; filename="test-text-file.txt"
Content-Type: text/plain

Test text file for testing file uploads.
------WebKitFormBoundaryCZykT6Xc9CmIjJYK--
Observe in the above “multipart/form-data” request that the name for both file upload parts is the same: “resumes”. The server-side platform is then responsible for providing an effective mechanism that allows the web application developer to work with each file individually.
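One way a server-side platform might expose same-named parts is as a list per field name. The sketch below is an illustration only: it leans on Python's standard email package, which understands MIME multipart structure, and the function name parse_multipart is hypothetical. A real application would normally rely on its framework's own multipart parser, particularly for large or binary uploads.

```python
from collections import defaultdict
from email import message_from_bytes


def parse_multipart(content_type: str, body: bytes):
    """Group multipart/form-data parts by field name.

    Returns {name: [(filename_or_None, payload_bytes), ...]}.
    Hypothetical helper for illustration, not a production parser.
    """
    # The MIME parser needs the Content-Type header (which holds the
    # boundary) placed in front of the body.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    grouped = defaultdict(list)
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()  # None for non-file fields
        grouped[name].append((filename, part.get_payload(decode=True)))
    return dict(grouped)
```

With the request shown above, the “resumes” key would map to a two-element list, one entry per uploaded file, while “first-name”, “last-name”, and “age” would each map to a single entry with no filename.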
The Content-Disposition header is used to specify the name of the file being transferred, both when downloading and when uploading files. For an uploaded part, the Content-Disposition header specifies three values: disposition type, name, and filename.
For file uploads, the disposition type is “form-data” because the enctype is “multipart/form-data”.
The name is the name of the form field.
Finally, the filename is the suggested name for the file when saving it on the server. The filename should only be the name of the file with no additional directory path. If the directory path is submitted, it should be ignored.
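Ignoring a submitted directory path can be sketched as follows. This is a minimal Python illustration; the function name safe_upload_name and the fallback name are assumptions, and a real application would usually apply further rules such as character whitelists or generated names.

```python
import ntpath


def safe_upload_name(submitted: str, fallback: str = "upload.bin") -> str:
    """Keep only the final path component of a client-supplied filename.

    Hypothetical helper: ntpath treats both "\\" and "/" as separators,
    so Windows-style and POSIX-style paths are both reduced to a bare name.
    """
    name = ntpath.basename(submitted.strip())
    # Reject empty results and the special directory entries.
    if name in ("", ".", ".."):
        return fallback
    return name


print(safe_upload_name("C:\\Users\\bob\\resume.txt"))  # resume.txt
print(safe_upload_name("../../etc/passwd"))            # passwd
print(safe_upload_name("test-text-file.txt"))          # test-text-file.txt
```

Treating the filename as untrusted input matters because it is supplied entirely by the client.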
Files are uploaded with the appropriate Content-Type as determined by the web browser and the local operating system, typically from the file’s extension. Because HTTP supports the transmission of binary data, binary files are sent as binary data; therefore, no Content-Transfer-Encoding header needs to be set.
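Extension-based type detection can be illustrated with Python's standard mimetypes module. This is only an analogy for what a browser does, since each browser and operating system keeps its own mapping; the function name guess_upload_type and the application/octet-stream fallback are assumptions made for the sketch.

```python
import mimetypes


def guess_upload_type(filename: str) -> str:
    """Guess a Content-Type from the file extension alone.

    Hypothetical helper: falls back to a generic binary type when the
    extension is missing or unknown.
    """
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or "application/octet-stream"


print(guess_upload_type("test-text-file.txt"))  # text/plain
print(guess_upload_type("photo.png"))           # image/png
print(guess_upload_type("resume"))              # application/octet-stream
```

Because the guess depends only on the name, renaming a file changes the reported type without changing its content.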
Limiting File Types
Many times, it’s useful to limit the kinds of files uploaded by the user on a given form. RFC 1867 provided an attribute named accept which could be used to limit file uploads to a particular kind of file such as an image file, an audio file, a video file or even a file with a certain extension.
<input type="file" id="resume" name="resume" accept="image/*">
While this capability is not a fool-proof validation technique, it can be useful as a quick validation at the time of file selection. When the accept attribute is set, the web browser will apply a filter to the file selection dialog while the user is selecting the file(s) to upload. However, the user can override that filter and upload any type of file they would like. Therefore, additional client-side and server-side validation is absolutely necessary.
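A first server-side check might re-apply the same accept rule to what was actually submitted. The Python sketch below is a hedged illustration: the function name matches_accept is hypothetical, and it handles only the three token forms an accept value can contain (an extension such as .pdf, a wildcard such as image/*, or an exact MIME type).

```python
def matches_accept(accept: str, content_type: str, filename: str) -> bool:
    """Return True if an upload satisfies any token in an accept value.

    Hypothetical helper. Both content_type and filename are supplied by
    the client, so a match is a first filter, not proof of file content.
    """
    # Drop parameters such as "; charset=..." before comparing.
    content_type = content_type.split(";")[0].strip().lower()
    for token in (t.strip().lower() for t in accept.split(",")):
        if not token:
            continue
        if token.startswith("."):
            # Extension token, for example ".pdf"
            if filename.lower().endswith(token):
                return True
        elif token.endswith("/*"):
            # Wildcard token, for example "image/*" matches "image/png"
            if content_type.startswith(token[:-1]):
                return True
        elif token == content_type:
            return True
    return False


print(matches_accept("image/*", "image/png", "photo.png"))    # True
print(matches_accept("image/*", "text/plain", "notes.txt"))   # False
```

Since the declared type and filename can be forged, inspecting the file content itself is a reasonable next step, though that is outside the scope of this sketch.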
Twenty years ago, when forms and file uploads first appeared, the vast majority of web applications reloaded the web page with every post of a form to the web server. Today, with the rise of Single-Page Applications (SPAs) and enhancements to web browser APIs, file uploads are rarely accomplished using a full page reload after posting the form (except in older legacy applications).
The next two blog posts will discuss the XHR2 and Web Sockets techniques, including the use of the Drag & Drop APIs to upload files dragged and dropped onto the web page.
Author: Eric Greene, one of Accelebrate’s instructors.