Amazon S3
Uploading large files over unreliable network connections is a real challenge, and many of you have probably faced it. When the connection breaks partway through a large upload, you cannot easily resume from where you left off. You are forced to restart the whole upload from the beginning.
Amazon S3 provides a faster, easier, and more flexible way to upload large files, known as the “multipart upload” feature. It lets you break a large object into smaller chunks and upload several of them in parallel. If any chunk fails to upload, you can retry just that chunk. Uploading in parallel can also improve the overall upload speed.
You can break a file into as many as 10,000 separate parts and upload each one independently, as long as every part except the last is at least 5 MB. If a part fails to upload, it can be retried without affecting any of the other parts. Once all the parts are uploaded, you ask S3 to assemble the full object with one more call.
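To make the size limits concrete, here is a minimal sketch of how an application might choose a part size. It assumes a ceiling of 10,000 parts (matching the 1 to 10,000 part-number range) and a 5 MB minimum for every part except the last. `PartPlanner` and its methods are hypothetical names used for illustration only; they are not part of any AWS library.

```java
/** Hypothetical helper (not part of the AWS SDK): picks a part size within the assumed limits. */
final class PartPlanner {
    static final long MIN_PART_SIZE = 5L * 1024 * 1024; // assumed 5 MB minimum (last part may be smaller)
    static final int MAX_PARTS = 10_000;                // assumed maximum number of parts

    /** Smallest part size (at least 5 MB) that splits objectSize into at most MAX_PARTS parts. */
    static long partSizeFor(long objectSize) {
        if (objectSize <= 0) {
            throw new IllegalArgumentException("objectSize must be positive");
        }
        long needed = (objectSize + MAX_PARTS - 1) / MAX_PARTS; // ceiling division
        return Math.max(MIN_PART_SIZE, needed);
    }

    /** Number of parts when objectSize is cut into partSize chunks; the last chunk may be shorter. */
    static int partCount(long objectSize, long partSize) {
        return (int) ((objectSize + partSize - 1) / partSize);
    }

    public static void main(String[] args) {
        long size = 12L * 1024 * 1024; // a 12 MB object
        long partSize = partSizeFor(size);
        // Prints: 3 parts of up to 5242880 bytes
        System.out.println(partCount(size, partSize) + " parts of up to " + partSize + " bytes");
    }
}
```

For small objects the 5 MB minimum decides the part size. For very large objects the part-count ceiling takes over and the parts grow accordingly.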
An application that uses multipart upload works as follows:
- Separates the source object into multiple parts.
- Initiates the multipart upload and receives an upload ID in return.
- Uploads each part (a contiguous portion of the object’s data) accompanied by the upload ID and a part number (1 to 10,000, inclusive). The part numbers need not be consecutive, but their order determines the position of each part within the object. S3 returns an ETag in response to each part upload.
- Finalizes the upload by providing the upload ID and the part number / ETag pair for each part of the object.
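The flow above can be illustrated with a small in-memory simulation. This is only a sketch of the protocol, not the AWS SDK and not how S3 is implemented. `InMemoryMultipartStore` is a hypothetical stand-in, and its "ETag" is a made-up hash rather than the value S3 would return. It shows that parts may arrive in any order and with gaps in their numbering, and that the final object is assembled in ascending part-number order.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

/** Hypothetical in-memory stand-in for the service side of a multipart upload. */
final class InMemoryMultipartStore {
    private final Map<String, TreeMap<Integer, byte[]>> uploads = new HashMap<>();

    /** Starts an upload and returns its upload ID. */
    String initiate() {
        String uploadId = UUID.randomUUID().toString();
        uploads.put(uploadId, new TreeMap<>());
        return uploadId;
    }

    /** Stores one part and returns a tag for it (illustrative hash, not a real S3 ETag). */
    String uploadPart(String uploadId, int partNumber, byte[] data) {
        if (partNumber < 1 || partNumber > 10_000) {
            throw new IllegalArgumentException("part number must be 1 to 10,000");
        }
        TreeMap<Integer, byte[]> parts = uploads.get(uploadId);
        if (parts == null) {
            throw new IllegalArgumentException("unknown upload ID");
        }
        parts.put(partNumber, data.clone()); // re-uploading a part number replaces that part
        return tagOf(data);
    }

    /** Assembles the listed parts in ascending part-number order and ends the upload. */
    byte[] complete(String uploadId, Map<Integer, String> partETags) {
        TreeMap<Integer, byte[]> parts = uploads.remove(uploadId);
        if (parts == null) {
            throw new IllegalArgumentException("unknown upload ID");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<Integer, String> e : new TreeMap<>(partETags).entrySet()) {
            byte[] data = parts.get(e.getKey());
            if (data == null || !tagOf(data).equals(e.getValue())) {
                throw new IllegalArgumentException("missing or mismatched part " + e.getKey());
            }
            out.write(data, 0, data.length);
        }
        return out.toByteArray();
    }

    private static String tagOf(byte[] data) {
        return Integer.toHexString(Arrays.hashCode(data));
    }

    public static void main(String[] args) {
        InMemoryMultipartStore store = new InMemoryMultipartStore();
        String uploadId = store.initiate();
        Map<Integer, String> tags = new HashMap<>();
        // Parts are sent out of order and with non-consecutive numbers.
        tags.put(30, store.uploadPart(uploadId, 30, "world".getBytes(StandardCharsets.UTF_8)));
        tags.put(10, store.uploadPart(uploadId, 10, "hello".getBytes(StandardCharsets.UTF_8)));
        tags.put(20, store.uploadPart(uploadId, 20, ", ".getBytes(StandardCharsets.UTF_8)));
        // Prints: hello, world
        System.out.println(new String(store.complete(uploadId, tags), StandardCharsets.UTF_8));
    }
}
```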
The upload process using the API is as follows:
1. Provide your AWS credentials and create an instance of AmazonS3Client.
2. Initiate the multipart upload by calling the initiateMultipartUpload method. Supply the information needed to initiate the upload by creating an instance of the InitiateMultipartUploadRequest class.
3. Save the upload ID returned by the initiateMultipartUpload method. This upload ID is required for each subsequent multipart upload operation.
4. Upload the parts. For each part, call the uploadPart method and supply the part upload information, such as the upload ID, bucket name, and part number, by creating an instance of the UploadPartRequest class.
5. Save the response of the uploadPart method in a list. The response includes the ETag value and the part number, which you will need later to complete the multipart upload.
6. Repeat steps 4 and 5 for each part.
7. Call the completeMultipartUpload method to complete the multipart upload.
Sample code for achieving multipart upload:
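The sketch below covers the client-side part of the process: cutting the data into parts, uploading them in parallel, retrying only the part that failed, and collecting the part number / ETag pairs needed for completion. It deliberately does not call the AWS SDK, so it runs on its own. `ParallelPartUploader`, `PartSender`, and `PartTag` are hypothetical names. In a real application, the `PartSender` hook is where a call to `uploadPart` with an `UploadPartRequest` would go, with `initiateMultipartUpload` before the loop and `completeMultipartUpload` after it. The tiny part size in `main` is for demonstration only.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelPartUploader {
    /** Hypothetical hook: sends one part and returns its ETag, or throws on failure. */
    interface PartSender {
        String send(String uploadId, int partNumber, byte[] data) throws Exception;
    }

    /** Part number / ETag pair kept for the final "complete" call. */
    static final class PartTag {
        final int partNumber;
        final String eTag;
        PartTag(int partNumber, String eTag) {
            this.partNumber = partNumber;
            this.eTag = eTag;
        }
    }

    /** Uploads all parts on a thread pool and returns their tags in part-number order. */
    static List<PartTag> uploadAll(String uploadId, byte[] object, int partSize,
                                   int maxAttempts, int threads, PartSender sender) {
        if (partSize < 1 || maxAttempts < 1 || threads < 1) {
            throw new IllegalArgumentException("partSize, maxAttempts and threads must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<PartTag>> pending = new ArrayList<>();
            int partNumber = 1;
            for (int offset = 0; offset < object.length; offset += partSize, partNumber++) {
                final int number = partNumber;
                final byte[] chunk =
                        Arrays.copyOfRange(object, offset, Math.min(object.length, offset + partSize));
                pending.add(pool.submit(() -> sendWithRetry(sender, uploadId, number, chunk, maxAttempts)));
            }
            // Futures were submitted in part-number order, so the result list is ordered too.
            List<PartTag> tags = new ArrayList<>();
            for (Future<PartTag> f : pending) {
                try {
                    tags.add(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while uploading", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("part upload failed", e.getCause());
                }
            }
            return tags;
        } finally {
            pool.shutdown();
        }
    }

    /** Retries a single part; other parts are unaffected by its failures. */
    private static PartTag sendWithRetry(PartSender sender, String uploadId, int partNumber,
                                         byte[] chunk, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return new PartTag(partNumber, sender.send(uploadId, partNumber, chunk));
            } catch (Exception e) {
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        Set<Integer> failedOnce = ConcurrentHashMap.newKeySet();
        // Simulated sender: part 2 fails on its first attempt, then succeeds.
        PartSender flaky = (id, n, data) -> {
            if (n == 2 && failedOnce.add(n)) {
                throw new IOException("simulated network drop");
            }
            return "etag-" + n;
        };
        for (PartTag t : uploadAll("demo-upload-id", new byte[25], 10, 3, 4, flaky)) {
            System.out.println(t.partNumber + " -> " + t.eTag);
        }
    }
}
```

Keeping the retry inside each task means a dropped connection costs only one part's worth of data rather than the whole object.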
Related read: How to Create and Manage a Bucket in Amazon S3.
Let us know how the above steps worked for you. We appreciate your thoughts on our posts. Do share your tips and suggestions in the Comments box below.