
How to Migrate Paperclip assets to Amazon S3

Recently, after deploying a web app on a dedicated server, we realized that disk storage would become a bottleneck for the application.

Storage needs would grow quickly with usage: the app will eventually run in many locations at once, each uploading hundreds of megabytes of assets to a central server.
We decided to move storage off our dedicated server and onto Amazon S3 to make the application more scalable.

Easy stuff, we could say at this stage. Just switch some configuration in Paperclip in order to use S3.

The first problem

Well, the problem was that the app was already in production, so the change required us to move all the existing assets to S3 to avoid losing data (obviously!).

A problem within the problem

During our research we found an easy way to migrate all the assets to S3. The solution repeated all over the Internet is quite self-explanatory: just upload the folder where your assets are stored (the well-known public/system folder) to S3!

This works when the url option was explicitly defined in the Paperclip configuration. With the default scheme, however, Paperclip stores each file at a location like:

public/system/documents/files/000/000/00{id}/{style}/{file}

Rather than spend time working out where /000/000/ and all the leading zeros come from, and to end up with a more flexible solution, we decided to devise our own strategy.
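For reference, the zero-padded segments come from Paperclip's :id_partition interpolation, which pads the record id to nine digits and splits it into three groups. The sketch below reproduces that idea for integer ids only; the id_partition method here is a standalone illustration, not Paperclip's own code.

```ruby
# Standalone illustration of how an integer id becomes a partitioned path.
# The method name is hypothetical; Paperclip does this inside its
# :id_partition interpolation.
def id_partition(id)
  format("%09d", id).scan(/\d{3}/).join("/")
end

puts id_partition(1)        # 000/000/001
puts id_partition(1234567)  # 001/234/567
```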

Requirements

  1. We need to move all the model attachments;
    
  2. We need to preserve all the styles;
    
  3. This needs to be done without locking the server;
    
  4. S3 bucket should be different according to Rails environments (test, production, staging);
    
  5. **We need to have minimum downtime.**
    

Solution

First we need to be able to switch the Paperclip configuration: it should use our own server during the migration (we still need to read the attachment files from it) and S3 once the migration is done.

Let's look at a typical model with attachments:

    class Document < ActiveRecord::Base
      has_attached_file :file,
        styles: {
          original: ["300x300>"],
          thumb: ["125x125>"]
        },
        default_url: lambda { |image| ActionController::Base.helpers.asset_path('no_image.png') }
    end

To prepare the model to be migrated to the S3 service we need to add a switch that can be easily activated.

Let's create an s3_migrate.yml file in the config folder with the following settings:

1.  storage: :s3  
2.  s3_credentials: <%= "#{Rails.root}/config/aws.yml" %>  
3.  path: <%= "#{Rails.env}/:class/:attachment/:id/:style/:filename" %>  
4.  url: ':s3_domain_url'  
5.  activated: false

Lines 1 to 4 are the normal Paperclip configuration for S3 storage; line 5 adds the switch that activates S3.

If it's not activated, the application should serve the assets from the local server. Next we change our model definition in order to reflect these changes.

    class Document < ActiveRecord::Base

      # Options shared by local and S3 storage
      PAPERCLIP_STORAGE_OPTIONS = {
        styles: {
          original: ["300x300>"],
          thumb: ["125x125>"]
        },
        default_url: lambda { |image| ActionController::Base.helpers.asset_path('no_image.png') }
      }

      # Load the migration settings (the file is run through ERB first)
      migrate_options = YAML.load(ERB.new(File.read("#{Rails.root}/config/s3_migrate.yml")).result).symbolize_keys
      # Switch to S3 only when the activated flag is true
      PAPERCLIP_STORAGE_OPTIONS.merge! migrate_options if migrate_options[:activated]

      has_attached_file :file,
        PAPERCLIP_STORAGE_OPTIONS
    end

In this snippet we store the Paperclip options in a hash (PAPERCLIP_STORAGE_OPTIONS) and load the file containing the migration options. We merge the S3 settings if activated is true.
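To see the switch in isolation, here is a minimal sketch that runs without Rails. The inline SAMPLE_CONFIG stands in for config/s3_migrate.yml, the env value replaces Rails.env, and the helper names load_migrate_options and storage_options are made up for this illustration.

```ruby
require "yaml"
require "erb"

# Inline stand-in for config/s3_migrate.yml; `env` replaces Rails.env here.
SAMPLE_CONFIG = <<~'YAML'
  storage: :s3
  path: "<%= env %>/:class/:attachment/:id/:style/:filename"
  url: ':s3_domain_url'
  activated: <%= activated %>
YAML

# Render the ERB, parse the YAML, and symbolize the top-level keys.
def load_migrate_options(env:, activated:)
  rendered = ERB.new(SAMPLE_CONFIG).result_with_hash(env: env, activated: activated)
  YAML.safe_load(rendered, permitted_classes: [Symbol]).transform_keys(&:to_sym)
end

# Merge the S3 settings into the base options only when the switch is on.
def storage_options(base, migrate_options)
  return base unless migrate_options[:activated]
  base.merge(migrate_options)
end

base  = { styles: { thumb: ["125x125>"] } }
local = storage_options(base, load_migrate_options(env: "production", activated: false))
s3    = storage_options(base, load_migrate_options(env: "production", activated: true))

puts local.key?(:storage)  # false
puts s3[:storage].inspect  # :s3
puts s3[:path]             # production/:class/:attachment/:id/:style/:filename
```

With the flag off, the options hash is left untouched, so Paperclip keeps using its default filesystem storage.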

At this point the model is ready to switch to S3 once the migration is complete.

Migration

In order to migrate our assets to S3 we will need a custom task to perform the action. Here is the snippet responsible for it.

1.  namespace 'paperclip' do
2.
3.    desc "Migrate Paperclip attachments to S3"
4.    task :migrate => :environment do
5.      # aws.yml is keyed by Rails environment; each entry holds the bucket and credentials
6.      s3_options = YAML.load_file(File.join(Rails.root, 'config/aws.yml')).symbolize_keys
7.
8.      bucket_name = s3_options[Rails.env.to_sym]["bucket"]
9.      # This task uses the aws-sdk v1 API (the AWS namespace)
10.     AWS.config(
11.       :access_key_id => s3_options[Rails.env.to_sym]["access_key_id"],
12.       :secret_access_key => s3_options[Rails.env.to_sym]["secret_access_key"]
13.     )
14.
15.     s3 = AWS::S3.new
16.     bucket = s3.buckets[bucket_name]
17.
18.     # Class=Document:file,Email:logo selects the model:attachment pairs to migrate
19.     classes = ENV.fetch('Class', '').split(",")
20.
21.     classes.each do |class_info|
22.       begin
23.         class_name = class_info.split(":")[0]
24.         attachment_name = class_info.split(":")[1].downcase
25.
26.         class_def = class_name.camelize.constantize
27.
28.         puts "Migrating #{class_name}:#{attachment_name}..."
29.         unless class_def.exists?
30.           puts "#{class_name} is empty"
31.           next
32.         end
33.
34.         styles = class_def.first.send(attachment_name).styles.map{|style| style[0]}
35.
36.         class_def.find_each do |instance|
37.           attachment = instance.send(attachment_name)
38.           # Skip records with no attachment or already served from S3
39.           next if !attachment.exists? || attachment.url.include?("amazonaws")
40.
41.           styles.each do |style|
42.             attach = attachment.path(style.to_sym)
43.             filename = attach.split("/").last
44.             path = "#{Rails.env}/#{class_name.underscore.pluralize}/#{attachment_name.pluralize}/#{instance.id}/#{style}/#{filename}"
45.             puts "Storing #{style} #{filename} in S3..."
46.             # The block form closes the file handle once the upload finishes
47.             File.open(attach, "rb") { |file| bucket.objects[path].write(file, acl: :public_read) }
48.           end
49.         end
50.       rescue AWS::S3::Errors::NoSuchBucket => e
51.         puts "Creating the non-existing bucket: #{bucket_name}"
52.         s3.buckets.create(bucket_name)
53.         retry
54.       rescue StandardError => e
55.         puts "Ignoring #{class_name}: #{e.message}"
56.       end
57.       puts ""
58.     end
59.   end
60. end

Create a file named paperclip_migrate.rake in the lib/tasks folder and paste the code above into it.
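The task reads the bucket name and credentials from config/aws.yml, looked up by Rails environment. That file is not shown in this post, so the sketch below only illustrates the shape implied by the task's lookups; the bucket names, the credential placeholders, and the bucket_for helper are all assumptions.

```ruby
require "yaml"

# Assumed shape of config/aws.yml, inferred from how the task reads it.
# Bucket names and credentials are placeholders.
SAMPLE_AWS_YML = <<~YAML
  production:
    bucket: example-app-production
    access_key_id: YOUR_ACCESS_KEY_ID
    secret_access_key: YOUR_SECRET_ACCESS_KEY
  staging:
    bucket: example-app-staging
    access_key_id: YOUR_ACCESS_KEY_ID
    secret_access_key: YOUR_SECRET_ACCESS_KEY
YAML

# Mirror the task's lookup: symbolized environment keys,
# string keys inside each environment entry.
def bucket_for(env, yaml = SAMPLE_AWS_YML)
  options = YAML.safe_load(yaml).transform_keys(&:to_sym)
  options.fetch(env.to_sym).fetch("bucket")
end

puts bucket_for("production")  # example-app-production
puts bucket_for("staging")     # example-app-staging
```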

Run the task: rake paperclip:migrate Class=Document:file

Migrating Document:file...
Storing original file1.png in S3...
Creating the non-existing bucket: test-app

Migrating Document:file...
Storing original file1.png in S3...
Storing thumb file1.png in S3...
Storing original file2.png in S3...
Storing thumb file2.png in S3...

**After the migration finishes, we need to update s3_migrate.yml and activate S3 storage.**

Just set activated to true. Restart the web server and you are done!

Note 1: The path setting in s3_migrate.yml must match the key built on line 44 of the task. Change both according to your scenario.
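One way to check that the two stay in sync is to expand the configured pattern by hand and compare it with the key the task would build. This is a rough sketch: expand_pattern is a hypothetical helper, and the sample values (already pluralized, as the task would produce them) are made up.

```ruby
# Hypothetical helper: fill a Paperclip-style path pattern from a hash of values.
def expand_pattern(pattern, values)
  pattern.gsub(/:(\w+)/) { values.fetch(Regexp.last_match(1).to_sym).to_s }
end

# Sample values for one Document record; class and attachment are already pluralized.
values = { class: "documents", attachment: "files", id: 7, style: :thumb, filename: "file1.png" }

# Path as configured in s3_migrate.yml, with Rails.env replaced by "production".
configured = expand_pattern("production/:class/:attachment/:id/:style/:filename", values)

# Key as the task builds it on line 44.
task_key = "production/#{values[:class]}/#{values[:attachment]}/#{values[:id]}/#{values[:style]}/#{values[:filename]}"

puts configured              # production/documents/files/7/thumb/file1.png
puts configured == task_key  # true
```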

Note 2: After completing the migration and confirming that the assets were successfully copied to S3, you can delete the files that are still stored in public/system.

Conclusion

Let's go through our requirements for this problem and validate each one of them.

1 - We need to move all the model attachments

Because the task takes the model and attachment names as arguments, it can be run for whichever models are needed. For example:

    rake paperclip:migrate Class=Document:file,Document:file2,Email:logo,Email:footer

You can list as many Model:attachment pairs as you need.
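The parsing behind that argument is small enough to sketch on its own. The parse_targets helper below is a hypothetical name; it follows the same split-on-comma, split-on-colon steps as the task.

```ruby
# Split a Class=... value into [model, attachment] pairs,
# following the same steps the task applies to each entry.
def parse_targets(value)
  value.to_s.split(",").map do |entry|
    model, attachment = entry.split(":")
    [model, attachment.to_s.downcase]
  end
end

p parse_targets("Document:file,Email:logo")  # [["Document", "file"], ["Email", "logo"]]
p parse_targets(nil)                         # []
```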

Accomplished!

2 - We need to preserve all the styles

In our task, after we have checked that there is at least one object in our database, we can query it to retrieve all the defined styles:

    class_def.first.send(attachment_name).styles.map{|style| style[0]}

It will return what we expect:

⇨ [:original, :thumb]
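The reason style[0] yields the names is that mapping over a hash-like collection passes each entry as a [key, value] pair. A small sketch, using a plain hash as a stand-in for the attachment's styles:

```ruby
# Plain hash standing in for the attachment's styles (name => definition).
styles = { original: ["300x300>"], thumb: ["125x125>"] }

# Mapping over a hash yields [key, value] pairs, so style[0] is the style name.
names = styles.map { |style| style[0] }

p names                 # [:original, :thumb]
p names == styles.keys  # true
```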

Accomplished!

3 - This needs to be done without locking the server

The migration runs as a rake task, in a separate process from the web server, so the application stays available in production while the files are copied.

Accomplished!

4 - S3 bucket should be different according to Rails environments

In s3_migrate.yml the asset path starts with Rails.env, and the task reads the bucket name from the matching environment entry in aws.yml, so assets are stored separately for each environment the application runs in.

Accomplished!

5 - We need to have minimum downtime

We only need a single restart to the webserver! So we can say that the requirement was...

Accomplished!

And that's about everything you need to know to migrate Paperclip assets from a local server to Amazon S3.
