Recently, after deploying a web app on a dedicated server, we realized that the disk storage would be a future bottleneck to the business of the application.
Storage needs would grow rapidly with usage, since the app will eventually be used in many locations at once, each communicating with a central server and uploading hundreds of megabytes of assets.
We decided to move the storage away from our dedicated server to Amazon S3 to make the application more scalable.
Easy stuff, we could say at this stage. Just switch some configuration in Paperclip in order to use S3.
The first problem
Well, the problem was that the app was already in production, so the change required us to move all the existing assets to S3 in order to prevent losing data (obviously!).
A problem within the problem
During our research we found an easy fix to migrate all the assets to S3. The solution all over the Internet is quite self-explanatory: just send the folder where your assets are stored (the well-known public/system folder) to S3!
This would work if the url option had been explicitly defined in the Paperclip configuration beforehand. Ours was not, so Paperclip's default scheme applied, which stores each file at a location like:
public/system/documents/files/000/000/00{id}/{style}/{file}
Rather than spend time figuring out where /000/000/ and all the leading zeros came from, and in order to have a flexible solution, we decided to write our own migration strategy.
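For reference, the zero-padded segments come from Paperclip's :id_partition interpolation, which pads the record id to nine digits and splits it into groups of three. A minimal sketch of that rule for integer ids follows; the helper name id_partition is ours for illustration and is not a call into Paperclip itself:

```ruby
# Hypothetical helper mirroring the padding rule for integer ids:
# pad to nine digits, then split into three groups of three.
def id_partition(id)
  format("%09d", id).scan(/\d{3}/).join("/")
end

puts id_partition(1)        # 000/000/001
puts id_partition(1234567)  # 001/234/567
```

So a Document with id 1 ends up under public/system/documents/files/000/000/001/.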
Requirements
- We need to move all the model attachments;
- We need to preserve all the styles;
- This needs to be done without locking the server;
- S3 bucket should be different according to Rails environments (test, production, staging);
- **We need to have minimum downtime.**
Solution
First we need to be able to switch the Paperclip configuration to S3 after the migration, while still using our own server during the migration (we need it to read the attachment files).
Let's look at a typical model with attachments:
class Document < ActiveRecord::Base
  has_attached_file :file,
    styles: {
      original: ["300x300>"],
      thumb: ["125x125>"]
    },
    default_url: lambda { |image| ActionController::Base.helpers.asset_path('no_image.png') }
end
To prepare the model for the migration to S3 we need to add a switch that can be easily activated.
Let's create a config/s3_migrate.yml file with some configuration:
1. storage: :s3
2. s3_credentials: <%= "#{Rails.root}/config/aws.yml" %>
3. path: <%= "#{Rails.env}/:class/:attachment/:id/:style/:filename" %>
4. url: ':s3_domain_url'
5. activated: false
Lines 1 to 4 are the normal Paperclip configuration for S3 storage; line 5 adds the switch that activates S3.
If it's not activated, the application should serve the assets from the local server. Next we change our model definition in order to reflect these changes.
class Document < ActiveRecord::Base

  PAPERCLIP_STORAGE_OPTIONS = {
    styles: {
      original: ["300x300>"],
      thumb: ["125x125>"]
    },
    default_url: lambda { |image| ActionController::Base.helpers.asset_path('no_image.png') }
  }

  migrate_options = YAML.load(ERB.new(File.read("#{Rails.root}/config/s3_migrate.yml")).result).symbolize_keys
  PAPERCLIP_STORAGE_OPTIONS.merge! migrate_options if migrate_options[:activated]

  has_attached_file :file,
    PAPERCLIP_STORAGE_OPTIONS
end
In this snippet we store the Paperclip options in a hash (PAPERCLIP_STORAGE_OPTIONS) and load the file containing the migration options. We merge the S3 settings if activated is true.
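The load-and-merge step can be tried outside Rails. The sketch below is only an illustration under some assumptions: the helper name paperclip_options is hypothetical, the YAML is passed in as a string instead of being read from config/s3_migrate.yml, and plain Ruby's transform_keys stands in for ActiveSupport's symbolize_keys. Unlike the model above, it also drops the activated flag before merging.

```ruby
require "erb"
require "yaml"

# Hypothetical helper: renders ERB in the YAML text, then merges the
# settings into the base options only when `activated` is true.
def paperclip_options(base_options, yaml_text)
  rendered = ERB.new(yaml_text).result
  settings = YAML.safe_load(rendered, permitted_classes: [Symbol])
  settings = settings.transform_keys(&:to_sym)
  return base_options unless settings[:activated]

  # Drop the switch itself so only storage settings are merged.
  base_options.merge(settings.reject { |key, _| key == :activated })
end

base = { styles: { thumb: "125x125>" } }
off  = "storage: :s3\nactivated: false\n"
on   = "storage: :s3\npath: <%= 'test' %>/:class/:id\nactivated: true\n"

p paperclip_options(base, off)
p paperclip_options(base, on)
```

With the switch off, the base options come back unchanged; with it on, storage and path are added to them.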
We have now prepared our model to switch to S3 once the migration process is complete.
Migration
In order to migrate our assets to S3 we will need a custom task to perform the action. Here is the snippet responsible for it.
1. namespace 'paperclip' do
2.
3.   desc "migrate to s3"
4.   task :migrate => :environment do
5.     s3_options = YAML.load_file(File.join(Rails.root, 'config/aws.yml')).symbolize_keys
6.     migrate_options = YAML.load(ERB.new(File.read("#{Rails.root}/config/s3_migrate.yml")).result).symbolize_keys
7.
8.     bucket_name = s3_options[Rails.env.to_sym]["bucket"]
9.
10.     AWS.config(
11.       :access_key_id => s3_options[Rails.env.to_sym]["access_key_id"],
12.       :secret_access_key => s3_options[Rails.env.to_sym]["secret_access_key"]
13.     )
14.
15.     s3 = AWS::S3.new
16.     bucket = s3.buckets[bucket_name]
17.
18.     # Comma-separated Model:attachment pairs, e.g. Class=Document:file
19.     classes = (ENV['Class'] || "").split(",")
20.
21.     classes.each do |class_info|
22.       begin
23.         class_name, attachment_name = class_info.split(":")
24.         attachment_name = attachment_name.downcase
25.
26.         class_def = class_name.constantize # keep the original casing, e.g. UserProfile
27.
28.         puts "Migrating #{class_name}:#{attachment_name}..."
29.         unless class_def.exists?
30.           puts "#{class_name} is empty"
31.           next
32.         end
33.
34.         styles = class_def.first.send(attachment_name).styles.map{|style| style[0]}
35.
36.         class_def.find_each do |instance|
37.           attachment = instance.send(attachment_name)
38.           # Skip records with no local file or whose URL already points at S3.
39.           next if !attachment.exists? || attachment.url.include?("amazonaws")
40.
41.           styles.each do |style|
42.             local_path = attachment.path(style.to_sym)
43.             filename = File.basename(local_path)
44.             path = "#{Rails.env}/#{class_name.underscore.pluralize}/#{attachment_name.pluralize}/#{instance.id}/#{style}/#{filename}"
45.             puts "Storing #{style} #{filename} in S3..."
46.             # The block form closes the file handle once the upload is done.
47.             File.open(local_path, "rb") { |file| bucket.objects[path].write(file, acl: :public_read) }
48.           end
49.         end
50.       rescue AWS::S3::Errors::NoSuchBucket
51.         puts "Creating the non-existing bucket: #{bucket_name}"
52.         s3.buckets.create(bucket_name)
53.         retry
54.       rescue StandardError => e
55.         puts "Ignoring #{class_name}: #{e.message}"
56.       end
57.       puts ""
58.     end
59.   end
60. end
Create a file named paperclip_migrate.rake in the lib/tasks folder and paste the code above into it.
Run the task: rake paperclip:migrate Class=Document:file
Migrating Document:file...
Storing original file1.png in S3...
Creating the non-existing bucket: test-app
Migrating Document:file...
Storing original file1.png in S3...
Storing thumb file1.png in S3...
Storing original file2.png in S3...
Storing thumb file2.png in S3...
**After finishing the migration we need to update our s3_migrate.yml and activate the S3 storage.**
Just change activated to true, restart the web server, and you are done!
Note 1: The path setting in s3_migrate.yml must match the path built on line 44 of the task. Change it according to your scenario.
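To see why the two must agree, here is a simplified stand-in for the interpolation of the path pattern. It is not Paperclip's own implementation, and the interpolate helper and sample values are assumptions for illustration only:

```ruby
# Simplified stand-in for Paperclip's interpolation: replaces each
# :placeholder in the pattern with the matching value.
def interpolate(pattern, values)
  pattern.gsub(/:(\w+)/) { values.fetch(Regexp.last_match(1).to_sym).to_s }
end

pattern = "production/:class/:attachment/:id/:style/:filename"
values = {
  class: "documents", attachment: "files",
  id: 1, style: :thumb, filename: "file1.png"
}

puts interpolate(pattern, values)
# production/documents/files/1/thumb/file1.png
```

If the task uploaded a file under any other key, Paperclip would look for it at the interpolated location after the switch and would not find it.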
Note 2: After completing the migration and verifying that the assets were successfully migrated to S3, you can delete the files that are still stored in public/system.
Conclusion
Let's go through our requirements for this problem and validate each one of them.
1 - We need to move all the model attachments
Because the class and attachment names are passed to the task dynamically, it can be run for whichever models are needed, for example:
rake paperclip:migrate Class=Document:file,Document:file2,Email:logo,Email:footer and so on.
Accomplished!
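The parsing of the Class argument can be sketched on its own. The helper name parse_targets is hypothetical; it only mirrors the split on "," and ":" that the task performs:

```ruby
# Hypothetical helper: turns "Document:file,Email:logo" into
# [class_name, attachment_name] pairs, downcasing the attachment name.
def parse_targets(value)
  value.to_s.split(",").map do |pair|
    class_name, attachment_name = pair.strip.split(":", 2)
    [class_name, attachment_name.to_s.downcase]
  end
end

p parse_targets("Document:file,Email:Logo")
# [["Document", "file"], ["Email", "logo"]]
```

An unset variable (nil) simply yields an empty list, so the task has nothing to migrate.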
2 - We need to preserve all the styles
In our task, after we have checked that there is at least one object in our database, we can query it to retrieve all the defined styles:
class_def.first.send(attachment_name).styles.map{|style| style[0]}
It will return what we expect:
⇨ [:original, :thumb]
Accomplished!
3 - This needs to be done without locking the server
We use a rake task to perform the migration, so it runs as a separate process without interrupting the application in the production environment.
Accomplished!
4 - S3 bucket should be different according to Rails environments
The task reads the bucket name from the section of aws.yml that matches the current Rails.env, and in s3_migrate.yml the asset path is also prefixed with Rails.env, so the assets are stored according to the environment the application is running in.
Accomplished!
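The layout of aws.yml is not shown in this post. Judging from how the task reads it, it has one section per environment; the sketch below assumes that layout, and every value in it is a placeholder (only test-app appears in the task output above). The bucket_for helper is hypothetical:

```ruby
require "yaml"

# Assumed layout of config/aws.yml, inferred from how the task reads it.
# All values are placeholders.
AWS_YML = <<~YAML
  test:
    bucket: test-app
    access_key_id: PLACEHOLDER
    secret_access_key: PLACEHOLDER
  production:
    bucket: production-app
    access_key_id: PLACEHOLDER
    secret_access_key: PLACEHOLDER
YAML

# Hypothetical helper: returns the bucket configured for one environment.
def bucket_for(env, yaml_text = AWS_YML)
  YAML.safe_load(yaml_text).fetch(env.to_s).fetch("bucket")
end

puts bucket_for("test")
puts bucket_for(:production)
```

Using fetch means a missing environment section fails loudly with a KeyError instead of silently uploading to the wrong place.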
5 - We need to have minimum downtime
We only need a single restart of the web server! So we can say that the requirement was...
Accomplished!
And that's about everything you need to know to migrate Paperclip assets from a local server to Amazon S3.