Hello, I’m a web craftsman with a passion for the modern web. I build web applications and play with social services and communities.

May 1, 2011 at 2:51 am

In the past few months I have been confronted with a few web projects requiring scalable file storage solutions. One of them is my very own service comemories, which stores user-uploaded photos. Those photos are stored at their original size, along with a few generated thumbnails.

To deal with file uploads, most developers seem to use Amazon S3 these days, and so did I. It’s certainly a great service that pretty much just works, but with increasing traffic on comemories I started to feel a little uncomfortable about the way Amazon charges me. Basically, they let a month pass and then sum up how much storage I used, how much traffic I generated and how many API calls my app made.

So far so good, but what if my app suddenly sees tremendous success and many more people upload photos than I anticipated? S3 would scale just fine and handle all the traffic with no issues at all… and by the end of the month I would get a big surprise in the form of a credit card bill.

Of course it depends a lot on what you are trying to do, but if you are anything like me, running a web service more or less for fun and to learn about technology, you should really reconsider whether S3 is the right choice.

Let’s have a look at their pricing for a little example application:

Imagine we run a web service with 10.000 users who uploaded 100 photos each at 5MB per photo. This sums up to about 5TB of storage needed.

  • Amazon charges $0.14 per GB for the first TB of storage: $140
  • the other 4TB cost $0.125 per GB: +$500
  • every image has to be transferred in once, at $0.10 per GB: +$500
  • and for simplicity let’s say every image is transferred out once, at $0.15 per GB: +$750

A total of $1890 for the month the images are uploaded. Now let’s imagine no more photos are uploaded from this point on, but every photo is still accessed once per month. This means we would keep paying $1390 month after month.
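For the skeptics, here is a quick back-of-the-envelope script that reproduces those numbers. The rates are hardcoded 2011-era list prices implied by the example above, not current pricing, so treat them as assumptions:

    # Rough S3 bill for the example: 10.000 users x 100 photos x 5MB each.
    # All rates are 2011-era assumptions taken from the numbers above.
    storage_gb = 10_000 * 100 * 0.005  # 5MB per photo, 1000MB = 1GB -> 5000GB

    # Storage: $0.14/GB for the first TB, $0.125/GB for everything beyond.
    storage = min(storage_gb, 1000) * 0.14 + max(storage_gb - 1000, 0) * 0.125

    transfer_in = storage_gb * 0.10   # every image uploaded once ($0.10/GB)
    transfer_out = storage_gb * 0.15  # every image downloaded once ($0.15/GB)

    print(f"first month: ${storage + transfer_in + transfer_out:.0f}")  # $1890
    print(f"recurring:   ${storage + transfer_out:.0f}")                # $1390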

Wow! 5TB is not even that much. You could buy three 2TB HDDs for about 210€ altogether and get 6TB of storage, as a one-time cost instead of monthly charges. But OK, just buying plain HDDs is not really comparable to the service Amazon offers. What about rented dedicated servers, though?

Hetzner Online offers the EQ9 for 99€ per month, and the server comes with 4500GB of storage installed, which works out to 0.022€ per GB. So I could just get two of them and have 9TB of storage for less than 200€ per month. Now, Amazon offers some redundancy, so let’s get four of those servers instead and store everything twice. Voilà: redundant storage for under 400€ per month.
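The same sketch for the dedicated-server option, with Hetzner’s 2011 EQ9 price hardcoded as an assumption:

    # Dedicated-server option: Hetzner EQ9, 2011 pricing (assumption).
    eq9_eur = 99   # EUR per month per server
    eq9_gb = 4500  # installed storage in GB

    print(f"{eq9_eur / eq9_gb:.3f} EUR per GB per month")  # 0.022

    # Four servers run as two mirrored pairs: everything stored twice.
    servers = 4
    print(f"{servers * eq9_gb // 2}GB redundant storage "
          f"for {servers * eq9_eur} EUR/month")  # 9000GB for 396 EUR/month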

Let’s not forget that we actually get four powerful machines here, each with a Core i7 and 12GB of RAM! Those make pretty good app servers too. If we ran this on Amazon EC2 instead, it would cost us another fortune.

Don’t get me wrong here. I know a bunch of dedicated servers is no cool and fancy cloud with unlimited automatic scaling capabilities, and it requires quite a bit of operations know-how and work. But for web applications of small to medium size, with just one or two people working on them, this may just be the better solution. Big enterprises, on the other hand, could see some benefits from outsourcing all storage worries to an over-priced third party. But really, who can count themselves among those?

5 Responses to “Amazon S3: doing the math”

  1. Josh Clayton says:

    I totally agree about Amazon charges sneaking up on you, with no way to set a cap (which would be an awesome feature). Since it sounds like you’ve done the math, it seems reasonable to do one or more of the following:

    1. cap upload size at 500KB or 1MB
    2. limit the number of uploads per user per month
    3. start charging users

    Obviously, charging users specifically on comemories feels a bit odd since it’s so simple, but if a service you provide is really worth it (which I think comemories is), people will be glad to pay.

    Alternatively, feel free to impose some restrictions! Capping upload size is a quick and dirty way to save a ton on storage, and limiting the number of uploads will help too. Decreasing storage time (maybe to two weeks or a month) would also help.

    At any rate, I think the most helpful thing Amazon could do is let users set a maximum charge amount that, once met, causes API calls to start returning 403s and notifies the user that the limit has been reached.

  2. Tom says:

    Very nice article. Which solution did you choose for your photo hosting service? Did you check out other solutions?

  3. Herval Freire says:

    Hey there,

    Thanks for your comment on the RRS flag – thanks to it I jumped on and actually implemented some Paperclip support for it :)

    Work in progress here: https://github.com/thoughtbot/paperclip/issues/468

  4. “Imagine we run a web service with 10.000 users who uploaded 100 photos each at 5MB per photo. This sums up to about 5TB of storage needed.”

    10,000 * 100 = 1,000,000 images
    1M * 5MB = 5M, so it will sum up to 5GB, isn’t it?

  5. Matthias says:

    10.000 * 100 = 1.000.000
    1.000.000 * 5M = 5.000.000MB = 5.000GB = 5TB
