Monday, November 22, 2010

Goodbye Google App Engine (GAE)

   Choosing GAE as the platform four our project is a mistake which cost I estimate in about 15000€. Considering it's been my money, it is a "bit" painful.
   GAE is not exactly comparable to Amazon, Rackspace or any of this hosting services, because it is a  PaaS (Platform as a Service) rather than just a cluster of machines. This means that you just use the platform and gain scalability, high availability and all those things we want for our websites, without any other software architecture. Cool, isn't it.
   You do not pay until you get a lot of traffic to your site so it sounds very appealing for a small startup. I read about this in a book called "The web success startup guide" by Bob Walsh.
So yes, everything looked cool and we decided to go with this platform a couple of months ago.
It supports Python and Django (without the ORM) which we love so we tried it out. We made a spike, kind of 'hello world' and it was easy and nice to deploy.
When we started developing features we started realizing the hard limitations imposed but the platform:

1. It requires Python 2.5, which is really old. Using Ubuntu that means that you need a virtualenv or chroot with a separate environment in order to work with the SDK properly: Ok, just a small frustration.


2. You can't use HTTPS with your own domain (naked domain as they called) so secure connections should go though yourname.appspot.com: This just sucks.


3. No request can take more than 30 secons to run, otherwise it is stopped: Oh my god, this has been a pain in the ass all the time. When we were uploading data to the database (called datastore a no-sql engine) the uplaod was broken after 30 seconds so we have to split the files and do all kind of difficult stuff to manage the situation. Running background tasks (cron) have to be very very well engineered too, because the same rule applies. There are many many tasks that need to take more than 30 secons in website administration operations. Can you imagine?


4. Every GET or POST from the server to other site, is aborted if it has not finished within 5 seconds. You can configure it to wait till 10 seconds max. This makes impossible to work with Twitter and Facebook many times so you need intermediate servers. Again, this duplicates the time you need to accomplish what seemed to be a simple task


5. You can't use python libraries that are build on C, just libraries written in python: Forget about those great libraries you wanted to use.


6. There are no "LIKE" operator for queries. There is a hack that allows you to make a kind of "starts with" operator. So forget about text search in database. We had to work around this normalizing and filtering data in code which took as 4 times the estimated time for many features.


7. You can't join two tables. Forget about "SELECT table1, table2..." you just can't. Now try to think about the complexity this introduces in your source code in order to make queries.


8. Database is really slow. You have to read about how to separate tables using inheritance so that you can search in a table, get the key and then obtain its parent in order to avoid deserialization performance and all kind and weird things.


9. Database behavior is not the same in the local development server than in google servers. So you need some manual or selenium tests to run them against the cloud, because your unit and integration tests won't fail locally. Again this means more money.


10. If you need to query on a table asking for several fields, you need to create indexes. If those fields are big, it is easy to get the "Too many indexes" runtime exception.

Read more: El blog de Carlos Ble