2011-09-10

When the Cloud messes with you

by Forrest Sheng Bao http://fsbao.net

I think a big difference between cloud computing and grid or cluster computing is that you use the Cloud as storage that you access frequently. In this sense, the Cloud is our great friend. However, today I realised this could also be the point where the Cloud messes with you.

I created a lot of small files (in many deeply hierarchical folders) by mistake in a folder watched by Dropbox. I have 4 computers, all connected to Dropbox. While I was deleting the files on one computer, Dropbox was adding them to another computer, and later it synced the files I had just deleted back!

So I thought about the reason. This is what I got on my Linux server, whose files are always synchronized with my desktops by Dropbox:
$ python ~/bin/dropbox.py status
Uploading 13,904 files...
Indexing 24,132 files...
Downloading file list...
Too many files are the cause. This is what happened, to the best of my brain power:
  1. When file F is deleted on computer A, it won't be removed from the Cloud until the local synchronization client has indexed the change and/or had time to report it to the Cloud.
  2. Before that happens, the synchronization client on computer B downloads F.
  3. After A tells the Cloud to remove F, it takes some time for the Cloud to tell B to do the same. In my case, "some time" was long enough to cause problems.
  4. It is quite possible that before step 3 completes, B indexes F, considers it a new file, and uploads it.
  5. Since F is now a newly uploaded file, the Cloud tells A to download F.
  6. This may go on forever, maybe even as an infinite loop.
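
To convince myself that the order of events in steps 1-5 really matters, here is a toy Python sketch. It is only my guess at the behaviour described above, not how Dropbox actually works: the function run() and the event names ("download", "report_delete", "apply_delete", "index_upload") are all made up for illustration, and there is only one file, F.

```python
def run(events):
    """Replay (computer, action) events in a toy one-file sync model.

    This is a hypothetical model of steps 1-5, not Dropbox's real protocol.
    """
    cloud = {"F"}                     # F was uploaded earlier and is still on the server
    local = {"A": set(), "B": set()}  # A has already deleted F locally
    for computer, action in events:
        if action == "report_delete":   # the client tells the Cloud that F is gone
            cloud.discard("F")
        elif action == "download":      # the client pulls F if the Cloud still has it
            if "F" in cloud:
                local[computer].add("F")
        elif action == "apply_delete":  # the Cloud's delete notice reaches the client
            local[computer].discard("F")
        elif action == "index_upload":  # the client indexes F as a new file and uploads it
            if "F" in local[computer]:
                cloud.add("F")
    return cloud, local


# Steps 2-5 in the unlucky order: B uploads F before it hears about the deletion.
RACY = [("B", "download"), ("A", "report_delete"),
        ("B", "index_upload"), ("A", "download")]

# Same events, except the delete notice reaches B before B indexes F.
SAFE = [("B", "download"), ("A", "report_delete"), ("B", "apply_delete"),
        ("B", "index_upload"), ("A", "download")]

racy_cloud, racy_local = run(RACY)
safe_cloud, safe_local = run(SAFE)
print("racy order: F back on A?", "F" in racy_local["A"])
print("safe order: F back on A?", "F" in safe_local["A"])
```

In the unlucky order F ends up back on A, while in the other order F is gone everywhere. With thousands of files queued for indexing, the delete notice has plenty of chances to arrive too late.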
The solution is very easy: don't upload, index, and download the file list at the same time. The latency of going over all the files can cause problems. I finally calmed everything down by connecting only one computer to Dropbox at a time.

But Dropbox uses a non-conservative strategy for quick synchronization across computers.

The purpose of this blog post is not to attack Dropbox, which, I have no doubt, is a great company. But this is a problem when using the Cloud as your main storage.

(Picture) My precious Saturday afternoon:

I may have made mistakes in steps 1-5; please feel free to tell me. I didn't think it through very carefully.
