where does wanproxy cache files

Boxiang Pan aquarypbx at gmail.com
Fri Mar 8 23:13:44 PST 2013


Hi, Juli,

Thank you for the explanation; that makes a lot of sense!

On Fri, Mar 8, 2013 at 11:05 PM, Juli Mallett <juli at clockworksquid.com> wrote:

> On Fri, Mar 8, 2013 at 10:54 PM, Boxiang Pan <aquarypbx at gmail.com> wrote:
> > Hi, Juli,
> >
> > We have successfully used wanproxy to transfer files from the client to
> > the intranet through the server, and we managed to time the
> > transmission. The second time, it indeed transferred faster. We've also
> > managed to run wanproxy from our program. Thanks a lot for all the help!
> >
> > We have a few more questions about how wanproxy is implemented at a
> > high level.
> >
> > 1) Since it takes less time to transfer the second time, we assume that
> > the file is cached somewhere, so where is the file cached? Is it cached
> > on the "server", which is between the "client" and the "intranet"?
>
> As you may have noticed, WANProxy works independently of the protocol
> being used.  It has no notion of files, only data.
>
> Both client and server remember the data that has been sent between
> them in the past.  They remember it in blocks of 2KB, and each block
> has a unique 64-bit name derived from a hash of its data.  When data is
> being transferred between them, it is split up into blocks so that
> where possible they only transfer the 64-bit name for parts of the
> data being transferred, rather than the whole 2KB block.  Where data
> has been inserted, removed or changed, the blocks around it will still
> be replaced with their names, but the new or changed piece will be
> transferred for the first time.
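
The scheme described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not WANProxy's
actual code or wire format: the Peer class, block_name() and
wire_size() are hypothetical names, BLAKE2b is an arbitrary choice for
deriving the 64-bit name, and the byte-by-byte window search stands in
for whatever matching the real implementation uses.

```python
import hashlib

BLOCK = 2048  # 2KB blocks, as described in the message


def block_name(block: bytes) -> int:
    """64-bit name for a block (BLAKE2b is an assumption of this sketch)."""
    return int.from_bytes(hashlib.blake2b(block, digest_size=8).digest(), "big")


class Peer:
    """One end of the link; remembers blocks it has seen, keyed by name."""

    def __init__(self):
        self.seen = {}

    def _learn(self, raw: bytes):
        # Remember every full block inside a run of newly transferred data.
        for i in range(0, len(raw) - BLOCK + 1, BLOCK):
            block = raw[i:i + BLOCK]
            self.seen[block_name(block)] = block

    def encode(self, data: bytes):
        """Turn data into ("ref", name) and ("raw", bytes) items."""
        out, raw, i = [], bytearray(), 0
        while i + BLOCK <= len(data):
            window = data[i:i + BLOCK]
            name = block_name(window)
            if self.seen.get(name) == window:
                # Known block: flush pending new data, then send only the name.
                if raw:
                    out.append(("raw", bytes(raw)))
                    raw.clear()
                out.append(("ref", name))
                i += BLOCK
            else:
                # Unknown data: keep one byte as-is and slide the window.
                raw.append(data[i])
                i += 1
        raw += data[i:]
        if raw:
            out.append(("raw", bytes(raw)))
        # Learn the new data only after encoding, so every "ref" points at
        # a block the other side already knew before this message.
        for kind, payload in out:
            if kind == "raw":
                self._learn(payload)
        return out

    def decode(self, items) -> bytes:
        """Rebuild the data, learning new blocks the same way the encoder does."""
        data = bytearray()
        for kind, payload in items:
            if kind == "ref":
                data += self.seen[payload]
            else:
                data += payload
                self._learn(payload)
        return bytes(data)


def wire_size(items) -> int:
    """Bytes on the link: 8 per name, full length for new data."""
    return sum(8 if kind == "ref" else len(payload) for kind, payload in items)
```

With a client Peer and a server Peer, encoding 8KB of data once sends
all of it. Encoding it again with a few bytes inserted in the middle
sends names for the untouched blocks and only the disturbed region as
raw data, which is the behaviour the paragraph above describes.
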
>
> > 2) In the scenario where a client is periodically backing up a large
> > file, say 2GB, to the "intranet", but in each backup the file differs
> > from the previous version in only a small portion (100MB), does
> > wanproxy treat each backup as a completely different file and cache the
> > entire file again, or does it smartly update only the new 100MB in the
> > previously cached file? If so, how does wanproxy tell the difference?
>
> Why don't you try it?  :)  The above probably makes the answer
> obvious, but just to be clear:
>
> The data that are unchanged, that have been sent previously, will be
> replaced with their names, and the data which are new or which are
> changed will be transferred over the link and remembered for later
> use.  So if a different 100MB changes every time, you should still
> only need to send about 100MB of data plus the names that correspond
> to the rest of it.
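
As a rough check on "about 100MB of data plus the names", the
arithmetic can be worked through in Python. This is a back-of-envelope
estimate only: it assumes the changed data falls on whole 2KB blocks,
that a name costs exactly 8 bytes (64 bits), and that there is no other
protocol overhead. estimate_wire_bytes() is a hypothetical helper, not
part of WANProxy.

```python
BLOCK = 2048        # 2KB blocks, as in the message
NAME = 8            # a 64-bit name is 8 bytes
MIB = 1024 * 1024


def estimate_wire_bytes(total: int, changed: int) -> int:
    """Bytes sent if unchanged blocks become names and changed ones go whole."""
    changed_blocks = -(-changed // BLOCK)            # round up
    unchanged_blocks = total // BLOCK - changed_blocks
    return changed_blocks * BLOCK + unchanged_blocks * NAME


estimate = estimate_wire_bytes(2048 * MIB, 100 * MIB)
print(f"{estimate / MIB:.1f} MiB")  # prints "107.6 MiB"
```

Under these assumptions the names for the unchanged 1948MB add only
about 7.6MB on top of the 100MB of new data.
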
>
> > 3) If there have already been a lot of files cached on the "server", the
> > next time the "client" is sending a file to the "server", how does the
> > server know if that file has previously been cached?
>
> See above.  WANProxy does deduplication of TCP streams and is not file
> oriented in any way.  As far as it knows, it could be deduplicating an
> IRC session, E-Mail, web traffic, file replication, etc.
>
> > 4) Is there any reference document/ readme about the implementation of
> > wanproxy that we may refer to?
>
> I have sent other information to the mailing lists in the past.  You
> can also read about how 'rsync' works.  rsync is file-oriented, but
> the algorithm is basically the same.
>
> There is a little bit of information here, but it is out of date about
> protocol details:
>
> http://wanproxy.org/performance.shtml
>
> There have also been some posts to the mailing list, but there's not
> any one message that I would point you to.  I'm happy to answer any
> more questions.
>
> Thanks,
> Juli.
>



-- 
潘博翔
Boxiang Pan

Department of Electrical and Computer Engineering
University of California, San Diego
Tel: 858-999-7655