where does wanproxy cache files

Juli Mallett juli at clockworksquid.com
Thu Mar 14 08:40:16 PDT 2013


On Thu, Mar 14, 2013 at 12:41 AM, Boxiang Pan <aquarypbx at gmail.com> wrote:
> Hi, Juli,
>
> We came up with another question:
>
> We tested sending 2GB files to the "client". The second time, the
> transmission from "client" to "server" was faster, but from "server" to
> "intranet" it was still the same speed as the first time. So wanproxy is
> only optimizing traffic between "client" and "server", but not between
> "server" and "intranet", correct? If so, what is the purpose of the
> "intranet" in this setup?

That's correct.  "intranet" just represents the resource you want to
access over the WAN.  There are ways that WANProxy could provide
protocol-specific optimizations between "server" and "intranet", but
none are currently available.

> Thank you.
>
> On Fri, Mar 8, 2013 at 11:05 PM, Juli Mallett <juli at clockworksquid.com>
> wrote:
>>
>> On Fri, Mar 8, 2013 at 10:54 PM, Boxiang Pan <aquarypbx at gmail.com> wrote:
>> > Hi, Juli,
>> >
>> > We have successfully used wanproxy to transfer files from the client to
>> > the intranet through the server, and we managed to time the
>> > transmission. The second time, it indeed transferred faster. We've also
>> > managed to run wanproxy from our program. Thanks a lot for all the help!
>> >
>> > We have a few more questions about how wanproxy is implemented at the
>> > top level.
>> >
>> > 1) Since it takes less time to transfer the second time, we assume that
>> > the file is cached somewhere, so where is the file cached? Is it cached
>> > on the "server", which is between the "client" and the "intranet"?
>>
>> As you may have noticed, WANProxy works independently of the protocol
>> being used.  It has no notion of files, only data.
>>
>> Both client and server remember the data that has been sent between
>> them in the past.  They remember it in blocks of 2KB, and each block
>> has a unique 64-bit name derived from the block's contents.  When data is
>> being transferred between them, it is split up into blocks so that
>> where possible they only transfer the 64-bit name for parts of the
>> data being transferred, rather than the whole 2KB block.  Where data
>> has been inserted, removed or changed, the blocks around it will still
>> be replaced with their names, but the new or changed piece will be
>> transferred for the first time.
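
A rough sketch of that scheme in Python, to make the block/name idea
concrete.  This is not WANProxy's code: the Peer class, the ("ref", name)
and ("raw", bytes) token format, and the use of BLAKE2b to produce the
64-bit name are all assumptions made for illustration.  A real
implementation would also use a rolling hash instead of rehashing the
window at every byte offset as this sketch does.

```python
import hashlib

BLOCK = 2048  # block size in bytes, as described in the message


def name_of(block):
    # Hypothetical naming scheme: 64 bits of a BLAKE2b digest of the block.
    return int.from_bytes(hashlib.blake2b(block, digest_size=8).digest(), "big")


class Peer:
    """One end of the link; remembers blocks it has seen, keyed by name."""

    def __init__(self):
        self.blocks = {}  # 64-bit name -> 2KB block

    def remember(self, run):
        # Split a run of raw data into whole blocks and store each one.
        for i in range(0, len(run) - BLOCK + 1, BLOCK):
            block = run[i:i + BLOCK]
            self.blocks.setdefault(name_of(block), block)


def encode(sender, data):
    """Turn a byte stream into ("ref", name) and ("raw", bytes) tokens."""
    tokens, literal, i = [], bytearray(), 0

    def flush():
        # Emit any pending raw bytes and remember them for next time.
        if literal:
            run = bytes(literal)
            sender.remember(run)
            tokens.append(("raw", run))
            literal.clear()

    while i < len(data):
        window = data[i:i + BLOCK]
        name = name_of(window) if len(window) == BLOCK else None
        if name is not None and sender.blocks.get(name) == window:
            flush()
            tokens.append(("ref", name))  # known block: send only its name
            i += BLOCK
        else:
            literal.append(data[i])  # unknown data: send it as-is
            i += 1
    flush()
    return tokens


def decode(receiver, tokens):
    """Rebuild the stream, learning raw runs exactly as the sender did."""
    out = bytearray()
    for kind, value in tokens:
        if kind == "ref":
            out += receiver.blocks[value]
        else:
            out += value
            receiver.remember(value)
    return bytes(out)
```

With two Peer objects, encoding the same data a second time with a few
bytes inserted in the middle should produce names for the untouched blocks
and one short raw run around the insertion, because the scan advances one
byte at a time until it finds a known block again.
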
>>
>> > 2) In the scenario where a client is periodically backing up a large
>> > file, say 2GB, to the "intranet", but with each backup the file differs
>> > only in a small portion (100MB) from the previous version, does wanproxy
>> > treat each backup file as a completely different file and cache the
>> > entire file again, or does it smartly update only the new 100MB in the
>> > previously cached file? If so, how does wanproxy tell the difference?
>>
>> Why don't you try it?  :)  The above probably makes the answer
>> obvious, but just to be clear:
>>
>> The data that are unchanged, that have been sent previously, will be
>> replaced with their names, and the data which are new or which are
>> changed will be transferred over the link and remembered for later
>> use.  So if a different 100MB changes every time, you should still
>> only need to send about 100MB of data plus the names that correspond
>> to the rest of it.
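
A back-of-the-envelope version of that estimate, as a small Python
calculation.  It assumes 2GB means 2048 MiB, that the changed 100MB lines
up with block boundaries, and that each name costs exactly 8 bytes on the
wire with no framing overhead; the function name bytes_on_wire is made up
for this example.

```python
BLOCK = 2048         # bytes per block
NAME = 8             # bytes per 64-bit name
MIB = 1024 * 1024


def bytes_on_wire(total, changed):
    # Changed data is sent in full; every unchanged block costs one name.
    unchanged_blocks = (total - changed) // BLOCK
    return changed + unchanged_blocks * NAME


sent = bytes_on_wire(2048 * MIB, 100 * MIB)
print(round(sent / MIB, 1))  # prints 107.6
```

Under those assumptions the names for the unchanged 1948 MiB add under
8 MiB on top of the 100 MiB of new data.
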
>>
>> > 3) If there have already been a lot of files cached on the "server", the
>> > next time the "client" is sending a file to the "server", how does the
>> > server know if that file has previously been cached?
>>
>> See above.  WANProxy does deduplication of TCP streams and is not file
>> oriented in any way.  As far as it knows, it could be deduplicating an
>> IRC session, E-Mail, web traffic, file replication, etc.
>>
>> > 4) Is there any reference document/readme about the implementation of
>> > wanproxy that we may refer to?
>>
>> I have sent other information to the mailing lists in the past.  You
>> can also read about how 'rsync' works; rsync is file-oriented, but
>> the algorithm is basically the same.
>>
>> There is a little bit of information here, but it is out of date about
>> protocol details:
>>
>> http://wanproxy.org/performance.shtml
>>
>> There have also been some posts to the mailing list, but there's not
>> any one message that I would point you to.  I'm happy to answer any
>> more questions.
>>
>> Thanks,
>> Juli.
>
>
>
>
> --
>
> Boxiang Pan
>
> Department of Electrical and Computer Engineering
> University of California, San Diego
> Tel: 858-999-7655
>
>
