General questions...

Wed Feb 24 21:59:08 PST 2016

Hi Robert,

Great questions — here's a quick attempt at some answers.

On Wed, Feb 24, 2016 at 4:04 PM, Robert Jordan <robert at conductorio.com> wrote:
> Hi WanProxy Community,
>
> I'm researching WanProxy for possible inclusion in our SAAS product and I
> have a few questions (I have read the wanproxy.org site and a few years
> worth of this list so hopefully these haven't been covered, apologies if
> they have!):
>
> 1.  Are there options for encrypting the proxy to proxy connection?  I see
> that there has been work on SSH proxying which would meet this requirement,
> but unfortunately without the ability to authenticate with keys I don't
> think it will work for our use case.

Nope.  This would be easy to do, but has not yet been done.  It's a
matter of priorities — none of the WANProxy users seem to need this
very much, and it's easy to deal with isolation or security of a lot
of long links in other ways, and run WANProxy over that.  I have a
work-in-progress attempt at this, and while it would simplify some
other things, I'm a little reluctant to move forward without a user
who really cares about it.  I don't want to introduce a whole bunch of
configuration knobs that most users can't reason about (do you want to
authenticate the traffic, or encrypt it, or ...) that all impact
performance badly.

> 2.  What's the status of transparent proxy?  I see several references on the
> site and in the list, but no example configurations.

This was implemented for one downstream consumer of the WANProxy
codebase in a way that isn't directly applicable for WANProxy.  A
different company offered to pay for integration into mainline
WANProxy, but that company failed to pay for other work that was done
for them.  Again, the issue with doing this without a specific user
with specific needs comes down to configuration: how to let users make
the right policy decisions, while still providing meaningful tools.
This could be implemented in a few days any time anyone wants.  The
issue is that people don't consider the question of how they want
non-TCP traffic handled, i.e. should we pass through anything other
than TCP?  And do you want to target specific connections, or all of
them?  Or do you want to encapsulate and decapsulate all traffic and
tunnel over the network?  These are very different configurations, and
very difficult to expose.

Frankly, I want to take the time to add a more robust way to make
decisions configurable in WANProxy, rather than exposing the kind of
flat configuration we have right now.  Then all of that becomes much
easier, and we could just provide a few documented policies that
provide base functionality, while leaving it open to people to do more
complex things if they want to dig deep.

> 3.  How well can we expect the cache performance to scale?  We move a few
> hundred GB a day which I expect to be moderately "deduplicatable".  How
> expensive are cache lookups at this scale?  Are enhancements like SSD
> recommended?   Anyone with real world experience with this?

SSD is helpful.  I have real world experience here, but it depends a
lot on hardware.  The on-disk cache code scales well with faster
equipment, and memory caches can be very large on top of that, which
helps.  For very high transfer rates, you might want to improve the
code to defer flushing the cache to disk, batching the writes into
chunks in a worker thread.  Easy, but again a question of how you
expose that.
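
As a rough illustration of the deferred-flush idea (this is not WANProxy
code, and every name in it is hypothetical), here is a minimal Python
sketch in which the hot path only enqueues cache entries and a single
worker thread writes them out in chunks.  The `write_batch` callback
stands in for whatever actually puts a chunk on disk.

```python
import queue
import threading


class BatchedFlusher:
    """Hypothetical helper: defer cache writes and flush them in chunks."""

    def __init__(self, write_batch, max_batch=64):
        # write_batch is an assumed callback that persists a list of
        # (key, segment) pairs in one operation.
        self._write_batch = write_batch
        self._max_batch = max_batch
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, key, segment):
        # Called on the hot path; only queues the entry, no disk I/O here.
        self._queue.put((key, segment))

    def close(self):
        # A None sentinel tells the worker to flush what is left and exit.
        self._queue.put(None)
        self._worker.join()

    def _run(self):
        done = False
        while not done:
            item = self._queue.get()  # block until there is work
            if item is None:
                break
            batch = [item]
            # Drain whatever else is already queued, up to max_batch.
            while len(batch) < self._max_batch:
                try:
                    item = self._queue.get_nowait()
                except queue.Empty:
                    break
                if item is None:
                    done = True
                    break
                batch.append(item)
            self._write_batch(batch)
```

The chunk size and the decision to drain only what is already queued
are arbitrary choices for the sketch; a real implementation would also
have to decide how much unflushed data it is willing to lose on a crash.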

The actual deduplication stuff scales pretty well.  I'm hopeful to get
a week or so at some point this year to finish making the XCodec code
multi-threaded, which is mostly straightforward, though some changes
to the encoder and decoder to batch their lookups would help reduce
contention.
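
To make the batching point concrete, here is a small Python sketch (not
the XCodec implementation; the class and method names are assumptions
made up for illustration).  A shared hash-to-segment dictionary is
guarded by one lock, and a caller resolves a whole list of hashes under
a single acquisition instead of taking the lock once per hash.

```python
import threading


class SharedDictionary:
    """Hypothetical shared hash -> segment map guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._segments = {}
        # Counts lookup-side acquisitions so the effect of batching is visible.
        self.lock_acquisitions = 0

    def insert(self, segment_hash, segment):
        with self._lock:
            self._segments[segment_hash] = segment

    def lookup(self, segment_hash):
        # One lock acquisition per hash: contended when many threads decode.
        with self._lock:
            self.lock_acquisitions += 1
            return self._segments.get(segment_hash)

    def lookup_batch(self, segment_hashes):
        # One lock acquisition for the whole batch; misses come back as None.
        with self._lock:
            self.lock_acquisitions += 1
            return [self._segments.get(h) for h in segment_hashes]
```

With this shape, a decoder thread would collect the hashes referenced
by a frame and resolve them together, so the lock is taken once per
frame rather than once per reference.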

> 4.  I've seen a few requests in the list for configuration help /
> documentation.  I'd like to reiterate this request.  The few samples on the
> site and the one in github are certainly helpful and somewhat
> understandable, but a more in-depth explanation of the general concepts and
> specific directives would be a huge help for newcomers like me.

Could you say more?  I don't know at what level you're looking for
information here.  Most configurations are just a matter of paring
down or slightly expanding one of the examples, and there's little
utility in doing more than that.  Again, a real language to make
decisions and direct behaviour is really needed here.

> 5.  (WRT #4) The 'monitor' function looks interesting, but again it's not
> self explanatory.  Is it possible to use this interface to interrogate
> metrics like deduplication and compression metrics?  My cursory examination
> seemed to only include configuration info.

Run some traffic through and look again; you should see the count of
bytes in and bytes out for each codec, which are real-time 64-bit
counters, and should give a reliable ratio if your front-end wants to
expose that.
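
For a front-end that polls those counters, a minimal Python sketch of
the ratio calculation follows.  How the counters are fetched from the
monitor is not shown, the function names are hypothetical, and which
counter represents the raw side and which the encoded side is an
assumption you would need to check against your own setup.

```python
COUNTER_MODULUS = 2 ** 64  # the counters are described as 64-bit


def counter_delta(previous, current):
    """Bytes counted between two samples, tolerating 64-bit wraparound."""
    return (current - previous) % COUNTER_MODULUS


def interval_ratio(previous_sample, current_sample):
    """Return bytes_out / bytes_in over the interval between two samples.

    Each sample is an assumed (bytes_in, bytes_out) pair read for one codec.
    Returns None if no input bytes were counted during the interval.
    """
    delta_in = counter_delta(previous_sample[0], current_sample[0])
    delta_out = counter_delta(previous_sample[1], current_sample[1])
    if delta_in == 0:
        return None
    return delta_out / delta_in
```

Working from the difference between two samples, rather than from the
raw totals, gives a ratio for recent traffic instead of an average over
the whole lifetime of the process.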

> 6.  There are a number of applications in the programs directory with no
> mention in wanproxy.org documentation (but that have been updated recently).
> Can anyone explain these?  Fwdproxy in particular piques my interest, but
> even after examining the code a bit I can't decipher it's purpose.

Most of them are not very user-serviceable.  It's worth noting that
there are several products built on WANProxy that are not related to
WANProxy itself, and so it's useful to have examples (though many of
them are rotted) or other programs which exercise behaviour they need,
or which are otherwise useful in development.  So they tend to not be
worth documenting, and certainly there's nothing like the support for
them that there is for WANProxy.

fwdproxy is just an implementation of another tool, which I thought
was open source but may have been part of some other proprietary
package.  It just listens for connections on a given interface (IP and
port) and forwards them to the configured IP and port.  It does
nothing else.  A customer uses it, and it's a simple example of how to
build a useful application on the codebase without some of the
heavier moving parts.
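
For readers unfamiliar with this kind of tool, the behaviour described
(listen on one IP and port, forward each connection to another) can be
sketched in a few lines of Python with asyncio.  This is an independent
illustration, not fwdproxy's code or configuration; the function names
and the 64 KiB read size are assumptions.

```python
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    finally:
        # Simplification: closing here tears down the whole connection
        # rather than doing a TCP half-close.
        writer.close()


async def handle_client(client_reader, client_writer, target_host, target_port):
    """Connect to the target and relay traffic in both directions."""
    try:
        upstream_reader, upstream_writer = await asyncio.open_connection(
            target_host, target_port
        )
    except OSError:
        client_writer.close()
        return
    await asyncio.gather(
        pipe(client_reader, upstream_writer),
        pipe(upstream_reader, client_writer),
        return_exceptions=True,
    )


async def serve_forwarder(listen_host, listen_port, target_host, target_port):
    """Start listening and return the asyncio server object."""

    async def on_connect(reader, writer):
        await handle_client(reader, writer, target_host, target_port)

    return await asyncio.start_server(on_connect, listen_host, listen_port)
```

A caller would await `serve_forwarder(...)` with its own addresses and
then call `serve_forever()` on the returned server.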

> Sorry for the long list of questions, but the project looks quite promising
> and I'm excited to see if we can use it for our purposes.

Thanks for the questions; they're good ones!  It's worth keeping in
mind that WANProxy is pretty much just the work of one person (me),
and that I am badly oversubscribed.  I manage to improve some of the
infrastructure and keep things running on the backs of some of the
other projects I've alluded to, but since my own needs for
deduplication are limited, there's little incentive to develop better
front-end management, reporting, configuration, etc.  There are things I
want to do in all those spaces, but it turns out to be tough to make
time for them.

I hope that helps.

Thanks,
Juli.

PS: You may have detected a little frustration in some of my answers,
too, because of the ways that some commercial users of WANProxy have
engaged with the project.  For example, I have done substantial work
on the basis of promised payment that I had to take a loss on because
I wasn't willing to withhold the code from the community, and the
backer decided to just walk away after committing funds, and after I
committed time.  I used to be a bit more willing to jump to provide
things that people expressed an interest in, but now I tend to work on
the things that I care most about, and only when I happen to have time
for them.  This means WANProxy isn't as accessible as I would like it
to be, that it isn't as good at meeting the needs of people I would
genuinely like to help, but I have made my peace for the time being
with the high barriers to use it presents.  I do sincerely want to
improve those things, but left to my own devices it will probably take
quite some time.  I'm more likely to work on optimizations that
benefit both WANProxy and other projects that share code with it than
I am to make functionality improvements to WANProxy itself.