General questions...

Fri Feb 26 12:13:37 PST 2016

Hi Juli,

Thanks for your answers (and of course for your work on WANProxy!).  Just a
follow-up on the issue of configuration examples and documentation:

Here are some items that aren't clear (even after reading most of the
source code):

1.  What is the definition of an interface in WANProxy?  Do these
correspond to physical or logical host interfaces in some way?  It seems
that they define listeners or transmitters with hostname/IP and port number
pairs, but it's not clear.

2.  In the posting where you announced persistent caching, you mention that
it's possible to configure a raw disk as a cache device, but setting the
path attribute to a device path (e.g. /dev/sdb) results in an error.  How
would one configure a raw device?

3.  The proxy config type accepts a 'type' attribute of either TCP,
TCP-TCP, SSH, or SSH-SSH.  I don't understand the difference between the
doubled types and the singular (TCP vs. TCP-TCP).  It seems that they're
both defined as the same type in the code (e.g. WANProxyConfigProxyTypeTCPTCP).
Does this imply that one is legacy and that they're the same?

Thanks for any help with these!

As for your comment about frustration with commercial users: very
understandable!  I've had very similar experiences with projects I've worked
on in the past.  I only hope we can be better OSS citizens!

-robert

On Wed, Feb 24, 2016 at 9:59 PM, Juli Mallett <juli at clockworksquid.com>
wrote:

> Hi Robert,
>
> Great questions — here's a quick attempt at some answers.
>
> On Wed, Feb 24, 2016 at 4:04 PM, Robert Jordan <robert at conductorio.com>
> wrote:
> > Hi WanProxy Community,
> >
> > I'm researching WanProxy for possible inclusion in our SAAS product and I
> > have a few questions (I have read the wanproxy.org site and a few years'
> > worth of this list, so hopefully these haven't been covered; apologies if
> > they have!):
> >
> > 1.  Are there options for encrypting the proxy to proxy connection?  I see
> > that there has been work on SSH proxying which would meet this requirement,
> > but unfortunately without the ability to authenticate with keys I don't
> > think it will work for our use case.
>
> Nope.  This would be easy to do, but has not yet been done.  It's a
> matter of priorities — none of the WANProxy users seem to need this
> very much, and it's easy to deal with isolation or security of a lot
> of long links in other ways, and run WANProxy over that.  I have a
> work-in-progress attempt at this, and while it would simplify some
> other things, I'm a little reluctant to move forward without a user
> who really cares about it.  I don't want to introduce a whole bunch of
> configuration knobs that most users can't reason about (do you want to
> authenticate the traffic, or encrypt it, or ...) that all impact
> performance badly.
>
> > 2.  What's the status of transparent proxy?  I see several references on the
> > site and in the list, but no example configurations.
>
> This was implemented for one downstream consumer of the WANProxy
> codebase in a way that isn't directly applicable for WANProxy.  A
> different company offered to pay for integration into mainline
> WANProxy, but that company failed to pay for other work that was done
> for them.  Again, the issue for doing this without a specific user
> with specific needs is down to configuration: how to let users make
> the right policy decisions, while still providing meaningful tools.
> This could be implemented in a few days any time anyone wants.  The
> issue is that people don't consider the question of how they want
> non-TCP traffic handled, i.e. should we pass through anything other
> than TCP?  And do you want to target specific connections, or all of
> them?  Or do you want to encapsulate and decapsulate all traffic and
> tunnel over the network?  These are very different configurations, and
> very difficult to expose.
>
> Frankly, I want to take the time to add a more robust way to make
> decisions configurable in WANProxy, rather than exposing the kind of
> flat configuration we have right now.  Then all of that becomes much
> easier, and we could just provide a few documented policies that
> provide base functionality, while leaving it open to people to do more
> complex things if they want to dig deep.
>
> > 3.  How well can we expect the cache performance to scale?  We move a few
> > hundred GB a day which I expect to be moderately "deduplicatable".  How
> > expensive are cache lookups at this scale?  Are enhancements like SSD
> > recommended?   Anyone with real world experience with this?
>
> SSD is helpful.  I have real-world experience here, but it depends a
> lot on hardware.  The on-disk cache code scales well with faster
> equipment, and memory caches can be very large on top of that, which
> helps.  For very high transfer rates, you might want to change the
> code to defer flushing the cache to disk, writing it out in chunks
> from a worker thread.  That's easy, but again it raises the question
> of how you expose it.
>
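To make the deferred, chunked flushing idea concrete, here is a minimal Python sketch. It is not how WANProxy's C++ cache code is structured today, and `BackgroundFlusher`, `submit`, and `chunk_size` are hypothetical names. The hot path only enqueues segments. A single worker thread gathers them into chunks and writes each chunk to the backing store in one call.

```python
import io
import queue
import threading
from typing import BinaryIO, List, Optional


class BackgroundFlusher:
    """Hypothetical helper: queue segments, write them to `sink` in chunks from a worker thread."""

    def __init__(self, sink: BinaryIO, chunk_size: int = 64) -> None:
        self._sink = sink
        self._chunk_size = chunk_size  # segments per write; an assumed tuning knob
        self._queue: "queue.Queue[Optional[bytes]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, segment: bytes) -> None:
        # Hot path: only enqueues, never touches the disk.
        self._queue.put(segment)

    def close(self) -> None:
        # A None sentinel tells the worker to write any partial chunk and stop.
        self._queue.put(None)
        self._worker.join()

    def _run(self) -> None:
        pending: List[bytes] = []
        while True:
            segment = self._queue.get()
            if segment is None:
                break
            pending.append(segment)
            if len(pending) >= self._chunk_size:
                self._sink.write(b"".join(pending))  # one write per full chunk
                pending.clear()
        if pending:
            self._sink.write(b"".join(pending))  # leftover partial chunk
        self._sink.flush()


# Usage: an in-memory sink stands in for the on-disk cache file.
sink = io.BytesIO()
flusher = BackgroundFlusher(sink, chunk_size=4)
for i in range(10):
    flusher.submit(bytes([i]) * 2)
flusher.close()
```

The design keeps disk latency off the forwarding path. The open question Juli raises, how to expose knobs such as `chunk_size`, still applies.
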
> The actual deduplication stuff scales pretty well.  I'm hoping to get
> a week or so at some point this year to finish making the XCodec code
> multi-threaded, which is mostly straightforward, though some changes
> to the encoder and decoder to batch their lookups help reduce
> contention.
>
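Here is a Python sketch of why batching lookups reduces contention. It does not model XCodec's actual data structures. `SegmentTable`, `lookup`, and `lookup_batch` are hypothetical names for a shared hash-to-segment table behind a single lock. The only point is that a batch of N lookups takes the lock once instead of N times.

```python
import threading
from typing import Dict, Iterable, List, Optional


class SegmentTable:
    """Hypothetical shared table mapping segment hashes to segment bytes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._segments: Dict[int, bytes] = {}
        self.lock_acquisitions = 0  # counted only to make the comparison visible

    def insert(self, key: int, segment: bytes) -> None:
        with self._lock:
            self.lock_acquisitions += 1
            self._segments.setdefault(key, segment)

    def lookup(self, key: int) -> Optional[bytes]:
        # One lock round-trip per hash: threads contend on every call.
        with self._lock:
            self.lock_acquisitions += 1
            return self._segments.get(key)

    def lookup_batch(self, keys: Iterable[int]) -> List[Optional[bytes]]:
        # Resolve a whole batch of hashes under a single acquisition.
        wanted = list(keys)
        with self._lock:
            self.lock_acquisitions += 1
            return [self._segments.get(k) for k in wanted]
```

Python's GIL makes this an illustration rather than a benchmark. In a multi-threaded C++ encoder, the benefit would come from fewer lock hand-offs between threads.
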
> > 4.  I've seen a few requests in the list for configuration help /
> > documentation.  I'd like to reiterate this request.  The few samples on the
> > site and the one in github are certainly helpful and somewhat
> > understandable, but a more in-depth explanation of the general concepts and
> > specific directives would be a huge help for newcomers like me.
>
> Could you say more?  I don't know at what level you're looking for
> information here.  Most configurations are just a matter of paring
> down or slightly expanding one of the examples, and there's little
> utility in doing more than that.  Again, a real language to make
> decisions and direct behaviour is really needed here.
>
> > 5.  (WRT #4) The 'monitor' function looks interesting, but again it's not
> > self-explanatory.  Is it possible to use this interface to interrogate
> > deduplication and compression metrics?  My cursory examination
> > seemed to only include configuration info.
>
> Run some traffic through and look again; you should see the count of
> bytes in and bytes out for each codec, which are real-time 64-bit
> counters, and should give a reliable ratio if your front-end wants to
> expose that.
>
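A front end could turn those counters into a ratio for each sampling interval. The Python sketch below assumes the bytes-in and bytes-out integers have already been read out of the monitor output. Fetching and parsing that output is not shown, the function names are hypothetical, and the sample numbers are made up. Differencing modulo 2**64 keeps an interval correct across a single wraparound of a 64-bit counter. The sketch also assumes that "in" is what a codec consumed and "out" is what it produced, so an encoder's ratio above 1 means the link carried fewer bytes than the application sent.

```python
from typing import Optional

COUNTER_MODULUS = 2 ** 64  # the monitor counters are described as 64-bit


def counter_delta(previous: int, current: int) -> int:
    """Bytes counted between two samples, tolerating a single 64-bit wraparound."""
    return (current - previous) % COUNTER_MODULUS


def codec_ratio(bytes_in: int, bytes_out: int) -> Optional[float]:
    """Bytes entering a codec per byte leaving it; None until something has left."""
    if bytes_out == 0:
        return None
    return bytes_in / bytes_out


# Made-up (in, out) samples for one codec, taken a few seconds apart.
earlier = (1_000_000, 250_000)
later = (3_000_000, 650_000)
interval_ratio = codec_ratio(counter_delta(earlier[0], later[0]),
                             counter_delta(earlier[1], later[1]))
```
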
> > 6.  There are a number of applications in the programs directory with no
> > mention in the wanproxy.org documentation (but that have been updated
> > recently).  Can anyone explain these?  Fwdproxy in particular piques my
> > interest, but even after examining the code a bit I can't decipher its
> > purpose.
>
> Most of them are not very user-serviceable.  It's worth noting that
> there are several products built on WANProxy that are not related to
> WANProxy itself, and so it's useful to have examples (though many of
> them are rotted) or other programs which exercise behaviour they need,
> or which are otherwise useful in development.  So they tend to not be
> worth documenting, and certainly there's nothing like the support for
> them that there is for WANProxy.
>
> fwdproxy is just an implementation of another tool, which I thought
> was open source but may have been part of some other proprietary
> package.  It just listens for connections on a given interface (IP and
> port) and forwards them to the configured IP and port.  It does
> nothing else.  A customer uses it, and it's a simple example of how to
> build a useful application on the codebase without some of the
> heavier moving parts.
>
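The behaviour described for fwdproxy can be sketched with Python's asyncio: accept connections on one IP and port, relay them to a configured IP and port, and do nothing else. This is not fwdproxy's source, which is C++ built on WANProxy's own event system. `pipe`, `make_handler`, and `start_forwarder` are hypothetical names, and error handling is deliberately minimal.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF, then half-close the other side."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()  # pass the EOF along so the peer can finish replying
    except OSError:
        pass  # peer reset or went away; the handler closes both sockets


def make_handler(target_host: str, target_port: int):
    """Build a connection callback that forwards each client to one fixed target."""

    async def handle(client_reader: asyncio.StreamReader,
                     client_writer: asyncio.StreamWriter) -> None:
        try:
            upstream_reader, upstream_writer = await asyncio.open_connection(
                target_host, target_port)
        except OSError:
            client_writer.close()  # target unreachable: drop the client
            return
        # Relay both directions concurrently until each side has sent EOF.
        await asyncio.gather(pipe(client_reader, upstream_writer),
                             pipe(upstream_reader, client_writer))
        client_writer.close()
        upstream_writer.close()

    return handle


async def start_forwarder(listen_host: str, listen_port: int,
                          target_host: str, target_port: int) -> asyncio.AbstractServer:
    """Listen on (listen_host, listen_port) and forward to (target_host, target_port)."""
    return await asyncio.start_server(
        make_handler(target_host, target_port), listen_host, listen_port)
```

To run this standalone, await `start_forwarder(...)` inside an event loop and then call `serve_forever()` on the returned server object.
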
> > Sorry for the long list of questions, but the project looks quite promising
> > and I'm excited to see if we can use it for our purposes.
>
> Thanks for the questions; they're good ones!  It's worth keeping in
> mind that WANProxy is pretty much just the work of one person (me),
> and that I am badly oversubscribed.  I manage to improve some of the
> infrastructure and keep things running on the backs of some of the
> other projects I've alluded to, but since my own needs for the
> deduplication are limited, there's little incentive to develop better
> front-end management, reporting, and configuration, etc.  There are
> things I want to do in all those spaces, but it turns out to be tough
> to make time for them.
>
> I hope that helps.
>
> Thanks,
> Juli.
>
> PS: You may have detected a little frustration in some of my answers,
> too, because of the ways that some commercial users of WANProxy have
> engaged with the project.  For example, I have done substantial work
> on the basis of promised payment that I had to take a loss on because
> I wasn't willing to withhold the code from the community, and the
> backer decided to just walk away after committing funds, and after I
> committed time.  I used to be a bit more willing to jump to provide
> things that people expressed an interest in, but now I tend to work on
> the things that I care most about, and only when I happen to have time
> for them.  This means WANProxy isn't as accessible as I would like it
> to be, that it isn't as good at meeting the needs of people I would
> genuinely like to help, but I have made my peace for the time being
> with the high barriers to use it presents.  I do sincerely want to
> improve those things, but left to my own devices it will probably take
> quite some time.  I'm more likely to work on optimizations that
> benefit both WANProxy and other projects that share code with it than
> I am to make functionality improvements to WANProxy itself.
>