Freezed download via Wanproxy

Juli Mallett juli at clockworksquid.com
Fri Mar 16 16:47:04 PDT 2018


Hi Ivan,

I've worked on that code pretty extensively in the past, and I'm pretty
sure I tested a wide set of circumstances, but it's certainly possible
there's some missing edge case.  I'd suggest looking at the code related
to XCODEC_PIPE_OP_EOS and XCODEC_PIPE_OP_EOS_ACK.  If I were testing this,
and trying to reproduce it, my first step would be to put INFO or DEBUG
statements throughout the code, and to watch what happens in as simple a
test case as possible, to determine the correct behaviour.

Note that you have to be very slow and methodical in changing these things,
as you can easily make a change which will close the connection in your
case, but which is wrong in the case where shutdown(2) is being used on a
connection which may, in fact, be long-lived.  That's the sort of mistake
that people tend to make in working on middleboxes and proxies:
overfitting.  In this case, it sounds like you're reproducing a case where
the EOS machinery isn't running properly, but without digging into it, it's
hard to be sure what's being too conservative, and how to fix it without
breaking other things.  If I were reproducing and testing the issue, my
expectation would be that it would come out to be a fairly simple fix in
most cases, but I've been wrong about that before.
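
To make the shutdown(2) point concrete, here is a minimal sketch in Python
using a local socket pair.  It is not WANProxy code, and the name
half_close_demo is made up for the illustration.  It shows that one side
can finish sending and still receive a reply afterwards, so seeing
end-of-stream in one direction is not by itself a reason to close the
whole connection.

```python
import socket


def half_close_demo():
    """Show that a half-closed stream can still carry data the other way."""
    a, b = socket.socketpair()
    try:
        a.sendall(b"request")
        # shutdown(2) with SHUT_WR: "a" is done sending, but can still read.
        a.shutdown(socket.SHUT_WR)

        got = b.recv(1024)   # the data sent before the half-close
        eof = b.recv(1024)   # b"" means end-of-stream in the a -> b direction

        # The b -> a direction is unaffected by the half-close.
        b.sendall(b"late reply")
        reply = a.recv(1024)
        return got, eof, reply
    finally:
        a.close()
        b.close()
```

A proxy that closed both directions as soon as it saw the empty read would
drop the late reply in this example, which is the kind of overfitted fix to
watch out for.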

I'd be shocked if the polling code for kqueue were wrong, and mildly
surprised if it were wrong for epoll, given how extremely widely deployed
and tested that code is.  Your assessment that it's probably in the XCodec
protocol stuff is probably right, and I hope any of this is helpful to
you.  It sounds like you're an accomplished programmer working on WANProxy,
so I'm sure you'll be able to figure it out.  If you run in verbose (-v)
mode, with debugging compiled in, you should see that there are already some
debugging statements around these cases.  Where there might be some loss of
fidelity would be in how errors, rather than simply ordinary end-of-stream,
propagate into the pipe system.  I've done a lot of testing and work on
related things, mostly using libuinet, that isn't part of the open source
version of WANProxy, so if I had to guess about a location for an issue
outside of XCodec, that's where I'd think about looking.  For example, the
case where Splice::complete is called with an error: the underlying
connections should be torn down, but it's possible that's not happening for
some reason.
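
The distinction between an ordinary end-of-stream and an error can be
sketched as follows.  This is a toy relay in Python over loopback TCP, under
stated assumptions: it is not WANProxy's Splice, the names tcp_pair, relay,
demo_clean_eos and demo_error are hypothetical, and the SO_LINGER packing
assumes a POSIX system.  On a clean end-of-stream it forwards only the
half-close; on an error it tears down both connections.

```python
import socket
import struct


def tcp_pair():
    """Return a connected (client, server) pair of loopback TCP sockets."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def relay(src, dst):
    """Copy src to dst until src ends; report how it ended."""
    try:
        while True:
            chunk = src.recv(4096)
            if not chunk:
                # Ordinary end-of-stream: forward only the half-close and
                # leave the reverse direction alone.
                dst.shutdown(socket.SHUT_WR)
                return "eos"
            dst.sendall(chunk)
    except OSError:
        # An error is not an end-of-stream: tear down both connections so
        # that the far side is not left waiting forever.
        src.close()
        dst.close()
        return "error"


def demo_clean_eos():
    up_client, up_server = tcp_pair()
    down_client, down_server = tcp_pair()
    up_client.sendall(b"data")
    up_client.shutdown(socket.SHUT_WR)
    outcome = relay(up_server, down_client)
    received = down_server.recv(4096)
    still_open = down_client.fileno() != -1
    for s in (up_client, up_server, down_client, down_server):
        s.close()
    return outcome, received, still_open


def demo_error():
    up_client, up_server = tcp_pair()
    down_client, down_server = tcp_pair()
    # SO_LINGER with a zero timeout makes close() send a reset (RST)
    # instead of an orderly FIN, so the reader sees an error.
    up_client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                         struct.pack("ii", 1, 0))
    up_client.close()
    outcome = relay(up_server, down_client)
    torn_down = up_server.fileno() == -1 and down_client.fileno() == -1
    peer_sees = down_server.recv(4096)  # b"" once the relay has closed
    down_server.close()
    return outcome, torn_down, peer_sees
```

In the error case the downstream peer gets an end-of-file promptly instead
of hanging, which is the behaviour one would hope to see when an error
reaches the splice.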

Again, just be careful: when changing this kind of thing, overfitting is
extremely easy to do.  Good luck, and I look forward to hearing what you
find!  I wish I had time to take a look and provide either a patch or a
more helpful set of suggestions myself.

Thanks,
Juli.

On Fri, Mar 16, 2018 at 4:32 PM, Ivan Pizhenko <ivan.pizhenko at gmail.com>
wrote:

> Hi Juli,
>
> I've started exploring the Wanproxy code and found that socket event
> polling with epoll(), which I use on Linux, is likely done correctly.
> To check this, I performed another experiment: I set "codec" to None
> on both the server and the client and tried again. It started to work
> correctly, exactly as I expect: when I kill the "server" Wanproxy, the
> "client" Wanproxy disconnects its client, but without any of the
> traffic optimization that I want Wanproxy to do. So the issue must be
> inside XCodec. Can you please help me identify it and recommend how
> to fix it?
>
> Ivan.
>
>
> 2018-03-15 6:43 GMT+02:00 Ivan Pizhenko <ivan.pizhenko at gmail.com>:
> > Hi Juli,
> >
> > I have managed to install a couple of FreeBSD 11 RELEASE VMs (that was
> > really tricky, but setting up the second one was easier than the
> > first), built Wanproxy on them and ran the same experiment.
> > I tried several combinations: everything locally on the same
> > Linux/FreeBSD machine, and the client on one Linux/FreeBSD machine
> > with the server on a different Linux/FreeBSD machine.
> > The result was the same in all cases: when the "server" Wanproxy goes
> > down, the "client" Wanproxy does not disconnect its client. So I think
> > there must be a major issue in the Wanproxy logic.
> > I have not reviewed the source code deeply yet, but can you please
> > confirm whether you really think the current implementation should
> > propagate connection state correctly inside the "client" Wanproxy?
> >
> > I also got a Wanproxy crash on FreeBSD when I attempted to specify
> > the server VM name in the client wanproxy config.
> > I put the following into my client.conf:
> >
> > create peer peer0
> > set peer0.family IP
> > set peer0.host "wptest1"
> > set peer0.port "3301"
> > activate peer0
> >
> > This gave me the following error (and a crash right after it):
> > 1521079851.327281 [/socket/address] ERR: bool
> > socket_address::operator()(int, int, int, const string&): Could not
> > look up [wptest1]:3301: hostname nor servname provided, or not known
> > 1521079851.327354 [/socket/handle] ERR: static SocketHandle*
> > SocketHandle::create(SocketAddressFamily, SocketType, const string&,
> > const string&): Invalid hint: [wptest1]:3301
> > ./client.sh: line 1: 13501 Segmentation fault (core dumped) ./wanproxy
> > -c client.conf
> >
> > Note that on Linux this worked fine.
> > I have name resolution configured through WINS (Samba), i.e. I am
> > running Samba with a valid config, and have wins added to
> > /etc/nsswitch.conf:
> >
> > hosts: files wins dns
> >
> > Note that ping has reached that host successfully:
> >
> > $ ping wptest1
> > PING wptest1 (192.168.150.11): 56 data bytes
> > 64 bytes from 192.168.150.11: icmp_seq=0 ttl=64 time=0.266 ms
> > 64 bytes from 192.168.150.11: icmp_seq=1 ttl=64 time=0.234 ms
> > 64 bytes from 192.168.150.11: icmp_seq=2 ttl=64 time=0.381 ms
> > 64 bytes from 192.168.150.11: icmp_seq=3 ttl=64 time=0.382 ms
> > 64 bytes from 192.168.150.11: icmp_seq=4 ttl=64 time=0.269 ms
> > ^C
> > --- wptest1 ping statistics ---
> > 5 packets transmitted, 5 packets received, 0.0% packet loss
> > round-trip min/avg/max/stddev = 0.234/0.306/0.382/0.063 ms
> >
> > But wanproxy crashed.
> > I had to specify the IP address (192.168.150.11) instead of the name
> > (wptest1) to work around this.
> > On Linux it works with either an IP address or a host name.
> >
> > WBW, Ivan.
> >
> >
> > 2018-03-07 5:01 GMT+02:00 Juli Mallett <juli at clockworksquid.com>:
> >> Hi Ivan,
> >>
> >> I don't know the Linux TCP/IP stack, unfortunately, so I can't be any
> >> help there.  In your case, I think you might want to consider adding, or
> >> having someone add, a simple heartbeat mechanism to the xcodec protocol
> >> in WANProxy.
> >>
> >> Thanks,
> >> Juli.
> >>
> >> On Tue, Mar 6, 2018 at 6:15 PM, Ivan Pizhenko <ivan.pizhenko at gmail.com>
> >> wrote:
> >>>
> >>> Hi Juli,
> >>>
> >>> Thanks for replying to my email.
> >>>
> >>> I am using Linux. I have set up a VirtualBox VM with Xubuntu 16.04 LTS
> >>> with the latest HWE kernel 4.13 and all the latest updates. I have not
> >>> tuned any OS options related to networking or the TCP/IP protocol. I am
> >>> not using libuinet. I am not targeting FreeBSD; I need to have it
> >>> working on Linux, primarily on Ubuntu Server.
> >>>
> >>> I was also expecting that the connection would be reset after some
> >>> reasonable timeout, but that didn't happen (or did I wait for too
> >>> short a time? I remember it was at least 10 minutes). So the present
> >>> mechanism doesn't seem to work. Thanks, a heartbeat is an interesting
> >>> idea, but perhaps there is something we can do via TCP connection
> >>> settings that we have not done yet? I am not a specialist in TCP
> >>> protocol settings, but I suppose you are more familiar with this
> >>> area, so I am asking in case you can recommend something else. If
> >>> nothing more can be done, then sure, I will need to implement a
> >>> heartbeat.
> >>>
> >>> Ivan.
> >>>
> >>>
> >>> 2018-03-06 3:48 GMT+02:00 Juli Mallett <juli at clockworksquid.com>:
> >>> > Hi Ivan,
> >>> >
> >>> > WANProxy should pass along state when a stream is closed from end to
> >>> > end, not perfectly, but your connection should be properly reset at
> >>> > some point from the server going away.  There isn't anything that can
> >>> > be done in a protocol-neutral way that exceeds that, but that should
> >>> > be good enough for most uses.  Of course there are things that can
> >>> > disrupt the TCP state machine, or settings on a system can mean that
> >>> > connections aren't timed out when they should be.
> >>> >
> >>> > Are you using libuinet, FreeBSD, Linux, or something else for the
> >>> > TCP/IP stack?
> >>> >
> >>> > An easy change would be to add a heartbeat on all active sessions
> >>> > with WANProxy to actively probe for disconnected peers, but I'm not
> >>> > sure I'd encourage that.  If you think that would be helpful to you,
> >>> > let me know.
> >>> >
> >>> > Thanks,
> >>> > Juli.
> >>> >
> >>> > On Sat, Feb 24, 2018 at 1:09 AM, Ivan Pizhenko <ivan.pizhenko at gmail.com>
> >>> > wrote:
> >>> >>
> >>> >> Hi,
> >>> >>
> >>> >> I am running some tests with Wanproxy to understand how stable and
> >>> >> reliable it is. I am using the latest Wanproxy code from Github and
> >>> >> working on Ubuntu 16.04 LTS with kernel 4.13 and all the latest
> >>> >> updates.
> >>> >>
> >>> >> I have conducted the following simple test:
> >>> >>
> >>> >> I installed Apache 2 HTTP Server locally and put a large file into
> >>> >> the document root. Then I configured, also locally, a "client" and a
> >>> >> "server" Wanproxy similar to how it is described in the examples
> >>> >> section on wanproxy.org, but without an ssh tunnel between them, to
> >>> >> proxy Apache's HTTP port. Then I used wget to download that large
> >>> >> file through the "client" Wanproxy. It worked fine, but slower than a
> >>> >> direct download from Apache. Then I tried the same thing, but shut
> >>> >> down the "server" Wanproxy somewhere in the middle of the download.
> >>> >> The download froze; there was no further progress. When I restarted
> >>> >> the "server" Wanproxy, the download did not resume. When I shut down
> >>> >> the client Wanproxy, wget showed an error like "connection refused"
> >>> >> and exited.
> >>> >>
> >>> >> I would expect that when the "server" Wanproxy went down, the
> >>> >> "client" one would disconnect the clients connected to it to indicate
> >>> >> that the upstream link is broken, if not immediately, then after some
> >>> >> reasonable timeout. Is there a way to achieve something like this
> >>> >> with Wanproxy? If not, what changes to Wanproxy are needed to enable
> >>> >> such functionality?
> >>> >>
> >>> >> Ivan.