Network dies randomly - help debugging it

Posted: Fri Jul 24, 2015 1:55 pm
by chourizo
I have a Nano 54415 with 2.7.1 and am experiencing random network issues. I would like some help or guidance on how to debug it.

My project consists of one 54415 that talks via UDP with two motors and one computer. The motors are identical (just different IPs) and I am exchanging around 160 UDP pps with each one of them (receiving one answer per packet, so it's double). The packets are very small, just a few bytes. I am also exchanging around 150 pps of about 40 bytes each with the computer.

The system can run for several hours with no problem, but there are random interruptions. It can happen after 5 minutes or after 2 hours. I've seen it running for 8 hours straight with no problem. Most of the time it fails after 10-20 minutes. When it fails the NetBurner keeps sending "some" information to the top end, but I cannot ping it and it doesn't receive or send anything to the motors.

I have tried increasing the amount of traffic (to the motors and computer) to see if I can force an error, but it doesn't seem to have any effect.

Any suggestions about how to start debugging it are appreciated. For example, is there any way to monitor the available memory to detect a memory leak?

Re: Network dies randomly - help debugging it

Posted: Fri Jul 24, 2015 4:20 pm
by dciliske
GetFreeCount().

GetFreeCount will tell you the number of free buffers in the system. There's usually somewhere between 250 and 265 buffers in the system (any extra SRAM space becomes a buffer).
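A minimal sketch of how that count could be watched over time. FreeBufferWatch is a hypothetical helper, not part of the NetBurner library, and the threshold of 50 is an arbitrary choice. On the device you would feed it the value returned by GetFreeCount() once per loop iteration; here it only takes plain numbers, so it can be tried on a PC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a NetBurner API): remembers the lowest free-buffer
// count seen so far and reports when a sample drops below a threshold.
class FreeBufferWatch {
public:
    explicit FreeBufferWatch(uint16_t lowWater) : lowWater_(lowWater) {}

    // Feed one sample, e.g. the value returned by GetFreeCount() on the device.
    // Returns true when the sample is below the low-water threshold.
    bool Sample(uint16_t freeCount) {
        if (!seen_ || freeCount < minSeen_) {
            minSeen_ = freeCount;
            seen_ = true;
        }
        return freeCount < lowWater_;
    }

    // Lowest count observed; only meaningful after at least one Sample().
    uint16_t MinSeen() const {
        assert(seen_);
        return minSeen_;
    }

private:
    uint16_t lowWater_;
    uint16_t minSeen_ = 0;
    bool seen_ = false;
};
```

Typical use would be to construct one FreeBufferWatch(50) at startup, call Sample() from the main loop, and print MinSeen() when Sample() returns true, so a slow leak and a sudden drop can be told apart.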

-Dan

Re: Network dies randomly - help debugging it

Posted: Fri Jul 24, 2015 5:21 pm
by chourizo
Ok, using GetFreeCount() I did some tests and saw 261-263 under normal conditions (everything working properly), and it seems to drop suddenly to 11 when everything stops working. So it seems that I have a problem and a nice tool to look for it.

About the number of packets per second (around 300-500 per second): is that reasonable, or is it getting close to the limits of the Nano?

Re: Network dies randomly - help debugging it

Posted: Fri Jul 24, 2015 6:27 pm
by dciliske
Hmm... Not really. Unless you're doing a lot of complex processing per packet.

My guess is that you end up in some sort of deadlock state where a timeout eventually cleans things up. Either that or you end up just slightly behind and things take longer and then somehow there's a hiccup in timing that allows you to recover.

Aka, ¯\_(ツ)_/¯

-Dan

Re: Network dies randomly - help debugging it

Posted: Sun Jul 26, 2015 8:04 am
by pbreed
Are you using UDP sockets or the UDP class interface?

UDP is about the only way you can run out of buffers, i.e. you receive a bunch of UDP packets and you don't process them,
they just sit in the queue waiting to be processed...
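A toy model of that effect, as a rough sketch only. ToyPool and RunToy are made-up names and none of this is NetBurner code; it assumes every queued packet holds exactly one buffer until it is processed, and that a packet arriving with no free buffer is dropped.

```cpp
#include <cassert>
#include <queue>

// Toy model, not NetBurner code: a fixed pool of buffers and a receive queue.
struct ToyPool {
    int freeCount;            // buffers still available
    std::queue<int> rxQueue;  // received packets waiting to be processed

    explicit ToyPool(int total) : freeCount(total) {}

    // A packet arrives: it takes one buffer, or is dropped if none are left.
    bool Receive(int packetId) {
        if (freeCount == 0) return false;
        --freeCount;
        rxQueue.push(packetId);
        return true;
    }

    // Process up to maxPackets queued packets, returning each buffer to the pool.
    int Process(int maxPackets) {
        int done = 0;
        while (done < maxPackets && !rxQueue.empty()) {
            rxQueue.pop();
            ++freeCount;
            ++done;
        }
        return done;
    }
};

// Run `ticks` rounds with `arrivals` packets in and at most `handled` packets
// processed per round. Returns the free-buffer count at the end.
int RunToy(int total, int ticks, int arrivals, int handled) {
    assert(total >= 0 && ticks >= 0 && arrivals >= 0 && handled >= 0);
    ToyPool pool(total);
    for (int t = 0; t < ticks; ++t) {
        for (int i = 0; i < arrivals; ++i) pool.Receive(t * arrivals + i);
        pool.Process(handled);
    }
    return pool.freeCount;
}
```

With 260 buffers, RunToy(260, 100, 3, 3) keeps up and ends with the pool full, while RunToy(260, 100, 3, 2) falls one packet behind per round and loses 100 buffers; left running long enough, the second case drains the pool almost completely.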

Normally I'd tell you to fix your buffer leak, but running 140 packets a second it's possible that a small glitch will run you out of buffers....

You can increase the number of buffers available (Nano and 5441X have a lot of memory)

In nburn\include\constants.h change
BUFFER_POOL_SIZE to something like 2000 and then monitor free count...
If you change this you must recompile EVERYTHING.
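
A back-of-the-envelope check of how much margin the larger pool buys. This is a sketch under stated assumptions: every unprocessed packet holds one buffer, nothing is consumed during the glitch, and the incoming rate is about 320 pps (the two motor reply streams of roughly 160 pps each mentioned at the start of the thread). SecondsUntilExhausted is a made-up helper name.

```cpp
#include <cassert>

// Seconds until the pool is empty if incoming packets are queued but never
// processed. freeBuffers: buffers currently free; packetsPerSecond: incoming rate.
double SecondsUntilExhausted(int freeBuffers, double packetsPerSecond) {
    assert(freeBuffers >= 0 && packetsPerSecond > 0.0);
    return freeBuffers / packetsPerSecond;
}
```

Under those assumptions, about 260 free buffers last roughly 0.8 seconds of stalled processing, while 2000 buffers last a little over 6 seconds. A bigger pool only buys time; it does not fix a receive path that never catches up.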


I still think you have an issue with how you process received UDP packets, but increasing the count might fix this.

Re: Network dies randomly - help debugging it

Posted: Mon Jul 27, 2015 7:46 pm
by chourizo
I've been doing a lot more tests and I think that's exactly what was happening. I think that from time to time there was a small glitch in the communications that was accumulating packets and running out of buffers.

I think the glitch was resolved in 2.7.1 (maybe the UDP bug that pbreed found). I developed all the code in 2.6.2 (I believe) and only updated to 2.7.1 the same day I posted this, but at that point I hadn't cleaned and recompiled the project. As soon as I updated to 2.7.1 properly I never saw the problem again, and the number of packets with timeout (my timeout is really short) decreased drastically. I also enabled the stack monitoring and had the system under test for a very long time with no problem.

Anyway, I will increase the number of buffers just for extra safety.

Thanks for your help (and to the forum in general, I found a lot of useful information).