
OSTimeDly crash

Posted: Wed Sep 10, 2008 12:05 pm
by kevingnx
Having a weird problem after migrating to NNDK 22 rc 1. Our app was performing flawlessly with the old NNDK (21 rc4) but now crashes during a call to OSTimeDly(TICKS_PER_SECOND*1).

The trap shows the following:

Trap occured
Vector=Format Error FMT =00 SR =2700 FS =00
Faulted PC = 020198B6
D0:0225CBBC 0201991C 00000000 000000D1 000000D2 000000D3 000000D4 000000D5
A0:000000D6 000000D7 000000A0 000000A1 0202CF00 000000A3 000000A4 0225CBA0

and winaddr2line decodes the faulted PC as OSCtxSw??:0

I've recompiled the system files to put TICKS_PER_SECOND back to the default of 20 (we use 100), but it made no difference. We seem to have some problem with task switching in our install of this NNDK. The device is a MOD5234.

Any ideas?

Kevin

Re: OSTimeDly crash

Posted: Wed Sep 10, 2008 12:37 pm
by lgitlitz
I would definitely switch from NNDK22rc1 to NNDK22rc2. RC1 was taken down a few months ago when a critical bug was found. RC2 has gone through much more extensive testing and has also been available for a few months with no major issues. Repost if you have the same problem with rc2.

Re: OSTimeDly crash

Posted: Wed Sep 10, 2008 12:45 pm
by kevingnx
My apologies, I meant rc2. I just downloaded and installed it yesterday, so the problem I'm having is WITH rc2.

Re: OSTimeDly crash

Posted: Wed Sep 10, 2008 12:56 pm
by lgitlitz
Could you possibly be running out of stack space? Did you increase any of the stack sizes in the old NNDK build? Have you defined any large arrays that are not global or static?
Another possibility is the new SRAM usage. Do you manually manipulate SRAM memory in your application? The NNDK22 build, and all future builds, uses SRAM for system stacks and variables. In constants.h you can comment out the following line to completely disable any SRAM usage by the system: #define ENABLE_SRAM_SYS. Make sure to recompile the system files for this to take effect.
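
To illustrate the question about large arrays that are not global or static, here is a minimal sketch in plain C. It is not NNDK code: the function names and SAMPLE_COUNT are hypothetical and chosen only for illustration. Both functions compute the same result, but the first takes its 16 KB buffer from the calling task's stack, while the second lets the linker place the buffer in global memory.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_COUNT 4096u  /* hypothetical buffer length: 4096 * 4 bytes = 16 KB */

/* Local (automatic) array: 16 KB comes out of the calling task's stack. */
uint32_t sum_with_stack_buffer(void)
{
    uint32_t samples[SAMPLE_COUNT];
    uint32_t sum = 0;
    for (size_t i = 0; i < SAMPLE_COUNT; i++) {
        samples[i] = (uint32_t)i;
        sum += samples[i];
    }
    return sum;
}

/* Static array: allocated once in global memory, so the stack only
 * has to hold the small locals (sum and i). */
uint32_t sum_with_static_buffer(void)
{
    static uint32_t samples[SAMPLE_COUNT];
    uint32_t sum = 0;
    for (size_t i = 0; i < SAMPLE_COUNT; i++) {
        samples[i] = (uint32_t)i;
        sum += samples[i];
    }
    return sum;
}
```

One trade-off to keep in mind: a static buffer is shared by every caller, so the second version is no longer safe to call from two tasks at once without some form of locking.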

Re: OSTimeDly crash

Posted: Wed Sep 10, 2008 1:09 pm
by kevingnx
It is a possibility but I'm confused because it worked under the previous NNDK. (I'm installing the new one to experiment with the better transmit and receive performance).

We do need larger stacks and we have a modified constants.h as follows (NB_SSH_SUPPORTED not defined)

/* Stack definitions, SSH requires more for key generation support see predef.h */
#ifdef NB_SSH_SUPPORTED
#define IP_STK_SIZE (4096)
#define TCP_STK_SIZE (4096)
#define HTTP_STK_SIZE (2048)
#define IDLE_STK_SIZE (2048)
#define ETHER_SEND_STK_SIZE (2048)
#define USER_TASK_STK_SIZE (4096)
#else /* #ifdef NB_SSH_SUPPORTED */
#define IP_STK_SIZE (2048)
#define TCP_STK_SIZE (2048)
#define HTTP_STK_SIZE (2048)
#define IDLE_STK_SIZE (2048)
#define ETHER_SEND_STK_SIZE (2048)
// #define USER_TASK_STK_SIZE (2048)
#define USER_TASK_STK_SIZE (32768)
#define USER_TASK_STK_SIZE_1 (4096)
#endif /* #ifdef NB_SSH_SUPPORTED */

We don't manipulate SRAM, but I did have to comment out the definition of FAST_MAIN_STACK in the new NNDK to stop the SRAM region overflowing during the build. The build now indicates that we're using 16% of flash and 16% of RAM.

Could commenting out only FAST_MAIN_STACK cause a problem if we're still allowing everything else to run in RAM?

Re: OSTimeDly crash

Posted: Wed Sep 10, 2008 1:47 pm
by kevingnx
Sadly the same problem exists when I comment out #define ENABLE_SRAM_SYS.

Re: OSTimeDly crash

Posted: Wed Sep 10, 2008 2:59 pm
by lgitlitz
The RAM usage reported after compiling refers only to SDRAM. For SRAM, the linker only reports an overflow; it gives no usage information when all SRAM variables and stacks fit in the 64K of SRAM.

Does this problem occur if you call OSTimeDly with a value greater than 1? This really sounds like you are overflowing a stack, possibly overrunning a buffer declared on a stack.

How did you get the value 32768 for USER_TASK_STK_SIZE? This value is the number of DWORDs allocated, so you are allocating 128 KBytes to a stack, which is a bit excessive: 16 times the default size and twice the available SRAM. You should go through your UserMain and any functions called by this task and find where you allocated all this memory. There are likely some very large arrays defined inside the task. Move these declarations outside the functions so they are global, or declare them static. This tells the linker to allocate global memory for the array instead of allocating stack memory.

In general you should only declare small amounts of data on a stack (<1K). You need to save space on the stack for any context switches or interrupts that occur during this task. You should also check any ISRs you have created and make sure there are no large arrays declared in these either. Variables declared in an ISR will be allocated on the stack of whichever task was running when the interrupt occurred.
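
The size arithmetic above can be checked with a few lines of plain C. This is only a sketch: it assumes a DWORD is 4 bytes (which is what makes 32768 DWORDs come out at 128 KBytes), and the macro and function names are made up for illustration rather than taken from constants.h.

```c
#include <stdint.h>

#define BYTES_PER_DWORD     4u              /* assumption: 32-bit DWORD */
#define DEFAULT_STK_DWORDS  2048u           /* default USER_TASK_STK_SIZE from the thread */
#define MODIFIED_STK_DWORDS 32768u          /* value in the modified constants.h */
#define SRAM_BYTES          (64u * 1024u)   /* 64K of SRAM, as stated in the thread */

/* Convert a stack size given in DWORDs to bytes. */
uint32_t stack_bytes(uint32_t dwords)
{
    return dwords * BYTES_PER_DWORD;
}

/* How many times larger the modified stack is than the default one. */
uint32_t ratio_to_default(void)
{
    return stack_bytes(MODIFIED_STK_DWORDS) / stack_bytes(DEFAULT_STK_DWORDS);
}

/* How many times the modified stack exceeds the whole SRAM. */
uint32_t ratio_to_sram(void)
{
    return stack_bytes(MODIFIED_STK_DWORDS) / SRAM_BYTES;
}
```

With these numbers the modified stack is 131072 bytes, the default is 8192 bytes, and the ratios are 16 and 2, matching the figures quoted above.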

To get the new performance gains it will likely be important to have the UserMain stack running out of SRAM instead of SDRAM. The only way this will be possible is if you keep its size small, i.e. the default 2048. It should be easy to have a task size of 2048 once you remove all the large arrays. Then make sure to uncomment the FAST_MAIN_STACK define you changed earlier and recompile everything.
FYI, the next beta and future builds will now have a define called MAIN_TASK_STK_SIZE for the UserMain task size. The USER_TASK_STK_SIZE will be used only for the OSSimpleTaskCreate function.

-Larry

Re: OSTimeDly crash

Posted: Wed Sep 10, 2008 3:25 pm
by kevingnx
Thanks for the feedback. The large stack is needed to handle constructors initialized before main() in a multi-channel (64 channels in, 8 out) application, i.e. every array or object initialized is multiplied by (64+8) in terms of the amount of storage required. Unfortunately the main body of the code is inherited from another source and it would be time consuming to change. As it's currently written you certainly do get stack overflows with values less than 128k. I will have a look at changing this in due course though.

What is confusing me now is that the app worked fine under the previous NNDK, and with ENABLE_SRAM_SYS commented out there should be little or no difference in memory allocation, right? So what has changed between the NNDKs that is now giving me a stack problem?

I've checked again that my changes to the constants.h file are having an effect and I'm using the right files.

Re: OSTimeDly crash

Posted: Wed Sep 10, 2008 4:04 pm
by lgitlitz
Do you know how much memory you are actually allocating with your constructors? Was the 128K calculated or was it obtained through trial and error? It is possible that you have always been overrunning the stack but running into unallocated space in the old build. The physical location where variables and stacks are stored is not guaranteed to remain the same between different builds. This might cause a pre-existing stack overflow to now run into allocated memory in the new build. When you call OSTimeDly, do you have any other active tasks running? If there are no other tasks running then the idle task will run. What I suspect is that the main task exceeded its stack and corrupted the idle task stack. This would cause a trap when the idle task begins running, i.e. as soon as you call OSTimeDly.
Try increasing USER_TASK_STK_SIZE to a larger value, maybe double it for testing purposes. You have plenty of space in RAM so it shouldn't hurt.
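
One generic way to find out whether a stack has ever been exhausted is watermarking: fill the stack with a known pattern before the task starts, then later count how many words at the far end still hold the pattern. The sketch below is plain C and is not the NNDK's implementation. The names and the fill value are hypothetical, and it assumes the stack grows downward, so index 0 is the last word to be reached.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu  /* hypothetical fill pattern */

/* Paint the whole stack area with the fill pattern before the task runs. */
void stack_paint(uint32_t *stack, size_t dwords)
{
    for (size_t i = 0; i < dwords; i++) {
        stack[i] = STACK_FILL;
    }
}

/* Count untouched DWORDs starting from the low-address end.
 * A result of 0 means the task reached the very end of its stack
 * at some point, so an overflow should be suspected. */
size_t stack_min_free_dwords(const uint32_t *stack, size_t dwords)
{
    size_t n = 0;
    while (n < dwords && stack[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

A check like this only sees words that were actually overwritten, and a stored value that happens to equal the pattern is counted as free, so treat the result as a hint rather than proof.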

Re: OSTimeDly crash

Posted: Wed Sep 10, 2008 4:46 pm
by kevingnx
Doubled the stack size with a defined amount of 65k, same problem. Just to be sure I wasn't going insane I reloaded the original app (compiled under NNDK 21 rc4). It runs fine and here's the stack report from that. 50 is the main task.

OS Stacks
Prio | StackPtr | Stack Bottom | Free Now | Minimum Free
63 | 0x211cc54 | 0x211acac | 8104 | 8032
50 | 0x212eadc | 0x211ccb4 | 73256 | 69560
40 | 0x21123d0 | 0x2110484 | 8012 | 8012
39 | 0x20a1ce8 | 0x209fd9c | 8012 | 7876
38 | 0x211510c | 0x21131c0 | 8012 | 7924
48 | 0x209bc1c | 0x2097cb4 | 16232 | 16232
46 | 0x209fc30 | 0x209bcc0 | 16240 | 16240
47 | 0x2097b74 | 0x2093c64 | 16144 | 12604
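
As a sanity check on how to read this report: in every row above, Free Now equals StackPtr minus Stack Bottom, in bytes. A minimal sketch of that subtraction in plain C follows. The function name is hypothetical, and the relationship is inferred from the numbers in the report, not from NNDK documentation.

```c
#include <stdint.h>

/* Bytes between the current stack pointer and the lowest stack address.
 * Assumes the stack grows downward, so stack_ptr >= stack_bottom. */
uint32_t stack_free_now(uint32_t stack_ptr, uint32_t stack_bottom)
{
    return stack_ptr - stack_bottom;
}
```

For the priority 50 row this gives 0x212eadc - 0x211ccb4 = 73256, and for the priority 63 row 0x211cc54 - 0x211acac = 8104. The same addresses also show that, in this build, the priority 63 stack pointer sits only 96 bytes below the priority 50 stack bottom, so the two stacks appear to be adjacent in memory.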

OSTimeDly is called from within InitializeGPIO() just after a call to initializeStack() which is in turn the first call in main(). I'll start to strip down the code to see if I can get further clues but I'm perplexed.