ProIV Spinlocks


5 replies to this topic

#1 faz

    Newbie

  • Members
  • 3 posts
  • Gender:Male

Posted 21 December 2003 - 08:48 PM

When we recently upgraded the system a ProIV based application runs on, from an 8-processor, 900 MHz Sun V880 to a Sun 4800 (8 procs, 1200 MHz), we started experiencing erratic behaviour: sudden bursts of high kernel mode (80-90%) during busy days. We are using ProIV 4.6r213, Solaris 8 and Oracle 9i.

As a result of this our application supplier suggested that we enable spinlocks, which are mentioned briefly in the ProIV documentation. Since then the kernel-mode storms have appeared less frequently, but whenever one appears we have to restart the application for things to settle, which makes neither me nor the users happy.

So far I have not managed to reproduce the behaviour on any of our slower test systems. However, I can force it to happen on the production system by running a certain nightly batch job, which should occupy a single processor in total but drives up to 90% kernel mode across the entire system. The application is simply unusable then.
If I copy the entire application (app + Oracle) to a 900 MHz machine, the same job consumes almost no kernel resources.

I have been collecting data using isview and the usual Unix tools, but I haven't got a real clue yet.

Looking at /etc/isamdef and isview, I have determined that there are actually 4 spinlock parameters: SL_SPIN (default 100), SL_NAP (default 10), SL_TIMEOUT (default 50) and SL_SLEEP (default 2).

Can anyone give me more detailed information about these than the documentation's "set SL_SPIN=100, SL_NAP=10 on a multiprocessor system"?
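In case it helps the discussion, here is my mental model of what those four parameters probably control, sketched in C. To be clear: the semantics, the loop structure and all the names are my own guesswork from the isamdef names and defaults; ProIV's actual implementation is not documented anywhere I can find.

```c
#include <stdatomic.h>
#include <time.h>

/* Hypothetical spin-then-nap lock in the style the SL_* parameters
 * suggest. Parameter meanings are assumptions, not ProIV internals:
 *   spin    - busy-wait iterations per round           (SL_SPIN,    default 100)
 *   nap_ms  - short sleep between spin rounds, ms      (SL_NAP,     default 10)
 *   timeout - rounds before giving up on napping       (SL_TIMEOUT, default 50)
 *   sleep_s - long sleep once the timeout is reached   (SL_SLEEP,   default 2)
 */
typedef struct {
    atomic_flag held;
} sl_lock;

/* Acquire the lock; returns the number of full rounds that were
 * needed (0 when the lock was uncontended). */
int sl_acquire(sl_lock *l, int spin, int nap_ms, int timeout, int sleep_s)
{
    for (int round = 0; ; round++) {
        /* busy-wait phase: this is the part that burns CPU */
        for (int i = 0; i < spin; i++)
            if (!atomic_flag_test_and_set(&l->held))
                return round;

        struct timespec ts;
        if (round < timeout) {          /* brief nap, then spin again */
            ts.tv_sec  = 0;
            ts.tv_nsec = (long)nap_ms * 1000000L;
        } else {                        /* stop spinning, sleep properly */
            ts.tv_sec  = sleep_s;
            ts.tv_nsec = 0;
        }
        nanosleep(&ts, NULL);
    }
}

void sl_release(sl_lock *l)
{
    atomic_flag_clear(&l->held);
}
```

If the real code is anything like this, SL_SPIN bounds how long a waiter burns CPU before backing off, while SL_NAP and SL_SLEEP trade wake-up latency for CPU, which would at least explain why the settings matter so much under contention.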

If I remove write permissions from the bootstrap files, I can make the inode-based locks go away, but experimenting with this (so far on the test system only) has not produced any noticeable difference compared to leaving them writable.

If one upgrades to a faster machine with the same number of processors, one should not suddenly experience increased contention, but we obviously do. Has anyone experienced something similar?

regards,
faz

#2 Rob Donovan

    rob@proivrc.com

  • Admin
  • 1,640 posts
  • Gender:Male
  • Location:Spain

Posted 22 December 2003 - 08:28 AM

Hi,

Just a few questions...

1) When you upgraded the box, I guess you also upgraded or installed the O/S??

2) What sort of user level do you have logged on when you get the problem?

3) Do you have many ProISAM files?

4) If you do an isview, how many lines are in the output?

5) Is there anything in the file /etc/isamlog?
If this file does not exist, create it with rw-rw-rw permissions. Then shared memory problems will be logged into this file the next time.

6) Be careful with isview... isview 'halts' the system when it takes its snapshot, so if you do a 'isview | more' then the system will freeze until you get to the end of the 'more'. This might have only been a problem with a certain release of 4.6, I can't remember.


Rob D.

#3 Richard Bassett

    ProIV Guru

  • Members
  • 696 posts
  • Location:Rural France

Posted 22 December 2003 - 10:53 AM

I know of what was probably the exact same problem at a large installation of a ProIV app running on Oracle.
As in your case, only the bootstrap was in ProISAM, but when they upgraded the hardware the performance fell off a cliff. I wasn't actually involved so I don't have any data, but I'm pretty sure ProIV support resolved the problem very quickly, and I'm almost sure it WAS a case of tuning the ProISAM spinlocks. Not sure if that hardware was Sun or HP.
Speculating generally, a fairly common problem with d-i-y mutexes/spinlocks is "priority inversion", where sensitivity to the "performance balance" causes processor time not to be allocated to the processes that are blocking progress.
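To make that inversion concrete, here is a toy strict-priority scheduler in C. The numbers and the model are entirely made up for illustration; this has nothing to do with ProIV's or Solaris's actual scheduling.

```c
/* Toy model of priority inversion with a userland spinlock: a
 * low-priority process holds the lock and needs 3 time slices to
 * finish; a high-priority waiter spins for it. Each slice goes to
 * the highest-priority runnable process, so as long as the waiter
 * spins, the holder never runs and can never release the lock.
 *
 * Returns the time slice at which the lock is released, or -1 if
 * the holder is starved for the entire simulation. spin_budget is
 * how many consecutive slices the waiter spins before napping
 * (0 = pure spinning, it never naps).
 */
int slices_until_release(int spin_budget, int max_slices)
{
    int work_left = 3;   /* slices the low-priority holder still needs */
    int spun = 0;        /* consecutive slices the waiter has spun     */

    for (int t = 1; t <= max_slices; t++) {
        if (spin_budget > 0 && spun >= spin_budget) {
            /* the waiter naps this slice, so the holder finally runs */
            spun = 0;
            if (--work_left == 0)
                return t;            /* lock released */
        } else {
            spun++;                  /* waiter burns the slice spinning */
        }
    }
    return -1;   /* strict priority + pure spinning: the holder starves */
}
```

The point of the toy: a waiter that never backs off can monopolise the CPU and lock everyone out indefinitely, which is exactly the failure mode the SL_NAP/SL_SLEEP style back-off parameters exist to prevent.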
HTH.


Nothing's as simple as you think

#4 faz

    Newbie

  • Members
  • 3 posts
  • Gender:Male

Posted 22 December 2003 - 08:00 PM

Thanks for your replies.

Let me go into further details, without mentioning the application.

1) regarding the operating system:
We use Solaris EIS Standard, 5.8 Generic_108528-23 sun4u sparc SUNW,Sun-Fire on all machines.
All systems come from a common jumpstart installation source and were installed within the last few weeks.

2) User Level
If the problem appears at night (the only time I can do any testing anyway) there is hardly any
user-mode activity. If I shut down the application and leave Oracle up, the system is 99% idle. If the ProIV app
is running I see, for example, 2% idle, 8% user, 90% kernel, 0% iowait.

At that moment the application should be running nothing but a simple batch job on one CPU.

3,4) I have about 120 .pro files, and the file table in isview is 32 entries long. The output of isview is about 1000 lines
long; that is with write access allowed to the .pro files, which is normally the case.

5) I did not have an isamlog, but I have created one now. I have seen the shared memory segment extend once,
and have since increased its size.

6) isview seems not to be blocking on my system.

Ever since we started using spinlocks, the probability of one task in the application slowing the entire system down seems to have increased. The application consists of over a hundred tasks, most of which I cannot restart separately.
I have browsed through the ProIV docs on the website to see if there was a bugfix related to this in any of the later releases, but I could not find anything.

faz

#5 faz

    Newbie

  • Members
  • 3 posts
  • Gender:Male

Posted 01 January 2004 - 09:45 PM

I have now switched to

SL_SPIN=1
SL_NAP=1

which is not the recommended setting for multiprocessor systems, but the system has been stable with these settings for 2 weeks now. I did not suffer any performance degradation by doing this.

faz

#6 Marcel De Rijk

    Newbie

  • Members
  • 9 posts
  • Gender:Male
  • Location:Barberton, South Africa

Posted 12 March 2004 - 11:07 AM

We also run a Sun 4800 with the exact same setup. About a year ago we had the same problem and changed isamdef to:
SHMDELAY = 2
SHMRETRY = 30
SL_SPIN = 100
SL_NAP = 10

Since then, not a single problem.


