
8 YEARS OF PROBLEMS WITH ISCOLLECT


14 replies to this topic

#1 alex

    Newbie

  • Members
  • 2 posts
  • Gender:Male

Posted 18 November 2004 - 04:37 PM

We began using PRO-IV in 1995 on a Novell server with a few users; we were happy and produced TONS of code and applications.
In 1996 we migrated to version 4.0 on an HP9000 Unix server with many more users, and our problems began.
When more than 200 users were in our applications, response time went from seconds to 3 to 5 minutes; it was a nightmare.
After months of analysis we found that stopping all users and resetting the ISCOLLECT solved the problem... until the ISCOLLECT got saturated again.
Since then we have stopped the system once or twice a day.
In 2001 we migrated to a Sun Microsystems 10K server with 8 CPUs running Solaris 5.8 and 400 concurrent users.
The phenomenon was the same: ISCOLLECT freezes the server after many hours of use.
PRO-IV Corporation has been unable to solve this in 8 years, and the local support in my country (Mexico) has also been unable to solve it.
Today we are not developing any new code in PRO-IV, but we will be using it for at least 2 more years.
I would like to receive some feedback from the users of this forum on this topic.

#2 Joseph Bove

    ProIV Guru

  • Members
  • 756 posts
  • Gender:Male
  • Location:Ramsey, United States

Posted 18 November 2004 - 05:05 PM

Alex,

We've had some relatively minor problems with iscollect over the years - nothing like what you are describing though.

Every now and then (and much more rarely, if ever, on 5.5) more than one iscollect will be running. This is a bad thing. To be on the safe side, we have everyone log out and kill one of the iscollects; a quick check for this is sketched below.
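
A quick way to spot the bad case (a sketch only; it assumes the daemon appears as 'iscollect' in the process list):

# count running iscollect processes; anything above 1 is the bad case
ps -ef | grep -c '[i]scollect'

# list the shared memory segments currently allocated
ipcs -m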

In 4.6, we have seen instances (nearly impossible to recreate) where an update can cause iscollect to go to 100% CPU.

Our largest client sites run anywhere from 100 to 150 users, and they have been a lot more stable than what you are describing.

I'm not sure what more feedback I can offer...

Is there something in particular you are hoping to find out?

Regards,

Joseph

#3 Mike Schoen

    Expert

  • Members
  • 198 posts
  • Gender:Male
  • Location:Guelph, Canada

Posted 18 November 2004 - 05:23 PM

We also have large installations running with 100+ users and no issues like this.

I have seen similar response issues when iscollect runs out of shared memory.
The application would run slower as processes queued up for shared memory, and you would see
entries in /etc/isamlog saying that pid #NNN tried N times for a lock (or something like that).

I would, however, expect that your local PRO-IV support would have tried using the
SHMSIZE parameter in /etc/isamdef, as well as MAXFILES, etc.
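
If you want to catch those entries as they happen, a one-liner sketch (the exact message text above is from memory, so treat the grep pattern as an assumption):

# watch the ISAM log live during a slowdown
tail -f /etc/isamlog

# or count lock-retry entries after the fact
grep -c 'tried' /etc/isamlog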

#4 Guest_Mike P_*

  • Guests

Posted 18 November 2004 - 06:13 PM

I've worked with Unix boxes for 15 years and have never had to compromise the way you have described. Normally we have found that tuning the Unix kernel parameters (an example is sketched below) and the /etc/isamdef PRO-IV parameters, together with plenty of system memory and caching, resolves slow system response. Sometimes it's the PRO-IV coding techniques that cause unacceptable system slowdown. For example, I've seen programming where an update walks through hundreds of thousands of detail records for a simple process; when many users call the same update, system response comes to a crawl. Are you minimizing detail file walkthroughs via alternate index techniques?
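
On Solaris 8 ("Solaris 5.8"), for example, the shared memory and semaphore limits are set in /etc/system; a sketch with illustrative values only (a reboot is required for changes to take effect):

set shmsys:shminfo_shmmax=268435456
set shmsys:shminfo_shmmni=512
set semsys:seminfo_semmni=512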

#5 Rob Donovan

    rob@proivrc.com

  • Admin
  • 1,640 posts
  • Gender:Male
  • Location:Spain

Posted 18 November 2004 - 08:19 PM

Hi,

I also have never seen such problems.

I've worked on systems with over 500 users, and they worked perfectly (well, just about...)

Are you using Spinlocks?

Are you using Oracle or any other Database / file system other than ProISAM?

If you post us your /etc/pro4.ini file, we could check that for you.

You should also create the following file (if it does not exist): /etc/isamlog, and give it rw-rw-rw access. Whenever ProIV has shared memory problems, it will log them to this file, if it exists. You should check this file whenever you are getting 'slowdowns'.

If you are running out of shared memory, then each ProIV session will in effect 'freeze' and wait for shared memory to become available. This file should indicate whether you are running out of shared memory.

Also, when you are getting the 'slowdown', use the ProIV utility isview, pipe the output to a file (e.g. isview > isview.log) and post us the file. Both steps are sketched below.
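
Putting those two suggestions together (file location, permissions and command exactly as described above):

# create the ISAM log and give it rw-rw-rw access
touch /etc/isamlog
chmod 666 /etc/isamlog

# during a slowdown, capture the isview output to post back here
isview > isview.log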

Rob.

#6 Joseph Bove

    ProIV Guru

  • Members
  • 756 posts
  • Gender:Male
  • Location:Ramsey, United States

Posted 18 November 2004 - 10:29 PM

Alex,

Along the same lines as what Rob is saying: is anything queueing up in /etc/isamlog?

Regards,

Joseph

#7 alex

    Newbie

  • Members
  • 2 posts
  • Gender:Male

Posted 19 November 2004 - 09:08 PM

Dear Sirs:

I want to thank you all for the feedback received.

Antonio Delgado, who has been involved with this problem here for many years, will later post the files and information you are requesting.

I really appreciate your help.

Alex

#8 Antonio Delgado

    Newbie

  • Members
  • 2 posts
  • Gender:Male

Posted 20 November 2004 - 12:58 AM

Hi to all. We want to thank you in advance for all your comments & recommendations about this issue. Below are some bullets answering some of the recommendations posted above:

* Files
- isamlog: This file has been empty since its creation about 2 years ago. We have recently reviewed its permissions: rwxrwxrwx.

- proiv.ini. Contents follow:

[Serial - 4.0]
Customer=UNITEC
Installation0=524D 59EE 5853 62DE 4G9H FBE5
Installation1=28HB 65BA A379 AECE 5DC8 ADH8
Installation2=578D 4FHB GD3H G25E HD63 C5GH
Installation3=CH94 A78F 42FG D2A9 CF54 A4DD
Installation4=93DC D8G6 F5F2 C3BC E9G9 78H7
Activation1=653H 3E22 DBD5 87HA 6EDD 52HD
Installation5=BB49 A89G 3E89 7GFE EA4G 7HEE
Installation6=CH2B 7237 5D98 C47E 4D5H E3F6
Installation7=6874 B292 EB6H GAGA 53GF F8DE
Activation2=7665 8HEC B3ED 25DE FEH9 3CE9
[Serial Info - 4.0]
Installation0=Evaluation mode for SiteId 287963, valid through Tue Nov 16 2004
Installation1=Serial number 7657: CISAM enabled through Thu Nov 18 2004
Installation2=Serial number 7657: ORACLE enabled through Thu Nov 18 2004
Installation3=Serial number 7657, platform SUN_SOLARIS: 20 Development users, valid through Thu Nov 18 2004
Installation4=Serial number 7657, platform SUN_SOLARIS: 395 Runtime users, valid through Thu Nov 18 2004
Activation1=Activate serial number 7657 for SiteId 287963, valid indefinitely
Installation5=Serial number 7736: CISAM enabled through Thu Dec 9 2004
Installation6=Serial number 7736: ORACLE enabled through Thu Dec 9 2004
Installation7=Serial number 7736, platform SUN_SOLARIS: 120 Runtime users, valid through Thu Dec 9 2004
Activation2=Activate serial number 7736 for SiteId 287963, valid through Thu Dec 9 2004


- isamdef. Contents follow:

MAXFILES = 3000 max # of open isam files

MAXUSERS = 400 max # users in system
USRFILES = 200 max # open files per user

MAXLOCKS = 250 max # of simultaneous rcd locks
AVGLOCK = 120 average size (bytes) of a record lock

#SHMSIZE = 3145728 shared memory size (alternate to above 4 parms)
# (assuming (dev_t == 2) and (pid_t == 4))
# SHMDELAY = 5 time to wait for a resource (seconds)
#SHMRETRY = 30 max # of attempts at resource

SL_SPIN = 1000
SL_NAP = 10

According to the parameters in the isamdef file, the derived value for SHMSIZE is 2MB.
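
(For reference, the only piece of that derivation we can check directly from the numbers above is the lock table; the per-file and per-user table entry sizes are internal to ProISAM and presumably account for the rest of the 2MB, given MAXFILES=3000, MAXUSERS=400 and USRFILES=200.)

# lock table alone: MAXLOCKS * AVGLOCK
expr 250 \* 120   # = 30000 bytes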

- isview171104: This file is the output from the isview command and is somewhat long; see the attached file.

We don't know how to interpret the content of this file, and we have asked our local support too, without success. Can you help us determine the kinds of errors or warnings, their meanings, docs, etc.?

- In general, our site has the following characteristics:
  • Sun Enterprise 10000
  • 8 CPUs
  • 8 GB RAM


#9 Rob Donovan

    rob@proivrc.com

  • Admin
  • 1,640 posts
  • Gender:Male
  • Location:Spain

Posted 22 November 2004 - 08:25 AM

Hi,

Looking at the ISVIEW results, setting MAXFILES to 3000 is probably not needed. At the time of the ISVIEW you only had 166 open ProISAM files, so you can probably lower the 3000; somewhere around 400 would probably be fine, to be safe.

Also, SL_SPIN is set to 1000. It is recommended that on a multiprocessor system this be set to 100, though I'm not too sure of the relevance of this figure.

I don't think these things have anything to do with the slowdowns; they are just things I noticed that seemed a bit strange. I've never set either of those settings to such high numbers, so maybe ProIV is struggling with them being that high.

Also, you have USRFILES set to 200. Is your system so complex that each user will have 200 ProISAM files open (in a single function) at once? It seems a lot.

The system I was on with 600 users had only 200KB of shared memory set for ProIV, but we were using CISAM, which would reduce the amount of shared memory ProIV needs.

Do you use any other file / data types, such as Oracle or CISAM, or is all your data in ProISAM files?

If you are getting the slowdown so regularly, you could do some testing to see what it is that is actually slowing down. I would write some 'test' functions that do simple things, like reading 100 records from a ProISAM file, adding records to a ProISAM file and deleting records from a ProISAM file. Then, when the system slows down, run these functions and see if they all slow down too. This could help you narrow down what is slowing things down; a timing harness along these lines is sketched below.
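
A minimal harness for that idea (the function names and the run_proiv_function wrapper are placeholders; substitute however you invoke a ProIV function at your site):

# time each 'test' function while the system is slow,
# and again when it is behaving normally, then compare
for f in p4test_read p4test_add p4test_delete
do
    echo "timing $f:"
    time run_proiv_function "$f"
done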

Also, when you are running your test functions, you could turn on the ProIV trace (TRACEALL=15 and TIMESTAMP; do a search on this site to find out more) and see whether the trace shows which bit of the function runs slowly or where it gets 'stuck'.

If you are still on ProIV 4.0, have you thought about upgrading? That is quite an old version to be on, and maybe this is a bug that has been fixed in a later release.

What exactly do you do to fix the system? Kill ISCOLLECT and log off all the users? Reboot the machine?

ISCOLLECT does not do that much for ProIV. It basically just clears up shared memory when the kernel leaves things around after a crash. So ISCOLLECT running a lot may just be a by-product of something else; it may not be the thing that is slowing things down.

OK, that's enough for now... :(

HTHs,

Rob.

#10 Chris Pepper

    ProIV Guru

  • Members
  • 369 posts
  • Gender:Male
  • Location:United Kingdom

Posted 22 November 2004 - 11:23 AM

The only other thing I've seen that can cause a serious slowdown is multiple iscollect sessions being started due to incorrect settings.
Only one copy of iscollect should ever be running.

#11 Richard Bassett

    ProIV Guru

  • Members
  • 696 posts
  • Location:Rural France

Posted 22 November 2004 - 11:49 AM

Only one copy of iscollect should ever be running.

True, except, ISTR (I seem to recall), when running certain combinations of older and newer versions of Unix ProIV on the same machine.
From time to time the "id" of the shared memory segment used for ProISAM record locking was changed (probably because the format of its contents changed).
If you run multiple historical versions of ProIV on one machine, it can then be OK to have multiple iscollects, provided each is monitoring a distinct shared memory segment.
(You can see the existing "shm" segments using the ipcs command.)
Obviously this is not a common case, and I'm only mentioning it for completeness, in case anyone starts to think they have a problem when in fact they don't.
Caveat: it's a long time since I had to dirty my hands with this stuff, but I'm fairly sure I recall correctly.
Nothing's as simple as you think

#12 Chris Pepper

    ProIV Guru

  • Members
  • 369 posts
  • Gender:Male
  • Location:United Kingdom

Posted 22 November 2004 - 02:57 PM

No, you are quite correct. I was going to mention this in the post above, but thought that it might be confusing in this case.

You're right to mention it, because these posts may get referenced in years to come :(

#13 Guest_Guest_Paul_*

  • Guests

Posted 25 November 2004 - 04:24 PM

Hi,

Also, SL_SPIN is set to 1000. It is recommended that on a multiprocessor system this be set to 100, though I'm not too sure of the relevance of this figure.


SL_SPIN is the number of times that a process will 'spin' while trying to obtain a semaphore.
SL_NAP is the minimum time in milliseconds that the process will then wait before retrying.

I'm not too sure about Sun and spinlocks. However, when we upgraded our hardware to HP GS1280 AlphaServers, we noticed that even under load (1000+ ProIV sessions simulating application usage) the system never used more than 20% of its resources. The application slowed down considerably while the system remained almost completely idle.

By changing our SL_SPIN setting from 100 down to 10 and SL_NAP from 10 down to 1, we were able to use up to 70% of resources under a similar load, and application response improved accordingly. Furthermore, by replacing Pro-ISAM files with C-ISAM files we realised an additional performance improvement. The isamdef change is shown below.
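
The change described, in the same format as the isamdef excerpt posted earlier in this thread (the annotations paraphrase the definitions above):

SL_SPIN = 10 spins while trying to obtain a semaphore (was 100)
SL_NAP = 1 minimum wait in milliseconds before retrying (was 10)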

This appears to be a problem with the scalability of Pro-IV & Pro-ISAM, specifically on Unix machines. Ten years ago, a 1 millisecond delay would have been fine; with today's hardware, this should probably be specified in nanoseconds.

Cheers,
Paul

#14 Antonio Delgado

    Newbie

  • Members
  • 2 posts
  • Gender:Male

Posted 03 December 2004 - 02:25 AM

Hi to all. Thanks again for all your comments & recommendations about this issue, and sorry for the delayed feedback B)). Below are some answers and some questions.

The ISVIEW file we sent corresponds to a normal load of around 250 concurrent users, which is not an overload at all. In fact, about 2 weeks every 4 months are critical and present slowdowns, with about 380 concurrent users and high processing demand; at those moments we have observed more than 2500 files open at once, as reported by the corresponding ISVIEW file.

SL_SPIN was set to that number for 2 reasons: 1) it was a local support recommendation, and 2) it was the result of long test sessions. Still, we have not found how to set this value in accordance with our server configuration (number of processors & memory). Is there any documentation that would help us with this?

The SHMSIZE parameter is deduced from the other 4 parameters. Do you recommend setting it to a fixed value?

We use PRO-ISAM for the small files & C-ISAM for those larger than 1GB. Oracle has actually been in use since January 2004, but only for interface purposes.

We constantly run session tests to reproduce the overloading; these run past 10:00 PM. The first test runs with 350 concurrent users and takes 12 minutes; then we kill the ISCOLLECT process without restarting the server. The second test, with the same number of users, takes 5 minutes. Can you tell me why this happens?

Can you tell us where to set the TRACEALL & TIMESTAMP parameters?

At this moment we are unable to upgrade our PRO-IV 4.6 version.

To kill the ISCOLLECT process, we log off all users and then kill the process, without restarting the system. This task takes 1 to 2 minutes. This week we have done it 5 times during working hours.

Please, we need some feedback on how to interpret the ISVIEW file contents correctly. Can you help us?

#15 Joseph Bove

    ProIV Guru

  • Members
  • 756 posts
  • Gender:Male
  • Location:Ramsey, United States

Posted 03 December 2004 - 06:05 PM

Antonio,

The SHMSIZE parameter is deduced from the other 4 parameters. Do you recommend setting it to a fixed value?

Setting it to a fixed value removes all the guesswork about what value it ends up with. For our larger sites, I find it easier to just set the value based on the RAM.

Of course, it is meaningless to set SHMSIZE greater than your overall kernel shmmax. A quick check for that limit on Solaris is sketched below.
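
For example (both commands are standard Solaris; the grep only shows a line if the limit has been explicitly set in /etc/system):

# any explicit setting in /etc/system
grep shmmax /etc/system

# the effective limit as the kernel sees it
sysdef | grep -i shmmax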

Can you tell us where to set the TRACEALL & TIMESTAMP parameters?


You can set these as environment variables in the script that initiates ProIV; a sketch follows. However, please be aware that running TRACEALL with that sort of user load would fill terabytes within a few hours!
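
A minimal sketch (the TIMESTAMP value is an assumption; search this site for its exact usage, and only ever enable this for a single test session, never the full user load):

# in the script that starts ProIV, before the binary is invoked
TRACEALL=15
TIMESTAMP=1   # value assumed; check other threads for exact usage
export TRACEALL TIMESTAMP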

hth,

Joseph


