segmentation error : coredump
Started by Vol Yip, Jun 24 2003 07:29 AM
35 replies to this topic
#1
Posted 24 June 2003 - 07:29 AM
My customer is running AIX 5L and PRO-IV 4.6. Today, with no special event beforehand, a user suddenly got "segmentation error: coredump" when trying to connect to PRO-IV. At the time of the error there were actually 30+ users already on the server, and all of their sessions died after that first "segmentation error: coredump".
I don't know what happened, and since no one was able to log in to PRO-IV again (although the rest of the applications, like Oracle, worked fine), I had to instruct the customer to recycle the server.
What can I check to find out why this segmentation error occurred? Why did it happen so suddenly? I need to take some preventive measures to avoid it happening again.
Thx,
Vol
#2
Posted 24 June 2003 - 03:19 PM
Did you check to see if there were any errors in /etc/isamlog? If you don't have one, create a world-writable one for the future.
Segmentation faults are, unfortunately, common in my environment. You could use a debugger to view the stack in the core dump file (sorry, I don't know AIX, but it would be something like 'adb' or 'dbx' on HP or Sun). This will tell you whether the fault was in ProIV or in some part of your app. This is probably your best bet for tracking it down.
I have also seen this error when someone has stepped on the ProIV shared memory segment; you can view it with ipcs -sob and look for the one whose key starts with 0x69. You can delete it with ipcrm, and it will be recreated when ProIV next runs (make sure all ProIV processes are gone before you attempt this, especially iscollect). This MIGHT save you a reboot next time, IF this is what happened. Of course, you then have to find out who or what stepped on it, and since it is usually world-accessible, that can be a chore.
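A minimal Python sketch of that last suggestion, assuming the log lives at /etc/isamlog and that mode 0666 is what "world-writable" should mean here. `ensure_isamlog` is a hypothetical helper, not part of ProIV, and writing under /etc normally requires root.

```python
import os


def ensure_isamlog(path: str = "/etc/isamlog") -> None:
    """Create the log file if it is missing and make it world-writable (mode 0666)."""
    # Append mode creates the file when absent and never truncates existing entries.
    with open(path, "a"):
        pass
    # Set the mode explicitly, because open() applies the process umask.
    os.chmod(path, 0o666)
```

Calling it on a file that already has entries leaves the contents alone and only adjusts the permissions.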
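To pick out the ProIV segment without eyeballing the whole listing, a small parser can scan `ipcs` output for keys starting with 0x69 (the 'i' prefix that `isview -v` reports later in this thread). This is a sketch that assumes the AIX row layout seen below (type letter, ID, hex key, then the remaining columns); `find_proiv_ipc` is a hypothetical name.

```python
def find_proiv_ipc(ipcs_output: str, prefix: str = "0x69") -> list[tuple[str, int, str]]:
    """Return (type, id, key) for shared memory ('m') and semaphore ('s') rows
    whose hex key starts with `prefix`."""
    matches = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows start with a one-letter type, then the numeric ID and the hex key;
        # header lines ("T ID KEY ...") and section titles are skipped by this check.
        if len(fields) >= 3 and fields[0] in ("m", "s") and fields[1].isdigit():
            key = fields[2].lower()
            if key.startswith(prefix):
                matches.append((fields[0], int(fields[1]), key))
    return matches
```

The IDs it returns are the ones you would pass to ipcrm, and only after every ProIV process has exited.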
hope this helps....
#3
Posted 24 June 2003 - 03:46 PM
Hi Jerry,
You get segmentation faults a lot???
What version of ProIV are you using, and on what platform?
Having used Compaq Tru64 with ProIV 4.6 (and a bit with 5.0) for a long time, we very rarely got segmentation faults.
Any we did get were always explainable, normally due to ProISAM files exceeding their limits.
From what I have seen on other systems, a lot depends on how the system is monitored and treated, and on the procedures for things like machine crashes, imports, upgrades, etc.
Rob D.
#4 Guest_Guest_*
Posted 24 June 2003 - 04:46 PM
Yeah... lots of development/developers, lots of new installs, lots of flavors (VMS, Solaris 2.6 through 8, HP-UX 10 through 11i, Linux), using ProIV v3, v4 and now v5.5 (TG we finally dropped 1.5102!!!), all with ProISAM/C-ISAM/V-CISAM/Oracle 7/8/9... which all adds up to seeing lots of different and often interesting errors.
I usually just troll this site (which is excellent, BTW)... I only responded to this particular post because it caught my SysAdmin eye. I have always disavowed, and will continue to disavow, any knowledge of ProIV.
Regards,
Jerry
#6
Posted 25 June 2003 - 04:17 AM
Hi,
That's a very strange isamlog file... it should be plain text that you can read. Did you get it from /etc?
Looks like something is corrupt...
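One rough way to confirm whether an isamlog file is readable text or binary junk is a byte-level heuristic like the one below. The 95% printable threshold and both function names are arbitrary assumptions for illustration, not anything ProIV defines.

```python
def looks_like_text(data: bytes, threshold: float = 0.95) -> bool:
    """Heuristic: no NUL bytes and at least `threshold` printable ASCII or whitespace."""
    if not data:
        return True  # an empty, freshly created log counts as clean
    if b"\x00" in data:
        return False
    printable = sum(1 for b in data if 32 <= b < 127 or b in (9, 10, 13))
    return printable / len(data) >= threshold


def isamlog_is_text(path: str = "/etc/isamlog") -> bool:
    """Read the whole log in binary mode and apply the heuristic."""
    with open(path, "rb") as f:
        return looks_like_text(f.read())
```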
Try this...
- Get everyone off ProIV.
- Kill iscollect process
- Remove shared memory & semaphores, with the ipcrm command (UNIX)... do you know how to do that??
- Remove the /etc/isamlog file
- Create an empty /etc/isamlog file
- Log back into ProIV
This will create a new clean isamlog file. Then you can monitor and see if anything gets put in there.
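For the monitoring step, a sketch that returns only the lines appended since the last check might look like this. It assumes you keep the byte offset between calls (in a loop, a cron job, etc.) and treats a file that has shrunk as one that was removed and recreated; `read_new_lines` is a hypothetical helper.

```python
import os


def read_new_lines(path: str, offset: int) -> tuple[list[str], int]:
    """Return the lines appended to `path` since byte `offset`, plus the new offset."""
    size = os.path.getsize(path)
    if size < offset:
        offset = 0  # the log was truncated or recreated, so read it from the start
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Decode leniently, since a corrupt log may contain non-ASCII bytes.
    text = data.decode("ascii", errors="replace")
    return text.splitlines(), offset + len(data)
```

A trailing line without a newline is returned as-is, which is acceptable for a log that normally gets one short line per event.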
If you want, attach (or email) me a copy of the /etc/isamdef file, and I can check the settings in there for you.
Rob D
#7
Posted 25 June 2003 - 04:29 AM
Thx Rob,
Attached is the isamdef.
BTW, I killed iscollect and recycled the server yesterday when the segmentation error happened.
The isamlog file was newly generated after the server restart. BTW, do I need to create an empty isamlog file? The attached isamlog file was created by the server from nothing.
How can I remove shared memory and semaphores?
Regards,
Vol
#8
Posted 25 June 2003 - 04:51 AM
If you rebooted yesterday, there is no need to do all that again then.. until you get things sorted.
You just need to remove isamlog and then use the touch command to recreate it, whenever we need to clear it.
Not sure if this is the problem, because I have never set up the isamdef file like this before...
In isamdef, you must specify either MAXUSERS, USRFILES, MAXLOCKS and AVGLOCK, or SHMSIZE.
In your file, you are specifying all of them, and this is incorrect.
I'm not sure what ProIV will do with this.
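A quick way to spot this mixed configuration is to collect the uncommented KEY = value lines and check whether SHMSIZE appears alongside any of the other four. This sketch assumes isamdef has one KEY = value per line with # marking comments (as the advice later in the thread implies); the helper names are hypothetical.

```python
import re

# Per-resource sizing keys: use these OR SHMSIZE, not both.
SIZING_KEYS = {"MAXUSERS", "USRFILES", "MAXLOCKS", "AVGLOCK"}


def active_settings(isamdef_text: str) -> dict[str, str]:
    """Collect KEY = value pairs from lines that are not commented out with '#'."""
    settings = {}
    for line in isamdef_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = re.match(r"([A-Z_]+)\s*=\s*(\S+)", stripped)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def has_sizing_conflict(isamdef_text: str) -> bool:
    """True if SHMSIZE is active together with any of the per-resource keys."""
    keys = set(active_settings(isamdef_text))
    return "SHMSIZE" in keys and bool(SIZING_KEYS & keys)
```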
To determine what you should do, give me the output of the following 2 UNIX commands.
ipcs -a
isview -v
The first command shows all the semaphores and shared memory that the machine is using (ProIV, Oracle and system).
The second command tells me the ID of the ProIV shared memory segment...
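Since the IDs change after every reboot, it can help to read them straight out of `isview -v` rather than copying them by hand. A minimal sketch, assuming the "Shared Memory ID :" and "Semaphore ID :" labels shown in the isview output in this thread; `parse_isview` is a hypothetical helper.

```python
import re


def parse_isview(output: str) -> tuple[int, int]:
    """Extract (shared memory ID, semaphore ID) from `isview -v` output."""
    shm = re.search(r"Shared Memory ID\s*:\s*(\d+)", output)
    sem = re.search(r"Semaphore ID\s*:\s*(\d+)", output)
    if shm is None or sem is None:
        raise ValueError("isview output did not contain both IDs")
    return int(shm.group(1)), int(sem.group(1))
```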
Rob D.
#9
Posted 25 June 2003 - 05:20 AM
Rob,
# ipcs -a
IPC status from /dev/mem as of Wed Jun 25 15:15:35 EET 2003
T ID KEY MODE OWNER GROUP CREATOR CGROUP CBYTES
Message Queues:
q 0 0x4107001c --rw-rw---- root printq root printq 00
T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH
Shared Memory:
m 0 0x0d02d99f --rw-rw-rw- root system root system 0 2
m 1 0xe4663d62 --rw-rw-rw- imnadm imnadm imnadm imnadm 0 6
m 2 0x9308e451 --rw-rw-rw- imnadm imnadm imnadm imnadm 0 6
m 3 0x52e74b4f --rw-rw-rw- imnadm imnadm imnadm imnadm 0 6
m 4 0xc76283cc --rw-rw-rw- imnadm imnadm imnadm imnadm 0 6
m 5 0x298ee665 --rw-rw-rw- imnadm imnadm imnadm imnadm 0 6
m 524294 0x3351a5d0 --rw-rw---- oracle dba oracle dba 0 16
m 7 0x5800c89f --rw-rw-rw- root system root system 0 18
m 8 0xffffffff --rw-rw---- root system root system 0 8
m 10 0x00280267 --rw-r--r-- root system root system 0 1
m 393227 0x6900307c --rw-rw-rw- root system root system 0 4
T ID KEY MODE OWNER GROUP CREATOR CGROUP NSEMS
Semaphores:
s 393216 0xe4663d62 --ra-ra-ra- imnadm imnadm imnadm imnadm 2 186
s 1 0x6202d8e1 --ra-r--r-- root system root system 1 189
s 2 0xffffffff --ra-ra-ra- imnadm imnadm imnadm imnadm 2 186
s 3 0xffffffff --ra-ra-ra- imnadm imnadm imnadm imnadm 2 no6
s 131076 0x4400c89f --ra-ra-ra- root system root system 2 188
s 131077 0x5800c89f --ra-ra-ra- root system root system 1 188
s 393222 0x0102d833 --ra------- root system root system 1 156
s 131079 0x660028c4 --ra-ra-r-- root system root system 1 184
s 8 0x00280269 --ra-ra-ra- root system root system 14 154
s 655369 0x6900307c --ra-ra-ra- root system root system 2 154
/ # isview -v
isview
------
PRO-IV ® View Shared Memory/Semaphore.
Copyright © 1999 PROIV Technology, Inc. All rights reserved.
Unauthorised use strictly prohibited.
PRO-ISAM Version: 13.04.17.00S Date: Oct 10, 2001
Shared memory key starts with 'i' [ hex 0x69 ]
Shared Memory ID : 393227
Semaphore ID : 655369
/ #
Thx
Vol
#10
Posted 25 June 2003 - 09:23 AM
Hi,
That's strange, the ipcs command has not displayed each row correctly.
I think each row is > 80 chars, so maybe you have lost some of the output because your screen is not wrapping the data... unfortunately, the data I need is in the columns beyond 80.
Could you try again? Or maybe your version of Unix needs a different switch to output more data??
Thanks,
Rob D.
#11
Posted 25 June 2003 - 09:42 AM
Hi Rob,
Is that enough?
/ # ipcs -m -q -s -a -b -c -o -p -t
IPC status from /dev/mem as of Wed Jun 25 19:40:08 EET 2003
T ID KEY MODE OWNER GROUP CREATOR CGROUP CBYTES QNUM QBYTES LSPID LRPID STIME RTIME CTIME
Message Queues:
q 0 0x4107001c -Rrw-rw---- root printq root printq 0 0 4194304 0 0 no-entry no-entry 17:54:02
T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
Shared Memory:
m 0 0xe4663d62 --rw-rw-rw- imnadm imnadm imnadm imnadm 0 96 13932 19118 17:54:22 17:54:22 17:54:08
m 1 0x9308e451 --rw-rw-rw- imnadm imnadm imnadm imnadm 0 97948 13932 19118 17:54:22 17:54:22 17:54:08
m 2 0x52e74b4f --rw-rw-rw- imnadm imnadm imnadm imnadm 0 36028 13932 19118 17:54:22 17:54:22 17:54:08
m 3 0xc76283cc --rw-rw-rw- imnadm imnadm imnadm imnadm 0 16384 13932 19118 17:54:22 17:54:22 17:54:08
m 4 0x298ee665 --rw-rw-rw- imnadm imnadm imnadm imnadm 0 2844 13932 19118 17:54:22 17:54:22 17:54:08
m 6 0x5800c89f --rw-rw-rw- root system root system 0 13 4217728 19882 19882 17:54:20 no-entry 17:54:20
m 7 0xffffffff --rw-rw---- root system root system 0 4096 27092 27092 19:38:29 19:38:29 17:54:30
m 8 0x0d02d99f --rw-rw-rw- root system root system 0 1440 7606 23936 19:36:02 19:39:09 17:56:44
m 9 0x00280267 --rw-r--r-- root system root system 0 1048576 10220 10220 18:00:18 19:07:29 18:00:18
T ID KEY MODE OWNER GROUP CREATOR CGROUP NSEMS OTIME CTIME
Semaphores:
s 393216 0xe4663d62 --ra-ra-ra- imnadm imnadm imnadm imnadm 2 17:54:22 17:54:08
s 1 0x6202d8e1 --ra-r--r-- root system root system 1 17:53:21 17:53:21
s 2 0xffffffff --ra-ra-ra- imnadm imnadm imnadm imnadm 2 17:54:22 17:54:08
s 3 0xffffffff --ra-ra-ra- imnadm imnadm imnadm imnadm 2 no-entry 17:54:08
s 131076 0x4400c89f --ra-ra-ra- root system root system 2 17:54:20 17:54:20
s 131077 0x5800c89f --ra-ra-ra- root system root system 1 17:54:20 17:54:20
s 131078 0x00280269 --ra-ra-ra- root system root system 14 18:00:18 18:00:18
s 131079 0x660028c4 --ra-ra-r-- root system root system 1 17:54:27 17:54:27
/ #
Thx
#13
Posted 25 June 2003 - 10:06 AM
If they rebooted, I'll need the isview -v output again, since the IDs will have changed.
Rob D.
#14
Posted 26 June 2003 - 12:41 AM
Rob,
This is the latest info.
/ # ipcs -m -q -s -a -b -c -o -p -t
IPC status from /dev/mem as of Thu Jun 26 10:38:28 EET 2003
T ID KEY MODE OWNER GROUP CREATOR CGROUP CBYTES QNUM QBYTES LSPID LRPID STIME RTIME CTIME
Message Queues:
q 0 0x4107001c -Rrw-rw---- root printq root printq 0 0 4194304 84924 13418 10:38:19 10:38:19 6:30:40
T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
Shared Memory:
m 0 0x0d02d99f --rw-rw-rw- root system root system 0 1440 8810 80900 10:36:31 10:36:33 6:30:15
m 1 0xe4663d62 --rw-rw-rw- imnadm imnadm imnadm imnadm 0 96 14964 19934 6:31:00 6:31:00 6:30:46
m 2 0x9308e451 --rw-rw-rw- imnadm imnadm imnadm imnadm 0 97948 14964 19934 6:31:00 6:31:00 6:30:46
m 3 0x52e74b4f --rw-rw-rw- imnadm imnadm imnadm imnadm 0 36028 14964 19934 6:31:00 6:31:00 6:30:46
m 4 0xc76283cc --rw-rw-rw- imnadm imnadm imnadm imnadm 0 16384 14964 19934 6:31:00 6:31:00 6:30:46
m 5 0x298ee665 --rw-rw-rw- imnadm imnadm imnadm imnadm 0 2844 14964 19934 6:31:00 6:31:00 6:30:46
m 6 0x3351a5d0 --rw-rw---- oracle dba oracle dba 0 17 9613696 20148 84084 10:33:37 10:33:51 6:30:54
m 7 0x5800c89f --rw-rw-rw- root system root system 0 13 4217728 23756 28382 6:31:06 6:31:06 6:30:58
m 8 0xffffffff --rw-rw---- root system root system 0 4096 28382 28382 10:37:06 10:37:06 6:31:09
m 9 0x6900307c --rw-rw-rw- root system root system 0 65532 30716 89372 10:33:37 10:36:17 6:31:29
m 10 0x6a74c5ac --rw-rw---- oracle dba oracle dba 0 10 7651072 5580 41296 6:49:22 6:54:29 6:48:33
m 11 0x115b867c --rw-rw---- oracle dba oracle dba 0 10 7651072 36566 83896 9:09:12 9:14:21 6:48:59
m 12 0x00280267 --rw-r--r-- root system root system 0 1048576 48876 77780 9:04:37 7:28:11 7:01:02
T ID KEY MODE OWNER GROUP CREATOR CGROUP NSEMS OTIME CTIME
Semaphores:
s 393216 0xe4663d62 --ra-ra-ra- imnadm imnadm imnadm imnadm 2 6:31:00 6:30:46
s 1 0x6202d8e1 --ra-r--r-- root system root system 1 6:29:49 6:29:49
s 2 0xffffffff --ra-ra-ra- imnadm imnadm imnadm imnadm 2 6:31:00 6:30:46
s 3 0xffffffff --ra-ra-ra- imnadm imnadm imnadm imnadm 2 no-entry 6:30:46
s 131076 0x4400c89f --ra-ra-ra- root system root system 2 6:30:58 6:30:58
s 131077 0x5800c89f --ra-ra-ra- root system root system 1 6:30:58 6:30:58
s 131078 0x6900307c --ra-ra-ra- root system root system 2 10:38:27 6:31:29
s 131079 0x660028c4 --ra-ra-r-- root system root system 1 6:31:05 6:31:05
s 8 0x0102d833 --ra------- root system root system 1 10:33:36 6:33:31
s 9 0x00280269 --ra-ra-ra- root system root system 14 10:36:14 9:04:37
/ # isview -v
isview
------
PRO-IV ® View Shared Memory/Semaphore.
Copyright © 1999 PROIV Technology, Inc. All rights reserved.
Unauthorised use strictly prohibited.
PRO-ISAM Version: 13.04.17.00S Date: Oct 10, 2001
Shared memory key starts with 'i' [ hex 0x69 ]
Shared Memory ID : 9
Semaphore ID : 131078
Thank you
Regards,
Vol
#15
Posted 26 June 2003 - 03:52 AM
Ok,
That means it seems to have used the SHMSIZE setting to define the size of shared memory.
So, I would comment out (i.e. put a # in front of) the following lines...
MAXUSERS = 128 max # users in system
USRFILES = 100 max # open files per user
MAXLOCKS = 1000 max # of simultaneous rcd locks
AVGLOCK = 128 average size (bytes) of a record lock
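If you prefer to script that edit, the sketch below prefixes # to the named settings and leaves every other line untouched. It assumes the same one-setting-per-line KEY = value layout; `comment_out` is a hypothetical helper, and keeping a backup copy of /etc/isamdef before rewriting it would be prudent.

```python
def comment_out(isamdef_text: str, keys: set[str]) -> str:
    """Prefix '#' to each active KEY = value line whose KEY is in `keys`."""
    result = []
    for line in isamdef_text.splitlines(keepends=True):
        is_comment = line.lstrip().startswith("#")
        name = line.split("=", 1)[0].strip()
        if "=" in line and not is_comment and name in keys:
            result.append("#" + line)
        else:
            result.append(line)  # already commented, blank, or a setting we keep
    return "".join(result)
```

Running it twice is harmless, because lines that are already commented are skipped.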
And then do the steps that I outlined before:
[1] Get everyone off ProIV.
[2] Kill the iscollect process.
[3] Remove shared memory & semaphores with the ipcrm command (UNIX)... do you know how to do that??
[4] Remove the /etc/isamlog file.
[5] Create an empty /etc/isamlog file.
[6] Log back into ProIV.
To remove shared memory etc....
Use 'isview -v' to find the ids....
then use the ipcrm command to remove them.
In your last example, you would use this line...
ipcrm -s 131078 -m 9
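The ipcrm step can also be wrapped so that it only builds the command unless explicitly told to run it. This is a sketch: `ipcrm_command`, `reset_proiv_ipc` and the `dry_run` flag are hypothetical, and the only real command involved is ipcrm with the -s and -m options used above.

```python
import subprocess


def ipcrm_command(shm_id: int, sem_id: int) -> list[str]:
    """Build the ipcrm arguments that remove one semaphore set and one shared memory segment."""
    return ["ipcrm", "-s", str(sem_id), "-m", str(shm_id)]


def reset_proiv_ipc(shm_id: int, sem_id: int, dry_run: bool = True) -> list[str]:
    """Return the command; execute it only when dry_run is False."""
    cmd = ipcrm_command(shm_id, sem_id)
    if not dry_run:
        # Only do this once every ProIV process, including iscollect, has exited.
        subprocess.run(cmd, check=True)
    return cmd
```

With the IDs from the last isview -v output, reset_proiv_ipc(9, 131078) returns the same command as the line above without executing it.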
This will reset shared memory for you, and will save you having to reboot.
Also...
How many users do you normally have on the system?
And is your system Oracle or CISAM? Do you have many ProISAM files?
Depending on your system's size, you may have to increase SHMSIZE.
Normally, ProIV writes to isamlog when it runs out of space in shared memory, usually one line of text explaining when it happened.
Let's first see if this procedure stops the rubbish getting logged into this file.
Rob D.