Waitng for database - file - full function
Posted 21 July 2004 - 02:08 PM
One of our clients had their ProIV 5.5, AIX, Oracle 8.1.7 hang completely last night.
All transactions were committing, however, the app hung after the commit and before returning processing to the end user. It did not appear to be a record lock situation because the commits did occur.
The only message of note was:
WAITING FOR DATABASE RESPONSE - FILE - Full function
This message appeared for multiple users.
The client was able to add rows to tables outside of ProIV and COMMIT changes without any problem.
The log file in bdump showed no errors. Also, no system errors showed up.
The database did not shut down cleanly and needed to be aborted. After bringing the database back up, everything seems fine. Of course, the question looms - what caused this and will it happen again.
Oracle dutifully responded that since the message appeared in ProIV, it could not be an Oracle issue...
Has anyone dealt with this?
Posted 21 July 2004 - 02:23 PM
What is the full function mode SQL doing? Just a select?
It sounds like the session was waiting on a record lock, as generally the "WAITING FOR DATABASE RESPONSE" message is because ProIV is doing a "select... for update" (i.e. your file is in change mode) and can't get a lock on the desired record because another process already has that record locked.
The most common cause I have seen where many users have the "WAITING FOR DATABASE RESPONSE" message is because someone is sitting in a maintenance screen in change mode on a record that is regularly opened in change mode by other transactions.
The other is if a really long process is running and has records in change mode without a commit strategy.
You might want to do a query in Oracle if it happens again to list record locks.
Posted 21 July 2004 - 02:54 PM
The problem was occuring all over the place. I went into a table that no one else would ever use an added a row to it. The row added, but the session hung after the commit. I know this because I killed the session and the new row was in the table.
When I did this, I did not get the record lock message, but I did hang.
I suspect the record lock message may be a red herring. However, it is interesting in that it did not return a table name, but instead the words Full function.
Posted 21 July 2004 - 03:42 PM
Everytime I've seen a record lock in Oracle, the message WAITING FOR DATABASE RESPONSE - FILE - is followed by a table name.
If you are using embedded full function mode SQL in the function, then it will say "WAITING FOR DATABASE RESPONSE - FILE - Full Function" if it is waiting on a record lock, or if your SQL query is very slow.
Are you using any embedded SQL in your functions?
Posted 22 July 2004 - 10:34 AM
Had a quick search for "commit does not return" on Metalink and Google and there are results but I didn't quickly find anything that seemed clearly relevant to a problem with ProIV.
The following link seemed kind of close but it was with V7.. and the guy didn't seem to have got any answers!
My (rather obvious) guess would be you are hitting some (configurable) resource limit in Unix or Oracle but I don't know where to suggest you start looking. If that's the case though, I'm kind of surprised it's not more of a "well-known" problem.
Did you get to the bottom of it already?
I'd also suspect the "full function" bit could be a red herring and that ProIV might be showing that because it has a "generic" waiting message that simplistically says "full function" whenever there's no obviously relevant filedef - and there is no particular filedef associated with a commit call..
Edited by Richard Bassett, 22 July 2004 - 10:36 AM.
Posted 22 July 2004 - 11:25 AM
Just did some reading up in my Oracle Admin textbook.
Steps in a commit :
1 Server process generates a SCN number (System Change Number) and is assigned a rollback segment. Then the rollback segment is marked that the transaction is committed
2 The LGWR process writes the redo log buffers to the online redo log files along with the SCN number
3 The server process releases locks held on rows and tables
4 The user is notified that the commit is complete
5 The server process marks the transaction as complete
So i would think either the LGWR process is not running or not working for some unkown reason, your redo log file is inaccesible or corrupt, or a redo log switch is being attempted and can't be done for some reason. Best to probably get a dba in there and sort it out
Posted 22 July 2004 - 01:58 PM
Is it possible redo log archiving fell behind for some reason and hence the database could not switch online redo logs because the next log group in the cycle still had not been [completely] archived?
a redo log switch is being attempted and can't be done for some reason
This is just a thought - I'm not sure I've ever actually seen what happens in that scenario - I would have hoped you'd see a meaningful message or active warnings somewhere!
Posted 22 July 2004 - 02:41 PM
We have very little embedded SQL. Based on the number of places where users were encountering the error, I doubt that the embedded SQL was in play. Also, our use of embedded SQL is largely limited to reports and look ups, so it should not be creating lock problems.
Posted 22 July 2004 - 02:50 PM
Is it possible redo log archiving fell behind for some reason and hence the database could not switch online redo logs because the next log group in the cycle still had not been [completely]archived?
It's possible. Of course, the problem (thankfully) has not recurred since they bounced the database. Of course, the concern is that whatever the root cause of the issue is was most likely not addressed by simply bouncing the database.
There were no meaningful errors anywhere (bdump, root's mail, system errors, etc.)
What I may have to check though is the possibility that the SQLTIMEOUT could have been a factor. I saw an issue on SQL Server that forced me to yank SQLTIMEOUT.
The client had a large database (>10 gig). The problem was that some transactions were forcing a table to autoextend. When the table could not autoextend within 3 seconds, ProIV was immediately rolling back the operation - without error.
The user lost their transaction every time. SQL Server never reported an error, because no error had occurred.
Perhaps there is a similar potential for problem with setting this variable on the Oracle side. A hypothetical transaction might force Oracle to cycle its redo logs which it cannot complete within 3 seconds.
Does anyone else set the SQLTIMEOUT for Oracle?
Posted 22 July 2004 - 03:06 PM
That would indicate excessive activity around cycling logfiles, etc, etc.
of the poster and do not represent those of any organisation.
Posted 22 July 2004 - 03:37 PM
You mention checking some logs - di you check the oracle alert log?
The oracle log I checked was the one in /u01/app/admin/database name/bdump/alert_database_name.log
It did not show anything happening around the time of the hang up. In fact the greatest majority of what was in that log was just the fact that it was cycling through redo logs.
If there are other logs that may be of help, I'd be happy to know what / where they are.
Posted 23 July 2004 - 08:19 AM
Yes it is possible that its unable to write the redo log, especially if over the network to another machine/backup server. Could also be that the redo log switch is happening to fast for Oracle to catch up. For eg switches to redo log 1 but while writing to redo log 1, redo log 2 needs to be written as well etc etc. This can also happen if archiving of redo logs is setup
Bad idea on the autoextend, resource intensive as you have found out. Rather add more datafiles. With regards to the redo logs, how often are they switching ? Every 5 minutes, once a day ? Also check the alert file for any trace files being written to in udump.
Probably best to get a certified dba in there to do some performance tuning on your database
Reply to this topic
0 user(s) are reading this topic
0 members, 0 guests, 0 anonymous users