need test for runaway compute errors

Started by JStateson, April 29, 2010, 12:01:45 AM



JStateson

I just lost 300+ Collatz tasks here when one GPU had a driver error while the other GPU was running OK.  The tasks all errored out within seconds and were refilled from the Collatz queue within minutes.  I only found out when I spotted 5 red highlighted rows in BoincTasks.  Downloads occurred 5 at a time.

I have also had this happen in GPUGRID, but that project has a limit of, as I recall, 12 tasks total in 24 hours.  It can occur with a video driver error or when remote console is inadvertently brought up on a system running a GPU (VNC is OK; remote console stops the GPU).


The Wish:   After 10 compute errors in a row from the same project on the same system, the project is suspended and highlighted, and something flashes.  The 10 would be adjustable.

jjwhalen

Quote from: BeemerBiker on April 29, 2010, 12:01:45 AM
I have also had this happen in GPUGRID, but that project has a limit of, as I recall, 12 tasks total in 24 hours.  It can occur with a video driver error or when remote console is inadvertently brought up on a system running a GPU (VNC is OK; remote console stops the GPU).

I believe GPUGRID's quota is actually 30 per day per processor.

I use Radmin 3.4 for my remote desktop.  Its server coexists well with my (NVIDIA) GPU.

Best wishes.


fred

Quote from: jjwhalen on April 29, 2010, 12:53:40 AM
Quote from: BeemerBiker on April 29, 2010, 12:01:45 AM
I have also had this happen in GPUGRID, but that project has a limit of, as I recall, 12 tasks total in 24 hours.  It can occur with a video driver error or when remote console is inadvertently brought up on a system running a GPU (VNC is OK; remote console stops the GPU).

I believe GPUGRID's quota is actually 30 per day per processor.

I use Radmin 3.4 for my remote desktop.  Its server coexists well with my (NVIDIA) GPU.

Best wishes.

OK.
- Add: Suspend a project after a set number of errors (like 10); exclude VLAR errors from Seti.
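
For anyone wondering how the "10 errors in a row" rule could behave, here is a minimal Python sketch. It is not BoincTasks code: the class name `RunawayErrorMonitor`, its `record` method, and the idea of tagging a failure with a reason string such as "VLAR" are all assumptions made up for illustration. It counts consecutive compute errors per (computer, project) pair, resets the count on a success, ignores excluded error reasons, and signals once when the adjustable threshold is reached.

```python
from collections import defaultdict


class RunawayErrorMonitor:
    """Hypothetical tracker for consecutive compute errors per (host, project)."""

    def __init__(self, threshold=10, excluded_reasons=("VLAR",)):
        self.threshold = threshold                      # adjustable, 10 by default
        self.excluded_reasons = frozenset(excluded_reasons)
        self.streaks = defaultdict(int)                 # (host, project) -> errors in a row
        self.suspended = set()                          # pairs already flagged

    def record(self, host, project, ok, reason=None):
        """Record one finished task.

        Returns True exactly once, at the moment the project on that host
        should be suspended and highlighted; otherwise returns False.
        """
        key = (host, project)
        if ok:
            self.streaks[key] = 0                       # a success breaks the run of errors
            return False
        if reason in self.excluded_reasons:
            return False                                # excluded errors do not count
        self.streaks[key] += 1
        if self.streaks[key] >= self.threshold and key not in self.suspended:
            self.suspended.add(key)
            return True
        return False


if __name__ == "__main__":
    monitor = RunawayErrorMonitor(threshold=10)
    for n in range(1, 13):
        if monitor.record("pc1", "collatz", ok=False):
            print(f"suspend collatz on pc1 after {n} errors in a row")
```

Keeping the count per computer and per project means one bad GPU driver on one machine would not suspend the same project on other machines, and a single successful task in between starts the count over.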