eFMer - BoincTasks and TThrottle forum

BoincTasks For Window, Mac & Linux => Wish List => Topic started by: Joseph Stateson on April 29, 2010, 12:01:45 AM

Title: need test for runaway compute errors
Post by: Joseph Stateson on April 29, 2010, 12:01:45 AM
I just lost 300+ collatz tasks here (http://boinc.thesonntags.com/collatz/results.php?hostid=20354&offset=0&show_names=0&state=5) when one gpu had a driver error while the other gpu was running ok.  The tasks all errored out within seconds and were refilled by the collatz queue within minutes.  I only found out when I spotted 5 red highlighted rows in boinctasks.  Downloads occurred 5 at a time.

I have also had this happen in gpugrid, but that project has a limit of, as I recall, 12 tasks total in 24 hours.  It can occur with a video driver error or when remote console is inadvertently brought up on a system running a gpu (vnc is ok, remote console stops the gpu).


The Wish:   After 10 compute errors in a row for the same project on the same system, the project is suspended and highlighted, and something flashes.  The 10 would be adjustable.
Title: Re: need test for runaway compute errors
Post by: jjwhalen on April 29, 2010, 12:53:40 AM
Quote from: BeemerBiker on April 29, 2010, 12:01:45 AM
I have also had this happen in gpugrid, but that project has a limit of, as I recall, 12 tasks total in 24 hours.  It can occur with a video driver error or when remote console is inadvertently brought up on a system running a gpu (vnc is ok, remote console stops the gpu).

I believe GPUGRID's quota is actually 30 per day per processor.

I use Radmin 3.4 (http://www.radmin.com/?r1=radmin&r2=interface&r3=clt30_menu_radmin) for my remote desktop.  Its server coexists well with my (NVIDIA) GPU.

Best wishes.
Title: Re: need test for runaway compute errors
Post by: fred on April 29, 2010, 12:55:37 AM
Quote from: jjwhalen on April 29, 2010, 12:53:40 AM
Quote from: BeemerBiker on April 29, 2010, 12:01:45 AM
I have also had this happen in gpugrid, but that project has a limit of, as I recall, 12 tasks total in 24 hours.  It can occur with a video driver error or when remote console is inadvertently brought up on a system running a gpu (vnc is ok, remote console stops the gpu).

I believe GPUGRID's quota is actually 30 per day per processor.

I use Radmin 3.4 (http://www.radmin.com/?r1=radmin&r2=interface&r3=clt30_menu_radmin) for my remote desktop.  Its server coexists well with my (NVIDIA) GPU.

Best wishes.
OK.
- Add: Suspend a project after a set number of errors (like 10); exclude VLAR errors from Seti.
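
For anyone curious how the requested check could work, here is a rough sketch of the counting logic only. It is not BoincTasks code: the class name, method names and the "excluded" flag are made up for illustration, and it assumes that a successful task resets the count and that an excluded error (such as a VLAR error) is simply ignored.

```python
from collections import defaultdict


class RunawayErrorMonitor:
    """Counts consecutive compute errors per (host, project) pair.

    Hypothetical helper for illustration; not part of BoincTasks or BOINC.
    """

    def __init__(self, threshold=10):
        self.threshold = threshold        # adjustable, 10 as in the wish
        self.streaks = defaultdict(int)   # (host, project) -> current error streak
        self.suspended = set()            # pairs flagged for suspension

    def record(self, host, project, is_error, excluded=False):
        """Record one finished task.

        Returns True if the project on that host is flagged for suspension.
        """
        key = (host, project)
        if excluded:
            # Assumption: excluded errors neither add to nor reset the streak.
            return key in self.suspended
        if is_error:
            self.streaks[key] += 1
        else:
            self.streaks[key] = 0         # a success breaks the streak
        if self.streaks[key] >= self.threshold:
            self.suspended.add(key)
        return key in self.suspended


if __name__ == "__main__":
    monitor = RunawayErrorMonitor(threshold=3)
    for _ in range(3):
        flagged = monitor.record("host-a", "collatz", is_error=True)
    print("suspend collatz on host-a:", flagged)
```

Keeping the count per host and per project means a bad driver on one machine does not suspend the same project on other machines, which is what the wish asks for. Once a pair is flagged it stays flagged in this sketch; clearing it after the driver is fixed would be a manual step.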