rsc_fpops

Started by Tim Norton, August 18, 2010, 05:04:18 PM

Tim Norton

Fred

Could I request a bit more info and possibly a modification to BRS?

As I understand it, if I check the Expert option "Limit rsc_fpops_bounds to avoid -177 errors", RS does the following (should this not be called "extend the bounds of rsc_fpops", as that is what it will do? :))

BRS looks at rsc_fpops for each WU and replaces the <rsc_fpops_bound> value with 500000000000000000.000000, so that a WU cannot hit the bound before it has finished crunching.
I'm not sure if it does this for all WUs or just the ones it reschedules, but I think it is causing BOINC problems, or is at least related to them.
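
To make the above concrete, here is a rough Python sketch of that kind of replacement. It is not BRS's actual code: the function name extend_bounds and the sample workunit are made up for illustration, and only the <rsc_fpops_est> / <rsc_fpops_bound> tags and the 500000000000000000.000000 value come from the description above.

```python
import xml.etree.ElementTree as ET

# Fixed bound quoted in the post; everything else here is illustrative.
FIXED_BOUND = 5e17


def extend_bounds(xml_text, bound=FIXED_BOUND):
    """Set <rsc_fpops_bound> of every <workunit> to one fixed value.

    Returns the modified XML text and the number of workunits changed.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for wu in root.iter("workunit"):
        node = wu.find("rsc_fpops_bound")
        if node is not None:
            node.text = f"{bound:.6f}"
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Hypothetical workunit entry, shaped like a client state fragment.
sample = """<client_state>
<workunit>
    <name>example_wu</name>
    <rsc_fpops_est>30000000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>300000000000000.000000</rsc_fpops_bound>
</workunit>
</client_state>"""

new_xml, count = extend_bounds(sample)
print(count)
print(new_xml)
```

Note that the estimate is left alone here; only the bound changes, which is why the option name reads oddly to me.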

It appears that any WU that has been rescheduled then has inconsistent time-to-complete values, which can vary wildly (depending on the DCF as well).

The WUs from SETI, before rescheduling, have an rsc_fpops_bound 10 times larger than their rsc_fpops_est. Could we not use this estimate, combined with, say, the fpops value from either the app info file (assuming it's there) or values entered into RS? As it stands, rsc_fpops_est will be wrong (too low) for CPU to GPU or (too high) for GPU to CPU, and when BOINC does its calcs this throws out the time to complete.

This would allow more realistic values for the new bounds and estimates, to keep BOINC happy, especially with the server-side DCF for CPU, GPU and soon AP.
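
A small Python sketch of the arithmetic I have in mind. It assumes the client estimates run time roughly as rsc_fpops_est / flops * DCF and aborts with -177 once run time passes rsc_fpops_bound / flops. That is my understanding of the client rather than something checked against its source, and all the numbers and function names are invented for illustration.

```python
def estimated_seconds(rsc_fpops_est, flops, dcf=1.0):
    """Approximate time-to-complete: estimated operations over device speed, scaled by DCF."""
    return rsc_fpops_est / flops * dcf


def max_seconds(rsc_fpops_bound, flops):
    """Approximate run time after which the task would be aborted with -177."""
    return rsc_fpops_bound / flops


# Hypothetical values, not taken from any real WU or host.
est = 3e13          # rsc_fpops_est
bound = 10 * est    # the project's usual 10x bound
cpu_flops = 3e9
gpu_flops = 3e10

cpu_estimate = estimated_seconds(est, cpu_flops)    # 10000 s
gpu_estimate = estimated_seconds(est, gpu_flops)    # 1000 s
cpu_limit = max_seconds(bound, cpu_flops)           # 100000 s
fixed_limit = max_seconds(5e17, cpu_flops)          # bound replaced by the fixed value

print(cpu_estimate, gpu_estimate, cpu_limit, fixed_limit)
```

The same rsc_fpops_est gives a tenfold different estimate depending on which flops figure is applied, so if a moved WU keeps figures meant for the other device, the time to complete will be off by about that ratio.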

With the new VLAR policy it's becoming less necessary to move work around for VLAR reasons, but being able to shuffle work between the CPU and GPU, especially during the outages, is a very useful option.

Hopefully I have explained the above well enough for you to follow what I am suggesting  ???
Thanks

Tim

fred

The -177 check sets the bound on all WUs.
When WUs are rescheduled, the bound is set to a factor of 20 (normally this is 10), but the -177 check will override this.
Rescheduling uses a factor that is the average ratio. If you enable the debug check, you can see the ratio used.
This should correct the estimated time.
But of course rescheduling itself messes up the ratio on the server side.
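
The rules above can be sketched roughly like this in Python. This is an illustration only, not the rescheduler's source: the function name, the direction in which the ratio is applied (estimate multiplied by the ratio), and taking the factor of 20 relative to the new estimate are all assumptions.

```python
# Value the -177 check writes, as quoted earlier in the thread.
FIXED_BOUND = 5e17


def reschedule_values(rsc_fpops_est, avg_ratio, limit_check=False, bound_factor=20):
    """Return (new_est, new_bound) for a rescheduled WU.

    Assumptions: the estimate is multiplied by the average ratio, and the
    bound is bound_factor times the new estimate unless the -177 check is on,
    in which case the fixed bound overrides it.
    """
    new_est = rsc_fpops_est * avg_ratio
    if limit_check:
        new_bound = FIXED_BOUND
    else:
        new_bound = new_est * bound_factor
    return new_est, new_bound


# Hypothetical WU moved with an average ratio of 0.5.
print(reschedule_values(3e13, 0.5))
print(reschedule_values(3e13, 0.5, limit_check=True))
```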