New Action Request

Started by John C, July 18, 2010, 03:58:33 AM

John C

Fred -
I can't begin to express what a benefit these rules have been.  This truly is a game changer in terms of automating BOINC.

OK, after playing with it for several weeks, there are a few things I'd like to propose. 

The first (and most important to me) is a new "pause & resume" action.

Especially for DNETC, tasks will get "stuck", but suspending and then resuming them resets them, so that they usually finish successfully.  Right now, I'm just suspending them when they get to a time mark and then manually restarting them later.  I'd love to have a "pause & resume" action within the rules so that I could automate that.  The key here is that I need to be able to have 2 rules with different thresholds, so that when a task gets to 8:00 it will reset (and it's fine if it does that every minute), but once it gets to 12:00, I need the second rule to engage and suspend (or eventually abort) it entirely.  Are rules processed in the order they are entered?  If so, and I enter the 12:00 rule first and then the 8:00 rule, will a task at 9:00 fail the 12:00 rule and then be processed by the 8:00 rule?  Or would it process both, so that the second rule would resume a task whose suspension was intended to be permanent?  If that were the case, I'd really need an abort action to be added now for this to work.

I'd still like to see a rule that fires when the error count on a project exceeds a trigger point within a given time period.  So, if I had 10 errors on PrimeGrid within a 5 minute period, as an example, then I could suspend that project.
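
To make the error-count idea concrete, here is a rough sketch of a sliding-window counter.  This is only an illustration of the request, not how BT works or would implement it; the class name `ErrorWindow`, its parameters, and the choice to fire once the count reaches the limit are all assumptions.

```python
from collections import deque


class ErrorWindow:
    """Hypothetical helper: tracks error timestamps inside a sliding time window."""

    def __init__(self, max_errors, window_s):
        self.max_errors = max_errors  # e.g. 10 errors
        self.window_s = window_s      # e.g. 300 seconds (5 minutes)
        self.times = deque()          # timestamps of recent errors, oldest first

    def record(self, t):
        """Record an error at time t (seconds); return True if the rule should fire."""
        self.times.append(t)
        # Forget errors that are older than the window.
        while self.times and t - self.times[0] > self.window_s:
            self.times.popleft()
        return len(self.times) >= self.max_errors
```

With `ErrorWindow(10, 300)`, ten errors inside five minutes would fire the rule, while the same ten errors spread over an hour would not.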

And the other rule that I'd still like to see is one that is triggered when a machine loses its connection and fails to reconnect.  I'd only use this one with the external scripting capability to reboot the box that wasn't connecting (and was presumably frozen).  It doesn't happen a lot and is therefore a lower priority, but this rule would really allow a bulletproof operational environment.

Thanks again for all you are doing here.  I keep BT up 24x7.  Good stuff.

fred

I could make Pause/Resume a single action, so it would be one event. That should work most of the time.
The error count is on the list, but these sorts of rules have no relation to the other rules.
The lost-connection rule is also on the list.

John C

Yep, a combined "pause and resume" action is what I was thinking, but we have to account for the occasional task that is so thoroughly messed up that the "reset" doesn't fix it.  In those instances, after we reset it and it still doesn't finish, we need to either permanently suspend or abort it - which I was assuming would be a second rule.
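
As a rough sketch of the two-rule behavior I have in mind (not a description of how BT evaluates rules today), only the rule with the highest threshold the task has reached would act, so the order the rules are entered in would not matter.  The names `mark`, `choose_action`, and the action strings are made up for illustration, and the time marks are compared as plain pairs of numbers so no particular unit is assumed.

```python
def mark(text):
    """Parse a time mark such as "8:00" into a comparable pair of integers."""
    major, minor = text.split(":")
    return (int(major), int(minor))


def choose_action(elapsed, rules):
    """Return the action of the highest threshold that `elapsed` has reached.

    `rules` is a list of (threshold, action) pairs in any order.
    Returns None when no threshold has been reached yet.
    """
    for threshold, action in sorted(rules, key=lambda r: r[0], reverse=True):
        if elapsed >= threshold:
            return action
    return None


# Hypothetical rule set: reset the task from 8:00 on, give up on it at 12:00.
RULES = [
    (mark("8:00"), "pause_resume"),
    (mark("12:00"), "suspend"),
]
```

Here `choose_action(mark("9:00"), RULES)` gives `"pause_resume"`, and once the task reaches 12:00 only `"suspend"` is returned, so the pause and resume rule can no longer undo the suspension.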