eFMer - BoincTasks and TThrottle forum

BoincTasks For Window, Mac & Linux => Beta Testing => Topic started by: Corsair on June 12, 2010, 02:50:49 PM

Title: BT 0.60
Post by: Corsair on June 12, 2010, 02:50:49 PM
hi all,

1 - After exiting BT, the window size and position are not restored on restart.
2 - Rule editor window: the fields Type, Operator, Value and Time are not translated.
3 - In the Rule editor window, a time entered as e.g. 1d,0:12:0 or 00:12:00 is converted to 00:00:12 ??

the rules are promising.
Title: Re: BT 0.60
Post by: fred on June 12, 2010, 03:01:11 PM
Quote from: Corsair on June 12, 2010, 02:50:49 PM
hi all,

1 - After exiting BT, the window size and position are not restored on restart.
2 - Rule editor window: the fields Type, Operator, Value and Time are not translated.
3 - In the Rule editor window, a time entered as e.g. 1d,0:12:0 or 00:12:00 is converted to 00:00:12 ??

the rules are promising.
1 - Works for me; use File -> Exit to store these settings.
2 - Not quite sure if I will keep this, the way it is.
3 - What time, what type are you using? The one in "value" or the one in "time". Press the check button to see what BT thinks it is.

The one in time should normally be short as it is the trigger time.
Title: Re: BT 0.60
Post by: Corsair on June 12, 2010, 03:22:47 PM
Quote from: fred on June 12, 2010, 03:01:11 PM
Quote from: Corsair on June 12, 2010, 02:50:49 PM
hi all,

1 - After exiting BT, the window size and position are not restored on restart.
2 - Rule editor window: the fields Type, Operator, Value and Time are not translated.
3 - In the Rule editor window, a time entered as e.g. 1d,0:12:0 or 00:12:00 is converted to 00:00:12 ??

the rules are promising.
1 - Works for me; use File -> Exit to store these settings.
2 - Not quite sure if I will keep this, the way it is.
3 - What time, what type are you using? The one in "value" or the one in "time". Press the check button to see what BT thinks it is.

The one in time should normally be short as it is the trigger time.

1 - done, working.
2 - will see ;)
3 - PM sent with screen shot.

but I really do not understand the rule, e.g.
I want to make a rule for computer "XXXX", this computer is:
- performing work for project A with application AA (GPU application) warehouse full and activated.
- project B application BB (GPU also) warehouse empty and deactivated, but project not(/or) suspended.
- project A run out of work and application AA is not running for HH:MM:SS
- activate work permit for project B app. BB, and ask for work for DD,HH:MM:SS
- deactivate work permit for B-BB and continuing asking for work of A-AA.
- finish work for B-BB and do same check again, if A-AA warehouse full don't ask for work.
- suspend B-BB after finishing the active work unit and continuing with A-AA
- check deadlines to finish in time work B-BB.

Could this be possible?? As you can see, it is a way of working with two GPU projects, but with a preference for one.

Title: Re: BT 0.60
Post by: fred on June 12, 2010, 05:15:48 PM
Quote from: Corsair on June 12, 2010, 03:22:47 PM

but I really do not understand the rule, e.g.
I want to make a rule for computer "XXXX", this computer is:
- performing work for project A with application AA (GPU application) warehouse full and activated.
- project B application BB (GPU also) warehouse empty and deactivated, but project not(/or) suspended.
- project A run out of work and application AA is not running for HH:MM:SS
- activate work permit for project B app. BB, and ask for work for DD,HH:MM:SS
- deactivate work permit for B-BB and continuing asking for work of A-AA.
- finish work for B-BB and do same check again, if A-AA warehouse full don't ask for work.
- suspend B-BB after finishing the active work unit and continuing with A-AA
- check deadlines to finish in time work B-BB.

Could this be possible?? As you can see, it is a way of working with two GPU projects, but with a preference for one.
These kinds of rules are impossible, because BOINC won't allow it.
And BOINC should take care of this automatically.

The rules are intended for tasks that are not behaving properly.
For example: they run too long, use too much CPU time, or the temperature of the GPU drops unexpectedly. In short, abnormal behavior.
Title: Re: BT 0.60
Post by: John C on June 12, 2010, 09:22:25 PM
The rule values are static, which is useful in many cases but doesn't allow rules such as when progress has stalled.  Would really like some way to indicate that progress % on an active task stays the same (regardless of value) for a period of time.  Currently no way to enter that condition.
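The stalled-progress condition asked for here could be sketched roughly as follows. This is only an illustration, not BoincTasks code: the class name `StallDetector`, the sampling interface and the use of plain seconds for time are all assumptions.

```cpp
#include <cassert>

// Hypothetical sketch: reports a stall when the progress percentage of a
// task has not changed for at least `stall_seconds`, whatever its value.
class StallDetector {
public:
    explicit StallDetector(double stall_seconds)
        : stall_seconds_(stall_seconds) {
        assert(stall_seconds > 0);
    }

    // Feed one sample (progress in percent, sample time in seconds).
    // Returns true once progress has stayed the same for stall_seconds.
    bool update(double progress_percent, double now_seconds) {
        if (!has_sample_ || progress_percent != last_progress_) {
            // First sample or progress moved: restart the stall timer.
            last_progress_ = progress_percent;
            last_change_ = now_seconds;
            has_sample_ = true;
            return false;
        }
        return now_seconds - last_change_ >= stall_seconds_;
    }

private:
    double stall_seconds_;
    double last_progress_ = 0.0;
    double last_change_ = 0.0;
    bool has_sample_ = false;
};
```

One detector per active task would be needed, and the timer restarts whenever the reported percentage changes at all.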

Would be very nice to have rules for a lost connection.  Currently no way to indicate that.

Not necessary for first release, but might be nice to have a counter for tasks that fail and allow rules for thresholds.  E.g., if I get 5 tasks that fail for project X within 60 seconds, then suspend that project (which would be a great rule to have for anyone who has ever tried to use Remote Desktop Connection to a GPU crunching box).  But it would also be useful if I suspend projects that fail but want to watch for repeat occurrences that would indicate a full reboot is needed.
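A failure counter of this kind ("5 failures within 60 seconds") is essentially a sliding time window. A minimal sketch, assuming failure times are available in seconds; `FailureWindow` and its interface are hypothetical names, not part of BoincTasks:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical sketch of a per-project failure counter with a time window.
class FailureWindow {
public:
    FailureWindow(std::size_t threshold, double window_seconds)
        : threshold_(threshold), window_seconds_(window_seconds) {
        assert(threshold > 0);
    }

    // Record one failed task at time `now_seconds`. Returns true when at
    // least `threshold` failures fall inside the last `window_seconds`.
    bool record_failure(double now_seconds) {
        failures_.push_back(now_seconds);
        // Drop failures that are older than the window.
        while (now_seconds - failures_.front() > window_seconds_) {
            failures_.pop_front();
        }
        return failures_.size() >= threshold_;
    }

private:
    std::size_t threshold_;
    double window_seconds_;
    std::deque<double> failures_;
};
```

With `FailureWindow window(5, 60.0)`, the fifth failure inside one minute returns true, which is the point where a rule could suspend the project.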

More than anything else, I really am hoping next beta will include ability to call external script when rule is triggered so that I will be able to automate reboots.  That's my number 1 need in terms of actions off the rules.

While I have SNMP controllable PDUs and wouldn't need or use this feature, if you REALLY want to get fancy then you could build a reboot function into TThrottle to allow rule generated remote reboots (so long as the entire machine wasn't frozen).
Title: Re: BT 0.60
Post by: John C on June 12, 2010, 09:24:53 PM
Noticed this in .59 also, but any reason why File / Exit takes so long to process?
Title: Re: BT 0.60
Post by: jjwhalen on June 12, 2010, 09:49:33 PM
Quote
- Fixed: History: Should be fixed, but it wasn't: Einstein@Home/3.02 Global Correlations S5 search #1 (S5GCESSE2) still showed as GPU task.
Quote
Solved in 0.60 hopefully.  ;D

OK, entries already in History are now displaying correctly 8)  I've reattached 1 host to Einstein and will test with a new (S5GCESSE2) task also.
Title: Re: BT 0.60
Post by: John C on June 13, 2010, 04:56:25 AM
Not exactly sure what's going on here, but look at the fourth task down in this screen cap.  Freehal is showing itself as 1400% complete and the bar extends to the left.

(http://www.chastain.us/media/temp/bt60.png)

About a minute later, it now shows that it has been running for 13 minutes, has 39 minutes to go, and is therefore 0.3% done.

(http://www.chastain.us/media/temp/bt60x2.png)

As of now, it has been running 28 minutes; it has 1:16 to go; and it is 5.6% complete.  Not sure how it will end up, but that math isn't adding up in my mind and the first graph is definitely off.
Title: Re: BT 0.60
Post by: John C on June 13, 2010, 04:59:28 AM
Also, that reminds me, when I move the computer name to the left where I really want it, all I get is the following (Note the names)

EDIT:  I figured this one out.  I had added "Computer" to be displayed early but also had left it at the end.  When it is listed twice, you display "initializing" for every data element as shown below.  As soon as I changed the second instance to "HIDE" it worked fine. 

(http://www.chastain.us/media/temp/bt60x3.png)
Title: Re: BT 0.60
Post by: John C on June 13, 2010, 06:41:23 AM
Quote from: John C on June 12, 2010, 09:22:25 PM
More than anything else, I really am hoping next beta will include ability to call external script when rule is triggered so that I will be able to automate reboots.  That's my number 1 need in terms of actions off the rules.

One more thought... while designing, it might be nice to consider eventual support for 2 actions from the same trigger.  I can envision instances where I would want to kill the task and then reboot the machine.  I'm thinking that's likely not especially difficult as long as you plan for it from the start.

Thanks.
Title: Re: BT 0.60
Post by: fred on June 13, 2010, 08:20:48 AM
Quote from: John C on June 12, 2010, 09:22:25 PM
1) The rule values are static, which is useful in many cases but doesn't allow rules such as when progress has stalled.  Would really like some way to indicate that progress % on an active task stays the same (regardless of value) for a period of time.  Currently no way to enter that condition.

Would be very nice to have rules for a lost connection.  Currently no way to indicate that.

Not necessary for first release, but might be nice to have a counter for tasks that fail and allow rules for thresholds.  E.g., if I get 5 tasks that fail for project X within 60 seconds, then suspend that project (which would be a great rule to have for anyone who has ever tried to use Remote Desktop Connection to a GPU crunching box).  But it would also be useful if I suspend projects that fail but want to watch for repeat occurrences that would indicate a full reboot is needed.

2) More than anything else, I really am hoping next beta will include ability to call external script when rule is triggered so that I will be able to automate reboots.  That's my number 1 need in terms of actions off the rules.

3) While I have SNMP controllable PDUs and wouldn't need or use this feature, if you REALLY want to get fancy then you could build a reboot function into TThrottle to allow rule generated remote reboots (so long as the entire machine wasn't frozen).
I'm working on the rules for just a week now. ;D
1) Is on the list.
2) Will include starting any bat or exe, with parameters.
3) TThrottle only allows local reboot.
Title: Re: BT 0.60
Post by: fred on June 13, 2010, 08:22:43 AM
Quote from: John C on June 12, 2010, 09:24:53 PM
Noticed this in .59 also, but any reason why File / Exit takes so long to process?
The exit has to wait for all threads to close. They may be busy communicating. So this may take a couple of seconds.
The total wait is maxed out at about 2 minutes, then BT tries to shut down everything.
Title: Re: BT 0.60
Post by: fred on June 13, 2010, 08:27:26 AM
Quote from: John C on June 13, 2010, 04:56:25 AM
Not exactly sure what's going on here, but look at the fourth task down in this screen cap.  Freehal is showing itself as 1400% complete and the bar extends to the left.

About a minute later, it now shows that it has been running for 13 minutes, has 39 minutes to go, and is therefore 0.3% done.

As of now, it has been running 28 minutes; it has 1:16 to go; and it is 5.6% complete.  Not sure how it will end up, but that math isn't adding up in my mind and the first graph is definitely off.
Ah Freehal, I never have been able to make anything out of these applications.
The app should report the progress to BOINC and to BT, but there is probably something wrong with the Freehal application.
For the rules I removed the 100% limit and that causes the bar to go a bit.... weird. I will correct the progress bar.
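One way to keep the unlimited value for the rules while drawing a sane bar is to clamp only at display time. A minimal sketch, assuming the bar is drawn as a pixel width; `bar_fill_width` is a hypothetical helper and not the actual BoincTasks drawing code:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch: the raw progress value stays untouched for rule
// checks; only the value used for drawing is clamped to 0..100 percent.
int bar_fill_width(double progress_percent, int full_width_pixels) {
    assert(full_width_pixels >= 0);
    const double shown = std::min(100.0, std::max(0.0, progress_percent));
    return static_cast<int>(shown / 100.0 * full_width_pixels);
}
```

A task reporting 1400% would then fill the bar completely instead of overflowing it, while a rule could still see the 1400.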
Title: Re: BT 0.60
Post by: John C on June 13, 2010, 08:37:34 AM
Quote from: fred on June 13, 2010, 08:20:48 AM
3) TThrottle only allows local reboot.

Sorry, I wasn't clear.  Yes, TThrottle supports local reboot.  But TThrottle is communicating with BoincTasks and sending temperature information.  So what if BT could send a message back to TT instructing it to reboot that local machine?  Then if BT saw a string of jobs going south, it could just instruct that machine to reboot itself by using TT as the local reboot agent.

Again, this is lower priority in my mind. I already have a better way to reboot.  But it just seemed like an idea that others might find useful and that would even better integrate your tools.

And PS... I know you've only been working on the rules for a week, but I'm pretty excited by where this is going.  This is already my favorite monitoring tool and it is about to fill a huge need I have for active management as well.  That's huge.  So I'm having a hard time keeping my enthusiasm contained.  :-)
Title: Re: BT 0.60
Post by: fred on June 13, 2010, 08:49:15 AM
Quote from: John C on June 13, 2010, 06:41:23 AM
Quote from: John C on June 12, 2010, 09:22:25 PM
More than anything else, I really am hoping next beta will include ability to call external script when rule is triggered so that I will be able to automate reboots.  That's my number 1 need in terms of actions off the rules.

One more thought... while designing, it might be nice to consider eventual support for 2 actions from the same trigger.  I can envision instances where I would want to kill the task and then reboot the machine.  I'm thinking that's likely not especially difficult as long as you plan for it from the start.

Thanks.
2 identical rules would do this, one that suspends a task and one that starts a bat file. I don't want to add a kill, too dangerous for the moment and a bit permanent.
Title: Re: BT 0.60
Post by: fred on June 13, 2010, 08:51:41 AM
Quote from: John C on June 13, 2010, 04:59:28 AM
Also, that reminds me, when I move the computer name to the left where I really want it, all I get is the following (Note the names)

EDIT:  I figured this one out.  I had added "Computer" to be displayed early but also had left it at the end.  When it is listed twice, you display "initializing" for every data element as shown below.  As soon as I changed the second instance to "HIDE" it worked fine. 

I added this to the to-do list, check for double entries.
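A check for double entries in the column setup could look roughly like this. It is a sketch under assumptions: the column order is taken to be a list of names, and "HIDE" is treated as a placeholder that may legitimately appear more than once; `duplicate_columns` is a hypothetical helper.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: returns every column name that is selected more
// than once, each reported a single time, in order of first repetition.
std::vector<std::string> duplicate_columns(
        const std::vector<std::string>& columns) {
    std::set<std::string> seen;
    std::set<std::string> reported;
    std::vector<std::string> duplicates;
    for (const std::string& name : columns) {
        if (name == "HIDE") {
            continue;  // assumption: hidden slots may repeat freely
        }
        const bool first_time = seen.insert(name).second;
        if (!first_time && reported.insert(name).second) {
            duplicates.push_back(name);
        }
    }
    return duplicates;
}
```

A non-empty result (for example "Computer" listed both early and at the end) could then be shown as a warning instead of every cell displaying "initializing".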
Title: Re: BT 0.60
Post by: John C on June 13, 2010, 09:06:49 AM
Quote from: fred on June 13, 2010, 08:49:15 AM
Quote from: John C on June 13, 2010, 06:41:23 AM
Quote from: John C on June 12, 2010, 09:22:25 PM
More than anything else, I really am hoping next beta will include ability to call external script when rule is triggered so that I will be able to automate reboots.  That's my number 1 need in terms of actions off the rules.

One more thought... while designing, it might be nice to consider eventual support for 2 actions from the same trigger.  I can envision instances where I would want to kill the task and then reboot the machine.  I'm thinking that's likely not especially difficult as long as you plan for it from the start.

Thanks.

2 identical rules would do this, one that suspends a task and one that starts a bat file. I don't want to add a kill, too dangerous for the moment and a bit permanent.

I suppose it depends on how you evaluate the rules. Rules can't be applied to suspended tasks or else they will fire off repeatedly against the task that was suspended.  Rules really only make sense against active tasks.  So, if the first rule is the suspend rule, then the second rule wouldn't fire unless you cache the state and run through all rules against the cached rather than the actual state.  Likewise if the first rule caused the reboot, then potentially the program wouldn't receive the instruction to suspend (unless someone built a slight delay into the batch file that triggered the reboot).

Anyway, using 2 rules is possible, it just creates a little more complexity.  I've already been thinking though that the same multi-rule approach could be used if we don't have batch parameters.  A different rule would just need to be set up for each server with each triggering a different batch file that was specific to that server.

As for kill (technically task abort), I'd prefer to have it.  The goal of this is to have automation and suspend just time-shifts the clean-up rather than truly automating it.  Suspend is the smart way to go initially as the user tests his or her rules, but after they are tested it would be better to be able to convert to an abort - better for the user and better for the projects so that they would be notified of the abort and didn't have to wait for those suspended jobs to time out by users who got lazy and just allowed the deadlines to take care of their cleanup for them.  Of course, all of this impacts the "use 2 rules" approach and potentially further complicates that.  For my needs, I'll just put a delay in the reboot script and that will solve my known need.  I'm just wondering about the 2 rule scenario that we haven't yet considered because surely there are some.

Good stuff.  Thanks again for all you are doing on this.
Title: Re: BT 0.60
Post by: fred on June 13, 2010, 01:41:29 PM
Quote from: John C on June 13, 2010, 09:06:49 AM
Quote from: fred on June 13, 2010, 08:49:15 AM
Quote from: John C on June 13, 2010, 06:41:23 AM
Quote from: John C on June 12, 2010, 09:22:25 PM
More than anything else, I really am hoping next beta will include ability to call external script when rule is triggered so that I will be able to automate reboots.  That's my number 1 need in terms of actions off the rules.

One more thought... while designing, it might be nice to consider eventual support for 2 actions from the same trigger.  I can envision instances where I would want to kill the task and then reboot the machine.  I'm thinking that's likely not especially difficult as long as you plan for it from the start.

Thanks.

2 identical rules would do this, one that suspends a task and one that starts a bat file. I don't want to add a kill, too dangerous for the moment and a bit permanent.

I suppose it depends on how you evaluate the rules. Rules can't be applied to suspended tasks or else they will fire off repeatedly against the task that was suspended.  Rules really only make sense against active tasks.  So, if the first rule is the suspend rule, then the second rule wouldn't fire unless you cache the state and run through all rules against the cached rather than the actual state.  Likewise if the first rule caused the reboot, then potentially the program wouldn't receive the instruction to suspend (unless someone built a slight delay into the batch file that triggered the reboot).

Anyway, using 2 rules is possible, it just creates a little more complexity.  I've already been thinking though that the same multi-rule approach could be used if we don't have batch parameters.  A different rule would just need to be set up for each server with each triggering a different batch file that was specific to that server.

As for kill (technically task abort), I'd prefer to have it.  The goal of this is to have automation and suspend just time-shifts the clean-up rather than truly automating it.  Suspend is the smart way to go initially as the user tests his or her rules, but after they are tested it would be better to be able to convert to an abort - better for the user and better for the projects so that they would be notified of the abort and didn't have to wait for those suspended jobs to time out by users who got lazy and just allowed the deadlines to take care of their cleanup for them.  Of course, all of this impacts the "use 2 rules" approach and potentially further complicates that.  For my needs, I'll just put a delay in the reboot script and that will solve my known need.  I'm just wondering about the 2 rule scenario that we haven't yet considered because surely there are some.

Good stuff.  Thanks again for all you are doing on this.
Rules are only applied on running tasks. But they are captured at the same time. So they will execute at the same time (sequentially but close).

Probably later I will include the abort for advanced users. ;D Because this is potentially very dangerous, you could delete all tasks with a wrong rule.
And there are always things we didn't consider in advance.

And to be sure, every rule has its own event and can start its own external program; you only need the parameters if you want to know the exact cause of the rule.
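The "captured at the same time, executed sequentially" behavior can be pictured as a two-phase evaluation: first all rules are checked against one unchanged snapshot of the task, then the matching actions are carried out in order. The sketch below only illustrates that idea; `TaskSnapshot`, `Rule` and `triggered_actions` are hypothetical names and the real BoincTasks internals may differ.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical snapshot of one task at the moment the rules are checked.
struct TaskSnapshot {
    std::string name;
    bool running;
    double elapsed_seconds;
};

// Hypothetical rule: a condition plus the action to run when it matches.
struct Rule {
    std::string action;
    std::function<bool(const TaskSnapshot&)> condition;
};

// Phase 1: decide which rules match, using the same snapshot for all of
// them. Phase 2 (the caller) executes the returned actions one by one, so
// suspending the task in the first action cannot hide it from the second.
std::vector<std::string> triggered_actions(const TaskSnapshot& task,
                                           const std::vector<Rule>& rules) {
    std::vector<std::string> actions;
    if (!task.running) {
        return actions;  // rules are only applied to running tasks
    }
    for (const Rule& rule : rules) {
        if (rule.condition(task)) {
            actions.push_back(rule.action);
        }
    }
    return actions;
}
```

With two identical conditions, one tied to a suspend and one to a bat file, both actions come back from the same check even though the suspend is carried out first.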
Title: Re: BT 0.60
Post by: fred on June 13, 2010, 02:37:43 PM
I will add  the possibility of 1 internal and 1 external event. So starting a program and suspending a task.
Title: Re: BT 0.60
Post by: John C on June 13, 2010, 04:02:10 PM
Quote from: fred on June 13, 2010, 02:37:43 PM
I will add  the possibility of 1 internal and 1 external event. So starting a program and suspending a task.

Like it.  Thanks.
Title: Re: BT 0.60
Post by: jjwhalen on June 13, 2010, 06:38:52 PM
Quote from: jjwhalen on June 12, 2010, 09:49:33 PM
Quote
- Fixed: History: Should be fixed, but it wasn't: Einstein@Home/3.02 Global Correlations S5 search #1 (S5GCESSE2) still showed as GPU task.
Quote
Solved in 0.60 hopefully.  ;D

OK, entries already in History are now displaying correctly 8)  I've reattached 1 host to Einstein and will test with a new (S5GCESSE2) task also.

1) Confirming that newly completed (S5GCESSE2) WUs are displaying correctly in History; this issue looks to be resolved ;D
2) I'm following the discussion in this thread of the first implementation of Rules.  I agree with John C that the new functionality is very promising 8)
Title: Re: BT 0.60
Post by: idahofisherman on June 14, 2010, 02:28:56 AM
Love the new rules area.

I have a problem with the minute field (middle field) of the timer field.  Whatever I place in the second field ends up in the third field.  Therefore I am unable to specify minutes.
Title: Re: BT 0.60
Post by: fred on June 14, 2010, 07:03:54 AM
Quote from: idahofisherman on June 14, 2010, 02:28:56 AM
Love the new rules area.

I have a problem with the minute field (middle field) of the timer field.  Whatever I place in the second field ends up in the third field.  Therefore I am unable to specify minutes.
A problem that I can't reproduce for some reason. ???
Try something like 0d,00:30:00 or 30:00
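For reference, the input forms mentioned in this thread (0d,00:30:00, 30:00, 1d,0:12:0) can all be read with one small routine that treats the last field as seconds. This is a sketch of such a parser, not the one BoincTasks uses; `parse_duration` and its lenient handling of odd input are assumptions.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical sketch: parses "Nd,HH:MM:SS", "HH:MM:SS", "MM:SS" or "SS"
// into a number of seconds. Returns false if the text cannot be parsed.
bool parse_duration(const std::string& text, std::int64_t& seconds_out) {
    try {
        std::int64_t days = 0;
        std::string clock = text;
        const std::string::size_type marker = text.find("d,");
        if (marker != std::string::npos) {
            days = std::stoll(text.substr(0, marker));
            clock = text.substr(marker + 2);
        }
        // Fields are right-aligned: the last one is always seconds.
        std::int64_t total = 0;
        int fields = 0;
        std::istringstream stream(clock);
        std::string field;
        while (std::getline(stream, field, ':')) {
            total = total * 60 + std::stoll(field);
            ++fields;
        }
        if (fields < 1 || fields > 3) {
            return false;
        }
        seconds_out = days * 86400 + total;
        return true;
    } catch (const std::exception&) {
        return false;  // std::stoll rejected a field
    }
}
```

Under this reading both 0d,00:30:00 and 30:00 give 1800 seconds, so a correct editor should show 30 in the minutes position for either input.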
Title: Re: BT 0.60
Post by: jjwhalen on June 14, 2010, 11:27:05 AM
Quote from: fred on June 14, 2010, 07:03:54 AM
Quote from: idahofisherman on June 14, 2010, 02:28:56 AM
Love the new rules area.

I have a problem with the minute field (middle field) of the timer field.  Whatever I place in the second field ends up in the third field.  Therefore I am unable to specify minutes.
A problem that I can't reproduce for some reason. ???
Try something like 0d,00:30:00 or 30:00

I also am not seeing this problem, either in the Time or the Value field of the Rule Editor dialog ???
Title: Re: BT 0.60
Post by: Corsair on June 14, 2010, 11:48:30 AM
Quote from: idahofisherman on June 14, 2010, 02:28:56 AM
Love the new rules area.

I have a problem with the minute field (middle field) of the timer field.  Whatever I place in the second field ends up in the third field.  Therefore I am unable to specify minutes.

the same happens to me too, any value inserted in "minutes" appears in the "seconds" field and as you state I've tried:
xd,xx:xx:00
xx:00
xx:xx:00.

that's what I posted in the initial message.
Title: Re: BT 0.60
Post by: John C on June 14, 2010, 12:10:25 PM
Quote from: jjwhalen on June 14, 2010, 11:27:05 AM
Quote from: fred on June 14, 2010, 07:03:54 AM
Quote from: idahofisherman on June 14, 2010, 02:28:56 AM
Love the new rules area.

I have a problem with the minute field (middle field) of the timer field.  Whatever I place in the second field ends up in the third field.  Therefore I am unable to specify minutes.
A problem that I can't reproduce for some reason. ???
Try something like 0d,00:30:00 or 30:00

I also am not seeing this problem, either in the Time or the Value field of the Rule Editor dialog ???

Odd.  Like you (but unlike the others) I am not having this problem.  For whatever it is worth, I am running Win 7 x64 bit.
Title: Re: BT 0.60
Post by: fred on June 14, 2010, 02:01:04 PM
Quote from: John C on June 14, 2010, 12:10:25 PM
Quote from: jjwhalen on June 14, 2010, 11:27:05 AM
Quote from: fred on June 14, 2010, 07:03:54 AM
Quote from: idahofisherman on June 14, 2010, 02:28:56 AM
Love the new rules area.

I have a problem with the minute field (middle field) of the timer field.  Whatever I place in the second field ends up in the third field.  Therefore I am unable to specify minutes.
A problem that I can't reproduce for some reason. ???
Try something like 0d,00:30:00 or 30:00

I also am not seeing this problem, either in the Time or the Value field of the Rule Editor dialog ???

Odd.  Like you (but unlike the others) I am not having this problem.  For whatever it is worth, I am running Win 7 x64 bit.
Yep found something, X64 fine X32 problem.
Something with data types no doubt, should be easy to find now I can reproduce it. ;D
Title: Re: BT 0.60
Post by: fred on June 14, 2010, 02:27:25 PM
Quote from: fred on June 14, 2010, 02:01:04 PM
Yep found something, X64 fine X32 problem.
Something with data types no doubt, should be easy to find now I can reproduce it. ;D
The problem was in the output, so the number entered in the rule is correct.
It's only shown incorrectly.
Not sure why though.. I used __int64, changed that into int and that works fine. The Format function doesn't seem to like 64-bit integers on a 32 bit system. ;D
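A likely explanation, offered as an assumption rather than a confirmed diagnosis: printf-style formatting reads its arguments by size. If the format string uses %d (32 bits) while the arguments are 64-bit, then on a 32-bit build each value occupies two slots and the second %d picks up the empty upper half of the hours, pushing the minutes into the third position. That would match 00:12:00 being shown as 00:00:12. The sketch below keeps the 64-bit type and uses matching specifiers instead; it is standard C++ with a hypothetical helper name, not the BoincTasks code.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a non-negative duration in seconds as
// "Nd,HH:MM:SS" using specifiers that match the 64-bit argument type.
std::string format_duration(std::int64_t total_seconds) {
    assert(total_seconds >= 0);
    const std::int64_t days = total_seconds / 86400;
    const std::int64_t hours = (total_seconds % 86400) / 3600;
    const std::int64_t minutes = (total_seconds % 3600) / 60;
    const std::int64_t seconds = total_seconds % 60;

    char buffer[64];
    // PRId64 expands to the right length modifier for std::int64_t, so
    // every argument is read at its full width on 32-bit and 64-bit builds.
    std::snprintf(buffer, sizeof buffer,
                  "%" PRId64 "d,%02" PRId64 ":%02" PRId64 ":%02" PRId64,
                  days, hours, minutes, seconds);
    return buffer;
}
```

Switching the variables to int, as done in the fix, achieves the same thing from the other side: the arguments then match a plain %d.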
Title: Re: BT 0.60
Post by: wicked on June 17, 2010, 04:37:03 AM
Quote from: fred on June 12, 2010, 03:01:11 PM
Quote from: Corsair on June 12, 2010, 02:50:49 PM
1 - After exiting BT, the window size and position are not restored on restart.
1 - Works for me; use File -> Exit to store these settings.

This has never worked for me..  :'( not even using File -> Exit that I tried. BoincTasks always starts with default size and on my primary monitor. Could it be that it gets confused because I always place it on my second monitor? This is also on Win 7 x64, which may be relevant..
Title: Re: BT 0.60
Post by: fred on June 17, 2010, 06:07:40 AM
Quote from: wicked on June 17, 2010, 04:37:03 AM
Quote from: fred on June 12, 2010, 03:01:11 PM
Quote from: Corsair on June 12, 2010, 02:50:49 PM
1 - After exiting BT, the window size and position are not restored on restart.
1 - Works for me; use File -> Exit to store these settings.

This has never worked for me..  :'( not even using File -> Exit that I tried. BoincTasks always starts with default size and on my primary monitor. Could it be that it gets confused because I always place it on my second monitor? This is also on Win 7 x64, which may be relevant..
Don't know why, I've seen it myself, something between versions.
I use a second monitor as well and haven't seen any problems.
And.... Windows treats the whole screen, of the 2 monitors as one virtual screen, so the program doesn't see the difference.