BT 0.60

Started by Corsair, June 12, 2010, 02:50:49 PM

Corsair

hi all,

1 - After exiting BT, the size and position of the window are not saved when BT is restarted.
2 - Rule editor window: the fields Type, Operator, Value and Time are not translated.
3 - In the Rule editor window, a time entered as e.g. 1d,0:12:0 or 00:12:00 is converted to 00:00:12 ??

the rules are promising.
Roses don't bloom on the sailor's grave

Corsair.

fred

Quote from: Corsair on June 12, 2010, 02:50:49 PM
hi all,

1 - after exit from BT, size and position of the window when restarted BT not saved.
2 - Rule editor window, not translated fields: Type, Operator, Value, time.
3 - when in  Rule editor windows set the time e.g. as 1d,0:12:0 or 00:12:00 converted to 00:00:12 ??

the rules are promising.
1 - Works for me; use File -> Exit to store these settings.
2 - Not quite sure whether I will keep this the way it is.
3 - Which time and which type are you using? The one in "value" or the one in "time"? Press the check button to see what BT thinks it is.

The one in "time" should normally be short, as it is the trigger time.
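
For reference, here is a minimal sketch of how a time string such as 1d,0:12:0 or 00:12:00 could be read as days, hours, minutes and seconds. This is not BT's actual parser; the function name `parse_duration` and the accepted format (an optional `<days>d,` prefix followed by `H:M:S`) are assumptions made for illustration only.

```python
import re

# Assumed format: optional "<days>d," prefix, then hours:minutes:seconds.
_DURATION = re.compile(r"^(?:(\d+)d,)?(\d+):(\d+):(\d+)$")


def parse_duration(text):
    """Return the duration in seconds, or raise ValueError if it does not match."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    # A missing day prefix is treated as zero days.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

Under these assumptions, 00:12:00 reads as 12 minutes and 00:00:12 as 12 seconds, so the two are easy to tell apart with the check button.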

Corsair

Quote from: fred on June 12, 2010, 03:01:11 PM
1 - Works with me, use File -> exit to store these settings.
2 - Not quite sure if I will keep this, the way it is.
3 - What time, what type are you using? The one in "value" or the one in "time". Press the check button to see what BT thinks it is.

The one in time should normally be short as it is the trigger time.

1 - done, working.
2 - will see ;)
3 - PM sent with screen shot.

But I really do not understand the rules. For example, I want to make a rule for computer "XXXX". This computer is in the following situation, and should behave as follows:
- it performs work for project A with application AA (a GPU application); warehouse full and activated.
- project B with application BB (also GPU) has an empty warehouse and is deactivated, but the project is not (/or) suspended.
- project A runs out of work and application AA has not been running for HH:MM:SS
- activate the work permit for project B app. BB, and ask for work for DD,HH:MM:SS
- deactivate the work permit for B-BB and continue asking for work for A-AA.
- finish the work for B-BB and do the same check again; if the A-AA warehouse is full, don't ask for work.
- suspend B-BB after finishing the active work unit and continue with A-AA
- check deadlines so that the B-BB work finishes in time.

Could this be possible?? As you can see, it is a way of working with two GPU projects, but with a preference for one.

fred

Quote from: Corsair on June 12, 2010, 03:22:47 PM

but I really do not understand the rule, e.g.
I want to make a rule for computer "XXXX", this computer is:
- performing work for project A with application AA (GPU application) warehouse full and activated.
- project B application BB (GPU also) warehouse empty and deactivated, but project not(/or) suspended.
- project A run out of work and application AA is not running for HH:MM:SS
- activate work permit for project B app. BB, and ask for work for DD,HH:MM:SS
- deactivate work permit for B-BB and continuing asking for work of A-AA.
- finish work for B-BB and do same check again, if A-AA warehouse full don't ask for work.
- suspend B-BB after finishing the active work unit and continuing with A-AA
- check deadlines to finish in time work B-BB.

could it be possible?? as you could see is a way of working with to GPU projects, but with preference in one.
These kinds of rules are impossible, because BOINC won't allow them.
And BOINC should take care of this automatically.

The rules are intended for tasks that are not behaving properly:
tasks that run too long or use too much CPU time, or cases where the temperature of the GPU drops unexpectedly, etc. In short, abnormal behavior.

John C

The rule values are static, which is useful in many cases but doesn't allow rules such as when progress has stalled.  Would really like some way to indicate that progress % on an active task stays the same (regardless of value) for a period of time.  Currently no way to enter that condition.
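
To make the request concrete, here is a rough sketch of the kind of check I have in mind. It is only an illustration, not anything BT does today; the class name `StallDetector` and the idea of feeding it one progress sample per refresh are my own assumptions.

```python
class StallDetector:
    """Flags a task whose progress % has not changed for stall_seconds."""

    def __init__(self, stall_seconds):
        self.stall_seconds = stall_seconds
        self._last_progress = None
        self._last_change = None

    def update(self, progress, now):
        """Record one sample; return True once progress has been flat long enough."""
        if self._last_progress is None or progress != self._last_progress:
            # First sample, or progress moved: restart the timer.
            self._last_progress = progress
            self._last_change = now
            return False
        return now - self._last_change >= self.stall_seconds
```

The actual progress value never matters here, only whether it changed since the previous sample.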

Would be very nice to have rules for a lost connection.  Currently no way to indicate that.

Not necessary for the first release, but it might be nice to have a counter for tasks that fail and to allow rules for thresholds. E.g., if I get 5 tasks that fail for project X within 60 seconds, then suspend that project (which would be a great rule to have for anyone who has ever tried to use Remote Desktop Connection to a GPU crunching box).  But it would also be useful if I suspend projects that fail but want to watch for repeat occurrences that would indicate a full reboot is needed.
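
A minimal sketch of such a counter, using a sliding time window. The class name `FailureWindow` and its interface are hypothetical, purely to show the idea; nothing like this exists in BT as far as I know.

```python
from collections import deque


class FailureWindow:
    """Counts failures and reports when `threshold` occur within `window_seconds`."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._times = deque()

    def record_failure(self, now):
        """Record a failure at time `now`; return True if the threshold is reached."""
        self._times.append(now)
        # Forget failures that have fallen out of the window.
        while self._times and now - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) >= self.threshold
```

With `FailureWindow(5, 60)` kept per project, the rule would fire on the fifth failure inside any 60 second span.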

More than anything else, I really am hoping next beta will include ability to call external script when rule is triggered so that I will be able to automate reboots.  That's my number 1 need in terms of actions off the rules.

While I have SNMP-controllable PDUs and wouldn't need or use this feature, if you REALLY want to get fancy then you could build a reboot function into TThrottle to allow rule-generated remote reboots (so long as the entire machine wasn't frozen).

John C

Noticed this in .59 also, but any reason why File / Exit takes so long to process?

jjwhalen

Quote
- Fixed: History: Should be fixed, but it wasn't: Einstein@Home/3.02 Global Correlations S5 search #1 (S5GCESSE2) still showed as GPU task.
Quote
Solved in 0.60 hopefully.  ;D

OK, entries already in History are now displaying correctly 8)  I've reattached 1 host to Einstein and will test with a new (S5GCESSE2) task also.


John C

Not exactly sure what's going on here, but look at the fourth task down in this screen cap.  Freehal is showing itself as 1400% complete and the bar extends to the left.



About a minute later, it now shows that it has been running for 13 minutes, has 39 minutes to go, and is therefore 0.3% done.



As of now, it has been running 28 minutes; it has 1:16 to go; and it is 5.6% complete.  Not sure how it will end up, but that math isn't adding up in my mind and the first graph is definitely off.
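
For what it's worth, this is the back-of-the-envelope math I'm doing. It assumes the remaining-time estimate and the progress % ought to agree, which may not be how BOINC actually derives them; the function name is just for illustration.

```python
def percent_from_times(elapsed_minutes, remaining_minutes):
    """Percent complete implied by elapsed time and estimated time remaining."""
    total = elapsed_minutes + remaining_minutes
    if total <= 0:
        raise ValueError("total time must be positive")
    return 100.0 * elapsed_minutes / total


# Second screen cap: 13 minutes elapsed, 39 minutes to go.
print(percent_from_times(13, 39))  # 25.0
```

So by the clock the task would be about a quarter done, where the display said 0.3%.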

John C

Also, that reminds me, when I move the computer name to the left where I really want it, all I get is the following (Note the names)

EDIT:  I figured this one out.  I had added "Computer" to be displayed early but also had left it at the end.  When it is listed twice, you display "initializing" for every data element as shown below.  As soon as I changed the second instance to "HIDE" it worked fine. 


John C

Quote from: John C on June 12, 2010, 09:22:25 PM
More than anything else, I really am hoping next beta will include ability to call external script when rule is triggered so that I will be able to automate reboots.  That's my number 1 need in terms of actions off the rules.

One more thought... while designing, it might be nice to consider eventual support for 2 actions from the same trigger.  I can envision instances where I would want to kill the task and then reboot the machine.  I'm thinking that's likely not especially difficult as long as you plan for it from the start.

Thanks.

fred

Quote from: John C on June 12, 2010, 09:22:25 PM
1) The rule values are static, which is useful in many cases but doesn't allow rules such as when progress has stalled.  Would really like some way to indicate that progress % on an active task stays the same (regardless of value) for a period of time.  Currently no way to enter that condition.

Would be very nice to have rules for a lost connection.  Currently no way to indicate that.

Not necessary for first release, but might be nice to have a counter for tasks that fail and allow rules for threshholds.  IE. If I get 5 tasks that fail for project X within 60 seconds, then suspend that project (which would be a great rule to have for anyone who has ever tried to use Remote Desktop Connection to a GPU crunching box).  But it would also be useful if I suspend projects that fail but want to watch for repeat occurances that would indicate a full reboot is needed.

2) More than anything else, I really am hoping next beta will include ability to call external script when rule is triggered so that I will be able to automate reboots.  That's my number 1 need in terms of actions off the rules.

3) While I have SNMP controllable PDUs and wouldn't need or use this feature, it you REALLY want to get fancy then you could build a reboot function into TThrottle to allow rule generated remote reboots (so long as the entire machine wasn't frozen).
I've been working on the rules for just a week now. ;D
1) Is on the list.
2) Will include the ability to start any bat or exe, with parameters.
3) TThrottle only allows a local reboot.

fred

Quote from: John C on June 12, 2010, 09:24:53 PM
Noticed this in .59 also, but any reason why File / Exit takes so long to process?
The exit has to wait for all threads to close. They may be busy communicating. So this may take a couple of seconds.
The total wait is maxed out at about 2 minutes, then BT tries to shut down everything.
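
As an illustration of the idea (not BT's actual code), waiting for worker threads with one overall time limit could look like the sketch below. The function name `join_all` and its parameters are made up for this example.

```python
import threading
import time


def join_all(threads, max_wait_seconds):
    """Wait for threads to finish, but no longer than max_wait_seconds in total.

    Returns the threads that are still alive when the wait ends.
    """
    deadline = time.monotonic() + max_wait_seconds
    for thread in threads:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # Overall time budget used up; stop waiting.
        thread.join(timeout=remaining)
    return [t for t in threads if t.is_alive()]
```

A thread that is busy communicating simply uses up part of the shared budget; whatever is still alive at the end is what has to be shut down the hard way.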

fred

Quote from: John C on June 13, 2010, 04:56:25 AM
Not exactly sure what's going on here, but look at the forth task down in this screen cap.  Freehal is showing itself as 1400% complete and the bar extends to the left.

About a minute later, it now shows that it has been running for 13 minutes, has 39 minutes to go, and is therefore 0.3% done.

As of now, it has been running 28 minutes; it has 1:16 to go; and it is 5.6% complete.  Not sure how it will end up, but that math isn't adding up in my mind and the first graph is definitely off.
Ah, Freehal. I have never been able to make anything of these applications.
The app should report the progress to BOINC and to BT, but there is probably something wrong with the Freehal application.
For the rules I removed the 100% limit, and that causes the bar to go a bit... weird. I will correct the progress bar.
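
Roughly, the correction amounts to keeping the raw value for the rules and limiting only what is drawn. A sketch of that idea, with a made-up function name and not the real BT code:

```python
def bar_fill_fraction(progress_percent):
    """Fraction of the bar to fill (0.0 to 1.0); the raw value is left alone for rules."""
    clamped = min(max(progress_percent, 0.0), 100.0)
    return clamped / 100.0
```

A reported 1400% then draws as a full bar instead of running outside the cell, while a rule can still see the 1400.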

John C

Quote from: fred on June 13, 2010, 08:20:48 AM
3) TThrottle only allows local reboot.

Sorry, I wasn't clear.  Yes, TThrottle supports local reboot.  But TThrottle is communicating with BoincTasks and sending temperature information.  So what if BT could send a message back to TT instructing it to reboot that local machine?  Then if BT saw a string of jobs going south, it could just instruct that machine to reboot itself by using TT as the local reboot agent.

Again, this is lower priority in my mind. I already have a better way to reboot.  But it just seemed like an idea that others might find useful and that would even better integrate your tools.

And PS... I know you've only been working on the rules for a week, but I'm pretty excited by where this is going.  This is already my favorite monitoring tool and it is about to fill a huge need I have for active management as well.  That's huge.  So I'm having a hard time keeping my enthusiasm contained.  :-)

fred

Quote from: John C on June 13, 2010, 06:41:23 AM
Quote from: John C on June 12, 2010, 09:22:25 PM
More than anything else, I really am hoping next beta will include ability to call external script when rule is triggered so that I will be able to automate reboots.  That's my number 1 need in terms of actions off the rules.

One more thought... while designing, it might be nice to consider eventual support for 2 actions from the same trigger.  I can envision instances where I would want to kill the task and then reboot the machine.  I'm thinking that's likely not especially difficult as long as you plan for it from the start.

Thanks.
Two identical rules would do this: one that suspends the task and one that starts a bat file. I don't want to add a kill; it is too dangerous for the moment and a bit permanent.