BT 0.71

Started by jjwhalen, August 18, 2010, 03:37:52 PM

jjwhalen

Much better ;D 8)

Now on to testing.


jjwhalen

#1
1) During my TThrottle witch hunt (op. cit.) I removed my P8400 machine from BT. When reattaching it, I noticed that the Computer>Find computers dialog seems to be broken. After Find or Scan address range, you can't select anything from the list: the checkmark appears for an instant, then disappears. I was able to add the computer manually with no problem by entering the IP address/password. I rolled back to v0.69 and verified that list selection in the Computer>Find computers dialog works OK there.

2) Apparently the
Quote
- Fixed: Tasks: Checkpoint: When the resources are > 1, like 2 Cpu, divide the time by 2.
fix didn't work out so well. Checkpoints of AQUA multicore tasks are now FUBAR. The new (counter) is definitely tallying the number of checkpoints, but the time appears to be derived from the CPU time (sum of the cores) minus some unknown. For example, with a CPU time of 02d,01:00:01, the Checkpoint value reads 01d,12:46:50, while the actual checkpoint interval is under 7 minutes. Also, the seconds still appear to increment at (wallclock) * (number of cores) :(

So, simple divide by... bad idea. Sorry I suggested it. FYI, the CPU time and CPU time at last checkpoint on the Task Properties sheet look accurate (considering the multiple cores, of course), and their difference divided by the number of cores would seem to give an accurate wall-clock time since the last checkpoint. Go figure.
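A minimal Python sketch of the calculation suggested here, assuming the task keeps all of its cores busy so that CPU time accrues at roughly (number of cores) times wall-clock speed. Only the CPU time accrued since the last checkpoint is divided by the core count, not the task's total CPU time. The function names and sample values are hypothetical; only the dd'd',hh:mm:ss display format comes from the post. This is not BoincTasks' actual code.

```python
import re

# Matches durations shown as "02d,01:00:01"; a bare "hh:mm:ss" is also accepted.
_DURATION = re.compile(r"^(?:(\d+)d,)?(\d+):(\d{2}):(\d{2})$")


def parse_duration(text: str) -> int:
    """Convert a 'DDd,HH:MM:SS' string to whole seconds."""
    m = _DURATION.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def format_duration(total_seconds: float) -> str:
    """Convert seconds back to the 'DDd,HH:MM:SS' display format."""
    days, rem = divmod(int(total_seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days:02d}d,{hours:02d}:{minutes:02d}:{seconds:02d}"


def wallclock_since_checkpoint(cpu_time_s: float,
                               cpu_time_at_checkpoint_s: float,
                               ncpus: float) -> float:
    """Estimate wall-clock seconds since the last checkpoint.

    Assumption: all `ncpus` cores are busy, so CPU time grows about
    ncpus times faster than wall-clock time.
    """
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    return max(0.0, cpu_time_s - cpu_time_at_checkpoint_s) / ncpus
```

With hypothetical values of 02d,01:00:01 total CPU time, 02d,00:34:01 at the last checkpoint and 4 cores, `format_duration(wallclock_since_checkpoint(...))` gives `00d,00:06:30`.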


fred

Quote from: jjwhalen on August 20, 2010, 12:35:50 PM
1) During my TThrottle witch hunt (op. cit.) I removed my P8400 machine from BT. When reattaching it, I noticed that the Computer>Find computers dialog seems to be broken. After Find or Scan address range, you can't select anything from the list: the checkmark appears for an instant, then disappears. I was able to add the computer manually with no problem by entering the IP address/password. I rolled back to v0.69 and verified that list selection in the Computer>Find computers dialog works OK there.

2) Apparently the
Quote
- Fixed: Tasks: Checkpoint: When the resources are > 1, like 2 Cpu, divide the time by 2.
fix didn't work out so well. Checkpoints of AQUA multicore tasks are now FUBAR. The new (counter) is definitely tallying the number of checkpoints, but the time appears to be derived from the CPU time (sum of the cores) minus some unknown. For example, with a CPU time of 02d,01:00:01, the Checkpoint value reads 01d,12:46:50, while the actual checkpoint interval is under 7 minutes. Also, the seconds still appear to increment at (wallclock) * (number of cores) :(

So, simple divide by... bad idea. Sorry I suggested it. FYI, the CPU time and CPU time at last checkpoint on the Task Properties sheet look accurate (considering the multiple cores, of course), and their difference divided by the number of cores would seem to give an accurate wall-clock time since the last checkpoint. Go figure.

1) I can't find anything wrong with it.
You can only set a check when the BOINC or TThrottle column shows a Yes.
2) The implementation of the checkpoints is... a bit strange.
I have to check that with AQUA. Which project is running multicore?

jjwhalen

#3
Quote from: fred on August 20, 2010, 02:26:49 PM
1) I can't find anything wrong with it.
You can only set a check when the BOINC or TThrottle column shows a Yes.
2) The implementation of the checkpoints is... a bit strange.
I have to check that with AQUA. Which project is running multicore?

1) Actually, I knew that... and (with the hosts already attached) I notice after a Find that the hosts show Yes under TThrottle but not under BOINC, even though the main Computers tab shows a (correct) BOINC version number and Status=Connected. When I remove one host and run Find again, I get the same indications: Yes under TThrottle, nothing under BOINC, and the checkmark won't stick. This isn't a big problem for me, since I (a) have few hosts and (b) know my IP addresses. But it's another one of those WTFs. I'll check this out on one of my remote boxes and see what happens. I just know that I've used this feature before to (re)attach machines without any problem :-\

2) Only "IQUANA" [1.14 D-Wave's Iterative QUANtum Algorithms : Multi-Threaded (mt1)] is multicore/multithreaded. Actually, according to their Server Status, it's the only subproject generating work right now anyway. BE WARNED: IQUANA has a known tendency to randomly sequester 1 of n cores while the task is in the "Waiting" state, effectively idling that core, since the task won't actually do anything with it. It's recommended to Suspend Project on AQUA when the task isn't actually running. Rumor on the boards is that it's (yet another) bug in the BOINC scheduler/scheduling policy. I've seen this behavior on both dual and quad cores, at random times. Suspending either the task or the project immediately releases the core for rescheduling. BTW, IQUANA is automatically assigned a resource of n.00 CPUs (up to 8). I believe there are command-line options to limit the number of cores. Other than the sequestering problem, IQUANA 1.14 looks pretty stable. I just couldn't resist the goal: developing algorithms for imaginary quantum computers, like something out of Isaac Asimov ;D EDIT: In fact, I'm reminded of the original Star Trek, where the old "duotronic" logic circuits had been superseded by Richard Daystrom's "multitronic" computer. We may yet see the repeal of Moore's Law.


jjwhalen

If this has been pointed out before, I apologize in advance. I searched the FAQ for this topic but didn't find anything. I also reviewed the manual.

In Find computers if you "Scan address range" then hit "Stop", it doesn't stop.  Also whether you Stop or not, it appears to scan well past the upper address limit you set, possibly to the end of the subnet defined at the router.  (I say that because the scan appears to take the same time whether I set an upper limit of 2 or 255).  And if I do set an upper limit of 2, known addresses beyond 2 are detected.

Meanwhile you're locked out because the Find computers daughter window holds focus over the main BT display.  I hate that ;)  In fact I've used Process Explorer to kill BT because of it.  Few things tick me off more than my computer so busy doing something else that it won't respond to ME, its LORD & MASTER ;D


jjwhalen

Quote from: fred on August 20, 2010, 02:26:49 PM
1) ...
You can only set a check when the BOINC or TThrottle column shows a Yes.

I reviewed the Manual and found
Quote
Finding TThrottle and not BOINC (missing Yes) indicates a missing remote_hosts.cfg or a missing entry in the file.
but I definitely have remote_hosts.cfg on all my boxes, and as far as I know it's complete. Certainly I can use BM (BOINC Manager) to log onto any machine from any other machine using either the IP address or the machine name. So what exactly does
Quote
a missing entry in the file
refer to? I'll be glad to tweak my remote_hosts.cfg(s) any way I need to :)
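BOINC's documentation describes remote_hosts.cfg (in the BOINC data directory) as a list of host names or IP addresses allowed to connect to the client remotely, one per line, with lines starting with # or ; treated as comments. One plausible reading of "a missing entry" is that the computer running BoincTasks is not listed in a remote machine's file. The hypothetical Python helper below checks for that by plain string comparison. It does no DNS lookups, so a host listed by name won't match a query by IP address (or vice versa), and the sample file and addresses are made up.

```python
from typing import Set

def allowed_hosts(cfg_text: str) -> Set[str]:
    """Collect host entries from remote_hosts.cfg text.

    One entry per line; blank lines and lines starting with '#' or ';'
    are skipped. Entries are lower-cased for comparison.
    """
    hosts = set()
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        hosts.add(line.lower())
    return hosts


def is_listed(cfg_text: str, *names: str) -> bool:
    """True if any given name/address appears literally in the file."""
    hosts = allowed_hosts(cfg_text)
    return any(n.strip().lower() in hosts for n in names)


# Hypothetical file content from a remote machine.
SAMPLE_CFG = """\
# hypothetical remote_hosts.cfg on a remote cruncher
192.168.1.10
; the laptop running BoincTasks
p8400-laptop
"""
```

For example, `is_listed(SAMPLE_CFG, "192.168.1.20", "P8400-Laptop")` is true because the name matches, while `is_listed(SAMPLE_CFG, "192.168.1.20")` alone is false.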


fred

Quote from: jjwhalen on August 20, 2010, 04:38:37 PM
If this has been pointed out before, I apologize in advance. I searched the FAQ for this topic but didn't find anything. I also reviewed the manual.

In Find computers if you "Scan address range" then hit "Stop", it doesn't stop.  Also whether you Stop or not, it appears to scan well past the upper address limit you set, possibly to the end of the subnet defined at the router.  (I say that because the scan appears to take the same time whether I set an upper limit of 2 or 255).  And if I do set an upper limit of 2, known addresses beyond 2 are detected.

Meanwhile you're locked out because the Find computers daughter window holds focus over the main BT display.  I hate that ;)  In fact I've used Process Explorer to kill BT because of it.  Few things tick me off more than my computer so busy doing something else that it won't respond to ME, its LORD & MASTER ;D

The address scan fires up to 255 threads, each trying to make a connection. In all, this takes only about 1 minute. Killing the program is never a good idea, because you never know what is left behind: handles, global memory, sockets... you don't want to know. It makes the system less stable.
And this process can't be stopped.
And it doesn't matter whether you scan 2 or 255 addresses; it takes just as long. That's the time needed for adequate detection.
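Fred's description (many connection attempts fired in parallel, total time independent of range size) can be illustrated with a small Python sketch. Each probe runs in a thread pool and is bounded by a connect timeout, so when the whole range fits in the pool the wall time is roughly one timeout, which is one reason a parallel scan of 2 addresses can take about as long as a scan of 255. The sketch also shows one way a range limit and a Stop request could be honored. It is not BoincTasks' code: port 31416 is BOINC's default GUI RPC port, while the function names, the 2-second timeout and the subnet prefix are hypothetical. The probe is injectable so the logic can be exercised without a network.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

BOINC_GUI_RPC_PORT = 31416  # BOINC's default GUI RPC port


def tcp_probe(host: str, port: int = BOINC_GUI_RPC_PORT,
              timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def scan_range(prefix: str, first: int, last: int,
               probe: Callable[[str], bool] = tcp_probe,
               stop: Optional[threading.Event] = None,
               max_workers: int = 255) -> List[str]:
    """Probe prefix.first .. prefix.last (inclusive) in parallel.

    Only addresses inside [first, last] are probed. If `stop` is set,
    probes that have not started yet return False immediately; probes
    already in flight still run until they connect or time out.
    """
    if stop is None:
        stop = threading.Event()

    def guarded(host: str) -> bool:
        return False if stop.is_set() else probe(host)

    hosts = [f"{prefix}.{i}" for i in range(first, last + 1)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(guarded, hosts))
    return [h for h, ok in zip(hosts, results) if ok]
```

To keep the rest of a program responsive, `scan_range` could itself be started on a worker thread (for example with `threading.Thread(..., daemon=True)`), with a Stop button that sets the event.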

jjwhalen

Quote from: fred on August 20, 2010, 05:17:48 PM
The address scan fires up to 255 threads, each trying to make a connection. In all, this takes only about 1 minute. Killing the program is never a good idea, because you never know what is left behind: handles, global memory, sockets... you don't want to know. It makes the system less stable.
And this process can't be stopped.
And it doesn't matter whether you scan 2 or 255 addresses; it takes just as long. That's the time needed for adequate detection.

So we've established that the Stop button in Find computers during Scan address range serves no function and should be removed/grayed out/whatever ???  (BTW I make it 43 seconds on my Casio stopwatch.)  Also, maybe this section of the Manual needs an update?  The FAQ is pretty informative.

Believe me as a Systems Analyst I know well the perils of killing an application in progress.  I've done enough banging on new application suites to find out first hand, and got shockingly well paid to do it.  I also know it's the OS's ( ;) ) responsibility to clean up after an application that crashes for any reason.  And on a preemptive multitasking system, it seems at least feasible, whether or not desirable, for BoincTasks to multitask away from Find Computers to do other BOINC things while the address scan is in progress.

But I'm just an end user, who's really grateful to have such a well-designed and useful tool 8) ;D


fred

Quote from: jjwhalen on August 20, 2010, 06:05:58 PM
Quote from: fred on August 20, 2010, 05:17:48 PM
The address scan fires up to 255 threads, each trying to make a connection. In all, this takes only about 1 minute. Killing the program is never a good idea, because you never know what is left behind: handles, global memory, sockets... you don't want to know. It makes the system less stable.
And this process can't be stopped.
And it doesn't matter whether you scan 2 or 255 addresses; it takes just as long. That's the time needed for adequate detection.

So we've established that the Stop button in Find computers during Scan address range serves no function and should be removed/grayed out/whatever ???  (BTW I make it 43 seconds on my Casio stopwatch.)  Also, maybe this section of the Manual needs an update?  The FAQ is pretty informative.

Believe me as a Systems Analyst I know well the perils of killing an application in progress.  I've done enough banging on new application suites to find out first hand, and got shockingly well paid to do it.  I also know it's the OS's ( ;) ) responsibility to clean up after an application that crashes for any reason.  And on a preemptive multitasking system, it seems at least feasible, whether or not desirable, for BoincTasks to multitask away from Find Computers to do other BOINC things while the address scan is in progress.

But I'm just an end user, who's really grateful to have such a well-designed and useful tool 8) ;D

43 seconds is within the design parameters. ;D
I made a note to make the search dialog independent of the main program, running it in another thread.
And it isn't locking the program, just the user input. ;D

BobCat13

Three computers are monitored by BT; let's call them 1 (localhost), 2 and 3 (both remote). After switching to the Computers tab and unchecking one of the remote boxes (#2), then switching back to the Tasks tab, the Extra --> Allow network communication submenu lists the following: 1, 3, 1, 2, 3.

The first 1 and 3 work, but highlighting any of the 1, 2, or 3 below them causes BT to lock up, and the machine becomes unusable until BT is terminated. I have waited over 5 minutes, and BT does not crash and restart; it just stays locked up.
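The thread doesn't say what caused the repeated entries. One common way to end up with a list like 1, 3, 1, 2, 3 is a menu that gets new entries added on each refresh while entries from an earlier build are kept; that is only a guess here. The hypothetical Python sketch below shows the usual guard against that pattern: rebuild the submenu from the current computer list, clearing it first, so repeated refreshes can't make it grow. The class and method names are illustrative stand-ins, not BoincTasks' code.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class NetworkSubmenu:
    """Hypothetical stand-in for the 'Allow network communication' submenu."""
    items: List[str] = field(default_factory=list)

    def rebuild(self, computers: Iterable[str]) -> None:
        # Start from an empty menu on every refresh, then add one entry per
        # monitored computer, skipping duplicates, so entries never accumulate.
        self.items.clear()
        seen = set()
        for name in computers:
            if name not in seen:
                seen.add(name)
                self.items.append(name)
```

For example, rebuilding with `["1", "2", "3"]` and then `["1", "3"]` leaves `items == ["1", "3"]`, however many times the refresh runs.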

jjwhalen

Quote from: BobCat13 on August 22, 2010, 10:09:58 PM
Three computers are monitored by BT; let's call them 1 (localhost), 2 and 3 (both remote). After switching to the Computers tab and unchecking one of the remote boxes (#2), then switching back to the Tasks tab, the Extra --> Allow network communication submenu lists the following: 1, 3, 1, 2, 3.

The first 1 and 3 work, but highlighting any of the 1, 2, or 3 below them causes BT to lock up, and the machine becomes unusable until BT is terminated. I have waited over 5 minutes, and BT does not crash and restart; it just stays locked up.

Good catch, BobCat13! I have a similar setup: localhost and 2 remotes. Without unchecking any remote host, but after BT 0.71 had been running for ~30 hours, I opened the Extra>Allow network communication> submenu and found an apparently infinite repetition of the 3 host names (full page height with scroll arrows top and bottom). The submenu couldn't be manipulated or canceled, and BT locked up as above. I killed BT via Process Explorer (don't cringe, Fred ;D) and restarted. The behavior did not repeat after the restart: the Extra>Allow network communication> submenu was normal and worked as designed. I'll check again after longer uptime.

I've not seen this problem before in earlier releases, even after a week of (BT) uptime.  But I don't recall any announced changes in this area of the app :-\


fred

#11
Quote from: BobCat13 on August 22, 2010, 10:09:58 PM
Three computers are monitored by BT; let's call them 1 (localhost), 2 and 3 (both remote). After switching to the Computers tab and unchecking one of the remote boxes (#2), then switching back to the Tasks tab, the Extra --> Allow network communication submenu lists the following: 1, 3, 1, 2, 3.

The first 1 and 3 work, but highlighting any of the 1, 2, or 3 below them causes BT to lock up, and the machine becomes unusable until BT is terminated. I have waited over 5 minutes, and BT does not crash and restart; it just stays locked up.

Yep, there is something wrong; it seems to be some sort of loop. Easy to reproduce and should be easy to fix.
Fixed in V0.72.