Switched, busy...

Started by Pepo, September 24, 2010, 12:11:39 PM


wicked

Quote from: fred on September 25, 2010, 10:35:10 AM
Is there anyone else who recognizes these problems?

Oh yes, I've been haunted by "Switched, busy..." messages with BT. I currently have 1 external and 2 local machines monitored (and 4 external ones that are unchecked in the Computers tab). Machines can sometimes be slow to respond or even offline. This seems to happen especially when there are machines offline, even if it's one of the local ones.

I could actually just reproduce this, as the external machine is offline. When I switched to the Transfers tab, it took a long time for the Switched message to go away and for BT to show an empty screen. It's as if it waits a long time before giving up on the offline machine. I have only one of my local ones selected for display, though. And it seems to have happened only once, as the subsequent switches are fast again.

There's another symptom I have seen that may or may not be related. Sometimes on the Tasks tab the status line shows "Update in 6 seconds", then "Updating...", then "Update in 5 seconds", then "Updating..." again, then "Update in 4 seconds" followed by "Updating...", until the countdown reaches 1 and then starts over. I have set the refresh rate to Slow. I think this was associated with the constant ~200 kB network transfer rate I was seeing a while back in BT. (This was with all machines online, so 5 external ones were getting a lot of traffic targeted at them.)

Pepo

Quote from: wicked on September 27, 2010, 04:30:08 AM
Quote from: fred on September 25, 2010, 10:35:10 AM
Is there anyone else who recognizes these problems?
Machines can sometimes be slow to respond or even offline. This seems to happen especially when there are machines offline, even if it's one of the local ones.

I could actually just reproduce this, as the external machine is offline. When I switched to the Transfers tab, it took a long time for the Switched message to go away and for BT to show an empty screen. It's as if it waits a long time before giving up on the offline machine. I have only one of my local ones selected for display, though. And it seems to have happened only once, as the subsequent switches are fast again.
BINGO! Offline - that should be it. As Fred said, upon each tab switch all machines are contacted again, and this could explain my roughly identical wait times (around 20 seconds - probably the network communication timeout) on each tab. And my external machine did go offline...
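Roughly identical ~20-second waits are what a blocking TCP connect looks like when it sits until the operating system's connect timeout expires. A minimal Python sketch, not BoincTasks code, of probing a host's GUI RPC port (31416, the port shown in the debug log later in this thread) with an explicit upper bound on the connect; the `probe` function and its 2-second default are hypothetical choices:

```python
import socket
import time

GUI_RPC_PORT = 31416  # port shown in the debug log excerpts in this thread


def probe(host: str, port: int = GUI_RPC_PORT, timeout: float = 2.0) -> tuple[str, float]:
    """Try one TCP connect and report what happened and how long it took.

    Returns ("ok" | "unresolved" | "failed", elapsed_seconds). The timeout
    bounds the connect itself; name resolution is not covered by it.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            status = "ok"
    except socket.gaierror:
        status = "unresolved"  # comparable to "Couldn't resolve hostname" in the log
    except OSError:
        status = "failed"      # refused, unreachable, or timed out
    return status, time.monotonic() - start
```

If hosts are probed one after another on the thread that draws the window, each offline host can add a full timeout to a tab switch (three hosts at 20 seconds each would be up to a minute), which is why the wait either needs a bound or needs to happen elsewhere.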

Quote
I think this was associated with the constant ~200 kB network transfer rate I was seeing a while back in BT. (This was with all machines online, so 5 external ones were getting a lot of traffic targeted at them.)
This was the reason I was suggesting optional separate refresh rates for (especially far remote external) computers. (I suspect it did not find its way into the wish list :'()
Peter

fred

Quote from: Pepo on September 27, 2010, 07:49:12 AM
BINGO! Offline - that should be it. As Fred said, upon each tab switch all machines are contacted again, and this could explain my roughly identical wait times (around 20 seconds - probably the network communication timeout) on each tab. And my external machine did go offline...

I think this was associated with the constant ~200 kB network transfer rate I was seeing a while back in BT. (This was with all machines online, so 5 external ones were getting a lot of traffic targeted at them.)

This was the reason I was suggesting optional separate refresh rates for (especially far remote external) computers. (I suspect it did not find its way into the wish list :'()
A computer that times out a couple of times is moved to the unconnected list, but it will be retried if you set up "Reconnect every...".
When a computer is in the unconnected list, it's no longer in the list of computers that data is requested from.
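The behaviour described above amounts to a small state machine per computer. A minimal Python sketch under stated assumptions: the class and field names are hypothetical, "a couple of times" is taken as 2 consecutive timeouts, 710 seconds is the "Reconnect every..." value Pepo reports later in the thread, and treating 0 as "never retry" is a guess. This is not BoincTasks' actual code.

```python
import time
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    timeouts: int = 0          # consecutive timeouts seen so far
    connected: bool = True
    last_attempt: float = 0.0  # clock value of the last failed attempt


class HostTracker:
    """Hypothetical model of the connected/unconnected lists."""

    def __init__(self, names, max_timeouts=2, reconnect_every=710.0, clock=time.monotonic):
        self.hosts = {n: Host(n) for n in names}
        self.max_timeouts = max_timeouts        # "a couple of times" (assumed 2)
        self.reconnect_every = reconnect_every  # seconds; assumption: <= 0 means never retry
        self.clock = clock

    def record(self, name, ok):
        """Record the outcome of one request or reconnect attempt."""
        h = self.hosts[name]
        if ok:
            h.timeouts, h.connected = 0, True
        else:
            h.timeouts += 1
            if h.timeouts >= self.max_timeouts:
                h.connected = False
                h.last_attempt = self.clock()

    def request_list(self):
        """Hosts to ask for data: unconnected hosts are left out."""
        return [n for n, h in self.hosts.items() if h.connected]

    def due_for_reconnect(self):
        """Unconnected hosts whose reconnect interval has elapsed."""
        if self.reconnect_every <= 0:
            return []
        now = self.clock()
        return [n for n, h in self.hosts.items()
                if not h.connected and now - h.last_attempt >= self.reconnect_every]
```

The point of keeping the two lists separate is that a refresh only walks `request_list()`, so an offline computer costs nothing until its retry is due.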

Separate refresh rates are difficult to implement, as for the history I need the data from all computers at once.
The other tabs shouldn't request that much data, except of course the Tasks tab.

fred

Quote from: wicked on September 27, 2010, 04:30:08 AM
Quote from: fred on September 25, 2010, 10:35:10 AM
Is there anyone else who recognizes these problems?

1) Oh yes, I've been haunted by "Switched, busy..." messages with BT. I currently have 1 external and 2 local machines monitored (and 4 external ones that are unchecked in the Computers tab). Machines can sometimes be slow to respond or even offline. This seems to happen especially when there are machines offline, even if it's one of the local ones.

I could actually just reproduce this, as the external machine is offline. When I switched to the Transfers tab, it took a long time for the Switched message to go away and for BT to show an empty screen. It's as if it waits a long time before giving up on the offline machine. I have only one of my local ones selected for display, though. And it seems to have happened only once, as the subsequent switches are fast again.

2) There's another symptom I have seen that may or may not be related. Sometimes on the Tasks tab the status line shows "Update in 6 seconds", then "Updating...", then "Update in 5 seconds", then "Updating..." again, then "Update in 4 seconds" followed by "Updating...", until the countdown reaches 1 and then starts over. I have set the refresh rate to Slow. I think this was associated with the constant ~200 kB network transfer rate I was seeing a while back in BT. (This was with all machines online, so 5 external ones were getting a lot of traffic targeted at them.)
1) When a computer goes offline, there may be some longer delays. The computer has to time out first. After a couple of timeouts the computer is regarded as disconnected and things should go back to normal.
2) The update counter is linked to the last time it refreshed (wall clock) and to the number of tasks: the more tasks, the slower it gets.
You may be seeing the history refresh. The history refresh depends on the remaining time: it is 1/2 the expected remaining time, with a maximum of 30 seconds.

Pepo

Quote from: Pepo on September 25, 2010, 01:03:13 PM
I've tried restarting BT. Now it takes the same 1 second as yours ???

Quote from: Pepo on September 26, 2010, 05:58:26 PM
BT runs on the machine with 6.11.6. Since I restarted BT, it has had no contact with the machine running 6.11.7; the last contact was about a week ago (Sunday), but I do not remember whether the tabs' sluggishness has been there ever since or appeared at some later point.

Quote from: Pepo on September 27, 2010, 07:49:12 AM
Quote from: wicked on September 27, 2010, 04:30:08 AM
I could actually just reproduce this, as the external machine is offline. When I switched to the Transfers tab, it took a long time for the Switched message to go away and for BT to show an empty screen. It's as if it waits a long time before giving up on the offline machine. I have only one of my local ones selected for display, though. And it seems to have happened only once, as the subsequent switches are fast again.
BINGO! Offline - that should be it. As Fred said, upon each tab switch all machines are contacted again, and this could explain my roughly identical wait times (around 20 seconds - probably the network communication timeout) on each tab. And my external machine did go offline...

Apparently it is not that simple :-\ My BT suddenly needs 5 seconds to switch between any tabs. Suddenly. It did not happen an hour or two ago. BT has had no contact with any other machine running BOINC since the restart yesterday.
???
Peter

Pepo

Quote from: Pepo on September 27, 2010, 10:25:08 AM
My BT suddenly needs 5 seconds to switch between any tabs. Suddenly. It did not happen an hour or two ago. BT has had no contact with any other machine running BOINC since the restart yesterday.

I've enabled the log's Debug mode. Although the machine has had no contact with the remote one for a week and BT was restarted yesterday, the log looks like this:

BT opened on the Tasks tab, not touching anything:
27 september 2010 - 12:56:44 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:56:50 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:56:56 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:57:02 hî ---- Host: vetroplach,Vetroplach,Port: 31416, Couldn't resolve hostname
27 september 2010 - 12:57:02 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:57:10 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:57:16 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:57:22 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:57:31 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:57:36 hî ---- Host: vetroplach,Vetroplach,Port: 31416, Couldn't resolve hostname
27 september 2010 - 12:57:36 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:57:44 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:57:49 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:57:54 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:57:59 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:58:04 hî ---- Host: vetroplach,Vetroplach,Port: 31416, Couldn't resolve hostname
27 september 2010 - 12:58:04 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 12:58:07 hî ---- Host: vetroplach,Vetroplach,Port: 31416, Couldn't resolve hostname
27 september 2010 - 12:58:14 Vetroplach, Connect ---- Invalid Socket

BT minimized to tray:
27 september 2010 - 13:00:40 hî ---- Host: vetroplach,Vetroplach,Port: 31416, Couldn't resolve hostname
27 september 2010 - 13:03:12 hî ---- Host: vetroplach,Vetroplach,Port: 31416, Couldn't resolve hostname
27 september 2010 - 13:05:44 hî ---- Host: vetroplach,Vetroplach,Port: 31416, Couldn't resolve hostname

BT just restored - opened on the Tasks tab, again not touching anything:
27 september 2010 - 13:07:00 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 13:07:04 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 13:07:09 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 13:07:14 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 13:07:21 hî ---- Host: vetroplach,Vetroplach,Port: 31416, Couldn't resolve hostname
27 september 2010 - 13:07:21 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 13:07:28 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 13:07:33 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 13:07:38 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 13:07:43 Vetroplach, Connect ---- Invalid Socket
27 september 2010 - 13:07:49 hî ---- Host: vetroplach,Vetroplach,Port: 31416, Couldn't resolve hostname
27 september 2010 - 13:07:49 Vetroplach, Connect ---- Invalid Socket


Apparently BT is constantly trying to reconnect to the remote machine. (Its name is not defined by an IP address but by the Windows machine name.)
Peter
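One way to quantify "constantly" is to measure the spacing of the "Invalid Socket" entries in a debug log like the one above. A minimal Python sketch; the `gaps` helper is hypothetical and it assumes every line has the "day month year - HH:MM:SS message" layout shown above, all from the same day:

```python
import re

# Matches the debug-log lines quoted above, e.g.
# "27 september 2010 - 12:56:44 Vetroplach, Connect ---- Invalid Socket"
LINE = re.compile(r"- (\d{2}):(\d{2}):(\d{2}) (.*)$")


def gaps(lines, needle="Invalid Socket"):
    """Seconds between consecutive log lines whose message contains `needle`.

    Only the time of day is used, so all lines are assumed to be from one day.
    """
    times = []
    for line in lines:
        m = LINE.search(line)
        if m and needle in m.group(4):
            h, mi, s = map(int, m.group(1, 2, 3))
            times.append(h * 3600 + mi * 60 + s)
    return [b - a for a, b in zip(times, times[1:])]
```

Run over the first excerpt (12:56:44 to 12:58:14), the gaps between "Invalid Socket" entries are between 5 and 10 seconds.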

fred

When things become slow, try removing the check at "Connect to TThrottle" in the General tab.

Pepo

Quote from: fred on September 27, 2010, 12:00:58 PM
When things become slow, try removing the check at "Connect to TThrottle" in the General tab.
I updated to BT 0.75 about an hour and a half ago. Right now tab switching takes a consistent 2 seconds. Toggling "Connect to TThrottle" makes no difference. The debug log looks pretty much the same.
Peter

fred

Quote from: Pepo on September 27, 2010, 12:33:24 PM
Quote from: fred on September 27, 2010, 12:00:58 PM
When things become slow, try removing the check at "Connect to TThrottle" in the General tab.
I updated to BT 0.75 about an hour and a half ago. Right now tab switching takes a consistent 2 seconds. Toggling "Connect to TThrottle" makes no difference. The debug log looks pretty much the same.
But do you have "Reconnect every xx seconds" set?

Pepo

Quote from: fred on September 27, 2010, 12:39:18 PM
Quote from: Pepo on September 27, 2010, 12:33:24 PM
Quote from: fred on September 27, 2010, 12:00:58 PM
When things become slow, try removing the check at "Connect to TThrottle" in the General tab.
I updated to BT 0.75 about an hour and a half ago. Right now tab switching takes a consistent 2 seconds. Toggling "Connect to TThrottle" makes no difference. The debug log looks pretty much the same.
But do you have "Reconnect every xx seconds" set?
Yes, at 710 sec.
Now the delay is around 1.5 sec.
Peter

fred

For the next version, I will not request temperatures from an unconnected computer in Tasks.
hî ---- Host: is not OK; an unpredictable string is being sent.

wicked

Quote from: fred on September 27, 2010, 09:35:36 AM
1) When a computer goes offline, there may be some longer delays. The computer has to time out first. After a couple of timeouts the computer is regarded as disconnected and things should go back to normal.
2) The update counter is linked to the last time it refreshed (wall clock) and to the number of tasks: the more tasks, the slower it gets.
You may be seeing the history refresh. The history refresh depends on the remaining time: it is 1/2 the expected remaining time, with a maximum of 30 seconds.

1) Does the lightning symbol appear at the same time the host times out? So that when I see the lightning symbol, the computer should be in the disconnected state and not part of the updating? Because the symbol was already there when the switching was delayed. It could have been that the reconnect time had passed when I tried. Or will tab switching always trigger a retry attempt?

Maybe regular updating and tab switching should be separated from the reconnection attempts? Those should probably happen on a schedule and on a background thread, IMHO. Additionally, the user could run an explicit menu command to trigger a background retry attempt (no "hidden" triggering by seemingly unrelated actions).
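The proposal above can be sketched as a worker thread that owns every reconnect attempt, so the display code only reads a flag. A minimal Python sketch, assuming a blocking `probe(host) -> bool` callable (for example a bounded TCP connect); `Reconnector`, `retry_now` and the interval parameter are hypothetical names, not BoincTasks' design:

```python
import threading


class Reconnector:
    """Hypothetical background worker: retries unconnected hosts on its own
    schedule so tab switches and display refreshes never wait on them."""

    def __init__(self, hosts, probe, interval):
        self.pending = set(hosts)      # hosts currently unconnected
        self.probe = probe             # callable(host) -> bool, may block
        self.interval = interval       # seconds between retry rounds
        self.lock = threading.Lock()
        self.wake = threading.Event()  # set by retry_now() or stop()
        self.stopped = False
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.thread.start()

    def retry_now(self):
        """Explicit user command: start a retry round immediately."""
        self.wake.set()

    def stop(self):
        self.stopped = True
        self.wake.set()
        self.thread.join()

    def connected(self, host):
        """Cheap, non-blocking check the UI can call at any time."""
        with self.lock:
            return host not in self.pending

    def _run(self):
        while not self.stopped:
            with self.lock:
                todo = list(self.pending)
            for host in todo:  # blocking probes run here, off the UI thread
                if self.probe(host):
                    with self.lock:
                        self.pending.discard(host)
            self.wake.wait(self.interval)  # sleep until next round or retry_now()
            self.wake.clear()
```

The UI never calls `probe` itself; the only user-visible trigger is `retry_now`, which matches the "no hidden triggering" idea.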

2) BT seems to go into overdrive sometimes (as it is at the moment) and updates something every second. It claims to have only 36 tasks to update, and I'm still selecting only one computer (out of three that are all online). BT transfers about 145 kB per second in total from all three hosts. That seems a bit excessive, although it does seem to scale with the number of tasks on the host. I guess this is the history fetching, which now queries all tasks every second?

BT is in overdrive, but switching seems fast at the moment, so I guess these are unrelated problems.

fred

Quote from: wicked on September 27, 2010, 02:51:04 PM

1) Does the lightning symbol appear at the same time the host times out? So that when I see the lightning symbol, the computer should be in the disconnected state and not part of the updating? Because the symbol was already there when the switching was delayed. It could have been that the reconnect time had passed when I tried. Or will tab switching always trigger a retry attempt?

Maybe regular updating and tab switching should be separated from the reconnection attempts? Those should probably happen on a schedule and on a background thread, IMHO. Additionally, the user could run an explicit menu command to trigger a background retry attempt (no "hidden" triggering by seemingly unrelated actions).

2) BT seems to go into overdrive sometimes (as it is at the moment) and updates something every second. It claims to have only 36 tasks to update, and I'm still selecting only one computer (out of three that are all online). BT transfers about 145 kB per second in total from all three hosts. That seems a bit excessive, although it does seem to scale with the number of tasks on the host. I guess this is the history fetching, which now queries all tasks every second?

BT is in overdrive, but switching seems fast at the moment, so I guess these are unrelated problems.
1) A background request is not a bad idea. At least something that doesn't slow down the display of data.
2) Only the selected computer is updated, but... the history is still running. The only way to get a reliable history is to request all results.
The history fetch interval depends on the time left and ranges from 1 to 30 seconds: time left / 2, with a maximum of 30.
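The history interval rule above is a clamp. A minimal Python sketch; the 1 and 30 second bounds come from the post, while the function name is hypothetical and which task's "time left" feeds in (for example the smallest remaining time of any task) is an assumption:

```python
def history_interval(time_left: float, lo: float = 1.0, hi: float = 30.0) -> float:
    """Seconds until the next history fetch: half the time left, clamped to [lo, hi]."""
    return min(hi, max(lo, time_left / 2))
```

If the input is the smallest remaining time of any task, a single task within 2 seconds of finishing keeps the interval at 1 second, and each fetch requests all results; that would be consistent with the every-second updates wicked describes.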

Pepo

Quote from: fred on September 27, 2010, 01:40:08 PM
For the next version, I will not request temperatures from an unconnected computer in Tasks.
hî ---- Host: is not OK; an unpredictable string is being sent.
Maybe. After the restart, the text changed to e.g. "13:12:49 ÈŠÆ ---- Host:", like:
27 september 2010 - 14:31:19 ÈŠÆ ---- Host: vetroplach,Vetroplach,Port: 31416, Couldn't resolve hostname
27 september 2010 - 14:31:36 Host: localhost, Rpc Thread ID: 12120, dnetc_cpu_normal_1028373_1 ---- Update State
27 september 2010 - 14:31:44 ÈŠÆ ---- Host: vetroplach,Vetroplach,Port: 31416, Couldn't resolve hostname
27 september 2010 - 14:31:54 Connected ---- Host: localhost,Pavilon6,Port: 31417, TThrottle Version: 2.20
27 september 2010 - 14:31:57 Vetroplach, Connect ---- Invalid Socket


Quote from: Pepo on September 27, 2010, 01:17:50 PM
Quote from: fred on September 27, 2010, 12:39:18 PM
But do you have "Reconnect every xx seconds" set?
Yes, at 710 sec.
Now the delay is around 1.5 sec.
I forgot to mention: setting "Reconnect every..." to 0 did not seem to have much influence (at least on the tab switching delay). Maybe the "17:23:11 ÈŠÆ ---- Host: vetroplach,Vetroplach,Port: 31416, Couldn't resolve hostname"-like strings were just appearing less often than every 20-40 seconds (more precisely, not at all during one 4-minute interval, but I could not tell why).

Quote from: wicked on September 27, 2010, 02:51:04 PM
1) Does the lightning symbol appear at the same time the host times out? So that when I see the lightning symbol, the computer should be in the disconnected state and not part of the updating? Because the symbol was already there when the switching was delayed.
Currently the red flash symbol is constantly displayed for my remote computer. I had not noticed it, since I do not use the sidebar computer selection; I don't have that many machines at hand.
Peter

Pepo

BT is still running as the same process launched after upgrading to 0.75, with the same single computer on a separate network. The switching time is now around 5 seconds, regardless of the "Connect to TThrottle" and "Reconnect every..." settings.

I also think it has little to do with responses from the local BOINC client, as BOINC Manager gets responses from the client at its regular 1-second intervals without any problem.
Peter