eFMer - BoincTasks and TThrottle forum

BoincTasks For Window, Mac & Linux => Beta Testing => Topic started by: idahofisherman on November 14, 2011, 04:29:33 AM

Title: 1.27.still looping and using most of the CPU (98%)
Post by: idahofisherman on November 14, 2011, 04:29:33 AM
This has been happening since 1.20, but I have not been able to pinpoint it to anything specific. It also seems to cause the boincmgr to crash with "invalid password supplied". When I reconnect using localhost, the boincmgr comes up all right. Exiting BoincTasks and restarting it allows it to run for a while, but the CPU usage slowly builds up again until the computer becomes unusable and the boincmgr loses its connection to the client.

Are there any suggestions as to how to track this down? It's become a pain in the butt, because I cannot leave BoincTasks unattended for more than 12 hours.
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: fred on November 14, 2011, 08:04:18 AM
Quote from: idahofisherman on November 14, 2011, 04:29:33 AM
This has been happening since 1.20, but I have not been able to pinpoint it to anything specific. It also seems to cause the boincmgr to crash with "invalid password supplied". When I reconnect using localhost, the boincmgr comes up all right. Exiting BoincTasks and restarting it allows it to run for a while, but the CPU usage slowly builds up again until the computer becomes unusable and the boincmgr loses its connection to the client.

Are there any suggestions as to how to track this down? It's become a pain in the butt, because I cannot leave BoincTasks unattended for more than 12 hours.
It's not a good idea to run both the Manager and BoincTasks.
As the BOINC Manager is having problems connecting, it points to some problem with the BOINC Manager or the BOINC client. But I've seen the BOINC Manager do this on my own computer when it gets too many WUs to handle.
It may be a memory problem in the Manager that affects BoincTasks as well.

What BOINC version are you using? What Windows version? 32 or 64 bit?
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: fred on November 14, 2011, 10:13:37 AM
For V 1.28 I will add some run time graphs, to better analyze the problem.
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: idahofisherman on November 14, 2011, 09:00:20 PM
I am presently running XP3 32 bit.

I can't seem to get BT to connect to the client without the boincmgr running.
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: fred on November 15, 2011, 07:01:24 AM
Quote from: idahofisherman on November 14, 2011, 09:00:20 PM
I am presently running XP3 32 bit.

I can't seem to get BT to connect to the client without the boincmgr running.
Probably related to: http://www.efmer.eu/forum_tt/index.php?topic=862.0
Could you answer the same question?
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: idahofisherman on November 15, 2011, 04:16:19 PM
I run in administrator mode at all times. I have also changed BT to start the BOINC client. That seems to work okay and I can attach the client. I have also changed the BOINC folders and subfolders from read-only to allow all activities.
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: idahofisherman on November 17, 2011, 08:42:56 AM
Still looping. I noticed that BOINC is not running when I find BT looping. I have to kill BT with taskmgr and restart it.
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: fred on November 17, 2011, 08:48:07 AM
Quote from: idahofisherman on November 17, 2011, 08:42:56 AM
Still looping. I noticed that BOINC is not running when I find BT looping. I have to kill BT with taskmgr and restart it.
BOINC Version?
Why is the BOINC client not running?
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: idahofisherman on November 17, 2011, 05:12:38 PM
BOINC version 6.13.12. I don't know why it isn't running. It starts when I start BT. Is there somewhere I can look to see why it stopped?
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: fred on November 17, 2011, 06:13:23 PM
Quote from: idahofisherman on November 17, 2011, 05:12:38 PM
BOINC version 6.13.12. I don't know why it isn't running. It starts when I start BT. Is there somewhere I can look to see why it stopped?
6.13.12 is more alpha, a bit too buggy for me to test. I have one computer running, but strange things happen.
That explains the strange behavior: a crashed client.
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: Pepo on November 17, 2011, 06:26:09 PM
It smells very much like my issues with BT 1.25-1.27 and BOINC 6.13.9+6.13.10.
Please, idahofisherman, have you observed any of the issues I've been reporting?

I'm just thinking of checking an even newer 6.13.x client. I'd like to find out sooner (alpha) rather than at a later (pre-release) stage...
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: fred on November 17, 2011, 06:31:03 PM
I check out the new BOINC versions, but it's too much work to check all the changes. In V 10 I reported a problem writing the cc_config.xml, let's see if they fixed it.
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: idahofisherman on November 19, 2011, 01:51:56 AM
Here is the last part of stdoutdae file for when Boinc wasn't running and BT was in a loop.  Maybe this will be some help.

18-Nov-2011 17:29:17 [---] failed to rename xfer history file: Error 5
18-Nov-2011 17:29:26 [Server for testing Bolpex] Scheduler request completed
18-Nov-2011 17:30:11 [EDGI Demo Project] Sending scheduler request: Requested by project.
18-Nov-2011 17:30:11 [EDGI Demo Project] Not reporting or requesting tasks
18-Nov-2011 17:30:14 [EDGI Demo Project] Scheduler request completed
18-Nov-2011 17:30:14 [---] Can't rename current state file to previous state file; The process cannot access the file because it is being used by another process. (0x20)
18-Nov-2011 17:30:14 [---] rename error: Access is denied. (0x5)


Here is the end of the BT log, just before I terminated BT while it was in a loop:

Elements,Port: 31416, connection error
18 November 2011 - 18:11:46 Update State ---- Host: 192.168.0.196, Rpc Thread ID: 5156, wu_1321525201_66099_0
18 November 2011 - 18:13:10 Connect, init ---- Host: localhost, Elements,Port: 31416, connection error
18 November 2011 - 18:14:29 Update State ---- Host: 192.168.0.100, Rpc Thread ID: 2600, wu_1321525201_66086_0
18 November 2011 - 18:14:45 Update State ---- Host: 192.168.0.199, Rpc Thread ID: 5444, wu_1321525201_66200_0
18 November 2011 - 18:15:24 Connect, init ---- Host: localhost, Elements,Port: 31416, connection error
18 November 2011 - 18:17:26 Connect, init ---- Host: localhost, Elements,Port: 31416, connection error
18 November 2011 - 18:18:14 Update State ---- Host: 192.168.0.102, Rpc Thread ID: 888, qcnc_001213_0
18 November 2011 - 18:19:27 Connect, init ---- Host: localhost, Elements,Port: 31416, connection error
18 November 2011 - 18:20:24 Update State ---- Host: 192.168.0.197, Rpc Thread ID: 4920, wu_1321525201_66325_0
18 November 2011 - 18:21:28 Connect, init ---- Host: localhost, Elements,Port: 31416, connection error
18 November 2011 - 18:23:28 Connect, init ---- Host: localhost, Elements,Port: 31416, connection error
18 November 2011 - 18:25:29 Connect, init ---- Host: localhost, Elements,Port: 31416, connection error
18 November 2011 - 18:26:04 Update State ---- Host: 192.168.0.102, Rpc
18-Nov-2011 17:30:14 [---] Couldn't write state file: rename() failed; giving up


It looks like it is trying to connect to the localhost BOINC client, and just keeps looping.
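
To make the pattern in the BT log above easier to see, the connection errors can be tallied per host. This is only a rough Python sketch and not part of BoincTasks: the function name `count_connection_errors` and the regular expression are assumptions based purely on the line format in the pasted log.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... Connect, init ---- Host: localhost, Elements,Port: 31416, connection error"
# The pattern is an assumption derived from the pasted log, not from any
# BoincTasks documentation.
CONNECT_ERROR = re.compile(
    r"Connect, init ---- Host: (?P<host>[^,]+),.*connection error"
)


def count_connection_errors(lines):
    """Return a Counter mapping host name to its number of connection errors."""
    errors = Counter()
    for line in lines:
        match = CONNECT_ERROR.search(line)
        if match:
            errors[match.group("host").strip()] += 1
    return errors


if __name__ == "__main__":
    sample = [
        "18 November 2011 - 18:13:10 Connect, init ---- Host: localhost, Elements,Port: 31416, connection error",
        "18 November 2011 - 18:14:29 Update State ---- Host: 192.168.0.100, Rpc Thread ID: 2600, wu_1321525201_66086_0",
        "18 November 2011 - 18:15:24 Connect, init ---- Host: localhost, Elements,Port: 31416, connection error",
    ]
    for host, count in count_connection_errors(sample).items():
        print(f"{host}: {count} connection error(s)")
```

If only localhost shows up in the tally while the remote hosts keep updating, that would fit the local client having stopped rather than a general network problem.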
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: fred on November 19, 2011, 10:09:21 AM
Quote from: idahofisherman on November 19, 2011, 01:51:56 AM
Here is the last part of stdoutdae file for when Boinc wasn't running and BT was in a loop.  Maybe this will be some help.

18-Nov-2011 17:29:17 [---] failed to rename xfer history file: Error 5
18-Nov-2011 17:30:14 [---] Can't rename current state file to previous state file; The process cannot access the file because it is being used by another process. (0x20)
18-Nov-2011 17:30:14 [---] rename error: Access is denied. (0x5)

It looks like it is trying to connect to the localhost BOINC client, and just keeps looping.
Something is locking files in the BOINC data folder.
I see 2 different files being locked.
Is there a rescheduler running or do you have a backup program running?
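
For reference, the hex codes in the client log lines quoted above are standard Windows system error codes: 0x5 is ERROR_ACCESS_DENIED and 0x20 is ERROR_SHARING_VIOLATION, meaning another process has the file open. A small Python sketch for pulling them out of a log might look like this. The helper name `explain_error_codes` is made up for illustration, and it only recognizes codes written in the "(0x..)" form, so the plain "Error 5" line is not picked up.

```python
import re

# Windows system error codes that appear in the quoted client log.
KNOWN_CODES = {
    0x5: "ERROR_ACCESS_DENIED",
    0x20: "ERROR_SHARING_VIOLATION (file is open in another process)",
}

# Finds codes written as "(0x20)" at any position in a line.
HEX_CODE = re.compile(r"\(0x([0-9A-Fa-f]+)\)")


def explain_error_codes(lines):
    """Yield (code, description) for every "(0x..)" code found in the lines."""
    for line in lines:
        for hex_digits in HEX_CODE.findall(line):
            code = int(hex_digits, 16)
            yield code, KNOWN_CODES.get(code, "unknown code")


if __name__ == "__main__":
    log = [
        "18-Nov-2011 17:30:14 [---] Can't rename current state file to previous state file; "
        "The process cannot access the file because it is being used by another process. (0x20)",
        "18-Nov-2011 17:30:14 [---] rename error: Access is denied. (0x5)",
    ]
    for code, description in explain_error_codes(log):
        print(f"0x{code:X}: {description}")
```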
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: idahofisherman on November 19, 2011, 08:48:34 PM
Neither
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: fred on November 20, 2011, 09:35:43 AM
Quote from: idahofisherman on November 19, 2011, 08:48:34 PM
Neither
Virus scanner? http://boinc.berkeley.edu/dev/forum_thread.php?id=1470#7574
Otherwise it's a bug in the BOINC client.
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: Beyond on November 20, 2011, 04:15:07 PM
If you haven't tried this yet, I'd suggest:

Disable long term history and then delete all the long term history files from your history and history/backup folders.  Restart BT.
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: fred on November 20, 2011, 06:20:08 PM
Quote from: Beyond on November 20, 2011, 04:15:07 PM
If you haven't tried this yet, I'd suggest:

Disable long term history and then delete all the long term history files from your history and history/backup folders.  Restart BT.
The error is from the BOINC client not from BT, so no need to delete history files.
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: idahofisherman on November 20, 2011, 08:47:42 PM
I am convinced that BT is the problem, as I ran the client without BT for approximately 20 hours with no problem: the client did not disappear and nothing looped. I have now activated BT again and we shall see what happens.
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: idahofisherman on November 21, 2011, 12:44:22 AM
BT is back to its looping and making the client disappear.
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: fred on November 21, 2011, 08:12:16 AM
Quote from: idahofisherman on November 21, 2011, 12:44:22 AM
BT is back to its looping and making the client disappear.
A client that disappears is always a client problem.
And locking a file is not something BT does.
Title: Re: 1.27.still looping and using most of the CPU (98%)
Post by: Beyond on November 21, 2011, 04:13:06 PM
Quote from: fred on November 21, 2011, 08:12:16 AM
Quote from: idahofisherman on November 21, 2011, 12:44:22 AM
BT is back to its looping and making the client disappear.
A client that disappears is always a client problem.
And locking a file is not something BT does.
I have 2 scenarios when clients show as unconnected: the first is when a machine becomes slow due to a heavy load or an app that hogs the processor. The second is when machines are connected via wireless connections that don't have the best signal strength. The big problem is the way BT handles these situations: it often crashes (sometimes a hard crash, sometimes a restart). This has IMO always been the #1 BT problem. It was getting better up to v1.21 and has been worse again since v1.21.

Edit:  A third scenario I've been seeing in v1.27:  if a BOINC client is stopped and restarted for any reason sometimes its jobs will not reappear.  Looking at the left pane it appears to be connected but there are no tasks in the task list.  Left it in this condition for up to 30 minutes and still no tasks.  Double-clicking on the computer will not cause the tasks to reappear, only exiting and restarting BT brings them back.  Have not seen this happen in v1.21.
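
On the unconnected scenarios above: one general way for a monitoring tool to cope with a slow or unreachable client without eating CPU is to retry the connection with a growing delay. The Python sketch below only illustrates that idea and says nothing about how BoincTasks is actually implemented. The function name `wait_for_client` and all of its parameters are hypothetical; port 31416 is simply the port shown in the BT log earlier in the thread.

```python
import socket
import time


def wait_for_client(host="localhost", port=31416, attempts=5,
                    first_delay=1.0, max_delay=60.0,
                    connect=socket.create_connection, sleep=time.sleep):
    """Try to open a TCP connection, doubling the wait after each failure.

    Returns the connected socket, or None if every attempt failed.
    The connect and sleep arguments exist only so the function can be
    exercised without a real network.
    """
    delay = first_delay
    for attempt in range(attempts):
        try:
            return connect((host, port), timeout=5.0)
        except OSError:
            if attempt == attempts - 1:
                break  # no point sleeping after the final attempt
            sleep(delay)  # the thread is idle here, so no CPU is burned
            delay = min(delay * 2, max_delay)
    return None


if __name__ == "__main__":
    sock = wait_for_client(attempts=3)
    if sock is None:
        print("client not reachable, giving up for now")
    else:
        print("connected")
        sock.close()
```

Capping the delay with `max_delay` keeps the tool reasonably responsive once the client comes back, while the sleep between attempts avoids a tight retry loop.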