Testing Version 5.40

Started by fred, October 25, 2011, 09:54:57 AM

fred

You need BoincTasks 1.25 or up.
Add: Debug flag to exclude, to debug the exclude process:   <debug>1</debug>.
Changed: Show XXX: 0 instead of CPU: 0, to show the task is excluded.
Changed: Max number of cores 8 -> 12.
Changed: Max number of GPUs 6 -> 8.
Changed: Graphic: Max Cpu and Gpu lines, now show the actual max temperatures used
Changed: Scalable graph redesigned.
Changed: No longer shows a warning if another copy of TThrottle is running
Fixed: Driver debug started on Control+Shift, should toggle on the keyword DRIVER.
Fixed: File logging, should toggle on the keyword LOGGING.
Fixed: After a calibration the temperature window shows the wrong number of cores.
Fixed: BOINC connection mode: Even with a list of 0 programs, the regulator keeps on regulating on the old list.

Pepo

Quote from: fred on October 25, 2011, 09:54:57 AM
Changed: Graphic: Max Cpu and Gpu lines, now show the actual max temperatures used
My deepest thanks! Also for adding the 3 and 6 hour entries to the graph interval selector (which are missing from the announced changelist).

QuoteChanged: Scalable graph redesigned.
The small color boxes to the left of the checkboxes are not in line with the checkboxes and their Core/Gpu/Max texts (which are lined up correctly with each other), but are shifted up by some 4-5 pixels. This is on a Win7 machine with text size set to 115% (which might be causing this).
I'll check it later on another machine with no text enlargement.
Peter

fred

Quote from: Pepo on October 26, 2011, 08:31:40 AM
The small color boxes to the left of the checkboxes are not in line with the checkboxes and their Core/Gpu/Max texts (which are lined up correctly with each other), but are shifted up by some 4-5 pixels. This is on a Win7 machine with text size set to 115% (which might be causing this).
I'll check it later on another machine with no text enlargement.
Yep I noticed.

Pepo

Quote from: fred on October 26, 2011, 10:42:34 AM
Quote from: Pepo on October 26, 2011, 08:31:40 AM
The small color boxes to the left of the checkboxes are not in line with the checkboxes and their Core/Gpu/Max texts (which are lined up correctly with each other), but are shifted up by some 4-5 pixels. This is on a Win7 machine with text size set to 115% (which might be causing this).
I'll check it later on another machine with no text enlargement.
Yep I noticed.
On the other machine as well, and similarly also on BT 1.25.
Peter

Pepo

Quote from: Pepo on October 25, 2011, 06:08:42 PM
Just noticed that after running for a week, my TTh 5.30 has over 20 000 "Process" handles named like "<Non-existent Process>(9996)" - usually multiple handles for the same process ID.

This might have something in common with 5.40's "Fixed: BOINC connection mode: Even with a list of 0 programs, the regulator keeps on regulating on the old list." However, as TTh 5.40 lists BT 1.25 as a required partner, I'm holding off on upgrading and retesting.
Happens on 5.40 as well. After running for 2 hours (a bit less than my task switching interval), I can see two groups: 19 "<Non-existent Process>(4716)" and 6 "<Non-existent Process>(6728)" handles (both were famous_um_6.11_windows_intelx86.exe, according to the log).

Process Explorer now displays another two groups of handles:
- "105 references to 90 Process type handles" named "famous_um_6.11_windows_intelx86.exe(5968)",
- "100 refs to 92 handles" named "enigma2_0.76_windows_intelx86.exe(2340)",
and the numbers keep growing. Each group shares the same Object Address.

BTW, what is the difference here between "handle" and "reference"?

(I remember that TTh already had such a process handle leak in the past.)
Peter

fred

Quote from: Pepo on October 26, 2011, 03:12:52 PM
Happens on 5.40 as well. After running for 2 hours (a bit less than my task switching interval), I can see two groups: 19 "<Non-existent Process>(4716)" and 6 "<Non-existent Process>(6728)" handles (both were famous_um_6.11_windows_intelx86.exe, according to the log).

Process Explorer now displays another two groups of handles:
- "105 references to 90 Process type handles" named "famous_um_6.11_windows_intelx86.exe(5968)",
- "100 refs to 92 handles" named "enigma2_0.76_windows_intelx86.exe(2340)",
and the numbers keep growing. Each group shares the same Object Address.

BTW, what is the difference here between "handle" and "reference"?

(I remember that TTh already had such a process handle leak in the past.)
I don't know where you see this in the Process Explorer.

If you see "Driver regulator: active" in the logging, the driver is regulating, not TThrottle. And you can't see the driver with PE.

Non-existent Process should mean a process that isn't there, looks more like a project or BOINC problem to me.
More so, because the handles refer to a project exe.

Maybe the project exe crashes or is killed by BOINC? Something like: if the parent process is killed or has exited, it will be flagged as <non-existent>.

Pepo

Quote from: fred on October 26, 2011, 03:25:43 PM
If you see "Driver regulator: active" in the logging, the driver is regulating, not TThrottle. And you can't see the driver with PE.
Currently I do not see any note about regulation, but my temperatures are well below the thresholds (and I'm possibly not logging it :))

QuoteNon-existent Process should mean a process that isn't there, looks more like a project or BOINC problem to me.
More so, because the handles refer to a project exe.

Maybe the project exe crashes or is killed by BOINC? Something like: if the parent process is killed or has exited, it will be flagged as <non-existent>.
Yes, these "<Non-existent Process>(4716)" handles point to exited processes. In my understanding, this has nothing to do with BOINC killing tasks. Additionally, at the moment my BOINC client has just one Process handle open per in-memory task process - it does not even hold one for the Enigma wrapper's child, which I can see a hundredfold in TTh.

If you look at the Logging or Programs tab, you see that TTh is aware of the processes, their PIDs etc. You must know better whether TTh is supplying them to the driver as text, or as handles, or is creating and using the handles just internally (regular process list enumeration? There was also such a nice window in past versions). TTh is adding these handles at exactly the "Expert / Rebuild list after ... seconds" rate - this might be your entry point for guessing and debugging.

I just have no idea why only these two particular processes can be seen in the list now (Enigma + CPDN at the moment), because there are e.g. three CPU-intensive tasks running, plus some nCi's... and all three are listed in TTh's log. And funnily enough, some ten minutes ago, the Enigma task was even preempted :)

Back to the "<Non-existent Process>(4716)" - I guess that TTh is for some reason opening a large number of such Process handles and then forgets to close them all. When the observed task's process terminates, the remainder of my example "enigma2_0.76_windows_intelx86.exe(2340)" handles will surely turn into "<Non-existent Process>(2340)" - I'll let you know later. [.....] Yup, I now have 208 "<Non-existent Process>(2340)" handles, all displaying identical "References: 181", "Handles: 181", "Object Address: 0xFFFFFA8007305370" in their properties.

QuoteI don't know where you see this in the Process Explorer.
Initially, in the System Info dialog, I noticed an unusual number of handles in the system. Then, in the main window's Handles column, there were some 20 656 for TThrottle.
To see the list of a process's handles in Process Explorer, select the process's line and choose View / Lower Pane View / Handles (or Ctrl+H), then ideally sort the list by the Name column - this will put all identical handles together.
Or in the Process Hacker, open a process' Properties dialog, select Handles tab - the result is identical.
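
The "sort by Name and count the identical entries" step can also be done on a copied-out list. A minimal sketch, assuming the handle names have been pasted into a list of strings in the same "name(PID)" form shown in this thread; HANDLE_NAMES, count_by_pid and exited_pids are hypothetical names made up for this illustration, not part of Process Explorer or TThrottle:

```python
import re
from collections import Counter

# Hypothetical sample of handle names, in the "name(PID)" form quoted in this thread.
HANDLE_NAMES = [
    "<Non-existent Process>(4716)",
    "<Non-existent Process>(4716)",
    "<Non-existent Process>(6728)",
    "famous_um_6.11_windows_intelx86.exe(5968)",
    "famous_um_6.11_windows_intelx86.exe(5968)",
    "famous_um_6.11_windows_intelx86.exe(5968)",
    "enigma2_0.76_windows_intelx86.exe(2340)",
]

# Everything before the final "(digits)" is the image name, the digits are the PID.
NAME_RE = re.compile(r"^(?P<image>.+)\((?P<pid>\d+)\)$")


def count_by_pid(names):
    """Return a Counter mapping (image name, PID) to the number of handles."""
    counts = Counter()
    for name in names:
        match = NAME_RE.match(name.strip())
        if match:
            counts[(match.group("image"), int(match.group("pid")))] += 1
    return counts


def exited_pids(counts):
    """PIDs whose handles are shown as '<Non-existent Process>'."""
    return sorted(pid for (image, pid) in counts if image == "<Non-existent Process>")


if __name__ == "__main__":
    counts = count_by_pid(HANDLE_NAMES)
    # Print one line per (image, PID) group, ordered by PID.
    for (image, pid), n in sorted(counts.items(), key=lambda kv: kv[0][1]):
        print(f"{pid:>6}  {n:>5}  {image}")
```

Grouping on the (name, PID) pair rather than the name alone keeps handles for different exited processes apart, which is what the per-PID counts in this thread rely on.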
Peter

fred

Stop stop not so much text. :o I have to read it. ;D And it doesn't give me the info I need.

I still don't know if TThrottle is working in driver mode or not, it probably is.

Try rebooting the system.
What happens is this: TThrottle gets the PID from the BOINC client and sends the PID to the driver. The PID is used to throttle the threads, but that's done by Windows internally.
If the internal process or thread tables get corrupted, strange things may happen.

But I've never seen Non-existent Process messages. Furthermore these messages have nothing to do with handles.

Pepo

Quote from: fred on October 26, 2011, 05:36:27 PM
Stop stop not so much text. :o I have to read it. ;D And it doesn't give me the info I need.
;D I'm sorry

QuoteI still don't know if TThrottle is working in driver mode or not, it probably is.
It probably is (or let me know how to check for sure). Prior to the introduction of the driver mode a few releases ago, TTh used to produce a few more page faults - you surely remember me bugging you about it 8)

QuoteTry rebooting the system.
I'm doing this just now. You can let me know what to look for tomorrow.
QuoteIf the internal process or thread tables get corrupted, strange things may happen.
Better not :o

QuoteBut I've never seen Non-existent Process messages. Furthermore these messages have nothing to do with handles.
Neither did I. But I also did not mention any.
Peter

Pepo

Quote from: Pepo on October 26, 2011, 07:23:38 PM
Quote from: fred on October 26, 2011, 05:36:27 PM
I still don't know if TThrottle is working in driver mode or not, it probably is.
It probably is (or let me know how to check for sure). Prior to the introduction of the driver mode a few releases ago, TTh used to produce a few more page faults - you surely remember me bugging you about it 8)

QuoteTry rebooting the system.
I'm doing this just now. You can let me know what to look for tomorrow.
Sorry, I've just noticed it now in the log: "26 October 2011 - 21:35:49 Driver was correctly installed, driver version: 2.3
26 October 2011 - 21:35:49 Driver regulator: active"

Although there are said to be 4 matching processes, as possible candidates for regulation:
26 October 2011 - 21:37:15  Number of matching programs (processes): 8
Cpu: famous_um_6.11_windows_intelx86.exe, PID: 5124, Threads: 4
Cpu: hsgamma_fgrp1_0.23_windows_intelx86.exe, PID: 4224, Threads: 3
Cpu: setiathome_6.97_windows_intelx86.exe, PID: 4072, Threads: 3
Cpu: freehalboinc_1.93_windows_intelx86.exe, PID: 5648, Threads: 6
Gpu: famous_6.11_windows_intelx86.exe, PID: 4952, Threads: 3
Gpu: qcn_6.81_windows_intelx86__nci.exe, PID: 3616, Threads: 6
Gpu: crawler_1.08_windows_intelx86.exe, PID: 6740, Threads: 3
Gpu: data_collect_v3_3.27_windows_intelx86__nci.exe, PID: 6876, Threads: 3
---------------------------------------------------------------------------------------------------------------------   4 0 4 0
26 October 2011 - 21:37:15  Number of matching programs (processes): 4
XXX: 0 (0%) - PID:4952 (3) Slot:7 http://cpdnbeta.oerc.ox.ac.uk/ famous_v0tf_1599_200_000223920
CPU:1, GPU:0, PID:5124 (4) Child: famous_um_6.11_windows_intelx86.exe
CPU:1 (66%) - PID:4224 (3) Slot:6 http://einstein.phys.uwm.edu/ LATeah0002S_416.0_41000_0.0
CPU:1 (73%) - PID:4072 (3) Slot:4 http://setiweb.ssl.berkeley.edu/beta/ 12jl11aa.6761.24610.4.14.7
XXX: 0 (0%) - PID:6740 (3) Slot:1 http://surveill.dei.uc.pt/surveill/ wu_1319568902_40288
----------------------------------------------------------------------------------------------------------------------------------------------

The observed "Process" handles are being added just for "famous_um_6.11_windows_intelx86.exe(5124)". Maybe there will be more in parallel, as more preempted tasks become active and in-memory. I'll check in the morning.
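
To compare the PIDs in such a log excerpt with the PIDs that show up in the handle list, the candidate lines can be parsed mechanically. A minimal sketch, assuming the "Cpu:/Gpu: exe, PID: n, <thread label>: n" layout seen in the excerpt above; the thread label is matched generically because it depends on the UI language, and parse_candidates and SAMPLE are hypothetical names for this illustration only:

```python
import re

# One candidate line: kind, executable name, PID, and a localized thread-count label.
CANDIDATE_RE = re.compile(
    r"^(?P<kind>Cpu|Gpu): (?P<exe>\S+), PID: (?P<pid>\d+), \w+: (?P<threads>\d+)$"
)

# A few lines copied from the log excerpt in this post.
SAMPLE = """\
Cpu: famous_um_6.11_windows_intelx86.exe, PID: 5124, Thready: 4
Cpu: hsgamma_fgrp1_0.23_windows_intelx86.exe, PID: 4224, Thready: 3
Gpu: famous_6.11_windows_intelx86.exe, PID: 4952, Thready: 3
"""


def parse_candidates(text):
    """Return a list of (kind, exe, pid, threads) tuples for matching lines."""
    rows = []
    for line in text.splitlines():
        match = CANDIDATE_RE.match(line.strip())
        if match:
            rows.append(
                (match["kind"], match["exe"], int(match["pid"]), int(match["threads"]))
            )
    return rows


def pids_for(rows, exe):
    """All PIDs listed for a given executable name."""
    return [pid for (_kind, name, pid, _threads) in rows if name == exe]


if __name__ == "__main__":
    for row in parse_candidates(SAMPLE):
        print(row)
```

Lines that do not match the pattern (timestamps, separators, the "XXX:" task lines) are simply skipped, so a whole log excerpt can be fed in unchanged.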
Peter

fred

Quote from: Pepo on October 26, 2011, 07:49:09 PM

Although there are said to be 4 matching processes, as possible candidates for regulation:
26 October 2011 - 21:37:15  Number of matching programs (processes): 8
Cpu: famous_um_6.11_windows_intelx86.exe, PID: 5124, Threads: 4
Cpu: hsgamma_fgrp1_0.23_windows_intelx86.exe, PID: 4224, Threads: 3
Cpu: setiathome_6.97_windows_intelx86.exe, PID: 4072, Threads: 3
Cpu: freehalboinc_1.93_windows_intelx86.exe, PID: 5648, Threads: 6
Gpu: famous_6.11_windows_intelx86.exe, PID: 4952, Threads: 3
Gpu: qcn_6.81_windows_intelx86__nci.exe, PID: 3616, Threads: 6
Gpu: crawler_1.08_windows_intelx86.exe, PID: 6740, Threads: 3
Gpu: data_collect_v3_3.27_windows_intelx86__nci.exe, PID: 6876, Threads: 3
---------------------------------------------------------------------------------------------------------------------   4 0 4 0
26 October 2011 - 21:37:15  Number of matching programs (processes): 4
XXX: 0 (0%) - PID:4952 (3) Slot:7 http://cpdnbeta.oerc.ox.ac.uk/ famous_v0tf_1599_200_000223920
CPU:1, GPU:0, PID:5124 (4) Child: famous_um_6.11_windows_intelx86.exe
CPU:1 (66%) - PID:4224 (3) Slot:6 http://einstein.phys.uwm.edu/ LATeah0002S_416.0_41000_0.0
CPU:1 (73%) - PID:4072 (3) Slot:4 http://setiweb.ssl.berkeley.edu/beta/ 12jl11aa.6761.24610.4.14.7
XXX: 0 (0%) - PID:6740 (3) Slot:1 http://surveill.dei.uc.pt/surveill/ wu_1319568902_40288
----------------------------------------------------------------------------------------------------------------------------------------------

The observed "Process" handles are being added just for "famous_um_6.11_windows_intelx86.exe(5124)". Maybe there will be more in parallel, as more preempted tasks become active and in-memory. I'll check in the morning.
That may be, as all active BOINC processes are used in the first list.  E.g. projects that are suspended and still in memory.
The second list is the actual running list from the BOINC client.

Pepo

Sorry for maybe another lengthy post  :-[ I'm putting down my observations and thoughts.
To finally make the long story short: TThrottle seems to be accumulating handles just for wrappers' child processes.



Quote from: Pepo on October 26, 2011, 07:49:09 PM
Although there are said to be 4 matching processes, as possible candidates for regulation: [...] the observed "Process" handles are being added just for "famous_um_6.11_windows_intelx86.exe(5124)". Maybe there will be more in parallel, as more preempted tasks become active and in-memory. I'll check in the morning.
A day and a half later I'm back to the Process handles. TTh is using 3744 handles ATM, and the number seems to have stabilized, oscillating around 7643-7647 +- 1-2 handles. [...] But it is indeed occasionally rising: 3752, 3762, 3778 - checked approx. once every 10 minutes.

Sorted list of "<Non-existent Process>(PID)" handles (PID - count):
2488 - 16
3032 - 16
3760 - 15
3880 - 22
3888 - 16
5500 - 16
5648 - 22
5964 - 16
5972 - 15
6020 - 15
6208 - 16
6752 - 16
6812 - 16
7200 - 15
7224 - 23
7348 - 16
7464 - 90
7468 - 16
7488 - 16
7700 - 22
7728 - 15
7796 - 15
7988 - 15
8068 - 16
8096 - 15
8160 - 15
8260 - 15
8296 - 15
8524 - 15
8612 - 248
8764 - 18
8904 - 15
9008 - 16

Remaining Process handles:
enigma2_0.76_windows_intelx86.exe(8760) - 167
famous_um_6.11_windows_intelx86.exe(5124) - 1892
hadcm3n_um_6.07_windows_intelx86.exe(8216) - 336
primegrid_cllr.exe(8076) - 324

As expected, these four processes still reside in memory, and were listed in the TTh log at some point in time as candidates for throttling. (BTW, during the last 24 hours, just a bit of throttling happened, 21-22 hours ago.)

BTW I'm noticing that Win7 reuses PIDs just a few hours after a process vanishes - I thought they would stay unique for a bit longer:
Quote28 October 2011 - 04:46:58  Number of matching programs (processes): 5
CPU:1 (50%) - PID:9008 (3)   Slot:2   http://surveill.dei.uc.pt/surveill/   wu_1319568902_94082


28 October 2011 - 08:41:10  Number of matching programs (processes): 5
CPU:1, GPU:0, PID:9008 (2)   Child:   wcg_dsfl_vina_6.19_windows_intelx86
The log list is incomplete, covering only the last few hours, so I've got just a few (~15) PID matches - with this one Surveill exception, it was always wcg_dsfl_vina_6.19_windows_intelx86 - maybe this WCG DSFL project task is the culprit?
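
Spotting such PID reuse by eye gets tedious with longer logs. A minimal sketch of how it could be checked, assuming the log entries have already been reduced to (timestamp, PID, label) tuples; the two entries for PID 9008 are taken from the excerpt above, while the third entry and the names OBSERVATIONS and find_reused_pids are made up for this illustration:

```python
from collections import defaultdict

# (ISO timestamp, PID, label of what was running under that PID).
OBSERVATIONS = [
    ("2011-10-28 08:41:10", 9008, "wcg_dsfl_vina_6.19_windows_intelx86"),
    ("2011-10-28 04:46:58", 9008, "wu_1319568902_94082"),
    # Hypothetical extra entry, only here to show a PID that is not reused.
    ("2011-10-28 08:41:10", 1234, "example_task"),
]


def find_reused_pids(observations):
    """Return {pid: [labels in chronological order]} for PIDs seen with more than one label."""
    seen = defaultdict(list)
    # ISO timestamps sort chronologically when compared as strings.
    for _timestamp, pid, label in sorted(observations):
        if label not in seen[pid]:
            seen[pid].append(label)
    return {pid: labels for pid, labels in seen.items() if len(labels) > 1}


if __name__ == "__main__":
    for pid, labels in find_reused_pids(OBSERVATIONS).items():
        print(pid, "->", " then ".join(labels))
```

A PID that appears here under two labels means that a "<Non-existent Process>(PID)" handle count cannot be attributed to one task by PID alone.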

TTh seems to be regularly adding handles for these processes. Maybe it did so for the WCG task, but could not free them?

The WCG task is now preempted, with the worker child process' thread in the Wait:Suspended state. According to the BOINC log, the task was first launched after the reboot on 27.10.2011 17:26:04 - this time is identical to when the resident wrapper process (PID=8136) started. Its current worker child (PID=7416) started on 28. 10. 2011 10:52:17 - this time, weirdly (???), matches the moment when the task was last preempted, after a series of checkpoints:

Quote
28.10.2011 7:53:08 | World Community Grid | [task] task_state=EXECUTING for DSFL_00000044_0000027_0122_0 from unsuspend
28.10.2011 7:53:08 | World Community Grid | Resuming task DSFL_00000044_0000027_0122_0 using dsfl version 619
28.10.2011 8:09:04 | World Community Grid | [task] result DSFL_00000044_0000027_0122_0 checkpointed
28.10.2011 8:25:00 | World Community Grid | [task] result DSFL_00000044_0000027_0122_0 checkpointed
[......]
28.10.2011 10:35:54 | World Community Grid | [task] result DSFL_00000044_0000027_0122_0 checkpointed
28.10.2011 10:52:18 | World Community Grid | [task] result DSFL_00000044_0000027_0122_0 checkpointed
28.10.2011 10:52:18 | World Community Grid | [cpu_sched] Preempting DSFL_00000044_0000027_0122_0 (left in memory)
28.10.2011 10:52:18 | World Community Grid | [task] task_state=SUSPENDED for DSFL_00000044_0000027_0122_0 from suspend

Now the task runs again (since 15:03:48), TTh lists it as a candidate (just no throttling is happening, temperatures are fine):
QuoteXXX: 0 (0%) - PID:8136 (3)   Slot:15   http://www.worldcommunitygrid.org/   DSFL_00000044_0000027_0122
CPU:1, GPU:0, PID:7416 (2)   Child:   wcg_dsfl_vina_6.19_windows_intelx86
and I can see 15 "wcg_dsfl_vina_6.19_windows_intelx86(7416)" Process handles for it, a count that keeps being incremented (16 now, 17, 18...)
I'll let it run now.


My preliminary conclusion is that the wrapper process, for some reason, discards its worker at the end of a time slot, replaces it with a fresh new worker and puts it on hold? [...] No, I've just seen one checkpoint - process 7416 vanished and got replaced with a new worker PID=8500. TTh now holds 19 "wcg_dsfl_vina_6.19_windows_intelx86(7416)" Process handles and 4 "wcg_dsfl_vina_6.19_windows_intelx86(8500)" handles (and keeps incrementing the latter).

(Maybe when I'll close the Handles list in Process Explorer and open it once more, PE will forget the old process name and "wcg_dsfl_vina_6.19_windows_intelx86(7416)" will turn into "<Non-existent Process>(7416)"? Yes, happened exactly.)



My final conclusion: I have no idea why TTh adds these handles (at the Process List refresh interval rate), but apparently they are not discarded after a process terminates. It appears that this is happening only with the worker child processes of those tasks that are split into wrapper and worker processes.
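
The symptom (one more handle per list rebuild, never released) is what a missing close call would produce. A toy model of that hypothesis, not TThrottle's actual code: FakeHandleTable is a made-up stand-in for a process's handle table, and rebuild_leaky / rebuild_closing are hypothetical names for the two variants being compared.

```python
class FakeHandleTable:
    """Toy stand-in for a per-process handle table (not a Windows API)."""

    def __init__(self):
        self._next_handle = 1
        self.open_handles = {}  # handle id -> PID it refers to

    def open_process(self, pid):
        handle = self._next_handle
        self._next_handle += 1
        self.open_handles[handle] = pid
        return handle

    def close(self, handle):
        del self.open_handles[handle]


def rebuild_leaky(table, child_pids):
    """Opens a handle per child on every rebuild and never closes it."""
    for pid in child_pids:
        table.open_process(pid)  # handle is used once, then forgotten


def rebuild_closing(table, child_pids):
    """Same work, but every handle is closed again before the rebuild ends."""
    for pid in child_pids:
        handle = table.open_process(pid)
        try:
            pass  # inspect the process through the handle here
        finally:
            table.close(handle)


def handles_after(rebuild, child_pids, rebuilds):
    """Number of handles still open after the given number of list rebuilds."""
    table = FakeHandleTable()
    for _ in range(rebuilds):
        rebuild(table, child_pids)
    return len(table.open_handles)


if __name__ == "__main__":
    print("leaky:  ", handles_after(rebuild_leaky, [7416], 19))
    print("closing:", handles_after(rebuild_closing, [7416], 19))
```

In the leaky variant the count grows by one per child per rebuild and stays after the child exits, which would match handles accumulating at the rebuild rate; whether this is really what happens inside TTh is only a guess.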
Peter

fred

Stop, stop, I'm getting information overload and am no closer to understanding.

Please only answer my questions so we can go step by step, otherwise we will never find this.

Where do you see this, tab etc, screenshot maybe. Because I haven't got a clue.

Sorted list of "<Non-existent Process>(PID)" handles (PID - count):

Does this only happen with child processes?


Pepo

Quote from: fred on October 28, 2011, 03:08:49 PM
Where do you see this, tab etc, screenshot maybe. Because I haven't got a clue.
Sorted list of "<Non-existent Process>(PID)" handles (PID - count):
As noted previously:
Quote from: Pepo on October 26, 2011, 05:04:26 PM
To see the list of a process's handles in Process Explorer, select the process's line and choose View / Lower Pane View / Handles (or Ctrl+H), then ideally sort the list by the Name column (a click on its header) - this will put all identical handles together.
Or in the Process Hacker, open a process' Properties dialog, select Handles tab - the result is identical.

QuoteDoes this only happen with child processes?
Apparently only with these. I do not see any stand-alone task processes in TTh's handles' list. Only wrappers' children.
Peter

fred

Do you have a project with only troublesome child processes, for testing, so I can see what's going on?