BT 1.03

Started by Beyond, May 13, 2011, 12:23:45 PM


fred

Quote from: Pepo on May 17, 2011, 02:15:38 PM
As the full dump was 165 MB (34 MB even compressed), I've uploaded it instead; I assume it needs to be deleted soon because of its size.
Got it, and deleted it.
The dump points to the history "delete after time" section.
But nothing definitive; it may be a program problem somewhere else or memory corruption by something.

Pepo

Quote from: fred on May 17, 2011, 02:32:03 PM
Quote from: Pepo on May 17, 2011, 02:15:38 PM
As the full dump was 165 MB...
The dump points to the history "delete after time" section.
But nothing definitive; it may be a program problem somewhere else or memory corruption by something.
My history is 365+23; possibly some of the oldest entries just started to get purged? ... No, apparently the oldest ones will only start being deleted in a few days. (Or the oldest batch was already deleted and I haven't noticed.)
Peter
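fred's look at the dump pointed to the history "delete after time" section, and Pepo's estimate is about when the oldest entries start being purged. To illustrate what such a time-based purge does, here is a minimal sketch, assuming each history entry carries the time its task finished and that the retention period is given in days. The entry layout and the function name are hypothetical; this is not BT's code.

```python
from datetime import datetime, timedelta

# Hypothetical history entry: (task name, time the task finished).
# BT's real history storage is not described in this thread.
HistoryEntry = tuple[str, datetime]


def purge_history(entries: list[HistoryEntry], now: datetime,
                  keep_days: int) -> list[HistoryEntry]:
    """Return only the entries that finished within the last keep_days days.

    Building a new list, rather than deleting from `entries` while looping
    over it, sidesteps the classic erase-while-iterating mistake.
    """
    cutoff = now - timedelta(days=keep_days)
    return [entry for entry in entries if entry[1] >= cutoff]


if __name__ == "__main__":
    now = datetime(2011, 5, 17, 14, 0)
    history = [
        ("old_task", now - timedelta(days=400)),
        ("recent_task", now - timedelta(days=10)),
    ]
    kept = purge_history(history, now, keep_days=365)
    print([name for name, _ in kept])  # ['recent_task']
```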

Pepo

I regularly see Rosetta Mini CPU tasks that were preempted right at 100% and are still in the "Ready to run" state. (They need just one last kick to finish, less than 1 second! >:()

In this state, as with my "casd_sr10_boinc_nmr_control.2vajA_30_abrelax_cs_frags_tex_IGNORE_THE_REST_25722_26751_0" task right now, BT shows on the Tasks tab an "Elapsed (CPU) time" of "-(-)" (possibly because of the 100%??), a "last checkpoint CPU time" of e.g. "[0] 01:51:00", and an ETA of "-". In the Properties window there is no CPU time, but the checkpoint time is the same, and the Elapsed time is also available there, e.g. "01:53:26". BOINC Manager also displays its CPU time (equal to the checkpoint time).

As long as the task is visible in the tab, BT should correctly display its Elapsed, CPU and checkpoint times...
Peter
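The values such a row needs are apparently available (the Properties window shows the elapsed time, BOINC Manager the CPU time), so the row could show them even at 100%. A minimal sketch of rendering the cell, assuming both times are known as whole seconds and mimicking the "HH:MM:SS (HH:MM:SS)" and "DDd,HH:MM:SS" styles quoted in this thread. The function names are hypothetical; this is not BT's code.

```python
from typing import Optional


def format_duration(seconds: int) -> str:
    """HH:MM:SS, or DDd,HH:MM:SS once a full day is reached (style seen in this thread)."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    clock = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{days:02d}d,{clock}" if days else clock


def elapsed_cpu_cell(elapsed_s: Optional[int], cpu_s: Optional[int]) -> str:
    """Render an "Elapsed (CPU)" cell; use "-" only for a value that is really unknown."""
    elapsed = format_duration(elapsed_s) if elapsed_s is not None else "-"
    cpu = format_duration(cpu_s) if cpu_s is not None else "-"
    return f"{elapsed} ({cpu})"


if __name__ == "__main__":
    # Values from the Rosetta task above: elapsed 01:53:26, CPU 01:51:00.
    print(elapsed_cpu_cell(6806, 6660))  # 01:53:26 (01:51:00)
```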

Pepo

One more crash dump (uploaded).
The message about a .dmp file being generated popped up after I opened the tray icon's context menu. (I admit the system was pretty unstable during the previous two hours; I've been punishing it ;D.) Nevertheless BT still seemed to work fine (possibly just some exception handler generated the dump). After I confirmed it, one more message with an exception description popped up, and at that moment BT closed itself.
Peter

Pepo

Quote from: Pepo on April 28, 2011, 09:20:48 AM
One found:

  • An errored-out (but not reported yet) task's Properties window is empty.
Something similar:

  • A task which failed to download (similar to an errored-out task) and was also not reported yet does show its details in the Properties window (except that the state there is still "Downloading" instead of "Download failed" or similar), but it does not appear in the History tab, and possibly because of this it is not counted in the tray icon's "Report finished tasks" counter either.
Peter
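Pepo's point is that a failed-but-unreported download should be treated like an errored-out task. A minimal sketch of a "Report finished tasks" counter along those lines, assuming a simple list of tasks with a display state and a reported flag. The state strings, the class and the function name are hypothetical, not BT's or BOINC's internals.

```python
from dataclasses import dataclass

# Hypothetical display states; BOINC's real internal state codes are not used.
# The point is only that a failed download counts as "finished" here.
FINISHED_STATES = {"Ready to report", "Computation error", "Download failed"}


@dataclass
class Task:
    name: str
    state: str
    reported: bool


def count_to_report(tasks: list[Task]) -> int:
    """Count tasks that are finished (including failed downloads) but not yet reported."""
    return sum(1 for t in tasks if t.state in FINISHED_STATES and not t.reported)


if __name__ == "__main__":
    tasks = [
        Task("done", "Ready to report", reported=False),
        Task("errored", "Computation error", reported=False),
        Task("no_input", "Download failed", reported=False),
        Task("busy", "Running", reported=False),
        Task("old", "Ready to report", reported=True),
    ]
    print(count_to_report(tasks))  # 3
```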

Pepo

Quote from: Pepo on May 18, 2011, 09:00:37 PM
One more crash dump (uploaded).
The message about a .dmp file being generated popped up after I opened the tray icon's context menu. (I admit the system was pretty unstable during the previous two hours; I've been punishing it ;D.) Nevertheless BT still seemed to work fine (possibly just some exception handler generated the dump). After I confirmed it, one more message with an exception description popped up, and at that moment BT closed itself.
Three more crash dumps overnight ("Sorry but BoincTasks has crashed. Dump file(s): BoincTasks_103_20-05-2011_05-59.dmp, BoincTasks_103_20-05-2011_07-12.dmp, BoincTasks_103_20-05-2011_07-45.dmp, created succesfully.") from the same BT instance. Again it seemed to work until I confirmed the dumps with OK ("boinctasks64.exe - Fatal Application Exit").

Do I have to upload (some of) them?

Again my system was low on memory: the EVO project launched 14 :o (fourteeeeeen) "20.09 LBCpp Beta" tasks in the same .../slots/19/, the RunBoincWorkUnit_20.09_windows_intelx86.exe executable needs to allocate 1.7 GB of memory, and the last two of them are still running and trying to grab more RAM to proceed. My OS tried to adapt to the situation and enlarged my pagefile by 8 GB ::) AFAICT only BT and Skype did not survive.
Peter

fred

Quote from: Pepo on May 20, 2011, 10:31:43 AM
Do I have to upload (some of) them?

Again my system was low on memory: the EVO project launched 14 :o (fourteeeeeen) "20.09 LBCpp Beta" tasks in the same .../slots/19/, the RunBoincWorkUnit_20.09_windows_intelx86.exe executable needs to allocate 1.7 GB of memory, and the last two of them are still running and trying to grab more RAM to proceed. My OS tried to adapt to the situation and enlarged my pagefile by 8 GB ::) AFAICT only BT and Skype did not survive.
Running low on memory will get you into serious problems.
I don't check for memory problems all the time; it's way too intensive to check in thousands of places.

Pepo

Quote from: fred on May 20, 2011, 11:38:28 AM
Quote from: Pepo on May 20, 2011, 10:31:43 AM
Do I have to upload (some of) them?
Again my system was low on memory ...
Running low on memory will get you into serious problems.
I don't check for memory problems all the time; it's way too intensive to check in thousands of places.
Sure. Then I'll assume it was caused by the lack of memory.
Peter

Pepo

A few minutes ago (around 16:40) I noticed a cryptic "20100-state" in History. I copied the uncolored line, but a few seconds later the line disappeared completely! :o When ??? do such lines appear there?

It was a CPDN 6.09 HADAM3P EU task with an Elapsed (CPU) time of "04d,21:24:36 (04d,18:39:56)". However, I'm not sure whether "20.05.11  14:38" was in the Finished or the Reported column (the tabs were unfortunately replaced by spaces, but judging by the number of spaces it was more likely in Finished).

The corresponding Tasks line contained (a bit later): Elapsed (CPU) time = "04d,21:24:36 (04d,20:56:26)", Progress% = "62.604", Checkpoint = "[0] 00:00:06", Remaining = "03d,00:07:10", State = "Waiting to run".

OTOH, the two very different CPU times (while both Elapsed times were equal) seemed weird to me, so I compared them with BOINC Manager: Elapsed = 04d,23:19:12, CPU = 04d,20:56:26. Is the wrong elapsed time bug (on preempted multi-day tasks) still showing up here? :-\

Indeed!! After resuming the task, the Elapsed (CPU) time jumped to "04d,23:19:18 (04d,20:56:26)", and after suspending it, back to "04d,21:24:36 (04d,20:56:26)"!!

The task's last messages in the log were:
20.05.2011 12:43:10 | climateprediction.net | [cpu_sched] Starting hadam3p_eu_2tpi_1960_1_007230948_1(resume)
20.05.2011 12:43:10 | climateprediction.net | [task] task_state=EXECUTING for hadam3p_eu_2tpi_1960_1_007230948_1 from start
20.05.2011 12:43:10 | climateprediction.net | Restarting task hadam3p_eu_2tpi_1960_1_007230948_1 using hadam3p_eu version 609
20.05.2011 12:43:17 | climateprediction.net | [task] result hadam3p_eu_2tpi_1960_1_007230948_1 checkpointed
20.05.2011 13:22:41 | climateprediction.net | [task] result hadam3p_eu_2tpi_1960_1_007230948_1 checkpointed
20.05.2011 14:04:30 | climateprediction.net | [task] result hadam3p_eu_2tpi_1960_1_007230948_1 checkpointed
20.05.2011 14:38:40 | climateprediction.net | [task] result hadam3p_eu_2tpi_1960_1_007230948_1 checkpointed
20.05.2011 14:38:40 | climateprediction.net | [cpu_sched] Preempting hadam3p_eu_2tpi_1960_1_007230948_1 (left in memory)
20.05.2011 14:38:40 | climateprediction.net | [task] task_state=SUSPENDED for hadam3p_eu_2tpi_1960_1_007230948_1 from suspend
Peter
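The numbers above can be checked quickly. A minimal sketch, assuming the "DDd,HH:MM:SS" display format quoted in this thread and the "DD.MM.YYYY HH:MM:SS" timestamp format of the log lines shown above; the helper names are hypothetical. It computes the gap between BOINC Manager's elapsed time and BT's value, and the length of the last run session in the log.

```python
import re
from datetime import datetime


def parse_bt_duration(text: str) -> int:
    """Parse 'DDd,HH:MM:SS' or 'HH:MM:SS' (format seen in this thread) into seconds."""
    match = re.fullmatch(r"(?:(\d+)d,)?(\d+):(\d{2}):(\d{2})", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    days, hours, minutes, secs = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + secs


def format_hms(seconds: int) -> str:
    """Format a non-negative number of seconds as H:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def log_time(line: str) -> datetime:
    """Read the 'DD.MM.YYYY HH:MM:SS' timestamp in front of the first ' | '."""
    return datetime.strptime(line.split(" | ", 1)[0], "%d.%m.%Y %H:%M:%S")


if __name__ == "__main__":
    # Elapsed time shown by BT while suspended vs. BOINC Manager's value.
    gap = parse_bt_duration("04d,23:19:12") - parse_bt_duration("04d,21:24:36")
    print("elapsed gap:", format_hms(gap))  # elapsed gap: 1:54:36

    # Length of the last run session, from the log lines above.
    start = log_time("20.05.2011 12:43:10 | climateprediction.net | [cpu_sched] "
                     "Starting hadam3p_eu_2tpi_1960_1_007230948_1(resume)")
    stop = log_time("20.05.2011 14:38:40 | climateprediction.net | [cpu_sched] "
                    "Preempting hadam3p_eu_2tpi_1960_1_007230948_1 (left in memory)")
    print("last session:", format_hms(int((stop - start).total_seconds())))  # last session: 1:55:30
```

With these inputs the gap (1:54:36) is close to, but not exactly, the length of the last session (1:55:30). That is consistent with BT showing the elapsed time from before the last session while the task is suspended, but it is not proof.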

Pepo

Quote from: Pepo on May 20, 2011, 01:24:09 PM
OTOH, the two very different CPU times (while both Elapsed times were equal) seemed weird to me, so I compared them with BOINC Manager: Elapsed = 04d,23:19:12, CPU = 04d,20:56:26. Is the wrong elapsed time bug (on preempted multi-day tasks) still showing up here? :-\

Indeed!! After resuming the task, the Elapsed (CPU) time jumped to "04d,23:19:18 (04d,20:56:26)", and after suspending it, back to "04d,21:24:36 (04d,20:56:26)"!!
Possibly related: an EVO 20.09 task set its progress to 100% almost immediately. Since then, the task's Elapsed time on the Tasks tab has stayed at just 00:00:02, although the task's Properties window shows a correct elapsed time. CPU time increments correctly, and the checkpoint time correctly remains 00:00:02 (and the Checkpoint column displays a correct difference value).

After suspending the task, both the CPU and checkpoint times are 00:00:02 and the elapsed time is 00:00:35. After resuming the task again, its elapsed time jumped back to 2 seconds, while the CPU time is correct.

Progress is still 100%.
Peter

Pepo

Quote from: Pepo on May 20, 2011, 01:48:06 PM
Possibly related: an EVO 20.09 task set its progress to 100% almost immediately. Since then, the task's Elapsed time on the Tasks tab has stayed at just 00:00:02, although the task's Properties window shows a correct elapsed time. CPU time increments correctly, and the checkpoint time correctly remains 00:00:02 (and the Checkpoint column displays a correct difference value).

After suspending the task, both the CPU and checkpoint times are 00:00:02 and the elapsed time is 00:00:35. After resuming the task again, its elapsed time jumped back to 2 seconds, while the CPU time is correct.

Progress is still 100%.
I assume it is related to the Progress% and the running state (does fast history fetching kick in again?): 3 minutes after I resumed the EVO task, BT's CPU usage jumped to 2/3 of a core and stayed there for hours, except for 3 minutes while the EVO task was temporarily suspended.
Peter

fred

Quote from: Pepo on May 20, 2011, 01:24:09 PM
A few minutes ago (around 16:40) I noticed a cryptic "20100-state" in History. I copied the uncolored line, but a few seconds later the line disappeared completely! :o When ??? do such lines appear there?
I changed the cleanup routine to run once every 5 minutes, and that is what removed it.
20100 = Waiting to run, and it shouldn't be there in the first place.
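fred's explanation can be pictured as a periodic filter over the history list. A minimal sketch, assuming history rows are dicts with a numeric "state" field; that representation and the function names are hypothetical, and only the code 20100 = "Waiting to run" and the 5-minute interval come from fred's post.

```python
# Hypothetical history rows: {"name": ..., "state": <numeric state code>}.
WAITING_TO_RUN = 20100          # per fred: "Waiting to run", not a history state
CLEANUP_INTERVAL_S = 5 * 60     # per fred: cleanup now runs once every 5 minutes


def should_run_cleanup(last_run_s: float, now_s: float) -> bool:
    """True once at least CLEANUP_INTERVAL_S seconds have passed since the last run."""
    return now_s - last_run_s >= CLEANUP_INTERVAL_S


def cleanup_history(rows: list[dict]) -> list[dict]:
    """Drop rows for tasks that are merely waiting to run."""
    return [row for row in rows if row["state"] != WAITING_TO_RUN]


if __name__ == "__main__":
    rows = [
        {"name": "hadam3p_eu_2tpi_1960_1_007230948_1", "state": WAITING_TO_RUN},
        {"name": "finished_task", "state": 0},  # 0: arbitrary placeholder code
    ]
    if should_run_cleanup(last_run_s=0.0, now_s=300.0):
        rows = cleanup_history(rows)
    print([row["name"] for row in rows])  # ['finished_task']
```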

fred

Quote from: Pepo on May 20, 2011, 04:07:16 PM
I assume it is related to the Progress% and the running state (does fast history fetching kick in again?): 3 minutes after I resumed the EVO task, BT's CPU usage jumped to 2/3 of a core and stayed there for hours, except for 3 minutes while the EVO task was temporarily suspended.
The progress comes from the BOINC client and is wrong....
The CPU usage comes from the BOINC client as well. It is the time difference over a couple of seconds.
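One reading of fred's second sentence (ours, not confirmed in the thread) is that a task's CPU% figure is derived from how much its CPU time grew between two samples taken a couple of seconds apart. Read this way, it describes the monitored task, not BoincTasks' own CPU consumption. A minimal sketch of that calculation, with a hypothetical function name and sample values:

```python
def cpu_percent(cpu_before_s: float, cpu_after_s: float,
                wall_before_s: float, wall_after_s: float) -> float:
    """Percent of one core used between two samples of a task's CPU time."""
    wall_elapsed = wall_after_s - wall_before_s
    if wall_elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    return 100.0 * (cpu_after_s - cpu_before_s) / wall_elapsed


if __name__ == "__main__":
    # Two samples 2 s apart; the task gained 1.5 s of CPU time in between.
    print(cpu_percent(100.0, 101.5, 0.0, 2.0))  # 75.0
```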

Pepo

Quote from: fred on May 20, 2011, 04:12:20 PM
Quote from: Pepo on May 20, 2011, 04:07:16 PM
I assume it is related to the Progress% and the running state (does fast history fetching kick in again?): 3 minutes after I resumed the EVO task, BT's CPU usage jumped to 2/3 of a core and stayed there for hours, except for 3 minutes while the EVO task was temporarily suspended.
The progress comes from the BOINC client and is wrong....
I believe that, and I'm sure the client gets it directly from EVO's wrapper (which behaves anything but correctly :))); BOINC Manager sees the same. But the times are wrong.

Quote
The CPU usage comes from the BOINC client as well. It is the time difference over a couple of seconds.
How so?? What time difference over a few seconds? BT simply consumes that much CPU time and is totally unresponsive (as if covered in glue, the same feeling as when GPU tasks block everything visible)... When I suspend the EVO 100% task, BT's CPU usage drops to 1-2% within a few seconds. When I resume the EVO task, BT's CPU usage goes up again within approx. 30 seconds and it becomes sluggish. And it suddenly does 4x more I/O.

One more weird problem: in this sluggish state, BT can suspend/resume any of the running or ready-to-run tasks, but sometimes it somehow cannot change the suspend/resume state of the not-yet-started (ready to start) tasks :o (I tried restarting it a couple of times, but no joy; BOINC Manager had to help.)
Peter

fred

I attached to the EVO project; let's see what problems I can reproduce.