History - Missed status

Started by Tim Norton, July 23, 2010, 03:12:40 PM

Tim Norton

Hi

With Milkyway, a lot of my CUDA WUs appear as "missed" in the History log on both 0.62 and 0.66, when in fact they were reported - I have checked on the Milkyway site.

It appears that for all of them (12), if the WU is completed and reported at exactly the same time, BoincTasks shows it with a "missed" status.

Also, just checking, it has done the same on three SETI WUs as well - I can't verify these, but I never get close to a WU running out of time.

Other work units where the two times differ are fine.

Is this a bug or am I missing something?

Cheers

Tim
Thanks

Tim

fred

Quote from: Tim Norton on July 23, 2010, 03:12:40 PM
Hi

With Milkyway, a lot of my CUDA WUs appear as "missed" in the History log on both 0.62 and 0.66, when in fact they were reported - I have checked on the Milkyway site.

It appears that for all of them (12), if the WU is completed and reported at exactly the same time, BoincTasks shows it with a "missed" status.

Also, just checking, it has done the same on three SETI WUs as well - I can't verify these, but I never get close to a WU running out of time.

Other work units where the two times differ are fine.

Is this a bug or am I missing something?

Cheers

Tim
Missed means not seen.
BT checks every 15 seconds if a task is in upload or ready to report.
Sometimes a task goes from running to upload to reported to gone in just a couple of seconds.
So all the information is there except the exit status.

Another thing that can happen is that it's a remote computer and BT wasn't running at the time the WU completed.
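
The effect described above can be reproduced with a small simulation. This is only a sketch, not BoincTasks code: the helper names and the example timings are assumptions, chosen to match the 15-second check and the "couple of seconds" window mentioned in this post.

```python
# Hypothetical model: a finished task is visible (in upload / ready to
# report) only during [visible_from, visible_until); afterwards it is gone.
def poll_times(first_poll, interval, horizon):
    """Return the instants (in seconds) at which the monitor checks."""
    times = []
    t = first_poll
    while t < horizon:
        times.append(t)
        t += interval
    return times


def history_status(visible_from, visible_until, polls):
    """Return "seen" if any check falls inside the visibility window,
    otherwise "missed" (the exit status was never observed)."""
    for t in polls:
        if visible_from <= t < visible_until:
            return "seen"
    return "missed"


polls = poll_times(first_poll=0, interval=15, horizon=120)

# Task finishes at t=32 s and is reported and gone 3 s later:
# the checks at 30 s and 45 s both miss it.
print(history_status(32, 35, polls))   # missed

# A task that stays visible for 20 s is caught by the 45 s check.
print(history_status(32, 52, polls))   # seen
```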

Tim Norton

Ah ok

it's the 15 second thing then

Thanks

Tim

fred

Quote from: Tim Norton on July 23, 2010, 04:30:09 PM
Ah ok
it's the 15 second thing then
Checking more often gives too much overhead.

You may try: http://www.efmer.eu/forum_tt/index.php?topic=415.0 and see if it catches the tasks.
I may couple these two at some point to get a better history capture.

Pepo

Quote from: fred on July 23, 2010, 03:45:06 PM
Quote from: Tim Norton on July 23, 2010, 03:12:40 PM
With Milkyway, a lot of my CUDA WUs appear as "missed" in the History log on both 0.62 and 0.66, when in fact they were reported - I have checked on the Milkyway site.

It appears that for all of them (12), if the WU is completed and reported at exactly the same time, BoincTasks shows it with a "missed" status.
Missed means not seen.
BT checks every 15 seconds if a task is in upload or ready to report.
Sometimes a task goes from running to upload to reported to gone in just a couple of seconds.
So all the information is there except the exit status.
Unfortunately exactly the same thing is happening to all of my QCN tasks - They are often reported maybe the same second they finish "waiting for a quake".

Sad that the BOINC devs, after being pushed hard for a long time, were willing to implement "Report Results Immediately", but not something more versatile with a "number of seconds delay" parameter. :( Yet another thing waiting on volunteering devs.
Peter

fred

Quote from: Pepo on July 23, 2010, 06:52:38 PM
Quote from: fred on July 23, 2010, 03:45:06 PM
Quote from: Tim Norton on July 23, 2010, 03:12:40 PM
With Milkyway, a lot of my CUDA WUs appear as "missed" in the History log on both 0.62 and 0.66, when in fact they were reported - I have checked on the Milkyway site.

It appears that for all of them (12), if the WU is completed and reported at exactly the same time, BoincTasks shows it with a "missed" status.
Missed means not seen.
BT checks every 15 seconds if a task is in upload or ready to report.
Sometimes a task goes from running to upload to reported to gone in just a couple of seconds.
So all the information is there except the exit status.
Unfortunately exactly the same thing is happening to all of my QCN tasks - They are often reported maybe the same second they finish "waiting for a quake".

Sad that the BOINC devs, after being pushed hard for a long time, were willing to implement "Report Results Immediately", but not something more versatile with a "number of seconds delay" parameter. :( Yet another thing waiting on volunteering devs.
Does the BoincMonitor see them?

Pepo

Quote from: fred on July 23, 2010, 07:29:38 PM
Quote from: Pepo on July 23, 2010, 06:52:38 PM
Sad that the BOINC devs, after being pushed hard for a long time, were willing to implement "Report Results Immediately", but not something more versatile with a "number of seconds delay" parameter. :( Yet another thing waiting on volunteering devs.
Does the BoincMonitor show these tasks?
:o ??? OK I can loosely imagine what you mean. BM can catch the moment when client_state.xml is written with result info. But, as you've already noted, it is speechless regarding remote machines.

I consider this a client bug - a missing option for a user to set up his/her system according to his/her needs.
Peter

fred

Quote from: Pepo on July 23, 2010, 10:13:09 PM
:o ??? OK I can loosely imagine what you mean. BM can catch the moment when client_state.xml is written with result info. But, as you've already noted, it is speechless regarding remote machines.

I consider this a client bug - a missing option for a user to set up his/her system according to his/her needs.
I will change the History capture a bit and make the timing more flexible.
When there is a running task close to the finish line, the history capture interval will be shorter.
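
A minimal sketch of this idea, not the actual BoincTasks implementation. The function name, the 60-second threshold and the 1-second fast interval are assumptions for illustration; only the 15-second normal interval is taken from earlier in this thread.

```python
NORMAL_INTERVAL = 15.0   # seconds, the regular history check
FAST_INTERVAL = 1.0      # seconds, assumed value for tasks about to finish
NEAR_FINISH = 60.0       # seconds of remaining time, assumed threshold


def capture_interval(remaining_times):
    """Pick the history capture interval from the estimated remaining
    time (in seconds) of every running task."""
    if any(r <= NEAR_FINISH for r in remaining_times):
        return FAST_INTERVAL
    return NORMAL_INTERVAL


print(capture_interval([3600.0, 900.0]))  # 15.0
print(capture_interval([3600.0, 20.0]))   # 1.0
print(capture_interval([]))               # 15.0
```

The faster checking only helps if the remaining-time estimate is reasonably accurate near the end; a task that finishes while its estimate still shows several minutes would be checked at the normal interval.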

Pepo

Quote from: fred on July 24, 2010, 09:38:13 AM
Quote from: Pepo on July 23, 2010, 10:13:09 PM
I consider this a client bug - a missing option for a user to set up his/her system according to his/her needs.
I will change the History capture a bit and make the timing more flexible.
When there is a running task close to the finish line, the history capture interval will be shorter.
Maybe worth a try. But still, this would be more effectively solved on the client's side.
Peter

Pepo

Quote from: Pepo on July 27, 2010, 08:14:50 AM
Quote from: fred on July 24, 2010, 09:38:13 AM
I will change the History capture a bit and make the timing more flexible.
When there is a running task close to the finish line, the history capture interval will be shorter.
Maybe worth a try. But still, this would be more effectively solved on the client's side.
I've just looked at the timings around finished tasks for QCN (the project whose reports I miss most often): after the task's process exits, it usually takes 3 (at most 5) seconds until the results are uploaded, the corresponding scheduler request (for a final trickle-up) is completed and the client gets 1 new task assigned. (Another 3 or more seconds later, the new task is downloaded and started.) A goodie for your attempt: QCN's Remaining time countdown goes exactly to zero, so BT would not have to speed up the history capture any sooner than 1-2 cycles before the end.

My next most often "missed" project is Open Rendering Environment with its often tiny Blender tasks. I could observe finish-to-report times of 6 seconds, but their runtimes are highly unpredictable (seconds to hours), and possibly the percentage and countdown too. Let's see...
Peter
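
As a rough worked example of why these tasks are usually missed: if a finished task stays visible for w seconds and the check runs every T seconds at an unrelated phase, the chance that a check lands inside the window is about min(1, w/T). The uniform-phase assumption and the helper name are illustrative only and do not come from BoincTasks.

```python
def catch_probability(window, interval):
    """Chance that a periodic check with a uniformly random phase
    falls inside a visibility window of the given length (seconds)."""
    return min(1.0, window / interval)


# Windows of 3, 5 and 6 seconds against a 15-second check.
for window in (3, 5, 6):
    print(window, round(catch_probability(window, 15), 2))
# 3 0.2
# 5 0.33
# 6 0.4
```

Under these assumptions a 3-second window is caught only about one time in five, which fits the observation that most such tasks end up as "missed".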

fred

Quote from: Pepo on July 28, 2010, 11:05:20 AM
Quote from: Pepo on July 27, 2010, 08:14:50 AM
Quote from: fred on July 24, 2010, 09:38:13 AM
I will change the History capture a bit and make the timing more flexible.
When there is a running task close to the finish line, the history capture interval will be shorter.
Maybe worth a try. But still, this would be more effectively solved on the client's side.
I've just looked at the timings around finished tasks for QCN (the project whose reports I miss most often): after the task's process exits, it usually takes 3 (at most 5) seconds until the results are uploaded, the corresponding scheduler request (for a final trickle-up) is completed and the client gets 1 new task assigned. (Another 3 or more seconds later, the new task is downloaded and started.) A goodie for your attempt: QCN's Remaining time countdown goes exactly to zero, so BT would not have to speed up the history capture any sooner than 1-2 cycles before the end.

My next most often "missed" project is Open Rendering Environment with its often tiny Blender tasks. I could observe finish-to-report times of 6 seconds, but their runtimes are highly unpredictable (seconds to hours), and possibly the percentage and countdown too. Let's see...
The new timing should take care of that. But the estimated time shouldn't be off by more than 50%, in the last minute that is.

Pepo

Quote from: fred on July 28, 2010, 11:53:25 AM
Quote from: fred on July 24, 2010, 09:38:13 AM
I will change the History capture a bit and make the timing more flexible.
When there is a running task close to the finish line, the history capture interval will be shorter.
The new timing should take care of that. But the estimated time shouldn't be off by more than 50%, in the last minute that is.
Is BT already capable of communicating with various hosts ( -> updating the history of some of them) using different count-down timers? (I guess it would be an implementation of one of the wishes.) For example, to be able to speed up communication with just one or two particular hosts whose tasks are approaching their end.
Peter

fred

Quote from: Pepo on July 28, 2010, 12:47:52 PM
Is BT already capable of communicating with various hosts ( -> updating the history of some of them) using different count-down timers? (I guess it would be an implementation of one of the wishes.) For example, to be able to speed up communication with just one or two particular hosts whose tasks are approaching their end.
A bit cryptic but I think it's yes. The variable history fetcher, is for all computers. Close to the deadline it will shorten the time up to 1 second.
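
One way to read this, sketched below with made-up host names and thresholds: each computer suggests an interval based on its own running tasks, and the single shared fetcher uses the shortest one. That a nearly finished task on one computer speeds up the check for every computer is an assumption drawn from "is for all computers", not confirmed behaviour, and the linear ramp is purely illustrative.

```python
def host_interval(remaining_times, normal=15.0, fastest=1.0, ramp=60.0):
    """Interval wanted by one host: `normal` while all tasks are far from
    finishing, shrinking linearly to `fastest` as the closest task
    approaches zero remaining time. The 60 s ramp is an assumption."""
    if not remaining_times:
        return normal
    closest = max(0.0, min(remaining_times))
    if closest >= ramp:
        return normal
    return fastest + (normal - fastest) * closest / ramp


def shared_interval(hosts):
    """Single interval used for all hosts: the shortest one requested."""
    return min((host_interval(r) for r in hosts.values()), default=15.0)


hosts = {
    "desktop": [5400.0, 1200.0],   # nothing close to finishing
    "laptop": [30.0, 800.0],       # one task 30 s from the end
}
print(shared_interval(hosts))                # 8.0
print(shared_interval({"desktop": [0.0]}))   # 1.0
```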

Pepo

Quote from: fred on July 28, 2010, 03:10:41 PM
Quote from: Pepo on July 28, 2010, 12:47:52 PM
Is BT already capable of communicating with various hosts ( -> updating the history of some of them) using different count-down timers? (I guess it would be an implementation of one of the wishes.) For example, to be able to speed up communication with just one or two particular hosts whose tasks are approaching their end.
A bit cryptic but I think it's yes. The variable history fetcher, is for all computers. Close to the deadline it will shorten the time up to 1 second.
Does "The variable history fetcher, is for all computers" mean it is common to all machines? I could imagine that with 10-20 connected machines, updating them all at once, BT would spend more time waiting than communicating.
Peter

fred

Quote from: Pepo on July 28, 2010, 03:49:01 PM
Quote from: fred on July 28, 2010, 03:10:41 PM
Quote from: Pepo on July 28, 2010, 12:47:52 PM
Is BT already capable of communicating with various hosts ( -> updating the history of some of them) using different count-down timers? (I guess it would be an implementation of one of the wishes.) For example, to be able to speed up communication with just one or two particular hosts whose tasks are approaching their end.
A bit cryptic but I think it's yes. The variable history fetcher, is for all computers. Close to the deadline it will shorten the time up to 1 second.
Does "The variable history fetcher, is for all computers" mean it is common to all machines? I could imagine that with 10-20 connected machines, updating them all at once, BT would spend more time waiting than communicating.
Let's see how it works out. The more computers, the more work and the more overhead. But it's multi-threaded, so the load is evenly balanced over the processor.
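
To illustrate why threading helps with many connected machines, here is a small sketch in which every host is queried from its own worker thread, so the waiting times overlap instead of adding up. It is not BoincTasks code: `fetch_history` is a hypothetical stand-in that only sleeps to imitate a network round trip, and the host names are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_history(host):
    """Stand-in for one round trip to a BOINC client on `host`.
    The sleep simulates waiting for the reply."""
    time.sleep(0.2)
    return host, "ok"


def fetch_all(hosts):
    """Query every host in its own worker thread so that the waits
    overlap instead of adding up."""
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return dict(pool.map(fetch_history, hosts))


hosts = [f"host{i:02d}" for i in range(10)]
start = time.perf_counter()
results = fetch_all(hosts)
elapsed = time.perf_counter() - start
print(len(results), "hosts answered in", round(elapsed, 1), "s")
```

Queried one after another, ten hosts with a 0.2-second round trip would need about 2 seconds per pass; with overlapping waits the pass should take roughly as long as the slowest single host.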