Messages - Joseph Stateson

#16
Sorry, I cut off the image.

All those tasks in red are CPU-only tasks, no GPU.  Most are now in the 20% range.
I have never seen CPU tasks that low.
CPU time percent is set to 100%.

Windows 11, CPU has 20 threads.

Just noticed something: I disabled GPU usage and all the CPU power levels are slowly increasing.
They went up and back down, varying between about 20% and 60%, but should be at 99%.
Going to swap the 3070 out for a 1660 and see if the problem disappears.
Will run enough CPU work units to see if the run time is really different depending on which graphics board is used.

Going to run some tests using Cosmology@Home and compare to other systems, then swap out the graphics board.

#17
The RTX-2080 was replaced under warranty with an RTX-3070 (Gigabyte did not have any more 2080s).
The system seems to be running well, but CPU usage for CPU-bound tasks shows values under 50% that were normally 99% using the GTX-1660 or RTX-2080.
I removed all processors from Windows 11 then rebooted, but that did not fix the problem.  The Windows 11 resource CPU plots indicate the CPU is used a lot more than 35%.
Wondering if my "free" RTX-3070 was a refurbished replacement with a problem, or maybe the problem is the PCI Express 3 motherboard with a PCI Express 4 graphics card.

#18
Thanks for the additional link.  I noticed that all of them suggested letting windows decide the size of the page or swap file.  That might be true for some users and systems.  I know that the last time I let windows manage my printers I had a large word document printing on my small label printer.

Another problem that I failed to mention: I wrote that BT is handling 7 systems and 82 apps, but actually there were over 600 tasks waiting to run that BT handles.  That is a lot of traffic with updates every few seconds.  World Community Grid does not feed the BOINC app like most other projects.  I just set the number of tasks from unlimited to 20 to cut down the queue size.  I assume a smaller list of tasks will mean faster updates from BT.  If 20 does not work then possibly setting the resource share to 0 will work.  WCG seems to never run out of work so I don't need a large queue of tasks waiting.
#19
I came here looking for why, over the last several months, my system has really slowed.  It is especially noticeable in BoincTasks "updates" but also affects other programs.  Using the Windows 10 resource manager I observed disk drive C at 100% most of the time, with a very rare drop to under 5%.

BT is handling 7 systems and 82 apps.  It was getting difficult to even scroll BT and I had to wait 20-30 seconds to see a response.  Sometimes the response was "Not Responding".  Disk usage was always showing 100% during these times.


I used the following for ideas that worked so well I had to come here to post about it.
https://www.drivereasy.com/knowledge/100-disk-usage-windows-10-fixed/

What I did and how it worked

1.  Saw that iCloud Photos was a big user, so I disallowed iCloud from any access to my files.  This brought the disk usage down and gave a slight improvement.


2.  Went to startup services and changed SysMain from automatic to manual.  This made a huge difference in startup: no long 100% for the first 3-4 minutes after rebooting or starting Windows.  I suspect that this app informs M$OFT what programs you run most of the time in addition to pre-loading them in memory.  I don't need to have Gridcoin Research, BOINC or BT preloaded, especially Gridcoin as it is huge.

3.  Changed system performance from "best looking" to "adjust for best performance".  This made a huge difference in BoincTasks.  Some time ago, after a feature update, I was asked if Windows could change my display settings to improve them.  I think this caused the shift from best performance to best looking.

4.  Set virtual memory to custom: minimum 4096 MB, maximum 32768 MB for the C drive only, nothing for the D drive.  I have 32 GB of RAM and the recommended size was 1.5 * 32 = 48 GB, but I went with 32 instead of 48.

It is as if I have a new computer again!!!!


Hope this helps someone.
#20
Questions / Re: Can't connect to BOINC client
March 21, 2020, 04:24:06 AM
Put the following into the cc_config.xml file at \programdata\boinc under "options"

<allow_remote_gui_rpc>1</allow_remote_gui_rpc>

suggest this:
<cc_config>
  <log_flags>
  </log_flags>
  <options>
   <use_all_gpus>1</use_all_gpus>
   <allow_remote_gui_rpc>1</allow_remote_gui_rpc>
  </options>
</cc_config>

Double check your password file for white space or unprintable characters.  If you delete the gui_rpc_auth.cfg file it is automatically re-created with a 32 char password when BOINC starts up.  Best is to use Notepad, delete the line and save an empty file.  Its length should be exactly "0" unless you want a password.
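
A minimal sketch of the "white space or unprintable character" check, written in Python. The function name is made up for illustration, and the sketch reports every suspicious byte, including a trailing newline; whether a trailing newline actually matters to the client is not something this post establishes.

```python
def password_problems(raw: bytes) -> list:
    """List suspicious bytes found in the raw contents of a password file.

    An empty file (length 0) returns an empty list, matching the
    "no password" case described above.
    """
    problems = []
    for pos, byte in enumerate(raw):
        ch = chr(byte)
        if ch.isspace():
            # spaces, tabs, carriage returns, newlines and similar
            problems.append(f"whitespace {ch!r} at offset {pos}")
        elif not (32 < byte < 127):
            # control characters and anything outside printable ASCII
            problems.append(f"unprintable byte 0x{byte:02x} at offset {pos}")
    return problems
```

To use it, read the file in binary mode (for example with `open(path, "rb").read()`, where `path` points at your gui_rpc_auth.cfg) and pass the bytes to `password_problems`.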
#21
Questions / Re: Can't connect to BOINC client
March 19, 2020, 01:37:50 PM
Windows or Linux client?

If Linux then at /etc/boinc-client you need to edit the file remote_hosts.cfg and add the name of the boinctasks system or its IP address.  This is not needed for windows.

If the client is on Windows, make sure there is only one client running and that the manager is not running.  Use Task Manager to verify.  Make sure that each system can "ping" the other using system names; if not, there is a network problem.

From the system running the client, from an admin command prompt do the following:
c:\Program Files\boinc>boinccmd --get_host_info

Do the same thing from the system running BoincTasks.  I assume it is also running BOINC (but not boincmgr).
c:\Program Files\boinc>boinccmd --host YOUR_REMOTE_SYSTEM --get_host_info

Installing boinctasks under windows should automatically ask to allow access through the firewall.

If each system can ping each other then try a telnet connection from the boinctasks system to the client

telnet YOUR_REMOTE_SYSTEM 31416

Pressing CTRL-C should generate an error message such as "<boinc_gui_rpc_reply>" plus other stuff.   If you do not see that message suspect firewall or network problem.  Use CTRL-] then "quit" to exit telnet.
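
If telnet is not installed, the same reachability test can be sketched in Python. This only checks that something accepts a TCP connection on the port; it does not speak the GUI RPC protocol or check the password. The function name is made up for illustration.

```python
import socket


def can_connect(host: str, port: int = 31416, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, or host name not resolved
        return False
```

Call it as `can_connect("YOUR_REMOTE_SYSTEM")` from the BoincTasks system. A result of False points at the firewall or network, the same as a failed telnet.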

HTH
#22
If you have several systems running the same app, the BoincTasks history reader can now estimate the total number of work units per day you can complete.  This takes into account idle time between completion of work units.  You need to have minimum of 24 hours of BT history.

You will need to know the average credit per work unit.  For some projects and apps, the amount is fixed.  For example, Einstein@Home's Gamma Ray Pulse Binary search #1 is always 3,465 credits.  Other projects require the average to be calculated.  That can be done using this web site

For example:  this url represents one of the current board leaders at SETI.  If you click on that url and then select "calculate" you will see an average of about 80 credits per work unit based on 20 work units.

Once you have the average credit per work unit you can estimate your total throughput by running the BT history reader and selecting all the histories for each system that is running the project (Einstein in the example below).  The BT reader will then show all the apps that all the systems are running.  You must then select only the apps for which you have the average credit.  As shown below, the apps for Gamma Ray Pulse Binary search #1 have been selected.  You can then click on "SAVE" to get a listing in Notepad of the number of work units per day.  That number can then be multiplied by the average credit per work unit.  As shown below, the estimated credits per day would be around 14,000,000.  Due to the way projects calculate the actual daily credit, it may take 2-3 weeks at 24/7 before that value shows up.
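
The arithmetic is just work units per day times average credit per work unit. A small Python sketch (function names are made up for illustration; this is not part of the history reader):

```python
def average_credit(credits):
    """Average credit per work unit from a list of per-task credits."""
    return sum(credits) / len(credits)


def estimated_credits_per_day(work_units_per_day, avg_credit_per_wu):
    """Estimated daily credit: work units per day times average credit."""
    return work_units_per_day * avg_credit_per_wu
```

Working backwards from the numbers above, a total of about 14,000,000 credits per day at 3,465 credits per work unit corresponds to roughly 4,040 work units per day across all systems.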



Executables are here (there is no install, just a zip file)
https://stateson.net/BTHistory/bthistory_64_32_bins.zip
All sources are at GitHub and require VS2017
https://github.com/JStateson/Gridcoin-BoincTask-HistoryReader
The above includes the web app "HostProjectStats" sources.
#23
Questions / Need clarification on interface messages
January 31, 2020, 09:05:48 PM
I ran into a problem when receiving temperatures when video boards from multiple manufacturers are being used.  This mainly affects my Linux program that is sending temperature information to BoincTasks for display as a TThrottle temp.

From a windows system running tthrottle, with one each NVidia and ATI, your BT debug log shows the following:

<TThrottle><HN:JYSArea51><PV 7.72><AC 0><TC 41><TG 65><NV 1><NA 1><DC 100><DG 100><CT0 36.1><CT1 38.5><CT2 37.2><CT3 36.4><CT4 36.0><CT5 40.8><CT6 36.1><CT7 36.3><CT8 36.3><CT9 39.2><GT0 41.0><GT1 65.0><RSPJI3$0q><AA0><SC85><SG83><XC100><MC2><TX><TThrottle>


The temperature of 41.0 was the NVidia, the "<NV 1>"
The temperature of 65.0 is the ATI, the "<NA 1>"
I did not see anything for intel: was expecting an "<NI 0>" or something like that.
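
For reference, here is a minimal Python sketch that pulls the counts and temperatures out of a message like the one above. It only looks at tags of the form `<NAME value>` and ignores everything else (`<HN:...>`, `<RS...>`, `<AA0>` and so on). The regular expression and function name are my own; this is an outside reading of the debug log, not a description of how BoincTasks parses it.

```python
import re

# Tags such as <PV 7.72>, <NV 1>, <CT0 36.1>, <GT1 65.0>
TAG = re.compile(r"<([A-Z]+)(\d*) ([^<>]+)>")

SAMPLE = (
    "<TThrottle><HN:JYSArea51><PV 7.72><AC 0><TC 41><TG 65><NV 1><NA 1>"
    "<DC 100><DG 100><CT0 36.1><CT1 38.5><CT2 37.2><CT3 36.4><CT4 36.0>"
    "<CT5 40.8><CT6 36.1><CT7 36.3><CT8 36.3><CT9 39.2><GT0 41.0><GT1 65.0>"
    "<RSPJI3$0q><AA0><SC85><SG83><XC100><MC2><TX><TThrottle>"
)


def parse_tthrottle(message):
    """Split a TThrottle message into plain fields, CPU temps and GPU temps."""
    fields = {}
    temps = {"CT": {}, "GT": {}}
    for name, index, value in TAG.findall(message):
        if name in temps and index:
            temps[name][int(index)] = float(value)   # CT0.., GT0..
        else:
            fields[name + index] = value             # PV, NV, NA, TC, TG, ...
    return {
        "fields": fields,
        "cpu_temps": [temps["CT"][i] for i in sorted(temps["CT"])],
        "gpu_temps": [temps["GT"][i] for i in sorted(temps["GT"])],
    }
```

With the sample message this gives NV = 1, NA = 1 and GPU temps of 41.0 and 65.0, in GT0, GT1 order. It cannot say which GT entry belongs to which vendor; that is exactly the open question in this post.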

If my guess is correct, then if there are 6 NVidia and 3 ATI there should be 9 values: <GT 0>...<GT 8>
All preceded by <NV 6><NA 3>
However, that is just a guess as I was unable to observe multiple ATI temps on systems with an NVidia board.
I then looked at a Windows system that had an Intel GPU in addition to 6 ATI GPUs.

<TThrottle><HN:s9x00><PV 7.72><AC 0><TC 57><TG 59><NV 0><NA 6><DC 100><DG 100><CT0 58.3><CT1 59.6><CT2 58.0><CT3 58.9><GT0 52.0><GT1 59.0><GT2 59.0><GT3 59.0><GT4 59.0><GT5 59.0><RSSh)b+1m><AA0><SC79><SG97><XC100><MC2><TX><TThrottle>


The Intel temperature displayed by BoincTasks is 59.0 degrees, from looking at the display.  I am guessing that value came from one of GT1...GT5 since they are all 59.0.  I am also guessing that, since the Intel GPU is incorporated in the CPU, the temperature should be closer to one of CT0..CT3.  The last 5 video boards are all identical and all run identical work units, so it is no surprise that 5 of the 6 are exactly 59.0.

(1)  Question:  is the 59.0 displayed by BT from the CPU temps?  If so, then that is correct for embedded Intel HD graphics.
However, CT0 shows 58.3, not 59, and I suspect that the Intel temp comes from your <TG 59>, the maximum temp.  The Intel temp was associated with project Collatz, which supports Intel and is labeled as "1INT".  The other projects were Milkyway and d0..d5 of "(ATIs)".

This brings me to the second problem:
(2)  What should my Linux program send to BT to show temperatures when there are multiple NVidia, ATI and maybe a single Intel?  BOINC numbers coprocessors D0..Dn-1 for n NVidia and the same for AMD: D0..Dn-1.  I don't know of any Intel co-processor boards that are GPUs, so AFAICT there is only 1 Intel possible.

Currently, if a mixture of NVidia and ATI then I only bother to report the coprocessors that have the bigger count, as I do not know how to format the message to BT to properly identify the coprocessors.

The following shows temperatures from Linux systems running NVidia plus one Intel GPU task.  The WUProp task is displayed as it allows me to check the CPU temperatures.




Both systems run Ubuntu 18.04 as shown here
https://einsteinathome.org/host/12783910

Since BOINC does not keep track of the actual board name, nor does it use the same D0..Dn-1 numbering as the Linux kernel, I had to come up with a translation table to display the correct temps adjacent to the actual D0..Dn-1 boards.

For the TB85 mining rig and NVidia only:

<devmap>
<Num_GPUs>6</Num_GPUs>
<1>0 5 01:00.0 NV GTX-1070</1>
<2>1 0 02:00.0 NV GTX-1660-Ti</2>
<3>2 1 03:00.0 NV P102-100</3>
<4>3 2 04:00.0 NV P102-100</4>
<5>4 3 05:00.0 NV P102-100</5>
<6>5 4 06:00.0 NV GTX-1070-Ti</6>
</devmap>


For the BTC110

<devmap>
<Num_GPUs>9</Num_GPUs>
<1>0 1 01:00.0 NV GTX-1060-6GB</1>
<2>1 3 02:00.0 NV GTX-1060-3GB</2>
<3>2 4 03:00.0 NV GTX-1060-3GB</3>
<4>3 2 04:00.0 NV P106-100</4>
<5>4 0 05:00.0 NV GTX-1070</5>
<6>5 8 08:00.0 NV P106-090</6>
<7>6 5 0A:00.0 NV GTX-1060-3GB</7>
<8>7 6 0B:00.0 NV GTX-1060-3GB</8>
<9>8 7 0E:00.0 NV GTX-1060-3GB</9>
</devmap>
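
For anyone who wants to read these tables from a script, here is a minimal Python sketch. Note that tags like `<1>` are not legal XML element names, so a regular expression is used instead of an XML parser. The post does not spell out which of the two numeric columns is the BOINC device number and which is the kernel's, so the sketch uses the neutral, made-up names `index_a` and `index_b`.

```python
import re

# One row: <n>index_a index_b bus_id vendor model</n>
ENTRY = re.compile(r"<(\d+)>(\d+) (\d+) (\S+) (\S+) (\S+)</\1>")
COUNT = re.compile(r"<Num_GPUs>(\d+)</Num_GPUs>")

TB85 = """<devmap>
<Num_GPUs>6</Num_GPUs>
<1>0 5 01:00.0 NV GTX-1070</1>
<2>1 0 02:00.0 NV GTX-1660-Ti</2>
<3>2 1 03:00.0 NV P102-100</3>
<4>3 2 04:00.0 NV P102-100</4>
<5>4 3 05:00.0 NV P102-100</5>
<6>5 4 06:00.0 NV GTX-1070-Ti</6>
</devmap>"""


def parse_devmap(text):
    """Return one dict per row and check the row count against Num_GPUs."""
    declared = int(COUNT.search(text).group(1))
    rows = [
        {
            "entry": int(m.group(1)),
            "index_a": int(m.group(2)),   # first numeric column
            "index_b": int(m.group(3)),   # second numeric column
            "bus_id": m.group(4),
            "vendor": m.group(5),
            "model": m.group(6),
        }
        for m in ENTRY.finditer(text)
    ]
    if len(rows) != declared:
        raise ValueError(f"Num_GPUs says {declared} but found {len(rows)} rows")
    return rows
```

`parse_devmap(TB85)` returns six rows, the first being the GTX-1070 at bus id 01:00.0.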
#24
Wish List / Re: Remove projects not initializing
December 20, 2019, 01:50:41 PM
There could be a possible problem here but I have been unable to duplicate it.

Have noticed that when attaching a project a few seconds go by before the project becomes responsive.

For example:  On two of my systems the CPU is not capable of running CPU tasks as it is too busy with the video boards and their OpenCL.  If I try to add another project, I have to click repeatedly on the "initializing project" and when the project becomes responsive I select no-new-tasks.  Once that takes effect I log onto the project, locate my "new" system and set the venue to prevent CPU tasks.  There may be a way to avoid this but it is easier this way.  Unfortunately, if I do not react quickly enough, one or two CPU tasks sneak in before the NNT takes effect and have to be aborted.

Thinking about this, it seems logical that if the project never initializes one would never get control. However, Boinctasks does not actually "attach" the project.  It sends a message to the boinc client and asks it to attach.  Sometimes it attaches in a few seconds, other times it can take a few minutes.  It could be you didn't wait long enough.  If the client (boinc) has a problem initializing the project it may not respond to the manager.

RR: I tried to duplicate the problem but did not find QNC listed. The only project that started with Q was quake catcher.  Is there another name this project goes under?

Dave:  I put in goofygrid thinking that garbage-in garbage-out might hang up BT, but all I got was an error message and the fake project never got initialized.  If you meant gpugrid then I had no problem attaching.

On Tuesdays, SETI is regularly down for maintenance.  I once tried to attach with a new Linux system and had to wait a few minutes before BT recovered but that was understandable. 
#25
Wish List / Re: BoincTasks Stealing Window Focus
December 18, 2019, 03:55:59 PM
I have been running 1.80 24/7 since it came out and have not seen anything like you mentioned.  Maybe there is some setting you have enabled that is causing it.

The only time I have ever lost focus to BoincTasks was when one of my "rules" got triggered and sent a text message to my phone that the temperature went too high on one of my systems.  The focus problem was obvious as the rules log popped up on the display and I found myself typing "into it".  I stopped that by not allowing the rules log to be enabled when activating a BoincTasks rule.  That log was only good for diagnostics anyway.

Exactly what is the symptom you are seeing?

#26
Occasionally a GPU gets hung and never finishes a job, or it can reject a job within seconds of receiving it.  These events are quickly discovered using the rules mechanism.  Currently, a batch file can be executed and an email or text message can easily be sent.  However, it would be advantageous to the project and the user, to be able to handle the situation automatically.  This can only be implemented if identifying parameters can be passed from BoincTasks to the handler.  At a minimum, the following parameters might be needed

$temp---------temperature of the device assuming tthrottle running or "none"
$device-------device id of GPU (D0, D1, etc) or just "CPU" if not a co-processor
$ip_address---need to know which system has problem
$port---------if needed to communicate with client and some systems have multiple clients
$password-----if needed to communicate with the client
$rule_name----the name of the rule could have an identifying phrase useful to the handler
$computer-----name of the system
$platform-----handler might need to know which OS: Linux, mac, windows
$project------name of project would be useful to handler
$app----------name of app
$rule_count---number of times rule has been applied

Example of rule usage

if Elapsed time > 5 minutes,  project "SETI@home",  app "8.01 setiathome_v8 (cuda90)", run program:
d:\ProgramData\boinc\scripts\HandleRule.bat $rule_name $ $ip_address $device

With these additions, more useful rules can be contributed as well as 3rd party scripts or apps such as resetting the GPU, excluding it from use by the Boinc client, or shutting down the client or system.
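
To make the proposal concrete, here is a rough Python sketch of what the receiving end of such a handler might look like. Everything here is hypothetical: BoincTasks does not currently pass these parameters, and the argument order (rule name, IP address, device) is just one possible choice. The batch file would forward its arguments to a script like this.

```python
import re


def parse_rule_args(argv):
    """Interpret the proposed handler arguments: rule_name ip_address device.

    device is "D0", "D1", ... for a GPU or "CPU" if not a co-processor,
    as in the parameter list above.
    """
    if len(argv) != 3:
        raise ValueError("expected: rule_name ip_address device")
    rule_name, ip_address, device = argv
    match = re.fullmatch(r"D(\d+)", device)
    return {
        "rule_name": rule_name,
        "ip_address": ip_address,
        "device": device,
        # None means the rule fired for a CPU task
        "gpu_index": int(match.group(1)) if match else None,
    }
```

A real handler would call this with `sys.argv[1:]` and then decide what to do (reset the GPU, exclude it, shut down the client) based on `gpu_index` and `ip_address`.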

There is a discussion back in January 2019 by BOINC principals here where they are considering adding XML files that basically duplicate a few of the BoincTasks rules.  Their XML includes, for example, instructions to enable or disable a particular NVidia board.
This functionality is partially present in BoincTasks but is missing the parameters required to identify the device and system having the problem.  Even if their "Computing prefs 2.0" is implemented it would require those XML files to be present on each system.

The device_id can be 0, 1, 2 etc. for each type of GPU, so it must include a type such as nvidia, intel, amd, etc.
It needs to be consistent with the naming used by exclude_gpu, which appears to be
  [<type>NVIDIA|ATI|intel_gpu</type>]
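
As an illustration of why the type has to travel with the device number, here is a Python sketch that builds an exclude_gpu fragment from a type plus a device number. The three type names come from the line above; the `<url>` and `<device_num>` element names are taken from the BOINC client configuration documentation rather than from this post, so check them there before relying on this. The function name and the example URL are made up.

```python
import xml.etree.ElementTree as ET

GPU_TYPES = ("NVIDIA", "ATI", "intel_gpu")


def exclude_gpu_xml(project_url, gpu_type, device_num):
    """Build an <exclude_gpu> fragment for one device of one GPU type."""
    if gpu_type not in GPU_TYPES:
        raise ValueError(f"type must be one of {GPU_TYPES}, got {gpu_type!r}")
    root = ET.Element("exclude_gpu")
    ET.SubElement(root, "url").text = project_url
    ET.SubElement(root, "device_num").text = str(device_num)
    ET.SubElement(root, "type").text = gpu_type
    return ET.tostring(root, encoding="unicode")
```

Device 1 alone is ambiguous on a mixed system; ("NVIDIA", 1) and ("ATI", 1) produce two different fragments.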
#27
Right click on a computer in the "All Computers" tree brings up a select list of apps to run.

Apps would be on a tab similar to "Extra" -> "BoincTask settings" -> "Messages"

Buttons such as ADD, DEL, TEST etc.

Example of what it might look like.  Instead of "Project" and "Message":
  "Name"                    "Command"
PuttyLinux            "C:\Program Files\PuTTY\putty.exe" username@$(IP_ADDRESS) -pw password
IssueMWUpdate          "D:\RUN_MW_RPC_APP.BAT" $(IP_ADDRESS) $(PORT) $(PASSWORD)

etc


The names would show up in the dropdown box
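
The substitution itself would be simple. A Python sketch of expanding the `$(NAME)` placeholders for the selected computer (the placeholder syntax is the one proposed above; the function name and the sample values are made up):

```python
import re

PLACEHOLDER = re.compile(r"\$\(([A-Z_]+)\)")


def expand_command(template, values):
    """Replace each $(NAME) in template with values["NAME"]."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder $({name})")
        return values[name]
    return PLACEHOLDER.sub(replace, template)
```

For example, expanding `"D:\RUN_MW_RPC_APP.BAT" $(IP_ADDRESS) $(PORT) $(PASSWORD)` with the values for the computer that was right clicked gives the exact command line to run. An unknown placeholder raises an error instead of being passed through silently.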
#28
Questions / need help making up a rule
August 24, 2019, 11:08:24 AM
This is a hardware or software problem but it would be nice if I could spot the problem when it first occurs.  Going to post over at BOINC also as possibly the problem could be debugged better if I knew more about what was happening.

---once every couple of days----

On a 5 GPU rig, one of the GPUs crunches for 4-5 seconds then goes on to another work unit.  A queue of "waiting to run" starts building up.  Because there are 4 other working GPUs, they pull from this queue so the queue grows only slowly.  After about an hour or two there might be 40 items in the queue.

sudo /etc/init.d/boinc-client restart  => does not always work
sudo shutdown now => looks like it works but I generally cycle the power after a few minutes of waiting

When the system boots back up I run a script to set the fans to 100%, else temps get up past 80 for a pair of GTX 1060s.

I failed to make a note of which GPU had the problem, if indeed the problem is a single GPU.  The only way to tell is to stop the fan and see which one reports 0 speed, then look up the bus id and see which GPU it matches in coproc-info.xml.  Have not done this yet but will the next time this happens.  It would be nice if BOINC reported the same GPU# that NVidia reports in their diagnostics.  BOINC assigns 0 to the best (like a 1070 or RTX 2080) and larger numbers to weaker GPUs.  Not sure why they bother to rank GPUs in the first place.

---back to the rule---

The most obvious thing is to see if there are more than X items in the "waiting to run" queue and then run a script that sends me a text message.  I already have a script that does that, but there is no "waiting to run" condition to trigger it, and I am pretty sure the %cpu was 99 percent so I can't use that as a trigger.  However, the CPU% is always 99 because I need to run "-nobs" to force the system to dedicate a thread 100% to the GPU.  So possibly the CPU is really idle and the 99 is simply a "busy polling all the time" symptom, which is a feature of the "-nobs" parameter.
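
One way around the missing trigger would be to count the queue outside of BoincTasks. The Python sketch below counts tasks in the text output of `boinccmd --get_tasks`. It assumes that each task prints a line like `scheduler state: preempted` and that "waiting to run" in the manager corresponds to that state; both are assumptions that should be checked against the actual output of your client version. The function names are made up.

```python
import re
import subprocess


def count_tasks_with(output, field, value):
    """Count lines of the form 'field: value' (case-insensitive)."""
    pattern = re.compile(
        rf"^\s*{re.escape(field)}\s*:\s*{re.escape(value)}\s*$",
        re.IGNORECASE | re.MULTILINE,
    )
    return len(pattern.findall(output))


def get_tasks_output(host):
    """Run boinccmd against a remote client and return its text output."""
    result = subprocess.run(
        ["boinccmd", "--host", host, "--get_tasks"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def queue_too_long(output, limit):
    """True if more than `limit` tasks appear to be waiting to run."""
    # Assumption: "waiting to run" shows up as scheduler state "preempted".
    return count_tasks_with(output, "scheduler state", "preempted") > limit
```

A cron job or scheduled task could call `queue_too_long(get_tasks_output("YOUR_REMOTE_SYSTEM"), 10)` every few minutes and run the existing text message script when it returns True.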
#29
Questions / Re: Rule not continuous?
August 23, 2019, 06:37:10 PM
Yes, I think your observation is correct.  I have a similar problem with  the SETI NoCal app:  It seems to get stuck.  Instead of finishing in 14 - 15 minutes it drags on for hours until it times out.  I use a rule to suspend the task after 24 minutes.  The following is the log:

23 August 2019 - 13:04:15 Rule(s) ---- Active: 1
23 August 2019 - 13:04:15 Rule: StopNocal ---- rx560, SETI@home, 8.22 SETI@home v8 (opencl_ati5_SoG_nocal),  | Elapsed Time > 00d,00:24:00
23 August 2019 - 13:04:15 ============================================================================== ----
23 August 2019 - 13:05:16 Rule: StopNocal, trigger ---- rx560, SETI@home, SETI@home v8, (Elapsed >00d,03:26:24),
23 August 2019 - 13:05:16 Rule: StopNocal, from que ---- Suspend task
23 August 2019 - 13:05:17 Rule: StopNocal ---- Activated: OK, Project: SETI@home, rx560, Suspend task
23 August 2019 - 13:07:19 Rule: StopNocal ---- No longer active: rx560, SETI@home, SETI@home v8, (Elapsed >00d,03:26:24)


Note that the rule is NO LONGER ACTIVE.  That is because the task still exists and is beyond the time limit.  The problem is that the task needs to be aborted.  I don't see an easy way to do that, especially since the task is on a remote system.  I will look at this problem but I suspect it is not an easy fix, as a simple re-activate will just try to suspend the same task again.  Just a guess.  Possibly Fred could add "Abort" as a rule option.

[EDIT] This may not apply to your case, but I discovered that if I resume the task it finishes within a few minutes.  Obviously there is a problem of some type, hardware most likely.   For this unique case I can write an additional rule that if a task is suspended for x minutes it could be resumed.  However, I do not see a "resume task" option. I think an abort option is best.
#30
I added a feature to compare GPU boards.  GRC miners frequently have a mining rack with 6 or more GPUs.

As shown below, there are 10 assorted NVidia boards recognized by BOINC.  The system TB85-nvidia is running the Linux app "cuda90" and shows 9351 work units completed successfully.  The "Type Analysis" shows the GPU option; the last 5.27 hours were analyzed and organized by GPU#.  Note that the GTX 1070 Ti had the best performance, with the GTX 1060s slower.  The display is elapsed time in minutes.
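
The per-GPU comparison boils down to grouping completed work units by GPU# and averaging the elapsed time. A Python sketch of that idea (the actual app is C#; this function and its input format are made up for illustration and are not taken from the app's source):

```python
from collections import defaultdict
from statistics import mean


def mean_elapsed_by_gpu(records):
    """Average elapsed minutes per GPU number.

    records is an iterable of (gpu_number, elapsed_minutes) pairs,
    one pair per completed work unit.
    """
    by_gpu = defaultdict(list)
    for gpu, minutes in records:
        by_gpu[gpu].append(minutes)
    return {gpu: mean(times) for gpu, times in sorted(by_gpu.items())}
```

A lower average means a faster board for that app, which is how the table above should be read.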

Source to build the app (windows c#) and links to the executables as well as instructions for running the program are in the Boinctasks History Analyzer & Project performance post.  Feel free to email or PM me any questions, bugs, suggestions, etc.  I assume you can also post here.  I put a zip file with 32 and 64 bit executables here  This app uses dotnet framework 4.6.1 but I assume anything later could also be installed to allow it to run.



The following shows how many credits were earned in the last 1.2 days.  Note that the system needs to be continuously mining, in addition to BoincTasks running constantly, during the 1.2 days.




Added feature:  can plot wall time -vs- elapsed time to see changes in GPU performance.




All, or individual, GPUs can be scatter graphed to show differences in elapsed time.







In addition, one can offset each GPU to see if there are differences in the processing of the data.
For example, the first graph shows a 5 GPU system processing Milkyway datasets.  Once offset, it is
obvious that there are two different types of datasets, some of which take longer to get the same
credits.