Need clarification on interface messages

Started by JStateson, January 31, 2020, 09:05:48 PM


JStateson

I ran into a problem receiving temperatures when video boards from more than one manufacturer are in use.  This mainly affects my Linux program, which sends temperature information to BoincTasks for display as a TThrottle temp.

From a Windows system running TThrottle, with one NVidia and one ATI board, your BT debug log shows the following:

<TThrottle><HN:JYSArea51><PV 7.72><AC 0><TC 41><TG 65><NV 1><NA 1><DC 100><DG 100><CT0 36.1><CT1 38.5><CT2 37.2><CT3 36.4><CT4 36.0><CT5 40.8><CT6 36.1><CT7 36.3><CT8 36.3><CT9 39.2><GT0 41.0><GT1 65.0><RSPJI3$0q><AA0><SC85><SG83><XC100><MC2><TX><TThrottle>


The temperature of 41.0 was the NVidia, the "<NV 1>"
The temperature of 65.0 is the ATI, the "<NA 1>"
I did not see anything for Intel: I was expecting an "<NI 0>" or something like that.

If my guess is correct, then with 6 NVidia and 3 ATI there should be 9 values: <GT 0>...<GT 8>
All preceded by <NV 6><NA 3>
However, that is just a guess, as I was unable to observe multiple ATI temps on systems that also have an NVidia board.
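
To make the guess concrete, here is a minimal Python sketch that pulls the counts and temperatures out of a string like the one logged above. The function names are hypothetical, the sample string is abridged from the log, and the split into NVidia and ATI values assumes the NVidia entries come first in the GT list, which is exactly the part I have not been able to confirm.

```python
import re

# Abridged from the BT debug log quoted above (most CT fields omitted).
SAMPLE = ("<TThrottle><HN:JYSArea51><PV 7.72><AC 0><TC 41><TG 65>"
          "<NV 1><NA 1><DC 100><DG 100><CT0 36.1><CT1 38.5>"
          "<GT0 41.0><GT1 65.0><AA0><SC85><SG83><XC100><MC2><TX><TThrottle>")


def parse_tthrottle(message):
    """Extract GPU counts and temperature lists from a TThrottle string."""
    def count(tag):
        # Count fields look like <NV 1>; a missing field is treated as 0.
        m = re.search(r"<%s (\d+)>" % tag, message)
        return int(m.group(1)) if m else 0

    def temps(prefix):
        # Temperature fields look like <GT0 41.0>; sort by their index.
        found = re.findall(r"<%s(\d+) (-?\d+(?:\.\d+)?)>" % prefix, message)
        return [float(v) for _, v in sorted(found, key=lambda f: int(f[0]))]

    return {"NV": count("NV"), "NA": count("NA"),
            "CT": temps("CT"), "GT": temps("GT")}


def split_gpu_temps(parsed):
    """Split GT values into (nvidia, ati).

    ASSUMPTION: NVidia values come first, then ATI. This is a guess,
    not confirmed behaviour of TThrottle or BoincTasks.
    """
    nv = parsed["NV"]
    gt = parsed["GT"]
    return gt[:nv], gt[nv:nv + parsed["NA"]]


if __name__ == "__main__":
    print(split_gpu_temps(parse_tthrottle(SAMPLE)))
```

For the one NVidia plus one ATI system this matches what I see (41.0 for the NVidia, 65.0 for the ATI), but a single board of each kind cannot prove the ordering for larger counts.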
I then looked at a windows system that had an Intel GPU in addition to 6 ATI GPUs.

<TThrottle><HN:s9x00><PV 7.72><AC 0><TC 57><TG 59><NV 0><NA 6><DC 100><DG 100><CT0 58.3><CT1 59.6><CT2 58.0><CT3 58.9><GT0 52.0><GT1 59.0><GT2 59.0><GT3 59.0><GT4 59.0><GT5 59.0><RSSh)b+1m><AA0><SC79><SG97><XC100><MC2><TX><TThrottle>


Looking at the display, the Intel temperature shown by BoincTasks is 59.0 degrees.  I am guessing that value came from one of GT1...GT5, since they are all 59.0.  However, since the Intel GPU is incorporated in the CPU, I would expect its temperature to be closer to one of CT0..CT3.  The last 5 video boards are all identical and all run identical work units, so it is no surprise that 5 of the 6 read exactly 59.0.

(1)  Question:  is the 59.0 displayed by BT taken from the CPU temps?  If so, then that is correct for embedded Intel HD graphics.
However, CT0 shows 58.3, not 59, and I suspect that the Intel temp comes from your <TG 59>, the maximum temp.  The Intel temp was associated with project Collatz, which supports Intel and is labeled as "1INT".  The other projects were Milkyway and d0..d5 of "(ATIs)".

This brings me to the second problem:
(2) What should my Linux program send to BT to show temperatures when there are multiple NVidia, ATI and maybe a single Intel?  BOINC numbers coprocessors D0..Dn-1 for n NVidia and the same for AMD: D0..Dn-1.  I don't know of any Intel co-processor boards that are GPUs, so AFAICT there is only 1 Intel possible.

Currently, if there is a mixture of NVidia and ATI, I only report the coprocessors of whichever type has the bigger count, as I do not know how to format the message to BT so that it properly identifies the coprocessors.
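
For discussion, this is a rough Python sketch of what my program could send for a mixed system if the guess above holds. The function name is hypothetical, only the NV, NA and GT fields are built (the rest of the message is left out), and the NVidia-first ordering is an assumption I need confirmed. Nothing is emitted for Intel, since I have not seen any field for it.

```python
def build_gpu_fields(nvidia_temps, ati_temps):
    """Return (count_fields, gt_fields) for a mixed NVidia/ATI host.

    ASSUMPTION: GT indices run over all NVidia boards first and then
    all ATI boards. This ordering is a guess, not documented behaviour.
    """
    count_fields = "<NV %d><NA %d>" % (len(nvidia_temps), len(ati_temps))
    combined = list(nvidia_temps) + list(ati_temps)
    # One <GTn t> field per board, formatted like the Windows log (one decimal).
    gt_fields = "".join("<GT%d %.1f>" % (i, t) for i, t in enumerate(combined))
    return count_fields, gt_fields


if __name__ == "__main__":
    print(build_gpu_fields([41.0], [65.0]))
```

With one board of each type this reproduces the <NV 1><NA 1> and <GT0 41.0><GT1 65.0> fields from the Windows log above.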

The following shows temperatures from Linux systems running NVidia tasks plus one Intel GPU task.  The wuprop task is displayed because it allows me to check the CPU temperatures.




Both systems run Ubuntu 18.04 as shown here
https://einsteinathome.org/host/12783910

Since BOINC does not keep track of the actual board name, nor does it use the same D0..Dn-1 numbering as the Linux kernel, I had to come up with a translation table to display the correct temps next to the actual D0..Dn-1 boards.

For the TB85 mining rig and NVidia only:

<devmap>
<Num_GPUs>6</Num_GPUs>
<1>0 5 01:00.0 NV GTX-1070</1>
<2>1 0 02:00.0 NV GTX-1660-Ti</2>
<3>2 1 03:00.0 NV P102-100</3>
<4>3 2 04:00.0 NV P102-100</4>
<5>4 3 05:00.0 NV P102-100</5>
<6>5 4 06:00.0 NV GTX-1070-Ti</6>
</devmap>


For the BTC110

<devmap>
<Num_GPUs>9</Num_GPUs>
<1>0 1 01:00.0 NV GTX-1060-6GB</1>
<2>1 3 02:00.0 NV GTX-1060-3GB</2>
<3>2 4 03:00.0 NV GTX-1060-3GB</3>
<4>3 2 04:00.0 NV P106-100</4>
<5>4 0 05:00.0 NV GTX-1070</5>
<6>5 8 08:00.0 NV P106-090</6>
<7>6 5 0A:00.0 NV GTX-1060-3GB</7>
<8>7 6 0B:00.0 NV GTX-1060-3GB</8>
<9>8 7 0E:00.0 NV GTX-1060-3GB</9>
</devmap>
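
A short Python sketch of how a table like the ones above can be read and used to reorder temperatures. The names here are hypothetical, and the two index columns are simply called "first" and "second" because the table itself does not label which one is the BOINC number and which is the kernel number. A regular expression is used instead of an XML parser, since tags such as <1> are not valid XML element names.

```python
import re
from collections import namedtuple

# Field names are illustrative; the table does not label its columns.
DevEntry = namedtuple("DevEntry", "slot first second pci vendor name")

# Matches rows such as: <1>0 5 01:00.0 NV GTX-1070</1>
ENTRY_RE = re.compile(r"<(\d+)>(\d+) (\d+) (\S+) (\S+) (.+?)</\1>")


def parse_devmap(text):
    """Return one DevEntry per numbered row; <Num_GPUs> is ignored."""
    entries = []
    for m in ENTRY_RE.finditer(text):
        slot, first, second, pci, vendor, name = m.groups()
        entries.append(
            DevEntry(int(slot), int(first), int(second), pci, vendor, name))
    return entries


def reorder_temps(entries, temps_by_first):
    """Given temps indexed by the first column, return them indexed by the second."""
    out = [None] * len(entries)
    for e in entries:
        out[e.second] = temps_by_first[e.first]
    return out
```

If the temperatures are collected in the order of the other column, the lookup is simply done in the opposite direction.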

fred

I won't have time to look into this for a while.
What I do know: the BOINC client has a list of the GPUs it uses, with their IDs, so you should be able to match the device number BoincTasks uses.