Show posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - JStateson

#61
Wish List / Re: BT 0.62 - Rules / Temperature
July 01, 2010, 05:12:56 PM
Thanks - I was not aware that history had to be enabled.  Looks like it is working.  The rules log shows the rule I just added and the gadget has stuff in it :-)
#62
Wish List / Re: BT 0.62 - Rules / Temperature
July 01, 2010, 02:58:41 PM
I tried all you suggested but still no-go. I am posting links to the images as they take up too much space in the display box.  Rules were created using the add-rule mechanism you suggested.

(1) No computers are shown in the .63 gadget whereas the .62 did show 9 computers.

(2) I set the elapsed rule for 1 hour, 2 minutes, 3 seconds for tasks that were already all over 2 hours. Time was set to 15 seconds.  I also set a temp rule for > 75c and 10 seconds, and all CPUs were >= 77c.  Nothing happened, neither in the display nor the log.  The project was not suspended nor was there an entry in the rules log as shown here.   There was no entry in the rules log whether the x was shown in the checkbox or not.

(3) The state of the Show logging checkbox is being toggled off after the Rule editor is refreshed and dismissed.  To observe the problem do the following exactly:
- Add a rule and put an x in the Show logging [ ] then click on OK to dismiss the box
- put a check in the rule just created and click on edit
- observe that the checkbox is still checked as one would expect
- Click on OK to dismiss
- put a check in the rule just edited and click on edit again
- observe that the Show logging checkbox is no longer checked

(4) The state of the "BoincTasks Settings::Rules::Computer" checkbox is not preserved after the "Rule editor " dialog box is dismissed.  This is observed when running through step (3) above.
#63
Wish List / Re: BT 0.62 - Rules / Temperature
July 01, 2010, 08:23:33 AM
I can't make it work.  I uninstalled then re-installed .63 on a Vista 64 system with dual monitors.  Set rules for < , > and for gpu and cpu. Nothing showed up in the rules log.  If I set the time for 10 seconds, something should have happened after 10 seconds.


I thought the [ x ] box (in the result creation/edit/delete box) was for enabling rules.  It is actually for selecting the one to edit (or delete).  I think a radio button would be more appropriate for that dialog box than a checkbox.   Ideally one should highlight the line item to edit, and the [ x ] should show which rules are enabled or disabled.

Unaccountably, the rules log dialog box is always empty.
---

Another problem

The gadget tool tip is always blank. In addition, when the tasking display under the gadget setup dialog box is updated,  the cursor is moved away from the text input box and is not restored. It takes several tries before a value can be entered in, for example, the time field.  I have to be either very quick to enter it in, or use the mouse to position the cursor before each digit is typed.

#64
Wish List / Re: BT 0.62 - Rules / Temperature
July 01, 2010, 04:30:49 AM
Can't get the rule to work, not sure what I am doing wrong.  Collatz has been running 3 days (normally only 30 minutes) and should have been suspended as the temp was under the rule limit.



Turning logging on or off seems to show no debug activity other than throttle messages.  I expected to see the temp being measured or messages to the effect that the rule was enabled, etc.

I downloaded .63 but same problem.  Also, the checkbox state is not being saved after a rule edit.

Is there a wildcard to use so I don't have to spell the project exactly?  What about the app - is there a wildcard for the app?  For some projects the app changes, as I recall.
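
To illustrate the kind of wildcard matching being asked about, here is a minimal sketch using shell-style patterns. It does not describe anything BoincTasks actually does; `rule_matches` and the app name in the example are hypothetical.

```python
from fnmatch import fnmatchcase


def rule_matches(pattern: str, name: str) -> bool:
    """Case-insensitive shell-style match: '*' is any run of characters, '?' is one."""
    return fnmatchcase(name.lower(), pattern.lower())


# Project matched without spelling the full name.
print(rule_matches("collatz*", "Collatz Conjecture"))
# Hypothetical app name that changes between versions.
print(rule_matches("collatz_*_cuda*", "collatz_2.05_windows_cuda23"))
```

With patterns like these a rule would keep matching when a project renames or re-versions its app, at the cost of possibly matching more than intended.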
#65
I rebooted and re-ran that scan and it seems to be working very nicely.  I saw a red circle X but it disappeared before I could click on it.  On another occasion the red X appeared and I noticed that even a ping to that address failed.  Since this is happening at the same address, I suspect an intermittent connection problem does exist.

Please check your private messages as I sent you a suggestion.


#66
I went to that expert page and set the retry to 30 seconds.  It did not seem to do anything.  Maybe I need to reboot.

I clicked on the "X" to close BT.  This puts it into the tray.  About an hour or so later I brought it back up and noticed the circled red X again, on the same win7 system I had seen previously.  I clicked on that X and it went away.  I am going to reboot and next time I will wait at least 30 seconds before clicking on the red X after resuming BT.

I assume BT no longer polls when closed to the tray.    It would be helpful if it continued to get temps and could flash its icon if any temps exceed a threshold.  It appears that traffic to port 31417 stops when BT iconizes.

I just had 3 collatz tasks freeze up, for the 3rd time in the last couple of weeks.  I complained to admin about this last week, but was told to downgrade to a lower driver.  I suspended then resumed collatz and that fixed it.  If BT could flash a warning when temps are too low or too high it would be nice.  One gpu held that collatz task for over 24 hours.  The task should have finished in about 35 minutes.  The only indication that something is wrong is the temperature down to 50c - 55c instead of the high 70's.  Otherwise I check the wall clock time to see if too much time elapsed. It is a lot easier to spot temps that are out of bounds.  I think this is on your todo list but I thought I would mention it again.

#67
Ok - got time to download and run.  Before downloading, I already knew that one of my linux systems was not reporting temps back to BT, as it had rebooted on its own and my mod to the gnome sensor-applet runs from the desktop (not a daemon) and requires a login.  Previously, it showed up as "blank" for 192.168.0.7 (before installing .58)

After installing, I pulled the enet cable and the version went from "blank" to 1.74 and the phrase connected/not-connected showed up for 192.168.0.7, which was confusing.  At this point the cable was actually disconnected but BT had not done a refresh of the panel.



I waited about 2 - 3 minutes and may have selected "tasks" then gone back to "computer" and BT finally spotted that the cable was disconnected.



I selected "tasks" again, then went back to "computer", and about a minute or two later BT started recognizing the systems that had been last displayed.  The new install of BT does not delete the old computers.xml. I had to wait a few minutes before the old systems were displayed as the "scan" button was grayed out.  When I did click on scan it quickly found all the systems.  I specified 192.168.0.1 to 192.168.0.21 as I knew that was the range of the boinc systems.  The scan took less than 1 minute to find all the systems.  The blue progress bar was exercised about 5 times I think.

The reason I select "tasks" and go back to "computer" is because that seems to trigger BT to go off and see what is there and update the "computer" panel.  I think there should be a button there to quickly test all connections.  Currently, both on 58 and 56, I see a red circle with an X indicating a system is off line, but when I select that system the warning disappears, ie: the system was not really off line after all.




HTH.
#68
I ran three tests, the first two with the cable off.  The last one with the cable on.
The first two were identical and about 30 ip's were processed each minute. The scan stopped about 192.168.0.150 which was right at 5 minutes and never got to .255



The second test was with the cable on and at each 1 minute interval,  I made a note of the last "dot" as the ip's were scanned.
Minutes:  1   2   3   4   5   6   7   8
Last IP:  10  11  12  15  16  17  19  20
After 8 minutes I canceled out.  Somewhere after .112 I have a printer and a wireless repeater and I didn't want to wait that long.
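
For a rough sense of scale, the two scan rates reported above can be turned into a full-range estimate. This is only arithmetic on the numbers in this post; it assumes the scan starts at .1 and keeps a constant rate, and `minutes_to_scan` is a made-up helper name.

```python
def minutes_to_scan(addresses: int, addresses_per_minute: float) -> float:
    """Estimated minutes to walk `addresses` hosts at a constant rate."""
    return addresses / addresses_per_minute


cable_off_rate = 30.0   # about 30 IPs per minute, as reported
cable_on_rate = 20 / 8  # .20 reached after 8 minutes

print(minutes_to_scan(150, cable_off_rate))  # 5.0, matches the stop near .150
print(minutes_to_scan(255, cable_off_rate))  # 8.5
print(minutes_to_scan(255, cable_on_rate))   # 102.0
```

At the cable-on rate a full .1 to .255 sweep would take well over an hour, which is why narrowing the range to the known boinc systems matters so much.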



I ran this on BT .56 which scans for ip_addr, not domain_name.
 I assume <get_host_info/> will be used to obtain domain_name eventually?
#69
Quote from: fred on May 25, 2010, 09:46:32 AM
Quote from: BeemerBiker on May 25, 2010, 05:15:03 AM
I have had collatz tasks hang, twice in the last week, on a pair of 9600gtx+ GPUs (two tasks both hung).  It was not obvious because one was about 75 pct done and the other 50% when both hung.  I finally noticed that the temps for both GPUs were in the low 50's instead of the mid to high 70's.  In both cases several days went by before I noticed the problem.

Anyway, it would be nice to be able to highlight gpu temps that are either too high or too low.  It is true that some CUDA projects use very little of the GPU so there might be some false positives
You can setup TThrottle to execute a batch file or email. If that's Windows of course.
I made an entry in the todo list. Warning for GPU temperature low or high.


The problem with this is that TThrottle has to be set up on each system and TThrottle only runs on windows.  Now, if TThrottle could accept packets from other systems running TThrottle (or my modification to linux's sensors-applet) then the user needs to set up rules for only one system and TThrottle displays temps from all systems that sent it packets.

Currently, TThrottle monitors port 31417 and looks for "BT\0" then routes, for example,

"<TThrottle><PV 1.74><AC 0><TC 1><TG 1><DC 5><DG 45><CT0 72.0><CT1 73.0><CT2 74.0><CT3 81.0><GT0 79.0>\0"

back to the source ip (BT)
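
A minimal sketch of how a reply in that format could be split into fields. The reading of `CTn` as per-core CPU temperatures and `GTn` as GPU temperatures is an assumption based on the sample string, and `parse_reply` and `temps` are hypothetical helper names, not part of TThrottle or BT.

```python
import re

TAG_RE = re.compile(r"<(\w+)(?:\s+([^>]*))?>")


def parse_reply(message: str) -> dict[str, str]:
    """Turn '<NAME VALUE>' tags into a dict; tags without a value map to ''."""
    return dict(TAG_RE.findall(message.rstrip("\0")))


def temps(fields: dict[str, str], prefix: str) -> list[float]:
    """Collect PREFIX0, PREFIX1, ... values in index order."""
    found = {
        int(name[len(prefix):]): float(value)
        for name, value in fields.items()
        if re.fullmatch(prefix + r"\d+", name)
    }
    return [found[i] for i in sorted(found)]


sample = ("<TThrottle><PV 1.74><AC 0><TC 1><TG 1><DC 5><DG 45>"
          "<CT0 72.0><CT1 73.0><CT2 74.0><CT3 81.0><GT0 79.0>\0")
fields = parse_reply(sample)
print(fields["PV"])         # 1.74
print(temps(fields, "CT"))  # [72.0, 73.0, 74.0, 81.0]
print(temps(fields, "GT"))  # [79.0]
```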


If it could look for and process, for example,
"TT<TThrottle jys2x290><PV 1.74><AC 0><TC 1><TG 1><DC 5><DG 45><CT0 72.0><CT1 73.0><CT2 74.0><CT3 81.0><GT0 79.0>\0"

it would know that jys2x290 was the hostname of the system whose IP address was the source of the incoming TCP packet and that the temperatures that follow were collected using TThrottle <PV 1.74> or some application like linux's sensors-applet <SA 1.74> for example.

TThrottle would treat the incoming temperatures as if it has measured them itself and display them on the graph.  The hostname would have to be a new property for TThrottle to maintain.

I suspect this would best be done in BT and not TThrottle but TThrottle has all the rules processing and temp graph capability.
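
The receiving side of this proposal might look roughly like the sketch below. It assumes the "TT" prefix and the hostname-in-first-tag layout suggested above, neither of which exists in TThrottle today; `parse_push_packet`, `record` and the `latest` table are hypothetical names.

```python
import re

TAG_RE = re.compile(r"<(\w+)(?:\s+([^>]*))?>")

# hostname -> {"ip": source address, "temps": {tag: degrees}}
latest: dict[str, dict] = {}


def parse_push_packet(packet: bytes):
    """Return (hostname, fields) for a 'TT<TThrottle host>...' packet, else None."""
    text = packet.rstrip(b"\0").decode("ascii", errors="replace")
    if not text.startswith("TT<"):
        return None
    tags = TAG_RE.findall(text[2:])
    if not tags or tags[0][0] != "TThrottle" or not tags[0][1]:
        return None
    return tags[0][1], dict(tags[1:])


def record(packet: bytes, source_ip: str) -> bool:
    """Store the newest temperatures per hostname; False if the packet is not ours."""
    parsed = parse_push_packet(packet)
    if parsed is None:
        return False
    hostname, fields = parsed
    readings = {k: float(v) for k, v in fields.items() if re.fullmatch(r"[CG]T\d+", k)}
    latest[hostname] = {"ip": source_ip, "temps": readings}
    return True


pkt = (b"TT<TThrottle jys2x290><PV 1.74><AC 0><TC 1><TG 1><DC 5><DG 45>"
       b"<CT0 72.0><CT1 73.0><CT2 74.0><CT3 81.0><GT0 79.0>\0")
print(record(pkt, "192.168.0.7"), latest["jys2x290"]["temps"]["GT0"])
```

Keying the table by hostname rather than IP means one set of rules could cover every sender, which is the point of the suggestion.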

#70
Quote from: Pepo on May 25, 2010, 08:46:50 AM
Quote from: BeemerBiker on May 25, 2010, 05:15:03 AM
Anyway, it would be nice to be able to highlight gpu temps that are either too high or too low.

There could be an additional problem that BT could be monitoring X machines and these could contain 1..n GPUs, while each of them could have a very different idea of "too high or too low". Thus the highlight temps would have to be stored in BT per-GPU.

This could be solved by adding rules or scripting capability to BT much like rules in TT.  The user could then come up with project unique actions.  For example, assuming the BT "Tasks" column titles are objects that could be parsed by a scripting mechanism built into BT, then the user could come up with very complex rules such as

Rule 1:  On (Temperature.gpu.value < 55 && Temperature.gpu.samples > 10) && Project.value=="Collatz Conjecture" then launch(whatever.exe) && html(mailto:someone@example.com?subject=temp too low&body=Collatz) && stop_processing_rule

Rule 2:  If (Temperature.gpu.value < 55 && Temperature.gpu.samples > 10) && Project=="Collatz Conjecture" then Temperature.gpu.highlight=WARNING else Temperature.gpu.highlight=NORMAL


This could be done as an add-in to BT assuming an API was available.
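
A sketch of how Rule 2 might behave if it were written as ordinary code. It reads "samples > 10" as more than 10 consecutive readings below the limit, which is an assumption, and `LowTempRule` is a hypothetical class; BT exposes no such API.

```python
from dataclasses import dataclass


@dataclass
class LowTempRule:
    """Flags a GPU whose temperature stays under limit_c while running `project`."""
    project: str
    limit_c: float
    min_samples: int
    below_count: int = 0

    def update(self, project: str, gpu_temp_c: float) -> str:
        """Feed one reading; returns 'WARNING' or 'NORMAL'."""
        if project != self.project:
            return "NORMAL"
        if gpu_temp_c < self.limit_c:
            self.below_count += 1
        else:
            self.below_count = 0  # one warm reading clears the streak
        return "WARNING" if self.below_count > self.min_samples else "NORMAL"


rule = LowTempRule(project="Collatz Conjecture", limit_c=55.0, min_samples=10)
states = [rule.update("Collatz Conjecture", 52.0) for _ in range(11)]
print(states[9], states[10])  # NORMAL WARNING
```

Keeping one rule object per GPU would also cover the point raised in the quote, since each GPU could carry its own idea of "too low".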
#71
Wish List / temp warning (too high and too cold)
May 25, 2010, 05:15:03 AM
I have had collatz tasks hang, twice in the last week, on a pair of 9800gtx+ GPUs (two tasks both hung).  It was not obvious because one was about 75 pct done and the other 50% when both hung.  I finally noticed that the temps for both GPUs were in the low 50's instead of the mid to high 70's.  In both cases several days went by before I noticed the problem.

Anyway, it would be nice to be able to highlight gpu temps that are either too high or too low.  It is true that some CUDA projects use very little of the GPU so there might be some false positives
#72
More info on the flaky xp-32 system.  It is blinking on and off about every 3 seconds but is staying on more often than off.  No other systems behave like this flaky one.  It is at the end of a wireless repeater and has 2 gpu's.  There is also a ubuntu x64 box with one gpu on the same wireless repeater.  That system is reporting temps back to boinctasks just fine.

I have 4 linux systems that are polling cpu & gpu temps once every 10 seconds from a gnome panel applet (8.1 thru 10.04 ubuntu) and a pthread routes the temps back to boinctasks with a 7 second timeout on "accept".  Originally I had 5 and 1 and that gave flaky results, tho not nearly as bad as that xp-32 system running tthrottle.
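
The accept-with-timeout pattern described here can be sketched as follows. This is not the applet's actual code (that is a C pthread); it only assumes the "BT\0" poll and the tagged reply mentioned elsewhere in these posts, and `serve_once` is a hypothetical name.

```python
import socket


def serve_once(listener: socket.socket, reply: bytes, accept_timeout: float = 7.0) -> bool:
    """Wait up to accept_timeout seconds for one poll and answer it.

    Returns True if a request starting with the 3-byte BT poll was answered.
    """
    listener.settimeout(accept_timeout)
    try:
        conn, _addr = listener.accept()
    except socket.timeout:
        return False  # nobody polled during this window
    with conn:
        conn.settimeout(accept_timeout)
        try:
            request = conn.recv(64)
        except socket.timeout:
            return False  # connected but never sent the poll
        if not request.startswith(b"BT\0"):
            return False
        conn.sendall(reply)
        return True
```

A worker thread would call this in a loop on a socket bound to port 31417, passing in the most recent temperature string each time. The timeout has to be long enough to span a poll from BT, or polls get missed and the display blinks.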

I have 3 other windows systems, all 64 bit and all running tthrottle just fine without a glitch.

I am going to look at that wireless repeater and see if that is the problem.  Possibly linux handles bad connections better than xp-32.

#73
OK, that makes sense, not showing the version number but coloring it red when offline might be a better solution. I assume I have had this problem all along although it is strange that only the xp-32 system is affected the worst. It is a fast quad.
#74
Beta Testing / 56: losing temps from tthrottle
May 24, 2010, 03:40:13 AM
Put in (v) 0.56 and immediately noticed that tthrottle was not consistently recognized.  Previously, the version number eg: 1.74, was rock solid once the system was recognized in the computer tab.  Now, some are blinking on and off.   My windows xp32 system shows up for 2-3 seconds then disappears for 5-6 seconds.  Some are stable, others blink off and come back on solid for a while.  Was there a timing change?  Something is flaky.

#75
Beta Testing / Re: 64 bit version
May 05, 2010, 04:23:16 AM
If you are referring to boinc stats displaying the temps, then yes, I have 2 remote win boxes and (almost) one linux* box reporting temps back to boinctasks.

However, the temps reported are identical for each of the 4 cpu's but the gpu's are correct.



I am getting back individual correct reading for each cpu temp, but the last temp is the only one that is being displayed (for all tasks).  I used microsoft net monitor to see what was happening.



The last packet of data shows 49.7 50.1  49.8 and 48.9 but only the last temp is used for the boinctasks temperature display for all 4 cpu's  (The packet does not match the image since they were taken at different times).  The gpu temperatures all are displayed correctly.

BTW I am attempting to modify the ubuntu linux sensors-applet that uses lm-sensors to send temperature measurements to your program for the cpu's and the gpu's on my linux systems.  *So far, all I have is an ubuntu server that routes some random temps back to boinctasks, but I had the idea of putting it into the sensors-applet.  These are all 64 bit systems: vista, windows 7 and ubuntu.