Up/Downloads fail until a re-start


MikeTimbers
6th February 2007, 02:48
What gives with this? I have frequent problems with services that get stuck when they can't connect for new work, yet a simple re-start instantly connects and works.

Waiting before retry.
[07:21:26] + Attempting to get work packet
[07:21:26] - Connecting to assignment server
[07:21:26] Couldn't send HTTP request to server (wininet)
[07:21:26] + Could not connect to Assignment Server
[07:21:26] Couldn't send HTTP request to server (wininet)
[07:21:26] + Could not connect to Assignment Server 2
[07:21:26] + Couldn't get work instructions.
[07:21:26] - Error: Attempt #9 to get work failed, and no other work to do.
Waiting before retry.
[07:42:39] Service stop request received.

Folding@Home Client Shutdown.

[07:42:41] - Ask before connecting: No
[07:42:41] - Use IE connection settings: Yes
[07:42:41] - User name: MikeTimbers (Team 131)
[07:42:41] - User ID: xxx
[07:42:41] - Machine ID: 1
[07:42:41]
[07:42:41] Loaded queue successfully.
[07:42:41] + Benchmarking ...
[07:42:43] - Preparing to get new work unit...
[07:42:43] + Attempting to get work packet
[07:42:43] - Connecting to assignment server


[07:42:43] + Attempting to send results
[07:42:44] - Successful: assigned to (171.65.103.160).
[07:42:44] + News From Folding@Home: Welcome to Folding@Home
[07:42:44] + Results successfully sent
[07:42:44] Thank you for your contribution to Folding@Home.
[07:42:44] Loaded queue successfully.


[07:42:44] + Attempting to send results
[07:42:46] + Results successfully sent
[07:42:46] Thank you for your contribution to Folding@Home.
[07:42:47] + Closed connections
[07:42:47]
[07:42:47] + Processing work unit
[07:42:47] Core required: FahCore_78.exe
[07:42:47] Core found.
[07:42:48] Working on Unit 09 [February 6 07:42:48]
[07:42:48] + Working ...
[07:42:48]
[07:42:48] *------------------------------*
[07:42:48] Folding@Home Gromacs Core
[07:42:48] Version 1.90 (March 8, 2006)
[07:42:48]
[07:42:48] Preparing to commence simulation
[07:42:48] - Assembly optimizations manually forced on.
[07:42:48] - Not checking prior termination.
[07:42:48] - Expanded 291020 -> 1461493 (decompressed 502.1 percent)
[07:42:48] - Starting from initial work packet
[07:42:48]
[07:42:48] Project: 3039 (Run 7, Clone 409, Gen 2)
[07:42:48]
[07:42:48] Assembly optimizations on if available.
[07:42:48] Entering M.D.
[07:42:54] Protein: p3039_supervillin-03
[07:42:54]
[07:42:54] Writing local files
[07:42:54] Extra SSE boost OK.
[07:42:54] Writing local files
[07:42:54] Completed 0 out of 5000000 steps (0)


Now, I know there is a problem with using IE connection settings under IE7, but this machine and the rest of the old farm are still on IE6, so it can't be that. I've had machines stuck for tens of hours, yet a simple re-start immediately connects and gets work. http://img116.exs.cx/img116/934/z0tdntknw.gif

TDKozan
6th February 2007, 11:45
Wireless or wired networking? I don't use IE settings but I've seen some glitches sending over WiFi if the machine is unattended.

MikeTimbers
6th February 2007, 13:20
wired

TDKozan
6th February 2007, 15:16
Ah well. One thing: try not using IE settings unless you have a pressing reason.

Also, it could just be timing. If the collection server was down for a while and then came back up, the restart might have caught it at just the right moment. After the second attempt, IIRC, subsequent retries come farther and farther apart, so restarting the process gives you a few immediate retries.
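To illustrate the timing idea, here is a rough sketch of a retry schedule that backs off. The real client's schedule isn't given in this thread, so the numbers (two immediate retries, a 60-second base, doubling, a one-hour cap) and the names `retry_delay` and `schedule` are all assumptions made up for the example.

```python
def retry_delay(attempt, base_seconds=60, cap_seconds=3600, immediate_attempts=2):
    """Seconds to wait before the given retry attempt (1-based).

    Hypothetical schedule, not the real client's: the first
    `immediate_attempts` retries happen right away, then the wait
    doubles on each further attempt, up to `cap_seconds`.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if attempt <= immediate_attempts:
        return 0
    exponent = attempt - immediate_attempts - 1
    return min(base_seconds * 2 ** exponent, cap_seconds)


def schedule(attempts):
    """Delays for attempts 1..attempts, counted from a fresh start."""
    return [retry_delay(n) for n in range(1, attempts + 1)]
```

Under these assumed numbers, a client on its ninth attempt would be waiting an hour between tries, while a freshly restarted one starts again at attempt 1 with no wait. That would make a restart look like an instant fix even if the server had only just come back.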

TK

pelligrini
6th February 2007, 19:15
That does sound strange.

Is the FAH client old too? You might make sure it is the latest one.

TDKozan
6th February 2007, 22:20
<shame>I've done that one myself, but restarting didn't cure it; only replacing the client solved the problem.</shame> I did get credit for the hung unit though.

Another 800 MHz back in production!

MikeTimbers
7th February 2007, 15:23
This happens regularly at home and used to at work. Every time, the client would be fully network connected with no indication of any other networking issues. Re-starting either the console client or the service client instantly resulted in a successful load of the queue and work resuming. The number of hours lost is enormous.
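One way to spot a stuck client sooner would be to scan its log for the failure lines quoted earlier in the thread. The sketch below only assumes the two line formats shown in that log ("Error: Attempt #N to get work failed" and "+ Processing work unit"). The function names and the threshold of 5 are hypothetical, and it doesn't restart anything by itself.

```python
import re

# Matches the failure line from the log quoted in this thread.
FAILED_RE = re.compile(r"Error: Attempt #(\d+) to get work failed")
# Line the client logs once it has work again.
WORKING_MARKER = "+ Processing work unit"


def pending_failed_attempts(log_text):
    """Return the latest failed-attempt number not followed by new work."""
    failed = 0
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            failed = int(match.group(1))
        elif WORKING_MARKER in line:
            # The client got a work unit, so earlier failures no longer count.
            failed = 0
    return failed


def looks_stuck(log_text, threshold=5):
    """True if the log ends with at least `threshold` failed attempts.

    The threshold is an arbitrary choice for illustration.
    """
    return pending_failed_attempts(log_text) >= threshold
```

Run against the log text from a scheduled task, `looks_stuck` could trigger an alert or a manual restart instead of leaving a machine idle for hours.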

pelligrini
9th February 2007, 12:10
I've never noticed anything like that here, but I don't really watch the clients all that closely anymore. Sometimes I've had a machine down for a week or more before I noticed.