View Full Version : Help!


dnar
16th September 2005, 03:15
Something weird is going on with my dual PIII machine. It completed work last night (both clients, I run 2). It sat there for ages and didn't look like the work was being uploaded, so I shut down the clients and restarted them. The work has not shown up some 12 hours later. The following is an extract from the log of one of the clients. Can someone please tell me what's going wrong here?
[10:44:34] Completed 2475000 out of 2500000 steps (99%)
[12:10:29] Writing local files
[12:10:29] Completed 2500000 out of 2500000 steps (100%)
[12:10:29] Writing final coordinates.
[12:10:29] Past main M.D. loop
[12:11:29]
[12:11:29] Finished Work Unit:
[12:11:29] - Reading up to 296208 from "work/wudata_01.arc": Read 296208
[12:11:29] - Reading up to 1040312 from "work/wudata_01.xtc": Read 1040312
[12:11:29] goefile size: 0
[12:11:29] logfile size: 338604
[12:11:29] Leaving Run
[12:11:30] - Writing 1837264 bytes of core data to disk...
[12:11:30] ... Done.
[12:11:30] - Shutting down core
[12:11:31]
[12:11:31] Folding@home Core Shutdown: FINISHED_UNIT

Folding@Home Client Shutdown.


--- Opening Log file [September 16 02:18:47]


# Linux Console Edition #######################################################
###############################################################################

Folding@Home Client Version 5.02

http://folding.stanford.edu

###############################################################################
###############################################################################

Launch directory: /pub/dc/holly0/fah
Executable: /pub/dc/holly0/fah/FAH502-Linux.exe


[02:18:47] - Ask before connecting: No
[02:18:47] - User name: dnar (Team 131)
[02:18:47] - User ID: 5BCB35B671A97D48
[02:18:47] - Machine ID: 2
[02:18:47]
[02:18:47] Loaded queue successfully.
[02:18:47] + Benchmarking ...
[02:18:54]
[02:18:54] + Processing work unit
[02:18:54] Core required: FahCore_78.exe
[02:18:54] Core found.
[02:18:54] Working on Unit 01 [September 16 02:18:54]
[02:18:54] + Working ...
[02:18:54]
[02:18:54] *------------------------------*
[02:18:54] Folding@Home Gromacs Core
[02:18:54] Version 1.86 (August 28, 2005)
[02:18:54]
[02:18:54] Preparing to commence simulation
[02:18:54] - Looking at optimizations...
[02:18:54] - Created dyn
[02:18:54] - Files status OK
[02:18:54]
[02:18:54] Folding@home Core Shutdown: MISSING_WORK_FILES
[02:18:54] CoreStatus = 74 (116)
[02:18:54] The core could not find the work files specified. Removing from queue
[02:18:54] Deleting current work unit & continuing...
[02:18:54] - Preparing to get new work unit...
[02:18:54] + Attempting to get work packet
[02:18:54] - Connecting to assignment server
[02:18:55] - Successful: assigned to (171.64.122.112).
[02:18:55] + News From Folding@Home: Welcome to Folding@Home
[02:18:55] Loaded queue successfully.
[02:18:56] - Deadline time not received.
[02:18:57] + Closed connections
[02:19:02]
[02:19:02] + Processing work unit
[02:19:02] Core required: FahCore_65.exe
[02:19:02] Core not found.
[02:19:02] - Core is not present or corrupted.
[02:19:02] - Attempting to download new core...
[02:19:02] + Downloading new core: FahCore_65.exe
[02:19:03] + 10240 bytes downloaded
[02:19:03] + 20480 bytes downloaded
[02:19:03] + 30720 bytes downloaded
[02:19:03] + 40960 bytes downloaded
[02:19:03] + 51200 bytes downloaded
[02:19:03] + 61440 bytes downloaded
[02:19:03] + 71680 bytes downloaded
[02:19:03] + 81920 bytes downloaded
[02:19:04] + 92160 bytes downloaded
[02:19:04] + 102400 bytes downloaded
[02:19:04] + 112640 bytes downloaded
[02:19:04] + 122880 bytes downloaded
[02:19:04] + 133120 bytes downloaded
[02:19:04] + 143360 bytes downloaded
[02:19:04] + 153600 bytes downloaded
[02:19:04] + 163840 bytes downloaded
[02:19:04] + 174080 bytes downloaded
[02:19:04] + 184320 bytes downloaded
[02:19:04] + 194560 bytes downloaded
[02:19:04] + 204800 bytes downloaded
[02:19:04] + 215040 bytes downloaded
[02:19:04] + 225280 bytes downloaded
[02:19:04] + 235520 bytes downloaded
[02:19:05] + 245760 bytes downloaded
[02:19:05] + 256000 bytes downloaded
[02:19:05] + 266240 bytes downloaded
[02:19:05] + 276480 bytes downloaded
[02:19:05] + 286720 bytes downloaded
[02:19:05] + 296960 bytes downloaded
[02:19:05] + 307200 bytes downloaded
[02:19:05] + 317440 bytes downloaded
[02:19:06] + 327680 bytes downloaded
[02:19:06] + 337920 bytes downloaded
[02:19:06] + 348160 bytes downloaded
[02:19:06] + 358400 bytes downloaded
[02:19:06] + 368640 bytes downloaded
[02:19:06] + 378880 bytes downloaded
[02:19:06] + 389120 bytes downloaded
[02:19:06] + 399360 bytes downloaded
[02:19:06] + 409600 bytes downloaded
[02:19:06] + 419840 bytes downloaded
[02:19:07] + 430080 bytes downloaded
[02:19:07] + 440320 bytes downloaded
[02:19:07] + 450560 bytes downloaded
[02:19:07] + 460800 bytes downloaded
[02:19:07] + 471040 bytes downloaded
[02:19:07] + 481280 bytes downloaded
[02:19:07] + 491520 bytes downloaded
[02:19:07] + 501760 bytes downloaded
[02:19:07] + 512000 bytes downloaded
[02:19:07] + 522240 bytes downloaded
[02:19:07] + 532480 bytes downloaded
[02:19:07] + 542720 bytes downloaded
[02:19:07] + 552960 bytes downloaded
[02:19:08] + 563200 bytes downloaded
[02:19:08] + 573440 bytes downloaded
[02:19:08] + 583680 bytes downloaded
[02:19:08] + 593920 bytes downloaded
[02:19:08] + 604160 bytes downloaded
[02:19:08] + 614400 bytes downloaded
[02:19:08] + 624640 bytes downloaded
[02:19:08] + 634880 bytes downloaded
[02:19:08] + 645120 bytes downloaded
[02:19:08] + 655360 bytes downloaded
[02:19:08] + 665600 bytes downloaded
[02:19:08] + 675840 bytes downloaded
[02:19:08] + 686080 bytes downloaded
[02:19:08] + 696320 bytes downloaded
[02:19:09] + 706560 bytes downloaded
[02:19:09] + 716800 bytes downloaded
[02:19:09] + 727040 bytes downloaded
[02:19:09] + 737280 bytes downloaded
[02:19:09] + 747520 bytes downloaded
[02:19:09] + 757760 bytes downloaded
[02:19:09] + 768000 bytes downloaded
[02:19:09] + 778240 bytes downloaded
[02:19:09] + 788480 bytes downloaded
[02:19:09] + 798720 bytes downloaded
[02:19:09] + 808960 bytes downloaded
[02:19:09] + 819200 bytes downloaded
[02:19:09] + 829440 bytes downloaded
[02:19:09] + 839680 bytes downloaded
[02:19:09] + 849920 bytes downloaded
[02:19:09] + 860160 bytes downloaded
[02:19:10] + 861831 bytes downloaded
[02:19:10] Verifying core Core_65.fah...
[02:19:10] Signature is VALID
[02:19:10]
[02:19:10] Trying to unzip core FahCore_65.exe
[02:19:10] Decompressed FahCore_65.exe (2264152 bytes) successfully
[02:19:10] + Core successfully engaged
[02:19:15]
[02:19:15] + Processing work unit
[02:19:15] Core required: FahCore_65.exe
[02:19:15] Core found.
[02:19:15] Working on Unit 02 [September 16 02:19:15]
[02:19:15] + Working ...
[02:19:16] Folding@Home Client Core Version 2.53 (June 29, 2004)
[02:19:16]
[02:19:16] Proj: work/wudata_02
[02:19:16] Done: 22864 -> 142987 (decompressed 625.3 percent)
[02:19:16] nsteps: 5000000 dt: 2.000000 dt_dump: 250.000000 temperature: 298.000000
[02:19:16] xyzfile:
[02:19:16] " 393 p1159_L939_K12M_298K_DT_5ns_clones
[02:19:16] 1 N -22.735945 4.552132 ..."
[02:19:16] keyfile:
[02:19:16] "parameters ./proj1159.prm
[02:19:16] NOVERSION
[02:19:16] ARCHIVE
[02:19:16]
[02:19:16] cutoff 16.0
[02:19:16] taper 12..."
[02:19:16]
[02:19:16] - Couldn't get size info for dyn file: work/wudata_02.dyn
[02:19:16] Starting from initial work packet
[02:19:16]
[02:19:16] Protein: p1159_L939_K12M_298K_DT_5ns_clones
[02:19:16] - Run: 2 (Clone 211, Gen 0)
[02:19:16] - Frames Completed: 0, Remaining: 400
[02:19:16] - Dynamic steps required: 5000000
[02:19:16]
[02:19:16] Writing local files:
[02:19:16]
[02:19:16] parameters work/wudata_02.prm
[02:19:16] - Writing "work/wudata_02.key": (overwrite) successful.
[02:19:16] - Writing "work/wudata_02.xyz": (overwrite) successful.
[02:19:16] - Writing "work/wudata_02.prm": (overwrite) successful.
[02:19:18] - Writing "work/wudata_02.key": (append) successful.
[02:19:18]
[02:19:18] PROJECT="work/wudata_02", NSTEPS=5000000, DT=2.0000, DTDUMP=25.000000, TEMP=298.00
[02:19:18] TINKER: Software Tools for Molecular Design
[02:19:18] Version 3.8 October 2000
[02:19:18] Copyright (c) Jay William Ponder 1990-2000
[02:19:18] portions Copyright (c) Michael Shirts 2001
[02:19:18] portions Copyright (c) Vijay S Pande 2001
[02:38:04] Finished a frame (1)
[02:56:49] Finished a frame (2)
[03:15:32] Finished a frame (3)

dnar
16th September 2005, 03:28
BTW, the contents of the work directory are:
[wayne@Criten fah]$ ls -l work
total 2420
-rw-rw-r-- 1 wayne wayne 26016 Sep 16 15:20 current.xyz
-rwxr-x--- 1 wayne wayne 1607 Sep 16 15:18 logfile_02.txt
-rw-rw-r-- 1 wayne wayne 417312 Sep 16 15:20 wudata_02.arc
-rw-rw-r-- 1 wayne wayne 576 Sep 16 15:20 wudata_02.chk
-rw-rw-r-- 1 wayne wayne 23376 Sep 16 10:18 wudata_02.dat
-rw-rw-r-- 1 wayne wayne 124664 Sep 16 15:20 wudata_02.dyn
-rw-rw-r-- 1 wayne wayne 16384 Sep 16 14:50 wudata_02.log
-rwxr-x--- 1 wayne wayne 512 Sep 16 15:20 wuinfo_02.dat
-rw-rw-r-- 1 wayne wayne 1837264 Sep 15 20:11 wuresults_01.dat
[wayne@Criten fah]$
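
Worth noting in the listing above: wuresults_01.dat is still sitting there at 1837264 bytes, the same size the core reported writing to disk, while every other file belongs to unit 02. Here is a rough sketch for spotting leftover results files like that. It assumes the naming seen in this listing (wuresults_NN.dat next to wudata_NN.* files); the function name orphaned_results is made up for illustration and is not part of the FAH client.

```python
import re
from pathlib import Path


def orphaned_results(work_dir):
    """Return wuresults_NN.dat files whose unit NN has no wudata_NN.* files.

    Assumes the file naming seen in the listing above; not an official FAH tool.
    """
    work = Path(work_dir)
    orphans = []
    for results in sorted(work.glob("wuresults_*.dat")):
        match = re.fullmatch(r"wuresults_(\d+)\.dat", results.name)
        if match is None:
            continue
        unit = match.group(1)
        # Any remaining wudata_NN.* file means the unit is still in progress.
        if not any(work.glob(f"wudata_{unit}.*")):
            orphans.append(results)
    return orphans


if __name__ == "__main__":
    for path in orphaned_results("work"):
        print(f"{path.name}: {path.stat().st_size} bytes, possibly never uploaded")
```

Run from the client's launch directory, it only reports candidates. Whether the client can still send such a file is a separate question.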

TDKozan
16th September 2005, 05:08
If you do a -queueinfo does it show anything hanging out?

T"clutching at straws"K

dnar
16th September 2005, 06:52
Only the current (new) work unit. :(

pelligrini
16th September 2005, 09:19
I'm not sure what is happening, but I think you shut down the client at a bad time. Is this the same machine with your previous shutdown problems? This is part of the log from a recent Gromacs unit on my machine (a P4 1.7 running Win2k):


[10:42:57] Completed 990000 out of 1000000 steps (99)
[10:52:34] Writing local files
[10:52:34] Completed 1000000 out of 1000000 steps (100)
[10:52:34] Writing final coordinates.
[10:52:34] Past main M.D. loop
[10:53:34]
[10:53:34] Finished Work Unit:
[10:53:34] - Reading up to 130248 from "work/wudata_05.arc": Read 130248
[10:53:34] - Reading up to 403816 from "work/wudata_05.xtc": Read 403816
[10:53:34] goefile size: 0
[10:53:34] logfile size: 58138
[10:53:34] Leaving Run
[10:53:36] - Writing 611042 bytes of core data to disk...
[10:53:36] ... Done.
[10:53:36] - Shutting down core
[10:53:36]
[10:53:36] Folding@home Core Shutdown: FINISHED_UNIT
[10:53:40] CoreStatus = 64 (100)
[10:53:40] Sending work to server


[10:53:40] + Attempting to send results
[10:53:56] + Results successfully sent
[10:53:56] Thank you for your contribution to Folding@home.
[10:53:56] + Number of Units Completed: 390

[10:54:00] - Preparing to get new work unit...
[10:54:00] + Attempting to get work packet
[10:54:00] - Connecting to assignment server
[10:54:01] - Successful: assigned to (171.64.122.112).
[10:54:01] + News From Folding@Home: Welcome to Folding@Home
[10:54:01] Loaded queue successfully.
[10:54:01] - Deadline time not received.
[10:54:02] + Closed connections
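
Comparing the two logs in this thread: in the failing one, FINISHED_UNIT is followed directly by the client shutdown and then MISSING_WORK_FILES after the restart, with no "Sending work to server" line in between. In the working one, "Results successfully sent" shows up within seconds of FINISHED_UNIT. A minimal sketch of a checker for that pattern follows. It assumes only the line formats visible in these two logs; the function name is invented for illustration.

```python
import re

# Matches lines such as "[12:11:31] Folding@home Core Shutdown: FINISHED_UNIT"
LINE = re.compile(r"^\[(\d\d:\d\d:\d\d)\]\s?(.*)$")


def unsent_finished_units(log_lines):
    """Return timestamps of FINISHED_UNIT shutdowns never followed by a send.

    A finished unit counts as sent if "Results successfully sent" appears
    before the next core shutdown line or the end of the log.
    """
    pending = None  # timestamp of a FINISHED_UNIT still waiting for upload
    unsent = []
    for raw in log_lines:
        m = LINE.match(raw.strip())
        if not m:
            continue
        stamp, text = m.groups()
        if "Core Shutdown:" in text:
            if pending is not None:
                unsent.append(pending)
            pending = stamp if "FINISHED_UNIT" in text else None
        elif "Results successfully sent" in text:
            pending = None
    if pending is not None:
        unsent.append(pending)
    return unsent


failing = [
    "[12:11:31] Folding@home Core Shutdown: FINISHED_UNIT",
    "[02:18:54] Folding@home Core Shutdown: MISSING_WORK_FILES",
]
working = [
    "[10:53:36] Folding@home Core Shutdown: FINISHED_UNIT",
    "[10:53:40] Sending work to server",
    "[10:53:56] + Results successfully sent",
]
print(unsent_finished_units(failing))  # ['12:11:31']
print(unsent_finished_units(working))  # []
```

To run it over a whole log, pass the open file object instead of a list. The log file name is not shown in this thread, so use whatever your client writes.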


There is also a verbosity switch that you can use when running the client. Increasing it will show more detail in the log file, which is useful when troubleshooting.

dnar
16th September 2005, 09:24
Yeah, this is the same machine. A real bummer, many days of crunching lost. 260 units each CPU.

Your log file looks like those from my other 2 machines that have successfully returned work.

pelligrini
16th September 2005, 09:45
I'm thinking there is something strange going on with your OS.

Did both clients have the same problem?

dnar
16th September 2005, 09:56
Yup. The clients sat for hours doing nothing (at the point in the logs where the upload normally occurs).

This is my dually server running RedHat 9 (old). I don't wish to upgrade the OS, as it runs Win4Lin with a patched kernel. I can't get a patched kernel for Fedora without running a vanilla kernel, and the RedHat kernels rock in comparison, so this one machine stays at RH9 for a little longer, at least until I have XP running adequately on Fedora using QEMU (almost there, just slow).

dnar
21st September 2005, 06:24
Well my dually has finally started registering results! Yah!

Daniel, Laura, and Nora
22nd September 2005, 04:07
"...until I have XP running adequately on Fedora using QEMU..." Say what?

dnar
22nd September 2005, 06:02
Sorry, it's Penguin talk!

QEMU is an open-source virtual machine for Linux (and other OSes). I have XP running inside of QEMU on Linux. :)

pelligrini
23rd September 2005, 17:16
I was just wondering why you would want to run it like that?

TDKozan
23rd September 2005, 17:59
Because he can?

TK

Daniel, Laura, and Nora
23rd September 2005, 23:15
I've got it completely sussed out. I wouldn't discuss it here except things are so quiet at TGC that word will never get out. There are things about Wayne that just shouldn't be discussed in an open forum. Deep dark kinky embarrassing things that Shu told me in a long drunken PM a while back. The fact is, Wayne has a Windows fetish. Late at night he indulges in endless conversations with Clippy, the paperclip help friend. He gets hard with MicroSOFT. The Linux is just an elaborate cover.

dnar
24th September 2005, 00:18
make uninstall

dnar
24th September 2005, 08:13
Weird, my dually has done it again! Days wasted...

Time to look into any possible issues with C libraries on Redhat 9.

pelligrini
26th September 2005, 09:37
I wouldn't doubt it DLN. :D

dnar
28th September 2005, 10:35
My duallie will turn in another whoonit tonight, let's see what happens - or doesn't happen :cry:

NEWS FLASH: My 466 Smelleron is at 75% after 21 days crunching!!! All for 240 points... Damn, that machine felt really fast when I bought it 5 years ago...

dnar
28th September 2005, 11:25
Found the problem. The last glibc package update for RH9 causes FAH to fail when uploading. More specifically, it is only the i686 package that is the problem. "Upgrading" to the i386 package should solve the problem; I will report back in a day or two.
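
For anyone wanting to check which glibc build they have installed, here is a rough sketch that asks rpm for the package architecture. It assumes an RPM-based system with a current Python 3 available (which a stock RH9 box would not have), and the helper names are invented for illustration. The rpm options used are the standard -q and --queryformat.

```python
import subprocess

# One line per installed package: name, version-release, architecture.
QUERY = "%{NAME} %{VERSION}-%{RELEASE} %{ARCH}\n"


def parse_rpm_query(output):
    """Turn lines of 'name version-release arch' into 3-tuples."""
    packages = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3:
            packages.append(tuple(parts))
    return packages


def installed_glibc():
    """Ask rpm which glibc builds are installed (needs an RPM-based system)."""
    result = subprocess.run(
        ["rpm", "-q", "--queryformat", QUERY, "glibc"],
        capture_output=True, text=True, check=True,
    )
    return parse_rpm_query(result.stdout)


if __name__ == "__main__":
    try:
        for name, version, arch in installed_glibc():
            note = "  <- i686 build" if arch == "i686" else ""
            print(f"{name} {version} ({arch}){note}")
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"could not query rpm: {err}")
```

This only reports the architecture. It does not tell you whether a given build is the one that breaks uploads.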

pelligrini
28th September 2005, 11:47
Cool, I hope it works for you.

How did you find out what it is?

dnar
28th September 2005, 12:04
Thx. Me 2.

Folding forums.

MikeTimbers
28th September 2005, 12:08
My duallie will turn in another whoonit tonight, let's see what happens - or doesn't happen :cry:

NEWS FLASH: My 466 Smelleron is at 75% after 21 days crunching!!! All for 240 points... Damn, that machine felt really fast when I bought it 5 years ago...

And how much energy did that use and is it really worth it? With just two computers I'm producing over 400 points per day!
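
To put rough numbers on that comparison, here is a back-of-the-envelope sketch. It assumes progress on the unit is linear in time, so 75% after 21 days means about 28 days in total. The function name is just for illustration.

```python
def points_per_day(points, days_elapsed, fraction_done):
    """Estimate points per day for one unit, assuming linear progress."""
    total_days = days_elapsed / fraction_done
    return points / total_days


# Figures quoted in this thread: 240 points, 75% done after 21 days.
celeron = points_per_day(points=240, days_elapsed=21, fraction_done=0.75)
print(f"Celeron 466: about {celeron:.1f} points/day")          # about 8.6
print(f"400 points/day is about {400 / celeron:.0f}x that rate")  # about 47x
```

Energy use is not covered here since nobody in the thread has posted power figures.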

dnar
28th September 2005, 12:35
I wonder how the Linux client performs relative to the M$ client?

pelligrini
28th September 2005, 14:16
I don't think there is much of a difference, if any. The way I understand it, the client is just a frontend for the different cores.