New Genome Performance Data


wbierman
6th September 2001, 02:52
Whoever asked about performance... new data.

Upgraded my Web server from a 900MHz T-bird @1.1GHz to a new T-bird 1.4GHz @ 1.57GHz

KDfold now reports a Sequence at 00:10:34 for a 72aa

The old value was 00:17:38 for the same 72aa

That knocks 3 hours off the total processing time for a 72aa
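As a rough check on that figure, here is a minimal Python sketch. It assumes a work unit is 30 sequences (the count mentioned further down the thread); the helper name `to_seconds` is made up for illustration and is not part of the client.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' or 'MM:SS' string to whole seconds."""
    total = 0
    for part in hms.split(":"):
        total = total * 60 + int(part)
    return total

SEQUENCES_PER_WU = 30  # assumption: 30 sequences per work unit

old = to_seconds("00:17:38")  # 900MHz T-bird @ 1.1GHz
new = to_seconds("00:10:34")  # 1.4GHz T-bird @ 1.57GHz

saved_per_sequence = old - new
saved_per_wu_hours = saved_per_sequence * SEQUENCES_PER_WU / 3600

print(f"{saved_per_sequence} s saved per sequence, "
      f"{saved_per_wu_hours:.2f} h saved per work unit")
```

Under that 30-sequence assumption the saving works out to a bit over three and a half hours per 72aa work unit.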

Jodie
6th September 2001, 03:10
I'm getting 11:04/sequence for a 65aa on a Tbird 1.33 @ 1.33 and 10:50 for a 65aa on a Tbird 1.33 @ 1.42 both on Linux. I'm seeing 20:22/sequence for a 68 on a 1G PIII running W2KP and 21:18 for a 72 on a 1.13G PIII running Linux. A Dell 600 PIII is doing 21:00 on a 65aa. A 550 Classic Athlon is doing 29:10 on a 68aa. I'll try to keep a chart on the various speed machines and build a table of identical protein counts over the next week so we can get a good comparison.

Jodie
6th September 2001, 03:32
Actually, I have enough machines at any given time to see some rather enlightening data-points... It should be interesting over a week. Some numbers weren't making sense until I took OS into account.

I have an NT4 machine, 3x98 machines, a W2K Pro machine, a W2K AS machine and my own Linux kernels.

I have to say - I'm going to dump NT4 on that machine and install Linux. Poor 700 AMD! A 550AMD with Win98 is outrunning a 700AMD with NT flat out big time majorly. A 68aa on the NT4 machine does a sequence in 32:20. The 550 AMD does a 68aa sequence in 29:10. WOW.

The 1G-P3 with Win2k does a 68 in 19:52. The 1G-P3 running Linux does a 68 in 16:54. The 850 PIII running Linux is doing a 68 sequence in 21:16. Basically, the W2k erases most of the difference between an 850 and a 1G. The Tbirds are working on odd-sized WU's right now, but the 1.33@1.33 just downloaded a 68aa, so I'll have a comparison there in a frame or two.

[edit - the 1.33@1.33 is doing a 68aa sequence in 11:28 right now. Pretty consistent with what I'd expect. The OS is Linux. That's about twice as fast as the 850, and faster than raw clock can account for compared with a P3-1G]
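To put a number on "faster than raw clock can account for", here is a small Python sketch. It assumes time per sequence would scale inversely with clock speed if nothing else differed, which is a simplification, and it treats the Tbird 1.33 as 1330 MHz. The function names are hypothetical.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'MM:SS' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def scale_by_clock(seconds: float, from_mhz: float, to_mhz: float) -> float:
    """Predicted time at to_mhz if speed scaled with clock alone."""
    return seconds * from_mhz / to_mhz

# 68aa sequence times on Linux, taken from the post above
p3_1g = to_seconds("16:54")       # P3 at 1000 MHz
tbird_actual = to_seconds("11:28")  # Tbird 1.33 @ 1.33

# What the Tbird would do if it were just a P3 clocked at 1330 MHz
tbird_predicted = scale_by_clock(p3_1g, 1000, 1330)

print(f"predicted {tbird_predicted:.0f} s, actual {tbird_actual} s")
```

The predicted time comes out longer than the measured one, so on these figures the Tbird is doing more per clock than the P3, not just running at a higher clock.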

These are, of course, preliminary results. I'll keep track of a sampling of same-sized WUs over time and post the scores with drift. I'll keep the raw data for anyone that wants it... If this continues to hold, I'm afraid all but the three machines that MUST be windows'd will be converted. Again, though, all but those three machines aren't desktops, they're GenomeToasters...

wbierman
6th September 2001, 03:49
Interesting...

My other Win2K AS running @ 1.1GHz is doing a 65aa @ 14:15

wbierman
6th September 2001, 05:05
I've edited my first post to reflect my last observation.

1st look = 11:34 per sequence
2nd look = 10:30
3rd look = 10:45

We need an average of the 30 sequences....
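A minimal Python sketch of that averaging, using the three looks above as sample input. With a full work unit you would pass all 30 per-sequence times instead; the function names are made up for illustration.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'MM:SS' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def to_mmss(seconds: float) -> str:
    """Format a number of seconds as 'M:SS', rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"

def mean_time(times: list[str]) -> str:
    """Average a list of 'MM:SS' readings and return it as 'M:SS'."""
    total = sum(to_seconds(t) for t in times)
    return to_mmss(total / len(times))

looks = ["11:34", "10:30", "10:45"]
print(mean_time(looks))  # 10:56
```

Three readings are only a snapshot; the point of averaging all 30 is that single looks at the KDfold display can be a minute or more apart.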

MikeTimbers
6th September 2001, 07:02
P3-667 W95 99aa 50:12 per sequence (yuck)
P3-933 W2KAS 60aa 16:59 per sequence
P3-1G W2KAS 66aa 19:28 per sequence
P3-1G W2KAS 66aa 18:30 per sequence


The last two are on the same machine without setting affinity. Should be identical but I guess the system threads are affecting one more than the other. <shrug>

siggy
6th September 2001, 07:09
WOW wbierman, you are a Veteran Professional. Congrats man.

400 posts. Way to go. :D

Jodie
6th September 2001, 10:29
Totally agreed, because that 68 on the 1.33@1.42 actually did 9:10/seq over time (it's on 71% of 29 now)

I don't think kdfold is especially 'trustable' early on... We need to assemble a "set of proteins" for profiling that are identical... Upload them to a file server. I'll volunteer to house them- I have unlimited bandwidth and storage...

dnar
6th September 2001, 10:33
Don't you mean a 65aa....

To quote you "10:50 for a 65aa on a Tbird 1.33 @ 1.42 both on Linux"

Jodie
6th September 2001, 10:45
That was my screw-up last night. It was actually a 68. When I wrote it down, I looked at the machine below it for sequence size. I'm reading them off the top now, so I won't make that mistake again... It finished the WU at 9:46 just about two seconds ago.. The 1.33@1.33 is actually working on a 68 now too. Running at 12:08/seq, but only at 56% of seq 7, so too early to tell.

I'm going to write a perl script to grok out all the scrlog files and solve that with a spreadsheet of ALL the times over a thousand plus genes... That's the only real way to do it, I think, although I'd still rather we use standardized genes...
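The log format isn't described in this thread, so the following Python sketch skips the parsing step and only shows the grouping and averaging part. It assumes the times have already been pulled out as (machine, protein length, time per sequence) records; the record layout, the machine labels, and the function names are all hypothetical, and the sample values are figures quoted earlier in the thread.

```python
from collections import defaultdict
from statistics import mean

def to_seconds(mmss: str) -> int:
    """Convert an 'MM:SS' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def average_by_machine_and_length(records):
    """Group times by (machine, aa length) and return mean seconds per group."""
    grouped = defaultdict(list)
    for machine, aa, time_per_seq in records:
        grouped[(machine, aa)].append(to_seconds(time_per_seq))
    return {key: mean(values) for key, values in grouped.items()}

# Sample records: (machine label, protein length in aa, time per sequence)
records = [
    ("P3-1G W2K", 68, "20:22"),
    ("P3-1G W2K", 68, "19:52"),
    ("P3-1G Linux", 68, "16:54"),
]

for (machine, aa), secs in average_by_machine_and_length(records).items():
    print(f"{machine:12s} {aa}aa  {secs:.0f} s/seq")
```

Keying on both machine and length keeps different-sized proteins from being averaged together, which is the problem standardized genes would also solve.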

dnar
6th September 2001, 10:58
So I gather you have networking and Samba running now Jodie. Are you back to the standard kernel? (BTW, a copy of your modified kernel would be terrific, I am still waiting, do you have it packaged yet?) :D



Don't forget to measure the seq time as an average of 30. I find quite a spread between the 30 sequences.

Jodie
6th September 2001, 11:18
Nod. I am averaging it.

Nope, I just put what I needed back into the kernel to support Samba. And I will get a copy to you soon! Promise! As soon as this stuff with Lucent and with AOL is under control...

dnar
6th September 2001, 11:22
Originally posted by Jodie

Nod. I am averaging it.

Nope, I just put what I needed back into the kernel to support Samba. And I will get a copy to you soon! Promise! As soon as this stuff with Lucent and with AOL is under control...

What kernel tree did you base on, and also, what version of gcc are you compiling with ???

pelligrini
6th September 2001, 11:31
Originally posted by Jodie
I'm going to write a perl script to grok out all the scrlog files and solve that with a spreadsheet of ALL the times over a thousand plus genes... That's the only real way to do it, I think, although I'd still rather we use standardized genes...
That would be really helpful.

I think it was JPS who came by a few months ago wanting everyone's gah files to run some numbers on the stats weighting. I haven't heard about any results. I'm sure he's got a lot of useful data for some benchmarks and times. He had everyone list their type of processors and speed.

phil
6th September 2001, 11:41
Originally posted by pelligrini

That would be really helpful.

I think it was JPS who came by a few months ago wanting everyone's gah files to run some numbers on the stats weighting. I haven't heard about any results. I'm sure he's got a lot of useful data for some benchmarks and times. He had everyone list their type of processors and speed.


Yeah, there was also a website that could examine your logfile and would give you a list of times versus AA. I NEED to find that link!

Jodie
6th September 2001, 12:00
Originally posted by dnar


What kernel tree did you base on, and also, what version of gcc are you compiling with ???

2.4.2-2, 2.98 gcc. I'll pull the lib versions - they're whatever is on my development system...

Put a firewall in front of these boxes and don't plan on using them for ANYTHING else... Also, I need to know what chipsets you want because each is chipset dependent. Or I'll send you the source for each chipset I support. It's pretty easy to modify for your chipset...

Bruce
6th September 2001, 15:21
Originally posted by phil



Yeah, there was also a website that could examine your logfile and would give you a list of times versus AA. I NEED to find that link!

I lost the link too -- and I don't remember his name, but he wrote the script, placed it on a website, asked for donated information, and then didn't publish his results.

It seems he had an axe to grind. His sole purpose was to refute the new WU calculations -- which he did in email to Stefan. Apparently, however, all the data didn't fit his assumptions so the data he submitted was from a single reference machine. He came up with (IMHO) an overly complicated formula which is being considered by Stefan. (See Stefan's to-do list.)

I asked him how closely the other data could be made to fit the same curve -- and never got an answer.

Personally, I figure the current WU formula is close enough that we shouldn't mess with it. As long as it isn't so distorted that it makes folks hoard short (or long) proteins, just because of the stats, it is good enough. The science needs to be able to ask for 30 genes (or whatever) and get back somewhere between 30 and 300. I'll bet they got back 3000 (or 30000) of those short ones before Stefan made the change.

I'd also be interested in some kind of accuracy calculation. (How much variation is there between sequences? How much variation is there between proteins of identical length? etc.)
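One simple way to express that variation is the sample standard deviation of the per-sequence times, plus the spread as a percentage of the mean. This is only a sketch of one possible measure, not anything the project uses; the sample times are the three readings wbierman posted above, and a real run would feed in all the sequences of a work unit.

```python
from statistics import mean, stdev

def to_seconds(mmss: str) -> int:
    """Convert an 'MM:SS' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def spread(times: list[str]) -> dict:
    """Mean, sample standard deviation, and coefficient of variation."""
    secs = [to_seconds(t) for t in times]
    avg = mean(secs)
    sd = stdev(secs)  # sample standard deviation; needs at least two values
    return {"mean_s": avg, "stdev_s": sd, "cv_percent": 100 * sd / avg}

sample = ["11:34", "10:30", "10:45"]
result = spread(sample)
print(f"mean {result['mean_s']:.0f} s, "
      f"stdev {result['stdev_s']:.1f} s, "
      f"cv {result['cv_percent']:.1f}%")
```

The same function run once per protein length would answer the second question, how much proteins of identical length differ from each other.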

We need a good set of reference data. It would be nice to organize it and publish it to the whole community.

Go for it.

phil
7th September 2001, 09:50
I am just running my new Athlon 4 in at 1200MHz....It has just completed a 68AA in 7hrs and 40 mins which is just over 15mins per sequence. I am not sure it is any faster than a regular T-Bird at 1200MHz. I will start to overclock soon.

Dustin
7th September 2001, 10:19
Cool, add a couple more.:D That Tbird 1.2G I just threw together really stinks. 1.3G seems to be the max.:mad: It's one of the first 1.2's released. Oh well, it was only $50 for the board and CPU.

phil
7th September 2001, 10:35
Originally posted by Dustin
Cool, add a couple more.:D That Tbird 1.2G I just threw together really stinks. 1.3G seems to be the max.:mad: It's one of the first 1.2's released. Oh well, it was only $50 for the board and CPU.


You can't argue with that for $50!! I am going to start messing with the mem timings in the BIOS of the KG7....have you seen all of the settings? Tweaker's paradise :D

Dustin
7th September 2001, 10:41
I'm not familiar with that board.:(

phil
7th September 2001, 11:04
Damn!! Booted up straight away at 150x10 and the Crucial PC2100 is maxed out at cas2. I am using custom timings from a site I found (http://www.icronticforums.com/showthread.php?s=&threadid=12201).....the Abit KG7 is simply awesome!!

Dustin
7th September 2001, 11:08
Cool!:D I won't comment about Abit though.;)

eldiablo
17th September 2001, 03:26
Originally posted by phil



Yeah, there was also a website that could examine your logfile and would give you a list of times versus AA. I NEED to find that link!

Genome Log Scanner (http://12.36.1.8/GenomeLogScan.asp)

this is currently on my old webserver, so i don't know how fast it will be. my new server (dual mp's) is waiting for its replacement hard drive.

eldiablo
17th September 2001, 03:32
Originally posted by phil
Damn!! Booted up straight away at 150x10 and the Crucial PC2100 is maxed out at cas2. I am using custom timings from a site I found (http://www.icronticforums.com/showthread.php?s=&threadid=12201).....the Abit KG7 is simply awesome!!

KG7?! You sir, suck. :p ;)

I'm holding out for the KR7 and/or an nvidia chipset board. :D

phil
17th September 2001, 04:05
Originally posted by eldiablo


Genome Log Scanner (http://12.36.1.8/GenomeLogScan.asp)

this is currently on my old webserver, so i don't know how fast it will be. my new server (dual mp's) is waiting for its replacement hard drive.


Thanks man, you're a star :) ...is there any chance I could have a copy of that for my own personal use?

edit: Does this report the new wu weightings and could you add the total average time per wu?