Tim May

In article , Lawrence
Glickman wrote:

On 04 Dec 2004 17:10:21 GMT, Ian Stirling
wrote:

I can't imagine running anything my 2.5 giger can't handle. It gets
the job done.


A 2.5GHz Pentium may be only a couple of times faster than a P100 for
some tasks.

If it requires random access to large (greater than the on-chip cache)
amounts of memory, random access speed of chips hasn't really changed
in the past decade.

Some 'supercomputer' type computers took extreme measures to get round this.


OK, I put in 512MB of RAM and this thing is working as advertised.
Less than 512 is not a good idea, because I can see from my stats that
I only have 170MB of unused physical memory at the moment.

Once you get enough RAM so you don't have to do HD read/writes, you're
on your way to speed. But M$ has shafted everybody, because M$ loves
huge swap files. The more RAM you have the less you will need to
read/write to this swap file, and the more smoothly things seem to
run.

For my own purposes, this is all the computer I need. I suppose if
you had to, you could put up to 1 gig of RAM into this thing, but from
what I can see, that would be a waste of money. It wouldn't be used.


The issue isn't about the hard drive...I think we all take it as
obvious that if hard disk accesses are frequent, performance suffers.

The issue is that there are at least three different speed regimes for
solid-state memory:

-- relatively slow dynamic RAM (DRAM) installed as "main memory"--this
is what people talk about when they say they have 512 MB of RAM. Access
speeds are much slower than processor cycle speeds.

-- static RAM (SRAM), which is several times faster than DRAM because
of the way the circuitry works (cross-coupled inverters rather than bit
lines and storage in depletion regions).

(The reason DRAMs and SRAMs have their niches is because DRAMs can be
built with only one transistor per memory cell, whereas SRAMs tend to
take 4 transistors per memory cell--the canonical design of
cross-coupled inverters making a latch takes 4 transistors and 2
resistors. Various clever designs keep reducing the sizes, but never
down to what a single-transistor DRAM has been at for nearly 30 years.)

The SRAM may be in a separate module, either next to the processor, or
even in the same package. Or some of it may be integrated onto the
chip, as various levels of cache.

The processor looks for references in cache (aided by structures like
"TLBs," translation lookaside buffers, which cache recent address
translations). If it finds what it needs
in cache, especially on-chip cache (or even faster, on-chip registers,
the fastest access of all), then memory access can happen in just one
or a couple of processor cycles.

If what it needs is NOT in cache, a "cache miss" has occurred, and the
processor gets a block of memory from DRAM and puts it in cache.
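Here's a toy sketch of that miss-and-fill behavior (Python, with made-up
block and cache sizes, and a simple direct-mapped layout--real CPUs are
set-associative and far more complicated):

```python
# Toy direct-mapped cache simulator (illustrative sketch, not any real CPU).
# Memory is split into fixed-size blocks; each block maps to exactly one
# cache line. A miss fetches the whole block from "DRAM" into the cache.

BLOCK_SIZE = 64      # bytes per cache block (a common real-world size)
NUM_LINES = 8        # a tiny cache, just for demonstration

cache = {}           # line index -> block tag currently stored there
hits = misses = 0

def access(address):
    """Look up a byte address; count a hit, or a miss plus a block fill."""
    global hits, misses
    block = address // BLOCK_SIZE          # which memory block is this byte in?
    line = block % NUM_LINES               # which cache line must hold it?
    if cache.get(line) == block:
        hits += 1
    else:
        misses += 1
        cache[line] = block                # "fetch the block from DRAM"

# Sequential access misses once per block, then hits on the neighbors:
for addr in range(1024):
    access(addr)

print(hits, misses)   # 1008 16 -- one miss per 64-byte block, 16 blocks
```

The point of the sketch: fetching a whole block on each miss is what makes
the later accesses to nearby addresses cheap.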

The analogy is like this:

Imagine you have several pieces of things you need--papers, pens,
stationery, etc. Some are right on your desk, where they can be
accessed immediately. Some are stored in drawers or file cabinets,
where they can be gotten to quickly, but not as quickly as the "cached"
items. And some are stored in other rooms.

So you need a pen. You can't find one. You suffer a cache miss, and your
work stalls for a while. You root around in your desk and find one. But
since you have already paid the price of stalling and going into your
desk, you might as well "refill the cache" with several pens, and maybe
a stapler, etc.

(But you don't want to completely flush your old cache, as you may need
some of its items soon. So you only partly flush the cache and replace
it with stuff from your desk drawer. Strategies for what to flush depend
on things like "oldest gets flushed" and "least recently used" and even
some metrics coming from what you expect to be working on.)

Worse is having to go out to a storage box in the garage for something.
This is like accessing main memory.

And even worse is having to drive over to Office Depot or Staples for
something...like staples. This is like loading from a hard disk, with
access time thousands of times slower than main memory. And even slower
is having to access from a tape drive (rare for most home users these
days) or even floppies stored in a box somewhere. Hundreds of millions
of times slower than memory accesses. And because the cost of this
access is so high, you don't just get the actual staples you'll need,
you get more. You "swap from disk into main memory" and, when you
resume work, you put some of those staples in a stapler kept on your
desk....you have swapped into main memory and then loaded cache.

Look at your processor's specs and you'll see references to how much "L1" and/or
"L2" (levels, a measure of "closeness" to the processor core) cache it
has. One of the biggest differences between consumer-grade CPUs and
server-grade CPUs--like the difference between a Celeron and a Xeon--is
the amount of cache. Some processors have up to 8 MB of on-chip cache,
assuring higher "hit rates" on the cache.
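Why a few percent of extra hit rate matters so much: average access time
is the hit cost plus the miss rate times the miss penalty. A quick sketch
(Python; the latency numbers are illustrative assumptions, not
measurements of any particular chip):

```python
# Average memory access time (AMAT): hit cost + miss_rate * miss penalty.
# Latencies below are rough, assumed figures for illustration only.

L1_LATENCY = 1        # cycles for a hit in on-chip cache
DRAM_LATENCY = 200    # cycles for a miss that goes out to main memory

def amat(hit_rate):
    """Average cycles per memory access at a given cache hit rate."""
    return L1_LATENCY + (1 - hit_rate) * DRAM_LATENCY

print(round(amat(0.95), 2))   # 11.0 cycles per access
print(round(amat(0.99), 2))   # 3.0 cycles: a few percent more hits, ~4x faster
```

That nonlinearity is why a big on-chip cache can matter more than raw
clock speed for memory-bound work.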

Supercomputers use various strategies for boosting performance. Lots of
fast cache memory is one of them. Another is to have lots of
processors. The current top-ranked supercomputers use lots of cache
(one of them uses the Itanium-2, with, IIRC, 4 MB of cache per
processor).

One of the interesting trends is to emphasize the memory over the
processor and, instead of attaching memory to processors, consider the
processors to be embedded in a sea of memory. This is the "processor in
memory" (PIM) approach. And it's used in the IBM "Blue Gene"
supercomputer, currently the fastest supercomputer in the world (the
aforementioned Itanium-2-based machine is the second fastest).

--Tim May