[SGVLUG] bit order (was Linux based web-server appliance)
David Lawyer
dave at lafn.org
Wed May 24 14:58:25 PDT 2006
On Fri, May 19, 2006 at 01:51:01PM -0700, Dustin Laurence wrote:
> On Fri, May 19, 2006 at 12:57:42PM -0700, David Lawyer wrote:
> > [snip]
> > Consider transmitting the number 0xABC. If A is sent first, what does
> > it mean? 10, 160, 2560, etc.? In order to decipher it you need to
> > scan to C and note that there are 3 hex digits. But if C, the low order
> > byte is sent first (little endian) then you know what it means: just C
(12). So little endian does make sense, even though we don't write
numbers that way and thus have to scan to the end of a number before
we know what the 1st digit represents.
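[The point above can be sketched in a few lines of Python (my own
illustration, not from the original post): when the low-order digit
arrives first, each digit's place value is known the moment it arrives,
with no need to scan ahead to a terminator.]

```python
def decode_little_endian(digits, base=16):
    """Decode a stream of digits sent least-significant first."""
    value = 0
    weight = 1  # base**i for the current digit position
    for d in digits:
        value += d * weight
        weight *= base
    return value

# 0xABC sent low-order digit first is the stream C, B, A:
print(decode_little_endian([0xC, 0xB, 0xA]))  # 2748 == 0xABC
```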
>
> Quite to the contrary, the provably optimal algorithm is inherently
> big-endian,
Not necessarily. See my remarks later on.
> and the obvious little-endian algorithm is both expensive and
> numerically unstable (when approximate floating-point numbers are
> being represented--obviously this is irrelevant in the usual
> computer case of exact integers, but as a physicist I felt it
> necessary to stipulate good numerical behavior). And it's
> interesting, if you're the sort of person that finds these things
> interesting.
>
> Re-constructing a number from its digits is the problem of
> evaluating a polynomial, since a positional notation simply writes
> the coefficients of the unique canonical form where they have values
> 0 <= a_i < b for the known base b. Since you say "transmitting" we
> have the constraints that we want the evaluation algorithm to be
> serial, i.e. to be able to work with one digit (bit, digit, word,
> byte, it doesn't matter for this purpose since all of them can be
> viewed as digits in an appropriate base) at a time without backing
> up. Given that, we also want to need to remember as little state as
> possible for simplicity and efficiency. And of course we want it to
> be a good algorithm in the same sense as for any polynomial evaluation,
> sequential or not: we require numerical stability and computational
> efficiency.
>
> By lucky chance the provably optimal general solution, Horner's
> rule, is serializable and needs no state. It is very familiar to
> those of us who keep an RPN calculator with great fondness and live
> in terror of the day it fails and we can't replace it.
I use an RPN calculator and never thought about the state. There's
just the contents of the 4 stack registers. I always thought of it as
just using postfix notation, where 2 3 + results in 5. You just place
the operator + after the two operands 2 and 3.
But back to the question of efficiency. First, to put the case in
context: most of the time there is no need to extract a number from
its digits, because the number is in binary and is simply kept in
binary form. So extracting a number from its digits is really just
changing the base, like from binary to decimal or hexadecimal.
Now the value of the number "def" (in base b) is d x b^2 + e x b^1
+ f x b^0. But if we are going to do a lot of such calculations in
base b, we precalculate b^0, b^1, b^2, etc. So let B be the vector of
the powers (..., b^3, b^2, b^1, b^0) and X be the vector of the digits
(..., d, e, f). The value of the number is B.X, where . is the inner
product. B is a constant vector. Now the efficiency of calculating
B.X doesn't depend on whether we start multiplying with the low
order digit of X or with the high order.
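[A minimal sketch of this inner-product evaluation, with the power
vector B precomputed once; the function names are my own. Each digit
costs one multiply and one add no matter which end we start from.]

```python
def powers_of(base, n):
    # Precomputed weight vector B = (b^(n-1), ..., b^1, b^0)
    return [base ** i for i in range(n - 1, -1, -1)]

def value_by_inner_product(digits, weights):
    # B . X: one multiply and one add per digit, independent of
    # whether we traverse X low-order-first or high-order-first.
    return sum(d * w for d, w in zip(digits, weights))

B = powers_of(16, 3)                               # [256, 16, 1]
print(value_by_inner_product([0xA, 0xB, 0xC], B))  # 2748 == 0xABC
```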
> It is written inside-out on paper, but that is because that is how
> you denote big-digit-first evaluation. It is also simple: start
> with the value of the first digit, and then for each successive
> digit multiply the current value by the base (with fast shift
> instructions unless a moron chose the base) and add the new digit.
The above has n-1 multiplications and n-1 additions, the same as
the inner product of the B and X vectors I just described.
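[For comparison, here is a sketch of Horner's rule as Dustin describes
it, most-significant digit first (the code is my own illustration).
The running value is the only state carried between digits.]

```python
def horner(digits, base=16):
    it = iter(digits)
    value = next(it)  # start with the most significant digit
    for d in it:
        # One multiply and one add per remaining digit:
        # n-1 multiplies and n-1 adds in total.
        value = value * base + d
    return value

print(horner([0xA, 0xB, 0xC]))  # 2748 == 0xABC
```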
> You specified that it's nice not to know how many digits there are
> beforehand--this means there is a terminator and this is the best
> case for Horner's rule since the current value is the only required
> state. The naive little-endian algorithm requires an additional
> piece of state in this case, the current digit count. It also
> requires O(n^2) multiplications rather than the O(n) of Horner's
> rule.
O(n^2) is true if we don't store the values of b^n in a table. But
for repeated operations one would store them in a table, so it's O(n).
In fact, the b^n values can be hard-coded into the hardware or
software.
> With numbers known to fit in a machine register this is
> irrelevant because the multiplies are shifts and probably cost less
> than anything else (and will be O(n) even for naive little-endian),
> but with arbitrarily large numbers and bignum style representations
> this can really matter.
For little endian, we can wait till we get all the digits and then use
Horner's rule, inspecting the most significant digit first. OK, I
admit I was wrong in claiming an advantage for little endian. I guess
it really doesn't make much difference.
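[That buffered approach can be sketched like this (my own
illustration): collect the whole little-endian stream, then run
Horner's rule over it from the most significant end.]

```python
def decode_buffered(little_endian_digits, base=16):
    # Wait for all the digits, then apply Horner's rule starting
    # from the most significant digit, i.e. over the reversed list.
    value = 0
    for d in reversed(little_endian_digits):
        value = value * base + d
    return value

print(decode_buffered([0xC, 0xB, 0xA]))  # 2748 == 0xABC
```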
-----------------------topic change-----------------------------------
>
Dustin:
> > > Even a Pentium or Pentium II can consume 200 W, so I'd have
> > > expected a 486 to consume more.
> >
David:
> > There's a page that shows how, as speeds increased from the 486 to
> > Pentium I to Pentium ... to Pentium IV, the power kept going up.
> > And within for example Pentium I's the power went up as speed
> > increased.
>
> Of course it does--for a given design the power consumed is
> proportional to the clock speed (well, to zeroeth order). And there
> is a thermodynamic minimum cost which is also proportional to clock
> speed because every copy that overwrites a value creates entropy,
> and the faster the copies the faster the inherent power.
>
> Anyway, nobody cared about CPU consumption in that era and so it
> wasn't optimized.
The people who made my 486 cared, since they had a "green" model with
very low consumption, which was advertised in the motherboard manual
along with estimates for the HD and monitor.
> If you take your favorite speed rating per watt you'll find that a
> 486, or any x86 chip (at the very least before the Pentium-M), is
> utterly blown away by chips engineered for low power.
But what about the consumption per hour of use? I could type and read
email just as fast on the 486 as on a PC of more recent vintage. And
the modem speeds were about the same, at least with an analog modem.
So perhaps newer servers save energy, but for home PCs used mostly for
reading and writing, the old 486 PCs might win. I'll agree that a
desktop PC designed like a laptop, with hibernation mode, etc., would
use less energy. But how many desktops are designed like this?
[snip]
David Lawyer