Spectacular!

GTBecker · Post by **GTBecker** » 09 February 2008, 17:24 PM

A 32-bit floating-point multiplicative inverse of a 3x3 matrix takes 1830µs on a ZX-24a; it takes 464µs on a ZX-24n, a 3.95x speedup. Fantastic!

FYI, this is 8.16 times the speed of a BX-24, which takes 3789µs, and 7.45 times as fast as a BX-24p (3457µs).

Superb!
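
For anyone checking the ratios above: each one is simply the slower device's time divided by the ZX-24n time. A minimal sketch in C (the function name `speedup` is made up for illustration; the timings are the ones quoted in this post):

```c
/* Hypothetical helper: how many times faster the new device is,
   given two execution times in the same unit (here microseconds). */
double speedup(double old_time_us, double new_time_us)
{
    return old_time_us / new_time_us;
}
```

Calling `speedup(1830.0, 464.0)`, `speedup(3789.0, 464.0)` and `speedup(3457.0, 464.0)` reproduces the three quoted ratios to within about 0.01.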

mikep · Post by **mikep** » 09 February 2008, 19:34 PM

GTBecker wrote:A 32-bit floating-point multiplicative inverse of a 3x3 matrix takes 1830µs on a ZX-24a; it takes 464µs on a ZX-24n, a 3.95x speedup. Fantastic!

FYI, this is 8.16 times the speed of a BX-24, which takes 3789µs, and 7.45 times as fast as a BX-24p (3457µs).

Yes, this is good but not entirely unexpected. The clock speed of the ZX devices is twice that of the BX, so not surprisingly the ZX-24a takes about half the time. The 4x speedup is good and shows the overhead of the VM and of multiple calls into the VM. I'm sure GCC optimized some of your code as well.

The next step is to replace your ZX function with custom C or assembler code, which should enable you to get this time down to 350µs or less. No more Micromega uFPU.
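
To give an idea of what such a custom C routine could look like, here is a rough sketch of a 3x3 single-precision inverse using the adjugate (cofactor) method. It is not Tom's code: the function name `mat3_inverse`, the flat row-major `float[9]` layout, and the exact-zero singularity check are all assumptions made for illustration, and no timing is claimed for it.

```c
/* Hypothetical helper (not from the original posts): invert a 3x3
   matrix stored row-major in a flat array of 9 floats, using the
   adjugate method. Returns 1 on success, 0 if the determinant is
   exactly zero. A real application may prefer a tolerance instead. */
int mat3_inverse(const float m[9], float out[9])
{
    /* Cofactors of the first row, reused for the determinant. */
    float c00 = m[4] * m[8] - m[5] * m[7];
    float c01 = m[5] * m[6] - m[3] * m[8];
    float c02 = m[3] * m[7] - m[4] * m[6];

    float det = m[0] * c00 + m[1] * c01 + m[2] * c02;
    if (det == 0.0f) {
        return 0;               /* singular: no inverse exists */
    }
    float inv_det = 1.0f / det; /* one divide, then multiplies only */

    /* inverse = adjugate / det, where adjugate = transposed cofactors */
    out[0] = c00 * inv_det;
    out[1] = (m[2] * m[7] - m[1] * m[8]) * inv_det;
    out[2] = (m[1] * m[5] - m[2] * m[4]) * inv_det;
    out[3] = c01 * inv_det;
    out[4] = (m[0] * m[8] - m[2] * m[6]) * inv_det;
    out[5] = (m[2] * m[3] - m[0] * m[5]) * inv_det;
    out[6] = c02 * inv_det;
    out[7] = (m[1] * m[6] - m[0] * m[7]) * inv_det;
    out[8] = (m[0] * m[4] - m[1] * m[3]) * inv_det;
    return 1;
}
```

The routine does a single floating-point divide and replaces the rest with multiplies, which tends to matter on a part with no hardware FPU. The caller must pass separate input and output arrays, since `out` is written while `m` is still being read.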

stevech · Post by **stevech** » 10 February 2008, 10:13 AM

but let's not forget - speed vs. code density for p-code. Sometimes important.

mikep · Post by **mikep** » 10 February 2008, 10:30 AM

stevech wrote:but let's not forget - speed vs. code density for p-code. Sometimes important.

Absolutely. For many applications it is completely unnecessary to use native mode and VM instructions are all you need. Even for Tom's application, it might be that the matrix inverse time of ~1.8ms doesn't affect overall system performance or end user perceived performance. Only optimize for execution performance where and when you need to.

The interesting subject for Don is to provide some help so his customers know which ZX device they should buy. I doubt he is going to provide any kind of "upgrade" program and I'm guessing that native mode devices will cost more than their ZVM cousins.

GTBecker · Post by **GTBecker** » 10 February 2008, 11:48 AM

Dupe deleted.

GTBecker · Post by **GTBecker** » 10 February 2008, 12:12 PM

> ... it might be that the matrix inverse time of ~1.8ms doesn't affect
overall system performance...

The system I'm developing is an imaging system, of a sort, and I am
forever working to improve the effective resolution and response time.
So far, faster is always better for me; a 4x speed improvement is
welcome at any memory cost - as long as I haven't yet run out.

Tom

mikep · Post by **mikep** » 10 February 2008, 13:34 PM

GTBecker wrote:The system I'm developing is an imaging system, of a sort, and I am forever working to improve the effective resolution and response time. So far, faster is always better for me; a 4x speed improvement is welcome at any memory cost - as long as I haven't yet run out.

Or you could just get a faster processor like an ARM - save yourself a lot of trouble I would think. We are off-topic now so we can continue this discussion elsewhere or in the general forum.

GTBecker · Post by **GTBecker** » 11 February 2008, 12:08 PM

mikep wrote:[...] just get a faster processor like an ARM - save yourself a lot of trouble...

Quite on topic. Yes, I could have selected another processor for some high-speed tasks but I've, so far, chosen to avoid the time necessarily spent learning another platform environment. It's also been handy to have a common philosophy and language among the three processors now in the system but, I admit, it is also sometimes a compromise.

Again, the apparent 4x speedup, at least for these heavy-math functions, is significant and welcome.

Don_Kirby · Post by **Don_Kirby** » 11 February 2008, 18:55 PM

On a similar note...

I have been having trouble getting an application working correctly; it would hang at startup. After playing around with task stack sizes (and not getting anywhere), I decided to add some delays to see exactly where the hang was occurring.

After narrowing down the problem to one particular task, and adding a 0.1 second delay in the beginning of the task, the hang disappeared. Now, of course, I'm still looking for the root cause, but that's another topic.

The point is, previously, the task cycled at about 100Hz running on a ZX-40 (VM version). On the '24n, I'm seeing a speed increase of between 2400 and 2500%. At 2500 loops per second, I can certainly see why there might be some timing issue in the code that never showed up before.

Of course these numbers are meaningless in any real sense, as they apply only to this particular application. For reference only, the task in question performs only integer math, some string operations, I2C communications, and a bunch of comparing this variable to that variable.

I suppose the lesson here is that along with the increase in speed comes an increased need for careful planning to avoid unwanted interactions.

In my particular case, I need to purposely slow down the offending task, as it is part of the UI and simply does not need to run as fast as it does on the new device.

-Don
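
One common way to hold a task like this to a fixed rate, whatever the processor speed, is to time each pass against a free-running tick counter and wait out the remainder of the period. A minimal sketch of the bookkeeping in C follows; the function name `ticks_to_wait`, the 32-bit tick counter, and the tick units are assumptions for illustration, and the actual wait would be done with whatever delay or sleep call the platform provides (not shown).

```c
#include <stdint.h>

/* Hypothetical helper: given the tick count at the start of a loop
   pass and the current tick count, return how many ticks the task
   should still wait so that one pass lasts at least period_ticks.
   Unsigned subtraction keeps the result correct when the counter
   wraps around. */
uint32_t ticks_to_wait(uint32_t start_tick, uint32_t now_tick,
                       uint32_t period_ticks)
{
    uint32_t elapsed = now_tick - start_tick;   /* wrap-safe */
    if (elapsed >= period_ticks) {
        return 0u;          /* pass already took a full period */
    }
    return period_ticks - elapsed;
}
```

With this approach the loop rate is set by `period_ticks` rather than by how quickly the body happens to run, so the same code behaves the same on a VM device and on a native-mode device.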
