Real-time Embedded Linux and POSIX RTOSs For Microcontrollers (MCUs)

Monday, December 31, 2007

RoweBits Year In Review

The best of the year was:

- violin-playing robots from Toyota, started just 3 years ago
- Fanuc fast arms - prize winner in Japan
- utility experiment proving power negotiation pays - driving smart everything electrical in the future
- more and better robots everywhere

The best from RoweBots was:

- Launch of DSPnano v2 - a DSP RTOS for tiny, Linux-compatible DSP applications
- Completion of Unison v4 - an RTOS for tiny, Linux-compatible applications
- Tighter robotics focus with lower cost software solutions packaged for applications - coming soon!

Friday, August 31, 2007

Autonomous Robots Starting To Become Real

A recent post related to portable supercomputers shows just how easy and inexpensive it is becoming to produce high performance computers capable of doing advanced analysis on their environment using smart sensors.

If you can build a machine so inexpensively ($2,470) that it has twice the power of Deep Blue and fits in a checkable airline bag, we are definitely close to a breakthrough.

According to James Albus (of NIST) another two orders of magnitude should do it and service robots should be possible.

Coming to your home sometime soon....

Monday, August 20, 2007

And now the floodgates start to open... Tilera Ships

Hey;

Finally the new Sun chip has some competition. There is a good link dealing with this product here.

I am a bit surprised that the chip has L2 cache rather than just plain memory if it is intended for embedded applications. I think the real story is that it is intended to be more general purpose, but the embedded market for H.264 and other applications seemed a bit easier to penetrate near term.

In the future expect to see chips like this optimized for embedded applications.

Tuesday, August 14, 2007

Microchip dsPIC and DSPnano Offer Ultimate Integration

The Masters was great. I must say that meeting the CEO Steve was a real highlight. He is so practical and down to earth. I'm sure that he has had much to do with the great performance that they have seen over many years.

The DSPnano for dsPIC product is highly complementary to the offerings they have created at Microchip. Their focus has been tiny I/O modules and debug tools. Our focus is open source DSP RTOS tools which offer POSIX compliance, DSP libraries and next generation development environments. As it turns out, we offer the glue it takes to exploit the many smaller dsPIC components and build a total solution.

For example, voice processing, SPI, LIN, USB, TCP, I2C and much more can be quickly and easily integrated into the RTOS through the POSIX interface, and all calls can be standardized. Development can use the latest Eclipse technology, while debugging and display of target data leverage MPLAB, ICD2 and REAL ICE technologies.
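To make that concrete, here is a minimal sketch of what peripheral access through a standard POSIX interface looks like. The device names and command bytes are assumptions for illustration only, not the actual DSPnano or Unison driver names:

```c
/* Hypothetical illustration: driving a peripheral through standard POSIX
 * calls. The device name and command bytes are assumptions, not the
 * actual DSPnano/Unison driver interface. */
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t tx[2] = { 0x20, 0x0F };   /* example command bytes */
    uint8_t rx[2] = { 0 };

    /* Open a hypothetical SPI device node, just as a file is opened. */
    int spi = open("/dev/spi0", O_RDWR);
    if (spi < 0) {
        perror("open /dev/spi0");
        return 1;
    }

    /* Because the driver sits behind the POSIX interface, the same
     * read()/write() calls apply whether the endpoint is SPI, I2C,
     * a LIN frame buffer or a TCP socket descriptor. */
    if (write(spi, tx, sizeof tx) != (ssize_t)sizeof tx)
        perror("write");
    if (read(spi, rx, sizeof rx) != (ssize_t)sizeof rx)
        perror("read");

    printf("device replied: 0x%02x 0x%02x\n", rx[0], rx[1]);
    close(spi);
    return 0;
}
```

Because the calls are standard, the same application code moves between peripherals, and between dsPIC projects, with little more than a change of device name.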

The most amazing thing is that you can now cut your design time substantially. This combination along with 3-5 week lead times for parts accelerates your growth and lets you improve faster than your competition. Just ask Steve or read his book - this is a winning formula.

Thursday, August 9, 2007

Microchip dsPIC Expectations High

At the Masters 2007 it's clear that Microchip is continuing to execute on its 16-bit strategy for microcontrollers and digital signal controllers. The number of applications is staggering to me, as are the chip volumes being shipped and the revenue this company is generating.

It was only just over fourteen years ago that they were on the brink of insolvency. Since then they have transformed their business to dominate the low-end microcontroller business worldwide. They are now number one in both volume and dollars.

Four years ago Microchip added the 16-bit product line, and it is doing as well from its introduction as the original PIC did, or better. The scenario is exciting because they are delivering superior price/performance at the low end, dominating first on volume and then on overall revenue. Congratulations to all those hard-working people at Microchip!

The fallout from this for multicore is a bit obscure, but I don't see why they aren't considering putting multiple dsPIC cores on a single chip for very low power, higher performance applications. They are great engineers, so look for this in a couple of years.

Wednesday, August 1, 2007

Emotional Memory For Intelligent Machines

As we try to build more and more intelligent machines, it seems that there are many lessons to be learned from the human brain about how to avoid dangerous situations and how to use learning as the basis of computer memory models.

Today, our idea of an intelligent machine that can understand situations from the scenes that it sees is to do something like the following:
a) Create a taxonomy of all the objects expected in an environment
b) Create relationships between the objects and have some idea of purpose and function associated with objects.
c) Look at a scene with sensors of various kinds and correlate 2D representations of 3D models to develop a list of related scene objects from the taxonomy.
d) From the structure, understand the scene.
e) Modify the scene, update the structure and update the understanding.
f) All object matching, relationship establishment, understanding and so on is done from taxonomy information saved in a relational database and recovered by search mechanisms related to relational tuples in the database.
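As a toy illustration of steps (a) through (d), here is roughly what such a taxonomy-driven matcher reduces to; the object classes and relationships are invented, and a real system would hold the taxonomy in a relational database rather than a static table:

```c
/* Toy sketch of steps (a)-(c): a tiny in-memory "taxonomy" of expected
 * objects and a lookup that maps detected object labels to taxonomy
 * entries. Everything here is invented for illustration. */
#include <stdio.h>
#include <string.h>

struct tax_entry {
    const char *name;        /* object class             */
    const char *function;    /* rough idea of purpose    */
    const char *related_to;  /* one example relationship */
};

static const struct tax_entry taxonomy[] = {
    { "road",    "surface to drive on", "lane-marking" },
    { "boulder", "obstacle to avoid",   "road" },
    { "ditch",   "hazard at road edge", "road" },
};

static const struct tax_entry *lookup(const char *label)
{
    for (size_t i = 0; i < sizeof taxonomy / sizeof taxonomy[0]; i++)
        if (strcmp(taxonomy[i].name, label) == 0)
            return &taxonomy[i];
    return NULL;   /* unknown object: the weak spot of step (d) */
}

int main(void)
{
    const char *detected[] = { "road", "boulder", "tumbleweed" };
    for (size_t i = 0; i < 3; i++) {
        const struct tax_entry *e = lookup(detected[i]);
        if (e)
            printf("%s: %s (related to %s)\n", e->name, e->function, e->related_to);
        else
            printf("%s: not in taxonomy, no understanding possible\n", detected[i]);
    }
    return 0;
}
```

The failure mode is visible even at this scale: anything not already in the taxonomy produces no understanding at all.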

If we have Stanley (Stanford's robot Touareg) driving down a desert road without too much around, where it has waypoints along the way and maps of the terrain, this kind of approach sort of works. If you miss a waypoint (CMU's Red Team did this), big trouble may follow because the system isn't really intelligent. Stanley didn't have the same weakness because it was smarter, but it still had many limitations.

Now consider how the human brain solves the same problem. It is a very different scenario.
a) The brain learns from birth, building up its knowledge of the surroundings.
b) Strong emotional response related to danger, pain, happiness etc keeps training the brain to remember certain scenes and experiences.
c) Discussion of events reinforces these event memories.
d) High-level abstract concepts are extracted and related to these strong memories, which are tied to the detailed memories, allowing further refinement and evolution of the concepts built upon them.
e) The strength of the emotion at the time creates a window by which to filter memory response time.

Would it not be relatively easy to add emotional memory to the computer system to improve response? Then the system could use strong emotional memories to respond quickly to critical events and take more time when events are not so critical.

The cost of this extra processing would be the cost of creating an emotional measure for each scene as it changes over time. This could be done in many ways but must be related to a variable score over a broad set of emotional words for a given language. But why stop at just emotional words? Shouldn't we take a scene and create a scene understanding dialog rating the scene on all manner of words related to the scene, and use this as a key for identifying all future scenes?

Imagine learning and feeling happy about what you're learning. The subject matter's abstract concepts, the emotional feelings and the fact that your basic activity is learning are all keys to finding similar scenes. Emotional response could be a quick first pass, but all words with a measure of strength could be good measures of the correlation between this scene and others with similar characteristics.
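Here is a minimal sketch of the kind of emotional measure I have in mind, assuming a made-up lexicon of emotionally loaded words and arbitrary weights and thresholds:

```c
/* Minimal sketch of the "emotional measure" idea: rate a scene against a
 * small lexicon of emotionally loaded words and use the aggregate strength
 * to decide how quickly (and how deeply) to search memory. The lexicon,
 * weights and thresholds are all made up for illustration. */
#include <stdio.h>
#include <string.h>

struct emo_word { const char *word; double strength; };

static const struct emo_word lexicon[] = {
    { "fire",     0.9 }, { "collision", 0.8 },
    { "pain",     0.7 }, { "happy",     0.3 },
    { "learning", 0.2 },
};

static double emotional_score(const char **scene_words, size_t n)
{
    double score = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < sizeof lexicon / sizeof lexicon[0]; j++)
            if (strcmp(scene_words[i], lexicon[j].word) == 0)
                score += lexicon[j].strength;
    return score;
}

int main(void)
{
    const char *scene[] = { "collision", "fire", "road" };
    double s = emotional_score(scene, 3);

    /* High score: respond from the small set of strongly tagged memories
     * right away. Low score: afford a slower, broader search. */
    if (s > 1.0)
        printf("score %.2f: urgent - use fast, emotionally tagged recall\n", s);
    else
        printf("score %.2f: routine - allow slower, exhaustive recall\n", s);
    return 0;
}
```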

Has anyone seen any research in this area?

Monday, July 23, 2007

gdb and multicore are mutually exclusive

I've spent the last few days looking at debugging technology for multicore chips with a view towards real-time embedded volume applications. In this mix I'm including FPGA solutions, which will soon offer the capability to have many cores on one chip, particularly given the Altera C to H compiler and NIOS as well as the offerings of their main competitor Xilinx.

My initial requirement for a debugging solution is clearly a debugger which supports many processors and has a simple user interface. After the last Multicore Expo it was clear that heterogeneous processors were necessary and that it was highly likely that collections of networked and shared memory processors would exist as part of a complex multicore processor.

First, I think the case where many cores are on a chip requires some hardware assist on the chip to support debugging. Multicore register access and a stop-all-cores capability will be added to most chips, but this is largely unsuitable for many applications. Emulator people will be happy, but real applications will have to keep running partially, with some ability to control the rest of the application to support debugging. Emulator-type solutions could do this, but with a severe performance penalty that gets worse as the number of cores grows (using traditional approaches).

To do this type of debugging, a kernel must be running on the chip and there has to be some common debugger interface that talks to the system under test. The connection could be multiple links or a single multiplexed one - it makes little difference in most applications. The debugger, on the other hand, must understand the processor type that it is debugging and talk to the common graphical user interface to provide debugging support across all the cores.
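As a rough sketch of what such a kernel-resident agent might look like, consider the following; the message layout, commands and port routines are all invented for illustration and do not correspond to any real protocol:

```c
/* Sketch of a kernel-resident debug agent: each core runs a small service
 * routine answering requests from a common debugger front end over one
 * multiplexed link. Layout, commands and port routines are hypothetical. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum dbg_cmd { DBG_READ_REGS, DBG_READ_MEM, DBG_HALT_TASK };

struct dbg_request { uint8_t core_id, cmd; uint32_t addr, len; };
struct dbg_reply   { uint8_t core_id, status; uint8_t payload[64]; };

/* Stand-ins for routines a real RTOS port would supply for its core type. */
static void port_read_regs(uint8_t *buf, uint32_t len) { memset(buf, 0xAA, len); }
static void port_read_mem(uint32_t addr, uint8_t *buf, uint32_t len)
{ (void)addr; memset(buf, 0x55, len); }
static int  port_halt_task(uint32_t task_id) { (void)task_id; return 0; }

/* One request in, one reply out: only the targeted task or core is
 * disturbed, the rest of the application keeps running. */
static void dbg_service(const struct dbg_request *req, struct dbg_reply *rep)
{
    rep->core_id = req->core_id;
    rep->status  = 0;
    switch (req->cmd) {
    case DBG_READ_REGS:
        port_read_regs(rep->payload, sizeof rep->payload);
        break;
    case DBG_READ_MEM:
        port_read_mem(req->addr, rep->payload,
                      req->len < sizeof rep->payload ? req->len
                                                     : (uint32_t)sizeof rep->payload);
        break;
    case DBG_HALT_TASK:
        rep->status = (uint8_t)port_halt_task(req->addr);
        break;
    default:
        rep->status = 0xFF;   /* unknown command */
    }
}

int main(void)
{
    struct dbg_request req = { .core_id = 3, .cmd = DBG_READ_REGS };
    struct dbg_reply   rep;
    dbg_service(&req, &rep);
    printf("core %u replied, status %u\n", rep.core_id, rep.status);
    return 0;
}
```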

I was looking at gdb to solve some of these problems, and it seems that it might do half a job in the case where all the cores run some huge kernel, but it is completely unsuitable in the case where many independent cores running a small kernel are communicating to solve a problem.

For a start, it is highly complex for absolutely no reason. The remedy debugger meets almost all the criteria to do this with 10% of the bulk of gdb and far less complexity. What went wrong here? How did the debugger get so large and complex with no apparent value added? It is not that remedy is much less functional; as a matter of fact, it offers support for 8 processor types and the user can select any one of them for any core on the fly. Doing a port for a new processor is much simpler too: the disassembler is separate, the register understanding and memory/stack unravelling are done with a few simple, easily understood routines, and the symbol table information is relatively standard.
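The shape of the porting interface I have in mind is roughly the following; this is not remedy's actual interface, just a sketch of how a back end could be reduced to a handful of routines behind a table of function pointers:

```c
/* Hypothetical per-processor back end: disassembly, register access and
 * stack unwinding reduced to three routines behind a table. The stubs
 * stand in for a real port and exist only so the sketch runs. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct cpu_backend {
    const char *name;
    size_t (*disassemble)(uint32_t addr, const uint8_t *code,
                          char *text, size_t text_len);
    int    (*read_registers)(uint32_t core, uint32_t *regs, size_t count);
    int    (*unwind_frame)(uint32_t sp, uint32_t pc,
                           uint32_t *caller_sp, uint32_t *caller_pc);
};

/* Dummy back end standing in for a real port. */
static size_t dis_stub(uint32_t addr, const uint8_t *code, char *text, size_t n)
{ (void)code; snprintf(text, n, "nop ; @%08x", addr); return 4; }
static int regs_stub(uint32_t core, uint32_t *regs, size_t count)
{ (void)core; memset(regs, 0, count * sizeof *regs); return 0; }
static int unwind_stub(uint32_t sp, uint32_t pc, uint32_t *csp, uint32_t *cpc)
{ *csp = sp + 16; *cpc = pc; return 0; }

static const struct cpu_backend backends[] = {
    { "dspic-like", dis_stub, regs_stub, unwind_stub },
    /* ...one entry per supported processor type... */
};

int main(void)
{
    /* The debugger core picks a back end per core, on the fly. */
    const struct cpu_backend *be = &backends[0];
    char text[32];
    be->disassemble(0x1000, NULL, text, sizeof text);
    printf("core 0 (%s): %s\n", be->name, text);
    return 0;
}
```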

It would be great if someone could provide some insight to me and others on this issue. I don't relish porting remedy to a raft of new processors that gdb already supports but it seems like it will be less work and better functionality in the long run.

I should add that it is for this specific case that gdb seems less than attractive. For debugging a single core, it does the job. The downside is that it likely takes months to do a new processor type instead of a few weeks because of all the extra complexity.

Friday, June 29, 2007

A Stunning Parallel Multicore Approach

Having been at Multicore Expo a few months ago, I was stunned by what I read a few minutes ago. It was the elephant in the room that nobody talked about. Companies did not want to discuss it because of their vested interests, and I can only imagine that the researchers were incredibly jealous or as ignorant as I was.

Having worked in the area of multiprocessing for many years, I thought of the promise of automatic parallel behavior generated by a compiler as an ideal. We never got there in the 90s, but it seems that we are making real progress today. A team at College Park has done some remarkable work to solve the fundamental parallelism issues that plague most computers with a technique called "explicit multithreading". You can find some details here and here. Wen and Vishkin both deserve a huge amount of credit for creating this solution.

The other approaches, which attack this at a higher level - from Peak Stream (now owned by Google) and Rapid Mind - both assume that there is an automatic parallelizer to allocate chunks of work to processors and that all the parallelization is done for you. The disadvantage of their approaches is that everything must be an array. This approach is quite unnatural for most programmers, although it is possible to learn it. Math majors would certainly like this approach.
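To make the contrast concrete, here is a rough illustration in plain C using pthreads; the "array style" below mimics whole-array operations while the second half spells out explicit threads, and neither is the Peak Stream, Rapid Mind or XMT API:

```c
/* Rough contrast of the two styles, in plain C with pthreads. */
#include <pthread.h>
#include <stdio.h>

#define N 8
#define NTHREADS 4

static double a[N], b[N], c[N];

/* Array style: say "c = a + b" once and let a runtime split it up. */
static void array_add(double *dst, const double *x, const double *y, int n)
{
    for (int i = 0; i < n; i++)   /* a parallelizing runtime would split this */
        dst[i] = x[i] + y[i];
}

/* Explicit style: the programmer spawns threads and carves up the range. */
static void *worker(void *arg)
{
    int t = (int)(long)arg;
    int chunk = N / NTHREADS;
    for (int i = t * chunk; i < (t + 1) * chunk; i++)
        c[i] = a[i] + b[i];
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    array_add(c, a, b, N);              /* data-parallel, no threads visible */

    pthread_t tid[NTHREADS];
    for (long t = 0; t < NTHREADS; t++) /* explicit spawn ... */
        pthread_create(&tid[t], NULL, worker, (void *)t);
    for (int t = 0; t < NTHREADS; t++)  /* ... and join */
        pthread_join(tid[t], NULL);

    printf("c[7] = %g\n", c[7]);
    return 0;
}
```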

Another disadvantage is that their efforts (Peak Stream and Rapid Mind) focus on trying to maximize resource utilization for standard Von Neumann machines in an attached processor model, while this new approach seems much simpler and lends itself to static allocation because IPC times can be ignored.

The XMT approach doesn't use an attached model but assumes that the processor has inherent parallelism. The difference is that the scheduling seems to be much clearer and communication times become minimal. This approach also seems to suffer from the same limitation that resources go unused if there is nothing that can be done in parallel. In fact, it is this reason that has many semiconductor companies holding back on large multicore architectures.

I will look at the programming models in more detail next but it seems that this machine could benefit from both automation of parallel operations and the ability to explicitly code each parallel thread.

The future signal processors are going to be really fun if technology like this hits the market anytime soon. I wonder what the follow-on to Niagara 2 will really look like?

Monday, June 25, 2007

Restarting And Marketing

I've been restarting a products company and the marketing has changed so much over the past few years! For example, on the good side:

  • SugarCRM offers great value - open source, why spend 75K?
  • SEO tools (I hope google's not reading this) - open source
  • portals for Engineers - why leave your cube to find anything?
  • and the best help a guy like me could ask for ...
  • lots fewer trees cut down
  • more time with the family

At the same time, there is some downside as well:

  • the world has been replaced by voicemail
  • email filters kill many requests
  • shows are smaller and smaller
  • margins are thinner
  • channels are becoming lower cost
  • prices are falling
  • products are volume plays

It is sure a different place - and still lots of fun....


Tuesday, May 29, 2007

All Those Pesky DSP Functions

As DSP becomes mainstream and we incorporate it into all systems, we are left with a dilemma: "how can we make this functionality simple enough for all to use effectively?" In short, we can't.

Signal processing is difficult by its nature. It can be done in fixed point, floating point or double precision floating point. It involves many different algorithms. Often at the front end, processing is data independent, while at the back end it is data dependent. Different array sizes are passed between modules, concepts are abstract and require deep mathematical understanding, and time constraints on processing make all this more difficult.
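A small sketch makes the split concrete: a fixed-point FIR front end that always does the same amount of work, feeding a detector whose work depends on the data. The Q15 format, coefficients and threshold are illustrative choices only:

```c
/* Data-independent front end (fixed-point FIR) feeding a data-dependent
 * back end (a simple detector). Coefficients and Q15 format are examples. */
#include <stdint.h>
#include <stdio.h>

#define NTAPS 4
#define BLOCK 8

/* Q15 fixed point: -1.0..~1.0 mapped onto int16_t; 8192 is about 0.25. */
static const int16_t coeff[NTAPS] = { 8192, 8192, 8192, 8192 };

/* Front end: the same multiply-accumulate count for every block of input. */
static void fir_block(const int16_t *x, int16_t *y, int n)
{
    for (int i = 0; i < n; i++) {
        int32_t acc = 0;
        for (int k = 0; k < NTAPS; k++)
            if (i - k >= 0)
                acc += (int32_t)coeff[k] * x[i - k];
        y[i] = (int16_t)(acc >> 15);          /* back to Q15 */
    }
}

/* Back end: how much work happens depends on the data itself. */
static int detect_events(const int16_t *y, int n, int16_t threshold)
{
    int events = 0;
    for (int i = 0; i < n; i++)
        if (y[i] > threshold)
            events++;          /* real code might branch into classification */
    return events;
}

int main(void)
{
    int16_t x[BLOCK] = { 0, 1000, 20000, 25000, 500, 0, 30000, 100 };
    int16_t y[BLOCK];

    fir_block(x, y, BLOCK);
    printf("events above threshold: %d\n", detect_events(y, BLOCK, 4000));
    return 0;
}
```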

One might say, "Oh, we'll just use a graphical tool and it will solve all our problems," but this is not the case, even though it might help in special cases where processing is always regular and no data-dependent processing is done. As soon as branching is involved and interrupt processing is required, the need for better tools becomes apparent. In general, if the problem is limited this can work, but if it is complex and demanding it tends to be less than optimal.

Or does it? You tell me. How would you recommend people solve these problems?

Friday, May 25, 2007

DSP and FPGA Implementations

Hi All;

How many of you have looked at the rapidly dropping cost of FPGAs and wondered how this is going to change the world of signal processing? I'd like to know too. I have some ideas, summarized as follows:

  • FPGAs offer competitive advantages in hardware implementations of signal processing algorithms with the cost and difficulty of implementation being a barrier
  • DSPs offer competitive advantages where the algorithms are experimental and subject to rapid change or the ultimate idea is to push the processor to the limit by adding as many features as possible for the same hardware price.
  • DSPs offer integrated A/D and D/A support - something not included in FPGAs to date.
  • Serial D/A and A/D components can offer greater flexibility to designers of FPGAs.
  • Drivers for PWM are generally integrated into DSPs, including dead-band timing - lots of work might go into reinventing the wheel in the FPGA world.
  • FPGA prices relative to DSP prices are falling - this means that the cross over point from DSP to FPGA for development is changing - but how?
FPGA implementations are now cheaper but development is more expensive. And it is still not entirely clear which algorithms are possible with an FPGA compared to a DSP. For example, neural nets are computed on DSPs without difficulty, even if it is time consuming. In comparison, this same algorithm would be extremely difficult to run in an FPGA, if it were possible at all.
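To illustrate the neural net point, one fully connected layer is just nested multiply-accumulate loops, which is exactly what a DSP's MAC unit is built for; the sizes and weights below are made up, and a real network would be far larger and probably fixed point:

```c
/* Toy fully connected layer: nested multiply-accumulate loops, the kind of
 * inner loop a DSP executes naturally. Sizes and weights are illustrative. */
#include <stdio.h>

#define NIN  4
#define NOUT 3

static void dense_layer(const float in[NIN],
                        const float w[NOUT][NIN],
                        const float bias[NOUT],
                        float out[NOUT])
{
    for (int o = 0; o < NOUT; o++) {
        float acc = bias[o];
        for (int i = 0; i < NIN; i++)
            acc += w[o][i] * in[i];          /* the MAC inner loop */
        out[o] = acc > 0.0f ? acc : 0.0f;    /* simple ReLU activation */
    }
}

int main(void)
{
    const float in[NIN] = { 0.5f, -1.0f, 0.25f, 2.0f };
    const float w[NOUT][NIN] = {
        { 0.1f, 0.2f, 0.3f, 0.4f },
        { -0.5f, 0.5f, -0.5f, 0.5f },
        { 1.0f, 0.0f, 0.0f, -1.0f },
    };
    const float bias[NOUT] = { 0.0f, 0.1f, -0.2f };
    float out[NOUT];

    dense_layer(in, w, bias, out);
    for (int o = 0; o < NOUT; o++)
        printf("out[%d] = %g\n", o, out[o]);
    return 0;
}
```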

How do we decide how to partition these problems? What thoughts do others have on this?

Wednesday, January 24, 2007

Pseudo Open Source

In years past we had proprietary solutions and people protected their IP. Today, many solutions are open source based on some type of "copyleft" license while other models range from Freemium to partially protected software.

I think that GPL is the ultimate bait-and-switch strategy. Support costs money and has to come from somewhere - the community support models are not suitable for production systems unless the software is extremely stable or the systems aren't critical. People end up purchasing support, which is the real cost of the software, but are unable to protect any modifications that are done as an add-on.

Why does the GPL insist on having all associated code in GPL? What are they afraid of in proprietary technology?

The freemium model is much better supported by other license agreements.

Sun's agreement seems like the best of many worlds, offering great freemium model support, ecosystem support and protection for those who require it.

Tuesday, January 23, 2007

Peak Stream - A New New Array Processor

The Peak team has great marketing. I really liked it and their new launch will be a great success I'm sure.

Myself, I prefer more standard calls that our industry has been working on for some time but it isn't a material difference. I do think that their tools are more limited than those of 20 years ago and their basic architecture is 30 years old or more for the most part. Why shouldn't we call it a new new thing - everyone else rehashes old technology into new all the time?

Myself, I would think that a combined programming model would be much stronger. After all, they have one multiprocessor application - why not make it all multi-threaded too and have the benefit of being able to program and debug on the underlying processors if required? Why guess when it doesn't work - go and look (debugging rule number 6). The dynamic allocation strategies and dynamic compiling (the new part) make this a bit tricky, but it should be possible.

I think that this will work well for many applications but we all really need some new thinking in the programming area. For real-time systems it seems much too limited in this form.