Grid Watch

Monday, December 19, 2005

How can one be so wrong about the direction of Computing

In April 2001,
Paul Graham wrote:

"So far, Java seems like a stinker to me. I've never written a Java program,
never more than glanced over reference books about it, but I have a hunch that it won't be a very successful language. "

In April 2003, he wrote:

"When I say Java won't turn out to be a successful language, I mean something
more specific: that Java will turn out to be an evolutionary dead-end, like Cobol."

Is Java a successful language now, in 2005? It certainly looks like it. Java is the leading development platform for everything from large enterprise applications to small-scale applications for embedded devices and cellphones. None of this can be said of the language Paul considers the most productive: LISP. This article is not about bashing Paul Graham, whose essays I love to read and whose insight I really like. It is about a broader question: how could so many people in the past be so wrong about the direction of computing?

There have been numerous wrong predictions about computers:

"There is no reason for any individual to have a computer in his home."
Ken Olsen (1926 - ), President, Digital Equipment, 1977

"640K memory ought to be enough for everybody"
Attributed, probably apocryphally, to Bill Gates

Predicting the future is difficult anyway, but it is rare for the exact opposite of a prediction to come true.

Friday, December 09, 2005

Linux rises as a Supercomputer Operating System, but faces hurdles in the Desktop

In November 1997, 99.2% of the top 500 supercomputers in the world ran Unix.
Eight years later, much of that share has been eaten away by Linux, which now runs on 74.4% of the top 500 supercomputers.

In 1998, Linux made its debut in the Top500 List, an authoritative list of the top 500 supercomputers in the world.
After that debut, it took Linux less than seven years to break the magical 50% mark: as of November 2004, it ran on 60.2% of the top 500 supercomputers, and in 2005 it nearly reached the 75% mark. I believe Linux will eventually take over Unix's entire user base. Together, the two operating systems run on about 94.4% of the top 500 supercomputers.

The rise of Linux has come largely at the expense of Unix. This is understandable, as the two operating systems are very similar in nature and are operated the same way, so the cost of switching is minimal. But they differ in one major respect: a lot of money is flowing into Linux these days, and Linux has a large, active community that contributes to it, enhancing and improving the OS, whereas the money flowing into Unix has stagnated. Since the commercialization of Unix, the operating system has been largely in decline; I expect the decline to continue and see it as irreversible.

Linux has made remarkable progress, from a hobbyist's project in 1991 to the leading OS on supercomputers and server systems. Linux has also made inroads into the embedded market. Motorola has been very successful marketing Linux-based smartphones such as the e680, which I personally use and have no regrets about buying.

One area where Linux lags is the desktop market, which of course Microsoft rules. I do not think Linux will be able to break into the desktop market even with the most user-friendly interfaces, such as KDE, because desktop users see Windows as synonymous with the PC. Another difficulty Linux faces is that the number of commercial applications developed for Windows far exceeds the number available for Linux. This leads to a chicken-and-egg problem: ISVs refuse to release software for Linux until it has a large user base, and users won't switch until Linux has a lot of applications. Many organizations have made huge investments in building software on the Windows platform, investments they would not like to lose easily.

There is a plethora of highly productive software development tools for Windows, which ISVs all across the world use to build Windows-based solutions. No comparable set of tools exists for Linux. The Qt toolkit is the only one I can think of, but unfortunately it does not come with the best of IDEs. Gambas is another project in development, but it is far behind its main Windows rival, VB.NET. Mono is maturing nicely, but it lacks major functionality, which deters developers from building desktop applications on it or porting existing Windows applications to it. Java is one solution to the problem, but it is not open source, and hardcore Linux enthusiasts will never accept Java applications on their Linux desktops.
Joel Spolsky lays it out very clearly when he says that the main hurdles Linux faces stem from the "culture" its developers follow.
PC hardware support, although no longer a major problem since many hardware manufacturers provide Linux drivers, is still an issue. Recently a friend of mine wanted to try out Slackware 10.2 but could not, because Slackware did not work with his GeForce 5200 graphics card. I asked him to download the latest nVidia drivers for Linux; he did, but the installation procedure was so cumbersome for him that he quit Slackware and returned to Windows. Months later, I tried Fedora Core 4 on his system, and it worked flawlessly.

Many readers of this article have pointed out that the rise of Linux in the server and supercomputing space may lead ISVs to port their applications to the Linux desktop. However, I do not believe this is the case; here's why:

Windows has 90% to 95% of the desktop market; Linux might not even have 2%. Among office users Windows is even more dominant, since Linux desktops are more common at home. For an ISV developing software, this means it makes more financial sense to develop a Windows version first. The ISV then needs to evaluate the cost of a Linux version. If that cost is only marginally more, the port is worth it; if it is something like 50% more, it is not, because no ISV would spend more than 20% of a product's development cost to capture just 2% of the market. It would rather spend the same amount enhancing the existing application.
Since most commercial applications are developed with Visual Studio, which creates applications that are difficult to port to other operating systems, the cost of a port would be more than 10%.
The following example comes from Joel's article on cross-platform development, which makes a good read:
"If I have a product that cost me $1,000,000 to develop, and 10,000 Windows users are using it, that's $100 per user. Now if I have to make a Mac version, and it's going to cost me $500,000 to port the Windows version, and the product is going to be just as popular among Mac users as Windows users, then I will have about 1000 Mac users. That means that my port cost me $500 per user.
This is not a good proposition. I'd rather spend the money getting more Windows users, because they're cheaper. "
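Joel's arithmetic can be checked with a few lines of Python. This is a minimal sketch: `cost_per_user` is a hypothetical helper, and the rule that a port is only worthwhile when its cost per user does not exceed the existing per-user cost is a simplifying assumption, not a rule stated in Joel's article.

```python
def cost_per_user(cost: float, users: int) -> float:
    """Development (or porting) cost divided by the number of users it reaches."""
    return cost / users

# Figures from Joel's example above.
windows = cost_per_user(1_000_000, 10_000)   # $1,000,000 over 10,000 Windows users
mac_port = cost_per_user(500_000, 1_000)     # $500,000 port over 1,000 Mac users

print(f"Windows: ${windows:.0f} per user")
print(f"Mac port: ${mac_port:.0f} per user")

# Assumed decision rule: port only if each new user is no more expensive to reach.
print("Port worth it" if mac_port <= windows else "Spend on more Windows users instead")
```

The same comparison applies to a Linux port: a small market share pushes the cost per user of the port up, which is exactly why the ISV would rather spend the money on Windows users.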

However, all is not lost for the Linux desktop: Mono might be a solution. Since ISVs are eager to jump on the .NET bandwagon and port their existing applications to .NET, Mono might provide a cost-effective way for them to bring those .NET applications to Linux after all.

Interesting Facts about the current Top 500 Supercomputer List

61% of the top 500 supercomputers are located in the USA.

74.4% of the top 500 supercomputers run some form of Linux, and 20% run Unix. If we consider Linux and Unix as the same and call them *nix, we can say that 94.4% of the top 500 supercomputers run *nix-based systems.
Measured by performance, however, *nix systems account for only 71.8% of the top 500.

The financial sector is the most common single application area for supercomputers on the list, at 8.8% of systems.

Cluster supercomputers make up the largest share of the top 500 by count, but they account for only 48.1% of total performance. This suggests that many clusters are poorly designed, since even a slight bottleneck greatly affects system performance. By contrast, Massively Parallel Processing (MPP) systems, which do not share memory resources, make up only 20% of the supercomputers on the list but deliver nearly 49% of its total performance, which is remarkable.
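The gap between share by count and share by performance comes from how the list is tallied. The sketch below uses a hypothetical helper `shares` and made-up toy data (not actual Top500 figures) to show how a category with many modest systems can lead by count while trailing by aggregate performance:

```python
from collections import defaultdict

def shares(systems):
    """Return {category: (percent_of_systems, percent_of_performance)}.

    `systems` is a list of (category, performance) pairs, e.g. Rmax in GFlops.
    """
    count = defaultdict(int)
    perf = defaultdict(float)
    for category, rmax in systems:
        count[category] += 1
        perf[category] += rmax
    n = len(systems)
    total = sum(perf.values())
    return {c: (100 * count[c] / n, 100 * perf[c] / total) for c in count}

# Hypothetical toy list, NOT real Top500 data: eight modest clusters, two large MPPs.
toy = [("Cluster", 2.0)] * 8 + [("MPP", 10.0)] * 2

for category, (by_count, by_perf) in shares(toy).items():
    print(f"{category}: {by_count:.0f}% of systems, {by_perf:.0f}% of performance")
```

On this toy data the clusters are 80% of the systems but only about 44% of the performance, mirroring the pattern described above.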

IBM remains the undisputed leading manufacturer of supercomputers, with 219 of the 500 systems on the list. It also holds several slots in the top 10, including that of the most powerful supercomputer.

On the Rules of Supercomputer Design

There are 11 rules one must consider when designing a supercomputer; the examples below come from the design of the Stardent Titan:

1) Performance, performance, performance. People are buying supercomputers for performance. Performance, within a broad price range, is everything. Thus, performance goals for Titan were increased during the initial design phase even though it increased the target selling price. Furthermore, the focus on the second generation Titan was on increasing performance above all else.

2) Everything matters. The use of the harmonic mean for reporting performance on the Livermore Loops severely penalizes machines that run poorly on even one loop. It also brings little benefit for those loops that run significantly faster than other loops. Since the Livermore Loops was designed to simulate the real computational load mix at Livermore Labs, there can be no holes in performance when striving to achieve high performance on this realistic mix of computational loads.
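To see why the harmonic mean punishes a single slow loop, compare it with the arithmetic mean on a few hypothetical sets of loop rates (invented numbers, not Livermore Loops results). When each loop does the same amount of work, the harmonic mean of the rates equals total work divided by total time, so the slowest loop dominates:

```python
from statistics import harmonic_mean, mean

# Hypothetical MFLOPS rates on four loops; NOT measured Titan results.
balanced = [20.0, 20.0, 20.0, 20.0]
one_hole = [40.0, 40.0, 40.0, 2.0]     # three fast loops, one very slow loop
one_faster = [400.0, 40.0, 40.0, 2.0]  # same hole, one loop made 10x faster

for name, rates in [("balanced", balanced), ("one hole", one_hole), ("one faster", one_faster)]:
    print(f"{name}: arithmetic {mean(rates):.1f}, harmonic {harmonic_mean(rates):.1f}")
```

The arithmetic mean rates the machine with a hole higher than the balanced one, while the harmonic mean rates it far lower, and making one loop ten times faster barely moves the harmonic mean at all.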

3) Scalars matter the most. A well-designed vector unit will probably be fast enough to make scalars the limiting factor. Even if scalar operations can be issued efficiently, high latency through a pipelined floating point unit such as the VPU can be deadly in some applications. The P3 Titan improved scalar performance by using the MIPS R3010 scalar floating point coprocessor chip. This significantly reduced overhead and latency for scalar operations.
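Rule 3 is, in effect, Amdahl's law (our framing; the rule itself does not name it). A minimal sketch, assuming a fixed fraction of the run time is scalar and the rest speeds up uniformly on the vector unit; the 20% scalar fraction is a hypothetical value:

```python
def overall_speedup(scalar_fraction: float, vector_speedup: float) -> float:
    """Amdahl's law: only the vectorizable part (1 - scalar_fraction) gets faster."""
    return 1.0 / (scalar_fraction + (1.0 - scalar_fraction) / vector_speedup)

# Hypothetical workload: 20% of the original run time is scalar code.
for vs in (2, 10, 100, 1_000_000):
    print(f"vector speedup {vs:>9}: overall {overall_speedup(0.2, vs):.2f}x")
```

However fast the vector unit becomes, the overall speedup stays below 1 / 0.2 = 5, which is why scalar performance ends up as the limiting factor.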

4) Provide as much vector performance as price allows. Peak vector performance is primarily determined by bus bandwidth in some circumstances, and the use of vector registers in others. Thus the bus was designed to be as fast as practical using a cost-effective mix of TTL and ECL logic, and the VRF was designed to be as large and flexible as possible within cost limitations. Gordon Bell's rule of thumb is that each vector unit must be able to produce at least two results per clock tick to have acceptably high performance.
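A rough way to see why both the bus and the vector registers matter: the sustained vector rate is the lesser of what the functional units can produce and what the bus can deliver. The sketch below uses hypothetical figures (a 16 MHz clock and a 256 MB/s bus, not Titan's actual specifications) together with Gordon Bell's two results per clock, and assumes 8-byte operands:

```python
def vector_results_per_second(clock_hz, results_per_clock, bus_bytes_per_s, bytes_per_result):
    """Sustained vector rate: the lesser of the compute bound and the bus bound."""
    compute_bound = clock_hz * results_per_clock
    bandwidth_bound = bus_bytes_per_s / bytes_per_result
    return min(compute_bound, bandwidth_bound)

CLOCK_HZ = 16e6     # hypothetical clock rate
BUS_BYTES = 256e6   # hypothetical bus bandwidth in bytes/second

# Streaming from memory: two 8-byte operands in, one 8-byte result out per result.
streaming = vector_results_per_second(CLOCK_HZ, 2, BUS_BYTES, 24)
# Operands reused from vector registers: assume only the 8-byte result crosses the bus.
in_registers = vector_results_per_second(CLOCK_HZ, 2, BUS_BYTES, 8)

print(f"streaming:    {streaming / 1e6:.1f} M results/s (bus-bound)")
print(f"in registers: {in_registers / 1e6:.1f} M results/s (compute-bound)")
```

Under these assumptions the same hardware is bus-bound when streaming and compute-bound when operands stay in the vector registers, which is why the rule pairs a fast bus with a large, flexible register file.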

5) Avoid holes in the performance space. This is an amplification of rule 2. Certain specific operations may not occur often in an "average" application. But in those applications where they occur, lack of high speed support can significantly degrade performance. An example of this in Titan is the slow divide unit on the first version. A pipelined divide unit was added to the P3 version of Titan because one particular benchmark code (Flo82) made extensive use of division.

6) Place peaks in performance. Marketing sells machines as much as, or even more than, technical excellence does. Benchmark and specification wars are inevitable. Therefore the most important inner loops or benchmarks for the targeted market should be identified, and inexpensive methods should be used to increase performance. It is vital that the system can be called the "World's Fastest", even though only on a single program. A typical way that this is done is to build special optimizations into the compiler to recognize specific benchmark programs. Titan is able to do well on programs that can make repeated use of a long vector stored in one of its vector register files.

7) Provide a decade of addressing. Computers never have enough address space. History is full of examples of computers that have run out of memory addressing space for important applications while still relatively early in their life (e.g., the PDP-8, the IBM System 360, and the IBM PC). Ideally, a system should be designed to last for 10 years without running out of memory address space for the maximum amount of memory that can be installed. Since dynamic RAM chips tend to quadruple in size every three years, this means that the address space should contain 7 bits more than required to address installed memory on the initial system. A first-generation Titan with fully loaded memory cards uses 27 bits of address space, while only 29 bits of address lines are available on the system bus. When 16M bit DRAM chips become available, Titan will be limited by its bus design, and not by real estate on its memory boards.
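The 7-bit figure follows from the growth rate: ten years at one quadrupling (2 extra bits) every three years is about 3.3 quadruplings, or roughly 6.7 bits, rounded up to 7. A small sketch with a hypothetical helper `extra_address_bits`, also applied to the first-generation Titan figure from the text:

```python
import math

def extra_address_bits(years: float, growth_period_years: float, growth_factor: float) -> int:
    """Bits of headroom needed if memory grows by `growth_factor`
    every `growth_period_years`, sustained for `years`."""
    growth_steps = years / growth_period_years
    return math.ceil(growth_steps * math.log2(growth_factor))

headroom = extra_address_bits(10, 3, 4)   # 10/3 quadruplings * 2 bits each
print(headroom)

titan_used = 27      # from the text: a fully loaded first-generation Titan
titan_bus_bits = 29  # from the text: address lines on the system bus
print(f"recommended: {titan_used + headroom} bits, available: {titan_bus_bits} bits")
```

By this rule a fully loaded first-generation Titan would want about 34 address bits, well above the 29 its bus provides, which is consistent with the text's conclusion that the bus design becomes the limit.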

8) Make it easy to use. The "dusty deck" syndrome, in which users want to reuse FORTRAN code written two or three decades earlier, is rampant in the supercomputer world. Supercomputers with parallel processors and vector units are expected to run this code efficiently without any work on the part of the programmer. While this may not be entirely realistic, it points out the issue of making a complex system easy to use. Technology changes too quickly for customers to have time to become experts on each and every machine version.

9) Build on others' work. One mistake on the first version of Titan was to "reinvent the wheel" in the case of the IPU compiler. Stardent should have relied more heavily on the existing MIPS compiler technology, and used its resources in areas where it could add value to existing MIPS work (such as in the area of multiprocessing).

10) Design for the next one, and then do it again. In a small startup company, resources are always scarce, and survival depends on shipping the next product on schedule. It is often difficult to look beyond the current design, yet this is vital for long term success. Extra care must be taken in the design process to plan ahead for future upgrades. The best way to do this is to start designing the next generation before the current generation is complete, using a pipelined hardware design process. Also, be resigned to throwing away the first design as quickly as possible.

11) Have slack resources. Expect the unexpected. No matter how good the schedule, unscheduled events will occur. It is vital to have spare resources available to deal with them, even in a startup company with little extra manpower or capital.

Global Grid Forum

The Global Grid Forum is an international community leading the pervasive adoption of grid computing in research and industry.
The Global Grid Forum community believes that standards-based grid computing is critical to new scientific discovery, better business processes, lower IT costs and regional economic development.

The Global Grid Forum has an excellent introductory article on Grid Computing here.

On the Global Grid Forum's site you can also find the various standards and specifications the GGF has released for the advancement of Grid Computing; you can find them here.

Thursday, December 08, 2005

Introductory Post

My name is Irfan Habib. I'm a student of Software Engineering at the National University of Sciences and Technology, Pakistan.

My primary interests are Open Source software development, Distributed and Grid Computing(1) (to which this blog is dedicated), and general Web technologies.

I created this blog because, during my work as part of the NUST Grid Computing Research Group, I discovered that there was no single site detailing the most popular tools for Grid Computing. This might be because it is a developing field, or because Grid Computing researchers and enthusiasts have more interesting work to do.

What I plan to convey through this blog is the following: detailed descriptions of existing tools for Grid Computing, related advancements and new research, and commentary on related research publications.

I will also use this blog to talk about my ambitious Grid Computing project: the development of kernel-level modules that would make an operating system Grid-aware. When completed, it would allow a person to develop applications that can run over dozens of clusters using a special process-level IPC mechanism. The entire system will be bound together by a grid filesystem, which would allow any user to mount and use any drive located on the Grid. As the project develops, I will write more about it, including its architectural and implementation details.

(1) If you don't know what Grid Computing is about, check the following references:
Grid Computing Wikipedia Article
A Great Resource by IBM
Excellent Site dedicated to Grid Computing