Friday, August 14, 2009

On comparing algorithmic cost estimates prepared in different organisations

Roughly summarized, in algorithmic cost estimation the cost is estimated via a mathematical function that links metrics describing the project (the inputs) to an estimated cost (the output). This function arises from the analysis of historical cost information, which relates commonly used cost attributes (usually product size, function points, object points, etc.) to the project cost:

Effort = A x Size^B x M

where A is an organization-dependent constant, B reflects the disproportionate effort for large projects, and M is a multiplier reflecting product, process and people attributes. Because A is an organization-dependent constant, the algorithmic cost model will differ from one organization to another. Another great disadvantage of these models is the inconsistency of the estimates: studies show that predicted values range from 85% to 610% of the actual values. By adjusting the weightings of the attributes (in the formula, the multiplier M), also called calibrating the model to a specific environment, the accuracy of the model can be greatly improved. This calibration, however, is in a sense a customization of the generic model for a specific environment (situation, organization, etc.) and results in a model which is not useful outside of the particular environment (e.g. organization) it was calibrated for.
Conclusion: The estimates of the factors contributing to B and M are subjective and thus organization- and situation-dependent. Because of this, algorithmic cost estimates are not directly comparable from one organization to another.
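
To see why such estimates do not transfer between organizations, here is a minimal Perl sketch that evaluates the formula for two hypothetical organizations; the constants A, B and M below are invented purely for illustration:

Example: estimate.pl

#!/usr/local/bin/perl -w
use strict;

# Effort = A x Size^B x M, evaluated with two hypothetical calibrations.
sub effort {
    my ($a, $size, $b, $m) = @_;
    return $a * ($size ** $b) * $m;
}

my $size = 50;    # the same size estimate, e.g. 50 KLOC, for both organizations

# Organization 1: A = 2.4, B = 1.05, nominal multipliers (M = 1.0)
printf "Org 1: %.1f person-months\n", effort(2.4, $size, 1.05, 1.0);

# Organization 2: A = 3.0, B = 1.12, slightly unfavourable multipliers (M = 1.15)
printf "Org 2: %.1f person-months\n", effort(3.0, $size, 1.12, 1.15);

The same size estimate yields considerably different person-month figures, which is exactly why estimates calibrated in different organizations cannot be compared directly.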

Video searching on the web

Google, Yahoo Video, AltaVista, Singingfish, Dogpile, Blinx.tv, YouTube, Truveo, AOL Video, Live Search and other search engines have been indexing videos for a while now.

Traditional search engines on the World Wide Web index web pages by treating them as plain text documents and indexing the content of the page in order to allow users to search for it. This, however, is not practicable for images, videos and audio content. Therefore, searching for multimedia data (images, video and audio) introduces challenging problems in many areas. Not only what should be used to index multimedia data, but also how such an index has to be queried in order to retrieve the information, is still a grand challenge in this area.

Different search engines use slightly different "techniques" to index video files. Among those techniques are:
  • Using text from the filename.
  • Using alternate text.
  • Using the text of the hyperlink, or other relevant text from the web page that links to the particular video.
  • The video header information, which usually includes the title, the author and, depending on the video format, copyright information.
  • Textual meta data.
  • User tags for the video file.
Relying entirely on this handful of approaches has many drawbacks. Metadata, for instance, often doesn't contain enough information to identify a video, and the weakness of user tags is that they can be misused. Therefore, in recent years a couple of more innovative approaches for indexing videos have arisen. The following two I find particularly interesting:
  • The search engine Blinkx, for example, uses speech-recognition technology in addition to standard metadata and surrounding-text searches. It converts audio speech into searchable text by extracting the audio information accompanying most video files and using it to create a searchable text index of "words".
  • Researchers at the University of Leeds (formerly at Oxford) are working on another innovative solution, which aims to make the content of a video searchable, instead of only the text description and metadata. To do this they have developed a system that uses a combination of face recognition, closed-captioning information and original television scripts to automatically name the faces in the videos. There is still a long way to go, but this innovation is seen as a first step towards automated descriptions of what happens in a video.

Google Web APIs

Google Web APIs are aimed at developers and researchers interested in using Google as a resource in their applications. They make it easy to find and manipulate information on the web and to query more than 8 billion web documents directly from one's own computer programs. Google uses the SOAP and WSDL standards as the interface between the user's program and the Google API and officially supports Java, .NET, Ruby and Perl. Using the API, developers can issue search requests against Google's index of web pages and receive results as structured data (number of results, URIs, snippets, query time, etc.). Additionally, developers can access information in the Google cache and check the spelling of words. To start using the API one needs to download and install the API package from http://www.google.com/apis/ and create an account to get a license key (however, the Google FAQ states that Google is no longer issuing new API keys, so this step can be problematic). The key I received last year is limited to 1,000 queries/day. Last but not least, one will need a SOAP implementation such as Apache Axis, SOAP::Lite for Perl, SOAP4R if Ruby is the language of choice, etc. The installed API package contains:

  • googleapi.jar - Java library for accessing the Google Web APIs service.

  • GoogleAPIDemo.java - Example program that uses googleapi.jar.

  • dotnet/ - Example .NET programs that use Google Web APIs.

  • APIs_Reference.html - Reference doc for the API. Describes semantics of all calls and fields.

  • Javadoc - Documentation for the example Java libraries.

  • Licenses - Licenses for Java code that is redistributed in this package.

  • GoogleSearch.wsdl - WSDL description for the Google SOAP API.

  • soap-samples/ - Different examples

Following is a small example, which I found somewhere on the web, of using the SOAP::Lite implementation to make a Google query:

Example: query.pl

#!/usr/local/bin/perl -w
use SOAP::Lite;
# Configuration
$key = "The Google API Key Goes Here";
# Initialize with local SOAP::Lite file
$service = SOAP::Lite
-> service('file:GoogleSearch.wsdl');
$query = "Viadrina";
$result = $service
-> doGoogleSearch(
$key, # key
$query, # search query
0, # start results
10, # max results
"false", # filter: boolean to turn on/off automatic filtering
"", # restrict (string) , e.g. "linux"
"false", # safeSearch: boolean
"", # language restrict e.g. lang_de
"", # input encoding
"" # output emcoding
);

if(defined($result->{resultElements})) {
print join "\n",
"Found:",
$result->{resultElements}->[0]->{title},
$result->{resultElements}->[0]->{URL},
$result->{resultElements}->[0]->{snippet} . "\n"
}

print "\n The search took ";
print $result->{searchTime};
print "\n\n";
print "The estimated Number of results for your query is: ";
print $result->{estimatedTotalResultsCount};
print "\n\n";

Tuesday, July 14, 2009

SOAP implementations: Axis2 vs gSOAP vs PocketSOAP

SOAP is a lightweight protocol for the exchange of information in a decentralized, distributed environment. It is an XML based protocol that consists of three parts: an envelope that defines a framework for describing what is in a message and how to process it, a set of encoding rules for expressing instances of application-defined datatypes, and a convention for representing remote procedure calls and responses. There are many implementations of SOAP (different sources list up to 80 of them) and they differ in their support for class binding, ease of use and performance. Axis2, gSOAP and PocketSOAP are briefly discussed in the following.
  • Axis2

    Apache SOAP was originally developed by IBM alphaWorks and donated to the Apache Software Foundation. Axis2 is a Java package which uses its own object model (AXIOM) and Streaming API for XML (StAX) parsing in order to increase performance. Axis plugs into a servlet engine such as Tomcat: one simply adds an Axis directory hierarchy to the Tomcat WEB-INF directory and adjusts the CLASSPATH under which Tomcat runs. The Axis distribution also includes tools for mapping between Java (bean) classes and XML elements/attributes, as well as a TCP monitor in which one can see the SOAP messages going to and from a SOAP engine. Additionally, the Axis architecture gives the developer complete freedom to insert extensions into the engine for custom header processing, system management, or anything else imaginable. C/C++ implementations are also available. One of the key advantages of Axis2 is that it keeps logic and data in separate components. Users typically deal with two types of data models, static and runtime. The static data consists of AxisService, ServiceGroup, AxisOperation, AxisConfiguration and others. The dynamic data hierarchy consists of ConfigurationContext, ServiceGroupContext, ServiceContext, OperationContext and MessageContext. This also provides better support for concurrency handling.


  • gSOAP

    gSOAP is a cross-platform development toolkit for C and C++ SOAP XML Web services (SOAP 1.1/1.2, WSDL 1.1). gSOAP supports XML serialization of native C/C++ data types and includes a SOAP/XML engine, a web server, a stub/skeleton compiler, WSDL tools, and much more. The gSOAP tools provide a SOAP/XML-to-C/C++ language binding to ease the development of SOAP/XML Web services and client applications in C and C++. Most toolkits for C++ Web services adopt a SOAP-centric view and offer APIs that require the use of class libraries for SOAP-specific data structures, which often forces a user to adapt the application logic to these libraries. In contrast, gSOAP provides a transparent C/C++ SOAP API through the use of compiler technology that hides irrelevant SOAP-specific details from the user. The gSOAP stub and skeleton compiler automatically maps native and user-defined C and C++ data types to semantically equivalent XML data types and vice versa. As a result, full SOAP interoperability is achieved with a simple API that relieves the user from the burden of SOAP details, letting him or her concentrate on the application-essential logic. Additionally, gSOAP supports pure C, which is essential for many system-oriented applications developed in C. Last but not least, gSOAP uses predictive XML pull parsing, streaming media techniques and high-performance latency-hiding methods, which leads to performance that can surpass that of Java RMI and IIOP.


  • PocketSOAP

    PocketSOAP is an open source SOAP client Component Object Model (COM) component, originally targeted at the PocketPC, that is now developed for Pocket PC and Windows 95, 98, Me, NT 4.0 and 2000. PocketSOAP can make remote procedure calls to SOAP servers implemented with 4s4c, ROPE, Apache SOAP, SOAP::Lite, DM's SOAP/Perl and the XMethods SOAP Server. The package includes a Hypertext Transfer Protocol (HTTP) client for making HTTP based SOAP requests, but other transports can be added. PocketSOAP is distributed under the Mozilla Public License (MPL). PocketSOAP includes the following features: SOAP Section 5 encoding support, support for document/literal style SOAP services (such as ASP.NET), attachments support via both DIME and SOAP with Attachments, and HTTP 1.1 support including persistent connections, SSL, proxies, authentication, proxy authentication, redirects, cookies and compression.


Tuesday, July 7, 2009

Why the development of information systems is a complex task!

IT project failure is not a random event. Statistics (e.g. the Chaos Chronicles report of the Standish Group) show that over 75% of all IT projects fail. The consequences include decreased revenues, damage to reputation, exposure to legal liabilities, and decreased productivity. There are many reasons for these failures, and one of them is that, over the years, the development of information systems has become a very complex task. This complexity has many sources and aspects:
  • Today's business environment of stiff competition and very short time to market requires a continuous reduction in cost levels and an increased quality and service level to customers, while software continues to grow more complex. Many software suppliers cannot keep pace with the demand for new products, so they frequently set overly aggressive schedules for new product development. As a result, more often than not, quality suffers. Managing the balance between these tradeoffs is a big challenge.
  • In today's uncertain and unstable economic and technological climate, requirements on information systems change on a daily basis. Changing requirements entail rework in design and implementation and thus impact schedules and costs, which endangers the whole project.
  • Today, organizations invest heavily in information technology in order to improve their operational and strategic position, so the extent to which information systems (IS) play a part in organizations is increasing rapidly. The interaction between IT and the organization is very complex and influenced by many factors, including the organization's structure, operating procedures, politics, culture, environment and management decisions. Consequently, the main problem for an organization in achieving viability is the complexity and uncertainty exhibited by itself and its environment.
  • The distributed nature of today's enterprise requires a distributed IT infrastructure in an effort to ensure that employees have reliable access to business applications. This makes high availability and security of today's information systems critical, and their architecture has to support these increased user demands. Additionally, a distributed workforce requires different management techniques and skills to keep it motivated, productive and on track.
  • Last but not least, the multitude of different stakeholders (users, developers, architects, requirements engineers, customers, etc.) involved in information systems development requires consideration of competing objectives, risks and uncertainty.

Sunday, June 14, 2009

Usage and Purpose of XML Namespaces

The Problem:
According to the XML data model, an XML document is a hierarchy of nested elements, each consisting of a name and a set of attributes; the attributes in turn have a name and a value. All these tag names are defined by the developers. This freedom, however, comes with an inherent problem attached. Different people work in different domains, but the phraseology used can often be common. Applications make use of the elements' names and attributes to determine how to process the elements. In a distributed environment like the Internet this is rather problematic, as different people might use the same element names to mean different things. One XML document may use the element table to describe an HTML table, another one may use a table element to describe a piece of furniture, yet applications aren't smart enough to judge the difference between the contexts of elements from different markup languages that share the same name. Thus, due to the name collision and ambiguity, an application has no way of knowing how to process the table element.

Code Sample: HTML table element

<table>
  <tr>
    <td>Product</td>
    <td>Price</td>
  </tr>
  <tr>
    <td>Coffee Table</td>
    <td>199.99</td>
  </tr>
</table>


Code Sample: Furniture table element

<table sku="12222221">
  <type>Coffee Table</type>
  <price>199.99</price>
  <inStock>yes</inStock>
  <material>maple</material>
</table>

The Solution:
The XML namespaces recommendation defines a way to distinguish between duplicate element type and attribute names. It allows you to resolve ambiguity and avoid "collisions", so that schemas created by one organization will not conflict with those created by another. Just as two Java classes can have the same name as long as they are defined in separate packages, two XML elements can have the same name as long as they belong to different namespaces. When you place a set of tags into a namespace, the tags are given a context and the ability to retain a unique identity based on the context in which they are used. In other words, it becomes possible to use both of the table tags mentioned earlier, even though they are named identically, in the same document while retaining different meanings for the two tags.

A namespace is declared using the reserved XML attribute xmlns, the value of which must be a URI (Uniform Resource Identifier) reference, usually a URL. However, the URI has no semantic meaning and is not actually dereferenced; it is simply treated by an XML parser as a string. Using a URI to identify a namespace, rather than a simple string (such as "xhtml"), reduces the possibility of different namespaces using duplicate identifiers. The declaration can include a short prefix with which elements and attributes are identified, e.g. xmlns:xhtml="http://www.w3.org/1999/xhtml". After such a declaration, each element belonging to the specific namespace has to be qualified with the prefix. Doing this repeatedly for each element can be painful. In such cases, you can declare a default namespace instead. However, at any point in time there can be only one default namespace in effect. Declaring a default namespace means that any element within the scope of the default namespace declaration will be qualified implicitly, if it is not already qualified explicitly using a prefix. As with prefixed namespaces, a default namespace can be overridden too. The scope of an XML namespace declaration is that part of an XML document to which the declaration applies. An XML namespace declaration remains in scope for the element on which it is declared and all of its descendants, unless it is overridden or undeclared on one of those descendants.

Code Sample: HTML table and Furniture table with namespaces

<table xmlns="http://www.w3.org/tr/xhtml"
          xmlns:furn="http://www.furniture.org/tables">
  <tr>
    <td>Product</td>
    <td>Price</td>
  </tr>
  <tr>
    <td>
      <furn:table sku="1222221">
        <furn:type>Coffee Table</furn:type>
      </furn:table>
    </td>
    <td>
      <furn:table sku="1222221">
        <furn:price>199.99</furn:price>
      </furn:table>
    </td>
  </tr>
</table>
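
To also illustrate the scoping and overriding rules described above, here is one more small sample (constructed for illustration, reusing the namespace URIs from above): the inner table element redeclares the default namespace, so it and its children belong to the furniture namespace, while the outer table, tr and td elements remain XHTML elements.

Code Sample: Overriding the default namespace

<table xmlns="http://www.w3.org/1999/xhtml">
  <tr>
    <td>
      <table xmlns="http://www.furniture.org/tables" sku="1222221">
        <type>Coffee Table</type>
        <price>199.99</price>
      </table>
    </td>
  </tr>
</table>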

Saturday, June 6, 2009

Thoughts on COCOMO II used for software development projects.

COCOMO (COnstructive COst MOdel) was developed by Barry Boehm in the early seventies by collecting data from many projects to gather an empirical database of development efforts for the tasks included in these projects. Thus COCOMO provided the first solid data on the productivity of engineers in the workplace. In the nineties, Boehm launched the COCOMO II project, attempting to gather similar data on a much broader scale and to address some of the changes to software development processes and methodologies of the two decades since COCOMO was first introduced (e.g. prototyping, incremental development, component reuse, CASE tool support, etc.). The COCOMO II equation embeds many project parameters and is defined as follows:

Effort = A x Size^B x M,

where Effort refers to the person-months needed to complete the project; A represents the type of project (there are three possible values for this parameter); Size is defined using a SLOC estimate or a function point count; B is a derived metric that includes the sum of five cost driver metrics; and M is an effort multiplier. The COCOMO II equation defines seven effort multipliers for early life cycle estimating. One of the main difficulties in applying the COCOMO II technique is coping with the very broad solution space. Trying to perform an effort estimation with the COCOMO II method at an early project stage means that a product manager has 3 options for the project type, 5^5 options for the cost drivers, 5^7 options for the effort multipliers and a value for the Size parameter, or altogether about 730,000,000 different settings combinations, which is too much to review even for the most eager manager. Another problem with the COCOMO II technique is that it requires, already at a very early project stage, a project size estimate, which is somewhat paradoxical: if such an estimate existed, it would be fairly easy to formulate a reasonable effort estimate.
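As a rough illustration of how the equation is evaluated, here is a minimal Perl sketch. It uses the constants published with the COCOMO II.2000 calibration (A = 2.94, base exponent B = 0.91); the size, scale factor ratings and effort multiplier ratings below are invented purely for illustration:

Example: cocomo2.pl

#!/usr/local/bin/perl -w
use strict;

# COCOMO II: Effort (person-months) = A x Size^E x product(EM_i),
# with E = B + 0.01 x sum(SF_j).
# A = 2.94 and B = 0.91 are the published COCOMO II.2000 constants;
# all ratings below are illustrative placeholders.
my $A = 2.94;
my $B = 0.91;

my $size = 40;                                          # estimated size in KSLOC
my @scale_factors = (3.72, 3.04, 4.24, 3.29, 4.68);     # five scale factor ratings
my @effort_multipliers = (1.00, 1.10, 0.87, 1.00, 1.17, 1.00, 0.91); # seven EMs

my $sf_sum = 0;
$sf_sum += $_ for @scale_factors;
my $E = $B + 0.01 * $sf_sum;

my $M = 1;
$M *= $_ for @effort_multipliers;

my $effort = $A * ($size ** $E) * $M;
printf "Estimated effort: %.1f person-months\n", $effort;

Even this tiny sketch hints at the combinatorial problem described above: every one of the twelve ratings can take one of several values, and each combination yields a different estimate.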

Sunday, May 10, 2009

Siemens Rockville Climbing Centre

Last Thursday I was climbing for the first time. Now I understand why so many people from very diverse backgrounds dedicate so much effort to climbing. Beside the fact that it's just almost too much fun, climbing is great for building strength and flexibility. Since the complexity of climbing movement is almost infinite and much of the challenge involves problem solving, you tend not to get bored with your workout. You are flexing your mind as well as your muscles and learning teamwork, as roped climbing requires the coordinated effort of two people - every climber needs a belayer, someone to control the safety ropes while they climb.

Monday, April 20, 2009

Avenue Q on Broadway!

Yesterday I was on Broadway to see the Avenue Q musical. Avenue Q is hands down the most entertaining show I’ve watched so far. I laughed from beginning to end. Very highly recommended if you are visiting New York City - as in, don’t miss it. I bought the soundtrack immediately on iTunes and I’m still laughing. That might not sound too amazing, but seeing as how I don't remember the last time I actually bought a CD, I was moderately shocked by my own behavior.

Don’t expect elaborate sets or dancing here, but the puppetry is ingenious and the script simply excellent. It absolutely deserved the Tony Award for Best Musical over Wicked.

I know some of you expect Broadway shows to have amazing sets and costumes, but Avenue Q demonstrates that you don’t need any of that to have a great time.

Avenue Q" is a old show, but it's still the hot ticket on Broadway. It’s "Sesame Street" meets "South Park" with humans and puppets interacting in a familiar way on…well…unfamiliar topics. The song titles include "If You Were Gay," "Everyone’s A Little Bit Racist," "The Internet Is For Porn," "You Can Be as Loud as the Hell You Want (When You're Makin' Love)" and "Schadenfreude".

If you have RealPlayer you can see parts of some of the songs here: "IF YOU WERE GAY", "THE INTERNET IS FOR PORN", "EVERYONE'S A LITTLE BIT RACIST".

Saturday, March 14, 2009

Mac OS X - Sends other unix boxes to /dev/null

Just today, I was referred to as being “fanboyist at its finest” when it comes to Apple and Mac OS X. And I decided this topic deserves a post all of its own and would be a great start for my new blog ...

Yes, indeedy! I am a Mac OS X and Apple fan all the way... and I really can't understand why some people always have to start bitching when they hear Mac OS X... especially people who have never seen or played with this awesome OS. Apple is now the LARGEST UNIX vendor in the world and the second most innovative IT company, right after Google! So what's wrong with being a fan and getting angry when somebody talks drivel?

Mac OSX & Open Source

Mac OS X is a great OS - it preserves the strength and spirit of BSD, but also adds value and improves the base. To many, Darwin/Mac OS X is the "fifth BSD", and apart from a few architectural differences, i.e. the Mach kernel, which I will come back to later, Darwin/Mac OS X is as compatible as possible with FreeBSD (which Apple uses as a reference platform). Additionally, Apple has “opensourced” the whole core of Mac OS X (the userland and the kernel), and now others have a real say in how the operating system evolves. There are instructions from Apple on how to build your own kernel and keep things neat and organized. With its BSD heritage, OS X is not some newbie operating system. It is a mature, extensible one. On Mac OS X you actually have two parts to the overall system: Aqua, helping define the Appleness of the OS graphical user experience, and Darwin, the core UNIX foundation the system is built on.

Apple is even somewhat behind the MacPorts project, which brings a second-generation system for the building, installation and management of third-party software - very similar to the FreeBSD ports system, but better! If that is not enough, there are still Gentoo's Portage, FreeBSD ports and NetBSD pkgsrc available for OS X, and if you want binaries instead of compiling everything from source there is the Fink project, which also ports open-source applications to Mac OS X but uses the Debian package management system apt. X11 (a windowing system that originated at MIT) is also available as part of Mac OS X. With X11 you get the output of 20+ years of other folks grinding out useful tools and applications -- commercial and otherwise. In case you need any other software, most packages available on other flavors of UNIX have been ported to OS X.

And YES, Apple does contribute to the Open Source community, too. Look at how WebCore developed. When Apple created Safari, it used the open source KHTML rendering engine. When, in Mac OS X 10.4, it replaced KHTML with WebCore, the Open Source community was offended. Of course, the GPL license would allow Apple to do this. I saw an article from an Apple developer saying, “Hey! Wait a minute. Apple has vastly improved KHTML in WebCore.” And a few days later, Apple released WebCore’s source code. So much for Apple and Open Source.


Apple & Commercial Software

One of the things I learned here is that commercial software is not always a bad thing. When you decide your time and level of productivity outweigh the cost of acquiring OSS or commercial software, you make the investment. And there are applications for OS X that you poor Unix/Windows/Solaris/whatever_else users have never dreamed of :)! There are students who will take the time needed to make their Linux installations productive environments. However, I'd wager the time needed to load and configure an equivalent amount of productivity tools on Mac OS X or Windows is less than on Linux. And the number of people willing to invest that extra time is much smaller than those willing to get a Mac or Windows system. I mean, who wants to work with this crappy OpenOffice.org anyway :).

Before you come up with ways to contradict me, let me say that I do agree that Linux and BSD are great OSes too. I myself used them both for many, many years and that’s why I know they are a great choice as a server, just not as the preferred desktop productivity environment -- at least not quite yet (though Ubuntu is acquiring a taste for it). To summarize this section: because it's aimed at consumers, OS X has no problems in some of the areas where Linux/BSD still lag, such as multimedia performance and productivity tools. Some Linux users must run a separate PC for these tools, but the Mac user doesn’t have to.

OS X as a development platform

When it's time to work on programming projects for school or personal curiosity, install the developer tools, gratis in OS X. GCC, Java, Perl, Python, Objective-C, and Ruby are all at your fingertips. About the only languages or compilers I can think of not included are Fortran and C#.

Anyway, Mono and gcc-fortran can be installed through one of the many package managers I already mentioned. Once you're in the Terminal program for command-line access, you'll also find the familiar GNU tools, as would be found on most UNIX systems. There are even open source x86 emulators, such as BOCHS, if you want to spend time tweaking your own emulator instead of purchasing Virtual PC. Essentially, the visually stunning Aqua user interface doesn't prevent the inquisitive user from going to town customizing the system and developing tools just as other UNIX users do.

The OSX kernel: XNU

Here is the article a colleague told me about today: “Next Version of Mac OS X to use Linux Kernel”. I hope this is a hoax. Linux would make a really bad kernel for Mac OS X. It would turn the Mac development world upside down unnecessarily, and there's nothing inherently bad about Mach anyway. The people who claim that XNU is inherently several times slower than Linux are either uninformed, liars or idiots.

Apple has better things to do than chase fleeting buzz. The XNU kernel works, and Mach gives XNU a competitive advantage through its unique design; it is worth investing in. Apple intends to build on Mac OS X for 15 years, and they're comfortable with the foundation they've designed.

Sure, there's plenty of room for optimization throughout Mac OSX, as there is with any software product. In that regard, Apple can import technology from the various BSDs, each with its different focus. In contrast, Linux has one codebase, and a general focus that isn't closely aligned with Apple's.

Mach's capabilities or limitations are a red herring, as a quick glance at Wikipedia would show. XNU is a heavily hacked hybrid of Mach 3.0 and 4.4BSD (FreeBSD in practice), and bears almost no resemblance to the original NeXT Mach.

On the other hand, Apple announced that the new Leopard will be FULLY POSIX COMPLIANT. In fact, the beauty of the POSIX environment is that one could switch between different kernels with little effort and still build a stable and secure OS, which is what makes this OS design of separating the kernel from the userland tools such an exciting proposition. So the announcement of Leopard being FULLY POSIX compliant might be an indication of a possible kernel switch. Apple has already announced its wish to make Solaris ZFS the default FS for the 10.6 release, so I can’t be sure that the kernel switch is just a rumor!

Conclusion

I've been a user of Windows, Linux, several BSDs, Solaris and finally Mac OS X throughout the years. From what I've seen in the past few years, Mac OS X offers the best combination to satisfy your inner geek and get your day-to-day tasks out of the way. Sometimes you want to explore, other times you just want to avoid hassles and get your work done. Mac OS X is the place I now call home. I think many others would be happy here too if they realized the benefits this platform provides.