Wednesday, November 19, 2008

(Software) Agents and the Semantic Web

I have been thinking about why the concept of a software agent has not taken off, and why people and companies do not use agents as widely as they use other technologies. Two reasons come to mind.

First, there is the security issue. How does one know that a software agent has not been tampered with, that the information it carries is secure enough, or that the other agents it meets are "good" agents rather than "bad" ones? Since the concept of an agent includes the property of being autonomous, how can we restrict the agent in its actions and decisions without greatly reducing that autonomy?

Second, the job of a (software) agent is to talk to other systems, gather and interpret data, make decisions based on that data, and present the results back to the user in a readable and useful way.

While the first issue may already have been largely addressed, the second has not. In my opinion, the way to overcome it is to use the Semantic Web. If you add meaning to data, agents can interpret it and make proper decisions without having to ask the user for guidance at every step (which would cost them their effectiveness and some of the main properties they are supposed to possess, such as autonomy). I am sure this is not a novel idea, and that people have already thought about it, but to me it makes sense. I envision an "Agent Store" or "Agent Market" where people could "rent" or "buy" agents to fulfill their immediate or long-term needs, such as paying all their utilities (power, phone, internet, cable, credit cards, etc.) or scheduling a doctor's appointment.
Wouldn't this be nice?
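
To make the idea a bit more concrete, here is a minimal sketch in Python of an agent reading semantically annotated data with the rdflib library and deciding, on its own, which bills to pay. The ex: vocabulary, the bill data, and the "pay anything due within ten days" rule are all invented purely for illustration.

```python
# A toy sketch: an "agent" reads semantically annotated billing data (RDF)
# and decides which bills to pay, without asking the user what each field means.
# The ex: vocabulary and the decision rule below are invented for illustration.
from datetime import date
from rdflib import Graph

BILLS_TTL = """
@prefix ex: <http://example.org/> .
ex:powerBill  a ex:UtilityBill ; ex:amountDue 82.50 ; ex:dueDate "2008-11-25" .
ex:cableBill  a ex:UtilityBill ; ex:amountDue 45.00 ; ex:dueDate "2008-12-15" .
"""

g = Graph()
g.parse(data=BILLS_TTL, format="turtle")

# The agent's "decision": pay every utility bill due within the next ten days.
query = """
PREFIX ex: <http://example.org/>
SELECT ?bill ?amount ?due WHERE {
    ?bill a ex:UtilityBill ; ex:amountDue ?amount ; ex:dueDate ?due .
}
"""
today = date(2008, 11, 19)
for bill, amount, due in g.query(query):
    due_date = date.fromisoformat(str(due))
    if (due_date - today).days <= 10:
        print(f"Agent decides to pay {bill} ({amount}) due {due}")
```

Because the data carries its meaning with it (this thing is a utility bill, this is its amount, this is its due date), the agent does not need the user to explain each website's billing page; it just acts on the semantics.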

Tuesday, November 18, 2008

Frequently Forgotten Fundamental Facts about Software Engineering

An interesting article by Robert L. Glass, originally published in 2001 in vol. 18 of IEEE Software. The author writes about frequently "forgotten fundamental" facts of software engineering.

Wednesday, November 12, 2008

Self-Healing Hulls

The November 2008 issue of IEEE Spectrum has an article about self-healing hulls: the carbon-fiber composite hull of a yacht can largely heal itself after a collision, and the healing process can be improved by passing a small electric current through the material. The work is being done by Eva Kirkby, a graduate student at EPFL. The main idea is that the composite consists of carbon fibers and epoxy; the problem is that on impact these materials tend to separate internally, causing cracks parallel to the surface of the material. To counter this, the material is infused with hundreds of tiny bubbles filled with liquid monomer plus small particles of catalyst; when a crack ruptures the bubbles, the monomer meets the catalyst and hardens, sealing the damage. To keep the concentration and size of the bubbles to a minimum, Kirkby also incorporated into the composite wires of a shape-memory alloy, an alloy that returns to its initial shape after being deformed when heat (from an electric current) is applied to it. Great idea!

Dilbert's self-aware comic strip

In my recent posts I kept mentioning biologically inspired concepts such as self-awareness, self-healing, etc. Here is a comic take on self-awareness:


I love it!

Monday, November 10, 2008

Do agile teams model or write documentation?

To be honest, I had the misconception that agility and modeling are at opposite poles, and that agile teams write little or no documentation. That all changed once I read an article in Dr. Dobb's Journal. One of the reasons agile teams do up-front modeling is "to answer questions around the scope that they're addressing, the relative cost and schedule, and what their technical strategy is." Another reason is to better grasp and manage the complexity of the system architecture.

Some of the agile modeling and documentation best practices mentioned are to do "some initial requirements and architecture envisioning early in the project, to write executable specifications via a Test-Driven Development (TDD) approach, to single source information whenever possible, to write documentation later in the lifecycle, to promote active stakeholder participation, to implement requirements in priority order, to include modeling in iteration/sprint planning activities, to create models and documents that are just barely good enough for the situation at hand, to model storm the details on a just-in-time (JIT) basis, to sometimes model a bit ahead to explore complex requirements, and to take a multiview approach via multiple models".
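
The "executable specifications via TDD" practice is easiest to see with a small example. The sketch below is my own illustration, not taken from the article: a hypothetical apply_discount function whose test is written first and serves as the specification.

```python
# A minimal illustration of an executable specification: the tests below are
# written first and document the intended behaviour of apply_discount, which
# is then implemented just far enough to make them pass.
# The function and its business rule are invented for this example.
import unittest


def apply_discount(total, customer_years):
    """Loyal customers (5+ years) get 10% off; everyone else pays full price."""
    return round(total * 0.9, 2) if customer_years >= 5 else total


class DiscountSpec(unittest.TestCase):
    def test_loyal_customer_gets_ten_percent_off(self):
        self.assertEqual(apply_discount(100.0, customer_years=6), 90.0)

    def test_new_customer_pays_full_price(self):
        self.assertEqual(apply_discount(100.0, customer_years=1), 100.0)


if __name__ == "__main__":
    unittest.main()
```

The test class is the specification: it is readable as documentation, yet it runs, so it never silently drifts out of date the way a written spec can.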

One common complaint about agile methodologies is that they cannot be applied to large-scale projects and large development teams; for those kinds of projects, plan-driven, model-based approaches are considered better suited. To achieve that kind of scalability, there is an agile version of Model Driven Development (of which MDA is one example), called AMDD, or Agile Model Driven Development. The difference is that instead of creating extensive models, you create agile models. Furthermore, with AMDD you do just a little modeling, followed by a lot of coding.

Tuesday, November 4, 2008

Controlled Chaos

In the December 2007 issue of IEEE Spectrum, an article entitled "Controlled Chaos" describes a new generation of algorithms based on the thermodynamic concept of entropy, which is a measure of how disordered a system is. Because malicious code changes the flow of data in the network, it alters the network's entropy. The new malicious threat, called Storm, uses various ways to install itself on the host machine, mostly through email attachments. How do we protect networks? The first step is to know how traffic moves around the network. Collecting such data from nodes in the network is possible because routers and servers can be configured to provide information about the traffic in the form of source and destination IP addresses, source and destination port numbers, the size of the packets transmitted, and the time elapsed between packets. Information about the routers themselves is also collected. The proposed algorithms use this information to build a profile of the network's normal behavior. The authors stress that the entire network is monitored, not just a single link.
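
To make the kind of flow data the authors describe a bit more concrete, here is a rough sketch (my own, not from the article) of what a flow record and a baseline profile of normal traffic might look like; the field names are assumptions.

```python
# Rough sketch of the flow data described above: each record carries the
# source/destination IPs and ports, packet size, and inter-arrival time.
# A "baseline" is simply a histogram of how often each value of a field
# is seen during normal operation. The field names are my own invention.
from collections import Counter
from dataclasses import dataclass


@dataclass
class FlowRecord:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    packet_bytes: int
    inter_arrival_ms: float


def build_baseline(records, field):
    """Count how often each value of `field` appears in normal traffic."""
    return Counter(getattr(r, field) for r in records)
```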

The principle behind the entropy-based algorithms is that "Malicious network anomalies are created by humans, so they must affect the natural "randomness" or entropy that normal traffic has when left to its own devices. Detecting these shifts in entropy in turn detects anomalous traffic." Once the network has established patterns, any behavior that differs from its normal states can be detected easily. Even if the malicious code does nothing more than download pictures from the internet, the fingerprint of the network will look unusual, different from what is expected and from how the network has been used. The authors make an interesting point, namely that Internet traffic has both uniformity and randomness; a worm alters both, making the traffic either more random or more structured. In the case of the 2004 Sasser attack, the information entropy associated with the destination IP addresses rose suddenly, indicating an increase in the randomness of traffic destinations due to the scanning initiated by the infected machines as they looked for new victims. At the same time, the entropy associated with the source IP addresses dropped suddenly, indicating a decrease in randomness as the already infected computers initiated a higher than normal number of connections. The conclusion is that the network enters a new internal state not seen before, and hence one that is easily detectable.
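
The entropy measurement itself is easy to sketch. Reusing the hypothetical FlowRecord from the previous snippet, the idea would be to compute the Shannon entropy of a field, such as the destination IP, over a time window and flag windows that drift too far from the baseline; the threshold logic here is purely illustrative, not the authors' algorithm.

```python
# Shannon entropy of a traffic feature over a window of flow records.
# During a Sasser-style scan, entropy over dst_ip rises (more random targets)
# while entropy over src_ip drops (a few infected hosts dominate the traffic).
# The tolerance threshold is illustrative, not taken from the article.
import math
from collections import Counter


def shannon_entropy(values):
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_anomalous(window, field, baseline_entropy, tolerance=1.0):
    """Flag the window if entropy of `field` drifts too far from normal."""
    observed = shannon_entropy(getattr(r, field) for r in window)
    return abs(observed - baseline_entropy) > tolerance
```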

The Storm worm I mentioned at the beginning works, in some respects, like other worms: new code is placed on the computer (because the user clicks on some attachment), which makes it join a botnet. However, there are distinct differences between older worms and Storm. One of them is the way it gets the user to click the attachment, such as a clever email subject line or attachment name tied to hot topics currently in the news, such as elections, hurricanes, major storms, etc. Most importantly, Storm hides its network activity. It first looks at what ports and protocols the user is already using. If it finds a P2P program, such as eMule, Kazaa, or BitComet, it will use that program's port and protocol to do its network scanning. Storm will also look at which IP addresses the P2P program has communicated with, and will communicate with those instead of new IP addresses, which would trigger its detection. Furthermore, Storm does not spread as fast as it can, because it alternates between a dormant mode and an active mode: it gathers information for a short period, then goes quiet. It is very interesting that Storm actually tailors its behavior to the pattern of network usage. How do we detect Storm? The worm will still alter the network's entropy. For example, during its active period the host computer will send many emails, which is unusual for normal use; in addition, the port used is not 25. All of these are hints that something is wrong inside the network.
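
As a last hedged sketch (again my own, not the authors'), the "many emails, and not on port 25" hint could be turned into a crude per-host check along these lines, reusing the hypothetical FlowRecord and baseline from the earlier snippets:

```python
# Crude heuristic inspired by the detection hints above: during Storm's active
# phase a host suddenly initiates far more outbound connections than its own
# history suggests, and none of that mail-like traffic uses the standard SMTP
# port 25. The factor and the use of per-host flow counts are my assumptions.
from collections import Counter


def burst_hosts(window, baseline_counts, factor=5):
    """Return hosts whose outbound flow count jumps well above their baseline,
    with none of the traffic on port 25."""
    current = Counter(r.src_ip for r in window)
    flagged = []
    for host, n in current.items():
        normal = baseline_counts.get(host, 1)
        uses_port_25 = any(r.src_ip == host and r.dst_port == 25 for r in window)
        if n > factor * normal and not uses_port_25:
            flagged.append(host)
    return flagged
```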

A great article! Nothing short of what I have come to expect from IEEE Spectrum.

Monday, October 27, 2008

On System Design

In an essay entitled "On System Design", Jim Waldo, a distinguished engineer at Sun Microsystems, shares his ideas and opinions on system design, including the factors that hinder the effort necessary to do it well.

Some of the techniques that aid in the process of good system design are Six Sigma and UML. The latter helps in the development of a good design, but it is you (as an individual or a team) who has to come up with the design; UML only helps you express that design better. Many approaches have been found to work, but there is no general approach that works on all projects. Depending on the project, you have to decide which approach best suits your needs. The only exception, as Brooks observed, is that good designs have one thing in common: they are produced by good designers. Jim Waldo explains this in terms of probability and statistics: those who have produced a good design in the past are far more likely to produce a good design in the future. Furthermore, designers have to be able to deal with ambiguity while creating the design, and to accept that a question that cannot be answered right away will eventually be answered at the appropriate time. The author argues that instead of learning a process of design, it is better to learn how to design well. The best way to learn how to design involves two factors:

1. Trying, failing, and then trying again.
2. Being under the guidance of someone who is an expert at the task.

It is the way of thinking that matters most in system design, rather than any subject matter that can be taught in a particular course. Interestingly, the author mentions that system design cannot be learned in one course, but should instead be taught through a much longer process, similar to an apprenticeship. The kind of relationship needed between a master and an apprentice is found at the graduate level, where each student has an adviser with whom they are involved in academic research. Another place is at work (in industry), where you get hands-on insight into system design. I would argue that at work you are more constrained in terms of time; hence, there is less opportunity to learn system design.

Diversity in the sense of academic backgrounds (physics, philosophy, anthropology, computer science, music, etc.), and how such diversity creates different views on a given problem, is described as beneficial. I would particularize this to software development in the sense of not trying to think of a solution to a problem in terms of a particular programming language, but instead designing the solution independently of the software platform, and only afterward thinking about which language can best express that design (similar to the MDA concepts).

The problem with doing design up front is that no code is produced immediately. The pressure to produce something that can be seen, or to measure productivity by the number of lines of code written, still exists. What is needed is "an act of faith by management". Waldo continues by saying that "if you find a manager who is actually willing to give you time to do the design task, stick with him or her. He or she is a treasure much rarer than gold."

The paper also mentions intellectual property (as a note on why system designs are no longer shared among peers), and how any patent has the same value as any other patent. For me this is unbelievable; it is equivalent to saying that the ideas behind the patents are the same, namely that they have the same value. The quality of an idea is not taken into consideration at all by the patent system. Furthermore, in the case of patent litigation, the jury is made up of ordinary (non-technical) people, not technical peers. Hence, "it is very hard to learn about good system design unless you can see and study other system designs, both good and bad. The intellectual property atmosphere in industry has limited the number of designs that are actually talked about, and has convinced many system designers that they should not even look at the designs that are available."

Agile methodologies and open source projects are offered as ways of addressing the lack of system design in academia and industry, because they encourage discussion of a system's design in several ways: the communication between two programmers (or a small group) helps clarify and deepen the understanding of a system's design, and communicating the design to the other members of an open source project helps clarify it further. It is interesting that the author views both agile methodologies and open source projects as, in part, a reaction to the difficulty of doing system design in academia and in industry. Overall, the paper makes for inspiring reading; you find yourself nodding and agreeing with its statements as you read.