Planet Internet of Services (light version without commits)

March 20, 2014

Effingo: Near Duplicate Detection

Joining the Ubuntu App Showdown

I am participating in the Ubuntu App Showdown and did some short brainstorming about possible apps. Continue reading

by klemens at March 20, 2014 03:30 PM

July 12, 2013

Dynvoker/SPACE Research

Deadline approaching: 6th IEEE/ACM International Conference on Utility and Cloud Computing

UCC 2013 Banner

Do you have original and sound research results concerning compute and storage clouds, distributed computing, crowdsourcing and human interaction with clouds, utility, green and autonomic computing, scientific computing or big data? (Or do you know people who surely have?)

UCC 2013, the premier conference on these topics with its six co-located workshops, welcomes academic and industrial submissions to advance the state of the art in models, techniques, software and systems. Please consult the Call for Papers and the complementary calls for Tutorials and Industry Track Papers as well as the individual workshop calls for formatting and submission details.

This will be the 6th UCC in a successful conference series. Previous events were held in Shanghai, China (Cloud 2009), Melbourne, Australia (Cloud 2010 & UCC 2011), Chennai, India (UCC 2010), and Chicago, USA (UCC 2012). UCC 2013 takes place while cloud providers worldwide are adding new services and increasing utility at a high pace, and is therefore highly relevant to both academic and industrial research.

UCC 2013 Website

July 12, 2013 04:28 PM

April 13, 2012

Effingo: Near Duplicate Detection

Deploying a JSF 2.0 Web Application using EL 2.2 with Glassfish 3, Jetty 8 and Tomcat 6 and 7 as Maven 2 and 3 Project

I tried for two days to deploy the first prototype of Effingo to my Ubuntu Server 10.04, which still runs Tomcat 6. Effingo is a JSF 2 web application using Expression Language in version 2.2, which is not supported by Tomcat 6. It would work with Tomcat 7, but since that server also runs several other critical web applications, an upgrade is not an option. So I searched the web for a solution and was shocked by the huge amount of outdated, wrong or confusing information on the topic. Therefore I will give a step-by-step explanation of how to achieve this task. While trying this I also got the whole project to run on an embedded Glassfish 3 web application container and updated my embedded Jetty server to the most recent version. I will explain how to achieve this too. That way it is easy to quickly test a web application on several different application servers.

This tutorial is based on the best tutorial I could find on the web, which is located here, and on the only working solution for getting JSF 2.0 applications to work with Tomcat 6, which is located here.

I used the helloworld project developed by Dimitar Makariev in his tutorial. This is some simple code using basic JSF 2.0 features. Most important for me was the ability to use parameters inside Expression Language statements like

#{hi.greetFrom('initial page')}
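
For reference, here is a minimal sketch of a managed bean that such an expression could target. The bean name hi and the method greetFrom are taken from the expression above; the class name and the greeting text are just placeholders:

import javax.faces.bean.ManagedBean;

@ManagedBean(name = "hi")
public class HelloBean {

    // Called from the view as #{hi.greetFrom('initial page')};
    // passing a String argument like this requires EL 2.2.
    public String greetFrom(String origin) {
        return "Hello from " + origin + "!";
    }
}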

You can download this project as a zip archive or from an SVN repository.

You can already try to run this project using one of the following three commands:

$ mvn package tomcat:run-war
$ mvn package jetty:run-war
$ mvn package -Pglassfish

All of them will fail with some error. To solve these issues we need to edit the dependencies in the pom.xml and some entries in web.xml so our application is able to find the correct classes. Let's look at the modified pom.xml first.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

 <modelVersion>4.0.0</modelVersion>
 <groupId>com.googlecode.sandcode</groupId>
 <artifactId>helloworld</artifactId>
 <packaging>war</packaging>
 <name>${project.artifactId}</name>
 <url>http://dmakariev.blogspot.com/2009/12/jsf-20-with-maven-2-plugins-for.html</url>
 <version>1.0</version>
 <developers>
  <developer>
   <name>Dimitar Makariev</name>
   <email>dimitar.makariev at gmail.com</email>
   <url>http://dmakariev.blogspot.com/</url>
  </developer>
 </developers>

 <build>
  <defaultGoal>package</defaultGoal>
  <sourceDirectory>src/main/java</sourceDirectory>
  <finalName>${project.artifactId}</finalName>
  <plugins>
   <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <configuration>
     <source>1.5</source>
     <target>1.5</target>
    </configuration>
   </plugin>
  </plugins>
 </build>

 <repositories>
  <repository>
   <id>java.net.m2</id>
   <url>http://download.java.net/maven/2</url>
  </repository>
  <repository>
   <id>java.net.glassfish.m2</id>
   <url>http://download.java.net/maven/glassfish</url>
  </repository>
 </repositories>

 <pluginRepositories>
  <pluginRepository>
   <id>java.net.glassfish.m2</id>
   <url>http://download.java.net/maven/glassfish</url>
  </pluginRepository>
 </pluginRepositories>

 <profiles>
  <profile>
   <id>default</id>
   <!-- Tests are disabled by default. See the test profile -->
   <activation>
    <activeByDefault>true</activeByDefault>
   </activation>
   <build>
    <defaultGoal>install</defaultGoal>
    <plugins>
     <!-- Embedded Jetty (jetty:run-war) -->
     <plugin>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty-maven-plugin</artifactId>
      <configuration>
       <!-- force friendly name instead of artifact name + version -->
       <contextPath>${project.build.finalName}</contextPath>
       <!-- This parameter will auto-deploy modified classes. -->
       <!-- You can save changes in a file or class and refresh your browser to view the changes. -->
       <scanIntervalSeconds>3</scanIntervalSeconds>
      </configuration>
     </plugin>

     <!-- Embedded Tomcat (package tomcat:run) -->
     <!-- Standalone Tomcat (package tomcat:deploy) -->
     <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>tomcat6-maven-plugin</artifactId>
      <version>2.0-beta-1</version>
      <configuration>
       <path>/${project.build.finalName}</path>
       <!-- Embedded port -->
       <port>8080</port>
      </configuration>
     </plugin>
    </plugins>
   </build>
   <dependencies>
    <dependency>
     <groupId>com.sun.faces</groupId>
     <artifactId>jsf-api</artifactId>
     <version>[2.0.1,)</version>
    </dependency>
    <dependency>
     <groupId>com.sun.faces</groupId>
     <artifactId>jsf-impl</artifactId>
     <version>[2.0.1,)</version>
    </dependency>
    <dependency>
     <groupId>javax.el</groupId>
     <artifactId>el-api</artifactId>
     <version>2.2</version>
     <scope>provided</scope>
    </dependency>
    <dependency>
     <groupId>org.glassfish.web</groupId>
     <artifactId>el-impl</artifactId>
     <version>2.2</version>
     <scope>runtime</scope>
    </dependency>
    <dependency>
     <groupId>javax.servlet</groupId>
     <artifactId>servlet-api</artifactId>
     <version>2.5</version>
     <scope>provided</scope>
    </dependency>
    <dependency>
     <groupId>javax.servlet.jsp</groupId>
     <artifactId>jsp-api</artifactId>
     <version>2.1</version>
     <scope>provided</scope>
    </dependency>
    <dependency>
     <groupId>javax.servlet</groupId>
     <artifactId>jstl</artifactId>
     <version>1.2</version>
     </dependency>
    </dependencies>
   </profile>
   <!-- embedded Glassfish v3 ( -Pglassfish ) -->
   <profile>
    <id>glassfish</id>
    <build>
     <defaultGoal>package</defaultGoal>
     <plugins>
      <plugin>
       <groupId>org.glassfish</groupId>
       <artifactId>maven-embedded-glassfish-plugin</artifactId>
       <version>3.1.1</version>
       <configuration>
        <app>${project.build.directory}/${build.finalName}.war</app>
        <port>8080</port>
        <contextRoot>${build.finalName}</contextRoot>
        <autoDelete>true</autoDelete>
       </configuration>
       <executions>
        <execution>
         <phase>package</phase>
         <goals>
          <goal>run</goal>
         </goals>
        </execution>
       </executions>
      </plugin>
     </plugins>
    </build>
    <dependencies>
     <dependency>
      <groupId>javax</groupId>
      <artifactId>javaee-web-api</artifactId>
      <version>6.0</version>
     </dependency>
    </dependencies>
   </profile>
  </profiles>
</project>

I applied the following modifications to make it run:

  • Renamed the artifactId of the Jetty plugin from maven-jetty-plugin to jetty-maven-plugin. The latter uses the most recent Jetty version (8 instead of 6) as of March 2012.
  • I removed the connectors section since it does not work with the new Jetty plugin and is not necessary anyway.
  • I changed the artifactId of the Tomcat plugin from tomcat-maven-plugin to tomcat6-maven-plugin, since the new version of the plugin is split into one plugin for Tomcat 6 and one for Tomcat 7. If you need both, just declare both plugins. For running you just add the number to the Maven mojo, for example tomcat6:run.
  • I also added a version tag for the newest release of the Tomcat plugin, which is 2.0-beta-1. Although this is a beta, it runs pretty stably.
  • I added scope provided to the el-api dependency. I'll explain this step later.
  • According to one of the comments on the tutorial this article is based on, I added the jstl 1.2 dependency. This is necessary because the server will not be able to interpret the jstl namespace otherwise, and you will most likely need JSTL for any slightly more complex project anyway.
  • To enable embedded Glassfish support I updated the version of the embedded Glassfish plugin from 3.0 to 3.1.1. Version 3.0 contains a bug that prevents our project from running.
  • Also according to some comments on the original tutorial's page, I changed the Glassfish el-api dependency to javaee-web-api. This includes all dependencies necessary for the Glassfish web profile, which mirrors the functionality of Tomcat and Jetty on Glassfish.

The second important file is the web.xml. After my modifications it looks like:

<?xml version="1.0" encoding="UTF-8"?>
<web-app version="2.4"
              xmlns="http://java.sun.com/xml/ns/j2ee"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">
 <!-- Faces Servlet -->
 <servlet>
  <servlet-name>Faces Servlet</servlet-name>
  <servlet-class>javax.faces.webapp.FacesServlet</servlet-class>
  <load-on-startup>1</load-on-startup>
 </servlet>

 <!-- Faces Servlet Mapping -->
 <servlet-mapping>
  <servlet-name>Faces Servlet</servlet-name>
  <url-pattern>*.jsf</url-pattern>
 </servlet-mapping>

 <!-- explicitly setting the EL expression factory; otherwise EL does not work correctly under Tomcat and Jetty -->
 <context-param>
  <param-name>com.sun.faces.expressionFactory</param-name>
  <param-value>com.sun.el.ExpressionFactoryImpl</param-value>
 </context-param>

 <listener>
  <listener-class>com.sun.faces.config.ConfigureListener</listener-class>
 </listener>

 <!-- welcome file mapping -->
 <welcome-file-list>
  <welcome-file>index.jsp</welcome-file>
 </welcome-file-list>

</web-app>

In detail I applied the following modification:

  • Added the ConfigureListener. This is necessary for Glassfish to load the Faces Servlet correctly.

That was the only necessary modification to web.xml. Jetty and Glassfish now run fine as embedded versions. The only server still not working is Tomcat 6. I did not manage to make it run as an embedded version, but fortunately there is also an mvn tomcat6:deploy mojo that deploys an application to a running Tomcat 6 server.

To get the deployment mojo to work you need to exchange the el-api.jar in the Tomcat 6 lib folder for the updated 2.2 version. There are many tips on the web telling you that you also need to exchange the implementation (jasper-el.jar). Don't do this. The implementation comes bundled with your web app. Just exchange the el-api.jar for an updated version (I got mine from my local Maven repository as el-api-2.2.jar) and set the scope of the el-api dependency to provided as explained above. That should do the trick. In addition, el-impl.jar in version 2.2 is added as a Maven dependency with scope runtime to the Jetty/Tomcat profile of the pom.xml. That way we still cannot run tomcat6:run, but we can at least deploy to a running Tomcat instance. To run an embedded Tomcat it would be necessary to somehow tell that embedded version to exchange its el-api.jar, which I have not figured out yet.

by klemens at April 13, 2012 08:38 PM

First prototype online

I have built a first prototype of the Effingo system. The layout is not final and not all of the algorithms are implemented yet. The prototype is able to search within questions from the SAP Developer Network Forum and the Oracle Developer Network Forum and deliver answers for the questions found. The questions are not grouped yet, so there might be duplicates. This will be changed in the next update. Other features still on my roadmap are:

  1. Giving a probability of how likely it is that a question without an answer will receive one in the near future. That way it is possible to decide whether to wait for an answer or to repost the question somewhere else in one's own words.
  2. Integration of user feedback rating the answers the system provides.
  3. Summaries of grouped questions.
  4. Support for sending questions directly to an appropriate forum if no matching question is found.
  5. Integration of a much larger data base using other programming-focused support forums (especially for Java) like Stack Overflow, java.net and CodeRanch.

The prototype is available under: http://www.effingo.de/effingo

by klemens at April 13, 2012 05:50 PM

March 09, 2012

Dynvoker/SPACE Research

Know the Service - Know the Score

Choice is good for users, but when too much choice becomes a problem, smart helpers are needed to choose the right option. In the world of services, there are often functionally comparable or even equivalent offers, with differences only in some aspects such as pricing or fine-print regulations. Especially when using services in an automated way, e.g. for storing data in the cloud underneath a desktop or web application, users couldn't care less about which services they use as long as the functionality is available and the selected preferences are honoured (e.g. not storing data outside of the country).

The mentioned smart helpers need a brain to become smart, and this brain needs to be filled with knowledge. A good way to represent this knowledge is through ontologies. Today marks the launch of the consolidated WSMO4IoS ontology concept collection, which contains this knowledge specifically for contemporary web services and cloud providers. While still in its infancy, additions will happen quickly over the next weeks.

One thing which bugged me when working with ontologies was the lack of a decent editor. There are big Java behemoths available, which can certainly do magic for every angle and corner allowed by the ontology language specifications, but a small tool would be a nice thing to have. And voilà, the idea for the wsmo4ios-editor was born. It's written in PyQt and is currently, while still far from being functional, 8.3 kB in size (plus some WSML-to-Python code which could easily be generated on the fly). The two screenshots show the initial selection of a domain for which a service description should be created, and then the editor view with dynamically loaded tabs for each ontology, containing the relevant concepts, relations (hierarchical only) and units.

![wsmo4ios-editor selection](http://serviceplatform.org/cgi-bin/gitweb.cgi?p=smartoffice;a=blob_plain;f=kde-cloudstorage/wsmo4ios-editor/docs/wsmo4ioseditor-selection.png)
![wsmo4ios-editor cloud storage view](http://serviceplatform.org/cgi-bin/gitweb.cgi?p=smartoffice;a=blob_plain;f=kde-cloudstorage/wsmo4ios-editor/docs/wsmo4ioseditor-cloudstorage.png)

The preliminary editor code can be found in the kde-cloudstorage git directory. It is certainly a low-priority project but a nice addition, especially considering the planned ability to submit service descriptions directly to a registry or marketplace through which service consumers can then select the most suitable offer. Know the service - with WSMO4IoS :)

March 09, 2012 01:11 PM

March 02, 2012

IoS.com News

USDL as open source components

As part of the development of the Unified Service Description Language (USDL), the relevant USDL tool chain is now available as an open source solution, alongside the formal USDL specification that has been available for some time. The USDL Editor, the USDL Editor light with a live demo, the USDL Marketplace and a USDL model are now available for download:

  • USDL Editor download
  • USDL Editor light download and live demo
  • USDL model download
  • USDL Marketplace download

March 02, 2012 03:26 PM

THESEUS – New technologies for the Internet of Services

The goal of THESEUS, one of Germany's biggest research projects in the field of ICT, is to facilitate access to information, to connect data in order to derive new knowledge, and to create the basis for the development of new services and business models on the Internet. Under the roof of THESEUS, about 60 research partners from industry and academia develop new technologies for the Internet of Services. The project, with a total duration of five years, has been publicly funded in equal parts by the Federal Ministry of Economics and Technology and by partners from industry and academia.

SAP was consortium leader of the use case TEXO, which provides the concept, components and infrastructure for the Internet of Services. This very successful research project has developed the basis for a service economy on the Internet and its organizational, technical and economic requirements. Extensive results have also been obtained, especially regarding the Unified Service Description Language (USDL) and the integrated USDL tool chain, which is available as open source.

On behalf of the Federal Ministry of Economics and Technology, state secretary Stefan Kapferer opened the event. He drew special attention to the fact that THESEUS had primarily concentrated on growth areas such as software, medical engineering, and media. Prof. Dr. Hans-Jörg Bullinger, President of Fraunhofer, said: "A variety of patents, innovative business models, state-of-the-art services and technologies have been created in THESEUS that open up totally new possibilities, especially for SMEs." Henning Kagermann, former CEO of SAP, emphasized that the results of the THESEUS project are an important contribution to a web-based service and knowledge society.

At the closing congress, Thomas Widenka, Vice President and COO, as well as Martin Przewloka, Head of the IA&S practice, delivered lectures with a special focus on the exploitation of the research results. During the pre-congress, which took place on February 13, the participants had the opportunity to get a closer look at USDL in a tutorial. Overall, SAP Research was able to draw a positive conclusion about the THESEUS project. You can find a detailed documentation of the lectures from the closing congress here.

March 02, 2012 01:24 PM

February 25, 2012

Dynvoker/SPACE Research

Sync with the cloud

How can desktop users assume command over their data in the cloud? This follow-up to the previous blog entry on a proposed optimal cloud storage solution in KDE concentrates on the smaller integration pieces which need to be combined in the right way to achieve the full potential. Again, the cloud storage in use is assumed to be dispersed among several storage providers with the NubiSave controller as opposed to potentially unsafe single-provider setups. All sources are available from the kde-cloudstorage git directory until they may eventually find a more convenient location.

NubiSave

The cloud storage integration gives users a painless transfer of their data into the world of online file and blob stores. Whatever the user has paid for or received for free shall be intelligently integrated this way. First of all, the storage location naturally integrates with the network folder view. One click brings the content entrusted to the cloud to the user's attention. Likewise, this icon is also available in the file open/save dialogues, fusing the local and remote file management paradigms.

Cloud storage place

Having a file stored either locally or in the cloud is often undesirable. Instead, a file should be available locally and in the cloud at the same time, with the same contents, through some magic synchronisation. In the screenshot below, the user (who is apparently a friend of Dolphin) clicks on a file or directory and wants it to be synchronised with the cloud in order to access it from other devices or to get instant backups after modifications. The alternative menu point would move the data completely but leave a symlink in the local filesystem so the data being in the cloud will not change the user's workflow except for perhaps sluggish file operations on cold caches.

Cloud synchronisation initiation

What happens is that instead of copying right away, the synchronisation link is registered with a nifty command-line tool called syncme (interesting even for users who mostly refrain from integrated desktops). From that point on, a daemon running alongside this tool synchronises the file or directory on demand. The screenshot below shows the progress bar representing the incremental synchronisation. The rsync-kde tool is typically hidden behind the service menu as well.

Cloud synchronisation progress

The current KDE cloud storage integration architecture is shown in the diagram below. Please note that it is quite flexible and modular. Most of the tools can be left out and fallbacks will automatically be picked, naturally coupled with a degraded user experience. In the worst case, a one-time full copy of the selected files is performed without any visual notification of what is going on - not quite what you want, so for the best impression, install all tools together.

KDE cloud storage architecture

Naturally, quite a few ingredients are missing from this picture, but rest assured that they're being worked on. In particular, how can the user select, configure and assemble cloud storage providers with as few clicks and hassles as possible? This will be a topic for a follow-up post. A second interesting point is that ownCloud can currently be used as a backend storage provider to NubiSave, but could theoretically also serve as the entry point, e.g. by running on a router and offloading all storage to providers managed through one of its applications. This is another topic for a follow-up post...

February 25, 2012 06:36 PM

December 23, 2011

Dynvoker/SPACE Research

Cloud resources delivered from and to your desktop in 2012

What is the free desktop ecosystem's answer to both the growing potential and the growing threat from the cloudmania? Unfortunately, there is not much to it yet. My continuous motivation to change this can best be described by this excerpt from an abstract of a DS'11 submission:

The use of online services, social networks and cloud computing offerings has become increasingly ubiquitous in recent years, to the point where a lot of users entrust most of their private data to such services. Still, the free desktop architectures have not yet addressed the challenges arising from this trend. In particular, users are given little systematic control over the selection of service providers and use of services. We propose, from an applied research perspective, a non-conclusive but still inspiring set of desktop extension concepts and implemented extensions which allow for more user-centric service and cloud usage.

When these lines were written, we already had the platform-level answers, but not yet the right tools to build a concrete architecture for everyday use. The situation has recently improved with results pointing in the right direction. This post describes such a tool, for optimal cloud storage in particular (optimal compute clouds are still ahead of us).

NubiSave

Enter NubiSave, our award-winning optimal cloud storage controller. It evaluates formal cloud resource descriptions with some SQL/XML schema behind them plus some ontology magic such as constraints and axioms, together with user-defined optimality criteria (i.e. security vs. cost vs. speed). Then, it uses these to create the optimal set of resources and spreads data entering through a FUSE-J folder among the resources, scheduling again according to optimality criteria. Even if encryption is omitted or brute-forced, no single cloud provider gets access to the file contents. Furthermore, transmission and retention quality is increased compared to legacy single-provider approaches. This puts the user into command and the provider into the backseat. Thanks to redundancy, insubordinate providers can be dismissed by the click of a button :-)
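
Just to illustrate the idea of user-defined optimality criteria (this is not NubiSave's actual scheduling algorithm, and all provider names and numbers are made up), a weighted ranking of storage providers could look roughly like this:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: rank storage providers by a weighted score over
// normalised criteria in [0,1], where 1 is best (for cost, 1 = cheapest).
public class ProviderRanking {

    static class Provider {
        final String name;
        final double security, cost, speed;

        Provider(String name, double security, double cost, double speed) {
            this.name = name;
            this.security = security;
            this.cost = cost;
            this.speed = speed;
        }

        double score(double wSecurity, double wCost, double wSpeed) {
            return wSecurity * security + wCost * cost + wSpeed * speed;
        }
    }

    public static void main(String[] args) {
        List<Provider> providers = new ArrayList<>();
        providers.add(new Provider("provider-a", 0.9, 0.3, 0.6));
        providers.add(new Provider("provider-b", 0.5, 0.8, 0.7));
        providers.add(new Provider("provider-c", 0.7, 0.6, 0.9));

        // A security-conscious user weights security twice as high as cost and speed.
        double wSecurity = 0.5, wCost = 0.25, wSpeed = 0.25;
        providers.sort(Comparator.comparingDouble(
                (Provider p) -> p.score(wSecurity, wCost, wSpeed)).reversed());

        for (Provider p : providers) {
            System.out.printf("%s: %.2f%n", p.name, p.score(wSecurity, wCost, wSpeed));
        }
    }
}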

NubiSave experimental PyQt GUI

Going from proof-of-concept prototypes to usable applications requires some programming and maintenance effort. This is typically not directly on our agenda, but in selected cases we choose this route to increase the impact through brave adopters. The recently started PyQt GUI shown here gives a good impression of how desktop users will be able to mix and match suitable resource service providers. This tool will soon be combined with the allocation GUI, which interestingly enough is also written in PyQt, for real-time control of what is going on between the desktop and the cloud.

Of course, there are still plenty of open issues, especially concerning automation for the masses - how many GB of free storage can we get today, without effort to set it up? But the potential of this solution over deprecated single-vendor relationships is pretty clear. If people want RAIDs for local storage, why don't they go for RAICs and RAOCs (Redundant Arrays of Optimal Cloud storage) already? In fact, a fairly large company has shown significant interest in this work, and we clearly hope for more companies securing their sovereignty in the cloud through technologies such as ours. And we hope for desktops offering dead simple tools to administer all of this, and for complementary efforts such as ownCloud to add fancy web-based sharing capabilities.

Making optimal use of externally provided resources in the cloud is a good first step (and a necessity to preserve leeway in the cloud age), but being able to collaboratively participate in community/volunteer cloud resource provisioning is the logical path beyond mere consumption. We are working on a community-driven cloud resource spot market for interconnected personal clouds and on a sharing tool to realise this vision. The market could offer GHNS feeds for integration into the Internet of Services desktop. I'm glad to announce that in addition to funding from the German Ministry of Economics and the EU, we were able to acquire funds from the Brazilian National Council for Scientific and Technological Development (CNPq) for the entire year of 2012. This means that next year I will migrate between hemispheres a couple of times to work with a team of talented people on scalable cloud resource delivery to YOUR desktop. Hopefully, more people from the community are interested in joining our efforts, especially for desktop and distribution integration!

CNPq

December 23, 2011 05:28 PM

October 10, 2011

SPACE Service Platform Notifications

serviceplatform: The Open Source Service Platform Research initiative presents itself on a newly designed website: http://ur1.ca/5c4ta

The Open Source Service Platform Research initiative presents itself on a newly designed website: http://ur1.ca/5c4ta

by SPACE Service Platform Notifications at October 10, 2011 08:53 AM

October 08, 2011

SPACE Service Platform Notifications

serviceplatform: World domination plan: 1. redesign http://ur1.ca/5c4ta website; 2. fix social network integration; 3. promote ontology sets; 4. experiments

World domination plan: 1. redesign http://ur1.ca/5c4ta website; 2. fix social network integration; 3. promote ontology sets; 4. experiments

by SPACE Service Platform Notifications at October 08, 2011 10:59 AM

September 25, 2011

SPACE Service Platform Notifications

serviceplatform: SPACEflight 1.0 beta6 released - Live Demonstrator for the Internet of Services & Cloud Computing: http://ur1.ca/56pph

SPACEflight 1.0 beta6 released - Live Demonstrator for the Internet of Services & Cloud Computing: http://ur1.ca/56pph

by SPACE Service Platform Notifications at September 25, 2011 11:21 AM

Dynvoker/SPACE Research

SPACEflight 1.0 beta6 released

We're proud to announce the latest revision of SPACEflight, the live demonstrator for the Internet of Services and Cloud Computing. The overall scope of the demonstrator can be seen from the picture below. We consider SPACEflight to be a platform for exploring and showcasing future emerging technologies. At its core, the SPACE service platform handles management and execution of heterogeneous services as a foundation for marketplaces. Services can be deployed, discovered, configured and contract-protected, delivered to clients with frontends, rated, executed with access control, monitored and adapted. SPACEflight integrates this platform into a self-running operating system with a pre-configured desktop environment, scenario workflows and scenario services.

Scope of the SPACEflight live demonstrator

In version 1.0 beta6, a first light-weight engineering and provisioning toolchain was added (USDL service description & WSAG service level agreement template editors, and a service package bundler with a one-click deployment button), thus extending the demonstrable service lifecycle considerably. Read about the added functionality in our wiki.

Furthermore, the base system was stabilised and the service discovery was optimised for high query and synchronisation performance through new SOAP extensions.

You can download the image (for USB sticks and KVM virtualisation on x86_64 architectures, other choices will follow soon). Furthermore, a Debian package repository is available for installing the constituent platform services and tools individually on existing systems. Find out more information and download links on the 1.0 beta6 release info page.

SPACEflight 1.0 beta6 Cover

This release certainly marks the most complete and high-quality integrated demonstrator of its kind. In total, more than 250 improvements have been applied over the previous version. The demonstrator has been presented in a conference presentation titled SPACEflight - A Versatile Live Demonstrator and Teaching System for Advanced Service-Oriented Technologies at CriMiCo'11. Development is already continuing with the addition of more diverse service implementation technology containers.

September 25, 2011 09:42 AM

August 19, 2011

SPACE Service Platform Notifications

serviceplatform: The USB writer script in the SPACE #git repo now contains the option to add a data partition which can be mounted for storing #experiments.

The USB writer script in the SPACE #git repo now contains the option to add a data partition which can be mounted for storing #experiments.

by SPACE Service Platform Notifications at August 19, 2011 07:31 PM

July 08, 2011

SPACE Service Platform Notifications

serviceplatform: #Server upgrade: the virtual sp.o server has been extended from 20 to 70 GB; download.sp.o will join the mothership soon, freeing one nettop

#Server upgrade: the virtual sp.o server has been extended from 20 to 70 GB; download.sp.o will join the mothership soon, freeing one nettop

by SPACE Service Platform Notifications at July 08, 2011 11:19 AM

June 16, 2011

SPACE Service Platform Notifications

serviceplatform: Start preparing the release: merged cebit branch from space-cloud extension; merged spacecloud and future-beta6 into #SPACE-development

Start preparing the release: merged cebit branch from space-cloud extension; merged spacecloud and future-beta6 into #SPACE-development

by SPACE Service Platform Notifications at June 16, 2011 06:13 AM

June 15, 2011

SPACE Service Platform Notifications

serviceplatform: Uploads of USDL files from the USDL editor are now possible when using ConQo instead of the USDL repository http://ur1.ca/4fuwp

Uploads of USDL files from the USDL editor are now possible when using ConQo instead of the USDL repository http://ur1.ca/4fuwp

by SPACE Service Platform Notifications at June 15, 2011 03:16 PM

IoS.com News

TEXO topic conference

On the occasion of the THESEUS TEXO topic conference "Growth in the web-based service economy: USDL as a service description language and the TEXO Lab" on May 27, 2011 in Berlin, around 50 high-ranking experts from research and industry discussed the further development of USDL and the future shape of the TEXO Lab. The talks ranged from the vision of the Internet of Services and the motivation behind USDL to the application of TEXO results in further research projects and business concepts. At the opening of the conference, Dr. Regine Gernert from the Federal Ministry of Economics and Technology illustrated the potential of web-based services in Germany. The service sector has grown continuously in recent years, and ICT technologies can help this growth continue, which underlines the relevance of a research project like THESEUS TEXO.

Photo: Felix Peschko, Berlin

Prof. Dr. Martin Przewloka, SAP AG, chose a similar starting point to illustrate SAP's original motivation for the TEXO project. Even today, manually delivered services are supported by ICT technologies, but there is still a great need for optimisation. USDL is an important step towards making web-based services tradeable, Prof. Dr. Przewloka stated.

Prof. Dr. Weber from the THESEUS accompanying research spoke about the great challenge of developing a description language like USDL. His competitive analysis confirms that standards for describing services do exist, but that they mostly cover only the technical aspects of the description. With USDL, in contrast, the business and operational aspects of services can be described as well.

The TEXO project currently focuses on the exploitation of research results. On this topic, Dr. Holger Eggs, SAP AG, reported on the new SAP Store. According to Dr. Eggs, the TEXO Marketplace provided important foundations for the design of the SAP Store. Further results are currently being evaluated for their applicability in marketable products, both within SAP and by medium-sized companies, and some have already been put into practice.

In two parallel breakout sessions, the medium-sized companies Metasonic AG and SEEBURGER AG presented their first implementation results, demonstrating different ways of using USDL.

Prof. Dr. Mühlhäuser, TU Darmstadt, acting as an observer, finally addressed the question "What happens after TEXO?". He emphasised how important it is to put the results of THESEUS TEXO to use in Germany; only in this way can Germany maintain its leading role in the Internet of Services.

The concept of the conference proved successful. After the short and varied talks, discussions between speakers and participants started again and again, and these were open and constructive. A contributing factor may have been the get-together on the previous evening in the modern and futuristic rooms of the THESEUS Innovation Centre, to which SAP had invited the participants. This gave the participants the opportunity to exchange views on the vision of the Internet of Services in a pleasant atmosphere beforehand.

Overall, the TEXO project's topic conference was a thoroughly successful event.

You can watch a video greeting from Dr. Andreas Goerdeler, Federal Ministry of Economics and Technology, head of the sub-department "Information Society/Media"; click HERE.

Here you can download the speakers' talks as PDF files:

  • Dr. Regine Gernert, Federal Ministry of Economics and Technology – The Internet of Services in the spectrum of research and market – Download
  • Prof. Dr. Martin Przewloka, SAP AG – TEXO and SAP – Download
  • Prof. Dr. Herbert Weber, THESEUS accompanying research – USDL in context – Download
  • Dr. Holger Eggs, SAP AG – The Internet of Services in the Business byDesign environment – Download
  • Dr. Markus Heller, SAP AG – Presentation of the software components in the TEXO Lab – Download
  • Prof. Dr. Felix Sasaki, DFKI – Motivation for USDL: how can USDL become a catalyst for the Internet of Services? – Download

June 15, 2011 03:16 PM

May 23, 2011

FlexCloud

josefspillner: Work in !flexcloud now concentrates on defining an architecture to match our vision of hybrid clouds. First focus: data-centric services.

Work in !flexcloud now concentrates on defining an architecture to match our vision of hybrid clouds. First focus: data-centric services.

by Josef Spillner at May 23, 2011 10:45 AM

May 16, 2011

FlexCloud

josefspillner: Successful !flexcloud participation @ OUTPUT.DD 6.0, TU Dresden, including a #spaceflight demonstration and talks http://ur1.ca/46yoe

Successful !flexcloud participation @ OUTPUT.DD 6.0, TU Dresden, including a #spaceflight demonstration and talks http://ur1.ca/46yoe

by Josef Spillner at May 16, 2011 12:32 PM

May 05, 2011

WebKnox

What You Need To Know About WebKnox

WebKnox, a shortened name for the term "Web Knowledge eXtraction", reinvents information extraction in the age of perpetually updated search engines on the internet. When one searches for information on the world wide web, a multitude of repeated information will always come up in the results. WebKnox aims to extract more reliable, factual information about various topics and render it as the top result. ...

Keep on reading: What You Need To Know About WebKnox

May 05, 2011 08:14 AM

April 25, 2011

SPACE Service Platform Notifications

serviceplatform: http://serviceplatform.org/spec/wsag+vt-spec.html

by SPACE Service Platform Notifications at April 25, 2011 08:46 PM

April 03, 2011

Effingo: Near Duplicate Detection

Areca – A central platform for scientific results and data

I guess many scientists have encountered the following problem: you read a paper very relevant to your work, and its authors achieved quite interesting results. So you want to check whether you can achieve similar or better results using your own algorithms for solving the problem. However, there is no access to the data that was used for their evaluation, either because it was never accessible online or because the person responsible for the research has already left the research facility and his/her webspace was cleared.

Areca Logo

Areca - Portal for comparing and sharing research results and datasets

To make research results credible and comparable, it is necessary to give every interested person access to the results as well as to the data used. There are portals trying to solve this problem, and today I am going to present one of them. Its name is Areca. The portal can be used to upload datasets as well as to share and compare results achieved on these datasets. That way other researchers are able to download the data, run their own algorithms, and enter their results with a comparison to other results using the same evaluation methodology.

Result page on Areca

Comparing research results on Areca

Want to give it a try? Then visit http://areca.co!

by klemens at April 03, 2011 09:29 PM

March 28, 2011

IoS.com News

The TEXO Governance Framework - Whitepaper

The document presents a service governance framework for marketplaces in the Internet of Services. It was developed based on a state-of-the-art analysis and builds on the COBIT and ITIL frameworks. In addition, it includes parts of frameworks from industry as well as parts designed specifically for Internet service marketplaces. The framework uses four core building blocks that have to be instantiated for use: processes, stakeholders, KPIs, and maturity models.

[The whitepaper is only available in English.]

We base our conceptual considerations on existing frameworks and also take into consideration the particularities of emerging SOA Governance frameworks.

  • The Process Framework defines the tasks and activities required to manage the ISM and its life cycle. In particular, the areas of "service portfolio management", "service life cycle management" and "broker operations" are not adequately represented by current frameworks and are developed in this component.
  • The Stakeholder Map focuses on the roles and responsibilities for these processes and tasks.
  • The third building block, the Measurement Framework, describes corresponding key performance indicators and other result measures, which are used to evaluate process quality as well as compliance with internal, normative, and legal regulations.
  • The fourth building block is a Maturity Model. Applying it to a service-oriented IT system allows evaluating system maturity and identifying potential gaps, which need to be covered by additional governance processes.

Based on evidence from existing governance frameworks and research from academia and practice, we propose processes, stakeholders, measurements, and maturity levels as four central building blocks for a governance framework that can support operations of such a platform. We acknowledge that there are more, as related work points out; however, they can most likely be related to the above four. In this report, we provide an initial framework that needs to be instantiated according to the application; its processes, roles, maturity levels, and metrics need to be detailed for each application.

You can find the PDF document (English, 3.8 MB, 173 pages) in the PDF Download Center or -> HERE.

March 28, 2011 08:05 AM

February 28, 2011

Effingo: Near Duplicate Detection

The Search for Answers on the Web.

As Effingo development continues, I have realized several times how useful such a system could be. Recently, for example, I tried to solve a weird Subversion problem together with a colleague. The following explains how this is solved classically and how it would be solved with the help of Effingo. The Subversion problem we are currently facing is that almost every time we try to commit something to our university's server from home, or update our working copies, the system simply hangs and one has to start anew. I went through a lot of log files, and it seems that the following entries from the WebDAV module of Apache 2 describe the problem:

[Tue Feb 01 11:17:29 2011] [error] [client xxx.xxx.xxx.xxx] Provider
encountered an error while streaming a REPORT response.  [500, #0]
[Tue Feb 01 11:17:29 2011] [error] [client xxx.xxx.xxx.xxx] A
failure occurred while driving the update report editor  [500, #103]
[Tue Feb 01 11:17:29 2011] [error] [client xxx.xxx.xxx.xxx] Error
writing base64 data: Software caused connection abort  [500, #103]

[Tue Feb 01 15:03:24 2011] [error] [client xxx.xxx.xxx.xxx] Could
not get next bucket brigade  [500, #0]

To solve this we did a quick Google search getting a result list like:

Most of the results on the first page refer to some forum discussion. All of them are indeed about the same problem we had. However, they either:

  1. Propose different solutions
  2. Propose solutions not applicable to our problem (e.g. a Windows environment)
  3. Repeat solutions already presented in another discussion
  4. Offer no solution at all

This is a very annoying scenario, since we have to filter out threads without answers and threads with answers we have already read. Then we need to choose which of the remaining answers is most applicable to our scenario. Each step requires some minutes or even hours of reading. So even though we identified results containing the correct question, we might be hours away from the solution. In addition, we still face the possibility that after hours of searching we find no solution at all. Wouldn't it be nice to just have a big red button labeled "SHOW ME THE ANSWER" directly adjacent to the correct question?

To implement such a button we need to take all threads from a set of search engine results and group those containing the same question (GROUP BY question). Within each group of questions we then need to filter the answers so that every answer occurs only once in the result set (DISTINCT answer). Finally, having a complete set of answers for each question, we rank all answers by their quality and relevance (ORDER BY relevance, quality) and show the top result to the user. These three operators:

  • GROUP BY question
  • DISTINCT answer
  • ORDER BY relevance, quality

will form the core of the Effingo system to support the mentioned scenario.
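
A rough sketch of how these three operators might compose, with hypothetical Answer and ForumThread types; the exact-equality grouping and deduplication below are stand-ins for the near-duplicate detection Effingo would actually perform:

import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AnswerPipeline {

    record Answer(String text, double relevance, double quality) {}
    record ForumThread(String questionKey, List<Answer> answers) {}

    static List<Answer> bestAnswersFor(String question, List<ForumThread> searchResults) {
        // GROUP BY question: bucket all threads by a canonical question key.
        Map<String, List<ForumThread>> byQuestion = searchResults.stream()
                .collect(Collectors.groupingBy(ForumThread::questionKey));

        // DISTINCT answer + ORDER BY relevance, quality for the requested question.
        return byQuestion.getOrDefault(question, List.of()).stream()
                .flatMap(t -> t.answers().stream())
                .distinct()
                .sorted(Comparator.comparingDouble(Answer::relevance)
                        .thenComparingDouble(Answer::quality)
                        .reversed())
                .collect(Collectors.toList());
    }
}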

by klemens at February 28, 2011 04:14 PM

IoS.com News

USDL 3.0 M5 specifications published

SAP Research has published milestone 5 of the Unified Service Description Language (USDL) 3.0 specifications. It comprises the modules Service, Service Level, Legal, Technical, Functional, Interaction, Participants, Pricing and Foundation. The downloads are available on the Internet of Services website in the PDF download area and on the USDL specification page. The main additions and changes compared to milestone 4 are:
  • A completely new overview document
  • A completely new Legal module (German copyright law)
  • A completely reworked Service Level module with a basic extension so that concrete service levels can be modelled right away
  • A newly introduced Technical module (separated from the Functional module for better modularity)
  • A large number of smaller improvements: reworked naming concept, simplified Interaction module, new consumer role, etc.

February 28, 2011 12:37 PM

THESEUS TEXO at CeBIT 2011

THESEUS in the Internet of Services: that could be the overarching motto of this year's CeBIT appearance of the THESEUS research programme. As a fundamental shaper of technologies for the Internet of Services, THESEUS TEXO presents innovative and versatile research results at several demo stations. At the booth of the Federal Ministry of Economics and Technology in hall 9, TEXO shows how easily and quickly web-based services can be developed and offered in the future. More than 10 TEXO demonstrators illustrate the entire process, from the creation of a new service, through its trading, to its delivery.

To start with, it is explained by example how a service is modelled with the ISE Workbench and how important characteristics of the service are then described in the Unified Service Description Language (USDL). Such characteristics can include business hours, prices and the type of service. After this specification of the characteristics, the computer is able to understand what the service does, and the service can thus be published on an online marketplace. One of the presentations demonstrates this scenario in a practical way, from the creation of a service to its integration on the TEXO Marketplace, from where the service can be traded. Finally, it is demonstrated how this service can be made available on a mobile device thanks to the Mobility Mediation Layer.

Besides the main demonstration, in which a complete tool chain is shown, visitors can also learn more about the individual TEXO tools. At so-called Deep Dive stations, visitors can have individual TEXO components shown to them in detail. Each Deep Dive starts with the service life cycle, which shows the different phases of a service, from innovation to provisioning. The TEXO components are assigned to the individual phases of the life cycle, so that interested visitors always know which purpose the respective components serve. What exactly will TEXO show? The following partners present their demonstrations:
  • DFKI – multimodal access to the Internet of Services
  • Fraunhofer FOKUS – model-driven security for business services on service marketplaces, and TRICE
  • Metasonic AG – jCPEX!
  • Metris GmbH / IAT University of Stuttgart – openXchange
  • SAP AG – integrated tool chain for the Internet of Services, Unified Service Description Language (USDL), TEXO Marketplace
  • SEEBURGER AG / Karlsruhe Institute of Technology – B2B in the Cloud
  • Technische Universität Darmstadt – Processes as Services
  • Technische Universität Dresden – SPACEflight

As an example of a successful and innovative technology, the "cloud" is at the centre of CeBIT 2011. You can find out how the core component "USDL" developed in TEXO can support cloud technologies at the BITKOM booth in hall 4.

The PDF versions of the following demonstrators are available here or in the PDF download area:

February 28, 2011 12:32 PM

February 18, 2011

FlexCloud

josefspillner: Announcing the #spaceflight videos for #cebit 2011 with the first results of the space-cloud extension from !flexcloud: http://ur1.ca/3a2ga

Announcing the #spaceflight videos for #cebit 2011 with the first results of the space-cloud extension from !flexcloud: http://ur1.ca/3a2ga

by Josef Spillner at February 18, 2011 11:33 AM

February 14, 2011

SPACE Service Platform Notifications

serviceplatform: More performance goodies: With 12GB ramdisk builder, bootstrap down to 182s, DVD iso creation down to 7.9s (vs. 194 on hdd) #ultrafast

More performance goodies: With 12GB ramdisk builder, bootstrap down to 182s, DVD iso creation down to 7.9s (vs. 194 on hdd) #ultrafast

by SPACE Service Platform Notifications at February 14, 2011 10:17 PM

February 10, 2011

IoS.com News

USDL 3.0 M4 specifications published

Today SAP Research published milestone 4 of the Unified Service Description Language (USDL) 3.0 specifications, including the modules Service, Functional, Interaction, Participants, Pricing and Foundation. The downloads are available in the PDF download area and on the USDL detail pages. The publication of milestone 4 was delayed for several reasons, but the development of the next version (milestone 5) is proceeding as planned and will be published on schedule. About USDL: USDL consists of a set of modules, each of which addresses a different aspect of the overall service description. The modularity was introduced to improve the readability of the model, which had grown considerably in size compared to its predecessors. The modules have dependencies on each other, as they partially reuse elements from other modules. At present, 8 modules exist in the USDL 3.0 version, but only 6 of them have reached a sufficient level of maturity and are part of milestone 4 of USDL.

February 10, 2011 12:36 PM

February 06, 2011

SPACE Service Platform Notifications

serviceplatform: Extra caching of #debootstrap tree gains two more minutes during #spaceflight production. Now down to a record time of 510s for 1st phase.

Extra caching of #debootstrap tree gains two more minutes during #spaceflight production. Now down to a record time of 510s for 1st phase.

by SPACE Service Platform Notifications at February 06, 2011 12:32 AM

February 05, 2011

SPACE Service Platform Notifications

serviceplatform: Gaining 46 seconds in the bootstrap phase by yet more aggressive caching of 8 #ruby gems during #spaceflight mastering.

Gaining 46 seconds in the bootstrap phase by yet more aggressive caching of 8 #ruby gems during #spaceflight mastering.

by SPACE Service Platform Notifications at February 05, 2011 11:40 PM

February 04, 2011

SPACE Service Platform Notifications

serviceplatform: Drafting a #video with evolvotron-generated animation, #ccmixter music by George Ellinas and assembly in kdenlive: http://ur1.ca/33t99

Drafting a #video with evolvotron-generated animation, #ccmixter music by George Ellinas and assembly in kdenlive: http://ur1.ca/33t99

by SPACE Service Platform Notifications at February 04, 2011 12:53 AM

February 03, 2011

SPACE Service Platform Notifications

serviceplatform: For comparison, #spaceflight mastering stats for spacedevelopment machine: packaging 1003s, bootstrap 1214s, cloud ~270s, usbimg 303s

For comparison, #spaceflight mastering stats for spacedevelopment machine: packaging 1003s, bootstrap 1214s, cloud ~270s, usbimg 303s

by SPACE Service Platform Notifications at February 03, 2011 04:24 PM

serviceplatform: Mastering process for #spaceflight live distro now fully automatic. Stats for bomba: packaging 642s, bootstrap 644s, cloud 237s, usbimg 176s

Mastering process for #spaceflight live distro now fully automatic. Stats for bomba: packaging 642s, bootstrap 644s, cloud 237s, usbimg 176s

by SPACE Service Platform Notifications at February 03, 2011 12:53 AM

February 01, 2011

SPACE Service Platform Notifications

serviceplatform: First ramdisk support when building SPACEflight reduces image creation time from 5m55s (including 40 ms for ext2 initialisation) to 3m4s

First ramdisk support when building SPACEflight reduces image creation time from 5m55s (including 40 ms for ext2 initialisation) to 3m4s

by SPACE Service Platform Notifications at February 01, 2011 11:03 PM

serviceplatform: Current !debian packages of SPACE platform services available for testing: http://texo.inf.tu-dresden.de/packages/trunk/binary-x86_64/

Current !debian packages of SPACE platform services available for testing: http://texo.inf.tu-dresden.de/packages/trunk/binary-x86_64/

by SPACE Service Platform Notifications at February 01, 2011 07:58 PM

January 06, 2011

FlexCloud

josefspillner: Plan in !flexcloud for Q1/2011: Development of SPACEflight 1.0 beta5 with cloud capabilities, to be seen on expos and promotional video.

Plan in !flexcloud for Q1/2011: Development of SPACEflight 1.0 beta5 with cloud capabilities, to be seen on expos and promotional video.

by Josef Spillner at January 06, 2011 08:53 PM

December 30, 2010

IoS.com News

New opportunities for web-based services

On May 17, 2011, the THESEUS Innovation Centre will open its doors to provide information about the economic potential and technological foundations of the Internet of Services. Under the motto "New opportunities for web-based services: USDL as a service description language and the TEXO Lab as a virtual laboratory", the THESEUS programme office invites you to the topic conference of the TEXO use case. At this conference, the key technologies for the Internet of Services developed in TEXO, above all the Unified Service Description Language (USDL), as well as the recently opened TEXO Lab will be presented. The programme of the conference and the speakers will be announced shortly. On the following day, a press conference will also take place at the THESEUS Innovation Centre, at which central research results from the TEXO project will be presented.

December 30, 2010 10:56 AM

December 18, 2010

SPACE Service Platform Notifications

serviceplatform: Switching from pbuilder to cowbuilder reduces the build time of e.g. Access Gate from 1m11.104s to 0m33.526s and saves the hard disk

Switching from pbuilder to cowbuilder reduces the build time of e.g. Access Gate from 1m11.104s to 0m33.526s and saves the hard disk

by SPACE Service Platform Notifications at December 18, 2010 01:02 PM

serviceplatform: Finally: After 1.0 beta4 release, upgrading base system from !debian squeeze of 20th of May to 18th of December, gives e.g. Python 2.6

Finally: After 1.0 beta4 release, upgrading base system from !debian squeeze of 20th of May to 18th of December, gives e.g. Python 2.6

by SPACE Service Platform Notifications at December 18, 2010 11:45 AM

November 17, 2010

SPACE Service Platform Notifications

serviceplatform: Yet another remastering session, this time for completing the customised image of SPACEflight used for the funding project dissem. workshop

Yet another remastering session, this time for completing the customised image of SPACEflight used for the funding project dissem. workshop

by SPACE Service Platform Notifications at November 17, 2010 06:29 PM

October 20, 2010

FlexCloud

josefspillner: The !flexcloud group account will be used for short status messages about the research project's public results.

The !flexcloud group account will be used for short status messages about the research project's public results.

by Josef Spillner at October 20, 2010 07:45 PM

October 13, 2010

Effingo: Near Duplicate Detection

Feature Engineering

Feature engineering is the most important step in designing a modern information retrieval system like Effingo. Independent of the similarity measure and the clustering algorithm the final system will use, the features extracted from the forum content will shape the results like nothing else.

A feature is a piece of information that is used to compare two contributions in the final system. From the two sets of all features of two contributions, the similarity measure calculates a similarity value. In classical information retrieval, features were simple keywords from the two documents to compare. Modern information retrieval systems use more sophisticated feature types. The success or failure of the system depends strongly on choosing the correct feature types.

Effingo's feature types are organised into three categories – local, contextual and structural. The following paragraphs present the feature types, ordered by these three categories.

Local Features

Local features are taken directly from the raw contribution text. They mostly resemble classical IR features.

Text

The raw text is the simplest source for features. Effingo can handle it with classical information retrieval methods: filtering stop words, stemming the remaining words and building a keyword index. To improve on this feature type, one can apply methods for calculating document similarity as described in (Andrei Z. Broder, 2000, 1--10).

However, there is a big problem that applies only to user generated content, so research about it is still at an early stage. Forum posts vary greatly in the quality of spelling and grammar, because they are not edited like news articles or books and everyone can produce them. Many established natural language processing methods like stemming, stop word removal or even tokenization are hard to apply to such content. Therefore it might be necessary to concentrate the more elaborate text processing on high-quality content. It is possible to find such content, as shown for example in (Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, Gilad Mishne, 2008, 183-194). Effingo can process the remaining content using language-independent methods like n-gram segmentation and hashing.
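
To illustrate the language-independent route, here is a minimal sketch (class and method names are my own choices, not Effingo code) that compares two posts by their character n-gram shingles using the Jaccard coefficient:

import java.util.HashSet;
import java.util.Set;

// Minimal sketch: character n-gram shingling with Jaccard similarity,
// no stemming or stop word removal required, hence language independent.
public class ShingleSimilarity {

    static Set<String> shingles(String text, int n) {
        // Normalise whitespace and case so that formatting noise matters less.
        String clean = text.toLowerCase().replaceAll("\\s+", " ").trim();
        Set<String> result = new HashSet<>();
        for (int i = 0; i + n <= clean.length(); i++) {
            result.add(clean.substring(i, i + n));
        }
        return result;
    }

    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> a = shingles("My wifi card stopped working after the update", 5);
        Set<String> b = shingles("wifi card not working after latest update", 5);
        System.out.println("similarity = " + jaccard(a, b));
    }
}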

Another problem that applies especially to unstructured social media contributions is their shortness. It is possible – though subject to further research – that the body text of many contributions does not contain enough information to assign them to the correct cluster.

PoS Tags

Part-of-Speech (PoS) tags assign each word in a sentence to a part of speech like noun, verb or adjective. They are created by so-called tagger programs. Taggers use machine learning approaches: they are trained on tagged sentences, from which they build an internal model of words, sentence structure and the tags belonging to words.

PoS tags are useful to find patterns in the sentence structure of a forum contribution. The hypothesis that must be proven is that there are established sentence patterns for stating certain kinds of problems or explanations. For example, the pattern "question word, verb" applies to question sentences like "What is" or "Who are". Effingo can use this knowledge to find types of forum posts and then compare only contributions of the same type. That way the candidate set for comparison is reduced and performance increases.
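
A minimal sketch of such a pattern check, assuming a sentence has already been tagged and that the tag set follows the Penn Treebank convention (WP/WRB for question words, VB* for verbs) – both assumptions for illustration, not a fixed design decision:

import java.util.List;

// Sketch: detect the "question word followed by a verb" pattern on an
// already tagged sentence. Tags follow the Penn Treebank convention
// (WP/WRB = wh-word, VB* = verb); the TaggedToken type is hypothetical.
public class QuestionPattern {

    record TaggedToken(String word, String tag) {}

    static boolean looksLikeQuestion(List<TaggedToken> sentence) {
        for (int i = 0; i + 1 < sentence.size(); i++) {
            String tag = sentence.get(i).tag();
            String next = sentence.get(i + 1).tag();
            boolean whWord = tag.equals("WP") || tag.equals("WRB");
            if (whWord && next.startsWith("VB")) {
                return true;   // e.g. "What/WP is/VBZ ...", "How/WRB do/VBP ..."
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<TaggedToken> sentence = List.of(
                new TaggedToken("What", "WP"),
                new TaggedToken("is", "VBZ"),
                new TaggedToken("Gwibber", "NNP"));
        System.out.println(looksLikeQuestion(sentence)); // true
    }
}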

Unfortunately, PoS tags face the same problem as language-dependent text features. Poor-quality content is hard, if not impossible, to tag. In the best case the tagger recognizes this problem – in the worst case it assigns wrong tags to unrecognised words. A second problem is that even high-quality content can contain areas that are impossible to tag, like source code, tables or lists formed of sentence fragments. Before applying the tagger, Effingo would need to find such content and exclude it from tagging. Often such content is annotated by the forum engine and thus easy to find. However, code can occur within a free-text sentence and lists might be placed without using the forum's list feature. Such error sources need to be detected automatically.

Named Entities and Facts

In addition to the usual vocabulary, a forum also contains many domain-dependent terms – product names in a customer support forum, location names in a travel forum or person names in a political forum, for example. Such entities usually have different representations like abbreviations and acronyms. Named Entity Recognisers (NER) are able to find such entities and even map different representations to the same concept. Facts, in addition, are relations between entities like "Windows is a product by Microsoft". Extracting such entities and facts from contributions enables Effingo to assign a high similarity score to posts that share many of them.

NER and fact extraction face the same problems as PoS tagging. The NER system must be trained in advance, it is sensitive to poor-quality content, and the training phase is required for each new forum, or at least for each new forum domain. Since annotating a large set of examples for learning by hand is tedious work, one cannot expect a forum operator to do this. In addition, NER systems require free text without code snippets, tables and lists to do their work correctly.

External Links

Users usually can add links to external resources in their contribution text. Such links provide a reference to some external resource the creator of the thread has requested or that the answering user thinks is helpful for resolving the discussion. Similar discussions will attract similar resources. Therefore it is reasonable to assume that two contributions pointing to the same resource, or even the same set of resources, are highly similar.

There are, however, a few problems with link detection. If we are lucky, links are marked explicitly by markup tags. We assume this will be the case for most links, since it offers the possibility to click the link directly instead of copying it manually into a browser's address bar. Unmarked links are harder to detect, since tokens like the full stop "." or the question mark "?" are valid inside a link but often mark the end of a sentence. If the link is placed at the sentence end, it is hard to distinguish between the sentence mark and the link's characters. Even if we can detect most links, it is still hard to find out whether two links point to the same resource when a resource is accessible under different URLs. Link normalisation can be carried out by looking at the actual target of the link, but this approach increases the workload on the Effingo system for each extracted link and requires comparing each link target with every other.
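
The following sketch shows the naive version of this feature: a simple regular expression for URL detection plus a deliberately conservative normalisation. Both the regex and the normalisation rules are simplifications for illustration, not the final Effingo link detector:

import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: pull URLs out of a post body and normalise them conservatively.
public class LinkFeatures {

    private static final Pattern URL = Pattern.compile("https?://\\S+");

    static Set<String> extractLinks(String text) {
        Set<String> links = new LinkedHashSet<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            String raw = m.group();
            // Strip trailing sentence punctuation that is probably not part of the link.
            raw = raw.replaceAll("[.,;:!?)\\]]+$", "");
            links.add(normalise(raw));
        }
        return links;
    }

    static String normalise(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase();
        String path = uri.getPath() == null || uri.getPath().isEmpty() ? "/" : uri.getPath();
        return host + path; // ignore scheme, fragment and default ports
    }

    public static void main(String[] args) {
        System.out.println(extractLinks(
                "Have a look at http://wiki.ubuntu.com/Gwibber. It helped me!"));
    }
}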

Contextual Features

Contextual features are all features taken from the context of the contribution. One could also say they are the contribution's meta information.

Title

A contribution's title is similar to the contribution's body in that it is free text as well. However, a title often carries much more significance, since it is a very condensed description of the topic discussed within the contribution. Effingo can handle it similarly to body text and apply local features like "Text", "PoS Tags" and "Named Entities and Facts", but weight them differently.

However, using a contribution's title faces the same problems with low-quality content as its body. In addition, many lazy users choose very short nonsense titles like "buh", "problem" or "need help". A solution to these problems might be to apply quality filtering and consider only titles above some length threshold.

The second problem is that contribution titles are usually only set at the beginning of a forum thread. Even though it is possible to give each contribution in the thread its own title, few users do so. So the further away a contribution is from the start of the thread, the higher the probability that its topic has drifted away from the opening contribution. Therefore titles in late contributions need to be handled with care.

Publication Date

At first glance the publication date does not seem to be a good indicator for calculating the similarity of user generated content. However, since user generated content is usually coupled closely to real-time events, it is possible to relate contributions in a given time frame to events occurring at that time. After each release of a new Ubuntu version, for example, the forums at ubuntu.com or ubuntuusers.de are flooded with threads about this release. If the new version has some specific bug, multiple discussions about this bug get created; shortly after the release there usually is a fix or workaround and discussion about the bug dies down. A burst of contributions in a certain time frame therefore indicates that some important event is going on and many contributions in this time frame might belong to that event. A second conclusion is that contributions from the same time might have a higher similarity. This second point is not certain and subject to further research; it is also possible that the probability of encountering the same question again increases with the time between two contributions.

A second indicator provided by a contribution's date is its correlation to the first occurrence of a product or event, which can be used for candidate set generation. If Effingo "knows" that the iPhone was first introduced on January 9, 2007, it does not need to compare contributions about the iPhone to contributions from the year 2000 or earlier. Of course there might be interesting rumours before this date, so candidates should also be chosen from dates shortly before it. This means the candidate suitability value should gradually decrease the further a contribution is away from the original.
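
One possible way to express such a suitability value is an exponential decay over the distance in days; the half-life of 180 days in this sketch is an arbitrary illustration value, not a tuned parameter:

import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Sketch: exponential time decay as a candidate suitability score.
public class DateSuitability {

    static double suitability(LocalDate candidate, LocalDate reference) {
        long days = Math.abs(ChronoUnit.DAYS.between(candidate, reference));
        double halfLifeDays = 180.0; // assumption: halve the score every 180 days
        return Math.pow(0.5, days / halfLifeDays); // 1.0 when equal, decays with distance
    }

    public static void main(String[] args) {
        LocalDate iphoneIntro = LocalDate.of(2007, 1, 9);
        System.out.println(suitability(LocalDate.of(2007, 2, 1), iphoneIntro));  // high
        System.out.println(suitability(LocalDate.of(2000, 6, 15), iphoneIntro)); // near zero
    }
}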

To make real use of this feature type, it must be combined with additional feature types.

Quality Rating

Many forums allow users to rate the contributions provided by other users. These ratings usually have the form of a five-star system, a simple categorisation ("not helpful", "helpful", "solution") or a point system (see slashdot.org). This is a good feature for the final step of labeling a cluster of similar contributions or finding a representative. It can also be used to find the high-quality content proposed above to improve the applicability of local features and the title feature type.

Unfortunately, a high quality rating by itself says nothing about the similarity of two contributions.
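
As a sketch of the representative-finding step, the snippet below simply picks the highest rated contribution of a cluster; the Contribution type and its rating scale are hypothetical placeholders:

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch: label a cluster by its highest rated contribution.
public class ClusterLabeling {

    record Contribution(String id, String text, double rating) {}

    static Optional<Contribution> representative(List<Contribution> cluster) {
        return cluster.stream().max(Comparator.comparingDouble(Contribution::rating));
    }

    public static void main(String[] args) {
        List<Contribution> cluster = List.of(
                new Contribution("a", "Reinstall the driver ...", 4.5),
                new Contribution("b", "same problem here", 1.0));
        representative(cluster).ifPresent(c -> System.out.println(c.id())); // prints "a"
    }
}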

Context Description

The context of a forum can drastically change the meaning of certain terms or improve their usefulness for information retrieval. The term "Java" has no discriminating power in a Java forum. In a .Net forum, on the other hand, it marks a small number of threads that can be grouped together, since they will mostly talk about the differences between the two programming languages.

In addition, if the forum's context describes relations between terms, this opens up the possibility of grouping contributions hierarchically. Posts about components of a bigger concept, for example, could be grouped as a subcluster. If there is one solution to a problem with Ubuntu's Twitter client "Gwibber" and another about problems with Ubuntu's new social media integration, and Effingo "knows" that Gwibber is part of Ubuntu, it is able to group both together. The answer to the generic problem might then apply to the specific problem even without mentioning it, and vice versa.

To really apply such a context data structure, other techniques like "Named Entity and Fact Recognition" must be applied to detect the concepts and relations it describes.

Assigning a context to a forum is, however, no easy task. Some data structure like an ontology, a taxonomy or simply a list of important terms is necessary. Such a data structure is hard to assign and even harder to create. It is subject to further research whether usable structures already exist for certain forums. If they do not, an interesting question is whether forum users can be motivated to create one as they use the forum. This works quite well for tagging, but is it possible to adapt tools like Ontofly or Webprotegé so that users of all forums (or a meaningful subset) are able and willing to use them?

Structural Features

Structural features, finally, are all pieces of information describing the forum around the contribution. They characterise how it is embedded into the forum's structure.

Internal Links

In contrast to external links, internal links are links between two contributions in the same forum. Usually they are created by some expert linking two threads because he knows there is a similarity between both. This leads to the conclusion that two linked threads contain similar contributions with a higher probability. It is even possible to create a link graph over a forum and analyse tightly coupled subgraphs. At least all questions in such a graph that are close to each other might correlate; the further apart two threads are, the less similar their contributions are.

In addition to the same problems as described for the detection of external links, internal links are rare, so Effingo cannot depend on them.
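
Still, where they exist, the link graph idea can be sketched with a simple adjacency map and breadth-first distance over thread ids (the ids are placeholders, not Effingo's data model):

import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Sketch: undirected thread link graph; the closer two threads are linked,
// the more likely their contributions are similar.
public class ThreadLinkGraph {

    private final Map<String, Set<String>> edges = new HashMap<>();

    void link(String a, String b) {
        edges.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        edges.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Number of link hops between two threads, or -1 if unconnected. */
    int distance(String from, String to) {
        if (from.equals(to)) return 0;
        Map<String, Integer> dist = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        dist.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String next : edges.getOrDefault(current, Set.of())) {
                if (!dist.containsKey(next)) {
                    dist.put(next, dist.get(current) + 1);
                    if (next.equals(to)) return dist.get(next);
                    queue.add(next);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ThreadLinkGraph graph = new ThreadLinkGraph();
        graph.link("thread/42", "thread/99");
        graph.link("thread/99", "thread/7");
        System.out.println(graph.distance("thread/42", "thread/7")); // 2
    }
}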

Social Graph, Reply Graph

One of the most interesting structural features is a forum's social or reply graph. Users are the nodes in the graph and there is a directed edge between two users if one user answered a contribution of the other. There already is research showing that this graph can be used to capture the expertise of a forum's users. Each answer a user provides is the result of some part of that user's expertise. If there are several users with overlapping expertise, topics become evident when looking at the intersections of the threads these users answered. In addition, it is possible to look at the left and right context of a thread's development to find out who answers to whom. Users who post the last contribution in most threads are likely to be seen as experts in their area.

Unfortunately, I believe that significant patterns will only occur in long threads.
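
Even so, a very rough sketch of one such signal – how often a user writes the closing contribution of a thread – could look like this (the Post shape is a placeholder, not Effingo's data model):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: count closing posts per user as a crude expertise indicator
// derived from the reply graph idea.
public class LastPosterExpertise {

    record Post(String author) {}

    static Map<String, Integer> lastPostCounts(List<List<Post>> threads) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<Post> thread : threads) {
            if (thread.isEmpty()) continue;
            String closer = thread.get(thread.size() - 1).author();
            counts.merge(closer, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<Post>> threads = List.of(
                List.of(new Post("alice"), new Post("bob")),
                List.of(new Post("carol"), new Post("bob")));
        System.out.println(lastPostCounts(threads)); // {bob=2}
    }
}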

Subforum Graph

As each contribution is assigned to some channel of threads, and each channel defines a certain subtopic of the domain discussed in the forum, contributions from the same subforum have a higher probability of sharing the same topic. But since the most interesting similar threads are those one cannot find in the same channel, Effingo should not rely on this feature type alone.

This feature type is extensible to the whole web.

Position in Thread

The last interesting structural feature is the position of a contribution inside its thread. Most obviously, the first contribution is a question or statement that is discussed in the other contributions. Since it is usually not helpful to compare apples and oranges, opening contributions should be compared with other opening contributions; at least their probability of being similar is higher. For the general location of two contributions in a thread, I think that the distance between two contributions in the same thread has some influence on their probability of being similar. For inter-thread similarity, the position is helpful to filter out off-topic contributions, since they tend to occur at the end of long threads.

Unfortunately, there are forums where multiple threads are interleaved and questions occur later in a thread's message stream. Such questions can be detected quite well by existing systems, however, and there is research on untangling such interleaved threads.

Conclusion

This list of feature types might not be complete. It was created from observations of existing forums and partly compiled from related work adapted to the forum domain. In future entries I will show results of detecting similarity relations using some of these features alone and in combination.

References

Andrei Z. Broder (2000), 'Identifying and Filtering Near-Duplicate Documents', Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, pp. 1--10.
Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, Gilad Mishne (2008), 'Finding High-Quality Content in Social Media', Proceedings of the International Conference on Web Search and Web Data Mining, pp. 183--194.

by klemens at October 13, 2010 05:00 PM

October 09, 2010

WebKnox

Ontofly – A web-based Ontology editor

Ontology engineering is the task of creating and refining a knowledge model for one or multiple domains. The engineering process requires time, solid knowledge of the target domain, and the right tools. For WebKnox, we developed a tool called Ontofly that semi-automatically aids the user in creating an ontology using a standard web browser. Our system helps to create ontologies whose target domain can be found on the web and which will be used for extracting knowledge to populate the ontology...

Keep on reading: Ontofly – A web-based Ontology editor

October 09, 2010 06:32 PM

September 26, 2010

SPACE Service Platform Notifications

serviceplatform: Back in #kubuntu after having booted into new SPACEflight system with heavily service-oriented KDE modding, to be shown at #informatik2010

Back in #kubuntu after having booted into new SPACEflight system with heavily service-oriented KDE modding, to be shown at #informatik2010

by SPACE Service Platform Notifications at September 26, 2010 01:48 PM

September 23, 2010

WebKnox

Named Entity Definition

There is no agreement on the definition of a named entity in the research community. Often, only instances of concepts in a certain scenario are considered to be Named Entities, the following definition is taken from the NER task. "Named entities are phrases that contain the names of persons, organizations, locations, times, and quantities." (CoNLL 2002) The CoNLL 2002 definition is a very pragmatic one in order to clarify the goals of the NER task. It also unnecessarily limits Named Entit...

Keep on reading: Named Entity Definition

September 23, 2010 08:22 AM

August 30, 2010

Effingo: Near Duplicate Detection

Relationship Detection Step 1: Acquiring a data set

The first step in information retrieval is always to acquire a dataset of the documents of interest. In the case of web forums, social media or, more generally speaking, online discussions, this set consists of discussions and the contributions provided by participants. To do meaningful research it is necessary to take a fixed set of such discussions; that way it is possible to repeat experiments while watching the influence of different parameters. There are several ways to access online discussions and persist them. The three most obvious are:

  1. Database Dump: If direct access to a forum or similar platform is available, one can always take a dump of the current state of the database and start from there. Unfortunately this is also very hard to achieve. Forum data is very valuable and most forum operators would rather not share their data with strangers, mostly for security reasons. Setting up one's own forum is a very strenuous task – not for technical reasons but because finding a good topic that attracts a large community is not easy. However, for Effingo a large sample of a German hobby forum is available.
  2. Crawler: Existing forums publish their data online. This data is freely available to everyone. Therefore it is possible to build a crawler that reads forums and writes the data to a local data store. That way one can get a large dataset in relatively short time. Unfortunately it is hard to extract the individual discussions and discussion contributions from the HTML code.
  3. Feed Reader: The third approach uses feed readers. Such readers notify all subscribers each time a new post is created. Their output data is standardized (somehow ;) ), so it is much easier to extract the data and save it to a local dataset. One can use the ROME framework, for example, as sketched just below this list.
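
A minimal sketch of the feed reader approach with ROME (package names as in ROME 1.x, which may differ in other versions; the feed URL is a placeholder):

import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;
import java.net.URL;

// Sketch: read a forum's RSS/Atom feed and print the new contributions.
public class FeedIngest {
    public static void main(String[] args) throws Exception {
        URL feedUrl = new URL("http://example.org/forum/feed.rss"); // placeholder
        SyndFeed feed = new SyndFeedInput().build(new XmlReader(feedUrl));
        for (Object o : feed.getEntries()) {
            SyndEntry entry = (SyndEntry) o; // ROME 1.x returns a raw List
            System.out.println(entry.getTitle() + " @ " + entry.getPublishedDate());
            // entry.getDescription() or entry.getContents() would hold the post body
        }
    }
}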

In this article I will concentrate on step two, because it is the most complex but also the one that provides the quickest access to the largest dataset. The forum crawler chain consists of four distinct steps:

  1. Find a forum: At first, a crawler is not aware of which kind of website it is visiting. By applying clever heuristics or machine learning, it can identify a domain, subdomain or even only a subsite as a forum. One simple heuristic would be to look for the term "forum" or "board" in the URL (see the sketch after this list). Such a crawler is called a focused crawler, since it concentrates on finding pages of a specific type.
  2. Clustering of subpages: All pages on the same domain, subdomain or subsite as the root page must be classified to find the different types of forum pages. These are administration pages like login pages or terms of use, channel overview pages, discussion overview pages and the meat content in the form of discussion pages.
  3. Disassemble discussion pages: During the third step, the crawler tries to extract the actual content from discussion pages. This can be seen as database re-engineering: most forums are created from a template that describes layout and colour scheme, merged with the content from a content database. The extractor tries to recreate the database entries by finding the template parts on each discussion page. This step is quite easy for one forum that does not change its layout, but becomes very hard for multiple forums or when a forum changes its layout. However, most forums share a similar layout, and by defining very abstract rules it should be possible to extract the data that occurs on all forums. This can be achieved by applying machine learning, for instance.
  4. Cleaning user data: Forum data is always very heterogeneous and noisy. It not only contains free text but tables, code blocks, lists and several kinds of HTML markup, and it is very often terminated by a signature block that provides no additional information. After the database has been recovered during step 3, it is desirable to separate such content. That way it is possible to apply Natural Language Processing (NLP) to the free-text parts, concentrate on high-quality texts or find semi-structured information inside tables. Of course such data usually is marked with its own HTML code and should be easy to find. This is not always true: code, for example, is very often provided in its own block, but users often just post it like free text, driving NLP techniques nuts. Signatures usually simply continue the free text and are hard to find, but they should be excluded from algorithms working on the free-text parts of forum contributions, or they disturb statistical analysis techniques like TF/IDF. This step therefore annotates parts inside the text of forum contributions with different types of content for further processing.
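
For step 1, the URL heuristic mentioned above can be sketched in a few lines; the keyword list and the decision rule are deliberately simplistic illustrations:

import java.util.List;

// Sketch: naive URL heuristic from step 1 of the crawler chain.
public class ForumHeuristic {

    private static final List<String> HINTS =
            List.of("forum", "board", "viewtopic", "showthread");

    static boolean looksLikeForum(String url) {
        String lower = url.toLowerCase();
        return HINTS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeForum("http://ubuntuusers.de/forum/"));      // true
        System.out.println(looksLikeForum("http://example.org/blog/2010/08/"));  // false
    }
}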

That’s it so far. Stay tuned for further insight into the Effingo project!

by klemens at August 30, 2010 08:26 PM

August 06, 2010

Crowdserving: Use the Internet of Services

docs

Random documents

August 06, 2010 09:34 AM

July 20, 2010

SPACE Service Platform Notifications

serviceplatform: We claim to offer first public USDL description for a real-world service: http://beta.crowdserving.com:3000/events/homepage

We claim to offer first public USDL description for a real-world service: http://beta.crowdserving.com:3000/events/homepage

by SPACE Service Platform Notifications at July 20, 2010 09:35 AM

July 18, 2010

SPACE Service Platform Notifications

serviceplatform: Due to HDD badblocks issue, the last SPACEflight image was most likely corrupted (e.g., OpenJDK). Disk contents cloned, now regenerating....

Due to HDD badblocks issue, the last SPACEflight image was most likely corrupted (e.g., OpenJDK). Disk contents cloned, now regenerating....

by SPACE Service Platform Notifications at July 18, 2010 09:20 PM

July 06, 2010

SPACE Service Platform Notifications

serviceplatform: Today we will organise a #usertesting by offering and consuming services on #crowdserving with invited students from Київський Пол Інстітут.

Today we will organise a #usertesting by offering and consuming services on #crowdserving with invited students from Київський Пол Інстітут.

by SPACE Service Platform Notifications at July 06, 2010 07:09 AM

July 02, 2010

SPACE Service Platform Notifications

serviceplatform: Dissertation submitted. Implementation work to resume shortly, with focus on experiments in the Internet of Services (publication accepted).

Dissertation submitted. Implementation work to resume shortly, with focus on experiments in the Internet of Services (publication accepted).

by SPACE Service Platform Notifications at July 02, 2010 11:55 AM

June 01, 2010

ServFace Notifications

servface: trip to consortium meeting in paris starts tomorrow - last items for the preparation are still on the todo list

trip to consortium meeting in paris starts tomorrow - last items for the preparation are still on the todo list

June 01, 2010 09:48 AM

May 11, 2010

WebKnox

Linked Open Data on the Web Visualization

Based on the W3C Linked Open Data statistics, we created an interactive visualization of the linked data on the web. Controls: hover over a dataset: highlight all data sets it is connected with; left click and move: move the data sets; mousewheel: zoom in/out; right click and move up/down: zoom in/out; right click: zoom to see all data sets ...

Keep on reading: Linked Open Data on the Web Visualization

May 11, 2010 04:10 PM

April 30, 2010

Crowdserving: Use the Internet of Services

Welcome VIT students!

While we are gearing up our Crowdserving service marketplace, we need many users to help us evaluate the functionality. I'm glad that among the most recent signups, there are many students from Vellore Institute of Technology in India who will take part already during the beta phase of the portal. More information on our evaluation plans will be given in a few days.

April 30, 2010 05:55 PM

April 17, 2010

WebKnox

Distribution of Document File Formats on the Web

WebKnox tries to extract information from the web. A great part of the information we are seeking is textual. In order to extract information we need to understand which formats and structures it is encoded in. Since there is no good overview (correct me if I'm wrong in the comments) of the distribution and number of document files on the web, we created one here. ...

Keep on reading: Distribution of Document File Formats on the Web

April 17, 2010 11:55 AM

April 11, 2010

Crowdserving: Use the Internet of Services

Launch preparations for Universe of Web Services

Aligned with the Deep Web Service Exploration terminology and the general hotness of space-related topics, including ontological n-tuple-spaces, we've found a name for the virtual enterprise (not the spaceship, though) which offers all web services imported from other portals. It will be named Universe of Web Services, as one can already read on its new homepage.

During the next days, all retrieved and extracted web services and their associated metadata will be deployed as relay services. This means that Crowdserving offers registration as a service package, SOAP proxy functionality including authentication, SLA management and limited remote monitoring capabilities, but obviously neither the execution part nor any detailed monitoring and adaptation of the running service instances. After some time, we'll have a look at what the metadata will look like and whether our tools can cope with this amount of services.

April 11, 2010 11:15 AM

April 08, 2010

WebKnox

Fact Extraction with WebKnox

One of the main features of WebKnox is fact extraction from the web. WebKnox uses a set of techniques to detect facts in attribute-value pairs ("display size: 3 inches"), free text ("[...]the display size is 3 inches[...]") and HTML tables (|display size|3 inches). WebKnox is, however, driven by an ontology that is used to guide the extraction process. Since you don't always have such an ontology at hand, you can also give WebKnox a set of seed attribute names that help the system to...

Keep on reading: Fact Extraction with WebKnox

April 08, 2010 05:53 PM