|
The management of large data sets, the guarantee of their currency and consistency, and the retrieval of data are core features of information systems deployed in many areas of enterprises. Due to the globalization of markets, the need to use up-to-date information distributed worldwide is growing. The character of this data - its heterogeneity, structure, redundancy, and inconsistency - complicates its integration with an enterprise's own data. At the same time, the sheer amount of data requires suitable means of filtering and condensing it and of extracting the relevant information. The variety of potential data sources and data structures, differing requirements on the information (for example concerning consistency, currency, and availability), the support of user-specific fusion and analysis methods, and scalability all call for a flexible and extensible infrastructure. The methodologies and techniques of such an infrastructure are to be developed as a generic kernel for efficient applications that support information fusion.
The concept of Lasting Information Fusion denotes all aspects pertinent to a non-transient utilization of an information space. This notion departs from the traditional approach, which clearly distinguishes between the analysis and the fusion of data as two distinct, essentially non-recurring processes that use predetermined sources and produce results used unmodified over a certain period of time. In contrast, we often have to deal with heterogeneous and dynamic data sources. The sources are not only distinct but also subject to change, both in their availability and in the data they contain. Existing learning and analysis techniques leave it to the user to select pertinent sources and data and to decide when the procedures must be iterated or supplemented. They are therefore not suited to a heterogeneous and dynamic setting. The objective of our project is to develop active techniques and tools that support the user in the presence of heterogeneous and dynamic data and thus ensure the coordination of the results obtained from different data sources. To accomplish this, active learning techniques require a precise and operationalized definition of the learning target, e.g. the attainment of optimal prediction or classification accuracy. Given the learning target, the learning procedures can determine which data or sources to consider. Moreover, cost functions can be used to weigh the expected gain against the expected cost, e.g. with respect to computational or data procurement resources. Revision techniques can furthermore, by dynamically comparing data sources with the system's current model, determine whether altered data or data sources permit or even require a modification of the current model. We plan to integrate these procedures into interactive software environments that allow the user to employ them in a fully automated, partially automated, or completely interactive way.
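As a concrete illustration of the cost-based source selection described above, the following is a minimal sketch in Python; the names (Source, expected_gain, procurement_cost) and the greedy strategy are assumptions for illustration, not the project's actual technique.

```python
# Hypothetical sketch: cost-sensitive selection of data sources for an
# active learner. All names and the greedy heuristic are illustrative
# assumptions, not the project's actual API or method.
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    expected_gain: float      # estimated improvement toward the learning target
    procurement_cost: float   # computational / data procurement cost

def select_sources(candidates, budget):
    """Greedily pick sources whose expected gain justifies their cost."""
    chosen, spent = [], 0.0
    # Prefer sources with the best gain-per-cost ratio.
    ranked = sorted(candidates,
                    key=lambda s: s.expected_gain / s.procurement_cost,
                    reverse=True)
    for src in ranked:
        if spent + src.procurement_cost <= budget and src.expected_gain > 0:
            chosen.append(src)
            spent += src.procurement_cost
    return chosen

sources = [Source("warehouse", 0.05, 2.0), Source("web_feed", 0.02, 0.5)]
print([s.name for s in select_sources(sources, budget=2.0)])  # -> ['web_feed']
```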
Expert knowledge in a specialized domain (e.g. a product, a procedure, regulations, etc.) rarely appears in the form of formal knowledge structures. Usually it is manifested either in written form in natural-language documents, or it is not directly accessible at all, as it has never been formally recorded and can only be acquired from the experts themselves. In the latter two cases the acquisition of the knowledge is an expensive and complex process (the knowledge acquisition bottleneck). Most approaches to knowledge acquisition, for example the KADS methodology, currently concentrate on the elicitation and formalization of knowledge held by domain experts. By contrast, there has been very little investigation into automatically acquiring knowledge from documents in which experts have recorded domain knowledge in natural language. Knowledge verbalized in documents is not formalized to a degree that permits purposeful queries; it has to be extracted laboriously from the documents. A further problem is the distributed storage of information: no single document contains the complete domain knowledge. Different aspects of the domain are described in different documents, not without redundancies and inconsistencies. Finally, inconsistent terminology in the textual descriptions of the domain, and different views of the same circumstances, impede access to the documented knowledge. We envisage a solution to these problems in formally representing the domain knowledge contained in these documents in the form of a knowledge base. Such a formal representation not only supports various forms of access, and thus applications such as knowledge-based systems, but also performs the fusion of the knowledge contained in the documents. In other words, knowledge that was formerly distributed over numerous documents, often redundantly, is condensed into a single knowledge resource. The goal of the project is to develop the elements of an interactive workbench that provides the user with tools for the semiautomatic construction of this condensed source of knowledge.
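To make the idea of condensing distributed, redundant document knowledge concrete, here is a small sketch under an assumed triple representation; the conflict handling is a placeholder for the interactive resolution the workbench would support.

```python
# Illustrative sketch only: fusing facts extracted from several documents
# into a single knowledge base. The triple representation and conflict
# handling are assumptions, not the project's actual formalism.
def fuse_documents(extracted):
    """extracted: list of (subject, predicate, value, source_doc) tuples."""
    kb, conflicts = {}, []
    for subj, pred, value, doc in extracted:
        key = (subj, pred)
        if key not in kb:
            kb[key] = (value, {doc})
        elif kb[key][0] == value:
            kb[key][1].add(doc)          # redundant statement: merge provenance
        else:
            conflicts.append((key, kb[key][0], value, doc))  # inconsistency
    return kb, conflicts

facts = [("GG40", "tensile_strength", "400 MPa", "doc1"),
         ("GG40", "tensile_strength", "400 MPa", "doc2"),
         ("GG40", "tensile_strength", "390 MPa", "doc3")]
kb, conflicts = fuse_documents(facts)
print(len(kb), "facts,", len(conflicts), "conflicts to resolve interactively")
```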
One important criterion of the information fusion project is minimizing the response times of search requests. Short response times improve the responsiveness of the system and support the interactive use of the provided tools. Search requests operate on data sets from remote and/or local databases, which are processed on several levels. For that purpose, either the data has to move from bottom to top or functions have to move from top to bottom; in some circumstances both data and functions have to move. In all cases some context boundaries (in particular address spaces) have to be crossed. Depending on the kind of boundary and the mechanisms supported by the operating system, these vertical interactions are more or less expensive and influence the response times accordingly. The goal of this project is the design and development of an object-oriented runtime environment that enables the optimization of interactions between software components. Depending on the application profile and its particular needs, the most profitable interaction patterns will be used. The system is based on a suite of prefabricated, family-based invocation stubs. These stubs differ in whether they use macro-, procedure-, domain-, or message-based invocation protocols. They form the foundation for composing the interactions according to the configuration selected by the application. In this way, a kind of architecture transparency is established: application components operate independently of whether a monolithic or a modular overall system structure is, or has to be, chosen at runtime. The application specifies the configuration aspects, which in turn determine the appropriate interaction patterns. As far as possible, this selection will be made automatically. It is based on specifications that, on the one hand, capture the needs of the application and, on the other hand, describe the (functional) features of the prefabricated invocation stubs. Based on the chosen invocation stubs, an aspect weaver will merge the existing source code of both sides (application and system components) at translation time and/or runtime.
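The following sketch illustrates the idea of a family of invocation stubs behind a single interface; the class and profile names are invented, and the message transport is merely simulated by serialization rather than crossing a real address-space boundary.

```python
# Conceptual sketch only: prefabricated invocation stubs with a common
# interface; a configuration profile decides which variant is woven in,
# so application components stay independent of the system structure.
import json

class LocalCallStub:
    """Direct invocation: caller and callee share one address space."""
    def invoke(self, component, op, *args):
        return getattr(component, op)(*args)

class MessageStub:
    """Message-based protocol, simulated here by (de)serializing the call."""
    def invoke(self, component, op, *args):
        wire = json.loads(json.dumps({"op": op, "args": list(args)}))
        return getattr(component, wire["op"])(*wire["args"])

def select_stub(profile):
    # Architecture transparency: the component never learns which stub
    # the configuration selected.
    return LocalCallStub() if profile["same_address_space"] else MessageStub()

class Filter:
    def condense(self, factor):
        return f"condensed by {factor}"

stub = select_stub({"same_address_space": False})
print(stub.invoke(Filter(), "condense", 10))
```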
Information fusion is a process that is expected to be heavily influenced by user interaction. On the one hand, users need to decide what kind of fusion to apply; on the other hand, the fusion process needs to be presented in a comprehensible manner. Finally, a means is needed to modify individual aspects of the fusion process. Techniques and tools therefore need to be developed that enable the user to interact both with the data entering the fusion process and with the data it produces. Interaction and visualization are tightly correlated: an illuminating visualization is a prerequisite for giving the user access to the underlying data. In addition, interaction requires that the visualization allow visual characteristics to be traced back to their raw data sources or to intermediate results of the fusion process. Special data structures are therefore needed that enrich the visualization in a way that lets the user directly access the characteristics seen in the data. This in turn requires that suitable data be provided already during the fusion itself. Object movements are seen as a characteristic presentation attribute of information fusion with regard to visualization. On the one hand, an additional set of presentation variables is needed, because the visualization of the data to be fused often occupies the conventional variables such as color, shape, and position for its own purposes. On the other hand, good correspondences are expected to exist between fusion processes and object movements that can express them. Such object movements require specific interaction techniques that give the user access to moving objects and movement parameters. User interaction is to be considered early in the work of the research group, so that data can be collected early and provided in a structured form accessible to the visualization. If interaction were not considered early, suitable interaction methods could be retrofitted only at very high cost.
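A minimal sketch of such an enriching data structure, with invented names: each visual mark carries provenance back to its raw records and fusion step, and object movement appears as an additional presentation variable.

```python
# Sketch under assumed names, not the project's actual data model: visual
# marks keep back-links to raw records so interaction can trace a visible
# characteristic to its origin in the fusion process.
from dataclasses import dataclass

@dataclass
class Provenance:
    source: str          # originating database or intermediate result
    record_ids: list     # raw records behind this mark
    fusion_step: str     # which step of the fusion process produced it

@dataclass
class VisualMark:
    x: float
    y: float
    color: str
    motion: tuple = (0.0, 0.0)       # extra presentation variable: movement
    provenance: Provenance = None

def on_click(mark: VisualMark):
    """Interaction hook: resolve a picked mark to its underlying data."""
    p = mark.provenance
    return f"{len(p.record_ids)} records from {p.source} via {p.fusion_step}"

mark = VisualMark(0.3, 0.7, "red", motion=(0.1, 0.0),
                  provenance=Provenance("defects_db", [17, 42], "cluster-merge"))
print(on_click(mark))
```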
In this project, visual data mining methods will be developed with the goal of examining and linking data from the different applications participating in the information fusion process. The main objective is to develop and evaluate a visual data mining system that allows an efficient and effective exploration of the data to be fused. Most currently used clustering algorithms cannot handle large multidimensional data sets efficiently and effectively. To overcome these limitations, new visualization techniques and combinations of innovative visual and adaptive automated methods will be investigated and implemented. In contrast to existing systems, the user will be enabled to assess the significance of the discovered information, to understand the parameters of the automated techniques, and to tune them. In addition, the visualization techniques allow an abstract understanding of the data and effectively support the integration of informal background knowledge into the exploration process. This user interaction will improve the quality and effectiveness of the information fusion process. The main applications of the developed visual data mining techniques within the research group are cluster analyses of engineering and molecular biology databases.
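As an illustration of coupling an automated method with user-visible, tunable parameters, here is a hedged sketch using scikit-learn's KMeans as a stand-in for the project's own algorithms.

```python
# Hedged sketch of the intended interplay: an automated clustering step
# whose parameters stay visible and tunable, plus a quality cue the user
# can inspect before re-running. KMeans is a stand-in, not the project's
# actual method.
import numpy as np
from sklearn.cluster import KMeans

def explore(data, k):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    # Expose the parameter and a quality measure instead of hiding them,
    # so the user can judge significance and tune k interactively.
    return model.labels_, model.inertia_

data = np.random.RandomState(0).rand(200, 8)   # stand-in multidimensional data
for k in (2, 4, 8):
    labels, inertia = explore(data, k)
    print(f"k={k}: within-cluster variance {inertia:.1f}")
```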
Computer-aided support for the engineering tasks of casting design is very important in small and medium-sized companies of the casting industry. With this support and an efficient utilization of information, the casting design process is to be rationalized; a number of legacy applications will be used in the process. As the field of application for the Workbench for Information Fusion, the complex task of raw part generation has been chosen. This field comprises several engineering tasks, such as checking the casting for technical feasibility and for producibility, predefining the moulding and core moulding processes, and the scheme of allocation of sheets. Here, a raw part is the casting after removal from the mould and after the cleaning shop, without core prints and without the gating and risering system. The generation of raw parts depends on many parameters, whose interrelations must be considered. Examples of raw part parameters are: the predefinition of the parting line; the position, shape, and dimensions of cores; drafts; casting radii; technologically necessary features such as ribs, root faces, eyes, and notches; and material properties. For the individual processes and work steps, many specific databases are available whose data contain redundancies and inconsistencies, for example databases on casting defects, design rules, or physical variables. A multitude of direct and indirect influencing variables must be considered, for instance the moulding and casting process, the casting shape, the complexity of the casting, the dimensions, the requirements on the surface quality of the casting contours, the material, and the gating and risering system. The engineer who carries out the raw part generation uses these relationships in an iterative and interactive process. Many foundries depend on the expert knowledge of these workers; this is a critical competitive handicap, so foundries attempt to rationalize raw part generation in order to adopt new technology and increase the level of automation. Many other information sources, such as work instructions, material information, DIN and DIN EN standards, and foundry-specific guides, must also be used to support the generation of raw parts. This information is available in different media (audio, video, photo, text). The aim of information fusion is to find dependencies across all databases and to support the casting design process with this information. Through the interplay of several databases, the expert knowledge of the engineer is supported and complemented. For instance, connecting information about casting technology with information about the avoidance of casting defects is very important. From the combination of different databases, "new knowledge" arises that is applicable during casting design as well as in CAD systems. Another example in our project is the use of data about previously occurred casting defects in order to select suitable moulding and casting processes from a database; databases with standard values and design rules should also be taken into account. A cooperation of the existing databases is necessary in order to exploit dependencies and provide the information in a new quality.
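The following toy example sketches the kind of cross-database dependency described above; the table and column names are invented, and a real system would operate on the foundry's actual defect and process databases.

```python
# Illustrative sketch only (tables and columns are invented): linking a
# casting defect database with a process database to rule out moulding
# and casting processes that are prone to the defects observed so far.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE defects(casting TEXT, defect TEXT);
    CREATE TABLE process_risks(process TEXT, defect TEXT);
    INSERT INTO defects VALUES ('housing-7', 'shrinkage cavity');
    INSERT INTO process_risks VALUES ('green sand', 'shrinkage cavity'),
                                     ('investment', 'hot tear');
""")

# Processes with no recorded risk of the defects already observed:
rows = con.execute("""
    SELECT DISTINCT process FROM process_risks
    WHERE process NOT IN (
        SELECT pr.process FROM process_risks pr
        JOIN defects d ON d.defect = pr.defect)
""").fetchall()
print([r[0] for r in rows])   # candidate processes for the raw part
```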
The bioinformatics project analyses regulatory DNA sequences. A workbench is being developed for integrated access to the relevant molecular databases and analysis tools (methods). Relevant databases include TRANSFAC, EMBL, and EPD; analysis tools include Blast, AliBaba, AliBaba2, and SiteMiner. The aim of the project is to derive promoter models from the databases by constructing information flow processes. The promoter models help to identify new promoters in DNA sequences whose regulation is unknown, and they help to explain the regulatory function of these promoters. To support the construction of valuable information, the information flow process and its function will be visualized.
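A hypothetical sketch of such an information flow process as a composition of steps; the step interfaces and the accession identifiers are invented for illustration and do not reflect the actual APIs of the tools named above.

```python
# Hypothetical sketch: an "information flow process" as a composition of
# retrieval and analysis steps that yields a promoter model. The step
# names mirror the tools listed above; their interfaces are invented.
def fetch_sequences(accessions):        # stand-in for EMBL / EPD retrieval
    return [f"sequence:{a}" for a in accessions]

def find_binding_sites(sequences):      # stand-in for AliBaba2 / TRANSFAC
    return [(seq, ["TATA-box", "GC-box"]) for seq in sequences]

def build_promoter_model(annotated):
    # A toy promoter model: binding sites shared by all input sequences.
    common = set.intersection(*(set(sites) for _, sites in annotated))
    return sorted(common)

def pipeline(data, *steps):
    for step in steps:
        data = step(data)
    return data

model = pipeline(["acc-1", "acc-2"],    # placeholder accession identifiers
                 fetch_sequences, find_binding_sites, build_promoter_model)
print(model)
```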
The goal of the life-cycle-spanning integration of environmentally relevant material information is the analysis of the environmental effects resulting from the use of a material, even if the underlying data are incomplete or partially defective, because only the unification of all material flows involved in the use of a material enables sound statements about the environmental impact. Data gaps and defects are to be discovered, classified, and closed as far as possible by methods of information fusion. Furthermore, all relevant data are described on a superordinate level. This description forms the basis for investigating suitable data acquisition methods for extracting environmentally relevant data from unstructured or partially structured documents.
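A minimal sketch, under assumed field names, of discovering, classifying, and closing data gaps; the classification rules and the mean imputation are crude placeholders for the fusion methods to be developed.

```python
# Sketch with invented field names: discovering and classifying gaps in a
# material-flow data set and closing simple ones by imputation, while
# recording that the value was imputed rather than measured.
records = [
    {"material": "steel", "stage": "production", "co2_kg": 1.9},
    {"material": "steel", "stage": "transport",  "co2_kg": None},   # gap
    {"material": "steel", "stage": "disposal",   "co2_kg": -0.4},   # defect?
]

def classify(record):
    v = record["co2_kg"]
    if v is None:
        return "gap"
    if v < 0:
        return "suspect"     # implausible value: flag for the user
    return "ok"

def close_gaps(records):
    known = [r["co2_kg"] for r in records if classify(r) == "ok"]
    fallback = sum(known) / len(known)   # crude mean imputation placeholder
    for r in records:
        if classify(r) == "gap":
            r["co2_kg"] = fallback
            r["imputed"] = True          # keep provenance of the closure
    return records

for r in close_gaps(records):
    print(r["stage"], r["co2_kg"], classify(r))
```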
|