Knowledge Discovery from Databases

Dr. Biswas's Homepage

  Frawley et al. define knowledge discovery as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data". In knowledge discovery from databases (KDD), machine learning techniques have been adapted to large-scale databases for discovery tasks. The discovery method, which is at the core of the generic architecture of a discovery system, computes and evaluates groupings, patterns, and relationships in the context of a problem-solving task. The groupings, patterns, and relationships are derived from raw data extracted from a database, or from a preprocessed form of the raw data. Preprocessing may be done by statistical or by knowledge-based techniques. Depending on the discovery method used, the knowledge produced may take different forms:
  • data objects organized into groups or categories, where each group represents a relevant concept in the problem-solving domain. Inductive discovery methods in this category are called clustering methods,
  • classification rules that identify a group of objects that have the same label or differentiate among groups of objects that have different labels. These methods are termed classification/regression methods, and
  • descriptive regularities, qualitative or quantitative, among sets of parameters drawn from object descriptions. Inductive methods in this category are called empirical discovery methods.
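To make the three forms concrete, the following Python sketch derives one of each from the same toy dataset. It is an illustration under stated assumptions, not a description of any particular discovery system: the data are made up, the clustering step is a plain two-means procedure, the classification rule is a single learned threshold, and the empirical regularity is an ordinary least-squares line. All function names are hypothetical.

```python
# Toy illustration of the three forms of discovered knowledge.
# Data and all function names are hypothetical.

def two_means_1d(values, iters=20):
    """Clustering: split 1-D values into two groups (k-means with k=2)."""
    c = [min(values), max(values)]  # start the two centroids at the extremes
    for _ in range(iters):
        groups = [[], []]
        for v in values:
            groups[0 if abs(v - c[0]) <= abs(v - c[1]) else 1].append(v)
        # recompute centroids; keep the old one if a group is empty
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    return groups

def best_threshold_rule(xs, labels):
    """Classification rule 'x >= t -> label 1, else 0' with the best training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = sum((x >= t) == bool(y) for x, y in zip(xs, labels)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

def least_squares_line(xs, ys):
    """Empirical regularity: fit y = a + b*x by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]
ys = [2 * x + 1 for x in xs]

print(two_means_1d(xs))                 # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
print(best_threshold_rule(xs, labels))  # (10.0, 1.0)
print(least_squares_line(xs, ys))       # (1.0, 2.0)
```

The same six objects yield a grouping, a rule that separates the labels, and a quantitative relationship between two parameters, which is the distinction the list above draws.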
Our work in this field includes (1) ITERATE, a conceptual clustering system that generates stable and cohesive clusters through the ADO-star data ordering technique and an iterative redistribution strategy, and (2) a knowledge-based equation discovery system that uses clustering techniques to define homogeneous contexts and then derives analytical equations for the response variable within each context.
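The Python sketch below strings together the ideas named in (1) and (2) on numeric data: order the objects, form an initial partition in that order, redistribute objects until the partition is stable, and then fit a separate equation inside each resulting context. It is only loosely inspired by that work and is not ITERATE or the published ADO-star ordering. The greedy "farthest from the previous object" ordering, the Euclidean distance, the hypothetical `radius` threshold for the initial pass, the nearest-centroid redistribution (standing in for whatever cluster-quality measure the actual system uses), and the straight-line fit per context are all assumptions chosen for brevity.

```python
import math

def dissimilarity_order(points):
    """Assumed ordering heuristic: start at points[0], then repeatedly append
    the unvisited point farthest from the last point appended."""
    order, remaining = [0], set(range(1, len(points)))
    while remaining:
        last = points[order[-1]]
        nxt = max(sorted(remaining), key=lambda i: math.dist(points[i], last))
        order.append(nxt)
        remaining.remove(nxt)
    return order

def centroid(cluster, points):
    dims = len(points[0])
    return tuple(sum(points[i][d] for i in cluster) / len(cluster) for d in range(dims))

def initial_partition(points, order, radius):
    """One pass in the given order: join the nearest cluster if its centroid
    is within `radius` (hypothetical parameter), otherwise start a new one."""
    clusters = []
    for i in order:
        best = min(clusters, key=lambda c: math.dist(points[i], centroid(c, points)),
                   default=None)
        if best is not None and math.dist(points[i], centroid(best, points)) <= radius:
            best.append(i)
        else:
            clusters.append([i])
    return clusters

def redistribute(points, clusters, max_rounds=50):
    """Iterative redistribution: move every object to the cluster with the
    nearest centroid until no object changes cluster."""
    for _ in range(max_rounds):
        cents = [centroid(c, points) for c in clusters]
        new = [[] for _ in clusters]
        for i, p in enumerate(points):
            new[min(range(len(cents)), key=lambda j: math.dist(p, cents[j]))].append(i)
        new = [c for c in new if c]  # drop clusters that became empty
        if sorted(map(sorted, new)) == sorted(map(sorted, clusters)):
            return new
        clusters = new
    return clusters

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x within one context."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Hypothetical data: a context variable, a predictor x, and a response y.
contexts = [(0.0,), (0.2,), (0.1,), (5.0,), (5.1,), (4.9,)]
xs = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0, 9.0, 8.0, 7.0]

order = dissimilarity_order(contexts)
clusters = redistribute(contexts, initial_partition(contexts, order, radius=1.0))
equations = [fit_line([xs[i] for i in c], [ys[i] for i in c]) for c in clusters]
print(clusters, equations)  # [[0, 1, 2], [3, 4, 5]] [(1.0, 2.0), (10.0, -1.0)]
```

With these toy values the two contexts are recovered and each gets its own equation (y = 1 + 2x and y = 10 - x); a single global fit over all six objects would blur the two relationships together, which is the motivation for defining homogeneous contexts first.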

More recently, we have been looking at unsupervised learning (clustering) techniques for temporal sequences of data. The goal is to classify objects with temporal features; this will find applications in domains such as the analysis of Pediatric Intensive Care Unit (PICU) patients and the classification of faults in complex, dynamic systems. Recent papers discuss our Hidden Markov Model (HMM)-based algorithm for clustering data objects with continuous time sequence features.
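One common way to cluster sequences with HMMs alternates between assigning each sequence to the model under which it is most likely and re-estimating each model from the sequences assigned to it. The Python sketch below shows only the assignment step, using the forward algorithm with hand-set parameters; it is an assumption-laden illustration rather than the algorithm described in our papers. The two-state models, the 1-D Gaussian emissions, the `(start, trans, emit)` tuple layout, and the example sequences are all hypothetical, and every probability must be nonzero because the sketch takes logarithms.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_likelihood(seq, hmm):
    """Forward algorithm in log space. Hypothetical layout:
    hmm = (start_probs, trans[from][to], [(mean, var) per state])."""
    start, trans, emit = hmm
    n = len(start)
    # alpha[s] = log P(observations so far, current state = s)
    alpha = [math.log(start[s]) + log_gauss(seq[0], *emit[s]) for s in range(n)]
    for x in seq[1:]:
        alpha = [
            logsumexp([alpha[r] + math.log(trans[r][s]) for r in range(n)])
            + log_gauss(x, *emit[s])
            for s in range(n)
        ]
    return logsumexp(alpha)

def assign(sequences, hmms):
    """Assignment step: index of the most likely model for each sequence."""
    return [
        max(range(len(hmms)), key=lambda k: log_likelihood(seq, hmms[k]))
        for seq in sequences
    ]

# Two hand-set models: one tends to stay in its state, one tends to alternate.
steady = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [(0.0, 1.0), (1.0, 1.0)])
alternating = ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], [(0.0, 1.0), (5.0, 1.0)])
hmms = [steady, alternating]

sequences = [[0.1, 0.2, 0.0, 0.1], [0.0, 5.0, 0.1, 4.9]]
print(assign(sequences, hmms))  # [0, 1]
```

A full clustering loop would refit each model (for example with Baum-Welch re-estimation) from the sequences assigned to it and repeat until the assignments stop changing; choosing the number of models is a separate question this sketch does not address.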

To contact the Knowledge Discovery from Databases group, send email to

Last modified: 30 September 1999