Note: t for Two (Clusters)

The computation for cluster analysis is typically done by iterative algorithms. Here, however, a straightforward, non-iterative procedure is presented for clustering in the special case of one variable and two groups. The method is univariate but may reasonably be applied to multivariate datasets when the first principal component or a single factor explains much of the variation in the data. The t method is motivated by two facts: minimizing the within-groups sum of squares is equivalent to maximizing the between-groups sum of squares, and Student’s t statistic measures the between-groups difference in means relative to within-groups variation. That is, the t statistic is the difference in sample means divided by the standard error of this difference. Maximizing the t statistic is therefore developed as a method for clustering univariate data into two clusters. In this setting, the t method gives the same results as the K-means algorithm.
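The non-iterative procedure can be sketched as follows: sort the data, evaluate the two-sample t statistic at each of the n − 1 possible cut points, and keep the cut with the largest |t|. This is an illustrative implementation (the function name and details are not from the paper):

```python
import math

def t_split(x):
    """Cluster 1-D data into two groups by maximizing the two-sample t statistic.

    Sorts the data, then for each of the n-1 cut points computes Student's t
    (pooled variance) for the left vs. right group and keeps the cut with the
    largest |t|. Illustrative sketch, not the authors' code.
    """
    xs = sorted(x)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    best_t, best_k = -math.inf, 1
    for k in range(1, n):                      # left group xs[:k], right xs[k:]
        left, right = xs[:k], xs[k:]
        m1 = sum(left) / k
        m2 = sum(right) / (n - k)
        ss1 = sum((v - m1) ** 2 for v in left)
        ss2 = sum((v - m2) ** 2 for v in right)
        sp2 = (ss1 + ss2) / (n - 2)            # pooled within-groups variance
        se = math.sqrt(sp2 * (1 / k + 1 / (n - k))) if sp2 > 0 else 0.0
        t = abs(m2 - m1) / se if se > 0 else math.inf
        if t > best_t:
            best_t, best_k = t, k
    return xs[:best_k], xs[best_k:], best_t
```

Because only contiguous splits of the sorted data need to be considered, the search is exhaustive yet cheap, which is why no iteration is required.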

A Unified Theory of the Completeness of Q-Matrices for the DINA Model

Diagnostic classification models in educational measurement describe ability in a knowledge domain as a composite of specific binary skills called “cognitive attributes,” each of which an examinee may or may not have mastered. Attribute Hierarchy Models (AHMs) account for the possibility that attributes are dependent by imposing a hierarchical structure such that mastery of one or more attributes is a prerequisite for mastering one or more other attributes. The number of meaningfully defined attribute combinations is thereby reduced, so constructing a complete Q-matrix may be challenging. (The Q-matrix of a cognitively diagnostic test documents which attributes are required for solving each item; the Q-matrix is said to be complete if it guarantees the identifiability of all realizable proficiency classes among examinees.)
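Completeness can be checked directly from the definition: under the DINA model, an examinee's ideal response to an item is 1 exactly when all attributes required by the item are mastered, and a Q-matrix is complete if distinct attribute profiles always yield distinct ideal response vectors. A minimal sketch, assuming rows of Q are 0/1 attribute-requirement vectors (function names are illustrative):

```python
from itertools import product

def ideal_response(alpha, q):
    # DINA ideal response: correct iff every attribute required by the item
    # (q_k = 1) is mastered by the examinee (alpha_k = 1)
    return int(all(a >= r for a, r in zip(alpha, q)))

def is_complete(Q, profiles=None):
    """Check completeness of a Q-matrix under the DINA model.

    Complete means every pair of (realizable) attribute profiles produces
    distinct ideal response vectors. `profiles` defaults to all 2^K
    combinations, i.e., no attribute hierarchy; with an AHM, pass only the
    realizable profiles. Illustrative sketch.
    """
    K = len(Q[0])
    if profiles is None:
        profiles = list(product([0, 1], repeat=K))
    patterns = {tuple(ideal_response(a, q) for q in Q) for a in profiles}
    return len(patterns) == len(profiles)
```

Under an attribute hierarchy, restricting `profiles` to the realizable classes is exactly what shrinks the set of patterns the Q-matrix must distinguish.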

Matrix Normal Cluster-Weighted Models

Finite mixtures of regressions with fixed covariates are a commonly used model-based clustering methodology for regression data. However, they assume assignment independence, i.e., the allocation of data points to the clusters is made independently of the distribution of the covariates. To take the latter aspect into account, finite mixtures of regressions with random covariates, also known as cluster-weighted models (CWMs), have been proposed in the univariate and multivariate literature. In this paper, the CWM is extended to matrix data, i.e., data in which a set of variables is observed simultaneously at different time points or locations. Specifically, the cluster-specific marginal distribution of the covariates and the cluster-specific conditional distribution of the responses given the covariates are assumed to be matrix normal.
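The matrix normal distribution MN(M, U, V) separates the covariance of an n × p random matrix into an among-row covariance U and an among-column covariance V, with vec(X) ~ N(vec(M), V ⊗ U). A minimal sampling sketch using the Cholesky factors of U and V (illustrative, not the paper's code):

```python
import numpy as np

def sample_matrix_normal(M, U, V, rng):
    """Draw one sample from the matrix normal MN(M, U, V).

    M is the n x p mean matrix, U (n x n) the among-row covariance and
    V (p x p) the among-column covariance. Uses X = M + A Z B', where
    A A' = U, B B' = V, and Z has i.i.d. N(0, 1) entries.
    """
    A = np.linalg.cholesky(U)            # A @ A.T = U
    B = np.linalg.cholesky(V)            # B @ B.T = V
    Z = rng.standard_normal(M.shape)     # i.i.d. standard normal entries
    return M + A @ Z @ B.T
```

The Kronecker structure is what keeps the parameter count manageable: n(n+1)/2 + p(p+1)/2 covariance parameters instead of np(np+1)/2 for an unstructured covariance of vec(X).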

On Bayesian Analysis of Parsimonious Gaussian Mixture Models

Cluster analysis is the task of grouping a set of objects in such a way that objects in the same cluster are similar to each other. It is widely used in many fields, including machine learning, bioinformatics, and computer graphics. In all of these applications, the partition is an inference goal, along with the number of clusters and their distinguishing characteristics. The mixture of factor analyzers (MFA) model is a special case of model-based clustering that assumes the covariance of each cluster comes from a factor analysis model. It simplifies the Gaussian mixture model through parameter dimension reduction and conceptually represents the variables as coming from a lower-dimensional subspace in which the clusters are separated. In this paper, we introduce a new RJMCMC (reversible-jump Markov chain Monte Carlo) inferential procedure for the family of constrained MFA models.
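The parameter reduction comes from the factor-analytic covariance structure Σ = ΛΛ′ + Ψ, where Λ is a p × q loading matrix (q < p) and Ψ is diagonal. A small sketch of the implied cluster covariance and the resulting parameter count (names are illustrative; the identifiability correction of q(q−1)/2 is omitted for simplicity):

```python
import numpy as np

def mfa_covariance(Lambda, psi):
    """Covariance implied by a factor analysis model: Sigma = Lambda Lambda' + Psi.

    Lambda is the p x q loading matrix (q < p factors) and psi the vector of
    p uniquenesses. This is the cluster-specific covariance assumed by an
    MFA model; illustrative sketch.
    """
    return Lambda @ Lambda.T + np.diag(psi)

def n_covariance_params(p, q):
    # p*q loadings + p uniquenesses, vs. p*(p+1)/2 free entries in an
    # unconstrained covariance matrix
    return p * q + p
```

For p = 10 variables and q = 2 factors, this is 30 parameters per cluster instead of 55, and the gap widens rapidly with p.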