Tsolakis Dimitrios et al. [5] proposed a fuzzy clustering based vector quantization algorithm to deal with the following problems. The first is the high computational cost. The second is that vector quantization is required to assign each training sample to only one cluster. The third is the dependence on initialization. The proposed method has two basic design facets. The first facet concerns the minimization of a specialized objective function that unifies three potentially different approaches, namely c-means, fuzzy c-means and competitive agglomeration. C-means and fuzzy c-means are used to reduce the computational cost by reducing the number of distance calculations, while competitive agglomeration is used to reduce the number of clusters. The second facet concerns the development of a novel codeword migration technique. The proposed technique reduces the computational complexity while maintaining a high performance level at local minima.
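The fuzzy c-means ingredient of such a unified objective can be sketched as one alternating update of membership degrees and centroids. The NumPy sketch below is a generic FCM step, not the authors' method; the fuzzifier value and the coupling with competitive agglomeration in [5] are not reproduced:

```python
import numpy as np

def fuzzy_cmeans_step(X, centers, m=2.0):
    """One fuzzy c-means iteration: update memberships, then centroids.
    X: (n, d) training vectors; centers: (c, d) current codewords."""
    # squared distances between every sample and every centroid
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)               # guard against zero distance
    # soft membership degrees u_ik (each row sums to 1), unlike hard k-means
    inv = d2 ** (-1.0 / (m - 1.0))
    u = inv / inv.sum(axis=1, keepdims=True)
    # centroid update weighted by u^m
    w = u ** m
    new_centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return u, new_centers
```

Iterating this step until the centers stabilize yields the soft counterpart of the hard one-cluster assignment criticized above.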

Motivated by the problem of effectively executing clustering algorithms on very large data sets, we address a model for large scale distributed clustering methods. To this end, we briefly recall some standards on the quantization problem and some results on the almost sure convergence of the competitive learning vector quantization (CLVQ) procedure. A general model for linear distributed asynchronous algorithms well adapted to several parallel computing architectures is also discussed. Our approach brings together this scalable model and the CLVQ algorithm, and we call the resulting technique the distributed asynchronous learning vector quantization algorithm (DALVQ). An in-depth analysis of the almost sure convergence of the DALVQ algorithm is performed. A striking result is that we prove that the multiple versions of the quantizers distributed among the processors in the parallel architecture asymptotically reach a consensus almost surely. Furthermore, we also show that these versions converge almost surely towards the same nearly optimal value for the quantization criterion.
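The sequential CLVQ procedure underlying DALVQ is the standard stochastic competitive update: at each step the codeword nearest to a fresh random sample is moved slightly towards it. A minimal single-machine sketch (the distributed asynchronous averaging that defines DALVQ is not shown):

```python
import numpy as np

def clvq_step(codebook, sample, step):
    """Move the winning (nearest) codeword a fraction `step` towards the sample."""
    winner = int(((codebook - sample) ** 2).sum(axis=1).argmin())
    codebook[winner] += step * (sample - codebook[winner])
    return winner

def clvq(codebook, samples, step0=0.5):
    """Run CLVQ with a decreasing (Robbins-Monro style) step sequence."""
    codebook = np.asarray(codebook, dtype=float).copy()
    for t, x in enumerate(samples, start=1):
        clvq_step(codebook, x, step0 / t)
    return codebook
```

In DALVQ, each processor runs such updates on its own copy of the quantizer and the copies are merged asynchronously; the consensus result cited above says these copies agree in the limit.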


counterpart convergence theorems for stochastic gradient like those established e.g. in [26, 39, 8]. Of course, its main asset is that it can be implemented very easily when µ is simulatable. In fact, under various names (k-means, Competitive Learning Vector Quantization algorithm, nuées dynamiques, etc.), it has been widely implemented for years in the communities of Artificial Neural Networks, Data Mining and, more recently, Machine Learning, as a clustering procedure producing prototypes and an automatic classifier (see below).


Lara Dantas et al. (2015) studied the use of neural networks such as the Multi-Layer Perceptron (MLP), Extreme Learning Machine and Reservoir Computing to perform early diagnosis of patients with or without AD and Mild Cognitive Impairment (MCI), and for another common type of disease. The paper also details how the Random Forest algorithm and the feature selection method available in Weka, called InfoGainAttributeEval, were used to select proteins from the original set and thus build a new protein signature. Experimental results show that the best performance was obtained with the MLP, and that the new signatures created with the Random Forest achieved better results than any other system [11].

The operation of VQ [4], a vector encoding method, involves partitioning the image into numerous input vectors, which are then compared with the codebook codewords in order to find the nearest codeword for every input vector. VQ encodes every input vector as an index into the codebook. Generally, the codebook size is comparatively small compared with the actual dataset of images; hence, the goal of image compression is achieved. In the decoding procedure, the linked sub-images are retrieved precisely from the encoded indices via the codebook, and the decoding process is complete once every sub-image has been rebuilt. Several studies have derived methods for constructing the codebook. VQ algorithms can be divided into two types: (1) k-means-based and (2) competitive-learning-based. In competitive-learning-based techniques, the codebooks are obtained through a series of mutual competition procedures [5]. The k-means-based methods were designed to reduce the distortion error of choosing an appropriate codebook. The LBG algorithm [6] is a popular technique, but it follows a local search procedure and has one major disadvantage: its performance depends strongly on the initial conditions.
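The encode/decode cycle just described fits in a few lines. The sketch below assumes the image has already been partitioned into row vectors and that a codebook has already been designed (e.g. by LBG); both names are illustrative:

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Return, for each input vector, the index of its nearest codeword."""
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def vq_decode(indices, codebook):
    """Reconstruct sub-images by codebook lookup; lossy unless every
    input vector coincides exactly with a codeword."""
    return codebook[indices]
```

Compression comes from transmitting only the indices, each costing about log2(codebook size) bits, instead of the raw vectors.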

Xiang [31] proposed a color image quantization algorithm that minimizes the maximum distance between color pixels in each cluster (i.e., the intra-cluster distance). The algorithm starts by assigning all the pixels to one cluster. A pixel is then randomly chosen as the head of the cluster. The pixel that is most distant from its cluster head is chosen as the head of a new cluster. Then, pixels nearer to the head of the new cluster than to their own head are moved to the new cluster. This procedure is repeated until the desired number of clusters is obtained. The set of cluster heads forms the colormap. A hybrid competitive learning (HCL) approach combining competitive learning and splitting of the color space was proposed in [19]. HCL starts by randomly choosing a pixel as a cluster centroid. Competitive learning is then applied, resulting in all the image pixels being assigned to one cluster surrounding the centroid. A splitting process is then conducted by creating another copy of the centroid; competitive learning is then applied to both centroids. This process is repeated until the desired number of clusters is obtained. According to [19], HCL is fast, completely independent of initial conditions and can obtain near globally optimal results. When applied to commonly used images, HCL outperformed MCA, VBA and k-means, and performed comparably with competitive learning [19], [20].
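The head-selection loop of Xiang's method is a maximin procedure: each new head is the pixel farthest from its nearest existing head. A sketch on an `(n, 3)` array of color pixels (the helper name and the flat-pixel representation are assumptions, not the paper's code):

```python
import numpy as np

def maximin_heads(pixels, k, seed=0):
    """Pick k cluster heads: start from a random pixel, then repeatedly
    promote the pixel most distant from its nearest current head."""
    rng = np.random.default_rng(seed)
    heads = [pixels[rng.integers(len(pixels))]]
    while len(heads) < k:
        # squared distance of every pixel to its nearest current head
        d2 = np.min([((pixels - h) ** 2).sum(axis=1) for h in heads], axis=0)
        heads.append(pixels[int(d2.argmax())])
    return np.array(heads)
```

The resulting array of heads plays the role of the colormap; the reassignment of pixels to their nearest head then forms the clusters.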

Abstract: Image compression techniques that can be used for the storage and transmission of lossy digital images are presented in this paper. Storing huge databases and transferring data is especially important in both the multimedia and medical fields, and medical images are used for diagnosis purposes. Vector quantization is a method for lossy image compression that includes codebook design, encoding and decoding stages. Here, we have applied different lossy compression techniques, namely VQ-LBG (vector quantization with the Linde-Buzo-Gray algorithm), DWT-MSVQ (discrete wavelet transform with multistage vector quantization), FCM (fuzzy c-means clustering) and GIFP-FCM (generalized improved fuzzy partitions FCM), to different medical images in order to measure the quality of compression. GIFP-FCM is an extension of the classical FCM and IFP-FCM (improved fuzzy partitions FCM) algorithms with the purpose of rewarding hard membership degrees. The performance is assessed based on the effectiveness of the clustering output. In this method, a new objective function is reformulated and minimized so that there is a smooth transition from fuzzy to crisp mode. It is fast, easy to implement and has rapid convergence. The obtained results show that the GIFP-FCM algorithm gives better PSNR performance, a higher compression ratio (CR), lower mean square error (MSE) and less distortion than the other methods used, indicating better image compression.
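The quality measures quoted in this abstract are standard and easy to state precisely. A small sketch for 8-bit images (the compression ratio here is just a ratio of bit counts, not tied to any particular codec):

```python
import numpy as np

def mse(original, reconstructed):
    """Mean squared error between two images of the same shape."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return float((diff ** 2).mean())

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB (infinite for a perfect copy)."""
    e = mse(original, reconstructed)
    return float('inf') if e == 0.0 else 10.0 * np.log10(peak ** 2 / e)

def compression_ratio(original_bits, compressed_bits):
    """CR = size before / size after; higher means stronger compression."""
    return original_bits / compressed_bits
```

Higher PSNR and CR with lower MSE is exactly the combination the abstract reports for GIFP-FCM.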


Discriminative vector quantization schemes such as learning vector quantization (LVQ) and extensions thereof offer efficient and intuitive classifiers which are based on the representation of classes by prototypes. The original methods, however, rely on the Euclidean distance, corresponding to the assumption that the data can be represented by isotropic clusters. For this reason, extensions of the methods to more general metric structures have been proposed, such as relevance adaptation in generalized LVQ (GLVQ) and matrix learning in GLVQ. In these approaches, metric parameters are learned based on the given classification task such that a data driven distance measure is found. In this article, we consider full matrix adaptation in advanced LVQ schemes; in particular, we introduce matrix learning to a recent statistical formalization of LVQ, robust soft LVQ, and we compare the results on several artificial and real life data sets to matrix learning in GLVQ, which is a derivation of LVQ-like learning based on a (heuristic) cost function. In all cases, matrix adaptation allows a significant improvement of the classification accuracy. Interestingly, however, the principled behavior of the models with respect to prototype locations and extracted matrix dimensions shows several characteristic differences depending on the data sets.


Prototype-based methods enjoy a wide popularity in various application domains due to their very intuitive and simple behavior. They represent their decisions in terms of typical representatives contained in the input space, and a classification is based on the distance of data as compared to these prototypes [61]. Thus, models can be directly inspected by experts since prototypes can be treated in the same way as data. Popular techniques in this context include standard learning vector quantization (LVQ) schemes and extensions to more powerful settings such as variants based on cost functions or metric learners, such as generalized LVQ (GLVQ) or robust soft LVQ (RSLVQ), for example [81, 84, 88, 85]. These approaches are based on the notion of margin optimization similar to SVM in the case of GLVQ [84], or on a likelihood ratio maximization in the case of RSLVQ [88]. For GLVQ and RSLVQ, learning rules which closely resemble standard LVQ2.1 result, whereby the performance is superior to this latter heuristic; in particular, excellent generalization ability can be observed [85]. A few recent applications of LVQ technology can be found in the context of biomedical data analysis or life-long learning, as an example [22, 31, 59]. These applications crucially rely on the representation of the models in terms of representative prototypes, which opens the way towards model interpretability and compact model representation, respectively.


The gain in time is due to two factors. The first, and the most important, is related to the number of computations that take place in the learning phase: the ACLVQ is able to update more quantizers at each iteration than the CLVQ and, at the same time, performs fewer updates over the whole process. The number of iterations needed by the ACLVQ is smaller. This fact relies on the Lloyd modification of the ACLVQ, whose computation time is negligible for the whole process. It is important to emphasize that the number of random vector simulations generated is similar for both algorithms. The second aspect concerns the procedure used for the computation of the distance matrix in the competitive phase (see Appendix A), which very efficiently exploits MATLAB's vectorization of algorithms.
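The same vectorization idea carries over directly to NumPy. The sketch below (not the authors' MATLAB code) computes the full distance matrix of the competitive phase without explicit loops, using the expansion ‖a − b‖² = ‖a‖² − 2a·b + ‖b‖²:

```python
import numpy as np

def pairwise_sq_dists(A, B):
    """All squared Euclidean distances between rows of A (samples) and
    rows of B (quantizers), via matrix operations instead of loops."""
    return ((A ** 2).sum(axis=1)[:, None]
            - 2.0 * A @ B.T
            + (B ** 2).sum(axis=1)[None, :])
```

One matrix multiplication replaces the doubly nested loop, which is where the speed-up of vectorized environments comes from.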


column (fields, attributes, samples, or conditions) [2]. Datasets of breast cancer (Hedenfalk et al., 2001), sugarcane, Mus musculus, A. thaliana (all from NCBI, 2002) and yeast (Eisen et al., 1998) were used for carrying out experiments on microarray gene expression data using the three variants of LVQ. For all experiments the Euclidean measure was taken as the distance metric. Two algorithms of the self-organizing map (SOM) and the three variants of the LVQ algorithm were used for cluster analysis of microarray gene expression data. Over 650 experiments were conducted on the different datasets using the five algorithms, covering two variants of SOM and three variants of LVQ. The number of clusters/classes, the weights of the ANNs and the number of iterations were kept constant at 9, 0.5 and 1000, respectively. The learning rate (LR) was gradually increased from 0.1 to 1.0 and the corresponding clustering/classification error was computed. For all the datasets the data were log-transformed, except in the case of the Mus musculus dataset, for which pre-processing was applied to zero-fill genes whose expression value was null. Almost none of the data mining algorithms have been very precise in extensive studies of large-scale datasets and genome-wide expression, except that they have been successful in giving "a fair idea" or "a probable match" about the datasets.

best adapted to the situation of unknown cancer subtypes. In this context, some of the existing approaches are discussed in [Bair and Tibshirani, 2004]. In particular, it is mentioned that an approach by clustering only with respect to X may lead to groups differing through the biological features but unrelated to patient survival, which are not of prime interest for clinicians. Therefore, it is desirable to perform clustering taking into account the censored survival time. To that aim, in the context of genetic data, [Bair and Tibshirani, 2004] propose to select among the components of X only the variables correlated with T (having a large Cox score) and then to apply a non-supervised clustering method with respect to the selected covariates. The approach that we propose in this paper may be an alternative way to detect groups related to the survival time without excluding covariates. The idea is that our algorithm performs clustering with respect to the whole vector (T, X) using as input the available set of incomplete observations. As the variable T participates in the procedure directly, the constitution of groups naturally takes into account the survival time.


With the rapid development of science and technology, people have paid much attention to the problem of information security. Personal identification systems utilizing personal biometric features have naturally become more and more important due to their many applications in different fields, such as identity authentication, access control, and surveillance. In face recognition, facial feature extraction is the key to recognizing examples accurately and creating more effective systems. There are many techniques for extracting facial features, for example, Principal Component Analysis (PCA) [1], Edge Contour Feature Analysis [2], Elastic Bunch Graph Matching [3], etc. Many classification and recognition techniques, such as KNN [4], neural network classification [5] and the Support Vector Machine (SVM) [6], have also been proposed. As a well-known image coding compression method, the Vector Quantization (VQ) method [7] can be utilized for extracting facial features. For the recognition problem, the SVM classifier has been shown to give good generalization performance.

Abstract— In this paper, a novel approach to Arabic letter recognition is proposed. The system is based on the classified vector quantization (CVQ) technique employing the minimum distance classifier. To prove the robustness of the CVQ system, its performance is compared to that of a standard artificial neural network (ANN)-based solution. In the CVQ system, each input letter is mapped to its class using the minimum Euclidean distance. Simulation results are provided and show that the CVQ system always produces a lower Mean Squared Error (MSE) and higher success rates than the current ANN solutions.
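The minimum distance rule used by such a CVQ system can be sketched as follows, assuming one codebook per letter class has already been trained (function name and class labels are illustrative, not from the paper):

```python
import numpy as np

def classify_min_distance(x, class_codebooks):
    """Return the label of the class whose codebook contains the codeword
    with the minimum Euclidean distance to the feature vector x."""
    best_label, best_d2 = None, np.inf
    for label, codebook in class_codebooks.items():
        d2 = ((codebook - x) ** 2).sum(axis=1).min()
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label
```

Classification error then depends entirely on how well each class codebook covers the within-class variation of the letters.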

Leakage due to insensitivity of GLVQ output. The fact that the vanilla GLVQ algorithm leaks information is not necessarily surprising, since a useful algorithm always exhibits some degree of sensitivity to its input. A natural cure in the differential privacy framework, as proposed by the Laplacian mechanism, is to add noise proportional to the global sensitivity of the learning algorithm. As we will see, however, this is not a feasible solution in the case of GLVQ, since GLVQ can exhibit an extremely high sensitivity even on large data sets, due to another effect of the learning algorithm: in settings where there exists a mismatch between the prototypes and the underlying modality of the data distribution, the algorithm needs to distribute the prototypes among the data. For perfectly balanced data distributions, this results in a symmetry breaking by the algorithm. This symmetry, i.e. two different prototype locations which are regarded as equally good by the algorithm, can be disturbed by adding a few additional points. We will show that this is the case using the noisy XOR problem (see Fig. 2b):
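For reference, the Laplacian mechanism mentioned here is the standard one: perturb the released output with Laplace noise of scale sensitivity/ε. A generic sketch (the passage's point is precisely that plugging GLVQ's huge global sensitivity into this formula drowns the prototypes in noise):

```python
import numpy as np

def laplace_mechanism(output, sensitivity, epsilon, seed=None):
    """epsilon-differentially private release of `output`; the guarantee
    holds only if `sensitivity` truly bounds the query's global sensitivity."""
    rng = np.random.default_rng(seed)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon,
                        size=np.shape(output))
    return output + noise
```

The noise scale grows linearly with the sensitivity, which is why a near-unbounded sensitivity, as in the symmetry-breaking scenario above, makes the released prototypes meaningless.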


The purpose of training is the computation of suitable prototype vectors based on the available example data. The ultimate goal, of course, is generalization: the successful application of the classifier to novel, unseen data. LVQ training can follow heuristic ideas as in Kohonen's original LVQ1 [15]. A variety of modifications has been suggested in the literature, aiming at better convergence or favorable generalization behavior. A prominent and appealing example is the cost function based Generalized Learning Vector Quantization (GLVQ) [16]. We will resort to the latter as an example framework in which to introduce and discuss divergence based LVQ. We would like to point out, however, that differentiable divergences could be incorporated into a large variety of cost-function based or heuristic training prescriptions.
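The GLVQ cost function of [16] sums, over the training samples, the relative difference mu(x) = (d⁺ − d⁻)/(d⁺ + d⁻), where d⁺ is the squared distance to the closest prototype with the correct label and d⁻ to the closest prototype with any other label. A sketch of its evaluation (omitting the sigmoidal scaling of mu and the gradient updates used in actual training):

```python
import numpy as np

def glvq_cost(X, y, prototypes, proto_labels):
    """Sum of mu(x) over the data; mu < 0 means x is classified correctly,
    and more negative values indicate a larger hypothesis margin."""
    cost = 0.0
    for x, label in zip(X, y):
        d2 = ((prototypes - x) ** 2).sum(axis=1)
        d_plus = d2[proto_labels == label].min()
        d_minus = d2[proto_labels != label].min()
        cost += (d_plus - d_minus) / (d_plus + d_minus)
    return cost
```

Training moves prototypes to decrease this cost, which pushes correct prototypes towards the data and incorrect ones away, much like LVQ2.1 but with a bounded, differentiable objective.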


One of the main problems investigated in the past twenty years in Numerical Probability has been the numerical computation of conditional expectations, mostly motivated by problems arising in finance for the pricing of derivative products of American style, or more generally those known as "callable". It is also a challenging problem for the implementation of numerical schemes for Backward Stochastic Differential Equations (see [2,3]), Stochastic PDEs (see [32]), non-linear filtering [57, 68] and Stochastic Control Problems (see [13, 14, 58]). Further references are available in the survey paper [62] devoted to applications of optimal vector quantization to Numerical Probability. The specificity of these problems in the probabilistic world is that, whatever the selected method, it suffers in some way or another from the curse of dimensionality. Optimal quantization trees (introduced in [2]) are one of the numerical methods designed to cope with this problem (along with regression and the Monte Carlo-Malliavin method, see [46], [28]). The precise connection between vector quantization and conditional expectation computation can be summed up in the proposition below.
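The idea behind that connection is that E[Y | X] can be approximated by E[Y | X̂], where X̂ is the nearest-neighbour projection of X onto a finite quantization grid. A one-dimensional Monte Carlo sketch (illustrative only; actual quantization trees use optimized grids and weights):

```python
import numpy as np

def quantized_cond_expectation(X, Y, grid):
    """Estimate E[Y | Xhat = grid[k]] by averaging Y over the samples whose
    nearest grid point is grid[k] (the k-th quantization cell)."""
    idx = np.abs(X[:, None] - grid[None, :]).argmin(axis=1)
    est = np.full(len(grid), np.nan)
    for k in range(len(grid)):
        cell = idx == k
        if cell.any():
            est[k] = Y[cell].mean()
    return est
```

The quality of this approximation is governed by the quantization error of the grid, which is where the curse of dimensionality re-enters for vector-valued X.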
