GitHub - yunfile123/K-meansCluster: A java implementation of k-means algorithm.It uses ball tree as internal data structure to accelerate the computation.It uses 2-norm distance to compute the similarity between instances.

yunfile123 / K-meansCluster Public

forked from skyline0623/K-meansCluster

Notifications You must be signed in to change notification settings
Fork 0
Star 0

A java implementation of k-means algorithm.It uses ball tree as internal data structure to accelerate the computation.It uses 2-norm distance to compute the similarity between instances.

0 stars 26 forks Branches Tags Activity

Star

Notifications

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
KMeansCluster		KMeansCluster
.gitattributes		.gitattributes
.gitignore		.gitignore
Example Explanation.doc		Example Explanation.doc
example.txt		example.txt
output.txt		output.txt
readme.txt		readme.txt

Repository files navigation

1.The input dataset should have certain form:one line one instance,and neighboured feature should be splitted by space.
2.If you can change the maximum number of instances in a leaf node in the ball tree or the times of running the algorithm to find a better cluster for one specific k,change the conf.properties file in KMeansCluster.
3.The function you can use:
In the class Process:
	//gives your dataset's path and this function will build the internal data structures.	
	public static void loadData(String path) throws IOException;

	/**
	 * @param k  the initial number of clustering centers
	 * @return an entry:the key is the result of clustering.The label starts from 0.The value is the evaluation of the clustering result
	 */
	public static Entry<Integer[], Double> cluster(int k);

	/**gives the evaluation and differential of each k in specific range.you can use these infos to choose a good k for your clustering
	 * @param startK  gives the start point of k for the our try on k(inclusive)
	 * @param endK    gives the end point(exclusive)
	 * @return Entry's key is the evaluation of clustering of each k.The value is the differential of the evaluations--evaluation of k(i) - evaluation of k(i+1) for i in range(startK, endK - 1)
	 * */
	public static Entry<ArrayList<Double>, ArrayList<Double>> cluster(int startK, int endK);


Before call the last two functions,the first one must be called.

About

A java implementation of k-means algorithm.It uses ball tree as internal data structure to accelerate the computation.It uses 2-norm distance to compute the similarity between instances.

Readme