![]() ![]() Pearson's Correlation Coefficient = covariance( A, B ) / ( (standardDeviation(A) * standardDeviation(B) ).Pearson's correlation coefficient is one such measure between two objects, A and B, such that: Sum_of_squares=sum(-movie_preferences,2)įor item in movie_preferences if item in movie_preferences])Ĭorrelation is the measure of the linear relationship between the attributes of two objects. # Add up the squares of all the differences # If they have no ratings in common, return 0 # Returns a distance-based similarity score between the movie preferences ofĭef euclidean_distance(movie_preferences,person1,person2): # as guided by the Programming Collective Intelligence book by Toby Segaran # This is an implementation of a Euclidean Distance function in python Euclidean Distance = sqrt( Sum of Squared Differences ) = sqrt ( (x A- x B) 2.Take the square root of the sum from step 3:.Sum of Squared Differences = (x A- x B) 2+(y A- y B) 2.Sum all the squared values from step 2:.Square the differences between each pair of attributes:.Find the differences between each pair of attributes:.For example, if there are two objects, A and B, with attributes x, y, and z, to determine the euclidean distance between the two one need only: This value is found by taking the root of the sum of squared differences between each of their attributes. Symmetry being the property that states that for all x and for all y the similarity of x and y must be the same as the similarity of y and x.Įuclidean Distance between a pair of objects refers to the metric distance between the objects. Depending on the similarity metric used the triangle inequality between objects may hold, but more generally the two properties that must be maintained for similarites is that the measure of similarity must fall within the range of 0 and 1 and symmetry. Similarity can also be seen as the numerical distance between multiple data objects that are typically represented as value between the range of 0 (not similar at all) and 1 (completely similar). Similarity can be roughly described as the measure of how much two or more objects are alike. Data Mining Portfolio Similarity Techniques
0 Comments
Leave a Reply. |