Feature on TimeSeriesKmean: DTW_BaryCenterAverage #268

Ninimama · 2020-06-18T23:49:07Z

Hi,

I opened an issue about this earlier but closed it, and decided to raise it here as a feature request instead.

I was wondering if you could modify `TimeSeriesKMeans` so that it accepts weights (as a callable) in its `metric_params`, to be used when computing `dtw_barycenter_averaging`.

So, something like `metric_params = {'weights': my_function}`, where `my_function` is called on the data points of a cluster.

That is, a function that takes a set of data points (observations), computes a weight vector from them, and returns it. This would give users the flexibility to define their own weight function and apply it throughout the clustering process.
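To make the proposed interface concrete, here is a minimal sketch of such a callable, assuming it receives the time series currently assigned to one cluster as an array of shape `(n_ts, sz, d)` and returns one non-negative weight per series. The name `inverse_distance_weights` and its rule (series closer to the cluster's pointwise mean get more weight) are hypothetical illustrations, not part of tslearn.

```python
import numpy as np


def inverse_distance_weights(cluster_members, eps=1e-8):
    """Hypothetical weight callable for the proposed metric_params['weights'].

    Takes the series of one cluster, shape (n_ts, sz, d), and returns a
    weight vector of length n_ts that sums to 1. Series closer (Euclidean
    distance) to the cluster's pointwise mean receive larger weights.
    """
    X = np.asarray(cluster_members, dtype=float)
    mean_series = X.mean(axis=0)                          # shape (sz, d)
    dists = np.sqrt(((X - mean_series) ** 2).sum(axis=(1, 2)))
    raw = 1.0 / (dists + eps)                             # eps avoids division by zero
    return raw / raw.sum()


members = np.array([[[0.0], [1.0], [2.0]],
                    [[0.0], [1.0], [2.5]],
                    [[3.0], [3.0], [3.0]]])               # last series is an outlier
w = inverse_distance_weights(members)
print(w.shape)  # (3,)
```

Here the outlying third series ends up with the smallest weight, so it would pull a weighted barycenter less than the other two.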

(In my problem, for instance, I modified the centroids of the FINAL RESULT and saw that this works better for me. If such a modification could be applied throughout the whole clustering process, not just to the final result, it might further improve the final clusters.)

Best,
Nima

GillesVandewiele · 2020-06-19T06:29:38Z

Not sure I understand the question completely, but there is a `sample_weight` argument in the `fit` method of `TimeSeriesKMeans`. Can't you precompute all weights with `my_function` before calling it and pass them to `fit`?

rtavenar · 2020-06-19T07:21:19Z

@GillesVandewiele I think we do have such a parameter for KernelKMeans but not for TimeSeriesKMeans. And I agree that this would be the correct way to implement it.

This should not be too difficult to implement since dtw_barycenter_averaging already accepts weights as input, so I guess it would be a matter of:

1. add the new `sample_weights` argument to `fit`;
2. see how this argument is pre-processed in `KernelKMeans` and do the same;
3. call the barycenter computation with the adequate weights;
4. use the weights for the inertia computation.
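The four steps above can be sketched in plain NumPy as follows. This is not tslearn's implementation: `dtw_path`, `check_sample_weight`, `weighted_dba` and `weighted_ts_kmeans` are hypothetical helpers written for illustration, assuming a squared-Euclidean local cost, strictly positive weights, a fixed number of DBA passes and a deterministic farthest-first initialization. In tslearn itself, step 3 would presumably be a call to `dtw_barycenter_averaging` with its `weights` argument set to the weights of the cluster members.

```python
import numpy as np


def dtw_path(a, b):
    """DTW between a (n, d) and b (m, d) with squared-Euclidean local cost.
    Returns the optimal alignment path (list of (i, j) pairs) and its total cost."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.sum((a[i - 1] - b[j - 1]) ** 2)
            acc[i, j] = local + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:  # backtrack along the cheapest predecessors
        path.append((i - 1, j - 1))
        move = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if move == 0:
            i, j = i - 1, j - 1
        elif move == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1], acc[n, m]


def check_sample_weight(sample_weight, n_samples):
    """Step 2 (simplified): None -> uniform; else a positive vector of length n_samples."""
    if sample_weight is None:
        return np.ones(n_samples)
    sw = np.asarray(sample_weight, dtype=float)
    if sw.shape != (n_samples,) or np.any(sw <= 0):
        raise ValueError("sample_weight must be a positive vector of length n_samples")
    return sw


def weighted_dba(X, weights, init, n_passes=5):
    """Step 3: weighted DBA. Each pass aligns every series to the barycenter and sets
    each barycenter point to the weighted mean of the series points aligned to it."""
    bary = init.copy()
    for _ in range(n_passes):
        num = np.zeros_like(bary)
        den = np.zeros(len(bary))
        for x, w in zip(X, weights):
            for i, j in dtw_path(x, bary)[0]:
                num[j] += w * x[i]
                den[j] += w
        bary = num / den[:, None]
    return bary


def weighted_ts_kmeans(X, n_clusters, sample_weight=None, max_iter=10):
    X = np.asarray(X, dtype=float)                     # shape (n_ts, sz, d)
    sw = check_sample_weight(sample_weight, len(X))    # steps 1-2
    # Deterministic farthest-first initialization (a simplification).
    centers = [X[0]]
    while len(centers) < n_clusters:
        d = [min(dtw_path(x, c)[1] for c in centers) for x in X]
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    labels = None
    for _ in range(max_iter):
        costs = np.array([[dtw_path(x, c)[1] for c in centers] for x in X])
        new_labels = costs.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for k in range(n_clusters):                    # step 3: weighted barycenters
            mask = labels == k
            if mask.any():
                centers[k] = weighted_dba(X[mask], sw[mask], centers[k])
    costs = np.array([[dtw_path(x, c)[1] for c in centers] for x in X])
    labels = costs.argmin(axis=1)
    inertia = float(np.sum(sw * costs[np.arange(len(X)), labels]))  # step 4
    return labels, centers, inertia


rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 0.1, (3, 8, 1)), rng.normal(5.0, 0.1, (3, 8, 1))])
labels, centers, inertia = weighted_ts_kmeans(X, 2, sample_weight=[1, 1, 2, 1, 1, 2])
```

Giving a series a larger weight pulls its cluster's barycenter toward it, and the same fixed weights scale its contribution to the inertia, which is what keeps the objective well defined across iterations.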

Hence I tag this one as a good first issue: anyone willing to work on this should feel free to open a PR.

Ninimama · 2020-06-19T15:56:19Z

@GillesVandewiele
@rtavenar

Thanks for the responses. I took a look at the `sample_weights` argument of the `KernelKMeans` fit method. As far as I understand, it only accepts a predefined weight vector.

However, in my case the weights change: the weights of the points in a cluster (used to compute its DBA) are calculated as a function of the points currently in that cluster, and the function returns an array whose length equals the cluster size.

So it would be nice if a function could be accepted as well.
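The difference between the two kinds of weights can be sketched with a single helper, assuming a hypothetical `weight_fn` argument that is not a tslearn parameter. With a fixed vector, a cluster's weights are just the entries of its members, so a given series always keeps the same weight; with a callable, weights are recomputed from whichever series are currently in the cluster, so the same series can get a different weight from one iteration to the next. The rule `share_of_peak` is also a made-up example.

```python
import numpy as np


def cluster_weights(X, labels, k, sample_weight=None, weight_fn=None):
    """Hypothetical helper: weights used to compute cluster k's barycenter.

    If weight_fn is given, it is called on the cluster's current members;
    otherwise the fixed sample_weight vector (uniform if None) is indexed.
    """
    members = X[labels == k]
    if weight_fn is not None:
        # Dynamic weights: recomputed from the current members.
        w = np.asarray(weight_fn(members), dtype=float)
        if w.shape != (len(members),):
            raise ValueError("weight_fn must return one weight per cluster member")
        return w
    if sample_weight is None:
        return np.ones(len(members))
    # Fixed weights: each series keeps its own entry whatever its cluster.
    return np.asarray(sample_weight, dtype=float)[labels == k]


def share_of_peak(members):
    """Hypothetical rule: weight = series peak / sum of peaks within the cluster."""
    peaks = members.max(axis=(1, 2))
    return peaks / peaks.sum()


X = np.array([[[1.0], [2.0]], [[3.0], [4.0]], [[5.0], [0.0]]])
fixed = cluster_weights(X, np.array([0, 0, 1]), 0, sample_weight=[0.2, 0.3, 0.5])
before = cluster_weights(X, np.array([0, 0, 1]), 0, weight_fn=share_of_peak)
after = cluster_weights(X, np.array([0, 1, 1]), 0, weight_fn=share_of_peak)
```

Series 0 gets weight 1/3 while it shares cluster 0 with series 1, and weight 1 once it is alone, even though the series itself did not change.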

rtavenar · 2020-06-19T16:20:23Z

@Ninimama

I understand your point, yet:

1. depending on the form of your weight computation function, I am not sure that the algorithm at stake in `TimeSeriesKMeans` would be guaranteed to converge

2. We will definitely stick to `scikit-learn` API in this case, and in `scikit-learn`, `sample_weights` is assumed to be a vector of fixed weights.

Ninimama · 2020-06-19T17:15:04Z

@rtavenar

I thought about the convergence problem, but I think that is something the user should worry about. If someone wants to use a weight function, they should show, either mathematically or experimentally, that the results are good and that the algorithm converges.
So wouldn't it be a good idea to offer the ability to play with weights? tslearn could warn the user that the algorithm might not converge, or when the maximum number of iterations is exceeded. Any opinion?
That said, I agree that using fixed sample_weights is a stable approach: there is no need to worry about non-convergence, and the results are reliable.
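A minimal sketch of the kind of safeguard suggested above, assuming a generic iterative procedure whose `step()` performs one assignment/update round and returns the current objective (for instance the weighted inertia). The function `iterate_with_checks` and its messages are hypothetical, not tslearn code; such checks only detect symptoms of non-convergence and do not make the algorithm converge.

```python
import warnings


def iterate_with_checks(step, max_iter=50, tol=1e-6):
    """Call step() until the objective stops decreasing by more than tol.

    Warns whenever the objective increases (which standard Euclidean k-means
    with fixed weights never does, but data-dependent weights might), and
    warns if max_iter is reached. Returns the list of objective values.
    """
    history = [step()]
    for _ in range(max_iter - 1):
        value = step()
        history.append(value)
        if value > history[-2] + tol:
            warnings.warn(f"objective increased from {history[-2]:.6g} to {value:.6g}",
                          RuntimeWarning)
        elif history[-2] - value <= tol:
            return history  # converged: no meaningful decrease
    warnings.warn(f"no convergence after {max_iter} iterations", RuntimeWarning)
    return history


values = iter([10.0, 5.0, 5.0])
print(iterate_with_checks(lambda: next(values)))  # [10.0, 5.0, 5.0]
```

An objective that oscillates, as it could if the weights are recomputed from cluster memberships at each step, would trigger the "increased" warning and eventually the iteration-limit warning.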

In the end, you are the experts here, so you definitely know better than me. My field is electrical engineering (power systems) and I am a newbie in this area.

Thanks again for your responses.

GillesVandewiele · 2020-06-20T06:24:04Z

@Ninimama

> I understand your point, yet:
>
> 1. depending on the form of your weight computation function, I am not sure that the algorithm at stake in `TimeSeriesKMeans` would be guaranteed to converge
>
> 2. We will definitely stick to `scikit-learn` API in this case, and in `scikit-learn`, `sample_weights` is assumed to be a vector of fixed weights.

I agree! It should be noted, though, that there are some exceptions: for example, KNN accepts a string for the `weights` parameter (`'uniform'`, or `'distance'` for weights based on the distances), and it can be a callable as well. A `sample_weight`, on the other hand, is indeed a vector of weights passed to the `fit` method.
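To illustrate the KNN case: in scikit-learn's `KNeighborsClassifier`, a callable `weights` receives an array of neighbor distances and returns an array of the same shape, and those weights only scale the neighbors' votes when predicting. The sketch below is plain NumPy; `gaussian_weights` and `weighted_vote` are hypothetical names, and the vote function mimics the idea rather than scikit-learn's internal code.

```python
import numpy as np


def gaussian_weights(distances, bandwidth=1.0):
    """Callable of the form KNN's `weights` expects: same-shape array of
    weights, larger for closer neighbors."""
    return np.exp(-(np.asarray(distances) / bandwidth) ** 2)


def weighted_vote(neighbor_labels, distances, weight_fn):
    """Predict-time only: sum the weights of each label among the k neighbors."""
    w = weight_fn(distances)
    labels = np.unique(neighbor_labels)
    scores = [w[neighbor_labels == lab].sum() for lab in labels]
    return labels[int(np.argmax(scores))]


# Three neighbors of a query point: one very close "a", two far "b"s.
neighbor_labels = np.array(["a", "b", "b"])
distances = np.array([0.1, 2.0, 2.5])
print(weighted_vote(neighbor_labels, distances, gaussian_weights))  # a
```

With uniform weights the same neighbors would vote "b". Since the weighting happens only in this voting step, nothing at fit time depends on it.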

rtavenar · 2020-06-20T08:00:31Z

But for KNN, weights are only used at predict time; they are not involved in any fit-time optimization.

Once again, I feel that this could definitely break convergence, which is not desirable behavior.

Ninimama added the new feature label Jun 18, 2020

rtavenar added the good first issue label Jun 19, 2020
