The Wayback Machine - http://web.archive.org/web/20200821015612/https://github.com/tslearn-team/tslearn/issues/268
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature on TimeSeriesKmean: DTW_BaryCenterAverage #268

Open
Ninimama opened this issue Jun 18, 2020 · 7 comments
Open

Feature on TimeSeriesKmean: DTW_BaryCenterAverage #268

Ninimama opened this issue Jun 18, 2020 · 7 comments

Comments

@Ninimama
Copy link

@Ninimama Ninimama commented Jun 18, 2020

Hi,

I opened an issue before but closed it later and decided to say it here in "Feature Request."

I was wondering if you could modify the TimeSeriesKmean function such that it can accept weight (as a callable function) in its metric_params for calculating the dtw_barycenteraveraging.

So:
metric_params = {'weights: ', my_function(data_points)}

So, it is a function that gets a set of data points (observations) and based on that calculates a weight vector and returns it. It gives the flexibility to the user to define a weight function and apply it throughout the clustering process.

(In my problem, for instance, I modified the centroid of the FINAL RESULT and see that it works better for me. However, if such modification can be applied throughout the whole clustering process (and just the final result), it might better enhance the final clusters and result.)

Best,
Nima

@GillesVandewiele
Copy link
Contributor

@GillesVandewiele GillesVandewiele commented Jun 19, 2020

Not sure if I understand the question completely, but there is a sample_weight argument in the fit function of TimeSeriesKmean. Can't you precompute all weights with my_function before calling that and pass it to fit?

@rtavenar
Copy link
Member

@rtavenar rtavenar commented Jun 19, 2020

@GillesVandewiele I think we do have such a parameter for KernelKMeans but not for TimeSeriesKMeans. And I agree that this would be the correct way to implement it.

This should not be too difficult to implement since dtw_barycenter_averaging already accepts weights as input, so I guess it would be a matter of:

  1. add the new sample_weights argument to fit
  2. see in KernelKMeans how this argument is pre-processed and do the same
  3. call the barycenters with adequate weights
  4. use weights for inertia computation

Hence I tag this one as a good first issue: anyone willing to work on this should feel free to open a PR.

@Ninimama
Copy link
Author

@Ninimama Ninimama commented Jun 19, 2020

@GillesVandewiele
@rtavenar

Thanks for the response. I took a look at the argument sample_weights for KernelKmeans fit method. According to my understanding, it seems it only accepts a pre-defined vector for weight.

However, in my case, the weights are changing. In other words, the weights of points in a cluster (to calculate its DBA) are calculated as a function of those points in that cluster and return an array with a length equal to its cluster size.

So, it would be nice if it can accept a function as well.

@rtavenar
Copy link
Member

@rtavenar rtavenar commented Jun 19, 2020

@Ninimama

I understand your point, yet:

  1. depending on the form of your weight computation function, I am not sure that the algorithm at stake in TimeSeriesKMeans would be guaranteed to converge
  2. We will definitely stick to scikit-learn API in this case, and in scikit-learn, sample_weights is assumed to be a vector of fixed weights.
@Ninimama
Copy link
Author

@Ninimama Ninimama commented Jun 19, 2020

@rtavenar

  1. I thought about the convergence problem. However, I think that is what a user should be worried about. So, if someone wants to employ weight function, they should either mathematically or by experiment show that the results are good and the problem can be converged.
    So, wouldn't it be a good idea to have such an ability that one can play with weights? The tslearn package can give a warning to the user that the problem might not get converged or if the number of iteration exceeds. Any opinion?

  2. Yes. I agree that using fixed sample_weights is a stable approach without being worried about the non-convergence error and make sure the result is reliable.

In the end, you are the expert here. So, you definitely know better than me. My field is in electrical engineering (power system) and I am a newbie in this area.

Thanks again for your responses.

@GillesVandewiele
Copy link
Contributor

@GillesVandewiele GillesVandewiele commented Jun 20, 2020

@Ninimama

I understand your point, yet:

1. depending on the form of your weight computation function, I am not sure that the algorithm at stake in `TimeSeriesKMeans` would be guaranteed to converge

2. We will definitely stick to `scikit-learn` API in this case, and in `scikit-learn`, `sample_weights` is assumed to be a vector of fixed weights.

I agree! Although it should be noted that there are some exceptions to this, e.g. the KNN can accept a string for the weights parameter (uniform or based on the distances). It can be a callable as well. While a sample_weight is indeed a vector of weights passed during the fit method.

@rtavenar
Copy link
Member

@rtavenar rtavenar commented Jun 20, 2020

But for knn, weights are just used at predict time, they are not involved in any fit time optimization.

Once again I feel that this could definitely break convergence which is not a desirable behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.