SweepableEstimator SearchSpace not being fully explored #7085

Open
fwaris opened this issue Mar 20, 2024 · 1 comment
Labels
untriaged New issue has not been triaged

Comments

fwaris commented Mar 20, 2024

System Information (please complete the following information):

  • OS & Version: Windows 10
  • ML.NET Version: 0.21.1
  • .NET Version: .NET 8.0

Describe the bug
The SearchSpace of a SweepableEstimator is not being fully explored.
I have a SweepableEstimator whose search space covers 'k', the number of clusters for KMeans.
The range is Min=3, Max=20, Default=10 (uniform int).
I log the selected k each time the SweepableEstimator is called.
The logs show that k hovers around the default value (i.e. 8, 9, 10, 11).
The rest of the range is not explored.

The script that showcases this problem is here

https://github.com/fwaris/MLNetGEOpt/blob/master/MLNetGEOpt/scripts/custering.fsx

Expected behavior
The search space should be explored more fully, i.e. across the whole 3 to 20 range rather than only near the default.

Screenshots, Code, Sample Projects
Project: https://github.com/fwaris/MLNetGEOpt

Additional context

Background

The referenced project, 'MLNetGEOpt', is an AutoML layer that sits above the AutoML of ML.NET.

AutoML finds optimal parameters given a SweepablePipeline.

MLNetGEOpt proposes new SweepablePipelines for AutoML to optimize.

It uses a method called "Grammatical Evolution" (GE). The pipelines are constructed according to a given 'grammar'. Each pipeline is a valid 'sentence' constructed from the grammar.

The grammar ensures that the pipelines are reasonable. This greatly reduces the search space compared to randomly constructed pipelines (say, via a Genetic Algorithm).

Note: I solved for the optimal number of clusters by building a grammar that allows one-of-many SweepableEstimators, each tied to a particular k.

Here is an example of the grammar (prefix 'se' stands for SweepableEstimator; 'Alt'=select 1 from available options; 'Opt'=optional term):

let g = 
    [
        Estimator seBase
        Opt(Estimator (E.Def.seFtrSelCount 3))        
        Alt [
            Alt ([(1,10); (11,20); (21,30); (31,100)] |> List.map(E.Def.seNorm>>Estimator))
            Estimator E.Def.seNormLpNorm
            Estimator E.Def.seNormLogMeanVar
            Estimator E.Def.seNormMeanVar
            Alt([0.1f .. 0.5f .. 4.0f] |> List.pairwise |> List.map(fun (a,b) -> a, b - 0.001f)  |> List.map(E.Def.seGlobalContrast>>Estimator))
            Estimator E.Def.seNormMinMax
            Estimator E.Def.seNormRobustScaling
        ]
        Alt [for i in 3 .. 20 -> Estimator (seCluster i)]  // this works 
        //Estimator seClusterWithSS                        // this does not work
    ]

For reference, a specific grammar can be constructed from this simple 'meta-grammar':

type Term = 
    | Opt of Term 
    | Pipeline of (unit -> SweepablePipeline)
    | Estimator of (unit -> SweepableEstimator)
    | Alt of Term list
    | Union of Term list
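
To give a rough idea of how large the space of 'sentences' is for the example grammar above, here is a minimal, self-contained sketch. It makes several simplifying assumptions: estimators are represented by plain strings instead of `unit -> SweepableEstimator` thunks, `Pipeline` and `Union` are left out because their semantics are not described here, and the helper names `choices` and `sentences` are made up for illustration and are not part of MLNetGEOpt.

```fsharp
// Simplified stand-in for the meta-grammar: an estimator is just a name here.
type Term =
    | Opt of Term
    | Estimator of string
    | Alt of Term list

// Number of distinct choices a single term contributes.
let rec choices term =
    match term with
    | Estimator _ -> 1
    | Opt t -> 1 + choices t              // the term is either absent or present
    | Alt ts -> ts |> List.sumBy choices  // exactly one alternative is selected

// A grammar is a sequence of terms, so the per-term counts multiply.
let sentences (grammar: Term list) =
    grammar |> List.fold (fun acc t -> acc * choices t) 1

// The same range expressions as in the grammar above.
let normRanges = [(1,10); (11,20); (21,30); (31,100)]
let contrastRanges =
    [0.1f .. 0.5f .. 4.0f]
    |> List.pairwise
    |> List.map (fun (a, b) -> a, b - 0.001f)

let g =
    [
        Estimator "seBase"
        Opt (Estimator "seFtrSelCount 3")
        Alt [
            Alt (normRanges |> List.map (fun r -> Estimator (sprintf "seNorm %A" r)))
            Estimator "seNormLpNorm"
            Estimator "seNormLogMeanVar"
            Estimator "seNormMeanVar"
            Alt (contrastRanges |> List.map (fun r -> Estimator (sprintf "seGlobalContrast %A" r)))
            Estimator "seNormMinMax"
            Estimator "seNormRobustScaling"
        ]
        Alt [for i in 3 .. 20 -> Estimator (sprintf "seCluster %d" i)]
    ]

printfn "alternatives for k: %d" (choices (List.last g))
printfn "distinct pipelines: %d" (sentences g)
```

The last term is the workaround mentioned in the note: each value of k becomes its own alternative, so the choice of k is made by the grammar rather than by the tuner.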
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged label Mar 20, 2024

fwaris commented Mar 27, 2024

I just realized that the default tuner is the 'eci cost frugal tuner', which may be searching more narrowly.

I will try another tuner to see whether the search space is explored more fully.
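
For reference, a minimal sketch of what switching tuners might look like in an F# script. It assumes the `Microsoft.ML.AutoML` package and its `SetRandomSearchTuner` extension method on `AutoMLExperiment`; the pipeline, dataset, and metric setup are omitted, and this has not been run against the script linked above.

```fsharp
#r "nuget: Microsoft.ML.AutoML"

open Microsoft.ML
open Microsoft.ML.AutoML

let ctx = MLContext()

// Assumption: the rest of the experiment (pipeline, dataset, metric,
// time budget) is configured the same way as in the existing script.
let experiment = ctx.Auto().CreateExperiment()

// Swap the default tuner for random search, which is expected to sample
// k across the declared range instead of starting near the default.
experiment.SetRandomSearchTuner() |> ignore
```

If the logged k values then spread across 3 to 20, that would point to the tuner rather than the SearchSpace definition as the cause.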
