A causal discovery toolkit. The algorithms are ported from their original MATLAB implementations to Python so that non-specialists can use them easily.

dario-github/causal-discovery

Index

1. Introduction

2. Usage

Installation

  Installing via pip

  GPU Support

Quick Start

  Command Line Usage

  Explanation of Output

Performance

Parameter Description

  Simplified Command Line Version

  Complete Parameter Configuration

   Local_NG_CD

3. Development

Environment Setup

  Creating a Virtual Environment

Building Documentation

Method of Invocation

  Python

  Command Line (see above for parameter details)

The causal discovery algorithm toolkit currently includes:

  • local_ng_cd: see docs/algo/Local_NG_CD.doc for details

Note that:

  • local_ng_cd is a linear model that does not distinguish between discrete and continuous data, and treats them uniformly as continuous values.

We recommend trying local_ng_cd on your dataset first: it is the fastest and most recent algorithm in the toolkit, its results are asymptotically correct, and it accounts for unknown confounders.

See the following text for detailed usage instructions.

python3.7 -m pip install causal-discovery

For GPU support, check your CUDA version manually and install the matching CuPy build. If CuPy is not installed, NumPy is used for CPU computation by default.

# Check the supported CUDA version
ls /usr/local/ | grep cuda

# Install the corresponding version of CuPy, for example, CUDA 10.0
python3.7 -m poetry add cupy-cuda100
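
The CuPy-or-NumPy fallback described above can be pictured with the following minimal sketch. It is an illustration only, not the toolkit's actual import logic, and the helper name `pick_backend` is hypothetical.

```python
import importlib


def pick_backend(candidates=("cupy", "numpy")):
    """Return the first importable module from `candidates`, or None.

    Hypothetical helper: it only illustrates the "use CuPy if present,
    otherwise NumPy" idea and is not part of causal-discovery.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # this backend is unavailable, try the next one
    return None


xp = pick_backend()
print("array backend:", xp.__name__ if xp is not None else "none installed")
```

Code written against the shared subset of the two APIs (for example `xp.linalg.inv`) can then run on either backend.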

# Show the help text for each subcommand
python3.7 -m causal_discovery fast-simul-data --help
python3.7 -m causal_discovery run-local-ng-cd --help

# Example of parameters for generating simulated data
python3.7 -m causal_discovery fast-simul-data --cov-matrix '[[0, 1, 0], [0, 0, 0.5], [1, 0, 0]]' --sample-size 10

# Generate the default simulated dataset (the first row holds the variable names; each following row is one sample)
python3.7 -m causal_discovery fast-simul-data

# Run local_ng_cd on the default simulated dataset (target variable 3, matrixT layout)
python3.7 -m causal_discovery run-local-ng-cd simul_data.csv 3 matrixT
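
To make the expected file layout concrete, here is a small sketch that builds a CSV with one header row of variable names and one row per sample. The chain structure, the coefficients, and the uniform noise are all illustrative assumptions; they are not the defaults of `fast-simul-data`, and the real file may differ in details such as an index column.

```python
import csv
import io
import random


def simulate_rows(n_samples, seed=0):
    """Draw samples from a toy linear chain x1 -> x2 -> x3.

    The coefficients (0.8, 0.5) and the uniform noise are made up for
    illustration only.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        x1 = rng.uniform(-1, 1)
        x2 = 0.8 * x1 + rng.uniform(-1, 1)
        x3 = 0.5 * x2 + rng.uniform(-1, 1)
        rows.append([x1, x2, x3])
    return rows


def to_csv_text(rows, names=("1", "2", "3")):
    """Serialise rows with a header line of variable names (samples as rows)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(names)
    writer.writerows(rows)
    return buf.getvalue()


text = to_csv_text(simulate_rows(10))
print(text.splitlines()[0])  # header line with the variable names
```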

The last line of the console log is the path where the calculation result is saved. If the 'output' directory is not specified, it defaults to the current directory.

After calling local_ng_cd with the simulation dataset simul_data.csv, the result is divided into two files:

  1. Trusted edges, edges_trust.json. A trusted edge is a direct (1-hop) path from a cause to an effect.

    • Three columns: cause, effect, and causal effect strength.

    • The larger the absolute value of the strength, the stronger the direct causal relationship. The sign indicates whether the effect is positive or negative.

cause   effect  strength
2       3       0.7705689874891608
1       3       0.5863603810291644
5       1       0.0993025854935757
3       4       0.5015018174923119
3       5       0.7071753114627015
6       5       0.6977965771255858

  2. Composite weights, synthesize_effect.json. The composite weight sums the contributions of all directed paths from the cause to the effect. The n-hop contribution can be computed from the n-th power of the adjacency matrix B.

    • Three columns: cause, effect, and composite causal effect strength (within 5 hops).

cause   effect  strength
2       3       0.7700866938213671
1       3       0.6950546424688089
3       3       0.34082384182310194
5       3       -0.19710467189008646
4       3       0.06902072305646559
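
The relationship between the adjacency matrix B and the composite weight can be sketched in a few lines. This is a toy illustration with made-up weights, assuming that `B[i][j]` holds the weight of the direct edge from i to j; the toolkit's own orientation and implementation may differ, and the function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def composite_effect(B, max_hops=5):
    """Sum B + B^2 + ... + B^max_hops.

    Assumed convention: B[i][j] is the weight of the direct edge i -> j, so
    entry (i, j) of B^n adds up the products of weights along every n-hop
    path from i to j.
    """
    n = len(B)
    total = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in B]  # starts as B^1
    for _ in range(max_hops):
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
        power = matmul(power, B)  # next power of B
    return total


# Toy graph with made-up weights: 0 -> 1 (0.5), 1 -> 2 (0.4), 0 -> 2 (0.1)
B = [
    [0.0, 0.5, 0.1],
    [0.0, 0.0, 0.4],
    [0.0, 0.0, 0.0],
]
effect = composite_effect(B)
print(round(effect[0][2], 6))  # 0.3: direct 0.1 plus two-hop 0.5 * 0.4
```

In this toy graph the composite weight from 0 to 2 exceeds the direct weight because the indirect path through 1 also contributes.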

We recommend the NumPy build distributed by conda. It ships with Intel's MKL, which greatly speeds up matrix operations (matrix inversion is about 50 times faster).

Performance comparison of numpy, cupy, and torch for inverting a 500 x 500 random matrix

Function            mean      std
numpy.linalg.inv    71.8 ms   ± 64.9 ms
cupy.linalg.inv     1.39 ms   ± 41.5 µs
torch.inverse       6.02 ms   ± 6.26 µs
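
A measurement of this kind can be reproduced with a small timing loop. The sketch below is an assumption about how such numbers might be collected, not the script used for the table above; `time_call` is a hypothetical helper, and the NumPy part is skipped if NumPy is not installed.

```python
import statistics
import time


def time_call(fn, repeat=20):
    """Return (mean, stdev) of the wall-clock time of fn() in milliseconds."""
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(samples), statistics.stdev(samples)


try:
    import numpy as np
except ImportError:  # NumPy not installed: nothing to measure
    np = None

if np is not None:
    matrix = np.random.rand(500, 500)
    mean_ms, std_ms = time_call(lambda: np.linalg.inv(matrix))
    print(f"numpy.linalg.inv: {mean_ms:.2f} ms ± {std_ms:.2f} ms")
```

When timing a GPU library such as CuPy this way, the device should be synchronised before the timer stops (for example with `cupy.cuda.Stream.null.synchronize()`), because GPU work is launched asynchronously.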
Usage: __main__.py [OPTIONS] INPUT_FILE TARGET
                   DATA_TYPE:[triple|matrix|matrixT]

  [Causal Discovery Algorithm: Local-NG-CD, Author: Kun Zhang, Year: 2020]
  
Args:
    input_file (str): [Input file address in csv format]
    target (str): [Name of the target variable]
    data_type (DataType): [Data type: triple (triplet [sample index, variable name, value]), 
                           matrix (matrix, row index as variable name, column index as sample index),
                           matrixT (matrix, row index as sample index, column index as variable name)]
    sep (str, optional): [Csv delimiter]. Defaults to ",".
    index_col (str, optional): [Index index for reading csv]. Defaults to None.
    header (str, optional): [Header index for reading csv]. Defaults to None.
    output_dir (str, optional): [Output directory]. Defaults to "./output".
    log_root (str, optional): [Log directory]. Defaults to "./logs".
    verbose (bool, optional): [Whether to print logs to the console]. Defaults to True.
    candidate_two_step (bool, optional): [Whether to enable 2-step relationship filtering]. Defaults to False.

Raises:
    DataTypeError: [Data type error]

Arguments:
  INPUT_FILE                      [required]
  TARGET                          [required]
  DATA_TYPE:[triple|matrix|matrixT]
                                  [required]

Options:
  --sep TEXT                      [default: ,]
  --index-col TEXT
  --header INTEGER
  --output-dir TEXT               [default: ./output]
  --log-root TEXT                 [default: ./logs]
  --verbose / --no-verbose        [default: True]
  --candidate-two-step / --no-candidate-two-step
                                  [default: False]
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.

  --help                          Show this message and exit.
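
The three DATA_TYPE layouts describe the same data in different shapes. The following sketch shows how they relate, using hypothetical helper names and a tiny made-up dataset; it does not reproduce the toolkit's own loading code.

```python
def triples_to_matrix_t(triples):
    """Pivot (sample, variable, value) triples into a matrixT-style table.

    Returns (variable_names, rows): one row per sample, one column per
    variable, with both axes sorted. Missing cells are filled with None.
    """
    samples = sorted({s for s, _, _ in triples})
    variables = sorted({v for _, v, _ in triples})
    cells = {(s, v): value for s, v, value in triples}
    rows = [[cells.get((s, v)) for v in variables] for s in samples]
    return variables, rows


def transpose(rows):
    """Swap rows and columns, turning a matrixT table into a matrix one."""
    return [list(col) for col in zip(*rows)]


# triple layout: [sample index, variable name, value]
triples = [
    (0, "a", 1.0), (0, "b", 2.0),
    (1, "a", 3.0), (1, "b", 4.0),
    (2, "a", 5.0), (2, "b", 6.0),
]
names, matrix_t = triples_to_matrix_t(triples)
print(names)                # ['a', 'b']
print(matrix_t)             # [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(transpose(matrix_t))  # [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
```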

# Import the parameter class
from causal_discovery.parameter.algo import LocalNgCdParam

# Parameter Details
target_index: int = Field(0, ge=0)               # Index of the target variable (default 0); rarely needs to be changed.
candidate_two_step: bool = True                  # Whether to enable 2-step correlation filtering, which brings more candidate variables into the analysis.
alpha: float = Field(5e-2, ge=0, le=1)           # Significance level (p-value threshold) used in correlation filtering. The smaller the value, the more stringent. 0.05 and 0.01 correspond to the 95% and 99% confidence levels.
mb_beta_threshold: float = Field(5e-2, ge=0)     # A threshold used to determine whether the edge is undirected when obtaining factor weights using ALasso regression. The larger the value, the more stringent.
ica_regu: float = Field(1e-3, gt=0)              # A penalty term used to constrain the sparsity when using ICA. The smaller the value, the sparser the resulting graph.
b_orig_trust_value: float = Field(5e-2, gt=0)    # A weight threshold used for further filtering after obtaining the adjacency matrix B. The default value is 0.05, and the larger the value, the more stringent.
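
As an illustration of how a weight threshold such as b_orig_trust_value affects the result, the sketch below zeroes out weak entries of a made-up adjacency matrix. It assumes that weights are compared by absolute value; the toolkit's exact rule may differ, and `filter_edges` is a hypothetical name.

```python
def filter_edges(B, trust_value=0.05):
    """Zero out entries of B whose absolute weight is below trust_value.

    Assumption: weights are compared by absolute value and entries below the
    threshold are dropped; the toolkit's exact rule may differ.
    """
    return [[w if abs(w) >= trust_value else 0.0 for w in row] for row in B]


# Made-up adjacency matrix for illustration
B = [
    [0.00, 0.58, 0.03],
    [-0.20, 0.00, 0.04],
    [0.01, -0.02, 0.00],
]
print(filter_edges(B))       # default threshold keeps 0.58 and -0.2
print(filter_edges(B, 0.3))  # stricter threshold keeps only 0.58
```

Raising the threshold removes more edges, which matches the description "the larger the value, the more stringent".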

# Requires Python >= 3.7
cd $PROJECT_DIR
python3.7 -m pip install -U pip setuptools
python3.7 -m pip install poetry
python3.7 -m poetry install

poetry install --extras doc
invoke doc

# Algorithm Main Function
from causal_discovery.algorithm import local_ng_cd, fges_mb, mab_lingam  

# Parameter Class
from causal_discovery.parameter.algo import LocalNgCdParam, FgesMbParam, MabLingamParam

# View the parameter descriptions
python3.7 -m causal_discovery run-local-ng-cd --help

# Calling Example
python3.7 -m causal_discovery run-local-ng-cd simul_data.csv 3 matrixT
