Getting Started
Please reference our project repository to view the codebase for our 2024-2025 UC San Diego Data Science Capstone Project: An Adversarial Framework for Mitigating Gender Bias in Coronary Heart Disease Prediction.
1. Cloning Code
- Clone the repository into your working directory:
git clone https://github.com/patsals/CHD-adversarial-nn.git
- Enter the repository directory:
cd CHD-adversarial-nn
2.a. Setting up Environment (Windows Users)
- Install Windows Subsystem for Linux (WSL) from PowerShell:
wsl --install
- Install the Linux distribution from the Microsoft Store:
Ubuntu 22.04.5 LTS
- Open Ubuntu and wait for the installation to complete
- Once the installation is complete, enter a username and password as prompted
- Open VS Code, then click the two-caret (><) icon in the bottom-left corner of the window
- Select “Connect to WSL Using Distro…”
- You should see
Ubuntu 22.04.5 LTS
in the list; select that option
- Proceed to the 2.b instructions
2.b. Setting up Environment (Linux/macOS Users)
- Note: TensorFlow may not yet support the latest version of Python. This project was developed with Python 3.11.
- Create a working environment:
python3 -m venv venv
- Activate the working environment:
source venv/bin/activate
- Install the dependencies:
pip install -r requirements.txt
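A quick sanity check after setup, offered as a minimal sketch rather than as part of the repository: the snippet below reports whether the active interpreter matches the Python 3.11 mentioned in the note above and whether a virtual environment is active. The names EXPECTED_VERSION, python_version_ok, and in_virtualenv are hypothetical helpers introduced here for illustration.

```python
import sys

# Assumption: 3.11 is the version named in the note above; adjust if the project changes.
EXPECTED_VERSION = (3, 11)


def python_version_ok(version_info=sys.version_info, expected=EXPECTED_VERSION):
    """Return True when the interpreter's major.minor equals the expected version."""
    return tuple(version_info[:2]) == tuple(expected)


def in_virtualenv(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True when running inside a venv (sys.prefix differs from sys.base_prefix)."""
    return prefix != base_prefix


if __name__ == "__main__":
    print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))
    print("Matches expected version:", python_version_ok())
    print("Virtual environment active:", in_virtualenv())
```

Running this with the environment activated should report that a virtual environment is active; if the version does not match, recreating the environment with a Python 3.11 interpreter is one option.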
3. Downloading & Processing Data
- Run the data setup script to extract, transform, and load all of the data locally:
python data_setup.py
for Windows users
python3 data_setup.py
for Linux/macOS users
- Adding the
--skip_download
flag skips requesting/downloading the files
- Run the data processing script to handle null/missing values:
python data_process.py
for Windows users
python3 data_process.py
for Linux/macOS users
- Note: The setup script makes requests to multiple URL endpoints at www.sciencedirect.com to download the files. Look out for any log message that does not show
[SUCCESS]
in its output; such a message indicates an error that needs to be resolved.
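One way to spot failed downloads, sketched under assumptions: the helper below returns every non-empty log line that lacks the [SUCCESS] marker. The function name lines_missing_success and the sample log lines are hypothetical. The actual log format of data_setup.py may differ, and the sketch assumes every line the script prints is a status line.

```python
def lines_missing_success(log_lines, marker="[SUCCESS]"):
    """Return the non-empty log lines that do not contain the marker."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if line.strip() and marker not in line
    ]


# Hypothetical sample output, for illustration only; not real logs from data_setup.py.
sample_log = [
    "[SUCCESS] downloaded file A\n",
    "[ERROR] could not download file B\n",
    "\n",
    "[SUCCESS] downloaded file C\n",
]

flagged = lines_missing_success(sample_log)
for line in flagged:
    print("needs attention:", line)
```

To check real output, the lines could come from a saved log file, for example by passing an open file object to the function in place of the sample list.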