GitHub

Dependencies

The code has been tested on Python 3.12. To install the dependencies, run the following command:

pip install -r requirements.txt

Generating datasets

To generate the datasets, run the following command:

# addition
python -m arithmetic_lm.dataset.generate_addition

which will generate the datasets in the data/ directory in the project root.

Training

The training script uses Hydra and can be run as follows:

python -m arithmetic_lm.train +experiment=<experiment_name>

e.g.

# first, generate the datasets if you haven't already using the command above
# to train NanoGPT on 1-3 digit addition task
python -m arithmetic_lm.train +experiment=1/exp1_nanogpt

The parameters can be overridden by using CLI arguments, using the dotted key of the parameter in the YAML config file as such:

# override the batch size to 128
python -m arithmetic_lm.train +experiment=1/exp1_nanogpt training.batch_size=128

# another useful example: use only specified GPUs (id 0 and 3 as seen in nvidia-smi output)
python -m arithmetic_lm.train +experiment=1/exp1_nanogpt training.devices=[0,3]

Note that the key and value are separated by an = sign, and no spaces are used.

Run the training script with --help to see all the available configuration options:
python -m arithmetic_lm.train --help

Multiple experiments can be run using Hydra's multirun feature. E.g.

python -m arithmetic_lm.train -m +experiment=6/exp6_ut,6/exp6_transformer model.args.n_head=1,2,3 model.args.n_embd=96

will launch 6 experiments (serially by default) with all combinations.

For experiment descriptions, see the experiments doc.

Downloading models

From Weights & Biases

Use the scripts/download_model.py script.

Name		Name	Last commit message	Last commit date
Latest commit History 860 Commits
.github/workflows		.github/workflows
.vscode		.vscode
arithmetic_lm		arithmetic_lm
docs		docs
latex		latex
notebooks		notebooks
plots		plots
scripts		scripts
.gitignore		.gitignore
README.md		README.md
figure.mplstyle		figure.mplstyle
mypy.ini		mypy.ini
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Dependencies

Generating datasets

Training

Downloading models

From Weights & Biases

About

Uh oh!

Releases

Packages

Uh oh!

Languages

iibrahimli/msc_thesis

Folders and files

Latest commit

History

Repository files navigation

Dependencies

Generating datasets

Training

Downloading models

From Weights & Biases

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages