Closed
Labels
bug: Something isn't working; leo: standalone CLI binary; p0-critical: Max priority (ASAP)
Description
It looks like leo delete with --output and --workdir doesn't sync changes back.
example main.tf (uses: https://github.com/iterative/magnetic-tiles-defect):
terraform {
  required_providers { iterative = { source = "iterative/iterative" } }
}
provider "iterative" {}
resource "iterative_task" "gpu-runner" {
  cloud = "aws"
  machine = "m+t4"
  timeout = 7200 # 2 hrs
  region = "us-west-1"
  image = "nvidia"
  disk_size = 100
  permission_set = "arn:aws:iam::342840881361:instance-profile/tpi-vscode-example"
  storage {
    workdir = "."
    output = "."
  }
  environment = {
    "REPO_TOKEN" = ""
  }
  script = <<-END
    #!/bin/bash
    # setup project requirements
    apt-get update
    apt-get install python3.9 -y
    nvidia-smi
    # Run project
    pipenv install --skip-lock
    pipenv run dvc pull
    pipenv run dvc repro --force
    git status
  END
}
executing commands
#!/bin/bash
leo_id=$(leo create \
  --cloud aws \
  --region us-west-1)
echo "id: $leo_id"
./leo read \
  --cloud aws \
  --region us-west-1 \
  --follow "$leo_id"
leo delete \
  --cloud aws \
  --region us-west-1 \
  --workdir . \
  --output . \
  "$leo_id"
logs
$ ./tpi-run.sh
INFO Using identifier tpi-absolutely-square-skunk-49ay0tcn-92xo2t3b
INFO Creating resources...
INFO [1/12] Parsing PermissionSet...
INFO [2/12] Importing DefaultVPC...
INFO [3/12] Importing DefaultVPCSubnets...
INFO [4/12] Reading Image...
INFO [5/12] Creating Bucket...
INFO [6/12] Creating SecurityGroup...
INFO [7/12] Creating KeyPair...
INFO [8/12] Reading Credentials...
INFO [9/12] Creating LaunchTemplate...
INFO [10/12] Creating AutoScalingGroup...
INFO [11/12] Uploading Directory...
INFO Transferring 99.55MB (853 files)...
INFO 9.079 MiB / 94.938 MiB, 10%, 929.045 KiB/s, ETA 1m34s (xfr#153/853)
INFO 19.638 MiB / 94.938 MiB, 21%, 1018.368 KiB/s, ETA 1m15s (xfr#296/853)
INFO 26.135 MiB / 94.938 MiB, 28%, 850.263 KiB/s, ETA 1m22s (xfr#390/853)
INFO 34.420 MiB / 94.938 MiB, 36%, 841.356 KiB/s, ETA 1m13s (xfr#497/853)
INFO 44.787 MiB / 94.938 MiB, 47%, 933.029 KiB/s, ETA 55s (xfr#635/853)
INFO 58.284 MiB / 94.939 MiB, 61%, 1.113 MiB/s, ETA 32s (xfr#756/853)
INFO 71.340 MiB / 94.939 MiB, 75%, 1.206 MiB/s, ETA 19s (xfr#844/853)
INFO 84.857 MiB / 94.939 MiB, 89%, 1.283 MiB/s, ETA 7s
INFO [12/12] Starting task...
INFO Creation completed
id: tpi-absolutely-square-skunk-49ay0tcn-92xo2t3b
INFO Reading resources... (this may happen several times)
INFO [1/9] Reading DefaultVPC...
INFO [2/9] Reading DefaultVPCSubnets...
INFO [3/9] Reading Image...
INFO [4/9] Reading Bucket...
INFO [5/9] Reading SecurityGroup...
INFO [6/9] Reading KeyPair...
INFO [7/9] Reading Credentials...
INFO [8/9] Reading LaunchTemplate...
INFO [9/9] Reading AutoScalingGroup...
INFO Read completed
Waiting for instance......................
Started tpi-task.service.
Get:1 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal InRelease [265 kB]
Get:2 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:3 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Hit:4 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64 InRelease
Hit:5 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64 InRelease
Hit:6 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 InRelease
Get:7 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:8 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/main amd64 Packages [970 kB]
Get:9 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/main Translation-en [506 kB]
Get:10 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/main amd64 c-n-f Metadata [29.5 kB]
Get:11 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [22.0 kB]
Get:12 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/restricted Translation-en [6212 B]
Get:13 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/restricted amd64 c-n-f Metadata [392 B]
Get:14 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/universe amd64 Packages [8628 kB]
Get:15 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/universe Translation-en [5124 kB]
Get:16 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [1822 kB]
Get:17 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/universe amd64 c-n-f Metadata [265 kB]
Get:18 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [144 kB]
Get:19 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/multiverse Translation-en [104 kB]
Get:20 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/multiverse amd64 c-n-f Metadata [9136 B]
Get:21 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2197 kB]
Get:22 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/main Translation-en [385 kB]
Get:23 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/main amd64 c-n-f Metadata [16.0 kB]
Get:24 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1381 kB]
Get:25 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/restricted Translation-en [196 kB]
Get:26 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/restricted amd64 c-n-f Metadata [600 B]
Get:27 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [973 kB]
Get:28 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe Translation-en [222 kB]
Get:29 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 c-n-f Metadata [21.8 kB]
Get:30 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [29.9 kB]
Get:31 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/multiverse Translation-en [7940 B]
Get:32 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 c-n-f Metadata [664 B]
Get:33 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [45.7 kB]
Get:34 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/main Translation-en [16.3 kB]
Get:35 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/main amd64 c-n-f Metadata [1420 B]
Get:36 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/restricted amd64 c-n-f Metadata [116 B]
Get:37 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [24.0 kB]
Get:38 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/universe Translation-en [16.0 kB]
Get:39 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/universe amd64 c-n-f Metadata [864 B]
Get:40 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/multiverse amd64 c-n-f Metadata [116 B]
Get:41 http://security.ubuntu.com/ubuntu focal-security/main Translation-en [301 kB]
Get:42 http://security.ubuntu.com/ubuntu focal-security/main amd64 c-n-f Metadata [11.2 kB]
Get:43 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1289 kB]
Get:44 http://security.ubuntu.com/ubuntu focal-security/restricted Translation-en [183 kB]
Get:45 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [743 kB]
Get:46 http://security.ubuntu.com/ubuntu focal-security/universe Translation-en [137 kB]
Get:47 http://security.ubuntu.com/ubuntu focal-security/universe amd64 c-n-f Metadata [15.3 kB]
Fetched 26.4 MB in 5s (5247 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
python-pip-whl python3-wheel
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
libpython3.9-minimal libpython3.9-stdlib python3.9-minimal
Suggested packages:
python3.9-venv python3.9-doc binfmt-support
The following NEW packages will be installed:
libpython3.9-minimal libpython3.9-stdlib python3.9 python3.9-minimal
0 upgraded, 4 newly installed, 0 to remove and 33 not upgraded.
Need to get 4979 kB of archives.
After this operation, 19.9 MB of additional disk space will be used.
Get:1 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 libpython3.9-minimal amd64 3.9.5-3ubuntu0~20.04.1 [756 kB]
Get:2 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 python3.9-minimal amd64 3.9.5-3ubuntu0~20.04.1 [2022 kB]
Get:3 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 libpython3.9-stdlib amd64 3.9.5-3ubuntu0~20.04.1 [1778 kB]
Get:4 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 python3.9 amd64 3.9.5-3ubuntu0~20.04.1 [423 kB]
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Fetched 4979 kB in 0s (34.1 MB/s)
Selecting previously unselected package libpython3.9-minimal:amd64.
(Reading database ... 147341 files and directories currently installed.)
Preparing to unpack .../libpython3.9-minimal_3.9.5-3ubuntu0~20.04.1_amd64.deb ...
Unpacking libpython3.9-minimal:amd64 (3.9.5-3ubuntu0~20.04.1) ...
Selecting previously unselected package python3.9-minimal.
Preparing to unpack .../python3.9-minimal_3.9.5-3ubuntu0~20.04.1_amd64.deb ...
Unpacking python3.9-minimal (3.9.5-3ubuntu0~20.04.1) ...
Selecting previously unselected package libpython3.9-stdlib:amd64.
Preparing to unpack .../libpython3.9-stdlib_3.9.5-3ubuntu0~20.04.1_amd64.deb ...
Unpacking libpython3.9-stdlib:amd64 (3.9.5-3ubuntu0~20.04.1) ...
Selecting previously unselected package python3.9.
Preparing to unpack .../python3.9_3.9.5-3ubuntu0~20.04.1_amd64.deb ...
Unpacking python3.9 (3.9.5-3ubuntu0~20.04.1) ...
Setting up libpython3.9-minimal:amd64 (3.9.5-3ubuntu0~20.04.1) ...
Setting up python3.9-minimal (3.9.5-3ubuntu0~20.04.1) ...
Setting up libpython3.9-stdlib:amd64 (3.9.5-3ubuntu0~20.04.1) ...
Setting up python3.9 (3.9.5-3ubuntu0~20.04.1) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Tue Nov 8 01:24:00 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   40C    P8    14W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Creating a virtualenv for this project...
Pipfile: /opt/task/directory/Pipfile
Using /usr/bin/python3.9 (3.9.5) to create virtualenv...
⠴ Creating virtual environment...created virtual environment CPython3.9.5.final.0-64 in 1731ms
creator Venv(dest=/root/.local/share/virtualenvs/directory-6uwWda-_, clear=False, no_vcs_ignore=False, global=False, describe=CPython3Posix)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
added seed packages: pip==22.2.2, setuptools==65.3.0, wheel==0.37.1
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
✔ Successfully created virtual environment!
Virtualenv location: /root/.local/share/virtualenvs/directory-6uwWda-_
Installing dependencies from Pipfile...
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
A models/
A data/MAGNETIC_TILE_SURFACE_DEFECTS/images/
A data/MAGNETIC_TILE_SURFACE_DEFECTS/masks/
A data/MAGNETIC_TILE_SURFACE_DEFECTS/train_images/
A data/MAGNETIC_TILE_SURFACE_DEFECTS/train_masks/
A data/MAGNETIC_TILE_SURFACE_DEFECTS/test_images/
A data/MAGNETIC_TILE_SURFACE_DEFECTS/test_masks/
7 files added and 785 files fetched
Running stage 'data_load':
> python src/stages/data_load.py --config=params.yaml
Matplotlib is building the font cache; this may take a moment.
100%|██████████| 392/392 [00:00<00:00, 495.12it/s]
Running stage 'data_split':
> python src/stages/data_split.py --config=params.yaml
Updating lock file 'dvc.lock'
Running stage 'train':
> python src/stages/train.py --config=params.yaml
/root/.local/share/virtualenvs/directory-6uwWda-_/lib/python3.9/site-packages/torch/_tensor.py:1142: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
ret = func(*args, **kwargs)
Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth
100%|██████████| 83.3M/83.3M [00:00<00:00, 191MB/s]
INFO:dvclive:Report path (if generated): /opt/task/directory/training_metrics/report.html
epoch train_loss valid_loss time
0 0.395255 0.593410 00:13
epoch train_loss valid_loss time
0 0.345424 0.327121 00:11
1 0.304969 0.256029 00:10
2 0.306849 0.338111 00:10
3 0.291200 0.235138 00:10
4 0.313439 0.274647 00:10
5 0.282574 0.608300 00:10
6 0.235368 0.123192 00:11
7 0.190397 0.102500 00:11
8 0.158510 0.095538 00:11
9 0.143658 0.094383 00:11
Updating lock file 'dvc.lock'
Running stage 'evaluate':
> python src/stages/eval.py --config=params.yaml
/opt/task/directory/src/eval_utils.py:57: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). Consider using `matplotlib.pyplot.close()`.
fig, axarr = plt.subplots(1, 3)
100%|██████████| 78/78 [00:50<00:00, 1.56it/s]
Updating lock file 'dvc.lock'
To track the changes with git, run:
git add dvc.lock
To enable auto staging, run:
dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
On branch temp
Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: dvc.lock
deleted: main.tf
modified: metrics.json
modified: tpi-run.sh
modified: training_metrics.json
modified: training_metrics/report.html
modified: training_metrics/scalars/epoch.tsv
modified: training_metrics/scalars/eval/loss.tsv
modified: training_metrics/scalars/train/loss.tsv
Untracked files:
(use "git add <file>..." to include in what will be committed)
leo
test.tf
no changes added to commit (use "git add" and/or "git commit -a")
tpi-task.service: Succeeded.
INFO Deleting resources...
INFO Reading resources... (this may happen several times)
INFO [1/9] Reading DefaultVPC...
INFO [2/9] Reading DefaultVPCSubnets...
INFO [3/9] Reading Image...
INFO [1/6] Deleting AutoScalingGroup...
INFO [2/6] Deleting LaunchTemplate...
INFO [3/6] Deleting KeyPair...
INFO [4/6] Deleting SecurityGroup...
INFO [5/6] Reading Credentials...
INFO [6/6] Deleting Bucket...
INFO Deletion completed
$ git status
On branch temp
nothing to commit, working tree clean
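Since git status does not list ignored files (DVC-tracked outputs usually are ignored), a git-independent check may help confirm that nothing came back at all. A minimal sketch, assuming GNU findutils and coreutils are installed; snapshot_dir and the /tmp paths are hypothetical names for illustration and are not part of leo:

```shell
#!/bin/bash
# Hypothetical helper (not part of leo): print one "checksum  path" line per
# file under the given directory, sorted by path, skipping the .git directory.
snapshot_dir() {
  local dir="$1"
  (cd "$dir" && find . -type f -not -path './.git/*' -print0 \
    | sort -z \
    | xargs -0 -r sha256sum)
}

# Intended usage around the repro script (paths are placeholders):
#   snapshot_dir . > /tmp/before.txt
#   ./tpi-run.sh
#   snapshot_dir . > /tmp/after.txt
#   diff /tmp/before.txt /tmp/after.txt && echo "no local files changed"
```

If the two snapshots are identical after leo delete returns, no file in the working directory was added or modified, which would match the clean git status above.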