leo delete doesn't sync data back? #712

@dacbd

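It looks like leo delete with --output and --workdir doesn't sync changes back to the local working directory. In the logs below, the remote git status lists modified files (dvc.lock, metrics.json, and others), but the local git status after deletion reports a clean working tree.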

Example main.tf (uses https://github.com/iterative/magnetic-tiles-defect):

terraform {
  required_providers { iterative = { source = "iterative/iterative" } }
}
provider "iterative" {}

resource "iterative_task" "gpu-runner" {
  cloud = "aws"
  machine = "m+t4"
  timeout = 7200 #2 hrs
  region = "us-west-1"
  image = "nvidia"
  disk_size = 100
  permission_set = "arn:aws:iam::342840881361:instance-profile/tpi-vscode-example"
  storage {
    workdir = "."
    output = "."
  }
  environment = {
    "REPO_TOKEN" = ""
  }
  script = <<-END
    #!/bin/bash
    # set up project requirements
    apt-get update
    apt-get install python3.9 -y

    nvidia-smi

    # Run project
    pipenv install --skip-lock
    pipenv run dvc pull
    pipenv run dvc repro --force
    
    git status
  END
}

Executing commands (tpi-run.sh):

#!/bin/bash

leo_id=$(leo create \
  --cloud aws \
  --region us-west-1)

echo "id: $leo_id"

./leo read \
  --cloud aws \
  --region us-west-1 \
  --follow "$leo_id"

leo delete \
  --cloud aws \
  --region us-west-1 \
  --workdir . \
  --output . \
  "$leo_id"

Logs:

$ ./tpi-run.sh 
INFO Using identifier tpi-absolutely-square-skunk-49ay0tcn-92xo2t3b 
INFO Creating resources...                        
INFO [1/12] Parsing PermissionSet...              
INFO [2/12] Importing DefaultVPC...               
INFO [3/12] Importing DefaultVPCSubnets...        
INFO [4/12] Reading Image...                      
INFO [5/12] Creating Bucket...                    
INFO [6/12] Creating SecurityGroup...             
INFO [7/12] Creating KeyPair...                   
INFO [8/12] Reading Credentials...                
INFO [9/12] Creating LaunchTemplate...            
INFO [10/12] Creating AutoScalingGroup...         
INFO [11/12] Uploading Directory...               
INFO Transferring 99.55MB (853 files)...          
INFO     9.079 MiB / 94.938 MiB, 10%, 929.045 KiB/s, ETA 1m34s (xfr#153/853) 
INFO    19.638 MiB / 94.938 MiB, 21%, 1018.368 KiB/s, ETA 1m15s (xfr#296/853) 
INFO    26.135 MiB / 94.938 MiB, 28%, 850.263 KiB/s, ETA 1m22s (xfr#390/853) 
INFO    34.420 MiB / 94.938 MiB, 36%, 841.356 KiB/s, ETA 1m13s (xfr#497/853) 
INFO    44.787 MiB / 94.938 MiB, 47%, 933.029 KiB/s, ETA 55s (xfr#635/853) 
INFO    58.284 MiB / 94.939 MiB, 61%, 1.113 MiB/s, ETA 32s (xfr#756/853) 
INFO    71.340 MiB / 94.939 MiB, 75%, 1.206 MiB/s, ETA 19s (xfr#844/853) 
INFO    84.857 MiB / 94.939 MiB, 89%, 1.283 MiB/s, ETA 7s 
INFO [12/12] Starting task...                     
INFO Creation completed                           
id: tpi-absolutely-square-skunk-49ay0tcn-92xo2t3b
INFO Reading resources... (this may happen several times) 
INFO [1/9] Reading DefaultVPC...                  
INFO [2/9] Reading DefaultVPCSubnets...           
INFO [3/9] Reading Image...                       
INFO [4/9] Reading Bucket...                      
INFO [5/9] Reading SecurityGroup...               
INFO [6/9] Reading KeyPair...                     
INFO [7/9] Reading Credentials...                 
INFO [8/9] Reading LaunchTemplate...              
INFO [9/9] Reading AutoScalingGroup...            
INFO Read completed                               
Waiting for instance......................
Started tpi-task.service.
Get:1 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal InRelease [265 kB]
Get:2 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:3 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Hit:4 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  InRelease
Hit:5 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  InRelease
Hit:6 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  InRelease
Get:7 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:8 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/main amd64 Packages [970 kB]
Get:9 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/main Translation-en [506 kB]
Get:10 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/main amd64 c-n-f Metadata [29.5 kB]
Get:11 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [22.0 kB]
Get:12 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/restricted Translation-en [6212 B]
Get:13 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/restricted amd64 c-n-f Metadata [392 B]
Get:14 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/universe amd64 Packages [8628 kB]
Get:15 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/universe Translation-en [5124 kB]
Get:16 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [1822 kB]
Get:17 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/universe amd64 c-n-f Metadata [265 kB]
Get:18 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [144 kB]
Get:19 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/multiverse Translation-en [104 kB]
Get:20 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal/multiverse amd64 c-n-f Metadata [9136 B]
Get:21 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2197 kB]
Get:22 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/main Translation-en [385 kB]
Get:23 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/main amd64 c-n-f Metadata [16.0 kB]
Get:24 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1381 kB]
Get:25 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/restricted Translation-en [196 kB]
Get:26 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/restricted amd64 c-n-f Metadata [600 B]
Get:27 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [973 kB]
Get:28 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe Translation-en [222 kB]
Get:29 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 c-n-f Metadata [21.8 kB]
Get:30 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [29.9 kB]
Get:31 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/multiverse Translation-en [7940 B]
Get:32 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 c-n-f Metadata [664 B]
Get:33 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [45.7 kB]
Get:34 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/main Translation-en [16.3 kB]
Get:35 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/main amd64 c-n-f Metadata [1420 B]
Get:36 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/restricted amd64 c-n-f Metadata [116 B]
Get:37 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [24.0 kB]
Get:38 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/universe Translation-en [16.0 kB]
Get:39 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/universe amd64 c-n-f Metadata [864 B]
Get:40 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-backports/multiverse amd64 c-n-f Metadata [116 B]
Get:41 http://security.ubuntu.com/ubuntu focal-security/main Translation-en [301 kB]
Get:42 http://security.ubuntu.com/ubuntu focal-security/main amd64 c-n-f Metadata [11.2 kB]
Get:43 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1289 kB]
Get:44 http://security.ubuntu.com/ubuntu focal-security/restricted Translation-en [183 kB]
Get:45 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [743 kB]
Get:46 http://security.ubuntu.com/ubuntu focal-security/universe Translation-en [137 kB]
Get:47 http://security.ubuntu.com/ubuntu focal-security/universe amd64 c-n-f Metadata [15.3 kB]
Fetched 26.4 MB in 5s (5247 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
  python-pip-whl python3-wheel
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  libpython3.9-minimal libpython3.9-stdlib python3.9-minimal
Suggested packages:
  python3.9-venv python3.9-doc binfmt-support
The following NEW packages will be installed:
  libpython3.9-minimal libpython3.9-stdlib python3.9 python3.9-minimal
0 upgraded, 4 newly installed, 0 to remove and 33 not upgraded.
Need to get 4979 kB of archives.
After this operation, 19.9 MB of additional disk space will be used.
Get:1 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 libpython3.9-minimal amd64 3.9.5-3ubuntu0~20.04.1 [756 kB]
Get:2 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 python3.9-minimal amd64 3.9.5-3ubuntu0~20.04.1 [2022 kB]
Get:3 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 libpython3.9-stdlib amd64 3.9.5-3ubuntu0~20.04.1 [1778 kB]
Get:4 http://us-west-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 python3.9 amd64 3.9.5-3ubuntu0~20.04.1 [423 kB]
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Fetched 4979 kB in 0s (34.1 MB/s)
Selecting previously unselected package libpython3.9-minimal:amd64.
(Reading database ... 147341 files and directories currently installed.)
Preparing to unpack .../libpython3.9-minimal_3.9.5-3ubuntu0~20.04.1_amd64.deb ...
Unpacking libpython3.9-minimal:amd64 (3.9.5-3ubuntu0~20.04.1) ...
Selecting previously unselected package python3.9-minimal.
Preparing to unpack .../python3.9-minimal_3.9.5-3ubuntu0~20.04.1_amd64.deb ...
Unpacking python3.9-minimal (3.9.5-3ubuntu0~20.04.1) ...
Selecting previously unselected package libpython3.9-stdlib:amd64.
Preparing to unpack .../libpython3.9-stdlib_3.9.5-3ubuntu0~20.04.1_amd64.deb ...
Unpacking libpython3.9-stdlib:amd64 (3.9.5-3ubuntu0~20.04.1) ...
Selecting previously unselected package python3.9.
Preparing to unpack .../python3.9_3.9.5-3ubuntu0~20.04.1_amd64.deb ...
Unpacking python3.9 (3.9.5-3ubuntu0~20.04.1) ...
Setting up libpython3.9-minimal:amd64 (3.9.5-3ubuntu0~20.04.1) ...
Setting up python3.9-minimal (3.9.5-3ubuntu0~20.04.1) ...
Setting up libpython3.9-stdlib:amd64 (3.9.5-3ubuntu0~20.04.1) ...
Setting up python3.9 (3.9.5-3ubuntu0~20.04.1) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Tue Nov  8 01:24:00 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   40C    P8    14W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Creating a virtualenv for this project...
Pipfile: /opt/task/directory/Pipfile
Using /usr/bin/python3.9 (3.9.5) to create virtualenv...
⠴ Creating virtual environment...created virtual environment CPython3.9.5.final.0-64 in 1731ms
  creator Venv(dest=/root/.local/share/virtualenvs/directory-6uwWda-_, clear=False, no_vcs_ignore=False, global=False, describe=CPython3Posix)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
    added seed packages: pip==22.2.2, setuptools==65.3.0, wheel==0.37.1
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
✔ Successfully created virtual environment!
Virtualenv location: /root/.local/share/virtualenvs/directory-6uwWda-_
Installing dependencies from Pipfile...
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
A       models/
A       data/MAGNETIC_TILE_SURFACE_DEFECTS/images/
A       data/MAGNETIC_TILE_SURFACE_DEFECTS/masks/
A       data/MAGNETIC_TILE_SURFACE_DEFECTS/train_images/
A       data/MAGNETIC_TILE_SURFACE_DEFECTS/train_masks/
A       data/MAGNETIC_TILE_SURFACE_DEFECTS/test_images/
A       data/MAGNETIC_TILE_SURFACE_DEFECTS/test_masks/
7 files added and 785 files fetched
Running stage 'data_load':
> python src/stages/data_load.py --config=params.yaml
Matplotlib is building the font cache; this may take a moment.
100%|██████████| 392/392 [00:00<00:00, 495.12it/s]
Running stage 'data_split':
> python src/stages/data_split.py --config=params.yaml
Updating lock file 'dvc.lock'
Running stage 'train':
> python src/stages/train.py --config=params.yaml
/root/.local/share/virtualenvs/directory-6uwWda-_/lib/python3.9/site-packages/torch/_tensor.py:1142: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  ret = func(*args, **kwargs)
Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth
100%|██████████| 83.3M/83.3M [00:00<00:00, 191MB/s]
INFO:dvclive:Report path (if generated): /opt/task/directory/training_metrics/report.html
epoch     train_loss  valid_loss  time
0         0.395255    0.593410    00:13                                                  
epoch     train_loss  valid_loss  time
0         0.345424    0.327121    00:11                                                   
1         0.304969    0.256029    00:10                                                   
2         0.306849    0.338111    00:10                                                   
3         0.291200    0.235138    00:10                                                   
4         0.313439    0.274647    00:10                                                   
5         0.282574    0.608300    00:10                                                   
6         0.235368    0.123192    00:11                                                   
7         0.190397    0.102500    00:11                                                   
8         0.158510    0.095538    00:11                                                   
9         0.143658    0.094383    00:11                                                    
Updating lock file 'dvc.lock'
Running stage 'evaluate':
> python src/stages/eval.py --config=params.yaml
/opt/task/directory/src/eval_utils.py:57: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). Consider using `matplotlib.pyplot.close()`.
  fig, axarr = plt.subplots(1, 3)
100%|██████████| 78/78 [00:50<00:00,  1.56it/s]                      
Updating lock file 'dvc.lock'
To track the changes with git, run:
    git add dvc.lock
To enable auto staging, run:
	dvc config core.autostage true
Use `dvc push` to send your updates to remote storage.
On branch temp
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   dvc.lock
	deleted:    main.tf
	modified:   metrics.json
	modified:   tpi-run.sh
	modified:   training_metrics.json
	modified:   training_metrics/report.html
	modified:   training_metrics/scalars/epoch.tsv
	modified:   training_metrics/scalars/eval/loss.tsv
	modified:   training_metrics/scalars/train/loss.tsv
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	leo
	test.tf
no changes added to commit (use "git add" and/or "git commit -a")
tpi-task.service: Succeeded.
INFO Deleting resources...                        
INFO Reading resources... (this may happen several times) 
INFO [1/9] Reading DefaultVPC...                  
INFO [2/9] Reading DefaultVPCSubnets...           
INFO [3/9] Reading Image...                       
INFO [1/6] Deleting AutoScalingGroup...           
INFO [2/6] Deleting LaunchTemplate...             
INFO [3/6] Deleting KeyPair...                    
INFO [4/6] Deleting SecurityGroup...              
INFO [5/6] Reading Credentials...                 
INFO [6/6] Deleting Bucket...                     
INFO Deletion completed  
$ git status
On branch temp
nothing to commit, working tree clean
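
The remote git status in the log lists modified files (dvc.lock, metrics.json, training_metrics.json and others), yet the local tree is clean after leo delete. As a git-independent way to confirm that, here is a minimal sketch, assuming a POSIX shell with find, cksum and sort available. The snapshot function is a hypothetical helper, not part of leo: it records a checksum per file so that a snapshot taken before leo create can be compared with one taken after leo delete.

```shell
#!/bin/bash
# Hypothetical helper (not part of leo): print one "<crc> <size> <path>" line
# per regular file under directory $1, skipping .git, sorted by path.
snapshot() {
  (cd "$1" && find . -type f ! -path './.git/*' -exec cksum {} + | sort -k3)
}

# Intended use around the commands from this report:
#   before=$(snapshot .)
#   ... leo create / leo read --follow / leo delete --workdir . --output . ...
#   after=$(snapshot .)
#   [ "$before" = "$after" ] && echo "no local changes: output was not synced back"
```

If the two snapshots are identical, nothing was written back locally, which would match the clean git status shown above.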

Labels

bug: Something isn't working
leo: standalone CLI binary
p0-critical: Max priority (ASAP)