
Commit d84c462

Author: dbickson
Commit message: adding
1 parent: b9482f4

File tree: 1 file changed, +3 -3 lines

RUN.md (3 additions, 3 deletions)
````diff
@@ -359,7 +359,7 @@ Note that it is possible to configure the behaviour regarding deletion of files.
 ```
 This keeps all the downloaded tars and images in the /tmp folder.
 
-Running example. Assume you got to the full dataset downloaded into s3://mybucket/myfolder. In total there are 40,000 tar files. Further assume you want to run using 20 compute nodes to extract the feature in parallel. In this case you cam run:
+Running example. Assume you have the full dataset downloaded into `s3://mybucket/myfolder`, with 40,000 tar files in total. Further assume you want to run on 20 compute nodes to extract the features in parallel. In this case you can run:
 
 ```python
 import fastdup
@@ -369,11 +369,11 @@ fastdup.run('s3://mybucket/myfolder', run_mode=1, work_dir='/path/to/work_dir',
 ```
 The first job runs on the 2000 tars from 0 up to (but not including) 2000. Next you can run with `min_offset=2000, max_offset=4000`, etc.
 
-Once all jobs are finished, collect all the output files from the work_dir into a single location and run:
+Once all jobs are finished, collect all the output files from the `work_dir` into a single location and run:
 
 ```python
 import fastdup
-fastdup.run('', run_mode=2, work_dir='/path/to/work_dir)
+fastdup.run('', run_mode=2, work_dir='/path/to/work_dir')
 ```
 
 For running on 50M images you will need an Ubuntu machine with 32 cores and 256GB of RAM. We are working on further scaling the implementation to the full dataset - stay tuned!
````
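For context, the sharded extraction step the changed text describes could be driven per node roughly as follows. This is a minimal sketch, not code from the repository: `TOTAL_TARS`, `NUM_NODES`, and the `node_id` argument are illustrative assumptions; only `fastdup.run` with `run_mode=1`, `work_dir`, `min_offset`, and `max_offset` comes from the text above, and the other arguments of the original call (truncated in the hunk header) are not reproduced here.

```python
import fastdup

# Sketch only: shard 40,000 tars across 20 nodes (figures from the text above).
TOTAL_TARS = 40000
NUM_NODES = 20
CHUNK = TOTAL_TARS // NUM_NODES  # 2000 tars per node

def extract_shard(node_id):
    # node_id (0..19) is a hypothetical value supplied by your job scheduler.
    # run_mode=1 performs feature extraction only; min_offset/max_offset select
    # this node's half-open range of tar files [min_offset, max_offset).
    fastdup.run('s3://mybucket/myfolder',
                run_mode=1,
                work_dir='/path/to/work_dir',
                min_offset=node_id * CHUNK,
                max_offset=(node_id + 1) * CHUNK)
```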
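And the collection step, run once after all shard jobs have finished. Again a sketch: it assumes the per-node output files have already been copied into the same shared `/path/to/work_dir` by whatever tooling you use.

```python
import fastdup

# run_mode=2 runs the similarity search over the already-extracted features;
# the empty input dir matches the command shown in the diff above.
fastdup.run('', run_mode=2, work_dir='/path/to/work_dir')
```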
