23 changes: 23 additions & 0 deletions .github/workflows/replicate-push.yml
@@ -0,0 +1,23 @@
name: Push to Replicate

on:
  workflow_dispatch:
    inputs:
      model_name:
        description: "owner/model-name (Replicate)"
        required: true
        type: string

jobs:
  push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Cog
        uses: replicate/setup-cog@v2
        with:
          token: ${{ secrets.REPLICATE_API_TOKEN }}

      - name: Push
        run: cog push r8.im/${{ inputs.model_name }}
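
The `Push` step interpolates `inputs.model_name` directly into a shell command, so a malformed value reaches `cog push` unchecked. Below is a minimal sketch of a check that could run before the push, assuming names follow the `owner/model-name` shape given in the input description. The function name and the allowed character set are assumptions made for illustration, not Replicate's documented naming rules.

```python
import re

# Hypothetical pattern: lowercase letters, digits, '.', '_' and '-' on each
# side of exactly one '/'. Replicate's real naming rules may differ.
_MODEL_NAME = re.compile(r"[a-z0-9][a-z0-9._-]*/[a-z0-9][a-z0-9._-]*")


def is_valid_model_name(name: str) -> bool:
    """Return True if name looks like 'owner/model-name' with no shell metacharacters."""
    return _MODEL_NAME.fullmatch(name) is not None
```

Rejecting anything outside a small allowlist is simpler than trying to escape shell metacharacters after the fact.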
2 changes: 1 addition & 1 deletion .gitignore
@@ -3,4 +3,4 @@
*.pkl
*.pth
result*
weights*
# weights*
62 changes: 39 additions & 23 deletions README.md
@@ -1,74 +1,83 @@
## CRAFT: Character-Region Awareness For Text detection

Official PyTorch implementation of CRAFT text detector | [Paper](https://arxiv.org/abs/1904.01941) | [Pretrained Model](https://drive.google.com/open?id=1Jk4eGD7crsqCCg9C9VjCLkMN3ze8kutZ) | [Supplementary](https://youtu.be/HI8MzpY8KMI)

**[Youngmin Baek](mailto:[email protected]), Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee.**

Clova AI Research, NAVER Corp.

### Sample Results

### Overview

PyTorch implementation of the CRAFT text detector, which detects text areas by exploring each character region and the affinity between characters. The bounding boxes of text are obtained by finding minimum bounding rectangles on the binary map produced by thresholding the character region and affinity scores.

<img width="1000" alt="teaser" src="./figures/craft_example.gif">

## Updates

**13 Jun, 2019**: Initial update
**20 Jul, 2019**: Added post-processing for polygon result
**28 Sep, 2019**: Added the trained model on IC15 and the link refiner


## Getting started

### Install dependencies

#### Requirements

- PyTorch>=0.4.1
- torchvision>=0.2.1
- opencv-python>=3.4.2
- check requirements.txt

```
pip install -r requirements.txt
```

### Training

The training code is not included in this repository, and we cannot release the full training code for IP reasons.

### Test instruction using pretrained model

- Download the trained models

| _Model name_ | _Used datasets_ | _Languages_ | _Purpose_ | _Model Link_ |
| :----------- | :-------------------- | :---------- | :-------------------------- | :-------------------------------------------------------------------------- |
| General | SynthText, IC13, IC17 | Eng + MLT | For general purpose | [Click](https://drive.google.com/open?id=1Jk4eGD7crsqCCg9C9VjCLkMN3ze8kutZ) |
| IC15 | SynthText, IC15 | Eng | For IC15 only | [Click](https://drive.google.com/open?id=1i2R7UIUqmkUtF0jv_3MXTqmQ_9wuAnLf) |
| LinkRefiner | CTW1500 | - | Used with the General Model | [Click](https://drive.google.com/open?id=1XSaFwBkOaFOdtk4Ane3DFyJGPRw6v5bO) |

- Run with pretrained model (with Python 3.7)

```
python test.py --trained_model=[weightfile] --test_folder=[folder path to test images]
```

The result images and score maps will be saved to `./result` by default.

### Arguments

- `--trained_model`: pretrained model
- `--text_threshold`: text confidence threshold
- `--low_text`: text low-bound score
- `--link_threshold`: link confidence threshold
- `--cuda`: use cuda for inference (default:True)
- `--canvas_size`: max image size for inference
- `--mag_ratio`: image magnification ratio
- `--poly`: enable polygon type result
- `--show_time`: show processing time
- `--test_folder`: folder path to input images
- `--refine`: use link refiner for sentence-level dataset
- `--refiner_model`: pretrained refiner model
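
To illustrate how the three thresholds above might interact with the thresholding step described in the Overview, here is a simplified pure-Python sketch. It assumes `low_text` binarizes the region score, `link_threshold` binarizes the affinity score, and `text_threshold` filters out components with no confident character pixel; that mapping is inferred from the argument descriptions. The sketch uses 4-connectivity and axis-aligned boxes instead of minimum-area rectangles, so it is not the repository's actual post-processing. The function name `detect_boxes` is hypothetical.

```python
from collections import deque


def detect_boxes(region, affinity, text_threshold=0.7, link_threshold=0.4, low_text=0.4):
    """Return axis-aligned boxes (x_min, y_min, x_max, y_max) for connected text areas."""
    h, w = len(region), len(region[0])
    # Binary map: a pixel counts as text if either score passes its threshold.
    mask = [
        [region[y][x] >= low_text or affinity[y][x] >= link_threshold for x in range(w)]
        for y in range(h)
    ]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one 4-connected component, tracking extent and peak region score.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            peak = 0.0
            while queue:
                y, x = queue.popleft()
                peak = max(peak, region[y][x])
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep the component only if it contains a confident character pixel.
            if peak >= text_threshold:
                boxes.append((x_min, y_min, x_max, y_max))
    return boxes
```

In this sketch, two character blobs joined by a high-affinity pixel merge into one box, while an isolated blob whose peak region score stays below `text_threshold` is dropped.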

## Links

- WebDemo : https://demo.ocr.clova.ai/
- Repo of recognition : https://github.com/clovaai/deep-text-recognition-benchmark

## Citation

```
@inproceedings{baek2019character,
title={Character Region Awareness for Text Detection},
@@ -80,6 +89,7 @@ The result image and socre maps will be saved to `./result` by default.
```

## License

```
Copyright (c) 2019-present NAVER Corp.

@@ -101,3 +111,9 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
```

---

cd D:\python\CRAFT-pytorch
python -m venv venv
.\venv\Scripts\Activate.ps1
16 changes: 16 additions & 0 deletions cog.yaml
@@ -0,0 +1,16 @@
build:
  python_version: "3.10"
  system_packages:
    - "libgl1"
    - "libglib2.0-0"
  python_packages:
    - torch==2.0.1
    - torchvision==0.15.2
    - opencv-python
    - matplotlib
    - scikit-image
    - scipy
    - Pillow
    - numpy

predict: "predict.py:Predictor"
67 changes: 67 additions & 0 deletions predict.py
@@ -0,0 +1,67 @@
import os
import cv2
import torch
import numpy as np
from cog import BasePredictor, Input, Path

import craft_utils
import imgproc
from craft import CRAFT


class Predictor(BasePredictor):
    def setup(self):
        # Load model once
        self.net = CRAFT()
        weight_path = "craft_mlt_25k.pth"

        # Download weights if not present
        if not os.path.exists(weight_path):
            import requests
            url = "https://github.com/clovaai/CRAFT-pytorch/releases/download/1.0/craft_mlt_25k.pth"
            r = requests.get(url, timeout=60)
            # Fail loudly instead of saving an HTTP error page as the weight file
            r.raise_for_status()
            with open(weight_path, "wb") as f:
                f.write(r.content)

        self.net.load_state_dict(
            torch.load(weight_path, map_location="cpu")
        )
        self.net.eval()

    def predict(
        self,
        image: Path = Input(description="Input image"),
        text_threshold: float = Input(default=0.7, description="Text confidence threshold"),
        link_threshold: float = Input(default=0.4, description="Link confidence threshold"),
        low_text: float = Input(default=0.4, description="Low text threshold"),
    ) -> dict:
        # Load image (imgproc.loadImage returns an RGB array)
        img = imgproc.loadImage(str(image))

        # Run detection
        bboxes, polys, score_text = craft_utils.test_net(
            self.net,
            img,
            text_threshold=text_threshold,
            link_threshold=link_threshold,
            low_text=low_text,
            cuda=False,
        )

        # Draw boxes on image
        vis = img.copy()
        for box in bboxes:
            pts = np.array(box).astype(np.int32).reshape((-1, 1, 2))
            cv2.polylines(vis, [pts], True, (0, 255, 0), 2)

        # Save output image; OpenCV writes BGR, so convert from RGB first
        out_path = "/tmp/output.jpg"
        cv2.imwrite(out_path, cv2.cvtColor(vis, cv2.COLOR_RGB2BGR))

        # Convert boxes to JSON serializable format
        boxes_json = [np.array(box).astype(float).tolist() for box in bboxes]

        return {
            "output_image": Path(out_path),
            "boxes": boxes_json,
        }
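
Two parts of `setup()` could be hardened; the sketch below shows one possible shape for each, using only the standard library. The helper names `download_if_missing` and `strip_module_prefix` are hypothetical and not part of this PR. The first writes the download to a temporary file and renames it, so an interrupted download never leaves a truncated weight file behind. The second matters only if the checkpoint was saved from a `torch.nn.DataParallel` wrapper (an assumption here), in which case every key carries a `module.` prefix that a bare `CRAFT()` instance would not expect.

```python
import os
import shutil
import tempfile
import urllib.request
from collections import OrderedDict


def download_if_missing(url: str, dest: str, timeout: float = 60.0) -> str:
    """Download url to dest unless dest already exists; return dest."""
    if os.path.exists(dest):
        return dest
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Write to a temporary file in the same directory, then rename atomically.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url, timeout=timeout) as resp:
            shutil.copyfileobj(resp, tmp)
        os.replace(tmp_path, dest)
    except BaseException:
        # Clean up the partial file before propagating the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest


def strip_module_prefix(state_dict):
    """Return a copy of state_dict with a leading 'module.' removed from each key."""
    prefix = "module."
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )
```

With these helpers, the loading step might read `self.net.load_state_dict(strip_module_prefix(torch.load(weight_path, map_location="cpu")))`; keys without the prefix pass through unchanged, so the call is harmless for checkpoints saved from an unwrapped model.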