Commit 5688bd5

Add first commit for "doi2cite"
1 parent 6813921 commit 5688bd5

11 files changed

+670
-0
lines changed

doi2cite/Makefile

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
DIFF ?= diff --strip-trailing-cr -u

test: sample1.md sample1.csl doi2cite.lua
	@pandoc --lua-filter=doi2cite.lua --to=markdown $< | $(DIFF) expected1.md -
	@pandoc --lua-filter=doi2cite.lua --to=pdf $< | $(DIFF) expected1.pdf -

expected1.md: sample1.md doi2cite.lua
	pandoc --lua-filter=doi2cite.lua --wrap=preserve --output $@ $<

expected1.pdf: sample1.md sample1.csl doi2cite.lua
	pandoc --lua-filter=doi2cite.lua --filter=pandoc-crossref --citeproc --csl=sample1.csl --output $@ $<

expected2.md: sample2.md doi2cite.lua
	pandoc --lua-filter=doi2cite.lua --wrap=preserve --output $@ $<

.PHONY: test

doi2cite/README.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
# pandoc-doi2cite
This pandoc Lua filter helps users insert references into a document
using DOI (Digital Object Identifier) tags. With this filter, users do
not need to write a BibTeX file by hand. Instead, the filter
automatically generates a .bib file from the DOI tags and converts the
DOI tags into citation keys that `--citeproc` can resolve.

<img src="https://user-images.githubusercontent.com/30950088/117561410-87ec5d00-b0d1-11eb-88be-931f3158ec44.png" width="960">

The filter works as follows:
1. Search the document for citations with DOI tags
2. Look up the corresponding BibTeX data in the `__from_DOI.bib` file
3. If not found, fetch the BibTeX data for the DOI from
   http://api.crossref.org
4. Add the reference data to the `__from_DOI.bib` file
5. Check for duplicate reference keys
6. Replace the DOI tags with the corresponding citation keys
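
Steps 2 and 5 can be sketched in plain Lua, without pandoc. This is only an illustration: the helper names `doi_key_map` and `unique_key` are hypothetical, and the sketch assumes every BibTeX entry ends with a closing brace on its own line followed by a newline.

```lua
-- Hypothetical helpers for illustration only.

-- Step 2: build a DOI -> citation-key map from BibTeX text.
local function doi_key_map(bibtex)
  local map = {}
  -- Each entry runs from "@" to the first "}" that sits alone on a line.
  for entry in bibtex:gmatch("@.-\n}\n") do
    local key = entry:match("@%w+{(.-),")
    local doi = entry:match('doi%s*=%s*["{]*(.-)["}],?')
    if key and doi then
      map[doi] = key
    end
  end
  return map
end

-- Step 5: resolve a key clash by appending the DOI to the key.
local function unique_key(key, doi, used)
  if used[key] then
    key = key .. "_" .. doi
  end
  used[key] = true
  return key
end

local bib = "@article{Einstein_1905,\n"
  .. "\tdoi = {10.1002/andp.19053220607},\n"
  .. "\tyear = 1905\n"
  .. "}\n"
local map = doi_key_map(bib)
local used = { Einstein_1905 = true }

print(map["10.1002/andp.19053220607"])
print(unique_key("Einstein_1905", "10.1002/andp.19053220806", used))
```

Appending the DOI keeps keys unique when several papers share the same author and year, at the cost of longer keys.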

# Prerequisites
- Pandoc version 2.0 or newer
- This filter does not need any external dependencies
- This filter must be executed before `pandoc-crossref` or
  `--citeproc`

# DOI tags
The following DOI tags can be used:
- @https://doi.org/
- @doi.org/
- @DOI:
- @doi:

The first one (@https://doi.org/) may be the most useful because it is
identical to the URL that resolves the DOI.
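
The tag handling can be sketched as a prefix check. This is a minimal sketch, not the filter's own code: the helper name `extract_doi` is hypothetical, and lowercasing the DOI is an assumption based on DOIs being case-insensitive.

```lua
-- Hypothetical helper for illustration only.
local prefixes = { "https://doi.org/", "doi.org/", "DOI:", "doi:" }

-- Return the lowercased DOI if `id` starts with a known DOI tag,
-- or nil if `id` is an ordinary citation key.
local function extract_doi(id)
  for _, prefix in ipairs(prefixes) do
    if id:sub(1, #prefix) == prefix then
      return id:sub(#prefix + 1):lower()
    end
  end
  return nil
end

print(extract_doi("https://doi.org/10.1002/andp.19053220607"))
print(extract_doi("LAEMMLI_1970"))
```

Citation ids that match none of the prefixes, such as `LAEMMLI_1970`, yield `nil` and would be left untouched.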

# YAML header
The file **name** of the auto-generated bibliography file **MUST** be
`__from_DOI.bib`, but the **location** of the file can be changed (e.g.
`./refs/__from_DOI.bib`). Specify the file path in the document's YAML
header under the `bibliography` key, which is also used by Pandoc's
`--citeproc`. The filter does not create missing directories, so the
target directory must already exist.
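
One way to pick the file out of the `bibliography` list is to compare only the last path component with `__from_DOI.bib`. The sketch below assumes the paths are already plain strings; the helper name `find_doi_bib` is hypothetical.

```lua
-- Hypothetical helper for illustration only.
-- Return the first path whose file name is `__from_DOI.bib`,
-- or nil when no such path is listed.
local function find_doi_bib(paths)
  for _, path in ipairs(paths) do
    -- Everything after the last "/" or "\" is taken as the file name.
    local name = path:match("([^/\\]+)$")
    if name == "__from_DOI.bib" then
      return path
    end
  end
  return nil
end

print(find_doi_bib({ "my_refs.bib", "./refs/__from_DOI.bib" }))
```

Matching on the file name alone is what allows the directory part to vary freely.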


# Example

example1.md:

    ---
    bibliography:
    - "my_refs.bib"
    - "__from_DOI.bib"
    ---

    # Introduction
    The Laemmli system is one of the most widely used gel systems for the separation of proteins.[@LAEMMLI_1970]
    By the way, Einstein is genius.[@https://doi.org/10.1002/andp.19053220607; @doi.org/10.1002/andp.19053220806; @doi:10.1002/andp.19053221004]

Example command 1 (.md -\> .md)

``` {.sh}
pandoc --lua-filter=doi2cite.lua --wrap=preserve -s example1.md -o expected1.md
```

Example command 2 (.md -\> .pdf with
[ACS](https://pubs.acs.org/journal/jacsat) style):

``` {.sh}
pandoc --lua-filter=doi2cite.lua --filter=pandoc-crossref --citeproc --csl=sample1.csl -s example1.md -o expected1.pdf
```

Example result

![expected1](https://user-images.githubusercontent.com/30950088/119964566-4d952200-bfe4-11eb-90d9-ed2366c639e8.png)

doi2cite/__from_DOI.bib

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
@article{Einstein_1905,
	doi = {10.1002/andp.19053220607},
	url = {https://doi.org/10.1002%2Fandp.19053220607},
	year = 1905,
	publisher = {Wiley},
	volume = {322},
	number = {6},
	pages = {132--148},
	author = {A. Einstein},
	title = {Über einen die Erzeugung und Verwandlung des Lichtes betreffenden heuristischen Gesichtspunkt},
	journal = {Annalen der Physik}
}
@article{Einstein_1905_10.1002/andp.19053220806,
	doi = {10.1002/andp.19053220806},
	url = {https://doi.org/10.1002%2Fandp.19053220806},
	year = 1905,
	publisher = {Wiley},
	volume = {322},
	number = {8},
	pages = {549--560},
	author = {A. Einstein},
	title = {Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen},
	journal = {Annalen der Physik}
}
@article{Einstein_1905_10.1002/andp.19053221004,
	doi = {10.1002/andp.19053221004},
	url = {https://doi.org/10.1002%2Fandp.19053221004},
	year = 1905,
	publisher = {Wiley},
	volume = {322},
	number = {10},
	pages = {891--921},
	author = {A. Einstein},
	title = {Zur Elektrodynamik bewegter Körper},
	journal = {Annalen der Physik}
}

doi2cite/doi2cite.lua

Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
--------------------------------------------------------------------------------
-- Copyright © 2021 Takuro Hosomi
-- This library is free software; you can redistribute it and/or modify it
-- under the terms of the MIT license. See LICENSE for details.
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
-- Global variables --
--------------------------------------------------------------------------------
base_url = "http://api.crossref.org"

bibname = "__from_DOI.bib"
bibpath = "__from_DOI.bib"
key_list = {};
doi_key_map = {};
doi_entry_map = {};
error_strs = {};
error_strs["Resource not found."] = 404
error_strs["No acceptable resource available."] = 406
error_strs["<html><body><h1>503 Service Unavailable</h1>\n"..
           "No server is available to handle this request.\n"..
           "</body></html>"] = 503


--------------------------------------------------------------------------------
-- Pandoc Functions --
--------------------------------------------------------------------------------
-- Get bibliography filepath from yaml metadata
function Meta(m)
    local bib_data = m.bibliography
    local bibpaths = get_paths_from(bib_data)
    bibpath = get_filepath(bibname, bibpaths)
    if bibpath == nil then
        bibpath = "__from_DOI.bib"
        print("[doi2cite WARNING]: "
            .."Add '"..bibpath.."' to the bibliography list"
            .." so that citeproc can process it."
        )
    end
    local f = io.open(bibpath, "r")
    if f then
        -- Load existing entries so that known DOIs are not requested again
        local entries_str = f:read('*all')
        if entries_str then
            doi_entry_map = get_doi_entry_map(entries_str)
            doi_key_map = get_doi_key_map(entries_str)
            for doi,key in pairs(doi_key_map) do
                key_list[key] = true
            end
        end
        f:close()
    else
        -- Create an empty bibliography file and close the handle right away
        local new_file = io.open(bibpath, "w")
        if new_file == nil then
            error("Unable to make bibtex file: "..bibpath..".\n"
                .."This error may come from a missing directory. \n"
                .."doi2cite filter will not make the directory by itself. \n"
                .."Make sure that the directory for the bibtex file exists."
            )
        end
        new_file:close()
    end
end

-- Get bibtex data of doi-based citation.id and make bibliography.
-- Then replace citation.id with the resulting entry key.
function Cite(c)
    for _, citation in pairs(c.citations) do
        local id = citation.id:gsub('%s+', ''):gsub('%%2F', '/')
        -- Keep `doi` local so that it does not leak into the global namespace
        local doi = nil
        if id:sub(1,16) == "https://doi.org/" then
            doi = id:sub(17):lower()
        elseif id:sub(1,8) == "doi.org/" then
            doi = id:sub(9):lower()
        elseif id:sub(1,4) == "DOI:" or id:sub(1,4) == "doi:" then
            doi = id:sub(5):lower()
        end
        if doi then
            if doi_key_map[doi] ~= nil then
                local entry_key = doi_key_map[doi]
                citation.id = entry_key
            else
                local entry_str = get_bibentry(doi)
                if entry_str == nil or error_strs[entry_str] ~= nil then
                    print("Failed to get ref from DOI: " .. doi)
                else
                    entry_str = tex2raw(entry_str)
                    local entry_key = get_entrykey(entry_str)
                    -- On a key clash, make the key unique by appending the DOI
                    if key_list[entry_key] ~= nil then
                        entry_key = entry_key.."_"..doi
                        entry_str = replace_entrykey(entry_str, entry_key)
                    end
                    key_list[entry_key] = true
                    doi_key_map[doi] = entry_key
                    citation.id = entry_key
                    local f = io.open(bibpath, "a+")
                    if f then
                        f:write(entry_str .. "\n")
                        f:close()
                    else
                        error("Unable to open file: "..bibpath)
                    end
                end
            end
        end
    end
    return c
end


--------------------------------------------------------------------------------
-- Common Functions --
--------------------------------------------------------------------------------
-- Get bib of DOI from http://api.crossref.org
function get_bibentry(doi)
    local entry_str = doi_entry_map[doi]
    if entry_str == nil then
        print("Request DOI: " .. doi)
        local url = base_url.."/works/"
            ..doi.."/transform/application/x-bibtex"
        -- `mailto` is not defined anywhere in this file; concatenating nil
        -- would raise an error, so append the parameter only when it is set.
        if mailto then
            url = url.."?mailto="..mailto
        end
        -- fetch returns the MIME type first, then the contents
        local mt, contents = pandoc.mediabag.fetch(url)
        entry_str = contents
    end
    return entry_str
end

-- Extract designated filepaths from 1 or 2 dimensional metadata
function get_paths_from(metadata)
    local filepaths = {};
    if metadata then
        if metadata[1].text then
            filepaths[metadata[1].text] = true
        elseif type(metadata) == "table" then
            for _, datum in pairs(metadata) do
                if datum[1].text then
                    filepaths[datum[1].text] = true
                end
            end
        end
    end
    return filepaths
end

-- Extract filename from a given path
function get_filename(path)
    local len = path:len()
    local reversed = path:reverse()
    if reversed:find("/") then
        local pos = reversed:find("/")
        local fname_rev = reversed:sub(1, pos-1)
        return fname_rev:reverse()
    elseif reversed:find([[\]]) then
        local pos = reversed:find([[\]])
        local fname_rev = reversed:sub(1, pos-1)
        return fname_rev:reverse()
    else
        return path
    end
end

-- Find filename in a given filepath list and return the filepath if found
function get_filepath(filename, filepaths)
    for path, _ in pairs(filepaths) do
        -- Compare against the `filename` argument instead of shadowing it
        -- and falling back on the global `bibname`
        if get_filename(path) == filename then
            return path
        end
    end
    return nil
end

171+
-- Make some TeX descriptions processable by citeproc
172+
function tex2raw(string)
173+
local symbols = {};
174+
symbols["{\textendash}"] = ""
175+
symbols["{\textemdash}"] = ""
176+
symbols["{\textquoteright}"] = ""
177+
symbols["{\textquoteleft}"] = ""
178+
for tex, raw in pairs(symbols) do
179+
local string = string:gsub(tex, raw)
180+
end
181+
return string
182+
end
183+
184+
-- get bibtex entry key from bibtex entry string
185+
function get_entrykey(entry_string)
186+
local key = entry_string:match('@%w+{(.-),') or ''
187+
return key
188+
end
189+
190+
-- get bibtex entry doi from bibtex entry string
191+
function get_entrydoi(entry_string)
192+
local doi = entry_string:match('doi%s*=%s*["{]*(.-)["}],?') or ''
193+
return doi
194+
end
195+
196+
-- Replace entry key of "entry_string" to newkey
197+
function replace_entrykey(entry_string, newkey)
198+
entry_string = entry_string:gsub('(@%w+{).-(,)', '%1'..newkey..'%2')
199+
return entry_string
200+
end
201+
202+
-- Make hashmap which key = DOI, value = bibtex entry string
203+
function get_doi_entry_map(bibtex_string)
204+
local entries = {};
205+
for entry_str in bibtex_string:gmatch('@.-\n}\n') do
206+
local doi = get_entrydoi(entry_str)
207+
entries[doi] = entry_str
208+
end
209+
return entries
210+
end
211+
212+
-- Make hashmap which key = DOI, value = bibtex key string
213+
function get_doi_key_map(bibtex_string)
214+
local keys = {};
215+
for entry_str in bibtex_string:gmatch('@.-\n}\n') do
216+
local doi = get_entrydoi(entry_str)
217+
local key = get_entrykey(entry_str)
218+
keys[doi] = key
219+
end
220+
return keys
221+
end
222+
223+
224+
--------------------------------------------------------------------------------
225+
-- The main function --
226+
--------------------------------------------------------------------------------
227+
return {
228+
{ Meta = Meta },
229+
{ Cite = Cite }
230+
}

doi2cite/expected1.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# Introduction

The Laemmli system is one of the most widely used gel systems for the separation of proteins.[@LAEMMLI_1970]
By the way, Einstein is genius.[@Einstein_1905; @Einstein_1905_10.1002/andp.19053220806; @Einstein_1905_10.1002/andp.19053221004]

doi2cite/expected1.pdf

110 KB
Binary file not shown.

doi2cite/expected2.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Introduction

People sometimes make mistakes.[@DOI:10.1002/THIS.IS.NOT.VALID.DOI.SAMPLE]

doi2cite/my_refs.bib

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
@article{LAEMMLI_1970,
	doi = {10.1038/227680a0},
	url = {https://doi.org/10.1038%2F227680a0},
	year = 1970,
	month = {aug},
	publisher = {Springer Science and Business Media {LLC}},
	volume = {227},
	number = {5259},
	pages = {680--685},
	author = {U. K. LAEMMLI},
	title = {Cleavage of Structural Proteins during the Assembly of the Head of Bacteriophage T4},
	journal = {Nature}
}
