This repository was archived by the owner on Jun 30, 2025. It is now read-only.

Commit f97e5f8

feat(extractors): add image extractor
1 parent 8dcd6bc commit f97e5f8

11 files changed (+143, -6 lines)

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -30,3 +30,4 @@ logs
 coverage
 cache
 .zed
+*.traineddata

fixtures/006.expected

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+Lorem ipsum
+
+Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sapien ante conubia vestibulum
+ultrices quisque nam nascetur consectetur. Viverra amet lacinia massa donec gravida primis
+leo tellus. Montes nulla sit cras odio penatibus cum aenean metus. Per per eros fusce et
+platea et feugiat ullamcorper.

fixtures/006.png

22.4 KB

fixtures/007.expected

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+at his touch of a certain icy pang along my blood. “Come, sir,” said I.
+“You forget that I have not yet the pleasure of your acquaintance. Be
+seated, if you please.” And I showed him an example, and sat down
+myself in my customary seat and with as fair an imitation of my or-
+dinary manner to a patient, as the lateness of the hour, the nature of
+my preoccupations, and the horror I had of my visitor, would suffer
+me to muster.
+
+“I beg your pardon, Dr. Lanyon,” he replied civilly enough. “What
+you say is very well founded; and my impatience has shown its heels
+to my politeness. I come here at the instance of your colleague, Dr.
+Henry Jekyll, on a piece of business of some moment; and I under-
+stood...” He paused and put his hand to his throat, and I could see,
+in spite of his collected manner, that he was wrestling against the
+approaches of the hysteria—“T understood, a drawer...”
+
+But here I took pity on my visitor's suspense, and some perhaps
+on my own growing curiosity.
+
+“There it is, sir,” said I, pointing to the drawer, where it lay on the
+floor behind a table and still covered with the sheet.
+
+He sprang to it, and then paused, and laid his hand upon his
+heart: I could hear his teeth grate with the convulsive action of his
+jaws; and his face was so ghastly to see that I grew alarmed both for
+his life and reason.
+
+“Compose yourself,” said I.
+
+He turned a dreadful smile to me, and as if with the decision of
+despair, plucked away the sheet. At sight of the contents, he uttered
+one loud sob of such immense relief that I sat petrified. And the
+next moment, in a voice that was already fairly well under control,
+“Have you a graduated glass?” he asked.
+
+I rose from my place with something of an effort and gave him
+what he asked.
+
+He thanked me with a smiling nod, measured out a few min-
+ims of the red tincture and added one of the powders. The mix-
+ture, which was at first of a reddish hue, began, in proportion as the

fixtures/007.jpg

393 KB

fixtures/008.expected

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+Lorem ipsum
+
+Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sapien ante conubia vestibulum
+ultrices quisque nam nascetur consectetur. Viverra amet lacinia massa donec gravida primis
+leo tellus. Montes nulla sit cras odio penatibus cum aenean metus. Per per eros fusce et
+platea et feugiat ullamcorper.

fixtures/008.gif

12.6 KB

package.json

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@
     "release": "bumpp --commit --tag --push"
   },
   "dependencies": {
+    "tesseract.js": "^6.0.0",
     "unpdf": "^0.12.1"
   },
   "devDependencies": {

pnpm-lock.yaml

Lines changed: 63 additions & 6 deletions
Some generated files are not rendered by default.

src/extractors.registry.ts

Lines changed: 2 additions & 0 deletions
@@ -1,10 +1,12 @@
 import type { ExtractorDefinition } from './extractors.models';
+import { imageExtractorDefinition } from './extractors/img.extractor';
 import { pdfExtractorDefinition } from './extractors/pdf.extractor';
 import { txtExtractorDefinition } from './extractors/txt.extractor';
 
 export const extractorDefinitions: ExtractorDefinition[] = [
   pdfExtractorDefinition,
   txtExtractorDefinition,
+  imageExtractorDefinition,
 ];
 
 export function getExtractor({
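
The hunk above registers the new image extractor alongside the PDF and text ones, but the body of `getExtractor` and the `ExtractorDefinition` type are not visible in this view. As a rough illustration of how a registry like this is commonly consumed, here is a minimal sketch. Everything in it is an assumption made for the example: the `name`, `mimeTypes`, and `extract` fields, the `makeStub` and `findExtractor` helpers, and the MIME type lists (chosen only to mirror the `.png`, `.jpg`, and `.gif` fixtures). The stubs stand in for the real extractors and do not perform OCR.

```typescript
// Hypothetical definition shape; the real ExtractorDefinition lives in
// './extractors.models' and is not shown in this commit view.
type ExtractorDefinition = {
  name: string;
  mimeTypes: string[];
  extract: (args: { arrayBuffer: ArrayBuffer }) => Promise<{ content: string }>;
};

// Stub factory standing in for the real pdf/txt/image extractors.
const makeStub = (name: string, mimeTypes: string[]): ExtractorDefinition => ({
  name,
  mimeTypes,
  // Returns a placeholder string instead of extracting real text.
  extract: async () => ({ content: `[${name} content]` }),
});

// Same ordering as the registry in the diff: pdf, txt, then image.
const extractorDefinitions: ExtractorDefinition[] = [
  makeStub('pdf', ['application/pdf']),
  makeStub('txt', ['text/plain']),
  makeStub('image', ['image/png', 'image/jpeg', 'image/gif']),
];

// Returns the first definition that lists the MIME type, or undefined.
function findExtractor(mimeType: string): ExtractorDefinition | undefined {
  return extractorDefinitions.find((definition) => definition.mimeTypes.includes(mimeType));
}
```

With a lookup of this kind, supporting a new format only requires appending a definition to the array, which matches the two-line change in the diff.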
