bug: Unreliable image link extraction.

I have a simple scenario:
- I create a document (tested with Google Docs or LibreOffice)
- I add some text and insert an image
- I export/download the document as PDF
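For a reproduction that does not depend on Google Docs or LibreOffice, a small stand-in PDF with one line of text and one embedded image can be generated using only the Python standard library. This is a sketch under assumptions. `build_pdf_with_image` is a hypothetical helper. The file is hand-written and laid out differently from what either editor exports. It has not been confirmed that kreuzberg v4.9.2 drops the image link for this particular file.

```python
import zlib


def build_pdf_with_image(text: str = "Hello, image below.") -> bytes:
    """Hypothetical helper: one-page PDF with a text line and a 2x2 RGB image.

    Hand-written PDF 1.4 structure (not what Google Docs or LibreOffice emit).
    """
    # 2x2 RGB pixels: red, green / blue, white (8 bits per component)
    pixels = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255])
    image_data = zlib.compress(pixels)  # zlib stream, as /FlateDecode expects

    # Escape characters that are special inside PDF literal strings
    safe_text = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = (
        f"BT /F1 12 Tf 72 720 Td ({safe_text}) Tj ET\n"
        "q 144 0 0 144 72 540 cm /Im1 Do Q\n"  # draw the image at 144x144 pt
    ).encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> /XObject << /Im1 5 0 R >> >> "
        b"/Contents 6 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Type /XObject /Subtype /Image /Width 2 /Height 2 "
        b"/ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /FlateDecode "
        b"/Length " + str(len(image_data)).encode() + b" >>\nstream\n"
        + image_data + b"\nendstream",
        b"<< /Length " + str(len(content)).encode() + b" >>\nstream\n"
        + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"

    # Cross-reference table: every entry is exactly 20 bytes
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode()
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode()
    return bytes(out)
```

To write the fixture to disk: `open("image_repro.pdf", "wb").write(build_pdf_with_image())`. The resulting file can then be passed to kreuzberg with the configuration from this issue.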

Expected behaviour:
When I extract images and content from the PDF with kreuzberg (to markdown), the markdown output should contain a link to the image (e.g. `![](image_3.jpeg)`).
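To check the expected behaviour programmatically, the markdown output can be scanned for inline image links. The helper below, `find_image_links`, is hypothetical and not part of kreuzberg. It only matches inline images of the form `![alt](target)` or `![alt](target "title")`, not reference-style images.

```python
import re

# Inline markdown images: ![alt](target) with an optional "title" after the target.
# Hypothetical helper for this bug report; not part of kreuzberg.
_IMAGE_LINK = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')


def find_image_links(markdown: str) -> list[str]:
    """Return the targets of all inline markdown image links, in document order."""
    return _IMAGE_LINK.findall(markdown)
```

An empty list for a PDF that visibly contains an image corresponds to the "No link extracted" result reported below.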

Actual behaviour:
No link extracted


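The scenario is easy to reproduce, for example with this configuration:
```python
from kreuzberg import ExtractionConfig, ImageExtractionConfig

# Markdown output with image extraction enabled
config = ExtractionConfig(
    output_format="markdown",
    images=ImageExtractionConfig(
        extract_images=True,
    ),
)
```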

I tried various text editors, images, text variations and kreuzberg configs, but had no luck.
So I think it's a bug. (My kreuzberg version: v4.9.2, Python.)

bug: Unreliable image link extraction. #762