Reading A Docx File From S3 Bucket With Flask Results In An AttributeError

July 25, 2024

I got so many different errors that I don't even know which one is pertinent to mention. It's not about the credentials, though, because I can already upload files and I can read a txt file. Now

Solution 1:

I checked out the documentation for python-docx, specifically the Document-constructor:

docx.Document(docx=None)
Return a Document object loaded from docx, where docx can be either a path to a .docx file (a string) or a file-like object. If docx is missing or None, the built-in default document “template” is loaded.

It seems to expect a file-like object or the path to a file. We can turn the different representations we get from boto3 into a file-like object. Here's some sample code:

import io

import boto3
import docx

BUCKET_NAME = "my-bucket"


def main():
    s3 = boto3.resource("s3")
    bucket = s3.Bucket(BUCKET_NAME)

    object_in_s3 = bucket.Object("test.docx")
    object_as_streaming_body = object_in_s3.get()["Body"]
    print(f"Type of object_as_streaming_body: {type(object_as_streaming_body)}")
    object_as_bytes = object_as_streaming_body.read()
    print(f"Type of object_as_bytes: {type(object_as_bytes)}")

    # Now we use BytesIO to create a file-like object from the bytes we read
    object_as_file_like = io.BytesIO(object_as_bytes)
    
    # Et voila!
    document = docx.Document(docx=object_as_file_like)

    print(document.paragraphs)

if __name__ == "__main__":
    main()

This is what it looks like:

$ python test.py
Type of object_as_streaming_body: <class 'botocore.response.StreamingBody'>
Type of object_as_bytes: <class 'bytes'>
[<docx.text.paragraph.Paragraph object at 0x00000258B7C34A30>]
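Why does the BytesIO step matter? A .docx file is a ZIP archive, and python-docx opens it with the standard-library zipfile module, which needs to seek() within the file. The original error message isn't shown in the question, so the following is only a plausible reading of the AttributeError from the title: a stream that offers read() but no seek() fails with exactly that kind of error. The sketch below uses only the standard library. ReadOnlyStream is a hypothetical stand-in for a streaming response body (it is not the real botocore class), and make_zip_bytes, list_members and try_stream_directly are helper names made up for illustration.

```python
import io
import zipfile


class ReadOnlyStream:
    """Hypothetical stand-in for a streaming body: it has read() but no seek()."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, size: int = -1) -> bytes:
        return self._buffer.read(size)


def make_zip_bytes() -> bytes:
    """Build a tiny ZIP archive in memory (a .docx file is a ZIP archive too)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


def list_members(file_like) -> list:
    """Open a file-like object as a ZIP archive and return its member names."""
    with zipfile.ZipFile(file_like) as archive:
        return archive.namelist()


def try_stream_directly(data: bytes) -> str:
    """Return the name of the exception raised for the non-seekable stream."""
    try:
        list_members(ReadOnlyStream(data))
    except Exception as error:
        return type(error).__name__
    return "no error"


if __name__ == "__main__":
    data = make_zip_bytes()

    # Passing the read-only stream straight to zipfile fails because
    # zipfile calls seek() on it.
    print(try_stream_directly(data))

    # Reading everything into memory and wrapping it in BytesIO works.
    print(list_members(io.BytesIO(ReadOnlyStream(data).read())))
```

The trade-off of this approach is that the whole object is held in memory, which is usually fine for Word documents but worth keeping in mind for very large files.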
