
Import Data To SQL Using Python

I need to import 30k rows of data from a CSV file into a Vertica database. The code I've tried is taking more than an hour to do so. I'm wondering if there's a faster way.

Solution 1:

You want to use Vertica's COPY command.

COPY Table FROM '/path/to/csv/file.csv' DELIMITER ',';

This is much faster than inserting rows one at a time.
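Note that plain COPY FROM '/path' reads the file from the Vertica node's filesystem, so this form only works if the file is visible to the server. A minimal sketch of issuing it from Python, assuming an already-open vertica_python cursor named cur:

# Server-side COPY: the path below must be readable by the Vertica node.
# `cur` is assumed to be an open vertica_python cursor.
cur.execute("COPY Table FROM '/path/to/csv/file.csv' DELIMITER ','")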

Since you are using Python, I would recommend the vertica_python module, as it has a very convenient copy method on its cursor object (see the vertica-python GitHub page).

The syntax for using COPY with vertica-python is as follows:

# `connection` is an open vertica_python connection and `cur` its cursor.
with open('file.csv', 'r') as f:
    csv_file = f.read()
    copy_cmd = "COPY Table FROM STDIN DELIMITER ','"
    cur.copy(copy_cmd, csv_file)
    connection.commit()
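For completeness, here is a self-contained sketch of the whole flow. The connection details and table name are placeholders, not values from the question, and cursor.copy() also accepts a file-like object, which avoids reading the whole file into memory:

import vertica_python

# Placeholder connection details -- substitute your own.
conn_info = {
    'host': 'vertica.example.com',
    'port': 5433,
    'user': 'dbadmin',
    'password': 'secret',
    'database': 'mydb',
}

with vertica_python.connect(**conn_info) as connection:
    cur = connection.cursor()
    # Stream the file object directly instead of reading it into a string.
    with open('file.csv', 'r') as f:
        cur.copy("COPY Table FROM STDIN DELIMITER ','", f)
    connection.commit()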

Another thing you can do to speed up the process is to compress the CSV file. Vertica can read gzip-, bzip-, and lzo-compressed files.

# Read the gzipped bytes as-is; note the binary mode.
with open('file.csv.gz', 'rb') as f:
    gzipped_csv_file = f.read()
    copy_cmd = "COPY Table FROM STDIN GZIP DELIMITER ','"
    cur.copy(copy_cmd, gzipped_csv_file)
    connection.commit()

Copying compressed files reduces network time, so you have to determine whether the extra time it takes to compress the CSV file is made up for by the time saved transferring it. In most cases I've dealt with, it is worth compressing the file.
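If the file is not already compressed, the standard library can gzip it first. A sketch (filenames are placeholders, and cur/connection are assumed to be open as above):

import gzip
import shutil

# Compress the CSV once, then stream the gzipped bytes to Vertica.
with open('file.csv', 'rb') as src, gzip.open('file.csv.gz', 'wb') as dst:
    shutil.copyfileobj(src, dst)

with open('file.csv.gz', 'rb') as f:
    cur.copy("COPY Table FROM STDIN GZIP DELIMITER ','", f)
connection.commit()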
