Load The CSV File Into BigQuery Auto Detect Schema Using Python API
Solution 1:
Based on the Google BigQuery Python API documentation, you should set source_format to 'CSV' instead of 'text/csv':
source_format='CSV'
Code Sample:
with open(csv_file.name, 'rb') as readable:
    table.upload_from_file(
        readable, source_format='CSV', skip_leading_rows=1)
Source: https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery-usage.html#datasets
If this does not solve your problem, please provide more specifics about the errors you are observing.
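With current versions of the google-cloud-bigquery library, the same local-file upload can be written with load_table_from_file plus an autodetect flag on LoadJobConfig. A minimal sketch, assuming a local data.csv and a destination table my_dataset.my_table (all names here are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,  # let BigQuery infer the schema from the CSV
)

with open("data.csv", "rb") as source_file:
    load_job = client.load_table_from_file(
        source_file, "my_dataset.my_table", job_config=job_config)

load_job.result()  # wait for the load job to finish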
Solution 2:
You can use the below code snippet to create and load data (CSV format) from Cloud Storage to BigQuery with auto-detect schema:
from google.cloud import bigquery
bigqueryClient = bigquery.Client()
jobConfig = bigquery.LoadJobConfig()
jobConfig.skip_leading_rows = 1
jobConfig.source_format = bigquery.SourceFormat.CSV
jobConfig.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
jobConfig.autodetect = True
datasetName = "dataset_name"
targetTable = "table_name"
uri = "gs://bucket_name/file.csv"
tableRef = bigqueryClient.dataset(datasetName).table(targetTable)
bigqueryJob = bigqueryClient.load_table_from_uri(uri, tableRef, job_config=jobConfig)
bigqueryJob.result()
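Once the job finishes, you can check which schema was autodetected, for example (a small sketch reusing the client and table reference from the snippet above):

table = bigqueryClient.get_table(tableRef)  # fetch the table metadata after the load
for field in table.schema:
    print(field.name, field.field_type)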
Solution 3:
Currently, the Python client has no support for loading data from a file with a schema auto-detection flag (I plan to open a pull request adding this support, but first I'd like to hear the maintainers' opinions on the implementation).
There are still ways to work around this. I haven't found a very elegant solution so far, but the code below nevertheless lets you enable schema auto-detection:
from google.cloud.bigquery import Client
import os
import google.cloud.bigquery.table as mtable

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/your/json.key'

def _configure_job_metadata(metadata,
                            allow_jagged_rows,
                            allow_quoted_newlines,
                            create_disposition,
                            encoding,
                            field_delimiter,
                            ignore_unknown_values,
                            max_bad_records,
                            quote_character,
                            skip_leading_rows,
                            write_disposition):
    load_config = metadata['configuration']['load']
    if allow_jagged_rows is not None:
        load_config['allowJaggedRows'] = allow_jagged_rows
    if allow_quoted_newlines is not None:
        load_config['allowQuotedNewlines'] = allow_quoted_newlines
    if create_disposition is not None:
        load_config['createDisposition'] = create_disposition
    if encoding is not None:
        load_config['encoding'] = encoding
    if field_delimiter is not None:
        load_config['fieldDelimiter'] = field_delimiter
    if ignore_unknown_values is not None:
        load_config['ignoreUnknownValues'] = ignore_unknown_values
    if max_bad_records is not None:
        load_config['maxBadRecords'] = max_bad_records
    if quote_character is not None:
        load_config['quote'] = quote_character
    if skip_leading_rows is not None:
        load_config['skipLeadingRows'] = skip_leading_rows
    if write_disposition is not None:
        load_config['writeDisposition'] = write_disposition
    load_config['autodetect'] = True  # --> Here you can add the option for schema auto-detection

mtable._configure_job_metadata = _configure_job_metadata

bq_client = Client()
ds = bq_client.dataset('dataset_name')
ds.table = lambda: mtable.Table('table_name', ds)
table = ds.table()

with open(source_file_name, 'rb') as source_file:
    job = table.upload_from_file(
        source_file, source_format='text/csv')
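Note that newer releases of google-cloud-bigquery expose this directly as LoadJobConfig.autodetect (as used in Solutions 2 and 4), so the monkey patch above should only be needed on older versions of the client library.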
Solution 4:
Just wanted to show how I've used the Python client.
Below is my function to create a table and load it with a CSV file.
Also, self.client is my bigquery.Client().
def insertTable(self, datasetName, tableName, csvFilePath, schema=None):
    """
    This function creates a table in the given dataset in our default project
    and inserts the data given via a csv file.

    :param datasetName: The name of the dataset in which the table needs to be created
    :param tableName: The name of the table to be created
    :param csvFilePath: The path of the file to be inserted
    :param schema: The schema of the table to be created
    :return: returns nothing
    """
    csv_file = open(csvFilePath, 'rb')

    dataset_ref = self.client.dataset(datasetName)
    # <import>: from google.cloud.bigquery import Dataset
    dataset = Dataset(dataset_ref)
    table_ref = dataset.table(tableName)
    if schema is not None:
        table = bigquery.Table(table_ref, schema)
    else:
        table = bigquery.Table(table_ref)

    try:
        self.client.delete_table(table)
    except:
        pass

    table = self.client.create_table(table)

    # <import>: from google.cloud.bigquery import LoadJobConfig
    job_config = LoadJobConfig()
    table_ref = dataset.table(tableName)
    job_config.source_format = 'CSV'
    job_config.skip_leading_rows = 1
    job_config.autodetect = True
    job = self.client.load_table_from_file(
        csv_file, table_ref, job_config=job_config)
    job.result()
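For completeness, a usage sketch, assuming the method lives on a small wrapper class whose self.client is a bigquery.Client(); the class, dataset, table, and file names below are placeholders, not part of the original answer:

from google.cloud import bigquery

class BigQueryHelper:
    def __init__(self):
        self.client = bigquery.Client()

# Attach the insertTable function defined above as a method of the wrapper class.
BigQueryHelper.insertTable = insertTable

helper = BigQueryHelper()
# Recreate my_dataset.sales_table and load sales.csv into it, letting BigQuery
# autodetect the schema because no explicit schema is passed.
helper.insertTable("my_dataset", "sales_table", "sales.csv")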
Let me know if this solves your problem.