This repository was archived by the owner on Mar 6, 2026. It is now read-only.

feat: added system test and sample for dataframe contains array #365

Closed
HemangChothani wants to merge 1 commit into googleapis:master from MaxxleLLC:bigquery_issue_19

Conversation

@HemangChothani
Contributor

Fixes #19

@HemangChothani HemangChothani requested review from a team and tswast November 4, 2020 12:28
@HemangChothani HemangChothani requested a review from a team as a code owner November 4, 2020 12:28
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Nov 4, 2020
@snippet-bot

snippet-bot bot commented Nov 4, 2020

Here is the summary of changes.

You added 1 region tag.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Nov 4, 2020
# pyarrow 1.0.0 is required for the use of timestamp_as_object keyword.
"pyarrow >= 1.0.0, < 2.0dev",
# pyarrow 2.0.0 is required for the use of arrays in dataframe to load the table.
"pyarrow >= 2.0.0, < 3.0dev",
Contributor

Let's not bump the minimum version here. Most features do work with 1.0, and pyarrow is a core library, so it is very useful to support a wide range of versions.

None,
(
bigquery.SchemaField(
"item", "INTEGER", "NULLABLE", None, (), None
Contributor

Hmm... This is a bit of a surprising schema. It appears to match the behavior we were encountering previously. This feature is not supported if we cannot upload directly to a REPEATED INTEGER column.
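
To make the distinction concrete, here is a minimal sketch using BigQuery's JSON schema representation (plain dicts with `name`, `type`, `mode`, and nested `fields`). The `nested` layout is an assumption based on the truncated snippet above, which only shows an `item` INTEGER field inside some parent; the outer structure is a guess. The helper `is_repeated_scalar` is hypothetical and not part of any library.

```python
# BigQuery's JSON schema representation: each field is a dict with
# "name", "type", "mode", and (for RECORD types) nested "fields".

# What the review asks for: the array stored as one REPEATED INTEGER column.
desired = {"name": "A", "type": "INTEGER", "mode": "REPEATED"}

# ASSUMED shape of the surprising result: the integers end up in a nested
# "item" field of a RECORD. The outer layout is a guess, since the snippet
# in the diff is truncated.
nested = {
    "name": "A",
    "type": "RECORD",
    "mode": "NULLABLE",
    "fields": [{"name": "item", "type": "INTEGER", "mode": "NULLABLE"}],
}


def is_repeated_scalar(field, scalar_type="INTEGER"):
    """Return True if `field` is a flat REPEATED column of `scalar_type`."""
    return (
        field.get("type") == scalar_type
        and field.get("mode") == "REPEATED"
        and not field.get("fields")
    )
```

A system test along these lines could assert that the loaded table's column passes a check like `is_repeated_scalar`, rather than accepting whatever nested schema comes back.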

# table_id = "your-project.your_dataset.your_table_name"

dataframe = pandas.DataFrame({"A": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
job = client.load_table_from_dataframe(dataframe, table_id) # Make an API request.
Contributor

Without an explicit schema, this sample is no different from the generic load_table_from_dataframe sample.

I was imagining a system test XOR a sample, as they test the same behavior.
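
For illustration, a sample that differs from the generic one might pass an explicit schema declaring the column as REPEATED INTEGER. This is only a sketch under stated assumptions: the function name and the `client` parameter are hypothetical, and, per the discussion in this PR, the load may not actually produce a REPEATED INTEGER column with the library and pyarrow versions under review.

```python
def load_table_dataframe_with_array(client, table_id):
    """Hypothetical sample: load a DataFrame whose column holds lists of ints.

    `client` is assumed to be a google.cloud.bigquery.Client and `table_id`
    a string such as "your-project.your_dataset.your_table_name".
    """
    import pandas
    from google.cloud import bigquery

    dataframe = pandas.DataFrame({"A": [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})

    # Declare the column explicitly instead of relying on schema detection.
    job_config = bigquery.LoadJobConfig(
        schema=[bigquery.SchemaField("A", "INTEGER", mode="REPEATED")]
    )

    job = client.load_table_from_dataframe(
        dataframe, table_id, job_config=job_config
    )  # Make an API request.
    job.result()  # Wait for the load job to complete.
    return job
```

Whether this succeeds is exactly the open question in this review; the sketch only shows where an explicit schema would be supplied.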

@tswast
Contributor

tswast commented Nov 4, 2020

I've sent #368 to capture just the desired setup.py changes.

It's possible there are some kinds of arrays (such as arrays of records) that are supported, but it appears arrays of scalars still aren't handled correctly.

@tswast tswast closed this Nov 5, 2020

Labels

api: bigquery Issues related to the googleapis/python-bigquery API. cla: yes This human has signed the Contributor License Agreement.

Development

Successfully merging this pull request may close these issues.

BigQuery: Upload pandas DataFrame containing arrays

2 participants