Copy data as fast as possible with a job in the cloud (BigQuery and Cloud SQL) #803
-
Hi, I'm trying to copy data between two data sources; if you're curious, between BigQuery and Cloud SQL, but that's not really relevant here, and the question should be transposable to other types of database. I can dump data from my first data source (BigQuery) into files, either json or csv. I can then read from these files (either in batch or streaming), and I want to write this data into a postgres database (Cloud SQL) as quickly and efficiently as possible. I'm using a job (a Cloud Run job) to read the data from the files. It looks like this:

with open("data/super_data.csv", "r") as f:
    with global_pool.connection() as conn:
        with conn.cursor() as cur:
            with cur.copy("COPY my_table FROM STDIN") as copy:
                while data := f.read(1024):
                    copy.write(data)

I have a few questions regarding this approach:
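(`global_pool` isn't defined in the snippet; presumably it's a psycopg_pool.ConnectionPool created once at job startup, roughly along these lines; the connection string and pool sizes here are placeholders, not from the original post:)

import os

from psycopg_pool import ConnectionPool

# Hypothetical setup: the DSN comes from the environment; in a Cloud Run
# job it would typically point at the Cloud SQL instance (for example
# through a unix socket mount or the Cloud SQL connector).
global_pool = ConnectionPool(os.environ["DATABASE_URL"], min_size=1, max_size=4)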
-
That's pretty much a perfect answer, thank you very much. The psql equivalent is below, so I imagine that postgres knows how to parse json. Note that postgres requires newline-delimited json:
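(Presumably that command was something along the lines of `\copy my_table from 'data/super_data.json'`, against a table with a single json or jsonb column. A psycopg sketch of the same idea; the table, column and file names are illustrative:)

with global_pool.connection() as conn:
    with conn.cursor() as cur:
        # One jsonb column: each newline-delimited json document becomes
        # one row, and Postgres itself parses it on input. (Caveat: COPY's
        # text format interprets backslashes, so json containing escape
        # sequences may need extra care.)
        with cur.copy("COPY my_table (data) FROM STDIN") as copy:
            with open("data/super_data.json") as f:
                for line in f:
                    copy.write(line)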
I imagine this is still valid:

with cur.copy("COPY my_table FROM STDIN") as copy:
    while data := f.read(1024):
        copy.write(data)

But I need to figure out how to read the json file (it's not as easy as just reading it in one go). Just for your curiosity, it's not that I prefer json over csv; it's that people very often use nested fields in BigQuery, and those can't easily be exported to csv.
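(One possibility, assuming the file is already newline-delimited json headed for a single jsonb column as above: keep the exact same block-by-block loop, since COPY only needs the raw bytes. A sketch reusing the names from the earlier snippets:)

# Binary mode avoids a decode/encode round trip; a larger block size
# means fewer calls per file.
with open("data/super_data.json", "rb") as f:
    with global_pool.connection() as conn:
        with conn.cursor() as cur:
            with cur.copy("COPY my_table (data) FROM STDIN") as copy:
                while data := f.read(65536):
                    copy.write(data)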
-
Using write() instead of write_row() is definitely a better approach: you read block by block and write block by block, with no Python parsing.
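(For contrast, the row-by-row variant this advises against would look roughly like this, assuming the single-jsonb-column table from the sketches above: each line is parsed in Python and re-serialized by the driver.)

import json

from psycopg.types.json import Jsonb

with global_pool.connection() as conn:
    with conn.cursor() as cur:
        with cur.copy("COPY my_table (data) FROM STDIN") as copy:
            with open("data/super_data.json") as f:
                for line in f:
                    # Parsed and re-dumped in Python: correct, but slower
                    # than streaming the raw bytes straight through.
                    copy.write_row((Jsonb(json.loads(line)),))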