Changing CSV structure and corresponding ETL

Shekar Tippur
Hello,

I am using Redshift to store data from CSV backups that arrive at a regular interval, and I use pyspark (with the psycopg library) to perform the ETL. The problem is that the CSV structure changes between loads, and the ETL job fails.
The specific issue I found is that the columns get mixed up.
For example, the original column list was A, B, C, D; in the next iteration the columns can be A, B, C, X, Y, D.
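To make the failure concrete, here is a minimal sketch of the header check I could run before a load (hypothetical table and connection names; assumes psycopg2 and that the CSV carries a header row):

    import csv
    import psycopg2

    # Hypothetical connection and table names.
    conn = psycopg2.connect("dbname=analytics user=etl host=example")
    cur = conn.cursor()

    # Columns the table currently has in Redshift, in table order.
    cur.execute(
        "SELECT column_name FROM information_schema.columns "
        "WHERE table_name = 'backups' ORDER BY ordinal_position"
    )
    table_cols = [row[0] for row in cur.fetchall()]

    with open("latest_backup.csv") as f:
        csv_cols = [c.strip().lower() for c in next(csv.reader(f))]

    # A positional load assumes csv_cols == table_cols; once the file
    # grows from A,B,C,D to A,B,C,X,Y,D, X's values land in D's column.
    if csv_cols != table_cols:
        new_cols = [c for c in csv_cols if c not in table_cols]
        raise ValueError("CSV structure changed; new columns: %s" % new_cols)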
I read in some of the other posts that it is not possible to alter a table to add a column at a particular position in Postgres.
The table itself currently has millions of rows, so merging tables whenever I get a change may not be a good option, i.e. creating a union of the existing table and the new data, dropping the original table, and renaming the union to the original name.
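For reference, the rebuild I am trying to avoid would look roughly like this (again a sketch with hypothetical names, assuming the new batch is already staged in backups_new):

    import psycopg2

    conn = psycopg2.connect("dbname=analytics user=etl host=example")
    with conn, conn.cursor() as cur:
        # Old rows get NULLs for the new X and Y columns. The rebuild is
        # needed because ALTER TABLE ... ADD COLUMN can only append at
        # the end of the column list.
        cur.execute("""
            CREATE TABLE backups_merged AS
            SELECT a, b, c,
                   CAST(NULL AS varchar) AS x,
                   CAST(NULL AS varchar) AS y,
                   d
            FROM backups
            UNION ALL
            SELECT a, b, c, x, y, d FROM backups_new
        """)
        cur.execute("DROP TABLE backups")
        cur.execute("ALTER TABLE backups_merged RENAME TO backups")

On millions of rows, the CREATE TABLE AS step rewrites the whole table every time, which is the cost I would like to avoid.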
Any pointers on how to proceed?

Thanks,
Shekar
Sent from my iPhone