Today, I’d like to introduce my new project: a SQL Server to AWS Redshift data migration tool. There’s not much tooling for this out there on the Internet, so I hope it will be valuable for some of you. It’s written in Python 3.7 and, once finished, it will be published to my GitHub account under an MIT Licence. What I’m currently doing will be described here on this blog in two phases.
Phase #1 will be all about SQL Server coding. There I’ll need to:
- extract and filter the data from the SQL Server tables that need to be transferred to AWS Redshift
- persist this data into .csv files using dynamically generated BCP commands ( these .csv files will be split based on the target Redshift tables )
- store these .csv files on a local hard drive
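The dynamically generated BCP part of the steps above could be sketched in Python along these lines. This is only an illustration under my own assumed names: the table, columns, and file paths are hypothetical, not the tool’s actual configuration.

```python
def build_bcp_command(query, out_file, server, database):
    """Build a BCP 'queryout' command that exports a filtered query
    result to a comma-delimited .csv file on the local drive."""
    return [
        "bcp", query,
        "queryout", out_file,
        "-c",           # character data type
        "-t,",          # comma as the field terminator
        "-S", server,   # SQL Server instance
        "-d", database, # source database
        "-T",           # trusted (Windows) authentication
    ]

# Hypothetical example table and export path:
cmd = build_bcp_command(
    "SELECT OrderID, Amount FROM dbo.Orders WHERE Amount > 0",
    r"C:\exports\orders.csv",
    "localhost",
    "SalesDB",
)
# subprocess.run(cmd, check=True)  # would run on a machine with the bcp utility
```

Building the command as an argument list (rather than one string) keeps it safe to hand to `subprocess.run` without shell quoting issues.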
Phase #2 will be about Python and the AWS Boto3 library, wrapping the tool together to push the data all the way through to AWS Redshift. That means:
- Upload the .csv files from Phase #1 into an AWS S3 bucket
- Run COPY commands to load these .csv files into the AWS Redshift target tables
- Clean up the files and write log data
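The load step of Phase #2 could look roughly like this. The bucket, key, table, and IAM role names are placeholders of mine, not the tool’s real values, and the Boto3 upload is shown only as a comment since it needs live AWS credentials.

```python
def build_copy_command(table, bucket, key, iam_role):
    """Build a Redshift COPY statement that loads a .csv file
    from S3 into the given target table."""
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' "
        "CSV;"
    )

# The preceding upload step would use Boto3, e.g.:
#   boto3.client("s3").upload_file("orders.csv", "my-bucket", "staging/orders.csv")

# Hypothetical target table and S3 location:
sql = build_copy_command(
    "public.orders",
    "my-bucket",
    "staging/orders.csv",
    "arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
```

The resulting statement would then be executed against the Redshift cluster through any PostgreSQL-compatible driver.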
As soon as I have this initial version out, I’d like to extend the tool to support incremental data loads based on watermarks as well.
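A watermark-based incremental extract might boil down to something as simple as the sketch below: remember the highest change timestamp from the previous run and only select rows past it. The table and column names here are assumptions for illustration.

```python
def build_incremental_query(table, watermark_column, last_watermark):
    """Build a SELECT that only picks up rows changed since the
    last recorded watermark (e.g. a ModifiedDate timestamp)."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > '{last_watermark}'"
    )

# Hypothetical source table with a ModifiedDate audit column:
q = build_incremental_query("dbo.Orders", "ModifiedDate", "2019-01-01 00:00:00")
```

After a successful load, the tool would persist the new maximum watermark value so the next run starts where this one left off.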