OMOP-CDM ETL with Multi-DB Support
Designed and developed a secure, containerized ETL platform tailored for large-scale patient data processing in the healthcare sector. The system automates data ingestion, transformation, and loading workflows with out-of-the-box support for OMOP-CDM, ensuring compliance and interoperability. The solution improves efficiency, reduces manual intervention, and enables data-driven healthcare analytics and decision-making.
Workflow:
Designed a declarative ETL framework enabling schema and job configurations via YAML files.
Developed a CLI using Python Click to manage data workflows (create, populate, run).
Built transformation pipelines to standardize data into OMOP-CDM format.
Enabled support for PostgreSQL, MSSQL, and Oracle backends using SQLAlchemy.
Ensured performance and scalability through incremental ETL job execution.
Wrote unit and integration tests using Pytest and CliRunner to ensure reliability.
Containerized the entire ETL system using Docker and automated deployment to a cloud environment via GitLab CI.
Leveraged AWS RDS for PostgreSQL and S3 for secure data storage and ingestion workflows.
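
The sketch below illustrates how a declarative job definition and incremental execution can fit together. It is a minimal illustration under stated assumptions, not the project's actual code: the `JOB` dictionary stands in for what a YAML job file might parse into, the `etl_watermark` table and the `run_incremental` and `demo` functions are hypothetical names, and the standard-library `sqlite3` module is used in place of SQLAlchemy with PostgreSQL, MSSQL, or Oracle so that the example runs without external dependencies.

```python
import sqlite3

# Hypothetical job definition, shaped like what a YAML job file might parse into.
JOB = {
    "source_table": "raw_patients",
    "target_table": "person",            # OMOP-CDM person table (two columns only)
    "watermark_column": "updated_at",
    "columns": {"patient_id": "person_id", "birth_year": "year_of_birth"},
}


def run_incremental(conn, job):
    """Copy only source rows newer than the stored watermark; return rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_watermark (job TEXT PRIMARY KEY, value TEXT)"
    )
    row = conn.execute(
        "SELECT value FROM etl_watermark WHERE job = ?", (job["target_table"],)
    ).fetchone()
    last = row[0] if row else ""  # empty string sorts before any ISO date string

    # Identifiers come from the trusted job config; row values are bound parameters.
    src_cols = ", ".join(job["columns"].keys())
    tgt_cols = ", ".join(job["columns"].values())
    wm = job["watermark_column"]
    rows = conn.execute(
        f"SELECT {src_cols}, {wm} FROM {job['source_table']} "
        f"WHERE {wm} > ? ORDER BY {wm}",
        (last,),
    ).fetchall()
    if not rows:
        return 0

    placeholders = ", ".join("?" for _ in job["columns"])
    conn.executemany(
        f"INSERT INTO {job['target_table']} ({tgt_cols}) VALUES ({placeholders})",
        [r[:-1] for r in rows],  # drop the trailing watermark value
    )
    # Advance the watermark to the newest row that was just loaded.
    conn.execute(
        "INSERT OR REPLACE INTO etl_watermark (job, value) VALUES (?, ?)",
        (job["target_table"], rows[-1][-1]),
    )
    conn.commit()
    return len(rows)


def demo():
    """Run the job three times against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_patients (patient_id INTEGER, birth_year INTEGER, updated_at TEXT)"
    )
    conn.execute("CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER)")
    conn.executemany(
        "INSERT INTO raw_patients VALUES (?, ?, ?)",
        [(1, 1980, "2024-01-01"), (2, 1975, "2024-01-02")],
    )
    first = run_incremental(conn, JOB)   # initial load
    conn.execute("INSERT INTO raw_patients VALUES (3, 1990, '2024-01-03')")
    second = run_incremental(conn, JOB)  # picks up only the new row
    third = run_incremental(conn, JOB)   # nothing new to load
    total = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    return first, second, third, total
```

Calling `demo()` loads two rows on the first run, one on the second, and none on the third, so reruns do not reprocess data that has already been loaded. A production version would likely wrap the load and the watermark update in a single transaction per backend and compare timestamps with native date types instead of strings.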
Services
Responsibilities
Python
Click
Pandas
SQLAlchemy
PostgreSQL
MSSQL
Oracle
Docker
GitLab CI/CD APIs