Data Engineer - India Data Portal, Bharti Institute of Public Policy
We are looking for a team of passionate Data Engineers to join us in building the best open data portal in the country. India Data Portal aims to serve as a one-stop, open-access portal for all public data related to the social sector and the environment. It supports a wide range of stakeholders (researchers, students, journalists, policymakers, administrators, NGOs, and entrepreneurs) in accessing and interacting with public datasets.
What do we do?
Strive to build a platform that brings public data onto a single portal, providing seamless access to the data, inter-data comparisons, and visualisations.
We make data available in machine-readable formats via automated and semi-automated data pipelines.
Engage with key stakeholders to improve open data usage across domains and industries.
Work on multiple data collection projects using web-scraping techniques, followed by automation or semi-automation of data ingestion workflows.
Ensure data is transformed and stored using efficient and secure methods for retrieval and use.
Maintain and track dataset updates from period to period.
Create precise documentation and follow code standards, code management practices, and automated systems to ensure a smooth daily workflow.
Take responsibility for the efficiency and effectiveness of ETL jobs.
Maintain deep technical knowledge of the various tools across the data warehouse, the data hub, and the analytics stack.
Work with interns from top institutions across the country to plan and deliver work.
Minimum of 2-3 years of experience building ETL pipelines using Python
Should be familiar with Apache Airflow, dbt, or other orchestration technologies
Expertise in web scraping with Python using Selenium, Scrapy, Requests, Beautiful Soup, and other libraries
Hands-on experience with RDBMS and cloud storage such as Amazon S3 or Google Cloud Storage
Excellent knowledge of data transformations using pandas, NumPy, and other libraries
Good to have: knowledge of cloud implementation and deployment (any one of Azure, GCP, or AWS)
Experience implementing serverless functions or Cloud Functions
Experience with data exchange platforms such as CKAN will be an advantage
Should be independent and able to work in a fast-paced environment with minimal supervision
Good verbal and written communication skills to coordinate effectively with the team and deliver work
Do you wish to apply for this role? Share your profile.