Data Engineer - India Data Portal, Bharti Institute of Public Policy

We are looking for a team of passionate Data Engineers to join us in building the best open data portal in the country. India Data Portal aims to be a one-stop, open-access portal for all public data related to the social sector and the environment, helping a range of stakeholders – researchers, students, journalists, policymakers, administrators, NGOs, and entrepreneurs – access and interact with public datasets.

What do we do?

·       Strive to build a single portal for public data that provides seamless access to the data, comparisons across datasets, and visualisations.

·       Make data available in machine-readable formats via automated and semi-automated data pipelines.

·       Engage with key stakeholders to improve open data usage across domains and industries.

Job Purpose:

·       Work on multiple projects that collect data using web-scraping techniques, then automate or semi-automate the data ingestion workflows

·       Ensure data is transformed and stored efficiently and securely for retrieval and use

·       Track and maintain dataset updates from one period to the next

·       Create precise documentation and follow code standards, code management practices, and automated systems to ensure a smooth daily workflow

·       Take responsibility for the efficiency and effectiveness of ETL jobs

·       Maintain deep technical knowledge of the tools used in the data warehouse, data hub, and analytics stack

·       Work with interns from various top institutions in the country to plan and deliver work.

Skills Required:

·       Minimum 2-3 years of experience working on ETL pipelines using Python

·       Should be familiar with Apache Airflow, dbt, or other orchestration technologies

·       Expertise in web scraping with Python using Selenium, Scrapy, Requests, Beautiful Soup, and other libraries

·       Hands-on experience with RDBMS and cloud storage such as Amazon S3 or Google Cloud Storage

·       Excellent knowledge of data transformations using pandas, NumPy and other libraries

·       Knowledge of cloud implementation/deployment on any one of Azure, GCP, or AWS is good to have

·       Experience implementing serverless or cloud functions

·       Experience with data exchange products such as CKAN will be an advantage

·       Should be independent and able to work in a fast-paced environment with minimal supervision

·       Good verbal and written communication skills to coordinate with the team and deliver


Do you wish to apply for this role? Share your profile.

Contact us at

040 23187777

0172 4591800


Monday to Friday, 08:00 to 18:00