Data Engineer - India Data Portal, Bharti Institute of Public Policy

We are looking for a team of passionate Data Engineers to join hands with us in building the best open data portal in the country. India Data Portal has the ambition to serve as a one-stop open-access portal for all public data related to the social sector and the environment to support an array of different stakeholders – researchers, students, journalists, policymakers, administrators, NGOs, and entrepreneurs – to access and interact with public datasets.

What do we do?

  • Strive to build a platform that brings public data together on a single portal, providing seamless access to the data, inter-data comparisons, and visualisations.
  • Make data available in machine-readable formats via automated and semi-automated data pipelines.
  • Engage with key stakeholders to improve open data usage across domains and industries.

Job Purpose:

  • Work across multiple projects to collect data using web-scraping techniques, then automate or semi-automate the data ingestion workflows
  • Ensure data is transformed and stored efficiently and securely for retrieval and use
  • Maintain and track dataset updates from period to period
  • Create precise documentation and follow coding standards, code management practices, and automated systems to ensure a smooth daily workflow
  • Take responsibility for the efficiency and effectiveness of ETL jobs
  • Maintain deep technical knowledge of the data warehouse, data hub, and analytical tools.
  • Work with interns from various top institutions in the country to plan and deliver work.

Skills Required:

  • Minimum 2-3 years of experience working on ETL pipelines using Python
  • Should be familiar with Apache Airflow, dbt, or other orchestration technologies
  • Expertise in web scraping with Python, using Selenium, Scrapy, Requests, Beautiful Soup, and other libraries
  • Hands-on experience with RDBMS and cloud storage such as Amazon S3 or Google Cloud Storage
  • Excellent knowledge of data transformations using pandas, NumPy and other libraries
  • Good to have: knowledge of cloud implementation/deployment (any one of Azure, GCP, or AWS)
  • Experience implementing serverless or cloud functions
  • Experience with data exchange products such as CKAN will be an advantage
  • Should be independent and able to work in a fast-paced environment with minimal supervision
  • Good communication skills (verbal/written) to effectively communicate and work with the team to coordinate and deliver

Do you wish to apply for this role? Share your profile

Contact us at

040 23187777

0172 4591800


Monday to Friday, 02:30 to 12:30