Data Engineer - India Data Portal, Bharti Institute of Public Policy

We are looking for a team of passionate Data Engineers to join hands with us in building the best open data portal in the country. India Data Portal has the ambition to serve as a one-stop open-access portal for all public data related to the social sector and the environment to support an array of different stakeholders – researchers, students, journalists, policymakers, administrators, NGOs, and entrepreneurs – to access and interact with public datasets.

What do we do?

  • Strive to build a platform that brings public data onto a single portal, providing seamless access to the data, inter-data comparisons, and visualisations.
  • Make data available in machine-readable formats via automated and semi-automated data pipelines.
  • Engage with key stakeholders to improve open data usage across various domains and industries.
     

Job Purpose:

  • Work across multiple projects on data collection using web-scraping techniques, followed by automation or semi-automation of data ingestion workflows
  • Ensure data is transformed and stored efficiently and securely for retrieval and use
  • Maintain and track dataset updates from period to period
  • Create precise documentation and follow code standards, code management practices, and automated systems to ensure a smooth daily workflow
  • Be responsible for the efficiency and effectiveness of ETL jobs
  • Maintain deep technical knowledge of the various data warehouse, data hub, and analytical tools
  • Work with interns from various top institutions in the country to plan and deliver work.
     

Skills Required:

  • Minimum 2-3 years of experience working on ETL pipelines using Python
  • Should be familiar with using Apache Airflow, dbt, or other orchestration technologies
  • Expertise in web scraping with Python, using Selenium, Scrapy, Requests, Beautiful Soup, and other libraries
  • Hands-on experience with RDBMS and cloud storage such as Amazon S3 or Google Cloud Storage
  • Excellent knowledge of data transformations using pandas, NumPy and other libraries
  • Good to have: knowledge of cloud implementation/deployment (any one of Azure, GCP, or Amazon)
  • Good to have: experience implementing serverless or cloud functions
  • Experience with data exchange products such as CKAN will be an advantage
  • Should be independent and able to work in a fast-paced environment with minimal supervision
  • Good communication skills (verbal/written) to effectively communicate and work with the team to coordinate and deliver

Do you wish to apply for this role? Share your profile

Contact us at

040 23187777

0172 4591800

Timings

Monday-Friday, 08:00 AM to 06:00 PM IST