Shape the Future with Dun & Bradstreet
At Dun & Bradstreet, we believe data has the power to create a better tomorrow. As a global leader in business decisioning data and analytics, we help companies worldwide grow, manage risk, and innovate. For over 180 years, businesses have trusted us to turn uncertainty into opportunity. We’re a diverse, global team that values creativity, collaboration, and bold ideas. Are you ready to make an impact and help shape what’s next? Join us! Explore opportunities at dnb.com/careers.
Job Summary:
We are looking for a skilled Data Engineer to join our Global Product Data (GPD) team in Hyderabad. You will play a critical role in building and maintaining automated web scraping pipelines that extract structured data from diverse online sources, transforming raw data into production-ready datasets for our Master Data Repository (MDR).
This role is part of a strategic initiative to bring web scraping and data acquisition capabilities in-house, replacing external vendor dependencies. You will work closely with the data engineering and product teams to ensure high-quality, reliable, and timely data delivery.
Key Responsibilities:
Design, develop, and maintain scalable web scraping solutions to extract data from a wide range of websites and online platforms
Build robust data pipelines and automation workflows for data collection, cleaning, validation, and transformation
Process and transform scraped data into MDR production-ready formats, meeting strict quality and timeline requirements
Monitor and troubleshoot scraping jobs, handling anti-bot mechanisms, CAPTCHAs, rate limiting, and site structure changes
Collaborate with cross-functional teams to understand data requirements, prioritize sources, and define scraping specifications
Document scraping processes, data schemas, and technical decisions for knowledge sharing and continuity
Identify opportunities for process improvement and automation to increase efficiency and reduce turnaround time
Support the transition of work from external vendors, ensuring seamless continuity of data deliveries
Key Skills:
8+ years of professional experience in web scraping, data extraction, or data engineering
Strong proficiency in Python, with hands-on experience using scraping libraries and frameworks (Scrapy, BeautifulSoup, Selenium, Playwright, or similar)
Experience building and scheduling automated data pipelines (cron, Airflow, or equivalent orchestration tools)
Solid understanding of HTML, CSS, DOM structure, and browser developer tools for inspecting and reverse-engineering web pages
Familiarity with REST APIs, JSON, and techniques for extracting data from API endpoints
Experience with relational databases (PostgreSQL, MySQL) and proficiency in SQL
Ability to handle anti-scraping measures: proxy rotation, headless browsers, CAPTCHA handling, and request throttling
Strong problem-solving skills and attention to data quality and accuracy
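As a small illustration of the HTML/DOM extraction skills listed above, the sketch below pulls structured fields out of a static HTML fragment using only Python's standard library. In practice the role would use frameworks such as Scrapy, BeautifulSoup, or Playwright; the page structure and class names here are hypothetical.

```python
from html.parser import HTMLParser

class CompanyCellParser(HTMLParser):
    """Collect text from <td class="company"> cells in an HTML table."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.companies = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "td" and ("class", "company") in attrs:
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.companies.append(data.strip())

html = """
<table>
  <tr><td class="company">Acme Corp</td><td>US</td></tr>
  <tr><td class="company">Globex Ltd</td><td>UK</td></tr>
</table>
"""

parser = CompanyCellParser()
parser.feed(html)
print(parser.companies)  # ['Acme Corp', 'Globex Ltd']
```

A dedicated scraping framework adds the pieces this sketch omits: request scheduling, retries, throttling, and resilience to site structure changes.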
Good-to-Have Skills:
Experience with cloud platforms (AWS, GCP, or Azure) for deploying and scaling scraping infrastructure
Familiarity with containerization (Docker) and CI/CD pipelines
Experience with data transformation tools or ETL frameworks
Knowledge of natural language processing (NLP) or AI-assisted data extraction techniques
Prior experience in education data, institutional data, or similar structured-data domains
Experience with NoSQL databases (MongoDB, Elasticsearch) for handling semi-structured data