PEI Group is a subscriber-focused business intelligence company covering private investment markets in real estate, infrastructure, private equity, private debt, and specialist sector-specific activity within private asset classes. We provide industry-leading journalism, data, and market insight to subscribing clients via a wide portfolio of specialist brands supported by our robust and scalable digital publishing, analytics, and database platform.
Since its inception in 2001, PEI Group has grown into a global business with a multi-talented team of over 400 people spread across EMEA, the USA, and Asia. Our purpose is to inform and connect investment professionals across global, specialised markets.
As a Senior Data Engineer, you will be responsible for designing, implementing, and maintaining data processing pipelines and workflows using Databricks on the Azure platform. Your expertise in PySpark, SQL, Databricks, test-driven development, and Docker will be essential to the success of our data engineering initiatives.
Roles and Responsibilities:
- Collaborate with cross-functional teams to understand data requirements and design scalable and efficient data processing solutions.
- Develop and maintain data pipelines using PySpark and SQL on the Databricks platform, as illustrated in the sketch after this list.
- Optimize and tune data processing jobs for performance and reliability.
- Implement automated testing and monitoring processes to ensure data quality and reliability.
- Work closely with data scientists, data analysts, and other stakeholders to understand their data needs and provide effective solutions.
- Troubleshoot and resolve data-related issues, including performance bottlenecks and data quality problems.
- Stay up to date with industry trends and best practices in data engineering and Databricks.
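To give a concrete sense of the pipeline work described above, here is a minimal PySpark sketch of a Databricks-style batch job. It is a hedged illustration only: the table names, columns, and daily-aggregation logic (raw_events, analytics.daily_event_counts, event_ts, and so on) are hypothetical examples, not references to PEI Group's actual data or systems.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks a SparkSession is provided as `spark`; getOrCreate()
# returns that session there and builds a local one elsewhere.
spark = SparkSession.builder.appName("daily-events-pipeline").getOrCreate()

# Read raw events (hypothetical table), deduplicate, and drop rows
# with no event timestamp before deriving a calendar date.
raw = spark.read.table("raw_events")
cleaned = (
    raw.dropDuplicates(["event_id"])
       .filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)

# Aggregate to a daily summary for downstream analysts.
daily = cleaned.groupBy("event_date", "event_type").agg(
    F.count("*").alias("event_count")
)

# Persist the result as a date-partitioned Delta table.
(daily.write.format("delta")
      .mode("overwrite")
      .partitionBy("event_date")
      .saveAsTable("analytics.daily_event_counts"))
```

On Databricks such a job would typically run as a scheduled job on a cluster; during development, the same transformations (the Delta write aside) can be exercised on a local Spark session.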
Key Requirements:
- 5+ years of experience as a Data Engineer with a focus on Databricks and cloud-based data platforms, including a minimum of 2 years writing unit and end-to-end tests for data pipelines and ETL processes on Databricks.
- Hands-on experience in PySpark programming for data manipulation, transformation, and analysis.
- Strong experience in SQL and writing complex queries for data retrieval and manipulation.
- Experience in developing and implementing test cases for data processing pipelines using a test-driven development approach (see the test sketch at the end of this list).
- Experience with Docker for containerising and deploying data engineering applications is good to have.
- Proficiency in Python scripting is mandatory.
- Strong knowledge of Databricks platform and its components, including Databricks notebooks, clusters, and jobs.
- Experience in designing and implementing data models to support analytical and reporting needs will be an added advantage.
- Strong knowledge of Azure Data Factory for data orchestration, ETL workflows, and data integration is good to have.
- Knowledge of cloud-based storage services such as Amazon S3 and Azure Blob Storage is good to have.
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Strong analytical and problem-solving skills.
- Strong English communication skills, both written and spoken, are crucial.
- Capability to solve complex technical issues and to anticipate risks before they arise.
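As a minimal sketch of the test-driven development approach referenced in the requirements, the example below unit-tests a transformation with pytest against a local SparkSession, so it can run in CI or inside a Docker container without a Databricks workspace. The function add_event_date is a hypothetical transformation introduced purely for illustration.

```python
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_event_date(df):
    # Hypothetical transformation under test: derive a calendar
    # date column from the event timestamp.
    return df.withColumn("event_date", F.to_date("event_ts"))


@pytest.fixture(scope="session")
def spark():
    # A small local session lets the suite run outside Databricks.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_add_event_date_derives_calendar_date(spark):
    df = spark.createDataFrame(
        [("e1", "2024-01-15 09:30:00")], ["event_id", "event_ts"]
    )
    row = add_event_date(df).collect()[0]
    assert str(row["event_date"]) == "2024-01-15"
```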