
Data Pipeline with Airflow and AWS Tools (S3, Lambda & Glue)

Towards Data Science

Our task will be to extract these questions from the actual exams, which are available as PDFs on the MEC (Ministry of Education) website [CC BY-ND 3.0]. Always be responsible when planning to create a web scraper: check the site's terms of use and the copyright of the hosted content.

Figure: Extract questions from PDF. Image by Author.
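
One automated check that can complement reading a site's terms of use is consulting its robots.txt before downloading anything. The snippet below is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `is_allowed`, the user agent `exam-scraper`, the `example.org` URLs, and the sample robots.txt rules are all hypothetical and are not taken from the MEC website. A robots.txt check does not replace reviewing the terms of use or the content license.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines,
    # so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only (not the real site's robots.txt).
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in (
        "https://example.org/exams/2020.pdf",
        "https://example.org/private/draft.pdf",
    ):
        print(url, is_allowed(EXAMPLE_ROBOTS, "exam-scraper", url))
```

In a real pipeline, the robots.txt text would be downloaded from the target site first, and a scraping task would be skipped whenever the check returns `False`.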
