STAT 39000: Project 1 — Spring 2022

Motivation: Welcome back! This semester should be a bit more straightforward than last semester in many ways. In the first project back, we will do a bit of UNIX review, a bit of Python review, and I’ll ask you to learn and write about some terminology.

Context: This is the first project of the semester! We will be taking it easy and slowly getting back to it.

Scope: UNIX, Python

Learning Objectives
  • Differentiate between concurrency and parallelism at a high level.

  • Differentiate between synchronous and asynchronous.

Make sure to read about and use the template found here, and review the important information about project submissions here.

Dataset(s)

The following questions will use the following dataset(s):

  • /depot/datamine/data/

Questions

Question 1

Search online for the difference between synchronous and asynchronous; there is a lot of information available about this.

For each of the following tasks, state whether it is asynchronous or synchronous (in day-to-day usage), and explain why.

  • Communicating via email.

  • Watching a live lecture.

  • Watching a lecture that is recorded.

Please review our updated submission guidelines before submitting your project.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 2

Given the following scenario and rules, explain the synchronous and asynchronous ways of completing the task.

You have 2 reports to write and 2 wooden pencils. 1 sharpened pencil will write 1/2 of 1 report. You have a helper who is willing to sharpen 1 pencil at a time for you, and who can sharpen a pencil in the time it takes to write 1/2 of 1 report.

You can assume you start with 2 sharpened pencils. Of course, if you assumed otherwise before the project was modified, you will get full credit with a different assumption.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 3

Write synchronous Python code that simulates the scenario in question 2. Make the time it takes to sharpen a pencil 2 seconds, and the time it takes to write 0.5 reports 5 seconds.

Use time.sleep to accomplish this.

How much time does it take to write the reports in theory?
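As a minimal sketch of the mechanics only (not a solution to the scenario), the snippet below shows that blocking calls to time.sleep run strictly one after another, and that time.perf_counter can measure the total. The function name run_steps, the step labels, and the shortened durations are assumptions made for illustration.

```python
import time


def run_steps(steps):
    """Run (label, seconds) steps one after another; return elapsed seconds."""
    start = time.perf_counter()
    for label, seconds in steps:
        print(f"starting {label}")
        time.sleep(seconds)  # blocks: nothing else happens until this returns
        print(f"finished {label}")
    return time.perf_counter() - start


# Hypothetical steps with shortened durations so the demo runs quickly.
demo_steps = [("step A", 0.2), ("step B", 0.1)]
elapsed = run_steps(demo_steps)
print(f"elapsed: {elapsed:.2f}s")
```

Because each call blocks, the elapsed time is roughly the sum of the individual durations. The same reasoning applies to the theoretical total for the scenario.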

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 4

The original text of the question is below. This is too difficult to do for this project. For this question, you are not required to write the code yourself. Rather, just answer the theoretical component to the question.

This question will be addressed in a future project, with better examples, and many more hints.

Read the StackOverflow post and write asynchronous Python code that simulates the scenario in question 2. The time it takes to sharpen a pencil is 2 seconds, and the time it takes to write 0.5 reports is 5 seconds.

Use async functions and asyncio.sleep to accomplish this.

How much time does it take to write the reports in theory?
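For intuition on the theoretical component, here is a small sketch of how async functions and asyncio.sleep let waits overlap. It is not a solution to the scenario: the names do_step and main, the step labels, and the shortened durations are assumptions made for illustration.

```python
import asyncio
import time


async def do_step(label, seconds):
    """Pretend to work for `seconds` without blocking the event loop."""
    await asyncio.sleep(seconds)
    return label


async def main():
    start = time.perf_counter()
    # Both steps are in flight at once, so their waits overlap.
    results = await asyncio.gather(
        do_step("step A", 0.3),
        do_step("step B", 0.3),
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"elapsed: {elapsed:.2f}s")
```

Here the elapsed time is close to the longest single wait rather than the sum of both. In a Jupyter notebook, where an event loop is already running, `await main()` would be used in place of `asyncio.run(main())`.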

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 5

In your own words, describe the difference between concurrency and parallelism. Then, look at the flights datasets here: /depot/datamine/data/flights/subset. Describe an operation that you could do to the entire dataset as a whole. Describe how you (in theory) could parallelize that process.

Now, assume that you had the entire frontend system at your disposal. Use a UNIX command to find out how many cores the frontend has. If processing 1 file took 10 seconds, how many seconds would it take to process all of the files? Now, approximately how many seconds would it take to process all of the files if you had the ability to parallelize on this system?

Don’t worry about overhead or the like. Just think at a very high level.
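The high-level arithmetic can be sketched as follows, assuming files are processed in rounds with one file per core and no overhead. The helper estimate_seconds and the file count of 12 are hypothetical; substitute the real number of files and the core count reported by your UNIX command. Note that os.cpu_count() reports the logical CPUs visible to the machine running Python, which may not match the frontend, so it is only a cross-check.

```python
import math
import os


def estimate_seconds(n_files, seconds_per_file, n_workers):
    """Idealized wall time: files are handled in rounds of n_workers at a time."""
    rounds = math.ceil(n_files / n_workers)
    return rounds * seconds_per_file


cores = os.cpu_count() or 1  # cpu_count() can return None, so fall back to 1
n_files = 12                 # hypothetical count, not the real number of files

print("sequential:", estimate_seconds(n_files, 10, 1))
print("parallel:  ", estimate_seconds(n_files, 10, cores))
```

With one worker the estimate is simply the number of files times the per-file time. With more workers, the number of rounds shrinks to the file count divided by the worker count, rounded up.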

Best make sure this sounds like a task you’d actually like to do — I may be asking you to do it in the not-too-distant future.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Please make sure to double-check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it, to make sure that what you think you submitted is what you actually submitted.

In addition, please review our submission guidelines before submitting your project.