TDM 30200: Project 1 — 2023

Motivation: Welcome back! This semester should be a bit more straightforward than last semester in many ways. In the first project back, we will do a bit of UNIX review, a bit of Python review, and I’ll ask you to learn and write about some terminology.

Context: This is the first project of the semester! We will be taking it easy and slowly getting back to it.

Scope: UNIX, Python

Learning Objectives
  • Differentiate between concurrency and parallelism at a high level.

  • Differentiate between synchronous and asynchronous.

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Dataset(s)

The following questions will use the following dataset(s):

  • /anvil/projects/tdm/data

Questions

Question 1

Google the difference between synchronous and asynchronous — there is a lot of information online about this.

Explain what the following tasks are (in day-to-day usage) and why: asynchronous, or synchronous.

  • Communicating via email.

  • Watching a live lecture.

  • Watching a lecture that is recorded.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 2

Given the following scenario and rules, explain the synchronous and asynchronous ways of completing the task.

You have 2 reports to write, and 2 wooden pencils. 1 sharpened pencil will write 1/2 of 1 report. You have a helper that is willing to sharpen 1 pencil at a time, for you, and that helper is able to sharpen a pencil in the time it takes to write 1/2 of 1 report.

Please assume you start with 2 sharpened pencils.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 3

Write Python code that simulates the scenario in question (2) that is synchronous. Make the time it takes to sharpen a pencil be 2 seconds. Make the time it takes to write .5 reports 5 seconds.

Use time.sleep to accomplish this.

How much time does it take to write the reports in theory?

Here is some skeleton code to get you started.

def sharpen_pencil(pencil: dict) -> dict:
    if pencil['is_sharp']:
        return pencil
    else:
        time.sleep(2)
        pencil['is_sharp'] = True
        return pencil

def write_essays(number_essays: int, pencils: List[dict]):
    # fill in here
    # make sure both pencils are sharpened and sharpen both otherwise

    # write half the essay

    # dull first pencil

    # write the other half essay

    # dull second pencil
    pass

def simulate_story():
    pencils = [{'name': 'pencil1', 'is_sharp': True}, {'name': 'pencil1', 'is_sharp': True}]
    write_essays(2, pencils)
%%time

simulate_story()
Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 4

Read the StackOverflow post and think about the scenario in question (2) that is asynchronous. Assume the time it takes to sharpen a pencil is 2 seconds and the time it takes to write .5 reports is 5 seconds.

How much time does it take to write the reports in theory, if you use the asynchronous method? Explain.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 5

In your own words, describe the difference between concurrency and parallelism. Then, look at the flights datasets here: /anvil/projects/tdm/data/flights/subset. Describe an operation that you could do to the entire dataset as a whole. Describe how you (in theory) could parallelize that process.

Now, assume that you had the entire frontend system at your disposal. Use a UNIX command to find out how many cores the frontend has. If processing 1 file took 10 seconds to do. How many seconds would it take to process all of the files? Now, approximately how many seconds would it take to process all the files if you had the ability to parallelize on this system?

Don’t worry about overhead or the like. Just think at a very high level.

Best make sure this sounds like a task you’d actually like to do — I may be asking you to do it in the not-too-distant future.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.