Fall 2024 Syllabus - The Data Mine Seminar

Course Information

Course Number and Title: CRN

TDM 10100 - The Data Mine I: possible CRNs 12067 or 12072 or 12073 or 12071 or 24448 or 28162 or 28160 or 28161

TDM 20100 - The Data Mine III: possible CRNs 12117 or 12106 or 12113 or 12118 or 24450 or 28174 or 28166 or 28171

TDM 30100 - The Data Mine V: possible CRNs 12104 or 12112 or 12115 or 12120 or 24451 or 28173 or 28165 or 28170

TDM 40100 - The Data Mine VII: possible CRNs 12103 or 12111 or 12114 or 12119 or 24449 or 28172 or 28163 or 28167

TDM 50100 - The Data Mine Seminar: possible CRNs 15644 or 30617 or 30618 or 30619 or 28177 or 28184 or 28175

Course credit hours: 1 credit hour, so you should expect to spend about 3 hours per week doing work for the class

Prerequisites: TDM 10100 and TDM 10200 can be taken in either order. Both of these courses are introductory. TDM 10100 is an introduction to data analysis in R. TDM 10200 is an introduction to data analysis in Python.

For all of the remaining TDM seminar courses, students are expected to take the courses in order (with a passing grade), namely, TDM 20100, 20200, 30100, 30200, 40100, 40200. The topics in these courses build on the knowledge from the previous courses. All students, regardless of background, are welcome. TDM 50100 is geared toward graduate students and can be taken repeatedly; TDM 50100 meets concurrently with the other courses, at whichever level is appropriate for the graduate students in the course. We can make adjustments on an individual basis if needed.

Course Web Pages

Meeting Times

There are officially 4 Monday class times: 8:30 am, 9:30 am, 10:30 am (all in the Hillenbrand Dining Court atrium, with no meal swipe required), and 4:30 pm (synchronous online, recorded and posted later; this online meeting is also available to students participating in Seminar from universities other than Purdue). There is also an asynchronous class section. All the information you need to work on the projects each week will be provided online on the Thursday of the previous week, and we encourage you to get a head start on the projects before class time. Dr. Ward does not lecture during the class meetings. Instead, the seminar time is a good time to ask questions and get help from Dr. Ward, the T.A.s, and your classmates. Attendance is not required. The T.A.s will have many daytime and evening office hours throughout the week.

Course Description

The Data Mine is a supportive environment for students in any major and from any background who want to learn some data science skills. Students will have hands-on experience with computational tools for representing, extracting, manipulating, interpreting, transforming, and visualizing data, especially big data sets, and in effectively communicating insights about data. Topics include: the R environment, Python, visualizing data, UNIX, bash, regular expressions, SQL, XML and scraping data from the internet, as well as selected advanced topics, as time permits.

Learning Outcomes

By the end of the course, you will be able to:

  1. Discover data science and professional development opportunities in order to prepare for a career.

  2. Explain the difference between research computing and basic personal computing data science capabilities in order to know which system is appropriate for a data science project.

  3. Design efficient search strategies in order to acquire new data science skills.

  4. Devise the most appropriate data science strategy in order to answer a research question.

  5. Apply data science techniques in order to answer a research question about a big data set.

Mapping to Foundational Learning Outcome (FLO) = Information Literacy

Note: The Data Mine has applied for the course seminar to satisfy the information literacy outcome, but this request is still under review by the university. This request has not yet been approved.

  1. Identify a line of inquiry that requires information, including formulating questions and determining the scope of the investigation. In each of the 14 weekly projects, the scope is described at a high level at the very top of the project. Students are expected to tie their analysis on the individual weekly questions back to the stated scope. As an example of the stated scope in a project: Understanding how to use Pandas and be able to develop functions allows for a systematic approach to analyzing data. In this project, students will already be familiar with Pandas but will not (yet) know at the outset how to "develop functions" and take a "systematic approach" to solving the questions. Students are expected to comment on each question about how their "line of inquiry" and "formulation of the question" ties back to the stated scope of the project. As the seminar progresses past the first few weeks, and the students are being asked to tackle more complex problems, they need to identify which Python, SQL, R, and UNIX tools to use, and which statements and queries to run (this is "formulating the questions"), in order to analyze the data, derive the results, and summarize the results in writing and visualizations ("determining the scope of the investigation").

  2. Locate information using effective search strategies and relevant information sources. The Data Mine seminar progresses by increasing the complexity of the problems. The students are being asked to solve complex problems using data science tools. Students need to "locate information" within technical documentation, API documentation, online manuals, online discussions such as Stack Overflow, etc. Within these online resources, they need to determine the "relevant information sources" and apply these sources to solve the data analysis problem at hand. They need to understand the context, motivation, technical notation, nomenclature of the tools, etc. We enable students to practice this skill on every weekly project during the semester, and we provide additional resources, such as Piazza (an online discussion platform to interact with peers, teaching assistants, and the instructor), office hours throughout the week, and in-person or virtual seminar sessions for direct interaction with the instructor.

  3. Evaluate the credibility of information. The students work toward this objective in several ways. They need to evaluate and analyze the "credibility of information" and data from a wide array of resources, e.g., from the federal government, from Kaggle, from online repositories and archives, etc. Each project during the semester focuses attention on a large data repository, and the students need to understand the credible data, the missing data, the inaccurate data, the data that are outliers, etc. Some of the projects for students involve data cleansing efforts, data imputation, data standardization, etc. Students also need to validate, verify, determine any missing data, understand variables, correlation, contextual information, and produce models and data visualizations from the data under consideration.

  4. Synthesize and organize information from different sources in order to communicate. This is a key aspect of The Data Mine. In many of the student projects, they need to assimilate geospatial data, categorical and numerical data, textual data, and visualizations, in order to have a comprehensive data analysis of a system or a model. The students can use help from Piazza, office hours, the videos from the instructor and seminar live sessions to synthesize and organize the information they are learning about, in each project. The students often need to also understand many different types of tools and aspects of data analysis, sometimes in the same project, e.g., APIs, data dictionaries, functions, concepts from software engineering such as scoping, encapsulation, containerization, and concepts from spatial and temporal analysis. Synthesizing many "different sources" to derive and "communicate" the analysis is a key aspect of the projects.

  5. Attribute original ideas of others through proper citing, referencing, paraphrasing, summarizing, and quoting. In every project, students need to use "citations to sources" (online and written), "referencing" forums and blogs where their cutting-edge concepts are "documented", proper methods of "quotation" and "citation", documentation of any teamwork, etc. The students have a template for their project submissions in which they are required to provide the proper citation of any sources, collaborations, reference materials, etc., in each and every project that they submit every week.

  6. Recognize relevant cultural and other contextual factors when using information. Students’ weekly projects include data about gender (all types of genders), political data, geospatial questions, online forums and rating schema, textual data, information about books, music, online repositories, etc. Students need to understand not only the data analysis but also the "context" in which the data is provided, the data sources, the potential usage of the analysis and its "cultural" implications, etc. Students also complete professional development, attending several professional development and outside-the-classroom events each semester. They meet with alumni, business professionals, data practitioners, data engineers, managers, scientists from national labs, etc. They attend events about the "culture related to data science", and "multicultural events". Students are required to respond in writing to every such event, and their writing is graded and incorporated into the grades for the course.

  7. Observe ethical and legal guidelines and requirements for the use of published, confidential, and/or proprietary information. Students complete an academic integrity quiz at the beginning of each semester that sets the stage for these "ethical and legal guidelines and requirements". They have documentation about proper data handling and data management techniques. They learn about the context of data usage, including (for instance) copyrights, the difference between open source and proprietary data, different types of software licenses, the need for confidentiality with Corporate Partners projects, etc.

Assessment of Foundational Learning Outcome (FLO) = Information Literacy

Note: The Data Mine has applied for the course seminar to satisfy the information literacy outcome, but this request is still under review by the university. This request has not yet been approved.

  1. Assessment method for this course. Students are assigned a weekly project that usually includes a data set and then questions about the data set that engage the student in experiential learning. Each week, these projects are graded by teaching assistants based on solutions provided.

  2. Identify a line of inquiry that requires information, including formulating questions and determining the scope of the investigation. Students are assigned a weekly project that usually includes a data set and then questions about the data set that engage the student in experiential learning. Each week, these projects are graded by teaching assistants based on solutions provided. Students identify which R and Python statements and queries to run (this is formulating the questions), in order to get to the results they think they are looking for (determining the scope of the investigation).

  3. Locate information using effective search strategies and relevant information sources. Students are assigned a weekly project that usually includes a data set and then questions about the data set that engage the student in experiential learning. Each week, these projects are graded by teaching assistants based on solutions provided. The students are being asked to solve complex problems using data science tools. They need to determine what they are trying to find out and, to do that, which questions to ask.

  4. Evaluate the credibility of information. Students are assigned a weekly project that usually includes a data set and then questions about the data set that engage the student in experiential learning. Each week, these projects are graded by teaching assistants based on solutions provided. Some of the projects that students complete in the course involve data cleansing efforts including validation, verification, missing data, and modeling, and students must evaluate the credibility of the data as they move through the project.

  5. Synthesize and organize information from different sources in order to communicate. Students are assigned a weekly project that usually includes a data set and then questions about the data set that engage the student in experiential learning. Each week, these projects are graded by teaching assistants based on solutions provided. Information on how to complete the projects is learned through many sources, and students follow an experiential learning model.

  6. Attribute original ideas of others through proper citing, referencing, paraphrasing, summarizing, and quoting. Students are assigned a weekly project that usually includes a data set and then questions about the data set that engage the student in experiential learning. Each week, these projects are graded by teaching assistants based on solutions provided. At the beginning of each project there is a question regarding citations for the project.

  7. Recognize relevant cultural and other contextual factors when using information. Students are assigned a weekly project that usually includes a data set and then questions about the data set that engage the student in experiential learning. Each week, these projects are graded by teaching assistants based on solutions provided. For the professional development event assessment, students are required to attend three approved events and then write a guided summary of each event.

  8. Observe ethical and legal guidelines and requirements for the use of published, confidential, and/or proprietary information. Students complete an academic integrity quiz at the beginning of each semester, and they are also graded on their proper documentation and usage of data throughout the semester, on every weekly project.

Required Materials

  • A laptop so that you can easily work with others. Having audio/video capabilities is useful.

  • Access to Brightspace, Gradescope, and Piazza course pages.

  • Access to Jupyter Lab at the On Demand Gateway on Anvil: ondemand.anvil.rcac.purdue.edu/

  • "The Examples Book": the-examples-book.com

  • Good internet connection.

Attendance Policy

When conflicts or absences can be anticipated, such as for many University-sponsored activities and religious observations, the student should inform the instructor of the situation as far in advance as possible.

For unanticipated or emergency absences when advance notification to the instructor is not possible, the student should contact the instructor as soon as possible by email or phone. When the student is unable to make direct contact with the instructor and is unable to leave word with the instructor’s department because of circumstances beyond the student’s control, and in cases falling under excused absence regulations, the student or the student’s representative should contact or go to the Office of the Dean of Students website to complete appropriate forms for instructor notification. Under academic regulations, excused absences may be granted for cases of grief/bereavement, military service, jury duty, parenting leave, and medical excuse. For details, see the Academic Regulations & Student Conduct section of the University Catalog website.

How to succeed in this course

If you would like to be a successful Data Mine student:

  • Start on the weekly projects on or before Mondays so that you have plenty of time to get help from your classmates, TAs, and Data Mine staff. Don’t wait until the due date to start!

  • Be excited to challenge yourself and learn impressive new skills. Don’t get discouraged if something is difficult—you’re here because you want to learn, not because you already know everything!

  • Remember that Data Mine staff and TAs are excited to work with you! Take advantage of us as resources.

  • Network! Get to know your classmates, even if you don’t see them in an actual classroom. You are all part of The Data Mine because you share interests and goals. You have over 800 potential new friends!

  • Use "The Examples Book", which has lots of explanations and examples to get you started. Google, Stack Overflow, etc. are all great, but "The Examples Book" has been carefully put together to be the most useful to you: the-examples-book.com

  • Expect to spend approximately 3 hours per week on the projects. Some might take less time, and occasionally some might take more.

  • Don’t forget about the syllabus quiz, academic integrity quiz, and outside event reflections. They all contribute to your grade and are part of the course for a reason.

  • If you get behind or feel overwhelmed about this course or anything else, please talk to us!

  • Stay on top of deadlines. Announcements will be sent out every Monday morning, but you should also keep a copy of the course schedule where you can see it easily.

  • Read your emails!

Information about the Instructors

The Data Mine Staff

Name: Title

Shared email we all read: [email protected]

Kevin Amstutz: Senior Data Scientist

Donald Barnes: Guest Relations Administrator

Maggie Betz: Managing Director of The Data Mine at Indianapolis

Kimmie Casale: ASL Tutor

Bryce Castle: Corporate Partners Technical Specialist

Cai Chen: Corporate Partners Technical Specialist

Doug Crabill: Senior Data Scientist

Stacey Dunderman: Program Administration Specialist

Jessica Gerlach: Corporate Partners Technical Specialist

Dan Hirleman: Regional Director of The Data Mine of the Rockies

Jessica Jud: Senior Manager of Expansion Operations

Kali Lacy: Associate Research Engineer

Gloria Lenfestey: Senior Financial Analyst

Nicholas Lenfestey: Interim Managing Director of Corporate Partners

Naomi Mersinger: ASL Interpreter / Strategic Initiatives Coordinator

Kim Rechkemmer: Senior Program Administration Specialist

Katie Sanders: Chief Operating Officer

Betsy Satchell: Senior Administrative Assistant

Diva Sharma: Corporate Partners Technical Specialist

Dr. Mark Daniel Ward: Executive Director

The Data Mine Team uses a shared email which functions as a ticketing system. Using a shared email helps the team manage the influx of questions, better distribute questions across the team, and send out faster responses. You can also use the Piazza forum to get in touch. In particular, Dr. Ward responds to questions on Piazza faster than by email.

Communication Guidance

  • For questions about how to do the homework, use Piazza or visit office hours. You will receive the fastest response by using Piazza rather than emailing us.

  • For general Data Mine questions, email [email protected]

  • For regrade requests, use Gradescope’s regrade feature within Brightspace. Regrades should be requested within 1 week of the grade being posted.

Office Hours

Office hours are held in person in Hillenbrand lobby and on Zoom. Check the schedule to see the available times.

Piazza

Piazza is an online discussion board where students can post questions at any time, and Data Mine staff or T.A.s will respond. Piazza is available through Brightspace. There are private and public postings. Last year we had over 11,000 interactions on Piazza, and the typical response time was around 5-10 minutes.

Assignments and Grades

Course Schedule & Due Dates

See the schedule and later parts of the syllabus for more details, but here is an overview of how the course works:

In the first week of the semester, you will have some "housekeeping" tasks to do, which include taking the Syllabus quiz and Academic Integrity quiz.

Generally, every week from the very beginning of the semester, you will have your new projects released on a Thursday, and they are due 8 days later on the following Friday at 11:55 pm Purdue West Lafayette (Eastern) time. This semester, there are 14 weekly projects, but we only count your best 10. This means you could miss up to 4 projects due to illness or other reasons, and it won’t hurt your grade.
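The release and due pattern described above (released on a Thursday, due 8 days later on Friday at 11:55 pm Eastern) can be sketched in Python. This is only an illustration: the function name `project_due` and the release date used below are hypothetical, and the official course schedule remains the authority for actual dates.

```python
from datetime import date, datetime, time, timedelta


def project_due(release_day: date) -> datetime:
    """Return the due date and time (Eastern) for a project released on a Thursday."""
    if release_day.weekday() != 3:  # Monday is 0, so Thursday is 3
        raise ValueError("projects are released on Thursdays")
    due_day = release_day + timedelta(days=8)  # 8 days after a Thursday is a Friday
    return datetime.combine(due_day, time(23, 55))  # 11:55 pm, time zone not modeled


# Hypothetical release date, chosen only because it falls on a Thursday
print(project_due(date(2024, 8, 22)))
```

The time zone is deliberately left out of this sketch; the returned value should be read as Purdue West Lafayette (Eastern) time.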

We suggest trying to do as many projects as possible so that you can keep up with the material. The projects are much less stressful if they aren’t done at the last minute, and it is possible that our systems will be stressed if you wait until Friday night, causing unexpected behavior and long wait times. Try to start your projects on or before Monday each week to leave yourself time to ask questions.

Outside of projects, you will also complete 3 Outside Event reflections. More information about these is in the "Outside Event Reflections" section below. The Data Mine does not conduct or collect an assessment during the final exam period. Therefore, TDM Courses are not required to follow the Quiet Period in the Academic Calendar.

Projects

  • The projects will help you achieve Learning Outcomes #2-5.

  • Each weekly programming project is worth 10 points.

  • There will be 14 projects available over the semester, and your best 10 will count.

  • The 4 project grades that are dropped could be from illnesses, absences, travel, family emergencies, or simply low scores. No excuses necessary.

  • No late work will be accepted, even if you are having technical difficulties, so do not work at the last minute.

  • There are many opportunities to get help throughout the week, either through Piazza or office hours. We’re waiting for you! Ask questions!

  • Follow the instructions for how to submit your projects properly through Gradescope in Brightspace.

  • It is ok to get help from others or online, although it is important to document this help in the comment sections of your project submission. You need to say who helped you and how they helped you.

  • Each week, the project will be posted on the Thursday before the seminar, the project will be the topic of the seminar and any office hours that week, and then the project will be due by 11:55 pm Eastern time on the following Friday. See the schedule for specific dates.

  • If you need to request a regrade on any part of your project, use the regrade request feature inside Gradescope. The regrade request needs to be submitted within one week of the grade being posted (we send an announcement about this).

Outside Event Reflections

  • The Outside Event reflections will help you achieve Learning Outcome #1. They are an opportunity for you to learn more about data science applications, career development, and diversity.

  • Throughout the semester, The Data Mine will have many special events and speakers, typically happening in person so you can interact with the presenter, but some may be online and possibly recorded.

  • These eligible opportunities will be posted on The Data Mine’s website (datamine.purdue.edu/events/) and updated frequently. Feel free to suggest good events that you hear about, too.

  • You are required to attend 3 of these over the semester, with 1 due each month. See the schedule for specific due dates.

  • You are welcome to do all 3 reflections early. For example, you could submit all 3 reflections in September.

  • You must submit your outside event reflection within 1 week of attending the event or watching the recording.

  • Follow the instructions on Brightspace for writing and submitting these reflections.

  • At least one of these events should be on the topic of Professional Development. These events will be designated by "PD" next to the event on the schedule.

  • This semester you will answer questions directly in Gradescope including the name of the event and speaker, the time and date of the event, what was discussed at the event, what you learned from it, what new ideas you would like to explore as a result of what you learned at the event, and what question(s) you would like to ask the presenter if you met them at an after-presentation reception. This should not be just a list of notes you took from the event—it is a reflection.

  • We read every single reflection! We care about what you write! We have used these connections to provide new opportunities for you, to thank our speakers, and to learn more about what interests you.

Late Work Policy

We generally do NOT accept late work. For the projects, we count only your best 10 out of 14, so that gives you a lot of flexibility. We need to be able to post answer keys for the rest of the class in a timely manner, and we can’t do this if we are waiting for other students to turn their work in.

Grade Distribution

Projects (best 10 out of Projects #1-14): 86%

Outside event reflections (3 total): 12%

Academic Integrity Quiz: 1%

Syllabus Quiz: 1%

Total: 100%
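As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes each project is scored out of 10 points (as stated in the Projects section), that each reflection and quiz is expressed as a fraction between 0 and 1, and that any project not submitted counts as 0. The function name `course_percentage` and its parameters are hypothetical; Brightspace holds the actual grade calculation.

```python
def course_percentage(project_scores, reflection_fractions,
                      integrity_quiz, syllabus_quiz):
    """Estimate the overall course percentage from the published weights."""
    # Keep only the best 10 project scores (each project is out of 10 points)
    best_ten = sorted(project_scores, reverse=True)[:10]
    projects = sum(best_ten) / 100.0          # fraction of the 100 possible points
    reflections = sum(reflection_fractions) / 3.0  # 3 reflections are required
    return (86 * projects + 12 * reflections
            + 1 * integrity_quiz + 1 * syllabus_quiz)


# A student who skips 4 projects but earns full marks everywhere else
print(course_percentage([10] * 10 + [0] * 4, [1, 1, 1], 1, 1))
```

Because only the best 10 projects count, the four zeros in this example do not lower the result.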

Grading Scale

In this class grades reflect your achievement throughout the semester in the various course components listed above. Your grades will be maintained in Brightspace. This course will follow the 90-80-70-60 grading scale for A, B, C, D cut-offs. If you earn a 90.000 in the class, for example, that is a solid A. +/- grades will be given at the instructor’s discretion below these cut-offs. If you earn an 89.11 in the class, for example, this may be an A- or a B+.

  • A: 100.000% - 90.000%

  • B: 89.999% - 80.000%

  • C: 79.999% - 70.000%

  • D: 69.999% - 60.000%

  • F: 59.999% - 0.000%
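The 90-80-70-60 cut-offs can be sketched as a simple lookup in Python. The function name `cutoff_band` is hypothetical, and the sketch returns only the band implied by the published cut-offs; it does not model plus or minus grades, which are given at the instructor’s discretion, so a score just below a cut-off may receive a different final grade.

```python
def cutoff_band(percentage):
    """Return the letter band implied by the 90-80-70-60 cut-offs."""
    # Check the cut-offs from highest to lowest; the first one met wins
    for cutoff, letter in [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]:
        if percentage >= cutoff:
            return letter
    return "F"


print(cutoff_band(90.0))   # exactly on the A cut-off
print(cutoff_band(59.999)) # just below the D cut-off
```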

Academic Integrity

Academic integrity is one of the highest values that Purdue University holds. Individuals are encouraged to alert university officials to potential breaches of this value by either emailing or by calling 765-494-8778. While information may be submitted anonymously, the more information that is submitted provides the greatest opportunity for the university to investigate the concern.

In TDM 10100/20100/30100/40100/50100, we encourage students to work together. However, there is a difference between good collaboration and academic misconduct. We expect you to read over this list, and you will be held responsible for violating these rules. We are serious about protecting the hard-working students in this course. We want a grade for The Data Mine seminar to have value for everyone and to represent what you truly know. We may punish both the student who cheats and the student who allows or enables another student to cheat. Punishment could include receiving a 0 on a project, receiving an F for the course, and having the incident of academic misconduct reported to the Office of the Dean of Students.

Good Collaboration:

  • First try the project yourself, on your own.

  • After trying the project yourself, then get together with a small group of other students who have also tried the project themselves to discuss ideas for how to do the more difficult problems. Document in the comments section any suggestions you took from your classmates or your TA.

  • Finish the project on your own so that what you turn in truly represents your own understanding of the material.

  • Look up potential solutions for how to do part of the project online, but document in the comments section where you found the information.

  • If the assignment involves writing a long, worded explanation, you may proofread somebody’s completed written work and allow them to proofread your work. Do this only after you have both completed your own assignments, though.

Academic Misconduct:

  • Dividing up the problems among a group. (You do #1, I’ll do #2, and he’ll do #3: then we’ll share our work to get the assignment done more quickly.)

  • Attending a group work session without having first worked all of the problems yourself.

  • Allowing your partners to do all of the work while you copy answers down, or allowing an unprepared partner to copy your answers.

  • Letting another student copy your work or doing the work for them.

  • Sharing files or typing on somebody else’s computer or in their computing account.

  • Getting help from a classmate or a TA without documenting that help in the comments section.

  • Looking up a potential solution online without documenting that help in the comments section.

  • Reading someone else’s answers before you have completed your work.

  • Having a tutor or TA work through all (or some) of your problems for you.

  • Uploading, downloading, or using old course materials from Course Hero, Chegg, or similar sites.

  • Using the same outside event reflection (or parts of it) more than once. Using an outside event reflection from a previous semester.

  • Using somebody else’s outside event reflection rather than attending the event yourself.

The Purdue Honor Pledge: "As a boilermaker pursuing academic excellence, I pledge to be honest and true in all that I do. Accountable together - we are Purdue."

Please refer to the student guide for academic integrity for more details.

Disclaimer

This syllabus is subject to small changes. All questions and feedback are always welcome!