Getting Started with Data Science

As the Data Science team within The Data Mine, we aim to provide the technical support that helps make your project successful.

We love working in different technical environments and helping students learn during their data science journey.

We do have challenges and environmental limitations. This guide provides an overview of our services and outlines how we can best collaborate as you develop your project.

Data Science Team Strengths

  • Coding - providing environments in Python, R, SQL, or other languages for student use.

    • This includes JupyterLab or RStudio.

    • This also includes the installation of any packages that the team would need to use.

  • Databases - setting up PostgreSQL, MariaDB, MongoDB, or SQLite databases for student teams to use in testing and development.

  • GitHub - providing secure, Purdue-hosted GitHub environments for the student teams.

  • Containers - leveraging Purdue’s Geddes environment to deploy containers for student use.

    • We have access to, and experience with, Singularity and Docker.

    • This also includes the use of open-source Kubernetes tools, like Rancher.

    • Upon request, students can be granted access to a Rancher interface to Geddes to deploy custom services.

  • Windows and Linux servers - deployed within Purdue’s environment and connected to our secured data storage locations.

    • These allow students to leverage Purdue’s licenses for products like Tableau, PowerBI, and ArcGIS Pro.

  • Data science - we love meeting with teams to help answer data science or data engineering questions.

    • We also develop content and examples to help with student learning, when appropriate.

  • Project brainstorming - need help thinking through your potential project? We love collaborating to brainstorm projects for student success.

  • Secure data storage - we work with the awesome Research Computing (RCAC) team to provide secure data storage locations for the teams that we work with.

  • GPUs - working with RCAC, we are well equipped to offer GPU resources to teams.

  • Technical communication - the whole Data Mine team loves to work with students to review presentations and ensure that they are communicating information in a clear and easy-to-understand way.

This is not an exhaustive list! We love to continue learning and working to improve our systems.

We also value your feedback as we continue to evolve.

Data Science Team Challenges

  • Production Environments - we are happy to work with you to plan how the students will transition their project to your production environment, but we don’t have the capacity to support production applications at this time.

  • External Containers - while we can expose containers on Geddes outside of Purdue’s network, this raises our threat profile and is done sparingly.

  • Software Licenses - Purdue provides many software licenses that students can take advantage of. However, The Data Mine does not have a budget for purchasing software.

  • Local Development - we take every measure to ensure that data is secure. This is difficult to accomplish when it is stored on a student’s local computer. In addition, multiple custom coding environments become very difficult to support.

  • Leading Technical Development - we love to provide students with resources and help them iterate on potential solutions. However, we often don’t have the resources to lead the research ourselves. We collaborate with our awesome team mentors and other faculty at Purdue to ensure that the students get the support they need.

  • Client/Server Development Environments - when developing client/server applications (dashboards, RESTful APIs, or any project that communicates on a given port), it is typical to have an environment where the programmer can modify code and quickly view the result by running the server and then using a browser or CLI to interact with it. We do not currently have tooling that makes it easy to create uniform development environments for applications of this nature. Our current solutions involve virtual machines and/or encouraging local development.
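
To make the client/server workflow in the last item concrete, here is a minimal sketch of the "run the server, then interact with it on a port" loop, using only Python's standard library. The names StatusHandler and run_once, the /health path, and the JSON payload are hypothetical and chosen for illustration; this is not Data Mine tooling, just one way such a loop might look on a local machine or virtual machine.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class StatusHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a small JSON payload."""

    def do_GET(self):
        body = json.dumps({"path": self.path, "status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to keep the output short.
        pass


def run_once(path="/health"):
    """Start the server on a free local port, send one GET request, shut down."""
    # Port 0 asks the operating system for any free port (an assumption made
    # here so the sketch does not collide with other services).
    server = HTTPServer(("127.0.0.1", 0), StatusHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # This plays the role of the browser or CLI client.
        connection = HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", path)
        response = connection.getresponse()
        payload = json.loads(response.read().decode("utf-8"))
        connection.close()
        return payload
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


if __name__ == "__main__":
    print(run_once())
```

In day-to-day development the server would normally stay running on a fixed port while the programmer edits code and reloads the page; the sketch starts and stops it in one call only so that it can be run as a single script.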

Want to know more?

Check out our Technical Systems document for more information.