NMBU, DAT121: Introduction to data science (August 2023)
August block, with lecture days from 14th August to 1st September 2023 (all working days except 16th and 30th August). A typical lecture day is structured as follows:
- Discussion of the previous day's work: 9.15 - 10.00
- Lecture and problem solving: 10.15 - 12.00
- Project work and tutorial: 13.15 - 15.00
The rooms TF1-201 and TF1-212 are reserved for the whole duration of the module.
Instructor: Martin Thomas Horsch (office: TF2-303A)
Teaching assistant: Mahrin Tasfe
Information:
Recommended literature:
- W. McKinney, Python for Data Analysis, 3rd edn., O'Reilly (ISBN 978-1-09810403-0), 2022.
- A. Silberschatz, H. F. Korth, S. Sudarshan, Database System Concepts, 7th edn. (international student edn.), McGraw-Hill Education (ISBN 978-1-26008450-4), 2019.
Material:
- The glossary document keeps track of our work toward agreed definitions of selected key concepts.
- See the list of code examples for some of the material discussed in the lectures.
Structure:
- Python basics
- Schedule (14th and 15th August 2023)
- Slides from the joint intro with IMRT100, first lecture slides, second lecture slides, lab worksheet
- Textbook: McKinney (2022), Chapters 2 (basics), 3 (built-in data structures), and 4 (numpy)
- Documentation and tools:
- Literature:
- S. Cass, "Top programming languages 2022," IEEE Spectrum, 2022
- Kunnskapsdepartementet, Kultur- og likestillingsdepartementet, "Frå ord til handling: Handlingsplan for norsk fagspråk i akademia," 2023
- O. Tomic, T. Graff, K. H. Liland, T. Næs, "hoggorm: A Python library for explorative multivariate statistics," J. Open Source Softw. 4(39): 980, doi:10.21105/joss.00980, 2019
- Register for membership in Data Science's linjeforening (students' union)
- Data and objects
- Schedule (17th and 18th August 2023)
- First lecture slides, second lecture slides, lab worksheet
- Textbook: Silberschatz et al. (2019), Sections 6.2 (E-R model), 6.4 (cardinalities), 6.7 (reducing E-R to relational schemas), 6.10 (notations for modelling data), and 8.1 (semi-structured data)
- Documentation and tools:
- Regression basics
- Schedule (21st to 24th August 2023)
- First lecture slides, second lecture slides, lab worksheet
- Textbook: McKinney (2022), Chapters 9 (plotting and visualization), 10 (data aggregation), and 11 (time series)
- Documentation and tools:
- Literature:
- H. Flyvbjerg, H. G. Petersen, "Error estimates on averages of correlated data," J. Chem. Phys. 91(1): 461-466, doi:10.1063/1.457480, 1989
- J. Mayer, K. Khairy, J. Howard, "Drawing an elephant with four complex parameters," Am. J. Phys. 78(6): 648-649, doi:10.1119/1.3254017, 2010
- T. Vigen: Spurious correlations
- Good practice (and bad practice)
- Schedule (25th August 2023)
- Lecture slides, lab worksheet
- Literature:
- C. Bezerra, F. Santana, F. Freitas, "CQChecker: A tool to check ontologies in OWL-DL using competency questions written in controlled natural language," Learning and Nonlinear Models 12(2): 115-129, doi:10.21528/lnlm-vol12-no2-art4, 2014
- M. Gruninger, M. S. Fox, "The role of competency questions in enterprise engineering," in A. Rolstadås (ed.), Benchmarking: Theory and Practice, pp. 22-31, doi:10.1007/978-0-387-34847-6_3, 1995
- C. M. Keet, An Introduction to Ontology Engineering, 2020, Chapters 6 (top-down ontology development) and 7 (bottom-up ontology development)
- H. E. Plesser, "Reproducibility vs. replicability: A brief history of a confused terminology," Frontiers Neuroinform. 11: 76, doi:10.3389/fninf.2017.00076, 2018
- Norwegian Reproducibility Network
- Multidimensionality
- Student presentations
License:
Any code examples from this module are released by NMBU REALTEK's Institutt for datavitskap under the conditions of the Creative Commons BY-NC-SA 4.0 License (attribution, non-commercial, share-alike).