With support from the Texas A&M Institute of Data Science, the Texas Engineering Experiment Station, Texas A&M High Performance Research Computing, and the Texas A&M University College of Engineering, we will offer a special topics course, ECEN 489 section 504 (CRN 40958), in Fall 2019 to teach undergraduate students various subjects in Computational Data Science.
Time & Location
Time: 4:10 PM to 5:25 PM on Monday and Wednesday.
Location: ZACH 160
Data Science is a multidisciplinary field that uses statistics, data analysis, machine learning, algorithms, software, and computing systems to extract information, acquire knowledge, and gain insight into the context in which data is generated. The course introduces students to the computational practice of Data Science through a sequence of interactive modules that take an integrated, hands-on approach to its methods, tools, and applications, and to supporting technologies including high performance and cloud computing platforms. These modules prepare students for a concurrent semester-long project involving real-world applications of Data Science. The course is aimed both at students who wish to learn Data Science by developing fluency in its applications and at students with previous exposure to Data Science foundations who wish to develop complementary skills in the use of state-of-the-art systems and tools.
Prerequisites: MATH 251 or STAT 211 with a grade of C or above, and advanced undergraduate standing (junior or senior classification).
Upon the completion of the course, the student should be able to:
- apply basic statistical concepts that are used in data science,
- carry out data science projects systematically,
- apply simple statistical techniques for data analysis,
- create and manage an open source software environment for data science projects,
- design non-trivial data science projects with Python,
- apply open source tools to read, update, and write JSON, CSV, XML and other structured data formats,
- retrieve data with simple SQL and NoSQL queries,
- explore a data set with limited contextual information to get insight through analysis and visualization,
- create, train, and deploy common machine learning models with scikit-learn,
- create, train, and deploy simple deep learning models with TensorFlow,
- apply high performance computing and cloud computing platforms to carry out big data analysis.
| 1 | Introduction to data science (get accounts on HPRC systems, cloud computing platforms, and local servers) – overview of how to run a successful data science project |
| 2 | Introduction to Linux and Python (attend the HPRC short course on Linux if time permits) – identify datasets to work on for the final project |
| 3 | Statistical primer for data scientists (review other related math subjects if necessary) – define the problem specification |
| 4 | Data analysis with pandas and NumPy (assign or propose the final data science projects for the course and form teams) – preliminary analysis and visualization of the selected data set |
| 5 | Graph analytics for data scientists (graph theory primer and graph databases for unstructured data) – determine the approach |
| 6 | Data exploration and visualization with Matplotlib and seaborn (learn to explore a data set with limited information about it) – initial analysis |
| 8 | Statistical learning with scikit-learn (linear regression, clustering, classification, support vector machines) – halfway assessment and adjustment |
| 9 | Working with JSON, CSV, and XML data formats (learn to use open source toolkits to read, update, and write JSON, CSV, and XML) – complete the analysis in a shareable format |
| 10 | SQL and NoSQL for data scientists (learn to use SQL and NoSQL to retrieve data) – complete the analysis with peer review |
| 11 | Distributed computing and big data analytics with Spark (fundamentals of high performance computing and cloud computing, big data, and Spark) – scale to big data |
| 12 | Machine learning with TensorFlow (basics of machine learning, deep learning, convolutional neural networks, and TensorFlow) – create machine learning models to make predictions |
| 13 | Data Science and Society (current issues and how to overcome them; roundtable discussion of issues in data science projects and daily life) – broader impact and finalize the report |
| 14 | Team Project Presentation |