Knowledge Tracing Models
Stated simply, knowledge tracing algorithms model students' knowledge acquisition throughout a learning program. Generally, the algorithm is trained on students' past performance data and predicts their future performance as a probability of answering correctly. As identified by Liu et al. [4], there are three basic families of knowledge tracing models: probabilistic models such as Bayesian Knowledge Tracing [2, 8], logistic models such as Logistic Knowledge Tracing [6], and deep-learning models such as Deep Knowledge Tracing [7] or Self-attention Knowledge Tracing [5].
The level of mastery of a knowledge concept, also known as the knowledge state, is generally treated as a latent variable that can only be inferred from observable performance, such as a correct or incorrect response to a question. Probabilistic models use a small set of parameters, predefined by psychologists and mathematicians, to infer these knowledge states. In Bayesian Knowledge Tracing, the change of knowledge state is captured by a learning parameter P(T), the probability that the knowledge state transitions from unknown to known. However, the standard Bayesian model does not consider the probability of forgetting, nor can it trace multiple knowledge concepts simultaneously. Our dataset consists of skills from multiple domains at different ages; thus, a probabilistic model could be overly complex.

Similarly, the logistic model applies logistic regression to the dataset and thus assumes a uniform learning speed for all students. This approach does not allow for personalized learning, so we dismissed it.

In contrast, deep-learning models compute the probabilities of observable outcomes directly, bypassing explicit approximation of knowledge states. The computations are parameterized with neural networks, which are able to capture complex interactions within high-dimensional data.
Bayesian Knowledge Tracing
Learning parameter P(T): the probability that the knowledge state transitions from unknown to known. In the table, rows give the current state and columns give the next state:
| | unknown | known |
| --- | --- | --- |
| unknown | 1 - P(T) | P(T) |
| known | 0 | 1 |
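To make the role of P(T) concrete, here is a minimal sketch of one standard Bayesian Knowledge Tracing step in Python. It assumes the usual guess and slip parameters from Corbett and Anderson [2], which this section does not define; the function and argument names are illustrative only, not taken from any particular library.

```python
def predict_correct(p_known, p_guess, p_slip):
    """Probability of a correct answer given the current mastery estimate."""
    return p_known * (1 - p_slip) + (1 - p_known) * p_guess


def bkt_update(p_known, correct, p_transit, p_guess, p_slip):
    """One BKT step: condition on the observed answer, then apply learning.

    p_transit plays the role of P(T); p_guess and p_slip are assumed
    parameters of the standard model, not defined in this section.
    """
    if correct:
        numerator = p_known * (1 - p_slip)
        denominator = numerator + (1 - p_known) * p_guess
    else:
        numerator = p_known * p_slip
        denominator = numerator + (1 - p_known) * (1 - p_guess)
    posterior = numerator / denominator
    # Transition step: an unknown skill becomes known with probability P(T);
    # a known skill stays known (no forgetting in the standard model).
    return posterior + (1 - posterior) * p_transit


# Usage: trace one student's mastery estimate over a few responses.
p = 0.3  # hypothetical prior mastery
for answer in [True, False, True, True]:
    p = bkt_update(p, answer, p_transit=0.1, p_guess=0.2, p_slip=0.1)
```

Because the second row of the transition table is (0, 1), a mastery estimate of 1 stays at 1 after the transition step, which is exactly the "no forgetting" limitation noted above.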
Recurrent Neural Network
One of the deep-learning models is the Recurrent Neural Network (RNN). The model predicts future performance based on past performance, captured by the hidden state $h_{i-1}$, and current performance $x_i$:
$$h_i = \mathrm{RNNCell}(h_{i-1}, x_i)$$
where RNNCell is a differentiable non-linear function.
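As an illustration, one common choice of RNNCell is the Elman form, which applies tanh to a linear combination of $h_{i-1}$ and $x_i$. The sketch below assumes that form, plus a sigmoid readout that turns the hidden state into a probability of correctness. The weight names (`W_h`, `W_x`, `b`, `w_out`, `b_out`) are hypothetical, and plain Python lists are used only to keep the example self-contained; it is not the model trained for this project.

```python
import math


def rnn_cell(h_prev, x, W_h, W_x, b):
    """One Elman-style step: h_i = tanh(W_h h_{i-1} + W_x x_i + b)."""
    h = []
    for j in range(len(b)):
        s = b[j]
        s += sum(W_h[j][k] * h_prev[k] for k in range(len(h_prev)))
        s += sum(W_x[j][k] * x[k] for k in range(len(x)))
        h.append(math.tanh(s))
    return h


def predict(h, w_out, b_out):
    """Sigmoid readout: probability that the next response is correct."""
    z = b_out + sum(w * hj for w, hj in zip(w_out, h))
    return 1.0 / (1.0 + math.exp(-z))


def run(xs, W_h, W_x, b, w_out, b_out):
    """Unroll the cell over a sequence of inputs, starting from h_0 = 0."""
    h = [0.0] * len(b)
    probs = []
    for x in xs:
        h = rnn_cell(h, x, W_h, W_x, b)
        probs.append(predict(h, w_out, b_out))
    return probs


# Usage: two hidden units, one input feature (1.0 = correct, 0.0 = incorrect).
W_h = [[0.5, 0.0], [0.0, 0.5]]
W_x = [[1.0], [-1.0]]
b = [0.0, 0.0]
probs = run([[1.0], [0.0], [1.0]], W_h, W_x, b, w_out=[1.0, -1.0], b_out=0.0)
```

In practice the weights would be learned by gradient descent, which is why RNNCell needs to be differentiable.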
Future work
We plan to explore other architecture types of Recurrent Neural Networks, such as Long Short-Term Memory, Gated Recurrent Units, and Neural Turing Machines [3, 9]. The main advantage of these models is the concept of memory, implemented either by modifying hidden states through gates or by rewriting external memory. This enables the models to capture long-term dependencies and to reveal hidden structure among knowledge concepts over time.
We also intend to explore other knowledge tracing models. One caveat of the RNN model is that it assumes an even time interval between consecutive time steps. This assumption weakens the model because it does not represent reality; for example, gaps between data points become uneven when children miss school during a pandemic. Another caveat of the RNN model is the lack of interpretability of its latent variables. Thus, a model that captures varying time intervals and the explicit learning speeds of different students is ideal. One candidate is the family of deep-learning models motivated by Neural ODEs [1].
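To illustrate why an ODE-style formulation handles uneven gaps naturally, the sketch below evolves a hidden state over an arbitrary time gap `dt` using Euler steps. It is only a toy: in a Neural ODE [1] the dynamics function would be a learned neural network, whereas here a hypothetical linear decay (`decay`) stands in for it, and all names are illustrative.

```python
def evolve_hidden(h, dt, decay, n_steps=10):
    """Euler-integrate dh/dt = -decay * h over a time gap of length dt.

    The linear decay is a stand-in assumption for learned dynamics.
    """
    step = dt / n_steps
    for _ in range(n_steps):
        h = [hj + step * (-decay * hj) for hj in h]
    return h


# Usage: the same state after a one-day gap and after a two-day gap.
h0 = [1.0]
after_short_gap = evolve_hidden(h0, dt=1.0, decay=0.5)
after_long_gap = evolve_hidden(h0, dt=2.0, decay=0.5)
```

Because the gap length enters the computation directly, a longer absence changes the state more than a shorter one, rather than both being treated as a single identical time step.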
References
[1] Chen, R. T. Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. (2019). Neural ordinary differential equations. ArXiv:1806.07366 [Cs, Stat]. http://arxiv.org/abs/1806.07366
[2] Corbett, A. T., & Anderson, J. R. (1995). Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4(4), 253–278. https://doi.org/10.1007/BF01099821
[3] Graves, A., Wayne, G., & Danihelka, I. (2014). Neural turing machines. ArXiv:1410.5401 [Cs]. http://arxiv.org/abs/1410.5401
[4] Liu, Q., Shen, S., Huang, Z., Chen, E., & Zheng, Y. (2021). A survey of knowledge tracing. ArXiv:2105.15106 [Cs]. http://arxiv.org/abs/2105.15106
[5] Pandey, S., & Karypis, G. (2019). A self-attentive model for knowledge tracing. ArXiv:1907.06837 [Cs, Stat]. http://arxiv.org/abs/1907.06837
[6] Pavlik, J., Eglington, L. G., & Harrell-Williams, L. M. (2021). Logistic knowledge tracing: A constrained framework for learner modeling. ArXiv:2005.00869 [Stat]. http://arxiv.org/abs/2005.00869
[7] Piech, C., Spencer, J., Huang, J., Ganguli, S., Sahami, M., Guibas, L., & Sohl-Dickstein, J. (2015). Deep knowledge tracing. ArXiv:1506.05908 [Cs]. http://arxiv.org/abs/1506.05908
[8] Yudelson, M. V., Koedinger, K. R., & Gordon, G. J. (2013). Individualized bayesian knowledge tracing models. In H. C. Lane, K. Yacef, J. Mostow, & P. Pavlik (Eds.), Artificial Intelligence in Education (Vol. 7926, pp. 171–180). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-39112-5_18
[9] Zhang, J., Shi, X., King, I., & Yeung, D.-Y. (2017). Dynamic key-value memory networks for knowledge tracing. ArXiv:1611.08108 [Cs]. http://arxiv.org/abs/1611.08108