
Knowledge Tracing Models

Stated simply, knowledge tracing algorithms model students' knowledge acquisition throughout a learning program. Generally, the algorithm is trained on students' past performance data and predicts their future performance in the form of probabilities of correctness. As identified by Liu et al. [4], there are three basic classes of knowledge tracing models: probabilistic models such as Bayesian Knowledge Tracing [2, 8], logistic models such as Logistic Knowledge Tracing [6], and deep-learning models such as Deep Knowledge Tracing [7] or Self-Attentive Knowledge Tracing [5].

Researchers believe that the level of mastery of a knowledge concept, also known as the knowledge state, is a latent variable that can only be inferred from observable performance, such as correct or incorrect responses to questions. Probabilistic models consist of parameters, predefined by psychologists and mathematicians, that are used to infer knowledge states. In Bayesian Knowledge Tracing, the change of knowledge state is captured by a learning parameter P(T), the probability of transitioning from the unknown to the known state. However, the standard Bayesian model does not account for forgetting, nor can it trace multiple knowledge concepts simultaneously. Our dataset consists of skills from multiple domains across different ages, so a probabilistic model could become overly complex. Similarly, the logistic model applies logistic regression to the dataset and thus assumes a uniform learning speed for all students; this approach does not allow for personalized learning and was therefore dismissed. In contrast, deep-learning models compute the probabilities of observable outcomes directly, bypassing explicit approximation of knowledge states. These computations are parameterized by neural networks, which can capture complex interactions within high-dimensional data.

Bayesian Knowledge Tracing

Learning parameter P(T), the probability that the knowledge state transitions from unknown to known:

                  to unknown    to known
  from unknown    1 - P(T)      P(T)
  from known      0             1
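As a concrete illustration, the sketch below applies this transition matrix to update an estimated P(known) after each practice opportunity. Standard BKT [2] also conditions on the observed response through guess and slip parameters, which are included here for completeness; all parameter values and the simulated responses are hypothetical, not taken from our dataset.

```python
def bkt_posterior(p_known, correct, p_guess, p_slip):
    """Bayes update from one observed response, as in standard BKT [2]:
    a correct answer may be a guess, an incorrect one may be a slip."""
    if correct:
        num = p_known * (1.0 - p_slip)
        den = num + (1.0 - p_known) * p_guess
    else:
        num = p_known * p_slip
        den = num + (1.0 - p_known) * (1.0 - p_guess)
    return num / den

def bkt_transition(p_known, p_T):
    """Apply the transition matrix above: a known skill stays known
    (probability 1), an unknown skill becomes known with probability P(T)."""
    return p_known + (1.0 - p_known) * p_T

# Hypothetical parameter values, for illustration only.
p_known, p_T, p_guess, p_slip = 0.2, 0.3, 0.25, 0.1
for correct in [False, True, True]:                              # simulated responses
    p_known = bkt_posterior(p_known, correct, p_guess, p_slip)   # observe the answer
    p_known = bkt_transition(p_known, p_T)                       # then allow learning
    print(f"P(known) = {p_known:.3f}")
```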

Recurrent Neural Network

One of the deep-learning models is the Recurrent Neural Network (RNN). The model predicts future performance based on past performance, which is captured by the hidden state $h_{i-1}$, together with the current performance $x_i$:

$$h_i = \mathrm{RNNCell}(h_{i-1}; x_i)$$

where RNNCell is a differentiable non-linear function.
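A minimal sketch of this recurrence, in the spirit of Deep Knowledge Tracing [7], assuming PyTorch and a one-hot encoding of (skill, correctness) as the input $x_i$; the layer sizes and the number of skills are arbitrary choices for illustration, not values from our project.

```python
import torch
import torch.nn as nn

n_skills = 50                      # hypothetical number of knowledge concepts
input_size = 2 * n_skills          # one-hot over (skill, correct/incorrect), as in DKT [7]
hidden_size = 64                   # assumed hidden-state dimension

rnn_cell = nn.RNNCell(input_size, hidden_size)   # h_i = tanh(W_x x_i + W_h h_{i-1} + b)
readout = nn.Linear(hidden_size, n_skills)       # maps hidden state to per-skill logits

h = torch.zeros(1, hidden_size)                  # h_0: no past performance yet
for x in torch.eye(input_size)[:3].unsqueeze(1): # three dummy interaction encodings
    h = rnn_cell(x, h)                           # h_i = RNNCell(h_{i-1}; x_i)
    p_correct = torch.sigmoid(readout(h))        # predicted P(correct) for each skill
print(p_correct.shape)                           # torch.Size([1, n_skills])
```

Training would compare each predicted probability against the student's next observed response with a binary cross-entropy loss, as in [7].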

Future work

We plan to explore other Recurrent Neural Network architectures such as Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and Neural Turing Machines [3, 9]. The main advantage of these models is the concept of memory, implemented either by modifying hidden states through gates or by reading from and writing to external memory. This enables the models to capture long-term dependencies and reveal hidden structure among knowledge concepts over time.
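As a sketch, again assuming the same hypothetical PyTorch setup as above, a gated cell can be swapped into the recurrence with no other changes; the gates control how much of the previous hidden state is kept or overwritten at each step, which is what lets these models hold on to long-term dependencies.

```python
import torch
import torch.nn as nn

input_size, hidden_size = 100, 64        # same hypothetical sizes as the RNN sketch

gru_cell = nn.GRUCell(input_size, hidden_size)    # update/reset gates over h
lstm_cell = nn.LSTMCell(input_size, hidden_size)  # adds a separate cell state c

x = torch.zeros(1, input_size)           # one dummy interaction encoding
h = torch.zeros(1, hidden_size)
h = gru_cell(x, h)                       # drop-in replacement for the plain RNNCell

h, c = torch.zeros(1, hidden_size), torch.zeros(1, hidden_size)
h, c = lstm_cell(x, (h, c))              # LSTM carries (h, c) instead of h alone
```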

We also intend to explore other knowledge tracing models. One caveat of the RNN model is that it assumes an even time interval between consecutive time steps. This assumption weakens the model because it does not reflect reality: gaps between data points can be uneven, for example when children miss school during the ongoing pandemic. Another caveat of the RNN model is the lack of interpretability of its latent variables. Thus, a model that captures varying time intervals and explicit learning speeds of different students would be ideal. One candidate is a deep-learning model motivated by Neural ODEs [1].
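One deliberately simple way to relax the even-interval assumption, sketched below under the same hypothetical setup, is to decay the hidden state as a function of the elapsed time Δt before each update; a Neural-ODE-based model [1] would instead integrate learned continuous-time dynamics over the gap. The decay form and the constant tau are illustrative assumptions, not part of any cited model.

```python
import math
import torch
import torch.nn as nn

input_size, hidden_size = 100, 64
rnn_cell = nn.RNNCell(input_size, hidden_size)
tau = 7.0                                    # hypothetical decay time constant, in days

def step(h, x, dt):
    """Shrink the hidden state over the gap dt, then apply the usual RNN update."""
    h = h * math.exp(-dt / tau)
    return rnn_cell(x, h)

h = torch.zeros(1, hidden_size)
x = torch.zeros(1, input_size)
for dt in [1.0, 1.0, 30.0]:                  # uneven gaps, e.g. a month of missed school
    h = step(h, x, dt)
```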

References


[1] Chen, R. T. Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. (2019). Neural ordinary differential equations. arXiv:1806.07366 [cs, stat]. http://arxiv.org/abs/1806.07366

[2] Corbett, A. T., & Anderson, J. R. (1995). Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4(4), 253–278. https://doi.org/10.1007/BF01099821

[3] Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing machines. arXiv:1410.5401 [cs]. http://arxiv.org/abs/1410.5401

[4] Liu, Q., Shen, S., Huang, Z., Chen, E., & Zheng, Y. (2021). A survey of knowledge tracing. arXiv:2105.15106 [cs]. http://arxiv.org/abs/2105.15106

[5] Pandey, S., & Karypis, G. (2019). A self-attentive model for knowledge tracing. arXiv:1907.06837 [cs, stat]. http://arxiv.org/abs/1907.06837

[6] Pavlik, P. I., Eglington, L. G., & Harrell-Williams, L. M. (2021). Logistic knowledge tracing: A constrained framework for learner modeling. arXiv:2005.00869 [stat]. http://arxiv.org/abs/2005.00869

[7] Piech, C., Spencer, J., Huang, J., Ganguli, S., Sahami, M., Guibas, L., & Sohl-Dickstein, J. (2015). Deep knowledge tracing. arXiv:1506.05908 [cs]. http://arxiv.org/abs/1506.05908

[8] Yudelson, M. V., Koedinger, K. R., & Gordon, G. J. (2013). Individualized Bayesian knowledge tracing models. In H. C. Lane, K. Yacef, J. Mostow, & P. Pavlik (Eds.), Artificial Intelligence in Education (Vol. 7926, pp. 171–180). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-39112-5_18

[9] Zhang, J., Shi, X., King, I., & Yeung, D.-Y. (2017). Dynamic key-value memory networks for knowledge tracing. arXiv:1611.08108 [cs]. http://arxiv.org/abs/1611.08108