Neural Net Training
Learn about the training process for neural nets and how weights are adjusted to minimize loss
<div style='margin-bottom: 20px;'>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What type of neural networks does ChatGPT use?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Transformer nets</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What method is mentioned as revealing some structure in the 768x768 matrix of weights for GPT-2?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Taking 64x64 moving averages.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What type of learning makes it easier to train ChatGPT?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Unsupervised learning.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What additional capabilities would a complete symbolic discourse language need?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">It would need built-in 'calculi' about general things in the world, like the movement of objects.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What token is included in training to indicate the end of an output?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">End token</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">How many attention blocks does GPT-2 have, according to the text?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">12</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">How many attention heads does each attention block in ChatGPT's GPT-3 have?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">96</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is the basic idea for finding weights that reproduce a desired function in neural networks?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">The basic idea is to supply many 'input → output' examples for the network to learn from and then find weights that can reproduce these examples.</p>
</div>
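<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<p style="font-weight: normal; font-size: 1.2rem;">A minimal sketch of this idea, assuming a one-neuron linear model and plain gradient descent (a common way to find such weights, not necessarily the exact procedure the text has in mind). The target function y = 2x + 1, the example inputs, the step count and the learning rate are all hypothetical values chosen for illustration.</p>
<pre>
```python
# Hypothetical 'input -> output' examples for the target function y = 2x + 1.
examples = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]


def train(examples, steps=2000, learning_rate=0.05):
    """Fit y = w*x + b to the examples by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # start from arbitrary weights
    n = len(examples)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y_true in examples:
            error = (w * x + b) - y_true
            # Gradient of the mean squared error with respect to each weight.
            grad_w += 2.0 * error * x / n
            grad_b += 2.0 * error / n
        # Nudge the weights in the direction that reduces the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train(examples)
print(f"w = {w:.3f}, b = {b:.3f}")
```
</pre>
<p style="font-weight: normal; font-size: 1.2rem;">With these toy examples the weights should settle close to w = 2 and b = 1, which reproduces the examples.</p>
</div>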
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">Can specific examples and patterns in computational systems always prevent unexpected outcomes?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">No</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is an example of using existing resources for training data in machine learning?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Using alt tags provided for images on the web.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What do attention heads recombine in the embedding vectors?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Chunks in the embedding vectors associated with different tokens, with certain weights.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is the tradeoff between capability and trainability in systems?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">The more a system makes real use of its computational capabilities, the less trainable it becomes.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What happens if a neural network is too small?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">It cannot reproduce the function desired.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is a benefit of having a 'squeeze' in the middle of a neural network?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">It allows for a smaller network by forcing everything through a smaller number of intermediate neurons.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">Roughly how many computational steps are needed to train a network with n weights?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">About n<sup>2</sup> computational steps.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is a common challenge in machine learning related to neural networks?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Acquiring or preparing the necessary training data.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is a 'loss function' in the context of neural networks?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">A loss function measures how far the network's output values are from the true values. The L2 loss function, for example, is the sum of the squares of the differences between them.</p>
</div>
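<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<p style="font-weight: normal; font-size: 1.2rem;">As a small illustration, the L2 loss described above could be computed along these lines. The function name <code>l2_loss</code> and the sample numbers are hypothetical, picked only to make the arithmetic easy to follow.</p>
<pre>
```python
def l2_loss(outputs, targets):
    """Sum of the squared differences between network outputs and true values."""
    if len(outputs) != len(targets):
        raise ValueError("outputs and targets must have the same length")
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))


# Differences are 0.0, -0.5 and 1.0, so the loss is 0 + 0.25 + 1.0.
print(l2_loss([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # 1.25
```
</pre>
</div>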
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is the primary method used to train ChatGPT initially?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Showing it large amounts of existing text from the web, books, etc.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is the primary function of the transformer in ChatGPT?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">To transform the original collection of embeddings for the sequence of tokens to a final collection.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">How many attention heads does each attention block in GPT-2 have?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">12</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What do humans typically avoid with their brain activities?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Computational irreducibility</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">Why do current methods of training neural networks like ChatGPT involve significant financial investment?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Because training a network with n weights takes about n<sup>2</sup> computational steps, which is an enormous amount of computation when n is large.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is the primary function of ChatGPT?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">To produce a reasonable continuation of whatever text it has so far.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">Why might neural nets need to see many examples to train well?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">The text presents it as a general feature of neural nets: they need to 'see a lot of examples' to train well, and the examples can even be highly repetitive.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What distinguishes ChatGPT's computational setup from a Turing machine?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Results are not reprocessed by the same computational elements; each neuron is used only once per output token.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">Why might a sentence that passes semantic grammar not be realized in practice?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Because it might describe scenarios that are not possible in the actual world, like 'The elephant traveled to the Moon'.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is the length of the re-weighted embedding vector for GPT-3 in ChatGPT?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">12,288</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is the core of ChatGPT's ability to generate text?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">A large language model (LLM) that estimates the probabilities of sequences occurring.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">Why can't the probabilities for all possible n-grams be estimated from existing English text?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Not enough English text has ever been written to deduce those probabilities, because the number of possible n-grams is so vast.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">How many words are estimated to be on the web and in digitized books?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">A few hundred billion words in a crawl of the web, and another hundred billion or so in digitized books.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is the primary function of attention heads in the context of transformers?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">To look back in the sequence of tokens and package up the past in a form useful for finding the next token.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What does the success of ChatGPT imply about human language, according to the text?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">It implies there is more structure and simplicity to meaningful human language than previously known, possibly governed by simple rules.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What initiates the process of generating a new token in ChatGPT?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">An array of numbers representing the embedding vectors for the tokens so far.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What type of functions can 'no-intermediate-layer' networks learn?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Essentially linear functions.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What does the text suggest about trying to identify 'mathematical-physics-like' 'semantic laws of motion' in ChatGPT's operations?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">It suggests that studying ChatGPT's internal behavior may not reveal such laws, because we may be looking at the wrong variables.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What does learning involve in terms of data?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Compressing data by leveraging regularities</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is needed beyond syntactic grammar to deal with meaning in language?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">A semantic grammar that considers finer gradations and concepts like 'moving' and 'objects maintaining identity independent of location' is needed.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What does computational irreducibility imply about the limits of learning?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">There's a limit to the regularities that can be found</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What tends to happen to the text generated by ChatGPT for longer pieces?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">It tends to wander off in non-human-like ways.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">How can computational devices serve neural nets?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">As tools</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is used to predict the ratings given by humans to ChatGPT's outputs?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Another neural net model.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is the fundamental basis of neural nets?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Numbers</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What was Galileo's approach to understanding the time it takes for a cannon ball to fall?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">He made a model to compute the answer instead of measuring each case.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">How does ChatGPT determine the next token to generate?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">By picking up the last embedding in the collection and decoding it to produce a list of probabilities for the next token.</p>
</div>
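<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<p style="font-weight: normal; font-size: 1.2rem;">A toy sketch of that decoding step, assuming the last embedding is scored against one weight row per vocabulary token and the scores are normalized with a softmax. The three-token vocabulary, the two-dimensional embedding and the <code>unembedding</code> weights are hypothetical; real models use far larger, learned values.</p>
<pre>
```python
import math


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    top = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_probabilities(last_embedding, unembedding, vocabulary):
    """Score each vocabulary token against the last embedding, then normalize."""
    scores = [
        sum(weight * value for weight, value in zip(row, last_embedding))
        for row in unembedding
    ]
    return dict(zip(vocabulary, softmax(scores)))


# Hypothetical toy values, for illustration only.
vocabulary = ["cat", "dog", "end"]
unembedding = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
last_embedding = [2.0, 0.5]

probs = next_token_probabilities(last_embedding, unembedding, vocabulary)
print(max(probs, key=probs.get))  # cat
```
</pre>
<p style="font-weight: normal; font-size: 1.2rem;">Here the most probable token is simply printed; in practice the next token is usually sampled from the probability list rather than always taking the top entry.</p>
</div>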
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">Why is it difficult to think through the steps of a nontrivial program in one's brain?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Due to the complexity and effort required</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What allows us to perform computationally irreducible tasks?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Computers</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is the number of possible 2-grams with 40,000 common words?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">1.6 billion.</p>
</div>
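<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<p style="font-weight: normal; font-size: 1.2rem;">The arithmetic behind this figure can be checked with a short sketch: with a vocabulary of V words there are V<sup>n</sup> possible n-grams. The helper name <code>possible_ngrams</code> is hypothetical.</p>
<pre>
```python
vocabulary_size = 40_000  # number of common words, as in the card above


def possible_ngrams(n, vocabulary_size):
    """Each of the n positions can hold any word, giving vocabulary_size ** n."""
    return vocabulary_size ** n


for n in (2, 3):
    print(n, f"{possible_ngrams(n, vocabulary_size):,}")
# 2 1,600,000,000
# 3 64,000,000,000,000
```
</pre>
<p style="font-weight: normal; font-size: 1.2rem;">Going from 2-grams to 3-grams multiplies the count by another 40,000, which is why the number of possibilities quickly outgrows any available body of text.</p>
</div>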
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What does ChatGPT implicitly discover during its training?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">Rules</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What is 'transfer learning'?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">A method to reduce data requirements by transferring in important features learned in another network.</p>
</div>
<div style="margin-bottom: 10px; background-color: #f2f2f2; border-radius: 1rem; padding: 10px 20px;">
<h2 style="font-weight: bold; margin-bottom: 3px; font-size: 1.5rem;">What does using sufficiently long n-grams aim to achieve in text generation?</h2>
<p style="font-weight: normal; font-size: 1.2rem;">It aims to generate essay-length sequences of words with correct overall essay probabilities.</p>
</div>
</div>