Posts

Showing posts from December, 2022

Base optimizer and mesa-optimizer comparison

Here I document my understanding of what a base optimizer and a mesa-optimizer are, and compare the two. The GitHub link is here: https://github.com/evhub/mesa-optimization/blob/master/post1.md

- The base optimizer is a process such as gradient descent that searches for a model, and the model is designed to accomplish some specific task.
- The base optimizer can produce a mesa-optimizer: a model that is itself good at optimizing.
- A mesa-optimizer differs from a subsystem in that it is an optimization process, not an agent.
- Mesa-optimization happens when the base optimizer, while searching for a model, finds one that is itself an optimizer.
- Every optimizer has an objective: the base optimizer has a base objective, and the mesa-optimizer has a mesa-objective.
- Unlike the base objective, the mesa-objective is not specified directly by the programmers.
- Mesa-optimization can lead to a mismatch between the base objective and the mesa-objective. This is called misalignment.
- We can call a model generated by the base optimizer a learned algorithm.

Now we will
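To make the base/mesa distinction concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration and does not come from the linked post: the names `base_loss`, `mesa_optimizer`, and `train_base_optimizer`, the target value 3.0, and the choice of a single learned number (`goal`) as the mesa-objective. The base optimizer is gradient descent (with a finite-difference gradient) over that one number, and the learned algorithm is itself a small optimizer that searches for an action at run time.

```python
def base_loss(action, target=3.0):
    """Base objective, written explicitly by the programmer (hypothetical task)."""
    return (action - target) ** 2


def mesa_optimizer(goal, steps=50, lr=0.2):
    """The learned algorithm: an inner search for the action that best
    satisfies its own mesa-objective, -(action - goal)**2.

    `goal` is never written down by the programmer; it is whatever value
    the base optimizer ends up selecting.
    """
    action = 0.0
    for _ in range(steps):
        # Gradient ascent on the mesa-objective -(action - goal)**2.
        action += lr * 2.0 * (goal - action)
    return action


def train_base_optimizer(goal=0.0, steps=200, lr=0.1, eps=1e-4):
    """Base optimizer: gradient descent on the base loss, where the only
    trainable parameter is the mesa-optimizer's internal goal."""
    for _ in range(steps):
        # Central finite difference of the base loss with respect to `goal`.
        loss_plus = base_loss(mesa_optimizer(goal + eps))
        loss_minus = base_loss(mesa_optimizer(goal - eps))
        grad = (loss_plus - loss_minus) / (2.0 * eps)
        goal -= lr * grad
    return goal


learned_goal = train_base_optimizer()
chosen_action = mesa_optimizer(learned_goal)
print(f"learned mesa-objective goal: {learned_goal:.3f}")
print(f"action chosen at run time:   {chosen_action:.3f}")
```

In this toy setting the learned goal ends up agreeing with the base objective's target. The mismatch described in the list above would correspond to a case where the learned goal scores well during training but points somewhere else on inputs the base optimizer never evaluated.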

AI, ML models, and data ethics

There is definitely no better way to spend Christmas than to look at AI progress and, in particular, to gain forecasting capabilities from trend extrapolation.

To “fermiize” a question is to break it down into sub-questions for which more rigorous methods can be applied. For example, Bayesian reasoning computes a probability by first breaking it down into more manageable parts. I will fermiize AI progress.

AI and Compute

Three factors drive the advance of AI: algorithmic innovation, data, and the amount of compute available for training. What I gathered is that compute neither translates directly into usefulness nor acts as a hard constraint on performance. Important breakthroughs are still made with modest amounts of compute.

My summary of Srivastava's Dropout paper

Deep neural networks are powerful tools with many parameters, but such machine learning systems can be laborious and slow to use. This is due to units co-adapting and the difficulty in integ
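
As a rough illustration of the dropout idea summarized above, here is a minimal pure-Python sketch. It uses the "inverted dropout" variant, which rescales the surviving activations during training; that is an assumption made for simplicity and is not necessarily the exact formulation in the paper. The function name `dropout` and its parameters are hypothetical.

```python
import random


def dropout(activations, drop_prob, training=True, rng=None):
    """Randomly zero each activation with probability `drop_prob`.

    Surviving activations are divided by the keep probability so that the
    expected value of each unit is unchanged (inverted dropout). When
    `training` is False the activations pass through untouched.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


hidden = [0.5, 1.2, -0.3, 2.0]
print(dropout(hidden, drop_prob=0.5, rng=random.Random(0)))  # training mode
print(dropout(hidden, drop_prob=0.5, training=False))        # evaluation mode
```

Because a different random subset of units is silenced on every training step, no unit can rely on a specific partner always being present, which is the intuition behind discouraging co-adaptation.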