
Showing posts from February, 2023

AGI Fundamentals - Week 3 - Goal misgeneralisation

MiniTorch has been particularly helpful this week. I've been kept indoors by the sudden week-long downpour of rain.

def goal misgeneralisation: when an agent in a new situation generalises to behaving in a competent yet undesirable way, because it learned the wrong goal from previous training.

Goal Misgeneralisation: Why Correct Specifications Aren't Enough For Correct Goals

A system typically arrives at goal misgeneralisation like so:

1. The system is trained with a correct specification
2. The system only sees specification values on the training data
3. The system learns a policy
4. ... which is consistent with the specification on the training distribution
5. Under a distribution shift
6. ... the policy pursues an undesired goal

Some function f maps an input x in the set of inputs X to a label y in the set of labels Y. In RL, X is the set of states or observation histories, and Y is the set of actions. A scoring function s evaluates the performance of f.
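To make the failure concrete, here is a minimal, hypothetical Python sketch (the environment and all names are my own illustration, not from the paper): the learned policy matches the intended specification everywhere on the training distribution, so the scoring function s cannot tell the two goals apart until the distribution shifts.

```python
# Minimal, hypothetical sketch of goal misgeneralisation.
# f : X -> Y maps observations to actions; a scoring function s
# evaluates f. All names here are illustrative.

def intended_goal(obs):
    """The specification: move towards the exit."""
    return "left" if obs["exit"] == "left" else "right"

def learned_policy(obs):
    """What training actually produced: follow the red marker.
    On the training distribution the marker always sits at the exit,
    so this policy is indistinguishable from the intended one."""
    return "left" if obs["red_marker"] == "left" else "right"

def s(policy, states):
    """Scoring function: fraction of states where the policy
    agrees with the intended specification."""
    return sum(policy(x) == intended_goal(x) for x in states) / len(states)

# Training distribution: marker and exit always coincide.
train = [{"exit": side, "red_marker": side} for side in ("left", "right")]

# Distribution shift: the marker moves to the opposite side of the exit.
shifted = [{"exit": "left", "red_marker": "right"},
           {"exit": "right", "red_marker": "left"}]

print(s(learned_policy, train))    # 1.0: consistent with the spec in training
print(s(learned_policy, shifted))  # 0.0: competently pursues the wrong goal
```

Both goals look identical on the training data; only the shifted states reveal which one was actually learned.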

AGI Fundamentals - Not Attending EAG

The reason that I will not be attending EA Global Bay Area this February is that I do not have a strong incentive to repeat the EAGxBerkeley activities and networking. I found my time better utilized consuming LessWrong posts, reading EA content, and producing content. I will be attending EAG London in May, with up to 90% certainty.

I really do love tea lattes: the stuff makes my thinking go much faster than normal. I'd even like to think I am funnier after caffeine consumption; what a waste of a comedic genius, pouring its guts out to instrumental convergence. It would be super cool to find a way to attend NeurIPS. I need someone to sponsor me. Bain's new client OpenAI might be my "in". Now I just need to get into Bain. *aggressively sips coffee* It is here I took a longer break and went off to wonderland.

https://aisafetyideas.com/ is a visual representation of the alignment landscape and how each AI safety entity is categorized. I can see my current involvement!

AGI Fundamentals: Week 2 - Reward misspecification and instrumental convergence

Since my flight to Atlanta has been delayed for three hours, I find this time adequate for week 2: reward misspecification and instrumental convergence.

Specification gaming: the flip side of AI ingenuity. Specification gaming is when a system "achieves" its stated objective without bringing about the intended outcome. Reinforcement learning agents may find ways of collecting reward without successfully completing the task delegated to them.

DeepMind research provides a running list of existing specification gaming examples. In the Lego stacking task, the objective is to elevate a red Lego piece by stacking it on top of a blue one. The agent was rewarded for the height of the bottom face of the red block while the block was not being touched. Instead of stacking, the agent simply flipped the red block over to collect the reward. It seems to me that this is a misspecified reward function at work; a toy sketch follows below.

With all my talk about alignment, I understand I haven't formulated a clear...
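Here is that misspecification as a toy Python sketch (the state encoding and numbers are my own illustration, not DeepMind's actual reward code): the reward only tracks the height of the red block's bottom face, so flipping the block scores exactly as well as stacking it, at a fraction of the effort.

```python
# Toy illustration of the Lego stacking misspecification.
# The state encoding and numbers are mine, not DeepMind's.

BLOCK_HEIGHT = 1.0  # height of one Lego block

def misspecified_reward(state):
    """Pay out the height of the red block's bottom face
    whenever the block is not being touched."""
    if state["touching"]:
        return 0.0
    return state["red_bottom_face_height"]

# Intended solution: red block stacked on the blue block,
# so its bottom face sits one block-height up.
stacked = {"red_bottom_face_height": BLOCK_HEIGHT, "touching": False}

# Exploit: flip the red block upside down; its bottom face now
# points up at the same height, with no stacking required.
flipped = {"red_bottom_face_height": BLOCK_HEIGHT, "touching": False}

print(misspecified_reward(stacked))  # 1.0
print(misspecified_reward(flipped))  # 1.0: the cheap flip scores just as well
```

The reward function cannot distinguish stacking from flipping, so the easier behaviour wins.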

AGI Fundamentals: Week 1

I followed the Machine Learning Safety Scholars curriculum held by the Center for AI Safety in summer 2022, and I plan to finish the AGI Safety Fundamentals curriculum by June 2023. These notes are my jargon megafile; I should have written this paragraph in week 0.

I've found that if I spend enough time engaging with a certain activity, I can grow to be extremely fond of it. Previously, the activities I associated myself with were field hockey, computational biology, and journalism. I am pivoting to tech consulting, AI safety, and general programming.

From interviewing at Anthropic and Epoch, I've found start-ups to be much better resourced and incentivized than academia. Community projects and decentralized efforts like EleutherAI face the expertise bottleneck; I wonder if aggregation is possible, and how.

___________________________________________________________________

def foundation models: models trained on broad data that can be adapted to a wide range of downstream tasks.
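As a concrete, if standard, illustration of "adapted to a wide range of downstream tasks", here is a minimal fine-tuning sketch using the Hugging Face transformers library; the model name, example data, and hyperparameters are placeholders of my own choosing, not from the curriculum.

```python
# Minimal sketch: adapting a pretrained foundation model to a
# downstream task (binary sentiment classification).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # broad pretraining, new task head

batch = tokenizer(["this post is great", "delayed flights are awful"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# One gradient step of task-specific fine-tuning.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
print(float(loss))
```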