Posts

Showing posts from January, 2023

What AI can do: enumerated

"Frankenstein was written during the first Industrial Revolution, a period of enormous changes that provoked confusion and anxiety for many. It asked searching questions about man's relationship with technology: are we creating a monster we cannot control, are we losing our humanity, our compassion, our ability to feel empathy and emotions?"  This was written by Paolo Gallo, the Chief Human Resources officer at World Economic Forum.  "Frankenstein relies on the notion that humans will inherently reject artificial intelligence as unnatural and bizarre. A great deal of that is owed to the particularly odd appearance of Frankenstein's monster... But what about when AI comes in a more attractive package, one that has real utility?" Now here is the ever-expanding list:  - playing checkers  - natural language processing  - be an intelligent personal assistant - GANs (optimizing both a generator & discriminator) - DALL-E (optimizing both an embedding prior mode

Stoicism

Every rendezvous is your exultant surrender
to my intransigent soul and unlimited ambition
I have known pain or fear or guilt never
for rationality is my best vindication
Though your defiant frankness is an unrevealing sincerity
I practice the principle of scarcity
though indulging how the corners of your eyes crinkle and unfold
I plan to let time erode
Stoicism bids at the price of innocence
so you have earned the right to be light-hearted
I am done with melancholy
all my troubled waters are charted

How likely is deceptive alignment in practice?

How likely is deceptive alignment in practice? Speaker: Buck Shlegeris

What do ML inductive biases look like?
1. High path-dependence: different training runs can converge to very different models depending on the particular path taken through model space.
2. Low path-dependence: similar training processes converge to essentially the same, simple solution, regardless of early training dynamics.

Deceptive alignment in the high path-dependence world: suppose our training process is good enough that, for the model to do well, it has to fully understand what we want; essentially, this is what you get in the limit of doing enough adversarial training.

Goal attainment for models:
1. How much marginal performance improvement do we get from each step toward the model class?
2. How many steps are needed until the model becomes a member of that class?

Types of alignment:
1. Internal alignment: an internally aligned mesa-optimizer is a robustly aligned mesa-optimizer that has internalized the

Heuristics

DesignBoom graphic: sculptural, AI-generated facades, Renaissance and Baroque forms with fluid silk.

What is a heuristic technique? From our most credible source, I gathered that it is an approach to problem-solving that employs a practical method, arriving at a satisfactory solution by taking mental shortcuts and easing the cognitive load of making a decision. These practices are not, however, resistant to cognitive biases. We operate under bounded rationality, or in my own words, conditional rationality. It is conditional in the sense that, within computational boundaries and situational urgency, the behavioral strategy we select is the best one achievable. Heuristics converge on human psychology, delving into self-consciousness and fine-tuning, one decision upon another. One of the most frequently used heuristics is anchoring and adjustment, which simply sets a lower or higher bound to allow for systematic adjustments that reasonably deviate from th
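As a toy illustration of anchoring and adjustment as just described, here is a small Python sketch. The numbers, the step_fraction parameter, and the helper name anchor_and_adjust are all hypothetical, chosen only to show how partial adjustment away from an anchor leaves the final estimate biased toward it.

# Toy sketch of the anchoring-and-adjustment heuristic (illustrative values only):
# start from an anchor (a lower or higher bound) and adjust toward the evidence
# in a few coarse steps; the adjustment typically stops short of the true value,
# which is the bias classically associated with this heuristic.

def anchor_and_adjust(anchor, evidence, steps=3, step_fraction=0.25):
    """Estimate a quantity by repeatedly nudging an initial anchor toward the evidence."""
    estimate = anchor
    for _ in range(steps):
        estimate += step_fraction * (evidence - estimate)  # partial, not full, adjustment
    return estimate

# Example: guessing a city's population from an anchor of 100,000 when the
# available evidence points to roughly 400,000.
print(anchor_and_adjust(anchor=100_000, evidence=400_000))  # ~273,000, still biased toward the anchor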

Alignment landscape, learned content

More research output appears on LessWrong than on arXiv, which is interesting.

How might white-box methods fit into the alignment plan?
1. Model internal access during training and deployment
2. The promise of AI to empower

Within every research group working on ML models, we can decompose the workforce into these categories:
1. Data team (paying humans to generate data points)
2. Oversight team
3. Deployment team for SGD-where-RLHF-is-the-algorithm

RLHF is Reinforcement Learning from Human Feedback (a minimal reward-model sketch follows after this excerpt), and the problems with baseline RLHF are oversight and catastrophes. Current proposals targeting these problems are:
1. Using AIs to help oversee (oversight)
2. Adversarial training (catastrophes)

After reading Holden Karnofsky's post "How might we align transformative AI if it’s developed very soon?", we can conclude that the remaining problems for current ML models are:
1. Eliciting latent knowledge
2. Easier to detect fakes than to produce fakes. For ChatGPT at least, it is difficult
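Since the excerpt names RLHF as the algorithm behind the deployment step, here is a minimal sketch of its first ingredient: fitting a reward model on pairwise human preference comparisons. This is a hedged toy, not the post's method; the embedding dimension, network, learning rate, and the random stand-in data are all assumptions.

# Minimal sketch of one RLHF ingredient: training a reward model on pairwise
# human preferences (chosen vs. rejected responses). Shapes and data are
# illustrative assumptions; a real system would embed actual prompt/response text.
import torch
import torch.nn as nn
import torch.nn.functional as F

emb_dim = 32
reward_model = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for step in range(500):
    # Stand-ins for embeddings of (prompt + chosen) and (prompt + rejected) responses.
    chosen = torch.randn(16, emb_dim)
    rejected = torch.randn(16, emb_dim)

    # Bradley-Terry style preference loss: reward(chosen) should exceed reward(rejected).
    margin = reward_model(chosen) - reward_model(rejected)
    loss = -F.logsigmoid(margin).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

# The trained reward model would then score samples from the policy inside an RL
# loop (e.g. PPO), which is the "SGD-where-RLHF-is-the-algorithm" deployment step.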

EA and the FTX collapse

EA and the state of FTX, EAGxBerkeley, Dec 2-4, 2022

- The #1 rule in cryptocurrency: never let customers trade collateral, ensuring no gambling
- Alameda Research was a customer that was allowed to gamble during the 2021 bull market
- Alameda was short about 8 billion dollars (not known at the time), which resulted in liabilities exceeding assets, one of the greatest accounting errors in history

So what can we say about the FTX liquidation in general? Perhaps the fact that it was not handling risk management well, with fraudulent accounting and risky derivatives, caused domino effects:
1. Misconduct in statements from SBF was not publicly known
2. Depositors are victims (virtue ethics...)

What can we learn from this mistake?
1. Make sure future EAs know how smart oversight works
2. More governance; sloppy accounting demanded by an agent/CEO cannot be perpetuated further
3. Creating more "firewalls"

What are some of the persistent problems?