AGISF - Week 6 - Interpretability
I'm blown away by how cool distill.pub is; definitely worth revisiting this spring break.

What is interpretability? Developing methods for training capable neural networks that produce human-interpretable checkpoints, so that we know what these networks are doing and where to intervene. Mechanistic interpretability is a subfield of interpretability that aims to understand networks at the level of individual neurons. Once we understand individual neurons, we can identify how they combine into increasingly complex representations and build a bottom-up understanding of how neural networks work. Concept-based interpretability focuses on techniques for automatically probing (and potentially modifying) human-interpretable concepts stored in the representations inside neural networks (see the probe sketch at the end of this section).

Feature Visualization (2017) by Chris Olah, Alexander Mordvintsev and Ludwig Schubert
A big portion of feature visualization is answering questions about what a network---or parts of a network---are...
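To make the activation-maximization idea behind feature visualization concrete, here is a minimal sketch: gradient ascent on an input image to maximize one channel's mean activation. The tiny untrained conv net, the channel index, and the optimization settings are placeholders of my own, not the setup from the paper; in practice you would load a trained model and pick a real layer/channel of interest.

```python
# Minimal sketch of feature visualization via activation maximization:
# gradient ascent on an input image to maximize one channel's mean activation.
# The untrained conv net below is a stand-in for a real trained model.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
)
model.eval()

channel = 7                                             # which channel to visualize
image = torch.randn(1, 3, 64, 64, requires_grad=True)   # start from noise
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    activation = model(image)[0, channel]   # feature map of the chosen channel
    loss = -activation.mean()               # negate so gradient *ascends* activation
    loss.backward()
    optimizer.step()

# `image` is now a crude picture of "what this channel is looking for".
# Real feature-visualization work adds regularizers and transformations
# (jitter, blur, frequency-space parameterization) to get interpretable images.
```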
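And for the concept-based interpretability definition above, a minimal linear-probe sketch: train a logistic-regression classifier to decode a concept label from hidden-layer activations. The random activations and labels here are placeholders just so the sketch runs; a real probe would use activations extracted from a trained model and genuine concept annotations.

```python
# Minimal sketch of a linear probe for concept-based interpretability.
# Random activations/labels stand in for real extracted representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 512))       # (examples, hidden_dim) from some layer
concept_labels = rng.integers(0, 2, size=1000)   # e.g. "image contains a dog": yes/no

X_train, X_test, y_train, y_test = train_test_split(
    activations, concept_labels, test_size=0.2, random_state=0
)

# If a simple linear model can decode the concept from the activations,
# the representation plausibly encodes that concept.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
```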