Inside every AI is a bigger AI, trying to get out
You\u2019ve probably heard AI is a \u201cblack box\u201d. No one knows how it works. Researchers simulate a weird type of pseudo-neural-tissue, \u201creward\u201d it a little every time it becomes a little more like the AI they want, and eventually it becomes the AI they want. But God only knows what goes on inside of it.
This is bad for safety. For safety, it would be nice to look inside the AI and see whether it\u2019s executing an algorithm like \u201cdo the thing\u201d or more like \u201ctrick the humans into thinking I\u2019m doing the thing\u201d. But we can\u2019t. Because we can\u2019t look inside an AI at all.
Until now! Towards Monosemanticity, recently out of big AI company/research lab Anthropic, claims to have gazed inside an AI and seen its soul. It looks like this:
https://www.astralcodexten.com/p/god-help-us-lets-try-to-understand