Adding reasoning to your AI? Take these resources; they may help you on your way.
**AGI / CAUSALITY / FORMAL GRAMMAR**

| Resource | Description | Link |
|---|---|---|
| DeepMind Chomsky Hierarchy | problems crafted for the FSM/PDA/TM levels of the hierarchy | [1] |
| automata | a neurallambda tool to generate strings from grammars | [1] |
| I am a Strange Dataset | tough for LLMs because of self-reference | [1] |
| DiagGSM8K | NL reasoning benchmark | [1] |
| CLadder | causal reasoning | [1] |
| Cause-Effect Pairs | 108 datasets of two-variable dynamics (not NL) | [1] |
| MNLI Entailment | sentence parsing + entailment | [1] |
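The grammar-generation idea behind the entries above can be sketched in a few lines: recursively sample strings from a toy context-free grammar (here `a^n b^n`, a classic language that needs a PDA rather than an FSM to recognize). This is an illustrative sketch, not the actual interface of the neurallambda tool.

```python
import random

# Nonterminals map to lists of alternative productions; terminals are plain strings.
GRAMMAR = {
    "S": [["a", "S", "b"], ["a", "b"]],  # generates a^n b^n
}

def generate(symbol="S", max_depth=10):
    """Sample one string from the grammar by recursive expansion."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    options = GRAMMAR[symbol]
    # Near the depth limit, force the shortest production so expansion terminates.
    if max_depth <= 1:
        options = [min(options, key=len)]
    production = random.choice(options)
    return "".join(generate(s, max_depth - 1) for s in production)

print(generate())  # e.g. "aaabbb"
```

Swapping in a different `GRAMMAR` dict gives generators for other toy languages at the same or different levels of the hierarchy.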
**AGENT/TOOL**

| Resource | Description | Link |
|---|---|---|
| THUDM AgentInstruct | long-form dialogs | [1] |
| WANG AgentInstruct | GPT-3-synthesized instructions | [1] |
| KnowLM Tool | prompt + tool call + answer | [1] |
| Glaive Tool Usage | system prompt declares tools + prompt + answer | [1] |
| opentoolformer retrieval | prompt + tool call | [1] |
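The "system prompt declares tools + prompt + answer" shape shared by these datasets looks roughly like the record below. Field names and the tool signature are illustrative assumptions, not any dataset's actual schema.

```python
import json

# Hypothetical tool-use training record: the system prompt declares the
# available tool, the user asks, and the target output is a structured call.
sample = {
    "system": "You have access to: get_weather(city: str) -> str",
    "user": "What's the weather in Oslo?",
    "assistant": json.dumps({"tool": "get_weather", "arguments": {"city": "Oslo"}}),
}

print(sample["assistant"])
```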
**CODE**

| Resource | Description | Link |
|---|---|---|
| rosetta | the same program in many different languages | [1] |
| EvoEval Tool Use | 100 prompt + code + tests triples | [1] |
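A prompt + code + tests triple can be checked mechanically: execute the candidate code, then run the test snippet in the same namespace. A minimal sketch (the helper name is illustrative; real harnesses also sandbox and time-limit execution):

```python
def passes_tests(code: str, tests: str) -> bool:
    """Return True iff `code` executes cleanly and `tests` raises no assertion."""
    namespace = {}
    try:
        exec(code, namespace)   # define the candidate solution
        exec(tests, namespace)  # run its assertions against that definition
        return True
    except Exception:
        return False

candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5"
print(passes_tests(candidate, tests))  # True
```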
**MATH/LOGIC**

| Resource | Description | Link |
|---|---|---|
| gsm8k | Grade School Math 8K | [1] |
| MetaMath | one-shot math | [1] |
| MetaMathFewShot | few-shot math | [1] |
| MathPile | 9B tokens from the filtered internet | [1] |
| LogiQA | NL multiple choice, requires abstraction | [1] |
| Logic-LM | a model combining automated theorem provers and LLMs | [1] |
| Coq Facts | 270k Coq theorem-prover programs | [1] |
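The symbolic-checking side that Logic-LM pairs with an LLM can be illustrated at propositional scale: premises entail a conclusion iff no truth assignment makes every premise true and the conclusion false. A brute-force truth-table sketch (the encoding of formulas as Python boolean expressions is an illustrative choice, not any dataset's format):

```python
from itertools import product

def entails(premises, conclusion, variables):
    """Check propositional entailment by enumerating all truth assignments."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        # A counterexample: all premises hold but the conclusion fails.
        if all(eval(p, {}, env) for p in premises) and not eval(conclusion, {}, env):
            return False
    return True

print(entails(["p or q", "not p"], "q", ["p", "q"]))  # True  (disjunctive syllogism)
print(entails(["p or q"], "p", ["p", "q"]))           # False (q alone suffices)
```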
**NATURAL LANGUAGE**

| Resource | Description | Link |
|---|---|---|
| UltraInteract_sft | GPT-generated iterated-reasoning dialogs | [1] |
| MUD videogames | various; could serve as training data | |
| Winogrande | ambiguous sentences, fill in one word | [1] |
| Winograd_wsc | ambiguous sentences, choose the right word | [1] |
| Contradiction | two phrases: do they contradict? | [1] |
| Recognizing Textual Entailment | two phrases: do they entail each other? | [1] |
| Textual Entailment Pool | more entailment | [1] |
| Answer Validation | two phrases: does the answer solve the question? | [1] |
| Monotonicity Entailment | x is true; does y follow? | [1] |
| entailment | passage + question -> T/F | [1] |
| Commonsense QA | multiple-choice QA | [1] |
| GLUE | several datasets | [1] |
| custom multi-hop | use Wikipedia's graph of articles | |
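The custom multi-hop idea is simple to prototype: treat articles as nodes and links as edges, then extract shortest paths as k-hop reasoning chains. A sketch with a hand-built stand-in graph (a real pipeline would walk the Wikipedia link dump):

```python
from collections import deque

# Stand-in for Wikipedia's article-link graph.
links = {
    "Alan Turing": ["Enigma machine", "Cambridge"],
    "Enigma machine": ["World War II"],
    "Cambridge": ["England"],
    "World War II": ["England"],
}

def hop_chain(start, goal):
    """Breadth-first search; returns one shortest chain of articles, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in links.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(hop_chain("Alan Turing", "England"))  # ['Alan Turing', 'Cambridge', 'England']
```

Each returned chain can then be turned into a question whose answer requires composing one fact per hop.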
**TOY PROBLEMS**

| Resource | Description | Link |
|---|---|---|
| Big Bench Hard | 23 challenges (only 6k datapoints) | [1] |
| logical entailment dataset | logic strings by DeepMind | [1] |
| logical entailment dataset code | (generate it yourself) | [1] |
| FSM Game | generate strings according to a grammar | |
| Adaptive Grammar | a grammar rule might change | |
| String/Graph Rewriting | | string_rewriting.py |
| LibraryOfLogic | generate NL from multiple games | [1] |
| AB-XY Game | | |
| word ladder | | |
| parser | | |
| longest common subsequence | | |
| string reversal | | |
| Wisconsin card sorting | | |
| anagram | | |
| palindrome | | |
| permutation composition | | |
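Most of these toy problems can be generated procedurally. As one example, a string-rewriting system in the spirit of the `string_rewriting.py` entry above (this is an independent sketch, not the repo's file): repeatedly apply the first matching rule until a fixpoint is reached.

```python
# Each rule is a (pattern, replacement) pair applied to the leftmost match.
RULES = [("ab", "b"), ("ba", "b")]

def rewrite(s, rules, max_steps=100):
    """Apply rules until none fires or the step budget runs out."""
    for _ in range(max_steps):
        for pattern, replacement in rules:
            if pattern in s:
                s = s.replace(pattern, replacement, 1)
                break
        else:
            return s  # no rule fired: fixpoint reached
    return s

print(rewrite("aabba", RULES))  # "bb"
```

A (start string, rule set, final string) triple is then a supervised datapoint; the model must internalize the rewriting procedure to predict the output.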
**TOKEN-AUGMENTED REASONING**

| Resource | Description | Link |
|---|---|---|
| Reasoning Tokens | Self-Reasoning Tokens: teaching models to think ahead | [1] |
| Quiet-STaR | LLMs can teach themselves to think before speaking | [1] |
| Multi-token Prediction | multi-token prediction favors the development of induction heads and algorithmic reasoning capabilities | https://arxiv.org/abs/2404.19737 |
**INDIRECT REASONING (IR)**

| Resource | Description | Link |
|---|---|---|
| Contrapositive and Contradiction for Automated Reasoning | uses the logic of contrapositives and contradictions for factual reasoning and mathematical proofs | https://arxiv.org/pdf/2402.03667 |
**DIRECT REASONING (DR)**

| Resource | Description | Link |
|---|---|---|
| Graph of Thoughts (GoT) | models the information generated by an LLM as an arbitrary graph | https://arxiv.org/abs/2308.09687 |
| Self-Consistency | leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking, all leading to its unique correct answer | https://arxiv.org/abs/2203.11171 |
| Chain of Thought | a series of intermediate reasoning steps significantly improves the ability of LLMs to perform complex reasoning | https://arxiv.org/abs/2201.11903 |
| Chain of Thought without Prompting | CoT reasoning paths can be elicited from pre-trained LLMs simply by altering the decoding process | https://arxiv.org/abs/2402.10200 |
| Iterative Reasoning Preference Optimization | iterated DPO applied to CoT, repeated until performance saturates on reasoning tasks | https://arxiv.org/pdf/2404.19733 |
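Self-consistency, stripped to its core, is just majority voting over sampled chain-of-thought answers. A sketch where `sample_cot` stands in for an LLM sampler returning a final answer per sampled reasoning path:

```python
from collections import Counter

def self_consistency(sample_cot, question, n=5):
    """Sample n reasoning paths and return the most common final answer."""
    answers = [sample_cot(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Fake sampler: most reasoning paths converge on the correct answer.
fake_paths = iter([4, 4, 5, 4, 4])
print(self_consistency(lambda q: next(fake_paths), "2 + 2 = ?"))  # 4
```

The vote is over final answers only, so distinct reasoning paths that reach the same result reinforce each other.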