
The examples folder contains several example training tasks, and the configs folder holds their associated config files. examples.randomwalks does offline reinforcement learning on a set of random walks over a graph, learning to stitch them into shortest paths to a destination node. examples.simulacra optimizes prompts using a dataset of prompts and their ratings (https://github.com/JD-P/simulacra-aesthetic-captions). examples.architext tries to optimize building designs represented as text by minimizing the number of rooms (the pretrained model is available on Hugging Face under its own license). examples.ilql_sentiments and examples.ppo_sentiments train a model to generate movie reviews with positive sentiment. The ILQL example works offline, fitting to the sentiment scores of the IMDB dataset. The PPO example works online, sampling from a model finetuned on IMDB and scoring the samples with a learned sentiment reward model.

If you want to use trlx for a custom task, you can adapt these scripts and tune their hyperparameters to your problem.
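To make the random-walks task more concrete, here is a minimal, self-contained sketch of the kind of scored data an offline learner could train on. This is not the trlx implementation. The graph, `GOAL`, and the scoring rule (shortest-path length divided by the walk's length when the walk reaches the goal, otherwise 0) are hypothetical assumptions chosen for illustration.

```python
import random
from collections import deque
from typing import Dict, List, Optional

# Hypothetical small directed graph: node -> list of successor nodes.
GRAPH: Dict[int, List[int]] = {0: [1, 2], 1: [3], 2: [1, 3], 3: [4], 4: []}
GOAL = 4  # hypothetical destination node


def shortest_path_len(graph: Dict[int, List[int]], start: int, goal: int) -> Optional[int]:
    """Breadth-first search distance from start to goal, or None if unreachable."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def random_walk(graph: Dict[int, List[int]], start: int, max_steps: int,
                rng: random.Random) -> List[int]:
    """Follow uniformly random edges until a dead end or max_steps edges are taken."""
    walk = [start]
    while len(walk) - 1 < max_steps and graph[walk[-1]]:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk


def walk_score(graph: Dict[int, List[int]], walk: List[int], goal: int) -> float:
    """Assumed reward: 1.0 for an optimal path, less for detours, 0.0 if goal is missed."""
    if goal not in walk:
        return 0.0
    steps = walk.index(goal)
    if steps == 0:
        return 1.0
    best = shortest_path_len(graph, walk[0], goal)
    return best / steps


if __name__ == "__main__":
    rng = random.Random(0)
    # An offline dataset: random (often suboptimal) walks, each paired with its score.
    for w in (random_walk(GRAPH, 0, max_steps=10, rng=rng) for _ in range(5)):
        print(w, walk_score(GRAPH, w, GOAL))
```

An offline method only sees walks like these and their scores, so the point of the task is to learn a policy that outputs paths scoring higher than the typical random walk in the dataset.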