ReadFeeder
This script loads an RSS feed and creates an audio file via TTS for new and changed entries. Afterwards, this file is posted on a Mastodon account (or any other Fediverse service with a Mastodon.py compatible API).
Installation
- Clone Git Repo:
git clone https://dev.spittank.org/daniel/ReadFeeder.git
- Install libraries:
pip install -r requirements.txt
- Copy the configuration example config.example.yml to config.yml and adjust the settings.
Configuration
The configuration must always be complete; no checks are carried out. Missing configuration parameters will cause program errors.
ElevenLabs
elevenlabs:
  enabled: false
  api_key: <CHANGE_ME>
  voice:
    id: ODq5zmih8GrVes37Dizd
    stability: 0.5
    similarity_boost: 0.75
    style: 0.0
    use_speaker_boost: true
To use ElevenLabs for generation, activate it with enabled: true; otherwise gTTS is used. When ElevenLabs is enabled, an api_key has to be set; additionally, a voice and its settings can be specified. By default, the voice "Patrick" is set.
Audio
audio:
  artist: ReadFeeder
In this section, parts of the metadata for the generated audio files can be specified, currently the artist.
Mastodon
mastodon:
  enabled: true
  instance_url: <CHANGE_ME>
  access_token: <CHANGE_ME>
  visibility: private
  post:
    title: true
    text: false
    date: false
    link: true
In this section, all Mastodon-relevant settings are made. Posting requires an instance_url and an access_token and can be switched off completely with enabled: false.
The visibility determines who can read the postings:
- Followers only: visibility: private
- Unlisted: visibility: unlisted
- Everyone: visibility: public
Under post, each component can be switched on or off individually to control whether it appears in the post. WARNING: Currently, the length of the post is not checked. It is therefore safer to leave text disabled, or to disable all components (then only the generated file will be posted).
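Since the script itself does no size control, a guard could be added before posting. The following is only a minimal sketch of that idea, not the script's code: the function build_status, the shape of the entry dictionary, and the limit of 500 characters (Mastodon's default, which individual instances may change) are assumptions made for illustration.

```python
MAX_CHARS = 500  # assumed limit; Mastodon's default, instances may differ


def build_status(entry, post_cfg, max_chars=MAX_CHARS):
    """Join the enabled components of an entry and truncate to max_chars.

    `entry` is a hypothetical dict with the keys title/text/date/link;
    `post_cfg` mirrors the `post` section of the configuration.
    """
    parts = [
        str(entry[key])
        for key in ("title", "text", "date", "link")
        if post_cfg.get(key) and entry.get(key)
    ]
    status = "\n".join(parts)
    if len(status) > max_chars:
        # Naive cut: keeps the total length at max_chars, ellipsis included.
        status = status[: max_chars - 1].rstrip() + "…"
    return status


post_cfg = {"title": True, "text": False, "date": False, "link": True}
entry = {
    "title": "Hello",
    "text": "Long body",
    "date": "2024-01-01",
    "link": "https://example.org/a",
}
print(build_status(entry, post_cfg))
```

The cut is a plain character count, so with text enabled a long body could push the link out of the post; placing the link before the text would be one way around that.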
RSS
rss:
  feed_url: <CHANGE_ME>
  use_field: title
In this section, the feed address (feed_url) and the field to be used (use_field) must be specified. The field can be, for example, title or description. The full text of an article is usually in description, but some feeds only publish a title.
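To illustrate the effect of use_field, here is a small sketch that reads the chosen field from every item of an RSS document. It uses only the standard library and an inline sample feed; how the script actually parses the feed is not documented here, so extract_field and SAMPLE_FEED are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Inline sample so the sketch runs without network access.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title><description>Full text of the first post.</description></item>
<item><title>Second post</title></item>
</channel></rss>"""


def extract_field(feed_xml, use_field):
    """Return the text of `use_field` for each <item> that has it."""
    root = ET.fromstring(feed_xml)
    texts = []
    for item in root.iter("item"):
        value = item.findtext(use_field)
        if value:
            texts.append(value.strip())
    return texts


print(extract_field(SAMPLE_FEED, "title"))
print(extract_field(SAMPLE_FEED, "description"))
```

In the sample, the second item has no description, so choosing description silently skips it; this is the situation the note about title-only feeds refers to.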
Languages
languages:
  default: en
  used:
    - en
    - de
The default language is used for all sentences whose language cannot be detected or is not supported.
Under used, you list the languages that can occur in the feed. Restricting the candidates this way improves the accuracy of detection.
Filter
Since not all terms can (or should) be generated, two types of filters are applied to the contents.
Word filter
word_filter:
  "...": "..."
Word filters allow simple replacements according to the pattern "original": "replacement".
RegEx filter
regex_filter:
  "[tT]est": "Test"
RegEx filters allow complex replacements with regular expressions according to the pattern "pattern": "replacement".
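The two filter types can be sketched as follows. This is an illustration under assumptions rather than the script's implementation: the function name apply_filters is hypothetical, word filters are treated as plain substring replacements, and they are assumed to run before the RegEx filters.

```python
import re


def apply_filters(text, word_filter, regex_filter):
    """Apply plain replacements first, then regular-expression replacements."""
    for original, replacement in word_filter.items():
        # Assumption: a word filter is a literal substring replacement.
        text = text.replace(original, replacement)
    for pattern, replacement in regex_filter.items():
        text = re.sub(pattern, replacement, text)
    return text


word_filter = {"e.g.": "for example"}   # hypothetical entry
regex_filter = {"[tT]est": "Test"}      # entry from the configuration example
print(apply_filters("A test, e.g. this one.", word_filter, regex_filter))
```

Because the filters run in sequence, the output of one replacement can be matched by a later one, so the order of entries may matter.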
Usage
The script can be run directly:
python main.py
Functionality
Basic Procedure
- Load the configuration and initialize
- Load the feed
- For each article
  - Load the content (either from title or description, see configuration)
  - Apply filters (see configuration)
  - Generate a hash; if no audio file for this hash exists yet:
    - Split into sentences
    - Generate a hash for each sentence; if no audio file for this hash exists yet, generate it with TTS
    - Combine all sentences
    - Post entry
Hashing and Generation
The script takes some steps to keep the number of generation calls low. This reduces both the time required and the cost of calling paid APIs (like ElevenLabs).
To do this, the content of the article is hashed first. The article is only processed if there is no audio file with this hash already. This also ensures that changed articles are definitely generated and posted again.
The hashes are simple MD5 hashes, which are formed over the normalized texts (lower case, without whitespace).
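A minimal sketch of such a hash, assuming UTF-8 encoding and a hex digest (neither is stated above); the function name content_hash is hypothetical.

```python
import hashlib


def content_hash(text):
    """MD5 over the normalized text: lower case, all whitespace removed."""
    normalized = "".join(text.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Case and whitespace do not affect the hash, punctuation does.
print(content_hash("Hello World") == content_hash("hello   world\n"))
print(content_hash("Hello!") == content_hash("Hello?"))
```

MD5 serves here only as a cache key, not as a security measure, so its known collision weaknesses are of little concern for this purpose.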
When generating an article, hashes are formed again for the individual sentences and the corresponding audio files are generated if they do not exist yet. This ensures that existing parts are reused and the entire article does not have to be regenerated for the smallest changes.
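The reuse of sentence audio can be sketched like this. The file layout (one .mp3 per hash in a cache directory), the helper names, and the fake_tts stand-in are assumptions for illustration; a real run would call gTTS or ElevenLabs instead.

```python
import hashlib
import os
import tempfile


def content_hash(text):
    """MD5 over the normalized text: lower case, all whitespace removed."""
    normalized = "".join(text.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def ensure_audio(sentences, cache_dir, synthesize):
    """Return one audio path per sentence, synthesizing only on cache misses."""
    paths = []
    for sentence in sentences:
        path = os.path.join(cache_dir, content_hash(sentence) + ".mp3")
        if not os.path.exists(path):
            with open(path, "wb") as handle:
                handle.write(synthesize(sentence))
        paths.append(path)
    return paths


calls = []  # records which sentences actually reached the TTS stand-in


def fake_tts(sentence):
    """Stand-in for a TTS call; returns placeholder bytes instead of audio."""
    calls.append(sentence)
    return sentence.encode("utf-8")


cache_dir = tempfile.mkdtemp()
ensure_audio(["Hello!", "How are you?"], cache_dir, fake_tts)
# The article changed: only the modified sentence is generated again.
ensure_audio(["Hello!", "How are you today?"], cache_dir, fake_tts)
print(calls)
```

Across both runs the stand-in is called three times rather than four, because "Hello!" is served from the cache on the second run.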
On the other hand, this may reduce the quality of the spoken audio, because a consistent tonality across the article is not guaranteed. It is therefore a compromise between the length of the sentences (and thus their reusability) and the sound quality.
The separators in the function split_into_sentences are used to divide the sentences. By default, only ?, !, . and … are used. The separators are left at the end of the sentences, so "Hello!" and "Hello?" are different sentences. This is necessary, because the punctuation can change the emphasis.
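The splitting behaviour described above can be approximated with a regular expression. This is a sketch and not the code of split_into_sentences; the name split_keep_separators and the regular-expression approach are assumptions.

```python
import re

SEPARATORS = "?!.…"  # the default separators named above


def split_keep_separators(text, separators=SEPARATORS):
    """Split text into sentences, leaving each separator on its sentence."""
    sep_class = "[" + re.escape(separators) + "]"
    not_sep_class = "[^" + re.escape(separators) + "]"
    # One run of non-separator characters followed by any trailing separators.
    pattern = not_sep_class + "+" + sep_class + "*"
    return [m.strip() for m in re.findall(pattern, text) if m.strip()]


print(split_keep_separators("Hello! How are you? Fine."))
print(split_keep_separators("Wait… what?!"))
```

A purely character-based split like this also breaks at abbreviations and decimal numbers (for example "3.14"), which is one reason word or RegEx filters may be useful before splitting.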
The text could be split as finely as possible by using a space as the separator. Very few new generations would then be required, but the speech would sound very choppy.
For each sentence, the language is determined. This is primarily relevant for gTTS, where it improves the quality of the spoken output; ElevenLabs has a multilingual model.
Generating with gTTS is free (it uses Google Translate's spoken output). ElevenLabs requires an API key; a (small) free plan is available.