Daniel Spittank 2828cbff91 small fixes to README

2023-11-06 00:20:51 +01:00

ReadFeeder

This script loads an RSS feed and creates an audio file via TTS for new and changed entries. Afterwards, this file is posted to a Mastodon account (or any other Fediverse service with a Mastodon.py-compatible API).

Installation

Clone Git Repo: git clone https://dev.spittank.org/daniel/ReadFeeder.git
Install libraries:
```
 pip install -r requirements.txt
```
Copy configuration example config.example.yml to config.yml and adjust settings.

Configuration

The configuration must always be complete because no checks are carried out. Missing configuration parameters will cause program errors.
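Since the script does not validate the configuration itself, a small pre-flight check can catch missing keys before a run. The following is a minimal sketch and not part of ReadFeeder: the helper name `find_missing_keys` and the `REQUIRED` list are assumptions, and the list covers only a selection of the keys shown in this README.

```python
# Hypothetical pre-flight check; not part of ReadFeeder itself.
# REQUIRED holds dotted paths to a few keys from this README (an assumed, partial selection).
REQUIRED = [
    "elevenlabs.enabled",
    "audio.artist",
    "mastodon.enabled",
    "rss.feed_url",
    "rss.use_field",
    "languages.default",
    "languages.used",
]


def find_missing_keys(config, required=REQUIRED):
    """Return the dotted paths from `required` that are absent in `config`."""
    missing = []
    for path in required:
        node = config
        for part in path.split("."):
            # Stop at the first level where the key does not exist.
            if not isinstance(node, dict) or part not in node:
                missing.append(path)
                break
            node = node[part]
    return missing
```

The function expects the configuration as a nested dictionary, for example the result of parsing config.yml with a YAML library. An empty result means that all listed keys are present; it says nothing about whether their values are valid.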

Elevenlabs

elevenlabs:
  enabled: false
  api_key: <CHANGE_ME>
  voice:
    id: ODq5zmih8GrVes37Dizd
    stability: 0.5
    similarity_boost: 0.75
    style: 0.0
    use_speaker_boost: true

If ElevenLabs should be used for generation, it has to be activated with enabled: true; otherwise gTTS is used. When ElevenLabs is enabled, an api_key has to be set. Additionally, a voice and its settings can be specified. By default, the voice "Patrick" is set.

Audio

audio:
  artist: ReadFeeder

In this section, parts of the metadata for the generated audio files can be specified. Currently this is only the artist.

Mastodon

mastodon:
  enabled: true
  instance_url: <CHANGE_ME>
  access_token: <CHANGE_ME>
  visibility: private
  post:
    title: true
    text: false
    date: false
    link: true

In this section, all Mastodon-related settings are made. Posting requires an instance_url and an access_token and can be switched off completely with enabled: false.

The visibility determines who can read the postings:

Followers only: visibility: private
Unlisted: visibility: unlisted
Everyone: visibility: public

Under post, each component can be switched on or off individually to control whether it appears in the post. WARNING: Currently, the length of the post is not checked. Therefore, it is safer to leave text disabled, or to disable all components (then only the generated file will be posted).
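Until the script checks the length itself, the post text could be capped before posting. The following is a rough sketch, not ReadFeeder's actual code: the function name `build_status`, the entry keys, and the limit of 500 characters (the default on stock Mastodon instances; other instances may differ) are assumptions.

```python
def build_status(entry, post_cfg, limit=500):
    """Join the enabled components and cut the result to `limit` characters."""
    # Same order as in the configuration example: title, text, date, link.
    parts = [
        str(entry[name])
        for name in ("title", "text", "date", "link")
        if post_cfg.get(name) and entry.get(name)
    ]
    status = "\n".join(parts)
    if len(status) > limit:
        # Naive cut: this can truncate a trailing link, so treat it as a last resort.
        status = status[: limit - 1] + "…"
    return status
```

With all components disabled, the function returns an empty string, which corresponds to posting only the generated file.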

RSS

rss:
  feed_url: <CHANGE_ME>
  use_field: title

In this section, the feed address (feed_url) and the field to be used (use_field) must be specified. The field can be, for example, title or description. The full text of an article is usually in description, but some feeds only publish a title.

Languages

languages:
  default: en
  used:
    - en
    - de

The default language is used for all sentences whose language cannot be determined or is not supported.

With the languages under used, you define a selection of languages that can occur in the feed. This improves the accuracy of detection.

Filter

Since not all terms can (or should) be generated, two types of filters are applied to the contents.

Word filter

word_filter:
  "...": "..."

Word filters allow simple replacements according to the pattern "original": "replacement".

RegEx filter

regex_filter:
  "[tT]est": "Test"

RegEx filters allow complex replacements with regular expressions according to the pattern "pattern": "replacement".
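How the two filter types could be applied can be sketched as follows. This is an illustration under assumptions, not the script's actual code: the function name `apply_filters` is hypothetical, word filters are treated as plain substring replacements, and word filters are assumed to run before RegEx filters.

```python
import re


def apply_filters(text, word_filter, regex_filter):
    """Apply literal replacements first, then regular-expression replacements."""
    # Word filter: plain substring replacement (assumed; no word-boundary handling).
    for original, replacement in word_filter.items():
        text = text.replace(original, replacement)
    # RegEx filter: each key is a pattern, each value its replacement.
    for pattern, replacement in regex_filter.items():
        text = re.sub(pattern, replacement, text)
    return text
```

With the RegEx example above, `apply_filters("a test", {}, {"[tT]est": "Test"})` turns both "test" and "Test" into "Test".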

Usage

The script can be run directly.

python main.py

Functionality

Basic Procedure

Load the configuration and initialize
Load the feed
For each article
1. Load the content (either from title or description, see configuration)
2. Apply filters (see configuration)
3. Generate a hash; if no audio file for this hash exists yet...
  1. Split into sentences
  2. Generate a hash for each sentence; if no audio file for this hash exists yet, generate it with TTS
  3. Combine all sentences
4. Post entry
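The per-sentence part of step 3 can be sketched as a small cache lookup. This is a simplified illustration, not the script's actual code: the function name `ensure_sentence_audio`, the `.mp3` file extension, and the two callables are assumptions. `key_for` stands in for the hash function described under "Hashing and Generation", and `synthesize` stands in for the TTS backend.

```python
from pathlib import Path


def ensure_sentence_audio(sentences, cache_dir, key_for, synthesize):
    """Return one audio path per sentence, calling `synthesize` only on cache misses.

    key_for:    callable mapping a sentence to its hash string (assumed).
    synthesize: callable mapping a sentence to audio bytes (assumed TTS stand-in).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for sentence in sentences:
        path = cache_dir / (key_for(sentence) + ".mp3")
        if not path.exists():
            # Cache miss: generate the audio once and store it under its hash.
            path.write_bytes(synthesize(sentence))
        paths.append(path)
    return paths
```

The returned paths are in sentence order, so they can be concatenated afterwards to build the audio file for the whole article.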

Hashing and Generation

The script takes several steps to keep the number of generation calls low. This reduces the time required and also the cost of calling paid APIs (like ElevenLabs).

To do this, the content of the article is hashed first. The article is only processed if there is no audio file with this hash already. This also ensures that changed articles are definitely generated and posted again.

The hashes are simple MD5 hashes, which are computed over the normalized texts (lower case, with all whitespace removed).
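A hash of this kind could be computed as shown below. This is a minimal sketch: the function name `content_hash` and the UTF-8 encoding are assumptions, and the script's exact normalization may differ in detail.

```python
import hashlib


def content_hash(text):
    """MD5 hex digest of `text` after lowercasing and removing all whitespace."""
    # str.split() without arguments splits on any whitespace run,
    # so joining the pieces removes spaces, tabs and line breaks alike.
    normalized = "".join(text.lower().split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

Because of the normalization, "Hello World" and "hello world" with extra spaces or line breaks produce the same hash, so changes in case or whitespace alone do not trigger a new generation.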

When generating an article, hashes are formed again for the individual sentences, and the corresponding audio files are generated if they do not exist yet. This ensures that existing parts are reused and the entire article does not have to be regenerated for the smallest changes. On the other hand, this may reduce the quality of the spoken audio, because sentences generated separately may not have a consistent tonality across the article. The length of the sentences is therefore a compromise between their reusability and the sound quality.

The separators in the function split_into_sentences are used to divide the sentences. By default, only ?, !, . and … are used. The separators are left at the end of the sentences, so "Hello!" and "Hello?" are different sentences. This is necessary because the punctuation can change the emphasis.

The text could also be split as finely as possible by using a space as the separator. Very few new generations would then be required, but the speech would sound very choppy.
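Splitting while keeping the separators attached can be sketched as follows. This is not the script's split_into_sentences function but a hypothetical stand-in named `split_sentences`; it is deliberately naive and would, for example, also split a decimal number such as 3.14 at the dot.

```python
import re

# Default separators as listed above.
SEPARATORS = "?!.…"


def split_sentences(text, separators=SEPARATORS):
    """Split `text` after each run of separators, keeping them on the sentence."""
    sep_class = re.escape(separators)
    # One or more non-separator characters, followed by any trailing separators.
    pattern = "[^{0}]+[{0}]*".format(sep_class)
    return [part.strip() for part in re.findall(pattern, text) if part.strip()]
```

Passing `separators=" "` gives the word-by-word split mentioned above, which maximizes reuse at the expense of natural-sounding speech.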

For each sentence, the language is determined. This is primarily relevant for gTTS in order to improve the quality of spoken output. ElevenLabs has a multilingual model.

Generating with gTTS is free (it uses Google Translate's spoken output). ElevenLabs requires an API key; a (small) free plan is available.
