Speech Note

Rating: 4.9 (22 votes)

Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine Translator.

Speech Note lets you take, read and translate notes in multiple languages. It uses Speech to Text, Text to Speech and Machine Translation to do so. Text and voice processing takes place entirely offline, locally on your phone, without using a network connection.

Your privacy is always respected. No data is sent to the Internet!

Speech Note uses many different processing engines to do its job. Currently these are used:

Speech Note supports a large number of language models. Some of them give very good accuracy, while others are less accurate. All models can be downloaded directly from the app.

A detailed list of supported languages is here.

If you are looking for a similar app for the Linux desktop, check out Speech Note on Flathub (video demo).

Limitations:

  • The app does not work on the i486 architecture (e.g. Jolla Tablet).
  • Models for the Whisper engine are disabled on phones with an ARMv7 CPU (e.g. Jolla C).
  • Models for the Whisper and April-ASR engines are extremely slow on ARM32. In practice, they are usable only on ARM64.
  • Speech to Text for languages other than English is not very accurate in general.
  • Machine translation is slow on ARM32, especially on ARMv7 phones.

Any comments, ideas, translations and issue reports are highly appreciated.

Translations (both Speech Note and Speech Keyboard):
All translations are very welcome. There are three ways to contribute:
- [preferred] Transifex project
- Direct github pull request or gitlab merge request
- Translation file sent to me via e-mail: dsnote@mkiol.net

Source code: https://github.com/mkiol/dsnote or https://gitlab.com/mkiol/dsnote
Bugs, Feature requests: https://github.com/mkiol/dsnote/issues or https://gitlab.com/mkiol/dsnote/-/issues or just email: dsnote@mkiol.net

Application versions: 
Attachment | Size | Date
harbour-dsnote-1.5.1-1.armv7hl.rpm | 1.27 MB | 17/11/2021 - 10:00
harbour-dsnote-1.5.1-1.aarch64.rpm | 1.34 MB | 17/11/2021 - 19:28
harbour-dsnote-1.6.1-1.armv7hl.rpm | 1.31 MB | 10/12/2021 - 20:52
harbour-dsnote-1.6.1-1.aarch64.rpm | 1.39 MB | 10/12/2021 - 20:52
harbour-dsnote-1.8.0-1.aarch64.rpm | 1.44 MB | 02/04/2022 - 19:40
harbour-dsnote-1.8.0-1.armv7hl.rpm | 1.36 MB | 02/04/2022 - 19:40
harbour-dsnote-2.0.1-1.armv7hl.rpm | 6.7 MB | 15/04/2023 - 16:58
harbour-dsnote-2.0.1-1.aarch64.rpm | 7.86 MB | 15/04/2023 - 16:58
harbour-dsnote-3.1.6-1.aarch64.rpm | 92.14 MB | 13/07/2023 - 16:41
harbour-dsnote-3.1.6-1.armv7hl.rpm | 21.78 MB | 13/07/2023 - 16:41
harbour-dsnote-4.5.0-1.aarch64.rpm | 28.63 MB | 18/05/2024 - 19:55
harbour-dsnote-4.5.0-1.armv7hl.rpm | 27.38 MB | 18/05/2024 - 19:55
harbour-dsnote-4.6.0-1.armv7hl.rpm | 28.21 MB | 03/08/2024 - 18:05
harbour-dsnote-4.6.0-1.aarch64.rpm | 29.41 MB | 03/08/2024 - 18:05
harbour-dsnote-4.6.1-1.aarch64.rpm | 29.41 MB | 17/08/2024 - 11:58
harbour-dsnote-4.6.1-1.armv7hl.rpm | 28.21 MB | 17/08/2024 - 11:58
harbour-dsnote-4.7.0-1.aarch64.rpm | 29.77 MB | 29/12/2024 - 12:55
harbour-dsnote-4.7.0-1.armv7hl.rpm | 28.55 MB | 29/12/2024 - 12:55
Changelog: 

4.7.0

  • General
    • New mode for replacing the current note instead of appending new text to it. When the Replace an existing note option is set, whenever new text is added, it will replace the existing note.
  • User Interface
    • Speech Note has been translated into Slovenian.
  • Speech to Text
    • New settings option Profile, which allows you to change WhisperCpp processing parameters. There are two profiles to choose from: Best Performance and Best Quality.
    • Echo mode. After processing, the decoded text will be immediately read out using the currently set Text to Speech model.
    • Updated whisper.cpp library. This provides a 10% increase in STT speed with WhisperCpp models.
  • Text to Speech
    • New Piper voice for Latvian
  • Translator
    • New models: English to Finnish, English to Turkish, English to Swedish, Swedish to English, English to Slovak, English to Indonesian, English to Romanian, English to Greek, Chinese to English
    • Updated models: English to Catalan, English to Russian, English to Ukrainian, English to Czech

4.6.1

  • User Interface
    • Swedish translation has been updated.
  • Translator
    • New models: English to Latvian, English to Danish, English to Croatian, English to Slovenian, Indonesian to English, Romanian to English
    • Updated models: English to Hungarian, Czech to English, Greek to English

4.6.0

  • User Interface
    • Speech Note has been translated into Norwegian.
    • Grouped models. Models that provide multiple sub-models (for example, TTS models that provide different voices) are shown in groups.
    • Option to enable/disable support for subtitles. Subtitle support is a niche feature, so to simplify the user interface, the subtitle options are not visible by default.
  • Speech to Text
    • All Whisper models have been renamed to WhisperCpp to better reflect the engine behind them.
    • Automatic language detection in STT. To automatically detect the language during STT, select one of the models that is in the Auto detected category in the language list.
    • Quicker decoding with WhisperCpp. Optimization for short sentences has been added to WhisperCpp. With it, the speed of STT has doubled!
    • Translate to English option for WhisperCpp models. When enabled, speech is automatically translated into English.
    • Option for inserting processing statistics. A new settings option allows inserting processing-related information, such as processing time and audio length, into the text after decoding. This can be useful for comparing the performance of different models, engines and their parameters.
  • Text to Speech
    • Welsh language. The new language is enabled with a Piper voice.
    • New Piper voices for Spanish, Italian and English
    • New RHVoice voices for Slovak and Croatian
  • Translator
    • New button for switching languages.
    • New models: English to Lithuanian, Croatian to English, Latvian to English, Danish to English, Serbian to English, Slovak to English, Bosnian to English, Vietnamese to English
    • Updated models: Lithuanian to English, Slovenian to English, Russian to English, Ukrainian to English

4.5.0

  • User Interface
    • Import subtitles in many formats, including subtitles embedded in video files. You can import and export subtitles in SRT, WebVTT and ASS formats. If your video file contains one or more subtitle streams, you can import the selected subtitles into the notepad.
    • Unified file importing and exporting. Text, subtitles, audio and video files can be imported or exported using a unified pull-down menu option.
    • Settings option to enable/disable remembering the last note. If the option is disabled, the last note will not be available after restarting the app.
    • Settings option for the default action when importing a note from a file.
    • New text appending style: After empty line
    • Speech Note has been translated into Ukrainian and Russian.
    • Fix: Cancellation was blocking the user interface.
  • Speech to Text
    • Subtitles support in STT. To generate timestamped text in SRT format, change the text format to SRT Subtitles using the button at the bottom of the text area. Check the settings to find more subtitle options.
  • Text to Speech
    • Speech synchronized with subtitle timestamps in TTS. When the text format is set to SRT Subtitles, the generated speech will be synchronized with the subtitle timestamps.
    • New Piper voices for English, Persian, Slovenian, Turkish, French and Spanish
    • New RHVoice voice for Czech
    • Settings option to enable/disable speech synchronization with subtitle timestamps.
    • Speech audio is always normalized after TTS processing.
  • Translator
    • New models: Greek to English, Maltese to English, Slovenian to English, Turkish to English, English to Catalan
    • Updated models: Czech and Lithuanian

4.4.0

  • Translator
    • New model: Lithuanian to English
    • Progress indicator
  • Speech to Text:
    • New language: Marathi
    • Support for Speex audio codec in 'Transcribe a file'
    • Support for multiple audio streams in a video file
  • Text to Speech:
    • New voices for Serbian and Uzbek languages (RHVoice models)

4.3.0

  • Translator
    • New model: English to Hungarian
  • Speech to Text:
    • New languages: Afrikaans, Gujarati, Hausa, Telugu, Tswana, Javanese, Hebrew
    • New engine: April-ASR. Models for: English, French and Polish.
    • Stop listening button
    • Support for Opus audio codec in 'Transcribe a file'
  • Text to Speech:
    • New Piper voices: Arabic, English, Hungarian, Polish, Czech, German, Ukrainian, Vietnamese, Serbian, French, Spanish, Nepali
    • More steps in Speech speed option
    • Diacritical marks restoration before speech synthesis for Arabic
    • Fix: Exporting to audio file was not possible when text was very long
  • Other:
    • Settings option Clear cache on close
    • Cache compression (Opus format instead of raw audio)

4.2.0

  • Translator
    • New models: Hungarian to English, Finnish to English
  • Speech to Text:
    • Support for video file transcription. With the 'Transcribe a file' menu option you can convert an audio file, or the audio from a video file, to text.
    • Whisper engine update and increase in performance. Processing time has been reduced by an average of 15% (Xperia 10 III).
  • Text to Speech:
    • Save audio in compressed formats (MP3 or Ogg Vorbis). You can also save metadata tags to the audio file, such as track number, title, artist or album.
    • Pause option. You can pause or resume speech reading.
    • Update of RHVoice voice for Uzbek
    • Fix: Piper models could not be downloaded
  • User Interface:
    • Share to Speech Note. You can push text, audio or video content to Speech Note using share button in other apps (e.g. Notes, Gallery, Audio recorder, Browser).

4.1.0

  • Speech to Text:
    • Removal of the experimental 'Restore punctuation' option
    • Fix: Whisper wasn't able to decode short speech sentences
  • Text to Speech:
    • Option 'Speech speed' to make synthesized speech slower or faster.
    • New Piper voices: Czech, German, Hungarian, Portuguese, Slovak, English
    • Update of RHVoice voices for Slovak and Czech
    • Fix: Splitting text into sentences was incorrect for: Georgian, Japanese, Bengali, Nepali, Hindi

4.0.0

  • Translator (new feature - watch this video)
    • Support for offline translations for the following languages: Catalan, Bulgarian, Czech, Danish, English, Spanish, German, Estonian, French, Italian, Polish, Portuguese, Norwegian, Iranian, Dutch, Russian, Ukrainian, Icelandic
    • Translator uses models that were created as part of Bergamot project and Firefox Translations.
  • User Interface:
    • The user interface has been redesigned. It is handier and better supports landscape orientation.
    • The application has been translated into new languages: Dutch and Italian. Many thanks to all translators <3 <3
  • Text to Speech:
    • All existing Piper models have been updated.
    • New voices for: English, Swedish, Turkish, Polish, German, Spanish, Finnish, French, Ukrainian, Russian, Swahili, Serbian, Romanian, Luxembourgish, Georgian and Slovak

For more details, see About -> Changes in the app.

Comments

legar

Great software. I use it on MX Linux and the Flatpak version worked right away! I wonder if there is a way (a code or something) to make a sound indicating that a number in parentheses cites a book when doing text to speech. Thanks a lot.

Malakay

Now I use Vosk small.

Malakay

I upgraded to the latest version and it seems quite a bit faster than before, but there are still quite a few typos, and one new thing: sometimes [unk] appears in the written text.

mkiol

Which exact Speech to Text model are you testing?

Just a general remark from my observations. Sadly, STT works fine only for English right now. For any other language Whisper provides decent accuracy but it is also veeery slow :/

Malakay

So I finally solved it: I reinstalled version 1.8.0 and keyboard 1.3.0, and it works flawlessly again. Model: Commodoro CS.

Malakay

So what should i do?

Malakay

And Whisper Small, but I tested all of them and the results seemed more or less the same. Maybe you could bring back the old model used in 1.6-1.8 as an additional option for testing purposes?

mkiol

Actually, I don't think any Whisper model is usable on ARM32. The smallest is 'Tiny', and it might work but will be very slow. Whisper is only really useful on ARM64 :/

Malakay

I use Xperia X. Thanks, looking forward to the fix :)

Malakay

I can't help it, but the older version (1.6 to 1.8 perhaps, on SFOS 3.4.0.24) worked much better for me. It recognized speech much faster and more accurately, almost without typos: what I said, it wrote. Nothing more, nothing less. This new version 3 seems slower to me and makes more typos.

mkiol

Which model are you testing, and on what device?

I have to admit I've also noticed a performance regression on ARM32 with DeepSpeech/Coqui models.

mkiol

I took measurements and you were perfectly right. Speech Note v2.x uses the new Coqui STT library, which performs much worse than the old one. On Xperia 10 (ARM32) it is 2x slower! I don't know why I missed it.

https://github.com/mkiol/dsnote/issues/11

Working on a fix...

TMavica

It works. Any way to add Cantonese?

mkiol

According to this, Cantonese should work decently with the Whisper 'Medium' model. Unfortunately, the 'Medium' model is disabled in the SFOS version because the phone's CPU is too weak to handle the processing (BTW, if you are a Linux user, I recommend checking out Speech Note for desktop).

I made a test with the 'Base' model and it looks like it can also transcribe Cantonese, but honestly I can't say anything about accuracy. Did you try the Whisper 'Base' model with Cantonese speech? What was the result?

TMavica

It seems it's not working, unfortunately.

TMavica

One more question: my native language is Cantonese written in traditional Chinese, I am from HK. Do u think it works?

mkiol

I think it works for Taiwanese Mandarin but probably not Cantonese. I may be mistaken. I just uploaded a new version, so you can verify it yourself. Please try the Whisper Base model.

TMavica

Ok thx

TMavica

Can u add traditional Chinese?

mkiol

Regarding STT, it looks like only the Whisper model produces text in traditional script, but... Whisper for Chinese doesn't work at all right now because of a bug in the code. I discovered this bug when I tried to answer your question, so big Thank You :) I will fix it in the upcoming release (in 2 days).

The Piper TTS model reads Chinese regardless of the script. It accepts both traditional and simplified.

PamNor

Can't find Norwegian download in settings.
Jolla C.

mkiol

Unfortunately, Norwegian is provided only by the Whisper model, and all Whisper models are disabled on ARMv7 devices (like Jolla C). Whisper requires a lot of computation power and this old CPU can't handle it. Sorry.

eson

Great upgrade! Thanks for the Swedish speech models. Much appreciated.

articice

Fatal error: the to be installed harbour-dsnote-1.8.0-1.armv7hl requires 'qt5-qtmultimedia-plugin-mediaservice-gstaudiodecoder'

Looks like there's no gstaudiodecoder for qt5-qtmultimedia-5.6.2+git31-1.12.1 in Vanha Rauma

mkiol

On which device are you installing? This package should be available on SFOS 4.4 as well.

At least it is available on Jolla C:

[root@Sailfish nemo]# cat /etc/sailfish-release  
NAME="Sailfish OS"
ID=sailfishos
VERSION="4.4.0.58 (Vanha Rauma)"
VERSION_ID=4.4.0.58
PRETTY_NAME="Sailfish OS 4.4.0.58 (Vanha Rauma)"
SAILFISH_BUILD=58
SAILFISH_FLAVOUR=release
HOME_URL="https://sailfishos.org/"
[root@Sailfish nemo]# zypper info qt5-qtmultimedia-plugin-mediaservice-gstaudiodecoder
Loading repository data...
Reading installed packages...


Information for package qt5-qtmultimedia-plugin-mediaservice-gstaudiodecoder:
-----------------------------------------------------------------------------
Repository     : jolla
Name           : qt5-qtmultimedia-plugin-mediaservice-gstaudiodecoder
Version        : 5.6.2+git31-1.12.1.jolla
Arch           : armv7hl
Vendor         : meego
Installed Size : 23.8 KiB
Installed      : Yes (automatically)
Status         : up-to-date
Source package : qt5-qtmultimedia-5.6.2+git31-1.12.1.jolla.src
Summary        : Qt Multimedia - GStreamer audio decoder media service
Description    :  
   This package contains the GStreamer audio decoder plugin for QtMultimedia

articice

It's Xperia 10 Plus.

Perhaps this issue only applies to aarch64.

pkcon install harbour-dsnote
Fatal error: the to be installed harbour-dsnote-1.8.0-1.armv7hl requires 'qt5-qtmultimedia-plugin-mediaservice-gstaudiodecoder', but this requirement cannot be provided

pkcon search qt5-qtmultimedia-plugin-mediaservice-gstaudiodecoder
Available       qt5-qtmultimedia-plugin-mediaservice-gstaudiodecoder-5.6.2+git29-1.11.1.jolla.armv7hl (jolla)   Qt Multimedia - GStreamer audio decoder media service


unsocialcortex

Re 1.6.1 patch notes:

Just tested this wonderful app for a while, and "Deutsch (Aashish Agarwal)" seems far inferior to "Deutsch (Jaco)". I tried some normal conversation as well as carefully read-out sentences on my XA2 with both, and a lot more words got completely garbled or left out with "Aashish Agarwal".

mkiol

Thank you so much for the feedback. Would you be able to evaluate "Deutsch (med)" as well? This model is available in version 1.8.0.

unsocialcortex

So I'm no doctor or anything, but I tested "med" a bit using some medical vocabulary and excerpts from German medical journals. "Jaco" always gets more out of sentences in general. Both regularly miss medical terms or get them wrong, but in my experience "Jaco" gets closer, in some cases by producing *something* instead of nothing.

All in all, German DeepSpeech is obviously nowhere near English, but it's not bad for everyday conversation.

JayJay

Really nice work! The app is really cool. Is there any option to customize the vocabulary? I would need German medical language with drug recognition and medical vocabulary. Is there maybe a file I can download or buy? If not, it would be an awesome new feature if I could add new vocabulary myself :-)
