Speech Note

Rating: 4.9 (22 votes)

Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine Translation.

Speech Note lets you take, read and translate notes in multiple languages. It uses Speech to Text, Text to Speech and Machine Translation to do so. Text and voice processing take place entirely offline, locally on your phone, without using a network connection.

Your privacy is always respected. No data is sent to the Internet!

Speech Note uses many different processing engines to do its job. Currently these are used:

Speech Note supports an extensive number of language models. Some of them give very good accuracy, but some are not perfect. All models can be downloaded directly from the app.

A detailed list of supported languages is here.

If you are looking for a similar app for the Linux desktop, check out Speech Note on Flathub (video demo).

Limitations:

  • The app does not work on the i486 architecture (e.g. Jolla Tablet).
  • Models for the Whisper engine are disabled on phones with an ARMv7 CPU (e.g. Jolla C).
  • Models for the Whisper and April-ASR engines are extremely slow on ARM32. In practice, they are usable only on ARM64.
  • Speech to Text for languages other than English is generally not very accurate.
  • Machine translation is slow on ARM32, especially on ARMv7 phones.

Any comments, ideas, translations and issue reports are highly appreciated.

Translations (both Speech Note and Speech Keyboard):
All translations are very welcome. There are three ways to contribute:
- [preferred] Transifex project
- Direct GitHub pull request or GitLab merge request
- Translation file sent to me via e-mail: dsnote@mkiol.net

Source code: https://github.com/mkiol/dsnote or https://gitlab.com/mkiol/dsnote
Bugs, Feature requests: https://github.com/mkiol/dsnote/issues or https://gitlab.com/mkiol/dsnote/-/issues or just email: dsnote@mkiol.net

Application versions:

Attachment                          Size      Date
harbour-dsnote-1.5.1-1.armv7hl.rpm  1.27 MB   17/11/2021 - 10:00
harbour-dsnote-1.5.1-1.aarch64.rpm  1.34 MB   17/11/2021 - 19:28
harbour-dsnote-1.6.1-1.armv7hl.rpm  1.31 MB   10/12/2021 - 20:52
harbour-dsnote-1.6.1-1.aarch64.rpm  1.39 MB   10/12/2021 - 20:52
harbour-dsnote-1.8.0-1.aarch64.rpm  1.44 MB   02/04/2022 - 19:40
harbour-dsnote-1.8.0-1.armv7hl.rpm  1.36 MB   02/04/2022 - 19:40
harbour-dsnote-2.0.1-1.armv7hl.rpm  6.7 MB    15/04/2023 - 16:58
harbour-dsnote-2.0.1-1.aarch64.rpm  7.86 MB   15/04/2023 - 16:58
harbour-dsnote-3.1.6-1.aarch64.rpm  92.14 MB  13/07/2023 - 16:41
harbour-dsnote-3.1.6-1.armv7hl.rpm  21.78 MB  13/07/2023 - 16:41
harbour-dsnote-4.5.0-1.aarch64.rpm  28.63 MB  18/05/2024 - 19:55
harbour-dsnote-4.5.0-1.armv7hl.rpm  27.38 MB  18/05/2024 - 19:55
harbour-dsnote-4.6.0-1.armv7hl.rpm  28.21 MB  03/08/2024 - 18:05
harbour-dsnote-4.6.0-1.aarch64.rpm  29.41 MB  03/08/2024 - 18:05
harbour-dsnote-4.6.1-1.aarch64.rpm  29.41 MB  17/08/2024 - 11:58
harbour-dsnote-4.6.1-1.armv7hl.rpm  28.21 MB  17/08/2024 - 11:58
harbour-dsnote-4.7.0-1.aarch64.rpm  29.77 MB  29/12/2024 - 12:55
harbour-dsnote-4.7.0-1.armv7hl.rpm  28.55 MB  29/12/2024 - 12:55
Changelog: 

4.7.0

  • General
    • New mode for replacing the current note instead of appending new text to it. When the Replace an existing note option is set, whenever new text is added, it will replace the existing note.
  • User Interface
    • Speech Note has been translated into Slovenian language.
  • Speech to Text
    • New settings option Profile, which lets you change WhisperCpp processing parameters. There are two profiles to choose from: Best Performance and Best Quality.
    • Echo mode. After processing, the decoded text will be immediately read out using the currently set Text to Speech model.
    • Update the whisper.cpp library. This provides a 10% increase in STT speed with WhisperCpp models.
  • Text to Speech
    • New Piper voice for Latvian
  • Translator
    • New models: English to Finnish, English to Turkish, English to Swedish, Swedish to English, English to Slovak, English to Indonesian, English to Romanian, English to Greek, Chinese to English
    • Updated models: English to Catalan, English to Russian, English to Ukrainian, English to Czech

4.6.1

  • User Interface
    • Swedish translation has been updated.
  • Translator
    • New models: English to Latvian, English to Danish, English to Croatian, English to Slovenian, Indonesian to English, Romanian to English
    • Updated models: English to Hungarian, Czech to English, Greek to English

4.6.0

  • User Interface
    • Speech Note has been translated into Norwegian language.
    • Grouped models. Models that provide multiple sub-models (for example, TTS models that provide different voices) are shown in groups.
    • Option to enable/disable support for subtitles. Subtitle support is a niche functionality. To simplify the user interface, the subtitle options are not visible by default.
  • Speech to Text
    • All Whisper models have been renamed to WhisperCpp to better reflect the engine behind them.
    • Automatic language detection in STT. To automatically detect the language during STT, select one of the models that is in the Auto detected category in the language list.
    • Quicker decoding with WhisperCpp. Optimization for short sentences has been added to WhisperCpp. With it, the speed of STT has doubled!
    • Translate to English option for WhisperCpp models. When enabled, speech is automatically translated into English.
    • Option for inserting processing statistics. A new settings option inserts processing-related information, such as processing time and audio length, into the text after decoding. This can be useful for comparing the performance of different models, engines and their parameters.
  • Text to Speech
    • Welsh language. New language is enabled with Piper voice.
    • New Piper voices for Spanish, Italian and English
    • New RHVoice voices for Slovak and Croatian
  • Translator
    • New button for switching languages.
    • New models: English to Lithuanian, Croatian to English, Latvian to English, Danish to English, Serbian to English, Slovak to English, Bosnian to English, Vietnamese to English
    • Updated models: Lithuanian to English, Slovenian to English, Russian to English, Ukrainian to English

4.5.0

  • User Interface
    • Import subtitles in many formats, including subtitles embedded in a video file. You can import and export subtitles in SRT, WebVTT and ASS formats. If your video file contains one or more subtitle streams, you can import the selected subtitles into the notepad.
    • Unified file importing and exporting. Text, subtitles, audio and video files can be imported or exported using a unified pull-down menu option.
    • Settings option to enable/disable remembering the last note. If the option is disabled, the last note will not be available after restarting the app.
    • Settings option for the default action when importing a note from a file.
    • New text appending style: After empty line
    • Speech Note has been translated into Ukrainian and Russian languages.
    • Fix: Cancellation was blocking the user interface.
  • Speech to Text
    • Subtitles support in STT. To generate timestamped text in SRT format, change the text format to SRT Subtitles using the button at the bottom of the text area. Check the settings to find more subtitle options.
  • Text to Speech
    • Speech synchronized with subtitle timestamps in TTS. When the text format is set to SRT Subtitles, the generated speech will be synchronized with the subtitle timestamps.
    • New Piper voices for English, Persian, Slovenian, Turkish, French and Spanish
    • New RHVoice voice for Czech
    • Settings option to enable/disable speech synchronization with subtitle timestamps.
    • Speech audio is always normalized after TTS processing.
  • Translator
    • New models: Greek to English, Maltese to English, Slovenian to English, Turkish to English, English to Catalan
    • Updated models: Czech and Lithuanian
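
For readers unfamiliar with the SRT format mentioned under "Subtitles support in STT": each subtitle is a numbered block with a start and end timestamp followed by the text. The Python sketch below only illustrates that layout. It is not Speech Note's code, and the function names and the (start, end, text) segment tuples are assumptions made for the example.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    The tuple layout is hypothetical; it stands in for whatever
    timestamped segments a speech recognizer produces.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
    # Blocks are separated by one empty line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "World")]))
```

Text in this shape is what the SRT Subtitles text format refers to; the timestamps are what TTS can synchronize speech against.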

4.4.0

  • Translator
    • New model: Lithuanian to English
    • Progress indicator
  • Speech to Text:
    • New language: Marathi
    • Support for Speex audio codec in 'Transcribe a file'
    • Support for multiple audio streams in a video file
  • Text to Speech:
    • New voices for Serbian and Uzbek languages (RHVoice models)

4.3.0

  • Translator
    • New model: English to Hungarian
  • Speech to Text:
    • New languages: Afrikaans, Gujarati, Hausa, Telugu, Tswana, Javanese, Hebrew
    • New engine: April-ASR. Models for: English, French and Polish.
    • Stop listening button
    • Support for Opus audio codec in Transcribe a file
  • Text to Speech:
    • New Piper voices: Arabic, English, Hungarian, Polish, Czech, German, Ukrainian, Vietnamese, Serbian, French, Spanish, Nepali
    • More steps in Speech speed option
    • Diacritical marks restoration before speech synthesis for Arabic
    • Fix: Exporting to audio file was not possible when text was very long
  • Other:
    • Setting option Clear cache on close
    • Cache compression (Opus format instead of raw audio)

4.2.0

  • Translator
    • New models: Hungarian to English, Finnish to English
  • Speech to Text:
    • Support for video file transcription. With the 'Transcribe a file' menu option you can convert an audio file, or the audio from a video file, to text.
    • Whisper engine update and increase in performance. Processing time has been reduced by an average of 15% (Xperia 10 III).
  • Text to Speech:
    • Save audio in compressed formats (MP3 or Ogg Vorbis). You can also save metadata tags to the audio file, such as track number, title, artist or album.
    • Pause option. You can pause or resume speech reading.
    • Update of RHVoice voice for Uzbek
    • Fix: Piper models could not be downloaded
  • User Interface:
    • Share to Speech Note. You can push text, audio or video content to Speech Note using the share button in other apps (e.g. Notes, Gallery, Audio recorder, Browser).

4.1.0

  • Speech to Text:
    • Removal of the experimental 'Restore punctuation' option
    • Fix: Whisper wasn't able to decode short speech sentences
  • Text to Speech:
    • Option 'Speech speed' to make synthesized speech slower or faster.
    • New Piper voices: Czech, German, Hungarian, Portuguese, Slovak, English
    • Update of RHVoice voices for Slovak and Czech
    • Fix: Splitting text into sentences was incorrect for: Georgian, Japanese, Bengali, Nepali, Hindi

4.0.0

  • Translator (new feature - watch this video)
    • Support for offline translations for the following languages: Catalan, Bulgarian, Czech, Danish, English, Spanish, German, Estonian, French, Italian, Polish, Portuguese, Norwegian, Iranian, Dutch, Russian, Ukrainian, Icelandic
    • Translator uses models that were created as part of the Bergamot project and Firefox Translations.
  • User Interface:
    • The user interface has been redesigned. It is handier and better supports landscape view.
    • Application has been translated to new languages: Dutch and Italian. Many thanks to all translators <3 <3
  • Text to Speech:
    • All existing Piper models have been updated.
    • New voices for: English, Swedish, Turkish, Polish, German, Spanish, Finnish, French, Ukrainian, Russian, Swahili, Serbian, Romanian, Luxembourgish, Georgian and Slovak

For more details, check About->Changes in the app.

Comments

rdomschk's picture

Perfect Work!  A big Thank You from me...

inta's picture

Thanks for the great work, now it runs on arm64 and it works really well. :)

inta's picture

Languages still do not load here. Is there anything I have to clean up? I removed the settings folder from .config and the models dir inside Downloads.

mkiol's picture

Sorry, silly me. I forgot to upload 1.5.1 package for aarch64. It should be available in a moment.

inta's picture

The app does not "hang" anymore on startup and uninstall works, but the language list in the settings is empty (Xperia 10 II), so I can not choose a model to get started with.

mkiol's picture

Fixed in 1.5.1. I would be grateful if you could check whether the problem is resolved. Thanks.

mkiol's picture

Oh dear. I know what is wrong. I will fix it tomorrow.

inta's picture

@robthebold 10 II, so @mkiol could be right that this is an arm64 issue. Never mind, force uninstall worked and I'll try it again if you need someone to test it.

dubliner's picture

While version 1.3 worked flawlessly under SFOS 3.4, it seems the new version 1.4 runs into a problem. All I get is "Language is not configured". When I open the settings, there are "no languages", nothing is displayed.

Curiously, the old "Downloads/DeepSpeech models" directory was still there, populated with "de.scorer  de.tflite  en.scorer  en.tflite". Pointing the "Location on language files" to that directory does not make any difference.

I also tried deleting "Downloads/DeepSpeech models" as well as ".config/harbour-dsnote" to get a fresh start. Unexpectedly, that ".config/harbour-dsnote" is not re-created after starting DeepSpeech Note.

Starting from the CLI I receive this output:

$ harbour-dsnote
[D] unknown:0 - cannot load translation: "C" "/usr/share/harbour-dsnote/translations"
[D] unknown:0 - cannot load default translation
[D] unknown:0 - starting configuration
[D] unknown:0 - Using Wayland-EGL
[W] unknown:0 - cannot open models file
[W] unknown:0 - cannot open lang models file
[D] unknown:0 - [app => dbus] call KeepAliveService
[W] unknown:247 - file:///usr/lib/qt5/qml/Sailfish/Silica/private/TextBase.qml:247: TypeError: Cannot call method 'createObject' of null
[W] unknown:0 - cannot reload service because is's not running
[D] unknown:0 - [dbus => app] signal ModelsPropertyChanged
[D] unknown:0 - [dbus => app] signal StatusPropertyChanged: 2
[D] unknown:0 - [dbus => app] signal ModelsPropertyChanged
[D] unknown:0 - [dbus => app] signal StatusPropertyChanged: 1
[D] unknown:0 - [app => dbus] get DefaultModel
[D] unknown:0 - [app => dbus] get CurrentTask
[W] unknown:0 - ignore update speech
[D] unknown:0 - [app => dbus] call KeepAliveService

Any help would be appreciated, especially since I really love this application!
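
In a log like the one above, each line starts with a level tag ([D] or [W], presumably debug and warning). When looking for the cause of a problem, it can help to pull out only the [W] lines. The following Python sketch does that, assuming the line layout seen in this log (tag, source, dash, message). It is not part of Speech Note, and the function name is made up for the example.

```python
import re

# Layout assumed from the log above: "[W] unknown:0 - message"
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z])\]\s+(?P<source>\S+)\s+-\s+(?P<message>.*)$"
)


def warnings_from_log(log_text):
    """Return the message part of every [W] line in the log text."""
    messages = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "W":
            messages.append(match.group("message"))
    return messages


if __name__ == "__main__":
    # Four lines copied from the log above.
    sample = """\
[D] unknown:0 - starting configuration
[D] unknown:0 - Using Wayland-EGL
[W] unknown:0 - cannot open models file
[W] unknown:0 - cannot open lang models file
"""
    for message in warnings_from_log(sample):
        print(message)
```

Applied to the full log, this would list the two "cannot open ... file" warnings along with the QML TypeError and the service reload warning.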

dubliner's picture

Update: When I copied ".config/harbour-dsnote" and ".local/share/harbour-dsnote" as well as "Downloads/DeepSpeech models" from another phone running SFOS 4.2 it works!!! Yay!

Not sure, though, why the ".local/share/harbour-dsnote" directory was not created and populated on the first try?!

P.S. Now Speech keyboard is not working on the SFOS 3.4 phone. I get the logo (three vertical lines) with strikethrough symbol.
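
The directories named in these two comments can be checked in one go. The Python sketch below reports which of them are missing under a home directory. The three paths are taken from the comments and apply to the 1.x versions discussed here; the script and its function name are only an illustration and are not shipped with the app.

```python
from pathlib import Path

# Paths quoted in the comments above, relative to the home directory.
EXPECTED_DIRS = [
    ".config/harbour-dsnote",
    ".local/share/harbour-dsnote",
    "Downloads/DeepSpeech models",
]


def missing_dirs(home):
    """Return the expected directories that do not exist under `home`."""
    return [d for d in EXPECTED_DIRS if not (Path(home) / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(Path.home())
    if missing:
        for d in missing:
            print(f"missing: {d}")
    else:
        print("all expected directories are present")
```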

robthebold's picture

I installed this on my Xperia 10 II, can't seem to make it work . . . When I start the app, I see an error "Unable to start service" pop up. As I'd expect for this error, speech recognition doesn't work, and when I go to Settings, there are no languages to choose from.

I was going to uninstall and reinstall the app, but Storeman can't uninstall it and when I try to uninstall from terminal a "scriptlet" fails, saying it can't stop the service because it isn't running and uninstalling fails.

I've also tried starting the service manually from the terminal but that didn't work. I'm not totally sure I did that right, though: as root I tried "systemctl start harbour-dsnote.service" and "systemctl start --user harbour-dsnote.service" and both fail with the message "Unit harbour-dsnote.service not found."

"rpm -rl harbour-dsnote"  led me to check to make sure /usr/lib64/systemd/user/harbour-dsnote/ exists, and it does.

Any ideas on how I can fix this or debug further? If more details are needed I can find my glasses and copy/paste stuff from terminal

mkiol's picture

I'm sorry for this mess. Most likely something is wrong with the arm64 package. To be honest, I did not test it because I don't have any arm64 device yet.

To force uninstall, run the following in a terminal:

devel-su rpm --erase --allmatches --noscripts harbour-dsnote harbour-dskeyboard

I will investigate what went wrong tomorrow. Sorrrry.

inta's picture

I tried to install this app and the keyboard app, but the list of languages in the settings is empty. I cannot remove this app, it fails with the message that the service is not running. Any idea how to fix that?

robthebold's picture

I didn't realize you posted this issue before me -- I'm getting the same problem. What device are you using?

PamNor's picture

@mkiol. I'll continue to search for a Norwegian *.tflite file. Keep up your good work.

PamNor's picture

Is there a possibility to get a speech model for the Norwegian language?
https://www.google.com/url?sa=t&source=web&cd=&ved=2ahUKEwi53uqAnpj0AhVC...

mkiol's picture

I really would like to add such support but unfortunately I wasn't able to find any DeepSpeech model for Norwegian (usually a file with the *.tflite extension) :(

lispy's picture

A big Thankyou for the Transcribe Audio File feature. Made my day!!!

eson's picture

How about more language models, Swedish in particular? ;)

mkiol's picture

I've tried but unfortunately I didn't find any available DeepSpeech model for Swedish. If you find one I will be pleased to add it.

eson's picture

Well, knowing exactly nothing about the matter, I found these links on the net. Maybe you've already seen them or they are totally useless?

https://github.com/AlexandrosFerles/Swedish-Language-Automatic-Speech-Re...

https://github.com/se-asr/model

https://medium.com/@klintcho/creating-an-open-speech-recognition-dataset...

 

Thanks anyway for your good work as always!

mkiol's picture

Sorry for the late reply. Indeed this project provides a model for Swedish. Unfortunately it was trained for an older version of DeepSpeech and therefore it is not compatible. Sadly, there is no simple way to convert it to the new one. The only solution is to repeat the training, which is possible but requires access to the source material (voice samples) and significant computing power.

defactofactotum's picture

Now working on pinephone with sfos4.2. But the microphone disconnects after every input.

Fuchur's picture

It really is working very well and a very nice app.
One thing I really would love to see is a button on the keyboard, or a separate keyboard layout, that would integrate it into keyboard input.

That would just be great :).

lispy's picture

Really works. I like it. My wife has to convert a huge audiofile to text but pushing the button for an hour sadly doesn't cut it for her. Can you imagine an audiofile import of sorts? Or maybe make the button sticky?

mkiol's picture

There are two modes (Settings->"Speech detection mode"). In "Automatic" mode, you don't have to hold the button. The app will (in most cases ;-) automatically detect when speaking begins.

defactofactotum's picture

Thanks for the keyboard fix! It still doesn't work on my pinephone - it worked briefly in Italian but with very bad recognition, then stopped again. Another suggestion: would it be possible to add words to the database? I imagine this is probably a huge and complicated task....

defactofactotum's picture

Also does not work on pinephone. Suggestion for keyboard behaviour: at the moment it's possible to edit text in the middle of a line but after typing one letter the cursor snaps back to end of line. When an entire word is wrong this is very laborious.

mkiol's picture

Thanks for the suggestion.

In the meantime, I've managed to fix the Jolla 1, Jolla C and PinePhone issues. Moreover, with the alpha version of DeepSpeech, recognition accuracy is much improved. Stay tuned for the next release :)

ichthyosaurus's picture

This looks very promising - I suggest that you ask for it to be included in the next community news :)!
