Voice Recognition

VeriSpeak is available as a software development kit that enables the development of stand-alone and Web-based speaker recognition applications on Microsoft Windows, Linux, Mac OS X, iOS and Android platforms.

Description

VeriSpeak voice identification technology is designed for biometric system developers and integrators. The text-dependent speaker recognition algorithm helps ensure system security by checking both voice and phrase authenticity. Voiceprint templates can be matched in 1-to-1 (verification) and 1-to-many (identification) modes.

Reliability and performance tests

The VeriSpeak 11.0 algorithm has been tested with voice samples taken from the XM2VTS Database, as well as with voice samples from Neurotechnology’s internal database.

Applications

  • Banks
  • Public Sector
  • Business
  • Department stores
  • Hotels

Features

  • Text-dependent algorithm prevents unauthorized access using a covertly recorded sample of a user’s voice.
  • Two-factor authentication by checking voice biometrics and pass-phrase authenticity.
  • Regular microphones and smartphones are suitable for recording user voices.
  • Available as a multiplatform SDK that supports multiple programming languages.

The VeriSpeak algorithm can perform both text-dependent and text-independent (phrase-independent) speaker recognition and can detect when users start and finish speaking. A template may store several voice records of the same phrase to improve recognition reliability. During speaker verification or identification, a system may ask users to pronounce several specific phrases and match each audio sample against the records in the database. The VeriSpeak algorithm can then fuse the per-phrase matching results to improve matching reliability.
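The per-phrase fusion described above can be illustrated with a minimal sketch. VeriSpeak's actual fusion rule is not described on this page, so this example assumes a simple mean of per-phrase similarity scores compared against one threshold. The function names `fuse_scores` and `accept`, the score scale and the threshold value are all hypothetical and are not part of the VeriSpeak API.

```python
from statistics import mean


def fuse_scores(phrase_scores: list[float]) -> float:
    """Fuse per-phrase similarity scores with a simple mean rule (hypothetical)."""
    if not phrase_scores:
        raise ValueError("at least one phrase score is required")
    return mean(phrase_scores)


def accept(phrase_scores: list[float], threshold: float) -> bool:
    """Accept the claimed identity if the fused score reaches the threshold."""
    return fuse_scores(phrase_scores) >= threshold


if __name__ == "__main__":
    # Hypothetical scores for three prompted phrases; higher means more similar.
    scores = [72.0, 35.0, 80.0]
    print(round(fuse_scores(scores), 2))   # 62.33
    print(accept(scores, threshold=60.0))  # True
```

With the mean rule, one weak phrase (35.0 here) can be offset by strong ones; a stricter design could require every phrase to pass on its own, or weight phrases differently.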

Basic Recommendations for Speaker Recognition

The speaker recognition accuracy of VeriSpeak depends on audio quality during both enrollment and identification. Certain constraints should be considered before or during integration of the algorithm into a speaker recognition system. Some other variables can be compensated for by enrolling the same phrase in different environments.

Each platform that runs VeriSpeak-based applications has its own specific requirements.

Microsoft Windows Platform Requirements

  • Microsoft Windows 7 / 8 / 10, 32-bit or 64-bit.
    • Windows XP is no longer supported in this version of the SDK. If your product must support Windows XP, consider using the previous version of the SDK. Please contact us for more information.
  • PC or laptop with an x86 (32-bit) or x86-64 (64-bit) compatible processor.
    • A 2 GHz or faster processor is recommended.
    • SSE2 support is required. Processors that do not support SSE2 cannot run the VeriSpeak algorithm. Please check whether your processor model supports the SSE2 instruction set.
  • At least 512 MB of free RAM should be available for the application. Additional RAM is required for applications that perform 1-to-many identification, as all biometric templates need to be stored in RAM for matching. For example, 1,000 templates (each containing 1 voiceprint record) require about 5 MB of additional RAM.
  • Free space on hard disk drive (HDD):
    • at least 1 GB for development.
    • 100 MB for deploying VeriSpeak components.
    • Additional space is required in these cases:
      • VeriSpeak does not require the original voice samples to be stored for matching; only the templates need to be stored. However, storing voice samples on the hard drive for potential future use is recommended.
      • Usually a database engine runs on a separate computer (back-end server). However, the DB engine can be installed on the same computer for standalone applications. In this case, HDD space must be available for template storage. For example, 10,000 templates (each containing 1 voiceprint record) stored in a relational database would require about 50 MB of free HDD space. The database engine itself also requires HDD space to run; please refer to the HDD space requirements from the database engine provider.
  • Microphone. Any microphone that is supported by the operating system can be used.
  • Database engine or a connection to one. VeriSpeak templates can be saved in any database (including files) that supports storing binary data. VeriSpeak Extended SDK contains the following support modules for Matching Server on the Microsoft Windows platform:
    • Microsoft SQL Server;
    • MySQL;
    • Oracle;
    • PostgreSQL;
    • SQLite.
  • Network/LAN connection (TCP/IP) for client/server applications. A network connection is also required for using the Matching Server component (included in VeriSpeak Extended SDK). Communication with Matching Server is not encrypted; therefore, if communication must be secured, a dedicated network (not accessible from outside the system) or a secured network (such as a VPN, which must be configured using operating system or third-party tools) is recommended.
  • Microsoft .NET Framework 4.5 or newer (for using .NET components).
  • One of the following development environments for application development:
    • Microsoft Visual Studio 2012 or newer (for application development in C/C++, C#, Visual Basic .NET)
    • Sun Java 1.6 SDK or later
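The RAM and disk figures above imply roughly 5 KB per single-record template (5 MB per 1,000 in RAM, 50 MB per 10,000 in a relational database). A minimal sizing sketch, assuming decimal megabytes and that size grows linearly with the number of voiceprint records per template; that linear scaling is an assumption, not a documented VeriSpeak property, and `estimate_storage_mb` is a hypothetical helper.

```python
# Rough per-record sizes derived from the figures quoted above (decimal MB):
#   1,000 single-record templates  -> about 5 MB of RAM
#   10,000 single-record templates -> about 50 MB in a relational database
RAM_MB_PER_RECORD = 5.0 / 1_000
DB_MB_PER_RECORD = 50.0 / 10_000
BASE_RAM_MB = 512  # minimum free RAM quoted for Windows, Mac OS X and x86 Linux


def estimate_storage_mb(templates: int, records_per_template: int = 1) -> tuple[float, float]:
    """Return (extra RAM, database space) in MB for a 1-to-many gallery.

    Assumes linear growth with the number of voiceprint records (an assumption).
    Database engine overhead is not included.
    """
    if templates < 0 or records_per_template < 1:
        raise ValueError("templates must be >= 0 and records_per_template >= 1")
    records = templates * records_per_template
    return records * RAM_MB_PER_RECORD, records * DB_MB_PER_RECORD


if __name__ == "__main__":
    extra_ram, db_space = estimate_storage_mb(100_000)
    print(f"extra RAM: ~{extra_ram:.0f} MB, total RAM: ~{BASE_RAM_MB + extra_ram:.0f} MB")
    print(f"database space: ~{db_space:.0f} MB (excluding the engine itself)")
```

Treat the result as a lower bound for capacity planning; the application, operating system and database engine need their own headroom on top of it.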

Android Platform Requirements

  • A smartphone or tablet running Android 4.4 (API level 19) or newer.
    • If you have a custom Android-based device or development board, contact us to find out if it is supported.
  • An ARM-based 1.5 GHz processor is recommended for processing voiceprints within the specified time. Slower processors may also be used, but voiceprint processing will take longer.
  • At least 256 MB of free RAM should be available for the application. Additional RAM is required for applications that perform 1-to-many identification, as all biometric templates need to be stored in RAM for matching. For example, 1,000 templates (each containing 1 voiceprint record) require about 5 MB of additional RAM.
  • Free storage space (built-in flash or external memory card):
    • 30 MB for deploying embedded voice components with each application.
    • Additional space is required if an application needs to store original voice samples. VeriSpeak does not require the original voice samples to be stored for matching; only the templates need to be stored.
  • Any built-in or headset microphone of the smartphone or tablet that is supported by Android OS.
  • Network/LAN connection (TCP/IP) for client/server applications. A network connection is also required for using the Matching Server component (included in VeriSpeak Extended SDK). Communication with Matching Server is not encrypted; therefore, if communication must be secured, a dedicated network (not accessible from outside the system) or a secured network (such as a VPN, which must be configured using operating system or third-party tools) is recommended.
  • PC-side development environment requirements:
    • Java SE JDK 6 (or higher)
    • Eclipse Indigo (3.7) IDE
    • Android development environment (at least API level 19 required)
    • One of the following build automation systems:
    • Internet connection for activating VeriSpeak component licenses

iOS Platform Requirements

  • One of the following devices, running iOS 8.0 or newer:
    • iPhone 5S or newer iPhone.
    • iPad 2 or newer iPad, including iPad Mini and iPad Air models.
    • iPod Touch 6th Generation or newer iPod.
  • At least 256 MB of free RAM should be available for the application. Additional RAM is required for applications that perform 1-to-many identification, as all biometric templates need to be stored in RAM for matching. For example, 1,000 templates (each containing 1 voiceprint record) require about 5 MB of additional RAM.
  • Free storage space (built-in flash or external memory card):
    • 30 MB for deploying embedded voice components with each application.
    • Additional space is required if an application needs to store original voice samples. VeriSpeak does not require the original voice samples to be stored for matching; only the templates need to be stored.
  • Any built-in or headset microphone of the device that is supported by iOS.
  • Network/LAN connection (TCP/IP) for client/server applications. A network connection is also required for using the Matching Server component (included in VeriSpeak Extended SDK). Communication with Matching Server is not encrypted; therefore, if communication must be secured, a dedicated network (not accessible from outside the system) or a secured network (such as a VPN, which must be configured using operating system or third-party tools) is recommended.
  • Development environment requirements:
    • A Mac running Mac OS X 10.10.x or newer.
    • Xcode 6.4 or newer.

Mac OS X Platform Requirements

  • A Mac running Mac OS X 10.7.x or newer. A 2 GHz or faster processor is recommended.
  • At least 512 MB of free RAM should be available for the application. Additional RAM is required for applications that perform 1-to-many identification, as all biometric templates need to be stored in RAM for matching. For example, 1,000 templates (each with 1 voiceprint record) require about 5 MB of additional RAM.
  • Free space on hard disk drive (HDD):
    • at least 1 GB for development.
    • 100 MB for deploying VeriSpeak components.
    • Additional space is required in these cases:
      • VeriSpeak does not require the original voice samples to be stored for matching; only the templates need to be stored. However, storing voice samples on the hard drive for potential future use is recommended.
      • Usually a database engine runs on a separate computer (back-end server). However, the DB engine can be installed on the same computer for standalone applications. In this case, HDD space must be available for template storage. For example, 10,000 templates (each with 1 voiceprint record) stored in a relational database would require about 50 MB of free HDD space. The database engine itself also requires HDD space to run; please refer to the HDD space requirements from the database engine provider.
  • Microphone. Any microphone that is supported by the operating system can be used.
  • Database engine or a connection to one. VeriSpeak templates can be saved in any database (including files) that supports storing binary data. VeriSpeak Extended SDK contains SQLite support modules for Matching Server on the Mac OS X platform.
  • Network/LAN connection (TCP/IP) for client/server applications. A network connection is also required for using the Matching Server component (included in VeriSpeak Extended SDK). Communication with Matching Server is not encrypted; therefore, if communication must be secured, a dedicated network (not accessible from outside the system) or a secured network (such as a VPN, which must be configured using operating system or third-party tools) is recommended.
  • Specific requirements for application development:
    • Xcode 4.3 or newer
    • wxWidgets 3.0.0 or newer libs and dev packages (to build and run SDK samples and applications based on them)
    • GNU Make 3.81 or newer (to build samples and tutorials)
    • Sun Java 1.6 SDK or later
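Because templates only need a database that can hold binary data, an application can keep them in an ordinary BLOB column. A minimal sketch using Python's standard `sqlite3` module, assuming the template has already been serialized to bytes by the SDK (that serialization step is not shown). The table name `voice_templates`, its schema and the helper functions are hypothetical; this is application-side storage, not the schema used by the Matching Server's SQLite support module.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with one BLOB column for serialized templates."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS voice_templates ("
        " subject_id TEXT PRIMARY KEY,"
        " template BLOB NOT NULL)"
    )
    return conn


def save_template(conn: sqlite3.Connection, subject_id: str, template: bytes) -> None:
    """Insert or overwrite the template bytes stored for a subject."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO voice_templates (subject_id, template) VALUES (?, ?)",
            (subject_id, template),
        )


def load_template(conn: sqlite3.Connection, subject_id: str) -> Optional[bytes]:
    """Return the stored template bytes, or None if the subject is unknown."""
    row = conn.execute(
        "SELECT template FROM voice_templates WHERE subject_id = ?", (subject_id,)
    ).fetchone()
    return bytes(row[0]) if row else None


if __name__ == "__main__":
    conn = open_store()
    # Placeholder bytes; a real application would store the SDK's serialized template.
    save_template(conn, "subject-001", b"\x00\x01\x02\x03")
    print(load_template(conn, "subject-001") == b"\x00\x01\x02\x03")  # True
    print(load_template(conn, "missing"))  # None
    conn.close()
```

Parameterized queries (`?` placeholders) pass the bytes through unchanged, so the same pattern carries over to any engine with a binary column type.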

Linux x86 / x86-64 Platform Requirements

  • Linux 2.6 or newer kernel (32-bit or 64-bit) is required. Linux 3.0 kernel or newer is recommended.
  • PC or laptop with an x86 (32-bit) or x86-64 (64-bit) compatible processor.
    • A 2 GHz or faster processor is recommended.
    • SSE2 support is required. Processors that do not support SSE2 cannot run the VeriSpeak algorithm. Please check whether your processor model supports the SSE2 instruction set.
  • At least 512 MB of free RAM should be available for the application. Additional RAM is required for applications that perform 1-to-many identification, as all biometric templates need to be stored in RAM for matching. For example, 10,000 templates (each with 1 voiceprint record) require about 50 MB of additional RAM.
  • Free space on hard disk drive (HDD):
    • at least 1 GB for development.
    • 100 MB for deploying VeriSpeak components.
    • Additional space is required in these cases:
      • VeriSpeak does not require the original voice samples to be stored for matching; only the templates need to be stored. However, storing voice samples on the hard drive for potential future use is recommended.
      • Usually a database engine runs on a separate computer (back-end server). However, the DB engine can be installed on the same computer for standalone applications. In this case, HDD space must be available for template storage. For example, 10,000 templates (each with 1 voiceprint record) stored in a relational database would require about 50 MB of free HDD space. The database engine itself also requires HDD space to run; please refer to the HDD space requirements from the database engine provider.
  • Microphone. Any microphone that is supported by the operating system can be used.
  • glibc 2.13 library or newer
  • libasound 1.0.x or newer (for voice capture)
  • libgudev-1.0 164-3 or newer (for microphone usage)
  • Database engine or a connection to one. VeriSpeak templates can be saved in any database (including files) that supports storing binary data. VeriSpeak Extended SDK contains the following support modules for Matching Server on the Linux platform:
    • MySQL;
    • Oracle;
    • PostgreSQL;
    • SQLite.
  • Network/LAN connection (TCP/IP) for client/server applications. A network connection is also required for using the Matching Server component (included in VeriSpeak Extended SDK). Communication with Matching Server is not encrypted; therefore, if communication must be secured, a dedicated network (not accessible from outside the system) or a secured network (such as a VPN, which must be configured using operating system or third-party tools) is recommended.
  • Specific requirements for application development:
    • wxWidgets 3.0.0 or newer libs and dev packages (to build and run SDK samples and applications based on them)
    • GCC 4.4.x or newer
    • GNU Make 3.81 or newer
    • Sun Java 1.6 SDK or later
    • pkg-config 0.21 or newer (optional; only needed to compile the Matching Server database support modules)
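Two of the Linux x86 requirements above, SSE2 support and glibc 2.13 or newer, can be checked before installation. A minimal preflight sketch, assuming an x86 Linux system where `/proc/cpuinfo` has a `flags` line (ARM kernels use a different field) and using the standard-library `platform.libc_ver()`. The helper names are hypothetical, and this is not a tool shipped with the SDK.

```python
import platform
import re


def cpu_has_sse2(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line of /proc/cpuinfo text lists sse2 (x86 only)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "sse2" in value.split():
            return True
    return False


def version_at_least(version: str, minimum: str) -> bool:
    """Compare versions by their numeric parts, e.g. '2.31' >= '2.13'."""
    def parse(v: str) -> tuple[int, ...]:
        # Extract runs of digits so suffixes like '-0ubuntu' do not break parsing.
        return tuple(int(part) for part in re.findall(r"\d+", v))
    return parse(version) >= parse(minimum)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as f:
            print("SSE2 supported:", cpu_has_sse2(f.read()))
    except OSError:
        print("SSE2 supported: unknown (/proc/cpuinfo not readable)")
    libc, libc_version = platform.libc_ver()
    ok = libc == "glibc" and version_at_least(libc_version, "2.13")
    print("glibc >= 2.13:", ok)
```

Versions are compared as integer tuples rather than strings, because a plain string comparison would wrongly rank "2.9" above "2.13".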

ARM Linux Platform Requirements

We recommend contacting us with the specifications of your target device to find out whether it is suitable for running VeriSpeak-based applications.

Common requirements for the ARM Linux platform:

  • A device with ARM-based processor, running Linux 3.2 kernel or newer.
  • An ARM-based 1.5 GHz processor is recommended for processing voiceprints within the specified time.
    • ARMHF architecture (EABI 32-bit hard-float ARMv7) is required.
    • Processors with lower clock rates may also be used, but voiceprint processing will take longer.
  • At least 256 MB of free RAM should be available for the application. Additional RAM is required for applications that perform 1-to-many identification, as all biometric templates need to be stored in RAM for matching. For example, 1,000 templates (each containing 1 voiceprint record) require about 5 MB of additional RAM.
  • Free storage space (built-in flash or external memory card):
    • 100 MB required for VeriSpeak components deployment.
    • Additional space is required in these cases:
      • An application needs to store original voice samples. Note that VeriSpeak does not require the original voice samples to be stored for matching; only the templates need to be stored.
      • Usually a database engine runs on a separate computer (back-end server). However, a DB engine can be installed on the same device for standalone applications. For example, 1,000 templates (each with 1 voiceprint record) stored in a relational database would require about 5 MB of free storage space.
        PostgreSQL, MySQL and SQLite are supported on ARM Linux. Please refer to the hardware requirements from the corresponding database engine providers.
  • Microphone. Any microphone that is supported by the operating system can be used.
  • glibc 2.13 or newer.
  • libasound 1.0.x or newer (for voice capture)
  • libgudev-1.0 164-3 or newer (for microphone usage)
  • libstdc++-v3 4.7.2 or newer.
  • Network/LAN connection (TCP/IP) for client/server applications. A network connection is also required for using the Matching Server component (included in VeriSpeak Extended SDK). Communication with Matching Server is not encrypted; therefore, if communication must be secured, a dedicated network (not accessible from outside the system) or a secured network (such as a VPN, which must be configured using operating system or third-party tools) is recommended.
  • Development environment specific requirements:
    • GCC 4.4.x or newer
    • GNU Make 3.81 or newer
    • JDK 1.6 or later

Reliability and Performance Test Results

The following voice template matching experiments were performed with the VeriSpeak 11.0 text-dependent engine:

  • Experiment 1 used voice samples from the XM2VTS database. All samples included the same fixed phrase pronounced by all subjects.
  • Experiment 2 used voice samples from Neurotechnology’s internal voice database 1. All samples included the same fixed phrase pronounced by all subjects.
  • Experiment 3 used voice samples from Neurotechnology’s internal voice database 2. Each subject pronounced a unique phrase during their recording.

Receiver operating characteristic (ROC) curves are commonly used to demonstrate the recognition quality of an algorithm. An ROC curve shows how the false rejection rate (FRR) depends on the false acceptance rate (FAR) as the matching threshold changes.
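The FAR/FRR trade-off behind an ROC curve, and a figure like "FRR at 0.1 % FAR", can be computed from two lists of comparison scores. A minimal sketch, assuming higher scores mean more similar voices and that a comparison is accepted when its score reaches the threshold. The score values are made up for illustration and are not VeriSpeak output; the helper names are hypothetical.

```python
import math


def far_frr(genuine: list[float], impostor: list[float], threshold: float) -> tuple[float, float]:
    """FAR and FRR when scores >= threshold are accepted (higher = more similar)."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def roc_points(genuine: list[float], impostor: list[float]) -> list[tuple[float, float, float]]:
    """(threshold, FAR, FRR) for every observed score, plus +inf (FAR = 0, FRR = 1)."""
    thresholds = sorted(set(genuine) | set(impostor)) + [math.inf]
    return [(t, *far_frr(genuine, impostor, t)) for t in thresholds]


def frr_at_far(genuine: list[float], impostor: list[float], target_far: float) -> float:
    """Lowest FRR among thresholds whose FAR does not exceed target_far."""
    return min(frr for _, far, frr in roc_points(genuine, impostor) if far <= target_far)


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.75, 0.6, 0.4]         # same-speaker comparison scores
    impostor = [0.1, 0.2, 0.3, 0.35, 0.5, 0.65]  # different-speaker comparison scores
    print(frr_at_far(genuine, impostor, 0.2))    # 0.2
    print(frr_at_far(genuine, impostor, 0.001))  # 0.4
```

With only six impostor scores, FAR can change only in steps of 1/6, so a 0.1 % target is met only where FAR is zero. Resolving FAR at the 0.1 % level needs at least a thousand impostor comparisons, and in practice many more for a stable estimate.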

Charts with ROC curves for each of the experiments are available below:

  • Experiment 1:

https://www.neurotechnology.com/res/verispeak_roc_xm2vts_db.gif

  • Experiments 2 and 3:

https://www.neurotechnology.com/res/verispeak_roc_neurotechnology_internal_db.gif

VeriSpeak 11.0 text-dependent algorithm tests with XM2VTS and Neurotechnology’s internal databases
                                        Exp. 1    Exp. 2    Exp. 3
Total voice samples in the database     2360      309       305
Subjects in the database                295       42        42
Recording sessions per subject          8         1 – 10    1 – 10
Average voice sample length (seconds)   6.167     4.975     6.214
FRR at 0.1 % FAR                        4.055 %   5.473 %   0.285 %