Today, NVIDIA introduced Jetson TX1, a small form-factor Linux system-on-module, destined for demanding embedded applications in visual computing. Designed for developers and makers everywhere, the miniature Jetson TX1 (Figure 1) deploys teraflop-level supercomputing performance onboard platforms in the field. Backed by the Jetson TX1 Developer Kit, a premier developer community, and a software ecosystem including NVIDIA JetPack, Linux For Tegra R23.1, CUDA Toolkit 7, cuDNN, and VisionWorks, Jetson enables machines everywhere with the proverbial brains required to achieve advanced levels of autonomy in today’s world.
Figure 1 shows the 50x87mm embedded Jetson TX1 module and thermal plate, featuring an integrated Maxwell GPU, ARMv8 CPU, and H.265 video processor.
Aimed at developers interested in computer vision and on-the-fly sensing, Jetson TX1’s credit-card footprint and low power consumption mean that it’s geared for deployment onboard embedded systems with constrained size, weight, and power (SWaP). Jetson TX1 exceeds the performance of Intel’s high-end Core i7-6700K Skylake in deep learning classification with Caffe and, while drawing only a fraction of the power, achieves more than 10x the perf-per-watt.
Jetson provides superior efficiency while maintaining a developer-friendly environment for agile prototyping and product development, removing extra legwork typically associated with deploying power-limited embedded systems. Jetson TX1’s small form-factor module enables you to deploy Tegra into embedded applications ranging from autonomous navigation to deep learning-driven inference and analytics.
Jetson TX1 Module
Built around the NVIDIA 20nm Tegra X1 SoC featuring the 1024-GFLOP Maxwell GPU, 64-bit quad-core Arm Cortex-A57, and hardware H.265 encoder/decoder, Jetson TX1 measures in at 50x87mm and is packed with performance and functionality.
- 4GB LPDDR4
- 16GB eMMC flash
- 802.11ac WiFi
- Bluetooth 4.0
- Gigabit Ethernet
Jetson TX1 accepts 5.5V-19.6VDC input (Figure 2).
- Up to six MIPI CSI-2 cameras (on a dual ISP)
- 2x USB 3.0
- 3x USB 2.0
- PCIe gen2 x4 + x1
- Independent HDMI 2.0/DP 1.2 and DSI/eDP 1.4
- 3x SPI
- 4x I2C
- 3x UART, SATA, GPIO, and others
Needless to say, Jetson TX1 stands tall in the face of many an algorithmic and integration challenge.
The Jetson module uses a 400-pin board-to-board connector (Figure 3) for interfacing with the developer kit’s reference carrier board, or with a bespoke, customized board designed during your productization process. Tegra’s chip-level capabilities and I/O are closely mapped to the module’s pin-out. The pin-out will be backward-compatible with future versions of the Jetson module.
Jetson TX1 comes with an integrated thermal transfer plate (Figure 3), rated between -25° C and 80° C, for interfacing with passive or active cooling solutions. For more information about detailed electromechanical specifications, see the Jetson – Embedded Computing Platform product page, in addition to visiting the active and open development community on the Jetson forum.
Jetson TX1 draws 1 watt or less while idle, around 8-10 watts under typical CUDA load, and up to 15 watts TDP when the module is fully utilized, for example during gameplay and the most demanding vision routines.
Jetson TX1 provides exceptional dynamic power scaling, either based on workload through its automated governor or by explicit user commands to gate cores and specify clock frequencies:
- The four Arm A57 cores automatically scale between 102 MHz and 1.9 GHz.
- The memory controller scales between 40 MHz and 1.6 GHz.
- The Maxwell GPU scales between 76 MHz and 998 MHz.
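On a Linux system such as L4T, the CPU side of this scaling can usually be observed through the kernel's standard cpufreq sysfs files. The following is a minimal sketch, assuming the stock cpufreq layout under /sys/devices/system/cpu is present on the device. The helper name read_cpu_freqs_mhz is illustrative and is not part of L4T or JetPack, and the GPU and memory-controller clocks are exposed elsewhere and are not covered here.

```python
from pathlib import Path


def read_cpu_freqs_mhz(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: {"min": MHz, "cur": MHz, "max": MHz}} from cpufreq.

    Assumes the standard Linux cpufreq sysfs layout, where values are
    stored in kHz. Cores without a cpufreq directory are skipped, and any
    individual file that is missing is simply left out of the entry.
    """
    files = {
        "min": "cpuinfo_min_freq",
        "cur": "scaling_cur_freq",
        "max": "cpuinfo_max_freq",
    }
    result = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not sibling dirs like "cpufreq".
    for cpu_dir in sorted(Path(sysfs_root).glob("cpu[0-9]*")):
        freq_dir = cpu_dir / "cpufreq"
        if not freq_dir.is_dir():
            continue
        entry = {}
        for key, name in files.items():
            path = freq_dir / name
            if path.is_file():
                entry[key] = int(path.read_text().strip()) / 1000.0  # kHz -> MHz
        result[cpu_dir.name] = entry
    return result
```

Calling read_cpu_freqs_mhz() on the device and printing the result a few times while a workload starts and stops is one way to watch the governor move the cores between their minimum and maximum clocks.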
Touting 256 CUDA cores with Compute Capability 5.3 and Dynamic Parallelism, the Jetson TX1 Maxwell GPU is rated for up to 1024 GFLOPS of FP16. When combined with support for up to 1200 megapixels/sec from either three MIPI CSI x4 cameras or six CSI x2 cameras, along with hardware H.265 encoder and decoder, integrated WiFi, and HDMI 2.0, Jetson TX1 is primed for all-4K video processing.
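The roughly 1-teraflop FP16 rating can be sanity-checked from the core count and peak GPU clock quoted above. This is a back-of-the-envelope sketch, assuming the rating counts a fused multiply-add as two floating-point operations and that FP16 math is issued two values at a time per core; both factors are assumptions of this example rather than figures stated in this post.

```python
def peak_gflops(cuda_cores, clock_ghz, flops_per_core_per_cycle):
    """Theoretical peak throughput in GFLOPS."""
    return cuda_cores * clock_ghz * flops_per_core_per_cycle


CUDA_CORES = 256        # from the text
MAX_CLOCK_GHZ = 0.998   # peak GPU clock from the text (998 MHz)
FMA_FLOPS = 2           # assumption: one fused multiply-add counts as 2 FLOPs
FP16_LANES = 2          # assumption: two FP16 values processed per core per cycle

fp32 = peak_gflops(CUDA_CORES, MAX_CLOCK_GHZ, FMA_FLOPS)
fp16 = peak_gflops(CUDA_CORES, MAX_CLOCK_GHZ, FMA_FLOPS * FP16_LANES)

print(f"FP32 peak: {fp32:.0f} GFLOPS")  # FP32 peak: 511 GFLOPS
print(f"FP16 peak: {fp16:.0f} GFLOPS")  # FP16 peak: 1022 GFLOPS
```

Under these assumptions the result lands within a fraction of a percent of the quoted 1024 GFLOPS; the small gap comes from the 998 MHz clock being just short of 1 GHz.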
The Jetson TX1 module retails for $299 and has 5-year availability. In addition to releasing the ecosystem tools, NVIDIA has made available the Jetson TX1 Developer Kit to help you get started today.
Jetson TX1 Developer Kit
The NVIDIA Jetson TX1 Developer Kit includes everything that you need to get started developing on Jetson. Including the premounted module, the Jetson TX1 Developer Kit (Figure 4) contains the following:
- A reference mini-ITX carrier board
- 5MP MIPI CSI-2 camera module
- Two 2.4/5GHz antennas
- An active heatsink and fan
- An acrylic base plate
- A 19VDC power supply brick
The PCIe lanes on the Jetson TX1 Developer Kit are routed from the module to a PCIe x4 desktop slot on the carrier for easy prototyping, in addition to an M.2-E mezzanine with PCIe x1 for wireless radios.
On the Jetson – Embedded AI Computing Platform page, NVIDIA shares the schematics and design files for the reference carrier along with the 5MP CSI-2 camera module, including routing and signal integrity guidelines. Board software support bundled by JetPack provides easy flashing and device configuration. Out of the box, the Jetson TX1 Developer Kit provides the experience of a desktop PC, but in a small embedded form factor that draws only a fraction of the power.
The Jetson TX1 Developer Kit is available for pre-order immediately for $599, with shipments beginning November 16 in the US and December 20 in Europe and APAC.
Select researchers had the chance to review the Jetson TX1 Developer Kit in the lead-up to launch. MIT professor Dr. Sertac Karaman and his autonomous robotics lab worked hands-on with the new kit, upgrading their self-driving RACECAR from their previous Jetson TK1 setup. The following video shows their autonomous vehicle in action.
In addition to the autonomous RACECAR powered by Jetson TX1, Dr. Karaman’s lab at MIT is behind other projects that use Jetson for autonomy. One is the Persuasive Electric Vehicle (PEV), developed in collaboration with MIT Media Lab’s Changing Places group: a self-driving tricycle that provides autonomous transport of pedestrians and packages in urban environments, and is also powered by Jetson. Using the ecosystem, the students at MIT quickly prototyped their projects and benefited from the flexible development environment and performance afforded by Jetson TX1.
JetPack and Linux For Tegra R23.1
The software ecosystem for Jetson is extensive, and NVIDIA JetPack simplifies software configuration and deployment. JetPack automates the installation process on Jetson to include all the tools and drivers for development. JetPack 2.0 is provided for Jetson TX1. This version of JetPack bundles the following:
- Linux For Tegra (L4T) R23.1
- Tegra System Profiler 2.4 and Graphics Debugger 2.1
- PerfKit 4.5.0
L4T R23.1 ships with U-Boot and a Linux 3.10.64 aarch64 kernel, alongside Ubuntu 14.04. Recent improvements in L4T include GStreamer 1.6 extensions with hardware support for H.265, an improved nvgstcapture sample for testing the camera module, and integrated support for WiFi and Bluetooth.
L4T R23.1 includes support for full desktop OpenGL 4.5, allowing full-on Linux gaming and VR experiences in addition to simulation. OpenGL ES 3.1 is also provided. This release includes OpenCV4Tegra, enabling you to transparently use NEON SIMD extensions from the standard OpenCV interface. For more information, see posts about OpenCV.
CUDA 7 and cuDNN/Caffe
JetPack 2.0 includes the CUDA Toolkit version 7.0, with 16-bit floating-point support (FP16). CUDA 7.0 unleashes the Jetson TX1 integrated Maxwell GPU. Maxwell, with Compute Capability 5.3, supports Dynamic Parallelism and higher performance FP16. The many uses for Dynamic Parallelism in embedded applications include point cloud processing and tree partitioning, parallel path planning and cost estimation, particle filtering, RANSAC, and solvers.
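FP16 trades precision and range for throughput, so it helps to see what a 16-bit float can actually represent before moving a workload to it. The sketch below is not CUDA code; it assumes that Python's standard struct module, whose "e" format is IEEE 754 half precision, is a reasonable stand-in for illustrating the rounding that FP16 storage introduces. The helper name to_fp16 is illustrative.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


print(to_fp16(0.1))      # 0.0999755859375: about 3 decimal digits survive
print(to_fp16(2049.0))   # 2048.0: above 2048, not every integer is representable
print(to_fp16(65504.0))  # 65504.0: the largest finite FP16 value
```

Values such as normalized activations and weights in image classification networks usually sit comfortably inside this range, which is one reason FP16 is attractive for inference, whereas accumulations over many terms may need more care.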
One of the highlights of the Jetson software ecosystem is an incredible deep learning toolkit built on CUDA, providing Jetson with onboard inference and the ability to apply reasoning in the field. Included is the NVIDIA cuDNN library, adopted by multiple deep learning frameworks including Caffe.
We ran a power benchmark using the Caffe AlexNet image classifier, comparing Jetson TX1 to an Intel Core i7-6700K Skylake CPU. Table 1 shows the results. For more information, see Inference: The Next Step in GPU-Accelerated Deep Learning.
Table 1. Efficiency vs. Intel Core i7-6700K.
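A perf-per-watt comparison of this kind comes down to throughput divided by power draw. The following minimal sketch shows the arithmetic only. The function names are illustrative, and the numbers are placeholders chosen for easy mental math; they are not the measured benchmark results.

```python
def perf_per_watt(images_per_sec, watts):
    """Throughput per unit of power, in images/sec/W."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return images_per_sec / watts


def efficiency_ratio(device, baseline):
    """How many times more efficient `device` is than `baseline`.

    Each argument is an (images_per_sec, watts) pair.
    """
    return perf_per_watt(*device) / perf_per_watt(*baseline)


# Placeholder measurements, NOT the values from Table 1.
embedded_module = (100.0, 10.0)  # hypothetical: 100 images/sec at 10 W
desktop_cpu = (120.0, 60.0)      # hypothetical: 120 images/sec at 60 W

ratio = efficiency_ratio(embedded_module, desktop_cpu)
print(f"{ratio:.1f}x the perf-per-watt of the baseline")  # 5.0x ...
```

Note that a device can lose on raw throughput and still win clearly on efficiency, which is the quantity that matters for SWaP-constrained deployments.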
Kespry Designs, a Silicon Valley industrial drone developer, is using deep learning on Jetson TX1 to provide inference on construction sites for asset tracking of equipment and materials. This takes the tiresome, human-intensive work out of looking after assets and on-site logistical planning. Due to the low SWaP and computational capability of Jetson TX1, Kespry plans to migrate processing onboard unmanned aerial vehicles instead of offline in the data center, shortening response times for tasks like inspection and triage.
Kespry developed their proof-of-concept on the Jetson TX1 Developer Kit in just a few weeks. The prototype uses a Caffe model trained to recognize and count different classes of construction equipment. Using Jetson TX1, Kespry is now deploying this previously offline process in real time onboard their drone. Jetson can move resource-intensive tasks once performed in a data center onto mobile platforms, thereby closing the loop on response and improving quick-reaction capabilities, creating new opportunities for companies like Kespry.
Jetson TX1 marks the first release of VisionWorks available to developers through JetPack 2.0 and the Embedded Developer Zone. Built on Khronos Group’s OpenVX standard for power-efficient vision processing, VisionWorks provides primitives and building blocks that are highly optimized for Tegra using tuned CUDA kernels. Figure 5 shows the results of benchmarks that we ran on Jetson TX1, profiling the differences between VisionWorks and OpenCV.
VisionWorks is more than 10x faster than upstream CPU-only OpenCV, is 4.5x faster than OpenCV4Tegra with NEON extensions, and is 1.6x faster than OpenCV’s GPU module. The Overall Computer Vision Score was collected from the geometric mean performance of all the overlapping primitives between OpenCV and VisionWorks. Each primitive was measured across image sizes 720p and larger and across all permutations of argument parameters.
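A geometric mean is the usual way to combine per-primitive speedups into a single score, because it keeps one very fast primitive from dominating the result. The sketch below shows the calculation under stated assumptions: the primitive names and speedup values are hypothetical and are not taken from the benchmark data.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one positive value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-primitive speedups (library A time / library B time).
speedups = {
    "gaussian_blur": 2.0,
    "warp_affine": 8.0,
    "harris_corners": 4.0,
}

score = geometric_mean(speedups.values())
print(f"overall speedup: {score:.2f}x")  # overall speedup: 4.00x
```

For comparison, the arithmetic mean of the same three values is about 4.67, which overstates the typical gain by weighting the largest speedup more heavily.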
In addition to more than 50 filtering, warping, and image-enhancement primitives, VisionWorks offers numerous higher-level building blocks, such as LK optical flow, stereo block-matching (SBM), Hough lines and circles, and Harris (Corner) feature-detection and tracking. VisionWorks provides a full implementation of OpenVX 1.1. You can leverage VisionWorks to deploy camera-ready algorithms and vision pipelines, already tuned for Jetson.
Get VisionWorks today on NVIDIA’s Embedded Developer Zone.
Jetson TX1: A rich development platform
The NVIDIA Jetson ecosystem is rich with tools and support for enabling your research and development of applications and products with Jetson TX1. In the larger scheme, NVIDIA software toolkits for accelerated computing, deep learning, computer vision, and graphics are portable from the data center to the workstation to embedded SoC (Figure 6), allowing enterprise users to seamlessly scale and deploy their applications to devices in the field. Using Jetson, you can leverage the NVIDIA shared architecture and power-efficient technology to roll out high-performance embedded systems with ease and flexibility.
Adept at hosting core processing capabilities alongside learning-driven inference and reasoning, Jetson TX1 represents the ultimate in performance and efficiency for powering your device with the next wave of autonomy. With shipments of Jetson TX1 Developer Kit beginning November 16, secure your pre-order today. And let us know about the amazing things you create using Jetson!