Introduction: MongoDB’s Latest Announcement
MongoDB recently announced its financial results for fiscal 2021, which highlighted the growing demand for MongoDB applications. The popular data platform generated revenue of $590.4 million, a 40% increase over the previous year. According to a survey published by Statista, MongoDB is also the second most-demanded database skill, with 17.89% of developers preferring it. MongoDB has now decided to launch its Atlas Data Lake at its much-awaited developer conference. We can also expect a new version of the database, MongoDB 4.2, which is said to include many new security features.
The Atlas Data Lake lets users query data stored in Amazon S3 using the MongoDB Query Language. Queries work regardless of file format, with dedicated support for formats such as BSON, JSON, Parquet, CSV, TSV, and Avro. To use the Atlas Data Lake, users simply point the service at their existing S3 buckets. Best of all, users no longer need to manage servers or infrastructure.
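The Atlas Data Lake does this parsing on the server side. As a rough illustration of what "querying regardless of format" means, the minimal sketch below reads JSON Lines, CSV, and TSV files into the same document shape and applies one equality filter to all of them. The in-memory `BUCKET`, the object keys, and the `find` helper are hypothetical stand-ins, not the Atlas API.

```python
import csv
import io
import json

# Hypothetical stand-in for an S3 bucket: object key -> raw file contents.
BUCKET = {
    "sales/2021-01.json": '{"region": "EU", "amount": 120}\n{"region": "US", "amount": 80}\n',
    "sales/2021-02.csv": "region,amount\nEU,200\nAPAC,50\n",
    "sales/2021-03.tsv": "region\tamount\nUS\t300\nEU\t40\n",
}


def parse_object(key: str, body: str) -> list:
    """Turn one file into documents, picking a parser from the file extension."""
    if key.endswith(".json"):
        # JSON Lines: one JSON document per line.
        return [json.loads(line) for line in body.splitlines() if line.strip()]
    delimiter = "\t" if key.endswith(".tsv") else ","
    rows = csv.DictReader(io.StringIO(body), delimiter=delimiter)
    # CSV/TSV values arrive as strings, so convert digit-only values to int.
    return [{k: int(v) if v.isdigit() else v for k, v in row.items()} for row in rows]


def find(bucket: dict, query: dict) -> list:
    """Equality-only find() across every file in the bucket, whatever its format."""
    results = []
    for key in sorted(bucket):
        for doc in parse_object(key, bucket[key]):
            if all(doc.get(field) == value for field, value in query.items()):
                results.append(doc)
    return results


print(find(BUCKET, {"region": "EU"}))
# [{'region': 'EU', 'amount': 120}, {'region': 'EU', 'amount': 200}, {'region': 'EU', 'amount': 40}]
```

The point of the design is that the caller writes one filter and never needs to know which files were JSON and which were CSV.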
Experts also expect the Atlas Data Lake to support other cloud object stores, such as Azure Storage and Google Cloud Storage, following this development. MongoDB is also working to deepen the integration of its Atlas services with cloud platforms such as Google Cloud.
This integration will make the services of MongoDB's managed database available on Google Cloud Platform. The announcements don't end there: MongoDB will also let users take advantage of Full-Text Search, which allows developers to run advanced text searches powered by Apache Lucene 8, an open-source, Java-based full-text search library.
What is MongoDB 4.2?
The latest version of MongoDB offers new security features, the most significant of which is client-side Field Level Encryption. Why does it matter? Database security used to rely on the server side, so data was accessible to administrators, even those who didn't have client access. As a result, an attacker who gained access to the server could expose the data as well.
With client-side Field Level Encryption, encryption and decryption happen in the drivers on the client, and the keys stay with the client, so the encryption remains separated from the database. MongoDB 4.2 also adds support for distributed transactions, which can span multiple shards. In addition, developers can manage MongoDB deployments through the Kubernetes control plane.
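In MongoDB 4.2, automatic client-side encryption is driven by a JSON Schema-style map that tells the driver which fields to encrypt before a document leaves the client. The minimal sketch below only builds such a map as a Python dict; the namespace `medicalDB.patients`, the field names, and the randomly generated key id are hypothetical. With PyMongo, a map like this would be passed as the `schema_map` argument of `AutoEncryptionOpts`, together with KMS provider settings and a key vault namespace, and a real key id must be the BSON UUID of a key stored in the key vault.

```python
import uuid

# Hypothetical data encryption key id. A real deployment uses the id of a key
# stored in the key vault collection, encoded as a BSON UUID (Binary subtype 4).
DATA_KEY_ID = uuid.uuid4()

DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random"


def encrypted_field(bson_type: str, queryable: bool) -> dict:
    """Describe one field that the driver must encrypt on the client."""
    # Deterministic encryption gives the same ciphertext for the same value, so
    # equality queries still work; random encryption is stronger but not queryable.
    algorithm = DETERMINISTIC if queryable else RANDOM
    return {"encrypt": {"bsonType": bson_type, "algorithm": algorithm}}


def build_schema_map(namespace: str) -> dict:
    """Map a namespace to the fields the driver should encrypt; others stay plaintext."""
    return {
        namespace: {
            "bsonType": "object",
            "encryptMetadata": {"keyId": [DATA_KEY_ID]},
            "properties": {
                "ssn": encrypted_field("string", queryable=True),
                # Arrays cannot use deterministic encryption, so use the random algorithm.
                "medicalRecords": encrypted_field("array", queryable=False),
            },
        }
    }


schema_map = build_schema_map("medicalDB.patients")
```

Because the server only ever receives ciphertext for `ssn` and `medicalRecords`, an administrator or attacker with server access sees encrypted values for those fields.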
What is Realm?
MongoDB has introduced Realm, a mobile database that it acquired, along with Realm's other products, earlier this year. The company is combining its serverless platform, MongoDB Stitch, with Realm's mobile database and synchronization platform under the Realm brand. Realm's synchronization protocol connects to the Atlas cloud database, while Realm Sync lets developers deliver that data to their applications.
What are the benefits of MongoDB’s Atlas Data Lake?
The “data lake” idea became popular with the surge in demand for big data and Hadoop. The data lake was seen as an agile and efficient alternative to the enterprise data warehouse (EDW). Disappointment with EDWs was widespread because getting data into an EDW took a long time. Data in an EDW had to conform to a rigid data model, typically a star or snowflake schema. Converting existing data into this format required tedious manual data modeling, followed by building an Extract, Transform, Load (ETL) pipeline to make sure newly created data reached the EDW.
Building these ETL pipelines often caused long delays, so the EDW, which was supposed to enable rapid decision-making, became a bottleneck. One of the critical benefits of Hadoop was that developers didn't need to define a data model before loading data. Hadoop could accept structured or unstructured data, and there was no need to determine the data's structure before capturing it, so data could be captured immediately. In the world of big data, it was considered crucial to keep raw, unprocessed, untransformed data available for future analysis, which the EDW did not support.
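The "schema on write" approach described above can be sketched in a few lines: every record is transformed into a fixed row shape before it is loaded, and anything that doesn't fit is rejected, so the raw input is not kept. The `SalesFact` row and the field names are hypothetical, standing in for one fact table of a star schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SalesFact:
    """A fixed row shape, standing in for a fact table in a star schema."""
    sale_date: date
    store_id: int
    amount_cents: int


def transform(raw: dict) -> SalesFact:
    """The 'T' of ETL: force a raw record into the warehouse's rigid model."""
    return SalesFact(
        sale_date=date.fromisoformat(raw["date"]),
        store_id=int(raw["store"]),
        amount_cents=round(float(raw["amount"]) * 100),
    )


def load(raw_records: list) -> tuple:
    """Schema on write: only records that fit the model are loaded; the rest are rejected."""
    loaded, rejected = [], []
    for raw in raw_records:
        try:
            loaded.append(transform(raw))
        except (KeyError, ValueError):
            # Missing or malformed fields never reach the warehouse.
            rejected.append(raw)
    return loaded, rejected


rows, bad = load([
    {"date": "2021-03-01", "store": "7", "amount": "19.99"},
    {"date": "2021-03-01", "store": "7", "comment": "no amount"},
    {"date": "not a date", "store": "8", "amount": "5"},
])
```

Any new source or field means changing `SalesFact` and `transform` before data can flow again, which is the delay that made the EDW a bottleneck.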
“Schema On Read”
The new paradigm inspired by Hadoop was called “schema on read,” as opposed to the traditional “schema on write.” Companies were encouraged to drop the EDW in favor of the new “data lake,” a vast repository of structured and unstructured data that could be mined for competitive advantage. Unfortunately, this vision of the Hadoop-based data lake was rarely realized in practice.
Although Hadoop offered an economical way to store large volumes of data, it provided no means of turning that data into knowledge, and the data itself was hard to decipher. Schema on read works well when you already understand the structure of the data, but that understanding was often missing. Despite the broken promises of the original data lake, the concept still carries weight in larger enterprises, so MongoDB decided to use the term for one of its newest offerings.
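A minimal sketch of "schema on read," using hypothetical records: writing is trivial because anything is accepted as-is, but the reading code must already know every variant of the data (here, that `total` and `amount` mean the same thing). That is exactly the understanding the paragraph above says was often missing.

```python
import json

# Schema on read: raw records are stored exactly as they arrived, in any shape.
raw_store = [
    '{"date": "2021-03-01", "store": 7, "amount": 19.99}',
    '{"ts": "2021-03-02T10:00:00", "shop": "7", "total": "5.00", "note": "new feed"}',
    '{"sensor": "t-1", "reading": 21.5}',
]


def read_sales(store: list) -> list:
    """Apply a schema only at query time, interpreting each raw record as best we can."""
    sales = []
    for line in store:
        doc = json.loads(line)
        # The reader, not the loader, has to know which field names are equivalent.
        amount = doc.get("amount", doc.get("total"))
        shop = doc.get("store", doc.get("shop"))
        if amount is None or shop is None:
            continue  # not a sales record under this interpretation
        sales.append({"store": int(shop), "amount": float(amount)})
    return sales


print(read_sales(raw_store))
# [{'store': 7, 'amount': 19.99}, {'store': 7, 'amount': 5.0}]
```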
MongoDB’s Atlas Data Lake bears only a superficial resemblance to Hadoop-powered data lakes, but it is still a valuable feature that is likely to see significant uptake. It is more accurately compared to the external table feature in Oracle and other RDBMSs. An external table is one whose data lives outside the database.
The Atlas Data Lake works in a similar way. It presents a collection whose data does not reside within a MongoDB database but is stored as files in a cloud object store.
To create an Atlas Data Lake, you configure a special MongoDB server and supply it with the credentials it needs to connect to one or more S3 buckets. Files in those buckets can be in Avro, JSON, Parquet, or CSV format. Collections are mapped to one or more of these files, which can then be queried with the standard MongoDB find() and aggregate() commands. The files are read-only.
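The mapping from collections to files can be pictured as a small configuration plus a lookup. The dict below is modeled loosely on the shape of an Atlas Data Lake storage configuration (stores, databases, collections, data sources with path patterns); treat the exact field names as an assumption to check against the Atlas documentation. The bucket, store, database, and collection names, the object keys, and the `files_for` helper are hypothetical.

```python
from fnmatch import fnmatch

# Configuration modeled loosely on an Atlas Data Lake storage configuration.
STORAGE_CONFIG = {
    "stores": [
        {"name": "salesStore", "provider": "s3", "bucket": "example-sales-bucket", "region": "us-east-1"},
    ],
    "databases": [
        {
            "name": "sales",
            "collections": [
                {"name": "orders", "dataSources": [{"storeName": "salesStore", "path": "/orders/*"}]},
                {"name": "customers", "dataSources": [{"storeName": "salesStore", "path": "/customers/*.csv"}]},
            ],
        }
    ],
}

# Object keys assumed to exist in the bucket.
OBJECT_KEYS = ["/orders/2021-01.json", "/orders/2021-02.parquet", "/customers/all.csv", "/logs/app.log"]


def files_for(config: dict, keys: list, database: str, collection: str) -> list:
    """Resolve which read-only files back a virtual collection."""
    known_stores = {store["name"] for store in config["stores"]}
    for db in config["databases"]:
        if db["name"] != database:
            continue
        for coll in db["collections"]:
            if coll["name"] != collection:
                continue
            matched = set()
            for source in coll["dataSources"]:
                if source["storeName"] not in known_stores:
                    raise KeyError(f"unknown store {source['storeName']!r}")
                # One path pattern can match many files, so a collection spans many objects.
                matched.update(k for k in keys if fnmatch(k, source["path"]))
            return sorted(matched)
    raise KeyError(f"{database}.{collection} is not mapped to any files")


print(files_for(STORAGE_CONFIG, OBJECT_KEYS, "sales", "orders"))
# ['/orders/2021-01.json', '/orders/2021-02.parquet']
```

Files that match no pattern (here `/logs/app.log`) simply do not belong to any collection, and nothing in the lookup ever writes to the bucket.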
The Atlas Data Lake offers at least two notable benefits. First, it allows data in object files to be queried with familiar MongoDB syntax without loading that data into MongoDB. Second, it provides a low-cost storage mechanism for cold MongoDB data. The only drawback is that features such as indexing and query optimization are not available. Currently, the data lake feature is available only through the Atlas cloud offering, and only for data stored in Amazon S3 buckets. We can expect it to become available on other cloud platforms and for on-premises deployments soon.
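Both the benefit (cheap storage for cold data) and the drawback (no indexes) can be seen in a minimal sketch: older documents are moved out of the live collection into one JSON Lines blob, and any query over that blob has to read every archived document. The documents, the cutoff date, and both helpers are hypothetical; in practice the upload to S3 and the deletion from the cluster would be done with the AWS SDK and a MongoDB driver, which are omitted here.

```python
import json
from datetime import datetime


def archive_cold(docs: list, cutoff: datetime) -> tuple:
    """Keep recent documents 'hot' and serialize older ones into one JSON Lines object."""
    hot = [d for d in docs if d["createdAt"] >= cutoff]
    cold = [d for d in docs if d["createdAt"] < cutoff]
    # JSON has no date type, so datetimes are written as ISO-8601 strings.
    blob = "\n".join(json.dumps({**d, "createdAt": d["createdAt"].isoformat()}) for d in cold)
    return hot, blob


def scan_archive(blob: str, predicate) -> tuple:
    """Without indexes, every archived document must be read to answer a query."""
    scanned, matches = 0, []
    for line in blob.splitlines():
        doc = json.loads(line)
        scanned += 1
        if predicate(doc):
            matches.append(doc)
    return matches, scanned


docs = [
    {"_id": 1, "createdAt": datetime(2019, 1, 1), "status": "closed"},
    {"_id": 2, "createdAt": datetime(2019, 6, 1), "status": "open"},
    {"_id": 3, "createdAt": datetime(2021, 1, 1), "status": "open"},
]
hot, blob = archive_cold(docs, cutoff=datetime(2020, 1, 1))
matches, scanned = scan_archive(blob, lambda d: d["status"] == "open")
```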
What are the new features in MongoDB 5.0?
1. Native time-series collections
Time-series collections can run alongside regular collections within the same database, so a single platform can support any workload. An automated, seamless data lifecycle covers visualization, online archiving, real-time analysis, and data coupling. Using the platform for time-series data management has several benefits: it drastically lowers index sizes and I/O for operations, which improves performance, and it also reduces storage size. In addition, the complete data lifecycle, from ingestion and storage to visualization, and from online archiving to automatic expiration, can be handled efficiently.
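Time-series collections store measurements in internal buckets rather than one document per measurement, which is one reason index sizes and I/O shrink. In MongoDB 5.0 such a collection is created with options along the lines of `db.create_collection("weather", timeseries={"timeField": "timestamp", "metaField": "sensor", "granularity": "hours"}, expireAfterSeconds=86400)` in PyMongo; the collection and field names there are hypothetical. The sketch below only simulates the grouping idea with fixed one-hour windows per sensor, which is a simplification, not MongoDB's actual bucketing algorithm.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_measurements(measurements: list, span: timedelta = timedelta(hours=1)) -> dict:
    """Group measurements that share a sensor and fall in the same time window."""
    buckets = defaultdict(list)
    for m in measurements:
        # Truncate the timestamp to the start of its window.
        start = datetime.min + ((m["timestamp"] - datetime.min) // span) * span
        buckets[(m["sensor"], start)].append(m["value"])
    return dict(buckets)


t0 = datetime(2021, 7, 13, 10, 0)
# Hypothetical readings: two sensors, one reading every 10 minutes for two hours.
readings = [
    {"sensor": s, "timestamp": t0 + timedelta(minutes=10 * i), "value": float(i)}
    for s in ("a", "b")
    for i in range(12)
]
buckets = bucket_measurements(readings)
print(len(readings), "measurements ->", len(buckets), "buckets")
# 24 measurements -> 4 buckets
```

An index over the buckets needs 4 entries instead of 24, and the gap grows with the number of readings per window.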
2. Smooth data redistribution
Sharding splits large datasets into chunks distributed across multiple servers, so the database can hold larger datasets and handle more requests. The platform offers live resharding, which lets users change the shard key of a collection on demand as data and workloads continue to grow. Best of all, resharding takes place without any downtime.
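Why change a shard key at all? A minimal sketch with hypothetical data: when every order shares the same `country`, a hashed placement on that key puts all documents on one shard, while resharding on a high-cardinality key such as `orderId` spreads them out. In MongoDB 5.0 the real operation is the `reshardCollection` admin command, which takes the namespace and the new key; the SHA-256 placement below is only a stand-in for how the server assigns documents to shards.

```python
import hashlib
from collections import Counter


def shard_for(value, num_shards: int) -> int:
    """Hashed placement sketch: map a shard key value to one of num_shards shards."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(docs: list, shard_key: str, num_shards: int) -> Counter:
    """Count how many documents each shard would own under a given shard key."""
    return Counter(shard_for(doc[shard_key], num_shards) for doc in docs)


# Hypothetical orders: every order comes from the same country.
orders = [{"orderId": i, "country": "US"} for i in range(100)]

before = distribution(orders, "country", num_shards=4)  # low-cardinality key: one hot shard
after = distribution(orders, "orderId", num_shards=4)   # after "resharding" on orderId
print(before)
print(after)
```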
3. Future-proof application compatibility
Future-proofing is provided by the Versioned API, which decouples the application lifecycle from the database lifecycle. This offers greater investment protection, and developers can be confident that their code will keep running without interruption for years.
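With the Versioned API, a driver pins each command to an API version by adding fields such as `apiVersion` and `apiStrict` to the command document; in PyMongo this is enabled with `MongoClient(uri, server_api=ServerApi("1", strict=True))`, where `ServerApi` comes from `pymongo.server_api`. The sketch below imitates, on the client side, the check that the server performs in strict mode. `API_V1_COMMANDS` is an illustrative subset rather than the official list, and `pin_command` and `APIStrictError` are hypothetical names.

```python
# Illustrative subset of commands covered by API version "1" (not the full list).
API_V1_COMMANDS = {"find", "insert", "update", "delete", "aggregate", "getMore", "ping"}


class APIStrictError(Exception):
    """Raised when a strict-mode command falls outside the pinned API version."""


def pin_command(command: dict, strict: bool = True) -> dict:
    """Return a copy of a command document pinned to API version "1"."""
    name = next(iter(command))  # the first key of a MongoDB command is its name
    if strict and name not in API_V1_COMMANDS:
        raise APIStrictError(f"{name!r} is not part of API version 1")
    return {**command, "apiVersion": "1", "apiStrict": strict}


cmd = pin_command({"find": "orders", "filter": {"status": "open"}})
print(cmd)
```

Because the application states which API version it was written against, a server upgrade cannot silently change the behavior of the commands it relies on.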
4. Multi-cloud security tools
MongoDB 5.0 also offers Client-Side Field Level Encryption, which provides robust data privacy controls, along with support for auditing and certificate rotation. These let users maintain strong security without interruptions while running applications anywhere. Client-Side Field Level Encryption encrypts data in specific fields, and users can encrypt and decrypt that data using a single key.
Other features
Other enhancements include Function Scoring, which lets users apply mathematical formulas to field values to influence search scores, and the ability to specify synonyms for search indexes. With the latest developments, game developers can also use Realm to store game data such as player stats, scores, and rankings. MongoDB Charts now integrates with the Atlas Data Lake, so users can visualize data stored in Amazon S3 without worrying about data transformation, duplication, or movement.
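The two search features mentioned above can be illustrated together in plain Python: a synonym mapping widens the query, and a function score multiplies text relevance by a formula over a numeric field. Atlas Search expresses this inside a `$search` stage rather than in application code; the sketch below is not that API. The synonym mapping, the documents, and the `relevance * log(1 + rating)` formula are hypothetical choices for illustration.

```python
import math

# Hypothetical synonym mapping: a query for "car" also matches these words.
SYNONYMS = {"car": {"automobile", "vehicle"}}


def expand(term: str) -> set:
    """Return the query term plus any synonyms mapped to it."""
    return {term} | SYNONYMS.get(term, set())


def search(docs: list, term: str) -> list:
    """Rank matching titles by text relevance multiplied by a field-based boost."""
    terms = expand(term)
    results = []
    for doc in docs:
        words = doc["title"].lower().split()
        relevance = sum(words.count(t) for t in terms)
        if relevance == 0:
            continue
        # Function score: combine relevance with the numeric "rating" field.
        score = relevance * math.log1p(doc["rating"])
        results.append((score, doc["title"]))
    return [title for _, title in sorted(results, reverse=True)]


docs = [
    {"title": "Used car for sale", "rating": 1},
    {"title": "Vintage automobile", "rating": 9},
    {"title": "Bicycle", "rating": 10},
]
print(search(docs, "car"))
# ['Vintage automobile', 'Used car for sale']
```

Without the synonym mapping, "Vintage automobile" would not match at all; without the function score, both matches would tie on relevance.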
Saffron Tech is a premier MongoDB open-source development company with more than 12 years of experience creating high-quality products for its clients. We have clients across the globe, and our team of highly skilled developers has helped boost their ROI and customer loyalty. Contact us today!