The Feature List: Oracle Endeca Information Discovery 3.1
As promised last week, we've been compiling a list of all the new features that were added as part of the Oracle Endeca Information Discovery (OEID) 3.1 release earlier this month.
If we've missed anything, please shoot us an email and we'll update the post.
OEID Integrator v3.1
The gang at Javlin has implemented some major updates in the past 6 months, especially around big data. The OEID Integrator releases, naturally, lag a bit behind the corresponding CloverETL releases, but there's still a lot to get excited about from both a CloverETL and a "pure OEID" standpoint:
- Base CloverETL version upgraded from 3.3 to 3.4 - full details here
- HadoopReader / HadoopWriter components
- Job Control component for executing MapReduce jobs
- Integrated Hive JDBC connection
- Language Detection component!
The big takeaway here is the work that the Javlin team has done in terms of integrating their product more closely with the "Big Data" ecosystem. Endeca has always been a great complementary fit with sophisticated data architectures based on Hadoop and these enhancements will only make it easier.
In keeping with our obsession with small wins that deliver big gains, I really like the Language Detection component. This is something that had been around "forever" in the old Endeca world of Forge and Dgidx but was rarely used or understood. It is nice to see this functionality return, as it will play a huge role in multi-lingual/multi-national organizations, especially those with a lot of unstructured data. Think about a European bank with a large presence in multiple countries trying to hear the "Voice of the Customer". Navigating, filtering and summarizing by a customer's native language gets so much easier.
OEID Web Acquisition Toolkit (aka KAPOW!)
The Kapow suite of tools is a new addition to the product, announced earlier this month. The way we've heard it described by the team inside Oracle is "ETL for the Web". It essentially lets you build workflows that greatly simplify extraction of any piece of data that is available online. From what we've seen of the software, it's going to be a tremendous addition to the suite, especially for semi-structured data or the classic problem of "deriving structure from unstructured". It also has the potential to greatly simplify acquisition of social data, allowing developers to acquire data quickly and visually, rather than having to learn new APIs, come up to speed on OAuth and climb the rest of the learning curve associated with consuming data from the Twitters and LinkedIns of the world.
OEID Self-Provisioning Service
- Upload JSON Files as well as Excel
- Advanced data cleanup and configuration tools for Self-Service
- Connect to a database via JDBC
- Connect directly to Oracle Business Intelligence (OBIEE)
- Mash-up multiple data sources
- Online Integrated Text Enrichment for Self-Service
Self-provisioned applications continue to get more robust with the ability to leverage additional sources, specify mapping and formatting rules and perform "light" ETL and enrichment tasks such as entity extraction. For example, when uploading a spreadsheet, you can preview your data.
You can also see a sampling of values, combine/split related fields and perform data cleanup tasks that would normally require scrubbing in the data warehouse or ETL. You can also set fields to be searchable or specify the language for a given field.
Once the creation process begins, you don't need to wait for your data to load completely before you start discovering. Once the first 100 records are brought in, the application redirects to a home page where the data is available for exploration. The records continue to load in the background, but users can begin to work and receive a small prompt when the load completes. And once it completes, users can mash up and augment their data with additional data sets.
It's a really powerful self-service experience and a true step forward from previous versions. There's an excellent post specific to self-service from Farnaz Mostowfi at Rittman Mead that goes into more detail here.
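To make the "explore after the first 100 records" behavior a bit more concrete, here's a minimal Python sketch of the pattern as we understand it from the outside. This is not OEID code or an OEID API: the BackgroundLoad class and its field names are made up for illustration, and only the 100-record threshold comes from what we observed.

```python
import threading

READY_THRESHOLD = 100  # from the post: the app redirects once the first 100 records are in


class BackgroundLoad:
    """Hypothetical stand-in for a self-service data load; not an OEID API."""

    def __init__(self, records):
        self._records = list(records)
        self.loaded = []
        self.ready = threading.Event()  # set once enough records exist to explore
        self.done = threading.Event()   # set when the full load completes
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        for record in self._records:
            self.loaded.append(record)
            if len(self.loaded) >= READY_THRESHOLD:
                self.ready.set()
        # Small data sets become explorable as soon as they finish loading.
        self.ready.set()
        self.done.set()


load = BackgroundLoad({"id": i} for i in range(1000)).start()

load.ready.wait(timeout=5)          # the "redirect to the home page" moment
count_at_ready = len(load.loaded)   # at least 100 here, possibly more

load.done.wait(timeout=5)           # the "small prompt" moment
```

The two events are the point: one says "there's enough here to start exploring", the other says "everything is in", and the user never has to sit and wait for the second one.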
OEID Studio v3.1
There are so many big new things in Studio that we're still discovering more every time we go in. Here's what we've found so far:
- Sections of data, previously achieved through record filters and record type fields, are now "Datasets", a first-class entity in Oracle Endeca Server
- All portlets are now associated with these datasets and are context-aware about which refinements do and do not apply
- Because of this, the State Manager extension point no longer exists and has been incorporated directly into the engine (how this affects pivoting will be a topic for a future post. Spoiler alert: You're still going to need this.)
- Other ripple effects of this are seen across OEID Studio from Security filter application (it happens per Dataset now, rather than globally) to search interfaces and configuration (certain search interfaces may not apply to certain Datasets)
- Metrics Bar and the Alerts Component have been merged into a single Summarization Bar component
- Cosmetic changes related to renaming of portlets (Ex: Guided Navigation is now Available Refinements)
There are also a bunch of additional UI changes and improved treatments such as one-click editing, heat mapping, separate control panel menus and a new menu bar for adding components to pages.
Can't wait to get our components upgraded and working inside 3.1 with some nice big icons.
Oracle Endeca Server
The very nature of Endeca Server, being a query processing engine, means a lot of the changes and work that go into a release can go unnoticed. In addition to the aforementioned "Datasets" change (sometimes referred to as "Collections" in Endeca Server documentation), we've uncovered some other key enhancements:
- The Endeca Query Language (EQL) has been enhanced with respect to treatment of multi-assign attributes in the engine. Users now have the ability to group by individual values (how these attributes previously worked) or unique sets/combinations of values with a host of new, supporting EQL functions such as SET_UNIONS and CARDINALITY.
- Since datasets are now a first-class entity, all views must leverage a dataset, though it's unclear whether advanced developers who write "pure EQL" are bound by this
- Thirteen additional languages including Croatian, Arabic and Danish
- Per-query control over implicit and explicit refinement calculation
- Greater control over performance characteristics such as memory-to-index-size and threads-per-core ratios
- Auto-idling of Endeca Server instances when not queried for a period of time
- Ability to automate cache warming based on previous queries/requests
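On the multi-assign change in the EQL bullet above, the difference between grouping by individual values and grouping by unique sets is easier to see with a tiny example. The sketch below is plain Python rather than EQL, and the records, attribute name and helper functions are all made up for illustration; it only shows the two grouping semantics, not how the engine implements them.

```python
from collections import Counter

# Hypothetical records; "languages" plays the role of a multi-assign attribute.
records = [
    {"id": 1, "languages": {"en", "fr"}},
    {"id": 2, "languages": {"en"}},
    {"id": 3, "languages": {"fr", "en"}},
    {"id": 4, "languages": {"de"}},
]


def count_by_member(rows, attr):
    """Group by individual values: a record lands in one group per value it holds."""
    counts = Counter()
    for row in rows:
        for value in row[attr]:
            counts[value] += 1
    return counts


def count_by_set(rows, attr):
    """Group by unique combination: a record lands in exactly one group."""
    return Counter(frozenset(row[attr]) for row in rows)


by_member = count_by_member(records, "languages")
by_set = count_by_set(records, "languages")

print(dict(by_member))
print({tuple(sorted(key)): n for key, n in by_set.items()})
```

Grouping by member counts records 1 and 3 under both "en" and "fr", so the group counts add up to more than the number of records. Grouping by set puts those two records together in a single {"en", "fr"} group, and every record is counted exactly once.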
I'm sure we've missed something, so please let us know in the comments if you spot it. To go meta for a moment, if you're looking for some evidence as to how large and feature-rich this release is, we've just cranked out a 1,200-word post on it. We detailed some of the previous releases and their features in fewer than 500 words.
Also, we purposefully did not get into the performance updates that we've been told are present, as we'd like to make sure we can quantify improvements of that nature. That said, anecdotally, on the datasets we've loaded, ingest seems much faster and EQL execution is even snappier than before (though we honestly haven't seen unexpected sluggishness with our customers in over a year).
We'll have a lot more to say, including a mammoth post on pivoting between Datasets. For now, we're extremely excited about the capabilities of this release and about helping to bring enterprise-class Data Discovery to the masses.