I released Easy Data Transform v2 today. After no fewer than 80 (!) v1 production releases since 2019, this is the first paid upgrade.
Major improvements include:
- Schema versioning, so you can automatically handle changes to the column structure of an input (e.g. additional or missing columns).
- A new Verify transform so you can check a dataset has the expected values.
Currently there are 48 different verification checks you can make:
- At least 1 non-empty value
- Contains
- Don’t allow listed values
- Ends with
- Integer except listed special value(s)
- Is local file
- Is local folder
- Is lower case
- Is sentence case
- Is title case
- Is upper case
- Is valid EAN13
- Is valid email
- Is valid telephone number
- Is valid UPC-A
- Match column name
- Matches regular expression
- Maximum characters
- Maximum number of columns
- Maximum number of rows
- Maximum value
- Minimum characters
- Minimum number of columns
- Minimum number of rows
- Minimum value
- No blank values
- No carriage returns
- No currency
- No digits
- No double spaces
- No duplicate column names
- No duplicate values
- No empty rows
- No empty values
- No gaps in values
- No leading or trailing whitespace
- No line feeds
- No non-ASCII
- No non-printable
- No punctuation
- No symbols
- No Tab characters
- No whitespace
- Numeric except listed special value(s)
- Only allow listed values
- Require listed values
- Starts with
- Valid date in format
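For readers curious what the EAN13 and UPC-A validity checks in the list above involve, here is a minimal Python sketch of the standard GS1 mod-10 check digit rule. It only illustrates the arithmetic and is not the product's own implementation; the function names are hypothetical and chosen for this example.

```python
def _check_digit_ok(digits: str) -> bool:
    """Return True if the last digit is the correct GS1 mod-10 check digit."""
    body, check = digits[:-1], int(digits[-1])
    # Weights alternate 3, 1, 3, 1, ... starting from the digit
    # immediately to the left of the check digit.
    total = sum(
        int(d) * (3 if i % 2 == 0 else 1)
        for i, d in enumerate(reversed(body))
    )
    return (10 - total % 10) % 10 == check


def is_valid_ean13(value: str) -> bool:
    """Hypothetical helper: 13 ASCII digits with a correct check digit."""
    return (
        len(value) == 13
        and value.isascii()
        and value.isdigit()
        and _check_digit_ok(value)
    )


def is_valid_upc_a(value: str) -> bool:
    """Hypothetical helper: 12 ASCII digits with a correct check digit."""
    return (
        len(value) == 12
        and value.isascii()
        and value.isdigit()
        and _check_digit_ok(value)
    )
```

The same helper serves both formats because the weighting is counted from the right-hand end of the code, so the only difference between the two checks is the expected length.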
You can see any fails visually, with color coding by severity:
- Side-by-side comparison of dataset headers:
- Side-by-side comparison of dataset data values:
- Lots of extra matching options for the Lookup transform:
Allowing you to do exotic lookups such as:
Plus lots of other changes.
In v1 there were issues related to how column-related changes cascaded through the system. This was the hardest thing to get right, and it took a pretty big redesign to fix all the issues. As a bonus, you can now disconnect and reconnect nodes, and it remembers all the column-based options (within certain limits). These changes make Easy Data Transform feel much more robust to use, as you can now make lots of changes without worrying too much about breaking things further downstream.
Easy Data Transform now supports:
- 9 input formats (including various CSV variants, Excel, XML and JSON)
- 66 different data transforms (such as Join, Filter, Pivot, Sample and Lookup)
- 11 output formats (including various CSV variants, Excel, XML and JSON)
This allows you to snap together a sequence of nodes like Lego, to very quickly transform or analyse your data. Unlike a code-based approach (such as R or Python) or a command line tool, it is extremely visual, with pretty much instant feedback every time you make a change. Plus, no pesky syntax to remember.
Eating my own dogfood, using Easy Data Transform to create an email marketing campaign from various disparate data sources (mailing lists, licence key databases etc).
Easy Data Transform is all written in C++ with memory compression and reference counting, so it’s fast and memory efficient and can handle multi-million row datasets with no problem.
While many of my competitors are transitioning to the web, Easy Data Transform remains a local tool for Windows and Mac. This has several major advantages:
- Your sensitive data stays on your computer.
- Less latency.
- I don’t have to pay your compute and bandwidth costs, which means I can charge an affordable one-time fee for a perpetual licence.
I think privacy is only going to become ever more of a concern as rampaging AIs try to scrape every single piece of data they can find.
Usage-based fees for online data tools are no small matter. For a range of usage fee horror stories, such as enabling debug logging in a large production ETL pipeline resulting in $100k of extra costs in a week, see this Reddit post. Some of my customers have processed more than a billion rows in Easy Data Transform. Not bad for $99!
It has been a lot of hard work, but I’m pleased with how far Easy Data Transform has come. I think Easy Data Transform is now a comprehensive, fast and robust tool for file-based data wrangling. If you have some data to wrangle, give it a try! It’s only $99+tax ($40+tax if you are upgrading from v1) and there is a fully functional, 7 day free trial here:
Download Easy Data Transform v2
I’m very grateful to my customers, who’ve been a huge help in providing feedback. This has improved the product no end. Many heads are better than one!
The next big step is going to be adding the ability to talk directly to databases, REST APIs and other data sources. I also hope one day to add the ability to visualise data using graphs and charts. Watch this space!