The Trials of Smooks
16 Sep 2014 - Claude
This post was reproduced on the On Code & Design blog.
I’m a hard guy to please, which explains why I rarely show appreciation for a tool: I get frustrated easily when a tool fails to meet the challenges it’s meant to solve. Smooks is one of the few tools I appreciate. It’s an invaluable transformation framework in the integrator’s arsenal. On one project, I threw all manner of challenges at Smooks [1], and one after another, Smooks overcame them without giving up a key requirement: keeping memory overhead low during transformation. A shout-out to Tom Fennelly and his team for bringing us such a fantastic tool.
Trial I
The initial challenge I brought to Smooks was taking a tilde-delimited CSV file (products.csv) and mapping its records to POJOs:
0DARIENZO 20140408
3~098~032~Shampoo
3~075~392~Laptop
1~032~478~Spade
3~321~021~Blades
2~045~432~Mobile
...
...
...
9000000003
You can see the file has an unorthodox header as well as a footer. Using Smooks’s built-in CSV reader, I concisely wrote the Smooks config that maps the records to POJOs:
<?xml version='1.0' encoding='UTF-8'?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.5.xsd">
    <csv:reader separator="~" fields="recordClass,code,itemId,itemDesc">
        <csv:singleBinding beanId="product" class="org.ossandme.Product" />
    </csv:reader>
</smooks-resource-list>
Under the covers, as with Smooks readers in general, the reader pulls data from a source (e.g., a java.io.InputStream) and produces a stream of SAX events from it. The reader above expects the source data to be CSV with 4 columns. To make things more concrete: reading from the products.csv file, the reader produces the following XML stream [2]:
<csv-set>
    <csv-record number="1">
        <recordClass>3</recordClass>
        <code>098</code>
        <itemId>032</itemId>
        <itemDesc>Shampoo</itemDesc>
    </csv-record>
    <csv-record number="2">
        <recordClass>3</recordClass>
        <code>075</code>
        <itemId>392</itemId>
        <itemDesc>Laptop</itemDesc>
    </csv-record>
    <csv-record number="3">
        <recordClass>1</recordClass>
        <code>032</code>
        <itemId>478</itemId>
        <itemDesc>Spade</itemDesc>
    </csv-record>
    <csv-record number="4">
        <recordClass>3</recordClass>
        <code>321</code>
        <itemId>021</itemId>
        <itemDesc>Blades</itemDesc>
    </csv-record>
    <csv-record number="5">
        <recordClass>2</recordClass>
        <code>045</code>
        <itemId>432</itemId>
        <itemDesc>Mobile</itemDesc>
    </csv-record>
    ...
</csv-set>
Listening to the stream of SAX events is the visitor. A visitor listens for specific events in the stream and fires some kind of behaviour in response, typically a transformation. With the singleBinding element in the csv-to-pojos.xml config, the CSV reader pre-configures a JavaBean visitor to listen for csv-record elements. On intercepting such an element, the JavaBean visitor instantiates an org.ossandme.Product object and binds its properties to the content of csv-record’s child elements. You’ll notice that I left Product’s target properties unspecified in the config: the CSV reader assumes Product follows JavaBean conventions and that its properties are named after the defined CSV columns. Records that don’t fit the column definition are ignored, so I don’t need to worry about the file’s header and footer.
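To make the reader’s role more concrete, here is a JDK-only sketch of the idea; it is not Smooks’s implementation. It turns tilde-delimited lines into the csv-set/csv-record structure shown above and skips any line whose field count doesn’t match the column list, which is why the header and footer never reach the visitors. The class name CsvToXmlSketch and the use of StAX’s XMLStreamWriter are my own choices, and numbering records in the order they are emitted is an assumption that happens to match the sample stream.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch of what a CSV reader does: one csv-record per line whose
// field count matches the declared columns; every other line is silently skipped.
final class CsvToXmlSketch {

    static String toXml(List<String> lines, String[] columns) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("csv-set");
            int number = 0;
            for (String line : lines) {
                // "~" is not a regex metacharacter; limit -1 keeps trailing empty fields
                String[] values = line.split("~", -1);
                if (values.length != columns.length) {
                    continue; // header, footer and malformed lines
                }
                xml.writeStartElement("csv-record");
                xml.writeAttribute("number", Integer.toString(++number));
                for (int i = 0; i < columns.length; i++) {
                    xml.writeStartElement(columns[i]);
                    xml.writeCharacters(values[i]);
                    xml.writeEndElement();
                }
                xml.writeEndElement(); // csv-record
            }
            xml.writeEndElement(); // csv-set
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Called with the lines of products.csv and the columns recordClass, code, itemId and itemDesc, the header and footer each split into a single field and are dropped.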
With the transformation configuration out of the way, I turned my attention to running the transformation on the CSV file from my Java code and processing the Product objects as Smooks instantiates and binds them:
package org.ossandme;
import javax.xml.transform.stream.StreamSource;
import org.milyn.Smooks;
import org.milyn.container.ExecutionContext;
import org.milyn.javabean.lifecycle.BeanContextLifecycleEvent;
import org.milyn.javabean.lifecycle.BeanContextLifecycleObserver;
import org.milyn.javabean.lifecycle.BeanLifecycle;
public class CsvToPojosTransformer {
    public void transform() throws Exception {
        // create a Smooks instance for transforming CSV to Products
        Smooks smooks = new Smooks(CsvToPojosTransformer.class.getResourceAsStream("/csv-to-pojos.xml"));
        try {
            ExecutionContext executionContext = smooks.createExecutionContext();
            // set an event listener on Smooks
            executionContext.getBeanContext().addObserver(new BeanContextLifecycleObserver() {
                @Override
                public void onBeanLifecycleEvent(BeanContextLifecycleEvent event) {
                    // apply logic only once Smooks has created an 'org.ossandme.Product' and set its properties
                    if (event.getLifecycle().equals(BeanLifecycle.END_FRAGMENT) && event.getBeanId().toString().equals("product")) {
                        Product product = (Product) event.getBean();
                        System.out.println(product.getItemDesc());
                        // DO STUFF
                        // ...
                    }
                }
            });
            // transform CSV to Products
            smooks.filterSource(executionContext, new StreamSource(CsvToPojosTransformer.class.getResourceAsStream("/products.csv")));
        } finally {
            // release Smooks resources even if the transformation fails
            smooks.close();
        }
    }
}
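The post doesn’t show org.ossandme.Product itself. Because the binding relies on JavaBean conventions, a class along the following lines should be enough. The String property types are my assumption (they preserve leading zeros such as 098), and in a real project the class would be public and live in its own file.

```java
// Hypothetical sketch of org.ossandme.Product: a plain JavaBean whose property names
// match the columns declared in csv-to-pojos.xml (recordClass, code, itemId, itemDesc).
// The implicit no-arg constructor lets a binder instantiate it reflectively.
class Product {
    private String recordClass;
    private String code;
    private String itemId;
    private String itemDesc;

    public String getRecordClass() {
        return recordClass;
    }

    public void setRecordClass(String recordClass) {
        this.recordClass = recordClass;
    }

    public String getCode() {
        return code;
    }

    public void setCode(String code) {
        this.code = code;
    }

    public String getItemId() {
        return itemId;
    }

    public void setItemId(String itemId) {
        this.itemId = itemId;
    }

    public String getItemDesc() {
        return itemDesc;
    }

    public void setItemDesc(String itemDesc) {
        this.itemDesc = itemDesc;
    }
}
```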
Trial II
A more complex task I gave Smooks was loading file records with a variable number of columns into a database. As in the previous task, the file had a header as well as a footer:
FH~20140407224630~1235~Calo Data
TH~1~2014-04-06~2014-04-06 15:19:59~APPROVED~SALE~109
TB~1~3~APPROVED~Shampoo~29012~2~4.30
TB~1~3~APPROVED~Soap~29012~2~1.00
TB~1~3~APPROVED~Gel~29012~2~2.90
TB~1~3~DECLINED~Soap~29012~2~1.00
TF~1~2014-12-01 00:00:00~VISA
TF~1~2014-12-01 00:00:00~VISA
...
...
...
FT~265449~4412826.67~4410413.48~4248007.43
You’ll observe in the sample CSV file that each record is one of three types, denoted by its first column: TH, TB or TF. As the CSV reader pushes records onto the XML stream, it can be customised to name each record element after the value of that first column rather than csv-record:
<?xml version='1.0' encoding='UTF-8'?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:jb="http://www.milyn.org/xsd/smooks/javabean-1.4.xsd"
                      xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.5.xsd"
                      xmlns:db="http://www.milyn.org/xsd/smooks/db-routing-1.1.xsd"
                      xmlns:ds="http://www.milyn.org/xsd/smooks/datasource-1.3.xsd">
    <csv:reader separator="~" fields="TH[seqNo,startDate,finishDate,status,type,code] | TB[seqNo,type,status,item,voucherNo,dept,amount] | TF[seqNo,expireDate,cardType]"/>
    ...
</smooks-resource-list>
As we’ll see later, the above config permits Smooks to distinguish between the different record types. Given the sample file transactions.csv, the reader I’ve configured produces the following stream:
<csv-set>
    <UNMATCHED number="1">
        <value>FH</value>
    </UNMATCHED>
    <TH number="2">
        <seqNo>1</seqNo>
        <startDate>2014-04-06</startDate>
        <finishDate>2014-04-06 15:19:59</finishDate>
        <status>APPROVED</status>
        <type>SALE</type>
        <code>109</code>
    </TH>
    <TB number="3">
        <seqNo>1</seqNo>
        <type>3</type>
        <status>APPROVED</status>
        <item>Shampoo</item>
        <voucherNo>29012</voucherNo>
        <dept>2</dept>
        <amount>4.30</amount>
    </TB>
    <TB number="4">
        <seqNo>1</seqNo>
        <type>3</type>
        <status>APPROVED</status>
        <item>Soap</item>
        <voucherNo>29012</voucherNo>
        <dept>2</dept>
        <amount>1.00</amount>
    </TB>
    <TB number="5">
        <seqNo>1</seqNo>
        <type>3</type>
        <status>APPROVED</status>
        <item>Gel</item>
        <voucherNo>29012</voucherNo>
        <dept>2</dept>
        <amount>2.90</amount>
    </TB>
    <TB number="6">
        <seqNo>1</seqNo>
        <type>3</type>
        <status>DECLINED</status>
        <item>Soap</item>
        <voucherNo>29012</voucherNo>
        <dept>2</dept>
        <amount>1.00</amount>
    </TB>
    <TF number="7">
        <seqNo>1</seqNo>
        <expireDate>2014-12-01 00:00:00</expireDate>
        <cardType>VISA</cardType>
    </TF>
    <TF number="8">
        <seqNo>1</seqNo>
        <expireDate>2014-12-01 00:00:00</expireDate>
        <cardType>VISA</cardType>
    </TF>
    ...
    <UNMATCHED number="9">
        <value>FT</value>
    </UNMATCHED>
</csv-set>
The UNMATCHED elements represent the file’s header and footer. A CSV record with TH in its first field triggers the reader to create a TH element holding the record’s remaining fields; the same logic applies to TB and TF.
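The variable-field syntax boils down to a lookup from a record’s first field to the names of its remaining fields. The JDK-only sketch below (not Smooks code; VariableFieldSketch is a made-up name) applies that lookup to single lines: a declared type yields an element of that name with named fields, and anything else falls back to UNMATCHED, as in the stream above. I haven’t reproduced how Smooks treats a declared type with too few or too many fields; the sketch simply maps as many fields as are present.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the csv:reader fields attribute expressed as a lookup table.
final class VariableFieldSketch {

    // record type (first field) -> names of the remaining fields
    private static final Map<String, List<String>> LAYOUTS = new HashMap<>();

    static {
        LAYOUTS.put("TH", Arrays.asList("seqNo", "startDate", "finishDate", "status", "type", "code"));
        LAYOUTS.put("TB", Arrays.asList("seqNo", "type", "status", "item", "voucherNo", "dept", "amount"));
        LAYOUTS.put("TF", Arrays.asList("seqNo", "expireDate", "cardType"));
    }

    /** Element name for a line: its record type if declared, otherwise UNMATCHED. */
    static String elementName(String line) {
        String type = line.split("~", -1)[0];
        return LAYOUTS.containsKey(type) ? type : "UNMATCHED";
    }

    /** Named fields for a declared record type, in column order; empty for anything else. */
    static Map<String, String> fields(String line) {
        String[] values = line.split("~", -1);
        List<String> names = LAYOUTS.get(values[0]);
        if (names == null) {
            return Collections.emptyMap();
        }
        Map<String, String> result = new LinkedHashMap<>();
        // values[0] is the record type itself, so field i comes from values[i + 1]
        for (int i = 0; i < names.size() && i + 1 < values.length; i++) {
            result.put(names.get(i), values[i + 1]);
        }
        return result;
    }
}
```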
Database visitors load the records. However, since these visitors can only bind data from POJOs, I first have to turn the XML-mapped records in the stream into POJOs (in this case, java.util.HashMap instances). The CSV reader doesn’t know how to bind variable-field records to POJOs, so I configured the mapping myself:
<?xml version='1.0' encoding='UTF-8'?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:jb="http://www.milyn.org/xsd/smooks/javabean-1.4.xsd"
                      xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.5.xsd"
                      xmlns:db="http://www.milyn.org/xsd/smooks/db-routing-1.1.xsd"
                      xmlns:ds="http://www.milyn.org/xsd/smooks/datasource-1.3.xsd">
    <csv:reader separator="~" fields="TH[seqNo,startDate,finishDate,status,type,code] | TB[seqNo,type,status,item,voucherNo,dept,amount] | TF[seqNo,expireDate,cardType]"/>
    <jb:bean beanId="transactionHeader" class="java.util.HashMap" createOnElement="TH">
        <jb:value property="seqNo" data="TH/seqNo" />
        <jb:value property="startDate" data="TH/startDate" />
        <jb:value property="finishDate" data="TH/finishDate" />
        <jb:value property="status" data="TH/status" />
        <jb:value property="type" data="TH/type" />
        <jb:value property="code" data="TH/code" />
    </jb:bean>
    <jb:bean beanId="transactionBody" class="java.util.HashMap" createOnElement="TB">
        <jb:value property="seqNo" data="TB/seqNo" />
        <jb:value property="type" data="TB/type" />
        <jb:value property="status" data="TB/status" />
        <jb:value property="item" data="TB/item" />
        <jb:value property="voucherNo" data="TB/voucherNo" />
        <jb:value property="dept" data="TB/dept" />
        <jb:value property="amount" data="TB/amount" />
    </jb:bean>
    <jb:bean beanId="transactionFooter" class="java.util.HashMap" createOnElement="TF">
        <jb:value property="seqNo" data="TF/seqNo" />
        <jb:value property="expireDate" data="TF/expireDate" />
        <jb:value property="cardType" data="TF/cardType" />
    </jb:bean>
    ...
</smooks-resource-list>
Given what we’ve learnt about Smooks, we can deduce what’s happening here. The JavaBean visitor declared for transactionHeader has a selector (i.e., createOnElement) for the TH element. A selector is a quasi-XPath expression applied to XML elements as they come through the stream. On encountering TH, the visitor will:
- Instantiate a HashMap.
- Iterate through the TH fragment. If an element inside the fragment matches the selector set in a data attribute, then (a) a map entry is created, (b) bound to the element’s content, and (c) put in the map.
- Add the map to the Smooks bean context under the name set in beanId. The map overwrites any previous map in the context with the same ID. This makes sense since we want to prevent objects from accumulating in memory.
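The three steps can be approximated with the JDK’s DOM API. This sketch is mine, not Smooks’s implementation, and BeanBindingSketch is a hypothetical name: it parses one TH fragment, copies each child element’s text into a fresh HashMap, and stores the map in a plain Map standing in for the bean context, so binding a second TH replaces the first. It keys entries by element name, which only works here because every property attribute in the config matches its element name; the real visitor goes through the data selectors.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for the JavaBean visitor's three steps; not Smooks code.
final class BeanBindingSketch {

    // stand-in for the Smooks bean context: bean ID -> bean
    final Map<String, Object> beanContext = new HashMap<>();

    void bind(String beanId, String fragmentXml) {
        Element fragment;
        try {
            fragment = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(fragmentXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Unparseable fragment: " + fragmentXml, e);
        }
        // step 1: instantiate a HashMap
        Map<String, String> bean = new HashMap<>();
        // step 2: one entry per child element, keyed by element name, bound to its text content
        NodeList children = fragment.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                bean.put(child.getNodeName(), child.getTextContent());
            }
        }
        // step 3: store under the bean ID, replacing any earlier bean with the same ID
        beanContext.put(beanId, bean);
    }
}
```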
The database visitors reference the maps in the bean context:
<?xml version='1.0' encoding='UTF-8'?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:jb="http://www.milyn.org/xsd/smooks/javabean-1.4.xsd"
                      xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.5.xsd"
                      xmlns:db="http://www.milyn.org/xsd/smooks/db-routing-1.1.xsd"
                      xmlns:ds="http://www.milyn.org/xsd/smooks/datasource-1.3.xsd">
    <csv:reader separator="~" fields="TH[seqNo,startDate,finishDate,status,type,code] | TB[seqNo,type,status,item,voucherNo,dept,amount] | TF[seqNo,expireDate,cardType]"/>
    <jb:bean beanId="transactionHeader" class="java.util.HashMap" createOnElement="TH">
        <jb:value property="seqNo" data="TH/seqNo" />
        <jb:value property="startDate" data="TH/startDate" />
        <jb:value property="finishDate" data="TH/finishDate" />
        <jb:value property="status" data="TH/status" />
        <jb:value property="type" data="TH/type" />
        <jb:value property="code" data="TH/code" />
    </jb:bean>
    <jb:bean beanId="transactionBody" class="java.util.HashMap" createOnElement="TB">
        <jb:value property="seqNo" data="TB/seqNo" />
        <jb:value property="type" data="TB/type" />
        <jb:value property="status" data="TB/status" />
        <jb:value property="item" data="TB/item" />
        <jb:value property="voucherNo" data="TB/voucherNo" />
        <jb:value property="dept" data="TB/dept" />
        <jb:value property="amount" data="TB/amount" />
    </jb:bean>
    <jb:bean beanId="transactionFooter" class="java.util.HashMap" createOnElement="TF">
        <jb:value property="seqNo" data="TF/seqNo" />
        <jb:value property="expireDate" data="TF/expireDate" />
        <jb:value property="cardType" data="TF/cardType" />
    </jb:bean>
    <db:executor executeOnElement="TH" datasource="StagingArea">
        <db:statement>INSERT INTO TransactionHeaders (seqNo, startDate, finishDate, status, type, code) VALUES (${transactionHeader.seqNo}, ${transactionHeader.startDate}, ${transactionHeader.finishDate}, ${transactionHeader.status}, ${transactionHeader.type}, ${transactionHeader.code})</db:statement>
    </db:executor>
    <db:executor executeOnElement="TB" datasource="StagingArea">
        <db:statement>INSERT INTO TransactionBody (seqNo, type, status, item, voucherNo, dept, amount) VALUES (${transactionBody.seqNo}, ${transactionBody.type}, ${transactionBody.status}, ${transactionBody.item}, ${transactionBody.voucherNo}, ${transactionBody.dept}, ${transactionBody.amount})</db:statement>
    </db:executor>
    <db:executor executeOnElement="TF" datasource="StagingArea">
        <db:statement>INSERT INTO TransactionFooters (seqNo, expireDate, cardType) VALUES (${transactionFooter.seqNo}, ${transactionFooter.expireDate}, ${transactionFooter.cardType})</db:statement>
    </db:executor>
    ...
</smooks-resource-list>
The insert statements are bound to the map entry values and are executed once the element that the executeOnElement selector points to has been processed. The next step is to configure a datasource for the database visitors (the ds:direct element at the end of the config):
<?xml version='1.0' encoding='UTF-8'?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:jb="http://www.milyn.org/xsd/smooks/javabean-1.4.xsd"
                      xmlns:csv="http://www.milyn.org/xsd/smooks/csv-1.5.xsd"
                      xmlns:db="http://www.milyn.org/xsd/smooks/db-routing-1.1.xsd"
                      xmlns:ds="http://www.milyn.org/xsd/smooks/datasource-1.3.xsd">
    <csv:reader separator="~" fields="TH[seqNo,startDate,finishDate,status,type,code] | TB[seqNo,type,status,item,voucherNo,dept,amount] | TF[seqNo,expireDate,cardType]"/>
    <jb:bean beanId="transactionHeader" class="java.util.HashMap" createOnElement="TH">
        <jb:value property="seqNo" data="TH/seqNo" />
        <jb:value property="startDate" data="TH/startDate" />
        <jb:value property="finishDate" data="TH/finishDate" />
        <jb:value property="status" data="TH/status" />
        <jb:value property="type" data="TH/type" />
        <jb:value property="code" data="TH/code" />
    </jb:bean>
    <jb:bean beanId="transactionBody" class="java.util.HashMap" createOnElement="TB">
        <jb:value property="seqNo" data="TB/seqNo" />
        <jb:value property="type" data="TB/type" />
        <jb:value property="status" data="TB/status" />
        <jb:value property="item" data="TB/item" />
        <jb:value property="voucherNo" data="TB/voucherNo" />
        <jb:value property="dept" data="TB/dept" />
        <jb:value property="amount" data="TB/amount" />
    </jb:bean>
    <jb:bean beanId="transactionFooter" class="java.util.HashMap" createOnElement="TF">
        <jb:value property="seqNo" data="TF/seqNo" />
        <jb:value property="expireDate" data="TF/expireDate" />
        <jb:value property="cardType" data="TF/cardType" />
    </jb:bean>
    <db:executor executeOnElement="TH" datasource="StagingArea">
        <db:statement>INSERT INTO TransactionHeaders (seqNo, startDate, finishDate, status, type, code) VALUES (${transactionHeader.seqNo}, ${transactionHeader.startDate}, ${transactionHeader.finishDate}, ${transactionHeader.status}, ${transactionHeader.type}, ${transactionHeader.code})</db:statement>
    </db:executor>
    <db:executor executeOnElement="TB" datasource="StagingArea">
        <db:statement>INSERT INTO TransactionBody (seqNo, type, status, item, voucherNo, dept, amount) VALUES (${transactionBody.seqNo}, ${transactionBody.type}, ${transactionBody.status}, ${transactionBody.item}, ${transactionBody.voucherNo}, ${transactionBody.dept}, ${transactionBody.amount})</db:statement>
    </db:executor>
    <db:executor executeOnElement="TF" datasource="StagingArea">
        <db:statement>INSERT INTO TransactionFooters (seqNo, expireDate, cardType) VALUES (${transactionFooter.seqNo}, ${transactionFooter.expireDate}, ${transactionFooter.cardType})</db:statement>
    </db:executor>
    <ds:direct bindOnElement="#document" datasource="StagingArea"
               driver="org.apache.derby.jdbc.EmbeddedDriver" url="jdbc:derby:memory:staging"
               autoCommit="true" username="" password="" />
</smooks-resource-list>
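The ${...} expressions in each db:statement refer to a bean ID and a map key in the bean context. I haven’t checked how Smooks binds these values internally, but one safe way to run such a template over JDBC is to swap every placeholder for a ? parameter marker and collect the referenced values in order. The sketch below does exactly that; TemplatedSql and its fields are hypothetical names, and only the ${beanId.key} form used in this config is handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts a ${beanId.key} statement template into JDBC-style SQL.
final class TemplatedSql {

    // matches ${beanId.key}, capturing the bean ID and the map key
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\.(\\w+)\\}");

    final String sql;          // the statement with one ? marker per placeholder
    final List<Object> params; // the bound values, in marker order

    private TemplatedSql(String sql, List<Object> params) {
        this.sql = sql;
        this.params = Collections.unmodifiableList(params);
    }

    static TemplatedSql prepare(String template, Map<String, ? extends Map<String, ?>> beanContext) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer sql = new StringBuffer();
        List<Object> params = new ArrayList<>();
        while (matcher.find()) {
            Map<String, ?> bean = beanContext.get(matcher.group(1));
            if (bean == null) {
                throw new IllegalArgumentException("No bean named " + matcher.group(1));
            }
            params.add(bean.get(matcher.group(2)));
            matcher.appendReplacement(sql, "?");
        }
        matcher.appendTail(sql);
        return new TemplatedSql(sql.toString(), params);
    }
}
```

With a java.sql.Connection at hand, you would pass sql to prepareStatement, bind each value with setObject(i + 1, params.get(i)) and call executeUpdate(). Binding values as parameters, rather than pasting them into the SQL text, also keeps quotes in item descriptions from breaking the statement.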
Last but not least, the Java code to kick off the data load:
package org.ossandme;
import javax.xml.transform.stream.StreamSource;
import org.milyn.Smooks;
public class CsvToDbTransformer {
    public void transform() throws Exception {
        // create a Smooks instance for loading the CSV records to the database
        Smooks smooks = new Smooks(CsvToDbTransformer.class.getResourceAsStream("/transactions-to-db.xml"));
        try {
            // load the records
            smooks.filterSource(new StreamSource(CsvToDbTransformer.class.getResourceAsStream("/transactions.csv")));
        } finally {
            // release Smooks resources even if the load fails
            smooks.close();
        }
    }
}
Trial III
The next challenge for Smooks makes the previous ones look like child’s play. The goal: transform an XML stream to a CSV file that is eventually uploaded to an FTP server. The input:
<queryResult xmlns="http://ossandme.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <record>
        <type>Account</type>
        <First_Name>Carlos</First_Name>
        <Last_Name>Di Sarli</Last_Name>
        <ShippingStreet>San Telmo</ShippingStreet>
        <ShippingCity>Buenos Aires</ShippingCity>
        <ShippingState>N/A</ShippingState>
        <ShippingPostalCode></ShippingPostalCode>
        <Member_Tier__c>Gold</Member_Tier__c>
    </record>
    <record>
        <type>Account</type>
        <First_Name>Osvaldo</First_Name>
        <Last_Name>Fresedo</Last_Name>
        <ShippingStreet></ShippingStreet>
        <ShippingCity>Rome</ShippingCity>
        <ShippingState>N/A</ShippingState>
        <ShippingPostalCode></ShippingPostalCode>
        <Member_Tier__c>Silver</Member_Tier__c>
    </record>
    <record>
        <type>Account</type>
        <First_Name>Roberto</First_Name>
        <Last_Name>Canelo</Last_Name>
        <ShippingStreet>Venezuela</ShippingStreet>
        <ShippingCity>Buenos Aires</ShippingCity>
        <ShippingState>N/A</ShippingState>
        <ShippingPostalCode></ShippingPostalCode>
        <Member_Tier__c>Silver</Member_Tier__c>
    </record>
    <record>
        <type>Account</type>
        <First_Name>Juan</First_Name>
        <Last_Name>D'Arienzo</Last_Name>
        <ShippingStreet></ShippingStreet>
        <ShippingCity></ShippingCity>
        <ShippingState></ShippingState>
        <ShippingPostalCode></ShippingPostalCode>
        <Member_Tier__c>Gold</Member_Tier__c>
    </record>
    ...
</queryResult>
The desired output:
000000Card Extract 20140921
Carlos~Di Sarli~San Telmo~Buenos Aires~N/A~~Gold
Osvaldo~Fresedo~~Rome~N/A~~Silver
Roberto~Canelo~Venezuela~Buenos Aires~N/A~~Silver
Juan~D'Arienzo~~~~~Gold
...
999999002213
Since the CSV could be large, my requirement was for Smooks to write the transformed content to a PipedOutputStream. An FTP library would read from the PipedOutputStream’s connected PipedInputStream and write the streamed content to a file on the server. To this end, I wrote the class running the transformation as follows:
package org.ossandme;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.milyn.Smooks;
public class XmlToCsvTransformer {
    public InputStream transform(final InputStream inputStream) throws Exception {
        // create a Smooks instance for transforming XML to CSV
        final Smooks smooks = new Smooks(getClass().getResourceAsStream("/xml-to-csv.xml"));
        // create an InputStream to be read by the FTP client library
        PipedInputStream pipedInputStream = new PipedInputStream();
        // create an OutputStream for Smooks to write the CSV to
        final PipedOutputStream pipedOutputStream = new PipedOutputStream(pipedInputStream);
        // smooks.filterSource(...) blocks so we carry out the transformation on a new thread
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    // transform XML read from the InputStream to CSV
                    smooks.filterSource(new StreamSource(inputStream), new StreamResult(pipedOutputStream));
                } finally {
                    smooks.close();
                    // close the pipe so the FTP client sees the end of the stream
                    try {
                        pipedOutputStream.close();
                    } catch (IOException e) {
                        // ignore: the pipe is being torn down anyway
                    }
                }
            }
        }).start();
        // return the PipedInputStream to be read by the FTP client library
        return pipedInputStream;
    }
}
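The pipe only works if the writing side runs on its own, started, thread and closes the PipedOutputStream when it is done; otherwise the reading side either waits forever or fails with an IOException once the writer thread dies. The JDK-only sketch below shows that hand-off in isolation, with a fixed string standing in for the CSV Smooks would write and a line reader standing in for the FTP client. PipeSketch is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the pipe hand-off; a fixed string stands in for Smooks's CSV output.
final class PipeSketch {

    /** Starts a producer thread writing content into a pipe and returns the reading end. */
    static InputStream produce(String content) {
        try {
            PipedInputStream in = new PipedInputStream();
            PipedOutputStream out = new PipedOutputStream(in);
            Thread producer = new Thread(() -> {
                // closing the pipe (via try-with-resources) signals end-of-stream to the reader
                try (OutputStream o = out) {
                    o.write(content.getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start(); // without start() nothing is written and the reader waits forever
            return in;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stands in for the FTP client: reads lines until the writer closes the pipe. */
    static String consume(InputStream in) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```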
My focus then turned to the XML-to-CSV mapping configuration. After some deliberation, I reluctantly settled on the FreeMarker visitor for writing the CSV. I had considered developing a visitor specialised for this type of transformation instead, but time constraints made that unfeasible. The FreeMarker visitor, like the database one, cannot read directly off the XML stream; it reads from DOMs and POJOs. So I decided to have the DOM visitor create DOMs from the record elements found in the input stream:
<?xml version='1.0' encoding='UTF-8'?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd">
    <resource-config selector="record">
        <resource>org.milyn.delivery.DomModelCreator</resource>
    </resource-config>
</smooks-resource-list>
I then configured the FreeMarker visitor to apply the CSV template whenever it sees a record element in the stream:
<?xml version='1.0' encoding='UTF-8'?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd">
    <resource-config selector="record">
        <resource>org.milyn.delivery.DomModelCreator</resource>
    </resource-config>
    <ftl:freemarker applyOnElement="record">
        <ftl:template>account.ftl</ftl:template>
    </ftl:freemarker>
</smooks-resource-list>
Below is a simplified version of what I had in real life in account.ftl (note that the template must end with a newline so that each record lands on its own line):
<#ftl ns_prefixes={"ossandme":"http://ossandme.org"}>
${record['ossandme:First_Name']}~${record['ossandme:Last_Name']}~${record['ossandme:ShippingStreet']}~${record['ossandme:ShippingCity']}~${record['ossandme:ShippingState']}~${record['ossandme:ShippingPostalCode']}~${record['ossandme:Member_Tier__c']}
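The template’s job is easy to mirror in plain Java, which helps when checking expected output: join the seven values in account.ftl’s column order with ~ and end the line with a newline, writing an empty field for an empty element. AccountLineSketch and its Map input (element name to text) are illustrative assumptions; the real template reads the DOM that DomModelCreator builds.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java mirror of account.ftl; not used by Smooks.
final class AccountLineSketch {

    // column order taken from account.ftl
    static final List<String> COLUMNS = Arrays.asList("First_Name", "Last_Name", "ShippingStreet",
            "ShippingCity", "ShippingState", "ShippingPostalCode", "Member_Tier__c");

    static String toLine(Map<String, String> record) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < COLUMNS.size(); i++) {
            if (i > 0) {
                line.append('~');
            }
            String value = record.get(COLUMNS.get(i));
            line.append(value == null ? "" : value); // missing or empty element -> empty field
        }
        return line.append('\n').toString(); // like the template, each record ends with a newline
    }
}
```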
An additional complexity was the CSV’s header and footer. Apart from being structured differently from the rest of the records, the header had to contain the current date and the footer the total record count. For the header, I bound the current date to Smooks’s bean context from my Java code, under the bean ID now, and passed that execution context to filterSource(…):
package org.ossandme;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Date;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.milyn.Smooks;
import org.milyn.container.ExecutionContext;
public class XmlToCsvTransformer {
    public InputStream transform(final InputStream inputStream) throws Exception {
        // create a Smooks instance for transforming XML to CSV
        final Smooks smooks = new Smooks(getClass().getResourceAsStream("/xml-to-csv.xml"));
        // create an InputStream to be read by the FTP client library
        PipedInputStream pipedInputStream = new PipedInputStream();
        // create an OutputStream for Smooks to write the CSV to
        final PipedOutputStream pipedOutputStream = new PipedOutputStream(pipedInputStream);
        final ExecutionContext executionContext = smooks.createExecutionContext();
        // bind the current date to Smooks's bean context
        executionContext.getBeanContext().addBean("now", new Date());
        // smooks.filterSource(...) blocks so we carry out the transformation on a new thread
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    // transform XML read from the InputStream to CSV
                    smooks.filterSource(executionContext, new StreamSource(inputStream), new StreamResult(pipedOutputStream));
                } finally {
                    smooks.close();
                    // close the pipe so the FTP client sees the end of the stream
                    try {
                        pipedOutputStream.close();
                    } catch (IOException e) {
                        // ignore: the pipe is being torn down anyway
                    }
                }
            }
        }).start();
        // return the PipedInputStream to be read by the FTP client library
        return pipedInputStream;
    }
}
The date is then referenced, as now, from the FreeMarker template applied on #document in the Smooks config:
<?xml version='1.0' encoding='UTF-8'?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd">
    <resource-config selector="record">
        <resource>org.milyn.delivery.DomModelCreator</resource>
    </resource-config>
    <ftl:freemarker applyOnElement="#document">
        <ftl:template><!--000000Card Extract ${now?string('yyyyMMdd')}
<?TEMPLATE-SPLIT-PI?>--></ftl:template>
    </ftl:freemarker>
    <ftl:freemarker applyOnElement="record">
        <ftl:template>account.ftl</ftl:template>
    </ftl:freemarker>
</smooks-resource-list>
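With the above config, FreeMarker writes the header to the output stream (i.e., the PipedOutputStream) at the start of the XML stream:
000000Card Extract [current date]
<?TEMPLATE-SPLIT-PI?> is an embedded Smooks instruction that splits the template in two: the text before it is written when the #document element starts, and the text after it when the element ends. In between, account.ftl writes out each record element that follows the header.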
Adding the record count to the footer is just a matter of configuring the Calculator visitor to maintain a counter in the bean context and referencing that counter from the template:
<?xml version='1.0' encoding='UTF-8'?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd"
                      xmlns:calc="http://www.milyn.org/xsd/smooks/calc-1.1.xsd">
    <resource-config selector="record">
        <resource>org.milyn.delivery.DomModelCreator</resource>
    </resource-config>
    <calc:counter countOnElement="#document" beanId="totalRecordCount" start="0"/>
    <calc:counter countOnElement="record" beanId="totalRecordCount" start="1"/>
    <ftl:freemarker applyOnElement="#document">
        <ftl:template><!--000000Card Extract ${now?string('yyyyMMdd')}
<?TEMPLATE-SPLIT-PI?>999999${totalRecordCount?c?left_pad(6, '0')}--></ftl:template>
    </ftl:freemarker>
    <ftl:freemarker applyOnElement="record">
        <ftl:template>account.ftl</ftl:template>
    </ftl:freemarker>
</smooks-resource-list>
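The header and footer arithmetic is easy to check outside Smooks. This hypothetical helper, ExtractFrameSketch, reproduces the yyyyMMdd date in the header and the zero-padded six-digit count after the 999999 marker. Like FreeMarker’s left_pad, String.format pads but never truncates, so a count above 999999 would simply come out longer.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical sketch of the header and footer lines the #document template writes.
final class ExtractFrameSketch {

    private static final DateTimeFormatter YYYYMMDD = DateTimeFormatter.ofPattern("yyyyMMdd", Locale.ROOT);

    /** Header: fixed prefix followed by the date as yyyyMMdd. */
    static String header(LocalDate date) {
        return "000000Card Extract " + date.format(YYYYMMDD);
    }

    /** Footer: 999999 followed by the record count, left-padded with zeros to six digits. */
    static String footer(long recordCount) {
        // Locale.ROOT keeps ASCII digits and no grouping separators
        return String.format(Locale.ROOT, "999999%06d", recordCount);
    }
}
```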
Trial IV
The final challenge Smooks had to face was reading from a java.util.Iterator of maps and, as in the previous task, writing the transformed output to a stream in CSV format. Unlike the InputStreams of the other tasks, an iterator of maps has no Smooks reader capable of turning it into a properly structured XML document, so I was left to write my own:
package org.ossandme;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import org.apache.commons.lang.StringUtils;
import org.milyn.cdr.SmooksConfigurationException;
import org.milyn.container.ExecutionContext;
import org.milyn.delivery.java.JavaXMLReader;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.helpers.AttributesImpl;
public class MapIteratorSourceReader implements JavaXMLReader {
    // the stream writer: receives the SAX events this reader generates
    private ContentHandler contentHandler;
    // holds the iterator of maps
    private List<Object> sourceObjects;
    @Override
    public ContentHandler getContentHandler() {
        return contentHandler;
    }
    @Override
    public DTDHandler getDTDHandler() {
        return null;
    }
    @Override
    public EntityResolver getEntityResolver() {
        return null;
    }
    @Override
    public ErrorHandler getErrorHandler() {
        return null;
    }
    @Override
    public boolean getFeature(String arg0) throws SAXNotRecognizedException, SAXNotSupportedException {
        return false;
    }
    @Override
    public Object getProperty(String arg0) throws SAXNotRecognizedException, SAXNotSupportedException {
        return null;
    }
    // called by Smooks to perform transformation
    @Override
    @SuppressWarnings("unchecked")
    public void parse(InputSource inputSource) throws IOException, SAXException {
        // retrieve Iterator instance from sourceObjects; not the InputSource parameter
        Iterator<Map<String, String>> iterator = (Iterator<Map<String, String>>) sourceObjects.get(0);
        // write the start of the document
        contentHandler.startDocument();
        contentHandler.startElement(XMLConstants.NULL_NS_URI, "records", StringUtils.EMPTY, new AttributesImpl());
        // iterate through the maps
        while (iterator.hasNext()) {
            // write a 'record' start tag to the stream for each map
            contentHandler.startElement(XMLConstants.NULL_NS_URI, "record", StringUtils.EMPTY, new AttributesImpl());
            // get a map from the iterator
            Map<String, String> record = iterator.next();
            // iterate through the map entries
            for (Map.Entry<String, String> entry : record.entrySet()) {
                // write a start tag that is named after the entry key
                contentHandler.startElement(XMLConstants.NULL_NS_URI, entry.getKey(), StringUtils.EMPTY, new AttributesImpl());
                if (entry.getValue() != null) {
                    // set the element's text content to the entry value
                    contentHandler.characters(entry.getValue().toCharArray(), 0, entry.getValue().length());
                }
                // close the element that is mapped to an entry
                contentHandler.endElement(XMLConstants.NULL_NS_URI, entry.getKey(), StringUtils.EMPTY);
            }
            // close the 'record' element
            contentHandler.endElement(XMLConstants.NULL_NS_URI, "record", StringUtils.EMPTY);
        }
        // close the document
        contentHandler.endElement(XMLConstants.NULL_NS_URI, "records", StringUtils.EMPTY);
        contentHandler.endDocument();
    }
    @Override
    public void parse(String arg0) throws IOException, SAXException {
    }
    @Override
    public void setContentHandler(ContentHandler contentHandler) {
        this.contentHandler = contentHandler;
    }
    @Override
    public void setDTDHandler(DTDHandler arg0) {
    }
    @Override
    public void setEntityResolver(EntityResolver arg0) {
    }
    @Override
    public void setErrorHandler(ErrorHandler arg0) {
    }
    @Override
    public void setFeature(String arg0, boolean arg1) throws SAXNotRecognizedException, SAXNotSupportedException {
    }
    @Override
    public void setProperty(String arg0, Object arg1) throws SAXNotRecognizedException, SAXNotSupportedException {
    }
    @Override
    public void setExecutionContext(ExecutionContext executionContext) {
    }
    @Override
    public void setSourceObjects(List<Object> sourceObjects) throws SmooksConfigurationException {
        this.sourceObjects = sourceObjects;
    }
}
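To see the document MapIteratorSourceReader feeds Smooks, the same sequence of SAX calls can be driven into the JDK’s identity TransformerHandler and serialized to a string. The sketch below is my own illustration (MapIteratorXmlSketch is a made-up name), not part of the project: it passes each element name as both the local and the qualified name, since the JDK serializer relies on the qualified name, and it expects insertion-ordered maps such as LinkedHashMap so the element order is predictable.

```java
import java.io.StringWriter;
import java.util.Iterator;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch: replays the reader's SAX event sequence into an identity serializer.
final class MapIteratorXmlSketch {

    static String toXml(Iterator<Map<String, String>> records) {
        try {
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));
            handler.startDocument();
            start(handler, "records");
            while (records.hasNext()) {
                start(handler, "record");
                for (Map.Entry<String, String> entry : records.next().entrySet()) {
                    start(handler, entry.getKey());
                    if (entry.getValue() != null) {
                        handler.characters(entry.getValue().toCharArray(), 0, entry.getValue().length());
                    }
                    handler.endElement("", entry.getKey(), entry.getKey());
                }
                handler.endElement("", "record", "record");
            }
            handler.endElement("", "records", "records");
            handler.endDocument();
            return out.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void start(TransformerHandler handler, String name) throws SAXException {
        handler.startElement("", name, name, new AttributesImpl());
    }
}
```

For a single map holding First_Name=Juan and Member_Tier__c=Gold, the serialized document is a records element wrapping one record with those two children, which is the shape the record selectors in the config below expect.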
The custom reader is hooked into Smooks via the reader element:
<?xml version='1.0' encoding='UTF-8'?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
                      xmlns:ftl="http://www.milyn.org/xsd/smooks/freemarker-1.1.xsd">
    <reader class="org.ossandme.MapIteratorSourceReader"/>
    <resource-config selector="record">
        <resource>org.milyn.delivery.DomModelCreator</resource>
    </resource-config>
    <ftl:freemarker applyOnElement="record">
        <ftl:template>annual-census.ftl</ftl:template>
    </ftl:freemarker>
</smooks-resource-list>
Finally, passing the iterator to Smooks is a matter of wrapping it in a JavaSource and handing that to filterSource(…):
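package org.ossandme;
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Iterator;
import java.util.Map;
import javax.xml.transform.stream.StreamResult;
import org.milyn.Smooks;
import org.milyn.payload.JavaSource;
public class MapIteratorToCsvTransformer {
    public InputStream transform(final Iterator<Map<String, String>> mapIterator) throws Exception {
        PipedInputStream pipedInputStream = new PipedInputStream();
        // create a Smooks instance configured with the custom MapIteratorSourceReader
        final Smooks smooks = new Smooks(getClass().getResourceAsStream("/map-iterator-to-csv.xml"));
        final PipedOutputStream pipedOutputStream = new PipedOutputStream(pipedInputStream);
        // smooks.filterSource(...) blocks so we carry out the transformation on a new thread
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    // JavaSource hands the iterator to the reader through setSourceObjects(...)
                    smooks.filterSource(new JavaSource(mapIterator), new StreamResult(pipedOutputStream));
                } finally {
                    smooks.close();
                    // close the pipe so the reading side sees the end of the stream
                    try {
                        pipedOutputStream.close();
                    } catch (IOException e) {
                        // ignore: the pipe is being torn down anyway
                    }
                }
            }
        }).start();
        return pipedInputStream;
    }
}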