hqlopt.cpp 141 KB


  1. /*##############################################################################
  2. Copyright (C) 2011 HPCC Systems.
  3. All rights reserved. This program is free software: you can redistribute it and/or modify
  4. it under the terms of the GNU Affero General Public License as
  5. published by the Free Software Foundation, either version 3 of the
  6. License, or (at your option) any later version.
  7. This program is distributed in the hope that it will be useful,
  8. but WITHOUT ANY WARRANTY; without even the implied warranty of
  9. MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
  10. GNU Affero General Public License for more details.
  11. You should have received a copy of the GNU Affero General Public License
  12. along with this program. If not, see <http://www.gnu.org/licenses/>.
  13. ############################################################################## */
  14. #include "hqlopt.ipp"
  15. #include "hqlpmap.hpp"
  16. #include "jexcept.hpp"
  17. #include "jlog.hpp"
  18. #include "hqlutil.hpp"
  19. #include "hqlfold.hpp"
  20. #include "hqlthql.hpp"
  21. #include "hqlerrors.hpp"
  22. #include "hqlexpr.ipp" // Not needed, but without it I don't see the symbols in the debugger.
  23. #include "hqlattr.hpp"
  24. #include "hqlmeta.hpp"
  25. #define MIGRATE_JOIN_CONDITIONS // This works, but I doubt it is generally worth the effort. - maybe on a flag.
  26. //#define TRACE_USAGE
  27. /*
  28. Notes:
  29. * Need to carefully keep track of usage counts after the expression tree has been transformed, otherwise activities end up being duplicated.
  30. o The usage count of the current expression doesn't matter since it won't be referenced any more...
  31. o Any replacement nodes need to inherit the link count of the item they are replacing.
  32. o Link counts for new children need to be incremented (they may already exist so don't set to 1).
  33. o Link counts for children that are no longer used should be decremented. However since items are not
  34. combined if the children are shared they will no longer be referenced, so it won't be a disaster if
  35. that doesn't happen (note aggregate child stripping is an exception).
  36. o If removal of a node causes other child expressions to no longer be linked, the whole branch needs removing.
  37. (I don't think we currently have any examples).
  38. o I try and track new datasets created when projects are expanded.
  39. o Moving a filter over a project doesn't change the normalized inputs, so the selectorSequence doesn't need changing.
  40. Known issues:
  41. o The usage counts are done at a global level, whilst the transformations are dependent on the context. That means it might be possible
  42. to decrement a link count too many times, causing activities to appear unshared when in reality they are.
  43. o Sometimes the order the graph is traversed in produces a non optimal result. For instance filter2(filter1(project1(x)) and filter1(project2(x))
  44. would best be converted to project1(filter2([filter1(x)])) and project2[filter1(x)] where filter1(x) is shared. However it is just as likely to produce:
  45. project1(filter2,1(x)) and project2(filter1(x)) because the filters are also combined.
  46. o Similarly nodes can become unshared if
  47. i) an unshared node is optimized
  48. ii) a different (shared) node is then optimized to generate the same expression as the original.
  49. Because the second version is marked as shared it won't get transformed, but the first instance will have been.
  50. This has been worked around to a certain extent by moving some of the code into the null transformer.
  51. o Sharing between subqueries is too aggressive. This is worked around by reoptimizing the subqueries.
  52. o Constant folding can create new datasets with no associated usage. The code is now structured to allow the constant fold to
  53. be included, but I suspect it makes it too inefficient, and I don't know of any examples causing problems.
  54. */
  55. //---------------------------------------------------------------------------
  56. IHqlExpression * createFilterCondition(const HqlExprArray & conds)
  57. {
  58. if (conds.ordinality() == 0)
  59. return createConstant(true);
  60. OwnedITypeInfo boolType = makeBoolType();
  61. return createBalanced(no_and, boolType, conds);
  62. }
  63. bool optimizeFilterConditions(HqlExprArray & conds)
  64. {
  65. ForEachItemInRev(i, conds)
  66. {
  67. IHqlExpression & cur = conds.item(i);
  68. if (cur.isConstant())
  69. {
  70. OwnedHqlExpr folded = foldHqlExpression(&cur);
  71. IValue * value = folded->queryValue();
  72. if (value)
  73. {
  74. if (!value->getBoolValue())
  75. {
  76. conds.kill();
  77. conds.append(*folded.getClear());
  78. return true;
  79. }
  80. conds.remove(i);
  81. }
  82. }
  83. }
  84. return conds.ordinality() == 0;
  85. }
  86. //---------------------------------------------------------------------------
  87. ExpandMonitor::~ExpandMonitor()
  88. {
  89. if (!complex)
  90. {
  91. unsigned max = datasetsChanged.ordinality();
  92. for (unsigned i=0; i < max; i+= 2)
  93. {
  94. IHqlExpression & newValue = datasetsChanged.item(i);
  95. IHqlExpression & oldValue = datasetsChanged.item(i+1);
  96. if (newValue.queryBody() != oldValue.queryBody())// && oldValue->queryTransformExtra())
  97. optimizer.inheritUsage(&newValue, &oldValue);
  98. }
  99. }
  100. }
  101. IHqlExpression * ExpandMonitor::onExpandSelector()
  102. {
  103. //SELF.someField := LEFT
  104. complex = true;
  105. return NULL;
  106. }
  107. void ExpandMonitor::onDatasetChanged(IHqlExpression * newValue, IHqlExpression * oldValue)
  108. {
  109. //NB: Cannot call inheritUsage here because a different transform is in operation
  110. datasetsChanged.append(*LINK(newValue));
  111. datasetsChanged.append(*LINK(oldValue));
  112. }
  113. //MORE: This needs improving... especially caching. Probably stored in the expressions and used for filter scoring
  114. //(cardinality, cost, ...) - investigate some schemes + review hole implementation
  115. static bool isComplexExpansion(IHqlExpression * expr)
  116. {
  117. switch (expr->getOperator())
  118. {
  119. case no_select:
  120. {
  121. while (expr->getOperator() == no_select)
  122. {
  123. if (!expr->hasProperty(newAtom))
  124. return false;
  125. expr = expr->queryChild(0);
  126. }
  127. return true;
  128. }
  129. case NO_AGGREGATE:
  130. case no_call:
  131. case no_externalcall:
  132. case no_rowdiff:
  133. return true;
  134. case no_constant:
  135. return false;
  136. }
  137. ForEachChild(i, expr)
  138. if (isComplexExpansion(expr->queryChild(i)))
  139. return true;
  140. return false;
  141. }
  142. void ExpandComplexityMonitor::analyseTransform(IHqlExpression * transform)
  143. {
  144. ForEachChild(i, transform)
  145. {
  146. IHqlExpression * cur = transform->queryChild(i);
  147. switch (cur->getOperator())
  148. {
  149. case no_assignall:
  150. analyseTransform(cur);
  151. break;
  152. case no_assign:
  153. onExpand(cur->queryChild(0), cur->queryChild(1));
  154. break;
  155. case no_skip:
  156. if (isComplexExpansion(cur->queryChild(0)))
  157. complex = true;
  158. break;
  159. }
  160. if (complex)
  161. break;
  162. }
  163. }
  164. void ExpandComplexityMonitor::onExpand(IHqlExpression * select, IHqlExpression * newValue)
  165. {
  166. if (complex)
  167. return;
  168. if (select->isDataset())
  169. {
  170. switch (newValue->getOperator())
  171. {
  172. case no_null:
  173. case no_select:
  174. case no_getresult:
  175. case no_id2blob:
  176. //MORE: Should be a common list somewhere...
  177. break;
  178. default:
  179. complex = true;
  180. return;
  181. }
  182. }
  183. if (!newValue->isPure())
  184. complex = true;
  185. else if (isComplexExpansion(newValue))
  186. complex = true;
  187. }
  188. //---------------------------------------------------------------------------
  189. static HqlTransformerInfo cTreeOptimizerInfo("CTreeOptimizer");
  190. CTreeOptimizer::CTreeOptimizer(unsigned _options) : PARENT(cTreeOptimizerInfo)
  191. {
  192. options = _options;
  193. optimizeFlags |= TCOtransformNonActive;
  194. }
  195. IHqlExpression * CTreeOptimizer::extractFilterDs(HqlExprArray & conds, IHqlExpression * expr)
  196. {
  197. if (expr->getOperator() != no_filter || isShared(expr))
  198. return expr;
  199. IHqlExpression * ds = extractFilterDs(conds, expr->queryChild(0));
  200. unsigned max = expr->numChildren();
  201. for (unsigned i = 1; i < max; i++)
  202. {
  203. IHqlExpression * cur = queryRealChild(expr, i);
  204. if (cur)
  205. cur->unwindList(conds, no_and);
  206. }
  207. return ds;
  208. }
  209. inline IHqlExpression * makeChildList(IHqlExpression * expr)
  210. {
  211. IHqlExpression * exprList = NULL;
  212. unsigned num = expr->numChildren();
  213. for (unsigned i=1; i<num; i++)
  214. exprList = createComma(exprList, LINK(expr->queryChild(i)));
  215. return exprList;
  216. }
  217. IHqlExpression * CTreeOptimizer::removeChildNode(IHqlExpression * expr)
  218. {
  219. IHqlExpression * child = expr->queryChild(0);
  220. DBGLOG("Optimizer: Node %s remove child: %s", queryNode0Text(expr), queryNode1Text(child));
  221. noteUnused(child);
  222. return replaceChild(expr, child->queryChild(0));
  223. }
  224. IHqlExpression * CTreeOptimizer::removeParentNode(IHqlExpression * expr)
  225. {
  226. IHqlExpression * child = expr->queryChild(0);
  227. DBGLOG("Optimizer: Node %s remove self (now %s)", queryNode0Text(expr), queryNode1Text(child));
  228. // Need to dec link count of child because it is just about to inherited the link count from the parent
  229. decUsage(child);
  230. return LINK(child);
  231. }
  232. IHqlExpression * CTreeOptimizer::swapNodeWithChild(IHqlExpression * parent)
  233. {
  234. IHqlExpression * child = parent->queryChild(0);
  235. DBGLOG("Optimizer: Swap %s and %s", queryNode0Text(parent), queryNode1Text(child));
  236. OwnedHqlExpr newParent = swapDatasets(parent);
  237. //if this is the only reference to the child (almost certainly true) then no longer refd, so don't inc usage for child.
  238. noteUnused(child);
  239. if (!alreadyHasUsage(newParent))
  240. incUsage(newParent->queryChild(0));
  241. return newParent.getClear();
  242. }
  243. IHqlExpression * CTreeOptimizer::forceSwapNodeWithChild(IHqlExpression * parent)
  244. {
  245. OwnedHqlExpr swapped = swapNodeWithChild(parent);
  246. return replaceOwnedProperty(swapped, getNoHoistAttr());
  247. }
  248. IHqlExpression * CTreeOptimizer::getNoHoistAttr()
  249. {
  250. //Ensure the attribute is unique for each call to the optimizer - otherwise it stops items being hoisted that could be.
  251. if (!noHoistAttr)
  252. noHoistAttr.setown(createAttribute(_noHoist_Atom, createUniqueId()));
  253. return LINK(noHoistAttr);
  254. }
  255. IHqlExpression * CTreeOptimizer::swapNodeWithChild(IHqlExpression * parent, unsigned childIndex)
  256. {
  257. IHqlExpression * child = parent->queryChild(0);
  258. DBGLOG("Optimizer: Swap %s and %s", queryNode0Text(parent), queryNode1Text(child));
  259. OwnedHqlExpr newChild = replaceChildDataset(parent, child->queryChild(childIndex), 0);
  260. OwnedHqlExpr swapped = insertChildDataset(child, newChild, childIndex);
  261. if (!alreadyHasUsage(swapped))
  262. incUsage(newChild);
  263. noteUnused(child);
  264. return swapped.getClear();
  265. }
  266. IHqlExpression * CTreeOptimizer::swapIntoIf(IHqlExpression * expr, bool force)
  267. {
  268. IHqlExpression * child = expr->queryChild(0);
  269. //Can't optimize over a condition once a graph has been resourced, otherwise the activities aren't found.
  270. if (child->hasProperty(_resourced_Atom))
  271. return LINK(expr);
  272. IHqlExpression * body = expr->queryBody();
  273. IHqlExpression * cond = child->queryChild(0);
  274. IHqlExpression * left = child->queryChild(1);
  275. IHqlExpression * right = child->queryChild(2);
  276. OwnedHqlExpr newLeft = replaceChildDataset(body, left, 0);
  277. OwnedHqlExpr newRight = replaceChildDataset(body, right, 0);
  278. OwnedHqlExpr transformedLeft = transform(newLeft);
  279. OwnedHqlExpr transformedRight = transform(newRight);
  280. //Don't bother moving the condition over the if if it doesn't improve the code elsewhere
  281. if (force || (newLeft != transformedLeft) || (newRight != transformedRight))
  282. {
  283. //Need to call dec on all expressions that are no longer used... left and right still used by newLeft/newRight
  284. noteUnused(child);
  285. DBGLOG("Optimizer: Swap %s and %s", queryNode0Text(expr), queryNode1Text(child));
  286. HqlExprArray args;
  287. args.append(*LINK(cond));
  288. args.append(*LINK(transformedLeft));
  289. args.append(*LINK(transformedRight));
  290. OwnedHqlExpr ret = child->clone(args);
  291. if (!alreadyHasUsage(ret))
  292. {
  293. incUsage(transformedLeft);
  294. incUsage(transformedRight);
  295. }
  296. return ret.getClear();
  297. }
  298. return LINK(expr);
  299. }
  300. //NB: Similar logic to swapIntoIf()
  301. IHqlExpression * CTreeOptimizer::swapIntoAddFiles(IHqlExpression * expr, bool force)
  302. {
  303. IHqlExpression * child = expr->queryChild(0);
  304. IHqlExpression * body = expr->queryBody();
  305. bool changed = false;
  306. HqlExprArray replacedArgs;
  307. HqlExprArray transformedArgs;
  308. ForEachChild(idx, child)
  309. {
  310. IHqlExpression * in = child->queryChild(idx);
  311. if (in->isAttribute())
  312. {
  313. replacedArgs.append(*LINK(in));
  314. transformedArgs.append(*LINK(in));
  315. }
  316. else
  317. {
  318. IHqlExpression * next = replaceChild(body, in);
  319. replacedArgs.append(*next);
  320. //MORE: Will be linked too many times if changed and item already exists
  321. incUsage(next); //Link so values get correctly inherited if they are transformed.
  322. IHqlExpression * transformed = transform(next);
  323. transformedArgs.append(*transformed);
  324. if (transformed != next)
  325. changed = true;
  326. }
  327. }
  328. if (force || changed)
  329. {
  330. ForEachItemIn(i, replacedArgs)
  331. {
  332. if (&replacedArgs.item(i) != &transformedArgs.item(i))
  333. decUsage(&replacedArgs.item(i)); //If they are the same then inheritUsage wont't have been called, so don't decrement.
  334. }
  335. //Need to call dec on all expressions that are no longer used... grand children should not be decremented
  336. noteUnused(child);
  337. //And create the new funnel
  338. DBGLOG("Optimizer: Swap %s and %s", queryNode0Text(expr), queryNode1Text(child));
  339. return child->clone(transformedArgs);
  340. }
  341. //Note, replaced == args so no need to call decUsage on args
  342. ForEachItemIn(i, replacedArgs)
  343. {
  344. IHqlExpression & cur = replacedArgs.item(i);
  345. if (!cur.isAttribute())
  346. decUsage(&cur); //If they are the same then inheritUsage wont't have been called, so don't decrement.
  347. }
  348. return LINK(expr);
  349. }
  350. IHqlExpression * CTreeOptimizer::moveFilterOverSelect(IHqlExpression * expr)
  351. {
  352. IHqlExpression * select = expr->queryChild(0);
  353. if (!select->hasProperty(newAtom))
  354. return NULL;
  355. IHqlExpression * ds = select->queryChild(0);
  356. IHqlExpression * newScope = select->queryNormalizedSelector();
  357. HqlExprArray args, hoisted, notHoisted;
  358. HqlExprCopyArray inScope;
  359. unwindFilterConditions(args, expr);
  360. ForEachItemIn(i, args)
  361. {
  362. IHqlExpression & cur = args.item(i);
  363. inScope.kill();
  364. cur.gatherTablesUsed(NULL, &inScope);
  365. if (inScope.find(*newScope) == NotFound)
  366. hoisted.append(OLINK(cur));
  367. else
  368. notHoisted.append(OLINK(cur));
  369. }
  370. if (hoisted.ordinality() == 0)
  371. return NULL;
  372. DBGLOG("Optimizer: Move filter over select (%d/%d)", hoisted.ordinality(), args.ordinality());
  373. //Create a filtered dataset
  374. IHqlExpression * inDs = LINK(ds);
  375. if (inDs->isDatarow())
  376. inDs = createDatasetFromRow(inDs);
  377. hoisted.add(*inDs, 0);
  378. OwnedHqlExpr newDs = expr->clone(hoisted);
  379. //Now a select on that
  380. args.kill();
  381. unwindChildren(args, select);
  382. args.replace(*LINK(newDs), 0);
  383. OwnedHqlExpr newSelect = select->clone(args);
  384. if (!alreadyHasUsage(newSelect))
  385. incUsage(newDs);
  386. if (notHoisted.ordinality())
  387. {
  388. notHoisted.add(*LINK(select), 0);
  389. OwnedHqlExpr unhoistedFilter = expr->clone(notHoisted);
  390. OwnedHqlExpr ret = replaceChild(unhoistedFilter, newSelect);
  391. if (!alreadyHasUsage(ret))
  392. incUsage(newSelect);
  393. return ret.getClear();
  394. }
  395. return newSelect.getClear();
  396. }
  397. IHqlExpression * CTreeOptimizer::optimizeAggregateUnsharedDataset(IHqlExpression * expr, bool isSimpleCount)
  398. {
  399. if (isShared(expr) || (getNumChildTables(expr) != 1))
  400. return LINK(expr);
  401. //Don't include any operations which rely on the order/distribution:
  402. bool childIsSimpleCount = isSimpleCount;
  403. node_operator op = expr->getOperator();
  404. IHqlExpression * ds = expr->queryChild(0);
  405. switch (op)
  406. {
  407. case no_filter:
  408. case no_aggregate:
  409. childIsSimpleCount = false;
  410. break;
  411. case no_hqlproject:
  412. case no_newusertable:
  413. case no_newaggregate:
  414. case no_sort:
  415. case no_distribute:
  416. case no_keyeddistribute:
  417. case no_fetch:
  418. case no_transformebcdic:
  419. case no_transformascii:
  420. if (childIsSimpleCount && !isPureActivity(expr))
  421. childIsSimpleCount = false;
  422. break;
  423. case no_compound_indexread:
  424. case no_compound_diskread:
  425. break;
  426. case no_limit:
  427. if (expr->hasProperty(onFailAtom))
  428. return LINK(expr);
  429. //fall through
  430. case no_choosen:
  431. case no_topn:
  432. if (isSimpleCount)
  433. break;
  434. return LINK(expr);
  435. default:
  436. return LINK(expr);
  437. }
  438. OwnedHqlExpr optimizedDs = optimizeAggregateUnsharedDataset(ds, childIsSimpleCount);
  439. //Remove items that are really inefficient and unnecessary, but don't for the moment remove projects or anything that changes the
  440. //record structure.
  441. switch (op)
  442. {
  443. case no_sort:
  444. case no_distribute:
  445. case no_keyeddistribute:
  446. noteUnused(expr);
  447. return optimizedDs.getClear();
  448. case no_topn:
  449. {
  450. assertex(isSimpleCount);
  451. noteUnused(expr);
  452. OwnedHqlExpr ret = createDataset(no_choosen, optimizedDs.getClear(), LINK(expr->queryChild(2)));
  453. incUsage(ret);
  454. return expr->cloneAllAnnotations(ret);
  455. }
  456. case no_hqlproject:
  457. case no_newusertable:
  458. if (isSimpleCount && (options & HOOinsidecompound))
  459. {
  460. if (expr->hasProperty(_countProject_Atom) || expr->hasProperty(prefetchAtom))
  461. break;
  462. if (isPureActivity(expr) && !isAggregateDataset(expr))
  463. {
  464. noteUnused(expr);
  465. return optimizedDs.getClear();
  466. }
  467. }
  468. break;
  469. }
  470. if (ds == optimizedDs)
  471. return LINK(expr);
  472. OwnedHqlExpr replaced = replaceChild(expr, optimizedDs);
  473. incUsage(replaced);
  474. noteUnused(expr);
  475. return replaced.getClear();
  476. }
  477. IHqlExpression * CTreeOptimizer::optimizeAggregateDataset(IHqlExpression * transformed)
  478. {
  479. HqlExprArray children;
  480. unwindChildren(children, transformed);
  481. IHqlExpression * root = &children.item(0);
  482. HqlExprAttr ds = root;
  483. IHqlExpression * wrapper = NULL;
  484. node_operator aggOp = transformed->getOperator();
  485. bool insideShared = false;
  486. bool isScalarAggregate = (aggOp != no_newaggregate) && (aggOp != no_aggregate);
  487. bool isSimpleCount = isSimpleCountExistsAggregate(transformed, false, true);
  488. loop
  489. {
  490. node_operator dsOp = ds->getOperator();
  491. IHqlExpression * next = NULL;
  492. switch (dsOp)
  493. {
  494. case no_hqlproject:
  495. case no_newusertable:
  496. if (ds->hasProperty(prefetchAtom))
  497. break;
  498. //Don't remove projects for the moment because they can make counts of disk reads much less
  499. //efficient. Delete the following lines once we have a count-diskread activity
  500. if (!isScalarAggregate && !(options & (HOOcompoundproject|HOOinsidecompound)) && !ds->hasProperty(_countProject_Atom) )
  501. break;
  502. if (isPureActivity(ds) && !isAggregateDataset(ds))
  503. {
  504. OwnedMapper mapper = getMapper(ds);
  505. ExpandSelectorMonitor expandMonitor(*this);
  506. HqlExprArray newChildren;
  507. unsigned num = children.ordinality();
  508. LinkedHqlExpr oldDs = ds;
  509. LinkedHqlExpr newDs = ds->queryChild(0);
  510. if (transformed->getOperator() == no_aggregate)
  511. {
  512. oldDs.setown(createSelector(no_left, ds, querySelSeq(transformed)));
  513. newDs.setown(createSelector(no_left, newDs, querySelSeq(transformed)));
  514. }
  515. for (unsigned idx = 1; idx < num; idx++)
  516. {
  517. OwnedHqlExpr mapped = expandFields(mapper, &children.item(idx), oldDs, newDs, &expandMonitor);
  518. if (containsCounter(mapped))
  519. expandMonitor.setComplex();
  520. newChildren.append(*mapped.getClear());
  521. }
  522. if (!expandMonitor.isComplex())
  523. {
  524. for (unsigned idx = 1; idx < num; idx++)
  525. children.replace(OLINK(newChildren.item(idx-1)), idx);
  526. next = ds->queryChild(0);
  527. }
  528. }
  529. break;
  530. case no_fetch:
  531. if (ds->queryChild(3)->isPure())
  532. next = ds->queryChild(1);
  533. break;
  534. case no_group:
  535. if (isScalarAggregate)
  536. next = ds->queryChild(0);
  537. break;
  538. case no_sort:
  539. case no_sorted:
  540. //MORE: Allowed if the transform is commutative for no_aggregate
  541. if (aggOp != no_aggregate)
  542. next = ds->queryChild(0);
  543. break;
  544. case no_distribute:
  545. case no_distributed:
  546. case no_keyeddistribute:
  547. case no_preservemeta:
  548. if (isScalarAggregate || !isGrouped(ds->queryChild(0)))
  549. next = ds->queryChild(0);
  550. break;
  551. case no_preload:
  552. wrapper = ds;
  553. next = ds->queryChild(0);
  554. break;
  555. }
  556. if (!next)
  557. break;
  558. if (!insideShared)
  559. {
  560. insideShared = isShared(ds);
  561. noteUnused(ds);
  562. }
  563. ds.set(next);
  564. }
  565. //Not completely sure about usageCounting being maintained correctly
  566. if (!insideShared)
  567. {
  568. OwnedHqlExpr newDs = (aggOp != no_aggregate) ? optimizeAggregateUnsharedDataset(ds, isSimpleCount) : LINK(ds);
  569. if (newDs != ds)
  570. {
  571. HqlMapTransformer mapper;
  572. mapper.setMapping(ds, newDs);
  573. mapper.setSelectorMapping(ds, newDs);
  574. ForEachItemIn(i, children)
  575. children.replace(*mapper.transformRoot(&children.item(i)), i);
  576. ds.set(newDs);
  577. }
  578. }
  579. if (ds == root)
  580. return LINK(transformed);
  581. if (wrapper)
  582. {
  583. if (ds == root->queryChild(0))
  584. {
  585. incUsage(root);
  586. return LINK(transformed);
  587. }
  588. }
  589. //A different node is now shared between the graphs
  590. if (insideShared)
  591. incUsage(ds);
  592. if (wrapper)
  593. {
  594. HqlExprArray args;
  595. args.append(*ds.getClear());
  596. unwindChildren(args, wrapper, 1);
  597. ds.setown(wrapper->clone(args));
  598. incUsage(ds);
  599. }
  600. DBGLOG("Optimizer: Aggregate replace %s with %s", queryNode0Text(root), queryNode1Text(ds));
  601. children.replace(*ds.getClear(), 0);
  602. return transformed->clone(children);
  603. }
  604. IHqlExpression * CTreeOptimizer::optimizeDatasetIf(IHqlExpression * transformed)
  605. {
  606. //if(cond, ds(filt1), ds(filt2)) => ds(if(cond,filt1,filt2))
  607. HqlExprArray leftFilter, rightFilter;
  608. IHqlExpression * left = extractFilterDs(leftFilter, transformed->queryChild(1));
  609. IHqlExpression * right = extractFilterDs(rightFilter, transformed->queryChild(2));
  610. if (left->queryBody() == right->queryBody())
  611. {
  612. HqlExprArray args;
  613. args.append(*LINK(left));
  614. // intersectConditions(args, leftFilter, rightFilter);
  615. OwnedHqlExpr leftCond = createFilterCondition(leftFilter);
  616. OwnedHqlExpr rightCond = createFilterCondition(rightFilter);
  617. if (leftCond == rightCond)
  618. {
  619. args.append(*leftCond.getClear());
  620. }
  621. else
  622. {
  623. IHqlExpression * cond = transformed->queryChild(0);
  624. args.append(*createValue(no_if, cond->getType(), LINK(cond), leftCond.getClear(), rightCond.getClear()));
  625. }
  626. OwnedHqlExpr ret = createDataset(no_filter, args);
  627. DBGLOG("Optimizer: Convert %s to a filter", queryNode0Text(transformed));
  628. //NOTE: left and right never walk over any shared nodes, so don't need to decrement usage for
  629. //child(1), child(2) or intermediate nodes to left/right, since not referenced any more.
  630. noteUnused(right); // dataset is now used one less time
  631. return transformed->cloneAllAnnotations(ret);
  632. }
  633. return LINK(transformed);
  634. }
  635. IHqlExpression * CTreeOptimizer::optimizeIf(IHqlExpression * expr)
  636. {
  637. IHqlExpression * trueExpr = expr->queryChild(1);
  638. IHqlExpression * falseExpr = expr->queryChild(2);
  639. if (!falseExpr)
  640. return NULL;
  641. if (trueExpr->queryBody() == falseExpr->queryBody())
  642. {
  643. noteUnused(trueExpr); // inherit usage() will increase the usage again
  644. noteUnused(falseExpr);
  645. return LINK(trueExpr);
  646. }
  647. IHqlExpression * cond = expr->queryChild(0);
  648. IValue * condValue = cond->queryValue();
  649. if (condValue)
  650. {
  651. if (condValue->getBoolValue())
  652. {
  653. recursiveDecUsage(falseExpr);
  654. decUsage(trueExpr); // inherit usage() will increase the usage again
  655. return LINK(trueExpr);
  656. }
  657. else
  658. {
  659. recursiveDecUsage(trueExpr);
  660. decUsage(falseExpr); // inherit usage() will increase the usage again
  661. return LINK(falseExpr);
  662. }
  663. }
  664. //Usage counts aren't handled correctly for datarows, so only optimize datasets, otherwise it can get bigger.
  665. if (!expr->isDataset())
  666. return NULL;
  667. //if(c1, if(c2, x, y), z) y==z => if(c1 && c2, x, z)
  668. //if(c1, if(c2, x, y), z) x==z => if(c1 && !c2, y, z)
  669. //if(c1, z, if(c2, x, y)) x==z => if(c1 || c2, z, y)
  670. //if(c1, z, if(c2, x, y)) y==z => if(c1 || !c2, z, x)
  671. //Only do these changes if c2 has no additional dependencies than c1
  672. HqlExprArray args;
  673. if ((trueExpr->getOperator() == no_if) && !isShared(trueExpr))
  674. {
  675. IHqlExpression * childCond = trueExpr->queryChild(0);
  676. if (introducesNewDependencies(cond, childCond))
  677. return NULL;
  678. IHqlExpression * childTrue = trueExpr->queryChild(1);
  679. IHqlExpression * childFalse = trueExpr->queryChild(2);
  680. if (falseExpr->queryBody() == childFalse->queryBody())
  681. {
  682. args.append(*createBoolExpr(no_and, LINK(cond), LINK(childCond)));
  683. args.append(*LINK(childTrue));
  684. args.append(*LINK(falseExpr));
  685. }
  686. else if (falseExpr->queryBody() == childTrue->queryBody())
  687. {
  688. args.append(*createBoolExpr(no_and, LINK(cond), getInverse(childCond)));
  689. args.append(*LINK(childFalse));
  690. args.append(*LINK(falseExpr));
  691. }
  692. if (args.ordinality())
  693. {
  694. DBGLOG("Optimizer: Merge %s and %s", queryNode0Text(expr), queryNode1Text(trueExpr));
  695. noteUnused(falseExpr);
  696. }
  697. }
  698. if (args.empty() && (falseExpr->getOperator() == no_if) && !isShared(falseExpr))
  699. {
  700. IHqlExpression * childCond = falseExpr->queryChild(0);
  701. if (introducesNewDependencies(cond, childCond))
  702. return NULL;
  703. IHqlExpression * childTrue = falseExpr->queryChild(1);
  704. IHqlExpression * childFalse = falseExpr->queryChild(2);
  705. if (trueExpr->queryBody() == childTrue->queryBody())
  706. {
  707. args.append(*createBoolExpr(no_or, LINK(cond), LINK(childCond)));
  708. args.append(*LINK(trueExpr));
  709. args.append(*LINK(childFalse));
  710. }
  711. else if (trueExpr->queryBody() == childFalse->queryBody())
  712. {
  713. args.append(*createBoolExpr(no_or, LINK(cond), getInverse(childCond)));
  714. args.append(*LINK(trueExpr));
  715. args.append(*LINK(childTrue));
  716. }
  717. if (args.ordinality())
  718. {
  719. DBGLOG("Optimizer: Merge %s and %s", queryNode0Text(expr), queryNode1Text(falseExpr));
  720. noteUnused(trueExpr);
  721. }
  722. }
  723. if (args.ordinality())
  724. return expr->clone(args);
  725. return NULL;
  726. }
  727. bool CTreeOptimizer::expandFilterCondition(HqlExprArray & expanded, HqlExprArray & unexpanded, IHqlExpression * filter, bool moveOver, bool onlyKeyed)
  728. {
  729. HqlExprArray conds;
  730. unwindFilterConditions(conds, filter);
  731. IHqlExpression * child = filter->queryChild(0);
  732. IHqlExpression * grandchild = child->queryChild(0);
  733. OwnedMapper mapper = getMapper(child);
  734. ForEachItemIn(i, conds)
  735. {
  736. IHqlExpression * cur = &conds.item(i);
  737. bool isKeyed = containsAssertKeyed(cur);
  738. if (!onlyKeyed || isKeyed || (options & HOOfiltersharedproject) )
  739. {
  740. ExpandComplexityMonitor expandMonitor(*this);
  741. OwnedHqlExpr expandedFilter;
  742. if (moveOver)
  743. expandedFilter.setown(expandFields(mapper, cur, child, grandchild, &expandMonitor));
  744. else
  745. expandedFilter.setown(mapper->expandFields(cur, child, grandchild, grandchild, &expandMonitor));
  746. if (expandedFilter->isConstant())
  747. {
  748. expandedFilter.setown(foldHqlExpression(expandedFilter));
  749. IValue * value = expandedFilter->queryValue();
  750. if (value && !value->getBoolValue())
  751. {
  752. if (onlyKeyed)
  753. DBGLOG("Optimizer: Merging filter over shared project always false");
  754. expanded.kill();
  755. expanded.append(*LINK(expandedFilter));
  756. return true;
  757. }
  758. }
  759. if ((!onlyKeyed || isKeyed) && !expandMonitor.isComplex())
  760. expanded.append(*LINK(expandedFilter));
  761. else
  762. unexpanded.append(*LINK(cur));
  763. }
  764. else
  765. unexpanded.append(*LINK(cur));
  766. }
  767. return expanded.ordinality() != 0;
  768. }
  769. IHqlExpression * CTreeOptimizer::hoistMetaOverProject(IHqlExpression * expr)
  770. {
  771. IHqlExpression * child = expr->queryChild(0);
  772. if (hasUnknownTransform(child))
  773. return NULL;
  774. IHqlExpression * grandchild = child->queryChild(0);
  775. IHqlExpression * active = queryActiveTableSelector();
  776. try
  777. {
  778. OwnedMapper mapper = getMapper(child);
  779. HqlExprArray args;
  780. args.append(*LINK(grandchild));
  781. ForEachChildFrom(i, expr, 1)
  782. {
  783. IHqlExpression * cur = expr->queryChild(i);
  784. args.append(*expandFields(mapper, cur, active, active, NULL));
  785. }
  786. OwnedHqlExpr newPreserve = expr->clone(args);
  787. OwnedHqlExpr newProject = replaceChild(child, newPreserve);
  788. decUsage(child);
  789. if (!alreadyHasUsage(newProject))
  790. incUsage(newPreserve);
  791. return newProject.getClear();
  792. }
  793. catch (IException * e)
  794. {
  795. //Can possibly occur if the field has been optimized away. (see bug #76896)
  796. e->Release();
  797. return NULL;
  798. }
  799. }
  800. IHqlExpression * CTreeOptimizer::hoistFilterOverProject(IHqlExpression * transformed, bool onlyKeyed)
  801. {
  802. IHqlExpression * child = transformed->queryChild(0);
  803. //Should be able to move filters over count projects, as long as not filtering on the count fields.
  804. //Would need to add a containsCounter() test in the expandFields code - cannot just test filterExpr
  805. //because counter may be there (e.g., countindex3.hql)
  806. if (child->hasProperty(_countProject_Atom) || child->hasProperty(prefetchAtom) || isAggregateDataset(child))
  807. return NULL;
  808. if (hasUnknownTransform(child))
  809. return NULL;
  810. HqlExprArray expanded, unexpanded;
  811. if (expandFilterCondition(expanded, unexpanded, transformed, true, onlyKeyed))
  812. {
  813. if (optimizeFilterConditions(expanded))
  814. return getOptimizedFilter(transformed, expanded);
  815. OwnedHqlExpr filterExpr = createFilterCondition(expanded);
  816. if (unexpanded.ordinality())
  817. DBGLOG("Optimizer: Move %d/%d filters over %s", expanded.ordinality(), expanded.ordinality()+unexpanded.ordinality(), queryNode1Text(child));
  818. else
  819. DBGLOG("Optimizer: Swap %s and %s", queryNode0Text(transformed), queryNode1Text(child));
  820. IHqlExpression * newGrandchild = child->queryChild(0);
  821. OwnedHqlExpr newFilter = createDataset(no_filter, LINK(newGrandchild), LINK(filterExpr));
  822. newFilter.setown(transformed->cloneAllAnnotations(newFilter));
  823. OwnedHqlExpr ret = replaceChild(child, newFilter);
  824. if (!alreadyHasUsage(ret))
  825. incUsage(newFilter);
  826. noteUnused(child);
  827. if (unexpanded.ordinality() == 0)
  828. return ret.getClear();
  829. unexpanded.add(*LINK(child), 0);
  830. OwnedHqlExpr unhoistedFilter = transformed->clone(unexpanded);
  831. OwnedHqlExpr newUnhoistedFilter = replaceChild(unhoistedFilter, ret);
  832. if (!alreadyHasUsage(newUnhoistedFilter))
  833. incUsage(ret);
  834. return newUnhoistedFilter.getClear();
  835. }
  836. return NULL;
  837. }
  838. IHqlExpression * CTreeOptimizer::getHoistedFilter(IHqlExpression * transformed, bool canHoistLeft, bool canMergeLeft, bool canHoistRight, bool canMergeRight, unsigned conditionIndex)
  839. {
  840. HqlExprArray conds;
  841. unwindFilterConditions(conds, transformed);
  842. IHqlExpression * child = transformed->queryChild(0);
  843. IHqlExpression * left = child->queryChild(0);
  844. IHqlExpression * right = queryJoinRhs(child);
  845. IHqlExpression * seq = querySelSeq(child);
  846. OwnedHqlExpr leftSelector = createSelector(no_left, left, seq);
  847. OwnedHqlExpr rightSelector = createSelector(no_right, right, seq);
  848. OwnedHqlExpr activeLeft = ensureActiveRow(left);
  849. OwnedHqlExpr activeRight = ensureActiveRow(right);
  850. OwnedMapper mapper = getMapper(child);
  851. HqlExprArray expanded, unexpanded, leftFilters, rightFilters;;
  852. ForEachItemIn(i, conds)
  853. {
  854. ExpandComplexityMonitor expandMonitor(*this);
  855. IHqlExpression * cur = &conds.item(i);
  856. OwnedHqlExpr expandedFilter = mapper->expandFields(cur, child, NULL, NULL, &expandMonitor);
  857. bool matched = false;
  858. if (expandedFilter->isConstant())
  859. {
  860. expandedFilter.setown(foldHqlExpression(expandedFilter));
  861. IValue * value = expandedFilter->queryValue();
  862. if (value)
  863. {
  864. if (!value->getBoolValue())
  865. return getOptimizedFilter(transformed, false);
  866. else
  867. matched = true;
  868. }
  869. }
  870. if (!matched && !expandMonitor.isComplex())
  871. {
  872. OwnedHqlExpr leftMappedFilter = replaceSelector(expandedFilter, leftSelector, activeLeft);
  873. OwnedHqlExpr rightMappedFilter = replaceSelector(expandedFilter, rightSelector, activeRight);
  874. //MORE: Could also take join conditions into account to sent filter up both sides;
  875. if (rightMappedFilter==expandedFilter)
  876. {
  877. //Only contains LEFT.
  878. if (canHoistLeft)
  879. {
  880. leftFilters.append(*LINK(leftMappedFilter));
  881. matched = true;
  882. }
  883. else if (canMergeLeft && (conditionIndex != NotFound))
  884. {
  885. expanded.append(*LINK(expandedFilter));
  886. matched = true;
  887. }
  888. //If the filter expression is invariant of left and right then hoist up both paths.
  889. if (leftMappedFilter==expandedFilter && canHoistRight)
  890. {
  891. rightFilters.append(*LINK(expandedFilter));
  892. matched = true;
  893. }
  894. }
  895. else if (leftMappedFilter==expandedFilter)
  896. {
  897. //Only contains RIGHT.
  898. if (canHoistRight)
  899. {
  900. rightFilters.append(*LINK(rightMappedFilter));
  901. matched = true;
  902. }
  903. else if (canMergeRight && (conditionIndex != NotFound))
  904. {
  905. expanded.append(*LINK(expandedFilter));
  906. matched = true;
  907. }
  908. }
  909. else if (canMergeLeft && canMergeRight && conditionIndex != NotFound)
  910. {
  911. expanded.append(*LINK(expandedFilter));
  912. matched = true;
  913. }
  914. }
  915. if (!matched)
  916. unexpanded.append(*LINK(cur));
  917. }
  918. if (leftFilters.ordinality() || rightFilters.ordinality() || expanded.ordinality())
  919. {
  920. LinkedHqlExpr ret = child;
  921. //first insert filters on the left/right branches
  922. if (leftFilters.ordinality())
  923. ret.setown(createHoistedFilter(ret, leftFilters, 0, conds.ordinality()));
  924. if (rightFilters.ordinality())
  925. ret.setown(createHoistedFilter(ret, rightFilters, 1, conds.ordinality()));
  926. //extend the join condition where appropriate
  927. if (expanded.ordinality())
  928. {
  929. DBGLOG("Optimizer: Merge filters(%d/%d) into %s condition", expanded.ordinality(), conds.ordinality(), queryNode1Text(child));
  930. OwnedITypeInfo boolType = makeBoolType();
  931. HqlExprArray args;
  932. unwindChildren(args, ret);
  933. expanded.add(OLINK(args.item(conditionIndex)), 0);
  934. args.replace(*createBalanced(no_and, boolType, expanded), conditionIndex);
  935. ret.setown(ret->clone(args));
  936. }
  937. if (ret != child)
  938. noteUnused(child);
  939. //Now add the item that couldn't be hoisted.
  940. if (unexpanded.ordinality())
  941. {
  942. if (ret != child)
  943. incUsage(ret);
  944. unexpanded.add(*LINK(child), 0);
  945. OwnedHqlExpr unhoistedFilter = transformed->clone(unexpanded);
  946. ret.setown(replaceChild(unhoistedFilter, ret));
  947. }
  948. return ret.getClear();
  949. }
  950. else if (unexpanded.ordinality() == 0)
  951. //All filters expanded to true => remove the filter
  952. return getOptimizedFilter(transformed, true) ;
  953. return NULL;
  954. }
  955. IHqlExpression * CTreeOptimizer::createHoistedFilter(IHqlExpression * expr, HqlExprArray & conditions, unsigned childIndex, unsigned maxConditions)
  956. {
  957. IHqlExpression * grand = expr->queryChild(childIndex);
  958. DBGLOG("Optimizer: Hoisting filter(%d/%d) over %s.%d", conditions.ordinality(), maxConditions, queryNode0Text(expr), childIndex);
  959. conditions.add(*LINK(grand), 0);
  960. OwnedHqlExpr hoistedFilter = createDataset(no_filter, conditions);
  961. OwnedHqlExpr ret = insertChildDataset(expr, hoistedFilter, childIndex);
  962. if (!alreadyHasUsage(ret))
  963. incUsage(hoistedFilter);
  964. return ret.getClear();
  965. }
  966. IHqlExpression * CTreeOptimizer::queryPromotedFilter(IHqlExpression * expr, node_operator side, unsigned childIndex)
  967. {
  968. IHqlExpression * child = expr->queryChild(0);
  969. IHqlExpression * grand = child->queryChild(childIndex);
  970. OwnedMapper mapper = getMapper(child);
  971. HqlExprArray conds;
  972. unwindFilterConditions(conds, expr);
  973. HqlExprArray hoisted, unhoisted;
  974. OwnedHqlExpr mapParent = createSelector(side, grand, querySelSeq(child));
  975. ForEachItemIn(i1, conds)
  976. {
  977. IHqlExpression & cur = conds.item(i1);
  978. bool ok = false;
  979. OwnedHqlExpr collapsed = mapper->collapseFields(&cur, child, grand, mapParent, &ok);
  980. if (ok)
  981. hoisted.append(*collapsed.getClear());
  982. else
  983. unhoisted.append(OLINK(cur));
  984. }
  985. if (hoisted.ordinality() == 0)
  986. return NULL;
  987. DBGLOG("Optimizer: Hoisting filter(%d/%d) over %s", hoisted.ordinality(), hoisted.ordinality()+unhoisted.ordinality(), queryNode0Text(child));
  988. OwnedHqlExpr newChild = createHoistedFilter(child, hoisted, childIndex, conds.ordinality());
  989. noteUnused(child);
  990. if (unhoisted.ordinality() == 0)
  991. return newChild.getLink();
  992. unhoisted.add(*LINK(child), 0);
  993. OwnedHqlExpr unhoistedFilter = createDataset(no_filter, unhoisted);
  994. OwnedHqlExpr newUnhoistedFilter = replaceChild(unhoistedFilter, newChild);
  995. if (!alreadyHasUsage(newUnhoistedFilter))
  996. incUsage(newChild);
  997. return newUnhoistedFilter.getClear();
  998. }
  999. bool CTreeOptimizer::extractSingleFieldTempTable(IHqlExpression * expr, OwnedHqlExpr & retField, OwnedHqlExpr & retValues)
  1000. {
  1001. IHqlExpression * record = expr->queryRecord();
  1002. IHqlExpression * field = NULL;
  1003. ForEachChild(i, record)
  1004. {
  1005. IHqlExpression * cur = record->queryChild(i);
  1006. switch (cur->getOperator())
  1007. {
  1008. case no_record:
  1009. case no_ifblock:
  1010. return false;
  1011. case no_field:
  1012. if (cur->queryRecord() || field)
  1013. return false;
  1014. field = cur;
  1015. break;
  1016. }
  1017. }
  1018. if (!field)
  1019. return false;
  1020. OwnedHqlExpr values = normalizeListCasts(expr->queryChild(0));
  1021. switch (values->getOperator())
  1022. {
  1023. case no_null:
  1024. break;
  1025. case no_recordlist:
  1026. {
  1027. HqlExprArray args;
  1028. ITypeInfo * fieldType = field->queryType();
  1029. ForEachChild(i, values)
  1030. {
  1031. IHqlExpression * cur = values->queryChild(i);
  1032. if (cur->getOperator() != no_rowvalue)
  1033. return false;
  1034. args.append(*ensureExprType(cur->queryChild(0), fieldType));
  1035. }
  1036. values.setown(createValue(no_list, makeSetType(LINK(fieldType)), args));
  1037. }
  1038. break;
  1039. default:
  1040. if (values->queryType()->getTypeCode() != type_set)
  1041. return false;
  1042. break;
  1043. }
  1044. retField.set(field);
  1045. retValues.setown(values.getClear());
  1046. return true;
  1047. }
  1048. IHqlExpression * mapJoinConditionToFilter(IHqlExpression * expr, IHqlExpression * search, IHqlExpression * replace)
  1049. {
  1050. switch (expr->getOperator())
  1051. {
  1052. case no_and:
  1053. case no_or:
  1054. {
  1055. HqlExprArray args;
  1056. ForEachChild(i, expr)
  1057. {
  1058. IHqlExpression * mapped = mapJoinConditionToFilter(expr->queryChild(i), search, replace);
  1059. if (!mapped)
  1060. return NULL;
  1061. args.append(*mapped);
  1062. }
  1063. return expr->clone(args);
  1064. }
  1065. case no_eq:
  1066. {
  1067. IHqlExpression * l = expr->queryChild(0);
  1068. IHqlExpression * r = expr->queryChild(1);
  1069. if (l == search)
  1070. return createValue(no_in, makeBoolType(), LINK(r), LINK(replace));
  1071. if (r == search)
  1072. return createValue(no_in, makeBoolType(), LINK(l), LINK(replace));
  1073. break;
  1074. }
  1075. }
  1076. OwnedHqlExpr temp = replaceExpression(expr, search, replace);
  1077. if (temp != expr)
  1078. return NULL;
  1079. return LINK(expr);
  1080. }
  1081. /*
  1082. Convert join(inline-dataset, x, condition, transform, ...) to
  1083. project(x(condition'), t')
  1084. */
  1085. IHqlExpression * CTreeOptimizer::optimizeInlineJoin(IHqlExpression * expr)
  1086. {
  1087. //This doesn't really work because the input dataset could contain duplicates, which would generate duplicate
  1088. //values for the keyed join, but not for the index read.
  1089. //I could spot a dedup(ds, all) and then allow it, but it's a bit messy.
  1090. return NULL;
  1091. if (!isSimpleInnerJoin(expr) || expr->hasProperty(keyedAtom))
  1092. return NULL;
  1093. //Probably probably keep the following...
  1094. if (expr->hasProperty(allAtom) || expr->hasProperty(_lightweight_Atom) || expr->hasProperty(lookupAtom) ||
  1095. expr->hasProperty(hashAtom))
  1096. return NULL;
  1097. if (expr->hasProperty(localAtom) || expr->hasProperty(atmostAtom) || expr->hasProperty(onFailAtom))
  1098. return NULL;
  1099. IHqlExpression * key = expr->queryChild(1);
  1100. switch (key->getOperator())
  1101. {
  1102. case no_newkeyindex:
  1103. //more - e.g., inline child query stuff
  1104. break;
  1105. default:
  1106. //probably always more efficient.
  1107. break;
  1108. return false;
  1109. }
  1110. IHqlExpression * tempTable = expr->queryChild(0);
  1111. if (tempTable->getOperator() != no_temptable)
  1112. return NULL;
  1113. OwnedHqlExpr field, values;
  1114. if (!extractSingleFieldTempTable(tempTable, field, values))
  1115. return NULL;
  1116. IHqlExpression * joinSeq = querySelSeq(expr);
  1117. OwnedHqlExpr newSeq = createSelectorSequence();
  1118. OwnedHqlExpr left = createSelector(no_left, tempTable, joinSeq);
  1119. OwnedHqlExpr right = createSelector(no_right, key, joinSeq);
  1120. OwnedHqlExpr rightAsLeft = createSelector(no_left, key, newSeq);
  1121. OwnedHqlExpr selectLeft = createSelectExpr(LINK(left), LINK(field));
  1122. OwnedHqlExpr activeDs = ensureActiveRow(key);
  1123. //Transform can't refer to left hand side.
  1124. IHqlExpression * transform = expr->queryChild(3);
  1125. OwnedHqlExpr mapped = replaceExpression(transform, left, right);
  1126. if (mapped != transform)
  1127. return NULL;
  1128. OwnedHqlExpr cond = replaceSelector(expr->queryChild(2), right, activeDs);
  1129. OwnedHqlExpr mappedCond = mapJoinConditionToFilter(cond, selectLeft, values);
  1130. if (!mappedCond)
  1131. return NULL;
  1132. OwnedHqlExpr replacement = createDataset(no_filter, LINK(key), mappedCond.getClear());
  1133. OwnedHqlExpr newTransform = replaceExpression(transform, right, rightAsLeft);
  1134. replacement.setown(createDataset(no_hqlproject, replacement.getClear(), createComma(newTransform.getClear(), LINK(newSeq))));
  1135. return replacement.getClear();
  1136. }
  1137. IHqlExpression * splitJoinFilter(IHqlExpression * expr, HqlExprArray * leftOnly, HqlExprArray * rightOnly)
  1138. {
  1139. node_operator op = expr->getOperator();
  1140. switch (op)
  1141. {
  1142. case no_assertkeyed:
  1143. case no_and:
  1144. {
  1145. HqlExprArray args;
  1146. ForEachChild(i, expr)
  1147. {
  1148. IHqlExpression * next = splitJoinFilter(expr->queryChild(i), leftOnly, rightOnly);
  1149. if (next)
  1150. args.append(*next);
  1151. }
  1152. unsigned numRealArgs = args.ordinality() - numAttributes(args);
  1153. if (numRealArgs == 0)
  1154. return NULL;
  1155. if ((numRealArgs == 1) && (op == no_and))
  1156. return LINK(&args.item(0));
  1157. return cloneOrLink(expr, args);
  1158. }
  1159. }
  1160. HqlExprCopyArray scopeUsed;
  1161. expr->gatherTablesUsed(NULL, &scopeUsed);
  1162. if (scopeUsed.ordinality() == 1)
  1163. {
  1164. node_operator scopeOp = scopeUsed.item(0).getOperator();
  1165. if (leftOnly && scopeOp == no_left)
  1166. {
  1167. leftOnly->append(*LINK(expr));
  1168. return NULL;
  1169. }
  1170. if (rightOnly && scopeOp == no_right)
  1171. {
  1172. rightOnly->append(*LINK(expr));
  1173. return NULL;
  1174. }
  1175. }
  1176. return LINK(expr);
  1177. }
  1178. IHqlExpression * CTreeOptimizer::optimizeJoinCondition(IHqlExpression * expr)
  1179. {
  1180. //Look at the join condition and move any conditions just on left/right further up the tree
  1181. //can help after other constant folding....
  1182. if (!isSimpleInnerJoin(expr) || expr->hasProperty(keyedAtom) || expr->hasProperty(atmostAtom))
  1183. return NULL;
  1184. IHqlExpression * cond = expr->queryChild(2);
  1185. IHqlExpression * seq = querySelSeq(expr);
  1186. HqlExprArray leftOnly, rightOnly;
  1187. OwnedHqlExpr newCond = splitJoinFilter(cond, &leftOnly, isKeyedJoin(expr) ? (HqlExprArray *)NULL : &rightOnly);
  1188. if ((leftOnly.ordinality() == 0) && (rightOnly.ordinality() == 0))
  1189. return NULL;
  1190. HqlExprArray args;
  1191. unwindChildren(args, expr);
  1192. if (leftOnly.ordinality())
  1193. {
  1194. DBGLOG("Optimizer: Hoist %d LEFT conditions out of %s", leftOnly.ordinality(), queryNode0Text(expr));
  1195. IHqlExpression * lhs = expr->queryChild(0);
  1196. OwnedHqlExpr left = createSelector(no_left, lhs, seq);
  1197. OwnedHqlExpr leftFilter = createFilterCondition(leftOnly);
  1198. OwnedHqlExpr newFilter = replaceSelector(leftFilter, left, lhs->queryNormalizedSelector());
  1199. args.replace(*createDataset(no_filter, LINK(lhs), LINK(newFilter)), 0);
  1200. incUsage(&args.item(0));
  1201. }
  1202. if (rightOnly.ordinality())
  1203. {
  1204. DBGLOG("Optimizer: Hoist %d RIGHT conditions out of %s", rightOnly.ordinality(), queryNode0Text(expr));
  1205. IHqlExpression * rhs = expr->queryChild(1);
  1206. OwnedHqlExpr right = createSelector(no_right, rhs, seq);
  1207. OwnedHqlExpr rightFilter = createFilterCondition(rightOnly);
  1208. OwnedHqlExpr newFilter = replaceSelector(rightFilter, right, rhs->queryNormalizedSelector());
  1209. args.replace(*createDataset(no_filter, LINK(rhs), LINK(newFilter)), 1);
  1210. incUsage(&args.item(1));
  1211. }
  1212. if (!newCond)
  1213. newCond.setown(createConstant(true));
  1214. if (!queryProperty(_conditionFolded_Atom, args))
  1215. args.append(*createAttribute(_conditionFolded_Atom));
  1216. args.replace(*newCond.getClear(), 2);
  1217. return expr->clone(args);
  1218. }
  1219. //DISTRIBUTE(DEDUP(ds, x, y, all), hash(trim(x)))
  1220. //It is likely that the following would be better since it removes one distribute:
  1221. //DEDUP(DISTRIBUTE(ds, hash(trim(x))), x, y, all, LOCAL)
  1222. IHqlExpression * CTreeOptimizer::optimizeDistributeDedup(IHqlExpression * expr)
  1223. {
  1224. IHqlExpression * child = expr->queryChild(0);
  1225. if (!child->hasProperty(allAtom) || child->hasProperty(localAtom) || isGrouped(child))
  1226. return NULL;
  1227. DedupInfoExtractor info(child);
  1228. if (info.equalities.ordinality() == 0)
  1229. return NULL;
  1230. IHqlExpression * dist = expr->queryChild(1);
  1231. if (!matchDedupDistribution(dist, info.equalities))
  1232. return NULL;
  1233. DBGLOG("Optimizer: Swap %s and %s", queryNode0Text(expr), queryNode1Text(child));
  1234. OwnedHqlExpr distn;
  1235. if (expr->hasProperty(manyAtom))
  1236. {
  1237. //DEDUP(DISTRIBUTE(DEDUP(ds, x, y, all, local), hash(trim(x))), x, y, all, LOCAL)
  1238. HqlExprArray localDedupArgs;
  1239. unwindChildren(localDedupArgs, child);
  1240. localDedupArgs.append(*createLocalAttribute());
  1241. localDedupArgs.append(*createAttribute(hashAtom));
  1242. OwnedHqlExpr localDedup = child->clone(localDedupArgs);
  1243. distn.setown(replaceChildDataset(expr, localDedup, 0));
  1244. }
  1245. else
  1246. {
  1247. //DEDUP(DISTRIBUTE(ds, hash(trim(x))), x, y, all, LOCAL)
  1248. distn.setown(replaceChildDataset(expr, child->queryChild(0), 0));
  1249. }
  1250. HqlExprArray args;
  1251. args.append(*LINK(distn));
  1252. unwindChildren(args, child, 1);
  1253. args.append(*createLocalAttribute());
  1254. //We would have generated a global hash dedup, so adding hash to the local dedup makes sense.
  1255. args.append(*createAttribute(hashAtom));
  1256. OwnedHqlExpr ret = child->clone(args);
  1257. if (!alreadyHasUsage(ret))
  1258. incUsage(distn);
  1259. return ret.getClear();
  1260. }
  1261. IHqlExpression * CTreeOptimizer::optimizeProjectInlineTable(IHqlExpression * transformed, bool childrenAreShared)
  1262. {
  1263. IHqlExpression * child = transformed->queryChild(0);
  1264. IHqlExpression * values = child->queryChild(0);
  1265. //MORE If trivial projection then might be worth merging with multiple items, but unlikely to occur in practice
  1266. if (!isPureInlineDataset(child) || transformed->hasProperty(prefetchAtom))
  1267. return NULL;
  1268. bool onlyFoldConstant = false;
  1269. if (values->numChildren() != 1)
  1270. {
  1271. if (options & HOOfoldconstantdatasets)
  1272. {
  1273. if (!isConstantDataset(child))
  1274. return NULL;
  1275. onlyFoldConstant = true;
  1276. }
  1277. else
  1278. return NULL;
  1279. }
  1280. if (childrenAreShared)
  1281. {
  1282. if (!isConstantDataset(child))
  1283. return NULL;
  1284. }
  1285. IHqlExpression * transformedCountProject = transformed->queryProperty(_countProject_Atom);
  1286. IHqlExpression * seq = querySelSeq(transformed);
  1287. node_operator projectOp = transformed->getOperator();
  1288. OwnedHqlExpr oldSelector = (projectOp == no_hqlproject) ? createSelector(no_left, child, seq) : LINK(child->queryNormalizedSelector());
  1289. IHqlExpression * curTransform = queryNewColumnProvider(transformed);
  1290. if (!isKnownTransform(curTransform))
  1291. return NULL;
  1292. ExpandSelectorMonitor monitor(*this);
  1293. HqlExprArray newValues;
  1294. ForEachChild(i, values)
  1295. {
  1296. TableProjectMapper mapper;
  1297. mapper.setMapping(values->queryChild(i), NULL);
  1298. OwnedHqlExpr next = expandFields(&mapper, curTransform, oldSelector, NULL, &monitor);
  1299. //Expand counter inline!
  1300. if (transformedCountProject)
  1301. {
  1302. OwnedHqlExpr counter = createConstant(createIntValue(i+1, 8, false));
  1303. next.setown(replaceExpression(next, transformedCountProject->queryChild(0), counter));
  1304. }
  1305. if (!next || monitor.isComplex())
  1306. return NULL;
  1307. if (onlyFoldConstant && !isConstantTransform(next))
  1308. return NULL;
  1309. newValues.append(*ensureTransformType(next, no_transform));
  1310. }
  1311. DBGLOG("Optimizer: Merge %s and %s", queryNode0Text(transformed), queryNode1Text(child));
  1312. HqlExprArray args;
  1313. args.append(*createValue(no_transformlist, makeNullType(), newValues));
  1314. if (projectOp == no_newusertable)
  1315. args.append(*LINK(transformed->queryChild(1)));
  1316. else
  1317. args.append(*LINK(transformed->queryRecord()));
  1318. unwindChildren(args, child, 2);
  1319. noteUnused(child);
  1320. OwnedHqlExpr ret = child->clone(args);
  1321. return transformed->cloneAllAnnotations(ret);
  1322. }
  1323. void CTreeOptimizer::analyseExpr(IHqlExpression * expr)
  1324. {
  1325. if (incUsage(expr))
  1326. return;
  1327. switch (expr->getOperator())
  1328. {
  1329. case no_filepos:
  1330. case no_file_logicalname:
  1331. case no_sizeof:
  1332. case no_offsetof:
  1333. return;
  1334. case no_table:
  1335. //only look at the filename - not the parent files.
  1336. analyseExpr(expr->queryChild(0));
  1337. return;
  1338. }
  1339. PARENT::analyseExpr(expr);
  1340. }
  1341. bool CTreeOptimizer::noteUnused(IHqlExpression * expr)
  1342. {
  1343. // return false;
  1344. return decUsage(expr);
  1345. }
  1346. bool CTreeOptimizer::decUsage(IHqlExpression * expr)
  1347. {
  1348. OptTransformInfo * extra = queryBodyExtra(expr);
  1349. #ifdef TRACE_USAGE
  1350. if (expr->isDataset() || expr->isDatarow())
  1351. DBGLOG("%lx dec %d [%s]", (unsigned)expr, extra->useCount, queryNode0Text(expr));
  1352. #endif
  1353. if (extra->useCount)
  1354. return extra->useCount-- == 1;
  1355. return false;
  1356. }
  1357. bool CTreeOptimizer::alreadyHasUsage(IHqlExpression * expr)
  1358. {
  1359. OptTransformInfo * extra = queryBodyExtra(expr);
  1360. return (extra->useCount != 0);
  1361. }
  1362. bool CTreeOptimizer::incUsage(IHqlExpression * expr)
  1363. {
  1364. OptTransformInfo * extra = queryBodyExtra(expr);
  1365. #ifdef TRACE_USAGE
  1366. if (expr->isDataset() || expr->isDatarow())
  1367. DBGLOG("%lx inc %d [%s]", (unsigned)expr, extra->useCount, queryNode0Text(expr));
  1368. #endif
  1369. return (extra->useCount++ != 0);
  1370. }
  1371. IHqlExpression * CTreeOptimizer::inheritUsage(IHqlExpression * newExpr, IHqlExpression * oldExpr)
  1372. {
  1373. OptTransformInfo * newExtra = queryBodyExtra(newExpr);
  1374. OptTransformInfo * oldExtra = queryBodyExtra(oldExpr);
  1375. #ifdef TRACE_USAGE
  1376. if (newExpr->isDataset() || newExpr->isDatarow())
  1377. DBGLOG("%lx inherit %d,%d (from %lx) [%s]", (unsigned)newExpr, newExtra->useCount, oldExtra->useCount, (unsigned)oldExpr, queryNode0Text(newExpr));
  1378. //assertex(extra->useCount);
  1379. if ((oldExtra->useCount == 0) && (newExpr->isDataset() || newExpr->isDatarow()))
  1380. DBGLOG("Inherit0: %lx inherit %d,%d (from %lx)", (unsigned)newExpr, newExtra->useCount, oldExtra->useCount, (unsigned)oldExpr);
  1381. #endif
  1382. newExtra->useCount += oldExtra->useCount;
  1383. return newExpr;
  1384. }
  1385. bool CTreeOptimizer::isComplexTransform(IHqlExpression * transform)
  1386. {
  1387. ExpandComplexityMonitor monitor(*this);
  1388. monitor.analyseTransform(transform);
  1389. return monitor.isComplex();
  1390. }
  1391. IHqlExpression * CTreeOptimizer::expandProjectedDataset(IHqlExpression * child, IHqlExpression * transform, IHqlExpression * childSelector, IHqlExpression * expr)
  1392. {
  1393. if (hasUnknownTransform(child))
  1394. return NULL;
  1395. OwnedMapper mapper = getMapper(child);
  1396. ExpandSelectorMonitor monitor(*this);
  1397. OwnedHqlExpr expandedTransform = expandFields(mapper, transform, childSelector, NULL, &monitor);
  1398. IHqlExpression * onFail = child->queryProperty(onFailAtom);
  1399. OwnedHqlExpr newOnFail;
  1400. if (onFail)
  1401. {
  1402. IHqlExpression * oldFailTransform = onFail->queryChild(0);
  1403. OwnedMapper onFailMapper = createProjectMapper(oldFailTransform, NULL);
  1404. OwnedHqlExpr onFailTransform = expandFields(onFailMapper, transform, childSelector, NULL, &monitor);
  1405. if (onFailTransform)
  1406. newOnFail.setown(createExprAttribute(onFailAtom, ensureTransformType(onFailTransform, oldFailTransform->getOperator())));
  1407. }
  1408. if (expandedTransform && (!onFail || newOnFail) && !monitor.isComplex())
  1409. {
  1410. unsigned transformIndex = queryTransformIndex(child);
  1411. IHqlExpression * oldTransform = child->queryChild(transformIndex);
  1412. expandedTransform.setown(ensureTransformType(expandedTransform, oldTransform->getOperator()));
  1413. DBGLOG("Optimizer: Merge %s and %s", queryNode0Text(expr), queryNode1Text(child));
  1414. HqlExprArray args;
  1415. unwindChildren(args, child);
  1416. args.replace(*expandedTransform.getClear(), transformIndex);
  1417. if (onFail)
  1418. args.replace(*newOnFail.getClear(), args.find(*onFail));
  1419. noteUnused(child);
  1420. return child->clone(args);
  1421. }
  1422. return NULL;
  1423. }
  1424. IHqlExpression * CTreeOptimizer::optimizeAggregateCompound(IHqlExpression * transformed)
  1425. {
  1426. //Keep in sync with code in CompoundSourceTransformer
  1427. IHqlExpression * child = transformed->queryChild(0);
  1428. if (isLimitedDataset(child, true))
  1429. return NULL;
  1430. IHqlExpression * tableExpr = queryRoot(transformed);
  1431. node_operator modeOp = queryTableMode(tableExpr);
  1432. if (modeOp == no_csv || modeOp == no_xml)
  1433. return NULL;
  1434. if (isLimitedDataset(child) && !isSimpleCountExistsAggregate(transformed, true, false))
  1435. return NULL;
  1436. node_operator newOp = no_none;
  1437. node_operator childOp = child->getOperator();
  1438. if (queryRealChild(transformed, 3))
  1439. {
  1440. //Grouped aggregate
  1441. switch (childOp)
  1442. {
  1443. case no_compound_diskread:
  1444. case no_compound_disknormalize:
  1445. newOp = no_compound_diskgroupaggregate;
  1446. break;
  1447. case no_compound_indexread:
  1448. case no_compound_indexnormalize:
  1449. newOp = no_compound_indexgroupaggregate;
  1450. break;
  1451. case no_compound_childread:
  1452. case no_compound_childnormalize:
  1453. newOp = no_compound_childgroupaggregate;
  1454. break;
  1455. }
  1456. }
  1457. else
  1458. {
  1459. switch (childOp)
  1460. {
  1461. case no_compound_diskread:
  1462. case no_compound_disknormalize:
  1463. newOp = no_compound_diskaggregate;
  1464. break;
  1465. case no_compound_indexread:
  1466. case no_compound_indexnormalize:
  1467. newOp = no_compound_indexaggregate;
  1468. break;
  1469. case no_compound_childread:
  1470. case no_compound_childnormalize:
  1471. newOp = no_compound_childaggregate;
  1472. break;
  1473. case no_compound_inline:
  1474. newOp = no_compound_inline;
  1475. break;
  1476. }
  1477. }
  1478. if (newOp)
  1479. return createDataset(newOp, removeChildNode(transformed));
  1480. return NULL;
  1481. }
  1482. bool CTreeOptimizer::childrenAreShared(IHqlExpression * expr)
  1483. {
  1484. if (expr->isDataset() || expr->isDatarow())
  1485. {
  1486. switch (getChildDatasetType(expr))
  1487. {
  1488. case childdataset_none:
  1489. return false;
  1490. case childdataset_dataset:
  1491. case childdataset_datasetleft:
  1492. case childdataset_left:
  1493. case childdataset_same_left_right:
  1494. case childdataset_top_left_right:
  1495. case childdataset_dataset_noscope:
  1496. {
  1497. IHqlExpression * ds = expr->queryChild(0);
  1498. //Don't restrict the items that can be combined with no_null.
  1499. return isShared(ds);
  1500. }
  1501. case childdataset_leftright:
  1502. return isShared(expr->queryChild(0)) || isShared(expr->queryChild(1));
  1503. case childdataset_evaluate:
  1504. case childdataset_if:
  1505. case childdataset_case:
  1506. case childdataset_map:
  1507. case childdataset_nway_left_right:
  1508. return true; // stop any folding of these...
  1509. case childdataset_addfiles:
  1510. case childdataset_merge:
  1511. {
  1512. ForEachChild(i, expr)
  1513. {
  1514. IHqlExpression * cur = expr->queryChild(i);
  1515. if (!cur->isAttribute() && isShared(cur))
  1516. return true;
  1517. }
  1518. return false;
  1519. }
  1520. default:
  1521. UNIMPLEMENTED;
  1522. }
  1523. }
  1524. switch (expr->getOperator())
  1525. {
  1526. case no_select:
  1527. if (!expr->hasProperty(newAtom))
  1528. return false;
  1529. return isShared(expr->queryChild(0));
  1530. case NO_AGGREGATE:
  1531. return isShared(expr->queryChild(0));
  1532. }
  1533. return false;
  1534. }
  1535. bool CTreeOptimizer::isWorthMovingProjectOverLimit(IHqlExpression * project)
  1536. {
  1537. if (noHoistAttr && project->queryProperty(_noHoist_Atom) == noHoistAttr)
  1538. return false;
  1539. IHqlExpression * expr = project->queryChild(0);
  1540. loop
  1541. {
  1542. switch (expr->getOperator())
  1543. {
  1544. case no_limit:
  1545. case no_keyedlimit:
  1546. case no_choosen:
  1547. expr = expr->queryChild(0);
  1548. break;
  1549. case no_compound_diskread:
  1550. case no_compound_disknormalize:
  1551. case no_compound_indexread:
  1552. case no_compound_indexnormalize:
  1553. case no_compound_childread:
  1554. case no_compound_childnormalize:
  1555. case no_compound_selectnew:
  1556. case no_compound_inline:
  1557. //if (options & HOOcompoundproject)
  1558. return true;
  1559. case no_join:
  1560. if (isKeyedJoin(expr))
  1561. return false;
  1562. case no_selfjoin:
  1563. case no_fetch:
  1564. case no_normalize:
  1565. case no_newparse:
  1566. case no_newxmlparse:
  1567. return true;
  1568. case no_null:
  1569. return true;
  1570. case no_newusertable:
  1571. if (isAggregateDataset(expr))
  1572. return false;
  1573. //fallthrough.
  1574. case no_hqlproject:
  1575. if (!isPureActivity(expr) || expr->hasProperty(_countProject_Atom) || expr->hasProperty(prefetchAtom))
  1576. return false;
  1577. return true;
  1578. default:
  1579. return false;
  1580. }
  1581. if (isShared(expr))
  1582. return false;
  1583. }
  1584. }
  1585. IHqlExpression * CTreeOptimizer::moveProjectionOverSimple(IHqlExpression * transformed, bool noMoveIfFail, bool errorIfFail)
  1586. {
  1587. IHqlExpression * child = transformed->queryChild(0);
  1588. IHqlExpression * grandchild = child->queryChild(0);
  1589. IHqlExpression * newProject = replaceChild(transformed, grandchild);
  1590. HqlExprArray args;
  1591. args.append(*newProject);
  1592. OwnedMapper mapper = getMapper(transformed);
  1593. ForEachChild(idx, child)
  1594. {
  1595. if (idx != 0)
  1596. {
  1597. bool ok = false;
  1598. IHqlExpression * cur = child->queryChild(idx);
  1599. IHqlExpression * collapsed = mapper->collapseFields(cur, grandchild, newProject, &ok);
  1600. if (!ok)
  1601. {
  1602. ::Release(collapsed);
  1603. if (errorIfFail)
  1604. {
  1605. StringBuffer cause;
  1606. if (cur->getOperator() == no_sortlist)
  1607. {
  1608. ForEachChild(i, cur)
  1609. {
  1610. IHqlExpression * elem = cur->queryChild(i);
  1611. OwnedHqlExpr collapsed = mapper->collapseFields(elem, grandchild, newProject, &ok);
  1612. if (!ok)
  1613. {
  1614. cause.append(" expression: ");
  1615. getExprECL(elem, cause);
  1616. break;
  1617. }
  1618. }
  1619. }
  1620. throwError1(HQLERR_BadProjectOfStepping, cause.str());
  1621. }
  1622. if (noMoveIfFail)
  1623. return LINK(transformed);
  1624. //NB: Always succeed for distributed/sorted/grouped, because it is needed for the disk read/index read processing.
  1625. if (cur->getOperator() == no_sortlist)
  1626. collapsed = createValue(no_sortlist, makeSortListType(NULL), createAttribute(unknownAtom));
  1627. else
  1628. collapsed = createAttribute(unknownAtom);
  1629. }
  1630. args.append(*collapsed);
  1631. }
  1632. }
  1633. DBGLOG("Optimizer: Swap %s and %s", queryNode0Text(transformed), queryNode1Text(child));
  1634. OwnedHqlExpr swapped = child->clone(args);
  1635. if (!alreadyHasUsage(swapped))
  1636. incUsage(newProject);
  1637. noteUnused(child);
  1638. return swapped.getClear();
  1639. }
  1640. IHqlExpression * CTreeOptimizer::moveProjectionOverLimit(IHqlExpression * transformed)
  1641. {
  1642. IHqlExpression * child = transformed->queryChild(0);
  1643. IHqlExpression * grandchild = child->queryChild(0);
  1644. IHqlExpression * newProject = replaceChild(transformed, grandchild);
  1645. HqlExprArray args;
  1646. args.append(*newProject);
  1647. ExpandSelectorMonitor monitor(*this);
  1648. ForEachChildFrom(idx, child, 1)
  1649. {
  1650. IHqlExpression * cur = child->queryChild(idx);
  1651. if (cur->isAttribute() && cur->queryName() == onFailAtom)
  1652. {
  1653. IHqlExpression * oldFailTransform = cur->queryChild(0);
  1654. if (!isKnownTransform(oldFailTransform))
  1655. return LINK(transformed);
  1656. OwnedMapper onFailMapper = createProjectMapper(oldFailTransform, NULL);
  1657. IHqlExpression * projectionTransformer = queryNewColumnProvider(transformed);
  1658. OwnedHqlExpr parentSelector = getParentDatasetSelector(transformed);
  1659. OwnedHqlExpr onFailTransform = expandFields(onFailMapper, projectionTransformer, parentSelector, NULL, &monitor);
  1660. args.append(*createExprAttribute(onFailAtom, ensureTransformType(onFailTransform, oldFailTransform->getOperator())));
  1661. }
  1662. else
  1663. args.append(*LINK(cur));
  1664. }
  1665. if (monitor.isComplex())
  1666. return LINK(transformed);
  1667. DBGLOG("Optimizer: Swap %s and %s", queryNode0Text(transformed), queryNode1Text(child));
  1668. OwnedHqlExpr swapped = child->clone(args);
  1669. if (!alreadyHasUsage(swapped))
  1670. incUsage(newProject);
  1671. noteUnused(child);
  1672. return swapped.getClear();
  1673. }
  1674. IHqlExpression * CTreeOptimizer::insertChild(IHqlExpression * expr, IHqlExpression * newChild)
  1675. {
  1676. return insertChildDataset(expr, newChild, 0);
  1677. }
  1678. IHqlExpression * CTreeOptimizer::replaceChild(IHqlExpression * expr, IHqlExpression * newChild)
  1679. {
  1680. return replaceChildDataset(expr, newChild, 0);
  1681. }
  1682. void CTreeOptimizer::unwindReplaceChild(HqlExprArray & args, IHqlExpression * expr, IHqlExpression * newChild)
  1683. {
  1684. HqlMapTransformer mapper;
  1685. mapper.setMapping(expr->queryChild(0), newChild);
  1686. mapper.setSelectorMapping(expr->queryChild(0), newChild);
  1687. ForEachChild(idx, expr)
  1688. args.append(*mapper.transformRoot(expr->queryChild(idx)));
  1689. }
  1690. ANewTransformInfo * CTreeOptimizer::createTransformInfo(IHqlExpression * expr)
  1691. {
  1692. return CREATE_NEWTRANSFORMINFO(OptTransformInfo, expr);
  1693. }
  1694. IHqlExpression * CTreeOptimizer::expandFields(TableProjectMapper * mapper, IHqlExpression * expr, IHqlExpression * oldDataset, IHqlExpression * newDataset, IExpandCallback * _expandCallback)
  1695. {
  1696. OwnedHqlExpr expandedFilter = mapper->expandFields(expr, oldDataset, newDataset, _expandCallback);
  1697. if (options & HOOfold)
  1698. expandedFilter.setown(foldHqlExpression(expandedFilter));
  1699. return expandedFilter.getClear();
  1700. }
  1701. IHqlExpression * CTreeOptimizer::inheritSkips(IHqlExpression * newTransform, IHqlExpression * oldTransform, IHqlExpression * oldSelector, IHqlExpression * newSelector)
  1702. {
  1703. HqlExprArray args;
  1704. ForEachChild(i, oldTransform)
  1705. {
  1706. IHqlExpression * cur = oldTransform->queryChild(i);
  1707. if (cur->getOperator() == no_skip)
  1708. args.append(*replaceSelector(cur, oldSelector, newSelector));
  1709. }
  1710. if (args.ordinality() == 0)
  1711. return LINK(newTransform);
  1712. unwindChildren(args, newTransform);
  1713. return newTransform->clone(args);
  1714. }
  1715. IHqlExpression * CTreeOptimizer::createTransformed(IHqlExpression * expr)
  1716. {
  1717. node_operator op = expr->getOperator();
  1718. switch (op)
  1719. {
  1720. case no_field:
  1721. case no_record:
  1722. return LINK(expr);
  1723. }
  1724. //Do this first, so that any references to a child dataset that changes are correctly updated, before proceeding any further.
  1725. OwnedHqlExpr dft = defaultCreateTransformed(expr);
  1726. #ifndef USE_MERGING_TRANSFORM
  1727. updateOrphanedSelectors(dft, expr);
  1728. #endif
  1729. OwnedHqlExpr ret = doCreateTransformed(dft, expr);
  1730. if (ret->queryBody() == expr->queryBody())
  1731. return ret.getClear();
  1732. inheritUsage(ret, expr);
  1733. if (ret == dft)
  1734. return ret.getClear();
  1735. return transform(ret);
  1736. }
  1737. IHqlExpression * CTreeOptimizer::getOptimizedFilter(IHqlExpression * transformed, bool alwaysTrue)
  1738. {
  1739. if (alwaysTrue)
  1740. return removeParentNode(transformed);
  1741. else
  1742. {
  1743. noteUnused(transformed->queryChild(0));
  1744. //MORE: Really wants to walk down the entire chain until we hit something that is shared.
  1745. IHqlExpression * ret = createNullDataset(transformed);
  1746. DBGLOG("Optimizer: Replace %s with %s", queryNode0Text(transformed), queryNode1Text(ret));
  1747. return ret;
  1748. }
  1749. }
  1750. IHqlExpression * CTreeOptimizer::getOptimizedFilter(IHqlExpression * transformed, HqlExprArray const & filters)
  1751. {
  1752. return getOptimizedFilter(transformed, filters.ordinality() == 0);
  1753. }
  1754. void CTreeOptimizer::recursiveDecUsage(IHqlExpression * expr)
  1755. {
  1756. if (decUsage(expr))
  1757. recursiveDecChildUsage(expr);
  1758. }
  1759. void CTreeOptimizer::recursiveDecChildUsage(IHqlExpression * expr)
  1760. {
  1761. switch (getChildDatasetType(expr))
  1762. {
  1763. case childdataset_none:
  1764. break;
  1765. case childdataset_dataset:
  1766. case childdataset_datasetleft:
  1767. case childdataset_left:
  1768. case childdataset_same_left_right:
  1769. case childdataset_top_left_right:
  1770. case childdataset_dataset_noscope:
  1771. recursiveDecUsage(expr->queryChild(0));
  1772. break;
  1773. case childdataset_leftright:
  1774. recursiveDecUsage(expr->queryChild(0));
  1775. recursiveDecUsage(expr->queryChild(0));
  1776. break;
  1777. case childdataset_if:
  1778. recursiveDecUsage(expr->queryChild(1));
  1779. if (expr->queryChild(2))
  1780. recursiveDecUsage(expr->queryChild(2));
  1781. break;
  1782. case childdataset_evaluate:
  1783. case childdataset_case:
  1784. case childdataset_map:
  1785. case childdataset_nway_left_right:
  1786. break; // who knows?
  1787. case childdataset_addfiles:
  1788. case childdataset_merge:
  1789. {
  1790. ForEachChild(i, expr)
  1791. recursiveDecUsage(expr->queryChild(i));
  1792. break;
  1793. }
  1794. default:
  1795. UNIMPLEMENTED;
  1796. }
  1797. }
  1798. IHqlExpression * CTreeOptimizer::replaceWithNull(IHqlExpression * transformed)
  1799. {
  1800. IHqlExpression * ret = createNullExpr(transformed);
  1801. DBGLOG("Optimizer: Replace %s with %s", queryNode0Text(transformed), queryNode1Text(ret));
  1802. recursiveDecChildUsage(transformed);
  1803. return ret;
  1804. }
  1805. IHqlExpression * CTreeOptimizer::replaceWithNullRow(IHqlExpression * expr)
  1806. {
  1807. IHqlExpression * ret = createRow(no_null, LINK(expr->queryRecord()));
  1808. DBGLOG("Optimizer: Replace %s with %s", queryNode0Text(expr), queryNode1Text(ret));
  1809. recursiveDecChildUsage(expr);
  1810. return ret;
  1811. }
  1812. IHqlExpression * CTreeOptimizer::replaceWithNullRowDs(IHqlExpression * expr)
  1813. {
  1814. assertex(!isGrouped(expr));
  1815. IHqlExpression * ret = createDatasetFromRow(createRow(no_null, LINK(expr->queryRecord())));
  1816. DBGLOG("Optimizer: Replace %s with %s", queryNode0Text(expr), queryNode1Text(ret));
  1817. recursiveDecChildUsage(expr);
  1818. return ret;
  1819. }
  1820. IHqlExpression * CTreeOptimizer::transformExpanded(IHqlExpression * expr)
  1821. {
  1822. return transform(expr);
  1823. }
  1824. IHqlExpression * CTreeOptimizer::queryMoveKeyedExpr(IHqlExpression * transformed)
  1825. {
  1826. //Need to swap with these, regardless of whether the input is shared, because the keyed limit only makes sense
  1827. //inside a compound source
  1828. IHqlExpression * child = transformed->queryChild(0);
  1829. node_operator childOp = child->getOperator();
  1830. switch(childOp)
  1831. {
  1832. case no_compound_indexread:
  1833. case no_compound_diskread:
  1834. case no_assertsorted:
  1835. case no_assertdistributed:
  1836. case no_section: // no so sure...
  1837. case no_sectioninput:
  1838. case no_executewhen:
  1839. return swapNodeWithChild(transformed);
  1840. case no_compound:
  1841. return swapNodeWithChild(transformed, 1);
  1842. case no_if:
  1843. return swapIntoIf(transformed, true);
  1844. case no_nonempty:
  1845. case no_addfiles:
  1846. return swapIntoAddFiles(transformed, true);
  1847. //Force the child to be keyed if it is surrounded by something that needs to be keyed, to ensure both migrate up the tree
  1848. case no_hqlproject:
  1849. case no_newusertable:
  1850. case no_aggregate:
  1851. case no_newaggregate:
  1852. case no_choosen:
  1853. case no_limit:
  1854. case no_keyedlimit:
  1855. case no_sorted:
  1856. case no_stepped:
  1857. case no_distributed:
  1858. case no_preservemeta:
  1859. case no_grouped:
  1860. case no_nofold:
  1861. case no_nohoist:
  1862. case no_filter:
  1863. {
  1864. OwnedHqlExpr newChild = queryMoveKeyedExpr(child);
  1865. if (newChild)
  1866. {
  1867. OwnedHqlExpr moved = replaceChildDataset(transformed, newChild, 0);
  1868. decUsage(child);
  1869. if (!alreadyHasUsage(moved))
  1870. incUsage(newChild);
  1871. return moved.getClear();
  1872. }
  1873. }
  1874. }
  1875. return NULL;
  1876. }
  1877. IHqlExpression * CTreeOptimizer::doCreateTransformed(IHqlExpression * transformed, IHqlExpression * _expr)
  1878. {
  1879. OwnedHqlExpr folded = foldNullDataset(transformed);
  1880. if (folded && folded != transformed)
  1881. return folded.getClear();
  1882. node_operator op = transformed->getOperator();
  1883. IHqlExpression * child = transformed->queryChild(0);
  1884. //Any optimizations that remove the current node, or modify the current node don't need to check if the children are shared
  1885. //Removing child nodes could be included, but it may create more spillers/spliters - which may be significant in thor.
  1886. switch (op)
  1887. {
  1888. case no_if:
  1889. {
  1890. OwnedHqlExpr ret = optimizeIf(transformed);
  1891. if (ret)
  1892. return ret.getClear();
  1893. //Processed hereThis won't split shared nodes, but one of the children may be shared - so proce
  1894. if (transformed->isDataset())
  1895. return optimizeDatasetIf(transformed);
  1896. break;
  1897. }
  1898. case no_keyedlimit:
  1899. {
  1900. IHqlExpression * ret = queryMoveKeyedExpr(transformed);
  1901. if (ret)
  1902. return ret;
  1903. break;
  1904. }
  1905. case no_filter:
  1906. if (filterIsKeyed(transformed))
  1907. {
  1908. IHqlExpression * ret = queryMoveKeyedExpr(transformed);
  1909. if (ret)
  1910. return ret;
  1911. }
  1912. break;
  1913. case no_hqlproject:
  1914. case no_newusertable:
  1915. if (transformed->hasProperty(keyedAtom))
  1916. {
  1917. IHqlExpression * ret = queryMoveKeyedExpr(transformed);
  1918. if (ret)
  1919. return ret;
  1920. }
  1921. break;
  1922. case no_join:
  1923. {
  1924. #ifdef MIGRATE_JOIN_CONDITIONS
  1925. OwnedHqlExpr ret = optimizeJoinCondition(transformed);
  1926. if (ret)
  1927. return ret.getClear();
  1928. #endif
  1929. IHqlExpression * ret2 = optimizeInlineJoin(transformed);
  1930. if (ret2)
  1931. return ret2;
  1932. //MORE:
  1933. //If left outer join, and transform doesn't reference RIGHT, and only one rhs record could match each lhs record (e.g., it was rolled
  1934. //up, or a non-many lookup join, then the join could be converted into a project
  1935. //Can occur once fields get implicitly removed from transforms etc. - e.g., bc10.xhql, although that code has since been fixed.
  1936. break;
  1937. }
  1938. case no_dedup:
  1939. {
  1940. node_operator childOp = child->getOperator();
  1941. switch(childOp)
  1942. {
  1943. case no_dedup:
  1944. {
  1945. DedupInfoExtractor dedup1(transformed); // slightly costly to create
  1946. DedupInfoExtractor dedup2(child);
  1947. switch (dedup1.compareWith(dedup2))
  1948. {
  1949. //In roxie this would probably be better, in thor it may create extra spills
  1950. //case DedupInfoExtractor::DedupDoesAll:
  1951. // return removeChildNode(transformed);
  1952. case DedupInfoExtractor::DedupDoesNothing:
  1953. return removeParentNode(transformed);
  1954. }
  1955. break;
  1956. }
  1957. }
  1958. break;
  1959. }
  1960. case no_aggregate:
  1961. case no_newaggregate:
  1962. {
  1963. node_operator childOp = child->getOperator();
  1964. if (transformed->hasProperty(keyedAtom))
  1965. {
  1966. IHqlExpression * moved = NULL;
  1967. switch(childOp)
  1968. {
  1969. case no_compound_diskread:
  1970. case no_compound_disknormalize:
  1971. case no_compound_indexread:
  1972. case no_compound_indexnormalize:
  1973. case no_compound_childread:
  1974. case no_compound_childnormalize:
  1975. if (!isGrouped(queryRoot(child)) && (options & HOOhascompoundaggregate))
  1976. moved = optimizeAggregateCompound(transformed);
  1977. break;
  1978. default:
  1979. moved = queryMoveKeyedExpr(transformed);
  1980. break;
  1981. }
  1982. if (moved)
  1983. return moved;
  1984. }
  1985. IHqlExpression * folded = NULL;
  1986. switch(childOp)
  1987. {
  1988. case no_thisnode:
  1989. return swapNodeWithChild(transformed);
  1990. case no_inlinetable:
  1991. if ((options & HOOfoldconstantdatasets) && isPureInlineDataset(child))
  1992. folded = queryOptimizeAggregateInline(transformed, child->queryChild(0)->numChildren());
  1993. break;
  1994. default:
  1995. if ((options & HOOfoldconstantdatasets) && hasSingleRow(child))
  1996. folded = queryOptimizeAggregateInline(transformed, 1);
  1997. break;
  1998. }
  1999. if (folded)
  2000. {
  2001. recursiveDecUsage(child);
  2002. return folded;
  2003. }
  2004. //MORE: The OHOinsidecompound isn't really good enough - because might remove projects from
  2005. //nested child aggregates which could benifit from them. Probably not as long as all compound
  2006. //activities support aggregation. In fact test should be removable everywhere once all
  2007. //engines support the new activities.
  2008. if (isGrouped(transformed->queryChild(0)) || (queryRealChild(transformed, 3) && !(options & HOOinsidecompound)))
  2009. break;
  2010. OwnedHqlExpr ret = optimizeAggregateDataset(transformed);
  2011. if (ret != transformed)
  2012. return ret.getClear();
  2013. break;
  2014. }
  2015. case NO_AGGREGATE:
  2016. case no_countindex:
  2017. return optimizeAggregateDataset(transformed);
  2018. case no_selectnth:
  2019. {
  2020. node_operator childOp = child->getOperator();
  2021. switch(childOp)
  2022. {
  2023. case no_inlinetable:
  2024. {
  2025. __int64 index = getIntValue(transformed->queryChild(1), -1);
  2026. if (index == -1)
  2027. break;
  2028. IHqlExpression * values = child->queryChild(0);
  2029. if (!values->isPure())
  2030. break;
  2031. if (index < 1 || index > values->numChildren())
  2032. return replaceWithNull(transformed);
  2033. //MORE If trivial projection then might be worth merging with multiple items, but unlikely to occur in practice
  2034. OwnedHqlExpr ret = createRow(no_createrow, LINK(values->queryChild((unsigned)index-1)));
  2035. noteUnused(child);
  2036. DBGLOG("Optimizer: Replace %s with %s", queryNode0Text(transformed), queryNode1Text(ret));
  2037. return ret.getClear();
  2038. }
  2039. case no_datasetfromrow:
  2040. {
  2041. __int64 index = getIntValue(transformed->queryChild(1), -1);
  2042. if (index == -1)
  2043. break;
  2044. if (index != 1)
  2045. return replaceWithNull(transformed);
  2046. IHqlExpression * ret = child->queryChild(0);
  2047. noteUnused(child);
  2048. decUsage(ret); // will inherit later
  2049. DBGLOG("Optimizer: Replace %s with %s", queryNode0Text(transformed), queryNode1Text(ret));
  2050. return LINK(ret);
  2051. }
  2052. #if 0
  2053. //This works (with either condition used), but I don't tink it is worth the cycles..
  2054. case no_choosen:
  2055. {
  2056. __int64 index = getIntValue(transformed->queryChild(1), -1);
  2057. __int64 choosenMax = getIntValue(child->queryChild(1), -1);
  2058. //choosen(x,<n>)[m] == x[m] iff n >= m
  2059. // if ((index == 1) && (choosenMax == 1) && !queryRealChild(child, 2))
  2060. if ((index > 0) && (choosenMax >= index) && !queryRealChild(child, 2) && !isGrouped(child->queryChild(0)))
  2061. return removeChildNode(transformed);
  2062. }
  2063. break;
  2064. #endif
  2065. }
  2066. break;
  2067. }
  2068. case no_select:
  2069. {
  2070. if (transformed->hasProperty(newAtom))
  2071. {
  2072. node_operator childOp = child->getOperator();
  2073. switch (childOp)
  2074. {
  2075. case no_createrow:
  2076. {
  2077. OwnedHqlExpr match = getExtractSelect(child->queryChild(0), transformed->queryChild(1));
  2078. if (match)
  2079. {
  2080. IHqlExpression * cur = match;
  2081. while (isCast(cur))
  2082. cur = cur->queryChild(0);
  2083. switch (cur->getOperator())
  2084. {
  2085. case no_constant:
  2086. case no_select:
  2087. case no_null:
  2088. case no_getresult:
  2089. DBGLOG("Optimizer: Extract value %s from %s", queryNode0Text(match), queryNode1Text(transformed));
  2090. noteUnused(child);
  2091. return match.getClear();
  2092. }
  2093. }
  2094. }
  2095. break;
  2096. case no_datasetfromrow:
  2097. {
  2098. HqlExprArray args;
  2099. args.append(*LINK(child->queryChild(0)));
  2100. unwindChildren(args, transformed, 1);
  2101. noteUnused(child);
  2102. return transformed->clone(args);
  2103. }
  2104. break;
  2105. case no_inlinetable:
  2106. {
  2107. IHqlExpression * values = child->queryChild(0);
  2108. if (values->numChildren() == 1)
  2109. {
  2110. IHqlExpression * transform = values->queryChild(0);
  2111. OwnedHqlExpr match = getExtractSelect(transform, transformed->queryChild(1));
  2112. if (match)
  2113. {
  2114. IHqlExpression * cur = match;
  2115. while (isCast(cur))
  2116. cur = cur->queryChild(0);
  2117. switch (cur->getOperator())
  2118. {
  2119. case no_constant:
  2120. case no_select:
  2121. case no_null:
  2122. case no_getresult:
  2123. case no_inlinetable:
  2124. case no_left:
  2125. case no_right:
  2126. {
  2127. DBGLOG("Optimizer: Extract value %s from %s", queryNode0Text(match), queryNode1Text(transformed));
  2128. noteUnused(child);
  2129. return match.getClear();
  2130. }
  2131. }
  2132. }
  2133. }
  2134. }
  2135. break;
  2136. }
  2137. }
  2138. }
  2139. break;
  2140. case no_extractresult:
  2141. {
  2142. //Very similar to the transform above, but needs to be done separately because of the new representation of no_extractresult.
  2143. //extract(inline-table(single-row), somefield) -> single-row.somefield if simple valued.
  2144. node_operator childOp = child->getOperator();
  2145. switch (childOp)
  2146. {
  2147. case no_inlinetable:
  2148. {
  2149. IHqlExpression * extracted = transformed->queryChild(1);
  2150. if ((extracted->getOperator() == no_select) && (extracted->queryChild(0) == child->queryNormalizedSelector()))
  2151. {
  2152. IHqlExpression * values = child->queryChild(0);
  2153. if (values->numChildren() == 1)
  2154. {
  2155. IHqlExpression * transform = values->queryChild(0);
  2156. OwnedHqlExpr match = getExtractSelect(transform, extracted->queryChild(1));
  2157. if (match)
  2158. {
  2159. IHqlExpression * cur = match;
  2160. while (isCast(cur))
  2161. cur = cur->queryChild(0);
  2162. switch (cur->getOperator())
  2163. {
  2164. case no_constant:
  2165. case no_select:
  2166. case no_null:
  2167. case no_getresult:
  2168. {
  2169. DBGLOG("Optimizer: Extract value %s from %s", queryNode0Text(match), queryNode1Text(transformed));
  2170. noteUnused(child);
  2171. HqlExprArray args;
  2172. args.append(*match.getClear());
  2173. unwindChildren(args, transformed, 2);
  2174. return createValue(no_setresult, makeVoidType(), args);
  2175. }
  2176. }
  2177. }
  2178. }
  2179. }
  2180. }
  2181. break;
  2182. }
  2183. }
  2184. break;
  2185. case no_keyeddistribute:
  2186. case no_distribute:
  2187. {
  2188. //If distribution matches existing and grouped then don't distribute, but still remove grouping.
  2189. IHqlExpression * distn = queryDistribution(transformed);
  2190. if (distn == queryDistribution(child))
  2191. {
  2192. assertex(isGrouped(child)); // not grouped handled already.
  2193. OwnedHqlExpr ret = createDataset(no_group, LINK(child));
  2194. DBGLOG("Optimizer: replace %s with %s", queryNode0Text(transformed), queryNode1Text(ret));
  2195. return transformed->cloneAllAnnotations(ret);
  2196. }
  2197. break;
  2198. }
  2199. case no_choosen:
  2200. {
  2201. IValue * num = transformed->queryChild(1)->queryValue();
  2202. if (num && (num->getIntValue() >= 1) && !queryRealChild(transformed, 2))
  2203. {
  2204. if (hasNoMoreRowsThan(child, 1))
  2205. return removeParentNode(transformed);
  2206. }
  2207. break;
  2208. }
  2209. case no_preservemeta:
  2210. {
  2211. node_operator childOp = child->getOperator();
  2212. switch(childOp)
  2213. {
  2214. case no_hqlproject:
  2215. case no_newusertable:
  2216. {
  2217. IHqlExpression * ret = hoistMetaOverProject(transformed);
  2218. if (ret)
  2219. return ret;
  2220. break;
  2221. }
  2222. //more; iterate, join? others?
  2223. case no_compound_diskread:
  2224. case no_compound_disknormalize:
  2225. case no_compound_indexread:
  2226. case no_compound_indexnormalize:
  2227. case no_compound_childread:
  2228. case no_compound_childnormalize:
  2229. case no_compound_selectnew:
  2230. case no_compound_inline:
  2231. return swapNodeWithChild(transformed);
  2232. }
  2233. break;
  2234. }
  2235. }
  2236. bool shared = childrenAreShared(transformed);
  2237. if (shared)
  2238. {
  2239. bool okToContinue = false;
  2240. switch (op)
  2241. {
  2242. case no_filter:
  2243. {
  2244. node_operator childOp = child->getOperator();
  2245. switch(childOp)
  2246. {
  2247. case no_hqlproject:
  2248. case no_newusertable:
  2249. {
  2250. IHqlExpression * ret = hoistFilterOverProject(transformed, true);
  2251. if (ret)
  2252. return ret;
  2253. break;
  2254. }
  2255. case no_inlinetable:
  2256. //shared is checked within the code below....
  2257. okToContinue = true;
  2258. break;
  2259. }
  2260. }
  2261. case no_hqlproject:
  2262. {
  2263. node_operator childOp = child->getOperator();
  2264. switch(childOp)
  2265. {
  2266. case no_inlinetable:
  2267. okToContinue = true;
  2268. break;
  2269. }
  2270. break;
  2271. }
  2272. case no_addfiles:
  2273. //It is generally worth always combining inlinetable + inlinetable because it opens the scope
  2274. //for more optimizations (e.g., filters on inlinetables) and the counts also become a known constant.
  2275. okToContinue = true;
  2276. break;
  2277. }
  2278. if (!okToContinue)
  2279. return LINK(transformed);
  2280. }
  2281. switch (op)
  2282. {
  2283. case no_choosen:
  2284. {
  2285. //worth moving a choosen over an activity that doesn't read a record at a time.
  2286. //also worth moving if it brings two projects closer togther, if
  2287. //that doesn't mess up a projected disk read.
  2288. IHqlExpression * const1 = transformed->queryChild(1);
  2289. IValue * val1 = const1->queryValue();
  2290. if (val1)
  2291. {
  2292. __int64 limit = val1->getIntValue();
  2293. if ((limit == CHOOSEN_ALL_LIMIT) && !transformed->queryChild(2))
  2294. return removeParentNode(transformed);
  2295. //if (limit == 0)
  2296. //.,..
  2297. }
  2298. node_operator childOp = child->getOperator();
  2299. switch(childOp)
  2300. {
  2301. case no_choosen:
  2302. {
  2303. if (transformed->queryChild(2) || child->queryChild(2))
  2304. {
  2305. //choosen(choosen(x, a, b), c, d))
  2306. //could generate choosen(x, (b+d-1), min(c, a)) but I doubt it is worth it....
  2307. break;
  2308. }
  2309. IHqlExpression * const2 = child->queryChild(1);
  2310. IValue * val2 = const2->queryValue();
  2311. if (val1 && val2)
  2312. {
  2313. __int64 ival1 = val1->getIntValue();
  2314. __int64 ival2 = val2->getIntValue();
  2315. IHqlExpression * newLimit;
  2316. if (ival1 < ival2)
  2317. newLimit = const1;
  2318. else
  2319. newLimit = const2;
  2320. DBGLOG("Optimizer: Merge %s and %s", queryNode0Text(transformed), queryNode1Text(child));
  2321. return createDataset(no_choosen, LINK(child->queryChild(0)), LINK(newLimit));
  2322. //don't bother to transform
  2323. }
  2324. break;
  2325. }
  2326. //This can be done, but I think it makes matters worse. The choosen() will short circuit the reading anyway,
  2327. //so no advantage of swapping with the project, and makes things worse, since stops projects commoning up.
  2328. case no_hqlproject:
  2329. case no_newusertable:
  2330. case no_transformascii:
  2331. case no_transformebcdic:
  2332. {
  2333. if (isPureActivity(child) && !isAggregateDataset(child))
  2334. {
  2335. //Don't move a choosen with a start value over a count project - we could if we also adjust the counter
  2336. if (!child->queryProperty(_countProject_Atom) || !queryRealChild(transformed, 2))
  2337. return forceSwapNodeWithChild(transformed);
  2338. }
  2339. break;
  2340. }
  2341. case no_fetch: //NB: Not filtered fetch
  2342. {
  2343. if (isPureActivity(child))
  2344. return swapNodeWithChild(transformed, 1);
  2345. break;
  2346. }
  2347. case no_if:
  2348. return swapIntoIf(transformed);
  2349. case no_nonempty:
  2350. return swapIntoAddFiles(transformed);
  2351. case no_sort:
  2352. {
  2353. unsigned __int64 topNLimit = 1000;
  2354. OwnedHqlExpr topn = queryConvertChoosenNSort(transformed, topNLimit);
  2355. if (topn)
  2356. {
  2357. noteUnused(child);
  2358. return topn.getClear();
  2359. }
  2360. break;
  2361. }
  2362. }
  2363. break;
  2364. }
  2365. case no_limit:
  2366. {
  2367. node_operator childOp = child->getOperator();
  2368. switch(childOp)
  2369. {
  2370. case no_hqlproject:
  2371. case no_newusertable:
  2372. {
  2373. if (isPureActivity(child) && !isAggregateDataset(child) && !transformed->hasProperty(onFailAtom))
  2374. return forceSwapNodeWithChild(transformed);
  2375. break;
  2376. }
  2377. case no_fetch:
  2378. {
  2379. if (isPureActivity(child))
  2380. return swapNodeWithChild(transformed, 1);
  2381. break;
  2382. }
  2383. case no_if:
  2384. return swapIntoIf(transformed);
  2385. case no_nonempty:
  2386. return swapIntoAddFiles(transformed);
  2387. case no_limit:
  2388. {
  2389. //Could be cleverer... but this is safer
  2390. if (transformed->queryProperty(skipAtom) != child->queryProperty(skipAtom))
  2391. break;
  2392. if (transformed->queryProperty(onFailAtom) != child->queryProperty(onFailAtom))
  2393. break;
  2394. OwnedHqlExpr parentLimit = foldHqlExpression(transformed->queryChild(1));
  2395. OwnedHqlExpr childLimit = foldHqlExpression(child->queryChild(1));
  2396. if (parentLimit == childLimit)
  2397. return removeParentNode(transformed);
  2398. IValue * parentLimitValue = parentLimit->queryValue();
  2399. IValue * childLimitValue = childLimit->queryValue();
  2400. if (parentLimitValue && childLimitValue)
  2401. {
  2402. if (parentLimitValue->getIntValue() <= childLimitValue->getIntValue())
  2403. return removeParentNode(transformed);
  2404. }
  2405. break;
  2406. }
  2407. case no_compound_indexread:
  2408. case no_compound_diskread:
  2409. if (!isLimitedDataset(child))
  2410. {
  2411. if (transformed->hasProperty(skipAtom) || transformed->hasProperty(onFailAtom))
  2412. {
  2413. //only merge if roxie
  2414. }
  2415. else
  2416. {
  2417. if ((options & HOOnoclonelimit) || ((options & HOOnocloneindexlimit) && (childOp == no_compound_indexread)))
  2418. return swapNodeWithChild(transformed);
  2419. OwnedHqlExpr childLimit = ::replaceChild(transformed, 0, child->queryChild(0));
  2420. OwnedHqlExpr localLimit = appendLocalAttribute(childLimit);
  2421. OwnedHqlExpr newCompound = ::replaceChild(child, 0, localLimit);
  2422. incUsage(localLimit);
  2423. incUsage(newCompound);
  2424. decUsage(child);
  2425. return ::replaceChild(transformed, 0, newCompound);
  2426. }
  2427. }
  2428. break;
  2429. case no_choosen:
  2430. {
  2431. OwnedHqlExpr parentLimit = foldHqlExpression(transformed->queryChild(1));
  2432. OwnedHqlExpr childLimit = foldHqlExpression(child->queryChild(1));
  2433. if (getIntValue(parentLimit, 0) > getIntValue(childLimit, I64C(0x7fffffffffffffff)))
  2434. return removeParentNode(transformed);
  2435. break;
  2436. }
  2437. case no_topn:
  2438. {
  2439. OwnedHqlExpr parentLimit = foldHqlExpression(transformed->queryChild(1));
  2440. OwnedHqlExpr childLimit = foldHqlExpression(child->queryChild(2));
  2441. if (getIntValue(parentLimit, 0) > getIntValue(childLimit, I64C(0x7fffffffffffffff)))
  2442. return removeParentNode(transformed);
  2443. break;
  2444. }
  2445. }
  2446. break;
  2447. }
  2448. case no_dedup:
  2449. {
  2450. node_operator childOp = child->getOperator();
  2451. switch(childOp)
  2452. {
  2453. case no_dedup:
  2454. {
  2455. DedupInfoExtractor dedup1(transformed); // slightly costly to create
  2456. DedupInfoExtractor dedup2(child);
  2457. switch (dedup1.compareWith(dedup2))
  2458. {
  2459. case DedupInfoExtractor::DedupDoesAll:
  2460. return removeChildNode(transformed);
  2461. }
  2462. break;
  2463. }
  2464. }
  2465. break;
  2466. }
  2467. case no_filter:
  2468. {
  2469. node_operator childOp = child->getOperator();
  2470. IHqlExpression * newGrandchild = child->queryChild(0);
  2471. switch(childOp)
  2472. {
  2473. case no_filter:
  2474. {
  2475. DBGLOG("Optimizer: Merge %s and %s", queryNode0Text(transformed), queryNode1Text(child));
  2476. HqlExprArray args;
  2477. unwindChildren(args, child);
  2478. unwindChildren(args, transformed, 1);
  2479. OwnedHqlExpr combined = child->clone(args);
  2480. return transformed->cloneAllAnnotations(combined);
  2481. }
  2482. case no_hqlproject:
  2483. case no_newusertable:
  2484. {
  2485. IHqlExpression * ret = hoistFilterOverProject(transformed, false);
  2486. if (ret)
  2487. return ret;
  2488. break;
  2489. }
  2490. //more; iterate, join? others?
  2491. case no_compound_diskread:
  2492. case no_compound_disknormalize:
  2493. case no_compound_indexread:
  2494. case no_compound_indexnormalize:
  2495. case no_compound_childread:
  2496. case no_compound_childnormalize:
  2497. case no_compound_selectnew:
  2498. case no_compound_inline:
  2499. if (!isLimitedDataset(child))// && child->isPure())
  2500. return swapNodeWithChild(transformed);
  2501. break;
  2502. case no_sorted:
  2503. case no_stepped:
  2504. case no_distributed:
  2505. case no_distribute:
  2506. case no_group:
  2507. case no_grouped:
  2508. case no_keyeddistribute:
  2509. case no_sort:
  2510. case no_preload:
  2511. case no_assertsorted:
  2512. case no_assertgrouped:
  2513. case no_assertdistributed:
  2514. return swapNodeWithChild(transformed);
  2515. case no_keyedlimit:
  2516. {
  2517. //It is ugly this is forced.... but ensures filters get combined
  2518. OwnedHqlExpr ret = swapNodeWithChild(transformed);
  2519. //Need to add the filter as a skip on the onFail() transform
  2520. IHqlExpression * onFail = ret->queryProperty(onFailAtom);
  2521. if (!onFail)
  2522. return ret.getClear();
  2523. IHqlExpression * limitTransform = onFail->queryChild(0);
  2524. if (!isKnownTransform(limitTransform))
  2525. return ret.getClear();
  2526. NewProjectMapper2 mapper;
  2527. mapper.setMapping(limitTransform);
  2528. HqlExprArray filterArgs;
  2529. unwindChildren(filterArgs, transformed, 1);
  2530. OwnedITypeInfo boolType = makeBoolType();
  2531. OwnedHqlExpr cond = createBalanced(no_and, boolType, filterArgs);
  2532. OwnedHqlExpr skipFilter = mapper.expandFields(cond, child, NULL, NULL, NULL);
  2533. OwnedHqlExpr skip = createValue(no_skip, makeVoidType(), getInverse(skipFilter));
  2534. OwnedHqlExpr newTransform = appendOwnedOperand(limitTransform, skip.getClear());
  2535. OwnedHqlExpr newOnFail = createExprAttribute(onFailAtom, newTransform.getClear());
  2536. return replaceOwnedProperty(ret, newOnFail.getClear());
  2537. }
  2538. case no_if:
  2539. return swapIntoIf(transformed);
  2540. case no_nonempty:
  2541. return swapIntoAddFiles(transformed);
  2542. case no_fetch:
  2543. if (isPureActivity(child) && !hasUnknownTransform(child))
  2544. {
  2545. IHqlExpression * ret = getHoistedFilter(transformed, false, false, true, true, NotFound);
  2546. if (ret)
  2547. return ret;
  2548. }
  2549. break;
  2550. case no_iterate:
  2551. //Should be possible to move a filter over a iterate, but only really same if the filter fields match the grouping criteria
  2552. #if 0
  2553. if (isPureActivity(child))
  2554. {
  2555. OwnedHqlExpr ret = queryPromotedFilter(transformed, no_right, 0);
  2556. if (ret)
  2557. return ret.getClear();
  2558. }
  2559. #endif
  2560. break;
  2561. case no_rollup:
  2562. //I don't think you can't move a filter over a rollup because it might affect the records rolled up.
  2563. //unless the filter fields match the grouping criteria
  2564. #if 0
  2565. if (isPureActivity(child))
  2566. {
  2567. OwnedHqlExpr ret = queryPromotedFilter(transformed, no_left, 0);
  2568. if (ret)
  2569. return ret.getClear();
  2570. }
  2571. #endif
  2572. break;
  2573. case no_selfjoin:
  2574. if (isPureActivity(child) && !hasUnknownTransform(child) && !isLimitedJoin(child) && !child->hasProperty(fullouterAtom) && !child->hasProperty(fullonlyAtom))
  2575. {
  2576. //Strictly speaking, we could hoist conditions that can be hoisted for left only (or even full) joins etc. if the fields that are filtered
  2577. //are based on equalities in the join condition. However, that can wait.... (same for join below...)
  2578. bool canHoistLeft = !child->hasProperty(rightouterAtom) && !child->hasProperty(rightonlyAtom) &&
  2579. !child->hasProperty(leftouterAtom) && !child->hasProperty(leftonlyAtom);
  2580. bool canMergeLeft = isInnerJoin(child);
  2581. bool canHoistRight = false;
  2582. bool canMergeRight = canMergeLeft;
  2583. IHqlExpression * ret = getHoistedFilter(transformed, canHoistLeft, canMergeLeft, canHoistRight, canMergeRight, 2);
  2584. if (ret)
  2585. return ret;
  2586. }
  2587. break;
  2588. case no_join:
  2589. if (isPureActivity(child) && !hasUnknownTransform(child) && !isLimitedJoin(child) && !child->hasProperty(fullouterAtom) && !child->hasProperty(fullonlyAtom))
  2590. {
  2591. bool canHoistLeft = !child->hasProperty(rightouterAtom) && !child->hasProperty(rightonlyAtom);
  2592. bool canMergeLeft = isInnerJoin(child);
  2593. bool canHoistRight = !child->hasProperty(leftouterAtom) && !child->hasProperty(leftonlyAtom) && !isKeyedJoin(child);
  2594. bool canMergeRight = canMergeLeft;
  2595. IHqlExpression * ret = getHoistedFilter(transformed, canHoistLeft, canMergeLeft, canHoistRight, canMergeRight, 2);
  2596. if (ret)
  2597. return ret;
  2598. }
  2599. break;
  2600. case no_select:
  2601. {
  2602. IHqlExpression * ret = moveFilterOverSelect(transformed);
  2603. if (ret)
  2604. return ret;
  2605. }
  2606. break;
  2607. case no_inlinetable:
  2608. if (options & HOOfoldconstantdatasets)
  2609. {
  2610. HqlExprArray conditions;
  2611. unwindChildren(conditions, transformed, 1);
  2612. OwnedITypeInfo boolType = makeBoolType();
  2613. OwnedHqlExpr filterCondition = createBalanced(no_and, boolType, conditions);
  2614. HqlExprArray filtered;
  2615. IHqlExpression * values = child->queryChild(0);
  2616. unsigned numValues = values->numChildren();
  2617. unsigned numOk = 0;
  2618. //A vague rule of thumb for the maximum proportion to retain if the dataset is shared.
  2619. unsigned maxSharedFiltered = (numValues >= 10) ? numValues / 10 : 1;
  2620. ForEachChild(i, values)
  2621. {
  2622. IHqlExpression * curTransform = values->queryChild(i);
  2623. if (!isKnownTransform(curTransform))
  2624. break;
  2625. NewProjectMapper2 mapper;
  2626. mapper.setMapping(curTransform);
  2627. OwnedHqlExpr expandedFilter = mapper.expandFields(filterCondition, child, NULL, NULL);
  2628. //This can prematurely ignore some expressions e.g., x and (' ' = ' '), but saves lots of
  2629. //additional constant folding on non constant expressions, so worthwhile.
  2630. if (!expandedFilter->isConstant())
  2631. break;
  2632. OwnedHqlExpr folded = foldHqlExpression(expandedFilter);
  2633. IValue * value = folded->queryValue();
  2634. if (!value)
  2635. break;
  2636. if (value->getBoolValue())
  2637. {
  2638. filtered.append(*LINK(curTransform));
  2639. //Only break sharing on an inline dataset if it generates something significantly smaller.
  2640. if (shared && (filtered.ordinality() > maxSharedFiltered))
  2641. break;
  2642. }
  2643. numOk++;
  2644. }
  2645. if (numOk == numValues)
  2646. {
  2647. if (filtered.ordinality() == 0)
  2648. return replaceWithNull(transformed);
  2649. if (filtered.ordinality() == values->numChildren())
  2650. return removeParentNode(transformed);
  2651. DBGLOG("Optimizer: Node %s reduce values in child: %s from %d to %d", queryNode0Text(transformed), queryNode1Text(child), values->numChildren(), filtered.ordinality());
  2652. HqlExprArray args;
  2653. args.append(*values->clone(filtered));
  2654. unwindChildren(args, child, 1);
  2655. decUsage(child);
  2656. return child->clone(args);
  2657. }
  2658. }
  2659. break;
  2660. }
  2661. break;
  2662. }
  2663. case no_keyedlimit:
  2664. {
  2665. node_operator childOp = child->getOperator();
  2666. switch(childOp)
  2667. {
  2668. case no_distributed:
  2669. case no_sorted:
  2670. case no_stepped:
  2671. case no_limit:
  2672. case no_choosen:
  2673. case no_compound_indexread:
  2674. case no_compound_diskread:
  2675. case no_assertsorted:
  2676. case no_assertdistributed:
  2677. return swapNodeWithChild(transformed);
  2678. case no_if:
  2679. return swapIntoIf(transformed);
  2680. case no_nonempty:
  2681. return swapIntoAddFiles(transformed);
  2682. }
  2683. break;
  2684. }
  2685. case no_hqlproject:
  2686. {
  2687. node_operator childOp = child->getOperator();
  2688. IHqlExpression * transformedCountProject = transformed->queryProperty(_countProject_Atom);
  2689. if (transformed->hasProperty(prefetchAtom))
  2690. break; // play safe
  2691. IHqlExpression * transformKeyed = transformed->queryProperty(keyedAtom);
  2692. IHqlExpression * transform = transformed->queryChild(1);
  2693. switch(childOp)
  2694. {
  2695. case no_if:
  2696. if (isComplexTransform(transform))
  2697. break;
  2698. return swapIntoIf(transformed);
  2699. case no_nonempty:
  2700. if (isComplexTransform(transform))
  2701. break;
  2702. return swapIntoAddFiles(transformed);
  2703. case no_newusertable:
  2704. if (isAggregateDataset(child))
  2705. break;
  2706. case no_hqlproject:
  2707. {
  2708. if (!isPureActivityIgnoringSkip(child) || hasUnknownTransform(child))
  2709. break;
  2710. IHqlExpression * childCountProject = child->queryProperty(_countProject_Atom);
  2711. //Don't merge two count projects - unless we go through and replace counter instances.
  2712. if (transformedCountProject && childCountProject)
  2713. break;
  2714. IHqlExpression * childKeyed = child->queryProperty(keyedAtom);
  2715. if (childKeyed && !transformKeyed)
  2716. break;
  2717. OwnedMapper mapper = getMapper(child);
  2718. IHqlExpression * transformedSeq = querySelSeq(transformed);
  2719. OwnedHqlExpr oldLeft = createSelector(no_left, child, transformedSeq);
  2720. OwnedHqlExpr newLeft = createSelector(no_left, child->queryChild(0), transformedSeq);
  2721. ExpandSelectorMonitor monitor(*this);
  2722. OwnedHqlExpr expandedTransform = expandFields(mapper, transform, oldLeft, newLeft, &monitor);
  2723. if (expandedTransform && !monitor.isComplex())
  2724. {
  2725. expandedTransform.setown(inheritSkips(expandedTransform, child->queryChild(1), mapper->queryTransformSelector(), newLeft));
  2726. DBGLOG("Optimizer: Merge %s and %s", queryNode0Text(transformed), queryNode1Text(child));
  2727. //NB: Merging a project with a count project can actually remove the count project..
  2728. IHqlExpression * countProjectAttr = transformedCountProject;
  2729. if (childCountProject && transformContainsCounter(expandedTransform, childCountProject->queryChild(0)))
  2730. countProjectAttr = childCountProject;
  2731. if (countProjectAttr)
  2732. expandedTransform.setown(createComma(LINK(expandedTransform), LINK(countProjectAttr)));
  2733. noteUnused(child);
  2734. OwnedHqlExpr ret = createDataset(op, LINK(child->queryChild(0)), createComma(expandedTransform.getClear(), LINK(transformedSeq), LINK(transformKeyed)));
  2735. ret.setown(child->cloneAllAnnotations(ret));
  2736. return transformed->cloneAllAnnotations(ret);
  2737. }
  2738. break;
  2739. }
  2740. case no_join:
  2741. if (isKeyedJoin(child))
  2742. break;
  2743. //fall through
  2744. case no_selfjoin:
  2745. case no_fetch:
  2746. case no_normalize:
  2747. case no_newparse:
  2748. case no_newxmlparse:
  2749. case no_rollupgroup:
  2750. {
  2751. if (!isPureActivity(child) || !isPureActivity(transformed) || transformed->queryProperty(_countProject_Atom))
  2752. break;
  2753. IHqlExpression * transformedSeq = querySelSeq(transformed);
  2754. OwnedHqlExpr oldLeft = createSelector(no_left, child, transformedSeq);
  2755. IHqlExpression * ret = expandProjectedDataset(child, transform, oldLeft, transformed);
  2756. if (ret)
  2757. return ret;
  2758. break;
  2759. }
  2760. case no_preload:
  2761. if (!transformedCountProject)
  2762. return swapNodeWithChild(transformed);
  2763. break;
  2764. case no_sort:
  2765. if (transformedCountProject)
  2766. break;
  2767. if (increasesRowSize(transformed))
  2768. break;
  2769. return moveProjectionOverSimple(transformed, true, false);
  2770. case no_distribute:
  2771. if (increasesRowSize(transformed))
  2772. break;
  2773. return moveProjectionOverSimple(transformed, true, false);
  2774. case no_distributed:
  2775. case no_sorted:
  2776. case no_grouped:
  2777. return moveProjectionOverSimple(transformed, false, false);
  2778. case no_stepped:
  2779. return moveProjectionOverSimple(transformed, true, false);
  2780. case no_keyedlimit:
  2781. if (isWorthMovingProjectOverLimit(transformed))
  2782. {
  2783. if (child->hasProperty(onFailAtom))
  2784. return moveProjectionOverLimit(transformed);
  2785. return swapNodeWithChild(transformed);
  2786. }
  2787. break;
  2788. case no_catchds:
  2789. //could treat like a limit, but not at the moment
  2790. break;
  2791. case no_limit:
  2792. case no_choosen:
  2793. if (isWorthMovingProjectOverLimit(transformed))
  2794. {
  2795. //MORE: Later this is going to be worth moving aggregates.... when we have a compound aggregates.
  2796. if (isPureActivity(transformed) && !isAggregateDataset(transformed) && !transformedCountProject)
  2797. {
  2798. if (child->hasProperty(onFailAtom))
  2799. return moveProjectionOverLimit(transformed);
  2800. return swapNodeWithChild(transformed);
  2801. }
  2802. }
  2803. break;
  2804. case no_inlinetable:
  2805. {
  2806. if (transformContainsSkip(transform))
  2807. break;
  2808. IHqlExpression * ret = optimizeProjectInlineTable(transformed, shared);
  2809. if (ret)
  2810. return ret;
  2811. break;
  2812. }
  2813. case no_compound_diskread:
  2814. case no_compound_disknormalize:
  2815. case no_compound_indexread:
  2816. case no_compound_indexnormalize:
  2817. case no_compound_childread:
  2818. case no_compound_childnormalize:
  2819. case no_compound_selectnew:
  2820. case no_compound_inline:
  2821. if (!transformedCountProject)
  2822. return swapNodeWithChild(transformed);
  2823. break;
  2824. case no_addfiles:
  2825. if (transformedCountProject || isComplexTransform(transform))
  2826. break;
  2827. return swapIntoAddFiles(transformed);
  2828. }
  2829. break;
  2830. }
  2831. case no_projectrow:
  2832. {
  2833. node_operator childOp = child->getOperator();
  2834. switch(childOp)
  2835. {
  2836. case no_if:
  2837. if (isComplexTransform(transformed->queryChild(1)))
  2838. break;
  2839. return swapIntoIf(transformed);
  2840. case no_createrow:
  2841. case no_projectrow:
  2842. {
  2843. if (!isPureActivity(child) || !isPureActivity(transformed) || hasUnknownTransform(child))
  2844. break;
  2845. IHqlExpression * transform = transformed->queryChild(1);
  2846. IHqlExpression * transformedSeq = querySelSeq(transformed);
  2847. OwnedHqlExpr oldLeft = createSelector(no_left, child, transformedSeq);
  2848. OwnedMapper mapper = getMapper(child);
  2849. ExpandSelectorMonitor monitor(*this);
  2850. OwnedHqlExpr expandedTransform = expandFields(mapper, transform, oldLeft, NULL, &monitor);
  2851. if (expandedTransform && !monitor.isComplex())
  2852. {
  2853. DBGLOG("Optimizer: Merge %s and %s", queryNode0Text(transformed), queryNode1Text(child));
  2854. HqlExprArray args;
  2855. unwindChildren(args, child);
  2856. args.replace(*expandedTransform.getClear(), queryTransformIndex(child));
  2857. noteUnused(child);
  2858. return createRow(child->getOperator(), args);
  2859. }
  2860. break;
  2861. }
  2862. }
  2863. break;
  2864. }
  2865. case no_selectfields:
  2866. case no_usertable:
  2867. //shouldn't really have any, because we can't really process them properly.
  2868. break;
  2869. case no_newusertable:
  2870. {
  2871. node_operator childOp = child->getOperator();
  2872. switch(childOp)
  2873. {
  2874. case no_if:
  2875. if (isComplexTransform(transformed->queryChild(2)))
  2876. break;
  2877. return swapIntoIf(transformed);
  2878. case no_nonempty:
  2879. if (isComplexTransform(transformed->queryChild(2)))
  2880. break;
  2881. return swapIntoAddFiles(transformed);
  2882. case no_newusertable:
  2883. if (isAggregateDataset(child))
  2884. break;
  2885. //fallthrough.
  2886. case no_hqlproject:
  2887. {
  2888. if (!isPureActivity(child) || hasUnknownTransform(child))
  2889. break;
  2890. if (child->hasProperty(_countProject_Atom) || child->hasProperty(prefetchAtom))
  2891. break;
  2892. IHqlExpression * transformKeyed = transformed->queryProperty(keyedAtom);
  2893. IHqlExpression * childKeyed = child->queryProperty(keyedAtom);
  2894. if (childKeyed && !transformKeyed)
  2895. break;
  2896. IHqlExpression * grandchild = child->queryChild(0);
  2897. OwnedMapper mapper = getMapper(child);
  2898. HqlExprArray args;
  2899. args.append(*LINK(grandchild));
  2900. args.append(*LINK(transformed->queryChild(1)));
  2901. ExpandSelectorMonitor monitor(*this);
  2902. IHqlExpression * transformExpr = transformed->queryChild(2);
  2903. HqlExprArray assigns;
  2904. ForEachChild(idxt, transformExpr)
  2905. {
  2906. IHqlExpression * cur = transformExpr->queryChild(idxt);
  2907. IHqlExpression * tgt = cur->queryChild(0);
  2908. IHqlExpression * src = cur->queryChild(1);
  2909. assigns.append(*createAssign(LINK(tgt), expandFields(mapper, src, child, grandchild, &monitor)));
  2910. }
  2911. OwnedHqlExpr expandedTransform = transformExpr->clone(assigns);
  2912. args.append(*LINK(expandedTransform));
  2913. unsigned max = transformed->numChildren();
  2914. for(unsigned idx=3; idx < max; idx++)
  2915. args.append(*expandFields(mapper, transformed->queryChild(idx), child, grandchild, &monitor));
  2916. if (!monitor.isComplex())
  2917. {
  2918. DBGLOG("Optimizer: Merge %s and %s", queryNode0Text(transformed), queryNode1Text(child));
  2919. removeProperty(args, _internal_Atom);
  2920. noteUnused(child);
  2921. return transformed->clone(args);
  2922. }
  2923. break;
  2924. }
  2925. case no_join:
  2926. if (isKeyedJoin(child))
  2927. break;
  2928. //fall through
  2929. case no_selfjoin:
  2930. case no_fetch:
  2931. case no_normalize:
  2932. case no_newparse:
  2933. case no_newxmlparse:
  2934. case no_rollupgroup:
  2935. {
  2936. if (!isPureActivity(child) || !isPureActivity(transformed))
  2937. break;
  2938. IHqlExpression * transform = transformed->queryChild(2);
  2939. IHqlExpression * ret = expandProjectedDataset(child, transform, child, transformed);
  2940. if (ret)
  2941. return ret;
  2942. break;
  2943. }
  2944. case no_preload:
  2945. return swapNodeWithChild(transformed);
  2946. case no_distribute:
  2947. case no_sort:
  2948. if (increasesRowSize(transformed))
  2949. break;
  2950. return moveProjectionOverSimple(transformed, true, false);
  2951. case no_distributed:
  2952. case no_sorted:
  2953. case no_grouped:
  2954. return moveProjectionOverSimple(transformed, false, false);
  2955. case no_stepped:
  2956. return moveProjectionOverSimple(transformed, false, true);
  2957. case no_keyedlimit:
  2958. case no_limit:
  2959. case no_choosen:
  2960. if (isWorthMovingProjectOverLimit(transformed))
  2961. {
  2962. if (isPureActivity(transformed) && !isAggregateDataset(transformed))
  2963. {
  2964. if (child->hasProperty(onFailAtom))
  2965. return moveProjectionOverLimit(transformed);
  2966. return swapNodeWithChild(transformed);
  2967. }
  2968. }
  2969. break;
  2970. case no_compound_diskread:
  2971. case no_compound_disknormalize:
  2972. case no_compound_indexread:
  2973. case no_compound_indexnormalize:
  2974. case no_compound_childread:
  2975. case no_compound_childnormalize:
  2976. case no_compound_selectnew:
  2977. case no_compound_inline:
  2978. if (!isAggregateDataset(transformed))
  2979. return swapNodeWithChild(transformed);
  2980. break;
  2981. case no_addfiles:
  2982. if (isComplexTransform(transformed->queryChild(2)))
  2983. break;
  2984. return swapIntoAddFiles(transformed);
  2985. case no_inlinetable:
  2986. {
  2987. IHqlExpression * ret = optimizeProjectInlineTable(transformed, shared);
  2988. if (ret)
  2989. return ret;
  2990. break;
  2991. }
  2992. }
  2993. break;
  2994. }
  2995. case no_group:
  2996. {
  2997. switch (child->getOperator())
  2998. {
  2999. case no_group:
  3000. {
  3001. IHqlExpression * newChild = child;
  3002. bool isLocal = transformed->hasProperty(localAtom);
  3003. while (newChild->getOperator() == no_group)
  3004. {
  3005. if (newChild->queryProperty(allAtom))
  3006. break;
  3007. if (queryRealChild(newChild, 1))
  3008. {
  3009. //Don't allow local groups to remove non-local groups.
  3010. if (isLocal && !newChild->hasProperty(localAtom))
  3011. break;
  3012. }
  3013. noteUnused(newChild);
  3014. newChild = newChild->queryChild(0);
  3015. }
  3016. if (child == newChild)
  3017. break;
  3018. if (queryGrouping(transformed) == queryGrouping(newChild))
  3019. {
  3020. decUsage(newChild); // since will inherit usage on return
  3021. return LINK(newChild);
  3022. }
  3023. return replaceChild(transformed, newChild);
  3024. }
  3025. case no_hqlproject:
  3026. case no_newusertable:
  3027. //Move ungroups() over projects to increase the likely hood of combining projects and removing groups
  3028. // if (!queryRealChild(transformed, 1) && !child->hasProperty(_countProject_Atom) && !isAggregateDataset(child))
  3029. // return swapNodeWithChild(transformed);
  3030. break;
  3031. }
  3032. break;
  3033. }
  3034. //GH->Ilka no_enth now has a different format, may want to do something with that as well.
  3035. case no_sample:
  3036. {
  3037. IValue * const1 = transformed->queryChild(1)->queryValue();
  3038. if (const1)
  3039. {
  3040. __int64 val1 = const1->getIntValue();
  3041. if (val1 == 1)
  3042. return removeParentNode(transformed);
  3043. node_operator childOp = child->getOperator();
  3044. switch(childOp)
  3045. {
  3046. case no_hqlproject:
  3047. case no_newusertable:
  3048. if (isPureActivity(child) && !child->hasProperty(_countProject_Atom) && !child->hasProperty(prefetchAtom) && !isAggregateDataset(child))
  3049. return swapNodeWithChild(transformed);
  3050. break;
  3051. }
  3052. }
  3053. break;
  3054. }
  3055. case no_sort:
  3056. {
  3057. switch(child->getOperator())
  3058. {
  3059. case no_sort:
  3060. if (!isLocalActivity(transformed) || isLocalActivity(child))
  3061. return removeChildNode(transformed);
  3062. break;
  3063. case no_distributed:
  3064. case no_distribute:
  3065. case no_keyeddistribute:
  3066. if (!isLocalActivity(transformed))
  3067. return removeChildNode(transformed); // no transform()
  3068. break;
  3069. }
  3070. break;
  3071. }
  3072. case no_keyeddistribute:
  3073. case no_distribute:
  3074. {
  3075. if (transformed->hasProperty(skewAtom))
  3076. break;
  3077. //If distribution matches existing and grouped then don't distribute, but still remove grouping.
  3078. IHqlExpression * distn = queryDistribution(transformed);
  3079. switch(child->getOperator())
  3080. {
  3081. case no_distributed:
  3082. case no_distribute:
  3083. case no_keyeddistribute:
  3084. case no_sort:
  3085. if (!transformed->hasProperty(mergeAtom))
  3086. return removeChildNode(transformed);
  3087. break;
  3088. case no_dedup:
  3089. {
  3090. IHqlExpression * ret = optimizeDistributeDedup(transformed);
  3091. if (ret)
  3092. return ret;
  3093. break;
  3094. }
  3095. case no_addfiles:
  3096. if ((distn == queryDistribution(child->queryChild(0))) ||
  3097. (distn == queryDistribution(child->queryChild(1))))
  3098. return swapIntoAddFiles(transformed);
  3099. break;
  3100. }
  3101. break;
  3102. }
  3103. case no_distributed:
  3104. {
  3105. switch(child->getOperator())
  3106. {
  3107. case no_distribute:
  3108. case no_distributed:
  3109. if (transformed->queryChild(1) == child->queryChild(1))
  3110. return removeParentNode(transformed);
  3111. break;
  3112. case no_compound_diskread:
  3113. case no_compound_disknormalize:
  3114. case no_compound_indexread:
  3115. case no_compound_indexnormalize:
  3116. return swapNodeWithChild(transformed);
  3117. }
  3118. break;
  3119. }
  3120. case no_sorted:
  3121. {
  3122. switch(child->getOperator())
  3123. {
  3124. case no_compound_diskread:
  3125. case no_compound_disknormalize:
  3126. case no_compound_indexread:
  3127. case no_compound_indexnormalize:
  3128. return swapNodeWithChild(transformed);
  3129. }
  3130. break;
  3131. }
  3132. case no_aggregate:
  3133. case no_newaggregate:
  3134. {
  3135. node_operator childOp = child->getOperator();
  3136. switch(childOp)
  3137. {
  3138. case no_if:
  3139. return swapIntoIf(transformed);
  3140. case no_nonempty:
  3141. return swapIntoAddFiles(transformed);
  3142. case no_compound_diskread:
  3143. case no_compound_disknormalize:
  3144. case no_compound_indexread:
  3145. case no_compound_indexnormalize:
  3146. case no_compound_childread:
  3147. case no_compound_childnormalize:
  3148. if (!isGrouped(child) && (options & HOOhascompoundaggregate) && !transformed->hasProperty(localAtom))
  3149. {
  3150. IHqlExpression * ret = optimizeAggregateCompound(transformed);
  3151. if (ret)
  3152. return ret;
  3153. }
  3154. break;
  3155. case no_thisnode:
  3156. return swapNodeWithChild(transformed);
  3157. }
  3158. //MORE: The OHOinsidecompound isn't really good enough - because might remove projects from
  3159. //nested child aggregates which could benifit from them. Probably not as long as all compound
  3160. //activities support aggregation. In fact test should be removable everywhere once all
  3161. //engines support the new activities.
  3162. if (isGrouped(transformed->queryChild(0)) || (queryRealChild(transformed, 3) && !(options & HOOinsidecompound)))
  3163. break;
  3164. return optimizeAggregateDataset(transformed);
  3165. }
  3166. case NO_AGGREGATE:
  3167. case no_countindex:
  3168. return optimizeAggregateDataset(transformed);
  3169. case no_fetch:
  3170. {
  3171. //NB: Required for fetch implementation
  3172. node_operator childOp = child->getOperator();
  3173. switch(childOp)
  3174. {
  3175. case no_newusertable:
  3176. if (isAggregateDataset(child))
  3177. break;
  3178. //fallthrough.
  3179. case no_hqlproject:
  3180. if (!hasUnknownTransform(child))
  3181. {
  3182. OwnedMapper mapper = getMapper(child);
  3183. IHqlExpression * selSeq = querySelSeq(transformed);
  3184. OwnedHqlExpr oldLeft = createSelector(no_left, child, selSeq);
  3185. OwnedHqlExpr newLeft = createSelector(no_left, child->queryChild(0), selSeq);
  3186. IHqlExpression * expanded = expandFields(mapper, transformed->queryChild(3), oldLeft, newLeft);
  3187. if (expanded)
  3188. {
  3189. DBGLOG("Optimizer: Merge %s and %s", queryNode0Text(transformed), queryNode1Text(child));
  3190. HqlExprArray args;
  3191. args.append(*LINK(child->queryChild(0)));
  3192. args.append(*LINK(transformed->queryChild(1)));
  3193. args.append(*LINK(transformed->queryChild(2)));
  3194. args.append(*expanded);
  3195. args.append(*LINK(selSeq));
  3196. return transformed->clone(args);
  3197. }
  3198. }
  3199. break;
  3200. }
  3201. break;
  3202. }
  3203. case no_addfiles:
  3204. {
  3205. //MORE: This is possibly worth doing even if the children are shared.
  3206. HqlExprArray allTransforms;
  3207. bool ok = true;
  3208. ForEachChild(i, transformed)
  3209. {
  3210. IHqlExpression * cur = transformed->queryChild(i);
  3211. if (!cur->isAttribute())
  3212. {
  3213. if (cur->getOperator() != no_inlinetable)
  3214. {
  3215. ok = false;
  3216. break;
  3217. }
  3218. cur->queryChild(0)->unwindList(allTransforms, no_transformlist);
  3219. }
  3220. }
  3221. if (!ok)
  3222. break;
  3223. DBGLOG("Optimizer: Merge inline tables for %s", queryNode0Text(transformed));
  3224. HqlExprArray args;
  3225. args.append(*createValue(no_transformlist, makeNullType(), allTransforms));
  3226. args.append(*LINK(child->queryRecord()));
  3227. ForEachChild(i2, transformed)
  3228. {
  3229. IHqlExpression * cur = transformed->queryChild(i2);
  3230. if (!cur->isAttribute())
  3231. decUsage(cur);
  3232. }
  3233. OwnedHqlExpr ret = createDataset(no_inlinetable, args);
  3234. return transformed->cloneAllAnnotations(ret);
  3235. }
  3236. #if 0
  3237. //Something like the following might theoretically be useful, but seems to cause problems not commoning up
  3238. case no_select:
  3239. if (transformed->hasProperty(newAtom) && !childrenAreShared(child))
  3240. {
  3241. OwnedHqlExpr ret = transformTrivialSelectProject(transformed);
  3242. if (ret)
  3243. {
  3244. DBGLOG("Optimizer: Select %s from %s optimized", ret->queryChild(1)->queryName()->str(), queryNode1Text(child));
  3245. noteUnused(child);
  3246. return ret.getClear();
  3247. }
  3248. }
  3249. break;
  3250. #endif
  3251. case no_datasetfromrow:
  3252. {
  3253. node_operator childOp = child->getOperator();
  3254. switch (childOp)
  3255. {
  3256. case no_projectrow:
  3257. {
  3258. break;
  3259. IHqlExpression * grand = child->queryChild(0);
  3260. IHqlExpression * base = createDatasetFromRow(LINK(grand));
  3261. HqlExprArray args;
  3262. unwindChildren(args, child);
  3263. args.replace(*base, 0);
  3264. return createDataset(no_hqlproject, args);
  3265. }
  3266. case no_createrow:
  3267. {
  3268. DBGLOG("Optimizer: Merge %s and %s to Inline table", queryNode0Text(transformed), queryNode1Text(child));
  3269. HqlExprArray args;
  3270. args.append(*createValue(no_transformlist, makeNullType(), LINK(child->queryChild(0))));
  3271. args.append(*LINK(child->queryRecord()));
  3272. OwnedHqlExpr ret = createDataset(no_inlinetable, args);
  3273. ret.setown(child->cloneAllAnnotations(ret));
  3274. return transformed->cloneAllAnnotations(ret);
  3275. }
  3276. }
  3277. break;
  3278. }
  3279. case no_join:
  3280. {
  3281. if (isKeyedJoin(transformed) || transformed->hasProperty(lookupAtom))
  3282. {
  3283. node_operator childOp = child->getOperator();
  3284. switch (childOp)
  3285. {
  3286. case no_newusertable:
  3287. case no_hqlproject:
  3288. {
  3289. if (!isPureActivity(child) || child->queryProperty(_countProject_Atom) || child->hasProperty(prefetchAtom))
  3290. break;
  3291. IHqlExpression * transform = queryNewColumnProvider(child);
  3292. if (transformContainsSkip(transform) || !isSimpleTransformToMergeWith(transform))
  3293. break;
  3294. OwnedMapper mapper = getMapper(child);
  3295. IHqlExpression * transformedSeq = querySelSeq(transformed);
  3296. OwnedHqlExpr oldLeft = createSelector(no_left, child, transformedSeq);
  3297. OwnedHqlExpr newLeft = createSelector(no_left, child->queryChild(0), transformedSeq);
  3298. bool ok = true;
  3299. HqlExprArray args;
  3300. args.append(*LINK(child->queryChild(0)));
  3301. args.append(*LINK(transformed->queryChild(1)));
  3302. ExpandSelectorMonitor monitor(*this);
  3303. ForEachChildFrom(i, transformed, 2)
  3304. {
  3305. OwnedHqlExpr expanded = expandFields(mapper, transformed->queryChild(i), oldLeft, newLeft, &monitor);
  3306. if (expanded && !monitor.isComplex())
  3307. {
  3308. args.append(*expanded.getClear());
  3309. }
  3310. else
  3311. {
  3312. ok = false;
  3313. break;
  3314. }
  3315. }
  3316. if (ok)
  3317. {
  3318. //If expanding the project removed all references to left (very silly join....) make it an all join
  3319. if (transformed->hasProperty(lookupAtom) && !exprReferencesDataset(&args.item(2), newLeft))
  3320. args.append(*createAttribute(allAtom));
  3321. DBGLOG("Optimizer: Merge %s and %s", queryNode0Text(transformed), queryNode1Text(child));
  3322. noteUnused(child);
  3323. return transformed->clone(args);
  3324. }
  3325. break;
  3326. }
  3327. }
  3328. }
  3329. break;
  3330. }
  3331. case no_selectnth:
  3332. {
  3333. node_operator childOp = child->getOperator();
  3334. switch(childOp)
  3335. {
  3336. case no_sort:
  3337. {
  3338. IHqlExpression * index = transformed->queryChild(1);
  3339. if (getIntValue(index, 99999) <= 100 && !isGrouped(child))
  3340. {
  3341. HqlExprArray topnArgs;
  3342. unwindChildren(topnArgs, child);
  3343. topnArgs.add(*LINK(index), 2);
  3344. OwnedHqlExpr topn = createDataset(no_topn, topnArgs);
  3345. incUsage(topn);
  3346. DBGLOG("Optimizer: Replace %s with %s", queryNode0Text(child), queryNode1Text(topn));
  3347. HqlExprArray selectnArgs;
  3348. selectnArgs.append(*child->cloneAllAnnotations(topn));
  3349. unwindChildren(selectnArgs, transformed, 1);
  3350. return transformed->clone(selectnArgs);
  3351. }
  3352. break;
  3353. }
  3354. }
  3355. }
  3356. }
  3357. return LINK(transformed);
  3358. }
  3359. IHqlExpression * CTreeOptimizer::defaultCreateTransformed(IHqlExpression * expr)
  3360. {
  3361. return PARENT::createTransformed(expr);
  3362. }
  3363. TableProjectMapper * CTreeOptimizer::getMapper(IHqlExpression * expr)
  3364. {
  3365. return new TableProjectMapper(expr);
  3366. }
  3367. bool CTreeOptimizer::isShared(IHqlExpression * expr)
  3368. {
  3369. switch (expr->getOperator())
  3370. {
  3371. case no_null:
  3372. return false;
  3373. case no_spillgraphresult:
  3374. case no_spill:
  3375. case no_split:
  3376. case no_throughaggregate:
  3377. case no_commonspill:
  3378. return true;
  3379. }
  3380. return (queryBodyExtra(expr)->useCount > 1);
  3381. }
  3382. bool CTreeOptimizer::isSharedOrUnknown(IHqlExpression * expr)
  3383. {
  3384. switch (expr->getOperator())
  3385. {
  3386. case no_null:
  3387. return false;
  3388. case no_spillgraphresult:
  3389. case no_spill:
  3390. case no_split:
  3391. case no_throughaggregate:
  3392. case no_commonspill:
  3393. return true;
  3394. }
  3395. OptTransformInfo * extra = queryBodyExtra(expr);
  3396. return (extra->useCount != 1);
  3397. }
  3398. IHqlExpression * optimizeHqlExpression(IHqlExpression * expr, unsigned options)
  3399. {
  3400. //The no_compound can get very heavily nested => unwind to save stack traversal. We really should support nary no_compound
  3401. HqlExprArray args, newArgs;
  3402. unwindCommaCompound(args, expr);
  3403. optimizeHqlExpression(newArgs, args, options);
  3404. return createActionList(newArgs);
  3405. }
  3406. void optimizeHqlExpression(HqlExprArray & target, HqlExprArray & source, unsigned options)
  3407. {
  3408. CTreeOptimizer optimizer(options);
  3409. optimizer.analyseArray(source, 0);
  3410. optimizer.transformRoot(source, target);
  3411. }
  3412. /*
  3413. Implementation issues:
  3414. 1. References to transformed items.
  3415. x := project(w, ...);
  3416. y := filter(x, ...);
  3417. z := distibute(y, x.fx);
  3418. when x and y are switched, all references to x need to be replaced by x'
  3419. y' := filter(w, ...);
  3420. x' := project(y', ...);
  3421. z := distibute(x', x'.fx);
  3422. Need to map an selector, where selector->queryNormalized() == oldDataset->queryNormalized() and replace with newDataset->queryNormalized()
  3423. However, the mapping is context dependant - depending on what the parent dataset is.
  3424. Could either have transformed[parentDataset] or could post process the transformed expression.
  3425. So to process efficiently, we need:
  3426. a) transformedSelector[parentCtx];
  3427. b) transformed[parentCtx]
  3428. c) on dataset transform, set dataset->queryNormalizedSelector()->transformedSelector[ctx] to newDataset->queryNormalizedSelector();
  3429. d) on mapping, replace with i) queryTransformed(x) or queryNomalizedSelector()->transformedSelector[ctx];
  3430. Could either have
  3431. expr->queryExtra()->transformedSelector[parentCtx]
  3432. or
  3433. ::transformSelector[parentCtx, expr]
  3434. First is not likely to affect many nodes - since only will be set on datasets.
  3435. Second is likely to use much less memory, and probably as quick - trading an extra indirection+construction time with an assign to a structure.
  3436. Have a noComma(top-ds, prev-ctx) to mark the current context.
  3437. *** Only need to change if dataset is visible inside the arguments to the ECL syntax ***
  3438. Use an array of ctx, where tos is current don't seed with a dummy value - because will cause commas to be created
  3439. The idea of the transformedSelector should also be generalized:
  3440. if (!transformed) try transformedSelector, and set transformedSelector to result.
  3441. - should we replace the boolean flags in CHqlExpression with a mask?
  3442. i) would make anding /oring more efficient.
  3443. ii) would make adding code generator helpers much less painful - use 32bits and allocate from top down for the code generator.
  3444. Useful flags
  3445. - context free - not getresults or access to fields in unrelated tables.
  3446. - unconditional?
  3447. - look at transforms and see what causes pain.
  3448. 2. optimizing shared items.
  3449. * When is it worthwhile?
  3450. o removing duplicate sorts?
  3451. o when it only removes a node e.g., count(project).
  3452. o when would enable operation to be done more efficiently. ??Eg.
  3453. * Need to differentiate between a use and a reference - only link count former.
  3454. */