
Update readme.md

Mike Mikailov, 5 years ago
parent commit 1b8619d5c6
1 changed file with 7 additions and 144 deletions

readme.md

@@ -1,6 +1,7 @@
 [1 Image patch extraction](#1-image-patch-extraction)   
 [2 Prediction](#2-prediction)  
 [3 Heatmap stitching](#3-heatmap-stitching)
+[4 Retrieving run-time statistics](#4-retrieving-run-time-statistics)
 
 # 1 Image patch extraction
 The following commands launch Son of Grid Engine (SGE) jobs to extract patches, group them into HDF5 files, and create a lookup table for every HDF5 file. 
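The extraction commands themselves fall outside this hunk; as a hedged illustration only, here is a minimal sketch of reading one grouped patch back out of an HDF5 file via its lookup table. The file names, dataset name, and lookup format are hypothetical, not the repository's actual layout.

    # Hypothetical layout: patches_grp.h5 holds stacked patches, and a
    # plain-text lookup table records where each patch came from.
    import h5py

    with open("patches_grp_lookup.txt") as f:       # hypothetical lookup file
        lookup = [line.split() for line in f]       # e.g. slide_id col row

    with h5py.File("patches_grp.h5", "r") as f:     # hypothetical HDF5 file
        patch = f["patches"][42]                    # 42nd extracted patch
    slide_id, col, row = lookup[42]                 # its slide and grid position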
@@ -34,152 +35,14 @@ qsub process_main.sh ./config_normal.txt
 qsub process_main.sh ./config_tumor.txt  
 
 # 3 Heatmap stitching
-After the prediction matrices have been generated, the following SGE job could be launched to generate heatmaps.
+After the prediction matrices have been generated, an SGE job using the heatmap_main.sh script can be launched to generate heatmaps. The launch takes two arguments: a) the type of the slides (test, normal, or tumor); b) the root directory of the results, as in the example run below:  
 qsub heatmap_main.sh test /scratch/mikem/UserSupport/weizhe.li/runs_process_cn_True/testing_wnorm_448_400_7690953
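
A hedged sketch of what the stitching step does: place per-patch probabilities onto a grid, one cell per 224-pixel patch. The prediction-file name and format are assumptions, not the actual output of heatmap_main.sh.

    import numpy as np

    # Assumed format: one row per patch, columns (col, row, probability).
    preds = np.load("predictions_test_001.npy")     # hypothetical file

    heatmap = np.zeros((int(preds[:, 1].max()) + 1,
                        int(preds[:, 0].max()) + 1), dtype=np.float32)
    for col, row, prob in preds:
        heatmap[int(row), int(col)] = prob          # one cell per 224 px patch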
 
+# 4 Retrieving run-time statistics
+In the time_all_stats_pred.sh file, adjust the job results root directory, for example:  
+DIR=/scratch/mikem/UserSupport/weizhe.li/runs_process_cn_False/normal_wnorm_448_400_7691563  
+Then run:
+time bash ./time_all_stats_pred.sh  
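
The contents of time_all_stats_pred.sh are not shown here; as a hedged sketch of the kind of aggregation such a script performs, the following mirrors the find/grep/sort pipelines used later in this file. The sysout subdirectory and the "seconds" log format are assumptions.

    import glob
    import os
    import re

    DIR = "/scratch/mikem/UserSupport/weizhe.li/runs_process_cn_False/normal_wnorm_448_400_7691563"

    # Collect "<n> seconds" figures from every SGE output file under DIR.
    times = []
    for path in glob.glob(os.path.join(DIR, "sysout", "*")):
        if not os.path.isfile(path):
            continue
        with open(path, errors="ignore") as f:
            for line in f:
                m = re.search(r"([\d.]+)\s*seconds", line)
                if m:
                    times.append((float(m.group(1)), os.path.basename(path)))

    # Sort by run time, like the `sort -n` steps elsewhere in this readme.
    for seconds, name in sorted(times):
        print(f"{seconds:10.2f}  {name}")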
 
 
-# TESTING
-[mikem@betsy02 split_wsi]$ ls -alsh /scratch/wxc4/CAMELYON16-testing | less
-total 206G
-[mikem@betsy02 split_wsi]$ ls -1 /scratch/wxc4/CAMELYON16-testing | wc -l
-129
 
-# TUMOR
-[mikem@betsy02 split_wsi]$ ls -alsh /scratch/wxc4/CAMELYON16-training/tumor | less
-total 219G
-[mikem@betsy02 split_wsi]$ ls -1 /scratch/wxc4/CAMELYON16-training/tumor | wc -l
-111
-
-# NORMAL
-[mikem@betsy02 split_wsi]$ ls -alsh /scratch/wxc4/CAMELYON16-training/normal | less
-total 278G
-[mikem@betsy02 split_wsi]$ ls -1 /scratch/wxc4/CAMELYON16-training/normal | wc -l
-159
-
-
-Location of the bounding box:
-
-    dimensions = {'normal' : '/home/weizhe.li/li-code4hpc/pred_dim_0314/training-updated/normal/dimensions',
-                  'tumor' : '/home/weizhe.li/li-code4hpc/pred_dim_0314/training-updated/tumor/dimensions',
-                  'test' : '/home/weizhe.li/li-code4hpc/pred_dim_0314/testing/dimensions'  
-        }
-slide.dimensions
-A (width, height) tuple for level 0 of the slide.
-
-get_tile_dimensions(level, address)
-    Return a (pixels_x, pixels_y) tuple for the specified tile.
-
-get_tile_coordinates(level, address)
-    Return the OpenSlide.read_region() arguments corresponding to the specified tile.
-
-read_region(location, level, size)
-    Return an RGBA Image containing the contents of the specified region.
-    • location (tuple) – (x, y) tuple giving the top left pixel in the level 0 reference frame
-    • level (int) – the level number
-    • size (tuple) – (width, height) tuple giving the region size
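
The tuple returned by get_tile_coordinates can be passed straight to read_region. A minimal sketch, assuming an OpenSlide-readable slide; the filename and tile address are hypothetical. The interactive session below shows the shape of one such tuple.

    import openslide
    from openslide.deepzoom import DeepZoomGenerator

    slide = openslide.OpenSlide("tumor_001.tif")     # hypothetical filename
    dz = DeepZoomGenerator(slide, tile_size=224, overlap=0)

    level = dz.level_count - 1                       # highest-resolution level
    crds = dz.get_tile_coordinates(level, (0, 0))    # ((x, y), level, (w, h))
    region = slide.read_region(*crds)                # RGBA PIL image of the tile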
-
-
->>> crds
-((96, 0), 0, (512, 288))
->>> crds[0]
-(96, 0)
->>> crds[0][0]
-96
->>> crds[2]
-(512, 288)
->>> crds[2][0]
-512
->>> crds[2][1]
-288
->>> crds[1]
-0
-
-The dimension numbers multiplied by 224 give the actual dimensions of the highest-resolution image.
-
-The dimensions from openslide form a coordinate pair (x, y) where x is the width and y is the height.
-If the image is read into a numpy array, the array's shape reports the reverse order: height first, then width. Please note that the sequence is reversed. Please use the description of the dimension file in this email.
-By the way, openslide uses the top-left corner as the (0, 0) point; the x axis points right and the y axis points down.
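
A short sketch of this reversal, assuming any OpenSlide-readable file (the filename is hypothetical):

    import numpy as np
    import openslide

    slide = openslide.OpenSlide("normal_001.tif")    # hypothetical filename
    print(slide.dimensions)                          # (width, height) at level 0

    lowest = slide.level_count - 1
    img = slide.read_region((0, 0), lowest, slide.level_dimensions[lowest])
    print(np.asarray(img).shape)                     # (height, width, 4): reversed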
-The dimension files I mentioned in my last email have the following contents:
-
-Each file stores a list: [item1, item2, item3, item4, item5, item6, item7, item8, item9]  (sample values: 4256 5152 4256 5152)
-
-item1: height of the WSI image
-item2: width of the WSI image
-item3: number of channels (equal to 3; we don't need this item)
-item4: x coordinate of the left edge of the bounding box
-item5: x coordinate of the right edge of the bounding box
-item6: y coordinate of the top edge of the bounding box
-item7: y coordinate of the bottom edge of the bounding box
-item8: height of the bounding box
-item9: width of the bounding box
-
-
-=====
-
-Each file stores a bounding-box list. The actual coordinates at the highest resolution need to be multiplied by 224. I will send you a description tomorrow morning.
-
-The dimension files I mentioned in my last email have the following contents:
-
-Each file stores a list: [item1, item2, item3, item4, item5, item6, item7, item8, item9]
-
-item1: width of the WSI image
-item2: height of the WSI image
-item3: number of channels (equal to 3; we don't need this item)
-item4: x coordinate of the left edge of the bounding box
-item5: x coordinate of the right edge of the bounding box
-item6: y coordinate of the top edge of the bounding box
-item7: y coordinate of the bottom edge of the bounding box
-item8: width of the bounding box
-item9: height of the bounding box
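
A hedged sketch of reading one dimension file into named fields, following this second description (the two descriptions above disagree on width/height order, so verify against real files). The pickle storage format is an assumption:

    import pickle

    def read_dims(path):
        with open(path, "rb") as f:
            (wsi_w, wsi_h, _channels,
             bb_left, bb_right, bb_top, bb_bottom,
             bb_w, bb_h) = pickle.load(f)
        scale = 224   # grid units -> level-0 pixels, per the note above
        return {
            "wsi_size": (wsi_w * scale, wsi_h * scale),
            "bbox": (bb_left * scale, bb_top * scale,
                     bb_right * scale, bb_bottom * scale),
        }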
-
-
-
-
-Level counting
-qsub get_levels.sh
-cat level_count/* >> all_level_count.csv
-sort -k1n all_level_count.csv > all_level_count-sorted.csv
-
-
-1. Create a list of WSIs located under: /scratch/wxc4/CAMELYON16-testing/
-   cd /projects/mikem/UserSupport/weizhe.li/split_wsi
-   ls -1 /scratch/wxc4/CAMELYON16-testing > list.txt
-
-2. Run the job below to split TIF files into smaller HDF5 files:
-   qsub split_grp.sh 
-
-3. Generate look-up tables:
-   bash create_lookup_grp.sh 
-   
-4. Run the script below to process all HDF5 files in parallel:
-   qsub process_main.sh
-
-process_images_grp.o7677468
-
-At [mikem@betsy02 split_wsi]$:
-RESULT=grp_timing1
-BASEDIR=/projects/mikem/UserSupport/weizhe.li/split_wsi/sysout_process_images_grp
-APP_PREFIX=process_images_grp.o7678134
-find $BASEDIR -name "$APP_PREFIX.*" | xargs grep "seconds" > "$RESULT".txt
-sort -k2 -n "$RESULT".txt > "$RESULT"_sorted.txt 
-
-
-find /projects/mikem/UserSupport/weizhe.li/split_wsi/sysout_process_images_ds -name "process_images_ds.o7676383.*" | xargs grep "seconds" > ds_timing1.txt
-sort -k2 -n ds_timing1.txt > ds_timing1_sorted.txt 
-
-find /projects/mikem/UserSupport/weizhe.li/split_wsi/sysout_process_images_ds -name "process_images_ds.o7674135.*" | xargs grep "seconds" > ds_timing.txt
-sort -k2 -n ds_timing.txt > ds_timing_sorted.txt 
-find /projects/mikem/UserSupport/weizhe.li/split_wsi/sysout_process_images -name "process_images.o7673281.*" | xargs grep "seconds" > file_timing.txt
-sort -k2 -n file_timing.txt > file_timing_sorted.txt 
-
-create_dataset, see: http://docs.h5py.org/en/stable/high/group.html
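
For reference, a minimal create_dataset sketch (the file name, dataset name, and patch shape are hypothetical):

    import h5py
    import numpy as np

    patches = np.zeros((400, 224, 224, 3), dtype=np.uint8)   # 400 RGB patches

    with h5py.File("patches_grp.h5", "w") as f:
        f.create_dataset("patches", data=patches,
                         chunks=(1, 224, 224, 3), compression="gzip")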
- 
-
-# for splitting and grouping 
-D=/scratch/mikem/UserSupport/weizhe.li/runs_split_group/448_400_7684656/sysout
-find $D -name "split*" | xargs grep "real" > split_timing.txt